


This chapter reviews the literature related to the topic of the research. It covers
the basic concepts of phonology, English pronunciation, the drilling technique, the
purpose of teaching pronunciation, the factors influencing students’ pronunciation
ability, Text to Speech (TTS) software in language learning, teaching pronunciation
by using “Talk Any” software, the importance of educational software in the teaching
and learning process, and the action hypotheses. Each topic is presented in turn.

2.1 The Nature of Pronunciation

2.1.1 The Definition of Pronunciation
Pronunciation is one of the important aspects of English, especially in oral
communication. Every sound, stress pattern, and intonation may convey meaning.
Non-native speakers of English have to be very careful in pronouncing utterances,
or they may cause misunderstanding. Thus, having an intelligible pronunciation is
more necessary than having a native-like pronunciation.
According to Lado (1964: 70), pronunciation is the use of a sound system in
speaking and listening. Here, pronunciation is treated merely as an act that happens
in speaking and listening. Pronunciation is also the act or manner of pronouncing
words; the utterance of speech. In other words, it is a way of speaking a word,
especially a way that is accepted or generally understood. In this sense,
pronunciation entails the production and reception of speech sounds and the
achievement of meaning (Kristina, Diah, et al. 2006: 1). This second definition
is more concise. It contains some important keys in pronunciation: act, speaking,
and the production and reception of sound. It means that the words, phrases, and
sentences being pronounced should be intelligible.
2.1.2 The Definition of Phonology
According to Yule (2001:54), phonology is the study of the systems, patterns,
and use of sounds that occur in the world’s languages. In addition, Kusuma (1990:7)
states that phonology deals with phonemes and sequences of phonemes. A phoneme is
a class of sounds. It is an abstract alphabetic unit that can be used for writing a
language in a systematic and unambiguous way (Yusuf, 1998:19). For example, the
phonemes /p/ and /t/ distinguish the words pie and tie. However, to acquire a full
understanding of the use of speech sounds in English, we should study both
phonology and phonetics.

2.1.3 The Definition of Phonetics

According to Kusuma (1990:1), phonetics is the study of speech sounds and their
production, transmission, and reception. By knowing phonetics, we understand the
way in which the sounds of languages are formed. Further, mistakes that occur in
someone’s utterances can be detected and corrected. In sum, to complete an act of
communication, our speech mechanism must function in such a way as to produce
sounds, which in turn must be received and understood by the listener.

2.1.4 Speech Sounds Production

According to Giegerich (1995:1), the most usual source of energy for our
vocal activity is an air stream expelled from the lungs. In short, the air stream
is responsible for producing speech sounds. The production of speech sounds covers
four processes: initiation, phonation, the oro-nasal process, and articulation.

1. Initiation Process
The initiation process, or airstream mechanism, is the process in which air is
expelled from the lungs and passes through the trachea to the oral or nasal
cavity. In English, speech sounds are the result of “a pulmonic egressive air
stream” (Giegerich, 1992). Abercrombie in Dr. Photini Coutsougera’s
journal (2004) says that the airstream is provided by the action of some organs
of speech and makes audible the movements of other organs. In the same
journal, another expert, Catford (1994: 4), states that in the airstream mechanism
“the movements of organs during the organic phase act upon the air contained
within the vocal tract. They compress the air, or dilate it, and they set it
moving in various ways – in rapid puffs, in sudden bursts, in a smooth flow,
in a rough, eddying, turbulent stream, and so on.” There are three airstream
mechanisms used in the world’s languages: pulmonic (involving the lungs), velaric
(involving the velum and tongue), and glottalic (involving the glottis and larynx).

2. The Phonation Process

The phonation process occurs at the larynx. The larynx has two horizontal
folds of tissue in the passage of air, called the vocal folds. The gap between
these folds is called the glottis. When the glottis is closed, no air can pass.
It can also have a narrow opening, which makes the vocal folds vibrate and
produces “voiced sounds”. Examples of voiced sounds are [b], [g], and all
vowels. Finally, the glottis can be wide open, as in normal breathing, so that
the vocal folds do not vibrate, producing “voiceless sounds”, for example the
plosives [p], [t], and [k].

3. Oro-Nasal Process
After passing through the larynx and the pharynx, the air can go into the
nasal or the oral cavity. Giegerich (1995:3-5) explains that the air stream can go
into either the nasal or the oral cavity after having passed the larynx and the
back of the throat (the pharynx). This process is called the oro-nasal process.
The velum is the part responsible for that selection. Through the oro-nasal
process, the nasal consonants (/m/, /n/, /ŋ/) can be differentiated from other sounds.

4. The Articulation Process

The final process is articulation. The articulation process takes place in the
mouth, and it is the process by which speech sounds are distinguished from one
another in terms of the place where and the manner in which they are articulated.
In other words, we can distinguish the oral cavity, which acts as a
resonator, from the articulators, which can be active or passive: upper and
lower lips, upper and lower teeth, tongue (tip, blade, front, back), and roof of
the mouth (alveolar ridge, palate, and velum).
Based on the explanation above, the research focuses on the articulation, or
pronunciation, of speech sounds, because the articulation of speech sounds
appears in the pronunciation of English words.

2.1.5 The Classification of Speech Sounds

Jones (1995:12) states that many kinds of speech sounds can be articulated.
The three kinds of speech sounds are vowels, diphthongs, and consonants.
According to Kenworthy (1987:46), these three kinds of speech sounds cause
perception problems for students. Vowels are sounds in which the air stream can
pass freely through and out of the mouth (Kusuma, 1990:14); there is no hindrance
to the flow of air as it passes from the larynx to the lips. According to Kusuma
(1990:28-38), English has twelve pure vowels:
1. [i:] – tree / tri: /          7. [ɑ:] – pass / pɑ:s /
2. [I] – milk / mIlk /           8. [ʌ] – sun / sʌn /
3. [e] – bed / bed /             9. [u:] – blue / blu: /
4. [æ] – sat / sæt /            10. [ʊ] – put / pʊt /
5. [ɜ:] – word / wɜ:d /         11. [ɔ:] – four / fɔ:(r) /
6. [ə] – along / ə’lɔŋ /        12. [ɔ] – dog / dɔg /

According to the tongue position (Kusuma, 1990:18), the English vowels can be put
in a chart as follows:


Close        u:
Half Close   i    ʊ
MID          e    ɔ:
Half Open    ə    ɔ
LOW          æ    ʌ
Open         ɑ:
(Picture 2.1 Adopted from Kusuma, 1990:18)

The second category of speech sound is diphthongs. Kusuma (1990:15) asserts
that a diphthong is a sound made by gliding from one vowel position into
another. In other words, a diphthong is a sound that consists of a movement
or glide from one vowel to another. It is a union of two vowels, as in fine / faIn /
and goes / gəʊz /. There are nine diphthongs (Kusuma, 1990:39-42):

1. [aI] – lie / laI /     4. [Iə] – deer / dIə(r) /     7. [ʊə] – poor / pʊə(r) /
2. [eI] – day / deI /     5. [ɔə] – door / dɔə(r) /     8. [aʊ] – down / daʊn /
3. [ɔI] – toy / tɔI /     6. [eə] – bear / beə(r) /     9. [əʊ] – ago / ə’gəʊ /

The third category of speech sound is consonants. According to Kusuma
(1990:14), consonants are speech sounds in which the air stream, after having
passed the larynx, is either stopped for a moment and then released, or driven
through such a narrow opening that friction is heard. Each consonant is either
voiced or voiceless (Haycraft, 1980:160-161). The consonants are:

 1. [p] – pet / pet /             13. [tʃ] – chin / tʃIn /
 2. [b] – bad / bæd /             14. [dʒ] – jump / dʒʌmp /
 3. [t] – tea / ti: /             15. [θ] – thin / θIn /
 4. [d] – day / deI /             16. [ð] – this / ðIs /
 5. [k] – key / ki: /             17. [m] – man / mæn /
 6. [g] – go / gəʊ /              18. [n] – night / naIt /
 7. [f] – fish / fIʃ /            19. [ŋ] – sing / sIŋ /
 8. [v] – vase / vɑ:z /           20. [h] – hot / hɔt /
 9. [s] – sea / si: /             21. [l] – like / laIk /
10. [z] – zoo / zu: /             22. [r] – right / raIt /
11. [ʃ] – shoe / ʃu: /            23. [w] – wait / weIt /
12. [ʒ] – leisure / ‘leʒə(r) /    24. [j] – you / ju: /
Based on the types of speech sounds described above, this research focuses on
investigating English word pairs that differ only in the two sounds being contrasted.
The contrast may be between vowels, consonants, diphthongs, or a vowel and a
diphthong. For example, ‘leave’ and ‘live’ form an English word pair that
contrasts the vowel sound / i: / with the vowel sound / I /.
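
The idea of a word pair that differs in only one sound can be made concrete with a
short sketch. The Python fragment below is illustrative only and is not part of the
research instrument: the function names are hypothetical, and each word is entered by
hand as a list of sounds in the notation used in this chapter, so that a long vowel or
a diphthong counts as a single unit.

```python
def differing_positions(first, second):
    """Return the indices where two sound lists differ, or None if lengths differ."""
    if len(first) != len(second):
        return None
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


def is_minimal_pair(first, second):
    """True when the two transcriptions differ in exactly one sound."""
    positions = differing_positions(first, second)
    return positions is not None and len(positions) == 1


# Hand-entered transcriptions (an assumption of this sketch), one item per sound.
leave = ["l", "i:", "v"]
live = ["l", "I", "v"]
pie = ["p", "aI"]
tie = ["t", "aI"]

print(is_minimal_pair(leave, live))  # True: only / i: / and / I / differ
print(is_minimal_pair(pie, tie))     # True: only / p / and / t / differ
print(is_minimal_pair(leave, tie))   # False: the words differ in length
```

Under this sketch, a pair qualifies for drilling only when exactly one position
differs, which matches the contrast between ‘leave’ and ‘live’ described above.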

2.2 Teaching English Pronunciation

Murcia et al., cited in Hermansyah (2011), state that the goal of teaching
pronunciation is for students to be comfortably intelligible. As a speaker, he or
she needs to be able to pronounce words correctly. This avoids misunderstanding on
the listener’s part and creates a comfortable situation in which the listener
can catch the meaning of the speaker’s utterances. As a listener, he or she needs to
be able to listen comfortably, understanding what the speaker says without
unnecessary repetitions that may bother both the speaker and the listener. In other
words, correct pronunciation is a key to effective communication.
According to Hornby (1974), pronunciation is the way in which a language is
spoken or the way in which a word is pronounced. In speaking English, students
face some difficulties in pronunciation (Kusuma, 1990:1). They have to learn and
memorize how to form the foreign sounds with their organs of speech.
According to Haycraft (1971:56-57), there are several ways to overcome the
difficulties of pronunciation. First, students can work on their own difficulties
by training their ears: they learn to recognize the foreign sounds that they hear
and develop the ability to imitate them and produce good speech. They also have to
be sensitive to errors made by themselves and others. The second way is
pronunciation practice supported by the teacher and by appropriate media that can
model pronunciation. Nowadays, computer software can help students practice and
imitate what they hear, followed by drills, which are the main means of improving
young students’ pronunciation. Here, the students need to watch and listen
carefully to how the teacher or the computer software pronounces the sounds like
native speakers, so that they can produce the sounds.
Kenworthy (1987:1-3) suggests that teaching and learning pronunciation
needs support from both the teacher and the students. In line with this idea, the
participation of the English teacher and the students was required in applying
this technique in the teaching and learning process. The teacher showed the
students how to produce sounds, provided related media for effective teaching, and
gave feedback on the students’ performance.
In this research, the goal of teaching pronunciation in the first grade of Junior
High School was to enable students to pronounce English words, phrases, and
sentences correctly. The teacher should make an effort to enable the students to
communicate effectively. According to Samantray (2009), the approach relied
firmly on listening to good human models, with electronic media (records, tape
recorders, language labs, audio-video, cassettes, and CDs) as prevailing
requirements. In this case, the teacher applied “Talk Any” as the media software to
train the students to pronounce English words, phrases, and sentences so as to
produce effective communication in which the listener can understand what the
speaker says without unnecessary repetition.
Based on the description above, pronouncing the English sounds within
English words, phrases, and sentences is essential in order to make good

2.2.1 Pronouncing Words

Pronunciation activities involve careful listening and then the articulation of
words and word pairs (Mora, 1999:1). Many young learners are able to produce
new sounds just by imitating what they hear, but if students seem unable
to imitate, the teacher can give guidance that may help them
attain the sounds. Useful guidance consists of instructions that can be
followed, responded to, understood, carried out, and controlled (Kenworthy,
1987:69). Below are examples of features that are easy enough for students to
notice and control (Kenworthy, 1987:69-70):
a) Lip position: whether the lips are pursed (as in whistling) or spread (as in a
smile) or wide apart (as when yawning).
b) Contact between the tongue and teeth: whether the sides of the tongue are
touching the upper back teeth or whether the tip of the tongue is touching the
top or bottom front teeth.

c) Contact between the tongue and the roof of the mouth: whether the tip of the
tongue or the back of the tongue is touching a part of the roof of the mouth.

Apart from the kind of instructions given to the students, the
teacher should start with words in which the sound is at the beginning,
then move on to words where it occurs at the end, and then to the middle
position (Kenworthy, 1987:70-71). It is usually easier for students to produce a
new sound in initial position. This kind of exercise should also involve
self-correction, showing individual learners where their problems lie and how to
repair them. Besides that, along with recognition practice, such activities may be
an essential part of any language teaching, as they make pronunciation an active
element of the learning process and focus learners on the language they are
producing (Dalton, 1997:2).
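
The ordering of practice words suggested by Kenworthy can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the function names
are hypothetical, and the word ‘washing’ with its transcription is added here as an
example and is not taken from the cited sources.

```python
# Practice order described above: initial position, then final, then middle.
PRACTICE_ORDER = ["initial", "final", "medial"]


def sound_position(sounds, target):
    """Classify where the target sound first occurs in a list of sounds."""
    if target not in sounds:
        return None
    index = sounds.index(target)
    if index == 0:
        return "initial"
    if index == len(sounds) - 1:
        return "final"
    return "medial"


def order_for_practice(words, target):
    """Sort words so that the target sound is practised in the suggested order."""
    ranked = []
    for word, sounds in words.items():
        position = sound_position(sounds, target)
        if position is not None:
            ranked.append((PRACTICE_ORDER.index(position), word))
    ranked.sort(key=lambda item: item[0])
    return [word for _, word in ranked]


# Hand-entered transcriptions in the notation of this chapter (illustrative).
words = {
    "washing": ["w", "ɔ", "ʃ", "I", "ŋ"],
    "fish": ["f", "I", "ʃ"],
    "shoe": ["ʃ", "u:"],
}

print(order_for_practice(words, "ʃ"))  # ['shoe', 'fish', 'washing']
```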

2.2.2 Pronouncing Phrases

2.2.3 Pronouncing Sentences
2.3 The Text To Speech Software (Talk Any) in Language Learning
Computers are widely used by people all over the world. In language teaching,
teachers and students are helped by the computer as a medium for delivering
important information in the teaching and learning process. Unwin and McAleese
(1978:149) state that the capabilities of today’s minicomputers will become usual in
the classrooms, laboratories, equipment, and library aids of schools and colleges.
In the same vein, Rivers (1987:177) says that anyone who has used a computer for
whatever purpose realizes that the computer is an essentially interactive device.
It means that the computer provides the information that its users ask for.
Furthermore, a teaching and learning process in which communication between the
teacher and students occurs needs certain media.
Goodwyn (1992:28) says that the media that can be used include television,
film, video, radio, photography, popular music, printed material, books, comics,
magazines, and also computer software. Davies (1996:8) adds that English teachers
should try to vary their teaching to make students active in learning. It
means that English teachers should use various teaching techniques, in this
research text to speech, or natural sounding, software. Computer software can also
be used as a teaching medium to optimize the teacher’s performance in presenting
the material, and it is a key area for media education that helps to simulate
specific media situations (Goodwyn, 1992:64). There are many kinds of software;
software used in teaching and learning activities is usually called
educational software. According to Elliot et al. (2000:361), educational software
is one popular way to increase students’ level of motivation and enthusiasm in
learning. Computer assisted language learning is an approach to language teaching in
which computer technology is used as an aid in mastering the
material to be learned.
One kind of software that can be used as a medium for delivering information is
text to speech, or natural sounding, software. This kind of software transforms
typed text into sound resembling a human voice. Thus,
anyone who wants to master the pronunciation of English words can use this software
to learn basic pronunciation. With the growth of technology,
many kinds of text to speech or natural sounding software can be
found on the internet, either free or paid.

2.3.1 The Role of Text To Speech (Talk Any) Software as a Teaching Medium
According to Khalifa et al. (2000:2), the appropriate way to use the computer
is as a tool for learning, with software as a support for educational programs.
As a medium for teaching pronunciation, this software has several roles in the
classroom. First, this kind of media can reduce the time needed to deliver material
(Wiharjo, 2007:5). As a result, it saves the teacher a lot of time in the teaching
and learning process. Second, the software can make the classroom more
interesting and motivating. Furthermore, Wiharjo (2007:5) states that software can
improve not only the students’ attention and concentration but also their
motivation and interest. The software is able to imitate
the sound of real foreign speakers in several voice styles, and the speed of the
voice can be adjusted. This variation can help the teacher drill the students’
pronunciation and motivate them to build their interest in learning pronunciation.
The roles described above indicate that this software is very useful
for both teacher and students in the teaching and learning process in the classroom.
By using this software, the students are not only motivated but can also be more
active in learning. The “Talk Any” interface is displayed as follows:
(Picture 2.2 The screenshot of “Talk Any”)

2.3.2 The Procedure for Using “Talk Any” Software in Teaching

The simple and easy way to use “Talk Any” software is to know the
function of each menu and button. The software has the following menus:
1) The File menu is used to open a file in *.txt format (for example, a Notepad
file), so that a long or saved text can be pronounced by the software without
retyping it.
2) The Edit menu is used to copy or cut and paste text from an unsaved file
on the computer.
3) The Help menu shows the software’s description.

The buttons have different functions to maximize the output of speech
production:

1) The Personality button offers twenty different voices, such as male, female,
child, and robot voices.
2) The Pitch button adjusts the pitch level of the voice; it is adjusted
automatically when a personality voice is chosen.
3) The Speed button adjusts the tempo of the voice: increasing the speed value
makes the voice faster, and decreasing it makes the voice slower.
4) The Pitch Quality button selects a natural human voice, a flat voice, or a
singing voice.
5) The Vocal Effort button selects the voice effect: a normal, breathy, or
whispered human voice.
6) The last is the talk it! and Stop Talking button is to start and stop the sound’s

A simple procedure for teaching pronunciation by using “Talk Any” can be
arranged as follows:
a) Open the saved files for the exercises and/or type some words
in the space provided.
b) Then, press the Talk It! button to start the speech.
c) The students listen to what the software says.
d) Finally, the students are drilled many times by repeating the
difficult words that they are not yet good at pronouncing. To get the
best voice output, the speed button needs to be decreased or

2.4 The “Talk Any” Software as a Media in Pronunciation Class

A text to speech (TTS) system converts normal language text into
speech. There are many types of natural sounding or text to speech software that
can assist our work and support teaching and learning. Examples of TTS software
are PistonSoft, TextAloud, Natural Voice Text to Speech Reader,
Talk Any, Pronunciation Power, and many others. In education, such software
can be used for listening or pronunciation drilling. In teaching pronunciation,
it leads learners to imitate words that they usually find difficult to pronounce by
listening to the speech machine repeatedly.
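
The drilling procedure described in Section 2.3.2 can be sketched as a simple loop.
The Python fragment below is a minimal sketch and not an interface to “Talk Any”: the
speak function is a hypothetical stand-in for typing a word and pressing the Talk It!
button, and the rate values and the number of repetitions are assumptions chosen only
for illustration.

```python
def build_drill_plan(words, difficult, normal_rate=1.0, slow_rate=0.7, extra_repeats=3):
    """Return a (word, rate, repetitions) entry for each word in the exercise."""
    plan = []
    for word in words:
        if word in difficult:
            # Difficult words are played more slowly and repeated more often.
            plan.append((word, slow_rate, 1 + extra_repeats))
        else:
            plan.append((word, normal_rate, 1))
    return plan


def run_drill(plan, speak):
    """Call speak(word, rate) once for every planned repetition."""
    for word, rate, repetitions in plan:
        for _ in range(repetitions):
            speak(word, rate)


def print_speaker(word, rate):
    """Hypothetical stand-in for the speech machine: prints instead of speaking."""
    print(f"[rate {rate}] {word}")


plan = build_drill_plan(["leave", "live"], difficult={"live"})
run_drill(plan, print_speaker)
```

In this sketch, ‘leave’ is played once at the normal rate, while ‘live’, marked as
difficult, is played four times at the slower rate so that the students can imitate it.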
According to Goodwyn (1992:105), one of the richest areas of development is the
increasingly dynamic interface of media technology for the English teacher. One
text to speech program with a friendly and fun interface that can be used in
teaching pronunciation is “Talk Any”. Friendly and fun mean that it lets us
listen to words or texts that we type or paste from the clipboard, and with just
one click it reads them back in natural-sounding speech.

2.4.1 The Advantages and Disadvantages of Using “Talk Any” Software

As we know, everything has its own strengths and weaknesses. “Talk Any” has some
strengths that make it useful for teaching and learning:
1. It does not need a high computer specification, and the file size is small, only
380 KB, so it does not need much disk space.
2. It can open *.txt format files from the File menu, or text can simply be pasted
from the clipboard, so that the machine reads it for us.
3. It is free software that can be downloaded at Http://talk-
4. The user can create an unlimited number of vocal qualities/personalities and
control the speed and quality of the speech, so that the listener can imitate
what the machine says.
5. Finally, it is user friendly: it is easy to use and learn, especially
for pronunciation practice.
Besides those strengths, the software has some weaknesses that
we should know:
1. No choice of English accent, such as American or British English, is provided.
The default accent is American English.
2. There is no sound recording option for comparing the machine’s voice with the
user’s.
3. There is no phonetic transcription or translation of words.

Considering the strengths and weaknesses above, the software was easy for the
teacher to run. It assisted the teacher in modelling pronunciation by giving
clear examples of voices with speed control. It was clear that this software
could be used as a medium to assist the teacher in teaching pronunciation, and it
could also be used to teach listening. Thus, “Talk Any” software is able to present
material attractively, and the students could improve their pronunciation ability in
English class.

2.4.2 The Comparison with Other Related Software

There are many kinds of pronunciation software, and some are
integrated with a dictionary. Examples are the Encarta, Longman, and Cambridge
Learner’s dictionaries. These three programs have several features:
pronunciation practice (record and listen to your own pronunciation), sound
recordings (of both British and American native English speakers), smart thesaurus
(find the right word for any situation), quick find (look up words quickly while
using other programs), study pages (explore different areas of English), and
practice (what have you learned). The three dictionary programs focus mainly on the
meaning and description of words; they have a feature that can record our voice,
but the recording cannot be saved as a file. In this research, the researcher
prefers to use Talk Any rather than the three dictionary programs above, although
it does not have a recording feature. The strength of Talk Any over the other
software is that it can pronounce several words at a time, so it suits word pair
drilling.

2.6 The Effect of “Talk Any” as the Media on the Students’ Pronunciation
Pronunciation is one of the English components that should be taught
seriously. To speak or communicate fluently, students should have a good mastery
of pronunciation. To ease the teaching of pronunciation, software is used as a
medium in the teaching. Sigafoos (2007:115) states that computer technology,
which is growing more sophisticated, also has very important roles in the teaching
and learning process at various levels. It means that educational software can help
educators develop students’ motivation and enthusiasm in the learning process. This
is supported by Khalifa et al. (2000:2), who say that the appropriate way to use the
computer is as a tool for learning, with software as a support for educational
programs.
Based on the statements above, computer software, in this case “Talk Any”, is
used as a medium to help the teacher and the students master pronunciation more
easily, to encourage students to imitate the voice of a native speaker, and to
motivate them, in the hope that the students will enjoy and take more interest in
practicing pronunciation for good communication in English.

2.7 Alternative Hypothesis

Based on the theory above, the hypothesis of this research can be formulated
as follows: there is a significant effect of the use of “Talk Any” software as a
medium in drilling the seventh grade students’ pronunciation ability at SMPN 5
Tanggul Jember in the 2010/2011 academic year.