





Name : Mr. Therdpong Daengsi
Thesis Title : VoIP Quality Measurement: Recommendation of MOS and
Enhanced Objective Measurement Method for Standard Thai
Spoken Language
Major Field : Information Technology
King Mongkut’s University of Technology North Bangkok
Thesis Advisor : Assistant Professor Dr. Saowanit Sukparungsee
Co-Advisor : Dr. Apiruck Preechayasomboon
Co-Advisor : Dr. Chai Wutiwiwatchai
Academic Year : 2012

Voice over IP (VoIP), a modern form of telecommunications, requires real-time
transmission. However, there are limitations (e.g., packet loss and delay) which result
in degradation in quality of voice transmission over an IP network. ITU-T, a sector of
the International Telecommunication Union, has issued recommendations for VoIP
quality based on a Mean Opinion Score (MOS) derived primarily from studies on
European languages. ITU-T has acknowledged that there is an issue related to
dependence on language/culture/nationality and the Quality of Experience (QoE) of
multimedia, including MOS. Because ITU-T standards have yet to be adopted for
measurement of voice quality for tonal languages (e.g., Thai), the research in this
thesis has four aims. The first aim is to carry out a detailed assessment of
subjective voice quality for the standard spoken Thai language and for native Thai
speakers in a Thai environment. The second is to compare the voice quality perceived
by Thai users of four VoIP codecs: G.729, G.711A-law, G.722 and G.723.1 (at 5.3 kbps).
Thai users perceived no significant difference between the first three codecs, with a
slight preference for G.722 over G.711A-law over G.729, but they rated G.723.1 as
having significantly poorer voice quality than the others. G.729 is therefore
recommended as the best choice, because it has a MOS in the range 4.13-4.18 but
requires lower bandwidth than G.722 or G.711A-law. The third aim is to propose
acceptable voice quality standards for VoIP services for the Thai language and for
native speakers of Thai. On average, Thai users expected VoIP quality equivalent to a
MOS of 3.41, i.e., fair quality. Finally, this thesis proposes two new models for the
Thai language and Thai users: a Thai subjective VoIP Quality Evaluation (ThaiVQE)
model and an Enhanced E-model (E2-model), an objective method. The ThaiVQE model
focuses on two major network parameters, packet loss and packet delay, with the
G.711A-law codec. The E2-model was obtained by including a subjective Thai bias
factor in the standard E-model. Both models gave significant improvements in accuracy
and reliability compared to the standard E-model, with an error reduction of more
than 20%. Therefore, both new models can support voice quality measurement in Thai
environments with high accuracy, reliability and confidence.
(Total 169 pages)

Keywords : VoIP, MOS, E-model, packet loss, packet delay, codec, Thai

______________________________________________________________ Advisor

Name : Mr. Therdpong Daengsi
Thesis Title : VoIP Quality Measurement: Recommendation of MOS and
Enhanced Objective Measurement Method for Standard Thai
Spoken Language
Major Field : Information Technology
Thesis Advisor : Assistant Professor Dr. Saowanit Sukparungsee
Co-Advisor : Dr. Apiruck Preechayasomboon
Co-Advisor : Dr. Chai Wutiwiwatchai
Academic Year : 2555 B.E. (2012)

Abstract
Voice over IP (VoIP) is a modern form of telecommunications that requires real-time
transmission, but it still has limitations (e.g., packet loss and packet delay) that
affect voice quality. ITU-T has proposed a method for measuring VoIP voice quality
using a scale called the Mean Opinion Score (MOS), and ITU-T is well aware that there
are issues relating language/culture/nationality to VoIP voice quality measurement.
Because ITU-T standards have not yet been defined for tonal languages, this research
was undertaken with the following objectives. The first is to study voice quality
assessment for Thai people and the standard spoken Thai language. The second is to
compare four codecs, namely G.729, G.711A-law, G.722 and G.723.1 (5.3 kbps). It was
found that G.729 gives voice quality that is not significantly different from
G.711A-law and G.722 but uses less bandwidth, while G.723.1 gives significantly lower
voice quality than the other codecs. The next objective is to find the voice quality
criterion that is acceptable to Thai people. It was found that a MOS of 3.41 is the
average that Thai people expect from VoIP. The final objective is to propose two new
models, ThaiVQE and the E2-model, the latter obtained by incorporating a Thai bias
factor. It was found that the error decreased by more than 20% compared with the
original E-model.
(Total 169 pages)

Keywords : VoIP, MOS, E-model, packet loss, packet delay, codec, Thai

______________________________________________________________ Thesis Advisor


Thank you to the more than one thousand people involved in this research, including
students, staff and lecturers; I am sorry that I cannot list all of your names here.
In particular, I thank my friends Mr. Chumpol Ngamphiw and Mr. Worasit Junsawang
for helping me to implement the VoIP testbed system; Mr. Nattawut Unwanatham for
conducting subjective tests with hundreds of subjects; Mr. Wiwat Suwanuntawong and
the Central Library Studio staff for kindly allowing me to use the studio for almost
two years; and Mr. Gary Sherriff, the international coordinator of the Faculty of
Information Technology, for editing and English support on over 15 papers, both
published and unpublished.
Thank you to the Graduate College, KMUTNB, the Faculty of Information
Technology, KMUTNB, and the Speech and Audio Laboratory, NECTEC, for partial
funding support.
Gratitude is due to my parents, particularly my mother, who came from our
hometown to help take care of my family, which gave me valuable time to
pursue my PhD. Thank you to my wife, who tries her best to understand me and is
always standing by my side with our two beautiful daughters.
My deepest gratitude goes to my advisors, Asst. Prof. Dr. Saowanit Sukparungsee,
Dr. Apiruck Preechayasomboon and Dr. Chai Wutiwiwatchai. Without your support
my dream would not have come true. Thanks to the thesis defense examination
committee members, Dr. Theerawat Piboongungon (chair), Dr. Maleerat
Sodanil and Dr. Elvin James Moore, for their very useful comments.
Finally, I would like to dedicate the contributions of this research to Dr. Gareth
Clayton, my advisor, who sadly passed away. The initial idea for this research,
concerning the activation of the brain by tones in tonal languages, came from our
lengthy discussions. We all miss you deeply.

Therdpong Daengsi


Abstract (in English)
Abstract (in Thai)
Acknowledgements
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Motivation of the Study
1.2 Purpose of the Study
1.3 Scope of the Study
1.4 Major Contributions
1.5 Thesis Structure
Chapter 2 Background and Literature Review
2.1 Thai Language and Thai Culture
2.2 Hearing and Language Ability
2.3 Voice over Internet Protocol
2.4 Voice Quality Measurement
2.5 Previous Research on Voice Quality Measurement
2.6 Selected Tools for Implementation of the Testbed VoIP System
2.7 Statistical and Mathematical Tools
Chapter 3 Methodology: Subjective and Objective Measurement
3.1 Phase I: Experimental Design
3.2 Phase II: Preparation
3.3 Phase III: Pilot Tests
3.4 Phase IV: Intensive Subjective Tests
3.5 Phase V: Objective Tests Using E-model
Chapter 4 Results of Analysis
4.1 Pilot Phase Results
4.2 Intensive Subjective Test Results
4.3 Objective Test Results
4.4 Analysis and Comparison
Chapter 5 E2-model and ThaiVQE
5.1 E2-model
5.2 Thai Subjective VoIP Quality Evaluation Mathematical Model
5.3 Model Comparison: Standard E-model, E2-model and ThaiVQE
5.4 ThaiVQE for G.729
Chapter 6 Discussion, Conclusion and Future Work
6.1 Discussion
6.2 Conclusion
6.3 Future Work
References
Appendix A Selected Thai Speech Samples for the Listening Opinion Tests
Appendix B Three Questionnaire Forms for All Subjective Tests (Thai)
Appendix E Selected Publications
Biography


Table
2-1 Thai initial consonant
2-2 Thai final consonant
2-3 Thai consonant cluster
2-4 Thai vowel
2-5 Thai tones
2-6 Example of Thai words with different tones
2-7 Some definitions of “culture”
2-8 Characteristics of “cultures”
2-9 Some characteristics of Thai culture versus Western culture
2-10 Familiar sounds and their loudness level in dB
2-11 Functions of main parts of the brain
2-12 Different function between LH and RH
2-13 VoIP QoS specification in Thailand
2-14 VoIP operators and VoIP numbers at present
2-15 Comparison of international call rates of traditional services and some PC-to-Phone VoIP services
2-16 Comparison of international call rates of some services of phone-to-phone via VoIP networks
2-17 Codecs and their properties
2-18 Comparison of H.323 versus SIP
2-19 General QoS controls
2-20 Comparison of header compression techniques
2-21 The statistics of the terms QoS vs QoE
2-22 ITU-T definition comparison of QoS vs QoE
2-23 Some definitions of QoE
2-24 Categories of VoIP User’s Experience and their quality expectation
2-25 Subjective measurement methods versus objective measurement methods
2-26 Scale of opinion scores and meaning
2-27 Definitions of subjective versus objective
2-28 The statistic of the results from IEEEXplore after using the keywords VoIP quality, packet delay, packet loss and jitter
2-29 The evidence for the importance of subjective measurement
2-30 Network parameters between two endpoints, in a telephone network
2-31 The relation among R-value, MOS-CQE and user satisfaction
2-32 The statistic of the results from IEEEXplore for search using the keywords VoIP, quality, PESQ and E-model
2-33 Example of previous works with subjective tests
2-34 Example of MOS from different language
2-35 Description of each Asterisk part based on its architecture
2-36 Important Asterisk files based on FreeBSD
2-37 Dummynet features
2-38 Comparison of t-tests and ANOVA for two and three groups respectively
3-1 Summary about the pilot tests
3-2 Summary about the conversation opinion tests, each scenario required at least 24 subjects (total 576 subjects)
3-3 Test scenarios
3-4 Comparison of imported values and properties of the modified room
3-5 Speech lists
3-6 The estimate numbers of subjects
4-1 Interview test results
4-2 Comparison of MOS-CQS from G.722, G.711 and G.729 referring to delay
4-3 Comparison of MOS-CQS from G.722, G.711 and G.729 referring to loss
4-4 MOS-CQS versus MOS-CQE from G.711 referring to loss and delay effects after validating
4-5 Numbers of subjects, total 400 subjects referring on the tests with G.711
4-6 Statistic from the survey of the mean expectation score of voice quality based on 5-point scale from 828 Thai users
4-7 Hypotheses
4-8 Hypothesis analysis results with 95% CI
4-9 Comparison of MOS from Thai users and users from different languages and cultures
5-1 MOS-CQS* from ThaiVQE model referring to packet delay and packet loss effects for G.711 only
5-2 Test set information
5-3 E2-model and ThaiVQE evaluation results using comparison with the test set
5-4 The difference of MOS-CQS provided by G.711 and G.729
5-5 Estimation of MOS-CQS* from ThaiVQE model referring to packet delay and packet loss effects for G.729
6-1 Summary of numbers of subjects
6-2 Subjective MOS from Thai users
A-1 Selected TSST Speech Samples for the Listening Opinion Test


Figure
2-1 A map of Thailand
2-2 The example of fundamental frequency (F0) contours of five Thai tones
2-3 Comparison of Hofstede’s Score of National Culture in five dimensions
2-4 The auditory pathway
2-5 Superior view of the brain shows left and right hemispheres
2-6 Lateral view of the brain (LH)
2-7 Working of the brain
2-8 Schematic representation of MMN regions of interest
2-9 Potential maps of electric MMN response
2-10 Comparison of English brain, Chinese brain and Thai brain
2-11 The PET scanning results from the PET study of tone perception
2-12 The vocal organs of human
2-13 Business perspective to IP telephony
2-14 Overview of 4G Network
2-15 NGN layers
2-16 NGN subsystem architecture
2-17 VoIP architecture overview
2-18 H.323 architecture overview
2-19 H.323 protocol suite
2-20 SIP architecture overview
2-21 SIP protocol suite
2-22 A comparison of message flows between H.323 and SIP
2-23 Overview of CAC mechanisms
2-24 Voice packet compression overview
2-25 Comparison of QoS and QoE: test points
2-26 Position of QoE and QoS for VoIP
2-27 VoIP QoE: Good QoE perceived by User A vs poor QoE perceived by User B
2-28 Influence factors for voice quality
2-29 Voice quality measurement concept
2-30 The result from search using the keyword ‘VoIP quality’
2-31 Relation of R-value from E-model vs delay
2-32 Example of random and bursty packet loss
2-33 Subjective Voice Quality Measurement Methods
2-34 Objective Voice Quality Measurement Methods
2-35 PESQ application guide
2-36 PESQ overview
2-37 Reference connection of E-model
2-38 The graph represents the relation of R-value and MOS-CQE
2-39 Chinese MOS versus Japanese MOS
2-40 The overall structure of the extended E-model
2-41 Asterisk architecture
2-42 The structure of a dummynet “pipe” with configurable parameters
2-43 Model fitting process overview
3-1 The methodology for objective measurement method enhancement using Thai bias factor
3-2 Top view of the plan of the studio room
3-3 The background noise was checked before testing
3-4 Example of the window sill
3-5 Example of room floor
3-6 Diagram of the VoIP testbed system
3-7 The real VoIP testbed system
3-8 The overview of VoIP system for the test
3-9 The captured screen shot from investigation of packet delay and packet loss before testing
3-10 An IP phone and facilities for the pilot test
3-11 A speech list that starts with the speech by Child1
3-12 Overview of the interview tests
3-13 Overview of the test system over an IP network
3-14 Example of random shapes
3-15 The overview of the conversation opinion tests in this research
3-16 Diagram of E-model measurement
4-1 The ACR-listening test results
4-2 MOS-LQS of G.711 vs G.729 from different types of voices
4-3 Comparison of percent of the votes: G.711 vs G.722 at 64 kbps
4-4 The MOS-LQO results of 4 lists of Thai speech and American-English speech
4-5 Comparison of G.722, G.711 and G.729 referring to delay effects
4-6 Comparison of G.722, G.711 and G.729 referring to loss effects
4-7 Representing of MOS-CQS versus MOS-CQE
5-1 Overview of finding Thai bias factor
5-2 Overview of the proposed E2-model with Thai bias factor
5-3 Overview of development of ThaiVQE model
5-4 The surface chart of the MOS-CQS provided by G.711 for packet loss of 0-10% and packet delay of 0-0.8 s

In today’s world of rapidly changing technology, information and
communication technologies are at the forefront of change. Before the emergence
of the Internet, data and voice calls ran over different routes in the network. Now,
more than 15 years later, as a result of modern technology convergence, both data and
voice calls can run together over the same routes through the Internet. However, the
nature of a voice call differs from that of data transmission. Most data sent
over networks does not require real-time support from the network, but a voice call
does. Moreover, voice calls cannot tolerate problems that often occur over a
packet-based network such as the Internet. Problems such as packet delay, packet loss,
jitter and echo cause major difficulties for a real-time application such as VoIP (Voice
over Internet Protocol), and they inevitably affect the voice call quality perceived
by a listener.
This study focuses on VoIP quality for the standard Thai spoken language. Thai
is a tonal language that is used by most of the approximately 65 million people in
Thailand. The motivation, purpose, and scope of this study are as follows:

1.1 Motivation of the Study

VoIP is a modern telecommunication technology, beyond the legacy telephone
technology. As described in [1], VoIP technology is a combination of speech (or
voice) communication technology and data communication (computer) technology. In
the past, voice traffic flowed in public switched telephone networks (PSTNs),
whereas data traffic flowed in data communication networks. With the development
of the Internet, the two technologies have been merged together and the data
communication network has been used to carry voice over IP networks. The use of
VoIP has expanded rapidly due to the higher cost of calls on PSTNs and because
bandwidth on the data network was available.
As reported in [1], VoIP is now being widely used in Thailand by private users,
companies, universities and government departments. Thai people are using VoIP
applications (e.g. Skype) to chat with friends both within Thailand and internationally.
Some companies in the private sector are using IP telephony systems in some parts of
their organizations [2-3]. Universities in Thailand that have implemented their own
VoIP systems include Prince of Songkla University, Maejo University,
Pibulsongkram Rajabhat University, Burapha University, and Rajamangala University
of Technology Lanna [4-9]. The Thai Customs Department of the Ministry of
Finance has implemented its own VoIP system with over 2,000 IP phones covering
over 70 customs houses nationwide with the expectation of reducing telephone costs
by at least 30% [10]. Thai telephone operators such as TOT, True and CAT, have
already introduced VoIP services, particularly, for international call services.
Examples of these services are TOTnetcall and 008 by TOT, NetTalk by True and
009 by CAT [11-14]. The VoIP services in Thailand provided by these operators are
controlled by the National Broadcasting and Telecommunications Commission
(NBTC), which functions as the regulator. NBTC has adopted a numbering plan that
reserves the numbers 061-xxx-xxxx, 065x-xxx-xxxx and 067-xxx-xxxx to 068-xxx-
xxxx for VoIP [15].
Although it has many advantages, VoIP also has some important limitations.
Voice quality can be degraded by packet loss and packet delay in IP networks.
According to [16], people can perceive voice delay if packet delay is more than
150 ms, and they may want to stop the conversation if packet delay is more than
400 ms. Further, packet loss of more than 1% is usually noticed by users, and packet
loss of more than 3% may not be acceptable to them [16]. These voice quality problems
can be reduced by the use of high-quality codecs. For a Local Area Network (LAN), the
G.711 codec is generally used, although G.722, which is intended to provide better
voice quality, might also be used. For a Wide Area Network (WAN), the G.729 codec is
generally used, but G.723.1 has also been used as an option. According to the
information presented in [16], the voice quality of G.711 and G.722, as measured by
the mean opinion score (MOS), is about the same (~4.1), while the voice quality of
G.729 and G.723.1 is lower, with values of 3.92 and 3.6-3.8 respectively. These
results were presumably based on English, which is a non-tonal language. It is not
clear whether the same results would also apply to a tonal language such as Thai.
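The delay and loss thresholds quoted from [16] can be summarised in a short sketch. Only the numeric thresholds (150 ms, 400 ms, 1% and 3%) come from the text above; the function names and the category labels are illustrative assumptions and are not taken from [16] or from this thesis.

```python
# Illustrative sketch only. The thresholds are those quoted from [16];
# the function names and labels are assumptions made for this example.

def classify_delay(delay_ms: float) -> str:
    """Map a packet delay in milliseconds to a coarse perception label."""
    if delay_ms > 400:
        return "users may want to stop the conversation"
    if delay_ms > 150:
        return "delay is perceptible"
    return "delay is not perceptible"


def classify_loss(loss_percent: float) -> str:
    """Map a packet loss rate in percent to a coarse perception label."""
    if loss_percent > 3:
        return "loss may not be acceptable"
    if loss_percent > 1:
        return "loss is usually noticed"
    return "loss is usually not noticed"


if __name__ == "__main__":
    for delay_ms, loss_percent in [(100, 0.5), (200, 2.0), (450, 5.0)]:
        print(delay_ms, "ms:", classify_delay(delay_ms))
        print(loss_percent, "%:", classify_loss(loss_percent))
```

The boundaries are treated as strict ("more than"), matching the wording of the paragraph above.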
The study of VoIP quality is related to voice quality measurement. This is a
new discipline in the theory of telecommunication networks, and particularly of the
Internet [17]. However, most studies of VoIP quality were conducted for
English, which is a non-tonal language. The tonal feature of the Thai language is very
important as it can change the meaning of words. For example, the Thai word “ไกล”
(klāj) (middle tone) means “far” but the Thai word “ใกล้” (klâj) (falling tone) means
“near”, whereas the Thai word “ม้า” (máː) (high tone) means “a horse” but the Thai
word “หมา” (mǎː) (rising tone) means “a dog” [18]. There is therefore a need to
conduct research on voice quality measurement of VoIP based on Thai subjects.
ITU-T has published recommendations for objective and subjective measurements of
voice quality. These recommendations were presumably based on studies of non-
tonal languages, such as English. One aim of this research is to develop similar
recommendations for Thai. In particular, an aim is to develop methods of modifying
or improving objective measurement using subjective MOS (e.g. MOS-CQS) from
subjective tests with Thai speakers of the standard Thai spoken language.
In the future, when voice quality becomes one of the indicators of the quality of
life of Thai people, voice quality measurement will be necessary because it can be
important for VoIP stake-holders, which consist of:
1) VoIP regulator
2) VoIP service providers
3) VoIP system owners
4) VoIP system developers/implementers
5) VoIP users or consumers.
The Thai VoIP regulator (NBTC) may issue a license with conditions to
guarantee that VoIP service providers provide acceptable voice quality for VoIP, and
the VoIP service providers will then find it necessary to measure and control their
services to meet the conditions of the regulator. Further, when a new VoIP system
is installed by system developers/implementers for system owners, the owners will
usually require a voice quality measurement before accepting the new system. Also,
if VoIP service providers can announce their level of voice quality or MOS, then
VoIP users or consumers will be able to select VoIP services from the service
provider who provides the best VoIP quality. Therefore, all Thai VoIP stake-holders
can be expected to obtain benefits from this research into VoIP quality for Thai users
and the Thai spoken language.

1.2 Purpose of the Study

The purpose of this research is to study VoIP quality measurement with Thai
users in the Thai environment as follows:
1.2.1 To study both subjective and objective measurements of VoIP in the Thai
environment, and then compare the results of the two measurements.
1.2.2 To make recommendations of the MOS for Thai users who use the
standard Thai spoken language.
1.2.3 To enhance an objective measurement method for measuring voice quality
with a factor called the Thai bias factor obtained from a comparison of objective and
subjective measurements. The enhanced method will assist in the development of
voice quality measurements of higher accuracy and reliability so that users will have
increased confidence in the results of VoIP quality measurement for Thai speakers of
the standard Thai spoken language.
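The idea in 1.2.3 of deriving a bias factor from a comparison of objective and subjective scores can be sketched as follows. The R-to-MOS mapping is the one published in ITU-T Recommendation G.107 for the E-model. Everything else is an assumption made purely for illustration: the bias is taken here as the mean signed difference between paired subjective and objective MOS values and is applied additively on the MOS scale. The formulation actually used in this thesis is developed in Chapter 5 and may differ, and the numbers in the usage lines are placeholders, not measured results.

```python
from statistics import mean


def r_to_mos(r: float) -> float:
    """Convert an E-model rating factor R to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6
    # Added guard: the cubic dips marginally below 1.0 for very small R.
    return max(1.0, mos)


def estimate_bias(subjective_mos, objective_mos) -> float:
    """Assumed definition: mean of (subjective - objective) over paired conditions."""
    if not subjective_mos or len(subjective_mos) != len(objective_mos):
        raise ValueError("need two non-empty sequences of equal length")
    return mean(s - o for s, o in zip(subjective_mos, objective_mos))


def apply_bias(objective_mos: float, bias: float) -> float:
    """Shift an objective MOS by the bias, kept inside the 1-5 opinion scale."""
    return min(5.0, max(1.0, objective_mos + bias))


if __name__ == "__main__":
    # Placeholder values for illustration only, not data from this thesis.
    subjective = [4.0, 3.0]
    objective = [3.5, 2.5]
    bias = estimate_bias(subjective, objective)
    print(bias, apply_bias(r_to_mos(80.0), bias))
```

An additive shift is the simplest possible correction. A bias that varies with network conditions would need a fitted model rather than a single constant.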

1.3 Scope of the Study

The following list is the scope of this VoIP quality measurement study of the
Thai spoken language in a Thai environment:
1.3.1 This research covered listening and conversational tests of subjective
measurement methods.
1.3.2 This research studied the two objective measurement methods PESQ and
E-model and then selected the E-model for enhancement for Thai speakers.
1.3.3 For subjective measurement methods, the tests were conducted in
soundproof rooms with controlled conditions following ITU-T recommendations.
1.3.4 The recorded speech samples consisted of standard Thai spoken language
and did not include any dialects.
1.3.5 The recorded speech samples for listening tests were taken from two
children, four male and four female speakers.
1.3.6 Asterisk was selected as the only IP telephony system to be tested in this
thesis. IP phones and other IP network equipment were only used as tools for this
research.

1.4 Major Contributions

There are four major contributions from this research as follows:
1.4.1 Subjective MOS for Thai users have been determined for narrow band
codecs, consisting of G.711A-law, G.729 and G.723.1 at 5.3 kbps, and one wideband
codec G.722. These results can be the benchmark based on Thai users for voice
quality measurement in Thai environments.

1.4.2 The Mean Expectation Score (MES), which is equivalent to MOS, has been
determined from Thai users. The MES could therefore be recommended as the baseline
for providing voice quality for VoIP services in Thailand.
1.4.3 An Enhanced E-model (E2-model) based on Thai users in Thai
environments has been developed. This model is suitable for covering the Thai tonal
language and Thai culture, particularly the characteristics of Thai people and the way
they respond to situations. The E2-model should increase overall accuracy, reliability
and confidence in measurements of VoIP quality.
1.4.4 The Thai Subjective VoIP Quality Evaluation mathematical model
(ThaiVQE) has been developed as an option for VoIP quality measurements. This
model can be applied in Thailand with high accuracy, reliability and confidence,
without high cost.

1.5 Thesis Structure

The rest of this thesis consists of Chapters 2 to 6. Each chapter can be briefly
described as follows:
Chapter 2 presents the related background and a literature review. Firstly, the
Thai language and culture are presented in Section 2.1. It starts with the Thai
language and the Thai sound system, before presenting Thai culture, particularly Thai
characteristics and the way Thai people respond to situations. The discussion includes
a comparison of Thai culture and Western culture based on Hofstede’s cultural
dimensions. Facts about the Thai population based on income are presented briefly.
Section 2.2 presents information about hearing and language ability, which mainly
relates to the human brain; a brief explanation of the human brain and the Thai brain
is therefore given. The first subsection describes the ear and hearing, followed by a
presentation of the brain and language ability. Section 2.3 starts with a discussion
of the basics of VoIP. Then, it presents an overview of VoIP services in Thailand
before giving an overview of VoIP architecture, codecs, signaling protocols,
mechanisms for quality of service, header compression and security respectively.
Section 2.4 summarizes the background information that is most important for voice
quality measurement. This section contains five subsections, consisting of quality of
experience, voice quality measurement overview, network factors (e.g. packet loss and
packet delay), subjective measurement methods (e.g. listening-opinion tests,
conversation-opinion tests and interview tests), and objective measurement methods
based on ITU-T standards (e.g. PESQ and E-model). Section 2.5 gives a review of
previous VoIP research that uses subjective methods to measure voice quality, a review
of VoIP measurements on Thai and other languages, and a review of some factors
suggested for E-model enhancement. The last two sections cover tools. Section 2.6
discusses tools for implementation of a testbed VoIP system, including a network
emulator and VoIP software. Section 2.7 discusses statistical and mathematical tools,
including hypothesis testing, model fitting and model evaluation.
Chapter 3 covers the experimental design and methodology of this research.
Section 3.1.1 discusses subjective test design and includes the codecs used, the
methods of selecting the required subjects, and the conditions for the subjective tests.
Section 3.1.2 briefly summarizes objective test design. Section 3.2 describes the
laboratory preparation and the VoIP testbed system implementation that consists of an
IP Telephony system and a network emulator. Section 3.3 discusses the pilot phase of
the experiments. Section 3.4 describes intensive subjective tests using conversation
opinion tests. Finally, Section 3.5 discusses objective tests based on the PESQ and
E-model test methods.
Chapter 4 presents results from all tests and gives an analysis and comparison of
the results. Sections 4.1-4.3 present a comparison of results of the subjective MOS
and the objective MOS tests described in Chapter 3. Section 4.4 uses the statistical
tools t-test and ANOVA for hypothesis tests about the perception of Thai subjects of
the codecs under test (e.g. G.729, G.711 and G.722) and for a statistical comparison
of MOS from Thai users and MOS from users in other countries.
Chapter 5 presents the main contribution of this research. Sections 5.1 and 5.2
describe the development of the enhanced E-model using the Thai bias factor, called
the E2-model, and the Thai subjective VoIP Quality Evaluation mathematical model
(ThaiVQE). The development is based on a model fitting technique. In Section 5.3
these two models are evaluated, analyzed and discussed using two simple
mathematical tools, Mean Absolute Error (MAE) and Mean Absolute
Percentage Error (MAPE), in order to see whether the two approaches can give higher
accuracy, reliability and confidence. The evaluation shows that the E2-model
and ThaiVQE can reduce error by more than 20% when compared with the
standard E-model. Section 5.4 presents an estimation of ThaiVQE for G.729. This
chapter also proposes a ThaiVQE table based on packet loss and delay that can be
used as a guideline for the use of G.711 and G.729 by Thai users in Thai
environments.
Chapter 6 gives a discussion, recommendations and conclusions on the results
of this research. The chapter concludes with suggestions for possible future work.
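The two evaluation measures named in the description of Chapter 5, MAE and MAPE, follow their standard definitions and can be sketched as below. The function names are illustrative assumptions. Computing error reduction as the relative decrease in error against a baseline model is also an assumption, because this section does not state how the thesis calculates its "more than 20%" figure.

```python
def mean_absolute_error(observed, predicted) -> float:
    """MAE: average of |observed - predicted| over paired values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(o - p) for o, p in zip(observed, predicted))
    return total / len(observed)


def mean_absolute_percentage_error(observed, predicted) -> float:
    """MAPE in percent: average of |observed - predicted| / |observed|."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def error_reduction_percent(baseline_error: float, new_error: float) -> float:
    """Assumed definition: relative decrease of an error measure, in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Placeholder values for illustration only, not data from this thesis.
    subjective_mos = [4.0, 2.0]
    model_mos = [3.0, 2.5]
    print(mean_absolute_error(subjective_mos, model_mos))
    print(mean_absolute_percentage_error(subjective_mos, model_mos))
```

MAPE divides by the observed value, which is safe for MOS because the scale starts at 1 and never reaches zero.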

2.1 Thai Language and Thai Culture

Before conducting this study, it is important to understand several related
areas of interest, as follows:
2.1.1 Overview of Thai Language [19-21]
Thailand is a nation in Southeast Asia, as shown in Figure 2-1 [21]. Many Thai
dialects are spoken in various parts of Thailand. For example, Northern Thai or Kam
Muang is used in the north, Southern Thai is used in the south and Lao or Northeastern
Thai is used in the northeast. Furthermore, there are some Thai dialects which are
only used in small parts of Thailand, for example,
Phuthai is used in some areas of Nakhon Phanom, Sakon Nakhon, Mukdahan and

FIGURE 2-1 A map of Thailand


However, the formal variety that all Thai people understand is Thai, which may be
called standard or official Thai. It is used in official places such as schools,
universities, hospitals, police stations and many organizations. It is also used for
news broadcasting over TV and radio. Thai is in the Tai language family, a subgroup
of Kadai or Kam-Tai which is under the Sino-Tibetan family. Thai is a tonal
language, similar to several languages in Asia, such as Mandarin Chinese and
Vietnamese, in Europe, such as Swedish and Norwegian, and in Africa, such as Igbo.
Thai is used by almost 70 million people at present, in Thailand and the surrounding
border areas. Basically, Thai has 44 consonants (42 in use and 2 obsolete), 15 basic
vowels and 4 tone markers. Like English, Thai text is written horizontally from left
to right. Unlike English, there is no space between words in the same sentence and
there are no explicit sentence markers. Vowels can be written before, after, below or
above the consonant. Combinations of a few consonant and vowel characters can make
compound vowels, called diphthongs. Compared to English, Thai grammar is simpler. The
basic word order is “Subject + Verb + Object”, but there are no articles, verb
conjugations, declensions, object pronouns or tenses.
2.1.2 Thai sound system [19, 22-24]
In the Thai sound system, each word basically consists of an initial consonant,
a vowel, a final consonant and a tone. Some background information about the Thai
sound system is presented as follows:
Initial consonant
There are 21 phonemes for the initial consonant in Thai, produced at
different points of articulation, as in Table 2-1.
Final consonant
There are 9 phonemes for the final consonant in Thai, produced at
different points of articulation, as in Table 2-2.
Consonant cluster
There are 6 possible phonemes for the first position and 3 possible phonemes for the
second position in a cluster. Of the 18 possible combinations, 12 forms of
consonant cluster occur in Thai, as in Table 2-3.
Vowel
The sound system of the Thai language has both monophthongs and
diphthongs, as in Table 2-4.
Tone
Thai has five tones: the middle tone (no tone marker), the low tone, the falling
tone, the high tone and the rising tone, as in Table 2-5 and Figure 2-2 [19, 22].
Tone, which concerns pitch variation, is a very important feature of Thai because
different tones result in different lexical words and meanings, as in Table 2-6.
There are many tonal languages used around the world but Thai
differs from them in many ways.

TABLE 2-1 Thai initial consonant

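Character | Phonetic symbol | Point of articulation
ป | p | Plosive
พ, ภ, ผ | pH | Plosive
บ | b | Plosive
ต, ฏ | t | Plosive
ท, ธ, ถ, ฐ, ฑ, ฒ | tH | Plosive
ด, ฎ, ฑ | d | Plosive
ก | k | Plosive
ข, ค, ฆ | kH | Plosive
อ | / | Plosive
ม | m | Nasal
น, ณ | n | Nasal
ง | N | Nasal
จ | tC |
ช, ฉ, ฌ | tCH |
ฟ, ฝ | f | Fricative
ซ, ส, ศ, ษ | s | Fricative
ห, ฮ | h | Fricative
ล, ฬ | l | Lateral
ร | r | Trill
ว | w |
ย, ญ | j |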

TABLE 2-2 Thai final consonant

Character Phonetic symbol Point of articulation

ป, ภ, บ, พ p
ด, ศ, จ, ช, ฐ, ฏ, ฒ, ถ, ท, ธ, ษ, ต t
ก, ข, ค, ฆ k
- /
ม, อํา m
น, ร, ญ, ณ, ล, ร n Nasal
ง N
ว, เ-า w
ย, ญ, ใ, ไ j

TABLE 2-3 Thai consonant cluster

Character Phonetic symbol Remark

ปร pr
ตร tr
กร kr

พร pHr
ทร tHr e.g., ทฤษฎี (tHri¤t sd$/ di#˘)

คร kHr

ปล pl
พล pHl
กล kl

คล kHl
กว kw
คว kHw

TABLE 2-4 Thai vowel


Short Long

Character Phonetic symbol Character Phonetic symbol

◌ิ i ◌ี i˘

เะ e เ e˘

แะ E แ E˘

◌ึ μ ◌ื μ˘

เ อะ F เอ F˘

ะ, ◌ั, ◌ำ, ใ, ไ, เ-า a า a˘
◌ุ u ◌ู u˘

โะ o โ o˘

เ าะ, อ, ◌อ็ ç อ ç˘

เ ◌ยี ะ ia เ ◌ยี i˘a


เ ◌ือะ μa เ ◌ือ μ˘a

◌ัวะ ua ◌ัว u˘a

TABLE 2-5 Thai tones

Tone mark Tone sound Phonetic symbol

None (สามัญ) Middle ‹

◌่ (ไม้เอก) Low ›
◌้ (ไม้โท) Falling fl
◌๊ (ไม้ตรี) High ¤
◌๋ (ไม้จัตวา) Rising ‡

FIGURE 2-2 The example of fundamental frequency (F0) contours of five Thai

TABLE 2-6 Example of Thai words with different tones

Thai Tone
No. | Mid | Low | Falling | High | Rising
1 | คา (kha‹˘) to be stuck | ข่า (kha›˘) galangal | ข้า/ค่า/ฆ่า (khafl˘) I or me/value/to kill | ค้า (kha¤˘) to trade | ขา (kha‡˘) a leg
2 | คาว (kha‹˘w) a bad odor | ข่าว (kha›˘w) news | ข้าว (khafl˘w) rice | ค้าว (kha¤˘w) a kind of freshwater fish | ขาว (kha‡˘w) white
3 | ฟา (fa‹˘) 4 note | ฝ่า (fa›˘) to violate | ฝ้า (fafl˘) scum | ฟ้า (fa'˘) sky | ฝา (fa‡˘) a lid
4 | ซอง (sç‹¢N) an envelope | ส่อง (sç›N) to shine | ซ่อง (sçflN) a bawdy house | ซ้อง (sç'˘N) to acclaim with one voice | สอง (sç‡¢N) two
5 | ไซ (sa‹j) a fish trap | ใส่ (sa›j) to put in/on | ไส้ (saflj) intestine | ไซ้ (sa'j) to preen | ใส/ไส (sa‡j) limpid/to plane

2.1.3 Thai Culture: Characteristics of Thais and the Way They Respond to Situations
Overview on Thai Culture [25-43]
In general, the meaning of the term “culture” is very wide; it covers history, religion, art, food and so on. For this research, its meaning is narrower because it focuses on the characteristics of Thai people and the way Thai people respond to situations.
From a survey, it has been stated in [26] that there are over 300 definitions of the term “culture”. Some of those definitions that are consistent with the context of this research are presented in Table 2-7, whereas the characteristics of culture are summarized in Table 2-8 [27-28, 31].

TABLE 2-7 Some definitions of “culture”

Author | Definition
Herkovits (1944) | The human-made part of the environment, an all-encompassing explanation leading to the notion that culture is
Geertz (1973) | It is characterized by systems of shared meanings and
Keesing (1974) | A system of knowledge enabling communication with others and interpretation of their behavior
Vongvipanond (1974) | A complex phenomenon, a sum total of behavior and belief of a society.
Hofstede (1984) | The collective programming of the mind which distinguishes the members of one human group from another.
Putnam and Cheney (1985) | A social unit’s collective sense of what reality is, what it means to be a member of a group, and how a member ought to
Triandis and Albert (1987) | Its critical attribute is reflecting “shared meanings, norms and values”.
Ruben (1988) | The complex combination of common symbols, knowledge, folklore conventions, language, information-processing patterns, rules, rituals, habits, life styles, and attitudes that link and give a common identity to a particular group of people at a particular point in time.
Komin (1991) | Society’s end product and generally refers to “the total patterns of values, ideas, beliefs, customs, practices, techniques, institutions, objects and artifacts, which make a society distinctive”.
Lustig and Koester | It is what is learned and shared by a large group of people, creating their perceptions about beliefs, values, and norms which reflect their behavior.

TABLE 2-8 Characteristics of “culture”

Culture characteristics

1. It is learned through the interaction among members of the culture

2. It provides for appropriate and acceptable behavior in the form of values, beliefs
and norms. It also identifies desirable behavior for the members.
3. It provides a means of organizing and classifying the environment in distinctive
ways. It also structures daily life.
4. It gives meaning and reality to one’s existence. It also provides a way for “seeing
a world”.
5. It is transmitted and passed on from generation to generation, giving consistency
and tradition to the group.
6. Its common code is language that is used in rituals, education, institutions,
politics, religion and myths, for example.

Thais in Thailand have their own culture, without the hybrid characteristics produced by Western colonization in other Asian countries or territories (e.g. Singapore, Malaysia, Vietnam, the Philippines, Macao and Hong Kong) that were conquered by Western countries. Thailand is the only Southeast Asian nation, and one of three nations in Asia together with Mainland China and Japan, that was never colonized by Western powers (e.g. Britain, France, Spain, Portugal and the USA) throughout the 19th century [29-30]. Therefore, Thais have preserved their own rich culture [29].
Thailand has an abundance of resources and is located near the equator, which provides a warm climate that benefits agriculture in many ways. Therefore, an agricultural spirit is embedded in Thai culture and can be seen in daily life and in the way Thais live. Thai culture is distinct from that of other societies. The characteristics of Thais [25, 27, 31-32] clearly show deep gratitude and generosity, honor for elders, and great respect for the king, the queen and the royal family. Thais are also kind, polite, easy-going, friendly, fun and carefree. Besides, Thais love peace and they are proud to be Thai.
Thais are flexible and situation-oriented rather than oriented toward ideology, principles, systems or laws [32, 42]. That means it is acceptable for Thais to adjust laws and principles to fit situations. This can also be seen from the frequent use of the phrase “ไม่เป็นไร” (mA^j pe##n rA#j), which means “never mind” or “it does not matter”, and from the preference for compromise to resolve conflict [28, 32-33]. This phrase also expresses an easy-going feeling about life, flexibility and a high degree of tolerance.
Normally, a Thai greets another by placing the hands together and raising them to the face, which is called “ไหว้” (wa^j), while saying “สวัสดี” (sa $wa$t di#˘) and smiling. In general, the person who is junior, younger or of lower social status starts the “ไหว้” to the person who is senior, elder or of higher social status, because Thais are taught to respect seniors, elders, superiors, teachers and parents.

Thais always smile, which can be taken as a sign of sincerity. It has become one of the characteristics of Thais; therefore, Thailand is referred to by foreigners as the “Land of Smiles”, and this has become a part of the good image of Thailand [34-35]. Knutson [36] stated that it is not just a slogan but a reality. There are various meanings of a smile in Thai society, for example, to excuse and pardon non-serious mistakes, to thank someone without saying a word for a small service or favor, to show embarrassment, fear or remorse, or to avoid commenting on some issues [25, 37]. Further kinds of smile and their meanings can be found in [31].
As another example of Thai culture, each part of the body has a different level of honor [25]. The head is considered the most honorable part, whereas the foot is considered a lower part that is dirty. Therefore, touching another person's head or moving something for someone with the foot is generally impolite for Thais.
The national religion in Thailand is Buddhism; therefore, Thai culture is also known as “Thai Buddhist culture” [31]. The beliefs of this religion and the teachings from the “Tripitaka” have had a profound and pervasive influence on Thai culture since the Sukhothai era. Buddhism encourages individualism, youthful idealism, open-mindedness, non-violence and tolerance [34, 38-39]. For example, Thais can accept and endure losses or events in life because of the Buddhist concept of “karma”, which is believed to be a “road map” that explains how and why things happen in their life [32]. Good things in life are believed to be the consequences of good karma, while bad things, suffering and trouble are explained as the results of bad karma.
Nevertheless, in Thai culture, “face-saving” is very important and sensitive. It reflects a top concern for “ego”, which is of fundamental importance to Thais and is related to the avoidance of criticism, conflict and confrontation [25, 31, 39-40]. “Face” in Thai culture can be understood as dignity, reputation, honor, shyness, respectability, credibility and integrity. Face may be lost when, for example, a person's actions fail to meet the requirements of his/her social position, a person is laughed at or insulted by another, or a person feels embarrassed and looks foolish in front of others [37, 40-41]. Therefore, avoiding making someone lose face is essential [42].
However, on the negative side of Thai culture, there are issues of disorder, ignoring of rules, lack of discipline and non-assertiveness [32, 43].
Thai Culture versus Western Culture
“A different group of people has different ways of life, different ways to give
meaning to things, and different values and behaviors. Therefore, social or national
culture is dictated by the values, beliefs, behaviors, and norms which permeate their
members and are expressed through the words and behaviors of those members in
society [43]”.
MacDonald [44] stated that Western culture is unique compared to other cultures. Western culture has been influenced by the Catholic Church and Christianity. This culture evolved from a hunter-gatherer culture adapted to cold and cloudy environments, which are ecologically adverse climates, because Western countries are located far from the equator. People from Western culture are free to debate, argue, analyze and prove things reasonably and logically.
However, compared to Thai culture, there are many differences. The fundamental differences come from many factors, such as geography, ecology, climate, resources and beliefs. Many characteristics of both cultures are compared and highlighted in Table 2-9 [27, 31-32, 46-48], while the comparison based on Hofstede’s cultural dimensions is presented in Figure 2-3 [45].

TABLE 2-9 Some characteristics of Thai culture versus Western culture

Thai culture | Western culture
Younger culture: Developed from the Sukhothai era over 800 years ago. | Older culture: Developed from the Greek era over 2450 years ago.
Collectivism: Thais are group orientated. Evidence for this can be seen in Facebook statistics. | Individualism: This can be seen from the “I” basis of Americans, for example.
Extended family structure: With this structure, grandparents and grandchildren can be found living in the same house. | Simple family structure: This structure is based on a single married couple and their children.
Based on Buddhism: Thais believe both good and bad things in their life are a consequence of “karma”, which is a “being” orientation (e.g. Thais often say “Mai pen rai”, as described in 2.2.1). | Based on Christianity: Christianity influences Westerners to challenge the conditions under which they live, which is a “doing” orientation (e.g. a phrase that may often be heard from Americans is “Just do it”, which has already become a slogan of one American product).
Based on agriculture: Most of the Thai population is in the agriculture sector. | Based on hunter-gatherer culture: It is not based on agriculture because of ecologically adverse climates.
Compromising: Thais generally deal with conflicts or issues using compromise that may lead to corruption. | Confrontation: In general, Westerners emphasize winning and losing, like black and white.
High context: While talking, non-verbal messages (e.g. body language) are usually | Low context: When communicating, words mainly work by themselves.
Loose culture: There are fewer rules and norms and more tolerance for deviations. Besides, rules, values and perceptions are regulated/governed by beliefs. | Tight culture: There are strict rules and norms with little tolerance for individuals. Also, formal and legal rules regulate/govern behavior by social science theory.
Unpunctuality: Thais often go to meetings late. | Punctuality: It is valued by Americans.
Informal relations: In general, informal relations are | Formal relations: Relations are generally formal.

FIGURE 2-3 Comparison of Hofstede’s scores of national culture in five dimensions, consisting of Power Distance (PDI), Individualism (IDV), Masculinity (MAS), Uncertainty Avoidance (UAI) and Long-Term Orientation (LTO), between Thailand and the two major Western countries

In Figure 2-3, it can be seen that Thailand clearly differs from the two Western countries, the USA and the UK, in all of Hofstede’s cultural dimensions. Each dimension has been described in [49] and can be summarized as follows:
Firstly, Thailand shows a higher power distance score of 64, meaning Thais accept unequal power between leaders and followers in society or the workplace, whereas Westerners require more equality. Secondly, Thailand shows a lower individualism score of 20, meaning Thais tend more than Westerners to be members of a group in most areas of life. Thirdly, Thailand has a lower masculinity score of 34, which implies that Thais are less competitive and assertive. Next, Thailand has a higher uncertainty avoidance score of 64, which means Thais prefer not to change or to take risks. Lastly, Thailand has a higher long-term orientation score of 52, which means Thais plan far into the future.
Moreover, Knutson [29] reported that, in 1991, Komin identified some characteristics of Thai culture that are rare in Western culture, for example, caring and considerateness, contentedness, interdependence, mutual helpfulness and gratefulness.
To understand Thai culture more deeply, one can consider Thai terms such as “บุญ” (bu#n) and “บาป” (ba$˘p), “บุญคุณ” (bu#n kHu#n), “กตัญญู” (ka$ tHa#n ju#˘), “ญาติ” (ja^˘t), “ที่ต่ำที่สูง” (tHifl˘ ta$m tHifl˘ su&˘N), “ใจเย็น” (tCa#j je#n) or “ใจเย็นๆ” (tCa#j je#n je#n), “สนุก” (sa$ nu$k) and “เล่น” (le^˘n) [29-30].

2.1.4 Other Interesting Facts about Thais [50]

The Thai population has been divided into five groups based on income (group 1 has the lowest and group 5 the highest income). Only the highest income group (the 5th group, 16.1%) earns an average income per person per month of over 10,000 baht (about 22,000 baht), whereas the 1st group (the lowest), the 2nd group, the 3rd group and the 4th group earn an average income per person per month of about 1,900 baht, 3,500 baht, 5,200 baht and 8,100 baht respectively. From another point of view, 88.5% of the Thai population earns at most 15,000 baht per person per month. Therefore, from this information, it can be inferred that most Thai people may have a low standard of living and a low quality of life because of their low income.

2.2 Hearing and Language Ability

To conduct this research on voice over IP quality measurement, it is useful to understand crucial background information about the ear and the brain. The details are presented as follows:
2.2.1 Ear and Hearing
Nature of Sound Waves [51-53]
To understand the ear and hearing, the nature of sound waves should also be understood. Sound waves are mechanical waves: alternating pressure waves that consist of areas where the air molecules are compressed and areas where the air molecules are rarefied. Sounds are combinations of frequency and amplitude that change over time. The amplitude (loudness) of a sound corresponds to the difference in the density of air molecules between the areas of compression and rarefaction. Loudness can be measured as the sound pressure level (SPL), which is determined by Eq. (2-1) and expressed in decibels (dB), as follows:

L_p = 20 \log_{10}(P / P_0)    (2-1)
where
L_p is the sound pressure level (dB),
P is the root mean square (RMS) sound pressure (Pa),
P_0 is the reference RMS sound pressure, normally 20 \times 10^{-6} Pa.
Loud sounds of 130-140 dB can be painful and damage hearing abilities of the
ear. Table 2-10 presents some familiar sounds with their SPL [53-54].
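As an illustration only, Eq. (2-1) can be evaluated with a few lines of Python. This is a minimal sketch and not part of the thesis method: the function names `spl_db` and `pressure_from_spl` are hypothetical, and the only value taken from the text is the reference pressure of 20 x 10^-6 Pa.

```python
import math

# Reference RMS sound pressure from the text: 20 x 10^-6 Pa.
P0 = 20e-6


def spl_db(p_rms: float, p_ref: float = P0) -> float:
    """Sound pressure level in dB for an RMS pressure in Pa, following Eq. (2-1)."""
    if p_rms <= 0 or p_ref <= 0:
        raise ValueError("pressures must be positive")
    return 20.0 * math.log10(p_rms / p_ref)


def pressure_from_spl(level_db: float, p_ref: float = P0) -> float:
    """Inverse of Eq. (2-1): RMS pressure in Pa for a given level in dB."""
    return p_ref * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    # Illustrative pressures (hypothetical inputs, not measurements from the thesis).
    for p in (20e-6, 2e-3, 0.02, 2.0):
        print(f"{p:g} Pa -> {spl_db(p):.1f} dB")
```

Because the scale is logarithmic, every tenfold increase in RMS pressure adds 20 dB. An RMS pressure of 0.02 Pa, for instance, corresponds to 60 dB, which falls in the range listed for conversational speech in Table 2-10.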
Naturally, sounds are not a single frequency but a mixture of frequencies. High frequencies produce a high pitch, whereas low frequencies produce a low pitch.
In general, the sounds of speech cover frequencies of about 100-3,000 Hz, but the range of frequencies that the human ear can hear acutely is about 500-5,000 Hz (although the total range that the human ear can hear is, on average, 20-20,000 Hz).
Anatomy of the ear [52-54]
The human ear consists mainly of three parts: the external ear, the middle ear and the internal ear. The external (outer) ear includes the auricle (pinna), the external auditory canal (ear canal) and the eardrum (tympanic membrane). Its main function is to gather sound waves and conduct them through the eardrum into the next part, the middle ear. The middle ear includes three small bones (ossicles), called the hammer (malleus), anvil (incus) and stirrup (stapes) respectively (details may be seen in [52-54]). The function of the middle ear is mainly to amplify the sound waves received from the external ear before transmitting them to the last part, the internal ear. The internal (inner) ear is the most complicated part. It contains the structures that support hearing and equilibrium; however, only hearing is presented here. The main part of the internal ear for hearing is the cochlea, a spiral-shaped structure that contains the receptor cells (hair cells) for hearing. This part of the ear is connected to nerve VIII, called the vestibulocochlear nerve. The pressure waves from sound lead the receptor cells to produce receptor potentials, which generate nerve impulses in nerve VIII, which links to the brain, as in Figure 2-4 [53]. Finally, nerve impulses are transmitted to the primary auditory area of the cerebral cortex, which is located in the temporal lobe of the cerebrum.
2.2.2 Brain and Language Ability
Fundamentals and Functions of the Human Brain [53-56]
The brain consists of four main parts: the brain stem, cerebellum, diencephalon and cerebrum. The main functions of each part are presented in Table 2-11 [53-54]. However, from another point of view, the brain can be separated into the Right Hemisphere (RH) and the Left Hemisphere (LH) of the cerebrum, which is the biggest region of the brain, as in Figure 2-5 [55]. Although the brain seems symmetrical and both sides share the performance of functions, each hemisphere specializes in performing some functions, as in Table 2-12 [53]:

TABLE 2-10 Familiar sounds and their loudness level in dB

Sound Sound Pressure Level (dB)

Rustling leaves 15
Ticking watch 20
Whispered speech 30
Elevator music 40
Conversational speech 50-60
Alarm clock, Shouting 80
Live rock band 100
Jackhammer 110
Propeller airplane 120
Jet airplane 130

FIGURE 2-4 The auditory pathway

TABLE 2-11 Functions of main parts of the brain

Part of the brain: Main Function

Brain stem
- Eye movement
- Coordination of breathing and relay station
- Control of involuntary functions
- Arousal, sleep, muscle tone and pain modulation

Cerebellum
- Movement coordination by providing feedback to motor systems to ensure smooth movements of the eyes and body
- Motor coordination and balance

Diencephalon
- Integrating center and relay station
- Homeostasis and behavioral drives, which consists of activation of sympathetic nervous system, maintenance of body temperature, control of body osmolarity, control of reproductive functions, control of food intake, to influence behavior and emotions, to influence cardiovascular control center and to secrete trophic hormones.

Cerebrum
- Perception of sensory information
- Skeletal muscle movement
- Integration of information and direction of voluntary movement
- Emotion, including pleasure, pain, docility, affection, fear and
- Learning, memory and intelligence
- Personality traits

FIGURE 2-5 Superior view of the brain shows left and right hemispheres

TABLE 2-12 Different function between LH and RH

Left Hemisphere Functions
- Receives sensory signals from muscles on right side of body
- Controls muscles on right side of body
- Reasoning
- Numerical and scientific skills
- Ability to use and understand sign language
- Spoken and written language

Right Hemisphere Functions
- Receives sensory signals from muscles on left side of body
- Controls muscles on left side of body
- Musical and artistic awareness
- Space and pattern perception
- Recognition of faces and emotional content of facial expressions
- Generating emotional content of language
- Generating mental images to compare spatial
- Identifying and discriminating among odors

Language Ability of Human Brain [52, 54, 57]

It has been reported that 90% of the population are left-brain dominant (right-handed). Moreover, in most people, approximately 95% of the population, the areas of the brain that are responsible for language ability are found only in the left hemisphere of the cerebrum; this holds even for 70% of people who are right-brain dominant (left-handed) or ambidextrous. Language ability involves two areas in the cerebral cortex. Broca’s area, or the motor speech area of Broca, is in the frontal lobe and is concerned with speaking ability by controlling the production of speech. Wernicke’s area, or the sensory speech area of Wernicke, is a broad region in the temporal and parietal lobes and is concerned with language comprehension, interpreting the meaning of speech by translating words into thoughts, as in Figures 2-6 [54] and 2-7 [57].

FIGURE 2-6 Lateral view of the brain (LH) that shows important areas related to the
motor speech area of Broca and the sensory speech area of Wernicke

FIGURE 2-7 Working of the brain: when listening, speech is perceived and interpreted by the sensory speech area of Wernicke; the response is then produced through the motor speech area and the motor cortex respectively, in order to speak in reply

Tone Processing in Thai Brains

There are a few papers that directly investigated the perception of Thai lexical tones [58-65], as follows:
a) W. Sittiprapaporn et al. conducted experiments using EEG and Low-Resolution Brain Electromagnetic Tomography (LORETA) with the Mismatch Negativity (MMN) on 9 native Thai listeners with no knowledge of Chinese. They found that only the left hemisphere of the native Thai listeners was significantly activated when listening to the native condition (Thai speech, the mother language), whereas the right hemisphere was activated when listening to the non-native condition (Chinese speech) [58]. Besides, it was confirmed in [59] that part of the LH was predominant in the perception of the prosody of Thai speech sounds, while the perception of the prosody of Chinese speech sounds was dominated by part of the RH. Both [58] and [59] indicate that the left hemisphere predominantly responds to native speech sounds, as shown in Figure 2-8 [58]. Moreover, it was found that the response of the LH of native Thai listeners to a familiar Thai word was larger than the response to an unfamiliar word [60]. That study suggested that this may reflect the presence of a long-term memory trace for familiar Thai words, as shown in Figure 2-9 [60].
b) Gandour J. et al. found, in a study on discriminating pitch patterns in Thai words, that native Thai listeners showed activation in some parts of the LH when compared with native American English listeners who did not know Thai [61]. Similar results were obtained in a later work that presented Thai tones to groups of native Thai listeners and native Chinese and English listeners who did not know Thai, as in Figure 2-10 [62]. Also, several papers have revealed tone discrimination by the LH of native listeners of other tonal languages [63-65], for example, Mandarin Chinese, as shown in Figure 2-11 [63], and Norwegian. The results are consistent.



FIGURE 2-8 Schematic representation of MMN regions of interest when listening to (a) the native condition (see dark-gray area), in which the LH of the Thai brain was activated, and (b) the non-native condition, in which the LH of the Thai brain was not activated but the RH of the Thai brain was activated instead

(a) (b)

FIGURE 2-9 Potential maps of electric MMN response, for (a) Familiar word and
(b) Unfamiliar word

FIGURE 2-10 Comparison of English brain, Chinese brain and Thai brain, when
comparing the Thai lexical tone to the pitch task, only the Thai brain
showed significant activation in the FO near Broca’s area

(a) (b)

FIGURE 2-11 The PET scanning results from the PET study of tone perception that
shows different response on parts of (a) the left hemisphere of the
Mandarin Chinese group (b) and the right hemisphere of the English

2.3 Voice over Internet Protocol

2.3.1 Understanding the Basics
Voice versus Speech
Speech is the most natural form of human communication. It is mainly produced by precisely coordinated muscle actions in the head and neck, as shown in Figure 2-12 [66], controlled by the brain [67-68]. Normally, it consists of talk-spurts and silence gaps [69]. However, “speech” is a subset of “voice”: all speech is voice, but voice is not always produced as speech, because voice covers various sounds, including the laughing, crying and singing of adults and the babbling of infants [68]. Voice is generated by airflow from the lungs as the vocal folds are brought close together. The vocal folds in the larynx (or voice box) vibrate when air is pushed past them with enough pressure. Each individual's voice is as unique as a fingerprint. Besides, it can express the personality, mood and health of the speaker.
Therefore, the given definitions of the terms “speech” and “voice” may be a possible reason for the use of the term “VoIP (Voice over IP)” instead of “Speech over IP”.

FIGURE 2-12 The vocal organs of human

Definitions: VoIP versus IP Telephony [70-78]

VoIP is an emerging technology and an attractive form of communication in which the Internet Protocol is used, instead of traditional systems (e.g. typical analog telephone lines), to digitize the voice signal and convert it into voice packets that can travel over the Internet [70-71]. This allows voice traffic to be transmitted over the Internet as data traffic. Therefore, it is a way to carry voice calls over the Internet, including packetization of the voice streams [72]. VoIP mainly refers to a conversion and transportation process, and it does not add any value to the user experience [73]. For ITU-T, the term VoIP covers Voice over Broadband (VoB), Voice over Digital Subscriber Line (DSL), Voice over Internet, Voice over Wireless Local Area Network and Internet Telephony [74]. This term has been used to refer collectively to a large group of technologies designed to provide Internet-based communication/services. However, VoIP should refer only to the underlying transport protocol that packetizes voice traffic and audio streams and then allows them to be transported over data networks using the Internet Protocol [75]. From another perspective, VoIP is a service that offers cheaper voice calls by bypassing the public switched telephone network (PSTN) [76].
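The packetization step described above can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not an implementation of RTP or of any particular VoIP product: the 8 kHz sampling rate, the 20 ms packet interval, the one-byte-per-sample payload and the names `VoicePacket`, `packetize` and `depacketize` are all hypothetical choices made for the example, and the codec step is omitted.

```python
from dataclasses import dataclass
from typing import Iterator, List

SAMPLE_RATE_HZ = 8000  # assumed sampling rate (not stated in the text)
FRAME_MS = 20          # assumed packetization interval (not stated in the text)


@dataclass
class VoicePacket:
    seq: int        # sequence number, so a receiver can restore the order
    timestamp: int  # index of the first sample carried in this packet
    payload: bytes  # digitized voice, one byte per sample (codec step omitted)


def packetize(samples: bytes,
              sample_rate: int = SAMPLE_RATE_HZ,
              frame_ms: int = FRAME_MS) -> Iterator[VoicePacket]:
    """Split a digitized voice stream into fixed-duration voice packets."""
    samples_per_frame = sample_rate * frame_ms // 1000
    for seq, start in enumerate(range(0, len(samples), samples_per_frame)):
        yield VoicePacket(seq, start, samples[start:start + samples_per_frame])


def depacketize(packets: List[VoicePacket]) -> bytes:
    """Rebuild the voice stream at the receiver, tolerating reordered packets."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)


if __name__ == "__main__":
    one_second = bytes(range(256)) * 31 + bytes(64)  # 8000 dummy samples
    packets = list(packetize(one_second))
    print(len(packets), "packets of", len(packets[0].payload), "bytes")
```

Under these assumptions, one second of voice becomes 50 packets of 160 samples each. The sequence number in each packet is what allows a receiver to detect the packet loss and reordering that degrade voice quality on an IP network.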
IP Telephony is also commonly known as VoIP. However, IP Telephony is not VoIP, because the terms differ in some respects [78-79]. IP Telephony refers to the telephony applications that are enabled in the same IP environment, as well as the integration of these applications with mainstream business processes [74]. It is about the user experience and the business case. As in Figure 2-13 [73], IP Telephony goes beyond VoIP, whereas the IP network is only the foundation [74]. IP Telephony is more than simply VoIP because it refers to call processing and signaling technologies that provide end-to-end voice, data and video communications/services in the enterprise, based on the Internet Protocol family of standards. According to another definition, IP Telephony is an emerging set of technologies that enables voice, data and video collaboration over IP-based networks, including local and wide area networks. Generally, IP Telephony uses the ITU-T and IETF standards, H.323 and Session Initiation Protocol (SIP) respectively. An IP-PABX that is not a hybrid system is an example of an IP Telephony system used in enterprises/businesses [76].

FIGURE 2-13 Business perspective to IP telephony

Advantages and Limitations of VoIP [1, 79-83]

VoIP technology offers many advantages to users/consumers, enterprises/businesses and service providers. Users/consumers [79] are very satisfied with the lower costs compared to traditional telephone rates. Besides, some VoIP services/applications, such as Skype, Google Talk, TOT netcall and True Nettalk, can be used free of charge for PC-to-PC calls.
Businesses and organizations usually own their private networks [1, 80-81]. Therefore, if they deploy VoIP systems (e.g. IP PABX), they can gain more security and reliability while utilizing their data networks for voice traffic. It has also been reported that they can expect to reduce telephone costs by about 30-45%, which is one of the major benefits for businesses and organizations, while IT departments can assign network administrators to take care of both data systems and voice systems. Moreover, VoIP provides increased flexibility, location independence and improved productivity.
For service providers [74, 79], VoIP technology mainly helps to reduce investment and operating costs, for example, by reducing the installation of submarine cables or new fibers in the ground and by making it possible to offer services over a unified network. This also helps service providers to enhance innovation by offering new services to users/consumers over IP networks, with new business models such as flat-rate pricing. Moreover, it helps Internet Service Providers (ISPs) to enter the telecom market; on the other hand, it helps PSTNs to enter into the broadband
On the other side of the coin, several issues limit VoIP, for example voice
quality (presented in detail in Section 2.4), security and reliability. Regarding
security, including VoIP threats and vulnerabilities [82], it has been revealed
that DoS, which relates to availability, is the major attack, whereas 90% of
vulnerability issues were found during
implementation of VoIP system (the remaining was from configuration and VoIP
Reliability concerns the frequency of failures and the recovery time of a VoIP
system after a failure. This issue may become serious if power outages occur
frequently and take a long time to recover from, because a VoIP system does not
provide backup power to all IP phones, unlike a traditional PABX system, which
supplies power to legacy phones, both analog and digital [83].
As for other obstacles [79], VoIP can be seen as a threat to the revenues of
traditional PSTNs, particularly in markets that are less mature and monopolistic,
whereas regulation by the regulator or the telecommunication commission can
become an issue for new entrants.
2.3.2 Overview on VoIP Services in Thailand
The Regulator [84-94]
In 2011, the first committee of the National Broadcasting and
Telecommunications Commission (NBTC) of Thailand was approved [84-86]. An
important mission of this commission is to complete the auction of the long-awaited
licenses for the 3G service, after the 3G license auction problem became the talk
of the town in 2010 [87] (although the 3G service is already in use in Bangkok and
several provinces). 3G communication technology is very important for Thailand
because studies have found that 3G network investment dramatically affects
Thailand’s economy (e.g. by boosting the number of mobile phone and wireless
internet users), leading to a higher employment rate and GDP [88]. Besides, it is
a big step towards 4G communication technologies, which are currently used in some
cities in Scandinavia [89]. 4G, as shown in Figure 2-14 [90], can provide many
advantages using an IP-based core network, for example very high data transfer
rates and high capacity to support applications that require more resources,
including VoIP applications.

FIGURE 2-14 4G network overview

The main function of the NBTC, formerly the National Telecommunications
Commission (NTC), is to act as the regulator for broadcasting and telecommunication
in Thailand. Therefore, telecommunication operators who want to run a
telecommunication business, including VoIP services, must follow the NBTC
regulations. Several years ago, NTC put forward some regulations about
VoIP services, such as assigning VoIP telephone numbering and defining the QoS of
VoIP services for Thailand, as presented in Table 2-13 [15, 91].
Moreover, NTC has promoted the usage of VoIP services in Thailand, and the
communication technology of the future that NTC has selected to promote as the
replacement for PSTN is NGN [92]. Of course, NGN can support VoIP services, as
presented in Figures 2-15 and 2-16 [92, 94]. NTC has also collaborated with KMUTT
to create tools for VoIP system development using Asterisk, which is open source;
the result is called ANT (AsteriskNow for Thailand) [93].

TABLE 2-13 VoIP QoS specification

                                     Type of Use
QoS Parameters   PC-to-PC   PC-to-Phone   Phone-to-Phone (Excluded 06-xxxx-xx)   Phone-to-Phone (The Calling No.: 06-xxxx-xxx)
R-value Not defined > 50 > 80 > 80

 0.15  0.15  0.15

End-to-end delay Not defined < 400 ms < 100 ms < 150 ms
Call failure rate Not defined
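
To give a feel for what the R-value thresholds in Table 2-13 mean on the MOS scale, the following minimal sketch applies the R-to-MOS mapping published in ITU-T Recommendation G.107 (the E-model). The function name `r_to_mos` is illustrative only and does not come from the cited regulations; the thresholds 50 and 80 are the ones listed in the table.

```python
def r_to_mos(r: float) -> float:
    """Estimate MOS from an E-model R-value using the ITU-T G.107 mapping."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# R-value thresholds taken from Table 2-13
for threshold in (50, 80):
    print(f"R = {threshold}: estimated MOS = {r_to_mos(threshold):.3f}")
```

Under this mapping, an R-value of 80 corresponds to an estimated MOS of about 4.0, while an R-value of 50 corresponds to roughly 2.6.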

FIGURE 2-15 NGN layers

FIGURE 2-16 NGN subsystem architecture

VoIP Operators in Thailand

According to [95], seven VoIP operators in Thailand are currently licensed by
NBTC, as listed in Table 2-14 [96]. The major players in the VoIP market in
Thailand consist of CAT, AIN (a member of Shin Corporation), TIC (a member of True
Corp) and particularly TOT, which started using VoIP technology behind the scenes
for Y-TEL 1234 about 10 years ago [92]. Those operators provide many VoIP services.
Most VoIP services are certainly cheaper than the traditional services,
particularly PC-to-Phone VoIP services. For example, they save about 89-95% of the
cost of calling the USA and China, as shown in Tables 2-15 [97-100] and 2-16
[101-104]. There are other VoIP services, such as CAT PhoneNet, which provides a
PIN for use with a traditional/mobile phone at cheaper rates than CAT 001 [105].

TABLE 2-14 VoIP operators and VoIP numbers at present

Operators                          Quantity   VoIP_Number
MicomSystem Co.,Ltd. (MSC)         2,000      06 00000 000 – 06 00000 999
                                              06 00009 000 – 06 00009 999
True Internet Co.,Ltd. (TI)        15,000     06 00022 000 – 06 00036 999
TT&T Subscriber Service Co.,Ltd.   1,000      06 00052 000 – 06 00052 999
CAT Telecom Public (CAT)           10,000     06 00053 000 – 06 00062 999
Super Broadband Co.Ltd. (SBN)      10,000     06 00063 000 – 06 00072 999
                                   10,000     06 00073 000 – 06 00082 999
                                   2,000      06 00020 000 – 06 00021 999
TOT Public Co.,Ltd. (TOT)          1,000      06 00037 000 – 06 00037 999
                                   10,000     06 00040 000 – 06 00049 999
                                   2,000      06 00050 000 – 06 00051 999
ACeS Regional Service Co.,Ltd.     2,000      06 00017 000 – 06 00018 999

TABLE 2-15 Comparison of International call rates of Traditional services and some
PC-to-Phone VoIP services (rates might be changed depending on the
promotion of each operator, unit: Baht/minute)

                         Service Rate
Country   Traditional – CAT   CAT2call   TOT netcall   True NetTalk
Cambodia 20.00 3.00 2.00 3.00
Brunei 22.00 2.00-2.50 1.90-2.00 2.00
Myanmar 22.00 12.00 10.00 12.00
Philippines 20.00 6.00-7.00 5.50 6.00
Malaysia 9.00 0.91-2.00 1.00-1.60 1.00-3.00
Vietnam 20.00 3.00 2.10-2.20 2.50
Singapore 9.00 0.50 0.90-1.00 0.50-1.00
Laos 14.00 3.00 2.25 2.50
Indonesia 18.00 4.00-6.00 3.25-5.00 4.00-6.00
China 9.00 0.50 1.00 0.50-1.00
Japan 18.00 0.91-5.00 1.70-5.00 1.00-5.00
South Korea 18.00 0.91-3.00 1.30-2.00 1.00-3.00
UK 14.00 0.75-7.00 1.20-7.00 1.00-6.00
USA 9.00 0.50 1.00 0.50-1.00
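
The saving of about 89-95% quoted for calls to the USA and China can be checked directly against the rates in Table 2-15. The short sketch below does this arithmetic; the helper name `saving_percent` is illustrative only.

```python
def saving_percent(traditional_rate: float, voip_rate: float) -> float:
    """Percentage saved when paying voip_rate instead of traditional_rate."""
    return (traditional_rate - voip_rate) / traditional_rate * 100.0


# Rates in Baht/minute from Table 2-15 (same values for the USA and China):
# traditional CAT rate 9.00, PC-to-Phone VoIP rates between 0.50 and 1.00.
traditional = 9.00
for voip in (0.50, 1.00):
    print(f"VoIP rate {voip:.2f}: saving = {saving_percent(traditional, voip):.1f}%")
```

With a traditional rate of 9.00 Baht/minute, a VoIP rate of 1.00 gives a saving of about 88.9% and a rate of 0.50 gives about 94.4%, consistent with the range stated in the text.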

TABLE 2-16 Comparison of international call rates of some services of phone-to-
phone via VoIP networks (rates might be changed depending on
promotions of each operator, unit: Baht/minute)

Service Rate
CAT 009,
Country TOT 008 AIN 00500 TIC 00600
Cambodia 14.00 24.00 14.00 7.00
Brunei 5.00 5.00 5.00 2.00
Myanmar 14.00 20.00 14.00 12.00
Philippines 15.00 18.00 15.00 8.00-9.00
Malaysia 4.00 5.00 4.00 1.00-3.00
Vietnam 14.00 26.00 14.00 7.00
Singapore 3.00 7.00 3.00 1.00
Laos 4.00 6.00 5.00 4.00
Indonesia 5.00-8.00 7.00 5.00-8.00 4.00-7.00
China 3.00 5.00 3.00 1.00
Japan 5.00-6.00 5.00 6.00-7.00 1.00-5.00
South Korea 4.00-7.00 5.00 5.00-7.00 1.00-3.00
UK 6.00-7.00 7.00 6.00-7.00 1.00-6.00
USA 3.00 5.00 3.00 1.00

2.3.3 VoIP Architecture Overview [16]

VoIP is an Internet Protocol-based technology. Therefore, the original voice
signal must be converted into voice packets by a voice codec before being
transmitted into an IP network, as presented in Figure 2-17 [16]. To make VoIP work
and enable end users to talk to each other successfully, it also requires a VoIP
signaling protocol and QoS mechanisms to ensure that bandwidth is available to
transmit voice packets, because VoIP applications are real-time applications that
cannot tolerate the unreliability of IP networks. Security is also very important;
therefore, it is recommended to secure VoIP applications like other computer
systems/applications. Further information about each of the components is given
separately in the next few sections.
2.3.4 Codecs [16]
Codecs are the algorithms that enable VoIP systems/applications to carry voice
signals over IP networks [16]. There are many codecs, varying in complexity,
bandwidth consumption and voice quality. Each can be narrow band, wideband or
multimode. Normally, the more bandwidth a codec requires the better quality of
voice. Codecs that can be applied to VoIP applications are shown in Table 2-17 [16].
However, only four codecs have been applied in this research, consisting of
G.722, G.711, G.729 and G.723.1. Therefore, these four codecs have been described,
as follows:

FIGURE 2-17 VoIP architecture overview

G.722 [16, 106]

This is a wideband codec that supports audio bandwidth up to 7 kHz using 64 kbps,
the same rate as G.711. It is coded using Sub-Band Adaptive Differential Pulse Code
Modulation (SB-ADPCM). The frequency band is separated into higher and lower
sub-bands. There are three operation modes: 64 kbps for audio coding only, 56 kbps
for audio coding with an 8 kbps auxiliary data channel, and 48 kbps for audio
coding with a 16 kbps auxiliary data channel. In general, it is supposed to provide
better voice quality than narrowband codecs.
G.711 [16, 107-108]

It is a 64 kbps coding technique called Pulse Code Modulation (PCM). Subtypes of
G.711 are G.711μ-law and G.711A-law. The μ-law is mainly used in the USA, Canada
and Japan, whereas the A-law is used widely in Europe and the rest of the world,
including Thailand. This codec did not arrive with VoIP technology; it has been
used in ISDN since the 1990s. It is usually used in LANs, with a MOS of about 4.1,
and its LAN bandwidth requirement with 1-3 frames/packet, or 10-30 ms payload, is
about 75-100 kbps per call.
G.729 [108-110]
This codec is an 8 kbps coding technique called Conjugate Structure - Algebraic
Code-Excited Linear Prediction (CS-ACELP). It is usually used in WANs. Its WAN
bandwidth requirement with 1-3 frames/packet, or 10-30 ms payload, is about 20-45
kbps per call (depending on the type of WAN). Its MOS is 3.92. It is not designed
for music. Besides, it does not reliably support DTMF tones and cannot support fax
or modem. There is a total algorithmic delay of 15 ms. G.729 is the original codec
in its family and has several annexes, such as Annex A, called G.729A, which is the
reduced-complexity version of G.729, as well as, for example, G.729B, G.729D and
G.729E.
G.723.1 [108-109, 111]
There are two voice coding techniques: Multi-Pulse Maximum Likelihood
Quantization (MP-MLQ) at 6.3 kbps, which provides better voice quality than the
second technique, Algebraic Code-Excited Linear Prediction (ACELP) at 5.3 kbps.
There is a total algorithmic delay of 37.5 ms. Nevertheless, the voice quality is
acceptable because the MOS is 3.8 and 3.6 for the bitrates of 6.3 kbps and 5.3 kbps
respectively. Therefore, it is an option for use in WANs, where it requires
approximately 18-27 kbps of WAN bandwidth per call. It was created from about 18
patents from several organizations.
Codec Selection [16, 112]
This is an important issue for the implementation and administration of VoIP
systems/applications because it affects the voice quality perceived by VoIP users.
In Table 2-17 [16], the last column is MOS (Mean Opinion Score), the metric for
voice quality, which will be presented in detail in Chapter 5. A higher MOS means
better voice quality; however, it is a trade-off against higher bandwidth
consumption. Moreover, the use of a low bitrate, which can impact quality, might be
of concern to VoIP users.

TABLE 2-17 Codecs and their properties


Therefore, it has been recommended [112] that low bitrate codecs (e.g. G.729
and G.723.1) should be used over WANs, whereas high bitrate codecs (e.g. G.711 and
G.722) should be applied over LANs. However, codec selection may depend on the
codecs available in each VoIP system because some codecs are not free and may
require a license for use.
2.3.5 VoIP Signaling Protocols
The VoIP signaling protocol is the main part that enables the other components in
a VoIP system to work together. It sets up the connection session of a call between
endpoints that are already registered to the VoIP system. The functions of the
signaling protocol can be divided into four main functions: user location, which
discovers the location of the endpoint in order to establish a session; session
setup, which establishes the session parameters for the calling endpoint and the
called endpoint; session negotiation, which agrees the set of properties for the
session used by the endpoints in the call; and call management, which allows
endpoints to join or leave a session.
In the telecom market at present, H.323 and Session Initiation Protocol (SIP) are
the main players [16, 112]. However, there are other VoIP signaling protocols that
can be used as alternatives, such as Media Gateway Control Protocol (MGCP),
MeGaCo/H.248 protocol and Inter-Asterisk eXchange protocol (IAX) [16]. Skype,
which is popular and classified as a peer-to-peer protocol, can be used for personal
purposes and by small businesses that require few VoIP clients [113-114].
Nevertheless, only H.323 and SIP are described in detail in this thesis.
H.323
H.323, as presented in Figures 2-18 and 2-19 [78], is the first official protocol
suite for VoIP signaling [78, 115]. It was developed and promoted by ITU-T, the
long-established standards body for telecommunication. ITU-T has been looking after
telecommunication standards since the analog era. Many standards, such as SS7, are
compatible and interwork successfully with the traditional telecommunication
technologies of the PSTN because ITU-T is an expert in PSTN standards, most of
which it issued itself. Because H.323 came first into the VoIP industry, it is now
mature and dominated the market in the last decade, the first era of VoIP.
Basically, H.323 was designed and developed around four key components: terminals,
gateways, gatekeepers and Multipoint Control Units (MCUs) [78, 116-118]. These
components have been described briefly in [119]. Terminals are H.323 endpoints or
clients for communication with other H.323 terminals, and can be either IP
softphones or IP hardphones, while gateways are the interface devices between an
H.323 terminal and a non-H.323 terminal on a circuit-switched network. The
gatekeeper works as the controller, providing central management and control
services such as address translation, admission and access control of H.323
endpoints, bandwidth management, zone management, and call signaling and
management. Of course, all MCUs, gateways and terminals must be registered with a
gatekeeper. Last but not least, an MCU manages multipoint conferences (of at least
three parties) by handling the signaling to add participants to a conference call
and remove them if required. Conference calls can be audio or video. However, an
MCU might be combined into a gatekeeper or a gateway.

FIGURE 2-18 H.323 architecture overview

FIGURE 2-19 H.323 protocol suite

SIP
SIP stands for Session Initiation Protocol. Its architecture and protocol stack can
be seen in Figures 2-20 and 2-21 [78]. It has been described as a text-based
peer-to-peer protocol that borrows design concepts and architecture from the
Hypertext Transfer Protocol (HTTP) [117]. Its first version was issued in early
1999 [119]. SIP, a high-potential competitor to H.323, was developed and promoted
by the Internet Engineering Task Force (IETF), the standards body for Internet
protocols. Of course, IETF is an expert in Internet technology, which is like the
road for VoIP applications. Therefore, it could be implied that SIP should be more
compatible with IP than H.323, which has some issues with its complexity,
centralization and monolithic design [116]. However, IETF is not an expert in
interworking with the traditional technologies provided by the PSTN. Also, SIP was
developed later than H.323; therefore, its major weakness is its lack of maturity
compared to H.323 [120]. SIP mainly provides five functions to VoIP systems
[117, 119]: session setup, session management, user location, user availability
and user capabilities. These functions have been described briefly in [115].
Session setup enables the establishment of session parameters for both calling and
called parties, whereas session management manages the session by modifying session
parameters, transferring and terminating. User location discovers the location of
the end user when delivering a new SIP request or establishing a session. User
availability determines the willingness of the called party to communicate and the
reachability of an end user, while user capabilities determine the media
capabilities of the components that can be used.
Also, it has been described [115] that, in general, the major components of
SIP-based VoIP systems are User Agents (UA) and network servers. For the UA, a
User Agent Client (UAC) initiates a call while a User Agent Server (UAS) basically
replies. The network servers comprise a proxy server for forwarding SIP requests
and providing the routing function, a registrar server for registering clients, a
redirect server for directing the UAC to contact an alternative or next server, and
a location server for supporting address resolution [118-119].
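
Because SIP is text-based and modelled on HTTP, the request that a UAC sends to start a call can be illustrated as plain text. The sketch below assembles a minimal INVITE request with the header fields that RFC 3261 requires in every request (Via, Max-Forwards, From, To, Call-ID, CSeq). It is an illustration only, not a working user agent: the function name `build_invite`, the addresses, the tag and the branch value are all hypothetical, and a real INVITE would normally also carry a session description body.

```python
def build_invite(caller: str, callee: str, call_id: str, via_host: str) -> str:
    """Assemble a minimal, body-less SIP INVITE request as text."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",                        # request line
        f"Via: SIP/2.0/UDP {via_host};branch=z9hG4bK-example",  # hypothetical branch
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1234",                      # hypothetical tag
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # SIP, like HTTP, separates lines with CRLF and ends the headers with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical addresses, for illustration only
print(build_invite("alice@example.com", "bob@example.com",
                   "a84b4c76e66710@pc33.example.com", "pc33.example.com"))
```

In the terms used above, a proxy server would forward this request towards the callee, and the UAS at the called party would reply with responses such as 180 Ringing and 200 OK.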

FIGURE 2-20 SIP architecture overview

Comparison between H.323 and SIP [120-126]

After H.323 dominated the IP telephony market in the last decade [120], the
market is now becoming more balanced between H.323 and SIP. From a technical
perspective, as shown in Figure 2-22 [123], the basics of call setup in H.323 and
SIP do not seem to differ. However, it has been stated in [121-122] that SIP
provides lower complexity, richer extensibility and better scalability, and is
better suited to the Internet Protocol than H.323. On the other hand, in terms of
telephony applications, suitability for traditional telephony systems and protocol
maturity, H.323 seems to be better than SIP. The comparative study of the two major
VoIP signaling protocols, H.323 and SIP, is summarized in Table 2-18, adapted from
[124-126].

FIGURE 2-21 SIP protocol suite

FIGURE 2-22 A comparison of message flows between H.323 and SIP


TABLE 2-18 Comparison of H.323 versus SIP

Comparative Area         H.323                                  SIP
Complexity               Complex                                Simple
Encoding                 Binary                                 Text (similar to HTTP)
Extensibility            Limited                                Not limited (Easy)
Compatibility            Requires full backward compatibility   Does not require full backward compatibility
Scalability              Less scalable                          More scalable
Services                 Richer set of functionality            Richer set of
Loop control             Limited/difficult                      Good/easier
Mobility                 Limited                                Flexible and rapid
Network Intelligence     Provided by Gatekeepers                Provided by Servers
Capability Negotiation   Good                                   Limited
PSTN Integration         Well suited                            Non-native
Conferencing             Using MCU                              Using IP multicast
Conference control       Supported                              Not Supported
Documentation            Long                                   Short
Implementation time      Long                                   Short
Firewall/Proxy           More difficult                         Simpler

2.3.6 Mechanisms for Quality of Service [16, 127-130]

Quality of Service (QoS) is very important for VoIP applications, the real-time
applications, that operate over IP networks, which are unreliable media, based-on data
In general, the ideas for controlling QoS have been categorized as in Table 2-19,
whereas Chen et al. classified QoS mechanisms for VoIP into the data plane and the
control plane [127]. However, it was mentioned in [16] that studies have found that
some QoS mechanisms that work fine for data, for example First In First Out (FIFO),
Earliest Deadline First (EDF), Weighted Round Robin (WRR) and Weighted Fair
Queuing (WFQ), are not appropriate for guaranteeing the speech quality of VoIP and
satisfying the perception of users. Other approaches have also been proposed, such
as allocating slack capacity and capacity over-provisioning, but they have problems
with, respectively, determining the required bandwidth of the voice traffic load in
advance and scalability if the voice traffic load increases. Therefore, the
approach called Call Admission Control (CAC), as presented in Figure 2-23 [16],
which is in the control plane [127-128], has been considered.
CAC decides whether a new VoIP call can be admitted into the VoIP network
without causing any problem for existing VoIP calls that are being served with
guaranteed QoS. CAC mechanisms have mainly been classified into four types: per
call parameter-based admission control, per call reservation with path-based
bandwidth allocation, per call reservation with link-based bandwidth allocation,
and measurement-based admission control, which is divided into six subtypes.
Further information about those mechanisms can be found in [16].
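
The basic admission decision can be sketched in a few lines. The example below is a deliberately simplified, hypothetical illustration of the parameter-based idea for a single link: a call is admitted only if its declared bandwidth still fits within the link capacity, so calls already admitted keep their share. The class name, the capacity and the per-call bandwidth are assumptions made for illustration and are not taken from [16].

```python
from dataclasses import dataclass


@dataclass
class LinkAdmissionControl:
    """Toy admission control for one link with a fixed voice capacity."""
    capacity_kbps: float
    reserved_kbps: float = 0.0

    def admit(self, call_kbps: float) -> bool:
        """Admit the call only if existing calls keep their reserved bandwidth."""
        if self.reserved_kbps + call_kbps > self.capacity_kbps:
            return False  # reject: admitting would degrade existing calls
        self.reserved_kbps += call_kbps
        return True

    def release(self, call_kbps: float) -> None:
        """Free the bandwidth of a call that has ended."""
        self.reserved_kbps = max(0.0, self.reserved_kbps - call_kbps)


# Hypothetical figures: a 256 kbps link and calls declaring 80 kbps each
cac = LinkAdmissionControl(capacity_kbps=256)
decisions = [cac.admit(80) for _ in range(4)]
print(decisions)
```

With these assumed figures the first three calls are admitted and the fourth is rejected until an earlier call is released.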

CAC mechanisms are mainly a matter of VoIP system design and development. For
VoIP system implementation, installation and services, it has been suggested to
control VoIP traffic using Virtual LANs (VLANs) [112, 129-130], which could be
categorized as path control - traffic adaptation, as mentioned in Table 2-19
[128]. A VLAN not only provides virtual network isolation but also restricts access
and protects against both intentional and unintentional service disruptions to
critical VoIP servers/equipment [129-130].

FIGURE 2-23 Overview of CAC mechanisms

TABLE 2-19 General QoS controls

QoS Controls Objectives Examples


To block unauthorized network access Password


To apply a user’s profile of permissions Access list
Application To disallow access to some application UDP port
control only control
To classify traffic (e.g., gold and silver Differentiated

service) services

To smooth traffic and prevent overload

Shaping Queuing
Policing To block traffic that violates policies
access rate limits
Policy and
To allow using some protocols only application
Traffic Adaptation

Path control To allow using some paths only
based routing
To change user behaviours Chargeback
Congestion Random Early
To avoid congestion in a network
avoidance Detection (RED)
Congestion To handle congestion without network Fair queuing
management failure and priority

2.3.7 Header Compression [16, 131]

It has been stated in [131] that, in general, the payload of an IP packet is
almost the same size as, or even smaller than, the header. Over a connection of
multiple hops, the protocol headers are very important for the end-to-end
connection, but they are not necessary for each hop-to-hop connection. Therefore,
compressing those headers saves network bandwidth and increases the efficiency of
resource usage. Moreover, it can help decrease the bit error rate or packet loss
rate, and the response time, which relates to end-to-end delay [16, 131].
Each voice packet of a VoIP application includes a header. Each header consists
of IP/UDP/RTP headers totaling 40 bytes, as in Figure 2-24 [16]. A packet header
mainly carries packet information, such as the source address and destination
address. That means some bandwidth of a VoIP network is occupied by packet headers
that are almost the same from packet to packet.
Therefore, header compression techniques have been applied to reduce the size of
voice packet headers. This can greatly reduce the bandwidth consumption of a VoIP
flow. A compressed header can be 20-40 times smaller than the original header: from
40 bytes to 1-2 bytes for IPv4, and from 60 bytes to 3 bytes (about 20 times) for
IPv6. However, this technique can be implemented on a hop-by-hop basis only, not on
an end-to-end basis.
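
The effect of the 40-byte header on per-call bandwidth can be worked out with simple arithmetic. The sketch below is an illustration under stated assumptions: it counts only the codec payload and the IP/UDP/RTP header, ignores any link-layer overhead, and assumes a 20 ms payload per packet; the function name `call_bandwidth_kbps` is hypothetical.

```python
def call_bandwidth_kbps(codec_kbps: float, payload_ms: float, header_bytes: float) -> float:
    """One-way bandwidth of a voice flow, counting payload plus header only."""
    payload_bytes = codec_kbps * 1000 / 8 * payload_ms / 1000  # bytes of speech per packet
    packets_per_second = 1000 / payload_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


# G.729 at 8 kbps with an assumed 20 ms payload (20 bytes of speech per packet)
uncompressed = call_bandwidth_kbps(8, 20, header_bytes=40)  # full IP/UDP/RTP header
compressed = call_bandwidth_kbps(8, 20, header_bytes=2)     # header compressed to 2 bytes
print(f"uncompressed: {uncompressed:.1f} kbps, compressed: {compressed:.1f} kbps")
```

Under these assumptions, the 40-byte header is twice the size of the 20-byte payload, so the flow needs 24 kbps; compressing the header to 2 bytes brings it down to 8.8 kbps.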
Header compression approaches can be classified into IP Header Compression
(IPHC), Compressed Real Time Protocol (CRTP), Enhanced CRTP (ECRTP) and
Robust Header Compression (ROHC). Those techniques are compared in Table 2-20
[16]. IPHC and CRTP are recommended for low delay links, whereas ECRTP and ROHC
are recommended for lossy links. Further information about IP header compression
can be found in [16, 131].

FIGURE 2-24 Voice packet compression overview


TABLE 2-20 Comparison of header compression techniques

Compression techniques
Robustness to errors Low Low High High
Robustness to long delays Low Low High High
Robustness to reordering No No Yes No
Complexity Low Low Low High
Compression ratios High High Medium High
Maximum compression 2 bytes 2 bytes 2 bytes 1 byte

2.3.8 VoIP Security [129, 132-136]

A VoIP system cannot operate without security because it could be hacked.
Therefore, security is one of the highest concerns that should be presented in this
It has been mentioned that VoIP technology emerged with new vulnerabilities
[132] because VoIP, a modern telecommunication technology, is very different from
traditional telecommunication technology. For example, an IP phone can log in from
anywhere that has access and can register to a VoIP system, which now includes
wireless technology. VoIP threats have been classified in [133] into six threats:
Denial-of-Service (DoS) attacks, theft of service, telephone fraud, nuisance calls
(or spam over IP telephony), eavesdropping and misrepresentation. One of the most
frequently cited classifications of VoIP vulnerabilities is the security and
privacy threat taxonomy [134] from the VoIP Security Alliance (VoIPSA), cited in
[135]. This consists of social threats, eavesdropping/hijacking, DoS, service
abuse, physical access and interruption of services. From another point of view,
which includes some administrator tasks such as moves, adds and changes, the
threats have been classified into seven threats [129]: external attack (including
DoS attacks), internal misuse and abuse, theft, system malfunction, service
interruption, human error and unforeseen effects of change.
However, it has been mentioned in [135] that most problems (e.g. server or
equipment crashes) lead to DoS. This is consistent with previous work, which stated
that the most dangerous threat is the DoS attack [136], because no user can use an
IP phone if the VoIP system is out of service. This issue is a serious concern
because traditional telephone systems can provide “five nines” reliability and
availability, or 99.999% up-time. That means only a few minutes of downtime are
allowed per year.
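
The 99.999% up-time figure can be converted into an annual downtime budget with a one-line calculation. The sketch below assumes a 365-day year; the function name is illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum downtime per year allowed by a given availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for availability in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(availability)
    print(f"{availability}% up-time allows about {minutes:.2f} minutes of downtime per year")
```

For 99.999% availability this gives roughly 5.26 minutes of downtime per year, compared with about 52.6 minutes for 99.99%.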

2.4 Voice Quality Measurement

2.4.1 Quality of Experience
Unlike Quality of Service (QoS), which is relatively well understood and
established, Quality of Experience (QoE) is still an active area of research and
standards work [137]. This is supported by Table 2-21, obtained from Kilkki’s study
[138]. Nevertheless, both terms have been defined officially by ITU-T. The
comparison of the definitions of QoS and QoE can be seen in Table 2-22 [139-141],
while the term ‘QoE’ has been defined in many ways, as in Table 2-23 [137-138,
142-148].
To understand QoE for VoIP systems/services/applications clearly, QoS and QoE
for VoIP can be compared as in Figure 2-25, whereas Figure 2-26 shows that QoE for
VoIP sits on top of QoS [149-150]. As with traditional telephony services provided
by PSTNs, for VoIP / IP telephone users the QoE of VoIP relates to several
categories of user experience; their expectations are presented in Table 2-24
[148]. Of course, poor QoS within a VoIP network can result in poor QoE, which may
dissatisfy users such as User A1 in Figure 2-27, adapted from [147].

FIGURE 2-25 Comparison of QoS and QoE: test points

FIGURE 2-26 Position of QoE and QoS for VoIP


FIGURE 2-27 VoIP QoE: Good QoE perceived by User A vs poor QoE perceived
by User B

2.4.2 Voice Quality Measurement Overview
Understanding Voice Quality Measurement
For VoIP - Quality of Experience, voice quality is one of the most important
categories for VoIP users. Therefore, voice quality measurement is also important in
order to assess or evaluate voice quality of VoIP systems/applications/services [145,
What, then, is voice quality? Voice quality is very subjective and ambiguous;
therefore, defining it is difficult because good voice quality for one user may be
just fair for other users, particularly in another culture or country [112, 152].
Voice quality is made up of both subjective factors (e.g. expectation and
conversational effort) and objective factors (e.g. hardware, software and network
conditions including packet loss, packet delay and jitter), as in Figure 2-28,
adapted from [152-153]. In telecommunication networks, including VoIP networks,
voice quality can generally be described as the result of the subjective judgment
of users who perceive the speech delivered over the networks [154].
Voice quality measurement is classified into subjective and objective
measurement, as in Figure 2-29 [155]. Each has both advantages and disadvantages,
as in Table 2-25 [1]. Fundamentally, voice quality is traditionally evaluated using
subjective measurement. However, subjective measurement has limitations; thus
objective measurement has been developed and has now become very popular.
Nevertheless, the results of subjective measurement are crucial because they are
required as the benchmark for calibrating objective measurement. Further
information about both subjective and objective measurement can be found in
Section 2.5.

TABLE 2-21 The Statistics of the terms QoS vs QoE, from the abstracts of IEEE

                              No. of appearing times of the terms
Period                        QoS     QoE
2002-2004                     2362    6
2005-2007 (by 26 Oct only)    2627    16

TABLE 2-22 ITU-T definition comparison of QoS vs QoE

ITU-T Definition

QoS: The collective effect of service performances, which determine the degree of
satisfaction of a user of the service.

QoE: The overall acceptability of an application or service, as perceived
subjectively by the end-user.
NOTES:
1. Quality of Experience includes the complete end-to-end system effects (client,
terminal, network, services infrastructure, etc.).
2. Overall acceptability may be influenced by user expectations and context.

FIGURE 2-28 Influence factors for voice quality

FIGURE 2-29 Voice quality measurement concept


TABLE 2-23 Some definitions of QoE

Author Definition
Hestnes et al. 1. The user's perception of what is being presented by a
(2003) communication service or application user interface.
2. The overall result of the individual Quality of Services and a
measure of overall acceptability of a service or application that
includes factors such as usability, utility, fidelity and the level of
support from the application/service provider.
Nokia (2004) 1. The ability of the network to provide a service with an assured
service level.
2. How a user perceives the usability of a service when in use – how
satisfied he or she is with a service.
3. The perception of the user about the quality of a particular service
or network
Lopaz et al. An extension of the traditional QoS in the sense that QoE provides
(2006) information regarding the delivered services from an end-user point
of view.
Soldani How a user perceives the usability of a service when in use – how
(2006) satisfied he/she is with a service in terms of, e.g., usability,
accessibility, retainability and integrity.
IneoQuest The customers’ perception of how good of a job the service provider
(2008) is doing delivering the service.
Kilkki (2008) The basic character or nature of direct personal participation or
Winkler Quality from the perspective of the user or consumer (e.g. viewer),
(2009) with a focus on perceived quality of the content (or more
comprehensively, user experience)
Dagstuhi The degree of delight of the user of a service, influenced by content,
(2009) network, device, application, user expectations and goals, and context
of use.
Batteram et The measure of how well a system or an application meets the user’s
al. (2010) expectations.

TABLE 2-24 Categories of VoIP User’s Experience and their quality expectation

Category of User’s Experience Expectations for level of Quality

Reliability Works every time
Availability Always available
Call Completion Calls always completed as dialed
Connect Latency / Responsiveness Rings in few seconds
Voice Quality At least as good as the PSTN
Speech Latency Imperceptible
Services All Available / all functioning properly
Billing Completely accurate

TABLE 2-25 Subjective measurement methods versus objective measurement

Voice Quality Measurement

Subjective Objective
Accuracy and
High Medium – high
Management skill
High Low
Endeavor requirement High Low
No Yes
objective measurement tool
Special test facilities
Soundproof room(s) (e.g. E-model measurement
Very long
Time consumption (e.g. 5 minutes per Short
(e.g. 24-32 participants per Low
(for conducting subject to
participate, to employ a High
Cost research assistant and to (for a standard tool, e.g. E-
prepare standard test model measurement tool)
facilities, e.g. a sound proof
room)

Voice Quality Indicator: MOS

Daengsi et al. stated in [1] that, instead of QoS parameters (e.g. packet
loss, packet delay and jitter), ITU-T officially recommends the Mean Opinion
Score (MOS) as the voice quality indicator [153]. MOS has been claimed to be the most
reliable metric of users’ perceived voice quality [153]. It is the accepted metric
because it provides a direct link to voice quality as perceived by end users.
MOS is obtained by subjective evaluation: a group of subjects (normally 24-32,
although 30 subjects were originally recommended) votes on the quality of speech
from both female and male speakers [157-160]. MOS can be obtained from
listening-opinion tests or conversation-opinion tests using the opinion scale
presented in Table 2-26 [156-157]. After a sufficient number of subjects have voted,
the mean of their scores is calculated, which is the origin of the name.

TABLE 2-26 Scale of opinion scores and meaning

Opinion Score Meaning

5 Excellent; no perceptible impairments

4 Good; barely perceptible but not annoying impairments
3 Fair; perceptible and slightly annoying impairments
2 Poor; annoying but not objectionable impairments
1 Bad; very annoying and objectionable impairments
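
As an illustration of how a MOS is derived from individual votes on the scale of Table 2-26, the following minimal Python sketch computes the mean score and an approximate 95% confidence interval. The function name and the vote list are hypothetical, and the interval uses a normal approximation as a simplifying assumption (a Student-t quantile would give a slightly wider interval for 24-32 subjects).

```python
from statistics import NormalDist, mean, stdev

VALID_SCORES = {1, 2, 3, 4, 5}  # opinion scale of Table 2-26


def mean_opinion_score(votes, confidence=0.95):
    """Return (MOS, half-width of the confidence interval) for a list of votes."""
    if len(votes) < 2:
        raise ValueError("need at least two votes")
    if any(v not in VALID_SCORES for v in votes):
        raise ValueError("votes must be integers from 1 to 5")
    mos = mean(votes)
    # Assumption: normal approximation of the sampling distribution of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(votes) / len(votes) ** 0.5
    return mos, half_width


# Hypothetical votes from 24 listeners for a single test condition
votes = [4, 4, 5, 3, 4, 4, 5, 4, 3, 4, 4, 5,
         4, 4, 3, 4, 5, 4, 4, 4, 3, 4, 5, 4]
mos, ci = mean_opinion_score(votes)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

The half-width shrinks with the square root of the number of subjects, which is one reason a panel of roughly 24-32 listeners is commonly used.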

However, the voice quality that general users find acceptable depends on their
expectations, which exist at two levels: the fixed-line level and the
mobile phone level [157]. The fixed-line network provided by a PSTN normally
delivers “toll quality” voice with a MOS of at least 4.0. In particular, on fixed lines
using DS-0 circuits (e.g. ISDN), most calls achieve ‘excellent’ voice quality with a MOS
of 4.3, which is as good as it gets for narrowband speech. For mobile phone networks,
general users perceive most calls as having ‘fair’ voice quality, so the expected
voice quality can be taken as a MOS of 3.2-3.5.

Understanding Subjective and Objective
To understand subjective and objective quality measurement/assessment clearly,
the terms subjective and objective must first be clarified. These two terms
have been widely confused and misused, including in legal or criminal
discussions, clinical discussions and journal articles [161-162]. In criminal
discussions, fingerprint identification has often been referred to as subjective,
whereas in clinical discussions pain, muscle testing and force measurements
are at issue. Thus, to go back to basics and understand the two terms clearly,
the dictionary definitions in [161] and Table 2-27 [163-166] can be consulted.
Based on the definitions in [161-166], and applied to speech or voice quality
measurement/assessment for VoIP, subjective quality measurement/assessment
refers to methods that are based on the feelings or opinions of the subjects or
users and therefore reflect the perceived voice or speech quality. Objective
quality measurement/assessment refers to methods that are based on facts,
something real, or observable phenomena, and are not influenced by
personal beliefs or feelings.
2.4.3 Network Factors: The Gang of Evils
There are several factors that may affect the voice quality of VoIP
applications/services, not only codec selection and packet size [167] but also network
factors. The three classic major network factors [16, 168-169] are packet delay,
packet loss and jitter, which have been called the ‘three evils’ for voice
quality [170]. It has also been stated in [155] that echo is a cause of voice
quality issues for VoIP. Other factors can affect voice quality as well, such as
packet mis-ordering, transcoding, network duplex and the voice activity
detection algorithm [171]. A search of IEEE Xplore, the digital library of IEEE
(a major organization that shapes the world), gives evidence of the research
directions on VoIP quality between the years 2000 and 2012, as shown in Figure 2-30 [172].
Focusing on the three evils, packet delay is mentioned most often, with 485
results, whereas packet loss and jitter are mentioned in 389 and 311 results
respectively, as in Table 2-28 [172-175]. Therefore, only the three evils are
described in detail.

TABLE 2-27 Definitions of subjective vs objective

Source: TheFreeDictionary [189]
Subjective (adj.): 1. a. Proceeding from or taking place in a person’s mind rather than external world. b. Particular to a given person; personal. 2. Moodily introspective. 3. Existing only in the mind; illusory.
Objective (adj.): 1. Of or having to do with a material object. 2. Having actual existence or reality. 3. a. Uninfluenced by emotions or personal prejudices. b. Based on observable phenomena; presented factually.

Source: Cambridge Dictionaries Online [190]
Subjective (adj.): Influenced by or based on personal beliefs or feelings, rather than based on facts.
Objective (adj.): Not influenced by personal beliefs or feelings; fair or real.

Source: Oxford Dictionaries Online [191]
Subjective (adj.): Based on or influenced by personal feelings, tastes, or opinions.
Objective (adj.): (Of a person or their judgement) not influenced by personal feeling or opinions in considering and representing

Source: Longman English Dictionary Online [192]
Subjective (adj.): 1. A statement, report, attitude etc that is subjective is influenced by personal opinion and can therefore be unfair. 2. Existing only in your mind or imagination
Objective (adj.): 1. Based on facts, or making a decision that is based on facts rather than on your feelings or beliefs. 2. Existing outside the mind as something real, not only as an idea.

FIGURE 2-30 The result from search using the keyword “VoIP quality”

TABLE 2-28 The statistic of the results from IEEEXplore after using the keywords
VoIP quality, packet delay, packet loss and jitter

Keywords Results

VoIP quality 1,534

VoIP quality packet delay 485
VoIP quality packet loss 389
VoIP quality jitter 311

Packet Delay

Delay is one of the most classic factors because it can only be reduced, not
eliminated, even when optical fiber is used in core IP networks. It is the length of time
a packet takes to traverse the network. Packet delay mainly comprises fixed
delay (e.g. voice coding/decoding delay, quantization delay, algorithmic delay,
packetization delay and serialization delay) and variable delay (e.g. propagation delay
that may be affected by network congestion) [112, 176]. It has been
recommended that a delay of not more than 80 ms is very good for voice quality [112]. In
addition, [16] refers to ITU-T Recommendation G.114 [177], which states that the
one-way delay, including the endpoints, should not exceed 150 ms for excellent voice quality.
If the delay exceeds 250 ms, talk-over will occur, in which one person starts to talk
because the delay prevents him or her from realizing that the other person has already
started speaking, whereas a delay of 400 ms and above is unacceptable [112], as
in Figure 2-31 [177]. Delay therefore enters the MOS calculation through the ‘delay
impairment factor’ [178].
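
The delay thresholds quoted above can be collected into a small lookup, sketched here in Python. The function name and band labels are illustrative only, and the label for the 150-250 ms band is an assumption, since the text does not name that range.

```python
def classify_one_way_delay(delay_ms: float) -> str:
    """Map a one-way delay in milliseconds to the bands quoted in the text."""
    if delay_ms < 0:
        raise ValueError("delay cannot be negative")
    if delay_ms <= 80:
        return "very good"           # not over 80 ms [112]
    if delay_ms <= 150:
        return "within G.114 limit"  # not over 150 ms one-way [177]
    if delay_ms <= 250:
        return "above G.114 limit"   # assumed label; the text names no band here
    if delay_ms < 400:
        return "talk-over likely"    # more than 250 ms [112]
    return "unacceptable"            # 400 ms and above [112]


# Hypothetical measurements
for delay in (40, 120, 200, 300, 450):
    print(delay, "ms ->", classify_one_way_delay(delay))
```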

FIGURE 2-31 Relation of R-value from E-model vs delay

Packet Loss

Packet loss may affect not only voice quality but also the availability of VoIP
applications/systems, tone sending and signaling traffic [112]. Packet loss in
VoIP networks refers to the percentage of packets dropped; it occurs when packets are
sent but some of them are not received at the destination endpoint because of
events in the network [170]. Its impact will be moderate if the loss rate is
low [179]. Failures, which can occur at various protocol layers in the network (e.g. a
fiber link cut, router failures/overloads and software errors), can be the root causes of
packet loss, which is a major source of voice quality degradation in VoIP [180-181]. In
general, packet loss events can be single packet loss events (which make up a large
percentage of all loss events but contribute little to the total amount of loss) and
network outages (long loss periods or continuous packet loss, which unavoidably
affect voice quality) [112, 182]. Packet loss can be random
or bursty [178], as illustrated in Figure 2-32 [183]. It has been
reported in [169] that high burst packet loss affects voice quality more than low burst
packet loss. The maximum loss of voice packets or frames between two endpoints
should be 1% or less for very good voice quality, whereas 3% or less is acceptable for
business quality [16, 112]. However, one manufacturer suggested that packet
loss for VoIP applications can be up to 5% [184]. Moreover, situations such as
packet mis-ordering and high delay can have the same effect as
packet loss, because large numbers of delayed and mis-ordered packets are
dropped by VoIP, which is a real-time application [112, 152]. Packet loss is part
of the ‘equipment impairment factor’ in the E-model [178]. The packet loss
problem may be mitigated by implementing Packet Loss Concealment
(PLC) in VoIP applications [124].

Jitter
Jitter, or delay variation, is the difference in the delay times of consecutive
packets, a measure of the variance in the time that communication takes to
traverse from the packet sender to the packet receiver [112, 168]. It can be thought of as
the statistical average variance in delivery time between packets. It is recommended
that jitter in the VoIP network should not exceed 20-30 ms; however, the jitter
problem can be removed or reduced by using jitter buffers [16, 112,



FIGURE 2-32 Example of random and bursty packet loss: (a) Random packet loss
(b) Moderate bursty packet loss (c) Heavy bursty packet loss
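
To make the loss and jitter quantities discussed above concrete, the following Python sketch derives the loss percentage, the mean loss-burst length and a simple jitter figure from a packet trace. All names and sample traces are hypothetical. The jitter estimate is taken here as the mean absolute difference between the delays of consecutive packets, which is one possible reading of the definition in the text and not a standardized estimator.

```python
from itertools import groupby


def loss_statistics(received):
    """received: sequence of booleans, True when the packet arrived.

    Returns (loss percentage, mean length of the loss bursts).
    """
    total = len(received)
    if total == 0:
        raise ValueError("empty trace")
    # Length of every run of consecutive lost packets
    bursts = [len(list(run)) for arrived, run in groupby(received) if not arrived]
    lost = sum(bursts)
    mean_burst = lost / len(bursts) if bursts else 0.0
    return 100.0 * lost / total, mean_burst


def rate_loss(loss_pct):
    """Apply the 1% and 3% guidelines quoted in the text [16, 112]."""
    if loss_pct <= 1.0:
        return "very good"
    if loss_pct <= 3.0:
        return "acceptable for business quality"
    return "above the 3% guideline"


def mean_jitter_ms(delays_ms):
    """Mean absolute delay difference between consecutive packets (assumed estimator)."""
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    if not diffs:
        raise ValueError("need at least two delay samples")
    return sum(diffs) / len(diffs)


# Hypothetical traces: the same loss rate, random versus bursty
random_loss = [True] * 30 + [False] + [True] * 30 + [False] + [True] * 30 + [False] + [True] * 7
bursty_loss = [True] * 50 + [False] * 3 + [True] * 47
for name, trace in (("random", random_loss), ("bursty", bursty_loss)):
    pct, burst = loss_statistics(trace)
    print(name, f"loss={pct:.1f}%", f"mean burst={burst:.1f}", rate_loss(pct))

print("jitter:", mean_jitter_ms([20.0, 25.0, 22.0, 30.0]), "ms")
```

Both traces have the same loss percentage, so the mean burst length is what distinguishes the random case from the bursty one.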

2.4.4 Subjective Voice Quality Measurement Methods

Importance of Subjective Measurement and Overview
Voice quality measurement assesses or evaluates users’ perceived
voice quality in telecommunications. Subjective measurement is important because it
is necessary for calibrating objective measurement. In addition, several research
works state that subjective measurement is the most authentic method, with high
accuracy and reliability, as presented in Table 2-29.

Listening-Opinion Tests [1, 18, 156]
These tests are classified into absolute category rating (ACR),
degradation category rating (DCR) and comparison category rating (CCR), as in Figure 2-33.
Although listening-opinion tests cannot reach the realism of conversation-opinion
tests, ITU-T recommends using ACR for listening-opinion tests. A
difficulty of listening-opinion tests is the source recording of the speech set for testing,
which requires, for example, a very good recording environment (e.g. reverberation time
of less than 300 ms and room noise lower than 30 dBA), a good recording system, and
appropriate speech material. Instead of recording the source speech oneself,
existing speech databases can be used, for example the Multi-Lingual Speech Database for
Telephonometry 1994, developed by NTT-Advanced Technology Corporation in
Japan. This speech database contains speech in 21 languages, including Thai.

Conversation-Opinion Tests [1, 149, 156]
In [156], ITU-T gives recommendations on the test facilities, experiment
design, conversation task and test procedure to be used for
conversation-opinion tests. These subjective methods are the most realistic.
They can be used to test voice quality under conditions of delay and echo, which
might degrade voice quality while two telephone users are talking to each other. In the
test, two inexperienced subjects sit in separate soundproof rooms and
conduct a conversation via IP phones. This method therefore requires two good
soundproof rooms, and a high cost is involved if two soundproof rooms with very
low reverberation time and very low room noise have to be built.
An advantage of conversation-opinion tests, however, is that no speech database
has to be prepared.

TABLE 2-29 The evidence for the importance of subjective measurement

Author Statement

T.A. Hall “…., subjective measures are often very accurate and useful for
(2001) evaluating a telephony system.” [185]
M. Narbutt and “Subjective testing is considered as the most “authentic”
M. Davis, method of measuring voice quality.” [186]
Ding et al. “Speech quality is inherently subjective, as it is determined by
(2007) the listener’s perception. Therefore, the most reliable approach
for assessing speech quality is through subjective tests.” [187]
M. Goudarzi Subjective listening tests are the most reliable method for
(2008) obtaining the true measurement of user’s perception of
voice quality and have good results in terms of correlation
to the true speech quality. [17]
P. Khanduri, “Subjective is widely considered the most “authentic” method of
(2009) measuring voice quality.” [188]
Al-Akhras et al. “…, as subjective methods are the most accurate methods for
(2009) measuring the speech quality, they are used to calibrate objective
methods.” [189]

Mahdi and “The most reliable method for obtaining true measurement of
Picovici (2009) users’ perception of speech quality is to perform properly
designed subjective listening tests.” [152]

J. Lee, K. Nam, “Subjective testing is the most "authentic" method of measuring

and D. Kim voice quality….”[190]

FIGURE 2-33 Subjective voice quality measurement methods

Interview and Survey Tests [156]

These tests are further options for evaluating voice quality with subjects. They
can be used to interview and/or survey real users. For example, after installing a new
VoIP system for a customer, the implementer might use survey tests to
obtain evidence from users of the voice quality provided by the new system in
order to hand the system over to the customer. However, a lot of effort is
required to gather opinions on voice quality from at least 100 interviewees, which is
the trade-off for conducting the test outside a soundproof room.
2.4.5 Objective Voice Quality Measurement Methods

Overview and Importance of Objective Measurement
Objective measurement methods calculate a quality value from
the combination of different damaging parameters of the network. These methods are
classified into intrusive measurements and non-intrusive measurements, as shown in
Figure 2-34. Only the two most popular objective measurement methods
are described in detail in the next sub-sections: PESQ (Perceptual Evaluation of
Speech Quality), which is an intrusive method, and the E-model,
which is a non-intrusive method.

FIGURE 2-34 Objective voice quality measurement methods

PESQ
PESQ is the state of the art in objective voice quality measurement and
has been claimed to have very high correlation with subjective voice quality
measurement [191]. It is the most common and popular intrusive
measurement method [153, 192]. The original version, ITU-T P.862, supports only
narrow-band telephone networks and speech codecs, whereas the newer P.862.2
extends it to wideband telephone networks and speech codecs [194-195], as shown in
Figure 2-35 [193]. PESQ combines the strengths of the Perceptual Speech Quality
Measurement (PSQM) and the Perceptual Analysis Measurement System (PAMS), namely
psycho-acoustic and cognitive models and a time alignment algorithm respectively [16].
PESQ is recommended for evaluating the impact of a codec on voice quality and for
testing networks before operation [153]. It can be applied to evaluate several factors,
such as transmission errors on the transmission channel, codec errors, noise in the
system, packet loss and time clipping. As shown in Figure 2-36 (adopted from [153,
194]), PESQ is a model that compares the degraded signal with the original
signal, in place of subjective evaluation. There are three main processes [153]. The
first process is signal pre-processing, which includes frequency and time alignment
of the input signals. The next process is perceptual modeling, which transforms the
input and output signals into representations that reflect human perception. The
mapping into the time and frequency domain and the filtering of the signal to the
bandwidth typical of the telephone network are also included in this step. The last
process is cognitive modeling. In this step, the difference between the reference
signal and the distorted signal is calculated, and the values that represent noise
are evaluated and then combined into the MOS score calculation. A positive difference
indicates the presence of noise, while a negative difference indicates a minimum
noise presence such as codec distortion. It has also been described in [153] that
this model permits the detection of time jitter and the identification of the frames
that are affected by delay, which are erased in order to prevent a bad score. In terms
of performance, the average correlation between PESQ scores and subjective scores
was 0.935 [194], whereas it is claimed in [196] that the correlation is up to 0.95.
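
The correlation figures quoted above are correlation coefficients between objective and subjective scores over a set of test conditions. As a minimal sketch of how such a figure can be computed, the Python function below evaluates the Pearson coefficient. The score lists are invented for illustration and are not taken from any cited study, and the cited works may have applied additional steps (for example a mapping function) before correlating.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-condition scores (illustration only)
subjective_mos = [4.2, 3.9, 3.5, 3.0, 2.4, 1.9]
objective_mos = [4.1, 3.8, 3.6, 2.8, 2.5, 2.0]
print(f"r = {pearson_r(subjective_mos, objective_mos):.3f}")
```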

FIGURE 2-35 PESQ application guide




FIGURE 2-36 PESQ overview (a) PESQ concept. (b) The processes inside the
model of PESQ

E-model
The E-model is the most popular and widely used non-intrusive
measurement method, as mentioned in [16]. Originally, it was intended to help transmission
planners assess the transmission performance of networks using computational
algorithms to ensure user satisfaction [178, 197]. This computational method is
based on 21 parameters, as shown in Figure 2-37, whereas the default values and
permitted values of those parameters are presented in Table 2-30 [178].
Those parameters are combined into a few main factors from which
the transmission rating factor R is calculated, as follows:

R = Ro - Is - Id - Ie + A (2-2)

where
Ro is the basic signal-to-noise ratio, including noise sources such as room noise and circuit noise.
Is is the simultaneous impairment factor, a combination of all impairments which occur more or less simultaneously with the voice signal.
Id is the delay impairment factor, caused by delay.
Ie is the equipment impairment factor, caused by codecs.
A is the advantage factor, which allows for compensation of impairment factors when there are other advantages of access to the user.
After obtaining R-value of the transmission rating factor, MOS-CQE can be
estimated using the equation as follows:
For $R < 0$: $\mathrm{MOS}_{\mathrm{CQE}} = 1$

For $0 < R < 100$: $\mathrm{MOS}_{\mathrm{CQE}} = 1 + 0.035R + R(R - 60)(100 - R) \cdot 7 \times 10^{-6}$ (2-3)

For $R > 100$: $\mathrm{MOS}_{\mathrm{CQE}} = 4.5$

The characteristic of the R-value can also be represented as in Figure 2-38, whereas
Table 2-31 shows the MOS-CQE equivalence [178].
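
Equations (2-2) and (2-3) can be sketched in a few lines of Python, together with the lower limits of Table 2-31. The function names and the input values for the impairment factors are hypothetical and chosen only for illustration; they are not measured values. The label returned for R below 50 is also an assumption, since Table 2-31 lists no category for that range.

```python
def rating_factor(ro, i_s, i_d, i_e, a=0.0):
    """Transmission rating factor R, equation (2-2)."""
    return ro - i_s - i_d - i_e + a


def r_to_mos(r):
    """Estimated MOS-CQE for a given R-value, equation (2-3)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Lower limits of the R-value and user satisfaction from Table 2-31
SATISFACTION = [
    (90, "Very satisfied"),
    (80, "Satisfied"),
    (70, "Some users dissatisfied"),
    (60, "Many users dissatisfied"),
    (50, "Nearly all users dissatisfied"),
]


def user_satisfaction(r):
    for lower_limit, label in SATISFACTION:
        if r >= lower_limit:
            return label
    return "below the range of Table 2-31"  # assumed label


# Hypothetical factor values (illustration only)
r = rating_factor(ro=93.0, i_s=1.0, i_d=5.0, i_e=11.0)
print(f"R = {r:.1f}, MOS-CQE = {r_to_mos(r):.2f}, {user_satisfaction(r)}")
```

Evaluating `r_to_mos` at the lower limits 90, 80, 70, 60 and 50 gives values within 0.01 of the MOS-CQE column of Table 2-31.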
However, the previous version of the E-model recommendation, published in 2008,
states that the model has not been verified by field surveys or laboratory
tests for the very large number of possible combinations of input parameters [198].

FIGURE 2-37 Reference connection of E-model


TABLE 2-30 Network parameters between two endpoints, in a telephone network

Parameter Abbr. Unit Default value (Remark)
Send loudness rating SLR dB +8 (Note 1)
Receive loudness rating RLR dB +2 (Note 1)
Sidetone masking rating STMR dB 15 (Notes 2, 4)
Listener sidetone rating LSTR dB 18 (Note 2)
D-Value of telephone, send side Ds – 3 (Note 2)
D-Value of telephone, receive side Dr – 3 (Note 2)
Talker echo loudness rating TELR dB 65
Weighted echo path loss WEPL dB 110
Mean one-way delay of the echo path T ms 0
Round-trip delay in a 4-wire loop Tr ms 0
Absolute delay in echo-free connections Ta ms 0
Number of quantization distortion units Qdu – 1
Equipment impairment factor Ie – 0 (Note 5)
Packet-loss robustness factor Bpl – 4.3 (Notes 3, 5)
Random packet-loss probability Ppl % 0 (Notes 3, 5)
Burst ratio BurstR – 1 (Notes 3, 6)
Circuit noise referred to 0 dBr-point Nc dBm0p -70
Noise floor at the receive side Nfor dBmp -64 (Note 3)
Room noise at the send side Ps dB(A) 35
Room noise at the receive side Pr dB(A) 35
Advantage factor A – 0
NOTE 1 – Total values between microphone or receiver and 0 dBr-point.
NOTE 2 – Fixed relation: LSTR = STMR + D.
NOTE 3 – Currently under study.
NOTE 4 – Eq. (3-24) in [178] provides also predictions for STMR > 20 dB. However,
such values can hardly be measured in a reliable way because the measurement
device will mainly cover the acoustic coupling, and not the electrical one.
NOTE 5 – If Ppl > 0%, then the Bpl must match the codec, packet size, and PLC
NOTE 6 – E-model predictions for values of BurstR > 2 are only valid if the packet
loss percentage is Ppl < 2%.

FIGURE 2-38 The graph represents the relation of R-value and MOS-CQE

TABLE 2-31 The relation among R-value, MOS-CQE and user satisfaction

R-value (lower limit) MOS-CQE (lower limit) User satisfaction

90 4.34 Very satisfied

80 4.03 Satisfied
70 3.60 Some users dissatisfied
60 3.10 Many users dissatisfied
50 2.58 Nearly all users dissatisfied

PESQ versus E-Model

Although PESQ is one of the most interesting intrusive measurement methods
for VoIP, it has limitations, which have been gathered from [191, 196, 199-200].
PESQ does not always predict voice quality accurately, particularly in live
networks, because of improper time alignment [191] (originally
reported in [199]); this is consistent with the report in [196] on
PESQ’s issues with time alignment and psychoacoustics. It can also be inferred
from [196] that the PESQ result probably depends on the listening test data
set with which it was developed (speech content dependency), so the result may have
a high error if it is used with different factors such as different codecs. In addition,
PESQ does not account for the effects of jitter buffers and packet loss compensation,
which are important mechanisms for maintaining the voice quality of VoIP. Moreover,
PESQ shows more error with bursty packet loss than with constant-rate packet loss.
Finally, it has been reported in [200] that PESQ is biased against the Enhanced
Variable Rate Codec (EVRC) that is used for CDMA.
Therefore, the E-model can be considered more appealing than PESQ
because it is mainly computed from the network parameters between two endpoints
and does not depend on speech content. Besides, it is the leading method in its
research direction, as shown by the search results from the IEEE Xplore website in
Table 2-32.

TABLE 2-32 The statistic of the results from IEEEXplore for search using the
keywords VoIP, quality, PESQ and E-model, between 2002- Aug

Keywords Results

VoIP quality E-model 100

VoIP quality PESQ 59

2.5 Previous Research on Voice Quality Measurement

This section presents previous research by other researchers that relates
to this research, as follows:
2.5.1 Previous Research Using Subjective Tests
An interesting study on subjective VoIP speech quality evaluation
based on network measurements is presented in [203]. It was conducted with 28 subjects
using Finnish speech samples (Finnish is a non-tonal language) from four male and
four female speakers. The test was performed in a laboratory facility using high-quality
headphones, employed the Adaptive Multi-Rate (AMR) codec and focused on two main
network factors: delay and Frame Error Rate (FER), i.e. frame loss rate. As expected,
MOS decreased as delay and FER increased.
Sun [155] included in her PhD research a chapter on subjective tests, called
Internet-based subjective speech quality measurement, which covers both
uncontrolled and controlled Internet-based methods. The speech materials were the ITMIT
and ITU-T data sets, which contain English speech from 4 female and 4 male talkers.
There were 16 subjects in the uncontrolled tests and 15 subjects in the controlled
tests. The results from the Internet-based methods show a correlation that is not
high with the results from four objective measurement methods (PESQ, PSQM,
MNB and EMBSD) but a high correlation with the results from the room-based listening
tests. Besides, it was shown that PESQ is less sensitive than the subjective
tests to packet loss effects.
A comparison between subjective listening quality and the P.862 PESQ score has
been presented in [158]. In this paper, the results of subjective listening tests in
British English, which accounts for about half of the tests, and five other non-tonal
languages (French, American English, Japanese, German and Dutch) were
compared with the results from PESQ-LQ (LQ stands for Listening Quality), an
objective test. The paper describes the sources of variation in quality scores, namely
cultural variation, the balance of conditions and individual variation, the last of
which calls for about 24-32 subjects per test if a 95% confidence interval is required.
It also gives an overview of PESQ-LQ. The work claims that PESQ can be a good objective
measurement method for predicting the MOS for most of those six languages.
Goudarzi’s thesis [17], Evaluation of Voice Quality in 3G Mobile
Networks, uses both subjective and objective measurement methods, with an
Asterisk open source PBX mediating between the voice quality measurement
equipment and the 3G mobile network. He conducted his work in the UK; therefore, the
speech samples used were British English. In the objective part,
the GSM and AMR codecs were tested with more than 200 speech samples using
PESQ and 3SQM. The results were then compared with those of non-ITU-T-standard
subjective listening tests, which were conducted with 33 subjects by sending
them the score sheets, voice files and instructions. He found that PESQ showed a
better result than 3SQM and that the male speech samples received higher scores
than the female speech samples. However, this work
mainly focused on the objective measurement methods.
One article conducted subjective (ACR) tests for comparison with PESQ
objective tests [204]. The ACR listening tests were conducted with 20 Japanese
subjects using speech samples of Japanese seven-digit numbers from 2 male and 2
female speakers. The facilities included a headphone and a soundproof room. The
results, called subjective MOS, were then compared with the results from PESQ to
evaluate its effectiveness. The article concluded that PESQ objective MOS
correlates relatively well with subjective MOS [204]. Another article
presents a comparison of MOS evaluation characteristics among Chinese,
Japanese and English in IP telephony [205]. It states that subjective voice
quality evaluation may be influenced by nationality and points out that the
dependencies on language, culture and nationality should be investigated to gain more
understanding of QoE. The main contribution of this work is to show graphically the
subjective MOS difference between Japanese and Chinese, as in Figure
2-39, although the correlation coefficient is 0.903 [205].
The results were gathered and analyzed from 25 Chinese and 32 Japanese
subjects using ACR listening-opinion tests with native speech samples, conducted in a
soundproof room with three G.722 family codecs under packet loss.
This is very important evidence of language dependency and cultural dependency,
and it is consistent with the cultural variation issue, covering language variation,
pointed out in [205]. In particular, the language dependency issue has already been
acknowledged by ITU-T [206-207].
In [208], a subjective measurement method based on collecting opinions via the
Internet was proposed. The authors proposed a browser-server structured system and a
set of procedures for collecting subjective opinions through the Internet using
degradation category rating (DCR) as the rating scale. The study compared
the proposed method, with 276 subjects, against listening tests based on ITU-T
Recommendation P.800, with 32 subjects and 8 operators. The tests were run on the
campus of Beijing Institute of Technology; therefore, it can be inferred that all
subjects were Chinese. The experiment was designed to compare the performance of
two candidate codecs and the reference codec under four types of background
noise (e.g. Office and Babble40). The modulated noise reference unit (MNRU) was
also applied in the experiment. It was concluded that the proposed
method can collect subjective opinion scores via the Internet
easily, cheaply and flexibly, although it requires a large number of subjects, and that
the proposed user qualification mechanism gives reasonable reliability by
distinguishing between reliable and unreliable subjects.

FIGURE 2-39 Chinese MOS versus Japanese MOS

Several other research works have also been conducted
using subjective methods. Most of them are ACR listening tests (both formal and
informal) with about 20-30 subjects. Some of these previous works are summarized
in Table 2-33 [17, 155, 203-205, 208-210].
Although several previous works conducted subjective tests, few of them
clearly present important information about their laboratory, system under test,
codec under test, language, nationality of subjects, method and test conditions. Thus,
selected information about works based on different languages, which are
presumed to have been tested with native listeners/speakers, is presented in Table 2-34,
obtained from [211-214]. The table shows that, for the same codec, the MOS values
that represent voice quality perception for different languages and native listeners
are rather different. Therefore, it is of great interest to determine MOS based on
the Thai language and native Thai listeners.
2.5.2 Previous Research Using Thai and Other Languages
In [215-216], objective tests were carried out with 13 languages, both non-tonal
languages (Arabic, English, French, German, Hindi, Japanese, Korean, Portuguese,
Russian and Spanish) and tonal languages (Swedish, Chinese and Thai). These two
works used many languages to obtain MOS for the BV16 and BV32 codecs (BV16 is a
mandatory codec in the PacketCable 1.5 standard), which were then compared with other
codecs, such as G.711u, G.729E, GSM-EFR, G.728, G.729 and G.723.1 at 6.3 kbps, using
the PESQ measurement method and with reference to the results of one subjective
listening test. However, these works did not focus on the analysis and comparison of
non-tonal and tonal languages, including Thai.

TABLE 2-33 Example of previous works with subjective tests

Reference | Publication Year | Method | Speech Materials: Language | Speech Materials: Talker | No. of Subjects
[202] | 2001 | Listening Test | Finnish | 4 female & 4 male | 28
[154] | 2004 | Internet based, Controlled Internet based, and room- Listening Test | English (ITMIT and ITU-T | 4 female & 4 male | 16 (uncontrolled tests) & 15 (controlled tests)
[203] | 2007 | Test, with a | Japanese | 2 female & 2 male | 20
[17] | 2008 | Informal Listening Test (for comparing with the objective tests) | British English | Selected from 12 female and 18 male | 33 (13 female & 20 male)
[208] | 2009 | Informal ACR-listening Test, with a headphone | Igbo (used in south eastern Nigeria) | 2 female & 2 male | 24 (12 female & 12 male)
[209] | 2010 | Listening Test | English | 4 female & 4 male | 20 (12 female, 18 male)
[204] | 2010 | Test (using headphones in a soundproof | Chinese & Japanese | 2 female & 2 male for each language | 25 (Chinese) & 32 (Japanese)
[207] | 2010 | DCR-Listening Test | Not specified | Not specified | 276 + 32 subjects, (in BIT)

Uzoamaka [209] conducted a thesis on the tonal language Igbo using subjective
listening tests. Twenty-four subjects (12 male and 12 female) listened to 200 speech
samples and gave a score for each sample. The Igbo results were used to find the
correlation with the PESQ P.862.2 results for the Igbo dataset, which was then
compared with the corresponding correlation for Dutch. The comparison showed
that Igbo (r=0.88) has a slightly higher correlation than Dutch (r=0.84). This work
also found that the correlation between the subjective results for tone and non-tone
sentences was very high (r=0.96). It was claimed to be the first research on a tonal
language in Africa. However, in this work each subject listened to speech samples
that had not been encoded with a codec such as G.711 or G.729.

TABLE 2-34 Example of MOS from different languages

No. of
Codec Method Condition Language MOS Remarks Ref.
English ~4.15 Approximated
No loss, specified
ACR from the [210]
G.729 no delay Not
Japanese ~3.4 figures
(8 kbps) specified

 0.71
Clean American- 3.69 Psytechnics
ACR 32 [211]
Speech English (Report)
G.711 American-
Clean 32 4.05 8 kHz
A-law ACR English [212]
Speech sampling rate
(64 kbps) Korean 32 4.41

Clean French 32 4.41 16 kHz

ACR [212]
Speech Chinese 32 4.23 sampling rate
 0.834
(64 kbps) American- 4.26
-26dBov, 32 Dynastat Lab.
DCR Clean [213]
 0.752
Speech Finnish 32 Nokia Lab.

Ren et al. [217] presented articles based on Chinese, which is a tonal
language. Further details are found in 5.6.3; those works mainly presented E-model
2.5.3 Previous Research on E-Model Enhancement
Although the E-model is very popular and widely used, it still has disadvantages
[218]. For example, it has been pointed out that its listening quality evaluation is less
accurate than that derived by a signal-based algorithm, that it does not consider the
variability of network delays and loss rates, and that it does not take account of the
interaction between different factors (e.g. the interplay between network delay and
loss rates). Also, the E-model may require verification by field surveys or laboratory
tests, as well as proper calibration.
Several research works have proposed enhancements to objective measurement,
including an extended E-model, E-model enhancement or E-model improvement
[183, 219-224]. M. Voznak et al., who presented E-model modification and
improvement in [219-220], stated that the development of the E-model has not been
successfully completed. Those works argue that the current version of the E-model
does not reflect reality and state pointedly that the SG12 group failed to address
significant influences (e.g. codec tandeming). Also, H. Zhang et al. [183] presented
an enhanced E-model using parameter B, a new measure of packet loss burstiness,
whereas Zhen and Lin [221] proposed an extension of the E-model, as in Figure 2-40.
This proposed model is composed of a codec module, a delay module and a packet
loss module. These three modules are finally integrated with the standard E-model to
obtain the result called MOS_E. However, this proposed model is based on objective
measurement only; for example, Iec is calculated by applying PESQ.

FIGURE 2-40 The overall structure of the extended E-model

Ding and Goubran proposed the extended E-model using the modified Ie and a
new parameter, called jitter impairment factor Ij, as shown in Eq. (2-4) and (2-5)
respectively [222].

$$I_e = I_{e\_opt} + C_1 \ln\left(1 + C_2 \cdot \mathit{loss\_rate}\right) \tag{2-4}$$

$$I_j = C_1 H^2 + C_2 H + C_3 + C_4 e^{-T/K} \tag{2-5}$$


In Eq. (2-4), Ie_opt is the optimum value of Ie (without packet loss), loss_rate is the
packet loss in percent, and C1 and C2 are constants that vary depending on the
codec and the packet loss rate. In Eq. (2-5), H is a delay distribution parameter and
T relates to the buffer size, whereas C1-C4 are coefficients and K is a time constant
(see [222] for more details). These equations have been applied in [223] to determine
the maximum number of calls for given bandwidth capacities while maintaining a
certain level of QoS.
E-model enhancement using the jitter impairment factor Ij has also been found in
[224], with different forms of Ij (see [224] for further information).
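
To make the two impairment formulas concrete, they can be evaluated numerically. The following is only a minimal sketch of Eq. (2-4) and (2-5): the function names are introduced here for illustration, and the coefficient values in the example are hypothetical placeholders, not the values fitted in [222], which depend on the codec and network conditions.

```python
import math


def loss_impairment(ie_opt, c1, c2, loss_rate):
    """Eq. (2-4): Ie = Ie_opt + C1 * ln(1 + C2 * loss_rate).

    loss_rate is the packet loss in percent.
    """
    return ie_opt + c1 * math.log(1.0 + c2 * loss_rate)


def jitter_impairment(c1, c2, c3, c4, h, t, k):
    """Eq. (2-5): Ij = C1*H^2 + C2*H + C3 + C4 * exp(-T/K)."""
    return c1 * h ** 2 + c2 * h + c3 + c4 * math.exp(-t / k)


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to exercise the formula.
    example = dict(ie_opt=10.0, c1=20.0, c2=0.5)
    for loss in (0.0, 1.0, 3.0, 5.0):
        ie = loss_impairment(loss_rate=loss, **example)
        print(f"loss = {loss:.0f}%  ->  Ie = {ie:.2f}")
```

With zero packet loss the logarithmic term vanishes, so Ie reduces to Ie_opt, which matches the description of Ie_opt as the value without packet loss.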
The work most similar to this research is [217]. Ren et al. presented an article,
entitled Assessment of Effects of Different Language in VoIP, on experiments
investigating the effects of different languages on perceived voice quality under
different VoIP system factors (delay, loss and codec) by testing with English and
Chinese. In this work PESQ P.862, an objective measurement method, was used to
evaluate the sound quality by testing speech samples (English and Chinese) that
included one male and one female speaker for each language. The speech samples
were recorded in 16-bit, 8 kHz linear PCM format and were selected from the
language speech sets. They were similar in tempo and perceived quality. Each was
about 8 s long and had about 50% active speech intervals. They did not find that
short delays (0-200 ms) affected the languages differently, but found that the
G.729 codec degraded Chinese speech samples more than English speech samples,
compared with the G.711u codec. It was also claimed that English speech samples
were always received with better voice quality than other languages. However, the
important part of this article is the presentation of the enhanced E-model, which adds
a new impairment factor Il, called the language impairment factor, as in the equation below:

$$I_l = \begin{cases} C_1 + C_2 \cdot PPL & \text{for Chinese} \\ 0 & \text{for English speakers} \end{cases} \tag{2-6}$$

where C1 = 0.52819, C2 = -0.574391 and PPL is the packet loss percentage.
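
As an illustration, Eq. (2-6) can be written as a small function. This is a sketch only: the constants are those quoted above from [217], while the function name and the use of language labels as strings are assumptions made here. How Il is combined with the other E-model terms is not shown, because it is not specified in this section.

```python
# Constants of Eq. (2-6) as quoted from [217].
C1 = 0.52819
C2 = -0.574391


def language_impairment(ppl, language):
    """Return Il for a packet loss percentage `ppl` (Eq. 2-6).

    Only the two cases given in the equation are handled.
    """
    if language == "Chinese":
        return C1 + C2 * ppl
    if language == "English":
        return 0.0
    raise ValueError(f"Eq. (2-6) defines no Il for language {language!r}")


if __name__ == "__main__":
    for ppl in (0.0, 1.0, 5.0, 10.0):
        print(ppl, language_impairment(ppl, "Chinese"), language_impairment(ppl, "English"))
```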

At present, although several proposals present methods to improve or enhance
the standard E-model, for example by using new factors such as the jitter
impairment factor and the language impairment factor, a final or perfect E-model
has not yet been established.
Moreover, from all the previous research reviewed, it can be summarized that
there is no E-model enhancement/extension/improvement/modification that uses
subjective MOS from subjective measurement methods.

2.6 Selected Tools for Implementation of the Testbed VoIP System

2.6.1 VoIP Software: Asterisk
At present, several manufacturers, such as Avaya and Cisco, supply VoIP products
to the telecommunication market. However, those VoIP systems are very expensive
[225], particularly for SMEs. Therefore, economical VoIP systems built from open
source software have become an alternative solution.
Asterisk is a free package of open source VoIP software. Its code was originally
created in 1999 by Mark Spencer, the Linux expert who is the founder and CTO of
Digium, Inc. [226-228]. Because Asterisk is fully open source, it is available for
download free of charge. It has been claimed that it has become the world’s leading
and most popular open source project for VoIP/IP telephony [228-229]. In Thailand,
Asterisk is very popular and widely used, even by students and academics in
universities, as found in [230-236].
The Asterisk architecture is presented in Figure 2-41 [237]. Asterisk consists of
several APIs and parts that work together, as described in Table 2-35 [238-241].
Asterisk can be installed on a Unix-like operating system, such as FreeBSD, Ubuntu
or CentOS. On FreeBSD, which has been chosen for this study, important files, as in
Table 2-36, must be changed after installing Asterisk so that it operates properly
according to the related design and conditions [238].
2.6.2 Network Emulator: Dummynet
To study the voice quality of VoIP with respect to network factors (e.g. packet loss
and packet delay), a network emulator is required to generate those effects.
There are several freely available and widely used network emulators [242]. These
tools help researchers to experiment with scenarios that cannot easily be conducted
in a real network, for example, to determine the behavior of traffic or protocols in a
complex network made of many nodes, routers and links with different conditions,
such as queuing policies and sizes, bandwidths, packet delays, packet losses and
packet reordering. Moreover, network emulators have the advantage that traffic
with the required conditions can be generated in a controllable and reproducible way.
Rizzo [243] proposed a simple but very effective approach that places a
standalone system called “Dummynet” into the studied network to conduct
experiments. It was originally developed as a component of FreeBSD, where it has
been available for over a decade. It is now also available for some other operating
systems (e.g. Mac OS X, Linux and Windows) [244]. With the features listed in
Table 2-37, Dummynet works by simply intercepting communication between the
protocol layer under analysis and the underlying one, and simulating the presence of
a real network with network effects such as limited bandwidth, packet delays and
packet losses [242-244]. This approach combines the advantages of simulation and
real-world testing: experiments are conducted with a workstation or PC as the traffic
generator, without modifying the real-world application.
Running an experiment using Dummynet is as easy and quick as running an
application on a PC. Moreover, Dummynet adds almost no overhead to the
communication, meaning experiments can be conducted up to the maximum
performance of the system in use.

FIGURE 2-41 Asterisk architecture


TABLE 2-35 Description of each Asterisk part based on its architecture

Component | Description
Channel API | A channel is an API that Asterisk uses to interface with the PSTN (e.g. ISDN), Voice over IP with various IP signaling protocols (e.g. SIP, IAX, H.323, MGCP and Skinny) and miscellaneous channels (e.g. ACD).
Codec Translation API | A codec is a combination of coder/decoder or compressor/decompressor, which is used to change an analog voice signal into a digital data stream and to change the data stream back into an analog voice signal. Asterisk supports many codecs, for example, G.711 alaw, G.711 ulaw, G.722, G.723.1, G.726, G.729, GSM, iLBC, Speex and LPC10.
File Format API | It is an API which is used to store audio data such as voicemail and music on hold, which can be found in the directories /var/spool/asterisk/vm and /var/lib/asterisk/sounds respectively. Asterisk supports a variety of file formats (e.g. MP3, raw, pcm, vox, wav and
Application API | It is an API which is used to support applications, for example, voicemail, conferencing, paging and other custom applications.
PBX Switching Core | It is the important part of Asterisk which is used to receive telephone calls from the interfaces (channel APIs) and to handle calls by following a dial plan.
Application Launcher | It is mainly the middleware between the PBX switching core and the application APIs. For example, it is used by the PBX switching core to ring IP phones, to dial out on outgoing trunks and to connect to voicemail. Moreover, it is used to interface with the CDR core.
Codec translator | It is the part that is used to connect, for example, a channel compressed with G.711alaw to another channel compressed with G.729 seamlessly by
Scheduler and I/O Management | This part is used to handle related applications and drivers so that they operate efficiently under the system conditions.
Dynamic Module Loader | When Asterisk starts, this part is used to load and initialize drivers that provide, for example, channel drivers, file formats, call detail recording back ends, codecs and applications.
CDR Core | CDR stands for Call Detail Recording; the CDR core is mainly used to record details of outgoing calls (e.g. calling and called numbers, duration and the trunk number).

TABLE 2-36 Important Asterisk files based on FreeBSD

File Name | Function
/etc/rc.conf (Note: for FreeBSD OS only) | This file is used to assign the IP address, subnet mask, gateway, hostname and other parameters to the Asterisk server, and to enable the server to run automatically every time after start-up.
extensions.conf | It is used to assign the extensions and the dial plan that are run in the Asterisk system.
Sip.conf/iax.conf/mgcp.conf | These files are used to assign accounts and other parameters to IP phones and other devices, based on their IP signaling protocols.
features.conf | It is used to set up telephone features (e.g. call park and call pickup).

TABLE 2-37 Dummynet features

Feature | Availability | Remark
Time resolution | Y | System clock (up to 10 kHz)
Interception | Y | Input and output
Latency | Y | Constant value
Bandwidth limitation | Y | Simply apply by using “pipe”
Packet classifier | Y | By using the rules to define the match criteria (e.g. match address, ports, protocol and sockets)
Multipath and multihop | Y | By using classifier options
Queue management | Y | FIFO queues are the default; RED and GRED queues are also supported.
Packet dropping | Y | Uniform random loss patterns are supported for non-congestion-related
Packet scheduling | Y | By using “queue” and a mechanism to load and configure at runtime specific link scheduling algorithms.
Others (e.g. packet reordering, duplication and corruption) | N |

The basic object made available by Dummynet is called a “pipe”, which has a
given bandwidth, queue size and delay to generate effects on the traffic, as shown in
Figure 2-42 [244], whereas the packet classifier called “ipfw” is used to match
packets against a list of numbered rules, called a “ruleset”. These can therefore be
combined to work in several ways [243-244]. This is very useful for researchers in
evaluating the performance of a system/network, including the voice quality
provided by a VoIP system under network factor effects.
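
The relationship between a pipe and an ipfw rule can be sketched with a small helper that builds the two commands for one loss/delay condition. This is an illustrative sketch, not the configuration used for the testbed: the helper name, the pipe number and the rule number are assumptions, and the "from any to any" match is deliberately broad. The `pipe ... config` keywords `bw`, `delay` and `plr` are the standard ipfw options for bandwidth, propagation delay in milliseconds and packet loss rate (a fraction between 0 and 1).

```python
def dummynet_commands(pipe_id, loss_percent, delay_ms, bandwidth=None, rule_number=100):
    """Build ipfw commands for one emulated condition (hypothetical helper).

    Returns the rule that sends traffic through the pipe, followed by the
    pipe configuration. Nothing is executed here; the strings would have to
    be run with root privileges on the emulator host.
    """
    if not 0 <= loss_percent <= 100:
        raise ValueError("loss_percent must be between 0 and 100")
    if delay_ms < 0:
        raise ValueError("delay_ms must not be negative")

    rule = f"ipfw add {rule_number} pipe {pipe_id} ip from any to any"

    config = ["ipfw", "pipe", str(pipe_id), "config"]
    if bandwidth is not None:
        config += ["bw", bandwidth]          # e.g. "1Mbit/s"
    config += ["delay", f"{delay_ms}ms"]     # propagation delay
    config += ["plr", f"{loss_percent / 100:g}"]  # loss rate as a fraction
    return [rule, " ".join(config)]


if __name__ == "__main__":
    # Example: 3% packet loss with 400 ms delay.
    for command in dummynet_commands(pipe_id=1, loss_percent=3, delay_ms=400):
        print(command)
```

Keeping the classifier rule separate from the pipe configuration mirrors the design described above: the rule decides which packets are affected, while the pipe decides what happens to them, so one condition can be changed by reconfiguring the pipe alone.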
Dummynet is very useful and widely used [132, 217, 245-254]. However, it has a
few limitations. For example, errors may occur in reproducing the timing computed
by the model, although running Dummynet on an operating system with an appropriate
load can reduce the chance of such errors. Further information can be found in

FIGURE 2-42 The structure of a dummynet “pipe” with configurable parameters

2.7 Statistical and Mathematical Tools

2.7.1 Hypothesis Tests
A hypothesis test is a type of statistical inference, which is the procedure of
generalizing from data collected in a sample to the characteristics of a population. It
uses the data from a sample to decide between a null hypothesis (H0), which usually
makes a specific claim about the parameters, and an alternative hypothesis (H1),
which states that the null hypothesis is false, possibly in a particular way. For
example, in one study, the null hypothesis may be that the subjective MOS from
G.711 is equal to the subjective MOS from G.722, while the alternative hypothesis is
that the subjective MOS from G.711 is not equal to the subjective MOS from G.722.
H0 and H1 can be represented respectively as follows:
H0: subjective MOS from G.711 is equal to subjective MOS from G.722
H1: subjective MOS from G.711 is not equal to subjective MOS from G.722
To perform a hypothesis test, it is necessary to gather data from a sample drawn
from the group of subjects that can be used to discriminate between the hypotheses.
For example, if the hypotheses involve the value of a parameter of the group of
subjects, then the data should provide a sample estimate corresponding to that
parameter. Next, it is necessary to calculate a test statistic that reflects how different
the data in the sample are from what would be expected if the null hypothesis were
true. To obtain the p-value, it is important to know the distribution of the test
statistic. In general, this involves knowing the degrees of freedom associated with the
test statistic and the form of the distribution (e.g. t and F). If the p-value is lower than
the significance level, the null hypothesis is rejected and the alternative hypothesis is
accepted. On the other hand, if the p-value is higher than the significance level, the
null hypothesis is not rejected.
Therefore, the significance level is the breakpoint for deciding whether to reject
the null hypothesis. Normally it is indicated by the symbol α, which is established
before calculating the p-value of the test statistic. In many research areas, a standard
α of 0.05 is widely used. This means that the null hypothesis is rejected only when
results as inconsistent with it as those observed in the sample would have at most a
5% chance of occurring if it were true. The steps for a hypothesis test can be
summarized as follows:
1) Determine the null and alternative hypotheses
2) Draw a sample from the group of subjects of interest
3) Collect data
4) Calculate a test statistic based on the data
5) Find the p-value of the test statistic
6) Discriminate whether the null hypothesis is rejected or not
In this thesis, the t-test is used to compare the results from two groups of
subjects to determine whether they are the same or different, whereas the Analysis of
Variance (ANOVA) is used for comparison among the results from at least three
groups of subjects. The simplest way to decide whether the null hypothesis is
accepted or rejected is to consider the p-value: if its value is less than the
significance level α of 0.05 (i.e. at the 95% confidence level), the null hypothesis is
rejected and the alternative hypothesis is accepted instead. There are several
equations for the t statistic and F statistic, used for various cases. For the
standard between-subjects t-test and the one-way between-subjects ANOVA, the
t statistic and F statistic can be found using the equations in Table 2-38 [255-257].
Further information about the t-test and ANOVA can be found in [255-259].

TABLE 2-38 Comparison of t-tests for two groups and ANOVA for three groups

t-test (t statistic)

$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Where $\bar{x}_1$ and $\bar{x}_2$ are the means of two groups; $s_1^2$ and $s_2^2$ are the variances
of two groups; and $n_1$ and $n_2$ are the sample sizes of two groups.

ANOVA (F statistic)

$$H_0: \mu_1 = \mu_2 = \mu_3 \qquad H_1: \text{Not all of the means are equal}$$

$$F = \frac{\dfrac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + n_3(\bar{x}_3 - \bar{x})^2}{I - 1}}{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2}{N - I}}$$

Where $\bar{x}_1$, $\bar{x}_2$ and $\bar{x}_3$ are the means of the three groups and $\bar{x}$ is the mean of all;
$s_1^2$, $s_2^2$ and $s_3^2$ are the variances of the three groups; $n_1$, $n_2$ and $n_3$ are the sample sizes
of the three groups; $N$ is the total sample size, and $I$ is 3 for three groups.
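
The two statistics of Table 2-38 can be computed directly from raw scores. The sketch below uses only the Python standard library; the function names and the score lists are hypothetical and serve only to illustrate the formulas. It stops at the test statistics, since obtaining the p-value additionally requires the t or F distribution (for example from a statistics package such as SciPy).

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group1, group2):
    """t statistic in the form of Table 2-38 (sample variances, two groups)."""
    n1, n2 = len(group1), len(group2)
    standard_error = sqrt(variance(group1) / n1 + variance(group2) / n2)
    return (mean(group1) - mean(group2)) / standard_error


def f_statistic(*groups):
    """One-way between-subjects F statistic for I groups (Table 2-38 uses I = 3)."""
    i = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (i - 1)
    within = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - i)
    return between / within


if __name__ == "__main__":
    # Hypothetical opinion scores on the 5-point scale for three codecs.
    codec_a = [4, 5, 4, 4, 3, 5]
    codec_b = [4, 4, 3, 4, 4, 5]
    codec_c = [3, 3, 2, 4, 3, 3]
    print("t (a vs b):", t_statistic(codec_a, codec_b))
    print("F (a, b, c):", f_statistic(codec_a, codec_b, codec_c))
```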

2.7.2 Model Fitting

There are two frequently cited statements about the term ‘model’, as follows
[260]:

“…, a model is not verifiable directly by an experiment. For all models are both
true and false… The validation of a model is not that it is “true” but that it generates
good testable hypotheses relevant to important problems.” by R. Levins
“All models are wrong, but some are useful.” by George E.P. Box
A model is a mathematical description of a state or process. It is used to
understand that process/mechanism. Nonlinear regression can be applied to fit a
mathematical model to the data in order to determine the best-fit values of the
parameters of the model.
The purpose of a model is not to describe the system perfectly, because a
perfect model may have too many parameters to be useful. Therefore, building a
model means finding the simplest model that can describe the system and fit the
data.
Model fitting is thus the method of fitting a model to experimental data or of
choosing which model best fits the data [261]. The principle of model fitting can be
described as in Figure 2-43 [262]. Using a model, which may include the object as
well as the instrument, modeled data m(p) are computed from the parameters p. The
modeled data are then compared with the real data d to obtain the residuals r. The
fitting may need to be repeated with new sets of parameters p in order to minimize
the residuals.
Nowadays, several tools are available that can create both linear and non-linear
regression models [263-264], including the trendline in Microsoft Excel, and the
curve fitting toolbox and the surface fitting toolbox in Matlab, which are appropriate
for one variable and two variables respectively.
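
The fitting principle described above (parameters p, modeled data m(p), real data d and residuals r) can be sketched for the simplest case, a straight line fitted by least squares. This is a minimal illustration under stated assumptions: the function names and the data points are hypothetical. For a linear model the least-squares parameters have a closed form, so no repetition is needed; a nonlinear model would instead adjust p iteratively until the residuals stop decreasing.

```python
def fit_line(xs, ds):
    """Least-squares parameters p = (intercept, slope) for m(p) = p0 + p1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_d = sum(ds) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxd = sum((x - mean_x) * (d - mean_d) for x, d in zip(xs, ds))
    slope = sxd / sxx
    intercept = mean_d - slope * mean_x
    return intercept, slope


def modeled_data(p, xs):
    """Compute the modeled data m(p) for every x."""
    intercept, slope = p
    return [intercept + slope * x for x in xs]


def residuals(p, xs, ds):
    """Residuals r = d - m(p)."""
    return [d - m for d, m in zip(ds, modeled_data(p, xs))]


if __name__ == "__main__":
    # Hypothetical data: a quality score observed at several packet loss rates (%).
    loss = [0, 1, 2, 3, 5]
    score = [4.2, 4.0, 3.7, 3.5, 3.0]
    p = fit_line(loss, score)
    r = residuals(p, loss, score)
    print("parameters:", p)
    print("sum of squared residuals:", sum(e * e for e in r))
```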

FIGURE 2-43 Model fitting process overview

2.7.3 Model Evaluation Tools: Mean Absolute Error and Mean Absolute
Percentage Error
After a new model has been obtained, it must be evaluated by comparison with
the old or existing model. In this thesis the Mean Absolute Error (MAE) and Mean
Absolute Percentage Error (MAPE) [265-267], which are simple model evaluation
methods, have been selected to evaluate the enhanced E-model (E2-model) and the
ThaiVQE model in comparison with the standard E-model, which is the objective
measurement method to be enhanced. MAE is sensitive to small deviations from zero
and can be considered a robust measure of the accuracy of a model. It tends to
prefer models that produce occasional large failures or errors. However, in this thesis
MAE is based on the 5-point scale (MOS), which may be inconvenient to interpret.
For this reason MAPE is used for evaluation as well, because it expresses the errors
as percentages. The MAE and MAPE equations are as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \hat{x}_i \right|$$

$$\mathrm{MAPE} = \left( \frac{1}{n} \sum_{i=1}^{n} \frac{\left| \hat{x}_i - x_i \right|}{x_i} \right) \times 100\%$$

where $x_i$ is the observed value (the subjective data in this thesis), $\hat{x}_i$ is the
estimated or predicted value from the model, and $n$ is the number of instances in
the dataset.
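
Both measures are straightforward to compute. The following sketch implements the two definitions above; the function names and the example values are hypothetical and are not results of this thesis.

```python
def mae(observed, predicted):
    """Mean Absolute Error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - x_hat) for x, x_hat in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean Absolute Percentage Error in percent.

    Observed values must be non-zero, which holds for MOS (1 to 5).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(x_hat - x) / x for x, x_hat in zip(observed, predicted))
    return 100.0 * total / len(observed)


if __name__ == "__main__":
    # Hypothetical subjective MOS and model predictions for four conditions.
    subjective = [4.2, 3.8, 3.1, 2.4]
    model = [4.4, 3.9, 3.3, 2.9]
    print("MAE  =", mae(subjective, model))
    print("MAPE =", mape(subjective, model), "%")
```

Because MAE is expressed in MOS units while MAPE is relative to each observed value, the same absolute error weighs more heavily in MAPE when the observed MOS is low.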

The methodology for subjective and objective measurement in this research, as
shown in Figure 3-1, is divided into five phases, which are described in this
chapter. These are the experimental design phase (Phase I), the preparation phase
(Phase II), the pilot phase (Phase III), the intensive subjective test phase (Phase IV),
consisting of listening opinion tests, conversation opinion tests and interview tests,
and the objective test phase (Phase V) with PESQ and E-model. However, details of
the Thai bias factor, ThaiVQE and the E2-model are presented in Chapter 5.

FIGURE 3-1 The methodology for objective measurement method enhancement
using Thai bias factor

3.1 Phase I: Experimental Design

3.1.1 Subjective Test Design
Conversation opinion tests have been selected for the intensive subjective tests
because their advantages include the ability to assess voice quality with respect to
packet delay, which ACR tests do not support, and because results can be gathered
from each pair of subjects in each round. G.711A-law has been selected because it is
a popular codec that provides very good voice quality compared to other narrowband
codecs [16] and is generally used in Thailand, while G.722, G.729 and G.723.1
(5.3 kbps) have also been selected for comparison. G.722 has been selected because it
is a wideband codec whose maximum bandwidth for the voice payload is the same as
that of G.711 (64 kbps). G.729 has been selected because it is a popular codec for
use over wide area networks, while G.723.1 (6.3 kbps) is an option. As network
factors, packet loss and packet delay have been selected because they are the major
network factors that degrade voice quality and the major factors in the VoIP research
However, before conducting the actual conversation opinion tests, pilot tests
were also designed, for the following reasons:
1) To verify the VoIP testbed system.
2) To reveal unexpected problems that may occur.
The pilot tests were divided into ACR listening opinion tests and interview
tests. ACR listening opinion tests were selected mainly because they are claimed to
be the most popular listening opinion tests, while the interview tests were selected to
work around a limitation of the VoIP testbed system when testing the wideband
codec G.722: the Asterisk VoIP system used as the testbed cannot play wave files in
wideband format. All subjective tests are summarized in Tables 3-1, 3-2 and 3-3.
For the conversation opinion tests, G.711A-law has been tested intensively
because it provides better voice quality than other codecs, such as G.729 and G.723.
Thus the results obtained with G.711A-law can be used as the benchmark for
other codecs. However, because the expected order of voice quality is G.722 >
G.711 > G.729, G.722 and G.729 have also been tested, under some packet loss and
packet delay conditions only, to provide control values for the G.711A-law results.
From Tables 3-1 and 3-2, the pilot phase is designed to have 272 subjects in total,
while the conversation opinion tests were designed to intensively test packet loss of
0, 1%, 2%, 3%, 5%, 6% and 10% (10% is 2 times the maximum packet loss rate
recommended by the manufacturer) and packet delay of 0 ms, 400 ms and 800 ms
(800 ms is 2 times the maximum packet delay recommended by ITU-T). However,
testing with packet loss of 15% and 20% and packet delay of 1500 ms has also been
conducted as an option, only to see the trend. Therefore the conversation opinion
tests required 1,008 subjects in total. The prospective subjects are expected to be
KMUTNB students, particularly undergraduate students.

TABLE 3-1 Summary about the pilot tests

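Test | Codec | Condition | Minimum No. of Subjects
Pilot – ACR Tests | G.711 | Direct | 24
Pilot – ACR Tests | G.729 | Direct | 24
Pilot – ACR Tests | G.723.1 (6.3 kbps) | Direct | 24
Pilot – Interview Test | G.711 | Direct | 100
Pilot – Interview Test | G.722 | Direct | 100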

TABLE 3-2 Summary about the conversation opinion tests, each scenario required at
least 24 subjects (total 576 subjects)

Scenario | Condition: Packet Loss (%) | Condition: Packet Delay (ms) | Minimum No. of Subjects
0A | 0 | 0 | 24
1A | 1 | 0 | 24
2A | 2 | 0 | 24
3A | 3 | 0 | 24
5A | 5 | 0 | 24
6A | 6 | 0 | 24
10A | 10 | 0 | 24
15A | 15 | 0 | 24
20A | 20 | 0 | 24
0B | 0 | 400 | 24
3B | 3 | 400 | 24
5B | 5 | 400 | 24
10B | 10 | 400 | 24
20B | 20 | 400 | 24
0C | 0 | 800 | 24
3C | 3 | 800 | 24
5C | 5 | 800 | 24
10C | 10 | 800 | 24
20C | 20 | 800 | 24
0D | 0 | 1500 | 24
3D | 3 | 1500 | 24
5D | 5 | 1500 | 24
10D | 10 | 1500 | 24
20D | 20 | 1500 | 24

TABLE 3-3 Test scenarios

Packet Loss (%) | Packet Delay 0 ms | Packet Delay 400 ms | Packet Delay 800 ms | Packet Delay 1500 ms
0 | 0A* | 0B* | 0C* | 0D*
1 | 1A* | | |
2 | 2A* | | |
3 | 3A* | 3B* | 3C* | 3D*
5 | 5A* | 5B* | 5C* | 5D*
6 | 6A* | | |
10 | 10A* | 10B* | 10C* | 10D*
15 | 15A* | 15B | 15C | 15D
20 | 20A* | 20B* | 20C* | 20D*
Notes: * = Conversation opinion tests with G.711A-law, G.729 and G.722
* = Conversation opinion tests with G.711A-law
A, B, C, D = 0 ms, 400 ms, 800 ms and 1500 ms respectively

3.1.2 Objective Test Design

Objective tests require an objective measurement tool. PESQ and E-model have
been considered. For PESQ, collaboration can be requested from the TOT Innovation
Institute free of charge, while for the E-model tool, collaboration can be requested
from a company in Bangkok, which may charge a rental fee. However, PESQ has been
reported to have content dependency; therefore, this issue needs to be investigated
by tests in the pilot phase as well.
Both PESQ and E-model require each condition to be run at least 30 times,
following [189, 224], before outliers are discarded. Each condition should be tested
for about 2-3 minutes. Moreover, since these are objective tests, all scenarios
shown in Table 3-3 will run with the selected objective measurement

3.2 Phase II: Preparation Phase

3.2.1 Preparation of Facilities and Materials
The required facilities for this research consist of:
1) Two soundproof rooms: at least one soundproof room is used for the ACR
and interview tests, whereas two soundproof rooms are required for conversational
2) One sound level meter and reverberation time analyzer: these are required
for preparation of the soundproof rooms.
3) VoIP testbed system: this system consists of the VoIP system implemented
using Asterisk (open source software), the network emulator implemented using
dummynet, two small switches and 2-3 IP phones.
4) A set of Thai speech: this is taken from TSST, which has been designed
and developed based on the ITU-T P.800 recommendation. It is required for the
ACR tests in the pilot phase.
5) Three IP phones: only two IP phones are actually required for testing; the
additional one is a spare. These IP phones must support the codecs G.711, G.729,
G.723.1 and G.722.

For the soundproof rooms, a survey at the start of this thesis found no
available soundproof room complying with the ITU-T P.800 standard. Therefore, to
limit costs, the studio room on the 7th floor of the Central Library, KMUTNB, was
selected (its plan is shown in Figure 3-2 [268]). The room was modified by laying
carpet and adding a second layer of glass to the windows in order to reduce the room
noise and reverberation time, which are two important acoustic properties. A sound
level meter and reverberation time analyzer were required to measure those two
values (measurement of the sound level of the background noise is shown in
Figure 3-3). Figures 3-4 and 3-5 present the major changes ‘before’ and ‘after’, while
the measurement results ‘before’ and ‘after’ modification of the room are presented
in Table 3-4 [268].
The VoIP testbed system consists of two computers, as shown in Figures 3-6
and 3-7. The first computer was installed with Linux and then with Asterisk, an open
source VoIP application, to become the testbed system. The second computer was
installed with Linux and then with a dummynet module to work as the network
emulator that generates delay and loss for each test scenario. The IP phones were a
set of IP phones that can be bought in Bangkok. They are required because this
research is intended to study ‘real’ IP telephone systems with Thai users.

FIGURE 3-2 Top view of the plan of the studio room, which has been improved to
be a laboratory for this research

FIGURE 3-3 The background noise was checked before testing


FIGURE 3-4 Example of the window sill (a) before installing the second layer of
glass and (b) after installation of the second layer of glass


FIGURE 3-5 Example of room floor (a) before installing the carpet and (b) after
installation of the carpet

TABLE 3-4 Comparison of important values and properties of the modified room

‘Before’ ‘After’ ITU-T P.800

Room Size (m3) 152.81 120
Room Noise (dBA) 33.0 30.5 50.0
Average Reverberation Time (ms) 317 220 200 – 300
Standard Deviation of Reverberation Time (ms) 46.8 77.3 N/A

FIGURE 3-6 Diagram of the VoIP testbed system


FIGURE 3-7 The VoIP testbed system, while setting up, which consists of an IP
telephone system, a network emulator, a switch and IP phones
The set of Thai speech is a subset of the Thai Speech Set for Telephonometry
(TSST) [18, 269], which was developed and funded by Human Language Technology
(HLT), National Electronics and Computer Technology Center (NECTEC). TSST
consists of speech from 1 girl, 1 boy, 4 female and 4 male speakers. Each speaker
recorded 25 pairs of sentences (or phrases). However, only 10 pairs of them, as in
Appendix A, have been used for the listening tests of this thesis, by creating 100
speech lists. A systematic-random approach to playing a speech list was then
applied. Firstly, the table of speech lists was created as in Table 3-5, before creating
the systematic lists of speech.

TABLE 3-5 Speech lists

Speaker No. | Speech Group or a Pair of Speech (Gn): 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
1 | G1C1 | G2C1 | G3C1 | G4C1 | G5C1 | G6C1 | G7C1 | G8C1 | G9C1 | G10C1
2 | G1C2 | G2C2 | G3C2 | G4C2 | G5C2 | G6C2 | G7C2 | G8C2 | G9C2 | G10C2
3 | G1F1 | G2F1 | G3F1 | G4F1 | G5F1 | G6F1 | G7F1 | G8F1 | G9F1 | G10F1
4 | G1F2 | G2F2 | G3F2 | G4F2 | G5F2 | G6F2 | G7F2 | G8F2 | G9F2 | G10F2
5 | G1F3 | G2F3 | G3F3 | G4F3 | G5F3 | G6F3 | G7F3 | G8F3 | G9F3 | G10F3
6 | G1F4 | G2F4 | G3F4 | G4F4 | G5F4 | G6F4 | G7F4 | G8F4 | G9F4 | G10F4
7 | G1M1 | G2M1 | G3M1 | G4M1 | G5M1 | G6M1 | G7M1 | G8M1 | G9M1 | G10M1
8 | G1M2 | G2M2 | G3M2 | G4M2 | G5M2 | G6M2 | G7M2 | G8M2 | G9M2 | G10M2
9 | G1M3 | G2M3 | G3M3 | G4M3 | G5M3 | G6M3 | G7M3 | G8M3 | G9M3 | G10M3
10 | G1M4 | G2M4 | G3M4 | G4M4 | G5M4 | G6M4 | G7M4 | G8M4 | G9M4 | G10M4
C1, C2 = the child number 1 and 2 respectively
F1, F2,…,F4 = the female speaker number 1 to 4 respectively
M1, M2,…, M4 = the male speaker number 1 to 4 respectively

 Speech List1 = {G1C1, G2C2, G3F1, G4F2,…, G6F4, G7M1, G8M2,…,

Then the 100 speech lists has been created as follows:

 Speech List2 = {G2C1, G3C2, G4F1, G5F2,…, G7F4, G8M1, G9M2,…,


 Speech List3 = {G3C1, G4C2, G5F1, G6F2,…, G8F4, G9M1, G10M2,…,


 Speech List4 = {G4C1, G5C2, G6F1, G7F2,…, G9F4, G10M1,


G11M2,…, G3M4}
. .
. .

 Speech List100 = {G10M4, G1C2, G2C2, G3F1, G4F2,…, G6F4, G7M1,

. .

G8M2, G9M3}
Secondly, all speech lists are numbered and then they will be mapped to the
announcement number 2001 to 2100 as arranged in the VoIP testbed system.
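The systematic construction of the speech lists can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used: the double-rotation rule (rotate the speech group within each block of ten lists, then rotate the speaker order from block to block) is an assumption inferred from the printed entries of Lists 1 to 4 and List 100, and the function names are hypothetical.

```python
# Hypothetical sketch: the rotation rule is inferred from the printed lists,
# not stated in the thesis.
GROUPS = 10
SPEAKERS = ["C1", "C2", "F1", "F2", "F3", "F4", "M1", "M2", "M3", "M4"]


def speech_list(n):
    """Return speech list n (1..100) under the assumed double-rotation rule."""
    if not 1 <= n <= 100:
        raise ValueError("list number must be between 1 and 100")
    # Lists 1-10 shift the group only; every further block of 10 lists
    # also shifts the speaker order by one position.
    speaker_shift, group_shift = divmod(n - 1, GROUPS)
    items = []
    for pos in range(GROUPS):
        group = (pos + group_shift) % GROUPS + 1
        speaker = SPEAKERS[(pos + speaker_shift) % len(SPEAKERS)]
        items.append(f"G{group}{speaker}")
    return items


def announcement_number(n):
    """Map list n to its announcement number (2001..2100)."""
    return 2000 + n


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 100):
        print(announcement_number(n), speech_list(n))
```

Under this assumed rule every list contains each speech group and each speaker exactly once, which matches the description of the listening test.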
3.2.2 Preparation of Subjects
According to Section 3.1, the minimum number of subjects is 1280 in total (see
Table 3-6). However, the actual number of subjects may be higher, because each test
should include additional subjects to allow for outliers and important missing values
in the gathered test results.
The campaign to interest students in joining the tests used a variety of
techniques, for example, giving a pen and/or a coupon for goods or services from
shops in KMUTNB; billboards were also installed at several points on campus.
Moreover, letters requesting the collaboration of students in the tests were issued
and sent to many lecturers, particularly the lecturers in department of applied

TABLE 3-6 The estimated numbers of subjects

Type of Subjective Tests   No. of Required Subjects (Min.)
ACR - Listening Test       72
Interview Test             200
Conversational Test        1008 (576 subjects for G.711;
                                 432 subjects for G.729 and G.722)

3.3 Phase III: Pilot Phase

3.3.1 ACR Listening Opinion Tests
To ensure the reliability of the tested system and to reveal unexpected issues
that may occur, the ACR listening test using the ‘direct’ system (with delay of < 10
ms and no loss) was conducted for each codec. However, instead of testing with
24 subjects per codec, the minimum number in the design, the pilot phase was
conducted with 36 subjects (intended to be 18 female and 18 male subjects) so that
outliers could be removed.
Following the ITU-T Recommendation [156], the ACR listening tests were
conducted in a studio room on the 7th floor of the Central Library at KMUTNB,
which was improved to become a laboratory for the tests. The overview of the test
facilities is shown in Figure 3-8, whereas Figure 3-9 shows important network
properties of the VoIP testbed system and Figure 3-10 shows the main soundproof

FIGURE 3-8 The overview of VoIP system for the test, which provided an IP phone
for each participant in the soundproof room

FIGURE 3-9 The captured screen shot from investigation of packet delay and packet
loss before testing

FIGURE 3-10 An IP phone and facilities for the pilot test

The subjects were KMUTNB students. A minimum of 24 subjects per scenario was
proposed, although this is lower than the 30 subjects suggested by the original
handbook on telephonometry [160]. It was also suggested that male and female
subjects participate in the test in equal numbers. For ACR tests, the highest and the
lowest MOS-LQS [270] in each subgroup of subjects were treated as outliers, which
is why the proposed number of subjects for each scenario became at least 28-36.
In each ACR test, subjects took part one at a time. After reading the instructions
and hearing them again briefly, the subject sat down and randomly drew an extension
number of the announcement linked to a wave file of a speech list. The subject then
dialed the number to listen to the speech list once via an IP phone, and evaluated it
using a paper-based form, as shown in Appendix B. Each speech list consists of 10
different speech groups, each about 8 seconds long, one from each of the 10 speakers
(1 girl, 1 boy, 4 women and 4 men). Every participant therefore listened to all 10
sentence groups and all 10 speakers, and gave 10 scores on the 5-point scale
(5=excellent, 4=good,..., 1=bad). While listening, each subject heard a ‘beep’ tone
before each speech group as a cue to get ready (the 1st and 6th speech groups start
with a double beep tone), as in Figure 3-11 [18].
After the first round of data gathering, the data were screened for outliers, for
abnormality in each subgroup, and for abnormal data from participants whose scores
were too widely spread, for example, giving 2, 3, 4 and 5 when listening to a speech
list that was played with the same codec. Abnormal data, referred to as outliers, were
discarded. The test was then run again until a satisfactory number of subjects was
reached in each subgroup.
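The screening described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually coded for the thesis: the spread threshold `max_spread` is hypothetical (chosen so that votes of 2, 3, 4 and 5 for one codec are rejected), the subject identifiers and votes are made up, and the function name is an assumption.

```python
from statistics import mean


def screened_mos(subgroup_votes, max_spread=2):
    """MOS-LQS of one subgroup after the two screening rules.

    subgroup_votes maps a subject id to that subject's list of votes (1..5).
    max_spread is a hypothetical threshold: a subject whose votes differ by
    more than this amount is treated as giving scores that are too broad.
    """
    # Rule 1: drop subjects whose votes are too widely spread.
    consistent = {
        subject: votes
        for subject, votes in subgroup_votes.items()
        if max(votes) - min(votes) <= max_spread
    }
    if len(consistent) < 3:
        raise ValueError("need at least three consistent subjects")
    # Rule 2: drop the subjects with the lowest and highest average score.
    ordered = sorted(consistent, key=lambda s: mean(consistent[s]))
    kept = ordered[1:-1]
    return mean(v for subject in kept for v in consistent[subject])


if __name__ == "__main__":
    # Made-up votes for illustration only.
    votes = {
        "s1": [4, 4, 4],
        "s2": [5, 5, 5],
        "s3": [3, 3, 3],
        "s4": [4, 5, 4],
        "s5": [2, 3, 5],  # spread of 3, rejected by rule 1
    }
    print(round(screened_mos(votes), 2))
```

If too few subjects remain in a subgroup after screening, the test would be repeated with additional subjects, as described in the text.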

A speech group A beep tone A double beep tone

FIGURE 3-11 A speech list that starts with the speech by Child1 (a girl)

3.3.2 Interview Tests

This part is optional. It studies the perception of Thai users of the G.722
wideband codec compared to the G.711A-law narrowband codec. The study
requires at least 100 subjects per codec, using the same ‘direct’ system as in the ACR
tests, and therefore at least 200 subjects in total. However, subjects who had joined
the ACR listening opinion test within the previous six months were not allowed to
participate in the interview test.
In each interview test, interviewees were invited to sit in the room one by one.
An interviewer outside the room (either male or female) then made a call and
conducted the interview, as in Figure 3-12, which took 3-4 minutes.

FIGURE 3-12 Overview of the interview tests


Before finishing the interview, the interviewee was asked to score the speech
quality provided using G.711 or G.722, on the same scale as in the
ACR tests. The data from all subjects were recorded and gathered by the interviewer
using a paper-based form. The result obtained from each interviewee is only one
3.3.3 PESQ Tests with Thai Speech
These tests were conducted to investigate the language dependency of PESQ
when operating with Thai speech. The PESQ measurement method was used to
estimate the MOS via the VQT, the voice quality tester available at the TOT
Innovation Institute. The codec selected for the test was G.711A-law. Four TSST
speech lists were tested. The Thai speech sets were selected with tone balance in
mind, as in the Appendix, whereas TSST-List10 was selected because its speech is
neither too short nor too long. As recommended, the length of each speech sample
should be around 8-30 seconds [271]. TSST-List2 to TSST-List4 each contain 3
groups of speech sentences of 8-12 seconds each, whereas TSST-List10 contains only
two groups of 30 seconds each.
In addition to the Thai speech samples, American English speech samples were
obtained from the ITU-T website [272] and tested for comparison with the Thai
speech samples. The file format of each speech sample must be:

- Bitrate: 128 kbps
- Sample rate: 8 kHz
- Sample size: 16 bit
- Format: PCM
- Channel(s): mono
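The required parameters are mutually consistent: 8,000 samples per second at 16 bits per sample on one channel gives 128 kbps. The following sketch, using only the Python standard library, checks this arithmetic and shows one possible way to verify that a WAV file matches the required format. The function names are hypothetical and the silent test file is generated purely for illustration; this is not part of the VQT tool.

```python
import io
import wave


def pcm_bitrate_kbps(sample_rate_hz, sample_bits, channels):
    """Bitrate of uncompressed PCM audio in kbps."""
    return sample_rate_hz * sample_bits * channels / 1000


def wav_matches_required_format(wav_bytes):
    """True if the WAV data is 8 kHz, 16-bit, mono, uncompressed PCM."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return (
            w.getframerate() == 8000
            and w.getsampwidth() == 2      # 2 bytes = 16 bit
            and w.getnchannels() == 1      # mono
            and w.getcomptype() == "NONE"  # uncompressed PCM
        )


def make_silent_wav(seconds=1):
    """Build a silent WAV in the required format (illustration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(b"\x00\x00" * 8000 * seconds)
    return buf.getvalue()


if __name__ == "__main__":
    print(pcm_bitrate_kbps(8000, 16, 1))
    print(wav_matches_required_format(make_silent_wav()))
```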

Also, to obtain 95% confidence of the PESQ results from the VQT, called
MOS-LQO [270], each speech list has been repeated at least 30 times, as
recommended in [189].
For the test, the VQT server which also includes the SIP phone function
simulates calls to the SIP phone. Then, the speech samples would be played while
testing before obtaining the MOS-LQO scores, as shown in Figure 3-13 [274].

FIGURE 3-13 Overview of the test system over an IP network

3.4 Phase IV: Intensive Subjective Tests using Conversation Opinion Tests
3.4.1 Conversation Opinion Tests with G.729 and G.722 Referring to Effects of
Loss only and Delay only

These tests were optional and were conducted for comparison with G.711A-law
with respect to loss-only and delay-only effects. The main reason for conducting
them was to provide a reference for the results from G.711A-law. For example, under
the same packet loss scenario (e.g. loss of 3%), if the result from G.711A-law is
higher than that from G.722 or lower than that from G.729, this may imply that the
results from these three codecs are unreliable. In that case, some of the raw data may
be discarded, and the conversation opinion tests must be conducted again for that
scenario.
3.4.2 Conversation opinion tests with G.711A-law referring to the 24 scenarios
This part requires two separate soundproof rooms for the conversation between
two subjects. Following [156], an interactive counting task (1-10) and a Richard’s
task with random shapes [273], as in Figure 3-14, were chosen as the conversation
tasks. This experiment was mainly designed to assess the voice quality of
G.711A-law under the scenarios in Table 3-2. There were therefore 24 conditions in
the experiment, with a minimum of 24 subjects per condition. Participants who had
taken part in the interview and listening tests more than six months earlier were
allowed to join these tests.
The conversation tasks were conducted in the studio at the Central Library,
KMUTNB, which was adapted to become a laboratory for this study. Inside the
studio, the main room that was used for the interview tests and the ACR listening
tests is called ‘Room A’, whereas the existing sound recording room is called ‘Room
B’. The VoIP testbed system includes two IP phones, as in Figure 3-15.

FIGURE 3-14 Example of random shapes


FIGURE 3-15 The overview of the conversation opinion tests in this research

3.5 Phase V: Objective Tests Using E-model

This phase consists of the E-model tests. The results from the pilot test using
PESQ with Thai speech show evidence that PESQ is unreliable when testing with
different sets of Thai speech (see further details in Chapter 4). Therefore, the E-model
was selected to conduct the objective tests in this phase, using the E-model tools
as in Figure 3-16.
In collaboration with a company in Bangkok, the available E-model tool was
used to run all scenarios/conditions in Table 3-3. The results are presented in
Chapter 4.
As a reminder of the assumption of this research, the E-model was developed
and calibrated based on Westerners who use non-tonal languages and have different
cultures. Therefore the results from the E-model tool should be compared with the
results from subjective tests with Thai users to see the differences, and the model
should then be calibrated again for use in Thai environments.

FIGURE 3-16 Diagram of E-model measurement


The results from all subjective and objective tests are presented in this chapter.
The objective method enhancement, which is essential to this research, is presented as

4.1 Pilot Phase Results

4.1.1 ACR Listening Opinion Tests
The results from the ACR listening opinion tests are presented in Figure 4-1. The
perceived voice quality metric, called MOS-LQS [270], is consistent with general
understanding and with the report in [16], which states that G.711 is better than
G.729, while G.729 is better than G.723.1 (5.3 kbps). However, the difference in
MOS-LQS between G.711 and G.729 is only 0.05. Therefore, the comparison focuses
on G.711 and G.729, broken down by type of voice (speech material), i.e. speech
from female, male and child speakers, as in Figure 4-2 [1], to allow investigation in
greater detail. The analysis is presented in the last section of this chapter.
4.1.2 Interview Tests
The results from the interview tests are presented in Table 4-1 and Figure 4-3
[150]. The perceived voice quality scores, called MOS-CQS, provided by the G.711
narrowband codec and the G.722 wideband codec are almost the same. Therefore,
this finding is analyzed in the last section of this chapter.

FIGURE 4-1 The ACR-listening test results, called MOS-LQS, of G.711, G.729 and
G.723.1 at 5.3 kbps with N = 247, 231 and 254 votes from 31, 29 and
32 subjects, and SD of 0.70, 0.70 and 0.80 respectively


FIGURE 4-2 MOS-LQS of G.711 vs G.729 from different types of voices (children,
female and male speakers) with N = 61, 93, 93, 58, 87 and 86
respectively, whereas SD of G.711 and G.729 to those voices are 0.72,
0.66, 0.70, 0.63, 0.72 and 0.72 respectively

TABLE 4-1 Interview Test Results

Codec No. of Subjects MOS-CQS SD

G.711 100 4.14 0.60

G.722 at 64 kbps 101 4.17 0.62

4.1.3 PESQ Tests with Thai Speech

Although all tests were conducted under the same condition, the results differ
between the speech lists adopted from the Thai Speech Set for Telephonometry
(TSST) [18, 269] and the American-English speech set from ITU-T [272]. The
objective MOS results, called MOS-LQO [270], in Figure 4-4 provide evidence that
confirms the content dependency of PESQ [274].

% of Votes

Opinion Score

FIGURE 4-3 Comparison of percent of the votes: G.711 vs G.722 at 64 kbps, by all

Speech set

FIGURE 4-4 The MOS-LQO results of 4 lists of Thai speech and American-English
speech, where N = 48, 90, 90, 90 and 60, and SD = 0.27, 0.32, 0.32,
0.28 and 0.21 for American ITU-T, TSST-List2, TSST-List3, TSST-
List4 and TSST-List10 respectively

4.2 Intensive Subjective Test Results

4.2.1 Comparison of G.722, G.711 and G.729 Referring to Packet Delay
The results of these tests are shown in Table 4-2 and re-presented in
Figure 4-5. All three codecs provide almost the same voice quality (although G.722
is slightly better than G.711, and G.711 slightly better than G.729), except at a
packet delay of 1.5 s, where G.722 was found to be the worst.
4.2.2 Comparison of G.722, G.711 and G.729 Referring to Packet Loss
The results of these tests are shown in Table 4-3 and re-presented in
Figure 4-6 [275]. The results show that, despite the effects of packet loss, the
G.711 narrowband codec provides almost the same voice quality as the G.722
wideband codec, while G.729 is worse than the other codecs, most obviously at
the high packet loss rates of 10% and 20%.

TABLE 4-2 Comparison of MOS-CQS from G.722, G.711 and G.729 referring to
delay effects

Packet       G.722               G.711               G.729
Delay (s)    N    MOS    SD      N    MOS    SD      N    MOS    SD
0            24   4.25   0.68    26   4.15   0.54    24   4.13   0.54
0.4          26   4.04   0.53    28   4.04   0.64    28   4.00   0.61
0.8          30   3.93   0.64    24   3.92   0.65    24   3.88   0.45
1.5          26   3.46   0.65    28   3.68   0.67    26   3.65   0.69


Packet Delay (ms)

FIGURE 4-5 Comparison of G.722, G.711 and G.729 referring to delay effects

TABLE 4-3 Comparison of MOS-CQS from G.722, G.711 and G.729 referring to
loss effects

Packet       G.722               G.711               G.729
Loss (%)     N    MOS    SD      N    MOS    SD      N    MOS    SD
0            24   4.25   0.68    26   4.15   0.54    24   4.13   0.54
2            28   3.93   0.47    24   3.92   0.41    30   3.83   0.38
6            32   3.75   0.51    30   3.67   0.61    32   3.53   0.57
10           30   3.50   0.73    30   3.47   0.57    24   3.04   0.62
20           28   2.93   0.86    24   2.92   0.58    24   2.42   0.97


Packet Loss (%)

FIGURE 4-6 Comparison of G.722, G.711 and G.729 referring to loss effects

4.2.3 Conversation Opinion Tests with G.711 Referring to the 24 Scenarios of
Loss and Delay Effects
The validated results from the conversation opinion tests conducted intensively
with G.711 for the 24 scenarios of packet loss and packet delay are presented in
Table 4-4, whereas Table 4-5 presents the numbers of subjects who participated in
each scenario. These results cover packet loss of 1-20% and packet delay of 0-1.5 s,
and will be applied to the E-model enhancement in Chapter 5.

4.2.4 Expectation of the Subjects/Participants to VoIP Quality

Although the subjects/participants are not representative of all Thai users,
the results from this section might serve as a baseline for providing VoIP
quality in Thai environments. The mean expectation score (MES) from 806 subjects
(368 male and 451 female subjects) with an average age of 19.82 years (SD=1.95)
was determined; it is equivalent to a MOS of 3.40 (SD=0.92), as in Table 4-6.

4.3 Objective Test Results

The validated results of the objective tests using the E-model tool with G.711,
called MOS-CQE, for 27 scenarios of packet loss and packet delay are presented in
Table 4-4 and Figure 4-7. These results cover packet loss of 1-20% and packet delay
of 0-1.5 s. However, some data in Table 4-4 differ from those in Tables 4-2 and 4-3
because of validation with additional new data.

TABLE 4-4 MOS-CQS versus MOS-CQE from G.711 referring to loss and delay
effects after validating

Each cell shows the MOS with its SD in parentheses; "-" marks a scenario without data.

Packet     Packet Delay 0 s            Packet Delay 0.4 s          Packet Delay 0.8 s          Packet Delay 1.5 s
Loss (%)   MOS-CQS      MOS-CQE        MOS-CQS      MOS-CQE        MOS-CQS      MOS-CQE        MOS-CQS      MOS-CQE
0          4.16 (0.51)  4.4 (0)        4.04 (0.64)  4.29 (0.0)     3.83 (0.75)  3.96 (0.05)    3.72 (0.68)  3.50 (0.04)
1          4.04 (0.55)  4.25 (0.01)    -            -              -            -              -            -
2          3.92 (0.41)  4.21 (0.01)    -            -              -            -              -            -
3          3.82 (0.67)  4.15 (0.05)    3.80 (0.59)  4.10 (0.08)    3.77 (0.51)  3.56 (0.08)    3.69 (0.68)  3.44 (0.05)
5          3.69 (0.62)  3.89 (0.02)    3.63 (0.65)  3.75 (0.03)    3.58 (0.70)  3.44 (0.06)    3.50 (0.58)  3.41 (0.06)
6          3.65 (0.63)  3.88 (0.04)    -            -              -            -              -            -
10         3.47 (0.57)  3.58 (0.04)    3.46 (0.51)  3.47 (0.04)    3.38 (0.50)  2.96 (0.07)    3.33 (0.70)  2.93 (0.03)
15         2.97 (0.61)  3.30 (0.03)    -            3.28 (0.06)    -            2.62 (0.07)    -            2.56 (0.09)
20         2.92 (0.58)  2.79 (0.04)    2.81 (0.80)  2.72 (0.04)    2.79 (0.58)  2.60 (0.01)    2.64 (0.58)  2.53 (0.05)


(a) MOS-CQS results (b) MOS-CQE results

FIGURE 4-7 Representing of MOS-CQS versus MOS-CQE, using data from Table

TABLE 4-5 Numbers of subjects, total of 400 subjects referring to the tests with

Packet Loss        Packet Delay Condition (s)
Condition (%)      0      0.4    0.8    1.5
0                  32     28     30     32
1                  24     -      -      -
2                  24     -      -      -
3                  28     24     26     26
5                  26     24     26     28
6                  26     -      -      -
10                 30     26     26     24
15                 36     -      -      -
20                 24     26     24     24

TABLE 4-6 Statistics from the survey of the mean expectation score of voice quality,
based on a 5-point scale, from 828 Thai users

Category                        Statistic
Gender              Male        43.2%
                    Female      56.8%
Age                 <18         5.6%
                    18          17.8%
                    19          22.0%
                    20          21.1%
                    21          18.4%
                    22          8.5%
                    >22         6.6%
                    Average = 19.85, SD = 1.96
Expectation Score   Vote 1      5.8%
                    Vote 2      4.7%
                    Vote 3      40.9%
                    Vote 4      40.2%
                    Vote 5      8.3%
                    Average = 3.41, SD = 0.92
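The average expectation score and its SD can be recovered from the vote shares in Table 4-6. The following short sketch treats the percentages as weights; the function name is hypothetical and the population (not sample) form of the SD is an assumption. Because the published shares are rounded and sum to 99.9%, the weights are normalized by their total.

```python
import math

# Vote shares (%) for expectation scores 1..5, taken from Table 4-6.
VOTE_SHARE = {1: 5.8, 2: 4.7, 3: 40.9, 4: 40.2, 5: 8.3}


def weighted_mean_sd(distribution):
    """Mean and population SD of a score distribution given as weights."""
    total = sum(distribution.values())
    avg = sum(score * w for score, w in distribution.items()) / total
    var = sum(w * (score - avg) ** 2 for score, w in distribution.items()) / total
    return avg, math.sqrt(var)


if __name__ == "__main__":
    avg, sd = weighted_mean_sd(VOTE_SHARE)
    print(round(avg, 2), round(sd, 2))
```

The result agrees with the average of 3.41 and SD of 0.92 reported in the table.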

4.4 Analysis and Comparison

4.4.1 Analysis of MOS from ACR Listening Opinion Tests
In Figure 4-1, the results from the ACR listening opinion tests (MOS-LQS) of
G.711 and G.729 appear almost the same, whereas G.723.1 differs from the other
codecs. To verify this, the minimum and maximum average scores from female and
male subjects for each codec, and the minimum and maximum average scores for
speakers for each codec, were first discarded from the raw data. The remaining data
were then analyzed using ANOVA and the t-test respectively, with the hypotheses
H1-H4 in Table 4-7. In addition, to investigate whether the type of speaker (age and
gender, i.e. child, female and male speakers) in the TSST affects the MOS-LQS, the
data without outliers were tested with the hypotheses H5-H7 in Table 4-7. The
results from ANOVA and the t-test are shown in Table 4-8. From Table 4-8, the null
hypotheses of H1, H3 and H4 are rejected because each p-value is less than 0.05
(< 0.001); therefore the alternative hypotheses are accepted. That means G.723.1 at
5.3 kbps provides significantly worse voice quality than G.729 and G.711. The null
hypothesis of H2 is not rejected (p = 0.443), so G.711 and G.729 do not differ
significantly. The null hypotheses of H5-H7 are also not rejected because the
p-values are 0.114, 0.068 and 0.840 respectively, all higher than 0.05, which means
there is no significant difference between types of speakers.
4.4.2 Analysis of MOS from Interview Tests
The results from the G.711 narrowband codec and the G.722 wideband codec are
almost the same. Therefore, the t-test and ANOVA with a 95% confidence interval
were used for analysis with the hypotheses H8-H12 in Table 4-7. The output from
the t-test and ANOVA is shown in Table 4-8. The p-value of H8 is 0.743, well above
0.05. Therefore, there is no significant difference between the speech quality
perception scores provided by G.711 and G.722. In other words, the G.711
narrowband codec provides good voice quality at the same level as the G.722
wideband codec. For H9, the verification of variation between the two IP phones
resulted in a p-value of 0.437, which means there is no significant difference. For
H10, H11 and H12, the verification of the effects of the gender of interviewee and
interviewer resulted in p-values of 0.427, 0.099
and 0.212 respectively. This also means there is no significant difference.
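As a rough plausibility check of H8, a two-sample test can be run on the summary statistics in Table 4-1 alone. The sketch below is not the analysis performed in the thesis, which used the raw votes: it assumes Welch's unequal-variance t statistic and approximates the two-sided p-value with the normal distribution, which is reasonable for about 100 subjects per group. The function names are hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic from group means, SDs and sizes."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


def two_sided_p_normal(t):
    """Two-sided p-value using the normal approximation (large samples)."""
    return math.erfc(abs(t) / math.sqrt(2))


if __name__ == "__main__":
    # G.722: MOS-CQS 4.17, SD 0.62, 101 subjects; G.711: 4.14, 0.60, 100.
    t = welch_t(4.17, 0.62, 101, 4.14, 0.60, 100)
    print(round(t, 3), round(two_sided_p_normal(t), 3))
```

With the rounded values from Table 4-1 this gives a p-value of roughly 0.73, far above 0.05 and in line with the reported conclusion, although it is not expected to reproduce the reported 0.743 exactly.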
4.4.3 Analysis of MOS from PESQ Test with Thai Speech
In Figure 4-4, the PESQ tests with the English speech set downloaded from the
ITU-T website show the highest MOS, 3.93, whereas the tests with the Thai speech
sets show varied results. Therefore, to examine the language dependency issue, the
analysis focuses on the Thai speech sets only. The hypothesis H13 was tested with
the raw data obtained from PESQ using ANOVA. The result is shown in Table 4-8.
The output from the ANOVA test shows a p-value of < 0.001, which means that
different Thai speech sets produce significantly different results, due to the content
and language dependency of PESQ.
4.4.4 Analysis of MOS from Conversation Opinion Tests Referring to Delay
From the previous sections, the differences in voice quality between the three
codecs, G.711, G.729 and G.722, are not significant. Referring to delay effects, the
MOS-CQS results from G.711, G.729 and G.722 do not differ significantly if the
delay is not over 0.8 s, whereas the MOS-CQS from G.722 is the worst at a packet
delay of 1.5 s.
4.4.5 Analysis of MOS from Conversation Opinion Tests Referring to Loss Effects
The findings are similar to those in 4.4.4, but the MOS-CQS from G.729 tends to
be worse than the MOS-CQS from G.711A-law and G.722. In particular, G.729
clearly provides worse voice quality than the other codecs at packet loss rates of
10% and 20%.
4.4.6 Comparison of MOS from three subjective methods with Thai subjects and
This is an additional comparison of subjective MOS from the same codec,
G.711, under the direct condition (no loss and delay) but with different methods,
namely ACR listening opinion tests (ACR), interview tests and conversation opinion
tests. The MOS values from these three methods, ACR, interview and conversation,
are 4.23 ± 0.70, 4.14 ± 0.60 and 4.16 ± 0.51 respectively. This test aims to verify
whether the choice of method matters when testing with Thai users and the Thai
language; therefore H14, as in Table 4-7, was considered. The analysis in Table 4-8
gives a p-value of 0.505, higher than 0.05. Therefore, the null hypothesis is not
rejected: the three methods do not give significantly different results.

4.4.7 Comparison of the Conversation Opinion Tests with G.711 and the
E-model tests
According to MOS-CQS and MOS-CQE from G.711 in Table 4-4, the gathered
data were obtained from 644 subjects. Nevertheless, only the validated data from
400 subjects, covering packet loss of 0-10% and packet delay of 0-0.8 s, are applied to
the E-model enhancement in Chapter 5. All 400 subjects share the same background:
they were undergraduate students at KMUTNB, which offers many programs
based on science and technology. They consist of 219 female and 181 male subjects,
with an average age of 19.79 (SD=2.05).
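The subject totals quoted above can be cross-checked against the per-scenario counts in Table 4-5. The sketch below simply transcribes that table into a dictionary and sums it; the names are hypothetical.

```python
# Subjects per scenario from Table 4-5: {loss %: {delay s: subjects}}.
SUBJECTS = {
    0: {0: 32, 0.4: 28, 0.8: 30, 1.5: 32},
    1: {0: 24},
    2: {0: 24},
    3: {0: 28, 0.4: 24, 0.8: 26, 1.5: 26},
    5: {0: 26, 0.4: 24, 0.8: 26, 1.5: 28},
    6: {0: 26},
    10: {0: 30, 0.4: 26, 0.8: 26, 1.5: 24},
    15: {0: 36},
    20: {0: 24, 0.4: 26, 0.8: 24, 1.5: 24},
}


def count_scenarios(max_loss, max_delay):
    """Return (number of scenarios, number of subjects) within the limits."""
    counts = [
        n
        for loss, row in SUBJECTS.items()
        for delay, n in row.items()
        if loss <= max_loss and delay <= max_delay
    ]
    return len(counts), sum(counts)


if __name__ == "__main__":
    print(count_scenarios(20, 1.5))  # all scenarios
    print(count_scenarios(10, 0.8))  # subset used for the enhancement
```

The full table gives 24 scenarios with 644 subjects, and the subset with loss up to 10% and delay up to 0.8 s gives 15 scenarios with 400 subjects, consistent with the figures in the text.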
Comparing MOS-CQS and MOS-CQE in Table 4-4, as re-presented in
Figure 4-7, most pairs have different values. In particular, each MOS-CQE line has
a steeper slope than the corresponding MOS-CQS line. This is evidence that the
standard E-model requires re-calibration or modification with subjective MOS, such
as MOS-CQS from Thai users, in order to gain higher accuracy, reliability and
confidence for use in Thai
4.4.8 Comparison of MOS from Thai users versus users from other countries
Although it is difficult to prove that language and culture affect MOS, the metric
used to measure voice quality perception, across different countries, the outcome
of this research provides evidence on this issue, as in Table 4-9. From the table,
the MOS of G.729 from conversation opinion tests with Thai subjects and language,
4.13, is higher than the MOS of the same codec from ACR listening opinion tests
in American English (with American subjects) and in Japanese, 3.69 and ~3.4
respectively, but close to that in English, ~4.15. For G.711, the MOS values from the
three subjective tests, 4.23, 4.14 and 4.16, are lower than the MOS from ACR
listening opinion tests with Korean subjects and language but higher than the MOS
from ACR listening opinion tests with American subjects and American English.
For the G.722 wideband codec, the MOS values from interview and conversation
opinion tests, 4.17 and 4.25, are better than the MOS from Nokia Lab using DCR
listening opinion tests with Finnish subjects and language, 4.02, and are similar to
the MOS from DCR listening opinion tests with American subjects and American
English, 4.26, and from ACR listening opinion tests with Chinese subjects and
language, 4.23. However, the MOS from ACR listening opinion tests with French
subjects and language is the highest, 4.41.
In summary, the MOS of G.729 varies between 3.4 – 4.15, while MOS of G.711
is 4.05 – 4.41, and MOS of G.722 (64kbps) is 4.02 - 4.41. This summary shows that
there are variations of MOS from different laboratories that conducted tests with
different languages and cultures.

TABLE 4-7 Hypotheses

Item  Null Hypothesis (Hn_0) and Alternative Hypothesis (Hn_1)
H1    H1_0: The perception of Thai subjects to G.711, G.729 and G.723.1 at 5.3 kbps is the same
      H1_1: The perception of Thai subjects to G.711, G.729 and G.723.1 at 5.3 kbps is different
H2    H2_0: The perception of Thai subjects to G.711 and G.729 is the same
      H2_1: The perception of Thai subjects to G.711 and G.729 is different
H3    H3_0: The perception of Thai subjects to G.711 and G.723.1 at 5.3 kbps is the same
      H3_1: The perception of Thai subjects to G.711 and G.723.1 at 5.3 kbps is different
H4    H4_0: The perception of Thai subjects to G.729 and G.723.1 at 5.3 kbps is the same
      H4_1: The perception of Thai subjects to G.729 and G.723.1 at 5.3 kbps is different
H5    H5_0: The type-of-speakers does not affect the perception of Thai subjects to G.711
            (the perception is the same)
      H5_1: The type-of-speakers affects the perception of Thai subjects to G.711
            (the perception is different)
H6    H6_0: The type-of-speakers does not affect the perception of Thai subjects to G.729
            (the perception is the same)
      H6_1: The type-of-speakers affects the perception of Thai subjects to G.729
            (the perception is different)
H7    H7_0: The type-of-speakers does not affect the perception of Thai subjects to G.723.1
            at 5.3 kbps (the perception is the same)
      H7_1: The type-of-speakers affects the perception of Thai subjects to G.723.1
            at 5.3 kbps (the perception is different)
H8    H8_0: The speech quality perception of Thai subjects/interviewees to G.711 and G.722
            is the same
      H8_1: The speech quality perception of Thai subjects/interviewees to G.711 and G.722
            is different
H9    H9_0: The speech quality perception of Thai subjects/interviewees to different IP phone
            (under test) referring to G.711 and G.722 is the same
      H9_1: The speech quality perception of Thai subjects/interviewees to different IP phone
            (under test) referring to G.711 and G.722 is different
H10   H10_0: The perception of different gender of Thai subjects/interviewees to G.711 and
             G.722 is the same
      H10_1: The perception of different gender of Thai subjects/interviewees to G.711 and
             G.722 is different
H11   H11_0: The perception of Thai subjects/interviewees to different gender of interviewer
             referring to G.711 and G.722 is the same
      H11_1: The perception of Thai subjects/interviewees to different gender of interviewer
             referring to G.711 and G.722 is different
H12   H12_0: The perception of the same/opposite gender of subjects/interviewees and
             interviewers referring to G.711 and G.722 is the same
      H12_1: The perception of the same/opposite gender of subjects/interviewees and
             interviewers referring to G.711 and G.722 is different
H13   H13_0: The MOS from PESQ tests with four Thai speech sets, TSST-list2, list3, list4
             and list10, are the same
      H13_1: The MOS from PESQ tests with four Thai speech sets, TSST-list2, list3, list4
             and list10, are different
H14   H14_0: The MOS from ACR-listening tests, interview tests and conversation opinion
             tests are the same
      H14_1: The MOS from ACR-listening tests, interview tests and conversation opinion
             tests are different

TABLE 4-8 Hypothesis Analysis Result with 95% CI

Hypotheses                                                                 p-value
H1: G.711 vs G.729 vs G.723.1 at 5.3 kbps                                  < 0.001*
H2: G.711 vs G.729                                                         0.443
H3: G.711 vs G.723.1 at 5.3 kbps                                           < 0.001*
H4: G.729 vs G.723.1 at 5.3 kbps                                           < 0.001*
H5: Type-of-speakers affects MOS-LQS of G.711                              0.114
H6: Type-of-speakers affects MOS-LQS of G.729                              0.068
H7: Type-of-speakers affects MOS-LQS of G.723.1 at 5.3 kbps                0.840
H8: MOS-CQS of G.711 VS G.722                                              0.743
H9: IP phone1 w/ G.711 VS IP phone1 w/ G.722 VS IP phone2 w/               0.437
    G.711 VS IP phone2 w/ G.722
H10: Gender of interviewee effects to MOS-CQS of G.711 VS G.722            0.427
H11: Gender of interviewer effects to MOS-CQS of G.711 VS G.722            0.099
H12: Same/opposite gender of interviewee and interviewer effects to        0.212
     MOS-CQS of G.711 VS G.722
H13: Comparison of MOS from PESQ tests with four Thai speech sets          < 0.001*
H14: Comparison of MOS from ACR-listening tests, interview tests and       0.505
     conversation opinion tests
Remark: * Significant at p-value < 0.05

TABLE 4-9 Comparison of MOS from Thai users and users from different languages
and cultures, adopted from Table 2-34 (in Chapter 2) and Section 4.1-

Codec        Method        Condition             Language          No. of          MOS            Remarks
                                                                   Subjects
G.729        ACR           No loss & delay       English           Not specified   ~4.15          Approximated from the figures
(8 kbps)     ACR           No loss & delay       Japanese          Not specified   ~3.4           Approximated from the figures
             ACR           Clean Speech          American-English  32              3.69 ± 0.71    Psytechnics (Report)
             ACR           No loss & delay       Thai              29              4.18 ± 0.70    KMUTNB
             Conversation  No loss & delay       Thai              24              4.13 ± 0.54    KMUTNB
G.711        ACR           Clean Speech          American-English  32              4.05           8 kHz sampling rate
A-law        ACR           Clean Speech          Korean            32              4.41           8 kHz sampling rate
(64 kbps)    ACR           No loss & delay       Thai              31              4.23 ± 0.32    KMUTNB
             Interview     No loss & delay       Thai              100             4.14 ± 0.60    KMUTNB
             Conversation  No loss & delay       Thai              32              4.16 ± 0.51    KMUTNB
G.722        ACR           Clean Speech          French            32              4.41           16 kHz sampling rate
(64 kbps)    ACR           Clean Speech          Chinese           32              4.23           16 kHz sampling rate
             DCR           -26 dBov,             American-English  32              4.26 ± 0.834   Dynastat Lab.
                           Clean Speech
             DCR           -26 dBov,             Finnish           32              4.02 ± 0.752   Nokia Lab.
                           Clean Speech
             Interview     No loss & delay       Thai              101             4.17 ± 0.62    KMUTNB
             Conversation  No loss & delay       Thai              24              4.25 ± 0.68    KMUTNB

This chapter presents two new models that consist of an objective Enhanced E-
model using Thai Bias Factor (E2-model), and a Thai Subjective-VoIP Quality
Evaluation Mathematical Model (ThaiVQE). Details are as follows:

5.1 E2-model
The proposed method enhances the standard E-model using a “Thai Bias
Factor” obtained from MOS-CQS for Thai users of the G.711 codec. This Thai bias
factor (B) has been developed to cover language, cultural and nationality factors.
The new model is called the Enhanced E-model or E2-model. The method used to
obtain this factor and apply it to the E-model consists of
the following steps:
Step 1: Subjective data gathering of MOS-CQS values
Table 4-4 shows raw subjective data obtained for MOS-CQS from Thai users
and the results of tests carried out to check and validate data for the 15 scenarios
included in the gray area of Table 3-3 (in Chapter 3), i.e., packet losses up to 10% and
packet delays up to 0.8s.
Step 2: Objective data gathering of MOS-CQE values
Similar to step 1, Table 4-4 (in Chapter 4) shows raw objective data obtained for
MOS-CQE and the results of tests carried out to check and validate data for the same
15 scenarios of packet loss and packet delay used in step 1.
Step 3: Finding equation for Thai bias factor as a function of packet loss and
packet delay.
The Thai bias factor is denoted B_TH when applied to R values on the 100-point
scale and B'_TH when applied to MOS values on the traditional 5-point scale.
The equation for the bias factor B'_TH was computed as follows (see Figure 5-1).
First, a bias value for each scenario of packet loss and delay was computed by
subtracting the MOS-CQE value obtained in step 2 from the MOS-CQS value obtained in
step 1. Then, an equation for B'_TH as a function of packet loss and delay was found
by using the surface fitting tool in Matlab to obtain a best fit to these values. The
computed equation is given in Eq. (5-3).
Step 4: E-model enhancement using B'_TH
The enhanced E-model given in Eq. (5-4) was then obtained by replacing the
general bias factor B' in the 5-point scale E-model, Eq. (5-2), with the Thai bias
factor B'_TH calculated from Eq. (5-3). This enhanced E-model can then be applied for
use in Thailand as shown in Figure 5-2.
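The surface fitting in step 3 can be sketched as an ordinary least-squares fit. The sketch below is only an illustration under stated assumptions: it uses NumPy rather than the Matlab tool used in the thesis, the 5 x 3 grid of scenarios is hypothetical, and the MOS values are synthetic stand-ins generated from the published coefficients of Eq. (5-3), not the measurements of Table 4-4. It shows that a surface with the terms 1, L, D, LD and D^2 can be recovered from per-scenario differences MOS-CQS minus MOS-CQE.

```python
import numpy as np

def fit_bias_surface(loss, delay, bias):
    """Least-squares fit of bias = p00 + p10*L + p01*D + p11*L*D + p02*D**2."""
    L = np.asarray(loss, dtype=float)
    D = np.asarray(delay, dtype=float)
    # One column per term of the fitted surface.
    design = np.column_stack([np.ones_like(L), L, D, L * D, D ** 2])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(bias, dtype=float), rcond=None)
    return coeffs

# Hypothetical grid of 15 scenarios (loss as a fraction, delay in seconds).
loss_levels = [0.0, 0.02, 0.04, 0.06, 0.10]
delay_levels = [0.0, 0.4, 0.8]
loss, delay = (a.ravel() for a in np.meshgrid(loss_levels, delay_levels))

# Synthetic stand-ins for the measured values (NOT the thesis data).
published = np.array([-0.2806, 1.297, -0.4424, 4.505, 0.8994])  # Eq. (5-3)
mos_cqe = 4.4 - 8.0 * loss - 0.5 * delay
mos_cqs = mos_cqe + (published[0] + published[1] * loss + published[2] * delay
                     + published[3] * loss * delay + published[4] * delay ** 2)

# Step 3: bias per scenario is MOS-CQS minus MOS-CQE, then fit the surface.
bias = mos_cqs - mos_cqe
coeffs = fit_bias_surface(loss, delay, bias)
print(np.round(coeffs, 4))
```

With real measurements the fit would not be exact, and the residual error of the surface should be reported alongside the coefficients.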

FIGURE 5-1 Overview of Finding Thai bias factor

FIGURE 5-2 Overview of the proposed E2-model with Thai bias factor

The standard E-model consists of the following equations.

$$R = R_o - I_s - I_d - I_{e\text{-eff}} + A + B \qquad (5\text{-}1)$$

$$\text{MOS-CQE} =
\begin{cases}
4.5, & R > 100 \\
1 + 0.035R + R(R-60)(100-R)\cdot 7\times10^{-6} + B', & 0 < R < 100 \\
1, & R < 0
\end{cases} \qquad (5\text{-}2)$$

where
Ro is the basic signal-to-noise ratio, including noise sources such as
room noise and circuit noise.
Is is the simultaneous impairment factor, which is a combination of all
impairments which occur more or less simultaneously with the voice signal.
Id is the delay impairment factor that is caused by packet delay.
Ie-eff is the effective equipment impairment factor that is caused by codecs.
A is the advantage factor that allows for compensation of
impairment factors when there are other advantages accessible to
the user.
B is the bias factor from intensive subjective tests, based on the 100-
point scale.
MOS-CQE is the objective MOS from the standard E-model.
R is the R-value from the E-model tool.
B' is the bias factor from intensive subjective tests, based on the 5-point
scale.
The Thai bias factor equation and the proposed enhanced E-model equation are
as follows:

$$B'_{TH} = -0.2806 + 1.297L - 0.4424D + 4.505LD + 0.8994D^2 \qquad (5\text{-}3)$$

$$\text{MOS-CQE}^{*} =
\begin{cases}
4.5, & R > 100 \\
1 + 0.035R + R(R-60)(100-R)\cdot 7\times10^{-6} + B'_{TH}, & 0 < R < 100 \\
1, & R < 0
\end{cases} \qquad (5\text{-}4)$$

where
B'_TH is the Thai bias factor calculated as described in steps 1 to 3
L is the packet loss rate, expressed as a fraction (e.g., 0.05 for 5% loss)
D is the packet delay (s)
MOS-CQE* is the modified objective MOS from the new E2-model
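Eqs. (5-2) to (5-4) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the thesis. The function names are hypothetical, the R value is assumed to come from an external E-model tool, and the handling of the boundary values R = 0 and R = 100 (which the equations leave open) follows the usual E-model convention as an assumption. Packet loss is passed as a fraction and delay in seconds.

```python
def thai_bias(loss, delay):
    """Thai bias factor on the 5-point scale, Eq. (5-3).

    loss  -- packet loss as a fraction (0.05 means 5%)
    delay -- packet delay in seconds
    """
    return (-0.2806 + 1.297 * loss - 0.4424 * delay
            + 4.505 * loss * delay + 0.8994 * delay ** 2)

def mos_from_r(r, bias=0.0):
    """R-to-MOS conversion of Eq. (5-2) with an additive 5-point bias term."""
    if r >= 100:      # assumption: boundary R = 100 treated like R > 100
        return 4.5
    if r <= 0:        # assumption: boundary R = 0 treated like R < 0
        return 1.0
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6 + bias

def e2_mos(r, loss, delay):
    """MOS-CQE* of Eq. (5-4): the standard conversion plus the Thai bias factor."""
    return mos_from_r(r, thai_bias(loss, delay))

# Example with R = 93.2 (the usual E-model default rating) and no impairments.
print(round(mos_from_r(93.2), 2))        # standard E-model, about 4.41
print(round(e2_mos(93.2, 0.0, 0.0), 2))  # E2-model, about 4.13
```

In practice the R value itself already depends on loss and delay through Id and Ie-eff, so `r` would be recomputed for every network condition before the bias is added.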

5.2 Thai Subjective VoIP Quality Evaluation Mathematical Model

The E2-model described in Section 5.1 is an objective model that has been
obtained by enhancing an objective E-model with a Thai bias factor. In this section,
an alternative subjective model is described. This subjective model has been obtained
by using raw data from subjective tests of the G.711 codec with 400 Thai subjects.
The method used to develop a mathematical model, called the Thai Subjective VoIP
Quality Evaluation Mathematical Model or ThaiVQE model, is shown in Figure 5-3.
Figure 5-4 shows the results obtained in the model fitting step. The figure shows the
raw data and a surface obtained by using the surface fitting tool in Matlab to obtain a
best fit to the data. The best fit for the MOS-CQS data as a function of packet loss
and packet delay had a Root Mean Square Error (RMSE) of 0.5884 and R-square of
10.126. The result of the surface fitting is shown in Eq. (5-5) which is an equation for
a modified MOS-CQS called MOS-CQS* as a function of packet loss and packet
delay. The function in Eq. (5-5) will be called the ThaiVQE model.
Table 5-1 shows the modified MOS-CQS* values calculated from the ThaiVQE model,
Eq. (5-5), for a representative set of values of packet loss and packet delay. From
Table 5-1 it can be seen that Thai users are satisfied when using VoIP
applications/services provided by G.711 if the packet loss in the network is less than
1% and the packet delay in the network is less than 300 ms (shown as the black area
in Table 5-1). For higher values of packet loss and delay, some users are satisfied and
some are dissatisfied if packet loss and packet delay occur together at up to 6% and
300 ms, 5% and 600 ms, or 4% and 800 ms (shown as the dark grey or dark green area in
Table 5-1). However, for even higher values of packet loss and delay, it can be seen
that some Thai users still remain satisfied even for packet losses as high as 10% and
packet delays as high as 800 ms (shown as the light grey or light green area in Table
5-1).

FIGURE 5-3 Overview of development of ThaiVQE model

$$\text{MOS-CQS}^{*} = 4.128 - 9.868L - 0.2908D + 31.38L^2 + 2.457LD \qquad (5\text{-}5)$$

where
MOS-CQS* is the Thai-based subjective MOS
L is the packet loss rate, expressed as a fraction (e.g., 0.05 for 5% loss)
D is the packet delay (s).

FIGURE 5-4 The surface chart of the MOS-CQS provided by G.711 for packet loss
of 0-10% and packet delay of 0-0.8 s. The chart was computed using
the surface fitting tool in Matlab and was used to create the ThaiVQE
subjective model

TABLE 5-1 MOS-CQS* calculated from ThaiVQE model as a function of packet
delay and packet loss effects for G.711 only

Packet                            Packet Delay (ms)
Loss (%)      0     100     200     300     400     500     600     700     800
   0      4.128   4.099   4.070   4.041   4.012   3.983   3.954   3.924   3.895
   1      4.032   4.006   3.979   3.953   3.926   3.899   3.873   3.846   3.819
   2      3.943   3.919   3.895   3.871   3.847   3.822   3.798   3.774   3.750
   3      3.860   3.838   3.817   3.795   3.773   3.752   3.730   3.708   3.687
   4      3.783   3.764   3.745   3.726   3.706   3.687   3.668   3.649   3.629
   5      3.713   3.696   3.679   3.663   3.646   3.629   3.612   3.595   3.579
   6      3.649   3.635   3.620   3.606   3.592   3.577   3.563   3.549   3.534
   7      3.591   3.579   3.567   3.555   3.543   3.532   3.520   3.508   3.496
   8      3.539   3.530   3.521   3.511   3.502   3.492   3.483   3.473   3.464
   9      3.494   3.487   3.480   3.473   3.466   3.459   3.452   3.445   3.438
  10      3.455   3.450   3.446   3.441   3.437   3.432   3.428   3.423   3.419

Black area: Users satisfied
Dark grey or dark green area: Some users satisfied
Light grey or light green area: Many users dissatisfied but some Thai users satisfied
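The entries of Table 5-1 can be regenerated from the ThaiVQE coefficients with a short script. The sketch below is an illustration only. The function name is hypothetical, packet loss is entered as a fraction (0.05 for 5%) and delay in seconds, and the quadratic term is applied to the packet loss L, which is the reading of the model that reproduces the tabulated values.

```python
def thai_vqe_g711(loss, delay):
    """ThaiVQE estimate of MOS-CQS* for G.711.

    loss  -- packet loss as a fraction (0.05 means 5%)
    delay -- packet delay in seconds
    """
    return (4.128 - 9.868 * loss - 0.2908 * delay
            + 31.38 * loss ** 2 + 2.457 * loss * delay)

# Rebuild the grid of Table 5-1: loss 0-10% in 1% steps, delay 0-800 ms.
delays_ms = range(0, 900, 100)
print("loss%  " + "  ".join(f"{d:>5d}" for d in delays_ms))
for loss_pct in range(0, 11):
    row = [thai_vqe_g711(loss_pct / 100, d / 1000) for d in delays_ms]
    print(f"{loss_pct:>5d}  " + "  ".join(f"{v:5.3f}" for v in row))
```

Spot checks against the table: 0% loss and 0 ms gives 4.128, 5% loss and 300 ms gives 3.663, and 10% loss and 800 ms gives 3.419.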

5.3 Model Comparison: Standard E-model, E2-model and ThaiVQE

To verify that the E2-model and ThaiVQE provide better accuracy and
reliability than the standard E-model, the predictions of the three models are
compared in this section.
5.3.1 Model Evaluation
As in Eq. (2-9) and Eq. (2-10), the evaluation method compares MOS values
predicted by the model equations with observed MOS values and computes Mean
Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). For a
statistically valid test, it is necessary that the test set is completely different from the
data set used to construct the E2-model and ThaiVQE model. As shown in Table 5-2,
the set of data used in the testing was obtained from 70 subjects with eight scenarios
of packet loss and delay. As required, this test data set included no data from the
creation of the E2-model or ThaiVQE.
5.3.2 Evaluation Results
The evaluation results are presented in Table 5-3. The MAE of 0.308 for the
E2-model and 0.313 for the ThaiVQE are both lower than the MAE of 0.393 for the
standard E-model. Likewise, the MAPE of 9.14% for the E2-model and 9.25% for the
ThaiVQE are lower than the MAPE of 12.04% for the standard E-model. Relative to the
standard E-model, the E2-model reduces the error by approximately 24% and the
ThaiVQE reduces it by approximately 23%. The results show that both of the new
models reduce errors compared with the standard E-model, and that there appears to be
no appreciable difference in error reduction between the two new models. Therefore,
both of the new models can provide better accuracy and reliability than the standard
E-model.

TABLE 5-2 Test set information

Scenario                      No. of subjects   MOS-CQS   SD
0A: Direct Condition                  4           4       0.82
2A: 2% Loss                          12           3.73    0.45
3A: 3% Loss                           4           4       0
5A: 5% Loss                           8           3.63    0.74
0B: 0.4s Delay                       12           4       0.43
0C: 0.8s Delay                       20           3.95    0.39
3B: 3% Loss & 0.4s Delay              6           3.83    0.75
3C: 3% Loss & 0.8s Delay              4           3.75    0.5

TABLE 5-3 Comparison of errors between actual MOS and predicted MOS for
standard E-model, E2-model, and ThaiVQE model

Model Comparison MAE MAPE Error Reduction (%)

E-model vs Test set 0.393 12.04% -

E2-model vs Test set 0.308 9.14% 24.13%
ThaiVQE vs Test set 0.313 9.25% 23.22%
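The error measures used in Table 5-3 can be sketched as follows, assuming that Eq. (2-9) and Eq. (2-10) are the usual definitions of MAE and MAPE and that the error reduction is the relative decrease with respect to the standard E-model. The function names are hypothetical and the two-element example is made up for illustration. The sketch does not attempt to reproduce the thesis values, since the per-subject test scores are not listed here.

```python
def mae(observed, predicted):
    """Mean Absolute Error between observed and predicted MOS values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def mape(observed, predicted):
    """Mean Absolute Percentage Error (in percent), relative to the observed values."""
    return 100 * sum(abs(o - p) / o for o, p in zip(observed, predicted)) / len(observed)

def error_reduction(baseline_error, new_error):
    """Relative reduction (in percent) of an error measure versus a baseline."""
    return 100 * (baseline_error - new_error) / baseline_error

# Made-up example: two observed scores and two model predictions.
print(mae([4.0, 3.0], [3.5, 3.3]))    # about 0.4
print(mape([4.0, 3.0], [3.5, 3.3]))   # about 11.25

# Reductions implied by the rounded MAPE values reported in Table 5-3.
print(round(error_reduction(12.04, 9.14), 2))  # E2-model, about 24.09
print(round(error_reduction(12.04, 9.25), 2))  # ThaiVQE, about 23.17
```

The small gap between these figures and the 24.13% and 23.22% in Table 5-3 is consistent with the table having been computed from unrounded MAPE values.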

5.4 ThaiVQE for G.729

VoIP is often used because it reduces the cost of long-distance and
international calls. G.729 is an important codec that is commonly used over WAN
links or over links between networks located in different buildings, different
cities or different countries. In this section, a ThaiVQE model is developed for G.729
by modifying the ThaiVQE model developed for G.711. The steps in the development
of the G.729 ThaiVQE model are as follows:
Step 1: In Table 4-2 and Figure 4-x the data for MOS-CQS were shown for
G.729 and G.711 referring to packet delay and in Table 4-3 and Figure 4-x the data
for MOS-CQS were shown for G.729 and G.711 referring to packet loss. From Table
4-2 it can be seen that there is only a small difference between MOS-CQS for G.729
and G.711, whereas from Table 4-3 it can be seen that there is an appreciable
difference between MOS-CQS for G.729 and G.711. Therefore, in modifying the
G.711 ThaiVQE model for G.729, the effects of packet delay can be ignored.
Step 2: Compute the differences between the MOS-CQS for the G.729 and
G.711 codecs as a function of packet loss. The results are shown in Table 5-4.

Step 3: Use the trendline tool in Microsoft Excel to estimate an equation relating
the MOS-CQS differences to packet loss. The estimated equation, which has an R-
squared value of 1, is given in Eq. (5-6).

TABLE 5-4 The differences between MOS-CQS provided by G.711 and G.729 as a
function of packet loss, adapted from Table 4-3 (in Chapter 4)

Packet Loss   MOS-CQS (G.711)   MOS-CQS (G.729)   Diff (MOS-CQS)
0                  4.16              4.13              0.03
2%                 3.92              3.88              0.09
6%                 3.69              3.53              0.16
10%                3.47              3.04              0.43

$$\text{Diff(MOS-CQS)} = 833.32L^3 - 87.5L^2 + 4.4167L + 0.03 \qquad (5\text{-}6)$$

where
Diff(MOS-CQS) = MOS-CQS (G.711) - MOS-CQS (G.729)
L is the packet loss rate, expressed as a fraction (e.g., 0.02 for 2% loss)
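The trendline of step 3 can be reproduced with a cubic polynomial fit. The sketch below uses NumPy's `polyfit` instead of the Excel trendline tool, which is an assumption made for illustration, and takes the four Diff values of Table 5-4 with packet loss expressed as a fraction. It also shows why the R-squared value equals 1: a cubic has four free coefficients, so it passes exactly through four data points.

```python
import numpy as np

# Diff(MOS-CQS) values from Table 5-4, packet loss as a fraction.
loss = np.array([0.00, 0.02, 0.06, 0.10])
diff = np.array([0.03, 0.09, 0.16, 0.43])

# Degree-3 fit; coefficients are returned from the cubic term down to the constant.
coeffs = np.polyfit(loss, diff, 3)
print(np.round(coeffs, 4))

# R-squared of the fit on the same four points.
fitted = np.polyval(coeffs, loss)
ss_res = np.sum((diff - fitted) ** 2)
ss_tot = np.sum((diff - diff.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(round(float(r_squared), 6))
```

Because the fit is an exact interpolation, the R-squared value says nothing about how well the curve predicts the difference at loss rates that were not measured, so intermediate values should be treated as estimates.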

Step 4: Create a G.729 ThaiVQE table of MOS-CQS*(G.729) values as a function
of packet loss and delay as follows. From Table 5-1 find the MOS-CQS*(G.711) value
for a given packet loss and delay. For each row of constant packet loss in Table 5-1,
subtract the Diff(MOS-CQS) value corresponding to the given value of packet loss
computed from Eq. (5-6). That is, the MOS-CQS*(G.729) values are calculated from Eq.
(5-7). The results for the G.729 ThaiVQE model are shown in Table 5-5.

$$\text{MOS-CQS}^{*}_{G.729} = \text{MOS-CQS}^{*}_{G.711} - \text{Diff(MOS-CQS)} \qquad (5\text{-}7)$$

where
MOS-CQS*(G.729) is the estimated MOS-CQS* for G.729
MOS-CQS*(G.711) is the MOS-CQS* for G.711 (Table 5-1)
Diff(MOS-CQS) is the difference between the MOS-CQS values (Eq. (5-6))
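Step 4 amounts to subtracting the loss-dependent difference of Eq. (5-6) from the G.711 ThaiVQE value. A minimal sketch, with hypothetical function names, packet loss as a fraction and delay in seconds, is given below. The G.711 surface is written with the quadratic term on packet loss, which is the form that reproduces Table 5-1.

```python
def thai_vqe_g711(loss, delay):
    """ThaiVQE estimate of MOS-CQS* for G.711 (loss as a fraction, delay in s)."""
    return (4.128 - 9.868 * loss - 0.2908 * delay
            + 31.38 * loss ** 2 + 2.457 * loss * delay)

def diff_g711_g729(loss):
    """Eq. (5-6): MOS-CQS (G.711) minus MOS-CQS (G.729) as a function of loss."""
    return 833.32 * loss ** 3 - 87.5 * loss ** 2 + 4.4167 * loss + 0.03

def thai_vqe_g729(loss, delay):
    """Eq. (5-7): G.729 estimate, applying the loss-only correction to G.711."""
    return thai_vqe_g711(loss, delay) - diff_g711_g729(loss)

# A few cells of the G.729 table: (loss %, delay ms).
for loss_pct, delay_ms in [(0, 0), (1, 0), (5, 300), (10, 800)]:
    value = thai_vqe_g729(loss_pct / 100, delay_ms / 1000)
    print(f"{loss_pct:>2d}% loss, {delay_ms:>3d} ms: {value:.3f}")
```

These four cells agree with the corresponding entries of Table 5-5 (4.098, 3.966, 3.526 and 2.989). Note that the correction depends only on packet loss, so the delay dependence of the G.729 estimate is inherited unchanged from the G.711 model.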

The classification of satisfaction level given in Table 5-5 assumes that the
relationship between MOS-CQS and satisfaction level for Thai users is the same as
that given by ITU-T (see Table 2-x) for users in general. With this assumption, it can
be seen that Thai users are satisfied when using VoIP applications/services
provided by G.729 without packet loss and with packet delay of less than 200 ms
(shown as the black area in Table 5-5). For higher values of packet loss and delay,
Table 5-5 shows that some users are satisfied and some are dissatisfied if packet loss
and packet delay occur together as 4% loss and less than 300 ms, or 3% loss and less
than 800 ms (the dark grey or dark green area in Table 5-5). For even higher values
of packet loss and delay, Table 5-5 shows that some Thai users are still satisfied even
if packet loss and packet delay occur together as 6% loss and less than 600 ms, or 5%
loss and less than 800 ms (the light grey or light green area in Table 5-5). At very
high values of packet loss and delay, many or nearly all Thai users will not be
satisfied (the very light gray or yellow and gray or orange areas in Table 5-5). In
particular, if the packet loss is greater than 9% then nearly all users will be
dissatisfied.
It should be noted that the ThaiVQE table for G.729 was obtained by modifying
the ThaiVQE for G.711 using only packet loss data for G.729. It was beyond the
scope of this thesis to carry out the time-consuming detailed measurements of MOS-
CQE and some parts of MOS-CQS for combined packet loss and delay for G.729 that
were carried out for G.711. However, even with this limitation, the G.729 ThaiVQE
values in Table 5-5 should be useful as a basis for VoIP network planning and
measurement for Thai users in Thai environments.

TABLE 5-5 MOS-CQS* estimated from G.729 ThaiVQE model as a function of
packet delay and packet loss

Packet                            Packet Delay (ms)
Loss (%)      0     100     200     300     400     500     600     700     800
   0      4.098   4.069   4.040   4.011   3.982   3.953   3.924   3.894   3.865
   1      3.966   3.940   3.913   3.886   3.860   3.833   3.806   3.780   3.753
   2      3.853   3.829   3.805   3.781   3.757   3.732   3.708   3.684   3.660
   3      3.754   3.732   3.711   3.689   3.667   3.645   3.624   3.602   3.580
   4      3.663   3.644   3.625   3.606   3.586   3.567   3.548   3.529   3.509
   5      3.577   3.560   3.543   3.526   3.510   3.493   3.476   3.459   3.442
   6      3.489   3.475   3.460   3.446   3.432   3.417   3.403   3.389   3.374
   7      3.395   3.383   3.371   3.359   3.347   3.335   3.323   3.312   3.300
   8      3.289   3.280   3.271   3.261   3.252   3.242   3.233   3.223   3.214
   9      3.168   3.161   3.154   3.147   3.140   3.133   3.126   3.119   3.112
  10      3.025   3.020   3.016   3.011   3.007   3.002   2.998   2.993   2.989

Black area: Users satisfied
Dark grey or dark green area: Some users satisfied but some users dissatisfied
Light grey or light green area: Many users dissatisfied but some Thai users satisfied
Very light gray or yellow area: Many users dissatisfied, including Thai users
Gray or orange area: Nearly all users dissatisfied
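The shading of the ThaiVQE tables can be approximated by a simple threshold rule. The cut-off values in the sketch below are assumptions, not values stated in this chapter: 4.03 and 3.10 are the MOS limits that ITU-T G.107 associates with "satisfied" and "many users dissatisfied", 3.60 is the ITU-T limit quoted elsewhere in this thesis, and 3.41 is the mean expectation score of Thai users reported in this thesis. The function name and the label strings are hypothetical.

```python
# Assumed lower MOS limits for each satisfaction band, highest first.
SATISFACTION_BANDS = [
    (4.03, "Users satisfied"),
    (3.60, "Some users satisfied but some users dissatisfied"),
    (3.41, "Many users dissatisfied but some Thai users satisfied"),
    (3.10, "Many users dissatisfied, including Thai users"),
]

def satisfaction_level(mos):
    """Return the satisfaction band for a MOS-CQS* value (assumed thresholds)."""
    for lower_limit, label in SATISFACTION_BANDS:
        if mos >= lower_limit:
            return label
    return "Nearly all users dissatisfied"

# Example values taken from Table 5-5 (G.729).
for mos in (4.098, 3.606, 3.417, 3.289, 3.025):
    print(f"{mos:.3f}: {satisfaction_level(mos)}")
```

Cells that lie very close to a threshold may be shaded differently in the original tables, so the rule should be read as an approximation of the classification rather than its definition.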

This final chapter consists of three sections: discussion, conclusion
and future work.

6.1 Discussion
6.1.1 ACR - listening opinion tests and interview tests in the pilot phase
The pilot tests that consisted of ACR listening opinion tests and interview tests
were important for this research. They helped to ensure the reliability of the VoIP
test-bed system. In particular, the pilot tests helped to reveal unseen and unexpected
issues. For example, the most important issue was recruiting students for the
subjective tests in the laboratory that had been set up on the 7th floor of the Central
Library, KMUTNB. Students who agreed to join the VoIP quality assessment
had to actually attend. In some cases, a lecturer allowed students to join the
test before the end of a class. If there were 30 students for the test and each student
took at least 3 minutes, then the last student might have been kept waiting for at least
one and a half hours. The pilot tests revealed that the most effective means of
ensuring that a sufficient number of subjects were available for the tests was
enforcement by their lecturers: the ID of each student who joined the tests was sent
to the lecturer for checking.
The ACR listening opinion tests in the pilot phase showed that the MOS-LQS
results for the G.711, G.729 and G.723.1 codecs from the experiment with Thai users
were consistent with the theory, i.e., that G.711 provides better voice quality than
G.729 and that G.729 provides better voice quality than G.723.1. The results from the
ACR listening tests on Thai users detected a statistically significant difference in
perceived voice quality between G.729 and G.723.1, but the observed difference
between G.711 and G.729 was not statistically significant.
In the interview tests with the G.711 and G.722 codecs, the observed differences
measured using MOS-CQS were not statistically significant. In these interview tests
no major differences between male and female subjects or between different IP
phones were detected.
6.1.2 PESQ test with Thai speech sets in the pilot phase
The PESQ test results for the Thai language described in Chapter 4 showed that
PESQ scores depend on the speech content when Thai speech sets are used.
Therefore, this thesis concludes that PESQ is not appropriate for voice
quality measurement in the Thai language. This result contrasts with published PESQ
test results for American English, which showed that PESQ worked well with speech
sets from that language.
6.1.3 Conversation Opinion Tests
These MOS-CQS tests were crucial for this research. Without these tests,
the development of the enhanced E-model for Thai users could not have been
achieved. For these tests, more than one thousand subjects were required. Therefore,
as noted above, it was necessary to overcome the problem of recruiting a sufficient
number of subjects to join the tests. This research was based on a group of Thai
students, both male and female, in KMUTNB who were mainly about 17-24 years
old. These subjects were therefore not a representative sample of the Thai population.
A total of approximately 1,500 Thai users participated in all phases of the
subjective tests. As shown in Table 6-1, 92 subjects participated in the Listening
opinion tests, 201 subjects in the interview test, 1,128 subjects in the conversation
opinion tests, and 70 subjects for the test set of E2-model and ThaiVQE. Because of
some experimental problems, the results for some of the subjects had to be rejected as
outliers. Because of the large number of subjects involved, the tests could not be
carried out by only one researcher within a limited time frame. Therefore, the person
who acted as a research assistant was also very important in obtaining reliable test
results. For example, if an assistant made some mistake by presenting a wrong
scenario to a test involving 30 participants, then all results from that test would show
an abnormality and would have to be discarded. Therefore, the reliability and integrity
of research assistants was very important and had to be closely monitored.
6.1.4 Expectation of Thai users for VoIP quality
After finishing all subjective tests, the mean expectation score (MES) was
calculated and found to be equivalent to a MOS value of 3.41. This MOS value for
Thai users is lower than the MOS value of 3.60 that has been given by ITU-T as
the minimum MOS value at which only some users are dissatisfied. At lower values,
ITU-T expects many users to be dissatisfied. Therefore, if NBTC (the regulator of
VoIP in Thailand) regulates the voice quality of VoIP services at a MOS of 3.6,
only some Thai users should be dissatisfied and the quality should be satisfactory
for many Thai users.
6.1.5 Comparison of MOS-CQS and MOS-CQE
From Table 4-4 and Figure 4-7 (in Chapter 4), it can be seen that, for the best
conditions (zero packet loss and packet delay) of the conversation opinion tests, the
subjective MOS-CQS values for Thai users were not close to the theoretical
maximum value of 4.5 for the objective MOS-CQE value from the E-model equation
or to the actually observed MOS-CQE value of 4.41. On the other hand, for the poorer
conditions (10% packet loss and packet delay of 0.8 s), the MOS-CQS values were
found to be higher than the MOS-CQE values. Some possible reasons are as follows:
1) Language effects: Thai is a tonal language, which may affect how Thai
listeners process speech signals. The tonal features may help Thai users perceive
voice well even in the poorer conditions associated with, for example, packet delay
and packet loss. This issue should be investigated further, particularly with
reference to signal processing and content dependency in languages with tonal features.
2) Cultural effects: Thai culture is based on Buddhism. In general, Thai users
do not respond in an extreme manner to either good or bad conditions. The response
of Thai people to situations typically includes compromise, face-saving and respect
for seniors/elders/superiors, such as lecturers/teachers/officials. These cultural effects
might be one reason why the subjective MOS-CQS values are not as high as the
objective MOS-CQE values from the E-model under the best conditions and why the
MOS-CQS are not as low as the MOS-CQE values under the poorer conditions.
3) Standard and quality of life: Most Thai people earn a low income,
particularly people from rural areas. Most subjects in the conversation opinion tests
probably came from these low-income households and their quality of life would not
be high. They would therefore have relatively low expectations of quality of services.
This could be a possible reason why they were willing to accept poorer VoIP quality
with appreciable packet loss and packet delay.
4) Constraints of the VoIP test-bed system: the packet losses generated for
the tests, both for the conversation opinion tests and the E-model tests, were
uniformly random losses. However, in the real world, packet losses are often bursty
rather than uniform. It is known that bursty packet loss reduces voice quality more
than uniform random packet loss does. Therefore, in the real world, the occurrence of
bursty packet losses is likely to result in MOS-CQS values which are lower than the
values given in Table 4-4 (in Chapter 4).
6.1.6 A comparison of MOS from Thai users with reference data
Table 2-17 (in Chapter 2), adapted from [16], contains the most complete data
currently available about codecs and their characteristics. Table 6-2 contains a
comparison of our values for MOS for Thai users with this reference data for a range
of qualities of VoIP for major codecs. From the table, it can be seen that:
1) The range of MOS for G.722 at 64 kbps of 4.17 - 4.25 from the Thai
subjective tests is higher than the MOS from the reference of approximately 4.1.
2) The range of MOS for G.711 (A-law) of 4.14 – 4.23 from Thai subjective
tests is higher than the MOS from the reference of 4.1.
3) The range of MOS for G.729 of 4.13 – 4.18 from Thai subjective tests is
appreciably higher than the MOS from the reference of 3.92.
4) The MOS of G.723.1 at 5.3 kbps of 3.9 from Thai subjective tests is
appreciably higher than the MOS from the reference of 3.6.
The MOS values from this thesis could be used as benchmarks or reference for
VoIP quality measurement in Thailand.
All MOS values listed in Table 6-2 for Thai users of the four codecs are higher
than the MOS values given in the reference [16]. The root causes of differences might
be due to effects such as language and culture listed in 6.1.5. However, the reasons
for the differences require further investigation.

TABLE 6-1 Summary of numbers of subjects

Method                        Codec                  No. of Subjects
Listening opinion tests       G.711                        31
                              G.729                        29
                              G.723.1 at 5.3 kbps          32
Interview tests               G.711                       100
                              G.722                       101
Conversation opinion tests    G.722                       248
                              G.711                       644
                              G.729                       236
Test set for evaluation       G.711                        70
Total                                                    1491

6.1.7 Comparison of the E2-model and ThaiVQE

In Chapter 5, it was shown that both the proposed E2-model and the ThaiVQE
model reduce prediction error by approximately 23-24% (see Table 5-3) when compared
with the standard E-model. However, there are two perspectives to be considered
when making this comparison, namely, an objective measurement perspective (based
on E-model that has been standardized by ITU-T) and a subjective measurement
perspective (based on subjective tests). The two perspectives are as follows:
1) Objective measurement perspective: ITU-T is the organization that takes
care of telecommunication standards, including voice quality measurement in
telecommunication. The E-model is accepted as the ITU-T standard and is widely
applied to assess VoIP quality. It has been widely used in VoIP even though it was
originally developed for network planning. Therefore, the enhanced E-model with a
Thai bias factor that has been developed in this thesis should be accepted as the
enhanced E-model standard for Thai users speaking the Thai language in Thai
environments. It is also suggested that an enhanced E-model with appropriate
language bias factors should be developed for other Asian countries, such as China,
Japan and Vietnam which have their own languages and rich cultures like Thailand.
2) Subjective measurement perspective: according to the literature review in
Chapter 2, many researchers support the idea that subjective measurements of voice
quality are highly accurate and reliable because judgments of voice quality are
inherently subjective. Objective voice quality measurement is a measurement tool
that can be used to estimate MOS easily, but it basically only simulates the users’
perceptions. Although ThaiVQE has been developed based on packet loss and packet
delay effects for the G.711 codec, it is suggested that ThaiVQE could be the most
accurate and reliable method for measuring voice quality for Thai users in Thai
environments. The ThaiVQE tables from the ThaiVQE model can serve as a guideline for
VoIP network/system planning for Thai users in Thai environments. For example, to
ensure that many Thai users will be satisfied, G.711 should be used with packet loss
below 1% and packet delay below 300 ms, while, to avoid dissatisfaction of many Thai
users when G.729 is used, packet loss should be less than 6%.

TABLE 6-2 Subjective MOS from Thai users

Codec                  MOS (Reference)   MOS (Thai)     Remark
G.722 at 64 kbps       ~4.1              4.17 - 4.25    No statistically significant differences for Thai users
G.711 A-law            4.1               4.14 - 4.23    No statistically significant differences for Thai users
G.729                  3.92              4.13 - 4.18    No statistically significant differences for Thai users
G.723.1 at 5.3 kbps    3.6               3.90           Significant differences from above codecs for Thai users

6.2 Conclusion
This research has been conducted on the assumption that VoIP users who speak
different languages, such as Thai (a tonal language) and English (a non-tonal
language), and who have different cultures and live in different countries may have
different perceptions about voice quality.
The research has been conducted with more than one thousand subjects and has
concentrated on three major factors, consisting of codec, packet loss and packet delay.
A main aim has been to develop appropriate recommendations about voice quality for
Thai users who speak the standard Thai spoken language and who have their own
culture. It can be claimed that the research in this thesis is the most intensive
subjective test so far carried out for VoIP quality measurement, particularly with a
tonal language. The results from the intensive subjective tests have been applied to
develop a modified objective measurement method that is suitable for use in Thai
environments and which gives results with high accuracy, reliability and confidence.
This research has verified and presents evidence that G.729 can provide a level
of voice quality as good as G.711 and G.722 (see Table 6-2). After surveying the
expectations of Thai users for VoIP quality, a MOS value of 3.41 has been proposed
as the baseline for providing voice quality of VoIP to Thai users in Thailand.
The core part of the gathered data, from 400 subjects, has been
validated and used to find a bias factor, called the Thai bias factor, to enhance the
E-model, which is the popular objective measurement method for VoIP quality. This
Enhanced E-model for VoIP quality measurement in Thai environments, called the
E2-model, has been proposed to provide objective measurements of higher accuracy,
reliability and confidence for use in Thai environments. Moreover, a new
mathematical model called the Thai Subjective VoIP Quality Evaluation (ThaiVQE)
model has been developed by using Matlab to analyze validated data from the
subjective tests with 400 subjects. These two proposed methods can be applied by VoIP
operators to provide a higher standard of VoIP quality to Thai users and thereby
contribute to a better quality of life for Thai people.
The results from this research are important evidence for the
language/cultural/nationality dependence of subjective VoIP quality measurement. It
can be particularly useful for ITU-T Study Group 12 which is currently studying
language/cultural/nationality dependence of the quality of experience of multimedia,
including MOS.
Some recommendations from this research referring to VoIP quality
measurement or evaluation, based-on Thai users in Thai environments, are as follows:
1) The PESQ tool is not recommended for use in Thai environments because it is
content and language dependent and has not been calibrated for the Thai language.
2) G.729 is strongly recommended for use instead of G.711 in cases that
require bandwidth reduction but still require voice quality as good as G.711.
G.729 can reduce bandwidth consumption by approximately two thirds (2/3) without
appreciably reducing voice quality as observed by Thai users. However, if users do
not care much about voice quality, G.723.1 is also an option because it can reduce
bandwidth consumption by about three fourths (3/4) to four fifths (4/5) compared to
G.711.
3) G.722 is not recommended if it requires a license charge to use because it
does not provide appreciably better voice quality than G.711.
4) The MOS values given in Table 6-2 can be used as the benchmarks for
VoIP systems/applications/services in Thailand.

5) Although the E-model was originally designed and developed for network
planning, it is independent from content and speech resources. Therefore it is
recommended that it should be modified and enhanced for use in Thai environments.
6) At present, there are no official government regulations adopted for MOS
in Thailand. It is suggested that VoIP service providers should use MOS of 3.41 as
the baseline to ensure that many Thai users are not dissatisfied and that a MOS of 3.6
should be proposed to NBTC for VoIP regulation.
7) The ThaiVQE tables (see Tables 5-1 and 5-5 in Chapter 5) should be used as
guidelines to provide good VoIP quality for Thai users.
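The bandwidth savings quoted in recommendation 2 can be checked with a rough per-call calculation. The sketch below is an illustration under assumptions that are not stated in this thesis: 20 ms packetization for G.711 and G.729, one 30 ms frame per packet for G.723.1 at 5.3 kbps, 40 bytes of IPv4/UDP/RTP headers per packet, and an optional 18 bytes of Ethernet framing. The function and constant names are hypothetical.

```python
IP_UDP_RTP_HEADER_BYTES = 40  # 20 (IPv4) + 8 (UDP) + 12 (RTP), assumed uncompressed

# Codec: (voice payload bytes per packet, packet interval in ms). Assumed packetization.
CODECS = {
    "G.711": (160, 20),
    "G.729": (20, 20),
    "G.723.1 (5.3 kbps)": (20, 30),
}

def voip_bandwidth_kbps(payload_bytes, packet_interval_ms, link_overhead_bytes=0):
    """One-way bandwidth of a single call, including packet headers."""
    packet_bits = (payload_bytes + IP_UDP_RTP_HEADER_BYTES + link_overhead_bytes) * 8
    return packet_bits / packet_interval_ms  # bits per millisecond equals kbit/s

def saving_vs_g711(codec, link_overhead_bytes=0):
    """Fraction of bandwidth saved by a codec relative to G.711."""
    reference = voip_bandwidth_kbps(*CODECS["G.711"], link_overhead_bytes)
    return 1 - voip_bandwidth_kbps(*CODECS[codec], link_overhead_bytes) / reference

for name in ("G.729", "G.723.1 (5.3 kbps)"):
    ip_only = saving_vs_g711(name)
    ethernet = saving_vs_g711(name, link_overhead_bytes=18)
    print(f"{name}: {ip_only:.0%} saved at IP level, {ethernet:.0%} with Ethernet framing")
```

Under these assumptions G.729 saves roughly 64-70% and G.723.1 roughly 76-80% relative to G.711, which is consistent with the approximate fractions of two thirds and three fourths to four fifths given above. Different packetization intervals or header compression would change the figures.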

6.3 Future Work

With the limited scope imposed on this work by constraints such as time and
recruitment of subjects, there are several issues that this thesis has not investigated.
Some issues that require further investigations are as follows:
1) Subjective tests in this thesis have been conducted with about 1,500
subjects. However, all of these subjects were students at KMUTNB and from a
limited range of ages and education. Therefore, future work should be conducted with
more representative subjects from a wider range of ages and economic status and
from regions of Thailand other than Bangkok.
2) Several available codecs have not been assessed by Thai users, for
example, G.723.1 (6.3 kbps), G.726, Speex, iLBC and GSM codecs. These codecs
should be studied as future work.
3) The extension of the E2-model to other codecs (e.g. G.723.1, G.729 and
G.722) should be investigated intensively using the approach described in this thesis.
4) The E2-model with embedded Thai bias factor should be developed and
applied to existing VoIP measurement tools in the Thai market.
5) An automatic, real-time implementation of the E2-model and ThaiVQE should
be developed and improved so that it can be adopted as a reliable standard for
assessing VoIP in Thailand and so that it can be used with high confidence by the
telecom industry of Thailand.
6) The proposed bias factor in this thesis should be investigated deeply,
particularly in terms of signal processing, to find out the root factors associated with
the tonal nature of Thai speech.
7) Raw data from the subjective tests might be used to create a model using
other techniques, such as, machine learning techniques (e.g. ANN).
8) The effects of culture, based-on the way people respond to situations,
should be investigated further, to see how it affects perception of users to voice
quality of VoIP.
9) The subjective MOS based on Thai users should be studied regularly, for
example, every 5-10 years, to investigate how the change of culture and the higher
quality of life of Thai people affect the VoIP quality perception of Thai users.
10) The approach of this thesis should be applied to investigate the perception
of VoIP quality by subjects in other countries that have their own language and
culture (e.g., China, Japan and ASEAN countries) to confirm whether the language
and cultural variation reported in this thesis also affect VoIP quality evaluation in
these other countries.

1. Daengsi, T., et al. “VoIP Quality Measurement: Insignificant Voice Quality of
G.711 and G.729 Codecs in Listening-Opinion Tests by Thai Users.”
Information Technology Journal. 8(1) (2012): 77-82.
2. Daengsi, T. and Preechayasomboon, A. “Case Study: AIA Insurance – Migration
Project Experience.” Proc. 4th National Conference Computing and
Information Technology (NCCIT’08). (23-24 May 2008): 45-50.
3. _____. “Case Study: AMEX Thailand - PABX Migration Experience.” Proc. 5th
National Conference Computing and Information Technology (NCCIT’09).
(22-23 May 2009): 661-665.
4. Somboonrungroj, H. Let us know VoIP technology. [online] 2012. [cited 2012 Dec
15]. Available from : (Thai).
5. Information Technology Center, MJU. VOIP. [online] 2012. [cited 2012 Dec 15].
Available from : (Thai).
6. Suwannaraj, K. Deployment of VoIP in Pibulsongkram Rajabhat University.
[online] 2012. [cited 2012 Dec 9]. Available from :
/~kitti/asterisk/before_implement_voip.ppt. (Thai).
7. PRSU. Location for Installation of VoIP Campus Phone PSRU. [online] 2012.
[cited 2012 Dec 9]. Available from :
phone_location.pdf. (Thai).
8. Purimkasem, W. IP Telephony Use for Chanthaburi Campus. [online] 2012. [cited
2012 Dec 15]. Available from : http://
icle/IPphoneChan.pdf. (Thai).
9. ARIP, RMUTL. VoIP manual. [online] 2012. [cited 2012 Dec 15]. Available from
: (Thai).
10. Thai Customs Department. Bidding. [online] 2011. [cited 2011 Jul 8]. Available
from : (Thai).
11. TOT. TOT netcall. [online] 2012. [cited 2012 Dec 15]. Available from : http://
12. ____. 008. [online] 2012. [cited 2012 Dec 15]. Available from :
13. True. What is TrueNetTalk? [online] 2012. [cited 2012 Dec 15]. Available from :
14. CAT. CAT009 Service. [online] 2012. [cited 2012 Dec 15]. Available from : http:
// (Thai).
15. NTC. NTC Announcement: Numbering Plan. [online] 2012. [cited 2012 Dec 15].
Available from :
184.PDF. (Thai).
16. Karapantazis, S. and Pavlidou, F.-N. “Voip: A comprehensive survey on a
promising technology.” Computer Networks. 53 (2009): 2050-2090.

17. Goudarzi, M. Evaluation of Voice Quality in 3G Mobile Networks. Master of

Science Thesis, School of Computing, Communications and Electronics,
Faculty of Technology, University of Plymouth, 2008.
18. Daengsi, T., et al. The Development of a Thai Speech Set for Telephonometry.
[online] 2012. [cited 2012 Dec 15]. Available from :
19. Wutiwiwatchai, C. and Furui, S. “Thai speech processing technology: A review.”
Speech Communication. 49 (2007): 8-27.
20. Hudak, T.J. Some Historical Background of Thai Language. [online] 2012. [cited
2012 Dec 15]. Available from :
21. TAT. Map of Thailand. [online] 2012. [cited 2012 Mar 19]. Available from : http:
22. Campbell, S. and Shaweevongs, C. Tones. [online] 2012. [cited 2012 Dec 16].
Available from :
23. Nuamkaew, V. Introduction to Linguistics. The Faculty of Liberal Arts: Krirk
University, 2008. (Thai).
24. Naksakul, K. Thai sound system. 6th ed. Bangkok: Chulalongkorn University,
2008. (Thai).
25. YWAM, Thailand. A guide to culture. [online] 2012. [cited 2012 Dec 16].
Available from :
26. Monthienvichienchai, C., et al. “Cultural awareness, communication apprehension
and communication competence : a case study of Saint John’s International
School.” The International Journal of Educational Management. 16 (2002):
27. Knutson, T.J. Comparison of Thai and U.S. American Culture Values: “Mai pen
rai” versus “Just do it.” [online] 2012. [cited 2012 Dec 16]. Available from :
28. Vongvipanond, P. Linguistic Perspectives of Thai Culture. [online] 2012. [cited
2012 Dec 16]. Available from :
29. Vatanasakdakul, S. and Ambra, J. D. “An Exploratory Study of the Socio-
Cultural Impact on the Adoption of E-Commerce for Firms in the Tourism
Industry of Thailand.” Proc. 14th European Conference on Information
System (ECIS 2006). (12-14 May 2006): 235-247.
30. Altbach, P.G. The past and future of Asian universities: twenty–first century
challenges. [online] 2012. [cited 2012 Dec 20]. Available from : http://50-
31. Joungtrakul, J. The Cultural Dimensions of Business Management in Thailand.
[online] 2012. [cited 2012 May 15]. Available from : http://www.blcigroup
32. Muenjohn, N. “Dimensions of Culture: Understanding Thai working and
management styles.” Proc. Business and Information 2011 (BAI 2011). (4-
6 July 2011): 1-14.

33. Numprasertchai, H.P. and Swierczek, F.W. Dimensions of Success in International

Business Negotiations: A Comparative Study of Thai and International
Business Negotiators. [online] 2012. [cited 2012 Dec 15]. Available from :
34. Lam, C.K. Thailand as new destination for Norwegian travelers. Master Thesis,
The Narwegian School of Hotel Management, Faculty of Social Sciences,
University of Stavanger, 2011.
35. Komolesevin, R., Knutson, T.J. and Datthuyawat, P. “Effective Intercultural
Communication: Research Contributions from Thailand.” Journal of Asian
Pacific Communication. 20 (2010): 90-100.
36. Knutson, T.J., Tales of Thailand: Lessons from the Land of Smile. [online] 2012.
[cited 2012 Dec 15]. Available from :
37. Wangkijchinda, K. Development Intercultural Communication Competence: a
Guide for English Foreign Language Teachers in Thailand. Master of Arts in
Teaching International Languages Thesis, California State University, 2011.
38. Ahmed, S.A. and Astous, A. D. “Moderating effect of nationality on country- of-
original perception: English-speaking Thailand versus French-speaking
Canada.” Journal of Business Research. 60 (2007): 240-248.
39. Soontayatron, S. Socio-Cultural Changes in Thai Beach Resorts: A Case Study of
Koh Samui Island, Thailand. Ph.D. Thesis, School of Tourism, Bouenemouth
University, 2010.
40. Thanasankit, T. and Corbitt, B. “Understanding Thai Culture and Its Impact on
Requirements Engineering Process Management During Information
Systems Development.” Asian Academy of Management Journal. 7 (2002):
41. Chuvetsereporn, S. The impact of Thai culture toward the idea of Empowerment
on Multinational Corporation in Thailand. [online] 2012. [cited 2012 Dec
15]. Available from :
42. Claydon, R. Selected sections from the country file of Thailand. [online] 2012.
[cited 2012 Dec 15]. Available from :
43. Jittaruttha, C. Pyramid Culture and the Struggle for Democratization. [online]
2012. [cited 2012 Jul 9]. Available from : http://www.organizzazion
44. MacDonal, K. What Makes Western Culture Unique? [online] 2012. [cited 2012
Dec 9]. Available from :
45. Hofstede, G. What about Thailand?. [online] 2012. [cited 2012 Dec 17].
Available from :
46. Dockery, A.M. Culture and wellbeing: The case of Indigenous Australians.
[online] 2012. [cited 2012 Dec 17]. Available from :
47. Socialbakers. Facebook Statistics by City (Beta). [online] 2012. [cited 2012 Dec
16]. Available from : statistics/

48. CFAR. Mini-case Study: Nike’s “Just Do It” Advertising Campaign. [online]
2012. [cited 2012 Dec 16]. Available from :
49. Camp J. et al. Thailand and Adult Diapers Feasibility Study. [online] 2012. [cited
2012 Dec 16]. Available from :
50. NSO. The Key Statistics of Thailand, Household Sccio-economic Survey in the
first half of the year 2011. [online] 2012. [cited 2012 Dec 16]. Available
from :
f. (Thai).
51. Rogers, A.L. Wind Turbine Acoustic Noise. [online] 2012. [cited 2012 Dec 16].
Available from :
52. Stanfield, C.L. and Germann, W.J. Principles of Human Physiology. 3rd ed.
California: Pearson Education, 2008.
53. Tortora, G.J. and Derrickson, B. Principles of Anatomy and Physiology. 12th ed.
Asia: John Wiley & Sons, 2009.
54. Silverthorn, D.U., et al. Human Physiology: An Integrated Approach. 2nd ed.
California: Pearson Education, 2001.
55. Lange, M. Dorsal. [online] 2012. [cited 2012 Dec 16]. Available from :
56. Sherwood, L. Human Physiology: from Cells to Systems. 4th ed. California:
Thomson Learning, 2001.
57. Rohen, J.W., Yokochi, C. and Lutjen-Drecoll, E. Color Atlas of Anatomy:
Aphotographic Study of the Human Body. 5th ed. Pennsylvania: Williams &
Wilkins, 2002.
58. Sittiprapaporn, W., Chindaduangratn, C. and Kotchabhakdi, N. “Brain electric
activity during the preattentive perception of speech sounds in tonal
languages.” Songklanakarin Journal of Science and Technology. 26 (2004):
59. ____. Functional Specialization of the Human Auditory Cortex in Processing of
Speech Prosody: A Low Resolution Electromagnetic Tomography (LORETA)
Study. [online] 2012. [cited 2012 Dec 9]. Available from : http://anchan. kukr/bitstream/003/17786/1/KC4205011.pdf.
60. ____. “Long-term memory traces for familiar spoken words in tonal languages as
revealed by the Mismatch negativity.” Songklanakarin Journal of Science
and Technology. 26 (2004): 779-786.
61. Gandour, J., et al. “Pitch processing in the human brain is influenced by language
experience.” Neuroreport Rapid Science. 9 (1998): 2115-2119.
62. ____. “A crosslinguistic PET study of tone perception.” Journal of Cognitive
Neuroscience. 12 (2000): 207-222.
63. Klein, D., et al. “A cross-linguistic PET study of tone perception in Mandarin
Chinese and English speakers.” NeuroImage. 13 (2001): 646-653.
64. Wang, Y., Jongman, A. and Sereno, J.A. “Dichotic perception of Mandarin tones
by Chinese and American listeners.” Brain and Language. 78 (2001): 332-

65. Wang, Y., et al. “The role of linguistic experience in the hemispheric processing
of lexical tone.” Applied Psycholinguistics. 25 (2004): 449-466.
66. Keenaghan, K.M. A Novel Non-Acoustic Voiced Speech Sensor: Experimental
Results and Characterization. Master of Science in Electrical and Computer
Engineering Thesis, Worcester Polytechnic Institute, 2004.
67. Anusuya, M.A. and Kitti, S.K. “Speech Recognition by Machine: A Review.”
International Journal of Computer Science and Information Security. 6
(2009): 181-205.
68. NIDCD. What Is Voice? What Is Speech? What Is Language? [online] 2012.
[cited 2012 Dec 15]. Available from :
69. Jiang, W. and Schulzrinne, H. “Analysis of On-Off Patterns in VoIP and Their
Effect on Voice Traffic Aggregation.” Proc. 9th Int. Conf. Computer
Communications and Networks. (16-18 October 2000): 82-87.
70. FCC. Consumer Guide, Voice Over Internet Protocol (VoIP). [online] 2012.
[cited 2012 Dec 17]. Available from :
71. Desantis, M. Understanding Voice over Internet Protocol (VoIP). [online] 2012.
[cited 2012 Dec 17]. Available from : _room/
72. Nokia. Advantages of SIP for VoIP. [online] 2012. [cited 2012 Dec 17]. Available
from : About_Nokia/Press/White_
73. Emmerson, B. Convergence: the Business Case for IP Telephony. [online] 2012.
[cited 2012 Dec 17]. Available from :
74. ITU. The Status of Voice Over Internet Protocol (VoIP) Worldwide, 2006.
[online] 2012. [cited 2012 Dec 17]. Available from :
75. NASCIO. VoIP and IP Telephony: Planning for Convergence in State
Government. [online] 2012. [cited 2012 Dec 17]. Available from : http://
76. Komunikasi, S. IP Telephony. [online] 2012. [cited 2012 Dec 15]. Available from :
77. Sengar, H., et al. “Fast Detection of Denial-of-Service Attacks on IP Telephony.”
Proc. of 14th IEEE Int. Workshop on Quality of Service (IWQoS 2006). (19-21
June 2006): 199-208.
78. Soares, V.N.G., Neves, P.A.C. and Rodrigues, J.J.P. “Past, Prsent and Future of
IP Telephony.” Proc. Int. Conf. Communication Theory, Reliability, and
Quality of Service 2008 (CTRQ’08). (29 June – 5 July 2008): 19-24.
79. Tanenbaum, A.S. Computer Networks. 4th ed. New Jersey: Prentice Hall, 2003.
80. Macario, J. Intro to Voice over Internet Protocol: What does VoIP Mean for My
Business?. [online] 2012. [cited 2012 Dec 9]. Available from : http://www.
81. Hurley, M. VoIP Vulnerabilities. [online] 2012. [cited 2012 Dec 19]. Available
from :

82. Keromytis, A.D. Voice over IP: Risks, Threats and Vulnerabilities. [online] 2012.
[cited 2012 Dec 17]. Available from :
83. UVA-WISE. New Campus VoIP Telephone System installation starting soon.
[online] 2012. [cited 2012 Dec 17]. Available from :
84. Prime Minister’s Office. Thailand’s NBTC appointments announcement. [online]
2011. [cited 2011 Sep 20]. Available from :
85. NNT. 11 NBTC members endorsed by HM the King. [online] 2012. [cited 2012
Dec 17]. Available from :
86. The Nation. New NBTC to begin search for secretary. [online] 2012. [cited 2012
Dec 9]. Available from :
87. ____. Court drops 3G bombshell. [online] 2012. [cited 2012 Dec 17]. Available
from : ness/Court-
88. Seriwiwatta, P., Nittayagasetwat, A. and Panyagometh, K. “3G and Economic
Impact: A Case if Thailand.” NIDA Business Journal. 9 (2011): 5-21.
89. BBC. 4G Mobile Phone Network Comes to Scandinavia. [online] 2012. [cited
2012 Dec 17]. Available from :
90. ITU-R Recommendation M.1645. Framework and overall objectives of the future
development of IMT-2000 and system beyond IMT-2000. August, 2003.
91. NBTC. NTC Announcement: QoS VoIP. [online] 2012. [cited 2012 Dec 17].
Available from :!ut/p/c4/04_SB8K8x
92. Jaruvitayakovit, T. “VoIP Status in Thailand.” Proc. 1st AUN/Seed-Net Electrical
and Electronics Engineering Regional Conference, Int. Sym. Multimedia and
Communication Technology. (22-23 January 2009): 128-130.
93. Vanijja, V. VoIP Software Using Open Source. [online] 2011. [cited 2011 Nov
11]. Available from :
94. CITEL. Redes de Próxima Generación. [online] 2012. [cited 2012 Dec 18].
Available from :
95. Daengsi, T., et al. “Recent VoIP Services in Thailand and the Expectation of Thai
Users to Voice Quality.” Proc. 1st ASEAN Plus Three Graduate Research
Congress. (1-2 March 2012): ST-835 – ST-840.
96. NBTC. VoIP Telephone Number (06-xxxxx-xxx) in Thailand. [online] 2012. [cited
2012 Dec 17]. Available from :

97. CAT Telecom. CAT 001 Rates. [online] 2012. [cited 2012 Dec 17]. Available
from :
98. CAT2call. CAT2call Rates. [online] 2012. [cited 2012 Dec 17]. Available from :
99. TOT netcall. Rate International. [online] 2012. [cited 2012 Dec 17]. Available
from :
100. True NetTalk. International Calls Rate. [online] 2012. [cited 2012 Dec 17].
Available from :
101. CAT Telecom. CAT 009 Promotion Rates. [online] 2012. [cited 2012 Dec 17].
Available from :
102. TOT. International saving rate service via 008 code. [online] 2012. [cited 2012
Dec 17]. Available from :
103. AIN. 00500 Country & Rate. [online] 2012. [cited 2012 Dec 17]. Available from
104. Truemove. Table of international call rate via code 00600. [online] 2012. [cited
2012 Dec 17]. Available from :
105. CAT. CAT PhoneNet. [online] 2012. [cited 2012 Dec 17]. Available from :
106. ITU-T Recommendation G.722. 7 kHz Audio – Coding within 64 kbit/s. 1988.
107. Packetizer. VoIP Bandwidh Calculator. [online] 2012. [cited 2012 Dec 20].
Available from :
108. ITU-T Recommendation G.711. Pulse Code Modulation (PCM) of Voice
Frequencies. 1988.
109. Hersent, O., Petit, J.-P. and Gurle, D. IP Telephony Deploying Voice-over-IP
Protocols. 1st ed. UK: Wiley, 2005.
110. ITU-T Recommendation G.729. Coding of speech at 8 kbit/s using conjugate-
structure algebraic-code-excited linear prediction (CS-ACELP). January,
111. ITU-T Recommendation G.723.1. Dual rate speech coder for multimedia
communications transmitting at 5.3 and 6.3 kbit/s. May, 2006.
112. Avaya Labs. Avaya IP Voice Quality Network Requirement. [online] 2012. [cited
2012 Dec 18]. Available from :
113. Chen, K.-T., et al. “Quantifying Skype User Satisfaction.” Proc. Special Interest
Group on Data Communication 2006 (SIGCOMM’06). (11-15 September
2006): 399-410.
114. Bonfiglio, D. et al. “Revealing Skype Traffic: When Randomness Plays with
you.” Proc. Special Interest Group on Data Communication 2007
(SIGCOMM’07). (27-31 August 2007): 37-48.
115. Daengsi, T. “Voice Quality Measurement for VoIP: Simple Method Using a
Survey.” Proc. 32nd Electricall Engineering Conference (EECON-32). (28-
30 October 2009).
116. Sulkin, A. PBX Systems for IP Telephony. 1st ed. USA: McGraw-Hill, 2002.

117. Goralski, W.J. and Kolon, M.C. IP Telephony. 1st ed. New York: McGraw-Hill,
118. Wallence, W. Voice over IP first-step. 1st ed. Indianapolis: Cisco Press, 2005.
119. Davidson, J., et al. Voice over IP Fundamentals. 2nd ed. Indianapolis: Cisco
Press, 2006.
120. Cisco. H.323 and SIP Integration. [online] 2012. [cited 2012 Dec 17]. Available
from : prodlit/sh23g
121. Schulzrinne, H. and Rosenberg, J. A Comparison of SIP and H.323 for Internet
Telephony. [online] 2012. [cited 2012 Dec 17]. Available from : http://www.
122. Arora, R. Voice over IP: Protocol Standards. [online] 2012. [cited 2012 Dec
17]. Available from :
123. Avaya. Enterprising with SIP – A Technology Overview. [online] 2012. [cited
2012 Dec 17]. Available from :
124. Tong, H.A. and Rupp, S. SIP-based VoIP services – Architecture and
Comparison. [online] 2012. [cited 2012 Dec 17].
125. Parageogiou, P. A Comparison of H.323 vs SIP. [online] 2012. [cited 2012 Dec
17]. Available from :
126. Malhotra, S. and Kaur, P. “Comparison of Call Signalling Protocols for Ad-hoc
Networks.” International Journal of Computer Applications. 27 (2011): 35-
127. Chen., X., et al. “Survey on QoS Management of VoIP.” Proc. 2003 Int. Conf.
Computer Networks and Mobile Computing (ICCNMC’03). (20-23 October
2003): 69-77.
128. The Applied Technologies Group, Inc. QoS in the Enterprise. [online] 2012.
[cited 2012 Dec 17]. Available from :
129. Frost, N. “VoIP threats – getting louder.” Network Security. (2006): 16-18.
130. Bradbury, D. “The security challenges inherent in VoIP.” Computers & Security.
26 (2007): 485-487.
131. Effnet. An introduction to IP header compression. [online] 2012. [cited 2012
Dec 17]. Available from :
per _Header_Compression.pdf.
132. Palmieri, F. and Fiore, U. “Providing true end-to-end security in converged
voice over IP infrastructures.” Computers & Security. 28 (2009): 433-449.
133. Stanton, R. “Secure VoIP – an achievable goal.” Computer Fraud & Security.
(2006): 11-14.
134. VoIPSA. VoIP Security and Privacy Threat Taxonomy. [online] 2012. [cited
2012 Dec 17]. Available from : VOIPSA_
135. Keromytis, A.D. “Voice-over-IP Security Research and Practice.” IEEE Security
& Privacy. 8 (2010): 76-78.

136. Hunter, P. “VoIP the latest security concern: DoS attack the greatest threat.”
Network Security. (2002): 5-7.
137. Winkler, S. “Video Quality Measurement Standards – Current Status and
Trends.” Proc. 7th Int. Conf. Information, Communications and Signal
Processing 2009 (ICICS 2009). (8-10 December 2009): 1-5.
138. Kilkki, K. “Quality of Experience in Communications Ecosystem.” Journal of
Universal Computer Science. 14 (2008): 615-624.
139. ITU-T Recommendation G.1000. Communications quality of service: A
framework and definitions. November, 2001.
140. ITU-T Recommendation E.800. Terms and definitions related to quality of
service and network performance including dependability. August, 1994.
141. ITU-T Recommendation P.10/G.100. Vocabulary for performance and quality of
service, Amendment 1 - New Appendix I- Definition of Quality of Experience
(QoE). January, 2007.
142. Tran, H.A. and Mellouk, A. “QoE model driven for network services.” Proc. 8th
Int. Conf. Wired/Wireless Internet Communications (WWIC 2010). (1-3 Jun
2010): 264-277.
143. Batteram H. et al. “Delivering Quality of Experience in Multimedia Networks.”
Bell Labs Technical Journal. 15 (2010): 175-194.
144. Nokia. Quality of Experience (QoE) of mobile services: Can it be measured and
improved?. [online] 2012. [cited 2012 Dec 19]. Available from : http://www.
145. Hestnes, B., et al. “Quality of Experience in real-time person-person
communication - User based QoS expressed in technical network QoS
terms.” Proc. 19th Int. Sym. Human Factors in Telecommunication. (1-4
December 2003): 3-10.
146. IneoQuest. MDI / QoE for IPTV and VoIP Quality of Experience for Media over
IP. [online] 2012. [cited 2012 Dec 19]. Available from : http://ftp. ineoquest.
147. Empirix. Assuring QoE on Next Generation Networks. [online] 2012. [cited
2012 Dec 19]. Available from :
148. Wu, W., et al. “Quality of experience in distributed interactive multimedia
environments: toward a theoretical framework.” Proc. 17th ACM Int. Conf.
Multimedia (MM ’09). (19-24 October 2009): 481-490.
149. Daengsi, T. et al. “A study of VoIP quality evaluation: User perception of voice
quality from G.729, G.711 and G.722.” Proc. 9th IEEE Consumer
Communications and Networking Conference – Special Session on Quality of
Experience (QoE) for Multimedia Communications. (14-17 January 2012):
150. ___. “Speech Quality Assessment of VoIP: G.711 VS G.722 Based on Interview
Tests with Thai Users.” International Journal of Information Technology and
Computer Science. 4 (2012): 19-25.
151. Gierlich, H.W. and Kettler, F. “Advanced speech quality testing of modern
telecommunication equipment: An overview.” Signal Processing. 86 (2006):

152. Mahdi, A.E. and Picovici, D. “Advances in voice quality measurement in

modern telecommunications.” Digital Signal Processing. 19 (2009): 79-
153. Rango, F.D., et al. “Overview on VoIP: Subjective and Objective Measurement
Methods.” International Journal of Computer Science and Network Security.
6 (2006): 140-153.
154. Uemura, S., et al. “QoS/QoE measurement system implemented on cellular
phone for NGN.” Proc. 5th IEEE Consumer Communications and Networking
Conference 2008 (CCNC 2008). (10-12 January 2008): 117-121.
155. Sun, L. Speech Quality Prediction for Voice over Internet Protocol Networks.
Ph.D. Thesis, School of Computing, Communications and Electronics,
Faculty of Technology, University of Plymouth, 2004.
156. ITU.T Recommendation P.800. Methods for subjective determination of
transmission quality. August, 1996.
157. Ditech Networks. Voice quality beyond IP QoS.Detech Networks. [online] 2012.
[cited 2012 Dec 9]. Available from :
158. Rix, A.W. Comparison between subjective listening quality and P.862 PESQ
score. [online] 2012. [cited 2012 Dec 19]. Available from : http://wireless.
159. de Lima A.A., et al. On the Quality Assessment of Sound Signals. [online] 2012.
[cited 2012 Dec 17]. Available from :
160. CCITT. Handbook on Telephonometry. 1st ed. Geneva: ITU, 1987.
161. William, L. “Subjective” – The Misused Word.” The Print. 24 (2008): 1-4.
162. Rothstein, J.M. “Objective versus Subjective: Kudzu Terminology.” Physiotherapy
Canada. 60 (2008): 103-105.
163. Farlex. Subjective. [online] 2012. [cited 2012 Dec 17]. Available from : http://
164. Cambridge University Press. Subjective. [online] 2012. [cited 2012 Dec 17].
Available from :
165. Oxford University Press. Subjective. [online] 2012. [cited 2012 Dec 17].
Available from :
166. Longman. Subjective. [online] 2012. [cited 2012 Dec 9]. Available from : http://
167. Sun, L. and Ifeachor, E.C. “Perceived Speech Quality Prediction for Voice over
IP-based Networks.” Proc. IEEE Int. Conf. Communications 2002 (ICC’02).
(28 April - 2 May 2002): 2573-2577.
168. Kovac, A. and Halas, M. Analysis of Influence of Network Performance
Parameters on VoIP Call Quality. [online] 2012. [cited 2012 Dec 17].
Available from :
169. Uhl, T. “Quality of Service in VoIP Communication.” AEU-International
Journal of Electronics and Communications. 58 (2004): 178-182.

170. Cisco. QoS: Quality of Service. [online] 2012. [cited 2012 Dec 17]. Available
from :
171. Jarrett, D. and Buchanan, K. Building Residential VoIP Gateways: A Tutorial
Part Three: Voice Quality Assurance for VoIP Networks. [online] 2012.
[cited 2012 Dec 17]. Available from :
172. IEEE. You searched for: VoIP quality. [online] 2012. [cited 2012 Sep 1].
Available from :
173. ___. You searched for: VoIP quality delay. [online] 2012. [cited 2012 Sep 1].
Available from :
174. ___. You searched for: VoIP quality loss. [online] 2012. [cited 2012 Sep 1].
Available from :
175. ___. You searched for: VoIP quality jitter. [online] 2012. [cited 2012 Sep 1].
Available from :
176. Holub, J., Kastner, M. and Tomiska, O. “Delay effect on conversational quality
in telecommunication networks: Do we mind?” Proceedings of the Wireless
Telecommunications Symposium (WTS 2007). (26-28 April 2007): 1-4.
177. ITU-T Recommendation G.114. One-way transmission time. May, 2003.
178. ITU-T Recommendation G.107. The E-model: a computational model for use in
transmission planning. December, 2011.
179. ITU. The Essential Report on IP Telephony. [online] 2012. [cited 2012 Dec 17].
Available from :
180. Boutremans, C., Iannaccone, G. and Diot, C. Impact of link failures on VoIP
performance. [online] 2012. [cited 2012 Dec 17]. Available from : http://
181. Markopoulou, A. et al. “Characterization of Failures in an IP Backbone.” Proc.
23rd Annual Jount Conference of the IEEE Computer and Communication
Societies (INFOCOM 2004). (7-11 March 2004): 2307-2317.
182. Markopoulou, A., Tobagi, F. and Karam, M. “Loss and Delay Measurements of
Internet Backbones.” Computer Communications. 29 (2006): 1590-1604.
183. Zhang, H., et al. “Packet Loss Burstiness and Enhancement to the E-Model.”
Proc. 6th Int. Conf. Software Engineering, Artificial Intelligence, Networking
and Parallel/Distributed Computing and 1st ACIS Int. Workshop on Self-
Assembling Wireless Networks (SNPD/SAWN 2005). (23-25 May 2005):
184. Fluke Corporation. Quality Management: Troubleshooting Techniques for Voice
over IP. [online] 2012. [cited 2012 Dec 20]. Available from : http://www.teq

185. Hall, T.A. “Objective speech quality measures for Internet telephony.” Proc. of
SPIE, Voice over IP (VoIP) Technology. (2001): 128-136.
186. Narbutt, M. and Davis, M. Assessing the quality of VoIP transmission affected
by playout buffer scheme. [online] 2012. [cited 2012 Dec 20]. Available from
187. Ding, L., et al. “Non-intrusive single-ended speech quality assessment in VoIP.”
Speech Communication. 49 (2007): 477-489.
188. Khanduri, P. “Method and Apparatus for Measuring Voice Quality on a VoIP
Network.” U.S. Patent: 2009/0238085 A1. 24 September 2009.
189. Al-Akhras, M., et al. “Non-intrusive speech quality prediction in VoIP networks
using a neural network approach.” Neurocomputing. 72 (2009): 2595-2608.
190. Lee, J., Nam, K. and Kim, D. “Effect of Network factors on VoIP.” Proc. 13th
Int. Conf. Advanced Communication Technology (ICACT). (13-16 February
2011): 1130-1135.
191. Goudarzi, M. and Sun, L. Performance analysis and comparison of PESQ and
3SQM in live 3G mobile networks. [online] 2012. [cited 2012 Dec 20].
Available from : Perform
192. Voznak M. and Rozhon, J. Automated Speech Quality Monitoring Tool based on
Perceptual Evaluation. [online] 2012. [cited 2012 Dec 18]. Available from :
193. OPTICOM GmbH, SwissQual AG and TNO Telecom. POLQA® Perceptual
Objective Listening Quality Analysis. [online] 2012. [cited 2012 Dec 18].
Available from :
194. ITU-T Recommendation P.862. Perceptual evaluation of speech quality
(PESQ): An objective method for end-to-end speech quality assessment of
narrow-band telephone. 2001.
195. ITU-T Recommendation P.862.2. Wideband extension to Recommendation
P.862 for the assessment of wideband telephone networks and speech codecs.
November, 2007.
196. Ditech Networls. Limitations of PESQ for Measuring Voice Quality in Mobile
and VoIP Networks. [online] 2012. [cited 2012 Dec 18]. Available from :
197. Johannesson, N.O. “The ETSI Computation Model: A Tool for Transmission
Planning of Telephone Networks.” IEEE Communication Magazine. 35
(1997): 70-79.
198. ITU-T Recommendation G.107. The E-model: a computational model for use in
transmission planning. August, 2008.
199. Qiao, Z., Sun, L. and Ifeachor, E. “Case Study of PESQ Performance in Live
Wireless Mobile VoIP Environment.” Proc. IEEE 19th Int. Sym. Personal,
Indoor and Mobile Radio Communications 2008 (PIMRC 2008). (15-18
September 2008): 1-6.

200. Qualcomm. PESQ Limitations for EVRC Family of Narrowband and Wideband
Speech Codecs. [online] 2012. [cited 2012 Dec 18]. Available from : http:
201. IEEE. You searched for: VoIP E-model. [online] 2012. [cited 2012 Sep 1].
Available from :
202. ___. You searched for: VoIP PESQ. [online] 2012. [cited 2012 Sep 1]. Available
from :
203. Lakaniemi, A., Rosti, J. and Raisanen, V.I. “Subjective VoIP speech quality
evaluation based on network measurements.” Proc. IEEE Int. Conf.
Communications 2001 (ICC 2001). (11-14 June 2001): 748–752.
204. Kitawaki, N. and Tamada, T. Subjective and Objective Quality Assessment for
Noise Reduced Speech. [online] 2012. [cited 2012 Dec 20]. Available from :
205. Cai, Z., et al. “Comparison of MOS evaluation characteristics for Chinese,
Japanese and English in IP telephony.” Proc. 4th International Universal
Communication Symposium. (18-19 October 2010): 112-115.
206. ITU-T. Question 7/12 – Methods, tools and test plans for the subjective
assessment of speech, audio and audiovisual quality interactions. [online]
2012. [cited 2012 Dec 17]. Available from :
207. ___. Work Programme. [online] 2012. [cited 2012 Aug 21]. Available from :
208. Yaodu, W., et al. “Subjective Speech Quality Evaluation Based on Collecting
Opinions via Internet.” Proc. 2010 Int. Conf. Communications and Mobile
Computing. (12-14 April 2010): 517-521.
209. Uzoamaka, E.D. Validating Perceptual Objective Listening Quality Assessment
Methods on the Tonal Language Igbo. Master of Science Thesis, Network
Architecture and Services, Faculty of Electrical Engineering, Mathematics
and Computer Science, Delft University of Technology, 2009.
210. Voran, S.D. “Subjective ratings of instantaneous and gradual transitions from
narrowband to wideband active speech.” Proc. IEEE Int. Conf. Acoustics,
Speech, and Signal Processing (ICASSP 2010). (14-19 March 2010): 4674-
211. Jellnek, M., Vaillancourt, T. and Gibbs, J. “G.718: A New Embedded Speech
and Audio Coding Standard with High Resilience to Error-Prone
Transmission Channels.” IEEE Communication Magazine. 47 (2009): 117-

212. Psytechnics. VoIP client benchmarking report. [online] 2012. [cited 2012 Dec
19]. Available from :
213. Hiwasaki, Y. and Ohmuro, H. “ITU-T G.711.1: Extending G.711 to Higher-
Quality Wideband Speech.” IEEE Communication Magazine. 47 (2009):
214. 3GPP. 3rd Generation Partnership Project; Technical Specification Group
Services and System Aspects; Performance characterization of the Adaptive
Multi-Rate Wideband (AMR-WB) speech codec (Release 5). [online] 2012.
[cited 2012 Dec 19]. Available from :
215. Chen, J.-H. and Thyssen, J. “BroadVoice®16: A PacketCable Speech Coding
Standard for Cable Telephony.” Proc. 40th Asilomar Conf. Signals, Systems,
Computers 2006 (ACSSC’06). (29 October - 1 November 2006): 1316-1320.
216. ___. “The BroadVoice Speech Coding Algorithm.” Proc. IEEE Int. Conf.
Acoustics, Speech, Signal Processing 2007 (ICASSP 2007). (15-20 April
2007): IV-537-IV-540.
217. Ren, J., et al. “Assessment of effects of different language in VOIP.” Proc. Int.
Conf. Audio, Language and Image Processing 2008 (ICALIP 2008). (7-9
July 2008): 1624-1628.
218. Wu, C.C., et al. “An empirical evaluation of VoIP playout buffer dimensioning
in Skype, Google talk and MSN Messenger.” Proc. Int. workshop on
Network and Operating Systems Support for Digital Audio and Video
(NOSSDAV ’09). (3-5 June 2009): 97-102.
219. Voznak, M. “E-model Modification for Case of Cascade Codecs Arrangement.”
International Journal of Mathematical Models and Methods in Applied
Sciences. 5 (2011): 1301-1309.
220. Voznak, M. et al. “E-model Improvement for Speech Quality Evaluation
Including Codecs Tandeming.” Proc. 9th WSEAS Int. Conf. Data Networks,
Communications, Computers. (3-5 November 2010): 119-124.
221. Zheng, H. and Lin, Q. “Non-intrusive Speech Qualityy Assessment in VoIP
Using the Extended E-model.” Energy Procedia. 13 (2011): 6867-687.
222. Ding, L. and Goubran, R.A. “Speech quality prediction in VoIP using the
extended E-model.” Proc. IEEE Global Communications Conference 2003
(GLOBECOM’03). (1-5 December 2003): 3974-3978.
223. Bandung, Y. et al. “Optimizing Voice over Internet Protocol (VoIP) Networks
based-on Extended E-model.” Proc. IEEE Conf. Cybernetics and Intelligent
Systems. (21-24 September 2008): 801-805.
224. Ren, J. et al. “Enhancement to E-model on standard deviation of packet delay.”
Proc. 3rd Int. Conf. Information Sciences and Interaction Sciences. (23-25
June 2010): 256-259.
225. Gareiss, R. The True Cost of Voice Over IP. [online] 2012. [cited 2012 Dec 17].
Available from : Case%2
226. Asterisk. A Brief History of the Asterisk Project. [online] 2012. [cited 2012 Dec
17]. Available from :

227. Digium. Company History. [online] 2012. [cited 2012 Dec 17]. Available from :
228. ___. About Digium. [online] 2012. [cited 2012 Dec 17]. Available from : http:
229. Asterisk. Get Started. [online] 2012. [cited 2012 Dec 17]. Available from : http:
230. Chichareon, P. et al. “Web based Configuration Manager for Asterisk Trunking
System.” Proc. 8th PSU Engineering Conference. (22-23 April 2010): 198-
231. Ratsamimonthon, P. et al. “Service Integration of Voice Communication and
Web based Conference.” Proc. 8th PSU Engineering Conference. (22-23
April 2010): 204-208.
232. Casaby, P. and Puangpronpitag, S. “Problem Evaluation of Security Issues in IP
telephony Open Source software.” Proc. National Conference on Computer
Information Technologies. (13-15 January 2010): 33-38.
233. Toomwan, S. A Study of Voice over IP. Master of Science in Network
Engineering Thesis, Faculty of Information Science and Technology,
Mahanakorn University of Technology, 2010. (Thai).
234. Jaksopha, S. VoIP Development for World Study Center Co.,Ltd. Master of
Science in Network Engineering Thesis, Faculty of Information Science and
Technology, Mahanakorn University of Technology, 2010. (Thai).
235. Johansen, A.J. Improvement of SPIT prevention technique based on Turing test.
Master of Science in Information Technology Thesis, Faculty of Information
Science and Technology, Mahanakorn University of Technology, 2010.
236. Thewaphon, S. A Study of Voice over IP for Department of Agricultural
Extension. Master of Science in Network Engineering Thesis, Faculty of
Information Science and Technology, Mahanakorn University of Technology,
2010. (Thai).
237. Digium. Asterisk Architecture. [online] 2012. [cited 2012 Dec 17]. Available
from :
238. Suwannaraj, K. IP-PBX Design and Installation Using Asterisk. 1st ed. Offset
Press, 2008. (Thai).
239. Goncalves, F.E. Configuration Guide for Asterisk 1.4 and 1.6. 4th ed. V.Office
Networks, 2012.
240. Barlas, A. Integrating Management System for the Asterisk Soft-IP-PBX. Master
of Science in Information Networking, Athens Information Technology,
241. Mahler, P. VoIP Telephony with Asterisk. [online] 2012. [cited 2012 Dec 19].
Available from :
242. Nussbaum, L. and Richard, O. A Comparative Study of Network Link Emulators.
[online] 2012. [cited 2012 Dec 19]. Available from :
243. Rizzo, L. “Dummynet: a simple approach to the evaluation of network protocols.”
ACM Computer Communication Review. 27 (1997): 31-41.
244. Carbone, M. and Rizzo, L. “Dummynet Revisited.” ACM SIGCOMM Computer
Communication Review. 40 (2010): 13-20.

245. Huang, T.-Y., et al. “Could Skype be more satisfying? – a QoE-centric study of
the FEC mechanism in the internet-scale VoIP system.” IEEE Network. 24
(2010): 42-48.
246. Balan, H.V., et al. “An Experimental Evaluation of Voice Quality over the
Datagram Congestion Control Protocol.” Proc. 26th IEEE Int. Conf. on
Computer Communications (INFOCOM 2007). (6-12 May 2007): 2009-2017.
247. Bhattacharya, A., Wu, W. and Yang, Z. “Quality of Experience Evaluation of
Voice Communication Systems using Affect-based Approach.” Proc. 19th
ACM International Conference on Multimedia (MM’11). (28 November – 1
December 2011): 929-932.
248. Amir, M., et al. QoE-Lab: Towards evaluating Quality of Experience for Future
Internet Conditions. [online] 2012. [cited 2012 Dec 17]. Available from :
249. Stehle, E., et al. Perception of Utility in Autonomic VoIP Systems. [online] 2012.
[cited 2012 Dec 19]. Available from :
250. Kosonen, V. Voice Quality in IP Telephony. [online] 2012. [cited 2012 Dec 17].
Available from :
251. Nahum, E.M., et al. “The Effects of Wide-Area Conditions on WWW Server
Performance.” Proc. ACM SIGMETRICS Int. Conf. Measurement and
Modeling of Computer Systems. (16-20 June 2001): 257-267.
252. Stehle, E., et al. Task Dependency of User Perceived Utility in Autonomic VoIP
Systems. [online] 2012. [cited 2012 Dec 19]. Available from : https://www.cs
253. Ries, M., Svoboda, P. and Rupp, M. “Empirical study of subjective quality for
massive multiplayer games.” Proc. 15th Int. Conf. Systems, Signals and
Image Processing, IWSSIP 2008. (25-28 June 2008): 181-184.
254. Hoßfeld, T. and Binzenhöfer, A. “Analysis of Skype VoIP traffic in UMTS:
End-to-end QoS and QoE measurements.” Computer Networks. 52 (2008):
255. VassarStats. Concepts & Applications of Inferential Statistics. [online] 2012.
[cited 2012 Dec 17]. Available from :
256. Park, H.M. Comparing Group Means: The T-test and One-way ANOVA Using
STATA, SAS, and SPSS. [online] 2012. [cited 2012 Dec 17]. Available from :
257. Overath, T. and Whalley, M. t- and F-tests: Testing hypotheses. [online] 2012.
[cited 2012 Dec 17]. Available from :
mfd/2005/Ft-tests.ppt#280,4,Types of Error.
258. DeCoster, J. Testing Group Differences using T-tests, ANOVA and Nonparametric
Measures. [online] 2012. [cited 2012 Dec 17]. Available from : http://www.
259. Lee, H. and Kuchroo, M. ANOVA: A Test of Analysis of Variance. [online] 2012.
[cited 2012 Dec 17]. Available from :

260. Motulsky, H. and Christopoulos, A. Fitting models to biological data using
linear and nonlinear regression: A practical guide to curve fitting. [online]
2012. [cited 2012 Dec 17]. Available from :
261. Hasnip, P. Mathematical Modelling Lecture 4 – Fitting Data. [online] 2012.
[cited 2012 Dec 17]. Available from :
262. Tallon-Bosc, I. Model Fitting Tutorial. [online] 2012. [cited 2012 Sep 30].
Available from :
263. ECE. Plots, Curve-Fitting, and Data Modeling in Microsoft Excel. [online]
2012. [cited 2012 Dec 17]. Available from :
264. MathWorks. MATLAB - Curve Fitting ToolboxTM User’s Guide. [online] 2012.
[cited 2012 Dec 17]. Available from :
265. Kunst, R.M. Econometric Forecasting: Evaluating predictive accuracy. [online]
2012. [cited 2012 Dec 17]. Available from :
266. Kusiak, A. and Zhang, Z. “Short Horizon Prediction of Wind Power: A Data-
Driven Approach.” IEEE Transactions on Energy Conversion. 25 (2010):
267. Ong, H.-C. and Chan, S.-Y. “A Comparison on Neural Network Forecasting.”
Proc. Int. Conf. Circuits, System and Simulation (IPCSIT 2011). (2011): 56-
268. Daengsi, T. and Tontiwattanakul, K. “A Case of Improvement of Building
Acoustics Using Available Equipments and Limited Resources.” Proc. 6th
Naresuan Research Conference 2010. (29-31 July 2010): 2-13.
269. Daengsi, T., et al. “Thai Text Resource: A Recommended Thai Text Set for
Voice Quality Measurements and Its Comparative Study.” KKU Science
Journal. 40 (2012): 1114-1127.
270. ITU-T Recommendation P.800.1. Mean Opinion Score (MOS) terminology. July,
271. ITU-T Recommendation P.862.3. Application guide for objective quality
measurement based on Recommendations P.862, P.862.1 and P.862.2.
November, 2007.
272. ITU-T. ITU-T Test Signals for Telecommunication Systems. [online] 2012. [cited
2012 Dec 17]. Available from :
273. ITU-T Recommendation P.805. Subjective evaluation of conversational quality.
April, 2007.
274. Suayroop, K. et al. “A VoIP Quality Measurement Study with Thai Speech Sets
Using PESQ and G.711A-law.” Proc. 34th Electrical Engineering
Conference (EECON-34). (20 November – 2 December 2011).
275. Daengsi, T., et al. “Comparison of Perceptual Voice Quality of VoIP Provided
by G.711 and G.729 Using Conversation-Opinion Tests.” International
Journal of the Computer, the Internet and Management. 20 (2012): 21-26.



TABLE A-1 Selected TSST speech samples for the listening opinion test [269]

Sentences (or Phrases) Meaning
Group No.

ตอไปเป็ ่
Next, it is the royal news.
(t盢 pa‹j pe‹n kha›˘w naj pHra¤ rafl˘t tCHa¤ sa‡m na¤k)
Where have you been?
(pa‹j na‡j ma‹˘ rF‡˘)

กรุ ณาถือสายรอสักครู่ ครับ/คะ่
Please hold on one moment.
(ka› ru¤ na‹ tHμ‡˘ sa‡˘j rç˘ sa›k kHufl˘ kHra¤p  kHafl)
ยินดีตอ้ นรับครับ/คะ่
(ji‹n di‹˘ t燢n ra¤p kHra¤p  kHafl)

วันนี้ จะไปเที่ยวที่ไหนดี
Where are we going today?
(wa‹n ni¤˘ tCa› pa‹j tHifl˘aw tHifl˘ na‡j di‹˘)
่ เจอกนตั
ไมได้ ั ้ งนาน
Long time no see.
(maflj dafl˘j tCF‹˘ ka‹n taflN na‹˘n)

ํ งจะไปไหน
Where are you going?
(ka‹m la‹N tCa› pa‹j na‡j)
ขอรบกวนเวลาสักครู่ นะครับ/คะ
Please give me some time.
(kHç˘ ro¤p ku‹˘an we‹˘ la‹˘ sa›k kHufl˘ na¤ kHra¤p  kHa¤)
ิ าวที่ไหนกนดี
วันนี้ ไปกนข้ ั
Where will we go to eat today?
(wa‹n ni¤˘ paj‹ ki‹n kha˘fl w tHi˘fl naj‡ kan‹ di˘‹ )
จะกลับเมื่อไหร่
When will you come back?
(tCa› kla›p mμfl˘a ra›j)
ดูแลรักษาสุ ขภาพด้วยนะ
Take care of your health.
(du‹˘ lE‹˘ ra¤k sa‡˘ su›k kHa› pHafl˘p dufl˘aj na¤)
(to›k lo‹N na¤ kHra¤p  kHa¤)
Are you ok?

่ บมาใหม่
กรุ ณาติดตอกลั
Please contact again.
(ka› ru¤ na‹˘ ti›t t盢 kla›p ma‹˘ ma›j)
กลับถึงบ้านหรื อยัง
Have you reached home?
(kla›p tHμ‡N bafl˘n rμ‡˘ ja‹N)

TABLE A-1 Selected TSST speech samples for the listening opinion test (Continued)

Sentences (or Phrases) Meaning
Group No.

วันนี้ เรี ยนวิชาอะไร
What subject are you going to study (or have you studied) today?
(wa‹n ni¤˘ ri‹˘an wi¤ tCHa‹˘ ?a› ra‹j)
จะกลับถึงบ้านกโมง
When will you reach home?
(tCa› kla›p tHμ‡N bafl˘n ki›˘ mo‹˘N)
ํ งทําอะไรอยูเ่ หรอ
What are you doing?
(ka‹mfl la‹N tHa‹m ?a› ra‹j ju›˘ rF‡˘)
วันนี้ รถติดมากเลย
Traffic is/was bad today.
(wa‹n ni¤˘ ro¤t ti›t mafl˘k lF‹˘j)

ขณะนี้ เวลาแปดนาฬิกา
Now, the time is eight a.m.
(kHa› na › ni˘¤ we‹˘ la˘‹ pE˘› t na˘‹ li ¤ ka˘‹ )
(kH燢 tHofl˘t na¤ kHrap  kHa¤)
I’m sorry.





Name : Mr. Therdpong Daengsi

Thesis Title : VoIP Quality Measurement: Recommendation of MOS and Enhanced
Objective Measurement Method for Standard Thai Spoken Language
Major Field : Information Technology

Biography :
Therdpong Daengsi was born in Nakornphanom in 1974. He received a
Bachelor of Engineering in Electrical Engineering from the Department of Electrical
Engineering, Faculty of Engineering, King Mongkut’s University of Technology
North Bangkok (KMUTNB), formerly KMITNB, in 1997. In the same year, he
started working for a company in its key-telephone system department, before
moving to the PABX department and becoming an Avaya Certified Expert in IP
Telephony. He received a Mini-MBA Certificate in Business Management and a
Master of Science in ICT from Assumption University in 2006 and 2008, respectively.
During his Ph.D. studies, he presented and published more than 10 papers,
including one at an IEEE conference in the USA and another at an international
conference in Japan. While pursuing his graduate studies, he worked at another
telecommunication company in Bangkok. He is currently the service manager at
that company and has more than 15 years of experience in the telecom
service business.

Conference Publications :
1. Daengsi, T. and Preechayasomboon, A. “Enhanced Service Ticket Tracking
System by Using Email and SMS” Paper presented in NCCIT2008,
Mahasarakham, May 2008.
2. Daengsi, T. and Preechayasomboon, A. “Case Study: AIA Insurance – Migration
Project Experience.” Paper presented in NCCIT2008, Mahasarakham, May
3. Daengsi, T., and Preechayasomboon, A., “Case Study: AMEX Thailand – PABX
Migration Experience.” Paper presented in NCCIT2009, Bangkok, May
4. Daengsi, T. “A Simplified Subjective Measurement Method for Voice Quality
Evaluation in IP Telephony System Deployment.” Paper presented in
KMITL2552 Academic Conference, Bangkok, August 2009.
5. ____. “Voice Quality Measurement for VoIP: Simple Method Using a Survey.”
Paper presented in EECON-32, Prachinburi, October 2009.
6. Daengsi, T. and Tontiwattanakul, K. “A Case of Improvement of Building
Acoustics Using Available Equipments and Limited Resources.” Proc. of
the 6th Naresuan Research Conference 2010, pp. 2-13, Phitsanulok,
Thailand, July 2010.
7. Daengsi, T., Wutiwiwatchai, C., Preechayasomboon A. and Clayton, G. “Ear
Preference and Hand Dominance for Telephone Use by Thai People.” Paper
presented in ThaiBME2010, Bangkok, Thailand, August 2010.

8. Daengsi, T., Preechayasomboon, A., Clayton, G. and Wutiwiwatchai, C.,
“Development of Thai Text Set for Telephonometry.” Paper presented in
NCIT2010, Bangkok, Thailand, October 2010.
9. Daengsi, T., Preechayasomboon, A., Sukparungsee, S., Chootrakul, P. and
Wutiwiwatchai, C., “The Development of a Thai Speech Set for
Telephonometry.” Proc. of Oriental-COCOSDA 2010, p. 53, Kathmandu,
Nepal, November 2010.
10. Suayroop, K., Daengsi, T., Sukparungsee, S., Wutiwiwatchai, C. and
Preechayasomboon, A. “A VoIP Quality Measurement Study with Thai
Speech Sets Using PESQ and G.711A-law.” Paper presented in EECON-34,
Chonburi, October 2011.
11. Daengsi, T., Sukparungsee, S., Wutiwiwatchai, C. and Preechayasomboon, A. “The
Present VoIP Regulation in Thailand and MOS – the Metric for Voice
Quality Evaluation.” Paper presented in The NBTC Year End Conference
2011, Bangkok, December 2011.
12. Daengsi, T., Wutiwiwatchai, C., Preechayasomboon, A. and Sukparungsee, S., “A
Study of VoIP Quality Evaluation: User Perception of Voice Quality from
G.729, G.711 and G.722,” IEEE-CCNC’2012 - SS QoE, Las Vegas, NV,
January 2012.
13. Daengsi, T., Preechayasomboon, A., Wutiwiwatchai, C. and Sukparungsee, S.
“Recent VoIP Services in Thailand and the Expectation of Thai Users to
Voice Quality,” Proc. of the 1st Asean Plus Three Graduate Research
Congress, pp. ST-835 – ST-840, Chiang Mai, Thailand, March 2012.
14. Daengsi, T., Wutiwiwatchai, C., Sukparungsee S. and Preechayasomboon, A.
“Research on E-model Enhancement Based on Thai: Enhancement Referring
to Packet Loss Effects.” Paper presented in WTC2012 (Poster Session),
Miyazaki, Japan, March 2012.
15. Daengsi, T. “A Guide to VoIP/IP Telephony Security.” Paper presented in Cyber
Security and Digital Forensics 2012, Bangkok, Thailand, September, 2012.

Journal Publications :
1. Daengsi, T., Wutiwiwatchai, C., Preechayasomboon, A. and Sukparungsee, S.
“Speech Quality Assessment of VoIP: G.711 VS G.722 Based on Interview
Tests with Thai Users.” International Journal of Information Technology
and Computer Science, Vol.4, No.2, pp.19-25, March 2012.
2. Daengsi, T., Sukparungsee, S., Wutiwiwatchai, C. and Preechayasomboon, A.,
“Comparison of Perceptual Voice Quality of VoIP Provided by G.711 and
G.729 Using Conversation-Opinion Tests.” International Journal of the
Computer, the Internet and Management, Vol. 20, No. 1, pp. 21-26, April
3. Daengsi, T., Wutiwiwatchai, C., Preechayasomboon A. and Sukparungsee, S.
“VoIP Quality Measurement: Insignificant Voice Quality of G.711 and
G.729 Codecs in Listening-Opinion Tests by Thai Users.” Information
Technology Journal, Vol. 8, No. 1, pp. 77-82, June 2012.
4. Daengsi, T., Preechayasomboon, A., Sukparungsee, S. and Wutiwiwatchai, C.
“Thai Text Resource: A Recommended Thai Text Set for Voice Quality
Measurements and Its Comparative Study.” KKU Science Journal, Vol.
40(4), pp. 1114-1127, December 2012.
