Abstract
Voice over IP (VoIP), a modern form of telecommunications, requires real-time
transmission. However, there are limitations (e.g., packet loss and delay) which result
in degradation in quality of voice transmission over an IP network. ITU-T, a sector of
the International Telecommunication Union, has issued recommendations for VoIP
quality based on a Mean Opinion Score (MOS) derived primarily from studies on
European languages. ITU-T has acknowledged that there is an issue related to
dependence on language/culture/nationality and the Quality of Experience (QoE) of
multimedia, including MOS. Because ITU-T standards have yet to be adopted for
measurement of voice quality for tonal languages (e.g. Thai), the research in this
thesis has four aims. The first is to carry out a detailed assessment of
subjective voice quality for standard spoken Thai with native Thai
speakers in a Thai environment. The second is to compare the voice quality perceived by Thai
users of four VoIP codecs: G.729, G.711A-law, G.722 and G.723.1 (at 5.3 kbps).
Thai users found no significant difference between the first three
codecs, with a slight preference for G.722 over G.711A-law over G.729, but
they ranked G.723.1 as having significantly poorer voice quality than the
others. G.729 is therefore recommended as the best choice because it has a MOS in the
range 4.13-4.18 but requires a lower bandwidth than G.722 or G.711A-law. The third
is to propose acceptable voice quality standards for VoIP services for the Thai language
and for native speakers of Thai. On average, Thai users
expected VoIP quality equivalent to a MOS of 3.41, i.e., fair quality. Finally, this
thesis proposes two new models for the Thai language and Thai users: a Thai
Subjective VoIP Quality Evaluation (ThaiVQE) model and an Enhanced E-model
(E2-model), which is an objective method. The ThaiVQE model focuses on two major network
parameters, packet loss and packet delay, and the G.711A-law codec. The E2-model was
obtained by including a subjective Thai bias factor in the standard E-model. The results
showed that both models gave significant accuracy and reliability improvements
compared to the standard E-model, with an error reduction of more than 20%. Therefore,
both of the new models can support voice quality measurement in Thai environments
with high accuracy, reliability and confidence.
(Total 169 pages)
Keywords : VoIP, MOS, E-model, packet loss, packet delay, codec, Thai
______________________________________________________________ Advisor
Name : Mr. Therdpong Daengsi
Thesis Title : VoIP Voice Quality Measurement: Recommendations on MOS and an Enhanced Objective Measurement Method for Standard Spoken Thai
Major Field : Information Technology
King Mongkut's University of Technology North Bangkok
Thesis Advisor : Asst. Prof. Dr. Saowanit Sukparungsee
Thesis Co-advisor : Dr. Apiruck Preechayasomboon
Thesis Co-advisor : Dr. Chai Wutiwiwatchai
Academic Year : 2555 B.E. (2012)
Abstract (Thai version)
Voice over IP (VoIP) is a modern form of telecommunications that requires real-time
transmission, but it still has limitations (e.g., packet loss and packet delay) that affect
voice quality. ITU-T has proposed a method for measuring VoIP voice quality using a scale
called the Mean Opinion Score (MOS), and ITU-T is well aware that there are issues relating
language/culture/nationality to VoIP voice quality measurement. Because ITU-T standards
have not yet been defined for tonal languages, this research was undertaken with the
following objectives. First, to study voice quality assessment for Thai people and standard
spoken Thai. Second, to compare four codecs, namely G.729, G.711A-law, G.722 and
G.723.1 (5.3 kbps); it was found that G.729 gives voice quality that is not significantly
different from G.711A-law and G.722 but uses less bandwidth, while G.723.1 gives
significantly lower voice quality than the other codecs. Next, to find the voice quality
criterion that is acceptable to Thai people; it was found that a MOS of 3.41 is the average
that Thai people expect from VoIP. Finally, to propose two new models, ThaiVQE and the
E2-model, the latter obtained by incorporating a Thai bias factor; it was found that the
error is reduced by more than 20% compared with the original E-model.
(Total 169 pages)
Keywords : VoIP, MOS, E-model, packet loss, packet delay, codec, Thai
___________________________________________________ Thesis Advisor
ACKNOWLEDGEMENTS
Thank you to the more than one thousand people involved in this research, including
students, staff and lecturers; I am sorry that I cannot list all of your names here.
In particular, I thank my friends Mr. Chumpol Ngamphiw and Mr. Worasit Junsawang
for helping me to implement the VoIP testbed system; Mr. Nattawut
Unwanatham for conducting subjective tests with hundreds of subjects; Mr.
Wiwat Suwanuntawong and the Central Library Studio staff for kindly allowing me to use
the studio for almost two years; and Mr. Gary Sherriff, the international coordinator of the
Faculty of Information Technology, for editing and English support on over 15 papers,
both published and unpublished.
Thank you to the Graduate College, KMUTNB, the Faculty of Information
Technology, KMUTNB, and the Speech and Audio Laboratory, NECTEC, for partial
funding support.
Gratitude is due to my parents, particularly my mother who came from our
hometown to help take care of my family which provided me with valuable time to
pursue my PhD. Thank you to my wife who tries her best to understand me and is
always standing by my side with our two beautiful daughters.
My deepest gratitude goes to my advisors, Asst. Prof. Dr. Saowanit Sukparungsee,
Dr. Apiruck Preechayasomboon and Dr. Chai Wutiwiwatchai. Without your support
my dream would not have come true. Thanks also to the thesis defense examination
committee members, Dr. Theerawat Piboongungon (chair), Dr. Maleerat
Sodanil and Dr. Elvin James Moore, for their very useful comments.
Finally, I would like to dedicate the contributions of this research to Dr. Gareth
Clayton, my advisor, who sadly passed away. The initial idea for this research,
concerning the activation of the brain by tones in tonal languages, came from our lengthy
discussions. We all miss you deeply.
Therdpong Daengsi
TABLE OF CONTENTS
Page
Abstract (in English) ii
Abstract (in Thai) iii
Acknowledgements iv
List of Tables vi
List of Figures vii
Chapter 1 Introduction 1
1.1 Motivation of the Study 1
1.2 Purpose of the Study 3
1.3 Scope of the Study 3
1.4 Major Contributions 3
1.5 Thesis Structure 4
Chapter 2 Background and Literature Review 7
2.1 Thai Language and Thai Culture 7
2.2 Hearing and Language Ability 18
2.3 Voice over Internet Protocol 25
2.4 Voice Quality Measurement 40
2.5 Previous Research on Voice Quality Measurement 60
2.6 Selected Tools for Implementation of the Testbed VoIP System 66
2.7 Statistical and Mathematical Tools 70
Chapter 3 Methodology: Subjective and Objective Measurement 75
3.1 Phase I: Experimental Design 76
3.2 Phase II: Preparation 78
3.3 Phase III: Pilot Tests 84
3.4 Phase IV: Intensive Subjective Tests 87
3.5 Phase V: Objective Tests Using E-model 89
Chapter 4 Results of Analysis 91
4.1 Pilot Phase Results 91
4.2 Intensive Subjective Test Results 93
4.3 Objective Test Results 96
4.4 Analysis and Comparison 98
Chapter 5 E2-model and ThaiVQE 105
5.1 E2-model 105
5.2 Thai Subjective VoIP Quality Evaluation Mathematical Model 107
5.3 Model Comparison: Standard E-model, E2-model and ThaiVQE 109
5.4 ThaiVQE for G.729 110
Chapter 6 Discussion, Conclusion and Future Work 113
6.1 Discussion 113
6.2 Conclusion 116
6.3 Future Work 118
References 119
Appendix A Selected Thai Speech Samples for the Listening Opinion Tests 137
Appendix B Three Questionnaire Forms for All Subjective Tests (Thai) 141
Appendix E Selected Publications 149
Biography 167
LIST OF TABLES
Table Page
2-1 Thai initial consonant 9
2-2 Thai final consonant 10
2-3 Thai consonant cluster 10
2-4 Thai vowel 11
2-5 Thai tones 11
2-6 Example of Thai words with different tones 12
2-7 Some definitions of “culture” 13
2-8 Characteristics of “cultures” 14
2-9 Some characteristics of Thai culture versus Western culture 16
2-10 Familiar sounds and their loudness level in dB 19
2-11 Functions of main parts of the brain 20
2-12 Different function between LH and RH 21
2-13 VoIP QoS specification in Thailand 28
2-14 VoIP operators and VoIP numbers at present 30
2-15 Comparison of International call rates of traditional services and some
PC-to-Phone VoIP services 30
2-16 Comparison of international call rates of some services of phone-to-
phone via VoIP networks 31
2-17 Codecs and their properties 33
2-18 Comparison of H.323 versus SIP 38
2-19 General QoS controls 39
2-20 Comparison of header compression techniques 41
2-21 The statistics of the terms QoS vs QoE 43
2-22 ITU-T definition comparison of QoS vs QoE 44
2-23 Some definitions of QoE 45
2-24 Categories of VoIP User’s Experience and their quality expectation 45
2-25 Subjective measurement methods versus objective measurement methods 46
2-26 Scale of opinion scores and meaning 47
2-27 Definitions of subjective versus objective 48
2-28 The statistic of the results from IEEEXplore after using the keywords VoIP
quality, packet delay, packet loss and jitter 49
2-29 The evidence for the importance of subjective measurement 52
2-30 Network parameters between two endpoints, in a telephone network 58
2-31 The relation among R-value, MOS-CQE and user satisfaction 59
2-32 The statistics of the results from IEEEXplore for a search using the
keywords VoIP, quality, PESQ and E-model 60
2-33 Example of previous works with subjective tests 63
2-34 Example of MOS from different language 64
2-35 Description of each Asterisk part based on its architecture 68
2-36 Important Asterisk files based on FreeBSD 69
2-37 Dummynet features 69
2-38 Comparison of t-tests and ANOVA for two and three groups respectively 71
3-1 Summary about the pilot tests 77
3-2 Summary about the conversation opinion tests, each scenario required at
least 24 subjects (total 576 subjects) 77
3-3 Test scenarios 78
3-4 Comparison of imported values and properties of the modified room 81
3-5 Speech lists 83
3-6 The estimate numbers of subjects 84
4-1 Interview test results 92
4-2 Comparison of MOS-CQS from G.722, G.711 and G.729 referring to
delay 94
4-3 Comparison of MOS-CQS from G.722, G.711 and G.729 referring to
loss 95
4-4 MOS-CQS versus MOS-CQE from G.711 referring to loss and delay
effects after validating 96
4-5 Numbers of subjects (400 in total) in the tests with G.711 97
4-6 Statistics from the survey of the mean expectation score of voice quality
based on a 5-point scale from 828 Thai users 98
4-7 Hypotheses 101
4-8 Hypothesis analysis results with 95% CI 102
4-9 Comparison of MOS from Thai users and users from different
languages and cultures 103
5-1 MOS-CQS* from ThaiVQE model referring to packet delay and packet
loss effects for G.711 only 109
5-2 Test set information 110
5-3 E2-model and ThaiVQE evaluation results using comparison with the
test set 110
5-4 The difference of MOS-CQS provided by G.711 and G.729 111
5-5 Estimation of MOS-CQS* from ThaiVQE model referring to packet
delay and packet loss effects for G.729 112
6-1 Summary of numbers of subjects 115
6-2 Subjective MOS from Thai users 116
A-1 Selected TSST Speech Samples for the Listening Opinion Test 138
LIST OF FIGURES
Figure Page
2-1 A map of Thailand 7
2-2 The example of fundamental frequency (F0) contours of five Thai tones 12
2-3 Comparison of Hofstede’s Scores of National Culture in five dimensions 17
2-4 The auditory pathway 20
2-5 Superior view of the brain shows left and right hemispheres 21
2-6 Lateral view of the brain (LH) 22
2-7 Working of the brain 22
2-8 Schematic representation of MMN regions of interest. 23
2-9 Potential maps of electric MMN response 24
2-10 Comparison of English brain, Chinese brain and Thai brain 24
2-11 The PET scanning results from the PET study of tone perception 24
2-12 The vocal organs of human 25
2-13 Business perspective to IP telephony 26
2-14 Overview of 4G Network 28
2-15 NGN layers 29
2-16 NGN subsystem architecture 29
2-17 VoIP architecture overview 32
2-18 H.323 architecture overview 35
2-19 H.323 protocol suite 35
2-20 SIP architecture overview 36
2-21 SIP protocol suite 37
2-22 A comparison of message flows between H.323 and SIP 37
2-23 Overview of CAC mechanisms 39
2-24 Voice packet compression overview 40
2-25 Comparison of QoS and QoE: test points 42
2-26 Position of QoE and QoS for VoIP 42
2-27 VoIP QoE: Good QoE perceived by User A vs poor QoE perceived by
User B 43
2-28 Influence factors for voice quality 44
2-29 Voice quality measurement concept 44
2-30 The result from search using the keyword ‘VoIP quality’ 49
2-31 Relation of R-value from E-model vs delay 50
2-32 Example of random and bursty packet loss 51
2-33 Subjective Voice Quality Measurement Methods 53
2-34 Objective Voice Quality Measurement Methods 54
2-35 PESQ application guide 55
2-36 PESQ overview 56
2-37 Reference connection of E-model 57
2-38 The graph represents the relation of R-value and MOS-CQE 59
2-39 Chinese MOS versus Japanese MOS 62
2-40 The overall structure of the extended E-model 65
2-41 Asterisk architecture 67
2-42 The structure of a dummynet “pipe” with configurable parameters 70
2-43 Model fitting process overview 72
3-1 The methodology for objective measurement method enhancement
using Thai bias factor 75
3-2 Top view of the plan of the studio room 79
3-3 The background noise was checked before testing 80
3-4 Example of the window sill 80
3-5 Example of room floor 81
3-6 Diagram of the VoIP testbed system 81
3-7 The real VoIP testbed system 82
3-8 The overview of VoIP system for the test 84
3-9 The captured screen shot from investigation of packet delay and packet
loss before testing 84
3-10 An IP phone and facilities for the pilot test 85
3-11 A speech list that starts with the speech by Child1 86
3-12 Overview of the interview tests 86
3-13 Overview of the test system over an IP network 87
3-14 Example of random shapes 88
3-15 The overview of the conversation opinion tests in this research 89
3-16 Diagram of E-model measurement 89
4-1 The ACR-listening test results 91
4-2 MOS-LQS of G.711 vs G.729 from different types of voices 92
4-3 Comparison of percent of the votes: G.711 vs G.722 at 64 kbps 93
4-4 The MOS-LQO results of 4 lists of Thai speech and American-English
speech 93
4-5 Comparison of G.722, G.711 and G.729 referring to delay effects 94
4-6 Comparison of G.722, G.711 and G.729 referring to loss effects 95
4-7 Representing of MOS-CQS versus MOS-CQE 97
5-1 Overview of finding Thai bias factor 106
5-2 Overview of the proposed E2-model with Thai bias factor 106
5-3 Overview of development of ThaiVQE model 108
5-4 The surface chart of the MOS-CQS provided by G.711 for packet loss
of 0-10% and packet delay of 0-0.8 s. 108
CHAPTER 1
INTRODUCTION
In today’s world of rapidly changing technology, information and
communication technologies are at the forefront of the change. Before the emergence
of the Internet, data and voice calls ran over different routes in the network. Now,
more than 15 years later, as a result of modern technology convergence both data and
voice calls can run together over the same routes through the Internet. However, the
nature of a voice call is different from the nature of data transmission. Most data sent
over networks does not require real-time support from the network but a voice call
needs it. Moreover, voice calls cannot tolerate problems that often occur over a
packet-based network such as the Internet. Problems such as packet delay, packet loss,
jitter and echo cause major difficulties for a real-time application such as VoIP (Voice
over Internet Protocol), and they inevitably affect the voice call quality perceived
by a listener.
This study focuses on VoIP quality for standard spoken Thai. Thai
is a tonal language that is used by most of the approximately 65 million people in
Thailand. The motivation, purpose, and scope of this study are as follows:
services to achieve the conditions of the regulator. Further, when a new VoIP system
is installed by system developers/implementers for system owners, the owners will
usually require a voice quality measurement before accepting the new system. Also,
if VoIP service providers can announce their level of voice quality or MOS, then
VoIP users or consumers will be able to select the service
provider that offers the best VoIP quality. Therefore, all Thai VoIP stakeholders
can be expected to benefit from this research into VoIP quality for Thai users
and the spoken Thai language.
1.4.2 The Mean Expectation Score (MES) of Thai users, which is equivalent to MOS,
has been determined. The MES could therefore be recommended as the baseline
for voice quality in VoIP services in Thailand.
1.4.3 An Enhanced E-model (E2-model) based on Thai users in Thai
environments has been developed. This model is suited to the Thai tonal
language and Thai culture, particularly the characteristics of Thai people and the way
they respond to situations. The E2-model should increase overall accuracy, reliability
and confidence in measurements of VoIP quality.
1.4.4 The Thai Subjective VoIP Quality Evaluation mathematical model
(ThaiVQE) has been developed as an option for VoIP quality measurements. This
model can be applied in Thailand with high accuracy, reliability and confidence,
without high cost
the experiments. Section 3.4 describes intensive subjective tests using conversation
opinion tests. Finally, Section 3.5 discusses objective tests based on the PESQ and E-
model test methods.
Chapter 4 presents results from all tests and gives an analysis and comparison of
the results. Sections 4.1-4.3 present a comparison of the results of the subjective MOS
and objective MOS tests described in Chapter 3. Section 4.4 uses two statistical tools,
the t-test and ANOVA, for hypothesis tests about how Thai subjects perceive
the codecs under test (e.g. G.729, G.711 and G.722) and for a statistical comparison of
MOS from Thai users with MOS from users in other countries.
Chapter 5 presents the main contribution of this research. Sections 5.1 and 5.2
describe the development of the enhanced E-model using a Thai bias factor, called the
E2-model, and the Thai Subjective VoIP Quality Evaluation mathematical model
(ThaiVQE). The development is based on a model fitting technique. In Section 5.3
these two models are evaluated, analyzed and discussed using two simple
mathematical tools, called Mean Absolute Error (MAE) and Mean Absolute
Percentage Error (MAPE) in order to see if these two approaches can give higher
accuracy, reliability and confidence. From the evaluation, it is found that the E2-
model and ThaiVQE can reduce error by more than 20% when compared with the
standard E-model. Section 5.4 presents an estimation of ThaiVQE for G.729. This
chapter also proposes a ThaiVQE table based on packet loss and delay that can be
used as a guideline for the use of G.711 and G.729 by Thai users in Thai
environments.
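The two error measures named for Section 5.3 can be sketched as follows. This is a minimal illustration that assumes the usual definitions of MAE (the mean of the absolute differences between reference and estimated scores) and MAPE (the mean of those differences relative to the reference, in percent). The function names and the sample MOS values are hypothetical and are not taken from the thesis data.

```python
from typing import Sequence


def mean_absolute_error(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """MAE: average of |reference - estimate| over all test conditions."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def mean_absolute_percentage_error(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """MAPE: average of |reference - estimate| / |reference|, expressed in percent."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero for MAPE")
    ratios = [abs(r - e) / abs(r) for r, e in zip(reference, estimate)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical subjective MOS values and two hypothetical model outputs.
    subjective_mos = [4.0, 3.0]
    model_a = [3.5, 3.3]
    model_b = [3.9, 3.1]
    for name, model in (("model A", model_a), ("model B", model_b)):
        mae = mean_absolute_error(subjective_mos, model)
        mape = mean_absolute_percentage_error(subjective_mos, model)
        print(f"{name}: MAE={mae:.3f}, MAPE={mape:.2f}%")
```

With both measures, a lower value means the model's estimates lie closer to the subjective scores, which is the sense in which one model can be said to reduce error relative to another.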
Chapter 6 gives a discussion, recommendations and conclusions on the results
of this research. The chapter concludes with suggestions for possible future work.
CHAPTER 2
BACKGROUND AND LITERATURE REVIEW
However, the formal one that all Thai people understand is Thai, which may be
called standard or official Thai. It is used in official places such as schools,
universities, hospitals, police stations and many organizations, and for
news broadcasting on TV and radio. Thai belongs to the Tai language family, a subgroup
of Kadai or Kam-Tai, which is under the Sino-Tibetan family. Thai is a tonal
language, like several languages in Asia (such as Mandarin Chinese and
Vietnamese), in Europe (such as Swedish and Norwegian) and in Africa (such as Igbo).
Thai is currently used by almost 70 million people in Thailand and
the surrounding border areas. The Thai script has 44 consonants (42 in use
and 2 obsolete), 15 basic vowels and 4 tone markers. Like
English, Thai text is written horizontally from left to right. Unlike English,
there are no spaces between words in the same sentence and no explicit sentence
markers. Vowels can be written before, after, below or above the consonant.
Combinations of a few consonant and vowel characters can form compound vowels,
called diphthongs. Thai grammar is simpler than English grammar. The basic word order
is “Subject + Verb + Object”, but there are no articles, verb conjugations, declensions,
object pronouns or tenses.
2.1.2 Thai sound system [19, 22-24]
In the Thai sound system, each word basically consists of an initial consonant,
a vowel, a final consonant and a tone marker, in that order. Some background
information about the Thai sound system is presented as follows:
2.1.2.1 Initial consonant
There are 21 initial consonant phonemes in Thai, produced at
different points of articulation, as shown in Table 2-1.
2.1.2.2 Final consonant
There are 9 final consonant phonemes in Thai, produced at
different points of articulation, as shown in Table 2-2.
2.1.2.3 Consonant cluster
A cluster begins with one of 6 possible phonemes and ends with one of 3 possible
phonemes. Not every combination occurs: there are 12 forms of
consonant cluster in Thai, as shown in Table 2-3.
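The counts in Section 2.1.2.3 can be checked with a short sketch. It assumes that the 6 first phonemes and 3 second phonemes are the ones that appear in the 12 cluster forms of Table 2-3, with aspiration written here as ʰ. The variable names are illustrative only.

```python
# First and second phonemes as they appear in the clusters of Table 2-3
# (assumption: aspirated stops are written with a superscript h).
FIRST = ["p", "t", "k", "pʰ", "tʰ", "kʰ"]
SECOND = ["r", "l", "w"]

# The 12 cluster forms listed in Table 2-3.
ATTESTED = ["pr", "tr", "kr", "pʰr", "tʰr", "kʰr",
            "pl", "pʰl", "kl", "kʰl", "kw", "kʰw"]

# Every first + second combination, whether or not it occurs in Thai.
possible = [first + second for first in FIRST for second in SECOND]

# Combinations that are not among the listed cluster forms.
unattested = [cluster for cluster in possible if cluster not in ATTESTED]

if __name__ == "__main__":
    print(len(possible), "possible combinations")
    print(len(ATTESTED), "listed cluster forms")
    print("not listed:", ", ".join(unattested))
```

The sketch shows that the 12 forms are a subset of the 18 combinations of 6 first and 3 second phonemes, rather than their full product.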
2.1.2.4 Vowel
The Thai sound system has both monophthongs and
diphthongs, as shown in Table 2-4.
2.1.2.5 Tone
Thai has five tones: the mid tone (no tone marker), the low tone, the falling
tone, the high tone and the rising tone, as shown in Table 2-5 and Figure 2-2 [19, 22].
Tone, which concerns pitch variation, is a very important
feature of Thai because different tones result in different lexical words and meanings,
as shown in Table 2-6. Many tonal languages are used around the world, but Thai
differs from them in many ways.
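TABLE 2-1 Thai initial consonant
Thai letters              Phoneme   Manner of articulation
ป                         p         Plosive
พ, ภ, ผ                   pʰ        Plosive
บ                         b         Plosive
ต, ฏ                      t         Plosive
ท, ธ, ถ, ฐ, ฑ, ฒ          tʰ        Plosive
ด, ฎ, ฑ                   d         Plosive
ก                         k         Plosive
ข, ค, ฆ                   kʰ        Plosive
อ                         ʔ         Plosive
ม                         m         Nasal
น, ณ                      n         Nasal
ง                         ŋ         Nasal
จ                         tɕ        Affricate
ช, ฉ, ฌ                   tɕʰ       Affricate
ฟ, ฝ                      f         Fricative
ซ, ส, ศ, ษ                s         Fricative
ห, ฮ                      h         Fricative
ล, ฬ                      l         Lateral
ร                         r         Trill
ว                         w         Approximant
ย, ญ                      j         Approximant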
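TABLE 2-2 Thai final consonant
Thai letters                              Phoneme   Manner of articulation
ป, ภ, บ, พ                                p         Plosive
ด, ศ, จ, ช, ฐ, ฏ, ฒ, ถ, ท, ธ, ษ, ต        t         Plosive
ก, ข, ค, ฆ                                k         Plosive
-                                         ʔ         Plosive
ม, อำ                                     m         Nasal
น, ร, ญ, ณ, ล, ร                          n         Nasal
ง                                         ŋ         Nasal
ว, เ-า                                    w         Approximant
ย, ญ, ใ, ไ                                j         Approximant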
TABLE 2-3 Thai consonant cluster
Thai letters   Cluster   Example
ปร             pr
ตร             tr
กร             kr
พร             pʰr
ทร             tʰr       e.g., ทฤษฎี (tʰrít sàʔ dīː)
คร             kʰr
ปล             pl
พล             pʰl
กล             kl
คล             kʰl
กว             kw
คว             kʰw
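TABLE 2-4 Thai vowel
Type           Short (letters)                   Short   Long (letters)   Long
Monophthong    ◌ิ                                i       ◌ี               iː
Monophthong    เ-ะ                               e       เ-               eː
Monophthong    แ-ะ                               ɛ       แ-               ɛː
Monophthong    ◌ึ                                ɯ       ◌ื               ɯː
Monophthong    เ-อะ                              ɤ       เ-อ              ɤː
Monophthong    -ะ, ◌ั, ◌ำ, ใ-, ไ-, เ-า, รร        a       -า               aː
Monophthong    ◌ุ                                u       ◌ู               uː
Monophthong    โ-ะ                               o       โ-               oː
Monophthong    เ-าะ, อ, ◌็อ                      ɔ       -อ               ɔː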
FIGURE 2-2 The example of fundamental frequency (F0) contours of five Thai
tones
TABLE 2-6 Example of Thai words with different tones
Item   Tone      Thai word   Transcription   Meaning
2      Mid       คาว         (khāːw)         a bad odor
2      Low       ข่าว         (khàːw)         news
2      Falling   ข้าว         (khâːw)         rice
2      High      ค้าว         (kháːw)         a kind of freshwater fish
2      Rising    ขาว         (khǎːw)         white
3      Mid       ฟา          (fāː)           4th note
3      Low       ฝ่า          (fàː)           to violate
3      Falling   ฝ้า          (fâː)           scum
3      High      ฟ้า          (fáː)           sky
3      Rising    ฝา          (fǎː)           a lid
4      Mid       ซอง         (sɔ̄ːŋ)          an envelope
4      Low       ส่อง         (sɔ̀ŋ)           to shine
4      Falling   ซ่อง         (sɔ̂ŋ)           a bawdy house
4      High      ซ้อง         (sɔ́ːŋ)          to acclaim with one voice
4      Rising    สอง         (sɔ̌ːŋ)          two
2.1.3 Thai Culture: Characteristics of Thais and the Way They Respond to
Situations
2.1.3.1 Overview on Thai Culture [25-43]
In general, the meaning of the term “culture” is very wide; it covers history,
religion, art, food and so on. In this research its meaning is narrower: it
focuses on the characteristics of Thai people and the way Thai people respond to
situations.
According to a survey cited in [26], there are over 300 definitions of
the term “culture”. Some of the definitions that are consistent with the context of this
research are presented in Table 2-7, and the characteristics of culture are
summarized in Table 2-8 [27-28, 31].
TABLE 2-7 Some definitions of “culture”
Author                       Definition
Herkovits (1944)             The human-made part of the environment, an all-encompassing explanation leading to the notion that culture is everything.
Geertz (1973)                It is characterized by systems of shared meanings and symbols.
Keesing (1974)               A system of knowledge enabling communication with others and interpretation of their behavior.
Vongvipanond (1974)          A complex phenomenon, a sum total of behavior and belief of a society.
Hofstede (1984)              The collective programming of the mind which distinguishes the members of one human group from another.
Putnam and Cheney (1985)     A social unit’s collective sense of what reality is, what it means to be a member of a group, and how a member ought to act.
Triandis and Albert (1987)   Its critical attribute is reflecting “shared meanings, norms and values”.
Culture characteristics
Thais in Thailand have their own culture, without the hybrid characteristics produced
by Western colonization in other countries or territories in Asia (e.g. Singapore,
Malaysia, Vietnam, the Philippines, Macao and Hong Kong) that were conquered by
Western countries. Thailand is the only Southeast Asian nation, and one of three
nations in Asia (with Mainland China and Japan), that was never colonized
by Western powers (e.g. Britain, France, Spain, Portugal and the USA) throughout the 19th
century [29-30]. Therefore, Thais have preserved their own rich culture [29].
Thailand has an abundance of resources and is located near the equator, which
provides a warm climate that benefits agriculture in many ways. An agricultural
spirit is therefore embedded in Thai culture and can be seen in daily life and the way Thais
live. Thai culture is distinct from that of other societies. The
characteristics of Thais [25, 27, 31-32] clearly show deep gratitude and generosity,
respect for elders, and great respect for the king, the queen and the royal family. Thais are
also kind, polite, easy-going, friendly, fun-loving and carefree. In addition, Thais love
peace and are proud to be Thai.
Thais are flexible and situation-oriented rather than ideology-, principle-, system-
or law-oriented [32, 42]. That is, it is acceptable for Thais to
adjust laws and principles to fit situations. This can also be seen in the phrase
“ไม่เป็นไร” (mâj pēn rāj), which means “never mind” or “it does not matter”: Thais use it
frequently and prefer compromise to resolve conflict [28, 32-33]. This phrase also
conveys an easy-going attitude to life, flexibility and a high degree of tolerance.
Normally, a Thai greets another by placing the hands together and raising them to
the face, which is called “ไหว้” (wâj), while saying “สวัสดี” (sà wàt dīː) and smiling. In
general, the junior, younger or socially lower-status person initiates the “ไหว้” to the
senior, elder or socially higher-status person because Thais are taught to respect
seniors, elders, superiors, teachers and parents.
Thais smile readily, which can be taken as a sign of sincerity. This has become one of the
characteristics of Thais; Thailand is therefore referred to by foreigners as the “Land of
Smiles”, which has become part of the good image of Thailand [34-35]. Knutson [36]
stated that this is not just a slogan but is real. A smile has various meanings
in Thai society, for example, to excuse and pardon minor
mistakes, to thank someone without words for a small service or favor, to show
embarrassment, fear or remorse, or to avoid commenting on some issue [25, 37]. Further
kinds of smile and their meanings can be found in [31].
As another example of Thai culture, each part of the body has a different level of
honor [25]. The head is considered the most honorable part, whereas the foot is
considered a lowly part that is dirty. Therefore, touching another person's head
or moving something for someone with the foot is generally impolite for Thais.
The national religion of Thailand is Buddhism; Thai culture is therefore also
known as “Thai Buddhism culture” [31]. The beliefs of this religion,
and the teachings of the “Tripitaka”, have had a profound and pervasive influence on
Thai culture since the Sukhothai era. Buddhism encourages individualism,
youthful idealism, open-mindedness, non-violence and tolerance [34, 38-39]. For
example, Thais can accept and endure losses or misfortunes in life because of their
attitude towards life based on the Buddhist concept of “karma”, which is believed to be a
“road map” that explains “how and why” things happen in their lives [32]. Good
things in life are believed to be the consequences of good karma, while bad things,
suffering and trouble are explained as the results of bad karma.
In addition, “face-saving” is very important and sensitive in Thai culture. It
reflects a primary concern for “ego”, which is of fundamental importance to Thais, and is
related to the avoidance of criticism, conflict and confrontation [25, 31, 39-40]. “Face” in
Thai culture can be understood as dignity, reputation, honor, shyness, respectability,
credibility and integrity. Face may be lost when, for example, one's actions fail to meet
the requirements of one's social position, one is laughed at or insulted by
another, or one feels embarrassed and looks foolish in front of others [37, 40-41].
Therefore, it is essential to avoid making someone lose face [42].
However, Thai culture also has negative aspects, including issues of
disorderliness, disregard for rules, lack of discipline and non-assertiveness [32, 43].
2.1.3.2 Thai Culture versus Western Culture
“A different group of people has different ways of life, different ways to give
meaning to things, and different values and behaviors. Therefore, social or national
culture is dictated by the values, beliefs, behaviors, and norms which permeate their
members and are expressed through the words and behaviors of those members in
society [43]”.
MacDonald [44] stated that Western culture is unique compared to other
cultures. Western culture has been influenced by the Catholic Church and
Christianity. It evolved from a hunter-gatherer culture adapted to
cold and cloudy environments, which are ecologically adverse climates because
Western countries are located far from the equator. People from Western culture are
free to debate, argue, analyze and prove things reasonably and logically.
Thai culture differs from Western culture in many respects. The
fundamental differences come from many factors, such as geography, ecology,
climate, resources and beliefs. Many characteristics of both cultures are compared and
highlighted in Table 2-9 [27, 31-32, 46-48], while a comparison based on Hofstede’s
cultural dimensions is presented in Figure 2-3 [45].
Figure 2-3 shows that Thailand clearly differs from the two Western
countries, the USA and the UK, in all of Hofstede’s cultural dimensions. Each
dimension is described in [49] and can be summarized as follows:
First, Thailand has a higher power distance score of 64, meaning that Thais
accept unequal power between leaders and followers in society or the workplace, whereas
Westerners require more equality. Second, Thailand has a lower
individualism score of 20, meaning that Thais tend more than Westerners to act as
members of a group in most areas of life. Third, Thailand has a lower masculinity
score of 34, which implies that Thais are less competitive and
assertive. Next, Thailand has a higher uncertainty avoidance score of 64, which
implies that Thais prefer not to change or take risks. Lastly, Thailand has a
higher long-term orientation score of 52, meaning that Thais plan far into the future.
Moreover, Knutson [29] presented that, in 1991, Komin displayed some
characteristics about Thai culture but those are rare to find in Western culture, for
example, caring-considerateness, contentedness, interdependence, mutually
helpfulness, and gratefulness.
A deeper understanding of Thai culture can be gained by considering Thai
terms such as “บุญ” (bu#n) and “บาป” (ba$˘p), “บุญคุณ” (bu#n kHu#n), “กตัญญู” (ka$ tHa#n
ju#˘), “ญาติ” (ja^˘t), “ที่ต่ำที่สูง” (tHifl˘ ta$m tHifl˘ su&˘N), “ใจเย็น” (tCa#j je#n) or
“ใจเย็นๆ” (tCa#j je#n je#n), “สนุก” (sa$ nu$k) and “เล่น” (le^˘n) [29-30].
$L_p = 20 \log_{10} (P / P_0)$ (2-1)
Where:
Lp is the sound pressure level (dB).
P is the root mean square (RMS) sound pressure (Pa).
P0 is the reference RMS sound pressure, normally 20 x 10^-6 Pa (20 µPa).
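As a small worked example of Equation (2-1), the following Python sketch converts an RMS sound pressure into a sound pressure level. The function name `spl_db` and the sample pressures are illustrative assumptions; only the formula and the 20 µPa reference come from the definitions above.

```python
import math

P0 = 20e-6  # reference RMS sound pressure in Pa (20 µPa), as defined above


def spl_db(p_rms: float, p_ref: float = P0) -> float:
    """Return the sound pressure level in dB for an RMS pressure in Pa (Eq. 2-1)."""
    if p_rms <= 0 or p_ref <= 0:
        raise ValueError("pressures must be positive")
    return 20.0 * math.log10(p_rms / p_ref)


# Illustrative pressures (assumed values, not measurements from this thesis).
for p in (20e-6, 0.02, 20.0):
    print(f"{p:g} Pa -> {spl_db(p):.1f} dB")
```

Because the scale is logarithmic, each tenfold increase in pressure adds 20 dB, so a pressure 1,000 times the reference corresponds to about 60 dB.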
Loud sounds of 130-140 dB can be painful and can damage hearing. Table 2-10
presents some familiar sounds with their SPL [53-54].
Natural sounds do not consist of just one frequency but are a mixture of
frequencies. High frequencies produce a high pitch, whereas low frequencies
produce a low pitch.
In general, speech sounds cover frequencies of about 100-3,000 Hz, but the
range of frequencies that the human ear hears most acutely is about 500-5,000 Hz
(although the total range that the human ear can hear is, on average, 20-20,000
Hz).
2.2.1.2 Anatomy of the ear [52-54]
The human ear consists of three main parts: the external ear, the middle ear
and the internal ear. The external (outer) ear includes the auricle (pinna), the
external auditory canal (ear canal) and the eardrum (tympanic membrane). Its main
function is to gather sound waves and conduct them through the eardrum into
the next part, the middle ear. The middle ear includes three small bones (ossicles),
called the hammer (malleus), anvil (incus) and stirrup (stapes) (details may
be seen in [52-54]). The main function of the middle ear is to amplify the sound
waves received from the external ear before transmitting them to the last part,
the internal ear. The internal (inner) ear is the most complicated part. It contains the
structures that support hearing and equilibrium; however, only hearing is presented
here. The main part of the internal ear is the cochlea, a spiral-shaped structure that
contains the receptor cells (hair cells) for hearing. This part of the ear is
connected to cranial nerve VIII, called the vestibulocochlear nerve. Pressure waves
from sound cause the receptor cells to produce receptor potentials, which
generate nerve impulses in nerve VIII, which links to the brain, as in Figure
2-4 [53]. Finally, the nerve impulses are transmitted to the primary auditory area of
the cerebral cortex, which is located in the temporal lobe of the cerebrum.
2.2.2 Brain and Language Ability
2.2.2.1 Fundamentals and Functions of the Human Brain [53-56]
The brain consists of four main parts: the brain stem, cerebellum, diencephalon and
cerebrum. The main functions of each part are presented in Table 2-11 [53-54].
From another point of view, the brain can be separated into the Right Hemisphere
(RH) and the Left Hemisphere (LH) of the cerebrum, the biggest region of the brain,
as in Figure 2-5 [55]. Although the brain seems symmetrical and both sides share the
performance of functions, each hemisphere specializes in performing some functions,
as in Table 2-12 [53]:
TABLE 2-10 Some familiar sounds with their SPL
Sound                     SPL (dB)
Rustling leaves           15
Ticking watch             20
Whispered speech          30
Elevator music            40
Conversational speech     50-60
Alarm clock, Shouting     80
Live rock band            100
Jackhammer                110
Propeller airplane        120
Jet airplane              130
FIGURE 2-5 Superior view of the brain shows left and right hemispheres
FIGURE 2-6 Lateral view of the brain (LH) that shows important areas related to the
motor speech area of Broca and the sensory speech area of Wernicke
FIGURE 2-7 Working of the brain: when listening, speech is perceived and
interpreted by the sensory speech area of Wernicke; the motor speech
area and the motor cortex then act in turn to produce the spoken
response
confirmed in [59] that part of the LH was predominant in the perception of the prosody
of Thai speech sounds, while the prosody of Chinese speech sounds was dominated by
part of the RH. From both [58-59] it is evident that only the left hemisphere
responds to native speech sounds, as shown in Figure 2-8 [58]. Moreover, it was found
that the response of the LH of Thai native listeners to a familiar Thai word
was larger than the response to an unfamiliar word [60]. That study suggested that
this may reflect the presence of a long-term memory trace for familiar Thai words,
as shown in Figure 2-9 [60].
b) Gandour J., et al. found, in a study of discriminating
pitch patterns in Thai words, that Thai native listeners showed activation in some
parts of the LH when compared with American English native listeners who did not
know Thai [61]. Similar results came from a later work that presented Thai
tones to groups of Thai native listeners and of Chinese and English native listeners
who did not know Thai, as in Figure 2-10 [62]. Several papers have also revealed
tone discrimination by the LH of native listeners of other tonal languages [63-65],
for example, Mandarin Chinese, as shown in Figure 2-11 [63], and Norwegian. The
results are consistent.
FIGURE 2-9 Potential maps of electric MMN response, for (a) Familiar word and
(b) Unfamiliar word
FIGURE 2-10 Comparison of English brain, Chinese brain and Thai brain, when
comparing the Thai lexical tone to the pitch task, only the Thai brain
showed significant activation in the FO near Broca’s area
FIGURE 2-11 Results from the PET study of tone perception, showing different
responses in parts of (a) the left hemisphere of the Mandarin Chinese
group and (b) the right hemisphere of the English group
over a unified network. This also helps service providers to innovate by
offering new services to users/consumers over an IP network, with new business
models such as flat-rate pricing. Moreover, it helps Internet Service Providers (ISPs)
to enter the telecom market; conversely, it helps PSTN operators to enter the
broadband market.
On the other side of the coin, several issues limit VoIP, for example,
voice quality (presented in detail in Section 2.4), security and reliability. For
security, including VoIP threats and vulnerabilities [82], it has been revealed that
DoS, which relates to availability, is the major attack, whereas 90% of vulnerability
issues were found in the implementation of VoIP systems (the remainder came from
configuration and VoIP protocols).
Reliability concerns the frequency of failure and the recovery time of a VoIP
system after a failure. This issue may become serious if power outages occur
frequently and take a long time to recover from, because a VoIP system does not
provide backup power to all IP phones, unlike a traditional PABX system, which
supplies power to legacy analog and digital phones [83].
As for other obstacles [79], VoIP can be seen as a threat to the revenues of
traditional PSTN operators, particularly in markets that are less mature and
monopolistic, whereas regulation by the regulator or the telecommunication
commission can become an issue for new entrants.
2.3.2 Overview on VoIP Services in Thailand
2.3.2.1 The Regulator [84-94]
In 2011, the first members of the National Broadcasting and
Telecommunications Commission (NBTC) of Thailand were approved [84-86]. An
important mission of this commission is to complete the auction of the long-awaited
licenses for the 3G service, after the 3G license auction problem became the talk of
the town in 2010 [87] (although the 3G service is already in use in Bangkok and
several provinces). 3G communication technology is very important for
Thailand because studies have found that 3G network investment
dramatically affects Thailand’s economy (e.g. by boosting the number of mobile
phone and wireless Internet users), leading to a higher employment rate and GDP
[88]. Besides, it is a big step towards 4G communication technologies, which are
currently used in some cities in Scandinavia [89]. 4G, as shown in Figure 2-14 [90],
can provide many advantages using an IP-based core network, for example, very high
data transfer rates and high capacity to support applications that require more
resources, including VoIP applications.
TABLE 2-15 Comparison of International call rates of Traditional services and some
PC-to-Phone VoIP services (rates might be changed depending on the
promotion of each operator, unit: Baht/minute)
               Service Rate
Country        Traditional – CAT 001   CAT2call     TOT netcall   True NetTalk
Cambodia       20.00                   3.00         2.00          3.00
Brunei         22.00                   2.00-2.50    1.90-2.00     2.00
Myanmar        22.00                   12.00        10.00         12.00
Philippines    20.00                   6.00-7.00    5.50          6.00
Malaysia       9.00                    0.91-2.00    1.00-1.60     1.00-3.00
Vietnam        20.00                   3.00         2.10-2.20     2.50
Singapore      9.00                    0.50         0.90-1.00     0.50-1.00
Laos           14.00                   3.00         2.25          2.50
Indonesia      18.00                   4.00-6.00    3.25-5.00     4.00-6.00
China          9.00                    0.50         1.00          0.50-1.00
Japan          18.00                   0.91-5.00    1.70-5.00     1.00-5.00
South Korea    18.00                   0.91-3.00    1.30-2.00     1.00-3.00
UK             14.00                   0.75-7.00    1.20-7.00     1.00-6.00
USA            9.00                    0.50         1.00          0.50-1.00
               Service Rate
Country        CAT 009, 00900   TOT 008   AIN 00500    TIC 00600
Cambodia       14.00            24.00     14.00        7.00
Brunei         5.00             5.00      5.00         2.00
Myanmar        14.00            20.00     14.00        12.00
Philippines    15.00            18.00     15.00        8.00-9.00
Malaysia       4.00             5.00      4.00         1.00-3.00
Vietnam        14.00            26.00     14.00        7.00
Singapore      3.00             7.00      3.00         1.00
Laos           4.00             6.00      5.00         4.00
Indonesia      5.00-8.00        7.00      5.00-8.00    4.00-7.00
China          3.00             5.00      3.00         1.00
Japan          5.00-6.00        5.00      6.00-7.00    1.00-5.00
South Korea    4.00-7.00        5.00      5.00-7.00    1.00-3.00
UK             6.00-7.00        7.00      6.00-7.00    1.00-6.00
USA            3.00             5.00      3.00         1.00
It is a 64 kbps coding technique called Pulse Code Modulation (PCM).
Subtypes of G.711 are G.711µ-law and G.711A-law. The µ-law is mainly used in
the USA, Canada and Japan, whereas the A-law is used widely in Europe and the rest of
the world, including Thailand. This kind of codec did not arrive with VoIP
technology; it has been used in ISDN since the 1990s. It is usually used in LANs,
with a MOS of about 4.1, whereas its LAN bandwidth requirement with 1-3
frames/packet or a 10-30 ms payload is about 75-100 kbps per call.
2.3.4.3 G.729 [108-110]
This codec is an 8 kbps coding technique called Conjugate-Structure
Algebraic Code-Excited Linear Prediction (CS-ACELP). It is usually used in WANs. Its
WAN bandwidth requirement with 1-3 frames/packet or a 10-30 ms payload is
about 20-45 kbps per call (depending on the type of WAN). Its MOS is 3.92. It is not
designed for music. Besides, it does not reliably support DTMF tones and cannot
support fax or modem. Its total algorithmic delay is 15 ms. G.729 is the original
codec of a family that includes, for example, G.729A (Annex A, the reduced-complexity
version of G.729), G.729B, G.729D and G.729E.
2.3.4.4 G.723.1 [108-109, 111]
There are two voice coding techniques: Multi-Pulse Maximum Likelihood
Quantization (MP-MLQ) at 6.3 kbps, which provides better voice quality, and
Algebraic Code-Excited Linear Prediction (ACELP) at 5.3 kbps. The total
algorithmic delay is 37.5 ms. However, the voice quality is acceptable
because the MOS is 3.8 and 3.6 for the bitrates of 6.3 kbps and 5.3 kbps
respectively. Therefore, it is an option for use in WANs, where it requires
approximately 18-27 kbps of WAN bandwidth per call. It was created from about 18
patents from several organizations.
2.3.4.5 Codec Selection [16, 112]
Codec selection is an important issue for the implementation and administration of
VoIP systems/applications because it affects the voice quality perceived by VoIP
users. In Table 2-17 [16], the last column is the MOS (Mean Opinion Score), the
metric for voice quality, which will be presented in detail in Chapter 5. A higher
MOS means better voice quality; however, it is a trade-off against
higher bandwidth consumption. Moreover, the use of a low bitrate, which can impact
quality, might be of concern to VoIP users.
Therefore, it has been recommended [112] that low bitrate codecs (e.g. G.729
and G.723.1) should be used over WANs, whereas high bitrate codecs (e.g. G.711 and
G.722) should be used over LANs. However, codec selection may depend on the
codecs available in each VoIP system because some codecs are not free and may
require a license for use.
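The bandwidth side of this trade-off can be estimated with simple arithmetic. The Python sketch below is a rough illustration only: it assumes 40 bytes of IPv4/UDP/RTP headers per packet and an 18-byte Ethernet overhead (header plus FCS), and the function name `bandwidth_kbps` is hypothetical. The codec bitrates are those quoted in this section; the per-call figures it prints depend on the layer-2 overhead assumed, so they will not exactly reproduce the ranges cited above.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers, without options
ETHERNET_OVERHEAD_BYTES = 18           # assumption: 14-byte header + 4-byte FCS


def bandwidth_kbps(codec_kbps: float, payload_ms: float,
                   l2_overhead_bytes: int = ETHERNET_OVERHEAD_BYTES) -> float:
    """Estimate the one-way bandwidth per call in kbps, headers included."""
    payload_bytes = codec_kbps * payload_ms / 8.0  # kbit/s * ms = bits
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + l2_overhead_bytes
    packets_per_second = 1000.0 / payload_ms
    return packet_bytes * 8.0 * packets_per_second / 1000.0


# Compare a high bitrate and a low bitrate codec for 10, 20 and 30 ms payloads.
for name, rate in (("G.711", 64.0), ("G.729", 8.0)):
    for ms in (10, 20, 30):
        print(f"{name}, {ms} ms payload: {bandwidth_kbps(rate, ms):.1f} kbps")
```

The sketch shows why packet size matters as much as the codec: with short payloads the fixed headers dominate, so a low bitrate codec saves less bandwidth than its nominal bitrate suggests.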
2.3.5 VoIP Signaling Protocols
The VoIP signaling protocol is the main part that enables the other components of a
VoIP system to work together. It sets up the connection session of a call between
endpoints that are already registered with the VoIP system. The functions of a
signaling protocol can be divided into four main functions: user location, which
discovers the location of the endpoint in order to establish a session; session setup,
which enables the establishment of session parameters for the calling endpoint and
the called endpoint; session negotiation, which concerns the set of properties for the
session agreed by the endpoints involved in the call; and call management, which
allows endpoints to join or leave a session.
In the telecom market at present, H.323 and Session Initiation Protocol (SIP) are
the main players [16, 112]. However, there are other VoIP signaling protocols that can
be used as alternatives, such as Media Gateway Control Protocol (MGCP),
MeGaCo/H.248 and Inter-Asterisk eXchange (IAX) [16]. Skype, which is popular and
classified as a peer-to-peer protocol, can be used for personal purposes and by small
businesses that require few VoIP clients [113-114].
Nevertheless, only H.323 and SIP are described in detail in this thesis.
2.3.5.1 H.323
H.323, as presented in Figures 2-18 and 2-19 [78], is the first official protocol
suite for VoIP signaling [78, 115]. It was developed and promoted by ITU-T, the
long-established standards body for telecommunications, which has been looking after
telecommunication standards since the analog era. Many standards, such as SS7,
interwork successfully with the traditional telecommunication technologies of the
PSTN because ITU-T has expertise in PSTN standards, most of which it issued
itself. Because H.323 came first into the VoIP industry, it is now mature and
dominated the market in the last decade, the first era of VoIP.
H.323 was designed and developed using four key components: terminals,
gateways, gatekeepers and Multipoint Control Units (MCUs) [78, 116-118]. These
components have been described briefly in [119]. Terminals are H.323
endpoints or clients for communication with other H.323 terminals, and can be
either IP softphones or IP hardphones, while gateways are the interface devices
between an H.323 terminal and a non-H.323 terminal on a circuit-switched network.
The gatekeeper works as the controller, providing central management and control
services such as address translation, admission and access control of H.323 endpoints,
bandwidth management, zone management, and call signaling and management. All
MCUs, gateways and terminals must be registered with a gatekeeper. Last
but not least, the MCU manages multipoint conferences (of at least three parties)
by handling the signaling to add participants to a conference call and to remove them
if required. Conference calls can be audio or video. However, an MCU might be
combined into a gatekeeper or a gateway.
2.3.5.2 SIP
SIP stands for Session Initiation Protocol. Its architecture and protocol stack can
be seen in Figures 2-20 and 2-21 [78]. It has been described as a text-based
peer-to-peer protocol that uses design concepts and architecture from the Hypertext
Transfer Protocol (HTTP) [117]. Its first version was issued in early 1999 [119]. SIP,
a strong potential competitor to H.323, was developed and promoted by the Internet
Engineering Task Force (IETF), the standards body for Internet protocols. The IETF
has expertise in Internet technology, which is like the road for VoIP applications.
Therefore, it could be implied that SIP should be more compatible with IP
than H.323, which has some issues with its complexity, centralization and
monolithic design [116]. However, the IETF is not an expert in interworking with the
traditional technologies provided by the PSTN. Also, SIP was developed later than
H.323; therefore, its major weakness is its lack of maturity compared to H.323
[120]. SIP mainly provides five functions to VoIP systems [117, 119]:
session setup, session management, user location, user availability and
user capabilities. These functions have been described briefly in [115]. Session
setup enables the establishment of session parameters for both calling and called
parties, whereas session management manages the session by modifying session
parameters and by transferring and terminating sessions. User location discovers the
location of the end user when delivering a new SIP request or establishing a session.
User availability determines the willingness of the called party to communicate and
the reachability of an end user, while user capabilities determine the media
capabilities of the components that can be used.
It has also been described [115] that the major components of SIP-based VoIP
systems are User Agents (UAs) and network servers. For UAs, a
User Agent Client (UAC) initiates a call while a User Agent Server (UAS)
replies. The network servers are a proxy server for forwarding SIP requests
and providing the routing function, a registrar server for registering clients,
a redirect server for directing the UAC to contact an alternative or next server, and a
location server for supporting address resolution [118-119].
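To make the text-based, HTTP-like nature of SIP concrete, the following Python sketch assembles a minimal INVITE request of the kind a UAC sends to start a session. It is an illustration under stated assumptions, not part of any system described in this thesis: the function name `build_invite`, the addresses and the host name are hypothetical, the header set follows the general layout of RFC 3261, and a real INVITE would normally also carry an SDP body describing the media.

```python
import uuid


def build_invite(caller: str, callee: str, host: str) -> str:
    """Build a minimal SIP INVITE request (no SDP body) as plain text."""
    # RFC 3261 requires the Via branch parameter to start with "z9hG4bK".
    branch = "z9hG4bK" + uuid.uuid4().hex
    lines = [
        f"INVITE sip:{callee} SIP/2.0",                      # request line
        f"Via: SIP/2.0/UDP {host}:5060;branch={branch}",     # where replies go
        "Max-Forwards: 70",                                  # hop limit
        f"To: <sip:{callee}>",                               # called party
        f"From: <sip:{caller}>;tag={uuid.uuid4().hex[:8]}",  # calling party
        f"Call-ID: {uuid.uuid4().hex}@{host}",               # identifies the call
        "CSeq: 1 INVITE",                                    # sequence and method
        f"Contact: <sip:{caller.split('@')[0]}@{host}>",     # direct route to the UAC
        "Content-Length: 0",                                 # no body in this sketch
    ]
    # Header lines end with CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical users and host, for illustration only.
request = build_invite("alice@example.org", "bob@example.com", "client.example.org")
print(request)
```

A proxy server would forward such a request towards the callee, and the UAS would answer with a status line such as `SIP/2.0 200 OK`, in the same way that an HTTP server answers a request.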
CAC mechanisms mainly concern VoIP system design and
development. For VoIP system implementation, installation and services, it has been
suggested to control VoIP traffic using Virtual LANs (VLANs) [112, 129-130], which
could be categorized as path control (traffic adaptation) in Table 2-19
[128]. A VLAN not only isolates the virtual network but also restricts access and
protects against both intentional and unintentional service disruptions to critical
VoIP servers/equipment [129-130].
Eligibility control      To apply a user’s profile of permissions              Access list
Application control      To disallow access to some application only           UDP port control
Classification           To classify traffic (e.g., gold and silver service)   Differentiated services
Path control             To allow using some paths only                        Capability-based routing
User behaviors           To change user behaviours                             Chargeback
Congestion avoidance     To avoid congestion in a network                      Random Early Detection (RED)
Congestion management    To handle congestion without network failure          Fair queuing and priority
Category labels: Modification; Traffic decision
Characteristics              Compression techniques
                             IPHC      CRTP      ECRTP     ROHC
Robustness to errors         Low       Low       High      High
Robustness to long delays    Low       Low       High      High
Robustness to reordering     No        No        Yes       No
Complexity                   Low       Low       Low       High
Compression ratios           High      High      Medium    High
Maximum compression          2 bytes   2 bytes   2 bytes   1 byte
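The effect of such header compression on a VoIP stream can be illustrated with a short calculation. The Python sketch below is only an estimate under stated assumptions: it reads the "Maximum compression" row as the smallest compressed header size, takes the uncompressed IPv4/UDP/RTP header as 40 bytes, uses a G.729 stream with an assumed 20 ms payload (20 bytes, 50 packets per second), and ignores layer-2 overhead. The function name `voip_bandwidth_kbps` is hypothetical.

```python
def voip_bandwidth_kbps(payload_bytes: int, header_bytes: int,
                        packets_per_second: float) -> float:
    """Per-call, one-way bandwidth in kbps, ignoring layer-2 overhead."""
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000.0


PAYLOAD = 20  # bytes: G.729 at 8 kbps with an assumed 20 ms payload
PPS = 50      # packets per second for a 20 ms payload

uncompressed = voip_bandwidth_kbps(PAYLOAD, 40, PPS)  # 20 + 8 + 12 header bytes
compressed = voip_bandwidth_kbps(PAYLOAD, 2, PPS)     # best case, 2-byte header
saving = 1 - compressed / uncompressed

print(f"uncompressed: {uncompressed:.1f} kbps")
print(f"compressed:   {compressed:.1f} kbps")
print(f"saving:       {saving:.0%}")
```

Because the 40-byte header is twice the size of the 20-byte voice payload in this scenario, compressing it removes most of the per-packet cost, which is why header compression is attractive for low bitrate codecs on slow links.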
of QoS and QoE can be seen in Table 2-22 [139-141], while definitions of the term
‘QoE’ have been described widely, as in Table 2-23 [137-138, 142-148].
To understand QoE for VoIP systems/services/applications clearly, the
comparison of QoS and QoE for VoIP can be drawn as in Figure 2-25, whereas
Figure 2-26 shows that QoE for VoIP sits on top of QoS [149-150]. As with
traditional telephony services provided by PSTNs, for VoIP / IP Telephone users,
QoE of VoIP relates to several categories of user experience; their
expectations are presented in Table 2-24 [148]. Poor QoS within a VoIP
network can result in poor QoE, which may dissatisfy users, such as User A1 in
Figure 2-27, adapted from [147].
FIGURE 2-27 VoIP QoE: Good QoE perceived by User A vs poor QoE perceived
by User B
TABLE 2-21 The Statistics of the terms QoS vs QoE, from the abstracts of IEEE
papers
Year                          No. of appearing times of the terms
                              QoS       QoE
2002-2004                     2362      6
2005-2007 (by 26 Oct only)    2627      16
ITU-T Definition
QoS: The collective effect of service performances, which determine the degree of
satisfaction of a user of the service.
QoE: The overall acceptability of an application or service, as perceived
subjectively by the end-user.
NOTES:
1. Quality of Experience includes the complete end-to-end system effects (client,
terminal, network, services infrastructure, etc.).
2. Overall acceptability may be influenced by user expectations and context.
Author                    Definition
Hestnes et al. (2003)     1. The user's perception of what is being presented by a
                          communication service or application user interface.
                          2. The overall result of the individual Quality of Services and a
                          measure of overall acceptability of a service or application that
                          includes factors such as usability, utility, fidelity and the level
                          of support from the application/service provider.
Nokia (2004)              1. The ability of the network to provide a service with an assured
                          service level.
                          2. How a user perceives the usability of a service when in use – how
                          satisfied he or she is with a service.
                          3. The perception of the user about the quality of a particular
                          service or network.
Lopaz et al. (2006)       An extension of the traditional QoS in the sense that QoE provides
                          information regarding the delivered services from an end-user point
                          of view.
Soldani (2006)            How a user perceives the usability of a service when in use – how
                          satisfied he/she is with a service in terms of, e.g., usability,
                          accessibility, retainability and integrity.
IneoQuest (2008)          The customers’ perception of how good of a job the service provider
                          is doing delivering the service.
Kilkki (2008)             The basic character or nature of direct personal participation or
                          observation.
Winkler (2009)            Quality from the perspective of the user or consumer (e.g. viewer),
                          with a focus on perceived quality of the content (or more
                          comprehensively, user experience).
Dagstuhl (2009)           The degree of delight of the user of a service, influenced by
                          content, network, device, application, user expectations and goals,
                          and context of use.
Batteram et al. (2010)    The measure of how well a system or an application meets the user’s
                          expectations.
TABLE 2-24 Categories of VoIP User’s Experience and their quality expectation
                                       Subjective                              Objective
Accuracy and Reliability               High                                    Medium – high
Management skill requirement           High                                    Low
Endeavor requirement                   High                                    Low
Automatic measurement                  No                                      Yes
Special test facilities requirement    Soundproof room(s)                      objective measurement tool
                                                                               (e.g. E-model measurement tool)
Time consumption                       Very long                               Short
                                       (e.g. 5 minutes per participant)
Collaboration requirement              High                                    Low
                                       (e.g. 24-32 participants per
                                       condition)
Cost                                   High                                    High
                                       (for conducting subject to              (for a standard tool, e.g.
                                       participate, to employ a research       E-model measurement tool)
                                       assistant and to prepare standard
                                       test facilities, e.g. a sound
                                       proof room)
However, voice quality for general users relates to their expectations, and
these expectations exist at two levels, the fixed-line level and the
mobile phone level [157]. The fixed-line network provided by a PSTN normally
provides “toll quality” voice with a MOS of at least 4.0. In particular, on fixed lines
using DS-0 circuits (e.g. ISDN), most calls achieve ‘excellent’ voice quality, or a MOS
of 4.3, which is as good as it gets for narrowband speech. For mobile phone networks,
general users perceive that most calls provide ‘fair’ voice quality, so the expected
voice quality may be considered to be a MOS of 3.2-3.5.
2.4.2.3 Understanding Subjective and Objective
To understand subjective and objective quality measurement/assessment clearly,
it is necessary to clarify the terms subjective and objective. These two terms
have been widely confused and misused, including in legal or criminal
discussions, clinical discussions and journal articles
[161-162]. In criminal discussions, fingerprint identification has often been
referred to as subjective, whereas in clinical discussions, pain, muscle testing and
force measurements have become issues. Thus, going back to basics, to
understand the terms subjective and objective clearly, dictionary definitions can be
consulted in [161] and Table 2-27 [163-166].
Based on speech or voice quality measurement/assessment in the area of
VoIP and the definitions in [161-166], it can be summarized that
subjective quality measurement/assessment for VoIP comprises the voice or speech
quality measurement/assessment methods that are based on the feelings or opinions of
the subjects or users and reflect the perceived voice or speech quality, whereas
objective quality measurement/assessment comprises the methods that are
based on facts, something real, or observable phenomena, not influenced by
personal beliefs or feelings.
2.4.3 Network Factors: The Gang of Evils
Several factors may affect the voice quality of VoIP
applications/services: not only codec selection and packet size [167] but also network
factors. There are three classic major network factors [16, 168-169], namely
packet delay, packet loss and jitter, which have been called the ‘three evils’ for voice
quality [170]. It has also been stated in [155] that echo is a cause of voice
quality issues for VoIP. There are other factors that can also affect voice
quality, such as packet mis-ordering, transcoding, network duplex and the voice
activity detection algorithm [171]. However, a search in IEEE Xplore (the IEEE being
a major organization that shapes the world) gives evidence of the research directions
on VoIP quality between the years 2000 and 2012, as in Figure 2-30 [172].
Focusing on the three evils, the largest number of results, 485,
mention packet delay, whereas 389 results and 311 results mention packet loss and
jitter respectively, as in Table 2-28 [172-175]. Therefore, only the three evils are
described in detail.
Source: TheFreeDictionary [189]
Subjective (adj.): 1. a. Proceeding from or taking place in a person’s mind rather than
external world. b. Particular to a given person; personal. 2. Moodily introspective.
3. Existing only in the mind; illusory.
Objective (adj.): 1. Of or having to do with a material object. 2. Having actual
existence or reality. 3. a. Uninfluenced by emotions or personal prejudices. b. Based
on observable phenomena; presented factually.
Source: Cambridge Dictionaries Online [190]
Subjective (adj.): Influenced by or based on personal beliefs or feelings, rather than
based on facts.
Objective (adj.): Not influenced by personal beliefs or feelings; fair or real.
FIGURE 2-30 The result from search using the keyword “VoIP quality”
TABLE 2-28 The statistic of the results from IEEEXplore after using the keywords
VoIP quality, packet delay, packet loss and jitter
Keywords Results
FIGURE 2-32 Example of random and bursty packet loss: (a) Random packet loss
(b) Moderate bursty packet loss (c) Heavy bursty packet loss
Author                            Statement
T.A. Hall (2001)                  “…., subjective measures are often very accurate and useful for
                                  evaluating a telephony system.” [185]
M. Narbutt and M. Davis (2005)    “Subjective testing is considered as the most “authentic”
                                  method of measuring voice quality.” [186]
Ding et al. (2007)                “Speech quality is inherently subjective, as it is determined by
                                  the listener’s perception. Therefore, the most reliable approach
                                  for assessing speech quality is through subjective tests.” [187]
M. Goudarzi (2008)                Subjective listening tests are the most reliable method for
                                  obtaining the true measurement of user’s perception of
                                  voice quality and have good results in terms of correlation
                                  to the true speech quality. [17]
P. Khanduri (2009)                “Subjective is widely considered the most “authentic” method of
                                  measuring voice quality.” [188]
Al-Akhras et al. (2009)           “…, as subjective methods are the most accurate methods for
                                  measuring the speech quality, they are used to calibrate
                                  objective methods.” [189]
Mahdi and Picovici (2009)         “The most reliable method for obtaining true measurement of
                                  users’ perception of speech quality is to perform properly
                                  designed subjective listening tests.” [152]
2.4.5.2 PESQ
PESQ is the state of the art in objective voice quality measurement and
has been claimed to correlate very highly with the subjective voice quality
measurement method [191]. It is the most common and popular of the intrusive
measurement methods [153, 192]. The original version, ITU-T P.862, supports only
narrow-band telephone networks and speech codecs, whereas the newer version,
P.862.2, was extended to support wideband telephone networks and speech codecs
[194-195], as shown in Figure 2-35 [193]. It
uses the strengths of both Perceptual Speech Quality Measure (PSQM) and
FIGURE 2-36 PESQ overview (a) PESQ concept. (b) The processes inside the
model of PESQ
2.4.5.3 E-model
The E-model is the most popular and widely used of the non-intrusive
measurement methods, as mentioned in [16]. Originally, it was intended to aid
transmission planners in testing the transmission performance of networks using
computation algorithms to ensure user satisfaction [178, 197]. This computational
method is based on 21 parameters, as shown in Figure 2-37, whereas the default values
and permitted values of those parameters are presented in Table 2-30 [178].
Those parameters have been simplified into the main factors to help calculate
the transmission rating factor R, as follows:
$R = R_o - I_s - I_d - I_e + A$ (2-2)
where
Parameter                                  Abbr.     Unit      Default value   Remark
Send loudness rating                       SLR       dB        +8              (Note 1)
Receive loudness rating                    RLR       dB        +2              (Note 1)
Sidetone masking rating                    STMR      dB        15              (Notes 2, 4)
Listener sidetone rating                   LSTR      dB        18              (Note 2)
D-Value of telephone, send side            Ds        –         3               (Note 2)
D-Value of telephone, receive side         Dr        –         3               (Note 2)
Talker echo loudness rating                TELR      dB        65
Weighted echo path loss                    WEPL      dB        110
Mean one-way delay of the echo path        T         ms        0
Round-trip delay in a 4-wire loop          Tr        ms        0
Absolute delay in echo-free connections    Ta        ms        0
Number of quantization distortion units    Qdu       –         1
Equipment impairment factor                Ie        –         0               (Note 5)
Packet-loss robustness factor              Bpl       –         4.3             (Notes 3, 5)
Random packet-loss probability             Ppl       %         0               (Notes 3, 5)
Burst ratio                                BurstR    –         1               (Notes 3, 6)
Circuit noise referred to 0 dBr-point      Nc        dBm0p     -70
Noise floor at the receive side            Nfor      dBmp      -64             (Note 3)
Room noise at the send side                Ps        dB(A)     35
Room noise at the receive side             Pr        dB(A)     35
Advantage factor                           A         –         0
NOTE 1 – Total values between microphone or receiver and 0 dBr-point.
NOTE 2 – Fixed relation: LSTR = STMR + D.
NOTE 3 – Currently under study.
NOTE 4 – Eq. (3-24) in [178] provides also predictions for STMR > 20 dB. However,
such values can hardly be measured in a reliable way because the measurement
device will mainly cover the acoustic coupling, and not the electrical one.
NOTE 5 – If Ppl > 0%, then the Bpl must match the codec, packet size, and PLC
assumed.
NOTE 6 – E-model predictions for values of BurstR > 2 are only valid if the packet
loss percentage is Ppl < 2%.
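As an illustration of how Equation (2-2) and the parameters Ie, Bpl, Ppl, BurstR and A interact, the Python sketch below computes R for a few packet-loss rates. It rests on assumptions that go beyond the text: Ro - Is is lumped into the constant 93.2 (the rating usually quoted for all-default parameters, a common simplification), the delay impairment Id is passed in rather than computed, and packet loss enters through the effective equipment impairment Ie-eff = Ie + (95 - Ie) * Ppl / (Ppl / BurstR + Bpl) as given in ITU-T G.107. The function names are hypothetical.

```python
def effective_ie(ie: float, ppl: float, bpl: float, burst_r: float = 1.0) -> float:
    """Packet-loss-dependent equipment impairment (Ie-eff form of ITU-T G.107)."""
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


def r_factor(ro_minus_is: float = 93.2, i_d: float = 0.0, ie: float = 0.0,
             ppl: float = 0.0, bpl: float = 4.3, burst_r: float = 1.0,
             a: float = 0.0) -> float:
    """Transmission rating factor R = Ro - Is - Id - Ie + A (Eq. 2-2).

    Assumption: Ro - Is is approximated by the constant 93.2 and Ie is
    replaced by its packet-loss-dependent form Ie-eff.
    """
    return ro_minus_is - i_d - effective_ie(ie, ppl, bpl, burst_r) + a


# Random packet loss (BurstR = 1) with the default Ie = 0 and Bpl = 4.3.
for loss_percent in (0.0, 1.0, 3.0, 5.0):
    print(f"Ppl = {loss_percent:.0f}%  ->  R = {r_factor(ppl=loss_percent):.1f}")
```

With no loss and no delay impairment the sketch returns the lumped default rating, and R falls quickly as Ppl grows because Bpl is small; a codec with better loss concealment would be modeled with a larger Bpl.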
FIGURE 2-38 The graph represents the relation of R-value and MOS-CQE
TABLE 2-31 The relation among R-value, MOS-CQE and user satisfaction
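The relation between the R-value and MOS-CQE can be sketched in Python as follows. The mapping used is the one given in ITU-T G.107 (MOS = 1 for R at or below 0, 4.5 for R at or above 100, and a cubic in between); the function name `r_to_mos` is an assumption made for this illustration, and the R values in the loop are arbitrary sample points.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-model R-value to an estimated MOS-CQE (ITU-T G.107 mapping)."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Arbitrary sample R-values, including 93.2 (all-default E-model parameters).
for r in (50.0, 60.0, 70.0, 80.0, 90.0, 93.2):
    print(f"R = {r:5.1f}  ->  MOS-CQE = {r_to_mos(r):.2f}")
```

The curve is S-shaped: it is steepest in the middle of the range, so the same loss of R costs more MOS around R = 60-80 than near the top of the scale.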
see the evidence from the search results on the IEEE Xplore website, as in Table 2-32
[201-202].
TABLE 2-32 The statistic of the results from IEEEXplore for search using the
keywords VoIP, quality, PESQ and E-model, between 2002- Aug
2012
Keywords Results
equipment and a 3G mobile network. He conducted his work in the UK; therefore, the
speech samples used were British English. For the objective measurement part,
the GSM and AMR codecs were tested with more than 200 speech samples using
PESQ and 3SQM. The results were then compared with those of subjective listening
tests that did not follow the ITU-T standard, which were conducted with 33 subjects
by sending them score sheets, voice files and instructions. This
work also used an installation of the Asterisk open source PBX software. He
found that PESQ gave better results than 3SQM, and that the male
speech samples received higher scores than the female speech samples. However, this
work mainly focused on objective measurement methods.
One article conducted subjective ACR tests for comparison with PESQ
objective tests. The ACR listening tests were conducted with 20 Japanese subjects using
speech samples of Japanese seven-digit numbers from 2 male and 2 female speakers.
The facilities included headphones and a soundproof room. The results, called
subjective MOS, were then compared with the results from PESQ to evaluate its
effectiveness. It was concluded that the PESQ objective MOS
correlates relatively well with the subjective MOS [204]. Another article
presents a comparison of MOS evaluation characteristics among Chinese,
Japanese and English in IP telephony [205]. It states that subjective voice
quality evaluation may be influenced by nationality and points out that the
dependencies on language, culture and nationality should be investigated to gain a
better understanding of QoE. The main contribution of this work is to show
graphically the difference in subjective MOS between Japanese and Chinese subjects,
as in Figure 2-39, although the correlation coefficient is 0.903 [205].
The results were gathered and analyzed from 25 Chinese and 32 Japanese subjects
using ACR listening opinion tests with native speech samples in a soundproof
room, with three G.722 family codecs under packet loss.
This is important evidence of language dependency and cultural dependency,
consistent with the cultural variation issue (which covers language variation) pointed
out in [205]. In particular, the language dependency issue has already been
acknowledged by ITU-T [206-207].
In [208], subjective measurement based on collecting opinions via the Internet
was proposed. The authors proposed a browser-server structured system and a set of
procedures for collecting subjective opinions through the Internet using degradation
category rating (DCR) as the rating scale. The study compared
the proposed method, with 276 subjects, against listening tests based on ITU-T
Recommendation P.800 with 32 subjects and 8 operators. The tests were carried out on the
campus of Beijing Institute of Technology; therefore, it can be inferred that all
subjects were Chinese. The experiment was designed to compare the performance of
two candidate codecs and a reference codec under four types of background
noise (e.g. Office and Babble40). The modulated noise
reference unit (MNRU) was also applied in the experiment. It was concluded that the
proposed method can collect subjective opinion scores via the Internet
easily, cheaply and flexibly, although it requires a large number of subjects, and that
the proposed user qualification mechanism gives reasonable reliability by
distinguishing between reliable and unreliable subjects.
Nevertheless, several research works have been conducted
using subjective methods. Most of them used ACR listening tests (both formal and
informal) with about 20-30 subjects. Some of these previous works are summarized
in Table 2-33 [17, 155, 203-205, 208-210].
Although several previous works conducted subjective tests, few
of them clearly present important information about their laboratory, system under test,
codec under test, language, nationality of subjects, method and test conditions. Thus,
selected information about works based on different languages, which are
assumed to have been tested by native listeners/speakers, is presented in Table 2-34,
obtained from [211-214]. The table shows that, with the same codec, the MOS
values representing voice quality perception for different languages and native
listeners are rather different. Therefore, it is of interest to find the MOS for the Thai
language with native Thai listeners.
2.5.2 Previous Research Using Thai and Other Languages
In [215-216], objective tests were carried out with 13 languages, both non-tonal languages
(Arabic, English, French, German, Hindi, Japanese, Korean, Portuguese, Russian and
Spanish) and tonal languages (Swedish, Chinese and Thai). These two
works used many languages to obtain MOS for the BV16 and BV32 codecs (BV16 is a
mandatory codec in the PacketCable 1.5 standard), then compared them with other codecs,
such as G.711u, G.729E, GSM-EFR, G.728, G.729 and G.723.1 at 6.3 kbps, using the
PESQ measurement method and referring to the results of one subjective listening test.
However, these works did not focus on analysis and comparison of
non-tonal and tonal languages, including Thai.
Uzoamaka [209] conducted his thesis on the tonal language Igbo using
subjective listening tests. 24 subjects, 12 male and 12 female, listened
to 200 speech samples and gave a score for each speech sample. The results
for Igbo were used to find the correlation with the PESQ P.862.2 dataset for Igbo,
which was then compared with the corresponding correlation for Dutch. The comparison
showed that Igbo (r=0.88) has a slightly higher correlation than Dutch (r=0.84). This work
also found that the correlation between results for tone and non-tone sentences from the
subjective tests was very high (r=0.96). It was claimed that this work is the first
research on a tonal language in Africa. However, in this work, each subject listened
to speech samples that had not been encoded with a codec such as G.711 or G.729.
0.71
Clean Speech  ACR  American-English  32  3.69  Psytechnics (Report)  [211]
G.711 A-law (64 kbps)  Clean Speech  ACR  American-English  32  4.05  8 kHz sampling rate  [212]
G.711 A-law (64 kbps)  Clean Speech  ACR  Korean  32  4.41  8 kHz sampling rate  [212]
Ren et al. [217] presented an article based on Chinese, which is a tonal
language. That work mainly presented an E-model enhancement; further details are
given in 5.6.3.
2.5.3 Previous Research on E-Model Enhancement
Although the E-model is very popular and widely used, it still has disadvantages
[218]. For example, it has been pointed out that its listening quality evaluation is less
accurate than that derived by a signal-based algorithm, that it does not consider the
variability of network delays and loss rates, and that it does not take account of the
interaction between different factors (e.g. the interplay between network delay and
loss rates). Also, the E-model may require verification by field surveys or laboratory
tests, and proper calibration.
Several research works propose enhancements to objective measurement,
including E-model extension, enhancement or
improvement [183, 219-224]. M. Voznak et al., who presented [219-220] on E-model
modification and improvement, stated that the development of the E-model has
not been successfully completed. Those works argue that the current version of the E-model
does not reflect reality and state pointedly that the SG12 group failed to address
significant influences (e.g. codec tandeming). Also, H. Zhang et al. [183] presented
an enhanced E-model using parameter B, a new packet loss burstiness
measure, whereas Zhen and Lin [221] presented an extension of
the E-model, as in Figure 2-40. This proposed model is composed of a codec module,
a delay module and a packet loss module. These three modules are finally integrated
with the standard E-model to obtain the result called MOS_E. However, this proposed
model is based on objective measurement only; for example, Iec is calculated by
applying PESQ.
Ding and Goubran proposed an extended E-model using a modified Ie and a
new parameter, the jitter impairment factor Ij, as shown in Eq. (2-4) and (2-5)
respectively [222].
For Eq. (2-4), Ie_opt is the optimum Ie (without packet loss), loss_rate is the
packet loss in percent, and C1 and C2 are constants that vary depending on the
codec and packet loss rate. For Eq. (2-5), H is a delay distribution parameter and
T is the buffer size, whereas C1-C4 are coefficients and K is a time constant
(see [222] for more details). These equations were applied in [223] to determine the
maximum number of calls for given bandwidth capacities while maintaining a
certain level of QoS.
E-model enhancement using a jitter impairment factor Ij is also found in
[224], with different forms of Ij (see [224] for further information).
The work most similar to this research is [217]. Ren et al.
presented an article entitled Assessment of Effects of Different Language in VoIP,
describing experiments that investigate the effects of different languages on perceived
voice quality under different VoIP system factors (delay, loss and codec) by testing with
English and Chinese. In this work, PESQ P.862, an objective measurement method,
was used to evaluate the sound quality by testing speech samples (English and
Chinese) that included one male and one female speaker for each language. The speech
samples were recorded in 16-bit, 8 kHz linear PCM format and were selected from the
language speech sets. They were similar in tempo and perceived quality. Each was
about 8 s long and had about 50% active speech intervals. The authors did not find
different effects of short delay (0-200 ms) for the different languages but found that the
G.729 codec degraded Chinese speech samples more than English speech samples,
compared with the G.711u codec. It was also claimed that English speech samples were
always received with better voice quality than other languages. However, the
important part of this article is the enhanced E-model, which adds a
new impairment factor Il, called the language impairment factor, as in the equation below:
where
Il = 0 for English speakers,
C1=0.52819, C2=-0.574391 and PPL is the packet loss percentage.
At present, although several proposals present methods to
improve or enhance the standard E-model, for example by using new factors such as a
jitter impairment factor or a language impairment factor, a final or
perfect E-model has not yet been found.
Moreover, according to all the previous research reviewed here,
no E-model
enhancement/extension/improvement/modification uses subjective MOS from
subjective measurement methods.
such as queuing policies and sizes, bandwidths, packet delays, packet losses and
packet reordering. Moreover, network emulators have the advantage that
traffic generation under the required conditions is controllable and reproducible.
Rizzo [243] proposed a simple and very effective approach: inserting a
standalone system called “Dummynet” into the network under study to conduct
experiments. It was originally developed as a component of FreeBSD and has existed
for over a decade. It is now available for other operating systems (e.g. Mac OS X,
Linux and Windows) [244]. With the features listed in Table 2-37, Dummynet works by
simply intercepting communication between the protocol layer under analysis and the
underlying one, simulating the presence of a real network with
network effects such as limited bandwidth, packet delay and packet
loss [242-244]. This approach gives the advantages of both simulation and real-world
testing, because experiments are conducted on a workstation or PC acting as a traffic
generator without modifying the real-world application.
Running an experiment using Dummynet is as easy and quick as running an
application on a PC. Moreover, Dummynet introduces almost no overhead in the
communication, meaning experiments can be conducted up to the maximum
performance of the system in use.
Component                   Description
Channel API                 A channel API is used by Asterisk to interface with
                            the PSTN (e.g. ISDN), with Voice over IP using various
                            signaling protocols (e.g. SIP, IAX, H.323, MGCP and
                            Skinny) and with miscellaneous channels (e.g. ACD).
Codec Translation API       A codec is a combination of coder/decoder or
                            compressor/decompressor, which is used to change an
                            analog voice signal into a digital data stream and to
                            change the data stream back into an analog voice signal.
                            Asterisk supports many codecs, for example G.711 alaw,
                            G.711 ulaw, G.722, G.723.1, G.726, G.729, GSM, iLBC,
                            Speex and LPC10.
File Format API             An API used to store audio data such as voicemail and
                            music on hold, which can be found in the directories
                            /var/spool/asterisk/vm and /var/lib/asterisk/sounds
                            respectively. Asterisk supports a variety of file
                            formats (e.g. MP3, raw, pcm, vox, wav and gsm).
Application API             An API used to support applications, for example
                            voicemail, conferencing, paging and other custom
                            applications.
PBX Switching Core          The central part of Asterisk, which receives telephone
                            calls from the interfaces (channel APIs) and handles
                            calls by following a dial plan.
Application Launcher        Mainly the middleware between the PBX switching core
                            and the application APIs. For example, it is used by
                            the PBX switching core to ring IP phones, to dial out on
                            outgoing trunks and to connect to voicemail. It is also
                            used to interface with the CDR core.
Codec Translator            The part used to connect, for example, a channel
                            compressed with G.711 alaw to another channel
                            compressed with G.729 seamlessly by transcoding.
Scheduler and I/O           This part handles related applications and drivers so
Management                  that they operate efficiently under the system conditions.
Dynamic Module Loader       When Asterisk starts, this part loads and initializes
                            drivers that provide, for example, channel drivers, file
                            formats, call detail recording back ends, codecs and
                            applications.
CDR Core                    CDR stands for Call Detail Recording; the CDR core is
                            mainly used to record details of outgoing calls (e.g.
                            calling and called numbers, duration and the trunk
                            number).
The basic object made available by Dummynet is called a “pipe”, which has a
given bandwidth, queue size and delay to apply effects to the traffic, as shown in
Figure 2-42 [244], whereas the packet classifier called “ipfw” is used to match
packets against a list of numbered rules, called a “ruleset”. These can therefore be
combined in several ways [243-244]. This is very useful for researchers in
performance evaluation of a system or network, including evaluating the voice
quality provided by a VoIP system under network effects.
Dummynet is very useful and widely used [132, 217, 245-254]; however, it has
a few limitations. For example, errors may occur in reproducing the timing computed
by the model, although running Dummynet on an operating system with an appropriate
load can reduce the chance of such errors. Further information can be found in
[243-244].
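As a minimal sketch of how a pipe and a rule fit together, the helper below only builds the ipfw command strings for one delay/loss condition; it does not execute them. The function name, the rule number 100 and the catch-all match "ip from any to any" are hypothetical choices for illustration and are not taken from the configuration used in this thesis.

```python
def dummynet_commands(pipe_no, delay_ms, loss_percent, rule_no=100):
    """Return ipfw commands that send all IP traffic through one dummynet pipe.

    delay_ms is the one-way delay added by the pipe; loss_percent is converted
    to the packet loss rate (plr), which ipfw expects as a fraction in [0, 1].
    """
    plr = loss_percent / 100.0
    return [
        # Classifier rule: hand matching packets to the pipe.
        f"ipfw add {rule_no} pipe {pipe_no} ip from any to any",
        # Pipe configuration: fixed delay and random packet loss.
        f"ipfw pipe {pipe_no} config delay {delay_ms}ms plr {plr:g}",
    ]


if __name__ == "__main__":
    # Example condition: 400 ms delay combined with 5% packet loss.
    for command in dummynet_commands(1, 400, 5):
        print(command)
```

Generating the commands from a table of conditions makes it easier to reproduce the same scenario for every subject.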
calculating the p-value of the test statistic. In many research areas, a significance
level of 0.05 is widely used. This means accepting a 5% chance of obtaining, from a
sample drawn from the group of subjects, results at least as inconsistent with the
null hypothesis as those observed even though the null hypothesis is true. The steps
for a hypothesis test can be
summarized as follows:
1) Determine the null and alternative hypotheses
2) Draw a sample from the group of subjects of interest
3) Collect data
4) Calculate a test statistic based on the data
5) Find the p-value of the test statistic
6) Decide whether the null hypothesis is rejected or not
In this thesis, the t-test is used to determine whether the results from two groups
of subjects are the same or different, whereas the Analysis of Variance
(ANOVA) is used for comparison among the results from at least three groups of
subjects. The simplest way to decide whether to accept or reject the null hypothesis
is to consider the p-value: if it is less than the significance
level of 0.05 (95% confidence level), the null hypothesis is rejected and
the alternative hypothesis is accepted instead. There are several equations for the t
statistic and F statistic, which are used in various cases. For the
standard between-subjects t-test and the one-way between-subjects ANOVA, the
t statistic and F statistic can be found using the equations in Table 2-38 [255-257].
Further information about the t-test and ANOVA can be found in [255-259].
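TABLE 2-38 Comparison of t-tests for two groups and ANOVA for three groups
t-test for two groups:
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \qquad (2-7)$$
where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two groups; $s_1^2$ and $s_2^2$ are the
variances of the two groups; and $n_1$ and $n_2$ are the sample sizes of the two groups.
ANOVA for three groups:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all of the means are equal
$$F = \frac{\left[ n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + n_3(\bar{x}_3 - \bar{x})^2 \right] / (I - 1)}{\left[ (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 \right] / (N - I)} \qquad (2-8)$$
where $\bar{x}_1$, $\bar{x}_2$ and $\bar{x}_3$ are the means of the three groups and $\bar{x}$ is the
mean of all; $s_1^2$, $s_2^2$ and $s_3^2$ are the variances of the three groups; $n_1$, $n_2$
and $n_3$ are the sample sizes of the three groups; $N$ is the total sample size, and $I$ is 3
for three groups.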
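The two statistics in Table 2-38 can be computed directly from their definitions. The following is a minimal sketch using only the Python standard library; the function names and the example scores are made up for illustration and are not data from this thesis. Finding the p-value additionally requires the t or F distribution (for example from a statistics package), which is not shown here.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group1, group2):
    """Eq. (2-7): between-subjects t statistic for two groups."""
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


def f_statistic(*groups):
    """Eq. (2-8): one-way between-subjects ANOVA F statistic."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    i, n = len(groups), len(all_scores)
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (i - 1)
    within = sum((len(g) - 1) * variance(g) for g in groups) / (n - i)
    return between / within


if __name__ == "__main__":
    # Hypothetical opinion scores from three small groups of subjects.
    a, b, c = [4, 5, 4, 3, 4], [3, 3, 4, 2, 3], [2, 3, 2, 3, 2]
    print(t_statistic(a, b))
    print(f_statistic(a, b, c))
```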
“…, a model is not verifiable directly by an experiment. For all models are both
true and false… The validation of a model is not that it is “true” but that it generates
good testable hypotheses relevant to important problems.” by R. Levins
“All models are wrong, but some are useful.” by George E.P. Box
A model is a mathematical description of a state or process. It is used to
understand that process or mechanism. Nonlinear regression can be applied to fit a
mathematical model to the data to determine the best-fit values of the parameters of
the model.
The purpose of a model is not to describe the system perfectly, because a
perfect model may have too many parameters to be useful. Therefore, building a
model means finding the simplest possible model that can describe the system and fit
the data.
Model fitting is thus the method of fitting a model to experimental data or of
choosing which model best fits the data [261]. The principle of model fitting
is illustrated in Figure 2-43 [262]. Using a model, which may include the
object as well as the instrument, modeled data m(p) are computed from the
parameters p. The modeled data are then compared with the real data d to obtain the
residuals r. The fitting may need to be repeated with different sets of parameters
p to minimize the residuals.
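The idea of parameters p, modeled data m(p), real data d and residuals r can be sketched for the simplest case, a straight line fitted by least squares, where the best parameters have a closed form and no iteration is needed. The function names and the data points below are invented for illustration only; nonlinear models generally require an iterative optimizer instead.

```python
def fit_line(xs, ds):
    """Least-squares parameters p = (slope, intercept) for m(p) = slope*x + intercept."""
    n = len(xs)
    mean_x, mean_d = sum(xs) / n, sum(ds) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxd = sum((x - mean_x) * (d - mean_d) for x, d in zip(xs, ds))
    slope = sxd / sxx
    return slope, mean_d - slope * mean_x


def residuals(p, xs, ds):
    """r = d - m(p): difference between the real data and the modeled data."""
    slope, intercept = p
    return [d - (slope * x + intercept) for x, d in zip(xs, ds)]


if __name__ == "__main__":
    # Made-up example: a quality score that falls as packet loss (%) rises.
    loss = [0, 5, 10, 15, 20]
    score = [4.2, 3.8, 3.3, 2.9, 2.4]
    p = fit_line(loss, score)
    print(p)
    print(residuals(p, loss, score))
```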
Nowadays, several tools are available that can create both linear and nonlinear
regression models [263-264], including the trendline in Microsoft Excel and the
curve fitting and surface fitting toolboxes in Matlab, which are appropriate
for one variable and two variables respectively.
2.7.3 Model Evaluation Tools: Mean Absolute Error and Mean Absolute
Percentage Error
After a new model has been obtained, it must be evaluated by comparison with
the old or existing model. In this thesis, Mean Absolute Error (MAE) and
Mean Absolute Percentage Error (MAPE) [265-267], two simple model evaluation
methods, have been selected to evaluate the enhanced E-model (E2-model) and the
ThaiVQE model in comparison with the standard E-model, which is the objective
measurement method to be enhanced. MAE is sensitive to small deviations from zero
and can be considered a robust measure of the accuracy of a model. Compared with
squared-error measures, it tends to prefer models that produce occasional large
failures or errors. However, in this thesis
MAE is based on the 5-point MOS scale, which may be inconvenient to interpret.
This is the reason for also using MAPE for evaluation, because MAPE expresses the
errors as percentages. The MAE and MAPE equations are as follows:
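$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - \hat{x}_i \right| \qquad (2-9)$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left| \frac{x_i - \hat{x}_i}{x_i} \right| \times 100\% \qquad (2-10)$$
where $x_i$ is the observed value (the subjective data in this
thesis), $\hat{x}_i$ is the estimated or predicted value from the model, and $n$ is the
number of instances in the dataset.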
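As a small illustration of Eq. (2-9) and (2-10), the sketch below computes both measures for a list of observed and predicted scores. The function names and the example values are hypothetical and are not results from this thesis.

```python
def mae(observed, predicted):
    """Eq. (2-9): mean absolute error."""
    return sum(abs(x - xh) for x, xh in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Eq. (2-10): mean absolute percentage error (observed values must be non-zero)."""
    total = sum(abs((x - xh) / x) for x, xh in zip(observed, predicted))
    return 100.0 * total / len(observed)


if __name__ == "__main__":
    # Made-up subjective scores and model predictions on the 5-point scale.
    subjective = [4.0, 3.0, 2.0]
    predicted = [4.2, 2.7, 2.5]
    print(mae(subjective, predicted))
    print(mape(subjective, predicted))
```

Because MOS never falls below 1 on the 5-point scale, the division by the observed value in MAPE is always defined for such data.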
CHAPTER 3
METHODOLOGY: SUBJECTIVE AND OBJECTIVE
MEASUREMENT
In this chapter, the methodology for subjective and objective measurement in
this research, as shown in Figure 3-1, is divided into five
phases. These are the experimental design phase (Phase I), the
preparation phase (Phase II), the pilot phase (Phase III), the intensive subjective test
phase (Phase IV) consisting of listening opinion tests, conversation opinion tests and
interview tests, and the objective test phase (Phase V) with PESQ and the E-model.
Details of the Thai bias factor, ThaiVQE and the E2-model are presented in
Chapter 5.
TABLE 3-2 Summary of the conversation opinion tests; each scenario required at
least 24 subjects (576 subjects in total)
2 2A*
3 3A* 3B* 3C* 3D*
5 5A* 5B* 5C* 5D*
6 6A*
10 10A* 10B* 10C* 10D*
15 15A* 15B 15C 15D
20 20A* 20B* 20C* 20D*
Notes: * = Conversation opinion tests with G.711A-law, G.729 and G.722
* = Conversation opinion tests with G.711A-law
A, B, C, D = 0 ms, 400 ms, 800 ms and 1500 ms respectively
For the soundproof room, a survey at the start of this thesis found
no available soundproof room in compliance with the ITU-T P.800 standard.
Therefore, to limit cost, the studio room on the 7th floor of the Central
Library, KMUTNB, was selected (its plan is shown in Figure 3-2 [268]). The
room was modified by laying carpet and adding a second layer of glass to the
windows in order to reduce the room noise and reverberation time, which are two
important acoustic properties. A sound level meter and a reverberation time
analyzer were required to measure those two values (measurement of the sound
level of the background noise is shown in Figure 3-3). Figures 3-4 and 3-5 present the
major changes ‘before’ and ‘after’, while the measurement results ‘before’ and ‘after’
modification of the room are presented in Table 3-4 [268].
The VoIP testbed system consists of two computers, as shown in Figures 3-6
and 3-7. The first computer was installed with Linux and then with Asterisk,
an open source VoIP application, to become the testbed system. The second
computer was installed with Linux and then with the dummynet module to
work as the network emulator that generates delay and loss for each test scenario.
The IP phones were a set of IP phones that can be bought in Bangkok. They
are required because this research is intended to study ‘real’ IP telephone systems
with Thai users.
FIGURE 3-2 Top view of the plan of the studio room, which has been improved to
be a laboratory for this research
FIGURE 3-4 Example of the window sill (a) before installing the second layer of
glass and (b) after installation of the second layer of glass
FIGURE 3-5 Example of room floor (a) before installing the carpet and (b) after
installation of the carpet
TABLE 3-4 Comparison of important values and properties of the modified room
FIGURE 3-7 The VoIP testbed system, while setting up, which consists of an IP
telephone system, a network emulator, a switch and IP phones
The Thai speech set is a subset of the Thai Speech Set for Telephonometry
(TSST) [18, 269], which was developed and funded by Human Language Technology
(HLT), National Electronics and Computer Technology Center (NECTEC). TSST
consists of speech from 1 girl, 1 boy, 4 female and 4 male speakers. Each speaker
recorded 25 pairs of sentences (or phrases). However, only 10 pairs, listed in Appendix A,
were used for the listening tests of this thesis, from which 100 speech lists were
created. A systematic-random approach for playing a speech list was then applied. Firstly, the
table of speech lists was created as in Table 3-5, before creating the systematic list
of speech.
4 G1F2 G2F2 G3F2 G4F2 G5F2 G6F2 G7F2 G8F2 G9F2 G10F2
5 G1F3 G2F3 G3F3 G4F3 G5F3 G6F3 G7F3 G8F3 G9F3 G10F3
6 G1F4 G2F4 G3F4 G4F4 G5F4 G6F4 G7F4 G8F4 G9F4 G10F4
7 G1M1 G2M1 G3M1 G4M1 G5M1 G6M1 G7M1 G8M1 G9M1 G10M1
8 G1M2 G2M2 G3M2 G4M2 G5M2 G6M2 G7M2 G8M2 G9M2 G10M2
9 G1M3 G2M3 G3M3 G4M3 G5M3 G6M3 G7M3 G8M3 G9M3 G10M3
10 G1M4 G2M4 G3M4 G4M4 G5M4 G6M4 G7M4 G8M4 G9M4 G10M4
Note2:
C1, C2 = the child number 1 and 2 respectively
F1, F2,…,F4 = the female speaker number 1 to 4 respectively
M1, M2,…, M4 = the male speaker number 1 to 4 respectively
G11M2,…, G3M4}
. .
. .
G8M2, G9M3}
Secondly, all speech lists were numbered and then mapped to the
announcement numbers 2001 to 2100 as arranged in the VoIP testbed system.
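One way to obtain 100 lists in which every list contains all 10 sentence groups and all 10 speakers is to walk diagonally through the table of speech lists from each starting cell. The sketch below is only an assumed construction for illustration, not the procedure actually used in this thesis; the function names and the diagonal rule are hypothetical, while the labels (G1-G10, C1-C2, F1-F4, M1-M4) and the announcement numbers 2001-2100 follow the text.

```python
GROUPS = [f"G{g}" for g in range(1, 11)]
SPEAKERS = ["C1", "C2", "F1", "F2", "F3", "F4", "M1", "M2", "M3", "M4"]


def build_speech_lists():
    """Build 100 lists; each uses every sentence group and every speaker once."""
    lists = []
    for start_group in range(10):
        for start_speaker in range(10):
            # Step diagonally: next group paired with next speaker (wrapping around).
            lists.append([
                GROUPS[(start_group + k) % 10] + SPEAKERS[(start_speaker + k) % 10]
                for k in range(10)
            ])
    return lists


def announcement_map(lists, first_extension=2001):
    """Assign consecutive announcement numbers to the speech lists."""
    return {first_extension + i: items for i, items in enumerate(lists)}


if __name__ == "__main__":
    mapping = announcement_map(build_speech_lists())
    print(mapping[2001])
```

A subject who draws a random announcement number then hears one complete list, which matches the systematic-random idea of a fixed set of lists with random selection among them.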
3.2.2 Preparation of Subjects
According to Section 3.1, the minimum total number of subjects is 1280
(see details in Table 3-6). However, the total number of subjects may be higher
because each test should have additional subjects in case of outliers and
important missing values while gathering the test results.
The campaign to interest students in joining the tests used a variety of
techniques, for example giving a pen and/or a coupon to buy goods or services from
shops in KMUTNB. Billboards were also installed at several points on campus.
Moreover, letters requesting the collaboration of students in the tests were issued
and sent to many lecturers, particularly the lecturers in the department of applied
statistics.
FIGURE 3-8 The overview of VoIP system for the test, which provided an IP phone
for each participant in the soundproof room
FIGURE 3-9 The captured screen shot from investigation of packet delay and packet
loss before testing
The subjects were KMUTNB students. A minimum of 24 subjects
per scenario was proposed, although this is lower than the recommendation of the
original handbook on telephonometry, which advises the use of 30 subjects [160]. The
handbook also suggests having equal numbers of male and female subjects in the test.
For ACR tests, the highest and the lowest MOS-LQS [270] from each subgroup of
subjects were considered outliers; this is why the proposed number of subjects
for each scenario became at least 28-36.
In each ACR test, after reading the instructions and listening to them again briefly,
the subjects (participants), one at a time per round, sat down and randomly obtained
an extension number of the announcement that linked to a wave file from the speech
list. Next, the subject dialed that number to listen to a speech list once via an IP
phone, and then evaluated it
using a paper-based form, as shown in Appendix B. Each speech list consists of 10
different speech groups with lengths of about 8 seconds, one from each of the 10 speakers
(1 girl, 1 boy, 4 women and 4 men). That means every participant listened to all 10
sentence groups and all 10 speakers; therefore 10 scores should be
given using the 5-point scale (5=excellent, 4=good,..., 1=bad). While
listening, each subject would hear ‘beep’ tones (the 1st and 6th speech groups start
with a double beep tone) as notification to get ready before each speech group, as
in Figure 3-11 [18].
After the first round of data gathering, the data were screened
for outliers, abnormalities in each subgroup, and abnormal data from
participants whose scores were too broad, for example giving 2, 3, 4 and 5 when
listening to a speech list that was played with the same codec. The abnormal data,
called outliers, were discarded. Then, the test was run again until
the satisfactory number of subjects in each subgroup was reached.
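The screening described above can be sketched as two simple checks. This is an illustrative assumption rather than the exact rule applied in this thesis: the function names are hypothetical, and the threshold `max_range=2` is chosen only so that the example in the text (scores 2, 3, 4 and 5 for the same codec) is flagged as too broad.

```python
def trim_extremes(subgroup_scores):
    """Drop the single highest and single lowest score of a subgroup."""
    if len(subgroup_scores) < 3:
        return list(subgroup_scores)
    return sorted(subgroup_scores)[1:-1]


def too_broad(votes, max_range=2):
    """Flag a participant whose votes for one condition span more than max_range."""
    return max(votes) - min(votes) > max_range


if __name__ == "__main__":
    # Hypothetical per-subject mean scores within one subgroup.
    print(trim_extremes([3.1, 4.5, 3.9, 4.0, 2.2]))
    # Votes 2, 3, 4 and 5 for the same codec span 3 points and are flagged.
    print(too_broad([2, 3, 4, 5]))
```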
FIGURE 3-11 A speech list that starts with the speech by Child1 (a girl)
Before finishing the interview, he or she was asked to score the speech
quality provided using G.711 or G.722 on the same scale as in the
ACR tests. The data from all subjects were recorded and gathered on a paper-based
form by the interviewer. Only one value was obtained from each
interviewee.
3.3.3 PESQ Tests with Thai Speech
These tests were conducted in order to investigate the language
dependency of PESQ when operating with Thai speech. For this task, the PESQ
measurement method was used to estimate the MOS via the VQT, the
voice quality tester available at the TOT Innovation Institute. The selected codec for
the test was G.711A-law. Four speech lists of TSST were tested. The Thai
speech sets were selected with tone balance in mind, as in the
Appendix, whereas TSST-List10 was selected because its speech is neither too short
nor too long. As recommended, the length of each speech sample should be around 8-30
seconds [271]. There are 3 groups of speech sentences for each of TSST-List2 to
TSST-List4, each 8-12 seconds long, whereas there are only two groups for
TSST-List10, each 30 seconds long.
In addition to the Thai speech samples, American English speech samples
obtained from the ITU-T website [272] were tested for comparison with the Thai speech
samples. The file format of each speech sample must be:
Also, to obtain 95% confidence in the PESQ results from the VQT, called
MOS-LQO [270], each speech list was repeated at least 30 times, as
recommended in [189].
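The effect of repeating a measurement on the 95% confidence interval can be sketched as follows. The sketch assumes the normal approximation (z = 1.96), which is commonly used when there are at least 30 repetitions; the function name is hypothetical, and the example values (SD = 0.27, N = 48) are those given for the American ITU-T speech set in the caption of Figure 4-4.

```python
from math import sqrt


def ci95_half_width(sd, n, z=1.96):
    """Half-width of the 95% confidence interval of a mean (normal approximation)."""
    return z * sd / sqrt(n)


if __name__ == "__main__":
    # SD and N as reported for the American ITU-T set in Figure 4-4.
    half_width = ci95_half_width(0.27, 48)
    print(round(half_width, 3))
    # Quadrupling the number of repetitions halves the interval.
    print(round(ci95_half_width(0.27, 4 * 48), 3))
```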
For the test, the VQT server, which also includes a SIP phone function,
simulates calls to the SIP phone. The speech samples are then played during the
test to obtain the MOS-LQO scores, as shown in Figure 3-13 [274].
3.4 Phase IV: Intensive Subjective Tests using Conversation Opinion Tests
3.4.1 Conversation Opinion Tests with G.729 and G.722 Referring to Effects of
Loss only and Delay only
FIGURE 3-15 The overview of the conversation opinion tests in this research
The results from all subjective and objective tests are presented in this chapter.
The objective method enhancement, which is essential to this research, is presented as
follows:
FIGURE 4-2 MOS-LQS of G.711 vs G.729 from different types of voices (children,
female and male speakers) with N = 61, 93, 93, 58, 87 and 86
respectively, whereas SD of G.711 and G.729 to those voices are 0.72,
0.66, 0.70, 0.63, 0.72 and 0.72 respectively
FIGURE 4-3 Comparison of percent of the votes: G.711 vs G.722 at 64 kbps, by all
participants
FIGURE 4-4 The MOS-LQO results of 4 lists of Thai speech and American-English
speech, where N = 48, 90, 90, 90 and 60, and SD = 0.27, 0.32, 0.32,
0.28 and 0.21 for American ITU-T, TSST-List2, TSST-List3, TSST-
List4 and TSST-List10 respectively
(although G.722 is better than G.711, and G.711 is better than G.729) except
at a packet delay of 1.5 s, where G.722 was found to be the worst.
4.2.2 Comparison of G.722, G.711 and G.729 Referring to Packet Loss
The results of these tests are shown in Table 4-3 and are also presented in
Figure 4-6 [275]. The results show that, even under packet loss, the
G.711 narrowband codec can provide almost the same voice quality as the G.722
wideband codec, while G.729 appears worse than the other codecs, most clearly
at the high packet loss rates of 10% and 20%.
TABLE 4-2 Comparison of MOS-CQS from G.722, G.711 and G.729 referring to
delay
TABLE 4-3 Comparison of MOS-CQS from G.722, G.711 and G.729 referring to
loss
TABLE 4-4 MOS-CQS versus MOS-CQE from G.711 referring to loss and delay
effects after validating
0.55 0.01
4.04 4.25
1 - - - - - -
0.41 0.01
3.92 4.21
2 - - - - - -
0.63 0.04
3.65 3.88
6 - - - - - -
FIGURE 4-7 Representation of MOS-CQS versus MOS-CQE, using data from Table
4-4
TABLE 4-5 Numbers of subjects, total of 400 subjects referring to the tests with
G.711
Packet Loss Condition (%)   Numbers of subjects
0                           32   28   30   32
1                           24   -    -    -
2                           24   -    -    -
3                           28   24   26   26
5                           26   24   26   28
6                           26   -    -    -
10                          30   26   26   24
15                          36   -    -    -
20                          24   26   24   24
TABLE 4-6 Statistics from the survey of the mean expectation score of voice quality,
based on a 5-point scale, from 828 Thai users
Category            Item      Statistic   Summary
Participants        Male      43.2%
                    Female    56.8%
Age                 <18       5.6%        Average = 19.85
                    18        17.8%       SD = 1.96
                    19        22.0%
                    20        21.1%
                    21        18.4%
                    22        8.5%
                    >22       6.6%
Expectation Score   Vote 1    5.8%        Average = 3.41
                    Vote 2    4.7%        SD = 0.92
                    Vote 3    40.9%
                    Vote 4    40.2%
                    Vote 5    8.3%
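The average and SD of the expectation score in Table 4-6 can be cross-checked from the vote shares alone. The following is a minimal sketch, assuming the published percentages are rounded shares of the 828 votes (they sum to 99.9%, so the code normalises by their total); the function and variable names are illustrative and do not come from the thesis.

```python
import math

# Vote shares (%) for expectation scores 1..5, copied from Table 4-6.
vote_share = {1: 5.8, 2: 4.7, 3: 40.9, 4: 40.2, 5: 8.3}

def weighted_mean_sd(shares):
    """Return (mean, SD) of a score distribution given percentage shares."""
    total = sum(shares.values())  # 99.9 here because the shares are rounded
    mean = sum(score * pct for score, pct in shares.items()) / total
    variance = sum(pct * (score - mean) ** 2 for score, pct in shares.items()) / total
    return mean, math.sqrt(variance)

mean_score, sd_score = weighted_mean_sd(vote_share)
print(f"mean = {mean_score:.2f}, SD = {sd_score:.2f}")
```

To two decimals this agrees with the reported average of 3.41 and SD of 0.92, which suggests the table values are internally consistent.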
are 0.114, 0.068 and 0.840 respectively, all higher than 0.05, which means there is no
significant difference between the different types of speakers.
4.4.2 Analysis of MOS from Interview Tests
The results from the G.711 narrowband codec and the G.722 wideband codec are
almost the same. Therefore, the t-test and ANOVA at the 95% confidence level
were used for analysis with hypotheses H8-H12, as listed in Table 4-7. The output from
the t-test and ANOVA is shown in Table 4-8. The p-value of H8
is 0.743, well above 0.05. Therefore, there is no evidence of a
significant difference between the speech quality perception scores provided by G.711
and G.722. In other words, the G.711 narrowband codec provides
good voice quality at the same level as the G.722 wideband codec. For H9, the
verification of variation between the two IP phones resulted in a p-value of 0.437, which means
there is no significant difference. For H10, H11 and H12, the verification of the
gender of interviewee and interviewer resulted in p-values of 0.427, 0.099
and 0.212 respectively, which also indicates no significant difference.
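As a rough plausibility check of the H8 result, a two-sample comparison can be reproduced from summary statistics. The sketch below assumes that H8 compares the interview-test scores listed in Table 4-9 (G.711: 4.14 ± 0.60, N = 100; G.722: 4.17 ± 0.62, N = 101), uses Welch's form of the t statistic, and approximates the p-value with the normal distribution, which is reasonable for samples of about 100. The thesis analysis was run on the raw scores, so the result is only expected to land near the reported 0.743, not to match it exactly.

```python
import math

def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and a two-sided p-value (normal approximation)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    # With roughly 100 subjects per group the t distribution is close to
    # normal, so the two-sided p-value is approximated by erfc(|t| / sqrt(2)).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Summary values assumed from Table 4-9 (interview tests, no loss and delay).
t_stat, p_value = welch_t_from_summary(4.17, 0.62, 101, 4.14, 0.60, 100)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

The approximate p-value is roughly 0.73, far above 0.05, which is in line with the conclusion drawn for H8.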
4.4.3 Analysis of MOS from PESQ Test with Thai Speech
In Figure 4-4, it can be seen that the PESQ tests with the English speech
set downloaded from the ITU-T website show the highest MOS, 3.93, whereas the
tests with the Thai speech sets show varied results. Therefore, to examine
the language dependency issue, the analysis focused on the Thai speech sets only. The hypothesis,
H13, was tested with the raw data obtained from PESQ using ANOVA. The
result is shown in Table 4-8. The ANOVA output shows a p-value
of < 0.001, which means that the different Thai speech sets produced significantly different
results, owing to the content and language dependency of PESQ.
4.4.4 Analysis of MOS from Conversation Opinion Tests Referring to Delay
Effects
From the previous sections, the three codecs, G.711, G.729
and G.722, do not differ significantly in voice quality. Regarding delay
effects, the MOS-CQS results from G.711, G.729 and G.722
also do not differ significantly if the delay does not exceed 0.8 s, whereas the MOS-CQS from
G.722 is the worst at a packet delay of 1.5 s.
4.4.5 Analysis of MOS from Conversation Opinion Tests Referring to Loss Effects
The results are similar to those in 4.4.4, but the MOS-CQS from G.729 tends to be
worse than the MOS-CQS from G.711 A-law and G.722. In particular,
G.729 clearly provides worse voice quality than the other codecs at packet loss
rates of 10% and 20%.
4.4.6 Comparison of MOS from three subjective methods with Thai subjects and
language
This is an additional measure to compare subjective MOS from the same codec,
G.711, under the direct condition (no loss and delay) but with different methods, consisting
of ACR listening opinion tests (ACR), interview tests and conversation opinion tests.
The MOS values from these three methods, ACR, interview and conversation, are 4.23 ±
0.70, 4.14 ± 0.60 and 4.16 ± 0.51 respectively. This test aims to verify whether
the choice of method matters when testing with Thai users and the Thai language;
therefore H14, as in Table 4-7, was considered. The result in Table 4-8
shows a p-value of 0.505, higher than 0.05. Therefore, the null hypothesis cannot be
rejected, and these three methods give results that do not differ significantly.
4.4.7 Comparison of the Conversation Opinion Tests with G.711 and the
E-model tests
According to MOS-CQS and MOS-CQE from G.711 in Table 4-4, the
data were gathered from 644 subjects. Nevertheless, only the validated data from
400 subjects, covering packet loss of 0-10% and packet delay of 0-0.8 s, are applied to
the E-model enhancement in Chapter 5. All 400 subjects share the same background:
they were undergraduate students at KMUTNB, which offers many programs
based on science and technology. They consist of 219 female and 181 male subjects,
with an average age of 19.79 (SD = 2.05).
Comparing MOS-CQS and MOS-CQE in Table 4-4, as re-presented in
Figure 4-7, most pairs have different values. In particular,
each line of MOS-CQE has a steeper slope than the corresponding
line of MOS-CQS. This is evidence that the standard E-model
requires re-calibration or modification with subjective MOS, such as MOS-CQS from
Thai users, in order to gain higher accuracy, reliability and confidence for use in Thai
environments.
4.4.8 Comparison of MOS from Thai users versus users from other countries
Although it is difficult to prove that language and culture affect MOS, which is
the metric used to measure voice quality perception in different countries, the outcome
of this research provides evidence on this issue, as in Table 4-9. From the table,
the MOS of G.729 from conversation opinion tests with Thai subjects and language,
4.13, is higher than the MOS of the same codec from ACR listening opinion tests
with American subjects and American-English language and with the Japanese language, 3.69
and ~3.4 respectively, but close to that for the English language, ~4.15. For G.711, all MOS
values from the three subjective tests, 4.14, 4.16 and 4.23, are lower than the MOS from
ACR listening opinion tests with Korean subjects and language but higher than the
MOS from ACR listening opinion tests with American subjects and
American-English. For the G.722 wideband codec, the MOS values from interview and conversation
opinion tests, 4.17 and 4.25, are better than the MOS from Nokia Lab using DCR
listening opinion tests with Finnish subjects and language, 4.02, but they appear
similar to the MOS from DCR listening opinion tests with American
subjects and American-English, 4.26, and from ACR listening opinion tests with
Chinese subjects and language, 4.23. However, the MOS from ACR listening
opinion tests with French subjects and language is the highest, 4.41.
In summary, the MOS of G.729 varies between 3.4 and 4.15, while the MOS of G.711
is 4.05-4.41 and the MOS of G.722 (64 kbps) is 4.02-4.41. This summary shows that
MOS varies between laboratories that conducted tests with
different languages and cultures.
TABLE 4-9 Comparison of MOS from Thai users and users from different languages
and cultures, adopted from Table 2-34 (in Chapter 2) and Section 4.1-
4.2
Codec                  Method        Condition        Language          No. of listeners  MOS          Remarks
G.729 (8 kbps)         ACR           No loss & delay  English           Not specified     ~4.15        Approximated from the figures
                       ACR           No loss & delay  Japanese          Not specified     ~3.4         Approximated from the figures
                       ACR           Clean Speech     American-English  32                3.69 ± 0.71  Psytechnics (Report)
                       ACR           No loss & delay  Thai              29                4.18 ± 0.70  KMUTNB
                       Conversation  No loss & delay  Thai              24                4.13 ± 0.54  KMUTNB
G.711 A-law (64 kbps)  ACR           Clean Speech     American-English  32                4.05         8 kHz sampling rate
                       ACR           Clean Speech     Korean            32                4.41         8 kHz sampling rate
                       ACR           No loss & delay  Thai              31                4.23 ± 0.32  KMUTNB
                       Interview     No loss & delay  Thai              100               4.14 ± 0.60  KMUTNB
                       Conversation  No loss & delay  Thai              32                4.16 ± 0.51  KMUTNB
G.722 (64 kbps)        ACR           Clean Speech     French            32                4.41         16 kHz sampling rate
                       ACR           Clean Speech     Chinese           32                4.23         16 kHz sampling rate
                       Interview     No loss & delay  Thai              101               4.17 ± 0.62  KMUTNB
                       Conversation  No loss & delay  Thai              24                4.25 ± 0.68  KMUTNB
CHAPTER 5
E2-MODEL AND THAIVQE
This chapter presents two new models that consist of an objective Enhanced E-
model using Thai Bias Factor (E2-model), and a Thai Subjective-VoIP Quality
Evaluation Mathematical Model (ThaiVQE). Details are as follows:
5.1 E2-model
The proposed method enhances the standard E-model using a “Thai Bias
Factor” obtained from MOS-CQS for Thai users of the G.711 codec.
This Thai bias factor ( B ) has been developed to cover language, cultural and
nationality factors. The new model is called the Enhanced E-model or E2-model. The
method used to obtain this factor and apply it to enhance the E-model consisted of
the following steps:
Step 1: Subjective data gathering of MOS-CQS values
Table 4-4 shows raw subjective data obtained for MOS-CQS from Thai users
and the results of tests carried out to check and validate data for the 15 scenarios
included in the gray area of Table 3-3 (in Chapter 3), i.e., packet losses up to 10% and
packet delays up to 0.8s.
Step 2: Objective data gathering of MOS-CQE values
Similar to step 1, Table 4-4 (in Chapter 4) shows raw objective data obtained for
MOS-CQE and the results of tests carried out to check and validate data for the same
15 scenarios of packet loss and packet delay used in step 1.
Step 3: Finding equation for Thai bias factor as a function of packet loss and
packet delay.
The Thai bias factor is defined as $B_{TH}$ for R values given on a 100-point scale
and $B'_{TH}$ for MOS values given on a 5-point scale or traditional scale.
The equation for the bias factor $B'_{TH}$ has been computed as follows (see Figure
5-1). As the first step, bias factors for each scenario of packet loss and delay were
computed by subtracting the MOS-CQE values obtained in step 2 from the MOS-CQS
values obtained in step 1. Then, an equation for $B'_{TH}$ as a function of packet loss and
delay was found by using the surface fitting tool in Matlab to obtain a best fit to the
data. The computed equation is given in Eq. (5-3).
Step 4: E-model enhancement using $B'_{TH}$
The enhanced E-model given in Eq. (5-4) was then obtained by replacing the
general bias factor $B'$ in the 5-point scale E-model Eq. (5-2) with the Thai bias factor
$B'_{TH}$ calculated from Eq. (5-3). This enhanced E-model can then be applied for use in
Thailand as shown in Figure 5-2.
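The per-scenario subtraction in step 3 can be illustrated with the few Table 4-4 values that are legible for 1%, 2% and 6% packet loss. This is a minimal sketch: it assumes those rows are loss-only scenarios, it does not reproduce the Matlab surface fit, and the names `scenarios` and `bias_factors` are illustrative only.

```python
# (packet loss %, MOS-CQS, MOS-CQE) taken from the Table 4-4 rows for
# 1%, 2% and 6% packet loss; assumed to be loss-only scenarios.
scenarios = [
    (1, 4.04, 4.25),
    (2, 3.92, 4.21),
    (6, 3.65, 3.88),
]

def bias_factors(rows):
    """Per-scenario bias on the 5-point scale: MOS-CQS minus MOS-CQE."""
    return {loss: round(cqs - cqe, 2) for loss, cqs, cqe in rows}

biases = bias_factors(scenarios)
for loss, bias in biases.items():
    print(f"loss {loss}%: bias = {bias:+.2f}")
```

All three differences are negative, meaning the standard E-model rates these scenarios higher than the Thai subjects did. A surface fit over the full set of 15 scenarios then turns such point values into a function of loss and delay.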
FIGURE 5-2 Overview of the proposed E2-model with Thai bias factor
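\begin{equation}
R = R_o - I_s - I_d - I_{e\text{-}eff} + A + B \tag{5-1}
\end{equation}
\begin{equation}
\text{MOS-CQE} =
\begin{cases}
4.5, & R > 100 \\
1 + 0.035R + R(R-60)(100-R)\cdot 7\times 10^{-6} + B', & 0 < R < 100 \\
1, & R < 0
\end{cases} \tag{5-2}
\end{equation}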
where
$R_o$ is the basic signal-to-noise ratio, including noise sources such as
room noise and circuit noise.
$I_s$ is the signal impairment factor, a combination of all
impairments which occur more or less simultaneously with the voice
signal.
$I_d$ is the delay impairment factor that is caused by packet delay.
$I_{e\text{-}eff}$ is the effective equipment factor that is caused by codecs.
$A$ is the advantage factor that allows for compensation of
impairment factors when there are other advantages accessible to
the user.
$B$ is the bias factor from intensive subjective tests, based on a 100-point
scale.
MOS-CQE is the objective MOS from the standard E-model.
$R$ is the R-value from the E-model tool.
$B'$ is the bias factor from intensive subjective tests, based on a 5-point
scale.
The Thai bias factor equation and the proposed enhanced E-model equation are
as follows:
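\begin{equation}
B'_{TH} = -0.2806 + 1.297L - 0.4424D + 4.505LD + 0.8994D^{2} \tag{5-3}
\end{equation}
\begin{equation}
\text{MOS-CQE}^{*} =
\begin{cases}
4.5, & R > 100 \\
1 + 0.035R + R(R-60)(100-R)\cdot 7\times 10^{-6} + B'_{TH}, & 0 < R < 100 \\
1, & R < 0
\end{cases} \tag{5-4}
\end{equation}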
where
$B'_{TH}$ is the Thai bias factor calculated as described in steps 1 to 3
above
L is packet loss percentage
D is packet delay (s)
MOS-CQE* is the modified objective MOS from the new E2-model
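Equations (5-2) to (5-4) can be sketched in a few lines of code. The example below makes two assumptions that are not stated in the text. First, the packet loss L is entered as a fraction (0.03 for 3%), because with whole-number percentages the loss term of Eq. (5-3) would exceed the 5-point scale. Second, the boundary values R = 0 and R = 100 are treated like the outer branches, which gives the same result as the middle branch without a bias. The function names are illustrative.

```python
def mos_from_r(r, bias=0.0):
    """5-point MOS from an R value, following the piecewise form of Eq. (5-2)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6 + bias

def thai_bias(loss, delay_s):
    """Thai bias factor of Eq. (5-3); `loss` is assumed to be a fraction."""
    return (-0.2806 + 1.297 * loss - 0.4424 * delay_s
            + 4.505 * loss * delay_s + 0.8994 * delay_s ** 2)

def mos_e2(r, loss, delay_s):
    """Modified objective MOS (MOS-CQE*) of Eq. (5-4)."""
    return mos_from_r(r, bias=thai_bias(loss, delay_s))

# R = 93.2 is the commonly cited default rating of the ITU-T G.107 E-model.
standard_mos = mos_from_r(93.2)
enhanced_mos = mos_e2(93.2, 0.0, 0.0)
print(round(standard_mos, 2), round(enhanced_mos, 2))
```

With no loss and no delay the bias reduces to its constant term, so the enhanced score sits about 0.28 below the standard mapping, close to the subjective MOS-CQS values reported for the direct condition.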
packet delays as high as 800 ms (shown as the light grey or light green area in Table
5-1).
where
MOS-CQS* is Thai based - subjective MOS
L is packet loss percentage
D is packet delay (s).
FIGURE 5-4 The surface chart of the MOS-CQS provided by G.711 for packet loss
of 0-10% and packet delay of 0-0.8 s. The chart was computed using
the surface fitting tool in Matlab and was used to create the ThaiVQE
subjective model
TABLE 5-3 Comparison of errors between actual MOS and predicted MOS for
standard E-model, E2-model, and ThaiVQE model
Step 3: Use the trendline tool in Microsoft Excel to estimate an equation relating
the MOS-CQS differences to packet loss. The estimated equation, which has an R-
squared value of 1, is given in Eq. (5-6).
TABLE 5-4 The differences between MOS-CQS provided by G.711 and G.729 as a
function of packet loss, adapted from Table 4-3 (in Chapter 4)
Packet Loss    MOS-CQS (G.711)    MOS-CQS (G.729)    Diff (MOS-CQS)
where
Diff(MOS-CQS) = MOS-CQS (G.711) – MOS-CQS (G.729)
L is packet loss rate
where
MOS-CQS* G.729 is Estimated MOS-CQS* for G.729
MOS-CQS* G.711 is MOS-CQS* for G.711 (Table 5-1)
Diff(MOS-CQS) is the difference between MOS-CQS (Eq. (5-6))
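Because Diff(MOS-CQS) is defined as the G.711 score minus the G.729 score, the G.729 estimate presumably follows by rearranging that definition. The line below is a sketch inferred from the definitions above, not a reproduction of the thesis equation, and it assumes the difference depends on packet loss only.

```latex
\[
\text{MOS-CQS}^{*}_{G.729}(L, D) \approx \text{MOS-CQS}^{*}_{G.711}(L, D) - \text{Diff(MOS-CQS)}(L)
\]
```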
The classification of satisfaction level given in Table 5-5 assumes that the
relationship between MOS-CQS and satisfaction level for Thai users is the same as
that given by ITU-T (see Table 2-x) for users in general. With this assumption,
Thai users are satisfied when using VoIP applications/services
provided by G.729 without packet loss and with packet delay of less than 200 ms
(shown as the black area in Table 5-5). For higher values of packet loss and delay,
Table 5-5 shows that some users are satisfied and some are dissatisfied when packet loss
and packet delay occur together as 4% loss and less than 300 ms, or 3% loss and less
than 800 ms (the dark grey or dark green area in Table 5-5). For even higher values
of packet loss and delay, Table 5-5 shows that some Thai users are still satisfied even
if packet loss and packet delay occur together as 6% loss and less than 600 ms, or 5%
loss and less than 800 ms (the light grey or light green area in Table 5-5). At very
high values of packet loss and delay, many or all Thai users will not be
satisfied (the very light gray or yellow and gray or orange areas in Table 5-5). In
particular, if the packet loss is greater than 9% then all users will be dissatisfied.
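The mapping from a MOS value to a satisfaction level can be sketched as a simple threshold lookup. The thresholds below are assumed from the user-satisfaction bands of ITU-T G.107 Annex B (R = 90, 80, 70, 60 and 50 converted to MOS), since the ITU-T table referenced above is not reproduced here; the constant and function names are illustrative.

```python
# Lower MOS bound of each category; assumed from ITU-T G.107 Annex B.
SATISFACTION_BANDS = [
    (4.34, "Very satisfied"),
    (4.03, "Satisfied"),
    (3.60, "Some users dissatisfied"),
    (3.10, "Many users dissatisfied"),
    (2.58, "Nearly all users dissatisfied"),
]

def satisfaction_level(mos):
    """Return the satisfaction category for a MOS value on the 5-point scale."""
    for lower_bound, label in SATISFACTION_BANDS:
        if mos >= lower_bound:
            return label
    return "Not recommended"

for mos in (4.16, 3.65, 3.0):
    print(mos, "->", satisfaction_level(mos))
```

Applying such a lookup to every cell of a loss-by-delay grid of predicted MOS-CQS values would give a shaded table of the kind described for Table 5-5.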
It should be noted that the ThaiVQE table for G.729 was obtained by modifying
the ThaiVQE for G.711 using only packet loss data for G.729. It was beyond the
scope of this thesis to carry out the time-consuming detailed measurements of MOS-
CQE and some parts of MOS-CQS for combined packet loss and delay for G.729 that
were carried out for G.711. However, even with this limitation, the G.729 ThaiVQE
table (Table 5-5) should be useful as a basis for VoIP network planning and measurement for
Thai users in Thai environments.
This is the last chapter which consists of three sections, discussion, conclusion
and future work, as follows:
6.1 Discussion
6.1.1 ACR - listening opinion tests and interview tests in the pilot phase
The pilot tests that consisted of ACR listening opinion tests and interview tests
were important for this research. They helped to ensure the reliability of the VoIP
test-bed system. In particular, the pilot tests helped to reveal unseen and unexpected
issues. For example, the most important issue was finding students to join the
subjective tests in the laboratory that had been set up on the 7th floor of the Central
Library, KMUTNB. Students who agreed to join the VoIP quality assessment
had to be genuinely willing to attend. In some cases, a lecturer allowed their students to join the
test before they finished a class. If there were 30 students for the test and each student
took at least 3 minutes then the last student might have been kept waiting for at least
one and a half hours. The pilot tests revealed that the most effective means of
ensuring that a sufficient number of subjects were available for the tests was
enforcement by their lecturers. The ID of a student who joined the tests was sent to
the lecturer for checking.
The ACR listening opinion tests in the pilot phase showed that the MOS-LQS
results for the G.711, G.729 and G.723.1 codecs from the experiment with Thai users
were consistent with the theory, i.e., that G.711 provides better voice quality than
G.729 and that G.729 provides better voice quality than G.723.1. The
ACR listening tests on Thai users detected a statistically significant difference in
perceived voice quality between G.729 and G.723.1, but the observed difference
between G.711 and G.729 was not statistically significant.
In the interview tests with the G.711 and G.722, the observed differences
measured using MOS-CQS were not statistically significant. In these interview tests
no major differences between male and female subjects or between different IP
phones were detected.
6.1.2 PESQ test with Thai speech sets in the pilot phase
The PESQ test results for the Thai language described in Chapter 4 showed that
PESQ scores depended on speech content for the Thai speech sets.
Therefore, this thesis concludes that PESQ is not appropriate for voice
quality measurement in the Thai language. This result contrasts with published PESQ
test results for American English, which showed that PESQ worked well with speech
sets from that language.
6.1.3 Conversation Opinion Tests
These MOS-CQS tests were crucial for this research. Without using these tests,
the development of the enhanced E-model for Thai users could not have been
achieved. For these tests, more than one thousand subjects were required. Therefore,
probably came from these low income households and their quality of life would not
be high. They would therefore have relatively low expectations of quality of services.
This could be a possible reason why they were willing to accept poorer VoIP quality
with appreciable packet loss and packet delay.
4) Constraints of the VoIP test-bed system: the packet losses generated for
the tests, both for the conversation opinion tests and the E-model tests, were random
uniform losses. However, in the real world, packet losses are often not uniform as
they can be bursty. It is known that bursty packet loss reduces voice quality more
than does uniform random packet loss. Therefore, in the real world, the occurrence of
bursty packet losses is likely to result in MOS-CQS values which are lower than the
values given in Table 4-4 (in Chapter 4).
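The distinction between uniform random loss and bursty loss can be illustrated with a small simulation. This is only a sketch and not the loss generator of the test bed: it assumes independent per-packet drops for the uniform case and a two-state Markov (Gilbert) model for the bursty case, with transition probabilities chosen here so that both produce about 5% loss.

```python
import random

def bernoulli_losses(n, loss_rate, rng):
    """Independent losses: each packet is dropped with the same probability."""
    return [rng.random() < loss_rate for _ in range(n)]

def gilbert_losses(n, p_good_to_bad, p_bad_to_good, rng):
    """Two-state Markov (Gilbert) model: packets sent in the bad state are lost."""
    lost, bad = [], False
    for _ in range(n):
        if bad:
            bad = rng.random() >= p_bad_to_good
        else:
            bad = rng.random() < p_good_to_bad
        lost.append(bad)
    return lost

def mean_burst_length(lost):
    """Average length of runs of consecutive lost packets."""
    bursts, run = [], 0
    for is_lost in lost:
        if is_lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0

rng = random.Random(1)
uniform = bernoulli_losses(100_000, 0.05, rng)
bursty = gilbert_losses(100_000, 0.0263, 0.5, rng)  # stationary loss about 5%
print(sum(uniform) / len(uniform), mean_burst_length(uniform))
print(sum(bursty) / len(bursty), mean_burst_length(bursty))
```

Both streams lose a similar share of packets, but the bursty stream loses them in longer consecutive runs, which is the property that degrades perceived voice quality more strongly.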
6.1.6 A comparison of MOS from Thai users with reference data
Table 2-17 (in Chapter 2), adapted from [16], contains the most complete data
currently available about codecs and their characteristics. Table 6-2 contains a
comparison of our MOS values for Thai users with this reference data for a range
of qualities of VoIP for major codecs. From the table, it can be seen that:
1) The range of MOS for G.722 at 64 kbps of 4.17-4.25 from the Thai
subjective tests is higher than the MOS from the reference of approximately 4.1.
2) The range of MOS for G.711 (A-law) of 4.14 – 4.23 from Thai subjective
tests is higher than the MOS from the reference of 4.1.
3) The range of MOS for G.729 of 4.13 – 4.18 from Thai subjective tests is
appreciably higher than the MOS from the reference of 3.92.
4) The MOS of G.723.1 at 5.3 kbps of 3.9 from Thai subjective tests is
appreciably higher than the MOS from the reference of 3.6.
The MOS values from this thesis could be used as benchmarks or reference for
VoIP quality measurement in Thailand.
All MOS values listed in Table 6-2 for Thai users of the four codecs are higher
than the MOS values given in the reference [16]. The root causes of differences might
be due to effects such as language and culture listed in 6.1.5. However, the reasons
for the differences require further investigation.
6.2 Conclusion
This research has been conducted on the assumption that VoIP users who speak
different languages, such as, Thai (a tonal language) and English (a non-tonal
language), and who have different cultures and live in different countries may have
different perceptions about voice quality.
The research has been conducted with more than one thousand subjects and has
concentrated on three major factors, consisting of codec, packet loss and packet delay.
A main aim has been to develop appropriate recommendations about voice quality for
Thai users who speak the standard Thai spoken language and who have their own
culture. It can be claimed that the research in this thesis is the most intensive
subjective test so far carried out for VoIP quality measurement, particularly with a
tonal language. The results from the intensive subjective tests have been applied to
develop a modified objective measurement method that is suitable for use in Thai
environments and which gives results with high accuracy, reliability and confidence.
This research has verified and presents evidence that G.729 can provide a level
of voice quality as good as G.711 and G.722 (see Table 6-2). After surveying the
expectations of Thai users for VoIP quality, a MOS value of 3.41 has been proposed
as the baseline for providing voice quality of VoIP to Thai users in Thailand.
Essentially, the important part of the gathered data, from 400 subjects, has been
validated and used to find a bias factor, called Thai bias factor, to enhance the E-
model, which is the popular objective measurement for VoIP quality. This Enhanced
E-model for VoIP quality measurement in Thai environment, which is called the
E2-model, has been proposed to provide objective measurements of higher accuracy,
reliability and confidence for use in Thai environments. Moreover, a new
mathematical model called Thai subjective – VoIP Quality Evaluation (ThaiVQE) has
been developed by analyzing validated data from the subjective tests with 400
subjects using Matlab. These two proposed methods can be applied by VoIP operators
to provide a higher standard of VoIP quality to Thai users and to contribute to a better quality of
life for Thai people.
The results from this research are important evidence for the
language/cultural/nationality dependence of subjective VoIP quality measurement. It
can be particularly useful for ITU-T Study Group 12 which is currently studying
language/cultural/nationality dependence of the quality of experience of multimedia,
including MOS.
Some recommendations from this research regarding VoIP quality
measurement or evaluation, based on Thai users in Thai environments, are as follows:
1) The PESQ tool is not recommended for use in Thai environments because it is
content and language dependent and has not been calibrated for the Thai language.
2) G.729 is strongly recommended for use instead of G.711 in cases that
require bandwidth reduction but still require voice quality as good as G.711.
G.729 can reduce bandwidth consumption by approximately two thirds (2/3) without
appreciably reducing voice quality as observed by Thai users. However, if users do
not care much about voice quality, G.723.1 is also an option because it can reduce
bandwidth consumption by about three fourths (3/4) to four fifths (4/5) compared to
G.711.
3) G.722 is not recommended if it requires a license charge to use because it
does not provide appreciably better voice quality than G.711.
4) The MOS values given in Table 6-2 can be used as the benchmarks for
VoIP systems/applications/services in Thailand.
5) Although the E-model was originally designed and developed for network
planning, it is independent from content and speech resources. Therefore it is
recommended that it should be modified and enhanced for use in Thai environments.
6) At present, there are no official government regulations adopted for MOS
in Thailand. It is suggested that VoIP service providers should use MOS of 3.41 as
the baseline to ensure that many Thai users are not dissatisfied and that a MOS of 3.6
should be proposed to NBTC for VoIP regulation.
7) The ThaiVQE tables (Tables 5-1 and 5-5 in Chapter 5) should be used as
guidelines to provide good VoIP quality for Thai users.
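The bandwidth savings quoted in recommendation 2 can be checked with a per-call calculation that includes packet headers. The sketch below rests on assumptions that are not stated in the text: 20 ms packetisation for G.711 and G.729, one 30 ms frame per packet for G.723.1 at 5.3 kbps, 40 bytes of IP/UDP/RTP headers, and 18 bytes of Ethernet header and FCS. All names are illustrative.

```python
HEADER_BYTES = 40 + 18  # IP/UDP/RTP (40) plus Ethernet header and FCS (18); assumed

def call_bandwidth_bps(payload_bytes, packet_interval_ms):
    """One-way bandwidth of a voice stream, including per-packet headers."""
    packets_per_second = 1000 / packet_interval_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second

g711 = call_bandwidth_bps(160, 20)   # 64 kbps codec, 20 ms packets
g729 = call_bandwidth_bps(20, 20)    # 8 kbps codec, 20 ms packets
g7231 = call_bandwidth_bps(20, 30)   # 5.3 kbps codec, one 30 ms frame per packet

def saving(codec_bps, reference_bps=g711):
    """Fraction of bandwidth saved relative to the reference codec."""
    return 1 - codec_bps / reference_bps

print(f"G.729 saves {saving(g729):.0%}, G.723.1 saves {saving(g7231):.0%}")
```

Under these assumptions G.729 saves a little under two thirds and G.723.1 a little over three quarters of the G.711 bandwidth, consistent with the fractions given in the recommendation; other packetisation or link-layer choices would shift the figures.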
48. CFAR. Mini-case Study: Nike’s “Just Do It” Advertising Campaign. [online]
2012. [cited 2012 Dec 16]. Available from : http://www.cfar.com//Document
s/nikecmp.pdf.
49. Camp J. et al. Thailand and Adult Diapers Feasibility Study. [online] 2012. [cited
2012 Dec 16]. Available from : http://www.jacobcamp.com/wp-content/uplo
ads/2011/05/Thailand-and-Adult-Diapers-Feasability-Study-Final.pdf.
50. NSO. The Key Statistics of Thailand, Household Sccio-economic Survey in the
first half of the year 2011. [online] 2012. [cited 2012 Dec 16]. Available
from : http://service.nso.go.th/nso/nsopublish/themes/files/socioImpt54(6).pd
f. (Thai).
51. Rogers, A.L. Wind Turbine Acoustic Noise. [online] 2012. [cited 2012 Dec 16].
Available from : http://www.minutemanwind.com/pdf/Understanding%20Wi
nd%20Turbine%20Acoustic%20Noise.pdf.
52. Stanfield, C.L. and Germann, W.J. Principles of Human Physiology. 3rd ed.
California: Pearson Education, 2008.
53. Tortora, G.J. and Derrickson, B. Principles of Anatomy and Physiology. 12th ed.
Asia: John Wiley & Sons, 2009.
54. Silverthorn, D.U., et al. Human Physiology: An Integrated Approach. 2nd ed.
California: Pearson Education, 2001.
55. Lange, M. Dorsal. [online] 2012. [cited 2012 Dec 16]. Available from :
http://homepages.widged.com/mlange/teaching/CNL/helpers/dorsal.gif.
56. Sherwood, L. Human Physiology: from Cells to Systems. 4th ed. California:
Thomson Learning, 2001.
57. Rohen, J.W., Yokochi, C. and Lutjen-Drecoll, E. Color Atlas of Anatomy:
Aphotographic Study of the Human Body. 5th ed. Pennsylvania: Williams &
Wilkins, 2002.
58. Sittiprapaporn, W., Chindaduangratn, C. and Kotchabhakdi, N. “Brain electric
activity during the preattentive perception of speech sounds in tonal
languages.” Songklanakarin Journal of Science and Technology. 26 (2004):
439-445.
59. ____. Functional Specialization of the Human Auditory Cortex in Processing of
Speech Prosody: A Low Resolution Electromagnetic Tomography (LORETA)
Study. [online] 2012. [cited 2012 Dec 9]. Available from : http://anchan.
lib.ku.ac.th/ kukr/bitstream/003/17786/1/KC4205011.pdf.
60. ____. “Long-term memory traces for familiar spoken words in tonal languages as
revealed by the Mismatch negativity.” Songklanakarin Journal of Science
and Technology. 26 (2004): 779-786.
61. Gandour, J., et al. “Pitch processing in the human brain is influenced by language
experience.” Neuroreport Rapid Science. 9 (1998): 2115-2119.
62. ____. “A crosslinguistic PET study of tone perception.” Journal of Cognitive
Neuroscience. 12 (2000): 207-222.
63. Klein, D., et al. “A cross-linguistic PET study of tone perception in Mandarin
Chinese and English speakers.” NeuroImage. 13 (2001): 646-653.
64. Wang, Y., Jongman, A. and Sereno, J.A. “Dichotic perception of Mandarin tones
by Chinese and American listeners.” Brain and Language. 78 (2001): 332-
348.
123
65. Wang, Y., et al. “The role of linguistic experience in the hemispheric processing
of lexical tone.” Applied Psycholinguistics. 25 (2004): 449-466.
66. Keenaghan, K.M. A Novel Non-Acoustic Voiced Speech Sensor: Experimental
Results and Characterization. Master of Science in Electrical and Computer
Engineering Thesis, Worcester Polytechnic Institute, 2004.
67. Anusuya, M.A. and Kitti, S.K. “Speech Recognition by Machine: A Review.”
International Journal of Computer Science and Information Security. 6
(2009): 181-205.
68. NIDCD. What Is Voice? What Is Speech? What Is Language? [online] 2012.
[cited 2012 Dec 15]. Available from : http://www.nidcd.nih.gov/health/voice
/pages/whatis_vsl.aspx.
69. Jiang, W. and Schulzrinne, H. “Analysis of On-Off Patterns in VoIP and Their
Effect on Voice Traffic Aggregation.” Proc. 9th Int. Conf. Computer
Communications and Networks. (16-18 October 2000): 82-87.
70. FCC. Consumer Guide, Voice Over Internet Protocol (VoIP). [online] 2012.
[cited 2012 Dec 17]. Available from : http://transition.fcc.gov/cgb/consumer
facts/voip.pdf.
71. Desantis, M. Understanding Voice over Internet Protocol (VoIP). [online] 2012.
[cited 2012 Dec 17]. Available from : http://www.us-cert.gov/reading _room/
understanding_voip.pdf.
72. Nokia. Advantages of SIP for VoIP. [online] 2012. [cited 2012 Dec 17]. Available
from : http://www.nokia.com/NOKIA_COM_1/ About_Nokia/Press/White_
Papers/pdf_files/whitepaper_sip_for_voip.pdf.
73. Emmerson, B. Convergence: the Business Case for IP Telephony. [online] 2012.
[cited 2012 Dec 17]. Available from : http://scc.cisco.com/web/AP/uc/assets/
docs/bcipt.pdf.
74. ITU. The Status of Voice Over Internet Protocol (VoIP) Worldwide, 2006.
[online] 2012. [cited 2012 Dec 17]. Available from : http://www.itu.int/osg/
spu/ni/voice/papers/FoV-VoIP-Biggs-Draft.pdf.
75. NASCIO. VoIP and IP Telephony: Planning for Convergence in State
Government. [online] 2012. [cited 2012 Dec 17]. Available from : http://
www.nascio.org/publications/documents/NASCIO-VOIP.pdf.
76. Komunikasi, S. IP Telephony. [online] 2012. [cited 2012 Dec 15]. Available from :
http://www.skmm.gov.my/skmmgovmy/files/attachments/ir_ip_telephony.pdf.
77. Sengar, H., et al. “Fast Detection of Denial-of-Service Attacks on IP Telephony.”
Proc. of 14th IEEE Int. Workshop on Quality of Service (IWQoS 2006). (19-21
June 2006): 199-208.
78. Soares, V.N.G., Neves, P.A.C. and Rodrigues, J.J.P. “Past, Prsent and Future of
IP Telephony.” Proc. Int. Conf. Communication Theory, Reliability, and
Quality of Service 2008 (CTRQ’08). (29 June – 5 July 2008): 19-24.
79. Tanenbaum, A.S. Computer Networks. 4th ed. New Jersey: Prentice Hall, 2003.
80. Macario, J. Intro to Voice over Internet Protocol: What does VoIP Mean for My
Business?. [online] 2012. [cited 2012 Dec 9]. Available from : http://www.
xo.com/SiteCollectionDocuments/Whitepapers/XO_Intro_to_VoIP.pdf.
81. Hurley, M. VoIP Vulnerabilities. [online] 2012. [cited 2012 Dec 19]. Available
from : http://www.ccip.govt.nz/newsroom/information-notes/ccip-informatio
n-note-current.pdf.
124
82. Keromytis, A.D. Voice over IP: Risks, Threats and Vulnerabilities. [online] 2012.
[cited 2012 Dec 17]. Available from : http://www.cs.columbia.edu/~angelos/
Papers/2009/cip.pdf.
83. UVA-WISE. New Campus VoIP Telephone System installation starting soon.
[online] 2012. [cited 2012 Dec 17]. Available from : http://uvawise.edu/it/file
s/oit/InfoLinkSP2011.pdf.
84. Prime Minister’s Office. Thailand’s NBTC appointments announcement. [online]
2011. [cited 2011 Sep 20]. Available from : http://www.nbtc.go.th/phoca
download/prnbtc/107201141415541007-0001-notification-opm.pdf.
85. NNT. 11 NBTC members endorsed by HM the King. [online] 2012. [cited 2012
Dec 17]. Available from : http://202.47.224.92/en/news.php?id=2554100700
12.
86. The Nation. New NBTC to begin search for secretary. [online] 2012. [cited 2012
Dec 9]. Available from : http://www.nationmultimedia.com/business/New%
20-NBTC-to-begin-search-for-secretary-30167208.html.
87. ____. Court drops 3G bombshell. [online] 2012. [cited 2012 Dec 17]. Available
from : http://www.nationmultimedia.com/home/2010/09/17/busi ness/Court-
drops-3G-bombshell-30138165.html.
88. Seriwiwatta, P., Nittayagasetwat, A. and Panyagometh, K. “3G and Economic
Impact: A Case if Thailand.” NIDA Business Journal. 9 (2011): 5-21.
89. BBC. 4G Mobile Phone Network Comes to Scandinavia. [online] 2012. [cited
2012 Dec 17]. Available from : http://news.bbc.co.uk/2/hi/technology/84120
35.stm.
90. ITU-R Recommendation M.1645. Framework and overall objectives of the future
development of IMT-2000 and system beyond IMT-2000. August, 2003.
91. NBTC. NTC Announcement: QoS VoIP. [online] 2012. [cited 2012 Dec 17].
Available from : http://www.nbtc.go.th/wps/portal/NTC/!ut/p/c4/04_SB8K8x
LLM9MSSzPy8xBz9CP0os3gTf3MX0wB3U09jx0AjA09ntwADZ28LdzdL
E_2CbEdFAPXh_qI!/?WCM_GLOBAL_CONTEXT=/wps/wcm/connect/lib
rary+ntc/internetsite/11reshdev/11032standard/11032standard_detail/stan000
03.
92. Jaruvitayakovit, T. “VoIP Status in Thailand.” Proc. 1st AUN/Seed-Net Electrical
and Electronics Engineering Regional Conference, Int. Sym. Multimedia and
Communication Technology. (22-23 January 2009): 128-130.
93. Vanijja, V. VoIP Software Using Open Source. [online] 2011. [cited 2011 Nov
11]. Available from : http://www.tridi.ntc.or.th/library/upload/c6.pdf.
94. CITEL. Redes de Próxima Generación. [online] 2012. [cited 2012 Dec 18].
Available from : http://www.oas.org/en/citel/infocitel/2007/diciembre/ngn_e.
asp.
95. Daengsi, T., et al. “Recent VoIP Services in Thailand and the Expectation of Thai
Users to Voice Quality.” Proc. 1st ASEAN Plus Three Graduate Research
Congress. (1-2 March 2012): ST-835 – ST-840.
96. NBTC. VoIP Telephone Number (06-xxxxx-xxx) in Thailand. [online] 2012. [cited
2012 Dec 17]. Available from : http://numbering.nbtc.go.th/wps/portal/Num
bering/Numbering/InternetNumber.
97. CAT Telecom. CAT 001 Rates. [online] 2012. [cited 2012 Dec 17]. Available
from : http://www.cattelecom.com/web_data/uploads/file/CAT%20001_2011
.pdf.
98. CAT2call. CAT2call Rates. [online] 2012. [cited 2012 Dec 17]. Available from :
http://www.cat2call.com/download/CAT2call_rate.pdf.
99. TOT netcall. Rate International. [online] 2012. [cited 2012 Dec 17]. Available
from : http://www.totnetcall.com/rateinternational.html.
100. True NetTalk. International Calls Rate. [online] 2012. [cited 2012 Dec 17].
Available from : http://www.truenettalk.com/truenettalk/en/products/all_rate.
html.
101. CAT Telecom. CAT 009 Promotion Rates. [online] 2012. [cited 2012 Dec 17].
Available from : http://www.cattelecom.com/web_data/uploads/file/CAT%
20009%20rates_3%20baht_New%20to%2031%20Dec_%2011(2).pdf.
102. TOT. International saving rate service via 008 code. [online] 2012. [cited 2012
Dec 17]. Available from : http://www.tot.co.th/files/008.pdf.
103. AIN. 00500 Country & Rate. [online] 2012. [cited 2012 Dec 17]. Available from
: http://www.ain.co.th/th/00500_rate.html.
104. Truemove. Table of international call rate via code 00600. [online] 2012. [cited
2012 Dec 17]. Available from : http://www.truemove.com/th/product-pre-
new-sim-intersim-pop1.html.
105. CAT. CAT PhoneNet. [online] 2012. [cited 2012 Dec 17]. Available from :
http://www.contactcenter.cattelecom.com/thai/oversea/catphonenet_info.asp.
106. ITU-T Recommendation G.722. 7 kHz Audio – Coding within 64 kbit/s. 1988.
107. Packetizer. VoIP Bandwidth Calculator. [online] 2012. [cited 2012 Dec 20].
Available from : http://www.bandcalc.com/.
108. ITU-T Recommendation G.711. Pulse Code Modulation (PCM) of Voice
Frequencies. 1988.
109. Hersent, O., Petit, J.-P. and Gurle, D. IP Telephony Deploying Voice-over-IP
Protocols. 1st ed. UK: Wiley, 2005.
110. ITU-T Recommendation G.729. Coding of speech at 8 kbit/s using conjugate-
structure algebraic-code-excited linear prediction (CS-ACELP). January,
2007.
111. ITU-T Recommendation G.723.1. Dual rate speech coder for multimedia
communications transmitting at 5.3 and 6.3 kbit/s. May, 2006.
112. Avaya Labs. Avaya IP Voice Quality Network Requirement. [online] 2012. [cited
2012 Dec 18]. Available from : http://downloads.avaya.com/css/P8/documen
ts/100018203.
113. Chen, K.-T., et al. “Quantifying Skype User Satisfaction.” Proc. Special Interest
Group on Data Communication 2006 (SIGCOMM’06). (11-15 September
2006): 399-410.
114. Bonfiglio, D. et al. “Revealing Skype Traffic: When Randomness Plays with
you.” Proc. Special Interest Group on Data Communication 2007
(SIGCOMM’07). (27-31 August 2007): 37-48.
115. Daengsi, T. “Voice Quality Measurement for VoIP: Simple Method Using a
Survey.” Proc. 32nd Electrical Engineering Conference (EECON-32). (28-
30 October 2009).
116. Sulkin, A. PBX Systems for IP Telephony. 1st ed. USA: McGraw-Hill, 2002.
117. Goralski, W.J. and Kolon, M.C. IP Telephony. 1st ed. New York: McGraw-Hill,
2000.
118. Wallace, K. Voice over IP First-Step. 1st ed. Indianapolis: Cisco Press, 2005.
119. Davidson, J., et al. Voice over IP Fundamentals. 2nd ed. Indianapolis: Cisco
Press, 2006.
120. Cisco. H.323 and SIP Integration. [online] 2012. [cited 2012 Dec 17]. Available
from : http://www.cisco.com/warp/public/cc/techno/tyvdve/sip/prodlit/sh23g
_wp.pdf.
121. Schulzrinne, H. and Rosenberg, J. A Comparison of SIP and H.323 for Internet
Telephony. [online] 2012. [cited 2012 Dec 17]. Available from : http://www.
cs.columbia.edu/~hgs/papers/Schu9807_Comparison.pdf.
122. Arora, R. Voice over IP: Protocol Standards. [online] 2012. [cited 2012 Dec
17]. Available from : http://www.cse.wustl.edu/~jain/cis788-99/ftp/voip_pro
tocols/index.html.
123. Avaya. Enterprising with SIP – A Technology Overview. [online] 2012. [cited
2012 Dec 17]. Available from : http://www.avaya.com/usa/resource/assets/w
hitepapers/lb2343.pdf.
124. Tong, H.A. and Rupp, S. SIP-based VoIP services – Architecture and
Comparison. [online] 2012. [cited 2012 Dec 17]. Available from : http://www.
linecity.de/INFOTECH_ACS_SS05/acs5_top2_paper.pdf.
125. Papageorgiou, P. A Comparison of H.323 vs SIP. [online] 2012. [cited 2012 Dec
17]. Available from : http://www.cs.umd.edu/~pavlos/papers/unpublished/pa
pageorgiou01comparison.pdf.
126. Malhotra, S. and Kaur, P. “Comparison of Call Signalling Protocols for Ad-hoc
Networks.” International Journal of Computer Applications. 27 (2011): 35-
40.
127. Chen, X., et al. “Survey on QoS Management of VoIP.” Proc. 2003 Int. Conf.
Computer Networks and Mobile Computing (ICCNMC’03). (20-23 October
2003): 69-77.
128. The Applied Technologies Group, Inc. QoS in the Enterprise. [online] 2012.
[cited 2012 Dec 17]. Available from : http://www.hadassah.ac.il/CS/staff/ma
rtin/Seminar/qos/qosent.pdf.
129. Frost, N. “VoIP threats – getting louder.” Network Security. (2006): 16-18.
130. Bradbury, D. “The security challenges inherent in VoIP.” Computers & Security.
26 (2007): 485-487.
131. Effnet. An introduction to IP header compression. [online] 2012. [cited 2012
Dec 17]. Available from : http://www.effnet.com/sites/effnet/pdf/uk/Whitepa
per_Header_Compression.pdf.
132. Palmieri, F. and Fiore, U. “Providing true end-to-end security in converged
voice over IP infrastructures.” Computers & Security. 28 (2009): 433-449.
133. Stanton, R. “Secure VoIP – an achievable goal.” Computer Fraud & Security.
(2006): 11-14.
134. VoIPSA. VoIP Security and Privacy Threat Taxonomy. [online] 2012. [cited
2012 Dec 17]. Available from : http://www.voipsa.org/Activities/VOIPSA_
Threat_Taxonomy_0.1.pdf.
135. Keromytis, A.D. “Voice-over-IP Security Research and Practice.” IEEE Security
& Privacy. 8 (2010): 76-78.
136. Hunter, P. “VoIP the latest security concern: DoS attack the greatest threat.”
Network Security. (2002): 5-7.
137. Winkler, S. “Video Quality Measurement Standards – Current Status and
Trends.” Proc. 7th Int. Conf. Information, Communications and Signal
Processing 2009 (ICICS 2009). (8-10 December 2009): 1-5.
138. Kilkki, K. “Quality of Experience in Communications Ecosystem.” Journal of
Universal Computer Science. 14 (2008): 615-624.
139. ITU-T Recommendation G.1000. Communications quality of service: A
framework and definitions. November, 2001.
140. ITU-T Recommendation E.800. Terms and definitions related to quality of
service and network performance including dependability. August, 1994.
141. ITU-T Recommendation P.10/G.100. Vocabulary for performance and quality of
service, Amendment 1 - New Appendix I- Definition of Quality of Experience
(QoE). January, 2007.
142. Tran, H.A. and Mellouk, A. “QoE model driven for network services.” Proc. 8th
Int. Conf. Wired/Wireless Internet Communications (WWIC 2010). (1-3 Jun
2010): 264-277.
143. Batteram, H., et al. “Delivering Quality of Experience in Multimedia Networks.”
Bell Labs Technical Journal. 15 (2010): 175-194.
144. Nokia. Quality of Experience (QoE) of mobile services: Can it be measured and
improved?. [online] 2012. [cited 2012 Dec 19]. Available from : http://www.
afutt.org/Qostic/qostic1/MOB-GD-MGQ-NOKIA-040129-Nokia-
whitepaper_qoe_net-final.pdf.
145. Hestnes, B., et al. “Quality of Experience in real-time person-person
communication - User based QoS expressed in technical network QoS
terms.” Proc. 19th Int. Sym. Human Factors in Telecommunication. (1-4
December 2003): 3-10.
146. IneoQuest. MDI / QoE for IPTV and VoIP Quality of Experience for Media over
IP. [online] 2012. [cited 2012 Dec 19]. Available from : http://ftp.ineoquest.
com/pub/docs/Papers/MediaQualityofExperience_060105.pdf.
147. Empirix. Assuring QoE on Next Generation Networks. [online] 2012. [cited
2012 Dec 19]. Available from : http://www.triple-play-news.com/voip/white
papers/whitepaper_NGNT_AssuringQoE.pdf.
148. Wu, W., et al. “Quality of experience in distributed interactive multimedia
environments: toward a theoretical framework.” Proc. 17th ACM Int. Conf.
Multimedia (MM ’09). (19-24 October 2009): 481-490.
149. Daengsi, T. et al. “A study of VoIP quality evaluation: User perception of voice
quality from G.729, G.711 and G.722.” Proc. 9th IEEE Consumer
Communications and Networking Conference – Special Session on Quality of
Experience (QoE) for Multimedia Communications. (14-17 January 2012):
342-345.
150. ___. “Speech Quality Assessment of VoIP: G.711 VS G.722 Based on Interview
Tests with Thai Users.” International Journal of Information Technology and
Computer Science. 4 (2012): 19-25.
151. Gierlich, H.W. and Kettler, F. “Advanced speech quality testing of modern
telecommunication equipment: An overview.” Signal Processing. 86 (2006):
1327-1340.
170. Cisco. QoS: Quality of Service. [online] 2012. [cited 2012 Dec 17]. Available
from : http://www.cisco.com/web/offer/powernow/docs/netinfra/qos-cheatsh
eet.pdf.
171. Jarrett, D. and Buchanan, K. Building Residential VoIP Gateways: A Tutorial
Part Three: Voice Quality Assurance for VoIP Networks. [online] 2012.
[cited 2012 Dec 17]. Available from : http://www.analogzone.com/nett0712.
pdf.
172. IEEE. You searched for: VoIP quality. [online] 2012. [cited 2012 Sep 1].
Available from : http://ieeexplore.ieee.org/search/searchresult.jsp?queryText
%3DVoIP+quality&addRange=2000_2012_Publication_Year&pageNumber
=1&resultAction=REFINE.
173. ___. You searched for: VoIP quality delay. [online] 2012. [cited 2012 Sep 1].
Available from : http://ieeexplore.ieee.org/search/searchresult.jsp?newsearch
=true&queryText=VoIP+quality+packet+delay.
174. ___. You searched for: VoIP quality loss. [online] 2012. [cited 2012 Sep 1].
Available from : http://ieeexplore.ieee.org/search/searchresult.jsp?newsearch
=true&queryText=VoIP+quality+packet+loss.
175. ___. You searched for: VoIP quality jitter. [online] 2012. [cited 2012 Sep 1].
Available from : http://ieeexplore.ieee.org/search/searchresult.jsp?newsearch
=true&queryText=VoIP+quality+jitter.
176. Holub, J., Kastner, M. and Tomiska, O. “Delay effect on conversational quality
in telecommunication networks: Do we mind?” Proceedings of the Wireless
Telecommunications Symposium (WTS 2007). (26-28 April 2007): 1-4.
177. ITU-T Recommendation G.114. One-way transmission time. May, 2003.
178. ITU-T Recommendation G.107. The E-model: a computational model for use in
transmission planning. December, 2011.
179. ITU. The Essential Report on IP Telephony. [online] 2012. [cited 2012 Dec 17].
Available from : http://www.itu.int/ITU-D/cyb/publications/2003/IP-tel_repo
rt.pdf.
180. Boutremans, C., Iannaccone, G. and Diot, C. Impact of link failures on VoIP
performance. [online] 2012. [cited 2012 Dec 17]. Available from : http://
www.recursosvoip.com/docs/english/voip-nossdav.pdf.
181. Markopoulou, A. et al. “Characterization of Failures in an IP Backbone.” Proc.
23rd Annual Joint Conference of the IEEE Computer and Communication
Societies (INFOCOM 2004). (7-11 March 2004): 2307-2317.
182. Markopoulou, A., Tobagi, F. and Karam, M. “Loss and Delay Measurements of
Internet Backbones.” Computer Communications. 29 (2006): 1590-1604.
183. Zhang, H., et al. “Packet Loss Burstiness and Enhancement to the E-Model.”
Proc. 6th Int. Conf. Software Engineering, Artificial Intelligence, Networking
and Parallel/Distributed Computing and 1st ACIS Int. Workshop on Self-
Assembling Wireless Networks (SNPD/SAWN 2005). (23-25 May 2005):
214-219.
184. Fluke Corporation. Quality Management: Troubleshooting Techniques for Voice
over IP. [online] 2012. [cited 2012 Dec 20]. Available from : http://www.teq
uipment.net/pdf/FlukeNetworks/ApplicationNote-QualityManagement-Trou
bleshootingTechniquesForVoiceOverIP.pdf.
185. Hall, T.A. “Objective speech quality measures for Internet telephony.” Proc. of
SPIE, Voice over IP (VoIP) Technology. (2001): 128-136.
186. Narbutt, M. and Davis, M. Assessing the quality of VoIP transmission affected
by playout buffer scheme. [online] 2012. [cited 2012 Dec 20]. Available from
: http://wireless.feld.cvut.cz/mesaqin2005/papers/15_NARBUTT.pdf.
187. Ding, L., et al. “Non-intrusive single-ended speech quality assessment in VoIP.”
Speech Communication. 49 (2007): 477-489.
188. Khanduri, P. “Method and Apparatus for Measuring Voice Quality on a VoIP
Network.” U.S. Patent: 2009/0238085 A1. 24 September 2009.
189. Al-Akhras, M., et al. “Non-intrusive speech quality prediction in VoIP networks
using a neural network approach.” Neurocomputing. 72 (2009): 2595-2608.
190. Lee, J., Nam, K. and Kim, D. “Effect of Network factors on VoIP.” Proc. 13th
Int. Conf. Advanced Communication Technology (ICACT). (13-16 February
2011): 1130-1135.
191. Goudarzi, M. and Sun, L. Performance analysis and comparison of PESQ and
3SQM in live 3G mobile networks. [online] 2012. [cited 2012 Dec 20].
Available from : http://www.tech.plym.ac.uk/spmc/staff/mgoudarzi/Perform
ance%20analysis%20and%20comparison%20of%20PESQ%20and%203SQ
M.pdf.
192. Voznak, M. and Rozhon, J. Automated Speech Quality Monitoring Tool based on
Perceptual Evaluation. [online] 2012. [cited 2012 Dec 18]. Available from :
http://www.wseas.us/e-library/conferences/2012/Vouliagmeni/EDUCIT/ED
UCIT-14.pdf.
193. OPTICOM GmbH, SwissQual AG and TNO Telecom. POLQA® Perceptual
Objective Listening Quality Analysis. [online] 2012. [cited 2012 Dec 18].
Available from : http://www.transcom.net.cn/uploadfiles/2011062714572349
06.pdf.
194. ITU-T Recommendation P.862. Perceptual evaluation of speech quality
(PESQ): An objective method for end-to-end speech quality assessment of
narrow-band telephone networks and speech codecs. 2001.
195. ITU-T Recommendation P.862.2. Wideband extension to Recommendation
P.862 for the assessment of wideband telephone networks and speech codecs.
November, 2007.
196. Ditech Networks. Limitations of PESQ for Measuring Voice Quality in Mobile
and VoIP Networks. [online] 2012. [cited 2012 Dec 18]. Available from :
http://www.tmworld.com/file/4534-white_paper_on_the_limitations_of_PES
Q.pdf.
197. Johannesson, N.O. “The ETSI Computation Model: A Tool for Transmission
Planning of Telephone Networks.” IEEE Communications Magazine. 35
(1997): 70-79.
198. ITU-T Recommendation G.107. The E-model: a computational model for use in
transmission planning. August, 2008.
199. Qiao, Z., Sun, L. and Ifeachor, E. “Case Study of PESQ Performance in Live
Wireless Mobile VoIP Environment.” Proc. IEEE 19th Int. Sym. Personal,
Indoor and Mobile Radio Communications 2008 (PIMRC 2008). (15-18
September 2008): 1-6.
200. Qualcomm. PESQ Limitations for EVRC Family of Narrowband and Wideband
Speech Codecs. [online] 2012. [cited 2012 Dec 18]. Available from : http://
www.qualcomm.com/media/documents/files/pesq-limitations-white-paper.pdf.
201. IEEE. You searched for: VoIP E-model. [online] 2012. [cited 2012 Sep 1].
Available from : http://ieeexplore.ieee.org/search/searchresult.jsp?querText%
3DVoIP+quality+E-model&addRange=2002_2012_Publication_Year&page
Number=1&resultAction=REFINE.
202. ___. You searched for: VoIP PESQ. [online] 2012. [cited 2012 Sep 1]. Available
from : http://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&que
ryText=VoIP+quality+PESQ.
203. Lakaniemi, A., Rosti, J. and Raisanen, V.I. “Subjective VoIP speech quality
evaluation based on network measurements.” Proc. IEEE Int. Conf.
Communications 2001 (ICC 2001). (11-14 June 2001): 748–752.
204. Kitawaki, N. and Tamada, T. Subjective and Objective Quality Assessment for
Noise Reduced Speech. [online] 2012. [cited 2012 Dec 20]. Available from :
http://portal.etsi.org/stq/workshop2007presentations/kitawaki.pdf.
205. Cai, Z., et al. “Comparison of MOS evaluation characteristics for Chinese,
Japanese and English in IP telephony.” Proc. 4th International Universal
Communication Symposium. (18-19 October 2010): 112-115.
206. ITU-T. Question 7/12 – Methods, tools and test plans for the subjective
assessment of speech, audio and audiovisual quality interactions. [online]
2012. [cited 2012 Dec 17]. Available from : http://www.itu.int/ITU-T/studyg
roups/com12/sg12-q7.html.
207. ___. Work Programme. [online] 2012. [cited 2012 Aug 21]. Available from :
http://www.itu.int/ITU-T/workprog/wp_search.aspx?&isn_sg=551&isn_wp=
807&isn_qu=638&details=0.
208. Yaodu, W., et al. “Subjective Speech Quality Evaluation Based on Collecting
Opinions via Internet.” Proc. 2010 Int. Conf. Communications and Mobile
Computing. (12-14 April 2010): 517-521.
209. Uzoamaka, E.D. Validating Perceptual Objective Listening Quality Assessment
Methods on the Tonal Language Igbo. Master of Science Thesis, Network
Architecture and Services, Faculty of Electrical Engineering, Mathematics
and Computer Science, Delft University of Technology, 2009.
210. Voran, S.D. “Subjective ratings of instantaneous and gradual transitions from
narrowband to wideband active speech.” Proc. IEEE Int. Conf. Acoustics,
Speech, and Signal Processing (ICASSP 2010). (14-19 March 2010): 4674-
4677.
211. Jelinek, M., Vaillancourt, T. and Gibbs, J. “G.718: A New Embedded Speech
and Audio Coding Standard with High Resilience to Error-Prone
Transmission Channels.” IEEE Communications Magazine. 47 (2009): 117-
123.
212. Psytechnics. VoIP client benchmarking report. [online] 2012. [cited 2012 Dec
19]. Available from : http://www.ucstrategies.com/uploadedFiles/UC_Inform
ation/White_Papers/Microsoft/VoIP_benchmarking_report.pdf.
213. Hiwasaki, Y. and Ohmuro, H. “ITU-T G.711.1: Extending G.711 to Higher-
Quality Wideband Speech.” IEEE Communications Magazine. 47 (2009):
110-116.
214. 3GPP. 3rd Generation Partnership Project; Technical Specification Group
Services and System Aspects; Performance characterization of the Adaptive
Multi-Rate Wideband (AMR-WB) speech codec (Release 5). [online] 2012.
[cited 2012 Dec 19]. Available from : http://www.3gpp.org/ftp/tsg_sa/TSG_
SA/TSGS_18/docs/pdf/SP-020682.pdf.
215. Chen, J.-H. and Thyssen, J. “BroadVoice®16: A PacketCable Speech Coding
Standard for Cable Telephony.” Proc. 40th Asilomar Conf. Signals, Systems,
Computers 2006 (ACSSC’06). (29 October - 1 November 2006): 1316-1320.
216. ___. “The BroadVoice Speech Coding Algorithm.” Proc. IEEE Int. Conf.
Acoustics, Speech, Signal Processing 2007 (ICASSP 2007). (15-20 April
2007): IV-537-IV-540.
217. Ren, J., et al. “Assessment of effects of different language in VOIP.” Proc. Int.
Conf. Audio, Language and Image Processing 2008 (ICALIP 2008). (7-9
July 2008): 1624-1628.
218. Wu, C.C., et al. “An empirical evaluation of VoIP playout buffer dimensioning
in Skype, Google talk and MSN Messenger.” Proc. Int. workshop on
Network and Operating Systems Support for Digital Audio and Video
(NOSSDAV ’09). (3-5 June 2009): 97-102.
219. Voznak, M. “E-model Modification for Case of Cascade Codecs Arrangement.”
International Journal of Mathematical Models and Methods in Applied
Sciences. 5 (2011): 1301-1309.
220. Voznak, M. et al. “E-model Improvement for Speech Quality Evaluation
Including Codecs Tandeming.” Proc. 9th WSEAS Int. Conf. Data Networks,
Communications, Computers. (3-5 November 2010): 119-124.
221. Zheng, H. and Lin, Q. “Non-intrusive Speech Quality Assessment in VoIP
Using the Extended E-model.” Energy Procedia. 13 (2011): 6867-687.
222. Ding, L. and Goubran, R.A. “Speech quality prediction in VoIP using the
extended E-model.” Proc. IEEE Global Communications Conference 2003
(GLOBECOM’03). (1-5 December 2003): 3974-3978.
223. Bandung, Y. et al. “Optimizing Voice over Internet Protocol (VoIP) Networks
based-on Extended E-model.” Proc. IEEE Conf. Cybernetics and Intelligent
Systems. (21-24 September 2008): 801-805.
224. Ren, J. et al. “Enhancement to E-model on standard deviation of packet delay.”
Proc. 3rd Int. Conf. Information Sciences and Interaction Sciences. (23-25
June 2010): 256-259.
225. Gareiss, R. The True Cost of Voice Over IP. [online] 2012. [cited 2012 Dec 17].
Available from : http://www.rts.com/docs/Nemertes%20-%20A%20Case%2
0for%20Avaya%20VoIP.pdf.
226. Asterisk. A Brief History of the Asterisk Project. [online] 2012. [cited 2012 Dec
17]. Available from : https://wiki.asterisk.org/wiki/display/AST/A+Brief+Hi
story+of+the+Asterisk+Project.
227. Digium. Company History. [online] 2012. [cited 2012 Dec 17]. Available from :
http://www.digium.com/en/company/history.php.
228. ___. About Digium. [online] 2012. [cited 2012 Dec 17]. Available from : http:
//www.digium.com/en/company/.
229. Asterisk. Get Started. [online] 2012. [cited 2012 Dec 17]. Available from : http:
//www.asterisk.org/asterisk.
230. Chichareon, P. et al. “Web based Configuration Manager for Asterisk Trunking
System.” Proc. 8th PSU Engineering Conference. (22-23 April 2010): 198-
203.
231. Ratsamimonthon, P. et al. “Service Integration of Voice Communication and
Web based Conference.” Proc. 8th PSU Engineering Conference. (22-23
April 2010): 204-208.
232. Casaby, P. and Puangpronpitag, S. “Problem Evaluation of Security Issues in IP
telephony Open Source software.” Proc. National Conference on Computer
Information Technologies. (13-15 January 2010): 33-38.
233. Toomwan, S. A Study of Voice over IP. Master of Science in Network
Engineering Thesis, Faculty of Information Science and Technology,
Mahanakorn University of Technology, 2010. (Thai).
234. Jaksopha, S. VoIP Development for World Study Center Co.,Ltd. Master of
Science in Network Engineering Thesis, Faculty of Information Science and
Technology, Mahanakorn University of Technology, 2010. (Thai).
235. Johansen, A.J. Improvement of SPIT prevention technique based on Turing test.
Master of Science in Information Technology Thesis, Faculty of Information
Science and Technology, Mahanakorn University of Technology, 2010.
236. Thewaphon, S. A Study of Voice over IP for Department of Agricultural
Extension. Master of Science in Network Engineering Thesis, Faculty of
Information Science and Technology, Mahanakorn University of Technology,
2010. (Thai).
237. Digium. Asterisk Architecture. [online] 2012. [cited 2012 Dec 17]. Available
from : http://www.digium.com/images/graphics/asteriskarch.gif.
238. Suwannaraj, K. IP-PBX Design and Installation Using Asterisk. 1st ed. Offset
Press, 2008. (Thai).
239. Goncalves, F.E. Configuration Guide for Asterisk 1.4 and 1.6. 4th ed. V.Office
Networks, 2012.
240. Barlas, A. Integrating Management System for the Asterisk Soft-IP-PBX. Master
of Science in Information Networking, Athens Information Technology,
2005.
241. Mahler, P. VoIP Telephony with Asterisk. [online] 2012. [cited 2012 Dec 19].
Available from : ftp://ftp.instantnt.ru/pub/Books/Cisco/Voip%20Telephony%
20With%20Asterisk%20(Paul%20Mahler).pdf.
242. Nussbaum, L. and Richard, O. A Comparative Study of Network Link Emulators.
[online] 2012. [cited 2012 Dec 19]. Available from : http://www.loria.fr/~lnu
ssbau/files/netemulators-cns09.pdf.
243. Rizzo, L. “Dummynet: a simple approach to the evaluation of network protocols.”
ACM Computer Communication Review. 27 (1997): 31-41.
244. Carbone, M. and Rizzo, L. “Dummynet Revisited.” ACM SIGCOMM Computer
Communication Review. 40 (2010): 13-20.
245. Huang, T.-Y., et al. “Could Skype be more satisfying? – a QoE-centric study of
the FEC mechanism in the internet-scale VoIP system.” IEEE Network. 24
(2010): 42-48.
246. Balan, H.V., et al. “An Experimental Evaluation of Voice Quality over the
Datagram Congestion Control Protocol.” Proc. 26th IEEE Int. Conf. on
Computer Communications (INFOCOM 2007). (6-12 May 2007): 2009-2017.
247. Bhattacharya, A., Wu, W. and Yang, Z. “Quality of Experience Evaluation of
Voice Communication Systems using Affect-based Approach.” Proc. 19th
ACM International Conference on Multimedia (MM’11). (28 November – 1
December 2011): 929-932.
248. Amir, M., et al. QoE-Lab: Towards evaluating Quality of Experience for Future
Internet Conditions. [online] 2012. [cited 2012 Dec 17]. Available from :
http://fibium.org/papers/2011-tridentcom-quelab.pdf.
249. Stehle, E., et al. Perception of Utility in Autonomic VoIP Systems. [online] 2012.
[cited 2012 Dec 19]. Available from : https://www.cs.drexel.edu/~spiros/pap
ers/IJAIS09.pdf.
250. Kosonen, V. Voice Quality in IP Telephony. [online] 2012. [cited 2012 Dec 17].
Available from : http://www.netlab.tkk.fi/opetus/s38130/k01/Papers/Kosone
n-VoiceQuality.pdf.
251. Nahum, E.M., et al. “The Effects of Wide-Area Conditions on WWW Server
Performance.” Proc. ACM SIGMETRICS Int. Conf. Measurement and
Modeling of Computer Systems. (16-20 June 2001): 257-267.
252. Stehle, E., et al. Task Dependency of User Perceived Utility in Autonomic VoIP
Systems. [online] 2012. [cited 2012 Dec 19]. Available from : https://www.cs
.drexel.edu/~spiros/papers/ICAS08.pdf.
253. Ries, M., Svoboda, P. and Rupp, M. “Empirical study of subjective quality for
massive multiplayer games.” Proc. 15th Int. Conf. Systems, Signals and
Image Processing, IWSSIP 2008. (25-28 June 2008): 181-184.
254. Hoßfeld, T. and Binzenhöfer, A. “Analysis of Skype VoIP traffic in UMTS:
End-to-end QoS and QoE measurements.” Computer Networks. 52 (2008):
650-666.
255. VassarStats. Concepts & Applications of Inferential Statistics. [online] 2012.
[cited 2012 Dec 17]. Available from : http://vassarstats.net/textbook/index.ht
ml.
256. Park, H.M. Comparing Group Means: The T-test and One-way ANOVA Using
STATA, SAS, and SPSS. [online] 2012. [cited 2012 Dec 17]. Available from :
http://stat.smmu.edu.cn/DOWNLOAD/ebook/statistics_course.pdf.
257. Overath, T. and Whalley, M. t- and F-tests: Testing hypotheses. [online] 2012.
[cited 2012 Dec 17]. Available from : http://www.fil.ion.ucl.ac.uk/spm/doc/
mfd/2005/Ft-tests.ppt#280,4,Types of Error.
258. DeCoster, J. Testing Group Differences using T-tests, ANOVA and Nonparametric
Measures. [online] 2012. [cited 2012 Dec 17]. Available from : http://www.
stat-help.com/ANOVA%202006-01-11.pdf.
259. Lee, H. and Kuchroo, M. ANOVA: A Test of Analysis of Variance. [online] 2012.
[cited 2012 Dec 17]. Available from : http://www.bbn-school.org/us/math/ap
_stats/project_abstracts_folder/proj_student_learning_folder/anova_kuchroo
_lee.ppt.
TABLE A-1 Selected TSST speech samples for the listening opinion test [269]
Speech Group No. | Sentences (or Phrases) | Meaning
5 | ต่อไปเป็นข่าวในพระราชสำนัก (t盢 pa‹j pe‹n kha›˘w naj pHra¤ rafl˘t tCHa¤ sa‡m na¤k) | Next, it is the royal news.
5 | ไปไหนมาเหรอ (pa‹j na‡j ma‹˘ rF‡˘) | Where have you been?
8 | กำลังจะไปไหน (ka‹m la‹N tCa› pa‹j na‡j) | Where are you going?
8 | ขอรบกวนเวลาสักครู่นะครับ/คะ (kHç˘ ro¤p ku‹˘an we‹˘ la‹˘ sa›k kHufl˘ na¤ kHra¤p kHa¤) | Please give me some time.
9 | วันนี้ไปกินข้าวที่ไหนกันดี (wa‹n ni¤˘ paj‹ ki‹n kha˘fl w tHi˘fl naj‡ kan‹ di˘‹ ) | Where will we go to eat today?
9 | จะกลับเมื่อไหร่ (tCa› kla›p mμfl˘a ra›j) | When will you come back?
11 | ดูแลรักษาสุขภาพด้วยนะ (du‹˘ lE‹˘ ra¤k sa‡˘ su›k kHa› pHafl˘p dufl˘aj na¤) | Take care of your health.
11 | ตกลงนะครับ/คะ (to›k lo‹N na¤ kHra¤p kHa¤) | Are you OK?
12 | กรุณาติดต่อกลับมาใหม่ (ka› ru¤ na‹˘ ti›t t盢 kla›p ma‹˘ ma›j) | Please contact again.
12 | กลับถึงบ้านหรือยัง (kla›p tHμ‡N bafl˘n rμ‡˘ ja‹N) | Have you reached home?
SELECTED PUBLICATIONS
BIOGRAPHY
Biography :
Therdpong Daengsi was born in Nakornphanom in 1974. He received a
Bachelor of Engineering in Electrical Engineering from the Department of Electrical
Engineering, Faculty of Engineering, King Mongkut’s University of Technology
North Bangkok (KMUTNB), formerly KMITNB, in 1997. In the same year, he
started working for a company in its key-telephone system department, before
moving to the PABX department and becoming an Avaya Certified Expert in IP
Telephony. He received a Mini-MBA Certificate in Business Management and a
Master of Science in ICT from Assumption University in 2006 and 2008 respectively.
During his Ph.D. studies, he presented and published more than 10 papers,
including one at an IEEE conference in the USA and another at an international
conference in Japan. While pursuing his graduate studies, he worked at another
telecommunication company in Bangkok. He is currently the service manager at
that company and has more than 15 years of experience in the telecom
service business.
Conference Publications :
1. Daengsi, T. and Preechayasomboon, A. “Enhanced Service Ticket Tracking
System by Using Email and SMS.” Paper presented in NCCIT2008,
Mahasarakham, May 2008.
2. Daengsi, T. and Preechayasomboon, A. “Case Study: AIA Insurance – Migration
Project Experience.” Paper presented in NCCIT2008, Mahasarakham, May
2008.
3. Daengsi, T. and Preechayasomboon, A. “Case Study: AMEX Thailand – PABX
Migration Experience.” Paper presented in NCCIT2009, Bangkok, May
2009.
4. Daengsi, T. “A Simplified Subjective Measurement Method for Voice Quality
Evaluation in IP Telephony System Deployment.” Paper presented in
KMITL2552 Academic Conference, Bangkok, August 2009.
5. ____. “Voice Quality Measurement for VoIP: Simple Method Using a Survey.”
Paper presented in EECON-32, Prachinburi, October 2009.
6. Daengsi, T. and Tontiwattanakul, K. “A Case of Improvement of Building
Acoustics Using Available Equipments and Limited Resources.” Proc. of
the 6th Naresuan Research Conference 2010, pp. 2-13, Phitsanulok,
Thailand, July 2010.
7. Daengsi, T., Wutiwiwatchai, C., Preechayasomboon A. and Clayton, G. “Ear
Preference and Hand Dominance for Telephone Use by Thai People.” Paper
presented in ThaiBME2010, Bangkok, Thailand, August 2010.
Journal Publications :
1. Daengsi, T., Wutiwiwatchai, C., Preechayasomboon, A. and Sukparungsee, S.
“Speech Quality Assessment of VoIP: G.711 VS G.722 Based on Interview
Tests with Thai Users.” International Journal of Information Technology
and Computer Science, Vol.4, No.2, pp.19-25, March 2012.
2. Daengsi, T., Sukparungsee, S., Wutiwiwatchai, C. and Preechayasomboon, A.
“Comparison of Perceptual Voice Quality of VoIP Provided by G.711 and
G.729 Using Conversation-Opinion Tests.” International Journal of the
Computer, the Internet and Management, Vol. 20, No. 1, pp. 21-26, April
2012.
3. Daengsi, T., Wutiwiwatchai, C., Preechayasomboon A. and Sukparungsee, S.
“VoIP Quality Measurement: Insignificant Voice Quality of G.711 and
G.729 Codecs in Listening-Opinion Tests by Thai Users.” Information
Technology Journal, Vol. 8, No. 1, pp. 77-82, June 2012.
4. Daengsi, T., Preechayasomboon, A., Sukparungsee, S. and Wutiwiwatchai, C.
“Thai Text Resource: A Recommended Thai Text Set for Voice Quality