1 Introduction
Call centers are the front door of any organization, where crucial interactions with
customers are handled [Reynolds (2010)]. Effective and efficient operations are a key
ingredient in the profitability and reputation of both in-house and outsourced call
centers. Agent productivity is very difficult to measure objectively because the agent's
output, as a firm worker, is the spoken words delivered to the customer over the phone,
so the evaluation is mostly handled subjectively. Subjective evaluation in call centers
is in essence a qualitative method, performed by monitoring and evaluating the
interactions between the agent and the customer according to the evaluator's perception
[Sharp (2003)]. It is carried out by listening to an agent's recorded calls, tapping a
live call, or making a test call through a member of the quality team or an anonymous
caller [Rubingh (2013)]. The quality team listens to the agents' recorded calls and
scores them against predefined evaluation forms (evaluation checklists)
[Cleveland (2012)]. This evaluation process has several drawbacks. One is that quality
teams evaluate agents according to their own perception and prior experience
[Cleveland (2012)]. Subjective evaluation also opens the door to favoritism through
so-called social ties [Breuer et al. (2013)]. A quantitative study of social ties and
subjective performance evaluation [Breuer et al. (2013)] shows that closer social
attachment between supervisors and subordinates leads to better performance ratings
even when there is no difference in true performance. Another drawback of subjective
evaluation is that resources are too limited to evaluate all agents consistently over
time. For instance, some agents are evaluated in different shifts (day shift/night
shift), which leads to inconsistent or unfair evaluation from one agent to another.
A typical challenge in performance-evaluation studies is that true performance is not
observable to the researcher, so it is hard to assess the gap and detect evaluation
distortions [Breuer et al. (2013)]. This means that subjective evaluation may
underestimate an agent whose true performance is higher; conversely, an agent may be
overestimated because of factors that are irrelevant to true performance or quality
of service.
This paper proposes three classification methods for objectively measuring agent
productivity through a machine learning approach. The next section (Sect. 2) discusses
the conceptual framework and gives an overview of the main building blocks. Section 3
discusses the binary classification methods and parameter optimization in some detail.
Section 4 explains the experiment carried out, and Sect. 5 discusses the study results.
Finally, Sect. 6 concludes the study and recommends research opportunities for future
work.
2 Conceptual Framework

This section gives an overview of productivity measurement and highlights general
concepts and methods of agent evaluation in the call center environment.
A Call Center Agent Productivity Modeling . . . 503
Productivity is generally defined as the ratio of the agent's output to the input effort:

Productivity = Agent Output / Input Effort    (1)
Speech recognition systems started in the 1980s and improved significantly in the new
era of machine learning with neural networks [Yu and Deng (2012)]. By transcribing
calls into text, content analysis has become a powerful tool for feature prediction
and interpretation [Othman et al. (2004)]. Speech recognition for the Arabic language
has achieved high accuracy in terms of word error rate (WER) [Ahmed et al. (2016)].
The WER is the main indicator of speech recognition accuracy and performance
[Young et al. (2015)]: the lower the WER, the higher the performance of the recognizer
[Woodland et al. (1994)]. An inbound or outbound call is divided into an agent talk
part, a customer part, silences, music on hold, and noise. As the agent part is the
target of the analysis, a diarization process is required. Diarization uses an acoustic
model with sophisticated signal and speech processing to split a single-channel (mono)
recording into its different speakers [Tranter and Reynolds (2006)]. It removes
silences and music, yielding clean single-speaker speech [Tranter and Reynolds (2006)].
The diarization process was intended to be performed with the LIUM diarization toolkit
[Meignier and Merlin (2010)], a Java-based open-source toolkit specialized in
diarization using speech recognition models. It requires a Gaussian Mixture Model (GMM)
trained on voice data and corresponding labels, using two or more clustering states
according to the number of speakers (the number of states equals the number of
speakers). It uses a mono-phone GMM to represent the local probability of each speaker
[Meignier and Merlin (2010)]. For speech recognition, we use both GMM and Hidden Markov
Model (HMM) methods, but in different configurations. For Arabic, we use 3 HMM states
per phone (the proposed Arabic phone set contains 40 phones), and each state is
represented by 16 Gaussian components. The Arabic speech recognition system was
presented in [Ahmed et al. (2016)]. We leave the diarization process for future work.
Table 1 Sample of Arabic letters, corresponding character transliteration, and their
English equivalents

Arabic letter:
Transliteration:      ga  d  sh  l  k
English equivalent:   A   D  SH  L  K
This step is essential to convert the Arabic transcription into Latin characters for
machine processing. The character set consists of 36 characters, as shown in Table 1.
The transliteration process maps each Arabic letter to a corresponding Latin character.
The next example shows a transliteration of an Arabic statement:
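The mapping itself is a per-character table lookup that must round-trip, since the
text is later converted back to Arabic. Below is a minimal sketch; the letter pairs
are an illustrative Buckwalter-style subset, not the paper's full 36-character table:

```python
# Illustrative subset of a Buckwalter-style Arabic-to-Latin map; the real
# scheme covers the full 36-character set referenced in Table 1.
AR2LAT = {
    "\u0627": "A",  # alif
    "\u0628": "b",  # ba
    "\u062f": "d",  # dal
    "\u0633": "s",  # seen
    "\u0643": "k",  # kaf
    "\u0644": "l",  # lam
}
LAT2AR = {latin: ar for ar, latin in AR2LAT.items()}

def transliterate(text, table):
    # Per-character lookup; spaces and unmapped characters pass through.
    return "".join(table.get(ch, ch) for ch in text)

word = "\u0628\u0644\u062f"               # the Arabic letters b-l-d
latin = transliterate(word, AR2LAT)       # 'bld'
print(latin, transliterate(latin, LAT2AR) == word)  # bld True
```

The round-trip check matters because the classified features are converted back from
Latin into Arabic before interpretation.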
After the speech has been transcribed into text, the next step is to process the text
using sentiment analysis. Sentiment analysis refers to natural language processing
that detects and classifies the sentiment expressed by an opinion holder
[Murphy (2012)]. Sentiment analysis, also called opinion mining, classifies text
based on the opinions and emotions expressed about a particular topic
[Richert et al. (2013); Chen and Goodman (1996)]. The technique classifies text in a
polar fashion (on/off, yes/no, good/bad) and is used to assess people's opinions of
books, movies, etc. It deals with billions of words on the web and classifies positive
and negative opinions according to the most informative features (words) extracted
from the text.
Sentiment analysis uses different binary classification methods in order to classify
and predict the most informative features. Binary classification means that the
classifier produces only two outcomes, i.e., productive/non-productive in our study.
This article selects three classification methods and compares their classification
performance applied on text (Sect. 3). Note that agent productivity differs from
opinion mining (emotion classification), because productivity assesses the agent's
output as defined in Eq. (1), regardless of emotional words. However, this work is
performed under the assumption that call content whose semantics is shaped with a
positive meaning tends to be classified as a productive call [Ezpeleta et al. (2016)].

1 http://www.qamus.org/transliteration.htm.
506 A. Ahmed et al.
Classification accuracy is one of the most important factors in this study. Human
agreement on sentiment is about 80%.2 However, this percentage cannot be considered a
baseline for the study, for two reasons. First, the accuracy depends on the domain of
the collected text and varies from one domain to another; for example, productivity
features are perceived differently from features of other domains such as spam e-mails
or movie reviews. Second, the machine learning approach and human perception are
incommensurable. Hence, the study presents an unprecedented performance baseline for
real-estate call centers located in Egypt. For the data validation of the study, the
classifier should be able to classify the test set accurately as intended. The
accuracy calculation is given by Eq. (2).
Accuracy = F^c_cor / F_tot ,    (2)

where F^c_cor is the number of correctly classified features per class and F_tot is
the total number of features extracted.
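A sketch of the accuracy computation of Eq. (2), with hypothetical predictions and
labels:

```python
def accuracy(predicted, actual):
    """Fraction of correctly classified items, as in Eq. (2)."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

labels = ["productive", "non-productive", "productive", "productive"]
preds  = ["productive", "productive",     "productive", "non-productive"]
print(accuracy(preds, labels))  # 2 of 4 correct -> 0.5
```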
3 Classification Methods

This section describes classification using Naïve Bayes (NB), logistic regression (LR),
and the linear support vector machine (LSVC).
The Naïve Bayes classifier is built on Bayes' theorem under the assumption that the
features are independent of each other. Naïve Bayes satisfies the following equation:
2 http://www.webmetricsguru.com/archives/2010/04/sentiment-analysis-best-done-by-humans.
p(c|x) = p(x|c) p(c) / p(x) ,    (3)

p(c) = N_c / N_tot ,    (6)
where N_c is the number of words (features) annotated for class c and N_tot is the
total number of features in both classes. To calculate the maximum likelihood, we
count the frequency of each word per class and divide it by the overall word count of
the same class [Jurafsky and Martin (2014)], as in Eq. (7); Eq. (8) applies add-one
(Laplace) smoothing so that unseen words do not yield zero probabilities:
p(x_i|c) = count(x_i, c) / Σ_x count(x, c)    (7)

p(x_i|c) = ( count(x_i, c) + 1 ) / Σ_x ( count(x, c) + 1 )    (8)
To avoid underflow and to speed up processing, we use log probabilities, log(p), in
the Naïve Bayes calculations [Yu and Deng (2012)].
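Putting the prior of Eq. (6), the smoothed likelihoods of Eq. (8), and log-space
scoring together gives a compact classifier; the toy documents and labels below are
invented for illustration:

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns log-priors (Eq. 6)
    and add-one-smoothed log-likelihoods (Eq. 8)."""
    labels = [y for _, y in docs]
    vocab = {w for toks, _ in docs for w in toks}
    logprior, loglike = {}, {}
    for c in set(labels):
        logprior[c] = math.log(labels.count(c) / len(docs))
        counts = Counter(w for toks, y in docs if y == c for w in toks)
        denom = sum(counts.values()) + len(vocab)   # add-one denominator
        loglike[c] = {w: math.log((counts[w] + 1) / denom) for w in vocab}
    return logprior, loglike

def classify(tokens, logprior, loglike):
    # Summing logs avoids the underflow of multiplying tiny probabilities.
    def score(c):
        return logprior[c] + sum(loglike[c][w] for w in tokens if w in loglike[c])
    return max(logprior, key=score)

docs = [(["price", "expensive"], "non-productive"),
        (["view", "roof"], "productive"),
        (["view", "city"], "productive")]
model = train_nb(docs)
print(classify(["view"], *model))  # -> productive
```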
In logistic regression, the classifier starts from a linear combination of the features:

p(c|x) = w_0 x_0 + w_1 x_1 + ⋯ + w_n x_n = Σ_{i=1}^{N} w_i x_i = w^T x    (9)
However, the right-hand side of Eq. (9) ranges from −∞ to ∞, while a probability must
lie between 0 and 1; the two are linked by taking the natural log of the odds:

log( p(c|x) / (1 − p(c|x)) ) = z = w^T x    (11)

p(c|x) = Φ(z) = 1 / (1 + e^{−z})    (12)

where z = w_0 x_0 + w_1 x_1 + ⋯ + w_n x_n = Σ_{i=1}^{N} w_i x_i = w^T x.
The function Φ(z) is the sigmoid (the inverse of the logit), which limits the
classification output to the range 0 to 1, i.e., a probability. If we assume a
classification threshold of 0.5 and the sigmoid output is above 0.5 (e.g., Φ(z) = 0.8),
then the probability that a particular sample with features x and weights w is
productive is 80%, as shown in Fig. 2. The threshold corresponds to the hyperplane
where z = 0, i.e., Σ_{i=0}^{N} w_i x_i = 0 [Jurafsky and Martin (2014)].
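A small numerical sketch of Eqs. (11) and (12); the weight and feature values are
made up:

```python
import math

def sigmoid(z):
    """Phi(z) of Eq. (12): maps z = w^T x to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x, threshold=0.5):
    z = sum(wi * xi for wi, xi in zip(w, x))    # z = w^T x, Eq. (11)
    p = sigmoid(z)
    return ("productive" if p >= threshold else "non-productive", p)

print(sigmoid(0.0))                      # 0.5: a point on the hyperplane z = 0
print(predict([1.0, -2.0], [3.0, 1.0]))  # z = 1 -> ('productive', ~0.73)
```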
Equation (12) predicts the class given the features. We then have to train the model
to find the value of w that yields the optimum classification. Figure 3 illustrates
the network diagram: an initial value of the weights w is defined, the net input is
summed and classified, and the weights are then adjusted according to the detected
error [Raschka (2015)].
In linear regression, the error is determined by the sum-of-squared-errors (SSE) cost
function, Eq. (13):

J(w) = Σ_i ( Φ(z)^{(i)} − y^{(i)} )²    (13)
Assuming the features are independent of each other, the maximum joint probability of
class c given the features x is the product of the probabilities of all class
observations given the input features. The likelihood of the logistic function is
defined in Eq. (14):

L(w) = Π_{i=1}^{n} p( y^{(i)} | x^{(i)}, w )    (14)
Since Φ(z) is a probability function, it is governed by probability rules; each
observation follows a Bernoulli distribution, as stated in Eq. (15):

L(w) = Π_{i=1}^{n} ( Φ(z^{(i)}) )^{y^{(i)}} ( 1 − Φ(z^{(i)}) )^{1−y^{(i)}}    (15)
Taking the log replaces products of small values with summations and eliminates the
exponents for easier manipulation; the log-likelihood becomes:

l(w) = log L(w) = Σ_{i=1}^{n} [ y^{(i)} log(Φ(z^{(i)})) + (1 − y^{(i)}) log(1 − Φ(z^{(i)})) ]    (16)
Therefore, the objective function (cost function) to be minimized is the negative log
of Eq. (16):

J(w) = −log L(w) = − Σ_{i=1}^{n} [ y^{(i)} log(Φ(z^{(i)})) + (1 − y^{(i)}) log(1 − Φ(z^{(i)})) ]    (17)
Using the derivative of the sigmoid,

∂Φ(z)/∂z = ∂/∂z ( 1/(1 + e^{−z}) ) = e^{−z}/(1 + e^{−z})² = ( 1/(1 + e^{−z}) ) ( 1 − 1/(1 + e^{−z}) ) = Φ(z)(1 − Φ(z)),    (18)

the gradient of the log-likelihood is

∂l(w)/∂w = Σ_{i=1}^{n} ( y^{(i)}/Φ(z^{(i)}) − (1 − y^{(i)})/(1 − Φ(z^{(i)})) ) Φ(z^{(i)})(1 − Φ(z^{(i)})) ∂z^{(i)}/∂w
         = Σ_{i=1}^{n} ( y^{(i)}(1 − Φ(z^{(i)})) − (1 − y^{(i)}) Φ(z^{(i)}) ) x^{(i)},    (19)

so the weight update is

Δw = η Σ_{i=1}^{n} ( y^{(i)} − Φ(z^{(i)}) ) x^{(i)}    (20)
The new value of w is the previous value plus the update Δw computed from the gradient
in Eq. (19):

w_new := w_old + Δw = w_old + η Σ_{i=1}^{n} ( y^{(i)} − Φ(z^{(i)}) ) x^{(i)} = w_old − η ∇J(w)    (21)
One way to update the parameters is gradient descent, as shown in Eq. (21), but
Newton's method is usually used to speed up training [Fan et al. (2008)]. η is the
learning rate (a constant between 0 and 1) that controls the step size, i.e., the
amount of the effect of Δw. It is adjusted to set the speed (number of steps) of
reaching the optimum value; a poor choice may overfit the model. Overfitting is a
well-known problem in machine learning in which the model fits the training set but
does not generalize well to unseen data [Murphy (2012)]. An overfitted model has high
variance and is too complex. Conversely, a model underfits when it fails to capture
the pattern in the training data and therefore performs poorly [Raschka (2015)].
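A minimal batch training loop implementing the update of Eq. (21); the data set,
learning rate, and epoch count are illustrative choices, not the paper's settings:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, eta=0.1, epochs=200):
    """Batch updates following Eq. (21): w := w + eta * sum((y - Phi(z)) x)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj + eta * gj for wj, gj in zip(w, grad)]
    return w

# Tiny separable set: x0 = 1 is a bias term; a high x1 means class 1.
X = [[1.0, 0.0], [1.0, 0.2], [1.0, 0.8], [1.0, 1.0]]
y = [0, 0, 1, 1]
w = train_logistic(X, y)
probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]
print([round(p, 2) for p in probs])  # low for class-0 samples, high for class-1
```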
Regularization (here, L2 regularization) is a method to control collinearity (high
correlation among features) and prevent overfitting [Raschka (2015)]. It penalizes
extreme parameter weights and model complexity [Murphy (2012)]. L2 regularization is
the sum of the squared weights over the parameters, as described in Eq. (22):
L2 = λ ‖w‖² = λ Σ_{i=1}^{n} w_i²    (22)
The Support Vector Machine (SVM) is among the most powerful and widely used methods in
machine learning for binary classification [Raschka (2015)]. The margin is defined as
the distance between the separating hyperplane (the decision boundary) and the training
samples closest to that hyperplane [Murphy (2012)]. The method's hypothesis is that
maximizing the margin yields the best hyperplane position for linearly classifying the
samples. The support vectors are the samples nearest to (almost touching) the margin
that bounds the hyperplane. Figure 4 illustrates the hyperplane and the margin for the
binary classes x1, x2.
The hyperplane is expressed in terms of the weight vector w, the offset b, and the
vector x:

w^T x + b = 0    (24)

That is, for any vector residing on the hyperplane, the equation evaluates to zero.
In Fig. 4, it is assumed that the support-vector point of the positive class resides
on the margin at distance +1 from the hyperplane, and that of the negative class at
−1, so we have two equations:

w^T x_pos + b = +1    (25)

w^T x_neg + b = −1    (26)
Subtracting the two equations and normalizing by ‖w‖, where ‖w‖ = √( Σ_{i=1}^{n} w_i² )
[Raschka (2015)]:

w^T ( x_pos − x_neg ) / ‖w‖ = 2 / ‖w‖    (27)
The left side of Eq. (27) is the distance between the positive and negative boundaries,
i.e., the margin. The SVM objective of maximizing the margin is therefore achieved by
minimizing ‖w‖ (or ½‖w‖² for mathematical convenience) [Murphy (2012)], under the
constraint that the samples are classified correctly:

y( w^T x + b ) ≥ +1    (30)
Now we use a Lagrangian to combine w and the constraint y(w^T x + b) ≥ +1 into one
equation [Gunn et al. (1998)]. Adding the Lagrange multipliers α, the Lagrangian is:

𝓁(w, b, α) = ½ w^T w − Σ_{n=1}^{N} α_n ( y_n ( w^T x_n + b ) − 1 )    (31)
Differentiating, setting the derivatives to zero, and substituting back yields the
dual problem in α:

𝓁(α) = Σ_{n=1}^{N} α_n − ½ Σ_{n=1}^{N} Σ_{m=1}^{N} y_n y_m α_n α_m x_n^T x_m    (32)

where α_n ≥ 0 for n = 1, …, N and Σ_{n=1}^{N} α_n y_n = 0.
Note that w has disappeared from the dual form; however, the partial derivative of the
Lagrangian with respect to w shows that w is directly determined by α:

w = Σ_{n=1}^{N} α_n y_n x_n    (33)
Equation (32) is the dual form used in the hard-margin case. The hard margin is the
maximum distance between the hyperplane and the margin when the data is linearly
separable [Friedman et al. (2001)]. In other cases the data is almost linearly
separable, but a few data points violate the margin: they reside either within the
margin (correctly classified) or beyond it on the wrong side of the hyperplane
(misclassified), as shown in Fig. 5.
Referring to the constraint of Eq. (30), an error term ξ is added, giving Eq. (34):

y( w^T x + b ) ≥ +1 − ξ    (34)
ξ (the slack variable) is the error distance of a data vector that violates the margin,
and takes values ξ ≥ 0 [Fan et al. (2008)]. ξ = 0 means there is no error: the vector
is correctly classified and lies outside the margin. When 0 < ξ ≤ 1, the vector lies
within the margin but is still correctly classified (between the margin and the
hyperplane). When ξ ≥ 1, the vector is misclassified (on the wrong side of the
hyperplane). The soft margin is the case in which some errors, i.e., misclassified
data points, are accepted. The primal form of the loss function in this case is
Eq. (35):
min_w ½ w^T w + C Σ_{n=1}^{N} ξ_n    (35)
min_w ½ w^T w + C Σ_{i=1}^{N} max( 0, 1 − y_i w^T x_i )    (37)
The form in Eq. (37) is called L1-SVM. Its quadratically smoothed counterpart, the
L2-regularization form of SVM (L2-SVM) used in our experiment, is Eq. (38)
[Fan et al. (2008); Rennie and Srebro (2005)]:

min_w ½ w^T w + C Σ_{i=1}^{N} max( 0, 1 − y_i w^T x_i )²    (38)
It is worth mentioning that either the dual or the primal form of the linear SVM can
be used independently to reach the optimum classification margin [Wang (2005)]; both
are implemented in the Python libraries discussed in the next section.
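In scikit-learn's LinearSVC (backed by liblinear), the squared hinge loss of Eq. (38)
is selected with loss='squared_hinge', and the dual flag chooses between the primal
and dual solvers. The toy calls below are invented stand-ins for transliterated
transcripts, not data from the paper:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy stand-ins for transliterated call chunks; labels are illustrative.
calls = ["the view the roof october", "apartment view city",
         "no idea expensive", "expensive no idea yesterday"]
labels = ["productive", "productive", "non-productive", "non-productive"]

vec = CountVectorizer()              # bag-of-words features
X = vec.fit_transform(calls)

# loss='squared_hinge' gives the L2-SVM of Eq. (38); dual=False solves
# the primal form, dual=True the dual form of Eq. (32).
clf = LinearSVC(loss="squared_hinge", dual=False, C=1.0).fit(X, labels)
print(clf.predict(vec.transform(["roof view"])))  # ['productive']
```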
4 The Experiment
The corpus consists of 7 h of recordings from real-estate call centers hosted in
Egypt. A call center quality team listened carefully to the calls (30 calls) and
subjectively categorized the data set into productive/non-productive. The criterion
used for file annotation is the ability of the agent, or customer service
representative (CSR), to respond to customer inquiries with the right answers and to
cover the required answer items. The evaluator scores each fulfilled item out of the
total items required. For example, suppose the customer asks about an apartment for
sale and the expected answer comprises 5 main items (ask for the customer's name, the
contact number, the available budget, the city, and the number of bedrooms). When the
agent misses one of the answer items, the score is reduced by one point. Each call
center draws its own productivity baseline according to its ultimate objectives:
referring to the previous example, missing one item out of 5 may be considered
non-productive in call center X yet still productive in call center Y. Here lies the
power of machine learning: it can learn the baseline for each different call center
environment. The 30 calls are split into smaller audio chunks of around 20–50 s each.
This simulates the output files produced by the diarization and speech recognition
decoding processes [Meignier and Merlin (2010)]. The result is 500 files, divided into
a training set of 400 files and a test set of 100 files (20%).
The quality team subjectively annotated the files into productive (400 files) and
non-productive (100 files). This unbalanced annotation biases the per-class
probability p(c) in Eq. (3) toward one class; a balanced set, e.g., 400 productive
versus 400 non-productive, would be preferable, and otherwise a scaling factor may be
needed to adjust the probability per class. The text was converted from Arabic into
Latin characters using the Buckwalter scheme for machine processing and then converted
back into Arabic. The code is developed in Python 2.7 using the Natural Language
Toolkit (NLTK) Naïve Bayes classifier and the scikit-learn library [Raschka (2015)].
The code uses a bag-of-words model, an unordered set of word frequencies that
disregards word position in the sentence [Jurafsky and Martin (2014)]. Both logistic
regression and the linear support vector machine are trained with the liblinear solver
in scikit-learn rather than libsvm, because liblinear offers more flexibility in
penalty and loss-function parameter adjustment and scales better to large numbers of
samples.3
The scope of the study is to apply the classification methods Naïve Bayes (NB),
logistic regression (LR), and the linear support vector machine (LSVM), and to compare
them to find the best classification performance.
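The pipeline above can be sketched end to end as follows. The corpus here is
synthetic (the real 500-file corpus is not reproduced), but it mirrors the 4:1 class
imbalance, and class_weight='balanced' illustrates one possible scaling factor for the
skewed priors:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Synthetic stand-in for the 500 call chunks; 4:1 mirrors the 400/100 annotation.
texts = ["view roof city october budget"] * 40 + ["no idea expensive yesterday"] * 10
labels = [1] * 40 + [0] * 10

X = CountVectorizer().fit_transform(texts)        # bag-of-words model
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0)

# class_weight='balanced' is one way to apply a scaling factor for skewed priors.
models = {
    "NB": MultinomialNB(),
    "LR": LogisticRegression(class_weight="balanced"),
    "LSVM": LinearSVC(class_weight="balanced"),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```

On this artificial, perfectly separable data all three models score 1.0; the
interesting differences only appear on real, noisy transcripts.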
3 http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html.
5 Results
We generated the data model and applied the different binary classification approaches
to the whole data set to compare their performance. The classification accuracy
results are shown in Table 2. The data set and the test set were classified in order
to extract the most informative features (MIF). A most informative feature is
characterized by its occurrence ratio: the feature appears far more often in one class
than in the other. The features were converted from Buckwalter to Arabic, and then
into English, as follows.
The NB classifier provides likelihood ratios for comparing features by how much more
often they are repeated in one of the classes of the training set [Murphy (2012)]. As
shown in Table 3, NB presented the 10 most informative features, out of 100 features
extracted, that have a high tendency toward a specific class. We subjectively explored
the meaning behind the classification to better understand the definition of
productivity in real-estate call centers. Feature number 3, in English (I have no
idea), is non-productive, reflecting a lack of awareness of the product or the
service. In feature number 6, the agent dictates his/her mobile number over the phone,
which is considered non-productive as it consumes much time. Feature number 10, about
prices (expensive), is classified as non-productive because it drags the CSR into
useless debate and consumes much time; furthermore, it might be an unjustified answer
by the agent, indicating poor awareness of market changes and prices, and it may be
categorized under the same heading as feature number 3. Productive agents mentioned
key features of the apartments or villas, such as (the view), (the roof), and the city
(October), which can be considered product awareness; such features work well for
evaluating whether the agent mentions selling points during the call. Other features
are meaningless, for example, (to go), (yesterday), (Upper), and (Sales); these may be
related to the error percentage that was expected from the beginning. The accuracy is
expected to improve by training on a larger corpus and using a balanced training set
of productive and non-productive calls.
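The likelihood-ratio ranking behind Table 3 can be sketched as follows; the token
lists are invented for illustration, and NLTK's show_most_informative_features reports
a comparable ratio:

```python
from collections import Counter

def likelihood_ratios(docs, top=3):
    """Rank features by how much more likely they are in one class than the
    other (with add-one smoothing), in the spirit of NLTK's
    show_most_informative_features."""
    counts = {}
    for toks, label in docs:
        counts.setdefault(label, Counter()).update(toks)
    vocab = set().union(*counts.values())
    (a, ca), (b, cb) = counts.items()        # assumes exactly two classes
    tot_a = sum(ca.values()) + len(vocab)
    tot_b = sum(cb.values()) + len(vocab)
    def ratio(w):
        pa = (ca[w] + 1) / tot_a             # smoothed p(w | class a)
        pb = (cb[w] + 1) / tot_b             # smoothed p(w | class b)
        return max(pa / pb, pb / pa)         # whichever direction is larger
    return sorted(vocab, key=ratio, reverse=True)[:top]

docs = [(["view", "roof", "view"], "productive"),
        (["city", "view"], "productive"),
        (["expensive", "idea"], "non-productive")]
print(sorted(likelihood_ratios(docs)))  # ['expensive', 'idea', 'view']
```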
Referring to Table 2, the experimental results can be summarized in two main themes.
The first is generative versus discriminative models: Naïve Bayes gives the lowest
classification performance of the three methods, which is expected because it is a
generative method; it is well known in machine learning that generative models yield
lower classification accuracy than discriminative ones [Jurafsky and Martin (2014)].
The second theme concerns linear support vector classification: the
margin-maximization approach proves the most appropriate method in this experiment.
6 Conclusion
This article proposed evaluating call center agents' performance using different
machine learning approaches. Three methods were developed in this work: the Naïve
Bayes classifier, logistic regression, and the support vector machine, for binary
classification (productive/non-productive). The annotation was performed by a call
center quality team based on scoring itemized answers. The resulting classification
accuracy shows that the discriminative models (logistic regression and the support
vector machine) give higher accuracy (80% and 82%, respectively) than the generative
model (Naïve Bayes, 67%). There is still a research gap in productivity measurement,
in particular in better extracting the productivity features (the most informative
features, MIF). Furthermore, extending productivity measurement to a range of scales
(rather than binary classification) and considering the conversation context may help
in objectively understanding the evaluation gap.
References
Abbott, J.C.: The Executive Guide to Call Center Metrics. Robert Houston Smith Publishers (2004)
Ahmed, A., Hifny, Y., Shaalan, K., Toral, S.: Lexicon free Arabic speech recognition recipe.
In: International Conference on Advanced Intelligent Systems and Informatics, pp. 147–159.
Springer (2016)
Breuer, K., Nieken, P., Sliwka, D.: Social ties and subjective performance evaluations: an empirical
investigation. Rev. Manag. Sci. 7(2), 141–157 (2013)
Card, D.N.: The challenge of productivity measurement. In: Proceedings of the Pacific Northwest
Software Quality Conference (2006)
Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. In:
Proceedings of the 34th Annual Meeting on Association for Computational Linguistics, Asso-
ciation for Computational Linguistics, pp. 310–318 (1996)
Cleveland, B.: Call Center Management on Fast Forward: Succeeding in the New Era of Customer
Relationships. ICMI Press (2012)
Ezpeleta, E., Zurutuza, U., Hidalgo, J.M.G.: Does sentiment analysis help in Bayesian spam filter-
ing? In: International Conference on Hybrid Artificial Intelligence Systems, pp. 79–90. Springer
(2016)
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: Liblinear: a library for large linear
classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning, vol. 1. Springer Series
in Statistics. Springer, Berlin (2001)
Gunn, S.R., et al.: Support vector machines for classification and regression. ISIS Tech. Rep. 14,
85–86 (1998)
Jurafsky, D., Martin, J.H.: Speech and Language Processing. Pearson (2014)
Meignier, S., Merlin, T.: LIUM SpkDiarization: an open source toolkit for diarization. In: CMU
SPUD Workshop, vol. 2010 (2010)
Murphy, K.P.: Naive Bayes Classifiers. University of British Columbia (2006)
Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press (2012)
Othman, E., Shaalan, K., Rafea, A.: Towards resolving ambiguity in understanding Arabic sentence.
In: International Conference on Arabic Language Resources and Tools, pp. 118–122. NEMLAR,
Citeseer (2004)
Raschka, S.: Python Machine Learning. Packt Publishing Ltd (2015)
Rennie, J.D., Srebro, N.: Loss functions for preference levels: regression with discrete ordered
labels. In: Proceedings of the IJCAI Multidisciplinary Workshop on Advances in Preference
Handling, pp. 180–186. Kluwer, Norwell, MA (2005)
Reynolds, P.: Call center metrics: best practices in performance measurement and management to
maximize quitline efficiency and quality. North American Quitline Consortium (2010)
Richert, W., Chaffer, J., Swedberg, K., Coelho, L.: Building Machine Learning Systems with
Python, vol. 1. Packt Publishing, GB (2013)
Rubingh, R.: Call Center Rocket Science: 110 Tips to Creating a World Class Customer Service
Organization. CreateSpace Independent Publishing Platform. https://books.google.ae/books?
id=IknGmgEACAAJ (2013)
Sharp, D.: Call Center Operation: Design, Operation, and Maintenance. Digital Press (2003)
Steemann Nielsen, E.: Productivity, definition and measurement. The Sea 2, 129–164 (1963)
Taylor, P., Mulvey, G., Hyman, J., Bain, P.: Work organization, control and the experience of work
in call centres. Work Employ. Soc. 16(1), 133–150 (2002)
Thomas, H.R., Zavrki, I.: Construction baseline productivity: theory and practice. J. Construct. Eng.
Manag. 125(5), 295–303 (1999)
Tranter, S.E., Reynolds, D.A.: An overview of automatic speaker diarization systems. IEEE Trans.
Audio Speech Lang. Process. 14(5), 1557–1565 (2006)
Wang, L.: Support Vector Machines: Theory and Applications, vol. 177. Springer Science & Busi-
ness Media (2005)
Woodland, P.C., Odell, J.J., Valtchev, V., Young, S.J.: Large vocabulary continuous speech recog-
nition using HTK. In: 1994 IEEE International Conference on Acoustics, Speech, and Signal
Processing, 1994. ICASSP-94. IEEE, vol. 2, pp. II/125–II/128 (1994)
Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Olla-
son, D., Povey, D.: The HTK Book (for HTK version 3.5). Cambridge University Engineering
Department, Cambridge, UK (2015)
Yu, D., Deng, L.: Automatic Speech Recognition. Springer (2012)