AIMS TALK: Intelligent Call Center Support in Bangla Language with Speaker
Authentication
Shehan Irteza Pranto (AIMS Lab, UIU, Dhaka, Bangladesh; shehanirteza@gmail.com)
Rahad Arman Nabid (AIMS Lab, UIU, Dhaka, Bangladesh; ran.nabid@gmail.com)
Ahnaf Mozib Samin (AIMS Lab, UIU, Dhaka, Bangladesh; asamin9796@gmail.com)
Nabeel Mohammed (Dept. of CSE, North South University, Dhaka, Bangladesh; nabeel.mohammed@northsouth.edu)
Abstract—Call support centers operate over the telephone, connecting customers and receptionists to ensure customer satisfaction by solving their problems. Due to the pandemic, call support centers have become a popular means of communication in domains such as e-commerce, hospitals, banks, credit card support, and government offices. Moreover, humans' inability to serve 24 hours a day and fluctuating waiting times make it challenging to satisfy all customers through a call center. Customer service therefore needs to be automated to handle customers with domain-based responses in the native language, especially in a developing country like Bangladesh, where call support centers are increasing in number. Although most people communicate in Bangla, little work has been done on customer care automation in the native language. Our architecture, "AIMS TALK", responds to customers' needs by recognizing users' voices, identifying customers' problems in standardized Bangla, and collecting customers' responses in a database to give feedback according to their queries. The system uses MFCC feature extraction for speaker recognition, with an average accuracy of 94.38% on 42 people in real-time testing, and an RNN-based model for Bangla Automatic Speech Recognition (ASR), with a word error rate (WER) of 42.15%; for sentence summarization we use a sentence-similarity measurement technique with an average loss of 0.004. Lastly, we use gTTS, which performs Text to Speech synthesis for the Bangla language with the WaveNet architecture.

Index Terms—Text to Speech Synthesis (TTS), Automatic Speech Recognition, Mel-Frequency Cepstral Coefficients

I. INTRODUCTION

Phone calls are the top customer support channel for almost all e-commerce companies. Customers try to resolve their issues and queries by phone more than through any other medium, such as email, social media, or chat. Customer satisfaction over telephone calls depends on the behavior of customer service employees, which in turn depends on their mood; sometimes biases and unresponsiveness occur. Moreover, human service members cannot provide 24-hour customer service, so customers cannot get support from the hotline after office hours. Due to the pandemic, the shortcomings of traditional call center service are becoming apparent as an increasing number of people try to get service through call centers.

With the increasing difficulty of serving customers, artificial intelligence (AI) can be the most scalable and cost-efficient solution for improving customer service [2]. According to a recent smallbiztrends report titled "Local Business Websites and Google My Business Comparison Report", 60% of customers prefer to call over the phone during the pandemic instead of visiting a physical shop [14]. Known issues include service being available only during office hours, sluggish service that increases customers' waiting time, and service delays during peak periods [4]. Studies show that 66% of customers prefer to solve their issues within 10 minutes, or they may readily switch to an alternative service [1]. AI can automate resolution while cutting costs and enhancing customer satisfaction, allowing human agents to work on more complex issues [3]. Many e-commerce companies have started to implement various forms of AI to understand their customers better and provide an improved customer experience, but work with the Bangla language has seen limited progress. In addition, around 37% of e-commerce customers use automated services such as chatbots to react quickly in an emergency and to get 24/7 customer support [1].

Researchers have been working on finding the best possible solution integrating AI assistants with human customer care

This research work is funded by the ICT Innovation Fund (a2i), ICT Division, Ministry of Posts, Telecommunications and Information Technology, the People's Republic of Bangladesh.
Authorized licensed use limited to: UNIV OF ALABAMA-TUSCALOOSA. Downloaded on June 13,2023 at 17:22:35 UTC from IEEE Xplore. Restrictions apply.
Figure 2: Workflow of Speaker Recognition module
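The GMM-based speaker recognition step in Figure 2 can be sketched as below; this is a minimal illustration, not the paper's implementation, and the synthetic arrays stand in for the 13-dimensional MFCC frames the real pipeline would extract from phone audio. One Gaussian mixture is fitted per enrolled speaker and a test utterance is assigned to the model with the highest average log-likelihood.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for 13-dim MFCC frames of two enrolled speakers.
enroll = {
    "speaker_a": rng.normal(loc=0.0, scale=1.0, size=(500, 13)),
    "speaker_b": rng.normal(loc=3.0, scale=1.0, size=(500, 13)),
}

# Enrollment: fit one GMM per speaker on that speaker's feature frames.
models = {
    name: GaussianMixture(n_components=4, covariance_type="diag",
                          random_state=0).fit(feats)
    for name, feats in enroll.items()
}

def identify(frames: np.ndarray) -> str:
    """Return the enrolled speaker whose GMM gives the highest
    average log-likelihood for the test frames."""
    return max(models, key=lambda name: models[name].score(frames))

# A test utterance drawn from speaker_b's feature distribution.
test_frames = rng.normal(loc=3.0, scale=1.0, size=(200, 13))
print(identify(test_frames))  # speaker_b
```

In the real system, a caller whose frames score poorly under every enrolled model would fall through to the extra security questions described below.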
Figure 3: Sentence Summarizing Architecture
recognize the voice of the user, the system will ask some extra security questions (National Identity No., Mobile No.) and check the personal information database (PID) to see whether the user is already registered in the system. If the system finds the user's information in the PID, the previous method applies. Otherwise, the system adapts the user's voice features with a GMM-based algorithm and saves them, along with the user's information, into the PID. Afterward, the system similarly asks the reason for calling, converts it into text using ASR, saves it into the PID, applies sentence summarization, and encodes it with the sentence transformer. After a similarity check against the generic information database, the system responds to the user with the most appropriate answer from the predefined knowledge-based generic information database. Finally, the system asks the user to rate their satisfaction after using the system, and it refers the user to a human assistant if the rating is low.

Using the "Large Bengali-Automatic Speech Recognition Training Data" (LB-ASRTD), we trained our ASR model with an RNN-based end-to-end speech recognition architecture called "Deep Speech 2" (DS2) [19], developed by Baidu Research. HPC techniques are used extensively in DS2, resulting in 7x faster training than its predecessor, "Deep Speech." Batch normalization is combined with the RNNs in this architecture, and a curriculum-learning strategy called SortaGrad [19] is employed. The model minimizes the CTC loss

    L(x, y; θ) = −log Σ_{l ∈ Align(x,y)} Π_t p_ctc(l_t | x; θ)

Here, Align(x, y) denotes the set of all possible alignments of the characters of the transcription y to the frames of the input x.
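The CTC objective described above can be checked on a toy example by brute-force enumeration of alignments; this is a sketch assuming a two-symbol alphabet (blank plus one character), not the DS2 implementation, which uses dynamic programming over full character sets.

```python
import itertools
import math

# Toy per-frame distributions over {blank, 'a'} for T = 2 frames:
# probs[t][s] = p_ctc(symbol s at frame t | x)
BLANK, A = 0, 1
probs = [[0.4, 0.6],   # frame 0
         [0.3, 0.7]]   # frame 1

def collapse(path):
    """CTC collapse: merge repeated symbols, then drop blanks."""
    out = []
    for s in path:
        if out and out[-1] == s:
            continue
        out.append(s)
    return tuple(s for s in out if s != BLANK)

def ctc_loss(target, T=2):
    # Sum the probability of every frame-level path in Align(x, y),
    # i.e. every path that collapses to the target, then take -log.
    total = sum(
        math.prod(probs[t][s] for t, s in enumerate(path))
        for path in itertools.product([BLANK, A], repeat=T)
        if collapse(path) == target
    )
    return -math.log(total)

# Paths collapsing to ('a',): (A,A), (A,BLANK), (BLANK,A)
# total probability = 0.6*0.7 + 0.6*0.3 + 0.4*0.7 = 0.88
print(round(ctc_loss((A,)), 4))  # 0.1278
```

Summing over all alignments is exactly the Σ over Align(x, y) in the loss; real ASR systems replace the exponential enumeration with the forward-backward algorithm.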
into a series of tokens, add the full meaning of contractions, remove stopwords and punctuation, apply lemmatization to convert each word into its root form, and finally tag words with their parts of speech.

2) Count Vocabulary: The vocabulary is counted from our own dataset, in which 12,676 unique words are found. Of these, only 11,712 words are used, after removing English words from the Bangla articles.

3) Model Architecture: For training on the dataset, an RNN encoder-decoder and Seq2Seq learning with an attention mechanism are used to summarize the articles. This framework has three components: an encoder network, an attention network, and a decoder network. The encoder converts the input sequence into a fixed-size context vector, which represents a semantic summary of the article. This context vector serves as the initial state of the decoder's hidden units, although the numbers of time steps in the encoder and decoder are not equal. If a is the target sequence of sentences and b is the source sequence of sentences, the decoder seeks the word-vector sequence of maximum probability:

    arg max_a p(a | b)

A Sequence-to-Sequence model with Bahdanau attention is used, mapping to a fixed-length output. A pre-trained Bangla word vector file, "bn w2c model", is used for word embedding, converting all words into numerical form. The final vector-form output of each word is used for model training.

D. Workflow of Interactive Agent:

We encoded all the sentences from our database using the Sentence Transformer and saved the encoded scores in an array, as encoding them repeatedly for every user costs time. The sentences pass through a pooling layer, followed by the BERT model, to produce a vector used to predict a score. With the same procedure, we use "Paraphrase mpnet base v2" to encode new sentences as they come from users. The new sentence and the array of database sentences are then compared with a cosine-similarity check. The maximum score is likely the query the user is looking for. The system transfers the answer via the TTS system, and at the same time it enlists necessary data, for example doctors' appointments, into the database. To avoid irrelevant queries, we set a threshold of 0.96, which gives relevant results in our case.

Sentence Transformer: For text from the ASR system in the Bangla language, the model encodes every word with the BERT sentence transformer model; a pre-trained weight called "paraphrase mpnet base v2" [17] is used to convert each word into vector format. The paraphrase mpnet base v2 model is based on Microsoft's mpnet-base, with a dimension of 768 and a mean-pooling structure. The model is trained on the AllNLI, sentence compression, SimpleWiki, Quora duplicates, COCO captions, Flickr30k captions, Yahoo Answers title-questions, and S2ORC pairs datasets. BERT itself is unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. N. Reimers et al. [17] present Sentence-BERT (SBERT), which uses siamese and triplet objective functions to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort of finding the most similar pair from 65 hours with BERT/RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy of BERT. Given an anchor sentence a, a positive sentence p, and a negative sentence n, triplet loss tunes the network such that the distance between a and p is smaller than the distance between a and n. Mathematically, we minimize the following loss function [17]:

    max(||s_a − s_p|| − ||s_a − s_n|| + ε, 0)

with s_x the sentence embedding for a/p/n, ||·|| a distance metric, and ε a margin.

E. Text to Speech Synthesis

Our intended model can assist customers by interacting with them; it can reply to various speech in audio format. As we target the domain of e-commerce in Bangladesh, most of the users communicate in the Bangla language. We used the Python-based "gTTS" library to turn our text into speech. The library converts the text by calling an API that serves the best sound quality among the available TTS options. It uses DeepMind's WaveNet [15] to deliver the highest accuracy possible. The architecture of the WaveNet model is shown in Fig. 4 [15]. Furthermore, the speed of speech generation from a text response is quite impressive and useful for real-time use.

Figure 4: WaveNet Architecture

F. Database

The system consists of three types of databases: 1. Personal Information Database (PID), 2. Generic Information Database (GID), 3. Credential Information Database (CID). PID mainly
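The interactive agent's matching step (encode the database once and cache it, then compare each incoming query by cosine similarity against the 0.96 threshold) can be sketched as follows. Here `embed` is a deliberately crude, hypothetical stand-in for the "paraphrase mpnet base v2" encoder; the real system would call the sentence transformer's encode function instead.

```python
import numpy as np

def embed(sentence: str) -> np.ndarray:
    """Hypothetical stand-in for the SBERT encoder: a bag-of-characters
    vector, used only so this sketch is self-contained."""
    v = np.zeros(256)
    for ch in sentence.lower():
        v[ord(ch) % 256] += 1.0
    return v

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Generic information database: predefined query -> answer pairs.
database = {
    "What are your opening hours?": "We are open 9am-5pm.",
    "How do I reset my password?": "Use the reset link on the login page.",
}
# Encode once and cache, since re-encoding for every caller costs time.
cached = {q: embed(q) for q in database}

THRESHOLD = 0.96  # queries below this similarity are treated as irrelevant

def answer(query: str):
    """Return the answer for the most similar stored query, or None."""
    qv = embed(query)
    best_q = max(cached, key=lambda q: cosine(qv, cached[q]))
    return database[best_q] if cosine(qv, cached[best_q]) >= THRESHOLD else None

print(answer("What are your opening hours"))  # near-duplicate -> matched
print(answer("zzzz qqqq xxxx"))               # dissimilar -> None
```

The argmax implements "the maximum score is likely the expected query", and the threshold check implements the irrelevant-query rejection described above.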
Figure 5: Personal Information Database (PID)

Figure 7: Credential Information Database (CID)
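The three stores pictured above (PID, GID, CID) could be laid out as in this minimal SQLite sketch; every table and column name here is an illustrative assumption, not the paper's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Personal Information Database (PID): caller identity and voice profile.
cur.execute("""CREATE TABLE pid (
    user_id INTEGER PRIMARY KEY,
    name TEXT, nid TEXT, mobile TEXT,
    gmm_profile BLOB)""")

# Generic Information Database (GID): predefined query/answer pairs.
cur.execute("CREATE TABLE gid (query TEXT, answer TEXT)")

# Credential Information Database (CID): security-question answers.
cur.execute("""CREATE TABLE cid (
    user_id INTEGER REFERENCES pid(user_id),
    question TEXT, answer_hash TEXT)""")

cur.execute("INSERT INTO pid (name, nid, mobile) VALUES (?, ?, ?)",
            ("Test User", "1234567890", "01700000000"))
cur.execute("INSERT INTO gid VALUES (?, ?)", ("opening hours", "9am-5pm"))
conn.commit()

# The interactive agent would look up answers from GID like this:
print(cur.execute("SELECT answer FROM gid WHERE query = ?",
                  ("opening hours",)).fetchone()[0])  # 9am-5pm
```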
authors' parameters, the average loss is 0.004, which is quite satisfactory.

We used the weights of the BERT sentence transformer (paraphrase-mpnet-base-v2) trained on the STSb, DupQ, TwitterP, SciDocs, and Clustering datasets, with an average accuracy of 76.84%, shown in Table I.

Table I: Accuracy of the sentence transformer "paraphrase-mpnet-base-v2" for the quality of embedded sentences and embedded search-query paragraphs

    Dataset      Accuracy (%)
    STSb         86.99
    DupQ         87.80
    TwitterP     76.05
    SciDocs      80.57
    Clustering   52.81

IV. CONCLUSION

Our developed system can communicate with users in standardized Bangla. This AI-based architecture will automate the call center and increase the efficiency and quality of its services. Moreover, the model can reduce waiting time in the call center and elicit a positive response from customers by providing unbiased answers. In practice, however, our system may not handle converting regional forms of speech into text. As we do not include a noise-cancellation module during the phone call, accuracy may degrade in a noisy environment. The GMM-based MFCC model is suitable for a small dataset, but an increasing number of users may degrade its accuracy. In the future, we plan to replace this model with SincNet to improve overall accuracy. Nevertheless, our system is a milestone for call center support in the Bangla language, setting a state of the art for future development to ensure customer satisfaction.

V. ACKNOWLEDGEMENT

The ICT Innovation Fund from the ICT Division, Ministry of Posts, Telecommunications and Information Technology, the People's Republic of Bangladesh, funds this research work and pilot project. Full technical support is provided by AIMS Lab, United International University, Bangladesh.

REFERENCES

[1] Conduent. (2018). The State of Consumer Experience Communication, Edition 2018.
[2] BrandGarage, & Linc. (2018). How AI Technology Will Transform Customer Engagement.
[3] Microsoft. (2018). State of Global Customer Service Report.
[4] Li, F., Qiu, M., Chen, H., Wang, X., Gao, X., Huang, J., ... Chu, W. (2017). AliMe Assist: An intelligent assistant for creating an innovative e-commerce experience. In CIKM'17: Proceedings of the 2017 ACM Conference on Information and Knowledge Management (pp. 2-5). https://doi.org/10.1145/3132847.3133169
[5] Zhao, Q., Tu, D., Xu, S., Shao, H., & Meng, Q. (2014, November). Natural human-robot interaction for elderly and disabled healthcare application. In 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 39-44). IEEE.
[6] Prado, J. A., Simplício, C., Lori, N. F., & Dias, J. (2012). Visuo-auditory multimodal emotional structure to improve human-robot interaction. International Journal of Social Robotics, 4(1), 29-51.
[7] Yan, H., Ang, M. H., & Poo, A. N. (2014). A survey on perception methods for human-robot interaction in social robots. International Journal of Social Robotics, 6(1), 85-119.
[8] Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1), 72-83.
[9] Sarkar, A. K., Matrouf, D., Bousquet, P. M., & Bonastre, J. F. (2012). Study of the effect of i-vector modeling on short and mismatch utterance duration for speaker verification. In Thirteenth Annual Conference of the International Speech Communication Association.
[10] Liu, G., & Hansen, J. H. (2014). An investigation into back-end advancements for speaker recognition in multi-session and noisy enrollment scenarios. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12), 1978-1992.
[11] Hasan, T., & Hansen, J. H. (2013). Maximum likelihood acoustic factor analysis models for robust speaker verification in noise. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(2), 381-391.
[12] Zheng, M., & Meng, M. Q. H. (2012, December). Designing gestures with semantic meanings for humanoid robot. In 2012 IEEE International Conference on Robotics and Biomimetics (ROBIO) (pp. 287-292). IEEE.
[13] Nadarzynski, T., et al. (2019). Acceptability of artificial intelligence (AI)-led chatbot services in healthcare: A mixed-methods study. Digital Health, 5, 2055207619871808.
[14] https://smallbiztrends.com/2019/05/customer-contact-statistics.html
[15] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., & Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
[16] Bhattacharjee, P., Mallick, A., Islam, M. S., & Jannat, M. (2020). Bengali abstractive news summarization (BANS): A neural attention approach. In 2nd International Conference on Trends in Computational and Cognitive Engineering. arXiv:2012.01747
[17] Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using siamese BERT-networks. arXiv preprint arXiv:1908.10084.
[18] Li, F.-L., et al. (2017). AliMe Assist: An intelligent assistant for creating an innovative e-commerce experience. In Proceedings of the 2017 ACM Conference on Information and Knowledge Management.
[19] Amodei, D., et al. (2016, June). Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning (pp. 173-182). PMLR.
[20] Sun, L., et al. (2018). Design of integrated vision and speech technology for a robot receptionist. In 2018 IEEE International Conference on Mechatronics and Automation (ICMA). IEEE.
[21] Hardy, H., Strzalkowski, T., & Wu, M. (2003). Dialogue management for an automated multilingual call center. State University of New York at Albany, Institute for Informatics, Logics and Security Studies.
[22] Zweig, G., et al. (2006). Automated quality monitoring for call centers using speech and NLP technologies. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Demonstrations.
[23] McLean, G., & Osei-Frimpong, K. (2017). Examining satisfaction with the experience during a live chat service encounter-implications for website providers. Computers in Human Behavior, 76, 494-508.
[24] Warnapura, A. K., et al. (2014). Automated customer care service system for finance companies. Research and Publication of Sri Lanka Institute of Information Technology (SLIIT), NCTM.
[25] Atayero, A. A., et al. (2009). Implementation of 'ASR4CRM': An automated speech-enabled customer care service system. In IEEE EUROCON 2009. IEEE.
[26] Sultana, M., Chakraborty, P., & Choudhury, T. (2022). Bengali abstractive news summarization using Seq2Seq learning with attention. In Cyber Intelligence and Information Retrieval (pp. 279-289). Springer, Singapore.