by
ABIJITH KRISHNA(NSS18EC004)
AJAY A(NSS18EC007)
ANANDHAKRISHNAN P A(NSS18EC017)
MAHESH KRISHNA M(NSS18EC047)
PRASAD R MENON
Assistant Professor
Dept. of ECE
ABSTRACT
In this paper, we present our most recent investigations in electromyographic (EMG) speech
recognition, where the activation potentials of the articulatory muscles are directly recorded
from the subject’s face via surface electrodes. In contrast to many other technologies, the
major advantage of EMG is that it allows the recognition of non-audible, i.e. silent, speech.
This makes it an interesting technology not only for mobile communication in public
environments, where speech communication may be both a confidentiality hazard and an
annoying disturbance, but also for people with speech pathologies. Silent or unvoiced speech
can be interpreted by lip reading, which is difficult, or by using electromyography (EMG)
electrodes to convert facial muscle movements into distinct signals. These signals are
processed in MATLAB and matched to a predefined word using the Dynamic Time Warping
algorithm. The identified signal can be used to control a nearby device such as a fan, a light,
a TV, or other home appliances. Thus, a silent speech interface has the potential to enable a
differently-abled person to communicate and interact with objects in their surroundings,
easing their daily lives.
Contents
1 INTRODUCTION
2 LITERATURE REVIEW
3 METHODOLOGY
4 CONCLUSION
5 BIBLIOGRAPHY
Chapter 1
INTRODUCTION
Speech is the most basic and preferred means of communication among humans.
Unfortunately, around 18.5 million individuals have a speech, voice, or language disorder.
Some of these individuals have great potential but are unable to communicate their thoughts.
There are several types of speech disorders, such as stuttering, lisps, and dysarthria
(weakness or paralysis of the speech muscles caused by conditions such as ALS or
Parkinson’s disease). We do not always need our voice to convey our thoughts and ideas to
others, and we face many problems when communicating with people who are not familiar
with our language. When we speak, each word produces a distinct pattern of facial muscle
movement.
In this project, we design a human-computer interface that detects and analyzes facial
movements and maps facial muscle movements to words, using different signal processing
methods to process the input signal retrieved from the hardware component.
Surface EMG assesses muscle function by recording muscle activity from the surface above
the muscle on the skin. Surface EMG can be recorded by a pair of electrodes or by a more
complex array of multiple electrodes. More than one electrode is needed because EMG
recordings display the potential difference (voltage difference) between two separate
electrodes. This approach has limitations: surface electrode recordings are restricted to
superficial muscles, are influenced by the depth of the subcutaneous tissue at the recording
site (which can vary greatly with a patient’s weight), and cannot reliably discriminate
between the discharges of adjacent muscles. Specific electrode placements and functional
tests have been developed to minimize these risks, thus providing reliable examinations.
Chapter 2
LITERATURE REVIEW
The most natural and powerful way of communication for humans is spoken language.
For this reason, there has been vast research into the design principles of systems able to
understand human speech and expressions. Natural language communication with machines
is typically done using automatic speech recognition (ASR) systems. In the usual setting, a
user speaks into a microphone, the ASR recognizes the speech, and the integrated
application behaves according to the established dialogue. One of the main drawbacks of
traditional speech interfaces is their limited robustness in the presence of ambient noise. To
overcome this limitation, several electromyographic (EMG) approaches have been proposed
in which acoustic speech recognition is replaced by silent-speech recognition. The
classification is based on the myoelectric signals produced in the facial muscles during
speech. This solution not only overcomes the ambient-noise problem but also provides an
alternative means of human-machine communication for people with speech disabilities,
such as after laryngectomy, as well as for elderly or convalescent people. In these cases there
is no acoustic signal coming from the user, or the signal is distorted or very weak.
Focusing on existing EMG speech recognition systems, there are mainly three possible
approaches to the problem. The first is based on phoneme recognition. This problem has
about 30 classes (approximately the number of letters in the Spanish alphabet), but the main
difficulty is delimiting where each phoneme begins and ends inside a word.
2.2 ELECTRODE POSITIONS
Bipolar electrodes were placed in the same direction of the fibers of the facial muscle and
the distance was fixed to be 1 cm. The ground electrode was placed on the forehead, and
the reference electrode was placed on the left earlobe. The impedance at each electrode
was checked to be below 10 kΩ. The eight bipolar EMG signals were acquired and
digitized (using a gUSBamp amplifier from gTec) at a sampling frequency of 2400 Hz,
power-line notch-filtered to remove the 50 Hz line interference, and band-pass filtered
between 5 and 500 Hz to remove different noise sources out of the EMG signals frequency
band. The general instrumentation was a commercial gTec amplifier and eighteen gold-
plated EMG surface electrodes (diameter: 10 mm). The recording system and software were
developed under the BCI2000 platform.
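The preprocessing chain described above (2400 Hz sampling, a 50 Hz power-line notch, and a 5–500 Hz band-pass) can be sketched in Python with SciPy. The filter order and notch quality factor below are illustrative assumptions, not values reported for the original gTec setup.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 2400  # sampling frequency in Hz, as in the recording setup

def preprocess_emg(x, fs=FS):
    """Remove 50 Hz line interference, then band-pass 5-500 Hz."""
    # 50 Hz notch filter; Q = 30 is an illustrative quality factor
    b_n, a_n = iirnotch(w0=50.0, Q=30.0, fs=fs)
    x = filtfilt(b_n, a_n, x)
    # 4th-order Butterworth band-pass over the EMG frequency band
    b_bp, a_bp = butter(4, [5.0, 500.0], btype="bandpass", fs=fs)
    return filtfilt(b_bp, a_bp, x)
```

`filtfilt` applies each filter forward and backward, so the output has no phase distortion, at the cost of doubling the effective filter order.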
Sensor locations: The skin area of each sensor location was first prepared using a
disposable shaver (if needed), alcohol wipes, and repeated tape peels to remove facial hair,
oils, and exfoliates (respectively). Position and alignment of sensors relative to anatomical
landmarks on the face and neck relied on templates. The ventral neck sensor pairs were
placed in submental and ventromedial regions, and the face sensors were placed over
supralabial and infralabial regions. Double-sided hypoallergenic adhesive tape with cutouts
for the electrode pairs was used to secure the sensors to the skin.
Many of the target muscles involved in speech production are relatively superficial and
therefore easily accessible for recording, whereas others are relatively deep
(laryngeal/pharyngeal) or otherwise poorly situated for conventional sensor placement (e.g.
intrinsic muscles of the tongue). Given that there are practical limitations to the number of
sEMG recording locations obtainable from the body surface above the speech musculature,
we sought to identify an optimal sensor configuration in preliminary experiments prior to
collecting our larger data sets. We started by identifying 6 regions across the neck and face
surface (supralabial, labial, infralabial, submental, ventromedial neck, and ventrolateral
neck) superficial to muscles involved in speech production, and identified one or two
sensor positions within each region that had been used in prior sEMG speech studies
and/or were above prominent speech muscles. The resulting 11 “default” sensor locations
served as reference points for sensor position mapping experiments performed in an adult
male and female participant. The mapping strategy involved systematically moving one or
two (at a time) single-differential bar-type sEMG sensors in 5 mm steps across a face or
neck region during mouthed speech production while maintaining the default locations for
all other regions. This enabled us to assess the sEMG speech content of each mapped
position in relation to a complete set of sEMG signals, examining several sensor
configurations for each subject. We found that the information content did not markedly
change when sensors were moved along predetermined trajectories across the labial region
(above the orbicularis oris superioris and inferioris), but that there were consistent optimal
locations in the other zones for our two test subjects in reference to their body midline. In
the supralabial region, sensor placement at a location likely above the zygomaticus minor
and levator anguli oris provided the best information. Likewise, in the infralabial region,
the sensor position above the depressor anguli oris and depressor labii inferioris provided
the most unique information. In the submental region, sensor positions above the anterior
digastric and mylohyoid were the best, as well as a location 3 cm lateral to that position
(above the platysma and lateral mylohyoid). Along the ventromedial neck, a sensor
position falling above and slightly lateral to the cricothyroid membrane was optimal, as
well as a position in the same medial location but closer to the chin. Findings from our
sensor position mapping experiment enabled us to identify 11 of the most appropriate (if
not optimal) sensor positions, but did not indicate the relative importance of each recording
location or redundancies in the data they provide.
The experiments consisted of recording EMG signals from the subjects’ forearms through
thirty surface electrodes (10 mm diameter) while the subjects responded to a visual stimulus.
The stimulus was related to sign language, and a computer program randomly generated the
gestures. The subjects’ task consisted of reproducing the random gestures displayed on the
computer monitor.
That work took its signals from hand gestures; instead, we focus on facial muscle
movements as the source of our signals.
Chapter 3
METHODOLOGY
A myoelectric signal is an electric biological signal generated by the motor neurons connected
to the muscle fibers during the contraction of a muscle. To measure these signals so as to use
them in the control system, an electromyography (EMG) circuit is required.
Electromyography is a technique for evaluating and recording the electrical activity produced
by skeletal muscles. EMG has many applications and is widely used in the biomedical field:
it is used clinically for the diagnosis of neurological and neuromuscular problems, and
modern applications include training robotic prosthetic arms and capturing accurate facial
shapes in virtual reality.
We propose a method that uses myoelectric signals to control household devices such as
lights and fans, for people who have speech disabilities. The myoelectric signals are taken
from the facial muscles using EMG sensors connected to an Arduino. We are currently
assigning specific syllables or words to specific tasks; for example, ‘O’ switches ON the
fan. The muscle movement in the face during speech is recorded by the EMG sensors with
the help of electrodes. Either the AD8232 (a heart-rate monitor sensor) or the EMG Muscle
Sensor v3 can be used; both have a 3.5 mm cable port for connecting the electrodes. The
readings are taken from the following muscles:
The sensor extracts, amplifies, and filters small biopotential signals in the presence of noisy
conditions, such as those created by motion or remote electrode placement, and connects
directly to a microcontroller. An Arduino is used as the microcontroller: it acquires the EMG
signals and sends them serially to a personal computer for signal processing, which is done
in MATLAB.

Many techniques can be used for processing the data; here we use Dynamic Time Warping
(DTW), an algorithm for measuring the similarity between two temporal sequences that
may vary in speed. For instance, similarities in walking could be detected using DTW even
if one person was walking faster than the other, or if there were accelerations and
decelerations during the course of an observation. DTW aligns two sequences of feature
vectors by warping the time axis iteratively until an optimal match (according to a suitable
metric) between the two sequences is found. By repeating the process for many recordings,
a threshold can be obtained; based on this value we can assign a task to a word or syllable.

Serial communication is possible on the Arduino Uno with the help of the SoftwareSerial
library, so we can pass the output signal to a Raspberry Pi or another Arduino for further
functions. We can control electric devices with the Raspberry Pi provided that the devices
are connected to a relay network. Using software such as Node-RED on Raspberry Pi OS,
we can program the circuit to perform a variety of tasks according to the input data.
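As a sketch of the matching step, the classic DTW recurrence can be written in a few lines. This is a generic textbook version using absolute difference as the local cost, not the project’s actual MATLAB implementation.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    Smaller values mean the sequences are more similar after the
    time axis has been optimally warped.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local cost: absolute difference
            # extend the cheapest of the three allowed warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

A recorded EMG sequence would then be compared against a stored template for each vocabulary word, and the word with the smallest DTW distance, provided it falls below the learned threshold, would be selected.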
Fig 3.3: Arduino and sensor
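The final control step, assigning a task to each recognized word or syllable, can be sketched as a simple dispatch table. Only the ‘O’-switches-the-fan example comes from the text; the other entry is an illustrative placeholder, and in the real system each action would toggle a relay via the Raspberry Pi.

```python
def fan_on():
    # placeholder: in the real system this would drive a relay channel
    print("fan: ON")

def light_on():
    # hypothetical second command, for illustration only
    print("light: ON")

# map recognized words/syllables to actions; 'O' -> fan ON is the
# example from the text, the rest is illustrative
ACTIONS = {
    "O": fan_on,
    "L": light_on,
}

def handle_word(word, actions=ACTIONS):
    """Run the action assigned to a recognized word, if any."""
    action = actions.get(word)
    if action is None:
        return False  # unknown word: ignore
    action()
    return True
```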
Chapter 4
CONCLUSION
Our work, currently at an initial stage, proposes a method for people with a speech
impediment to communicate with others and to control IoT devices in their surroundings.
A differently-abled person with silent speech can control appliances in their surroundings,
easing their life and thus making them more independent. After the completion of the
first fully functional prototype, its usability will be assessed by a differently-abled person,
whose feedback will be essential for further development. A language-translation function
could also be added so that the recognized silent speech can be converted to another
language.
Bibliography
[1] A. Kapur, S. Kapur, and P. Maes, "AlterEgo: A Personalized Wearable Silent
Speech Interface," 23rd International Conference on Intelligent User
Interfaces (IUI 2018), pp. 43-53, March 2018.
[6] Wand, M. and Schultz, T., 2009, January. Towards Speaker-adaptive Speech
Recognition based on Surface Electromyography. In Biosignals (pp. 155-162).
[7] K. R. Wheeler, "Device control using gestures sensed from EMG," Proceedings of
the 2003 IEEE International Workshop on Soft Computing in Industrial
Applications, 2003. SMCia/03., 2003, pp. 21-26, doi:
10.1109/SMCIA.2003.1231338.
[8] V. Asanza, E. Peláez, F. Loayza, I. Mesa, J. Díaz and E. Valarezo, "EMG Signal
Processing with Clustering Algorithms for motor gesture Tasks," 2018 IEEE Third
Ecuador Technical Chapters Meeting (ETCM), 2018, pp. 1-6, doi:
10.1109/ETCM.2018.8580270.
[9] G. S. Meltzner, J. T. Heaton, Y. Deng, G. De Luca, S. H. Roy and J. C. Kline,
"Silent Speech Recognition as an Alternative Communication Device for Persons
With Laryngectomy," in IEEE/ACM Transactions on Audio, Speech, and Language
Processing, vol. 25, no. 12, pp. 2386-2398, Dec. 2017, doi:
10.1109/TASLP.2017.2740000.
[10] N. S. Jong, A. G. S. de Herrera and P. Phukpattaranont, "Multimodal Data Fusion
of Electromyography and Acoustic Signals for Thai Syllable Recognition," in IEEE
Journal of Biomedical and Health Informatics, vol. 25, no. 6, pp. 1997-2006, June
2021, doi: 10.1109/JBHI.2020.3034158.
[11] de Freitas, R.C., Alves, R., da Silva Filho, A.G., de Souza, R.E., Bezerra, B.L. and
dos Santos, W.P., 2019. Electromyography-controlled car: A proof of concept based
on surface electromyography, Extreme Learning Machines and low-cost open
hardware. Computers & Electrical Engineering, 73, pp.167-179.
[12] Vasylkiv, Y., Neshati, A., Sakamoto, Y., Gomez, R., Nakamura, K. and Irani, P.,
2019, January. Smart Home Interactions for People with Reduced Hand Mobility
Using Subtle EMG-Signal Gestures. In ITCH (pp. 436-443).
[13] L. Shao, "Facial Movements Recognition Using Multichannel EMG Signals," 2019
IEEE Fourth International Conference on Data Science in Cyberspace (DSC), 2019,
pp. 561-566, doi: 10.1109/DSC.2019.00091.
[14] Chan, A.D., Englehart, K., Hudgins, B. and Lovely, D.F., 2001. Myo-electric
signals to augment speech recognition. Medical and Biological Engineering and
Computing, 39(4), pp.500-504.
[15] H. Cha, S. Choi and C. Im, "Real-Time Recognition of Facial Expressions Using
Facial Electromyograms Recorded Around the Eyes for Social Virtual Reality
Applications," in IEEE Access, vol. 8, pp. 62065-62075, 2020, doi:
10.1109/ACCESS.2020.2983608.