
In the name of Allah almighty, the most gracious the most merciful and the most beneficent.

Prepared By: Dr. Fazli Wahid;


Course: Speech Processing

Chapter 1: Introduction

By: Dr. Fazli Wahid

Department of Computer Science and Information Technology


Chapter Outlines

1. Basics of AI, ML, DL


2. Speech Basics
3. Speech Processing
4. Steps in Speech Processing
5. Speech Recognition
6. Speech Synthesis
7. Fields for Speech Processing
8. Speech Recognition Using Machine Learning

Introduction

Natural
➢ God created
➢ Having some natural phenomenon or origin

Artificial
➢ Man made
➢ No natural phenomenon or origin

Intelligence
➢ Thinking power
➢ Ability to understand
➢ Capacity of reasoning

Natural          Artificial
Human            Robot
Animals          Artificial animal toys
Birds            Drone or aeroplane
Trees            Artificial trees
Earth            Earth map
Mountains        Artificial mountains
Sea              Artificial sea
Sky              Artificial sky
Cloud            Cloud computing
Wind             Fan

Artificial intelligence:
➢ Making artificial objects work like natural objects
➢ Artificial objects act like natural creation

No exact and universally accepted description of AI is available yet.

Artificial:
✓ Anything having no natural configuration or real-life occurrence is known as “artificial”

Intelligence:
✓ Competence of thinking
✓ Ability to comprehend (understand)
✓ Ability to understand and profit from experience
✓ The capacity for thought and reason
✓ Aspects of intelligence include creativity, skill, consciousness (awareness), emotion and intuition (sensitivity)

Artificial Intelligence:
✓ AI is defined as the methods, approaches and techniques for writing computer programs that can represent the working mechanism of human intelligence when fed into computers.
✓ In computer science specifically, AI refers to the branch of computer science that simulates the properties of natural creation (mainly human beings), including sensing capabilities, intelligence and reasoning procedures for solving various types of problems.
✓ Artificial intelligence is the ability to design smart machines or to develop self-learning software applications that imitate (copy) the traits (characteristics) of the human mind, such as reasoning, problem-solving, planning, optimal decision making and sensory perception.
✓ AI is also referred to as “virtual intelligence”, “soft computing” and “computational intelligence”, as described by Mohaghegh (2000).
✓ AI was first introduced to academia in 1956 (Solomonoff, and Ray, 1957).

✓ In the modern, fast-growing information and communication era, the concept of artificial intelligence (AI) has brought tremendous advances to the research community.

✓ The reason behind this remarkable attraction is the growth, both in number and in complexity, of the various types of problems prevailing in academia and industry.

Machine Learning

✓ Machine learning is the scientific study of algorithms and statistical models that machines use to
perform a specific task

✓ The tasks are performed without explicit instructions from humans

✓ It relies on previously seen patterns and inference (conclusion) instead

✓ It is considered a subset of AI

✓ Machine learning algorithms build a mathematical model based on sample data

1. Machine:
➢ A collection of a few systems
➢ Performs a specific work
➢ Examples:
    ➢ Apparatus (oxygen tank)
    ➢ Appliance (fan)
    ➢ Device (computer)
    ➢ Watch
    ➢ Car
    ➢ Aeroplane
    ➢ Laptop

2. Learning:
➢ The acquisition of knowledge or skills through
    ➢ Self learning
    ➢ Being taught
    ➢ Briefing
    ➢ Observations from the environment
➢ A capability of natural creation
    ➢ Humans, animals, fish, birds

3. Machine Learning:
➢ Enabling the machine to learn
➢ Basically these are
    ➢ Mathematical models
    ➢ Techniques
    ➢ Algorithms
    ➢ Methods
➢ These models learn like natural creation
➢ These are implemented inside a machine so that it may learn like natural creation

4. Machine Learning:
➢ Completely software-based models
➢ The models have different categories
➢ A subfield of artificial intelligence

✓ The sample data is known as "training data"

✓ The model built from the training data is used to make predictions or decisions without being explicitly programmed to perform the task

✓ Machine learning algorithms are used in a wide variety of applications, such as email
filtering and computer vision
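
The idea of building a model from training data and then using it to predict can be illustrated with a minimal sketch. It assumes a straight-line model fitted by least squares; the training pairs and the function names `fit_line` and `predict` are invented for illustration and do not come from the slides.

```python
# Hypothetical training data: (input, output) pairs made up for this sketch.
training_data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]


def fit_line(samples):
    """Learn a slope and an intercept from sample data by least squares."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    variance = sum((x - mean_x) ** 2 for x, _ in samples)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned model on an input that was not in the training data."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(training_data)   # "training": the rule is derived from data
print(predict(model, 5.0))        # "prediction": no rule was written by hand
```

Nobody typed the relationship between input and output into the program; it was estimated from the samples, which is what "without being explicitly programmed" refers to.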

Deep Learning
✓ Deep learning is part of a broader family of machine learning methods

✓ Deep learning methods are based on artificial neural networks with complicated, multi-layer structures for solving complex problems

✓ Also known as deep structured learning or differentiable programming

✓ Deep learning methods have been applied to fields including computer vision, speech recognition, natural language
processing, audio recognition, social network filtering, machine translation, bioinformatics, drug
design, medical image analysis, material inspection and game programs
[Figure: nested diagram showing Deep Learning inside Machine Learning inside Artificial Intelligence]

Artificial Intelligence
2. Algorithms, techniques, models
3. Act like human
5. Difficult for human

Machine Learning
1. Subset (of Artificial Intelligence)
2. Algorithms, techniques, models
3. Learn like human
5. Easy for human

Deep Learning
1. Subset (of Machine Learning)
2. Algorithms, techniques, models
3. Think like human
5. Easiest for human

Note: The figure may have some minor inaccuracy.
Speech processing
➢ The study of speech signals and the processing methods of signals.
➢ Aspects of speech processing include
    ➢ the acquisition,
    ➢ manipulation,
    ➢ storage,
    ➢ transfer and
    ➢ output of speech signals.

Speech acquisition
➢ Acquiring of speech by human beings or by a computer
➢ Speech consists of an organized set of sounds or phonemes
➢ Sounds or phonemes are used to convey meaning, while language is an arbitrary (random) association of symbols used according to prescribed rules to convey meaning
➢ Phoneme: any of the distinct units of sound in a specified language that distinguish one word from another,
    ➢ for example p, b, d, and t in the English words pad, pat, bad, and bat.
➢ Our focus here is speech acquisition by a computer system
➢ When a computer is used for the acquisition of sound, many devices are used for this purpose, e.g. a microphone
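
Once a microphone has acquired sound, the computer holds it as a sequence of numeric samples. The following is a minimal sketch of that representation, assuming a synthetic sine tone stands in for a real recording. The sample rate, the helper names `make_tone`, `write_wav` and `read_wav`, and the use of an in-memory buffer instead of a file are all choices made for this illustration.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second (assumed value for this sketch)


def make_tone(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return a sine tone as a list of 16-bit integer samples."""
    count = int(duration_s * sample_rate)
    return [
        int(32767 * 0.5 * math.sin(2 * math.pi * frequency_hz * i / sample_rate))
        for i in range(count)
    ]


def write_wav(target, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit samples to a WAV file or file-like object."""
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_wav(source):
    """Read mono 16-bit samples back; returns (samples, sample_rate)."""
    with wave.open(source, "rb") as wav_file:
        frames = wav_file.readframes(wav_file.getnframes())
        count = len(frames) // 2
        return list(struct.unpack("<%dh" % count, frames)), wav_file.getframerate()


buffer = io.BytesIO()                    # stands in for a file on disk
write_wav(buffer, make_tone(440.0, 0.01))
buffer.seek(0)
samples, rate = read_wav(buffer)
print(len(samples), rate)
```

Every later stage of speech processing works on lists of numbers like `samples`, whatever device produced them.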

Speech manipulation
➢ Performing some operations on sound
➢ The operations are for improving the quality of the sound
➢ The quality is improved for clear understanding
➢ For example, there is some noise in the sound
➢ That noise is actually an unwanted signal that creates disturbance
➢ Therefore some mechanism is required to remove this noise
➢ This mechanism is called a filtering technique
➢ There are different types of filters for removing various types of noise from the sound

Speech Identification
➢ Speech identification means determining the meaning (type) of a sound
➢ For example, a given sound may be music, normal sound or noise
➢ For this process of distinguishing, many artificial intelligence and machine learning models are used
➢ For example, in Google and YouTube speech recognition, different models are used

Speech Storage
➢ The process of storing sound for some future reference is called speech storage
➢ For example, speech recognition systems store the sounds of different individuals for future reference
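
Filtering can be illustrated with one of the simplest filters, a moving average. This is a minimal sketch, not the filter any particular system uses: the clean sine signal, the Gaussian noise level, the window size and the function names are all assumptions made for the example.

```python
import math
import random


def moving_average(samples, window=5):
    """Smooth a signal by replacing each sample with the mean of its neighbours."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - half)                 # shrink the window at the edges
        end = min(len(samples), i + half + 1)
        neighbourhood = samples[start:end]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return smoothed


def mean_squared_error(a, b):
    """Average squared difference between two equally long signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)  # fixed seed so the run is repeatable
clean = [math.sin(2 * math.pi * i / 100) for i in range(400)]
noisy = [value + random.gauss(0.0, 0.3) for value in clean]   # add unwanted signal
filtered = moving_average(noisy, window=9)

# The filtered signal should be closer to the clean one than the noisy input is.
print(mean_squared_error(noisy, clean), mean_squared_error(filtered, clean))
```

A moving average suppresses rapid random fluctuations but also blurs fast changes in the wanted signal, which is one reason different filter types exist for different kinds of noise.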
Speech Transfer
➢ The process of moving sound data from one location to another location is called speech transfer
➢ For transferring sound, different devices are used, such as radio or even TV
➢ Different communication media are used for transferring sound
➢ Both wired and wireless media are used for this purpose

Two most important components of speech processing
1. Speech recognition
2. Speech synthesis

Speech recognition
➢ Speech recognition is an interdisciplinary subfield of computer science and computational linguistics
➢ It develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers.
➢ It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT).
➢ It incorporates knowledge and research in the computer science, linguistics and computer engineering fields.

Speech Synthesis
➢ Speech synthesis is the artificial production of human speech.
➢ A computer system used for this purpose is called a speech computer or speech synthesizer,
➢ and can be implemented in software or hardware products.
➢ A text-to-speech (TTS) system converts normal language text into speech;
➢ other systems render symbolic linguistic representations like phonetic transcriptions into speech
Fields for Speech processing

These disciplines are involved in a speech processing system:
1. Signal processing
2. Physics
3. Pattern recognition
4. Linguistics
5. Physiology
6. Computer Science
7. Psychology

1. Signal processing
➢ The process of extracting relevant information from the speech signal in an efficient and robust manner.
➢ Using this process we can characterize the time-varying properties of the speech signal, as well as apply various types of signal preprocessing and post-processing to make the speech signal robust

2. Physics
➢ The science of understanding the relationship between the physical speech signal and the physiological mechanisms, that is, the human vocal tract mechanism that produces speech and the hearing mechanism with which speech is perceived (interpreted)
3. Pattern Recognition
➢ The set of algorithms used to cluster data to create patterns, and to compare a pair of patterns on the basis of feature measurements.

4. Linguistics (Language related)
➢ The relationship between sounds (phonology), words in a language (syntax), meaning of spoken words (semantics), and sense derived from the meaning (pragmatics).

5. Physiology (Study of human body working)
➢ Understanding of the higher-order mechanisms within the human central nervous system that account for speech production and perception in human beings.
6. Computer Science
➢ The study of efficient algorithms for implementing, in software and hardware, the various methods used in a practical speech recognition system

7. Psychology (Study of mind and behavior)
➢ The science of understanding the factors that enable a technology to be used by human beings in practical tasks.
Speech Recognition using ML

[Figure: block diagram of the speech recognition pipeline. Blocks: Sound Acquisition, Pre-Processing, Signal Conversion, Spectrogram Creation, Feature Extraction, Image Conversion, Image Feature Extraction, Image Feature Reduction, Recognition]

Speech Recognition using ML

1. Noise Removal:
➢ Unwanted parts are removed from the sound
➢ These parts create disturbance

2. Values normalization
➢ Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1.
➢ For example, there may be values 3, 6, 1200, 20, etc.
➢ These values are difficult to process
➢ So they are given other ranges for easy processing

3. Spectrogram creation process
➢ A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time.
➢ The spectrum represents the distribution of power over the values making up the signal
➢ Frequency is the number of occurrences of a repeating event per unit of time
➢ When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams

4. Spectrogram Feature Extraction process
➢ Processing the whole spectrogram is a very complex process
➢ Therefore important portions of the spectrogram are retrieved from the whole spectrogram
➢ These parts are a complete representation of the whole spectrogram
➢ This extraction of parts is called feature extraction
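
Steps 2 and 3 can be sketched in a few lines. This is an illustration under stated assumptions rather than the course's implementation: normalization is done with the min-max formula (value - min) / (max - min), the spectrogram is computed with a naive discrete Fourier transform over overlapping Hann-windowed frames, and the frame size, hop size, test tone and function names are all chosen only for the example.

```python
import cmath
import math


def min_max_normalize(values):
    """Shift and rescale values so that they range between 0 and 1."""
    low, high = min(values), max(values)
    if high == low:                      # avoid division by zero on constant input
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def power_spectrum(frame):
    """Power in each frequency bin of one frame (naive DFT, fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(total) ** 2)
    return spectrum


def spectrogram(samples, frame_size=64, hop=32):
    """One power spectrum per overlapping frame: rows are time, columns are frequency."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window tapers the frame edges before the transform.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1)))
            for i, s in enumerate(frame)
        ]
        rows.append(power_spectrum(windowed))
    return rows


# The example values from the slide end up between 0 and 1.
normalized = min_max_normalize([3, 6, 1200, 20])
print(normalized)

# A hypothetical 128 Hz tone sampled at 1024 Hz should peak in the 128 Hz bin.
sample_rate = 1024
tone = [math.sin(2 * math.pi * 128 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin * sample_rate / 64)   # bin index converted to Hz
```

The result `spec` is already a two-dimensional grid of numbers, which is why the later stages can treat a spectrogram as an image. Practical systems normally use a fast Fourier transform from a numerical library instead of the naive loop shown here.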
5. Image conversion
➢ The portions extracted in the last stage are difficult to process
➢ For easy processing, the spectrogram parts are converted to an image
➢ The image is a numerical representation of the spectrum
➢ Actually it is a large number of numerical values in two-dimensional form

6. Image Feature extraction
➢ The important features (portions) of the image are extracted
➢ These features are also numerical values

7. Image Feature Reduction
➢ The features extracted in the previous stage are large in number
➢ These features are difficult for an ML technique to process
➢ Some operations are performed to reduce this number
➢ This process is called the feature reduction stage
➢ There are different techniques used for this purpose

8. Recognition
➢ The features obtained from the previous stage are given as input to the ML model
➢ The ML model recognizes the inputs
➢ For different sounds, the input values are different
➢ Based on these inputs to the ML model, the sound type is determined
➢ The output can be music sound, normal sound or other types of sound
➢ Different ML models, for example ANN, KNN, SVM, DT and deep models, are used for this purpose
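
Steps 7 and 8 can be sketched together. The slides do not say which reduction technique is used, so this example assumes a simple one, keeping only the features with the highest variance, and pairs it with KNN, one of the models named above. The feature vectors, labels and function names are invented for illustration.

```python
import math
from collections import Counter


def select_high_variance(features, keep):
    """Feature reduction: indices of the `keep` columns with the highest variance."""
    n = len(features)
    columns = list(zip(*features))
    variances = []
    for column in columns:
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / n)
    ranked = sorted(range(len(columns)), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:keep])


def keep_columns(features, indices):
    """Drop every feature whose index is not in `indices`."""
    return [[row[i] for i in indices] for row in features]


def knn_predict(train_x, train_y, query, k=3):
    """Recognition: majority label among the k nearest training vectors."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors; columns 1 and 3 are constant and carry no information.
train_features = [
    [0.90, 0.5, 0.10, 0.5],
    [0.80, 0.5, 0.20, 0.5],
    [0.85, 0.5, 0.15, 0.5],
    [0.10, 0.5, 0.90, 0.5],
    [0.20, 0.5, 0.80, 0.5],
    [0.15, 0.5, 0.85, 0.5],
]
train_labels = ["music", "music", "music", "speech", "speech", "speech"]

indices = select_high_variance(train_features, keep=2)
reduced_train = keep_columns(train_features, indices)

new_sound = [0.88, 0.5, 0.12, 0.5]
reduced_query = keep_columns([new_sound], indices)[0]
print(indices, knn_predict(reduced_train, train_labels, reduced_query))
```

Reduction halves the number of values the model has to compare without changing the decision here, because the discarded features were identical for every sound.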