Chapter 1 Introduction

In the name of Allah Almighty, the Most Gracious, the Most Merciful and the Most Beneficent.
Prepared By: Dr. Fazli Wahid;

Course: Speech Processing
Chapter 1: Introduction
By: Dr. Fazli Wahid
Department of Computer Science and Information Technology

Chapter Outline
1. Basics of AI, ML, DL
2. Speech Basics
3. Speech Processing
4. Steps in Speech Processing
5. Speech Recognition
6. Speech Synthesis
7. Fields for Speech Processing
8. Speech Recognition Using Machine Learning

Introduction
Natural                                   Artificial
➢ God created                             ➢ Man-made
➢ Has some natural phenomenon or origin   ➢ Has no natural phenomenon or origin

Examples:
Natural        Artificial
Human          Robot
Animals        Artificial animals (toys)
Birds          Drone or aeroplane
Trees          Artificial trees
Earth          Earth map
Mountains      Artificial mountains
Sea            Artificial sea
Sky            Artificial sky
Cloud          Cloud computing
Wind           Fan

Intelligence
➢ Thinking power
➢ Ability to understand
➢ Capacity of reasoning

Artificial intelligence:
➢ Making artificial objects work like natural objects
➢ Artificial objects act like natural creation

No exact and universally accepted description of AI is available yet.

Artificial:
✓ Anything having no natural configuration or real-life occurrence is known as “artificial”

Intelligence:
✓ Competence of thinking
✓ Ability to comprehend (understand)
✓ Ability to understand and profit from experience
✓ The capacity for thought and reason
✓ Aspects of intelligence include creativity, skill, consciousness (awareness), emotion and intuition (sensitivity)

✓ AI is defined as the methods, approaches and techniques for writing computer programs that can represent the working mechanism of human intelligence when fed into computers.
✓ In specific relevance to computer science, AI is referred to as the branch of computer science that simulates the properties of natural creation (mainly human beings), including
   ✓ sensing capabilities,
   ✓ intelligence and
   ✓ reasoning procedures for solving various types of problems
✓ Artificial Intelligence is the ability to design smart machines or to develop self-learning software applications that imitate (copy) the traits (characteristics) of the human mind, such as reasoning, problem-solving, planning, optimal decision making and sensory perception.
✓ AI can also be referred to as “virtual intelligence”, “soft computing” and “computational intelligence”, as described by Mohaghegh (2000).
✓ AI was first introduced to academia in 1956 (Solomonoff, and Ray, 1957).
✓ In the modern, fast-growing information and communication era, the concept of artificial intelligence (AI) has brought tremendous advances to the research community.
✓ This remarkable attraction stems from both the number and the complexity of the various types of problems prevailing in academia and industry.
Machine Learning
✓ Machine learning is the scientific study of the algorithms and statistical models that machines use to perform a specific task
✓ The tasks are performed without explicit instructions from humans
✓ Instead, machine learning relies on patterns in previous data and on inference (drawing conclusions)
✓ It is considered a subset of AI
✓ Machine learning algorithms build a mathematical model based on sample data

1. Machine:
➢ A collection of a few systems
➢ Performs a specific work
➢ Examples
   ➢ Apparatus (oxygen tank),
   ➢ Appliance (fan),
   ➢ Device (computer),
   ➢ Watch
   ➢ Car
   ➢ Aeroplane
   ➢ Laptop

2. Learning:
➢ The acquisition of knowledge or skills through
   ➢ Self learning,
   ➢ Being taught,
   ➢ Briefing
   ➢ Observations from the environment
➢ A capability of natural creation
   ➢ Humans, animals, fish, birds

3. Machine Learning:
➢ Enabling the machine to learn
➢ Basically these are
   ➢ Mathematical models
   ➢ Techniques
   ➢ Algorithms
   ➢ Methods
➢ These models learn like natural creation
➢ These are implemented inside the machine so that it may learn like natural creation

4. Machine Learning:
➢ Completely software-based models
➢ The models have different categories
➢ A subfield of Artificial Intelligence
✓ The sample data is known as "training data"
✓ The model built from the training data is used to make predictions or decisions without being explicitly programmed to perform the task
✓ Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision
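As a minimal sketch of what "building a mathematical model from training data" can mean, the example below fits a straight line to a few sample points and then uses it to predict an unseen value. The data values and the function names `fit_line` and `predict` are made up for illustration, and least-squares line fitting is only one of many possible models.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned model to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data (illustrative values only).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.0, 4.0, 6.0, 8.0]

model = fit_line(train_x, train_y)   # the "learning" step
print(predict(model, 5.0))           # prediction for an input not in the training data
```

Nothing in `predict` is written specifically for the relation between x and y; the behaviour comes entirely from the parameters learned from the training data.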
Deep Learning
✓ Deep learning is part of a broader family of machine learning methods
✓ Deep learning methods are based on artificial neural networks with complicated structures for solving complex problems
✓ Also known as deep structured learning or differentiable programming
✓ Deep learning methods have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, medical image analysis, material inspection and game programs
[Figure: nested diagram of Artificial Intelligence, Machine Learning and Deep Learning. Machine Learning is a subset of Artificial Intelligence, and Deep Learning is a subset of Machine Learning.]
Artificial Intelligence: algorithms, techniques, models; act like human; difficult for human
Machine Learning: algorithms, techniques, models; learn like human; easy for human
Deep Learning: algorithms, techniques, models; think like human; easiest for human
Note: The figure may have some minor inaccuracy
Speech processing
➢ The study of speech signals and of the methods used to process these signals.
➢ Aspects of speech processing include
   ➢ the acquisition,
   ➢ manipulation,
   ➢ storage,
   ➢ transfer and
   ➢ output of speech signals.

Speech acquisition
➢ The acquiring of speech by human beings or by a computer
➢ Speech consists of an organized set of sounds or phonemes
➢ Sounds or phonemes are used to convey meaning
➢ Phoneme: any of the distinct units of sound in a specified language that distinguish one word from another,
   ➢ for example p, b, d, and t in the English words pad, pat, bad, and bat.
➢ Language, by contrast, is an arbitrary (random) association of symbols used according to prescribed rules to convey meaning
➢ Our focus here is speech acquisition by a computer system
➢ When a computer is used to acquire sound, various devices are used for this purpose, e.g. a microphone

Speech manipulation
➢ Performing some operations on sound
➢ The operations are for improving the quality of the sound
➢ The quality is improved for clear understanding
➢ For example, there may be some noise in the sound
➢ That noise is an unwanted signal that creates disturbance
➢ Therefore some mechanism is required to remove this noise
➢ This mechanism is called a filtering technique
➢ There are different types of filters for removing various types of noise from the sound

Speech Identification
➢ Speech identification means determining the meaning of a sound
➢ For example, whether a given sound is music, normal sound or noise
➢ For this process of distinguishing, many artificial intelligence and machine learning models are used
➢ For example, in Google and YouTube speech recognition, different models are used

Speech Storage
➢ The process of storing sound for some future reference is called speech storage
➢ For example, speech recognition systems store the sounds of different individuals for future reference
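The filtering idea described above can be illustrated with a very simple smoothing filter. This is a minimal sketch, assuming the sound is already available as a list of sample values; the moving-average filter, the function name `moving_average` and the sample values are illustrative choices, not the specific filters used in any real speech system.

```python
def moving_average(signal, window=3):
    """Smooth a signal by replacing each sample with the mean of a centred window.

    Near the edges the window is shortened so it never reads outside the signal.
    """
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        start = max(0, i - half)
        end = min(len(signal), i + half + 1)
        chunk = signal[start:end]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical signal: silence with one sudden noise spike in the middle.
noisy = [0.0, 0.0, 9.0, 0.0, 0.0]
clean = moving_average(noisy, window=3)
print(clean)  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

The spike is spread out and reduced from 9.0 to 3.0. A larger window removes more noise but also blurs more of the wanted signal, which is one reason different filter types exist for different kinds of noise.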
Speech Transfer
➢ The process of moving sound data from one location to another is called speech transfer
➢ For transferring sound, different devices are used, such as radio or even TV
➢ Different communication media are used for transferring sound
➢ Both wired and wireless media are used for this purpose

Two most important components of speech processing
1. Speech recognition
2. Speech synthesis

Speech recognition
➢ Speech recognition is an interdisciplinary subfield of computer science and computational linguistics
➢ It develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers.
➢ It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT).
➢ It incorporates knowledge and research in the computer science, linguistics and computer engineering fields.

Speech Synthesis
➢ Speech synthesis is the artificial production of human speech.
➢ A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products.
➢ A text-to-speech (TTS) system converts normal language text into speech;
➢ other systems render symbolic linguistic representations, like phonetic transcriptions, into speech
Fields for Speech processing
These disciplines are involved in a speech processing system:
1. Signal processing
2. Physics
3. Pattern recognition
4. Linguistics
5. Physiology
6. Computer Science
7. Psychology

1. Signal processing
➢ The process of extracting relevant information from the speech signal in an efficient and robust manner.
➢ Using this process we can characterize the time-varying properties of the speech signal
➢ It also covers various types of signal preprocessing and post-processing to make the speech signal robust

2. Physics
➢ The science of understanding the relationship between the physical speech signal and the physiological mechanisms, that is, the human vocal tract mechanism that produces speech and the mechanism with which speech is perceived (interpreted)

3. Pattern Recognition
➢ The set of algorithms used to cluster data to create patterns and
➢ to compare a pair of patterns on the basis of feature measurement.

4. Linguistics (Language related)
➢ The relationship between sounds (phonology), words in a language (syntax), meaning of spoken words (semantics), and sense derived from the meaning (pragmatics).

5. Physiology (Study of human body working)
➢ Understanding of the higher-order mechanisms within the human central nervous system that account for speech production and perception in human beings.
6. Computer Science
➢ The study of efficient algorithms for implementing, in software and hardware, the various methods used in a practical speech recognition system

7. Psychology (Study of mind and behavior)
➢ The science of understanding the factors that enable a technology to be used by human beings in practical tasks.

Speech Recognition using ML
[Figure: block diagram of the speech recognition pipeline. Blocks: Sound Acquisition, Signal Conversion, Pre-Processing, Spectrogram Creation, Feature Extraction, Image Conversion, Image Feature Extraction, Image Feature Reduction, Recognition.]

1. Noise Removal:
➢ Unwanted parts are removed from the sound
➢ These parts create disturbance

2. Values normalization
➢ Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1.
➢ For example, there may be values 3, 6, 1200, 20, etc.
➢ Values with such different magnitudes are difficult to process
➢ So they are mapped to a common range for easy processing

3. Spectrogram creation process
➢ A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time.
➢ The spectrum represents the distribution of power over the frequencies making up the signal
➢ Frequency is the number of occurrences of a repeating event per unit of time
➢ When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams

4. Spectrogram Feature Extraction process
➢ Processing the whole spectrogram is a very complex process
➢ Therefore the important portions of the spectrogram are retrieved from the whole spectrogram
➢ These portions are treated as representative of the whole spectrogram
➢ This extraction of portions is called feature extraction
5. Image conversion
➢ The portions extracted in the last stage are difficult to process
➢ For easy processing, the spectrogram parts are converted to an image
➢ The image is a numerical representation of the spectrum
➢ It is a large number of numerical values in 2-dimensional form

6. Image Feature extraction
➢ The important features (portions) of the image are extracted
➢ These features are also numerical values

7. Image Feature Reduction
➢ The features extracted in the previous stage are large in number
➢ These features are difficult for an ML technique to process
➢ Some operations are performed to reduce this number
➢ This process is called the feature reduction stage
➢ There are different techniques used for this purpose

8. Recognition
➢ The features obtained from the previous stage are given as input to the ML model
➢ The ML model recognizes the inputs
➢ Different sounds produce different input values
➢ Based on these inputs to the ML model, the sound type is determined
➢ The output can be music sound, normal sound or other types of sound
➢ Different ML models, for example ANN, KNN, SVM, DT and deep models, are used for this purpose
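The recognition stage can be illustrated with KNN, one of the models named above. This is a minimal sketch, assuming each sound has already been reduced to a short feature vector; the feature values, the labels and the function name `knn_predict` are hypothetical and chosen only to show the mechanism.

```python
import math
from collections import Counter


def knn_predict(features, training_set, k=3):
    """Label a feature vector by majority vote among its k nearest training examples."""
    by_distance = sorted(training_set,
                         key=lambda item: math.dist(item[0], features))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical reduced feature vectors with their known sound types.
training_set = [
    ((0.9, 0.1, 0.2), "normal sound"),
    ((0.8, 0.2, 0.1), "normal sound"),
    ((0.7, 0.3, 0.2), "normal sound"),
    ((0.1, 0.9, 0.8), "music"),
    ((0.2, 0.8, 0.9), "music"),
    ((0.3, 0.7, 0.7), "music"),
]

print(knn_predict((0.85, 0.15, 0.15), training_set))  # normal sound
print(knn_predict((0.15, 0.85, 0.85), training_set))  # music
```

Different sounds give different feature values, so they fall near different groups of training examples, and the label of the nearest group is returned as the recognized sound type.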