
ASR system

Rashmi Kethireddy
Acoustic Modeling
What does it do?

Different acoustic models in an ASR system

- HMM+GMM
- HMM+DNN
- Connectionist Temporal Classification (CTC)
- Sequence to sequence neural network model
Acoustic Modeling - Markov assumption
The Markov assumption states that the next state depends only on the current
state, not on any earlier states.
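In symbols, for a state sequence $q_1, q_2, \dots, q_T$, this is the first-order Markov property:

$$P(q_t \mid q_1, q_2, \dots, q_{t-1}) = P(q_t \mid q_{t-1})$$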


[1] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition."

[2] D. Jurafsky and J. H. Martin, Speech and Language Processing (3rd ed. draft), Appendix A: https://web.stanford.edu/~jurafsky/slp3/A.pdf
Acoustic Modeling - Hidden Markov Model
Acoustic Modeling - Markov terminology
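For reference, the standard HMM notation (following Rabiner [1]) is:

$$\lambda = (A, B, \pi), \qquad a_{ij} = P(q_{t+1} = j \mid q_t = i), \qquad b_j(o_t) = P(o_t \mid q_t = j), \qquad \pi_i = P(q_1 = i)$$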
Acoustic Modeling - HMM Example
Given a sequence of observations O (each observation an integer representing the number of ice creams eaten on a
given day), find the ‘hidden’ sequence Q of weather states (H or C) which caused Jason to eat the ice
cream.
Acoustic Modeling - Three problems for speech
Acoustic Modeling : Solution 1 (computing likelihood)
What is the probability of the sequence of observations “3 1 3”, given the model λ?

Here the state sequence is hidden. Let us first compute the probability for one particular state sequence, “hot hot cold”:
P(3 1 3|hot hot cold) = P(3|hot) × P(1|hot) × P(3|cold) = 0.4 × 0.2 × 0.1 = 0.008
P(3 1 3, hot hot cold) = P(hot|start) × P(hot|hot) × P(cold|hot) × P(3|hot) × P(1|hot) × P(3|cold)

For an HMM with N hidden states and an observation sequence of T observations, there are N^T possible hidden state sequences, so enumerating them all quickly becomes intractable.
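Summing the joint probability over all of these hidden sequences gives the total likelihood; the forward algorithm (next slides) computes this sum efficiently with dynamic programming:

$$P(O \mid \lambda) = \sum_{Q} P(O, Q \mid \lambda) = \sum_{Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)$$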
Acoustic Modeling : Forward Algorithm
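A minimal Python sketch of the forward recursion for the ice-cream HMM (not taken from the slides). Only the emission values P(3|hot)=0.4, P(1|hot)=0.2 and P(3|cold)=0.1 appear in the example above; the initial and transition probabilities below are illustrative assumptions.

# Illustrative ice-cream HMM (assumed initial/transition values; emissions partly from the slides)
states  = ("hot", "cold")
start_p = {"hot": 0.8, "cold": 0.2}
trans_p = {"hot": {"hot": 0.7, "cold": 0.3}, "cold": {"hot": 0.4, "cold": 0.6}}
emit_p  = {"hot": {1: 0.2, 2: 0.4, 3: 0.4}, "cold": {1: 0.5, 2: 0.4, 3: 0.1}}

def forward(obs):
    # alpha[t][s] = P(o_1 .. o_t, q_t = s | lambda)
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for t in range(1, len(obs)):
        alpha.append({s: sum(alpha[t - 1][r] * trans_p[r][s] for r in states) * emit_p[s][obs[t]]
                      for s in states})
    # Total likelihood P(O | lambda): sum over the states at the final time step
    return sum(alpha[-1].values())

print(forward([3, 1, 3]))   # P("3 1 3" | lambda) in O(N^2 T) time instead of O(N^T)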
Decoder : Solution 2 (Viterbi Algorithm)
In the ice-cream domain, given a sequence of ice-cream observations 3 1 3 and an HMM, the task of the decoder is to find the best hidden weather sequence (H H H).
Decoder : Viterbi Algorithm
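A compact Viterbi sketch for the same ice-cream HMM (not taken from the slides). Only the emission values are from the example above; the initial and transition probabilities are illustrative assumptions, chosen so that decoding 3 1 3 yields H H H as stated on the slide.

# Illustrative ice-cream HMM (assumed initial/transition values; emissions partly from the slides)
states  = ("hot", "cold")
start_p = {"hot": 0.8, "cold": 0.2}
trans_p = {"hot": {"hot": 0.7, "cold": 0.3}, "cold": {"hot": 0.4, "cold": 0.6}}
emit_p  = {"hot": {1: 0.2, 2: 0.4, 3: 0.4}, "cold": {1: 0.5, 2: 0.4, 3: 0.1}}

def viterbi(obs):
    # v[t][s] = probability of the best state path that ends in state s at time t
    v = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    backptr = [{}]
    for t in range(1, len(obs)):
        v.append({})
        backptr.append({})
        for s in states:
            best = max(states, key=lambda r: v[t - 1][r] * trans_p[r][s])
            v[t][s] = v[t - 1][best] * trans_p[best][s] * emit_p[s][obs[t]]
            backptr[t][s] = best
    # Trace back from the best final state
    path = [max(states, key=lambda s: v[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, backptr[t][path[0]])
    return path

print(viterbi([3, 1, 3]))   # ['hot', 'hot', 'hot'] under these assumed parameters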
Solution 3: Learning A,B (Expectation maximization (EM))
● EM is an iterative algorithm: it computes an initial estimate of the probabilities, then uses those estimates to
compute better estimates, and so on, iteratively improving the probabilities that it learns.

● Suppose we knew both the input observations and the aligned hidden states:


3 3 2 1 1 2 1 2 3
hot hot cold cold cold cold cold hot hot

π_hot = 1/3, π_cold = 2/3

P(hot|hot) = 2/3    P(cold|hot) = 1/3    P(cold|cold) = 1/2    P(hot|cold) = 1/2

P(1|hot) = 0/4 = 0      P(1|cold) = 3/5 = 0.6
P(2|hot) = 1/4 = 0.25   P(2|cold) = 2/5 = 0.4
P(3|hot) = 3/4 = 0.75   P(3|cold) = 0/5 = 0
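A tiny Python sketch of this supervised counting step (not from the slides): it reproduces the emission estimates above from the aligned observation/state pairs. The initial and transition estimates additionally depend on how the data is split into sequences, so only the emissions are computed here.

from collections import Counter

obs    = [3, 3, 2, 1, 1, 2, 1, 2, 3]
states = ["hot", "hot", "cold", "cold", "cold", "cold", "cold", "hot", "hot"]

state_counts = Counter(states)            # hot: 4, cold: 5
pair_counts  = Counter(zip(states, obs))  # (state, observation) pair counts

for s in ("hot", "cold"):
    for o in (1, 2, 3):
        print(f"P({o}|{s}) = {pair_counts[(s, o)]}/{state_counts[s]}"
              f" = {pair_counts[(s, o)] / state_counts[s]:.2f}")
# e.g. P(1|hot) = 0/4 = 0.00, P(3|hot) = 3/4 = 0.75, P(1|cold) = 3/5 = 0.60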

In practice the aligned hidden states are not known. The Baum-Welch algorithm solves this by iteratively estimating the counts: we
start with an estimate for the transition and observation probabilities and then use these estimated probabilities
to derive better and better probabilities.
Solution 3 : Backward Algorithm
Acoustic Modeling : Solution 3
Third problem for HMMs: learning the parameters of an HMM, that is, the A and B matrices.
Forward-Backward Algorithm
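For reference, the forward, backward, and state-occupancy quantities used by the forward-backward (Baum-Welch) algorithm are, in the standard notation (see [2]):

$$\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(o_t), \qquad
\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad
\gamma_t(j) = \frac{\alpha_t(j)\, \beta_t(j)}{P(O \mid \lambda)}$$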
HMM-GMM
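As a standard formulation (not spelled out on the slide), an HMM-GMM acoustic model lets each HMM state j model the emission probability of an acoustic feature vector o_t with a Gaussian mixture:

$$b_j(o_t) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(o_t;\, \mu_{jm}, \Sigma_{jm}), \qquad \sum_{m=1}^{M} c_{jm} = 1$$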
Pronunciation Modeling
Maps each word to a sequence of phones. This mapping is called a pronunciation dictionary (lexicon).

CMUdict is a widely used pronunciation dictionary for English.
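For example, CMUdict-style entries map a word to its ARPAbet phone sequence, roughly of this form:

HELLO   HH AH0 L OW1
SPEECH  S P IY1 CH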


Language modeling - N-grams

Example corpus:

1. He said thank you.
2. He said bye as he walked through the door.
3. He went to San Diego.
4. San Diego has nice weather.
5. It is raining in San Francisco.
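A small Python sketch of bigram (2-gram) estimation over the example corpus above; the lowercasing and whitespace tokenization are simplifying assumptions, and no smoothing is applied:

from collections import Counter

corpus = [
    "he said thank you",
    "he said bye as he walked through the door",
    "he went to san diego",
    "san diego has nice weather",
    "it is raining in san francisco",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

# Maximum-likelihood bigram estimate: P(w2 | w1) = count(w1 w2) / count(w1)
print(bigrams[("san", "diego")] / unigrams["san"])        # 2/3: "san" is followed by "diego" twice
print(bigrams[("san", "francisco")] / unigrams["san"])    # 1/3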
WFST
FSA (Finite State Automaton): can represent an infinite set of strings using a
finite number of states. It reads one symbol at a time and either accepts or rejects the
string.

WFSA (Weighted Finite State Automaton): in addition to accepting or rejecting a
string, it outputs a weight, for example a probability.

WFST (Weighted Finite State Transducer): maps an input string to an output string
over the output alphabet, together with a weight (probability).

[1] Mehryar Mohri, Fernando Pereira, and Michael Riley, "Weighted Finite-State Transducers in Speech Recognition."
Representation of WFST
● Maps sequences of input symbols to sequences of output symbols with some probabilities.

● The input sequence abcd maps to XYZW with transition probabilities 0.1, 0.2, 0.5, 0.1.
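A minimal sketch (not a real WFST library API) of this linear transducer, represented as a list of arcs (input label, output label, weight); the states are left implicit as a single chain:

# Each arc: (input label, output label, weight)
arcs = [("a", "X", 0.1), ("b", "Y", 0.2), ("c", "Z", 0.5), ("d", "W", 0.1)]

def transduce(arcs, inp):
    # Map the input string through the chain and multiply the arc weights.
    assert len(inp) == len(arcs)
    out, weight = [], 1.0
    for symbol, (i, o, w) in zip(inp, arcs):
        assert symbol == i    # strings that do not match the arcs are rejected
        out.append(o)
        weight *= w
    return "".join(out), weight

print(transduce(arcs, "abcd"))   # approximately ('XYZW', 0.001) = 0.1 * 0.2 * 0.5 * 0.1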
WFST Operations
● Composition
○ Compose two transducers
● Determinization
○ Unique start state
○ No two transitions from a state share the same label
○ No epsilon
● Minimization
○ Removal of equivalent states.
Composition
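A toy illustration of what composition means, using single-state transducers written as Python dicts (an illustrative simplification; real WFST composition also tracks state pairs and epsilon transitions):

# T1 maps input symbols to intermediate symbols, T2 maps intermediate symbols to outputs.
T1 = {"a": ("X", 0.5), "b": ("Y", 0.4)}
T2 = {"X": ("p", 0.2), "Y": ("q", 0.9)}

def compose(t1, t2):
    # T1 o T2: follow an arc of T1, then an arc of T2, multiplying the weights.
    composed = {}
    for inp, (mid, w1) in t1.items():
        if mid in t2:
            out, w2 = t2[mid]
            composed[inp] = (out, w1 * w2)
    return composed

print(compose(T1, T2))   # approximately {'a': ('p', 0.1), 'b': ('q', 0.36)}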
Determinization
Minimization
