INTRODUCTION
B. H. JUANG
Bell Labs, Murray Hill, New Jersey, USA
Many people learn from other people. But many wise people also learn from data
and almanacs, because these are more objective and because they leave the inference
of truths to those who are able to analyze them. This direct dependence on data
finds a parallel in the development of artificial intelligence in the past two
decades. Statistical methods, which operate on data rather than on rules, have become
indispensable in the field of artificial intelligence. Statistics is undoubtedly being
recognized as a form of knowledge.
Statistical methods have played and continue to play an essential role in pattern
recognition. Automatic learning from a large amount of data, as opposed to reliance
on human experts, has proven to produce results that are more consistent and effective
in many practical applications, particularly when the performance of the pattern
recognition system is itself judged statistically (e.g. in terms of the error rate). The
ever-increasing computing power at our disposal also brings efficiency to system
design and revision. We have clusters of computers, dubbed "training camps",
that churn to optimize the millions of parameters needed in a large recognition system.
While the need for human expertise remains strong for other reasons, automated
acquisition of knowledge from a large pool of data is a task that is commensurate
with the preeminence of computers.
Most pattern recognition systems deal with observations of events that display
randomness. The natural randomness or uncertainty that interests us is rarely
stationary (as in speech) or homogeneous (as in images). Sources of uncertainty can
be broadly divided into four categories: structure, production, ambience and
measurement. We are in the habit of assuming a particular kind of uncertainty when
it comes to ambience or measurement, but we often feel a bit at a loss in dealing
with the uncertainty in structure and in production. This is best illustrated by way
of an example in speech communication. We speak to express a notion in our mind.
The exact expression, in terms of the words we choose, may be influenced by the
mood or context of the conversation. When the sequence of words is spoken, our
February 7, 2001 9:22 WSPC/115-IJPRAI 00073
representing the structure of statistics such that we can learn from complex data.
The hidden Markov model is one such advance, and it has attracted much attention
in the past two decades.
The original idea of a hidden Markov model (HMM) is simply a representation
of stochastic functions of a Markov chain. It is often called a doubly stochastic
process. The model is capable of addressing heterogeneous states of randomness
via the underlying Markov chain, while the (usually homogeneous or stationary)
uncertainty in each state can be conveniently covered by ordinary distributions
or density functions. This is well suited for many natural events such as speech,
handwritten characters and, as is shown in this special issue, many signals in
vision. This is not difficult to understand since many natural events are causal
(the Markov chain) and local (the in-state distribution) — at least to a first-order
effect. The interplay of homogeneity and nonhomogeneity in these event-processes
is the unique feature that the hidden Markov model is able to address.
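The doubly stochastic structure described above can be sketched in a few lines of code: a hidden Markov chain selects the state sequence, and each state emits observations from its own local distribution. The two states, two symbols, and all probabilities below are illustrative assumptions, not taken from the text.

```python
import random

# Sketch of the "doubly stochastic" structure: an underlying Markov chain
# (causal, first-order) chooses states; each state covers its local
# uncertainty with an ordinary (here discrete) emission distribution.
transition = {            # P(next state | current state)
    "s1": {"s1": 0.7, "s2": 0.3},
    "s2": {"s1": 0.4, "s2": 0.6},
}
emission = {              # P(symbol | state): the in-state "local" uncertainty
    "s1": {"a": 0.9, "b": 0.1},
    "s2": {"a": 0.2, "b": 0.8},
}

def draw(dist):
    """Sample a key from a {outcome: probability} dictionary."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point rounding

def sample_hmm(length, start="s1"):
    """Generate (states, observations); in practice the states stay hidden."""
    state, states, obs = start, [], []
    for _ in range(length):
        states.append(state)
        obs.append(draw(emission[state]))
        state = draw(transition[state])
    return states, obs

states, obs = sample_hmm(10)
```

An observer sees only `obs`; the heterogeneity across time comes from the hidden state sequence, while each state's output is homogeneous.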
The HMM was also found to be a special, and yet very general, form of a statistical
problem called the incomplete data problem. In this formulation, the model admits the
notion of a soft, fuzzy state boundary. This gives the model the flexibility of
making as much use of the data as possible to ensure maximum reliability in the
estimated (learned) statistics, encapsulated by the values of the parameters that
define the model. In other words, the hidden Markov model offers the potential of
an effective and reliable format of knowledge, a precise representation of which will
be realized through data learning.
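The soft state boundary can be made concrete with a small forward-backward computation: instead of assigning each observation to exactly one state, the recursion yields a posterior occupancy for every state at every time, so each frame contributes fractionally to the learned statistics. The two-state discrete model below is an illustrative assumption.

```python
# Forward-backward sketch of the "soft boundary": gamma[t][i] is
# P(state i at time t | all observations), a fractional assignment.
A = [[0.7, 0.3], [0.4, 0.6]]          # state transition probabilities
B = [{"a": 0.9, "b": 0.1},            # per-state emission probabilities
     {"a": 0.2, "b": 0.8}]
pi = [0.5, 0.5]                       # initial state distribution

def state_posteriors(obs):
    T, N = len(obs), len(pi)
    alpha = [[0.0] * N for _ in range(T)]   # forward probabilities
    beta = [[1.0] * N for _ in range(T)]    # backward probabilities
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j]
                              for i in range(N)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t+1]] * beta[t+1][j]
                             for j in range(N))
    gamma = []
    for t in range(T):
        z = sum(alpha[t][i] * beta[t][i] for i in range(N))
        gamma.append([alpha[t][i] * beta[t][i] / z for i in range(N)])
    return gamma

gamma = state_posteriors(["a", "a", "b", "b"])
# each row of gamma sums to 1: no frame is hard-assigned to a single state
```

Because every observation informs every state's parameter estimates in proportion to `gamma`, all of the data is used, which is the reliability argument made above.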
The development of the HMM has undergone several important stages. The original
contributions by Baum and his colleagues laid the groundwork of the hidden
Markov model, including the problem of model estimation. This remarkable
advance was later enhanced by Poritz, who embedded autoregressive models into the
HMM framework, and by Liporace, who relaxed the limitation on the local
distribution from the family of log-concave distributions to the family of elliptically
symmetric distributions. Further expansion of the HMM capability took place when
mixture densities were introduced into the HMM framework at Bell Laboratories
in the mid-1980s, out of a need for modeling speech parameters. The significance of
the mixture-density HMM is that, given a sufficient number of mixture components,
its approximation to any distribution within a state can be made arbitrarily close,
thereby alleviating, in theory, the fundamental problem of model mismatch.
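The approximation argument for mixture densities can be illustrated with a small sketch: a weighted sum of Gaussians captures a bimodal in-state distribution that no single Gaussian fits well. The weights, means, and variances below are illustrative assumptions.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x, components):
    """components: list of (weight, mean, variance), weights summing to 1."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

# A bimodal target: two modes at -2 and +2. A single Gaussian centered at 0
# would put its peak between the modes, where the data rarely falls; the
# two-component mixture places its mass near both modes instead.
bimodal = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]
```

Adding components (and, in the multivariate case, full or diagonal covariances) refines the fit further, which is the sense in which the within-state approximation can be made arbitrarily close.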