
International Journal of Pattern Recognition and Artificial Intelligence


Vol. 15, No. 1 (2001) 5–7
© World Scientific Publishing Company

INTRODUCTION

A SIMPLE COMPLEX IN ARTIFICIAL INTELLIGENCE
AND MACHINE LEARNING

B. H. JUANG
Bell Labs, Murray Hill, New Jersey, USA

Many people learn from other people. But many wise people also learn from data
and almanacs, because these are more objective and because they leave the inference
of truths to those who are able to analyze them. This direct reliance on data
finds a parallel in the development of artificial intelligence over the past two
decades. Statistical methods, which operate on data rather than rules, have become
indispensable in the field of artificial intelligence. Statistics is undoubtedly being
recognized as a form of knowledge.
Statistical methods have played and continue to play an essential role in pattern
recognition. Automatic learning from data, and from large amounts of it, as opposed
to learning from human experts, has proven to produce results that are more consistent
and effective in many practical applications, particularly when the performance of the
pattern recognition system is itself judged statistically (e.g. in terms of the error
rate). The ever-increasing computing power at our disposal also leads to efficiency
in system design and revision. We have clusters of computers — dubbed “training
camps” — that churn to optimize the millions of parameters needed in a large
recognition system. While the need for human expertise remains strong for other
reasons, the automated acquisition of knowledge from a large pool of data is a task
that is commensurate with the preeminence of computers.
Most pattern recognition systems deal with observations of events that display
randomness. The natural randomness or uncertainty that interests us is rarely
stationary (as in speech) or homogeneous (as in images). Sources of uncertainty can
be broadly divided into four categories: structure, production, ambience and
measurement. We have the habit of assuming a particular kind of uncertainty when
it comes to ambience or measurement, but we often feel a bit at a loss in dealing
with the uncertainty in structure and in production. This is best illustrated by way
of example in speech communication. We speak to express a notion in our mind.
The exact expression in terms of the words we choose may be influenced by the
mood or context of the conversation. When the sequence of words is spoken, our
articulation apparatus manages to produce the corresponding sounds. The speech
that reaches our ear (or the machine that is supposed to be recognizing the speech)
would inevitably contain uncertainty due to the ambient factor as well as the
fluctuation in our hearing mechanism. The uncertainties in structure and production
relate to the unpredictable nature of the way the word sequence is formed in our
mind and, particularly, the sounds produced by our vocal device. These various
kinds of uncertainty interact with each other, making the process rather interesting
and hard to cope with if we have only simple forms of probabilistic models to record
the random behavior of the event. This calls for new mathematical tools or models
for representing the structure of statistics such that we can learn from complex
data. The hidden Markov model is one such advance; it has attracted a great deal
of attention over the past two decades.
The original idea of a hidden Markov model (HMM) is simply a representation
of stochastic functions of a Markov chain. It is often called a doubly stochastic
process. The model is capable of addressing heterogeneous states of randomness
via the underlying Markov chain, while the (usually homogeneous or stationary)
uncertainty in each state can be conveniently covered by ordinary distributions
or density functions. This is well suited for many natural events such as speech,
handwritten characters and, as is shown in this special issue, many signals in
vision. This is not difficult to understand since many natural events are causal
(the Markov chain) and local (the in-state distribution) — at least to a first-order
effect. The interplay of homogeneity and nonhomogeneity in these event-processes
is the unique feature that the hidden Markov model is able to address.
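
To make the doubly stochastic picture concrete, the following is a minimal
sketch, with entirely illustrative parameter values, of sampling from a
two-state HMM with Gaussian emissions: the hidden Markov chain supplies the
causal state-to-state movement, and each state's own distribution supplies
the local observations.

```python
import numpy as np

# Illustrative two-state HMM; all numbers are made up for the sketch.
A = np.array([[0.9, 0.1],        # Markov-chain transition probabilities
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])        # initial state distribution
means = [0.0, 3.0]               # per-state Gaussian emission means
stds = [1.0, 0.5]                # per-state Gaussian emission deviations

rng = np.random.default_rng(0)
state = rng.choice(2, p=pi)      # the hidden chain picks a state...
observations = []
for _ in range(10):
    # ...and the state's local, in-state distribution emits an observation
    observations.append(rng.normal(means[state], stds[state]))
    state = rng.choice(2, p=A[state])   # causal, first-order Markov step
print(observations)
```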
The HMM was also found to be a special and yet very general form of a statistical
problem called the incomplete data problem. In the formulation, it assumes the
notion of a soft and fuzzy state boundary. This gives the model the flexibility of
making as much use of the data as possible to ensure maximum reliability in the
estimated (learned) statistics, encapsulated by the values of the parameters that
define the model. In other words, the hidden Markov model offers the potential of
an effective and reliable format of knowledge, a precise representation of which will
be realized through data learning.
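
As a rough computational illustration of this incomplete-data view (my own
sketch, not taken from the article), the forward-backward recursion below
turns the soft, fuzzy notion of state boundaries into explicit per-frame state
posteriors; it is these posteriors, rather than hard state assignments, that
the parameter re-estimation consumes, so every frame of data contributes to
every state it might plausibly occupy.

```python
import numpy as np

def state_posteriors(A, pi, B):
    """Return P(state at time t = i | entire observation sequence).

    A: (N, N) transition matrix, pi: (N,) initial distribution,
    B[t, i]: precomputed likelihood of observation t under state i.
    Numerical rescaling is omitted to keep the sketch short.
    """
    T, N = B.shape
    alpha = np.zeros((T, N))               # forward probabilities
    beta = np.ones((T, N))                 # backward probabilities
    alpha[0] = pi * B[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])
    gamma = alpha * beta                   # unnormalized state posteriors
    return gamma / gamma.sum(axis=1, keepdims=True)
```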
The development of the HMM has undergone several important stages. The original
contributions by Baum and his colleagues laid the groundwork of the hidden
Markov model, including the problem of model estimation. This remarkable advance
was later enhanced by Poritz, who embedded autoregressive models into the HMM
framework, and by Liporace, who relaxed the limitation on the local distribution
from the family of log-concave distributions to the family of elliptically
symmetric distributions.
symmetric distributions. Further expansion of the HMM capability took place when
mixture densities were introduced into the HMM framework at Bell Laboratories
in the mid-1980, out of a need for modeling speech parameters. The significance of
mixture-density HMM is that, given a sufficient number of mixture components,
its approximation to any distribution within a state can be made arbitrarily close,
thereby alleviating, in theory, the fundamental problem of model mismatch.
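
A brief sketch of such a state-conditional mixture density, with illustrative
values rather than anything drawn from the Bell Laboratories systems: the
weighted sum of Gaussian components below can fit, for example, a bimodal
within-state distribution that no single Gaussian could match.

```python
import numpy as np

def mixture_density(x, weights, means, stds):
    """Weighted sum of Gaussian component densities for one state."""
    comps = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    return float(np.dot(weights, comps))

# e.g. a bimodal state density approximated by two components
print(mixture_density(1.2,
                      weights=np.array([0.3, 0.7]),
                      means=np.array([0.0, 2.0]),
                      stds=np.array([1.0, 0.5])))
```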


The HMM is a powerful mathematical formalism and yet is computationally
straightforward. Most of its related algorithms are linear-time and intuitive. The
model is a complex of thoughts, but its reasoning and execution are brilliantly
simple. It is a simple complex.
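
Viterbi decoding is a representative example of that linear-time simplicity;
the sketch below (illustrative code, not from the article) recovers the most
likely state path in a single pass whose cost grows linearly with the length
of the observation sequence.

```python
import numpy as np

def viterbi(A, pi, B):
    """Most likely state sequence; one pass over the T observations,
    so the cost is linear in T (with an N^2 factor for N states).

    B[t, i] is the likelihood of observation t under state i.
    """
    T, N = B.shape
    delta = np.log(pi) + np.log(B[0])           # log domain avoids underflow
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)     # best path ending i -> j
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[t])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):               # trace the stored back-pointers
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]
```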
The HMM has been successfully applied to speech recognition. Today, almost all
automatic speech recognition systems are based on the HMM. Other applications that
have been reported include financial market prediction, cryptanalysis, and object
recognition, to name a few. It is only a matter of time before people find it useful
in many more fields as well. This issue of the International Journal of Pattern
Recognition and Artificial Intelligence marks such a worthwhile undertaking.

