




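News
Features Editor: Dennis Taylor

Practical Pattern Matching
Danna Voth

Humans are fascinated by patterns, and they can spot them well; in fact, that’s one area where humans excel over computers. But research is producing interesting competition as scientists discover and employ new methods of automated pattern recognition. Practical applications include finding genes, detecting cancer-causing chemicals in molecules, searching out potential terrorists, and predicting terrorist threat levels, as well as recognizing speech patterns and creating a nanotechnology resource library.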
No new genes?
At the University of Toronto, Brendan Frey is leading a group of scientists who are using AI techniques to analyze molecular-biology data. One of their projects involves using a factor graph they developed called GenRate to discover and evaluate genes in mouse tissues. Factor graphs let researchers describe a system with complex variables, such as gene location in DNA as well as gene length and function.

“What a factor graph is useful for,” says Frey, “is describing a scoring function that tells you how good each setting of the variables is.”

Drawing on samples from more than 1 million probes along DNA in 37 different mouse tissues, the scientists used their factor graph to determine which bits of DNA are expressed, that is, activated so that they are read to make protein. A given stretch of DNA might be expressed in some tissues but not in others. DNA segments that have no function are never activated.

In the factor graph, each variable is a node. The scoring function comprises many local scoring functions, each of which looks at a small number of variables. For that small set of variables, a local function assigns a score to each configuration of those variables, and the sum of the local scores is the total score. “It’s a nice way to decompose a very complex problem into a whole bunch of simpler problems,” Frey says. The scientists then compare the factor graph’s output with known gene patterns.
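To make the decomposition concrete, here is a minimal Python sketch of a factor-graph scoring function, assuming discrete variables: each local factor scores one setting of a few variables, the total score is the sum of the local scores, and a brute-force search picks the best-scoring configuration. The variables, factor tables, and function names are hypothetical and invented for illustration; this is not GenRate, and practical systems typically replace exhaustive search with message-passing inference.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# A configuration assigns a discrete value to every variable by name.
Config = Dict[str, int]

# A local factor: the few variables it looks at, plus a function that
# scores one setting of just those variables.
Factor = Tuple[Tuple[str, ...], Callable[..., float]]


def total_score(config: Config, factors: List[Factor]) -> float:
    """Sum the local scores; each factor sees only its own variables."""
    return sum(fn(*(config[v] for v in names)) for names, fn in factors)


def best_configuration(domains: Dict[str, List[int]],
                       factors: List[Factor]) -> Tuple[Config, float]:
    """Score every configuration exhaustively (feasible for toy models only)."""
    names = list(domains)
    best, best_score = None, float("-inf")
    for values in product(*(domains[n] for n in names)):
        config = dict(zip(names, values))
        score = total_score(config, factors)
        if score > best_score:
            best, best_score = config, score
    return best, best_score


# Hypothetical toy model: is each of two adjacent DNA segments expressed (1)
# or not (0)? The scores are made up for illustration.
domains = {"seg_a": [0, 1], "seg_b": [0, 1]}
factors: List[Factor] = [
    (("seg_a",), lambda a: 2.0 if a == 1 else 0.0),             # evidence for seg_a
    (("seg_b",), lambda b: -1.0 if b == 1 else 0.0),            # weak evidence against seg_b
    (("seg_a", "seg_b"), lambda a, b: 1.5 if a == b else 0.0),  # neighbors tend to agree
]

if __name__ == "__main__":
    print(best_configuration(domains, factors))
```

In this toy model the winning configuration sets both segments to 1 (total score 2.5 versus 2.0 for leaving `seg_b` at 0): the pairwise agreement factor outweighs the weak local evidence against `seg_b`, which is the kind of interaction a sum of local scores can capture.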

Because the factor graph provides a computational framework for vetting the best configuration of variables as well as discovering them, the team came up with surprising results that led to a major revision of the view of the mammalian genome. Although some research claims many genes are left to discover, Frey’s team has shown that this might not be true. “Beyond the genes we found,” Frey says, “we don’t believe there exists many new protein-coding genes.”

Cancer detection
At the University of Texas at Arlington, Lawrence Holder has developed Subdue, another pattern recognition system based on graphs. A data mining system that represents data as a collection of nodes and links between the nodes, Subdue works by searching through graph data using a heuristic based on the notion of compression.

After researchers input a big graph into the system and run a search, Subdue finds a pattern that has several instances in the graph. The system then replaces each of those instances with a single node, making the graph smaller. The larger the pattern, and the more instances it has, the more compression you get. “The more it compresses, the more we’re interested in it,” Holder says.
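The compression heuristic can be illustrated with a small Python sketch. Subdue’s published evaluation is based on minimum description length; this sketch substitutes a cruder size measure (vertices plus edges), assumes the pattern’s instances do not overlap, and uses hypothetical names and graph sizes. It computes the graph’s size after each instance collapses into one node, and a compression value that rises as the pattern gets larger or more frequent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphSize:
    vertices: int
    edges: int

    @property
    def total(self) -> int:
        return self.vertices + self.edges


def compressed_size(graph: GraphSize, pattern: GraphSize, instances: int) -> int:
    """Size after each non-overlapping instance is collapsed into one new node.

    An instance's own vertices and internal edges disappear and one vertex
    replaces them; edges linking the instance to the rest of the graph are
    not part of the pattern, so they stay in the count."""
    return graph.total - instances * pattern.total + instances


def compression_value(graph: GraphSize, pattern: GraphSize, instances: int) -> float:
    """Original size divided by (pattern size + rewritten graph size).

    Higher values mean the pattern compresses the graph more."""
    return graph.total / (pattern.total + compressed_size(graph, pattern, instances))


if __name__ == "__main__":
    graph = GraphSize(vertices=100, edges=120)
    small, large = GraphSize(4, 4), GraphSize(6, 7)
    for name, pattern, n in [("small x2", small, 2),
                             ("small x10", small, 10),
                             ("large x10", large, 10)]:
        print(name, round(compression_value(graph, pattern, n), 3))
```

For this made-up 220-unit graph the values come out at roughly 1.03, 1.39, and 1.95: more instances of the same pattern compress better, and a larger pattern with the same number of instances compresses better still, matching the rule of thumb Holder describes.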

A practical application of Subdue examines a chemical structure to determine whether it causes cancer. The system represents the chemical in terms of its atoms and the bonds between them (the atoms are nodes in the graph, and the bonds are links). For the system to learn, researchers input many cancer-causing chemicals as graphs, which the system searches to find recurring patterns, or subgraphs. It then searches through the space of subgraphs to find a pattern that shows up often. That pattern is then matched against the new chemical’s structure.

“The interpretation would be if this submolecule shows up in 90 percent of these chemicals that cause cancer, then it may be predictive,” Holder says. So, if a new chemical contains the subgraph, he says, “you might predict that this chemical may cause cancer, and you may want to go off and test it in the laboratory.”
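Holder’s reasoning can be sketched in Python under several simplifying assumptions: the pattern is supplied by hand rather than discovered, bond types are ignored, element symbols are single letters, and the molecules are made-up toy graphs. `contains` is a plain backtracking subgraph matcher, not Subdue’s own matching procedure; the 0.9 default threshold mirrors Holder’s “90 percent” example, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Toy molecule graph: atom index -> element symbol, plus undirected bonds.
Molecule = Tuple[Dict[int, str], Set[FrozenSet[int]]]


def mol(elements: str, bonds: List[Tuple[int, int]]) -> Molecule:
    """Build a molecule from single-letter element symbols and bond pairs."""
    return dict(enumerate(elements)), {frozenset(b) for b in bonds}


def contains(molecule: Molecule, pattern: Molecule) -> bool:
    """Backtracking search for a one-to-one mapping of pattern atoms onto
    molecule atoms that preserves element labels and every pattern bond."""
    m_atoms, m_bonds = molecule
    p_atoms, p_bonds = pattern
    order = list(p_atoms)

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(order):
            return True
        p = order[len(mapping)]
        used = set(mapping.values())
        for m, element in m_atoms.items():
            if m in used or element != p_atoms[p]:
                continue
            # Bonds between p and already-mapped atoms must exist in the molecule.
            if all(frozenset((m, mapping[q])) in m_bonds
                   for q in mapping if frozenset((p, q)) in p_bonds):
                mapping[p] = m
                if extend(mapping):
                    return True
                del mapping[p]  # undo and try the next candidate atom
        return False

    return extend({})


def support(pattern: Molecule, molecules: List[Molecule]) -> float:
    """Fraction of the molecules that contain the pattern."""
    return sum(contains(m, pattern) for m in molecules) / len(molecules)


def flag_for_testing(new: Molecule, pattern: Molecule,
                     carcinogens: List[Molecule], threshold: float = 0.9) -> bool:
    """Suggest lab testing if the pattern is common among known carcinogens
    and the new chemical contains it."""
    return support(pattern, carcinogens) >= threshold and contains(new, pattern)


# Hypothetical data: an N-N-O fragment and two made-up "carcinogens" containing it.
nno = mol("NNO", [(0, 1), (1, 2)])
carcinogens = [
    mol("CCNNO", [(0, 2), (1, 2), (2, 3), (3, 4)]),
    mol("CCCNNO", [(0, 1), (1, 3), (2, 3), (3, 4), (4, 5)]),
]
candidate = mol("CNNO", [(0, 1), (1, 2), (2, 3)])
unrelated = mol("CCO", [(0, 1), (1, 2)])

if __name__ == "__main__":
    print(flag_for_testing(candidate, nno, carcinogens))
    print(flag_for_testing(unrelated, nno, carcinogens))
```

Here the fragment appears in every toy carcinogen, so `candidate` (which contains it) is flagged and `unrelated` (which has no nitrogen) is not. Exhaustive backtracking is only practical for small graphs; subgraph matching in general is computationally hard.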

Holder has tested the system in the American Cancer Institute’s predictive-toxicology challenge. The Institute releases information on both a set of chemicals that it has determined to be cancer causing and a set that isn’t. Participants predict which chemicals cause cancer, and the participant with the most correct predictions wins the challenge. Holder won the competition in 2000.

1541-1672/06/$20.00 © 2006 IEEE
Published by the IEEE Computer Society

Terrorists targeted
Subdue is also useful for detecting patterns of potential terrorist activity and locating potential terrorist networks. Holder trained his system on simulated data that the US Air Force’s Evidence Assessment, Grouping, Linking, and Evaluation program created. The domain simulates the evidence available about terrorist groups and their plans before they put them into action.

The domain follows a general plan of starting a group, recruiting members, acquiring resources, communicating, visiting targets, and transferring resources between actors, groups, and targets. It contains numerous concepts, including threat and nonthreat actors and threat and nonthreat groups.

Trained on patterns that give examples of threat potential, Subdue searched the simulated data to find similar types of patterns. The system achieved 78 to 93 percent accuracy in discriminating threat from nonthreat groups.

Language learning
Cornell University professor Shimon Edelman, in collaboration with colleagues at Tel Aviv University, has created a program that can discover patterns in a language, learn them as a grammar, and then generate sentences of its own in that language. The system is called ADIOS (automatic distillation of structure), and it has been tested on both natural languages, such as English and Chinese, and artificial grammars, such as those in DNA and music.

“You can only recognize patterns if you have the right primitives, the right features,” Edelman says. “It’s like having the right


Language contains patterns that are invisible on the surface, and it’s generally thought to possess structure beyond the serial order of words: a grammar. “The true structure of the sentence is a kind of tree,” Edelman says. ADIOS combines statistics and rules applied to a body of text in a language to discover the grammar. The system can then generate sentences in that language.
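ADIOS itself is considerably more elaborate than anything that fits here, so the following Python sketch only illustrates one underlying intuition, with hypothetical function names and a made-up three-sentence corpus: words that fill the same slot (same left and right neighbor) are grouped into an equivalence class, and that class then lets the system produce sentences it never saw.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple


def equivalence_classes(corpus: List[List[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Group words that fill the same slot, i.e. share a left and right neighbor."""
    slots: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for words in corpus:
        for i in range(1, len(words) - 1):
            slots[(words[i - 1], words[i + 1])].add(words[i])
    # A slot only suggests a class if at least two different words fill it.
    return {ctx: fillers for ctx, fillers in slots.items() if len(fillers) > 1}


def generate(corpus: List[List[str]],
             classes: Dict[Tuple[str, str], Set[str]]) -> Set[Tuple[str, ...]]:
    """Produce sentences by swapping each word for any member of its class."""
    alternatives: Dict[str, Set[str]] = defaultdict(set)
    for fillers in classes.values():
        for word in fillers:
            alternatives[word] |= fillers
    sentences: Set[Tuple[str, ...]] = set()
    for words in corpus:
        # Words without a class keep only themselves as an option.
        options = [sorted(alternatives.get(w, {w})) for w in words]
        sentences.update(product(*options))
    return sentences


if __name__ == "__main__":
    corpus = [s.split() for s in ["the cat sleeps", "the dog sleeps", "a cat eats"]]
    classes = equivalence_classes(corpus)
    for sentence in sorted(generate(corpus, classes)):
        print(" ".join(sentence))
```

Because “cat” and “dog” share the slot “the _ sleeps,” the sketch also produces “a dog eats,” which never appeared in the corpus. Substituting class members everywhere is a crude generalization that can overgeneralize; deciding when such a substitution is statistically justified is the hard part a real system has to address.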

“It can do things like assign structure to a new sentence,” Edelman says. “It’s not such a big deal to recognize a pattern on which you’ve trained your system.” The team is patenting the technology, and Edelman wants to put it to commercial use. One possible arena is speech recognition technologies.
Patterns in theory

Research on the theoretical side of pattern discovery is also generating useful applications. At the University of California, Davis, Jim Crutchfield has leveraged his interest in “what a pattern is” to apply a pattern’s abstract definition, which he calls a causal state, to different kinds of processes. He defines causal states as groups of histories that lead to the same knowledge about the future.

The mathematical theory that defines causal states also points to a small number of ways to find them. Two histories belong to the same causal state when they leave an observer in the same state of predictability about the future. So, by looking at data, you can see which histories occurred at different points in time, estimate what tends to follow each one, and use those estimates to make predictions. From that definition, Crutchfield derived an algorithm that groups together histories that are predictively equivalent, that is, histories that provide the same knowledge about the future. “We just apply this in these different domains,” he says, “whether it’s a spatial pattern, like cellular automata, or time series, or looking at complex materials.”
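The definition of a causal state can be illustrated with a short Python sketch that estimates, from a symbol sequence, what follows each fixed-length history and then merges histories whose estimated futures agree. This is not Crutchfield’s algorithm: published causal-state reconstruction methods are far more careful about statistics and about merging histories consistently. The history length, future length, tolerance, and function names here are all assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def future_distributions(seq: str, past: int, future: int) -> Dict[str, Dict[str, float]]:
    """Estimate, for each length-`past` history seen in `seq`, the probability
    of each length-`future` continuation that follows it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for i in range(len(seq) - past - future + 1):
        counts[seq[i:i + past]][seq[i + past:i + past + future]] += 1
    return {
        history: {f: n / sum(c.values()) for f, n in c.items()}
        for history, c in counts.items()
    }


def group_histories(seq: str, past: int, future: int, tol: float = 0.05) -> List[List[str]]:
    """Merge histories whose estimated future distributions agree within `tol`
    (each is compared against the first history placed in a group)."""
    dists = future_distributions(seq, past, future)
    groups: List[List[str]] = []
    representatives: List[Dict[str, float]] = []
    for history in sorted(dists):
        d = dists[history]
        for group, rep in zip(groups, representatives):
            keys = set(d) | set(rep)
            if all(abs(d.get(k, 0.0) - rep.get(k, 0.0)) <= tol for k in keys):
                group.append(history)
                break
        else:
            # No existing group predicts the same future: start a new one.
            groups.append([history])
            representatives.append(d)
    return groups


if __name__ == "__main__":
    periodic = "001" * 20
    print(group_histories(periodic, past=2, future=1))
    print(group_histories(periodic, past=2, future=2))
```

For the repeating sequence 001001…, looking only one symbol ahead merges the histories “01” and “10” (both are always followed by 0), but looking two symbols ahead separates all three histories, one per phase of the cycle. That is a reminder that causal states are defined by the whole future, while any finite-data estimate depends on how much past and future it examines.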
Crutchfield has applied his causal-state definition to examining dynamical systems, irregular crystals, hidden Markov models, and cellular automata. One field of application is quantum computation.

“A current proposal for implementing molecular computers is to look at very long chain molecules and to design the interactions between the atoms in the molecular chain so that they implement various of these cellular-automata rules,” Crutchfield says. “So this pattern discovery system that we have for cellular automata is making a catalog of all the possible kinds of interactions and what sorts of information storage structures they can produce, and how those information storage structures can be moved around and interacted, and how they interact to process information.”
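The idea of local interactions implementing cellular-automata rules can be sketched with an elementary cellular automaton, the simplest family: two states per cell and nearest-neighbor interactions, with rules numbered by Wolfram’s convention. This Python sketch is illustrative only; the molecular designs Crutchfield describes are not specified here, and the function names are hypothetical. Each cell’s next state is read off the bit of the rule number indexed by its (left, self, right) neighborhood, with the ends of the chain wrapped around.

```python
from typing import List


def step(cells: List[int], rule: int) -> List[int]:
    """One synchronous update of an elementary cellular automaton with
    periodic boundaries: bit (4*left + 2*self + right) of `rule` gives
    each cell's next state."""
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def run(cells: List[int], rule: int, steps: int) -> List[List[int]]:
    """Return the initial row followed by `steps` updated rows."""
    history = [cells]
    for _ in range(steps):
        history.append(step(history[-1], rule))
    return history


if __name__ == "__main__":
    start = [0] * 15
    start[7] = 1
    for row in run(start, rule=90, steps=7):
        print("".join("#" if c else "." for c in row))
```

Rule 90 grows a nested triangular pattern from a single live cell, while rule 184 (often described as a simple traffic-flow model) moves each 1 to the right when the cell ahead is empty and conserves the number of 1s: a toy example of a structure being “moved around” purely through local interactions.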

Crutchfield is working on developing a library called the Encyclopedia of Cellular Automata. “It will be a resource for people working in nanotechnology,” he says, “to look at how to design molecular systems that have only local interactions but that will produce in their behavior large-scale structures that can be used for doing computations.”

According to Nello Cristianini, associate professor of statistics at UC Davis, people have always been attracted to patterns and pattern recognition. “In a way, this is the essence of science and most cognitive processes, such as generalization,” Cristianini says. “Now this activity has been automatized, and we rely heavily on it as a society. There would be no genome project, no speech recognition, and probably no credit card system without it. The last decade has seen a revolution in pattern recognition technology. Machine learning algorithms are now faster, simpler, and more accurate in generalization.”

The Subdue pattern-recognition system searches through graph data to find a pattern that has several instances. In this chemical-structure example, the atoms are nodes in the graph and the bonds are links in the graph.

IEEE Computer Society
Publications Office
10662 Los Vaqueros Circle, PO Box 3014
Los Alamitos, CA 90720-1314
Lead Editor
Dennis Taylor
Group Managing Editor
Crystal R. Shif
Senior Editors
Shani Murray, Dale Strok, and Linda World
Staff Editor
Rita Scanlan
Editorial Assistants
Brooke Miner and Molly Mraz
Magazine Assistant
Hilda Hosillos
Contributing Editors
Keri Schreiner and Joan Taylor
Design Director
Toni Van Buskirk
Layout/Technical Illustrations
Carmen Flores-Garvey and Alex Torres
Angela Burgess, aburgess@computer.org
Assistant Publisher
Dick Price
Membership/Circulation Marketing Manager
Georgann Carter
Business Development Manager
Sandra Brown
Senior Production Coordinator
Marian Anderson
Submissions: For detailed instructions and formatting, see the author guidelines at www.computer.org/intelligent/author.htm or log onto IEEE Intelligent Systems’ author center at Manuscript Central (www.computer.org/mc/intelligent/author.htm). Visit www.computer.org/intelligent for editorial guidelines.

Editorial: Unless otherwise stated, bylined articles as well as products and services reflect the author’s or firm’s opinion; inclusion does not necessarily constitute endorsement by the IEEE Computer Society or the IEEE.

“Sassy” Chatbot Wins with Wit
Benjamin Alfonsi

When you sit down for an online chat with computer scientist Rollo Carpenter, you’re not quite sure if it’s him on the other end or his virtual alter ego, a chatbot named George. And that, in a nutshell, is the point of Carpenter’s research.

Carpenter’s work is inspired by Japanese roboticist Masahiro Mori’s Uncanny Valley theory. In 1970, Mori asserted that, as robots become increasingly human-like in appearance, movement, and behavior, they will elicit emotional responses from human beings that border on human-to-human empathy levels. The “valley” in the theory’s name is a sharp dip along the way: a robot that seems almost, but not quite, human provokes unease rather than empathy.

Discussing digital immortalization and a cultural renaissance by way of virtual reality, Carpenter describes advanced chatbots as harbingers of a new era in machine learning. “We intend to get very close indeed to the Uncanny Valley,” he says.

Bringing home the bronze

During the 2005 Loebner Prize contest (www.loebner.net), a panel of judges found George to be the most convincing conversationalist of the four chatbot participants, which included reigning three-time champion Alice. The contest, launched in 1990 by Hugh Gene Loebner and touted as “the first formal instantiation of a Turing Test,” gauges the contestants’ “intelligence” levels.

In the contest’s 15 years, no contestant has won a silver or gold medal, awarded for convincing at least half of the judges that a text-based program or virtual persona is actually real. However, every year a bronze medal and a cash prize have gone to the most human-like program. In 2005, Carpenter’s Jabberwacky program, which hosts George, brought home the bronze. Now, George has caught the interest of the computer science community as well as thousands of visitors to the Jabberwacky site (www.jabberwacky.com).

Considering context

Many of the site’s visitors find it hard to believe George isn’t human after conversing with him. Even some of the contest’s four judges were fooled, at least initially.

“It [becomes] crystal clear within a couple of lines of communication who is human and who is bot,” according to one judge, Lila Davachi, assistant professor of psychology at New York University. “However, I found Jabberwacky to be the most interesting bot by far because it displayed some very human qualities. It was sassy and playful.”

What makes George seem so human? George appears to differ from other chatbots not only in his personality but also in how he learns. Carpenter says machine learning is trending toward being statistical and probabilistic, relying on the analysis of large volumes of data.

“Most of the chatbots that exist today work in a hard-coded, entirely predictable fashion,” he says. “A series of ‘if’ (or equivalent) statements created by the programmer evaluate the input and return known results, either as whole sentences or as modifications of the input.”
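The hard-coded style Carpenter contrasts with George can be sketched in a few lines of Python. The rules, reflection table, and function names below are invented for illustration in the spirit of ELIZA-style bots; they are not taken from Jabberwacky or any contest entry. Each reply is either a fixed sentence or a modification of the user’s own words, so the same input always produces the same output.

```python
import re

# Hypothetical hand-written rules, checked in order: (pattern, response template).
RULES = [
    (re.compile(r"\b(hello|hi)\b", re.IGNORECASE), "Hello! What would you like to talk about?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
]

FALLBACK = "Tell me more."

# Swap first- and second-person words when echoing the user's text back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    """Turn 'tired of my job.' into 'tired of your job'."""
    words = fragment.rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in words)


def reply(message: str) -> str:
    """Return the first matching rule's response, or a canned fallback."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            # Templates without {0} simply ignore the captured text.
            return template.format(*(reflect(g) for g in match.groups()))
    return FALLBACK


if __name__ == "__main__":
    for line in ["Hi there", "I am tired of my job.", "What is the weather?"]:
        print(reply(line))
```

Everything such a bot can say is fixed in advance by the rule list; by Carpenter’s account, George instead draws on data from past conversations, which is what makes its behavior less predictable.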

George is statistical, probabilistic, and data intensive, but the bot program is also something else: chaotic. “It never turns probabilities into numbers,” explains Carpenter. “It avoids looping through data, summing up an estimate of ‘fitness for purpose.’”
