In the News
Features Editor: Dennis Taylor, dtaylor@computer.org
Also featured this issue: “Sassy” Chatbot Wins with Wit

Practical Pattern Matching

Danna Voth

IEEE Intelligent Systems. 1541-1672/06/$20.00 © 2006 IEEE. Published by the IEEE Computer Society.

Humans are fascinated by patterns, and they can spot them well; in fact, that’s one area where humans excel over computers. But research is producing interesting competition as scientists discover and employ new methods of automated pattern recognition. Practical applications include finding genes, detecting cancer-causing chemicals in molecules, searching out potential terrorists, and predicting terrorist threat levels, as well as recognizing speech patterns and creating a nanotechnology resource library.

No new genes?

At the University of Toronto, Brendan Frey is leading a group of scientists who are using AI techniques to analyze molecular-biology data. One of their projects involves using a factor graph they developed called GenRate to discover and evaluate genes in mouse tissues. Factor graphs let researchers describe a system with complex variables, such as gene location in DNA as well as gene length and function.

“What a factor graph is useful for,” says Frey, “is describing a scoring function that tells you how good each setting of the variables is.”

Using samples from over 1 million probes along DNA in 37 different mouse tissues, the scientists used their factor graph to determine which bits of DNA are expressed, or activated to read protein. In some tissues, the DNA is expressed; in others, it might not be. DNA parts that have no function are never activated.

In the factor graph, each variable is a node. The scoring function comprises many local scoring functions that look for a small number of variables. For that small set of variables, it finds a score for each configuration of those variables. The local scores’ sum is the total score. “It’s a nice way to decompose a very complex problem into a whole bunch of simpler problems,” Frey says. The scientists then compare the factor graph data to known gene patterns.

Because the factor graph provides a computational framework for vetting the best configuration of variables as well as discovering them, the team came up with surprising results that led to a major revision of the view of the mammalian genome. Although some research claims many genes are left to discover, Frey’s team has shown that might not be true. “Beyond the genes we found,” Frey says, “we don’t believe there exists many new protein-coding genes.”

Cancer detection

At the University of Texas at Arlington, Lawrence Holder has developed Subdue, another pattern recognition system based on graphs. A data mining system that represents data as a collection of nodes and links between the nodes, Subdue works by searching through graphic data using a heuristic based on the notion of compression.

After researchers input a big graph into the system and run a search, Subdue finds a pattern that has several instances in the graph. The system then replaces all of those instances by a single node, making the graph smaller. The larger the pattern, and the more instances it has, the more compression you get. “The more it compresses, the more we’re interested in it,” Holder says.

A practical application of Subdue examines a chemical structure to determine whether it causes cancer. The system represents the chemical in terms of its atoms and the bonds between them (the atoms are nodes in the graph, and the bonds are links). For the system to learn, researchers input many cancer-causing chemicals as graphs, which the system searches to find recurring patterns, or subgraphs. It then searches through the space of subgraphs to find a pattern that shows up a lot. That pattern is then matched against the new chemical’s structure.

“The interpretation would be if this submolecule shows up in 90 percent of these chemicals that cause cancer, then it may be predictive,” Holder says. So, if a new chemical contains the subgraph, he says, “you might predict that this chemical may cause cancer, and you may want to go off and test it in the laboratory.”

Holder has tested the system in the American Cancer Institute’s predictive-toxicology challenge. The Institute releases information on both a set of chemicals that it has determined to be cancer causing and a set that isn’t. Participants speculate which chemicals cause cancer, and the one with the most correct guesses wins the challenge. Holder won the competition in 2000.
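The factor-graph scoring that Frey describes under “No new genes?” (many local scores, each over a few variables, summed into a total) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the variable names, the two local scoring functions, and their weights are invented, and the brute-force search stands in for whatever inference GenRate actually uses, which the article does not describe.

```python
from itertools import product

# Hypothetical binary variables: is each of three DNA segments expressed?
variables = ["seg_a", "seg_b", "seg_c"]
domain = (0, 1)


def prefers_on(x):
    # Local score over a single variable (made-up weight).
    return 1.0 if x == 1 else 0.0


def agree(x, y):
    # Local score over two neighbouring variables: reward agreement.
    return 2.0 if x == y else 0.0


# Each factor lists the small set of variables it looks at, plus its scorer.
factors = [
    (("seg_a",), prefers_on),
    (("seg_a", "seg_b"), agree),
    (("seg_b", "seg_c"), agree),
]


def total_score(assignment):
    """Total score of one configuration: the sum of all local scores."""
    return sum(f(*(assignment[v] for v in scope)) for scope, f in factors)


def best_configuration():
    """Enumerate every configuration and keep the highest-scoring one."""
    configs = (
        dict(zip(variables, values))
        for values in product(domain, repeat=len(variables))
    )
    return max(configs, key=total_score)
```

Calling `best_configuration()` returns the setting of all three variables with the largest summed score. Enumeration is only practical for a handful of variables; it is used here because it makes the decomposition into local scores easy to see.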
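The compression idea behind Subdue (replace every instance of a pattern with a single node and see how much smaller the graph gets) can be sketched in a few lines of Python. This is a toy under stated assumptions, not Subdue's actual algorithm: the pattern is limited to a single bond between two atom labels, instances are picked greedily, and graph size is simply nodes plus links. The article does not give Subdue's real scoring formula, and the function names and the example molecule are hypothetical.

```python
def graph_size(nodes, edges):
    # Assumed size measure: number of nodes plus number of links.
    return len(nodes) + len(edges)


def find_instances(nodes, edges, pattern):
    """Greedily pick non-overlapping links whose end labels match the pattern."""
    wanted = tuple(sorted(pattern))
    used, instances = set(), []
    for u, v in sorted(edges):
        if u in used or v in used:
            continue
        if tuple(sorted((nodes[u], nodes[v]))) == wanted:
            instances.append((u, v))
            used.update((u, v))
    return instances


def compress(nodes, edges, pattern, new_label="SUB"):
    """Replace each pattern instance with a single node; return the new graph."""
    merged = {}  # old node id -> id of the node that replaces it
    for i, (u, v) in enumerate(find_instances(nodes, edges, pattern)):
        merged[u] = merged[v] = f"{new_label}{i}"
    new_nodes = {
        merged.get(n, n): (new_label if n in merged else label)
        for n, label in nodes.items()
    }
    new_edges = set()
    for u, v in edges:
        a, b = merged.get(u, u), merged.get(v, v)
        if a != b:  # links inside an instance disappear
            new_edges.add((min(a, b), max(a, b)))
    return new_nodes, new_edges


def compression(nodes, edges, pattern):
    """Ratio of original size to compressed size; larger means more compression."""
    return graph_size(nodes, edges) / graph_size(*compress(nodes, edges, pattern))


# Hypothetical fragment: two C-O pairs attached to one Ga atom.
atoms = {"c1": "C", "o1": "O", "c2": "C", "o2": "O", "g": "Ga"}
bonds = {("c1", "o1"), ("c2", "o2"), ("g", "o1"), ("g", "o2")}
```

With this fragment, `compression(atoms, bonds, ("C", "O"))` compares a graph of 5 atoms and 4 bonds with a compressed graph of 3 nodes and 2 links. A pattern that is larger or has more instances shrinks the graph further, which is the sense in which it would be judged more interesting.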
Terrorists targeted

Subdue is also useful for detecting patterns of potential terrorist activity and locating potential terrorist networks. Holder trained his system on simulated data that the US Air Force’s Evidence Assessment, Grouping, Linking, and Evaluation program created. The domain simulates the evidence available about terrorist groups and their plans before they put them into action.

Following a general plan of starting a group, recruiting members, acquiring resources, communicating, visiting targets, and transferring resources between actors, groups, and targets, the domain contains numerous concepts. The concepts include threat and nonthreat actors and threat and nonthreat groups.

Trained on patterns that give examples of threat potential, Subdue searched the simulated data to find similar types of patterns. The system achieved 78 to 93 percent accuracy discriminating threat from nonthreat groups.

[Figure: chemical-structure diagram.] The Subdue pattern-recognition system searches through graphic data to find a pattern that has several instances. In this chemical-structure example, the atoms are nodes in the graph and the bonds are links in the graph.

Language learning

Cornell University professor Shimon Edelman, in collaboration with colleagues at Tel Aviv University, has created a program that can discover patterns in languages, learn them as grammars, and then generate sentences of its own in that language. The system is called ADIOS (automatic distillation of structure), and it has been tested on both natural languages, such as English and Chinese, and artificial grammars, such as those in DNA and music.

“You can only recognize patterns if you have the right primitives, the right features,” Edelman says. “It’s like having the right glasses.”

Language contains patterns that on their face are invisible, and it’s generally thought to possess structure beyond just the serial order of words, a grammar. “The true structure of the sentence is a kind of tree,” Edelman says. ADIOS combines statistics and rules applied to a body of text in a language to discover the grammar. The system can then generate sentences in that language.

“It can do things like assign structure to a new sentence,” Edelman says. “It’s not such a big deal to recognize a pattern on which you’ve trained your system.” The team is patenting the technology, and Edelman wants to put it to commercial use. One possible arena is speech recognition technologies.

Patterns in theory

Research on the theoretical aspect of pattern discovery is also generating useful applications. At the University of California, Davis, Jim Crutchfield has leveraged his interest in “what a pattern is” to apply a pattern’s abstract definition (which he calls a causal state) to different kinds of processes. He defines causal states as groups of histories that lead to the same knowledge about the future.

The mathematical theory that defines the causal states leads to a small number of possible ways to find the causal states. The mathematical definition of being in the same state of predictability about the future means that, by looking at data, you can estimate and make predictions about the future on the basis of different points in time having different histories. From that definition, Crutchfield derived an algorithm that describes how to group histories that provide knowledge about the future when those histories are basically predictively equivalent. “We just apply this in these different domains,” he says, “whether it’s a spatial pattern, like cellular automata, or time series, or looking at complex materials.”

Crutchfield has applied his causal-state definition to examining dynamic systems, irregular crystals, hidden Markov models, and cellular automata. One field of application is quantum computation.

“A current proposal for implementing molecular computers is to look at very long chain molecules and to design the interactions between the atoms in the molecular chain so that they implement various of these cellular-automata rules,” Crutchfield says. “So this pattern discovery system that we have for cellular automata is making a catalog of all the possible kinds of interactions and what sorts of information storage structures they can produce, and how those information storage structures can be moved around and interacted, and how they interact to process information.”

Crutchfield is working on developing a library called the Encyclopedia of Cellular Automata. “It will be a resource for people working in nanotechnology,” he says, “to look at how to design molecular systems that have only local interactions but that will produce in their behavior large-scale structures that can be used for doing computations.”

According to Nello Cristiani, associate professor of statistics at UC Davis, people have always been attracted to patterns and pattern recognition. “In a way, this is the essence of science and most cognitive processes, such as generalization,” Cristiani says. “Now this activity has been automatized, and we rely heavily on it as a society.

“There would be no genome project, no speech recognition, and probably no credit card system without it. The last decade has seen a revolution in pattern recognition technology. Machine learning algorithms are now faster, simpler, and more accurate in generalization.”

IEEE Intelligent Systems, January/February 2006, www.computer.org/intelligent
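The grouping of histories described under “Patterns in theory” can be sketched roughly as follows. This is a deliberately simplified illustration, not Crutchfield's algorithm: it treats a history as the last few symbols of a sequence, uses only the next symbol as a stand-in for “the future,” and calls two histories predictively equivalent when their estimated next-symbol probabilities differ by no more than an arbitrary tolerance. The function names and the tolerance value are assumptions.

```python
from collections import Counter, defaultdict


def next_symbol_distributions(sequence, history_len):
    """Estimate P(next symbol | last history_len symbols) from the data."""
    counts = defaultdict(Counter)
    for i in range(history_len, len(sequence)):
        history = sequence[i - history_len:i]
        counts[history][sequence[i]] += 1
    return {
        history: {s: n / sum(c.values()) for s, n in c.items()}
        for history, c in counts.items()
    }


def group_histories(sequence, history_len, tolerance=0.05):
    """Group histories whose next-symbol predictions agree within tolerance."""
    dists = next_symbol_distributions(sequence, history_len)
    symbols = sorted(set(sequence))
    states = []  # each entry: (representative distribution, member histories)
    for history in sorted(dists):
        d = dists[history]
        for rep, members in states:
            if all(abs(d.get(s, 0.0) - rep.get(s, 0.0)) <= tolerance
                   for s in symbols):
                members.append(history)
                break
        else:
            states.append((d, [history]))
    return [members for _, members in states]
```

For the repeating sequence `"001" * 20` with `history_len=2`, the histories `"01"` and `"10"` are both always followed by `"0"`, so this sketch places them in one group, while `"00"` (always followed by `"1"`) forms another. Comparing longer stretches of the future would separate histories that a one-symbol lookahead merges.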
IEEE Computer Society
Publications Office
10662 Los Vaqueros Circle, PO Box 3014
Los Alamitos, CA 90720-1314

STAFF
Lead Editor: Dennis Taylor, dtaylor@computer.org
Group Managing Editor: Crystal R. Shif, cshif@computer.org
Senior Editors: Shani Murray, Dale Strok, and Linda World
Staff Editor: Rita Scanlan
Editorial Assistants: Brooke Miner and Molly Mraz
Magazine Assistant: Hilda Hosillos
Contributing Editors: Keri Schreiner and Joan Taylor
Design Director: Toni Van Buskirk
Layout/Technical Illustrations: Carmen Flores-Garvey and Alex Torres
Publisher: Angela Burgess, aburgess@computer.org
Assistant Publisher: Dick Price
Membership/Circulation Marketing Manager: Georgann Carter
Business Development Manager: Sandra Brown
Senior Production Coordinator: Marian Anderson

Submissions: For detailed instructions and formatting, see the author guidelines at www.computer.org/intelligent/author.htm or log onto IEEE Intelligent Systems’ author center at Manuscript Central (www.computer.org/mc/intelligent/author.htm). Visit www.computer.org/intelligent for editorial guidelines.

Editorial: Unless otherwise stated, bylined articles as well as products and services reflect the author’s or firm’s opinion; inclusion does not necessarily constitute endorsement by the IEEE Computer Society or the IEEE.

“Sassy” Chatbot Wins with Wit

Benjamin Alfonsi

When you sit down for an online chat with computer scientist Rollo Carpenter, you’re not quite sure if it’s him on the other end or his virtual alter ego, a chatbot named George. And that, in a nutshell, is the point of Carpenter’s research.

Carpenter’s work is inspired by Japanese roboticist Masahiro Mori’s Uncanny Valley theory. In 1970, Mori asserted that, as robots become increasingly human-like in appearance, movement, and behavior, they will elicit emotional responses from human beings that border on human-to-human empathy levels.

In discussing digital immortalization and a cultural renaissance by way of virtual reality, Carpenter views advanced chatbots as harbingers of a new era in machine learning. “We intend to get very close indeed to the Uncanny Valley,” he says.

Bringing home the bronze

During the 2005 Loebner Prize contest (www.loebner.net), a panel of judges found George to be the most convincing conversationalist of the four chatbot participants, which included reigning three-time champion Alice. The contest, launched in 1990 by Hugh Gene Loebner and touted as “the first formal instantiation of a Turing Test,” gauges the contestants’ “intelligence” levels.

In the contest’s 15 years, no contestant has won a silver or gold medal, awarded for convincing at least half of the judges that a text-based program or virtual persona is actually real. However, every year a bronze medal and a cash prize have gone to the most human-like program. This year, Carpenter’s Jabberwacky program, which hosts George, brought home the bronze. Now, George has caught the interest of the computer science community as well as thousands of visitors to the Jabberwacky site (www.jabberwacky.com).

Considering context

Many of the site’s visitors find it hard to believe George isn’t human after conversing with him. Even some of the contest’s four judges were fooled, at least initially.

“It [becomes] crystal clear within a couple of lines of communication who is human and who is bot,” according to one judge, Lila Davachi, assistant professor of psychology at New York University. “However, I found Jabberwacky to be the most interesting bot by far because it displayed some very human qualities. It was sassy and playful.”

What makes George seem so human? It appears George is different from other chatbots not only because of his personality but also because of how he learns. Carpenter says machine learning is trending toward being statistical and probabilistic, relying on analyzing significant volumes of data.

“Most of the chatbots that exist today work in a hard-coded, entirely predictable fashion,” he says. “A series of ‘if’ (or equivalent) statements created by the programmer evaluate the input and return known results, either as whole sentences or as modifications of the input.”

Although George is also statistical, probabilistic, and data intensive, the bot program is also something else: chaotic. “It never turns probabilities into numbers,” explains Carpenter. “It avoids looping through data, summing up an estimate of ‘fitness for purpose.’”
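The hard-coded style that Carpenter contrasts George with can be illustrated with a very small Python sketch. It shows only the conventional approach he describes (a series of “if” statements that return known results, either whole sentences or modifications of the input), not how Jabberwacky works. The specific rules and replies are hypothetical examples invented for illustration.

```python
def rule_based_reply(message):
    """Return a canned or lightly rewritten reply chosen by fixed rules."""
    stripped = message.strip()
    text = stripped.lower()

    # Rule 1: a known input mapped to a whole, fixed sentence.
    if text in ("hello", "hi"):
        return "Hello there."

    # Rule 2: a rule written for one precise question.
    if text == "what color is a red apple?":
        return "Red."

    # Rule 3: a reply built as a modification of the input.
    if text.startswith("i am "):
        rest = stripped[5:].rstrip(".!")
        return "Why are you " + rest + "?"

    # Fallback when no rule matches.
    return "Tell me more."
```

For example, `rule_based_reply("I am tired.")` produces `"Why are you tired?"`. The same input always yields the same output, which is the predictability the quote refers to; every new behavior requires the programmer to add another rule.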
For George, context is key. Output is based on an interpretation of the current context, taking the current conversation into account, while comparing it to past conversations.

“Techniques for finding context within complete conversations are the real key to the AI’s success to date,” says Carpenter. “The context finding tends to do a lot inaccurately, allowing influences from all sorts of quarters even if they may be individually only partially relevant and often even irrelevant.”

He says tiny differences in context, however seemingly unconnected, can give the program a reason to choose one thing over another, and those differences are infinitely better than randomness.

Many people believe George is human because he mimics human behavior, which is contextual, rather than human responses, which might or might not be. For example, someone testing the program might ask, “What color is a red apple?” A programmer could easily add a rule that deals precisely with that question, but George formulates the answer to such a question on the basis of what past users have said to him. Most humans would find the question inane and might respond sarcastically, “Pink with yellow and white polka dots.” So, this is the kind of answer that Carpenter’s program is likely to give: what some users see as wit.

Just as humans typically increase in intelligence from infancy through adulthood, learning from each person with whom they come into contact, so does George, who his creator says is still “a child.”

“As the data set grows,” Carpenter says, “there are ever-improved chances of accurate overlaps between the current conversation and its predecessors, so it becomes increasingly able to make intelligent connections.” And as George’s conversational level improves, people are more willing to engage in sensible, consistent, and valuable dialogue. The result? Data continues to improve over time.

[Figure: image of George.] Rollo Carpenter’s chatbot, George, took home the bronze medal in the 2005 Loebner Prize contest.

Looking to the future

In discussing how George is in many ways his thumbprint, Carpenter suggests that advancements in robotics might take the cloning debate out of the biology labs and into the computer labs. Whatever societal and sociological implications would exist in the wake of widespread proliferation of highly evolved chatbots, the computer scientist is confident about the advancements in AI they represent.

“Ultimately, this is a form of digital immortalization, clearly not a complete one, yet much more immediate and closer to a person’s ‘being’ than the more classic technique of writing an autobiography,” he says. “Only one person trained George: me. So, George does gradually become my reflection, or a reflection of the persona I project, in terms of speech patterns, characteristics, and interests.”

But, Carpenter says, anybody can create his or her own chatbot at Jabberwacky, just as he has; hence his prophecy that Mori’s Uncanny Valley is approaching.

Carpenter has partnered with Televirtual in creating a 3D animated character with voice input and custom voice output. (The partnership has spawned the current image of George.) He has also commenced collaborative research with other computer scientists to construct a highly lifelike robotic head that imitates those who interact with it in voice, movement, and facial expression. Think of it as George, version 3.0.

“Imagine a Jabberwacky with millions of characters like George in place of today’s handful, operating with Google-scale serving capacity,” says Carpenter. “Almost certainly it will be shockingly realistic, whether or not it passes a formal Turing Test.”

“Shockingly? I doubt it,” says Davachi. “But that is an empirical question; I’ll wait and see.”

How to Reach Us

Writers
For detailed information on submitting articles, write for our Editorial Guidelines (isystems@computer.org) or access www.computer.org/intelligent/author.htm.

Letters to the Editor
Send letters to
Dennis Taylor, Lead Editor
IEEE Intelligent Systems
10662 Los Vaqueros Circle
Los Alamitos, CA 90720
dtaylor@computer.org
Please provide an email address or daytime phone number with your letter.

On the Web
Access www.computer.org/intelligent for information about IEEE Intelligent Systems.

Subscription Change of Address
Send change-of-address requests for magazine subscriptions to address.change@ieee.org. Be sure to specify IEEE Intelligent Systems.

Membership Change of Address
Send change-of-address requests for the membership directory to directory.updates@computer.org.

Missing or Damaged Copies
If you are missing an issue or you received a damaged copy, contact membership@computer.org.

Reprints of Articles
For price information or to order reprints, email isystems@computer.org or fax +1 714 821 4010.

Reprint Permission
To obtain permission to reprint an article, contact William Hagen, IEEE Copyrights and Trademarks Manager, at copyrights@ieee.org.