
Knowledge Base Development and RME Processing

A Rapport Technical White Paper

DRAFT COPY
January 16, 2003

© 1999 Banter Technology Inc. All rights reserved.


Rapport
Version 3.0
© 1997–1999 Banter Technology Inc. All rights reserved.

The contents of this documentation are strictly confidential and are proprietary to Banter Technology
Inc. No part of this documentation may be reproduced, transmitted, or stored, in any form, in whole or
in part, or by any means for any purpose without the prior written consent of Banter Technology Inc.
The software described in this document is furnished under a license agreement and may be used or
copied only in accordance with the terms stipulated therein.
Banter Technology Inc. reserves the right to modify the information contained in this document
without prior notification.

Rapport is a trademark of Banter Technology Inc.


Microsoft, Outlook, and Windows are registered trademarks of Microsoft Corporation. Other product
and company names mentioned in this document may be trademarks of their respective owners.

Banter Technology Inc.


60 Federal Street
Suite 550
San Francisco, CA
94107

Tel: 1-415-247-2600
Fax: 1-415-247-2626
E-mail: info@banter.com

Introduction
The Rapport Knowledge Base is a unique, adaptive repository of linguistic and
statistical information that enables Rapport to accurately manage and classify high-
volume customer e-communications. Rapport is a learning system—the Knowledge
Base continuously evolves to model an organization’s current communication
environment.
Working in conjunction with the Rapport Knowledge Base, Rapport’s Relationship
Modeling Engine (RME) analyzes customer communications and takes the most
appropriate action on-the-fly, based on various user specifications, including
Rapport’s broad spectrum of configuration settings.
This white paper examines Rapport’s unique adaptive Knowledge Base, its
development process, and how it enables the RME to accurately process and classify
messages.

Background: Rapport’s RME Architecture


In Rapport, each customer message is processed according to user-defined categories.
Categories determine which automatic or semi-automatic action is taken with each
message.
Each category represents the content of a message, or indicates some other attribute of
a message such as its source. For example, a financial institution may define
categories like Checking Balance, Transfer Request, and Mortgage Info—these
categories represent the types of customer communications they commonly receive.
In the Rapport Knowledge Base, categories are associated with linguistic concept
models (discussed below) that are used by the RME for message classification. These
concept models determine the relevance of incoming messages to the categories in the
system.
Optionally, categories in the Knowledge Base may be associated with logical
expressions—formulas or statements used to refine or override the RME’s concept-
based message classification. For example, a category may be associated with the
following expression: $R_secured(s) == ‘YES’. Rapport analyzes an incoming
message, and using this expression, assigns the associated category 100% relevancy if
the message originated from a secure source.
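As a minimal sketch of how such an expression could override a concept-based score, the Python below assumes a message is represented as a dictionary and that a hypothetical `secured` field stands in for `$R_secured(s)`. Neither the field name nor the function is part of Rapport; they are illustrative only.

```python
def apply_secure_override(message: dict, concept_score: float) -> float:
    """Return the category relevancy (0.0 to 1.0) after applying the override.

    Hypothetical stand-in for the expression $R_secured(s) == 'YES':
    a secure source forces 100% relevancy; otherwise the concept-based
    score is returned unchanged.
    """
    if message.get("secured") == "YES":
        return 1.0
    return concept_score
```

Here the expression overrides the concept-based result outright. An expression that only refines the score would adjust `concept_score` instead of replacing it.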
During the Rapport configuration process, each category is associated with properties
that determine which actions are taken for each message. For example, a message
received by a financial institution is matched to the Mortgage Info category. This
category may have properties that instruct Rapport to compose and send an
appropriate automatic reply using standard pre-written text containing mortgage
information. Alternatively, the Mortgage Info category’s properties may be set to
route the message to an appropriate queue for manual handling.

This document contains confidential and proprietary information.



RME Analysis and Message Processing


The RME uses linguistic data and complex statistical algorithms to accurately analyze
and classify customer messages. Each message entering Rapport is analyzed by the
RME’s two primary components: the Natural Language Processing (NLP) engine and
the Rapport statistic engine.
The NLP engine identifies concepts—basic units of linguistic or quantitative data
contained within each message. Linguistic data may be based on semantic, contextual,
and morphological information. Quantitative data may include various indicators
derived from the message, such as the number of sentences in a message.
For example, a message may contain the word “depositing.” Rapport’s NLP engine
uses morphological analysis to derive the base form of this word as “deposit”—an
identifiable concept used to classify the message.
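The idea of deriving a base form can be illustrated with a toy suffix stripper. This is a deliberately naive sketch, not Rapport's morphological analysis: the suffix list and the minimum stem length are assumptions chosen only to make the "depositing" example work.

```python
# Assumed, very small suffix list; real morphological rules are far richer.
SUFFIXES = ("ing", "ed", "es", "s")

def base_form(word: str) -> str:
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(base_form("depositing"))  # deposit
```

A rule-free stripper like this mishandles many words (for example, it would turn "balances" into "balanc"), which is why a linguistic knowledge base with explicit rules is needed in practice.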
After a message’s concepts are identified by the NLP engine, they are exported to the
statistic engine as concept models, the format used for Rapport’s statistical analysis.
Rapport’s statistic engine compares a message’s concept models to each category’s
pre-existing concept models in the Knowledge Base, which were gathered during the
Learning and (optional) Training processes described below.
The following example illustrates Rapport’s concept-based analysis: A financial
institution receives a message requesting information about mortgages. The RME
analyzes this message and identifies the linguistic concepts it contains. Then the RME
compares these concepts to all concept models associated with categories in the
system, and determines that the Mortgage Info category best matches this message.
Using unique proprietary algorithms and formulas to derive category relevancy, the
statistic engine calculates category scores—percentage values reflecting the
likelihood that a message belongs to a category. The statistic engine may also use
logical expressions to extract and evaluate message parameters that may influence or
determine category relevancy. Depending on a broad spectrum of system
configuration settings, the message is routed for appropriate automatic or semi-
automatic actions.
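Rapport's scoring algorithms are proprietary and are not described here. Purely to illustrate what a percentage category score could look like, the sketch below substitutes a well-known technique, cosine similarity over concept counts. The representation of a concept model as a `Counter` and the function names are assumptions.

```python
import math
from collections import Counter
from typing import Dict

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two concept-count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def score_categories(
    message_model: Counter, category_models: Dict[str, Counter]
) -> Dict[str, float]:
    """Return a percentage score per category for one message."""
    return {
        name: round(100 * cosine(message_model, model), 1)
        for name, model in category_models.items()
    }

message = Counter({"mortgage": 2, "rate": 1})
categories = {
    "Mortgage Info": Counter({"mortgage": 5, "rate": 3, "loan": 2}),
    "Checking Balance": Counter({"balance": 4, "checking": 3}),
}
scores = score_categories(message, categories)
best = max(scores, key=scores.get)  # category with the highest score
```

With these sample counts, the message shares concepts only with Mortgage Info, so that category receives the highest score and Checking Balance scores zero.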
The simplified diagram below illustrates the RME’s message processing flow:

1. An incoming message enters the Rapport system.

2. The NLP engine identifies concepts within the message using linguistic data
stored in the Knowledge Base.

3. Concepts are exported as concept models to Rapport’s statistic engine.

4. The statistic engine compares the message’s concept models with each
category’s existing concept models to determine category relevancy.
Optionally, logical expressions are used to refine or override concept-based
message categorization.

5. The message is routed for an automatic or semi-automatic action, based on
category properties and other configuration settings.

RME Processing Flow

[Figure: Incoming Message -> NLP Engine (identifies concepts within each message)
-> Concept Modeler -> Statistic Engine (compares concept models and uses optional
logical expressions to score categories) -> Message Routed for Automatic or
Semi-Automatic Action. The engines draw on the Rapport Knowledge Base.]

Simplified diagram illustrating the RME message processing flow
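The five-step flow can be sketched as a chain of pluggable stages. Everything named here is hypothetical: the stage callables, the 80% auto-reply threshold, and the action labels are assumptions used to show the order of operations, not Rapport's configuration settings.

```python
from typing import Callable, Dict, Iterable

def route_by_score(scores: Dict[str, float], auto_threshold: float = 80.0) -> str:
    """Step 5: pick an action from the best category score (threshold is assumed)."""
    if not scores:
        return "manual queue: (no category)"
    best = max(scores, key=scores.get)
    action = "auto-reply" if scores[best] >= auto_threshold else "manual queue"
    return f"{action}: {best}"

def process_message(
    text: str,
    identify_concepts: Callable[[str], Iterable[str]],
    build_model: Callable[[Iterable[str]], object],
    score: Callable[[object], Dict[str, float]],
    route: Callable[[Dict[str, float]], str] = route_by_score,
) -> str:
    """Run one incoming message (step 1) through steps 2 to 5."""
    concepts = identify_concepts(text)  # step 2: NLP engine
    model = build_model(concepts)       # step 3: export as a concept model
    scores = score(model)               # step 4: statistic engine
    return route(scores)                # step 5: routing
```

Passing the stages in as callables keeps the sketch independent of any particular NLP or scoring method.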

Rapport’s Adaptive Knowledge Base


The knowledge required for accurately classifying each customer message is stored in
Rapport’s adaptive Knowledge Base. The Rapport Knowledge Base is a repository of
linguistic and statistical information used during RME processing. The Knowledge
Base includes a framework of user-defined categories, built according to the specific
requirements of each organization using Rapport. It is fully adaptive—Learning
(discussed below) automatically updates linguistic and statistical information to
improve future message classification.
The Rapport Knowledge Base consists of two components: the Linguistic Knowledge
Base (LKB) and Statistic Knowledge Base (SKB). Rapport’s LKB contains a glossary
of standard English usage, semantically significant words, linguistically identical
words, grammar, rules for morphological analysis, and optional domain-specific
terminology. Rapport’s SKB contains hierarchical or flat decision trees—frameworks
of categories. Each category in a decision tree is associated with the concept models
and optional logical expressions that enable Rapport to accurately classify messages.

Rapport Knowledge Base

LKB: Linguistic Data
♦ Glossary of Standard English Usage
♦ Semantically Significant Words
♦ Linguistically Identical Words
♦ Grammar
♦ Rules for Morphological Analysis
♦ Optional Domain-Specific Terminology

SKB: Statistic Data
♦ Hierarchical or Flat Decision Trees
♦ Categories populated with Concept Models and optional Logical Expressions

[Figure: the SKB is depicted as concept models and decision trees, together with an
(optional) logical expression: (stat_matching <= 0.5) ? 0 : stat_matching]
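The figure's sample logical expression, `(stat_matching <= 0.5) ? 0 : stat_matching`, uses C-style conditional syntax. A Python equivalent is sketched below, assuming that `stat_matching` is the statistical match score on a 0 to 1 scale (the source does not define it); the function name is hypothetical.

```python
def threshold_expression(stat_matching: float) -> float:
    """Python equivalent of: (stat_matching <= 0.5) ? 0 : stat_matching

    Scores at or below 0.5 are zeroed out; higher scores pass through unchanged.
    """
    return 0.0 if stat_matching <= 0.5 else stat_matching
```

Read this way, the expression refines classification by discarding weak statistical matches for the category it is attached to.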


Building the Statistic Knowledge Base


Rapport’s Knowledge Base Editor application is used to create decision tree structures
stored in the SKB.
Decision trees may have either a flat or hierarchical structure, determined by an
organization’s message classification requirements. Flat decision trees are category
lists, employed when a hierarchical organization of categories is not warranted.
Hierarchical decision trees are well-suited for organizing categories that break down
logically into successively greater levels of detail.
The following simplified diagram represents a section of a financial institution’s
hierarchical decision tree. Branches of the hierarchical tree are designated by ovals;
sub-branches associated with categories appear in gray. Note that the categories
beneath each branch are logically related; for example, Address Change, Telephone
Change, and E-mail Change are all related to the “Customer Info” branch.

Requests
    Statement Copy
    Check Copy
    Orders
        Check Order
        Traveler’s Checks
        Foreign Currency
Customer Info
    Address Change
    Telephone Change
    E-mail Change

Representation of a hierarchical decision tree
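One way to picture a decision tree as a data structure is a node with a name and a list of children, where nodes without children are the categories. The sketch below is an assumption about representation, not the SKB's storage format, and it models only the Customer Info branch named in the text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class Node:
    """A branch (has children) or a category (no children) in a decision tree."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def categories(self, path: Tuple[str, ...] = ()) -> Iterator[str]:
        """Yield the full path of every leaf category beneath this node."""
        here = path + (self.name,)
        if not self.children:
            yield " / ".join(here)
        for child in self.children:
            yield from child.categories(here)

customer_info = Node(
    "Customer Info",
    [Node("Address Change"), Node("Telephone Change"), Node("E-mail Change")],
)
leaf_paths = list(customer_info.categories())
```

A flat decision tree is the special case in which every category sits directly under a single root.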

Note: Using the Knowledge Base Editor, logical expressions may also be associated
with specified categories at this stage to further refine the classification
process.

Once a skeletal decision tree structure (either hierarchical or flat) has been created,
the concept models used for message classification are gathered for each category
through the Learning or Training processes.

Learning
Learning is an ongoing automatic process, invisible to the user, that gathers concept
models for each category in the SKB over time. Concept models are gathered by
collecting feedback from normal message processing activity, bootstrapping the
system for accurate message classification in the future.


For example, when a customer service agent uses the Rapport Message Center
application to compose a reply to a message, the agent may choose from a database of
pre-written responses linked to categories in the Knowledge Base. The act of choosing
a response provides feedback to the system; concept models contained in the message
form the basis of concept models associated with categories linked to the response.
In addition to bootstrapping the system, learning continuously updates and enriches
existing concept models in the SKB during normal Rapport usage. Learning is an
organic process, enabling the Knowledge Base to grow and adapt over time. Concept
models are refined by introducing new information derived from changes that have
occurred in the composition of messages, and from agent activity. As Rapport learns,
it broadens the base of concept models, making the system more precise over time.
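The feedback loop described above can be sketched as follows. The class, its method, and the use of simple concept counts are assumptions; the sketch only shows that each agent response adds the message's concepts to the models of the categories linked to that response.

```python
from collections import Counter, defaultdict
from typing import Iterable

class LearningStore:
    """Hypothetical stand-in for the per-category concept models in the SKB."""

    def __init__(self) -> None:
        self.category_models = defaultdict(Counter)

    def record_feedback(
        self, message_concepts: Iterable[str], response_categories: Iterable[str]
    ) -> None:
        """Fold one message's concepts into every category linked to the chosen response."""
        concepts = list(message_concepts)
        for category in response_categories:
            self.category_models[category].update(concepts)

store = LearningStore()
store.record_feedback(["mortgage", "rate"], ["Mortgage Info"])
store.record_feedback(["mortgage", "refinance"], ["Mortgage Info"])
```

After these two agent replies, the Mortgage Info model has seen "mortgage" twice and has picked up the new concept "refinance", which is the sense in which the models broaden over time.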

Training
The Training process is an optional, but recommended method for gathering models
for categories in the SKB decision tree. Training is implemented offline, and involves
analyzing a corpus of sample messages classified into pre-defined categories. These
messages are first processed by Rapport’s Lexical Editor to enrich the LKB with user-
specific linguistic data. Then each message in the corpus is processed individually by
the NLP and Statistic engines to populate the SKB with models used to classify
incoming messages.
The sections that follow discuss the Training process in greater detail.

Stages of Knowledge Base Development


Rapport Knowledge Base development based on Training is an optional but
recommended process implemented offline, consisting of the following stages:

♦ Creating a Corpus
A corpus of sample messages, pre-classified according to categories, provides
source material for the NLP and Statistic Training processes that build the Rapport
Knowledge Base.

♦ The Pre-Training Process
An optional process that enriches the LKB by extracting and identifying
significant words and linguistic information that are unique to the corpus of
messages.

♦ Knowledge Base Building
An optional process consisting of two stages: NLP Training and Statistic Training.
NLP Training generates concept models, the units of linguistic information used by
the statistic engine to build the SKB. Statistic Training gathers concept models from
each message in the corpus and updates each category’s models in the SKB
decision tree.


Creating a Corpus
A corpus is a collection of sample messages gathered by an organization (prior to
using Rapport) that have been pre-classified according to their subject matter. The
corpus provides source data used during Pre-training, NLP Training, and Statistic
Training (described below).
The corpus may be organized by grouping similar messages in directories or folders
according to category names that represent the messages’ content. Alternatively, each
message may have a field or data identifier that indicates its category (or categories).

[Figure: two ways of organizing a corpus.
♦ A corpus with similar messages grouped together in separate folders or directories
representing categories (Category 1, Category 2, ... Category n)
♦ A corpus where each message is associated with one or more categories (for
example, messages tagged Category 4; Category 2; Category 1,5; Category 3)]
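For the folder-based layout, reading a corpus amounts to walking one directory per category. The sketch below assumes plain-text message files with a `.txt` extension and UTF-8 encoding, neither of which is specified by the source; the function name and the sample file are made up for the demonstration.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

def load_corpus(root: str) -> List[Tuple[str, str]]:
    """Return (category, message_text) pairs, one per file in each category folder."""
    pairs = []
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue  # ignore stray files at the top level
        for message_file in sorted(category_dir.glob("*.txt")):
            pairs.append((category_dir.name, message_file.read_text(encoding="utf-8")))
    return pairs

# Demonstration with a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    folder = Path(root) / "Mortgage Info"
    folder.mkdir()
    (folder / "msg1.txt").write_text(
        "What are your current mortgage rates?", encoding="utf-8"
    )
    corpus = load_corpus(root)
```

For the alternative layout, where each message carries its own category field, the same pairs would be built by reading that field instead of the folder name.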

For the subsequent Pre-Training and Training processes to be most effective, the
corpus should only contain messages that are accurately classified and free of
extraneous text (unrelated to the message’s category). An ideal corpus consists of
messages that are classified according to well-defined categories (avoiding
redundancies between categories), with textual content that is consistently
representative of the category’s subject. As many messages as possible with similar
message content should be grouped together for each category—more messages per
category improves the quality of concept models created during the statistic Training
process (described below).


The Pre-Training Process


Pre-Training is an optional process that extracts and identifies significant linguistic
information unique to the corpus being analyzed. This data enriches the LKB,
improving NLP Training and the RME’s ability to accurately classify messages
online.
Each business or organization has its own vocabulary of words that are unique and
significant. For example, an Internet Service Provider (ISP) may consider the words
“Internet Connection” to be significant, while an airline passenger service might
decide that these words are insignificant. At the same time, both companies would
probably consider the word “connection” to be significant, but they would define
“connection” in two entirely different ways. To the ISP, a connection is an Internet
hookup; to an airline, it’s an air flight. In contrast, an insurance company may define
“connection” as insignificant.
Rapport’s Lexical Editor application is used to implement the Pre-Training process.
The Lexical Editor analyzes the corpus of messages, filters the text, and generates lists
of simple linguistic units called tokens and token pairs, organized according to
frequency.
A token is a string of characters identified by the Lexical Editor within a body of text.
When the system analyzes the text of a corpus, it searches for delimiter characters
such as spaces and typographical marks (periods, colons, etc.). Any string of
characters found between these delimiters is recorded as a token. Significant tokens,
non-significant tokens, and word associations are identified using the Lexical Editor,
and stored in the LKB.
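Delimiter-based tokenization with frequency lists can be sketched in a few lines. The delimiter set and the lowercasing step are assumptions (the source says only "spaces and typographical marks"), and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed delimiter set: whitespace plus common typographical marks.
DELIMITERS = re.compile(r'[\s.,:;!?()"]+')

def tokenize(text: str) -> List[str]:
    """Split text on delimiters; lowercasing is an added normalization step."""
    return [token.lower() for token in DELIMITERS.split(text) if token]

def token_frequencies(messages: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count single tokens and adjacent token pairs across a corpus."""
    tokens, pairs = Counter(), Counter()
    for text in messages:
        message_tokens = tokenize(text)
        tokens.update(message_tokens)
        pairs.update(zip(message_tokens, message_tokens[1:]))
    return tokens, pairs

tokens, pairs = token_frequencies(
    ["My Internet connection is down.", "Internet connection: slow!"]
)
```

Calling `tokens.most_common()` or `pairs.most_common()` then yields lists organized by frequency, from which a user could mark entries such as the pair ("internet", "connection") as significant.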

Note: The Pre-Training process is particularly useful for preparing the RME to
accurately classify and process messages from international sources,
especially messages including frequent misspellings and non-standard
English usage.

[Figure: Corpus -> Lexical Editor -> LKB (Linguistic Knowledge Base)]

Lexical Editor
♦ Analyzes the corpus
♦ Filters the word base
♦ Generates lists of single tokens and token pairs
♦ Calculates token frequency
♦ Enables the user to identify significant and non-significant tokens, and define
word associations
♦ Stores information in the Linguistic Knowledge Base

LKB (Linguistic Knowledge Base)
♦ Receives corpus-specific linguistic data from the Lexical Editor
♦ Also contains additional domain knowledge (optional), standard English word
lists, grammar, and rules for morphological analysis


Knowledge Base Building


Knowledge Base building based on a corpus is implemented in two phases: NLP
Training and Statistic Training.

The NLP Training Phase


During the NLP Training phase, the NLP engine analyzes and processes each message
in the corpus individually in two stages: Pre-Processing and Processing.
During Pre-Processing, the NLP engine analyzes each message text, identifies the
portion of text to be processed, and generates an intermediate representation of the
concepts contained in the message. In the Processing stage, the NLP engine uses
morphological rules, word associations, and other linguistic techniques to accurately
determine the concepts contained in each message, and the associations between them.
These concepts are exported to the statistic engine for statistic Training via the
Concept Modeler. The Concept Modeler converts the message’s concepts into
concept models—a format used by the statistic engine to build the Statistic
Knowledge Base.
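The two NLP stages and the Concept Modeler can be pictured as three small functions. All of the rules below are assumptions made for illustration: dropping quoted reply lines stands in for "identifying the portion of text to be processed", a lookup table stands in for morphological rules and word associations, and a concept model is represented as concept counts.

```python
from collections import Counter
from typing import Dict, List

def pre_process(raw: str) -> List[str]:
    """Pre-Processing: keep unquoted lines (assumed rule) and split into lowercase words."""
    body = [line for line in raw.splitlines() if not line.lstrip().startswith(">")]
    return " ".join(body).lower().split()

def process(words: List[str], associations: Dict[str, str]) -> List[str]:
    """Processing: map each word to a concept through an assumed association table."""
    cleaned = (word.strip(".,!?") for word in words)
    return [associations.get(word, word) for word in cleaned]

def concept_model(concepts: List[str]) -> Counter:
    """Concept Modeler: convert a list of concepts into concept counts."""
    return Counter(concept for concept in concepts if concept)

raw_message = "I am depositing a check.\n> earlier quoted text"
table = {"depositing": "deposit", "cheque": "check"}
model = concept_model(process(pre_process(raw_message), table))
```

In this example the quoted line is excluded before any concepts are identified, and "depositing" is mapped to the concept "deposit".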

[Figure: Corpus -> NLP Engine (Pre-Processing, then Processing) -> Concepts ->
Concept Modeler -> Statistic Engine. The NLP engine receives data from the LKB
(Linguistic Knowledge Base).]

NLP Engine: Pre-Processing
♦ Analyzes and processes each message individually
♦ Identifies the portion of text to be processed
♦ Receives data from the Linguistic Knowledge Base
♦ Generates an intermediate representation of concepts

NLP Engine: Processing
♦ Uses morphological rules, word associations, and complex algorithms for
generating concepts, and concepts based on other concepts
♦ Exports concepts to the Concept Modeler

Concept Modeler
♦ Converts concepts into concept models

Statistic Engine
♦ Implements Statistic Training


The Statistic Training Phase


Statistic Training is implemented using the Rapport Knowledge Base Editor
application. A skeletal decision tree structure is built based on the same categories
used to classify messages in the corpus. During Statistic Training, the statistic engine
receives the concept models of each corpus message individually. The statistic engine
builds the SKB by performing operations on these concept models and creating
models for the categories of each message in the SKB decision tree. The result is an
SKB populated with models that enable accurate classification of incoming messages
during online RME processing.

Note: Statistic Training may also provide feedback (manually) to the NLP
Training process, improving NLP analysis and the determination of
concepts.
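The operations the statistic engine performs are not specified, so the sketch below substitutes the simplest plausible aggregation: summing each message's concept counts into its categories and then normalizing to relative frequencies. The function name, the input shape, and the normalization step are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

def train_skb(
    corpus: Iterable[Tuple[List[str], Counter]]
) -> Dict[str, Dict[str, float]]:
    """Build per-category models from (categories, concept_model) pairs, one per message."""
    counts = defaultdict(Counter)
    for categories, model in corpus:
        for category in categories:  # a message may belong to several categories
            counts[category].update(model)
    skb = {}
    for category, concept_counts in counts.items():
        total = sum(concept_counts.values())
        skb[category] = {c: n / total for c, n in concept_counts.items()}
    return skb

training_corpus = [
    (["Mortgage Info"], Counter({"mortgage": 2, "rate": 1})),
    (["Mortgage Info", "Transfer Request"], Counter({"mortgage": 1, "transfer": 1})),
]
skb = train_skb(training_corpus)
```

Because each message is processed individually, the same function could be rerun on a larger or refreshed corpus to rebuild the category models.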

[Figure: Concept Models (per individual message) -> Statistic Engine, run from the
Knowledge Base Editor -> SKB (Statistic Knowledge Base)]

Statistic Engine
♦ Populates decision tree with new concept models based on each message’s
concept models
♦ Updates existing models in the Statistic Knowledge Base

SKB (Statistic Knowledge Base)
♦ Stores concept models for each category in decision trees

Updating the Knowledge Base


Rapport readily adapts to almost any change in your incoming message environment.
In some situations, however, the Learning process may take time. A more immediate
solution is running an accelerated version of the Pre-Training and Training processes.
Repeating the Pre-Training and Training processes as required helps keep message
classification accurate.
It is recommended to repeat these processes when:

♦ Major changes have been made to categories
♦ Demographic or geographic changes have occurred affecting the origin of your
incoming messages (e.g., an organization begins to receive large numbers of
messages from a location outside its normal area of operation)
♦ New categories have been added to the SKB
♦ Products or services have been added or changed
♦ The organization is responding to special events

Summary of Knowledge Base Development


Linguistic and statistical data stored in the Rapport Knowledge Base is used by the
RME to perform accurate message classification, enabling the system to take the most
appropriate action for each customer message.

Learning
To gather this data, the system can be bootstrapped by an automatic process called
Learning. Learning is ongoing, invisible to the user, and populates the SKB decision
tree with concept models over time during normal Rapport operation. In addition to
bootstrapping the system, learning continuously updates models in the SKB,
improving message classification.

Training
Alternatively, the Rapport Knowledge Base may be built based on a corpus of sample
messages classified according to categories. During Pre-Training, the Lexical Editor is
used to analyze the corpus, identify significant, corpus-specific linguistic data, and
refine the LKB. NLP Training analyzes each message in the corpus individually, and
exports concepts via the Concept Modeler to the statistic engine. The Knowledge Base
Editor application is used to create a skeletal decision tree structure based on corpus
categories. For each message in the corpus, concept models are gathered for
categories in the decision tree, and are stored in the SKB.
The following simplified diagrams illustrate the chronological development of the
Rapport Knowledge Base using the Training process.


Knowledge Base Development (Based on Training)

1. Creating a Corpus: Sample Messages -> Corpus classified according to
message content

2. The Pre-Training Process: Corpus -> Lexical Editor Application -> Linguistic
Knowledge Base

3. NLP Training Process: Corpus -> NLP Engine (Pre-Processing & Processing,
using the Linguistic Knowledge Base) -> Concept Modeler -> Concept Models
exported to Statistic Engine

4. Statistic Training Process: Concept Models from NLP Training -> Statistic
Engine -> Statistic Knowledge Base


Online RME Processing


The linguistic and statistical data gathered through Learning, and optionally Training,
enables the RME to accurately classify customer messages on-the-fly. In a process
similar to NLP Training, message concepts are identified by the NLP engine using
data in the LKB, and are exported to the Concept Modeler. Concept models are
received by the statistic engine and compared to existing models in the SKB’s
decision tree, generating category scores. Based on category relevancy, optional
logical expressions and other message parameters, and category configuration
properties, the message is routed for an appropriate automatic or semi-automatic
action. The learning process enables the system to evolve and adapt over time,
constantly improving Rapport’s ability to accurately classify messages in the future.

Online RME Message Processing

[Figure: Customer Message -> NLP Engine (Pre-Processing, Processing) -> Concept
Modeler -> Statistic Engine -> Message Routed for Automatic or Semi-Automatic
Action. The Knowledge Base (LKB and SKB) supports the NLP engine and the
statistic engine.]
