
Knowledge Base Development

and RME Processing

A Rapport Technical White Paper

January 16, 2003

© 1999 Banter Technology Inc. All rights reserved.

Version 3.0
© 1997–1999 Banter Technology Inc. All rights reserved.

The contents of this documentation are strictly confidential and are proprietary to Banter Technology
Inc. No part of this documentation may be reproduced, transmitted, or stored, in any form, in whole or
in part, or by any means for any purpose without the prior written consent of Banter Technology Inc.
The software described in this document is furnished under a license agreement and may be used or
copied only in accordance with the terms stipulated therein.
Banter Technology Inc. reserves the right to modify the information contained in this document
without prior notification.

Rapport is a trademark of Banter Technology Inc.

Microsoft, Outlook, and Windows are registered trademarks of Microsoft Corporation. Other product
and company names mentioned in this document may be trademarks of their respective owners.

Banter Technology Inc.

60 Federal Street
Suite 550
San Francisco, CA

Tel: 1-415-247-2600
Fax: 1-415-247-2626
Knowledge Base Development and RME Processing Page 1

The Rapport Knowledge Base is a unique, adaptive repository of linguistic and
statistical information that enables Rapport to accurately manage and classify high-
volume customer e-communications. Rapport is a learning system—the Knowledge
Base continuously evolves to model an organization’s current communication
Working in conjunction with the Rapport Knowledge Base, Rapport’s Relationship
Modeling Engine (RME) analyzes customer communications and takes the most
appropriate action on-the-fly, based on various user specifications, including
Rapport’s broad spectrum of configuration settings.
This white paper examines Rapport’s unique adaptive Knowledge Base, its
development process, and how it enables the RME to accurately process and classify
messages.

Background: Rapport’s RME Architecture

In Rapport, each customer message is processed according to user-defined categories.
Categories determine which automatic or semi-automatic action is taken with each
message.
Each category represents the content of a message, or indicates some other attribute of
a message such as its source. For example, a financial institution may define
categories like Checking Balance, Transfer Request, and Mortgage Info—these
categories represent the types of customer communications they commonly receive.
In the Rapport Knowledge Base, categories are associated with linguistic concept
models (discussed below) that are used by the RME for message classification. These
concept models determine the relevance of incoming messages to the categories in the
Optionally, categories in the Knowledge Base may be associated with logical
expressions—formulas or statements used to refine or override the RME’s concept-
based message classification. For example, a category may be associated with the
following expression: $R_secured(s) == ‘YES’. Rapport analyzes an incoming
message, and using this expression, assigns the associated category 100% relevancy if
the message originated from a secure source.
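The override behavior described above can be sketched as follows. This is an illustration only, not Rapport's expression language: the `Message`, `Category`, `relevancy`, and `secured_rule` names are hypothetical, and the logical expression is modeled as a plain Python function that either returns an overriding relevancy or defers to the concept-based score.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Message:
    text: str
    # Hypothetical message parameters, e.g. {"secured": "YES"}.
    params: dict = field(default_factory=dict)


@dataclass
class Category:
    name: str
    # Stand-in for a logical expression: returns a relevancy override
    # (0-100) or None to keep the concept-based score.
    expression: Optional[Callable[[Message], Optional[float]]] = None


def relevancy(category: Category, message: Message, concept_score: float) -> float:
    """Return the expression's override if it yields one, else the concept-based score."""
    if category.expression is not None:
        override = category.expression(message)
        if override is not None:
            return override
    return concept_score


def secured_rule(message: Message) -> Optional[float]:
    # Mirrors the example in the text: 100% relevancy for a secure source.
    return 100.0 if message.params.get("secured") == "YES" else None


secure_mail = Category("Secure Mail", expression=secured_rule)
print(relevancy(secure_mail, Message("hello", {"secured": "YES"}), 12.5))  # 100.0
print(relevancy(secure_mail, Message("hello", {"secured": "NO"}), 12.5))   # 12.5
```

Keeping the expression separate from the concept-based score makes it easy to see which of the two decided the final relevancy.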
During the Rapport configuration process, each category is associated with properties
that determine which actions are taken for each message. For example, a message
received by a financial institution is matched to the Mortgage Info category. This
category may have properties that instruct Rapport to compose and send an
appropriate automatic reply using standard pre-written text containing mortgage
information. Alternatively, the Mortgage Info category’s properties may be set to
route the message to an appropriate queue for manual handling.
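As a rough sketch of how category properties might select an action, the example below models the two alternatives mentioned above (automatic reply or routing to a queue). The `CategoryProperties` fields, the `choose_action` function, and the "default" queue name are assumptions made for illustration; they are not Rapport configuration settings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CategoryProperties:
    auto_reply_text: Optional[str] = None  # pre-written reply text, if any
    queue: Optional[str] = None            # manual-handling queue, if any


def choose_action(props: CategoryProperties) -> Tuple[str, str]:
    """Pick the action implied by a category's properties."""
    if props.auto_reply_text is not None:
        return ("auto_reply", props.auto_reply_text)
    if props.queue is not None:
        return ("route", props.queue)
    # Assumed fallback when no property is set.
    return ("route", "default")


mortgage_auto = CategoryProperties(auto_reply_text="Our current mortgage information is attached.")
mortgage_manual = CategoryProperties(queue="mortgage-specialists")
print(choose_action(mortgage_auto)[0])    # auto_reply
print(choose_action(mortgage_manual))     # ('route', 'mortgage-specialists')
```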

This document contains confidential and proprietary information.


RME Analysis and Message Processing

The RME uses linguistic data and complex statistical algorithms to accurately analyze
and classify customer messages. Each message entering Rapport is analyzed by the
RME’s two primary components: the Natural Language Processing (NLP) engine and
the Rapport statistic engine.
The NLP engine identifies concepts—basic units of linguistic or quantitative data
contained within each message. Linguistic data may be based on semantic, contextual,
and morphological information. Quantitative data may include various indicators
derived from the message, such as the number of sentences in a message.
For example, a message may contain the word “depositing.” Rapport’s NLP engine
uses morphological analysis to derive the base form of this word as “deposit”—an
identifiable concept used to classify the message.
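To make the idea of deriving a base form concrete, here is a deliberately tiny suffix-stripping sketch. It is not Rapport's morphological analysis, whose rules are stored in the LKB and are not described in this paper; the `SUFFIXES` list and `base_form` function are hypothetical.

```python
# Toy rule set, checked in order; real morphological rules are far richer.
SUFFIXES = ("ing", "ed", "es", "s")


def base_form(word: str) -> str:
    """Strip one common English suffix, keeping at least three characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


print(base_form("depositing"))  # deposit
print(base_form("deposited"))   # deposit
```

A rule set this small mishandles many words (for example, "making" becomes "mak"), which is one reason real systems combine rules with word lists.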
After a message’s concepts are identified by the NLP engine, they are exported to the
statistic engine as concept models, the format used for Rapport’s statistical analysis.
Rapport’s statistic engine compares a message’s concept models to each category’s
pre-existing concept models in the Knowledge Base, which were gathered during the
Learning and (optional) Training processes described below.
The following example illustrates Rapport’s concept-based analysis: A financial
institution receives a message requesting information about mortgages. The RME
analyzes this message and identifies the linguistic concepts it contains. Then the RME
compares these concepts to all concept models associated with categories in the
system, and determines that the Mortgage Info category best matches this message.
Using unique proprietary algorithms and formulas to derive category relevancy, the
statistic engine calculates category scores—percentage values reflecting the
likelihood that a message belongs to a category. The statistic engine may also use
logical expressions to extract and evaluate message parameters that may influence or
determine category relevancy. Depending on a broad spectrum of system
configuration settings, the message is routed for appropriate automatic or semi-
automatic actions.
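Rapport's scoring algorithms are proprietary and are not specified here. Purely to illustrate what a percentage-style category score can look like, the sketch below swaps in cosine similarity between concept counts; the `cosine` and `category_scores` functions and the sample models are assumptions, not the statistic engine's method.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def category_scores(message_concepts: List[str], category_models: Dict[str, Counter]) -> Dict[str, float]:
    """Score every category as a percentage (0-100) for one message."""
    msg = Counter(message_concepts)
    return {name: round(100 * cosine(msg, model), 1) for name, model in category_models.items()}


# Hypothetical category models, as might be gathered by Learning or Training.
models = {
    "Mortgage Info": Counter({"mortgage": 5, "rate": 3, "loan": 2}),
    "Checking Balance": Counter({"balance": 4, "checking": 4, "account": 2}),
}
scores = category_scores(["mortgage", "rate", "information"], models)
best = max(scores, key=scores.get)
print(best)  # Mortgage Info
```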
The simplified diagram below illustrates the RME’s message processing flow:

1. An incoming message enters the Rapport system.

2. The NLP engine identifies concepts within the message using linguistic data
stored in the Knowledge Base.

3. Concepts are exported as concept models to Rapport’s statistic engine.

4. The statistic engine compares the message’s concept models with each
category’s existing concept models to determine category relevancy.
Optionally, logical expressions are used to refine or override concept-based
message categorization.


5. The message is routed for an automatic or semi-automatic action, based on
category properties and other configuration settings.
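The five steps above can be read as a small pipeline. The sketch below wires the stages together with pluggable functions; the stage implementations (`identify`, `score`, `route`) and the 50% threshold are hypothetical stand-ins chosen only to make the flow runnable, not Rapport behavior.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def process_message(
    text: str,
    identify_concepts: Callable[[str], List[str]],
    score_categories: Callable[[Counter], Dict[str, float]],
    route: Callable[[str, float], Tuple[str, str]],
) -> Tuple[str, str]:
    concepts = identify_concepts(text)         # step 2: NLP engine
    concept_model = Counter(concepts)          # step 3: export as a concept model
    scores = score_categories(concept_model)   # step 4: statistic engine
    best = max(scores, key=scores.get)
    return route(best, scores[best])           # step 5: routing


# Hypothetical stand-ins for the real engines.
def identify(text: str) -> List[str]:
    return [w.strip(".,?!").lower() for w in text.split()]


def score(model: Counter) -> Dict[str, float]:
    keywords = {"Mortgage Info": {"mortgage", "loan"}, "Transfer Request": {"transfer"}}
    total = sum(model.values()) or 1
    return {c: 100.0 * sum(model[k] for k in kws) / total for c, kws in keywords.items()}


def route(category: str, category_score: float) -> Tuple[str, str]:
    # Assumed rule: reply automatically only when the score is at least 50%.
    return ("auto_reply" if category_score >= 50 else "manual_queue", category)


print(process_message("What are your mortgage rates?", identify, score, route))
# ('manual_queue', 'Mortgage Info')
```

Passing the stages in as functions keeps the flow itself independent of how concepts are identified or scored.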

RME Processing Flow

Incoming Message -> NLP Engine (identifies concepts within each message) ->
Concept Modeler -> Statistic Engine (compares concept models and uses optional
logical expressions to score categories) -> Message Routed for Automatic or
Semi-Automatic Action

Simplified diagram illustrating the RME message processing flow

Rapport’s Adaptive Knowledge Base

The knowledge required for accurately classifying each customer message is stored in
Rapport’s adaptive Knowledge Base. The Rapport Knowledge Base is a repository of
linguistic and statistical information used during RME processing. The Knowledge
Base includes a framework of user-defined categories, built according to the specific
requirements of each organization using Rapport. It is fully adaptive—Learning
(discussed below) automatically updates linguistic and statistical information to
improve future message classification.
The Rapport Knowledge Base consists of two components: the Linguistic Knowledge
Base (LKB) and Statistic Knowledge Base (SKB). Rapport’s LKB contains a glossary
of standard English usage, semantically significant words, linguistically identical
words, grammar, rules for morphological analysis, and optional domain-specific
terminology. Rapport’s SKB contains hierarchical or flat decision trees—frameworks
of categories. Each category in a decision tree is associated with the concept models
and optional logical expressions that enable Rapport to accurately classify messages.

Rapport Knowledge Base

LKB: Linguistic Data
♦ Glossary of Standard English Usage
♦ Semantically Significant Words
♦ Linguistically Identical Words
♦ Grammar
♦ Rules for Morphological Analysis
♦ Optional Domain-Specific Terminology

SKB: Statistic Data
♦ Hierarchical or Flat Decision Trees
♦ Categories populated with Concept Models and optional Logical Expressions

Figure labels: Decision Trees; Concept Models and (optional) Logical Expressions


Building the Statistic Knowledge Base

Rapport’s Knowledge Base Editor application is used to create decision tree structures
stored in the SKB.
Decision trees may have either a flat or hierarchical structure, determined by an
organization’s message classification requirements. Flat decision trees are category
lists, employed when a hierarchical organization of categories is not warranted.
Hierarchical decision trees are well-suited for organizing categories that break down
logically into successively greater levels of detail.
The following simplified diagram represents a section of a financial institution’s
hierarchical decision tree. Branches of the hierarchical tree are designated by ovals;
sub-branches associated with categories appear in gray. Note that the categories
beneath each branch are logically related; for example, Address Change, Telephone
Change, and E-mail Change are all related to the “Customer Info” branch.

Requests
    Statement Copy
    Check Copy

Customer Info
    Address Change
    Telephone Change
    E-mail Change

Check Order, Traveler’s Checks, Foreign Currency
    (parent branch not recoverable from the figure)

Representation of a hierarchical decision tree
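One way to picture a hierarchical decision tree as a data structure is a simple node type whose leaves are categories. This is a sketch under that assumption; the `Node` class and `leaf_paths` method are hypothetical and do not describe the Knowledge Base Editor's storage format. A flat tree is simply a root whose children are all leaves.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

    def leaf_paths(self, prefix: str = "") -> Iterator[str]:
        """Yield the full path of every leaf (category) beneath this node."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        if not self.children:
            yield path
        for child in self.children:
            yield from child.leaf_paths(path)


customer_info = Node(
    "Customer Info",
    [Node("Address Change"), Node("Telephone Change"), Node("E-mail Change")],
)
print(list(customer_info.leaf_paths()))
# ['Customer Info/Address Change', 'Customer Info/Telephone Change', 'Customer Info/E-mail Change']
```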

Note: Using the Knowledge Base Editor, logical expressions may also be associated
with specified categories at this stage to further refine the classification

Once a skeletal decision tree structure (either hierarchical or flat) has been created,
concept models for each category, used for message classification, are gathered
through the Learning or Training processes.

Learning is an ongoing automatic process, invisible to the user, that gathers concept
models for each category in the SKB over time. Concept models are gathered by
collecting feedback from normal message processing activity, bootstrapping the
system for accurate message classification in the future.


For example, when a customer service agent uses the Rapport Message Center
application to compose a reply to a message, the agent may choose from a database of
pre-written responses linked to categories in the Knowledge Base. The act of choosing
a response provides feedback to the system; concept models contained in the message
form the basis of concept models associated with categories linked to the response.
In addition to bootstrapping the system, learning continuously updates and enriches
existing concept models in the SKB during normal Rapport usage. Learning is an
organic process, enabling the Knowledge Base to grow and adapt over time. Concept
models are refined by introducing new information derived from changes that have
occurred in the composition of messages, and from agent activity. As Rapport learns,
it broadens the base of concept models, making the system more precise over time.
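The feedback loop described above can be sketched as an incremental update: each time an agent picks a pre-written response, the concepts in the message are added to the models of the categories linked to that response. The `StatisticKB` class, the `record_feedback` method, and the response-to-category mapping are hypothetical, and concept models are simplified to concept counts.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


class StatisticKB:
    """Toy stand-in for the SKB: one concept-count model per category."""

    def __init__(self) -> None:
        self.models: Dict[str, Counter] = defaultdict(Counter)

    def record_feedback(self, message_concepts: Iterable[str], response_categories: List[str]) -> None:
        # Fold the message's concepts into every category linked to the chosen response.
        concepts = list(message_concepts)
        for category in response_categories:
            self.models[category].update(concepts)


# Assumed mapping from pre-written responses to categories.
RESPONSE_CATEGORIES = {"mortgage_rates_reply": ["Mortgage Info"]}

skb = StatisticKB()
skb.record_feedback(["mortgage", "rate"], RESPONSE_CATEGORIES["mortgage_rates_reply"])
skb.record_feedback(["mortgage", "refinance"], RESPONSE_CATEGORIES["mortgage_rates_reply"])
print(skb.models["Mortgage Info"]["mortgage"])  # 2
```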

The Training process is an optional but recommended method for gathering models
for categories in the SKB decision tree. Training is implemented offline, and involves
analyzing a corpus of sample messages classified into pre-defined categories. These
messages are first processed by Rapport’s Lexical Editor to enrich the LKB with user-
specific linguistic data. Then each message in the corpus is processed individually by
the NLP and Statistic engines to populate the SKB with models used to classify
incoming messages.
The sections that follow discuss the Training process in greater detail.

Stages of Knowledge Base Development

Rapport Knowledge Base development based on Training is an optional but
recommended process, implemented offline, that consists of the following stages:

• Creating a Corpus
A corpus of sample messages, pre-classified according to categories, provides
source material for the NLP and Statistic Training processes that build the Rapport
Knowledge Base.

• The Pre-Training Process
An optional process that enriches the LKB by extracting and identifying
significant words and linguistic information that are unique to the corpus of
messages.

• Knowledge Base Building
An optional process consisting of two stages: NLP and Statistic Training. NLP
Training generates concept models (units of linguistic information used by the
statistic engine to build the SKB). Statistic Training gathers concept models from
each message in the corpus and updates each category’s models in the SKB
decision tree.


Creating a Corpus
A corpus is a collection of sample messages gathered by an organization (prior to
using Rapport) that have been pre-classified according to their subject matter. The
corpus provides source data used during Pre-training, NLP Training, and Statistic
Training (described below).
The corpus may be organized by grouping similar messages in directories or folders
according to category names that represent the messages’ content. Alternatively, each
message may have a field or data identifier that indicates its category (or categories).
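For the directory-based layout, a loader might look like the sketch below. It assumes one subdirectory per category and one UTF-8 ".txt" file per message; both the file extension and the `load_corpus` function are assumptions for illustration, not a Rapport corpus format.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def load_corpus(root: Path) -> List[Tuple[str, str]]:
    """Return (category, text) pairs; each subdirectory name is treated as a category."""
    samples = []
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for msg_file in sorted(category_dir.glob("*.txt")):
            samples.append((category_dir.name, msg_file.read_text(encoding="utf-8")))
    return samples


# Build a tiny throwaway corpus to show the expected layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Mortgage Info").mkdir()
    (root / "Mortgage Info" / "msg1.txt").write_text("What are your mortgage rates?", encoding="utf-8")
    corpus = load_corpus(root)

print(corpus)  # [('Mortgage Info', 'What are your mortgage rates?')]
```

A corpus that stores the category in a message field instead would need a different reader, but could produce the same list of pairs.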

A corpus with similar messages grouped together in separate folders or directories
representing categories (Category 1, Category 2, ... Category n)

A corpus where each message is associated with one or more categories
(for example: Category 4; Category 2; Category 1,5; Category 3)

For the subsequent Pre-Training and Training processes to be most effective, the
corpus should only contain messages that are accurately classified and free of
extraneous text (unrelated to the message’s category). An ideal corpus consists of
messages that are classified according to well-defined categories (avoiding
redundancies between categories), with textual content that is consistently
representative of the category’s subject. As many messages as possible with similar
message content should be grouped together for each category—more messages per
category improves the quality of concept models created during the statistic Training
process (described below).


The Pre-Training Process

Pre-Training is an optional process that extracts and identifies significant linguistic
information unique to the corpus being analyzed. This data enriches the LKB,
improving NLP Training and the RME’s ability to accurately classify messages.
Each business or organization has its own vocabulary of words that are unique and
significant. For example, an Internet Service Provider (ISP) may consider the words
“Internet Connection” to be significant, while an airline passenger service might
decide that these words are insignificant. At the same time, both companies would
probably consider the word “connection” to be significant, but they would define
“connection” in two entirely different ways. To the ISP, a connection is an Internet
hookup; to an airline, it’s an air flight. In contrast, an insurance company may define
“connection” as insignificant.
Rapport’s Lexical Editor application is used to implement the Pre-Training process.
The Lexical Editor analyzes the corpus of messages, filters the text, and generates lists
of simple linguistic units called tokens and token pairs, organized according to
A token is a string of characters identified by the Lexical Editor within a body of text.
When the system analyzes the text of a corpus, it searches for delimiter characters
such as spaces and typographical marks (periods, colons, etc.). Any string of
characters found between these delimiters is recorded as a token. Significant tokens,
non-significant tokens, and word associations are identified using the Lexical Editor,
and stored in the LKB.
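The tokenization described above can be sketched in a few lines. The delimiter set, the lowercasing, and the reading of "token pairs" as adjacent tokens are assumptions made for this example; the Lexical Editor's actual rules are not given in this paper.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed delimiters: whitespace plus common typographical marks.
DELIMITERS = re.compile(r"[\s.,:;!?()\"]+")


def tokenize(text: str) -> List[str]:
    """Split text on delimiter runs and return lowercase tokens."""
    return [t.lower() for t in DELIMITERS.split(text) if t]


def token_pairs(tokens: List[str]) -> List[Tuple[str, str]]:
    """Pair each token with the token that follows it."""
    return list(zip(tokens, tokens[1:]))


corpus = ["My Internet connection is down.", "Internet connection: slow again!"]
token_freq: Counter = Counter()
pair_freq: Counter = Counter()
for message in corpus:
    tokens = tokenize(message)
    token_freq.update(tokens)
    pair_freq.update(token_pairs(tokens))

print(token_freq["connection"])                 # 2
print(pair_freq[("internet", "connection")])    # 2
```

Frequency counts like these are the kind of evidence a user could consult when marking tokens as significant or non-significant.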

Note: The Pre-Training process is particularly useful for preparing the RME to
accurately classify and process messages from international sources,
especially messages including frequent misspellings and non-standard
English usage.

Corpus -> Lexical Editor -> LKB

Lexical Editor
♦ Analyzes the corpus
♦ Filters the word base
♦ Generates lists of single tokens and token pairs
♦ Calculates token frequency
♦ Enables the user to identify significant and non-significant tokens, and define
word associations
♦ Stores information in the Linguistic Knowledge Base

LKB (Linguistic Knowledge Base)
♦ Receives corpus-specific linguistic data from the Lexical Editor
♦ Also contains additional domain knowledge (optional), standard English word
lists, grammar, and rules for morphological analysis


Knowledge Base Building

Knowledge Base building based on a corpus is implemented in two phases: NLP
Training and Statistic Training.

The NLP Training Phase

During the NLP Training phase, the NLP engine analyzes and processes each message
in the corpus individually in two stages: Pre-Processing and Processing.
During Pre-Processing, the NLP engine analyzes each message text, identifies the
portion of text to be processed, and generates an intermediate representation of the
concepts contained in the message. In the Processing stage, the NLP engine uses
morphological rules, word associations, and other linguistic techniques to accurately
determine the concepts contained in each message, and the associations between them.
These concepts are exported to the statistic engine for statistic Training via the
Concept Modeler. The Concept Modeler converts the message’s concepts into
concept models—a format used by the statistic engine to build the Statistic
Knowledge Base.

Corpus -> NLP Engine (using data from the LKB) -> Concepts -> Concept Modeler ->
Statistic Engine

NLP Engine: analyzes and processes each message individually

Pre-Processing
♦ Identifies the portion of text to be processed
♦ Receives data from the Linguistic Knowledge Base
♦ Generates an intermediate representation of concepts

Processing
♦ Uses morphological rules, word associations, and complex algorithms for
generating concepts, and concepts based on other concepts
♦ Exports concepts to the Concept Modeler

Concept Modeler: converts concepts into concept models
Statistic Engine: implements Statistic Training


The Statistic Training Phase

Statistic Training is implemented using the Rapport Knowledge Base Editor
application. A skeletal decision tree structure is built based on the same categories
used to classify messages in the corpus. During statistic Training, the statistic engine
receives concept models from each corpus message individually. The statistic engine
builds the SKB by performing operations on these concept models, and creating
models for the categories of each message in the SKB decision tree. The result is an
SKB populated with models that enable accurate classification of incoming messages
during online RME processing.
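As a rough picture of the batch flow, the sketch below walks a pre-classified corpus one message at a time, converts each message's concepts into a count-based model, and merges that model into every category the message belongs to. The `statistic_training` and `toy_nlp` functions are hypothetical, and the "operations on concept models" are reduced here to simple count accumulation.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple


def statistic_training(
    corpus: List[Tuple[List[str], str]],
    nlp_engine: Callable[[str], List[str]],
) -> Dict[str, Counter]:
    """Build per-category models from (categories, text) corpus entries."""
    skb: Dict[str, Counter] = defaultdict(Counter)
    for categories, text in corpus:
        concept_model = Counter(nlp_engine(text))  # stand-in for the Concept Modeler
        for category in categories:
            skb[category].update(concept_model)
    return dict(skb)


def toy_nlp(text: str) -> List[str]:
    # Assumed placeholder: keep lowercase words longer than three characters.
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if len(w) > 3]


corpus = [
    (["Mortgage Info"], "Please send mortgage rates."),
    (["Mortgage Info", "Transfer Request"], "Transfer funds to my mortgage account."),
]
skb = statistic_training(corpus, toy_nlp)
print(skb["Mortgage Info"]["mortgage"])     # 2
print(skb["Transfer Request"]["transfer"])  # 1
```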

Note: Statistic Training may also provide feedback (manually) to the NLP
Training process, improving NLP analysis and the determination of
concepts.

Concept Models (per individual message) -> Statistic Engine (Knowledge Base
Editor) -> Statistic Knowledge Base

Statistic Engine
♦ Populates decision tree with new concept models based on each message’s
concept models
♦ Updates existing models in the Statistic Knowledge Base

Statistic Knowledge Base
♦ Stores concept models for each category in decision trees

Updating the Knowledge Base

Rapport readily adapts to almost any change in your incoming message environment.
In some situations, however, the Learning process may take time. A more immediate
solution is running an accelerated version of the Pre-Training and Training processes.
Repeating the Pre-Training and Training (as required) ensures optimal message
classification.
It is recommended to repeat these processes when:

• Major changes are made to categories

• Demographic or geographic changes affect the origin of your incoming messages
(e.g., an organization begins to receive large numbers of messages from a
location outside its normal area of operation)

• New categories are added to the SKB

• Products or services are added or changed

• Special events require a response

Summary of Knowledge Base Development

Linguistic and statistical data stored in the Rapport Knowledge Base is used by the
RME to perform accurate message classification, enabling the system to take the most
appropriate action for each customer message.

To gather this data, the system can be bootstrapped by an automatic process called
Learning. Learning is ongoing, invisible to the user, and populates the SKB decision
tree with concept models over time during normal Rapport operation. In addition to
bootstrapping the system, learning continuously updates models in the SKB,
improving message classification.

Alternatively, the Rapport Knowledge Base may be built based on a corpus of sample
messages classified according to categories. During Pre-Training, the Lexical Editor is
used to analyze the corpus, identify significant, corpus-specific linguistic data, and
refine the LKB. NLP Training analyzes each message in the corpus individually, and
exports concepts via the Concept Modeler to the statistic engine. The Knowledge Base
Editor application is used to create a skeletal decision tree structure based on corpus
categories. For each message in the corpus, concept models are gathered for
categories in the decision tree, and are stored in the SKB.
The following simplified diagrams illustrate the chronological development of the
Rapport Knowledge Base using the Training process.


Knowledge Base Development (Based on Training)

1. Creating a Corpus: Sample Messages -> Corpus (classified according to
message content)

2. The Pre-Training Process: Corpus -> Lexical Editor -> LKB

3. NLP Training Process: Corpus -> NLP Engine (Pre-Processing & Processing) ->
Concept Modeler -> Concept Models Exported to Statistic Engine

4. Statistic Training Process: Concept Models from NLP Training -> Statistic
Engine -> Statistic Knowledge Base


Online RME Processing

The linguistic and statistical data gathered through Learning, and optionally Training,
enables the RME to accurately classify customer messages on-the-fly. In a process
similar to NLP Training, message concepts are identified by the NLP engine using
data in the LKB, and are exported to the Concept Modeler. Concept models are
received by the statistic engine and compared to existing models in the SKB’s
decision tree, generating category scores. Based on category relevancy, optional
logical expressions and other message parameters, and category configuration
properties, the message is routed for an appropriate automatic or semi-automatic
action. The learning process enables the system to evolve and adapt over time,
constantly improving Rapport’s ability to accurately classify messages in the future.

Online RME Message Processing

Customer Message -> NLP Engine -> Concept Modeler -> Statistic Engine ->
Message Routed for Automatic or Semi-Automatic Action
(processing draws on the Knowledge Base)
