Knowledge Base Development and RME Processing

A Rapport Technical White Paper

DRAFT COPY January 16, 2003

©1999 Banter Technology Inc. All rights reserved.

Rapport Version 3.0 © 1997–1999 Banter Technology Inc. All rights reserved.

The contents of this documentation are strictly confidential and are proprietary to Banter Technology Inc. No part of this documentation may be reproduced, transmitted, or stored, in any form, in whole or in part, or by any means for any purpose without the prior written consent of Banter Technology Inc. The software described in this document is furnished under a license agreement and may be used or copied only in accordance with the terms stipulated therein. Banter Technology Inc. reserves the right to modify the information contained in this document without prior notification. Rapport is a trademark of Banter Technology Inc. Microsoft, Outlook, and Windows are registered trademarks of Microsoft Corporation. Other product and company names mentioned in this document may be trademarks of their respective owners.

Banter Technology Inc. 60 Federal Street Suite 550 San Francisco, CA 94107 Tel: 1-415-247-2600 Fax: 1-415-247-2626 E-mail:


The Rapport Knowledge Base is a unique, adaptive repository of linguistic and statistical information that enables Rapport to accurately manage and classify high-volume customer e-communications. Rapport is a learning system: the Knowledge Base continuously evolves to model an organization’s current communication environment. Working in conjunction with the Rapport Knowledge Base, Rapport’s Relationship Modeling Engine (RME) analyzes customer communications and takes the most appropriate action on the fly, based on various user specifications, including Rapport’s broad spectrum of configuration settings. This white paper examines Rapport’s adaptive Knowledge Base, its development process, and how it enables the RME to accurately process and classify messages.

Background: Rapport’s RME Architecture
In Rapport, each customer message is processed according to user-defined categories. Categories determine which automatic or semi-automatic action is taken with each message. Each category represents the content of a message, or indicates some other attribute of a message such as its source. For example, a financial institution may define categories like Checking Balance, Transfer Request, and Mortgage Info; these categories represent the types of customer communications it commonly receives.

In the Rapport Knowledge Base, categories are associated with linguistic concept models (discussed below) that are used by the RME for message classification. These concept models determine the relevance of incoming messages to the categories in the system. Optionally, categories in the Knowledge Base may be associated with logical expressions, which are formulas or statements used to refine or override the RME’s concept-based message classification. For example, a category may be associated with the following expression: $R_secured(s) == ‘YES’. Rapport analyzes an incoming message and, using this expression, assigns the associated category 100% relevancy if the message originated from a secure source.

During the Rapport configuration process, each category is associated with properties that determine which actions are taken for each message. For example, a message received by a financial institution is matched to the Mortgage Info category. This category may have properties that instruct Rapport to compose and send an appropriate automatic reply using standard pre-written text containing mortgage information. Alternatively, the Mortgage Info category’s properties may be set to route the message to an appropriate queue for manual handling.
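The relationship between a category, its optional logical expression, and its concept-based score can be sketched as follows. This is a minimal illustration in Python, not Rapport's implementation: the `Category` class, the `action` property, the `relevancy` function, and the `secured` message field are all hypothetical names, and relevancy is expressed on a 0 to 1 scale.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Category:
    name: str
    action: str  # hypothetical property, e.g. "auto_reply" or "route_to_queue"
    # Optional logical expression: returns a relevancy in [0, 1],
    # or None to defer to the concept-based score.
    expression: Optional[Callable[[dict], Optional[float]]] = None


def relevancy(category: Category, message: dict, concept_score: float) -> float:
    """Use the expression's result when it decides; otherwise the concept score."""
    if category.expression is not None:
        override = category.expression(message)
        if override is not None:
            return override
    return concept_score


def secured(message: dict) -> Optional[float]:
    """Stand-in for the secure-source expression: full relevancy when secure."""
    return 1.0 if message.get("secured") == "YES" else None


secure_cat = Category("Secure Source", "route_to_queue", secured)
print(relevancy(secure_cat, {"secured": "YES"}, 0.2))  # 1.0
print(relevancy(secure_cat, {"secured": "NO"}, 0.2))   # 0.2
```

Returning None from the expression is one way to model "refine or override": the expression takes over only when it has something to say.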

This document contains confidential and proprietary information.


RME Analysis and Message Processing
The RME uses linguistic data and complex statistical algorithms to accurately analyze and classify customer messages. Each message entering Rapport is analyzed by the RME’s two primary components: the Natural Language Processing (NLP) engine and the Rapport statistic engine.

The NLP engine identifies concepts, the basic units of linguistic or quantitative data contained within each message. Linguistic data may be based on semantic, contextual, and morphological information. Quantitative data may include various indicators derived from the message, such as the number of sentences in a message. For example, a message may contain the word “depositing.” Rapport’s NLP engine uses morphological analysis to derive the base form of this word as “deposit,” an identifiable concept used to classify the message.

After a message’s concepts are identified by the NLP engine, they are exported to the statistic engine as concept models, the format used for Rapport’s statistical analysis. Rapport’s statistic engine compares a message’s concept models to each category’s pre-existing concept models in the Knowledge Base, which were gathered during the Learning and (optional) Training processes described below.

The following example illustrates Rapport’s concept-based analysis: A financial institution receives a message requesting information about mortgages. The RME analyzes this message and identifies the linguistic concepts it contains. Then the RME compares these concepts to all concept models associated with categories in the system, and determines that the Mortgage Info category best matches this message.

Using unique proprietary algorithms and formulas to derive category relevancy, the statistic engine calculates category scores: percentage values reflecting the likelihood that a message belongs to a category. The statistic engine may also use logical expressions to extract and evaluate message parameters that may influence or determine category relevancy.
Depending on a broad spectrum of system configuration settings, the message is routed for appropriate automatic or semi-automatic actions. The simplified diagram below illustrates the RME’s message processing flow:

1. An incoming message enters the Rapport system.
2. The NLP engine identifies concepts within the message using linguistic data stored in the Knowledge Base.
3. Concepts are exported as concept models to Rapport’s statistic engine.
4. The statistic engine compares the message’s concept models with each category’s existing concept models to determine category relevancy. Optionally, logical expressions are used to refine or override concept-based message categorization.
5. The message is routed for an automatic or semi-automatic action, based on category properties and other configuration settings.
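Steps 2 through 4 of the flow can be sketched as a toy pipeline. The sketch below is an assumption-laden illustration: Rapport's linguistic rules and scoring algorithms are proprietary and are not described in this paper, so the suffix list, the overlap-based `score` function, and every function name here are hypothetical stand-ins.

```python
import re
from collections import Counter

SUFFIXES = ("ing", "s")  # toy morphological rules, not Rapport's


def identify_concepts(text):
    """Toy NLP step: lowercase, keep alphabetic words, strip one known suffix."""
    concepts = []
    for word in re.findall(r"[a-z]+", text.lower()):
        for suffix in SUFFIXES:
            # Only strip when a stem of at least four letters remains.
            if word.endswith(suffix) and len(word) - len(suffix) >= 4:
                word = word[: -len(suffix)]
                break
        concepts.append(word)
    return concepts


def to_concept_model(concepts):
    """Toy Concept Modeler: represent concepts as occurrence counts."""
    return Counter(concepts)


def score(message_model, category_model):
    """Fraction of the message's concept occurrences known to the category."""
    total = sum(message_model.values())
    if total == 0:
        return 0.0
    shared = sum(n for c, n in message_model.items() if c in category_model)
    return shared / total


def classify(text, category_models):
    """Return the best-scoring category name and all category scores."""
    model = to_concept_model(identify_concepts(text))
    scores = {name: score(model, cm) for name, cm in category_models.items()}
    return max(scores, key=scores.get), scores


category_models = {
    "Mortgage Info": Counter({"mortgage": 5, "rate": 3, "loan": 2}),
    "Checking Balance": Counter({"check": 4, "balance": 4, "account": 2}),
}
best, scores = classify(
    "Please send information about mortgages and current rates", category_models
)
print(best)  # Mortgage Info
```

Routing (step 5) is left out here because it depends on category properties and configuration settings rather than on the scoring itself.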

Figure: RME Processing Flow. Simplified diagram illustrating the RME message processing flow. Components shown: Incoming Message; NLP Engine (identifies concepts within each message); Concept Modeler; Statistic Engine (compares concept models and uses optional logical expressions to score categories); Message Routed for Automatic or Semi-Automatic Action; Rapport Knowledge Base.

Rapport’s Adaptive Knowledge Base
The knowledge required for accurately classifying each customer message is stored in Rapport’s adaptive Knowledge Base. The Rapport Knowledge Base is a repository of linguistic and statistical information used during RME processing. The Knowledge Base includes a framework of user-defined categories, built according to the specific requirements of each organization using Rapport. It is fully adaptive—Learning (discussed below) automatically updates linguistic and statistical information to improve future message classification. The Rapport Knowledge Base consists of two components: the Linguistic Knowledge Base (LKB) and Statistic Knowledge Base (SKB). Rapport’s LKB contains a glossary of standard English usage, semantically significant words, linguistically identical words, grammar, rules for morphological analysis, and optional domain-specific terminology. Rapport’s SKB contains hierarchical or flat decision trees—frameworks of categories. Each category in a decision tree is associated with the concept models and optional logical expressions that enable Rapport to accurately classify messages.

Figure: Rapport Knowledge Base.
LKB: Linguistic Data
♦ Glossary of Standard English Usage
♦ Semantically Significant Words
♦ Linguistically Identical Words
♦ Grammar
♦ Rules for Morphological Analysis
♦ Optional Domain-Specific Terminology
SKB: Statistic Data
♦ Hierarchical or Flat Decision Trees
♦ Categories populated with Concept Models and optional Logical Expressions
The diagram labels the SKB contents as Decision Trees and as Concept Models and (optional) Logical Expressions, with the sample expression (stat_matching <= 0.5) ? 0 : stat_matching.
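The sample expression in the Knowledge Base diagram, (stat_matching <= 0.5) ? 0 : stat_matching, reads as a conditional: if the statistical match is at or below 0.5, the category receives a relevancy of 0; otherwise the match value is kept. A hedged Python equivalent is shown below, assuming that stat_matching is a score between 0 and 1; the function name and the `cutoff` parameter are hypothetical.

```python
def apply_threshold(stat_matching: float, cutoff: float = 0.5) -> float:
    """Suppress weak statistical matches; pass stronger ones through unchanged."""
    return 0.0 if stat_matching <= cutoff else stat_matching


print(apply_threshold(0.3))  # 0.0
print(apply_threshold(0.8))  # 0.8
```

An expression of this kind refines concept-based classification rather than replacing it: it only discards matches that are too weak to act on.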


Building the Statistic Knowledge Base
Rapport’s Knowledge Base Editor application is used to create decision tree structures stored in the SKB. Decision trees may have either a flat or hierarchical structure, determined by an organization’s message classification requirements. Flat decision trees are category lists, employed when a hierarchical organization of categories is not warranted. Hierarchical decision trees are well-suited for organizing categories that break down logically into successively greater levels of detail. The following simplified diagram represents a section of a financial institution’s hierarchical decision tree. Branches of the hierarchical tree are designated by ovals; sub-branches associated with categories appear in gray. Note that the categories beneath each branch are logically related; for example, Address Change, Telephone Change, and E-mail Change are all related to the “Customer Info” branch.


Figure: Representation of a hierarchical decision tree. Nodes shown: Customer Info, Statement Copy, Check Copy, Address Change, Telephone Change, E-mail Change, Check Order, Traveler’s Checks, Foreign Currency.

Note: Using the Knowledge Base Editor, logical expressions may also be associated with specified categories at this stage to further refine the classification process.

Once a skeletal decision tree structure (either hierarchical or flat) has been created, the concept models used for message classification are gathered for each category through the Learning or Training processes.
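A skeletal decision tree can be pictured as a small recursive data structure. The sketch below is illustrative only: the `Node` class and the "Root" node are hypothetical, and the only relationship taken from the paper is that Address Change, Telephone Change, and E-mail Change sit beneath the Customer Info branch.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

    def categories(self) -> Iterator[str]:
        """Yield the names of leaf nodes (categories) beneath this node."""
        if not self.children:
            yield self.name
        for child in self.children:
            yield from child.categories()


# Hierarchical tree: a branch with related categories beneath it.
customer_info = Node(
    "Customer Info",
    [Node("Address Change"), Node("Telephone Change"), Node("E-mail Change")],
)
hierarchical = Node("Root", [customer_info])

# Flat tree: every category hangs directly off the root, i.e. a category list.
flat = Node("Root", [Node("Checking Balance"), Node("Transfer Request")])

print(list(customer_info.categories()))
```

In this representation a flat tree is simply the special case where the root has only leaf children, which matches the description of flat trees as category lists.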

Learning is an ongoing automatic process, invisible to the user, that gathers concept models for each category in the SKB over time. Concept models are gathered by collecting feedback from normal message processing activity, bootstrapping the system for accurate message classification in the future.


For example, when a customer service agent uses the Rapport Message Center application to compose a reply to a message, the agent may choose from a database of pre-written responses linked to categories in the Knowledge Base. The act of choosing a response provides feedback to the system: the concept models contained in the message form the basis of the concept models associated with the categories linked to that response.

In addition to bootstrapping the system, Learning continuously updates and enriches existing concept models in the SKB during normal Rapport usage. Learning is an organic process, enabling the Knowledge Base to grow and adapt over time. Concept models are refined by introducing new information derived from changes in the composition of messages and from agent activity. As Rapport learns, it broadens the base of concept models, making the system more precise over time.
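The feedback loop described above can be sketched as an incremental update. This is a simplified illustration under stated assumptions: concept models are represented as plain occurrence counts, and the `RESPONSE_CATEGORIES` mapping, the response identifier, and the `record_feedback` function are hypothetical.

```python
from collections import Counter, defaultdict

# Category name -> accumulated concept counts (a stand-in for concept models).
skb = defaultdict(Counter)

# Hypothetical mapping from pre-written responses to their linked categories.
RESPONSE_CATEGORIES = {"mortgage_rates_reply": ["Mortgage Info"]}


def record_feedback(skb, message_concepts, response_id):
    """Fold a message's concepts into every category linked to the chosen response."""
    for category in RESPONSE_CATEGORIES.get(response_id, []):
        skb[category].update(message_concepts)


# Two agent replies using the same pre-written response.
record_feedback(skb, ["mortgage", "rate"], "mortgage_rates_reply")
record_feedback(skb, ["mortgage", "refinance"], "mortgage_rates_reply")

print(skb["Mortgage Info"]["mortgage"])  # 2
```

Each reply both reinforces concepts the category already knows ("mortgage") and introduces new ones ("refinance"), which is the sense in which the models broaden over time.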

The Training process is an optional but recommended method for gathering models for categories in the SKB decision tree. Training is implemented offline and involves analyzing a corpus of sample messages classified into pre-defined categories. These messages are first processed by Rapport’s Lexical Editor to enrich the LKB with user-specific linguistic data. Then each message in the corpus is processed individually by the NLP and statistic engines to populate the SKB with models used to classify incoming messages. The sections that follow discuss the Training process in greater detail.

Stages of Knowledge Base Development
Rapport Knowledge Base development based on Training is an optional but recommended process implemented offline, consisting of the following stages:

♦ Creating a Corpus: A corpus of sample messages, pre-classified according to categories, provides source material for the NLP and Statistic Training processes that build the Rapport Knowledge Base.
♦ The Pre-Training Process: An optional process that enriches the LKB by extracting and identifying significant words and linguistic information that are unique to the corpus of messages.
♦ Knowledge Base Building: An optional process consisting of two stages, NLP Training and Statistic Training. NLP Training generates concept models, the units of linguistic information used by the statistic engine to build the SKB. Statistic Training gathers concept models from each message in the corpus and updates each category’s models in the SKB decision tree.


Creating a Corpus
A corpus is a collection of sample messages gathered by an organization (prior to using Rapport) that have been pre-classified according to their subject matter. The corpus provides source data used during Pre-training, NLP Training, and Statistic Training (described below). The corpus may be organized by grouping similar messages in directories or folders according to category names that represent the messages’ content. Alternatively, each message may have a field or data identifier that indicates its category (or categories).

Figure: A corpus with similar messages grouped together in separate folders or directories representing categories (Category 1, Category 2, ... Category n).

Figure: A corpus where each message is associated with one or more categories (for example, one message labeled Category 1,5 and others labeled Category 2, Category 3, or Category 4).

For the subsequent Pre-Training and Training processes to be most effective, the corpus should only contain messages that are accurately classified and free of extraneous text (text unrelated to the message’s category). An ideal corpus consists of messages that are classified according to well-defined categories (avoiding redundancies between categories), with textual content that is consistently representative of the category’s subject. As many messages as possible with similar content should be grouped together for each category: more messages per category improve the quality of the concept models created during the Statistic Training process (described below).
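Both corpus layouts reduce to the same thing: a mapping from each category to its sample messages. The sketch below assumes a corpus supplied as (text, categories) pairs and adds a simple check for thinly populated categories. The function names and the `minimum` parameter are hypothetical; the paper does not state how many messages per category are enough.

```python
from collections import defaultdict


def group_by_category(messages):
    """messages: iterable of (text, [category, ...]) pairs."""
    grouped = defaultdict(list)
    for text, categories in messages:
        # A message labeled with several categories is filed under each of them.
        for category in categories:
            grouped[category].append(text)
    return dict(grouped)


def sparse_categories(grouped, minimum):
    """Names of categories with fewer than `minimum` sample messages."""
    return sorted(name for name, texts in grouped.items() if len(texts) < minimum)


corpus = [
    ("What is my checking balance?", ["Checking Balance"]),
    ("Please mail me a copy of my balance.", ["Checking Balance", "Statement Copy"]),
    ("What are your mortgage rates?", ["Mortgage Info"]),
]
grouped = group_by_category(corpus)
print(sparse_categories(grouped, minimum=2))  # ['Mortgage Info', 'Statement Copy']
```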


The Pre-Training Process
Pre-Training is an optional process that extracts and identifies significant linguistic information unique to the corpus being analyzed. This data enriches the LKB, improving NLP Training and the RME’s ability to accurately classify messages online.

Each business or organization has its own vocabulary of words that are unique and significant. For example, an Internet Service Provider (ISP) may consider the words “Internet Connection” to be significant, while an airline passenger service might decide that these words are insignificant. At the same time, both companies would probably consider the word “connection” to be significant, but they would define “connection” in two entirely different ways. To the ISP, a connection is an Internet hookup; to an airline, it is a connecting flight. In contrast, an insurance company may define “connection” as insignificant.

Rapport’s Lexical Editor application is used to implement the Pre-Training process. The Lexical Editor analyzes the corpus of messages, filters the text, and generates lists of simple linguistic units called tokens and token pairs, organized according to frequency. A token is a string of characters identified by the Lexical Editor within a body of text. When the system analyzes the text of a corpus, it searches for delimiter characters such as spaces and typographical marks (periods, colons, etc.). Any string of characters found between these delimiters is recorded as a token. Significant tokens, non-significant tokens, and word associations are identified using the Lexical Editor and stored in the LKB.

Note: The Pre-Training process is particularly useful for preparing the RME to accurately classify and process messages from international sources, especially messages including frequent misspellings and non-standard English usage.
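The token and token-pair lists described for the Lexical Editor can be approximated in a few lines. This is a rough sketch, not the Lexical Editor's algorithm: the delimiter set, the lowercasing, and the definition of a token pair as two adjacent tokens are all assumptions made for illustration.

```python
import re
from collections import Counter

DELIMITERS = r"[\s.,:;!?]+"  # assumed delimiter set: whitespace and common marks


def tokenize(text):
    """Split on delimiters; any non-empty string between them is a token."""
    return [t for t in re.split(DELIMITERS, text.lower()) if t]


def token_frequencies(corpus):
    """Count single tokens and adjacent token pairs across all messages."""
    singles, pairs = Counter(), Counter()
    for message in corpus:
        tokens = tokenize(message)
        singles.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    return singles, pairs


corpus = ["My Internet connection is down.", "Internet connection: slow today"]
singles, pairs = token_frequencies(corpus)
print(pairs.most_common(1))  # [(('internet', 'connection'), 2)]
```

A frequency-ordered list like this is what would let a user spot that “Internet connection” recurs often enough in an ISP's corpus to be marked as significant.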

Figure: The Pre-Training process.
Lexical Editor:
♦ Analyzes the corpus
♦ Filters the word base
♦ Generates lists of single tokens and token pairs
♦ Calculates token frequency
♦ Enables the user to identify significant and non-significant tokens, and define word associations
♦ Stores information in the Linguistic Knowledge Base
Linguistic Knowledge Base:
♦ Receives corpus-specific linguistic data from the Lexical Editor
♦ Also contains additional domain knowledge (optional), standard English word lists, grammar, and rules for morphological analysis



Knowledge Base Building
Knowledge Base building based on a corpus is implemented in two phases: NLP Training and Statistic Training.

The NLP Training Phase
During the NLP Training phase, the NLP engine analyzes and processes each message in the corpus individually in two stages: Pre-Processing and Processing. During Pre-Processing, the NLP engine analyzes each message text, identifies the portion of text to be processed, and generates an intermediate representation of the concepts contained in the message. In the Processing stage, the NLP engine uses morphological rules, word associations, and other linguistic techniques to accurately determine the concepts contained in each message and the associations between them.

These concepts are exported to the statistic engine for Statistic Training via the Concept Modeler. The Concept Modeler converts the message’s concepts into concept models, the format used by the statistic engine to build the Statistic Knowledge Base.
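The paper does not say how the NLP engine decides which portion of a message to process. One plausible reading, sketched below purely as an assumption, is that Pre-Processing discards text that does not belong to the customer's own message, such as quoted earlier messages and signature blocks. The `select_body` function and both of its rules are hypothetical.

```python
def select_body(raw: str) -> str:
    """Keep the sender's own text; drop quoted lines and the signature block."""
    kept = []
    for line in raw.splitlines():
        if line.rstrip() == "--":          # conventional e-mail signature separator
            break
        if line.lstrip().startswith(">"):  # line quoted from an earlier message
            continue
        kept.append(line)
    return "\n".join(kept).strip()


raw = "I am depositing a check.\n> earlier reply text\n--\nJane Doe"
print(select_body(raw))  # I am depositing a check.
```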

Figure: The NLP Training phase.
NLP Engine:
♦ Analyzes and processes each message individually
♦ Identifies the portion of text to be processed
♦ Receives data from the Linguistic Knowledge Base
♦ Generates an intermediate representation of concepts
♦ Uses morphological rules, word associations, and complex algorithms for generating concepts, and concepts based on other concepts
♦ Exports concepts to the Concept Modeler
Concept Modeler: Converts concepts into concept models
Statistic Engine: Implements Statistic Training


The Statistic Training Phase
Statistic Training is implemented using the Rapport Knowledge Base Editor application. A skeletal decision tree structure is built based on the same categories used to classify messages in the corpus. During Statistic Training, the statistic engine receives the concept models of each corpus message individually. The statistic engine builds the SKB by performing operations on these concept models and creating models for each message’s categories in the SKB decision tree. The result is an SKB populated with models that accurately classify incoming messages during online RME processing.

Note: Statistic Training may also provide feedback (manually) to the NLP Training process, improving NLP analysis and the determination of concepts.
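The per-message accumulation performed during Statistic Training can be sketched as a batch pass over the corpus. The operations Rapport actually performs on concept models are not described in this paper, so this sketch substitutes simple relative frequencies; the `train` function and the corpus format are hypothetical.

```python
from collections import Counter


def train(corpus):
    """corpus: iterable of (concepts, [category, ...]) pairs.

    Returns category -> {concept: relative frequency within that category}.
    """
    counts = {}
    for concepts, categories in corpus:
        # Each message contributes its concepts to every category it belongs to.
        for category in categories:
            counts.setdefault(category, Counter()).update(concepts)
    return {
        category: {c: n / sum(model.values()) for c, n in model.items()}
        for category, model in counts.items()
    }


corpus = [
    (["mortgage", "rate"], ["Mortgage Info"]),
    (["mortgage", "loan"], ["Mortgage Info"]),
    (["check", "balance"], ["Checking Balance"]),
]
skb = train(corpus)
print(skb["Mortgage Info"]["mortgage"])  # 0.5
```

Normalizing by category size keeps categories with many sample messages from dominating comparisons simply because they have larger raw counts.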

Figure: The Statistic Training phase. Concept Models (per individual message) are received by the Statistic Engine and Knowledge Base Editor, which:
♦ Populates the decision tree with new concept models based on each message’s concept models
♦ Updates existing models in the Statistic Knowledge Base
Statistic Knowledge Base: Stores concept models for each category in decision trees

Updating the Knowledge Base
Rapport readily adapts to almost any change in your incoming message environment. In some situations, however, the Learning process may take time. A more immediate solution is to run an accelerated version of the Pre-Training and Training processes. Repeating Pre-Training and Training as required ensures optimal message classification. Repeating these processes is recommended when:

♦ Major changes have been made to categories
♦ Demographic or geographic changes have occurred affecting the origin of your incoming messages (e.g., an organization begins to receive large numbers of messages from a location outside its normal area of operation)
♦ Adding new categories to the SKB
♦ Adding or changing products or services
♦ Responding to special events

Summary of Knowledge Base Development
Linguistic and statistical data stored in the Rapport Knowledge Base is used by the RME to perform accurate message classification, enabling the system to take the most appropriate action for each customer message.

To gather this data, the system can be bootstrapped by an automatic process called Learning. Learning is ongoing and invisible to the user, and it populates the SKB decision tree with concept models over time during normal Rapport operation. In addition to bootstrapping the system, Learning continuously updates models in the SKB, improving message classification.

Alternatively, the Rapport Knowledge Base may be built based on a corpus of sample messages classified according to categories. During Pre-Training, the Lexical Editor is used to analyze the corpus, identify significant, corpus-specific linguistic data, and refine the LKB. NLP Training analyzes each message in the corpus individually, and exports concepts via the Concept Modeler to the statistic engine. The Knowledge Base Editor application is used to create a skeletal decision tree structure based on corpus categories. For each message in the corpus, concept models are gathered for categories in the decision tree, and are stored in the SKB. The following simplified diagrams illustrate the chronological development of the Rapport Knowledge Base using the Training process.


Figure: Knowledge Base Development (Based on Training). Stages and the components shown for each:
♦ Creating a Corpus: Sample Messages, classified according to message content; Corpus
♦ The Pre-Training Process: Corpus; Lexical Editor Application; Linguistic Knowledge Base
♦ NLP Training Process: Corpus; Linguistic Knowledge Base; NLP Engine (Pre-Processing & Processing); Concept Modeler (Concept Models Exported to Statistic Engine)
♦ Statistic Training Process: Concept Models from NLP Training; Statistic Engine; Statistic Knowledge Base


Online RME Processing
The linguistic and statistical data gathered through Learning, and optionally Training, enables the RME to accurately classify customer messages on the fly. In a process similar to NLP Training, message concepts are identified by the NLP engine using data in the LKB and are exported to the Concept Modeler. Concept models are received by the statistic engine and compared to existing models in the SKB’s decision tree, generating category scores. Based on category relevancy, optional logical expressions and other message parameters, and category configuration properties, the message is routed for an appropriate automatic or semi-automatic action. The Learning process enables the system to evolve and adapt over time, constantly improving Rapport’s ability to accurately classify messages in the future.
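The comparison and routing steps can be illustrated with a stand-in similarity measure. Rapport's scoring formulas are proprietary and are not given in this paper, so the sketch below swaps in cosine similarity, scaled to a percentage, purely for illustration. The `auto_threshold` value and the rule that separates automatic from semi-automatic handling are hypothetical configuration choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two concept models given as {concept: weight}."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def category_scores(message_model, skb):
    """Score every category as a percentage."""
    return {
        name: round(100 * cosine(message_model, model), 1)
        for name, model in skb.items()
    }


def route(scores, auto_threshold=80.0):
    """Pick the best category; act automatically only when the score is high."""
    best = max(scores, key=scores.get)
    action = "automatic" if scores[best] >= auto_threshold else "semi-automatic"
    return best, action


skb = {
    "Mortgage Info": {"mortgage": 2, "rate": 1},
    "Checking Balance": {"check": 1, "balance": 1},
}
scores = category_scores({"mortgage": 1, "rate": 1}, skb)
print(route(scores))  # ('Mortgage Info', 'automatic')
```

Separating scoring from routing mirrors the description above: category scores come from the statistic engine, while the action taken depends on category properties and configuration.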

Figure: Online RME Message Processing. Components shown: Customer Message; NLP Engine (Pre-Processing, Processing); Concept Modeler; Statistic Engine; Message Routed for Automatic or Semi-Automatic Action; Knowledge Base.


