
Subu Goparaju

Vice President
and Head of SETLabs
At SETLabs, we constantly look for opportunities to leverage
technology while creating and implementing innovative business
solutions for our clients. As part of this quest, we develop engineering
methodologies that help Infosys implement these solutions right first
time and every time.
For information on obtaining additional copies, reprinting or translating articles, and all other correspondence,
please contact:
Telephone : 91-80-41173871
Email: SetlabsBriefings@infosys.com
SETLabs 2009, Infosys Technologies Limited.
Infosys acknowledges the proprietary rights of the trademarks and product names of the other
companies mentioned in this issue of SETLabs Briefings. The information provided in this document
is intended for the sole use of the recipient and for educational purposes only. Infosys makes no
express or implied warranties relating to the information contained in this document or to any
derived results obtained by the recipient from the use of the information in the document. Infosys
further does not guarantee the sequence, timeliness, accuracy or completeness of the information and
will not be liable in any way to the recipient for any delays, inaccuracies, errors in, or omissions of,
any of the information or in the transmission thereof, or for any damages arising there from. Opinions
and forecasts constitute our judgment at the time of release and are subject to change without notice.
This document does not contain information provided to us in confidence by our clients.
VOL 7 NO 5
2009
KNOWLEDGE
ENGINEERING
AND MANAGEMENT
ABHISHEK KUMAR
Abhishek Kumar is a Software Engineer at Center for Knowledge Driven Information Systems (CKDIS) at Infosys. He
can be contacted at abhishek_kumar25@infosys.com.
ANJANEYULU PASALA
Anjaneyulu Pasala PhD is a Senior Research Associate at SETLabs, Infosys. His research interests include Software
Engineering and Software Verification and Validation. He can be reached at Anjaneyulu_Pasala@infosys.com.
ANJU G PARVATHY
Anju G Parvathy is a Junior Research Associate with CKDIS at Infosys. She researches in the fields of NLP and Text
Analytics. She can be contacted at anjug_parvathy@infosys.com.
ARIJIT LAHA
Arijit Laha PhD is a Senior Research Associate at SETLabs, Infosys. He researches in Knowledge Work Support
Systems, Pattern Recognition and Fuzzy Set Theory. He can be reached at Arijit_Laha@infosys.com.
ARUN SETHURAMAN
Arun Sethuraman was a Junior Research Associate at SETLabs, Infosys. His research interests include Intelligent
Multi-Agent Systems and Phylogenetics.
ASHISH SUREKA
Ashish Sureka PhD is a Senior Research Associate at SETLabs, Infosys. His research interests are in the areas of Data
Mining and Text Analytics. He can be reached at Ashish_Sureka@infosys.com.
BINTU VASUDEVAN
Bintu G Vasudevan PhD is a Research Associate with CKDIS at Infosys. His research interests include NLP, AI and
Text Analytics. He can be contacted at bintu_vasudevan@infosys.com.
GEORGE ABRAHAM
George Abraham is an Associate Consultant with the Oracle Business Intelligence practice at Infosys. His areas of
interest include Business Intelligence and Innovation Systems. He can be reached at george_abraham01@infosys.com.
JOHN KURIAKOSE
John Kuriakose is a Software Architect with SETLabs, Infosys. He has research interests in semantic technologies and
knowledge engineering. He can be contacted at john_kuriakose@infosys.com.
JOYDIP GHOSHAL
Joydip Ghoshal is a Programmer Analyst at Infosys Technologies Limited. He has vast experience in business
analysis and software development projects. He can be reached at joydip_ghoshal@infosys.com.
KOMAL KACHRU
Komal Kachru is a Researcher with SETLabs, Infosys Technologies. She has several years of research experience in
areas like Artificial Neural Network and Genetic Algorithms. She can be contacted at Komal_Kachru@infosys.com.
MANISH KURHEKAR
Manish Kurhekar is a Programmer Analyst at Infosys Technologies Limited. He has rich experience in Business
Analysis and software development projects. He can be reached at manish_kurhekar@infosys.com.
NIRANJANI S
Niranjani S is a Software Engineer in the Test Automation Lab at SETLabs, Infosys. She can be contacted at
Niranjani_S@infosys.com.
RAJESH BALAKRISHNAN
Rajesh Balakrishnan is a Principal Architect with CKDIS at Infosys Technologies Limited. He has research interests
in NLP, AI and Information Retrieval. He can be reached at rajeshb@infosys.com.
RAJESH ELUMALAI
Rajesh Elumalai is an Associate Consultant with the BPM-EAI Practice at Infosys. His areas of specialization include
BPM and Business Rules Management. He can be contacted at Rajesh_Elumalai@Infosys.com.
RAKESH KAPUR
Rakesh Kapur is a Principal Consultant at Infosys Consulting Services. His key areas of interest include consulting
enterprises to enable process transformation. He can be reached at rakesh_kapur@infosys.com.
RAVI GORTHI
Ravi Gorthi PhD is a Principal Researcher with SETLabs, Infosys. His research interests include Knowledge
Engineering and Model Driven Software Engineering. He can be contacted at Ravi_Gorthi@infosys.com.
SUJATHA R UPADHYAYA
Sujatha R Upadhyaya PhD is a Researcher with SETLabs, Infosys. Her research interests include Knowledge Modeling,
Ontologies, Machine Learning and Text Analytics. She can be reached at Sujatha_Upadhyaya@infosys.com.
SWAMINATHAN NATARAJAN
Swaminathan Natarajan is a Senior Technical Architect with SETLabs, Infosys. His areas of interest include Information
Management and Knowledge Engineering. He can be contacted at Swaminathan_N01@infosys.com.
VENUGOPAL SUBBARAO
Venugopal Subbarao is a Principal Architect with SETLabs, Infosys. His interests are in Information Management and
Knowledge Engineering. He can be reached at venugopal_subbarao@infosys.com.
YOGESH DANDAWATE
Yogesh Dandawate is a Researcher with SETLabs, Infosys. His research interests include Knowledge Engineering,
Ontologies and Text Analytics. He can be contacted at yogesh_dandawate@infosys.com.
Authors featured in this issue
SETLabs Briefings
Advisory Board
Gaurav Rastogi
Associate Vice President,
Head - Learning Services

George Eby Mathew
Senior Principal,
Infosys Australia
Kochikar V P PhD
Associate Vice President,
Education & Research Unit
Raj Joshi
Managing Director,
Infosys Consulting Inc.
Rajiv Narvekar PhD
Manager,
R&D Strategy
Software Engineering &
Technology Labs
Ranganath M
Vice President &
Chief Risk Officer
Subu Goparaju
Vice President & Head,
Software Engineering &
Technology Labs

Knowledge Powered
IT Systems
In the last three decades, information technology has evolved and matured as
a dependable online business transaction processing (OLTP) technology. Some
trillions of business transactions are processed across the world per day and it
is no surprise that millions of people trust the integrity of
this technology. In addition, the last decade has witnessed the availability
and widespread use of online analytical processing (OLAP) tools that offer
business decision-makers multidimensional insights into the latest enterprise
information.
Concurrent with the above evolution, the field of Artificial Intelligence (AI)
has gone through a series of serious challenges in bringing knowledge into
automated reasoning and action. However, the recent success stories in
applying AI techniques to specific business problems hold out the promise that
this field has begun to offer acceptable benefits to the business community. A
paradigm shift in information technology, termed Knowledge Powered IT
(KPIT) systems, is anticipated. These KPIT systems should enable business users
- semi-automatically or in human-assisted ways - to extract, refine and re-use
actionable enterprise knowledge. For example, the knowledge of experienced
professionals who can diagnose and repair complex engineering artifacts with
expert skill, and who constitute a small percentage, can be made available to
novices, who constitute a large percentage, in an attempt to raise the
productivity and/or quality of the novice group. Knowledge Engineering is a
critical aspect of KPIT systems. This discipline covers models to represent
various kinds of knowledge and techniques to extract, refine and re-use such
knowledge, where and when required.
This issue aims to present a landscape picture of emerging trends in business
applications of knowledge engineering that can potentially empower enterprises
to be smart. Be it the usage of divergent terminology to refer to common
business concepts across enterprise IT systems or the usage of domain-specific
knowledge to automatically extract financial data from complex unstructured
sources, the ultimate goal of knowledge engineering is to enable enterprises
to move from the traditional way of managing enterprises to knowledge-oriented
and knowledge-powered management. All the papers in this collection
weave around a very potent theme: knowledge-powered systems for sharp
decision making and efficient management.
We hope you enjoy reading this issue as much as we have enjoyed putting it together
for you. Needless to mention, your feedback will help us in our pursuit to
bring you insights into technology trends through special issues of SETLabs
Briefings such as this one. Please do write in to me with your comments and
suggestions.
Ravi P Gorthi PhD
ravi_gorthi@infosys.com
Guest Editor

Literature Review: Applications of Collaborative Multi-Agent Technology to
Business: A Comprehensive Survey
By Ravi Gorthi PhD, Niranjani S, Anjaneyulu Pasala PhD and Arun Sethuraman
Multi-agent systems have the potential to revolutionize the way businesses operate today. The
authors discuss innovative ways to apply the intelligent systems in the most effective manner to
ease business communication bottlenecks and speed up decisions therein.
Research: Building Knowledge-Work Support Systems with Information
Warehouses
By Arijit Laha PhD
Knowledge can be accessed and interpreted in countless ways. A knowledge management
system falls flat if the socio-cultural-behavioral aspects of knowledge workers and users are
not considered. A task-based knowledge management (TbKM) approach and an Information
Warehouse (IW) can open up a host of possibilities in the field of knowledge management,
asserts the author.
Insight: What's in a Name?
By Yogesh Dandawate and John Kuriakose
Business knowledge is encapsulated in business ontology. Creating business ontologies
afresh is a gargantuan task. The authors draw from their vast experience and suggest that
organizations that dig into IT artifacts from their existing IT portfolio can build such
ontologies with ease.
Opinion: Knowledge Management for Virtual Teams
By Manish Kurhekar and Joydip Ghoshal
Virtual teams are the order of the day and how one leverages KM tools to smudge the physical
boundaries is what becomes a key differentiating factor. The authors document a plethora
of ways to exchange knowledge in a virtual team setup and win over the challenges of virtual
interactions.
Viewpoint: Toward Disruptive Strategies in Knowledge Management
By Rajesh Elumalai and George Abraham
KM tools and technologies have become imperative to the survival of any organization today.
The authors suggest that it is time to revisit the conventional methods that are prevalent today
and work around strategies to maintain a competitive edge in the market.
Perspective: Knowledge Engineering for Customer Service Organizations
By Rakesh Kapur and Venugopal Subbarao
Customer service providers need to update themselves with every new invention and
technology that hits the market. To stand out in the crowd of service advisors, a differentiated
service can be accelerated and aided with the help of knowledge engineering solutions, feel
the authors.
Model: Support to Decision Making: An Approach
By Sujatha R Upadhyaya PhD, Swaminathan Natarajan and Komal Kachru
Tools empowered with a Bayesian inference engine can support decision making in uncertain
situations. The authors suggest that such automated support can effectively reduce analysis
time and accommodate flexibility in business decision making.
Methodology: Automated Knowledge-Based Information Extraction from
Financial Reports
By Bintu G Vasudevan PhD, Anju G Parvathy, Abhishek Kumar and Rajesh Balakrishnan
Financial data is often stored in tabular form. Information stored in financial statements and
documents can throw up brilliant analysis if extracted and mined properly. A methodology
that can mine such text residing in images and similar such unstructured data can affect
investment decisions, avoid pitfalls and reap huge pecuniary benefits, claim the authors.
Case Study: A Differentiated Approach to Business Process Automation using
Knowledge Engineering
By Ashish Sureka PhD and Venugopal Subbarao
Manual processing of data, be it structured or unstructured, is bound to be cumbersome and
error prone. The authors suggest an application that is developed on a requirement that lies
at the intersection of business process automation, knowledge engineering and text analytics
and promises to gather relevant data at a lightning speed.
The Last Word: Power Your Enterprise with Knowledge. Be Smart.
By T Ravindra Babu PhD
Index
Making business decisions under uncertain situations
can be a big pain. Thankfully, Bayesian networks
through their superior knowledge representation and
inference methods come to the decision maker's aid.
Sujatha Upadhyaya PhD
Researcher, Information Management Group
SETLabs, Infosys Technologies Limited
In a world increasingly dictated by change, it is important
for organizations to move away from people-dependent
operations to system-dependent ones. A sturdy knowledge
management system comes in handy in negotiating today's
all-pervasive change.
Rakesh Kapur
Principal Consultant
Consulting Services, Infosys Technologies Limited
Applications of Collaborative Multi-
Agent Technology to Business:
A Comprehensive Survey
By Ravi Gorthi PhD, Niranjani S, Anjaneyulu Pasala PhD and Arun Sethuraman
Intelligent, multi-agent technology offers a host
of new opportunities to businesses across industry
and business segments
The collaborative, intelligent, multi-agent
technology has attracted considerable
attention in recent years. This technology
promises to offer a host of new opportunities to
business communities in almost all the vertical
industry and horizontal business segments.
Applications built using these technologies
enable dynamic data and information acquisition,
aiding planning and decision making. This
paper presents a landscape of plausible
applications that could be very useful to the
CxOs of enterprises in planning future business
strategies.
MULTI-AGENT SYSTEMS (MAS):
BACKGROUND
An intelligent agent is a distinct kind of
software program that has a goal,
has knowledge of one or more domains of
relevance, is autonomous (pro-active) in
achieving its goal, is reactive to changes in
the environment in which it pursues its goal
and is capable of communicating with humans
and other agents [1, 2]. These agents possess
basic characteristics: (i) a role to play, (ii) one
or more goals to achieve, (iii) the capability to take
actions autonomously, (iv) the capability to monitor
the environment periodically and pro-actively
and effect changes, if required, (v) the capability to
sense and react to changes in the environment,
and (vi) the capability to communicate with
humans. They may also possess extended
characteristics: (i) specific knowledge in one or
more subjects, (ii) the capability to communicate
and collaborate/compete with other agents, and
(iii) the ability to be mobile and move around in
the environment, if needed, to achieve the goals [3].
Henceforth in this paper, the term
'agent' refers to a software agent with the above
mentioned basic and/or extended characteristics.
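As an illustration only, the basic characteristics above can be sketched as a small Python class. The SignalAgent name, its thresholds and the dictionary-based environment are hypothetical choices made for this sketch and are not taken from any of the surveyed systems.

```python
from dataclasses import dataclass, field


@dataclass
class SignalAgent:
    """Hypothetical agent; all names and thresholds here are illustrative."""
    role: str = "intersection controller"   # (i) role to play
    max_queue: int = 10                     # (ii) goal: keep the queue at or below this
    step_seconds: int = 5
    min_green: int = 15
    history: list = field(default_factory=list)

    def sense(self, environment: dict) -> int:
        # (v) sense the current state of the environment
        return environment["queue"]

    def act(self, environment: dict) -> str:
        # (iii)/(iv) decide autonomously and effect a change if required
        queue = self.sense(environment)
        green = environment["green_seconds"]
        if queue > self.max_queue:
            environment["green_seconds"] = green + self.step_seconds
            action = "extend"
        elif queue < self.max_queue // 2 and green - self.step_seconds >= self.min_green:
            environment["green_seconds"] = green - self.step_seconds
            action = "shorten"
        else:
            action = "hold"
        self.history.append(action)
        return action

    def report(self) -> str:
        # (vi) communicate with a human operator in plain text
        last = self.history[-1] if self.history else "none"
        return f"{self.role}: last action = {last}"
```

Calling act() once per monitoring interval gives the periodic, pro-active behavior; the extended characteristics (collaboration, mobility) would need message passing between several such objects.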
Simple problems can be solved by
agents with basic characteristics whereas
complex problems require multiple intelligent
agents that collaborate and/or compete among
themselves. Systems built using the extended
characteristics are known as multi-agent
systems (MAS) [4].
Some popular applications of MAS are:
• Multiple agents with different roles
collaborate among themselves to
continuously monitor road traffic and
effect changes to signal durations at
road intersections in order to improve
traffic flow [5]
• A personal assistant agent (residing on a
mobile wireless connected device of its
owner) detects and interacts with other
similar agents of a local social network
in a geography and offers a variety of
services of interest to its owner [6]
• A set of agents with different roles
collaborate among themselves in a
supply-chain management environment
leading to enhanced productivity and
quality [7, 8].
PRIOR SURVEYS
A survey on applications of agents in
telecommunications describes how (i) agents
in Integrated Service Provision help in
mediating all personal communication from
different media sources in line with specific user
needs, (ii) agents help in automating some
of the network management and supervision
tasks, and (iii) agents help in distributed problem
solving [9].
Abbott and Siskovic discuss the
various ways in which agents can be used
in managing the network resources, in
configuring software programs, in the
maintenance and repair of software programs,
in e-mail filtering, network monitoring and
protection, etc. [2].
A survey on applications of agents
in medical science presents agents-based
Intelligent Decision Support Systems (IDSS)
in areas of clinical management and clinical
research [10]. The study also analyzes the
applications of agents-based IDSS in Neonatal
Intensive Care Unit (NICU).
Mladenic shares a survey on the
applications of agents in text analytics and
learning, in which machine learning approaches,
viz., the content-based approach and the collaborative
approach, are discussed along with various user
interface agents [11].
A survey carried out on Distributed
Artificial Intelligence (DAI) illustrates how
multiple agents coordinate with each other in
accomplishing complex tasks and handling
conflicting situations. It also discusses game
theory involving inter-agent cooperation [12].
Yet another survey by Kowalczyk et al. presents
short overviews of intelligent and
mobile agents in e-commerce [13].
Tveit describes an overview of agent
oriented software engineering [14]. Hoekstra
offers a survey on the usage of intelligent agents
in dynamic scripting, genetic algorithms and
neural networks for video games business [15].
However, given the possibility that
future ITES is likely to depend heavily on
and utilize the collaborative MAS technology,
the details offered by the above surveys
are inadequate. CxOs require a
comprehensive and up-to-date view of the landscape
of applications of MAS technology in various
vertical and horizontal business segments. Our
survey addresses this need.
APPLICATIONS OF MULTI-AGENT
TECHNOLOGY
There is evidence of fairly exhaustive surveys
on the applications of MAS technology to
various vertical industrial segments such as
banking and capital markets [16, 17, 18], travel
and tourism [19, 20, 21], telecommunications
[6], transportation and services [22] and
bioinformatics [23, 24, 25] and horizontal
business domains such as knowledge
management [26], supply chain management
[7] and software project management [27, 28,
29, 30]. The subsequent sub-sections present a
landscape view of these applications.
Application of MAS Technology in Vertical
Industry Segments
Banking and Capital Markets
The rapid growth of e-commerce in recent
years has given rise to concerns in the area
of tax evasion and mitigation. Wei et al., [16]
explore and exploit the power and features
of mobile, multi-agent technology to offer a
new solution to this taxation problem, even
while preserving the privacy of the purchaser.
The authors propose the use of five types
of mobile agents, viz. purchaser agents,
seller agents, bank agents, tax agents and
certification agents, that interact with each
other in the creation and tracking of Electronic
Invoice (EI) and Electronic Tax Voucher (ETV)
leading to efficient and simplified e-commerce
taxation mechanism. The authors simulated
the proposed solution using IBM's Aglet
workbench.
The phenomenal growth in the area
of mobile wireless network users has led to
a great opportunity for the banking industry
to offer mobile banking services such as the
capability for the users to perform various
banking transactions viz., seek account balance,
get alerts on changes to bank accounts, perform
money transfers, pay utility bills, etc., through
their mobile devices. Adagunodo et al., present
an Interactive SMS Banking Agent based
innovative, incrementally scalable, mobile
banking solution [17]. In this solution, the real
time (24 hours a day and 7 days a week) SMS
banking agents run on a server (thus avoiding
the need for distribution and deployment) and
offer a range of banking transactions through
SMS facility on mobile devices.
There are many areas of social operations
such as visiting a restaurant, a hospital, an
entertainment park or a theater, where users
would like to know dynamically and easily
about the availability of resources. Muguda et
al. describe a very interesting
application of mobile agents [18]. They discuss
their experiments to characterize and model
the benefits of planning in such environments,
where resources can be reserved and such
reservations can be traded in a marketplace.
Travel and Tourism
If people can dynamically get details on their
mobile wireless network enabled handheld
devices about a historical place, monument
or a piece of art work that they are currently
looking at, it can be of immense value to them.
Bombara et al., offer details of a multi-agent
based system called KORE that aims to address
the above need and provide a personal guide
to assist museum visitors through the visitor's
wireless connected handheld device [19]. KORE
was developed as a prototype using Java Agent
Development Environment (JADE) to work on
Palm m505 PDA and consists of:
• A main museum server that has a global
information database with details of all
the works of art in the museum
• Information service agents that provide
access to the main museum database
• A set of zonal servers, each with a
database holding details of the works of art
in that particular zone, along with zone
information agents that are responsible
for managing the database
• Beamer agents that drive the IR beamers
• User mobile agents that use the WAI (Work
of Art Identifier) from the IR beamers and
provide information based on the user's
preferences.
Tourists visiting a particular city would
like to dynamically receive details such as
places to visit, restaurants nearby, visiting
hours of a tourist spot, etc., on their mobile
devices.
Lopez and Bustos describe a multi-agent
system architecture that provides services such as
obtaining up-to-date information on places to
visit and planning a specific day in the tourism
industry [20]. This architecture
consists of broker agents, sight agents, user
agents and a planning agent. Communication
among these agents is based on a common
ontology. Services such as search, reserve,
plan a specific day and register are
provided to the users. When a user requires
details on places to visit, she uses the
services provided by a user agent. The
user agent sends a REQUEST message to
a broker agent, which in turn processes and
forwards the message to sight agents. The sight
agents that match the parameters return a set of
results that meet the user's requirements,
containing relevant information about each site.
In order to reserve,
the user agent sends a PROPOSE message with
the required information to a sight agent, which
then makes the booking or sends a refusal
if the reservation is not possible. A plan agent
presents plans on receiving a REQUEST
message from the user agent so that time
can be managed efficiently throughout the day.
The authors have developed a prototype and
implemented it using the JADE-LEAP platform on a
Hewlett Packard iPAQ 5450 that has Bluetooth
and 802.11b onboard to run the user agents.
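The REQUEST and PROPOSE exchange described above can be sketched in plain Python as follows. This is a minimal illustration and not the JADE-LEAP implementation of [20]: the class names, the dictionary message content, the matching on a single "category" field and the "ACCEPT" reply label are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    performative: str   # "REQUEST", "PROPOSE", "ACCEPT" or "REFUSE" (labels assumed)
    content: dict


class SightAgent:
    """Hypothetical sight agent that knows its category and remaining capacity."""

    def __init__(self, name, category, capacity):
        self.name, self.category, self.capacity = name, category, capacity

    def handle(self, msg):
        if msg.performative == "REQUEST":
            # Answer only if this sight matches the requested category.
            if msg.content.get("category") == self.category:
                return {"name": self.name, "free_places": self.capacity}
            return None
        if msg.performative == "PROPOSE":
            wanted = msg.content["places"]
            if wanted <= self.capacity:
                self.capacity -= wanted
                return Message("ACCEPT", {"name": self.name, "places": wanted})
            return Message("REFUSE", {"name": self.name})
        raise ValueError(f"unsupported performative: {msg.performative}")


class BrokerAgent:
    """Hypothetical broker that forwards a REQUEST to all sight agents."""

    def __init__(self, sights):
        self.sights = sights

    def handle(self, msg):
        # Forward the message to every sight agent and keep only the matches.
        replies = (sight.handle(msg) for sight in self.sights)
        return [reply for reply in replies if reply is not None]
```

A user agent would first send a REQUEST through the broker to find matching sights, then send a PROPOSE directly to the chosen sight agent and inspect the performative of the reply.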
Similarly, Balachandran and
Enkhsaikhan present the use of MAS technology
in automating various services in the travel industry
involving airline tickets, hotel accommodations,
taxi services, etc. [21]. These agents communicate
with each other and negotiate the services to
provide an optimal solution to a customer. The
different types of agents used in creating this
MAS based application are:
• Business agents such as travel agents,
flight agents, hotel agents and car agents
that are specialized assistant agents for
the customer using this system
• Database agents that are responsible for
performing all database operations such
as queries and updates.
Transportation Services
Increasing population has led to an increase in
traffic. There are limited parking spots in
busy commercial localities, malls, offices and
colleges. People spend a lot of time finding a
parking space.
To address this problem of finding a
parking space, Ganchev et al. have demonstrated
a multi-agent based solution wherein a
set of different types of agents collaborate to
automatically and dynamically locate a parking
space in a university campus. These ideas can
be extended to offer a similar service in other
locations of public interest. The system can
inform the user through her mobile wireless
connected device [22]. The authors presented
a detailed three-tier architecture of this
solution consisting of user mobile devices,
geographically dispersed InfoStations and a
central InfoStation center. Different types of
agents mounted on these devices or systems
collaborate through WPAN, Wi-Fi or Wi-Max
connections. In this solution, a user makes a
request for a parking slot through the mobile
device. This request is forwarded to
the nearest InfoStation by the personal agent
residing on the user's mobile device. If the
nearest InfoStation cannot confirm a parking slot
in its geography, it escalates the request to the
InfoStation center, which then locates the
near-optimal parking slot, if available, and informs
the personal agent of the user. This system
was implemented using the JADE framework and
the LEAP module that facilitates implementation of
agents on mobile devices. One can imagine that,
in future, it is possible to make such a personal
agent intelligent and proactive whereby it (i)
can examine the plans of its owner in advance
and proactively collaborate with appropriate
InfoStations and reserve parking slots, and
(ii) can dynamically negotiate changes to these
reservations by sensing changes / delays to the
plans of its owner.
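The escalation from the nearest InfoStation to the InfoStation center can be sketched as below. This is a simplified, single-process illustration under stated assumptions: the class and function names are hypothetical, "near optimal" is approximated by the smallest distance from the originating station, and the distance table is supplied by the caller. The system in [22] uses JADE agents communicating over wireless links.

```python
class InfoStation:
    """Hypothetical local station that knows the free slots in its own area."""

    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = list(free_slots)

    def reserve(self):
        # Return a slot id, or None when nothing is free locally.
        return self.free_slots.pop(0) if self.free_slots else None


class InfoStationCenter:
    """Hypothetical central tier that can search all the other stations."""

    def __init__(self, stations, distance):
        self.stations = stations
        self.distance = distance   # distance[(a, b)] between station names

    def reserve_near(self, origin):
        # Try the other stations in order of increasing distance from origin.
        others = [s for s in self.stations if s.name != origin.name]
        others.sort(key=lambda s: self.distance[(origin.name, s.name)])
        for station in others:
            slot = station.reserve()
            if slot is not None:
                return station.name, slot
        return None


def request_parking(nearest, center):
    """Personal-agent logic: ask the nearest station first, then escalate."""
    slot = nearest.reserve()
    if slot is not None:
        return nearest.name, slot
    return center.reserve_near(nearest)
```

The function returns a (station name, slot id) pair, or None when no slot is available anywhere, which the personal agent would then report to the user.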
Roozemond and Rogier discuss the use
of intelligent agents to build traffic control
systems that pro-actively bring changes in real
time to various traffic scenarios [5]. Information
agents collect information about weather, traffic
jams, public transport, route closures, best
routes and various parameters that control
traffic via a secure network and send it to the
user and the control stations. Signal durations
would hence be determined based on the
measured and predicted data. Traffic regulation
and tuning is done with coordination among
adjoining agents.
Bioinformatics
The adoption of multi-agent systems constitutes
an emerging area in bioinformatics [23]. In fact,
a working group on Agents in Bioinformatics
(BIOAGENTS) was founded during the first
AgentLink III Technical Forum meeting held
in July 2004, with the purpose of exploring agent
technology and developing new flexible tools for
(a) analysis and management of data, and (b)
modeling and simulation of computational
biology.
GeneWeaver is a multi-agent system
comprising a community of agents, having
five distinct roles, that collaborate with each
other in order to automate the processes
involved in bioinformatics [24].
Armano G et al. describe a multi-
agent system for Protein Secondary Structure
Prediction (PSSP) by a population of
homogeneous experts [25]. The authors discuss
how multi-agent technology is a very good fit
to address the problems of PSSP.
Telecommunication
The advent of wireless connected mobile
devices has enabled human beings to be
connected with other humans and information
systems anytime, anywhere. Such a paradigm
shift in connectivity coupled with the MAS
technology is showing a phenomenal potential
for a new set of social and business applications.
Bryl et al. have used multi-agent
technology and Bluetooth enabled mobile
devices to create and use ad-hoc social networks
[6]. These social networks can hence be used
to provide access to a variety of services that
allow users of a locality to interact and transact
in areas of mutual interest, such as buying and
selling books in a university campus. A generic
architecture of independent servers is presented
where multi-agent platforms can be installed
and agents can act on behalf of their users.
Each server is meant to offer one or more specific
services (e.g., buy and sell books) of interest to
the geographic area in which it is located (e.g.,
a university campus). The strength of this
architecture is that it is (a) domain independent,
in that each server can offer different services
relevant to its location, and (b) independent
of the MAS technology used (one can use
different MAS technologies such as JADE on
each server). A prototype with a buy/sell books
service has been developed and implemented
using JADE and tested using a Nokia 6260 and a
PC/Server equipped with a Tecom Bluetooth
adapter. Bluetooth communication has been
implemented using BlueCove, which is an open
source implementation of the JSR-82 Bluetooth
API for Java.
Application of MAS Technology in Horizontal
Business Segments
Knowledge Management
Knowledge Management (KM) is gaining
importance in large organizations owing to their
geographically distributed operations spread
across different time zones. Such organizations
are increasingly tapping into global markets on
the one hand and resources on the other. KM
systems attempt to offer the latest knowledge
of the enterprise - knowledge extracted and
created from structured and unstructured
sources - to the employees who need it.
Houari and Far offer a comprehensive
methodology to build such a sophisticated
KM system using the multi-agent systems
technology [26]. They discuss how a KM
system built using agents with distinct roles,
cooperation and communication capabilities,
intelligence, autonomy and shared ontologies
can be used to achieve better utilization of
knowledge in decision-making.
Supply Chain Management
A typical supply-chain manager is responsible
for (a) managing the optimal arrival and
stocking of a range of input materials from
different sub-contractors, and (b) the integration
and processing of the input materials to
produce and deliver a variety of finished
products to the clients. The management of this
responsibility in today's world is still
human-centric. The supply chain manager and her
team interact with the teams representing the
sub-contractors, enterprise production units
and the clients. These interactions are known
to involve many tedious tasks that are error
prone. In addition to that, the lack of latest
information from all these sources can impact
cost, productivity and quality of products
delivered to the clients.
To achieve better coordination in the
flow of information among the sub-services
of a supply chain management, Wang et
al., propose the use of a variety of software
agents [7]. The problem of coordination among
sub-services is modeled as a distributed
constraint satisfaction problem which is solved
collaboratively by the group of software
agents. The steps involved in this solution
methodology are as follows:
1. Decomposing customer requirements
into a set of services represented by a
business process or plan. This can be
achieved by means of any workflow
representation or a hierarchical task
network (HTN), wherein each task is
broken down into sub-tasks using
task-reduction rules that decompose
abstract goals into lower level tasks.
2. Finding and coordinating the actors that
would be fulfilling these services.
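The first step can be illustrated with a minimal sketch of hierarchical task decomposition. The task names and reduction rules below are hypothetical and are not taken from Wang et al.; the sketch only shows how an abstract goal might be reduced to an ordered list of primitive services.

```python
from typing import Dict, List

# Hypothetical task-reduction rules: each abstract task maps to an ordered
# list of sub-tasks. A task without a rule is treated as a primitive service.
REDUCTION_RULES: Dict[str, List[str]] = {
    "fulfil_order": ["procure_materials", "produce_goods", "deliver_goods"],
    "procure_materials": ["request_bids", "select_supplier"],
    "deliver_goods": ["book_transport", "ship"],
}


def decompose(task: str, rules: Dict[str, List[str]]) -> List[str]:
    """Recursively reduce a task to an ordered list of primitive tasks."""
    if task not in rules:
        return [task]  # primitive: no further reduction rule applies
    primitives: List[str] = []
    for sub_task in rules[task]:
        primitives.extend(decompose(sub_task, rules))
    return primitives


plan = decompose("fulfil_order", REDUCTION_RULES)
```

Each primitive task in the resulting plan is a candidate service for which an actor then has to be found, which is the subject of the second step.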
These steps are achieved by creating
multiple service dispatcher agents, service
broker agents and service provider agents.
The requirements are initially analyzed by
the dispatcher agent, based on the customer's
requirements and the history of customer requests.
Following this, each service broker agent
forwards the request to service providers for
collecting bids or solutions to the request. Once
all the bids are received, the next step is to
filter the dominated solutions and then identify
compatible and promising solutions. The final
step is to refine constraints for a global solution
by means of communications between the
various service broker agents, thereby achieving
coherence and coordination.
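The filtering of dominated solutions can be sketched as follows. The two bid criteria (cost and lead time, both to be minimized) and the provider names are assumptions made for illustration; the survey does not state which criteria Wang et al. use.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Bid:
    provider: str
    cost: float       # assumed criterion, lower is better
    lead_time: float  # assumed criterion, lower is better


def dominates(a: Bid, b: Bid) -> bool:
    """True if a is no worse than b on every criterion and strictly better on one."""
    no_worse = a.cost <= b.cost and a.lead_time <= b.lead_time
    strictly_better = a.cost < b.cost or a.lead_time < b.lead_time
    return no_worse and strictly_better


def filter_dominated(bids: List[Bid]) -> List[Bid]:
    """Keep only the bids that no other bid dominates."""
    return [b for b in bids if not any(dominates(other, b) for other in bids)]


bids = [Bid("P1", 100.0, 5.0), Bid("P2", 120.0, 4.0), Bid("P3", 130.0, 6.0)]
remaining = filter_dominated(bids)
```

Here P3 is dropped because P1 is both cheaper and faster, while P1 and P2 remain because each is better on one criterion. The broker agents would then negotiate over the remaining bids.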
Kern et al. discuss how intelligent
software agents can help humans in carrying
out different tasks involved in supply chain
management [8]. Their project, titled
MobiSoft, proposes a new form of supply-chain
management. In their approach, a mobile device
based software agent that performs the role of
a personal assistant is provided to each of the
humans involved in the supply-chain process
flow. These personal assistant agents interact
and collaborate to reduce human errors and
provide the latest information to their owners
anytime, anywhere, thereby enabling the teams
to achieve higher levels of productivity and
quality.
Software Project Management
An important characteristic of business
management is the use of dialogues by a
community of professionals to solve problems.
One of the serious problems confronting
business managers in their problem solving
and decision making endeavors is the high
degree of dependency on human interactions
and the high degree of manual interpretation
of dialogues by humans. Such a dependency
on manual intervention lends itself to problems
that can result from (i) the unpredictable,
inconsistent, egoistic behavior of humans that
one witnesses from time to time, (ii) the drop in
efficiency of humans under stressful situations,
or (iii) the use of less experienced/qualified
humans for managing tasks due to the lack of a
sufficient number of adequately skilled human
resources.
Sethuraman et al. illustrate the
application of MAS technology to one such
business management task, viz., software
project management (SPM) [27]. The authors
discuss the various sub-tasks of SPM that can
benefit from the use of MAS technology.
They use the task of Quality Review which is
initiated and completed at the end of each phase
of the software lifecycle and demonstrate how
a set of personal assistant agents assigned to
each (a) software engineer, (b) quality assurance
reviewer, (c) quality assurance manager and (d)
software project manager, collaborate among
themselves and manage the Quality Review
task efficiently. The agents manage many steps
of the process of Quality Review but do not
perform the sub-task of actually reviewing the
artifacts. The main advantages brought out by
this research are (i) productivity improvement,
as the agents perform many mundane tasks that
otherwise consume the time of experienced
software professionals, and (ii) consistency of
ensuring that the task of Quality Review is
initiated and completed at the right time and
any incompleteness is recorded and escalated
in time.
Petrie et al. illustrate how MAS
technology can be utilized for propagation
of dynamic knowledge, such as designs and
plans that are changed according to the status
of project execution, between the project
designers and planners, such that the effects
of changes are communicated properly [28].
The effort is concentrated on provision of
support for complex projects, where it is crucial
to communicate changes to necessary actors
on time, also termed as Distributed Integrated
Process Coordination. The authors characterize
their coordination model as a logical set of
dependencies among the project elements
that can be used to determine the effects of
changes within the project. To implement
this, the Redux dependency model was used,
which tracks validity within the dependency
model and notifies designers as and when
changes occur. They demonstrate this by means of
a central facilitating event which uses these
Redux dependencies for decision making and
propagation. They conduct a case study by
means of a building construction example.
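The idea of determining the effects of a change from a set of dependencies can be sketched with a small dependency graph. This is not the Redux model itself; the class, the element names and the owners below are hypothetical and only illustrate how a change to one project element could be traced to the people who need to be notified.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


class DependencyModel:
    """Hypothetical sketch: project elements, their owners and their dependencies."""

    def __init__(self) -> None:
        self._dependents: Dict[str, Set[str]] = defaultdict(set)
        self._owner: Dict[str, str] = {}

    def add_element(self, element: str, owner: str) -> None:
        self._owner[element] = owner

    def add_dependency(self, element: str, depends_on: str) -> None:
        self._dependents[depends_on].add(element)

    def affected_by(self, changed: str) -> List[str]:
        """Breadth-first walk over everything that transitively depends on `changed`."""
        seen: Set[str] = set()
        order: List[str] = []
        queue = deque([changed])
        while queue:
            current = queue.popleft()
            for dependent in sorted(self._dependents[current]):
                if dependent not in seen:
                    seen.add(dependent)
                    order.append(dependent)
                    queue.append(dependent)
        return order

    def notify(self, changed: str) -> Dict[str, List[str]]:
        """Group the affected elements by the owner who must be told about the change."""
        notices: Dict[str, List[str]] = defaultdict(list)
        for element in self.affected_by(changed):
            notices[self._owner[element]].append(element)
        return dict(notices)


model = DependencyModel()
model.add_element("floor_plan", "architect")
model.add_element("structural_design", "engineer")
model.add_element("schedule", "planner")
model.add_element("cost_estimate", "planner")
model.add_dependency("structural_design", depends_on="floor_plan")
model.add_dependency("schedule", depends_on="structural_design")
model.add_dependency("cost_estimate", depends_on="structural_design")
notices = model.notify("floor_plan")
```

In this building-style example, a change to the floor plan reaches the engineer directly and the planner indirectly, through the structural design.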
Nienaber et al. describe a comprehensive
black-box model of a generic agent framework
that could be used in different phases of
software project management [29]. The paper
discusses the creation of personal assistant
(PA) agents, messaging agents, task agents,
monitoring agents and team manager agents.
A multi-agent system comprised of these
types of agents is used in the framework
to support various aspects of SPM, such
as scope management, time management,
cost management, quality management,
human resource management, communication
management and risk management. The paper
discusses the prototype design of the system
and proposes its development
using the JADE framework.
Pitt et al. describe the design and
implementation of a CEC GOAL project that
aims at the development of generic software tools
to support distributed project management
that is collaborative, decentralized and
inter-organizational [30]. The authors propose the
use of autonomous software agents to provide
for normalization of inter-organizational
terminology and flow of information,
structuring of inter-organizational interactions
with respect to contracts and working practices,
and also to enable each organization to provide
or use services required or offered by other
organizations. They use a distributed review
process as an example to exhibit the application
of an agent system with behavior specified
by means of decision logic. They consider a
quality control scenario, where the deliverables
are largely technical papers. The project office
aims to assure the quality of the papers by
getting them reviewed by at least three
reviewers. Hence a call for participation in
this review is sent out, followed by
an announce-and-negotiate procedure that aids the
review process.
CONCLUSION
MAS is increasingly being adopted by
various industry verticals and horizontal
business segments. There is an important
need to present a comprehensive survey of
the current and potential future applications
of MAS technology, which is undertaken
in this paper. The landscape of case studies
discussed in this survey points to a host of new
opportunities to various business communities.
REFERENCES
1. M Wooldridge, An Introduction to
Multi-Agent Systems, John Wiley &
Sons, 2002
2. L Abbott and H Siskovic, Intelligent
Agents in Computer and Network
Management. Available at
http://teachnet.edb.utexas.edu/~lynda_abbott/webpage.html#intagt3
3. G Weiss, Multi Agent Systems: A Modern
Approach to Distributed Artificial
Intelligence; The MIT Press, 2001
4. Multi-agent Systems. Available at
http://www.cs.cmu.edu/~softagents/
multi.html
5. A R Danko and J L H Rogier, Agent
Controlled Traffic Lights, ESIT 2000,
14-15 September 2000, Aachen, Germany
6. V Bryl, P Giorgini and S Fante, An
Implemented Prototype of Bluetooth-
based Multi-Agent System, Proceedings
of WOA05, 2005. Available at
http://lia.deis.unibo.it/books/woa2005/papers/21.pdf
7. M Wang, H Wang and J Liu, Dynamic
Supply Chain Integration through
Intelligent Agents, System Sciences,
2007. HICSS 2007, 40th Annual Hawaii
International Conference, January 2007
8. S Kern, T Dettborn, R Eckhaus, J Yang,
C Erfurth, W Rossak and P Braun,
Assistant-based Mobile Supply Chain
Management, 13th Annual IEEE
International Symposium and Workshop
on Engineering of Computer Based
Systems, 2006
9. A Survey on Intelligent Agents in
Telecommunications. Available at
https://www.cs.tcd.ie/research_groups/aig/iag/survey.html
10. D Foster, C McGregor and S El-Masri,
A Survey of Agent-Based Intelligent
Decision Support Systems to Support
Clinical Management and Research.
Available at http://www.diee.unica.it/
biomed05/pdf/W22-102.pdf
11. D Mladenic, Text-Learning and Related
Intelligent Agents: A Survey, Intelligent
Systems and their Applications, IEEE,
Vol 14, No 4, 1999
12. Surprise 95 Intelligent Agents and
Article 1. Available at http://www.
doc.ic.ac.uk/~nd/surprise_95/journal/
vol1/jjc1/article1.html
13. R Kowalczyk, M Ulieru and R Unland,
Integrating Mobile and Intelligent
Agents in Advanced e-commerce:
A Survey. Available at
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.4636
14. A Tveit, A survey of Agent-Oriented
Software Engineering, First NTNU
CSGSC, May 2001
15. C Hoekstra, Adaptive Artificially
Intelligent Agents in Video Games: A
Survey, UNIAI-06
16. D Wei, Z Gan and J Zhang, A Mobile-
Agent-Based E-Commerce Taxation
Model, Computational Intelligence and
Security, 2006, Vol 1
17. E R Adagunodo, O Awodele and O
B Ayayi, SMS Banking Services: A
21st Century Innovation in Banking
Technology, Issues in Informing Science
and Information Technology, Vol 4, 2007
18. N K Muguda, Peter R Wurman and
R Michael Young, Experiments with
Planning and Markets in Multi-agent
Systems, Proceedings of the Third
International Joint Conference on
Autonomous Agents and Multi-agent
Systems, Vol 3, July 2004
19. M Bombara, D Cal and C Santoro, KORE:
A Multi-Agent System to Assist Museum
Visitors, Proc. Workshop Objects and
Agents (WOA 03), Pitagora Editrice
Bologna, 2003
20. J S Lopez and F A Bustos, MultiAgent
Tourism System: An Agent Application
on the Tourism Industry, Proceedings
of the International Joint Conference
IBERAMIA/SBIA/SBRN 2006, 1st
Workshop on Industrial Applications
of Distributed Intelligent Systems
(INADIS2006), Ribeirao Preto, Brazil,
October 23-28, 2006
21. B M Balachandran, M Enkhsaikhan,
Development of a Multi-Agent System for
Travel Industry Support, Computational
Intelligence for Modeling, Control and
Automation, 2006 and International
Conference on Intelligent Agents, Web
Technologies and Internet Commerce,
International Conference, Sydney, 2006
22. I Ganchev, M O'Droma and D Meere,
Intelligent Car Parking Locator Service,
International Journal Information
Technologies and Knowledge, Vol 2, 2008
23. A E Merelli, B G Armano, A N Cannata,
A Corradini, C M d'Inverno, D A Doms,
E P Lord, F A Martin, G L Milanesi, H
S Möller, D M Schroeder and I Luck,
Agents in Bioinformatics, Computational
and Systems Biology. Available at
http://citeseerx.ist.psu.edu/viewdoc/
summary?doi=10.1.1.101.3003
24. K Bryson, M Luck, M Joy and D Jones,
Applying Agents to Bioinformatics in
Geneweaver, in Cooperative Information
Agents IV, Lecture Notes in Artificial
Intelligence, Springer-Verlag, 2000
25. G Armano, G Mancosu, and A Orro,
A Multi-agent System for Protein
Secondary Structure Prediction, in
NETTAB Models and Metaphors from
Biology to Bioinformatics Tools, pages
19-29, Camerino, Italy, 5-7 Sept 2004
26. N Houari and B H Far, Application
of Intelligent Agent Technology for
Knowledge Management Integration,
Proceedings of the Third IEEE
International Conference on Cognitive
Informatics, 2004
27. A Sethuramam, K K Yalla, A Sari and
R P Gorthi, Agents Assisted Software
Project Management, Annual Bangalore
Compute Conference, Proceedings of
the 1st Bangalore Annual Compute
Conference, January 2008
28. C Petrie, S Goldmann and A Raquet,
Agent-Based Project Management,
Proceedings International Workshop on
Intelligent Agents in CSCW, Dortmund,
1998
29. R C Nienaber and A Barnard, A Generic
agent Framework to Support the Various
Software Project Management Process,
Interdisciplinary Journal of Information,
Knowledge and Management, Vol 2,
2007
30. J Pitt, M Anderson and R J Cunningham,
Normalized Interactions Between
Autonomous Agents: A Case Study in
Inter-organizational Project Management,
Computer Supported Cooperative Work:
The Journal of Collaborative Computing,
Vol 5, 1996.
SETLabs Briefings
VOL 7 NO 5
2009
Building Knowledge-Work
Support Systems with Information
Warehouses
By Arijit Laha PhD
Efficient access to relevant information can
enable your knowledge worker to perform complex
tasks with seamless ease
Knowledge Management (KM) today can be
seen as a key focus area within all leading
knowledge driven organizations. Several
initiatives are taken up within organizations
to manage knowledge. However, despite
all the money spent and the efforts put in
by organizations to achieve effective KM
capabilities, the results to date are far from
stellar. The overall situation led Maier to
observe that the solution is still not there and
that many businesses trying to implement these
technologies have been frustrated by the fact
that the technologies certainly could not live up
to the overly high expectations [1].
Many of the problems stem from the
multi-faceted nature of knowledge management
systems (KMS), which involve not only information
technology but also the social, cultural and
behavioral aspects of the organization as a
whole, as well as of various user communities
within the organization [2]. Usually there is
significant diversity in these factors across
organizational subunits within an organization
[3]. This tends to render the universalistic
approach to organizational KM less effective [4].
In recent years, a host of extensive
ethnographic studies, where the researcher
becomes part of the environment and culture
under study and makes first-hand observations
over an extended period of time, have been
published to demonstrate the effectiveness of
KM practices in various organizations [4, 5]. All
these studies, among other interesting findings,
emphasize the necessity of building KMS to
cater to the needs of knowledge workers such
that it directly affects the way they perform a
task. Such a task-oriented approach to knowledge
management is called task-based knowledge
management (TbKM) [5, 6].
Burstein and Linger, based on their
extensive fieldwork, positioned TbKM as a
robust framework suitable for studying and
analyzing the characteristics of
knowledge-intensive tasks [6]. They define a task as a
substantially invariant activity with outcomes,
including tangible outputs. A task is performed
by socially situated actors. Burstein and Linger
use the term "knowledge work" to refer to the
collection of activities that constitute a task. They
have also outlined a conceptual architecture
of KMS supporting the workers in performing
tasks. This clearly marks a major paradigm shift,
moving the focus from the organization to the task.
As a consequence, the designer of a KMS, instead
of studying the KM requirements of an entire
organization in all its diversity, can study the
requirements of individual workers or communities
of workers involved in performing specific classes
of tasks and design system support adapted to
the task-specific characteristics.
In TbKM, a KMS designed for supporting
a targeted task or task-type is called a knowledge
work support system (KWSS) [6]. TbKM can be
viewed as a framework for developing KWSS,
each supporting a single targeted task or task
type. Typical examples of such task-types are
survey design, dictionary construction, weather
forecasting, etc. [6]. In large organizations, there
are many knowledge-intensive tasks involving
planning, decision-making and many other
creative and reflective requirements where
suitable KWSS can be of great value. However,
due to the scale and complexity of such tasks,
building effective KWSS requires us to consider
several additional aspects of KWSS that have not
received adequate consideration under TbKM.
EFFICIENCY AND RELEVANCE: THE
CONTEXT
This paper looks into the problem of providing
knowledge workers with efficient access to relevant
information. In the process, the discussion will
also delve into developing a novel approach for
building an advanced information archive
called the information warehouse (IW) to
meet the requirements of knowledge workers
performing complex tasks using KWSSes.
As indicated above, the two operative
terms vis-à-vis information access here are
"efficient" and "relevant". In conventional information
management systems the unit of creation, archival
and retrieval of information is the whole document.
On the other hand, according to TbKM, a task is
a system of activities consisting of structure and
processes. Structure refers to the composition
of the task in terms of smaller activities and the
processes encapsulate the interrelation of the
activities. It is easy to see that a knowledge worker
engaged in performing a task, at any given time, is
actually engaged in performing an activity as part
of the task. Consequently, his/her information
needs are governed by the current task.
A regular search for information often throws up multiple
documents for the knowledge worker to read and decipher
relevant knowledge
However, the worker, while seeking
information, receives a set of whole documents,
some of which, depending on the precision of
the information retrieval (IR) system, typically contain information
useful to the worker, embedded in relatively
small portions. To access this
information, the worker needs to read a number
of (ideally all) the whole documents retrieved,
which puts enormous demand on the cognitive
ability of the worker as well as on the time
available to her. Further, in today's environment,
where the availability of multiple IT-enabled
archives of enormous volume is common in
organizations, the worker faces a serious threat
of information overload.
The most crucial support an information
system can provide to a knowledge worker
is access to information that the worker is
likely to find relevant and thus useful. However,
judgment of relevance is a complex issue involving
situational, topical, cognitive and even social aspects,
many of them beyond the scope of
information systems [7]. Nevertheless, the context,
defined as any information that can be used to
characterize the situation of an entity, forms an
important basis for judging relevance [8].
From the perspective of a knowledge worker, her
current task and the current activity as part of the
task dictate the most important components of
context. Suffice it to say:
The information needed by a knowledge
worker, for use in the context of her current
task-activity instance, is produced by other
knowledge workers as part of different instances
of task (possibly of different types altogether)
performance, within their own contexts.
Thus, to properly use the information,
the current worker, i.e., the consumer, needs
to understand and/or compare the current
context with those of the other workers, i.e., the
producers of information.
While describing different aspects of
IW, this discussion will draw upon various
examples from a patient-care KWSS, under
development, for clearer explanations.
INFORMATION WAREHOUSE (IW)
IW supports archival of contextualized
information and is organized into two layers,
the contextualization support and the
task instance (TI) archive, which together
provide the required functionalities [Fig. 1].
The contextualization support layer consists of
artifacts that include structural and semantic
definitions of the supported tasks/task-types
and the domain vocabulary. They provide the
means for category-based annotation of the
informational elements and are developed as
part of building individual KWSS. On the other
hand, the TI archive contains the information
produced/reproduced by performance of
instances or episodes of supported tasks.
[Figure 1: The Information Warehouse (IW) Architecture. Labels recoverable from the figure: contextualization support, comprising generic informational elements; domain semantics/vocabulary (thesaurus, ontology, etc.); target task definitions (structural and semantic); a mapping of task elements to generic informational elements; and a mapping of task elements to domain semantics. The task instance (TI) archive, in which TI information is contained in generic informational elements that are granular, linked and provenanced. Source: Infosys Research]
The Contextualization Support: Knowledge
work is defined as the production and
reproduction of information and knowledge
[9]. The definition simultaneously captures two
aspects of knowledge work:
1. Consumption of Information:
Consumption of information allows a
human worker to gain new knowledge
and/or update existing knowledge.
2. Production of Information: When the
worker articulates the knowledge, in
some symbolic form, production of
information occurs.
Information, unlike knowledge, is
an entity amenable to capture in persistent
media, sharing and archival. Information can
be in the form of end products (reports, plans,
strategies, procedures, lessons learned, etc.) or
intermediate products (memos, suggestions,
arguments, minutes of meetings, etc.).
[Figure 2(a): The Activity Interdependency Model. Labels recoverable from the figure: participants (the admin system, the physician with KWSS, and the patient-care facility or hospital ward); activities such as register patient, assign physician, receive patient, determine treatment, send plans to patient-care facility, admit for treatment, apply treatment, monitor condition (periodic/scheduled), review, end treatment, follow-up treatment and discharge patient; and states such as patient registered, ready for the physician, treatment planned, patient admitted, patient under treatment, condition recorded, plans modified, treatment completed, follow-up planned, ready for discharge and patient discharged. EOT: End of Treatment. Source: Infosys Research]
The contextualization support layer
of IW consists of artifacts describing domain
vocabulary, task-structure definitions and
mappings between elements of task-structure
definitions and various types of container
objects used by the system to archive the
informational elements. There could be many
perspectives from which the structure of
knowledge work can be analyzed [10]. This
discussion builds upon the activity-based
approach proposed by Dustdar [11]. Here
task-structure definitions comprise two kinds of
artifacts, namely the activity dependency model
and the information usage model. The activity
dependency model describes the various
interdependencies among the activities within
a task, whereas the information usage model
describes the supportive interrelationships
among the informational elements associated
with the activities. Graphical representations of
the top level definitions for a patient-care KWSS
are shown in Figures 2(a) and 2(b). The elements
of these artifacts are used to mark, annotate
and provide semantics to the instance-specific
or episodic informational elements archived in
the TI archive.
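The two kinds of task-structure artifacts can be sketched as plain data structures. The activity and element names below are loosely taken from the patient-care example, but the specific dependencies are assumptions made for illustration, not the definitions used in the KWSS under development.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Activity dependency model: activity -> activities it depends on (assumed edges).
ACTIVITY_DEPENDENCIES: Dict[str, Set[str]] = {
    "consultation": set(),
    "diagnostic_procedure": {"consultation"},
    "diagnosis_and_treatment_plan": {"diagnostic_procedure"},
    "treatment": {"diagnosis_and_treatment_plan"},
    "follow_up": {"treatment"},
}

# Information usage model: informational element -> elements that support it
# (assumed relationships).
INFORMATION_USAGE: Dict[str, List[str]] = {
    "diagnosis": ["initial_findings", "test_observations"],
    "treatment_plan": ["diagnosis"],
    "follow_up_report": ["treatment_plan", "patient_condition"],
}


def activity_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return the activities in an order that respects their dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def supporting_elements(element: str, usage: Dict[str, List[str]]) -> List[str]:
    """Collect every element that directly or indirectly supports `element`."""
    collected: List[str] = []
    for support in usage.get(element, []):
        for item in supporting_elements(support, usage) + [support]:
            if item not in collected:
                collected.append(item)
    return collected


order = activity_order(ACTIVITY_DEPENDENCIES)
support = supporting_elements("follow_up_report", INFORMATION_USAGE)
```

The elements of both structures are what instance-specific information would be annotated with: an archived follow-up report, for instance, can be traced back to the plan, diagnosis and findings that support it.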
Methodologies for analyzing the task-
structures and processes, and developing the
definition artifacts are beyond the scope of
the current discussion. However, for the sake
of clarity we present the result of a high level
analysis of the patient-care task [Fig. 3].
[Figure 2(b): The Information Usage Model. Labels recoverable from the figure: a medical care case that starts when the patient consults the doctor and the case (in the medical sense) is initiated; diagnostic procedure; making a diagnosis; diagnosis and treatment plan; treatment (medication and/or surgery, therapy, education and instructions); follow-up plan; patient condition; follow-up report; and review, connected by relations such as makes, leads to, included in, implement, report and evaluate. Source: Infosys Research]
The Task Instance (TI) Archive: The inner
layer of IW, called the task instance (TI) archive,
contains the information produced through
knowledge articulation. The knowledge
reproduced is often from external sources and
is found relevant while performing the instances
or episodes of supported tasks/knowledge
work. The challenge lies in organizing the
information in the TI archive of IW to leverage the
contextualization support so as to provide
the workers with an improved platform
for accessing, processing and producing
information. To achieve this, we strive to
impart three essential attributes to the archived
information: (i) proper granularity level, (ii)
linkage, and (iii) provenance of information.
Granularity Level: Knowledge-intensive tasks
are never monolithic or atomic in nature.
They consist of a set of interrelated activities,
often fairly diverse in nature. Knowledge
workers, whether as producers or consumers
of information, at any given point in time,
work on one activity. Thus, instead of a large
body of information, as typically contained
in whole documents, it will be much easier
for the workers to work with information as
produced and consumed at the activity level.
In IW, the TI information is archived at
the level of informational elements (IE).
IE is commensurate with the activity and
informational granularity levels specified by
the task definitions. The task definitions are
available for the respective task classes in the
contextualization support layer. Further, TI-
specific IEs are persistently linked with their
corresponding definition elements.
Given such an organization of archived
information, it can be easily seen that the
knowledge worker, as a consumer, can enjoy
great advantage in establishing the context
of the accessed informational elements and
utilize them with much more ease. Further,
the knowledge worker as a producer can also
articulate and produce information at the
activity level that can be easily contextualized
by the artifact supporting the task-type she is
engaged in.
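A minimal sketch of activity-level granularity is given below. The class and field names are hypothetical; the only properties taken from the text are that information is archived as informational elements (IEs) rather than whole documents, and that each IE is persistently linked to its definition element in the contextualization support layer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DefinitionElement:
    """An element of a task definition in the contextualization support layer."""
    element_id: str
    task_type: str
    activity: str


@dataclass(frozen=True)
class InformationalElement:
    """Activity-level information produced in one task instance (TI)."""
    ie_id: str
    task_instance: str
    definition_id: str  # persistent link to the corresponding definition element
    content: str


def elements_for_activity(
    archive: List[InformationalElement],
    definitions: Dict[str, DefinitionElement],
    activity: str,
) -> List[InformationalElement]:
    """Return only the IEs whose definition element belongs to the given activity."""
    return [ie for ie in archive if definitions[ie.definition_id].activity == activity]


definitions = {
    "def-diagnosis": DefinitionElement("def-diagnosis", "patient_care", "making_a_diagnosis"),
    "def-followup": DefinitionElement("def-followup", "patient_care", "follow_up"),
}
archive = [
    InformationalElement("ie-1", "case-001", "def-diagnosis", "Differential diagnosis notes"),
    InformationalElement("ie-2", "case-001", "def-followup", "Fortnightly follow-up report"),
    InformationalElement("ie-3", "case-002", "def-diagnosis", "Diagnosis after test observations"),
]
diagnosis_ies = elements_for_activity(archive, definitions, "making_a_diagnosis")
```

A worker making a diagnosis would thus receive the diagnosis-level elements from earlier cases, rather than the complete case files that contain them.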
Linkage: With the maintenance of proper
granularity level of the informational elements,
creation and maintenance of proper linkage
among the informational elements can improve
the impact of IW manifold. This follows
from the well-known fact that the value or
usefulness of a piece of information increases
enormously when it can be associated with
other related pieces of information and studied
together.
For example, consider a patient-care
system in which a doctor,
treating a patient, is trying to make a diagnosis
based on observed physical and clinical
findings. Undoubtedly, the mere
fact that there were x patients who
had shown similar conditions will not suffice.
The doctor would want to know about the
diagnoses, prescribed treatments, success rate,
pattern in the prognosis and many more such
related pieces of information. Such a capability
needs rich navigable links among the IEs.
[Figure 3: Developed Contextualization Support Artifacts for Patient-care KWSS. Labels recoverable from the figure: three columns (activity, actors, informational elements); actors (patient, receptionist, doctor, nurse); activities (patient consults the doctor and initiation of the case, diagnostic procedure, making a diagnosis, diagnosis and treatment plan, treatment through medication, surgery, therapy, and education and instructions, follow-up plan, patient condition, follow-up report, review); informational elements (medical record/file as anchor, the case for the patient's treatment, the diagnosis, decision and treatment plan, workplan lists of activity and schedule for medication, therapy and instructions, expectations/prognosis, follow-up plan list of expectation and schedule, follow-up report, analysis); and a conventional healthcare database. Source: Infosys Research]
In IW, the links between various
informational elements within a task instance
as well as across task instances are maintained
based on several criteria so that, given a piece
of information, one can easily find other related
IEs. This can help various services in the layer
above IW and can serve to establish broader
context, provide support/evidence, allow
follow-up of the usage as well as the consequences of
usage of the information, and support many other aspects
of information and knowledge work.
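Navigable links of this kind can be sketched as a small index of typed links between IEs. The class, the identifiers and the two link criteria used below are hypothetical; the text states only that links are maintained on several criteria, within and across task instances.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class LinkIndex:
    """Hypothetical store of typed, bidirectional links between IEs."""

    def __init__(self) -> None:
        self._links: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def link(self, ie_a: str, ie_b: str, criterion: str) -> None:
        """Record a link in both directions so it can be followed from either IE."""
        self._links[ie_a].add((criterion, ie_b))
        self._links[ie_b].add((criterion, ie_a))

    def related(self, ie_id: str, criterion: Optional[str] = None) -> List[str]:
        """IEs directly linked to ie_id, optionally restricted to one criterion."""
        return sorted(
            other
            for crit, other in self._links.get(ie_id, set())
            if criterion is None or crit == criterion
        )


links = LinkIndex()
links.link("case-001/diagnosis", "case-001/treatment_plan", "same_task_instance")
links.link("case-001/diagnosis", "case-007/diagnosis", "similar_findings")
links.link("case-007/diagnosis", "case-007/prognosis", "same_task_instance")

all_related = links.related("case-001/diagnosis")
similar_cases = links.related("case-001/diagnosis", "similar_findings")
```

Starting from the current diagnosis, the doctor in the example could follow the assumed "similar_findings" link to an earlier case and from there reach its prognosis.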
Provenance: Provenance of a piece of
information encompasses related information
on how, where, what, when, why, which and by
whom the information has come into existence.
This relates directly to the requirement of
reliability and authenticity of information, which
are, as observed by Schultze, among the most
important aspects of information other than the
content itself [13]. Further, it forms the basis of
validation of the information, which is of crucial
importance in various task types. In
collaborative work, information
containing crucial arguments prepared by
a co-worker can be examined much more rigorously
using the provenance and linkage facilities,
which go beyond the upfront information.
Contributions of the workers are thus likely to
be much more effective due to exchange of
information and collaboration. Apart from its
enormous importance to knowledge workers
in its own right, in many tasks, maintenance
of detailed provenance information is required
by various regulatory frameworks.
Creation and maintenance of provenance
can be readily achieved in IW. Due to the activity
level granularity of the information and the
existence of links among the IEs as well as
between IEs and elements of task definitions,
if the identity of the worker is consistently
maintained among the contents of IEs, one can
readily establish the provenance of an IE from
multiple perspectives. Also, IE level provenance
can be easily utilized to compute provenance at
various levels within the tasks.
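How IE level provenance might be rolled up to the level of a task instance can be sketched as follows. The record fields, worker names and dates are hypothetical and stand in for whatever the IEs actually carry; the sketch assumes only that each IE records who produced it, in which activity and when.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class ProvenanceRecord:
    ie_id: str
    task_instance: str
    activity: str         # as part of which activity the IE was produced
    worker: str           # by whom
    created_at: datetime  # when


def task_provenance(records: List[ProvenanceRecord], task_instance: str) -> Dict[str, object]:
    """Summarize IE level provenance for one task instance."""
    own = [r for r in records if r.task_instance == task_instance]
    if not own:
        raise ValueError(f"no provenance records for {task_instance!r}")
    return {
        "workers": sorted({r.worker for r in own}),
        "activities": sorted({r.activity for r in own}),
        "first_created": min(r.created_at for r in own),
        "last_created": max(r.created_at for r in own),
    }


records = [
    ProvenanceRecord("ie-1", "case-001", "making_a_diagnosis", "dr_rao", datetime(2009, 3, 2, 10, 0)),
    ProvenanceRecord("ie-2", "case-001", "record_condition", "nurse_lee", datetime(2009, 3, 3, 9, 30)),
    ProvenanceRecord("ie-3", "case-002", "making_a_diagnosis", "dr_rao", datetime(2009, 3, 4, 11, 0)),
]
summary = task_provenance(records, "case-001")
```

The same records could be grouped by activity or by worker instead, which is one reading of establishing provenance "from multiple perspectives".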
Figure 4 depicts an example of TI
information as it is archived, annotated by
contextualization elements.
[Figure 4: An Example of Task Instance Information as Archived. Labels recoverable from the figure: a patient-care case processed from initial findings (history, symptoms, signs, physical examination; for example weight loss, increase in thirst and appetite, palpitation and weakness), through analysis of possibilities (diabetes, heart disease or both), test observations (fasting and post-meal blood sugar levels, micro albumin level, glyco-haemoglobin, lipid profile, uric acid, ECG, treadmill test), a differential diagnosis (condition: diabetes, initial stage, kidney not affected), treatment workplans for medication, therapy and instructions (oral medicines and/or insulin injections, diet control, regular physical exercise, fortnightly reports to the doctor), expectations/prognosis, a follow-up plan and a follow-up report with feedback on the success of treatment. An annotation in the figure reads: "While processing any of the IntelliObjects the user can search the archive for relevant pieces of knowledge (other IntelliObjects) and navigate around using context and relevance links to study them. Those found useful in solving the problem(s) will be marked relevant and new relevance links may be created and maintained." Source: Infosys Research]
DESIGNING THE TI ARCHIVE IN IW
At the technical design level one can envisage the
TI archive as a network of information elements
(InfoEl). The InfoEls are objects, specialized
to serve as containers of various types. The
InfoEl objects also contain specialized methods
that can assist computations commensurate
with the corresponding IE types. InfoEls are
interconnected through two types of links,
the creational links and the reference links,
and we call this organization the creational
and referential network or CaRN view of
information [Fig. 5]. The CaRN view has the
following properties:
- The CaRN view consists of task instances
(TI). In other words, TIs are the macro
units of the CaRN view.
- A TI contains all information developed
during performance of a task or
knowledge-work instance and defines the
sub-network of InfoEls corresponding to
that particular TI.
- A creational link joins two IEs when the
information content of one is created to
satisfy the need of creating the content
of the other. Creational links also reflect
the relationships between pieces of
information developed by activities
performed as part of the same task
instance, i.e., the creational links are
allowed to form only between IEs that belong
to the same TI.
A reference link exists between two IEs
when the content of one is used, but not
explicitly created to cater to the needs
of the other. Thus, the reference links
are free to cross the TI boundaries to
associate IEs belonging to different TIs.
Archival of information in the CaRN view can be implemented with relational databases (the option adopted in our current prototype), XML databases, object databases, content management systems or some hybrid of these. Irrespective of the implementation technology adopted, access to the content of InfoEls and their interconnections in this scheme can provide the user with more relevant information, along with the capability of navigating to other contextually related information along the edges of the InfoEl network.
[Figure 5: The CaRN View of Information in TI Archive. The archive holds TIs and special TIs (policy, legal constraints, resource constraints). Within a TI, InfoEls such as task, subtasks, solution alternatives, solution aspects, verifications, findings, observations, selected solution, reasons, action plan, impact watch plan, expected and actual impacts, causal analysis of deviations and modified action plan are joined by creational and reference relationships. Source: Infosys Research]
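The two link types and the TI-boundary rule can be illustrated with a minimal in-memory sketch. It is not the prototype's relational implementation. The class name `InfoEl` follows the text, but the field names (`kind`, `ti`), the method names and the sample elements are assumptions made for illustration, and the direction in which a creational link is stored is an arbitrary choice.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class InfoEl:
    """One information element; `kind` and `ti` are illustrative field names."""
    name: str
    kind: str   # e.g. "Observation", "Decision"
    ti: str     # identifier of the owning task instance
    creational: list = field(default_factory=list)   # links inside the same TI
    references: list = field(default_factory=list)   # links that may cross TIs

    def link_creational(self, other):
        # Creational links are only allowed between IEs of the same TI.
        if other.ti != self.ti:
            raise ValueError("creational links must stay within one TI")
        self.creational.append(other)

    def link_reference(self, other):
        # Reference links are free to cross TI boundaries.
        self.references.append(other)


def reachable(start):
    """Names of all InfoEls reachable from `start` over either link type."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        stack.extend(node.creational + node.references)
    return seen


symptoms = InfoEl("Symptoms", "Observation", ti="case-1")
diagnosis = InfoEl("Diagnosis", "Decision", ti="case-1")
policy = InfoEl("Treatment policy", "Policy", ti="policy")
diagnosis.link_creational(symptoms)   # same TI: allowed
diagnosis.link_reference(policy)      # different TI: allowed as a reference
```

Navigation "along the edges" then amounts to a graph traversal such as `reachable(diagnosis)`, which visits elements in the same TI as well as referenced elements in other TIs.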
CONCLUSION
The IW embodies a powerful idea for information archival that supports knowledge-intensive tasks in several ways that are significantly ahead of conventional practices. The core facility provided by the IW, archival of richly contextualized information, opens up enormous possibilities for building next-generation information systems for knowledge management. At one end of the spectrum, we can envisage building customized applications that leverage the information in the IW to support individual tasks such as patient care, legal research, etc.
At the other end of the spectrum lies a very exciting possibility. One can think of building a technology platform consisting of the IW and a host of information access and processing services, such as exploration, collaboration, argumentation, recommendation, contextualized articulation, transcription, etc., and, most importantly, a KWSS development service for building and maintaining KWSS, running on top of the platform. Once the platform is installed in an organization, the development service can be used to build the definitions of new tasks and deploy them to realize new KWSS on top of the same platform. Further, it can enable verification and update or improvement of the performance of existing KWSS, resulting in overall organizational learning. Suffice it to say that a platform built around the IW can be used to build a number of very sophisticated, inter-operating KWSS that address, to a large extent, the knowledge management needs of an organization.
SETLabs Briefings
VOL 7 NO 5
2009
What's in a Name?
By Yogesh Dandawate and John Kuriakose
Recover business ontology from existing enterprise
IT assets, incrementally
Business ontology is a formal and precise
representation of business knowledge in
terms of concepts, relations and rules. Software
engineering within the enterprise involves
a translation of knowledge about the world
from abstract mental models to executable
code. Business domain, technology design and
implementation language are three distinct
knowledge domains involved in software
engineering.
The current software engineering scenario is characterized by geographically dispersed teams and work transitions between these teams. It is vital that the entire team has a shared understanding of the system in terms of the business domain, functional features, internal structure (architecture and design) and implementation (the program code) of the software. The team needs a more formal knowledge representation to cope with the current scale and complexity.
Creating this shared understanding
of applications within the IT portfolio and
its context involves making implicit and
tacit knowledge explicit in ontologies and
employing knowledge engineering methods.
However, creating formal business ontology
from scratch is not feasible because of the scale
involved and the specialized skill sets required.
This paper outlines a more effective approach
to exploit existing IT artifacts (primarily
program code) across the IT portfolio to recover
fragments of the business ontology. Users can
then respond to this initial representation and
refine it to incrementally build the business
ontology.
Information overload is a key problem in the knowledge management arena: analyzing raw data consumes expensive resources. Research in cognitive science has shown that the human mind creates mental models in order to comprehend [1]. A developer creates such a mental model while trying to understand a software system. The mental model consists of the key abstractions that exist in the code.
The concepts in a domain form the key abstractions in the mental model, which resides in the head of the expert. In the current software development scenario, where developers are continuously transitioning, the knowledge gathered about the system should be made available to every stakeholder. It is therefore essential to capture the knowledge of the experts formally and use it for the benefit of less experienced developers. This will help reduce comprehension time and spread a uniform understanding among the developers.
Knowledge engineering (KE), an offshoot of artificial intelligence (AI), has long worked on the problem of formal knowledge representation. Research in this area suggests that one needs a domain ontology for background knowledge, a knowledge base (KB) to store information and an inference mechanism to comprehend it.
OBSERVATIONS
Problem domain or business domain concepts are represented using programming language concepts, e.g., Concept::Account is represented as Class::Account.
Some key problems that prevail in software engineering and hinder comprehension of code are:
Concept Scattering: A single concept in the business or problem domain gets represented in multiple artifacts, in multiple programming languages.
Missing Semantic Links between Program Artifacts: Concept attributes across multiple languages share the same value spaces. For instance, a Java program uses the value of a database location that is defined in an XML file: the definition lives in the XML file, whereas the consumer of the information is the Java program. This multi-language, multi-artifact define-use relationship leads to semantic integrity issues.
During program comprehension the developer attempts to gain knowledge about various aspects of the program. This involves learning new concepts as well as mapping linguistic terms in the program text to concepts in one of the domains. The programming language (or implementation) domain, the architecture and design domain, and the business (or problem) domain are the three distinct knowledge domains involved in software engineering [Fig. 1].
A formal representation of knowledge in a domain consists of the concepts and relationships in that domain, and this is what is known as an ontology [2, 3].
Ontology can be seen as a shared agreement to represent knowledge within a community of stakeholders as a formal representation.
Ontology, therefore, has a social dimension of agreement and commitment from the members of the community to what is represented, and also a formal dimension of machine interpretation with precise semantics. Thus, ontologies form the foundation for building integrated knowledge repositories or knowledge bases that capture shared understanding within a software engineering team.
[Figure 1: Multi Domain Ontologies (Domain Ontology; Architecture and Design Ontology; Implementation Language Ontology). Source: Infosys Research]
Recently, the use of ontologies to address some of these knowledge management problems within software engineering has been proposed [4]. One of the specific software reverse engineering problems we address is the recovery of business domain concepts and relations from structured program artifacts. Previous work in the field of ontology learning has primarily focused on the web and unstructured text corpora as data sources [5].
CREATING ONTOLOGY
Two primary reasons that inhibit clean-slate ontology creation are:
• Existing ontology languages present a high barrier to entry that inhibits their use and deployment in the enterprise context.
• The enterprise already has ample structured and unstructured information that can be mined to learn the business ontology rather than attempting to create it from scratch.
Our discussion brings forth an approach that exploits structured data sources within the enterprise IT portfolio, including program artifacts, structured web content, web services descriptions, XML and messaging artifacts, database schemas and business process models, to recover elements of the business ontology. The key idea is to extract the identifier names used within the respective formal languages and to apply the lingo-syntactic patterns that govern the composition of identifier names from basic tokens.
APPROACH
Previous studies have established the contribution of meaningful identifiers within program code to comprehension [6, 7]. Research indicates that about 47-62% of development time is spent understanding previously written code [8].
An attempt is made to build an extensible knowledge base that can:
• extract identifiers from the code base and tokens from identifiers;
• store identifiers, tokens, programming language elements and the syntactic relations between program elements;
• analyze how multiple tokens are composed to form program identifier names, and exploit syntactic rules within the language to identify relations between the tokens;
• leverage existing machine-processable semantic lexicons to disambiguate the meaning of a token in context;
• identify the basic linguistic tokens that model business concepts, separating business-domain tokens from technology-domain tokens; and
• apply axiomatic token rules to identify concept tokens and the relations between them.
CERTAIN FACTS ABOUT IDENTIFIERS
Object-oriented program code is composed of classes, methods and variables, which are named by identifiers.
Deißenböck and Pizka's work highlights that about 70% of code consists of identifiers, and that every programming language available today allows an arbitrary sequence of characters to be used as an identifier. No mechanism exists to check whether identifier names are meaningful. They also argue that consistent and well-formed variable names can improve code quality, and that programming guidelines contain naming conventions intended to improve the readability of code. Identifier names are intended to communicate the concepts they represent, which helps comprehension and maintenance. However, multi-programmer development does not guarantee uniform usage of identifier names: synonyms of a concept are used heavily, so a single concept gets represented by multiple names in the code. Individual programming styles create additional challenges during identifier analysis. Programmers tend to use prefixes such as "I" for interfaces, or suffixes such as "impl" for pure implementations of types [6].
Recovering the mapping from program
or linguistic lexical tokens to business concepts
that are vehicles of domain semantics is the goal
of our work.
THE RECOVERY PROCESS
An eight-phase approach, detailed below, has been adopted for ontology extraction.
Identifier Extraction: Extractors are built to parse the programming language artifacts and extract facts from them, using reverse engineering techniques. Identifiers are among the facts extracted. Every extracted fact is augmented with information about the concept to which it belongs. For example, given the sample code:
class SavingAccount extends AbstractAccount {}
Extracted Identifiers:
SavingAccount
AbstractAccount
Extracted Relations:
InstanceOf(AbstractAccount, Class)
IsNameOf(AbstractAccount, Class)
InstanceOf(SavingAccount, Class)
IsNameOf(SavingAccount, Class)
Extends(SavingAccount, AbstractAccount)
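The extraction step can be sketched as follows. This is a simplified illustration, not the authors' byte code or source code extractor: it uses a regular expression instead of a parser, handles only class headers, and the function name `extract_facts` and the tuple layout of the relations are assumptions.

```python
import re

# Simplified pattern for a Java class header; a real extractor would use a parser.
CLASS_DECL = re.compile(r"\bclass\s+(\w+)(?:\s+extends\s+(\w+))?")


def extract_facts(source):
    """Return (identifiers, relations) found in Java-like source text."""
    identifiers, relations = [], []

    def add_class(name):
        # Record each class identifier once, with its two naming facts.
        if name not in identifiers:
            identifiers.append(name)
            relations.append(("InstanceOf", name, "Class"))
            relations.append(("IsNameOf", name, "Class"))

    for match in CLASS_DECL.finditer(source):
        name, parent = match.group(1), match.group(2)
        add_class(name)
        if parent:
            add_class(parent)
            relations.append(("Extends", name, parent))
    return identifiers, relations


ids, rels = extract_facts("class SavingAccount extends AbstractAccount {}")
```

For the sample code above, `ids` holds the two class names and `rels` holds the same five facts as listed in the text, in a different order.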
Token Extraction: Identifiers or terms comprise one or more words or tokens. Tokens within an identifier are separated either by a separator (e.g., underscore, period) or by camel case (e.g., AbstractAccount).
For example,
SavingAccount = {t0: Account, t1: Saving}
AbstractAccount = {t0: Account, t1: Abstract}
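A minimal sketch of this splitting step is shown below. The regular expression and the function names are assumptions chosen for illustration. The helper `index_from_end` mirrors the numbering used in the examples above, where t0 is the last token of the identifier.

```python
import re

# Matches an acronym run (e.g. "EJB"), a capitalized or lowercase word, or digits.
TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(identifier):
    """Split an identifier on separators and camel-case boundaries."""
    return TOKEN.findall(identifier)


def index_from_end(tokens):
    """Number tokens from the end of the identifier: t0 is the last token."""
    return {f"t{i}": token for i, token in enumerate(reversed(tokens))}


saving = index_from_end(split_identifier("SavingAccount"))
```

Separators such as underscores and periods are simply skipped by the pattern, so `split_identifier("db_location")` and `split_identifier("java.lang")` each yield two tokens.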
Token Filtering and Validation: The WordNet database of English is used to separate valid tokens from invalid ones [6]. Tokens that have a meaning are valid tokens and are tagged appropriately.
Acronym Generation: A list of possible acronyms/abbreviations is generated for every valid token detected by WordNet. The criteria for acronyms are that they begin with the same letter as the token, maintain the letter sequence and are three letters or longer.
For example, Token -> {Tok, Toe, Ton, Tke, Tkn, Ten, Toke, Tokn, Tken}
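The three stated criteria can be sketched as an enumeration of subsequences. The function name and the decision to exclude the full token itself are assumptions. Note that a literal application of the criteria to "Token" also produces "Toen", which does not appear in the list above, so the original procedure may apply a further rule that is not described here.

```python
from itertools import combinations


def candidate_acronyms(token, min_len=3):
    """Subsequences that keep the first letter and letter order, shorter than the token."""
    first, rest = token[0], token[1:]
    out = []
    # Choose min_len - 1 or more of the remaining letters, but never all of them.
    for size in range(min_len - 1, len(rest)):
        for picked in combinations(rest, size):
            candidate = first + "".join(picked)
            if candidate not in out:
                out.append(candidate)
    return out


acronyms = candidate_acronyms("Token")
```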
Invalid Token Analysis: Invalid tokens are mostly popular acronyms of some valid word. The knowledge base is queried to check whether an invalid token is an acronym of some word. A table of the most common acronyms (also called a stop list) is maintained as well, e.g., str, int, lang, acnt, etc. [7]. This table is also queried to convert the invalid token into a valid, meaningful word. If there are multiple possible expansions, the expansion with the highest frequency is selected.
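The validation and expansion steps can be sketched together as below. The small `LEXICON` set stands in for WordNet, and the acronym table, the frequency counts and the function names are all invented for illustration. The lookup order (lexicon, then table, then acronym match) is also an assumption.

```python
from collections import Counter

# Stand-ins: a tiny word list in place of WordNet, and a hand-made acronym table.
LEXICON = {"account", "amount", "string", "integer", "language", "saving"}
COMMON_ACRONYMS = {"str": "string", "int": "integer", "lang": "language", "acnt": "account"}


def is_subsequence(short, word):
    """True if the letters of `short` appear in `word` in the same order."""
    letters = iter(word)
    return all(ch in letters for ch in short)


def resolve(token, frequencies):
    """Return (word, status) for a token, expanding acronyms where possible."""
    t = token.lower()
    if t in LEXICON:
        return t, "valid"
    if t in COMMON_ACRONYMS:
        return COMMON_ACRONYMS[t], "expanded"
    # Candidate expansions: known words with the same first letter and letter order.
    candidates = [w for w in LEXICON if w[0] == t[0] and is_subsequence(t, w)]
    if candidates:
        # Several expansions are possible: pick the most frequent word.
        return max(candidates, key=lambda w: frequencies[w]), "expanded"
    return t, "invalid"


freq = Counter({"account": 12, "amount": 3})
```

Here "ant" matches both "account" and "amount", and the higher frequency of "account" in the hypothetical code base decides the expansion.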
Valid Token Analysis: Every token is assigned a part of speech using WordNet. Nouns, adjectives, verbs and adverbs are of particular interest as they follow certain grammatical rules. A token may have multiple parts of speech; resolving which one applies is referred to as part-of-speech disambiguation.
In addition, a stemming algorithm, also known as a stem finder, is used to identify the stem of each token. The tokens are clustered based on the stem. Each stem discovered by this process represents a concept in either the software technology domain or the business domain. For example, Order is a stem available in the identifiers:
PurchaseOrder,
SupplierOrder,
ProxyOrder,
ServiceOrder,
CompletedPurchaseOrder,
ShoppingCartEmptyPurchaseOrder,
MailPurchaseOrder,
UnknownPurchaseOrder,
PetStorePurchaseOrder,
AdminServicePurchaseOrder
Stem Filtering: This stage requires programming domain concepts to be captured as a stop list. After filtering, the domain concepts can be recovered and suitably tagged. The filtered knowledge base holds a mapping from each domain concept to the artifacts in which it resides. This information is crucial when performing impact analysis.
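Clustering by stem and filtering with a stop list can be sketched as follows. The lowercased token stands in for the stem here, since no real stemmer is applied, and the stop list, file names and function names are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")
# Hypothetical stop list of programming-domain stems.
TECH_STOP_LIST = {"proxy", "impl", "ejb", "factory", "exception"}


def cluster_by_stem(identifiers_by_artifact):
    """Map each lowercased token to the identifiers and artifacts that contain it."""
    clusters = defaultdict(lambda: {"identifiers": set(), "artifacts": set()})
    for artifact, identifiers in identifiers_by_artifact.items():
        for identifier in identifiers:
            for token in TOKEN.findall(identifier):
                entry = clusters[token.lower()]
                entry["identifiers"].add(identifier)
                entry["artifacts"].add(artifact)
    return clusters


def domain_stems(clusters):
    """Drop stems on the stop list, leaving candidate business concepts."""
    return {stem: c for stem, c in clusters.items() if stem not in TECH_STOP_LIST}


code_base = {
    "PurchaseOrder.java": ["PurchaseOrder", "CompletedPurchaseOrder"],
    "OrderProxy.java": ["ProxyOrder", "SupplierOrder"],
}
clusters = cluster_by_stem(code_base)
business = domain_stems(clusters)
```

The `artifacts` set kept for each stem is the concept-to-artifact mapping that impact analysis would consult: a change to the Order concept touches every file listed under `business["order"]`.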
Applying Empirical Rules over the Recovered
Concepts: The last business noun (BN) in a
homogenous sequence is a major concept.
For example, in createSavingAccountEJB the last business noun is Account, which is hence the major concept.
When the first token is a verb, it indicates an action (domain or technology), e.g., transferFunds() -> {transfer, Funds}, where transfer indicates the action.
A sequence of business nouns BN1BN2 can be interpreted as follows: if BN1 is always followed by BN2 (i.e., BN1 is not an independent token), then BN1BN2 is a subsumed concept of BN2 [Fig. 2].
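The three rules can be sketched for a single identifier as below. The hand-labelled `BUSINESS_NOUNS` and `VERBS` sets stand in for the part-of-speech tagging and domain filtering steps, and the function name and result layout are assumptions. The sketch takes the first run of consecutive business nouns as the homogeneous sequence, and it does not perform the corpus-wide check that BN1 is always followed by BN2.

```python
import re

TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")
# Hand-labelled stand-ins for the part-of-speech and domain tagging steps.
BUSINESS_NOUNS = {"saving", "account", "funds", "purchase", "order"}
VERBS = {"create", "transfer", "get", "set"}


def analyze(identifier):
    """Apply the action, major-concept and subsumption rules to one identifier."""
    tokens = [t.lower() for t in TOKEN.findall(identifier)]
    # Rule: a leading verb indicates an action.
    action = tokens[0] if tokens and tokens[0] in VERBS else None
    # Collect the first run of consecutive business nouns.
    run = []
    for token in tokens:
        if token in BUSINESS_NOUNS:
            run.append(token)
        elif run:
            break
    # Rule: the last business noun in the run is the major concept.
    major = run[-1] if run else None
    # Rule: BN1 BN2 is treated as a sub-concept of BN2.
    subsumed_by = [(" ".join(run[i:]), " ".join(run[i + 1:])) for i in range(len(run) - 1)]
    return {"action": action, "major_concept": major, "subsumed_by": subsumed_by}


result = analyze("createSavingAccountEJB")
```

Collecting the `subsumed_by` pairs over all identifiers in a code base gives the edges of a hierarchy like the one in Figure 2.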
A knowledge base capable of storing these facts was developed. A byte code extractor and a source code extractor were developed to extract Java program information into the knowledge base.
The recovery process was run on the Pet Store application provided by Sun. The stems recovered by the process were the key concepts scattered across the code base. It was possible to separate the business domain concepts from the technology domain concepts and to locate subsumption hierarchies in the code. It was also possible to identify the actions performed on the domain concepts.
CONCLUSION
Proper identifier naming aids comprehension. Identifiers are vehicles of semantics. Identifier names encompass knowledge that can be analyzed to identify the key concepts that form the business domain ontology. Several rules of inference can be formulated from empirical analysis of the identifier tokens. The verbs present in identifier names can be analyzed to find the actions performed on business or technology concepts. The relationships present in the code and the token positions can be used to build concept subsumption hierarchies.
[Figure 2: Subsumption Hierarchy. Order is at the root; Purchase Order, Supplier Order, Proxy Order and Service Order lie below it; more specific concepts such as Completed Purchase Order, Unknown Purchase Order, Mail Purchase Order, Shopping Cart Empty Purchase Order, TPA Supplier Order, PetStore Proxy Order, Admin Service Order and Admin Client Service Order lie beneath those. Source: Infosys Research]
REFERENCES
1. H Muller, Understanding Software Systems Using Reverse Engineering Technology, 1994. Available at http://www.rigi.csc.uvic.ca
2. J D Novak and D B Gowin, Learning How to Learn, NY: Cambridge University Press, 1984
3. John F Sowa, Knowledge Representation - Logical, Philosophical and Computational Foundations, Brooks/Cole, 2000
4. H Knublauch, Ramblings on Agile Methodologies and Ontology-Driven Software Development, in Workshop on Semantic Web Enabled Software Engineering (SWESE), Galway, Ireland, 2005
5. P Buitelaar, P Cimiano and B Magnini, Ontology Learning from Text: Methods, Applications and Evaluation, IOS Press, 2005
6. F Deißenböck and M Pizka, Concise and Consistent Naming, in Proceedings of the 13th International Workshop on Program Comprehension (IWPC 2005), St Louis, MO, USA, May 2005, IEEE Computer Society
7. Dawn Lawrie and W Bruce Croft, Discovering and Comparing Topic Hierarchies, in Proceedings of RIAO 2000 Conference. Available at http://ciir.cs.umass.edu/pubfiles/ir-183.pdf
8. http://www2.umassd.edu/swpi/1docs/comprehension.html
9. Martin Fowler, Kent Beck (Contributor), John Brant (Contributor), William Opdyke, Don Roberts, Refactoring: Improving the Design of Existing Code, Addison Wesley
10. Scott R Tilley and Dennis B Smith, Coming Attractions in Program Understanding, Technical Report. Available at http://www.sei.cmu.edu/pub/documents/96.reports/pdf/tr019.96.pdf
11. Prasenjit Mitra and Gio Wiederhold, An Ontology-Composition Algebra, Handbook on Ontologies, Springer
12. WordNet. Available at http://wordnet.princeton.edu/
13. Vaclav Rajlich and Norman Wilde, The Role of Concepts in Program Comprehension, in IWPC 2002. Available at http://www.cs.wayne.edu/~vip/publications/rajlich_IWPC2002.pdf
14. http://faq.javaranch.com/java/associationVsAggregationVsComposition
15. Florian Deissenboeck and Daniel Ratiu, A Unified Meta-Model for Concept-Based Reverse Engineering. Available at http://www4.informatik.tu-muenchen.de/~deissenb/publications/2006_deissenboeckf_concepts.pdf
16. D Poshyvanyk and A Marcus, Combining Formal Concept Analysis with Information Retrieval for Concept Location in Source Code, 15th IEEE International Conference on Program Comprehension (ICPC '07), 26-29 June 2007
17. Kim Mens, Diego Ordóñez Camacho and Mathieu Syben, Ny Navigating through Java Programs with Concept Lattices, December 20, 2006
18. Gabriela Arévalo, Stéphane Ducasse and Oscar Nierstrasz, Lessons Learned in Applying Formal Concept Analysis to Reverse Engineering, Lecture Notes in Computer Science, Springer Berlin / Heidelberg, January 2005
19. A Maedche and S Staab, Discovering Conceptual Relations from Text, in Proceedings of the 14th European Conference on Artificial Intelligence (ECAI 2000), August 20-25 2000, Berlin, Germany, Amsterdam, IOS Press, 2000
20. H Yang, Z Cui and P O'Brien, Extracting Ontologies from Legacy Systems for Understanding and Reengineering, in Proceedings of the 23rd IEEE International Conference on Computer Software and Applications (COMPSAC 99), IEEE Press, 1999
Knowledge Management for
Virtual Teams
By Manish Kurhekar and Joydip Ghoshal
Use advanced KM techniques to enable
knowledge contribution and enhance
KM usage culture in virtual teams
Organizations today operate at a global level to leverage the favorable economic conditions that come with high technological advancements in connectivity. As a result, they need IT support and offshore IT teams around the globe. This has given rise to the concept of the globally dispersed team, also known as the virtual team.
The current business setup across the globe considers knowledge to be one of the key resources enabling competitive advantage. Knowledge management is all the more important for industries operating in a global environment and across time zones with a diversely located workforce. This paper outlines the knowledge-based challenges faced by virtual teams and some best practices and techniques that help circumvent such challenges.
VIRTUAL TEAM
A virtual team consists of a group of people working across geographies, time zones and organizations, connected through communication technologies and tools. Such teams comprise tele-workers working from home and/or team members working from different office locations. Some virtual teams are formed across organizations; for instance, a team can comprise members of a vendor or members of a client. Virtual team members need to possess the varied skill sets required for the success of a particular project or operation. While virtual teams have a purpose that binds them together, organizations ensure that these individuals understand and are committed to the values and vision of the organization. In a typical virtual team scenario, members come together based on their skill sets to execute particular tasks and, once a task is over, are allocated to other tasks.
Figure 1 shows how virtual teams can be classified along three axes: same or different time, same or different place and same or different organization.
IMPORTANCE OF KM IN VIRTUAL TEAM
ORGANIZATIONS
Knowledge management (KM) is an essential
component of an organization with virtual
teams. KM enables the industry today to face
the challenges of the modern business world
with various innovative solutions in terms
of products and services. Several factors that
contribute to the importance of managing
knowledge in virtual team organizations are
discussed below.
Competitive Advantage: A virtual team, by its
very nature, has a diverse knowledge base. KM
effectively taps into this knowledge base and
translates it into a competitive advantage for
the organization.
Technology: Modern-day technology is dynamic and continuously changing. An organization must deploy measures to keep the skill sets of its employees up to date with recent cutting-edge technologies. KM in a virtual team helps an organization disseminate a lot of information to its workforce effectively, irrespective of the team's physical location or time zone, with the help of advanced connectivity technology.
Organizational Change: Due to factors like
organizational changes, restructuring, changing
business needs, mergers and acquisitions, etc.,
it is critical for organizations to retain old and
create new knowledge in the face of complexity,
uncertainty and rapid change.
Enhanced Decision Making and Improved Productivity: Lessons from past experiences and ample use of the knowledge base can help in better predicting, estimating and planning future work, and thus act as business intelligence that enhances the decision-making process. The use of reusable components and tools also improves productivity and avoids reinventing the wheel.
Workforce Demographics: As the workforce ages and changes, knowledge transfer becomes vital to sustain critical business functions in virtual team organizations, just as in any traditional co-located team organization.
CHALLENGES FACED BY VIRTUAL TEAMS
In a traditional project team, members are co-located and face-to-face interaction is the dominant mode of communication. In organizations with virtual project teams, however, the lack of face-to-face interaction and the technological limitations of communication can prove to be constraints.
A typical scenario is one where virtual teams get together to complete tasks and are dissolved once the tasks are over. Thus, there needs to be a process in place to capture the knowledge gained by each stakeholder of the virtual team during the course of task execution.
[Figure 1: A Classification of Virtual Team (axes: Time, Place, Organization). Source: Infosys Research]
Since there may not be anyone physically around to monitor and mentor the work, virtual team members need to be self-motivated to perform their duties in a given duration. Some regular concerns that emerge in a virtual team need to be handled in an effective manner. A few recurring ones are discussed below.
• Virtual teams are mostly result oriented. Unless an individual specifically starts contributing knowledge, there may not be anyone physically around to access the expertise of that individual. Also, team members should be available online through communication tools such as internet messenger, email, phone, etc., for informal, day-to-day knowledge sharing among team members.
• Virtual teams need to communicate effectively, with correct use of time and complete dissemination of information, even though they cannot avail themselves of conventional face-to-face communication or body language.
• For the success of a task, building trust and confidence is integral in virtual team organizations, as team members often do not see each other at all. Organizations should invest in measures such as training workshops and online tools such as videoconferencing, through which team members can see each other and relate a name to a face, which also helps build trust and confidence in the team.
KM TECHNIQUES FOR VIRTUAL TEAMS
KM practices are used in an organization to capture the learning and best practices from past work experiences and disseminate them among employees for future work. Knowledge artifacts can also take the form of reusable components and tools that can be plugged into future work to improve productivity and thus avoid reinventing the wheel. Some important KM techniques for virtual team organizations are discussed below.
Knowledge Contribution: All of us learn every day through our day-to-day work, but we rarely document our experiences. Documenting experiences is very important for a virtual team, where one cannot share one's experience through face-to-face communication with other team members.
Organizations should encourage employees to contribute to the central knowledge repository. Knowledge creation should be made a natural part of an employee's work process instead of being added as extra workload. Knowledge should be created at all levels. While submission of white papers may be a time-consuming process, daily tips or the creation of tools used in day-to-day activities can always increase the knowledge contribution.
Knowledge contributions from employees should finally be reviewed by subject matter experts.
Knowledge Storage: It is very important to
maintain a central knowledge repository for all
the contributions. For a virtual team, easy access
to the knowledge repository is the key factor
for the success of KM. This can be implemented
with a shared LAN folder, a web-hosted forum
or a web-hosted application.
Knowledge Sharing: The knowledge collected
becomes useful for the organizations only
when it is effectively shared with the people.
There are various ways to implement this
for a virtual team. Some popular ways are
mentioned below.
Mailing Groups: This group consists of a set of
individuals with common interests. It can be
something as small as a project group or it can
be a large group spanning a business vertical. This
group can share daily mailers, daily newsletters
and tips from day-to-day work in the form of a tip of the
day.
Online Tools: Many organizations use online
tools to disseminate knowledge across the
members of virtual teams. Technology-based
knowledge sharing and e-learning modules
help the team members share knowledge
without any face-to-face communication.
Seminars conducted over video conferencing,
to simulate co-presence, also help the
team members meet and interact with each
other more effectively. An online quiz after a
session creates healthy competition among
the employees and also ensures that the
knowledge shared is being understood and
used by the employees.
Figure 2 illustrates various
communication methods that can be used
towards effective knowledge sharing based on
the classification of the virtual team.
Wikis/Blogs: Wikis are web pages that a group of
people can update together. Wikis are used to
share information with a wide group of people.
A blog is a publishing
tool that can be used to share knowledge with
the world. With the help of wikis and blogs,
knowledge can be shared conveniently
with a wide group of people irrespective of their
physical location. These methods of knowledge
sharing are becoming increasingly popular due
to the ease of internet access.
Knowledge Evaluation: This is more of an
ongoing monitoring activity from higher
management. Organizations need to make sure
[Figure 2: Communication Methods in Virtual Teams. A matrix of TIME (same / different) against PLACE (same / different). Methods shown, by group: (a) email, voice-mail, wikis, blogs, online forums, fax, e-learning; (b) face-to-face presentation, whiteboards, group discussion; (c) email, voice-mail, fax, whiteboards; (d) web-sharing, tele-conferencing, video-conferencing, video taping, web seminar, virtual whiteboard. Source: Infosys Research]
that all the knowledge that is being collected
and shared ultimately leads to enhancing the
quality or productivity of the organization.
Also, recognition from higher management like
appreciating knowledge champions of a team
always encourages active participation from
the employees.
USE OF COLLABORATIVE TECHNOLOGIES
Collaborative methods involve virtual team
members working together to contribute to
knowledge sharing. As shown in Figure 3,
these individuals communicate directly
with each other to brainstorm knowledge ideas
and decide on a knowledge contribution strategy.
There is also indirect communication
between individuals, either when they update
the same knowledge artifacts or when one
individual uses a knowledge artifact created
by another.
Make KM a Natural Part of the Workflow:
As mentioned earlier, KM should be a part of
the workflow for the employee. For example,
at the end of a particular phase of the project,
employees can be asked to document their
learnings and submit them to the knowledge
repository. Similarly, before starting any
research on a topic, the existing knowledge
documents should be referred to.
Knowledge Sharing Culture: Effective KM
needs continuous evolution of culture in virtual
team organizations. KM is not a destination;
it is a journey. Team members should be
continuously motivated to use the existing
knowledge artifacts for performing tasks and
should be encouraged to contribute their
experiences and best practices in the form of
knowledge artifacts.
Also, online facilities like web forums
and video conferencing should be used more
frequently to enhance the knowledge sharing
culture within a virtual team.
Formation of Knowledge Workgroup:
Knowledge workgroups/communities need
to be formed based on similar interests to
[Figure 3: Collaboration for Knowledge Contribution. Roles shown: Knowledge Administrator (maintains repository), Knowledge Contributors (contribute artifacts), Knowledge Reader/Viewer (uses artifacts) and the Knowledge Repository, linked by direct and indirect communication. Source: Infosys Research]
keep abreast of the latest technologies and
trends in the industry. These workgroups are
not formed to execute any particular task and
such workgroups should not be formed by
the management. These should be formed by
volunteers who come together with the sole
purpose of keeping their interests alive.
Maintenance of Existing Knowledge Base:
Addition of new knowledge should not be the
only function of an organization for effective
KM. There needs to be a process in place to
ensure that the existing knowledge artifacts are
updated to reflect the changing environment and
changes in system and business functionalities.
Also, a process needs to be in place to remove
obsolete knowledge content from the KM
repository that might otherwise have an
adverse impact on the organization's business.
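As a rough illustration of such a maintenance process, the sketch below triages repository entries into those that are current, those due for review and those to be removed. The Artifact record, its fields and the one-year review interval are hypothetical assumptions made for this example; the article does not prescribe any particular data model or interval.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Artifact:
    """Hypothetical record for one knowledge artifact in the repository."""
    title: str
    last_reviewed: date
    obsolete: bool = False  # set by a reviewer when the content no longer applies


def triage(artifacts, today, max_age_days=365):
    """Split artifacts into (current, needs_review, to_remove).

    max_age_days is an assumed review interval, not a value from the article.
    """
    current, needs_review, to_remove = [], [], []
    cutoff = today - timedelta(days=max_age_days)
    for artifact in artifacts:
        if artifact.obsolete:
            # Obsolete content is removed regardless of its age.
            to_remove.append(artifact)
        elif artifact.last_reviewed < cutoff:
            # Not reviewed within the interval: send back for an update.
            needs_review.append(artifact)
        else:
            current.append(artifact)
    return current, needs_review, to_remove
```

A subject matter expert would still decide what counts as obsolete; the sketch only shows where such a decision could plug into a periodic sweep of the repository.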
CASE STUDY
This was a project where a virtual team was to
build functionality in the client's claims system
that would engage with various Independent
Practice Associations (IPAs) to implement claim
delegation arrangements.
The end users were geographically
distributed across countries and split across
two locations in the USA. The business owners from
the client were also geographically distributed
across countries and most of them were working
from home.
The project faced various challenges
related to knowledge transfer in a virtual team
and they were resolved with some of the known
techniques as outlined below.
Problem: Lack of unanimity among business
teams as they were split in two locations.
Solution: The project team brought both the
business teams together using teleconference
and captured requirements. Web sharing
technologies were used to capture requirements
and create business models, so that both
the business groups could understand and
contribute knowledge and expertise to
each other's requirements.
Problem: The offshore team was not always in
sync with the lead business analyst (BA) and
business needs were unclear.
Solution: The offshore team was asked to dial into
the meetings directly to enable them to understand
the actual requirement from the end user without
depending on the e-mails from onshore. Offshore
team members raised and clarified their own
questions and gathered knowledge from the
customer directly during the meetings.
Table 1: Mitigating KM Problems (Source: Infosys Research)

1. Scenario: Training is being conducted in a virtual organization. The trainer is at one geographic location, whereas the trainees are at different client locations. The trainer may have shared the training material prior to the training via email, but the training will not be effective if the trainer and trainees are not referring to the same page at the same time during training and if the trainer is unable to see the facial responses of the trainees during training.
   Suggested approach: If the trainer web-shares the material during training so that the trainees can see it, both parties will be on the same page. Some web sharing tools, e.g., a whiteboard, can be used to make the training more effective. If video conferencing for the training is made possible, it will help the trainer to see the facial responses of the trainees.
   KM tools and techniques: Webshare, Video Conference, Virtual White Board

2. Scenario: Project ABC has an onshore team and an offshore team. The onshore team is gaining knowledge because of frequent/regular interactions with the customer, whereas the offshore team is losing out on knowledge because of less interaction with the customer. This may lead to de-motivation and misunderstanding between the two teams.
   Suggested approach: If both teams can have frequent interactions over teleconferencing and videoconferencing such that the customer, onshore team and offshore team can all participate and share knowledge, it will help the offshore team gain knowledge and stay motivated.
   KM tools and techniques: Teleconference, Videoconference

3. Scenario: Virtual team members of ABC corporation, who are located in multiple geographies, are working in different time zones. Because of the different time zones, some team members cannot attend trainings conducted at the same time.
   Suggested approach: Use e-learning, technology-based training, videotaping, etc., so that all team members get trained as per their convenience and availability. Also, use of webinars and availability of recorded webinars on intranet sites will help team members go through the recorded sessions at their convenience.
   KM tools and techniques: E-learning, Videotaping, Webinar

4. Scenario: Virtual team members working on project XYZ are working in different time zones. Some virtual team members are not able to attend meetings that are conducted in different time zones.
   Suggested approach: Record the meeting and share it with team members who were unable to attend. This will help the team members listen to the entire conversation and avoid any communication gap in knowledge sharing.
   KM tools and techniques: Meeting Recording

5. Scenario: The client is not comfortable offshoring the work because the client is new to offshoring. There is a need to build trust and confidence in the virtual team working on the XYZ project.
   Suggested approach: A videoconference between the client and the virtual team will instill confidence and trust in the team members. This will enable smoother knowledge transfer from client/onshore to offshore.
   KM tools and techniques: Videoconference

6. Scenario: There is a need to conduct Joint Application Development (JAD) sessions among virtual team members.
   Suggested approach: Using tools like web sharing, video conferencing and a virtual whiteboard, JAD sessions will be more effective in a project having a virtual team.
   KM tools and techniques: Web share, Video Conference, Virtual White Board

7. Scenario: In project XYZ there are business users in the US, designers in Europe and developers in India. They are working together to develop software for the healthcare industry. Knowledge gained by the team members during the project is not being documented. The next time the same developers work with business users from a different geography, they may not be able to utilize the knowledge of the US business users.
   Suggested approach: If there is a practice in place to document the knowledge gained during the course of the project, it will help future projects utilize that knowledge. Also, if lessons learned and best practices are discussed and documented at the end of the project, it will help apply the same experience to future projects.
   KM tools and techniques: Capturing Best Practices and Lessons Learned

8. Scenario: There are regular knowledge sharing sessions conducted in the ABC organization, but management wants to ensure that the shared knowledge is really being absorbed by virtual team members.
   Suggested approach: Online quizzes help to check the effectiveness of knowledge sharing sessions and evaluate whether knowledge is really gained by virtual team members.
   KM tools and techniques: Online Quiz tools
Problem: Internal project communication
between the project manager and the lead BA was
unclear, which caused confusion in the project
team.
Solution: The project manager and the lead BA met
every day over teleconference and web-share
to discuss the project issues. The project
manager gained the knowledge related to the
requirement issues from the lead BA and was
able to raise risks on time. The lead
BA gained the knowledge related to project
schedule and finance from the project manager
and was able to ensure completion of all the
deliverables on time, meeting the scope of the
project appropriately.
MEETING KM CHALLENGES
Some practical scenarios related to knowledge
management challenges, and the suggested
approach, tools and techniques to overcome
the challenges, are given in Table 1.
CONCLUSION
Knowledge management in virtual team
is a new work culture and it is still in its
nascent stage. In spite of many advanced
technological methods, virtual team members
still struggle with constraints such as the lack of the
face-to-face communication that any
traditional team enjoys. Organizations should maintain
a continued focus to address the motivation,
communication and trust related issues to
enable a successful knowledge sharing culture.
There needs to be a continuous motivation for
teams to contribute to the knowledge repository
and use the existing knowledge artifacts to
perform their tasks. Checks and measures
should also be in place to avoid redundancy and
duplication in knowledge addition, ensuring
the quality of the knowledge shared.
Also, existing knowledge artifacts need to be
maintained in step with the changing environment
and changing business/system functionalities,
to keep the artifacts up to date and remove
obsolete knowledge. KM in a
virtual team is not a destination; it is a journey
and an evolution of work culture.
REFERENCES
1. Knowledge Management and Virtual
Teams. Available at http://www.chris-kimble.com/Courses/Sogn_og_Fjordane/KM_and_Virtual_Teams.html
2. Virtual Team Benefits and Challenges.
Available at http://www.time-management-guide.com/virtual-team.html
3. Importance of Knowledge Management.
Available at http://www.portal.state.pa.us/portal/server.pt?open=512&objID=1082&&PageID=233241&level=3&css=L3&mode=2&in_hi_userid=2&cached=true
4. Johann Schlichter, Michael Koch and
Martin Bürger, Coordination Technology
for Collaborative Applications, Springer
Berlin, April 2006. Available at http://www.springerlink.com/content/m48007t7416157h0
5. Chris Kimble, Barlow Alexis and Feng
Li, Effective Virtual Teams through
Communities of Practice, Social Science
Research Network, September 2000.
Available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=634645
6. A Virtual Team. Available at http://en.wikipedia.org/wiki/Virtual_team
7. Knowledge Management. Available
at http://en.wikipedia.org/wiki/Knowledge_management
SETLabs Briefings
VOL 7 NO 5
2009
Toward Disruptive Strategies in
Knowledge Management
By Rajesh Elumalai and George Abraham
Inquire into the assumptions underpinning
conventional KM thinking to achieve
breakthrough strategies
Rapid innovations in collaboration
technology and shortened knowledge
lifecycles have led to the revisiting of
conventional knowledge management (KM)
thinking. Conventional KM techniques are
falling short in securing competitive advantage.
Conventional KM strategies are based on certain
key tenets. The necessary precondition for
evolving disruptive strategies is
to inquire into these tenets. The paper
questions three tenets in order to evaluate their
validity, applicability and impact in the current
context.
The origin of KM in enterprises can be
traced back to the days of Adam Smith. Though
one could argue that Smith advocated division
of labor and specialization as the sources of
productivity, it is the focus on the knowledge in
doing a particular activity that drives improved
productivity. Focus on KM has increased
over the years prodded by varied drivers of
adoption. These drivers include inter alia
productivity improvement, faster cycle times,
reuse and competitive advantage. Innovations
in information technology have aided in rapid
realization of certain objectives of KM like reuse
and productivity improvement.
With the widespread availability and
choice of KM software, the rate of adoption of
KM technologies has considerably increased
over the years. The benefits of technology-
aided and technology-driven KM provided
competitive advantage to enterprises as long
as KM software was targeted and implemented
only by early adopters. Sustainability of such
an advantage needs to be questioned with
the widespread adoption of KM software by
competing enterprises. Incremental innovations
in KM technology cannot be relied upon as
a differentiator. Hence, enterprises need to
embrace strategies that create competitive
differentiation that is sustainable.
There is therefore a need for breakthrough
strategies in KM that differentiate an enterprise
from its competition in a sustainable manner.
Some drivers for breakthrough strategies are
listed below.
- The role of technology and software in
  KM is widely accepted and its rate of
  adoption is on the rise. Hence, it cannot
  be relied upon as the sole source of
  sustained competitive differentiation.
- Lower than expected success rate of KM
  initiatives over the past decade.
- Innovations in technology for real-time
  interaction and collaboration.
- Shrinking knowledge lifecycle caused by
  reduced product lifecycles.
- Evolution of mature knowledge
  ecosystems in various industries.
INQUIRING INTO THE ASSUMPTIONS
In order to develop groundbreaking strategies,
it becomes imperative to challenge the basic
assumptions that underlie conventional KM
thinking. Three such tenets are mentioned
below:
- Knowledge Needs to be People-independent:
  Knowledge rooted in
  individuals is impacted by resource
  risk. Hence it needs to be extracted
  and made available independent of the
  resources that possess the knowledge.
  This ensures a sustained competitive
  edge through an organizational pool of
  knowledge.
- Knowledge Can be and Needs to be
  Codified: Representing knowledge using
  formal languages (e.g., C++ / Java code)
  is the best way of making knowledge
  transferable. Any form of knowledge
  can be codified.
- Knowing Results in Doing: Competitive
  differentiation through knowledge
  assumes that knowing results in doing.
  Hence, the goal of KM strategy is limited
  to ensuring that the knowing dimension
  is addressed.
For the purpose of this analysis, we
have depicted the knowledge management
lifecycle as a three-step process. While the
literature contains various lifecycle models
with multiple steps, this simple structuring
helps our analysis due to its brevity and
focus on fundamental aspects of knowledge
management. The above-mentioned tenets
closely follow the dimensions depicted in
Figure 1.
The following sections analyze these
tenets in detail in search of the possible changes
required in these perspectives.
[Figure 1: Knowledge Management Lifecycle. Three steps: Store, Transfer, Use. Store: knowledge needs to be people-independent; the primary source of knowledge should be knowledge repositories. Transfer: codification is the best mode of knowledge transfer; knowledge can be, and needs to be, codified. Use: knowing results in doing; KM needs to focus only on all that is required from a knowing perspective. Source: Infosys Research]
Tenet 1: Knowledge Needs To Be People-
Independent
Managing knowledge effectively depends
primarily upon where knowledge is
stored, in other words, on determining the
source of knowledge. Knowledge resides
with the individuals in an enterprise. They
assimilate knowledge in the course of their
work. Conventionally, it is assumed that
this knowledge needs to be made people-
independent for various reasons. Creation of
a knowledge base to store enterprise knowledge
is advocated by conventional KM techniques.
Classical KM techniques are aimed at addressing
the following issues:
- Resource Risk: Enterprise knowledge
  is impacted by resource risk. When a
  resource leaves an organization, the
  associated knowledge is lost.
- Easy Accessibility: When knowledge
  is made available in a knowledge
  store, it is more easily accessible to
  everyone than when it resides within
  individuals.
- Centralized Management: To achieve
  maximum usage of enterprise knowledge,
  it needs to be governed and managed
  centrally. Creating knowledge
  repositories leads to achieving this
  benefit.
- Standardization: Making knowledge
  people-independent ensures that there
  is a single version of truth. In other
  words, the subjectivity associated
  with individuals can be eliminated
  when there is an organizational pool of
  knowledge.
Testing Contextual Applicability
While the above mentioned reasons make
a good case for making knowledge people-
independent, the applicability of this tenet
needs to be examined in the context of the
drivers mentioned earlier in the paper.
Shrinking activity life cycles and constant
productivity pressures mean that an employee
needs to do more with less time available at
hand. Hence, the focus on creating knowledge
and contributing to an enterprise knowledge
repository is severely crippled. This difficulty
is compounded by the complexities involved
in technology architectures and organization
models [1].
- Technology for real-time collaboration,
  like instant messaging and voice over
  internet, is widely available, making
  collaboration easier and faster than
  ever. Lack of such technologies in
  the past made real-time collaboration
  costly. Reaching out to people is
  now easier and faster than retrieving
  the right content from multiple
  repositories.
- Attempts at standardization of
  knowledge management have
  introduced a high degree of complexity
  that calls for significant user education
  on using KM systems. Benassi et al.
  argue that "Especially in complex
  organizations, workers specialized
  in different sectors, with different
  needs, different ways of thinking, and
  different interpretation schemas
  cannot be forced to use a unique system
  of knowledge representation that they
  might consider either as oppressive or
  irrelevant" [1].
- The people risk dimension holds good
  only for very specialized skills where
  the exit of one individual poses risk. In
  general, the degree of capture of this
  knowledge in conventional KM systems
  is very limited. We can reason that KM
  benefits arise out of the skills that are
  used by a reasonably large number of
  people, rather than from niche skills
  that are used by a relatively small group.
Inference
With the above analysis, one can infer that
making knowledge people-independent is
no longer critical for successful KM. Therefore,
the change in perspective should be to treat
individuals, rather than independent knowledge
repositories, as the primary knowledge store.
While this is a prerequisite for breakthrough
strategies, it does not call for deprecation of KM
efforts made using conventional techniques. This
change in perspective only means that the focus
should be shifted towards treating individuals
as the knowledge store. Conventional KM
efforts are at best complementary to the efforts
that arise out of the changed perspective.
Knowledge stores can still be used in scenarios
where the knowledge is fairly stable over time,
is likely to be reused multiple times and
is easy to extract.
As Heather Creech, Director, Knowledge
Communications of the International Institute for
Sustainable Development (IISD) says, "KM has
moved well beyond the systematic collection,
archiving and retrieval of information. Merged
into KM are concepts of dialogue, relationship-
building and adaptive learning through constant
interaction with users, who have their own
knowledge and perspectives to contribute" [2].
Towards Inclusion of Human Knowledge
Sources: The key to organizational knowledge
creation is the augmentation of an individual's
knowledge within the organization. The creation
of new perspectives and knowledge stems from
the synergy of accumulated knowledge and
reason. Unless it is articulated
and amplified, this knowledge remains personal.
One way to implement the management of
organizational knowledge creation is to create
a "field" or self-organizing team in which
individual members collaborate to create a new
concept [3]. This methodology advocates
making the process of knowledge
creation more dynamic and people-centric.
Tenet 2: Codification For Making Knowledge
Transferable
In line with the observations made in the
previous section, conventional KM relied
heavily on codification as the enabler of effective
knowledge transfer. Tacit knowledge is
codified in various forms in order to make it
explicit, so that it can be diffused easily. KM
through codification provided one of the major
realized benefits of KM, namely reusability.
However, codification has its own list of
drawbacks when compared to other modes
of knowledge transfer. With the availability
of better modes of knowledge transfer (viz.,
personal interactions, narratives, discussions),
the conventional choice of codification can be
attributed mainly to technology choices that
were unavailable until now. Most of the incremental
innovations in KM technologies (viz., blogs,
wikis and indexing and search innovations)
have their roots in codified content.
While codification has been suitable for
content that is fairly static and needs to
be reused multiple times, it cannot be treated
as a universal choice for knowledge transfer.
Codification has its own associated problems,
some of which are mentioned below.
- Deterioration of knowledge during
  transfer owing to insufficient semantics.
- The time involved in transferring
  knowledge through codification is high
  compared to other modes of transfer like
  discussion.
- Codification is not suitable for all kinds
  of knowledge. In particular, complex
  perspectives that arise out of experience
  are not suitable for codification.
Testing Contextual Applicability
With the advent of new technology enabling
real-time collaboration, codification lags behind,
since other forms of knowledge transfer like
dialogue and discussion can be leveraged with
such technology.
Due to its time-intensive nature,
knowledge transfer through codification is
not likely to be practiced because of reduced
activity times.
Since codification is decoupled from the
transfer of knowledge, knowledge that is
not yet codified cannot be transferred. Even
knowledge that is codified, but not updated,
loses its value during transfer. Hence, with
short cycle times, this decoupling hampers
realization of the promised value of reusability.
Inference
The focus on codification needs to be shifted to
other modes of knowledge transfer like personal
interactions and real-time collaborations.
While codification can continue to be used for
knowledge that is easy to abstract and is likely
to be reused many times, the perspective should
change towards focusing on other forms of
knowledge transfer.
The focus of KM should be on enabling the
human process of knowing, and thus on
how users construct meaning
when they codify their knowledge.
As Mathew Hall asserts, "it is important,
therefore, that people as individual and unique
knowers are at the centre of any approach
to KM. Focusing predominantly on the creation
of information and how that information gets
moved around does not get close enough to
understanding how the information contributes
to someone else's knowledge and work. There
seems little point codifying knowledge for
the purpose of transferring it elsewhere in the
organization without someone else being able
to decodify it. And without knowing who that
someone is, it is difficult to know how to codify
the knowledge to begin with" [4].
Tenet 3: Knowing Results in Doing
While knowledge is instrumental in doing, it is
only doing that creates value for the enterprise.
Knowledge is just a lever. In other words, KM
efforts focused on building knowledge are not
likely to provide value unless this knowing
results in doing. Conventional KM aims to
create an independent knowledge pool that
aspires to cover the entire gamut of knowing.
The conversion of this knowledge into doing
hinges on the active engagement of the doer
who uses this knowledge. Efforts made to make
the knowledge available are worthless unless
it is used by the end user.
The focus of conventional KM on
making available all knowledge required for doing
a task is not likely to succeed, simply due
to the enormity of such an effort. Instead,
active engagement with the provider is likely
to succeed from the doing perspective, since
the consumer is given access to specific
information from a doing perspective.
Testing Contextual Applicability
With shortening activity cycles, decoupling
knowing and doing is not likely to succeed,
since the end user has to go through information
that is created from a knowing perspective and
may not be helpful from a doing perspective.
There is a huge search cost involved in finding
the knowledge that is relevant for the doing.
With the availability of real-time
collaboration technologies, end users can
use alternative means of knowledge transfer,
like personal discussion, in order to get the
knowledge from a doing perspective.
Inference
KM efforts need to have a doing perspective,
rather than simply a knowing perspective.
Such a change would necessitate a fresh look at
making knowledge standardized across the
organization. Alternative modes of knowledge
transfer can help in making knowledge
available to end users from a doing perspective.
TOWARDS A SOLUTION
The authors seek to analyze the trajectory
towards a solution for the problems listed
at the beginning of this paper. Identifying the
four paradigms of knowledge creation and the
associations between them aids in addressing
the aforementioned challenges [Fig. 2].
Association Paradigm is the
systemization and communication between
groups in an explicit knowledge realm. This
aids in creating explicit knowledge from explicit
knowledge.
Hobnob Paradigm is characterized by
knowledge emanating from individuals who
interact in a group through various modes
of communication. The nature of knowledge
creation in this scenario is tacit-tacit.
Inclusion Paradigm is the percolation
of knowledge from the external source to the
individual in the tacit knowledge realm. This
aids in creation of tacit knowledge from explicit
knowledge.
Peripheral Paradigm is the articulation
of individual knowledge into a group. This
results in creation of explicit knowledge from
tacit knowledge. This is depicted in Figure 2.
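The four paradigms can be read as a lookup from the kind of knowledge consumed to the kind of knowledge created. The following is a minimal sketch of that reading; the Realm enum and the paradigm function are hypothetical names introduced only for this illustration and are not part of the article's model.

```python
from enum import Enum


class Realm(Enum):
    """The two knowledge realms used in the article."""
    TACIT = "tacit"
    EXPLICIT = "explicit"


# (source realm, resulting realm) -> paradigm name, as defined in the text.
PARADIGMS = {
    (Realm.EXPLICIT, Realm.EXPLICIT): "Association",
    (Realm.TACIT, Realm.TACIT): "Hobnob",
    (Realm.EXPLICIT, Realm.TACIT): "Inclusion",
    (Realm.TACIT, Realm.EXPLICIT): "Peripheral",
}


def paradigm(source, target):
    """Return the paradigm that creates `target` knowledge from `source`."""
    return PARADIGMS[(source, target)]
```

For instance, paradigm(Realm.TACIT, Realm.EXPLICIT) returns "Peripheral", matching the description of individual knowledge being articulated into a group.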
Of the four modes of knowledge creation,
conventional KM addresses Association
Paradigm (explicit-explicit knowledge
creation) and Inclusion Paradigm (explicit-tacit
knowledge creation) well. Conventional KM also
attempts to address Peripheral Paradigm
(tacit-explicit knowledge creation) to a certain
extent, though the appropriateness and success
rate are questionable. However, conventional
systems and strategies are woefully inadequate
in addressing Hobnob Paradigm (tacit-tacit
knowledge creation) as a mode of knowledge
creation.
The inadequacy of conventional
systems in addressing the Peripheral Paradigm
and Hobnob Paradigm modes is mainly
due to the constrained perspectives along the
tenets, as listed in this paper. A direction for
evolving solutions to address all four modes
of knowledge creation and management is
therefore proposed here.
1. How shoul d you st ruct ure your
knowledge stores?
External knowledge repositories
are adequate for managing explicit
knowledge. Hence, they should
be relied upon only for explicit
knowledge.
Tacit knowledge cannot be effectively
and effciently managed with external
repositories. Hence, manage such
knowledge with individuals of an
organization, creating a knowledge
cloud.
While knowledge repositories and
knowledge-cloud could collectively
serve as knowledge stores in your
organization, shift the focus towards
individuals as the primary store of
knowledge.
2. What should be your approach for
knowledge transfer?
Knowledge can also be transferred
without codifcation. Use codifcation
only for addressing Association
Paradigm and Inclusion Paradigm
of knowledge creation.
Opt for codification only when
knowledge is fairly stable and is
likely to be reused multiple times, at
least enough to offset the effort that goes
into creating codified knowledge.
[Figure 2: Paradigms of Knowledge Creation. Labels: Tacit Knowledge, Explicit Knowledge, Inclusion Paradigm, Peripheral Paradigm, Association Paradigm, Hobnob Paradigm. Source: Infosys Research]
Use more personal means of
knowledge transfer, like dialogue and
discussion, for addressing the Hobnob
Paradigm and Peripheral Paradigm.
3. How can you ensure that your KM
efforts result in the desired outcome?
Explicit knowledge is often
declarative in nature. Hence, it will
not by itself provide results in terms of
productivity.
Tacit knowledge is likely to be
procedural in nature. Procedural
knowledge, when managed
effectively, provides results in terms
of productivity. Hence, shift your
KM basket towards tacit knowledge
by leveraging the Hobnob Paradigm and
Peripheral Paradigm.
Shift your focus away from
decoupling knowledge provider
and consumer. By actively engaging
provider and consumer, knowledge
from a doing perspective can be
realized effectively.
CONCLUSION
In view of the rising KM adoption rate, it is
important that enterprises evolve breakthrough
strategies to create sustainable competitive
advantage. It is worthwhile challenging the
conventional KM assumptions. What seems
like a critical tenet in conventional KM thinking
may not be so. Contextualization plays an
important role in defining the validity and
applicability of such assumptions.
The time has come to revisit the principles of
conventional KM and figure out whether they
still hold good in the current conditions. An
enterprise seeking to create knowledge-driven
competitive advantage should avoid getting
stuck in the mire of conventional KM thinking.
REFERENCES
1. Mario Benassi, Paolo Bouquet and
Roberta Cuel, Success and Failure
Criteria for Knowledge Management
Systems. Available at http://fandango.
cs.unitn.it/~rcuel/docs/EURAM-2003-
Benassi-Bouquet-Cuel.pdf
2. Heather Creech, A Synopsis of Trends
in Knowledge Management. Available
at http://www.iisd.org/pdf/2006/
networks_km_trends.pdf
3. Ikujiro Nonaka, A Dynamic Theory of
Organizational Knowledge Creation,
Organization Science Journal, Vol 5, No
1, February 1994
4. Matthew Hall, Knowledge Management
and the Limits of Knowledge
Codification, Journal of Knowledge
Management, Vol 10 No 3, December
2004. Available at http://www.abs.aston.ac.uk/newweb/research/publications/docs/RP0438.pdf
SETLabs Briefings
VOL 7 NO 5
2009
Knowledge Engineering For
Customer Service Organizations
By Rakesh Kapur and Venugopal Subbarao
Power your customer service function with a
knowledge-based approach
The communications industry has maintained
a decent upswing for some time now.
Telecom products and services have created
a long lasting impact on our lives in the past
ten years. Newer inventions and technologies
have surfaced, making communication easier
and more affordable across the globe. Cheaper
technology and high demand attracted multiple
communications service providers (CSPs) to the
industry. Cut-throat competition paved the way
for competitive pricing and resulted in a rapid
increase in the customer base. Customer service
centers had to be established to cater to this
growing customer base.
To gain the numero uno spot in customer
acquisition, the CSPs concentrated on inventing
ways and strategies to expand their reach and to
make customers spend on the new products
and services. The CSPs were able to expand
their reach from urban to semi-urban to rural
areas. In this hustle, an important benchmark,
customer assurance, went unnoticed. The
CSPs were not able to meet customer
expectations in terms of network availability
and timely support, which resulted in churn
in the subscriber base. With tariffs reaching a
near saturation point, business viability seemed
possible only by focusing on customer
retention.
While they were trying to put their
houses in order by focusing on customer
experience, newer products had to be invented
and taken to the market. Newer technologies
like VoIP, IPTV, IMS, number portability,
etc., surfaced. The CSPs had to launch newer
products and adopt emerging technologies to
maintain competitive edge.
CSPs now have the challenge of
maintaining an edge over competitors by
launching newer products, adopting new
technologies and focusing on enriching
customer experience.
TRADITIONAL CUSTOMER SERVICE
SYSTEMS
Customer service centers are a major cost to the
service providers. Around 70% of the cost is
attributed to the service assurance operations.
CSPs are maturing rapidly in their understanding of
the need for customer retention and are
constantly on the lookout for innovative ways
to improve and maintain customer satisfaction
levels, reduce customer churn and reduce the cost
of operations.
The key challenge is to provide good
and constant customer service. Bad service can
erode the roots of even the largest of the service
providers.
A mix of offerings from self-care to
an advisor-based support is being developed
to enrich customer experience. This acts as a
differentiator to the service provider and helps
in customer retention and a reduced churn.
The key challenges in the customer service
industry include:
• Expanding the current networks to support the new technologies and deliver constant throughput.
• Convergence leading to increased complexity of service. The complexity of the products increases exponentially with every release of a product and the introduction of any new product.
• Dependency on vendors for bringing in key skilled resources, resulting in higher costs.
• Higher cost of manpower training and incremental updates.
• Complexities in understanding and maintaining the systems.
• More centers to handle the increased volume of calls during new product launches, even while sustaining the current products.
• Providing adequate training to advisory staff in a short interval so that they are productive from day one. Getting the right talent pool for network operations and customer service is a challenge.
• Scalability issues with respect to supporting a growing business.
• Asymmetrical way of issue resolution, as the traditional troubleshooting systems available today operate in silos. They provide certain ways of troubleshooting but do not give the advisor a comprehensive view for troubleshooting an issue. A lot is left to the advisor's knowledge. This results in a very asymmetrical way of resolving an issue.
• Use of primitive troubleshooting mechanisms, whereby customer service advisors are equipped with some basic tools for troubleshooting, or reference materials are kept in a shared location.
• Difficulties in adapting to next generation technologies. Resistance to adopting new technologies and the learning curve are challenges that need to be overcome. This makes it difficult for service providers to ramp up and adapt to next generation services like VoIP, IMS, etc., in a short period of time.
• Human nature of taking shortcuts and employing mechanisms to bypass processes, which compounds the delay in resolution. In contrast, if a chronological mechanism or process is followed, the issue gets resolved in a shorter timeframe.
In a highly dynamic customer service
center environment where the front line advisor
is in an interactive mode with the customer,
there is a dire need for the advisor to resolve
the customer issue at the first go and in the
shortest timeframe possible while improving
customer satisfaction. It is common
to encounter different answers from
the customer service advisors every time one
calls about the same issue. This holds good even
when the calls get escalated. The supervisors
handle issues differently from
the front line advisors. Many an irate customer
would be hard to handle should they start
drilling the advisor on the information supplied
by an earlier advisor. These challenges lead
to customer dissatisfaction and also add to
the costs of the operations center as calls get
escalated to various tier levels.
CIRCUMVENTING THE CHALLENGES
The challenge today is to capture the knowledge
(skills and experiences) of an expert and make
it available to all (the advisors). Knowledge
management research is trying to address this
problem of formal capture of expert knowledge
and making it available to all.
A common realization taking shape is
the need for a knowledge management system
that can cater to the ambit of products being
offered by the CSPs. Such a knowledge based
support system is built by leveraging knowledge
engineering concepts. The system should
cater to the needs of the customer service
advisors and should also aid self care.
The challenge of managing costs and
maintaining products and services calls for
a holistic knowledge management solution.
The solution should be able to (i) bridge the
gap between training and operations and help
collate the knowledge spread across multiple
systems; (ii) provide different techniques for
capturing knowledge and mine various data
sources to provide a unified information system;
(iii) provide a scalable knowledge base and
leverage the SME knowledge as required by
making the system people independent; (iv)
provide a customizable process for diagnosis
that improves CSR efficiency, leading to faster
resolution of customer issues; (v) provide a
probable solution based on the
network inputs, improve customer
satisfaction levels and overcome challenges
of AHT, FCR (RFT) and quality; (vi) capture
expert knowledge and make it available to
all, thus reducing the impact of employee
churn; (vii) provide speedy service by
capturing knowledge of new products and
offerings into the knowledge-based system;
(viii) scale with an ever increasing number
of users and a growing knowledge database, while
remaining stable and performing with
minimal latency.
Such a holistic solution brings with it
numerous advantages, such as (a) aiding in quicker
decision making; (b) exploiting enterprise
knowledge/information assets and enabling
knowledge arrest; (c) addressing the knowledge
management need in a comprehensive manner;
and (d) providing analytics over the historical
data using information management techniques
of knowledge engineering.
SBCS AND KNOWLEDGE ENGINEERING:
A SOLUTION IN SIGHT
Scenario based customer service solution
(SBCS) aims to transform customer service
operations by introducing harmonized
generic processes for proactive, reactive and
predictive service assurance. Utilizing pre-
built scenarios and leveraging continuous
learning mechanisms, SBCS equips the CSR
with enriched information to improve first call
resolution and significantly improve customer
experience.
The capabilities offered by SBCS will
help transform a customer service centric
organization to a knowledge based organization.
SBCS Methodology
Based on knowledge engineering principles,
SBCS provides a structured methodology for
knowledge base development and enables
a standardized process to diagnose service
issues.
This structured process helps in
streamlining:
• the diagnosis process, helping the advisor resolve the issues reported
• consistency in engaging customers while diagnosing the issues
• knowledge management and knowledge capture from various stakeholders in the CSR organization.
To address the service provider
challenges, various knowledge engineering
features form a part of the SBCS solution; these
are listed in Table 1.
SBCS enables the service
provider to collate information residing in
various portals and shared folders and build
a common knowledge database that can be
presented in a structured form to the advisor
to help resolve issues.
SBCS helps service provider
organizations to move towards an aspirational
model where front-end processes and teams
do not require substantial product and domain
knowledge. The information model equips the
front-end teams to deal with the service requests
effectively and hand over the request to a highly
skilled second level team only in a minority
of cases.
SBCS provides a workbench for
authoring and modifying the knowledge base.
The workbench provides a user friendly approach
for SMEs to add new knowledge and modify the
existing knowledge base. The SBCS architecture
is shown in Figure 1.
Table 1: SBCS Components and Features (Source: Infosys Research)

SBCS Component: Knowledge Capturing
Features: Knowledge capturing helps to quickly develop the required knowledge base from experts.
• Use of diagnosis tree for both detailed diagnosis and pre-classification.
• Template based knowledge capturing.
• Pre-classification knowledge capturing based on probability approach.
• Knowledge authoring workflow.

SBCS Component: Knowledge Retrieval
Features: Enables faster adoption of knowledge base by CSRs; enables faster ramp-up and scale-up of CSRs for new products and increase in customer base.
• SBCS search portal.
• Diagnostics assistance.
• Pre-classification based on probabilities.
• Pre-classification based on structured questions.
• Diagnostics work-flow.
• Role based knowledge retrieval.

SBCS Component: Knowledge Management
Features: Enables change management.
• Metrics and reports.

SBCS Component: Import and Export
Features: Helps in reusing the existing knowledge base.
• Import and export of template based knowledge base.
SBCS DEPLOYMENT APPROACH IN A
PILOT ENVIRONMENT
Developing a knowledge system that contains
solutions for all issues can be a challenge
and is likely to take a long time. SBCS adopts
an incremental approach to knowledge system
development. The typical process of knowledge system
development in SBCS is represented in Figure 2.
As with any business transformation
initiative, SBCS roll-outs also face
some nagging challenges. For example:
People: Any CSR organization comprises a
large number of people. Any initiative in such an
organization requires addressing the people
dynamics, such as:
• Getting buy-in from operations, quality, process and training
• Identifying knowledge champions and getting contributions from them
• Influencing the advisors to adopt new systems.
Process: Process changes and the ability to seamlessly
integrate the new system into the existing process
are among the key challenges. Capturing base
metrics and process change metrics to measure
the overall impact of the process change is very
important in such initiatives.
[Figure 1: SBCS Architecture. Blocks shown: Scenario Modeling Workbench (Desktop Workbench, Web Workbench), KM Process, Knowledge Management, SBCS Knowledge Base, SBCS Engines, Presentation Layer, User Management and Integration Layer. Source: Infosys Research]
[Figure 2: Typical Knowledge System Development Process. Steps shown: analysis to identify top 20% issues; identify the knowledge assets; define knowledge template structure; extract knowledge / model knowledge; review and publish the knowledge; train CSRs to use SBCS knowledge system; roll-out the system and measure metrics; define metrics and measurement for knowledge usage and addition. Source: Infosys Research]
Technology: Telecommunication (Telco)
organizations have evolved over time.
Any new system introduction needs
integration with legacy systems. In
CSR organizations such as Telco CSRs, the ability of
CSRs to understand diverse Telco services and
to scale up to new converged solutions
is also important.
SBCS adopts a holistic approach to
address the above challenges. The SBCS model based
approach enables experts to articulate their
tacit knowledge. The SBCS scenario development
approach is based on a knowledge engineering
approach that helps in analyzing the current
system with respect to process workflows,
organization structure and collaboration
between systems and people.
Knowledge Capturing: The SBCS knowledge
development process identifies the knowledge
intensive activities in the service assurance
process, like diagnosis, classification, etc. SBCS
provides specific templates to capture
diagnosis and classification knowledge from
various sources including experts. This
enables effective capturing of knowledge.
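To make the idea of a diagnosis-tree template concrete, the following is a minimal sketch in Python. It is not the SBCS implementation: the class name `DiagnosisNode`, the function `diagnose` and the broadband questions in `EXAMPLE_TREE` are all hypothetical, chosen only to show how captured diagnosis knowledge could be stored as a tree and walked using a customer's answers.

```python
from dataclasses import dataclass, field


@dataclass
class DiagnosisNode:
    """One step of a diagnosis tree: a question, or a resolution at a leaf."""
    text: str
    branches: dict = field(default_factory=dict)  # answer -> next DiagnosisNode

    def is_leaf(self):
        return not self.branches


def diagnose(root, answers):
    """Walk the tree with the given answers; return the path taken and the outcome."""
    node, path = root, []
    for answer in answers:
        if node.is_leaf():
            break
        path.append((node.text, answer))
        node = node.branches[answer]
    return path, node.text


# Hypothetical template content, as an SME might author it in a workbench.
EXAMPLE_TREE = DiagnosisNode(
    "Is the modem power light on?",
    {
        "no": DiagnosisNode("Check the power supply"),
        "yes": DiagnosisNode(
            "Is the link light on?",
            {
                "no": DiagnosisNode("Escalate to the network team"),
                "yes": DiagnosisNode("Restart the router"),
            },
        ),
    },
)
```

Because every advisor follows the same tree, two advisors given the same answers reach the same outcome, which is the consistency the article argues for.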
Change Management: SBCS knowledge
management functionalities like usage metrics
capturing, knowledge authoring metrics
capturing, etc., help in implementing a risk-reward
mechanism, which in turn enables effective
implementation of KM practices.
CONCLUSION
Knowledge engineering systems are being used
in various forms worldwide for a variety of
products and services spanning across various
technologies. These vary from simple
self care solutions, like Help functions for a
tool, to complex solutions for VoIP and IMS
services. The challenges in providing better
customer service have increased with the launch
of new products. Various service providers delay
their go-to-market due to the time it takes to hire,
train and ramp up the customer service advisors.
The complexity that new technologies bring
has compounded this delay, as the level of
understanding of the issues is more theoretical
and it is difficult to comprehend the practicality
of the issues.
So, there is a definite need for knowledge
management systems that provide the capability
to the service provider to add, update and enhance
the knowledge base as and when desired and help
move away from people dependent operations to
a knowledge management based solution.
Support to Decision Making:
An Approach
By Sujatha R Upadhyaya PhD, Swaminathan Natarajan and Komal Kachru
Leverage the power of Bayesian networks to make
decisions under uncertain situations
Knowledge engineering techniques such
as neural networks, support vector
machines, topic maps, semantic networks,
Bayesian networks (BN) and ontologies have
grown in popularity in the past few years.
Some of these models, such as ontologies
and semantic networks, are static: domain
knowledge is stored and retrieved
in a certain fashion. Although a certain
amount of logic is involved in
the way queries are answered in these
models, they typically do not involve
dynamic computations. On the other hand,
neural networks (NNs), support vector
machines (SVMs), BNs and fuzzy models
(FMs) involve computations and so are known as
computational models or dynamic models. In
most of these models computations remain
a black box to the user and knowledge
representation itself is not very elaborate.
Unlike other computational models, BNs make
it possible to trace computations. BNs
also provide a systematic approach for
representing domain knowledge. For
these reasons BN has become one of the
most used and intuitive techniques for
representing domain knowledge. It can
be effectively employed in contexts where
one has to observe changes in different
parameters of interest within the domain
vis-à-vis the changes in other parameters of
the domain.
The literature is inundated with research
on BNs and their applicability to various fields
of study, from medicine to meteorology
[1]. BN has proven to be one of the most
efficient techniques where one has to make
decisions under uncertainty. Some of the typical
applications where BN has been effectively
used are clinical decision support, genetic
models, crime risk factor analysis, spatial
science, forensic science, information retrieval,
reliability analysis, terrorism risk management,
weather forecasting and credit rating of
companies. An attempt has been made in this
paper to explicate how BN can be employed to
make decisions under conditions of uncertainty.
METHODOLOGY OF CONSTRUCTION
AND COMPUTATION
BN employs a probabilistic and graphical
method for knowledge representation. Given
a set of variables in a domain, a directed
acyclic graph that shows the dependencies
among them forms the basic structure
for building a model in BN. In such a graph,
each variable appears as a node and a
dependency between two variables is
indicated by the presence of an edge between
them. More than one state can be assigned to
each node. For instance, for the node 'Severe
Headache' in the network shown in Figure
1, one could assign two states, 'present' and
'absent'. Each node is also associated
with a conditional probability table (CPT) that
indicates the probability of that node being
in a certain state, given the states of its parent
nodes. In Figure 1, the CPTs for all the nodes in
the coma network are shown. The node 'Coma'
has two parents, 'Brain Tumor' and 'Increased
Serum Calcium'. The node 'Coma' can be
in two states, 'present' or 'absent', indicating
the presence or absence of coma in a certain case.
The parent nodes are also associated with two
states, 'present' or 'absent'. The CPT of the node
'Coma' represents the probability of coma being
absent and present for all possible combinations
of the parent node states, as shown in
Figure 1.
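As an illustration of what a CPT looks like as a data structure, the sketch below stores the CPT of the node 'Coma' as a Python dictionary keyed by the states of its two parents. The probability values are the ones legible in Figure 1 and should be treated as an assumption about the figure; the names `COMA_CPT`, `coma_distribution` and `cpt_is_complete` are hypothetical.

```python
from itertools import product

STATES = ("present", "absent")

# CPT of the node 'Coma', keyed by (Increased Serum Calcium, Brain Tumor).
# Each value is P(Coma = present | parents), as read from Figure 1 (assumed).
COMA_CPT = {
    ("present", "present"): 0.8,
    ("present", "absent"): 0.8,
    ("absent", "present"): 0.8,
    ("absent", "absent"): 0.05,
}


def coma_distribution(serum_calcium, brain_tumor):
    """Return the distribution over Coma for one combination of parent states."""
    p_present = COMA_CPT[(serum_calcium, brain_tumor)]
    return {"present": p_present, "absent": 1.0 - p_present}


def cpt_is_complete(cpt, n_parents):
    """A CPT needs exactly one entry per combination of parent states."""
    return set(cpt) == set(product(STATES, repeat=n_parents))
```

With two-state nodes, a node with k parents needs 2^k such entries, which is why hand-filling a CPT becomes impractical as the number of parents grows.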
The domain expert plays an important
role in building the domain model in BN. For
instance, consider the BN given in Figure 1 that
represents a confined domain in medicine. It
shows different symptom nodes associated
with metastatic cancer and brain tumor. The
responsibility of the domain expert would be
to choose the variables of interest and indicate
the dependencies between different variables
and also the different states associated with the
nodes. To keep the case simple, only two states,
namely 'present' and 'absent', are associated
with each of the nodes. Typically, if the network
is small and the states associated are not too
many, the expert herself can assign values in the
CPT of each node. That is how the alternative
name 'Belief Network' came into use, indicating that
the expert's belief is given significant importance
in this method.

[Figure 1: Coma Network: An Example of Bayesian Network. Source: www.genie.sis.pitt.edu
Nodes: Metastatic Cancer, Increased Serum Calcium, Brain Tumor, Coma, Severe Headache.
CPT values legible in the figure:
P(Increased Serum Calcium = Present | Metastatic Cancer = Present / Absent) = 0.8 / 0.2
P(Brain Tumor = Present | Metastatic Cancer = Present / Absent) = 0.2 / 0.05
P(Coma = Present | Increased Serum Calcium, Brain Tumor) = 0.8 (Present, Present), 0.8 (Present, Absent), 0.8 (Absent, Present), 0.05 (Absent, Absent)
One further two-state table reads Present 0.8, Absent 0.2.]
Although the procedure described above
is the ideal method of building the domain
model in BN, in a practical scenario it is not
always possible for a domain expert to assign
each value in the CPT, especially in cases
where the CPT gets too large because of
too many parents and associated
states. In the literature there are quite a few
parameter learning methods, such as maximum
likelihood estimation (MLE), expectation
maximization (EM) and entropy maximization,
that can learn these
probabilities from data. The data has to be in
discrete form, indicating the states associated
with the nodes, as shown in Table 1.
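The simplest form of parameter learning, maximum likelihood estimation by relative frequency, can be sketched as follows. The rows are the six records of the training fragment in Table 1 (P = present, A = absent); the column names and the function `learn_cpt` are hypothetical, and a practical system would use far more data and usually some smoothing, which this sketch omits.

```python
from collections import Counter

COLUMNS = ("metastatic_cancer", "brain_tumor", "serum_calcium",
           "coma", "severe_headache")

# The training fragment of Table 1, one tuple per record.
ROWS = [
    ("A", "A", "A", "A", "A"),
    ("A", "A", "P", "P", "P"),
    ("A", "P", "A", "P", "P"),
    ("P", "A", "P", "A", "P"),
    ("P", "A", "A", "A", "P"),
    ("P", "P", "P", "P", "P"),
]


def learn_cpt(rows, child, parents):
    """Estimate P(child = 'P' | parents) by relative frequency (MLE)."""
    index = {name: i for i, name in enumerate(COLUMNS)}
    totals, hits = Counter(), Counter()
    for row in rows:
        key = tuple(row[index[p]] for p in parents)
        totals[key] += 1
        if row[index[child]] == "P":
            hits[key] += 1
    # Parent combinations never seen in the data get no entry at all.
    return {key: hits[key] / totals[key] for key in totals}


coma_cpt = learn_cpt(ROWS, "coma", ("serum_calcium", "brain_tumor"))
```

On this tiny fragment the estimate for serum calcium present and brain tumor absent is 0.5 (one coma case out of two records), which shows how crude the estimates are when only a handful of records support each parent combination.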
Similarly, there are quite a few algorithms
that can learn the structure of the network (the
way edges are connected with the nodes).
However, the accuracy of these algorithms has
not been proven yet. In practical situations, a
network arrived at through discussions among a
team of experts would work very well. At best,
these algorithms can be used to give an initial
structure of the network in contexts involving
too many parameters. The structure so obtained
has to be further refined by a team of experts.
Once the complete structure of the BN is
ready, the way to utilize the model is to
make inferences. In this context, inference is an
operation in which the values of some subset
of attributes are known and, given this information,
the BN is used to estimate the probability
distribution of one or more of the remaining
attributes. Typically, inference answers
questions such as, 'Given that Metastatic
Cancer is present and Brain Tumor is absent,
what is the probability that the patient would
go into a coma?'
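The example question can be answered with the brute force method: sum the joint probability over every unobserved variable and normalize. The sketch below assumes the usual structure of the coma network (Metastatic Cancer is the parent of Increased Serum Calcium and Brain Tumor; Brain Tumor is the parent of Severe Headache) and uses the CPT values legible in Figure 1. The prior for Metastatic Cancer and the Severe Headache table are placeholder assumptions; they do not affect this particular query.

```python
from itertools import product

P, A = "present", "absent"
NODES = ("mc", "isc", "bt", "coma", "sh")  # topological order
PARENTS = {"mc": (), "isc": ("mc",), "bt": ("mc",),
           "coma": ("isc", "bt"), "sh": ("bt",)}

# P(node = present | parent states).
CPT = {
    "mc": {(): 0.2},                    # assumed prior
    "isc": {(P,): 0.8, (A,): 0.2},      # read from Figure 1
    "bt": {(P,): 0.2, (A,): 0.05},      # read from Figure 1
    "coma": {(P, P): 0.8, (P, A): 0.8, (A, P): 0.8, (A, A): 0.05},
    "sh": {(P,): 0.8, (A,): 0.6},       # assumed
}


def joint(assignment):
    """Probability of one complete assignment: product of CPT entries."""
    prob = 1.0
    for node in NODES:
        key = tuple(assignment[p] for p in PARENTS[node])
        p_present = CPT[node][key]
        prob *= p_present if assignment[node] == P else 1.0 - p_present
    return prob


def query(target, evidence):
    """Posterior of `target` given `evidence`, by enumerating hidden nodes."""
    hidden = [n for n in NODES if n != target and n not in evidence]
    scores = {}
    for value in (P, A):
        total = 0.0
        for combo in product((P, A), repeat=len(hidden)):
            assignment = {**evidence, target: value, **dict(zip(hidden, combo))}
            total += joint(assignment)
        scores[value] = total
    norm = sum(scores.values())
    return {value: score / norm for value, score in scores.items()}


posterior = query("coma", {"mc": P, "bt": A})
```

Under these values the answer is P(Coma = present) = 0.8 x 0.8 + 0.2 x 0.05 = 0.65. Enumeration is exponential in the number of hidden nodes, which is what motivates the more efficient algorithms.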
The algorithms typically used for
making inference with BNs are of two types:
approximate inference algorithms and
exact inference algorithms. Approximate
inference algorithms, which reduce the complexity
of the inference procedure, include the likelihood
weighting algorithm, the Gibbs sampling method
and loopy belief propagation. The junction tree
algorithm, the variable elimination algorithm, the brute
force method and Pearl's algorithm are some of the
exact inference algorithms.
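As a contrast to exact inference, the following is a minimal sketch of likelihood weighting on a four-node part of the coma network. Evidence nodes are fixed to their observed states and contribute a weight; all other nodes are sampled in topological order. The structure and the Metastatic Cancer prior are assumptions, as are the names used; the remaining values are those legible in Figure 1.

```python
import random

P, A = "present", "absent"
ORDER = ("mc", "isc", "bt", "coma")  # topological order
PARENTS = {"mc": (), "isc": ("mc",), "bt": ("mc",), "coma": ("isc", "bt")}

# P(node = present | parent states).
CPT = {
    "mc": {(): 0.2},  # assumed prior
    "isc": {(P,): 0.8, (A,): 0.2},
    "bt": {(P,): 0.2, (A,): 0.05},
    "coma": {(P, P): 0.8, (P, A): 0.8, (A, P): 0.8, (A, A): 0.05},
}


def likelihood_weighting(target, evidence, n_samples, seed=0):
    """Estimate the posterior of `target` from weighted samples."""
    rng = random.Random(seed)
    weights = {P: 0.0, A: 0.0}
    for _ in range(n_samples):
        sample, weight = dict(evidence), 1.0
        for node in ORDER:
            key = tuple(sample[p] for p in PARENTS[node])
            p_present = CPT[node][key]
            if node in evidence:
                # Evidence is not sampled; it scales the sample's weight.
                weight *= p_present if evidence[node] == P else 1.0 - p_present
            else:
                sample[node] = P if rng.random() < p_present else A
        weights[sample[target]] += weight
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


estimate = likelihood_weighting("coma", {"mc": P, "bt": A}, n_samples=20000)
```

The estimate approaches the exact value of 0.65 as the number of samples grows; the trade-off is sampling noise in exchange for avoiding the exponential enumeration.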
Table 1: Fragment of Training Sample (Source: Infosys Research)

Metastatic Cancer   Brain Tumor   Increased Serum Calcium   Coma      Severe Headaches
Absent              Absent        Absent                    Absent    Absent
Absent              Absent        Present                   Present   Present
Absent              Present       Absent                    Present   Present
Present             Absent        Present                   Absent    Present
Present             Absent        Absent                    Absent    Present
Present             Present       Present                   Present   Present

CASE STUDY
The very purpose of building case studies is to
learn the best contexts for employing a certain
technique as a real time application. In this
case too, the primary interest was to explore
the feasibility of using BNs in a particular
context. BN was employed to study the effect
on the performance parameters of different
companies within a particular industry sector in
the context of changing industry and economic
scenarios. A team of domain experts identified
the parameters of interest in the domain. These
parameters were grouped into three categories,
namely macroeconomic factors, industry
specific factors and company specific factors.
The dependencies among the parameters were
specified. With this, a basic BN structure was
built and was later improved upon by the team.
In the final model of the BN, 29 variables of
interest represented all three categories.
The network consisted of macroeconomic
indices such as GDP, oil price, inflation, etc. It
also had quite a few industry specific factors.
Industry specific factors depend on the
particular industry that one intends to explore.
For instance, in the real estate sector one would
consider cement price, commercial property
index, steel price, etc., as some of the industry
specific factors. It also had company specific
variables depicting the health of the company,
such as sales, revenue generated and net
income. A fragment of the BN is shown in
Figure 2. This figure shows a general BN
that is not specific to any industry. It is a
representation of how macroeconomic indices,
industry specific factors and company specific
factors appear in the BN model.
After the structure of the network is
drawn, the next phase is learning the probability
parameters through parameter learning.
However, since the data for parameter learning
is required to be discrete and usually the data
available in this domain is in continuous form,
a very crucial phase of data preparation had to
be undertaken.
Data Preparation: Data preparation has to be
done very carefully to ensure accurate results.
All the parameters of interest in this domain
appear as continuous numbers, whereas BN
works on discrete data. Therefore, the primary
objective here is to convert this continuous data
into discrete data. However, one must take care
not to lose the trends present in the original
data. The discretization or bucketization
procedure involves categorizing continuous
data into a specified number of discrete values.
The most common approach is to discretize the
data into n buckets, each bucket specifying the
range of values it may take. The drawback of the
bucketization approach lies in the difficulty
of interpreting the buckets while considering
the same variable for different companies.
For example, the sales figures of different
companies could take different ranges. It is
difficult to maintain uniformity in bucketizing
using this methodology. Another approach, which
achieves a more meaningful discretization while
still maintaining the trend or the information
contained, is to adopt a growth-decline approach.
Financial indexes or numbers are
usually compared as a percentage change from
either the previous quarter or the previous year. For
example, the net profit achieved by a company is
always compared with the net profit figure of the
last quarter. Applying the same approach, one
can discretize or categorize the figures into
up (growth) or down (decline) buckets. This
way all the variables are categorized into 'up',
'down' or 'no change' states, with 'up', for example,
signifying that the net profit has increased
compared to the previous quarter. Inferring the
results from such discretization also becomes
easy. For example, the analyst may observe
that while an increase in the price of variable
1 impacts the net profit figure of company A,
the impact on company B may not be much.
This could be because of the internal operations
of company B.

[Figure 2: Representational Bayesian Network for the Case Study. Nodes shown: GDP, Inflation, Oil Price, Industry Factor 1, Industry Factor 2, Industry Factor 6, Sales, Net Income, Operating Costs. Source: Infosys Research]
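The growth-decline approach can be sketched in a few lines of Python. The function name `growth_decline_labels` and the `tolerance` parameter (the band of percentage change treated as 'no change') are assumptions for illustration; the case study does not state how 'no change' was delimited. The sketch also assumes no period has a value of zero.

```python
def growth_decline_labels(values, tolerance=0.0):
    """Label each period by its percentage change from the previous period."""
    labels = []
    for previous, current in zip(values, values[1:]):
        change = (current - previous) / abs(previous)  # assumes previous != 0
        if change > tolerance:
            labels.append("up")
        elif change < -tolerance:
            labels.append("down")
        else:
            labels.append("no change")
    return labels


# Two hypothetical companies whose sales differ in scale by a factor of 100.
small_company = growth_decline_labels([10.0, 11.0, 11.0, 9.9], tolerance=0.01)
large_company = growth_decline_labels([1000.0, 1100.0, 1100.0, 990.0], tolerance=0.01)
```

Both series yield the same labels, which illustrates why this scheme is easier to compare across companies than fixed value ranges.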
Similarly, data corresponding to each
parameter has to be discretized. The exact
number of buckets and the labels for the buckets
(present, absent; high, low, medium; quick,
slow) are to be decided by the domain expert.
She may also suggest the method adopted
for bucketization. Other than choosing an
appropriate method for bucketizing the data, one
has to take care of the following problems that
may very often appear with the collected data.
Missing Data: In the finance domain, data for all
variables is not available at the same interval.
Some variables are available on a weekly basis,
some on a daily basis and some others on a
quarterly basis. This in turn reduces the number
of data points where data is available for all
variables. Including data points where the data
is available only for some variables results in
gaps in the data, leading to the missing data problem.
Interpolation and regression techniques
were used in this case study to handle the
problem of missing data. Interpolation
is useful for estimating values
between measured data points. It
is used in situations where the data points are
described by a complex function or by no function
at all. Regression is used to estimate the missing
points through approximation of the data.
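A minimal sketch of filling gaps by linear interpolation is shown below. The case study does not say which interpolation scheme was used, so linear interpolation is an assumption, as is the convention of marking missing points with `None`. Gaps before the first or after the last known point are left unfilled.

```python
def interpolate_missing(series):
    """Fill None gaps that lie between two known points by linear interpolation."""
    filled = list(series)
    known = [i for i, value in enumerate(filled) if value is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# A quarterly figure aligned to a finer grid, with the gaps marked as None.
aligned = interpolate_missing([10.0, None, None, 16.0, None])
```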
Presence of Unwanted Trends: At times, the
data itself might exhibit some undesirable
trends. For instance, the change in the value of money due
to inflation may appear as an undesirable
trend in profit or revenue data. Unless this
trend is removed, in other words, unless the
data values are normalized, it might result
in wrong bucketization of data. For instance,
sales revenue of $50m would have been considered
'High' ten years back, but the same should not
be considered 'High' years later.
In order to nullify the effect of inflation on the
value of money, two methods of normalization,
namely Box-Cox transformation and ranking,
were used in this case study. The normalized
data was then subjected to bucketization.
Grouping the data values into 3, 5 or 7 buckets
was done by applying the control limits.
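The two steps can be sketched as follows. The Box-Cox transformation is the standard one: (x^lambda - 1) / lambda for lambda not equal to 0, and ln x for lambda equal to 0, defined for positive values only. How the control limits were set is not stated in the case study; the sketch assumes limits at the mean plus or minus k standard deviations and shows only the 3-bucket case. The function names and the choice of lambda and k are hypothetical.

```python
import math
import statistics


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def three_buckets(values, k=1.0):
    """Bucket values using assumed control limits at mean +/- k * sigma."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)
    lower, upper = mean - k * sigma, mean + k * sigma
    return ["low" if v < lower else "high" if v > upper else "medium"
            for v in values]


transformed = box_cox([4.0, 9.0, 16.0], lam=0.5)
buckets = three_buckets([1.0, 2.0, 3.0, 4.0, 100.0])
```

In the second example only the outlying value falls outside the limits, so it alone is labeled 'high'.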
MAKING INFERENCE USING THE BN
MODEL
This case study was taken up to evaluate
how usage of BN can help financial analysts.
The objective of this case study was to help
the financial analysts estimate the possible
outcomes corresponding to the performance
of different companies in changing economic
scenarios. While using the BN for making
inference, the variables corresponding to the
company specific factors refer to the company
being analyzed. Hence, while learning the
probabilities from data (parameter learning)
to populate the Conditional Probability Tables
(CPTs), data for the company specific factors
must correspond to the company intended to
be analyzed.
Once the probabilities are learnt, the model for the specific company is stored. Similar models were created for each of the companies to be studied. The network structure remains the same but the probability values differ according to the company specific data. Unless it is decided to re-learn the probabilities from data of a different period, one can use the same model to conduct studies on the specific company. The built model was subjected to rigorous testing to find out whether the influences suggested by the model in specific situations were intuitive and agreed with the assessment of the analysts. The behavior of every other parameter was observed.
Typically only a portion of the available data is used for learning and the rest is used for verifying the results of inferences made by the system. Figure 3 shows the results of the inference made. It compares two companies, A (blue) and B (red), with respect to two parameters, sales revenue and consumer confidence. Here, both sales revenue and consumer confidence data were discretized into 5 buckets: very down, down, no change, up and very up. One can see that the probability of sales revenue going up is nearly 90% for company A, and there is a 75% chance that company B's revenue will fall into the very up category. However, there is a 100% chance that consumer confidence would go up in the case of company A, and the same could be said of company B with nearly 65% probability.
To aid the analysts in making sensible estimates, two features were provided in the tool:
Probability of Evidence: This allows the analysts to estimate how probable the evidence they are supplying for a particular case is.
Confidence Measure: This feature gives the confidence with which a particular inference is being made. A sensitivity analysis approach was used to estimate the confidence levels with respect to the inferences.
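To make the probability of evidence concrete, here is a small sketch on a hypothetical two-node network in which consumer confidence influences sales revenue. The structure, the state names and every probability below are invented for illustration and are not taken from the case study.

```python
from typing import Dict

# Hypothetical network: ConsumerConfidence -> SalesRevenue.
# prior[c] is P(ConsumerConfidence = c).
prior: Dict[str, float] = {"up": 0.6, "down": 0.4}
# cpt[c][s] is P(SalesRevenue = s | ConsumerConfidence = c).
cpt: Dict[str, Dict[str, float]] = {
    "up": {"up": 0.8, "down": 0.2},
    "down": {"up": 0.3, "down": 0.7},
}


def probability_of_evidence(sales_state: str) -> float:
    """P(SalesRevenue = sales_state), summing out the parent node."""
    return sum(prior[c] * cpt[c][sales_state] for c in prior)


def posterior_confidence(sales_state: str) -> Dict[str, float]:
    """P(ConsumerConfidence | SalesRevenue = sales_state) by Bayes' rule."""
    p_evidence = probability_of_evidence(sales_state)
    return {c: prior[c] * cpt[c][sales_state] / p_evidence for c in prior}


p_sales_up = probability_of_evidence("up")
belief = posterior_confidence("up")
```

A low probability of evidence tells the analyst that the scenario being supplied is itself unlikely under the learnt model, which is useful context when reading the resulting inference.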
KEY FINDINGS
It was possible to study different kinds of variables, such as macroeconomic indices, industry specific indices and company specific factors, on a single platform; these are typically analyzed separately. The case study also revealed that this method was extremely useful for estimating the value of different parameters in the light of new incoming information.
[Figure 3: Comparison of Results for Two Companies. Two panels, Sales Revenue (11) and Consumer Confidence (22); x-axis buckets from Very Down to Very Up; y-axis from 0.0 to 1.0. Source: Infosys Research]
The solution has a feature that allows quantifiable information to be combined with qualitative information such as an analyst's belief. With this solution it is possible for a domain expert to edit the network to introduce a new node and edit the probability tables of the nodes that affect, or are affected by, the new node.
The analysts reported a significant saving in time compared to that required for modeling economy, industry and company scenarios across companies in an industry using traditional modeling techniques. A task that would typically take days or even weeks is now reduced to a few hours with this application.
With the use of the solution, analysts were able to easily make comparisons across companies within an industry or even across industries. One can even compare across scenarios. The tool makes it possible to quickly obtain probability distributions once new evidence is added. These probabilities can be used directly by analysts or as inputs to models that require probability based inputs.
LESSONS LEARNT
One must have sufficient data as a training sample that represents all possible situations well. Interpolating between data points to make the training sample larger does not help. The absence of a particular situation from the training sample results in incorrect inference for that situation.
Data preparation has great impact in the finance domain. Given that most of the available data in this domain is in continuous form, sufficient thought must be applied before deciding the buckets for discretization. For instance, in the case study the data was discretized into 3 buckets: up, down and no change. To do this one needs to consider differential data. In other words, one needs to record whether each data point is an improvement or a decline compared to the previous data point. This helps answer a question such as whether the probability of a particular company's profits going up is high or not, but it does not suggest how much they are likely to go up. This can be addressed by considering more buckets within up, such as up by 0-30%, up by 30-60% and up by 60-100%. When there is a need to compare the performance of two companies, the system could suggest that it is highly likely that the performance of both companies would go up in the range 0-30%. This could well mean that the performance of company A is likely to go up by 10-12% and that of B by 20-25%, which is not explicit in the inference. The solution would be to further divide the buckets. However, this in
turn would add to the complexity of Bayesian
inference and there might be a need to revise
this in future. Therefore, deciding the number of
buckets and choosing the method of bucketing
should be based on the kind of decision making
process one would like to support.
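The discretization of differential data described above can be sketched as follows. The function names, the 1% tolerance for "no change" and the catch-all bucket above 100% are assumptions made for the example; the bucket boundaries 0-30%, 30-60% and 60-100% come from the text.

```python
from typing import List


def percent_changes(series: List[float]) -> List[float]:
    """Differential data: percentage change of each point over the previous one."""
    return [(curr - prev) * 100 / prev for prev, curr in zip(series, series[1:])]


def coarse_bucket(change: float, tolerance: float = 1.0) -> str:
    """Three buckets: up, down, no change (within +/- tolerance percent)."""
    if abs(change) <= tolerance:
        return "no change"
    return "up" if change > 0 else "down"


def fine_bucket(change: float, tolerance: float = 1.0) -> str:
    """Same as coarse_bucket, but the 'up' bucket is divided further."""
    label = coarse_bucket(change, tolerance)
    if label != "up":
        return label
    if change <= 30:
        return "up by 0-30%"
    if change <= 60:
        return "up by 30-60%"
    if change <= 100:
        return "up by 60-100%"
    return "up by more than 100%"  # assumption: not a bucket named in the text


profits = [100.0, 110.0, 110.0, 88.0, 132.0]  # illustrative values
changes = percent_changes(profits)
coarse = [coarse_bucket(c) for c in changes]
fine = [fine_bucket(c) for c in changes]
```

Each additional bucket adds a state to the node and therefore rows or columns to the probability tables that involve it, which is the complexity trade-off noted above.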
Inferences made using BN suggest only a range (or a discrete value), not an exact value. Considering that such tasks are usually carried out with regression analysis, which gives an exact number for a query, BN results will have to be worked upon further to meet the expectations of financial analysts, who are more comfortable with numbers than with discrete information.
CONCLUSION
BN is a powerful method for knowledge representation and inference that supports decision making under uncertainty. The experience of employing a decision making tool empowered with Bayesian inference indicates that the data preparation stage is of great significance in the finance domain. One has to take great care in converting continuous data into discrete data so that trends in the data are not lost. Keeping in mind the objectives of the analysis and bucketizing the data accordingly would go a long way in making useful inferences. According to financial data analysts, the tool saves a lot of expert time in modeling and analysis of data and gives a lot of flexibility in viewing and comparing analysis results. The tool brings great flexibility to data analysis as it facilitates building analysis models that accommodate variables of different nature, re-building a model under a changed premise and accommodating incoming information. A forecasting module that uses the results of inference and projects a number associated with the inference would further strengthen the usefulness of this tool as a decision making tool.
REFERENCES
1. D Heckerman, A Tutorial on Learning with Bayesian Networks, Learning in Graphical Models, MA, 1999
2. Riza Demirer, Ronald R Mau and Catherine Shenoy, Bayesian Networks: A Decision Tool to Improve Portfolio Risk Analysis, Journal of Applied Finance, October, 2006
3. J Gamela, Learning Bayesian Networks using Various Datasources and Applications to Financial Analysis, Journal of Soft Computing, April 1, 2003
4. Catherine Shenoy and Prakash Shenoy, Bayesian Network Models of Portfolio Risk and Return, Computational Finance, 1999.
SETLabs Briefings
VOL 7 NO 5
2009
Automated Knowledge-Based
Information Extraction from
Financial Reports
By Bintu G Vasudevan PhD, Anju G Parvathy, Abhishek Kumar and
Rajesh Balakrishnan
Why trouble analysts when financial information
can be presented to them in an intelligent form?
Financial statements provide an overview of a company's financial condition and contain critical business performance metrics. They are usually presented in a structured manner with substantial tabular data that makes them easy to understand and interpret. These statements are often complex and may also include extensive text notes. Financial analysts and shareholders require various financial attributes to be analyzed and compared. Such information feeds the analysis of the important factors that make or break an investment decision.
Information extraction and text mining techniques can be used to automatically capture and tag such relevant attributes for the purpose of high stake quantitative analyses. This paper presents a customized knowledge-based algorithm wherein domain specific heuristics are used to guide information extraction from financial reports. The information extracted from the financial reports is then stored in user defined standardized XML templates. This information helps in financial analysis and can be referred to when assessing a business for planning, budgeting, monitoring and forecasting.
Financial statements or reports that contain critical performance metrics on a quarterly, half yearly or yearly basis convey a concise picture of the profitability and financial position of the organization to the management and potential clients. The financial reports are generally published in DOC or PDF formats. They are usually presented in a structured manner, in a form that is easy to understand. Tables are an important means of presenting multi-dimensional information and are widely used in these documents to represent factual or statistical information. However, such financial reports are extremely varied in their style of presentation as they are prepared by analysts from different geographical locations. This accounts for the complex tabular structures found in these reports.
Financial statements consist of a balance sheet, an income statement (profit and loss statement) and a statement of cash flows. Often these tables are embedded within text notes accompanying the news story that explain the reasons for the rise or fall of the company's profits, sales, etc. The paper primarily focuses on the extraction of important information from such reports published by various companies. A customized knowledge-based algorithm that uses domain specific heuristics to guide information extraction from financial reports is presented. Financial analysis applications and various shareholders such as financial institutions, investors, creditors and governmental oversight agencies require various financial attributes (viz., sales, earnings, net income and earnings per share) to be analyzed and compared. This in turn influences important factors that make or break an investment decision.
THE APPROACH
Early work on table data extraction and processing presented challenges, especially in detecting tables in text and dealing with their structure [1]. Table data extraction from web documents has been presented that exploits formatting clues in HTML table tags [2, 3]. The cells of an HTML table are already demarcated by <td> tags. But usually the structure is not known in advance, and hence the approach to understanding the table is to look for specific patterns of interest in the form of attribute-value pairs (e.g., <Year: 1999> and <Color: Blue>) and map the extraction into ontology form. It uses an extraction ontology to search for values in the source that are likely to form the attribute-value pairs. Basically, it uses a two dimensional pattern mapping scheme, where it first recognizes the cells that contain the attribute names and then the cells whose values match those attribute names.
Table extraction by wrapper learning has been presented in Cohen's discussion [4]. Wrappers learn rules based on examples. The rules are composed of tokens made of HTML tags. These rules tend to be specific and can be applied only to those documents whose structure is similar to the training document. The use of tokens to compose rules makes it difficult to generalize across distributed websites. Table data extraction from HTML that considers the diverse nature of HTML tags and vocabulary variants in attribute names has also been presented [5]. Machine-learning based table data extraction using conditional random fields (CRFs) has been described by Pinto [6]. However, the system described does not perform a complete table extraction task. It can only locate, label and tag table rows into a type such as datarow, sectionheader or superheader; it knows nothing about the columns and cannot distinguish between data cells (attribute value) and header cells (fiscal period). A method to extract table information from PDF files has also been discussed by Burcu [7]. This approach is heuristic-based and works well for simple and lucid tables. The system only shows the extraction of table elements in row and column format and does not deal with the dynamic mapping of the attribute values to attribute names and column headers. Also, it does not use any business rules to map attribute values properly so as to capture information in a standard template.
[Figure 1: Data Extraction Process. Elements shown: Input (Doc / PDF); Current Manual Data Extraction Process (Time Consuming, More Man Hours, Error Prone); Automatic Data Extraction Process; Staging Database; Quality Review; Output Schema (XML / Database). Source: Infosys Research]
Our approach for financial data extraction is not restricted to extraction of data from tables but also demonstrates a method of information extraction from the embedded text in financial reports. The information extracted from the text notes is important as it drives the different business rules that are applied to the extracted data. The extracted data is dynamically tagged using the knowledge base, and various domain specific rules are applied before presenting the data in user defined standard templates.
Financial reports contain analyst estimates, which are forecasted values used extensively in financial analysis applications. Currently many financial analysis and research firms extract such valuable information through a knowledge process outsourcing (KPO) activity. This process of extracting information from various financial statements is mostly done manually by a KPO expert. She has to read the document and manually insert the relevant attributes into an editor tool, whose output is then converted into the pre-defined standard (usually XML). The extracted data then goes through a quality check before it can be consumed by financial analysis. The current process is shown in dotted lines in Figure 1. The idea here is to automate the process and present an efficient way of extracting information that will reduce the man hours required for manual data extraction. However, the automatically extracted data still needs to go through a quality check before it can be used for financial research or analysis.
CHALLENGES
Automatic information extraction from financial statements is a challenging problem since the reports come from heterogeneous sources. The layout and formatting of the tables are diverse and broker or analyst dependent. Tables in financial reports can be located in different positions and can have various styles, types of layout, and positions of header, label and data values. However, the financial analysis applications work on a target set of attributes/metrics that can be modeled in a generic formal representation. This representation can be considered as the target schema with which the source schemas (from different tables in the reports) have a semantic correspondence. The target schema is the user defined standardized template.
The information extracted is used in financial quantitative analysis and hence needs to be reliable. Presently the KPO activity of information extraction requires trained people and is done manually. This process is time consuming and occasionally error prone. These problems need to be addressed by the automatic data extraction process.
The specific information extracted from the tables is called an attribute. Some of the prominent attribute names are sales, earnings, net income and earnings per share. The attribute value is the particular value in the table cell. The tough part of understanding the table is to distinguish the table cells containing attribute names from the ones containing attribute values and then map each attribute value to the corresponding attribute name. This then needs to be collectively mapped to the header (fiscal period). Thus the extracted attribute values for a given attribute name have to be matched against multi-dimensional header information that contains the fiscal period and year (the completely qualified fiscal period). A table header is generally a single row, but sometimes tables have multiple rows of header information. An example is a table header that spans two rows, where the first row is 2008 and the second row contains 1Q, 2Q, 3Q and 4Q. If the table header is split across multiple rows then it needs to be normalized so that each fiscal period maps to the appropriate year.
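The header normalization described above can be sketched as a forward fill of the year row over the period row. The representation of a spanning year cell as one filled cell followed by empty cells, and the function name `normalize_header`, are assumptions made for this example.

```python
from typing import List


def normalize_header(year_row: List[str], period_row: List[str]) -> List[str]:
    """Combine a two-row header into one fully qualified fiscal period per column.

    A year that spans several columns is assumed to appear once, followed by
    empty cells; it is carried forward until the next year cell appears.
    """
    qualified = []
    current_year = ""
    for year_cell, period_cell in zip(year_row, period_row):
        if year_cell.strip():
            current_year = year_cell.strip()
        qualified.append(f"{period_cell.strip()} {current_year}".strip())
    return qualified


header = normalize_header(
    ["2008", "", "", "", "2009", ""],
    ["1Q", "2Q", "3Q", "4Q", "1Q", "2Q"],
)
```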
In a few cases the tables in the financial reports are structured in a transposed fashion, wherein the attribute names appear in one row instead of one column. Many times a currency scaling factor needs to be extracted for the table; it is mentioned either along with the table title (e.g., all figures in the table below are in million dollars) or as a footnote at the end of the table. In such cases the scaling factor extracted from the title or the footnote is used to qualify the attribute values.
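One simple way to pick up such a scaling factor is a pattern match over the table title and footnote. The sketch below is a hypothetical heuristic, not the rule set used by the authors; the pattern, the `SCALE` table and the function name are all assumptions, and real reports use many more phrasings than this covers.

```python
import re

# Hypothetical lookup of scale words to multipliers.
SCALE = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}

# Matches phrases such as "in million dollars", "in thousands", "in USD millions".
SCALE_PATTERN = re.compile(
    r"\bin\s+(?:\w+\s+)?(thousand|million|billion)s?\b", re.IGNORECASE
)


def find_scaling_factor(text: str) -> int:
    """Return the multiplier named in a table title or footnote, or 1 if none."""
    match = SCALE_PATTERN.search(text)
    return SCALE[match.group(1).lower()] if match else 1


title_factor = find_scaling_factor(
    "All figures in the table below are in million dollars"
)
footnote_factor = find_scaling_factor("(USD in thousands)")
default_factor = find_scaling_factor("Info Technologies Income Statement")
```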
The extraction process is not limited to tables. Our approach also enables automated extraction of critical information such as the report published date, year end date, company name, ticker symbol, stock listing, analyst names, base year and currency from the embedded text notes. These entities are extracted irrespective of where they appear in the text. This information is crucial for running certain business rules that filter the captured data. Additionally, the extracted information is stored in a representation configured as per client requirement.
METHODOLOGY
This section describes the approach followed towards knowledge-based information extraction from the financial reports. The reports contain a lot of critical information in the tables as well as in the embedded text. Three major algorithms for text analytics have been used together, complementing one another, in our approach. They are the algorithms for categorization, named entity extraction and data transformation to standardized templates. Figure 2 shows the functional block diagram of the information extraction process. The knowledge-base has been created based on (a) controlled vocabularies, and (b) domain knowledge in the form of ontologies. A controlled vocabulary is a collection of terms organized in a hierarchy, intended to serve as a standard nomenclature. The purpose of a controlled vocabulary is to provide a common set of terms. A domain knowledge ontology is a set of classes and associated slots that describe a particular domain. A domain specialist can feed knowledge directly into the knowledge-base, resulting in a comprehensive knowledge base. Protégé can be used to build the knowledge-base. Protégé is a frame-based knowledge-representation system that offers classes, slots, facets, instances and slot values [8].
INPUT SOURCE
Most of the financial reports are published in PDF format. So the first step in the pre-processing stage is to convert the PDF to XML. Text elements in this XML have the following attributes:
top - vertical distance from the top of the page (y coordinate)
left - horizontal distance from the left border of the page (x coordinate)
width - width of the text chunk
height - height of the text chunk
font - the font size, family and color of the text chunk
The XML document does not have regular HTML mark-up for tables. Hence identification of the tables becomes a complex task. One needs to identify a set of text elements as part of one table only by means of the absolute coordinates of these elements. Here the text chunks that are closely located (at the same y coordinate and at slightly separated x coordinates) are grouped together as one chunk with a single (x, y) coordinate describing its position. Text chunks separated by larger distances are captured at separate (x, y) coordinates. The positional information of the text chunks is used to identify tables and table spans using a geometrical constraints based algorithm and simple heuristics.
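The grouping of closely located text chunks can be sketched as follows. The `Chunk` class mirrors the top, left and width attributes listed above; the tolerance values and the function names are assumptions chosen for the example, and this is not the geometrical constraints algorithm used in the solution.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    top: int    # y coordinate: distance from the top of the page
    left: int   # x coordinate: distance from the left border
    width: int
    text: str


def group_lines(chunks: List[Chunk], y_tolerance: int = 2) -> List[List[Chunk]]:
    """Group chunks whose top coordinates (nearly) coincide into lines."""
    lines: List[List[Chunk]] = []
    for chunk in sorted(chunks, key=lambda c: c.top):
        if lines and abs(chunk.top - lines[-1][0].top) <= y_tolerance:
            lines[-1].append(chunk)
        else:
            lines.append([chunk])
    return [sorted(line, key=lambda c: c.left) for line in lines]


def merge_cells(line: List[Chunk], max_gap: int = 6) -> List[Chunk]:
    """Merge horizontally adjacent chunks; a wide gap starts a new cell."""
    cells: List[Chunk] = []
    for chunk in line:
        if cells and chunk.left - (cells[-1].left + cells[-1].width) <= max_gap:
            prev = cells[-1]
            cells[-1] = Chunk(prev.top, prev.left,
                              chunk.left + chunk.width - prev.left,
                              f"{prev.text} {chunk.text}")
        else:
            cells.append(chunk)
    return cells


chunks = [
    Chunk(100, 10, 30, "Total"), Chunk(100, 44, 30, "Sales"),
    Chunk(101, 200, 40, "$117.0"),
    Chunk(120, 10, 25, "R&D"), Chunk(120, 200, 20, "8.6"),
]
rows = [[cell.text for cell in merge_cells(line)] for line in group_lines(chunks)]
```

Consecutive lines whose cells start at similar x coordinates are then candidates for belonging to the same table.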
CATEGORIZATION
Categorization is an approach to grouping objects such as documents based on the properties they share, using a training set of previously labeled objects. A trained categorizer can be used to assign categories to previously unseen documents. An in-house developed N-gram words-based Bayesian classification technique has been used for the solution.
Financial reports are categorized based on the geographical location of the company that publishes them. This is necessary because a different set of business rules needs to be applied to extract certain information for companies based in some countries; for companies based in China or Japan, for instance, current profit needs to be extracted.
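The in-house classifier is not described in detail, so the following is only a stand-in: a small naive Bayes categorizer over word unigrams and bigrams with add-one smoothing. The class name, the training sentences and the category labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List


def ngrams(text: str, n_max: int = 2) -> List[str]:
    """Word n-grams of length 1..n_max from lower-cased, whitespace-split text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


class NGramNaiveBayes:
    def __init__(self) -> None:
        self.feature_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()                 # label -> number of documents
        self.vocabulary = set()

    def train(self, text: str, label: str) -> None:
        features = ngrams(text)
        self.feature_counts[label].update(features)
        self.doc_counts[label] += 1
        self.vocabulary.update(features)

    def classify(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.feature_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # log prior plus add-one smoothed log likelihood of each n-gram
            score = math.log(n_docs / total_docs)
            for feature in ngrams(text):
                score += math.log((counts[feature] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


categorizer = NGramNaiveBayes()
categorizer.train("current profit reported in yen tokyo", "japan")
categorizer.train("net income reported in rupees mumbai", "india")
```

The predicted category can then select which set of business rules is applied to the report.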
NAMED ENTITY EXTRACTION
Named entity recognition is an important task performed by information extraction systems. It can be based on a domain-specific lexicon or on natural language processing, driven by certain patterns and heuristics. Text in financial reports is usually unstructured. This text is processed by the named entity system to identify and tag certain predefined entities such as analyst names, company name, ticker symbol, stock listing, publish date, year end date, monetary values, etc. This tagged information is imported into the metadata in the standard template. Further, based on the tagged data, different business rules are applied to filter the data extracted. For instance, the year end date extracted from the report is used to specify the base year, which in turn describes the completely qualified fiscal period for a given attribute value.
[Figure 2: Functional Block Diagram for Information Extraction Process. Blocks shown: Input (PDF); Pre-processing (PDF or XML); Categorizer; Table Identification (Area and Location); Named Entity Extraction; Multi-dimensional Attribute Value Mapping; Domain Knowledge Heuristics; Staging Database; Post-processing (Data Transform); Output (XML). Source: Infosys Research]
MULTIDIMENSIONAL ENTITY TAGGING AND DATA MAPPING
The information extracted from the tables, i.e., the attribute values, has to be mapped to the attribute name and to the column header. This multi-dimensional mapping scheme for attribute values appearing in the tables forms part of the data transformation to the standardized template. The process of entity tagging is dynamic. In other words, variant attribute names specified in the tables need to be matched to the standard attribute name in the standardized template. The process is multi-dimensional because the attribute name and the value then need to be mapped to the corresponding header columns (fiscal period). Table 1 shows a portion of a table from a financial report. The attribute value $117.0 is extracted for the attribute name Total Sales against the column header (fiscal period) 1Q07. The attribute name Total Sales is mapped to SALES in the standardized template, the attribute value is 117.0, the currency is USD and the fiscal period is the quarter ending on 30 Jun 07. The translation of 1Q07 to the quarter ending on 30 Jun 07 is based on certain entities (year-end date, company name) extracted from the embedded text. For instance, if it is an Indian company then 1Q would end on 30 Jun, but for a Chinese company the same would end on 31 Mar. The knowledge base is used to make further such decisions while mapping the attribute values. Once all the relevant information is extracted and tagged, it can be stored in a standardized template that is defined as per user requirement.
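The translation of a fiscal period label into a quarter-end date can be sketched as below. Two assumptions are made so that the result agrees with the example in the text: the two-digit year in the label is the calendar year in which the fiscal year starts, and two-digit years belong to the 2000s. The function name `quarter_end` is hypothetical.

```python
import calendar
import re
from datetime import date


def quarter_end(label: str, fiscal_year_end_month: int) -> date:
    """Resolve a label such as '1Q07' to the last day of that fiscal quarter.

    fiscal_year_end_month is the month (1-12) in which the company's fiscal
    year ends, e.g. 3 for a March year end and 12 for a December year end.
    """
    match = re.fullmatch(r"([1-4])Q(\d{2})", label)
    if match is None:
        raise ValueError(f"unrecognized fiscal period label: {label!r}")
    quarter = int(match.group(1))
    year = 2000 + int(match.group(2))  # assumption: two-digit years are 20xx
    # Months from January of the label year up to the quarter's last month.
    month_index = fiscal_year_end_month % 12 + 3 * quarter
    year += (month_index - 1) // 12
    month = (month_index - 1) % 12 + 1
    return date(year, month, calendar.monthrange(year, month)[1])


march_year_end = quarter_end("1Q07", 3)      # fiscal year ending March
december_year_end = quarter_end("1Q07", 12)  # fiscal year ending December
```

In the solution the year-end month would come from the year end date entity extracted from the embedded text, or from the knowledge base.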
INTERMEDIATE STAGE
The information extracted from the financial report can be stored in a staging database depending on the user requirement. Information can be stored in two forms: (i) directly in the standardized template, and (ii) in the form of an ontology in the XML/RDF format in a database like Sesame [9]. The advantage of using the RDF format is that the user can fire queries at the Sesame database, based on changing requirements, to extract the required information. A sample query could be: Show me all companies whose sales increased by 10% since last quarter. The specific goals and objectives may change over a period of time, and the RDF store of data can be useful in such a scenario.
Info Technologies Income Statement
Fiscal year ending March
                      Fy2006    1Q07    2Q07    3Q07    4Q07
Total Sales           $459.2  $117.0  $122.6  $126.1  $126.7
Cost of Goods Sold     298.0    75.9    85.2    88.0    85.7
Gross Income           161.2    41.1    37.4    38.1    41.0
R&D                     27.9     8.6     7.9     7.7     8.9
SG&A                    84.5    21.4    17.3    20.9    23.6
Operating Income        48.8    11.1    12.2     9.5     5.6
Other Income (Exp)       1.0   (0.1)   (0.8)   (1.3)   (0.4)
Table 1: Sample Table Data
Source: Infosys Research
QUALITY REVIEW
In order to ensure the accuracy and reliability of the data retrieved, a quality review has to be performed. In the information extraction stage, the byte offsets of the attribute values are extracted. The byte offset is critical in the quality review process, as it helps the quality analyst trace the original position of the attribute value in the source document. In Table 2, all the attribute values extracted from the source document are highlighted based on their byte offset values. It is possible to specifically highlight any given attribute value in the source document using its byte offset value.
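The use of byte offsets for review can be sketched as follows: given the source bytes and a list of (offset, length) pairs, each extracted value is wrapped in visible markers. The function name `highlight` and the `[[` and `]]` markers are assumptions for the example; a review tool would render a highlight instead.

```python
from typing import Iterable, Tuple


def highlight(source: bytes, spans: Iterable[Tuple[int, int]],
              open_mark: bytes = b"[[", close_mark: bytes = b"]]") -> bytes:
    """Wrap each (offset, length) span of the source in markers.

    Spans are assumed not to overlap; they are processed in offset order.
    """
    out = bytearray()
    cursor = 0
    for offset, length in sorted(spans):
        out += source[cursor:offset]
        out += open_mark + source[offset:offset + length] + close_mark
        cursor = offset + length
    out += source[cursor:]
    return bytes(out)


source = b"Total Sales $117.0 $122.6"
marked = highlight(source, [(13, 5), (20, 5)])
```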
OUTPUT
The extracted information is transformed to the
standardized template as per user requirement
in the post processing stage and stored in XML/
RDF/ Excel/ Database. A sample XML output
is shown in Figure 3.
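A record of this kind could be produced with Python's standard `xml.etree.ElementTree` module, as in the sketch below. The element and attribute names are the ones visible in Figure 3, but their nesting is an assumption, since only a tree view of the output is shown; the function name `build_attribute` and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_attribute(name: str, records: List[Dict[str, str]]) -> ET.Element:
    """Build one AttributeInformation element with its AttributeValue children."""
    info = ET.Element("AttributeInformation")
    ET.SubElement(info, "AttributeName").text = name
    for record in records:
        value = ET.SubElement(info, "AttributeValue", {
            "FiscalQuarter": record["quarter"],
            "QuarterEnd": record["quarter_end"],
            "ScalingFactor": record["scale"],
            "CurrencyType": record["currency"],
            "ActualEstimate": record["kind"],
        })
        ET.SubElement(value, "Value").text = record["value"]
        # Byte offset and length of the value in the source document.
        ET.SubElement(value, "OffsetLocation", {
            "offset": record["offset"],
            "length": record["length"],
        })
    return info


sales = build_attribute("SALES", [{
    "quarter": "Q2", "quarter_end": "2007-09-30", "scale": "1",
    "currency": "USD", "kind": "Actual", "value": "122.6",
    "offset": "18530", "length": "5",
}])
xml_text = ET.tostring(sales, encoding="unicode")
```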
DISCUSSION
The preliminary results of our approach for information extraction from financial reports show that the automated process is sufficiently precise and that the data extracted can be used for high stake financial analysis. The method was tested on a set of sample financial reports, each of which was 8 to 10 pages long. The approach was built, refined and tuned based on reports of 10 companies from different geographical locations that had as many as 8 different tabular styles (including transposed tables). Then it was tested on 40 files that had further variations. Table 3 shows the recall and precision of this test. Out of the 40 new reports that were processed, 15 with table styles similar to the first 10 had a recall of 85% with 97% precision, and the remaining 25 documents with different table styles showed a recall of 78% and a precision of about 96%. The average processing time was approximately 18 seconds per document.
Attributes
AttributeInformation
SALES
AttributeValue
Value 117.0
@FiscalQuarter Q1
@QuarterEnd 2007-06-30
@ScalingFactor 1
@CurrencyType USD
@ActualEstimate Actual
OffsetLocation
AttributeName
@length 3
@offset 18293
AttributeValue
Value 122.6
@FiscalQuarter Q2
@QuarterEnd 2007-09-30
@ScalingFactor 1
@CurrencyType USD
@ActualEstimate Actual
OffsetLocation
@length 5
@offset 18530
AttributeValue
Value 126.1
@FiscalQuarter Q3
@QuarterEnd 2007-12-31
@ScalingFactor 1
@CurrencyType USD
Figure 3: A Sample XML Output
Source: Infosys Research
[Table 2: Highlighted Output for Quality Review. The income statement of Table 1, marked Quality Check, with the extracted attribute values highlighted. Source: Infosys Research]
Documents   Recall   Precision
15          85.2%    97.1%
25          78.4%    95.7%
Table 3: Recall and Precision on Sample Financial Reports
Source: Infosys Research
Recall and precision were calculated by validating the automatically extracted data against data extracted manually. It was observed that the extracted data is sufficiently accurate.
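The validation described above can be sketched with the standard definitions of recall and precision over sets of extracted items. Representing each item as an (attribute, fiscal period, value) tuple is an assumption made for the example, and the sample values are illustrative.

```python
from typing import Set, Tuple

Item = Tuple[str, str, str]  # (attribute name, fiscal period, value)


def recall_precision(extracted: Set[Item], reference: Set[Item]) -> Tuple[float, float]:
    """Recall = correct / reference items; precision = correct / extracted items."""
    correct = len(extracted & reference)
    recall = correct / len(reference) if reference else 0.0
    precision = correct / len(extracted) if extracted else 0.0
    return recall, precision


manual = {
    ("SALES", "1Q07", "117.0"), ("SALES", "2Q07", "122.6"),
    ("SALES", "3Q07", "126.1"), ("SALES", "4Q07", "126.7"),
}
automatic = {("SALES", "1Q07", "117.0"), ("SALES", "2Q07", "122.6")}
recall, precision = recall_precision(automatic, manual)
```

Here everything that was extracted is correct (high precision) but half of the reference values were missed (lower recall), the same pattern, in exaggerated form, as in Table 3.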
Certain values, attributes and tables are not fully extracted. This is mainly because of problems in the PDF to XML conversion process, or because some column headers or values are merged together by virtue of being very close to each other.
CONCLUSION
It is important that financial reports be subjected to a knowledge-based information extraction process. Information extractors use an optimized combination of algorithms for data extraction from free text and tables. Various techniques, such as named entity extraction, classification and multi-dimensional entity mapping for data transformation to standardized templates, can be used for robust extraction. It should be kept in mind that the process involves identifying the table location and span based on geometrical constraints, and using domain specific rules and heuristics from the knowledge-base to guide automated information extraction.
Extracted data, if displayed in an editable form, helps smoothen the quality check process. If stored in one of the structured formats like XML/RDF/Excel, it can help analysts be very precise in their analysis.
Learnings from information extraction from financial reports can be extrapolated to other information rich domains provided one has sufficient knowledge of the business rules in such domains.
REFERENCES
1. R Zanibbi, D Blostein and J R Cordy, A Survey of Table Recognition, International Journal on Document Analysis and Recognition, Vol 7 No 1, 2004
2. David W Embley, Cui Tao, Stephen W Liddle, Automating the Extraction of Data from HTML Tables with Unknown Structure, Data and Knowledge Engineering Archive, Vol 54, 2005
3. M Hurst, Layout and Language: Challenges for Table Understanding on the Web, Technical Report, WhizBang Labs, 2001
4. W W Cohen, M Hurst and L S Jensen, A Flexible Learning System for Wrapping Tables and Lists in HTML Documents, WWW '02 Proceedings of the 11th International Conference on World Wide Web, Honolulu, Hawaii, USA, 2002
5. Ashwin Tengli, Yiming Yang and Nian L Ma, Learning Table Extraction from Examples, International Conference On Computational Linguistics, 2004
6. David Pinto, Andrew McCallum, Xing Wei and Bruce W Croft, Table Extraction using Conditional Random Fields, in Proceedings of SIGIR Conference, 2003
7. Burcu Yildiz, Katharina Kaiser and Silvia Miksch, A Method to Extract Table Information from PDF Files, Proceedings of the 2nd Indian International Conference on Artificial Intelligence, 2005
8. Protégé Ontology Editor and Knowledge-base Framework. Available at http://www.smi.stanford.edu/projects/protege/
9. Sesame Java Framework for Storing, Querying and Reasoning with RDF and RDF Schema. Available at www.openrdf.org.
A Differentiated Approach to
Business Process Automation using
Knowledge Engineering
By Ashish Sureka PhD and Venugopal Subbarao
Increase operational efficiency by automating
knowledge-intensive processes
Most knowledge-intensive back-office business processes involve manual processing of unstructured textual data for making decisions. Manual processing of textual data is time consuming, requires domain expertise, is error prone and cannot be scaled. Significant cost savings and operational efficiencies can be achieved by automating a few of the activities that require significant human intervention. This paper proposes an application of knowledge engineering (KE) that employs text extraction techniques in the domain of business process automation (BPA). Insights and knowledge derived from past projects in this area are leveraged in arriving at the solution.
This paper lies at the intersection of three fields: business process automation, knowledge engineering and text analytics.
BUSINESS PROCESS AUTOMATION
BPA consists of identifying, analyzing, automating and streamlining some of the time-consuming, manual labor-intensive processes and activities that are part of a business operation or workflow, in order to increase operational efficiency and reduce cost. The process consists of studying an existing business process (the as-is process), including the ordering of activities and the inputs and outputs of each activity, and identifying opportunities for automation. Typically, activities that are manual, time consuming, error prone, tedious or dependent on experts are candidates for automation. Based on the knowledge gained from as-is process analysis, an improved and re-engineered process (the to-be process) is created that demonstrates business value. Operationally intensive business processes can be automated to a fair extent. Automating knowledge-intensive processes poses many challenges when it comes to capturing the logic or rules of automation from experts and interpreting information from unstructured sources.
KNOWLEDGE ENGINEERING
KE is a field within artificial intelligence concerned with the study and development of expert systems, or intelligent knowledge-based systems, that solve real-world problems. The basic premise is to capture an expert's or a practitioner's knowledge in a machine-readable or executable format. A system is developed that can emulate the actions and reasoning of an expert in a similar situation. Building such a knowledge-based system consists of acquiring or capturing an expert's knowledge, encoding it in a computer-readable format (e.g., rules, decision trees or an ontology) and building an inferencing or reasoning component that exploits the captured knowledge (the knowledge base) to make intelligent decisions. Typical applications are in diagnosis or debugging (e.g., discovering defects and identifying causes of malfunctions in the manufacturing and health care industries), repair (prescribing remedies and solutions), and forecasting and prediction (predicting outcomes based on historical data and the knowledge base).
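To make the idea of an encoded knowledge base and an inferencing component concrete, the following is a minimal sketch of forward chaining over if-then rules. The rules, fact names and the function `infer` are hypothetical and chosen only for illustration; a production expert system would use a richer representation.

```python
# Hypothetical knowledge base: each rule maps a set of required facts
# (conditions) to a conclusion. These rules are illustrative only.
RULES = [
    ({"machine_vibrates", "bearing_worn"}, "replace_bearing"),
    ({"temperature_high", "coolant_low"}, "refill_coolant"),
    ({"refill_coolant", "leak_detected"}, "repair_coolant_line"),
]


def infer(facts, rules=RULES):
    """Forward chaining: fire rules until no new conclusion can be derived.

    Returns only the newly derived conclusions, not the input facts.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - set(facts)


observed = {"temperature_high", "coolant_low", "leak_detected"}
print(sorted(infer(observed)))  # ['refill_coolant', 'repair_coolant_line']
```

Keeping the rules as data, separate from the `infer` loop, mirrors the separation between the knowledge base and the reasoning component described above.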
TEXT ANALYTICS
Text analytics, or text mining, consists of analyzing large volumes of free-form textual data to extract useful patterns and discover implicit knowledge locked in the underlying data. Text mining and natural language processing techniques have been applied in diverse industry domains to solve a variety of problems. The need to analyze unstructured data sources, viz., free-form textual data, and to discover knowledge for making intelligent decisions has been a major driver of the surge in research and business activity around text mining. The word 'mining' in the phrase 'text mining' is used in the sense of automatic knowledge and pattern discovery. Applying data mining and text mining to specific domains has resulted in several variants such as patent data mining, web content mining, educational data mining, biomedical data mining and software engineering data mining. The objective behind all these application areas, however, is automatic knowledge discovery. Several text mining and natural language processing algorithms have been invented and successfully applied to a variety of data mining tasks, and applying text processing techniques to mine data has become a well-known and common practice.
THE DIFFERENTIATED APPROACH
The focus of this article is to demonstrate the application of knowledge engineering and intelligent text processing techniques to an emerging and promising area: the automation of knowledge-intensive business processes. Automating a business process that largely consists of dealing with text does not involve any mining or knowledge discovery activity. The only purpose here is to automate some of the manual text manipulation tasks performed by a knowledge worker in order to improve productivity and accuracy and to reduce dependence on manual effort. We believe that business process automation using knowledge engineering and intelligent text processing techniques has wide application and is relatively unexplored. The number of available tools or packaged software products that cater to the needs of such text-related business process automation is small compared to the overall market size and opportunity. Over the past six months we studied several business processes in which we saw a clear opportunity to apply knowledge engineering and text processing to transform the current manual, labor-intensive process. The business processes that we encountered apply to industry verticals such as banking, retail, automotive, and news and media.
Figure 1: Diagram Illustrating the Focus of this Paper
(Diagram labels: Business Process Automation; Knowledge Engineering; Text Processing; focus of this paper at the intersection of BPA + KE + TP, with spend analysis, reconciliation and customer sentiment analysis; other labels are order management, payroll processing, expert systems, machine translation, knowledge-based systems and knowledge management.)
Source: Infosys Research
The structured approach taken in this paper is listed below.
• Demonstrate the application of knowledge engineering and intelligent text extraction techniques to the domain of BPA. The combination of these three fields is a promising but relatively unexplored area.
• Present a generic solution architecture and illustrate some of the key technology components and activities involved in developing such a system.
• Present a few real-world examples and scenarios from different industry verticals, and present a case or two in detail as concrete examples of the concepts discussed in this paper.
• List some of the technical and non-technical challenges and limitations of developing such a process automation system.
The Solution Framework
A traditional software development life cycle (SDLC) model, with a few minor modifications to the overall process, is employed for developing a quality IT solution for such text- and knowledge-intensive applications. The solution places more emphasis on analyzing and capturing the required domain experts' knowledge and decision-making logic to enable automation of various knowledge-intensive and unstructured information processing activities. Depending on the problem at hand and the specificity of the solution, it is also necessary to evaluate various implementation options for the text analytics (e.g., rule-based, machine-learning based
or a hybrid) and knowledge-based systems. Seamless integration of the new systems, along with utilities to monitor change management, is key to the success of implementing the new systems.
Figure 2 illustrates the broad stages and
the various activities in each of the stages that
are required to be carried out while executing
such projects.
The first phase in the development of a knowledge-intensive, text-related business process automation solution consists of detailed process modeling and analysis.
The two sub-activities that fall under process
modeling and analysis are as-is process
understanding and process analysis. The
primary motivation behind this step is to
develop an as-is process model and study the
process in detail from the perspective of finding
opportunities for automation. As illustrated in
Figure 2, as-is process understanding consists
of studying the business operation, building
a process model, studying various processes
and sub-processes, dependencies between
processes, inputs and outputs to each process
or activity, organization structure, IT systems
involved and actors.
Process analysis primarily consists of identifying labor-intensive and knowledge-intensive activities and recording the time consumed in executing each process and sub-process, the common errors made, the data volume processed, the complexity of the process, the number of people involved, the criticality of each task and an assessment of the level of difficulty in automation. The next phase consists of system design and development, which follows the typical phases of a software development life cycle: requirement gathering, feasibility study, design, development and testing.
Figure 2: Steps and Activities Involved in Process Analysis and System Design
Source: Infosys Research
Process Modeling and Analysis
As-is Process Understanding (output: Process Model)
1. As-is Process Modeling
2. End-to-End business operation
3. Organization Structure
4. Processes, Sub-Processes
5. Activities and Tasks
6. Process/Activity Ordering
7. Process/Activity Dependencies
8. Inputs and Outputs
9. Identification of Knowledge intensive activities
10. Artifacts produced
11. IT Systems involved
12. Internal & External Tasks
13. Actors/Roles
Process Analysis (output: Automation Opportunities)
1. Labor intensive activities
2. Knowledge intensive activities
3. Repeated/Mechanical tasks
4. Time consumed at each step
5. Accuracy achieved
6. Common errors/mistakes
7. Data volume processed
8. Complexity level of each step
9. Difficulty in automation
10. Dependency on experts
11. Number of people involved
12. Level of expertise required
13. Criticality of each activity
14. Alternatives
System Design and Development
Requirement Gathering (output: Requirement Specifications)
1. Input and output data analysis
2. Documenting exceptions
3. Interviewing and observing practitioners
4. Documenting experts' knowledge
5. Acquiring domain knowledge
6. Feasibility study
7. As-is system analysis
8. Accuracy requirements
9. Performance requirements
10. System integration requirements
Design and Development (output: Design Document and Production System)
1. Knowledge encoding (rules, ontology, decision trees)
2. Lexicon, term lookup tables/dictionary
3. Reasoning and Inferencing engine
4. Text extraction engine
5. Text classification engine
6. Data modeling/schema definition
7. Input/output XML Schema definition
8. System integration
9. Tool evaluation
10. Testing and validation
For such projects, we suggest adopting a framework based on CommonKADS, a leading methodology for structured knowledge engineering, for knowledge analysis and knowledge-intensive system development. The CommonKADS methodology brings scientific discipline to knowledge engineering. It has been developed gradually over time and has been validated and adopted by many companies and universities around the world. It is in fact described as the European de facto standard for knowledge analysis and knowledge-intensive system development. CommonKADS is a comprehensive methodology that [1, 2]:
• Helps spot opportunities and bottlenecks in how organizations develop, distribute and apply their knowledge resources
• Provides tools for corporate knowledge management
• Provides methods to perform a detailed analysis of knowledge-intensive tasks and processes
• Supports the development of knowledge systems that support selected parts of the business process.
CASE STUDIES
We present two case studies to illustrate the concepts presented in this paper. The first is in the area of automating account reconciliation from unstructured data sources and the second is in the area of spend analytics. Both lie at the intersection of the three fields of business process automation, knowledge engineering and text analytics.
Case Study 1 - Automatic Account Reconciliation
Account reconciliation is the process of matching accounts by comparing records or transactions in account statements to make sure that they are in agreement. It is done to uncover any discrepancies or incompatibilities. Account reconciliation is performed by individuals, businesses, brokers, various financial institutions and investment banks. It is done on a periodic basis (e.g., daily, weekly or monthly) or on demand (for an interval specified by a start time and an end time), depending on the transaction volume and business need.
An example of account reconciliation at the individual level is comparing the transactions in credit card statements, check disbursements and cash-withdrawal statements against bank account statements to make sure that the amount debited agrees with the amount spent. Another example is matching the books of a broker (investment manager or dealer) and a custodian bank for share or security transactions. Matching is done on record attributes, viz., security ID, date of trade, type of transaction, number of units and amount of transaction. Figure 3 illustrates a simplified business process model for trade reconciliation in the context of an investment bank. Typically, the data from both parties (bank and broker) is brought into a staging area and the transactions are reconciled manually. A list of transactions that do not match (exceptions) is created and sent to another process called exception management.
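The matching step can be sketched in a few lines once both feeds are in structured form. The example below is an illustration under stated assumptions, not the system described in this paper: the field names, the sample records and the function `reconcile` are hypothetical, and matching is exact on the five attributes named above.

```python
from collections import Counter

# Attributes used for matching, as listed in the text. The field names
# themselves are hypothetical.
KEY_FIELDS = ("security_id", "trade_date", "txn_type", "units", "amount")


def key(record):
    """Build the comparison key for one transaction record."""
    return tuple(record[f] for f in KEY_FIELDS)


def reconcile(bank_records, broker_records):
    """Return (matched, exceptions) using exact matching on KEY_FIELDS.

    Exceptions are (side, record) pairs that found no counterpart and
    would be passed on to exception management.
    """
    unmatched_broker = Counter(key(r) for r in broker_records)
    matched, exceptions = [], []
    for rec in bank_records:
        k = key(rec)
        if unmatched_broker[k] > 0:
            unmatched_broker[k] -= 1
            matched.append(rec)
        else:
            exceptions.append(("bank", rec))
    # Whatever is still counted on the broker side has no bank counterpart.
    for rec in broker_records:
        k = key(rec)
        if unmatched_broker[k] > 0:
            unmatched_broker[k] -= 1
            exceptions.append(("broker", rec))
    return matched, exceptions


bank = [
    {"security_id": "SEC1", "trade_date": "2009-05-04", "txn_type": "BUY", "units": 55, "amount": 1100},
    {"security_id": "SEC2", "trade_date": "2009-05-04", "txn_type": "SELL", "units": 60, "amount": 900},
]
broker = [
    {"security_id": "SEC1", "trade_date": "2009-05-04", "txn_type": "BUY", "units": 55, "amount": 1100},
    {"security_id": "SEC2", "trade_date": "2009-05-04", "txn_type": "SELL", "units": 60, "amount": 950},
]
matched, exceptions = reconcile(bank, broker)
print(len(matched), [side for side, _ in exceptions])  # 1 ['bank', 'broker']
```

Using a counter rather than a set keeps duplicate transactions from being silently collapsed, which matters when the same trade legitimately appears more than once in a statement.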
Manual or paper-based reconciliation has several problems. It cannot scale when volumes are high, is error prone and time consuming, and is expensive because of high manpower cost. Automating account reconciliation enhances scalability, accelerates the overall process, reduces errors and operational risk, and gives better tracking of and visibility into the process. As a result of these gains, organizations can make informed decisions, increase their competitive advantage, increase efficiency and save cost.
Tools are available that can automate reconciliation across a wide range of financial instruments. Various software tools can read account statements available in structured form, match records based on rules specified by an end user and then generate reports. Processing unstructured data (e.g., free-form textual data in PDF format, or in Excel format without any standardization) for the purpose of account reconciliation is still a pain point for financial institutions and is a relatively unexplored area. A single platform that can perform reconciliation from both structured and unstructured data sources is highly desirable and can result in large efficiency gains.
The biggest technical challenge is handling the variability and different formats of the unstructured data. The data pre-processing and text information extraction modules need to be customized for a specific input type. The structural layout, the terms used and the presentation of the text in a financial statement need to be studied while developing text extraction rules. The two sources from which the files are obtained may use different notations, symbols, variable names and units in recording the transactions. Such information needs to be captured with the help of domain experts and the people who currently perform this operation. This knowledge becomes the input to the text processing system. For example, if the columns that need to be matched in the two tables have different names, the domain expert needs to provide a mapping table that maps each column name in one table to the corresponding column in the other. The text processing module incorporates these mappings between the column names of the two tables. Similarly, the system takes as input the mappings between the values or notations used, and the information required to normalize the units in which some of the columns are represented.
Figure 3: Simplified Business Process Model for Trade Reconciliation
(Diagram labels: PDF and Excel statements; Data Acquisition Source 1; Data Acquisition Source 2; Create Intermediate Storage (Staging); Perform Reconciliation; Exception Management; Reporting. The legend marks manual text-related activities.)
Source: Infosys Research
Domain Knowledge-based and Text Processing
Approach
It is important for one to understand how
the tacit knowledge of a domain expert can
be encoded in a machine readable format
for the purpose of automatic reconciliation.
As illustrated in Figure 4, the end-to-end
process of account reconciliation consists of
two phases.
The first phase consists of extracting data from structured and unstructured data sources and standardizing it. The second phase consists of matching transactions. Both phases require domain knowledge. Tables 1 and 2 illustrate the concept and the knowledge required using a simple example. They represent two simple financial statements that need to be reconciled. Besides the transaction ID (TID), each table has four columns; however, not all the columns may be required for reconciliation. Also, the column names in the two tables may not match even when two columns represent the same information. For example, the first of these columns in Table 1, 'Category', corresponds to the third in Table 2, 'Type'. Even at the column value level, the scale and units used can differ. For example, the 'Lots' column in Table 2 needs to be multiplied by 5 in order to compare it with the 'Units' values in Table 1. Similarly, 'B' and 'S' in Table 1 correspond to 'Buy' and 'Sell' in Table 2.
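The column mapping, value mapping and unit scaling just described can be sketched as follows, using the rows of Tables 1 and 2. This is a minimal illustration, not the implementation used in the case study: the dictionary layout and the function names `standardize` and `compare` are assumptions. Because the text does not state how the 'Category' values (X, Y) correspond to the 'Type' values (A), that column is left out of the comparison here.

```python
# Knowledge supplied by a domain expert, taken from the example in the text.
COLUMN_MAP = {"Buy / Sell": "B/S", "Lots": "Units", "Type": "Category"}
VALUE_MAP = {"B/S": {"Buy": "B", "Sell": "S"}}
SCALE = {"Units": 5}  # Lots must be multiplied by 5 to compare with Units

statement_1 = [
    {"TID": "01", "Category": "X", "B/S": "B", "Units": 55, "Month": "May"},
    {"TID": "02", "Category": "X", "B/S": "S", "Units": 60, "Month": "May"},
    {"TID": "03", "Category": "Y", "B/S": "S", "Units": 150, "Month": "June"},
]
statement_2 = [
    {"TID": "01", "Buy / Sell": "Buy", "Lots": 11, "Type": "A", "Tax Paid": "Yes"},
    {"TID": "02", "Buy / Sell": "Sell", "Lots": 12, "Type": "A", "Tax Paid": "No"},
    {"TID": "03", "Buy / Sell": "Sell", "Lots": 75, "Type": "A", "Tax Paid": "Yes"},
]

# Columns compared after standardization (an assumption for this sketch).
MATCH_ON = ("TID", "B/S", "Units")


def standardize(row):
    """Rename columns, translate values and rescale units of a Statement 2 row."""
    out = {}
    for column, value in row.items():
        column = COLUMN_MAP.get(column, column)
        value = VALUE_MAP.get(column, {}).get(value, value)
        if column in SCALE:
            value = value * SCALE[column]
        out[column] = value
    return out


def compare(rows_1, rows_2):
    """Return {TID: True/False} telling whether each Statement 1 row agrees."""
    by_tid = {r["TID"]: standardize(r) for r in rows_2}
    result = {}
    for row in rows_1:
        other = by_tid.get(row["TID"])
        result[row["TID"]] = other is not None and all(
            row[c] == other[c] for c in MATCH_ON
        )
    return result


print(compare(statement_1, statement_2))
# {'01': True, '02': True, '03': False}
```

With the sample values as printed, transactions 01 and 02 agree after standardization (11 × 5 = 55 and 12 × 5 = 60), while transaction 03 would be raised as an exception because 75 × 5 = 375 does not equal 150.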
Figure 4: Two Phases of the Trade Reconciliation Process
(Diagram labels: Phase 1, unstructured to structured data conversion and standardization, comprising pre-processing, extraction and standardization with UI-based configuration by a business user; Feed 1 (bank), Excel only; Feed 2, PDF or Excel; structured data (tables in RDBMS); Phase 2, matching of Statement 1 and Statement 2.)
Source: Infosys Research
Column mapping, normalization and standardization are done by referring to a knowledge base. Figure 5 illustrates a knowledge base created by a domain expert. The knowledge is encoded at design time (configured by a business analyst through a graphical user interface).
The knowledge base is leveraged at run time by the system. These are some examples of the domain expert knowledge that needs to be acquired and encoded in a computer-readable format for the system to perform record matching. Normally, the knowledge worker or domain expert performing the reconciliation is trained on the process and holds the domain knowledge as a result of training or practice. Another common example of prior knowledge is the rule that a single transaction in one table, having a unique transaction ID and an instrument ID (e.g., a security ID), can be matched with two transactions in the second table that have the same instrument ID, provided that the sum of their transaction amounts equals the transaction amount in the first table.
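The one-to-two matching rule above can be expressed directly in code. The sketch below is illustrative only: the field names, the sample records and the function `match_one_to_two` are hypothetical, and amounts are held as integers (for instance in the smallest currency unit) so that the equality test is exact.

```python
from itertools import combinations


def match_one_to_two(single, candidates):
    """Find two candidates that together match a single transaction.

    Both candidates must carry the same instrument ID as `single`, and
    their amounts must sum to its amount. Returns the pair, or None.
    """
    same_instrument = [
        c for c in candidates if c["instrument_id"] == single["instrument_id"]
    ]
    for first, second in combinations(same_instrument, 2):
        if first["amount"] + second["amount"] == single["amount"]:
            return first, second
    return None


single = {"txn_id": "T1", "instrument_id": "SEC1", "amount": 100000}
candidates = [
    {"txn_id": "A1", "instrument_id": "SEC1", "amount": 60000},
    {"txn_id": "A2", "instrument_id": "SEC2", "amount": 40000},
    {"txn_id": "A3", "instrument_id": "SEC1", "amount": 40000},
]
pair = match_one_to_two(single, candidates)
print([t["txn_id"] for t in pair])  # ['A1', 'A3']
```

Transaction A2 has the right amount but a different instrument ID, so it is never considered. Checking every pair is quadratic in the number of candidates per instrument, which is acceptable for a sketch but may need indexing for large volumes.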
The basic premise of our approach to extracting text from PDF documents is to exploit the structural layout properties and the position of the text within a document to solve the problem of table data extraction (most of the transaction data is in the form of tables). We break table data extraction into two steps: table boundary identification and table decomposition.
We formulate table boundary identification as a sequence labeling task and use a machine learning technique, Conditional Random Fields (CRFs). Hand-crafted rules and heuristics were implemented for table decomposition. The approach was applied to synthetic and real-world data. Tables could be extracted from input PDF files with an accuracy of around 90% for tables without column-spanning or row-spanning cells in single-column PDF documents. The standardization and matching process, which requires access to domain knowledge, was accomplished with 100% accuracy for a specific dataset and problem.
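The paper does not list its decomposition rules, so the following is only a sketch of the kind of position-based heuristic that could be used once a table's boundary is known. It assumes that a PDF parser has already produced text fragments with x and y coordinates (the dictionary keys, tolerances and function names are all hypothetical), and that the table has no row-spanning or column-spanning cells. It does not cover the CRF-based boundary identification step.

```python
def group_rows(fragments, y_tol=2.0):
    """Group fragments into rows: fragments within y_tol of a row's first
    fragment are treated as lying on the same text line."""
    rows = []
    for frag in sorted(fragments, key=lambda f: (f["y"], f["x"])):
        if rows and abs(rows[-1][0]["y"] - frag["y"]) <= y_tol:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return rows


def column_starts(rows, x_tol=5.0):
    """Cluster the left edges of all fragments into column start positions."""
    xs = sorted(f["x"] for row in rows for f in row)
    starts = []
    for x in xs:
        if not starts or x - starts[-1] > x_tol:
            starts.append(x)
    return starts


def decompose(fragments):
    """Turn positioned text fragments into a list of rows of cell strings."""
    rows = group_rows(fragments)
    starts = column_starts(rows)
    table = []
    for row in rows:
        cells = [""] * len(starts)
        for frag in sorted(row, key=lambda f: f["x"]):
            # Assign each fragment to the nearest column start.
            col = min(range(len(starts)), key=lambda i: abs(starts[i] - frag["x"]))
            cells[col] = (cells[col] + " " + frag["text"]).strip()
        table.append(cells)
    return table


fragments = [
    {"x": 50.0, "y": 100.0, "text": "TID"},
    {"x": 100.0, "y": 100.0, "text": "Category"},
    {"x": 180.0, "y": 100.0, "text": "Units"},
    {"x": 50.0, "y": 112.0, "text": "01"},
    {"x": 100.5, "y": 112.0, "text": "X"},
    {"x": 181.0, "y": 112.0, "text": "55"},
    {"x": 50.2, "y": 124.3, "text": "02"},
    {"x": 100.0, "y": 124.3, "text": "X"},
    {"x": 180.4, "y": 124.3, "text": "60"},
]
print(decompose(fragments))
# [['TID', 'Category', 'Units'], ['01', 'X', '55'], ['02', 'X', '60']]
```

The tolerances absorb small positional jitter between lines; real documents with right-aligned numbers or wrapped cells would need additional rules.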
Figure 5: Sample Knowledge Base or Ontology for Reconciliation
(Diagram labels: Transaction; the relations 'has', 'consists of', 'also called as', 'equivalent to' and 'data type'; the attributes Units, Lots, Buy/Sell, Category and Type; the data types Numeric and Categorical; the values A, B, X and Y.)
Source: Infosys Research
TID   Category   B/S   Units   Month
01    X          B     55      May
02    X          S     60      May
03    Y          S     150     June
Table 1: Transactions in Statement 1
Source: Infosys Research
TID   Buy / Sell   Lots   Type   Tax Paid
01    Buy          11     A      Yes
02    Sell         12     A      No
03    Sell         75     A      Yes
Table 2: Transactions in Statement 2
Source: Infosys Research
Case Study 2 - Spend Data Analytics
Analyzing spend data (money spent on procuring direct and indirect goods and services) is an important and strategic activity for organizations seeking to optimize cost and realize savings. The purpose of spend analysis is to get an aggregate view (as well as a drill-down or break-up view) of the spend data and to gain insights from it in order to devise a sourcing strategy, negotiate better prices and consolidate suppliers and vendors. This activity is especially critical for organizations that have a global presence, have multiple offices and departments, are diversified and deal with multiple suppliers.
As illustrated in Figure 6, the spend data categorization workflow consists of three main activities: aggregation, cleaning and classification.
Data aggregation consists of bringing data from different transaction systems into a single data warehouse or database. An organization's spend data can lie in multiple heterogeneous data sources (because of multiple departments, operations in several locations and multiple lines of business) that need to be loaded into a single repository for analysis.
The first step in the end-to-end process consists of identifying the various data sources and bringing the required data into a single pre-defined repository. The next step is data cleaning, in which operations such as missing value analysis, data standardization, outlier detection and removal, and data normalization are performed. This is an important step: any errors here are likely to affect the accuracy of the following step, data classification.
The data cleaning step enhances the overall quality of the data, as the accuracy and reliability of the analysis results depend heavily on the quality of the underlying data. Figure 7 illustrates one example of data normalization within the data cleaning step. Consider 'Infosys Technologies Limited' as one of the vendor companies listed in the spend data transactions of an organization. Figure 7 shows five different ways in which 'Infosys Technologies Limited' can be referred to. Such situations, in which a single entity is referred to in varied forms in multiple places, are common in spend data transactions, and data normalization needs to be performed to correct such data quality issues.
Figure 6: Spend Data Categorization Workflow
(Diagram labels: Spend Data (Location A); Spend Data (Location B); Aggregation; Cleaning; Classification.)
Source: Infosys Research
Figure 7: An Example of Knowledge-base for the Purpose of Term or Phrase Normalization
(Diagram labels: the variants Infosys, Infosys Technologies, Infosys Technologies Limited, ITL and Infosys Ltd, and the normalized form Infosys Technologies Limited.)
Source: Infosys Research
A knowledge base of terms and phrases and their variants can be leveraged to perform data normalization. Simple heuristics and pattern-matching techniques on text strings can also be applied to address normalization issues. After data cleaning is performed and all aspects of data quality are addressed, the final step of data classification is performed.
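A variant knowledge base of this kind can be as simple as a lookup table combined with light string clean-up. The sketch below uses the five variants shown in Figure 7; the table layout and the function names `clean` and `normalize_vendor` are assumptions made for illustration.

```python
import re

CANONICAL = "Infosys Technologies Limited"

# Variant knowledge base, keyed by the cleaned form of each variant
# listed in Figure 7.
VARIANTS = {
    "infosys": CANONICAL,
    "infosys technologies": CANONICAL,
    "infosys technologies limited": CANONICAL,
    "itl": CANONICAL,
    "infosys ltd": CANONICAL,
}


def clean(name):
    """Simple heuristic: lower-case, drop punctuation, collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return re.sub(r"\s+", " ", name).strip()


def normalize_vendor(name, variants=VARIANTS):
    """Return the canonical vendor name, or the input if no variant matches."""
    return variants.get(clean(name), name)


print(normalize_vendor("Infosys Ltd."))  # Infosys Technologies Limited
print(normalize_vendor("  ITL "))        # Infosys Technologies Limited
print(normalize_vendor("Acme Corp"))     # Acme Corp
```

Names with no known variant are returned unchanged so that they can be reviewed and, where appropriate, added to the knowledge base by a domain expert.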
Data classification and categorization consists of assigning a transaction to one of the nodes in a product taxonomy (a standard taxonomy or a company-specific one) [3, 8].
Figure 8 illustrates a small section of the UNSPSC codes under the consumer electronics category. As illustrated in the figure, each item in the spend dataset needs to be assigned to the correct node in the spend data taxonomy. The classification is performed based on attributes such as the item code, item name, item description and vendor name.
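As a rough illustration of keyword-based classification, the sketch below assigns a transaction to one of a few commodity nodes from Figure 8 by counting keyword overlaps. The codes and titles are copied from the figure; the keyword sets, field names and the function `classify` are hypothetical, and a real solution would typically combine such rules with a trained classifier.

```python
import re

# Commodity nodes from Figure 8; the keyword sets are illustrative only.
TAXONOMY = {
    "52161502": ("Cassette players or recorders", {"cassette", "player", "recorder"}),
    "52161505": ("Televisions", {"television", "tv"}),
    "52161507": ("Clock radios", {"clock", "radio"}),
    "52161604": ("Headphone jack adapters", {"headphone", "jack", "adapter"}),
}

TEXT_FIELDS = ("item_name", "item_description", "vendor_name")


def tokens(text):
    """Split text into a set of lower-case alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify(transaction):
    """Return the UNSPSC code with the most keyword hits, or None."""
    words = tokens(" ".join(transaction.get(f, "") for f in TEXT_FIELDS))
    best_code, best_score = None, 0
    for code, (_title, keywords) in TAXONOMY.items():
        score = len(words & keywords)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


item = {"item_name": "LCD TV 32 inch", "item_description": "Flat panel television"}
print(classify(item))                             # 52161505
print(classify({"item_name": "Office chair"}))    # None
```

Transactions that return None have no keyword evidence and would be routed to a knowledge worker for manual classification.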
As illustrated in Figure 6, the data cleaning and data classification steps involve a lot of manual effort and analysis. They are knowledge-intensive activities that also involve dealing with textual data. Many companies analyze spend data manually: a knowledge worker reads each transaction and classifies it into one of the categories in the product taxonomy. Cleaning and classifying transactions requires knowledge about different products, product categories and sub-categories, the supplier company, supplier details, the parent company, product codes, etc., which makes it both a knowledge-intensive and a text processing task. Knowledge engineering and intelligent text processing techniques can be used to automate this process. Techniques for named-entity extraction, data normalization and data classification or categorization (based on keywords and semantics) are developed for use in a spend data analytics solution. The data cleaning (or pre-processing) and data classification systems need to be customized and configured to solve the spend data problem optimally for a specific domain, industry or organization.
CHALLENGES
The key challenges in developing knowledge-intensive, text-related business process automation applications are as follows:
Capturing Expert Knowledge: Normally, knowledge workers performing such knowledge-intensive, text-related back-end business processes undergo several days or weeks of training (depending on the process and its complexity) to understand the details of the processes and activities that need to be performed. Acquiring this knowledge and representing it in a machine-readable format is a challenging task. In our implementation of the case studies illustrated in this paper, we captured knowledge from an expert using a graphical user interface designed specifically for a particular problem. The knowledge captured is stored in the form of rules and an ontology.
UNSPSC Code 52160000: Consumer electronics
  UNSPSC Code 52161500: Audio and visual equipment
    UNSPSC Code 52161502: Cassette players or recorders
    UNSPSC Code 52161505: Televisions
    UNSPSC Code 52161507: Clock radios
    UNSPSC Code 52161508: Laser disc players
  UNSPSC Code 52161600: Audio visual equipment accessories
    UNSPSC Code 52161601: Cassette storage
    UNSPSC Code 52161602: Audio or video head cleaners
    UNSPSC Code 52161603: Compact video cassette adapter
    UNSPSC Code 52161604: Headphone jack adapters
Figure 8: A Small Section of UNSPSC Codes Under the Consumer Electronics Category
Source: Infosys Research
Varied Input Data Format: Dealing with all the variations in the input dataset (e.g., file type, format and layout) is a challenging task. The unstructured textual data that needs to be processed can come from different departments and organizations, each with its own format and layout. Each format and document type needs to be studied carefully, and appropriate data adapters and text parsing routines need to be implemented. In our implementation of the case studies illustrated in this paper, we developed a text processing module to extract data from a specified format pertaining to the specific business process and client organization.
Integration with Current Process and System: Automating some of the manual activities will change the workflow and existing systems. Integrating the transformed system into the existing technology infrastructure with minimal changes and additional effort is an important and challenging activity. The design and implementation of the transformed system should be sensitive to such issues, so that minimal resources and time are consumed in incorporating the transformed system within the organization.
Change Management: People are used to
working with an existing system and hence
a disciplined change management initiative
is required before introducing a transformed
system and replacing the old process with the
new process. It is important to articulate the
value proposition and the working of the new
system to all the stakeholders before rolling it
out within the organization.
CONCLUSION
The application of knowledge engineering and text extraction techniques to knowledge-intensive, text-related business process automation is a promising area, especially in the business process outsourcing world. Advances in the state of the art in text analytics and knowledge engineering will enable automation or semi-automation of manually intensive back-office business processes that involve information extraction from unstructured textual data. Two main technical challenges need to be addressed. One is acquiring domain knowledge from an expert and encoding it in a machine-understandable format; the other is developing text extraction systems that can accurately extract the required information from diverse input data types and formats. Several applications spanning industry verticals fall at the intersection of text analytics, KE and BPA, but based on our recent experience, the greatest traction and demand for such technology is seen in the finance and banking industry.
REFERENCES
1. Guus Schreiber, Hans Akkermans, Anjo Anjewierden, Robert de Hoog, Nigel Shadbolt, Walter Van de Velde and Bob Wielinga, Knowledge Engineering and Management: The CommonKADS Methodology, MIT Press, December 1999
2. Engineering and Managing Knowledge, CommonKADS Homepage. Available at http://www.commonkads.uva.nl/
3. Saikat Mukherjee, Dmitriy Fradkin and Michael Roth, Classifying Spend Transactions with Off-the-Shelf Learning Components, IEEE International Conference on Tools in Artificial Intelligence (ICTAI), Vol 1, November 2008. Available at http://paul.rutgers.edu/~dfradkin/papers/mukherjee-Spend.pdf
4. Moninder Singh and Jayant R Kalagnanam, Using Data Mining in Procurement Business Transformation Outsourcing, 12th ACM SIGKDD Conference on Knowledge Discovery and Data Mining KDD 2006 Workshop on Data Mining for Business Applications, August 2006
5. Moninder Singh, Jayant R Kalagnanam, Sudhir Verma, Amit J Shah and Swaroop K Chalasani, Automated Cleansing for Spend Analytics, Proceedings of the 14th ACM International Conference on Information and Knowledge Management, 2005
6. Andrew Bartels, Market Overview 2008: Automated Spend Analysis, Forrester Research, April 2008
7. The United Nations Standard Products and Services Code (UNSPSC). Available at http://www.unspsc.org
8. eCl@ss: International Standard for the Classification and Description of Products and Services. Available at http://www.eclass-online.com.
THE LAST WORD
Power Your Enterprise with
Knowledge. Be Smart.
Humans have been manually using
knowledge for millions of years to
solve problems in all areas of their endeavor:
personal, social, scientific or business.
Professionally speaking, I have reasons to
believe that the time has come to empower
information systems with knowledge to create
smart enterprises.
Knowledge engineering (KE) has
emerged as a mature discipline over the last three
decades, enabling IT professionals to integrate
human and machine-derived knowledge into
IT systems in order to solve complex problems
faster and/or more accurately. KE broadly
consists of the processes, models and techniques
to elicit, structure, formalize and operationalize
information and knowledge.
Theoretically speaking, KE is
predominantly seen as a branch of artificial
intelligence (AI), which in turn integrates
multiple disciplines such as knowledge
representation and reasoning, problem solving
and machine learning. Some of the popular
knowledge representation and reasoning systems
include ontologies, rule (logic)-based systems,
case-based reasoning systems, neural networks,
Bayesian networks and fuzzy systems.
It is an accepted fact that the two generic
types of competitive advantage for business are
cost leadership and differentiation. Let us consider
a few interesting scenarios where knowledge
forms the key differentiator of business.
Consider the case of extracting qualitative
and quantitative financial information, such as
trends in the growth of revenues and net profits,
from quarterly and annual reports of industries.
Financial services organizations traditionally
accomplish these activities manually through
highly paid financial experts. Current trends
demonstrate the feasibility of using
domain-specific knowledge to semi-automatically
extract such financial information from a range
of unstructured data sources. One should note
here that every small percentage improvement in
the productivity or quality of this service leads to
both competitive advantage and enhanced profits.
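As a rough illustration of what semi-automatic extraction with domain-specific knowledge might look like, the sketch below encodes a tiny, hypothetical vocabulary of financial metrics and trend verbs as a regular expression and pulls signed percentage changes out of report text. The pattern, the vocabulary and the function name `extract_trends` are assumptions made for illustration only; a real system would draw on far richer linguistic and domain knowledge.

```python
import re

# Hypothetical domain vocabulary (an assumption, not taken from the article):
# metric names and the verbs that signal growth or decline.
PATTERN = re.compile(
    r"(?P<metric>revenues?|net[- ]profits?)\s+"
    r"(?P<verb>grew|rose|increased|fell|declined|decreased)\s+"
    r"(?:by\s+)?(?P<pct>\d+(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)
GROWTH_VERBS = {"grew", "rose", "increased"}


def extract_trends(text):
    """Return (metric, signed percentage change) pairs found in text."""
    trends = []
    for match in PATTERN.finditer(text):
        # Normalize "Net-Profits" / "revenues" to "net profit" / "revenue".
        metric = match.group("metric").lower().replace("-", " ").rstrip("s")
        pct = float(match.group("pct"))
        if match.group("verb").lower() not in GROWTH_VERBS:
            pct = -pct  # decline verbs yield a negative change
        trends.append((metric, pct))
    return trends


report = "Revenues grew 12.5% in Q2, while net-profit declined by 3% year on year."
print(extract_trends(report))
```

Sentences that fall outside the encoded vocabulary are simply skipped, which is why such a pipeline stays semi-automatic: an expert still reviews what the rules miss.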
Competitive advantage has become the most
stable source for differentiation today. But
aren't imitators catching up? Consulting editor
Dr. T R Babu feels that only knowledge-powered
enterprises can continue to remain competitive.
A knowledge-based automated human
face recognition system that combines image
processing and pattern recognition is much more
efficient than a conventional system when it makes
use of the knowledge of complexion or racial
groups available within data, anthropological
facts, image background and other artifacts.
Intelligent, multi-agent systems have
the potential to offer a new gamut of business
opportunities across almost all the vertical and
horizontal industry segments. These opportunities
include smart personalized software assistants on
mobile handheld devices that enable their owners
to know the latest raw and derived information
of interest, anywhere and anytime.
In the aerospace industry, one
encounters a number of failures during
subsystem testing in spite of careful design of
each subsystem and their subsequent harness.
In aerospace systems, tolerance of error is zero.
There cannot be partial successes and most
subsystem failures turn out to be catastrophic
for the overall mission. Such failures during
subsystem testing and integration form a
learning experience ultimately leading to
the desired system. Usually such failures are
handled by domain experts and their sources
are identified through elaborate brainstorming.
A KE-based system that mines the subsystem
test results data and provides leads to possible
cause-and-effect relationships is of immense
use. Such a system includes a number of
parameters that are beyond the list suggested
by conventional wisdom. When the set of
inputs for such failure analysis emanates from
a knowledge-based system, it forms the key
differentiator for business advantage over
rival industries.
Semantic Business Intelligence (BI)
technologies combine data mining from
structured and unstructured sources with
semantic mappings. Consider a retail business
that consists of a chain of malls. An enormous
amount of knowledge can be derived from
sales transaction data with the help of
data mining techniques that detect subtle
relationships among the items purchased.
Such knowledge helps in identifying customer
preferences, groups of items that are bought
together, the amount of time spent in locating
such items, etc. Such knowledge can provide
immense advantage to the retailer in that it
helps the retailer ensure better arrangement
of items within the mall to reduce redundant
occupancy, work out product promotion
strategies and, most importantly, entice
customers and work out customer
retention strategies.
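The idea of detecting items that are bought together can be sketched with a very small market-basket calculation. The example below counts how often pairs of items co-occur in transactions and reports the support and confidence of each resulting rule. The function name `pair_rules`, the `min_support` threshold and the sample baskets are hypothetical and chosen only for illustration; they are not drawn from any retailer's data or from a specific BI product.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of transactions containing both items
    confidence = share of antecedent transactions that also contain the consequent
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    # Strongest rules first; ties are broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Illustrative baskets (made-up data).
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
]
for antecedent, consequent, support, confidence in pair_rules(baskets):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence, such as one saying that buyers of one item nearly always pick up another, is the kind of lead a retailer might use when deciding shelf placement or promotions.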
To sum up, when KE techniques are
applied to specific engineering or business
problems that have well-defined boundaries
to extract, refine and re-use knowledge, it
is possible to create smarter enterprise IT
systems. Such systems offer faster and/or
more accurate results, leading to competitive
advantages.
About the Consulting Editor
Dr. T. Ravindra Babu is a Principal Researcher
with the Education and Research unit of Infosys
Technologies Limited. He holds a PhD for his
work in the areas of Pattern Recognition and Data
Mining from the Indian Institute of Science, Bangalore.
He earlier worked for the Indian Space Research
Organization (ISRO) for over 24 years. Dr. Babu can
be reached at ravindrababu_t@infosys.com.
Index
Algorithm 4, 27, 55, 61-62, 65
Constraints-based 65
Genetic 4
Inference 55
Junction Tree 55
Knowledge-based 61-62
Likelihood Weighting 55
Pearl 55
Stemming 27
Variable Elimination 55
Analysis 26-27, 58, 60-63, 67, 69, 72-73, 77, 82
Data 60, 77
Failure 82
Financial 61-63, 67
Identifier 26
Impact 27
Invalid Token 27
Knowledge 73
Process 69, 72
Sensitivity 58
Spend 77
Valid Token 27
Analytics 4, 64, 69-71, 73, 77-79
Spend Data 73, 77-78
Text 4, 64, 69-71, 73, 79
Approach 4, 13, 16, 47, 51, 53, 56, 58, 63, 70-71,
75-76
Activity-based 16
Bucketization 56
Collaborative 4
Content-based 4
Deployment 51
Differentiated 70
Growth-Decline 56
Heuristic-based 63
Knowledge-based 47
Sensitivity Analysis 58
Structured 71
Systematic 53
Task Oriented 13
Text Extraction 76
Text Processing 75
Bayesian Network, also BN 53-54, 59-60, 81
Business Process Automation, also BPA 69,
71-73, 79
Conditional Probability Table, also CPT 54-55,
57
Conditional Random Fields, also CRF 63, 68, 77
Data 77-78
Aggregation 77
Categorization 77
Classification 77-78
Cleaning 77-78
Extraction 26, 61-65, 67-69, 75-76, 79
Identifier 26
Information 61-64, 67-68, 75, 79
Named Entity 64-65, 68
Ontology 26
Table Data 62-63, 76
Text 69, 76, 79
Token 26
Fuzzy Model, also FM 53
Geographically Dispersed Team, also GDT 23
Information 3, 5, 13-19, 21-23
Acquisition 3
Archive 14
Database 5
Management Systems 14
Overload 15, 23
Service Agents 5
Usage Model 16-17
Warehouse, also IW 13-19, 21-22
IPTV 47
Knowledge 13-16, 18, 22, 33-35, 37-38, 63-64
Evaluation 34
Process Outsourcing, also KPO 63-64
Sharing 33-35, 37-38
Storage 34
Work Support System, also KWSS 13-16,
18, 22
Workgroup 35
Model 6, 10, 23, 52, 71, 72
Activity Dependency 16
Black Box 10
Business Process 23, 72
Coordination 10
Information Usage, see under
Information
Mental 23
Redux Dependencies 10
Scenario Based Customer Service
Solution, also SBCS 52
Software Development Life Cycle, also
SDLC 71
Neural Network, also NN 53, 81
Ontology 6, 15, 23-26, 28-29, 62, 65-66, 70, 72, 76,
79, 81
Stem Filtering 27
Support Vector Machines, also SVM 53
VoIP 47-48, 52
WordNet 26-27, 29
XML 21, 24-25, 61, 63, 65-68
SETLabs Briefings
BUSINESS INNOVATION through TECHNOLOGY
Editor
Praveen B Malla PhD
Guest Editor
Ravi P Gorthi PhD
Consulting Editor
T Ravindra Babu PhD
Deputy Editor
Yogesh Dandawate
Copy Editor
Sudarshana Dhar
Graphics & Web Editors
Ashutosh Panda
Jayesh Sivan
Srinivasan G
Sudheesh Sreedharan
IP Manager
K V R S Sarma
Marketing Manager
Pavithra Krishnamurthy
Online Support
Rahil Arora
Production Manager
Sudarshan Kumar V S
Distribution Managers
Santhosh Shenoy
Suresh Kumar V H
How to Reach Us:
Email:
SETLabsBriefings@infosys.com
Phone:
+91-080-41173871
Fax:
+91-080-28520740
Post:
SETLabs Briefings,
B-19, Infosys Technologies Ltd.
Electronics City, Hosur Road,
Bangalore 560100, India
Subscription:
setlabsbriefings@infosys.com
Rights, Permission, Licensing
and Reprints:
praveen_malla@infosys.com
Editorial Office: SETLabs Briefings, B-19, Infosys Technologies Ltd.
Electronics City, Hosur Road, Bangalore 560100, India
Email: SetlabsBriefings@infosys.com http://www.infosys.com/setlabs-briefings
SETLabs Briefings is a journal published by Infosys Software Engineering
& Technology Labs (SETLabs) with the objective of offering fresh
perspectives on boardroom business technology. The publication aims at
becoming the most sought-after source for thought-leading, strategic and
experiential insights on business technology management.
SETLabs is an important part of Infosys' commitment to leadership
in innovation using technology. SETLabs anticipates and assesses the
evolution of technology and its impact on businesses. It enables Infosys
to constantly synthesize what it learns, catalyze technology-enabled
business transformation and thus assume leadership in providing
best-of-breed solutions to clients across the globe. This is achieved through research
supported by state-of-the-art labs and collaboration with industry leaders.
Infosys Technologies Ltd (NASDAQ: INFY) defines, designs and delivers
IT-enabled business solutions that help Global 2000 companies win in a
flat world. These solutions focus on providing strategic differentiation
and operational superiority to clients. Infosys creates these solutions
for its clients by leveraging its domain and business expertise along
with a complete range of services. With Infosys, clients are assured of a
transparent business partner, world-class processes, speed of execution
and the power to stretch their IT budget by leveraging the Global Delivery
Model that Infosys pioneered. To find out how Infosys can help businesses
achieve competitive advantage, visit www.infosys.com or send an email to
infosys@infosys.com
© 2009, Infosys Technologies Limited
Infosys acknowledges the proprietary rights of the trademarks and product names of the other companies
mentioned in this issue. The information provided in this document is intended for the sole use of the recipient
and for educational purposes only. Infosys makes no express or implied warranties relating to the information
contained herein or to any derived results obtained by the recipient from the use of the information in this
document. Infosys further does not guarantee the sequence, timeliness, accuracy or completeness of the
information and will not be liable in any way to the recipient for any delays, inaccuracies, errors in, or omissions
of, any of the information or in the transmission thereof, or for any damages arising there from. Opinions and
forecasts constitute our judgment at the time of release and are subject to change without notice. This document
does not contain information provided to us in confidence by our clients.
Authors featured in this issue
ABHISHEK KUMAR
Abhishek Kumar is a Software Engineer at Center for Knowledge Driven Information Systems (CKDIS) at Infosys. He
can be contacted at abhishek_kumar25@infosys.com.
ANJANEYULU PASALA
Anjaneyulu Pasala PhD is a Senior Research Associate at SETLabs, Infosys. His research interests include Software
Engineering and Software Verification and Validation. He can be reached at Anjaneyulu_Pasala@infosys.com.
ANJU G PARVATHY
Anju G Parvathy is a Junior Research Associate with CKDIS at Infosys. She researches in the fields of NLP and Text
Analytics. She can be contacted at anjug_parvathy@infosys.com.
ARIJIT LAHA
Arijit Laha PhD is a Senior Research Associate at SETLabs, Infosys. He researches in Knowledge Work Support
Systems, Pattern Recognition and Fuzzy Set Theory. He can be reached at Arijit_Laha@infosys.com.
ARUN SETHURAMAN
Arun Sethuraman was a Junior Research Associate at SETLabs, Infosys. His research interests include Intelligent
Multi-Agent Systems and Phylogenetics.
ASHISH SUREKA
Ashish Sureka PhD is a Senior Research Associate at SETLabs, Infosys. His research interests are in the areas of Data
Mining and Text Analytics. He can be reached at Ashish_Sureka@infosys.com.
BINTU VASUDEVAN
Bintu G Vasudevan PhD is a Research Associate with CKDIS at Infosys. His research interests include NLP, AI and
Text Analytics. He can be contacted at bintu_vasudevan@infosys.com.
GEORGE ABRAHAM
George Abraham is an Associate Consultant with the Oracle Business Intelligence practice at Infosys. His areas of
interest include Business Intelligence and Innovation Systems. He can be reached at george_abraham01@infosys.com.
JOHN KURIAKOSE
John Kuriakose is a Software Architect with SETLabs, Infosys. He has research interests in semantic technologies and
knowledge engineering. He can be contacted at john_kuriakose@infosys.com.
JOYDIP GHOSHAL
Joydip Ghoshal is a Programmer Analyst at Infosys Technologies Limited. He has vast experience in business
analysis and software development projects. He can be reached at joydip_ghoshal@infosys.com.
KOMAL KACHRU
Komal Kachru is a Researcher with SETLabs, Infosys Technologies. She has several years of research experience in
areas like Artificial Neural Network and Genetic Algorithms. She can be contacted at Komal_Kachru@infosys.com.
MANISH KURHEKAR
Manish Kurhekar is a Programmer Analyst at Infosys Technologies Limited. He has rich experience in Business
Analysis and software development projects. He can be reached at manish_kurhekar@infosys.com.
NIRANJANI S
Niranjani S is a Software Engineer in the Test Automation Lab at SETLabs, Infosys. She can be contacted at
Niranjani_S@infosys.com.
RAJESH BALAKRISHNAN
Rajesh Balakrishnan is a Principal Architect with CKDIS at Infosys Technologies Limited. He has research interests
in NLP, AI and Information Retrieval. He can be reached at rajeshb@infosys.com.
RAJESH ELUMALAI
Rajesh Elumalai is an Associate Consultant with the BPM-EAI Practice at Infosys. His areas of specialization include
BPM and Business Rules Management. He can be contacted at Rajesh_Elumalai@Infosys.com.
RAKESH KAPUR
Rakesh Kapur is a Principal Consultant at Infosys Consulting Services. His key areas of interest include consulting
enterprises to enable process transformation. He can be reached at rakesh_kapur@infosys.com.
RAVI GORTHI
Ravi Gorthi PhD is a Principal Researcher with SETLabs, Infosys. His research interests include Knowledge
Engineering and Model Driven Software Engineering. He can be contacted at Ravi_Gorthi@infosys.com.
SUJATHA R UPADHYAYA
Sujatha R Upadhyaya PhD is a Researcher with SETLabs, Infosys. Her research interests include Knowledge Modeling,
Ontologies, Machine Learning and Text Analytics. She can be reached at Sujatha_Upadhyaya@infosys.com.
SWAMINATHAN NATARAJAN
Swaminathan Natarajan is a Senior Technical Architect with SETLabs, Infosys. His areas of interest include Information
Management and Knowledge Engineering. He can be contacted at Swaminathan_N01@infosys.com.
VENUGOPAL SUBBARAO
Venugopal Subbarao is a Principal Architect with SETLabs, Infosys. His interests are in Information Management and
Knowledge Engineering. He can be reached at venugopal_subbarao@infosys.com.
YOGESH DANDAWATE
Yogesh Dandawate is a Researcher with SETLabs, Infosys. His research interests include Knowledge Engineering,
Ontologies and Text Analytics. He can be contacted at yogesh_dandawate@infosys.com.
SETLabs Briefings
Advisory Board
Gaurav Rastogi
Associate Vice President,
Head - Learning Services

George Eby Mathew
Senior Principal,
Infosys Australia
Kochikar V P PhD
Associate Vice President,
Education & Research Unit
Raj Joshi
Managing Director,
Infosys Consulting Inc.
Rajiv Narvekar PhD
Manager,
R&D Strategy
Software Engineering &
Technology Labs
Ranganath M
Vice President &
Chief Risk Officer
Subu Goparaju
Vice President & Head,
Software Engineering &
Technology Labs
Knowledge Powered
IT Systems
In the last three decades, information technology has evolved and matured as
a dependable online business transaction processing (OLTP) technology. Some
trillions of business transactions are processed across the world per day and it
is no surprise that millions of people trust the integrity of
this technology. In addition, the last decade has witnessed the availability
and widespread use of online analytical processing (OLAP) tools that offer
multidimensional insights into the latest enterprise information to business
decision-makers.
Concurrent with the above evolution, the field of Artificial Intelligence (AI)
has gone through a series of serious challenges in bringing knowledge into
automated reasoning and action. However, the recent success stories in
applying AI techniques to specific business problems hold out the promise that
this field has begun to offer acceptable benefits to the business community. A
paradigm shift in information technology, termed Knowledge Powered IT
(KPIT) systems, is anticipated. These KPIT systems should enable business users
- semi-automatically or in human-assisted ways - to extract, refine and re-use
actionable enterprise knowledge. For example, the knowledge of experienced
professionals who can diagnose and repair complex engineering artifacts with
expert skill, and who constitute a small percentage, can be made available to novices,
who constitute a large percentage, in an attempt to raise the productivity and/
or quality of the novice group. Knowledge Engineering is a critical aspect of
KPIT systems. This discipline covers models to represent various kinds of
knowledge and techniques to extract, refine and re-use such knowledge, where
and when required.
This issue aims to present a landscape picture of emerging trends in business
applications of knowledge engineering that can potentially empower enterprises
to be smart. Be it the usage of divergent terminology to refer to common
business concepts across enterprise IT systems or the usage of domain-specific
knowledge to automatically extract financial data from complex unstructured
sources, the ultimate goal of knowledge engineering is to enable enterprises
to move from the traditional way of managing enterprises to
knowledge-oriented and knowledge-powered management. All the papers in this collection
weave around a very potent theme: knowledge-powered systems for sharp
decision making and efficient management.
We hope you enjoy reading this issue as much as we have enjoyed putting it together
for you. Needless to mention, your feedback will help us in our pursuit to
bring you insights into technology trends through special issues of SETLabs
Briefings such as this one. Please do write in to me with your comments and
suggestions.
Ravi P Gorthi PhD
ravi_gorthi@infosys.com
Guest Editor