Published by: Betsey Merkel on Jan 19, 2010
Copyright: Attribution Non-commercial







An Operational Test of KnowledgeNet
Paul Thompson
Thayer School of Engineering and Department of Computer Science, Dartmouth College
Hanover, New Hampshire 03755, U.S.A.
This paper describes an operational test of the KnowledgeNet system for knowledge management. KnowledgeNet is based on probabilistic design principles first developed for document retrieval, but which have been applied to the retrieval of people within an organization as the sources of information. This operational test of KnowledgeNet was conducted at the Caltex operation at the Minas oil field in Indonesia. An initial analysis of the results of this test shows that employees at Tripatra were able to probabilistically index themselves as sources of information accurately enough to provide useful retrieval of their expertise by their supervisors. This technology shows promise for knowledge management not only within an organization, but also at the national level. The expertise of workers no longer within an organization could be maintained within a national labor database, facilitating the worker’s rehire by another organization.
Putting the knowledge existing in the minds of its members to effective use is a challenge facing any organization. The problem tends to grow with the size of the organization. One of the earliest applications of the computer, dating back to the 1950s, was to solve what was called the library problem, using computers to find bibliographic records that could then be used to find documents. In those days computerized retrieval systems were usually used to find documents for scientific researchers. It was commonplace then to observe that people were living in an information age and that there was an information overload problem. These early computerized retrieval systems were based on an exact match retrieval methodology referred to as Boolean logic. A searcher would construct a query expressing his or her information need using terms from the vocabulary used to represent the bibliographic records held in the system, connecting the terms with the Boolean operators “AND”, “OR”, and “NOT”, as well as proximity operators permitting the match of records where, for example, term 1 was within 2 words of term 2. Using such queries a user would tend to find either far too many records (information overload) or few, if any, records.

Much has changed since those days. Computerized retrieval systems now provide access to the full text of documents. Database retrieval systems have been developed which store an organization’s data in structured databases to be retrieved by an exact match query language, SQL, which is not unlike the Boolean retrieval languages of the 1950s. Still, such database records are a very small percentage of the computerized data held by most organizations. Most data is textual.

Textual retrieval methods have changed as well. Until about 10 years ago Boolean retrieval systems were virtually the only ones available commercially. Since then major online retrieval services have provided optional ranked retrieval methods and, more significantly, the World Wide Web has brought about a new type of retrieval with its web search engines. Ranked retrieval methods use various metrics to measure the similarity of a user’s query to the text of documents in order to rank documents, so that even if, say, a thousand documents are retrieved by a query, the user can be shown the top ten or twenty documents, which are more likely to be relevant than lower ranked documents. Although these ranked retrieval techniques worked well with well-organized collections such as those maintained by the major online service providers, they did not work well on the Web, especially as it grew larger. A few years ago it was common to do a search on one of the major web search engines and retrieve more than five million poorly ranked documents. More recently web search engines have developed new algorithms that are more suited to the nature of the Web, e.g., its hyper-link structure, and better retrieval is being provided.
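The contrast between exact match and ranked retrieval can be made concrete with a small sketch. The toy documents, the function names, and the choice of cosine similarity over raw term counts are all assumptions made for illustration; the paper does not specify any particular similarity metric, and "within k words" is interpreted here as a difference of at most k token positions.

```python
import math
from collections import Counter

# Toy collection; the texts are invented for illustration only.
DOCS = {
    "d1": "pump maintenance schedule for oil field operations",
    "d2": "oil pump failure report and pump repair notes",
    "d3": "library catalogue of bibliographic records",
}

def tokens(text):
    """Lower-case whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()

def boolean_and(docs, *terms):
    """Exact match: keep only the documents that contain every term."""
    return sorted(d for d, text in docs.items()
                  if all(t in tokens(text) for t in terms))

def within(docs, term1, term2, k):
    """Proximity operator: term1 occurs within k token positions of term2."""
    hits = []
    for d, text in docs.items():
        toks = tokens(text)
        pos1 = [i for i, t in enumerate(toks) if t == term1]
        pos2 = [i for i, t in enumerate(toks) if t == term2]
        if any(abs(i - j) <= k for i in pos1 for j in pos2):
            hits.append(d)
    return sorted(hits)

def ranked(docs, query):
    """Ranked retrieval: score every document by cosine similarity of term counts."""
    q = Counter(tokens(query))
    q_norm = math.sqrt(sum(c * c for c in q.values()))
    scores = {}
    for d, text in docs.items():
        v = Counter(tokens(text))
        dot = sum(q[t] * v[t] for t in q)
        norm = q_norm * math.sqrt(sum(c * c for c in v.values()))
        scores[d] = dot / norm if norm else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The Boolean functions return an unordered yes/no set, which is what produces the "too many or too few" behaviour described above, while `ranked` always returns every document in order of estimated similarity so that only the top few need be shown.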
Nevertheless, all of these improvements in database systems and text retrieval systems do not solve the problem mentioned above: putting to effective use the knowledge existing in the minds of the members of an organization. This problem has come to be called the knowledge management problem.

To be sure, some of this knowledge is represented in databases. Some of this knowledge is represented in documents written by members of the organization or by others outside the organization. Some of this knowledge might be gleaned from e-mail sent within an organization. However, even if database management systems were extended to much greater coverage than they now have, and even if document retrieval systems could be developed which for every information need provided all and only the relevant documents contained either within an organization or on the Web, and even if a machine learning, natural language understanding algorithm could be developed and applied to the analysis of all e-mail messages passing through an organization’s e-mail system, the knowledge management problem still would not be solved. This is because much knowledge, including some of the most useful knowledge an organization has, only exists in the minds of its members.
KnowledgeNet is a knowledge management system, based on principles first developed for ranked document retrieval [1], which directly addresses the problem of accessing knowledge held only in the minds of an organization’s members. Document retrieval researchers realized many years ago that the exact match logic used in early document retrieval systems and in database systems was inadequate to retrieve the unstructured information contained in documents. There are a variety of reasons why exact match technology is inadequate, but chief among these reasons are linguistic and conceptual ambiguity. The same word can have multiple meanings. The same, or similar, concepts can be expressed in many different ways. Over the years retrieval systems were developed that either calculated the probability that a particular document would be relevant for a particular user’s information need, or used heuristics to measure the similarity of the document to the user’s information need, as represented by the query. It was recognized that the same type of probabilistic retrieval algorithms could be used to retrieve information, or knowledge, directly from people, the ultimate sources of the knowledge that might be represented in databases or documents [2].

A person could probabilistically index him- or herself as a source of information. To be more concrete, a person could be asked to estimate how many out of ten people coming to him or her with a question on a particular topic the person thinks that he or she could help. Similarly, a person seeking information using such a system could be asked, out of a given number of people represented in the organization as having some knowledge about a topic, how many of these people the seeker thinks could provide helpful information. These estimates of knowledge sources and seekers can be combined to provide the probability that a particular source could help a particular seeker with his or her information need. Such a probabilistic retrieval system for people as sources of information and knowledge, which we call KnowledgeNet, has advantages that go beyond retrieving information, or knowledge, that is contained only in people’s minds. Even if knowledge has been recorded in a database record or document, it is often not as helpful as knowledge coming directly from a knowledgeable person. A seeker of information cannot interact with a database record or document in the same way as with a person.

As described so far, it is clear that KnowledgeNet promises advantages as a tool for knowledge management over systems based on database or document retrieval. Several issues need to be addressed, however, before a practical system can be built. Two of the most important such issues are: first, which knowledge topics should be included in the system, and second, can people accurately estimate either how helpful they could be to a seeker of information on a given topic, or how helpful a potential source of information would be to them?

The question of which knowledge topics should be included in the system is resolved by referring to the workflow management of the organization itself. The topics are those implied by the accounting structure of the organization. The second question is potentially more serious. It has been known for many years that people are poor estimators of probabilities [3]. Tools have been developed in work on decision support systems that can help with this problem to some extent, but it remains true that initial estimates provided by sources and seekers will likely not be as accurate as desired [4].
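The paper does not state the rule by which source and seeker estimates are combined, so the following is only a minimal sketch under an assumed rule: each estimate is converted to odds and the two are multiplied, as if they were independent pieces of evidence against a neutral prior. The function names and the clamping constant are hypothetical and are not taken from KnowledgeNet.

```python
def clamp(p, eps=0.01):
    """Keep a probability away from 0 and 1 so the odds stay finite."""
    return min(max(p, eps), 1.0 - eps)

def odds(p):
    """Convert a probability to odds in favour."""
    p = clamp(p)
    return p / (1.0 - p)

def combine(source_helped, source_out_of, seeker_helpful, seeker_out_of):
    """Assumed combination rule (not the paper's): multiply the odds implied
    by the source's self-estimate and by the seeker's estimate, then convert
    the product back to a probability."""
    p_source = source_helped / source_out_of
    p_seeker = seeker_helpful / seeker_out_of
    o = odds(p_source) * odds(p_seeker)
    return o / (1.0 + o)

def rank_sources(source_estimates, seeker_helpful, seeker_out_of):
    """Rank sources for one seeker and one topic.

    source_estimates maps a source name to a self-estimate out of ten."""
    scored = [(name, combine(helped, 10, seeker_helpful, seeker_out_of))
              for name, helped in source_estimates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under this assumed rule a seeker who expects half of the listed people to be helpful leaves each source's self-estimate unchanged, while a more optimistic or pessimistic seeker shifts every probability up or down without changing the ranking on that topic.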
The probabilistic algorithms underlying KnowledgeNet are able to adapt to the experience of people using the system to improve the accuracy of initial estimates through a process known as relevance feedback. With relevance feedback a seeker retrieving a document, or in the case of KnowledgeNet a source of information, provides an assessment to the system as to whether or not the document, or source, was helpful. For many years research studies have shown that relevance feedback can lead to much more accurate retrieval systems. Users of KnowledgeNet will be motivated to provide relevance judgments as they realize that providing these judgments leads to their obtaining better results.

This paper describes an operational test of KnowledgeNet in a large oil company in Indonesia. It shows the promise of how effectively such a system could be used to put seekers of information in contact with workers who can answer the seekers’ questions. KnowledgeNet’s knowledge base, i.e., the information stored on the expertise of each member of the organization represented in the system, can also have another important use. As large organizations lay off workers from time to time, the expertise of laid off workers could be represented in a national labor database, which would facilitate the eventual rehiring of these workers.
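The relevance feedback process described above can be illustrated with a simple counting scheme. This is a sketch under assumptions, not KnowledgeNet's actual algorithm: the initial "out of ten" self-estimate is treated as ten pseudo-observations, and each helpful or unhelpful judgment from a seeker adds one real observation. The class name, method names, and default weight are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SourceEstimate:
    """Running estimate that a source can help on one topic."""
    helped: float       # (pseudo-)count of helpful outcomes
    not_helped: float   # (pseudo-)count of unhelpful outcomes

    @classmethod
    def from_self_estimate(cls, helped_out_of_ten, weight=10.0):
        """Seed the counts from the source's initial subjective estimate.

        weight is an assumed parameter (it must be positive): the number of
        pseudo-observations the initial estimate is worth."""
        p = helped_out_of_ten / 10.0
        return cls(helped=p * weight, not_helped=(1.0 - p) * weight)

    def update(self, was_helpful):
        """Record one relevance judgment supplied by a seeker."""
        if was_helpful:
            self.helped += 1.0
        else:
            self.not_helped += 1.0

    @property
    def probability(self):
        """Current estimate of the probability that the source is helpful."""
        return self.helped / (self.helped + self.not_helped)
```

A source who initially claims eight out of ten but then receives five unhelpful judgments in a row drifts from 0.8 toward 8/15, which shows how feedback can correct an over-optimistic initial estimate without discarding it.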
Minas is an ageing oil field in central Sumatra. On May 4, 1969, seventeen years after its first production, Minas field reached an accumulated production of one billion barrels of crude and became the first giant oil field in Asia east of Iran and the twenty-second in the world. By the end of 1990, the accumulated production of Minas field had exceeded three billion barrels. “Minas crude” (now known as “Sumatran light crude”, or SLC) is favored by industrial countries for its very low sulphur content. From 1995 onward, at the time Caltex (PT Caltex Pacific Indonesia) was organizing Minas as a Strategic Business Unit (SBU), part of an enterprise approach to project management, it was becoming evident that oil production at Minas field was falling off. Also, about this time, it was seen that two advanced management technologies might be applied significantly, with far-ranging results. This paper will deal with one of them, KnowledgeNet One World, or KNOW. The second technology is an enterprise management package that implements the organizational theory of Dr. Elliott Jaques [5].

KNOW represents the first practical use of the innovative KnowledgeNet search technology, which was first introduced as Helpnet, an academic research model incorporating the probabilistic search theory of Dr. M.E. Maron and his colleagues [2]. Although the model was technically sound, it was perceived that the use of subjective input probabilities required of the participants was inherently unreliable [3]. Helpnet’s seminal contribution to the field of expert finding systems is only now being recognized [6, 7].

The Helpnet paper resurfaced at Minas in 1995. By 1995 the Helpnet model, which had never been used for a practical application, was technically outdated. All that was available for management to consider was the Helpnet paper, which was theoretical in nature and hard for anyone but a specialist to understand.
Aside from the problem of developing user-friendly software for the application itself, it was necessary to address the “subjective input” objection. Estimating is not foreign to construction. Although it might be preferable to arrive at a task time assessment using a productivity factor and drawing quantity, the skill and experience of a worker, given some kind of model, have also proven to be quite reliable in making these assessments.

Caltex has out-sourced management of the field labor force to various contractors over the years, the last being Tripatra, a national contractor. The one constant has been the labor force itself, which is generational. This continuity of service has provided depth of experience, skill and knowledge, while providing training opportunities for the workers themselves. Much of the work is repetitious, though taking place over often-difficult terrain.

Given context, the worker can make a subjective assessment, such as is required in the use of KNOW. KNOW requires only two such assessments: one from the worker, or source, at file building time; the other from the user of the system at query time. KNOW requires a subjective assessment pertaining to specified work categories. Regarding each of these categories, 52 in this case, the worker is asked how many out of 10 questions put to him by his immediate supervisor he feels he could help answer. These are not meant to be test questions, but questions that the supervisor himself feels he needs help in answering.

These work categories are called Areas of Interest in KNOW and are critical to the use of the system. An enterprise approach to project management requires a body of hierarchically interrelated cost account codes, defining a generic project as being composed of a number of chargeable sub-projects, and non-chargeable covering terms. Use of financial enterprise software, such as PeopleSoft [8], SAP [9], and J.D. Edwards [10], requires this.
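The file-building step described above, in which each worker rates every Area of Interest on a scale of 0 to 10, can be sketched as a small data structure with a lookup by area. The area names, worker names, and function names below are invented for illustration; the paper does not list the 52 categories or describe how KNOW stores them.

```python
from typing import Dict, List, Tuple

# A worker's profile maps each Area of Interest to a self-estimate:
# how many of 10 supervisor questions the worker feels able to help with.
Profile = Dict[str, int]

def validate_profile(profile: Profile, areas: List[str]) -> None:
    """Check that a profile covers every area with a value from 0 to 10."""
    missing = [a for a in areas if a not in profile]
    if missing:
        raise ValueError(f"missing estimates for: {missing}")
    for area, value in profile.items():
        if area not in areas:
            raise ValueError(f"unknown area: {area}")
        if not 0 <= value <= 10:
            raise ValueError(f"estimate for {area} must be between 0 and 10")

def top_sources(profiles: Dict[str, Profile], area: str,
                limit: int = 3) -> List[Tuple[str, float]]:
    """Return up to `limit` workers with the highest self-estimate for one area.

    Ties are broken alphabetically by worker name."""
    scored = [(worker, profile[area] / 10.0)
              for worker, profile in profiles.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]
```

In use, a supervisor's query at query time would name one Area of Interest and receive a short ranked list of workers, with the query-time assessment applied as a further adjustment to these scores.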
These account code definitions, used by Calfais, the Caltex financial software, recently “mapped” over to J. D. Edwards, were used to define the 52 work categories given to the workers for their input estimates. The KNOW Areas of Knowledge requirement was met by simply using employee Time and Attendance data. (Time and Attendance Systems are part of any enterprise system.) Employees are linked to corporate functions; corporate functions are Areas of Knowledge. As an employee moves through his corporate career he may occupy more than one functional area. These are all his Areas of Knowledge, and may be kept track of automatically.

An operational test had to wait until the development of the KNOW program, which took place after November 2000. Careful consideration was given as to how to approach program development. It was seen that the application lent itself ideally to the use of an off-the-shelf relational database as a starting point. Any but essential programming would be avoided. A user-friendly, workable program was ready for corporate use sometime in 2003. Since the employee work situation had altered dramatically in the intervening years (the resident contractor at Minas had downsized from about 3,000 to 1,400 employees), Minas management kindly provided the means to solicit employee information from as many as possible.

It was not until this year, 2004, that the operational test was able to go forward. From the site visit in 2003 to the site visit in 2004, continued downsizing has further
