Paul Thompson, Thayer School of Engineering and Department of Computer Science, Dartmouth College, Hanover, New Hampshire 03755, U.S.A.
This paper describes an operational test of the KnowledgeNet system for knowledge management. KnowledgeNet is based on probabilistic design principles first developed for document retrieval, here applied to the retrieval of people within an organization as sources of information. The operational test was conducted at the Caltex operation at the Minas oil field in Indonesia. An initial analysis of the results shows that employees at Tripatra were able to probabilistically index themselves as sources of information accurately enough to provide useful retrieval of their expertise by their supervisors. This technology shows promise for knowledge management not only within an organization, but also at the national level: the expertise of workers no longer within an organization could be maintained in a national labor database, facilitating their rehire by another organization.
1. INTRODUCTION
Putting the knowledge existing in the minds of its members to effective use is a challenge facing any organization, and the problem grows with the size of the organization. One of the earliest applications of the computer, dating back to the 1950s, was to solve what was called the library problem: using computers to find bibliographic records that could then be used to find documents. In those days computerized retrieval systems were usually used to find documents for scientific researchers. It was commonplace then to observe that people were living in an information age and that there was an information overload problem. These early computerized retrieval systems were based on an exact match retrieval methodology referred to as Boolean logic. A searcher would construct a query expressing his or her information need using terms from the vocabulary used to represent the bibliographic records held in the system, connecting the terms with the Boolean operators AND, OR, and NOT, as well as proximity operators permitting the match of records where, for example, term 1 was within 2 words of term 2. Using such queries a user would tend to find either far too many records (information overload) or few, if any, records.
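As a toy illustration (our example, not taken from the paper), exact-match Boolean retrieval amounts to set algebra over an inverted index:

```python
# Toy inverted index: each term maps to the set of document IDs containing it.
index = {
    "oil":     {1, 2, 4},
    "field":   {2, 3, 4},
    "sumatra": {2, 5},
}

def postings(term):
    # Documents containing the term (empty set if the term is unknown).
    return index.get(term, set())

# The Boolean query "oil AND field NOT sumatra" as set operations:
result = (postings("oil") & postings("field")) - postings("sumatra")
```

A query either matches a document exactly or it does not; there is no ranking, which is why users tended to see either far too many results or almost none.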
Much has changed since those days. Computerized retrieval systems now provide access to the full text of documents. Database retrieval systems have been developed which store an organization's data in structured databases to be retrieved by an exact match query language, SQL, which is not unlike the Boolean retrieval languages of the 1950s. Still, such database records are a very small percentage of the computerized data held by most organizations; most data is textual. Textual retrieval methods have changed as well. Until about 10 years ago Boolean retrieval systems were virtually the only ones available commercially. Since then major online retrieval services have provided optional ranked retrieval methods and, more significantly, the World Wide Web has brought about a new type of retrieval with its web search engines. Ranked retrieval methods use various metrics to measure the similarity of a user's query to the text of documents in order to rank them, so that even if, say, a thousand documents are retrieved by a query, the user can be shown the top ten or twenty, which are more likely to be relevant than lower-ranked documents. Although these ranked retrieval techniques worked well with well-organized collections such as those maintained by the major online service providers, they did not work well on the Web, especially as it grew larger. A few years ago it was common to do a search on one of the major web search engines and retrieve more than five million poorly ranked documents. More recently web search engines have developed new algorithms better suited to the nature of the Web, e.g., its hyperlink structure, and better retrieval is being provided. Nevertheless, all of these improvements in database systems and text retrieval systems do not solve the problem mentioned above: putting to effective use the knowledge existing in the minds of the members of an organization.
This problem has come to be called the knowledge management problem. To be sure, some of this knowledge is represented in databases. Some is represented in documents written by members of the organization or by others outside it. Some might be gleaned from e-mail sent within the organization. However, even if database management systems were extended to much greater coverage than they now have,
and even if document retrieval systems could be developed which for every information need provided all and only the relevant documents contained either within an organization or on the Web, and even if a machine-learning, natural language understanding algorithm could be developed and applied to the analysis of all e-mail messages passing through an organization's e-mail system, the knowledge management problem still would not be solved. This is because much knowledge, including some of the most useful knowledge an organization has, exists only in the minds of its members.

This paper describes an operational test of KnowledgeNet in a large oil company in Indonesia. It shows how effectively such a system can put seekers of information in contact with workers who can answer the seekers' questions. KnowledgeNet's knowledge base, i.e., the information stored on the expertise of each member of the organization represented in the system, can also have another important use. As large organizations lay off workers from time to time, the expertise of laid-off workers could be represented in a national labor database, which would facilitate the eventual rehiring of these workers.

2. KNOWLEDGENET

KnowledgeNet is a knowledge management system, based on principles first developed for ranked document retrieval, which directly addresses the problem of accessing knowledge held only in the minds of an organization's members. Document retrieval researchers realized many years ago that the exact match logic used in early document retrieval systems and in database systems was inadequate for retrieving the unstructured information contained in documents. There are a variety of reasons why exact match technology is inadequate, but chief among them are linguistic and conceptual ambiguity. The same word can have multiple meanings; the same, or similar, concepts can be expressed in many different ways. Over the years retrieval systems were developed that either calculated the probability that a particular document would be relevant to a particular user's information need, or used heuristics to measure the similarity of the document to the user's information need, as represented by the query. It was recognized that the same type of probabilistic retrieval algorithms could be used to retrieve information, or knowledge, directly from people, the ultimate sources of the knowledge that might be represented in databases or documents. A person could probabilistically index him- or herself as a source of information. To be more concrete, a person could be asked to estimate how many out of ten people coming to him or her with a question on a particular topic he or she thinks he or she could help. Similarly, a person seeking information with such a system could be asked, out of the n people represented in the system as having some knowledge of a topic, how many he or she thinks could provide helpful information. These estimates from knowledge sources and seekers can be combined to give the probability that a particular source could help a particular seeker with his or her information need.

Such a probabilistic retrieval system for people as sources of information and knowledge, which we call KnowledgeNet, has advantages that go beyond retrieving information, or knowledge, that is contained only in people's minds. Even if knowledge has been recorded in a database record or document, it is often not as helpful as knowledge coming directly from a knowledgeable person. A seeker of information cannot interact with a database record or document in the same way as with a person. As described so far, it is clear that KnowledgeNet promises advantages as a tool for knowledge management over systems based on database or document retrieval. Several issues need to be addressed, however, before a practical system can be built. Two of the most important are: first, which knowledge topics should be included in the system? and, second, can people accurately estimate either how helpful they could be to a seeker of information on a given topic, or how helpful a potential source of information would be to them? The question of which knowledge topics should be included is resolved by referring to the workflow management of the organization itself: the topics are those implied by the accounting structure of the organization. The second question is potentially more serious. It has been known for many years that people are poor estimators of probabilities. Tools developed in work on decision support systems can help with this problem to some extent, but it remains true that initial estimates provided by sources and seekers will likely not be as accurate as desired. The probabilistic algorithms underlying KnowledgeNet are able to adapt to the experience of people using the system, improving the accuracy of initial estimates through a process known as relevance feedback. With relevance feedback a seeker retrieving a document, or, in the case of KnowledgeNet, a source of information, tells the system whether or not the document, or source, was helpful. For many years research studies have shown that relevance feedback can lead to much more accurate retrieval. Users of KnowledgeNet will be motivated to provide relevance judgments as they realize that providing these judgments leads to their obtaining better results.
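The source and seeker estimates described above can be combined into a ranking of candidate sources. The sketch below is our minimal illustration: the multiplicative combination rule and all names are assumptions, not the published KnowledgeNet formula.

```python
# Rank candidate sources for one topic by combining each source's
# self-estimate ("I could help x of 10 askers on this topic") with the
# seeker's estimate ("k of the n listed sources could help me").
# Multiplying the two probabilities is an assumed combination rule.
def rank_sources(self_estimates, seeker_k, seeker_n):
    p_seeker = seeker_k / seeker_n
    scored = [(name, (x / 10) * p_seeker) for name, x in self_estimates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical sources and estimates:
ranking = rank_sources({"Ali": 8, "Budi": 5, "Citra": 9}, seeker_k=2, seeker_n=3)
```

Relevance feedback would later revise these initial probabilities as seekers report which sources actually helped.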
3. MINAS: AN OPERATIONAL TEST OF KNOWLEDGENET
Minas is an ageing oil field in central Sumatra. Seventeen years after its first production, on May 4, 1969, Minas field reached an accumulated production of one billion barrels of crude, becoming the first giant oil field in Asia east of Iran and the twenty-second in the world. By the end of 1990, the accumulated production of Minas field had exceeded three billion barrels. Minas crude (now known as "Sumatran Light Crude", or SLC) is favored by industrial countries for its very low sulphur content. From 1995 onward, as Caltex (PT Caltex Pacific Indonesia) was organizing Minas as a Strategic Business Unit (SBU), part of an enterprise approach to project management, it was becoming evident that oil production at the Minas field was falling off. About this time it was also seen that two advanced management technologies might be applied there, with far-ranging results. This paper deals with one of them, KnowledgeNet One World, or KNOW. The second is an enterprise management package that implements the organizational theory of Dr. Elliott Jaques. KNOW represents the first practical use of the innovative KnowledgeNet search technology, which was first introduced as Helpnet, an academic research model incorporating the probabilistic search theory of Dr. M.E. Maron and his colleagues. Although the model was technically sound, it was perceived that the subjective input probabilities required of the participants were inherently unreliable. Helpnet's seminal contribution to the field of expert finding systems is only now being recognized [6, 7]. The Helpnet paper resurfaced at Minas in 1995. The Helpnet model, which had never been used in a practical application, was by 1995 technically outdated. All that was available for management to consider was the Helpnet paper, which was theoretical in nature and hard for anyone but a specialist to understand.
Aside from the problem of developing user-friendly software for the application itself, it was necessary to address the "subjective input" objection. Estimating is not foreign to construction work. Although it might be preferable to arrive at a task-time assessment using a productivity factor and drawing quantity, the skill and experience of a worker, given some kind of model, have also proven quite reliable for making these assessments. Caltex has out-sourced management of the field labor force to various contractors over the years, most recently Tripatra, a national contractor. The one constant has been the labor force itself, which is generational. This continuity of service has provided depth of experience, skill, and knowledge, while providing training
opportunities for the workers themselves. Much of the work is repetitious, though it takes place over often difficult terrain. Given context, the worker can make the kind of subjective assessment that KNOW requires. KNOW requires only two such assessments: one from the worker, or source, at file-building time; the other from the user of the system at query time. KNOW requires a subjective assessment pertaining to specified work categories. For each of these categories, 52 in this case, the worker is asked how many out of 10 questions put to him by his immediate supervisor he feels he could help answer. These are not meant to be test questions, but questions that the supervisor himself feels he needs help answering. These work categories are called Areas of Interest in KNOW and are critical to the use of the system. An enterprise approach to project management requires a body of hierarchically interrelated cost account codes, defining a generic project as a number of chargeable sub-projects and non-chargeable covering terms. Use of financial enterprise software, such as PeopleSoft, SAP, and J.D. Edwards, requires this. These account code definitions, used by Calfais, the Caltex financial software, recently "mapped" over to J.D. Edwards, were used to define the 52 work categories given to the workers for their input estimates. The KNOW Areas of Knowledge requirement was met simply by using employee Time and Attendance data. (Time and Attendance systems are part of any enterprise system.) Employees are linked to corporate functions, and corporate functions are Areas of Knowledge. As an employee moves through his corporate career he may occupy more than one functional area. These are all his Areas of Knowledge, and they can be tracked automatically. An operational test had to wait for the development of the KNOW program, which took place after November 2000. Careful consideration was given to how to approach program development.
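The derivation of Areas of Knowledge from Time and Attendance data can be sketched as follows; the record layout and all names here are invented for illustration, not taken from the Caltex systems.

```python
# Hypothetical Time and Attendance records: each links an employee to the
# corporate function charged for a pay period. Every function an employee
# has ever worked in becomes one of his Areas of Knowledge.
time_and_attendance = [
    ("emp001", "2001-03", "civil"),
    ("emp001", "2002-07", "pipeline maintenance"),
    ("emp002", "2001-03", "civil"),
    ("emp001", "2003-01", "civil"),
]

areas_of_knowledge = {}
for employee, _period, function in time_and_attendance:
    # Accumulate the distinct functions each employee has occupied.
    areas_of_knowledge.setdefault(employee, set()).add(function)
```

Because the data is already captured for payroll, the Areas of Knowledge file requires no additional input from the worker.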
It was seen that the application lent itself ideally to the use of an off-the-shelf relational database as a starting point; any but essential programming would be avoided. A user-friendly, workable program was ready for corporate use sometime in 2003. Since the employee work situation had altered dramatically in the intervening years (the resident contractor at Minas had downsized from about 3,000 to 1,400 employees), Minas management kindly provided the means to solicit employee information from as many as possible. It was not until this year, 2004, that the operational test was able to go forward. From the site visit in 2003 to the site visit in 2004, continued downsizing has further
reduced project employee levels to about 500, suitable for the mostly maintenance mission now required of the contractor. Primatrain, a local software house, provided IT support that was unavailable on site. Of the original 1,400 employees, only 500 were able to provide Area of Interest information, and of those 500, only 110 were still with the company at the last visit. In the allotted time there was no possibility of getting complete file information for the remaining Minas complement of 500. Despite the difficult labor conditions described above, and the expense, a test of KNOW was completed. Eighteen supervisors asked questions from as many of the 52 Areas of Interest as they felt were applicable to their employees. Because of the downsizing, only a few of the employees originally entered into the KNOW system were still on site. In most cases an employee was asked only one question per Area of Interest, making it impossible to get a clear sense of how accurate his estimate of how many of 10 questions he could answer was. For three types of employees, however, owing to uncertainty as to who would be their supervisor as reorganization proceeded, more than one supervisor asked questions of the same pool of employees. In particular, employees working in the "civil" area were asked questions by three supervisors. Although this represents only an initial quick look at the large amount of data collected, it is instructive to consider the responses of the civil employees. Each supervisor evaluated whether the answer each employee gave to his question was satisfactory. Since each of the three supervisors asked each employee one question, an employee answered 0, 1, 2, or all 3 questions correctly; in terms of proportions, 0, .33, .67, or 1.00. These actual results can be compared to the proportion of questions that the employee estimated he would be able to answer.
Three questions are not enough to give a valid measure of an individual's ability to make accurate estimates. On the other hand, it is possible to consider a more aggregate measure. For each of the 52 Areas of Interest, it was recorded whether the source who made the highest estimate of the number of questions he could answer was in fact the source able to answer the most questions. If two or more sources, including the source with the highest estimate, each answered the same highest number of questions, this was counted in favor of the source with the highest estimate. A few areas had only one source, so those areas were not included. For 28 areas of interest the source with the highest estimate was not the source able to answer the most questions; for 19 areas of interest he was. Although at first this might not seem impressive, consider the hypothetical case in which sources' estimates had no predictive value. Since some areas of interest had
fewer sources consulted than others, there were, on average, 4.596 sources consulted per area of interest for which more than one source was consulted. The source with the highest estimate actually answered the most questions for 19 out of 47 areas of interest, about 40.4 percent of the time. If the estimates had no predictive value, the highest estimator would be expected by chance to answer the most questions about one time in 4.596, or 21.8 percent of the time. The actual performance of 40.4 percent shows that the estimates have predictive value.
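This comparison can be checked with a short calculation, treating each of the 47 areas as an independent trial that succeeds with probability 1/4.596 under the no-predictive-value hypothesis (an independence assumption we add here for illustration):

```python
from math import comb

k, n = 19, 47        # areas where the highest estimator answered the most questions
chance = 1 / 4.596   # expected success rate if estimates carried no information
observed = k / n     # observed success rate, about 0.404

# One-sided binomial tail: probability of seeing 19 or more successes by chance.
p_value = sum(comb(n, i) * chance**i * (1 - chance)**(n - i) for i in range(k, n + 1))
```

The tail probability is well below conventional significance levels, supporting the conclusion that the workers' estimates carry predictive value.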
4. DISCUSSION AND FUTURE PLANS
Although the preliminary results described above show that humans can make estimates for KnowledgeNet that have predictive value, it is not necessary to regard these estimates as static. As seekers of information use KnowledgeNet to find people to answer their questions, the seekers can record which people were helpful. This relevance feedback can be used to adjust the original estimates provided by the sources of information. Relevance feedback has been shown to be effective in document retrieval. The probabilistic model on which KnowledgeNet is based has been extended to support relevance feedback [12, 13]. Recently this model was implemented in the NewsVerifier system. KNOW recognizes the fact that all employees have an inherent intellectual property right which cannot be contracted away by an employing organization. What one knows is a market commodity, imperfectly expressed in an employee resume or curriculum vitae. A qualitative measure of the knowledge one accrues over time in corporate employment does not exist. Furthermore, law forbids employers from expressing an opinion. Just as market forces determine the value of consumer goods and services, the marketplace of ideas has its place within a large corporate setting. The uninhibited exchange of work-related ideas, with relevance feedback, can be used to value an employee's corporate contributions over time. These valuations may be carried over into an Internet world market and, with the power of the KNOW search technology, give the employee his unique voice. A national labor database based on KNOW technology is not only feasible, but achievable. The worker will no longer be lost in the collective, as impossible to find as a needle in a haystack.
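One simple way to realize such an adjustment, sketched here under our own assumption that the "x out of 10" estimate is treated as pseudo-counts of a Beta prior (this is not the system's published update rule), is:

```python
# Revise a source's initial self-estimate with accumulated relevance feedback.
# The initial estimate supplies prior pseudo-counts; each helpful or
# not-helpful judgment from a seeker adds an observed success or failure.
def updated_estimate(initial_out_of_10, helpful, not_helpful):
    alpha = initial_out_of_10 + helpful             # successes: prior + observed
    beta = (10 - initial_out_of_10) + not_helpful   # failures: prior + observed
    return alpha / (alpha + beta)                   # posterior mean probability

# A source who estimated 8/10 but helped only 2 of 8 seekers drifts downward:
p = updated_estimate(8, helpful=2, not_helpful=6)
```

With this scheme an overconfident source's probability falls and an underconfident source's rises, while sources with little feedback stay near their initial estimates.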
5. CONCLUSION

This initial operational test of KnowledgeNet shows the promise of this technology for supporting knowledge management within an organization, benefiting both management and employees at all levels. It also suggests how the knowledge of employees within an organization can be preserved not only within corporate memory, but
within national memory in the form of a national labor database.
ACKNOWLEDGMENTS

The author would like to acknowledge the assistance of the management of Tripatra and all of its employees, present and past, who took part in this study. Without their participation this study would not have been possible.

REFERENCES

1. S.E. Robertson, M.E. Maron, and W.S. Cooper, "Probability of relevance: A unification of two competing models for document retrieval", Information Technology: Research and Development, Vol. 1, No. 1, 1982, pp. 1-2.
2. M.E. Maron, S. Curry, and P. Thompson, "An Inductive Search System: Theory, Design and Implementation", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-16, No. 1, 1986, pp. 21-28.
3. D. Kahneman, P. Slovic, and A. Tversky, editors, Judgment under Uncertainty: Heuristics and Biases, Cambridge, England: Cambridge University Press, 1982.
4. P. Thompson, "Subjective Probability and Information Retrieval: A Review of the Psychological Literature", Journal of Documentation, Vol. 44, No. 2, 1988, pp. 119-143.
5. E. Jaques, Requisite Organization: A Total System for Effective Managerial Organization and Managerial Leadership for the 21st Century, 2nd edition, Gloucester, Massachusetts: Cason Hall, 1996.
6. J. Wang, C. Zheng, T. Li, W.-Y. Ma, and W. Liu, "Ranking User's Relevance to a Topic through Link Analysis on Web Logs", Fourth ACM CIKM International Workshop on Web Information and Data Management (WIDM'02), McLean, Virginia, 2002.
7. D. Yimam-Seid and A. Kobsa, "Expert Finding Systems for Organizations: Problem and Domain Analysis and the DEMOIR Approach", Journal of Organizational Computing and Electronic Commerce, Vol. 13, No. 1, 2003, pp. 1-24.
8. PeopleSoft. 2004. http://www.peoplesoft.com/corp/en/public_index.jsp
9. SAP. 2004. http://www.sap.com/
10. J.D. Edwards. 2004. http://www.jdedwards.com
11. T. Sakai, S.E. Robertson, and S. Walker, "Flexible relevance feedback for NTCIR-2", in Proceedings of the Second NTCIR Workshop Meeting on Evaluation of Chinese & Japanese Text Retrieval and Text Summarization, Tokyo, Japan, March 7-9, Tokyo: National Institute of Informatics, 2001.
12. P. Thompson, "A combination of expert opinion approach to probabilistic information retrieval, Part 1: The conceptual model", Information Processing & Management, Vol. 25, No. 6, 1990, pp. 371-382.
13. P. Thompson, "A combination of expert opinion approach to probabilistic information retrieval, Part 2: Mathematical treatment of CEO Model 3", Information Processing & Management, Vol. 25, No. 6, 1990, pp. 383-394.
14. P. Thompson, "Cognitive Hacking and Intelligence and Security Informatics", Proceedings of the Conference on Enabling Technologies for Simulation Science VIII, Defense and Security Symposium 2004, Orlando, Florida, 12-16 April, 2004.