TAKMA 05 Copenhagen, Denmark August 22-26, 2005

Knowledge Management Challenges in Knowledge Discovery Systems
Mykola Pechenizkiy, Seppo Puuronen: Department of Computer Science, University of Jyväskylä, Finland
Alexey Tsymbal: Department of Computer Science, Trinity College Dublin, Ireland

Outline
• Introduction
  - KDD
  - Selection of DM strategy for a problem at hand
  - Meta-learning
• Our goal
  - To propose a knowledge-driven approach to enhance the selection of DM strategies in KDSs.
• Need for KM
• What are the challenges
  - KM processes wrt problem of DM strategy selection
• Further research
• Discussion

Knowledge discovery as a process
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1997.

CRISP-DM
http://www.crisp-dm.org/

KDD Process: Vertical Solutions
[Figure: KDD process stages: Business Understanding, Data Understanding, Data Preparation, Data Exploration, Data Mining, Evaluation & Interpretation, Deployment; labelled "Experience accumulation"]
Reinartz, T. Focusing Solutions for Data Mining. LNAI 1623, Berlin Heidelberg, 1999.

The Search for Scientific Methods and Meta-Learning
• Adequate scientific methods make induction easier with a smaller number of examples.
• "knowledge concerning the most appropriate method for a given goal can be obtained by induction on the database of history of science: a collection of problems of different methods, different goals and different degrees of success" [Laudan]
• The choice of methods needs to be based on a higher level induction or on meta-learning in the context of machine learning.
• Meta-learning can produce rules concerning the use of the alternative strategies, methodological knowledge, or correct predictions concerning the best rank of strategies for a new task.

Dynamic Selection of DM Methods
• ... in KDSs has been under active study
• 2 contexts of dynamic selection:
  - multi-classifier systems that apply different ensemble techniques (Dietterich, 1997).
    • Their general idea is usually to select one classifier on a dynamic basis, taking into account the local performance (e.g. generalisation accuracy) in the instance space.
  - multistrategy learning (Michalski)
    • applies a strategy selection approach which takes into account the classification problem-related characteristics (meta-data).
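The first context above, selecting one classifier by its local performance in the instance space, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `select_classifier`, the two threshold classifiers, the validation points and the choice of `k` are all assumptions made for the example.

```python
from math import dist

def select_classifier(x, validation, classifiers, k=3):
    """Pick the classifier with the best accuracy on the k validation
    instances nearest to x (local performance in the instance space)."""
    # k nearest labelled validation instances to the query point
    neighbours = sorted(validation, key=lambda item: dist(item[0], x))[:k]

    def local_accuracy(clf):
        hits = sum(clf(features) == label for features, label in neighbours)
        return hits / len(neighbours)

    # max() keeps the first best entry, so ties go to the earlier name
    return max(classifiers, key=lambda name: local_accuracy(classifiers[name]))

# Two hypothetical base classifiers over 2-D points
classifiers = {
    "threshold_x": lambda p: int(p[0] > 0.5),
    "threshold_y": lambda p: int(p[1] > 0.5),
}

# Hypothetical labelled validation set: (features, label)
validation = [
    ((0.1, 0.1), 0), ((0.2, 0.9), 0), ((0.3, 0.8), 0),
    ((0.9, 0.9), 1), ((0.8, 0.2), 0), ((0.7, 0.1), 0),
]

print(select_classifier((0.25, 0.85), validation, classifiers))
print(select_classifier((0.8, 0.15), validation, classifiers))
```

With this toy data a different classifier wins in different regions of the instance space, which is the point of selecting dynamically rather than once for the whole data set.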

Selection of the most appropriate DM technique
• Motivation
  - No Free Lunch theorem.
  - many empirical studies show
    • one learning strategy can perform significantly better than another strategy on a group of problems that are characterised by some properties (Kiang, 2003).
• Problem
  - Selection is usually not straightforward.
  - some knowledge is required for making a decision about appropriate techniques' selection and DM strategy construction for a problem at hand.
• We distinguish 2 levels of knowledge:
  - the knowledge extracted from data that represents the problem to be mined by means of applying a DM technique
  - the higher-level knowledge (from the KDS perspective) required for managing techniques' selection, combination and application => meta-knowledge.

Meta-learning
• or "learning to learn"
  - the effort to automatically induce dependencies between learning tasks and learning strategies.
• based on the assumptions that it is possible
  - to evaluate and compare learning strategies,
  - to measure the benefits of early learning on subsequent learning,
  - to use such evaluations to reason about learning strategies
    • select useful ones and disregard the useless or misleading strategies (Schmidhuber et al., 1996).

... in Meta-learning
• in the context of classifier ensembles, where only the data itself is used to make decisions about method selection.
  - rather good practical results are shown in experiments supported by theoretical studies as well.
• in dynamic integration of DM strategies for a data set at hand:
  - a multistrategy approach based on the ideas of constructive induction and conceptual clustering (Michalski, 1997)
  - several studies on automatic classifier selection via meta-learning (Kalousis, 2002)
    • No practical success!

Meta-Learning
[Figure: meta-learning scheme. Labels: Collection of datasets; Collection of techniques; Performance criteria; Meta-learning space; Knowledge repository; A new dataset; Meta-model; Suggested technique; Evaluation]
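The flow named in the figure labels (a new dataset is characterised, a meta-model consults the knowledge repository, a technique is suggested) might look like the sketch below. Everything concrete here is an assumption for illustration: the three meta-features, the nearest-neighbour meta-model, the function names and the repository entries are hypothetical and are not taken from the slides.

```python
from collections import Counter
from math import dist, log2

def characterise(dataset):
    """Describe a dataset, given as a list of (features, label) pairs,
    by three simple meta-features: log2 of its size, number of features,
    and class entropy in bits."""
    n = len(dataset)
    counts = Counter(label for _, label in dataset)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return (log2(n), len(dataset[0][0]), entropy)

def suggest_technique(new_dataset, repository):
    """Meta-model sketch: return the technique stored with the nearest
    repository entry in meta-feature space (no feature scaling is done)."""
    meta = characterise(new_dataset)
    _, technique = min(repository, key=lambda entry: dist(entry[0], meta))
    return technique

# Hypothetical knowledge repository: (meta-features, technique that did best)
repository = [
    ((3.0, 2, 1.0), "nearest neighbour"),
    ((10.0, 20, 0.3), "naive Bayes"),
]

# Hypothetical new dataset: 4 instances, 2 features, balanced classes
new_dataset = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "b"), ((1, 1), "b")]

print(characterise(new_dataset))
print(suggest_technique(new_dataset, repository))
```

The evaluation step from the figure would then compare the suggested technique against the performance criteria and write the outcome back into the repository.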

Problems with Meta-Learning for DM SS
• Representativeness of meta-data samples
  - Meta-learning space is large
  - Computationally expensive to produce meta-data samples
  - Curse of dimensionality
  - Many possible irrelevant features wrt collected/produced meta-data
• Complexity of statistical measures
  - Why do we need to spend time to characterize the dataset if we can use this time to try different DM approaches and select the best one?

Our goal and focus: KM perspective
• to propose a knowledge-driven approach to enhance the dynamic integration of DM strategies in knowledge discovery systems.
• focus on KM aimed to organise a systematic process of knowledge capture and refinement over time.
• We consider the basic knowledge management processes of
  - knowledge creation and identification,
  - representation, collection and organization,
  - sharing and integration,
  - adaptation and application
  with respect to the introduced concept of meta-knowledge.

Introducing KM to DM SS
• Generally, the problem of knowledge capture, storage, and dissemination is similar to data and information management in ISs, and therefore some executives prefer to view KM as a natural extension to IS functions (Alavi and Leidner, 1999).
• Zack (1999)
  - the most practical way to define KM is to show on the existing IT infrastructure the involvement of:
  - (1) knowledge repositories,
  - (2) best-practices and lessons-learned systems,
  - (3) expert networks [these are DM experts], and
  - (4) communities of practice [these are end-users].
[Figure: KM processes: Knowledge Creation & Acquisition; Knowledge Organization & Storage; Knowledge Distribution & Integration; Knowledge Adaptation & Application; Knowledge Evaluation, Validation and Refinement]

Transformations of data and knowledge concepts
[Figure (adopted from Spiegler, 2000): Reality (Entities, Attributes) -> Data (Capture, Representation, Recording, Storage, Archiving, Transmission, Deletion) -> Data Processing -> Information (Knowing that and what) -> Information Processing -> Knowledge (Knowing how and why) -> Knowledge Processing -> Wisdom (Knowing when, where and what for)]
• Knowledge is "justified belief that increases an entity's capacity for effective action" (Nonaka, 1994).
• A long history of epistemological debates, and discussion of knowledge from different perspectives in Polanyi (1962).

Different types of knowing
Knowing          Analysis         Context
that and what    Conceptual       concepts, relationships, i.e. declarative knowledge
how              Functional       hypothesis, i.e. procedural knowledge
where            Spatial          data set characterization
when             Temporal         temporal context
why              Causal           higher-level abstraction
who              Organizational   integration, sharing
how much         Economical       benefits, risks, resources
what for         Strategic        business DM goals, domain knowledge

Knowledge distribution and knowledge integration
4 potential sources of knowledge that have to be integrated in the repository of a KDS:
  - (1) knowledge from an expert in data-mining, knowledge discovery, statistics and related fields,
  - (2) knowledge from a data-mining practitioner,
  - (3) knowledge from laboratory experiments on synthetic data sets, and, finally,
  - (4) knowledge from field experiments on real-world problems.
  - Besides this, research and business communities, and similar KDSs themselves, can organize different trusted networks, where participants are motivated to share their knowledge.

Knowledge Repository Lifecycle (1 of 2)
• Once the repository is created it tends to grow, and at some point it naturally begins to collapse under its own weight, requiring major reorganization.
  - it needs to be updated continuously,
    • some content needs to be deleted (if misleading), deactivated or archived (if it is potentially useful).
    • if similar contributions are combined, generalized and restructured, the content may become less fragmented and redundant.
• The process of filtering knowledge claims into accepted or suppressed is important
  - when plenty of claims are produced automatically, they need to be filtered automatically.
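One way to picture automatic filtering of knowledge claims is a rule that sorts each claim by the evidence gathered for it. The sketch below is only an assumption about how such a filter could work: the `Claim` fields, the status names and all thresholds are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    rule: str            # e.g. "for data sets like X, prefer technique Y"
    confirmations: int   # times applying the rule gave a good result
    refutations: int     # times applying the rule gave a poor result

def filter_claim(claim, min_trials=5, accept_at=0.7, suppress_at=0.3):
    """Sort a claim into 'accepted', 'suppressed' or 'archived'.
    The thresholds are illustrative assumptions."""
    trials = claim.confirmations + claim.refutations
    if trials < min_trials:
        return "archived"      # potentially useful, too little evidence so far
    rate = claim.confirmations / trials
    if rate >= accept_at:
        return "accepted"
    if rate <= suppress_at:
        return "suppressed"    # misleading often enough to be withdrawn
    return "archived"          # inconclusive, kept for later re-testing

claims = [
    Claim("rule 1", confirmations=9, refutations=1),
    Claim("rule 2", confirmations=1, refutations=9),
    Claim("rule 3", confirmations=2, refutations=1),
]
for claim in claims:
    print(claim.rule, filter_claim(claim))
```

Archived claims stay available for re-testing, which matches the distinction above between deleting misleading content and keeping content that is potentially useful.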

Knowledge Repository Lifecycle (2 of 2)
• "knowing when" and "knowing where" contexts:
  - when the environment changes, all of the general rules without specifying the context could become invalid.
  - some knowledge should exist that would guide an organization to change the repository when the environment calls for it.
• Some basic principles of triggers can be introduced
• In order to increase the quality and validity of knowledge, it needs to be continually tested, improved or removed.
• Some knowledge claims are naturally in constant competition with the other claims.
  - Disagreements within the knowledge repository need to be resolved by means of generalization of some parts and contextualization of the others.
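Resolving competing claims by contextualization can be illustrated with rules that carry an explicit context. This is a sketch under stated assumptions, not a method from the slides: the rule format, the `resolve` function, the "most specific matching rule wins" policy and the example rules are all hypothetical.

```python
def resolve(rules, environment):
    """Among the rules whose context matches the environment, return the
    recommendation of the most specific one (the largest context)."""
    matching = [
        rule for rule in rules
        if all(environment.get(key) == value
               for key, value in rule["context"].items())
    ]
    if not matching:
        return None
    return max(matching, key=lambda rule: len(rule["context"]))["recommend"]

# Hypothetical repository content: one general rule, one contextualized rule
rules = [
    {"context": {}, "recommend": "technique A"},
    {"context": {"data": "streaming"}, "recommend": "technique B"},
]

print(resolve(rules, {"data": "batch"}))
print(resolve(rules, {"data": "streaming"}))
```

The general rule still answers when no specific context applies, while the contextualized rule overrides it once the environment matches, so the two claims no longer contradict each other.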

Knowledge validity and knowledge quality
• The "knowing when" and "knowing where" contexts can be discovered before they appear in a real situation.
  - Active learning
  - Zooming in and zooming out procedures
  - Search for balance between generality, compactness, interpretability, and understandability and sensitiveness to the context, exactness, precision, and adequacy of (meta-)knowledge.
• The quality of knowledge can be estimated by its ability to help a KDS produce solutions faster and more effectively.
  - context conditions can be important for knowledge quality estimation
• Knowledge claims have both a degree of utility and a degree of satisfaction.
• To determine the relative quality of a validated knowledge claim, evaluation criteria should be defined:
  - complexity, usefulness, and predictive power are well formalised and easy to estimate.
  - understandability, reliability of source, explanatory power are rather subjective and therefore inaccurate.

Limitations
• The goal of KM here is to make more effective and efficient use of available DM techniques.
• The most important issues in knowledge management:
  - (1) executive/strategic management,
  - (2) operational management,
    • the identification of available knowledge,
    • seeking ways to capture it in a KM process,
    • and analysing the ability to design a KM (sub)system including its tools and applications
  - (3) costs, benefits, and risks management, and
  - (4) standards in the KM technology and communication.

Further Research
[Figure: KM processes: Knowledge Creation & Acquisition; Knowledge Organization & Storage; Knowledge Distribution & Integration; Knowledge Adaptation & Application; Knowledge Evaluation, Validation, Refinement]
• Implementation of [...] knowledge-driven framework [...] that contains a limited number of DM techniques of a certain type
  - Feature extraction techniques, [...] classification [...]
• Evaluation [...] of the framework [...]

Thank You!
Feedback is very welcome:
• Questions
• Suggestions
• Guidelines
• Collaboration
Contact Info:
Mykola Pechenizkiy
Department of Computer Science and Information Systems, University of Jyväskylä, FINLAND
E-mail: mpechen@cs.jyu.fi
Tel.: +358 14 2602472
Fax: +358 14 260 3011
http://www.cs.jyu.fi/~mpechen
