Implementing The Taxonomy - New Idea Engineering

Copyright (c)1996-2012 by New Idea Engineering, Inc.

Implementing The Taxonomy - Are we there yet?

Last Updated Jan 2009
By: Mark Bennett & John Lehman - HighClassify - Issue 3 - July, 2003

“Ok, we have our organization taxonomy! That should make the users happy. After all that work …. Now everyone knows how our content is accessed”.

Although a taxonomy is a “subject map” to an organization’s content, the map is missing the “roads”. The taxonomy can only be used to assist in accessing information if the taxonomy has been “implemented”, or assigned to content, in some way.

There are two steps to taxonomy implementation. First, every subject (classification) in the taxonomy needs to be defined to answer the question “how do I decide if the document or record belongs to this classification?”. These rules are then applied by people, or by software (various types of information retrieval or classification-categorization), or by a combination of people and automation. The degree to which the assignment is by humans or by automation is closely related to the number and variety of classifications and the amount of content.
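As a rough illustration of the two steps, the sketch below writes one classification rule as data and then applies it in software, passing anything it cannot classify to a person. It uses the article's “Northeast Territory” classification as the example. The state list, the `RULES` table, the `assign` function and the `needs_review` flag are all assumptions made for this illustration, not part of any product named in the article.

```python
import re

# Step 1: define each classification by an explicit rule.
# The state list is an illustrative assumption, not taken from the article.
RULES = {
    "Northeast Territory": ["Maine", "Vermont", "New Hampshire", "Massachusetts"],
}


def contains_term(text, term):
    """True if the term occurs in the text as a whole word or phrase."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def assign(text, rules=RULES):
    """Step 2: apply the rules; flag the document for a person if none match."""
    matches = sorted(
        name for name, terms in rules.items()
        if any(contains_term(text, term) for term in terms)
    )
    return {"classifications": matches, "needs_review": not matches}
```

Whole-word matching is a deliberate choice here: a plain substring test would find “maine” inside “remained”, which is the kind of mis-assignment the article warns about.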

Each classification is defined by rules.

For the classification “Product X Family”, the rule is: ANY item from the list of product models.
For the classification “Northeast Territory”, the rule is: ANY entry from a list of states or postal codes that make up the territory.
For the classification “Joint Ventures”, the rule is: ANY entry from the terminology that represents joint ventures PLUS ANY entry from a list of joint venture document-record types PLUS ANY of the companies from a list of partners.

The key to end use success is precise classifications. Without such classifications, that are explicitly, completely and accurately defined, the English language and the richness of its usage will defeat all the good intentions of a taxonomy. Classification should be able to be perfect and should be perfect. “Relative” quality (some mis-classified and some missed) will destroy your user’s confidence. If the definition isn’t obvious and precise, either from a written “cookbook” for human classifiers or in software, then the rules cannot perform well, and each person being asked to classify will have to make personal, inconsistent subjective judgments.

Is the message that not all subjects belong in a taxonomy? Yes. Avoid vague, qualitative or descriptive subjects in your taxonomy. Stay with subjects that are simpler, and are able to be represented by proper names, identifiers or other unique evidence. Create more and simpler classifications, rather than fewer and sophisticated classifications. For example, a product type classification rule consists of the list of manufacturers, either in-house or from third party, combined with a list of the product proper names or model identifiers. If the classification rule depends on product descriptive information, the likelihood that classification will both miss “good” material and mis-assign “bad” material is high. The results of 20 years of organized co-operative research into better textual query have yet to produce techniques and languages that consistently find a large percentage of correct results, and simultaneously avoid a large percentage of incorrect results.

The easy part of taxonomy implementation is the actual assignment of rules. There are basically two approaches to classification in software: those packages that accept and execute rules, and those packages that use statistical techniques (“content like this”) to construct their own rules. While the statistical vendors can fairly claim that their products avoid the “rigor” of classification definition, they also miss the precision of rules, and the result is the vagueness that accompanies vague rules. Since you just spent dollars and time to define rules, why ignore them? The best possible use for a statistical or learning type classification approach is after you have classified a large and varied group of document-records, either manually or with a rule-based technique.

Ongoing maintenance of classification rules is a significant but tractable activity. Rule-based classifier software will require very low maintenance. Statistical classifier software needs regular re-calibration to address new or modified classification rules.

[John Lehman is Co-Founder and President of HighClassify Inc., a provider of taxonomy and content classification services and solutions.]
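The “Joint Ventures” rule described in the article combines three lists with PLUS. One possible reading, assumed here, is that a document must contain at least one entry from each list. The sketch below follows that reading. The term lists, the partner names and the helper names `contains_any` and `matches_rule` are hypothetical and chosen only for illustration.

```python
import re


def contains_any(text, terms):
    """True if at least one term occurs in the text as a whole word or phrase."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        for term in terms
    )


# Hypothetical rule: each inner list is one "ANY" group; the groups are
# joined by "PLUS", read here as "every group must contribute a hit".
JOINT_VENTURES = [
    ["joint venture", "JV", "co-development"],   # joint venture terminology
    ["agreement", "term sheet", "memorandum"],   # document-record types
    ["Acme Corp", "Globex"],                     # partner companies (made up)
]


def matches_rule(text, groups):
    """A document matches only if every ANY group has at least one hit."""
    return all(contains_any(text, group) for group in groups)
```

Under this reading, a term sheet that names a listed partner and mentions a joint venture is classified, while general press coverage that only uses the phrase “joint venture” is not. Keeping each group a plain list of names and identifiers matches the article's advice to rely on proper names rather than descriptive language.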