
Taxonomy & Metadata / Information Architecture Consulting

Metadata & Taxonomies for a More Flexible Information Architecture


Information Architecture Summit, March 16, 2002
Amy J. Warner, Ph.D.
warneramyj@yahoo.com


Outline
What I'll cover:
Metadata and IA. Metadata schema. Vocabulary development.

Underlying themes:
Standards. Reality. Some IR (information retrieval) issues.

What is Metadata?
Metadata is structured data which describes the characteristics of a resource. It shares many similar characteristics to the cataloguing that takes place in libraries, museums and archives.
Chris Taylor, University of Queensland


Types & Functions of Metadata

Administrative: Metadata used in managing and administering resources.
Examples: acquisition information; rights and reproduction tracking; documentation of legal access requirements; location information; version control.

Descriptive: Metadata used to describe or identify information resources.
Examples: cataloging records; specialized indexes; hyperlinked relationships between resources; annotations by users.

Preservation: Metadata related to the preservation of information resources.
Examples: documentation of actions taken to preserve physical and digital versions of resources (e.g., data refreshing and migration).

Technical: Metadata related to how a system functions or metadata behaves.
Examples: digitization information (e.g., formats, compression ratios, scaling routines); authentication and security data (e.g., encryptions, passwords).

Use: Metadata related to the level and type of use of information resources.
Examples: use and user tracking; content re-use and multiversioning information.

Introduction to Metadata, Getty Information Institute


Confusing Terminology
Controlled vocabularies
Subject Headings: traditionally employed in libraries to tag (index) the topics of books and other library materials.
Thesauri: traditionally employed in abstracting & indexing services to tag (index) the topics of journal articles and other scholarly material in a given subject area (e.g., medicine, engineering).
Taxonomies: the classification of different organisms into mutually exclusive categories based on phylum, species, and other ranks.

Levels of Control
Vocabularies, from simple to complex: Synonym Rings, Authority Files, Classification Schemes, Taxonomies, Thesauri
Relationships, from simple to complex: Equivalence, Hierarchical, Associative


Metadata & IA
Users: Determine how target audience(s) search for and use information.
Content: Identify patterns in content.
Business Context: Determine how stakeholders want to organize & present their information.


IA Generations
1. Brochureware
2. Pages served from a database
3. Metadata-driven website
4. CMS


Metadata in Metadata-Driven Websites


Metadata record:
Author: J. Jones
Title: xxxx
DocType: White Paper
Audience: Employees
URL: http://...

Each metadata record links, through its URL, to the content it describes.
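In a metadata-driven site, a record like this is stored apart from the content, and page templates can query the records instead of the documents. Below is a minimal Python sketch that assumes a flat record with the slide's fields. The `MetadataRecord` class, the `links_for` helper, and the example URL are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """One metadata record; it describes a document but does not contain it."""
    title: str
    url: str  # pointer to the content itself
    authors: list[str] = field(default_factory=list)  # Author is repeatable
    doc_type: str = ""
    audience: str = ""


def links_for(records, doc_type, audience):
    """Build a link list for a page straight from metadata, without opening documents."""
    return [(r.title, r.url) for r in records
            if r.doc_type == doc_type and r.audience == audience]


record = MetadataRecord(
    title="xxxx",
    url="https://intranet.example.com/papers/42",  # hypothetical URL
    authors=["J. Jones"],
    doc_type="White Paper",
    audience="Employees",
)
print(links_for([record], "White Paper", "Employees"))
# [('xxxx', 'https://intranet.example.com/papers/42')]
```

Because pages are built from records, retagging a document (for example, changing its Audience) moves it between pages without editing the document itself.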


Two Parts to Generating a Metadata Schema


Decisions about the indexable parameters (attributes, aspects) of documents; these correspond to fields in the database records.
Decisions about the elements (terms, descriptors, subject headings, tags) that these fields contain.

Two Possibilities
Content already exists
Identify content that exists--content inventory.

Most or all content does not exist


Use wish lists to identify desired content.

To do a content inventory, you need to go to those who will develop, own, and maintain the content.

Content Analysis
Look for patterns, similarities:
Logical: themes, sensitivity, specialization.
Physical: formats, dynamic vs. static (dated vs. rarely updated).

Look for relationships: note connections between content (parent-child, sibling, dependencies). Begin to create groupings.

Generating a Metadata Table


This is the beginning of a metadata-driven website.
1. Determine the major indexable parameters or attributes for each major document type in your sample.
2. Determine what major types of rules or general guidelines your indexing system will follow for each attribute.
3. Create an X-by-Y table, with the indexable attributes on one axis and the rules on the other.
4. Fill in the decisions you make about each rule application in the individual cells of the table.

Metadata Table

Attribute   Required  Repeatable  Auto/Manual  Whole Doc./Concepts  CV (Controlled Vocabulary)
Author      Yes       Yes         Manual       Whole Doc.           No
Title       Yes       No          Manual       Whole Doc.           No
DocType     No        Yes         Manual       Whole Doc.           DocTypes List
Subject     Yes       Yes         Semi-Auto    Concepts             Subjects Vocabulary
Audience    No        No          Manual       Whole Doc.           Audience List
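The table above can double as a machine-checkable schema. Below is a minimal sketch that assumes each row becomes a rule set and each controlled vocabulary (CV) is a plain set of allowed terms. The vocabulary contents and the `validate` helper are hypothetical. The Auto/Manual and Whole Doc./Concepts columns describe how indexing is done, so the sketch does not check them.

```python
# Rules copied from the metadata table; "cv" names the controlled vocabulary, or None.
SCHEMA = {
    "Author":   {"required": True,  "repeatable": True,  "cv": None},
    "Title":    {"required": True,  "repeatable": False, "cv": None},
    "DocType":  {"required": False, "repeatable": True,  "cv": "DocTypes List"},
    "Subject":  {"required": True,  "repeatable": True,  "cv": "Subjects Vocabulary"},
    "Audience": {"required": False, "repeatable": False, "cv": "Audience List"},
}

# Hypothetical vocabulary contents, for illustration only.
VOCABULARIES = {
    "DocTypes List": {"White Paper", "Press Release", "Policy"},
    "Subjects Vocabulary": {"Metadata", "Taxonomies", "Search"},
    "Audience List": {"Employees", "Customers"},
}


def validate(record: dict[str, list[str]]) -> list[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    for attr, rule in SCHEMA.items():
        values = record.get(attr, [])
        if rule["required"] and not values:
            errors.append(f"{attr}: required")
        if not rule["repeatable"] and len(values) > 1:
            errors.append(f"{attr}: not repeatable")
        if rule["cv"]:
            allowed = VOCABULARIES[rule["cv"]]
            errors.extend(f"{attr}: '{v}' not in {rule['cv']}"
                          for v in values if v not in allowed)
    for attr in record:
        if attr not in SCHEMA:
            errors.append(f"{attr}: unknown attribute")
    return errors


good = {"Author": ["J. Jones"], "Title": ["xxxx"],
        "Subject": ["Metadata"], "Audience": ["Employees"]}
bad = {"Title": ["A", "B"], "Subject": ["Gardening"]}
print(validate(good))  # []
print(validate(bad))
```

Running the same check at tagging time is one way to support the decentralized tagging that a shared vocabulary makes possible.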


User and Stakeholder Involvement


When organizing content, start with the content, generate the metadata, and then evaluate with users and stakeholders. When organizing entities (e.g., products, projects) where content is not the major focus, start with stakeholders and users to determine metadata.

Identify Terms
Published Reference Materials
Thesauri, classification schemes, encyclopedias, dictionaries, glossaries, indexes.

Content
Representative sample of web site / intranet.

Users
Search log analysis, surveys, interviews.

Experts
Authors, subject experts.
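Of these sources, search log analysis is the easiest to automate. The rough sketch below assumes the log has already been reduced to one query string per line (real logs need parsing first). It counts normalized queries so that the most frequent user wording can be compared with candidate preferred terms. The sample queries are made up.

```python
from collections import Counter


def top_queries(lines, n=5):
    """Count normalized queries (lowercased, whitespace collapsed); return the n most common."""
    counts = Counter()
    for line in lines:
        query = " ".join(line.lower().split())
        if query:  # skip empty searches
            counts[query] += 1
    return counts.most_common(n)


sample_log = [
    "Vacation policy", "vacation  policy", "holiday schedule",
    "401k", "vacation policy", "Holiday Schedule", "",
]
print(top_queries(sample_log, 2))  # [('vacation policy', 3), ('holiday schedule', 2)]
```

Frequent queries that match no existing term are candidates for new preferred terms or for variant (USE) entries.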

Organize Terms
Synonym Rings: Define preferred terms. Link synonyms and variants.
Taxonomies / Hierarchies: Group preferred terms by subject. Identify broader and narrower terms.
Thesauri: Identify related terms.


Variant Terms
Variant terms provide the user with entry points into the vocabulary.
Synonyms (same meaning): cats USE felines; helicopters USE whirlybirds
Lexical variants (different word forms): paediatrics USE pediatrics; BK USE Burger King
Quasi-synonyms (treated as equivalent):
generic posting: beagle USE dog
antonyms/continuum: wetness USE dryness
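In a system, these USE references act as a lookup table from entry terms to preferred terms. The minimal sketch below assumes case-insensitive matching and passes any term not in the table through unchanged. The dictionary reproduces the slide's examples. The `to_preferred` helper name is hypothetical.

```python
# Entry vocabulary: variant -> preferred term, taken from the examples above.
USE = {
    "cats": "felines",
    "helicopters": "whirlybirds",
    "paediatrics": "pediatrics",
    "bk": "Burger King",
    "beagle": "dog",
    "wetness": "dryness",
}


def to_preferred(term: str) -> str:
    """Map a user's term to the preferred term; unknown terms pass through unchanged."""
    return USE.get(term.strip().lower(), term)


print(to_preferred("Paediatrics"))  # pediatrics
print(to_preferred("BK"))           # Burger King
print(to_preferred("felines"))      # felines
```

The quasi-synonym entries show the cost of generic posting: once "beagle" is sent to "dog", a search can no longer separate beagle material from other dog material.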

Term Specificity
Assuming a good entry vocabulary, increased term specificity allows for improved precision without hurting recall (but costs grow fast).
Vocabulary A: United States
Vocabulary B: United States > California > San Diego
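Here is a small worked example of the claim. It uses a made-up four-document collection and assumes that a query is expanded to include narrower terms, which is the good entry vocabulary and hierarchy the slide presupposes. It does not model indexing cost, which is what grows as specificity increases.

```python
# Vocabulary B as child -> broader term; Vocabulary A has only "United States".
BROADER = {"California": "United States", "San Diego": "California"}


def with_narrower(term):
    """Return the term plus every term below it in the hierarchy."""
    result, frontier = {term}, [term]
    while frontier:
        current = frontier.pop()
        for child, parent in BROADER.items():
            if parent == current and child not in result:
                result.add(child)
                frontier.append(child)
    return result


def retrieve(index, query_term):
    """index maps doc id -> indexing term; the query is expanded to narrower terms."""
    wanted = with_narrower(query_term)
    return {doc for doc, term in index.items() if term in wanted}


def precision_recall(retrieved, relevant):
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)


# Made-up collection indexed with Vocabulary B; under A every document is "United States".
index_b = {1: "San Diego", 2: "San Diego", 3: "California", 4: "United States"}
index_a = {doc: "United States" for doc in index_b}
relevant = {1, 2}  # the user wants material about San Diego

# With A, the entry vocabulary can only send "San Diego" to "United States".
print(precision_recall(retrieve(index_a, "United States"), relevant))  # (0.5, 1.0)
print(precision_recall(retrieve(index_b, "San Diego"), relevant))      # (1.0, 1.0)
# A broad query in B still finds everything, so recall is not lost.
print(sorted(retrieve(index_b, "United States")))                      # [1, 2, 3, 4]
```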


Compound Terms
Article Title: Software for Information Architects
One term: Information Architecture Software (high precision)
Two terms: Information Architecture; Software
Three terms: Architecture; Information; Software (high recall)
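You can see the trade-off by matching the slide's article against a second, unrelated one. The sketch below assumes a document matches when it carries every term in the query, which is post-coordination with Boolean AND. The second document and its index terms are hypothetical.

```python
# Index terms assigned under two indexing policies. "ia" is the slide's article,
# "Software for Information Architects"; "sw" is a hypothetical article about
# software architecture for information systems.
INDEXES = {
    "one term": {
        "ia": {"Information Architecture Software"},
        "sw": {"Information Systems", "Software Architecture"},
    },
    "three terms": {
        "ia": {"Architecture", "Information", "Software"},
        "sw": {"Architecture", "Information", "Software", "Systems"},
    },
}

# The same need ("software for information architecture") expressed per policy.
QUERIES = {
    "one term": {"Information Architecture Software"},
    "three terms": {"Architecture", "Information", "Software"},
}


def matches(index, query):
    """Post-coordinated AND search: a document matches if it has every query term."""
    return sorted(doc for doc, terms in index.items() if query <= terms)


for policy, index in INDEXES.items():
    print(policy, matches(index, QUERIES[policy]))
# one term ['ia']
# three terms ['ia', 'sw']
```

With single-word terms, a user who searches only "Software" reaches both articles (recall). The combined query, however, also returns the unrelated "sw" article as a false drop (precision).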


Facets
Facets of a Topic (expressed through controlled vocabular(ies)): Things (entities), Concepts, Processes, People, Organizations, Occupations, etc.
Facets of Documents (aspects of documents to index): Topic, Audience, Intellectual Level, Form, Type, Language, Date


Facet Analysis
Facets come from content inventory, intuition, and users. Break domain into logical categories or chunks based on how documents need to be managed (both for system and for search).


Polyhierarchy
Strict Hierarchies
Each term appears in only one place in the hierarchy. Essential for placement of physical objects.

Polyhierarchies
Terms are cross-listed in multiple categories. Accepts the complex nature of reality.

Polyhierarchy
Example: Medical Subject Headings (MeSH)
Compound terms are needed to manage 6 million documents in Medline.
A high level of pre-coordination forces polyhierarchy: terms may have more than one BT (broader term).
Diseases > Virus Diseases > Viral Pneumonia
Diseases > Respiratory Tract Diseases > Viral Pneumonia
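In a polyhierarchy a term can be reached by more than one path, so a browse interface should list it under each broader term. The minimal sketch below assumes the hierarchy is stored as term → list of broader terms and contains no cycles. It includes only the three MeSH-style terms from the slide, not actual MeSH data.

```python
# Each term maps to its broader terms (BT); Viral Pneumonia has two.
BT = {
    "Virus Diseases": ["Diseases"],
    "Respiratory Tract Diseases": ["Diseases"],
    "Viral Pneumonia": ["Virus Diseases", "Respiratory Tract Diseases"],
}


def paths_to_top(term):
    """Return every path from a top term down to `term`, as lists of terms."""
    parents = BT.get(term, [])
    if not parents:
        return [[term]]
    return [path + [term] for parent in parents for path in paths_to_top(parent)]


for path in paths_to_top("Viral Pneumonia"):
    print(" > ".join(path))
# Diseases > Virus Diseases > Viral Pneumonia
# Diseases > Respiratory Tract Diseases > Viral Pneumonia
```

A strict hierarchy is the special case in which every term has at most one BT, so `paths_to_top` always returns a single path.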


Facets, Coordination, Specificity


Facets:
Entities: Apples, Pears, Peaches
Processes: Canning, Freezing, Drying
Forms: Canned, Frozen, Fresh

Partial list of potential combinations:
Apples, Pears, Peaches, Canning, Freezing, Drying, Canned, Frozen, Fresh
Canning of Apples, Canning of Pears, Canning of Peaches, Freezing of Apples, Freezing of Pears, Freezing of Peaches, Drying of Apples, Drying of Pears, Drying of Peaches
Canned Apples, Canned Pears, Canned Peaches, Frozen Apples, Frozen Pears, Frozen Peaches, Fresh Apples, Fresh Pears, Fresh Peaches
Freezing of Canned Apples, Canning of Dried Pears, Drying of Fresh Peaches
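Counting the combinations shows why pre-coordination grows so quickly. The sketch below assumes headings are built mechanically from the three facets as listed ("process of entity", "form entity", "process of form entity"). It generates nonsensical headings too, such as "Canning of Canned Apples", which is one reason real pre-coordinated lists are selected by hand.

```python
from itertools import product

ENTITIES = ["Apples", "Pears", "Peaches"]
PROCESSES = ["Canning", "Freezing", "Drying"]
FORMS = ["Canned", "Frozen", "Fresh"]

# Post-coordinate: index with single facet terms and let the user combine them.
single_terms = ENTITIES + PROCESSES + FORMS

# Pre-coordinate: every combination must exist as its own heading.
process_of_entity = [f"{p} of {e}" for p, e in product(PROCESSES, ENTITIES)]
form_entity = [f"{f} {e}" for f, e in product(FORMS, ENTITIES)]
process_of_form_entity = [f"{p} of {f} {e}"
                          for p, f, e in product(PROCESSES, FORMS, ENTITIES)]

pre_coordinated = single_terms + process_of_entity + form_entity + process_of_form_entity
print(len(single_terms), len(pre_coordinated))  # 9 54
print("Freezing of Canned Apples" in pre_coordinated)  # True
```

Adding one more fruit adds only one term to the faceted list but adds 15 headings to the pre-coordinated one.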


Semantic Relationships
Equivalence:
Use/Used For (USE/UF): leads from variant terms to preferred terms.
e.g., prams USE baby carriages

A=B


Semantic Relationships
Hierarchical:
Broader Term/Narrower Term (BT/NT)

Types
Generic (class/species, inheritance)
Vertebrata NT Amphibia


Whole-Part (associative unless exclusive)


Ear NT Vestibular Apparatus

Instance (proper name)


Seas NT Mediterranean Sea


Semantic Relationships
Associative:
Related Term (RT, See Also): non-hierarchical and non-equivalent; the relation should be strongly implied.
e.g., hammers RT nails


Associative Relationships
Field of Study and Object of Study:
Forestry RT Forests

Process and its Agent:


Temperature Control RT Thermostat

Concepts and their Properties:


Poisons RT Toxicity

Action and Product of Action:


Weaving RT Cloth

Concepts Linked by Causal Dependence:


Bereavement RT Death
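The relationship pairs in these slides (USE/UF, BT/NT, RT/RT) are easiest to keep consistent if the software writes both directions whenever one is entered. The minimal sketch below assumes an in-memory store. The `Thesaurus` class is hypothetical and omits checks a real tool would need, such as rejecting a term that is both preferred and non-preferred.

```python
from collections import defaultdict

# Each relationship and its reciprocal.
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}


class Thesaurus:
    """Toy thesaurus that records every relationship in both directions."""

    def __init__(self):
        self.links = defaultdict(lambda: defaultdict(set))

    def add(self, term, rel, other):
        self.links[term][rel].add(other)
        self.links[other][RECIPROCAL[rel]].add(term)

    def related(self, term, rel):
        return sorted(self.links.get(term, {}).get(rel, set()))


t = Thesaurus()
t.add("prams", "USE", "baby carriages")
t.add("Vertebrata", "NT", "Amphibia")
t.add("hammers", "RT", "nails")
t.add("Forestry", "RT", "Forests")

print(t.related("baby carriages", "UF"))  # ['prams']
print(t.related("Amphibia", "BT"))        # ['Vertebrata']
print(t.related("nails", "RT"))           # ['hammers']
```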


Leveraging the Thesaurus


User Interface:
Generate browsable indexes (site-wide, sub-site, specialized authority lists). Enable Field-Specific Searching (filters, zones, sorting). Support personalization (map profile to vocabulary).

Behind the Scenes:


Enable efficient content management. Support decentralized tagging.

Uses of Metadata-Driven Website


Routing, Search, Navigation


Routing
Document stream (from individual contributors or a syndication service) > metadata filter (profile or filter) > document subset
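The routing flow amounts to testing each incoming record against a stored profile. The minimal sketch below assumes a profile is a set of allowed values per metadata element, and that a record must match every element the profile mentions. The element names and the sample stream are hypothetical.

```python
def matches_profile(record, profile):
    """True if, for every element in the profile, the record has at least one allowed value."""
    return all(set(record.get(element, [])) & allowed
               for element, allowed in profile.items())


def route(stream, profile):
    """Yield the subset of the document stream that passes the profile."""
    for record in stream:
        if matches_profile(record, profile):
            yield record


stream = [
    {"id": "a", "Audience": ["Employees"], "Subject": ["Benefits"]},
    {"id": "b", "Audience": ["Customers"], "Subject": ["Benefits"]},
    {"id": "c", "Audience": ["Employees"], "Subject": ["Travel"]},
]
profile = {"Audience": {"Employees"}, "Subject": {"Benefits", "Payroll"}}
print([r["id"] for r in route(stream, profile)])  # ['a']
```

The same filter works for push (run as documents arrive and sent to subscribers) or pull (run when the user visits).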


Generalizations about Routing


Can be push or pull. Can be driven by various metadata elements (e.g., audience, topic, etc.). May have both internal and external metadata schemes to consider; mapping may be an important issue.
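When a syndication feed uses its own scheme, records are usually mapped (crosswalked) to the internal one before filtering. The sketch below assumes a one-to-one element mapping plus a value mapping for one controlled element. The external element names (`creator`, `title`, `type`, `rights`) are loosely modeled on Dublin Core. All mapping tables and values are hypothetical. Unmapped elements and values are reported rather than silently dropped.

```python
# Hypothetical crosswalk from an external feed's elements to the internal schema.
ELEMENT_MAP = {"creator": "Author", "title": "Title", "type": "DocType"}
# Hypothetical value mapping for a controlled element.
VALUE_MAP = {"DocType": {"Text.WhitePaper": "White Paper", "Text.News": "Press Release"}}


def crosswalk(external: dict[str, list[str]]):
    """Translate one external record; return (internal record, list of problems)."""
    internal, problems = {}, []
    for element, values in external.items():
        target = ELEMENT_MAP.get(element)
        if target is None:
            problems.append(f"no mapping for element '{element}'")
            continue
        mapped = []
        for value in values:
            if target in VALUE_MAP:
                if value not in VALUE_MAP[target]:
                    problems.append(f"no mapping for {target} value '{value}'")
                    continue
                value = VALUE_MAP[target][value]
            mapped.append(value)
        internal[target] = mapped
    return internal, problems


record, issues = crosswalk({"creator": ["J. Jones"], "type": ["Text.WhitePaper"],
                            "rights": ["(c) 2002"]})
print(record)  # {'Author': ['J. Jones'], 'DocType': ['White Paper']}
print(issues)  # ["no mapping for element 'rights'"]
```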


Searching
User query > databases (metadata records, linked to content at http://...) > document subset


Epicurious.com


Epicurious, First Facet

Browse > Picnics


Epicurious.com Facets
Main Ingredients
Beans, Beef, Berries, Cheese, Chocolate, Citrus, Dairy, Eggs, Fish, Fruits, Garlic, Ginger, Grains, Greens, Herbs, Lamb, Mushrooms, Mustard, Nuts, Olives, Onions, Pasta, Peppers, Pork, Potatoes, Poultry, Rice, Shellfish, Tomatoes, Vegetables

Season/Occasion
Christmas, Easter, Fall, Fourth of July, Hanukkah, New Years, Picnics, Spring, Summer, Superbowl, Thanksgiving, Valentine's Day, Winter

Cuisine
African, American, Asian, Caribbean, Eastern European, French, Greek, Indian, Italian, Jewish, Mediterranean, Mexican, Middle Eastern, Scandinavian, Spanish

Course/Dish
Appetizers, Bread, Breakfast, Brunch, Condiments, Cookies, Desserts, Hors D'oeuvres, Main Dish, Salads, Sandwiches, Sauces, Side Dish, Snacks, Soup, Vegetables

Preparation Method
Advance, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cook, Poach, Quick, Roast, Sauté, Slow Cook, Steam, Stir Fry


Epicurious, Second Facet


Browse > Picnics > Poultry
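Each facet click narrows the recipe set, so Browse > Picnics > Poultry is the intersection of two facet values. The minimal sketch below assumes each recipe carries a set of values per facet. The recipe names and tags are made up, not Epicurious data. The `value_counts` helper shows how many recipes remain under each value of another facet, which lets a faceted interface avoid offering dead-end links.

```python
from collections import Counter

# Made-up recipes tagged with facet values (facet -> set of values).
RECIPES = {
    "Lemon Chicken Salad": {"Season/Occasion": {"Picnics", "Summer"},
                            "Main Ingredients": {"Poultry", "Citrus"}},
    "Deviled Eggs": {"Season/Occasion": {"Picnics"},
                     "Main Ingredients": {"Eggs"}},
    "Roast Turkey": {"Season/Occasion": {"Thanksgiving"},
                     "Main Ingredients": {"Poultry"}},
}


def browse(selections):
    """Return recipes that have every selected (facet, value) pair."""
    return sorted(name for name, facets in RECIPES.items()
                  if all(value in facets.get(facet, set())
                         for facet, value in selections))


def value_counts(names, facet):
    """Count remaining recipes under each value of a facet (for the next click)."""
    return Counter(v for n in names for v in RECIPES[n].get(facet, set()))


picnics = browse([("Season/Occasion", "Picnics")])
print(picnics)  # ['Deviled Eggs', 'Lemon Chicken Salad']
print(browse([("Season/Occasion", "Picnics"), ("Main Ingredients", "Poultry")]))
# ['Lemon Chicken Salad']
```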


Integration of Search and Browse




Amazon.com Advanced Search


Generalizations about Search & Navigation


The relationship between the metadata and search engine capabilities is crucial. Controlled vocabulary and keyword searching are often both enabled. Navigation and search are often both provided as complements to each other.
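Here is a sketch of how the two kinds of searching can complement each other. It assumes each document has a controlled Subject field and free text. The subject filter is matched exactly against the vocabulary term after mapping variants through a USE table, and keywords are matched as lowercase words in the text. The documents, table, and function names are hypothetical. Real search engines add ranking, stemming, and phrase handling.

```python
import re

USE = {"cats": "Felines", "kitties": "Felines"}  # hypothetical entry vocabulary

DOCS = [
    {"id": 1, "subjects": {"Felines"}, "text": "Caring for an older cat at home"},
    {"id": 2, "subjects": {"Dogs"}, "text": "Why cats and dogs fight"},
    {"id": 3, "subjects": {"Felines"}, "text": "Big cat conservation in Africa"},
]


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(subject=None, keywords=""):
    """Filter by controlled subject (if given), then require every keyword in the text."""
    if subject is not None:
        subject = USE.get(subject.lower(), subject)
    wanted = words(keywords)
    return [d["id"] for d in DOCS
            if (subject is None or subject in d["subjects"])
            and wanted <= words(d["text"])]


print(search(keywords="cats"))                    # [2]
print(search(subject="cats"))                     # [1, 3]
print(search(subject="cats", keywords="africa"))  # [3]
```

The keyword "cats" misses two documents about cats and finds one that is mainly about dogs. The subject search corrects both problems, and adding a keyword narrows the subject results further.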

Contact:
Amy J. Warner, Ph.D. warneramyj@yahoo.com

Questions?

