Outline
What I'll cover:
Metadata and IA. Metadata schema. Vocabulary development.
Underlying themes:
Standards. Reality. Some IR (information retrieval) issues.
Amy J. Warner, Ph.D.
What is Metadata?
Metadata is structured data which describes the characteristics of a resource. It shares many similar characteristics to the cataloguing that takes place in libraries, museums and archives.
Chris Taylor, University of Queensland
Administrative
Definition: Metadata used in managing and administering resources
Examples: Acquisition information; rights and reproduction tracking; documentation of legal access requirements; location information; version control
Descriptive
Examples: Cataloging records; specialized indexes; hyperlinked relationships between resources; annotations by users
Preservation
Definition: Metadata related to the preservation of information resources
Examples: Documentation of actions taken to preserve physical and digital versions of resources (e.g., data refreshing and migration)
Technical
Definition: Metadata related to how a system functions or how metadata behaves
Examples: Digitization information (e.g., formats, compression ratios, scaling routines); authentication and security data (e.g., encryptions, passwords)
Use
Examples: Use and user tracking; content re-use and multiversioning information
Confusing Terminology
Controlled vocabularies
Subject Headings: traditionally employed in libraries to tag (index) the topics of books and other library materials
Thesauri: traditionally employed in abstracting & indexing services to tag (index) the topics of journal articles and other scholarly material in a given subject area (e.g., medicine, engineering)
Taxonomies: the classification of different organisms into mutually exclusive categories, from phylum down to species
Levels of Control
Vocabularies (simple to complex): Synonym Rings, Authority Files, Classification Schemes, Thesauri, Taxonomies
Relationships (simple to complex): Equivalence, Hierarchical, Associative
Metadata & IA
Business Context
Determine how stakeholders want to organize & present their information
Users
Determine how target audience(s) search for and use information
Content
Identify patterns in content
IA Generations
Brochureware
Pages served from database
Metadata-driven website
CMS
Metadata Records
http://.
Content
Two Possibilities
Content already exists
Identify content that exists--content inventory.
To do a content inventory, go to the people who will develop, own, and maintain the content.
Content Analysis
Look for patterns, similarities:
logical--themes, sensitivity, specialization.
physical--formats, dynamic vs. static (dated vs. rarely updated).
Look for relationships--note connections between content (parent-child, sibling, dependencies). Begin to create groupings.
Metadata Table
Field     Required  Repeatable  Auto/Manual  Whole Doc/Concepts  CV
Author    Yes       Yes         Manual       Whole Doc           No
Title     Yes       No          Manual       Whole Doc           No
DocType   No        Yes         Manual       Whole Doc           DocTypes List
Subject   Yes       Yes         Semi-Auto    Concepts            Subjects Vocabulary
Audience  No        No          Manual       Whole Doc           Audience List
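A schema like the one in the table can be checked by machine. The sketch below is one possible way to encode the Required, Repeatable, and CV columns and validate a record against them. The vocabulary contents ("Policy", "Staff", and so on) and the names `FieldRule`, `SCHEMA`, and `validate` are hypothetical placeholders, not part of the original slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    required: bool
    repeatable: bool
    vocabulary: Optional[frozenset] = None  # None means free text (CV = No)


# Required/Repeatable/CV follow the table; the vocabulary members are invented.
SCHEMA = {
    "Author": FieldRule(required=True, repeatable=True),
    "Title": FieldRule(required=True, repeatable=False),
    "DocType": FieldRule(False, True, frozenset({"Policy", "Report"})),
    "Subject": FieldRule(True, True, frozenset({"Metadata", "Taxonomies"})),
    "Audience": FieldRule(False, False, frozenset({"Staff", "Public"})),
}


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record is valid.

    `record` maps a field name to a list of values.
    """
    problems = []
    for name, rule in SCHEMA.items():
        values = record.get(name, [])
        if rule.required and not values:
            problems.append(f"{name}: required field is missing")
        if not rule.repeatable and len(values) > 1:
            problems.append(f"{name}: only one value allowed")
        if rule.vocabulary is not None:
            for value in values:
                if value not in rule.vocabulary:
                    problems.append(f"{name}: '{value}' is not in the controlled vocabulary")
    for name in record:
        if name not in SCHEMA:
            problems.append(f"{name}: unknown field")
    return problems
```

Calling `validate({"Title": ["A", "B"], "Subject": ["Cats"]})` would report the missing Author, the repeated Title, and the uncontrolled Subject value.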
Identify Terms
Published Reference Materials
Thesauri, classification schemes, encyclopedias, dictionaries, glossaries, indexes.
Content
Representative sample of web site / intranet.
Users
Search log analysis, surveys, interviews.
Experts
Authors, subject experts.
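Of the sources above, search logs are the easiest to process automatically. As a rough sketch, assuming the log is already available as a plain list of query strings, the function below counts normalized queries and lists the frequent ones that are not yet in the vocabulary. The function name and the `min_count` threshold are illustrative choices.

```python
from collections import Counter


def candidate_terms(queries, known_terms, min_count=2):
    """Return (query, count) pairs for frequent queries missing from the vocabulary."""
    # Normalize case and collapse repeated whitespace before counting.
    counts = Counter(" ".join(q.lower().split()) for q in queries if q.strip())
    known = {t.lower() for t in known_terms}
    return [
        (query, n)
        for query, n in counts.most_common()
        if n >= min_count and query not in known
    ]
```

The output is only a list of candidates: someone still has to decide whether each one becomes a preferred term, a variant, or nothing at all.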
Organize Terms
Define preferred terms. Link synonyms and variants.
Synonym Rings
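A synonym ring treats all of its members as equivalent, with no preferred term. A minimal sketch of how a search system might use one to expand a query follows. The ring contents reuse examples from elsewhere in these slides, and `build_ring_index` and `expand` are hypothetical names.

```python
# Each ring is a set of terms considered equivalent for retrieval.
RINGS = [
    {"prams", "baby carriages"},
    {"paediatrics", "pediatrics"},
]


def build_ring_index(rings):
    """Map every member of every ring to the full ring it belongs to."""
    index = {}
    for ring in rings:
        members = frozenset(term.lower() for term in ring)
        for term in members:
            index[term] = members
    return index


def expand(query, index):
    """Return the set of terms to search for; unknown terms expand to themselves."""
    q = query.strip().lower()
    return set(index.get(q, {q}))
```

With this index, a search for "Prams" would be run against both "prams" and "baby carriages".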
Variant Terms
Variant terms provide the user with entry points into the vocabulary.
Synonyms (same meaning):
cats USE felines
helicopters USE whirlybirds
Lexical Variants (different word forms):
paediatrics USE pediatrics
BK USE Burger King
Quasi-Synonyms (treated as equivalent):
generic posting: beagle USE dog
antonyms/continuum: wetness USE dryness
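Unlike a synonym ring, USE references point from each variant to one preferred term. The sketch below stores the USE examples from this slide in a lookup table and derives the reverse "used for" listing. Lower-casing the keys is an assumption made here for matching, and the function names are illustrative.

```python
from collections import defaultdict

# variant -> preferred term, taken from the examples on this slide
USE = {
    "cats": "felines",
    "helicopters": "whirlybirds",
    "paediatrics": "pediatrics",
    "bk": "burger king",
    "beagle": "dog",
    "wetness": "dryness",
}


def to_preferred(term):
    """Resolve an entry term to its preferred term; preferred terms map to themselves."""
    key = term.strip().lower()
    return USE.get(key, key)


def used_for(use_map):
    """Invert the USE table so each preferred term lists its variants."""
    inverse = defaultdict(list)
    for variant, preferred in use_map.items():
        inverse[preferred].append(variant)
    return dict(inverse)
```

A tagging tool could call `to_preferred` on whatever the indexer types, so that "BK" is always stored as "burger king".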
Term Specificity
Assuming a good entry vocabulary, increased term specificity allows for improved precision without hurting recall (but costs grow fast).
Vocabulary A
  United States
Vocabulary B
  United States
    California
      San Diego
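The trade-off can be made concrete with a small worked example. The sketch below indexes four hypothetical documents under Vocabulary A and Vocabulary B and computes precision and recall for a user who wants material about San Diego. The document set and all function names are invented for illustration.

```python
# Vocabulary B from the slide: each place points to its broader term.
BROADER = {"San Diego": "California", "California": "United States"}

# Hypothetical documents, each labelled with the most specific place it covers.
DOC_PLACES = {"d1": "San Diego", "d2": "San Diego", "d3": "California", "d4": "United States"}

# Vocabulary A only contains "United States", so every document gets that term.
INDEX_A = {doc: "United States" for doc in DOC_PLACES}
# Vocabulary B keeps the most specific term available.
INDEX_B = dict(DOC_PLACES)


def chain(term):
    """Return the term followed by all of its broader terms."""
    terms = [term]
    while terms[-1] in BROADER:
        terms.append(BROADER[terms[-1]])
    return terms


def search(index, term, include_narrower=False):
    """Find documents tagged with the term, optionally including narrower terms."""
    if include_narrower:
        return {doc for doc, tag in index.items() if term in chain(tag)}
    return {doc for doc, tag in index.items() if tag == term}


def precision_recall(retrieved, relevant):
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


RELEVANT = {"d1", "d2"}  # the user wants documents about San Diego
```

Under these assumptions, Vocabulary A retrieves all four documents for a precision of 0.5, while Vocabulary B retrieves only the two San Diego documents for a precision of 1.0. Recall stays at 1.0 in both cases, and a broad search on "United States" with `include_narrower=True` still reaches every document in Vocabulary B.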
Compound Terms
Article Title: Software for Information Architects
One Term (high precision): Information Architecture Software
Two Terms: Information Architecture; Software
Three Terms (high recall): Architecture; Information; Software
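To see why splitting a compound raises recall, the sketch below records how the article would be indexed under each of the three options and reports which options a given search term would match. Exact-match lookup and the names `SCHEMES` and `matching_schemes` are simplifying assumptions.

```python
# Index terms assigned to the article under each option on the slide.
SCHEMES = {
    "one": ["information architecture software"],
    "two": ["information architecture", "software"],
    "three": ["architecture", "information", "software"],
}


def matching_schemes(query):
    """Return the schemes under which an exact-match search finds the article."""
    q = query.strip().lower()
    return [name for name, terms in SCHEMES.items() if q in terms]
```

A search on "software" finds the article under the two-term and three-term options only. A search on "architecture" finds it under the three-term option alone, which helps recall but would also match a searcher interested in buildings, so precision drops.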
Facets
Facets of a Topic
Things (entities), Concepts, Processes, People, Organizations, Occupations, etc.
Facets of Documents
Topic, Audience, Intellectual Level, Form, Type, Language, Date
Controlled Vocabular(ies)
Facet Analysis
Facets come from content inventory, intuition, and users. Break domain into logical categories or chunks based on how documents need to be managed (both for system and for search).
Polyhierarchy
Strict Hierarchies
Each term appears in only one place in the hierarchy. Essential for placement of physical objects.
Polyhierarchies
Terms cross-listed in multiple categories. Accepts complex nature of reality.
Polyhierarchy
Medical Subject Headings (MeSH)
Compound terms needed to manage 6 million documents in Medline.
High level of pre-coordination forces polyhierarchy.
Terms may have more than one BT.
Diseases
Virus Diseases
Viral Pneumonia
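A polyhierarchy can be stored by letting each term list more than one broader term. The sketch below uses the path shown on the slide plus a second parent, "Pneumonia", which is added here purely as an illustrative assumption and is not taken from the slide. The names `BROADER_TERMS`, `all_broader`, and `is_strict` are hypothetical.

```python
# term -> list of broader terms (BT). More than one BT means polyhierarchy.
BROADER_TERMS = {
    "Viral Pneumonia": ["Virus Diseases", "Pneumonia"],  # second BT is assumed
    "Virus Diseases": ["Diseases"],
    "Pneumonia": ["Diseases"],
}


def all_broader(term):
    """Collect every ancestor of a term by following all BT links."""
    seen = set()
    stack = list(BROADER_TERMS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BROADER_TERMS.get(current, []))
    return seen


def is_strict(broader_terms):
    """A strict hierarchy gives every term at most one broader term."""
    return all(len(parents) <= 1 for parents in broader_terms.values())
```

Because "Viral Pneumonia" sits under two parents, a user browsing down from either branch would reach it, and `is_strict(BROADER_TERMS)` would be false.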
Semantic Relationships
Equivalence:
Use/Used For (USE/UF)
Leads from variants to preferred
e.g., prams: USE baby carriages
A=B
Semantic Relationships
Hierarchical:
Broader Term/Narrower Term (BT/NT)
Types
Generic (class/species, inheritance)
Vertebrata NT Amphibia
A
B
Semantic Relationships
Associative:
Related Term (RT, See Also)
Non-hierarchical and non-equivalent
Relation should be strongly implied
e.g., hammers RT nails
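The three relationship types can live together in one small data structure. The sketch below is a minimal in-memory thesaurus that stores each link together with its reciprocal (USE with UF, BT with NT, RT in both directions), loaded with the examples from these slides. The `Thesaurus` class and its method names are hypothetical, not a reference to any particular thesaurus tool.

```python
from collections import defaultdict


class Thesaurus:
    """Stores USE/UF, BT/NT, and RT links and keeps each pair reciprocal."""

    def __init__(self):
        self.use = {}                # variant -> preferred
        self.uf = defaultdict(set)   # preferred -> variants
        self.bt = defaultdict(set)   # term -> broader terms
        self.nt = defaultdict(set)   # term -> narrower terms
        self.rt = defaultdict(set)   # term -> related terms

    def add_equivalence(self, variant, preferred):
        self.use[variant] = preferred
        self.uf[preferred].add(variant)

    def add_hierarchy(self, broader, narrower):
        self.nt[broader].add(narrower)
        self.bt[narrower].add(broader)

    def add_association(self, a, b):
        # RT is symmetric, so record it in both directions.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def entry(self, term):
        """Return the full record for a term, resolving variants first."""
        preferred = self.use.get(term, term)
        return {
            "term": preferred,
            "UF": sorted(self.uf.get(preferred, ())),
            "BT": sorted(self.bt.get(preferred, ())),
            "NT": sorted(self.nt.get(preferred, ())),
            "RT": sorted(self.rt.get(preferred, ())),
        }


thesaurus = Thesaurus()
thesaurus.add_equivalence("prams", "baby carriages")
thesaurus.add_hierarchy("Vertebrata", "Amphibia")
thesaurus.add_association("hammers", "nails")
```

Looking up `thesaurus.entry("prams")` resolves to the preferred term "baby carriages", and `thesaurus.entry("nails")` lists "hammers" as a related term even though the link was entered from the other side.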
Associative Relationships
Field of Study and Object of Study:
Forestry RT Forests
Routing
Document Stream Metadata Filter Document Subset
Profile or Filter
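Routing can be sketched as a filter applied to each document's metadata as it arrives. In the example below, a profile names the values a recipient cares about for each field, and a document is routed when it matches every field in the profile. The record layout, sample data, and function names are assumptions made for illustration.

```python
def matches_profile(metadata, profile):
    """True when the record shares at least one value with every profile field."""
    return all(
        set(metadata.get(field, [])) & set(wanted)
        for field, wanted in profile.items()
    )


def route(stream, profile):
    """Yield only the documents from the stream whose metadata fits the profile."""
    for doc in stream:
        if matches_profile(doc["metadata"], profile):
            yield doc


# Hypothetical document stream and profile.
STREAM = [
    {"id": 1, "metadata": {"Subject": ["Metadata"], "Audience": ["Staff"]}},
    {"id": 2, "metadata": {"Subject": ["Taxonomies"], "Audience": ["Staff"]}},
    {"id": 3, "metadata": {"Subject": ["Metadata"], "Audience": ["Public"]}},
]
PROFILE = {"Subject": ["Metadata", "Thesauri"], "Audience": ["Staff"]}
```

Note that an empty profile matches every document, so a real system would probably want to reject empty profiles explicitly.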
Searching
User Query Databases Document Subset
Metadata Records
http://.
Epicurious.com
Epicurious.com Facets
Main Ingredients
Beans, Beef, Berries, Cheese, Chocolate, Citrus, Dairy, Eggs, Fish, Fruits, Garlic, Ginger, Grains, Greens, Herbs, Lamb, Mushrooms, Mustard, Nuts, Olives, Onions, Pasta, Peppers, Pork, Potatoes, Poultry, Rice, Shellfish, Tomatoes, Vegetables
Season/Occasion
Christmas, Easter, Fall, Fourth of July, Hanukkah, New Years, Picnics, Spring, Summer, Superbowl, Thanksgiving, Valentine's Day, Winter
Cuisine
African, American, Asian, Caribbean, Eastern European, French, Greek, Indian, Italian, Jewish, Mediterranean, Mexican, Middle Eastern, Scandinavian, Spanish
Course/Dish
Appetizers, Bread, Breakfast, Brunch, Condiments, Cookies, Desserts, Hors D'oeuvres, Main Dish, Salads, Sandwiches, Sauces, Side Dish, Snacks, Soup, Vegetables
Preparation Method
Advance, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cook, Poach, Quick, Roast, Sauté, Slow Cook, Steam, Stir Fry
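Facets like these support filtering by any combination of values. The sketch below tags three invented recipes with values from the facets listed above, filters them by the user's selections, and counts the values remaining in another facet so the interface could show how many results each refinement would give. The recipes and function names are hypothetical, and nothing here describes how Epicurious is actually implemented.

```python
from collections import Counter

# Hypothetical recipes tagged with values from the facets listed above.
RECIPES = [
    {"name": "Grilled beef tacos", "Main Ingredients": {"Beef"},
     "Cuisine": {"Mexican"}, "Preparation Method": {"Grill"}},
    {"name": "Roast lamb", "Main Ingredients": {"Lamb"},
     "Cuisine": {"Greek"}, "Preparation Method": {"Roast"}},
    {"name": "Beef stir fry", "Main Ingredients": {"Beef"},
     "Cuisine": {"Asian"}, "Preparation Method": {"Stir Fry"}},
]


def filter_by_facets(recipes, selections):
    """Keep recipes that match at least one selected value in every selected facet."""
    return [
        recipe for recipe in recipes
        if all(recipe.get(facet, set()) & values for facet, values in selections.items())
    ]


def facet_counts(recipes, facet):
    """Count how many of the given recipes carry each value of a facet."""
    return Counter(value for recipe in recipes for value in recipe.get(facet, set()))
```

Selecting `{"Main Ingredients": {"Beef"}}` would keep the two beef recipes, and `facet_counts` on the result for "Cuisine" would show one Mexican and one Asian recipe as the next possible refinements.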
Contact:
Amy J. Warner, Ph.D. warneramyj@yahoo.com
Questions??