Denisa Popescu
Enterprise Architecture
World Bank Group
Presentation Outline
As a result,
• Uneven capture of information and metadata across the
Bank’s institutional repositories
• Similar information resides in multiple repositories
• Multiple representations for same type of “information”
• Staff can’t find related information
Information in Bank’s Environment
[Diagram. Labels: Knowledge Sharing; People create information; People find information by searching or browsing repositories; Architecture Processes to Manage Information (Create, Capture & Catalog, Manage, Publish, Access/Usage); Structured Information Structures (SAP, P/Soft, Other DBs); Unstructured Information Collections (Documents, Email, Multimedia, etc.); Enterprise Architecture; Governance (Models, Stewards, Data Harmonization); Technology.]
[Diagram of information entities. Labels: Employee; Vendor; Client; Consultant; Partner; Party; Business; Project; WB Product & Service; Policy; Finance; Organization; Cost Centre, Fund Centre, Chart of Account; Documents & Reports; Reference Data; Identity; Theme; Sector; Geographical; Country.]
Title
Topics
Author
Core Business Function
Owner
Keywords
Abstract
Extension Country
Project ID
ResourceIdentifier
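The fields listed above can be read as a single metadata record. The following is a minimal sketch in Python, not the Bank's actual schema. It assumes that Country and Project ID are optional extension fields and that the remaining fields are core. The class name, attribute names, types and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocumentMetadata:
    # Core fields (names follow the slide; types are assumptions).
    resource_identifier: str
    title: str
    author: str
    owner: str
    core_business_function: str
    abstract: str = ""
    topics: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    # Extension fields (assumed to be optional).
    country: Optional[str] = None
    project_id: Optional[str] = None

    def missing_core_fields(self) -> List[str]:
        """Return the names of core fields that are still empty."""
        core = [
            "resource_identifier", "title", "author", "owner",
            "core_business_function", "abstract", "topics", "keywords",
        ]
        return [name for name in core if not getattr(self, name)]


# Made-up sample record for illustration only.
record = DocumentMetadata(
    resource_identifier="DOC-0001",
    title="Example working paper",
    author="Example Author",
    owner="Example Unit",
    core_business_function="Example Function",
)
print(record.missing_core_fields())
```

A check like `missing_core_fields` is one way to make uneven metadata capture visible: records with empty core fields can be listed and completed, by a person or by automatic extraction.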
Automatic Metadata Capture using Teragram
You might think it would be easy to tell what the document you’re reading is
about. However, this software can tell us not only what you think it’s about,
but what the Bank thinks it’s about.
Fortunately, we can automatically process a great deal of it using Teragram, which scans documents, recognizes terms and categorizes them for us. This is often more effective than letting a human being try to figure out what a document is about.
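To make "recognizes terms and categorizes them" concrete, here is a minimal sketch in Python. It is not Teragram's API or algorithm. It only counts occurrences of terms from a small hand-made taxonomy. The taxonomy, the function name `categorize` and the sample sentence are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical mini-taxonomy: topic -> terms that signal the topic.
TAXONOMY: Dict[str, List[str]] = {
    "Cultural Heritage": ["cultural heritage", "heritage sites", "preservation"],
    "Tourism": ["tourism", "tourist"],
    "Environment": ["natural environments", "sustainability", "sustainable"],
}


def categorize(text: str, taxonomy: Dict[str, List[str]] = TAXONOMY) -> Dict[str, int]:
    """Count whole-word term matches per topic; topics with no hits are omitted."""
    lowered = text.lower()
    scores: Dict[str, int] = {}
    for topic, terms in taxonomy.items():
        hits = sum(
            len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
            for term in terms
        )
        if hits:
            scores[topic] = hits
    return scores


# Made-up sentence for illustration only.
sample = "Strategies for sustainable tourism at cultural heritage sites"
print(categorize(sample))
```

The topics with the highest counts would become candidate metadata values. A production categorizer would also handle word variants, ambiguity and weighting, which this sketch leaves out.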
For example, the Bank produces a working paper on “Sustainable tourism and cultural heritage”. This report provides an overview of the relation between cultural heritage preservation and tourism, and presents strategies for promoting sustainability in the tourism industry associated with cultural heritage sites and natural environments.
This concerns preservation of heritage sites
Enterprise Topic Taxonomy
Automatic Metadata Capture: E-Library
Teragram-generated Topics, Keywords, Region
Example of Use of Thesaurus in Search
Teragram-generated (Thesaurus)
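One common way a thesaurus is used in search is query expansion: a query term is widened to include its preferred term and related terms before matching. The sketch below, in Python, illustrates that general idea only. It does not describe how the Bank's search or Teragram implements it. The thesaurus entries, function names and sample documents are hypothetical.

```python
from typing import Dict, List

# Hypothetical thesaurus entries: preferred term -> related terms.
THESAURUS: Dict[str, List[str]] = {
    "cultural heritage": ["cultural property", "heritage sites"],
    "tourism": ["travel industry"],
}


def expand_query(query: str, thesaurus: Dict[str, List[str]] = THESAURUS) -> List[str]:
    """Return the query plus every other term in its thesaurus group."""
    q = query.strip().lower()
    expanded = [q]
    for preferred, related in thesaurus.items():
        group = [preferred] + related
        if q in group:
            expanded.extend(term for term in group if term != q)
    return expanded


def search(query: str, documents: Dict[str, str]) -> List[str]:
    """Return ids of documents containing the query or any expanded term."""
    terms = expand_query(query)
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    ]


# Made-up documents for illustration only.
docs = {
    "a": "Protecting cultural property in cities",
    "b": "Road maintenance budget",
}
print(search("cultural heritage", docs))
```

Here document "a" never mentions "cultural heritage", yet it is found because the thesaurus links that term to "cultural property".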
Automatic Metadata Extraction and Categorization
[Diagram. Labels: Raw Content; Apply Teragram Profiles; Extract Metadata; content metadata; Quality control; Group into Collections.]
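The stages named on this slide can be sketched as a small pipeline in Python. The order used here (apply profiles, extract metadata, quality control, group into collections) is an assumption, since the diagram's arrows are not recoverable. A "profile" is modelled as a plain topic-to-terms mapping, which is a simplification and not Teragram's profile format. All function names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Profile = Dict[str, List[str]]   # topic -> signal terms (assumed shape)
Record = Dict[str, object]       # extracted metadata for one document


def apply_profile(text: str, profile: Profile) -> List[str]:
    """Return the topics whose terms appear in the text."""
    lowered = text.lower()
    return [
        topic for topic, terms in profile.items()
        if any(term in lowered for term in terms)
    ]


def extract_metadata(doc_id: str, text: str, profile: Profile) -> Record:
    """Build a metadata record from raw content."""
    return {"id": doc_id, "topics": apply_profile(text, profile)}


def quality_control(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into accepted ones and ones needing human review."""
    accepted = [r for r in records if r["topics"]]
    review = [r for r in records if not r["topics"]]
    return accepted, review


def group_into_collections(records: List[Record]) -> Dict[str, List[str]]:
    """Group document ids by topic."""
    collections: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        for topic in record["topics"]:
            collections[topic].append(record["id"])
    return dict(collections)


# Made-up raw content and profile for illustration only.
raw = {"d1": "Tourism near heritage sites", "d2": "Quarterly staffing note"}
profile = {"Tourism": ["tourism"], "Cultural Heritage": ["heritage"]}

records = [extract_metadata(doc_id, text, profile) for doc_id, text in raw.items()]
accepted, review = quality_control(records)
print(group_into_collections(accepted))
print([r["id"] for r in review])
```

Routing documents with no recognized topic to a review list keeps a human in the loop for the cases the automatic step cannot categorize.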
• Ensure that you can derive rules from the information and context