Phrase Output
IT Information Technology
HR Human Resources
Finance Finance
Engineering Engineering
Matthew Lawler lawlermj1@gmail.com
17 Sep 2019 9
Beyond Data Glossary 101
Authority
Authority Type represents the 'Who' of phrases.
That is, which person or organisation defined the phrase.
This is very useful for defusing definition wars.
Authority | Authority Type | Comment
Internal | Adhoc | Any term used by the organisation without an external authority.
Wiki | Adhoc | Womb of Ignorance, Kraziness and Incomprehension
Oracle | Commercial Organisation |
Kimball | Expert |
AG | Government |
Water Act 2007 | Parliamentary Act |
ANSI | Standards Organisation |
Phrase
A phrase is a single word, or common multiword
phrase. A set of phrases is a Corpus of words.
Phrase | Phrase Type | Expansion | Domain
WIP | Acronym | Work In Progress | AllDomains
Work | AllPhrase | | AllDomains
Yr | Contraction | Year | AllDomains
E | Letter | | AllDomains
Workstatus | MultipleWords | [Work, Status] | AllDomains
9 | Number | | AllDomains
Accrued | PastTense | | AllDomains
Works | Plural | | AllDomains
Oracle | ProperNoun | | Organisation
Macaddress | Term | | IT
Iadc | ZRubbish | | ZDomain
Column Name
The main input is the set of database names, including
schema, table and column names. These can be extracted
using SQL from the database's metadata tables.
Schema Table Name ORD Column Name
Accrued ACCRUED
Activities Activities
Activities ACTIVITIES
E E
Macaddress Macaddress
Macaddress MACADDRESS
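As an illustration of extracting names with SQL from metadata, here is a minimal Python sketch against an in-memory SQLite database. It is not the author's extraction query: the function name list_column_names and the sample tables are hypothetical, and ORD is assumed to mean the column's ordinal position. Other databases expose similar metadata (for example information_schema.columns in PostgreSQL or ALL_TAB_COLUMNS in Oracle).

```python
import sqlite3


def list_column_names(conn):
    """Return (table name, ordinal, column name) rows from SQLite metadata."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    rows = []
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
        # cid is zero-based, so add 1 to get an ordinal position.
        for cid, name, *_ in conn.execute(f'PRAGMA table_info("{table}")'):
            rows.append((table, cid + 1, name))
    return rows


# Hypothetical sample schema, reusing column names that appear in this deck.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_order ("
    "ACTIONWHENCOMPLETE TEXT, ORDER_TOTAL_ELAPSED_DURATION_HOURS_WH REAL)")
conn.execute("CREATE TABLE device (MACADDRESS TEXT)")

column_rows = list_column_names(conn)
for row in column_rows:
    print(row)
```

SQLite has no separate schema level here, so the sketch returns only table, ordinal and column; on a multi-schema database the schema name would be an extra column in the query.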
Name2Phrase
For each Name, this shows the list of phrases found and any unparsed string.
Cardinality is of the order of the number of column names (e.g. 200,000).
The examples below include correct parses and one false positive.
name2PhraseOutName | name2PhraseOutSnippetsFinal | ? | Note
ACTIONWHENCOMPLETE | [Action,When,Complete] | 0 | No Underscore, but still works
ORDER_TOTAL_ELAPSED_DURATION_HOURS_WH | [Order,Total,Elapsed,Duration,Hours,Wh] | 0 | Underscore separator
EFFORTTRACKINGTOTALTIMESPENTHOURS | [Effort,Tracking,Total,Times,PE,NT,Hours] | 1 | Need to add Timespent
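The deck does not state the parsing algorithm, so the following Python sketch is an assumption: it splits on underscores and then takes the longest known phrase at each position, collecting anything unmatched as the unparsed string. The names parse_name and vocab are hypothetical, and the tiny vocabulary is chosen only to mirror the examples above. It shows how a missing phrase such as Timespent can produce a false positive.

```python
def parse_name(name, vocab):
    """Split a column name into known phrases plus any unparsed text.

    Assumes vocab is a non-empty set of lower-case phrases.
    """
    max_len = max(map(len, vocab))
    phrases, unparsed = [], []
    for token in name.lower().split("_"):
        i = 0
        while i < len(token):
            # Try the longest candidate first, shrinking until a phrase matches.
            for j in range(min(len(token), i + max_len), i, -1):
                if token[i:j] in vocab:
                    phrases.append(token[i:j].capitalize())
                    i = j
                    break
            else:
                # No phrase starts here: keep the character as unparsed text.
                unparsed.append(token[i])
                i += 1
    return phrases, "".join(unparsed)


# Hypothetical vocabulary; a real corpus would be far larger.
vocab = {"action", "when", "complete", "order", "total", "elapsed",
         "duration", "hours", "wh", "effort", "tracking", "times", "pe", "nt"}

print(parse_name("ACTIONWHENCOMPLETE", vocab))
# (['Action', 'When', 'Complete'], '')
print(parse_name("EFFORTTRACKINGTOTALTIMESPENTHOURS", vocab))
# (['Effort', 'Tracking', 'Total', 'Times', 'Pe', 'Nt', 'Hours'], '')
print(parse_name("EFFORTTRACKINGTOTALTIMESPENTHOURS", vocab | {"timespent"}))
# (['Effort', 'Tracking', 'Total', 'Timespent', 'Hours'], '')
```

The sketch capitalises every phrase, so PE and NT print as Pe and Nt. Adding "timespent" to the vocabulary removes the false positive because the longest match now wins over "times".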
[Flow diagram. Labels: Input, Name, Parse, Name 2 Snippet, Snippet 2 Phrase, Phrase, Output, Join, Name 2 Phrase, Invert, Phrase 2 Name.]
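The Invert step, which turns Name 2 Phrase into Phrase 2 Name, can be read as building an inverted index. The Python sketch below is one possible interpretation, not the author's implementation; the function name invert and the sample mapping are hypothetical.

```python
from collections import defaultdict


def invert(name_to_phrases):
    """Build a phrase -> [names] index from a name -> [phrases] mapping."""
    phrase_to_names = defaultdict(list)
    for name, phrases in name_to_phrases.items():
        # dict.fromkeys drops repeated phrases within a name, keeping order.
        for phrase in dict.fromkeys(phrases):
            phrase_to_names[phrase].append(name)
    return dict(phrase_to_names)


# Sample Name 2 Phrase rows taken from the examples in this deck.
name_to_phrases = {
    "ACTIONWHENCOMPLETE": ["Action", "When", "Complete"],
    "ORDER_TOTAL_ELAPSED_DURATION_HOURS_WH":
        ["Order", "Total", "Elapsed", "Duration", "Hours", "Wh"],
    "EFFORTTRACKINGTOTALTIMESPENTHOURS":
        ["Effort", "Tracking", "Total", "Timespent", "Hours"],
}

phrase_to_names = invert(name_to_phrases)
print(phrase_to_names["Total"])
```

With this index, looking up a phrase such as Total or Hours returns every column name that contains it.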
Future?
Extracting words from Documents.
Grammatical rules + lexemes
NLP - Natural Language Processing