Matching
1. Upload
2. Import
3. Search via bar
4. Search via filter
5. Dry-run
Upload, Import, Search via filter, and dry-run match type-specifically; Search via bar has to
match against all types.
Normalization
1. Replace umlauts and special characters: Ä->ae, Ö->oe, Ü->ue, ä->ae, ö->oe, ü->ue, ß->ss
2. Replace capital letters with lower-case letters
3. Replace every character other than [0..9] and [a..z] with a blank
4. Split into a list of words, where a word is any sequence of non-blank characters longer than 2 characters
5. Remove stop-words from the list of words
6. Remove duplicates from the list of words
7. Sort the list of words alphanumerically and store it with the AU entry as one text, with the
words separated by blanks
For an AU, normalization is run on every creation or change of the AU entry, and the result is saved with the AU.
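The seven normalization steps can be sketched as follows. This is a minimal illustration, not the actual implementation: the function names `normalize` and `normalized_text` are hypothetical, and the stop-word list is a placeholder, since the specification does not say which stop-words are used.

```python
import re

# Step 1 mapping, taken from the list above.
UMLAUTS = str.maketrans({
    "Ä": "ae", "Ö": "oe", "Ü": "ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
})

# Placeholder stop-words (assumption): the specification does not list them.
STOP_WORDS = frozenset({"der", "die", "das", "und", "the", "and", "for"})


def normalize(phrase, stop_words=STOP_WORDS):
    """Return the sorted, duplicate-free list of normalized words."""
    text = phrase.translate(UMLAUTS)                   # step 1: umlauts, sharp s
    text = text.lower()                                # step 2: lower case
    text = re.sub(r"[^0-9a-z]", " ", text)             # step 3: everything else -> blank
    words = [w for w in text.split() if len(w) > 2]    # step 4: words longer than 2
    words = [w for w in words if w not in stop_words]  # step 5: drop stop-words
    return sorted(set(words))                          # step 6 + sorting for step 7


def normalized_text(phrase, stop_words=STOP_WORDS):
    """Step 7: join the sorted words with blanks into one text."""
    return " ".join(normalize(phrase, stop_words))


print(normalized_text("Die Größe und das Müsli-Öl für Öl"))
```

Returning the word list from `normalize` and joining it in a separate function keeps the list form available for on-the-fly matching, while the joined text is the form that would be stored with the AU.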
AU data            Description
value_<lang>       the defining phrase, consisting of one or more words
variants_<lang>    one or more phrases for value
groups_<lang>      one or more generic terms for value
links_<lang>       one or more URLs describing the value
n_value_<lang>     normalized value phrase, unique between different types
n_variants_<lang>  normalized variant phrases
n_groups_<lang>    normalized group phrases
n_value_all        combined languages of normalized values
n_variants_all     combined languages of normalized variants and values
n_groups_all       combined languages of normalized groups and variants and values
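How the three `_all` fields could be derived from the per-language fields is sketched below. The specification does not define "combined", so this assumes it means the sorted, duplicate-free union of the words, in line with normalization steps 6 and 7. It also assumes each normalized field is stored as a single blank-separated text. The helpers `combine` and `build_all_fields` are hypothetical names.

```python
def combine(*texts):
    """Merge blank-separated normalized texts into one sorted, duplicate-free text."""
    words = set()
    for text in texts:
        words.update(text.split())
    return " ".join(sorted(words))


def build_all_fields(entry, langs):
    """Derive the *_all fields of an AU entry (a dict) from its per-language fields."""
    values = [entry.get(f"n_value_{lang}", "") for lang in langs]
    variants = [entry.get(f"n_variants_{lang}", "") for lang in langs]
    groups = [entry.get(f"n_groups_{lang}", "") for lang in langs]

    n_value_all = combine(*values)
    # Variants include the values, and groups include variants and values.
    n_variants_all = combine(n_value_all, *variants)
    n_groups_all = combine(n_variants_all, *groups)
    return {
        "n_value_all": n_value_all,
        "n_variants_all": n_variants_all,
        "n_groups_all": n_groups_all,
    }


entry = {
    "n_value_de": "apfel", "n_value_en": "apple",
    "n_variants_de": "apfelfrucht", "n_variants_en": "",
    "n_groups_de": "obst", "n_groups_en": "fruit",
}
print(build_all_fields(entry, ["de", "en"]))
```

Because each `_all` field contains the previous one, a search against `n_groups_all` also finds words that appear only in a value or a variant.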
Matchings
The phrase to be matched has to be normalized on the fly; the result is a list of words.
We define the following matchings between the resulting list of words and the AU:
Algorithms
Basic algorithm for Upload, Import, Search via filter, and dry-run:
1->2->3->4->5->6->7->8->9->10->11->12->13->14->15->16->17->18