Faculty Presentation For University of Arizona Human Language and Technology 2020 Homecoming
Fig. 6. Formatted description of the common sunflower (Helianthus annuus) as constructed from traits extracted by the ETC. From the beta
version of the digital Flora of North America. See also the list of the individual traits in Fig. 5, and compare both with the original text in Fig. 1.
Source: http://dev.floranorthamerica.org/Helianthus_annuus.
Fig. 1. Trait data coverage in plants and vertebrates. For most plant and vertebrate orders, fewer than
10% of species are represented in these databases (a-b), and thousands of species in most countries
have no trait data at all (c-d). Adapted from Feng et al. (in review).
Trait Extraction from Textual Descriptions
[Diagram: terms, Categorical Glossaries, Character extraction, Domain Ontologies]
OTO: Group Terms
Term Report: History and Comments
Plant Glossaries Built in OTO
• Plant_glossary v0.1
• 3293 terms from the published FNA
Categorical Glossary + FNA v19
• New terms from 30 vols of FNA & FoC
• FNA vols 3-5, 7-8, 19-23, and 26-27;
• FoC vols 4-14, 18, and 20-25;
Issues
• OBO Foundry
• http://obofoundry.org/
• Anatomy ontologies
• UBERON, etc.
• Limb, fin, head, …
• BSPO (Biological Spatial Ontology)
• Anterior region, posterior region, …
• Quality ontology
• PATO (phenotype quality ontology)
• Color, shape, …
EQ Generation: EQ = Entity + Quality
Eye absent
UBERON:Eye PATO:absent
• Goal
• Convert character descriptions to EQ statements using the
set of ontologies
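The conversion goal above can be illustrated with a minimal Python sketch. It is not CharaParser's actual method: the `EQ` class, the `to_eq` function, and the two lookup tables are hypothetical names introduced here, and the term labels follow the slide's own notation (UBERON:Eye, PATO:absent) rather than real ontology identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """An Entity + Quality statement (hypothetical structure for illustration)."""
    entity: str   # anatomy ontology term, e.g. from UBERON
    quality: str  # quality ontology term, e.g. from PATO


# Hypothetical lookup tables. A real system would search the ontologies
# themselves; these entries only mirror the example on the slide.
ENTITY_TERMS = {"eye": "UBERON:Eye"}
QUALITY_TERMS = {"absent": "PATO:absent"}


def to_eq(description: str) -> EQ:
    """Map a two-word 'entity quality' description to an EQ statement."""
    entity_word, quality_word = description.lower().split()
    return EQ(ENTITY_TERMS[entity_word], QUALITY_TERMS[quality_word])


eq = to_eq("Eye absent")
print(eq.entity, eq.quality)  # UBERON:Eye PATO:absent
```

Real character descriptions are rarely two clean words, which is why the harder part of the task is matching free text to the right ontology terms.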
Machine-against-People on EQ Generation
• Machine: CharaParser+EQ
• People: 3 post-doc curators
• Experiment:
• 202 characters selected randomly
• 3 post-doc curators created EQs
• Naïve round: based on character descriptions only
• Knowledge round: free access to other resources
• Ontologies used
• UBERON, BSPO, PATO
• Curators can independently add terms to the ontologies when
needed, in both rounds of curation
Ontologies Generated
average
CharaParser+EQ 0.24 0.26 0.41 0.49
Result: Naïve vs. Knowledge rounds
Curators modified more than 50% of EQs in the Knowledge round:
Comparison        Changed/total states   # Added terms   # Removed terms   No change in term count
naïve_1_know_1    261/463                87              78                96
naïve_2_know_2    326/463                79              119               128
naïve_3_know_3    298/463                92              97                109
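The "more than 50%" claim can be checked directly from the counts in the table. The short Python sketch below recomputes the share of changed states for each curator; the `ROWS` layout and the `changed_share` helper are names chosen here for illustration. It also checks that the added, removed, and no-change counts sum to the changed count in every row, which is an arithmetic observation about the figures, not something the slide states.

```python
# Counts copied from the table:
# (changed, total states, added terms, removed terms, no change in term count)
ROWS = {
    "naïve_1_know_1": (261, 463, 87, 78, 96),
    "naïve_2_know_2": (326, 463, 79, 119, 128),
    "naïve_3_know_3": (298, 463, 92, 97, 109),
}


def changed_share(changed: int, total: int) -> float:
    """Fraction of states whose EQs changed between the two rounds."""
    return changed / total


for name, (changed, total, added, removed, same) in ROWS.items():
    # The three breakdown columns account for every changed state.
    assert added + removed + same == changed
    print(f"{name}: {changed_share(changed, total):.1%} changed")

# naïve_1_know_1: 56.4% changed
# naïve_2_know_2: 70.4% changed
# naïve_3_know_3: 64.4% changed
```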
Ontologies
*No communication between authors and curators
*Curators’ “weak” may not be authors’ “weak”
Literature/Authors
Ontologies
Curators
Currently: