Animal Genetics - 2020 - Nicholas - Online Mendelian Inheritance in Animals OMIA A Record of Advances in Animal Genetics
doi: 10.1111/age.13010
Summary For the last 25 years, Online Mendelian Inheritance in Animals (OMIA) has been providing free
global access to an ever-increasing record of discoveries made by animal geneticists around
the world. To mark this 25-year milestone, this document provides a brief account
(including some pre-history) of how OMIA came to be; some timelines of important
discoveries and advances in the genetics of the animal species covered by OMIA, gleaned
from the OMIA database; and an analysis of the current state of knowledge regarding likely
causal variants of single-locus traits in OMIA species, also gleaned from the OMIA database.
Keywords animal model, catalogue, disorders, likely causal variants, precision medicine,
rare disease, single-locus traits
Johns Hopkins medical faculty member, Heinrich Ursprung, who used the word as if its meaning in the genetics context was well known in the 1960s (Ursprung 1967). In any case, Dr McKusick used it widely, e.g. McKusick (2007). Consequently, each entry in MIM (and each non-gene entry in its online successor OMIM – see below) is an entry for a phene. I quickly adopted this way of thinking.

In August 1978, shortly after the publication of the fifth edition of MIM (McKusick 1978), I introduced myself to Dr McKusick at the XIV International Congress of Genetics in Moscow, explaining that I would like to do the same thing for animals as he was doing for humans, i.e. to create Mendelian Inheritance in Animals (MIA). My thinking was to follow MIM by using a six-digit numeric ID for each phene and, wherever possible, the human name for all animal disorders that appeared to have a human homologue. The MIM numeric ID would be used as a means of cross-referencing MIM entries to MIA. Obviously, there would be a big stress on comparative biology. Dr McKusick was encouraging.

In 1980, a colleague, Jan Graham, was employed on a three-year grant to do all the legwork in the library (including photocopying papers!) and then to enter the relevant information into a mainframe database, prepared and managed by another colleague, Steve Brown. In this way, MIA was gradually assembled.

Meanwhile, in the USA Dr McKusick and his colleagues were, not surprisingly, early adopters of the Internet. In September 1987, MIM became Online MIM (OMIM) when the MIM ASCII file was made available on the Internet from the Welch Medical Library of Johns Hopkins University, searchable by the IRX (Information Retrieval eXperiment) database searching program developed by the National Library of Medicine, with MIM as its first trial dataset (Williamson 1986; Harman et al. 1988; McKusick 2007). The next year, 1988, the National Center for Biotechnology Information (NCBI) came into existence, a birthchild of the National Library of Medicine.

Computing technology was evolving rapidly. Around 1990, MIA was moved to a SCIMATE bibliographic database on an IBM-compatible 286 microcomputer. A year or so later, it was moved again to a powerful relational database called Advanced Revelation (ARev) (Small 1988), first on the microcomputer and then onto a laptop. Steve Brown’s ARev software was in regular use for more than 20 years.

The year 1991 witnessed two major developments: the World Wide Web (www) came into global existence, and the Australian National Genomic Information Service (ANGIS) was created at the University of Sydney. The potential of the www was immediately evident: here was a means by which MIA could be made available throughout the world. And ANGIS had the local resources to achieve this. By regularly pestering ANGIS staff (especially their head programmer, Carolyn Bucholtz) I launched myself on a steep bioinformatics learning curve.

All this came to fruition in 1995, when Carolyn Bucholtz worked her wonders in creating the original OMIA web page, populating it with a flat file exported from ARev, and teaming it up with a BIRX search engine (a www-compatible batch version of IRX), kindly provided by Dennis Benson and Randy Huntzinger, with support from David Lipman, all from NCBI. Dr McKusick solved a pivotal problem with permissions to cite references, enabling the web page to be made public on 26 May 1995. Its (somewhat clumsy) domain name was www.angis.su.oz.au/Databases/BIRX/omia/omia_form. On 1 December of that same year, OMIM was launched on the www by NCBI as www.ncbi.nlm.nih.gov/omim, using the same BIRX search engine as had been provided to OMIA (McKusick 2007).

In May 1997, OMIA’s domain name was simplified to omia.angis.org.au. A month later, NCBI launched a new bibliographic resource: PubMed was born.

One obvious way to exploit the power of the Internet was to use the OMIM and OMIA six-digit IDs as a means of creating reciprocal hyperlinks between the two resources. With substantial help from David Lipman from NCBI and Joanna Amberger from Johns Hopkins University, reciprocal hyperlinking of OMIM and OMIA came into operation on 16 October 1997. This meant that, at the click of a mouse (inanimate in this case), anyone accessing OMIM had direct access to information on animal models of human inherited disorders, and veterinarians and breeders had immediate access to information on what could be called human models of animal inherited disorders. This was a major step forward for comparative biology.

Early in 2004, Mike Poidinger, then Head of ANGIS, generously transferred all the tables of the ARev database on the laptop to a MySQL database on an ANGIS server. He also picked up the PubMed URLs for thousands of references. With invaluable contributions from numerous colleagues (each of whom is acknowledged at https://omia.org/acknowledgements/) in Sydney and at NCBI over the next 18 months, on 10 September 2005 Carolyn’s original web page was retired, and the new ANGIS version of OMIA was launched. For the first time, OMIA could be curated online. It was revolutionary to be able to add a reference or change some text, and immediately see those changes on the website!

In that same year (2005) Matthew Mailman from NCBI expressed interest in linking OMIA to a phenotype database under development at NCBI. This proved to be the beginning of a fruitful collaboration, culminating in a mirror of OMIA being hosted at NCBI as an Entrez database. To see OMIA and OMIM sitting alongside each other at NCBI was something special. Nine years later, the mirror was decommissioned by NCBI because of low traffic flow, compared with other NCBI resources. In the end, this turned out to be not such a bad move, because, despite all the power of NCBI’s Entrez system, the OMIA mirror was not able to
provide the same level of related information as the OMIA home site.

In March 2010, the OMIA database and website were relocated to a virtual server hosted by Information and Communications Technology at the University of Sydney. Commencing in August 2010, Matthew Hobbs began a major overhaul of the OMIA master database and development of a new website with improved curation tools, based on Django, a high-level Python Web framework. On 10 August 2011, Matthew’s new version of OMIA was launched.

From 1 September 2011, textual species names were replaced with NCBI Taxonomy species IDs. As a consequence, the OMIA ID became binomial, with the format OMIA xxxxxx-yyyy..., where xxxxxx is the six-digit ID for a phene, and yyyy... is the NCBI species taxonomy ID (usually four digits, but sometimes longer). This greatly facilitated referencing and hyperlinking to OMIA entries in publications: the URL for a particular OMIA page became a simple function of the OMIA binomial ID.

Also in 2011, OMIM moved to a new website hosted by Johns Hopkins University, with the domain name omim.org. A few years later, an opportunity arose for OMIA to obtain the ‘homologous’ domain name omia.org, thereby emphasising the comparative nature and the global reach of the two websites. The purchase was made from a domain registrar on 13 April 2015 for the princely sum of US$425! Consequently, the domain name omia.angis.org.au was retired after 18 years of loyal service.

A recurring question has been whether OMIA should include QTL, especially for those multifactorial disorders that it does include, e.g. hip dysplasia. OMIA’s over-riding principle has been to leave QTL to AnimalQTLdb maintained at https://www.animalgenome.org/cgi-bin/QTLdb/index, i.e. to try, as far as possible, to highlight AnimalQTLdb as the repository for QTL. In practice, of course, there are substantial overlaps, and decisions are made on a case-by-case basis (not always with complete consistency!). One possible way of addressing this issue would be to apply the concept of endophenotypes (John & Lewis 1966), which can be considered as identifiable Mendelian phenotypes that contribute to variation in a multifactorial trait.

Looking to the future, OMIA has considerable potential to help in resolving ‘buried SNPs’ (Sasaki et al. 2019) and variants of unknown significance. In the latter case, for example, it would be feasible for OMIA to include the relevant information about the 5667 single-locus human disorders documented in OMIM, any one of which could occur in any animal species, and for which, currently, only a small fraction (no more than 6%) have been reported in any OMIA species. Conversely, OMIA’s hyperlinks with OMIM could be helpful to researchers puzzling over human variants of unknown significance.

In recent years, several major enhancements of OMIA have been undertaken by software engineers whose much-appreciated services were provided by the Sydney Informatics Hub within the University of Sydney. The projects are summarised and everyone involved is acknowledged at https://omia.org/acknowledgements. A particularly important enhancement was the creation of downloadable tables of all known likely causal variants (Tammen & Nicholas 2018). Many more enhancements are planned for the future and will be undertaken as soon as funding can be raised. Following in the footsteps of OMIM (with guidance from Joanna Amberger), a crowdfunding appeal for OMIA was launched in 2019 (https://omia.org/donate/), in the hope of generating sufficient funds to enable the many much-needed enhancements to become reality, and to render OMIA sufficiently sustainable for it to be able to celebrate its 50th birthday in 25 years’ time.

Timelines of important discoveries and advances in animal genetics

In attempting to record all discoveries relating to inherited disorders and non-disorder single-locus phenes in all animal species not covered in other Internet resources (called OMIA species), the OMIA database incorporates records of many milestones that have been achieved in animal genetics. Nicholas & Hobbs (2014) reviewed the early discoveries of phenes showing Mendelian inheritance, as documented in OMIA. They also retold the often heroic stories of initial discoveries of likely causal variants in the 1980s and early 1990s, as also documented in OMIA.

The present paper summarises the historical storyline since the development of DNA molecular tools in the early 1980s in a series of five supporting tables. Table S1 summarises the first discoveries of each of the types of likely causal variants that give rise to the astonishing array of single-locus phenes in OMIA species. Table S2 summarises discoveries of the first likely causal variant in many of the animal species included in OMIA. The next two tables are gleaned from the OMIA list of mapping papers (https://omia.org/key_articles/maps): Table S3 records the history of draft genome sequences/assemblies and Table S4 records initial developments of high-density SNP panels. Table S5, gleaned from the OMIA list of Landmark papers (https://omia.org/key_articles/landmarks), records other important discoveries and milestones in the genetics of OMIA species.

Current knowledge

We shall now briefly review the information in OMIA. Table 1 provides a numerical summary of the current state of knowledge recorded in OMIA (as at 12 June 2020). The total number of Mendelian phenes in each of the major animal species (first row of data) has increased by around 50% since a similar table was compiled by Nicholas & Hobbs (2014). The total numbers of phene-species for which at least one likely causal variant has been published have
Table 1 The numbers of single-locus (Mendelian) traits published in OMIA species and the numbers of such traits for which likely causal variants have been discovered, at the time of writing (12 June 2020)

Animal species                                                                  Dog  Cattle  Chicken  Sheep  Cat  Pig  Horse  Goat  Other  Total
Total number of Mendelian traits                                                353     254      131    109  111   85     59    18    384   1504
Total number of Mendelian traits with at least one likely causal variant known  281     161       51     54   76   39     46    14    166    888
Total number of likely causal variants known                                    407     216       66     68  119   47     97    25    150   1195

Hyperlinks to lists, and thence to details, of the actual traits and variants summarised in this table are available from the home page of Online Mendelian Inheritance in Animals (OMIA): https://omia.org.

increased by around 75%, the overall total increasing from 499 to 888 in the past 6 years.

Of particular interest are the actual numbers of likely causal variants that have been published for phenes in OMIA species: the total number has more than doubled from 556 to 1195 in the past 6 years. The timeline of their discovery is shown in Fig. 1. In the last five complete years, the average number of new likely causal variants published was 81 per year, equivalent to one every 4.5 days. In the current year (2020), newly discovered likely causal variants for OMIA species are being published at an average rate of one every 3.8 days.

The discovery of likely causal variants in humans has been, of course, on a far larger scale. The Human Gene Mutation Database (HGMD) (Stenson et al. 2017) lists 275 716 variants discovered up to the end of 2019 (http://www.hgmd.cf.ac.uk/), with the average number reported over the most recent five full years being 21 212 per year. Despite the enormous difference in these numbers between humans and OMIA species, Fig. 2 shows that the relative rate of discovery is similar for the two groups, with an increasing trend, reflecting the ever-increasing power of genomic technologies.

The types of variants in OMIA are classified according to the classification system developed by the HGMD, with a few modifications, such as dividing the HGMD missense/nonsense category into its two components. Fig. 3 shows the frequency distribution of types of variants reported in OMIA species (OMIA) compared with humans (HGMD). Although the distributions are not homogeneous, overall there is
substantial similarity. In both cases, missense mutations account for by far the largest percentage of variants. Noting (as in Nicholas & Hobbs 2014) that 21 of the 61 non-stop codons in the genetic code are ‘near-stop’, i.e. one base substitution away from a stop codon, the expected ratio of missense to nonsense variants is 40:21 (34% nonsense), assuming that all possible single-nucleotide substitutions and all non-stop codons are equally likely. The breakdown of the current OMIA numbers for this category in OMIA species is 389:130, which is significantly different from the expected ratio (χ² = 20.2, P < 0.0001), reflecting a lower proportion of nonsense variants (25%). Interestingly, the comparative human data from the Human Gene Mutation Database (HGMD; http://www.hgmd.cf.ac.uk/ac/stats.php) are 129 708:29 997 (19% nonsense), which is an even greater departure from the simple expectation. Apart from the obvious possible explanation that the stated assumptions do not hold, it could be that nonsense mutations are more likely to give rise to early embryonic lethality, in which case they would be less likely to be observed than missense mutations.

Table 2 Distribution of pathogenic and non-pathogenic variants in OMIA species as at 12 June 2020, in relation to type of mutation, based on the categories used by the Human Gene Mutation Database (HGMD; http://www.hgmd.cf.ac.uk)

Type of mutation             Pathogenic     %  Non-pathogenic     %  Totals
Totals                              799   71%             334   29%    1133
Missense                            255   32%             134   40%     389
Deletion, small (≤20 bp)            137   17%              45   13%     182
Nonsense (stop-gain)                105   13%              25    7%     130
Deletion, gross (>20 bp)             73    9%              20    6%      93
Splicing                             72    9%              18    5%      90
Insertion, small (≤20 bp)            54    7%              11    3%      65
Insertion, gross (>20 bp)            39    5%              19    6%      58
Complex rearrangement                15    2%              16    5%      31
Regulatory                           15    2%              12    4%      27
Delins, small (≤20 bp)               13    2%               4    1%      17
Duplication                           6    1%               9    3%      15
Repeat variation                      4    1%              11    3%      15
Haplotype                             1    0%               6    2%       7
Inversion                             4    1%               1    0%       5
Extension (stop-lost)                 3    0%               0    0%       3
Not known                             1    0%               2    1%       3
Start-lost                            1    0%               1    0%       2
Delins, gross (>20 bp)                1    0%               0    0%       1

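The missense-to-nonsense comparison reported above can be checked with a short calculation. The following is a minimal sketch, not the author's own analysis script: it assumes a Pearson goodness-of-fit test with one degree of freedom and no continuity correction, and the function names (chi_square_gof, p_value_df1) are introduced here purely for illustration. The observed counts (389:130) and the expected ratio (40:21) are taken from the text.

```python
import math


def chi_square_gof(observed, expected_ratio):
    """Pearson goodness-of-fit statistic for observed counts against a ratio.

    Hypothetical helper for illustration; returns (statistic, expected counts).
    """
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Missense and nonsense counts for OMIA species (from Table 2 and the text).
observed = [389, 130]
# Expected missense:nonsense ratio under the stated assumptions (40:21).
statistic, expected = chi_square_gof(observed, [40, 21])
p_value = p_value_df1(statistic)
nonsense_share = observed[1] / sum(observed)

print(f"chi-square = {statistic:.1f}")
print(f"P < 0.0001: {p_value < 0.0001}")
print(f"observed nonsense share = {nonsense_share:.0%}")
```

Under these assumptions the statistic rounds to the reported value of 20.2. The same function can be applied to the human counts (129 708:29 997) quoted above.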
Notes to Table 2: The variant category ‘haplotype’ is for cases where two or more sites of mutation are in complete or very strong LD, and authors have argued that the non-wt variants at the mutational sites are jointly causal. The variant category ‘delins’ is the word now recommended (in place of ‘indel’) for a deletion-insertion variant, by the Human Genome Variation Society (https://varnomen.hgvs.org/recommendations/DNA/variant/delins/).

Pathogenic vs. non-pathogenic variants: different types of mutations?

The number of published likely causal variants in OMIA species is now sufficient to enable an interesting question to be asked, namely whether there is any difference in the distribution of the types of variants that are pathogenic (e.g. those causing disorders) compared with those that are not pathogenic (e.g. those giving rise to most coat and plumage colours)? The relevant OMIA data are shown in Table 2, which presents the distribution of both types of variants in relation to the 18 types of mutation by which OMIA variants are classified. Not all variants in OMIA have been included in this analysis: only those that could be classified unambiguously (by the present author) as pathogenic or not pathogenic in relation to single-locus phenes are included. In interpreting these data, some potential biases should be noted. Some types of mutation, e.g. regulatory and complex rearrangement, may be under-represented because by their nature, they are more difficult to discern and to justify with publishable evidence. Also, deletions of 3 bp, or multiples thereof, are likely to be much less
pathogenic than other categories of small deletions. However, as such mutations comprise only a small proportion of small (≤20 bp) deletions, this is unlikely to have a noticeable effect on the conclusions.

Of the total of 1133 variants in this analysis, 799 (71%) are pathogenic. While this is consistent with the general expectation that pathogenic variants are more likely than non-pathogenic variants (especially for coding regions), the actual percentage is largely a reflection of the relative numbers of pathogenic and non-pathogenic Mendelian phenes that have been amenable to study. Consequently, this proportion should not necessarily be taken as indicative of the chance of any new variant being pathogenic.

In an attempt to draw useful conclusions from this dataset, the last nine mutation types in Table 2 were omitted because of small numbers, leaving a total of 1065 variants. In analysing the remaining data, the starting point was to see to what extent different types of mutations could be pooled on the basis of homogeneous distributions (judged via χ² tests) between pathogenic and non-pathogenic variants. Applying this strategy in a hierarchical manner led to the following conclusions:

(i) For each of insertions and deletions, the distribution of small (≤20 bp) and gross (>20 bp) is homogeneous between pathogenic and non-pathogenic variants (insertions χ² = 4.17; P = 0.07; deletions χ² = 0.35; P = 0.66). Consequently, each of these mutation types can be pooled across size.
(ii) The distribution of insertions and deletions is homogeneous between pathogenic and non-pathogenic variants (χ² = 0.03; P = 0.87). Consequently, these two mutation types can be pooled.

This leaves six mutation types, as shown in Fig. 4, which compares the frequency distribution of those six mutation types for pathogenic and for non-pathogenic variants. The first three mutation types in Fig. 4 (insertions/deletions, nonsense, and splicing) are more common in pathogenic variants, and the other three (missense, complex rearrangements and regulatory) are more common in non-pathogenic variants. Importantly, all six types are well represented (in the context of their overall frequencies) in both categories of variants.

A different way to interrogate these data is to consider both variant categories together, and to then ask what is the probability of each mutation type resulting in a pathogenic variant? Our benchmark here is the overall proportion of pathogenic variants, which in the reduced pooled dataset is 72%. In other words, from the pooled OMIA variant data set, the estimated overall probability of a variant being pathogenic is 0.72. For each of the six mutation types, the corresponding probabilities are 0.81 for nonsense mutations, 0.80 for splicing mutations, 0.76 for insertions/deletions, 0.66 for missense mutations, 0.56 for regulatory mutations and 0.48 for complex rearrangements.

When interpreting the above results, it must be noted that the data could be biased in the sense that some variant categories, e.g. missense and nonsense, are easier to discover than others, e.g. regulatory. The key unanswered question here is whether there is an interaction between ease of discovery and pathogenicity.

Conclusion

The material summarised in this review gives a tantalising hint of some of the information in OMIA. In the not-too-distant future, it is hoped that the information included here, and in the Supporting Information, can be made available on the OMIA website in enhanced graphic form that will facilitate education and research. Given the nature of the Supporting Information, the author would be grateful to be informed of anything that needs to be corrected or included.

In the meantime, the utility of OMIA can be greatly enhanced if authors include relevant hyperlinked OMIA IDs in publications describing any phenes that are included in OMIA. Because the URL for any OMIA phene is a simple function of its ID (e.g. the hyperlink for OMIA 000402-9940 is https://omia.org/OMIA000402/9940/), hyperlinking is a simple task. Some researchers are already doing this. Everyone is encouraged to do so.
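
For authors wishing to generate such hyperlinks automatically, the mapping from ID to URL might be sketched as follows. This is an illustration only: the URL pattern is inferred from the single example quoted above (OMIA 000402-9940), the function name omia_url is hypothetical, and the accepted input spellings are an assumption rather than an OMIA specification.

```python
import re

# Base address of the OMIA website, as given in the text.
OMIA_BASE_URL = "https://omia.org"

# Assumed ID layout: optional "OMIA" prefix, a six-digit phene ID, a hyphen,
# then the NCBI Taxonomy species ID (usually four digits, sometimes longer).
_OMIA_ID_PATTERN = re.compile(r"\s*(?:OMIA\s*:?\s*)?(\d{6})-(\d+)\s*")


def omia_url(omia_id: str) -> str:
    """Build the hyperlink for a binomial OMIA ID such as 'OMIA 000402-9940'."""
    match = _OMIA_ID_PATTERN.fullmatch(omia_id)
    if match is None:
        raise ValueError(f"not a recognisable OMIA ID: {omia_id!r}")
    phene_id, taxon_id = match.groups()
    return f"{OMIA_BASE_URL}/OMIA{phene_id}/{taxon_id}/"


print(omia_url("OMIA 000402-9940"))
```

Keeping the six-digit phene ID as a string preserves its leading zeros, which would be lost if it were converted to an integer.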

Data availability

The animal variant data analysed in this paper are freely available from the table on the OMIA home page https://omia.org. The human variant statistics were obtained from the Human Gene Mutation Database (http://www.hgmd.cf.ac.uk/ac/stats.php).

References

Harman D., Benson D., Fitzpatrick L., Huntzinger R. & Goldstein C. (1988) IRX: an information retrieval system for experimentation and user applications. ACM SIGIR Forum 22, 2–10. https://doi.org/10.1145/54347.54348.
John B. & Lewis K.R. (1966) Chromosome variability and geographic distribution in insects. Science 152, 711–21. https://doi.org/10.1126/science.152.3723.711.

Supporting information

Additional supporting information may be found online in the Supporting Information section at the end of the article.

Table S1. First publication of each of the major types of likely causal variants in OMIA animal species
Table S2. First publication of a likely causal variant in OMIA animal species
Table S3. Draft genome sequences/assemblies, gleaned from the OMIA list of mapping papers
Table S4. The initial development of high-density SNP panels, gleaned from the OMIA list of mapping papers
Table S5. Other important discoveries/advances in OMIA animal species, gleaned from the OMIA list of landmark papers