
Bio-Ontologies: a New Means of Travel for Biological Facts


The role of Bio-Ontologies [BOs] in biological databases
Four interpretive steps in standardization
The epistemic status of BO terms: situating concepts
A new type of theory in biology? Back to Mary Hesse's network view
Implications: data travel and use across research contexts
Conclusion: on technology and theory-making

Biological Ontologies [BOs]

Fast accumulation of data on model organisms, esp. genomics
Fragmentation of biology into local epistemic cultures
Common yearning for integrative understanding of organisms

Goal: enhance availability and usability of data across research contexts
Means: "formal representations of areas of knowledge in which the essential terms are combined with structuring rules that describe the relationship between the terms. Knowledge that is structured in a bio-ontology can then be linked to the molecular databases" (Bard and Rhee 2004)
Precisely defined terms related through DAG (directed acyclic graph) structures
Association of terms with datasets

E.g. Gene Ontology: Precise definition, large set of associated data

[Screenshot: search by GO term "Wnt receptor signaling pathway". Callouts: search returns children; sum of MGI data; returns set of genes annotated to this term; search returns annotations to terms and subterms (children).]
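
The search behaviour described above (a query on a term also returns annotations made to its children) can be sketched as a walk over a small directed acyclic graph. This is a minimal illustration only: the term names, the gene names and the `search` function are hypothetical and are not taken from the Gene Ontology or from MGI.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its set of parent terms.
# In a DAG a term may have several parents; names are illustrative only.
PARENTS = {
    "signal transduction": set(),
    "cell surface receptor signaling": {"signal transduction"},
    "Wnt receptor signaling pathway": {"cell surface receptor signaling"},
    "canonical Wnt signaling": {"Wnt receptor signaling pathway"},
    "Wnt signaling, calcium modulating": {"Wnt receptor signaling pathway"},
}

# Hypothetical annotations: term -> gene products annotated directly to it.
ANNOTATIONS = {
    "signal transduction": {"geneE"},
    "Wnt receptor signaling pathway": {"geneA"},
    "canonical Wnt signaling": {"geneB", "geneC"},
    "Wnt signaling, calcium modulating": {"geneD"},
}


def children_index(parents):
    """Invert the child -> parents map into a parent -> children map."""
    index = defaultdict(set)
    for child, parent_terms in parents.items():
        for parent in parent_terms:
            index[parent].add(child)
    return index


def descendants(term, parents):
    """Collect every subterm reachable from `term` by following child links."""
    index = children_index(parents)
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in index[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def search(term, parents=PARENTS, annotations=ANNOTATIONS):
    """Return genes annotated to `term` or to any of its subterms."""
    genes = set()
    for t in {term} | descendants(term, parents):
        genes |= annotations.get(t, set())
    return genes


if __name__ == "__main__":
    print(sorted(search("Wnt receptor signaling pathway")))
```

In this sketch, data annotated to a broader term (here `geneE`) are not returned by a query on a narrower one: retrieval only moves from a term down to its children.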

BO Terms as Standards
Standard = coordination device facilitating interdisciplinary research (Berg 2004)
BO terms as neutral tools for scientific communication and exchange:
  Data are attached to specific BO terms purely for the purposes of retrieval by biologists interested in investigating the phenomenon to which the term refers
  No theoretical interpretation involved: BO terms are broad classificatory concepts conceived to pass on information without distorting or interpreting it

However: Interpretation in standardisation is unavoidable

(Bowker & Star 1999)

Interpreting to Standardize: 4 Steps

1. Abstraction processes: Masking, distorting, simplifying or eliminating characteristics of entities to be standardised (data formatting)
2. De-contextualisation processes: Black-boxing specific interests, methods and goals of producers of data (non-locality: decoupling marks from provenance)
3. Knowledge-stabilisation processes: Assemble precise definitions for each term and relation so as to mirror (what curators see as) the consensus in contemporary biology
4. Situating processes: Associate each dataset with a specific term (and thus a specific phenomenon)

= standardisation processes influence the database users' understanding and use of data

BO Terms as Situating Concepts

Unambiguously defined as referring to specific phenomena (knowledge-stabilisation process)
Through gene annotation, each available dataset is associated with one or more BO terms (situating process)
This makes it possible to retrieve data relevant to the phenomena captured by those terms
But it also fixes the biological relevance of data as evidence: BO terms determine the range of phenomena to be researched by reference to each dataset
BO terms are situating concepts = they determine the future applicability of data by fixing the research contexts in which data can be of use
Vs. unifying or explanatory concepts: situating concepts do not aim at explaining phenomena, but rather at describing a phenomenon so that data associated with it can easily be retrieved

Select data about gene product TSK from publication: Suzuki et al., 2005, Plant Cell Physiol. 46:736-742, "TONSOKU Is Expressed in S Phase of the Cell Cycle and Its Defect Delays Cell Cycle Progression in Arabidopsis"
Associate with term "G2/M transition of mitotic cell cycle", which is defined as "progression from G2 phase to M phase of the mitotic cell cycle"
Looking for data on the mitotic cell cycle, researchers find gene product TSK as relevant to the G2/M transition
Gene product TSK could be relevant to researching other parts of the mitotic cell cycle, but there is no evidence for this, so the database does not report this possibility
= the biological relevance of the dataset is restricted to the phenomenon captured by the term "G2/M transition", thus excluding other, possibly relevant phenomena

Select data from publication or repository

Associate data with GO term

GO term refers to phenomenon X

Data are situated as relevant to phenomenon X and not to other phenomena
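
The situating step above can be illustrated with a small lookup, under stated assumptions: the `Annotation` record and the `relevant_gene_products` function are hypothetical, and "G1/S transition of mitotic cell cycle" is used only as an example of another term a researcher might query. Only the TSK annotation comes from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record linking a gene product to one ontology term."""
    gene_product: str
    term: str
    source: str


# The single annotation described in the slides.
ANNOTATIONS = [
    Annotation(
        gene_product="TSK",
        term="G2/M transition of mitotic cell cycle",
        source="Suzuki et al. 2005",
    ),
]


def relevant_gene_products(term, annotations):
    """Return gene products the database reports as relevant to `term`."""
    return {a.gene_product for a in annotations if a.term == term}


if __name__ == "__main__":
    # The data are situated under the G2/M transition...
    print(relevant_gene_products(
        "G2/M transition of mitotic cell cycle", ANNOTATIONS))
    # ...and are therefore invisible to a query on another phenomenon.
    print(relevant_gene_products(
        "G1/S transition of mitotic cell cycle", ANNOTATIONS))
```

The second query returns an empty set because no annotation links TSK to that term, not because the database holds evidence that TSK is irrelevant to it.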

A New Type of Theory in Biology?

Mary Hesse's three criteria:

1. Network of concepts:
Situating concepts rather than unifying or explanatory concepts

2. Observational and theoretical language

Concepts are primarily meant to refer to existing phenomena: mix of observational and theoretical

3. Internal coherence and economy:

Consistency among terms = should not have the same referents (otherwise redundant/obsolete)
Minimalism = "the most useful standards are those that consist of the minimal number of the most informative parameters" (Brazma et al 2006, 594)

Implications

Data travel made easier:

Easy retrieval and comparison
Easy to check and form new hypotheses
Relatively simple access skills: IT skills & acquaintance with BOs

What about data use?

Easy retrieval of information about data provenance (evidence codes)
BUT users need to be aware of interpretive processes involved in standardization

Conclusion: When Technology Makes a Difference to Theory-Making

Digital technology does not guarantee objectivity:
  Curators estimate the biological relevance of data as evidence for phenomena
  Curators define situating concepts
Yet, technology efficiently mediates between different (local) expertises:
  Integration of data from various sources
  Opportunity for comparisons and queries
  Differential access to information depending on expertise: layers of complexity and detail reachable through a mouse click
BOs & bioinformatics: towards integration without unification?

Bio-ontologies are often presented as a neutral tool for the diffusion of facts about organisms to biologists: that is, as a way to standardise the terminology and relations among terms used to describe biological processes, so that the immense amount of (especially microbiological) data recently accumulated on various aspects of the main model organisms can be brought together and made accessible to the whole biological community. In this paper, I argue that bio-ontologies are not a neutral vehicle for the diffusion of evidence. Rather, they constitute a new type of biological theory, incorporating a specific perspective on biological phenomena, through which data are re-interpreted in order to fit specific research goals. Notably, one of these goals consists of integrating the available knowledge about various aspects of an organism into an overall understanding of its biology. The main issues that I shall address in this paper are thus the following: How well do biological facts circulate through bio-ontologies? How effective is the use of bio-ontologies towards obtaining integration in biology? And what kind of integration is that: is it actually possible to distinguish it from a kind of theoretical unification? In addressing these questions, I focus on the use of one of the bio-ontologies, the so-called Gene Ontology, to structure and display data about Arabidopsis thaliana within The Arabidopsis Information Resource.
