Article in Philosophical Transactions of The Royal Society B Biological Sciences · June 2001
Impact Factor: 7.06 · DOI: 10.1098/rsta.2001.0817
1. Introduction
Computer modelling of biological processes can be a valuable complement to experi-
mental methods. Modelling can help place experimental data in a meaningful context
and allow scientists to investigate questions that are difficult or impossible to address
experimentally. Modelling also facilitates the exploration of the parameter space of
a system, helping scientists determine which features of the system exert the most
influence over its behaviour.
Phil. Trans. R. Soc. Lond. A (2001) 359, 1073–1089 © 2001 The Royal Society
W. J. Hedley and others
2. Model
A model defined in CellML consists of a network of interconnected components.
Models are organized into the following structures.
Components, which are the smallest functional units in a model. Each component
contains the mathematics that describes the behaviour of the portion of the system
represented by that component. For instance, an electrophysiological model of a
cell might be organized into components that represent various ion channels. All of
the mathematics that describes the behaviour of the L-type calcium channel would
be defined in a single component representing this particular ion channel.
Connections, which are used to connect components to each other and to map
variables in one component to variables in another.
Groups, which allow the modeller to indicate the existence of logical or physical
groups of components.
Metadata, which provide context for the model.
[Figure 1 diagram; labels shown: membrane, V, I_Na, I_K, I_L, m, h, n]
Figure 1. The network structure of the CellML description of the Hodgkin–Huxley squid axon
model. The shapes represent components and the lines correspond to connections along which
variables are passed. The solid arrow heads point towards the parent component in a containment
relationship; hollow arrow heads point towards the parent in both geometric containment and
encapsulation relationships. Variable names are displayed next to the component in which they
are declared and alongside the connections on which their value is exported to other components.
The use of these elements is best demonstrated by example. We will use the
Hodgkin–Huxley (Hodgkin & Huxley 1952) model of the giant squid axon as an
example throughout this paper. Some minor changes have been made to the original
published model to reflect current modelling practice. For instance, the membrane
voltage is defined with respect to absolute zero, and not the resting potential of the
membrane. Excerpts of the CellML document describing this model are presented in
this paper. The full model is available online at
http://www.cellml.org/examples/hh_squid_axon_1952/. A diagram of the model’s structure,
as defined in the CellML document, is shown in figure 1.
The usual root element for a CellML document is the <model> element.
The <model> element for the Hodgkin–Huxley example is shown in the following XML
fragment.
<model
name = "Hodgkin_Huxley_squid_axon_1952"
xmlns = "http://www.cellml.org/2001/04/cellml"
xmlns:cellml = "http://www.cellml.org/2001/04/cellml">
...
</model>
The <model> element has a name attribute that allows this model to be unambigu-
ously referenced by other models. For instance, this would be necessary if the model
were to be combined with other models or partial models to create a larger model.
Two namespaces (Bray et al. 1999) are also declared on the <model> element. The
first sets the default namespace for the <model> element and all elements contained
within the <model> element to the CellML namespace (this URI may change as
CellML develops (see the CellML website at http://www.cellml.org/ for the current
URI)). The second namespace is again the CellML namespace, but this time declared
with an explicit ‘cellml’ prefix. This declaration has document-wide scope, so the
‘cellml’ prefix may be used anywhere to move an element or attribute into the
CellML namespace. This simplifies the addition of CellML elements and attributes
to non-CellML elements. For instance, a cellml:units attribute (described in the
next section) can be added to MathML <cn> elements without having to redeclare the
CellML namespace with each occurrence. The declaration of the CellML namespace
as both the default namespace and a namespace mapped to a prefix is recommended
practice for any <model> element.
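The effect of the two declarations can be seen by parsing a cut-down document. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the "demo" component and its single <cn> element are hypothetical placeholders that are not part of the published model.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/2001/04/cellml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Cut-down document; the "demo" component is a hypothetical placeholder.
DOC = f"""
<model name="Hodgkin_Huxley_squid_axon_1952"
       xmlns="{CELLML_NS}" xmlns:cellml="{CELLML_NS}">
  <component name="demo">
    <math xmlns="{MATHML_NS}">
      <cn cellml:units="dimensionless"> 1.0 </cn>
    </math>
  </component>
</model>
""".strip()

root = ET.fromstring(DOC)

# The default namespace applies to <model> and its unprefixed descendants.
root_tag = root.tag

# <math> resets the default namespace, so <cn> is a MathML element ...
cn = root.find(f".//{{{MATHML_NS}}}cn")

# ... while the cellml: prefix places its units attribute in the CellML namespace.
cn_units = cn.get(f"{{{CELLML_NS}}}units")

print(root_tag, cn.tag, cn_units)
```

Because the prefix is declared once on <model>, the <cn> element needs no namespace declaration of its own.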
3. Units
One of the key features ensuring robustness and reusability of CellML components
and models is the requirement that all variables and numbers be declared with a
set of units. Components and models containing variables with different units may
therefore still be connected. However, variables that are to be mapped to one another
must have the same dimensions. The explicit declaration of units also allows basic
consistency checking of equations.
CellML provides a dictionary of standard units that may be used in variable dec-
larations and attached to bare numbers in mathematics. References to these units
should make use of the actual name of the units, rather than the standard abbre-
viation, thus avoiding confusion between units (e.g. metre) and prefixes (e.g. milli).
The full list of units that any CellML processing application should understand is
given in table 1. The keywords in that table comprise the SI (BIPM 1998, 2000)
base and derived units and the additional units that are commonly used in the
types of biological models likely to be defined using CellML. Expressions relating
these additional units to the SI base units can be found in the CellML specification
(http://www.cellml.org/specification/index.html).
CellML also provides a facility whereby new units can be defined in terms of the
units defined in the dictionary. This functionality allows the creation of complex
units (made up of the product of simple units), definition of imperial units (which
are expressed as a scaled version of an SI unit), and even creation of units that
require an offset (such as degrees Fahrenheit). This allows model authors to work in
whatever set of units they feel most comfortable with, while still ensuring that their
models can be integrated with those of other authors using other units.
New units are defined using the <units> element, which may be placed inside
both <model> and <component> elements. When a <units> element is placed inside
a <model> element, the units definition may be referenced by all components in
that model. When a <units> element is placed inside a <component> element, the
units definition may only be referenced inside that component. Units definitions are
referenced by the value of the name attribute of the <units> element. The value of
the name attribute of a <units> element must be unique across all <units> elements
in the <model> or <component> element in which it is defined. If the value of the
name attribute of a <units> element defined inside a <component> element matches
the name attribute of a <units> element defined inside the parent <model> element,
then it will redefine the unit, and all references to the units within that component
refer to the new definition.
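The scoping rule described above can be sketched as a lookup that consults the component before the model. This is an illustration only: the function name resolve_units, the dictionary-based storage and the small subset of standard units are assumptions, and the sketch assumes that user-defined names do not collide with keywords from the standard dictionary.

```python
# Hypothetical subset of the standard dictionary (table 1 gives the full list).
STANDARD_UNITS = {"metre", "second", "ampere", "siemens", "farad", "dimensionless"}


def resolve_units(name, component_units, model_units):
    """Return (scope, definition) for a units reference made inside a component."""
    if name in component_units:
        # A component-level definition redefines a model-level one of the same name.
        return "component", component_units[name]
    if name in model_units:
        return "model", model_units[name]
    if name in STANDARD_UNITS:
        return "standard", name
    raise KeyError(f"undefined units: {name}")


# Hypothetical definitions, stored here as plain strings for brevity.
model_units = {"per_millisecond": "model-wide definition"}
component_units = {"per_millisecond": "component-local definition"}

print(resolve_units("per_millisecond", component_units, model_units))
print(resolve_units("per_millisecond", {}, model_units))
print(resolve_units("second", component_units, model_units))
```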
All <variable> elements must include a units attribute that references either
one of the keywords defined in the standard dictionary shown in table 1 or the value
of the name attribute of a <units> element in the current component or model.
Whenever a bare number occurs in an equation, it must be placed in a <cn> element
in the MathML namespace. Every <cn> element must include a units attribute in
the CellML namespace, the value of which follows the same scheme as the units
attribute on a <variable> element.
The contents of a <units> element are a set of <unit> elements, each referenc-
ing units from the dictionary or some previously defined units. The product of the
subunits is the final units type. A <unit> element has no content but may have up
to five attributes. The units attribute is the most important of these, and is the
only one that is required. It is used to set the base quantity for the current <unit>
element, and its value must correspond to a keyword from the standard CellML units
dictionary or to the value of the name attribute of a <units> element in the current
component or model.
The definition of new units in terms of subunits may require the use of some
combination of the optional offset, prefix, exponent and multiplier attributes.
The optional offset attribute is used to represent the addition of a constant in the
transformation between the current units and the base units. This should only be
necessary to define the Fahrenheit temperature scale (an offset of "+32.0" must
be applied to define Fahrenheit in terms of Celsius). If the offset attribute is not
present, it assumes a default value of "0.0".
The prefix attribute can be used to indicate a scale factor for the unit. Its
value may be from the standard set of CellML prefix names given in table 2, or an
integer, in which case the unit is premultiplied by 10 to the power of this number.
If no prefix attribute value is specified, it is assumed that the unit stands alone,
i.e. it is premultiplied by one.
The combination of prefix attribute and units attribute is raised to a power
equal to the value of the exponent attribute. The value of the exponent attribute
must be a real number, and is typically an integer. If no exponent attribute value
is specified, it is assumed that the unit occurs once, i.e. the exponent attribute has
a default value of one. Note that an exponent attribute value of "0" (zero) has the
effect of removing the parent <unit> element from the current units.
Finally, a multiplier attribute can be used to premultiply the rest of the conver-
sion expression by a further scale factor, allowing the introduction of floating-point
scale factors. For instance, a multiplier of "0.45359237" is used to define a pound
in terms of the kilogram.
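The way the prefix, exponent and multiplier attributes combine can be sketched as follows. This is an illustration under stated assumptions: the function name scale_factor and the dictionary representation of <unit> elements are hypothetical, only a few prefix names are listed, and the expansion of per_centimetre_squared as centimetre raised to the power −2 is assumed rather than taken from the model. The offset attribute is deliberately left out, and the sketch only computes the factor without applying it to a value.

```python
# A few SI prefix names as powers of ten (table 2 gives the full list).
PREFIXES = {"micro": -6, "milli": -3, "centi": -2, "kilo": 3}


def scale_factor(unit_elements):
    """Combine the <unit> children of one <units> definition.

    Each item is a dict of attribute values. Returns (multiplier, power_of_ten),
    so the overall factor is multiplier * 10 ** power_of_ten.
    """
    multiplier, power = 1.0, 0.0
    for unit in unit_elements:
        prefix = unit.get("prefix", 0)            # no prefix: premultiply by one
        p = PREFIXES[prefix] if prefix in PREFIXES else int(prefix)
        e = float(unit.get("exponent", 1.0))      # default exponent is one
        power += p * e                            # (10 ** p) ** e
        multiplier *= float(unit.get("multiplier", 1.0))
    return multiplier, power


# Assumed expansion: microampere times (centimetre) ** -2.
microampere_per_cm2 = [
    {"prefix": "micro", "units": "ampere"},
    {"prefix": "centi", "units": "metre", "exponent": -2},
]
pound = [{"multiplier": 0.45359237, "units": "kilogram"}]

print(scale_factor(microampere_per_cm2))
print(scale_factor(pound))
```

Keeping the power of ten separate from the multiplier avoids floating-point rounding when prefixes are raised to exponents.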
A simple units definition occurs when units are defined as a linear function of some
previously defined simple units or base units. In a simple units definition, a <units>
element contains only a single child <unit> element, that <unit> element has an
exponent attribute value of "1.0", and the units definition referenced by the units
attribute is one of the SI base units or is itself a simple units definition. These are
the only conditions under which a <unit> element may define an offset attribute.
The formula that expresses how the old units (referenced by the value of the units
attribute on the <unit> element) are transformed into the new units (defined by the
value of the name attribute on the parent <units> element) is given below:
Terms in square brackets represent the units associated with values in the expression,
which are italicized. x_old is the value to be transformed from the old units, and
x_new is the resulting value in the new units. ‘Units’ are the units being defined,
and multiplier, prefix, ‘units’, and offset correspond to the values of the appropriate
attributes on the <unit> element.
Complex units are the product of multiple units. In a complex units definition, a
<units> element contains multiple child <unit> elements, or some <unit> element
defines an exponent attribute with a value other than "1.0". The conversion between
the new units and the product of the constituent units is given by the formula below:
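\[ x_{\mathrm{new}}\,[\mathit{Units}] = \left(m_1 \cdots m_n\, p_1^{e_1} \cdots p_n^{e_n}\right) \left[\mathit{Units}/\left(u_1^{e_1} \cdots u_n^{e_n}\right)\right] x_{\mathrm{old}}\,\left[u_1^{e_1} \cdots u_n^{e_n}\right] \]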
</units>
<units name="millisiemens_per_centimetre_squared">
  <unit prefix="milli" units="siemens" />
  <unit units="per_centimetre_squared" />
</units>
<units name="microfarad_per_centimetre_squared">
  <unit prefix="micro" units="farad" />
  <unit units="per_centimetre_squared" />
</units>
<units name="microampere_per_centimetre_squared">
  <unit prefix="micro" units="ampere" />
  <unit units="per_centimetre_squared" />
</units>
4. Component
Components are the basic structural elements of a CellML model. Components may
contain the following optional structures.
(1) A set of variables.
(2) A set of mathematical expressions defining the relationships between variables.
(3) Metadata providing information about the context of the component.
A <component> element is used to declare a CellML component. It may only be
used inside a <model> element or as the root element of a CellML document.
The structure of the component that represents one of the Hodgkin–Huxley gates
in the Hodgkin–Huxley model’s sodium channel is shown in the following XML frag-
ment. The contents of the <math> element have been omitted to preserve space
(mathematics is discussed in a later section):
<component name="sodium_channel_m_gate">
  <variable name="m" public_interface="out"
    units="dimensionless" initial_value="0.05" />
  <variable name="alpha" public_interface="in"
    units="per_millisecond" />
  <variable name="beta" public_interface="in"
    units="per_millisecond" />
  <variable name="time" public_interface="in"
    units="millisecond" />
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    ...
  </math>
</component>
Each <component> element has a name attribute, the value of which must be unique
across all other <component> elements within the same <model> element. The value
of this attribute is used to reference the component in connections and groups.
5. Variable
A CellML variable is a named entity that belongs to a single component. The
<variable> element is used to declare a CellML variable. It can only be used inside
a <component> element. Variables have a name attribute, the value of which must be
unique across all variables in the current component. The name of a variable is used
when mapping variables inside connections. <variable> elements may also have the
following attributes.
initial_value. A variable may be a simple scalar, a simple compound structure
such as a vector, or a more complicated structure such as a spatially
varying field (defined by FieldML). The initial_value attribute provides a
convenient means for specifying the initial value of a scalar variable in a
simulation with time as the independent variable.
units. All variable declarations must have units specified. The value of the
units attribute must correspond to one of the keywords in the CellML units
dictionary or the name attribute of some units defined within the current
component or model.
public_interface. This attribute specifies the interface exposed to components
in the parent and sibling sets (see below). The public_interface attribute may
have a value of "in", "out" or "none". The absence of a public_interface
attribute implies a value of "none".
private_interface. This attribute specifies the interface exposed to components
in the encapsulated set (see below). The private_interface attribute may have a
value of "in", "out" or "none". The absence of a private_interface attribute
implies a value of "none".
The rules for mapping variables depend on the encapsulation hierarchy of the
components that own the variables. Encapsulation allows the modeller to hide a
complex network of components from the rest of the model and provide a single
component as an interface to the hidden network. Encapsulation effectively divides
the network into layers, where connections between the layers may only be made
through the interface components. The components to which any given component
may connect can be divided into four distinct classes. The set of all components
encapsulated by the current component is referred to as the encapsulated set. If the
current component is encapsulated, then the encapsulating component is referred to
as the parent, and the set of all other components encapsulated by the same parent
is referred to as the sibling set. If the current component is not encapsulated, then
it has no parent and the sibling set consists of all other components in the model
that are not encapsulated. All other components, which may not be connected to
the current component, make up the hidden set.
The CellML network shown in figure 1 demonstrates these sets effectively. The
encapsulated set of the ‘sodium channel’ component comprises the ‘m gate’ and
‘h gate’ components. The ‘sodium channel’ component is not itself encapsulated so
has no parent, and has a sibling set consisting of the ‘membrane’, ‘potassium channel’
and ‘leakage current’ components. The ‘n gate’ component is not in the sibling set
because it is encapsulated inside the ‘potassium channel’ component.
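The four classes can be derived mechanically from the encapsulation hierarchy. The sketch below is illustrative only: the function name classify and the parent_of mapping are assumptions, component names are written with underscores, and the encapsulated set is taken to mean the components directly encapsulated by the current component.

```python
def classify(current, parent_of, components):
    """Split `components` into the four classes as seen from `current`.

    `parent_of` maps each encapsulated component to its encapsulating parent.
    Returns (parent, encapsulated_set, sibling_set, hidden_set).
    """
    parent = parent_of.get(current)  # None if `current` is not encapsulated
    encapsulated = {c for c in components if parent_of.get(c) == current}
    siblings = {
        c for c in components
        if c != current and parent_of.get(c) == parent
    }
    hidden = set(components) - encapsulated - siblings - {current, parent}
    return parent, encapsulated, siblings, hidden


# The Hodgkin-Huxley network of figure 1.
components = [
    "membrane", "sodium_channel", "potassium_channel", "leakage_current",
    "m_gate", "h_gate", "n_gate",
]
parent_of = {
    "m_gate": "sodium_channel",
    "h_gate": "sodium_channel",
    "n_gate": "potassium_channel",
}

parent, encapsulated, siblings, hidden = classify(
    "sodium_channel", parent_of, components
)
print(parent, encapsulated, siblings, hidden)
```

For the ‘sodium channel’ component this reproduces the sets listed above, with the ‘n gate’ component falling into the hidden set.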
6. Mathematics
CellML uses MathML content markup to describe the relationships between vari-
ables within a component. MathML is an XML application providing ‘an explicit
encoding of the underlying mathematical structure of an expression’ (Ausbrooks et
al. 2001). The technical specification for MathML is maintained by the W3C Math
Working group as part of the activity of the W3C User Interface Domain. MathML
contains a rich set of predefined containers and operators, as well as mechanisms for
combining them in mathematically meaningful ways. This set is sufficient to enable
unambiguous representation of most biological models. In particular, MathML pro-
vides content elements to allow coding of simple formulae in the following areas:
arithmetic, algebra, logic and relations; calculus and vector calculus; set theory;
sequences and series; elementary classical functions; statistics; linear algebra.
MathML content markup, as opposed to algorithmic specification, is used to rep-
resent relationships between variables because it attempts to describe the meaning
of the relationships, rather than the methods by which those relationships are com-
puted. The CellML specification strongly encourages model authors to represent all
model behaviour as mathematical equations, using MathML. If a model contains
behaviour that requires algorithmic description, ECMAScript (ECMA 1999) can be
used to encode algorithms in CellML. This practice is not discussed in this paper,
since it is discouraged, and algorithms are not required to represent the Hodgkin–
Huxley model.
The governing differential equation in the Hodgkin–Huxley model for the gate
variable m in the sodium channel (in terms of the rate constants alpha_m and beta_m) is
\[ \frac{d(m)}{d(\mathit{time})} = \mathit{alpha\_m} \cdot (1 - m) - \mathit{beta\_m} \cdot m \]
The MathML representation of this equation is given in the XML fragment below:
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><eq />
    <apply><diff />
      <bvar><ci> time </ci></bvar>
      <ci> m </ci>
    </apply>
    <apply><minus />
      <apply><times />
        <ci> alpha_m </ci>
        <apply><minus />
          <cn cellml:units="dimensionless"> 1.0 </cn>
          <ci> m </ci>
        </apply>
      </apply>
      <apply><times />
        <ci> beta_m </ci>
        <ci> m </ci>
      </apply>
    </apply>
  </apply>
</math>
The default namespace for the <math> element and all of its children elements is set
to the MathML namespace as defined in the MathML 2.0 specification, overriding
the default namespace declaration on the <model> element.
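Because content markup encodes the structure of an expression rather than a procedure, a processing application is free to choose how to compute it. As one possible illustration, the sketch below walks the right-hand side of the gate equation recursively and evaluates it. The evaluator is a hypothetical helper that handles only the <ci>, <cn>, <times> and <minus> elements used here, and the numerical values of alpha_m, beta_m and m are arbitrary test inputs rather than values from the model.

```python
import xml.etree.ElementTree as ET

M = "{http://www.w3.org/1998/Math/MathML}"
CELLML_NS = "http://www.cellml.org/2001/04/cellml"

# Right-hand side of the gate equation: alpha_m * (1 - m) - beta_m * m
RHS = f"""
<apply xmlns="http://www.w3.org/1998/Math/MathML" xmlns:cellml="{CELLML_NS}">
  <minus />
  <apply><times />
    <ci> alpha_m </ci>
    <apply><minus />
      <cn cellml:units="dimensionless"> 1.0 </cn>
      <ci> m </ci>
    </apply>
  </apply>
  <apply><times />
    <ci> beta_m </ci>
    <ci> m </ci>
  </apply>
</apply>
""".strip()


def evaluate(node, env):
    """Evaluate a small subset of content MathML given variable values in `env`."""
    tag = node.tag[len(M):]
    if tag == "ci":
        return env[node.text.strip()]
    if tag == "cn":
        return float(node.text)
    if tag == "apply":
        operator, *operands = list(node)
        values = [evaluate(child, env) for child in operands]
        if operator.tag == M + "times":
            product = 1.0
            for value in values:
                product *= value
            return product
        if operator.tag == M + "minus":
            return -values[0] if len(values) == 1 else values[0] - values[1]
    raise ValueError(f"unsupported element: {node.tag}")


rhs = ET.fromstring(RHS)
rate = evaluate(rhs, {"alpha_m": 2.0, "beta_m": 1.0, "m": 0.5})
steady = evaluate(rhs, {"alpha_m": 1.0, "beta_m": 3.0, "m": 0.25})
print(rate, steady)
```

The second call uses m = alpha_m / (alpha_m + beta_m), for which the right-hand side vanishes.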
7. Connection
Connections provide the mechanism for mapping variables declared within one com-
ponent to variables in another component, thereby allowing information to be ex-
changed between the various components in the network. There will be many such
mappings present in a network. For convenience, they are grouped into sets of map-
pings between pairs of components. A set of such variable mappings between two com-
ponents constitutes a connection. Each referenced variable must appear in the appro-
priate interface of the referenced component. Furthermore, the interface attributes
of each pair of variables must be compatible: an "out" variable in one component’s
interface must map to an "in" variable in the other component’s interface. A single
"out" variable may map to multiple "in" variables in other components, allowing
8. Group
Grouping provides a mechanism for adding structure to a model by defining named
relationships between components. A group has the following properties.
Relationships. A group must define one or more relationships between the com-
ponents referenced by the group.
Components. A group can contain references to any number of components. These
references may be nested, indicating that the group’s relationships may be hierar-
chical.
Two types of grouping relationship are predefined in CellML: encapsulation and
geometric containment. These are indicated by relationship attributes with values
of "encapsulation" and "containment", respectively. Users may also define their
own classes of group, but CellML-compliant processing software is not required to
recognize any groups not belonging to these two predefined classes. A single group
can be of more than one class.
The <group> element is used to declare a CellML group. It can only be used
inside a <model> element. The following XML fragment demonstrates the use of the
<group> element.
<group>
  <relationship_ref relationship="encapsulation" />
  <relationship_ref relationship="containment" />
  <component_ref component="sodium_channel">
    <component_ref component="hodgkin_huxley_gate_1" />
    <component_ref component="hodgkin_huxley_gate_2" />
  </component_ref>
</group>
This <group> element contains three <component_ref> elements, which are used
to reference (via component attributes) the components involved in the group. The
<component_ref> elements that reference the Hodgkin–Huxley gate components are
defined inside the <component_ref> element that references the sodium channel,
indicating that the gate components are both encapsulated by and physically inside
the sodium channel. This is indicated by the "encapsulation" and "containment"
relationships referenced by the <relationship_ref> elements.
These grouping relationships do not have any mathematical significance, i.e. they
do not affect the equations contained in the grouped components or imply any default
behaviour in the mapping of variables between these components.
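A processing application might read such a group into a list of relationships and a child-to-parent map. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name read_group is hypothetical, the element names are written with underscores (relationship_ref, component_ref), and the CellML namespace is declared directly on <group> here only because the fragment is parsed on its own.

```python
import xml.etree.ElementTree as ET

C = "{http://www.cellml.org/2001/04/cellml}"

GROUP = """
<group xmlns="http://www.cellml.org/2001/04/cellml">
  <relationship_ref relationship="encapsulation" />
  <relationship_ref relationship="containment" />
  <component_ref component="sodium_channel">
    <component_ref component="hodgkin_huxley_gate_1" />
    <component_ref component="hodgkin_huxley_gate_2" />
  </component_ref>
</group>
""".strip()


def read_group(group):
    """Return (relationship names, {child component: parent component})."""
    relationships = [
        ref.get("relationship") for ref in group.findall(C + "relationship_ref")
    ]
    parent_of = {}

    def walk(ref, parent):
        name = ref.get("component")
        if parent is not None:
            parent_of[name] = parent
        # Nested <component_ref> elements are children of this component.
        for child in ref.findall(C + "component_ref"):
            walk(child, name)

    for top in group.findall(C + "component_ref"):
        walk(top, None)
    return relationships, parent_of


relationships, parent_of = read_group(ET.fromstring(GROUP))
print(relationships, parent_of)
```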
9. Encapsulation
The more important of the two predefined grouping classes is the encapsulation
relationship, which is indicated by a relationship attribute value of "encapsulation".
Encapsulation allows the modeller to hide a group of components from the rest of the
model by using a single component as an interface to the hidden subnetwork.
Encapsulation adds structure to a model by preventing connections between specified sets
of components: encapsulated components (those referenced by the <component_ref>
elements that are defined inside other <component_ref> elements) may only be
connected to the encapsulating component (the component referenced by the parent
<component_ref> element) and to other components encapsulated by the same
encapsulating component.
The encapsulation functionality requires each variable to have two interfaces.
These, and their interaction, were described in detail in the previous section on
variables.
10. Containment
The second predefined grouping class is a geometric relationship known as
containment, which is indicated by a relationship attribute value of "containment". This
relationship is only intended to define the most basic form of rendering information:
namely that the components referenced by nested <component_ref> elements are
physically inside the component referenced by their parent <component_ref> element.
CellML’s <group> element allows the modeller to specify that one component is
inside another and name this relationship. By specifying numerous geometric rela-
tionships with the same name, the modeller can build up geometric hierarchies. Any
unnamed geometric relationships form a single geometric hierarchy. It is typically this
hierarchy, if present, that a CellML processing application might render by default.
Geometric relationship information is completely independent of encapsulation
information, but CellML processing software is free to check for inconsistencies
between the two relationships; for instance, it would not generally be useful for
an encapsulating parent to be geometrically inside one of its children!
11. Metadata
Metadata are included in CellML to provide context for models and to facilitate
searches of collections of models and model components. They provide a means for
a modeller to include structured descriptive information about his model, which
can help other modellers determine whether they can incorporate the model into
their own work. Metadata defined in the CellML specification include model author,
literature reference, copyright, model creation date, and various elements intended
to place the model into a meaningful biological context.
The syntax developed for embedding metadata within CellML documents is based
on the Resource Description Framework (RDF) (Brickley & Guha 2000) specification
developed and maintained by the W3C. Modellers are free to define additional meta-
data within their own RDF schema. However, CellML processing software is not
required to recognize any metadata other than that defined in the CellML specifica-
tion.
The following are the basic rules for inclusion of metadata in CellML.
(i) All metadata are optional. A model without any metadata is a valid CellML
model. However, the CellML specification strongly recommends that the mod-
eller provide as much metadata as possible, particularly his/her name and con-
tact information and a reference for a paper that describes the development of
the model.
(ii) All metadata are allowed on any element in a CellML document. Although the
primary intent of the metadata is to provide information about models and
model components, this specification does not limit the metadata to the model
and component elements.
(iii) There can only be one set of metadata on a given element. The CellML meta-
data model allows some metadata elements to repeat, while others can occur
only once. Any given CellML element can have one set of metadata, as defined
by this data model.
(iv) Lack of metadata implies nothing. No inheritance of metadata from parent
elements is implied if an element lacks metadata.
12. Conclusion
As models of biological processes become more sophisticated they become increas-
ingly difficult to reliably construct, communicate, test and modify. This problem has
both structural and procedural roots. The structures and interactions characteristic
of complicated systems are inherently difficult to understand and model. There is
probably little that can be done to alleviate this problem. In contrast, resolution
of the procedural problems is possible. The traditional procedure for communicat-
ing models by publishing in hard-copy journals and books is both error prone and
inefficient. With digital computation and communication tools widely available to
researchers it makes increasing sense to specify, distribute, compute and modify
models electronically, removing the need for human involvement in the translation
of models from one form to another.
CellML is a language for describing a wide class of biological models in a form that
is readable by both humans and computers. It makes use of existing widely accepted
standards, such as XML, MathML and RDF. Furthermore, CellML is designed to
facilitate automatic translation of models from specification into computational
form;
promote reuse of existing models via the use of a component-based model
architecture, encapsulation and model namespace constructs; and
enable automatic error and consistency checking by requiring units information
to be carried by all variables.
It is hoped that the widespread adoption of CellML as a standard vehicle for
specifying and communicating biological models will alleviate some of the procedural
problems described above.
The authors gratefully acknowledge the support of the CellML project by Physiome Sciences
Inc.
References
Ausbrooks, R., Buswell, S., Dalmas, S., Devitt, S., Diaz, A., Hunter, R., Smith, B., Soif-
fer, N., Sutor, R. & Watt, S. 2001 Mathematical markup language (MathML) version 2.0.
W3C proposed recommendation 8 January 2001 (http://www.w3.org/TR/MathML2/PDF-
p-MathML-20001113.pdf).
Bray, T., Hollander, D. & Layman, A. 1999 Namespaces in XML. W3C recommendation, 14
January 1999 (http://www.w3.org/TR/1999/REC-xml-names-19990114).
Bray, T., Paoli, J., Sperberg-McQueen, C. M. & Maler, E. 2000 Extensible markup language
(XML) 1.0 (2nd edn). W3C recommendation, 6 October 2000 (http://www.w3.org/TR/2000/
REC-xml-20001006.pdf).
Brickley, D. & Guha, R. V. 2000 Resource Description Framework (RDF) schema specifica-
tion 1.0. W3C candidate recommendation 27 March 2000 (http://www.w3.org/TR/2000/
CR-rdf-schema-20000327).
Bullivant, D., Hedley, W. J. & Nielsen, P. M. F. 2000 The FieldML website (http://www.
physiome.org.nz/sites/physiome/fieldml/pages/index.html).
BIPM 1998 Bureau International des Poids et Mesures. The international system of units (SI)
(http://www.bipm.fr/pdf/si-brochure.pdf).
BIPM 2000 Bureau International des Poids et Mesures. The international system of units. Sup-
plement 2000: addenda and corrigenda to the 7th edn (1998) (http://www.bipm.fr/pdf/si-
supplement2000.pdf).
ECMA 1999 Standard ECMA-262. ECMAScript language specification (ftp://ftp.ecma.ch/
ecma-st/Ecma-262.pdf).
Hedley, W. J., Nelson, M., Bullivant, D. & Nielsen, P. M. F. 2000 Welcome to cellml.org
(http://www.cellml.org/).
Hodgkin, A. L. & Huxley, A. F. 1952 A quantitative description of membrane current and its
application to conduction and excitation in nerve. J. Physiol. 117, 500–544.
Hucka, M., Finney, A., Sauro, H. & Bolouri, H. 2000 Systems biology markup language (SBML).
Level 1. Structures and facilities for basic model definitions, 21 December 2000 edn (ftp://
ftp.cds.caltech.edu/pub/caltech-erato/sbml-level-1/sbml.pdf).
Murray-Rust, P. & Rzepa, H. S. 1999 Chemical markup, XML, and the World Wide Web. 1.
Basic principles. J. Chem. Inf. Comput. Sci. 39, 928–942.
Rosetta Inpharmatics 2000 Gene Expression Markup Language (GEML™). A common data
format for gene expression data and annotation interchange (http://www.geml.org/docs/
GEML.pdf).