Article in Philosophical Transactions of The Royal Society B Biological Sciences · June 2001
Impact Factor: 7.06 · DOI: 10.1098/rsta.2001.0817

10.1098/rsta.2001.0817

A short introduction to CellML


By Warren J. Hedley¹, Melanie R. Nelson²,
David P. Bullivant¹ and Poul F. Nielsen¹
¹Department of Engineering Science, Private Bag 92019,
The University of Auckland, Auckland, New Zealand
²Physiome Sciences Inc., 307 College Road East,
Princeton, NJ 08540-6608, USA

CellML™ is an XML-based language designed to facilitate the exchange of biological
models across the World Wide Web. Processing applications are able to appropriately
render models based on the definition of model structure given in a CellML document,
and run simulations based on the definition of the underlying mathematics.
CellML is designed to be a general framework upon which a wide variety of models
may be built. The basic constituents and structure are simple, providing a common
basis for describing models and facilitating the creation of complex models from
simpler ones by combining models and/or adding detail to existing models.
CellML models are represented as a collection of discrete components linked by
connections to form a network. A component is a functional unit that may corre-
spond to a physical compartment, a collection of entities engaged in similar tasks, or
a convenient modelling abstraction. Components may contain variables, mathemati-
cal relationships that specify the interactions between those variables, and metadata.
Variables may be local to a component, or made visible to other components via inter-
face attributes. All interactions between variables within a component are described
using MathML content markup. The interface attributes describe the external view
of the component, specifying those variables visible to other components. A con-
nection is a directed mapping from externally visible variables in one component
to those of another. Every variable has a set of units associated with it, making it
possible to connect together components with variables defined using different units.
CellML offers additional facilities, such as metadata, for adding context informa-
tion to a model, and component grouping. These assist in the creation and mainte-
nance of models but do not alter the mathematics of the model. All models described
using CellML can be reduced to the canonical form: a set of connected components.
Keywords: cell model; mathematical model; XML; MathML; markup language

1. Introduction
Computer modelling of biological processes can be a valuable complement to experi-
mental methods. Modelling can help place experimental data in a meaningful context
and allow scientists to investigate questions that are difficult or impossible to address
experimentally. Modelling also facilitates the exploration of the parameter space of
a system, helping scientists determine which features of the system exert the most
influence over its behaviour.

Phil. Trans. R. Soc. Lond. A (2001) 359, 1073–1089 © 2001 The Royal Society

However, the current method by which mathematical models of biological systems
are published and shared has several problems. A model is developed and run as
computer code, but is published as text and equations. These equations describe
the underlying mathematical model, but may be inconsistent with the code that was
used to implement the model and obtain any published results. These inconsistencies
can make it impossible for other scientists to implement the published model and
reproduce the published results.
The model author may make the source code used in their model implementation
freely available to other scientists. However, this source code does not usually lend
itself to customization or integration with other models. Furthermore, most models
are written in code that is specific for one computer platform. Someone wishing to
reuse such a model would have to either run the model on the platform for which
it was written or port the source code to an alternative platform. Reusing only part
of a model is even more difficult, because this requires disentangling the portions of
the code that pertain to the desired part of the model.
CellML (Hedley et al. 2000) is designed to address these problems and to encourage
the reuse of models and parts of models. It is based on the eXtensible Markup
Language (XML) (Bray et al. 2000), and provides a means for unambiguously specifying
biological models. The model author specifies the underlying mathematical model,
and not a particular implementation of its solution. The same CellML document can
be processed by rendering software to generate equations suitable for publishing, or
processed by solution software to generate computer code that can then be compiled
and executed to perform simulations, ensuring that the executable model and the
published equations are consistent.
Models are described in CellML as networks of components where each component
defines one logically discrete portion of the model. This component-based architec-
ture of models facilitates the reuse of parts of models, because all of the mathematics
and other information about a model component are stored in one place. Further-
more, each component is defined with an interface that determines which variables
are required as input to the component (i.e. which variables are used but not defined
in the component). Therefore, all a modeller needs to do to reuse the component
is provide the required input variables. It is not necessary to delve into the inner
workings of the component.
XML is a structured document format that is both human and machine read-
able. XML documents are plain text, and so may be created and edited using a
basic text editor. The XML specification (Bray et al. 2000) defines the format of an
XML document and specifies the behaviour of XML processing software, ensuring
interoperability between all applications claiming to be XML conformant. XML doc-
uments are made up of a tree of elements. Each element consists of a start tag (which
may contain attributes), some content (which may contain text or further elements),
and an end tag. Start tags are made up of a less-than sign ‘<’, the element name,
attributes, and a greater-than sign ‘>’. End tags have a slash ‘/’ before the element
name. An attribute is a name="value" expression associated with an element. The
value of an attribute is usually intended to be data relevant to the content of the
current element.
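As a rough illustration of these terms, the following Python sketch parses a one-element document with the standard-library xml.etree.ElementTree module. The element and attribute names (cell, channel, name, ion) are invented for the example and are not CellML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a root element with one attribute, some text,
# and one empty child element that carries its own attribute.
doc = '<cell name="axon">squid<channel ion="sodium" /></cell>'

root = ET.fromstring(doc)

print(root.tag)                        # element name taken from the start tag
print(root.attrib)                     # attributes given in the start tag
print(root.text)                       # text content before the first child
print([child.tag for child in root])   # names of the nested elements
```

Any conformant XML processor should expose the same tree, which is what lets CellML software be built on existing parsers.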
The design goals for XML make it an ideal metalanguage for CellML. XML docu-
ments are both human and machine readable. The language has a short specification,
making the implementation of XML processing software a reasonably simple matter.
As a consequence, many free implementations of XML processors exist. These can
be used in the creation of CellML processing software, making it simple to read and
write CellML documents. XML is also readily usable over the internet, facilitating
the exchange of models between databases and processing software using the web.
Many of the features of XML make it easy to combine multiple XML-based stan-
dards in one document. CellML elements are used to define model structure, but
other information is incorporated into the model document using existing standards.
Mathematics is included as MathML (Ausbrooks et al. 2001). The use of algorithms
is strongly discouraged. However, if algorithms are necessary to specify a model,
they can be included using ECMAScript (ECMA 1999), the standard version of
JavaScript. Metadata are included via the Resource Description Framework (RDF)
(Brickley & Guha 2000).
Elements from other related languages can be inserted into a CellML document
and identified within the document by using the XML namespaces (Bray et al.
1999) mechanism. For instance, protein-structure information could be added to a
CellML document using the Chemical Markup Language (Murray-Rust & Rzepa
1999), or gene-expression data added using the Gene Expression Markup Language
(Rosetta Inpharmatics 2000). The CellML specification defines four namespaces: two
associated with model structure and mathematics, and two associated with metadata.
Simulation software need only recognize elements in the first two namespaces and
may ignore all other elements.
Although this paper focuses on the use of CellML to specify lumped parame-
ter electrophysiological models, other types of models, such as biochemical models of
signal transduction and metabolic pathways, can also be specified in CellML. In addi-
tion, CellML is designed to be compatible with spatially and temporally distributed
models. Such models require mechanisms for characterizing and manipulating dis-
tributed quantities in an unambiguous and computationally efficient manner. These
requirements are met by FieldML (Bullivant et al. 2000), an XML-based language for
representing arbitrary tensor fields in piecewise functional form. FieldML provides a
means for distributed models to be constructed within the CellML framework.
The scope of the models that can be described using CellML overlaps with the func-
tionality of several other languages that are being developed to specify certain kinds
of biological models. The most prominent of these is the Systems Biology Markup
Language (SBML) (Hucka et al. 2000), which ‘is oriented towards representing
biochemical networks’. SBML is not suitable for defining electrophysiological models of
the kind presented in this paper, but it would be possible to embed SBML descrip-
tions of cell signalling pathways inside CellML documents, and to define transfor-
mations between SBML pathway model descriptions and the corresponding CellML
descriptions.

2. Model
A model defined in CellML consists of a network of interconnected components.
Models are organized into the following structures.
Components, which are the smallest functional units in a model. Each component
contains the mathematics that describes the behaviour of the portion of the system
represented by that component. For instance, an electrophysiological model of a
cell might be organized into components that represent various ion channels. All of
the mathematics that describe the behaviour of the L-type calcium channel would
be defined in a single component representing this particular ion channel.
Connections, which are used to connect components to each other and to map
variables in one component to variables in another.
Groups, which allow the modeller to indicate the existence of logical or physical
groups of components.
Metadata, which provide context for the model.

[Figure 1: diagram of the components membrane, sodium channel, potassium channel,
leakage current, m gate, h gate and n gate, with connections labelled by the variables
V, I_Na, I_K, I_L, α_m, β_m, α_h, β_h, α_n, β_n, m, h and n.]

Figure 1. The network structure of the CellML description of the Hodgkin–Huxley squid axon
model. The shapes represent components and the lines correspond to connections along which
variables are passed. The solid arrow heads point towards the parent component in a containment
relationship, hollow arrow heads point towards the parent in both geometric containment and
encapsulation relationships. Variable names are displayed next to the component in which they
are declared alongside connections on which their value is exported to other components.

The use of these elements is best demonstrated by example. We will use the
Hodgkin–Huxley (Hodgkin & Huxley 1952) model of the squid giant axon as an
example throughout this paper. Some minor changes have been made to the original
published model to reflect current modelling practice. For instance, the membrane
voltage is defined with respect to absolute zero, and not the resting potential of the
membrane. Excerpts of the CellML document describing this model are presented in
this paper. The full model is available online at
http://www.cellml.org/examples/hh_squid_axon_1952/. A diagram of the model’s
structure, as defined in the CellML document, is shown in figure 1.
The usual root element for a CellML document is the model element, <model>.
The <model> element for the Hodgkin–Huxley example is shown in the following XML
fragment.

Table 1. Dictionary of names of predefined CellML units
(SI base units are marked with an asterisk.)

ampere*        gram     kilogram*  mole*    sievert
becquerel      gray     liter      newton   steradian
candela*       henry    litre      ohm      tesla
celsius        hertz    lumen      pascal   volt
coulomb        joule    lux        radian   watt
dimensionless  katal    meter*     second*  weber
farad          kelvin*  metre*     siemens

<model
    name="Hodgkin_Huxley_squid_axon_1952"
    xmlns="http://www.cellml.org/2001/04/cellml"
    xmlns:cellml="http://www.cellml.org/2001/04/cellml">
  ...
</model>
The <model> element has a name attribute that allows this model to be unambigu-
ously referenced by other models. For instance, this would be necessary if the model
were to be combined with other models or partial models to create a larger model.
Two namespaces (Bray et al. 1999) are also declared on the <model> element. The
first sets the default namespace for the <model> element and all elements contained
within the <model> element to the CellML namespace (this URI may change as
CellML develops (see the CellML website at http://www.cellml.org/ for the current
URI)). The second namespace is again the CellML namespace, but this time declared
with an explicit ‘cellml’ prefix. This declaration has document-wide scope, so the
‘cellml’ prefix may be used anywhere to move an element or attribute into the
CellML namespace. This simplifies the addition of CellML elements and attributes
to non-CellML elements. For instance, a cellml:units attribute (described in the
next section) can be added to MathML <cn> elements without having to redeclare the
CellML namespace with each occurrence. The declaration of the CellML namespace
as both the default namespace and a namespace mapped to a prefix is recommended
practice for any <model> element.
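The effect of the two declarations can be sketched with Python's standard-library xml.etree.ElementTree, which reports every name together with its namespace URI. The fragment below is a simplified, hypothetical document: the <math> element is placed directly inside <model> for brevity, and the number -75.0 is an invented value.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/2001/04/cellml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical fragment: the <model> start tag from the text, wrapping a bare
# MathML number that carries a cellml:units attribute.
doc = f"""
<model name="Hodgkin_Huxley_squid_axon_1952"
       xmlns="{CELLML_NS}"
       xmlns:cellml="{CELLML_NS}">
  <math xmlns="{MATHML_NS}">
    <cn cellml:units="millivolt">-75.0</cn>
  </math>
</model>
"""

root = ET.fromstring(doc)
cn = root.find(f"{{{MATHML_NS}}}math/{{{MATHML_NS}}}cn")

# ElementTree writes qualified names as "{namespace-uri}local-name".
print(root.tag)   # <model> is in the CellML namespace via the default declaration
print(cn.tag)     # <cn> is in the MathML namespace
units = cn.get(f"{{{CELLML_NS}}}units")   # attribute resolved via the cellml prefix
print(units)
```

Because the prefix is declared once on <model>, the cellml:units attribute resolves to the CellML namespace even though its parent element is MathML.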

3. Units
One of the key features ensuring robustness and reusability of CellML components
and models is the requirement that all variables and numbers be declared with a
set of units. Components and models containing variables with different units may
therefore still be connected. However, variables that are to be mapped to one another
must have the same dimensions. The explicit declaration of units also allows basic
consistency checking of equations.
CellML provides a dictionary of standard units that may be used in variable dec-
larations and attached to bare numbers in mathematics. References to these units
should make use of the actual name of the units, rather than the standard abbre-
viation, thus avoiding confusion between units (e.g. metre) and prefixes (e.g. milli).
The full list of units that any CellML processing application should understand is
given in table 1. The keywords in that table comprise the SI (BIPM 1998, 2000)
base and derived units and the additional units that are commonly used in the
types of biological models likely to be defined using CellML. Expressions relating
these additional units to the SI base units can be found in the CellML specification
(http://www.cellml.org/specification/index.html).
CellML also provides a facility whereby new units can be defined in terms of the
units defined in the dictionary. This functionality allows the creation of complex
units (made up of the product of simple units), definition of imperial units (which
are expressed as a scaled version of an SI unit), and even creation of units that
require an offset (such as degrees Fahrenheit). This allows model authors to work in
whatever set of units they feel most comfortable with, while still ensuring that their
models can be integrated with those of other authors using other units.
New units are defined using the <units> element, which may be placed inside
both <model> and <component> elements. When a <units> element is placed inside
a <model> element, the units definition may be referenced by all components in
that model. When a <units> element is placed inside a <component> element, the
units definition may only be referenced inside that component. Units definitions are
referenced by the value of the name attribute of the <units> element. The value of
the name attribute of a <units> element must be unique across all <units> elements
in the <model> or <component> element in which it is defined. If the value of the
name attribute of a <units> element defined inside a <component> element matches
the name attribute of a <units> element defined inside the parent <model> element,
then it will redefine the unit, and all references to the units within that component
refer to the new definition.
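The scoping rule can be sketched as a lookup that tries the component first and the model second. This is only an illustration in Python: the function name resolve_units and the sets of names are hypothetical, and consulting the standard dictionary last is an assumption, since the text does not state an order for it.

```python
STANDARD_UNITS = {"second", "metre", "volt"}  # small excerpt of table 1

def resolve_units(name, component_units, model_units):
    """Return a label saying where a units reference is resolved.

    component_units and model_units are sets of names declared by
    <units> elements in the component and in its parent model.
    """
    if name in component_units:      # a component definition shadows the model's
        return "component"
    if name in model_units:
        return "model"
    if name in STANDARD_UNITS:       # assumed to be consulted last
        return "dictionary"
    raise KeyError(f"unknown units: {name}")

model_units = {"millisecond", "millivolt"}
component_units = {"millisecond"}    # redefines the model-level name

print(resolve_units("millisecond", component_units, model_units))
print(resolve_units("millivolt", component_units, model_units))
print(resolve_units("second", component_units, model_units))
```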
All <variable> elements must include a units attribute that references either
one of the keywords defined in the standard dictionary shown in table 1 or the value
of the name attribute of a <units> element in the current component or model.
Whenever a bare number occurs in an equation, it must be placed in a <cn> element
in the MathML namespace. Every <cn> element must include a units attribute in
the CellML namespace, the value of which follows the same scheme as the units
attribute on a <variable> element.
The contents of a <units> element are a set of <unit> elements, each referenc-
ing units from the dictionary or some previously defined units. The product of the
subunits is the final units type. A <unit> element has no content but may have up
to five attributes. The units attribute is the most important of these, and is the
only one that is required. It is used to set the base quantity for the current <unit>
element, and its value must correspond to a keyword from the standard CellML units
dictionary or to the value of the name attribute of a <units> element in the current
component or model.
The definition of new units in terms of subunits may require the use of some
combination of the optional offset, prefix, exponent and multiplier attributes.
The optional offset attribute is used to represent the addition of a constant in the
transformation between the current units and the base units. This should only be
necessary to define the Fahrenheit temperature scale (an offset of "+32.0" must
be applied to define Fahrenheit in terms of Celsius). If the offset attribute is not
present, it assumes a default value of "0.0".
The prefix attribute can be used to indicate a scale factor for the unit. Its
value may be from the standard set of CellML prefix names given in table 2, or an
integer, in which case the unit is premultiplied by 10 to the power of this number.
If no prefix attribute value is specified, it is assumed that the unit stands alone,
i.e. it is premultiplied by one.

Table 2. CellML prefix names conform to SI prefix names

factor  name     factor  name
10^24   yotta    10^-1   deci
10^21   zetta    10^-2   centi
10^18   exa      10^-3   milli
10^15   peta     10^-6   micro
10^12   tera     10^-9   nano
10^9    giga     10^-12  pico
10^6    mega     10^-15  femto
10^3    kilo     10^-18  atto
10^2    hecto    10^-21  zepto
10^1    deka     10^-24  yocto

The combination of prefix attribute and units attribute is raised to a power
equal to the value of the exponent attribute. The value of the exponent attribute
must be a real number, and is typically an integer. If no exponent attribute value
is specified, it is assumed that the unit occurs once, i.e. the exponent attribute has
a default value of one. Note that an exponent attribute value of "0" (zero) has the
effect of removing the parent <unit> element from the current units.
Finally, a multiplier attribute can be used to premultiply the rest of the conver-
sion expression by a further scale factor, allowing the introduction of floating-point
scale factors. For instance, a multiplier of "0.45359237" is used to define a pound
in terms of the kilogram.
A simple units definition occurs when units are defined as a linear function of some
previously defined simple units or base units. In a simple units definition, a <units>
element contains only a single child <unit> element, that <unit> element has an
exponent attribute value of "1.0", and the units definition referenced by the units
attribute is one of the SI base units or is itself a simple units definition. These are
the only conditions under which a <unit> element may define an offset attribute.
The formula that expresses how the old units (referenced by the value of the units
attribute on the <unit> element) are transformed into the new units (defined by the
value of the name attribute on the parent <units> element) is given below:

\[ x_{\text{new}}\,[\text{Units}] = (\text{multiplier} \cdot \text{prefix})\,[\text{Units}/\text{units}]\; x_{\text{old}}\,[\text{units}] + \text{offset}\,[\text{Units}]. \]

Terms in square brackets represent the units associated with values in the expression,
which are italicized. \(x_{\text{old}}\) is the value to be transformed from the old units, and
\(x_{\text{new}}\) is the resulting value in the new units. ‘Units’ are the units being defined,
and multiplier, prefix, ‘units’, and offset correspond to the values of the appropriate
attributes on the <unit> element.
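A minimal Python sketch of this formula, applied exactly as printed, is given below. The function name convert_simple and the small prefix lookup are hypothetical, and the default multiplier of 1.0 is an assumption. The Fahrenheit case uses a multiplier of 1.8 together with the offset of 32.0 mentioned above.

```python
# Decimal factors for a few of the prefix names in table 2.
PREFIX_FACTORS = {"kilo": 1e3, "centi": 1e-2, "milli": 1e-3, "micro": 1e-6}

def convert_simple(x_old, multiplier=1.0, prefix=None, offset=0.0):
    """Apply x_new = (multiplier * prefix) * x_old + offset.

    prefix may be a name from PREFIX_FACTORS, an integer power of ten,
    or None (the unit stands alone, i.e. a factor of one).
    """
    if prefix is None:
        factor = 1.0
    elif isinstance(prefix, int):
        factor = 10.0 ** prefix
    else:
        factor = PREFIX_FACTORS[prefix]
    return multiplier * factor * x_old + offset

# Fahrenheit in terms of Celsius: multiplier 1.8, offset 32.0.
boiling_f = convert_simple(100.0, multiplier=1.8, offset=32.0)
freezing_f = convert_simple(0.0, multiplier=1.8, offset=32.0)
print(boiling_f, freezing_f)
```

The two temperatures correspond to the familiar 212 and 32 degrees Fahrenheit, up to floating-point rounding.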
Complex units are the product of multiple units. In a complex units definition, a
<units> element contains multiple child <unit> elements, or some <unit> element
defines an exponent attribute with a value other than "1.0". The conversion between
the new units and the product of the constituent units is given by the formula below:

\[ x_{\text{new}}\,[\text{Units}] = (m_1 \cdots m_n\, p_1^{e_1} \cdots p_n^{e_n})\,[\text{Units}/(u_1^{e_1} \cdots u_n^{e_n})]\; x_{\text{old}}\,[u_1^{e_1} \cdots u_n^{e_n}] \]

The \(m_i\), \(p_i\), \(u_i\) and \(e_i\) terms refer to the values of the multiplier, prefix, units
and exponent attributes on the ith <unit> element respectively.
The CellML specification forbids offset attributes from being defined on any
<unit> elements that occur inside a complex units definition. When a complex units
definition references a simple units definition, any offset associated with the simple
units definition is removed. This means that the conversions such as the one between
degrees Fahrenheit per inch and degrees Celsius per centimetre involve only a scale
factor.
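The bracketed scale factor in the complex units formula can be computed as a simple product. The Python sketch below is illustrative only: the function name scale_factor and the tuple layout are hypothetical, and nested units references are assumed to have been flattened by hand into base units beforehand.

```python
PREFIX_EXPONENTS = {"centi": -2, "milli": -3, "micro": -6}  # from table 2

def scale_factor(units):
    """Product of m_i * p_i**e_i over (multiplier, prefix, exponent) triples.

    prefix is a name from PREFIX_EXPONENTS, or None when the unit has no prefix.
    """
    factor = 1.0
    for multiplier, prefix, exponent in units:
        power_of_ten = PREFIX_EXPONENTS[prefix] if prefix else 0
        factor *= multiplier * 10.0 ** (power_of_ten * exponent)
    return factor

# Millisiemens per centimetre squared, flattened by hand into
# (milli siemens)^1 * (centi metre)^-2.
ms_per_cm2 = scale_factor([(1.0, "milli", 1), (1.0, "centi", -2)])
print(ms_per_cm2)
```

The result is close to 10, in line with the fact that one millisiemens per square centimetre equals ten siemens per square metre. No offset term appears, as required for complex units.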
The following CellML fragment contains the definition of two simple units:
<units name="lit">
  <unit multiplier="1000" prefix="centi" units="metre" exponent="3" />
</units>
<units name="fahrenheit">
  <unit multiplier="1.8" offset="32.0" units="celsius" />
</units>
The first <units> element is used to define a litre (we assign the new units the
name lit to avoid a conflict with the keyword litre from the standard dictionary
of units). In the example, a litre is defined as 1000 cubic centimetres. It would also
be possible to define a litre as one thousandth of a cubic metre or using any number
of possible multipliers and scales. The formula that relates \(x_m\), a quantity with units
of cubic metres, to \(x_l\), a quantity with units of lit, is

\[ x_l\,[\text{lit}] = (1000\,(10^{-2})^{-3})\,[\text{lit}/\text{metre}^3]\; x_m\,[\text{metre}^3] \]

The second <units> element is used to define degrees Fahrenheit as a function of
degrees Celsius. The formula we obtain from this <units> element is

\[ x_f\,[\text{fahrenheit}] = 1.8\,[\text{fahrenheit}/\text{celsius}]\; x_c\,[\text{celsius}] + 32.0\,[\text{fahrenheit}] \]

The units definitions from the Hodgkin–Huxley model are given in the following
XML fragment, demonstrating how complex units can be built up from the product
of simple units:
<units name="millisecond">
  <unit prefix="milli" units="second" />
</units>
<units name="per_millisecond">
  <unit prefix="milli" units="second" exponent="-1" />
</units>
<units name="per_centimetre_squared">
  <unit prefix="centi" units="metre" exponent="-2" />
</units>
<units name="millivolt">
  <unit prefix="milli" units="volt" />
</units>
<units name="millisiemens_per_centimetre_squared">
  <unit prefix="milli" units="siemens" />
  <unit units="per_centimetre_squared" />
</units>
<units name="microfarad_per_centimetre_squared">
  <unit prefix="micro" units="farad" />
  <unit units="per_centimetre_squared" />
</units>
<units name="microampere_per_centimetre_squared">
  <unit prefix="micro" units="ampere" />
  <unit units="per_centimetre_squared" />
</units>

4. Component
Components are the basic structural elements of a CellML model. Components may
contain the following optional structures.
(1) A set of variables.
(2) A set of mathematical expressions defining the relationships between variables.
(3) Metadata providing information about the context of the component.
A <component> element is used to declare a CellML component. It may only be
used inside a <model> element or as the root element of a CellML document.
The structure of the component that represents one of the Hodgkin–Huxley gates
in the Hodgkin–Huxley model’s sodium channel is shown in the following XML frag-
ment. The contents of the <math> element have been omitted to preserve space
(mathematics is discussed in a later section):
<component name="sodium_channel_m_gate">
  <variable name="m" public_interface="out"
            units="dimensionless" initial_value="0.05" />
  <variable name="alpha" public_interface="in"
            units="per_millisecond" />
  <variable name="beta" public_interface="in"
            units="per_millisecond" />
  <variable name="time" public_interface="in"
            units="millisecond" />
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    ...
  </math>
</component>
Each <component> element has a name attribute, the value of which must be unique
across all other <component> elements within the same <model> element. The value
of this attribute is used to reference the component in connections and groups.
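To show how processing software might read such a component, the following Python sketch parses the gate component with the standard-library xml.etree.ElementTree and separates its inputs from its outputs. The <math> element is left out, and the CellML namespace is declared directly on <component> so that the fragment parses on its own; both are simplifications made for the example.

```python
import xml.etree.ElementTree as ET

CELLML_NS = "http://www.cellml.org/2001/04/cellml"

# The gate component from the text, without its <math> element.
doc = f"""
<component name="sodium_channel_m_gate" xmlns="{CELLML_NS}">
  <variable name="m" public_interface="out"
            units="dimensionless" initial_value="0.05" />
  <variable name="alpha" public_interface="in" units="per_millisecond" />
  <variable name="beta" public_interface="in" units="per_millisecond" />
  <variable name="time" public_interface="in" units="millisecond" />
</component>
"""

component = ET.fromstring(doc)
variables = component.findall(f"{{{CELLML_NS}}}variable")

# Inputs must be supplied by other components; outputs are computed here.
inputs = [v.get("name") for v in variables if v.get("public_interface") == "in"]
outputs = [v.get("name") for v in variables if v.get("public_interface") == "out"]

print("component:", component.get("name"))
print("inputs:", inputs)
print("outputs:", outputs)
```

A modeller reusing this component would only need to supply alpha, beta and time, without inspecting the mathematics inside.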


5. Variable
A CellML variable is a named entity that belongs to a single component. The
<variable> element is used to declare a CellML variable. It can only be used inside
a <component> element. Variables have a name attribute, the value of which must be
unique across all variables in the current component. The name of a variable is used
when mapping variables inside connections. <variable> elements may also have the
following attributes.
initial_value. A variable may be a simple scalar, a simple compound structure
such as a vector, or a more complicated structure such as a spatially
varying field (defined by FieldML). The initial_value attribute provides a
convenient means for specifying the initial value of a scalar variable in a
simulation with time as the independent variable.
units. All variable declarations must have units specified. The value of the
units attribute must correspond to one of the keywords in the CellML units
dictionary or the name attribute of some units defined within the current
component or model.
public_interface. This attribute specifies the interface exposed to components
in the parent and sibling sets (see below). The public_interface attribute may
have a value of "in", "out" or "none". The absence of a public_interface attribute
implies a value of "none".
private_interface. This attribute specifies the interface exposed to components
in the encapsulated set (see below). The private_interface attribute may have
a value of "in", "out" or "none". The absence of a private_interface attribute
implies a value of "none".
The rules for mapping variables depend on the encapsulation hierarchy of the
components that own the variables. Encapsulation allows the modeller to hide a
complex network of components from the rest of the model and provide a single
component as an interface to the hidden network. Encapsulation effectively divides
the network into layers, where connections between the layers may only be made
through the interface components. The components to which any given component
may connect can be divided into four distinct classes. The set of all components
encapsulated by the current component is referred to as the encapsulated set. If the
current component is encapsulated, then the encapsulating component is referred to
as the parent, and the set of all other components encapsulated by the same parent
is referred to as the sibling set. If the current component is not encapsulated, then
it has no parent and the sibling set consists of all other components in the model
that are not encapsulated. All other components, which may not be connected to
the current component, make up the hidden set.
The CellML network shown in figure 1 demonstrates these sets effectively. The
encapsulated set of the ‘sodium channel’ component comprises the ‘m gate’ and
‘h gate’ components. The ‘sodium channel’ component is not itself encapsulated so
has no parent, and has a sibling set consisting of the ‘membrane’, ‘potassium channel’
and ‘leakage current’ components. The ‘n gate’ component is not in the sibling set
because it is encapsulated inside the ‘potassium channel’ component.
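The four classes can be worked out mechanically from the encapsulation relationships. The Python sketch below does this for the network of figure 1. The identifiers are shortened, hypothetical labels for the components, and the encapsulated set is taken to mean directly encapsulated components only, which is an assumption that makes no difference for this one-level hierarchy.

```python
# Encapsulation in figure 1: child component -> encapsulating parent.
PARENT = {
    "m_gate": "sodium_channel",
    "h_gate": "sodium_channel",
    "n_gate": "potassium_channel",
}
COMPONENTS = {
    "membrane", "sodium_channel", "potassium_channel", "leakage_current",
    "m_gate", "h_gate", "n_gate",
}

def classify(current):
    """Split all other components into the four classes described above."""
    parent = PARENT.get(current)          # None if not encapsulated
    encapsulated = {c for c, p in PARENT.items() if p == current}
    siblings = {c for c in COMPONENTS
                if c != current and PARENT.get(c) == parent}
    visible = encapsulated | siblings | ({parent} if parent else set())
    hidden = COMPONENTS - visible - {current}
    return encapsulated, parent, siblings, hidden

encapsulated, parent, siblings, hidden = classify("sodium_channel")
print(sorted(encapsulated), parent, sorted(siblings), sorted(hidden))
```

For the sodium channel this reproduces the sets listed in the text, with the n gate as the only hidden component.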

Components can be thought of as having two interfaces, each making internally
declared variables available for mapping to variables in other components. Variables
can be placed in either or both of the public and private interfaces by specifying a
value of "in" or "out" in the appropriate attribute. Variables exposed in the pub-
lic interface of a component may be mapped to variables in the public interface of
components in the sibling set or variables in the private interface of components in
the parent. Variables exposed in the private interface of a component may only be
mapped to variables exposed in the public interface of components in the encapsu-
lated set. In all cases, variables with an interface value of "in" must be mapped
to variables with an interface value of "out". A variable may not have a public
and private interface value of "in" because its value may only be obtained via one
mapping.
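These mapping rules are compact enough to be written as a single check. The Python sketch below is a hedged illustration: the function name mapping_allowed and its argument conventions are hypothetical, and it tests one mapping in isolation without checking that each "in" variable is mapped only once.

```python
def mapping_allowed(relation, iface_a, value_a, iface_b, value_b):
    """Check one variable mapping between components A and B.

    relation: how B relates to A: "sibling", "parent" or "encapsulated".
    iface_*:  "public" or "private", the interface the variable is exposed in.
    value_*:  "in" or "out", the value of that interface attribute.
    """
    allowed_interfaces = {
        "sibling": ("public", "public"),
        "parent": ("public", "private"),        # A is encapsulated by B
        "encapsulated": ("private", "public"),  # A encapsulates B
    }
    # Any other relation (the hidden set) has no entry and is rejected.
    if allowed_interfaces.get(relation) != (iface_a, iface_b):
        return False
    # An "in" must always be paired with an "out".
    return {value_a, value_b} == {"in", "out"}

# A gate exporting m through its public interface to a private "in"
# on its parent (a hypothetical declaration on the parent's side).
print(mapping_allowed("parent", "public", "out", "private", "in"))
# Two siblings that both declare the variable as "in" cannot be mapped.
print(mapping_allowed("sibling", "public", "in", "public", "in"))
```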
All variables that are declared in a component may be used in mathematics within
the current component. However, the value of a variable declared with either a
public_interface or private_interface value of "in" may not be mathematically
modified inside the current component, as this value is imported from a variable
belonging to another component. This prevents inconsistencies in the model
that would arise if several equations were to attempt to modify the same variable.
The variables in the Hodgkin–Huxley gate component shown in the previous XML
excerpt demonstrate the use of the public_interface attribute. Variables with a
public_interface value of "out" may have their value modified in the current
component. This Hodgkin–Huxley gate component contains one such variable: the
gating variable m that represents the percentage opening of the gate.
Variables with a public_interface value of "in" obtain their value from variables
declared in other components with an appropriate interface value of "out". Their
value may not be modified in the current component. This Hodgkin–Huxley gate
component declares three such variables: the independent variable time and the rate
constants, alpha and beta.

6. Mathematics
CellML uses MathML content markup to describe the relationships between variables
within a component. MathML is an XML application providing ‘an explicit
encoding of the underlying mathematical structure of an expression’ (Ausbrooks et
al. 2001). The technical specification for MathML is maintained by the W3C Math
Working Group as part of the activity of the W3C User Interface Domain. MathML
contains a rich set of predefined containers and operators, as well as mechanisms for
combining them in mathematically meaningful ways. This set is sufficient to enable
unambiguous representation of most biological models. In particular, MathML provides
content elements to allow coding of simple formulae in the following areas:
arithmetic, algebra, logic and relations; calculus and vector calculus; set theory;
sequences and series; elementary classical functions; statistics; linear algebra.
MathML content markup, as opposed to algorithmic specification, is used to represent
relationships between variables because it attempts to describe the meaning
of the relationships, rather than the methods by which those relationships are
computed. The CellML specification strongly encourages model authors to represent all
model behaviour as mathematical equations, using MathML. If a model contains
behaviour that requires algorithmic description, ECMAScript (ECMA 1999) can be
used to encode algorithms in CellML. This practice is not discussed in this paper,
since it is discouraged, and algorithms are not required to represent the
Hodgkin–Huxley model.
The governing differential equation in the Hodgkin–Huxley model for the gate
variable m in the sodium channel (in terms of the rate constants alpha_m and beta_m)
is
\[
\frac{d(m)}{d(\mathit{time})} = \mathit{alpha\_m} \cdot (1 - m) - \mathit{beta\_m} \cdot m
\]
The MathML representation of this equation is given in the XML fragment below:
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><eq />
    <apply><diff />
      <bvar><ci> time </ci></bvar>
      <ci> m </ci>
    </apply>
    <apply><minus />
      <apply><times />
        <ci> alpha_m </ci>
        <apply><minus />
          <cn cellml:units="dimensionless"> 1.0 </cn>
          <ci> m </ci>
        </apply>
      </apply>
      <apply><times />
        <ci> beta_m </ci>
        <ci> m </ci>
      </apply>
    </apply>
  </apply>
</math>
The default namespace for the <math> element and all of its child elements is set
to the MathML namespace as defined in the MathML 2.0 specification, overriding
the default namespace declaration on the <model> element.
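To illustrate how a processing application might turn such content markup into a computation, the following sketch parses the gate equation with Python's standard xml.etree.ElementTree module and evaluates its right-hand side. It is not part of CellML or of any CellML tool: the helper names (local, evaluate) are hypothetical, only the <times> and <minus> operators are handled, the numerical values are arbitrary, and the cellml prefix is bound to a namespace URI that we assume here so that the stand-alone fragment parses.

```python
import xml.etree.ElementTree as ET

# The gate equation, with the cellml prefix declared locally so that the
# fragment is well formed on its own (the URI is an assumption).
FRAGMENT = """
<math xmlns="http://www.w3.org/1998/Math/MathML"
      xmlns:cellml="http://www.cellml.org/cellml/1.0#">
  <apply><eq />
    <apply><diff />
      <bvar><ci> time </ci></bvar>
      <ci> m </ci>
    </apply>
    <apply><minus />
      <apply><times />
        <ci> alpha_m </ci>
        <apply><minus />
          <cn cellml:units="dimensionless"> 1.0 </cn>
          <ci> m </ci>
        </apply>
      </apply>
      <apply><times />
        <ci> beta_m </ci>
        <ci> m </ci>
      </apply>
    </apply>
  </apply>
</math>
"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


def evaluate(node, values):
    """Evaluate a small subset of MathML content markup."""
    kind = local(node.tag)
    if kind == "ci":                      # identifier: look up its value
        return values[node.text.strip()]
    if kind == "cn":                      # numeric constant
        return float(node.text)
    if kind == "apply":                   # operator followed by operands
        operator, *operands = list(node)
        args = [evaluate(child, values) for child in operands]
        name = local(operator.tag)
        if name == "times":
            result = 1.0
            for arg in args:
                result *= arg
            return result
        if name == "minus":
            return -args[0] if len(args) == 1 else args[0] - args[1]
        raise ValueError("operator not handled in this sketch: " + name)
    raise ValueError("element not handled in this sketch: " + kind)


root = ET.fromstring(FRAGMENT.strip())
eq_operator, lhs, rhs = list(root[0])     # <eq/>, d(m)/d(time), right-hand side

# Arbitrary illustrative values: 2.0 * (1 - 0.5) - 1.0 * 0.5
rate = evaluate(rhs, {"alpha_m": 2.0, "beta_m": 1.0, "m": 0.5})
```

Because the markup records the structure of the expression rather than a procedure, the same tree could equally be translated into source code or rendered as typeset mathematics.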

7. Connection
Connections provide the mechanism for mapping variables declared within one
component to variables in another component, thereby allowing information to be
exchanged between the various components in the network. There will be many such
mappings present in a network. For convenience, they are grouped into sets of
mappings between pairs of components. A set of such variable mappings between two
components constitutes a connection. Each referenced variable must appear in the
appropriate interface of the referenced component. Furthermore, the interface
attributes of each pair of variables must be compatible: an "out" variable in one
component’s interface must map to an "in" variable in the other component’s
interface. A single "out" variable may map to multiple "in" variables in other
components, allowing a variable to fan out to multiple components. Component
ownership of a variable may be traced by following the variable back from "in" to
"out" interfaces defined by the model’s connections.
There can only be one connection between any two components in a network. This
prevents setting up inconsistent, circular or duplicate variable mappings between any
two components in the network. However, it does not prevent a model author from
creating inconsistent mathematical relationships between the variables.
The <connection> element is used to declare a CellML connection. It can only
appear inside a <model> element. The <connection> element that is used to specify
the mappings between variables in the membrane and sodium channel components
of the Hodgkin–Huxley model is shown in the following XML fragment:
<connection>
  <map_components component_1="membrane" component_2="sodium_channel" />
  <map_variables variable_1="V" variable_2="V" />
  <map_variables variable_1="E_R" variable_2="E_R" />
  <map_variables variable_1="i_Na" variable_2="i_Na" />
</connection>
Since only one connection can be created between any two components in
a model, mappings in both directions are stored together within this single
<connection> element. The direction of each mapping is determined by the value
of the public_interface attributes on the two variables: the value is always passed
from the variable with an interface value of "out" to the variable with an interface
value of "in". In this case, the membrane voltage, V, and the equilibrium potential
of the membrane, E_R, are passed from the membrane component to the sodium
channel component, whereas the sodium current, i_Na, is passed in the opposite
direction.
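The way direction follows from the interface values can be sketched as below. The function name mapping_directions and the PUBLIC_INTERFACE table are hypothetical; the table simply records the interface values implied by the directions described above, since the variable declarations of the two components are not repeated in this section.

```python
import xml.etree.ElementTree as ET

CONNECTION = """
<connection>
  <map_components component_1="membrane" component_2="sodium_channel" />
  <map_variables variable_1="V" variable_2="V" />
  <map_variables variable_1="E_R" variable_2="E_R" />
  <map_variables variable_1="i_Na" variable_2="i_Na" />
</connection>
"""

# Assumed public_interface values, chosen to match the directions in the text.
PUBLIC_INTERFACE = {
    ("membrane", "V"): "out",
    ("membrane", "E_R"): "out",
    ("membrane", "i_Na"): "in",
    ("sodium_channel", "V"): "in",
    ("sodium_channel", "E_R"): "in",
    ("sodium_channel", "i_Na"): "out",
}


def mapping_directions(connection_xml, interfaces):
    """Return (source component, variable, target component, variable) tuples."""
    connection = ET.fromstring(connection_xml.strip())
    components = connection.find("map_components")
    first = components.get("component_1")
    second = components.get("component_2")
    flows = []
    for mapping in connection.findall("map_variables"):
        var_1 = mapping.get("variable_1")
        var_2 = mapping.get("variable_2")
        pair = (interfaces[(first, var_1)], interfaces[(second, var_2)])
        if pair == ("out", "in"):
            flows.append((first, var_1, second, var_2))
        elif pair == ("in", "out"):
            flows.append((second, var_2, first, var_1))
        else:
            # Two "in" or two "out" variables cannot be mapped to each other.
            raise ValueError("incompatible interfaces for %s/%s" % (var_1, var_2))
    return flows


flows = mapping_directions(CONNECTION, PUBLIC_INTERFACE)
```

Here flows lists V and E_R as passing from membrane to sodium_channel and i_Na as passing from sodium_channel to membrane, although all three mappings sit in the same <connection> element.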

8. Group
Grouping provides a mechanism for adding structure to a model by defining named
relationships between components. A group has the following properties.
Relationships. A group must define one or more relationships between the com-
ponents referenced by the group.
Components. A group can contain references to any number of components. These
references may be nested, indicating that the group’s relationships may be hierar-
chical.
Two types of grouping relationship are predefined in CellML: encapsulation and
geometric containment. These are indicated by relationship attributes with values
of "encapsulation" and "containment", respectively. Users may also define their
own classes of group, but CellML-compliant processing software is not required to
recognize any groups not belonging to these two predefined classes. A single group
can be of more than one class.
The <group> element is used to declare a CellML group. It can only be used
inside a <model> element. The following XML fragment demonstrates the use of the
<group> element to encapsulate two Hodgkin–Huxley gate components within the
component representing the sodium channel. This group also represents a geometric
containment relationship:

<group>
  <relationship_ref relationship="encapsulation" />
  <relationship_ref relationship="containment" />
  <component_ref component="sodium_channel">
    <component_ref component="hodgkin_huxley_gate_1" />
    <component_ref component="hodgkin_huxley_gate_2" />
  </component_ref>
</group>

This <group> element contains three <component_ref> elements, which are used
to reference (via component attributes) the components involved in the group. The
<component_ref> elements that reference the Hodgkin–Huxley gate components are
defined inside the <component_ref> element that references the sodium channel,
indicating that the gate components are both encapsulated by and physically inside
the sodium channel. This is indicated by the "encapsulation" and "containment"
relationships referenced by the <relationship_ref> elements.
These grouping relationships do not have any mathematical significance, i.e. they
do not affect the equations contained in the grouped components or imply any default
behaviour in the mapping of variables between these components.
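As a rough sketch of how software might read such a group, the nested <component_ref> elements can be flattened into a child-to-parent table together with the list of relationships the group declares. The function name read_group is hypothetical, and the fragment is parsed without namespaces for brevity.

```python
import xml.etree.ElementTree as ET

GROUP = """
<group>
  <relationship_ref relationship="encapsulation" />
  <relationship_ref relationship="containment" />
  <component_ref component="sodium_channel">
    <component_ref component="hodgkin_huxley_gate_1" />
    <component_ref component="hodgkin_huxley_gate_2" />
  </component_ref>
</group>
"""


def read_group(group_xml):
    """Return (relationship names, {child component: parent component})."""
    group = ET.fromstring(group_xml.strip())
    relationships = [ref.get("relationship")
                     for ref in group.findall("relationship_ref")]
    parent_of = {}

    def walk(ref, parent):
        name = ref.get("component")
        if parent is not None:
            parent_of[name] = parent
        # findall() returns direct children only, so nesting depth is preserved.
        for child in ref.findall("component_ref"):
            walk(child, name)

    for top_level in group.findall("component_ref"):
        walk(top_level, None)
    return relationships, parent_of


relationships, parent_of = read_group(GROUP)
```

For the fragment above, both gate components are recorded as children of sodium_channel, and the same table applies to each of the two declared relationships.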

9. Encapsulation
The more important of the two predefined grouping classes is the encapsulation
relationship, which is indicated by a relationship attribute value of "encapsulation".
Encapsulation allows the modeller to hide a group of components from the rest of the
model by using a single component as an interface to the hidden subnetwork.
Encapsulation adds structure to a model by preventing connections between specified sets
of components: encapsulated components (those referenced by the <component_ref>
elements that are defined inside other <component_ref> elements) may only be
connected to the encapsulating component (the component referenced by the parent
<component_ref> element) and to other components encapsulated by the same
dominant component.
The encapsulation functionality requires each variable to have two interfaces.
These, and their interaction, were described in detail in the previous section on
variables.
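The restriction that encapsulation places on connections can be expressed as a simple predicate over a child-to-parent table. This is a sketch only: may_connect is a hypothetical name, and treating all components that have no encapsulating parent as a single sibling set is our assumption.

```python
def may_connect(a, b, parent_of):
    """Return True if components a and b may be connected under encapsulation.

    parent_of maps each encapsulated component to its encapsulating component;
    components absent from the table are taken to be unencapsulated.
    """
    if a == b:
        return False
    parent_a = parent_of.get(a)
    parent_b = parent_of.get(b)
    # An encapsulated component may connect to its encapsulating component.
    if parent_a == b or parent_b == a:
        return True
    # Otherwise both must be encapsulated by the same component
    # (or, by assumption, both be unencapsulated).
    return parent_a == parent_b


ENCAPSULATION = {
    "hodgkin_huxley_gate_1": "sodium_channel",
    "hodgkin_huxley_gate_2": "sodium_channel",
}
```

Under this table a gate may be connected to the sodium channel or to the other gate, but not directly to the membrane component; the membrane reaches the gates only through the sodium channel's interfaces.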

10. Containment
The second predefined grouping class is a geometric relationship known as
containment, which is indicated by a relationship attribute value of "containment". This
relationship is only intended to define the most basic form of rendering information:
namely that the components referenced by nested <component_ref> elements are
physically inside the component referenced by their parent <component_ref>
element.


CellML’s <group> element allows the modeller to specify that one component is
inside another and name this relationship. By specifying numerous geometric rela-
tionships with the same name, the modeller can build up geometric hierarchies. Any
unnamed geometric relationships form a single geometric hierarchy. It is typically this
hierarchy, if present, that a CellML processing application might render by default.
Geometric relationship information is completely independent of encapsulation
information, but CellML processing software is free to check for inconsistencies
between the two relationships; for instance, it would not generally be useful for
an encapsulating parent to be geometrically inside one of its children!
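One consistency check of the kind mentioned above might look as follows. It is a sketch, not a required behaviour of CellML software: the function names are hypothetical, and each hierarchy is assumed to be given as a child-to-parent table.

```python
def ancestors(name, parent_of):
    """List the components that contain `name`, nearest first."""
    found = []
    while name in parent_of:
        name = parent_of[name]
        if name in found:          # guard against a malformed, cyclic table
            break
        found.append(name)
    return found


def inverted_pairs(encapsulation_parent, containment_parent):
    """Find encapsulating parents that are geometrically inside their children."""
    problems = []
    for child, parent in encapsulation_parent.items():
        if child in ancestors(parent, containment_parent):
            problems.append((parent, child))
    return problems


encapsulation = {"hodgkin_huxley_gate_1": "sodium_channel"}
consistent = {"hodgkin_huxley_gate_1": "sodium_channel"}
inverted = {"sodium_channel": "hodgkin_huxley_gate_1"}
```

With the consistent containment table no problems are reported, while the inverted table yields the pair (sodium_channel, hodgkin_huxley_gate_1), which an application could flag as a warning.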

11. Metadata
Metadata are included in CellML to provide context for models and to facilitate
searches of collections of models and model components. They provide a means for
a modeller to include structured descriptive information about their model, which
can help other modellers determine whether they can incorporate the model into
their own work. Metadata defined in the CellML specification include model author,
literature reference, copyright, model creation date, and various elements intended
to place the model into a meaningful biological context.
The syntax developed for embedding metadata within CellML documents is based
on the Resource Description Framework (RDF) (Brickley & Guha 2000) specification
developed and maintained by the W3C. Modellers are free to define additional
metadata within their own RDF schema. However, CellML processing software is not
required to recognize any metadata other than that defined in the CellML
specification.
The following are the basic rules for inclusion of metadata in CellML.
(i) All metadata are optional. A model without any metadata is a valid CellML
model. However, the CellML specification strongly recommends that the mod-
eller provide as much metadata as possible, particularly his/her name and con-
tact information and a reference for a paper that describes the development of
the model.
(ii) All metadata are allowed on any element in a CellML document. Although the
primary intent of the metadata is to provide information about models and
model components, the CellML specification does not limit the metadata to the
model and component elements.
(iii) There can only be one set of metadata on a given element. The CellML meta-
data model allows some metadata elements to repeat, while others can occur
only once. Any given CellML element can have one set of metadata, as defined
by this data model.
(iv) Lack of metadata implies nothing. No inheritance of metadata from parent
elements is implied if an element lacks metadata.

12. Conclusion
As models of biological processes become more sophisticated they become
increasingly difficult to reliably construct, communicate, test and modify. This
problem has both structural and procedural roots. The structures and interactions
characteristic of complicated systems are inherently difficult to understand and
model. There is probably little that can be done to alleviate this problem. In
contrast, resolution of the procedural problems is possible. The traditional
procedure for communicating models by publishing in hard-copy journals and books is
both error prone and inefficient. With digital computation and communication tools
widely available to researchers it makes increasing sense to specify, distribute,
compute and modify models electronically, removing the need for human involvement in
the translation of models from one form to another.
CellML is a language for describing a wide class of biological models in a form that
is readable by both humans and computers. It makes use of existing widely accepted
standards, such as XML, MathML and RDF. Furthermore, CellML is designed to
(i) facilitate automatic translation of models from specification into
computational form;
(ii) promote reuse of existing models via the use of a component-based model
architecture, encapsulation and model namespace constructs; and
(iii) enable automatic error and consistency checking by requiring units
information to be carried by all variables.
It is hoped that the widespread adoption of CellML as a standard vehicle for
specifying and communicating biological models will alleviate some of the procedural
problems described above.
The authors gratefully acknowledge the support of the CellML project by Physiome Sciences
Inc.

References
Ausbrooks, R., Buswell, S., Dalmas, S., Devitt, S., Diaz, A., Hunter, R., Smith, B.,
Soiffer, N., Sutor, R. & Watt, S. 2001 Mathematical markup language (MathML) version 2.0.
W3C proposed recommendation 8 January 2001 (http://www.w3.org/TR/MathML2/PDF-
p-MathML-20001113.pdf).
Bray, T., Hollander, D. & Layman, A. 1999 Namespaces in XML. W3C recommendation, 14
January 1999 (http://www.w3.org/TR/1999/REC-xml-names-19990114).
Bray, T., Paoli, J., Sperberg-McQueen, C. M. & Maler, E. 2000 Extensible markup language
(XML) 1.0 (2nd edn). W3C recommendation, 6 October 2000 (http://www.w3.org/TR/2000/
REC-xml-20001006.pdf).
Brickley, D. & Guha, R. V. 2000 Resource Description Framework (RDF) schema specifica-
tion 1.0. W3C candidate recommendation 27 March 2000 (http://www.w3.org/TR/2000/
CR-rdf-schema-20000327).
Bullivant, D., Hedley, W. J. & Nielsen, P. M. F. 2000 The FieldML website (http://www.
physiome.org.nz/sites/physiome/fieldml/pages/index.html).
BIPM 1998 Bureau International des Poids et Mesures. The international system of units (SI)
(http://www.bipm.fr/pdf/si-brochure.pdf).
BIPM 2000 Bureau International des Poids et Mesures. The international system of units. Sup-
plement 2000: addenda and corrigenda to the 7th edn (1998) (http://www.bipm.fr/pdf/si-
supplement2000.pdf).
ECMA 1999 Standard ECMA-262. ECMAScript language specification (ftp://ftp.ecma.ch/
ecma-st/Ecma-262.pdf).


Hedley, W. J., Nelson, M., Bullivant, D. & Nielsen, P. M. F. 2000 Welcome to cellml.org
(http://www.cellml.org/).
Hodgkin, A. L. & Huxley, A. F. 1952 A quantitative description of membrane current and its
application to conduction and excitation in nerve. J. Physiol. 117, 500–544.
Hucka, M., Finney, A., Sauro, H. & Bolouri, H. 2000 Systems biology markup language (SBML).
Level 1. Structures and facilities for basic model definitions, 21 December 2000 edn (ftp://
ftp.cds.caltech.edu/pub/caltech-erato/sbml-level-1/sbml.pdf).
Murray-Rust, P. & Rzepa, H. S. 1999 Chemical markup, XML, and the World Wide Web. 1.
Basic principles. J. Chem. Inf. Comput. Sci. 39, 928–942.
Rosetta Inpharmatics 2000 Gene Expression Markup Language (GEML™). A common data
format for gene expression data and annotation interchange (http://www.geml.org/docs/
GEML.pdf).
