
Transgenic Research 6, 309–319 (1997)

Rules and guidelines for genetic nomenclature in mice: excerpted version

COMMITTEE ON STANDARDIZED GENETIC NOMENCLATURE FOR MICE
Chairperson: MURIEL T. DAVISSON
The Jackson Laboratory, 600 Main Street, Bar Harbor, Maine 04609, USA

Received 14 April 1997; accepted 18 April 1997

The unique identification of genes and mouse strains is critical to their identification in research and in the
scientific literature. Rules for genetic nomenclature in mice have existed since the 1940s. The latest complete
revision of the rules was approved by the International Committee on Standardized Genetic Nomenclature for Mice
in November, 1993. Minor revisions have occurred since. The complete, current rules are available on-line from
The Jackson Laboratory's Mouse Genome Database (MGD), URL: http://www.informatics.jax.org. A printed
version appeared in Mouse Genome 92 (2), June, 1994, and in Genetic Variants and Strains of the Laboratory
Mouse, 3rd edition (Committee on Standardized Genetic Nomenclature for Mice, 1996a,b,c). The excerpted version
below gives the general guidelines for naming and symbolizing mouse genes, transgenes and transgenic strains,
targeted mutations and DNA markers. More detailed guidelines, including revisions as they are made, may be
found at the MGD Web site given above. Subparagraphs are numbered as in the complete guidelines so that the
user can refer easily from this excerpted version to the full text. Textual references to paragraphs not included here
may be found in the full text as well.

1. General rules for gene nomenclature

Introduction
Gene nomenclature guidelines are based upon the premise that the primary purpose of a gene or locus symbol is
to provide a brief and universally acceptable symbol that uniquely identifies a specific gene or locus; all other
purposes of a symbol are secondary and should not interfere with this primary purpose.

1.1.1. Names of genes or loci
Names of genes and loci should be brief and chosen to convey as accurately as possible the character by which
the gene is usually recognized. Genes are functional units, whereas a locus can be any distinct, recognizable DNA
sequence (see 1.1.3).

1.1.2. Symbols for genes
1. Symbols for genes should typically be two-, three-, or four-letter abbreviations of the name. Hyphens are
used in gene symbols only for clarity, primarily to separate characters that together might be confusing, e.g.:
(1) two numbers that would be in adjacent positions, such as Lamb1-2,
(2) -rs and -ps (related sequence and pseudogene, respectively, see 1.1.4) from gene symbols,
(3) as required, characters for loci in a complex from the complex symbol (see 1.1.12 and 1.2.1),
(4) components of mutant allele symbols, such as Mod1^a-m1Lws (see 1.1.7.7).
2. The total number of characters in a locus symbol should not exceed 10 unless this maximum limit would
cause violation of some other rule.
3. Except in the case of loci first discovered because of a recessive mutation (see Section 1.1.7), the initial
letter of the locus symbol should be capital, and all others lower case.
4. In published articles gene symbols should be set in italics, e.g. dw, dwarf; Hbb, haemoglobin β-chain.
5. Identification of new loci should not be assumed
0962–8819 © 1997 Chapman & Hall
from the discovery of variation, whether morphological, biochemical, quantitative or antigenic.
6. A proposed new symbol must never duplicate one already used for another locus. Date of publication in a
refereed journal, provided the symbol is acceptable, establishes priority. Symbols for new loci can be reserved
through the MGD at The Jackson Laboratory, Bar Harbor, Maine, USA.
7. When a well-known locus has been recognized initially by a mutation and later the structural locus is
identified, the locus is identified by the symbol for the structural locus and the mutant allele symbol is
designated as a superscript to the structural locus symbol, e.g. W, which is a mutation in Kit, becomes Kit^W.
8. Symbols for quantitative trait loci (QTL) genes shall follow the rules for other types of genes (see 1.1.1).
They should be symbolized with 3-4 character symbols that are acronyms of the name and begin with a capital
letter. Those affecting the same complex trait, i.e. in a series like Idd loci, shall be given the same stem symbol
and serially numbered. 'q' may be used as the final letter preceding the serial number but is not required.
9. Expressed sequence tagged (EST) loci, when mapped to chromosomes, may be given either D symbols with a
final 'e' for expressed or the symbol ESTM###, where M identifies mouse as opposed to human EST loci and
### = a serial number assigned from the Mouse Genome Database (MGD).
10. Genes encoded by the opposite (antisense) strand of a known gene shall be given their own symbols.
11. Alternative transcripts from the same gene should not be given different 'locus' symbols.
12. For homologous genes, to facilitate comparative mapping one should use the mouse equivalent of the symbol
already adopted for the conserved gene in other species. For example, the gene symbol for amyloid beta
precursor protein is APP in the human and cow genomes and App in the mouse. If the exact symbol has already
been assigned to another gene in the mouse, the symbol should be modified to one resembling as closely as
possible that used in the other species; e.g. by adding a single letter such as 'h' for homologue. One should also
try not to duplicate a symbol used for a different gene in another species. Do not insert the letter 'm' or 'M' (for
mouse) as the first letter of the symbol for a locus with homologues in other species.

1.1.3. Genes and loci recognized by DNA sequence
A gene, including both transcribed DNA sequences and associated regulatory elements, can span a considerable
length of chromosomal DNA. Sequences at many places in a gene might, therefore, vary from strain to strain of
mouse, and each variation could be any of several kinds: point substitutions, insertions or deletions (e.g. of
retroviral elements), and variations in simple sequence repeat numbers. Each such variant has the potential to
define a distinct genetic locus or DNA segment, given a specific assay and appropriately recombinant mice, and
the full set of variants possessed by any given form of gene or length of chromosomal DNA defines a haplotype.
The term haplotype is also used to define a complement of alleles at multiple loci within a complex (see
Sections 1.2.1 and 1.2.3).
To describe the results of a DNA typing assay unambiguously, investigators should, therefore, give the gene
name and symbol, the assay (e.g. probe or PCR primers, and restriction enzyme if any), and, if a specific DNA
segment within the gene is being assayed, the D locus symbol. An important function of chromosome
committees and of databases is to assemble such data to define haplotypes for genes. To avoid ambiguity, assay
names and probe names should not resemble D symbols; use of the letter 'D' should be reserved for DNA loci.
D symbols are used in two ways:

1. Loci recognized by anonymous DNA probes should be given D symbols. This use of D symbols as gene or
locus symbols should be reserved for anonymous DNA loci distinct from known genes.
2. Intragenic loci may be given D symbols to distinguish individual sites within a gene. Remember that genes
are functional units whereas loci can be any distinct DNA segment, and that several variant D loci may lie
within, and be used to assay, a gene. Variation in DNA sequence of a known gene or tightly linked sequences
should be described using the gene symbol and current rules for nomenclature of genes and alleles. Intragenic
D locus symbols should only be used (a) when describing intragenic mapping analysis or (b) when stating that
a gene was typed using an intragenic D locus (see next paragraph for the most common example of this).

Note: While D symbols could be given to all variants within a gene, the Committee discourages the use of D
symbols for intragenic DNA segments except when a D locus symbol is needed to relate a segment to other
objects within or near the gene. YAC ends may be given D symbols when they are used for genetic mapping.
(Note approved November 1995).
Loci recognized by variation in copy number of mini- or microsatellites should be given D symbols. If such
microsatellite (D symbol) loci are within or very near known genes and, thus, can be used to detect those
genes, then the gene symbol should always be used to refer to the gene, e.g. the gene's location on the
chromosome, and the D symbols should only be used to refer to specific sites within the gene, e.g. to convey
intragenic mapping information. When D symbol loci fall within known genes, on general maps the gene will be
identified by the gene symbol and the locus (D) symbols will appear in locus lists and databases cross-referenced
with the gene. Locus symbols would, of course, be used on fine structure, high resolution maps around and within
genes.
D symbols are composed of four parts:

(1) D for DNA.
(2) 1 ... 19, X and Y for the chromosomal assignment, and 0 for unmapped loci.
(3) A 2–3 letter laboratory registration code indicating the laboratory or scientist describing the locus. The same
symbol should be used for a laboratory's loci, chromosome aberrations, and inbred substrains, e.g. Pas for
Pasteur Institute. A code can be obtained from the central registry maintained by the Institute of Laboratory
Animal Resources (see section on Laboratory Registration Codes below).
(4) A unique serial number. It is preferred that numbers be assigned to loci in the order the loci are described on
each chromosome for a particular laboratory, e.g. D1Pas5, the fifth D locus developed and mapped on
Chromosome 1 at the Pasteur Institute. The use of a number from the probe that detects the locus, e.g.
D17Leh48, a Chromosome 17 D locus identified by probe number Tu48 of Lehrach, is discouraged.

Recognizing that allelic variation detected by DNA probes can be complex, the Committee proposes the
following. In published papers, the allele type of a specific strain should be given by fragment(s) size with a
description of the assay used. When simple allele symbols are required, as for display of linkage data or listing
in databases or locus lists, a single letter abbreviation for the strain should be used (see Alleles, section 1.1.7.9).
Allele symbols should be written as a lowercase superscript when appended to the locus symbol; in tables of
linkage data in publications a single uppercase letter denoting the strain may be used. The assay or restriction
enzyme must be specified for each allele used in publication or entered into a database, because the same two
strains may differ with one assay but be indistinguishable with another. That is, a strain may have [...] of as
designating the haplotype for that gene or segment of DNA (see Section 1.1.3, para. 1 and Section 1.1.7, general
rules for allelic designation).
If an anonymous DNA locus identified only by the D symbol is later identified to be a known locus or the
function of the gene is determined, the D symbol should either be replaced by the known gene's symbol or
changed to a new gene symbol that is an acronym for the new gene's name. If more than one mutation is
identified within the gene, the original D symbol should be retained for the mutated site it originally designated
(see Section 1.1.3, para. 3). If a D locus is shown to be expressed but its function remains unknown, a lower case
'e' for expressed should be added to the D symbol, e.g. D1Pas5e. Newly identified expressed genes for which
the gene product is unknown should also be given D symbols followed by 'e'.
Anonymous DNA loci from the human genome that cross-hybridize with mouse DNA and are mapped to a
mouse chromosome retain their human symbol in all uppercase and the mouse chromosome number and a
capital H, for human, are inserted after the D, e.g. D16H21S56 is a D locus on mouse Chromosome 16 that
cross-hybridizes with the probe for the 56th simple sequence DNA locus from human Chromosome 21. This
same convention may be used for D loci derived from other species. Single-letter abbreviations for other species
will be assigned by the International Nomenclature Committee to assure that each letter unambiguously
indicates the species of origin. If a unique locus in another species identifies more than one mouse locus, the
related sequence convention should be used (see 1.1.4).
Alternative or new loci detected with primers for a known locus belong to the lab discovering them and shall
be symbolized by that lab with its lab code and next serial number on the chromosome. 'Mismapped' loci
should only be resymbolized if it can be proved that the original locus does not exist in the original location.
B1 (and other types of) repeat 'loci' should only be given symbols if there is evidence of a unique locus (e.g.
genetic proof or polymorphism).
Xrf shall be used as the 'lab code' in D symbols that designate cross-referenced genes in mice and yeast or
other species. Symbols will be DChr#Xrf###, where ### is a serial number assigned by the Johns Hopkins
database. Typically, a clone derived from yeast may hit 2 (or more) mouse loci. These will not require the -rs
nomenclature because the clone number in the symbol will identify relatedness between loci on different
chromosomes. A hyphenated serial number shall be used when two loci detected by the same clone are on the
same chromosome.

Chromosomal regions detected cytologically and by RFLPs
At least five chromosomal regions can be detected by a specific cytological staining method that reveals the
whole region or loci within the region can be detected with DNA probes: centromeres, pericentromeric
heterochromatin (C-bands), nucleolus organizer regions (NORs), homogeneously staining regions (HSRs) and
telomeres. Chromosomal nomenclature guidelines should be followed for cytologically detected chromosomal
regions and gene nomenclature for genetic loci. While we recognize the distinction blurs, please try to follow
these guidelines; contact a member of the International Nomenclature Committee if in doubt. Hc# is the symbol
for a
cytologically detected C-band; loci recognized by DNA polymorphisms in heterochromatin should be given D
locus symbols. Centromeres are designated by the symbol Cen when detected cytologically or referred to as a
unit. If it becomes possible and necessary to distinguish loci or segments within the centromere, these should be
given D locus symbols. Polymorphic loci within NORs are symbolized Rnr#, where Rnr = ribosomal RNA and
# = the chromosome on which the locus is located. If more than one Rnr locus on the same chromosome is
distinguished by genetic means, the loci should be serially numbered in order of identification, e.g. Rnr1-1,
Rnr1-2. For homogeneously staining regions, HSR is incorporated into a chromosome aberration symbol (see
chromosome guidelines) or a D symbol is assigned to the locus amplified and Hsr is added when it is amplified,
e.g. D1Lub1Hsr. Related sequence nomenclature (rs#) will be used for telomere sequence loci recognized by
polymorphic variants with telomere sequences. Telomere sequences detected by cytogenetic methods shall be
symbolized Tel# like other chromosomal 'anomalies or variants'.

Transgenes (Section 1.3 in the complete guidelines; MGD; Committee 1996)
All DNA sequences that are experimentally and stably introduced into the germline of animals are considered
transgenes. They are named according to the following conventions, which were developed by an interspecies
committee sponsored by the Institute of Laboratory Animal Resources (ILAR, 1992). Transgenic symbols can be
registered with TBASE at the Human Genome Database (GDB) at Johns Hopkins (URL:
http://www.gdb.org/Dan/tbse/tbase.html).

1. A transgene symbol consists of three parts, all in roman typeface, as shown below:

TgX(YYYYYY)#####Zzz

Where: TgX = mode,
       (YYYYYY) = insert designation,
       ##### = laboratory assigned number and
       Zzz = laboratory registration code

A. The mode designates the transgene and always consists of the letters 'Tg' followed by a letter designating the
mode of insertion of the DNA: H for homologous recombination, R for insertion via infection with a retroviral
vector, and N for nonhomologous insertion. 'Knockout' or directed mutation of a specific known gene should be
designated using standard allele symbol conventions (see below). Transgenic nomenclature is used for
homologous recombination insertions when homologous recombination is used as a mechanism to insert a
transgene and it is the transgene itself that is of primary interest. The purpose of this designation is to enable the
user to identify it as a symbol for a transgene and to distinguish between the three fundamentally different
organizations of the introduced sequence relative to the host genome, not simply to indicate the method of
insertion or nature of the vector. To illustrate these distinctions, examples are given below.

· Mice derived by infection of embryos with MuLV vectors are designated TgR; mice derived by microinjection
of MuLV DNA into zygotes are designated TgN.
· Mice derived from ES cells by introduction of DNA followed by recombination with the homologous genomic
sequence are designated TgH; mice derived by random insertion of the same sequence by nonhomologous
crossing over events are designated TgN.

B. The insert designation is a symbol for the salient features of the transgene, as determined by the investigator.
It is always contained within parentheses and consists of no more than 6 characters: letters (capitals, or capitals
and lower-case letters) or a combination of letters and numbers. Insert designations longer than 6 characters may
be used only if the insert designation and the laboratory assigned number (C. a. below) together are 11 characters
or less. Italics, super- or subscripts, internal spaces, and punctuation should not be used. While the choice of the
insert designation is up to the investigator, the following guidelines should be followed:

a. The insert designation should identify the inserted sequence and indicate important features. Where the
insertion uses sequences from a named gene, it should contain the standard symbol for that gene. Hyphens are
omitted when using hyphenated gene symbols. If the gene symbol exceeds the spaces available, use the
beginning letters of the symbol. For example, Ins1 should be used within the symbols of transgenes containing
either coding or regulatory sequences from the mouse insulin gene (Ins1) as an important part of the insert
designation. Gene symbols are not italicized when incorporated into transgene symbols.
b. Avoid using symbols that are identical to other named genes in the same species. For example, the use of
'Ins' to designate 'insertion' would be incorrect.
c. Ideally two different gene constructs should not be identified by identical insert designations.
d. To aid communication, standard abbreviations can be used as part of the insert designation.

These presently include:

An    anonymous sequence
Gen   genomic
Im    insertional mutation
Nc    noncoding sequence
Rp    reporter sequence
Sn    synthetic sequence
ET    enhancer trap construct
Pt    promoter trap construct

This list will be expanded as needed and maintained by the Nomenclature Committee.

e. The insert designation should identify the inserted sequence, not its location or phenotype.

C. Laboratory Assigned Number and Laboratory Registration Code is a number and letter combination that must
uniquely identify each independently inserted sequence. It is formed of two parts:

a. The Laboratory Assigned Number is a number from 1 to 99 999 that is uniquely assigned by the laboratory to
each stably transmitted insertion. This assignment should be done at the time germline transmission is confirmed.
The number can have some intra-laboratory meaning or simply be a number in a series of transgenes produced by
the laboratory. The same number cannot be used more than once by each lab.
b. The Laboratory Registration Code is uniquely assigned to all laboratories originating transgenic animals,
DNA loci or inbred strains (see Laboratory Registration Code section below).

2. The complete designation identifies the inserted site, and provides a symbol for unique identification. When a
mutation that produces an observable phenotype is caused by the insertion, the locus so identified must be named
according to standard procedures for the species involved. The allele of the locus identified by the insertion can
then be identified by the abbreviated transgene symbol according to the conventions adopted for communication,
and supplies a unique identifier to distinguish it from all other insertions. Each insertion retains the same symbol
even if it is placed on a different genetic background. Specific lines of animals carrying the insertion should be
additionally distinguished by a stock designator preceding the transgene symbol. In general, this designator will
follow the established conventions for the naming of strains or stocks of the particular animal used. In cases
where the background is a mixture of several strains, stocks, or both, the transgene symbol should be used
without a strain or stock name. For rules on how to designate strains derived from such mixed genetic
backgrounds or congenic strains, see the section below on targeted mutations and Sections 3.1.2 and 3.3.

Examples of transgenic strain designations
· C57BL/6J-TgN(CD8GEN)23Jwg: The human CD8 genomic clone inserted into C57BL/6 mice from The
Jackson Laboratory (J). The 23rd mouse screened in a series of microinjections done in the laboratory of Jon W.
Gordon (Jwg).
· Crl:ICR-TgN(SVDhfr)432Jwg: The SV40 early promoter driving a mouse dihydrofolate reductase (Dhfr)
gene. This was a 4 kb plasmid, and this animal was the 32nd animal screened in the laboratory of Jon W. Gordon
(Jwg). The ICR outbred mice were obtained from Charles River Laboratories (Crl).
· TgN(GPDHim)1Bir: The human glycerol phosphate dehydrogenase, which caused an induced mutation (im);
the first transgenic line produced by Birkenmeier.

Examples of insertional mutation designations
· ho^TgN447Jwg: The insertion of a transgene into the hotfoot locus (ho).
· xxx^TgN21Jwg: The insertion of a transgene that leads to a recessive mutation in a previously unidentified
gene. A gene symbol for xxx must be obtained from MGD.

Targeted mutations
Rules for symbolizing targeted mutations are given in Section 1.1.7.7 below. Currently, targeted mutations are
often maintained on a mixed genetic background derived from the embryonic stem (ES) cell line and the host
strain used. Standardized nomenclature for naming such strains or for congenic strains made by backcrossing the
targeted mutation onto a standard inbred background is given in Sections 3.1.2 and 3.3.

1.1.7. Alleles
Alleles are usually designated by the locus symbol with an added superscript (in italics when printed). In
computerized symbols the superscript may be denoted by prefixing an asterisk or enclosing the allele symbol in
angle brackets, e.g. Gpi1^a or Gpi1*a or Gpi1<a>. For D symbol locus alleles see section 1.1.3. When a
spontaneous mutation is cloned or shown to occur in a previously named candidate gene, the mutation's symbol
is changed to become an allele at the cloned locus by turning the mutation symbol into an allele symbol, e.g. the
shi (shiverer) mutation in the Mbp (myelin basic protein) gene becomes Mbp^shi. If the original mutation symbol
already has a superscript, the mutation and allele symbols are placed on one line in the new superscript and
hyphenated, e.g. the shi^mld (myelin deficient) mutation becomes Mbp^shi-mld (see also #2 below).
1. In the case of mutant genes for which there is clearly a wild-type, the symbol for the first discovered
mutant allele becomes both the gene symbol and the symbol for that allele. No superscript is then used, e.g.
Ca, caracul. When a new allele is discovered, it is
symbolized by adding a superscript to the original symbol.
2. Recessive alleles should be indicated by the use of a lower case initial letter for a mutant gene, e.g. a,
non-agouti. All other alleles, whether dominant, codominant or having dominance relationships that vary with
method of assessment, should be indicated by the use of a capital initial letter followed by lower case letters, as
in the locus symbol, e.g. Ta, tabby.
Two exceptions to this rule are allowed for targeted and cloned mutant genes when the original cloned gene
symbol starts with an upper case letter:

1. If the phenotype of mutant alleles may be recessive or codominant depending on the method of
determination, the use of upper or lower case will depend upon what the naming investigator considers the
defining phenotype. For example, a targeted mutant allele of Tcra created by Mombaerts can be symbolized
Tcra^tm1Mom, even though heterozygotes are not visibly different from wild-type mice, if heterozygotes can be
distinguished at the DNA or protein level.
2. When a mutation is shown to occur in a cloned candidate gene and its symbol is changed to become an
allele of the cloned gene (see first paragraph of 1.1.7 above), the first letter of the gene symbol may remain
uppercase and the inheritance pattern may be conveyed in the allele symbol, e.g. the e (recessive yellow) and
E^so (somber) alleles at the Mc1r (melanocortin 1 receptor) gene become Mc1r^e and Mc1r^E-so.
3. Allele superscripts should typically be one or two lower case letters and, if possible, should convey
additional information about the allele, e.g. p^un, the pink-eyed unstable allele of p (pink-eyed dilution). If
information is too complex to be conveyed conveniently in the symbol, the alleles are given single letter
superscripts and the information concerning the allelic properties is shown in catalogs or tables, e.g. Pgm1^a,
Pgm1^b; H2^a, H2^b, etc.
4. For alleles where the allele designates presence or absence of a virus or immune response, the allele for the
presence of the virus or immune response is designated by a superscript 'a' and the allele for absence of the trait
by a superscript 'b'. For loci governing resistance and susceptibility to infectious organisms or other agents,
resistance is designated by a superscript 'r' and susceptibility by a superscript 's'.
5. Wild-type should be designated by a + sign, with the locus symbol as superscript, e.g. +^p. Reversions from
a mutant allele to wild-type should be distinguished from the original wild-type allele by designating them by the
locus symbol, with a + sign as superscript, e.g. p^un+J. A + sign alone may be used when the context leaves no
doubt as to the locus represented, e.g. in genetic formulae.
6. Indistinguishable alleles of independent origin (e.g. reoccurrences, reversions to wild-type) should be
designated by the existing gene symbol with a series symbol (see below) appended as a superscript in italics. If
the gene symbol already has a superscript, this should be separated from the series symbol by a hyphen. The
series symbol should consist of an arabic numeral corresponding to the serial number of the variant in any given
laboratory, plus the laboratory registration code. To avoid the confusion of the numeral 1 and the letter l, a
first-discovered variant may be left unnumbered, and the second variant numbered 2. When two named mutant
genes are found to be alleles at the same locus, the symbol published or assigned first remains the locus symbol
and the symbol of the second gene is superscripted as an allele symbol for that mutation, e.g. hr^rh, the rhino
allele at hairless.
7. Mutations or other variations occurring in known alleles may be denoted by a superscript m followed by an
appropriate series symbol (as above) and separated from the original allele symbol, if one exists, by a hyphen,
e.g. Mod1^a-m1Lws, the first mutant allele of Mod1^a found by Lewis. For known deletions of all or part of an
allele, the superscript m may be replaced with dl. Information on the allele of origin of mutations may be
valuable in elucidating changes in DNA sequence. Mutant alleles created by targeted mutagenesis should have a
t preceding the m to denote targeted, e.g. Cftr^tm1Unc, a targeted mutation of the cystic fibrosis transmembrane
regulator gene created at the University of North Carolina.
8. Mutant alleles that turn out subsequently to be deletions retain their allelic designation, e.g. Ta^25H and
the various c-locus deletions retain their original symbols even though they are now known to be deletions that
encompass more than the Ta or c loci. If the deletion deletes more than one gene and is cytologically visible,
the deletion should be given a chromosome anomaly designation containing the original allele designation and
the allele symbol is used as the abbreviation. For example, Del(10)Sl12 1H is the first deletion discovered at
Harwell; it is located on Chromosome 10 and was originally detected as the 12th steel allele at Harwell
(Sl^12H). Once referred to in a publication by the full designation, it may be abbreviated Sl^12H. Information on
the genes deleted becomes part of the description. If the deletion deletes more than one gene, but is not
cytologically detectable, the above nomenclature is discouraged, although a cytological designation may be
given in the future if improved techniques reveal the deletion cytologically. The term 'cytological' refers to
cases where the deletion can be detected by simple staining methods and visual examination of chromosomes.
For a deletion of multiple genes detected only by methods such as in situ hybridization of gene-specific
probes, it is left to the investigator to determine the most useful terminology.

9. To display polymorphic data from a multipoint cross in a table for publications, it is acceptable to use single letter abbreviations for the strains involved in the cross to designate the strain origin of the alleles. A footnote to the table should point out that these designate strain of origin rather than allele symbols, e.g. B for C57BL/6 vs S for Mus spretus. In databases where single letters are needed to make comparisons among strains, the letter used will refer to the 'haplotype' or constellation of different variants revealed by different assays that make up the phenotype of a particular locus for a particular strain. The first strain in which the 'phenotype' is described is the prototype and determines the allele symbol. If two strains thought to have the same allele are distinguished from each other by a different assay, one of them is given a new simple allele designation. Other strains with the original allele designation retain the original until they are typed with the new assay to determine their appropriate allele designation.

1.1.8. Lethals

Appropriate locus symbols for recessive lethals with no known heterozygous effect and unidentified function consist of a lower case letter l followed by the number of the chromosome on which the locus is located in parentheses, and a series symbol indicating the serial number of the lethal in the laboratory of origin, e.g. l(17)2Pas, the second lethal on Chr. 17 found at the Institut Pasteur.

1.1.9. Viruses

Nomenclature for genes related to the expression of viral antigens, or to sensitivity or resistance to viruses, should follow the standard rules for gene nomenclature. Where possible and appropriate, the letters of the symbol should be those by which the virus is usually known; e.g. Mtv1, a locus concerned in induction of mammary tumour virus, MTV. Successive loci concerned with the same virus should be distinguished by appending serial numbers; e.g. Fv1, Fv2.

1.1.10. Oncogenes

Nomenclature for mouse cellular oncogene sequences should follow the standard nomenclature for oncogenes. When referring specifically to the mouse locus, however, in lists of symbols and maps, the prefix c- denoting cellular sequence should be omitted and the initial letter of the symbol should be capitalized; e.g. c-myc becomes Myc. The names and symbols of oncogenes should be regarded as provisional until the true functions of the genes become known, when they should be renamed, e.g. Erbb became epidermal growth factor receptor, Egfr.

1.1.11. Phenotype symbols

Phenotype symbols, where these are necessary (e.g. antigen loci, enzyme loci), should be the same as genotype symbols except that symbols for phenotypes should be in capitals, not italicized, and with superscripts lowered to the line. The phenotypes of heterozygotes should be written as in the following example: GPI1A, GPI1B, and GPI1AB are phenotypes associated with the Gpi1 locus a and b alleles.

1.1.12. Gene complexes

Gene complexes are considered to exist when a number of apparently functionally or evolutionarily related loci are genetically closely linked. Alternative states of complexes are referred to as haplotypes rather than alleles. Known complexes are of two main types: (a) less extensive complexes involving duplicate loci or in which operators or cis-acting regulators of structural genes for protein show little or no recombination with the loci on which they act; and (b) very extensive complexes, possibly involving hundreds of related loci, for which special rules may be necessary. The H2 and the immunoglobulin complexes are in category (b). The complete nomenclature rules contain guidelines for less extensive complexes involving operators, cis-acting regulators, or duplicate loci, and for very extensive complexes with special rules, including the H2 complex (see 1.2.3), immunoglobulin complexes (see 1.2.5), globin gene complexes (see 1.2.4), homeobox-containing gene complexes (see 1.2.6), and the t-complex.

1.1.13. Mitochondrial genome

Loci in the mitochondrial genome should be denoted by the prefix mt, set off from the main symbol by a hyphen.

1.1.14. Antigenic variants

Symbols adopted for loci concerned in cell-membrane alloantigens should be based on the method of demonstrating such loci. Brief examples of the different types are listed below with reference to the sections in which detailed rules, if they exist, can be found.

1. Loci primarily demonstrable by transplantation techniques should be designated by an initial H; e.g. H1, H2, etc. (see 1.2.3).
2. Loci demonstrable by red-cell agglutination should be designated by the letters Ea; e.g. Ea1, Ea2.
3. Loci coding for a cell surface molecule on lymphocytes, or shared by lymphocytes and other cell types, and detected by serological or biochemical methods, and for which there is a demonstrable polymorphism, should be designated by the letters Cd, if the CD antigen is known, and Ly, if not.
4. Similarly, other antigen loci involving other cell types should be denoted by symbols indicating the cell type, e.g. Pca, plasma cell antigen; Tla, thymus leukaemia antigen.

The complete text of the nomenclature guidelines (http://www.informatics.jax.org; Committee, 1996) also contains detailed rules for naming and symbolizing pseudogenes, related sequence loci, loci which are members of a series or family, special gene complexes, chromosome anomalies and banding patterns, and various types of mouse strains.

Laboratory Registration Codes

Laboratory Registration Codes originated as a means of identifying substrains of mice, rats and rabbits held in different institutions or by different investigators. They have subsequently come to be used in symbols for transgenes, DNA loci, targeted mutations, and chromosome anomalies. Codes are typically a three-letter acronym for an institution or an investigator. Only the first letter of the laboratory code is capitalized. Unique codes are assigned from a central registry maintained by the Institute of Laboratory Animal Resources (ILAR) in Washington, DC, USA. For a current list of approved lab codes or to register a new lab code, use the World Wide Web and go to URL address http://www2.nas.edu/labcode/.

3. Rules for nomenclature for inbred strains and outbred stocks

3.1.2. Symbols for inbred strains

Inbred strains shall be designated by a capital letter or letters in roman type or, less preferably, by a combination of capital letters and numbers, beginning with a letter. Anyone naming a new strain should consult the most recent listing of inbred strains (published annually in Mouse Genome) to avoid duplication. Brief symbols are preferred. New inbred strain symbols should be registered as soon as possible after inbreeding is completed. In some cases an appropriate designation may be reserved prior to the completion of inbreeding. The contact person for strain registration is given in current issues of Mouse Genome.

Exceptions to the symbol guidelines are allowed in the case of stocks already widely used and known by designations that do not conform. For example, see Section 3.1.6. Any strains with a common origin separated before F20 shall be regarded as related inbred strains and be given symbols that indicate relationship, e.g. NZB, NZC, NZO.

Inbred targeted mutation strains derived from only two strains (including the ES cell line as one) but not exactly recombinant inbred strains (see below) may be designated using abbreviations for the two strains separated by a comma, e.g. B6,129-gene symbol.

3.1.8. Use of the Laboratory Registration Code to designate colony holders

A particular colony is normally indicated by appending a Laboratory Registration Code to the end of the strain/substrain symbol preceded by an '@' sign, e.g. SJL@J, the colony of strain SJL bred at The Jackson Laboratory. Note that at present there are no recognized substrains of SJL. Should another substrain be identified in the future at a different laboratory, The Jackson Laboratory colony would then be designated SJL/J@J, or optionally, it could be designated in this way at any time in anticipation of the development of such substrains. Other examples: C3H/He@N, the NIH colony of strain C3H, substrain He, bred at the NIH; CBA/Ca-se@J, the Ca substrain of CBA, carrying the se mutation, and bred at The Jackson Laboratory; C57BL/6J@Arc, the J substrain of substrain 6 of C57BL bred at Arc. If the substrain symbol and laboratory code are the same, as in SJL/J@J, for simplicity the @J may be dropped. In all cases the laboratory code is the last symbol used, and is meant to indicate that the environmental conditions and previous history of this colony are unique. The @ symbol usually signifies a strain or substrain is held in a different colony from the original but no genetic differences from the original substrain have been identified and, therefore, it is not considered a true substrain.

Note that many breeders maintain different colonies of the same strain and substrain in different animal rooms, buildings, or even at different sites, but only have a single laboratory registration code. It is up to the individual organization to decide whether it would be appropriate to have a separate laboratory registration code for each site. As far as possible, in publications the exact origin of the animals used should be stated.

One significance of manipulative processes such as fostering is that they may add or eliminate vertically transmitted viruses. If the identity of an added or removed virus is known, or if a virus is added artificially, this should be indicated by appending a symbol for the virus, in capital letters, followed by a + or - sign according to whether it is present or absent, e.g. C3H/He-MTV+@N, a strain produced by inoculating
purified MTV into young hysterectomy-derived C3H/He mice bred at the NIH.

Because the strain name cannot hope to reflect the full previous history of a strain, breeders are advised to keep detailed records of the history of the strain, including details of any manipulation such as cross fostering, freeze preservation, hysterectomy re-derivation etc. in case such information may be relevant to the interpretation of observed results.

Note that previous rules included symbols for various manipulative procedures as part of the strain nomenclature (e.g. e = embryo transfer, f = foster nursing, h = hand rearing, o = ovary transplant, p = freeze preservation, fh = fostered on hand-reared). These symbols are now discontinued except where they form part of a well recognized strain name (e.g. C3HeB/De, a substrain of C3H formed by transfer of fertilized ova to strain C57BL).

Laboratory Registration Codes should not be accumulated except when long-term maintenance of the colony results in the establishment of a new substrain according to current rules (see Section 3.1.5).

3.2. Recombinant inbred and recombinant congenic strains

1. Recombinant inbred (RI) strains are formed by crossing two inbred strains, followed by 20 or more generations of b × s mating (Bailey, 1971). The names of such strains should consist of an abbreviation of both parental strain names separated by a capital X (the letter X) with no intervening spaces, e.g. CXB, a set of recombinant inbred strains derived from a cross of BALB/c × C57BL. Different RI strains in the same series should be distinguished by numbers (e.g. BXD2, BXD3, etc.) except for strains already well known by a different designation. If the abbreviation of the male parental strain ends in a number, a hyphen may be used to separate the series numbers that distinguish strains, e.g. CXB6-2. Punctuation in strain symbols, however, is to be avoided.
2. Recombinant congenic strains. Strains formed by crossing two inbred strains, followed by a few (usually two) backcrosses to one of the parental strains (the 'background' strain), with subsequent inbreeding without selection for specific markers, are known as recombinant congenic (RC) strains (Demant and Hart, 1986). RC strains should be regarded as fully inbred when the theoretical coefficient of inbreeding approximates that of a standard inbred strain. For this purpose, one generation of backcrossing will be regarded as being equivalent to two generations of brother × sister mating. Thus a strain produced by two backcrosses (N3, equivalent to F6) followed by 14 generations of b × s mating (F14) would be fully inbred. RC strains are designated by an upper case abbreviation of the names of the two parental strains (with the recipient strain given first, followed by the donor strain) separated by a lower case 'c' (e.g. CcS, a set of recombinant congenic strains from a cross between BALB/c and STS, backcrossed to BALB/c). Individual strains of the series are distinguished by appending numbers to the strain symbols (e.g. CcS1). Avoid hyphenated symbols unless the second strain ends in a number. The number of generations of backcrossing should normally form part of the history of each individual strain, but may be indicated in parentheses where more than two backcross generations were used (e.g. CcS1(N4)).

3.3. Co-isogenic, congenic and segregating inbred strains

Two strains that are genetically identical (i.e. isogenic), except for a difference at a single locus, are called co-isogenic. True co-isogenicity can probably be achieved only by mutation within an existing inbred strain, while lines obtained by inbreeding with forced heterozygosis (segregating inbred), or by backcrossing to an inbred strain (congenic), usually differ in a short chromosomal segment rather than a single gene.

1. Co-isogenic strains are formed by the occurrence of a mutation within an inbred strain. They should be designated by the strain symbol and, where appropriate, the substrain symbol followed by a hyphen and the gene symbol of the differential allele (in italics in printed articles, e.g. DBA/Ha-d+). When the mutant gene is maintained in the heterozygous condition, this should be indicated by including a '+' sign in the symbol: e.g. A/Fa-+/c; C3H/N-+/Wv. Such strains differ from the substrains described in Sections 3.1.5 and 3.1.6 in that in co-isogenic strains the genetic difference is simple and precisely defined. The number of generations of inbreeding since the mutation arose in a co-isogenic strain should be indicated by intercalating a +, or + M +, in the indication of inbreeding, e.g. F110 + F23 or F110 + M + F23, 23 generations of b × s matings since the occurrence of a mutation at F110 in an inbred strain.
2. Congenic strains are produced by repeated backcrosses to an inbred strain. They should be designated by a compound symbol consisting of two parts separated by a period: the full or abbreviated symbol of the background strain followed by (i) a period and an abbreviated symbol of the donor strain, and (ii) a hyphen and the symbol of the differential locus or loci and allele(s) (in italics), e.g. B10.129-H-12b, a strain with the genetic background of C57BL/10Sn (= B10), but which differs from that strain in a differential allele (H-12b) derived from strain 129/J. If several lines derived from the same donor strain are available, the individual lines are distinguished by a number and/or letter in parentheses;
e.g. B10.129(12M)-H-12b, B10.129(21M)-H-4b p. In cases where the use of such a full symbol is inappropriate, as when the donor strain is not inbred, or the genetic difference is complex, or where the strain is already widely known, a less complete symbol may be used, e.g. B10-pa H-3e at; B10.129(5M); A.SW. Congenic lines that differ at a histocompatibility locus and, therefore, resist each other's grafts are called congenic resistant (CR) lines.

A strain developed by this method shall be regarded as congenic when a minimum of 10 backcross generations to the background strain have been made, counting the first hybrid or F1 generation as generation 1. The number of backcross generations shall be indicated by N and the number in parentheses. In the case where it is necessary to employ more complex mating systems, the generations should be expressed as N equivalents (NE) and the strain regarded as congenic at a minimum of NE10. For example, when backcrossing a recessive gene onto an inbred background, after 10 rounds of backcrossing and intercrossing (to recover a homozygote for the next backcross), the strain would be at NE10. When a congenic strain is maintained by b × s matings after backcrossing, the number of b × s generations follows the number of backcross generations, e.g. (N10F6), 10 generations of backcrossing followed by 6 generations of b × s inbreeding; (NE12F17), a complex system of backcrosses and intercrosses genetically equivalent to 12 backcrosses, followed by 17 generations of b × s matings.
3. Segregating inbred strains are developed by inbreeding with forced heterozygosis. They shall be designated, like co-isogenic strains, by a strain symbol followed by a hyphen and the gene symbol of the segregating locus. Indication of the segregating locus is optional when it is part of the standard genotype of that strain, and the hyphen or entire gene symbol may be omitted; examples are 129 or 129 cch/c (129 is customary); WB-W/+ or WB. The minimal inbreeding requirement for such a strain shall be the same as that for an inbred strain, i.e. 20 generations of b × s matings. For segregating inbred strains developed by inbreeding with forced heterozygosis, the number of generations of such breeding may be indicated by FH and the number in parentheses, e.g. (FH27), a strain developed by 27 generations of b × s matings with forced heterozygosis.
4. Consomic strains are produced by repeated backcrossing of a whole chromosome such as the X or Y chromosome onto an inbred strain. The generic designation for consomic strains is HOST STRAIN-CHROMOSOME DONOR STRAIN, e.g. C57BL/6J-YAKR is a consomic strain with the Y-chromosome of strain AKR backcrossed onto C57BL/6J. When the chromosome comes from a non-inbred background, an appropriate abbreviation designating the stock or origin may be used, e.g. C57BL/6J-YDOM has an M. m. domesticus Y-chromosome backcrossed to C57BL/6J. In these cases, no period followed by the donor strain is required because the strain of origin is shown in the superscript. Capitalization of all letters in the superscript distinguishes it from allele symbols, which are lower case. As with congenic strains, a minimum of 10 backcross generations is required, counting the F1 generation as generation 1.
5. Conplastic strains are developed by backcrossing the nuclear genome from one strain into the cytoplasm of another, i.e. the mitochondrial parent is always the female parent during the backcrossing program. The designation is NUCLEAR GENOME-mtCYTOPLASMIC GENOME, e.g. C57BL/6J-mtBALB/c is a strain with the nuclear genome of C57BL/6J and the cytoplasmic genome of BALB/c, developed by crossing male C57BL/6J mice with BALB/c females, followed by repeated backcrossing of female offspring to male C57BL/6J. As with congenic strains, a minimum of 10 backcross generations is required, counting the F1 generation as generation 1.

3.4. Hybrids

F1 hybrids are designated by listing the female progenitor first and the male progenitor second. The symbol can be abbreviated using standard strain abbreviations. Thus, B6D2F1 mice are the offspring of a C57BL/6J female mated to a DBA/2J male; D2B6F1 mice are offspring of the reciprocal mating. The two F1 hybrids differ in the Y-chromosome present in the males, and have been exposed to a different maternal environment. They should not be considered genetically identical. The correct full strain symbol should be given the first time the hybrid is mentioned in a publication; the abbreviated symbol can be used subsequently. Hybrids from backcrosses and three- or four-way crosses can be designated in the same way, that is, by giving the designation of the female parent first and the designation of the male parent second. The hybrid parent in such a cross is enclosed in parentheses followed by the generation number, e.g. B6(D2AKRF1). The construction of these hybrids should be described in full with complete strain designations before using the abbreviated symbol.

3.5. Laboratory mice and outbred stocks

In addition to standard inbred strains, there is a need for standard ways to refer to laboratory mice in general and to outbred stocks.

1. Laboratory mice. Since laboratory strains are neither pure Mus domesticus nor Mus musculus, they should be referred to with the words 'laboratory mice' or by the inbred strain name when known.
2. Non-inbred stocks. Outbred or random bred stocks are sometimes given specific designations if they meet
specific criteria (ICLA, 1972). Stock designations must not be the same as those for inbred strains of the same species. New stock symbols should be registered as for inbred strains (see 3.1.2 and 3.1.7).

Summary

The rules and guidelines for genetic nomenclature in mice are intended to ensure the accurate and unique identification of genes, genetically manipulated mice and strains. The Nomenclature Committee strongly urges the scientific community to follow the rules as closely as possible. The rapid advances in genetic technology and molecular characterization of genes, however, are continually presenting the Committee with situations that the rules do not quite fit. Thus, the guidelines are intended to provide guidance for naming and symbolizing genetic elements without being absolutely rigid. For example, despite several attempts over the past 15 years to develop a standard nomenclature for transgenes, nomenclature used by investigators still is not consistent across transgenic methods and models. Most recently, a committee formed by the Institute of Laboratory Animal Resources drew up guidelines, soliciting input from many members of the scientific community who were currently making transgenic animals (ILAR, 1992). The nomenclature proposed by this committee was an attempt to strike a balance between the long and often complicated symbols some investigators wanted to use and the simple three- or four-letter symbols others favoured. Once a complete symbol is used in a paper and a shorter, more common abbreviation is introduced, the abbreviation can be used throughout the rest of the paper. The most important principle to keep in mind when symbolizing genes, transgenes, targeted mutations, or strains and stocks is that the purpose of the symbol is to provide a unique identifier or name and not to convey all the information known about the genetic element being symbolized. When in doubt about a symbol, we encourage you to contact a member of the Nomenclature Committee (see MGD, http://www.informatics.jax.org) or the MGD nomenclature coordinator.

References

Bailey, D.W. (1971) Recombinant inbred strains, an aid to finding identity, linkage, and function of histocompatibility and other genes. Transplantation 11, 325–7.
Committee on Standardized Genetic Nomenclature for Mice (1996a) Rules and guidelines for gene nomenclature. In: Lyon, M.F., Rastan, S. and Searle, A.G. eds. Genetic Variants and Strains of the Laboratory Mouse, 3rd Ed. Oxford, UK: Oxford University Press, pp. 1–16.
Committee on Standardized Genetic Nomenclature for Mice (1996b) Rules for nomenclature of chromosome anomalies. In: Lyon, M.F., Rastan, S. and Searle, A.G. eds. Genetic Variants and Strains of the Laboratory Mouse, 3rd Ed. Oxford, UK: Oxford University Press, pp. 1443–5.
Committee on Standardized Genetic Nomenclature for Mice (1996c) Rules for nomenclature of inbred strains. In: Lyon, M.F., Rastan, S. and Searle, A.G. eds. Genetic Variants and Strains of the Laboratory Mouse, 3rd Ed. Oxford, UK: Oxford University Press, pp. 1532–6.
Demant, P. and Hart, A.A.M. (1986) Recombinant congenic strains - a new tool for analyzing genetic traits determined by more than one gene. Immunogenetics 24, 416–22.
ICLA (International Committee on Laboratory Animals) (1972) International standardized nomenclature for outbred stocks of laboratory animals. ICLA Bull. 30, 4–17.
ILAR (Institute of Laboratory Animal Resources), Committee on Transgenic Nomenclature (1992) Standardized nomenclature for transgenic animals. ILAR News 34, 45–52.
MGD (Mouse Genome Database), Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/).

Additional Bibliography

Beechey, C.V. (1992) Maps of chromosome anomalies in the mouse. Mouse Genome 90, 45–65.
Evans, E.P. (1989) Standard normal chromosomes. In: Lyon, M.F., Rastan, S. and Searle, A.G. eds. Genetic Variants and Strains of the Laboratory Mouse, 3rd Ed. Oxford, UK: Oxford University Press, pp. 1446–1451.
IUPAC-IUB Commission on Biochemical Nomenclature (CBN) (1978) Nomenclature of multiple forms of enzymes. Recommendations (1976). Arch. Biochem. Biophys. 185, 1–3.
Klein, J. and others (1990) Revised nomenclature of mouse H-2 genes. Immunogenetics 32, 147–149.
Lyon, M.F., Barker, J.E. and Popp, R.A. (1988) Mouse globin gene nomenclature. J. Hered. 79, 93–5.
Morse, H.C. (1992) Genetic nomenclature for loci controlling surface antigens of mouse hemopoietic cells. J. Immunol. 149, 3129–34.
Nesbitt, M.N. and Francke, U. (1973) A system of nomenclature for band patterns of mouse chromosomes. Chromosoma 41, 145–58.
Sawyer, J.R., Moore, M.M. and Hozier, J.C. (1987) High resolution G-banded chromosomes of the mouse. Chromosoma 95, 350–8.
Scott, M.P. (1992) Vertebrate homeobox gene nomenclature. Cell 71, 551–3.
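
Worked illustration (not part of the Committee's rules)

The generation arithmetic given for recombinant congenic strains in Section 3.2 can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names and the constant are hypothetical, and the conversion follows the worked example in the text (N3 is treated as equivalent to F6, so each N generation, counting the F1 as generation 1, is taken as two generations of b × s mating). The threshold of 20 is the b × s requirement stated for standard inbred strains.

```python
# Hypothetical helper names; only the arithmetic comes from Section 3.2.
FULLY_INBRED_F = 20  # b x s generations required of a standard inbred strain


def equivalent_f_generations(n_generation: int, f_generations: int) -> int:
    """Return the b x s equivalent of a breeding history written N<n>F<f>.

    Assumption: as in the worked example of Section 3.2 (N3 equivalent to
    F6), each N generation counts as two b x s generations, with the F1
    counted as generation N1.
    """
    if n_generation < 1 or f_generations < 0:
        raise ValueError("N must be >= 1 and F must be >= 0")
    return 2 * n_generation + f_generations


def is_fully_inbred(n_generation: int, f_generations: int) -> bool:
    """True when the equivalent reaches the 20-generation threshold."""
    total = equivalent_f_generations(n_generation, f_generations)
    return total >= FULLY_INBRED_F


if __name__ == "__main__":
    # The example from the text: N3 followed by F14.
    print(equivalent_f_generations(3, 14), is_fully_inbred(3, 14))
```

For the example in the text, N3 followed by F14 gives an equivalent of 20 and is therefore regarded as fully inbred; one generation fewer of b × s mating (N3F13) would fall short of the threshold.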
