Gorjanc G. 2010. Graphical Model Representation of Pedigree Based Mixed Model. Contribution to the 32nd International Conference on Information Technology Interfaces (http://iti.srce.hr/).

Graphical Model Representation of Pedigree Based Mixed Model
Gregor Gorjanc
University of Ljubljana, Biotechnical Faculty, Department of Animal Science, Groblje 3, 1230 Domžale, Slovenia. E-mail: gregor.gorjanc@bf.uni-lj.si
 
Abstract. The pedigree based mixed model is a simple yet robust and powerful model frequently used in animal and plant breeding, evolutionary biology, and human genetics. In the Bayesian setting, inference of all model parameters can be performed with the use of well known McMC methods. Algorithms are commonly formulated with matrices, which provides a generic view but hinders interpretation. Here, a generic graphical model representation is developed. This eases the interpretation of the model and of the algorithms used. In addition, the graphical model formulation provides a way to fit the pedigree based mixed model in standard graphical model programs, such as BUGS.
Keywords. pedigree, animal model, graphical model, McMC
1. Introduction
Several statistical models are used in the field of genetics. One of them is the pedigree based mixed model of Henderson [3], which is a "modern formulation" of Fisher's infinitesimal model [1]. In this model, phenotypic values and pedigree information are used to learn about (infer) environmental and genetic parameters. The pedigree based mixed model is in principle a simplistic genetic model, as the only genetic information comes from the expected relationships implied by the known pedigree of individuals; the true genetic state is in practice seldom known and therefore seldom used. However, the pedigree based mixed model is very robust and has proved to be very powerful and useful in various areas. It is frequently used in animal and plant genetics for research and selective breeding, in evolutionary biology for the quantification of evolutionary forces, and in human genetics for the initial screening of the genetic aetiology of diseases. The use of the pedigree based mixed model is probably most widespread in the field of animal breeding, where the model is commonly called the "animal model", as the performance of individual animals is being modelled. A detailed description of the pedigree based mixed model is beyond the scope of this work; the interested reader can find additional details and literature pointers in [3, 8, 9].

The aim of this work is to present a graphical model representation of the pedigree based mixed model, which eases interpretation and enables fitting in graphical model programs such as BUGS [7]. Only the additive part of the genetic effect will be treated.
2. Model
The data in Table 1 serve as an example. The example is small, but shows all the peculiarities that are important for a standard animal model. The pedigree (genealogy) contains ten individuals, among whom some have both parents unknown, one parent unknown, or both parents known. Some individuals have no, one, or more descendants. It is assumed that the pedigree is correct. Some individuals have an observed phenotypic value (individual 2 has two repeated records), which could for example represent a measure of some physical trait on an individual. It is assumed here that the phenotype measure is unitless. In addition to the phenotypic value, a group membership is known, and it is supposed that group membership has an effect on the phenotype. Examples of group membership are gender, breed/race, etc.

The model for the example data can be written in matrix form as (1):
$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a} + \mathbf{e}$, (1)

where $\mathbf{y}$ is a vector of phenotypes, $\mathbf{b}$ is a vector of so called fixed effects (group in the presented example), $\mathbf{a}$ is a vector of additive genetic values, $\mathbf{X}$ and $\mathbf{Z}$ are design matrices, while $\mathbf{e}$ is a vector of residuals. The standard model assumptions are (2-4):
$\mathbf{y} \mid \mathbf{b}, \mathbf{a}, \sigma^2_e \sim N(\mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a}, \mathbf{I}\sigma^2_e)$, (2)

$\mathbf{b} \sim \text{constant}$, (3)

$\mathbf{a} \mid \mathbf{A}, \sigma^2_a \sim N(\mathbf{0}, \mathbf{A}\sigma^2_a)$, (4)

where $\sigma^2_e$ is the residual variance, $\sigma^2_a$ is the additive genetic variance, and $\mathbf{A}$ is the numerator relationship matrix constructed from the pedigree [3, 8]. The matrix $\mathbf{A}$ introduces genetic information into the model (the relationships among pedigree members) and accounts for selection, drift, non-random mating, and inbreeding in the population [5].
Table 1. Example data

Id  Father  Mother  Group  Phenotype
 1  /       /       /      /
 2  /       /       1      103
 2  /       /       1      106
 3  2       1       1      98
 4  2       /       2      101
 5  4       3       2      106
 6  2       3       2      93
 7  5       6       /      /
 8  5       6       /      /
 9  /       /       /      /
10  8       9       1      109
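As an aside that is not part of the original contribution, the tabular (recursive) rule behind $\mathbf{A}$ in (4) can be sketched in a few lines of NumPy for the pedigree in Table 1. The function name and data layout are mine; unknown parents are coded as 0, and individuals are assumed to be ordered so that parents precede their offspring, as they are in Table 1.

import numpy as np

# Pedigree from Table 1: id -> (father, mother); 0 marks an unknown parent.
pedigree = {1: (0, 0), 2: (0, 0), 3: (2, 1), 4: (2, 0), 5: (4, 3),
            6: (2, 3), 7: (5, 6), 8: (5, 6), 9: (0, 0), 10: (8, 9)}

def numerator_relationship_matrix(ped):
    """Tabular method: fill A row by row, parents before offspring."""
    ids = sorted(ped)
    idx = {ind: k for k, ind in enumerate(ids)}   # id -> 0-based position
    A = np.zeros((len(ids), len(ids)))
    for i, ind in enumerate(ids):
        f, m = ped[ind]
        fi, mi = idx.get(f), idx.get(m)           # None if the parent is unknown
        for j in range(i):                        # relationship with older individuals
            afj = A[j, fi] if fi is not None else 0.0
            amj = A[j, mi] if mi is not None else 0.0
            A[i, j] = A[j, i] = 0.5 * (afj + amj)
        # diagonal: 1 + inbreeding, i.e. half the relationship between the parents
        A[i, i] = 1.0 + (0.5 * A[fi, mi] if fi is not None and mi is not None else 0.0)
    return A

A = numerator_relationship_matrix(pedigree)
print(np.round(A, 3))

In practice this dense matrix is never formed explicitly; Section 3 describes how its inverse is built directly from the pedigree.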
3. Inference
The only known components of model (1) are the phenotypic values ($\mathbf{y}$), while the other quantities are unknown and treated as parameters: $\mathbf{b}$, $\mathbf{a}$, $\sigma^2_a$, and $\sigma^2_e$. Knowledge of these parameters can be inferred from the collected phenotypic values and the postulated model via Bayes theorem. For the moment, it will be assumed that $\sigma^2_a$ and $\sigma^2_e$ are known. The posterior distribution for the $p$ location parameters $\boldsymbol{\theta} = (\mathbf{b}', \mathbf{a}')'$ is then (5):

$p(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{A}, \sigma^2_e, \sigma^2_a) \propto p(\mathbf{y} \mid \mathbf{b}, \mathbf{a}, \sigma^2_e) \times p(\mathbf{a} \mid \mathbf{A}, \sigma^2_a)$. (5)

This posterior distribution (5) is of a multivariate normal form (6):

$p(\boldsymbol{\theta} \mid \mathbf{y}, \mathbf{A}, \sigma^2_e, \sigma^2_a) \sim N(\hat{\boldsymbol{\theta}}, \mathbf{C}^{-1}\sigma^2_e)$, (6)

where $\hat{\boldsymbol{\theta}}$ and $\mathbf{C}$ are the solution and the left hand side of the so called Mixed Model Equations (MME, 7), respectively:

$\begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \mathbf{G}^{-1} \end{bmatrix} \begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{a}} \end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{Z}'\mathbf{y} \end{bmatrix}$, (7)

where $\mathbf{G}^{-1} = \mathbf{A}^{-1}\alpha$ and $\alpha = \sigma^2_e / \sigma^2_a$ [3].
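To make (7) concrete, the following NumPy sketch (again not from the paper, and continuing the sketch in Section 2) builds $\mathbf{X}$, $\mathbf{Z}$ and the MME for the example data and solves them. The variance ratio $\alpha = 1$ is an arbitrary illustrative value, not one used by the author.

# Records from Table 1: (individual, group, phenotype); individual 2 has two records.
records = [(2, 1, 103), (2, 1, 106), (3, 1, 98), (4, 2, 101),
           (5, 2, 106), (6, 2, 93), (10, 1, 109)]
n_ind, n_grp = 10, 2

y = np.array([ph for _, _, ph in records], dtype=float)
X = np.zeros((len(records), n_grp))               # incidence of group (fixed) effects b
Z = np.zeros((len(records), n_ind))               # links records to breeding values a
for k, (ind, grp, _) in enumerate(records):
    X[k, grp - 1] = 1.0
    Z[k, ind - 1] = 1.0

alpha = 1.0                                       # sigma2_e / sigma2_a, illustrative only
Ainv = np.linalg.inv(A)                           # A from the tabular sketch above

C = np.block([[X.T @ X, X.T @ Z],                 # left hand side of the MME (7)
              [Z.T @ X, Z.T @ Z + Ainv * alpha]])
r = np.concatenate([X.T @ y, Z.T @ y])            # right hand side of the MME (7)
theta_hat = np.linalg.solve(C, r)                 # (b_hat, a_hat)
print(np.round(theta_hat, 2))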
The inverse of $\mathbf{A}$ needed in (5) can be obtained directly from the pedigree list using the algorithm of Henderson [2] and Quaas [11]. They derived the algorithm upon the fact that the breeding values of all pedigree members can be represented as a linear combination (8-9):

$a_i = \tfrac{1}{2}\left(a_{f(i)} + a_{m(i)}\right) + w_i$, (8)

$\mathbf{a} = \mathbf{T}\mathbf{w}$, (9)

where $f(i)$ and $m(i)$ give the index of the father and the mother of individual $i$, respectively, the matrix $\mathbf{T}$ traces the flow of genes, and $\mathbf{w}$ is commonly referred to as Mendelian sampling – a deviation (residual) from the average (expected) additive genetic value of the parents [5]. The covariance matrix of (9) and its inverse are then:
$Var(\mathbf{a}) = \mathbf{T}\mathbf{W}\mathbf{T}'\sigma^2_a = \mathbf{A}\sigma^2_a$, (10)

$\mathbf{A}^{-1} = \left(\mathbf{T}^{-1}\right)'\mathbf{W}^{-1}\mathbf{T}^{-1}$, (11)

where $\mathbf{W}$ is a diagonal matrix with known values from the theory [2, 11], while $\mathbf{T}^{-1}$ can be set up directly from the pedigree (all values are zero except for ones on the diagonal and $-0.5$ at the positions of the parents), leading to an efficient algorithm [2, 11]. The left-hand side of the MME for the example data set (Table 1) is shown in Table 2 at the end of this contribution.
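A sketch of this construction (my own illustration, continuing the NumPy examples above) builds $\mathbf{T}^{-1}$ and the Mendelian sampling variances on the diagonal of $\mathbf{W}$ directly from the pedigree and checks (11) against the tabular $\mathbf{A}$; parental inbreeding coefficients are read off the diagonal of that matrix.

ids = sorted(pedigree)
idx = {ind: k for k, ind in enumerate(ids)}
F = np.diag(A) - 1.0                              # inbreeding coefficients F_i = A_ii - 1

Tinv = np.eye(len(ids))                           # T^-1: ones on the diagonal ...
w_var = np.ones(len(ids))                         # Mendelian sampling variances (diag of W)
for ind in ids:
    i = idx[ind]
    for p in pedigree[ind]:                       # father and mother (0 = unknown)
        if p in idx:
            Tinv[i, idx[p]] = -0.5                # ... and -0.5 in the parents' columns
            w_var[i] -= 0.25 * (1.0 + F[idx[p]])  # subtract 0.25*(1 + F_parent) per known parent

Ainv_direct = Tinv.T @ np.diag(1.0 / w_var) @ Tinv    # equation (11)
print(np.allclose(Ainv_direct, np.linalg.inv(A)))     # True for the Table 1 pedigree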
The evaluation of (6) is demanding for large systems, particularly due to $\mathbf{C}^{-1}$. To alleviate this issue, Markov chain Monte Carlo (McMC) techniques can be used instead. Samples from (6) can for example be obtained with a single component McMC (Gibbs) sampler [9] using (12):
$p(\theta_i \mid \mathbf{y}, \mathbf{A}, \sigma^2_e, \sigma^2_a, \boldsymbol{\theta}_{-i}) \sim N\left(\hat{\theta}_i, \mathbf{C}^{-1}_{i,i}\sigma^2_e\right)$, (12)

where $\boldsymbol{\theta}_{-i}$ represents all the elements of $\boldsymbol{\theta}$ except the $i$-th one. The solution $\hat{\theta}_i$ can be obtained from (13):

$\hat{\theta}_i = \left(r_i - \mathbf{C}_{i,-i}\boldsymbol{\theta}_{-i}\right) / \mathbf{C}_{i,i}$, (13)

where $r_i$ is the $i$-th component of the right hand side in (7).
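As an illustration only (continuing the sketches above, with $\sigma^2_e$ fixed at an arbitrary value), a single-site Gibbs sampler over the location parameters can be written directly from (12) and (13):

rng = np.random.default_rng(0)
sigma2_e = 1.0                                    # assumed known here, illustrative value
theta = np.zeros(C.shape[0])                      # starting values for (b, a)
samples = []

for it in range(2000):
    for i in range(len(theta)):                   # update each element in turn, eq. (12)-(13)
        cond_mean = (r[i] - C[i] @ theta + C[i, i] * theta[i]) / C[i, i]
        theta[i] = rng.normal(cond_mean, np.sqrt(sigma2_e / C[i, i]))
    samples.append(theta.copy())

post_mean = np.mean(samples[500:], axis=0)        # discard burn-in
print(np.round(post_mean, 2))                     # close to the MME solution theta_hat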
The construction of this sampler follows from the properties of the normal distribution and a linear algebra view of the problem [9]. This treatment provides a generic view of the Gibbs sampler for location parameters in the animal model, but somewhat hinders interpretation as all components are represented with matrices. In addition, specialized programs are needed to accommodate the structure of the MME, especially due to the pedigree prior information – the relationships among pedigree members (4). When the variance components are not known, the sampling algorithm (12) can be extended. First, the model (2-4) needs to be extended with priors for the variance components: $p(\sigma^2_e)$ and $p(\sigma^2_a)$. Then the full conditional distributions for the variance components are:
$p(\sigma^2_e \mid \text{else}) \propto p(\mathbf{y} \mid \mathbf{b}, \mathbf{a}, \sigma^2_e)\, p(\sigma^2_e)$, (14)

$p(\sigma^2_a \mid \text{else}) \propto p(\mathbf{a} \mid \mathbf{A}, \sigma^2_a)\, p(\sigma^2_a)$. (15)

Details are not shown, but the above expressions involve the quadratic forms $\mathbf{e}'\mathbf{e}$ in (14) and $\mathbf{a}'\mathbf{A}^{-1}\mathbf{a}$ in (15). The latter can be efficiently evaluated using (9) and (11), as shown in (16). This means that once $\mathbf{w}$ and $\mathbf{W}^{-1}$ are available, any program can be used to sample from (14) and (15), because $\mathbf{W}^{-1}$ is a diagonal matrix with fixed known values following the theory [2, 11]. Additional details about (14) and (15) and the whole McMC implementation for pedigree based mixed models are meticulously described in [9].

$\mathbf{a}'\mathbf{A}^{-1}\mathbf{a} = \mathbf{w}'\mathbf{W}^{-1}\mathbf{w}$. (16)
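Continuing the sketches above, the identity (16) is easy to verify numerically for any vector of breeding values; this is my illustration, not code from the paper:

a_vec = theta_hat[n_grp:]                         # any vector of breeding values will do
q_direct = a_vec @ Ainv @ a_vec                   # a' A^-1 a, as needed in (15)
w_dev = Tinv @ a_vec                              # Mendelian sampling deviations, eq. (9)
q_via_16 = w_dev @ (w_dev / w_var)                # w' W^-1 w with diagonal W
print(np.isclose(q_direct, q_via_16))             # True: equation (16)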
The use of McMC methods inherently leads to the monitoring of the sampling process by evaluating the burn-in period, mixing, and convergence to the stationary distribution. Mixing is often unsatisfactory in pedigree based mixed models, especially if the models are large and highly parameterized, as is often the case in animal breeding [e.g. 9]. An alternative to sampling based methods are variational methods [e.g. 13]. This class of methods gives an approximate solution in considerably shorter time. The representation of variational methods is naturally coupled with graphical models, which are the subject of the next section.
4. Graphical model view
The model (1-4) can be expressed in a series of equations (17):

$y_{2,1} = b_1 + a_2 + e_{2,1}$,
$y_{2,2} = b_1 + a_2 + e_{2,2}$,
$\vdots$
$y_{10} = b_1 + a_{10} + e_{10}$, (17)
$a_1 = w_1$,
$a_2 = w_2$,
$\vdots$
$a_{10} = \tfrac{1}{2}\left(a_8 + a_9\right) + w_{10}$,
which naturally lead to an intuitive graph representation, where variables are represented by nodes and functional dependencies by links between nodes. A formalism for such a representation is provided by graphical models [e.g. 4, 6].

Graphical models can be seen as a marriage between probability theory and graph theory [4]. One of the most popular representations of graphical models is the Directed Acyclic Graph (DAG), also called a Bayesian or belief network and an influence or causal diagram. The DAG for model (1) of the example data (Table 1) is shown in Fig. 1, where the nodes represent the model variables $(\mathbf{y}, \mathbf{b}, \mathbf{a})$, while directed arcs represent the postulated causal structure of the model. Variance components are assumed known, and their nodes with the associated arcs (for each pair of $y_i$ and $\sigma^2_e$, and of $a_i$ and $\sigma^2_a$) are omitted from the graph to avoid clutter. There is also no distinction about which variables are observed and which are not.

The joint probability distribution (model) associated with a DAG can be constructed as a product of a series of conditional dependence statements (18):
$p(\mathbf{z}) = \prod_i p\left(z_i \mid \mathrm{parents}(z_i)\right)$, (18)

where $z_i$ denotes a node (variable) in the graph and $\mathrm{parents}(z_i)$ its parent nodes.
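To make (18) concrete for the example, the joint density of the pedigree based mixed model can be evaluated node by node, with each variable conditioned only on its parents in the DAG. The sketch below is my illustration, continuing the earlier ones; the variance components and the values of $\mathbf{b}$ and $\mathbf{a}$ are arbitrary placeholders, and SciPy is used only for the normal log density.

from scipy.stats import norm

sigma2_a, sigma2_e = 1.0, 1.0                     # illustrative values
b_val = np.zeros(n_grp)                           # arbitrary values of the group effects
a_val = np.zeros(n_ind)                           # arbitrary values of the breeding values

log_joint = 0.0
for ind in sorted(pedigree):                      # p(a_i | parents(a_i)), cf. eq. (8) and (17)
    f, m = pedigree[ind]
    mean = 0.5 * (a_val[f - 1] if f else 0.0) + 0.5 * (a_val[m - 1] if m else 0.0)
    log_joint += norm.logpdf(a_val[ind - 1], mean, np.sqrt(w_var[idx[ind]] * sigma2_a))
for ind, grp, ph in records:                      # p(y | b, a), one term per record
    log_joint += norm.logpdf(ph, b_val[grp - 1] + a_val[ind - 1], np.sqrt(sigma2_e))
print(round(log_joint, 3))

This per-node specification, built from the same conditional statements as (17) and (18), is the form that graphical model programs such as BUGS work with.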
