Introduction to Animal Breeding with Examples of (Non-)Gaussian Traits

Gregor Gorjanc
University of Ljubljana, Biotechnical Faculty, Department of Animal Science, Slovenia

INLA for Animal Breeders “Project”, Trondheim, Norway, 30th August 2010

Thank you for the invitation to NTNU!!! My department ...

Table of Contents

1. Animal breeding crash course

2. Categorical trait example

3. Survival analysis example

1. Animal Breeding Crash Course

Introduction
Animal breeding = mixture(animal science, genetics, statistics, ...)
Many species (cattle, chicken, pig, sheep, goat, horse, dog, salmon, shrimp, honeybee, ...)
Many (complex) traits:
production (milk, meat, eggs, ...)
reproduction (no. of offspring, insemination success, ...)
conformation (body height, width, ...)
health & longevity
...

Genetic evaluation - to enhance selective breeding

Selective Breeding
Measure phenotype in candidates and select those with the most favourable values (= “mass” selection)
Selected candidates breed the next (better) generation

..., but the phenotype is not transmitted to the next generation (genes are)

Decomposition of Phenotypic Value
Genotype and Environment -> Phenotype

$P = G + E + G \times E$

Genetic evaluation = inference of genotypic value given the data and postulated model (= “BLUP” selection)
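A small simulation makes the point about mass selection concrete. It is a sketch under assumed values (the variances, the number of candidates and the 10% selection fraction are hypothetical, and the G × E term is left out): candidates are selected on phenotype P = G + E, and only part of their phenotypic superiority is genetic.

```python
import numpy as np

rng = np.random.default_rng(2010)
n = 100_000                # number of candidates (hypothetical)
var_g, var_e = 0.3, 0.7    # assumed genetic and environmental variances

g = rng.normal(0.0, np.sqrt(var_g), n)  # genotypic values G
e = rng.normal(0.0, np.sqrt(var_e), n)  # environmental effects E
p = g + e                               # phenotypes P = G + E (no G x E term)

# "Mass" selection: keep the 10% of candidates with the highest phenotype
selected = p >= np.quantile(p, 0.90)

s_p = p[selected].mean() - p.mean()  # phenotypic superiority of the selected
s_g = g[selected].mean() - g.mean()  # genetic superiority of the selected

print(f"phenotypic superiority: {s_p:.3f}")
print(f"genetic superiority:    {s_g:.3f}")
# Under these assumptions the ratio should be close to var_g / (var_g + var_e) = 0.3
print(f"ratio: {s_g / s_p:.3f}")
```

Only the genetic part of the superiority is passed on, which is why genetic evaluation targets G rather than P.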

Postulated Model and Data
Postulated model $P = G + E + G \times E$, with $G = A + D + I + \ldots$
A - additive (breeding) value
D - dominance
I - epistasis

Data
phenotypes on various relatives (pedigree):
own performance test
progeny test
(half-)sib test
...

recently also genotype marker data

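Evaluation via Pedigree-based Mixed Models
Not so standard example - “maternal animal model”

$$
\begin{aligned}
y \mid b, c, a_d, a_m, R &\sim N(Xb + Z_c c + Z_{a_d} a_d + Z_{a_m} a_m,\ R), & R &= I\sigma^2_e \\
b &\sim \text{const.} \\
c \mid C &\sim N(0, C), & C &= I\sigma^2_c \\
a = (a_d^T, a_m^T)^T \mid G &\sim N(0, G), & G &= G_0 \otimes A, \quad
G_0 = \begin{bmatrix} \sigma^2_{a_d} & \sigma_{a_d,a_m} \\ \text{sym.} & \sigma^2_{a_m} \end{bmatrix}
\end{aligned}
$$

data: y (phenotypes), X, Z_* (“covariates”), A (pedigree)
parameters: b, c, a (means); $\sigma^2_c, \sigma^2_{a_d}, \sigma_{a_d,a_m}, \sigma^2_{a_m}, \sigma^2_e$ (variances)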
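The covariance structure $G = G_0 \otimes A$ can be sketched numerically. The example assumes a hypothetical four-animal pedigree (parents coded as 0-based indices, -1 = unknown, parents listed before offspring) and made-up values in $G_0$; it builds A with the tabular method and draws $a = (a_d^T, a_m^T)^T$ from $N(0, G_0 \otimes A)$ through a Cholesky factor.

```python
import numpy as np

def additive_relationship(sire, dam):
    """Numerator relationship matrix A by the tabular method.

    Parents are 0-based indices (-1 = unknown); parents must precede offspring.
    """
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        for j in range(i):
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + F_i, with inbreeding F_i = A[s, d] / 2
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A

# Hypothetical pedigree: 0 and 1 are founders, 2 = 0 x 1, 3 = 2 x 1 (inbred)
sire = [-1, -1, 0, 2]
dam = [-1, -1, 1, 1]
A = additive_relationship(sire, dam)

# Hypothetical G0: direct and maternal variances with a negative covariance
G0 = np.array([[0.10, -0.03],
               [-0.03, 0.05]])
G = np.kron(G0, A)  # covariance of a = (a_d^T, a_m^T)^T

rng = np.random.default_rng(1)
L = np.linalg.cholesky(G)
draws = rng.standard_normal((100_000, G.shape[0])) @ L.T  # each row is one draw of a
a_d, a_m = draws[:, :len(sire)], draws[:, len(sire):]
```

The off-diagonal blocks of G are $\sigma_{a_d,a_m} A$, so relatives share both direct and maternal effects, and the two effects are correlated through $G_0$.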

Inference (for Gaussian models)
“Standard”
means - solve the Mixed Model (Normal) Equations (MME), Henderson (1949+)
SE of means (needed for accuracies) - inversion of the LHS or some approximation
variances - maximize the Restricted Likelihood (REML), Patterson & Thompson (1971)

“Powerful/Popular/Fancy/...” - McMC

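MME

$$
\text{LHS} =
\begin{bmatrix}
X^T R^{-1} X & X^T R^{-1} Z_c & X^T R^{-1} Z_a \\
\text{sym.} & Z_c^T R^{-1} Z_c + C^{-1} & Z_c^T R^{-1} Z_a \\
\text{sym.} & \text{sym.} & Z_a^T R^{-1} Z_a + G_0^{-1} \otimes A^{-1}
\end{bmatrix},
\qquad
\text{RHS} =
\begin{bmatrix} X^T R^{-1} y \\ Z_c^T R^{-1} y \\ Z_a^T R^{-1} y \end{bmatrix},
\qquad
\text{LHS} \begin{bmatrix} \hat{b} \\ \hat{c} \\ \hat{a} \end{bmatrix} = \text{RHS}
$$

with $Z_a = [Z_{a_d} \; Z_{a_m}]$ and $a = (a_d^T, a_m^T)^T$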
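Graphical Model View of Pedigree Model

$$
A^{-1} = (T^{-1})^T W^{-1} T^{-1} = (I - \tfrac{1}{2}P)^T W^{-1} (I - \tfrac{1}{2}P), \qquad
W_{i,i} = 1 - \tfrac{1}{4}(1 + F_{f(i)}) - \tfrac{1}{4}(1 + F_{m(i)})
$$

P - parent incidence matrix (1 for each known parent), F - inbreeding coefficient; the term of an unknown parent is dropped

Each animal $a_i$ has its father $a_{f(i)}$ and mother $a_{m(i)}$ as graphical parents (edge weights 1/2), $i = 1 : n_I$:

$$
a_i \mid a_{f(i)}, a_{m(i)} \sim N\left(\tfrac{1}{2} a_{f(i)} + \tfrac{1}{2} a_{m(i)},\ W_{i,i}\,\sigma^2_a\right)
$$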
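The factorization can be checked numerically. A minimal sketch, assuming a hypothetical pedigree with one inbred animal (4) and one animal with an unknown sire (5): $A^{-1}$ is assembled from P and W, with the inbreeding coefficients F taken here from a dense A built by the tabular method, and then compared with a direct inverse of A.

```python
import numpy as np

def additive_relationship(sire, dam):
    """Tabular method; parents are 0-based indices, -1 = unknown, parents first."""
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        for j in range(i):
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A

def a_inverse(sire, dam, F):
    """A^{-1} = (I - P/2)^T W^{-1} (I - P/2) from the pedigree and inbreeding F."""
    n = len(sire)
    P = np.zeros((n, n))   # parent incidence: P[i, p] = 1 if p is a parent of i
    w = np.ones(n)         # W_ii (Mendelian sampling variance in units of sigma_a^2)
    for i in range(n):
        for p in (sire[i], dam[i]):
            if p >= 0:     # an unknown parent contributes nothing
                P[i, p] = 1.0
                w[i] -= 0.25 * (1.0 + F[p])
    T_inv = np.eye(n) - 0.5 * P
    return T_inv.T @ np.diag(1.0 / w) @ T_inv

# Hypothetical pedigree: 4 = 2 x 3 (full sibs, inbred); 5 has unknown sire, dam 4
sire = [-1, -1, 0, 0, 2, -1]
dam = [-1, -1, 1, 1, 3, 4]
A = additive_relationship(sire, dam)
F = np.diag(A) - 1.0
Ainv = a_inverse(sire, dam, F)
```

A has a non-zero entry for animals 0 and 5 (a great-grandparent and its descendant), while $A^{-1}$ has a zero there: only parent-offspring pairs and mates are linked, which is the sparsity the graphical view exploits.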

Genetic Groups
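Different means in founders (usually due to different origin) = sort of hierarchical centering for the pedigree model

$a \mid a_0, G \sim N(Q a_0, G)$, $a_0 \sim \text{const.}$ (Q - fractions of genetic group origin of each animal)

... after some “massage” (i - individuals, g - genetic groups; b and c blocks as before):

$$
\text{LHS} =
\begin{bmatrix}
\cdots & \cdots & \cdots & 0 \\
 & \cdots & \cdots & 0 \\
 & & Z_a^T R^{-1} Z_a + G_0^{-1} \otimes A^{-1}_{i,i} & G_0^{-1} \otimes A^{-1}_{i,g} \\
\text{sym.} & & & G_0^{-1} \otimes A^{-1}_{g,g}
\end{bmatrix}
$$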

Genetic Groups - Graphical Model View
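Unknown (phantom) parents are represented with (few!) genetic group(s) - “graphical parent(s)”
Algorithm to set up $A^{-1}$ directly available!!!
Hierarchical prior can be put on genetic groups for stability/shrinkage

Graph: the parents of $a_i$ are $a_{f(i)}$ and $a_{m(i)}$ (edge weights 1/2); an unknown parent is replaced by the genetic group $a_{0,g(i)}$; $a_i$ has conditional variance $W_{i,i}\sigma^2_a$, $i = 1 : n_I$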
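A sketch of such a direct algorithm, assuming Henderson-type rules in which every phantom parent is replaced by its genetic group (pedigree and group assignments are hypothetical). For each animal, the Mendelian sampling term $m_i = a_i - \tfrac{1}{2}(\text{parent}_1 + \text{parent}_2)$ contributes $(1, -\tfrac{1}{2}, -\tfrac{1}{2})$ outer products scaled by $1/W_{i,i}$. The result is checked against the matrix form $[A^{-1}, -A^{-1}Q;\ -Q^T A^{-1}, Q^T A^{-1} Q]$.

```python
import numpy as np

def additive_relationship(sire, dam):
    """Tabular method; parents are 0-based indices, -1 = unknown, parents first."""
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        for j in range(i):
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A

def a_inverse_with_groups(sire, dam, sire_grp, dam_grp, n_groups, F):
    """Extended A^{-1}: animals are rows 0..n-1, groups are rows n..n+n_groups-1."""
    n = len(sire)
    M = np.zeros((n + n_groups, n + n_groups))
    for i in range(n):
        idx, w = [i], 1.0
        for p, grp in ((sire[i], sire_grp[i]), (dam[i], dam_grp[i])):
            if p >= 0:
                idx.append(p)
                w -= 0.25 * (1.0 + F[p])
            else:
                idx.append(n + grp)   # phantom parent -> its genetic group
        coef = (1.0, -0.5, -0.5)      # m_i = a_i - parent_1 / 2 - parent_2 / 2
        for r, cr in zip(idx, coef):  # explicit loops so repeated indices accumulate
            for c, cc in zip(idx, coef):
                M[r, c] += cr * cc / w
    return M

# Hypothetical pedigree; groups 0 and 1 stand for the unknown parents of founders
sire = [-1, -1, -1, 0, 3]
dam = [-1, -1, -1, 1, 2]
sire_grp = [0, 1, 1, -1, -1]
dam_grp = [0, 1, 0, -1, -1]
n, n_groups = len(sire), 2

A = additive_relationship(sire, dam)
F = np.diag(A) - 1.0
M = a_inverse_with_groups(sire, dam, sire_grp, dam_grp, n_groups, F)

# Matrix form for the check: Q = (I - P/2)^{-1} P_g, P_g[i, g] = 1/2 per phantom parent
P = np.zeros((n, n))
P_g = np.zeros((n, n_groups))
for i in range(n):
    for p, grp in ((sire[i], sire_grp[i]), (dam[i], dam_grp[i])):
        if p >= 0:
            P[i, p] = 1.0
        else:
            P_g[i, grp] += 0.5
Q = np.linalg.solve(np.eye(n) - 0.5 * P, P_g)   # group fractions of each animal
Ainv = np.linalg.inv(A)
expected = np.block([[Ainv, -Ainv @ Q],
                     [-Q.T @ Ainv, Q.T @ Ainv @ Q]])
```

Each row of Q sums to one, since every animal's origin is fully traced back to the groups.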

Multi-trait = multi-variate
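$$
\begin{aligned}
y &= (y_1^T, y_2^T)^T, \quad X = \ldots \\
y \mid \ldots &\sim N(Xb + Z_c c + Z_{a_d} a_d + Z_{a_m} a_m,\ R), & R &= R_0 \otimes I, & R_0 &= \begin{bmatrix} \sigma^2_{e_1} & \sigma_{e_1,e_2} \\ \text{sym.} & \sigma^2_{e_2} \end{bmatrix} \\
c \mid C &\sim N(0, C), & C &= C_0 \otimes I, & C_0 &= \begin{bmatrix} \sigma^2_{c_1} & \sigma_{c_1,c_2} \\ \text{sym.} & \sigma^2_{c_2} \end{bmatrix} \\
a_d, a_m \mid G &\sim N(0, G), & G &= G_0 \otimes A, & G_0 &= \begin{bmatrix}
\sigma^2_{a_{d_1}} & \sigma_{a_{d_1},a_{d_2}} & \sigma_{a_{d_1},a_{m_1}} & \sigma_{a_{d_1},a_{m_2}} \\
 & \sigma^2_{a_{d_2}} & \sigma_{a_{d_2},a_{m_1}} & \sigma_{a_{d_2},a_{m_2}} \\
 & & \sigma^2_{a_{m_1}} & \sigma_{a_{m_1},a_{m_2}} \\
\text{sym.} & & & \sigma^2_{a_{m_2}}
\end{bmatrix}
\end{aligned}
$$

there are now 16 variance components!!!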
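The count of 16 follows from the number of distinct entries in a symmetric k × k covariance matrix, k(k + 1)/2. A short check, assuming two traits as above:

```python
def n_cov_params(k: int) -> int:
    """Number of distinct (co)variances in a symmetric k x k covariance matrix."""
    return k * (k + 1) // 2

n_traits = 2
# R0 and C0 are n_traits x n_traits; G0 covers the direct and maternal effect
# of every trait, so it is (2 * n_traits) x (2 * n_traits)
n_r0 = n_cov_params(n_traits)
n_c0 = n_cov_params(n_traits)
n_g0 = n_cov_params(2 * n_traits)
total = n_r0 + n_c0 + n_g0
print(n_r0, n_c0, n_g0, total)  # 3 3 10 16

# Single-trait maternal animal model for comparison:
# sigma2_e, sigma2_c and the 2 x 2 G0 -> 1 + 1 + 3 = 5
single = 2 * n_cov_params(1) + n_cov_params(2)
```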

Non-Gaussian Traits
Categorical (health status, calving ease score, . . . )
threshold model = (ordered) probit model, cumulative link model, ...
multinomial categories mostly treated separately as binary traits

Counts (no. of offspring, . . . )
Poisson, but rarely used - replacements: threshold and/or Gaussian model

Time (longevity)
survival (Weibull & Cox) models

Mixtures
Gaussian components
zero-inflated (no. of black spots in sheep skin -> wool; cure model - bivariate threshold model)

2. Categorical Trait Example (Calving ease score)

Calving Ease Score
Of great economic importance!!! We cannot measure calving difficulty -> subjective score:
1 = no problem
2 = easy
3 = difficult
4 = mechanical help or caesarean

Reasons for difficult calving?
sex (male calves are bigger)
number of calves - data usually omitted
parity (more problems with the 1st calving)
age (especially in the 1st parity; younger cows have more problems)
season?
environment (= herd, herd-year)
...

Calving Ease Score II
Reasons for difficult calving - genetics?
morphology of calf
“direct” genetic effect or “sire/bull” effect
genes expressed in calf
“origin” of genes - father and mother of a calf

morphology of the cow's pelvic area
“maternal” genetic effect
genes expressed in cow
“origin” of genes - father and mother of a cow

Negative genetic correlation
larger animals (↑direct effect -> bad) have larger pelvic area (↓maternal effect -> good)

Parity specific genetic effects - 1st vs. 2nd+

Threshold Model
(Wright, . . . , Gianola & Foulley, Sorensen, . . . )

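Liability $l$ with mean $\mu_i$ and thresholds $t$:

$$
l \mid b, c, a_d, a_m, R \sim N(Xb + Z_c c + Z_{a_d} a_d + Z_{a_m} a_m,\ R)
$$

$$
\Pr(y_i = k \mid \mu_i, t) = \Pr(t_{k-1} < l_i < t_k \mid \mu_i, t)
= \Phi\left(\frac{t_k - \mu_i}{\sigma}\right) - \Phi\left(\frac{t_{k-1} - \mu_i}{\sigma}\right)
$$

...
Model σ as well to improve model fit? $\log(\sigma) = \ldots$
Methods: approx. EM-REML, Laplace approx., McMC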
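Numerically, the category probabilities are differences of normal CDFs at the thresholds. A minimal sketch with hypothetical thresholds and a hypothetical liability mean $\mu_i$ (σ = 1 fixes the scale, as usual in a probit model); the second call shows how a larger σ, as in a model for log(σ), moves probability to the extreme scores.

```python
import numpy as np
from scipy.stats import norm

def category_probs(mu, thresholds, sigma=1.0):
    """Pr(y = k) = Phi((t_k - mu)/sigma) - Phi((t_{k-1} - mu)/sigma), k = 1..K.

    thresholds holds the K-1 finite cut points t_1 < ... < t_{K-1};
    t_0 = -inf and t_K = +inf are added here.
    """
    t = np.concatenate(([-np.inf], np.asarray(thresholds, dtype=float), [np.inf]))
    return np.diff(norm.cdf((t - mu) / sigma))

# Hypothetical cut points for the four calving ease scores
thresholds = [0.0, 1.5, 3.0]
p = category_probs(mu=0.2, thresholds=thresholds)
# Larger residual SD, e.g. log(sigma) = 0.3
p_wide = category_probs(mu=0.2, thresholds=thresholds, sigma=np.exp(0.3))
print(np.round(p, 3), np.round(p_wide, 3))
```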

Approximate (Gaussian) Model - Example
(joint work with Marija Špehar - Croatia)

Dataset: ~150k phenotypes, ~200k animals, 10 dataset samples
Homogenization of variance by region and period of recording - scale problems?
Bivariate (1st & 2nd+ parity) maternal animal model with heterogeneous (by sex within parity class) residual variance
18 variance components - estimated with the VCE-6 program:
herd-year interaction (3) -> better with autoregressive prior? $\sigma^2_{h_1}, \sigma^2_{h_{2+}}, \sigma_{h_1,h_{2+}}$
permanent effect of a cow (repeated records) (1): $\sigma^2_{c_{2+}}$
direct & maternal genetic effect (10): $\sigma^2_{a_{d_1}}, \sigma^2_{a_{d_{2+}}}, \sigma_{a_{d_1},a_{d_{2+}}}, \ldots, \sigma^2_{a_{m_{2+}}}$
residual (4): $\sigma^2_{e_{m_1}}, \sigma^2_{e_{f_1}}, \sigma^2_{e_{m_{2+}}}, \sigma^2_{e_{f_{2+}}}$

Approximate (Gaussian) Model - Example
Residual variances: $\sigma^2_{e_{m_1}} = 0.295$, $\sigma^2_{e_{f_1}} = 0.204$, $\sigma^2_{e_{m_{2+}}} = 0.228$, $\sigma^2_{e_{f_{2+}}} = 0.162$

Ratios and correlations (1st vs. 2nd+)
columns: 1st | 2nd+ | Corr.
Herd-year, Direct: 27.545 4.548 24.445 9.948 20.845 0.548
Maternal: 3.548 4.248 0.743
Perm.: / 5.1 /

Genetic correlation between direct and maternal effect

                  Direct, 1st   Direct, 2nd+
Maternal, 1st       -0.490        -0.433
Maternal, 2nd+      -0.377        -0.730

A Look at my Data - Structure
Dimensions
#records (= #calves) ~150k
#cows ~74k
#bulls ~1k
#pedigree records (all generations + pruning):
animal pedigree ~230k (basic set: calves + ancestors)
sire-dam pedigree ~115k (basic set: mothers and fathers of calves + ancestors)
two more options: sire-maternal grandsire pedigree, sire pedigree

Distribution of scores
no problem 50.3%
problem 49.7%: easy 43.5%, difficult 6.1%, mechanical help or caesarean 0.1%

A Look at my Data - Sex & Parity
Sex
males 52%, females 47%

Parity
1st 59% 2nd 46% 3rd 45% 4th 45% 5th 45%

A Look at my Data - Age within Parity
[Figure: average score and #records by age at calving; series: Score 1st (male), Score 2nd, ...]

A Look at my Data - Age within Parity & Sex
[Figure: average score and #records by age at calving; series: Score 1st (male), Score 1st (female), Score 2nd, ...]

A Look at my Data - Season
[Figure: average score and #records by season]

Analysis of my Data in R - Available Tools
Bernoulli/binomial model
glm() - package stats
glmer() - package lme4: Laplace and adaptive Gauss-Hermite approximation (for more effects)
inla()

threshold model
polr() - package MASS
clm() - package ordinal: location (additive) and scale (multiplicative) model
clmm() - package ordinal: location (additive) and scale (multiplicative) model; Laplace and adaptive Gauss-Hermite approximation (for one effect)
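For comparison, a fixed-effects threshold (ordered probit) model, roughly what polr(method = "probit") or clm(link = "probit") fit in R, can be sketched in Python by maximizing the likelihood directly with SciPy. The data are simulated with one hypothetical 0/1 covariate and three thresholds; thresholds stay ordered by estimating the first one plus log-increments, and σ = 1 with no intercept for identifiability. Random effects (as in clmm() or inla()) are not included.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(42)
n = 20_000
x = rng.binomial(1, 0.5, n)               # hypothetical 0/1 covariate (e.g. calf sex)
beta_true, t_true = 0.8, np.array([0.0, 1.5, 3.0])
liability = beta_true * x + rng.standard_normal(n)
y = np.searchsorted(t_true, liability)    # observed score coded 0..3

def unpack(theta):
    beta = theta[0]
    # first threshold free, later ones add positive increments -> t_1 < t_2 < t_3
    t = np.cumsum(np.concatenate(([theta[1]], np.exp(theta[2:]))))
    return beta, t

def neg_loglik(theta):
    beta, t = unpack(theta)
    t_ext = np.concatenate(([-np.inf], t, [np.inf]))
    mu = beta * x
    # Pr(y = k) = Phi(t_k - mu) - Phi(t_{k-1} - mu)
    prob = norm.cdf(t_ext[y + 1] - mu) - norm.cdf(t_ext[y] - mu)
    return -np.sum(np.log(np.clip(prob, 1e-300, None)))

res = minimize(neg_loglik, x0=np.array([0.0, -0.5, 0.0, 0.0]), method="BFGS")
beta_hat, t_hat = unpack(res.x)
print(f"beta: {beta_hat:.3f}, thresholds: {np.round(t_hat, 3)}")
```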

3. Survival Analysis Example (Longevity = Length of Productive Life)

Model and Data
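Weibull model

$$
\begin{aligned}
y \mid b^*, h, a, \rho &\sim \text{Weibull}(Xb^* + Z_h h + Z_a a,\ \rho) \\
h(y \mid b^*, h, a, \rho) &= \rho\, y^{\rho - 1} \exp(Xb^* + Z_h h + Z_a a) \\
b^* &= (\rho \ln \lambda,\ b^T)^T, \quad b^* \sim \text{const.} \\
h \mid \gamma &\sim \text{Log-Gamma}(\gamma, \gamma) \\
a \mid G &\sim N(0, G), \quad G = A\sigma^2_a
\end{aligned}
$$

Data: ~110k cows from ~4k herds, ~40% censoring; sire-maternal grandsire pedigree with ~3k bulls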
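The Weibull proportional hazards likelihood with right censoring can be sketched with fixed effects only (no herd or animal effects). The data are simulated with hypothetical values, and b0 plays the role of ρ ln λ in b*. An uncensored record contributes log h(y) − Λ(y) to the log-likelihood and a censored one −Λ(y), with cumulative hazard Λ(y) = y^ρ exp(η).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 20_000
x = rng.binomial(1, 0.5, n)                  # hypothetical 0/1 covariate
b0_true, b1_true, rho_true = -1.0, 0.5, 1.5  # b0 corresponds to rho * ln(lambda)
eta = b0_true + b1_true * x

# Inverse transform: Lambda(T) = T^rho * exp(eta) ~ Exp(1)
t_event = (rng.exponential(1.0, n) / np.exp(eta)) ** (1.0 / rho_true)
t_cens = rng.uniform(0.01, 2.5, n)           # hypothetical censoring times
t = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(float)    # 1 = failure observed, 0 = censored

def neg_loglik(theta):
    b0, b1, log_rho = theta
    rho = np.exp(log_rho)                    # keeps rho > 0
    lin = b0 + b1 * x
    log_hazard = np.log(rho) + (rho - 1.0) * np.log(t) + lin
    cum_hazard = t ** rho * np.exp(lin)
    return -np.sum(event * log_hazard - cum_hazard)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
b0_hat, b1_hat, rho_hat = res.x[0], res.x[1], np.exp(res.x[2])
print(f"censored: {1 - event.mean():.1%}")
print(f"b0={b0_hat:.3f}, b1={b1_hat:.3f}, rho={rho_hat:.3f}")
```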

Implementation

Survival Kit program

Log-Gamma prior “integrated out”

Laplace approximation for Normal prior
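Why a log-gamma herd effect can be “integrated out”: assuming exp(h) ~ Gamma(shape γ, rate γ) (mean 1, our reading of Log-Gamma(γ, γ)), the likelihood contribution of one herd has the closed form $\prod_i \lambda_i^{\delta_i}\,\frac{\gamma^\gamma}{\Gamma(\gamma)}\,\frac{\Gamma(\gamma + D)}{(\gamma + \sum_i \Lambda_i)^{\gamma + D}}$. Here $\delta_i$ marks uncensored records, D is their number, and $\lambda_i$ and $\Lambda_i$ are the hazard and cumulative hazard without the herd effect. The sketch checks this against numerical integration for one hypothetical herd; it is not Survival Kit code.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln
from scipy.stats import gamma as gamma_dist

def herd_loglik_closed_form(haz, cum_haz, event, g):
    """Log herd likelihood with w = exp(h) ~ Gamma(g, rate g) integrated out."""
    D, S = event.sum(), cum_haz.sum()
    return (np.sum(event * np.log(haz)) + g * np.log(g) - gammaln(g)
            + gammaln(g + D) - (g + D) * np.log(g + S))

def herd_loglik_numeric(haz, cum_haz, event, g):
    """Same quantity by numerical integration over w."""
    def integrand(w):
        conditional = np.prod((w * haz) ** event) * np.exp(-w * cum_haz.sum())
        return conditional * gamma_dist.pdf(w, a=g, scale=1.0 / g)
    value, _ = quad(integrand, 0.0, np.inf, epsabs=0.0, epsrel=1e-10)
    return np.log(value)

# One hypothetical herd: times, failure indicators, linear predictors without h
t = np.array([1.2, 0.4, 2.0, 0.9, 1.7])
event = np.array([1, 0, 1, 1, 0])
eta = np.array([-0.5, 0.1, 0.0, -0.2, 0.3])
rho, g = 1.5, 2.5

haz = rho * t ** (rho - 1.0) * np.exp(eta)   # Weibull hazard at t
cum_haz = t ** rho * np.exp(eta)             # cumulative hazard at t
closed = herd_loglik_closed_form(haz, cum_haz, event, g)
numeric = herd_loglik_numeric(haz, cum_haz, event, g)
print(closed, numeric)
```

Because the herd effect drops out analytically, only the Normal animal effect is left to approximate, here by the Laplace method.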

Time Independent Effect - Age at 1st Calving
[Figure: relative risk (vs. baseline) and no. of records (all records, uncensored records) by age at first calving (month)]

Time Dependent Effect - Parity*Stage
[Figure: hazard function and no. of records (all records, uncensored records) by length of productive life (day)]

Thank you!

Postulated Model and Data II
Breeding value for individual = f(parent average, phenotype deviation, progeny contribution)
[Figure: graphical model of a small pedigree linking breeding values a1, ..., a10, phenotypes y and fixed effects b1, b2]
