Presentation by:
Ahmad Alsahaf
Breeding for desirable traits
Longevity, fertility, more wool, milk, eggs, meat, etc.
Early History
1930s
Karl Pearson (1854-1936)
1940s
Progeny Testing
• Artificial insemination
• Measure the quality of the milk
• Determine the economic value of the bull
50,000 – 70,000 Genetic Markers
Machine Learning Examples
1. Using classification models (supervised learning) to detect problems in artificial insemination.
Input (in 1200 cows; nominal phenotypes, categorical phenotypes, environmental factors):
• Lactation number
• % HF Genome
• Sex of the calf
• Age of cow
• AI season
• Health metric
• % of fat/protein in milk
Methods:
• Linear Classifiers
• Logistic Regression
• Artificial Neural Networks
• Multivariate adaptive regression splines
Output:
• Good cow
• Bad cow
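The classification setup above can be illustrated with a minimal sketch of one of the listed methods, logistic regression, written in plain Python. Everything concrete here is an assumption for illustration: the two synthetic features stand in for standardized inputs such as lactation number or a health metric, the labels (1 = "good cow", 0 = "bad cow") are generated by a made-up rule, and the function names are hypothetical. It is not the model used in the study behind the slides.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one cow: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 ("good cow") if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Synthetic, hypothetical data: two standardized features per cow.
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y = [1 if x[0] + x[1] > 0 else 0 for x in X]  # made-up labelling rule

w, b = train_logistic(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print("training accuracy:", accuracy)
```

In practice the categorical inputs (sex of the calf, AI season) would need to be encoded numerically first, and accuracy should be measured on held-out cows rather than on the training set.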
Machine Learning Examples
• Quantitative traits (e.g. milk yield, disease, longevity) are controlled by multiple markers.
• Machine learning can associate multiple genetic markers with a phenotype AND find complex interactions between markers.
• Machine learning can help deal with redundant and irrelevant variables.
Example: From genotype to Milk yield
Input:
n: 297 cows
p: 35,798 SNPs
Output:
Milk yield
Protein yield
Fat yield
i.e. Small n, large p problem
• 297 variables derived from the original 35,798
§ Using genome-derived (SNP) relationships between the cows as inputs instead of the SNPs themselves.
§ By constructing a matrix of genomic relationships that is analogous to a covariance matrix and is based on allele frequency in the population.
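One common way to build such a matrix is sketched below. The slides do not say which formula was used, so this assumes a VanRaden-style construction: genotypes coded as allele counts 0/1/2, centred by twice the allele frequency, and scaled by 2 Σ p_j (1 − p_j). The function name and the toy genotypes are hypothetical. With n cows the result is n × n, which is how 297 cows yield 297 input variables per cow regardless of the number of SNPs.

```python
def genomic_relationship_matrix(genotypes):
    """Build an n x n genomic relationship matrix.

    genotypes: list of n rows (one per cow), each holding p allele
    counts in {0, 1, 2}. Assumes at least one SNP is polymorphic.
    """
    n, p = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(p)]
    # Centre each genotype by its expected value 2 * p_j.
    Z = [[row[j] - 2.0 * freqs[j] for j in range(p)] for row in genotypes]
    # Scaling term that puts G on a covariance-like scale.
    scale = 2.0 * sum(f * (1.0 - f) for f in freqs)
    return [
        [sum(a * c for a, c in zip(Z[i], Z[k])) / scale for k in range(n)]
        for i in range(n)
    ]

# Toy example (made-up data): 4 cows, 6 SNPs.
toy_genotypes = [
    [0, 1, 2, 1, 0, 2],
    [1, 1, 2, 0, 0, 1],
    [2, 0, 1, 1, 1, 0],
    [1, 2, 0, 2, 1, 1],
]
G = genomic_relationship_matrix(toy_genotypes)
for row in G:
    print([round(v, 3) for v in row])
```

Each row of G can then serve as the feature vector for one cow in place of its raw SNP genotypes.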
Example: From genotype to Milk yield
Input:
395 Holstein cows
42,275 SNPs
Output:
Residual Feed Intake of the cow
Methods:
• Decision trees
Dealing with dimensionality:
What is Model-order reduction (Emulation Modelling)?
Such that:
§ the emulator is less computationally intensive than the PB model;
§ the emulator's input-output behaviour accurately reproduces that of the PB model;
§ the emulator is "credible" from the user/analyst's point of view (physically interpretable).
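The second criterion, fidelity to the PB model, can be quantified with a goodness-of-fit score. The sketch below assumes the coefficient of determination (R2), which is the quantity on the axis of the emulator-performance plot later in the slides; the function name and the trajectories are hypothetical and for illustration only.

```python
def r_squared(y_pb, y_emulator):
    """Coefficient of determination of emulator output against PB-model output."""
    mean_pb = sum(y_pb) / len(y_pb)
    # Squared error between the emulator and the PB model.
    ss_res = sum((a - b) ** 2 for a, b in zip(y_pb, y_emulator))
    # Variability of the PB-model output around its mean.
    ss_tot = sum((a - mean_pb) ** 2 for a in y_pb)
    return 1.0 - ss_res / ss_tot

# Made-up output trajectories.
pb_output = [1.0, 2.0, 3.0, 4.0]
good_emulator = [1.1, 1.9, 3.2, 3.8]
poor_emulator = [4.0, 3.0, 2.0, 1.0]

print(r_squared(pb_output, good_emulator))
print(r_squared(pb_output, poor_emulator))
```

A score of 1 means the emulator reproduces the PB model exactly. The score is negative when the emulator does worse than simply predicting the mean of the PB output, which is why a performance axis for R2 can extend below zero.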
Recursive Variable Selection – A feature selection algorithm
[Figure: variable selection diagram. Legend: state variables, exogenous inputs, control variables, output variable. Label: >2%]
PCA vs. Sparse PCA – Coefficients heat map
PCA vs. Sparse and Weighted PCA – Emulator performance
[Figure: emulator performance (R2) and explained variance versus number of principal components (1 to 20) for PCA, WPCA, and SPCA]
Ref: Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3-42.