
Genetic risk prediction in complex disease

Luke Jostins and Jeffrey C. Barrett


Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Cambs CB10 1HH, UK. August 22, 2011

SB Mirza FA11-RBI-004 sbmirza89@gmail.com

EPIDEMIOLOGY AND RISK PREDICTION


Attempting to predict the onset and progression of disease is one of the cornerstones of epidemiology. For some (but by no means all) diseases, clinically usable risk prediction can be performed using classical risk factors such as body mass index, lipid levels, smoking status, family history and, under certain circumstances, genetics (e.g. BRCA1/2 in breast cancer).

The advent of genome-wide association studies (GWAS) has led to the discovery of common risk loci for the majority of common diseases. These discoveries raise the possibility of using these variants for risk prediction in a clinical setting.

QUANTIFYING PREDICTIVE ACCURACY


Predictive tests can produce either a binary classification (high or low risk) or a quantitative risk score (a degree of risk for each individual). The simplest measures of classification accuracy are the sensitivity and specificity of the test. These values vary with the choice of the threshold T above which an individual is classified as high risk.
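The dependence of sensitivity and specificity on the threshold can be seen in a small sketch. It assumes that T is a cut-off on a quantitative risk score and that scores above T are classified as high risk; the function name and the six scores are hypothetical, made up purely for illustration.

```python
def sensitivity_specificity(scores, has_disease, threshold):
    """Classify score > threshold as high risk; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, has_disease):
        predicted_high = score > threshold
        if diseased and predicted_high:
            tp += 1          # diseased, correctly flagged
        elif diseased:
            fn += 1          # diseased, missed
        elif predicted_high:
            fp += 1          # healthy, wrongly flagged
        else:
            tn += 1          # healthy, correctly cleared
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical risk scores for six individuals (illustrative data only).
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
has_disease = [True, True, True, False, False, False]

for T in (0.2, 0.5, 0.8):
    sens, spec = sensitivity_specificity(scores, has_disease, T)
    print(f"T={T}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising T trades sensitivity for specificity, which is why a single pair of values cannot summarise a quantitative score.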

GWAS
Genome-wide association studies (GWAS) are used to identify common genetic factors that influence health and disease, and genetic prediction has improved as a result. Good genetic prediction of age-related macular degeneration was quickly enabled by multiple large-effect variants identified by GWAS. Other diseases, such as Crohn's disease, can be reasonably well predicted by a large number of weak effects. Note that the range of AUCs for these diseases is very similar to the range found in classical prediction.

PREDICTION IN THE POST-GWAS ERA


The most basic means of incorporating genetic information in risk prediction is via family history, which has predictive accuracy proportional to both the heritability and prevalence of disease.

Assuming a breast cancer lifetime risk (K) of 12%, a BRCA1/2 penetrance of approximately 50%, and a mutation frequency of around 1%, the AUC of BRCA1/2 testing would only be 0.52.
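The 0.52 figure can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions: that the 1% frequency is the proportion of carriers in the population, and that the test is a simple carrier/non-carrier call, for which the AUC equals the average of sensitivity and specificity. The function name is hypothetical.

```python
def auc_binary_test(K, f, penetrance):
    """AUC of a single carrier/non-carrier test.

    K: disease prevalence (lifetime risk)
    f: carrier frequency in the population
    penetrance: probability of disease given carrier status
    """
    carriers_among_cases = f * penetrance / K                 # sensitivity
    carriers_among_controls = f * (1 - penetrance) / (1 - K)  # 1 - specificity
    # A binary test has a two-segment ROC curve, so AUC = (sens + spec) / 2.
    return 0.5 + 0.5 * (carriers_among_cases - carriers_among_controls)


print(round(auc_binary_test(K=0.12, f=0.01, penetrance=0.5), 2))  # 0.52
```

The AUC stays close to 0.5 because only about 4% of cases carry the mutation, however strongly it affects the carriers themselves.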

MODELLING CONCERNS
An important question is how accurate prediction could be if the genetic architecture of a disease were completely described. For some diseases, poor prediction is due to poor current understanding of the underlying genetics, whereas others might never be tractable to genetic prediction.

Three models have been proposed, each of which corresponds to a different assumption about the distribution of disease probability in the population:

log model: analytically tractable but relatively unrealistic, assuming that probabilities are log-normally distributed, which can create disease probabilities greater than 1.

logistic model: assumes that odds ratios are log-normally distributed.

probit (or liability threshold) model: assumes a continuous distribution of a disease phenotype (called the liability) in the population, with heritable and non-heritable components.
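The liability threshold idea can be sketched as a small simulation. It assumes a standard normal liability split into independent heritable and non-heritable parts, with disease occurring when liability exceeds the threshold that gives the chosen prevalence. The heritability of 0.5, the prevalence of 0.1, and all names below are hypothetical choices for illustration, not values from the text.

```python
import random
from statistics import NormalDist, mean


def simulate_liability(n, heritability, prevalence, seed=0):
    """Simulate n individuals under a liability threshold model.

    Returns (genetic_liabilities, affected_flags).
    """
    rng = random.Random(seed)
    # Total liability is N(0, 1); disease occurs above this cut-off.
    threshold = NormalDist().inv_cdf(1 - prevalence)
    sd_genetic = heritability ** 0.5
    sd_other = (1 - heritability) ** 0.5
    genetic, affected = [], []
    for _ in range(n):
        g = rng.gauss(0, sd_genetic)   # heritable component
        e = rng.gauss(0, sd_other)     # non-heritable component
        genetic.append(g)
        affected.append(g + e > threshold)
    return genetic, affected


genetic, affected = simulate_liability(n=20000, heritability=0.5, prevalence=0.1)
observed_prevalence = sum(affected) / len(affected)
cases = [g for g, a in zip(genetic, affected) if a]
controls = [g for g, a in zip(genetic, affected) if not a]
print(f"observed prevalence: {observed_prevalence:.3f}")
print(f"mean genetic liability, cases: {mean(cases):.2f}, controls: {mean(controls):.2f}")
```

Even with the heritable component known exactly, cases and controls overlap, because the non-heritable component also contributes to crossing the threshold.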

THE FUTURE OF GENETIC RISK PREDICTION


Genetic risk prediction will allow prediction for a large number of common diseases, but with predictive accuracies similar to those found in classical prediction. Combining family history with genetic risk prediction gives a better combined effect. In addition, larger meta-analyses and future sequencing studies will identify further risk variants, possibly including lower frequency variants of large effect size.

Genetic risk prediction is highly stable over time, as a person's genetic sequence is essentially constant throughout their life. Cost effectiveness: attaching a patient's genome to an electronic medical record will enable a variety of prediction scenarios dependent on disease aetiology, prevalence, and prevention and treatment options.

Genetic epidemiology: the study of the role of genetic factors in determining health and disease in families and in populations, and the interplay of such genetic factors with environmental factors.

AUC for dominant mutation with incomplete penetrance


Assume a disease has a prevalence of $K$, and a mutation for this disease has a frequency $f$ and a penetrance $\phi$ (with $\phi > K$). Thus, of individuals who have the disease, a proportion $f\phi/K$ have the mutation, and of individuals who do not have the disease, a proportion $f(1-\phi)/(1-K)$ have the mutation. Similarly, the AUC of a single sibling family history is $1/2 + K(\lambda_S - 1)/(2(1-K))$, where $K$ is the prevalence and $\lambda_S$ the sibling relative risk.
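The step from these proportions to an AUC is left implicit. A minimal derivation follows, assuming each test is binary (carrier or non-carrier, sibling affected or unaffected), so that its ROC curve consists of two straight segments. The symbols $\phi$ (penetrance) and $\lambda_S$ (sibling relative risk) are notation chosen here.

```latex
\begin{align*}
\mathrm{AUC}_{\text{binary}}
  &= \frac{1}{2} + \frac{1}{2}\bigl(\text{sensitivity} - (1 - \text{specificity})\bigr) \\[4pt]
\mathrm{AUC}_{\text{mutation}}
  &= \frac{1}{2} + \frac{1}{2}\left(\frac{f\phi}{K} - \frac{f(1-\phi)}{1-K}\right) \\[4pt]
P(\text{sibling affected} \mid \text{case}) &= \lambda_S K \\
P(\text{sibling affected} \mid \text{control})
  &= \frac{K - K \cdot \lambda_S K}{1-K} = \frac{K(1-\lambda_S K)}{1-K} \\[4pt]
\mathrm{AUC}_{\text{sibling}}
  &= \frac{1}{2} + \frac{1}{2}\left(\lambda_S K - \frac{K(1-\lambda_S K)}{1-K}\right)
   = \frac{1}{2} + \frac{K(\lambda_S - 1)}{2(1-K)}
\end{align*}
```

Both expressions reduce to 1/2 when the test carries no information, that is, when $\phi = K$ or $\lambda_S = 1$.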

Established GWAS loci typically explain only a small fraction of the heritability of complex diseases (an observation known as "missing heritability"). It has been shown that 3% of variance (corresponding to an AUC of 0.65) in schizophrenia risk can be explained by a polygenic model, including a large number of loci that did not achieve genome-wide significance.

ROC
The ROC curve describes the effectiveness of a continuous diagnostic marker in distinguishing between diseased and healthy individuals. A person is assessed as diseased (positive) or healthy (negative) depending on whether the corresponding marker value is greater than, or less than or equal to, a given threshold value. The theoretical ROC curve is a plot of q = sensitivity versus p = 1 - specificity for all possible threshold values.
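An empirical version of this curve can be traced by sweeping the threshold over the observed marker values. The sketch below follows the rule stated above (positive when the value is strictly greater than the threshold); the function name and the marker values are hypothetical.

```python
def roc_points(diseased, healthy):
    """Return the empirical ROC curve as sorted (p, q) pairs.

    p = 1 - specificity, q = sensitivity, one point per distinct threshold.
    """
    # -inf makes everyone positive; the largest value makes everyone negative.
    thresholds = [float("-inf")] + sorted(set(diseased) | set(healthy))
    points = []
    for t in thresholds:
        q = sum(x > t for x in diseased) / len(diseased)  # true positive rate
        p = sum(y > t for y in healthy) / len(healthy)    # false positive rate
        points.append((p, q))
    return sorted(points)


# Hypothetical marker values (illustrative data only).
diseased = [0.9, 0.7, 0.4]
healthy = [0.6, 0.3, 0.1]

for p, q in roc_points(diseased, healthy):
    print(f"p={p:.2f}  q={q:.2f}")
```

The curve always runs from (0, 0), where no one is called positive, to (1, 1), where everyone is.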

AUC
Let X and Y denote the diagnostic marker measurements for diseased and healthy subjects, respectively. Bamber showed that AUC = Prob(X > Y). This can be interpreted as the probability that, in a randomly selected pair of healthy and diseased individuals, the diagnostic marker value is higher for the diseased subject. Values of AUC close to 1.0 indicate that the marker has high diagnostic accuracy. The AUC therefore serves as a global index of diagnostic accuracy and as a general measure of the difference between two distributions.
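The identity AUC = Prob(X > Y) suggests a direct estimate: compare every diseased subject with every healthy subject and count how often the diseased value is higher. In this sketch, counting ties as one half is a common convention assumed here, not something stated in the text; the function name and data are hypothetical.

```python
def auc_pairwise(diseased, healthy):
    """Estimate AUC = Prob(X > Y) over all diseased/healthy pairs.

    Ties contribute 0.5 (an assumed convention).
    """
    wins = 0.0
    for x in diseased:
        for y in healthy:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


# Hypothetical marker values (illustrative data only).
diseased = [0.9, 0.7, 0.4]
healthy = [0.6, 0.3, 0.1]

# 8 of the 9 pairs are ordered correctly (only 0.4 vs 0.6 is not).
print(round(auc_pairwise(diseased, healthy), 3))  # 0.889
```

A marker that separates the two groups perfectly gives 1.0, and an uninformative one gives about 0.5.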

Log-normal distribution
In probability theory, a log-normal distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. If X is a random variable with a normal distribution, then Y = exp(X) has a log-normal distribution; likewise, if Y is log-normally distributed, then X = log(Y) is normally distributed.
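The definition also shows why treating probabilities as log-normal can be a problem: exponentiating a normal variable has no upper bound, so some values exceed 1. A minimal sketch follows, with hypothetical parameters (a median of 0.1 and a log-scale standard deviation of 1) chosen only for illustration.

```python
import math
import random
from statistics import NormalDist

# Hypothetical parameters of the underlying normal variable X.
mu, sigma = math.log(0.1), 1.0   # Y = exp(X) then has median 0.1

rng = random.Random(1)
samples = [math.exp(rng.gauss(mu, sigma)) for _ in range(100000)]

# Fraction of simulated values above 1, i.e. impossible as probabilities.
frac_simulated = sum(y > 1 for y in samples) / len(samples)
# Exact value: P(Y > 1) = P(X > 0) for X ~ Normal(mu, sigma).
frac_exact = 1 - NormalDist(mu, sigma).cdf(0.0)

print(f"simulated: {frac_simulated:.4f}, exact: {frac_exact:.4f}")
```

With these parameters roughly 1% of the draws fall above 1, and the fraction grows as the median or the spread increases.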
