The slides of the lecture "Information-theoretic analysis of -omics data," delivered 17 November 2008 in BIO 5106 (BIOL 5506) BIOINFORMATICS.

Dec 07, 2008

© Attribution Non-Commercial (BY-NC)

An introduction

David R. Bickel

University of Ottawa

17 November 2008

Today’s class

Which genes express differently between treatment and control?

Examples of "treatments"

Medical: drug or chemotherapy applied to some patients

Basic: hormone or other chemical added to some cell cultures

Other examples?

for differential expression?

for equivalent expression?

Pick the differentially expressed genes

An average expression ratio of 1 indicates equivalent expression

Two types of differential expression

An average expression ratio less than 1 indicates under-expression

An average expression ratio greater than 1 indicates over-expression

"Average expression" is over the population, not just the observed data

The histogram of a large expression data set resembles the true distribution

A sample from the treatment group and a sample from the control group are hybridized to the same microarray slide

Each gene’s expression ratio is a measurement of its expression in the treatment group relative to its expression in the control group
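
As a minimal sketch of how an expression ratio might be summarized, the Python below divides a treatment intensity by a control intensity for each slide and averages the ratios on the log scale. The intensity values, the function names, and the use of a geometric mean are illustrative assumptions, not taken from the lecture. The result is only a sample estimate; the slides define equivalent and differential expression by the population average.

```python
import math

def expression_ratios(treatment, control):
    """Per-slide ratio of treatment intensity to control intensity."""
    return [t / c for t, c in zip(treatment, control)]

def geometric_mean(ratios):
    """Average the ratios on the log scale, then transform back."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical intensities for one gene on three two-channel slides.
treatment = [820.0, 640.0, 910.0]
control = [400.0, 350.0, 430.0]

ratios = expression_ratios(treatment, control)
print(ratios)                  # one measured expression ratio per slide
print(geometric_mean(ratios))  # sample estimate of the average ratio
```

Averaging on the log scale treats a ratio of 2 and a ratio of 0.5 as equal and opposite departures from 1, which is one common convention for ratio data.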

                   data set #1   data set #2   data set #4   data set #6
data (n = 3)
data (n = 6)
model (n = 3)
model (n = 6)
evidence (n = 3)
evidence (n = 6)

For each data set, indicate whether the gene is equivalently expressed (E) or differentially expressed (D) according to the plot of the data, according to the model, and according to the evidence for each number of observations (3 or 6). Equivalent expression means the average expression ratio is 1.

Statistical models

Equivalent expression model

Unknown variability of expression

Expression ratio known to be 1

One unknown parameter (p = 1)

Differential expression model

Unknown variability of expression

Unknown expression ratio

Two unknown parameters (p = 2)
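
The two models can be made concrete with a small sketch. It assumes, as the slides later do, that errors are measured on the log scale: the equivalent expression model fixes the log ratio at 0 (a ratio of 1), while the differential expression model estimates it from the data. The function names and the sample ratios are hypothetical.

```python
import math

def mse_equivalent(ratios):
    """Mean squared error of the log ratios about 0 = log(1); p = 1."""
    logs = [math.log(r) for r in ratios]
    return sum(x * x for x in logs) / len(logs)

def mse_differential(ratios):
    """Mean squared error of the log ratios about their sample mean; p = 2."""
    logs = [math.log(r) for r in ratios]
    mean = sum(logs) / len(logs)
    return sum((x - mean) ** 2 for x in logs) / len(logs)

# Hypothetical measured expression ratios for one gene.
ratios = [1.5, 2.0, 2.5]
print(mse_equivalent(ratios))    # error when the ratio is fixed at 1
print(mse_differential(ratios))  # error when the ratio is estimated
```

The second error can never exceed the first, because the sample mean minimizes the squared error. That is why the model with more parameters always appears to fit at least as well.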

Balancing complexity and fit

equivalent expression model (p = 1)

More complex models tend to fit data better than simple models, even if the simple models are better

Overly complex models make poor generalizations

A sample of patients may not represent the population

A single experiment may not reflect typical biological processes

$$\frac{\text{Fit}}{\text{Complexity}} = \text{Evidence}$$

How does balancing fit with complexity change your assessments?

Quality of model fit to the data

n = sample size

number of measured expression ratios

MSE = mean squared error

degree to which the model disagrees with the observed data (log scale)

$$\text{Fit} = \left( \frac{1}{\sqrt{\text{MSE}}} \right)^{n}$$

degree to which the model fits the observed data (assuming a normal distribution)
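
A short sketch of the fit calculation follows, reading the slide's formula as Fit = (1/sqrt(MSE))^n. That reading is an assumption about the extracted text, although it matches the form of a maximized normal likelihood, which is proportional to MSE^(-n/2). The function names are hypothetical.

```python
import math

def fit(mse, n):
    """Fit = (1 / sqrt(MSE)) ** n, the formula as read from the slide."""
    return (1.0 / math.sqrt(mse)) ** n

def log_fit(mse, n):
    """Natural log of the fit, -(n / 2) * ln(MSE); safer for large n."""
    return -0.5 * n * math.log(mse)

print(fit(0.25, 2))      # a smaller MSE gives a larger fit
print(fit(1.0, 10))      # an MSE of 1 gives a fit of 1 for any n
print(log_fit(0.25, 6))
```

Working with `log_fit` avoids overflow or underflow when n is large, since the fit itself grows or shrinks geometrically with n.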

Model complexity

n = sample size

number of measured expression ratios

p = model dimension

number of unknown parameters in the model

p = 1 for the equivalent expression model

p = 2 for the differential expression model

$$p_c = p + \frac{p(p+1)}{2(n - p + 1)}$$

effective number of parameters in the model (corrected for small n)

$$\text{Complexity} = 2.718^{p_c}$$

$$\frac{\text{Fit}}{\text{Complexity}} = \text{Evidence}$$
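
The complexity penalty and the resulting evidence can be sketched as below. The sketch assumes the corrected dimension reads p_c = p + p(p + 1) / (2(n - p + 1)); the minus sign between n and p is reconstructed from the extracted text and may differ from the original slide. Function names are hypothetical.

```python
import math

def corrected_dimension(p, n):
    """p_c = p + p(p + 1) / (2(n - p + 1)); the minus sign is assumed."""
    return p + p * (p + 1) / (2 * (n - p + 1))

def complexity(p, n):
    """Complexity = e ** p_c (the slides round e to 2.718)."""
    return math.exp(corrected_dimension(p, n))

def evidence(fit, p, n):
    """Evidence = Fit / Complexity."""
    return fit / complexity(p, n)

# Sample sizes and model dimensions used in the class exercise.
for n in (3, 6):
    for p in (1, 2):
        print(n, p, corrected_dimension(p, n), complexity(p, n))
```

For a fixed n, the differential expression model (p = 2) always carries the larger penalty, so it must fit noticeably better than the equivalent expression model (p = 1) to earn more evidence.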

Answers

If a statistical method says an equivalently expressed gene is differentially expressed, is the method useless?

If a statistical method says a differentially expressed gene is equivalently expressed, is the method useless?

The advantage of obtaining more data

How confident should you be in your assessments?

Should you obtain more data before making an assessment?

The expression data sets

             data set #1   data set #2               data set #4   data set #6
ratio        1             2                         1             1.4
expression   equivalent    differential              equivalent    differential
n = 10       0.44/1.38     0.14/0.09                 0.14/0.17     0.19/0.37
n = 25       0.29/0.71     0.03/0.002                4.77/1.00     0.05/0.04
n = 100      36/69         (1 × 10^-4)/(2 × 10^-7)   16/32         0.03/0.01

Key

n is the number of observed expression ratios.

Each ratio is $\frac{\text{Evidence differentially expressed}}{\text{Evidence equivalently expressed}}$, the weight of evidence favoring differential expression over equivalent expression.

* misleading evidence for differential expression

** misleading evidence for equivalent expression
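
The key can be applied mechanically, as in the sketch below. It copies the n = 10 and n = 25 rows of the table, takes the first number of each pair as the evidence for differential expression (as the key states), and lists the data sets whose favored model disagrees with the true expression. The pairing of table columns with data sets follows the order of the extracted text, which is an assumption; the names are hypothetical.

```python
# True expression status of each data set, from the table.
truth = {
    "data set #1": "equivalent",
    "data set #2": "differential",
    "data set #4": "equivalent",
    "data set #6": "differential",
}

# (evidence differential, evidence equivalent) for each sample size n.
rows = {
    10: {"data set #1": (0.44, 1.38), "data set #2": (0.14, 0.09),
         "data set #4": (0.14, 0.17), "data set #6": (0.19, 0.37)},
    25: {"data set #1": (0.29, 0.71), "data set #2": (0.03, 0.002),
         "data set #4": (4.77, 1.00), "data set #6": (0.05, 0.04)},
}

def favored(ev_diff, ev_equiv):
    """Model favored by the weight of evidence ev_diff / ev_equiv."""
    return "differential" if ev_diff / ev_equiv > 1 else "equivalent"

def misleading(n):
    """Data sets whose favored model disagrees with the true expression."""
    return [name for name, pair in rows[n].items()
            if favored(*pair) != truth[name]]

for n in rows:
    print(n, misleading(n))
```

Under this reading, the evidence points to the wrong model for data set #6 at n = 10 and for data set #4 at n = 25.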

David Bickel (uOttawa) Information theory 17 November 2008 10 / 11

Further study

(AIC) after correcting it for small numbers of measurements

$$\text{AIC}_c = -2 \ln(\text{Evidence})$$

Software packages with the AIC but without the correction may be unreliable for small numbers of observations (n < 40)

Kenneth Burnham and David Anderson, Model Selection and Multi-Model Inference

www.statomics.com
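
The relation between the evidence and the corrected criterion can be sketched as follows, assuming the slide's formula is AICc = -2 ln(Evidence); the minus sign is inferred, since larger evidence should correspond to a smaller criterion. The function name is hypothetical, and the two evidence values are the n = 10 pair for data set #1 from the table in these slides.

```python
import math

def aicc_from_evidence(evidence):
    """AICc on the slides' scale: -2 * ln(Evidence); smaller is better."""
    return -2.0 * math.log(evidence)

# Evidence for data set #1 at n = 10: differential, then equivalent.
ev_differential, ev_equivalent = 0.44, 1.38

delta = aicc_from_evidence(ev_differential) - aicc_from_evidence(ev_equivalent)
print(delta)  # positive: the equivalent expression model has the lower AICc
```

The difference in AICc between two models equals -2 times the log of their evidence ratio, so comparing AICc values and comparing weights of evidence rank the models the same way.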
