
Statistics

What is Statistics?

  • - Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it thereby provides the navigation essential for controlling the course of scientific and societal advances

Why Do Medical Professionals Need an Understanding of Statistics?

  • - Average number of peer-reviewed medical research articles published/year over 1994-2001: 398,778 (MEDLINE)

  • - ~ 275,000 involved human subjects

  • - ~ 25,000 involved randomized, controlled trials

Yeah, yeah, yeah. So, how will this knowledge make me a better doctor?

  • - Understanding what the results of these studies mean (and don’t mean) can help in deciding between various treatments

  • - The more you know, the better the chance that you will be able to communicate with your patients.

Example: The impact of microscopic extrathyroid extension on outcome in patients with clinical T1 and T2 well-differentiated thyroid cancer

  • - Patients and Methods. From an institutional database, we identified 984 patients (54%) who underwent surgery for cT1/T2N0 disease. Of these, 869 patients were pT1/T2 and 115 were upstaged to pT3 based on the finding of microscopic ETE. Disease-specific survival (DSS) and recurrence-free survival (RFS) were analyzed for each group using the Kaplan-Meier method. In the pT3 group, factors predictive of outcome were analyzed by univariate and multivariate analyses.

  • - Results. There was no difference in the 10-year DSS (99% vs 100%; P = .733) or RFS (98% vs 95%; P = .188) on comparison of the pT1/pT2 and pT3 cohorts. Extent of surgery and administration of postoperative RAI were not significant for recurrence on univariate or multivariate analysis in the pT3 cohort.

o

No difference with stage 3 and stage 1 odds

  • - Conclusion. Outcomes in patients with cT1T2N0 WDTC are excellent and not affected by microscopic ETE. The extent of resection and administration of postoperative RAI in patients with microscopic ETE does not impact survival or recurrence.

No difference?

  • - 99% = 100%?!?

  • - 98% = 95%?!?

  • - What do P = .733 and P = .188 mean?

Getting Started

  • - Variable: a characteristic that can be measured or observed. If a characteristic is the same for every member of the population, it is referred to as a constant.

  • - Types of variables

o

Quantitative (Continuous, Discrete)

o

Categorical (Binary, Non-binary)

  • Ex. ZIP CODE (number but just describes something, not actually a numerical value)

Quantitative Variables

  • - Variables that take on numeric values for which arithmetic operations (differences, averages, etc.) make sense.

o

Continuous: Can take on any value over one or more intervals (height, body fat percentage, LDL)

o

Discrete: Takes on one of a finite or countably infinite set of values (white blood cell count, number of cases of mono in a school system, number of patients treated in an emergency room in one day)

  • there has to be a gap between the possible values

Categorical Variables

  • - Variables that identify which of at least two categories an observation falls in.

o

Binary: Two possible categories (disease presence, Rh factor, vital state dead or alive)

o

Non-binary: Three or more categories (eye color, type of melanoma (lentigo, nodular, etc.))

Some comments

  • - The choice of statistical methods depends partially on the type(s) of variable(s).

  • - We often measure continuous variables to a specific level of precision (nearest inch, gram, etc.). This process is called discretization. It does not imply that the variable is discrete.

  • - Numeric values don’t automatically imply that a variable is quantitative. (e.g. databases will have breast cancer staging classified as 0, 1, 2, 3, or 4.)

Descriptive vs. Inferential Statistics

  • - Descriptive statistics involves using summary values and graphical displays to explore the distribution of one or more variables in a data set or the relationship between two or more variables.

  • - Inferential statistics involves drawing conclusions about a population with a certain degree of confidence or error rate.

Some Descriptive Statistics

  • - Frequency/relative frequency distributions

  • - Histogram

  • - Numeric summaries

o

Measures of central tendency (mean, median, mode)

o

Measures of variability (standard deviation, interquartile range, range)

  • - Boxplot

  • - Z-score (it’s not just about the average!)

[Figure: Breast Cancer % Late Stage at Diagnosis by Health Insurance Status - Women Ages 40-79, New Jersey, 2006-2008]

  • Maybe more people on Medicaid are being diagnosed at a late stage than at an early stage

Frequency/Relative Frequency

  • - Frequency: number of times an observation with a specific value occurs.

  • - Relative frequency: fraction/proportion of all observations that have a specific value. Can also be expressed as a percentage. (Note: not all percentages are relative frequencies. Blood alcohol content and body fat percentage are two examples.)
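As a minimal sketch of these two definitions, the Python snippet below tallies a made-up list of eye colors (hypothetical values, not data from the lecture) and converts each count into a relative frequency.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (not lecture data).
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

# Frequency: number of times each value occurs.
frequency = Counter(eye_colors)

# Relative frequency: frequency divided by the total number of observations.
n = len(eye_colors)
relative_frequency = {value: count / n for value, count in frequency.items()}

for value, count in frequency.most_common():
    print(f"{value:6s} frequency={count}  relative frequency={relative_frequency[value]:.3f}")
```

The relative frequencies always add up to 1 (or 100% when expressed as percentages), which is a quick check that nothing was miscounted.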


Histogram/Frequency Table

  • - Used for quantitative data

  • - Set of data values broken into equal width intervals (width usually determined by software, can be changed)

  • - Bar height = frequency (or relative frequency)

  • - Variable represented on horizontal axis

  • - Changing interval width changes appearance of graph.

Histogram/Frequency Table on Right

  • - From state data

  • - Intervals are closed on the left and open on the right

o

The left endpoint is included and the right endpoint is not (so the first incidence-rate interval goes up to 39.99 but doesn’t include 40)
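The closed-left, open-right convention can be sketched in a few lines of Python. The rates, the starting point of 30, and the interval width of 10 below are all made-up assumptions for illustration; only the rule that 39.99 falls in the first interval while 40 falls in the next one comes from the notes.

```python
# Hypothetical incidence rates (made-up values for illustration only).
rates = [30.0, 39.99, 40.0, 44.5, 49.9, 50.0, 57.2]

def bin_counts(values, start, width, n_bins):
    """Count values in equal-width intervals [left, right): left end included, right end excluded."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)  # which interval the value falls in
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

counts = bin_counts(rates, start=30.0, width=10.0, n_bins=3)
for i, c in enumerate(counts):
    left = 30.0 + 10.0 * i
    print(f"[{left:.0f}, {left + 10:.0f}): frequency = {c}")
```

Changing `width` regroups the same observations into different bars, which is why the appearance of a histogram depends on the interval width.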


Comparisons (Death Rates)

  • - deaths out of every 100,000 people

  • - this doesn’t tell you which one is more dangerous

o

doesn’t tell you what percentage of the people who get it die

  • - colorectal cancer is more common than pancreatic cancer, which is why it has a higher death rate

Questions

  • - Are there site/sex combinations that tend to have higher/lower death rates than others?

o

Higher: M colorectal (more numbers to the right)


o

Death rate for males will be more than females

  • For breast cancer:

It’s more common in females, but the % survival for F and M is the same

  • - Are there site/sex combinations that tend to have more variable (less consistent) death rates than others?

o

How widespread the data is

o

Male colorectal

  • More variability

  • Pancreas data appears the same for F and M

Numeric Summaries

  • - Mean: arithmetic average

o

Not resistant: will be affected by an outlier

  • - Median: middle observation in an ordered list of data

o

Resistant: won’t really be affected by an outlier

  • - Mode: observation with the highest frequency

  • - Resistance of a statistic: depends on whether the value of the statistic is impacted by an extremely low or extremely high observation (outlier)
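Resistance is easy to see by adding one extreme value to a small data set. The sketch below uses made-up lengths of stay (hypothetical numbers, not lecture data) and Python's standard `statistics` module.

```python
import statistics

# Hypothetical lengths of stay in days (made-up values).
stays = [2, 3, 3, 4, 5]
stays_with_outlier = stays + [60]  # add one extremely high observation

for label, data in [("without outlier", stays), ("with outlier", stays_with_outlier)]:
    print(f"{label}: mean = {statistics.mean(data):.2f}, median = {statistics.median(data)}")
```

The single outlier drags the mean far from the bulk of the data, while the median barely moves.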

Usefulness?

  • - Mean:

o

Not resistant to outliers. Many inferential methods based on the mean do not produce reliable results when outliers (especially extreme outliers) are in a data set.

o

Utilizes all of the observations in a data set, hence it takes advantage of as much information as possible

  • - Median

o

Resistant to outliers.

o

Does not utilize all of the data

o

Inferential methods based on the median can be used when outliers exist.

  • - Mode

o

Used with categorical data/discrete variables with a small number of unique values.

o

No inferential methods

Important Related Measures

  • - Trimmed mean: mean is calculated after removing the smallest p% and largest p% of the data values. (Resistant to outliers, uses more data than the median)

o

Dangerous to do if outlier is important

  • - Five-number summary: consists of the minimum, lower (first) quartile, median, upper (third) quartile, and maximum. Divides the ordered data set into four parts, each containing about 25% of the data.

o

Put it in order, cut it in half, cut each half in half

MAO Example: Monoamine oxidase in 18 schizophrenia patients (nmoles benzylaldehyde product/10^8 platelets)

Median: midpoint of the two middle values, 8.7 and 9.7, i.e. (8.7 + 9.7)/2 = 9.2

MAO Example (cont.)

  • - Five-number summary: 4.1 (min), 7.4 (lower quartile), 9.2 (median), 11.9 (upper quartile), 18.8 (max)

  • - Mode: 7.8 (occurred twice)

  • - 5.56% (1 out of 18) Trimmed Mean: 9.6 (4.1 and 18.8 were removed, avg. of remaining 16 was calculated)
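The summaries above can be reproduced with a short Python sketch. One loud assumption: the 18 raw MAO values are not listed in these notes, so the list below is a reconstruction chosen because it reproduces every summary reported here; treat it as illustrative rather than as the lecture's data file. The quartile rule used is the "cut it in half, cut each half in half" approach; other accepted methods can give slightly different quartiles.

```python
import statistics

# ASSUMPTION: the raw values are not given in the notes. This reconstructed
# list is used only because it reproduces the reported summaries.
mao = [6.8, 4.1, 7.3, 14.2, 18.8, 9.9, 7.4, 11.9, 5.2,
       7.8, 7.8, 8.7, 12.7, 14.5, 10.7, 8.4, 9.7, 10.6]

def five_number_summary(values):
    """Put the data in order, cut it in half, then cut each half in half."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]  # middle value is left out when n is odd
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest observations."""
    data = sorted(values)
    return statistics.mean(data[k:len(data) - k])

print("five-number summary:", five_number_summary(mao))
print("mode:", statistics.mode(mao))
print("trimmed mean (one value cut from each end):", round(trimmed_mean(mao, 1), 1))
```

Dropping one value from each end of 18 observations trims 1/18, or about 5.56%, from each tail.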

Measures of Variability

  • - Standard Deviation – “average” distance between each observation and the mean.

  • - Interquartile Range distance between the lower and upper quartiles

o

Middle 50%

  • - Range distance between the minimum and the maximum

Usefulness?

  • - Standard Deviation

o

Not resistant to outliers because mean is not resistant

o

Utilizes all of the observations in a data set.

  • - Interquartile Range

o

Resistant to outliers

o

Multiple accepted methods for calculating IQR

  • - Range

o

Not resistant to outliers

o

No idea what is going on between the extremes

MAO example revisited

  • - Mean = 9.8056

  • - Deviation: data value − mean (e.g., 4.1 − 9.8 = −5.7)

  • - Standard deviation = 3.6183 (absolute standpoint)

  • - Interquartile Range (IQR) = 11.9 − 7.4 = 4.5

  • - Range = 18.8 − 4.1 = 14.7

MAO example revisited (cont.)

  • - Interpretations

  • - Standard deviation: On “average”, the MAO values for the 18 people in the sample are within 3.6183 units of the mean.

  • - IQR: The “middle” 50% of the MAO values fall within a 4.5-unit interval.

  • - Range: All of the data fall within a 14.7-unit interval.
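A step-by-step sketch of the standard deviation calculation may help connect the deviations to the final number. As an assumption, the 18 values below are a reconstruction (the raw data are not listed in these notes) that reproduces the reported mean of 9.8056; dividing by n − 1, the usual sample formula, is what gives the reported 3.6183.

```python
import math

# ASSUMPTION: reconstructed values, not taken from the notes themselves.
mao = [6.8, 4.1, 7.3, 14.2, 18.8, 9.9, 7.4, 11.9, 5.2,
       7.8, 7.8, 8.7, 12.7, 14.5, 10.7, 8.4, 9.7, 10.6]

n = len(mao)
mean = sum(mao) / n

# Deviation: data value minus the mean, one per observation.
deviations = [x - mean for x in mao]

# Sample variance: sum of squared deviations divided by n - 1.
variance = sum(d * d for d in deviations) / (n - 1)
sd = math.sqrt(variance)

# Range uses only the two extremes; IQR uses the quartiles reported in the notes.
data_range = max(mao) - min(mao)
iqr = 11.9 - 7.4

print(f"mean = {mean:.4f}, sd = {sd:.4f}, IQR = {iqr:.1f}, range = {data_range:.1f}")
```

Because every deviation enters the sum, a single extreme value such as 18.8 inflates the standard deviation, while the IQR ignores it.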

Death Rates (slide 20) Revisited

[Figure: four histograms of death rates, one panel each for Colorectal M, Colorectal F, Pancreas M, and Pancreas F]

Match the summary statistics with the four histograms.

Boxplot (aka Box-and-whisker plot)

  • - Visual representation of five-number summary

  • - Displays outliers, if they exist

  • - Shows the max, min, quartiles, and median

  • - Dot = outlier

  • - If there is an outlier, the whiskers end at the min and max of the data WITHOUT the outlier

Z-score

  • - Putting things on the same scale so you can compare them

  • - Mississippi’s colorectal cancer death rate is 16.5 per 100,000 women, while Kentucky’s rate is 23.4 per 100,000 men. A Mississippi politician states that their rate for women isn’t as bad as Kentucky’s for men because their rate is 2.484 above average, while Kentucky’s is 3.624 above average. Is this an appropriate comparison?

  • - Expect the male rates to be more spread out than the female rates because the male SD is larger

  • - The Mississippi rate is 1.682 standard deviations above the mean rate for females, while the Kentucky rate is 1.605 standard deviations above the mean rate for males. Hence, the Mississippi rate is farther above average for females than the Kentucky rate is for males.

  • - Farther away from 0, farther away you are from the average
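The comparison can be sketched with the usual z-score formula, z = (value − mean) / SD. The means below follow directly from the notes (each rate minus its stated amount above average). The standard deviations are an assumption: they are not given in the notes and were back-calculated from the reported z-scores, then rounded.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Means: rate minus the "amount above average" quoted in the notes.
# ASSUMPTION: the SDs are back-calculated from the reported z-scores, not given in the notes.
female_mean, female_sd = 16.5 - 2.484, 1.4768
male_mean, male_sd = 23.4 - 3.624, 2.2579

z_mississippi = z_score(16.5, female_mean, female_sd)
z_kentucky = z_score(23.4, male_mean, male_sd)

print(f"Mississippi (women): z = {z_mississippi:.3f}")
print(f"Kentucky (men):      z = {z_kentucky:.3f}")
```

Kentucky is farther above average in raw units, but because male rates are more spread out, Mississippi ends up farther above average in standard deviation units.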

Segue to Inference


Sampling Distribution

Stereotypical “survival” curve – will dive down like that

  • - Suppose we choose 40 of these patients at random and calculate the average survival time of the 40 who were selected. What values would we typically expect for the average of the 40? Would it be possible to obtain an average between 1600 and 1700 days? How about between 0 and 100 days? Which would be more likely?

The 0-100 days would be more likely

Distribution of mean survival time of 40 patients

  • - 10,000 random samples of 40 patients

  • - NOT ONE has an average between 0 and 100 days, but that range is still closer to the observed averages than 1600-1700

Question?


Two experimental treatments for late-stage pancreatic cancer have been developed. Each one is tested on a cohort of forty patients. The first cohort has an average survival time of 370 days, while the second cohort has an average survival time of 450 days. Does either of these results provide evidence that the average survival time using the new treatment is greater than 343 days?

  • 1,897 out of the 10,000 average survival times were 370 days or higher. (P = .1897)

  • 31 out of the 10,000 average survival times were 450 days or higher. (P = .0031)
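The resampling idea behind these proportions can be sketched as follows. The population here is an assumption: a made-up, right-skewed set of survival times with a mean near 343 days, since the lecture's patient data are not reproduced in these notes. The printed proportions will therefore not match the figures quoted above; the point is the procedure.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# ASSUMPTION: a right-skewed stand-in population with mean near 343 days.
population = [random.expovariate(1 / 343) for _ in range(5000)]

def sample_means(pop, sample_size, n_samples):
    """Average of each of n_samples random samples drawn without replacement."""
    return [statistics.fmean(random.sample(pop, sample_size)) for _ in range(n_samples)]

def proportion_at_least(values, threshold):
    """Fraction of simulated sample means at or above the threshold."""
    return sum(v >= threshold for v in values) / len(values)

means = sample_means(population, sample_size=40, n_samples=10_000)

p_370 = proportion_at_least(means, 370)
p_450 = proportion_at_least(means, 450)
print(f"share of sample means >= 370 days: {p_370:.4f}")
print(f"share of sample means >= 450 days: {p_450:.4f}")
```

Each proportion estimates how often a result at least that large would show up by chance if nothing had changed, which is the quantity a P-value reports.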


Implication

  • - If the average survival time for experimental treatment A is 343 days (no different from the status quo), approximately 19% of all samples of 40 patients will have an average survival time of 370 days or longer.

  • - If the average survival time for experimental treatment B is 343 days (no different from the status quo), approximately 0.3% of all samples of 40 patients will have an average survival time of 450 days or longer.

  • - Do the data seem “out of line” enough with the status quo to believe that the new treatment does increase survival time?

  • - (Ball pit analogy…)

Basics of a Hypothesis Test

  • - Null hypothesis (H0): Statement assumed to be true. Typically implies “no difference”, “no impact”, “maintaining the status quo”. If the null hypothesis is true, we know how our statistic should behave.

  • - Alternative hypothesis (Ha): Statement of what a statistical hypothesis test is set up to establish. Also called the “research hypothesis.”

Hypothyroidism Analogy

  • - H0: patient doesn’t have hypothyroidism

  • - Ha: patient does have hypothyroidism

  • - Data: blood test

  • - If H0 is true, TSH and thyroxine should fall within their normal ranges.

  • - Layman’s view: If TSH is high enough, and thyroxine is low enough, H0 will be rejected and a diagnosis of hypothyroidism will be made.

P-value

  • - Probability of observing a statistic that is at least as extreme as the one produced by the sample, assuming that the null hypothesis is true.

  • - Logic: lower p-values imply that it is harder to obtain a specific result if the null hypothesis is true. Hence, lower p-values correspond to having more evidence against the null hypothesis.

Significance Level (α)

  • - Threshold for concluding whether the null hypothesis is false.

  • - Maximum acceptable error rate for mistakenly concluding that the alternative hypothesis is true, when, in fact, the null hypothesis is true.

  • - From a medical test standpoint, α is the probability of a false positive.

  • - Most common α values: .10, .05, .025, .01

  • - If you pick α = .05, then 5% of the time you will get a significant result due to chance alone when the null hypothesis is true

THE RULE

  • - If P-value ≤ α, reject H0. If not, fail to reject (do not reject) H0.

  • - If we reject H0, we say that the result is statistically significant at the (insert α) level.

Why “Fail to Reject”?

  • - A court trial is analogous to a hypothesis test, with

  • - H0: defendant is innocent

  • - Ha: defendant is guilty

  • - If there is enough evidence against the defendant, the jury rejects the null, returning a guilty verdict. If there isn’t enough evidence, the jury returns a “not guilty” verdict, not an “innocent” verdict. The purpose of the trial is not to prove innocence, it is to “prove” guilt.

  • - Rejecting H0 is GUILTY but not rejecting H0 is NOT GUILTY
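The decision rule is mechanical enough to write as a small function. This is only a sketch: the function name is made up, and comparing with "less than or equal to" is an assumed convention (some texts use strict "less than").

```python
def hypothesis_decision(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the P-value is at or below alpha (assumed convention)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# P-values quoted elsewhere in these notes.
examples = [("pancreatic treatment A", 0.1897),
            ("pancreatic treatment B", 0.0031),
            ("thyroid 10-year DSS", 0.733)]

for label, p in examples:
    print(f"{label}: P = {p} -> {hypothesis_decision(p)}")
```

Note that the function never returns "accept H0": a large P-value only means there was not enough evidence, just as "not guilty" is not the same verdict as "innocent".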

The Result Is Significant. Now What?

  • - Pancreatic Cancer Treatment B

P-value = .0031

  • - We have concluded that the new treatment increases survival, on average. Does this mean it increases survival for everyone? Does it mean that more people are cured, but the side effects result in more early deaths? The result should create more questions to be examined.

HE ENDED HIS LECTURE AT THIS POINT BECAUSE HE RAN OUT OF TIME. HE WILL RECORD HIMSELF AND RELEASE THE REST OF THE LECTURE FOR OUR OWN BENEFIT.

Other Things You Might See

  • - Test statistic: measure of how different the relevant statistic(s) is/are from what is specified in the null hypothesis. In general, values that are farther away from zero imply a greater difference, but what is classified as being “far away” depends on a number of factors, which is why it is easier to work with the P-value.

  • - Power: The power of a hypothesis test is the probability that you will reject H0 when the statement in Ha is true. In some cases, the value β will be reported. In this setting, β is the probability of failing to reject H0 when, in fact, Ha is true.

  • - From a medical test standpoint, the power of a test is equivalent to the sensitivity of a test. Hence, β is the probability of a false negative.

  • - All others constant, α and β are inversely related.

Strategy to Accelerate or Augment the Antidepressant Response and for An Early Onset of SSRI Activity. Adjunctive Amisulpride to Fluvoxamine in Major Depressive Disorder

  • - Abstract: The topic of early response to antidepressant treatment has been extensively studied in major depressive disorder (MDD). We serendipitous observed an increase tolerability, a rapid response to therapy and an early onset of antidepressant fluvoxamine activity when associated with amisulpride in patients with major depressive disorder. The purpose of this study was to investigate our preliminary observations.

Fluvoxamine Study (cont.)

  • - 20 women (mean age 51.3 years) with DSM-IV TR [23] diagnostic criteria for major depressive disorder and a Hamilton Depression Rating Scale (HDRS) [24-26] score higher than 20.

  • - Exclusion factor was the age under 35 years.

  • - Each patient was given fluvoxamine (100 mg/day) and amisulpride (50 mg/day) throughout the 6-week trial.

  • - Clinical symptoms were evaluated by using Hamilton Depression Rating Scale (HDRS) [24-26] at the end of weeks 1, 2, 3, and 6.

Output Table

  • - Comparing average HDRS at different time periods, looking to see if the average HDRS score is different from the average score at the beginning of the six-week period. The P-values imply that they are different at those times.

  • - A separate analysis stated “The ANOVA one way for repeated measures carried out on the basis of the HDRS score at baseline and at week 1, 2, 3 and 6 stage expressed a statistically significant improvement of depressive symptoms (F=4.5; DF 9,80,4,76,99; P < 0.00001).”

One last note about P-values

  • - Some articles will not state the actual P-value. Instead, you may see something akin to the following.

  • - The P-value is larger than .10; result is insignificant at commonly used levels

  • - Result significant at .10, but not at .05

  • - Result significant at .05, but not at .01

  • - Result significant at .01, but not at .001

(Some journals will differentiate at a .025 or a .02 level, but that information will be on the journal’s website.)

Confidence Intervals

  • - Back to the pancreatic cancer example:

  • - We concluded that the mean survival time is greater than 343 days, but this does not give us any additional information about the actual value of the mean survival time.

  • - A confidence interval provides us with an estimate of a population value (parameter) with a specified level of confidence that the interval contains the parameter we are trying to estimate.

  • - Confidence intervals can be two-sided (most common) or one-sided.

  • - A confidence interval utilizes information about the variability of a statistic, such as the standard deviation of the 10,000 sample means, along with the desired level of confidence, to produce a margin of error.

  • - Many commonly used confidence intervals have the form: estimate ± margin of error

Margin of Error

  • - Important properties

o

As sample size increases, margin of error decreases (interval gets narrower)

o

As the confidence level increases, margin of error increases (interval gets wider)
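Both properties can be checked numerically. The sketch below assumes the common normal-based margin of error for a mean, z* × SD / √n; the notes do not say which formula the lecture used, and the standard deviation of 300 days is a made-up value.

```python
import math
from statistics import NormalDist

def margin_of_error(sd, n, confidence=0.95):
    """Normal-based margin of error for a sample mean: z* times sd / sqrt(n)."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return z_star * sd / math.sqrt(n)

sd = 300.0  # hypothetical standard deviation of survival times, in days

for n in (40, 160):
    for confidence in (0.90, 0.95, 0.99):
        moe = margin_of_error(sd, n, confidence)
        print(f"n = {n:3d}, confidence = {confidence:.0%}, margin of error = {moe:6.1f} days")
```

Under this formula, quadrupling the sample size cuts the margin of error in half, so narrowing an interval gets expensive quickly.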

Confidence Level

  • - Common confidence levels: 90%, 95%, 98%, 99%

  • - Relationship to the significance level?

  • - Interpretation of a 95% confidence level for a population mean: If we were to take repeated random samples of a fixed size and calculate a 95% confidence interval for the population mean, on average, 95% of the resulting intervals would include the value of the population mean.
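The repeated-sampling interpretation can be checked with a small simulation. Everything about the population here is an assumption (normally distributed, mean 343, standard deviation 100), and the interval is the simple normal-based one rather than whatever method the lecture's software used.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

def interval_covers(true_mean, true_sd, n, confidence=0.95):
    """Draw one sample, build a normal-based interval, and report whether it contains true_mean."""
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * stdev(sample) / math.sqrt(n)
    estimate = fmean(sample)
    return estimate - margin <= true_mean <= estimate + margin

# Hypothetical population: mean 343, standard deviation 100.
trials = 2000
coverage = sum(interval_covers(343.0, 100.0, n=100) for _ in range(trials)) / trials
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The confidence level describes the long-run success rate of the procedure; any single interval either contains the population mean or it does not.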

Thyroid Cancer Example

  • - Confidence interval for the hazard ratio (similar to, but not the same as, relative risk). A value of one implies that the two groups have the same hazard rate.

  • - 95% CI for the hazard ratio for the two age groups is (1.925, 5.870). Estimated hazard of dying within 10 years is approximately two to six times as high for the over 45 group as for the under 45 group.

  • - 95% CI for the hazard ratio for the two sex groups is (.902, 2.499). Values below one imply a lower hazard rate of dying within 10 years for one sex, while the values above one imply a lower hazard rate for the other group. The fact that 1 falls within the interval would imply that the hazard rates of dying within 10 years are not significantly different for males and females. The last output reveals a similar result based on whether the patient had a radioactive iodine treatment.
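The reasoning used to read these intervals can be written as a one-line check: a ratio of 1 means "no difference", so the result is significant at the matching level only when the interval excludes 1. The function name below is made up for illustration.

```python
def excludes_no_difference(lower, upper, no_difference=1.0):
    """True when the interval lies entirely above or entirely below the no-difference value."""
    return not (lower <= no_difference <= upper)

# Hazard-ratio intervals quoted in the notes.
intervals = {"age (over 45 vs under 45)": (1.925, 5.870),
             "sex": (0.902, 2.499)}

for label, (lower, upper) in intervals.items():
    verdict = "significant" if excludes_no_difference(lower, upper) else "not significant"
    print(f"{label}: 95% CI ({lower}, {upper}) -> {verdict} at the .05 level")
```

For ratios the no-difference value is 1; for a difference between two means it would be 0 instead.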

Correlation

  • - Oftentimes, we are interested in how the values of variables change with one another. One of the primary statistical measures of such an association is called correlation. The most commonly used version of correlation is called the Pearson product-moment correlation coefficient, which measures how close an association between two quantitative variables is to being linear.

Properties of r

  • - −1 ≤ r ≤ 1

  • - If r = 1 or r = −1, the data are perfectly linear.

  • - If the association is positive, r > 0. If the association is negative, r < 0.

  • - If r = 0, there is no linear association, but there could be a strong nonlinear association.
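These properties can be verified on tiny made-up data sets. The sketch below computes the Pearson correlation from its definition (sum of cross-products divided by the square root of the product of the sums of squares); the x and y values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]  # hypothetical x values

r_up = pearson_r(xs, [2 * x + 1 for x in xs])   # perfectly linear, positive slope
r_down = pearson_r(xs, [-3 * x for x in xs])    # perfectly linear, negative slope
r_curve = pearson_r(xs, [x * x for x in xs])    # exact parabola: strong but nonlinear

print(f"positive line: r = {r_up:.2f}")
print(f"negative line: r = {r_down:.2f}")
print(f"parabola:      r = {r_curve:.2f}")
```

The parabola case is the important warning: y is completely determined by x, yet r gives no hint of it because the relationship is not linear.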


Other Information About r

  • - Versions exist for other types of variables.

  • - A major use of r in medical research is to look for relationships between variables.

  • - Hypothesis tests exist for correlation. The primary null hypothesis is that r = 0 (that there is no linear association between the variables). Smaller P-values imply stronger evidence that there is a linear association between the variables.

Correlation Isn’t Causation!

  • - One of the biggest mistakes is when someone believes that when two variables are correlated, changing the value of one variable will cause a change in the other variable. Finding out that two variables are correlated should result in questions about the biological, chemical, and/or physical link between the variables.

Example

  • - Children ages 3-10 had the length of their feet measured, as well as their reading ability, based on their lexile score. The correlation between the two variables was close to one, which implies that kids with longer feet have higher reading levels. As a result, government agencies began awarding grants to scientists to research how to increase the length of children’s feet.

Serious Example

  • - Pre-tumor exercise decreases breast cancer in old mice in a distance-dependent manner. A negative correlation was observed between daily distance ran, prior to tumor injection, and absolute tumor mass measured at necropsy (Pearson’s r = -0.89, P = 0.0066). A correlation was also observed between distance ran before tumor implant and the histological score for mitotic index (Pearson’s r = -0.85, P = 0.034).


Breast Cancer/Exercise (cont.)

  • - Runners showed an increased respiratory exchange ratio during the light cycle (P = 0.029) suggesting that voluntary running shifted resting substrate metabolism toward glucose oxidation, relative to lipid oxidation.

  • - The observations from this study indicate that running longer distances is associated with decreased breast tumor burden in old mice, suggesting that physiological factors generated by exercising before tumor onset are protective against tumor progression.

  • - Most important statement! The mechanisms for this protective effect are not known, but the data show that older mice are useful models to address specific questions in cancer research and support further studies on the ability of exercise training to protect older women at risk for breast cancer.