What is Statistics?
 Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it thereby provides the navigation essential for controlling the course of scientific and societal advances
Why Do Medical Professionals Need an Understanding of Statistics?
 Average number of peer-reviewed medical research articles published/year over 1994–2001: 398,778 (MEDLINE)
 ~ 275,000 involved human subjects
 ~ 25,000 involved randomized, controlled trials
Yeah, yeah, yeah. So, how will this knowledge make me a better doctor?
 Understanding what the results of these studies mean (and don’t mean) can help in deciding between various treatments
 The more you know, the better the chance that you will be able to communicate with your patients.
Example: The impact of microscopic extrathyroid extension on outcome in patients with clinical T1 and T2 well-differentiated thyroid cancer
 Patients and Methods. From an institutional database, we identified 984 patients (54%) who underwent surgery for cT1/T2N0 disease. Of these, 869 patients were pT1/T2 and 115 were upstaged to pT3 based on the finding of microscopic ETE. Disease-specific survival (DSS) and recurrence-free survival (RFS) were analyzed for each group using the Kaplan–Meier method. In the pT3 group, factors predictive of outcome were analyzed by univariate and multivariate analyses.
 Results. There was no difference in the 10-year DSS (99% vs 100%; P = .733) or RFS (98% vs 95%; P = .188) on comparison of the pT1/pT2 and pT3 cohorts. Extent of surgery and administration of postoperative RAI were not significant for recurrence on univariate or multivariate analysis in the pT3 cohort.
o No difference in outcomes between the stage 3 (pT3) and stage 1/2 (pT1/pT2) groups
 Conclusion. Outcomes in patients with cT1/T2N0 WDTC are excellent and not affected by microscopic ETE. The extent of resection and administration of postoperative RAI in patients with microscopic ETE does not impact survival or recurrence.
No difference?
 99% = 100%?!?
 98% = 95%?!?
 What do P = .733 and P = .188 mean?
Getting Started
 Variable – characteristic that can be measured or observed. If a characteristic is the same for every member of the population, it is referred to as a constant
 Types of variables
o Quantitative (Continuous, Discrete)
o Categorical (Binary, Nonbinary)
Ex. A ZIP code is written as a number, but it only labels a location; it is not a true numerical value, so it is categorical.
Quantitative Variables
 Variables that take on numeric values for which arithmetic operations (differences, averages, etc.) make sense.
o Continuous: Can take on any value over one or more intervals (height, body fat percentage, LDL)
o Discrete: Takes on one of a finite or countably infinite set of values (white blood cell count, number of cases of mono in a school system, number of patients treated in an emergency room in one day)
o For a discrete variable, there has to be a gap between possible values
Categorical Variables
o Binary: Two categories
o Nonbinary: Three or more categories (eye color, type of melanoma (lentigo, nodular, etc.))
Some comments
 The choice of statistical methods depends partially on the type(s) of variable(s).
 We often measure continuous variables to a specific level of precision (nearest inch, gram, etc.), a process called discretization. This does not imply that the variable is discrete.
 Numeric values don’t automatically imply that a variable is quantitative. (e.g. databases will have breast cancer staging classified as 0, 1, 2, 3, or 4.)
Descriptive vs. Inferential Statistics
 Descriptive statistics involves using summary values and graphical displays to explore the distribution of one or more variables in a data set or the relationship between two or more variables.
 Inferential statistics involves drawing conclusions about a population with a certain degree of confidence or error rate.
Some Descriptive Statistics
 Frequency/relative frequency distributions
 Histogram
Frequency/Relative Frequency
 Frequency – number of times an observation with a specific value occurs.
 Relative Frequency – fraction/proportion of all observations that have a specific value. Can also be expressed as a percentage. (Note: not all percentages are relative frequencies. Blood alcohol content and body fat percentage are two examples.)
Histogram/Frequency Table
 Used for quantitative data
 Set of data values broken into equal width intervals (width usually determined by software, can be changed)
 Bar height = frequency (or relative frequency)
 Variable represented on horizontal axis
 Changing interval width changes appearance of graph.
Histogram/Frequency Table on Right
 From state data
 Intervals are closed on the left and open on the right
o The left endpoint is included and the right endpoint is not (so the first incidence-rate interval goes up to 39.99 but does not include 40)
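The interval convention above can be sketched in a few lines of Python. This is only an illustration: the rates are hypothetical values, not the state data from the slide, and `frequency_table` is a helper name made up for this example.

```python
from collections import Counter

def frequency_table(values, start, width):
    """Count values in equal-width intervals [left, right): left-closed, right-open."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)  # which interval the value falls in
        counts[index] += 1
    total = len(values)
    table = []
    for index in sorted(counts):
        left = start + index * width
        right = left + width
        freq = counts[index]
        table.append((left, right, freq, freq / total))  # frequency and relative frequency
    return table

# Hypothetical incidence rates (assumption, for illustration only).
rates = [32.5, 39.99, 40.0, 41.2, 44.9, 45.0, 47.3, 38.1]
for left, right, freq, rel in frequency_table(rates, start=30, width=5):
    print(f"[{left}, {right}): frequency={freq}, relative frequency={rel:.3f}")
```

Note that 39.99 is counted in [35, 40) while 40.0 is counted in [40, 45), which is exactly the left-closed, right-open rule. Changing `width` changes how the same data are grouped, and therefore how the histogram looks.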
Comparisons (Death Rates)
 Death rates are given as deaths out of every 100,000 people
 This doesn’t tell you which cancer is more dangerous
o It doesn’t tell you what percentage of the people who get the disease die from it
 Colorectal cancer is more common than pancreatic cancer, which is why it has a higher death rate
Questions
 Are there site/sex combinations that tend to have higher/lower death rates than others?
o Higher: male colorectal (more values toward the right of the histogram)
o The death rate for males tends to be higher than for females
o Breast cancer: it is more common in females, but the survival percentage is the same for females and males
 Are there site/sex combinations that tend to have more variable (less consistent) death rates than others?
o Variability describes how spread out the data are
o Male colorectal: more variability
o Pancreas data appear about the same for females and males
Numeric Summaries
 Mean – arithmetic average
o Not resistant – will be affected by an outlier
 Median – middle observation in an ordered list of data
o Resistant – won’t really be affected by an outlier
 Mode – observation with the highest frequency
 Resistance of a statistic – Depends on whether the value of statistic is impacted by an extremely low or extremely high observation (outlier)
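A small sketch of what "resistant" means in practice. The data are hypothetical and chosen only to make the effect easy to see; the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical data (assumption, for illustration only).
data = [4, 5, 5, 6, 7]
with_outlier = data + [60]  # add one extremely high observation

print("mean:  ", mean(data), "->", mean(with_outlier))
print("median:", median(data), "->", median(with_outlier))
print("mode:  ", mode(data), "->", mode(with_outlier))
```

The single outlier pulls the mean far from the bulk of the data, while the median barely moves and the mode does not change at all.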
Usefulness?
 Mean
o Not resistant to outliers. Many inferential methods based on the mean do not produce reliable results when outliers (especially extreme outliers) are in a data set.
o Utilizes all of the observations in a data set, hence it takes advantage of as much information as possible.
 Median
o Resistant to outliers.
o Does not utilize all of the data.
o Inferential methods based on the median can be used when outliers exist.
 Mode
o Used with categorical data/discrete variables with a small number of unique values.
o No inferential methods.
Important Related Measures
 Trimmed mean: mean is calculated after removing the smallest p% and largest p% of the data values. (Resistant to outliers, uses more data than the median)
o Dangerous to do if the outlier is important
 Five number summary: consists of the minimum, lower (first) quartile, median, upper (third) quartile, and maximum. Divides the ordered data set into four parts, each containing 25% of the data.
o Put the data in order, cut it in half, then cut each half in half
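The two measures above can be sketched as follows. The data are hypothetical, the helper names are made up for this example, and the quartile rule is the "cut it in half, cut each half in half" method from the notes; other accepted quartile methods can give slightly different values. The trimmed mean here drops a count `k` from each end rather than a percentage.

```python
from statistics import mean, median

def five_number_summary(values):
    """Min, lower quartile, median, upper quartile, max via the 'median of halves' rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]      # middle value is left out when n is odd
    upper_half = ordered[n - half:]
    return (ordered[0], median(lower_half), median(ordered), median(upper_half), ordered[-1])

def trimmed_mean(values, k):
    """Mean after removing the k smallest and k largest observations (requires 2*k < n)."""
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])

# Hypothetical data with one large value (assumption, for illustration only).
data = [2, 4, 5, 7, 8, 10, 11, 30]
print(five_number_summary(data))
print(mean(data), trimmed_mean(data, k=1))
```

Trimming one value from each end removes the 30 (and the 2), so the trimmed mean sits closer to the median than the ordinary mean does.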
MAO Example – Monoamine oxidase in 18 schizophrenia patients (nmoles benzylaldehyde product/10^8 platelets)
Median: midpoint of the two middle values, 8.7 and 9.7, so (8.7 + 9.7)/2 = 9.2
MAO Example (cont.)
 Five-number summary:
4.1 (min)
7.4 (lower quartile)
9.2 (median)
11.9 (upper quartile)
18.8 (max)
 Mode: 7.8 (occurred twice)
 5.56% (1 out of 18) Trimmed Mean: 9.6 (4.1 and 18.8 were removed, avg. of remaining 16 was calculated)
Measures of Variability
 Standard Deviation – “average” distance between each observation and the mean.
 Interquartile Range – distance between the lower and upper quartiles
o Spans the middle 50% of the data
 Range – distance between the minimum and the maximum
Usefulness?
 Standard Deviation
o Not resistant to outliers because the mean is not resistant
o Utilizes all of the observations in a data set.
 Interquartile Range
o Resistant to outliers
o Multiple accepted methods for calculating IQR
 Range
o Not resistant to outliers
o No idea what is going on between the extremes
MAO example revisited
Mean = 9.8056
Deviation: data value − mean (e.g., 4.1 − 9.8056 ≈ −5.7)
Standard deviation = 3.6183 (absolute standpoint)
Interquartile Range (IQR) = 11.9 − 7.4 = 4.5
Range = 18.8 − 4.1 = 14.7
MAO example revisited (cont.)
 Interpretations
 Standard deviation: On “average”, the MAO values for the 18 people in the sample are within 3.6183 units of the mean.
 IQR: The “middle” 50% of the MAO values fall within a 4.5-unit interval.
 Range: All of the data fall within a 14.7-unit interval.
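A sketch of how these three measures are computed. The IQR and range lines use the quartiles, minimum, and maximum quoted in the MAO example; the standard deviation is shown on a small hypothetical data set because the 18 raw MAO values are not listed here. Dividing by n − 1 (the sample standard deviation) is an assumption about the convention used.

```python
from math import sqrt
from statistics import stdev

def sample_standard_deviation(values):
    """Square root of the 'average' squared deviation from the mean (n - 1 in the denominator)."""
    n = len(values)
    m = sum(values) / n
    deviations = [v - m for v in values]  # deviation = data value - mean
    return sqrt(sum(d * d for d in deviations) / (n - 1))

# Values quoted in the MAO example.
iqr = round(11.9 - 7.4, 1)        # upper quartile - lower quartile
data_range = round(18.8 - 4.1, 1)  # max - min

# Hypothetical data (assumption, for illustration only).
toy = [2, 4, 4, 4, 5, 5, 7, 9]
print(iqr, data_range, sample_standard_deviation(toy), stdev(toy))
```

The hand-written function and `statistics.stdev` agree, which is a quick check that the deviation-based description matches the library's definition.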
Death Rates (slide 20) Revisited
Match the summary statistics with the four histograms.
Boxplot (aka Box-and-whisker plot)
 Visual representation of the five-number summary
 Displays outliers, if they exist
 Shows the max, min, quartiles, and median
 Dot = outlier
 If there are outliers, the whiskers end at the min and max of the data WITHOUT the outliers
Z-score
 Puts values on the same scale so you can compare them
 Mississippi’s colorectal cancer death rate is 16.5 per 100,000 women, while Kentucky’s rate is 23.4 per 100,000 men. A Mississippi politician states that their rate for women isn’t as bad as Kentucky’s for men because their rate is 2.484 above average, while Kentucky’s is 3.624 above average. Is this an appropriate comparison?
 Expect the male rates to be more spread out than the female rates because the SD of the male rates is larger
 The Mississippi rate is 1.682 standard deviations above the mean rate for females, while the Kentucky rate is 1.605 standard deviations above the mean rate for males. Hence, the Mississippi rate is farther above average for females than the Kentucky rate is for males.
 The farther a z-score is from 0, the farther the observation is from the average
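The comparison can be checked with the usual z-score formula, z = (value − mean) / SD. The means and standard deviations are not stated in the notes, so the sketch below back-calculates them from the quoted distances above average and z-scores; those derived values are approximations, not figures from the original data.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

def implied_sd(distance_above_mean, z):
    """Back out the SD from a known distance above the mean and its z-score."""
    return distance_above_mean / z

# Figures quoted in the notes.
ms_rate, ms_distance, ms_z = 16.5, 2.484, 1.682   # Mississippi, women
ky_rate, ky_distance, ky_z = 23.4, 3.624, 1.605   # Kentucky, men

# Derived (assumed) quantities.
ms_mean, ky_mean = ms_rate - ms_distance, ky_rate - ky_distance
ms_sd, ky_sd = implied_sd(ms_distance, ms_z), implied_sd(ky_distance, ky_z)

print(f"female rates: mean={ms_mean:.3f}, SD={ms_sd:.3f}, z={z_score(ms_rate, ms_mean, ms_sd):.3f}")
print(f"male rates:   mean={ky_mean:.3f}, SD={ky_sd:.3f}, z={z_score(ky_rate, ky_mean, ky_sd):.3f}")
```

The male rates have the larger implied SD, so a raw gap of 3.624 for men is less unusual than a raw gap of 2.484 for women. That is why comparing raw distances from the average is misleading.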
Segue to Inference
Sampling Distribution
Stereotypical “survival” curve – will dive down like that
 Suppose we choose 40 of these patients at random and calculate the average survival time of the 40 who were selected. What values would we typically expect for the average of the 40? Would it be possible to obtain an average between 1600 and 1700 days? How about between 0 and 100 days? Which would be more likely?
• An average between 0 and 100 days would be more likely
Distribution of mean survival time of 40 patients
 10,000 random samples of 40 patients
 NOT ONE sample has an average between 0 and 100 days, but that interval is closer to the observed averages than 1600–1700 days is
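The resampling idea can be sketched in Python. The real patient survival times are not available here, so the population below is a hypothetical right-skewed (exponential) one with a mean near 343 days; that choice, the population size of 5,000, and the fixed seed are all assumptions for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed population of survival times in days (assumption).
population = [random.expovariate(1 / 343) for _ in range(5000)]
population_mean = sum(population) / len(population)

def simulate_sample_means(population, sample_size, n_samples):
    """Draw n_samples random samples and return the mean of each one."""
    means = []
    for _ in range(n_samples):
        sample = random.sample(population, sample_size)
        means.append(sum(sample) / sample_size)
    return means

sample_means = simulate_sample_means(population, sample_size=40, n_samples=10_000)

in_0_100 = sum(1 for m in sample_means if 0 <= m <= 100)
in_1600_1700 = sum(1 for m in sample_means if 1600 <= m <= 1700)
print(f"population mean: {population_mean:.1f}")
print(f"mean of the 10,000 sample means: {sum(sample_means) / len(sample_means):.1f}")
print(f"sample means in 0-100: {in_0_100}, in 1600-1700: {in_1600_1700}")
```

Individual survival times vary widely, but averages of 40 patients cluster around the population mean, so averages in either extreme interval are rare.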
Question?
Two experimental treatments for late-stage pancreatic cancer have been developed. Each one is tested on a cohort of forty patients. The first cohort has an average survival time of 370 days, while the second cohort has an average survival time of 450 days. Does either of these results provide evidence that the average survival time using the new treatment is greater than 343 days?
1,897 out of the 10,000 average survival times were 370 days or higher. (P = .1897)
31 out of the 10,000 average survival times were 450 days or higher. (P = .0031)
Implication
If the average survival time for experimental treatment A is 343 days (no different from the status quo), approximately 19% of all samples of 40 patients will have an average survival time of 370 days or longer.
If the average survival time for experimental treatment B is 343 days (no different from the status quo), approximately 0.3% of all samples of 40 patients will have an average survival time of 450 days or longer.
Do the data seem “out of line” enough with the status quo to believe that the new treatment does increase survival time?
(Ball pit analogy…)
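The proportions quoted above are empirical P-values: the share of simulated sample means that are at least as large as the observed one. A minimal sketch follows; the function name and the ten-value list are hypothetical, and only the 1,897 out of 10,000 figure comes from the notes.

```python
def empirical_p_value(simulated_means, observed_mean):
    """Share of simulated sample means that are at least as large as the observed mean."""
    at_least_as_extreme = sum(1 for m in simulated_means if m >= observed_mean)
    return at_least_as_extreme / len(simulated_means)

# Tiny hypothetical list of simulated averages (assumption, for illustration only).
toy_means = [310, 325, 340, 350, 362, 371, 380, 333, 347, 402]
toy_p = empirical_p_value(toy_means, observed_mean=370)  # three of the ten are >= 370

# The count reported in the notes for an average of 370 days or higher.
p_treatment_a = 1897 / 10_000
print(toy_p, p_treatment_a)
```

The smaller this share, the more "out of line" the observed average is with what the status quo would produce.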
Basics of a Hypothesis Test
 Null hypothesis (H0): Statement assumed to be true. Typically implies “no difference”, “no impact”, “maintaining the status quo”. If the null hypothesis is true, we know how our statistic should behave.
 Alternative hypothesis (Ha): Statement of what a statistical hypothesis test is set up to establish. Also called the “research hypothesis.”
Hypothyroidism Analogy
 H0: patient doesn’t have hypothyroidism
 Ha: patient does have hypothyroidism
 Data: blood test
 If H0 is true, TSH and thyroxine should fall within their normal ranges.
 Layman’s view: If TSH is high enough, and thyroxine is low enough, H0 will be rejected and a diagnosis of hypothyroidism will be made.
P-value
 Probability of observing a statistic that is at least as extreme as the one produced by the sample, assuming that the null hypothesis is true.
 Logic: lower P-values imply that it is harder to obtain a specific result if the null hypothesis is true. Hence, lower P-values correspond to having more evidence against the null hypothesis.
Significance Level (α)
 Threshold for concluding whether the null hypothesis is false.
 Maximum acceptable error rate for mistakenly concluding that the alternative hypothesis is true, when, in fact, the null hypothesis is true.
 From a medical test standpoint, α is the false positive rate.
• Most common α values: .10, .05, .025, .01
• If you pick α = .05 and the null hypothesis is true, then 5% of the time you will get a significant result due to chance alone
THE RULE
 If P-value ≤ α, reject H0. If not, fail to reject (do not reject) H0.
 If we reject H0, we say that the result is statistically significant at the (insert α) level.
Why “Fail to Reject”?
 A court trial is analogous to a hypothesis test, with
 H0: defendant is innocent
 Ha: defendant is guilty
 If there is enough evidence against the defendant, the jury rejects the null, returning a guilty verdict. If there isn’t enough evidence, the jury returns a “not guilty” verdict, not an “innocent” verdict. The purpose of the trial is not to prove innocence, it is to “prove” guilt.
 Rejecting H0 is GUILTY but not rejecting H0 is NOT GUILTY
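The decision rule and its wording can be sketched as a small function. It assumes the rule is "reject H0 when the P-value is at most α"; the function name and the default α = .05 are choices made for this example. The two P-values are the ones from the pancreatic cancer example.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 if P-value <= alpha, otherwise fail to reject H0."""
    if p_value <= alpha:
        return "reject H0"
    # Like a "not guilty" verdict: not enough evidence, which is not proof that H0 is true.
    return "fail to reject H0"

for label, p in [("treatment A", 0.1897), ("treatment B", 0.0031)]:
    print(f"{label}: P = {p} -> {decide(p)}")
```

The function never returns "accept H0", mirroring the courtroom analogy: a jury returns "not guilty", not "innocent".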
The Result Is Significant. Now What?
 Pancreatic Cancer Treatment B
• P-value = .0031
 We have concluded that the new treatment increases survival, on average. Does this mean it increases survival for everyone? Does it mean that more people are cured, but the side effects result in more early deaths? The result should create more questions to be examined.
Other Things You Might See
 Test Statistic – measure of how different the relevant statistic(s) is/are from what is specified in the null hypothesis. In general, values that are farther away from zero imply a greater difference, but what is classified as being “far away” depends on a number of factors, which is why it is easier to work with the P-value.
 Power: The power of a hypothesis test is the probability that you will reject H0 when the statement in Ha is true. In some cases, the value β will be reported. In this setting, β is the probability of failing to reject H0 when, in fact, Ha is true.
• From a medical test standpoint, the power of a test is equivalent to the sensitivity of a test. Hence, β is the probability of a false negative.
• All others constant, α and β are inversely related.
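To see the inverse relationship between α and β, here is a sketch for one specific, simplified setting: a one-sided z-test of a mean with a known standard deviation. That setting, the function name, and every number in it (true mean 400, SD 200, n = 40) are assumptions for illustration; only the null value of 343 days echoes the earlier example.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(true_mean, null_mean, sd, n, alpha):
    """Probability of rejecting H0: mean = null_mean (vs. mean > null_mean) when the mean is true_mean."""
    standard_normal = NormalDist()
    critical_z = standard_normal.inv_cdf(1 - alpha)      # cutoff for rejecting H0
    shift = (true_mean - null_mean) / (sd / sqrt(n))     # true effect in standard-error units
    return 1 - standard_normal.cdf(critical_z - shift)

for alpha in (0.10, 0.05, 0.01):
    power = power_one_sided_z(true_mean=400, null_mean=343, sd=200, n=40, alpha=alpha)
    print(f"alpha={alpha:.2f}  power={power:.3f}  beta={1 - power:.3f}")
```

With everything else held fixed, demanding a smaller α (fewer false positives) lowers the power and raises β (more false negatives).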
Strategy to Accelerate or Augment the Antidepressant Response and for An Early Onset of SSRI Activity. Adjunctive Amisulpride to Fluvoxamine in Major Depressive Disorder
 Abstract: The topic of early response to antidepressant treatment has been extensively studied in major depressive disorder (MDD). We serendipitous observed an increase tolerability, a rapid response to therapy and an early onset of antidepressant fluvoxamine activity when associated with amisulpride in patients with major depressive disorder. The purpose of this study was to investigate our preliminary observations.
Fluvoxamine Study (cont.)
 20 women (mean age 51.3 years) with DSM-IV TR [23] diagnostic criteria for major depressive disorder and a Hamilton Depression Rating Scale (HDRS) [24–26] score higher than 20.
 Exclusion factor was the age under 35 years.
 Each patient was given fluvoxamine (100mg/day) and amisulpride (50mg/day) throughout the 6-week trial.
 Clinical symptoms were evaluated by using Hamilton Depression Rating Scale (HDRS) [24–26] at the end of week 1, 2, 3, and 6.
Output Table
 Comparing average HDRS at different time periods, looking to see if the average HDRS score is different from the average score at the beginning of the six-week period. The P-values imply that they are different at those times.
 A separate analysis stated “The ANOVA one way for repeated measures carried out on the basis of the HDRS score at baseline and at week 1, 2, 3 and 6 stage expressed a statistically significant improvement of depressive symptoms (F=4.5; DF 9,80,4,76,99; P < 0.00001).”
One last note about P-values
 Some articles will not state the actual P-value. Instead, you may see something akin to the following.
• The P-value is larger than .10; result is insignificant at commonly used levels
• Result significant at .10, but not at .05
• Result significant at .05, but not at .01
• Result significant at .01, but not at .001
(Some journals will differentiate at a .025 or a .02 level, but that information will be on the journal’s website.)
Confidence Intervals
 Back to the pancreatic cancer example:
 We concluded that the mean survival time is greater than 343 days, but this does not give us any additional information about the actual value of the mean survival time.
 A confidence interval provides us with an estimate of a population value (parameter) with a specified level of confidence that the interval contains the parameter we are trying to estimate.
 Confidence intervals can be two-sided (most common) or one-sided.
 A confidence interval utilizes information about the variability of a statistic, such as the standard deviation of the 10,000 sample means, along with the desired level of confidence, to produce a margin of error.
 Many commonly used confidence intervals have the form: estimate ± margin of error
Margin of Error
 Important properties
• As sample size increases, margin of error decreases (interval gets narrower)
• As the confidence level increases, margin of error increases (interval gets wider)
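Both properties can be seen in one common special case: a two-sided interval for a mean when the standard deviation is treated as known, where the margin of error is z × SD / √n. That formula choice, the function names, and the numbers (estimate 450, SD 200) are assumptions for illustration, not values from the notes.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sd, n, confidence):
    """z * sd / sqrt(n) for a two-sided interval at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / sqrt(n)

def confidence_interval(estimate, sd, n, confidence):
    """Interval of the form estimate +/- margin of error."""
    m = margin_of_error(sd, n, confidence)
    return estimate - m, estimate + m

# Hypothetical estimate and standard deviation (assumption, for illustration only).
for n in (40, 160):
    for confidence in (0.90, 0.95, 0.99):
        low, high = confidence_interval(estimate=450, sd=200, n=n, confidence=confidence)
        print(f"n={n:3d}  {confidence:.0%} CI: ({low:.1f}, {high:.1f})")
```

In this formula, quadrupling the sample size halves the margin of error, while moving from 95% to 99% confidence widens the interval.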
Confidence Level
 Common confidence levels: 90%, 95%, 98%, 99%
 Relationship to the significance level?
 Interpretation of a 95% confidence level for a population mean: If we were to take repeated random samples of a fixed size and calculate a 95% confidence interval for the population mean, on average, 95% of the resulting intervals would include the value of the population mean.
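The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a normal population with a known standard deviation (mean 100, SD 15, samples of 40), none of which comes from the notes, and simply counts how often the 95% interval captures the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population settings (assumption, for illustration only).
TRUE_MEAN, SD, N = 100.0, 15.0, 40
Z_95 = NormalDist().inv_cdf(0.975)

def coverage(n_intervals):
    """Share of simulated 95% intervals that contain the true population mean."""
    margin = Z_95 * SD / sqrt(N)
    hits = 0
    for _ in range(n_intervals):
        sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
        estimate = sum(sample) / N
        if estimate - margin <= TRUE_MEAN <= estimate + margin:
            hits += 1
    return hits / n_intervals

share = coverage(2000)
print(f"share of intervals containing the true mean: {share:.3f}")
```

The share lands close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the long-run behavior of the procedure.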
Thyroid Cancer Example
 Confidence interval for the hazard ratio (similar to, but not the same as, relative risk). A value of one implies that the two groups have the same hazard rate.
 95% CI for the hazard ratio for the two age groups is (1.925, 5.870). Estimated hazard of dying within 10 years is approximately two to six times as high for the over-45 group as for the under-45 group.
 95% CI for the hazard ratio for the two sex groups is (.902, 2.499). Values below one imply a lower hazard rate of dying within 10 years for one sex, while values above one imply a lower hazard rate for the other sex. The fact that 1 falls within the interval implies that the hazard rates are not significantly different for males and females.
 Above interval looks at the hazard ratio for thyroid cancer patients with microscopic extensions versus without microscopic extensions. Again, the confidence interval (0.714, 2.710) contains 1, so there isn’t a significant difference in the 10-year death rates. The last output reveals a similar result based on whether the patient had a radioactive iodine treatment.
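The reading rule used in all three cases is the same: check whether the interval contains 1, the value meaning "same hazard rate in both groups". A minimal sketch using the three intervals quoted above (the function name and labels are made up for this example):

```python
def contains_no_difference(lower, upper, null_value=1.0):
    """True if the interval includes the 'no difference' value (1 for a hazard ratio)."""
    return lower <= null_value <= upper

# 95% confidence intervals for hazard ratios quoted in the notes.
intervals = {
    "age (over 45 vs. under 45)": (1.925, 5.870),
    "sex": (0.902, 2.499),
    "microscopic extension (with vs. without)": (0.714, 2.710),
}

for label, (lower, upper) in intervals.items():
    if contains_no_difference(lower, upper):
        verdict = "contains 1: no significant difference"
    else:
        verdict = "excludes 1: significant difference"
    print(f"{label}: ({lower}, {upper}) -> {verdict}")
```

Only the age comparison excludes 1, which matches the conclusions stated for each interval.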
Correlation
 Oftentimes, we are interested in how the values of variables change with one another. One of the primary statistical measures of such an association is called correlation. The most commonly used version of correlation is called the Pearson product-moment correlation coefficient (r), which measures how close an association between two quantitative variables is to being linear.
Properties of r
• −1 ≤ r ≤ 1
• If r = 1 or r = −1, the data are perfectly linear.
• If the association is positive, r > 0. If the association is negative, r < 0.
• If r = 0, there is no linear association, but there could be a strong nonlinear association.
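These properties can be verified with a direct implementation of the Pearson formula. The data sets are hypothetical and chosen so that each property shows up cleanly; `pearson_r` is a helper written for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (assumption, for illustration only).
xs = [-2, -1, 0, 1, 2]
rising_line = [2 * x + 1 for x in xs]   # perfectly linear, positive slope
falling_line = [-3 * x for x in xs]     # perfectly linear, negative slope
parabola = [x * x for x in xs]          # perfect but nonlinear relationship

print(pearson_r(xs, rising_line), pearson_r(xs, falling_line), pearson_r(xs, parabola))
```

The parabola case is the important one: y is completely determined by x, yet r is 0 because the relationship is not linear.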
Other Information About r
 Versions exist for other types of variables.
 A major use of r in medical research is to look for relationships between variables.
 Hypothesis tests exist for correlation. The primary null hypothesis is that r = 0 (that there is no linear association between the variables). Smaller P-values provide stronger evidence that there is a linear association between the variables.
Correlation Isn’t Causation!
 One of the biggest mistakes is when someone believes that when two variables are correlated, changing the value of one variable will cause a change in the other variable. Finding out that two variables are correlated should result in questions about the biological, chemical, and/or physical link between the variables.
Example
 Children ages 3–10 had the length of their feet measured, as well as their reading ability, based on their lexile score. The correlation between the two variables was close to one, which implies that kids with longer feet have higher reading levels. As a result, government agencies began awarding grants to scientists to research how to increase the length of children’s feet.
Serious Example
 Pre-tumor exercise decreases breast cancer in old mice in a distance-dependent manner
• A negative correlation was observed between daily distance ran, prior to tumor injection, and absolute tumor mass measured at necropsy (Pearson’s r = −0.89, P = 0.0066).
• A correlation was also observed between distance ran before tumor implant and the histological score for mitotic index (Pearson’s r = −0.85, P = 0.034).
Breast Cancer/Exercise (cont.)
 Runners showed an increased respiratory exchange ratio during the light cycle (P = 0.029) suggesting that voluntary running shifted resting substrate metabolism toward glucose oxidation, relative to lipid oxidation.
 The observations from this study indicate that running longer distances is associated with decreased breast tumor burden in old mice, suggesting that physiological factors generated by exercising before tumor onset are protective against tumor progression.
 Most important statement! The mechanisms for this protective effect are not known, but the data show that older mice are useful models to address specific questions in cancer research and support further studies on the ability of exercise training to protect older women at risk for breast cancer.