
● Discuss the significance of Statistics for Physicians

● Suggest study strategies for learning Statistics

● Present the role of Statistics in the scientific process

● Review basic concepts of Statistics

● Introduce methods of Exploratory Data Analysis

● “Amazingly, it is widely considered acceptable for medical researchers to be ignorant of statistics. Many are not ashamed (and some seem proud) to admit that they don't know anything about Statistics.”¹

● “It may not be expected from doctors to be expert in statistics but they should be made capable of understanding the basic statistical methodology.”²

● “Medical students may not like statistics, but as good doctors they will have to understand it.”³

1. Altman DG. The scandal of poor medical research. BMJ. 1994; 308: 283-4.
2. Singh G. Medical Science without Statistics. The Internet Journal of Healthcare Administration. 2006; 4:2.
3. Chen J. Lecture: Advice to GCRC & Surgery Fellows and Residents, SBU 2004.

● Statistics: theory & methodology for the collection, organization, analysis, interpretation & presentation of data.

● Descriptive Statistics: discipline of quantitatively describing the features of data.

  o Collecting, Organizing, Summarizing, Presenting Data

● Inferential Statistics: deals with drawing conclusions from data.

  o Inferences, Hypothesis Testing, Relationships, Predictions

Ability to understand:

● the value of published Medical Research.

● the role of Statistics in Medical Business.

In the past, the USE of Statistics was its most significant aspect.

Today, the MISUSE of Statistics in research has become a concern.

[Figure labels: MEDIA, PHARMA]

● Statistics is an essential aspect of modern science.

● Before Statistics, science was perceived as the process of developing absolute knowledge through observations.

● In contrast, Statistics is based on the notion that scientific knowledge is not absolute.

● Hence, uncertainty & error are part of science.

● The only real things in science are distributions of numbers.

● Probability theory is used to interpret those distributions.

● Statistics reflects acts of interpretation, not irrefutable facts.

the Wilcoxon rank sum test; Poisson regression models; the Bayesian estimates; Wald χ² statistics; Cox proportional hazards; compared using t tests; repeated-measures ANOVA; adjusted hazard ratios; 2-stage statistical model; 95% confidence interval; the degrees of freedom; odds ratio; Kaplan-Meier method; Pearson χ²; Fisher exact test; to have 90% power; Mann-Whitney test; a 2-tailed α level of less than .05; the log-rank test; a 2-factor analysis of variance; χ² tests; the Z test; logistic regression models; stratified Mantel-Haenszel analysis

Source: JAMA Vol. 292 (19): Six Original Contribution Papers

● In developed countries, much of what laymen know about medicine is gleaned from the media.

● Unfortunately, the more frightening an event is, the more “newsworthy” it is.

● The Statistical Analysis of Research Studies is complex. Regrettably, it tends to be oversimplified & sensationalized.

Data dredging (data fishing, data snooping, equation fitting) is the inappropriate use of statistics to uncover misleading relationships.

● The SIMPLEST FORM OF STATISTICS will suffice in most well-designed studies.

● Therefore, a revision of the study design should occur before the use of more “sophisticated” analysis.

● Similarly, any study that uses overly complex statistical methods should be approached with caution.

Source: Bhattacharya K. University of Oxford Introduction to Statistics for Medical Students. 2004

● As opposed to the past, we now live in the QUANTITATIVE ERA.

● In the clinical environment everything is measured.

● All aspects of physicians’ work are being statistically analyzed & compared to benchmarks such as evidence-based guidelines.

● Any physician who does not understand this WILL BE CRUSHED.

Computerization of medical business facilitated:

● Automated surreptitious data gathering

● Data Mining

● Physician Performance analyses

● Outcome & Cost-Effectiveness analyses

● Practitioner vs. Peer-Group comparison analyses

● More accurate Actuarial analyses

Statistics is challenging for everybody. Physicians may find it especially challenging, as Statistics is:

● Math-based. It has many rigorous quantitative aspects rooted in mathematics. Most physicians are not used to studying math-based subjects.

● Time-consuming. It is a tedious subject requiring a tremendous time commitment.

● Seemingly non-essential. It appears not to be an everyday-use topic. (“I can get away w/o studying it”)

Statistics is not a spectator sport.

● Get Motivated by understanding why you need Statistics.

● Learn Actively: it cannot be passively “crammed”:

  o Use pen & paper: for solving problems & reflecting on ideas

  o Make your own scenarios

● Study Deliberately as:

  o few words & symbols can mean a lot in statistics

  o it may be necessary to read a topic many times

● Study Incrementally:

  o Statistics is based on a small number of principles

  o Those must be memorized & understood first

  o It is futile to “look up” an advanced test (e.g. one used in a research paper) w/o knowing those essentials

● Assemble Resources:

  o There is no single “best statistical manual”

  o It pays to prepare a set of personalized references

Source: University of Oxford & LISA: Laboratory for Interdisciplinary Statistical Analysis at Virginia Tech

● Population: all elements to be studied

  o Parameter: characteristic of the Population (e.g. Mean, Standard Deviation)

● Sample: a subset of the Population.

  o Statistic: characteristic of the Sample (not to be confused with Statistics)

VARIABLE: any measurable attribute that differs from one unit to another.

● Quantitative = Numerical

  o Continuous: can take any value within a range of numbers. E.g.: Time

  o Discrete: can take only separate, countable values. E.g.: Number of children in a family

● Qualitative = Categorical

  o Ordinal: can be ordered (ranked). E.g.: Clothing Size: S, M, L, XL

  o Nominal: cannot be ordered. E.g.: Colors

DATA: values that variables can assume

● Univariate: analysis of one variable

● Bivariate: analysis of two variables

● Multivariate: analysis of many variables

● SAMPLE SELECTION & DATA COLLECTION (SSDC)

● INITIAL DATA MANIPULATION (IDM)

  o Data Formatting

  o Data Quality Control

● EXPLORATORY DATA ANALYSIS (EDA)

  o Tabular, Numerical, Graphical data summaries

  o Choosing ways of Definitive Analysis

● DEFINITIVE DATA ANALYSIS (DDA)

  o Final Inferential Data Analysis

● PRESENTATION of CONCLUSIONS (PoC)

  o Concise graphical & tabular summaries

  o Statement of conclusions

● Understanding the phases of Statistical Analysis (SA) is important not only for performing research.

● It is essential for the critical appraisal of published studies.

● This truth is frequently overlooked.

GOALS:

● Descriptive INFERENCE (DI): describe a population, using information from a sample

● Analytical INFERENCE (AI): describe relationships between variables, using a sample, assuming that it can be generalized to a population.

SAMPLING:

● Simple Random

● Stratified

● Cluster

● Multistage

SIMPLE RANDOM Sample

● subset of individuals chosen RANDOMLY from a population

● each individual has the same probability of being chosen

STRATIFIED Sample

● STRATA: homogeneous nonoverlapping subgroups

● STRATIFICATION: dividing population into strata

● STRATIFIED Sample is obtained by simple random sampling from each stratum

CLUSTER Sample

● CLUSTERS: natural heterogeneous subgroups representative of population

● CLUSTERING: identifying clusters in population

● CLUSTER Sample is obtained by randomly selecting whole clusters and including the elements of the selected clusters

MULTISTAGE Sample

● a form of cluster sampling

● used when using all the sample elements in all the clusters is not feasible

● instead, the researcher randomly selects elements from the clusters
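The four sampling schemes can be sketched in a few lines of Python. This is a minimal illustration, not a survey-sampling tool: the patient IDs, the two strata, the five "clinic" clusters, and all function names are hypothetical, and cluster sampling is taken here in its common textbook sense (randomly chosen whole clusters).

```python
import random

def simple_random_sample(population, k, rng):
    # Every unit has the same chance of being chosen.
    return rng.sample(population, k)

def stratified_sample(strata, k_per_stratum, rng):
    # Simple random sampling inside each stratum.
    return {name: rng.sample(units, k_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly pick whole clusters and keep all of their units.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

def multistage_sample(clusters, n_clusters, k_per_cluster, rng):
    # Randomly pick clusters, then randomly pick units within them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], k_per_cluster) for name in chosen}

rng = random.Random(0)                      # fixed seed for reproducibility
population = list(range(1, 101))            # hypothetical patient IDs 1..100
strata = {"under_50": population[:60], "50_plus": population[60:]}
clusters = {f"clinic_{i}": population[i * 20:(i + 1) * 20] for i in range(5)}

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
multi = multistage_sample(clusters, 2, 5, rng)
```
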

Putting a Data Set in order, making it usable:

● Data Formatting

● Checking Quality of:

  o Data (outliers?)

  o Implementation of Design

● Basic Characteristics of data

OUTLIERS: data points that deviate remarkably from the majority of the sample.

DISTRIBUTION: the pattern of occurrence of the various values of a variable. It is a listing or function showing all the possible values of the data and how often they occur.

● POPULATION Distribution: distribution of values for all units in the population

● EMPIRICAL Distribution: distribution of values for the units in a sample.

It is assumed that the Empirical Distribution is a good representation of the Population Distribution.

● Distribution of categorical data shows the number & percentage of individuals in each group.

● Distribution of numerical data is typically presented using graphs & charts to examine:

  o the shape,

  o center,

  o amount of variability in the data.

NORMAL (GAUSSIAN) DISTRIBUTION

● A PROBABILITY DISTRIBUTION assigns a probability to each measurable subset of the possible outcomes of a procedure.

● The Normal (Gaussian) distribution is a very common continuous probability distribution.

● A continuous probability distribution is a probability distribution that has a pdf.

● pdf: the probability density function, or density, of a continuous random variable is a function that describes the relative likelihood for this variable to take on a given value.

There are myriad probability distributions. Most are related to each other, and ultimately to the Normal.

● GOAL: to reduce the information contained in a data set to a few key indicators.

● APPROACH: summarization of the data with visual methods to reveal trends & patterns.

● METHODS: depend on the type of data

● TABULAR

● NUMERICAL: Quantiles & Quartiles, Median, Mean, Mode, Spread or Dispersion, Interquartile Range, Standard Deviation, Coefficient of Variation

● GRAPHICAL

[Figure: example data set with its numerical summaries (quartiles, IQR, mean, variance, standard deviation, CV)]

● The EDA methods to be presented in this section are important not just for researchers.

● Any reader of scientific literature or business statistical analyses will encounter the methods discussed here.

● Familiarity with them is essential for one’s ability to critically appraise any statistics-based document.

FREQUENCY DISTRIBUTION: an organization of the raw data in tabular form using classes & frequencies

● Frequency: the number of times a value occurs in a data set

● Relative Frequency: frequency counts expressed as percentages of the total observations

● Cumulative Frequency: the sum of the frequencies for all values at or below the given value

● Cumulative Relative Frequency: the sum of the relative frequencies for all values at or below the given value
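The four quantities above can be computed together in one pass. The following is a small sketch using only the Python standard library; the function name `frequency_table` and the seven-value data set are illustrative assumptions, not part of the original slides.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """Return rows of (value, frequency, relative %, cumulative, cumulative %)."""
    counts = Counter(values)
    n = len(values)
    classes = sorted(counts)                       # distinct values, ascending
    freqs = [counts[c] for c in classes]           # frequency of each value
    rel = [100 * f / n for f in freqs]             # relative frequency in percent
    cum = list(accumulate(freqs))                  # running total of frequencies
    cum_rel = [100 * c / n for c in cum]           # running total in percent
    return list(zip(classes, freqs, rel, cum, cum_rel))

data = [1, 3, 6, 4, 3, 5, 3]                       # small illustrative data set
table = frequency_table(data)
for value, f, r, cf, crf in table:
    print(f"{value}: f={f}, rel={r:.1f}%, cum={cf}, cum rel={crf:.1f}%")
```

The last row always shows a cumulative frequency equal to the number of observations and a cumulative relative frequency of 100%.
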

● Useful for categorical data.

● It presents the distribution of values by showing their frequencies.

Contingency table (cross tab): used to analyze the relationship between two or more categorical variables.

Example: 100 individuals are randomly sampled from a population as part of a study of sex differences in handedness.
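A cross tabulation for such a study could be built as sketched below. The cell counts are hypothetical numbers invented for illustration; they are not the data of the study mentioned above, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical counts invented for illustration (NOT data from the study above).
hypothetical_counts = {
    ("male", "right-handed"): 43,
    ("male", "left-handed"): 9,
    ("female", "right-handed"): 44,
    ("female", "left-handed"): 4,
}
# Expand to one (sex, handedness) record per individual, as raw data would look.
records = [pair for pair, count in hypothetical_counts.items() for _ in range(count)]

table = Counter(records)                                  # cell counts
sexes = sorted({sex for sex, _ in records})               # row categories
hands = sorted({hand for _, hand in records})             # column categories
row_totals = {s: sum(table[(s, h)] for h in hands) for s in sexes}
col_totals = {h: sum(table[(s, h)] for s in sexes) for h in hands}

# Print the table with marginal totals.
print("sex".ljust(8) + "".join(h.ljust(14) for h in hands) + "total")
for s in sexes:
    cells = "".join(str(table[(s, h)]).ljust(14) for h in hands)
    print(s.ljust(8) + cells + str(row_totals[s]))
print("total".ljust(8) + "".join(str(col_totals[h]).ljust(14) for h in hands) + str(len(records)))
```
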

● Quantiles & Quartiles

● Location

  o median

  o mean

  o mode

● Spread or Dispersion

  o Range

  o Interquartile range

  o Variance

  o Standard deviation

  o Coefficient of variation

● Skewness

  o Coefficient of Skewness

● Kurtosis

  o Coefficient of Kurtosis

● Covariance

● Correlation

  o Correlation Coefficients: Pearson’s CC, Spearman's rank CC

Simple Definition: QUANTILES: points taken at regular intervals that divide the data set into equal subsets.

Example of Formal Definition: the α-th sample quantile, denoted η(α), is the smallest value such that (100×α)% of the observations for the variable take values which are less than or equal to η(α).

● Quantiles are the data values (cut-off POINTS) marking the boundaries between subsets.

● Examples of specific quantiles:

  o 2-quantile: median

  o 4-quantiles: quartiles

  o 5-quantiles: quintiles

  o 100-quantiles: percentiles

● Common misconception: using the names of quantiles to denote the subsets they mark. These subsets should be called thirds, quarters, fifths, etc.

QUARTILES: three POINTS that divide the data set into four equal groups, each comprising a quarter of the data. A quartile is a type of quantile.

● Q1: First = lower = 25th percentile

  o splits off the lowest 25% of data

● Q2: Second = median = 50th percentile

  o cuts data set in half

● Q3: Third = upper = 75th percentile

  o splits off the highest 25% of data

Interquartile Range (IQR): the difference between upper and lower quartiles.

IQR = Q3 − Q1
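As a quick check of these definitions, the quartiles and the IQR can be computed with Python's standard `statistics.quantiles`. The nine-value data set is an assumption chosen for illustration. Note that several conventions for interpolating quartiles exist (here `method="inclusive"`), so other software may report slightly different cut points for the same data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # illustrative data set (an assumption)

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                           # interquartile range

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```
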

Finding the position of the value in the data set that best characterizes it.

● Median (X̃): value separating the “higher” half of a data set from the “lower” half

  o The median of {2,3,5,8,9} is 5

● Mean (X̄): the sum of the n numbers divided by n

  o The mean of {6,4,7,10,4} is (6+4+7+10+4)/5 = 6.2

● Mode (Mo): the most frequent value in the data set

  o The mode of {1, 3, 6, 4, 3, 5, 3} is 3
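The three worked examples above can be reproduced with Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

median_value = statistics.median([2, 3, 5, 8, 9])        # middle value of the sorted data
mean_value = statistics.mean([6, 4, 7, 10, 4])           # (6+4+7+10+4)/5
mode_value = statistics.mode([1, 3, 6, 4, 3, 5, 3])      # most frequent value

print(median_value, mean_value, mode_value)
```
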

● Mean is affected by outliers, median is not

● Median exhibits robustness against outliers

● Robustness: “the ability to resist”.

● Robust statistics: statistics with good performance for data drawn from a wide range of probability distributions & not affected by outliers

● Spread (dispersion) measures the degree to which the observed values are concentrated around a location measure.

● Smaller spread: values are tightly clustered around the center.

Measures of Spread:

● Range

● Interquartile range

● Variance

● Standard deviation

● Coefficient of variation

● RANGE: difference between the sample Maximum & Minimum.

  o The simplest measure of dispersion

  o Very sensitive to outliers

● INTERQUARTILE RANGE (IQR): the difference between upper and lower quartiles.

  o Less sensitive to outliers

VARIANCE (s²): measure of how far a set of numbers is spread out, i.e. how far the numbers are located from the mean.

n = number of observations; Xi = each of the data values; X̄ = mean

● s² is never negative

● s² = 0: no variation

● s² small: data close to X̄

● s² high: data far from X̄

Since Variance is expressed in squared units, it is difficult to interpret intuitively.

Standard Deviation (SD, s): square root of the Variance. It shows the extent of variation from the mean.

● s small: data close to X̄

● s high: data far from X̄

● Both s² & s depend on the units in which a variable is measured.

● They can be misleading when comparing variables measured in different units.
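The variance formula itself did not survive in this copy of the slides. The sketch below assumes the usual sample variance, s² = Σ(Xi − X̄)² / (n − 1), and takes s as its square root. The n − 1 denominator and the function name are assumptions. The data set is the one used earlier for the mean.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance with an n - 1 denominator (an assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [6, 4, 7, 10, 4]
s2 = sample_variance(data)          # in squared units of the data
s = math.sqrt(s2)                   # back in the original units

print(f"s^2 = {s2:.2f}, s = {s:.2f}")
```

For this data set the squared deviations from the mean 6.2 sum to 24.8, so s² = 24.8 / 4 = 6.2. A data set of identical values gives s² = 0.
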

COEFFICIENT: from Latin co (together) + efficere (to effect)

In Mathematics: number or other known factor (symbol) by which another number or factor is multiplied.

● E.g.: in the equation ax² + bx + c = 0

  o a is the coefficient of x²

  o b is the coefficient of x

In Statistics: measure of a specified characteristic of a phenomenon

Coefficient of Variation (CV): ratio of the SD to the Mean: CV = s / X̄

Relative SD (RSD): CV expressed as a percentage

s = Standard Deviation; X̄ = Mean

● CV<1: low variation (SD smaller than the mean)

● CV=1: SD equal to the mean

● CV>1: high variation (SD larger than the mean)

● CV has no units.

● It can be used for comparing dispersions of variables measured in different units.
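The unit-free property can be checked directly: changing the unit of measurement rescales the SD and the mean by the same factor, so their ratio stays the same. In this sketch the height values and the function name are hypothetical, and the sample SD (n − 1 denominator) is assumed.

```python
import statistics

def coefficient_of_variation(values):
    # Ratio of the sample standard deviation to the mean.
    return statistics.stdev(values) / statistics.mean(values)

heights_cm = [150, 160, 170, 180, 190]          # hypothetical heights in centimeters
heights_m = [h / 100 for h in heights_cm]       # the same heights in meters

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(f"SD: {statistics.stdev(heights_cm):.2f} cm vs {statistics.stdev(heights_m):.4f} m")
print(f"CV: {cv_cm:.4f} vs {cv_m:.4f}")
```
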

Skewness: deviation from symmetry with respect to a location measure. The coefficient of skewness (b1) is unit-free.

s = Standard Deviation; X̄ = Mean; n = number of observations; Xi = the data values

● b1=0: values distributed symmetrically around X̄

  o Tails are symmetric

● b1>0: positively skewed (right-skewed)

  o Longer tail for values > X̄

● b1<0: negatively skewed (left-skewed)

  o Longer tail for values < X̄
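The formula for b1 is missing from this copy of the slides. A common choice, assumed here, is the moment coefficient b1 = m3 / m2^(3/2), where m_k is the average of (Xi − X̄)^k over the n observations. The slides may have used a slightly different normalization, and the three data sets are invented for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]        # long tail above the mean
left_skewed = [-10, -2, -1, -1, -1]    # long tail below the mean

b1_sym = skewness(symmetric)
b1_right = skewness(right_skewed)
b1_left = skewness(left_skewed)
print(b1_sym, b1_right, b1_left)
```
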

KURTOSIS: the degree of peakedness of the distribution as compared to a Normal (Gaussian) Distribution. The coefficient of kurtosis (b2) is unit-free.

s = Standard Deviation; X̄ = Mean; n = number of observations; Xi = each of the data values

● b2>3: Leptokurtic

  o Peaked > Normal

● b2=3: Mesokurtic

  o Peaked as Normal

● b2<3: Platykurtic

  o Peaked < Normal
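As with skewness, the b2 formula is not preserved here. The sketch assumes the moment coefficient b2 = m4 / m2², which equals 3 for a Normal distribution. The simulated sample and the two small data sets are illustrative assumptions.

```python
import random

def kurtosis(values):
    """Moment coefficient of kurtosis, m4 / m2**2 (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2

rng = random.Random(0)                                   # fixed seed for reproducibility
normal_like = [rng.gauss(0, 1) for _ in range(100_000)]  # simulated Normal sample
flat = [-1, 1] * 50                                      # two-point data, no peak
heavy_tailed = [0] * 8 + [-5, 5]                         # sharp peak plus extreme values

b2_normal = kurtosis(normal_like)    # close to 3 (mesokurtic)
b2_flat = kurtosis(flat)             # below 3 (platykurtic)
b2_heavy = kurtosis(heavy_tailed)    # above 3 (leptokurtic)
print(round(b2_normal, 2), b2_flat, b2_heavy)
```
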

● Covariance is a measure of how much two random variables change together.

● Dependence is any statistical relationship between two random variables.

● Correlation refers to statistical relationships involving dependence.

Correlation does not imply causation!

COVARIANCE, cov(X,Y): measures association between two numerical variables

X, Y: variables; Xi, Yi: observations for unit i; X̄, Ȳ: means of the variables; n: number of observations

● cov(X,Y)=0: X&Y are UNCORRELATED (no linear association; this alone does not prove independence)

  o X&Y do not correspond linearly

● cov(X,Y)>0: X&Y POSITIVELY associated

  o greater values of X correspond w/ greater Y

● cov(X,Y)<0: X&Y NEGATIVELY associated

  o greater values of X correspond w/ smaller Y

● Sign (+/-) of cov shows the type of linear relationship between X&Y.

● The magnitude of the cov is hard to interpret, hence normalized cov is used.
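The covariance formula is not preserved in this copy. The sketch assumes the usual sample covariance, cov(X,Y) = Σ(Xi − X̄)(Yi − Ȳ) / (n − 1). All data sets are invented for illustration. The last one shows that a covariance of zero only rules out a linear association: there Y is completely determined by X.

```python
def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator (an assumed convention)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
cov_pos = covariance(xs, [2, 4, 6, 8, 10])     # Y rises with X
cov_neg = covariance(xs, [10, 8, 6, 4, 2])     # Y falls as X rises

centered = [-2, -1, 0, 1, 2]
squares = [x ** 2 for x in centered]           # Y = X², a perfect non-linear link
cov_zero = covariance(centered, squares)

print(cov_pos, cov_neg, cov_zero)
```
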

● NORMALIZATION: creation of scaled versions of statistics to allow comparisons in a way that eliminates the effects of certain influences (e.g. the units of measurement).

● Correlation Coefficients (CC): normalized versions of covariance

● CC measure the degree of correlation

● CC commonly used:

  o Pearson Correlation Coefficient

  o Spearman’s Rank Correlation Coefficient

PEARSON CORRELATION COEFFICIENT (r): measure of the linear correlation between variables X&Y.

A linear X,Y relationship is modeled best by a straight line.

r = cov(X,Y) / (sx · sy)

X, Y: variables; cov(X,Y): covariance; sx, sy: Standard Deviations

● r=−1: total NEGATIVE correlation

● r=0: NO linear correlation

● r=+1: total POSITIVE correlation

● r removes the dependence on the units by scaling the cov by the product of the SDs of X&Y

● r is not robust to: outliers, unequal variances, non-normality, & non-linearity
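A minimal sketch of r as the covariance scaled by the product of the standard deviations. The n − 1 factors in the covariance and in the SDs cancel, so only sums of deviations are needed. The data values and the function name are illustrative assumptions. The last lines show that r does not change when X is re-expressed in another unit.

```python
import math

def pearson_r(xs, ys):
    """Pearson r = cov(X,Y) / (sx * sy), computed from sums of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])        # points on a rising straight line
r_down = pearson_r(xs, [10, 8, 6, 4, 2])      # points on a falling straight line

noisy = [2.1, 3.9, 6.2, 7.8, 10.1]            # hypothetical measurements
r_noisy = pearson_r(xs, noisy)
r_rescaled = pearson_r([100 * x for x in xs], noisy)   # X in a different unit

print(r_up, r_down, round(r_noisy, 4), round(r_rescaled, 4))
```
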

● RANK: relative position in a graded group

● RANKING: transformation of data, in which values are replaced by their rank

Ranking of numerical dataset: { 3.4, 5.1, 7.3, 2.6, 8.9 }

| Value | Rank |
|-------|------|
| 8.9 | 5 |
| 7.3 | 4 |
| 5.1 | 3 |
| 3.4 | 2 |
| 2.6 | 1 |

SPEARMAN'S RANK CORRELATION COEFFICIENT (ρ): measure of monotonic dependence between variables X&Y

In a monotonic X,Y relationship, Y moves in one direction (↑ or ↓) as X moves, but the relationship is not necessarily linear.

ρ is calculated by applying the Pearson CC formula to the ranks of the data, not to the values. For a sample of size n, the n raw scores Xi, Yi are converted to ranks xi, yi.

ρ reflects the Monotone Trend (M.T.) between X&Y:

● ρ=+1: perfect increasing M.T.

  o +1>ρ>0: increasing M.T. (Y↑ when X↑)

● ρ=0: no M.T.

  o −1<ρ<0: decreasing M.T. (Y↓ when X↑)

● ρ=−1: perfect decreasing M.T.

● ρ is robust to outliers, unequal variances, non-normality, & non-linearity

● ρ is non-parametric, as its exact sampling distribution can be obtained w/o knowing the parameters of the joint probability distribution of X&Y.
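The rank-then-Pearson recipe can be sketched as follows. Giving tied values the average of their positions is a common convention assumed here, and the function names are illustrative. The example uses the five-value data set from the ranking slide and a cubic relationship, which is monotonic but not linear.

```python
import math

def ranks(values):
    """Rank from 1 (smallest); tied values share the average rank (assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    # Pearson formula applied to the ranks instead of the raw values.
    return pearson_r(ranks(xs), ranks(ys))

example_ranks = ranks([3.4, 5.1, 7.3, 2.6, 8.9])

xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]                    # monotonic, but not linear
rho = spearman_rho(xs, cubes)
r = pearson_r(xs, cubes)
print(example_ranks, rho, round(r, 3))
```

Here ρ reaches +1 because the ordering of Y matches the ordering of X exactly, while Pearson's r stays below 1 because the points do not lie on a straight line.
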


● Statistics is an essential component of both the Science & the Business of Medicine.

● Despite this fact, statistical illiteracy & innumeracy are prevalent among physicians.

● This situation should be remedied.

● Study of Statistics poses many challenges, but those are well worth overcoming.

● Statistics is based on a definite number of principles.

● It is best studied in an incremental fashion.


● Statistics reflects acts of interpretation, not irrefutable facts.

● Statistics can be misused & abused.

● Statistical Analyses are the result of a multiphase process that:

  o starts at Sample Selection,

  o ends with Presentation of Conclusions.

● Appraisal of Statistical Analyses requires familiarity with all phases.

● Understanding of Tabular, Numerical and Graphical Methods of EDA is critical for assessing the quality of the Statistical Analysis.

To be continued…

Author wishes to thank: Stephen DeCherney, MD, MPH for his valuable comments.

Nothing to disclose: there are no known conflicts of interest associated with this presentation. Specifically, neither the author nor his family has any potential conflicts of interest, financial or otherwise, regarding any of the products and/or services discussed here.
