SCHOOL OF MEDICINE
DEPARTMENT OF MEDICAL EDUCATION
INTRODUCTION TO STATISTICS
@2023
Principles of Statistics
Objective
Let's begin the concepts by asking “What is Statistics?”
Imagine this (hypothetical) conversation with a man in a taxi cab
Statistics
is the science of planning studies and
experiments, obtaining data, and then
organizing, summarizing, presenting,
analyzing, interpreting, and drawing
conclusions based on the data
Statistical Inference
Population
the complete collection of all individuals
(scores, people, measurements, and so on)
to be studied; the collection is complete in
the sense that it includes all of the
individuals to be studied
Census versus Sample
Census
Collection of data from every member of
a population
Sample
Subcollection of members selected from
a population
Parameter
a numerical measurement describing some
characteristic of a population.
population
parameter
Statistic
a numerical measurement describing some
characteristic of a sample.
sample
statistic
Types of Data
Data: Qualitative or Quantitative
Qualitative data
consist of names or labels
Example: The genders (male/female) of professional
athletes
Example: Shirt numbers on professional athletes'
uniforms, which are substitutes for names.
DISCRETE DATA
Discrete data
result when the number of possible values is either
a finite number or a ‘countable’ number
(i.e. the number of possible values is
0, 1, 2, 3, . . .)
MEAN
x̄ = (x₁ + x₂ + ... + xₙ) / n = (Σ xᵢ) / n, summing over i = 1, ..., n
S² = [(x₁ – x̄)² + ... + (xₙ – x̄)²] / (n – 1)
Example: [(5 – 5)² + (3 – 5)² + (7 – 5)²] / (3 – 1) = 4
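The mean and sample variance formulas can be checked with a few lines of Python. This is a minimal sketch using the three values from the worked example (5, 3, 7); the function names `mean` and `sample_variance` are chosen here for illustration and do not come from the slides.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


data = [5, 3, 7]              # values from the worked example
print(mean(data))             # 5.0
print(sample_variance(data))  # 4.0
```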
MEASURES OF DISPERSION
Dispersion is small when the values lie close to the mean and large when they lie far from it.
Hence it is logical to measure variation by how far the values deviate from the mean.
CALCULATION OF THE VARIANCE
Population variance:
Mathematical notation: σ² = Σ(x – μ)² / N
ADVANTAGES AND DISADVANTAGES OF THE VARIANCE
Advantage
• It takes into consideration all the values in the set of
data.
Disadvantage
• The units of measure are squared, which may be
difficult to communicate
– e.g. the variance of weight will be in kg squared.
STANDARD DEVIATION
• The way around this difficulty with the variance is to use its square root as a measure of
variability.
• This quantity is called the standard deviation; it is denoted by s for a sample and σ for a population.
σ = √σ²
where
σ² = Σ(x – μ)² / N
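The difference between the population formulas (divide by N) and the sample formulas (divide by n – 1) can be seen with Python's standard `statistics` module. This is a small sketch; the five weights below are made-up values for illustration, not data from the slides.

```python
import math
import statistics

# Hypothetical weights in kg (illustrative values only).
weights = [60.0, 65.0, 70.0, 75.0, 80.0]

pop_var = statistics.pvariance(weights)   # divides by N
samp_var = statistics.variance(weights)   # divides by n - 1
pop_sd = statistics.pstdev(weights)       # square root of the population variance
samp_sd = statistics.stdev(weights)       # square root of the sample variance

print(pop_var, samp_var)  # variances are in kg squared
print(pop_sd, samp_sd)    # standard deviations are back in kg
```

The standard deviations are in the original unit (kg), which is what makes them easier to communicate than the variances.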
EXAMPLE 1
Variance: S² = Σ(x – x̄)² / (n – 1)
Variance = 0.097
Standard dev. = 0.31
METHODS OF VARIABILITY MEASUREMENT
Quartiles: Data can be divided into four regions that cover the
total range of observed values. Cut points for these regions are
known as quartiles.
In notation, the qth quartile is the (q(n+1)/4)th observation of the ordered
data, where q is the desired quartile (1, 2 or 3) and n is the number of
observations.
The first quartile (Q1) is the cut point below which the lowest 25% of the data lie. The second
quartile (Q2) is the cut point at the 50% mark, so Q2 is the median. The third quartile
(Q3) is the cut point at the 75% mark, so 25% of the data lie between the median and Q3.
Q1 is the median of the first half of the ordered observations and
Q3 is the median of the second half of the ordered observations.
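The position rule q(n+1)/4 can be sketched in Python as follows. The helper `quartile` and the seven data values are assumptions made for illustration; when the position is not a whole number, this sketch interpolates linearly between the two neighbouring observations, which is one common convention among several.

```python
import statistics


def quartile(data, q):
    """Return the q-th quartile (q = 1, 2 or 3) at position q(n+1)/4."""
    xs = sorted(data)
    n = len(xs)
    pos = q * (n + 1) / 4          # 1-based position in the ordered data
    pos = min(max(pos, 1), n)      # keep the position inside the data
    lower = int(pos)               # observation at or just below the position
    frac = pos - lower             # fractional part used for interpolation
    upper = min(lower + 1, n)
    return xs[lower - 1] + frac * (xs[upper - 1] - xs[lower - 1])


values = [13, 1, 9, 5, 11, 3, 7]   # hypothetical data, n = 7
q1, q2, q3 = (quartile(values, q) for q in (1, 2, 3))
print(q1, q2, q3)  # 3.0 7.0 11.0

# The standard library's "exclusive" method uses the same (n + 1) rule.
print(statistics.quantiles(values, n=4, method="exclusive"))
```

With these seven ordered values, Q1 = 3 is the median of the lower half (1, 3, 5) and Q3 = 11 is the median of the upper half (9, 11, 13).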
METHODS OF VARIABILITY MEASUREMENT
Box plots are useful as they provide a visual summary of the data
enabling researchers to quickly identify median values, the
dispersion of the data set, and signs of skewness.
BOX PLOT
[Figure: box plot]
Key biostatistical concepts
SAMPLING DISTRIBUTIONS
06/18/2023 1 46
Practical illustration
1. Select the weight of five people; add them up and divide by five
2. Repeat step 1 nine more times; you can select a particular person more than once
3. Record the results from steps 1 and 2. This will give you ten records in total
4. Now add the results in step 3 and divide by 10
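The four steps above can be sketched in Python. The list of weights is invented for illustration, and the sketch assumes that a person may reappear in different samples but not twice within the same sample of five; the slides do not say which reading is intended.

```python
import random

# Hypothetical weights (kg) of twelve people; not data from the slides.
weights = [58, 62, 65, 67, 70, 72, 75, 78, 81, 84, 88, 93]


def sample_mean(population, size, rng):
    """Step 1: pick `size` people at random and average their weights."""
    chosen = rng.sample(population, size)
    return sum(chosen) / size


rng = random.Random(2023)  # fixed seed so the run is repeatable

# Steps 1 to 3: ten sample means in total.
sample_means = [sample_mean(weights, 5, rng) for _ in range(10)]

# Step 4: average the ten sample means.
mean_of_means = sum(sample_means) / len(sample_means)
population_mean = sum(weights) / len(weights)

print(sample_means)
print(mean_of_means, population_mean)
```

The individual sample means vary from one draw to the next, while their average tends to sit close to the mean of the whole group.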
Sample and Statistic
Note that the same group of subjects can be a sample for one
question about its characteristics and a population for another
question.
We use descriptive statistics from the sample to estimate the
characteristics of the population.
Sampling process
Population → Sample (sampling)
Sample → Population (inference)
The traditional and most widely used approach is termed the “classical” or “frequentist” approach, and this is the one pursued in this course, as illustrated above.
Sampling distribution of a statistic
We consider the systolic blood pressure measurements for 1600 workers as the
population.
The population mean was, say, μ and the population standard deviation was,
say, σ. Each value was written on a small disc and put into a bag. Each student
was asked to shake the bag, pick 10 discs, write down the ten systolic blood
pressures, work out their mean, x̄, and return the discs to the bag.
We imagine that this sampling process is repeated over and over again,
each time working out the sample mean.
Over a large number of repetitions this builds up a distribution for the sample
means. This distribution is called the sampling distribution of the mean.
We can work out:
The mean of these sample means, which is close to the population mean
The standard deviation of these sample means, which is called the standard error
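The disc-drawing exercise can be simulated as a rough sketch. The 1600 blood pressures below are generated from a normal distribution with mean 120 and standard deviation 15 mmHg purely as an assumption; the actual workers' data are not given. The number of repetitions (5000) is also an arbitrary choice. The empirical standard error is compared with σ/√n, the usual formula for the standard error of the mean, ignoring the small finite-population correction.

```python
import random
import statistics

rng = random.Random(52)  # fixed seed so the run is repeatable

# Hypothetical population: 1600 systolic blood pressures (mmHg).
population = [rng.gauss(120, 15) for _ in range(1600)]
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation

n = 10              # discs drawn each time
repetitions = 5000  # how many times the draw is repeated

# Each repetition: draw 10 discs, record their mean, return the discs.
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)
]

mean_of_means = statistics.fmean(sample_means)
standard_error = statistics.stdev(sample_means)

print(mu, mean_of_means)                # the two should be close
print(sigma / n ** 0.5, standard_error) # theoretical vs empirical standard error
```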
Standard Error
Central limit theorem