Verybasicstatistics 1
Course Content
• Data Types
• Descriptive Statistics
• Data Displays
Data Types
Variables
• Quantitative Variable
• A variable that is counted or measured on a
numerical scale
• Can be continuous (any value on the scale) or discrete (counted, so always a whole number).
• Qualitative Variable
• A non-numerical variable that can be classified into
categories, but can’t be measured on a numerical
scale.
• Can be nominal or ordinal
Continuous Data
• For example,
• Temperature (39.25 °C)
• Time (2.468 seconds)
• Height (1.25m)
• Weight (66.34kg)
Discrete Data
• For example,
• Colours of objects (red, yellow, blue, green)
• Types of transport (plane, car, boat)
• For each of the following variables, decide which data type applies:
• Age
• Year of birth
• Sex
• Height
• Number of staff in a department
• Time taken to get to work
• Preferred strength of coffee
• Company size
Descriptive Statistics
Session Content
• Measures of Location
• Measures of Dispersion
Measures of Location
Common Measures
Mean: $\bar{x} = \frac{\sum x}{n}$
X bar equals the sum of the data divided by the number of data points.
Pros & Cons
• Advantages
– basic calculation is easily understood
– all data values are used in the calculation
– used in many statistical procedures
• Disadvantages
– It may not be an actual ‘meaningful’ value, e.g. an average of 2.4 children per family.
– Can be greatly affected by extreme values in a dataset, e.g. seven students take a test and receive the following scores:
40 42 45 50 53 54 99
– The average score is 54.7, but is this really representative of the group?
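The effect of the extreme score in the test example can be checked directly. The following is a minimal Python sketch using only the standard library; the variable names are illustrative, and the score of 99 is left out purely to show its influence on the mean, not as a recommended way of handling outliers.

```python
from statistics import mean

# Test scores of the seven students from the example.
scores = [40, 42, 45, 50, 53, 54, 99]

# Mean = sum of the data divided by the number of data points.
mean_all = sum(scores) / len(scores)

# The same calculation with the extreme score (99) left out, for comparison only.
scores_without_extreme = [s for s in scores if s != 99]
mean_without_extreme = mean(scores_without_extreme)

print(f"Mean of all seven scores:  {mean_all:.1f}")
print(f"Mean without the score 99: {mean_without_extreme:.1f}")
```

Six of the seven students scored below the mean of all seven scores, which is why the mean alone can be a poor summary when extreme values are present.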
18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18
18 24 29 30 32
Finding the Median from Individual
Data
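• Step 1: Arrange the observations in increasing order, i.e. rank order. The median will be the number that corresponds to the middle rank.
• Step 2: Find the middle rank with the following formula:
Middle rank = ½ × (n + 1)
If the middle rank is not a whole number, the median is the average of the two observations on either side of it.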
40 42 45 50 53 54 70 99
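The two steps can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation; the function name `median_by_rank` is made up for this example. When the middle rank falls halfway between two observations (an even number of data points), the sketch averages the two neighbouring values, which is the usual convention.

```python
def median_by_rank(data):
    """Median via the middle-rank rule: rank = 0.5 * (n + 1)."""
    if not data:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(data)            # Step 1: rank order
    n = len(ordered)
    middle_rank = 0.5 * (n + 1)       # Step 2: middle rank (1-based)
    lower = int(middle_rank)          # whole-number part of the rank
    if middle_rank == lower:
        # Odd n: the middle rank points at a single observation.
        return ordered[lower - 1]
    # Even n: the rank ends in .5, so average the two neighbours.
    return (ordered[lower - 1] + ordered[lower]) / 2


scores = [40, 42, 45, 50, 53, 54, 70, 99]
# n = 8, so the middle rank is 4.5: halfway between the 4th and 5th values.
print(median_by_rank(scores))
```

For these eight scores the 4th and 5th ordered values are 50 and 53, so the median is 51.5.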
18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18
Quartiles
40 42 45 50 53 54 70 99
Position of Upper Quartile = ¾*(n+1) = 6.75
Upper quartile = data-point 6 + 0.75*(data-point 7 – data-point 6)
Upper quartile = 54 + 0.75*(70 – 54) = 66
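The interpolation used for the upper quartile can be written as a small Python function. This is a sketch under stated assumptions: the helper name `value_at_rank` is hypothetical, the lower quartile is taken at position ¼*(n+1) by analogy with the upper quartile rule, and ranks falling outside the data are clamped to the first or last data point.

```python
def value_at_rank(ordered, rank):
    """Value at a 1-based, possibly fractional rank, by linear interpolation."""
    n = len(ordered)
    rank = min(max(rank, 1), n)       # assumption: clamp ranks outside 1..n
    lower = int(rank)                 # data point just below the rank
    fraction = rank - lower           # how far to move towards the next point
    if fraction == 0:
        return float(ordered[lower - 1])
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


scores = sorted([40, 42, 45, 50, 53, 54, 70, 99])
n = len(scores)

lower_quartile = value_at_rank(scores, 0.25 * (n + 1))   # position 2.25
upper_quartile = value_at_rank(scores, 0.75 * (n + 1))   # position 6.75

print(lower_quartile, upper_quartile)
```

The upper quartile comes out at 66, matching the hand calculation. Statistical packages use several slightly different quartile conventions, so other software may report slightly different values for the same data.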
Task 4
18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18
Measures of Dispersion
Common Measures
[Figure: report turnaround time (days) for two sets of reports, on axes running from 2 to 16 and from 2 to 12; the values 4 and 16 days are marked]
Range
Pros & Cons
• Advantages
• Disadvantages
18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18
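The range is the largest value minus the smallest. A short Python sketch using the twelve values listed above (the variable names are illustrative):

```python
# The twelve values from the dataset above.
data = [18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18]

smallest = min(data)
largest = max(data)
data_range = largest - smallest   # range = largest value - smallest value

print(f"Smallest: {smallest}, largest: {largest}, range: {data_range}")
```

Because only the two most extreme values enter the calculation, a single unusual observation (here 45) determines most of the result.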
Inter-quartile Range
18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18
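The inter-quartile range is the upper quartile minus the lower quartile. A minimal sketch with Python's standard library follows; it assumes the (n+1)-based position rule used for quartiles earlier in this course, which corresponds to the `method="exclusive"` option of `statistics.quantiles`. Other conventions exist and can give slightly different answers.

```python
from statistics import quantiles

# The twelve values from the dataset above.
data = [18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18]

# n=4 asks for quartiles; "exclusive" places them at ranks
# 1/4*(n+1), 2/4*(n+1) and 3/4*(n+1), interpolating between data points.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

iqr = q3 - q1   # inter-quartile range = upper quartile - lower quartile

print(f"Lower quartile: {q1}, upper quartile: {q3}, IQR: {iqr}")
```

Unlike the range, the inter-quartile range ignores the extreme values at either end, so the value 45 has no influence on it here.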
Variance and Standard Deviation
[Figure: two distributions plotted against Days (axes running from 4 to 16 and from 8 to 12), each marked with the mean and 1 SD on either side]
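Population: Variance $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, Standard Deviation $\sigma = \sqrt{\sigma^2}$
Sample: Variance $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, Standard Deviation $s = \sqrt{s^2}$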
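The two versions of the variance differ only in the divisor: N for a whole population and n − 1 for a sample. The sketch below works both through by hand in Python and cross-checks them against the standard library. It uses the twelve-value dataset that appears in the tasks; whether those values should be treated as a population or a sample is not stated, so both results are shown.

```python
import math
import statistics

data = [18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18]
n = len(data)

mean = sum(data) / n
# Sum of squared deviations from the mean (the numerator of both formulas).
sum_sq_dev = sum((x - mean) ** 2 for x in data)

population_variance = sum_sq_dev / n        # divisor N
sample_variance = sum_sq_dev / (n - 1)      # divisor n - 1

# The standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Cross-check the hand calculation against the statistics module.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))

print(f"Population: variance {population_variance:.2f}, SD {population_sd:.2f}")
print(f"Sample:     variance {sample_variance:.2f}, SD {sample_sd:.2f}")
```

The sample variance is always the larger of the two, and the gap shrinks as n grows.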
Variance
• Advantages:
• uses all of the data values
• Disadvantages:
• the variance is measured in the original units
squared
• extreme values or outliers affect the variance considerably
• hard to calculate manually
Standard Deviation
• Advantages:
• same units of measurement as the values
• useful in theoretical work and statistical methods
and inference
• Disadvantages:
• hard to calculate manually
Task 7
18, 19, 18, 25, 22, 20, 21, 45, 33, 20, 18, 18
Session Summary
• Measures of Location
• Mean
• Mode
• Median
• Quartiles
• Measures of Dispersion
• Range
• Interquartile Range
• Variance
• Standard Deviation
Data Displays
Session Content
– Histograms
– Run charts
– Box plots
– Bar charts
– Pareto charts
– Pie charts
– Scatter plots
– Contingency tables
Histograms
[Figure: histogram of dataset 1 (normal); y-axis: Frequency, 0 to 25; x-axis: 45.0 to 90.0]
Run Charts
[Figure: Time Series Plot of Time Taken; y-axis: Time Taken, 25.0 to 35.0; x-axis: Day, Monday to Friday over four weeks]
Boxplots
[Figure: Boxplot of dataset 1 (norma, dataset 2 (expon, dataset 3 (unifo; y-axis: Data, 100 to 400]
Bar Charts
[Figure: Chart of Frequency; y-axis: Frequency, 0 to 20; x-axis: Causes of Medication Errors (missed dose, wrong patient, wrong dose, wrong time, wrong medicine)]
Pareto Charts
[Figure: Pareto chart of Causes of Medication Errors; axes: Frequency and Percent]
Causes of Medication Errors   wrong dose   wrong time   wrong medicine   wrong patient   Other
Frequency                             18           15                4               2       1
Percent                             45.0         37.5             10.0             5.0     2.5
Cum %                               45.0         82.5             92.5            97.5   100.0
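The Frequency, Percent and Cum % rows of a Pareto chart follow from the raw counts alone. The Python sketch below rebuilds them from the frequencies given above; the category names are as read from the chart's axis labels, and the variable names are illustrative.

```python
# Frequencies of each cause of medication error, as given in the chart.
causes = {
    "wrong dose": 18,
    "wrong time": 15,
    "wrong medicine": 4,
    "wrong patient": 2,
    "Other": 1,
}

total = sum(causes.values())

# A Pareto chart orders the categories from most to least frequent.
ordered = sorted(causes.items(), key=lambda item: item[1], reverse=True)

rows = []
running_count = 0
for cause, frequency in ordered:
    running_count += frequency
    percent = 100 * frequency / total
    cumulative_percent = 100 * running_count / total
    rows.append((cause, frequency, percent, cumulative_percent))

for cause, frequency, percent, cumulative_percent in rows:
    print(f"{cause:<15}{frequency:>5}{percent:>8.1f}{cumulative_percent:>8.1f}")
```

The cumulative column shows that the two most frequent causes account for 82.5% of all errors, which is the kind of reading a Pareto chart is designed to support.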
Pie Charts
[Figure: pie chart; labelled slices include 18, 45.0% and 15, 37.5%]
Scatterplots
[Figure: scatter plot of Weight Loss (0 to 70) against Time on Diet (0 to 25)]
Contingency Tables
                      Colour of eyes
Colour of hair   Brown   Green/grey   Blue   Total
Black               50           54     41     145
Brown               38           46     48     132
Fair                22           30     31      83
Ginger              10           10     20      40
Total              120          140    140   400 = N
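The Total row and column of a contingency table are the sums of the cell counts. A minimal Python sketch that recomputes them from the table above (the data structure and variable names are illustrative choices, not part of the course material):

```python
eye_colours = ["Brown", "Green/grey", "Blue"]

# Cell counts: one row per hair colour, columns in the order of eye_colours.
counts = {
    "Black": [50, 54, 41],
    "Brown": [38, 46, 48],
    "Fair": [22, 30, 31],
    "Ginger": [10, 10, 20],
}

# Row totals: number of people with each hair colour.
row_totals = {hair: sum(row) for hair, row in counts.items()}

# Column totals: number of people with each eye colour.
column_totals = [sum(column) for column in zip(*counts.values())]

# Grand total N: everyone in the table.
grand_total = sum(row_totals.values())

print("Row totals:", row_totals)
print("Column totals:", dict(zip(eye_colours, column_totals)))
print("N =", grand_total)
```

The row totals and the column totals must each add up to the same grand total, which is a quick check that a table has been transcribed correctly.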
Session Summary
– Histograms
– Run charts
– Box plots
– Bar charts
– Pareto charts
– Pie charts
– Scatter plots
– Contingency tables
Course Summary
• Data Types
• Descriptive Statistics
• Data Displays