Introduction and Data Collection
Week 1
Week 1  Learning Objectives
Explain key definitions:
Descriptive vs. Inferential Statistics
Population vs. Sample
Parameter vs. Statistic
Quantitative vs. Qualitative Data
Identify types of data and levels of measurement
Describe different sampling methods:
Probability Samples vs. Nonprobability Samples
Describe data using measures of central
tendency, dispersion (variation) and shape
Use the coefficient of correlation to measure
association between two quantitative
variables
1.1: What is Statistics
The science of collecting,
organizing, presenting,
analyzing, and interpreting data
to assist in making more
effective decisions.
Types of Statistics
i. Descriptive Statistics
Methods of organizing, summarizing,
and presenting data in an informative way.
Example:
The United States government
reports the population of the US was
226,542,000 in 1980; 248,709,000 in
1990 and 265,000,000 in 2000.
Collect data, e.g., survey
Present data, e.g., tables and graphs
Characterize data, e.g., sample mean $\bar{X} = \frac{\sum X_i}{n}$
ii. Inferential Statistics
Generalization about a population based on sample
data
e.g., estimate the population mean weight using the
sample mean weight
Drawing conclusions and/or making decisions
concerning a population based on sample results.
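The idea of estimating a population mean from a sample mean can be sketched in a few lines of Python. This is only an illustration: the population of weights below is simulated, and its size, mean (70 kg), and spread (12 kg) are assumptions made up for the example, not figures from these slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated weights (kg) of 10,000 individuals.
population = [random.gauss(70, 12) for _ in range(10_000)]
population_mean = statistics.fmean(population)  # the parameter (usually unknown)

# Draw a simple random sample of 50 individuals and compute the statistic.
sample = random.sample(population, 50)
sample_mean = statistics.fmean(sample)  # the statistic used as an estimate

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

In practice the population mean is not available, which is why the sample mean is used to estimate it.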
1.2: Key Definitions
Population: The entire set of individuals or
objects of interest or the measurements
obtained from all individuals or objects of
interest.
Sample: A portion, or part, of the population
of interest.
Population vs Sample
[Figure: a POPULATION of items a to z and a SAMPLE drawn from it: b, c, g, i, n, o, r, u, y]
Measures used to describe the
POPULATION are called
PARAMETERS
Measures computed from
SAMPLE data are called
STATISTICS
Key Definitions
A PARAMETER is a summary measure that
describes a characteristic of the population.
A STATISTIC is a summary measure
computed from a sample to describe a
characteristic of the population.
Reasons for Drawing a
Sample
1. Less TIME consuming than a census
2. Less COSTLY to administer than a
census (survey of a population)
3. Less CUMBERSOME and more
PRACTICAL to administer than a
census of the targeted population
Reasons for Collecting Data
i. To provide the necessary input to a survey
ii. To provide the necessary input to a study
iii. To measure performance of an ongoing service
or production process
iv. To evaluate conformance to standards
v. To assist in formulating alternative courses of
action in a decision-making process
vi. To satisfy our curiosity
1.3: Identifying Sources of
Data
PRIMARY (Data Collection)
Observation
Experimentation
Survey
SECONDARY (Data Compilation)
Print / Electronic
1.3: Types of Data
Quantitative Data
Measured on a naturally
occurring scale
Equal intervals along scale
(allows for meaningful
mathematical calculations)
Data with absolute zero (zero
means no value) is ratio data
(bank balance, grade)
Data with relative zero (zero has
value) is interval data
(temperature)
Qualitative Data
Measured by classification only
Nonnumerical in nature
Meaningfully ordered categories
identify ordinal data (best to
worst ranking, age categories)
Categories without a meaningful
order identify nominal data
(political affiliation, industry
classification, ethnic/cultural
groups)
1.4: Levels of Measurement and Types
of Measurement Scales
Nominal Scale – classifies data into distinct categories in
which no ranking is implied
E.g.
Gender: Male Female
Ethnic group: Malays Chinese Indians
Others, please specify___________
Religion: Islam Christianity Hinduism
Buddhism Taoism
Others, specify ________
• Ordinal Scale – classifies data into distinct
categories in which ranking is implied
E.g.
I am interested in the cultures of other countries.
Disagree Neutral Agree
• Interval Scale – An ordered scale in which the
difference between measurements is a meaningful
quantity but does not involve a true zero point.
E.g.
Temperature (in degrees Celsius, °C, or Fahrenheit, °F)
Ratio Scale – an ordered scale in which the difference
between measurements is a meaningful quantity and the
measurements have a true zero point
E.g.
What is your average mobile telephone bill per month?
RM _____________ per month
1.5: Types of Samples Used
Nonprobability Sample
Items included are chosen without
regard to their probability of
occurrence
Probability Sample
Items in the sample are chosen on
the basis of known probabilities
Nonprobability Samples
Judgement
Quota
Chunk
Convenience
Probability Samples
Simple Random
Systematic
Stratified
Cluster
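The four probability sampling methods listed above can be sketched as follows. This is a rough illustration under stated assumptions: the sampling frame of 100 numbered items, the sample size of 10, the two strata ("low" and "high"), and the clusters of 10 consecutive items are all hypothetical choices made for the example.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))  # hypothetical sampling frame: items 1..100
n = 10                            # desired sample size

# Simple random: every item has the same chance of being selected.
simple_random = random.sample(population, n)

# Systematic: choose a random start, then take every k-th item.
k = len(population) // n
start = random.randrange(k)
systematic = population[start::k]

# Stratified: split the frame into strata, then sample randomly within each.
strata = {"low": population[:50], "high": population[50:]}
stratified = [x for group in strata.values()
              for x in random.sample(group, n // 2)]

# Cluster: split the frame into clusters, then select whole clusters at random.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [x for c in random.sample(clusters, 1) for x in c]

print(simple_random, systematic, stratified, cluster_sample, sep="\n")
```

Each method returns 10 items here, but they differ in which combinations of items are possible: a systematic sample is fully determined by its start, and a cluster sample always contains neighbouring items.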
1.6: Measures of Central Tendency,
Variability and Shape
i. Central Tendency
The tendency of data to center about certain
numerical values
Three commonly used measures of
central tendency: mean, median, and
mode
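Numerical Measures of Central
Tendency
Mean
Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample, ungrouped data: $\bar{x} = \frac{\sum x}{n}$
Sample, grouped data: $\bar{x} = \frac{\sum fx}{\sum f}$
Median
Ungrouped data: $M = \frac{n + 1}{2}$ (position of the median in the ordered data)
Grouped data: $M = I_m + i\left(\frac{\frac{n}{2} - F}{f}\right)$
Mode
Grouped data: $M = I_m + i\left(\frac{d_1}{d_1 + d_2}\right)$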
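A short Python sketch may help connect these formulas to actual numbers. The data values and class intervals below are hypothetical. The slides do not define the grouped-data symbols, so the code assumes the usual textbook reading: $I_m$ is the lower limit of the median (or modal) class, $i$ is the class width, $F$ is the cumulative frequency before the median class, $f$ is the frequency of the median class, and $d_1$, $d_2$ are the differences between the modal class frequency and the frequencies of the classes just before and just after it.

```python
import statistics

# Ungrouped data (hypothetical sample).
data = [2, 3, 3, 5, 7, 8, 9]
mean = statistics.mean(data)      # sum(x) / n
median = statistics.median(data)  # value at position (n + 1) / 2 for odd n
mode = statistics.mode(data)      # most frequent value

# Grouped data (hypothetical): classes 0-10, 10-20, 20-30.
lower = [0, 10, 20]      # lower class limits
midpoints = [5, 15, 25]  # class midpoints, used as x
freq = [2, 5, 3]         # class frequencies f
width = 10               # class width i
n = sum(freq)

grouped_mean = sum(f * x for f, x in zip(freq, midpoints)) / n

# Median class: first class whose cumulative frequency reaches n / 2.
cum = 0  # cumulative frequency before the median class (F)
for m, f in enumerate(freq):
    if cum + f >= n / 2:
        break
    cum += f
grouped_median = lower[m] + width * (n / 2 - cum) / f

# Modal class: the class with the highest frequency.
k = freq.index(max(freq))
d1 = freq[k] - (freq[k - 1] if k > 0 else 0)
d2 = freq[k] - (freq[k + 1] if k < len(freq) - 1 else 0)
grouped_mode = lower[k] + width * d1 / (d1 + d2)

print(mean, median, mode)
print(grouped_mean, grouped_median, grouped_mode)
```

For this particular grouped table the mean, median, and mode all come out to 16, which is a coincidence of the chosen frequencies rather than a general property.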
Numerical Measures of Variability
ii. Variation
• The amount of dispersion, or scattering, of
values away from a central value
• Measures: variance, standard deviation, coefficient
of variation
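Variance
Population: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Sample, ungrouped data: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Sample, grouped data: $S^2 = \frac{\sum f(X - \bar{X})^2}{n - 1}$
Standard Deviation
Population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Sample, ungrouped data: $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
Sample, grouped data: $S = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}}$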
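The difference between the population formulas (divide by N) and the sample formulas (divide by n − 1) can be checked with Python's standard `statistics` module. The data values and the grouped frequency table below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5

# Population formulas: divide the sum of squared deviations by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample formulas: divide the sum of squared deviations by n - 1.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

# Grouped sample data (hypothetical): (class midpoint X, frequency f).
grouped = [(5, 2), (15, 5), (25, 3)]
n = sum(f for _, f in grouped)
x_bar = sum(f * x for x, f in grouped) / n
grouped_var = sum(f * (x - x_bar) ** 2 for x, f in grouped) / (n - 1)
grouped_sd = math.sqrt(grouped_var)

print(pop_var, pop_sd)
print(samp_var, samp_sd)
print(grouped_var, grouped_sd)
```

For the same data the sample variance is always slightly larger than the population variance, because the divisor is n − 1 instead of N.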
Coefficient of Variation
Relative comparison of standard deviation to the
mean
Always in Percentage (%)
Used to Compare Two or More Sets of Data
Measured in Different Units
Sensitive to Outliers
$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$
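As a small illustration of why the coefficient of variation is useful for data measured in different units, the sketch below compares two hypothetical samples, heights in centimetres and weights in kilograms, that happen to have the same standard deviation. The function name `coefficient_of_variation` and both data sets are assumptions made for this example.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample CV as a percentage: (S / mean) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


heights_cm = [150, 160, 170, 180, 190]  # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]       # hypothetical weights

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_heights:.2f}%")
print(f"CV of weights: {cv_weights:.2f}%")
```

Both samples have a standard deviation of about 15.8, but relative to their means the weights vary more than the heights, which is what the CV makes visible.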
iii. Coefficient of Skewness
This measure looks at whether or not the data is evenly
distributed about the average, or is skewed to one end.
It is typically used with income and wealth data
A value of zero means no skewness
$\text{CofSkew} = \frac{3(\text{Mean} - \text{Median})}{\text{St. Deviation}}$
$\text{CofSkew} = \frac{3(37 - 31.667)}{20.273} = \frac{3 \times 5.333}{20.273} = 0.789$
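The arithmetic in this worked example can be reproduced with a few lines of Python. The function name `coefficient_of_skewness` is chosen here for illustration; the input values (mean 37, median 31.667, standard deviation 20.273) are the ones used on the slide.

```python
def coefficient_of_skewness(mean, median, std_dev):
    """Return 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


skew = coefficient_of_skewness(mean=37, median=31.667, std_dev=20.273)
print(round(skew, 3))  # 0.789
```

The result is positive because the mean exceeds the median, which points to a right-skewed distribution.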
Shape of a Distribution
Measures of Shape:
Symmetric or skewed
Symmetric: Mean = Median = Mode
Left-skewed: Mean < Median < Mode
Right-skewed: Mode < Median < Mean
Interpreting the Standard Deviation
Empirical Rule
You have purchased compact fluorescent light bulbs for your home.
Average life length is 500 hours, standard deviation is 24, and
frequency distribution for the life length is mound shaped. One of
your bulbs burns out at 450 hours. Would you send the bulb back for
a refund?
Interval             Range        % of observations included   % of observations excluded
$\bar{x} \pm 1s$     476 to 524   Approximately 68%            Approximately 32%
$\bar{x} \pm 2s$     452 to 548   Approximately 95%            Approximately 5%
$\bar{x} \pm 3s$     428 to 572   Approximately 99.7%          Approximately 0.3%
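One way to approach the light bulb question is to rebuild the three intervals and see where 450 hours falls. The sketch below uses the figures from the example (mean 500 hours, standard deviation 24 hours); the variable names and the z-score calculation are additions for illustration.

```python
mean, s = 500, 24  # average life and standard deviation, in hours

# Empirical Rule intervals: mean ± 1s, mean ± 2s, mean ± 3s.
intervals = {k: (mean - k * s, mean + k * s) for k in (1, 2, 3)}
for k, (low, high) in intervals.items():
    print(f"mean ± {k}s: {low} to {high}")

observed = 450            # life of the bulb that burned out
z = (observed - mean) / s  # number of standard deviations from the mean
low_2s, high_2s = intervals[2]
inside_2s = low_2s <= observed <= high_2s

print(f"z = {z:.2f}, inside the ±2s interval: {inside_2s}")
```

The bulb lasted a little more than two standard deviations less than average, so it falls just outside the interval that holds approximately 95% of bulbs. That makes it an unusually short life for a mound-shaped distribution, although whether it justifies a refund remains a judgment call.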
1.11: The Sample Covariance
The sample covariance measures the strength of the linear
relationship between two variables (called bivariate data)
The sample covariance:
Only concerned with the strength of the relationship
No causal effect is implied
$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
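Interpreting Covariance
Covariance between two random variables:
cov(X,Y) > 0: X and Y tend to move in the same
direction
cov(X,Y) < 0: X and Y tend to move in opposite
directions
cov(X,Y) = 0: X and Y have no linear relationship
(independent variables have zero covariance, but zero
covariance alone does not imply independence)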
1.12: Coefficient of Correlation
Measures the relative strength of the linear
relationship between two variables
Sample coefficient of correlation:
$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{\text{cov}(X, Y)}{S_X S_Y}$
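The sample covariance and the coefficient of correlation can be computed directly from their definitions. The sketch below is a minimal illustration: the function names and the five (X, Y) pairs are hypothetical, and the correlation is obtained as cov(X, Y) divided by the product of the two sample standard deviations.

```python
import math


def sample_covariance(xs, ys):
    """Sum of (X_i - X_bar)(Y_i - Y_bar), divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y)."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # cov(X, X) is the variance of X
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


xs = [1, 2, 3, 4, 5]  # hypothetical X values
ys = [2, 4, 5, 4, 5]  # hypothetical Y values

cov_xy = sample_covariance(xs, ys)
r = sample_correlation(xs, ys)
print(f"cov(X, Y) = {cov_xy:.3f}, r = {r:.3f}")
```

Unlike the covariance, r has no units and always lies between −1 and +1, so it can be compared across data sets.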
Features of Correlation Coefficient, r
Correlation Coefficient, r: Value
Perfect positive correlation: +1.00
Strong positive correlation: 0.50 to 0.99
Medium positive correlation: 0.30 to 0.49
Weak positive correlation: 0.01 to 0.29
No correlation: 0
Weak negative correlation: -0.01 to -0.29
Medium negative correlation: -0.30 to -0.49
Strong negative correlation: -0.50 to -0.99
Perfect negative correlation: -1.00
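The bands in this table can be turned into a small lookup function. This is only a sketch: the function name `describe_correlation` is made up for the example, and because the table leaves small gaps between bands (for instance between 0.29 and 0.30), the code assumes cut-offs at 0.30 and 0.50 on the absolute value of r.

```python
def describe_correlation(r):
    """Return the descriptive label for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No correlation"
    sign = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "Perfect"
    elif size >= 0.50:  # assumed cut-off between medium and strong
        strength = "Strong"
    elif size >= 0.30:  # assumed cut-off between weak and medium
        strength = "Medium"
    else:
        strength = "Weak"
    return f"{strength} {sign} correlation"


for value in (1, 0.6, 0.3, 0.1, 0, -0.45, -1):
    print(value, "->", describe_correlation(value))
```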
Scatter Plots of Data with Various Correlation
Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -0.6, r = 0, r = +0.3, r = +1, and r = 0]
Summary
• Introduced key definitions:
• Descriptive vs. Inferential Statistics
• Population vs. Sample
• Parameter vs. Statistic
• Quantitative vs. Qualitative Data
• Described different types of samples
• Reviewed data types and measurement levels
• Described measures of central tendency
• Mean, median, mode
• Described measures of variation
• Range, variance and standard deviation, coefficient of
variation
• Measured association between two quantitative variables