
Week 1

Introduction and Data Collection
Week 1-2
Week 1 - Learning Objectives
Explain key definitions:
Descriptive vs. Inferential Statistics
Population vs. Sample
Parameter vs. Statistic
Quantitative vs. Qualitative Data
Identify types of data and levels of measurement
Week 1-3
Week 1 - Learning Objectives (continued)
Describe different sampling methods:
Probability Samples vs. Non-probability Samples
Describe data using measures of central
tendency, dispersion (variation) and shape
Use the coefficient of correlation to measure
association between two quantitative
variables
Week 1-4
1.1: What Is Statistics?
The science of collecting,
organizing, presenting,
analyzing, and interpreting data
to assist in making more
effective decisions.
Week 1-5
Types of Statistics
i. Descriptive Statistics
- Methods of organizing, summarizing,
and presenting data in an informative way.
Example:
The United States government
reported that the population of the US was
226,542,000 in 1980; 248,709,000 in
1990 and 281,421,000 in 2000.
Week 1-6
Types of Statistics
i. Descriptive Statistics
Collect data - e.g., Survey
Present data - e.g., Tables & graphs
Characterize data
- e.g., Sample mean $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Week 1-7
Types of Statistics
ii. Inferential Statistics
- Generalizations about a population based on sample
data
e.g., estimate the population mean weight using the
sample mean weight
Drawing conclusions and/or making decisions
concerning a population based on sample results.
Week 1-8
1.2: Key Definitions
Population - The entire set of individuals or
objects of interest or the measurements
obtained from all individuals or objects of
interest.
Sample - A portion, or part, of the population
of interest.
Week 1-9
Population vs Sample
[Figure: a POPULATION of 26 items labelled a to z, from which a SAMPLE of 9 items (b, c, g, i, n, o, r, u, y) is drawn]
Measures used to describe the
POPULATION are called
PARAMETERS
Measures computed from
SAMPLE data are called
STATISTICS
Week 1-10
Key Definitions
A PARAMETER is a summary measure that
describes a characteristic of the population.
A STATISTIC is a summary measure
computed from a sample; it describes a characteristic of the sample
and is used to estimate the corresponding characteristic of the population.
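The distinction between a parameter and a statistic, and the inferential idea of estimating a population mean weight from a sample mean weight, can be illustrated with a small simulation. This is a minimal sketch: the population of weights is hypothetical, generated at random for illustration, and the sample size of 50 is an arbitrary choice.

```python
import random
import statistics

# Hypothetical population: weights (kg) of 1,000 items, generated for illustration only.
random.seed(42)
population = [random.gauss(70, 10) for _ in range(1000)]

# PARAMETER: a summary measure computed from every member of the population.
mu = statistics.mean(population)

# STATISTIC: the same summary measure computed from a simple random sample of 50 items.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"Population mean (parameter): {mu:.2f}")
print(f"Sample mean (statistic):     {x_bar:.2f}")
```

In practice only the sample is observed; the sample mean is then used as an estimate of the unknown population mean, and it will usually differ from it slightly.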
Week 1-11
Reasons for Drawing a
Sample
1. Less TIME consuming than a census
2. Less COSTLY to administer than a
census (survey of a population)
3. Less CUMBERSOME and more
PRACTICAL to administer than a
census of the targeted population
Week 1-12
Reasons for Collecting Data
i. To provide the necessary input to a survey
ii. To provide the necessary input to a study
iii. To measure performance of an ongoing service
or production process
iv. To evaluate conformance to standards
v. To assist in formulating alternative courses of
action in a decision-making process
vi. To satisfy our curiosity
Week 1-13
1.3: Identifying Sources of Data
PRIMARY (Data Collection)
• Observation
• Experimentation
• Survey
SECONDARY (Data Compilation)
• Print / Electronic
Week 1-14
1.3: Types of Data
Quantitative Data
• Measured on a naturally occurring numerical scale
• Equal intervals along the scale (allows for meaningful
mathematical calculations)
• Data with an absolute zero (zero means none of the quantity)
are ratio data (bank balance, grade)
• Data with a relative zero (zero is an arbitrary point on the
scale, not an absence) are interval data (temperature)
Qualitative Data
• Measured by classification only
• Non-numerical in nature
• Meaningfully ordered categories identify ordinal data
(best-to-worst ranking, age categories)
• Categories without a meaningful order identify nominal data
(political affiliation, industry classification, ethnic/cultural
groups)
Week 1-15
1.4: Levels of Measurement and Types
of Measurement Scales
Nominal Scale – classifies data into distinct categories in
which no ranking is implied
E.g.
Gender: Male Female
Ethnic group: Malays Chinese Indians
Others, please specify___________
Religion: Islam Christianity Hinduism
Buddhism Taoism
Others, specify ________
Week 1-16
Levels of Measurement and Types of
Measurement Scales
• Ordinal Scale – classifies data into distinct
categories in which ranking is implied
E.g.
I am interested in the cultures of other countries.
Disagree Neutral Agree
Week 1-17
Levels of Measurement and Types of
Measurement Scales
• Interval Scale – An ordered scale in which the
difference between measurements is a meaningful
quantity but does not involve a true zero point.
E.g.
Temperature (in degrees Celsius, °C, or degrees Fahrenheit, °F)
Week 1-18
Levels of Measurement and Types of
Measurement Scales
Ratio Scale – an ordered scale in which the difference
between measurements is meaningful and the scale has a true zero point
E.g.
What is your average mobile telephone bill per month?
RM _____________ per month
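The practical difference between the interval and ratio scales is which calculations stay meaningful when the unit changes. The sketch below is illustrative only: the temperatures and bill amounts are hypothetical, and `RATE` is an assumed currency conversion factor, not a real exchange rate.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: 0 °C and 0 °F are arbitrary zero points,
# so the ratio of two temperatures depends on the unit used.
c_low, c_high = 10.0, 20.0
f_low, f_high = celsius_to_fahrenheit(c_low), celsius_to_fahrenheit(c_high)
ratio_celsius = c_high / c_low        # 20 / 10
ratio_fahrenheit = f_high / f_low     # 68 / 50

# Differences are still meaningful: a 10 °C gap is always an 18 °F gap.
diff_celsius = c_high - c_low
diff_fahrenheit = f_high - f_low

# Ratio scale: a bill of RM 0 means no bill at all, so ratios survive a change of unit.
RATE = 0.25  # hypothetical conversion rate from RM to another currency
bill_low, bill_high = 50.0, 100.0
ratio_rm = bill_high / bill_low
ratio_converted = (bill_high * RATE) / (bill_low * RATE)

print(f"Celsius ratio:    {ratio_celsius:.2f}")
print(f"Fahrenheit ratio: {ratio_fahrenheit:.2f}")
print(f"Difference: {diff_celsius} C = {diff_fahrenheit} F")
print(f"Bill ratio in RM: {ratio_rm:.2f}, after conversion: {ratio_converted:.2f}")
```

So 20 °C is not "twice as hot" as 10 °C, whereas a RM 100 bill is twice a RM 50 bill in any currency.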
Week 1-19
1.5: Types of Samples Used
Non-probability Sample – Items included are chosen without
regard to their probability of occurrence
Probability Sample – Items in the sample are chosen on
the basis of known probabilities
Week 1-20
Types of Samples Used
Non-Probability Samples
• Judgement
• Quota
• Chunk
• Convenience
Probability Samples
• Simple Random
• Systematic
• Stratified
• Cluster
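The four probability sampling methods differ only in how units are selected from the sampling frame. The following is a minimal sketch under stated assumptions: the frame of 100 numbered units, the two strata ("A" and "B") and the ten clusters of ten units are all hypothetical. Non-probability methods (judgement, quota, chunk, convenience) depend on the researcher's choices rather than known probabilities, so they are not shown.

```python
import random

# Hypothetical sampling frame: 100 numbered units.
frame = list(range(1, 101))
stratum_of = {unit: ("A" if unit <= 60 else "B") for unit in frame}  # 60 in A, 40 in B
cluster_of = {unit: (unit - 1) // 10 for unit in frame}              # 10 clusters of 10 units

def simple_random(frame, n, rng):
    """Every subset of size n has the same chance of being selected."""
    return sorted(rng.sample(frame, n))

def systematic(frame, n, rng):
    """Random start within the first k units, then every k-th unit (k = N // n)."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

def stratified(frame, n, rng):
    """Simple random sample within each stratum, proportional to stratum size.
    Rounding the shares may not always add up to exactly n."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of[unit], []).append(unit)
    chosen = []
    for members in strata.values():
        share = round(n * len(members) / len(frame))
        chosen.extend(rng.sample(members, share))
    return sorted(chosen)

def cluster(frame, n_clusters, rng):
    """Randomly select whole clusters and keep every unit in them."""
    ids = sorted(set(cluster_of.values()))
    picked = set(rng.sample(ids, n_clusters))
    return [unit for unit in frame if cluster_of[unit] in picked]

rng = random.Random(1)
srs = simple_random(frame, 10, rng)
sys_sample = systematic(frame, 10, rng)
strat = stratified(frame, 10, rng)
clus = cluster(frame, 2, rng)

print("Simple random:", srs)
print("Systematic:   ", sys_sample)
print("Stratified:   ", strat)
print("Cluster:      ", clus)
```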
Week 1-21
1.6: Measures of Central Tendency,
Variability and Shape
i. Central Tendency
- tendency of data to center about certain
numerical values
- 3 commonly used measures of
Central Tendency: Mean, Median &
Mode
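For ungrouped data, the three measures can be computed directly. This sketch uses a small hypothetical data set and Python's standard `statistics` module; the mean is also written out as the sum divided by n to match the definition.

```python
import statistics

data = [3, 5, 5, 7, 8, 9, 12]  # hypothetical ungrouped sample

mean = sum(data) / len(data)      # sum of the values divided by n
median = statistics.median(data)  # middle value of the ordered data
mode = statistics.mode(data)      # most frequently occurring value

# With n = 7, the median sits at position (n + 1) / 2 = 4 in the ordered data.
ordered = sorted(data)
position = (len(ordered) + 1) // 2
assert ordered[position - 1] == median

print(mean, median, mode)  # 7.0 7 5
```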
Week 1-22
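Numerical Measures of Central
Tendency
Mean
• Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
• Sample, ungrouped data: $\bar{X} = \frac{\sum X}{n}$
• Sample, grouped data: $\bar{X} = \frac{\sum fX}{\sum f}$
Median
• Ungrouped data: the value at position $\frac{n+1}{2}$ in the ordered data
• Grouped data: $M = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i$, where $L_m$ is the lower boundary of the median class, $F$ the cumulative frequency of the classes before the median class, $f_m$ the frequency of the median class and $i$ the class width
Mode
• Ungrouped data: the most frequently occurring value
• Grouped data: $M_o = L_{mo} + \left(\frac{d_1}{d_1 + d_2}\right) i$, where $L_{mo}$ is the lower boundary of the modal class, $d_1$ and $d_2$ are the differences between the modal-class frequency and the frequencies of the preceding and following classes, and $i$ is the class width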
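The grouped-data formulas can be checked on a frequency table. This is a minimal sketch: the class boundaries and frequencies are hypothetical, class midpoints stand in for $X$ in the grouped mean, and the modal class is taken to be the first class with the highest frequency.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(10, 20, 5), (20, 30, 8), (30, 40, 12), (40, 50, 7), (50, 60, 3)]

def grouped_mean(classes):
    """Sum of f * X over sum of f, with X the class midpoint."""
    n = sum(f for _, _, f in classes)
    fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return fx / n

def grouped_median(classes):
    """M = L_m + ((n/2 - F) / f_m) * i."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # F: cumulative frequency before the current class
    for lo, hi, f in classes:
        if cumulative + f >= n / 2:
            return lo + (n / 2 - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("empty frequency table")

def grouped_mode(classes):
    """M_o = L_mo + (d1 / (d1 + d2)) * i.
    Fails if the modal class has the same frequency as both neighbours."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))  # first class with the highest frequency
    lo, hi, f = classes[k]
    prev_f = freqs[k - 1] if k > 0 else 0
    next_f = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1, d2 = f - prev_f, f - next_f
    return lo + d1 / (d1 + d2) * (hi - lo)

print(f"Mean:   {grouped_mean(classes):.3f}")
print(f"Median: {grouped_median(classes):.3f}")
print(f"Mode:   {grouped_mode(classes):.3f}")
```

Here n = 35, so n/2 = 17.5 falls in the 30-40 class (F = 13, f_m = 12), giving a median of 30 + (4.5/12) × 10 = 33.75.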
Numerical Measures of Variability
ii. Variation
• The amount of dispersion, or scattering, of
values away from a central value
• Measured by the variance, standard deviation and coefficient
of variation
Week 1-24
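Numerical Measures of Variability
Variance
• Population: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
• Sample, ungrouped data: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
• Sample, grouped data: $S^2 = \frac{\sum f (X - \bar{X})^2}{n - 1}$
Standard Deviation
• Population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
• Sample, ungrouped data: $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
• Sample, grouped data: $S = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}}$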
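The difference between the population divisor $N$ and the sample divisor $n - 1$ is easiest to see side by side. This sketch uses hypothetical data and checks the hand-written formulas against `statistics.variance` (sample) and `statistics.pvariance` (population); the grouped part uses hypothetical class midpoints and frequencies.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical ungrouped data
n = len(data)
x_bar = sum(data) / n
ss = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations

sample_var = ss / (n - 1)   # S^2 divides by n - 1
sample_sd = math.sqrt(sample_var)
pop_var = ss / n            # sigma^2, if these values were the whole population
pop_sd = math.sqrt(pop_var)

# Grouped data: class midpoints X with frequencies f, so n = sum(f)
mids = [15, 25, 35]
freqs = [2, 5, 3]
n_g = sum(freqs)
mean_g = sum(f * x for f, x in zip(freqs, mids)) / n_g
var_g = sum(f * (x - mean_g) ** 2 for f, x in zip(freqs, mids)) / (n_g - 1)

print(f"S^2 = {sample_var:.4f}, S = {sample_sd:.4f}")
print(f"sigma^2 = {pop_var:.4f}, sigma = {pop_sd:.4f}")
print(f"Grouped S^2 = {var_g:.4f}")
```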
Week 1-25
Coefficient of Variation
• Relative comparison of the standard deviation to the mean
• Always expressed as a percentage (%)
• Used to compare two or more sets of data measured in different units
• Sensitive to outliers
$CV = \left( \frac{S}{\bar{X}} \right) \times 100\%$
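Because the CV is unit-free, it can compare variability in data sets with different units. A minimal sketch, assuming two hypothetical data sets (heights in cm and weights in kg) and using the sample standard deviation $S$:

```python
import statistics

def coefficient_of_variation(data):
    """CV = (S / X_bar) * 100%, with S the sample standard deviation."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return statistics.stdev(data) / mean * 100

heights_cm = [160, 165, 170, 175, 180]  # hypothetical
weights_kg = [55, 60, 70, 80, 85]       # hypothetical

print(f"Height CV: {coefficient_of_variation(heights_cm):.2f}%")
print(f"Weight CV: {coefficient_of_variation(weights_kg):.2f}%")
```

The heights vary by about 4.65% of their mean and the weights by about 18.21%, so the weights are relatively more variable even though the units differ.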
Week 1-26
iii. Coefficient of Skewness
• This measure looks at whether or not the data is evenly
distributed about the average, or is skewed to one end.
• It is typically used with income and wealth data.
• A value of zero means no skewness.
$CofSkew = \frac{3(\text{Mean} - \text{Median})}{\text{St. Deviation}}$
E.g., with Mean = 37, Median = 31.667 and St. Deviation = 20.273:
$CofSkew = \frac{3(37 - 31.667)}{20.273} = \frac{3 \times 5.333}{20.273} = 0.789$
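The coefficient can be recomputed from the worked figures and applied to raw data. This is a sketch: the helper names are hypothetical, the two small data sets are made up for illustration, and the sample standard deviation is assumed for raw data.

```python
import statistics

def pearson_skewness(mean, median, sd):
    """Coefficient of skewness: 3(Mean - Median) / Standard Deviation."""
    return 3 * (mean - median) / sd

def skewness_from_data(data):
    """Same coefficient computed from raw (ungrouped) data."""
    return pearson_skewness(statistics.mean(data),
                            statistics.median(data),
                            statistics.stdev(data))

# Figures from the worked example on the slide
print(round(pearson_skewness(37, 31.667, 20.273), 3))  # 0.789

right_skewed = [1, 2, 2, 3, 10]  # hypothetical: one large value pulls the mean up
symmetric = [1, 2, 3, 4, 5]      # hypothetical: mean equals median
print(skewness_from_data(right_skewed))  # positive
print(skewness_from_data(symmetric))     # 0.0
```

A positive value indicates a right (positive) skew, where the mean exceeds the median; a negative value indicates a left skew.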
Week 1-27
Shape of a Distribution
Measures of Shape: symmetric or skewed
• Left-Skewed: Mean < Median < Mode
• Symmetric: Mean = Median = Mode
• Right-Skewed: Mode < Median < Mean
Week 1-28
Interpreting the Standard Deviation
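Empirical Rule: for mound-shaped (bell-shaped) data, approximately 68%, 95% and 99.7% of the observations lie within 1, 2 and 3 standard deviations of the mean, respectively.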
Week 1-29
Interpreting the Standard Deviation
You have purchased compact fluorescent light bulbs for your home.
The average life length is 500 hours, the standard deviation is 24 hours, and
the frequency distribution of life length is mound-shaped. One of
your bulbs burns out at 450 hours. Would you send the bulb back for
a refund?
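Interval: range, % of observations included, % of observations excluded
• $\bar{X} \pm 1S$: 476 - 524 hours, approximately 68% included, approximately 32% excluded
• $\bar{X} \pm 2S$: 452 - 548 hours, approximately 95% included, approximately 5% excluded
• $\bar{X} \pm 3S$: 428 - 572 hours, approximately 99.7% included, approximately 0.3% excluded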
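The intervals above, and where a 450-hour bulb falls relative to them, can be computed directly. A minimal sketch using only the mean (500 hours) and standard deviation (24 hours) given in the example; the coverage percentages are the approximate empirical-rule values.

```python
MEAN_HOURS = 500  # average life length from the example
SD_HOURS = 24     # standard deviation from the example
APPROX_COVERAGE = {1: 68.0, 2: 95.0, 3: 99.7}  # empirical-rule percentages

intervals = {}
for k, pct in APPROX_COVERAGE.items():
    low = MEAN_HOURS - k * SD_HOURS
    high = MEAN_HOURS + k * SD_HOURS
    intervals[k] = (low, high)
    print(f"mean ± {k}s: {low} - {high} hours, about {pct}% included, "
          f"about {100 - pct:.1f}% excluded")

burnout = 450
z = (burnout - MEAN_HOURS) / SD_HOURS  # number of standard deviations from the mean
print(f"A bulb failing at {burnout} hours is {z:.2f} standard deviations from the mean")
for k, (low, high) in intervals.items():
    print(f"  inside mean ± {k}s? {low <= burnout <= high}")
```

A failure at 450 hours lies just outside the two-standard-deviation interval but inside the three-standard-deviation interval. If the distribution is also roughly symmetric, only about 2.5% of bulbs would be expected to fail before 452 hours.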
Week 1-30
1.11: The Sample Covariance
The sample covariance measures the strength of the linear
relationship between two variables (called bivariate data)
The sample covariance:
Only concerned with the strength of the relationship
No causal effect is implied
$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
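The covariance formula translates line for line into code. This sketch uses hypothetical paired data: one pair that rises together, one that moves in opposite directions, and one where Y is an exact (non-linear) function of X yet the covariance is zero.

```python
def sample_covariance(xs, ys):
    """cov(X, Y) = sum((X_i - X_bar) * (Y_i - Y_bar)) / (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]     # hypothetical: tends to rise with x
y_down = [10, 8, 6, 4, 2]  # hypothetical: falls as x rises
print(sample_covariance(x, y_up))    # 1.5
print(sample_covariance(x, y_down))  # -5.0

# Y = X^2 depends completely on X, yet the covariance is zero:
# covariance only captures the linear part of a relationship.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [v ** 2 for v in x_sym]
print(sample_covariance(x_sym, y_sq))  # 0.0
```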
Week 1-31
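Interpreting Covariance
Covariance between two random variables:
• cov(X,Y) > 0: X and Y tend to move in the same
direction
• cov(X,Y) < 0: X and Y tend to move in opposite
directions
• cov(X,Y) = 0: X and Y have no linear relationship (independent
variables have zero covariance, but zero covariance alone does not
imply independence)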
Week 1-32
1.12: Coefficient of Correlation
Measures the relative strength of the linear
relationship between two variables
Sample coefficient of correlation:
$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{\text{cov}(X, Y)}{S_X S_Y}$
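Both forms of the formula give the same value, because the $n - 1$ divisors in the covariance and the two standard deviations cancel. This sketch, on hypothetical data, computes $r$ from the sums of deviations and checks it against $\text{cov}(X, Y) / (S_X S_Y)$ using `statistics.stdev`.

```python
import math
import statistics

def correlation(xs, ys):
    """r = sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # hypothetical paired data

r = correlation(x, y)

# Alternative form: cov(X, Y) / (S_X * S_Y)
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
r_alt = cov / (statistics.stdev(x) * statistics.stdev(y))

print(f"r = {r:.4f}, cov/(Sx*Sy) = {r_alt:.4f}")

# A perfect decreasing straight line gives r = -1 (up to rounding).
print(f"{correlation(x, [3 - 2 * v for v in x]):.4f}")
```

For these data r ≈ 0.77, a strong positive linear relationship.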
Week 1-33
Features of Correlation Coefficient, r
Correlation: value of r
• Perfect positive correlation: +1.00
• Strong positive correlation: 0.50 to 0.99
• Medium positive correlation: 0.30 to 0.49
• Weak positive correlation: 0.01 to 0.29
• No correlation: 0
• Weak negative correlation: -0.01 to -0.29
• Medium negative correlation: -0.30 to -0.49
• Strong negative correlation: -0.50 to -0.99
• Perfect negative correlation: -1.00
Week 1-34
Scatter Plots of Data with Various Correlation
Coefficients
[Figure: six scatter plots of Y against X, for r = -1, r = -0.6, r = 0, r = +0.3, r = +1 and r = 0]
Week 1-35
Summary
• Introduced key definitions:
• Descriptive vs. Inferential Statistics
• Population vs. Sample
• Parameter vs. Statistic
• Quantitative vs. Qualitative Data
• Described different types of samples
Week 1-36
Summary (continued)
• Reviewed data types and measurement levels
• Described measures of central tendency
• Mean, median, mode
• Described measures of variation
• Range, variance and standard deviation, coefficient of
variation
• Measured association between two quantitative variables