
Chapter 1: INTRODUCTION TO STATISTICS

Learning Outcome
◦ What is statistics?
◦ Uses of statistics
◦ Types of statistics
◦ Common statistical terms
◦ Sources of data
◦ Types of variables
◦ Scales of measurement

Why study Statistics?
○ Data are everywhere.
○ Statistical techniques are used to make many decisions that affect our lives.
○ No matter what your career, you will make professional decisions that involve data. An understanding of statistical methods will help you make these decisions effectively.
What is meant by Statistics?
• In the more common usage, statistics refers to numerical information.
• Examples:
o the average starting salary of college graduates
o the number of deaths due to motor accidents last year
o the number of students enrolled at a university this semester
• We often present statistical information in graphical form to capture the reader's attention and to portray a large amount of information.

Formal definition
STATISTICS is the science of
collecting, organizing, presenting,
analyzing and interpreting
numerical data to assist in making
more effective decisions.
Who uses Statistics
Medicine
• Effectiveness of drugs
• Predict diseases
Education
• Predict most favourite subject
• Predict CGPA
Business/Marketing
• Predict sales
• Consumer preferences
• Financial trends
Law
• Organize evidence to make decisions
Types of Statistics
Descriptive
• Describe the situation
• Methods of organizing, summarizing, and presenting data in an informative way.
Inferential
• A decision, estimate, prediction, or generalization about a population, based on a sample (make inferences about a population based on a sample).
Examples
Of 350 randomly selected students in the faculty FSKM, Shah Alam, 180 students had the first name Mohd.

Descriptive: "51% of these students have the first name Mohd."
Inferential: "51% of FSKM students have the first name Mohd."
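The arithmetic behind the 51% figure can be checked with a short Python sketch. It uses only the two counts given in the example; the variable names are illustrative.

```python
# Counts taken from the example: 180 of 350 sampled students are named Mohd.
sample_size = 350
named_mohd = 180

# Descriptive use: the proportion summarizes only the 350 students observed.
sample_proportion = named_mohd / sample_size
print(f"Proportion in the sample: {sample_proportion:.1%}")

# Inferential use: the same figure is taken as an estimate for ALL students
# in the faculty, which is a claim about the population, not the sample.
population_estimate = sample_proportion
print(f"Estimated proportion in the population: {population_estimate:.0%}")
```

The number is the same in both statements. What changes is the group it is claimed to describe.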
Population and Sample
○ A population (universe) is a collection of all possible individuals, objects, or measurements of interest.
○ A sample is a portion, or part, of the population of interest.
Statistical Terms
Population
• All items of interest
• Census – if the study involves the whole population
• Parameter – summary measure of the whole population
Sample
• Portion of population
• Sample survey – involves a subgroup (or sample) of the selected population
• Statistic – summary measure computed from sample data
Pilot study
• A small survey carried out in advance of the major observations
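The difference between a parameter and a statistic can be illustrated with a minimal Python sketch. The population below is made-up data (hypothetical CGPA values) generated only for illustration. The sample size of 50 is also an assumption.

```python
import random
import statistics

# Hypothetical population: CGPA values for 1,000 students (made-up data).
random.seed(1)
population = [round(random.uniform(2.0, 4.0), 2) for _ in range(1000)]

# Parameter: a summary measure computed from the whole population (a census).
population_mean = statistics.mean(population)

# Statistic: a summary measure computed from a sample of that population.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The two means will usually be close but not identical, which is why a statistic is treated as an estimate of the parameter.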
Source of data
Primary
• First-hand data
• Collected by the investigator
• E.g. interviewing respondents, survey, experiment
• Advantage – more accurate and consistent
• Able to explain how the data were collected and the limitations involved
• Disadvantage – requires more time and manpower, and is costly
Secondary
• Taken from another investigator's collection of figures
• Data collected by other parties
• E.g. Bank Negara, Statistics Department
• Advantage – easily accessible from the internet, journals, books, annual reports, etc.; inexpensive and takes less time to collect
• Disadvantage – may lack accuracy because the method of data collection is not explained, and may be biased because the original purpose of data collection is not known
Types of Variables
Variables
• Qualitative (categorical): e.g., make of a computer, hair color, gender
• Quantitative
o Discrete: mostly integers or numbers used for counting (e.g., number of houses, cars, accidents)
o Continuous: obtained through a measuring process (e.g., length, age, height, weight, time)
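As a rough illustration of the classification above, the following Python sketch sorts a single observed value by how it is stored. This is only a heuristic under stated assumptions: the function name `variable_type` is hypothetical, and a code stored as a number (such as a postcode) would be misclassified, so the final decision always depends on what the value means.

```python
def variable_type(value):
    """Rough heuristic: classify one observed value by its Python type."""
    # Text labels (and True/False flags) are treated as categories.
    if isinstance(value, (bool, str)):
        return "qualitative"
    # Whole numbers are assumed to come from counting.
    if isinstance(value, int):
        return "quantitative (discrete)"
    # Decimal numbers are assumed to come from a measuring process.
    if isinstance(value, float):
        return "quantitative (continuous)"
    raise TypeError(f"unsupported value: {value!r}")


print(variable_type("brown"))  # hair color
print(variable_type(3))        # number of cars
print(variable_type(1.75))     # height in metres
```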
Levels of Measurement
• Nominal: categorical (names)
• Ordinal: nominal, plus can be ranked (order)
• Interval: ordinal, plus intervals are consistent
• Ratio: interval, plus ratios are consistent, true zero
Nominal data
• Represent observations that can be categorized but do not have a meaningful numeric value
• Examples: Gender, Religion, Nationality, Favourite colour, Number on a football jersey

Properties:
1. Observations of a qualitative variable can only be
classified and counted.
2. There is no particular order to the labels.

Note:
• The values cannot be compared to see if one is
larger than the other
• Cannot calculate the MEAN
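A short Python sketch of what "classified and counted" means in practice, using made-up football jersey numbers. The numbers look numeric, but they are only labels, so counting and finding the most frequent label are the meaningful summaries.

```python
from collections import Counter
import statistics

# Made-up jersey numbers: numeric-looking labels, not quantities.
jersey_numbers = [7, 10, 10, 23, 9]

# Meaningful for nominal data: count how often each label occurs.
counts = Counter(jersey_numbers)

# Meaningful for nominal data: the most frequent label (the mode).
most_common_label = statistics.mode(jersey_numbers)

print(counts)
print(most_common_label)

# statistics.mean(jersey_numbers) would return a number, but an "average
# jersey number" has no interpretation, so it is deliberately not computed.
```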
Ordinal data
• Represent observations that can be categorized and rank ordered
• The values can be compared to see if one is larger or smaller than the other
• Examples:
o Consumer satisfaction ratings,
o Military rank - Private, Lieutenant, Captain, General
o Class ranking - Grade (A, B, C, D, E, F)

Properties:
1. Data classifications are represented by sets of labels or names
(high, medium, low) that have relative values.
2. Because of the relative values, the data classified can be
ranked or ordered.
Ordinal data (cont’d)
Note:
• Cannot assume the differences between adjacent scale values are equal
• Cannot make this assumption even if the labels are numbers, not words
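The ranking property of ordinal data can be sketched in Python with letter grades. The list of grades is made-up, and the rank numbers are just positions in the ordering: they support comparison and sorting, but the gaps between them carry no meaning.

```python
# Grade labels listed from lowest to highest.
GRADE_ORDER = ["F", "E", "D", "C", "B", "A"]

# Map each label to its position so that labels can be compared.
rank = {grade: position for position, grade in enumerate(GRADE_ORDER)}

# Made-up grades for five students.
grades = ["B", "A", "D", "C", "A"]

# Meaningful for ordinal data: ordering and comparison.
sorted_grades = sorted(grades, key=rank.get)
a_is_higher_than_b = rank["A"] > rank["B"]

print(sorted_grades)
print(a_is_higher_than_b)

# Not meaningful: rank["A"] - rank["B"] equals rank["B"] - rank["C"] only
# because of how the positions were assigned, not because the gaps are equal.
```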
Interval data
• Represent observations that can be categorized, rank ordered, and have a unit of measure
• A unit of measure implies that the difference between any two successive values is identical
• Examples: Fahrenheit temperature scale

Properties:
1. Data classifications are ordered according to the amount of
the characteristic they possess.
2. Equal differences in the characteristic are represented by
equal differences in the measurements.
Interval data (cont’d)
Note:
• Can be added or subtracted (cannot be multiplied or divided)
• No true zero point (the value 0 does not represent the complete absence of the variable)
Example: Women’s dress sizes listed on the table.
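Fahrenheit temperatures show why interval data should not be divided. In the Python sketch below the two temperatures are made-up values, and the Kelvin conversion is included only to show a scale with a true zero for comparison.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert Fahrenheit to Kelvin, a temperature scale with a true zero."""
    return (temp_f - 32) * 5 / 9 + 273.15


cool_f, warm_f = 40.0, 80.0  # made-up temperatures in degrees Fahrenheit

# Meaningful on an interval scale: the difference between two values.
difference_f = warm_f - cool_f

# Misleading on an interval scale: 80 F is not "twice as hot" as 40 F,
# because 0 F does not mean the absence of temperature.
naive_ratio = warm_f / cool_f

# On a scale with a true zero, the ratio of the same two temperatures
# is much closer to 1.
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(difference_f, naive_ratio, round(kelvin_ratio, 2))
```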
Ratio data
• Highest and most informative scale
• Observations that can be categorized, rank ordered, have a unit of measure and have a true zero (an absolute zero point)
• The true zero implies that a value of zero represents the complete absence of the variable
• Examples:
o Amount of money – zero money indicates the absence of money
o Weight, height, time
Ratio data (cont’d)

Properties:
1. Data classifications are ordered according to the amount of the
characteristics they possess.
2. Equal differences in the characteristic are represented by equal
differences in the numbers assigned to the classifications.
3. The zero point is the absence of the characteristic and the ratio
between two numbers is meaningful.

Note:
• Can be multiplied or divided
Levels of Measurement (highest to lowest)
• Ratio – highest scale; strongest form of measurement
• Interval
• Ordinal
• Nominal – lowest scale; weakest form of measurement
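The four levels build on one another: each level keeps the properties of the level below it and adds one more. A minimal Python sketch of this cumulative structure follows. The names `LEVELS`, `ADDED_PROPERTY` and `properties` are illustrative only.

```python
# Levels listed from weakest to strongest form of measurement.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# The property each level adds on top of the level before it.
ADDED_PROPERTY = {
    "nominal": "categories (names)",
    "ordinal": "rank order",
    "interval": "consistent intervals",
    "ratio": "true zero, meaningful ratios",
}


def properties(level):
    """Return every property a level has, including inherited ones."""
    position = LEVELS.index(level)
    return [ADDED_PROPERTY[name] for name in LEVELS[: position + 1]]


print(properties("interval"))
```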
Variables and Types of Data
Variable           Level (Nominal / Ordinal / Interval / Ratio)
Hair colour
Postcode
Letter Grade
CGPA
Height
Age
Temperature (F)
Observational and Experimental Studies
▪ In an observational study, the researcher merely observes and tries to draw conclusions based on the observations.
▪ In an experimental study, the researcher manipulates the independent (explanatory) variable and tries to determine how the manipulation influences the dependent (outcome) variable.
▪ A confounding variable influences the dependent variable but cannot be separated from the independent variable.
Example 1
Traffic offences are a growing concern at Dewan Bandaraya in Kuala Lumpur. A study was conducted to determine the profile of these traffic offenders. A researcher from this office collected data on the age, gender, race, types of offence, the amount of fine paid and the years of driving experience from a sample of traffic offenders as they entered the building to pay their fines. The researcher also checked the office database to obtain the number of traffic offences by these drivers.
Example 1 (cont’d)
i) State the population for the above study.
ii) Is the above study a census study or sample study?
iii) Was any secondary data used for the above study? If there was, please state the data.
iv) State the variable(s) and measurement scale from this study.
v) What is the most suitable data collection method? Give ONE (1) advantage and ONE (1) disadvantage of this method.
Example 1 (Solution)
i) State the population for the above study.
A: ALL the traffic offenders in K.L.
ii) Is the above study a census study or sample study?
A: Sample
iii) Was any secondary data used for the above study? If there was, please state the data.
A: Yes. Number of traffic offences by the drivers
Example 1 (Solution, cont’d)
iv) State the variable(s) and measurement scale from this study.
Variable                      Level
Age                           Ratio
Gender                        Nominal
Race                          Nominal
Types of offence              Nominal
Amount of fine paid           Ratio
Years of driving experience   Ratio
Example 1 (Solution, cont’d)
v) What is the most suitable data collection method? Give ONE (1) advantage and ONE (1) disadvantage of this method.
Method               Advantage              Disadvantage
Personal interview   Higher response rate   Expensive
Tutorial

Review Exercises
Pg. 26-28
• Q 4-9, 12, 13, 17, 18

Chapter Quiz
Pg. 29-30
• Q 1-6, 8, 10, 11, 22-24
