
CHAPTER 1: INTRODUCTION TO STATISTICS

WHAT IS STATISTICS?

The science of collecting, organizing, presenting,
analyzing, and interpreting data to assist in making
more effective decisions.
Statistical analysis – used to manipulate, summarize,
and investigate data so that it yields the useful
information needed in decision-making.

WHY STUDY STATISTICS?
1. Data are everywhere
2. Statistical techniques are used to make many decisions
that affect our lives
3. No matter what your career is, you will make
professional decisions that involve data. An
understanding of statistical methods will help you make
these decisions effectively.
MATHEMATICAL MODEL
1. A real world problem is observed
2. A mathematical model is devised
3. The model is used to make predictions about the expected
behavior of the real world problem
4. Experimental data are collected from the real world.
5. The predicted and observed outcomes are compared.
6. Statistical tests are used to assess how well the model describes the
real world.
7. The mathematical model is refined, if necessary, to improve the
match of predicted outcomes with the observed.

APPLICATIONS OF STATISTICAL
CONCEPTS IN THE BUSINESS WORLD
Finance – correlation and regression, index numbers, time
series analysis
Marketing – hypothesis testing, chi-square tests,
nonparametric statistics
Personnel – hypothesis testing, chi-square tests,
nonparametric tests
Operating management – hypothesis testing, estimation,
analysis of variance, time series analysis
TYPES OF STATISTICS

Descriptive statistics – Methods of organizing,
summarizing, and presenting data in an informative way
Inferential statistics – The methods used to determine
something about a population on the basis of a sample
• Population – The entire set of individuals or objects of
interest or the measurements obtained from all
individuals or objects of interest
• Sample – A portion, or part, of the population of
interest

INFERENTIAL STATISTICS
Estimation
• e.g., Estimate the population mean
weight using the sample mean weight

Hypothesis testing
• e.g., Test the claim that the population
mean weight is 70 kg

Inference is the process of drawing conclusions or making decisions about a
population based on sample results.
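
The two examples above can be sketched in Python. This is only an illustration under stated assumptions: the weights are made-up data, the variable names are hypothetical, and the one-sample t statistic is one common way to test such a claim (the slides do not say which test is intended).

```python
import math
import statistics

# Hypothetical sample of weights in kg (illustrative data, not from the slides).
weights = [68.2, 71.5, 69.8, 73.1, 66.4, 70.9, 72.3, 67.7, 69.0, 72.1]

n = len(weights)
sample_mean = statistics.mean(weights)   # estimation: point estimate of the population mean
sample_sd = statistics.stdev(weights)    # sample standard deviation (n - 1 divisor)

# Hypothesis testing: t statistic for the claim "population mean weight = 70 kg".
claimed_mean = 70.0
t_stat = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))

print(f"estimated mean weight: {sample_mean:.2f} kg")
print(f"t statistic for the 70 kg claim: {t_stat:.3f}")
```

A t statistic close to zero means the sample gives little evidence against the claim; how far from zero is "far enough" depends on the significance level chosen.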
SAMPLING
A sample should have the same characteristics
as the population it represents.
Sampling can be:
• with replacement: a member of the population may be chosen more
than once (picking the candy from the bowl)
• without replacement: a member of the population may be chosen
only once (lottery ticket)
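
As a rough illustration of the difference, Python's standard `random` module offers both modes. The five-member population and the seed below are assumptions made only so that the run is repeatable.

```python
import random

rng = random.Random(0)                 # seeded generator, for a repeatable run
population = ["A", "B", "C", "D", "E"]  # hypothetical population

# With replacement: the same member may be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: each member can be drawn at most once.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```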
SAMPLING METHODS
Sampling methods can be:
random (each member of the population has an equal chance of
being selected)
nonrandom

The actual process of sampling causes sampling
errors. For example, the sample may not be large
enough or representative of the population. Factors not
related to the sampling process cause nonsampling
errors. A defective counting device can cause a
nonsampling error.
RANDOM SAMPLING METHODS
simple random sample (each sample of the same size
has an equal chance of being selected)
stratified sample (divide the population into groups
called strata and then take a sample from each stratum)
cluster sample (divide the population into groups called clusters and
then randomly select some of the clusters. All the members from
these clusters are in the cluster sample.)
systematic sample (randomly select a starting point and
take every n-th piece of data from a listing of the
population)
DESCRIPTIVE STATISTICS

Collect data
• e.g., Survey

Present data
• e.g., Tables and graphs

Summarize data
• e.g., Sample mean = $\frac{\sum X_i}{n}$
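
The sample mean is the sum of the observations divided by the number of observations n. A minimal Python check, using made-up observations:

```python
import statistics

x = [4, 8, 15, 16, 23, 42]   # hypothetical observations X_1 ... X_n
n = len(x)

sample_mean = sum(x) / n     # sum of the X_i divided by n
print(sample_mean)           # 18.0

# The standard library gives the same value.
assert sample_mean == statistics.mean(x)
```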
VARIABLES

A variable is a characteristic or condition that can
change or take on different values.
Most research begins with a general question about the
relationship between two variables for a specific group
of individuals.

POPULATION

The entire group of individuals is called the population.
For example, a researcher may be interested in the
relation between class size (variable 1) and academic
performance (variable 2) for the population of
third-grade children.

SAMPLE

Usually populations are so large that a researcher
cannot examine the entire group. Therefore, a sample
is selected to represent the population in a research
study. The goal is to use the results obtained from the
sample to help answer questions about the population.

STATISTICAL DATA
• The collection of data that are relevant to the problem being
studied is commonly the most difficult, expensive, and time-consuming
part of the entire research project.
• Statistical data are usually obtained by counting or measuring items.
  • Primary data are collected specifically for the analysis desired.
  • Secondary data have already been compiled and are available for statistical
  analysis.
• A variable is an item of interest that can take on many different
numerical values.
• A constant has a fixed numerical value.
DATA
Statistical data are usually obtained by counting or measuring items.
Most data can be put into the following categories:
Qualitative data are measurements that each fall into one of several
categories (hair color, ethnic groups, and other attributes of the
population).
Quantitative data are observations that are measured on a numerical
scale (distance traveled to college, number of children in a family,
etc.)
QUALITATIVE DATA
Qualitative data are generally described by words or
letters. They are not as widely used as quantitative data
because many numerical techniques do not apply to the
qualitative data. For example, it does not make sense to
find an average hair color or blood type.
Qualitative data can be separated into two subgroups:
• dichotomic (if it takes the form of a word with two
options, e.g. gender: male or female)
• polynomic (if it takes the form of a word with more than
two options, e.g. education: primary school, secondary
school, and university)
QUANTITATIVE DATA

Quantitative data are always numbers and are the
result of counting or measuring attributes of a
population.
Quantitative data can be separated into two
subgroups:
• discrete (if it is the result of counting: the number of
students of a given ethnic group in a class, the number of
books on a shelf, ...)
• continuous (if it is the result of measuring: distance
traveled, weight of luggage, ...)
TYPES OF VARIABLES
Variables
  Qualitative
    Dichotomic: gender, marital status
    Polynomic: brand of PC, hair color
  Quantitative
    Discrete: children in family, strokes on a golf hole
    Continuous: amount of income tax paid, weight of a student
MEASURING VARIABLES
To establish relationships between variables, researchers must observe the
variables and record their observations. This requires that the variables be
measured.
The process of measuring a variable requires a set of categories called a scale
of measurement and a process that classifies each individual into one
category.

4 TYPES OF MEASUREMENT SCALES
1. A nominal scale is an unordered set of categories identified
only by name. Nominal measurements only permit you to
determine whether two individuals are the same or different.
2. An ordinal scale is an ordered set of categories. Ordinal
measurements tell you the direction of difference between two
individuals.

3. An interval scale is an ordered series of equal-sized
categories. Interval measurements identify the
direction and magnitude of a difference. The zero
point is located arbitrarily on an interval scale.
4. A ratio scale is an interval scale where a value of
zero indicates none of the variable. Ratio
measurements identify the direction and magnitude
of differences and allow ratio comparisons of
measurements.

ASSIGNMENT
Give at least 5 examples for each type of data:
1. Qualitative Data
2. Quantitative Data
3. Nominal
4. Ordinal
5. Interval
6. Ratio

MISUSES OF STATISTICS
Examples:
1. Using Statistics to convince people to buy products
2. Generalizing conclusions about a sample that does not represent
the population (biased sample)
3. Using Statistics to alter opinions of people

Assignment:
Look for other ways in which statistics are misused.
Explain their negative effects.

II. DATA COLLECTION AND SAMPLING

METHODS OF COLLECTING DATA

There are many methods used to collect or obtain data for statistical
analysis. Three of the most popular methods are:

• Direct Observation,
• Experiments, and
• Surveys.

SURVEYS…
A survey solicits information from people.
 e.g. polls; pre-election polls; marketing surveys.

The Response Rate (i.e. the proportion of all people selected who
complete the survey) is a key survey parameter.

Surveys may be administered in a variety of ways, e.g.

• Personal Interview,
• Telephone Interview, and
• Self-Administered Questionnaire.
QUESTIONNAIRE DESIGN…
Key design principles:
1. Keep the questionnaire as short as possible.
2. Ask short, simple, and clearly worded questions.
3. Start with demographic questions to help respondents get started
comfortably.
4. Use dichotomous (yes/no) and multiple-choice questions.
5. Use open-ended questions cautiously.
6. Avoid using leading questions.
7. Pretest a questionnaire on a small number of people.
8. Think about the way you intend to use the collected data when
preparing the questionnaire.
SAMPLING PLANS…

A sampling plan is just a method or procedure for specifying how a
sample will be taken from a population.

We will focus our attention on these three methods:

• Simple Random Sampling,
• Stratified Random Sampling, and
• Cluster Sampling.

SIMPLE RANDOM SAMPLING

A simple random sample is a sample selected in such a way that
every possible sample of the same size is equally likely to be chosen.

Drawing three names from a hat containing all the names of the
students in the class is an example of a simple random sample: any
group of three names is as likely to be picked as any other group
of three names.
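
The names-in-a-hat example can be sketched in Python. The class list and the seed are placeholders; `random.sample` draws without replacement, so every group of three distinct names is equally likely.

```python
import math
import random

# Hypothetical class list (the names are placeholders).
names = ["Ana", "Ben", "Cara", "Dan", "Eva", "Finn", "Gia", "Hugo"]

# Number of distinct groups of three names. Under simple random sampling
# each group is chosen with probability 1 / n_groups.
n_groups = math.comb(len(names), 3)

rng = random.Random(1)          # seeded only to make the run repeatable
drawn = rng.sample(names, 3)    # draw three names without replacement

print(n_groups, drawn)
```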

USING THE EXCEL FUNCTION RAND() TO
GENERATE A RANDOM SAMPLE FROM A
POPULATION.
Example: A government income tax auditor must choose a sample of
40 (usually denoted by n) of 1,000 (usually denoted by N) returns to
audit. . .

Extra numbers may be used if duplicate random numbers are
generated.
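
The spreadsheet steps are not shown here, so the following Python sketch is only one plausible reading of the procedure: `rng.random()` stands in for Excel's RAND(), each random number is scaled to a return number from 1 to N, and an extra number is drawn whenever a duplicate appears. The seed and variable names are assumptions.

```python
import random

N = 1000   # population size: number of tax returns
n = 40     # sample size: number of returns to audit

rng = random.Random(2024)   # seed chosen only to make the run repeatable

selected = []
while len(selected) < n:
    # rng.random() plays the role of RAND(): uniform on [0, 1).
    return_id = int(rng.random() * N) + 1   # map to a return number 1..N
    if return_id not in selected:           # duplicate: skip it and draw an extra number
        selected.append(return_id)

print(sorted(selected))
```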
STRATIFIED RANDOM SAMPLING

A stratified random sample is obtained by separating the population
into mutually exclusive sets, or strata, and then drawing simple
random samples from each stratum.

Strata 1: Gender    Strata 2: Age    Strata 3: Occupation
Male                <20
Female              20-30
                    31-40
                    41-50
                    >50

After the population has been stratified, we can use simple random
sampling to generate the complete sample:
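
A minimal Python sketch of this two-step procedure, stratifying by gender. The population, the total sample size of 10, and the proportional allocation rule (each stratum contributes in proportion to its size) are all assumptions made for illustration; other allocation rules are possible.

```python
import random

rng = random.Random(7)   # seeded only to make the run repeatable

# Hypothetical population: (person_id, gender) pairs, 60 male and 40 female.
population = [(i, "Male") for i in range(60)] + [(i, "Female") for i in range(60, 100)]

# Step 1: separate the population into mutually exclusive strata.
strata = {}
for person in population:
    strata.setdefault(person[1], []).append(person)

# Step 2: draw a simple random sample from each stratum, sized in
# proportion to the stratum's share of the population.
total_sample_size = 10
sample = []
for label, members in strata.items():
    k = round(total_sample_size * len(members) / len(population))
    sample.extend(rng.sample(members, k))

print(sample)
```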

CLUSTER SAMPLING

A cluster sample is a simple random sample of groups or clusters of
elements (vs. a simple random sample of individual objects).

This method is useful when it is difficult or costly to develop a
complete list of the population members or when the population
elements are widely dispersed geographically.

Cluster sampling may increase sampling error due to similarities
among cluster members.
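
A rough Python sketch of the idea, assuming a small made-up population of households grouped into blocks. The random draw is made over the clusters, and every household in a chosen cluster enters the sample.

```python
import random

rng = random.Random(3)   # seeded only to make the run repeatable

# Hypothetical population grouped into clusters (block names and
# household ids are invented for illustration).
clusters = {
    "block_1": [101, 102, 103],
    "block_2": [201, 202],
    "block_3": [301, 302, 303, 304],
    "block_4": [401, 402, 403],
    "block_5": [501, 502],
}

# Simple random sample of clusters, not of individual households.
chosen_blocks = rng.sample(sorted(clusters), 2)

# Every member of a chosen cluster is included in the sample.
sample = [household for block in chosen_blocks for household in clusters[block]]

print(chosen_blocks, sample)
```

Only a list of clusters is needed to draw the sample, which is why the method suits populations with no complete member list.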

SYSTEMATIC RANDOM SAMPLING
• Systematic sampling is a probability sampling method in which
members of a larger population are selected at a fixed periodic
interval, beginning from a randomly chosen starting point.
• The fixed periodic interval, called the sampling interval, is calculated
by dividing the population size by the desired sample size.
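
A minimal Python sketch of the calculation, assuming a hypothetical listing of 100 units and a desired sample of 10, so that the population size divides evenly by the sample size. When it does not, the interval has to be rounded and the sample size may differ slightly.

```python
import random

rng = random.Random(5)   # seeded only to make the run repeatable

population = list(range(1, 101))   # hypothetical listing of N = 100 units
sample_size = 10

# Sampling interval = population size / desired sample size.
interval = len(population) // sample_size

# Random starting point within the first interval, then every interval-th unit.
start = rng.randrange(interval)
sample = population[start::interval]

print(interval, sample)
```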

SAMPLE SIZE
This is an important issue. Numerical techniques for determining
sample sizes will be described later, but suffice it to say that the larger
the sample size is, the more accurate we can expect the sample
estimates to be.

SAMPLING AND NON-SAMPLING ERRORS

Two major types of error can arise when a sample of observations is
taken from a population: sampling error and non-sampling error.

Sampling error refers to differences between the sample and the
population that exist only because of the observations that happened
to be selected for the sample.

Non-sampling errors are more serious and are due to mistakes
made in the acquisition of data or due to the sample observations
being selected improperly.

SAMPLING ERROR

Sampling error refers to differences between the sample and the
population that exist only because of the observations that happened
to be selected for the sample.

Another way to look at this is: the differences in results for different
samples (of the same size) are due to sampling error.

E.g. Two samples of size 10 of 1,000 households. If we happened to
get the highest income level data points in our first sample and all
the lowest income levels in the second, this is a consequence of
sampling error.

Increasing the sample size will reduce this type of error.
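
A small simulation can make this concrete. The sketch below assumes a simulated population of 1,000 household incomes (the distribution, seed, and function name are illustrative choices) and measures how much the sample mean varies from sample to sample for two sample sizes.

```python
import random
import statistics

rng = random.Random(11)   # seeded only to make the run repeatable

# Hypothetical population of 1,000 household incomes (simulated values).
population = [rng.gauss(50_000, 12_000) for _ in range(1_000)]

def spread_of_sample_means(sample_size, repetitions=500):
    """Standard deviation of the sample mean across repeated samples."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(100)

print(f"n = 10:  spread of sample means = {spread_small:,.0f}")
print(f"n = 100: spread of sample means = {spread_large:,.0f}")
```

The spread for samples of 100 should come out clearly smaller than for samples of 10, which is the sense in which a larger sample reduces sampling error.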


NON-SAMPLING ERROR

Non-sampling errors are more serious and are due to mistakes made
in the acquisition of data or due to the sample observations being
selected improperly.

There are three types of non-sampling errors:

• Errors in data acquisition,
• Nonresponse errors, and
• Selection bias.

Increasing the sample size will not reduce this type of error.
ERRORS IN DATA ACQUISITION

. . . arises from the recording of incorrect responses, due to:

• incorrect measurements being taken because of faulty equipment,
• mistakes made during transcription from primary sources,
• inaccurate recording of data due to misinterpretation of terms, or
• inaccurate responses to questions concerning sensitive issues.

NON-RESPONSE ERROR

. . . refers to error (or bias) introduced when responses are not
obtained from some members of the sample, i.e. the sample
observations that are collected may not be representative of the
target population.

As mentioned earlier, the Response Rate (i.e. the proportion of all
people selected who complete the survey) is a key survey parameter
and helps in understanding the validity of the survey and the
sources of nonresponse error.

SELECTION BIAS

. . . occurs when the sampling plan is such that some members of
the target population cannot possibly be selected for inclusion in the
sample.
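
A rough Python illustration of why a larger sample does not cure this kind of error. The population (values 1 to 1,000) and the flawed plan (only values up to 500 can ever be selected) are invented for the example.

```python
import random
import statistics

rng = random.Random(13)   # seeded only to make the run repeatable

# Hypothetical target population: 1,000 units with values 1..1000.
population = list(range(1, 1001))
true_mean = statistics.mean(population)

# Flawed sampling plan: units with values above 500 cannot be selected.
frame = [value for value in population if value <= 500]

# Gap between the true mean and the sample mean, for two sample sizes.
errors = {}
for sample_size in (100, 400):
    biased_mean = statistics.mean(rng.sample(frame, sample_size))
    errors[sample_size] = true_mean - biased_mean
    print(sample_size, round(errors[sample_size], 1))
```

Both sample sizes miss the true mean by a wide margin, because the excluded half of the population never has a chance to enter the sample.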

NON-PROBABILITY SAMPLING
1. CONVENIENCE SAMPLING - a non-probability sampling
technique where samples are selected from the population only
because they are conveniently available to the researcher
2. QUOTA SAMPLING - the population is first segmented into
mutually exclusive sub-groups, just as in stratified sampling.
Then judgement is used to select the subjects or units from
each segment based on a specified proportion
3. PURPOSIVE SAMPLING - researchers choose only those people
who they deem fit to participate in the research study


4. CONSECUTIVE SAMPLING - the researcher picks a single person
or a group as a sample, conducts research over a period, analyzes
the results, and then moves on to another subject or group if
needed
5. SNOWBALL SAMPLING - once the researcher finds suitable
subjects, he or she asks them for help in finding similar subjects
until the sample reaches a sufficient size
