
Nguyen Thu Hang

Theory of Economics Statistics


(TOAE301)
Assessment

 Performance: 10%
 Mid-term test + project: 30%
 Final term test: 60%
Course outline
 Chapter 1: Introduction to Statistics
 Chapter 2: Summarizing Data
 Chapter 3: Numerical Descriptive Techniques
 Chapter 4: Inferences Based on a Single
Sample: Confidence Intervals and Tests of
Hypothesis
 Chapter 5: Inferences Based on Two Samples:
Confidence Intervals and Tests of Hypothesis
 Chapter 6: Regression Analysis
 Chapter 7: Time series analysis
Textbooks

 Gerald Keller (2018), Statistics for Management
and Economics, Cengage Learning
 James T. McClave • P. George Benson • Terry
Sincich (2018), Statistics for Business and
Economics, Pearson Education
 Levin, Stephan, Krehbiel & Berenson, Statistics
for Managers Using Microsoft Excel, 8e © 2017
Pearson Prentice-Hall, Inc.
Chapter 1

Introduction to Statistics (6 hours)


Learning Objectives
In this chapter you learn:
 1. Statistics Definition and Objectives
 2. Statistical Concepts
 3. Types of data and variable
measurements
 4. Statistical Analysis Process
 5. Source of Data
 6. Questionnaire design
Business Statistics Marks
 A student enrolled in a business
program is attending the first class
of the required statistics course.
The student is somewhat
apprehensive because he believes
the myth that the course is
difficult. To alleviate his anxiety,
the student asks the professor
about last year’s marks. The
professor obliges and provides a
list of the final marks, which is
composed of term work plus the
final exam. What information
can the student obtain from the
list?
Case: Pepsi's Agreement
1. What Is Statistics?

1. Collecting data
e.g., Survey
2. Presenting data
e.g., Charts & Tables
3. Characterizing data
e.g., Average

Why? Data analysis for decision-making
1. What Is Statistics?

 A branch of mathematics that takes numbers and
transforms them into useful information for
decision makers. Statistics is a way to get
information from data.
 Methods for processing & analyzing numbers
 Methods for helping reduce the uncertainty
inherent in decision making
1. What Is Statistics?
Statistics is the science of data.
It involves collecting, classifying, summarizing,
organizing, analyzing, and interpreting
numerical information.
Application Areas

 Economics
 Forecasting
 Demographics
 Engineering
 Construction
 Materials
 Sports
 Individual & Team Performance
 Business
 Consumer Preferences
 Financial Trends
Objectives of Statistics

Decision Makers Use Statistics To:


 Present and describe business data and information
properly
 Draw conclusions about large groups of individuals or
items, using information collected from subsets of the
individuals or items.
 Make reliable forecasts about a business activity
 Improve business/production processes
 Improve product quality
Statistics: Two Processes

A. Describing sets of data

B. Drawing conclusions (making estimates,
decisions, predictions, etc.) about sets of data
based on sampling
Types of Statistics

 Statistics
 The branch of mathematics that transforms data into
useful information for decision makers.

Descriptive Statistics
Collecting, summarizing, and describing data

Inferential Statistics
Drawing conclusions and/or making decisions concerning a
population based only on sample data
Descriptive Statistics

 Collect data
 e.g., Survey
 Present data
 e.g., Tables and graphs
 Characterize data
 e.g., Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
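As a small illustration of the "characterize data" step, the sample mean can be computed directly from its definition. The observations below are hypothetical values chosen for the sketch, not data from the course.

```python
from statistics import mean

# Hypothetical survey responses (e.g., ages of five respondents).
observations = [23, 31, 27, 45, 34]

# Sample mean: sum of the observations divided by the sample size n.
n = len(observations)
sample_mean = sum(observations) / n

# The standard library function gives the same result.
assert sample_mean == mean(observations)
print(sample_mean)  # 32.0
```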
Descriptive Statistics

Descriptive statistics utilizes numerical and
graphical methods to explore data, i.e., to look
for patterns in a data set, to summarize the
information revealed in a data set, and to present
the information in a convenient form.
Inferential Statistics
 Estimation
 e.g., Estimate the population
mean weight using the sample
mean weight
 Hypothesis testing
 e.g., Test the claim that the
population mean weight is 120
pounds

Drawing conclusions about a large group of
individuals based on a subset of the large group.
Inferential Statistics

 Inferential statistics utilizes sample data to
make estimates, decisions, predictions, or other
generalizations about a larger set of data.
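The two inferential tasks named earlier (estimating a population mean weight, and testing the claim that it is 120 pounds) can be sketched as follows. The weights are hypothetical, and the sketch stops at the test statistic: deciding whether to reject the claim would require comparing it with a t distribution with n - 1 degrees of freedom, which is not done here.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of weights in pounds (illustrative data only).
weights = [118, 125, 122, 130, 115, 121, 127, 119]

n = len(weights)
x_bar = mean(weights)   # point estimate of the population mean
s = stdev(weights)      # sample standard deviation (divides by n - 1)

# Test statistic for the claim H0: population mean = 120 pounds.
mu_0 = 120
t_stat = (x_bar - mu_0) / (s / sqrt(n))

print(f"estimate = {x_bar:.3f}, t = {t_stat:.3f}")
```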
Example: Inferential statistics
2. Statistical Concepts
Experimental unit
Object upon which we collect data
Population
The totality of objects under consideration
Variable
Characteristic of an individual experimental unit
Measurement
The process we use to assign numbers to variables of
individual population units
Sample
Subset of the units of a population that is selected for analysis
• P in Population & Parameter
• S in Sample & Statistic
2. Statistical Concepts
 Data
 facts or information that is relevant or appropriate to
a decision maker
 Parameter
 a summary measure (e.g., mean) that is computed
to describe a characteristic of the population
 Statistic
 a summary measure (e.g., mean) that is computed
to describe a characteristic of the sample
Population vs. Sample

Population: Measures used to describe the
population are called parameters.
Sample: Measures computed from sample data
are called statistics.
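The distinction between a parameter and a statistic can be made concrete with a simulated population. Everything below is an assumption for illustration: the population of 10,000 ages, the sample size of 200, and the seed.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 10,000 people.
population = [random.randint(18, 80) for _ in range(10_000)]

# Parameter: computed from every unit in the population.
population_mean = mean(population)

# Statistic: computed from a sample drawn from that population.
sample = random.sample(population, 200)
sample_mean = mean(sample)

print(f"parameter = {population_mean:.2f}, statistic = {sample_mean:.2f}")
```

In practice the parameter is usually unknown because the whole population cannot be measured, which is why the statistic is used to estimate it.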
Example

 According to a report in the Washington Post (Sep. 5,
2014), the average age of viewers of television programs
broadcast on CBS, NBC, and ABC is 54 years. Suppose
a rival network (e.g., FOX) executive hypothesizes that
the average age of FOX viewers is less than 54. To test
her hypothesis, she samples 200 FOX viewers and
determines the age of each.
 a. Describe the population.
 b. Describe the variable of interest.
 c. Describe the sample.
 d. Describe the inference.
2. Statistical Concepts
 Measure of Reliability
• Statement (usually qualified) about the degree of
uncertainty associated with a statistical inference
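One common way to state a measure of reliability is a confidence interval around an estimate. The sketch below assumes a large-sample normal approximation and uses made-up data; it is one possible illustration, not the method prescribed by the slides.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 40 measurements (illustrative data only).
sample = [50 + (i % 7) - 3 for i in range(40)]

n = len(sample)
x_bar = mean(sample)
standard_error = stdev(sample) / sqrt(n)

# z value that leaves 2.5% in each tail of the standard normal distribution.
z = NormalDist().inv_cdf(0.975)
margin = z * standard_error

lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The width of the interval is the qualified statement of uncertainty: a narrower interval means a more reliable inference at the same confidence level.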
Four Elements of Descriptive
Statistical Problems
1. The population or sample of interest
2. One or more variables (characteristics of the
population or sample units) that are to be
investigated
3. Tables, graphs, or numerical summary tools
4. Identification of patterns in the data
Five Elements of Inferential
Statistical Problems
1. The population of interest
2. One or more variables (characteristics of the
population units) that are to be investigated
3. The sample of population units
4. The inference about the population based on
information contained in the sample
5. A measure of reliability for the inference
Process

A process is a series of actions or operations that
transforms inputs to outputs. A process produces
or generates output over time.
Process

A process whose operations or actions are
unknown or unspecified is called a black box.

Any set of output (objects or numbers) produced by
a process is called a sample.
Example
 A particular fast-food restaurant chain has 6,289 outlets with
drive-through windows. To attract more customers to its
drive-through services, the company is considering offering
a 50% discount to customers who wait more than a specified
number of minutes to receive their order. To help determine
what the time limit should be, the company decided to
estimate the average waiting time at a particular drive-
through window in Dallas, Texas. For 7 consecutive days,
the worker taking customers’ orders recorded the time that
every order was placed. The worker who handed the order
to the customer recorded the time of delivery. In both cases,
workers used synchronized digital clocks that reported the
time to the nearest second. At the end of the 7-day period,
2,109 orders had been timed.
Example (cont)
 a. Describe the process of interest at the Dallas
restaurant.
 b. Describe the variable of interest.
 c. Describe the sample.
 d. Describe the inference of interest.
 e. Describe how the reliability of the inference could be
measured.
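The variable in this example is derived from two clock readings per order. A minimal sketch of that calculation follows, assuming the times are recorded as same-day `HH:MM:SS` strings; the three orders and the helper name `waiting_seconds` are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Hypothetical (order placed, order delivered) times, to the nearest second.
orders = [
    ("12:01:15", "12:04:40"),
    ("12:03:02", "12:05:32"),
    ("12:07:48", "12:12:03"),
]

def waiting_seconds(placed: str, delivered: str) -> float:
    """Waiting time in seconds between two same-day clock readings."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(delivered, fmt) - datetime.strptime(placed, fmt)
    return delta.total_seconds()

waits = [waiting_seconds(p, d) for p, d in orders]
print(waits)  # [205.0, 150.0, 255.0]
print(f"average wait = {mean(waits):.1f} seconds")
```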
3. Types of Data and Variable
Measurements

Quantitative data are measurements that are
recorded on a naturally occurring numerical scale.
Qualitative data are measurements that cannot
be measured on a natural numerical scale; they
can only be classified into one of a group of
categories.
3. Types of Data

Types of Data
 Quantitative Data
 Qualitative Data
Quantitative Data
Measured on a numeric scale.
 Number of defective items in a lot.
 Salaries of CEOs of oil companies.
 Ages of employees at a company.
Qualitative Data
Classified into categories.
 College major of each student in a class.
 Gender of each employee at a company.
 Method of payment (cash, check, credit card).
Example
 Chemical and manufacturing plants sometimes
discharge toxic-waste materials such as DDT into
nearby rivers and streams. These toxins can
adversely affect the plants and animals inhabiting
the river and the riverbank. The U.S. Army Corps
of Engineers conducted a study of fish in the
Tennessee River (in Alabama) and its three
tributary creeks: Flint Creek, Limestone Creek, and
Spring Creek. A total of 144 fish were captured,
and the following variables were measured for
each: (continued on next slide)
Example (cont)
 1. River/creek where each fish was captured
 2. Species (channel catfish, largemouth bass, or
smallmouth buffalo fish)
 3. Length (centimeters)
 4. Weight (grams)
 5. DDT concentration (parts per million)

 These data are saved in the DDT file. Classify each
of the five variables measured as quantitative or
qualitative.
Types of Variables

 Categorical (qualitative) variables have values
that can only be placed into categories, such as
“yes” and “no.”

 Numerical (quantitative) variables have values
that represent quantities.
Types of Variables

Data
 Categorical (defined categories)
 Examples: Marital Status, Political Party, Eye Color
 Numerical
 Discrete (counted items)
 Examples: Number of Children, Defects per hour
 Continuous (measured characteristics)
 Examples: Weight, Voltage
Levels of Measurement

 A nominal scale classifies data into distinct categories in
which no ranking is implied.

Categorical Variables          Categories
Personal Computer Ownership    Yes / No
Type of Stocks Owned           Growth / Value / Other
Internet Provider              Microsoft Network / AOL / Other
Levels of Measurement

 An ordinal scale classifies data into distinct categories
in which ranking is implied.

Categorical Variable              Ordered Categories
Student class designation         Freshman, Sophomore, Junior, Senior
Product satisfaction              Satisfied, Neutral, Unsatisfied
Faculty rank                      Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings    AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades                    A, B, C, D, F
Levels of Measurement
 An interval scale is an ordered scale in which the
difference between measurements is a meaningful
quantity but the measurements do not have a true zero
point.

 A ratio scale is an ordered scale in which the difference
between the measurements is a meaningful quantity
and the measurements have a true zero point.
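Temperature is a standard illustration of the difference between the two scales, although the example itself is an addition and does not come from the slides. Celsius has no true zero (interval scale), while Kelvin does (ratio scale), so differences agree on both scales but ratios do not.

```python
# Two temperatures measured in degrees Celsius (an interval scale).
c1, c2 = 10.0, 20.0

def to_kelvin(celsius: float) -> float:
    """Convert Celsius to Kelvin, a scale with a true zero point."""
    return celsius + 273.15

# Differences are meaningful on both scales: 10 degrees either way.
diff_c = c2 - c1
diff_k = to_kelvin(c2) - to_kelvin(c1)

# Ratios are only meaningful on the ratio scale.
ratio_c = c2 / c1                        # 2.0, yet 20 °C is not "twice as hot"
ratio_k = to_kelvin(c2) / to_kelvin(c1)  # about 1.035

print(diff_c, round(diff_k, 2), ratio_c, round(ratio_k, 3))
```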
Interval and Ratio Scales
4. Statistical Analysis Process

 Identify research goals
 Identify variables of interest and measuring
methods
 Data collection
 Data summarization
 Data analysis
 Forecasting
 Decision making
The role of statistics in business analytics

Source: From The American Statistician by George Benson.
Discussion
 Monitoring product quality. The Wallace Company of Houston is a
distributor of pipes, valves, and fittings to the refining, chemical, and
petrochemical industries. The company was a recent winner of the
Malcolm Baldrige National Quality Award. One of the steps the company
takes to monitor the quality of its distribution process is to send out a
survey twice a year to a subset of its current customers, asking the
customers to rate the speed of deliveries, the accuracy of invoices, and the
quality of the packaging of the products they have received from Wallace.
a. Describe the process studied.
b. Describe the variables of interest.
c. Describe the sample.
d. Describe the inferences of interest.
e. What are some of the factors that are likely to affect the reliability of the
inferences?
5. Sources of Data

1. Data from a published source
2. Data from a designed experiment
3. Data from an observational study
5. Sources of Data

 Primary Sources: The data collector is the one using the data
for analysis
 Data from a political survey
 Data collected from an experiment
 Observed data
 Secondary Sources: The person performing data analysis is
not the data collector
 Analyzing census data
 Examining data from print journals or data published on the internet.
5. Sources of Data
Published source:
book, journal, newspaper, Web site
Designed experiment:
researcher exerts strict control over the units
Survey:
a group of people are surveyed and their
responses are recorded
Observational study:
units are observed in their natural setting and
variables of interest are recorded
Designed Experiment

 A designed experiment is a data-collection
method where the researcher exerts full control
over the characteristics of the experimental
units sampled. These experiments typically
involve a group of experimental units that are
assigned the treatment and an untreated (or
control) group.
Observational Study

 An observational study is a data-collection
method where the experimental units sampled
are observed in their natural setting. No attempt
is made to control the characteristics of the
experimental units sampled. (Examples include
opinion polls and surveys.)
Samples
A representative sample exhibits characteristics
typical of those possessed by the population of
interest.

A simple random sample of n experimental units is
a sample selected from the population in such a way
that every different sample of size n has an equal
chance of selection.
Example

 Suppose you wish to assess the feasibility of
building a new high school. As part of your
study, you would like to gauge the opinions of
people living close to the proposed building site.
The neighborhood adjacent to the site has 711
homes. Use a random number generator to
select a simple random sample of 20
households from the neighborhood to
participate in the study.
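One way to carry out this selection is sketched below. It assumes the 711 homes have been numbered 1 through 711; the seed is a hypothetical value used only to make the draw reproducible.

```python
import random

# Assumption: the 711 homes are numbered 1 through 711.
homes = range(1, 712)

random.seed(2024)  # hypothetical seed, only to make the draw reproducible

# random.sample draws without replacement, so no home is picked twice and
# every set of 20 homes is equally likely to be chosen.
selected = sorted(random.sample(homes, 20))
print(selected)
```

Drawing without replacement matters here: a household should not appear in the sample more than once.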
Importance of Selection

How a sample is selected from a population is of
vital importance in statistical inference because
the probability of an observed sample will be
used to infer the characteristics of the sampled
population.
Nonrandom Sample Errors
Selection bias results when a subset of the
experimental units in the population is excluded so
that these units have no chance of being selected
for the sample.
Nonresponse bias results when the researchers
conducting a survey or study are unable to obtain
data on all experimental units selected for the
sample.
Measurement error refers to inaccuracies in the
values of the data recorded. In surveys, the error
may be due to ambiguous or leading questions and
the interviewer’s effect on the respondent.
Example
 How do consumers feel about using the Internet
for online shopping? To find out, United Parcel
Service (UPS) commissioned a nationwide
survey of 5,118 U.S. adults who had conducted
at least two online transactions in 2015. One
finding from the study is that 74% of online
shoppers have used a smartphone to do their
shopping.
 a. Identify the data-collection method.
 b. Identify the target population.
 c. Are the sample data representative of the
population?
Questionnaire Design

Questionnaires

 The validity of the results depends on the quality
of these instruments.
 Good questionnaires are difficult to construct; bad
questionnaires are difficult to analyze.
 Difficult to design for several reasons:
 Each question must provide a valid and reliable
measure.
 The questions must clearly communicate the research
intention to the survey respondent.
 The questions must be assembled into a logical, clear
instrument that flows naturally and will keep the
respondent sufficiently interested to continue to
cooperate.

Quality aims in survey research

Goal is to collect information that is:
 Valid: measures the quantity or concept that is
supposed to be measured
 Reliable: measures the quantity or concept in a
consistent or reproducible manner
 Unbiased: measures the quantity or concept in a
way that does not systematically under- or
overestimate the true value
 Discriminating: can distinguish adequately
between respondents for whom the underlying
level of the quantity or concept is different
Steps to design a questionnaire:
Step 1: Write out the primary and secondary aims
of your study.
Step 2: Write out concepts/information to be
collected that relates to these aims.
Step 3: Review the current literature to identify
already validated questionnaires that measure
your specific area of interest.
Step 4: Compose a draft of your questionnaire.
Step 5: Revise the draft.
Step 6: Assemble the final questionnaire.
Step 1: Define the aims of the study

 Write out the problem and primary and
secondary aims using one sentence per aim.
Formulate a plan for the statistical analysis of
each aim.
 Make sure to define the target population in
your aim(s).

Step 2: Define the variables to be collected

 Write a detailed list of the information to be collected and the
concepts to be measured in the study. Are you trying to
identify:
 Attitudes
 Needs
 Behavior
 Demographics
 Some combination of these concepts
 Translate these concepts into variables that can be measured.
 Define the role of each variable in the statistical analysis:

Step 3: Review the literature


 Review current literature to identify related
surveys and data collection instruments that
have measured concepts similar to those
related to your study’s aims.

Step 4: Compose a draft


 Determine the mode of survey administration:
face-to-face interviews, telephone interviews, self-
completed questionnaires, computer-assisted
approaches.
 Format the draft as if it were the final version with
appropriate white space to get an accurate
estimate as to its length – longer questionnaires
reduce the response rate.
 Make sure questions flow naturally from one to
another.

Compose a draft

 Question: How many cups of coffee or tea do
you drink in a day?
 Principle: Ask for an answer in only one
dimension.
 Solution: Separate the question into two –
 (1) How many cups of coffee do you drink during a
typical day?
 (2) How many cups of tea do you drink during a
typical day?

Compose a draft
 Question: What brand of computer do you own?
 (A) IBM PC
 (B) Apple
 Principle: Avoid hidden assumptions. Make sure to
accommodate all possible answers.
 Solution:
 (1) Make each response a separate dichotomous item
 Do you own an IBM PC? (Circle: Yes or No)
 Do you own an Apple computer? (Circle: Yes or No)
 (2) Add necessary response categories and allow for multiple
responses.
 What brand of computer do you own? (Circle all that apply)
 Do not own computer
 IBM PC
 Apple
 Other

Compose a draft

 Question: Have you had pain in the last week?
[ ] Never [ ] Seldom [ ] Often [ ] Very often
 Principle: Make sure question and answer
options match.
 Solution: Reword either question or answer to
match.
 How often have you had pain in the last week?
[ ] Never [ ] Seldom [ ] Often [ ] Very Often

Compose a draft

 Question: Are you against drug abuse? (Circle:
Yes or No)
 Principle: Write questions that will produce
variability in the responses.
 Solution: Eliminate the question.

Compose a draft
 Question: Which one of the following do you think increases a
person’s chance of having a heart attack the most? (Check
one.)
[ ] Smoking [ ] Being overweight [ ] Stress
 Principle: Encourage the respondent to consider each possible
response to avoid the uncertainty of whether a missing item may
represent either an answer that does not apply or an overlooked
item.
 Solution: Which of the following increases the chance of having
a heart attack?
 Smoking: [ ] Yes [ ] No [ ] Don’t know
 Being overweight: [ ] Yes [ ] No [ ] Don’t know
 Stress: [ ] Yes [ ] No [ ] Don’t know

Compose a draft

 Question:
 (1) Do you currently have a life insurance policy?
(Circle: Yes or No)
 If no, go to question 3.
 (2) How much is your annual life insurance premium?
 Principle: Avoid branching as much as possible
to avoid confusing respondents.
 Solution: If possible, write as one question.
 How much did you spend last year for life insurance?
(Write 0 if none).

Step 5: Revise

 Shorten the set of questions for the study. If a
question does not address one of your aims,
discard it.
 Refine the questions included and their wording
by testing them with a variety of respondents.
 Ensure the flow is natural.
 Verify that terms and concepts are familiar and easy
to understand for your target audience.
 Keep recall to a minimum and focus on the recent
past.
Step 6: Assemble the final
questionnaire

 Decide whether you will format the questionnaire yourself or
use computer-based programs for assistance:
 SurveyMonkey.com
 Google Forms
 At the top, clearly state:
 The purpose of the study
 How the data will be used
 Instructions on how to fill out the questionnaire
 Your policy on confidentiality
Assemble the final questionnaire

 Group questions concerning major subject
areas together and introduce them by heading
or short descriptive statements.
 Order questions in order to stimulate recall.
 Order and format questions to ensure unbiased
and balanced results.
Assemble the final questionnaire
 Include white space to make answers clear and
to help increase response rate.
 Space response scales widely enough so that it
is easy to circle or check the correct answer
without the mark accidentally including the
answer above or below.
 Open-ended questions: the space for the response
should be big enough to allow respondents with large
handwriting to write comfortably in the space.
 Closed-ended questions: line up answers vertically
and precede them with boxes or brackets to check, or
by numbers to circle, rather than open blanks.

Non-responders

 Understanding the characteristics of those who
did not respond to the survey is important to
quantify what, if any, bias exists in the results.
 To quantify the characteristics of the non-
responders to postal surveys, Moser and Kalton
suggest tracking the length of time it takes for
surveys to be returned. Those who take the
longest to return the survey are most like the
non-responders. This result may be situation-
dependent.
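The return-time idea can be sketched as a comparison between early and late responders. The data, the satisfaction score, and the half-and-half split below are all hypothetical choices for illustration; the slides do not specify how the comparison should be made.

```python
from statistics import mean

# Hypothetical returned surveys: (days until returned, satisfaction score 1-5).
returns = [(2, 4), (3, 5), (4, 4), (6, 4), (9, 3), (14, 3), (20, 2), (25, 3)]

# Sort by return time and split into early and late halves.
returns.sort(key=lambda r: r[0])
half = len(returns) // 2
early = [score for _, score in returns[:half]]
late = [score for _, score in returns[half:]]

# A gap between the two means hints at possible nonresponse bias,
# under the assumption that late responders resemble non-responders.
print(mean(early), mean(late))  # 4.25 2.75
```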

Conclusions

 You need plenty of time!
 Design your questionnaire from research hypotheses
that have been carefully studied and thought out.
 Discussing the research problem with colleagues and
subject matter experts is critical to developing good
questions.
 Review, revise and test the questions on an iterative
basis.
 Examine the questionnaire as a whole for flow and
presentation.
 End of Chapter 1
