
FOUNDATIONS OF BUSINESS ANALYTICS

VNUK_FBA
LECTURE 1
What is Statistics?
Types of data, data collection and
sampling
Chapter outline
1.1 Key statistical concepts
1.2 Practical applications
1.3 How managers use statistics
1.4 Online resources
Learning objectives
LO1 describe the two major branches of statistics –
descriptive statistics and inferential statistics
LO2 understand the key statistical concepts – population,
sample, parameter, statistic and census
LO3 provide examples of practical applications in which
statistics have a major role to play
LO4 understand the basics of the spreadsheet package
Microsoft Excel and its capabilities in aiding statistical
analysis of large amounts of data.
1.5-1.10
[Image slides not reproduced: ‘What does this form indicate?’ (two forms) and ‘What does this photo indicate?’ (three photos)]
1.11

Introduction to Statistics
In today’s world we have access to more data than ever.
For example, data are collected for business applications
from:
• Direct observation or measurement
• Customer surveys
• Political elections
• Economic surveys
• Marketing surveys

This chapter introduces various statistical concepts. In
Chapter 2, we introduce various methods of data
collection.
1.12

In today’s world…
How can we make use of the collected data to
help make informed business decisions?

By learning statistics, which is a collection of
various techniques and tools, we can help make
such decisions.
1.13

What is Statistics?

‘Statistics is a body of principles and
methods concerned with extracting
useful information from a set of
data to help people make informed
business decisions.’
1.14

What is Statistics?
‘Statistics is a way to get information
from data to make informed decisions.’

Data → Statistics → Information

Data: Mostly numerical facts collected from
direct observations, measurements or surveys.

Information: Knowledge communicated concerning some
particular fact, which can be used for decision making.
E.g. average marks; the proportion of the class receiving
a ‘B’ grade and above.
1.15

Two major branches of Statistics

1. Descriptive Statistics

2. Inferential Statistics
1.16

Descriptive Statistics
Descriptive statistics deals with methods of organising,
summarising, and presenting data in a convenient and
informative way.

One form of descriptive statistics uses graphical
techniques, which allow statistics practitioners to present
data in ways that make it easy for the reader to extract
useful information.

Weeks 2 and 3 introduce several graphical methods.


1.17

Descriptive Statistics
Another form of descriptive statistics uses numerical
measures to summarise data.
The mean and median are popular numerical measures
to describe the location of the data.
The range, variance and standard deviation measure
the variability of the data.

Weeks 3 and 4 introduce several numerical statistical
measures that describe different features of the data.
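
As a minimal sketch of these numerical measures, the Python snippet below summarises seven illustrative student marks using only the standard library. The marks and variable names are assumptions chosen for illustration, and the variance and standard deviation use the sample (n - 1) formulas.

```python
import statistics

# Illustrative data: seven student marks (assumed values for this sketch).
marks = [67, 74, 71, 83, 93, 55, 48]

# Measures of location
mean_mark = statistics.mean(marks)      # arithmetic average
median_mark = statistics.median(marks)  # middle value of the sorted marks

# Measures of variability
mark_range = max(marks) - min(marks)    # largest minus smallest
variance = statistics.variance(marks)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(marks)       # sample standard deviation

print(f"mean={mean_mark:.2f}, median={median_mark}, range={mark_range}")
print(f"variance={variance:.2f}, standard deviation={std_dev:.2f}")
```

The same summaries are available in a spreadsheet; the code simply makes each calculation explicit.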

1.18

Inferential Statistics
Descriptive statistics describes the data set that is
being analysed, but it does not provide tools for
drawing conclusions or making inferences about the
population from which the data came. Hence we need
another branch of statistics: inferential statistics.
Inferential statistics is also a set of methods, but it is
used to draw conclusions or inferences about
characteristics of populations based on sample
statistics calculated from sample data.
Weeks 5-14 introduce several techniques in inferential
statistics.
1.19

1.1 Key statistical concepts


1.20

Population
A population is the group of all items (data) of interest.

A population is frequently very large, and sometimes infinite.

E.g. 1. All of the million or so current members of an automobile club.
2. All prawns available at freshwater prawn Farm A in
Queensland.
1.21

Sample
A sample is a set of items (data) drawn from the
population of interest.

A sample can itself be very large, but it is much smaller
than the population.
E.g. 1. A sample of 500 members selected from the automobile club.
2. A sample of 1000 prawns selected from different sections of
freshwater prawn Farm A.
1.22

Parameter
A descriptive measure of a population.

Statistic
A descriptive measure of a sample.
1.23

Population (Parameter) → Subset → Sample (Statistic)

A descriptive measure of a population is called a parameter
(e.g. population mean).
A descriptive measure of a sample is called a statistic
(e.g. sample mean).
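
The distinction between a parameter and a statistic can be sketched in Python as follows. The population here is simulated (hypothetical yearly kilometres driven by 100,000 club members, generated from an assumed normal distribution), so every number is illustrative only. In practice the population mean is usually unknown, and only the sample mean can be computed.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: yearly kilometres driven by 100,000 club members.
population = [rng.gauss(15000, 4000) for _ in range(100_000)]

# Parameter: a descriptive measure of the whole population.
population_mean = statistics.fmean(population)

# Sample: 500 members drawn at random without replacement.
sample = rng.sample(population, 500)

# Statistic: the same descriptive measure, computed from the sample only.
sample_mean = statistics.fmean(sample)

print(f"population mean (parameter): {population_mean:.1f}")
print(f"sample mean (statistic):     {sample_mean:.1f}")
```

The two values are typically close but not identical, which is why inference needs a measure of reliability.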
1.24

Statistical Inference
Statistical inference is the process of making an
estimate, prediction, or decision about a population
based on a sample.
Population (Parameter) ← Inference ← Sample (Statistic)

What can we infer about a population’s parameter based
on a sample’s statistic?
1.25

Statistical Inference

We use sample statistics to make inferences about
population parameters.
Therefore, we can produce an estimate, prediction, or
decision about a population based on sample data.
Thus, we can apply what we know about a sample to
the larger population from which it was drawn!
1.26

Statistical inference
Rationale:
• Large populations make investigating each member
impractical and expensive.
• Easier and cheaper to take a sample and make
estimates about the population from the sample.
However:
• Such conclusions and estimates will not always be
correct.
• For this reason, we build ‘measures of reliability’ into
statistical inference, namely the confidence level and the
significance level (both discussed later in this subject).
1.27

1.2 Practical applications


1.28

Example 1: Pepsi’s Exclusivity Agreement


A large university with a total enrolment of about
50,000 students has offered Pepsi-Cola an exclusivity
agreement that would give Pepsi exclusive rights to
sell its products at all university facilities for the next
year with an option for future years. In return, the
university would receive 35% of the on-campus
revenues and an additional lump sum of $200,000 per
year.

Pepsi has been given 2 weeks to respond.


1.29

The market for soft drinks is measured in terms of 375 ml
cans. Pepsi currently sells an average of 10,000 cans per
week (over the 30 weeks of the year, across two teaching
semesters, in which the university operates). The problem is
that we do not know how many soft drinks (of all types,
including Pepsi) are sold weekly at the university; this
information is needed to calculate the total profit from the deal.

• What is the population of interest in this case?
• Interviewing every student would be too costly and
extremely time-consuming, so it is not possible.
What is the best way to get an estimate of the total
soft drink consumption at the university?
1.30

Example 1: Solution
What is the population of interest in this case?
• The soft drink consumption of the university’s 50,000
students.

What is the best way to get an estimate of the total
soft drink consumption at the university?
• We can sample a much smaller number of students
(say, a sample of 500) and infer from the sample
data the number of soft drinks consumed by all 50,000
students.
1.31

Suppose Pepsi assigned you to survey a sample
of 500 students to infer from the sample data the
number of soft drinks consumed by all 50,000
students. Accordingly, you organise a survey that
asks 500 students to keep track of the number of
soft drinks, by type of drink (Pepsi, Coke,
lemonade, etc.), that they purchase during the next 7
days.
Discuss how Pepsi would use the data
you collect to estimate soft
drink consumption.
1.32

Example 1: Solution
The answer to this question involves inferential statistics.

The revenue generated from the consumption of each soft drink is
equal to

(price of the soft drink) × (mean consumption) × 50,000

Since the mean consumption (consumption per student) of the
population is not known, we can use inferential
statistics to estimate the mean consumption from
the sample data (500 students).
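
The revenue formula above can be sketched in Python as follows. The survey responses, the price per can and the function name `estimate_weekly_revenue` are all hypothetical, chosen only to show the arithmetic; a real survey would supply 500 responses rather than ten.

```python
import statistics

def estimate_weekly_revenue(weekly_purchases, price_per_can, population_size):
    """Estimate total weekly revenue as price x sample mean x population size."""
    # Sample mean: cans purchased per student per week
    mean_consumption = statistics.mean(weekly_purchases)
    return price_per_can * mean_consumption * population_size

# Hypothetical survey responses: cans bought in 7 days by ten surveyed students.
sample_purchases = [0, 2, 1, 3, 0, 1, 4, 2, 1, 1]
price = 2.00       # assumed price per can, in dollars
students = 50_000  # total enrolment from the example

estimate = estimate_weekly_revenue(sample_purchases, price, students)
print(f"Estimated weekly soft drink revenue: ${estimate:,.0f}")
```

With these assumed numbers the sample mean is 1.5 cans per student per week, so the estimate is $150,000 per week. A different sample would give a different estimate.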
1.33

1.3 How managers use statistics


Statistical Applications in Business
Statistical analysis plays an important role in virtually
all aspects of business and economics.
Throughout this course, we will see applications of
statistics in accounting, economics, finance, human
resources management, marketing, and operations
management.
TYPES OF DATA, DATA COLLECTION AND
SAMPLING
2.35

2.1 Types of data


Definitions
A variable is some characteristic of a population or sample.
E.g. student marks.

A variable is typically denoted with a capital letter: X, Y, Z…


The values of the variable are the range of possible values
for a variable.
E.g. student marks (0,…,100)

Data are the observed values of a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
2.36

Types of data…
Data (at least for purposes of Statistics) fall into three
main groups:

• Numerical data
• Nominal data
• Ordinal data
2.37

Numerical data
The values of numerical data are real numbers.
E.g. heights, weights, prices, waiting time at a medical
practice, etc.

Arithmetic operations can be performed on numerical data,
so it is meaningful to talk about 2*Height, or Price + $1, and
so on.

Numerical data are also called quantitative or interval.


2.38

Nominal data
The values of nominal data are categories.
E.g. Responses to questions about marital status are categories,
coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4

These data are categorical in nature; arithmetic operations
don’t make any sense (e.g. does Married ÷ 2 = Divorced?!)

Nominal data are also called qualitative or categorical.


2.39

Ordinal data
Ordinal data appear to be categorical in nature, but their
values have an order, a ranking, to them.
E.g. University course evaluation system:
Poor = 1, fair = 2, good = 3, very good = 4, excellent = 5

While it is still not meaningful to do arithmetic on these data
(e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, order is maintained no matter what numeric values
are assigned to each category.
Ordinal data are also called ranked.
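
The point about order being maintained can be sketched in Python. Both codings below are assumptions for illustration (the second is deliberately arbitrary), and `outranks` is a hypothetical helper name.

```python
# Two different numeric codings of the same ordinal scale (both are assumptions
# for illustration; any order-preserving coding would do).
coding_a = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
coding_b = {"poor": 3, "fair": 7, "good": 8, "very good": 20, "excellent": 50}

def outranks(first, second, coding):
    """Return True if `first` is ranked above `second` under the given coding."""
    return coding[first] > coding[second]

# Order comparisons agree under both codings.
for coding in (coding_a, coding_b):
    print(outranks("excellent", "poor", coding), outranks("fair", "very good", coding))

# Arithmetic does not agree: "2 * fair" equals "very good" only under coding_a.
print(2 * coding_a["fair"] == coding_a["very good"])
print(2 * coding_b["fair"] == coding_b["very good"])
```

Because the arithmetic result depends on the arbitrary codes while the comparisons do not, only ordering-based calculations are meaningful for ordinal data.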
2.40

Types of data – Examples


Numerical data
  age    income
  55     75 000
  42     68 000
  .      .

  weight gain
  +10
  +5
  .

Nominal data
  person   married
  1        yes
  2        no
  3        no

  computer   brand
  1          IBM
  2          Dell
  3          Compaq
  4          IBM

With nominal data, all we can calculate is the
proportion of data that falls into each category.

  IBM    Dell   Compaq   other   total
  25     11     8        6       50
  50%    22%    16%      12%

Ordinal data
  exam grade: HD, D, C, P, F
  food quality: Excellent, Good, Satisfactory, Poor

With ordinal data, all we can use is computations
involving the ordering process.
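
As a minimal sketch, the proportions for the brand counts above (IBM 25, Dell 11, Compaq 8, other 6) can be reproduced in Python. The list of 50 individual responses is reconstructed from those counts and is hypothetical in its ordering.

```python
from collections import Counter

# Rebuild 50 hypothetical responses that match the brand counts given above.
responses = ["IBM"] * 25 + ["Dell"] * 11 + ["Compaq"] * 8 + ["other"] * 6

counts = Counter(responses)   # frequency of each category
total = sum(counts.values())  # number of observations

# Proportion (relative frequency) of each category
proportions = {brand: count / total for brand, count in counts.items()}

for brand, count in counts.items():
    print(f"{brand:<8}{count:>4}{proportions[brand]:>6.0%}")
```

Counting and dividing by the total are the only calculations used, which is all that nominal data permit.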
2.41

Calculations for types of data


As mentioned above,
• All calculations are permitted on numerical data.
• No calculations are allowed for nominal data, except
counting the number of observations in each category
and calculating their proportions.
• Only calculations involving a ranking process are allowed
for ordinal data.

This lends itself to the following ‘hierarchy of data’…


2.42

Hierarchy of data
Numerical
• Values are real numbers.
• All calculations are valid.
• Data may be treated as ordinal or nominal.

Ordinal
• Values must represent the ranked order of the data.
• Calculations based on an ordering process are valid.
• Data may be treated as nominal but not as numerical.

Nominal
• Values are arbitrary numbers that represent categories.
• Only calculations based on the frequencies of occurrence are valid.
• Data may not be treated as ordinal or numerical.
2.43

Other forms of data


Cross-sectional data are collected at a certain point in time
across a number of units of interest:
• marketing survey (observe preferences by gender, age)
• students’ marks in a statistics course exam
• starting salaries of graduates of an MBA program in a particular
year.

Time-series data are collected over successive points in time:
• weekly closing price of gold
• monthly tourist arrivals in Australia.
2.44

Sample size
Numerical techniques for determining sample sizes will
be described later, but it is sufficient to say that the
larger the sample size, the more accurate we can
expect the sample estimates to be.
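
The effect of sample size can be illustrated with a small simulation in Python. This is a sketch under stated assumptions: the population is hypothetical (20,000 values drawn uniformly between 0 and 100), `typical_error` is an illustrative helper name, and "accuracy" is measured here as the average absolute gap between the sample mean and the population mean over repeated samples.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20,000 values (e.g. weekly spending in dollars).
population = [rng.uniform(0, 100) for _ in range(20_000)]
true_mean = statistics.fmean(population)

def typical_error(sample_size, repetitions=200):
    """Average absolute gap between the sample mean and the population mean."""
    errors = []
    for _ in range(repetitions):
        sample = rng.sample(population, sample_size)
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

errors_by_size = {n: typical_error(n) for n in (10, 100, 1000)}
for n, err in errors_by_size.items():
    print(f"sample size {n:>4}: typical error {err:.2f}")
```

The typical error shrinks as the sample size grows, although any single small sample can still happen to land close to the true value.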
