MMGT6012
Business Tools for Management
TOPIC 1: Descriptive Statistics

Dr. Matthew Beck
ITLS, Business School
The University of Sydney Page 1

1. Descriptive Statistics
Agenda
1) Orientation
2) Course Overview & Challenges
3) Introduction to Data Analysis / Fundamental Principles
It is envisioned that we work together as ‘partners’:

– You have made a choice to enhance your knowledge in this area.
– Let’s come with an open attitude right from the start to learn from each
other and strive to achieve the common goal

ORIENTATION
All communication will be done via Canvas
About Me:
– Matthew Beck
– Rm 444; Merewether Building (H04)
– matthew.beck@sydney.edu.au
– 02 9114 1834

Course Overview
Learning Objectives
– Apply statistical methods to solve quantitative business problems

– Apply the logical processes of quantitative analysis to deconstruct complex
business problems
– Critically evaluate the application of statistical methods in solving business
problems
– Communicate the results of statistical analysis clearly, concisely and with
impact
– Be comfortable using a range of software especially Excel

How to Succeed in MMGT6012

Why Study Quantitative Methods?
• Where to get materials?
• How costly?
• Performance Old/New?
• What features?
• Relative costs?
• What price?
• What is demand?
• Worthwhile addition?
• Machine performance?
• What colour package?
• Monitor production levels
• What method best to ship?
• What markets to ship to?

Why Study Quantitative Methods
“To increase the business value of information, we need data from various angles. In
addition to sales data, we need to know why sales increased. We need to know how
and where we are influential.”
CEO, German Pharmaceutical Company
“With thousands of customers, products, and contractual terms and conditions, pricing
and incentive models become very complex. Analytics is a key way to get this
complexity under control. But we are not yet good enough in this regard.”
CEO, Japanese Electronics Manufacturer
“High-performance businesses have a much more developed analytical orientation than
other organizations. They are five times more likely to view analytical capabilities as
core to the business.”
Why Predictive Analytics Is A Game-Changer, Forbes Magazine, 01/04/10.

Information is Valuable:
– purchased for $1 billion.
– purchased for $956 million.
– purchased for $689 million.


It's Not All Math
“On one NBA team, an analyst quickly gained the reputation as being the smartest
guy in the room but had virtually no impact on the decision-making process...while the
work he was producing was innovative, it was wasted because he did not have the
ability to communicate it in a manner that was understandable to the decision-makers.”
“It's not just about the data. It's what you do with the data.”
Mike Rhodin, Senior Vice President, IBM

It's Not All Math
You may never do the analysis yourself
But somewhere at some point you will:

– Need to read it.
– Need to understand it.
– Need to use it.

Course Overview
Statistical Data Analysis:

– Basic description
– Hypothesis testing (differences and relationships)
– Predictive analysis (regression modelling)
Optimisation Modelling:
– Linear Programming (optimal allocation of limited resources)
Simulation Modelling:
– Incorporating uncertainty into the modelling process (probabilities)

Course Overview
Cannot study every single method to solve all problems:

– A broad range of techniques to solve a broad range of problems
– Develop useful skills in software and problem definition / solving
Focus on the application:

– Making sense of data
– Approaching complex problems with a plan of attack
– Trying to use quantitative information to do something meaningful

What is Data?
Data is derived from study, experience, or instruction
Data is something that we can process that allows us to make sense of a situation
We take data and by analysing it we transform it into information: a set of facts with meaning

The Role of Data
Reduce the risk of decision making:

– Information on course of action
– Ideas of likely strategy direction
– Understand how something is performing
– Understand what people want
– Understand where and how we can have influence
– https://www.kaggle.com/c/flight

Class Activity

Two Branches of Statistics
Descriptive Statistics:
– Procedures that describe the data we are studying
– Results help us organize and understand the data
– The results cannot be generalized to any larger group
Descriptive Statistics include:

– Measures of central tendency (mean, median, and mode)
– Measures of spread (range, variance, standard deviation)
– Frequency distributions and graphs (pie charts, bar charts, etc.)

Useful if you do not need to extend your results to a larger group:

– Most reports in social sciences seek to produce “universal truths” about the
behaviour of people

Inferential Statistics:
– Trying to reach conclusions that extend past the immediate data
– How might the population behave based on our collected data
– The probability that our result is systematic not random chance

Hypothesis Testing:
– Also called tests of significance
– Chi-square tests, t-tests, F-tests and regression models are examples
– Typically tests for differences or relationships between variables
Predictive Analysis:
– Regression modelling (and others)
– Determine patterns and predict future outcomes and trends

Types of Data
Also referred to as levels of measurement
CATEGORICAL
– Nominal (order has no meaning)
– Ordinal (can be put in order)
CONTINUOUS
– Interval (can measure between)
– Ratio (has a fixed zero)

Class Activity
1. Form small teams with the people sitting next to you
2. Come up with an example for each type of data
3. Give an example of how you would describe the characteristics of that data (if you had a large number of data points)
4. Write this on a piece of paper and bring it up to the front

Nominal Data: frequencies, mode
Ordinal Data: as per Nominal data, plus medians and percentiles
Interval Data: as per Ordinal data, plus means, std dev.
Ratio Data: as per Interval data

Nominal Data
From a production line, I pick 10 widgets randomly.
They may be coded as either being defective or not defective:
– How will I code this?
– What information is useful?
– What analysis is appropriate to perform?

Nominal Data – Frequency Distributions
What is a Frequency Distribution?

– A frequency distribution is a list or a table …
– containing categories or ranges within which the data fall
– and the corresponding frequencies within each category
A graph of a frequency distribution is called a frequency histogram:
– Allows for a quick visual interpretation of the data

Nominal Data – Frequency Distributions
Category        Frequency   Percentage
Defective       2           20%
Not Defective   8           80%
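As a rough sketch of how the table above could be produced, the snippet below codes each widget as 1 (defective) or 0 (not defective) and counts the categories. The coding scheme and the order of the ten observations are assumptions for illustration; only the totals (2 and 8) come from the slide.

```python
from collections import Counter

# Hypothetical sample of 10 widgets, coded 1 = defective, 0 = not defective.
# The ordering is made up; only the totals (2 and 8) match the slide.
widgets = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
labels = {1: "Defective", 0: "Not Defective"}

counts = Counter(labels[w] for w in widgets)
n = len(widgets)

# Frequency table: category -> (count, percentage of the sample).
table = {cat: (k, 100 * k / n) for cat, k in counts.items()}

for cat, (k, pct) in table.items():
    print(f"{cat:<14} {k:>2} {pct:5.0f}%")
```

With nominal data like this, counts and percentages are the natural summary; averaging the 0/1 codes would simply reproduce the defective share.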

Nominal Data – Mode
– A measure of central tendency
– Value that occurs most often
– Not affected by extreme values
– Used for either numerical or categorical data
– There may be no mode
– There may be several modes
[Figure: two dot plots; values 0 to 14 with Mode = 9, and values 0 to 6 with No Mode]
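The three cases (one mode, no mode, several modes) can be sketched in Python. The data sets are hypothetical, and the helper `modes` is an illustrative name that adopts one common convention: when every value occurs equally often, it reports no mode.

```python
from statistics import multimode

def modes(values):
    """Return the list of modes, or [] when every value is equally frequent."""
    found = multimode(values)
    # multimode returns every distinct value when all counts tie;
    # treat that case as "no mode".
    if len(found) > 1 and len(found) == len(set(values)):
        return []
    return found

one_mode = [1, 3, 5, 9, 9, 9, 12, 14]   # hypothetical: 9 occurs most often
no_mode = [0, 1, 2, 3, 4, 5, 6]          # hypothetical: every value occurs once
two_modes = [2, 2, 5, 7, 7]              # hypothetical: 2 and 7 tie

print(modes(one_mode), modes(no_mode), modes(two_modes))
```

The same helper works on categorical labels, for example `modes(["Defective", "Not Defective", "Not Defective"])`.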

Ordinal Data
Five people rank, in order of preference, 5 companies:

– (1 = Best – 5 = Worst)

– What analysis is appropriate to perform?

Ordinal Data – Frequency Distributions

Ordinal Data – Mode

Ordinal Data – Median
In an ordered array:
– The median is the “middle” number
– 50% of observations are above this point
– 50% of observations are below this point
– (n+1)/2 gives the position of the median
[Figure: two ordered arrays plotted on axes from 13 to 21; one with Median = 19, the other with Median = 18]
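A small sketch of the (n+1)/2 position rule, using made-up values rather than the data behind the figure. For an odd sample size the position is a whole number; for an even sample size it falls halfway between two values, and the median is taken as their average.

```python
from statistics import median

def median_position(n):
    """Position of the median in an ordered array of n values: (n + 1) / 2."""
    return (n + 1) / 2

odd = sorted([17, 13, 21, 15, 19])   # hypothetical, n = 5 -> [13, 15, 17, 19, 21]
even = sorted([14, 20, 16, 18])      # hypothetical, n = 4 -> [14, 16, 18, 20]

# n = 5: position 3.0, so the median is the 3rd ordered value.
pos_odd = median_position(len(odd))
# n = 4: position 2.5, so the median is the average of the 2nd and 3rd values.
pos_even = median_position(len(even))

print(pos_odd, median(odd))
print(pos_even, median(even))
```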


Interval Data
5 people rate the delivery performance for 5 companies:

– (1= very bad to 10 = very good)


Interval Data – Arithmetic Mean
The mean/average is the most common measure of central tendency
– For a sample of size n:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
– $X_i$ = observed values
– $n$ = sample size
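A minimal worked example of the formula, assuming five made-up delivery ratings on the 1 to 10 scale (these are not data from the course files):

```python
from statistics import mean

# Hypothetical ratings from 5 people for one company (1 = very bad, 10 = very good).
ratings = [7, 8, 6, 9, 5]

n = len(ratings)             # sample size
x_bar = sum(ratings) / n     # sum of observed values divided by n

print(x_bar, mean(ratings))  # the manual result matches statistics.mean
```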

Interval Data – Mean

Interval Data – Variance
Average (approximately) of the squared deviations of values from the mean:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
– $\bar{X}$ = arithmetic mean
– $n$ = sample size
– $X_i$ = ith value of the variable X
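The steps in the variance formula can be traced by hand in a few lines. The ratings are hypothetical; the point is the division by n - 1 rather than n, which is why the slide calls it "approximately" an average.

```python
from math import isclose
from statistics import variance

ratings = [7, 8, 6, 9, 5]    # hypothetical sample

n = len(ratings)
x_bar = sum(ratings) / n                             # arithmetic mean
squared_devs = [(x - x_bar) ** 2 for x in ratings]   # (X_i - mean)^2 for each value
s2 = sum(squared_devs) / (n - 1)                     # divide by n - 1, not n

print(squared_devs, s2)
# statistics.variance uses the same sample (n - 1) formula.
print(isclose(s2, variance(ratings)))
```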

Interval Data – Standard Deviation
Square root of the variance

– The most commonly used measure of variation
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
– Has the same unit of measurement as the original data
– Provides a "standard" way of knowing what is “average” and what is
“extra large” or “extra small”

Interval Data – Picturing the Standard Deviation

[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.570
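To see how data sets can share a mean yet differ in spread, here is a sketch with three illustrative samples. The individual values are assumptions chosen so that each set has mean 15.5; they are not recovered from the dot plots, so the standard deviations need not match the slide exactly.

```python
from statistics import mean, stdev

# Illustrative values only: each set is built to have mean 15.5.
data_a = [11, 12, 13, 16, 16, 17, 18, 21]   # moderately spread out
data_b = [14, 15, 15, 15, 16, 16, 16, 17]   # tightly clustered around the mean
data_c = [11, 11, 11, 12, 19, 20, 20, 20]   # pushed towards the extremes

summary = {
    name: (mean(values), stdev(values))
    for name, values in {"A": data_a, "B": data_b, "C": data_c}.items()
}

for name, (m, s) in summary.items():
    print(f"Data {name}: mean = {m}, S = {s:.3f}")
```

The mean alone cannot distinguish the three sets; the standard deviation is what separates the clustered set from the dispersed ones.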
Interval Data – Variance and Std Dev


Ratio Data
Burger Squares is a company that makes chip packets which come in 50g packs.
– Given this weight, government regulations mean that, to protect consumers, packs must contain between 45 and 55 grams.
– Too much variability in packet weight, i.e. weights outside the prescribed range, results in heavy fines.

Ratio Data
20 packs are randomly chosen from an assembly line and weighed.

Ratio Data – Descriptive Statistics
Note that by looking at all the data combined, we get a holistic understanding of what is occurring within the data.
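One way to look at the pack weights "combined" is to compute several descriptives at once and check them against the 45 to 55 gram limits. The 20 weights below are invented for illustration (the original data table did not survive), so the numbers show the method only.

```python
from statistics import mean, median, stdev

LOWER, UPPER = 45.0, 55.0   # regulatory limits in grams, from the slide

# Hypothetical weights (grams) for 20 randomly chosen packs.
weights = [49.8, 50.2, 51.1, 48.7, 50.0, 52.3, 47.9, 50.6, 49.5, 55.4,
           50.9, 49.1, 44.6, 50.3, 51.7, 48.2, 50.1, 49.9, 53.0, 50.7]

report = {
    "n": len(weights),
    "mean": mean(weights),
    "median": median(weights),
    "std dev": stdev(weights),
    "min": min(weights),
    "max": max(weights),
}
# Packs that fall outside the allowed range and could attract a fine.
out_of_range = [w for w in weights if not LOWER <= w <= UPPER]

for label, value in report.items():
    print(f"{label:<8} {value:.2f}")
print("outside limits:", out_of_range)
```

In this made-up sample the mean sits close to 50g, yet the minimum and maximum reveal two packs outside the limits, which is the kind of detail a single statistic would hide.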
How To Best Describe Data
We tend to describe categorical variables with frequencies and modes:
– Why do you not want to report a mean or a standard deviation?
We tend to describe continuous variables with means and standard deviations:
– Why do you not want to report frequencies or use a histogram?

Thinking About Measurement Accuracy (or Error)
[Figure: target diagrams labelled “Accurate and Precise”, “Accurate but Imprecise” and “Inaccurate and Precise”; annotation: “Systematic Error”]

Measures of Central Tendency
Mode:
– The only measure of central tendency available for nominal data
Mean:
– When data is symmetric and continuous
Median:
– When the data is skewed (has extreme or outlying observations)
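A short sketch of why the median is preferred for skewed data: adding one extreme value moves the mean a long way but barely shifts the median. The figures are hypothetical.

```python
from statistics import mean, median

base = [48, 50, 52, 55, 60]    # hypothetical values, roughly symmetric
skewed = base + [400]          # the same data with one extreme observation

print(mean(base), median(base))       # mean and median are close together
print(mean(skewed), median(skewed))   # the mean is dragged up, the median barely moves
```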

Outliers and Extreme Values
A value that is much smaller or larger than other values:

– A value that stands out or doesn’t belong



The Role of Data Cleaning
Mistakes are made when collecting or coding data:

– All data should be cleaned before analysis!
Data cleaning is the process of detecting and correcting:
– Corrupt data
– Inaccurate data
– Incomplete data
– Irrelevant data
– Incorrect data

Errors and outliers can be detected by:

– Descriptives
– Frequencies
– Consistency with allowable values
– Common sense
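Checking consistency with allowable values can be sketched as a simple filter. The example assumes responses on a 1 to 10 rating scale; the respondent IDs, the values, and the helper name `flag_invalid` are all hypothetical.

```python
ALLOWED = range(1, 11)   # valid ratings: whole numbers 1 to 10

# Hypothetical responses keyed by respondent ID; None marks a missing answer.
responses = {"r1": 7, "r2": 11, "r3": 0, "r4": 9, "r5": None}

def flag_invalid(responses, allowed):
    """Return the entries that are missing or fall outside the allowed values."""
    return {rid: value for rid, value in responses.items()
            if value is None or value not in allowed}

flagged = flag_invalid(responses, ALLOWED)
print(flagged)
```

Flagging is only the detection step; whether each flagged value is corrected, retained or deleted is a separate judgement.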
Profiling & categorising:

– Procedural error – data entry or coding error
– Extraordinary event – uniqueness of observation
– Observations within range but unique in combination
– No explanation

Retention vs. Deletion:

– Depends on the philosophy of the analyst
– Deletion may improve multivariate analysis but limit generalisability
Case-wise deletion:
– Cases or respondents with any missing responses are discarded
– Useful if extent of missing data is small
Pair-wise deletion:
– Only the piece of information requiring cleaning is discarded
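The difference between the two deletion strategies shows up in how many observations remain for each statistic. This sketch uses a small made-up data set with `None` for missing responses; the variable names are illustrative.

```python
from statistics import mean

# Hypothetical respondents; None marks a missing response.
rows = [
    {"age": 34, "income": 72, "rating": 8},
    {"age": 41, "income": None, "rating": 6},
    {"age": None, "income": 65, "rating": 7},
    {"age": 29, "income": 58, "rating": 9},
]

# Case-wise: discard any respondent with at least one missing response.
casewise = [r for r in rows if None not in r.values()]

def available(rows, var):
    """Pair-wise: for one variable, keep every value that is not missing."""
    return [r[var] for r in rows if r[var] is not None]

casewise_rating = mean(r["rating"] for r in casewise)   # based on 2 respondents
pairwise_rating = mean(available(rows, "rating"))       # based on all 4 respondents

print(len(casewise), casewise_rating, pairwise_rating)
```

Case-wise deletion here halves the sample for every variable, while pair-wise deletion keeps all four ratings, at the cost of each statistic resting on a different set of respondents.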

Case Substitution
– Observations with missing data are replaced by choosing another non-sampled observation.
Mean Substitution
– Widely used; replaces missing values for a variable with the mean value of that variable
based on all valid responses.
Cold Deck Imputation

– Substituting a constant value derived from external sources or previous research for the
missing values.
Regression Imputation
– Regression analysis used to predict the missing values of a variable based on its relationship to
other variables in the data set.
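Of the methods above, mean substitution is the simplest to sketch. The values are hypothetical and `None` stands for a missing response.

```python
from statistics import mean, stdev

values = [12, None, 15, 18, None, 15]   # hypothetical variable with two gaps

valid = [v for v in values if v is not None]   # all valid responses
fill = mean(valid)                             # mean of the valid responses
imputed = [fill if v is None else v for v in values]

print(fill, imputed)
# The filled-in series has the same mean but a smaller spread than the valid data.
print(stdev(valid), stdev(imputed))
```

The second print hints at a known side effect: every imputed value sits exactly on the mean, so the variable looks less variable than it really is.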

Class Activity
1. Form small teams with the people near you
2. Locate the “Class Data.xlsx” file
3. Clean this data file
4. Record issues and a description of what you did

Presenting Data
Data analysis is story-telling:

– It should be interesting and compelling
− Consider your audience

− Establish a setting
− Define the characters
− Establish the conflict
− Resolve the conflict
− Think about the future

Presenting Data
Present data in a way that provides substance & statistics:

– Communicate complex ideas with clarity, precision and efficiency
– Give the largest number of ideas in the most efficient manner
– Tell the truth about the data
http://www.edwardtufte.com/tufte/
http://kirkgoldsberry.com/

Presenting Data

Presenting Data – Chart Junk

Presenting Data – No Relative Basis for Comparison
[Figure: two bar charts titled “A’s Received by Students” for 1st Yr to 4th Yr; one vertical axis shows counts from 0 to 300, the other percentages from 0% to 30%]

Presenting Data – Distorting/Compressing Vertical Axis
[Figure: two charts titled “Quarterly Sales Figures” for Q1 to Q4; one vertical axis runs from 0 to 200, the other from 0 to 50]

Presenting Data – No Zero Point on Vertical Axis

[Figure: charts titled “Monthly Sales Numbers” for J F M A M J (January to June); one vertical axis runs from $36 to $45, the other from $0 to $60]
Using Different Simple Graphs
When presenting data you are normally trying to show a:

– Relationship (how things are connected)
– Comparison (how things are different)
– Composition (what things are made of)
– Distribution (how spread out things are)


Vertical Bar Charts:

– Best for comparing 2-7 means or percentages

Horizontal Bar Charts:

– Best for comparing 8 or more means or percentages

Pie Charts:
– Best for showing the composition of a single group of data

Line Charts:
– Illustrate trends or changes over time
– Movements in the same or different direction

Scatterplots:
– Illustrate distribution of data
– Help identify relationships between variables

Some Useful Resources
– https://infogr.am/
– http://articles.sysev.com/principles-practices-effective-presentation-communication/
– http://www.kaushik.net/avinash/data-presentation-tips-focus-think-simplify-visualize/
– http://www.forbes.com/sites/kateharrison/2015/01/20/a-good-presentation-is-about-data-and-story/

Class Activity
1. Go to Blackboard and locate the “Country Data.xlsx” file
2. Present the information in the file in an interesting way
3. Develop a story around the data; what is it telling you?
4. Prepare a short PowerPoint presentation to tell your story
5. Email to matthew.beck@sydney.edu.au

Descriptives in Excel
=mode(array)
=median(array)
=average(array)
=stdev(array)
=max(array)-min(array) (range; Excel has no built-in range function)
=min(array)
=max(array)
Where the array is a row or column of data you are interested in analysing
Go to the Insert ribbon for graphs
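For comparison, the same descriptives can be sketched with Python's standard `statistics` module. The data are hypothetical, and the comments show which spreadsheet formula each line corresponds to.

```python
from statistics import mean, median, mode, stdev

data = [4, 7, 7, 9, 10, 12]   # hypothetical column of values

descriptives = {
    "mode": mode(data),               # =mode(array)
    "median": median(data),           # =median(array)
    "mean": mean(data),               # =average(array)
    "stdev": stdev(data),             # =stdev(array), sample formula (n - 1)
    "range": max(data) - min(data),   # largest minus smallest value
    "min": min(data),                 # =min(array)
    "max": max(data),                 # =max(array)
}

for name, value in descriptives.items():
    print(f"{name:<7} {value}")
```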

Descriptives in SPSS
You are an intelligent Master’s level student
SPSS is pretty intuitive
See if you can figure it out:

– Hint: You are trying to analyse your data (where do you think you go?)
– Hint: You want to describe your data (where do you think you go?)

Class Activity
1. Locate the “Class Data - Clean.xlsx” file on Blackboard
2. Do some descriptive statistics on the data
3. Develop a story around the data; what is it telling you?
4. Prepare a short PowerPoint presentation to tell your story
5. Email to matthew.beck@sydney.edu.au
