Midterm Test Revision: Charanjit Kaur

25/08/2023
Whiteboard Chart by Shopify Partners

https://burst.shopify.com/photos/whiteboard-chart?q=graph
ETF1100 Business Statistics

Week 6
Midterm Test Revision
Charanjit Kaur
Learning Outcomes
• Revision of materials taught in Weeks 1-5
• Complete the Practice Test for Mid-Semester
Basic Concepts of Statistics: Random Variables Wk1
Definition: outcomes of experiments whose values may vary due to chance
• Repeated observations of random variables produce a spread of values.
• These observations are “data” that is relevant in analysing the problem.
Business outcomes are rarely predictable
Basic Concepts of Statistics: Population and Sample Wk1

Sample:
A subset of the population selected for
analysis.
• Often chosen randomly
• Preferably representative of the population.
Population:
All members of a group about which you want to draw a conclusion.
E.g. all voters in an election, all Telstra shareholders, all invoices
submitted to Medicare for reimbursement, etc.
Types of Data Wk1
Data Types
• Categorical (Qualitative): numerical operations are not meaningful.
   • Nominal: values are labels and do not imply any order.
   • Ordinal: values are labels, but they have an order.
• Numerical (Quantitative): numerical operations are meaningful.
   • Discrete: values arise from counting.
   • Continuous: values arise from measuring.
Visualising Data Wk1
• Nominal & Ordinal data:
   • Bar Chart: great for illustrating relativity, particularly for ordinal data
   • Pie Chart: great for illustrating portions or shares
• Discrete data: Bar Chart
• Continuous data: Histogram, Box plot
• The bar chart (discrete data), histogram and box plot are great for illustrating the distribution
Normalization of Data Wk1

Purpose: comparability across observations
• Choice of normalization depends on the purpose of the analysis!
Store   Profit ($000)   Net Profit %   ROI
A              8.06            3.96%   22.39%
B            54.229           10.81%    2.36%
C            17.981           17.55%    9.04%
D            94.891           15.67%    5.02%
E            70.913            7.04%    6.66%
F            23.005           15.49%   18.11%
G           108.656           17.94%    2.57%
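A short sketch of why the choice of normalization matters, using the store figures from the table. The dictionary layout and the `rank_by` helper are illustrative names, not part of the unit material; the point is only that the "best" store changes with the measure.

```python
# Store figures copied from the table: (profit in $000, net profit %, ROI %).
stores = {
    "A": (8.06, 3.96, 22.39),
    "B": (54.229, 10.81, 2.36),
    "C": (17.981, 17.55, 9.04),
    "D": (94.891, 15.67, 5.02),
    "E": (70.913, 7.04, 6.66),
    "F": (23.005, 15.49, 18.11),
    "G": (108.656, 17.94, 2.57),
}


def rank_by(column):
    """Return store names sorted from highest to lowest on the given column."""
    return sorted(stores, key=lambda name: stores[name][column], reverse=True)


by_profit = rank_by(0)  # raw dollar profit
by_margin = rank_by(1)  # net profit %
by_roi = rank_by(2)     # return on investment

print("By profit:      ", by_profit)
print("By net profit %:", by_margin)
print("By ROI:         ", by_roi)
```

Store G leads on raw profit and net profit %, while Store A, the smallest by profit, leads on ROI.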
Normalization of Data Wk1

Other common normalization
a. across time; real vs nominal values (CPI adjusted)
• Consumer Price Index (CPI): a weighted average of prices for everyday goods
and services people buy. It is indexed to 100 in the base period and is used
to calculate the inflation rate.
• Nominal value: value that is measured in terms of actual prices that exist at
the time.
• Real value: the value of the same item after it has been adjusted for
inflation.
b. across observations; e.g. percentage, per capita
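The two kinds of normalization can be sketched in a few lines. All figures below are hypothetical, and `to_real` and `inflation_rate` are illustrative helper names; the sketch assumes the CPI is indexed to 100 in the base period, as described in the definition.

```python
def to_real(nominal_value, cpi, base_cpi=100.0):
    """Express a nominal value in base-period prices using the CPI."""
    return nominal_value * base_cpi / cpi


def inflation_rate(cpi_now, cpi_before):
    """Proportional change in the CPI between two periods."""
    return (cpi_now - cpi_before) / cpi_before


# a. Across time (hypothetical): $500 of sales recorded when the CPI is 125.
real_sales = to_real(500.0, cpi=125.0)
rate = inflation_rate(125.0, 100.0)

# b. Across observations (hypothetical): revenue per capita for one region.
revenue_per_capita = 1_200_000 / 40_000

print(real_sales)          # 400.0 in base-period dollars
print(rate)                # 0.25, i.e. 25% inflation since the base period
print(revenue_per_capita)  # 30.0 dollars per person
```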
Analysis of Categorical Data Wk2

Probability = quantification of chance. The probabilities of all possible outcomes add to 1.
• Marginal Probability: P(A) = probability of event A.
• Joint Probability: Probability of “Intersection” describes “A AND B” 𝑃(𝐴 ∩ 𝐵)
• Conditional probability: probability of A given that B has occurred, P(A|B), where
P(A|B) = P(A ∩ B) / P(B)
• Probability of “Union” describes “Either A OR B” 𝑃(𝐴 ∪ 𝐵)
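A minimal sketch of these four probabilities computed from a two-way table of counts. The counts and category names are hypothetical. The union uses the addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which is a standard result rather than something stated on the slide.

```python
# Hypothetical counts for 200 customers: (membership, purchase) -> count.
counts = {
    ("member", "buy"): 60,
    ("member", "no_buy"): 40,
    ("non_member", "buy"): 30,
    ("non_member", "no_buy"): 70,
}
total = sum(counts.values())

# Marginal probabilities: sum the counts over the other variable.
p_member = sum(n for (m, _), n in counts.items() if m == "member") / total
p_buy = sum(n for (_, b), n in counts.items() if b == "buy") / total

# Joint probability: P(member AND buy).
p_member_and_buy = counts[("member", "buy")] / total

# Conditional probability: P(member | buy) = P(member AND buy) / P(buy).
p_member_given_buy = p_member_and_buy / p_buy

# Union via the addition rule: P(member OR buy).
p_member_or_buy = p_member + p_buy - p_member_and_buy

print(p_member, p_buy, p_member_and_buy)
print(round(p_member_given_buy, 4), round(p_member_or_buy, 4))
```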
Analysis of Categorical Data Wk2

• Mutually exclusive events: Events that cannot occur together
P(Head ∩ Tail) = 0
• Independent events: Event A is independent of event B if
P(A|B) = P(A) or P(A ∩ B) = P(A) × P(B)
• Evaluate the relationship between two categorical variables – refer to the Exercise vs Heart Disease Example in Seminar 2
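The independence condition can be checked numerically by comparing the joint probability with the product of the marginals. This is a sketch only: the helper name `is_independent` and the probabilities are hypothetical, and with real sample data a small difference could be due to chance alone.

```python
import math


def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Check whether P(A and B) equals P(A) x P(B) within a small tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)


# Two fair coin flips: A = first flip is a head, B = second flip is a head.
coin_flips = is_independent(0.5, 0.5, 0.25)

# Hypothetical customer data: P(member) = 0.5, P(buy) = 0.45, P(both) = 0.30.
member_and_buy = is_independent(0.5, 0.45, 0.30)

# Mutually exclusive events with non-zero probabilities are never independent,
# because P(A and B) = 0 while P(A) x P(B) > 0.
head_and_tail = is_independent(0.5, 0.5, 0.0)

print(coin_flips, member_and_buy, head_and_tail)
```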
Understanding Statistical Uncertainty Wk3

Distribution of Numerical Data
• Central Tendency (Arithmetic Mean, Median, Mode): What is the typical or the central value?
• Variation (Range, Interquartile Range, Variance, Standard Deviation): How much variation in the distribution?
• Shape (Skewness): Are there any unusual values that contribute to the distribution?
Mean, Median & Mode Wk3

Mean: measure of typical value, also known as “average”.
The sum of all values observed divided by the number of observations. In Excel : =AVERAGE(…)
Median: The middle value if values are sorted from smallest to largest (50th percentile).
50% of values are equal to or lower than the median, and 50% are equal to or higher.
In Excel : =MEDIAN(…)
Mode: Value that occurs most frequently.

This might not be interesting if the values don’t repeat often. For numerical data, the most
populated bin range is often reported. In Excel : =MODE(…)
All are measures of central tendency, but which one should we use?
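One way to see the difference between the three measures is to compute them on a small dataset with a single extreme value. The sales figures are hypothetical; the functions come from Python's standard `statistics` module and play the same role as the Excel functions named above.

```python
import statistics

# Hypothetical daily sales with one unusually large value (95).
sales = [12, 15, 15, 18, 20, 22, 95]

mean_sales = statistics.mean(sales)      # pulled upwards by the outlier
median_sales = statistics.median(sales)  # middle value, barely affected
mode_sales = statistics.mode(sales)      # most frequent value

print(round(mean_sales, 2), median_sales, mode_sales)
```

Here the mean (about 28.14) sits above six of the seven observations, while the median (18) stays close to the bulk of the data, which is why the median is often preferred when there are outliers or strong skewness.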
Measures of Variability Wk3

Range: The difference between the maximum and the minimum values. It relies just on the two
most extreme values in the dataset. In Excel: =MAX(…)-MIN(…)
Interquartile Range: the spread of the middle 50% of the data

Q1 = first quartile → 25% of data falls below this value In Excel: =QUARTILE.EXC(…,1)
Q3 = third quartile → 25% of data falls above this value In Excel: =QUARTILE.EXC(…,3)
In Excel: =Q3 – Q1
Variance: average squared deviations (distance) from the mean. Reported in squared units
In Excel: =VAR.S(…)
Standard deviation: √Variance (the square root of the variance)
Same unit as the data. Easier to interpret. In Excel: =STDEV.S(…)
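The measures above can be sketched with Python's `statistics` module on a hypothetical dataset. `statistics.quantiles` with `method="exclusive"` is used here on the assumption that it corresponds to the interpolation behind `QUARTILE.EXC`; `statistics.variance` and `statistics.stdev` divide by n − 1, as `VAR.S` and `STDEV.S` do.

```python
import statistics

# Hypothetical dataset, already sorted for readability.
data = [4, 7, 9, 10, 12, 15, 20]

value_range = max(data) - min(data)

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

sample_variance = statistics.variance(data)  # squared units
sample_stdev = statistics.stdev(data)        # same units as the data

print(value_range, q1, q3, iqr)
print(sample_variance, round(sample_stdev, 4))
```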
Shape/Skewness of Data Distribution Wk3

Skewness is the extent of asymmetry in the distribution.
If the distribution is symmetric, the mean is equal to the median.
Skewness > 0 (longer right tail) | Skewness = 0 (symmetric) | Skewness < 0 (longer left tail)
Probability Distribution Wk3

• In statistics, we use a smooth mathematical function to model the
probability density function (pdf)
• It is an approximation to the data distribution, i.e. a “model”
• The function f(X) denotes the “pdf”
• Areas under the curve represent probability
Normal Distribution Wk3

• The most common distribution in statistics → normal distribution
• It is a symmetric (bell-shaped) distribution
• The normal distribution has two parameters: Mean and Stdev (standard deviation)
• Notation: 𝑋 ~ 𝑁(𝑀𝑒𝑎𝑛, 𝑆𝑡𝑑𝑒𝑣)
• Skewness = 0;
• Mean = Median = Mode
Excel Functions:
For probability “=NORM.DIST(xvalue,mean,stdev,TRUE)”
For percentile “=NORM.INV(prob, mean,stdev)”
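For readers working outside Excel, the same two calculations can be sketched with `statistics.NormalDist` from Python's standard library. The mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical variable: X ~ N(Mean = 100, Stdev = 15).
x = NormalDist(mu=100, sigma=15)

# Probability that X is at most 115, like =NORM.DIST(115,100,15,TRUE).
p_below_115 = x.cdf(115)

# 95th percentile of X, like =NORM.INV(0.95,100,15).
x_95 = x.inv_cdf(0.95)

print(round(p_below_115, 4))  # about 0.8413
print(round(x_95, 2))         # about 124.67
```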
Representative Sample Wk4

Representative sample is determined by:
1) Data collection process (sampling design)
2) Survey design → wording design of the questions/form.
3) Sample size → a sufficiently large sample means the sample statistic gets closer to the population
parameter
Biased sample:
• Non-representative statistics
• Invalid inference → invalid conclusions. It could end with catastrophic outcomes if used in business
decisions
Potential biases:
• Selection bias – members of the population have unequal chances of being chosen
• Non-response bias – the data collection process leads to systematic non-response from certain
groups
Statistics is UNCERTAIN Wk4

• Statistics is about quantifying the uncertainty of the sample estimate
• x̄ is an estimate of E(X) = μ (A sample statistic is only an estimate of the
truth. Any sample statistic is not exact and has variation/error around
it.)
• Assume we take data samples repeatedly, and compute the sample mean as the
statistic for each sample. Then we would have the sampling
distribution of the sample mean to portray its variability.
• Central Limit Theorem: If the sample size n is large: x̄ ∼ N(μ, s/√n)
• This is true regardless of the shape of the population distribution
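The idea of a sampling distribution can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly right-skewed shape), and the sample size, number of repetitions and seed are arbitrary choices. Because the population standard deviation is known here, it stands in for s.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

n = 50              # size of each sample
num_samples = 2000  # how many samples are drawn

# Draw many samples from a skewed population and record each sample mean.
sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

centre = statistics.fmean(sample_means)  # should be close to mu = 1
spread = statistics.stdev(sample_means)  # should be close to sigma / sqrt(n)
theoretical_se = 1.0 / n ** 0.5

print(round(centre, 3), round(spread, 3), round(theoretical_se, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual observations are far from normal.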
Confidence Interval for the Population Mean Wk4

Confidence interval = plausible range of the unknown population
mean given some level of probability
X̄ − Z × s/√n < μ < X̄ + Z × s/√n
If the standard deviation (σ) ↑, the spread of the distribution is larger:
standard error (= std deviation / √n) ↑, width ↑, estimate is less precise
If the sample size (n) ↑, standard error (= std deviation / √n) ↓, width ↓, estimate is more precise
The bigger the sample, the more information we have to increase the precision of the interval estimate of the
population mean, and the narrower the interval.
If the level of confidence (1-α) ↑, critical value changes, width ↑ , the estimate is less precise
The more confident we are, the more values we need to include in our confidence interval, the wider the
interval.
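The three effects on the width can be checked with a short sketch of the interval formula. The function name `confidence_interval` and the input values are hypothetical; the critical value Z is taken from the standard normal distribution, in line with the formula on this slide.

```python
import math
from statistics import NormalDist


def confidence_interval(sample_mean, s, n, confidence=0.95):
    """Return (lower, upper) limits of a z-based interval for the population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    standard_error = s / math.sqrt(n)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


def width(interval):
    return interval[1] - interval[0]


base = confidence_interval(50.0, s=10.0, n=100)           # about (48.04, 51.96)
more_spread = confidence_interval(50.0, s=20.0, n=100)    # wider
bigger_sample = confidence_interval(50.0, s=10.0, n=400)  # narrower
more_confident = confidence_interval(50.0, s=10.0, n=100, confidence=0.99)  # wider

print([round(v, 2) for v in base])
print(round(width(more_spread), 2), round(width(bigger_sample), 2),
      round(width(more_confident), 2))
```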
Hypothesis Test for Evidence-based Decisions Wk5

A statistical framework for using data to derive evidence-based
decisions.
• Define business problem and variables relevant to that problem
• Formulate hypotheses around the variables that are relevant to business
decisions
• Conduct hypothesis testing to establish degree of evidence for the
hypotheses
• Based on evidence, make business decisions
Hypothesis Test for Evidence-based Decisions Wk5

STATISTICS
• DESCRIPTIVE
• INFERENTIAL (Sample → Sampling Distribution)
   • ESTIMATION (Point & Interval): Estimating the value of a population parameter
   • HYPOTHESIS TESTS: Testing a claim about the value of a population parameter
Steps in Hypothesis Test Wk5
1) Formulate H0 & H1
2) Decide on α
3) Calculate the p-value
4) Apply decision rule: reject H0 if p-value < α OR retain it if p-value > α
Defining the hypothesis Wk5

1) Formulate H0 & H1
• The null hypothesis always involves an equality sign (=)
• The alternative hypothesis is what we are seeking evidence for. It can contain a “≠”, “>” or “<” sign

Two-tailed test      Right-tailed test    Left-tailed test
H0: μ = μ0           H0: μ = μ0           H0: μ = μ0
H1: μ ≠ μ0           H1: μ > μ0           H1: μ < μ0
“different to”       “greater than”       “less than”
Mechanics of Hypothesis Testing Wk5

2) Decide on α.
Recommendations: α = 5%; or α = 1% for conservative cases.
3) Test Statistic = (x̄ − μ0) / (s/√n) = (x̄ − μ0) / SE(x̄)
Judging whether or not the test statistic is outstanding (“far from zero”), in the
direction of the alternative.
4) Decision:
Reject H0 if p-value < α OR retain it if p-value > α

A smaller p-value means that there is stronger evidence in favor of H1
• P-value for a right-tailed test: =1-NORM.S.DIST(test statistic,TRUE)
• P-value for a left-tailed test: =NORM.S.DIST(test statistic,TRUE)
• P-value for a two-tailed test: =2*NORM.S.DIST(??,TRUE)
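The four steps can be traced end to end in a short sketch. The function name `z_test` and the sample figures are hypothetical. For the two-tailed case the sketch doubles the lower-tail area at minus the absolute value of the test statistic, which is one standard way to compute it and is an assumption about what the slide intends.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, s, n, tail="two"):
    """Return (test statistic, p-value) for a z-test of H0: mu = mu0."""
    standard_error = s / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    cdf = NormalDist().cdf  # standard normal, like NORM.S.DIST(z, TRUE)
    if tail == "right":
        p_value = 1 - cdf(z)
    elif tail == "left":
        p_value = cdf(z)
    elif tail == "two":
        p_value = 2 * cdf(-abs(z))
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return z, p_value


# Hypothetical example: H0: mu = 50 against H1: mu > 50, with alpha = 5%.
alpha = 0.05
z, p_value = z_test(sample_mean=52.0, mu0=50.0, s=10.0, n=100, tail="right")
decision = "reject H0" if p_value < alpha else "retain H0"

print(round(z, 2), round(p_value, 4), decision)
```

With these numbers the test statistic is 2.0 and the right-tailed p-value is about 0.0228, so H0 is rejected at the 5% level but would be retained at the 1% level.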
Type I and II errors Wk5

Since we rely on data samples to conduct hypothesis tests, there is a potential
for errors. Possible scenarios:
Type I error: Reject a true null
Type II error: Retain a false null

                     The true ‘state of the world’
                     H0 is TRUE              H0 is FALSE
Do not reject H0     CORRECT DECISION!       TYPE II ERROR (β)
Reject H0            TYPE I ERROR (α)        CORRECT DECISION!
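A simulation can make the meaning of α concrete: when H0 is true, a test at the 5% level should still reject in roughly 5% of samples. The sketch below assumes a normal population with mean 10 and standard deviation 2; the sample size, number of trials and seed are arbitrary. Because s is estimated from each sample, the observed rate tends to sit slightly above α.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(7)  # fixed seed so the run is repeatable

alpha = 0.05
n = 40
trials = 2000
mu0 = 10.0  # H0 is TRUE in this simulation: the population mean really is 10

rejections = 0
for _ in range(trials):
    sample = [rng.gauss(mu0, 2.0) for _ in range(n)]
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.fmean(sample) - mu0) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-tailed p-value
    if p_value < alpha:
        rejections += 1  # rejecting a true null: a Type I error

type_i_rate = rejections / trials
print(type_i_rate)
```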
