
Quantitative Methods

M.Sc. Core Courses


Preparation for session 3
Descriptive analysis

Laurent Bertrandias
Professor of Marketing
TBS Education
Descriptive analysis

• Univariate analysis = description of one variable
• Bivariate analysis = description of the link between two variables

A required preparatory stage:
• to get an overview of the dataset,
• to spot outliers,
• to get an idea of which variables are associated with each other.
Outliers
Outlier = a data point that differs significantly from other observations.
Possible sources:
• Measurement / experimental error
• Can occur by chance
• Heavy-tailed distribution (ex. in a dataset of variables measured on French regions, Ile de France will often appear as an outlier)

Outliers can cause serious problems in statistical analyses, especially when these analyses assume a normal distribution.
Outlier identification
Identification:

• Based on the interquartile range. For example, if Q1 and Q3 are the lower and upper quartiles respectively, one can define an outlier as any observation outside the range:

[Q1 - k(Q3 - Q1) ; Q3 + k(Q3 - Q1)]

k = 1.5 indicates an "outlier", and k = 3 indicates data that is "far out".

• Based on the standard deviation. Usually, an outlier is any observation outside the range:

[Mean - 3*SD ; Mean + 3*SD]
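
The two identification rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are hypothetical, the function names (iqr_bounds, sd_bounds, outliers) are made up for this example, and the quartiles use the "inclusive" convention of the standard library (other conventions give slightly different fences).

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR)."""
    # Assumption: "inclusive" quartile convention; other conventions
    # shift Q1 and Q3 slightly.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def sd_bounds(values, n_sd=3):
    """Return the limits (Mean - n_sd*SD, Mean + n_sd*SD)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return mean - n_sd * sd, mean + n_sd * sd


def outliers(values, bounds):
    """List the observations that fall outside the (lower, upper) bounds."""
    lower, upper = bounds
    return [v for v in values if v < lower or v > upper]


# Hypothetical data: eleven ordinary values and one extreme value.
data = [10, 12, 12, 13, 13, 14, 14, 15, 15, 16, 17, 60]

print(outliers(data, iqr_bounds(data, k=1.5)))  # "outlier" rule
print(outliers(data, iqr_bounds(data, k=3)))    # "far out" rule
print(outliers(data, sd_bounds(data)))          # 3-SD rule
```

In small samples an extreme value inflates the standard deviation itself, so the 3-SD rule tends to be less sensitive than the interquartile rule.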


Panorama of univariate analyses

Qualitative variables
• Frequency analysis
• Charts: pie chart, vertical or horizontal bars

Quantitative variables
• Central tendency statistics: mean, median
• Dispersion statistics: variance / standard deviation, quartiles and interquartile range
• Charts: histogram, boxplot
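
As an illustration of the statistics listed above, here is a minimal sketch using only the Python standard library. The two variables (channel, basket) and their values are hypothetical, and the quartiles again assume the "inclusive" convention.

```python
import statistics
from collections import Counter

# Hypothetical qualitative variable: preferred contact channel of 8 customers.
channel = ["email", "phone", "email", "store", "email", "phone", "email", "store"]

# Frequency analysis: absolute counts and percentages per category.
counts = Counter(channel)
frequencies = {level: 100 * n / len(channel) for level, n in counts.items()}
print(counts.most_common())
print(frequencies)

# Hypothetical quantitative variable: basket amount in euros.
basket = [12, 15, 15, 18, 20, 22, 25, 33]

# Central tendency statistics.
mean = statistics.mean(basket)
median = statistics.median(basket)

# Dispersion statistics.
variance = statistics.variance(basket)  # sample variance (divides by n - 1)
sd = statistics.stdev(basket)           # sample standard deviation
q1, q2, q3 = statistics.quantiles(basket, n=4, method="inclusive")
iqr = q3 - q1

print(f"mean={mean}, median={median}")
print(f"variance={variance:.2f}, sd={sd:.2f}, Q1={q1}, Q3={q3}, IQR={iqr}")
```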
Panorama of bivariate analyses

Qualitative variable × Qualitative variable
• Cross-tab
• Chi-square analysis

Qualitative variable × Quantitative variable
• Mean comparison (t-tests, non-parametric tests / analysis of variance)

Quantitative variable × Quantitative variable
• Scatterplots
• Correlation analysis
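
Two of the analyses in this panorama can be sketched by hand in Python: the chi-square statistic of a cross-tab (two qualitative variables) and the Pearson correlation coefficient (two quantitative variables). The data and the function names below are hypothetical, and the sketch computes the statistics only, not the associated p-values.

```python
import math
from collections import Counter


def crosstab(rows, cols):
    """Count the observations for every (row level, column level) pair."""
    return Counter(zip(rows, cols))


def chi_square_statistic(rows, cols):
    """Pearson chi-square statistic for two qualitative variables."""
    n = len(rows)
    observed = crosstab(rows, cols)
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return chi2


def pearson_r(x, y):
    """Pearson correlation coefficient between two quantitative variables."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical qualitative data: customer segment and purchase decision.
segment = ["A"] * 10 + ["B"] * 10
bought = ["yes"] * 8 + ["no"] * 2 + ["yes"] * 3 + ["no"] * 7
chi2 = chi_square_statistic(segment, bought)
print(f"chi-square = {chi2:.3f}")

# Hypothetical quantitative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

For a 2 × 2 cross-tab (1 degree of freedom), the chi-square statistic is compared with the 5% critical value of about 3.84.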
Some basics of data analysis
Sampling and statistical inference

Sample = a subgroup of the population selected for participation in the study.

Population = aggregate of individuals (not always persons, sometimes companies, associations…) that is of interest for the purpose of the research.

Census = a study in which all “individuals” in the population participate; this remains exceptional.
Statistical inference

Target population (ex. physicians in France)
→ “True” parameters, not accessible
Ex. The “true” percentage of physicians prescribing the seasonal flu vaccine

Sample
→ Calculated statistic
Ex. The percentage of physicians in the sample prescribing the seasonal flu vaccine

Estimation: true parameters are estimated (or inferred). The sample calculated statistic is used as an estimator of the parameter.
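
The idea of estimation can be illustrated with a small simulation. Everything here is hypothetical: a made-up population of 50,000 physicians in which the "true" share of prescribers is set to 60% (in a real study this value would not be accessible), and simple random samples of 1,000 physicians.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical target population: 1 = prescribes the vaccine, 0 = does not.
population = [1] * 30_000 + [0] * 20_000
TRUE_SHARE = sum(population) / len(population)  # the "true" parameter

# Draw several simple random samples (without replacement) and compute the
# sample proportion each time: this calculated statistic is the estimator.
SAMPLE_SIZE = 1_000
estimates = [
    sum(random.sample(population, k=SAMPLE_SIZE)) / SAMPLE_SIZE
    for _ in range(5)
]

print(f"true share: {TRUE_SHARE:.3f}")
for i, estimate in enumerate(estimates, start=1):
    print(f"sample {i}: estimated share = {estimate:.3f}")
```

Each sample gives a slightly different estimate, which is why a sample statistic only estimates the parameter rather than revealing it.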
Testing hypotheses in a context of statistical inference

Research hypothesis → H1. The probability that a physician prescribes the seasonal flu vaccine is positively influenced by the extent to which he/she considers him/herself a potential vector of transmission.

The hypothesis which is tested → Null hypothesis (H0) = The probability that a physician prescribes the seasonal flu vaccine is not influenced by the extent to which he/she considers him/herself a potential vector of transmission.
→ A statement of the status quo, the opposite of the defended theory
→ Often refers to a null parameter: ex. b1 = 0 (b1 being the effect of interest)
→ The decision consists of either not rejecting H0 (loosely, "accepting" it) or rejecting H0. Rejecting H0 means that our research hypothesis is supported.
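
A minimal sketch of such a decision for H0: b1 = 0 is given below. It rests on assumptions that are not in the slides: a large-sample normal approximation, a two-sided test (a conservative choice, since H1 here is directional), a risk level of 5%, and made-up values for the estimate of b1 and its standard error.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, standard_error):
    """p-value for H0: parameter = 0 (large-sample normal approximation)."""
    z = estimate / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """H0 is either rejected or not rejected; it is never 'accepted'."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Hypothetical estimate of b1 and its standard error (not from the source).
b1_hat, se_b1 = 0.42, 0.15
p = two_sided_p_value(b1_hat, se_b1)
print(f"z = {b1_hat / se_b1:.2f}, p = {p:.4f}, decision: {decide(p)}")
```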
Types of statistical error and statistical power in a statistical inference context
1st case: Let’s imagine we decide that H0 is false. Then there are two possibilities:
► H0 is really false: the probability that a physician prescribes the seasonal flu vaccine is influenced by the extent to which he/she considers him/herself a potential vector of transmission.
► H0 is in fact true, so rejecting it is a mistake => tests are built to limit the risk of making this type I error: this risk is called the alpha error (α).
Ex: if α = 8%, the probability of rejecting H0 when it is in fact true is equal to 8%. When H0 is true, we have a 92% chance of being right by not rejecting it.
→ By convention (somewhat arbitrarily), α is usually fixed at 5% or at 1%.
2nd case: We now decide that H0 is true. Then again there are two possibilities:
► H0 is really true: the probability that a physician prescribes the seasonal flu vaccine is not influenced by the extent to which he/she considers him/herself a potential vector of transmission.
► H0 is in fact false, so not rejecting it is a mistake => this is a type II error, whose risk is called the beta error (β).
Ex: if β = 8%, there is an 8% chance of wrongly not rejecting H0 when it is in fact false, and a 92% chance of being right by rejecting it.
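
The meaning of α and β can be checked by simulation. The sketch below is an illustration under hypothetical assumptions: each simulated study draws 30 observations from a normal distribution with a known standard deviation of 1, tests H0: mean = 0 with a two-sided z-test at α = 5%, and the "true" effect used when H0 is false is set to 0.5.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is reproducible

ALPHA = 0.05
N = 30        # observations per simulated study
SIGMA = 1.0   # population standard deviation, assumed known
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96 for alpha = 5%


def rejects_h0(true_mean):
    """Simulate one study and test H0: mean = 0 with a two-sided z-test."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = (sum(sample) / N) / (SIGMA / N ** 0.5)
    return abs(z) > Z_CRIT


def rejection_rate(true_mean, n_studies=5_000):
    """Share of simulated studies in which H0 is rejected."""
    return sum(rejects_h0(true_mean) for _ in range(n_studies)) / n_studies


type_1_rate = rejection_rate(true_mean=0.0)  # H0 true: each rejection is a type I error
power = rejection_rate(true_mean=0.5)        # H0 false: each rejection is correct
type_2_rate = 1 - power                      # H0 false but not rejected

print(f"type I error rate (close to alpha): {type_1_rate:.3f}")
print(f"power (1 - beta): {power:.3f}")
print(f"type II error rate (beta): {type_2_rate:.3f}")
```

The type I error rate stays near the chosen α whatever the sample size, whereas β depends on the sample size and on how large the true effect is.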
Synthesis
A company wants to test whether a 10% wage increase for its salespeople increases their efficiency and the amount of sales.
Test on 30 salespeople - H0: increasing wages has no effect on the amount of sales

Decision: H0 is rejected
• H0 is really “true”: Type I error, alpha error α. To increase wages although it is not efficient.
• H0 is really “false”: correct decision, probability 1 - β = “power of the test”. Power is the probability of correctly rejecting H0 when it should be rejected.

Decision: H0 is not rejected
• H0 is really “true”: correct decision, probability 1 - α.
• H0 is really “false”: Type II error, beta error β. Not to increase wages although it is efficient.