
BAS-110

Methods of Empirical Research

WS 2019/20
Ching-Hua Yeh
OUTLINE
• Course information
− schedule & location
− evaluation criteria

• Introduction to quantitative research methods


• *Recalls:
− treatment of data: standardizing variables & outliers
− hypothesis testing procedure
− t-tests
− basic statistics
− statistical distribution

• Introduction to ordinary least squares (OLS) method


COURSE INFORMATION
• Qualitative research methods (by Ms Meyer)
− Methods of qualitative research (observation/ Interview/ content analysis)

− Quality of qualitative research

− Analysis of qualitative data with atlas.ti

• Quantitative research methods (by Yeh) (theoretical courses)


− Linear regression analysis (Ordinary Least Squares, OLS)
Theoretical courses (by Yeh)
− OLS (R²/ Significance of parameters/ t-test/ F-test/ standardized coefficients/ interpretation)

− OLS assumptions (testing of multicollinearity/ Heteroscedasticity/ Autocorrelation)


Lab courses (by Ms Macht)
− Dummy variable

− Logistic regression

COURSE INFORMATION
• Quantitative research methods (Theoretical courses by Yeh)
− Introduction to quantitative empirical research

− Linear regression analysis (R²/ significance of parameters/ t-test/ F-test/ standardized coefficients/
interpretation)

− OLS assumptions (test of multicollinearity/ heteroscedasticity/ autocorrelation)

− Dummy variables and their use in OLS

− Interaction terms used in OLS

− Introduction to logistic regression

COURSE INFORMATION
• Quantitative research methods (Lab courses by Ms Macht)
− Introduction to EXCEL and SPSS

− Data preparation for the analysis in EXCEL and SPSS

− Apply most of the methods taught in the lecture. In the lab course you will be working with realistic
examples

→Aim of this course:


To be able to conduct your own empirical research based on regression analysis

SCHEDULE & LOCATION
Week  Date     Theoretical course    Date     Exercise courses
1     09 Oct.  AFECO week            10 Oct.  AFECO week
2     16 Oct.  Yeh (@Nussallee 13)   17 Oct.  -
3     23 Oct.  Yeh (@Nussallee 13)   24 Oct.  Macht (@HRZ)
4     30 Oct.  Yeh (@Nussallee 13)   31 Oct.  -
5     06 Nov.  -                     07 Nov.  Macht (@HRZ)
6     13 Nov.  Yeh (@Nussallee 13)   14 Nov.  -
7     20 Nov.  -                     21 Nov.  Macht (@HRZ)
8     27 Nov.  Yeh (@Nussallee 13)   28 Nov.  -
9     04 Dec.  Academicus            05 Dec.  Macht (@HRZ)
10    11 Dec.  Yeh (@Nussallee 13)   12 Dec.  -
11    18 Dec.  Yeh (@Nussallee 13)   19 Dec.  Macht (@HRZ)
12    25 Dec.  Holiday               26 Dec.  Holiday
13    01 Jan.  Holiday               02 Jan.  Holiday
14    08 Jan.  Yeh (@Nussallee 13)   09 Jan.  -
15    15 Jan.  -                     16 Jan.  Macht (@HRZ)
16    22 Jan.  -                     23 Jan.  -
17    29 Jan.  Yeh (@Nussallee 13)   30 Jan.  -
COURSE INFORMATION
• Theoretical courses (Wed., 12:15 - 13:45) @Nussallee 13, HS XIII

16.10.19, 13.11.19, 18.12.19,


23.10.19, 27.11.19, 08.01.20,
30.10.19, 11.12.19, 29.01.20

• Lab courses (Thu., 14:00 - 15:30 ; 15:30 - 17:00) @HRZ, Kursraum1


*Please assign and remember the course group you belong to!
*You may bring your own laptop.

24.10.19, 05.12.19,
07.11.19, 19.12.19,
21.11.19, 16.01.20

The first group is fully occupied.
If you don't find your name on the list, you are automatically assigned to the second group (15:30-17:00)

COURSE INFORMATION
• Quantitative research methods
− Theoretical courses (9 appointments: Wednesdays, 12:15 - 13:45 @Nussallee 13, HS XIII)
− Lab courses (6 appointments: Thursdays, 14:00 - 15:30; 15:30 - 17:00 @ HRZ, Room 1)

• Qualitative research methods


− Wednesday, @ Nussallee 13, HS XIII

• Ecampus: (BAS-110)
− Please make sure that you are registered in the course
− Lecture slides and lab course data
− Short-notice changes etc. will be announced here

• Questions and feedback: chinghua.yeh@ilr.uni-bonn.de

COURSE INFORMATION
• Grading information:
− Written examination (50%) → Quantitative methods
▪ max. 60 minutes for answering the questions (max. 100 points in total)

▪ you need 50 points to pass the written exam

− Assignment (50%) →Qualitative methods


▪ assignment is mandatory to take the exam and to pass the course!
▪ 1st submission deadline: 18 Nov.; final submission deadline: 13 Jan.

▪ presentations will be on 15 and 22 January

QUANTITATIVE RESEARCH METHODS
We can consider the (quantitative) empirical research as consisting of four distinct phases:

1. Research design
− Research question/ research objective/ stating hypotheses
− Questionnaire design/ experimental setup
− Sample size/ choice of instruments…

2. Execution
− Data collection process via
▪ experimental lab, or
▪ survey

3. Data analysis
− is usually based on the acquired data and the hypotheses testing/ parametric models proposed

4. Interpretation
− To measure the parameters of the model
− To prove the validity of the model
− Is the accuracy of results sufficient to meet the criteria specified in the design phase?
− To decide whether or not the experiment has been successful
Statistical analysis strategy

• Univariate techniques: hypothesis testing
− Parametric tests
▪ One sample: T-test
▪ Two or more samples: independent-samples T-test, paired-samples T-test, ANOVA
− Nonparametric tests

• Multivariate techniques
− Dependence techniques
▪ One Y: ANOVA, multiple regression, conjoint analysis
▪ Multiple Y: MANOVA
▪ Multiple Y & Xs: SEM
− Independence techniques
▪ Focus on variables: factor analysis
▪ Focus on objects/cases: cluster analysis, MDS
COURSE PREREQUISITES

*RECALL: Z-SCORE
The standard score, or z-score, represents the number of standard
deviations a given value x falls from the mean

• To find the z-score for a given value, we can use the following formula:

$z = \frac{x - \mu}{\sigma}$

• The standardized variable z has a mean of 0 and a standard deviation of 1

• A z-score can be negative, positive, or zero


− If z is negative, the corresponding x-value is less than the group mean

− If z is positive, the corresponding x-value is greater than the group mean

− If z is zero, the corresponding x-value is equal to the group mean


*RECALL: Z-SCORE
• A z-score of 1.23 indicates that:
- the value is greater than the group mean
- the value lies about 1.23 S.D. above the mean
- around 89.07% of the participants have a grade of 87 or lower

*RECALL: Z-SCORE
Finding areas under the standard normal curve

• To find the area to the left of a z-score, look it up in the z-table
→ So, 89.07% of the area under the curve falls to the left of z = 1.23

• To find the area to the right of a z-score, look up the area to its left in the
z-table and subtract it from 1
→ 10.93% of the area under the curve falls to the right of z = 1.23

• To find the area between two z-scores, look up both areas in the z-table and
subtract the smaller area from the larger area
→ 66.41% of the area under the curve falls between z = -0.75 and z = 1.23
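
The three z-table lookups can also be checked numerically. The following is a minimal sketch, assuming Python's standard `math.erf` is used in place of a printed z-table; the helper names `z_score` and `area_left` and the example values (x = 85, mean 70, S.D. 10) are hypothetical and not part of the lecture.

```python
import math


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical example: x = 85 in a group with mean 70 and S.D. 10
example_z = z_score(85, 70, 10)

left = area_left(1.23)                          # area to the left of z = 1.23
right = 1.0 - left                              # area to the right of z = 1.23
between = area_left(1.23) - area_left(-0.75)    # area between the two z-scores

print(f"z = {example_z:.2f}")
print(f"left: {left:.4f}, right: {right:.4f}, between: {between:.4f}")
```

Small differences in the last decimal place compared with the z-table are expected, because the table values are rounded before they are subtracted.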

*RECALL: OUTLIERS
• An outlier is an extremely high or an extremely low value in the data
− can strongly affect the mean and standard deviation of a variable
− can have an effect on other statistics as well

• To detect outliers, you need the following information:

− calculate the quartiles (Q1, Q3) and the interquartile range (IQR = Q3 - Q1)
− calculate upper and lower boundaries
▪ Lower range limit =Q1 - (1.5 × IQR)

▪ Upper range limit =Q3 + (1.5 × IQR)

− an intuitive way: Boxplot
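
The boundary rule above can be sketched in a few lines. This is an illustration only: the sample values are hypothetical, and `statistics.quantiles` is one of several quartile conventions, so SPSS or EXCEL may report slightly different Q1 and Q3 for the same data.

```python
import statistics


def iqr_limits(values):
    """Return (lower limit, upper limit) based on the 1.5 x IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Return all values that fall outside the lower and upper limits."""
    lower, upper = iqr_limits(values)
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample with one extremely high value
sample = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
lower, upper = iqr_limits(sample)
outliers = find_outliers(sample)

print(f"limits: [{lower}, {upper}], outliers: {outliers}")
```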

*RECALL: OUTLIERS
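A boxplot can be used to check for outliers:

[Figure: boxplot marking, from top to bottom, the outliers, the upper limit = Q3 + (1.5 × IQR), Q3, the median, Q1, and the lower limit = Q1 - (1.5 × IQR); the box between Q1 and Q3 spans the IQR]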
*RECALL: HYPOTHESIS TESTING PROCEDURE
1. State the relevant null and alternative hypotheses as well as the level of
significance
- Set alpha (e.g. α=0.05)
- Set H0 and Ha

2. Choose and compute the relevant test statistic

3. Determine the critical value from the respective table

4. Decision rules:
- e.g.
If test statistic > critical value, reject H0
If test statistic < critical value, fail to reject H0

T-TESTS
• The t-test is a parametric test based on the normal distribution; it is used to test
whether the means of two groups differ

• Why can’t we just calculate the “difference” from the scores?


Because we have to take the ‘variability’ into account!

$T = \frac{\text{difference between (group) means}}{\text{sampling variability}}$

• Assumptions:
– unknown variance of the population σ²
– population following a normal distribution
– or n > 30
T-TESTS
There are three types of t-test:
• One sample t-test:
– A single sample mean against a hypothesis
– tests whether a single sample mean is significantly different from an expected value.

• Paired-samples t-test:
– Two means within the same sample.
– tests the relationship between two associated samples, e.g. means obtained in two conditions within a single group of
participants

• Independent-samples t-test:
– Two sample means compared to each other
– tests the relationship between two independent samples

ONE SAMPLE T-TEST
• One-sample t-test:
− A single sample mean against a hypothesis. Used to test whether the
population mean is different from a specified value.

▪ Null hypothesis $H_0: \mu_{mean} = \mu_{expected}$

▪ Alternative hypothesis $H_A: \mu_{mean} \neq \mu_{expected}$

− T-test statistics:

$T = \frac{\bar{X} - \mu_{expected}}{s / \sqrt{n}}$, where $Std.Err. = \frac{s}{\sqrt{n}}$
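
As an illustration, the one-sample statistic can be computed and compared with a critical value as follows. This is a minimal sketch: the sample and the expected value of 5.0 are hypothetical, and the critical value 2.262 is read from a t-table (two-sided, α = 0.05, d.f. = 9).

```python
import math
import statistics


def one_sample_t(sample, mu_expected):
    """T = (sample mean - expected mean) / (s / sqrt(n))."""
    n = len(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # s uses n - 1
    return (statistics.mean(sample) - mu_expected) / std_err


# Hypothetical sample of n = 10 measurements
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.9, 5.2, 5.7]
t_stat = one_sample_t(sample, mu_expected=5.0)

t_crit = 2.262                      # t-table: two-sided, alpha = 0.05, d.f. = 9
reject_h0 = abs(t_stat) > t_crit    # two-sided decision rule

print(f"T = {t_stat:.3f}, reject H0: {reject_h0}")
```

The absolute value is used in the decision rule because the alternative hypothesis is two-sided.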

PAIRED-SAMPLE T-TEST
• Paired-samples t-test:
− The paired samples t-test is used to compare the means of two dependent samples.

▪ Null hypothesis $H_0: \mu_{difference} = 0$

▪ Alternative hypothesis $H_A: \mu_{difference} \neq 0$

− T-test statistics:

$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$, where $s_d = \sqrt{\frac{\sum d_i^2 - n \bar{d}^2}{n - 1}}$

with d.f. = n - 1,

where $\bar{d}$ is the mean of the differences and $s_d$ is the standard deviation of the differences.
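
The paired-samples statistic can be sketched in the same way, using the shortcut formula for the standard deviation of the differences. The before/after scores are hypothetical, and `paired_t` is an illustrative helper name rather than a library function.

```python
import math


def paired_t(first, second):
    """Return (t, d.f.) for the paired differences second - first."""
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    d_bar = sum(d) / n
    # shortcut formula: s_d = sqrt((sum of d_i^2 - n * d_bar^2) / (n - 1))
    s_d = math.sqrt((sum(x * x for x in d) - n * d_bar ** 2) / (n - 1))
    return (d_bar - 0) / (s_d / math.sqrt(n)), n - 1


# Hypothetical scores of the same five participants in two conditions
before = [70, 65, 80, 75, 60]
after = [74, 68, 85, 77, 66]
t_paired, df_paired = paired_t(before, after)

print(f"t = {t_paired:.3f}, d.f. = {df_paired}")
```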
T-TESTS
• Independent samples t-test:
− Two sample means compared to each other

▪ Null hypothesis $H_0: \mu_1 = \mu_2$

▪ Alternative hypothesis $H_A: \mu_1 \neq \mu_2$

− T-test statistics:

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$, where $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$

with degrees of freedom = $n_1 + n_2 - 2$
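
A minimal sketch of the independent-samples statistic with the pooled variance, assuming equal population variances in both groups; the two groups are hypothetical data and `independent_t` is an illustrative helper name.

```python
import math
import statistics


def independent_t(group1, group2):
    """Return (t, d.f.) using the pooled variance of two independent samples."""
    n1, n2 = len(group1), len(group2)
    s1_sq = statistics.variance(group1)   # sample variance, divides by n - 1
    s2_sq = statistics.variance(group2)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    t = mean_diff / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical scores of two independent groups
group_a = [20, 22, 19, 24, 25]
group_b = [28, 27, 30, 26, 29]
t_indep, df_indep = independent_t(group_a, group_b)

print(f"t = {t_indep:.3f}, d.f. = {df_indep}")
```

A negative t simply means that the first group has the lower mean; for a two-sided test, the absolute value is compared with the critical value.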


MEASUREMENT SCALES OF VARIABLES
• Categorical variables contain a finite number of categories or distinct
groups, so the values can be:
− Nominal variables ( e.g. hair color for blonde/ black/ red)
− Dichotomous/Binary variables (e.g. yes or no)
− Ordinal variables (e.g., economic status for low/ medium/ high…)

• Continuous variables can take on any value


− Interval variables, (e.g. temperature measured in Celsius for 11℃ to 20℃, 21℃ to 30℃,
31℃ to 40℃, 41℃ to 50℃, 51℃ to 60℃, 61℃ to 70℃, …)
− Ratio variables, (e.g. weight for 2 kgs, 4 kgs, 8 kgs…)

*RECALL: BASIC STATISTICS
• How do we describe the characteristics of a set of data?
− Statistics (e.g. mean, standard deviation, variance, etc. )
− Graphical display (e.g. histogram, boxplots, etc.)

ID   Height (cm)   ID   Height (cm)
1    141           11   170
2    152           12   176
3    151           13   173
4    158           14   184
5    164           15   180
6    160           16   184
7    161           17   182
8    165           18   192
9    175           19   190
10   174           20   201
*RECALL: BASIC STATISTICS
• Mean: the sum divided by the count. $\bar{x} = \frac{\sum x_i}{n}$

• Mode: the number that appears most often.

• Median: the middle of a sorted list of numbers

• Range: maximum value – minimum value

• (Sample) Standard deviation: a measure of how spread out the data are. $s = \sqrt{Var} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

• (Sample) Variance: the sum of the squared differences from the mean, divided by n - 1. $Var = s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

• (Sample) Covariance: a measure of the joint variability of two variables. $cov(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$

STATISTICS
For the height data (N = 20):

• Mean: $\bar{x} = 171.65$

• Mode = 184

• Median = 173.5

• Range: 60

• Standard deviation: $s = \sqrt{Var} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = 15.26$

• Variance: $Var = s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = 232.87$
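
These values can be reproduced from the 20 height observations. The sketch below assumes Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 (sample) formulas given in the recall slide.

```python
import statistics

# Height (cm) of the 20 observations from the data table
heights = [141, 152, 151, 158, 164, 160, 161, 165, 175, 174,
           170, 176, 173, 184, 180, 184, 182, 192, 190, 201]

mean = statistics.mean(heights)
mode = statistics.mode(heights)
median = statistics.median(heights)
value_range = max(heights) - min(heights)
variance = statistics.variance(heights)   # divides by n - 1
std_dev = statistics.stdev(heights)       # square root of the sample variance

print(f"mean={mean}, mode={mode}, median={median}, range={value_range}")
print(f"variance={variance:.2f}, standard deviation={std_dev:.2f}")
```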

GRAPHICAL DISPLAY
Frequency table of the height data:

Height (cm)   Freq.
140-149       1
150-159       3
160-169       4
170-179       5
180-189       4
190-199       2
200-209       1
(Total N=20)
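
The class frequencies can be counted directly from the height data. This is a minimal sketch that assumes classes of width 10 starting at 140, with each value assigned to its class by integer division.

```python
from collections import Counter

# Height (cm) of the 20 observations from the data table
heights = [141, 152, 151, 158, 164, 160, 161, 165, 175, 174,
           170, 176, 173, 184, 180, 184, 182, 192, 190, 201]

# Map each height to the lower bound of its class, e.g. 158 -> 150
counts = Counter((h // 10) * 10 for h in heights)

for lower in sorted(counts):
    print(f"{lower}-{lower + 9}: {counts[lower]}")
print(f"(Total N={sum(counts.values())})")
```

Only two observations (190 and 192) fall into the class 190-199, so the frequencies add up to N = 20.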

GRAPHICAL DISPLAY
[Figure: histogram of the height data (Total N=20); x-axis: Height (cm) in classes from 140-149 to 200-209; y-axis: Freq., from 0 to 6; bar heights 1, 3, 4, 5, 4, 2, 1]
*RECALL: STATISTICAL DISTRIBUTION
• Types of distributions
− Discrete (e.g. binomial distribution, Poisson distribution etc.)
− Continuous (e.g. normal distribution, exponential distribution etc.)

• When describing a distribution, we always include:


− Shape (e.g. symmetrical; skewed; uniform)
− Central tendency
− Spread (variability)

*RECALL: STATISTICAL DISTRIBUTION
• Shape
− Symmetric vs. skewed

• Central tendency
− Where most of the data are located
− Mean, median, mode

[Figure: symmetric distribution, where mode = mean = median]


*RECALL: STATISTICAL DISTRIBUTION
• Shape
− Symmetric vs. skewed

[Figure: two distributions, one with negative skew and one with positive skew]
Is the mean always the best measure of central tendency?
→ skew pulls the mean in the direction of the skew

[Figure: two skewed distributions with the mode, median, and mean marked on each]
*RECALL: STATISTICAL DISTRIBUTION
• Shape
− Symmetric vs. skewed

• Central tendency
− Where most of the data are located
− Mean, median, mode

• Spread (Variability)
− How similar the values are
− Range, standard deviation, variance

NORMAL DISTRIBUTION PROPERTIES
• Normal distribution (/ normal curve/ gaussian distribution)
• If a variable x follows a normal distribution, then:
− it has continuous data
− its density curve is bell-shaped and perfectly symmetric,
− it is characterized by its mean 𝜇 and standard deviation 𝜎, which is denoted
by x ~𝑁(𝜇, 𝜎)
− mean = median = mode
− The empirical rule (68-95-99.7 rule) tells us what share of the scores in the
distribution lies within 𝜎, 2𝜎, and 3𝜎 of the mean
*RECALL: NORMAL DISTRIBUTION
The height of the density curve at any point x is given by
the density function:

$Y = f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}, \quad -\infty < x < +\infty$

Constants:
π = 3.14159
e = 2.71828

[Figure: normal curves with different centers and spreads depending on μ and σ]
*RECALL: NORMAL DISTRIBUTION
It is a probability density function; therefore, whatever the values of μ and σ,
it must integrate to 1.

$\int_{-\infty}^{+\infty} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx = 1$
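
That the density integrates to 1 for any choice of μ and σ can be checked numerically. The sketch below uses a simple trapezoidal rule over μ ± 10σ as a stand-in for the infinite integration range; the three (μ, σ) pairs and the helper names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal density curve at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def integrate(f, a, b, steps=20000):
    """Trapezoidal approximation of the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# Hypothetical (mu, sigma) pairs; the area should be 1 in every case
areas = []
for mu, sigma in [(0, 1), (5, 2), (-3, 0.5)]:
    area = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
    areas.append(area)
    print(f"mu={mu}, sigma={sigma}: area = {area:.6f}")
```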
EMPIRICAL RULE (68-95-99.7 RULE):
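• Use the mean to describe the center and the S.D. to describe the spread of a normal distribution

• 68.27% of the scores are within 1 standard deviation of the mean

$\int_{\mu - \sigma}^{\mu + \sigma} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx = 0.6827$

• 95.45% of the scores are within 2 standard deviations of the mean

$\int_{\mu - 2\sigma}^{\mu + 2\sigma} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx = 0.9545$

• 99.73% (almost all of the scores) are within 3 standard deviations of the mean

$\int_{\mu - 3\sigma}^{\mu + 3\sigma} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx = 0.9973$

[Figure: normal curve with the areas within ±𝜎, ±2𝜎, and ±3𝜎 of the mean marked]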
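
The shares within 1, 2, and 3 standard deviations can be computed without a table, because for a normal distribution the probability of falling within k standard deviations of the mean equals erf(k / √2). A minimal sketch using the standard `math.erf` function (the helper name `share_within` is illustrative); computed to four decimals, the shares are 0.6827, 0.9545, and 0.9973.

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} S.D.: {share:.4f}")
```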
ASSESSING DISTRIBUTION SHAPE:
NORMAL DISTRIBUTION
• Graphical approach: Histograms

• Statistical approach:
− Checking value of skewness and kurtosis
(using rule of thumb ± 1.00 criterion. The further the value is from zero, the more likely it is that the data are
not normally distributed)

− Applying Shapiro-Wilk and/or the Kolmogorov-Smirnov tests


i. Null hypothesis H0: the distribution is normal in shape

ii. They compare the scores in the sample to a normally distributed set of scores with the same mean and
standard deviation

iii. Limitation: not suitable for large samples, because with a large sample even small deviations from normality yield significant results

HOW DO WE EXPLORE THE
RELATIONSHIP BETWEEN TWO
VARIABLES?
ID   Studying hours (X)   Grade (Y) (by points ranging from 0-100)
1    4.5                  99
2    2                    66
3    1.5                  55
4    3.5                  84
5    1                    26
6    2.5                  75
7    4                    92
8    3                    70
9    2.5                  52
10   1.5                  40
HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
1. Graphical method
2. Correlation
3. (Linear) Regression analysis

HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
1. Graphical method: using scatter diagram
− A scatter diagram plots pairs of bivariate observations (x, y) on the X-Y plane and provides an
initial exploration of the relationship between two variables

− The pattern of data is indicative of the type of relationship between the two variables:
▪ positive relationship
▪ negative relationship
▪ no relationship

HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
1. Graphical method: using scatter diagram
─ Plotting n pairs of observations (x1, y1), (x2, y2), …, (xn, yn).

[Figure: scatter diagram of the observations on the X-Y plane]
HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
2. Correlation:
− Also called bivariate correlation, Pearson's correlation, or
Pearson product-moment correlation
− How to calculate the simple correlation coefficient (𝑟): $r = \frac{Cov(x, y)}{\sigma_x \sigma_y}$
− Correlation (𝑟) is a measure of association between two continuous variables: −1 ≤ 𝑟 ≤ 1

− 𝑟 is used to determine the direction and strength of the relationship between two variables; it does not
allow us to infer causal relationships
o Positive sign of 𝑟 means the relation is direct
o Negative sign of 𝑟 means the relation is inverse (indirect)
o 𝑟 = 0 represents no linear relationship between the two variables
HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
2. Correlation:

$r = \frac{Cov(x, y)}{\sigma_x \sigma_y} = 0.93$

→ Direct strong correlation
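
The coefficient can be reproduced from the studying-hours data. The sketch below computes the sample covariance and the sample standard deviations with n - 1 in the denominator; the factor n - 1 cancels in the ratio, so the population formulas would give the same r. The helper name `pearson_r` is illustrative.

```python
import math

# Studying hours (X) and grade (Y) of the 10 students from the data table
hours = [4.5, 2, 1.5, 3.5, 1, 2.5, 4, 3, 2.5, 1.5]
grades = [99, 66, 55, 84, 26, 75, 92, 70, 52, 40]


def pearson_r(x, y):
    """r = Cov(x, y) / (s_x * s_y), using sample (n - 1) formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)


r = pearson_r(hours, grades)
print(f"r = {r:.2f}")
```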

HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
2. Correlation:

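−1 ≤ 𝑟 ≤ 1

Range of r        Strength       Direction
-1 to -0.75       strong         indirect
-0.75 to -0.25    intermediate   indirect
-0.25 to 0        weak           indirect
0 to 0.25         weak           direct
0.25 to 0.75      intermediate   direct
0.75 to 1         strong         direct

r = -1: perfect negative correlation; r = 0: no relation; r = 1: perfect positive correlation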
SCATTER PLOTS OF DATA WITH VARIOUS
CORRELATION COEFFICIENTS

[Figure: six scatter plots of Y against X, with r = -1, r = -0.6, r = 0, r = +1, r = +0.3, and r = 0]
Feedback or questions: chinghua.yeh@ilr.uni-bonn.de
Next lecture: 23 Oct. 2019
