WS 2019/20
Ching-Hua Yeh
OUTLINE
• Course information
− schedule & location
− evaluation criteria
− Logistic regression
COURSE INFORMATION
• Quantitative research methods (Theoretical courses by Yeh)
− Introduction to quantitative empirical research
− Linear regression analysis (R²/ significance of parameters/ t-test/ F-test/ standardized coefficients/
interpretation)
COURSE INFORMATION
• Quantitative research methods (Lab courses by Ms Macht)
− Introduction to EXCEL and SPSS
− Apply most of the methods taught in the lecture. In the lab course you will be working with realistic
examples
SCHEDULE & LOCATION
Week  Date     Theoretical course    Date     Exercise courses
1     09 Oct.  AFECO week            10 Oct.  AFECO week
2     16 Oct.  Yeh (@Nussallee 13)   17 Oct.  -
3     23 Oct.  Yeh (@Nussallee 13)   24 Oct.  Macht (@HRZ)
4     30 Oct.  Yeh (@Nussallee 13)   31 Oct.  -
5     06 Nov.  -                     07 Nov.  Macht (@HRZ)
6     13 Nov.  Yeh (@Nussallee 13)   14 Nov.  -
7     20 Nov.  -                     21 Nov.  Macht (@HRZ)
8     27 Nov.  Yeh (@Nussallee 13)   28 Nov.  -
9     04 Dec.  Academicus            05 Dec.  Macht (@HRZ)
10    11 Dec.  Yeh (@Nussallee 13)   12 Dec.  -
11    18 Dec.  Yeh (@Nussallee 13)   19 Dec.  Macht (@HRZ)
12    25 Dec.  Holiday               26 Dec.  Holiday
13    01 Jan.  Holiday               02 Jan.  Holiday
14    08 Jan.  Yeh (@Nussallee 13)   09 Jan.  -
15    15 Jan.  -                     16 Jan.  Macht (@HRZ)
16    22 Jan.  -                     23 Jan.  -
17    29 Jan.  Yeh (@Nussallee 13)   30 Jan.  -
COURSE INFORMATION
• Theoretical courses (Wed., 12:15 - 13:45) @Nussallee 13, HS XIII
• Lab courses (Thu.) @HRZ on: 24.10.19, 07.11.19, 21.11.19, 05.12.19, 19.12.19, 16.01.20
The first group is fully occupied.
If you do not find your name on the list, you are automatically assigned to the second group (15:30-17:00)
COURSE INFORMATION
• Quantitative research methods
− Theoretical courses (9 appointments: Wednesdays, 12:15 - 13:45 @Nussallee 13, HS XIII)
− Lab courses (6 appointments: Thursdays, 14:00 - 15:30; 15:30 - 17:00 @ HRZ, Room 1)
• Ecampus: (BAS-110)
− Please make sure that you are registered in the course
− Lecture slides and lab course data
− Short-notice changes etc. will be announced here
COURSE INFORMATION
• Grading information:
− Written examination (50%) → Quantitative methods
▪ max. 60 minutes for answering the questions (max. 100 points in total)
QUANTITATIVE RESEARCH METHODS
Quantitative empirical research can be viewed as consisting of four distinct phases:
1. Research design
− Research question/ research objective/ stating hypotheses
− Questionnaire design/ experimental setup
− Sample size/ choice of instruments…
2. Execution
− Data collection process via
▪ experimental lab, or
▪ survey
3. Data analysis
− Usually based on the acquired data and on the proposed hypothesis tests/ parametric models
4. Interpretation
− To measure the parameters of the model
− To prove the validity of the model
− Is the accuracy of results sufficient to meet the criteria specified in the design phase?
− To decide whether or not the experiment has been successful
Statistical analysis strategy
• Hypothesis testing
− Parametric tests
▪ One sample: T-test
▪ Two or more samples: independent-samples T-test, paired-samples T-test, ANOVA
− Nonparametric tests
• Dependence techniques
− One Y: ANOVA, multiple regression, conjoint analysis
− Multiple Y: MANOVA
− Multiple Y & Xs: SEM
• Interdependence techniques
− Focus on variables: factor analysis
− Focus on objects/cases: cluster analysis, MDS
COURSE PREREQUISITES
*RECALL: Z-SCORE
The standard score, or z-score, represents the number of standard
deviations a given value x falls from the mean
• To find the z-score for a given value, we can use the following formula:
$$ z = \frac{x - \mu}{\sigma} $$
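As a small illustration of the formula, the following Python sketch computes a z-score. The function name `z_score` is an assumption made for this example, and the input values (a height of 201 cm, a mean of 171.65 cm, and a standard deviation of 15.26 cm) are borrowed from the height data used later in these slides.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Illustrative values: tallest person in the height data of these slides.
z = z_score(201, 171.65, 15.26)
print(f"z = {z:.2f}")  # a positive z means the value lies above the mean
```

A negative result would indicate a value below the mean.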
*RECALL: Z-SCORE
Finding areas under the standard normal curve
• To find the area to the left of a z-score, look it up in the z-table
→ So, 89.07% of the area under the curve lies to the left of z = 1.23
• To find the area to the right of a z-score, look up the area to the left in the z-table and
then subtract it from 1
→ 10.93% of the area under the curve falls to the right of z = 1.23
• To find the area between two z-scores, look up both areas in the z-table and
then subtract the smaller area from the larger area
→ 66.41% of the area under the curve falls between z = -0.75 and z = 1.23
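The three table look-ups can be reproduced numerically. The sketch below is one possible way to do this in Python, using the error function from the standard library in place of a printed z-table; the helper name `phi` is an assumption for this example. Because printed tables round each entry to four decimals, the last digit of a computed difference may deviate slightly from the table-based result.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


left = phi(1.23)                    # area to the left of z = 1.23
right = 1.0 - phi(1.23)             # area to the right of z = 1.23
between = phi(1.23) - phi(-0.75)    # area between z = -0.75 and z = 1.23

print(f"left    = {left:.4f}")
print(f"right   = {right:.4f}")
print(f"between = {between:.4f}")
```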
*RECALL: OUTLIERS
• An outlier is an extremely high or an extremely low value in the data
− can strongly affect the mean and standard deviation of a variable
− can have an effect on other statistics as well
*RECALL: OUTLIERS
A boxplot can be used to check for outliers:
[Figure: boxplot marking Q1, the median, Q3, the IQR, and the outliers beyond the limits]
Lower limit = Q1 - (1.5 × IQR)
Upper limit = Q3 + (1.5 × IQR)
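The boxplot rule can also be applied without drawing the plot. The following Python sketch flags values outside the 1.5 × IQR limits; the function names and the data are hypothetical and chosen only for illustration. Note that quartiles can be computed in several slightly different ways, so software such as SPSS or EXCEL may report limits that differ a little from the ones obtained here with the "inclusive" method.

```python
import statistics


def iqr_limits(values, k=1.5):
    """Return (lower limit, upper limit) of the boxplot outlier rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Values below the lower limit or above the upper limit."""
    lower, upper = iqr_limits(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical data with one extremely high value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print("limits  :", iqr_limits(data))
print("outliers:", find_outliers(data))
```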
*RECALL: HYPOTHESIS TESTING PROCEDURE
1. State the relevant null and alternative hypotheses as well as the level of
significance
- Set alpha (e.g. α=0.05)
- Set H0 and Ha
4. Decision rules:
- e.g.
If test statistic > critical value, reject H0
If test statistic < critical value, fail to reject H0
T-TESTS
• The t-test is a parametric test that assumes normally distributed data; it is used to test
whether there is a difference in means between two groups
• Assumptions:
– unknown variance of the population σ²
– population following a normal distribution
– or n > 30
T-TESTS
There are three types of t-test:
• One sample t-test:
– A single sample mean against a hypothesis
– tests whether a single sample mean is significantly different from an expected value.
• Paired-samples t-test:
– Two means within the same sample.
– tests the relationship between two associated samples, e.g. means obtained in two conditions within a single group of
participants
• Independent-samples t-test:
– Two sample means compared to each other
– tests the relationship between two independent samples
ONE SAMPLE T-TEST
• One-sample t-test:
− A single sample mean against a hypothesis. Used to test whether the
population mean is different from a specified value.
− T-test statistics:
$$ T = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}, \qquad \text{Std. Err.} = \frac{s}{\sqrt{n}} $$
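A minimal Python sketch of the one-sample statistic follows. It divides the distance between the sample mean and the specified value by the standard error s/√n. The function name `one_sample_t` and the sample values are assumptions invented for this illustration; the resulting t still has to be compared with a critical value from the t-distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, standard error) for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)        # sample SD, divisor n - 1
    std_err = s / math.sqrt(n)
    t = (statistics.mean(sample) - mu0) / std_err
    return t, std_err


# Hypothetical measurements, tested against a specified value of 5.0.
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
t_stat, std_err = one_sample_t(sample, mu0=5.0)
print(f"t = {t_stat:.3f}, std. err. = {std_err:.4f}, df = {len(sample) - 1}")
```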
PAIRED-SAMPLE T-TEST
• Paired-samples t-test:
− The paired samples t-test is used to compare the means of two dependent samples.
− T-test statistics:
$$ t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \quad \text{where } s_d = \sqrt{\frac{\sum d_i^2 - n \bar{d}^2}{n - 1}} $$
▪ Null hypothesis H0 : 𝑏𝐴 = 𝑏𝐵
▪ Alternative hypothesis HA : 𝑏𝐴 ≠ 𝑏𝐵
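− T-test statistics (two independent samples, pooled variance):
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}, \quad \text{where } s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} $$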
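The two statistics for comparing two means can be sketched in Python as follows: one for dependent (paired) samples, based on the differences d, and one for independent samples, based on the pooled variance. The function names `paired_t` and `pooled_t` and all data values are hypothetical; this is an illustration of the formulas, not a replacement for the SPSS procedures used in the lab course.

```python
import math
import statistics


def paired_t(first, second):
    """t statistic for H0: mean difference == 0 (two dependent samples)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    d_bar = statistics.mean(d)
    # Shortcut form: s_d^2 = (sum of d_i^2 - n * d_bar^2) / (n - 1)
    s_d = math.sqrt((sum(x * x for x in d) - n * d_bar ** 2) / (n - 1))
    return (d_bar - 0) / (s_d / math.sqrt(n))


def pooled_t(x1, x2):
    """t statistic for two independent samples using the pooled variance."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * statistics.variance(x1)
           + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    return (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(
        sp2 * (1 / n1 + 1 / n2))


# Hypothetical scores of the same four participants in two conditions.
t_paired = paired_t([10, 12, 14, 16], [11, 14, 15, 20])

# Hypothetical scores of two unrelated groups.
t_indep = pooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

print(f"paired t = {t_paired:.3f} (df = 3)")
print(f"independent t = {t_indep:.3f} (df = 8)")
```

The degrees of freedom are n - 1 for the paired test and n1 + n2 - 2 for the pooled test.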
*RECALL: BASIC STATISTICS
• How do we describe the characteristics of a set of data?
− Statistics (e.g. mean, standard deviation, variance, etc. )
− Graphical display (e.g. histogram, boxplots, etc.)
• (Sample) Standard deviation: a measure of how spread out data are.
$$ s = \sqrt{Var} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$
• (Sample) Variance: The average of the squared differences from the Mean.
$$ Var = s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$
• (Sample) Covariance: a measure of the joint variability of two variables
$$ cov(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} $$
STATISTICS
ID  Height (cm)    ID  Height (cm)
1   141            11  170
2   152            12  176
3   151            13  173
4   158            14  184
5   164            15  180
6   160            16  184
7   161            17  182
8   165            18  192
9   175            19  190
10  174            20  201

• Mean: $\bar{x} = 171.65$
• Mode = 184
• Median = 173.5
• Range: 60
• Standard deviation: $s = \sqrt{Var} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = 15.26$
• Variance: $Var = s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = 232.87$
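The statistics on this slide can be recomputed from the 20 heights with Python's standard `statistics` module, as in the following sketch. The variable names are chosen for illustration; `statistics.variance` and `statistics.stdev` use the n - 1 divisor, which matches the sample formulas above.

```python
import statistics

# The 20 heights (cm) from the slide, in ID order.
heights = [141, 152, 151, 158, 164, 160, 161, 165, 175, 174,
           170, 176, 173, 184, 180, 184, 182, 192, 190, 201]

mean = statistics.mean(heights)
median = statistics.median(heights)
mode = statistics.mode(heights)
value_range = max(heights) - min(heights)
variance = statistics.variance(heights)   # divisor n - 1
std_dev = statistics.stdev(heights)       # square root of the variance

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"range = {value_range}, variance = {variance:.2f}, sd = {std_dev:.2f}")
```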
GRAPHICAL DISPLAY
ID  Height (cm)    ID  Height (cm)
1   141            11  170
2   152            12  176
3   151            13  173
4   158            14  184
5   164            15  180
6   160            16  184
7   161            17  182
8   165            18  192
9   175            19  190
10  174            20  201

Height (cm)  Freq.
140-149      1
150-159      3
160-169      4
170-179      5
180-189      4
190-199      2
200-209      1
(Total N=20)
GRAPHICAL DISPLAY
[Figure: histogram of the frequency table; x-axis: Height (cm) in classes from 140-149 to 200-209, y-axis: Freq. from 0 to 6]
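The class frequencies behind the histogram can be obtained by counting how many heights fall into each 10-cm class. The Python sketch below does this and prints a simple text histogram; the function name `frequency_table` and its parameters are assumptions made for this example.

```python
# The 20 heights (cm) from the slide, in ID order.
heights = [141, 152, 151, 158, 164, 160, 161, 165, 175, 174,
           170, 176, 173, 184, 180, 184, 182, 192, 190, 201]


def frequency_table(values, start, width, n_classes):
    """Count the values per class [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        idx = (v - start) // width
        if 0 <= idx < n_classes:
            counts[idx] += 1
    return counts


counts = frequency_table(heights, start=140, width=10, n_classes=7)

# Text histogram: one '#' per observation in the class.
for i, c in enumerate(counts):
    low = 140 + 10 * i
    print(f"{low}-{low + 9}: {'#' * c} ({c})")
print(f"(Total N={sum(counts)})")
```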
*RECALL: STATISTICAL DISTRIBUTION
• Types of distributions
− Discrete (e.g. binomial distribution, Poisson distribution etc.)
− Continuous (e.g. normal distribution, exponential distribution etc.)
*RECALL: STATISTICAL DISTRIBUTION
• Shape
− Symmetric vs. skewed
• Central tendency
− Where most of the data are located
− Mean, median, mode
[Figure: symmetric and skewed density curves with the positions of the mode, median, and mean marked]
*RECALL: STATISTICAL DISTRIBUTION
• Shape
− Symmetric vs. skewed
• Central tendency
− Where most of the data are located
− Mean, median, mode
• Spread (Variability)
− How similar the values are
− Range, standard deviation, variance
NORMAL DISTRIBUTION PROPERTIES
• Normal distribution (/ normal curve/ Gaussian distribution)
• If a variable x follows a normal distribution, then:
− it has continuous data
− its density curve is bell-shaped and perfectly symmetric,
− it is characterized by its mean 𝜇 and standard deviation 𝜎, denoted
by x ~𝑁(𝜇, 𝜎)
− mean = median = mode
− The empirical rule (68-95-99.7 rule) tells us what share of the scores in the
distribution lies within 𝜎, 2𝜎, and 3𝜎 of the mean
*RECALL: NORMAL DISTRIBUTION
The height of the density curve at any point x is given by
the density function:
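$$ Y = f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}, \quad -\infty < x < +\infty $$
$$ \int_{-\infty}^{+\infty} \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx = 1 $$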
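To make the density function concrete, the following Python sketch evaluates it and checks numerically that the total area under the curve is 1. The names `normal_pdf` and `trapezoid`, the trapezoidal rule itself, and the example parameters (𝜇 = 170, 𝜎 = 15) are assumptions chosen for illustration; integrating over ±10𝜎 stands in for the infinite limits.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def trapezoid(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


mu, sigma = 170.0, 15.0   # illustrative parameters
area = trapezoid(lambda x: normal_pdf(x, mu, sigma),
                 mu - 10 * sigma, mu + 10 * sigma)
print(f"density at the mean = {normal_pdf(mu, mu, sigma):.5f}")
print(f"area under the curve = {area:.6f}")
```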
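EMPIRICAL RULE (68-95-99.7 RULE):
• Use the mean to describe the center and the S.D. to describe the spread of a normal distribution
• 68.27% of the scores are within one standard deviation of the mean
• 95.45% of the scores are within 2 standard deviations of the mean
• 99.73% (nearly all of the scores) are within 3 standard deviations of the mean
$$ \int_{\mu - \sigma}^{\mu + \sigma} \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx = 0.6827 $$
$$ \int_{\mu - 2\sigma}^{\mu + 2\sigma} \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx = 0.9545 $$
$$ \int_{\mu - 3\sigma}^{\mu + 3\sigma} \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} dx = 0.9973 $$
[Figure: normal curve with the areas within ±1 SD, ±2 SD, and ±3 SD of the mean shaded; x-axis from -3𝜎 to 3𝜎]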
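The shares within one, two, and three standard deviations can be checked numerically. For any normal distribution, the probability of lying within k standard deviations of the mean equals erf(k/√2), so the Python sketch below needs only the standard library; the function name `share_within` is an assumption for this example.

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.2f}%")
```

The result does not depend on 𝜇 or 𝜎, which is why the rule holds for every normal distribution.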
ASSESSING DISTRIBUTION SHAPE:
NORMAL DISTRIBUTION
• Graphical approach: Histograms
• Statistical approach:
− Checking value of skewness and kurtosis
(using the rule-of-thumb ± 1.00 criterion: the further the value is from zero, the more likely it is that the data are
not normally distributed)
ii. They compare the scores in the sample to a normally distributed set of scores with the same mean and
standard deviation
iii. Limitation: not suitable for large samples, because with a large sample even small deviations from normality easily produce significant results
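The skewness and kurtosis check can be sketched in Python as follows. This example uses the simple moment-based definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2² - 3, where m_k is the k-th central moment with divisor n); this choice is an assumption, and statistical packages often report bias-adjusted versions whose values differ slightly, especially in small samples. The function names and the ± 1.00 check are illustrative.

```python
def central_moment(values, k):
    """k-th central moment with divisor n."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3; about 0 for normally distributed data."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def within_rule_of_thumb(values, limit=1.0):
    """True if both statistics lie inside the +/- limit band."""
    return abs(skewness(values)) <= limit and abs(excess_kurtosis(values)) <= limit


# The 20 heights (cm) used earlier in these slides.
heights = [141, 152, 151, 158, 164, 160, 161, 165, 175, 174,
           170, 176, 173, 184, 180, 184, 182, 192, 190, 201]
print(f"skewness = {skewness(heights):.3f}")
print(f"excess kurtosis = {excess_kurtosis(heights):.3f}")
print("within +/- 1.00:", within_rule_of_thumb(heights))
```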
HOW DO WE EXPLORE THE
RELATIONSHIP BETWEEN TWO
VARIABLES?
HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
ID  Studying hours (X)  Grade (Y) (by points ranging from 0-100)
1   4.5                 99
2   2                   66
3   1.5                 55
4   3.5                 84
5   1                   26
6   2.5                 75
7   4                   92
8   3                   70
9   2.5                 52
10  1.5                 40
HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
1. Graphical method
2. Correlation
3. (Linear) Regression analysis
HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
1. Graphical method: using scatter diagram
− A scatter diagram plots pairs of bivariate observations (x, y) on the X-Y plane and provides an
initial exploration of the relationship between two variables
− The pattern of data is indicative of the type of relationship between the two variables:
▪ positive relationship
▪ negative relationship
▪ no relationship
HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
1. Graphical method: using scatter diagram
─ Plotting n pairs of observations (x1, y1), (x2, y2), …, (xn, yn).
[Figure: scatter diagram of the observations; horizontal axis: X]
HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
2. Correlation:
− Also called bivariate correlation, Pearson's correlation, or
Pearson product-moment correlation
− How to calculate the simple correlation coefficient (𝑟): $r = \frac{Cov(x, y)}{\sigma_x \sigma_y}$
− Correlation (𝑟) is a measure of association between two continuous variables: −1 ≤ 𝑟 ≤ 1
− 𝑟 describes the direction and strength of the linear relationship between two variables; it does not
allow causal relationships to be inferred
o A positive sign of 𝑟 means the relation is direct
o A negative sign of 𝑟 means the relation is inverse
o 𝑟 = 0 represents no linear relationship between the two variables
HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
2. Correlation:
$$ r = \frac{Cov(x, y)}{\sigma_x \sigma_y} = 0.93 $$
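The value of r can be recomputed from the studying-hours data shown earlier in this section. The Python sketch below builds the sample covariance by hand (divisor n - 1) and divides it by the two sample standard deviations; the variable and function names are assumptions made for this example.

```python
import statistics

# Studying hours (X) and grades (Y) of the ten students, in ID order.
hours = [4.5, 2, 1.5, 3.5, 1, 2.5, 4, 3, 2.5, 1.5]
grades = [99, 66, 55, 84, 26, 75, 92, 70, 52, 40]


def sample_covariance(x, y):
    """Sum of the cross-products of deviations, divided by n - 1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))


cov_xy = sample_covariance(hours, grades)
r = pearson_r(hours, grades)
print(f"cov(x, y) = {cov_xy:.2f}")
print(f"r = {r:.2f}")
```

A value of r close to +1 indicates a strong positive linear relationship between studying hours and grade; it does not by itself show that more studying causes better grades.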
HOW DO WE EXPLORE THE RELATIONSHIP
BETWEEN TWO VARIABLES?
2. Correlation:
−1 ≤ 𝑟 ≤ 1
[Figure: six scatter plots of Y against X, labelled r = -1, r = -0.6, r = 0, r = +1, r = +0.3, and r = 0]
Feedback or questions: chinghua.yeh@ilr.uni-bonn.de
Next lecture: 23 Oct. 2019