Theories
Types of Variables
Continuous variables:
Always numeric. Can be any number, positive or negative. Examples: age in years, weight, blood pressure readings, temperature, concentrations of pollutants and other measurements.
Categorical variables:
Information that can be sorted into categories. Types of categorical variables: ordinal, nominal and dichotomous (binary).
Categorical Variables:
Ordinal Variables
Ordinal variable: a categorical variable with some intrinsic order or numeric value. Examples of ordinal variables:
Education (no high school degree, HS degree, some college, college degree)
Agreement (strongly disagree, disagree, neutral, agree, strongly agree)
Rating (excellent, good, fair, poor)
Frequency (always, often, sometimes, never)
Any other scale ("On a scale of 1 to 5...")
Categorical Variables:
Nominal Variables
Nominal variable: a categorical variable without an intrinsic order. Examples of nominal variables:
Where a person lives in the U.S. (Northeast, South, Midwest, etc.)
Sex (male, female)
Nationality (American, Mexican, French)
Race/ethnicity (African American, Hispanic, White, Asian American)
Favorite pet (dog, cat, fish, snake)
Categorical Variables:
Dichotomous Variables
Dichotomous (or binary) variable: a categorical variable with only 2 levels of categories.
Often represents the answer to a yes or no question. For example:
Did you attend the church picnic on May 24?
Did you eat potato salad at the picnic?
Anything with only 2 categories
Process
How do you go from point A to point B? For example, the process an organization goes through in growing from a small firm to a large corporation.
Relationship between variables
Variable
Rational Approach
Gender  Height  Smoker  Exercise  Age
1       69.5    0       25        47
1       70.1    0       24        67
1       68.2    0       26        36
1       70.9    0       26        68
1       71.9    1       20        58
1       69.2    1       15        19
1       71.9    1       0         40
Preliminary Analyses
The table below shows some descriptive statistics for each variable. What basic statements about our data can we make from this?
Variable            Mean     Stdev   Min      Max
Lung Capacity (cc)  5325.60  410.48  4233.71  6261.00
Gender              0.50     0.50    0.00     1.00
Smoker              0.39     0.49    0.00     1.00
Exercise            21.35    8.91    0.00     40.29
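As a rough illustration of how such descriptive statistics are produced, the sketch below summarises the Exercise and Age columns of the seven sample rows listed earlier. It assumes the sample (n - 1) standard deviation; the slides do not say which formula was used, and the full 100-observation data set is not available, so these numbers will not match the table.

```python
import statistics

# Exercise and Age values from the seven sample rows shown earlier.
exercise = [25, 24, 26, 26, 20, 15, 0]
age = [47, 67, 36, 68, 58, 19, 40]


def describe(values):
    """Return the mean, sample standard deviation, minimum and maximum."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # assumption: sample (n - 1) formula
        "min": min(values),
        "max": max(values),
    }


summary = {"Exercise": describe(exercise), "Age": describe(age)}

for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Basic statements then follow directly from the summary, for example that at least one person in the sample does no exercise at all (minimum of 0).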
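Lung capacity (cc) by gender and smoking status (cells shown as ... are missing):

             Non-smokers             Smokers                  Total
             Mean  Stdev   N         Mean     Stdev   N       Mean     Stdev   N
...          ...   256.41  30.00     4837.45  273.74  20.00   5191.58  391.51  50.00
...          ...   284.71  31.00     5129.05  297.51  19.00   5459.61  387.93  50.00
Total        ...   293.75  61.00     4979.51  318.12  39.00   5325.60  410.48  100.00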
Does there appear to be a relationship between smoking, gender and lung capacity?
Distributions
[Figure: frequency histograms with cumulative percentage curves. Panels: Lung Capacity (cc.), x axis "Capacity in cc, up to number shown"; Height Distribution, x axis "Height in Inches"; Distribution of Age, x axis "Age in years". y axis: Frequency.]
[Figure: further frequency plots, including Exercise and Age, and lung capacity histograms on an axis running from 4400 to 6400 cc.]
Non-Smokers have a larger lung capacity than smokers on average. What about the variance?
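The variance question can be approached from the grouped summary figures alone. The sketch below is a back-of-the-envelope calculation, not part of the original analysis. It assumes that the group with n = 39 (mean 4979.51, stdev 318.12) is the smokers, which matches the Smoker mean of 0.39, and that the group with n = 61 (stdev 293.75) is the non-smokers.

```python
# Group summaries taken from the breakdown of lung capacity by smoking status.
# Assumption: the n = 39 group is the smokers, the n = 61 group the non-smokers.
overall_mean, overall_n = 5325.60, 100
smoker_mean, smoker_sd, smoker_n = 4979.51, 318.12, 39
nonsmoker_sd, nonsmoker_n = 293.75, 61

# The overall mean is a weighted average of the two group means, so the
# non-smoker mean can be recovered from the other three quantities.
nonsmoker_mean = (overall_mean * overall_n - smoker_mean * smoker_n) / nonsmoker_n

# Variance is the square of the standard deviation.
smoker_var = smoker_sd ** 2
nonsmoker_var = nonsmoker_sd ** 2
variance_ratio = smoker_var / nonsmoker_var

print(f"non-smoker mean: {nonsmoker_mean:.2f} cc")
print(f"smoker variance / non-smoker variance: {variance_ratio:.2f}")
```

Under these assumptions the two variances are of similar size, with the smokers' somewhat larger; a formal comparison would need a test such as an F test on the raw data.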
Simple Regression
How well can exercise time alone predict the lung capacity?
Lung Capacity and Exercise Time
[Figure: scatter plot of lung capacity (vertical axis, 0 to 7000) against minutes of exercise per day (horizontal axis, 0 to 50), with fitted line.]
Fitted line: y = 28.71x + 4712.5, R² = 0.3881
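To make the fitted line concrete, the sketch below plugs exercise times into y = 28.71x + 4712.5 and converts R² into a correlation coefficient. The function name is illustrative only; the coefficients are the ones reported on the chart.

```python
import math

# Coefficients of the fitted line reported on the chart.
SLOPE = 28.71        # cc of lung capacity per extra minute of daily exercise
INTERCEPT = 4712.5   # predicted capacity (cc) at zero minutes of exercise
R_SQUARED = 0.3881   # share of variance in capacity explained by exercise


def predict_lung_capacity(minutes_per_day):
    """Predicted lung capacity in cc from the simple regression line."""
    return SLOPE * minutes_per_day + INTERCEPT


# In a simple regression, |r| is the square root of R²; the sign follows
# the slope, which is positive here.
correlation = math.sqrt(R_SQUARED)

for minutes in (0, 20, 40):
    print(minutes, round(predict_lung_capacity(minutes), 1))
print("r =", round(correlation, 3))
```

An R² of 0.3881 means exercise time alone accounts for a bit under 40% of the variation in lung capacity, so most of the variation is left unexplained by this model.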
Multiple Regression
How do all the Xs together help predict y?
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.8798341
R Square            0.7741081
Adjusted R Square   0.7620926
Standard Error      200.21
Observations        100

            Coefficients  Standard Error  t Stat        P-value
Intercept   1662.3965     475.1456634     3.498709192   0.000716253
Gender      202.3282      41.86861042     4.832456809   5.23607E-06
Height      50.3468       7.08207335      7.109058989   2.24959E-10
Smoker      -278.9711     52.71395448     -5.292169492  7.88193E-07
Exercise    11.2949       2.991170972     3.776112614   0.000279023
Age         -0.1174       1.462303258     -0.080303367  0.936166702
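One way to read the summary output is to reproduce a few of its numbers by hand. The sketch below uses the reported coefficients to predict lung capacity for one person, recomputes each t statistic as coefficient divided by standard error, and recomputes adjusted R² from R², the 100 observations and the 5 predictors. The function and variable names are illustrative, not taken from the original analysis.

```python
# Coefficients and standard errors copied from the summary output (rounded).
COEFFICIENTS = {
    "Intercept": 1662.3965,
    "Gender": 202.3282,
    "Height": 50.3468,
    "Smoker": -278.9711,
    "Exercise": 11.2949,
    "Age": -0.1174,
}
STANDARD_ERRORS = {
    "Intercept": 475.1456634,
    "Gender": 41.86861042,
    "Height": 7.08207335,
    "Smoker": 52.71395448,
    "Exercise": 2.991170972,
    "Age": 1.462303258,
}
R_SQUARED = 0.7741081
N_OBSERVATIONS = 100
N_PREDICTORS = 5


def predict(gender, height, smoker, exercise, age):
    """Predicted lung capacity (cc) from the multiple regression."""
    values = {"Gender": gender, "Height": height, "Smoker": smoker,
              "Exercise": exercise, "Age": age}
    return COEFFICIENTS["Intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in values.items()
    )


# t statistic = coefficient / standard error.
t_stats = {name: COEFFICIENTS[name] / STANDARD_ERRORS[name]
           for name in COEFFICIENTS}

# Adjusted R² penalises R² for the number of predictors.
adjusted_r_squared = 1 - (1 - R_SQUARED) * (N_OBSERVATIONS - 1) / (
    N_OBSERVATIONS - N_PREDICTORS - 1
)

# First sample row: gender 1, 69.5 inches, non-smoker, 25 minutes, age 47.
print(round(predict(1, 69.5, 0, 25, 47), 1))
print({name: round(t, 2) for name, t in t_stats.items()})
print(round(adjusted_r_squared, 4))
```

Age has a t statistic close to zero and a P-value of about 0.94, so it adds essentially nothing once the other variables are in the model, while being a smoker is associated with roughly 279 cc less capacity, holding the other variables fixed.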
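[Table: contingency table of gender (rows, 100 hypothetical people each) by PTI membership. Recoverable cells: 30 and 70 in one column; row totals 100 and 100; column total 100; grand total 200.]
[Table: the same layout with a weaker relationship. Recoverable cells: 49 and 51 in one column; row totals 100 and 100; column total 100; grand total 200.]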
Females are more likely than males to be PTI members, but now the relationship is weaker.
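[Table: the same layout with a stronger relationship. Recoverable cells: 5 and 95 in one column; row totals 100 and 100; column total 100; grand total 200.]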
Create a contingency table, listing the values of the cause as rows and the values of the effect as columns. Think of 100 hypothetical people for each row of the table; that is, set the marginal frequency of each row to equal 100. Of these 100 people, specify how many you think will fall into each column category; this represents the percentage of people in each category.
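The procedure above can be sketched in a few lines. The counts below are hypothetical and chosen only to illustrate the method; the helper names are likewise illustrative.

```python
def check_rows_sum_to_100(table):
    """Validate that every row (cause value) describes exactly 100 people."""
    for cause, counts in table.items():
        total = sum(counts.values())
        if total != 100:
            raise ValueError(f"row {cause!r} sums to {total}, expected 100")
    return table


def percentage_point_gap(table, row_a, row_b, column):
    """Difference between two rows in the share falling into one column.

    Because each row holds 100 hypothetical people, counts are percentages.
    """
    return table[row_a][column] - table[row_b][column]


# Hypothetical strong relationship: gender (cause) by PTI membership (effect).
strong = check_rows_sum_to_100({
    "Females": {"Member": 70, "Not a member": 30},
    "Males": {"Member": 30, "Not a member": 70},
})

# Hypothetical weak relationship with the same direction.
weak = check_rows_sum_to_100({
    "Females": {"Member": 51, "Not a member": 49},
    "Males": {"Member": 49, "Not a member": 51},
})

print(percentage_point_gap(strong, "Females", "Males", "Member"))  # gap in points
print(percentage_point_gap(weak, "Females", "Males", "Member"))
```

The larger the gap between rows, the stronger the proposed relationship; a gap of zero means the cause makes no difference to the effect.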
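[Table: contingency table of ethnic group by party membership, 100 hypothetical people per group. Surviving column labels: Sindhi, Balochi, Total. Surviving cells by row: 25, 50, total 100; 50, 25, total 100; 25, 25, total 100; column totals 100, 100, grand total 300.]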
Proposition 1: Punjabis, Sindhis and Balochis are equally likely to be members of PTI.
Proposition 2: Sindhis are more likely to be PPP members than either Punjabis or Balochis; Punjabis and Balochis are equally likely to be PPP members.
Proposition 3: Punjabis and Balochis are both more likely than Sindhis to be PML(N) members;
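Propositions like these can be checked mechanically against a hypothetical table. In the sketch below, each ethnic group is a row of 100 hypothetical people spread across the three parties; the specific counts are an assumption chosen to satisfy all three propositions, not values taken from the slides.

```python
# Hypothetical counts per 100 people in each ethnic group (rows sum to 100).
table = {
    "Punjabi": {"PTI": 25, "PPP": 25, "PML(N)": 50},
    "Sindhi": {"PTI": 25, "PPP": 50, "PML(N)": 25},
    "Balochi": {"PTI": 25, "PPP": 25, "PML(N)": 50},
}

# Every row must describe exactly 100 hypothetical people.
rows_valid = all(sum(counts.values()) == 100 for counts in table.values())

# Proposition 1: all three groups are equally likely to be PTI members.
proposition_1 = (
    table["Punjabi"]["PTI"] == table["Sindhi"]["PTI"] == table["Balochi"]["PTI"]
)

# Proposition 2: Sindhis are more likely to be PPP members than the other
# two groups, which are equally likely to be PPP members.
proposition_2 = (
    table["Sindhi"]["PPP"] > table["Punjabi"]["PPP"]
    and table["Sindhi"]["PPP"] > table["Balochi"]["PPP"]
    and table["Punjabi"]["PPP"] == table["Balochi"]["PPP"]
)

# Proposition 3: Punjabis and Balochis are both more likely than Sindhis
# to be PML(N) members.
proposition_3 = (
    table["Punjabi"]["PML(N)"] > table["Sindhi"]["PML(N)"]
    and table["Balochi"]["PML(N)"] > table["Sindhi"]["PML(N)"]
)

print(rows_valid, proposition_1, proposition_2, proposition_3)
```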
Often of greatest interest in social science is investigation into relationships between variables:
is social class related to political perspective?
is income related to education?
is worker alienation related to job monotony?
We are also interested in the direction of causation, but this is more difficult to prove empirically: our empirical models are usually structured assuming a particular theory of causation.
The most straightforward way to investigate evidence for a relationship is to look at scatter plots. It is traditional to:
put the dependent variable (i.e. the effect) on the vertical axis, or y axis
put the explanatory variable (i.e. the cause) on the horizontal axis, or x axis
[Figures: scatter plots of INCOME (vertical axis) against IQ and against IQ2 (horizontal axis), an SPSS coefficients table for Model 1 with terms (Constant) and IQ, and a further plot against X (horizontal axis, 0 to 25).]
Inverted-U Theory
This theory describes the relationship between market structure and technological advance.
Technological Discontinuities
Discontinuity
Private Public
Faculty at private universities have higher salaries than faculty at public universities.
Counts per 100 hypothetical people in each row (column labels missing):

                 ...    ...    Total
Liberalness = 2  40     60     100
Liberalness = 1  30     70     100

The same table as proportions:

                 ...    ...    Total
Liberalness = 2  0.40   0.60   100
Liberalness = 1  0.30   0.70   100
Moderated relationships involve 3 variables. The focus is on cases where the strength of a relationship between two variables changes depending on the value of a third variable. Examples:
Inflation has a bigger influence on the economies of underdeveloped countries than on those of developed countries.
Higher levels of education are more likely to translate into job opportunities for Punjabis than for Balochis.
[Table: mean university satisfaction by gender (rows) and programme (columns). Recoverable cells: Males 5.0, 4.0.]
Gender differences in university satisfaction are larger during the MBA than during the BBA.
Create a factorial table with the moderator variable (MV) as columns and the focal IV as rows. Fill in plausible hypothetical mean values on the outcome variable for each cell of the table. Calculate the effect of the focal IV at each level of the MV and then calculate the interaction contrast to determine if there is a moderated relationship.
Interaction Contrast

          BBA   MBA
Females   a     c
Males     b     d

Interaction contrast = (a-b)-(c-d). If this value is non-zero, then a moderated relationship is present.
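A small worked example may help here. The sketch below assumes that a and b are the female and male means in the BBA column, and c and d the female and male means in the MBA column; the mean satisfaction scores are hypothetical.

```python
def interaction_contrast(a, b, c, d):
    """Return (a - b) - (c - d).

    a, b: outcome means for the two levels of the focal IV at the first
          level of the moderator (assumed here: females and males in BBA).
    c, d: the same two means at the second level of the moderator
          (assumed here: females and males in MBA).
    """
    return (a - b) - (c - d)


# Hypothetical mean satisfaction scores.
females_bba, males_bba = 4.5, 4.0   # gender gap of 0.5 during the BBA
females_mba, males_mba = 5.0, 3.0   # gender gap of 2.0 during the MBA

contrast = interaction_contrast(females_bba, males_bba, females_mba, males_mba)
print(contrast)  # non-zero, so the gender gap depends on the programme

# If the gender gap is the same in both programmes, the contrast is zero.
no_moderation = interaction_contrast(4.0, 3.0, 5.0, 4.0)
print(no_moderation)
```

The sign of the contrast only reflects the order in which the two gaps are subtracted; what matters for a moderated relationship is that it differs from zero.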
[Table: mean university satisfaction by gender (rows: Females, Males) and programme (columns). Recoverable cells: Males 5.0, 4.0, 4.0.]
Gender differences in university satisfaction are larger during the BBA than during FSc.
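Quantitative variables
[Table: mean outcome by gender (rows: Male, Female) and amount of time spent (columns: High, Medium, Low). Recoverable cell values, in extraction order: 3.0, 3.0, 5.0, 6.0, 3.0, 4.0, 3.0, 5.0.]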
The effect of spending a large amount of time versus a moderate amount is stronger for females than for males.