Independent: explanatory
Age, sex, diet, exercise
Latent constructs
SES, satisfaction, health status
Measurable indicators
education, employment, revisit, miles walked
Variables in data example
Name                                # of characters   Position
STFIPS FIPS CODE (STATE)            1                 2
STCENSUS                            1                 3
LEVEL                               1                 4
STABBREV                            1                 5
AREANAME NAME OF US/STATE/COUNTY    7                 6
POPULATION 1992 ABS ITEM002         7                 13
xyz                                                   20
Data
Data screening and transformation
Normality
Independence
Correlation (or lack of independence)
Variable types and measures of central tendency
Nominal: mode
Ordinal: median
Interval: mean
Ratio: geometric mean and harmonic mean
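As a minimal sketch of the pairing above, the standard-library `statistics` module can compute each measure. All data values below are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical values, one list per variable type.
blood_type = ["A", "O", "O", "B", "O", "AB"]   # nominal
health_rating = [1, 2, 2, 3, 5]                # ordinal (1 = best, 5 = worst)
temperature_c = [36.0, 37.0, 38.0]             # interval
growth_factor = [1.0, 4.0, 16.0]               # ratio

mode_value = statistics.mode(blood_type)                  # most frequent category
median_value = statistics.median(health_rating)           # middle rank
mean_value = statistics.mean(temperature_c)               # arithmetic mean
geometric = statistics.geometric_mean(growth_factor)      # nth root of the product
harmonic = statistics.harmonic_mean(growth_factor)        # n / sum of reciprocals

print(mode_value, median_value, mean_value, geometric, harmonic)
```

`statistics.geometric_mean` requires Python 3.8 or later.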
Simple linear regression
Y = A + BX
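A minimal sketch of estimating A and B by ordinary least squares, assuming that is the fitting method intended here. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates of A (intercept) and B (slope) in Y = A + BX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data lying exactly on Y = 1 + 2X.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(x, y)
print(a, b)
```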
Correlation
Mean = $\mu$
Variance (SD)$^2$ = $\sigma^2$
Population covariance = $\sigma_{xy} = E[(X - \mu_x)(Y - \mu_y)]$
Product moment coefficient = $\rho = \sigma_{xy} / (\sigma_x \sigma_y)$
It lies between -1 and 1
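The definitions above can be sketched directly in code. This version treats the data as the whole population (dividing by n), and the function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Product moment coefficient: covariance divided by the two SDs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return cov_xy / (sd_x * sd_y)

# Hypothetical data: a perfect positive and a perfect negative relationship.
x = [1.0, 2.0, 3.0, 4.0]
r_up = pearson_r(x, [2.0, 4.0, 6.0, 8.0])
r_down = pearson_r(x, [8.0, 6.0, 4.0, 2.0])
print(r_up, r_down)
```

The two results sit at the ends of the allowed range, which is one way to see why the coefficient lies between -1 and 1.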
Example: physical and mental health indicators
Correlations
PHYSICAL MENTAL
PHYSICAL Pearson Correlation 1.000 .230**
Sig. (2-tailed) . .000
N 109888 109888
MENTAL Pearson Correlation .230** 1.000
Sig. (2-tailed) .000 .
N 109888 109888
**. Correlation is significant at the 0.01 level (2-tailed).
Negative correlation
Correlations
WEIGHT AGEDIAB
WEIGHT Pearson Correlation 1.000 -.029**
Sig. (2-tailed) . .000
N 109888 109888
AGEDIAB Pearson Correlation -.029** 1.000
Sig. (2-tailed) .000 .
N 109888 109888
**. Correlation is significant at the 0.01 level (2-tailed).
Population covariance
=0.88
Multiple regression and correlation
Simple linear: $Y = \alpha + \beta X$
Multiple regression: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p$
EF ejection fraction
Exercise
Body fat
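A minimal sketch of fitting a multiple regression by solving the normal equations, using only the standard library. The helper names are hypothetical, and the exercise, body fat, and ejection fraction numbers are invented so that EF = 50 + 2(exercise) - 0.5(body fat) holds exactly.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_multiple_regression(rows, y):
    """Return [intercept, coefficient 1, ..., coefficient p]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical (exercise hours, body fat %) pairs and ejection fractions.
predictors = [(1, 20), (2, 25), (3, 18), (4, 30), (5, 22)]
ef = [42.0, 41.5, 47.0, 43.0, 49.0]
coefficients = fit_multiple_regression(predictors, ef)
print(coefficients)
```

Solving the normal equations directly is fine for a small illustration; statistical packages typically use more numerically stable decompositions.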
Issues with regression
Missing values
random
pattern
mean substitution and ML
Dummy variables
equal intervals!
Multicollinearity
independent variables are highly correlated
Garbage can method
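Two of the issues listed above lend themselves to a short sketch: mean substitution for missing values, and dummy variables for a categorical predictor. The sketch reads the "equal intervals!" note as a warning that numeric category codes imply equally spaced levels, which is an interpretation, not something the slides state. Function names and data are hypothetical.

```python
def mean_substitute(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def dummy_code(categories, reference):
    """One 0/1 column per level except the reference level."""
    levels = sorted(set(categories) - {reference})
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows

# Hypothetical data.
weights = mean_substitute([1.0, None, 3.0])
levels, rows = dummy_code(["low", "mid", "high", "mid"], reference="low")
print(weights)
print(levels, rows)
```

Mean substitution keeps every case but shrinks the variance of the filled variable, which is one reason likelihood-based approaches are also listed.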
Canonical correlation
An extension of multiple regression
Multiple Y variables and multiple X variables
Finding several linear combinations of the X variables and the same number of linear combinations of the Y variables.
These combinations are called canonical variables, and the correlations between the corresponding pairs of canonical variables are called CANONICAL CORRELATIONS.
Correlation matrix
Correlations
WTFORHTX GENHLTH PHYSHLTH MENTHLTH POORHLTH HLTHPLAN BPTAKE TOLDHI
WTFORHTX Pearson Correlation 1.000 .072** -.008** .016** -.005 .023** .011** .000
Sig. (2-tailed) . .000 .006 .000 .208 .000 .000 .903
N 109888 109888 109888 109888 54351 109888 108445 77436
GENHLTH Pearson Correlation .072** 1.000 -.228** -.061** -.147** .035** -.084** -.091**
Sig. (2-tailed) .000 . .000 .000 .000 .000 .000 .000
N 109888 109888 109888 109888 54351 109888 108445 77436
PHYSHLTH Pearson Correlation -.008** -.228** 1.000 .223** .295** -.011** .083** .030**
Sig. (2-tailed) .006 .000 . .000 .000 .000 .000 .000
N 109888 109888 109888 109888 54351 109888 108445 77436
MENTHLTH Pearson Correlation .016** -.061** .223** 1.000 -.120** -.038** .019** .014**
Sig. (2-tailed) .000 .000 .000 . .000 .000 .000 .000
N 109888 109888 109888 109888 54351 109888 108445 77436
POORHLTH Pearson Correlation -.005 -.147** .295** -.120** 1.000 -.001 .055** .014**
Sig. (2-tailed) .208 .000 .000 .000 . .816 .000 .005
N 54351 54351 54351 54351 54351 54351 53754 38018
HLTHPLAN Pearson Correlation .023** .035** -.011** -.038** -.001 1.000 .152** .022**
Sig. (2-tailed) .000 .000 .000 .000 .816 . .000 .000
N 109888 109888 109888 109888 54351 109888 108445 77436
BPTAKE Pearson Correlation .011** -.084** .083** .019** .055** .152** 1.000 .039**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000 . .000
N 108445 108445 108445 108445 53754 108445 108445 77436
TOLDHI Pearson Correlation .000 -.091** .030** .014** .014** .022** .039** 1.000
Sig. (2-tailed) .903 .000 .000 .000 .005 .000 .000 .
N 77436 77436 77436 77436 38018 77436 77436 77436
**. Correlation is significant at the 0.01 level (2-tailed).
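The N values in the matrix differ from pair to pair, which is consistent with each correlation being computed on the cases that have both variables present (pairwise deletion). That reading is an assumption. A minimal sketch, with a hypothetical function name and invented data:

```python
import math

def pairwise_pearson(x, y):
    """Return (r, n) using only cases where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical responses; None marks a missing answer.
physical = [0, 2, None, 5, 10]
poor = [1, None, 3, 4, 9]
r, n = pairwise_pearson(physical, poor)
print(r, n)
```

Only three of the five cases contribute here, so the reported N is 3 even though each variable has four observed values.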
Discriminant analysis
A method used to classify an individual into one of two or more groups based on a set of measurements
Examples:
at risk for
heart disease
cancer
diabetes, etc.
It can be used for prediction and
description
Discriminant analysis
[Figure: groups A and B, with points a and b]
a and b are wrongly classified
A discriminant function describes the probability of being classified in the right group.
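One common discriminant function is Fisher's linear discriminant, sketched here for two groups and two measurements. The slides do not name a specific method, so this choice is an assumption, as are equal prior probabilities and a shared covariance matrix. All names and data are hypothetical.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def pooled_covariance(group_a, group_b):
    """Within-group covariance pooled over both groups."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group in (group_a, group_b):
        m = mean_vector(group)
        for p in group:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]

def fit_discriminant(group_a, group_b):
    """Return weights w and a cutoff; scores above the cutoff go to A."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    s = pooled_covariance(group_a, group_b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    midpoint = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    cutoff = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, cutoff

def classify(point, w, cutoff):
    score = w[0] * point[0] + w[1] * point[1]
    return "A" if score > cutoff else "B"

# Hypothetical measurements for two well-separated groups.
group_a = [(1, 2), (2, 1), (2, 3), (3, 2)]
group_b = [(6, 7), (7, 6), (7, 8), (8, 7)]
w, cutoff = fit_discriminant(group_a, group_b)
print(classify((2, 2), w, cutoff), classify((7, 7), w, cutoff))
```

With overlapping groups, some points fall on the wrong side of the cutoff, which is the misclassification the figure illustrates.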
Logistic regression
An alternative to discriminant analysis for classifying an individual into one of two populations based on a set of criteria.
It is appropriate for any combination of discrete or continuous variables.
It uses maximum likelihood estimation to fit the model, then classifies individuals based on the independent variable list.
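A minimal sketch of maximum likelihood fitting for a logistic model with one predictor, using plain gradient ascent on the log-likelihood. Statistical packages usually use faster iterative methods, so the optimizer here is a stand-in. The learning rate, iteration count, names, and data are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, learning_rate=0.1, iterations=5000):
    """Estimate intercept and slope by gradient ascent on the log-likelihood."""
    intercept, slope = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        grad_intercept = 0.0
        grad_slope = 0.0
        for xi, yi in zip(x, y):
            error = yi - sigmoid(intercept + slope * xi)  # observed minus predicted
            grad_intercept += error
            grad_slope += error * xi
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept, slope

# Hypothetical data: group membership (0 or 1) overlaps around x = 3 to 4.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]
intercept, slope = fit_logistic(x, y)

def predict_probability(value):
    return sigmoid(intercept + slope * value)

# Classify into population 1 when the predicted probability exceeds 0.5.
print(predict_probability(1), predict_probability(6))
```

The data are deliberately not perfectly separable; with perfect separation the maximum likelihood estimates do not exist as finite values.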
Survival analysis (event history analysis)
Analyze the length of time it takes for a specific event to occur.
Time to death, organ failure, retirement, etc.
Length of time = function of {explanatory variables (covariates)}
Survival data example
[Figure: follow-up timelines for five subjects on a 1980 to 1990 axis; three died, one was lost, one is still surviving]
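Data like these, where some subjects are lost or still surviving when observation ends, are called censored. One standard way to summarize them is the Kaplan-Meier estimator, which the slides do not mention; it is introduced here only as an illustration. The follow-up durations below are hypothetical, arranged to mirror three deaths, one loss, and one survivor.

```python
def kaplan_meier(durations, event_observed):
    """Return [(time, survival probability)] at each observed event time."""
    event_times = sorted({t for t, e in zip(durations, event_observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, event_observed) if d == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical years of follow-up; True = died, False = lost or still surviving.
durations = [2, 5, 8, 6, 10]
event_observed = [True, True, True, False, False]
curve = kaplan_meier(durations, event_observed)
for time, probability in curve:
    print(time, round(probability, 3))
```

Censored subjects never count as deaths, but they do count in the at-risk group until the time they leave observation.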
Log-linear regression
A regression model in which the
dependent variable is the log of survival
time (t) and the independent variables
are the explanatory variables.
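A minimal sketch of this model with a single explanatory variable: take the log of each survival time and fit a straight line by least squares. It ignores censoring, which a real analysis would need to handle. Names and data are hypothetical, and the times are generated so that log(t) = 3 - 0.5x exactly.

```python
import math

def fit_log_linear(x, survival_time):
    """Least-squares fit of log(t) = intercept + slope * x."""
    log_t = [math.log(t) for t in survival_time]
    n = len(x)
    mean_x = sum(x) / n
    mean_log_t = sum(log_t) / n
    sxy = sum((xi - mean_x) * (li - mean_log_t) for xi, li in zip(x, log_t))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_log_t - slope * mean_x
    return intercept, slope

# Hypothetical covariate values and fully observed survival times.
x = [0.0, 1.0, 2.0, 3.0]
survival_time = [math.exp(3.0 - 0.5 * xi) for xi in x]
intercept, slope = fit_log_linear(x, survival_time)
print(intercept, slope)
```

A slope of -0.5 on the log scale means each unit increase in x multiplies the expected survival time by about exp(-0.5), roughly 0.61.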
Component
1 2 3 4
GENHLTH .450 .207 -.150 -.552
PHYSHLTH -.770 .254 -3.31E-03 -.208
MENTHLTH .652 -.232 -6.74E-02 .353
POORHLTH -.612 6.329E-02 -1.03E-02 .110
BPTAKE -.128 .352 -.465 .474
BLOODCHO 6.411E-02 .335 -.563 .158
SEATBELT .166 .697 .242 .222
SFTYLT16 .137 .676 .447 .188
BIKEHLMT .156 .414 .210 -.299
SMOKENOW -.112 -.382 .495 .356
Extraction Method: Principal Component Analysis.
a. 4 components extracted.
Cluster analysis
A method for classifying individuals into previously unknown groups
It proceeds from the most general to the most specific:
Kingdom: Animalia
Phylum: Chordata
Subphylum: Vertebrata
Class: Mammalia
Order: Primates
Family: Hominidae
Genus: Homo
Species: sapiens
Patient clustering
Major: patients
Types: medical
Subtype: neurological
Class: genetic
Order: late onset
Disease: Guillain-Barré syndrome
Hierarchical: divisive or agglomerative
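A minimal sketch of the agglomerative variant: start with every individual in its own cluster and repeatedly merge the two closest clusters. Single linkage (distance between the nearest members) is assumed here; the slides do not specify a linkage rule. The function name and data are hypothetical.

```python
def agglomerative_clusters(values, n_clusters):
    """Merge the two closest clusters until n_clusters remain (single linkage)."""
    clusters = [[v] for v in values]  # every value starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                distance = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical one-dimensional measurements.
measurements = [1.0, 1.2, 5.0, 5.1, 9.0]
groups = agglomerative_clusters(measurements, n_clusters=3)
print(groups)
```

A divisive method runs in the opposite direction, starting from one all-inclusive cluster and splitting it step by step.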
Conclusions
Presentation Schedule
4 each on 4/22 and 4/27
5 on 4/29
Each presentation should be a maximum of 10 minutes, plus 5 minutes for discussion.
E-mail me your software and hardware requirements for your presentation.
Final projects due 5/7/99 by 5:00 pm in my office.