

Joe King

University of Washington


Contents

I Introduction to Statistics

1 Principles of Statistics
   1.1 Variables
      1.1.1 Types of Variables
      1.1.2 Sample vs. Population
   1.2 Terminology
   1.3 Hypothesis Testing
      1.3.1 Assumptions
      1.3.2 Type I & II Error
      1.3.3 What does Rejecting Mean?
   1.4 Writing in APA Style
   1.5 Final Thoughts
2 Description of A Single Variable
   2.1 Where's the Middle?
   2.2 Variation
   2.3 Skew and Kurtosis
   2.4 Testing for Normality
   2.5 Data
   2.6 Final Thoughts

II Correlations and Mean Testing

3 Relationships Between Two Variables
   3.1 Covariance
   3.2 Pearson's Correlation
   3.3 R Squared
   3.4 Point Biserial Correlation
   3.5 Spurious Relationships
   3.6 Final Thoughts
4 Means Testing
   4.1 Assumptions
   4.2 T-Test
      4.2.1 Independent Samples
      4.2.2 Dependent Samples
      4.2.3 Effect Size
   4.3 Analysis of Variance

III Latent Variables

5 Latent Constructs and Reliability
   5.1 Reliability

IV Regression

6 Regression: The Basics
   6.1 Foundation Concepts
   6.2 Final Thoughts
   6.3 Bibliographic Note
7 Linear Regression
   7.1 Basics of Linear Regression
      7.1.1 Sums of Squares
   7.2 Model
      7.2.1 Simple Linear Regression
      7.2.2 Multiple Linear Regression
   7.3 Interpretation of Parameter Estimates
      7.3.1 Continuous
         7.3.1.1 Transformation of Continuous Variables
            7.3.1.1.1 Natural Log of Variables
      7.3.2 Categorical
         7.3.2.1 Nominal Variables
         7.3.2.2 Ordinal Variables
   7.4 Model Comparisons
   7.5 Assumptions
   7.6 Diagnostics
      7.6.1 Residuals
         7.6.1.1 Normality of Residuals
            7.6.1.1.1 Tests
            7.6.1.1.2 Plots
   7.7 Final Thoughts
8 Logistic Regression
   8.1 The Basics
   8.2 Regression Modeling Binomial Outcomes
      8.2.1 Estimation
      8.2.2 Regression for Binary Outcomes
         8.2.2.1 Logit
         8.2.2.2 Probit
         8.2.2.3 Logit or Probit?
      8.2.3 Model Selection
   8.3 Further Reading
   8.4 Conclusions


Part I

Introduction to Statistics


Chapter 1

Principles of Statistics

Statistics is scary to most students, but it does not have to be. The trick is to build up your knowledge base one step at a time so that you get the building blocks necessary to understand more advanced statistics. This mini-book will go from a very simple understanding of variables and statistics to more complex analyses for describing data. It will give several formulas for calculating parameters, yet rarely will you have to calculate these on paper or insert the numbers into an equation in a spreadsheet. This first chapter looks at some of the basic principles of statistics: the basic concepts that will be necessary to understand statistical inference. These may seem simple, and many may already be familiar, but it is best to start any work on statistics with the basic principles as a strong foundation.

1.1 Variables

First we start with the basics. What is a variable? Essentially, a variable is a construct we observe. There are two kinds of variables: manifest (or observed) variables and latent variables. Latent variables are ones we cannot measure directly; we infer them by measuring other, manifest variables (socio-economic status is a classic example). Manifest variables we measure directly and can model, or we can use them to construct more complex latent variables; for example, we may measure parents' education and parents' income and combine those into the construct of socio-economic status.

1.1.1 Types of Variables

There are four primary categories of manifest variables: nominal, ordinal, interval, and ratio. The first two are categorical variables. Nominal variables are strictly categorical and have no discernible hierarchy or order to them; examples include race, religion, or states. Ordinal variables are also categorical but have a natural order. Likert scales (strongly disagree, disagree, neutral, agree, strongly agree) are one of the most common examples of an ordinal variable. Other examples include class status (freshman, sophomore, junior, senior) and level of education obtained (high school, bachelors, masters, etc.).

The continuous variables are interval and ratio. These are not categorical with a set number of values; they can take any value between two values. An example of a continuous variable is exam scores; your score may take any value from 0% to 100%. An interval scale has no absolute zero, so we cannot make ratio judgements about two values. Temperature is a good example: Celsius and Fahrenheit have no meaningful absolute minimum for the temperatures we experience, so we cannot say 30 degrees Fahrenheit is twice as warm as 15 degrees Fahrenheit. A ratio scale is still continuous but has an absolute zero, so we can make such judgements. I can say a student who got an 80% on the exam did twice as well as the student who got a 40% on their exam.

1.1.2 Sample vs. Population

One of the primary interests in statistics is to try to generalize from our sample to a population. A population does not always have to be the population of a state or nation, as we usually think of the word. Let's say, for example, the head of UW Medicine came to me and asked me to do a workplace climate survey of all the nursing staff at UW Medical Center. While there are a lot of nurses there, I could conceivably give my survey to each and every one of them. This would mean I would not have a problem of generalizability, because I would know the attitudes of my entire population.

Unfortunately statistics is rarely this clean, and you will usually not have access to an entire population. Therefore I must collect data that is representative of the population I want to study; this is a sample. The distinction is important to note because different notation is used for samples versus populations. For example, x̄ generally denotes a sample mean, while µ is used for the population mean. Rarely will you know the population mean, which is where this becomes a huge issue. Many statistics books put all of the notation at the beginning of the book, yet I feel this is not a good idea. I will introduce notation as it becomes relevant, and specifically discuss it when it's necessary. Do not be alarmed if you find yourself coming back to chapters to remember notation; it happens to everyone, and committing this to memory is a truly lifelong affair.

1.2 Terminology

There is also the matter of terminology. This will be discussed before the primary methods because the terminology can get confusing. Unfortunately statistics tends to change its terminology and to have multiple words for the same concept, which differ between journals, disciplines, and different coursework.

One area where this is most true is when talking about types of variables. We classified variables above by how they are measured, but how they fit into our research question is different. Basic statistics books still talk about variables as independent or dependent variables. Although these terms have fallen out of favor in a lot of disciplines, especially in the methodology literature, they still carry weight and so will be discussed. We will identify which variables are independent and dependent based on the models we run when we get to those models, but in general the dependent variable is the one we are interested in knowing about. In short, we want to know how our independent variables influence our dependent variable(s).

Now of course there are different names for the dependent and independent variables depending on what we are studying. Almost universally the dependent variable is also called the outcome variable. This seems justified given it's the outcome we are studying. It's the independent variable which has been given many names. In many cases it's called the regressor (in regression models), predictor (again generally in regression), or covariate. I prefer the second term and don't like the third. The first one seems too tied to regression modelling and not as general as predictor. Covariate has different meanings for different tests, so in my opinion it can be confusing. Predictor can also be confusing because some people may conflate it with causation, which would be a very wrong assumption to make. I will usually use the term independent variable or predictor for lack of better terms, and these are the more common ones you will see in the literature.

1.3 Hypothesis Testing

The basis from which we start our research is the null hypothesis. This simply says there is no relationship between the variables we are studying. When we "reject" the null hypothesis, we are saying we accept the alternative hypothesis, which says the null hypothesis is not true and there is a "significant" relationship between the variable(s) we are studying.

1.3.1 Assumptions

There are many types of assumptions that we must make in our analysis in order for our coeﬃcients to be

unbiased.

1.3.2 Type I & II Error

So we have a hypothesis associated with a research question. This mini-book will look at ways to explore hypotheses and how we can either support or not support them. First we must cover a few basics about hypothesis testing. We have to have some basis to determine whether the questions we are testing are true or not, yet we also don't want to make hasty judgements about whether our hypothesis is correct. This leads us to committing errors in our judgements, and there are two primary errors in this context. Type I error is when we reject the null hypothesis when it is correct. Type II error is when we do not reject the null hypothesis when it is wrong. While we attempt to avoid both types of errors, the latter is more acceptable than the former. This is because we do not want to make a hasty decision and claim an important relationship between variables when none exists. If we say there is no relationship when in fact there is one, that is a more conservative mistake which hopefully future research will correct.

1.3.3 What does Rejecting Mean?

When we try to reject the null hypothesis we must first set our significance level, which by convention is generally 0.05; whether this convention is still of practical use given today's computing technology is currently debated. When we reject the null hypothesis, all we are saying is that the chance of finding a result as large or larger is less than the significance level. This does not mean that your research question really merits any major practical effect. Rejecting the null hypothesis may be important, but so can failing to reject it. For example, if there was a school where lower income groups and higher income groups were performing "significantly different" on exams 5 years ago, and I came in and tested again and found "no statistically significant differences", I would find that to be highly important. It would mean there was a change in the test scores and there is now some relative parity.

The next concern is practical significance. My result may be statistically significant, but there may not be any real reason to think it is going to make a difference if implemented in policy or clinical settings. This is where other measures come into play, like effect sizes, which will be discussed later. One should also note that larger sample sizes can make even a very small effect "statistically significant", and a small sample size can mask a significant result. All of these must be considerations. One should not take a black and white approach to answering research questions; a result is not simply significant or not.

1.4 Writing in APA Style

One thing to be cautious about is how to write up your results and present them in a manner which is both ethical and concise. This includes graphics, tables, and paragraphs. These should make the main points of what you want to say while not misrepresenting your results. If you are going to be doing a lot of writing for publication, you should pick up a copy of the APA Manual (American Psychological Association, 2009).

1.5 Final Thoughts

A lot was discussed in this first part. These concepts will be revisited in later sections as we begin to implement them. Many books and articles have been written which expand on these concepts further. I ask that you constantly keep an open mind as researchers and realize statistics can never tell us "truth"; it can only hint at it, or point us in the right direction, and the process of scientific inquiry never ends.


Chapter 2

Description of A Single Variable

So when we have variables we want to understand their nature. Our first job is to describe our data before we can start to do any tests. There are two things we want to know about our data: where the center of the mass of the data is, and how far from that center our data is distributed. The middle is calculated by the measures of central tendency (discussed momentarily); how far the data falls from that middle tells us how much variability there is in our data. This is also called uncertainty or the dispersion parameter. These concepts are more generally known as the location and scale parameters. Location is the middle of the distribution, where on the real number line the middle lies. Scale is how far away from the middle our data goes. These concepts are common to all statistical distributions, although for now our focus is on the normal distribution. This is also known as the Gaussian distribution and is widely used in statistics for its satisfying mathematical properties and because it allows us to run many types of analyses.

2.1 Where’s the Middle?

The best way to describe data is to use a measure of central tendency, or what is the middle of a set of values. This includes the mean, median, and mode.

The equation to find the mean is in 2.1. The equation has some notation which requires some discussion, as you will see it in a lot of formulas. The Σ is the summation sign, which tells us to sum everything to its right. The i = 1 below the summation sign simply means start at the first value in the variable, and the N at the top means go all the way to the end (the number of responses seen in that variable).

\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} \qquad (2.1)

If we return to our x vector of [1, 2, 3, 4, 5] we get 2.2.

\bar{x} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3 \qquad (2.2)

Our mean is influenced by all the numbers equally, so our example variable y = [1, 1, 2, 3, 4, 5] would give a different mean, as computed in formula 2.3.

\bar{y} = \frac{1 + 1 + 2 + 3 + 4 + 5}{6} = \frac{16}{6} \approx 2.67 \qquad (2.3)

The addition of the extra one weighed our mean down. As we will see, individual values can have dramatic effects on our mean, especially when the number of values we have is low. Finally, we represent the mean in several ways: the Greek letter µ represents the population mean, while the mean of a sample can be denoted with a flat bar on top, so we would say x̄ = 3. The mean is also known as the expected value, so we can write it as E(x) = 3.

For categorical data there are two other useful measures. The first is the median, which is simply the middle number of an ordered set of values, as in 2.4.

\text{Median} = 1, 2, \underbrace{3}_{\text{Median}=3}, 4, 5 \qquad (2.4)

Now if there is an even number of values we take the mean of the two middle values, as in 2.5.

\text{Median} = 1, 1, \underbrace{2, 3}_{\text{Median}=2.5}, 4, 5 \qquad (2.5)

The mode is simply the most common number in a set. In the last example, 1 is the mode since it occurs twice while each of the other values occurs once. You may get bi-modal data, where two numbers occur most often, or data with even more modes.

These last two measures of the middle of a distribution are mostly of interest for categorical data. The mode is rarely useful for interval or ordinal data, although the median can be of help with such data. The mean is the most relevant for continuous data and is the one that will be used the most in statistics. The mean is more commonly referred to as the average, and it is computed by taking the sum of all of the values and dividing by the number of values.
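As a quick illustration, here is a minimal R sketch (R is the package used for the worked examples later in this book) computing all three measures for the small example vectors above; base R has no function for the statistical mode, so a small helper is defined:

# Example vectors from the text
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 3, 4, 5)

mean(x)      # 3
mean(y)      # 2.666667
median(x)    # 3
median(y)    # 2.5

# mode() in R reports an object's storage type, so define a helper
# that returns the most frequent value(s) in a vector
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
stat_mode(y) # 1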

2.2 Variation

We now know how to get the mean, but much of the time we also want to know how much variation there is in our data. When we talk about variation we are talking about why we get different values in the data set. Going back to our previous example of [1, 2, 3, 4, 5], we want to know why we got these values and not all 3s or 4s. A more practical example is why one student scores a 40 on an exam, another an 80, another a 90, another a 50, and so on. This measure of variation is called the variance. It is also called the dispersion parameter in the statistics literature, and the word dispersion will be used in discussions of other models.

To compute the variance for the normal distribution, we first find the difference between each value and the sample mean. Those differences are then squared, and their sum is divided by the number of observations, as seen below in taking the variance of x. Taking the square root of the variance gives the standard deviation for the normal distribution. Formula 2.6 shows the equation for this.

\mathrm{Var}(x) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N} \qquad (2.6)

Formula 2.7 below shows how we take the formula above and use our previous variable x to calculate the

sample variance.



\mathrm{Var}(x) = \big[(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2\big]/5
            = \big[(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2\big]/5
            = (4 + 1 + 0 + 1 + 4)/5
            = 10/5
            = 2 \qquad (2.7)
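The same arithmetic can be checked in R. This is a sketch of the population formula in 2.6 and 2.7; note that R's built-in var() and sd() divide by n − 1 rather than N, so they differ slightly for small samples:

x <- c(1, 2, 3, 4, 5)

# Population variance as in Equation 2.7: squared deviations summed, divided by N
sum((x - mean(x))^2) / length(x)          # 2

# Standard deviation is the square root of the variance
sqrt(sum((x - mean(x))^2) / length(x))    # 1.414214

# For comparison, R's built-ins divide by n - 1
var(x)   # 2.5
sd(x)    # 1.581139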

A plot of the normal distribution with lines pointing to the distance between 1, 2 and 3 standard deviations

is shown in 2.1.

Figure 2.1: Normal Distribution, marking 1 standard deviation (68.2%), 2 standard deviations (95.4%), and 3 standard deviations (99.7%) from the mean

Now is when we start getting into the discussion of distributions, specifically the normal distribution. The standard deviation is one property of the normal distribution. It is a great way to understand how data is spread out and gives us an idea of how close to the mean our sample is. The rule for the normal distribution is that about 68% of the population will be within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations. This is shown in Figure 2.1, which has a mean of 20 and a standard deviation of two.

There are other ways to look at variation that are good to see. One is the interquartile range, which shows the middle 50% of the data; it runs from the 75th percentile down to the 25th percentile. One good graphing technique for this is the box and whisker plot, shown in Figure 2.2. The line in the middle of the box is the middle of the distribution, the box is the interquartile range, the horizontal lines (whiskers) are two standard deviations out, and the dots outside those are outliers (data points more than two standard deviations from the mean).
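A minimal R sketch of the quartiles, the interquartile range, and a box and whisker plot, using the same small example vector purely for illustration:

x <- c(1, 2, 3, 4, 5)

quantile(x)   # 0%, 25%, 50%, 75%, and 100% quantiles
IQR(x)        # width of the middle 50% of the data
boxplot(x, main = "Box and Whisker Plot")   # draws a plot like Figure 2.2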

2.3 Skew and Kurtosis

Two other concepts which help us evaluate a single normal variable are skew and kurtosis. These are not talked about as much, but they are still important. Skew is when more of the sample lies on one side of the mean than the other. Negative skew is where the peak of the curve is to the right of the mean (the tail going to the left). Positive skew is where the peak of the distribution is to the left and the tail goes to the right.

Kurtosis is how flat or peaked a distribution looks. A distribution with a more peaked shape is called leptokurtic, and a flatter shape is called platykurtic. Although skewness and kurtosis can make a distribution violate normality, they do not always.



Figure 2.2: Box and Whisker Plot

2.4 Testing for Normality

Can we test for normality? We can, and should. One way is to use descriptive statistics and look at a histogram. Below you can see a histogram of a normal distribution. We can overlay a normal curve on it and see if the data looks normal. This is not a "test" per se, but we can get a good idea of what our data looks like. This is shown in 2.3.

Figure 2.3: A histogram of the normal distribution above with the normal curve overlaid

We could also examine a P-P plot. This is a plot with a line at a 45 degree angle going from the bottom left to the upper right; the closer the points are to the line, the closer to normality the distribution is. The same principle is behind the Q-Q plot (Q meaning quantile).
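A minimal R sketch of these graphical checks, using simulated data purely for illustration; shapiro.test() adds one formal test of normality:

set.seed(1)
z <- rnorm(200, mean = 20, sd = 4)   # simulated, approximately normal data

# Histogram with a normal curve overlaid, as in Figure 2.3
hist(z, freq = FALSE, main = "Histogram with Normal Curve")
curve(dnorm(x, mean = mean(z), sd = sd(z)), add = TRUE)

# Q-Q plot: points close to the line suggest approximate normality
qqnorm(z)
qqline(z)

# Shapiro-Wilk test: a small p-value suggests departure from normality
shapiro.test(z)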

2.5 Data

I will try to give examples of data analysis and its interpretation. One good data set is on cars released in 1993 (Lock, 1993); the names of the variables and more information on the data set can be found in the Appendix.

2.6 Final Thoughts

A lot of the concepts discussed here are necessary for a basic understanding of statistics, although do not feel you have to have this entire chapter memorized. You may need to come back to these concepts from time to time. Do not focus on memorizing formulas either; focus on what the formulas tell you about the concept. With today's computing power your concern will be understanding what the output is telling you and how to connect that to your research question. While it is good to know how the numbers are calculated, the point is to understand how to use them in your test.


Part II

Correlations and Mean Testing


Chapter 3

Relationships Between Two Variables

In the first part of this book we just looked at describing variables. Now we look at how variables are related and how to test the strength of those relationships. This is a difficult task; it takes time to master not only the concepts but their implementation. Course homework is actually the easiest way to do statistics: you are given a research question, told what to run, and asked to report your results. In a real analysis you will have to decide for yourself which test best fits your data and your research question. While I will provide some equations, it is best to look at them just to see what they are doing and what they mean; it is less important to memorize them. This first part will look at basic correlations and testing of means (t-tests and ANOVA).

Much of statistics is correlational research. It is research where we look at how one variable changes when another changes, yet causal inferences will not be assessed. It is very tempting to use the word cause or to imply some directionality in your research, but you need to refrain from it unless you have a lot of evidence to justify it, as the standards for determining causality are high. If you wish to learn more about causality see Pearl (2009a) and Pearl (2009b).

3.1 Covariance

Before discussing correlations we have to discuss the idea of covariance. One of the most basic ways to associate variables is with a variance-covariance matrix. A matrix is like a spreadsheet, with a value in each cell. The diagonal going from upper left to lower right holds the variance of each variable (as it is the same variable in the row as in the column). The other cells hold the covariance between two variables. The idea of covariance is similar to variance, except we want to know how one variable varies with another: if one changes in one direction, how does the other variable change? Do note we are only talking about continuous variables here (for the most part interval and ratio scales are treated the same, and the distinction is rarely made in statistical testing, so when I mention continuous it may be either interval or ratio without compromising the analysis). The formula for covariance is in 3.1.

\mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N} \qquad (3.1)

As one can see, it takes the deviations from the mean, multiplies them together, and divides by the sample size. This gives a good measure of the relationship between the two variables. While this concept is necessary and a bedrock of many statistical tools, it is not very intuitive; it is not standardized in any way that allows us to make quick sense of the relationship. This is what leads us to correlations.
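A minimal R sketch of Equation 3.1, using two small illustrative vectors; note that R's built-in cov() divides by n − 1 rather than N:

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 7)   # illustrative second variable

# Population covariance as in Equation 3.1
sum((x - mean(x)) * (y - mean(y))) / length(x)

# R's cov() uses n - 1 in the denominator
cov(x, y)

# A full variance-covariance matrix: variances on the diagonal,
# covariances off the diagonal
cov(data.frame(x, y))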

3.2 Pearson's Correlation

A correlation is essentially a standardized covariance. We take the covariance and divide it by the product of the standard deviations, as in 3.2:

r_{x,y} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2}} \qquad (3.2)

If we dissect this formula it is not as scary as it looks. The top of the equation is simply the covariance. The bottom is the variance of x and the variance of y multiplied together; taking the square root simply converts that to a standard deviation. This puts the correlation coefficient into the metric of -1 to 1. A correlation of 0 means no association whatsoever. A correlation of 1 is a perfect correlation. So let's say we are looking at the association of temperatures between two cities: if city A's temperature went up by one degree, city B's would also go up by one degree if their correlation were 1 (remember a correlation assumes comparable units of measurement). If the correlation is -1 it is a perfect inverse correlation, so if the temperature of city A goes up one degree, city B's goes DOWN one degree. In social science correlations are never this clean, or this clear to understand. Since the metrics can differ between correlations, one must be careful about when you do a correlation and how you interpret it. Also remember a correlation is non-directional: with a correlation of .5, if the temperature in city A goes up one degree, city B's goes up half a degree; likewise, if city B goes up a full degree, city A goes up half a degree.

Pearson's correlations are reported with an "r" and then the coefficient, followed by the significance level; for example, r = 0.5, p < .05 if significant.
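In R, a minimal sketch of obtaining Pearson's r and its significance test for the same illustrative vectors is:

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 7)   # illustrative second variable

cor(x, y)                            # Pearson's r
cor.test(x, y, method = "pearson")   # r along with a t-based significance test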

3.3 R Squared

When we get a Pearson's correlation coefficient we can take the square of that value, and that is what is called the percentage of variance explained. So if we get a correlation of .5, the square of that is .25, so we can say that 25% of the variation in one variable is accounted for by the other variable. Of course, as the correlation increases so will the amount of variance explained.

3.4 Point Biserial Correlation

One special case where Pearson's r can be used with a categorical variable is the point-biserial correlation. If you have a binary variable, you can calculate the correlation between its two categories and another variable that is continuous. This is similar to the t-test we will examine later. The test looks at whether or not there is a significant difference between the two groups of the dichotomous variable. When we ask whether it is "significant" or not, we want to determine whether or not the difference is due to random chance. We already know there is going to be random variability in any sample we take, but we want to know if the difference between the two groups is due to this randomness or whether there is a genuine difference between the groups.
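Because the point-biserial correlation is just Pearson's r with one variable coded 0/1, a minimal R sketch with hypothetical group and score vectors looks like this; the t-test at the end asks the equivalent two-group question:

group <- c(0, 0, 0, 1, 1, 1, 1)          # hypothetical binary variable
score <- c(10, 12, 11, 15, 14, 16, 15)   # hypothetical continuous variable

cor(group, score)        # point-biserial correlation
cor.test(group, score)   # with a significance test

# The equivalent question asked as a two-group mean comparison
t.test(score ~ factor(group))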

3.5 Spurious Relationships

So let's say we get a Pearson's r = .5; so what now? Can we say there is a direct relationship between the variables? No, because we don't know if the relationship is direct or not. There are many examples of "spurious relationships". For example, if I look at the rate of illness students report to the health center at their university and the relative timing of exams, I would most likely find a decent (probably moderate) correlation. Now before any student starts using this statement as a reason to cancel tests, there is no reason to believe your exams are causing you to get sick! Well, what is it then? It is something we DIDN'T measure: stress! Stress weakens the immune system, and stress is higher during periods of examinations, so you are more likely to get ill. If we just looked at correlations we would only be looking at the surface, so take the results but use them with caution, as they may not be telling the whole story.

3.6 Final Thoughts

This may seem like a short chapter given the heavy use of correlations, but much of the material here will be used in future statistical analyses. One of the primary points to take from this chapter is that a correlation is in no way measuring causality, and this point cannot be stressed enough. Correlations are a good way of looking at associations, but that is all; still, they help us explore data and work towards more advanced statistical models which can help us support or not support our hypotheses. While correlations can be used, use them with caution.


Chapter 4

Means Testing

This chapter goes a bit deeper into exploring the differences between groups. If we have a nominal or ordinal variable and we want to see whether its categories are statistically different on a continuous variable, there are several tests we can do. We already looked at the point-biserial correlation, which is one such test. This chapter examines the t-test, which gives a bit more detail, and Analysis of Variance (ANOVA), which applies when the number of groups is greater than 2 (the letter denoting the number of groups is generally k, as n denotes sample size, so ANOVA is used when k > 2, or equivalently k ≥ 3). Here we will want to know whether the difference in the means is statistically significant.

4.1 Assumptions

The first assumption we will make is that the continuous variable we are measuring is normally distributed, and we learned how to test that earlier. Another assumption we must make is called "homogeneity of variance". This means the variance is the same for both groups (it doesn't have to be exactly the same, but similar; it will be somewhat different due to randomness, but the question is whether the variances differ enough to be statistically different). If this assumption is untenable we will have to correct the degrees of freedom, which will influence whether our t-statistic is significant or not.

This can be shown in the two figures below. Figure 4.1 shows a difference in the means (means of 10 and 20) but with the same variance of 4.

Figure 4.1: Same Variance

Figure 4.2 shows the same means, but one variance is 4 and the other is 16 (a standard deviation of 4).

4.2 T-Test

The t-test is similar to the point-biserial as we are wanting to know whether two groups are statistically

diﬀerent.

Figure 4.2: Different Variances

So we will look at the first equation, where the numerator is the difference between the means and the denominator is the standard error of that difference. The sample variance is denoted s², and n is the sample size for that group. This is shown in 4.1.

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \qquad (4.1)

The degrees of freedom are given by 4.2.

df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \qquad (4.2)

The above equations allow for unequal sample sizes and variances. The equations simplify if you have the same variance or the same sample size in each group, although this generally only occurs in experimental settings where sample size and other parameters can be more strictly controlled.

In the end we want to see if there is a statistical difference between groups. If we look at data from the National Education Longitudinal Study of 1988 (NELS:88) baseline year, we can see how this works. If we look at the difference between genders on science scores, we can do a t-test, and we find there is a significant mean difference. The means by gender are in Table 4.1.

          Mean      SD
Male      52.1055   10.42897
Female    51.1838   10.03476

Table 4.1: Means and Standard Deviations of Male and Female Test Scores

Our analysis shows t(10963) = 4.712, p < .05. However, the test of whether the variances are equal is significant, F = 13.2, p < .05, so we have to use the results with equal variances not assumed. This changes our results to t(10687.3) = 4.701, p < .05. You can see the main difference is that our degrees of freedom dropped, and thus our t-statistic dropped.

This time it didn't matter; our sample size was so large that both versions were significant, but in some tests this may not be the case. If the equal-variances test rejects the null hypothesis but the unequal-variances test does not, you should be very cautious about how you write it up, even if Levene's test is not significant.
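A minimal R sketch of running this kind of test, assuming a data frame named nels with a continuous science score and a gender factor (illustrative names, not the actual NELS:88 variable names); Levene's test here assumes the car package is installed:

# 'nels', 'science', and 'gender' are hypothetical names for illustration
t.test(science ~ gender, data = nels, var.equal = FALSE)  # Welch, variances not assumed equal
t.test(science ~ gender, data = nels, var.equal = TRUE)   # classic pooled-variance t-test

# Levene's test of homogeneity of variance (gender must be a factor)
car::leveneTest(science ~ gender, data = nels)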

4.2.1 Independent Samples

The above example was an independent samples t-test. This means the participants are independent of each

other and so their responses will be too.

4.2.2 Dependent Samples

This is a slightly different version of the t-test where you still have two means, but the samples are not independent of each other. A classic example of this is a pre-test, post-test design, as is longitudinal data where a measure was collected at one point and the same measure was collected again at a later date.

4.2.3 Eﬀect Size

The eﬀect size r is used in this part. The equation for this is in 4.3:

r = \sqrt{\frac{t^2}{t^2 + df}} \qquad (4.3)
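Plugging the t statistic and degrees of freedom from the Welch results above into 4.3 is a one-line calculation in R:

# Effect size r from Equation 4.3, using the values reported in Section 4.2
t_val <- 4.701
df <- 10687.3
sqrt(t_val^2 / (t_val^2 + df))   # about 0.045, a very small effect despite significance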

4.3 Analysis of Variance

Analysis of Variance (ANOVA) is used when you have more than two groups. Here we will look at what happens when we have race groups and standardized test scores. The problem we will encounter is to see which groups are significantly different. ANOVA adds some steps to the testing procedure. First all of the means are compared (the equations for this are quite complex, so we will just go through the analysis steps). First you see if any of the means are statistically different. This is called the omnibus test and follows the F distribution (the F distribution and t distribution are similar to the normal but have "fatter tails", which means they allow for more outliers, but this is of not much consequence for the applied analysis). We get an F statistic for both Levene's test and the omnibus test. In this analysis we get five group means, shown below in Table 4.2:

Race                        Mean    SD
Asian, Pacific Islander     56.83   10.69
Hispanic                    46.72    8.53
Black, Not Hispanic         45.44    8.29
White, Not Hispanic         52.91   10.03
American Indian, Alaskan    45.91    8.13

Table 4.2: Means and Standard Deviations of Race Groups Test Scores

Table 4.3 shows the mean differences. After we reject the omnibus test we need to see which group differences are significant. We do this with post-hoc tests. For simplicity I have put the results in a matrix where the numbers are the differences between the groups; those with an asterisk (*) beside them are statistically significant. This is not how it is presented in SPSS, which will give it to you in rows, but this table is easily made. There are many post-hoc tests one can do. The ones done below are Tukey and Games-Howell, and both reject the same mean difference groups. There are a lot more post-hoc tests, but these two do different things: Tukey adjusts for different sample sizes, and Games-Howell corrects for heterogeneity of variance. If you do a few types of post-hoc tests and the result is the same, this gives credence to your conclusion. If not, you should go back to see whether there is a real difference or re-examine your assumptions. A short R sketch of this workflow follows Table 4.3.



              Asian-PI    Hispanic    Black       White      AI-Alaskan
Asian-PI       0
Hispanic      10.1092*     0
Black         11.3907*     1.2815*     0
White          3.9193*    -6.1899*    -7.4714*     0
AI-Alaskan    10.9178*     0.8086     -0.4729      6.9985*    0

Note: PI = Pacific Islander; AI = American Indian
Table 4.3: Mean Differences Among Race Groups
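A minimal R sketch of this workflow, again assuming a hypothetical data frame named nels with a continuous science score and a race factor: the omnibus F test comes from aov() and the Tukey post-hoc comparisons from TukeyHSD(); Games-Howell is not in base R, and the commented line names one package that provides it.

# 'nels', 'science', and 'race' are hypothetical names for illustration;
# race should be a factor
mod_aov <- aov(science ~ race, data = nels)
summary(mod_aov)   # omnibus F test across all group means

TukeyHSD(mod_aov)  # Tukey post-hoc pairwise mean differences

# Games-Howell post-hoc test, e.g. via the rstatix package:
# rstatix::games_howell_test(nels, science ~ race)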


Part III

Latent Variables


Chapter 5

Latent Constructs and Reliability

Sometimes in statistics we have variables we want to study, but we can't measure them directly. This means we have to use multiple measures (called manifest variables), which come together to measure the construct we are trying to understand. Unfortunately we can't say for certain that the variables we measure are informing on the overall construct we want to test, which means we need ways to check this. Some examples of latent variables include socio-economic status and intelligence. We can't measure socio-economic status directly, but we can look at income, education, neighborhood, and other measures to get an overall gauge of the construct.

5.1 Reliability

One measure of reliability (also called internal consistency) is Cronbach's alpha. It is on a metric from 0 to 1; the closer to one, the more reliable the measure. A measure below .7 is generally considered too weak to be reliable. The acceptable level of reliability depends on your measure (some measures that are critical, like standardized test scores, may have stricter requirements). In the end it comes down to the researcher to defend whether a measure is reliable or not.
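A minimal R sketch, assuming a data frame of item responses named items (one column per item measuring the same construct) and that the psych package is installed:

# 'items' is a hypothetical data frame whose columns are the scale items
library(psych)
alpha(items)   # reports raw and standardized Cronbach's alpha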


Part IV

Regression


Chapter 6

Regression: The Basics

Regression techniques make up a major portion of social science statistical inference. Regression models are also called linear models (this will be generalized later, but for now we will stick with linear models), as we try to fit a line to our data. These methods allow us to create models to predict certain variables of interest. This section will be quite deep, since regression requires a lot of concepts to consider, but as in past portions of this book we will take it one step at a time, starting out with basic principles and moving to more advanced ones. The principle of regression is that we have a set of variables (known as predictors, or independent variables) that we want to use to predict an outcome (known as the dependent variable, a term that has fallen out of favor in more advanced statistics classes and works). Then we have a slope for each independent variable, which tells us the relationship between the predictors and the outcome.

If you find yourself not understanding something, come back to the more fundamental portions of regression and it will sink in. This type of method is so diverse that people spend careers learning and using this modeling procedure, so it is not expected that you pick it up in one quarter; you are just laying the foundations for its use.

6.1 Foundation Concepts

So how do we try to predict an outcome? Well it comes back to the concept of variance. Remember early on

in this book we looked at variance as simply variation in a variable. There are diﬀerent values for diﬀerent

cases (i.e. diﬀerent scores on a test for diﬀerent students). Regression allows us to use a set of predictors to

explain the variation in our outcome.

Now we will look at the equations themselves and the notation that we will use. The basic equation of a regression model (or linear model) is 6.1.

y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \varepsilon \qquad (6.1)

This basic equation may look scary, but it is not. There are some basic parts to the equation which will be relevant to the future understanding of these models, so let us go left to right. The y is our outcome variable; this is the variable whose behavior we want to predict. The β₀ is the intercept of the model (where the regression line crosses the y axis on a coordinate plane). The βx part is actually two components together: the x's are the predictor variables, and the β's are the slopes for each predictor, which tell us the relationship between that predictor and the outcome variable. The summation sign is there as before, yet unlike other times it has been used, at the top is the letter p instead of n. This is because p stands for the number of predictors; we are not summing over the number of cases. The ε is the "error" term, which takes into account the variability in the model the predictors don't explain.
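To make the pieces of 6.1 concrete, here is a minimal R sketch that simulates data from a known model (intercept 2, slope 3) and then recovers those values with lm(); the leftover scatter is the ε term:

set.seed(42)
x <- rnorm(100)
eps <- rnorm(100)        # the error term, epsilon
y <- 2 + 3 * x + eps     # true model: beta0 = 2, beta1 = 3

mod <- lm(y ~ x)
coef(mod)                # estimates close to 2 (intercept) and 3 (slope)
summary(mod)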

6.2 Final Thoughts

This brief chapter introduces regression as a concept, or more generally linear modeling. I don't say linear regression (which is the next chapter) as that is just one form of regression; many more types of regression will be covered in future chapters. There are many books on regression, and at the end of each chapter I will note very good ones. One extraordinary one is Gelman and Hill (2007), which I refer to a lot in creating this chapter.



6.3 Bibliographic Note

Many books have been written on regression. I have used many as inspiration and references for this work, although much of the information is freely available online. On top of Gelman and Hill (2007) for doing regression, there are the books by Everitt and Hothorn (2010) and Chatterjee and Hadi (2006), the free book by Faraway (2002), and other excellent books available for purchase, Faraway (2004) and Faraway (2005). More theory-based books are Venables and Ripley (2002), Andersen and Skovgaard (2010), Bingham (2010), Rencher and Schaalje (2008), and Sheather (2009). As you can tell, most of these books use R, which is my preferred statistical package. Some books are focused on SPSS and do a good job at that, one notable one being Field (2009); more advanced but still very good are Tabachnick and Fidell (2006) and Stevens (2009). Stevens (2009) would not make a good textbook but is an excellent reference, including SPSS and SAS instructions and syntax for almost all multivariate applications in the social sciences, and is a necessary reference for any social scientist.


Chapter 7

Linear Regression

Let's focus for a while on one type of regression: linear regression. This requires us to have an outcome variable that is continuous and normally distributed. When we have a continuous, normally distributed outcome, we can use least squares to calculate the parameter estimates. Other forms of regression use maximum likelihood, which will be discussed in later chapters, although for a normally distributed outcome the least squares estimates are also the maximum likelihood estimates.

7.1 Basics of Linear Regression

This first regression technique we will learn, and the most common one used, is for when our outcome is continuous (interval or ratio in nature, it does not matter). Linear regression uses an analytic technique called least squares. We will see how this works graphically and then how the equations give us the numbers for our analysis.

What linear regression does is look at the plot of x and y and try to fit a straight line that is closest to all of the points. Figure 7.1 shows how this is done. I randomly drew values for both x and y, and the line is the regression line that best fits the data. As the plot shows, the line doesn't fit perfectly; it is just the "best fitting line". The difference between the actual data and the line is what is termed the residuals, as it is what is not being captured by the model. The better the line fits and the less residual there is, the more strongly the predictor will predict the outcome.

Figure 7.1: Simple Regression Plot

7.1.1 Sums of Squares

When discussing the sums of squares we get two equations. The first is the sums of squares for the model, in 7.1: the difference between our predicted values and the mean. This measures how much the model is capturing, and we want this number to be as high as possible.



SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \qquad (7.1)

The second is the sums of squares for the residuals (or error): the difference between the predicted and actual values of the outcome. We want this to be as low as possible; it is shown in 7.2.

SSE = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 \qquad (7.2)

The total sums of squares can be found by summing the SSR and SSE, or directly by 7.3.

SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad (7.3)

Table 7.1 shows how these pieces can be arranged. We commonly report the sums of squares and degrees of freedom along with the F statistic; the mean squares are less important but will be shown for the purposes of the examples in this book.

                    Sums of Squares   DF          Mean Square           F Ratio
Regression          SSR               p           MSR = SSR/p           F = MSR/MSE
Residual (Error)    SSE               n - p - 1   MSE = SSE/(n - p - 1)
Total               SST               n - 1

Table 7.1: ANOVA Table
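These quantities can be pulled straight out of a fitted model. A minimal R sketch, assuming the 1993 cars data is available as the Cars93 data frame in the MASS package (which matches the variable names used in this chapter), verifies that SSR + SSE = SST:

library(MASS)   # assumed source of the Cars93 data
mod <- lm(Price ~ MPG.city, data = Cars93)

y <- Cars93$Price
y_hat <- fitted(mod)

SSR <- sum((y_hat - mean(y))^2)   # model (regression) sums of squares
SSE <- sum((y - y_hat)^2)         # residual (error) sums of squares
SST <- sum((y - mean(y))^2)       # total sums of squares

all.equal(SSR + SSE, SST)         # TRUE: the pieces add up
SSR / SST                         # r-squared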

7.2 Model

First let's look at the simplest model: with one predictor it is a simple linear regression, 7.4. As shown, β₀ is the intercept of the model, also called the "y intercept"; it is where on the coordinate plane the regression line crosses the y-axis when x = 0. The β is the parameter estimate for the predictor beside it, the x; this shows the magnitude and direction of the relationship to the outcome variable. Finally, ε is the residual, which is how much the data deviate from the regression line. This is also called the "error term"; it is the difference between the predicted values of the outcome and the actual values.

y = \beta_0 + \beta_1 x_1 + \varepsilon \qquad (7.4)

A model with more than one predictor is a multiple linear regression; with two or more predictors it looks like 7.5. Note the subscript p stands for the number of predictors, so there will be a βx term for each independent variable.

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \qquad (7.5)



7.2.1 Simple Linear Regression

If we have the raw data, we can find the estimates by hand. While in the era of very high speed computers it is rare that you will have to manually compute these statistics, we should still look at the equations to see how we derive the slopes. Equation 7.6 shows how to calculate the beta coefficient for a simple linear regression. We square the deviations so we get an approximation of the distance from the best fitting line. If we just added the deviations up, some would be below the line and some above, giving negative and positive values respectively, so they would sum to zero (as one of the assumptions of the error term requires). Squaring removes this issue.

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (7.6)

Equation 7.7 shows how the intercept is calculated in a simple linear regression. This is where the regression line crosses the y-axis when x = 0.

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad (7.7)

Finally we come to our residuals. When we plug values of x into the equation, we get the "fitted values", the values predicted by the regression equation, signified by ŷ. When we subtract the predicted (fitted) value from the actual outcome value, as in 7.8, we get the residual. This shows how far our actual values fall from the line and gives us an idea of which values are furthest from the regression line.

\varepsilon = y - \hat{y} \qquad (7.8)
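A minimal R sketch computing the slope and intercept from Equations 7.6 and 7.7 by hand and checking them against lm(), again assuming the Cars93 data from the MASS package:

library(MASS)   # assumed source of the Cars93 data
x <- Cars93$MPG.city
y <- Cars93$Price

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # Equation 7.6
b0 <- mean(y) - b1 * mean(x)                                     # Equation 7.7

c(b0, b1)          # about 42.37 and -1.02, matching the output below
coef(lm(y ~ x))    # the same values from lm()

res <- y - (b0 + b1 * x)   # residuals, Equation 7.8
head(res)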

We can also find how much of the variability in our outcome is being explained by our predictors. When we run this model we will get a Pearson's correlation coefficient (r). We can square this number (as we did with the correlation) and get the amount of variance explained. This can be written in several ways, as in 7.9.

r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (7.9)

We do need to adjust our r squared value to account for the complexity of the model. Whenever we add a predictor we will always explain more variance; the question is whether it is truly explaining variance for theoretical reasons or just randomly adding to the explained variation. The adjusted r squared should be comparable to the non-adjusted value; if they are substantially different, you should look at your model more closely. The adjusted r-squared can be particularly sensitive to sample size, so smaller samples will show bigger differences between the two values. It is best to report both if they differ by a non-trivial amount.

\text{Adjusted } r^2 = 1 - \frac{SSE/(n - p - 1)}{SST/(n - 1)} = 1 - (1 - r^2)\frac{n - 1}{n - p - 1} \qquad (7.10)

We can look at an example of data. Let’s look at our cars example. Let’s see if we can predict the price of

a vehicle based on its miles per gallon (MPG) of fuel used while driving in the city.



> mod1<-lm(Price~MPG.city);summary(mod1)

Call:

lm(formula = Price ~ MPG.city)

Residuals:

Min 1Q Median 3Q Max

-10.437 -4.871 -2.152 1.961 38.951

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 42.3661 3.3399 12.685 < 2e-16 ***

MPG.city -1.0219 0.1449 -7.054 3.31e-10 ***

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 7.809 on 91 degrees of freedom

Multiple R-squared: 0.3535, Adjusted R-squared: 0.3464

F-statistic: 49.76 on 1 and 91 DF, p-value: 3.308e-10

We find that it is a significant predictor of price. Our first test is similar to ANOVA: the F test. Here we reject the null hypothesis, F(1, 91) = 49.76, p < .001. We then look at the significance of our individual predictor. It is significant; here we report two statistics, the parameter estimate (β) and the t-test associated with it. Miles per gallon in the city is significant, with β = −1.0219, t(91) = −7.054, p < .001. The first interesting thing is that there is an inverse relationship: as one variable increases, the other decreases. Here we can say that for every one mile per gallon increase in city mileage, there is a drop in price of about $1,000 (I say $1,000 and not one dollar because that is the unit of measurement; be sure when interpreting data you use the unit of measurement, unless the data has been transformed, which will be discussed later). We can also look at the r² value to see how well the model is fitting: r² = 0.3535 and adjusted r² = 0.3464. While the adjusted value is slightly lower, it is not a major issue, so we can trust this value.

7.2.2 Multiple Linear Regression

Multiple linear regression is similar to simple regression except we place more than one predictor in the equation. This is how most models in social science are run, since we expect more than one variable to be related to our outcome.

y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \varepsilon \qquad (7.11)

Let's go back to the data and add to our model above not only miles per gallon in the city but also fuel tank capacity.

> mod3<-lm(Price~MPG.city+Fuel.tank.capacity);summary(mod3)

Call:

lm(formula = Price ~ MPG.city + Fuel.tank.capacity)

Residuals:

Min 1Q Median 3Q Max

-18.526 -4.055 -2.055 2.618 38.669

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 10.1104 11.6462 0.868 0.38763

MPG.city -0.4608 0.2395 -1.924 0.05747 .

Fuel.tank.capacity 1.1825 0.4104 2.881 0.00495 **


---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 7.514 on 90 degrees of freedom

Multiple R-squared: 0.4081, Adjusted R-squared: 0.395

F-statistic: 31.03 on 2 and 90 DF, p-value: 5.635e-11

We find we reject the null hypothesis with F(2, 90) = 31.03, p < .05. We have r² = .408 and adjusted r² = 0.395, so this model is fitting well and we can explain around 40% of the variance with these two parameter estimates. Interestingly, miles per gallon fails to remain significant in the model, β = −0.4608, t(90) = −1.924, p = 0.057. This is one of those times where significance is close, and most people who hold rigidly to the alpha of .05 would say this isn't important. I don't hold such views; while this seems less important than in the last model, it is still worth mentioning as a possible predictor, but in the presence of fuel tank capacity it has less predictive power.

Fuel tank capacity is strongly related to price, β = 1.1825, t(90) = 2.881, p < .01. The relationship here is positive, so the greater the fuel tank capacity, the higher the price. We might speculate that larger vehicles, with larger tanks, are more expensive. We have also seen consistently that miles per gallon in the city is inversely related to price, which may likewise reflect size: larger vehicles may get lower fuel efficiency but be more expensive, while smaller cars may be more fuel efficient and yet cheaper. I am not an expert on vehicle pricing, so we will simply trust the data from this small sample.

7.3 Interpretation of Parameter Estimates

7.3.1 Continuous

When a variable is continuous, interpretation is generally straightforward. We interpret the coefficient to mean that a one-unit increase in the predictor corresponds to an increase in y of β. So suppose you have the equation y = β₀ + 2x + ε. Here the 2 is the parameter estimate (β), so we say that for each one-unit increase in x, y increases by 2 units. When we say "unit" we are referring to the original measurement scales of the individual variables. So if x is income in thousands of dollars and y is a test score, then each one-thousand-dollar increase in income (x) corresponds to a score 2 points higher on the exam.
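As a small sketch of this unit-change interpretation, we can use the mod1 model fitted earlier; the two MPG values (20 and 21) are chosen only for illustration.

> # Predicted price at two hypothetical MPG.city values one unit apart
> new <- data.frame(MPG.city = c(20, 21))
> pred <- predict(mod1, newdata = new)
> diff(pred)   # equals the MPG.city coefficient, about -1.02 (thousands of dollars)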

This changes if we transform our variables. If we standardize x, a coefficient of 2 would mean that each standard-deviation increase in x increases y by two units. If we standardize both y and x, a coefficient of 2 would mean that a one standard-deviation increase in x corresponds to a two standard-deviation increase in y. (Note that the numerical value of the coefficient itself changes when the variables are standardized.)
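A minimal sketch of standardized coefficients, assuming the cars variables used above are still available; scale() centers and scales a variable so the slope is in standard-deviation units. The object name mod.std is just for illustration.

> # Standardize both outcome and predictor; the slope is now in SD units
> mod.std <- lm(scale(Price) ~ scale(MPG.city))
> coef(mod.std)   # slope = SDs of Price per one-SD change in MPG.city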

If we take the log of our outcome, we would say that a one-thousand-dollar increase in income corresponds to an increase of 2 log units in y. One thing to note is that when statisticians (or almost all scientists) say "log" they mean the natural log. To transform back to the original units, you apply the exponential function, e^y, if you had taken the log of the outcome (reasons for this will be discussed under testing assumptions). If we take the log of both y and x, then we can talk about percents: a one percent increase in x corresponds to roughly a 2 percent increase in y. To get back to the original units, exponentiation is still necessary.
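A small sketch of converting a log-outcome coefficient into an exact percent change; the value used here is the MPG coefficient from the log-price model fitted in the next section, included only for illustration.

> # Exact percent change in y per one-unit change in x when y is logged
> b <- -0.0576         # a log-scale coefficient (from the log-price model below)
> 100 * (exp(b) - 1)   # about -5.6, i.e. roughly a 5.6 percent decrease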

If we look at our models above, in the simple linear regression model of just MPG in the city, for each increase

in one MPG in the city, the price goes down by 1.0219 thousand dollars. This is because the coeﬃcient is

negative, so the relationship is inverse. In our multiple regression model we see for each gallon increase in

fuel tank capacity the price increases 1.1825 thousand dollars. This is because the coeﬃcient is positive.

7.3.1.1 Transformation of Continuous Variables

Sometimes it's necessary to transform our variables. This can be done to make interpretation easier, to make it more relevant to our research question, or to allow our model to meet its assumptions.


7.3.1.1.1 Natural Log of Variables Here we will explore what happens when we take the log of continuous variables.

> mod2<-lm(log(Price)~MPG.city);summary(mod2)

Call:

lm(formula = log(Price) ~ MPG.city)

Residuals:

Min 1Q Median 3Q Max

-0.58391 -0.19678 -0.04151 0.19854 1.06634

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 4.15282 0.13741 30.223 < 2e-16 ***

MPG.city -0.05756 0.00596 -9.657 1.33e-15 ***

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.3213 on 91 degrees of freedom

Multiple R-squared: 0.5061, Adjusted R-squared: 0.5007

F-statistic: 93.26 on 1 and 91 DF, p-value: 1.33e-15

Here we have taken the natural logarithm of our outcome variable. This will be shown later to be advantageous when looking at our assumptions and violations of them. It can also make model interpretation different, and sometimes easier. Now, instead of the original units, the outcome is in log units, so we would say that for each one-MPG increase, the log of price decreases by 0.0576, which corresponds to a drop of roughly 5.6 percent in price. The coefficient is negative, so the relationship is still inverse. Notice that the percent of variance explained increased dramatically, from 35% to 50%; this is due to the transformation (and note that r² values computed on different outcome scales are not directly comparable).

> mod3<-lm(log(Price)~log(MPG.city));summary(mod3)

Call:

lm(formula = log(Price) ~ log(MPG.city))

Residuals:

Min 1Q Median 3Q Max

-0.61991 -0.21337 -0.03462 0.19766 1.05362

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 7.5237 0.4390 17.14 <2e-16 ***

log(MPG.city) -1.5119 0.1421 -10.64 <2e-16 ***

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.3052 on 91 degrees of freedom

Multiple R-squared: 0.5543, Adjusted R-squared: 0.5494

F-statistic: 113.2 on 1 and 91 DF, p-value: < 2.2e-16

This model looks at what happens when we take the natural log of both the outcome and the predictor. It is also interpreted differently: the relationship is now in percentage terms (an elasticity). For each one percent increase in MPG in the city, price decreases by about 1.512 percent. The model estimates have also changed because of the transformation.

7.3.2 Categorical

When our predictors are categorical, we need to be careful about how they are modeled. They cannot simply be entered as numerical codes or words; doing so would produce wrong estimates, because the model would treat them as continuous variables.

7.3.2.1 Nominal Variables

For nominal variables we must recode the levels of the factor. One way to do this is dummy coding, where we create one indicator variable per level, coded 1 for observations at that level and 0 otherwise. If we denote the number of levels as k, the total number of dummy variables we can enter for a factor is k − 1. For example, suppose we have a variable of different sporting events: football, basketball, soccer, and baseball. The total number of dummy variables we can have is 3. The coding can be done in statistics programs as shown in Table 7.2.

Factor Levels Dummy 1 Dummy 2 Dummy 3

Football 1 0 0

Basketball 0 1 0

Soccer 0 0 1

Baseball 0 0 0

Table 7.2: How Nominal Variables are Recoded in Regression Models using Dummy Coding

As you can see, the baseball level of our sports variable has all zeros. This is the baseline group against which the other groups are compared. Dummy coding is a good choice when there is a natural baseline group (like treatment vs. control in medical studies). Our example does not have a natural baseline, however, so we can use another type of coding, called contrast coding.

Factor Levels Contrast 1 Contrast 2 Contrast 3

Football -1 0 0

Basketball 0 -1 0

Soccer 0 0 -1

Baseball 1 1 1

Table 7.3: How Nominal Variables are Recoded in Regression Models using Contrast Coding

As you can see, each column sums to 0. Of course, in real data sets the levels of the factor (the groups) may have unequal numbers of observations; if 25 participants played football and only 23 played baseball, finding contrast values that sum to zero is more difficult. Luckily, many software programs handle this type of coding automatically.

If these dummy variables were our only predictors, the model would be equivalent to an Analysis of Variance, and the intercept would be the mean of the baseline group. This is not so once additional predictors are added, as the model then becomes an Analysis of Covariance.
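A minimal sketch of how this coding is handled in R, using the hypothetical sports factor from Tables 7.2 and 7.3; the object names are illustrative only.

> # Default coding of a factor is dummy (treatment) coding with k - 1 columns
> sport <- factor(c("Football", "Basketball", "Soccer", "Baseball"))
> contrasts(sport)                    # first level (alphabetically) is the baseline
> # relevel(sport, ref = "Baseball") would choose a different baseline level
> contrasts(sport) <- contr.sum(4)    # switch to sum-to-zero (contrast) coding
> contrasts(sport)
> # In a model the factor is entered as is, e.g. lm(y ~ sport, data = dat)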

7.3.2.2 Ordinal Variables

For ordinal variables, we can generally enter them in the model as a single variable without dummy coding. This is because the assumption of linearity is relatively tenable: we expect the categories to be naturally ordered and increasing. The interpretation is that as you go up one category, the value of y changes by the amount of the parameter estimate (the beta coefficient for that variable).

7.4 Model Comparisons

In much research we want to know how much we add to the fit of a model when we add or remove one or more predictors. When we do model comparisons, we must ensure the models are nested: one model is obtained from the other by adding or removing predictor(s), while everything else stays the same. For example, in the models above we used MPG and fuel tank capacity. We may want to know how much adding fuel tank capacity improves model fit, or how adding MPG compares to the model with fuel capacity already in it. We cannot directly compare a simple regression containing only fuel capacity with a separate model containing only MPG.
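As a sketch, nested linear models can be compared directly in R with anova(), which performs the F test for the added predictor(s); this assumes the cars variables used above are still attached.

> # Compare the MPG-only model with the model that adds fuel tank capacity
> small <- lm(Price ~ MPG.city)
> large <- lm(Price ~ MPG.city + Fuel.tank.capacity)
> anova(small, large)   # F test for the improvement in fit from the added predictor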

7.5 Assumptions

The assumptions for regression depend on the type of regression being used. For continuous outcomes, the assumptions are that the errors are homoscedastic and normally distributed, that the outcome is linearly related to the predictors, and that the observations are independent of one another. We will look at the assumptions of linear regression, how to test them, and then discuss corrections when they are violated.

7.6 Diagnostics

We need to make sure our model meets its assumptions, and to see whether we can correct for cases where they are violated.

7.6.1 Residuals

First we need to look at our residuals. Remember that residuals are the observed y values minus the predicted (fitted) y values. For this exercise I will use the cars data from above, as it is a good data set for discussing regression. For the purpose of examining our assumptions, let us stick with the simple regression where price of the vehicle is the outcome and miles per gallon in the city is the predictor. I will provide the R commands and output along with discussion.

7.6.1.1 Normality of Residuals

First let's look at our assumption of normality. We assume our errors are normally distributed with mean 0 and some unknown variance. My preferred test for this is the Shapiro-Wilk test, which works well for sample sizes from 3 to 5000 (Shapiro & Wilk, 1965).

7.6.1.1.1 Tests Let's look at the models above and see whether the normality assumption is met. First we test "mod1", which uses the variables in their original form.

> shapiro.test(residuals (mod1))

Shapiro-Wilk normality test

data: residuals(mod1)

W = 0.8414, p-value = 1.434e-08

As you can see, the results aren't pretty: we reject the null hypothesis for the test, W = 0.8414, p < .001, which means there is enough evidence to say the sample deviates from the theoretical normal distribution the test expects. For this test the null hypothesis is that the sample does conform to a normal distribution, so unlike most tests, we do not want to reject it.


> shapiro.test(residuals (mod2))

Shapiro-Wilk normality test

data: residuals(mod2)

W = 0.9675, p-value = 0.02022

The second model, with the log of the outcome, helps some, but we still can't say our assumption is tenable, W = 0.9675, p < .05.

> shapiro.test(residuals (mod3))

Shapiro-Wilk normality test

data: residuals(mod3)

W = 0.9779, p-value = 0.1154

This time we cannot reject the null hypothesis, W = 0.9779, p > .05. Taking the log of both our outcome and our predictor lets the residuals approximate the normal distribution, or at the very least we can say there is not enough evidence that the distribution differs significantly from the theoretical (expected) normal distribution.

7.6.1.1.2 Plots Now let's look at plots. Two plots are important: a histogram and a QQ plot. The histogram lets us look at the frequency (density) of residual values, and the QQ plot plots our residuals against what we would expect from a theoretical normal distribution. In the QQ plot the line represents where we want our residuals to fall; points on the line match the theoretical normal distribution.
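For reference, here is a minimal sketch of the commands that produce these two plots for mod1; the same calls apply to mod2 and mod3.

> # Histogram (on the density scale) and normal Q-Q plot of the residuals
> hist(residuals(mod1), freq = FALSE, main = "Distribution of Residuals")
> qqnorm(residuals(mod1)); qqline(residuals(mod1))   # reference line under normality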

[Figure 7.2: Histogram of residuals (density scale) and normal Q-Q plot of residuals against theoretical quantiles for Model 1]

The first set of plots shows what we expected from the statistics above: the residuals do not conform to a normal distribution. We can see heavy right skew in the residuals, and the QQ plot is very non-normal at the extremes.

[Figure 7.3: Histogram of residuals (density scale) and normal Q-Q plot of residuals against theoretical quantiles for Model 2]

As we saw in the test statistics, taking the log of the outcome made things better, but not quite enough to make the normality assumption tenable; we still see too much right skew in the distribution.

[Figure 7.4: Histogram of residuals (density scale) and normal Q-Q plot of residuals against theoretical quantiles for Model 3]

This looks much better! The distribution is much more normal. The QQ plot still shows some deviation at the top and bottom, but the Shapiro-Wilk test gives no evidence against normality, so the assumption is tenable.


7.7 Final Thoughts

Linear regression is used very widely in statistics, most notably because of the pleasing mathematical properties of the normal distribution. Its ease of interpretation and wide implementation in software packages add to its appeal. One should nevertheless be cautious in using it, and check that the errors (residuals) are approximately normally distributed.


Chapter 8

Logistic Regression

So now we begin to discuss the case where our outcome is not continuous. Logistic regression deals with outcomes that are binary, that is, they can take on only one of two values (almost universally coded 0 and 1). This has many applications: graduate or not graduate, contract an illness or not, get a job or not, and so on. It does pose problems for interpretation at times, because it is not as easy to study.

8.1 The Basics

So we have to model events that take on values of 0 or 1. The problem with linear regression here is that it fits a straight line, which cannot work since our values are bounded. This means we must move to a different distribution than the normal distribution.

8.2 Regression Modeling Binomial Outcomes

Contingency tables are useful when we have one categorical covariate. Contingency tables are not possible

when we have a continuous predictor or multiple predictors. Even if there is one variable of interest in

relationship to the outcome, researchers still try to control for the eﬀects of other covariates. This leads to

the use of a regression model to test the relationship between a binary outcome and one or several predictors.

8.2.1 Estimation

The basic regression model taught in introductory statistics classes is linear regression. This has a continuous outcome, and estimation is done by least squares: a line is fit to the data so that the sum of squared differences between the data points and the line is minimized. With a binomial outcome, we cannot use this estimation technique. The binomial model estimates proportions, which are bounded between 0 and 1, and a least squares model may give estimates outside these bounds. Therefore we turn to maximum likelihood and a class of models known as "Generalized Linear Models" (GLM).¹

$$\underbrace{E(y)}_{\text{Random Component}} = \underbrace{\beta_0 + \sum_{i=1}^{p} \beta_i x_i}_{\text{Systematic Component}}{}^{2} \qquad (8.1)$$

The random component is the outcome variable; it is called the random component because it is the variable whose variation we want to explain. The systematic component is the linear combination of our covariates and the parameter estimates. When our outcome is continuous we do not have to worry about establishing a linear relationship, as we assume one exists if the covariates are related to the outcome. When the outcome is categorical we cannot have this direct linear relationship, so GLMs provide a link function that allows a linear relationship between the transformed mean of the outcome and the covariates.

8.2.2 Regression for Binary Outcomes

Two of the most common link functions are the logit and probit functions. These allow us to look at a linear relationship between our outcome and our covariates. In figure 8.1 you can see there is not a lot of difference between logit and probit; the difference lies in the interpretation of coefficients (discussed below). The green line shows how a traditional regression line is not an appropriate fit, because the fitted line goes outside the 0 to 1 range of the data (the blue dots). The logit and probit fits model the probability of a success. The figure also shows that there is little difference in actual model fit between the two models; logit and probit models lead to very similar substantive conclusions, and the primary difference is in the interpretation of the results. While we do not have a true r² coefficient, there is a pseudo r², created by Nagelkerke (1992), which gives a general sense of how much variation is being explained by the predictors.

¹ For SPSS users: do not confuse this with the General Linear Model, which performs ANOVA, ANCOVA, and MANOVA.

² Some authors use α to denote the intercept term, although β₀ is still the most popular notation and will continue to be used here.


[Figure 8.1: Logit, probit, and OLS regression lines, with π(x) plotted against x; data simulated from R]


8.2.2.1 Logit

The most common model in education is the logit model, also known as logistic regression, there are two

equations we can solve, equation 8.2 allows us to get the log odds of a positive response (a ”success”).

$$\text{logit}[\pi(x)] = \log\left(\frac{\pi(x)}{1-\pi(x)}\right) = \beta_0 + \beta_p x_p \qquad (8.2)$$

The probability of a positive response is calculated from equation 8.3.

$$\pi(x) = \frac{e^{\beta_0 + \beta_p x_p}}{1 + e^{\beta_0 + \beta_p x_p}} \qquad (8.3)$$

Fitted values (either log odds or probabilities) are usually what statistical programs report, computed at the covariate values observed in the sample. A researcher can also plug in covariate values for hypothetical participants, and the model will give a probability for those values. One caution is to ensure the values you plug in are within the range of the data (e.g., if your sample ages are 18-24, do not solve the equation for a 26-year-old, since the model was fitted with data that did not include that age).

8.2.2.2 Probit

The probit function is similar: it assumes an underlying latent normal distribution for the probability, which is bounded between 0 and 1, as shown in equation 8.4. A probit model converts probabilities into z scores. For example, Agresti (2007, p. 72) notes that a probability of 0.05 corresponds to a probit of −1.645, that is, 1.645 standard deviations below the mean.

$$\text{probit}[\pi(x)] = \Phi^{-1}[\pi(x)] = \beta_0 + \beta_p x_p \qquad (8.4)$$

8.2.2.3 Logit or Probit?

As can be seen in figure 8.1, the model fit for logistic and probit regression is very similar, and this is usually true. It is also possible to rescale coefficients to convert from logit to probit or vice versa. Amemiya (1981) showed that dividing a logit coefficient by about 1.6 gives the corresponding probit coefficient. Andrew Gelman (2006) ran simulations and found scaling factors between 1.6 and 1.8 to be appropriate, which corresponds to Agresti (2007), who also mentions the scaling being between 1.6 and 1.8.
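A short sketch of this rescaling, using the same hypothetical data as above: fitting the two links and comparing the slope coefficients should show a ratio in roughly the 1.6 to 1.8 range.

> # Fit the same model with logit and probit links and compare coefficients
> fit.logit <- glm(grad ~ gpa, family = binomial(link = "logit"), data = dat)
> fit.probit <- glm(grad ~ gpa, family = binomial(link = "probit"), data = dat)
> coef(fit.logit) / coef(fit.probit)   # typically around 1.6 to 1.8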

8.2.3 Model Selection

Researchers tend to fit multiple models to find the best-fitting model consistent with their theoretical framework. There are several ways to evaluate which model fits best. Sequential model building is a technique frequently used to look at the effect of adding predictors to a regression model, and the same framework applies to other regression models as well. In linear regression the comparison of nested models uses an F test; models estimated by maximum likelihood instead use the likelihood ratio test, which follows a chi-squared distribution. Shmueli (2010) examines the differences between building a model to explain the relationship of predictors to an outcome and building a model to predict an outcome from future data. The article also discusses information criteria such as the AIC and BIC measures used to assess model fit.
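A minimal sketch of sequential model comparison for a GLM, again with hypothetical variables (age is an invented second predictor): nested models are compared with a likelihood ratio (chi-squared) test, and AIC/BIC can be compared across candidate models.

> # Likelihood ratio test for adding a predictor, plus information criteria
> fit1 <- glm(grad ~ gpa, family = binomial, data = dat)
> fit2 <- glm(grad ~ gpa + age, family = binomial, data = dat)
> anova(fit1, fit2, test = "Chisq")   # likelihood ratio test
> AIC(fit1, fit2); BIC(fit1, fit2)    # lower values indicate better relative fit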

8.3 Further Reading

This chapter borrows heavily from Alan Agresti (2007), who is well known and respected for his work in categorical data analysis. Some books that cover many statistical models yet still do a good job with logistic regression are Tabachnick & Fidell (2006) and Stevens (2009). The first is great as a textbook; Stevens is a dense book, but it has both SPSS syntax and SAS code and works well as a reference. Gelman and Hill (2007) is rapidly becoming a classic in statistical inference; its computation is focused on R, which has not yet hit mainstream academia much, but there is supplemental material at the end of the book for other programs. For those with an interest in R, another great book is by Faraway (2005). Andy Field (2009) has a classic book called "Discovering Statistics Using SPSS" which blends SPSS and statistical concepts very nicely and is good at explaining difficult statistical concepts. For students who wish to explore categorical data analysis conceptually there are a few good books; I recommend Agresti (2002), a different book from his 2007 text, with a focus on theory yet still many great examples of application. Long's (1997) book explores maximum likelihood methods focusing on categorical outcomes.


It combines conceptual and mathematical treatments of maximum likelihood. Finally, McCullagh and Nelder (1989) is a seminal work on generalized linear models (the citation here is to their well-known second edition).

8.4 Conclusions

This chapter looked at logistic regression in an introductory manner. There is more to analyzing binomial outcomes, and reading some of the works above can help. This is especially important for researchers whose outcomes will be binomial. These principles will also act as a starting point for learning about other categorical outcomes, such as nominal outcomes with more than two categories, or ordinal outcomes (often arising from Likert scales).


Bibliography

Agresti, A. (2007, March). An Introduction to Categorical Data Analysis. Hoboken, NJ: Wiley-Blackwell.

doi:10.1002/0470114754

Amemiya, T. (1981). Qualitative response models: a survey. Journal of Economic Literature, 19(4), 1483–

1536. doi:10.2298/EKA0772055N

Andersen, P. K., & Skovgaard, L. T. (2010). Regression with Linear Predictors. Statistics for Biology and

Health. New York, NY: Springer New York.

Chatterjee, S., & Hadi, A. S. (2006). Regression analysis by example (4 ed). Hoboken, NJ: Wiley-Interscience.

Everitt, B. S., & Hothorn, T. (2010). A Handbook of Statistical Analyses Using R, Second Edition.

Boca Raton, FL: Chapman and Hall/CRC.

Faraway, J. J. (2005). Extending the Linear Model with R: Generalized Linear, Mixed Eﬀects and Non-

parametric Regression Models (Chapman & Hall/CRC Texts in Statistical Science). Boca Raton, FL:

Chapman and Hall/CRC.

Faraway, J. J. (2004). Linear Models with R (Chapman & Hall/CRC Texts in Statistical Science). Boca

Raton, FL: Chapman and Hall/CRC.

Faraway, J. J. (2002). Practical Regression and ANOVA using R.

Field, A. (2009). Discovering Statistics Using SPSS (Introducing Statistical Methods). Thousand Oaks, CA:

Sage Publications Ltd.

Gelman, A., & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. New

York: Cambridge University Press.

Gelman, A. (2006). Take logit coefficients and divide by approximately 1.6 to get probit coefficients. Retrieved from http://www.andrewgelman.com/2006/06/take_logit_coef/

Lock, R. (1993). 1993 new car data. Journal of Statistics Education, 1(1). Retrieved from http://www.amstat.org/PUBLICATIONS/JSE/v1n1/datasets.lock.html

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks,

CA: SAGE Publications.

McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models, Second Edition (Chapman & Hall/CRC

Monographs on Statistics & Applied Probability). Boca Raton, FL: Chapman and Hall/CRC.

Nagelkerke, N. J. D. (1992). Maximum likelihood estimation of functional relationships. Springer-Verlag New

York.

Pearl, J. (2009a). Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146.

Pearl, J. (2009b). Causality: Models, Reasoning and Inference. Cambridge University Press.

Rencher, A., & Schaalje, B. (2008). Linear Models in Statistics (2nd ed.). Wiley-Interscience.

Shapiro, S. S., & Wilk, M. B. (1965, December). An analysis of variance test for normality (complete samples).

Biometrika, 52(3-4), 591–611. doi:10.1093/biomet/52.3-4.591

Sheather, S. J. (2009). A modern approach to regression with R. New York, NY: Springer Verlag.

Retrieved from http://www.springerlink.com/content/978-0-387-09607-0

Shmueli, G. (2010, August). To Explain or to Predict? Statistical Science, 25(3), 289–310.

Stevens, J. P. (2009). Applied Multivariate Statistics for the Social Sciences, Fifth Edition. New York, NY:

Routledge Academic.

Tabachnick, B. G., & Fidell, L. S. (2006). Using Multivariate Statistics (5th Ed.). Upper Saddle River, NJ:

Allyn & Bacon.

Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (4th Ed.). New York, NY:

Springer.
