0 Up votes0 Down votes

3 views89 pagesxaa

Apr 03, 2017

© © All Rights Reserved

PPTX, PDF, TXT or read online from Scribd

xaa

© All Rights Reserved

3 views

xaa

© All Rights Reserved

- Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
- Hidden Figures Young Readers' Edition
- The Law of Explosive Growth: Lesson 20 from The 21 Irrefutable Laws of Leadership
- The E-Myth Revisited: Why Most Small Businesses Don't Work and
- The Wright Brothers
- The Power of Discipline: 7 Ways it Can Change Your Life
- The Other Einstein: A Novel
- The Kiss Quotient: A Novel
- State of Fear
- State of Fear
- The 10X Rule: The Only Difference Between Success and Failure
- Being Wrong: Adventures in the Margin of Error
- Algorithms to Live By: The Computer Science of Human Decisions
- The Black Swan
- Prince Caspian
- The Art of Thinking Clearly
- A Mind for Numbers: How to Excel at Math and Science Even If You Flunked Algebra
- The Last Battle
- The 6th Extinction
- HBR's 10 Must Reads on Strategy (including featured article "What Is Strategy?" by Michael E. Porter)

You are on page 1of 89

+Introduction: Simple

Linear Regression Models

+

Simple Linear Regression (SLR)

Models

We always draw a scatterplot first to obtain an

idea of the strength (measured by correlation),

form (e.g., linear, quadratic, exponential), and

direction (positive or negative, in the linear

case) of any relationship that exists between two

variables.

close to 1) positive (as x increases, y increases)

linear (straight line) relationship between the

percentage of voters of each state who voted for

President Obama in 2008 and in 2012.

+

Simple Linear Regression (SLR)

Models

70 60

% Obama 2012

40 50

30

40 50 60 70

% Obama 2008

+

Simple Linear Regression (SLR)

Models

When data are collected in pairs the

standard notation used to designate this

is:

(x1, y1), (x2, y2), . . . , (xn, yn),

where x1 denotes the first value of the X-

variable and y1 denotes the first value of

the Y-variable. The X-variable is called the

explanatory variable, while the Y-

variable is called the response variable.

+

Simple Linear Regression (SLR)

Models

Forexample, the first few entries in

the election data set above were

(37.7, 40.8), (38.8, 38.4), (38.8,

36.9)

meaning 37.7% of Alabama voters

voted for President Obama in 2008,

while 40.8% did so in 2012, and so on.

+

Simple Linear Regression (SLR)

Models

We could use a matched pairs t-test to test

whether the same percentage of voters

voted for President Obama in 2008 and

2012 in the 50 states, on average. If the

percentages were indeed equal, what would

that mean for our regression model?

average percentages being equal that the percentage of

the country voting for Obama had stayed the same why

not?

+

S.L.R. Models

The regression of Y on X is linear if

determine the intercept and the slope,

respectively, of a specific straight line.

Note: the parabolic model

because the mean of Y given X is related to the

parameters via a linear function.

+

S.L.R. Models

+

S.L.R. Models

There will almost certainly be some variation in

Y due strictly to random phenomenon that

cannot be measured, predicted, or explained.

This variation is called random error, and

denoted ei.

actually occurs and our model is the error. Our

model doesnt explain everything; the amount

of variability in our response variable that

remains unexplained is measured by the error.

+

S.L.R. Models

Suppose that Y1, Y2 , , Yn are independent

realizations of the random variable Y that are

observed at the values x1 , x2 , , xn of a

random variable X . If the regression of Y on X

is linear, then for i = 1, 2, , n;

that E(ei | X) = 0.

+

Continued

Notice that the random error term does not

depend on x , nor does it contain any

information about Y (otherwise it would be a

systematic error).

variance:

+

Introduction to Least Squares

Dont get these ideas confused:

+

Introduction to Least Squares

+Deriving And Interpreting

Parameter Estimates

+

Least Squares: Derive

parameter estimates

Goal: Minimize

with respect to 0

and 1.

with two unknowns. Set

both equal to 0 and solve.

+

Least Squares: Derive

parameter estimates

First: derivative

with respect to the

intercept. Solve

for the y-intercept.

line goes through the point

+

Least Squares: Derive

parameter estimates

Second: Derivative

with respect to the

slope. Solve for

the slope.

+

Example: R Code

election<read.csv("/Users/Elizabeth//stat608/sp13/pct_obama.csv)

my.lm<lm(election$Obama_12~election$Obama_08)

my.lm

summary(my.lm)

anova(my.lm)

plot(x,y)

(e.g. the lm function), type:

?lm

+

Example: SAS Code

dataelection;

infile/Users/Elizabeth//stat608/sp13/pct_obama.csv;

inputstate$Pollclose$ElectoralObama12Romney12Pred$Obama08

Mccain08;

run;

procregdata=election;

modelObama12=Obama08;

run;

procgplotdata=election;

ploty*x;

run;

+

Example: Election Data

Explanatory variable: % voting for Obama in 08

Response variable: % voting for Obama in 12

Coefficients:

(Intercept) election$Obama_08

-4.461 1.042

+

Least Squares: Matrix Setup

+

Derivatives of Matrices

Derivatives of matrices are defined as partials

with respect to the variable vector. For example:

+

Derivatives of Matrices

The previous example was a derivative of a

scalar-by-vector on this wikipedia page:

http://en.wikipedia.org/wiki/Matrix_calculus#

Derivatives_with_matrices

derivatives in the Matrix Cookbook. (61) p. 9

(see eCampus)

+

Derivatives of Quadratic Forms

+ Least Squares: Derive

parameter estimates in matrix

notation

+

Linear Models: Inference

+ 28

George Box

but some are useful.

+ 29

Inference

Objective: Develop hypothesis tests and

confidence intervals for the slope and

intercept of a least squares regression model

1. Assumptions: Next chapter

2. Bias of Estimates

3. Variability of Estimates

In English: Is the percent of the states that

voted for Obama in 2008 linearly related to

the percentage in 2012?

+

Review: Expectation and

Variance of Random Variables

Recall the definition of variance:

correlation:

+

Expectation and Variance of

Random Variables

Assume X and Y are random variables, and a and

b are constants. Then:

+

Correlation

Correlation is unit-free (with no measurement error,

changing from meters to feet yields the same

correlation).

response variables are exchanged.

relationship between the explanatory and response

variables; correlation closer to 0 indicates a weaker

association.

+

Correlation

below average, a positive value is contributed to the sum.

When exactly one is above average, a negative value is

contributed to the sum.

When most values are in the bottom left and top right of the

plot, the correlation is positive.

+

Correlation

70

60

% Obama 2012

50

r = 0.98.

40

30

positive relationship

40 50 60 70

between 2008 and 2012.

% Obama 2008

+

Expectation and Variance of

Matrices

Assume x is a vector of random variables. Then:

+

Variance / Covariance Matrix

+

Correlation Matrix

the appropriate standard deviations

elementwise.

+

Expectation and Variance of

Matrices

Assume X and Y are matrices of random

variables; x and y are vectors of random

variables; and A, B, and C are constant matrices.

Then:

+

Quadratic Form

+

Expectation of Quadratic Form

It can be shown that

variance-covariance matrix of x, respectively,

and tr denotes the trace of a matrix.

and ; in particular, normality of x is not

required.

+ 41

Usual Assumptions

In matrix notation:

+

Bias

expected value and the parameter it should estimate.

+ 43

Intercept

Unbiased Estimates

+ Inferences About the Slope and 44

Intercept

Variance of the estimates

+

Inferences About the Slope and

Intercept

What does it mean to take the expectation of the

sample slope?

+ Inferences About the Slope 46

Distribution of slope:

1xi + ei is also normally distributed, given x. Since

the sample slope is a linear combination of the yis, it

is also normally distributed.

+ Inferences About the Slope 47

sample standard deviation. Then our test statistic

has the t-distribution (with n k degrees of freedom

in general, k being the number of columns of X).

+

Inferences About the Slope

The

standard error is the standard deviation of a statistic

(often, the sample standard deviation).

Dont mix up the critical value with the test statistic calculated

from the data! Look up the critical value on a table, or use R.

square root of MSE, RMSE, is s in the equation above.

+ Inferences About the Slope 49

Pr(>|t|)

0.00459 **

< 2e-16 ***

Confidence Interval for the slope:

+

Inferences About the Slope

+

More Inference

+ 52

intervals for the regression line:

1. Confidence intervals for the mean

2. Prediction intervals for individuals

In English: If 48% of a state voted for

Obama in 2008, what percentage voted

for him in 2012?

+ 53

+ 54

+ 55

Confidence interval for slope:

If x increases by 1 unit, by how much does y increase or decrease?

Deals with sampling distribution of sample slope from one sample to

another.

Can use CLT to say the sample slope is approximately normal for large

n.

For a given value of x, what is the mean value of y?

Deals with sampling distribution of sample mean from one sample to

another.

Can use CLT to say the sample mean is approximately normal for large

n.

For a given value of x, what are the individual values of y?

Deals with distribution of individuals about the regression line.

The CLT is NOT useful in this case. We need to know the distribution of

the errors.

+ 56

Population Regression Line

interval for the unknown population mean at a

given value of X, which we shall denote by x*.

is given by

+ 57

Population Regression Line

Anestimator of this unknown quantity is the

value of the estimated regression equation at

X = x*, namely:

+ 58

Sampled?

How many populations were sampled in

the election data?

+ 59

do we Have?

___ means

___ variances

However, we assume that

+ 60

Variance of mean

+ 61

+ 62

Population Regression Line

A 100(1 a)% confidence interval for

distribution with n - 2 degrees of freedom.

+ 63

Population Regression Line

Does the confidence interval for the mean y-value get

narrower as n gets bigger?

your desired x-value is farther from the mean for x?

+ 64

Value of Y

+ 65

actual value of Y

+Dummy (Indicator)

Variable Regression

+ 67

Experiment: Does increasing calcium intake

decrease blood pressure?

+ 68

For placebo group:

+ 69

21 total men; 10

took the calcium

supplement, and 11

took the placebo.

+ 70

indicator of taking the placebo and one an indicator of

taking the calcium supplement?

the matrix X must be of full rank.

+ 71

+ 72

+ 73

+ 74

my.lm<-lm(y~x)

summary(my.lm)

Output:

(Intercept) -0.2727 2.2266 -0.122 0.904

x 5.2727 3.2267 1.634 0.119

Difference between means = 5.27, as expected.

+ 75

+

Geometry

+ 77

Geometric Interpretation of

LetV be the vector space spanned by the

columns of our design matrix X.

Then is the projection of the vector y down

into the vector space V.

Is a in V?

+ 78

Geometric Interpretation of

+ 79

Geometric Interpretation of

+ 80

Geometric Interpretation of

+ 81

Geometric Interpretation of

+ 82

Geometric Interpretation of

Is it true that

What is

+

ANOVA Table

+

ANalysis Of VAriance (ANOVA)

measuring the total amount of variability in the data

set.

one explained by x, and the leftovers.

+ 85

ANOVA Table

Were considering two sources of variation using this

regression model:

considered.

yi = ( 0 + 1xi) + ( i)

is not really about variances; the point in a regression model is

to study how y changes with x.

86

+ 87

ANOVA Table

+

ANOVA Table

_____________ decreases.

increases.

+

ANOVA Table

Degre

es of

Freedo Sums of Mean F- P-

m Squares Squares Statistic value

Regressi

on 1

(Model)

Residual

n2

(Error)

Total

n1

- Raja Genius 10Uploaded bysenthil
- Ms18d003 Ass 1 Ques 1Uploaded byshiva prakash
- CVEN2002 Week6Uploaded byKai Liu
- 1st Quarter Test in STATISTICSUploaded byrosemarie
- T10_MixModelsUploaded bymaleticj
- Stats Lecture 06 Sampling DistributionsUploaded byKatherine Sauer
- 734.pdfUploaded byMalcolm Christopher
- Solution HW3Uploaded bymrezza
- ica algo(notes11)Uploaded bygibsatwork
- 2,6,10,14 part of 20Uploaded bytgidkp
- calculozscoreUploaded byafroditoman
- Purchase Intention of International Branded Clothes Fashion among Younger’s in JakartaUploaded byVictor Brian B Martin
- HandoutsUploaded byIan Cronbach
- AppendicesUploaded byPika Wabbles
- Normal Distributions Grading on the BellcurveUploaded bykokolay
- IGNOU MBA MS-05 Free Solved Assignment 2012Uploaded byahesan ali
- IGNOU MBA MS-05 Free Solved Assignment 2012Uploaded byjignesh07
- SN2558Uploaded bycarlodoimo
- [2] Correlation and Regression - The Simple CaseUploaded bySonny Boy P. Aniceto Jr.
- 9-07 Factorial Planning ProcessUploaded byDorababu Adapa
- Berdingetal.1991aUploaded byOpal Priya Wening
- Berdingetal.1991a.pdfUploaded byOpal Priya Wening
- On the Regularity of the Distribution of the Maximum by Azais and WscheborUploaded byadaniliu13
- Core AssignmentUploaded bytewodros
- Detection of Facial Feature Points UsingUploaded byOmar Rigane
- 8_ckhouryAbstractUploaded byPravin Singh
- 24303-49919-1-PBUploaded byYeni Septiani
- Customer Service Department and Clients Satisfaction: An Empirical Study of Commercial Bank of Surkhet, ValleyUploaded byThe Ijbmt
- STAT-112-20172S-_-y-ExamWeek-10_-Quarterly-Exam-_-Quarterl.docxUploaded byArman
- International BusinessUploaded bySahibzada Muhammad Hamza

- 40 Questions on Probability for Data Science - [Solution_ SkillPower - Probability, DataFest 2017]Uploaded byjstpallav
- Abhijeet Shekhar ResumeUploaded byjstpallav
- Pay StatementsUploaded byjstpallav
- Pallav Anand 2684 M.S. ReportUploaded byjstpallav
- Understanding XGBoost Model on Otto DatasetUploaded byjstpallav
- 31dd395f-93ab-44c9-bb6c-7148c59a1caa.pdfUploaded byjstpallav
- Cover Letter OrUploaded byjstpallav
- Resume Pal LavUploaded byjstpallav
- What Do Hiring Managers Look for in a Data Scientist’s CV_ _ Ben Dias _ Pulse _ LinkedInUploaded byjstpallav
- Data Science Interview QuestionsUploaded byjstpallav
- c5b45ddb3ff1f62 EkUploaded byjstpallav
- Data Science Hiring GuideUploaded byramesh158
- Choosing a Machine Learning ClassifierUploaded byjstpallav
- PythonScikitLearnMiniProjects:1KMeansClusteringHandwrittenDigits.ipynb at Master · 04pallav:PythonScikitLearnMiniProjectsUploaded byjstpallav
- Airline Data DescriptionUploaded byjstpallav
- 44368013 Interview Questions ExperiencesUploaded byjstpallav
- Stat 608 Chapter 3Uploaded byjstpallav
- Stat 608 Chapter 4Uploaded byjstpallav
- Stat 608 Chapter 5Uploaded byjstpallav
- Stat 608 Chapter 6Uploaded byjstpallav
- Dates TimesUploaded byPallav Anand
- Case Study 3PallavUploaded byjstpallav
- Final Project ReportUploaded byjstpallav
- LP HW1Uploaded byjstpallav
- error analysisUploaded byManu Chakkingal
- MMAUGuideUploaded byjstpallav

- Accuracy AssessmentUploaded byNasirajk
- Chapter 2 of Two- Theory of Control ChartUploaded byAmsalu Setey
- Walang kwentaUploaded byGenelle Mae Madrigal
- Https Onlinecourses.science.psuUploaded byKadiri Olanrewaju
- Lecture 1Uploaded byVishnu Venugopal
- 201-HTH-05Uploaded byYo_amaranth
- evermann_slides.pdfUploaded byEgyptian Researcher
- D_Tutorial 6 (Solutions)Uploaded byAlfie
- Multiple linear regressionUploaded byShameer P Hamsa
- Multiple Imputation for Missing Data- A Cautionary TaleUploaded byDavid Sacco
- Confidence Intervals for the Area Under an ROC Curve.pdfUploaded byscjofyWFawlroa2r06YFVabfbaj
- Assignment 3Uploaded byDaniel Cryer
- Chapter6 Anova BibdUploaded byShailendra Kr Yadav
- Using Gretl for POE4Uploaded bysanjita
- Ox Metrics IntroUploaded bysweetbabywinne
- alleahUploaded byAlleah Laylo
- Modelling Greek Industrial Production IndexUploaded byItalo Arbulú Villanueva
- An Introduction to Applied multivariate analisis with R.pdfUploaded byandres
- Regression AnalysisUploaded bypravesh1987
- bioassayUploaded byRicky
- 6691_01R_que_2013aerger0613Uploaded byyvg95
- ChoiceModelR ManualUploaded byIda Bagus Ketut Wedastra
- CFA-AMOSUploaded byTransportasi Masal
- aUploaded bygudun
- Industrial Engineering by S K MondalUploaded byRaghuChouhan
- Experimental Design and Analysis SeltmanUploaded byLarryag12196
- Machine Learning 3Uploaded byanupam20099
- Econometrics Note3sr StUploaded byShashank Jain
- Module 5 - Data AnalysisUploaded byVishal Sharma
- GCSE Mathematics Stats CourseworkUploaded bymarkhowson999

## Much more than documents.

Discover everything Scribd has to offer, including books and audiobooks from major publishers.

Cancel anytime.