Research paper on regression & hypothesis testing

© All Rights Reserved


Regression

Goals

Linear regression in R

Estimating parameters and hypothesis testing with linear models

Develop basic concepts of linear regression from a probabilistic framework

Regression

Technique used for the modeling and analysis of numerical data

Exploits the relationship between two or more variables so that we can gain information about one of them through knowing values of the other

Regression can be used for prediction, estimation, hypothesis testing, and modeling causal relationships

Regression Lingo

$$Y = X_1 + X_2 + X_3$$

$Y$: Dependent Variable, Outcome Variable, Response Variable

$X_1, X_2, X_3$: Independent Variable, Predictor Variable, Explanatory Variable

Why Linear Regression?

Suppose we want to model the dependent variable $Y$ in terms of three predictors, $X_1$, $X_2$, $X_3$:

$$Y = f(X_1, X_2, X_3)$$

Typically we will not have enough data to try to directly estimate $f$

Therefore, we usually have to assume that it has some restricted form, such as linear:

$$Y = X_1 + X_2 + X_3$$

Linear Regression is a Probabilistic Model

Much of mathematics is devoted to studying variables that are deterministically related to one another

$$y = \beta_0 + \beta_1 x$$

[Figure: the line $y = \beta_0 + \beta_1 x$ plotted against $x$, with intercept $\beta_0$ and slope $\beta_1 = \Delta y / \Delta x$]

But we're interested in understanding the relationship between variables related in a nondeterministic fashion

A Linear Probabilistic Model

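Definition: There exist parameters $\beta_0$, $\beta_1$, and $\sigma^2$ such that for any fixed value of the independent variable $x$, the dependent variable is related to $x$ through the model equation

$$y = \beta_0 + \beta_1 x + \varepsilon$$

$\varepsilon$ is a random variable (rv) assumed to be $N(0, \sigma^2)$

True Regression Line: $y = \beta_0 + \beta_1 x$

[Figure: observations scattered about the true regression line, with vertical deviations $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$; intercept $\beta_0$, axes $x$ and $y$]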

Implications

The expected value of $Y$ is a linear function of $X$, but for fixed $x$, the variable $Y$ differs from its expected value by a random amount

Formally, let $x^*$ denote a particular value of the independent variable $x$, then our linear probabilistic model says:

$$E(Y \mid x^*) = \mu_{Y \mid x^*} = \text{mean value of } Y \text{ when } x \text{ is } x^*$$

$$V(Y \mid x^*) = \sigma^2_{Y \mid x^*} = \text{variance of } Y \text{ when } x \text{ is } x^*$$

Graphical Interpretation

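[Figure: the true regression line $y = \beta_0 + \beta_1 x$, with the points $\beta_0 + \beta_1 x_1$ and $\beta_0 + \beta_1 x_2$ marked above $x_1$ and $x_2$; axes $x$ and $y$]

$$\mu_{Y \mid x_1} = \beta_0 + \beta_1 x_1 \qquad \mu_{Y \mid x_2} = \beta_0 + \beta_1 x_2$$

For example, if $x$ = height and $y$ = weight, then $\mu_{Y \mid x=60}$ is the average weight for all individuals 60 inches tall in the population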

One More Example

Suppose the relationship between the independent variable height ($x$) and dependent variable weight ($y$) is described by a simple linear regression model with true regression line $y = 7.5 + 0.5x$ and $\sigma = 3$

Q1: What is the interpretation of $\beta_1 = 0.5$?

The expected change in weight associated with a 1-unit increase in height

Q2: If $x = 20$ what is the expected value of $Y$?

$$\mu_{Y \mid x=20} = 7.5 + 0.5(20) = 17.5$$

Q3: If $x = 20$ what is $P(Y > 22)$?

$$P(Y > 22 \mid x = 20) = P\left(Z > \frac{22 - 17.5}{3}\right) = 1 - \Phi(1.5) = 0.067$$
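The arithmetic in Q2 and Q3 can be checked numerically. The sketch below uses Python's standard library rather than R, and the variable names are illustrative choices, not part of the slides.

```python
from statistics import NormalDist

# Values from the example: true line y = 7.5 + 0.5x, sigma = 3.
beta0, beta1, sigma = 7.5, 0.5, 3.0
x = 20.0

# Q2: conditional mean of Y at x.
mean_y = beta0 + beta1 * x

# Q3: upper-tail probability P(Y > 22 | x = 20), with Y ~ N(mean_y, sigma^2).
p_upper = 1.0 - NormalDist(mu=mean_y, sigma=sigma).cdf(22.0)

print(mean_y)             # 17.5
print(round(p_upper, 3))  # 0.067
```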

Estimating Model Parameters

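Point estimates of $\beta_0$ and $\beta_1$ are obtained by the principle of least squares

$$f(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

[Figure: scatter plot of $y$ versus $x$ with the fitted least-squares line and its intercept]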
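As a minimal sketch of the least-squares principle, the code below computes both estimates in closed form. The slope formula $\hat{\beta}_1 = S_{xy} / S_{xx}$ is the standard result of minimising $f(\beta_0, \beta_1)$ and is not shown on the slide. The function name and the data are hypothetical, invented only for illustration.

```python
from statistics import fmean

def least_squares(xs, ys):
    """Return (b0, b1) minimising sum((y - (b0 + b1 * x)) ** 2)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate: y-bar minus slope times x-bar
    return b0, b1

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
print(round(b0, 2), round(b1, 2))  # 0.05 1.99
```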

Predicted and Residual Values

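Predicted, or fitted, values are values of $y$ predicted by the least-squares regression line obtained by plugging in $x_1, x_2, \ldots, x_n$ into the estimated regression line

$$\hat{y}_1 = \hat{\beta}_0 + \hat{\beta}_1 x_1 \qquad \hat{y}_2 = \hat{\beta}_0 + \hat{\beta}_1 x_2$$

Residuals are the deviations of observed values from predicted values

$$e_1 = y_1 - \hat{y}_1 \qquad e_2 = y_2 - \hat{y}_2$$

[Figure: scatter plot of $y$ versus $x$ with the fitted line; residuals $e_1$, $e_2$, $e_3$ drawn as vertical distances between observed values ($y_1$) and fitted values ($\hat{y}_1$)]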
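Fitted values and residuals can be sketched in a few lines. The data and the coefficient values below are hypothetical; the coefficients are assumed to be the least-squares estimates for these particular data, which is why the residuals sum to zero up to rounding.

```python
# Hypothetical data and hypothetical fitted coefficients, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0_hat, b1_hat = 0.05, 1.99  # assumed least-squares estimates for these data

# Fitted values: plug each x_i into the estimated regression line.
fitted = [b0_hat + b1_hat * x for x in xs]

# Residuals: observed value minus fitted value.
residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]

for x, y, y_hat, e in zip(xs, ys, fitted, residuals):
    print(f"x={x:.0f}  y={y:.2f}  fitted={y_hat:.2f}  residual={e:+.2f}")
```

With an intercept in the model, least-squares residuals always sum to zero, which makes a quick sanity check on any fit.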

Residuals Are Useful!

They allow us to calculate the error sum of squares (SSE):

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Which in turn allows us to estimate $\sigma^2$:

$$\hat{\sigma}^2 = \frac{SSE}{n - 2}$$

As well as an important statistic referred to as the coefficient of determination:

$$r^2 = 1 - \frac{SSE}{SST}, \qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
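The three quantities above follow directly from the residuals. The sketch below fits a simple linear regression to hypothetical data (invented for illustration) and then computes SSE, the variance estimate, and $r^2$; all variable names are assumptions.

```python
from statistics import fmean

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)

# Least-squares estimates for the simple linear regression model.
x_bar, y_bar = fmean(xs), fmean(ys)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar

# Error sum of squares: squared residuals summed over all observations.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Total sum of squares: squared deviations of y from its mean.
sst = sum((y - y_bar) ** 2 for y in ys)

sigma2_hat = sse / (n - 2)   # estimate of sigma^2 (two parameters estimated)
r2 = 1.0 - sse / sst         # coefficient of determination

print(round(sse, 3), round(sigma2_hat, 4), round(r2, 4))
```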

Multiple Linear Regression

Extension of the simple linear regression model to two or more independent variables

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

Partial Regression Coefficients: $\beta_i \equiv$ effect on the dependent variable when increasing the $i$th independent variable by 1 unit, holding all other predictors constant

Expression = Baseline + Age + Tissue + Sex + Error

Categorical Independent Variables

Qualitative variables are easily incorporated in the regression framework through dummy variables

Simple example: sex can be coded as 0/1

What if my categorical variable contains three levels:

$$x_i = \begin{cases} 0 & \text{if AA} \\ 1 & \text{if AG} \\ 2 & \text{if GG} \end{cases}$$

Categorical Independent Variables

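Previous coding would result in collinearity

Solution is to set up a series of dummy variables. In general, for $k$ levels you need $k-1$ dummy variables

$$x_1 = \begin{cases} 1 & \text{if AA} \\ 0 & \text{otherwise} \end{cases} \qquad x_2 = \begin{cases} 1 & \text{if AG} \\ 0 & \text{otherwise} \end{cases}$$

Genotype   x_1   x_2
AA         1     0
AG         0     1
GG         0     0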
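The $k-1$ dummy coding can be sketched as a small helper. The function name `dummy_code` and the sample of genotypes are hypothetical; the last level passed in is treated as the reference level, matching GG in the table.

```python
def dummy_code(values, levels):
    """Code a categorical variable with k levels as k-1 indicator columns.

    The last entry of `levels` is the reference level and is coded as all zeros.
    """
    indicator_levels = levels[:-1]
    return [
        [1 if value == level else 0 for level in indicator_levels]
        for value in values
    ]

# Hypothetical sample of genotypes, for illustration only.
genotypes = ["AA", "AG", "GG", "AG"]
coded = dummy_code(genotypes, ["AA", "AG", "GG"])
print(coded)  # [[1, 0], [0, 1], [0, 0], [0, 1]]
```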

Hypothesis Testing: Model Utility Test (or Omnibus Test)

The first thing we want to know after fitting a model is whether any of the independent variables (X's) are significantly related to the dependent variable (Y):

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$

$$H_A: \text{At least one } \beta_i \neq 0$$

$$f = \frac{R^2 / k}{(1 - R^2) / [n - (k + 1)]}$$

Rejection Region: $f \geq F_{\alpha,\, k,\, n-(k+1)}$

Equivalent ANOVA Formulation of Omnibus Test

We can also frame this in our now familiar ANOVA framework

- partition total variation into two components: SSE (unexplained variation) and SSR (variation explained by linear model)

Source of Variation   df            Sum of Squares   MS                          F
Regression            k             SSR              MSR = SSR / k               MSR / MSE
Error                 n - (k + 1)   SSE              MSE = SSE / [n - (k + 1)]
Total                 n - 1         SST

The error degrees of freedom reduce to n - 2 in simple linear regression, where k = 1.

$$SSR = \sum (\hat{y}_i - \bar{y})^2 \qquad SSE = \sum (y_i - \hat{y}_i)^2 \qquad SST = \sum (y_i - \bar{y})^2$$

Rejection Region: $f \geq F_{\alpha,\, k,\, n-(k+1)}$
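The partition of total variation, and the agreement between the ANOVA form of the statistic and the $R^2$ form, can be checked numerically. This is a sketch on hypothetical data (invented for illustration) with a single predictor, so $k = 1$; the variable names are assumptions.

```python
from math import isclose
from statistics import fmean

# Hypothetical data, invented for illustration only; one predictor, so k = 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n, k = len(xs), 1

# Least-squares fit of the simple linear regression model.
x_bar, y_bar = fmean(xs), fmean(ys)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in xs]

# The three sums of squares from the ANOVA table.
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, fitted))
sst = sum((y - y_bar) ** 2 for y in ys)

# ANOVA form of the statistic: mean square regression over mean square error.
f_anova = (ssr / k) / (sse / (n - (k + 1)))

# Equivalent R^2 form of the statistic.
r2 = 1.0 - sse / sst
f_r2 = (r2 / k) / ((1.0 - r2) / (n - (k + 1)))

print(isclose(sst, ssr + sse))                # True
print(isclose(f_anova, f_r2, rel_tol=1e-6))   # True
```

Deciding whether to reject still requires comparing the statistic with the critical value of the F distribution, which the Python standard library does not provide.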

F Test For Subsets of Independent Variables

A powerful tool in multiple regression analyses is the ability to compare two models

For instance, say we want to compare:

$$\text{Full Model}: \; y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon$$

$$\text{Reduced Model}: \; y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

Again, another example of ANOVA:

$$f = \frac{(SSE_R - SSE_F) / (k - l)}{SSE_F / [n - (k + 1)]}$$

$SSE_R$ = error sum of squares for reduced model with $l$ predictors

$SSE_F$ = error sum of squares for full model with $k$ predictors

Example of Model Comparison

We have a quantitative trait and want to test the effects at two markers, M1 and M2.

Full Model: Trait = Mean + M1 + M2 + (M1*M2) + error

Reduced Model: Trait = Mean + M1 + M2 + error

$$f = \frac{(SSE_R - SSE_F) / (3 - 2)}{SSE_F / [100 - (3 + 1)]} = \frac{SSE_R - SSE_F}{SSE_F / 96}$$

Rejection Region: $f \geq F_{\alpha,\, 1,\, 96}$
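The model-comparison statistic is simple to compute once both error sums of squares are known. In the sketch below, the function name `partial_f` and the two SSE values are hypothetical; only $n = 100$, $k = 3$, and $l = 2$ come from the example.

```python
def partial_f(sse_reduced, sse_full, k, l, n):
    """F statistic comparing a full model (k predictors) with a nested
    reduced model (l predictors), fitted to the same n observations."""
    numerator = (sse_reduced - sse_full) / (k - l)
    denominator = sse_full / (n - (k + 1))
    return numerator / denominator

# n, k, l follow the marker example; the SSE values are invented for illustration.
f_stat = partial_f(sse_reduced=250.0, sse_full=240.0, k=3, l=2, n=100)
print(f_stat)  # 4.0, to be compared with the critical value F_{alpha, 1, 96}
```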

Hypothesis Tests of Individual Regression Coefficients

Hypothesis tests for each $\beta_i$ can be done by simple t-tests:

$$H_0: \beta_i = 0 \qquad H_A: \beta_i \neq 0$$

$$T = \frac{\hat{\beta}_i - \beta_i}{se(\hat{\beta}_i)}$$

Critical value: $t_{\alpha/2,\, n-(k+1)}$

Confidence intervals are equally easy to obtain:

$$\hat{\beta}_i \pm t_{\alpha/2,\, n-(k+1)} \cdot se(\hat{\beta}_i)$$

Checking Assumptions

Critically important to examine data and check assumptions underlying the regression model

- Outliers
- Normality
- Constant variance
- Independence among residuals

Standard diagnostic plots include:

- scatter plots of $y$ versus $x_i$ (outliers)
- qq plot of residuals (normality)
- residuals versus fitted values (independence, constant variance)
- residuals versus $x_i$ (outliers, constant variance)

We'll explore diagnostic plots in more detail in R

Fixed -vs- Random Effects Models

In ANOVA and regression analyses our independent variables can be treated as Fixed or Random

Fixed Effects: variables whose levels are either sampled exhaustively or are the only ones considered relevant to the experimenter

Random Effects: variables whose levels are randomly sampled from a large population of levels

Example from our recent AJHG paper:

Expression = Baseline + Population + Individual + Error
