$E(y \mid x) = \beta_0 + \beta_1 x$
Equation for the line in the graph
Note that the equation has the same format as y=mx+b
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
What is epsilon?
o The vertical distance between the point on the regression line at $x_i$ and the actual scatter plot point at $x_i$
o Hence: $y_i = E(y \mid x_i) + \varepsilon_i$
$\hat{y} = b_0 + b_1 x$
$y_i = b_0 + b_1 x_i + e_i$
$y_i = \hat{y}_i + e_i$
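A minimal sketch of how fitted values and residuals relate. The data and the coefficients `b0` and `b1` below are made up purely for illustration; they are not from any problem in these notes.

```python
# Hypothetical data and coefficients, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
b0, b1 = 0.1, 1.95  # assumed sample estimates of beta_0 and beta_1

# Fitted value on the line at each x_i: yhat_i = b0 + b1 * x_i
y_hat = [b0 + b1 * xi for xi in x]

# Residual e_i: vertical distance from the line to the observed point
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# Each observation decomposes as y_i = yhat_i + e_i
for yi, yh, ei in zip(y, y_hat, residuals):
    assert abs(yi - (yh + ei)) < 1e-12
```

A positive residual means the observed point sits above the line; a negative one means it sits below.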
What is correlation?
Two variables are correlated if there is a positive or negative linear relationship
1 = perfectly positive correlation, -1 = perfectly negative, 0 = no correlation
$\operatorname{corr}(x, y) = \rho_{x,y} = \dfrac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$
What is covariance?
How x and y vary together
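To make the link between covariance and correlation concrete, here is a small sketch. It uses a made-up sample and the sample versions of the formulas (dividing by n − 1), which is an assumption on my part since the formula above is written with population symbols.

```python
import math

# Hypothetical sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: how x and y vary together
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation is the covariance scaled by both standard deviations,
# which forces it into the range [-1, 1].
r = s_xy / (s_x * s_y)
print(round(s_xy, 4), round(r, 4))
```

Covariance depends on the units of x and y, while correlation does not, which is why correlation is easier to interpret.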
List of equations
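$b_0 = \bar{y} - b_1 \bar{x}$
$\bar{y} = b_0 + b_1 \bar{x}$
$b_1 = \dfrac{s_{x,y}}{s_x^2} = \dfrac{SS_{xy}}{SS_{xx}} = r_{x,y} \dfrac{s_y}{s_x}$
$R^2 = (r_{x,y})^2$
$SSR = R^2 \times SST$
$SSE = SST - SSR$
$MSE = \dfrac{SSE}{n-2} = s_e^2$
$s_e = \sqrt{MSE}$
$SS_{xx} = s_x^2 \times (n-1)$
$s_{b_1} = \dfrac{s_e}{\sqrt{SS_{xx}}}$
$s_y^2 = \dfrac{SST}{n-1} \;\rightarrow\; n = \dfrac{SST}{s_y^2} + 1$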
o This will be useful in problems where we’re not given n
$SS_{yy} = SST = \sum (y_i - \bar{y})^2$
$MSR = \dfrac{SSR}{df_R}$
o This is for the ANOVA table
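The formulas in this list can be checked against each other on a tiny dataset. The sketch below uses made-up numbers (not from any problem here) and computes each quantity directly from its definition.

```python
import math

# Hypothetical sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squares and cross-products
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

# Slope and intercept
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Error and regression sums of squares
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = sst - sse
r_squared = ssr / sst

# Mean squared error, residual standard deviation, standard error of b1
mse = sse / (n - 2)
s_e = math.sqrt(mse)
s_b1 = s_e / math.sqrt(ss_xx)
```

For this sample the slope comes out to 0.6 and the intercept to 2.2, so the line passes through the point of means (3, 4), as the second formula in the list requires.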
1. Find $r_{x,y}$, $R^2$, $s_{b_1}$, $s_e$, SSR, SSE, MSE. (21 points)
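$r_{x,y} = \dfrac{b_1 s_x}{s_y} = \dfrac{-3.835 \times \sqrt{1.09}}{\sqrt{36.552}} = -0.66225$
$R^2 = (r_{x,y})^2 = (-0.66225)^2 = 0.4386$
$SSR = R^2 \times SST = 0.4386 \times 3618.648 = 1587.057$
$SSE = SST - SSR = 3618.648 - 1587.057 = 2031.59$
$MSE = \dfrac{SSE}{n-2} = \dfrac{2031.59}{n-2}$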
To find n :
$n = \dfrac{SST}{s_y^2} + 1 = \dfrac{3618.648}{36.552} + 1 = 100$
Hence:
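$MSE = \dfrac{2031.59}{n-2} = \dfrac{2031.59}{100-2} = 20.731$
$s_e = \sqrt{MSE} = \sqrt{20.731} = 4.553$
$s_{b_1} = \dfrac{s_e}{\sqrt{SS_{xx}}} = \dfrac{4.553}{\sqrt{s_x^2 \times (n-1)}} = \dfrac{4.553}{\sqrt{1.09 \times (100-1)}} = 0.4383$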
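The chain of calculations in problem 1 can be reproduced in a few lines. This is only a sketch: the inputs are the numbers that appear in the worked solution, and the variable names are my own.

```python
import math

# Inputs as they appear in the worked solution
b1 = -3.835
s_x_sq = 1.09      # sample variance of x
s_y_sq = 36.552    # sample variance of y
sst = 3618.648

# Correlation from the slope and the two standard deviations
r = b1 * math.sqrt(s_x_sq) / math.sqrt(s_y_sq)
r_squared = r ** 2

# Split SST into its regression and error parts
ssr = r_squared * sst
sse = sst - ssr

# Recover n from SST and the variance of y, then finish the chain
n = round(sst / s_y_sq + 1)
mse = sse / (n - 2)
s_e = math.sqrt(mse)
s_b1 = s_e / math.sqrt(s_x_sq * (n - 1))

print(n, round(r, 5), round(s_e, 3), round(s_b1, 4))
```

Because the code carries full precision instead of rounding at each step, the later digits can differ slightly from the hand calculation.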
What is $\beta^*$? It’s the value of $\beta_1$ that we’re testing in our hypothesis test (in this case, −2).
Now, we find $t_\alpha$:
Figure out if it’s a one-tailed or two-tailed test. If HA has “<” or “>”, it’s one-tailed. If HA has “≠”, it’s
two-tailed. In this case, we have a one-tailed test. That means all of α sits in one tail, so we look up the
bottom 95% of the area under the curve rather than the middle 95%. Below is an example of one- and
two-tailed tests where $\alpha = 0.05$ in both:
Whenever n (our sample size) is ≥ 30, the t distribution is close enough to the standard normal that we can use t and z interchangeably.
Since n = 100, we’ll use a z-table.
We know that $\alpha = 0.05$. That means that we’re looking for the bottom (since it’s one-tailed) 95%, or
0.95, under the curve. To find the correct value for $z_\alpha$ (aka $t_\alpha$), we look for 0.95 inside the z-table. We
find that 0.95 is directly in between 0.9495 and 0.9505, whose z-values are 1.64 and 1.65 respectively.
Thus, we take the average of the two and find a z-value of 1.645. This is our $t_\alpha$.
$t_\alpha = 1.645$
Now, all that’s left is to compare our t and $-t_\alpha$:
RR: $t < -t_\alpha$
$-4.187 < -1.645$
Since $-4.187 < -1.645$, we reject H0 in favor of HA at $\alpha = 0.05$. Thus, there is evidence at $\alpha = 0.05$ that
$\beta_1 < -2$.
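A sketch of this test in code. The step that computes t is not shown in this part of the notes, so the formula t = (b1 − β*) / s_b1 is my assumption; it does reproduce the −4.187 used above. The critical value comes from Python's `statistics.NormalDist` instead of a printed z-table.

```python
from statistics import NormalDist

b1 = -3.835        # sample slope from problem 1
beta_star = -2.0   # hypothesized value of beta_1 under H0
s_b1 = 0.4383      # standard error of b1 from problem 1
alpha = 0.05

# Assumed form of the test statistic (see the note above)
t = (b1 - beta_star) / s_b1

# z-value with 95% of the area below it, used in place of t_alpha
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Left-tailed rejection region: reject H0 when t < -z_alpha
reject_h0 = t < -z_alpha
print(round(t, 3), round(z_alpha, 3), reject_h0)
```

The exact z-value is about 1.6449, so averaging the two table entries to 1.645 is a very close approximation.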
3. Construct a 95% CI for $\beta_1$.
This is the structure for this kind of confidence interval:
$b_1 \pm t_{\alpha/2} \times s_{b_1}$
We’re looking for $z_{0.025}$. To find it, we look inside the z-table for 0.975.
The z-value that corresponds to 0.975 is 1.96. That means that $z_{0.025} = z_{\alpha/2} = t_{\alpha/2} = 1.96$.
Now, we just plug that back into the CI formula:
$b_1 \pm t_{\alpha/2} \times s_{b_1} = -3.835 \pm 1.96 \times 0.4383 = -3.835 \pm 0.8591$, or about $(-4.694, -2.976)$
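As a check on the plug-in step, here is a short sketch that builds the interval from the values found in problem 1. It uses the normal approximation for the critical value, as the notes do for n ≥ 30; the variable names are my own.

```python
from statistics import NormalDist

b1 = -3.835     # sample slope from problem 1
s_b1 = 0.4383   # standard error of b1 from problem 1
confidence = 0.95
alpha = 1 - confidence

# z_{alpha/2}: the z-value with area 1 - alpha/2 = 0.975 below it
z = NormalDist().inv_cdf(1 - alpha / 2)

# Interval is the estimate plus or minus the margin of error
margin = z * s_b1
lower, upper = b1 - margin, b1 + margin
print(round(lower, 3), round(upper, 3))
```

The whole interval lies below zero, which agrees with the negative slope estimate.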
4. Construct a 90% CI for $\mu_y$, the true population mean of the y values, and interpret the CI.
Almost exactly the same as #3, except the variables are a bit different in this CI formula:
$\bar{y} \pm t_{\alpha/2} \times \dfrac{s_y}{\sqrt{n}}$
Let’s start with finding $t_{\alpha/2}$. Since it’s a 90% CI, $\alpha = 0.1$ (because $1 - 0.9 = 0.1$), so $\alpha/2 = 0.05$.
To find $t_{\alpha/2}$, we look for $z_{\alpha/2} = z_{0.05}$, which can be found by looking for 0.95 in the z-table.
Why 0.95? Because if $z_{\alpha/2} = z_{0.05}$, then we need the bottom $1 - 0.05 = 0.95$ under the curve.
$z_{0.05} = 1.645 = t_{\alpha/2}$
$\sqrt{n} = \sqrt{100} = 10$
$s_y = \sqrt{36.552} = 6.0458$
$\bar{y} = b_0 + b_1 \bar{x} = 42.59 + (-3.835 \times 9.937) = 4.4816$
CI: $4.4816 \pm 1.645 \left( \dfrac{6.0458}{10} \right)$
We are 90% confident that $\mu_y$ is in the above interval. This is because if we repeat the process many
times, then approximately 90% of the resulting intervals would include $\mu_y$.
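The endpoints of this interval can be worked out with a short sketch. All inputs are the numbers used in the solution; the critical value again comes from the normal approximation, and the variable names are my own.

```python
import math
from statistics import NormalDist

b0, b1, x_bar = 42.59, -3.835, 9.937
s_y = math.sqrt(36.552)   # sample standard deviation of y
n = 100

# The regression line passes through (x_bar, y_bar)
y_bar = b0 + b1 * x_bar

# 90% CI, so alpha/2 = 0.05 and we need the z-value with 0.95 below it
z = NormalDist().inv_cdf(0.95)
margin = z * s_y / math.sqrt(n)
lower, upper = y_bar - margin, y_bar + margin
print(round(lower, 3), round(upper, 3))
```

With these inputs the interval runs from roughly 3.49 to 5.48.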
Call:
lm(formula = mpg ~ wt, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Based upon the regression output in RStudio, answer the following questions.
5. Is there evidence at alpha = .05 of a negative linear relationship between mpg and wt?
This is another hypothesis test. We’re trying to figure out if there is a negative linear relationship. The slope
$\beta_1$ is an indicator of that. If $\beta_1$ is zero, there is no linear relationship. If $\beta_1$ is positive, there is a positive
linear relationship. If $\beta_1$ is negative, there is a negative linear relationship. Hence…
Below are the null and alternative hypotheses:
H0: $\beta_1 = 0$
HA: $\beta_1 < 0$
Test statistic: $t = \dfrac{b_1}{s_{b_1}}$
These values can be found in the RStudio output we’re given. In the Coefficients table, the Estimate
column of the wt row gives $b_1$ and the Std. Error column of the same row gives $s_{b_1}$.
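$t = \dfrac{b_1}{s_{b_1}} = \dfrac{-5.3445}{0.5591} = -9.559$
We also know that the p-value is in the bottom right corner of the coefficients table.
p-value = 1.29e-10 ≈ 0
R reports this p-value for a two-sided test. For our one-tailed test, with t on the negative side, the p-value is half of that, which is still ≈ 0.
Since the p-value is far below $\alpha = 0.05$, we reject H0: there is evidence of a negative linear relationship between mpg and wt.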
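A sketch of the same test using only the numbers printed in the coefficients table. Halving the reported p-value for a one-tailed test is standard practice when the statistic falls on the side named in HA; the variable names are my own.

```python
b1 = -5.3445            # Estimate for wt
s_b1 = 0.5591           # Std. Error for wt
p_two_sided = 1.29e-10  # Pr(>|t|) for wt, a two-sided p-value
alpha = 0.05

# Test statistic for H0: beta_1 = 0
t = b1 / s_b1

# HA is beta_1 < 0, so a negative t falls on the HA side and the
# one-sided p-value is half the two-sided one; otherwise it is the rest.
if t < 0:
    p_one_sided = p_two_sided / 2
else:
    p_one_sided = 1 - p_two_sided / 2

reject_h0 = p_one_sided < alpha
print(round(t, 3), p_one_sided, reject_h0)
```

The computed t matches the "t value" column of the output, which is a quick way to confirm that the right row was read.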
7. What is the value of the sample standard deviation of the residuals? Provide an interpretation of this value
within the context of the problem.
To answer this, we must know that the sample standard deviation of the residuals is represented by $s_e$.
We’re given $s_e$ where it says: “Residual standard error: 3.046”
$s_e = 3.046$
We would expect approximately 95% of the y-values (mpg’s) to lie within $2 s_e = 2 \times 3.046 = 6.092$ mpg of
the regression line.
All this means is that we expect 95% of the values to lie within 2 standard deviations of the line itself
(this makes sense because we know that, for a standard bell curve, about 95% of the data lies
within 2 standard deviations of the mean).
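To see what this band looks like at one point on the line, here is a small sketch. The coefficients and the residual standard error come from the R output; the weight value `wt = 3.0` is a hypothetical input chosen only for illustration.

```python
b0, b1 = 37.2851, -5.3445   # intercept and slope from the R output
s_e = 3.046                 # residual standard error from the R output

wt = 3.0  # hypothetical weight, in the same units as the wt column

# Predicted mpg on the regression line at this weight
mpg_hat = b0 + b1 * wt

# Rough 95% band: two residual standard deviations either side of the line
band = (mpg_hat - 2 * s_e, mpg_hat + 2 * s_e)
print(round(mpg_hat, 4), tuple(round(v, 4) for v in band))
```

This is only the rule-of-thumb band described in the notes, not a formal prediction interval.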
$r_{x,y} = \sqrt{R^2} \times \operatorname{sign}(b_1)$
What is $\operatorname{sign}(b_1)$? It’s either −1 or 1, depending on whether $b_1$ has a negative or positive sign in front of it.
In this case, since $b_1 = -5.3445$, $\operatorname{sign}(b_1) = -1$.
SSR=SST −SSE=1125.983−278.343=847.64
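A sketch that ties the last two steps together, starting from the SST and SSE values quoted above. Using R² = SSR / SST is my rearrangement of the SSR = R² × SST formula from the equation list.

```python
import math

sst, sse = 1125.983, 278.343   # values quoted in the solution
b1 = -5.3445                   # slope from the R output

# Regression sum of squares and the share of variation it explains
ssr = sst - sse
r_squared = ssr / sst

# r has the magnitude sqrt(R^2) and takes its sign from the slope
r = math.copysign(math.sqrt(r_squared), b1)
print(round(ssr, 2), round(r_squared, 4), round(r, 4))
```

`math.copysign` plays the role of sign(b1) here: it returns the first argument's magnitude with the second argument's sign.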
Intersection of Regression and df:
In the RStudio output, we’re told “F-statistic: 91.38 on 1 and 30 DF”
When it says “on 1,” that tells us that the value in the top-left corner of the ANOVA table (the regression df, $df_R$) is 1.
MSR = SSR divided by whatever is in the top-left box in the table: $MSR = \dfrac{SSR}{df_R} = \dfrac{847.64}{1} = 847.64$
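The remaining ANOVA entries follow the same pattern. The sketch below assumes the standard relations MSE = SSE / df_error and F = MSR / MSE, which these notes do not state explicitly, and uses the two df values quoted in the F-statistic line.

```python
import math

ssr, sse = 847.64, 278.343         # sums of squares from the solution
df_regression, df_error = 1, 30    # from "on 1 and 30 DF"

# Mean squares: each sum of squares divided by its degrees of freedom
msr = ssr / df_regression
mse = sse / df_error

# F-statistic and residual standard error
f_stat = msr / mse
s_e = math.sqrt(mse)
print(round(msr, 2), round(mse, 4), round(f_stat, 2), round(s_e, 3))
```

The results land close to the reported F-statistic of 91.38 and residual standard error of 3.046; small gaps come from rounding in the quoted sums of squares.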