
TEAM 10

Tamasi-Klaus Adrian
Tcaciuc Alexandru

Aim of study: to model extra weight (kg) as a function of the number of cigarettes smoked per
day (over the past 30 days) and of the age at which the person first smoked a cigarette.
Data recorded for 40 randomly drawn smokers:

Extra weight (kg)   Age when first smoked a cigarette (years)   Number of cigarettes smoked per day (past 30 days)
0.79 11 14
1.60 15 11
1.26 17 17
1.37 15 17
0.80 15 19
1.01 16 15
0.39 16 2
1.50 15 16
0.81 13 5
1.35 18 7
0.48 15 17
0.45 15 1
0.69 13 15
0.65 17 1
1.01 14 16
6.59 16 6
1.83 11 8
0.65 16 16
0.77 16 5
0.73 12 9
2.36 11 22
1.12 14 17
0.96 16 19
1.17 17 21
0.90 14 18
1.44 17 17
0.75 21 8
1.43 22 20
0.81 15 20
0.89 11 3
1.83 19 16
2.21 19 8
2.57 16 10
1.34 17 11
1.97 14 15
1.17 13 3
2.26 13 4
2.33 41 7
0.50 21 13
3.17 15 15

Process the data in Excel (Data/Data Analysis/Regression) and answer the following questions:

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.
b. Is there enough evidence to conclude that the regression model is valid, at 90%
confidence level? (critical value: 2,45).
c. Test the significance of the model parameters (critical value: 1,687).
d. Find and interpret the confidence intervals for the model parameters.
e. Compute and interpret the coefficient of determination.
f. Analyze the direction and the strength of the relationship between the three variables,
using an appropriate statistical indicator. Test its significance.
g. Get the Correlation Matrix (use Data/Data Analysis/Correlation). Explain the values on
the main diagonal.
h. Predict a person's extra weight if he started smoking at 15 years old and smoked
3 cigarettes per day over the past 30 days.

Solution

a. Identify the variables, the linear regression equation and interpret the partial regression
coefficients.

The independent variable (x1) = Age when first smoked a cigarette (years)

The independent variable (x2) = Number of cigarettes smoked per day past 30 days

The dependent variable (y) = Extra weight (kg)


SUMMARY OUTPUT

Regression Statistics
Multiple R          0.129859528
R Square            0.016863497
Adjusted R Square   -0.036279017
Standard Error      1.087275694
Observations        40

ANOVA
             df   SS            MS         F          Significance F
Regression   2    0.750265374   0.375133   0.317326   0.730055194
Residual     37   43.74023213   1.182168
Total        39   44.4904975

               Coefficients   Standard Error   t Stat     P-value    Lower 95%      Upper 95%
Intercept      1.108225856    0.710593552      1.559578   0.127373   -0.331573443   2.548025155
X Variable 1   0.025217755    0.036033095      0.69985    0.488398   -0.04779223    0.09822774
X Variable 2   -0.009522382   0.028242804      -0.33716   0.7379     -0.06674774    0.047702975

Linear Regression Equation

- Population:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

- Sample:
$y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + e_i$

$y_i = 1{,}108 + 0{,}025\, x_{1i} - 0{,}009\, x_{2i} + e_i$

b1 = 0,025 > 0 => positive (direct) relationship between Age when first smoked a cigarette (years) and
extra weight (kg)
b2 = -0,009 < 0 => negative (inverse) relationship between Number of cigarettes smoked per day past 30
days and extra weight (kg)

- If the age when first smoked a cigarette increases by 1 year, then the extra weight
increases on average by 0,025 kg, holding the number of cigarettes smoked per day constant.

- If the number of cigarettes smoked per day (past 30 days) increases by 1 cigarette, then the extra
weight decreases on average by 0,009 kg, holding the age when first smoked constant.
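
As a cross-check on the Excel output, the coefficients can be recomputed from the raw table. The following is a minimal Python sketch, assuming the 40 rows above were transcribed correctly; the names `DATA` and `fit_two_predictor_ols` are illustrative and not part of the original exercise. It uses the closed-form least-squares solution for two predictors, built from centered sums of squares and cross-products.

```python
# Rows transcribed from the data table: (extra weight kg, starting age, cigarettes/day).
DATA = [
    (0.79, 11, 14), (1.60, 15, 11), (1.26, 17, 17), (1.37, 15, 17),
    (0.80, 15, 19), (1.01, 16, 15), (0.39, 16, 2), (1.50, 15, 16),
    (0.81, 13, 5), (1.35, 18, 7), (0.48, 15, 17), (0.45, 15, 1),
    (0.69, 13, 15), (0.65, 17, 1), (1.01, 14, 16), (6.59, 16, 6),
    (1.83, 11, 8), (0.65, 16, 16), (0.77, 16, 5), (0.73, 12, 9),
    (2.36, 11, 22), (1.12, 14, 17), (0.96, 16, 19), (1.17, 17, 21),
    (0.90, 14, 18), (1.44, 17, 17), (0.75, 21, 8), (1.43, 22, 20),
    (0.81, 15, 20), (0.89, 11, 3), (1.83, 19, 16), (2.21, 19, 8),
    (2.57, 16, 10), (1.34, 17, 11), (1.97, 14, 15), (1.17, 13, 3),
    (2.26, 13, 4), (2.33, 41, 7), (0.50, 21, 13), (3.17, 15, 15),
]


def fit_two_predictor_ols(rows):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 (hypothetical helper)."""
    n = len(rows)
    mean_y = sum(r[0] for r in rows) / n
    mean_x1 = sum(r[1] for r in rows) / n
    mean_x2 = sum(r[2] for r in rows) / n
    # Centered sums of squares and cross-products.
    s11 = sum((r[1] - mean_x1) ** 2 for r in rows)
    s22 = sum((r[2] - mean_x2) ** 2 for r in rows)
    s12 = sum((r[1] - mean_x1) * (r[2] - mean_x2) for r in rows)
    s1y = sum((r[1] - mean_x1) * (r[0] - mean_y) for r in rows)
    s2y = sum((r[2] - mean_x2) * (r[0] - mean_y) for r in rows)
    # Solve the 2x2 normal equations for the slopes, then back out the intercept.
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean_y - b1 * mean_x1 - b2 * mean_x2
    return b0, b1, b2


b0, b1, b2 = fit_two_predictor_ols(DATA)
print(f"b0 = {b0:.6f}, b1 = {b1:.6f}, b2 = {b2:.6f}")
```

The printed values should agree with the Intercept, X Variable 1 and X Variable 2 coefficients in the regression output.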

b. Is there enough evidence to conclude that the regression model is valid, at 90%
confidence level? (critical value: 2,45).

$H_0$: MSR = MSE (the model is not valid)

$H_1$: MSR > MSE (the model is valid)

- $F_{comp} = \frac{MSR}{MSE} = \frac{0.375133}{1.182168} = 0.3173$

- $F_{crit} = 2{,}45$

$F_{comp} < F_{crit}$ => $F_{comp}$ lies in the acceptance region => do not reject $H_0$ => the model is not valid (not statistically significant) at the 90% confidence level.

Significance F = 0.730055194 > 0,1 (α) => do not reject $H_0$ => the model is not valid
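
The F test can be reproduced from the ANOVA sums of squares. This is a small Python sketch under the assumption that the values below are copied correctly from the Excel output; the variable names are illustrative only.

```python
# Values copied from the ANOVA table of the Excel regression output.
ss_regression, df_regression = 0.750265374, 2
ss_residual, df_residual = 43.74023213, 37
f_critical = 2.45  # critical value given in the exercise (90% confidence)

msr = ss_regression / df_regression   # mean square due to regression
mse = ss_residual / df_residual       # mean square error
f_computed = msr / mse

# The model is declared valid only if the computed F exceeds the critical value.
model_is_valid = f_computed > f_critical
print(f"MSR = {msr:.6f}, MSE = {mse:.6f}, F = {f_computed:.4f}")
print("model valid" if model_is_valid else "model not valid")
```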

c. Test the significance of the model parameters (critical value 1,687).

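Hypotheses:

$H_0: \beta_j = 0$
$H_1: \beta_j \neq 0, \quad j = 0, 1, 2$

$t_{comp} = \frac{\hat{b}_j - \beta_j}{s_{\hat{b}_j}} = \frac{\hat{b}_j}{s_{\hat{b}_j}}$

- Testing the β0 parameter:

$H_0: \beta_0 = 0$

$H_1: \beta_0 \neq 0$

$t_{comp} = \frac{\hat{b}_0 - \beta_0}{s_{\hat{b}_0}} = \frac{\hat{b}_0}{s_{\hat{b}_0}} = \frac{1.108225856}{0.710593552} = 1.55957769$

$t_{\alpha/2;\, n-k-1} = t_{0{,}05;\,37} = 1{,}687$ (α = 0,10, n = 40, k = 2)

$|t_{comp}| < t_{\alpha/2;\, n-k-1}$ => do not reject $H_0$: the parameter β0 is not significantly different from zero.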
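
The same test can be applied to all three parameters using the coefficients and standard errors from the regression output. The sketch below is illustrative Python, assuming the exercise's critical value of 1,687; the dictionary keys and names are hypothetical labels, not Excel's.

```python
# (coefficient, standard error) pairs copied from the Excel regression output.
estimates = {
    "b0": (1.108225856, 0.710593552),   # intercept
    "b1": (0.025217755, 0.036033095),   # age when first smoked
    "b2": (-0.009522382, 0.028242804),  # cigarettes per day
}
t_critical = 1.687  # critical value given in the exercise

t_stats = {}
significant = {}
for name, (coef, std_err) in estimates.items():
    # Under H0 (beta_j = 0) the test statistic reduces to coef / std_err.
    t_stats[name] = coef / std_err
    significant[name] = abs(t_stats[name]) > t_critical
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: t = {t_stats[name]:.5f} -> {verdict}")
```

With these inputs none of the |t| values exceeds 1,687, which agrees with the P-values in the output all being above 0,1.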

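d.) 95% confidence intervals, taken from the Lower 95% and Upper 95% columns of the regression output:

$b_j - t_{\alpha/2;\,n-k-1}\, s_{b_j} \le \beta_j \le b_j + t_{\alpha/2;\,n-k-1}\, s_{b_j}$

Lower(β1) = -0,0478
Upper(β1) = 0,0982
-0,0478 ≤ β1 ≤ 0,0982
The limits have opposite signs ⇒ 0 is in the interval ⇒ β1 is not significant. With 95% confidence, one extra year in the age of first smoking changes extra weight by between -0,0478 kg and 0,0982 kg.
Lower(β2) = -0,0667
Upper(β2) = 0,0477
-0,0667 ≤ β2 ≤ 0,0477
The limits have opposite signs ⇒ 0 is in the interval ⇒ β2 is not significant. With 95% confidence, one extra cigarette per day changes extra weight by between -0,0667 kg and 0,0477 kg.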
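
The limits reported by Excel can be recomputed as coefficient ± t × standard error. The Python sketch below assumes a critical value of t = 2.0262 (the usual table value for 37 degrees of freedom at 95% confidence, two-sided); this value is not stated in the exercise and is an assumption here, as are the variable names.

```python
# (coefficient, standard error) pairs copied from the Excel regression output.
estimates = {
    "b0": (1.108225856, 0.710593552),
    "b1": (0.025217755, 0.036033095),
    "b2": (-0.009522382, 0.028242804),
}
t_critical = 2.0262  # assumed table value t(0.025; 37), not given in the exercise

intervals = {}
for name, (coef, std_err) in estimates.items():
    margin = t_critical * std_err
    intervals[name] = (coef - margin, coef + margin)
    lower, upper = intervals[name]
    # A parameter is not significant at this level when 0 lies inside its interval.
    contains_zero = lower <= 0.0 <= upper
    print(f"{name}: [{lower:.4f}, {upper:.4f}] contains 0: {contains_zero}")
```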

e.) R Square = 0,016863 × 100 = 1,6863%: only about 1,69% of the variation in the dependent variable (extra weight) is explained by
the regression model
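
For reference, R Square and Adjusted R Square follow directly from the ANOVA sums of squares. A short Python sketch, assuming the values are copied correctly from the Excel output (variable names are illustrative):

```python
# Values copied from the ANOVA table of the Excel regression output.
ss_regression = 0.750265374
ss_total = 44.4904975
n_observations = 40
n_predictors = 2

# Share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R Square penalizes the number of predictors in the model.
adjusted_r_squared = 1 - (1 - r_squared) * (n_observations - 1) / (
    n_observations - n_predictors - 1
)

print(f"R Square = {r_squared:.6f} ({100 * r_squared:.4f}%)")
print(f"Adjusted R Square = {adjusted_r_squared:.6f}")
```

The negative Adjusted R Square is another sign that the two predictors explain almost none of the variation in extra weight.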

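f.) r = multiple R ≈ 0,12986

r > 0 => the relationship is direct and weak (because the value is very close to zero)
*Testing the significance of multiple R

H₀ : r = 0
H₁ : r ≠ 0
α = 0,1
t tabular = 1,686 (for α = 0,1, two-tailed, and n − 2 = 38 degrees of freedom, with n = 40 observations)

$t_{calc} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0{,}12986\,\sqrt{40-2}}{\sqrt{1-(0{,}12986)^2}} \approx 0{,}807$

Decision rule: if |t calc| > t tab, the null hypothesis is rejected and multiple R is statistically significant.
Here |t calc| = 0,807 < 1,686, so H₀ is not rejected: multiple R is not statistically significant at the 90% confidence level.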
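
The t statistic for the correlation coefficient can be evaluated numerically. The Python sketch below assumes n = 40 (the Observations value in the regression output) and a tabulated critical value of 1.686 for 38 degrees of freedom at α = 0,1 (two-tailed); both are assumptions of this illustration rather than values stated in the exercise.

```python
import math

multiple_r = 0.129859528  # Multiple R from the Excel regression output
n_observations = 40       # assumed from the "Observations" row of the output
t_tabular = 1.686         # assumed table value for alpha = 0.1, 38 d.f., two-tailed

# t = r * sqrt(n - 2) / sqrt(1 - r^2)
t_calculated = (
    multiple_r * math.sqrt(n_observations - 2) / math.sqrt(1 - multiple_r ** 2)
)

reject_null = abs(t_calculated) > t_tabular
print(f"t calculated = {t_calculated:.4f}")
print("r is significant" if reject_null else "r is not significant")
```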

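Confidence intervals at 95% for the coefficients of x1 and x2, using the normal approximation (z = 1,96):

For x1:
b1 ± 1,96 × se(b1) = 0,025217755 ± 1,96 × 0,036033095 = (-0,0454071112 ; 0,0958426212)

For x2:
b2 ± 1,96 × se(b2) = -0,009522382 ± 1,96 × 0,028242804 = (-0,06487827784 ; 0,04583351384)

Both intervals contain 0, so neither coefficient is significantly different from zero.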

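h.) y = 1,108225856 + 0,025217755 × x1 - 0,009522382 × x2, with x1 = 15, x2 = 3

⇒ y (extra weight) = 1,108225856 + 0,025217755 × 15 - 0,009522382 × 3 =
1,486492181 - 0,028567146 = 1,457925035

The predicted extra weight for a person who started smoking at 15 and smoked 3 cigarettes per day over the past 30 days is about 1,46 kg.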
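
The prediction is a direct substitution into the fitted equation. A minimal Python sketch, where the function name `predict_extra_weight` is a hypothetical helper and the coefficients are those of the Excel output:

```python
# Coefficients copied from the Excel regression output.
B0 = 1.108225856    # intercept
B1 = 0.025217755    # age when first smoked a cigarette (years)
B2 = -0.009522382   # cigarettes smoked per day, past 30 days


def predict_extra_weight(age_first_smoked, cigarettes_per_day):
    """Point prediction of extra weight (kg) from the fitted equation."""
    return B0 + B1 * age_first_smoked + B2 * cigarettes_per_day


predicted = predict_extra_weight(age_first_smoked=15, cigarettes_per_day=3)
print(f"Predicted extra weight: {predicted:.4f} kg")
```

Since the model was found not to be valid in part b, this point prediction should be read with caution.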
