
Running Head: STATISTICAL ASSIGNMENT 1

Statistical Project Assignment

Course

Instructor

Date

Statistical Project Assignment

Dataset 1: Simple Regression Analysis

A).

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.6094
R Square            0.3714
Adjusted R Square   0.3701
Standard Error      10095.7328
Observations        495

ANOVA
            df    SS                 MS                 F          Significance F
Regression  1     29687314944.7007   29687314944.7007   291.2696   0.0000
Residual    493   50248443796.1759   101923821.0876
Total       494   79935758740.8766

            Coefficients  Standard Error  t Stat   P-value  Lower 95%   Upper 95%   Lower 95.0%  Upper 95.0%
Intercept   42362.2458    673.9857        62.8533  0.0000   41038.0071  43686.4845  41038.0071   43686.4845
yearsrank   1201.1490     70.3800         17.0666  0.0000   1062.8672   1339.4307   1062.8672    1339.4307

B).

For each additional year in rank, the predicted salary of an academic staff member increases by $1,201.149.

C).

When X = 12, then:
Salary = $42,362.25 + $1,201.149(12)
Salary (Y) = $56,776.0338
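
The substitution above can be checked with a minimal Python sketch. The coefficients are copied from the regression output in part A; the function name predict_salary is illustrative only and is not part of the original spreadsheet.

```python
# Coefficients copied from the regression output in part A.
INTERCEPT = 42362.2458   # fitted salary at zero years in rank
SLOPE = 1201.1490        # change in fitted salary per additional year in rank


def predict_salary(years_in_rank: float) -> float:
    """Return the fitted salary for a given number of years in rank."""
    return INTERCEPT + SLOPE * years_in_rank


if __name__ == "__main__":
    # Fitted salary for 12 years in rank, as in part C.
    print(round(predict_salary(12), 4))
```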

D).

The coefficient of determination is as follows:

Coefficient of Determination 0.37139

About 37.1% of the variation in salary is explained by years in rank.
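
As a cross-check, the coefficient of determination is the regression sum of squares divided by the total sum of squares. The sketch below assumes the two sums of squares from the ANOVA table in part A; the function name is illustrative.

```python
# Sums of squares copied from the ANOVA table in part A.
SS_REGRESSION = 29687314944.7007
SS_TOTAL = 79935758740.8766


def r_squared(ss_regression: float, ss_total: float) -> float:
    """Share of the total variation in Y explained by the regression."""
    return ss_regression / ss_total


if __name__ == "__main__":
    print(f"{r_squared(SS_REGRESSION, SS_TOTAL):.5f}")
```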

E).

The standard error of estimate is as follows:

Standard Error of Estimate 10095.7328

The standard error of the estimate measures how far observed salaries typically fall from the regression line, and so indicates the accuracy of the predictions. The value of 10095.733 is a small percentage of the mean salary; therefore, we can interpret it as a low standard error of estimate.
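
The standard error of the estimate can be recovered from the residual sum of squares and its degrees of freedom. This is a sketch using the values in the ANOVA table of part A; the function and parameter names are illustrative.

```python
import math

# Values copied from the ANOVA table in part A.
SS_RESIDUAL = 50248443796.1759
N_OBS = 495
N_PREDICTORS = 1


def standard_error_of_estimate(ss_residual: float, n_obs: int, n_predictors: int) -> float:
    """Square root of the residual mean square, sqrt(SSE / (n - k - 1))."""
    residual_df = n_obs - n_predictors - 1
    return math.sqrt(ss_residual / residual_df)


if __name__ == "__main__":
    print(round(standard_error_of_estimate(SS_RESIDUAL, N_OBS, N_PREDICTORS), 4))
```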

F).

The residual analysis was performed in the Excel spreadsheet, and the residual plot is shown below:

[Figure: yearsrank Residual Plot. x-axis: yearsrank (0 to 30); y-axis: Residuals (-40000 to 60000)]

The above plot shows that the assumptions of independence and constant variance are maintained by the model; however, the normality of the distribution is not evident from the plot.

G).

The t-test for slope is as follows:

b1                         1201.1490
SE                         70.3800
t value                    17.0666
Df                         493
P value (t distribution)   0.0000

The p-value is 0.000, which is below the 0.05 significance level; therefore, the slope is significant.
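
The t value for the slope is the slope estimate divided by its standard error. A minimal sketch with the two values from the table; the function name is illustrative.

```python
# Slope estimate and its standard error, copied from the t-test table.
B1 = 1201.1490
SE_B1 = 70.3800


def t_statistic(coefficient: float, standard_error: float) -> float:
    """Test statistic for H0: coefficient = 0."""
    return coefficient / standard_error


if __name__ == "__main__":
    print(round(t_statistic(B1, SE_B1), 4))
```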

H).

The F-test for slope is as follows:

F = Explained Variance / Unexplained Variance

F value   291.2696
Df        493
P value   0.0000

The p-value is 0.000, which is below the 0.05 significance level; therefore, the slope is significant.
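
The F value is the regression mean square divided by the residual mean square. The sketch below uses the two mean squares from the ANOVA table in part A; the names are illustrative. In a simple regression with one predictor, this F value also equals the square of the slope's t value, up to rounding.

```python
# Mean squares copied from the ANOVA table in part A.
MS_REGRESSION = 29687314944.7007
MS_RESIDUAL = 101923821.0876


def f_statistic(ms_regression: float, ms_residual: float) -> float:
    """Ratio of explained to unexplained variance."""
    return ms_regression / ms_residual


if __name__ == "__main__":
    print(round(f_statistic(MS_REGRESSION, MS_RESIDUAL), 4))
```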

I).

The test for correlation coefficient is as follows:

Correlation between X and Y   0.6094
T value for r                 17.0666
P value for correlation       0.0000

Since the p-value is 0.000, the two variables are significantly correlated.
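
The t value for a correlation coefficient follows from r and the sample size as t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below applies that standard formula to the rounded r of 0.6094 and n = 495, so the result may differ from the table in the third or fourth decimal place; the function name is illustrative.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


if __name__ == "__main__":
    print(round(t_for_correlation(0.6094, 495), 2))
```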

J).

When X = 12, then:

Mean Y                         $56,776.0338
Alpha                          0.0500
Standard Deviation of Salary   12720.5847
Size                           495
Confidence                     1120.6050
LCI                            $55,655.4287
UCI                            $57,896.6388

With 95% confidence, the true average salary of academic staff with 12 years in rank lies between $55,655.4287 and $57,896.6388.
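
The reported margin of 1120.6050 is numerically consistent with a normal-based margin z * s / sqrt(n), the form computed by Excel's CONFIDENCE.NORM function. That this is how the spreadsheet obtained it is an assumption; the sketch below only reproduces the arithmetic from the alpha, standard deviation, and sample size listed in the table.

```python
import math
from statistics import NormalDist


def normal_margin(alpha: float, std_dev: float, n: int) -> float:
    """Half-width z * s / sqrt(n) of a two-sided normal-based interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * std_dev / math.sqrt(n)


# Values copied from the table in part J.
MEAN_Y = 56776.0338
margin = normal_margin(alpha=0.05, std_dev=12720.5847, n=495)
lci, uci = MEAN_Y - margin, MEAN_Y + margin

if __name__ == "__main__":
    print(round(margin, 4), round(lci, 4), round(uci, 4))
```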

K).

When X = 12, then:

Mean Y                           $56,776.0338
T value                          1.9650
Standard Error of the Estimate   673.9857
Lower                            $55,451.6572
Upper                            $58,100.4103

With 95% probability, a future observation of salary for academic staff with 12 years in rank would fall between $55,451.6572 and $58,100.4103.

Dataset 2: Multiple Regression Analysis

A).

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.2737
R Square            0.0749
Adjusted R Square   0.0738
Standard Error      5.3888
Observations        1700

ANOVA
            df     SS           MS          F         Significance F
Regression  2      3991.1549    1995.5774   68.7203   0.0000
Residual    1697   49279.4313   29.0391
Total       1699   53270.5862

            Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   8.3604        1.6879          4.9532   0.0000   5.0498     11.6709    5.0498       11.6709
ttl_exp     0.3360        0.0289          11.6442  0.0000   0.2794     0.3926     0.2794       0.3926
age         -0.1250       0.0432          -2.8921  0.0039   -0.2098    -0.0402    -0.2098      -0.0402

Wage = $8.360 + 0.336(ttl_exp) - 0.125(age)

B).

Holding age constant, each additional year of total experience increases the employee's predicted wage by $0.336 per hour. Holding experience constant, each additional year of age decreases the predicted wage by $0.125 per hour.

C).

When X1 = 18 and X2 = 38, then:
Wage = $8.3604 + 0.3360(18) - 0.125(38)
Wage = $9.6574
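
The same substitution can be sketched in Python. The coefficients are the four-decimal values from the regression output in part A, so the result can differ from the reported wage in the third decimal place because of rounding; the function name predict_wage is illustrative.

```python
# Coefficients copied (rounded to four decimals) from the output in part A.
B0 = 8.3604          # intercept
B_TTL_EXP = 0.3360   # change in hourly wage per year of total experience
B_AGE = -0.1250      # change in hourly wage per year of age


def predict_wage(ttl_exp: float, age: float) -> float:
    """Return the fitted hourly wage for the given experience and age."""
    return B0 + B_TTL_EXP * ttl_exp + B_AGE * age


if __name__ == "__main__":
    print(round(predict_wage(ttl_exp=18, age=38), 2))
```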

D).

When X1 = 18 and X2 = 38, then:

Mean Y                       $9.6574
Alpha                        0.0500
Standard Deviation of Wage   $5.5995
Size                         1,700
Confidence                   $0.2662
LCI                          $9.3912
UCI                          $9.9236

With 95% confidence, the true average wage of employees with 18 years of experience and an age of 38 lies between $9.39 and $9.92 per hour.

E).

When X1 = 18 and X2 = 38, then:

Mean Y                           $9.6574
T value                          1.9614
Standard Error of the Estimate   $5.3888
Lower                            -$0.9120
Upper                            $20.2268

With 95% probability, a future observation of wage for an employee with 18 years of experience and an age of 38 would fall between -$0.91 and $20.23 per hour.
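
The bounds above are numerically consistent with the approximate interval: fitted value plus or minus the t value times the standard error of the estimate. The sketch below reproduces that arithmetic from the three table values. It ignores the extra term for the distance of the predictors from their means, and whether the original spreadsheet used exactly this form is an assumption.

```python
def approximate_interval(fitted: float, t_value: float, std_error: float) -> tuple:
    """Return (lower, upper) = fitted -/+ t_value * std_error."""
    half_width = t_value * std_error
    return fitted - half_width, fitted + half_width


# Values copied from the table in part E.
lower, upper = approximate_interval(fitted=9.6574, t_value=1.9614, std_error=5.3888)

if __name__ == "__main__":
    print(round(lower, 2), round(upper, 2))
```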

F).

[Figure: ttl_exp Residual Plot. x-axis: ttl_exp (0 to 35); y-axis: Residuals (-20 to 40)]

[Figure: age Residual Plot. x-axis: age (32 to 48); y-axis: Residuals (-20 to 40)]

The above residual plots show that the variance is not constant and that the distribution does not seem to be normal. The variables appear to be independent; however, the normality and constant variance assumptions are violated.

G).

Coefficient of Determination for X1   0.0704
Coefficient of Determination for X2   0.0010
VIF of X1                             1.0757   Less than 10
VIF of X2                             1.0010   Less than 10

There is no reason to suspect collinearity, as the VIF for both variables is well below 10, which is an acceptable value.
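
Each VIF above follows from its coefficient of determination as VIF = 1 / (1 - R^2). A minimal sketch, taking the two coefficients of determination as reported; the function name is illustrative.

```python
def vif(r_squared: float) -> float:
    """Variance inflation factor for a predictor, 1 / (1 - R^2)."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in the interval [0, 1)")
    return 1 / (1 - r_squared)


if __name__ == "__main__":
    # Coefficients of determination as reported in part G.
    for label, r2 in (("X1", 0.0704), ("X2", 0.0010)):
        print(label, round(vif(r2), 4), "below 10:", vif(r2) < 10)
```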

H).

X1                         0.3360
SE                         0.0289
t value                    11.6442
Df                         1698
P value (t distribution)   0.0000

X2                         -0.1250
SE                         0.0432
t value                    -2.8921
Df                         1698
P value (t distribution)   0.0039

On the basis of the t-tests, X1 should be included in the regression model, as its p-value of 0.000 is significant. X2 should also be included, as its p-value of 0.0039 is below the 0.05 level of significance.

I).

F = Explained Variance / Unexplained Variance

F value   68.7203
Df        1698
P value   0.0000

We used the F-test to test the significance of the overall multiple regression model. The p-value is 0.000, which is less than the 0.05 level of significance; therefore, the model is significant overall.

J).

Correlation between X1 and Y   0.2653
T value for r                  11.3366
P value for correlation        0.0000

Correlation between X2 and Y   0.0318
T value for r                  1.3104
P value for correlation        0.1907

Based on the above tests, performed at a significance level of 0.05, the relationship between Y and X1 is significant and the relationship between Y and X2 is insignificant.

K).

Intermediate Calculations
SSR(X1, X2) 3991.1549
SST 53270.5862
SSR(X2) 14130.5316
SSR(X1) 1693.1194
SSR(X1 | X2) -10139.3768
SSR(X2 | X1) 2298.0355
   
Coefficients
r2 Y1.2 -0.2591
r2 Y2.1 0.0446

The proportion of variation in Y (wage) that cannot be explained by the reduced model but is explained by the fuller model is -0.2591 for X1 and 0.0446 for X2.
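
The intermediate calculations can be retraced with the usual formula for a coefficient of partial determination: the extra regression sum of squares from adding a variable, divided by the variation left unexplained before it was added. The sketch below takes all sums of squares exactly as reported in the table; the function name is illustrative.

```python
# Sums of squares copied as reported from the intermediate calculations.
SSR_FULL = 3991.1549   # SSR(X1, X2)
SST = 53270.5862
SSR_X1 = 1693.1194     # SSR(X1), model with X1 only
SSR_X2 = 14130.5316    # SSR(X2), model with X2 only


def partial_r_squared(ssr_full: float, ssr_reduced: float, sst: float) -> float:
    """Extra SSR from the added variable over the reduced model's unexplained SS."""
    extra_ssr = ssr_full - ssr_reduced
    return extra_ssr / (sst - ssr_full + extra_ssr)


r2_y1_2 = partial_r_squared(SSR_FULL, SSR_X2, SST)  # X1 given X2
r2_y2_1 = partial_r_squared(SSR_FULL, SSR_X1, SST)  # X2 given X1

if __name__ == "__main__":
    print(round(r2_y1_2, 4), round(r2_y2_1, 4))
```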

L).

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.2977
R Square            0.0886
Adjusted R Square   0.0870
Standard Error      5.3503
Observations        1700

ANOVA
            df     SS           MS
Regression  3      4722.0285    1574.0095
Residual    1696   48548.5577   28.6253
Total       1699   53270.5862

            Coefficients  Standard Error  t Stat
Intercept   7.7469        1.6802          4.6107
ttl_exp     0.3394        0.0287          11.8429
age         -0.1385       0.0430          -3.2216
race        1.4925        0.2954          5.0530

Equation for White Female Employees
Wages = 7.747 + 0.339(ttl_exp) - 0.139(age) + 1.492(1)

Equation for Black Female Employees
Wages = 7.747 + 0.339(ttl_exp) - 0.139(age) + 1.492(0)

Holding experience and age constant, the predicted wage of a white employee (race = 1) is $1.492 per hour higher than that of a black employee (race = 0), who serves as the baseline and receives no increase from the race term.
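
The effect of the dummy variable can be illustrated with a short sketch. The coefficients are copied from the output in part L, and race is coded 1 for white and 0 for black as in the two equations. The experience and age values of 18 and 38 are borrowed from part C purely for illustration, and the function name is hypothetical.

```python
# Coefficients copied from the regression output in part L.
B0 = 7.7469
B_TTL_EXP = 0.3394
B_AGE = -0.1385
B_RACE = 1.4925   # shift in hourly wage when race = 1 (white)


def predict_wage_with_race(ttl_exp: float, age: float, race: int) -> float:
    """Fitted hourly wage; race is a 0/1 dummy (1 = white, 0 = black)."""
    return B0 + B_TTL_EXP * ttl_exp + B_AGE * age + B_RACE * race


white = predict_wage_with_race(ttl_exp=18, age=38, race=1)
black = predict_wage_with_race(ttl_exp=18, age=38, race=0)

if __name__ == "__main__":
    # The gap equals the race coefficient whatever the experience and age.
    print(round(white - black, 4))
```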

M).

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9971
R Square            0.9942
Adjusted R Square   0.9942
Standard Error      0.4256
Observations        1700

ANOVA
            df     SS           MS          F            Significance F
Regression  6      52963.9026   8827.3171   48729.8639   0.0000
Residual    1693   306.6836     0.1811
Total       1699   53270.5862

                      Coefficients  Standard Error  t Stat    P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept             3.6193        0.3811          9.4966    0.0000   2.8718     4.3668     2.8718       4.3668
ttl_exp               0.3192        0.0295          10.8314   0.0000   0.2614     0.3770     0.2614       0.3770
age                   -0.0929       0.0098          -9.5045   0.0000   -0.1120    -0.0737    -0.1120      -0.0737
race                  -0.0390       0.0238          -1.6420   0.1008   -0.0856    0.0076     -0.0856      0.0076
Interaction (X1xX2)   0.0002        0.0004          0.4648    0.6421   -0.0007    0.0011     -0.0007      0.0011
Interaction (X1xX3)   0.0258        0.0002          161.9759  0.0000   0.0255     0.0261     0.0255       0.0261
Interaction (X2xX3)   -0.0082       0.0007          -11.1002  0.0000   -0.0096    -0.0067    -0.0096      -0.0067

N).

F = Explained Variance / Unexplained Variance

F value   48729.8639
Df        1698
P value   0.0000

Interaction 1
F value   91058.2753
Df        1698
P value   0.0000

Interaction 2
F value   302.7152
Df        1698
P value   0.0000

Interaction 3
F value   4027.1923
Df        1698
P value   0.0000

The three interactions together improve the regression model significantly, as the p-value of the partial F test is 0.000. Furthermore, each of the three interactions separately also improves the regression model, as the partial F test p-value is 0.000 for all three. However, because the F value for interaction 1 is the highest, it is the term to be included in the model.
