
University of Gujrat

School of Business
Administration and Management
Academic Block (H. H. Campus)

Student Name: Muhammad Faisal, Yasir Arfat, Tauseef Ahmad

Roll No.: 16101720-024, 16101720-001, 16101720-033
Instructor Name: Dr. Muqaddas Javed
Course Title: Quantitative Techniques in Business
Course Code: MGT-508
Assignment No.: 03
Date Submitted:
Due Date: 15-08-2017
Note: Marking penalties apply for assignments submitted after the due date.

DECLARATION BY STUDENT:

I certify that this assignment is my/our own work in my/our own words. All sources have been
acknowledged and the content has not been previously submitted for assessment to School of Business
Administration and Management, University of Gujrat or any other institute. I also confirm that I have kept
a copy of this assignment.

Signed: ..........................................

This assignment must be submitted in hard copy, either (1) to the CR or GR, (2) in person to Office S-209, or (3) by post, ensuring that either the assignment or the envelope is date stamped.

Office use only:


Date stamp:
This assignment was received:
Up to one week late and will receive a 25% mark deduction

Two weeks late and will receive a 50% mark deduction

Over two weeks late and no marks will be awarded

Table of Contents

Objective 1

What is Parametric Test & Non Parametric Test?

Is Our Data Normal?

One Sample t-Test (One Variable One Mean)

Two Independent Sample t-Test (Two Variables and both are Independent)

Paired Sample t-Test (Two Dependent Variables)

Analysis of Variance (Anova), (More than two variables)

Objective 2

Simple Regression

Simple Linear Regression

Population Parametric Model

Constraints of Simple Regression

Multiple Regression

Coefficient of Determination

Simple Regression in SPSS

Interpretation of Results

Objective 3

Logistic Regression

Discriminant Analysis

Assumptions of Logistic Regression

Odds

How to Run the Model in SPSS

Interpretation of Results

Objective 4

Factor Analysis

PFA

CFA

FA through SPSS

Parametric Test vs Non-Parametric Test:-

A test that assumes the data follow some distribution is called a parametric test, whereas a test that does not assume any distribution is called a non-parametric test.

Well-known distributions for normal data are the Z, t, and F distributions. The basic condition for using a parametric test is that the data should follow some distribution.

HOW TO CHECK THE NORMALITY OF THE DATA?

Procedure

For the stated task we run a normality test using the Explore option, which is available in the Descriptive Statistics submenu under the Analyze menu. Normality shall be checked only for quantitative data.

For checking the normality of the data we use the following descriptive and inferential tests: -

Descriptive Approach: - (i) Graphical (ii) Numerical

Graphical Approach: - Graphically we use the histogram, P-P plot / Q-Q plot, stem-and-leaf plot, and box plot.

Numerical Approach: Numerically we use the mean, median, mode, skewness, and kurtosis.

For normality the distribution should be mesokurtic and symmetric.

Inferential Approach: - In this approach we use two tests:

1- Shapiro-Wilk W Test

2- Kolmogorov-Smirnov Test

The conditions for using the above tests are:

- If the sample size is from 3 to 2000, we use Shapiro-Wilk.

- If the sample size is > 2000, we use Kolmogorov-Smirnov.

The hypotheses for the normality test are as follows:

Ho = Test distribution is normal

HA = Test distribution is not normal
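These two inferential checks can also be run outside SPSS. A small Python sketch with SciPy follows; note the data here are synthetic stand-ins generated for illustration, not the actual MPG file:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=23.5, scale=7.8, size=398)  # synthetic stand-in for the MPG column

# Shapiro-Wilk: used when the sample size is <= 2000
w_stat, w_p = stats.shapiro(sample)

# Kolmogorov-Smirnov against a normal with the sample's own mean and SD
ks_stat, ks_p = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))

# Decision rule: reject Ho (the distribution is normal) when p < alpha
alpha = 0.05
for name, p in [("Shapiro-Wilk", w_p), ("Kolmogorov-Smirnov", ks_p)]:
    verdict = "reject Ho" if p < alpha else "fail to reject Ho"
    print(f"{name}: p = {p:.4f} -> {verdict}")
```

With real data the p-values, not the statistics themselves, drive the reject/fail-to-reject decision, exactly as in the SPSS output interpreted below.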

The output file shows the following results:

One-Sample Statistics

N Mean Std. Deviation Std. Error Mean

Miles per Gallon 398 23.51 7.816 .392

The above table shows that there are 398 observations for MPG, with a mean of 23.51 and a standard deviation of 7.816.

One-Sample Test

Test Value = 25

                                                                      95% CI of the Difference
                     t       df    Sig. (2-tailed)   Mean Difference    Lower        Upper
Miles per Gallon  -3.791    397         .000             -1.485         -2.26         -.72

The above table shows that the one-sample t statistic is -3.791. The test value is 25, and the sample mean falls below it by 1.485 (mean difference -1.485). df is 397, which equals n - 1 (398 - 1), so the total number of observations is 398. Since the p-value (.000) is less than α, the average MPG differs significantly from 25.

The distribution in the histogram is not symmetric but positively (right) skewed.

The Q-Q plot shows that the data do not lie on the fitted line; in particular, the starting and ending points are off the line, so the distribution is not normal.

Another view portrays that the data do not lie along the straight line; the points are scattered and deviate from normality.

Case Processing Summary

Cases

Valid Missing Total

N Percent N Percent N Percent

Miles per Gallon 398 98.0% 8 2.0% 406 100.0%

The normality test is run on Miles per Gallon. The case processing table indicates that 406 observations (vehicles) were available for the analysis, but it was later observed that MPG information was not stated for 2.0% of vehicles (8 cases), so valid information is available for only 398 units.

Descriptives

                                                Statistic   Std. Error
Miles per Gallon   Mean                           23.51        .392
                   95% CI for Mean, Lower Bound   22.74
                   95% CI for Mean, Upper Bound   24.28
                   5% Trimmed Mean                23.22
                   Median                         23.00
                   Variance                       61.090
                   Std. Deviation                  7.816
                   Minimum                         9
                   Maximum                        47
                   Range                          38
                   Interquartile Range            12
                   Skewness                         .457     .122
                   Kurtosis                        -.511     .244

It is observed through the descriptive analysis that the average miles per gallon of the 398 vehicles is 23.5 MPG, with a standard deviation of 7.81. The minimum MPG observed is 9 and the maximum is 47. The median (23.00) differs from the mean (23.51) by about 0.5 MPG; since these averages differ, it is an indication that the distribution of the data does not have a symmetrical pattern. We further investigate the pattern through skewness and kurtosis for confirmation, and both statistics also indicate a skewed pattern in the distribution of the data. Through the graphical approach we use the histogram and the normal Q-Q plot: the vertical bars of the histogram form a long tail on the right side of the average of 23.51, and in the normal Q-Q plot the observed values do not fit the expected line at either tail. The skewness value of .457 is > 0, indicating a positively skewed distribution. Similarly, the kurtosis value of -.511 (SPSS reports excess kurtosis, which equals 0 for a normal distribution) is below 0, which also indicates that the distribution is not normal.
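The skewness and kurtosis figures above can be reproduced numerically. This SciPy sketch uses synthetic right-skewed data in place of the actual MPG column:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Right-skewed synthetic data standing in for the MPG column
mpg = rng.gamma(shape=9.0, scale=2.6, size=398)

skew = stats.skew(mpg)             # > 0 indicates a right (positive) skew
excess_kurt = stats.kurtosis(mpg)  # excess kurtosis, as SPSS reports it: 0 for a normal curve
print(f"skewness = {skew:.3f}, excess kurtosis = {excess_kurt:.3f}")
```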

Tests of Normality

                    Kolmogorov-Smirnov(a)          Shapiro-Wilk
                    Statistic    df    Sig.    Statistic    df    Sig.
Miles per Gallon      .079      398    .000      .968      398    .000

a. Lilliefors Significance Correction

The table contains the Shapiro-Wilk statistic with its degrees of freedom and p-value. As the number of observations is under 2000, the result is read from the Shapiro-Wilk test. The significance (p-value) is .000, which is less than the level of significance (α), so we reject Ho (the test distribution is normal) and accept HA (the test distribution is not normal).

Parametric Test
After checking normality, the parametric tests are applied. The following four tests are used as parametric tests: -

1- One Sample t-Test (One Variable One Mean)


2- Two Independent Sample t-Test (Two Variables and both are Independent)
3- Paired Sample t-Test (Two Dependent Variables)
4- Analysis of Variance (Anova), (More than two variables)

Conditions for applying the One Sample t-Test

- The test distribution should be normal
- The variable must be quantitative
- The data must be random

Using SPSS for One Sample t-Test

Analyze > Compare Means > One Sample t-Test

Ho = MPG average is 25 mpg

HA= MPG average is not 25 mpg
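The same test can be sketched in Python with SciPy; the MPG values below are synthetic stand-ins for the actual data file:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mpg = rng.normal(loc=23.5, scale=7.8, size=398)  # synthetic stand-in for the 398 MPG values

# Ho: mean MPG = 25   vs   HA: mean MPG != 25
t_stat, p_value = stats.ttest_1samp(mpg, popmean=25)

alpha = 0.05
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"t = {t_stat:.3f}, df = {len(mpg) - 1}, p = {p_value:.4f} -> {decision}")
```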

The output file shows the following results

One-Sample Statistics

N Mean Std. Deviation Std. Error Mean

Miles per Gallon 398 23.51 7.816 .392

The statistics show that there are 398 observations of MPG, with a mean of 23.51 and a standard deviation of 7.816.

One-Sample Test

Test Value = 25

                                                                      95% CI of the Difference
                     t       df    Sig. (2-tailed)   Mean Difference    Lower        Upper
Miles per Gallon  -3.791    397         .000             -1.485         -2.26         -.72

The above table shows that the difference between the claimed mean (25) and the sample mean (23.51) is not large, but, as the first table showed, the standard deviation of 7.816 means estimates vary from sample to sample. Given the p-value of .000, we conclude that the data do not support our hypothesis and the result is significant (Ho is rejected).

Two Independent Sample t-Test

Ho = The mean MPG of 4-cylinder and 8-cylinder vehicles is the same
HA = The mean MPG of 4-cylinder and 8-cylinder vehicles is not the same

T-Test (the output below comes from SPSS's Paired Samples procedure)

Paired Samples Statistics

Mean N Std. Deviation Std. Error Mean

Pair 1 4 CYL 26.6376 101 4.50795 .44856

8 CYL 14.9921 101 2.78509 .27713

Paired Samples Correlations

N Correlation Sig.

Pair 1 4 CYL & 8 CYL 101 .363 .000

Paired Samples Test

Paired Differences
                            Mean      Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper      t       df   Sig. (2-tailed)
Pair 1   4 CYL - 8 CYL    11.64554       4.35510           .43335         10.78579       12.50530      26.873    100        .000
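The paired comparison above can be sketched with SciPy's paired t-test; the paired MPG values below are synthetic stand-ins chosen to resemble the pair means in the tables:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic paired MPG values standing in for the 101 matched 4- and 8-cylinder observations
cyl4 = rng.normal(loc=26.6, scale=4.5, size=101)
cyl8 = cyl4 - rng.normal(loc=11.6, scale=3.0, size=101)  # 8-cylinder cars get fewer MPG

t_stat, p_value = stats.ttest_rel(cyl4, cyl8)
mean_diff = np.mean(cyl4 - cyl8)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, p = {p_value:.4g}")
```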

Descriptive Statistics

N Mean Std. Deviation Minimum Maximum

4 CYL 204 29.2868 5.71016 18.00 46.60

8 CYL 102 15.0216 2.78723 10.00 26.60

Wilcoxon Signed Ranks Test

Ranks

N Mean Rank Sum of Ranks

8 CYL - 4 CYL Negative Ranks 101a 51.00 5151.00

Positive Ranks 0b .00 .00

Ties 0c

Total 101

a. 8 CYL < 4 CYL

b. 8 CYL > 4 CYL

c. 8 CYL = 4 CYL

Test Statisticsb

8 CYL - 4 CYL

Z -8.728a

Asymp. Sig. (2-tailed) .000

a. Based on positive ranks.

b. Wilcoxon Signed Ranks Test
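The Wilcoxon signed-ranks test, the non-parametric counterpart of the paired t-test, can be sketched the same way on synthetic paired data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
cyl4 = rng.normal(26.6, 4.5, size=101)
cyl8 = cyl4 - rng.normal(11.6, 3.0, size=101)  # paired values, consistently lower

# Rank-based (non-parametric) alternative to the paired t-test
w_stat, p_value = stats.wilcoxon(cyl8, cyl4)
print(f"W = {w_stat}, p = {p_value:.4g}")
```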

ANOVA

Miles per Gallon

Sum of Squares Df Mean Square F Sig.

Between Groups 15279.466 4 3819.866 170.897 .000

Within Groups 8761.906 392 22.352

Total 24041.372 396
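A one-way ANOVA like the one above can be sketched with SciPy. The five cylinder groups below are synthetic, with group sizes and rough means taken from the SPSS output:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Synthetic MPG samples per cylinder group; sizes follow the SPSS output
groups = [
    rng.normal(20.5, 4.5, size=4),    # 3 cylinders
    rng.normal(29.3, 5.7, size=204),  # 4 cylinders
    rng.normal(27.4, 4.0, size=3),    # 5 cylinders
    rng.normal(20.0, 3.8, size=84),   # 6 cylinders
    rng.normal(15.0, 2.8, size=102),  # 8 cylinders
]

f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```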

Multiple Comparisons

Miles per Gallon


LSD

(I) Number of Cylinders   (J) Number of Cylinders   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound

3 Cylinders 4 Cylinders -8.737* 2.387 .000 -13.43 -4.04

5 Cylinders -6.817 3.611 .060 -13.92 .28

6 Cylinders .564 2.420 .816 -4.19 5.32

8 Cylinders 5.528* 2.410 .022 .79 10.27

4 Cylinders 3 Cylinders 8.737* 2.387 .000 4.04 13.43

5 Cylinders 1.920 2.750 .485 -3.49 7.33

6 Cylinders 9.301* .613 .000 8.10 10.51

8 Cylinders 14.265* .573 .000 13.14 15.39

5 Cylinders 3 Cylinders 6.817 3.611 .060 -.28 13.92

4 Cylinders -1.920 2.750 .485 -7.33 3.49

6 Cylinders 7.381* 2.778 .008 1.92 12.84

8 Cylinders 12.345* 2.769 .000 6.90 17.79

6 Cylinders 3 Cylinders -.564 2.420 .816 -5.32 4.19

4 Cylinders -9.301* .613 .000 -10.51 -8.10

5 Cylinders -7.381* 2.778 .008 -12.84 -1.92

8 Cylinders 4.964* .697 .000 3.59 6.33

8 Cylinders 3 Cylinders -5.528* 2.410 .022 -10.27 -.79

4 Cylinders -14.265* .573 .000 -15.39 -13.14

5 Cylinders -12.345* 2.769 .000 -17.79 -6.90

6 Cylinders -4.964* .697 .000 -6.33 -3.59

*. The mean difference is significant at the 0.05 level.

Ranks

Number of
Cylinders N Mean Rank

Miles per Gallon 3 Cylinders 4 160.62

4 Cylinders 204 287.35

5 Cylinders 3 259.83

6 Cylinders 84 150.21

8 Cylinders 102 62.20

Total 397

Test Statisticsa,b

Miles per Gallon

Chi-Square 282.560

df 4

Asymp. Sig. .000

a. Kruskal Wallis Test

b. Grouping Variable: Number of Cylinders
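The Kruskal-Wallis test above, the rank-based counterpart of one-way ANOVA, can be sketched on synthetic cylinder groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
cyl4 = rng.normal(29.3, 5.7, size=204)
cyl6 = rng.normal(20.0, 3.8, size=84)
cyl8 = rng.normal(15.0, 2.8, size=102)

# Rank-based alternative to one-way ANOVA
h_stat, p_value = stats.kruskal(cyl4, cyl6, cyl8)
print(f"H (chi-square approx.) = {h_stat:.2f}, df = 2, p = {p_value:.4g}")
```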

Case Processing Summary

Cases

Valid Missing Total

N Percent N Percent N Percent

X1 207 100.0% 0 .0% 207 100.0%

Descriptives

                                          Statistic   Std. Error
X1   Mean                                  2312.7      24.41291
     95% CI for Mean, Lower Bound          2264.6
     95% CI for Mean, Upper Bound          2360.8
     5% Trimmed Mean                       2299.2
     Median                                2234.0
     Variance                            123400
     Std. Deviation                         351.241
     Minimum                               1613.00
     Maximum                               3270.00
     Range                                 1657.00
     Interquartile Range                    530.00
     Skewness                                  .557     .169
     Kurtosis                                 -.275     .337

X1 Stem-and-Leaf Plot

Frequency Stem & Leaf

2.00 16 . 14
5.00 17 . 56799
12.00 18 . 002233334567
25.00 19 . 1233445556666777788888999
17.00 20 . 00122344455667778
30.00 21 . 001122222233333445555566678899
26.00 22 . 00011122222334455666677899
16.00 23 . 0001235777788999
10.00 24 . 0000335689
16.00 25 . 0011244456677889
17.00 26 . 00122333466777779
10.00 27 . 0122344599
7.00 28 . 0056679
6.00 29 . 034557
3.00 30 . 039
2.00 31 . 59
3.00 32 . 357

Stem width: 100.00


Each leaf: 1 case(s)


Tests of Normality

      Kolmogorov-Smirnov(a)          Shapiro-Wilk
      Statistic    df    Sig.    Statistic    df    Sig.
X1      .095      207    .000      .967      207    .000

a. Lilliefors Significance Correction

Objective 2

Simple Linear Regression

Population Parametric Model

Constraints of Simple Regression

Multiple Regression

Coefficient of Determination

Simple Regression in SPSS

Interpretation of Results

Simple Regression
Regression is a statistical measure used in finance, investing, and other disciplines that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).

It is used to find out how much the dependent variable changes due to a change in the independent variable, for example, the relationship between store size and store sales.
Sometimes independent variables are related to each other; in that case we cannot calculate the actual change in the dependent variable attributable to each of them. To overcome this, we use only independent variables that are not related to each other.

Y = βo + βiXi + εi
Y = sales, response (dependent variable)
βo = intercept
βi = rate of change (slope)
Xi = control / independent variable
εi = random error

Simple Linear Regression

When there is a constant change in the dependent variable due to a change in the independent variable, the relation is called simple linear regression.

Population Parametric Model

Y = βo + βiXi + εi

For example, with βo = 100, βi = 150, and X = 7:
Y = 100 + 150 × 7 = 1,150

Sales Sample

Sr. No.   Sales Result (Y)   Store Size (X)

1              850                  5

2             1150                  7

3             1600                 10

4             1900                 12

5             2350                 15

6             3100                 20

7             3850                 25
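As a check, the sample above lies exactly on the line Y = 100 + 150X, so an ordinary least-squares fit recovers those coefficients (a NumPy sketch; the variable names are ours):

```python
import numpy as np

# The store-size / sales sample from the table above
x = np.array([5, 7, 10, 12, 15, 20, 25], dtype=float)                 # store size (X)
y = np.array([850, 1150, 1600, 1900, 2350, 3100, 3850], dtype=float)  # sales (Y)

# Ordinary least squares for Y = b0 + b1*X (polyfit returns the highest degree first)
b1, b0 = np.polyfit(x, y, deg=1)
print(f"fitted model: Y = {b0:.0f} + {b1:.0f} * X")

# Prediction at X = 7, inside the observed 5-to-25 range, so this is interpolation
y_hat = b0 + b1 * 7
print(f"predicted sales at X = 7: {y_hat:.0f}")
```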

Predictions in Regression Analysis: Interpolation Versus Extrapolation

When using a regression model for prediction purposes, we should consider only the relevant range of the independent variable. This relevant range includes all values from the smallest to the largest X used in developing the regression model. Hence, when predicting Y for a given value of X, we can interpolate within this relevant range of the X values, but we should not extrapolate beyond it. When we use square footage to predict annual sales, the square footage (in thousands of square feet) varies from 5 to 25. Therefore, we should predict annual sales only for stores whose size falls within this range. Any prediction of annual sales for stores outside this range assumes that the observed relationship between sales and store size for sizes from 5 to 25 thousand square feet also holds outside this range; for that reason, we cannot extrapolate the linear relationship beyond it.
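The no-extrapolation rule can be enforced in code. A minimal sketch, assuming the coefficients and the 5-to-25 range from the worked example above (the `predict_sales` function name is hypothetical):

```python
def predict_sales(size, b0=100.0, b1=150.0, x_min=5.0, x_max=25.0):
    """Predict sales only inside the observed range of store sizes.

    The coefficients and the 5-to-25 range follow the worked example above;
    raising an error on out-of-range input enforces the no-extrapolation rule.
    """
    if not (x_min <= size <= x_max):
        raise ValueError(f"store size {size} is outside [{x_min}, {x_max}]; refusing to extrapolate")
    return b0 + b1 * size

print(predict_sales(10))   # interpolation within the observed range -> 1600.0
```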

Constraints of Regression Analysis

- The data should be random
- The distribution of the dependent variable should be normal
- The error mean should be 0 and the error variance should be constant
- The error terms should not be related to one another; if they are correlated, there will be autocorrelation
- The independent variables must not have significant relations with one another; if they do, there will be an issue of multicollinearity
- The relation between X and Y must be linear
- The parameters must be linear
- Y must be quantitative, and X may be qualitative or quantitative

Multiple Regression
Y = βo + β1X1 + β2X2 + … + βkXk + εi

Hypotheses
Ho: β1 = β2 = … = βk = 0 (the coefficients play no role)
HA: at least one βi differs from 0 (plays some role)

Coefficient of Determination = R²
How much of the variation in the dependent variable is due to the independent variables?
If
R² = 0 (no role)
R² = 1 (complete role)
R² of .8 to .9 or above is considered excellent for forecasting
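R² can be computed directly from its definition; a sketch using the store-sales sample from the previous section:

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: the share of Y's variation the model explains."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustration with the store-sales sample from the previous section
x = np.array([5, 7, 10, 12, 15, 20, 25], dtype=float)
y = np.array([850, 1150, 1600, 1900, 2350, 3100, 3850], dtype=float)
y_hat = 100 + 150 * x                 # the fitted line
print("R^2 =", r_squared(y, y_hat))   # 1.0 here: the sample is perfectly linear
```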
Regression in SPSS
To run the regression analysis in SPSS, select the Regression option under the Analyze menu and then select Linear: Analyze > Regression > Linear.
Under Statistics, select Confidence Intervals, Estimates, Descriptives, Collinearity Diagnostics, and Durbin-Watson. Under Plots (Analyze > Regression > Linear > Plots), select Histogram and Normal Probability Plot.
Also shift the dependent variable (MPG) and the independent variables (horsepower and vehicle weight) into their boxes accordingly before applying the test.
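The same multiple regression can be sketched outside SPSS with plain least squares. The data below are synthetic stand-ins, generated so that weight and horsepower are correlated; they are not the actual MPG file, and the generating coefficients are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 392
# Synthetic stand-ins for the SPSS variables; weight and horsepower are correlated on purpose
weight = rng.normal(2967.0, 852.0, size=n)
horsepower = 50.0 + 0.018 * weight + rng.normal(0.0, 10.0, size=n)
mpg = 44.8 - 0.005 * weight - 0.061 * horsepower + rng.normal(0.0, 4.5, size=n)

# Design matrix with an intercept column, as in SPSS's linear regression
X = np.column_stack([np.ones(n), weight, horsepower])
beta, *_ = np.linalg.lstsq(X, mpg, rcond=None)

resid = mpg - X @ beta
r2 = 1.0 - resid @ resid / np.sum((mpg - mpg.mean()) ** 2)

# Durbin-Watson statistic from the residuals (values near 2 suggest no autocorrelation)
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

print(f"b0={beta[0]:.3f}  b_weight={beta[1]:.5f}  b_hp={beta[2]:.4f}  R^2={r2:.3f}  DW={dw:.2f}")
```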

Descriptive Statistics

Mean Std. Deviation N

Miles per Gallon 23.45 7.805 392

Vehicle Weight (lbs.) 2967.38 852.294 392

Horsepower 104.21 38.233 392

The descriptive statistics show that the data consist of 392 vehicles (observations). The MPG mean is 23.45 with SD 7.805; similarly, vehicle weight has a mean of 2967.38 and horsepower has a mean of 104.21 with SD 38.233.

Correlations

Vehicle Weight
Miles per Gallon (lbs.) Horsepower

Pearson Correlation Miles per Gallon 1.000 -.807 -.771

Vehicle Weight (lbs.) -.807 1.000 .857

Horsepower -.771 .857 1.000

Sig. (1-tailed) Miles per Gallon . .000 .000

Vehicle Weight (lbs.) .000 . .000

Horsepower .000 .000 .

N Miles per Gallon 392 392 392

Vehicle Weight (lbs.) 392 392 392

Horsepower 392 392 392

The correlations table shows the relationships among MPG, vehicle weight, and horsepower. MPG has a high-degree negative relation with both vehicle weight and horsepower. Also, the significance values (1-tailed) are less than α, so we reject Ho of no correlation and accept HA.
Variables Entered/Removed(b)

Model   Variables Entered                       Variables Removed   Method
1       Horsepower, Vehicle Weight (lbs.)(a)    .                   Enter

a. All requested variables entered.

b. Dependent Variable: Miles per Gallon

Model Summary

Model      R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .822(a)     .675           .674                    4.459                   .819

a. Predictors: (Constant), Horsepower, Vehicle Weight (lbs.)

b. Dependent Variable: Miles per Gallon

The results of the above model show that R = .822, the joint (multiple) correlation between the dependent variable and the independent variables, whereas R Square gives the amount of variance in the dependent variable that is accounted for, or explained, by the independent variables: vehicle weight and horsepower together explain about 67% of the variation in MPG. The Durbin-Watson statistic is .819. The usual rule of thumb treats values between roughly 1.5 and 2.5 as showing no autocorrelation, so a value as low as .819 actually suggests positive autocorrelation in the residuals.

ANOVAb

Model Sum of Squares Df Mean Square F Sig.

1 Regression 16085.855 2 8042.928 404.583 .000a

Residual 7733.138 389 19.880

Total 23818.993 391

a. Predictors: (Constant), Horsepower, Vehicle Weight (lbs.)

b. Dependent Variable: Miles per Gallon

The analysis of variance shows that the model has the power to explain the relationship between the independent and dependent variables. The Sig. value of .000 is less than α, so on this basis we reject Ho and conclude that at least one coefficient plays a role in the model.

Coefficients(a)

                          Unstandardized         Standardized                       95% CI for B            Collinearity Statistics
Model                     B       Std. Error     Beta             t       Sig.    Lower     Upper           Tolerance     VIF
1  (Constant)            44.777      .825                       54.307    .000    43.156    46.398
   Vehicle Weight (lbs.)  -.005      .001        -.551          -9.818    .000     -.006     -.004            .265       3.770
   Horsepower             -.061      .011        -.299          -5.335    .000     -.084     -.039            .265       3.770

a. Dependent Variable: Miles per Gallon

The most important table is the coefficients table above, which shows the relations between the independent and dependent variables. The unstandardized coefficient predicts that MPG changes by .005 in the opposite direction (the relation is negative) for each additional unit of vehicle weight, holding the other independent variable constant. Similarly, MPG changes by .061 in the opposite direction for a one-unit change in horsepower, holding the other independent variable constant.
The VIF (Variance Inflation Factor) results show that there is no issue of multicollinearity, as the VIF is only 3.770. The rule of thumb is that VIF >= 10 indicates an issue of multicollinearity, whereas VIF < 10 indicates no issue.
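The VIF rule of thumb can be verified from first principles: regress each predictor on the remaining ones and compute 1 / (1 - R²). A sketch on synthetic correlated predictors (the `vif` helper is ours, not an SPSS or library function):

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing column j on the
    remaining columns. A value of 10 or more signals multicollinearity.
    """
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(8)
weight = rng.normal(2967.0, 852.0, size=392)
horsepower = 50.0 + 0.018 * weight + rng.normal(0.0, 10.0, size=392)  # correlated with weight
vifs = vif(np.column_stack([weight, horsepower]))
print([round(v, 2) for v in vifs])
```

With only two predictors the two VIFs coincide, since each is 1 / (1 - r²) for the same pairwise correlation r.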

Collinearity Diagnostics(a)

                                                         Variance Proportions
Model   Dimension   Eigenvalue   Condition Index   (Constant)   Vehicle Weight (lbs.)   Horsepower
1           1          2.923          1.000            .01              .00                .00
            2           .065          6.731            .72              .01                .16
            3           .013         15.195            .27              .98                .84

a. Dependent Variable: Miles per Gallon

Residuals Statistics

Minimum Maximum Mean Std. Deviation N

Predicted Value 6.06 35.40 23.45 6.414 392

Residual -26.404 16.435 .000 4.447 392

Std. Predicted Value -2.710 1.864 .000 1.000 392

Std. Residual -5.922 3.686 .000 .997 392

a. Dependent Variable: Miles per Gallon


The residual is defined as the difference between the observed value and the predicted value.

The histogram shows that the distribution of the residuals is not symmetrical but negatively skewed.

The scatter diagrams show that there is a negative relationship between MPG and horsepower, as well as between MPG and vehicle weight.

Objective 3

Logistic Regression

Discriminant Analysis

Assumptions of Logistic Regression

Odds

How to Run the Model in SPSS

Interpretation of Results

Logistic Regression
Linear regression deals only with continuous dependent variables, but in the real world we must also deal with dependent variables that are qualitative in nature, with two categories: like/dislike, yes/no, accept/reject, etc. The solution to these problems is logistic regression.
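A minimal sketch of what logistic regression does, fitted by gradient ascent on synthetic binary data (all names and coefficients here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(9)
n = 400
x = rng.normal(0.0, 1.0, size=n)        # one continuous predictor
p_true = sigmoid(-0.5 + 2.0 * x)        # true model on the log-odds scale
y = rng.binomial(1, p_true)             # binary outcome (yes/no, like/dislike)

# Fit b0 and b1 by gradient ascent on the average log-likelihood
b0, b1 = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    p_hat = sigmoid(b0 + b1 * x)
    b0 += lr * np.mean(y - p_hat)
    b1 += lr * np.mean((y - p_hat) * x)

# exp(b1) is the odds ratio: the multiplicative change in the odds of y = 1 per unit of x
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, odds ratio per unit of x = {np.exp(b1):.2f}")
```

In practice SPSS (or a statistics library) fits these coefficients by maximum likelihood; the gradient-ascent loop here is only to show the idea.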

