
Statistical Applications through SPSS

S. Ali Raza Naqvi

Variables:

A quantity that changes its value from time to time, place to place, and person to person is called a variable; if probabilities are attached to the values of a variable, it is called a random variable. For example, if we say x = 1, x = 7, or x = -6, then x is a variable, but if a variable appears with attached probabilities as follows, then it is a random variable:

| x    | 1   | 2   | 3   | 4   |
|------|-----|-----|-----|-----|
| P(x) | 0.2 | 0.3 | 0.1 | 0.4 |
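As a quick check of the distribution above, a short Python sketch (an illustration added here, not part of the original text) can verify that the probabilities sum to one and compute the expected value:

```python
# Probability distribution of the random variable x from the table above.
dist = {1: 0.2, 2: 0.3, 3: 0.1, 4: 0.4}

# A valid probability distribution must sum to 1.
total = sum(dist.values())

# Expected value E[x] = sum of x * P(x).
expected = sum(x * p for x, p in dist.items())
print(expected)
```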

Population:

The whole count of the objects under study is called a population. A population may be finite or infinite: if its elements are countable, it is a finite population; if they are uncountable, it is an infinite population. For example:

- Population of MBA students at IUGC (finite population)
- Population of university teachers in Pakistan (finite population)
- Population of trees (infinite population)
- Population of sea life (infinite population)

A population is also categorized in two ways:

1- Homogeneous population
2- Heterogeneous population

Homogeneous Population:

If all the population elements have the same properties, the population is known as a homogeneous population. For example: a population of shops, a population of houses, a population of boys, a population of rice grains in a box, etc.

Quantitative Techniques in Analysis

Heterogeneous Population:

If the population elements do not all have the same properties, the population is known as a heterogeneous population. For example: a population of MBA students (male and female), a population of plants, etc.

Parameter:

A constant computed from the population, or a population characteristic, is known as a parameter. For example: the population mean µ, the population standard deviation σ, and the coefficients of skewness and kurtosis for the population.

Statistic:

A constant computed from the sample, or a sample characteristic, is known as a statistic. For example: the sample mean x̄, the sample standard deviation s, and the coefficients of skewness and kurtosis for the sample.

Estimator:

A sample statistic used to estimate a population parameter is known as an estimator. For example: the sample mean is used to estimate the population mean, so the sample mean is called an estimator of the population mean. Likewise, the sample variance is used to estimate the population variance, so the sample variance is called an estimator of the population variance.

Hypothesis:

An assumption about a population parameter that is tested on the basis of sample information is called a hypothesis, and the procedure is called hypothesis testing. The assumptions are established as two complementary statements, the null and alternative hypotheses, framed in such a manner that if one statement is found wrong, the other is automatically accepted as correct.


**Types of Hypothesis:**

**1) Null Hypothesis:**

A statement, or the first claim about the parameter value, is called a null hypothesis. Statistically, a null hypothesis is a statement that must contain an equality sign, such as:

H0: µ = µ0
H0: µ ≤ µ0
H0: µ ≥ µ0

As is clear from the statements above, there are two types of null hypothesis:

1- Simple null hypothesis
2- Composite null hypothesis

**1- Simple Null Hypothesis:**

If a null hypothesis is based on a single value (that is, it contains only the equal sign, H0: µ = µ0), it is called a simple null hypothesis. Example phrases:

- The average rainfall in the United States of America during 1999 was 200 mm.
- The average concentrations of the two substances are the same.
- The IQ levels of MBA and BBA students are the same.
- IQ level is independent of education level.

**2- Composite Null Hypothesis:**

If a null hypothesis is based on an interval of the parameter value (that is, it contains a less-than or greater-than sign together with the equal sign), it is called a composite null hypothesis, for example:

H0: µ ≤ µ0
H0: µ ≥ µ0

Example phrases:

- The mean height of BBA students is at most 70 inches.
- The performance of PhD students is at most the same as that of MBA students.
- The variability in a data set must be non-negative (greater than or equal to zero).

2) Alternative Hypothesis:

A statement automatically generated against the established null hypothesis is called an alternative hypothesis. For example:

| Null Hypothesis | Alternative Hypothesis |
|-----------------|------------------------|
| H0: µ = µ0      | H1: µ ≠ µ0             |
| H0: µ ≤ µ0      | H1: µ > µ0             |
| H0: µ ≥ µ0      | H1: µ < µ0             |

It is clear from the alternatives stated above that there are two different types of alternative hypothesis:

1- One-tailed or one-sided alternative hypothesis
2- Two-tailed or two-sided alternative hypothesis

**1- One-tailed Alternative Hypothesis:**

If an alternative contains either a greater-than (>) or a less-than (<) sign, it is known as a one-tailed alternative hypothesis:

H1: µ > µ0   or   H1: µ < µ0

Example phrases:

- The average rainfall in Pakistan is greater than the average rainfall in Jakarta.
- Inzamam is a more consistent player than Shahid Afridi.
- Wasim Akram is a better bowler than McGrath.
- Gold prices are dependent on oil prices.

**2- Two-tailed Alternative Hypothesis:**

If an alternative contains only the not-equal (≠) sign, it is known as a two-tailed alternative hypothesis:

H1: µ ≠ µ0

Example phrases:

- The concentrations of the two substances are not the same.
- There is a significant difference between the wheat production of Sindh and Punjab.
- The consistency of the KSE and the SSE is not the same.

In a one-tailed alternative, the entire probability of a Type I error lies in one tail of the normal curve; in a two-tailed alternative, it is divided between the two tails of the curve.

**Probabilities Associated with Decisions:**

|           | H0 is True               | H0 is False              |
|-----------|--------------------------|--------------------------|
| Accept H0 | Correct decision (1 − α) | Type II error (β)        |
| Reject H0 | Type I error (α)         | Correct decision (1 − β) |

[Figures: sampling distributions for the true population and another population, illustrating the regions corresponding to α and β]

It is clear from the figures above that the two errors cannot both be minimized at the same time: when the Type I error is reduced, an increase is observed in the Type II error.

P-Value:

It is the minimum value of alpha (α) needed to reject the null hypothesis. Since it is a value of α, it can be explained as the minimum probability of Type I error associated with a hypothesis while it is being tested. It is therefore used in two ways: in decision making, and to determine the probability of Type I error associated with the test.

**Decision Rule on the basis of p-value:**

Reject H0 if p-value < 0.05
Accept H0 if p-value ≥ 0.05

For example, if the p-value for a test comes out as 0.01, it indicates that the null hypothesis is to be rejected and that there is only a 1% chance of rejecting a true null hypothesis. Put another way, we are 99% confident in rejecting the null hypothesis; we can reject this null hypothesis at α = 1%, i.e., at the 99% confidence level.

If the p-value for a test comes out as 0.21, it indicates that the null hypothesis is to be accepted, because rejecting it would carry a 21% chance of rejecting a true null hypothesis; we would be only 79% confident in that rejection. In other words, this null hypothesis could only be rejected at α = 21%.


T-test: A t-test is a statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is true. It is applied when the population is assumed to be normally distributed but the sample size is small enough that the statistic on which inference is based is not normally distributed, because it relies on an uncertain estimate of the standard deviation rather than on a precisely known value.

**Uses of T-test:** Among the most frequently used t-tests are:

- A test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
- A test of the null hypothesis that the means of two normally distributed populations are equal. Given two data sets, each characterized by its mean, standard deviation, and number of data points, we can use some form of t-test to determine whether the means are distinct, provided the underlying distributions can be assumed to be normal. There are different versions of the t-test depending on whether the two samples are:
  - Unpaired, i.e., independent of each other (e.g., individuals randomly assigned into two groups, measured after an intervention, and compared with the other group), or
  - Paired, so that each member of one sample has a unique relationship with a particular member of the other sample (e.g., the same people measured before and after an intervention).

Interpretation of the results: If the calculated p-value is below the threshold chosen for statistical significance (usually the 0.10, 0.05, or 0.01 level), then the null hypothesis, which usually states that the two groups do not differ, is rejected in favor of an alternative hypothesis, which typically states that the groups do differ.

- A test of whether the slope of a regression line differs significantly from 0.

**Statistical Analysis of the t-test:**

The formula for the t-test is a ratio. The top part of the ratio is the difference between the two means or averages. The bottom part is a measure of the variability or dispersion of the scores. The formula is essentially another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data; the bottom part of the formula is a measure of variability that is essentially noise that may make it harder to see


the group difference. The figure shows the formula for the t-test and how the numerator and denominator are related to the distributions.

The top part of the formula is easy to compute: just find the difference between the means. The bottom part is called the standard error of the difference. To compute it, we take the variance for each group, divide it by the number of people in that group, add these two values, and then take the square root.

The t-value will be positive if the first mean is larger than the second and negative if it is smaller. Once we compute the t-value, we have to look it up in a table of significance to test whether the ratio is large enough to say that the difference between the groups is unlikely to have been a chance finding. To test the significance, we need to set a risk level (called the alpha level). In most social research, the rule of thumb is to set the alpha level at .05. This means that five times out of a hundred we would find a statistically significant difference between the means even if there was none (i.e., by "chance"). We also need to determine the degrees of freedom (df) for the test: in this t-test, the degrees of freedom is the total number of persons in both groups minus 2. Given the alpha level, the df, and the t-value, we can look the t-value up in a standard table of significance (available as an appendix in the back of most statistics texts) to determine whether it is large enough to be significant. If it is, we can conclude that the difference between the means of the two groups is significant (even given the variability).
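The computation just described can be sketched directly with the standard library (the two groups of scores below are hypothetical):

```python
import math
import statistics

# Hypothetical scores for a treatment group and a control group.
group1 = [3, 4, 5]
group2 = [1, 2, 3]

# Signal: the difference between the group means.
signal = statistics.mean(group1) - statistics.mean(group2)

# Noise: the standard error of the difference -- each group's variance
# divided by its size, summed, then square-rooted.
se = math.sqrt(statistics.variance(group1) / len(group1)
               + statistics.variance(group2) / len(group2))

t = signal / se
df = len(group1) + len(group2) - 2  # degrees of freedom
print(t, df)
```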

**Calculations:**

a) **Independent one-sample t-test**


In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic

t = (x̄ − μ0) / (s / √n)

where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size. The degrees of freedom used in this test is n − 1.
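A minimal sketch of this statistic in Python (the data values and hypothesized mean are hypothetical):

```python
import math
import statistics

data = [32, 35, 29, 40, 36]  # hypothetical sample
mu0 = 30                     # hypothesized population mean

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)   # sample standard deviation (n - 1 divisor)

t = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1
print(t, df)
```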

b) **Independent two-sample t-test**

A) Equal sample sizes, equal variance

This test is used only when both of the following hold:

- the two sample sizes (that is, the number n of participants in each group) are equal; and
- it can be assumed that the two distributions have the same variance.

Violations of these assumptions are discussed below. The t statistic to test whether the means are different can be calculated as follows:

t = (x̄1 − x̄2) / (sp · √(2/n))

where the grand (or pooled) standard deviation is

sp = √((s1² + s2²) / 2)

Here subscript 1 denotes group one and subscript 2 denotes group two. The denominator of t is the standard error of the difference between the two means. For significance testing, the degrees of freedom for this test is n1 + n2 − 2, where n1 is the number of participants in group 1 and n2 is the number of participants in group 2.
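Under the stated assumptions (equal group sizes and equal variances), a short sketch with hypothetical data:

```python
import math
import statistics

# Hypothetical groups of equal size.
group1 = [5, 7, 9]
group2 = [4, 5, 6]
n = len(group1)

# Pooled standard deviation for the equal-n, equal-variance case.
sp = math.sqrt((statistics.variance(group1) + statistics.variance(group2)) / 2)

t = (statistics.mean(group1) - statistics.mean(group2)) / (sp * math.sqrt(2 / n))
df = 2 * n - 2
print(t, df)
```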


B) Unequal sample sizes, unequal variance

This test is used when the two sample sizes are unequal and the variances are assumed to be different (see also Welch's t-test). The t statistic to test whether the means are different can be calculated as follows:

t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

where n1 is the number of participants in group 1 and n2 is the number of participants in group 2. In this case the variance is not a pooled variance. For use in significance testing, the distribution of the test statistic is approximated as an ordinary Student's t distribution with the degrees of freedom calculated using

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]

This is called the Welch–Satterthwaite equation. Note that the true distribution of the test statistic actually depends (slightly) on the two unknown variances. This test can be used as either a one-tailed or a two-tailed test.

c) Dependent t-test for paired samples

This test is used when the samples are dependent; that is, when there is only one sample that has been tested twice (repeated measures) or when there are two samples that have been matched or "paired".

For this version, the differences between all pairs must be calculated. The pairs are either one person's pre-test and post-test scores, or pairs of persons matched into meaningful groups (for instance, drawn from the same family or age group; see table). The average of the differences (X̄D) and their standard deviation (sD) are used in the equation

t = (X̄D − μ0) / (sD / √N)

The constant μ0 is non-zero if you want to test whether the average of the differences is significantly different from μ0. The degrees of freedom used is N − 1.
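A sketch of the paired computation with hypothetical before/after scores, testing against μ0 = 0:

```python
import math
import statistics

# Hypothetical pre-test and post-test scores for the same four people.
pre = [10, 12, 9, 11]
post = [12, 13, 11, 14]

diffs = [b - a for a, b in zip(pre, post)]  # pairwise differences
n = len(diffs)

mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)

mu0 = 0  # test whether the average difference differs from zero
t = (mean_d - mu0) / (sd_d / math.sqrt(n))
df = n - 1
print(t, df)
```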

Example # 01


**Analysis through SPSS**

A) One-sample t-test

SPSS needs:

1) The data in the form of a numerical variable.
2) A test value, i.e., the hypothesized value against which we are going to test.

To analyze the one-sample t-test I have used the salaries of the employees of an organization. For this purpose, I selected a sample of 474 employees of the company. The hypotheses are:

a) The null hypothesis states that the average salary of the employees is equal to 30,000:
H0: µ = 30,000

b) The alternative hypothesis states that the average salary of the employees is not equal to 30,000:
HA: µ ≠ 30,000

Method:

Enter the data in the data editor; the variable is labeled as the employee's current salary. Now click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on One-Sample T Test. A dialogue box appears in which all the input variables are listed on the left-hand side. From this box we select the variable to be analyzed, in our case the current salaries of the employees, and transfer it to the Test Variable(s) box. Next, change the value in the Test Value box, which originally appears as 0, to the one against which you are testing the sample mean; in this case, this value is 30000. Now click OK to run the analysis.

Pictorial Representation

Analyze → Compare Means → One-Sample T Test → Drag Test Variable (Scale) → Give Test Value → OK


SPSS output:

One-Sample Statistics

|                | N   | Mean       | Std. Deviation | Std. Error Mean |
|----------------|-----|------------|----------------|-----------------|
| Current Salary | 474 | $34,419.57 | $17,075.661    | $784.311        |

Interpretation:

In the above table, N shows the total number of observations. The average salary of the employees is 34,419.57, the standard deviation of the data is 17,075.661, and the standard error of the mean is 784.311.

One-Sample Test (Test Value = 30000)

|                | t     | df  | Sig. (2-tailed) | Mean Difference | 95% CI Lower | 95% CI Upper |
|----------------|-------|-----|-----------------|-----------------|--------------|--------------|
| Current Salary | 5.635 | 473 | .000            | $4,419.568      | $2,878.40    | $5,960.73    |

**Interpretation:** From the above table we can observe that:

i) The t value is positive, which shows that the estimated mean is greater than the hypothesized mean.
ii) The degrees of freedom is (N − 1) = 473.
iii) The p-value is 0.000, which is less than 0.05.
iv) The difference between the estimated and hypothesized mean is 4,419.568.
v) The confidence interval has lower and upper limits of 2,878.40 and 5,960.73 respectively; the interval does not contain zero.

Decision: On the basis of the following observations I reject the null hypothesis and accept the alternative hypothesis, and I am almost 100% sure of this decision:
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.

Comments: The average salary of the employees is not equal to 30,000.

Example # 02

B) Independent t-test

SPSS needs:

1) Two variables are required: one numerical and one categorical with two levels.


To analyze the independent t-test I have used the salaries of the employees of an organization. For this purpose, I selected a sample of 474 employees of the company, containing both males and females. In my analysis I coded males as "m" and females as "f". The hypotheses are:

a) The null hypothesis states that the average salary of the male employees is equal to the average salary of the female employees:
H0: µm = µf

b) The alternative hypothesis states that the average salary of the male employees is not equal to the average salary of the female employees:
HA: µm ≠ µf

Method:

Enter the data in the data editor; the variables are labeled as the employee's current salary and gender respectively. Click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on Independent-Samples T Test. A dialogue box appears in which all the input variables are listed on the left-hand side. To perform the independent-samples t-test, transfer the dependent variable into the Test Variable(s) box and transfer the variable that identifies the groups into the Grouping Variable box. In this case, the current salary of the employees is the dependent variable to be analyzed, and gender is the variable that identifies the groups. Once the grouping variable is transferred, the Define Groups button, which was earlier inactive, turns active. Click on it to define the two groups: here group 1 represents the male employees ("m") and group 2 the female employees ("f"). Enter these codes against group 1 and group 2, click Continue, and then click OK to run the analysis.

Pictorial Representation

Analyze → Compare Means → Independent-Samples T Test → Drag Test & Grouping Variables → Define Groups → OK

SPSS output:

Group Statistics

|                | Gender | N   | Mean       | Std. Deviation | Std. Error Mean |
|----------------|--------|-----|------------|----------------|-----------------|
| Current Salary | Male   | 258 | $41,441.78 | $19,499.214    | $1,213.968      |
| Current Salary | Female | 216 | $26,031.92 | $7,558.021     | $514.258        |

Interpretation: From the above table we can observe that:

i) The total number of males is 258 and of females is 216.
ii) The mean salary of the male employees is 41,441.78 and of the female employees is 26,031.92.
iii) The standard deviation of the male employees' salaries is 19,499.214 and of the female employees' salaries is 7,558.021.
iv) The standard error of the mean is 1,213.968 for the male employees and 514.258 for the female employees.

Independent Samples Test (Current Salary)

|                             | Levene's F | Sig. | t      | df      | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|-----------------------------|------------|------|--------|---------|-----------------|-----------------|-----------------------|--------------|--------------|
| Equal variances assumed     | 119.669    | .000 | 10.945 | 472     | .000            | $15,409.862     | $1,407.906            | $12,643.322  | $18,176.401  |
| Equal variances not assumed |            |      | 11.688 | 344.262 | .000            | $15,409.862     | $1,318.400            | $12,816.728  | $18,002.996  |

Interpretation: The above table has two parts, (a) Levene's F-test and (b) the t-test, from which we can observe that:

i) The F value is 119.669 with a significance value of 0.000, which is less than 0.05.
ii) On the basis of the p-value of the F-test we conclude that the variances of the two populations are not equal, so the "equal variances not assumed" row is used.
iii) The t value is positive, which shows that the mean salary of the male employees is greater than that of the female employees.
iv) The degrees of freedom is 344.262.
v) The p-value is 0.000, which is less than 0.05.
vi) The difference between the two population means is 15,409.862.
vii) The standard error of the difference is 1,318.400.
viii) The confidence interval has lower and upper limits of 12,816.728 and 18,002.996 respectively; the interval does not contain zero.

Decision: On the basis of the following observations I reject the null hypothesis and accept the alternative hypothesis, and I am almost 100% sure of this decision:
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.

Comments: The average salaries of the male and female employees are not equal.

Example # 03

C) Paired t-test

SPSS needs:

1) Two numerical variables, equal in length, are required.

To analyze the paired t-test I used the beginning and current salaries of the employees of an organization. For this purpose, I selected a sample of 474 employees of the organization. The hypotheses are:

a) The null hypothesis states that the average current salary of the employees is equal to their average beginning salary, i.e., the mean difference is zero:
H0: µd = 0


b) The alternative hypothesis states that the average current salary of the employees is not equal to their average beginning salary, i.e., the mean difference is not zero:
HA: µd ≠ 0

Method:

Enter the data in the data editor; the variables are labeled as the employee's current and beginning salary respectively. Click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on Paired-Samples T Test. A dialogue box appears in which all the input variables are listed on the left-hand side. From this box we select the variables to be computed, in our case the current and beginning salaries. Select these together and they will immediately appear in the box at the bottom labeled Current Selection, simultaneously highlighted in the box in which they originally appeared. Once the variables are selected, the arrow at the center becomes active; click on it to transfer them to the Paired Variables box, where they appear as Current–Beginning. Now click OK to run the analysis.

Pictorial Representation

Analyze → Compare Means → Paired-Samples T Test → Drag Paired Variables (Scale) → OK

SPSS output:

Paired Samples Statistics

|        |                  | Mean       | N   | Std. Deviation | Std. Error Mean |
|--------|------------------|------------|-----|----------------|-----------------|
| Pair 1 | Current Salary   | $34,419.57 | 474 | $17,075.661    | $784.311        |
| Pair 1 | Beginning Salary | $17,016.09 | 474 | $7,870.638     | $361.510        |

Interpretation: From the above table we can observe that:

i) The mean values of the current and beginning salaries are 34,419.57 and 17,016.09 respectively.
ii) Both groups contain 474 observations.
iii) The standard deviations of the current and beginning salaries are 17,075.661 and 7,870.638 respectively.
iv) The standard errors of the mean of the current and beginning salaries are 784.311 and 361.510 respectively.

Paired Samples Correlations

|        |                                   | N   | Correlation | Sig. |
|--------|-----------------------------------|-----|-------------|------|
| Pair 1 | Current Salary & Beginning Salary | 474 | .880        | .000 |

Interpretation:

Statistical Applications through SPSS

S. Ali Raza Naqvi

From the above table we can observe that:

i) The total number of pairs is 474.
ii) The correlation of 0.88 shows that the two variables are highly correlated, which indicates that employees with a higher beginning salary also tend to have a higher current salary.
iii) The p-value is 0.000, which is less than 0.05.

Paired Samples Test

|        |                                   | Mean        | Std. Deviation | Std. Error Mean | 95% CI Lower | 95% CI Upper | t      | df  | Sig. (2-tailed) |
|--------|-----------------------------------|-------------|----------------|-----------------|--------------|--------------|--------|-----|-----------------|
| Pair 1 | Current Salary − Beginning Salary | $17,403.481 | $10,814.620    | $496.732        | $16,427.407  | $18,379.555  | 35.036 | 473 | .000            |

Interpretation: From the above table we can observe that:

i) The mean of the paired differences is 17,403.481.
ii) The standard deviation of the differences is 10,814.620.
iii) The standard error of the mean difference is 496.732.
iv) The confidence interval has lower and upper limits of 16,427.407 and 18,379.555 respectively; the interval does not contain zero.
v) The t value is 35.036.
vi) The degrees of freedom is (N − 1) = 473.
vii) The p-value is 0.000, which is less than 0.05.

Decision: On the basis of the following observations I reject the null hypothesis and accept the alternative hypothesis, and I am almost 100% sure of this decision:
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.

Comments: The mean difference between the two paired variables, i.e., the current and beginning salaries, is significant; the two means are not the same.


**One-Way ANOVA**

ANOVA is a commonly used statistical method for making simultaneous comparisons between two or more population means; it yields values that can be tested to determine whether a significant relation exists between the variables. Its simplest form is one-way ANOVA, which involves one dependent variable and one independent (factor) variable.

Data Source:

C:\SPSSEVAL\Employee Data

**Variables:** Here we analyze two different variables by one-way ANOVA, i.e.:

A) Current salary of the employees.
B) Employment category.

Hypothesis:

H0: µ1 = µ2 = µ3
HA: at least one mean is not equal.

SPSS Need:

SPSS needs two types of variables for analyzing one-way ANOVA:

- A numerical variable (scale).
- A categorical variable (with more than two categories).

Method:

First of all, enter the data in the data editor; the variables are labeled as the employee's current salary and employment category respectively. Click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on One-Way ANOVA. A dialogue box appears in which all the input variables are listed on the left-hand side. To perform one-way ANOVA, transfer the dependent variable into the box labeled Dependent List and the factoring variable into the box labeled Factor. In our case, current salary is the dependent variable and should be transferred to the Dependent List box, and employment category is the factoring variable and should be transferred to the Factor box; then click OK to run the analysis. If the null hypothesis is rejected, ANOVA only tells us that the population means are not all equal. Multiple comparisons are then used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; LSD (Least Significant Difference) and the Tukey test are among the most conservative and commonly used.

Pictorial Representation

Analyze → Compare Means → One-Way ANOVA → Drag Dependent List & Factors → Post Hoc (Optional) → OK
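The decomposition that one-way ANOVA performs can be sketched by hand in Python (the group data below are hypothetical); SPSS reports the same between-groups and within-groups quantities:

```python
import statistics

# Three hypothetical groups (e.g., salaries in three employment categories).
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]

all_values = [x for g in groups for x in g]
grand_mean = statistics.mean(all_values)

# Between-groups sum of squares: variability explained by group membership.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)

# Within-groups sum of squares: variability due to random error.
ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_value = (ss_between / df_between) / (ss_within / df_within)
print(ss_between, ss_within, f_value)
```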


Output:

ANOVA: Current Salary

|                | Sum of Squares   | df  | Mean Square     | F       | Sig. |
|----------------|------------------|-----|-----------------|---------|------|
| Between Groups | 89438483925.943  | 2   | 44719241962.972 | 434.481 | .000 |
| Within Groups  | 48478011510.397  | 471 | 102925714.459   |         |      |
| Total          | 137916495436.340 | 473 |                 |         |      |

The above table gives the test results for the analysis of one-way ANOVA. The results are given in three rows. The first row labeled between groups gives the variability due to the different designations of the employees (known reasons). The second row labeled within groups gives the variability due to random error (unknown reasons), and the third row gives the total variability. In this case, F-value is 434.481, and the corresponding p-value is less than 0.05. Therefore we can safely reject the null hypothesis and conclude that the average salary of the employees is not the same in all three categories.

**Post Hoc Tests
**

Multiple Comparisons
Dependent Variable: Current Salary (LSD)

| (I) Employment Category | (J) Employment Category | Mean Difference (I−J) | Std. Error | Sig. | 95% CI Lower | 95% CI Upper |
|-------------------------|-------------------------|-----------------------|------------|------|--------------|--------------|
| Clerical                | Custodial               | -$3,100.349           | $2,023.760 | .126 | -$7,077.06   | $876.37      |
| Clerical                | Manager                 | -$36,139.258*         | $1,228.352 | .000 | -$38,552.99  | -$33,725.53  |
| Custodial               | Clerical                | $3,100.349            | $2,023.760 | .126 | -$876.37     | $7,077.06    |
| Custodial               | Manager                 | -$33,038.909*         | $2,244.409 | .000 | -$37,449.20  | -$28,628.62  |
| Manager                 | Clerical                | $36,139.258*          | $1,228.352 | .000 | $33,725.53   | $38,552.99   |
| Manager                 | Custodial               | $33,038.909*          | $2,244.409 | .000 | $28,628.62   | $37,449.20   |

*. The mean difference is significant at the .05 level.

The post-hoc test presents the results of comparisons between all possible pairs. Since we have three groups, a total of six pairs are possible, of which three are mirror images. The p-value for the Clerical–Manager and Custodial–Manager comparisons is shown as 0.000, whereas it is 0.126 for the Clerical–Custodial comparison. This means that the average current salary differs significantly between Clerical and Manager as well as between Custodial and Manager, whereas it does not differ significantly between Clerical and Custodial. Conclusion: since our null hypothesis is rejected, we conclude that the three means are not all the same. To identify which mean differs from the others we used the LSD test, and we conclude that the mean for managers is significantly different from the other two means, whereas the other two means do not differ significantly from each other.
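The LSD comparisons can be sketched as pairwise t statistics that reuse the within-groups mean square from the ANOVA (the data and group labels below are hypothetical, not the employee data above):

```python
import itertools
import math
import statistics

# Hypothetical groups, reusing the one-way ANOVA quantities.
groups = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [6, 7, 8]}

all_values = [x for g in groups.values() for x in g]
ms_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g)
                for g in groups.values()) / (len(all_values) - len(groups))

# LSD: each pair is compared with a t statistic whose standard error
# uses the pooled within-groups mean square.
results = {}
for (name1, g1), (name2, g2) in itertools.combinations(groups.items(), 2):
    se = math.sqrt(ms_within * (1 / len(g1) + 1 / len(g2)))
    results[(name1, name2)] = (statistics.mean(g1) - statistics.mean(g2)) / se

print(results)
```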

**Two-Way ANOVA**

Quantitative Techniques in Analysis Page 24

Statistical Applications through SPSS

S. Ali Raza Naqvi

In two-way analysis, we have two independent variables (known factors) and we are interested in knowing their effect on the same dependent variable.

Data Source:

C:\SPSSEVAL\Carpet

**Variables:** Here we analyze two categorical variables together with a numerical variable by two-way ANOVA, i.e.

A) Preference (Numerical)
B) Package design (Categorical)
C) Brand (Categorical)

Hypothesis:

For Brand:
H0: µi = µj for all i & j
HA: µi ≠ µj for at least one pair (i, j)

For Package:
H0′: µi = µj for all i & j
HA′: µi ≠ µj for at least one pair (i, j)

SPSS Need:

SPSS needs two types of variables for analyzing two-way ANOVA:
• A numerical variable (scale).
• Two categorical variables (with more than two levels).

Method:

First of all, enter the data in the data editor with the variables labeled as Preference, Brand, and Package design respectively. Click on Analyze, which will produce a drop-down menu; choose General Linear Model and click on Univariate. A dialogue box appears in which all the input variables are listed on the left-hand side. To perform two-way ANOVA, transfer the dependent variable (Preference) into the box labeled Dependent Variable and the factor variables (Brand & Package) into the box labeled Fixed Factor(s). After defining all variables, click on OK to run the analysis. If the null hypothesis is rejected, multiple comparisons are used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; Tukey's test is one of the more conservative and commonly used tests, while LSD (Least Significant Difference) is among the least conservative.

Pictorial Representation

Analyze → General Linear Model → Univariate → Drag Dependent Variable & Fixed Factors → Post Hoc → OK


Output:

Between-Subjects Factors

| Factor | Level | Value Label | N |
|---|---|---|---|
| Package design | 1.00 | A* | 9 |
| | 2.00 | B* | 6 |
| | 3.00 | C* | 7 |
| Brand name | 1.00 | K2R | 7 |
| | 2.00 | Glory | 7 |
| | 3.00 | Bissell | 8 |

This table shows the value labels under each category and the frequency of each value label. We have a total of six value labels under package design and brand name.

Tests of Between-Subjects Effects
Dependent Variable: Preference

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| package | 537.231 | 2 | 268.616 | 16.883 | .000 |
| brand | 36.108 | 2 | 18.054 | 1.135 | .351 |
| Error | 206.833 | 13 | 15.910 | | |
| Total | 3758.000 | 22 | | | |

a. R Squared = .763 (Adjusted R Squared = .617)

The above table gives the test results for the analysis of two-way ANOVA. The results are given in four rows. The first row, labeled package, gives the variability due to the different package designs of the carpets, which may affect the customers' preferences (known reason). The second row, labeled brand, gives the variability due to the different brand names (known reason). The third row, labeled Error, gives the variability due to random error, which also affects the customers' preferences (unknown reasons). The fourth row gives the total variability in the customers' preferences due to both known and unknown reasons. In this case, the F-value for package design is 16.883 and the corresponding p-value is less than 0.05, so we can safely reject the null hypothesis for package design and conclude that the average preference is not the same for all packages. The F-value for brand name is 1.135 and the corresponding p-value is greater than 0.05, so we fail to reject the null hypothesis for brand and conclude that the average brand preferences are approximately the same.
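The variability partition described above can be sketched by hand. The following fragment computes the main-effect sums of squares for a small, made-up balanced data set (3 package designs × 3 brands, 2 replicates), fitting an additive model with no interaction term; all names and numbers are illustrative, not the Carpet file:

```python
import numpy as np
from scipy import stats

# Toy balanced preference scores, shape (a=3 packages, b=3 brands, n=2 reps)
y = np.array([
    [[18, 20], [19, 21], [17, 19]],   # package A*
    [[10, 12], [11, 13], [ 9, 11]],   # package B*
    [[ 8, 10], [ 9, 11], [ 7,  9]],   # package C*
], dtype=float)

a, b, n = y.shape
N = y.size
grand = y.mean()

# Main-effect sums of squares for the additive (no-interaction) model
ss_package = b * n * ((y.mean(axis=(1, 2)) - grand) ** 2).sum()
ss_brand   = a * n * ((y.mean(axis=(0, 2)) - grand) ** 2).sum()
ss_total   = ((y - grand) ** 2).sum()
ss_error   = ss_total - ss_package - ss_brand   # leftover, unknown reasons

df_package, df_brand = a - 1, b - 1
df_error = N - 1 - df_package - df_brand

f_package = (ss_package / df_package) / (ss_error / df_error)
f_brand   = (ss_brand / df_brand) / (ss_error / df_error)
p_package = stats.f.sf(f_package, df_package, df_error)
p_brand   = stats.f.sf(f_brand, df_brand, df_error)
print(f"package: F = {f_package:.2f}, p = {p_package:.4g}")
print(f"brand:   F = {f_brand:.2f}, p = {p_brand:.4g}")
```

Each factor's F-value is its mean square divided by the error mean square, exactly the structure of the SPSS table above (SPSS's default full-factorial model additionally estimates an interaction term, which this additive sketch omits).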

**Post Hoc Tests**


Package design

Multiple Comparisons
Dependent Variable: Preference (LSD)

| (I) Package design | (J) Package design | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|
| A* | B* | 11.5556* | 2.10226 | .000 | 7.0139 | 16.0972 |
| A* | C* | 9.2698* | 2.01015 | .000 | 4.9272 | 13.6125 |
| B* | A* | -11.5556* | 2.10226 | .000 | -16.0972 | -7.0139 |
| B* | C* | -2.2857 | 2.21914 | .322 | -7.0799 | 2.5085 |
| C* | A* | -9.2698* | 2.01015 | .000 | -13.6125 | -4.9272 |
| C* | B* | 2.2857 | 2.21914 | .322 | -2.5085 | 7.0799 |

Based on observed means.
*. The mean difference is significant at the .05 level.

As our null hypothesis for package design is rejected, multiple comparisons are used to assess which group mean differs from the others. The above table gives the results of the multiple comparisons between each value label under the package design category. The post-hoc test compares all possible pairs; since we have three groups, a total of six pairs are possible, of which three are mirror images. The p-value for the A* - B* and A* - C* comparisons is shown as 0.000, whereas it is 0.322 for the B* - C* comparison. This means that the average preference differs significantly between package designs A* and B* and between A* and C*, whereas it does not differ significantly between B* and C*. Conclusion: As our null hypothesis for package design is rejected, we conclude that the mean preferences for the package designs are not all the same. To identify which mean differs from the others we used the LSD test and conclude that the mean preference for A* is significantly different from the other two means, whereas the other two means do not differ significantly from each other. In the case of brand name, however, our null hypothesis is not rejected and we conclude that the mean brand preferences are approximately the same, so there is no need for multiple comparisons for brand.

**Chi-Square Test**

The chi-square test is commonly used to test hypotheses regarding:
• Goodness of fit
• Association / independence of attributes

It is denoted by χ² and its degrees of freedom are n - 1, where n = number of categories. It follows a positively skewed distribution, so it has a one-tailed critical region on the right tail of the curve, and the value of χ² is always positive.
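Because the critical region sits entirely in the right tail, the decision rule compares the statistic with an upper quantile of the χ² distribution. A small sketch with `scipy.stats.chi2` (the statistic value 7.5 is just an illustrative number):

```python
from scipy import stats

# Degrees of freedom: n - 1, where n is the number of categories
n_categories = 3
df = n_categories - 1

# Upper 5% critical value: reject H0 when the statistic exceeds it
critical = stats.chi2.ppf(0.95, df)
print(f"Reject H0 at the 5% level if chi-square > {critical:.3f}")

# Equivalently, the p-value is the right-tail area beyond the statistic
statistic = 7.5              # illustrative example value
p_value = stats.chi2.sf(statistic, df)
print(f"chi-square = {statistic}, p = {p_value:.3f}")
```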

Chi-Square Goodness of Fit Test


The chi-square goodness-of-fit test is used when the distribution is non-normal and the sample size is small (less than 30). Here, the chi-square goodness-of-fit test determines whether the data follow a uniform distribution or not.

Data Source:

C:\SPSSEVAL\Carpet

**Variables:** Here we are interested in analyzing a numerical variable, i.e.

• Price (Numerical)

**Hypothesis:**

H0: Fit is good. (Data follows a uniform distribution / prices are uniform)
HA: Fit is not good. (Data does not follow a uniform distribution / prices are not uniform)

SPSS Need:

SPSS needs a categorical or a numerical variable for analyzing the chi-square goodness-of-fit test.

Graphical Representation:

[Histogram of Price, with frequency on the y-axis. Mean = 2.00, Std. Dev. = 0.87287, N = 22]


Explanation of Graph

From the above graph we see that our numerical variable (price) is on the x-axis and its frequency on the y-axis. The mean and standard deviation of the 22 observations are 2.00 and 0.87287 respectively. The graph clearly shows that the selected numerical variable, price, does not follow a normal distribution, so we use the chi-square goodness-of-fit test to determine whether the sample under investigation has been drawn from a population that follows some specified distribution.

Method:

First of all, enter the data in the data editor with the variable labeled as price. Click on Analyze, which will produce a drop-down menu; choose Nonparametric Tests and click on Chi-Square. A dialogue box appears in which all the input variables are listed on the left-hand side. Select the variable you want to analyze; when you do, the arrow between the two boxes becomes active and you can transfer the variable into the box labeled Test Variable List by clicking on the arrow. In this case our test variable is price, and it should be transferred to the test variable box. You can also click on the Options button if you are interested in the descriptive statistics of the tested variable. Now click on OK to run the analysis.

Pictorial Representation

Analyze → Nonparametric Tests → Chi-Square → Define Test Variable List → OK


Output

| Price | Observed N | Expected N | Residual |
|---|---|---|---|
| $1.19 | 8 | 7.3 | .7 |
| $1.39 | 6 | 7.3 | -1.3 |
| $1.59 | 8 | 7.3 | .7 |
| Total | 22 | | |

First column of the above table shows the three categories in price variable. The column labeled Observed N gives the actual number of cases falling in different categories of test variable, which is directly obtained from the data given. The column labeled Expected N gives the expected number of cases that should fall in each category of the test variable. The column labeled Residual gives the difference between observed and expected frequencies of each category, and it is commonly known as Error.


Test Statistics

| | Price |
|---|---|
| Chi-Square(a) | .364 |
| df | 2 |
| Asymp. Sig. | .834 |

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 7.3.

The above table gives the test results for the chi-square goodness-of-fit test. In this case the chi-square value is 0.364 with 2 degrees of freedom. The p-value for the test is shown as 0.834, which is greater than 0.05, so we fail to reject our null hypothesis that the fit is good. Conclusion: The test results are not statistically significant at the 5% level of significance; the data provide no evidence against our null hypothesis, so the test variable (price) is consistent with a uniform distribution.
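This result is easy to reproduce outside SPSS: `scipy.stats.chisquare`, when given only observed counts, tests against a uniform distribution, which is exactly the hypothesis used here. Using the observed frequencies from the table above:

```python
from scipy.stats import chisquare

# Observed counts for the three price categories from the output table
observed = [8, 6, 8]

# With no expected frequencies supplied, chisquare() assumes a uniform
# distribution (expected N = 22/3, roughly 7.3 per category)
statistic, p_value = chisquare(observed)
print(f"chi-square = {statistic:.3f}, df = {len(observed) - 1}, p = {p_value:.3f}")
```

The printed statistic (0.364) and p-value (0.834) match the SPSS output, and since p > 0.05 we fail to reject the hypothesis that the fit is good.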

**Chi-Square Test for Independence**

The chi-square test for independence is used to test the hypothesis that two categorical variables are independent of each other. A small chi-square statistic is consistent with the null hypothesis that the two variables are independent of each other.

Data Source:

C:\SPSSEVAL\Employee Data

**Variables:** Here we analyze two categorical variables, i.e.

A) Gender of the employees (Categorical)
B) Designation of the employees (Categorical)

Hypothesis:

H0: Designation is independent of sex.
HA: Designation is not independent of sex.

SPSS Need:

SPSS needs two categorical variables for analyzing the chi-square test for independence.

Method:

First of all enter the data in the data editor and the variables are labeled as Gender, Designation, respectively. Click on Analyze which will produce a drop down menu, choose Descriptive Statistics from that and click on Crosstabs, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. Select the variable you want to create the row of your contingency table and transfer it to the box labeled Row(s), transfer the other variable to the box labeled Column(s). In this case we transfer gender to the box labeled Row(s) and designation to the box labeled column(s). Next, click on the Statistics button, which brings up a dialogue box. Here


tick the first box, labeled Chi-Square, and click Continue to return to the previous screen. Click on OK to run the analysis.

Pictorial Representation

Analyze → Descriptive Statistics → Crosstabs → Drag Row and Column Variables → Statistics → Tick Chi-Square → OK


Output

Gender * Employment Category Crosstabulation (Count)

| | Clerical | Custodial | Manager | Total |
|---|---|---|---|---|
| Female | 206 | 0 | 10 | 216 |
| Male | 157 | 27 | 74 | 258 |
| Total | 363 | 27 | 84 | 474 |

Cross tabulation is used to examine variation in categorical data; it is a cross-measuring analysis. Above, we cross-examine the gender and designation of the employees. We place the designation of the employees in the columns and the gender of the employees in the rows, with a total of 474 observations. The results are given in two rows: the first row shows the number of female employees in each employment category, and the second row shows the number of male employees in each employment category.

Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 79.277(a) | 2 | .000 |
| N of Valid Cases | 474 | | |

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 12.30.

The above table gives the test results for the chi-square test for independence. The first row, labeled Pearson Chi-Square, shows that the value of χ² is 79.277 with 2 degrees of freedom. The two-tailed p-value is shown as 0.000, which is less than 0.05, so we reject our null hypothesis and conclude that designation is not independent of sex. Conclusion: The test results are statistically significant at the 5% level of significance and the data provide sufficient evidence to conclude that the designation of the employees is not independent of their sex; with a p-value this close to zero, we can be very confident in rejecting the null hypothesis.
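The same test can be reproduced from the crosstabulation alone with `scipy.stats.chi2_contingency`, which computes the expected counts and the Pearson statistic directly:

```python
from scipy.stats import chi2_contingency

# Crosstabulation counts from the output table
# (rows: Female, Male; columns: Clerical, Custodial, Manager)
table = [[206, 0, 10],
         [157, 27, 74]]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p_value:.3g}")

# Minimum expected count (compare with the SPSS footnote)
print(f"min expected count = {expected.min():.2f}")
```

The statistic (79.277, df = 2) and the minimum expected count (12.30) match the SPSS output above.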


Second Approach

Consider a case in which the raw data are not available and only the table labeled Gender * Employment Category Crosstabulation in the above output is given. On the basis of the output table, you can easily obtain the same result as above by using the SPSS Weight Cases option. Below we briefly explain how to enter the data on the basis of the table and obtain the desired results.

Method

First of all, in the Variable View of SPSS define three variables and label them Gender, Employment Category, and Value. Now in the Data View, enter the data in a different manner. The table contains two rows and three columns: in the rows we have two categories, Female and Male, and in the columns we have three categories, Clerical, Custodial, and Manager. Both female and male employees fall into the three employment categories. So in the Data View we simply enter the row data (Gender), opposite it the column data (Employment Category), and the corresponding frequencies in the Value column. The resulting Data View is shown in the picture below.

After defining the data, click on Data, which will produce a drop-down menu, and choose Weight Cases. A dialogue box appears with all the variables on its left-hand side. Tick Weight cases by and move Value into the box labeled Frequency Variable by clicking on the arrow between the two boxes. Now click OK to return to the previous window.

The further process is the same as described above: define Gender in the rows and Employment Category in the columns, tick Chi-square via the Statistics button, and click OK to run the analysis. When the output appears, you will see that SPSS gives the same result as we found earlier from the raw data.


Regression Analysis

Regression is the relationship between selected values of an independent variable and observed values of a dependent variable, from which the most probable value of the dependent variable can be predicted for any value of the independent variable. The use of regression to make quantitative predictions of one variable from the values of another is called regression analysis. There are several types of regression which may be used by the researcher:
• Linear regression
• Multiple linear regression
• Quadratic / curvilinear regression
• Logistic / binary logistic regression
• Multivariate logistic regression

Linear Regression

When one dependent variable depends on a single independent variable, the relationship is called linear regression and its model is given by

y = a + bx

where
y is the dependent variable,
x is the independent variable,
a is the regression constant, and
b is the regression coefficient.

Regression Coefficient

The regression coefficient is a measure of how strongly the independent variable predicts the dependent variable. There are two types of regression coefficients:
• Un-standardized coefficients
• Standardized coefficients, commonly known as Beta

The un-standardized coefficients can be used in the equation as coefficients of the different independent variables, along with the constant term, to predict the value of the dependent variable. The standardized coefficient, however, is measured in standard deviations. A beta value of 2 associated with a particular independent variable indicates that a change of 1 standard deviation in that independent variable will result in a change of 2 standard deviations in the dependent variable.
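The relationship between the two kinds of coefficient can be verified numerically: the standardized beta is the unstandardized b rescaled by the ratio of the standard deviations, and in simple regression it coincides with the correlation coefficient. A sketch with toy data (the numbers are made up):

```python
import numpy as np

# Toy data (hypothetical): x predicts y roughly linearly
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

# Unstandardized slope b and constant a (ordinary least squares)
b, a = np.polyfit(x, y, 1)

# Standardized coefficient: rescale b by the standard deviations;
# in simple regression this equals the correlation between x and y
beta = b * x.std() / y.std()
r = np.corrcoef(x, y)[0, 1]
print(f"b = {b:.4f}, a = {a:.4f}, beta = {beta:.4f}, r = {r:.4f}")
```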


Data Source:

C:\SPSSEVAL\Employee Data

Variables:

Here we are interested in analyzing two numerical variables, i.e.
• Current salary (Numerical)
• Beginning salary (Numerical)

Hypothesis:

H0: The regression coefficient is zero.
HA: The regression coefficient is not zero.

SPSS Need:

SPSS needs two numerical variables, and both should be scale variables.

Method:

The given data is entered in the data editor and the variables are labeled as current salary and beginning salary. Click on Analyze, which will produce a drop-down menu; choose Regression and click on Linear. A dialogue box appears in which all the input variables are listed on the left-hand side. Transfer the dependent variable into the right-hand box labeled Dependent and the independent variable into the box labeled Independent(s). In our case, current salary is the dependent variable and beginning salary is the independent variable.

Next, we have to select the method for analysis in the box labeled Method. SPSS gives five options here: Enter, Stepwise, Remove, Forward, and Backward. In the absence of a strong theoretical reason for using a particular method, Enter should be used. The box labeled Selection Variable is used if we want to restrict the analysis to cases satisfying particular selection criteria. The box labeled Case Labels is used for designating a variable to identify points on plots.

After making the appropriate selections, click on the Statistics button. This produces a dialogue box labeled Linear Regression: Statistics. Tick the statistics you want in the output. The Estimates option gives the estimates of the regression coefficients. The Model fit option gives the fit indices for the overall model. The R squared change option gives the incremental R-square value when the model changes. Other options are not commonly used. Click on Continue to return to the main dialogue box.

The Plots button in the main dialogue box may be used for producing histograms and normal probability plots of the residuals. The Save button can be used to save statistics like predicted values, residuals, and distances. The Options button can be used to specify the criteria for stepwise regression.

Statistical Applications through SPSS

S. Ali Raza Naqvi

Now click on OK in the main dialogue box to run the analysis.

Pictorial Representation

Analyze → Regression → Linear → Define DV & IV → Plots → Tick Histogram & Normal Probability Plot → OK


OUTPUT

Variables Entered/Removed(b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Beginning Salary(a) | . | Enter |

a. All requested variables entered.
b. Dependent Variable: Current Salary

The above table tells us about the independent variable and the regression method used. Here we see that the independent variable i.e. beginning salary is entered for the analysis as we selected the Enter method.

Model Summary(b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .880(a) | .775 | .774 | $8,115.356 |

a. Predictors: (Constant), Beginning Salary
b. Dependent Variable: Current Salary

This table gives us the R-value, which represents the correlation between the observed and predicted values of the dependent variable. R-Square is called the coefficient of determination and indicates the adequacy of the model. Here the value of R-Square is 0.775, which means the independent variable in the model can predict 77.5% of the variance in the dependent variable. Adjusted R-Square corrects R-Square for the number of predictors in the model, giving a less biased estimate of model fit.


ANOVA(b)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 106831048750.13 | 1 | 106831048750.1 | 1622.118 | .000(a) |
| Residual | 31085446686.216 | 472 | 65858997.217 | | |
| Total | 137916495436.34 | 473 | | | |

a. Predictors: (Constant), Beginning Salary
b. Dependent Variable: Current Salary

The above table gives the ANOVA results for the regression. The results are given in three rows. The first row, labeled Regression, gives the variability in the model due to known reasons; the second row, labeled Residual, gives the variability due to random error, or unknown reasons. The F-value in this case is 1622.118 and the p-value is given as 0.000, which is less than 0.05, so we reject our null hypothesis and conclude that the regression model explains a significant portion of the variance in current salary, i.e., the regression coefficient is not zero.

Coefficients(a)

| Model 1 | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 1928.206 | 888.680 | | 2.170 | .031 |
| Beginning Salary | 1.909 | .047 | .880 | 40.276 | .000 |

a. Dependent Variable: Current Salary

The above table gives the regression constant and coefficient and their significance. These can be used to construct an ordinary least squares (OLS) equation and also to test the hypothesis about the independent variable. Using the regression coefficient and the constant term given under the column labeled B, one can construct the OLS equation for predicting the current salary, i.e.

Current salary = 1928.206 + (1.909)(Beginning salary)

Now we test our hypothesis: the p-value for the regression coefficient of beginning salary is given as 0.000, which is less than 0.05, so we reject our null hypothesis and conclude that the regression coefficient is not zero.
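Applying the fitted equation is then a matter of plugging in a value. A minimal sketch, using a hypothetical beginning salary of $20,000 as the input:

```python
# Prediction from the fitted OLS equation in the Coefficients table:
# Current salary = 1928.206 + 1.909 * (Beginning salary)
def predict_current_salary(beginning_salary: float) -> float:
    return 1928.206 + 1.909 * beginning_salary

# e.g. an employee who started at $20,000 (hypothetical input value)
print(f"Predicted current salary: ${predict_current_salary(20000):,.2f}")
```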

Charts


**Histogram**

[Histogram of the regression standardized residuals (x-axis: Regression Standardized Residual, y-axis: Frequency). Dependent Variable: Current Salary. Mean = -3.17E-16, Std. Dev. = 0.999, N = 474]

The above histogram of standardized residuals shows the mean and standard deviation of the residuals in the model. The mean and standard deviation are approximately 0 and 1 respectively, which indicates that the model fits well and the chance of error is minimal.

Normal P-P Plot of Regression Standardized Residual


[Normal P-P plot of the regression standardized residual. Dependent Variable: Current Salary. Expected Cum Prob plotted against Observed Cum Prob.]

The above normal P-P plot of the regression standardized residual shows that the points lie close to the diagonal reference line, which indicates the accuracy of the fitted model.

Scatter Plot

[Scatter plot of the Regression Studentized Deleted (Press) Residual (x-axis) against Current Salary (y-axis). Dependent Variable: Current Salary.]

The above scatter plot also shows the adequacy of the fitted model as we can see that the data is scattered and it does not follow any particular pattern, so we can say that the fitted model has minimum chances of error.

**Multiple Regression (Hierarchical Method)**


Multiple regression is the most commonly used technique to assess the relationship between one dependent variable and several independent variables. There are three major types of multiple regression:
• Standard multiple regression
• Hierarchical or sequential regression
• Stepwise or statistical regression

Data Source:

C:\SPSSEVAL\Employee Data

Variables:

Here we are interested in analyzing four numerical variables, i.e.
• Current salary (Numerical)
• Beginning salary (Numerical)
• Educational Level (Numerical)
• Months since Hire (Numerical)

Hypothesis:

H0: The regression coefficients are zero.
HA: The regression coefficients are not zero.

SPSS Need:

SPSS needs more than two numerical variables, all of which should be scale variables.

Method:

The method for analyzing multiple regression is the same as discussed earlier for linear regression. The only change is that we now have one dependent variable along with three independent variables. Here, Current salary is the dependent variable, whereas Beginning salary, Educational Level, and Months since Hire are the independent variables. So we transfer current salary into the box labeled Dependent and beginning salary, educational level, and months since hire into the box labeled Independent(s). The further procedure and the use of advanced options for extra results were discussed earlier for linear regression. After making the appropriate selections, click on OK to run the analysis.


OUTPUT

Variables Entered/Removed(b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Beginning Salary(a) | . | Enter |
| 2 | Educational Level (years)(a) | . | Enter |
| 3 | Months since Hire(a) | . | Enter |

a. All requested variables entered.
b. Dependent Variable: Current Salary

The above table shows that beginning salary was entered in model one, followed by educational level in model two, and months since hire in model three. Note that model one includes only beginning salary as an independent variable, model two includes beginning salary and educational level, and model three includes beginning salary, educational level, and months since hire. The Enter method is used to assess all three models.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .880(a) | .775 | .774 | $8,115.356 | .775 | 1622.118 | 1 | 472 | .000 |
| 2 | .890(b) | .792 | .792 | $7,796.524 | .018 | 40.393 | 1 | 471 | .000 |
| 3 | .895(c) | .801 | .800 | $7,645.998 | .008 | 19.728 | 1 | 470 | .000 |

a. Predictors: (Constant), Beginning Salary
b. Predictors: (Constant), Beginning Salary, Educational Level (years)
c. Predictors: (Constant), Beginning Salary, Educational Level (years), Months since Hire

The above table shows the R-values along with change statistics for the three models in different rows. Under Change Statistics, the first column, labeled R Square Change, gives the change in the R-square value between the models. The last column, labeled Sig. F Change, tests whether there is a significant improvement in the model as we introduce additional independent variables; in other words, it tells us whether the inclusion of additional independent variables at each step helps explain significant additional variance in the dependent variable. The R-square change value in row three is 0.008, which means that the inclusion of months since hire after beginning salary and educational level explains an additional 0.8% of the variance in the current salary of the employees. The p-values for all three models fall in the critical region, so we reject our null hypothesis and conclude that the regression coefficients are not zero.


Coefficients(a)

| Model | | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 1928.206 | 888.680 | | 2.170 | .031 |
| | Beginning Salary | 1.909 | .047 | .880 | 40.276 | .000 |
| 2 | (Constant) | -7808.714 | 1753.860 | | -4.452 | .000 |
| | Beginning Salary | 1.673 | .059 | .771 | 28.423 | .000 |
| | Educational Level (years) | 1020.390 | 160.550 | .172 | 6.356 | .000 |
| 3 | (Constant) | -19986.5 | 3236.616 | | -6.175 | .000 |
| | Beginning Salary | 1.689 | .058 | .779 | 29.209 | .000 |
| | Educational Level (years) | 966.107 | 157.924 | .163 | 6.118 | .000 |
| | Months since Hire | 155.701 | 35.055 | .092 | 4.442 | .000 |

a. Dependent Variable: Current Salary

The above table gives the regression coefficients and related statistics for the three models separately in different rows. These regression coefficients and constants can be used to construct ordinary least squares (OLS) equations and also to test the hypotheses about the independent variables. Using the regression coefficients and the constant terms given under the column labeled B, one can construct the OLS equations for predicting the current salary of the employees for the three models, i.e.

MODEL 1: CS = 1928.206 + (1.909)(BS)
MODEL 2: CS = -7808.714 + (1.673)(BS) + (1020.390)(EL)
MODEL 3: CS = -19986.50 + (1.689)(BS) + (966.107)(EL) + (155.701)(MSH)

Now we test our hypothesis: the p-values for the regression coefficients in all three models are less than 0.05, so we reject our null hypothesis and conclude that the regression coefficients are not zero. Conclusion: Using the hierarchical method for multiple regression, we conclude that the model adequacy increases as each independent variable is introduced, but the increase in adequacy from including educational level is greater than the increase from introducing months since hire. As our p-values lie in the critical region, we reject our null hypothesis and conclude that the regression coefficients in all three models are not equal to zero.
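The idea of R-square change can be sketched directly: fit each nested model by least squares and difference the R-square values. The data below are randomly generated for illustration, not the Employee Data file:

```python
import numpy as np

# Hypothetical data illustrating hierarchical regression: each model
# adds one predictor, and we track the change in R-square.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 2.0 * x1 + 0.5 * x2 + 0.1 * x3 + rng.normal(size=n)

def r_squared(y, *predictors):
    """R-square of an OLS fit of y on the given predictors plus a constant."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

r2 = [r_squared(y, x1),
      r_squared(y, x1, x2),
      r_squared(y, x1, x2, x3)]
for i, value in enumerate(r2, start=1):
    change = value - (r2[i - 2] if i > 1 else 0.0)
    print(f"Model {i}: R-square = {value:.3f}, change = {change:.3f}")
```

Because each model nests the previous one, R-square can only stay the same or rise at every step; the question the Sig. F Change column answers is whether that rise is more than chance.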

Charts:


This model also produces three diagrams for the standardized residuals: a histogram, a normal probability plot, and a scatter plot. The charts and their interpretation are almost the same as discussed for linear regression, so we do not describe them again.

**Curvilinear / Quadratic Regression**

When the regression equation relating the variables is nonlinear, i.e. quadratic or of higher order, the relationship is called curvilinear or quadratic regression. There may also be more than one dependent variable depending on the same independent variable.

Data Source:

C:\SPSSEVAL\Employee Data

Variables:

Here we are interested in analyzing three numerical variables, i.e.
• Current salary (Numerical)
• Beginning salary (Numerical)
• Educational level (Numerical)

Hypothesis:

H0: The regression coefficient is zero.
HA: The regression coefficient is not zero.

SPSS Need:

SPSS needs numerical (scale) variables: one or more dependent variables and one independent variable.

Method:

The given data is entered in the data editor and the variables are labeled as current salary, beginning salary, and educational level. Click on Analyze, which will produce a drop-down menu; choose Regression and click on Curve Estimation. A dialogue box appears in which all the input variables are listed on the left-hand side. Transfer the dependent variables into the right-hand box labeled Dependent(s) and the independent variable into the box labeled Independent. In our case, current salary and beginning salary are the dependent variables and educational level is the independent variable.


Now choose the model you want by ticking its box below the window labeled Curve Estimation. In this case we choose the Quadratic model by ticking its corresponding box. The Save button can be used to save statistics like predicted values, residuals, and prediction intervals. Now click on OK in the main dialogue box to run the analysis.

Pictorial Representation

Analyze → Regression → Curve Estimation → Define DVs and IV → Tick Quadratic → OK


OUTPUT

Model Description

| | |
|---|---|
| Model Name | MOD_2 |
| Dependent Variable 1 | Current Salary |
| Dependent Variable 2 | Beginning Salary |
| Equation 1 | Quadratic |
| Independent Variable | Educational Level (years) |
| Constant | Included |
| Variable Whose Values Label Observations in Plots | Unspecified |
| Tolerance for Entering Terms in Equations | .0001 |

The above table gives the description of the model. In this case we have two dependent variables i.e. Current salary and Beginning salary along with one independent variable i.e. Educational level (years).


Case Processing Summary

| | N |
|---|---|
| Total Cases | 474 |
| Excluded Cases(a) | 0 |
| Forecasted Cases | 0 |
| Newly Created Cases | 0 |

a. Cases with a missing value in any variable are excluded from the analysis.

The above table shows the number of cases that fall in the selected model. In our case the total number of cases is 474, with no excluded or missing cases.

Model Summary and Parameter Estimates

Dependent Variable: Current Salary

| Equation | R Square | F | df1 | df2 | Sig. | Constant | b1 | b2 |
|---|---|---|---|---|---|---|---|---|
| Quadratic | .589 | 337.246 | 2 | 471 | .000 | 85438.237 | -12428.5 | 612.950 |

The independent variable is Educational Level (years). The above table gives the test results for the quadratic regression. The R value shows the correlation between the observed and predicted values of the dependent variable; here R Square is .589. The F-value is 337.246, with a significance level of 0.000, which is less than 0.05. This means that our value falls in the critical region, so we reject the null hypothesis and conclude that the regression coefficients are not zero.

Scatter Plots


[Scatter plot: Current Salary vs. Educational Level (years), observed values with fitted quadratic curve]

[Scatter plot: Beginning Salary vs. Educational Level (years), observed values with fitted quadratic curve]

The above charts for the residuals of the dependent variables clearly show that the residual values are not randomly scattered but follow a particular pattern; this means that the fitted model is not good.


**Likert Type Scaling**

Likert-type scaling is a method of assigning ranks to categorical data in order to make the categorical data meaningful when we have to apply statistical tests to it. Through this scaling approach the assigned ranks can be treated as numerical values. Suppose we have to collect data about awareness, preference, usage, likes and dislikes, or agreement with a statement; the responses are returned in qualitative form, and we have to record them so that they can be analyzed statistically. In such conditions we use Likert-type scaling.

Data Source:

RUN \\temp\temp\Ali Raza\Mateen.sav

Hypothesis:

H0: µMale = µFemale
HA: µMale ≠ µFemale

Variables:

Here we are interested to analyze two categorical variables i.e.
• Gender (Categorical)
• Preference of cellular service with respect to network coverage (Categorical but treated as Numerical)

Here we consider the preference of cellular service as a numerical variable and statistically test the hypothesis that the mean preference of males and females for cellular service with respect to network coverage is the same. The method we use to test the above hypothesis is the independent samples t-test.

Method:

Enter the data in the data editor and label the variables as Gender and Preference. Click on Analyze, which will produce a drop-down menu; choose Compare Means from that and click on Independent-Samples T Test. A dialogue box appears, in which all the input variables appear on its left-hand side. To perform the independent samples t-test, transfer the dependent variable into the Test Variable box and transfer the variable that identifies the groups into the Grouping Variable box. In this case, Preference is the dependent variable to be analyzed and should be transferred into the Test Variable box. Gender is the variable that identifies the groups, and it should be transferred into the Grouping Variable box.


Once the grouping variable is transferred, the Define Groups button, which was earlier inactive, turns active. Click on it to define the two groups. In this case group 1 represents Male and group 2 represents Female. Therefore put 1 in the box against group 1 and 2 in the box against group 2 and click Continue. Now click on OK to run the analysis.

Pictorial Representation

Analyze → Compare Means → Independent-Samples T Test → Drag Test & Grouping Variable → Define Groups → OK


OUTPUT

Group Statistics ("Wide network coverage motivates the individual to prefer a particular cellular service")

| Gender | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| Male | 100 | 4.12 | .891 | .089 |
| Female | 100 | 4.32 | .618 | .062 |

This table contains the descriptive statistics for both groups. We have taken 200 observations for the independent samples t-test, of which 100 belong to the male category and 100 to the female category. The column labeled Mean shows that the mean preference of cellular service with respect to network coverage for both groups is approximately 4. This means that both groups Agree that wide network coverage motivates the individual to prefer a particular cellular service.

Independent Samples Test ("Wide network coverage motivates the individual to prefer a particular cellular service")

| | Levene's F | Levene's Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | 2.730 | .100 | -1.845 | 198 | .067 | -.200 | .108 | -.414 | .014 |
| Equal variances not assumed | | | -1.845 | 176.31 | .067 | -.200 | .108 | -.414 | .014 |

The above table contains the test statistics for the independent samples t-test.

Levene's Test: The table contains two sets of results, the first assuming equal variances in the two groups and the second assuming unequal variances. Levene's test tells us which statistic to use when testing the equality of means. The p-value for Levene's test is 0.10, which is greater than 0.05; therefore, the statistic associated with equal variances assumed should be used for the t-test for equality of means of two independent populations.

P-Value: The value of our test statistic does not fall in the critical region, i.e. 0.067 > 0.05, so we fail to reject our null hypothesis, i.e. µMale = µFemale.

Conclusion: The test results are not statistically significant at the 5% level of significance. The data do not provide sufficient evidence to conclude that the mean preference of cellular service with respect to network coverage differs between males and females; there is only a 6.7% chance of observing a difference this large if the null hypothesis is true.
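The same procedure can be sketched outside SPSS with `scipy.stats`, here on made-up Likert scores (not the survey data above); Levene's test chooses between the equal- and unequal-variance forms of the t-test:

```python
import numpy as np
from scipy import stats

# Made-up Likert scores (1-5) for two groups of respondents.
male = np.array([4, 5, 3, 4, 4, 5, 4, 3, 4, 5], dtype=float)
female = np.array([5, 4, 5, 4, 5, 5, 4, 4, 5, 4], dtype=float)

# Levene's test decides which t-test variant to report, mirroring the
# "equal variances assumed / not assumed" rows of the SPSS table.
lev_stat, lev_p = stats.levene(male, female)
t_stat, p_value = stats.ttest_ind(male, female, equal_var=bool(lev_p > 0.05))

# p > 0.05 means we fail to reject H0: the group means are equal.
significant = bool(p_value < 0.05)
```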

Reliability Analysis

Reliability analysis is applied to check the reliability of the data, i.e. whether the conclusions and the analyses performed on the data are reliable for understanding and forecasting. One way to ideally measure reliability is the test-retest method; however, establishing reliability through test-retest is practically very difficult. Commonly used techniques for assessing reliability include Cohen's Kappa coefficient for categorical data and Cronbach's Alpha for the internal reliability of a data set.
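Cronbach's Alpha itself is simple to compute from the item variances and the variance of the total score. A minimal sketch, using made-up scores rather than the Home Sales data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's Alpha for a cases-by-items matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the row totals
    return float((k / (k - 1)) * (1 - item_vars.sum() / total_var))

# Made-up matrix: 6 cases x 3 items that tend to move together.
scores = np.array([
    [4, 5, 4],
    [3, 3, 3],
    [5, 5, 4],
    [2, 2, 3],
    [4, 4, 5],
    [1, 2, 2],
], dtype=float)
alpha = cronbach_alpha(scores)
```

Because the three items rise and fall together, the resulting Alpha is high; unrelated items would drag it toward zero.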

Data Source:

C:\SPSSEVAL\Home Sales [By Neighborhood].sav

Variables:

Here we are interested to check the reliability of the data set, which includes five numerical variables i.e.
• Appraised Land Value
• Appraised Value of Improvements
• Total Appraised Value
• Sale Price
• Ratio of Sale Price to Total Appraised Value

Note that the data contain one String variable labeled Neighborhood; we deleted this variable because SPSS does not perform the reliability analysis if the data contain any String or blank variable.

SPSS Need:

For reliability analysis through SPSS, one can use variables of any nature except String and blank variables.


Method:

Enter the data in the data editor and label them. Click on Analyze, which will produce a drop-down menu; choose Scale from that and click on Reliability Analysis. A dialogue box appears, in which all the input variables appear on its left-hand side. To perform the reliability analysis, transfer the variables into the box labeled Items by clicking on the arrow between the two boxes. In this case, the five numerical variables in the data set should be transferred to the Items box.

Choose an appropriate Model by clicking on that box; here we choose Alpha as the model. Now click on the Statistics button and a dialogue box appears; tick the corresponding boxes for the statistics you want in the output. Click Continue to return to the main dialogue box, then click OK to run the analysis.

Pictorial Representation

Analyze → Scale → Reliability Analysis → Drag Items → Choose Model → Give Statistics → OK


OUTPUT

Case Processing Summary

| Cases | N | % |
|---|---|---|
| Valid | 2440 | 100.0 |
| Excluded(a) | 0 | .0 |
| Total | 2440 | 100.0 |

a. Listwise deletion based on all variables in the procedure.

The above table shows the total number of cases in the data set. We have 2440 observations with no missing or excluded cases.


Reliability Statistics

| Cronbach's Alpha | N of Items |
|---|---|
| .576 | 5 |

The above table shows the test results for the reliability analysis. The value of Cronbach's Alpha is 0.576 and the number of items in the data set is 5. This value of Alpha is considered Poor, and conclusions drawn from this data are not reliable for understanding and forecasting.

Item-Total Statistics

| Item | Scale Mean if Item Deleted | Scale Variance if Item Deleted | Corrected Item-Total Correlation | Cronbach's Alpha if Item Deleted |
|---|---|---|---|---|
| Appraised Land Value | 164151.7603 | 5533196618 | .688 | .480 |
| Appraised Value of Improvements | 140212.2148 | 4646009801 | .505 | .438 |
| Total Appraised Value | 132761.3423 | 5160815037 | .314 | .533 |
| Sale Price | 106454.2587 | 1928523141 | .565 | .477 |
| Ratio of Sale Price to Total Appraised Value | 181191.6111 | 6801537138 | -.032 | .615 |

The above table shows the statistics associated with each item. The last column shows the value Alpha would take if the corresponding item were deleted from the data set. The value associated with each of the top four items is less than the current value of Alpha (0.576), which means that deleting any of them would make Cronbach's Alpha worse. But the value associated with the item labeled Ratio of Sale Price to Total Appraised Value is 0.615. This means that if this item is deleted from the analysis and the reliability of the entire data set is retested, the value of Cronbach's Alpha becomes 0.615. So, in order to improve the value of Alpha and make our data set more reliable, we delete the last item and retest the value of Cronbach's Alpha.

Reliability Statistics

| Cronbach's Alpha | N of Items |
|---|---|
| .615 | 4 |

Here we retest our data after deleting one item, and the new value of Alpha is 0.615, with 4 items now in the data set. This value of Alpha is considered Acceptable, and conclusions drawn from this data are reliable for understanding and forecasting.


Item-Total Statistics

| Item | Scale Mean if Item Deleted | Scale Variance if Item Deleted | Corrected Item-Total Correlation | Cronbach's Alpha if Item Deleted |
|---|---|---|---|---|
| Appraised Land Value | 164150.57 | 5533198039 | .688 | .540 |
| Appraised Value of Improvements | 140211.03 | 4646008210 | .505 | .493 |
| Total Appraised Value | 132760.16 | 5160813863 | .314 | .599 |
| Sale Price | 106453.07 | 1928530335 | .565 | .536 |

This table shows that if we delete any other item from the data set and retest the reliability, the value of Alpha becomes worse, because all the values in the last column for the remaining four items are less than the current value of Cronbach's Alpha, i.e. 0.615. So we do not need to retest the reliability of the data set any further, which means the data are reliable at the current value of Cronbach's Alpha.

Correlation Analysis

Correlation refers to the degree of relationship between two numerical variables. It is denoted by "r", which is known as the correlation coefficient.

Correlation Coefficient

The correlation coefficient gives a mathematical value for measuring the strength of the linear relationship between two variables. Mathematically, the value of "r" always lies between -1 and 1, with:
(a) +1 representing an absolute positive linear relationship (as X increases, Y increases);
(b) 0 representing no linear relationship (X and Y have no pattern);
(c) -1 representing an absolute inverse relationship (as X increases, Y decreases).
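The three cases above can be illustrated numerically with `numpy.corrcoef` on small made-up series:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Exact linear increase: r is +1.
r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]

# Exact linear decrease: r is -1.
r_neg = np.corrcoef(x, -3 * x + 10)[0, 1]

# A zig-zag with no linear trend against x: r is 0.
r_none = np.corrcoef(x, np.array([2.0, -1.0, 2.0, -1.0, 2.0]))[0, 1]
```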


Bivariate Correlation

Bivariate correlation tests the strength of relationship between two variables without giving any consideration to the interference some other variables might cause to the relationship between the two variables being tested. For example, while testing the correlation between the Current and Beginning salary of the employees, bivariate correlation will not consider the impact of some other variables like Educational Level and Previous Experience of the employees. In such cases, a bivariate analysis may show us a strong relationship between Current and Beginning salary; but in reality, this strong relationship could be the result of some other extraneous factors like Educational Level and Previous Experience etc.

Data Source:

C:\SPSSEVAL\Employee data

Hypothesis:

H0: There is no correlation between the variables (r = 0)
HA: There is some correlation between the variables (r ≠ 0)

Variables:

Here we are interested to analyze three numerical variables i.e.
• Current salary (Numerical)
• Beginning salary (Numerical)
• Educational Level (years) (Numerical)

Technically correlation analysis can be run with any kind of data, but the output will be of no use if a correlation is run on a categorical variable with more than two categories. For example, in a data set, if the respondents are categorized according to nationalities and religions, correlation between these variables is meaningless.

SPSS Need:

SPSS needs two or more numerical variables to perform correlation analysis.

Method:

Firstly the data are entered in the data editor and the variables are labeled Current salary, Beginning salary, Educational Level, and Previous Experience. Click on Analyze, which will produce a drop-down menu; choose Correlate from that and click on Bivariate. A dialogue box appears, in which all the input variables appear on its left-hand side. To perform the bivariate correlation, choose the variables for which the correlation is to be studied from the left-hand side box and move them to the right-hand side box labeled Variables. Once any two variables are transferred to the Variables box, the OK button becomes active. In our case we will transfer four numerical variables, i.e. Current salary, Beginning salary, Educational Level, and Previous Experience, to the right-hand side box labeled Variables.

There are some default selections at the bottom of the window that can be changed by clicking on the appropriate boxes. For our purpose, we will use the most commonly used Pearson's coefficient. Next, while choosing between the one-tailed and two-tailed test of significance, we have to see if we are making any directional prediction. The one-tailed test is appropriate if we are predicting a positive or negative relationship between the variables; the two-tailed test should be used if there is no prediction about the direction of the relationship. Finally, Flag Significant Correlations asks SPSS to print an asterisk next to each correlation that is significant at the 0.05 significance level and two asterisks next to each correlation that is significant at the 0.01 significance level, so that the output can be read easily. The default selections will serve the purpose for the problem at hand. We may choose Means and Standard Deviations from the Options button if we wish to compute these figures for the given data. After making appropriate selections, click on OK to run the analysis.

Pictorial Representation

Analyze → Correlate → Bivariate → Define Variables → Choose appropriate options → OK


OUTPUT


Correlations

| | | Current Salary | Beginning Salary | Educational Level (years) |
|---|---|---|---|---|
| Current Salary | Pearson Correlation | 1 | .880** | .661** |
| | Sig. (2-tailed) | | .000 | .000 |
| | N | 474 | 474 | 474 |
| Beginning Salary | Pearson Correlation | .880** | 1 | .633** |
| | Sig. (2-tailed) | .000 | | .000 |
| | N | 474 | 474 | 474 |
| Educational Level (years) | Pearson Correlation | .661** | .633** | 1 |
| | Sig. (2-tailed) | .000 | .000 | |
| | N | 474 | 474 | 474 |

**. Correlation is significant at the 0.01 level (2-tailed).

The above table gives the correlation for all pairs of variables, and each correlation appears twice in the matrix. So here we get the following 3 correlations for the given data:
• Current salary and Beginning salary
• Current salary and Educational level
• Beginning salary and Educational level

The value of the correlation coefficient is 1 in the cells where SPSS compares a variable with itself (Current salary and Current salary, and so on); this represents a perfect positive correlation. In each cell of the correlation matrix, we get Pearson's correlation coefficient, the p-value for the two-tailed test of significance, and the sample size.

From the output we can see that the correlation coefficient between Current salary and Beginning salary is 0.88 and the p-value for the two-tailed test of significance is less than 0.05. From these figures we can conclude that there is a strong positive correlation between Current salary and Beginning salary and that this correlation is significant at the 0.01 level. Similarly, the correlation coefficient for Current salary and Educational level is 0.661, so there is a moderate positive correlation between these variables. The correlation coefficient for Beginning salary and Educational level is 0.633 with a p-value of 0.000, so we reject our null hypothesis and conclude that there is some correlation between these two variables.

Conclusion: At the 1% level of significance all variables are significantly correlated with each other. Our null hypothesis of no correlation is rejected for all pairs of variables; we conclude that some correlation is present between all variables in the given data.


Partial Correlation

Partial correlation allows us to examine the correlation between two variables while controlling for the effects of one or more additional variables, without throwing out any of the data. In other words, it is the degree of relationship between the dependent variable and one of the independent variables while controlling for the effect of the other independent variables, since in a multiple regression model one dependent variable depends on two or more independent variables.
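One way to see what "controlling for" means: regress each variable on the control, keep the residuals, and correlate the residuals. A minimal sketch (the helper name `partial_corr` and the toy numbers are ours, not SPSS's):

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y controlling for z: correlate the residuals
    left after regressing each of x and y on z (with an intercept)."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    Z = np.column_stack([np.ones_like(z), z])   # intercept + control variable
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy example: x and y both depend on z and share the same extra component e,
# so once z is controlled for, only the shared part remains.
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
e = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
x = z + e
y = 2 * z + e
r_partial = partial_corr(x, y, z)
```

With several controls, `Z` simply gets one extra column per control variable.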

Data Source:

C:\SPSSEVAL\Employee data

Hypothesis:

H0: There is no correlation between the variables (r = 0)
HA: There is some correlation between the variables (r ≠ 0)

Variables:

Here we are interested to analyze two numerical variables, while controlling for one additional variable.
• Current salary (Numerical)
• Beginning salary (Numerical)
• Educational level (Control variable)

SPSS Need:

SPSS needs two or more numerical variables to perform partial correlation.

Method:

Enter the data in the data editor and label them. Click on Analyze, which will produce a drop-down menu; choose Correlate from that and click on Partial. A dialogue box appears, in which all the input variables appear on its left-hand side. To perform the partial correlation, transfer the variables whose correlation you want to study into the box labeled Variables, and control for the effect of one or more additional variables by transferring them to the box labeled Controlling for.

In our case, we want to find the correlation between the Current salary and Beginning salary of the employees, so these variables should be transferred to the box labeled Variables, while controlling for the effect of the Educational level and Previous experience of the employees by transferring them to the box labeled Controlling for. Now click on OK to run the analysis.

Pictorial Representation

Analyze → Correlate → Partial → Drag Variables → Drag Controlling Variables → OK


OUTPUT

Correlations (Control Variable: Educational Level (years))

| | | Current Salary | Beginning Salary |
|---|---|---|---|
| Current Salary | Correlation | 1.000 | .795 |
| | Significance (2-tailed) | . | .000 |
| | df | 0 | 471 |
| Beginning Salary | Correlation | .795 | 1.000 |
| | Significance (2-tailed) | .000 | . |
| | df | 471 | 0 |

The above table shows the test results for the partial correlation between the Current salary and Beginning salary of the employees. The variable we are controlling for in the analysis is Educational level, and it is shown on the left-hand side of the table. We can see that the correlation coefficient between Current salary and Beginning salary is 0.795, which is considerably smaller compared to 0.88 in the bivariate case. This means that the two variables still have a positive correlation, but the value of the correlation coefficient decreases when we control for the Educational level of the employees, and the variables are no longer as strongly correlated with each other.

Conclusion: The test results are significant at the 5% level of significance, and the data provide sufficient evidence to conclude that there is some correlation between the Current salary and Beginning salary of the employees, but it is considerably smaller in the case of partial correlation than in the case of bivariate correlation.

Logistic Regression

If a categorical variable depends on numerical or categorical variables, their dependency may be called logistic regression. It is used to predict a discrete outcome based on variables that may be discrete, continuous, or mixed. Thus, when the dependent variable is categorical with two or more discrete outcomes, logistic regression is a commonly used technique. It has the following two types:
• Binary logistic regression / Logit
• Multinomial logistic regression

**Coefficient of Logistic Regression**

Logistic regression computes the log odds for a particular outcome. The odds of an outcome are given by the ratio of the probability of it happening to the probability of it not happening, [P / (1 - P)], where P is the probability of the event. There are some mathematical problems in reporting raw odds, so the natural logarithm of the odds is used instead. A positive value indicates that the odds are in favor of the event and the event is likely to occur, while a negative value indicates that the odds are against the event and it is not likely to occur. The log odds may thus be written as ln[P / (1 - P)].
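The odds and log-odds transformation described above can be sketched as:

```python
import math

def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p) of an event with probability p."""
    return math.log(p / (1.0 - p))

# p > 0.5 gives positive log odds (the event is favored);
# p < 0.5 gives negative log odds (the event is disfavored).
favored = log_odds(0.8)
disfavored = log_odds(0.2)
```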

**Binary Logistic Regression**

If a categorical variable has only two levels, e.g. Male and Female, Yes or No, Good and Bad, etc., and it depends on different categorical or numerical independent variables, then their relation is referred to as binary logistic regression. The expression for binary logistic regression may be given as

Y = b0 + b1X1 + b2X2 + … + bkXk

Data Source:

C:\SPSSEVAL\AML Survival


Hypothesis:

H0: Regression coefficients are zero
HA: Regression coefficients are not zero

Variables:

Here we are interested to analyze three different variables i.e.
• Status (Categorical)
• Time (Numerical)
• Chemotherapy (Categorical)

Here Status is our dependent variable, depending on Time and Chemotherapy. As in this case our dependent variable is categorical with only two levels, i.e. Censored and Relapsed, we use binary logistic regression to analyze the dependency between the variables.

SPSS Need:

SPSS needs one dependent variable, and it must be categorical, while the independent variables can be categorical as well as numerical.

Method:

Firstly the data are entered in the data editor and the variables are labeled as Status, Time, and Chemotherapy. Click on Analyze, which will produce a drop-down menu; choose Regression from that and click on Binary Logistic. A dialogue box appears, in which all the input variables appear on its left-hand side. To perform the binary logistic regression, transfer the dependent variable into the box labeled Dependent and the independent variables into the box labeled Covariates. In our case, Status is the only dependent variable and should be transferred to the box labeled Dependent; Time and Chemotherapy are independent variables and should be transferred to the box labeled Covariates. Next we have to select the method of analysis in the box labeled Method. SPSS gives seven options, of which the Enter method is most commonly used. For common purposes one does not need to use the Save and Options buttons; advanced users may experiment with these. The Save button can be used to save statistics like predicted values, residuals, and distances. The Options button can be used to specify the criteria for stepwise regression. After making appropriate selections, click on OK to run the analysis.


Pictorial Representation

Analyze → Regression → Binary Logistic → Drag Dependent → Drag Covariates → OK


OUTPUT

Case Processing Summary

| Unweighted Cases(a) | N | Percent |
|---|---|---|
| Selected Cases: Included in Analysis | 23 | 100.0 |
| Selected Cases: Missing Cases | 0 | .0 |
| Selected Cases: Total | 23 | 100.0 |
| Unselected Cases | 0 | .0 |
| Total | 23 | 100.0 |

a. If weight is in effect, see classification table for the total number of cases.

The above table gives the description of the cases selected for the analysis. We have a total of 23 cases included in the analysis, with no missing or unselected cases.

Dependent Variable Encoding

| Original Value | Internal Value |
|---|---|
| Censored | 0 |
| Relapsed | 1 |

The above table shows how the two outcomes or levels of Status, i.e. Censored and Relapsed, have been coded by SPSS.


Block 0: Beginning Block

Classification Table(a,b) (Step 0)

| Observed Status | Predicted: Censored | Predicted: Relapsed | Percentage Correct |
|---|---|---|---|
| Censored | 0 | 5 | .0 |
| Relapsed | 0 | 18 | 100.0 |
| Overall Percentage | | | 78.3 |

a. Constant is included in the model.
b. The cut value is .500

The above table shows the observed or actual number of cases falling in each category of the dependent variable. The last column, labeled Percentage Correct, shows that this constant-only model correctly predicts the status of 0% of the censored patients and 100% of the relapsed patients. Overall, it correctly predicts the status of 78.3% of the patients.

**Block 1: Method = Enter**

Omnibus Tests of Model Coefficients

| Step 1 | Chi-square | df | Sig. |
|---|---|---|---|
| Step | 4.609 | 2 | .100 |
| Block | 4.609 | 2 | .100 |
| Model | 4.609 | 2 | .100 |

The above table reports significance levels by the traditional chi-square method. It tests whether the model with the predictors is significantly different from the model with the constant only; the omnibus test may be interpreted as a test of the capability of all predictors jointly to predict the response (dependent) variable. Here the significance level is .100, which is greater than .05, so at the 5% level the predictors jointly do not significantly improve the fit over the constant-only model. Since the Enter method is used (all model terms are entered in one step), there is no difference between Step, Block, and Model; in a stepwise procedure one would see results for each step.


Model Summary

| Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
|---|---|---|---|
| 1 | 19.476(a) | .182 | .280 |

a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

The above table gives the Cox & Snell R-Square value, which gives an approximation of how much variance in the dependent variable can be explained by the hypothesized model. In this case Time and Chemotherapy can explain about 18.2% of the variation in the patients' current Status.

Classification Table(a) (Step 1)

| Observed Status | Predicted: Censored | Predicted: Relapsed | Percentage Correct |
|---|---|---|---|
| Censored | 1 | 4 | 20.0 |
| Relapsed | 0 | 18 | 100.0 |
| Overall Percentage | | | 82.6 |

a. The cut value is .500

The above classification table summarizes the results of our predictions of patients' Status based on Time and Chemotherapy. We can see that our model correctly predicts the status of 20% of the censored patients and 100% of the relapsed patients. Overall, our model correctly predicts the status of 82.6% of the patients.

Variables in the Equation

| Step 1(a) | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| chemo | -1.498 | 1.262 | 1.409 | 1 | .235 | .224 |
| time | -.024 | .024 | 1.055 | 1 | .304 | .976 |
| Constant | 2.962 | 1.207 | 6.025 | 1 | .014 | 19.332 |

a. Variable(s) entered on step 1: chemo, time.

The above table gives the beta coefficients for the independent variables along with their significance. The negative beta coefficients for Time and Chemotherapy mean that with increasing chemotherapy and treatment time, the chances of the patient having a Relapsed status decrease. As with multiple linear regression models, we can construct an equation for the patient's status from the regression constant and coefficients above. The expression for the status of the patient is given by:

Status = 2.962 + (-1.498)(Chemotherapy) + (-0.024)(Time)


The last column, labeled Exp(B), takes a value of more than one if the beta coefficient is positive and less than one if it is negative. In our case, the beta coefficients for Chemotherapy and Time are negative, so the values in the Exp(B) column are less than one. A value of 0.976 for Time indicates that for each one-week increase in treatment, the odds of a patient having a relapsed status are multiplied by a factor of 0.976, i.e. they decrease by about 2.4%. The B coefficients (not the Exp(B) values) can also be used to construct an equation for the probability of the modeled status:

P = 1 / (1 + e^-(2.962 - 1.498 C - 0.024 T))
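As a sketch, and assuming Status is coded 1 for the relapsed category, the B coefficients from the table can be turned into a predicted probability, and Exp(B) recovered by exponentiating a coefficient:

```python
import math

def predicted_probability(chemo, time):
    # Linear predictor (logit) from the fitted B coefficients in the table
    logit = 2.962 - 1.498 * chemo - 0.024 * time
    # Logistic transform turns the log-odds into a probability
    return 1 / (1 + math.exp(-logit))

# Exp(B) for time: each extra week multiplies the odds by exp(-0.024)
odds_ratio_time = math.exp(-0.024)   # about 0.976, matching the table
```

With both coefficients negative, the predicted probability falls as chemotherapy or treatment time increases, consistent with the Exp(B) values below one.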

Non-Parametric Tests

Non-parametric tests are used to test hypotheses about a population when the data cannot be assumed to be normal, typically with small samples (fewer than 30 observations). These tests are sometimes also referred to as "distribution-free tests".

Binomial Test

Binomial tests are used to test the hypothesis regarding the population proportion. It runs on a categorical variable having two levels only.

Data Source:

C:\SPSSEVAL\Carpet

Hypothesis:

H0: P = 0.5
HA: P ≠ 0.5

Variables:

Quantitative Techniques in Analysis

Page 73

Statistical Applications through SPSS

S. Ali Raza Naqvi

Here we are interested in analyzing a categorical variable, i.e. Housekeeping Seal. In our case a superstore owner claims that 50% of their customers get a housekeeping seal on the purchase of a product.

SPSS Need:

SPSS needs one categorical variable (2 levels only).

Method:

First the data are entered in the data editor and the variable is labeled Housekeeping seal. Click on Analyze, which will produce a drop-down menu; choose Non-Parametric Tests and click on Binomial. A dialogue box appears, in which all the input variables appear on the left-hand side. To perform the binomial test, transfer the test variable into the box labeled Test Variable List by clicking on the arrow between the two boxes; in our case Housekeeping seal is the test variable. Now give the test value in the box below labeled Test Proportion; in our case the test value is 0.50.

After making appropriate selections, click on OK to run the analysis.

Pictorial Representation

Analyze → Non-Parametric Tests → Binomial → Drag test Variable → Give test Proportion → OK


OUTPUT


NPar Tests

Binomial Test

Good Housekeeping seal   Category   N    Observed Prop.   Test Prop.   Exact Sig. (2-tailed)
Group 1                  Yes        8    .36              .50          .286
Group 2                  No         14   .64
Total                               22   1.00

The above table gives the test results for the Binomial Non-parametric test.

The first column, labeled Category, gives the two categories (Yes or No) of the test variable, i.e. Good Housekeeping seal.

The second column, labeled N, gives the total number of cases analyzed, as well as the number of cases falling in each category of the test variable. In this case we selected a sample of 22 persons, out of which 8 said Yes, they got the housekeeping seal, and the remaining 14 said No.

The third column, labeled Observed Proportion, gives the proportion of persons saying Yes or No: 36% of individuals said Yes while 64% said No.

The last column gives the p-value for the 2-tailed test, 0.286. Since this is greater than 0.05, we fail to reject the null hypothesis and conclude that the data are consistent with the superstore owner's claim that the proportion is 0.50.
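The exact p-value can be checked outside SPSS; a minimal sketch of the two-sided exact binomial test for a test proportion of 0.5:

```python
from math import comb

def binomial_test_p(k, n):
    # Exact two-sided binomial test for H0: p = 0.5. The distribution is
    # symmetric under this null, so the p-value is twice the smaller tail.
    m = min(k, n - k)
    tail = sum(comb(n, i) for i in range(m + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p_value = binomial_test_p(8, 22)   # 8 "Yes" out of 22, as in the table
```

Rounded to three decimals this reproduces the Exact Sig. value of .286 in the output.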


Runs Test

The runs test is used to test the randomness of the data. This test runs best if the test variable is numerical. A RUN is a maximal sequence of consecutive observations on the same side of the cut point, so the number of runs is one more than the number of times the sign changes.

Data Source:

C:\SPSSEVAL\Carpet

Hypothesis:

H0: Data is random
HA: Data is not random

Variables:

Here we are interested in analyzing a numerical variable, i.e. Preference.

SPSS Need:

SPSS needs a numerical variable with a small sample size.

Method:

First the data are entered in the data editor and the variable is labeled Preference. Click on Analyze, which will produce a drop-down menu; choose Non-Parametric Tests and click on Runs. A dialogue box appears, in which all the input variables appear on the left-hand side. To perform the runs test, transfer the test variable into the box labeled Test Variable List by clicking on the arrow between the two boxes; in our case Preference is the test variable. Since Preference is a numerical variable, in the section labeled Cut Point we tick the box Median; but if the test variable is categorical, it is appropriate to use its Mean by ticking the corresponding box.


After making appropriate selections, click on OK to run the analysis.

Pictorial Representation

Analyze → Non-Parametric Tests → Runs → Drag test Variable → Tick Box (Median) → OK


OUTPUT

Runs Test

                         Preference
Test Value(a)            11.50
Cases < Test Value       11
Cases >= Test Value      11
Total Cases              22
Number of Runs           13
Z                        .218
Asymp. Sig. (2-tailed)   .827

a. Median

The above table gives the test results for the runs test. The first row, labeled Test Value, gives the median of the data. In this case, out of 22 observations, 11 values are less than the median (in other words, those values carry a negative sign), while the remaining 11 values carry a positive sign. The row labeled Number of Runs gives a value of 13; this means the data contain 13 runs, i.e. the sign changes 12 times. The last row gives the p-value for the runs test, 0.827 > 0.05, so we fail to reject the null hypothesis and conclude that the data are random.
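The Z statistic and p-value can be reproduced from the counts in the table; a sketch of the Wald-Wolfowitz runs test with the continuity correction SPSS applies for small samples:

```python
import math
from statistics import NormalDist

def runs_test(n1, n2, runs):
    # n1, n2: counts below / at-or-above the cut point; runs: observed runs
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    # Continuity correction of 0.5, as SPSS uses for small n
    z = (abs(runs - mean) - 0.5) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(z))   # two-tailed normal p-value
    return z, p

z, p = runs_test(11, 11, 13)   # counts and runs from the table above
```

This reproduces the Z of .218 and the two-tailed significance of .827 shown in the output.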

Representation of Runs:

[Figure: the sequence of observations marked by their sign relative to the median, with each run indicated.]


One Sample K-S Test

The one-sample K-S test is used to test the goodness of fit of a specific distribution to the given data. Its test statistic is the Kolmogorov-Smirnov Z, and the test is commonly described as a nonparametric analogue of the chi-square goodness-of-fit test.

Data Source:

C:\SPSSEVAL\Carpet

Hypothesis:

H0: Fit is good (data follow the fitted distribution)
HA: Fit is not good (data do not follow the fitted distribution)

Variables:


Here we are interested in analyzing a numerical variable, i.e. Price.


SPSS Need:

SPSS needs a numerical variable with a small sample size.

Method:

First the data are entered in the data editor and the variable is labeled Price. Click on Analyze, which will produce a drop-down menu; choose Non-Parametric Tests and click on 1-Sample K-S. A dialogue box appears, in which all the input variables appear on the left-hand side. To perform the K-S test, transfer the test variable into the box labeled Test Variable List by clicking on the arrow between the two boxes; in our case Price is the test variable. Now tick a box in the section labeled Test Distribution at the bottom of the dialogue box; in our case the fitted distribution is Poisson, so we tick the box labeled Poisson. After making appropriate selections, click on OK to run the analysis.

Pictorial Representation

Analyze → Non-Parametric Tests → 1-Sample K-S → Drag test Variable → Tick Box (Poisson) → OK


OUTPUT


One-Sample Kolmogorov-Smirnov Test

                                       Price
N                                      22
Poisson Parameter(a,b)    Mean         2.0000
Most Extreme              Absolute     .143
Differences               Positive     .143
                          Negative     -.135
Kolmogorov-Smirnov Z                   .670
Asymp. Sig. (2-tailed)                 .760

a. Test distribution is Poisson.
b. Calculated from data.

The above table gives the test results for the one-sample K-S test. We have taken 22 observations for the analysis, and the mean of the Poisson distribution calculated from the data is 2. The rows labeled Most Extreme Differences compare the observed (empirical) cumulative distribution with the fitted Poisson cumulative distribution.

The row labeled Positive gives the largest difference where the observed cumulative proportion exceeds the theoretical one, 0.143.

The row labeled Negative gives the largest difference in the opposite direction, where the theoretical cumulative proportion exceeds the observed one, and it is -0.135.

The row labeled Absolute gives the larger of these two in absolute value, 0.143.

The Kolmogorov-Smirnov Z value, 0.67, is the largest absolute difference multiplied by the square root of the sample size (√22 × 0.143 ≈ 0.67).

The last row gives the p-value for the analysis, 0.76, which is greater than 0.05. So we fail to reject the null hypothesis and conclude that the fit is good: the data follow the Poisson distribution.
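The Z statistic and p-value can be reproduced from the largest absolute difference D and the sample size n; a sketch using the asymptotic Kolmogorov distribution:

```python
import math

def ks_z_and_p(d, n, terms=100):
    # Kolmogorov-Smirnov Z = sqrt(n) * D; the asymptotic two-tailed p-value
    # comes from the Kolmogorov series 2 * sum((-1)^(k-1) * exp(-2 k^2 z^2))
    z = math.sqrt(n) * d
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * z * z)
                for k in range(1, terms + 1))
    return z, min(1.0, max(0.0, p))

z, p = ks_z_and_p(0.143, 22)   # D and n from the table above
```

Rounded to two decimals this reproduces the Z of .67 and the significance of .76 shown in the output.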

