
Basic Research Process

Agenda: Hypothesis testing; Z test; t test (independent and dependent samples); Analysis of Variance (ANOVA); F-test

Research Process: Problem Definition, Approach Development, Research Design, Fieldwork & Data Collection, Data Analysis, Report & Presentation

G K Saini

A Classification of Univariate Techniques

Univariate Techniques
* Metric Data
  * One Sample: t test, Z test
  * Two or More Samples
    * Independent: Two-group t test, Z test, One-way ANOVA
    * Related: Paired t test
* Non-metric Data
  * One Sample: Chi-square, K-S, Runs, Binomial
  * Two or More Samples
    * Independent: Chi-square, Mann-Whitney, Median, K-S, K-W ANOVA
    * Related: Sign, Wilcoxon, McNemar, Chi-square

A Classification of Multivariate Techniques

Multivariate Techniques
* Dependence Technique
  * One Dependent Variable: Cross-Tabulation, Analysis of Variance and Covariance, Multiple Regression, 2-Group Discriminant/Logit, Conjoint Analysis
  * More Than One Dependent Variable: Multivariate Analysis of Variance, Canonical Correlation, Multiple Discriminant Analysis, Structural Equation Modeling and Path Analysis
* Interdependence Technique
  * Variable Interdependence: Factor Analysis, Confirmatory Factor Analysis
  * Interobject Similarity: Cluster Analysis, Multidimensional Scaling

What is a Hypothesis?
A hypothesis is an assumption about a population parameter. An example of a parameter is the population mean. The parameter must be identified before analysis. For example: "I claim the average monthly productivity of a salesman is μ = 24!"

The Null Hypothesis, H0
* States the (numerical) assumption to be tested, e.g. the average monthly productivity of a salesman is twenty-four ($H_0: \mu = 24$).
* Is always about a population parameter ($H_0: \mu = 24$), not about a sample statistic ($H_0: \bar{X} = 24$).
* Begins with the assumption that the null hypothesis is true, similar to the notion of "innocent until proven guilty".
* Refers to the status quo.
* May or may not be rejected.

The Alternative Hypothesis, H1
* Is the opposite of the null hypothesis, e.g. the average monthly productivity of a salesman is not 24 ($H_1: \mu \neq 24$).
* Challenges the status quo.
* May or may not be accepted.

Simple Hypothesis Testing Process
1. Assume the population average monthly sales by a salesman is 24 ($H_0: \mu = 24$).
2. Identify the population.
3. Take a sample ($\bar{X} = 28$).
4. Ask: how likely is $\bar{X} = 28$ if $\mu = 24$? No, not likely! Reject the null hypothesis.

Reason for Rejecting H0
[Figure: sampling distribution of $\bar{X}$ if H0 is true, centred at μ = 24, with the observed sample mean of 28 in the tail.]
It is unlikely that we would get a sample mean of this value (28) if in fact μ = 24 were the population mean. Therefore, we reject the null hypothesis that μ = 24.

Level of Significance and the Rejection Region
[Figure: critical Z values at the 5 percent level of significance (95 percent confidence level).]

Hypothesis Testing: Original Scale vs. Standardized Scale
Observed values and critical values must be on the same scale before they can be compared. Because critical Z values are given in standardized form, we can either convert the critical Z values to the original scale, which gives limits directly comparable to the observed value of $\bar{x}$, or convert the observed $\bar{x}$ to the standardized scale. Both approaches lead to the same decision. In practice, the standardized scale is used most often.

Hypothesis Testing: in the Scale of the Original Variable
Based on the sample data given below, find out whether company X's on-the-job training is effective. You expect that after training, workers should be able to produce 40,000 units per year. Past data indicate a standard deviation of 2,000 units per year.

$n = 100$; $\bar{x} = 39650$ (number of units); $\sigma = 2000$
$\mu_{H_0} = 40000$ (hypothesized value of the population mean)
$H_0: \mu = 40000$: yearly output per worker after training is 40,000 units (null hypothesis).
$H_1: \mu \neq 40000$: yearly output per worker after training is not 40,000 units (alternative hypothesis).
$\alpha = 0.05$ (level of significance)
Since σ is known: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2000}{\sqrt{100}} = 200$ (SE of the mean)

The 0.95 acceptance region contains two equal areas of 0.475. The Z value for 0.475 of the area under the curve is 1.96, so the acceptance limits are $\mu_{H_0} \pm 1.96\,\sigma_{\bar{x}}$:
Upper limit $= 40000 + 1.96(200) = 40392$ units
Lower limit $= 40000 - 1.96(200) = 39608$ units
Since the sample mean (39,650) lies within these limits, we do not reject (accept) the null hypothesis. There is no significant difference between the hypothesized mean and the observed sample mean: post-training output is consistent with the expected 40,000 units.
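A minimal Python sketch of the original-scale check, assuming σ is known and using the 1.96 two-tailed critical value from the slide. The helper name `acceptance_limits` is hypothetical, not from the source.

```python
import math

def acceptance_limits(mu_h0, sigma, n, z_crit=1.96):
    """Two-tailed acceptance region on the original scale: mu_H0 +/- z * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)  # standard error of the mean
    return mu_h0 - z_crit * se, mu_h0 + z_crit * se

# Data from the on-the-job training example
lower, upper = acceptance_limits(mu_h0=40000, sigma=2000, n=100)
x_bar = 39650
print(f"Acceptance region: {lower:.0f} to {upper:.0f}")
print("Do not reject H0" if lower <= x_bar <= upper else "Reject H0")
```

The limits come out at 39608 and 40392 units, and the sample mean of 39650 falls between them.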

Hypothesis Testing: in the Standardized Scale
This is the same problem as above: based on the sample data, find out whether company X's on-the-job training is effective (expected output 40,000 units per year; past S.D. 2,000 units per year).

$n = 100$; $\bar{x} = 39650$ (number of units); $\sigma = 2000$
$H_0: \mu = 40000$: yearly output per worker after training is 40,000 units (null hypothesis).
$H_1: \mu \neq 40000$: yearly output per worker after training is not 40,000 units (alternative hypothesis).
$\alpha = 0.05$ (level of significance)
Since σ is known: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2000}{\sqrt{100}} = 200$ (SE of the mean)
$z = \frac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}} = \frac{39650 - 40000}{200} = -1.75$

The 0.95 acceptance region contains two equal areas of 0.475. The Z value for 0.475 of the area under the curve is 1.96.
Do not reject H0 if $-1.96 < z < 1.96$. Here, $-1.96 < -1.75 < 1.96$.
Since the z value (-1.75) lies within the acceptance region (±1.96), we do not reject (accept) the null hypothesis. There is no significant difference between the hypothesized mean and the observed sample mean: post-training output is consistent with the expected 40,000 units.

Critical Values Approach to Testing
* Convert the sample statistic (e.g. $\bar{X}$) to a test statistic (e.g. z or t statistic).
* Obtain the critical value(s) for a specified α from a table or computer.
* If the test statistic falls beyond the critical values, reject H0. Otherwise, do not reject H0.
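The critical-values approach for the training example can be sketched in Python. Instead of reading 1.96 from a printed table, this version takes the critical value from the standard library's `statistics.NormalDist`. The helper `z_statistic` is a hypothetical name used for illustration.

```python
from statistics import NormalDist

def z_statistic(x_bar, mu_h0, sigma, n):
    """Standardize the sample mean: z = (x_bar - mu_H0) / (sigma / sqrt(n))."""
    return (x_bar - mu_h0) / (sigma / n ** 0.5)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-tailed test
z = z_statistic(x_bar=39650, mu_h0=40000, sigma=2000, n=100)
print(f"z = {z:.2f}, critical = ±{z_crit:.2f}")
print("Reject H0" if abs(z) > z_crit else "Do not reject H0")
```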

p-Value Approach to Testing
* Convert the sample statistic (e.g. $\bar{X}$) to a test statistic (e.g. Z, t or F statistic).
* Obtain the p-value from a table or computer.
* p-value: the probability of obtaining a test statistic as extreme as, or more extreme (≤ or ≥) than, the observed sample value, given that H0 is true. It is also called the observed level of significance.
* Compare the p-value with α. If p-value ≥ α, do not reject H0. If p-value < α, reject H0.

The p-value is the probability of getting a sample mean as far as, or farther from, the hypothesized population mean than the one observed, given that H0 is true. In other words, it measures how unlikely the observed result is under H0. The p-value is precisely the smallest significance level at which H0 would be rejected. The p-value depends on the test statistic, while the critical value depends on the level of significance. Intuitively, the smaller the p-value, the less plausible it is that the observed difference is due to sampling error alone, and the more significant the result.
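A minimal sketch of the p-value approach using Python's `statistics.NormalDist`, applied to the two-tailed training example (z = -1.75). The helper `p_value` and its `tail` argument are hypothetical names chosen for illustration.

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """p-value of a z statistic under H0 (standard normal sampling distribution)."""
    std_normal = NormalDist()
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "lower":
        return std_normal.cdf(z)
    if tail == "upper":
        return 1 - std_normal.cdf(z)
    raise ValueError("tail must be 'two', 'lower' or 'upper'")

alpha = 0.05
p = p_value(-1.75, tail="two")  # z from the training example
print(f"p-value = {p:.4f}")
print("Reject H0" if p < alpha else "Do not reject H0")
```

For z = -1.75 the two-tailed p-value is about 0.080. Because this is at least α = 0.05, H0 is not rejected, which agrees with the critical-value approach.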

Customary Rule for P-value
P < 0.01: very strong evidence against H0
0.01 ≤ P < 0.05: moderate evidence against H0
0.05 ≤ P < 0.10: suggestive evidence against H0
0.10 ≤ P: little or no real evidence against H0
Source: Arsham H., Kuiper's P-value as a Measuring Tool and Decision Procedure for the Goodness-of-fit Test, Journal of Applied Statistics, Vol. 15, No. 3, 131-135, 1988.

Points to Remember About Hypothesis
* The purpose of hypothesis testing is to make a judgment about the difference between the sample statistic and a hypothesized population parameter. It is not to question the computed value of the sample statistic.
* If we assume the hypothesis is correct, the significance level indicates the percentage of sample means that fall outside certain limits.
* Theoretically, even if the sample statistic falls in the acceptance region, this does not prove that H0 is true. It only means that the sample does not provide statistical evidence to reject it. In practice, however, when sample data do not provide statistical evidence to reject H0, we take H0 as true.

Level of Significance, α
* Defines the unlikely values of the sample statistic if the null hypothesis is true. This region is called the rejection region of the sampling distribution.
* Is designated by α (level of significance). Typical values are .01, .05, .10.
* Is selected by the researcher at the beginning.
* Provides the critical value(s) of the test.

Points to Remember About Hypothesis (continued)

Selecting a Significance Level
The higher the significance level used for testing a hypothesis, the higher the probability of rejecting a null hypothesis when it is true. The researcher therefore needs to choose the significance level carefully.

Type I and Type II Errors
Errors / Result Probabilities

Jury trial (H0: Innocent)
Verdict | The truth: Innocent | The truth: Guilty
Innocent | Correct | Error
Guilty | Error | Correct

Hypothesis test
Decision | The truth: H0 True | The truth: H0 False
Do Not Reject H0 | Correct (1 - α) | Type II Error (β)
Reject H0 | Type I Error (α) | Correct: Power (1 - β)

Type I and Type II Errors: Decision Making Errors
Type I Error
* Rejects a true null hypothesis and has serious consequences.
* The probability of a Type I error is α, called the level of significance (set by the researcher).
* The probability of not making a Type I error is 1 - α, called the confidence coefficient.
Type II Error
* Fails to reject a false null hypothesis.
* The probability of a Type II error is β.
* The power of the test is 1 - β.

General Steps in Hypothesis Testing
E.g.: Test the assumption that the average number of monthly leaves in a department is at least three (σ known).
1. State the H0: $H_0: \mu \ge 3$
2. State the H1: $H_1: \mu < 3$
3. Choose α: α = .05
4. Choose n: n = 100
5. Choose test: Z test
6. Set up critical value(s): reject H0 if Z < -1.645
7. Collect data: 100 employees surveyed
8. Compute test statistic and p-value: computed test statistic = -2, p-value = .0228
9. Make statistical decision: reject the null hypothesis
10. Express conclusion: the true average monthly leave is less than 3
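Steps 6, 8 and 9 can be sketched with Python's `statistics.NormalDist`. The source does not show the raw leave data, so this sketch takes the reported test statistic z = -2 as given.

```python
from statistics import NormalDist

alpha = 0.05
z_observed = -2.0                     # computed test statistic reported in step 8
z_crit = NormalDist().inv_cdf(alpha)  # lower-tail critical value, about -1.645 (step 6)
p = NormalDist().cdf(z_observed)      # lower-tail p-value (step 8)
print(f"critical z = {z_crit:.3f}, p-value = {p:.4f}")
print("Reject H0" if z_observed < z_crit else "Do not reject H0")  # step 9
```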

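One-tail Z Test for Mean (σ Known)
Assumptions
* The population is normally distributed. If it is not normal, large samples are required.
* The null hypothesis has a ≤ or ≥ sign only.

Z test statistic:
$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Rejection Region
* $H_0: \mu \ge \mu_0$, $H_1: \mu < \mu_0$: the rejection region is in the lower tail. Z must be significantly below 0 to reject H0.
* $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$: the rejection region is in the upper tail. Small values of Z don't contradict H0, so don't reject H0!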

Example Solution: One Tail Test
$H_0: \mu \le 368$; $H_1: \mu > 368$
α = 0.05; n = 25; critical value: 1.645 (rejection region of .05 in the upper tail)
Test statistic: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.50$
Decision: do not reject H0 at α = .05, since 1.50 < 1.645.
Conclusion: no evidence that the true mean is more than 368.

Example Solution: Two-Tail Test
$H_0: \mu = 368$; $H_1: \mu \neq 368$
α = 0.05; n = 25; critical values: ±1.96 (rejection region of .025 in each tail)
Test statistic: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.50$
Decision: do not reject H0 at α = .05, since -1.96 < 1.50 < 1.96.
Conclusion: no evidence that the true mean is not 368.
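Both the one-tail and the two-tail example above can be reproduced with a single Python function, sketched here with the standard library only. The function name `one_sample_z_test` and its `tail` options are hypothetical choices for illustration.

```python
from statistics import NormalDist

def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, critical value, reject?) for a one-sample Z test with sigma known."""
    z = (x_bar - mu0) / (sigma / n ** 0.5)
    std_normal = NormalDist()
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > crit
    elif tail == "upper":
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z > crit
    elif tail == "lower":
        crit = std_normal.inv_cdf(alpha)
        reject = z < crit
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z, crit, reject

# Data from the examples: x_bar = 372.5, mu0 = 368, sigma = 15, n = 25
z1, c1, r1 = one_sample_z_test(372.5, 368, 15, 25, tail="upper")
z2, c2, r2 = one_sample_z_test(372.5, 368, 15, 25, tail="two")
print(f"one-tail: z = {z1:.2f}, critical = {c1:.3f}, reject = {r1}")
print(f"two-tail: z = {z2:.2f}, critical = ±{c2:.2f}, reject = {r2}")
```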

Hypothesis Testing of Means When Population SD Is Not Known: Exercise 1
$\mu_{H_0} = 90$; $n = 20$; $\bar{x} = 84$; $s = 11$
Use the t-distribution, since the sample size is < 30 and the population SD is unknown.
$H_0: \mu = 90$; $H_1: \mu \neq 90$; $\alpha = 0.10$
$\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{11}{\sqrt{20}} = 2.46$ (estimated SE of the mean of an infinite population)
$t = \frac{\bar{x} - \mu_{H_0}}{\hat{\sigma}_{\bar{x}}} = \frac{84 - 90}{2.46} = -2.44$
Test the hypothesis: reject the null hypothesis, because the absolute t value (2.44) is more than the critical value of 1.729 (for 19 d.f. and the 0.10 level of significance, two-tailed).
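A minimal Python sketch of the one-sample t test in this exercise. Python's standard library has no t-distribution, so the critical value 1.729 is hard-coded from the t table cited in the slides. The helper `one_sample_t` is a hypothetical name.

```python
import math

def one_sample_t(x_bar, mu0, s, n):
    """t statistic with the population SD estimated by the sample SD s."""
    se = s / math.sqrt(n)  # estimated standard error of the mean
    return (x_bar - mu0) / se, se

t, se = one_sample_t(x_bar=84, mu0=90, s=11, n=20)
t_crit = 1.729  # t table: 19 d.f., alpha = 0.10, two-tailed
print(f"SE = {se:.2f}, t = {t:.2f}")
print("Reject H0" if abs(t) > t_crit else "Do not reject H0")
```

If SciPy is available, `scipy.stats.t.ppf(0.95, 19)` gives the same critical value (about 1.729) without a printed table.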

Useful Tips: Deciding Between One-tailed and Two-tailed Test
* Use a one-tailed test when the question is worded: less than, more than, less than or equal to, and greater than or equal to.
* Use a two-tailed test when the question is worded: equal to, different from, or changed from.

T-Distribution Table
Source: http://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf

Standard Normal Distribution Table
Source: http://pluto.huji.ac.il/~msby/columbia/tabla_normal1.pdf

Two Sample Tests

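Sampling Distribution for Difference between Sample Means
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ (SE of the difference between two means)
$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}}$ (estimated SE of the difference between two means)
[Figure: sampling distributions of $\bar{x}_1$ (mean $\mu_{\bar{x}_1} = \mu_1$) and $\bar{x}_2$ (mean $\mu_{\bar{x}_2} = \mu_2$), and the sampling distribution of the difference between sample means, i.e. the distribution of all possible values of $\bar{x}_1 - \bar{x}_2$.]

Extending our earlier equation to two samples in order to find the Z value:
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\sigma_{\bar{x}_1 - \bar{x}_2}}$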

Tests for Differences Between Means: Large Sample
City | Average T-Shirt Price ($) | Sample SD | Sample Size
Delhi (1) | 8.95 | 0.40 | 200
Mumbai (2) | 9.10 | 0.60 | 175

$H_0: \mu_1 = \mu_2$ (there is no difference): prices in Delhi and Mumbai are equal
$H_1: \mu_1 \neq \mu_2$ (there is a difference): prices in Delhi and Mumbai are not equal
$\alpha = 0.05$
$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}} = \sqrt{\frac{(0.40)^2}{200} + \frac{(0.60)^2}{175}} = \sqrt{0.00286} = 0.053$
Standardizing the difference of sample means, where $(\mu_1 - \mu_2)_{H_0}$ is the hypothesized difference between the two means:
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}} = \frac{(8.95 - 9.10) - 0}{0.053} = -2.83$
Since |-2.83| = 2.83 > 1.96, reject the null hypothesis H0.
Interpretation: there is a significant difference in T-shirt prices between Delhi and Mumbai.

Test for Differences Between Means: Large Sample: Exercise 1
Using the same Delhi (1) and Mumbai (2) T-shirt data as in the previous example (means 8.95 and 9.10; SDs 0.40 and 0.60; sample sizes 200 and 175):
Question: Test whether prices are about $0.10 lower in Delhi than in Mumbai, at the 0.05 level.

$H_0: \mu_1 = \mu_2 - 0.10$: prices are $0.10 lower in Delhi than in Mumbai
$H_1: \mu_1 \neq \mu_2 - 0.10$: prices are not $0.10 lower in Delhi than in Mumbai
$\alpha = 0.05$
Standardizing the difference of sample means, where $(\mu_1 - \mu_2)_{H_0} = -0.10$ is the hypothesized difference between the two means:
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}} = \frac{(8.95 - 9.10) - (-0.10)}{0.053} = -0.94$
Since -1.96 < -0.94 < 1.96, we do not reject the null hypothesis H0.
Interpretation: the data are consistent with prices being $0.10 lower in Delhi than in Mumbai.
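The large-sample test for a difference between two means, covering both the equality test and this exercise, is sketched below in Python. The function name `two_sample_z` is hypothetical. The SE is not rounded to 0.053 here, so the first z comes out near -2.81 rather than the slide's -2.83. The decision does not change.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, hyp_diff=0.0):
    """Large-sample z for (x1_bar - x2_bar) against a hypothesized mu1 - mu2."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # estimated SE of the difference
    return ((mean1 - mean2) - hyp_diff) / se, se

delhi = dict(mean1=8.95, sd1=0.40, n1=200)
mumbai = dict(mean2=9.10, sd2=0.60, n2=175)

z_equal, se = two_sample_z(**delhi, **mumbai, hyp_diff=0.0)   # H0: mu1 = mu2
z_shift, _ = two_sample_z(**delhi, **mumbai, hyp_diff=-0.10)  # H0: mu1 = mu2 - 0.10
print(f"SE = {se:.4f}, z (equal prices) = {z_equal:.2f}, z (0.10 lower) = {z_shift:.2f}")
```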

Test for Differences Between Means: Small Sample Size: Independent Samples
With small samples and equal population variances, the pooled estimate of σ² is a weighted average of $s_1^2$ and $s_2^2$:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ (pooled estimate of σ²)
Estimated standard error of the difference between two sample means with small samples and equal population variance:
$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Test for Differences Between Means: Small Sample Size: Exercise 2
Training type | Firm A | Firm B
Average customers handled/day | 80 | 70
Sample SD | 12 | 17
Sample size (employees) | 10 | 14

$H_0: \mu_1 = \mu_2$ (there is no difference in the productivity of the two firms)
$H_1: \mu_1 > \mu_2$ (productivity is higher in firm 1 than in firm 2)
$\alpha = 0.05$
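The pooled-variance t test for Exercise 2 can be sketched in Python with the standard library. The function name `pooled_t` is hypothetical. The one-tailed critical value 1.717 for 22 d.f. is hard-coded from a t table, because the standard library has no t-distribution.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2, hyp_diff=0.0):
    """Two-sample t with pooled variance (small samples, equal population variances)."""
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)                # est. SE of difference
    t = ((mean1 - mean2) - hyp_diff) / se
    return t, sp2, se, n1 + n2 - 2

t, sp2, se, df = pooled_t(80, 12, 10, 70, 17, 14)
t_crit = 1.717  # t table: 22 d.f., alpha = 0.05, one-tailed (H1: mu1 > mu2)
print(f"s_p^2 = {sp2:.2f}, SE = {se:.2f}, t = {t:.2f}, d.f. = {df}")
print("Reject H0" if t > t_crit else "Do not reject H0")
```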

Test for Differences Between Means: Small Sample Size: Exercise 2 (Solution)
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(10 - 1)(12)^2 + (14 - 1)(17)^2}{10 + 14 - 2} = \frac{(9)(144) + (13)(289)}{22} = 229.68$. Taking the square root of both sides, $s_p = 15.16$.
$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 15.16 \sqrt{\frac{1}{10} + \frac{1}{14}} = 15.16 \times 0.414 = 6.27$ (estimated SE of the difference)
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}} = \frac{(80 - 70) - 0}{6.27} = 1.59$
The critical value for 22 d.f. at the 0.05 level of significance (one-tailed) is 1.717.
We do not reject (accept) the null hypothesis, because 1.59 < 1.717. Interpretation: there is no significant difference in productivity between the two firms.

Tests for Differences Between Means: Small Sample Size: Dependent (Paired) Samples
S.No. | Sale Before Promotion | Sale After Promotion
1 | 60 | 64
2 | 70 | 75
3 | 65 | 68
4 | 63 | 64
5 | 67 | 69
6 | 70 | 74
7 | 73 | 75
8 | 75 | 79
9 | 64 | 70
10 | 68 | 71

Various Forms of Hypothesis?
* Considering two samples: $H_0: \mu_{\text{after}} - \mu_{\text{before}} = 2.5$; $H_1: \mu_{\text{after}} - \mu_{\text{before}} > 2.5$; α = 0.05
* Considering one sample of change in sales: $H_0: \mu = 2.5$; $H_1: \mu > 2.5$
Test the hypothesis.

Deciding When To Use Two-Tailed and One-Tailed Test
* If the test concerns whether two means are or are not equal, use a two-tailed test, which measures whether one mean is different from (higher or lower than) the other.
* If the test concerns whether one mean is significantly higher or significantly lower than the other, use a one-tailed test.

Tests for Differences Between Means: Small Sample Size: Dependent (Paired) Samples (Solution)
Diff (x) = sale after promotion - sale before promotion
S.No. | Diff (x) | x²
1 | 4 | 16
2 | 5 | 25
3 | 3 | 9
4 | 1 | 1
5 | 2 | 4
6 | 4 | 16
7 | 2 | 4
8 | 4 | 16
9 | 6 | 36
10 | 3 | 9
Total | 34 | 136
(Mean sales: 67.5 before promotion, 70.9 after promotion.)

$H_0: \mu = 2.5$; $H_1: \mu > 2.5$; α = 0.05, where μ is the mean change in sales
$\bar{x} = \frac{\sum x}{n} = \frac{34}{10} = 3.4$
$s = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}} = \sqrt{\frac{136 - 10(3.4)^2}{9}} = \sqrt{2.267} = 1.505$
$\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{1.505}{\sqrt{10}} = 0.476$
$t = \frac{\bar{x} - \mu_{H_0}}{\hat{\sigma}_{\bar{x}}} = \frac{3.4 - 2.5}{0.476} = 1.890$
The t value (1.890) is more than the critical value (1.833, for 9 d.f. at the 0.05 level of significance, one-tailed), so we reject the null hypothesis.
Interpretation: the improvement in sales is greater than the hypothesized 2.5.
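The paired-sample calculation above can be reproduced in Python by working on the per-outlet differences. The one-tailed critical value 1.833 for 9 d.f. is hard-coded from a t table.

```python
import math
import statistics

before = [60, 70, 65, 63, 67, 70, 73, 75, 64, 68]
after = [64, 75, 68, 64, 69, 74, 75, 79, 70, 71]

diffs = [a - b for a, b in zip(after, before)]  # change in sales per observation
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
se = s_d / math.sqrt(n)        # estimated SE of the mean difference
t = (d_bar - 2.5) / se         # H0: mean change = 2.5
t_crit = 1.833                 # t table: 9 d.f., alpha = 0.05, one-tailed
print(f"mean diff = {d_bar}, s = {s_d:.3f}, t = {t:.3f}")
print("Reject H0" if t > t_crit else "Do not reject H0")
```

Computing with `statistics.stdev` is equivalent to the slide's $\sum x^2 - n\bar{x}^2$ shortcut, but it avoids the intermediate rounding.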

Analysis Of Variance: ANOVA

ANOVA and F-test
* ANOVA is a statistical technique designed to test whether the means of more than two populations (variables) are equal.
* It tests the significance of the difference among more than two sample (variable) means, i.e. whether the samples are drawn from populations having the same mean.
* E.g. whether three or more promotion methods differ significantly from one another in terms of effectiveness, or whether crop yields differ when three or more fertilizers are applied.

ANOVA: Exercise 1
A firm adopted three promotional campaigns in three different territories for the same product and wants to know whether one of them is more effective than the others in generating sales. Five days' sales figures were observed at random for each method, as given below.
Day | Newspaper Ads (1) | TV Advertising (2) | Direct Marketing (3)
1 | 25 | 31 | 24
2 | 30 | 39 | 30
3 | 36 | 38 | 28
4 | 38 | 42 | 25
5 | 31 | 35 | 28
Determine whether the promotion methods are significantly different.

Stating the Hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3$ (null hypothesis)
$H_1: \mu_1, \mu_2, \mu_3$ are not all equal (alternative hypothesis)

Assumptions
* Each sample is drawn from a normal population.
* Each population has the same variance, σ². However, with larger sample sizes the normality assumption is not a must.

ANOVA is based on a comparison of two different estimates of the population variance σ²:
* the variance between the various sample means;
* the variance within the various samples.
The two estimates of σ² should be approximately equal if H0 is true. If H0 is not true, the two estimates of σ² will differ significantly.

ANOVA: Calculating the Variance
$s_{\bar{x}}^2 = \frac{\sum (\bar{x} - \bar{\bar{x}})^2}{k - 1}$ (variance between the sample means)
where $\bar{x}$ = sample mean; $\bar{\bar{x}}$ = grand mean; k = number of samples.
We know that $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, so $\sigma^2 = n\,\sigma_{\bar{x}}^2$ (population variance).
Hence $\hat{\sigma}^2 = n\,s_{\bar{x}}^2 = \frac{\sum n (\bar{x} - \bar{\bar{x}})^2}{k - 1}$ (estimated population variance).
When the samples differ in size:
$\hat{\sigma}_b^2 = \frac{\sum n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$ (estimate of between-column (sample) variance)
where $n_j$ = size of the jth sample; $\bar{x}_j$ = mean of the jth sample.

So, the process of ANOVA is:
1. Determine one estimate of σ² from the variance between the sample means.
2. Determine a second estimate of σ² from the variance within the samples.
3. Compare the two estimates of σ² to reject or accept H0.

ANOVA: Calculating the Variance (continued)
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ (sample variance)
$\hat{\sigma}_w^2 = \sum \left(\frac{n_j - 1}{n_T - k}\right) s_j^2$ (estimate of within-column variance)
where $n_j$ = size of the jth sample; $s_j^2$ = variance of the jth sample; $n_T = \sum n_j$ = total sample size; k = number of samples.
The term $\frac{n_j - 1}{n_T - k}$ is simply the weight assigned to $s_j^2$. In fact, it is just the fraction of the total number of degrees of freedom contributed by the jth sample.

Using F-test for Hypothesis Testing
$F = \frac{\hat{\sigma}_b^2}{\hat{\sigma}_w^2}$, where $\hat{\sigma}_b^2$ = between-column variance and $\hat{\sigma}_w^2$ = within-column variance.
The F-ratio should be close to one if the null hypothesis is true. The smaller the F-value, the more likely it is that H0 is true. The larger the F-value, the less likely it is that H0 is true. In practice, when the population means are unequal, the variance among the sample means tends to be higher than the variance within the samples. This produces a higher F-value, leading to rejection of H0.

The F-distribution
The F-distribution is useful for calculating the ratios of variances of normally distributed statistics. Its shape is positively skewed and depends on the degrees of freedom for the numerator and the denominator. The value of F is never negative. The F-distribution approaches symmetry as the d.f. increase.

Calculating d.f. for the distribution:
d.f. in the numerator of the F ratio = k - 1
d.f. in the denominator of the F ratio = $\sum (n_j - 1) = n_T - k$
where $n_j$ = size of the jth sample; $n_T = \sum n_j$ = total sample size; k = number of samples.

Compare the F-value with the critical value, then take the decision: reject or accept H0.
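A minimal sketch of one-way ANOVA for the promotion data in ANOVA Exercise 1. It follows the between-column and within-column variance formulas above and uses only Python's standard library. The critical value 3.89 for (2, 12) d.f. at α = 0.05 is taken from a standard F table and hard-coded as an assumption. If SciPy is available, `scipy.stats.f_oneway` could serve as a cross-check.

```python
import statistics

groups = {
    "Newspaper Ads": [25, 30, 36, 38, 31],
    "TV Advertising": [31, 39, 38, 42, 35],
    "Direct Marketing": [24, 30, 28, 25, 28],
}

k = len(groups)
n_total = sum(len(g) for g in groups.values())
grand_mean = sum(sum(g) for g in groups.values()) / n_total

# Between-column estimate: sum n_j (x_bar_j - grand mean)^2 / (k - 1)
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values())
var_between = ss_between / (k - 1)

# Within-column estimate: sum (n_j - 1) s_j^2 / (n_T - k)
ss_within = sum((len(g) - 1) * statistics.variance(g) for g in groups.values())
var_within = ss_within / (n_total - k)

f_ratio = var_between / var_within
f_crit = 3.89  # F table: (2, 12) d.f., alpha = 0.05
print(f"F = {f_ratio:.2f} with ({k - 1}, {n_total - k}) d.f.")
print("Reject H0" if f_ratio > f_crit else "Do not reject H0")
```

For these data the sketch gives F = 7.50 on (2, 12) d.f. This exceeds 3.89, so the hypothesis of equal mean sales across the three methods would be rejected.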

Points to Remember about F-Test
* The sample size should be large enough.
* Control other factors that might influence the variance.
* One-way ANOVA considers only one factor. Two-factor problems can be tackled by two-way analysis of variance.

2-Way ANOVA: Main Effects and Interaction Effects
[Figures: outcome plotted by amount of time (1 hr/week vs. 4 hrs/week) and setting (in-class vs. pull-out training).]
* For all settings, the 4 hrs/week condition worked better than the 1 hr/week condition.
* In-class training was better than pull-out training for all amounts of time.
* 4 hrs/week always works better than 1 hr/week, and the in-class setting always works better than pull-out.
* The combination of 4 hrs/week and the in-class setting does better than the other three combinations.
Source: http://www.socialresearchmethods.net/kb/expfact.php

Chi-Square Test

Chi-Square (χ²) Test: Why and When to Use?
* When more than two samples (variables) are being investigated.
* Chi-square as a test of independence of attributes.
* The chi-square test lets us test whether more than two population proportions can be considered equal, i.e. whether the differences observed among various sample proportions are significant or only due to chance.
* Chi-square as a test of goodness of fit between observed and theoretical distributions.

Chi-Square Distribution
(v = degrees of freedom) Given that H0 is true, the sampling distribution of the χ² statistic can be approximated by a continuous curve known as the χ² distribution. There is a different distribution for each number of degrees of freedom. The smaller the d.f., the more right-skewed the distribution. The χ² curve tends toward symmetry with increasing d.f. and approaches the normal. Because it is a probability distribution, the total area under the curve in each χ² distribution is 1.0.

Chi Square: Concepts & Testing Procedures
* Contingency table: a table composed of rows and columns.
* Observed and expected frequencies.
* Calculating expected frequencies.
* Comparing expected and observed frequencies.
* Accepting and rejecting the null hypothesis.

Stating the Hypothesis
$H_0: p_{s1} = p_{s2} = p_{s3} = p_{s4}$ (null hypothesis)
$H_1: p_{s1}, p_{s2}, p_{s3}, p_{s4}$ are not all equal (alternative hypothesis)
If the columns are not contingent on the rows, then the row and column frequencies are independent. The test of whether the columns are contingent on the rows is known as the chi-square test of independence.

Estimating Chi-Square Value
$f_e = \frac{RT \times CT}{n}$ (to generate the contingency table of expected frequencies)
where $f_e$ = expected frequency in a given cell; RT = row total for the row containing that cell; CT = column total for the column containing that cell; n = total number of observations.
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$ (chi-square statistic)
where $f_o$ = observed frequency; $f_e$ = expected frequency.
Degrees of freedom = (number of rows - 1)(number of columns - 1) = (r - 1)(c - 1)

Chi-Square: Exercise 1
An IT firm was trying to find out which variables influenced the attrition rate. The type of educational institution was suggested as a possible influence on attrition. A sample of 500 employees was selected, as given below.
 | Tier I eng. college (IITs) | Tier II eng. college (NITs) | Tier III eng. college (others)
With the firm | 0 | 50 | 80
Left the firm | 250 | 100 | 20
Is there any evidence from the above data of a relation between type of college and attrition?

Chi-Square: Exercise 2
The following table shows the classification of 1000 workers in a factory according to the disciplinary action taken by management and their promotional experience.
Promotional experience | Offenders | Non-offenders
Promoted | 30 | 70
Not promoted | 670 | 230
(Columns: disciplinary action.)
Examine whether the disciplinary action taken and promotional experience are associated.
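The expected-frequency and χ² formulas can be sketched in plain Python for Exercise 2 (disciplinary action vs. promotion). The critical value 3.841 (1 d.f., α = 0.05) comes from a standard chi-square table and is hard-coded. The variable names are illustrative, not from the source.

```python
observed = [
    [30, 70],    # Promoted:     offenders, non-offenders
    [670, 230],  # Not promoted: offenders, non-offenders
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n  # f_e = RT x CT / n
        chi_square += (f_o - f_e) ** 2 / f_e

dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
chi_crit = 3.841  # chi-square table: 1 d.f., alpha = 0.05
print(f"chi-square = {chi_square:.2f}, d.f. = {dof}")
print("Reject H0 (associated)" if chi_square > chi_crit else "Do not reject H0")
```

With these counts the sketch gives χ² ≈ 84.66 on 1 d.f., far above 3.841.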

Chi-square Test: Points to Remember
* When the expected frequencies are too small, the chi-square value will be overestimated, leading to too many rejections of the null hypothesis.
* The sample size should be large enough.
* The expected frequency should not be < 5 in any cell of a contingency table. If it is, combine two categories to increase the cell value.
* A zero chi-square value does not mean that there are no differences between observed and expected frequencies.
* The rows and columns of a contingency table must be mutually exclusive categories that exhaust all the possibilities of the sample.

Locating the Critical Value in the Chi-square Table: Right-Tailed vs. Left-Tailed Test
* Right-tailed test: to find the critical value, use the upper-limit table for right-tailed tests.
* Left-tailed test: to find the critical chi-square value for a left-tailed test, use the table labeled "lower limits". Subtract your significance level (given by the Greek symbol alpha) from 1. For example, if your significance level is 0.025, then 1 - 0.025 = 0.975. Find this value at the top of the chi-square table, heading a column.