
Uploaded by Jagadeesh Rocckz

parametric and non parametric


Managers

Using Excel for Data Analysis

Syllabus

Code: 302

Credits: 3

Unit I

Digital Data Introduction, Types of Digital Data: Structured Data,

Unstructured Data, and Semi-Structured Data; Exploring and Discovering

Data; Introduction to OLTP: Queries, Advantages and Challenges;

Introduction to OLAP: One-dimensional, Two-dimensional and Three-dimensional data; OLAP Architectures: MOLAP, ROLAP, HOLAP; Role of

OLAP Tools in BI Architecture.

Introduction to Big data; BI Component Frameworks: Business,

Administration and Operation, and Implementation Layer; BI is for

Whom?; BI Applications; BI Roles and Responsibilities; Basics of Data

Integration; Data Warehousing: Data Mart; What constitutes a Data

Warehouse?; Introduction to ETL; Data Integration.


UNIT II

Programming; Data Manipulation in R: Vectors, Basic Math, and Matrix Operations;

Summarizing Data: Numerical and Graphical Summaries; Data Visualization in R; Data

Transformation; Data Import Techniques in R; Time Series and Spatial Graphs; Graphs

for Categorical Responses and Panel Data.

Data Modeling using Excel: Overview of Excel; Basic Data Summaries: Measures of Central Tendency, Measures of Dispersion, and Measures of Skewness and Kurtosis; Introduction to Parametric Hypothesis Testing: One and Two Sample Tests, Z Test and t-Test; Chi-square and Non-parametric Hypothesis Testing: Chi-square Goodness-of-Fit Test, Sign Test, Wilcoxon Signed Rank Sum Test, Mann-Whitney U Test; Linear Correlation and Regression Analysis; Time Series Data and Analysis.

Unit III

Linear Discriminant Analysis; Exploratory Factor Analysis; Confirmatory Factor

Analysis; Conjoint Analysis; Data Mining: Clustering Techniques, Association Rule

Mining and Sentiment Analysis, Decision Trees and Random Forests; Structural

Equations Modeling.


Suggested Readings

Prasad R N and Acharya Seema (2013), Fundamentals of Business Analytics, Wiley India Pvt.

Ltd., New Delhi.

Glyn Davis and Branko Pecar (2013), Business Statistics using Excel, Oxford University Press,

New Delhi.

Halady Rao Purba (2013), Business Analytics an Application Focus, PHI Learning Private

Limited, New Delhi.

Jank Wolfgang (2011), Business Analytics for Managers, Springer Science+Business Media,

ISBN 978-1-4614-0405-7.

Davenport Thomas H et al. (2008), Competing on Analytics, Pearson Publication, USA.

Decision support systems and business intelligence HBR Press.

E Turban, et al. (2008), Business Intelligence: A Managerial Approach, Pearson Prentice Hall.

Mosimann R et al. (2007), The performance manager: Proven strategies for turning information

into higher business performance, Cognos Press.

Articles

Solomon Negash, Business Intelligence, Communications of the Association for Information

Systems (Volume 13, 2004) 177-195.

Sara Philpott, Advanced Analytics: Unlocking the Power of Insight, IBM, 2010.

Pam Baker, Using Data Visualizations To Drive Business Decisions, Fiercebigdata, 2013.

Topic

Overview of Excel

Basic Data Summaries

Session

Session 1

o Measures of Dispersion

o Measures of Skewness & Kurtosis

Session 2,3

o Chi-square Test

Session 4,5

Sign Test

Wilcoxon Signed Rank Sum Test

Mann-Whitney U Test

Correlation Analysis

Linear Regression Analysis

Time Series Data and Analysis

Session 6

Logistic Regression

Session 7

Session 8

Contd...

Topic

Session

Session 9

Session 10

Conjoint Analysis

Session 11

Data Mining

Session 12,13,14

Clustering Techniques

Association Rule Mining & Sentiment Analysis

Decision Trees & Random Forests

Structural Equations Modeling

Introduction to Parametric Hypothesis Testing

Hypothesis tests for one and two samples where the population is considered to be normally distributed

Learning Objectives

On completing this unit you should be able to:

- Understand the concepts of parametric and non-parametric tests, one and two tail tests, and type I and II errors
- Conduct one sample hypothesis tests for the sample mean and proportion
- Conduct two sample hypothesis tests for the sample mean and proportion
- Conduct an F test for two population variances
- Solve hypothesis problems using Microsoft Excel

Hypothesis Testing

Rationale

Hypothesis: a statement about the value of a population parameter, developed for the purpose of testing

Hypothesis testing means deciding between two possibilities based on data: is it real, or is it just coincidence? A hypothesis is either TRUE or FALSE

Example: the hypothesis statement "the average salary of accountants is 31000" can be measured and assessed through the variable salary

Hypothesis testing decides whether the hypothesis is a reasonable statement and should not be rejected, or is an unreasonable statement and should be rejected, based on sample evidence and probability theory

Null hypothesis (H0): also known as the hypothesis of no difference; it is formulated in anticipation of being rejected as false

Alternative hypothesis (H1): a positive proposition which states that a significant difference exists

Example: the average salary of accountants is 31000

Null hypothesis H0: μ = 31000

Alternative hypothesis H1: μ ≠ 31000

Level of significance: represents the amount of risk an analyst will accept when making a decision

It represents the amount of error associated with rejecting the null hypothesis when it is true

It is usually expressed as a percentage and denoted by α (alpha)

It is normally 5% (0.05) or 1% (0.01), or sometimes 10% (0.1)

The value of α depends upon how sure you want to be that your decisions are an accurate reflection of the true population relationship

Example: a 5% level of significance implies that there are about 5 chances in 100 of rejecting H0 when it is true, i.e. when H0 is true we are 95% confident that we will make the correct decision

1. If we sample from a population that is normally distributed, then the sampling distribution of the sample mean is normally distributed with mean equal to the population mean μ and standard error σ/√n.

2. For populations that are not normally distributed we can make use of the Central Limit Theorem: for large n, the sampling distribution of the sample mean approximates the normal distribution.

3. For small samples from a normally distributed population whose standard deviation is unknown we employ the Student t distribution: the sample mean is centred on the population mean, with the standard error estimated using the sample standard deviation, s/√n.

We stated earlier that the alternative hypothesis is of the form H1: μ ≠ 31000

Two tailed test: the ≠ sign tells us that we are not sure what the direction of the difference will be (< or >), only that a difference exists

One tailed test: it is possible that we are instead assessing whether the average accountant salary

is greater than 31000, implying H1: μ > 31000 (upper tail test), or

is smaller than 31000, implying H1: μ < 31000 (lower tail test)

Two tail test example: H1: μ ≠ 100

A decision to reject or accept the null hypothesis H0, made on the basis of information supplied by sample data, can result in one of 2 types of errors:

Type I Error: committed by the test in rejecting a true null hypothesis. The probability of committing a type I error is denoted by α

Type II Error: committed by the test in accepting a false null hypothesis. The probability of committing a type II error is denoted by β

Your decision versus the truth:

Accept null hypothesis H0: correct decision if H0 is true; Type II error [not easily controlled] if the research hypothesis H1 is true

Accept research hypothesis H1: Type I error [level α = 0.05] if H0 is true; correct decision if H1 is true

The decision can be made in two ways:

1) Use the p-value (via Excel)

The p-value represents the probability of the calculated random sample test statistic being at least as extreme as the value observed, assuming the null hypothesis is true

The p-value is compared with the chosen significance level (α) to make a decision between accepting or rejecting the null hypothesis H0

If p < α, then reject the null hypothesis H0, because low probability events are unlikely to occur, and accept the alternative hypothesis H1

2) Calculate the test statistic and compare it with a critical test statistic

The critical test statistic is obtained from an appropriate table or via Excel

Its value depends upon the significance level for z test problems, and upon the significance level and the number of degrees of freedom for t test problems

If the test statistic lies beyond the critical test statistic (for a two tail test, |test statistic| > critical test statistic), then we reject the null hypothesis H0 and accept the alternative hypothesis H1

Tests of hypothesis are usually classified into two methods: parametric and non-parametric.

Parametric methods: make assumptions about the underlying distribution from which sample populations are selected (e.g. a normal distribution, with its bell-shaped curve)

Data is at the interval/ratio level of measurement

Non-parametric methods: make no assumptions about the underlying distribution

They are often based upon data that has been ranked, rather than actual measurement data

One sample test: involves testing a sample statistic (e.g. mean value) against a perceived population value (e.g. accountant salary 31000) to ascertain whether or not there is a significant difference between the sample statistic and the population parameter

Two sample test: involves testing whether or not there is a significant difference between two samples and, consequently, whether or not the two samples represent different populations

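Summary of Excel functions (P-value method and critical test statistic method)

Z-test
Two-tail test: p-value =2*(1-NORMSDIST(ABS(Z-value))); critical values =NORMSINV(α/2) (lower value) and =NORMSINV(1-α/2) (upper value)
Lower-tail test: p-value =NORMSDIST(Z-value); critical value =NORMSINV(α)
Upper-tail test: p-value =1-NORMSDIST(Z-value); critical value =NORMSINV(1-α)

T-test (TINV is a two-tail inverse, so one-tail critical values use 2*α)
Two-tail test: p-value =TDIST(ABS(t-value),df,2); critical values =-TINV(α,df) (lower value) and =TINV(α,df) (upper value)
Lower-tail test: p-value =TDIST(ABS(t-value),df,1); critical value =-TINV(2*α,df)
Upper-tail test: p-value =TDIST(ABS(t-value),df,1); critical value =TINV(2*α,df)

F-test (FDIST and FINV work with the upper, right-hand tail)
Two-tail test: p-value =2*MIN(FDIST(F-value,df1,df2),1-FDIST(F-value,df1,df2)); critical values =FINV(1-α/2,df1,df2) (lower value) and =FINV(α/2,df1,df2) (upper value)
Lower-tail test: p-value =1-FDIST(F-value,df1,df2); critical value =FINV(1-α,df1,df2)
Upper-tail test: p-value =FDIST(F-value,df1,df2); critical value =FINV(α,df1,df2)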

Assumptions

Sample data is randomly collected from a population

Population is normally distributed

Population standard deviation is known

Example 1:- Historically, employees of a firm produce 100 units per hour with a standard deviation of 20 units per hour. A new employee is tested on 36 separate random occasions and found to have an output of 90 units per hour. Does this indicate that the new employee's output is significantly different from the average output?

Given α = 5% = 0.05, μ = 100, σ = 20, n = 36, X̄ = 90

Step 1: State null and the alternate hypothesis - H0: μ = 100, H1: μ ≠ 100

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - Z-distribution (population standard deviation known)

$$Z_{cal} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{90 - 100}{20/\sqrt{36}} = -3$$

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Two tail p-value = 0.0026998

As 0.0027 < 0.05, Reject H0

Critical test Statistic Method (Zcri) - If |test statistic| > critical test statistic then we would reject null hypothesis H0 and accept alternative hypothesis H1

From Excel, Two tail critical value = NORMSINV(1-α/2) = 1.96 (lower value NORMSINV(α/2) = -1.96)

As |-3| = 3 > 1.96, Reject H0

Step 5: Interpretation

Evidence suggests that the new employee's output is significantly different, at the 5% level, from the firm's existing employee output, i.e. the sample mean value (90 units per hour) is not close enough to the population mean value (100 units per hour)

Excel functions for Example 1 (two-tail test), with Zcal = (90 - 100)/(20/√36) = -3:

P-value Method: =2*(1-NORMSDIST(ABS(Z-value)))

Critical test Statistic Method: =NORMSINV(1-α/2) (Upper value), =NORMSINV(α/2) (Lower value)
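The calculation in Example 1 can also be checked outside Excel. The following is a minimal Python sketch, not part of the original slides: the function name `one_sample_z_test` is an illustrative choice, and `statistics.NormalDist` stands in for Excel's NORMSDIST and NORMSINV.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tail one-sample z-test; returns (z, p_value, z_critical)."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Same quantity as =2*(1-NORMSDIST(ABS(Z-value))) in Excel
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Same quantity as =NORMSINV(1-alpha/2) in Excel
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    return z, p_value, z_critical


# Values from Example 1: X-bar = 90, mu = 100, sigma = 20, n = 36
z, p_value, z_critical = one_sample_z_test(90, 100, 20, 36)
print(f"z = {z:.3f}, p = {p_value:.7f}, z_crit = {z_critical:.3f}")
print("Reject H0" if p_value < 0.05 else "Accept H0")
```

Both decision rules (p < α and |z| > critical value) always agree, so either can be used.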

Assumptions

The population distribution is normal

The population standard deviation is not known, so the t-test uses the sample standard deviation, s, as an estimate of the population standard deviation, σ

Example 2:- A local car dealer wants to know if the purchasing habits for extra fittings by male buyers have changed. Based upon collected data he has estimated that the distribution of extra fittings purchased is approximately normally distributed with an average of 2000 per customer. To test this hypothesis he has collected the data of the purchases made by the last seven male customers: 2300, 2386, 1920, 1578, 3065, 2312 and 1790. Test whether the extras purchased on average have changed.

Given α = 5% = 0.05, μ = 2000, σ = unknown, n = 7, X̄ = 2193

Step 1: State null and the alternate hypothesis - H0: μ = 2000, H1: μ ≠ 2000

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - t-distribution (population standard deviation unknown)

$$t_{cal} = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{2193 - 2000}{489.62/\sqrt{7}} = 1.0429, \qquad df = n - 1 = 6$$

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Two tail p-value = 0.337182452

As 0.337 > 0.05, Accept H0

Critical test Statistic Method (tcri) - If |test statistic| > critical test statistic then we would reject null hypothesis H0 and accept alternative hypothesis H1

From Excel, Two tail critical value = TINV(α,df) = 2.45

As -2.45 < 1.04 < 2.45, Accept H0

Step 5: Interpretation

There is no significant difference, at the 5% level, between the extras purchased by the sample and the historical extras purchased of 2000.

Excel functions for Example 2 (two-tail test), with tcal = 1.0429 and df = n - 1 = 6:

P-value Method: =TDIST(ABS(t-value),df,2)

Critical test Statistic Method: =TINV(α,df) (Upper value), = -TINV(α,df) (Lower value)
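The sample mean, sample standard deviation and t statistic of Example 2 can be recomputed from the seven purchases. This is a hedged sketch using only the Python standard library; because the standard library has no t-distribution, the critical value 2.447 is simply assumed from Excel's =TINV(0.05,6), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Purchases of the last seven male customers (Example 2)
purchases = [2300, 2386, 1920, 1578, 3065, 2312, 1790]
hypothesised_mean = 2000
t_critical = 2.447  # assumption: taken from Excel, =TINV(0.05, 6)

n = len(purchases)
x_bar = mean(purchases)
s = stdev(purchases)  # sample standard deviation (divides by n - 1)
df = n - 1
t_cal = (x_bar - hypothesised_mean) / (s / sqrt(n))

# Two-tail critical value rule: reject H0 only if |t| exceeds t_critical
decision = "Reject H0" if abs(t_cal) > t_critical else "Accept H0"
print(f"mean = {x_bar}, s = {s:.2f}, t = {t_cal:.4f}, df = {df}: {decision}")
```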

Example 3:- A large organisation produces electric light bulbs in each of its two factories (A and B). It is suspected that the quality of production from factory A is better than from factory B. To test this assertion the organisation collects samples from factory A and B and measures how long each light bulb works (in hours) before it fails. Conduct an appropriate test of this hypothesis.

Given α = 5% = 0.05, σ²A = 52783, σ²B = 61560, nA = 30, nB = 32

Step 1: State null and the alternate hypothesis - H0: μA ≤ μB, H1: μA > μB

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - Z-distribution (population variances known)

$$Z_{cal} = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}} = 4.16$$

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Upper tail p-value = 0.0000158

As 0.0000158 < 0.05, Accept H1

Critical test Statistic Method (Zcri) - If test statistic > critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Upper tail critical value Zcri = +1.645

As 4.16 > 1.645, Accept H1

Step 5: Interpretation

At the 0.05 level of significance, the light bulbs from factory A have a significantly longer lifetime than the light bulbs from factory B.

Excel functions for Example 3 (upper-tail test), with Zcal = 4.16:

P-value Method: =1-NORMSDIST(Z-value)

Critical test Statistic Method: =NORMSINV(1-α)

Analysis ToolPak solution

Select Data > Data Analysis > Z Test: Two Sample for Means

                               Variable 1     Variable 2
Mean                           1135.333333    894.21875
Known Variance                 46516.6        57845.9
Observations                   30             32
Hypothesized Mean Difference   0
z                              4.160713018
P(Z<=z) one-tail               1.58628E-05
z Critical one-tail            1.644853627
P(Z<=z) two-tail               3.17256E-05
z Critical two-tail            1.959963
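The two-sample z statistic can be reproduced from the summary values in the Analysis ToolPak output (means 1135.333333 and 894.21875, known variances 46516.6 and 57845.9, sample sizes 30 and 32). The sketch below is an illustration in Python rather than part of the slides; the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean_a, mean_b, var_a, var_b, n_a, n_b):
    """Upper-tail two-sample z-test with known variances; returns (z, p_upper)."""
    standard_error = sqrt(var_a / n_a + var_b / n_b)
    z = (mean_a - mean_b) / standard_error
    # Same quantity as =1-NORMSDIST(Z-value) in Excel
    p_upper = 1 - NormalDist().cdf(z)
    return z, p_upper


# Summary values as they appear in the ToolPak output for Example 3
z, p_upper = two_sample_z_test(1135.333333, 894.21875, 46516.6, 57845.9, 30, 32)
z_critical = NormalDist().inv_cdf(0.95)  # same as =NORMSINV(1-0.05)
print(f"z = {z:.4f}, upper-tail p = {p_upper:.3e}, z_crit = {z_critical:.4f}")
```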

Two sample Z-test for proportions

Example 4:- A police authority concerned with the number of passengers not wearing rear seat belts in cars decided to undertake a series of surveys in two large cities. The survey consisted of two independent random samples collected from city A and B, and the police authority would like to know if the proportions of passengers wearing seat belts in city A and city B are different. Conduct an appropriate test of this hypothesis.

Given α = 5% = 0.05, NA = 250, NB = 190, nA = 135, nB = 80, with the population proportions πA and πB estimated by the sample proportions pA = nA/NA = 0.54 and pB = nB/NB = 0.421

Step 1: State null and the alternate hypothesis - H0: πA = πB, H1: πA ≠ πB

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - Z-distribution (Large samples)

$$Z_{cal} = \frac{p_A - p_B}{\sqrt{\frac{p_A(1 - p_A)}{N_A} + \frac{p_B(1 - p_B)}{N_B}}} = 2.49$$

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Two tail p-value = 0.013

As 0.013 < 0.05, Accept H1

Critical test Statistic Method (Zcri) - If |test statistic| > critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Two tail critical value Zcri = ±1.96

As 2.49 > 1.96, Accept H1

Step 5: Interpretation

At the 0.05 level of significance, we conclude that a significant difference exists between the proportions of rear passengers wearing seat belts in city A and city B.

                            City A   City B
No. Interviewed, N          250      190
No. wearing seat belts, n   135      80

Excel functions for Example 4 (two-tail test), with Zcal = 2.49:

P-value Method: =2*(1-NORMSDIST(ABS(Z-value)))

Critical test Statistic Method: =NORMSINV(1-α/2) (Upper value), =NORMSINV(α/2) (Lower value)
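The seat belt figures (135 of 250 in city A, 80 of 190 in city B) are enough to recompute the test. The Python sketch below is illustrative only; it assumes the unpooled standard error, which is the form that reproduces the quoted Z value of 2.49.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, size_a, successes_b, size_b):
    """Two-tail z-test for two independent proportions (unpooled standard error)."""
    p_a = successes_a / size_a
    p_b = successes_b / size_b
    standard_error = sqrt(p_a * (1 - p_a) / size_a + p_b * (1 - p_b) / size_b)
    z = (p_a - p_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value


# City A: 135 of 250 wearing seat belts; City B: 80 of 190
p_a, p_b, z, p_value = two_proportion_z_test(135, 250, 80, 190)
print(f"pA = {p_a:.3f}, pB = {p_b:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```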

(independent samples, equal variances: pooled t-test)

Example 5:- A certain product of organic beans is packed in tins and sold by two local shops. The local authority has received complaints from customers that the amounts of beans within the tins sold by the two shops are different. To test this statistically, two small random samples were collected, one from each shop.

Given α = 5% = 0.05, σ1 = σ2 unknown (pooled), n1 = 18, n2 = 25

Step 1: State null and the alternate hypothesis - H0: μ1 = μ2, H1: μ1 ≠ μ2

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - t-distribution

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = 2082.017, \qquad df = n_1 + n_2 - 2 = 41$$

$$t_{cal} = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = 2.156$$

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Two tail P-value = 0.037

As 0.037 < 0.05, Accept H1

Critical test Statistic Method (tcri) - If |test statistic| > critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Two tail critical value tcri = ±2.0195

As 2.156 > 2.0195, Accept H1

Step 5: Interpretation

We conclude, based upon the sample data collected, that we have evidence that the quantities of beans sold by shops A and B are significantly different at the 5% level of significance. It should be noted that the decision will change if you choose a 1% level of significance, since 0.037 > 0.01.

Excel functions for Example 5 (two-tail test), with tcal = 2.156 and df = n1 + n2 - 2 = 41:

P-value Method: =TDIST(ABS(t-value),df,2)

Critical test Statistic Method: =TINV(α,df) (Upper value), = -TINV(α,df) (Lower value)

Analysis ToolPak solution

Select Data > Data Analysis > Two Sample t-Test for Means assuming equal variances

                               Variable 1     Variable 2
Mean                           527.0555556    496.64
Variance                       2603.114379    1712.906667
Observations                   18             25
Pooled Variance                2082.017182
Hypothesized Mean Difference   0
df                             41
t Stat                         2.156381653
P(T<=t) one-tail               0.018485215
t Critical one-tail            1.682878003
P(T<=t) two-tail               0.03697043
t Critical two-tail            2.0195409
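The pooled variance and t statistic can be recomputed from the means, variances and sample sizes in the Analysis ToolPak output. The Python sketch below is illustrative, with an assumed function name; it stops at the test statistic because the standard library has no t-distribution, so the critical value still comes from Excel's TINV.

```python
from math import sqrt


def pooled_t_statistic(mean1, var1, n1, mean2, var2, n2):
    """Pooled (equal-variance) two-sample t statistic; returns (pooled_var, t, df)."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return pooled_var, t, df


# Summary values as they appear in the ToolPak output for Example 5
pooled_var, t_cal, df = pooled_t_statistic(
    527.0555556, 2603.114379, 18, 496.64, 1712.906667, 25
)
print(f"pooled variance = {pooled_var:.3f}, t = {t_cal:.4f}, df = {df}")
```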

(independent samples, unequal variances)

Example 6:- A certain product of organic beans is packed in tins and sold by two local shops. The local authority has received complaints from customers that the amounts of beans within the tins sold by the two shops are different. To test this statistically, two small random samples were collected, one from each shop.

Given α = 5% = 0.05, σ1 and σ2 unknown, Distribution unknown, n1 = 18, n2 = 25

Step 1: State null and the alternate hypothesis - H0: μ1 = μ2, H1: μ1 ≠ μ2

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - t-distribution

$$t_{cal} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = 2.083$$

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \approx 32$$

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Two tail P-value = 0.045

As 0.045 < 0.05, Accept H1

Critical test Statistic Method (tcri) - If |test statistic| > critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Two tail critical value tcri = ±2.037

As 2.083 > 2.037, Accept H1

Step 5: Interpretation

We conclude, based upon the sample data collected, that we have evidence that the quantities of beans sold by shops A and B are significantly different at the 5% level of significance. It should be noted that the result in this case is borderline, since the p-value of 0.045 is close to 0.05.

Excel functions for Example 6 (two-tail test), with tcal = 2.083 and df = 32:

P-value Method: =TDIST(ABS(t-value),df,2)

Critical test Statistic Method: =TINV(α,df) (Upper value), = -TINV(α,df) (Lower value)

Analysis ToolPak solution

Select Data > Data Analysis > Two Sample t-Test for Means assuming unequal variances

                               Variable 1     Variable 2
Mean                           527.0555556    496.64
Variance                       2603.114379    1712.906667
Observations                   18             25
Hypothesized Mean Difference   0
df                             32
t Stat                         2.083385602
P(T<=t) one-tail               0.022644068
t Critical one-tail            1.693888703
P(T<=t) two-tail               0.045288136
t Critical two-tail            2.0369333
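For the unequal-variance case the degrees of freedom are not simply n1 + n2 - 2. The sketch below, an illustration rather than the slides' own method, applies the usual Welch-Satterthwaite approximation to the summary values in the ToolPak output and rounds the result to the nearest whole number, which matches the df of 32 reported there.

```python
from math import sqrt


def welch_t_statistic(mean1, var1, n1, mean2, var2, n2):
    """Unequal-variance t statistic with Welch-Satterthwaite df; returns (t, df)."""
    v1 = var1 / n1  # squared standard error of mean 1
    v2 = var2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary values as they appear in the ToolPak output for Example 6
t_cal, df_exact = welch_t_statistic(
    527.0555556, 2603.114379, 18, 496.64, 1712.906667, 25
)
df = round(df_exact)  # the ToolPak output reports a whole number
print(f"t = {t_cal:.4f}, df = {df_exact:.2f} (rounded to {df})")
```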

(dependent samples: paired t-test)

Example 7:- Super Slim claims that the weight reduction program that they advertise will result in more than a 10 lb weight loss in the first 30 days. Twenty six subjects were independently and randomly selected for a study and their weights before and after the weight loss program were recorded. Super Slim have stated that the historical data shows that the populations are normally distributed.

Given α = 5% = 0.05, Distribution unknown, σ1 unknown, σ2 unknown, n1 = n2 = 26

Assume n large, CLT applies

Step 1: State null & the alternate hypothesis - H0: μD = μ1 - μ2 ≤ 10, H1: μD > 10

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - t-distribution

$$s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} = 14.57, \qquad df = n - 1 = 25$$

$$t_{cal} = \frac{\bar{d} - D}{s_d/\sqrt{n}} = 2.5178$$

where d is the weight loss of each subject, d̄ is the mean weight loss and D = 10 is the hypothesised difference

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Upper one tail P-value = 0.0093

As 0.0093 < 0.05, Accept H1

Critical test Statistic Method (tcri) - If test statistic > critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Upper one tail critical value tcri = +1.708

As 2.5178 > 1.708, Accept H1

Step 5: Interpretation

Conclude that the average weight loss is more than 10 lbs at a 5% level of significance. With p = 0.0093, the decision would be borderline at the 1% level.

Analysis ToolPak solution

Select Data > Data Analysis > Two Sample t Test Assuming Paired Samples

Example 8:- A certain product of organic beans is packed in tins and sold by two local shops. The local authority has received complaints from customers that the amounts of beans within the tins sold by the two shops are different. To test this statistically, two small random samples were collected, one from each shop. Use an F-test to check whether the two population variances can be considered equal with 95% confidence.

Given α = 5% = 0.05

Step 1: State null & the alternate hypothesis - H0: σ1² = σ2², H1: σ1² ≠ σ2²

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - F-distribution

$$F_{cal} = \frac{s_A^2}{s_B^2} = 1.5197, \qquad df_{numerator} = n_A - 1 = 17, \qquad df_{denominator} = n_B - 1 = 24$$

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Two tail P-value = 0.3393282

As 0.3393282 > 0.05, Accept H0

Critical test Statistic Method (Fcri) - If the test statistic lies outside the lower and upper critical test statistics then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, FL = 0.39 and FU = 2.38

As FL (0.39) < F (1.5197) < FU (2.38), Accept H0

Step 5: Interpretation

Conclude that the two population variances are not significantly different at the 95% level of confidence

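Excel functions for Example 8 (two-tail test), with F = 1.5197, df numerator = nA - 1 and df denominator = nB - 1:

P-value Method: =2*FDIST(F-value,df1,df2) (FDIST returns the upper tail probability; this form applies when F > 1)

Critical test Statistic Method: =FINV(1-α/2,df1,df2) (Lower value), =FINV(α/2,df1,df2) (Upper value)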

Select Data > Data Analysis > F Test for Two Population Variances (Variance Ratio Test)

               Variable 1     Variable 2
Mean           527.0555556    496.64
Variance       2603.114379    1712.906667
Observations   18             25
df             17             24
F              1.519705907
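The variance ratio in Example 8 is a one-line calculation. In the hedged Python sketch below the lower and upper critical values (0.39 and 2.38) are simply the ones quoted in the example, since the standard library has no F-distribution; the variable names are illustrative.

```python
# Sample variances and sizes as they appear in the ToolPak output for Example 8
var_a, n_a = 2603.114379, 18
var_b, n_b = 1712.906667, 25

# Assumption: critical values quoted in the example (two-tail, alpha = 0.05)
f_lower, f_upper = 0.39, 2.38

f_cal = var_a / var_b
df_numerator = n_a - 1
df_denominator = n_b - 1

# Two-tail rule: accept H0 only if F lies between the two critical values
decision = "Accept H0" if f_lower < f_cal < f_upper else "Reject H0"
print(f"F = {f_cal:.4f}, df = ({df_numerator}, {df_denominator}): {decision}")
```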

Conclusion

In this presentation we explored the concept of hypothesis testing:

Chi-square and Non-parametric Hypothesis Testing

This chapter provides an overview of the chi-squared distribution (χ²) and of non-parametric tests that can be used when parametric methods are not appropriate.

Learning Objectives

On completing this unit you should be able to:

- Apply the chi-square test to measure the difference between two proportions from two samples
- Apply the chi-square test to test for association between categorical variables
- Apply the chi-square goodness-of-fit test
- Apply the sign test to one sample
- Apply the Wilcoxon signed rank T test to two paired samples
- Apply the Mann-Whitney U test to two independent samples

Introduction

Parametric tests assess whether the differences between means (or variances, proportions) are statistically significant. Model assumptions are:

a) The underlying population being measured follows a normal distribution

b) The level of measurement is of equal interval or ratio scaling, and

c) The population variances are equal

Unfortunately, we will come across data that does not fit these assumptions:

a) How do we measure the difference between the attitudes of people surveyed in assessing their favourite car, where the responses are in the form of 1, 2, 3, …, n? In this situation we have ordinal data in which taking differences between the numbers (or ranks) is meaningless.

b) Furthermore, if we are asking for opinions where the opinion is of a categorical form (e.g. strongly agree, agree, do not agree) then the concept of difference is again meaningless. The responses are words not numbers, but you can, if you so wish, solve this problem by allocating a number to each response, with 1 for strongly agree, 2 for agree and 3 for do not agree.

Choosing a Test

Chi-Square Test

Versatile test

Widely used test with data that is categorical (or nominal or

qualitative) in nature

This section will explore the application of chi-square in solving 4 types of problems:

1. Perform a χ² test of association (independence)

2. Perform a χ² test of the difference between two independent proportions

3. Perform a χ² test of the difference between two dependent proportions (McNemar's test for matched pairs)

4. Perform a χ² test of goodness of fit to a theoretical probability distribution

For 2 and 3, you could use a Z/t test if you assume the population is normally distributed.

1. Chi-square test of association: determines whether two (or more) category variables are significantly related (or associated) to each other

The null hypothesis states that the row and column variables are not associated

It can be shown that if the null hypothesis is true then the expected frequencies (E) can be calculated using

$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$

To test the null hypothesis we compare the expected cell frequencies (E) with the observed cell frequencies (O) and calculate the chi-squared test statistic given by

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

For the chi-square test to give meaningful results the expected frequency for each cell is required to be at least 5.

Contd...
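The expected-frequency and chi-squared formulas translate directly into code. The Python sketch below is illustrative only: the 2*2 table of counts is hypothetical (it is not data from the slides), the function name is an assumption, and the p-value line uses a closed form that holds only for 1 degree of freedom.

```python
from math import erfc, sqrt


def chi_square_association(observed):
    """Chi-squared statistic and df for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # E = Row Total * Column Total / Grand Total
            e = row_totals[i] * column_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical 2*2 table, for illustration only
observed = [[30, 20], [20, 30]]
chi2, df = chi_square_association(observed)

# Upper-tail p-value; this closed form is valid only when df = 1
p_value = erfc(sqrt(chi2 / 2))
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```

For tables with more degrees of freedom the p-value would come from Excel or a statistics library instead.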

Example 9:- Suppose a university sampled 485 of its students to determine whether males and females differed in preference for five courses offered. The question we would like to answer is whether or not there is an association between the courses chosen and the person's gender. In this case we have two attributes, gender and course, both of which have been divided into categories: 2 for gender and 5 for course. Determine whether gender and course preference are associated using a chi-square test of association on the contingency table.

Step 1: State null and the alternate hypothesis

H0: Gender and course preference are not associated (or independent)

H1: There is an association between gender and course preference (or dependent)

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - Chi-square distribution

$$\chi^2_{cal} = \sum \frac{(O - E)^2}{E} = 63.36, \qquad df = (r - 1)(c - 1) = (2 - 1)(5 - 1) = 4$$

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, P-value = 5.7E-13

As 5.7E-13 < 0.05, Accept H1

Critical test Statistic Method - If test statistic > critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, critical value = 9.4877

As 63.36 > 9.4877, Accept H1

Step 5: Interpretation

There is a significant relationship, or association, between the category variables
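The p-value and critical value quoted for Example 9 can be cross-checked without tables, because the chi-squared upper-tail probability has a simple closed form when df = 4. The Python sketch below is an illustration under that assumption; it uses the test statistic of 63.36 from the critical value comparison, and the function name is hypothetical.

```python
from math import exp


def chi2_upper_tail_df4(x):
    """Upper-tail probability P(chi2 > x) for exactly 4 degrees of freedom."""
    return exp(-x / 2) * (1 + x / 2)


p_value = chi2_upper_tail_df4(63.36)         # test statistic from Example 9
tail_at_critical = chi2_upper_tail_df4(9.4877)  # should be close to alpha = 0.05
print(f"p-value = {p_value:.2e}, tail area at 9.4877 = {tail_at_critical:.4f}")
```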

Example 10:- A firm surveys whether or not its employees use the train to travel to work. The firm collects the data and has created a 2*2 contingency table to summarise the responses for only the people who work on two days. The question is now whether or not there is a significant difference between the proportions of Monday and Wednesday employees who travel to work by train.

Step 1: State null and the alternate hypothesis

H0: π1 = π2 (proportions equal)

H1: π1 ≠ π2 (proportions different)

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - chi-square distribution

$$\chi^2_{cal} = \sum \frac{(O - E)^2}{E} = 4.4373, \qquad df = (r - 1)(c - 1) = 1$$

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, P-value = 0.035161

As 0.035161 < 0.05, Accept H1

Critical test Statistic Method - If test statistic > critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, critical value = 3.84

As 4.44 > 3.84, Accept H1

Step 5: Interpretation - Conclude that there is a significant difference in the proportions travelling by train on Monday and Wednesday. Note: at a 1% significance level the decision would be reversed, since 0.035 > 0.01

Dependent Proportions (McNemar's test)

Example:- Estimate the effectiveness of a political campaign on the voting patterns of a group of voters. 2 groups of voters are selected at random and their voting intentions (Drop CO2, Tax) for a local election are recorded. Both groups are then subjected to the same campaign and their voting intentions are recorded again. The question that arises is whether or not the campaign was effective in changing the voting intentions of the voters.

In this problem, we shall look at whether or not the proportion voting Drop CO2 has significantly changed.

In general the 2*2 contingency table has cell counts a and b in the first row and c and d in the second row, with N = a + b + c + d. The Drop CO2 Before proportion (π1) and the Drop CO2 After proportion (π2) are given by the equations

$$\pi_1 = \frac{a + b}{N}, \qquad \pi_2 = \frac{a + c}{N}$$

with df = (rows - 1)(columns - 1) = 1

Two tests are available:

To test the null hypothesis we can use the McNemar z-test statistic, which is approximately normally distributed and defined by the equation

$$Z = \frac{b - c}{\sqrt{b + c}}$$

To test the null hypothesis we can also use the McNemar χ²-test statistic defined by the equation

$$\chi^2 = \frac{(b - c)^2}{b + c}$$

Step 1: State null and the alternate hypothesis

H0: π1 = π2 (proportions equal)

H1: π1 ≠ π2 (proportions different)

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - McNemar z-test statistic

$$Z_{cal} = \frac{b - c}{\sqrt{b + c}} = \frac{89 - 45}{\sqrt{89 + 45}} = 3.801$$

Proportions: π1 = 0.60 and π2 = 0.53

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, the two tail p-value = 0.00014. As 0.00014 < 0.05, Accept H1

Critical test Statistic Method - If |test statistic| > critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, Two tail critical value = ±1.96

As 3.801 > 1.96, Accept H1

Step 5: Interpretation - There is a significant difference in the voting intentions for Drop CO2 before and after the campaign.
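Both McNemar statistics depend only on the two discordant counts b and c, so the worked example (b = 89, c = 45) is easy to verify. The Python sketch below is illustrative, with an assumed function name; it also shows that the χ² form is just the square of the z form.

```python
from math import sqrt
from statistics import NormalDist


def mcnemar_test(b, c):
    """McNemar z and chi-squared statistics with the two-tail p-value of z."""
    z = (b - c) / sqrt(b + c)
    chi2 = (b - c) ** 2 / (b + c)  # equals z squared
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, chi2, p_value


# Discordant counts from the worked example: b = 89, c = 45
z, chi2, p_value = mcnemar_test(89, 45)
print(f"z = {z:.3f}, chi2 = {chi2:.3f}, two-tail p = {p_value:.5f}")
```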

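4. Chi-Square Goodness-of-Fit Test

In this section we explore how well a data set can be modelled by a particular probability distribution, using the goodness-of-fit test.

For a chi-square goodness-of-fit test, the hypotheses take the form:

H0: the data follow the specified distribution

H1: the data do not follow the specified distribution

The test statistic compares the observed (O) and expected (E) frequencies, as defined by the equation

$$\chi^2_{cal} = \sum \frac{(O - E)^2}{E}$$

$$df = n - k - 1$$

where n is the number of categories and k is the number of population parameters estimated from the sample.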

A motorway safety officer believes that the number of accidents per week occurring on a stretch of motorway can be modelled using a Poisson distribution.

If X denotes the number of accidents per week, then the sample data can be modelled by fitting a Poisson distribution, for which

$$P(X = r) = \frac{\lambda^r e^{-\lambda}}{r!}$$

Step 1: State null and the alternate hypothesis

H0: No. of accidents follow a Poisson distribution

H1: No. of accidents do not follow a Poisson distribution

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - Chi-square distribution/ Goodness of fit test

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, the p-value = 0.73

As 0.73 > 0.05, Accept H0

Critical test Statistic Method - If test statistic > critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, upper tail critical value = 11.07

As p = 0.73 > 0.05, the calculated χ² statistic lies below 11.07, Accept H0

Step 5: Interpretation - Conclude that there is no significant difference between the observed and expected frequencies. This implies that the data can be modelled by a Poisson distribution.
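The slides do not reproduce the accident counts, so the following Python sketch uses HYPOTHETICAL observed frequencies purely to illustrate the mechanics of a Poisson goodness-of-fit test. The seven classes (0 to 5 and "6 or more") are an assumption chosen to give df = 5, which is consistent with the 11.07 critical value quoted above.

```python
import math

def poisson_pmf(r, lam):
    """P(X = r) for a Poisson distribution with mean lam."""
    return lam ** r * math.exp(-lam) / math.factorial(r)

# HYPOTHETICAL data: number of weeks with 0, 1, 2, 3, 4, 5 and "6 or more" accidents.
observed = [8, 20, 25, 22, 14, 7, 4]
total = sum(observed)

# Estimate lambda from the sample (the open "6 or more" class is treated as 6).
lam = sum(r * o for r, o in enumerate(observed)) / total

# Expected frequencies; the last class takes the remaining tail probability.
probs = [poisson_pmf(r, lam) for r in range(len(observed) - 1)]
probs.append(1.0 - sum(probs))
expected = [total * p for p in probs]

chi2_cal = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1   # n categories - k estimated parameters - 1
critical = 11.07             # 5% critical value for df = 5, taken from the slides

print(f"lambda = {lam:.3f}, chi2 = {chi2_cal:.3f}, df = {df}")
print("reject H0" if chi2_cal > critical else "do not reject H0")
```

In practice, classes with small expected frequencies are usually pooled before the statistic is calculated; that step is omitted here to keep the sketch short.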

Non-Parametric Tests

Many statistical tests require that the data follow a normal distribution

Distribution free tests/ Non-parametric tests - Do not require the data to follow a particular distribution

In this presentation we will explore three non-parametric tests

Sign test

Wilcoxon signed rank test

Mann Whitney U test

| Test | Parametric test | Non-parametric test |
|---|---|---|
| One sample | One sample z-test; One sample t-test | Sign test; Wilcoxon signed-rank test |
| Paired samples | Two paired sample z-test; Two paired sample t-test | Sign test; Wilcoxon signed rank test |
| Independent samples | Two independent sample t-test | Mann Whitney U test (Wilcoxon rank sum test) |

5. Sign Test

The sign test is used to test a set of data values against a hypothesis statement, including:

1. Assessing the validity of a population median value estimated from collected sample data. This replaces the one-sample t-test, which assumes a normal population and that a mean value has meaning.

2. Assessing whether the difference between two population medians is zero, based upon sample data. This replaces the paired t-test, which assumes a normal population and that a mean value has meaning.

3. Assessing the validity of proportions where the proportions are estimated from ordered nominal (or categorical) data, where a numerical scale is inappropriate but where we can rank the data observations. This replaces the sample Z test for proportions, which assumes a normal population.

If we rank the data then, under the null hypothesis, half the ranks would be less than the median (r1) and half the ranks would be greater than the median (r2)

In this situation the null hypothesis can be modelled by a binomial

distribution with the probability of a data value being less than or greater

than the median being equal to p = 0.5, with sample size n

The sign test assumptions are

Randomly selected samples and

Continuous distribution

Sign test measures the number of counts that fall above and below the

median value

Under the null hypothesis, we would expect the number of counts

distribution to be approximately symmetric around the median and the

distribution of values below and above to be distributed at random among

the ranks

The corresponding hypothesis statements for two tail and one tail tests are:

Two tail test

H0: sample median = population median (0.5)

H1: sample median ≠ population median (0.5)

One tail test (upper)

H0: sample median ≤ population median (0.5)

H1: sample median > population median (0.5)

One tail test (lower)

H0: sample median ≥ population median (0.5)

H1: sample median < population median (0.5)

For a binomial distribution the value of the probability P(X = r), mean (μ) and standard deviation (σ) are given by:

$$P(X = r) = \binom{n}{r} p^r q^{n - r}, \qquad \binom{n}{r} = \frac{n!}{r!\,(n - r)!}$$

$$\mu = np, \qquad \sigma = \sqrt{npq} = \sqrt{np(1 - p)}$$

Here the probability of success is p = 0.5 and the number of trials is represented by the number of paired observations (n), so X ~ Bin(n, p)

[...] were chosen to measure the effectiveness of a new training programme on the value of sales. Calculate test statistics:

(i) Binomial probability P(X ≥ 12)

For the training programme to be effective we would expect the hypothesis statement to be

H1: the training programme results in an increase in the average value of sales

Given random selection is made and no information is given about the distribution, we will [...]

Step 1: State null and the alternate hypothesis

H0: The median sales difference is zero

H1: Median sales after training > Median sales before training

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - Sign test

Step 4: Formulate the decision rule

Calculate binomial probabilities, P(X ≥ x)

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

p = P(X ≥ 12) = P(X = 12, 13, 14, 15, 16)

= P(X = 12) + P(X = 13) + P(X = 14) + P(X = 15) + P(X = 16)

From Excel, μ = np = 8 and σ = sqrt(npq) = 2

Using the normal approximation with a continuity correction:

$$Z = \frac{(X - 0.5) - \mu}{\sigma} = \frac{11.5 - 8}{2} = 1.75$$

From Excel, upper one tail p-value = 0.0401

As 0.0401 < 0.05, Reject H0

Critical test Statistic Method - If test statistic > critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, upper one tail critical value = 1.6449

As 1.75 > 1.6449, Reject H0

Step 5: Interpretation - Conclude that median sales after the training programme are significantly higher than before. Note: at a 1% level of significance we would not reject H0, since 0.0401 > 0.01.
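The sign test figures can be reproduced with a short Python sketch. It assumes n = 16 paired observations, which is inferred from μ = np = 8 with p = 0.5 and from the sum running to X = 16, and it compares the exact binomial tail with the normal approximation used in the slides. The function names are illustrative.

```python
import math

def binom_upper_tail(n, x, p=0.5):
    """Exact P(X >= x) for X ~ Bin(n, p)."""
    return sum(math.comb(n, r) * p ** r * (1 - p) ** (n - r)
               for r in range(x, n + 1))

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, x, p = 16, 12, 0.5                # n = 16 is inferred from np = 8
mu = n * p                           # binomial mean
sigma = math.sqrt(n * p * (1 - p))   # binomial standard deviation
z = (x - 0.5 - mu) / sigma           # continuity-corrected z statistic

p_exact = binom_upper_tail(n, x, p)
p_normal = normal_upper_tail(z)

print(f"z = {z:.2f}, exact p = {p_exact:.4f}, normal p = {p_normal:.4f}")
```

The exact binomial probability is slightly smaller than the normal approximation, and both fall below 0.05, so the decision is the same either way.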

6. Wilcoxon Signed Rank Sum Test (Matched Pairs Test)

The t-test is the standard test of whether the population means for two paired samples are equal

If the populations are non-normal, particularly for small samples, then the t-test may not be valid

As for the sign test, the Wilcoxon signed rank sum test is another example of

a non-parametric/ distribution free test, used to test the null hypothesis that

the median of a distribution is equal to some value

It can be used in place of

1) One-sample t-test

2) Paired t-test

3) Ordered categorical data where a numerical scale is inappropriate but

where it is possible to rank the observations

The method considers the differences between n matched pairs as one

sample

If the two population distributions are identical, then we can show that the

sample statistic has a symmetric null distribution

Assumptions

The Wilcoxon signed rank sum test assumptions are:

1) Each matched data pair is randomly selected

2) The matched pair differences should be symmetrically distributed

Although the Wilcoxon test assumes neither normality nor homogeneity

of variance, it does assume that the two samples are from populations with

the same distribution shape

It is also vulnerable to outliers although not to nearly the same extent as

the t-test


Suppose that Slim-Gym is offering a weight reduction program that they advertise

will result in more than a 10 lb weight loss in the first 30 days. Twenty subjects were

selected for a study and their weights before and after the weight loss program

were recorded.

| Test | Hypothesis | Tcal |
|---|---|---|
| Two-tailed Test | H1: Population locations not centred at 0 | Tcal = Minimum of T- and T+ |
| One-tailed Test | H1: Population differences are centred at a value > 0 | Tcal = T+ |
| One-tailed Test | H1: Population differences are centred at a value < 0 | Tcal = T- |

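1. Rank data

2. Sum the negative and positive ranks, T- and T+

T- = Sum of -ve ranks = 35

T+ = Sum of +ve ranks = 265

$$T_+ + T_- = \frac{n'(n' + 1)}{2} = 300$$

3. Find Tcal (here Tcal = T+ = 265)

4. Calculate μT, σT and the Z statistic:

$$\mu_T = \frac{n'(n' + 1)}{4} = 150$$

$$\sigma_T = \sqrt{\frac{n'(n' + 1)(2n' + 1)}{24}} = 35.0$$

$$Z = \frac{T_{cal} - 0.5 - \mu_T}{\sigma_T} = 3.2714$$

5. Make decision: From the sample data we have sufficient statistical evidence that the weight loss is greater than 10 lbs.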

Step 1: State null and the alternate hypothesis

H0: The population median weight loss is at most 10 lbs (X-Y-10 ≤ 0)

H1: The population median weight loss is greater than 10 lbs (X-Y-10 > 0)

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - Wilcoxon signed rank test (Samples consist of ratio data & no information about the form of the distribution)

Step 4: Formulate the decision rule

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, upper one tail p-value = 0.000535

As 0.000535 < 0.05, Reject H0

Critical test Statistic Method - If test statistic > critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, upper one tail critical value = 1.6449

As 3.2714 > 1.6449, Reject H0

Step 5: Interpretation - Conclude that there is significant evidence that the median weight loss is greater than 10 lbs. Note: at a 1% level of significance we would still reject H0, since 0.000535 < 0.01.
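The normal approximation for the Wilcoxon statistic can be checked with the following Python sketch. It assumes n' = 24 non-zero differences, which is inferred from T+ + T- = n'(n' + 1)/2 = 300, and uses Tcal = T+ = 265 for the upper one tail test. The function name is illustrative.

```python
import math

def wilcoxon_normal_approx(t_cal, n):
    """Normal approximation for the signed rank statistic (upper one tail test)."""
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t_cal - 0.5 - mu) / sigma            # continuity correction of 0.5
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))
    return mu, sigma, z, p_upper

# n' = 24 is inferred from the rank total of 300; Tcal = T+ = 265.
mu_t, sigma_t, z, p_upper = wilcoxon_normal_approx(t_cal=265, n=24)

print(f"mu = {mu_t}, sigma = {sigma_t:.1f}, z = {z:.4f}, p = {p_upper:.6f}")
```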

Small number of paired observations (n ≤ 20)

[...] = 35. The decision rule is to reject H0 if Tcal ≤ Tcri [...] H1 [...] calculate the value if you remember that the distribution is symmetric about the median (remember that median = mean for symmetric distributions):

μT - lower Tcri = upper Tcri - μT

Dealing with ties (Tied Observations)

There are two types of tied observations that may arise when using the Wilcoxon signed rank test:

1. Observations in the sample may be exactly equal to 0 in the case of paired differences. Ignore such observations and adjust n accordingly. For the previous example we removed any zero values and used n' instead of n.

2. Two or more observations/differences may be equal. If so, average the ranks across the tied observations and reduce the variance by Equation (8.16) for each group of t tied ranks:

$$\frac{t^3 - t}{48}$$
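Both tie rules can be sketched in Python as follows. The differences used are HYPOTHETICAL and the function names are illustrative: zero differences are dropped, tied absolute differences share the average rank, and the variance is reduced by (t³ - t)/48 for each tied group.

```python
def signed_rank_sums(diffs):
    """Return (T_minus, T_plus, n_prime, tie_sizes) for paired differences."""
    nonzero = [d for d in diffs if d != 0]    # rule 1: ignore zero differences
    ordered = sorted(nonzero, key=abs)
    ranks = [0.0] * len(ordered)
    tie_sizes = []
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j + 2) / 2        # rule 2: mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        if j > i:
            tie_sizes.append(j - i + 1)
        i = j + 1
    t_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return t_minus, t_plus, len(ordered), tie_sizes

def tie_corrected_variance(n, tie_sizes):
    """Variance of T, reduced by (t^3 - t)/48 for each group of t tied ranks."""
    return n * (n + 1) * (2 * n + 1) / 24 - sum((t ** 3 - t) / 48 for t in tie_sizes)

diffs = [3, -1, 0, 2, -2, 5]                  # HYPOTHETICAL paired differences
t_minus, t_plus, n_prime, ties = signed_rank_sums(diffs)
variance = tie_corrected_variance(n_prime, ties)

print(t_minus, t_plus, n_prime, ties, variance)
```

A quick consistency check is that T- + T+ always equals n'(n' + 1)/2, whether or not ties are present.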

7. Mann-Whitney U Test (Independent Samples)

The Mann-Whitney U test is a non-parametric equivalent of an unpaired t-test

It is used to test the null hypothesis that two samples come from the same

population (i.e. have the same median) or, alternatively, whether

observations in one sample tend to be larger than observations in the other

Although it is a non-parametric test it does assume that the two distributions

are similar in shape

The basic premise of the test is that once all of the values in the two samples

are put into a single ordered list, if they come from the same parent

population, then the rank at which values from sample 1 and sample 2

appear will be by chance

If the two samples come from different populations, then the rank at which

the sample values will appear will not be random and there will be a

tendency for values from one of the samples to have lower ranks than values

from the other sample

We are thus testing for different locations of the two samples

The Mann-Whitney assumptions are as follows:

(1) independent random samples are obtained from each population

(2) the two populations are continuous and have the same shape

[...] an innovative programme to improve the performance of students on the courses it offers. To assess whether the new programme improves student performance, the firm has collected two random samples from the population of students sitting an accountancy examination, where sample 1 students have studied via the traditional method and sample 2 students via the new programme.

The firm has analysed previous data, and the results provide evidence that the distribution is not normal but is skewed to the left. This raises concerns about the suitability of a two sample independent t-test, so the firm instead decides to use a suitable distribution free test. In this case the appropriate test is the Mann-Whitney U test.

Step 1: State null and the alternate hypothesis

H0: No difference in examination performance between the two groups

H1: New programme has improved performance (M1 < M2)

Step 2: Select the level of significance - α = 0.05

Step 3: Select the test statistic - Mann-Whitney U test (Lower one tail test)

Step 4: Formulate the decision rule

Since the total number of pair wise comparisons is n1n2 = 7*8 = 56 > 20, we can approximate the Mann-Whitney distribution with a normal distribution, giving σU = 8.6410 and Z = -2.1410

P-value Method - If p < α, then reject null hypothesis H0 & accept the alternative hypothesis H1

From Excel, lower one tail p-value = 0.0161

As 0.0161 < 0.05, Reject H0

Critical test Statistic Method - For a lower one tail test, if test statistic < critical test statistic then we would reject the null hypothesis H0 & accept the alternative hypothesis H1

From Excel, lower one tail critical value = -1.6449

As -2.1410 < -1.6449, Reject H0

Step 5: Interpretation - Based upon the data, there is sufficient evidence to indicate at a 5% significance level that the performance has improved. Note that if we modify the level of significance to 1% then the decision would be a borderline decision.

$$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - T_1, \qquad U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - T_2$$

$$U_{cal} = \text{Minimum}(U_1, U_2), \qquad \mu_U = \frac{n_1 n_2}{2}$$
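The U statistics and the normal approximation can be sketched in Python as follows. The two small samples are HYPOTHETICAL, since the examination marks are not reproduced in the slides. The final call uses Ucal = 9, n1 = 7 and n2 = 8 from the example, and assumes the untied standard deviation σU = sqrt(n1n2(n1 + n2 + 1)/12) with a 0.5 continuity correction, which reproduces the quoted values.

```python
import math

def rank_sums(sample1, sample2):
    """Rank the pooled data (average ranks for ties) and return (T1, T2)."""
    pooled = sorted([(v, 1) for v in sample1] + [(v, 2) for v in sample2])
    totals = {1: 0.0, 2: 0.0}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j + 2) / 2        # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            totals[pooled[k][1]] += average_rank
        i = j + 1
    return totals[1], totals[2]

def mann_whitney_u(t1, t2, n1, n2):
    """Return (U1, U2, Ucal) from the rank sums T1 and T2."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - t1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - t2
    return u1, u2, min(u1, u2)

def lower_tail_z(u_cal, n1, n2):
    """Normal approximation (no ties) for a lower one tail test."""
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_cal + 0.5 - mu) / sigma            # continuity correction of 0.5
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))
    return mu, sigma, z, p_lower

# HYPOTHETICAL samples, used only to show how the rank sums feed U1 and U2.
t1, t2 = rank_sums([1, 3, 5], [2, 4, 6, 8])
u1, u2, u_demo = mann_whitney_u(t1, t2, 3, 4)

# Values from the worked example: Ucal = 9, n1 = 7, n2 = 8.
mu_u, sigma_u, z, p_lower = lower_tail_z(9, 7, 8)
print(f"mu = {mu_u}, sigma = {sigma_u:.4f}, z = {z:.4f}, p = {p_lower:.4f}")
```

U1 and U2 always sum to n1n2, which gives a quick check on the ranking step.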

Paired Comparisons

Small number of pair wise observations (n ≤ 20)

For a small number of paired comparisons (n = n1n2 ≤ 20) we use tables to calculate an exact value of the critical test value (Ucri) or an exact p-value based upon P(U ≤ 9). For a 5% two tail test with n1 = 7, n2 = 8: (i) the lower critical U value, Ucri = 11. Since Ucal < Ucri (9 < 11), we reject H0 and accept H1, and (ii) the lower p-value = 0.014.

Given that we have a two tailed test, the two tail p-value = 2*0.014 = 0.028 < 0.05, so we reject H0 and accept H1. The theory suggests that if the null hypothesis is true then the U test statistic will be centred at μU = 28 with critical regions identified in Figure 8.11.
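The exact lower tail probability that is normally read from tables can also be obtained by brute-force enumeration. The Python sketch below assumes no ties, so that under H0 every choice of n1 ranks out of n1 + n2 is equally likely; it is practical only for small samples. The function name is illustrative.

```python
from itertools import combinations
from math import comb

def exact_lower_p(u_obs, n1, n2):
    """Exact P(U <= u_obs) under H0, assuming no tied observations."""
    n = n1 + n2
    offset = n1 * (n1 + 1) // 2               # smallest possible rank sum for sample 1
    hits = sum(1 for ranks in combinations(range(1, n + 1), n1)
               if sum(ranks) - offset <= u_obs)
    return hits / comb(n, n1)

p_lower = exact_lower_p(9, 7, 8)
print(f"exact lower p-value = {p_lower:.4f}, two tail = {2 * p_lower:.4f}")
```

For n1 = 7 and n2 = 8 there are only 6435 possible rank assignments, so the enumeration is immediate, and the result agrees with the tabulated lower p-value of 0.014 to three decimal places.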

Tied Observations

Dealing with ties

If we find data with the same number value then we can deal with this problem by allocating the average tie value to each shared data value. In this situation we would then have to use the normal approximation with the standard deviation σU adjustment given by Equation (8.23):

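$$\sigma_U = \sqrt{\frac{n_1 n_2}{(n_1 + n_2)(n_1 + n_2 - 1)} \left[ \frac{(n_1 + n_2)^3 - (n_1 + n_2)}{12} - \sum_{j=1}^{g} \frac{t_j^3 - t_j}{12} \right]}$$

where g is the number of groups of tied ranks and t_j is the number of tied ranks in group j.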
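A minimal Python sketch of the tie-adjusted standard deviation follows. The tie group sizes in the second call are HYPOTHETICAL; with no ties the function reduces to the unadjusted value sqrt(n1n2(n1 + n2 + 1)/12). The function name is illustrative.

```python
import math

def sigma_u_ties(n1, n2, tie_sizes=()):
    """Standard deviation of U, adjusted for groups of tied ranks."""
    n = n1 + n2
    tie_term = sum((t ** 3 - t) / 12 for t in tie_sizes)
    return math.sqrt(n1 * n2 / (n * (n - 1)) * ((n ** 3 - n) / 12 - tie_term))

no_ties = sigma_u_ties(7, 8)                        # sample sizes from the example
with_ties = sigma_u_ties(7, 8, tie_sizes=[2, 3])    # HYPOTHETICAL tie groups

print(f"no ties: {no_ties:.4f}, with ties: {with_ties:.4f}")
```

Ties always reduce the standard deviation, so ignoring them makes the normal approximation slightly conservative.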

1. In the example and exercises we have not modified the solution for tied

ranks.

2. The Mann-Whitney U test is statistically equivalent to the Wilcoxon rank sum

test.


Conclusion

In this presentation we explored the concepts of chi-squared and non-parametric hypothesis testing.
