Uploaded by Jasper Angeles

chi-square

Attribution Non-Commercial (BY-NC)

all been designed to allow comparison of the means of two or more samples to determine if they are significantly different from each other. Such comparisons can only be conducted when the researcher has interval level data. While interval level data are preferred by most researchers because they provide a more precise measurement of the phenomena under consideration, such data are often impossible to obtain. Researchers must then turn to another set of statistical tools that allow the testing of hypotheses using nominal and ordinal data. These tools are referred to in the field of statistics as non-parametric tests.

A parameter is a quantity which is constant for a given population. Parameters can also be defined as numerical descriptive measures of a population. Two major kinds of parameters already explained in earlier chapters are measures of central tendency and variability. For example, the mean is a parameter which describes an entire distribution of values. Parameters such as the mean cannot be meaningfully calculated for nominal and ordinal data. It follows then that statistics not dependent on calculating measures of central tendency or variability are non-parametric. However, this is not to say that parameters are not studied when using non-parametric statistics. One simply does not know, or make assumptions about, any specific values of a parameter. Statisticians generally refer to t tests and ANOVA tests as parametric statistics.

This chapter introduces and explains the use of Chi-Square, used to test hypotheses involving nominal data, while the next is devoted to a statistic called Mann-Whitney U, which is employed for hypothesis testing when working with ordinal measures. It should be pointed out by way of a cautionary note that statistics designed to test hypotheses for nominal and ordinal data are no better than the data which they are designed to analyze. Interval data are more precise and accurate. The lower level of precision possible using nominal or ordinal measures makes non-parametric statistics somewhat less accurate for hypothesis testing. This limitation is partially addressed through the use of more stringent demands for statistical significance when non-parametric statistics are used.

CHI-SQUARE

The most frequently used non-parametric statistic for testing hypotheses with nominal data is Chi-Square. The nature of nominal data, as explained in chapter one, involves assigning data to mutually exclusive categories, labeling, or naming the data. Nominal data are most generally analyzed by frequency of occurrence. The non-parametric statistic Chi-Square is a comparison of relative frequencies among two or more groups. The null hypothesis for Chi-Square is that there is no statistically significant difference in the relative frequency of one outcome over another.

For example, a possible null hypothesis might be that there is no statistically significant difference between the relative frequency of Hispanics failing their first math course in college and the relative frequency of Whites failing their first math course. In other words, there is no statistical difference between the two groups as measured by frequency of failure. Nominal data for testing this hypothesis can be organized in a two-by-two data matrix containing two rows and two columns, for pass-fail categories and by group. This approach to organization is shown for a sample of 100 Hispanics and a sample of 100 Whites in Figure 11:1.

FIGURE 11:1

          Whites   Hispanics   Total
Pass        50        50        100
Fail        50        50        100
Total      100       100        200

In this example, the null hypothesis would be accepted because one can simply observe that there is no difference between Hispanics and Whites. The pass and fail frequencies are the same for both groups. No statistics are necessary for nominal data equally distributed between groups, but not all frequencies are this simple. Generally, decisions about accepting and rejecting null hypotheses require far more complex analyses because differences between samples do occur. Determining whether these differences are sufficient to suggest a statistically significant difference in the overall populations is the reason for conducting statistical tests.

Calculation of the Chi-Square statistic is basically a comparison between observed and expected frequencies. Observed frequencies are the actual nominal data for each characteristic under consideration by the researcher. In the above example, one observes that fifty Whites and fifty Hispanics failed, and fifty Whites and fifty Hispanics passed. The expected frequencies are the nominal data results one would expect to find if the null hypothesis is to be accepted. In the above example, one would expect the proportion of pass and fail frequencies for Whites and Hispanics to be the same.

The theory behind the Chi-Square statistic is that if the difference between the observed and expected frequencies is large, even after allowing for sampling error, the null hypothesis is rejected. One would conclude that a statistically significant difference between two or more groups does exist. By implication, this also means that not all differences between observed and expected frequencies are significant; some are the result of sampling error or are too small to be significant. The formula for calculating the Chi-Square statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:
O = the observed frequencies for each position in the matrix
E = the expected frequencies for each position in the matrix
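The comparison of observed and expected frequencies can be illustrated in a few lines of Python. This is a minimal sketch, assuming the statistic is the sum over all cells of (O - E)^2 / E; the function name `chi_square` and the nested-list layout of the data matrix are choices made here for illustration and do not come from the chapter.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over every cell of the data matrix.

    Both arguments are lists of rows with the same shape
    (an illustrative layout, not one prescribed by the chapter).
    """
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


# Figure 11:1: every observed frequency equals its expected frequency,
# so each cell contributes nothing to the statistic.
observed = [[50, 50], [50, 50]]
expected = [[50, 50], [50, 50]]
print(chi_square(observed, expected))  # 0.0
```

A value of zero matches the informal reading of Figure 11:1: with no difference between observed and expected frequencies, there is nothing to test.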

Calculation of the Chi-Square statistic is a simple process involving the use of a solution matrix. For example, suppose a researcher wanted to test the difference between frequencies of high or low incomes for men and women in the same profession. A research question could be stated as follows: Do male lawyers have higher incomes than female lawyers? The null hypothesis might be stated as follows: There is no statistically significant difference between the frequencies of high and low incomes for males and the frequencies of high and low incomes for females.

Organizing the solution matrix for the Chi-Square statistic is straightforward. First, the data are organized by row and column in the form of a data matrix. The actual or observed values for each place in the data matrix are recorded. Then the values in each row and column are totaled and the total number of cases under consideration (n) is determined. The solution matrix will vary in size depending on the number of rows and columns needed to display the observed frequencies. In Figure 11:2, the observed frequencies of high and low incomes (nominal) for men and women (nominal) are displayed in a 2 x 2 data matrix.

FIGURE 11:2

               Men        Women      Total
High Income    15         25          40
              (19.66)    (20.34)
Low Income     14          5          19
              (9.34)     (9.66)
Total          29         30          59

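Once the data matrix has been constructed, the expected frequencies for each cell in the matrix can be determined using the formula:

$$E = \frac{(\text{row total})(\text{column total})}{n}$$

For example, for the row 1, column 1 cell of the matrix, which represents high income men,

$$E = \frac{(40)(29)}{59} = 19.66$$

Row 1, column 2 is calculated:

$$E = \frac{(40)(30)}{59} = 20.34$$

Expected frequencies are similarly obtained for all of the cells of the data matrix and included in parentheses within the data matrix immediately below the observed values. When the expected frequencies have been calculated, the remaining Chi-Square calculations are completed in a solution matrix:

Row   Column   Observed (O)   Expected (E)   O - E    (O - E)^2   (O - E)^2 / E
1     1        15             19.66          -4.66    21.72       1.10
1     2        25             20.34           4.66    21.72       1.07
2     1        14              9.34           4.66    21.72       2.33
2     2         5              9.66          -4.66    21.72       2.25
                                                      Total       6.75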
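The expected frequencies and the resulting statistic can be checked with a short Python sketch. It assumes each expected frequency is (row total x column total) / n and that the observed cells are indexed (row, column) as in the solution matrix; the function names are hypothetical and chosen only for this illustration.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_from_observed(observed):
    """Chi-Square computed directly from a table of observed counts."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Observed frequencies from the income example, cell (row, column)
# in the same order as the solution matrix: 15, 25, 14, 5.
observed = [[15, 25], [14, 5]]

for row in expected_frequencies(observed):
    print([round(e, 2) for e in row])
# [19.66, 20.34]
# [9.34, 9.66]

print(round(chi_square_from_observed(observed), 2))  # 6.75
```

Because the totals are computed from whatever rows and columns are supplied, the same sketch applies to data matrices larger than 2 x 2.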

The value of the Chi-Square statistic is 6.75. The next step in the process of testing the hypothesis requires that the degrees of freedom be determined. The simple formula for finding the degrees of freedom for χ² is:

d.f. = (Total Rows - 1)(Total Columns - 1)

In the context of the present example,

d.f. = (2 - 1)(2 - 1) = (1)(1) = 1

By consulting the table in Appendix H, the critical values for χ² at .05 and .01 are 3.84 and 6.63 for 1 degree of freedom. The researcher compares the obtained value for χ² with the critical value to determine if the observed difference in frequencies is statistically significant. The null hypothesis is rejected at both the .05 and .01 levels (the 95% and 99% confidence levels) in this case because the obtained value is higher than either of the critical values from Appendix H. Therefore, the researcher must conclude that there is a statistically significant difference between the relative frequencies of high and low incomes for men and women. Even allowing for the presence of sampling error, the value of χ² is large enough to suggest that a real difference exists between the populations represented by these samples. In this example, the research conclusion is that female lawyers have higher incomes than male lawyers, since the observed frequencies place proportionally more women than men in the high income category.

A very useful rule for accepting or rejecting the null hypothesis for χ² is as follows:

Accept the null hypothesis if the obtained χ² is less than the critical value in the χ² table.
Reject the null hypothesis if the obtained χ² is equal to or greater than the critical value in the χ² table.¹

Under certain circumstances when working with a 2 x 2 data matrix, the formula used to calculate the Chi-Square statistic is adjusted slightly. This process is utilized when any of the expected frequencies within the data matrix are lower than 10. The alternative Chi-Square formula is known as Yates' Correction. When expected frequencies are this low, researchers have determined that it is appropriate to make the standard for rejecting the null hypothesis more stringent by subtracting .5 from the absolute value of the difference between each observed and expected frequency before the differences are squared. The formula for Chi-Square using Yates' Correction, with O and E denoting the observed and expected frequencies, is as follows:

$$\chi^2 = \sum \frac{(|O - E| - .5)^2}{E}$$

Applying this correction requires an additional column in the solution matrix, and the correction will also reduce the size of Chi-Square. The reduction is an effort to be more conservative and reduce the probability of making the alpha error. The comparison of frequencies of men and women in high and low income categories earlier in the chapter provides an example of a context in which Yates' Correction is to be applied. Compare the solution matrix using Yates' Correction presented in Figure 11:3 below with the one produced earlier. Notice the difference in the value of Chi-Square and the difference in statistical conclusions required when Yates' Correction is employed.

¹ The critical values are critical because they are the basis for accepting or rejecting the null hypothesis. Since Chi-Square is a statistic based on nominal data, the obtained Chi-Square must equal or exceed these critical values in the table for a significant difference in the frequencies.

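FIGURE 11:3

               Men        Women      Total
High Income    15         25          40
              (19.66)    (20.34)
Low Income     14          5          19
              (9.34)     (9.66)
Total          29         30          59

Row   Column   Observed (O)   Expected (E)   |O - E|   |O - E| - .5   (|O - E| - .5)^2   (|O - E| - .5)^2 / E
1     1        15             19.66          4.66      4.16           17.31              0.88
1     2        25             20.34          4.66      4.16           17.31              0.85
2     1        14              9.34          4.66      4.16           17.31              1.85
2     2         5              9.66          4.66      4.16           17.31              1.79
                                                                      Total              5.37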
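The corrected calculation can be reproduced with a short Python sketch. It assumes the correction subtracts .5 from the absolute difference in every cell before squaring, as described in the text; the function name `yates_chi_square` is hypothetical. Because the sketch carries unrounded expected frequencies through the calculation, its result (about 5.38) can differ in the last digit from a hand calculation that rounds each intermediate value to two decimals.

```python
def yates_chi_square(observed):
    """Chi-Square for a 2 x 2 table with .5 subtracted from each |O - E|."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            total += (abs(o - e) - 0.5) ** 2 / e   # corrected contribution
    return total


# Observed frequencies from the income example, in solution-matrix order.
observed = [[15, 25], [14, 5]]
print(round(yates_chi_square(observed), 2))  # 5.38
```

The corrected value is smaller than the uncorrected one for the same data, which is the more conservative behavior the correction is meant to produce.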

The obtained value for Chi-Square is 5.37, which is still significant at the .05 level but is no longer significant at the .01 level.

In summation, the Chi-Square statistic is used to test hypotheses by comparing observed and expected frequencies of a characteristic for two or more groups. Chi-Square is not limited to the comparison of two samples. One may have a 5 x 5, 10 x 10, 7 x 10, or any size data matrix for many independent samples. Unlike the t test, Chi-Square is not used for dependent samples. In addition, Chi-Square is used only for nominal data, and a researcher should make use of Yates' Correction when it applies.
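The degrees of freedom and the accept/reject rule used in this chapter can be summarized in a brief Python sketch. The critical values are the two quoted in the text for 1 degree of freedom; the function names and the dictionary are hypothetical conveniences, and a full table such as Appendix H would be needed for other degrees of freedom.

```python
# Critical values for 1 degree of freedom as quoted in the chapter.
CRITICAL_VALUES_DF1 = {0.05: 3.84, 0.01: 6.63}


def degrees_of_freedom(total_rows, total_columns):
    """d.f. = (Total Rows - 1)(Total Columns - 1)."""
    return (total_rows - 1) * (total_columns - 1)


def reject_null(obtained, critical):
    """Reject when the obtained value equals or exceeds the critical value."""
    return obtained >= critical


df = degrees_of_freedom(2, 2)
print("d.f. =", df)  # d.f. = 1

# Obtained values reported in the chapter for the income example.
for label, obtained in [("uncorrected", 6.75), ("Yates", 5.37)]:
    for alpha, critical in CRITICAL_VALUES_DF1.items():
        decision = "reject" if reject_null(obtained, critical) else "accept"
        print(f"{label}: {decision} null at {alpha}")
```

Run on the two obtained values, the rule rejects the null hypothesis at both levels for the uncorrected statistic, and only at the .05 level once Yates' Correction is applied.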

EXERCISES - CHAPTER 10

(1) A researcher wants to determine whether students who had taken a drivers education course sponsored by the school passed their state drivers examination with a higher relative frequency than those who did not take the class. Using the data provided in the 2 x 2 matrix below and Yates' Correction:

A. Write a null hypothesis
B. Calculate the value for Chi-Square
C. Draw statistical and research conclusions

                            Test Result
Taken Drivers Education     Pass    Fail    Total
Yes                          14       5      19
No                            6      10      16
Total                        20      15      35

(2) In a poll of New York residents, the following results were recorded with reference to political ideology and party affiliations. For 65 Republicans: 20 conservative, 35 liberal, and 10 neither. For 120 Democrats: 40 conservative, 70 liberal, and 10 neither. Test a null hypothesis for these data and draw statistical conclusions.
