13 Hypothesis Testing for Categorical Data (Chi-Square Test)

LEARNING OBJECTIVES
Upon completion of this chapter, you will be able to:
■ Understand the concept of chi-square statistic and chi-square distribution
■ Understand the concept of chi-square goodness-of-fit test
■ Understand the concept of chi-square test of independence: two-way contingency analysis
■ Understand the concept of chi-square test for population variance and chi-square test of homogeneity

STATISTICS IN ACTION: STATE BANK OF INDIA (SBI)

The State Bank of India (SBI) is the country's oldest and leading bank in terms of balance sheet size, number of branches, market capitalization, and profits. This two-hundred-year-old public sector behemoth is today stirring out of its public sector legacy and moving with an agility to give the private and foreign banks a run for their money. The bank has ventured into many new businesses such as pension funds, mobile banking, point-of-sale merchant acquisition, advisory services, and structured products. All these initiatives have a huge potential for growth.

Suppose SBI wants to find out whether its new services such as mobile banking and internet banking will be used only by its younger customers or by customers across all age groups. Let us assume that the management has a perception that personal banking would be more popular with middle-aged and older customers. Suppose the bank uses the services of a marketing research firm, which conducts a survey among customers from different age groups to find out an answer. The research firm has randomly selected customers across the age groups … to 27, 28 to 35, 36 to 44, 45 to 57, and 58 to 70. The observations made by the marketing research firm about the type of banking preferred by different age groups are given in Table 13.1.

TABLE 13.1 Preferences of type of banking across different age groups

Age group       Mobile banking   Internet banking   Personal banking   Row total
… to 27         …                …                  …                  …
28 to 35        155              180                197                …
36 to 44        167              210                150                …
45 to 57        146              156                142                …
58 to 70        133              …                  …                  …
Column total    726              877                …                  …
13.1 INTRODUCTION

In the previous chapters, we have discussed that under various circumstances z, t, and F tests are used to test hypotheses about population parameters. In this chapter, we will discuss some tests related to categorical data. Categorical data is defined as the counting of frequencies from one or more variables. Let us take the example of a seminar organized by a company for its officers. The company has a total of 49… officers and has selected a random sample of 650 officers across four departments to assess the representativeness across departments in the seminar. Out of the 650 randomly selected officers, 150 officers are from the production department, 200 officers are from the marketing department, 160 from the finance department, and the remaining 140 from the human resources department. A research variable such as "representatives from the departments" does not require any population parameter to be used. Here, the research question concerns the frequency count from each department, which can be analysed using the chi-square technique.

Some researchers place the chi-square technique in the category of non-parametric tests for the testing of hypotheses. The tests described in previous chapters test hypotheses such as …

13.2 DEFINING $\chi^2$-TEST STATISTIC

The $\chi^2$ test was developed by Karl Pearson in 1900. The symbol $\chi$ stands for the Greek letter "chi". We have discussed that the t and F distributions are functions of their degrees of freedom. Likewise, the $\chi^2$ distribution is a continuous probability distribution that is also a function of its degrees of freedom (Figure 13.1). The distribution is skewed to the right. Because $\chi^2$ values are squared quantities, the $\chi^2$ distribution can never be negative.

Business Research Methods

FIGURE 13.1 $\chi^2$ distribution with 1, 5, and 10 degrees of freedom
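The shapes summarized in Figure 13.1 can be checked numerically. The sketch below is not part of the chapter: it assumes the standard density of the $\chi^2$ distribution, $f(x; \nu) = x^{\nu/2 - 1} e^{-x/2} / (2^{\nu/2}\,\Gamma(\nu/2))$ for $x > 0$, and the function name `chi2_pdf` is purely illustrative.

```python
import math


def chi2_pdf(x, df):
    """Density of the chi-square distribution with `df` degrees of freedom.

    Illustrative helper (not from the chapter). The density is taken as 0
    for x <= 0, which is a simplification at exactly x = 0.
    """
    if x <= 0:
        return 0.0  # the chi-square variable can never be negative
    half = df / 2.0
    return x ** (half - 1.0) * math.exp(-x / 2.0) / (2.0 ** half * math.gamma(half))


# Evaluate the density at a few points for the three curves in Figure 13.1.
for df in (1, 5, 10):
    values = [round(chi2_pdf(x, df), 4) for x in (1, 3, 5, 10, 15)]
    print(f"df = {df:2d}: {values}")
```

For 5 degrees of freedom the density rises to a peak near $x = 3$ and then tails off slowly to the right, which is the right-skewed shape described above.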
The $\chi^2$ test statistic can be defined as below:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

with $df = k - 1 - c$, where $f_o$ is the observed frequency, $f_e$ the expected or theoretical frequency, $k$ the number of categories, and $c$ the number of parameters being estimated from the sample data.

For a given level of significance, the calculated value of $\chi^2$ is compared with the critical value. Decision rules are as below:

If $\chi^2_{\text{cal}} > \chi^2_{\text{critical}}$, reject the null hypothesis; otherwise, do not reject the null hypothesis.

This is shown in Figure 13.2.

FIGURE 13.2 Acceptance or rejection region in a $\chi^2$ test (acceptance region: accept $H_0$; rejection region: reject $H_0$; boundary at the critical value $\chi^2_{\alpha,\nu}$)

Conditions for Applying the $\chi^2$ Test

The following conditions need to be satisfied before applying $\chi^2$ as a test statistic:

■ An expected frequency of less than 5 in a cell is too small for the $\chi^2$ test. In such cases, we need to "pool" the frequencies which are less than 5 with the preceding or succeeding frequency, so that the resulting frequency is 5 or more.
■ The sample should consist of at least 30 observations and should be drawn randomly from the population. In addition, all the individual observations in a sample should be independent of each other.
■ Data should not be presented in percentage or ratio form; rather, they should be expressed in original units.

13.3 $\chi^2$ GOODNESS-OF-FIT TEST

The $\chi^2$ test is very popular as a goodness-of-fit test. It enables us to ascertain whether known probability distributions, such as the binomial and normal distributions, match an actual sample distribution. In other words, the $\chi^2$ test provides a platform that can be used to ascertain whether theoretical probability distributions coincide with empirical sample distributions. The $\chi^2$ test compares the theoretical (expected) frequencies with the observed (actual) frequencies to determine the difference between them. For applying the $\chi^2$ test, first a theoretical distribution is hypothesized for a given population.
The test is then applied to check whether the sample distribution comes from the population with the hypothesized theoretical probability distribution. The seven steps of hypothesis testing can also be performed using the $\chi^2$ goodness-of-fit test.

Example 13.1

A company is concerned about the increasing violent altercations between its employees. The number of violent incidents recorded by the management during six randomly selected months is given in Table 13.2. Use $\alpha = 0.05$ to determine whether the data fit a uniform distribution.

TABLE 13.2 Record of violent incidents in six randomly selected months

Month                          Jan   Feb   Mar   Apr   May   Jun
Number of violent incidents    55    65    68    72    78    82

Step 1: Set null and alternative hypotheses
The null and alternative hypotheses can be stated as below:
$H_0$: The numbers of violent altercations are uniformly distributed over the months.
$H_1$: The numbers of violent altercations are not uniformly distributed over the months.

Step 2: Determine the appropriate statistical test
The appropriate test statistic is

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

with $df = k - 1 - c$.

Step 3: Set the level of significance
Alpha has been specified as 0.05.

Step 4: Set the decision rule
For a given level of significance of 0.05, the rules for acceptance or rejection of the null hypothesis are as below:
If $\chi^2_{\text{cal}} > \chi^2_{\text{critical}}$, reject the null hypothesis; otherwise, do not reject the null hypothesis.
The critical $\chi^2$ value is $\chi^2_{0.05,\,5} = 11.07$, where degrees of freedom $= k - 1 - c = 6 - 1 - 0 = 5$.

Step 5: Collect the sample data
The sample data are given in Table 13.2.

Step 6: Analyse the data
Expected frequencies can be computed by dividing the total of the observed frequencies by the number of months. In this case, expected frequency $= 420/6 = 70$. Table 13.3 exhibits the expected frequencies and the chi-square statistic for the data relating to violent altercations.

TABLE 13.3 Computation of expected frequencies and chi-square statistic for Example 13.1

Month    f_o    f_e    (f_o - f_e)^2 / f_e
Jan      55     70     3.2143
Feb      65     70     0.3571
Mar      68     70     0.0571
Apr      72     70     0.0571
May      78     70     0.9143
Jun      82     70     2.0571
Total    420    420    6.6571

So, $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 6.657$.

Step 7: Arrive at a statistical conclusion and business implication
At the 95% confidence level, the critical value obtained from the table is $\chi^2_{0.05,\,5} = 11.07$. The $\chi^2$ value is calculated as 6.657, which is less than the tabular value and falls in the acceptance region.
Hence, the null hypothesis is not rejected: the data are consistent with violent altercations being uniformly distributed over the months. The reasons for the violence must be explored and corrective measures must be taken as soon as possible.

13.3.1 Using MS Excel for Hypothesis Testing with $\chi^2$ Statistic for Goodness-of-Fit Test

The chi-square value can be calculated with the help of MS Excel in two parts. The first part is to calculate the p value. Start with Insert Function. From "Or select a category", select Statistical, and from "Select a function", select CHITEST (Figure 13.3) and click OK. The Function Arguments dialog box will appear on the screen. Place the location of the observed values in the Actual_range box and the location of the expected values in the Expected_range box (Figure 13.4) and click OK. MS Excel will calculate the p value.

The value of the $\chi^2$-test statistic can be obtained with the help of this p value. For doing this, go back to the Insert Function dialog box. From "Or select a category", select Statistical, and from "Select a function", select CHIINV (Figure 13.5) and click OK. The Function Arguments dialog box will reappear on your screen. Place the calculated p value in the Probability box, place the degrees of freedom in the Deg_freedom box, and click OK (Figure 13.6). The $\chi^2$ value will then appear.

FIGURE 13.3 MS Excel Insert Function dialog box (category Statistical, function CHITEST)

FIGURE 13.4 MS Excel Function Arguments dialog box (CHITEST with Actual_range = {55; 65; 68; 72; 78; 82} and Expected_range = {70; 70; 70; 70; 70; 70}; result = 0.247412885)
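The two Excel steps above can be cross-checked outside the spreadsheet. The following is a minimal sketch, not the book's procedure: it computes the $\chi^2$ statistic for Example 13.1 directly from the observed and expected frequencies (rather than recovering it from the p value, as CHIINV does) and obtains the upper-tail p value from the standard closed-form series for integer degrees of freedom. The function names `chi_square_statistic` and `chi_square_p_value` are hypothetical, and the critical value 11.07 is simply the tabulated value quoted in the example.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df.

    Illustrative helper using the closed-form series for the chi-square
    tail (an assumption of this sketch, not a formula from the chapter).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite sum of odd powers of sqrt(x)
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return tail + 2.0 * density * total


# Data of Example 13.1 (Table 13.2); uniform expected frequency of 70 per month.
observed = [55, 65, 68, 72, 78, 82]
expected = [sum(observed) / len(observed)] * len(observed)
df = len(observed) - 1 - 0          # k - 1 - c, with no estimated parameters

statistic = chi_square_statistic(observed, expected)
p_value = chi_square_p_value(statistic, df)
critical_value = 11.07              # tabulated chi-square value for alpha = 0.05, df = 5

print(f"chi-square = {statistic:.4f}, p value = {p_value:.4f}")
print("reject H0" if statistic > critical_value else "do not reject H0")
```

Under these assumptions the statistic comes to about 6.657 and the p value to about 0.2474, in line with the Excel results for this example, and the decision is not to reject the null hypothesis.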
FIGURE 13.5 MS Excel Insert Function dialog box (function CHIINV)

FIGURE 13.6 MS Excel Function Arguments dialog box (CHIINV with Probability = 0.247413 and Deg_freedom = 5; result = 6.657141453)

13.3.2 Hypothesis Testing for a Population Proportion Using $\chi^2$ Goodness-of-Fit Test as an Alternative Technique to the z-Test

In Chapter 10, we discussed the z-test for a population proportion for $np \ge 5$ and $nq \ge 5$. The formula can be presented as below:

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$$

where $\hat{p}$ is the sample proportion, $n$ the sample size, $p$ the population proportion, and $q = 1 - p$.

The $\chi^2$ goodness-of-fit test can be used to test the hypothesis about the population proportion as a special case when the number of classifications is two. Let us reconsider Example 10.5 discussed in Chapter 10 for understanding the concept. The null and alternative hypotheses were stated as below:

$H_0$: $p = 0.10$
$H_1$: $p \ne 0.10$

This can be treated as a two-category expected distribution in which there are 0.10 defective items and 0.90 non-defective items. The total number of items (in this case, frequencies) is 100, so the expected frequency for defective items is $0.10 \times 100 = 10$ and the expected frequency for non-defective items is $0.90 \times 100 = 90$. The observed frequencies for defective and non-defective items are 12 and 88, respectively. On the basis of these observations, a contingency table can be constructed (Table 13.4).

TABLE 13.4 Contingency table of defective and non-defective items

Category               f_o    f_e
Defective items        12     10
Non-defective items    88     90

The confidence level is 95%, with an area of 0.025 in each tail of the distribution; the critical value used is $\chi^2_{0.025,\,1} = 5.0239$. The $\chi^2$ statistic can be calculated as

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(12 - 10)^2}{10} + \frac{(88 - 90)^2}{90} = 0.40 + 0.04 = 0.44$$

The calculated value of $\chi^2$ is in the acceptance region ($0.44 < 5.0239$), so the null hypothesis that the population proportion is 0.10 cannot be rejected.
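The link between the two-category $\chi^2$ test and the z-test can be verified with a few lines of arithmetic. The sketch below uses only the figures quoted for Example 10.5 (n = 100, hypothesized proportion 0.10, 12 defective items) together with the z formula given above; the variable names are illustrative.

```python
import math

n = 100            # sample size from Example 10.5
p0 = 0.10          # hypothesized population proportion
observed = [12, 88]                     # defective, non-defective
expected = [n * p0, n * (1 - p0)]       # 10 and 90

# Chi-square goodness-of-fit statistic with two categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# z statistic for the same data
p_hat = observed[0] / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

print(f"chi-square = {chi_sq:.4f}, z = {z:.4f}, z squared = {z ** 2:.4f}")
```

With two categories the $\chi^2$ statistic equals the square of the z statistic (about 0.44 and 0.67 here), which is why both tests lead to the same decision. By the same relation, $1.96^2 = 3.8416$ is the $\chi^2$ value with 1 degree of freedom that corresponds to $z = 1.96$.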
If we examine this result in the light of the result obtained in Example 10.5, the same conclusion can be observed. In that example, the calculated value of z is in the acceptance region ($0.67 < 1.96$), so the null hypothesis that the population proportion is 0.10 is not rejected.

SELF-PRACTICE PROBLEMS

13A1. Use the data given in the table for determining whether the observed frequencies represent a uniform distribution. Take $\alpha = 0.05$.
(Table of categories and observed frequencies $f_o$; the values are not legible in the source.)

13A2. Use the data given in the table for determining whether the observed frequencies represent a uniform distribution. Take $\alpha = 0.01$.
(Table of categories and observed frequencies $f_o$; the values are not legible in the source.)

13A3. The table below shows the sales of a company (in thousand rupees) for eight years. Use $\alpha = 0.05$ to determine whether the data fit a uniform distribution.
(Table of year and sales in thousand rupees: Year 1, 75; Year 2, 80; the values for Years 3 to 8 are not legible in the source.)

13.4 $\chi^2$ TEST OF INDEPENDENCE: TWO-WAY CONTINGENCY ANALYSIS

In many business situations, a market researcher might be interested in understanding the relationship between two variables or in checking whether they are independent of each other. For example, an edible oil company may be interested in knowing whether the purchase of its product is independent of the customer's age or whether it is dependent on the customer's age, so that the company can frame a product strategy accordingly. Similarly, the HRD manager of a company may want to know whether the rate of employee turnover is independent of employee qualification. In both situations, observations are analysed on the basis of two variables. When observations are classified on the basis of two variables and presented in a table, the resulting table is referred to as a contingency table (Table 13.5).
The $\chi^2$ test of independence uses this contingency table for determining the independence of two variables; this is why this test is sometimes referred to as contingency analysis.

It can be observed from the contingency table that the observations are classified into mutually exclusive categories. Observations in each cell are common to the respective row and column. $O_{jk}$ is the frequency of observations in the $j$th row and $k$th column, $R_j$ is the total of the $j$th row, and $C_k$ is the total of the $k$th column. When we add the row totals or the column totals, the grand total ($N$) is obtained. This grand total is the sum of all the frequencies and represents the sample size. It is very important to calculate the expected frequencies for applying the $\chi^2$ test.

The calculation of the expected frequency for any cell is based on the multiplicative law of probability. Probability theory suggests that if two events are independent, then the probability of their joint occurrence is equal to the product of their individual probabilities. This concept can be used to compute the expected frequency of the cell in the $j$th row and $k$th column. So, the expected frequency of cell $jk$ is

$$e_{jk} = \frac{\text{Total of the } j\text{th row}}{\text{Total number of frequencies}} \times \frac{\text{Total of the } k\text{th column}}{\text{Total number of frequencies}} \times \text{Total number of frequencies}$$

We know (from Table 13.5) that $R_j$ is the row total of the $j$th row, $C_k$ is the total of the $k$th column, and the total number of frequencies is $N$. Placing these values in the equation above, we get

$$e_{jk} = \frac{R_j}{N} \times \frac{C_k}{N} \times N = \frac{R_j \times C_k}{N}$$

TABLE 13.5 Contingency table (cell frequencies $O_{jk}$ for the $j$th row and $k$th column, with row totals $R_j$, column totals $C_k$, and grand total $N$)

The expected frequency for any cell can be obtained by the formula

$$f_e = \frac{RT \times CT}{N}$$

where $RT$ is the row total, $CT$ the column total, and $N$ the total number of frequencies.

The $\chi^2$ test statistic for the test of independence is

$$\chi^2 = \sum \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ is the observed frequency and $f_e$ the expected or theoretical frequency.

Degrees of freedom in a $\chi^2$ test of independence = (Number of rows − 1) × (Number of columns − 1)

Example 13.2

A Vice President (Sales) of a garment company wants to know whether sales of the company's brand of jeans is independent of age group. He has appointed a marketing researcher for this purpose.
This marketing researcher has taken a random sample of 703 consumers who have purchased jeans. The researcher conducted a survey for three brands of jeans, namely Brand 1, Brand 2, and Brand 3. The researcher has also divided the age groups into four categories: 15 to 25, 26 to 35, 36 to 45, and 46 to 55. The observations of the researcher are given in Table 13.6. Determine whether brand preference is independent of age. Use $\alpha = 0.05$.

TABLE 13.6 Contingency table for Example 13.2

Age             Brand 1   Brand 2   Brand 3   Row total
15 to 25        65        75        72        212
26 to 35        60        40        64        164
36 to 45        45        52        50        147
46 to 55        55        65        60        180
Column total    225       232       246       703

The seven steps of hypothesis testing can be performed as below:

Step 1: Set null and alternative hypotheses
The null and alternative hypotheses can be stated as below:
$H_0$: Brand preference is independent of age group.
$H_1$: Brand preference is not independent of age group.

Step 2: Determine the appropriate statistical test
The appropriate test statistic is

$$\chi^2 = \sum \sum \frac{(f_o - f_e)^2}{f_e}$$

with degrees of freedom = (number of rows − 1) × (number of columns − 1).

Step 3: Set the level of significance
Alpha has been specified as 0.05.

Step 4: Set the decision rule
For a given level of significance of 0.05, the rules for acceptance or rejection of the null hypothesis are as follows:
If $\chi^2_{\text{cal}} > \chi^2_{\text{critical}}$, reject the null hypothesis; otherwise, do not reject the null hypothesis.
The critical $\chi^2$ value is $\chi^2_{0.05,\,6} = 12.59$, where degrees of freedom = (number of rows − 1) × (number of columns − 1) = (4 − 1) × (3 − 1) = 6.

Step 5: Collect the sample data
The sample data are given in Table 13.6.

Step 6: Analyse the data
The contingency table with the observed and expected frequencies is shown in Table 13.7.

TABLE 13.7 Contingency table of the observed and expected frequencies (expected frequencies in parentheses)
Age             Brand 1         Brand 2         Brand 3         Row total
15 to 25        65 (67.8520)    75 (69.9630)    72 (74.1849)    212
26 to 35        60 (52.4893)    40 (54.1223)    64 (57.3883)    164
36 to 45        45 (47.0483)    52 (48.5120)    50 (51.4395)    147
46 to 55        55 (57.6102)    65 (59.4025)    60 (62.9872)    180
Column total    225             232             246             703

For example, the expected frequency for the first cell is

$$f_e = \frac{RT \times CT}{N} = \frac{212 \times 225}{703} = 67.8520$$

Similarly, the expected frequencies for the other cells can be computed. Table 13.8 exhibits the computation of expected frequencies and chi-square statistic for Example 13.2.

TABLE 13.8 Computation of expected frequencies and chi-square statistic for Example 13.2

f_o    f_e (expected frequency)
65     67.8520
60     52.4893
45     47.0483
55     57.6102
75     69.9630
40     54.1223
52     48.5120
65     59.4025
72     74.1849
64     57.3883
50     51.4395
60     62.9872

So, $\chi^2 = \sum \sum \frac{(f_o - f_e)^2}{f_e} = 7.23$.

Step 7: Arrive at a statistical conclusion and business implication
At the 95% confidence level, the critical value obtained from the chi-square table is $\chi^2_{0.05,\,6} = 12.59$. $\chi^2$ is calculated as 7.23, which is less than the tabular value and falls in the acceptance region. Hence, the null hypothesis is not rejected. There is not enough evidence to indicate that brand preference depends on age group. So, the management can go in for a uniform sales and marketing policy.

13.4.1 Using Minitab for Hypothesis Testing with $\chi^2$ Statistic for Test of Independence

The first step is to select Stat from the menu bar. A pull-down menu will appear on the screen. Select Tables from this pull-down menu. Another pull-down menu will appear on the screen. Select Chi-Square Test (Table in Worksheet) from this pull-down menu. The Chi-Square Test (Table in Worksheet) dialog box will appear on the screen (Figure 13.7). By using Select, place the samples in the "Columns containing the table" box and click OK. Minitab will calculate the $\chi^2$ and p value for the test (Figure 13.8).

Note: Minitab can be used directly for the $\chi^2$ test of independence; MS Excel, however, cannot be used directly for the same test. Similarly, MS Excel can be used directly for the goodness-of-fit test; however, Minitab cannot be used directly for that test.
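For readers without Minitab, the computation in Example 13.2 can also be reproduced with a short script. The following is a minimal sketch under stated assumptions, not Minitab's output: it applies the expected-frequency formula $f_e = RT \times CT / N$ and the $\chi^2$ statistic from this section to the data of Table 13.6. The function names are hypothetical, and the critical value 12.59 is the tabulated value quoted in the example.

```python
def expected_frequencies(table):
    """Expected cell frequencies: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return the chi-square statistic, degrees of freedom, and expected table."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Observed frequencies from Table 13.6 (rows: age groups; columns: Brand 1 to 3).
observed = [
    [65, 75, 72],   # 15 to 25
    [60, 40, 64],   # 26 to 35
    [45, 52, 50],   # 36 to 45
    [55, 65, 60],   # 46 to 55
]

statistic, df, expected = chi_square_independence(observed)
critical_value = 12.59   # tabulated chi-square value for alpha = 0.05, df = 6

print(f"chi-square = {statistic:.2f} with {df} degrees of freedom")
print("reject H0" if statistic > critical_value else "do not reject H0")
```

The expected frequencies produced this way match Table 13.7 to within rounding, and the statistic is about 7.24 with 6 degrees of freedom, which the example reports as 7.23.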
