18 Categorical data

FIGURE 18.1  Midway through writing the second edition of this book, things had gone a little strange

18.1. What will this chapter tell me?

We discovered in the previous chapter that I wrote a book. This book. There are a lot of good things about writing books. The main benefit is that your parents are impressed. Well, they're not that impressed actually, because they think that a good book sells as many copies as Harry Potter and that people should queue outside bookshops for the latest enthralling instalment of Discovering Statistics .... My parents are, consequently, quite baffled about how this book is seen as successful, yet I don't get invited to dinner by the Queen. Nevertheless, given that my family don't really understand what I do, books are tangible proof that I do something. The size of this book and the fact it has equations in it is an added bonus because it makes me look cleverer than I actually am. However, there is a price to pay, which is immeasurable mental anguish. In England we don't talk about our emotions, because we fear that if they get out into the open, civilization will collapse, so I definitely will not mention that the writing process for the second edition was so stressful that I came within one of Fuzzy's whiskers of a total meltdown. It took me two years to recover, just in time to start thinking about this third edition. Still, it was worth it because the feedback suggests that some people found the book vaguely useful. Of course, the publishers don't care about helping people: they care only about raking in as much cash as possible to feed their cocaine habits and champagne addictions. Therefore, they are obsessed with sales figures and comparisons with other books. They have databases that hold sales figures of this book and its competitors in different 'markets' (you are not a person, you are a 'consumer', and you don't live in a country, you live in a 'market') and they gibber and twitch at their consoles creating frequency distributions (with 3-D effects) of these values. The data they get are frequency data (the number of books sold in a certain timeframe). Therefore, if they wanted to compare sales of this book to its competitors in different countries, they would need to read this chapter, because it's all about analysing data for which we know only the frequency with which events occur. Of course, they won't read this chapter, but they should ...

18.2. Analysing categorical data

Sometimes we are interested not in test scores, or continuous measures, but in categorical variables. These are not variables involving cats (although the examples in this chapter might convince you otherwise), but are what we have mainly used as grouping variables. They are variables that describe categories of entities (see section 1.5.1.2). We've come across these types of variables in virtually every chapter of this book. There are different types of categorical variable (see section 6.5.5), but in theory a person, or case, should fall into only one category. Good examples of categorical variables are gender (with few exceptions people can be only biologically male or biologically female),* pregnancy (a woman can be only pregnant or not pregnant) and voting in an election (as a general rule you are allowed to vote for only one candidate).

* Before anyone rips my arms from their sockets and beats me around the head with them, I am aware that numerous chromosomal and hormonal conditions exist that complicate the matter. Also, people can have a different gender identity to their biological gender.
In all cases (except logistic regression) so far, we've used such categorical variables to predict some kind of continuous outcome, but there are times when we want to look at relationships between lots of categorical variables. This chapter looks at two techniques for doing this. We begin with the simple case of two categorical variables and discover the chi-square statistic (which we're not really discovering because we've unwittingly come across it countless times before). We then extend this model to look at relationships between several categorical variables.

18.3. Theory of analysing categorical data

We will begin by looking at the simplest situation that you could encounter; that is, analysing two categorical variables. If we want to look at the relationship between two categorical variables then we can't use the mean or any similar statistic because we don't have any variables that have been measured continuously. Trying to calculate the mean of a categorical variable is completely meaningless because the numeric values you attach to different categories are arbitrary, and the mean of those numeric values will depend on how many members each category has. Therefore, when we've measured only categorical variables, we analyse frequencies. That is, we analyse the number of things that fall into each combination of categories. If we take an example, a researcher was interested in whether animals could be trained to line dance. He took 200 cats and tried to train them to line dance by giving them either food or affection as a reward for dance-like behaviour. At the end of the week they counted how many animals could line dance and how many could not. There are two categorical variables here: training (the animal was trained using either food or affection, not both) and dance (the animal either learnt to line dance or it did not). By combining categories, we end up with four different categories. All we then need to do is to count how many cats fall into each category. We can tabulate these frequencies as in Table 18.1 (which shows the data for this example) and this is known as a contingency table.

TABLE 18.1  Contingency table showing how many cats will line dance after being trained with different rewards

Could They Dance?    Food as Reward    Affection as Reward    Total
Yes                        28                  48                76
No                         10                 114               124
Total                      38                 162               200
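To make the structure of a contingency table concrete, here is a minimal sketch of how Table 18.1 could be assembled programmatically. The book itself works in SPSS; this example uses Python with pandas purely as an illustrative assumption, and the only values taken from the text are the four cell counts.

```python
import pandas as pd

# The four observed cell counts from Table 18.1
table = pd.DataFrame(
    {"Food as Reward": [28, 10], "Affection as Reward": [48, 114]},
    index=["Danced: Yes", "Danced: No"],
)

# Add the row and column totals so the layout matches the contingency table above
table.loc["Total"] = table.sum()
table["Total"] = table.sum(axis=1)
print(table)
```

In practice you would usually start from one row of raw data per cat (which reward it got and whether it danced) and cross-tabulate those two columns to produce a table like this.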
18.3.1. Pearson's chi-square test

If we want to see whether there's a relationship between two categorical variables (i.e. does the number of cats that line dance relate to the type of training used?) we can use Pearson's chi-square test (Fisher, 1922; Pearson, 1900). This is an extremely elegant statistic based on the simple idea of comparing the frequencies you observe in certain categories to the frequencies you might expect to get in those categories by chance. All the way back in Chapters 2, 7 and 10 we saw that if we fit a model to any set of data we can evaluate that model using a very simple equation (or some variant of it):

\text{deviation} = \sum (\text{observed} - \text{model})^2

This equation was the basis of our sums of squares in regression and ANOVA. Now, when we have categorical data we can use the same equation. There is a slight variation in that we divide by the model scores as well, which is actually much the same process as dividing the sum of squares by the degrees of freedom in ANOVA. So, basically, what we're doing is standardizing the deviation for each observation. If we add all of these standardized deviations together the resulting statistic is Pearson's chi-square (\chi^2), given by:

\chi^2 = \sum_{i,j} \frac{(\text{observed}_{ij} - \text{model}_{ij})^2}{\text{model}_{ij}}    (18.1)

in which i represents the rows in the contingency table and j represents the columns. The observed data are, obviously, the frequencies in Table 18.1, but we need to work out what the model is. In ANOVA the model we use is group means, but as I've mentioned we can't work with means when we have only categorical variables, so we work with frequencies instead. Therefore, we use 'expected frequencies'. One way to estimate the expected frequencies would be to say 'well, we've got 200 cats in total, and four categories, so the expected value is simply 200/4 = 50'. This would be fine if, for example, we had the same number of cats that had affection as a reward and food as a reward; however, we didn't: 38 got food and 162 got affection as a reward. Likewise, there are not equal numbers that could and couldn't dance. To take account of this, we calculate expected frequencies for each of the cells in the table (in this case there are four cells) and we use the column and row totals for a particular cell to calculate the expected value:

\text{model}_{ij} = E_{ij} = \frac{\text{row total}_i \times \text{column total}_j}{n}

in which n is simply the total number of observations (in this case 200). We can calculate these expected frequencies for the four cells within our table (row total and column total are abbreviated to RT and CT respectively):

\text{model}_{\text{Food, Yes}} = \frac{RT_{\text{Yes}} \times CT_{\text{Food}}}{n} = \frac{76 \times 38}{200} = 14.44
\text{model}_{\text{Food, No}} = \frac{RT_{\text{No}} \times CT_{\text{Food}}}{n} = \frac{124 \times 38}{200} = 23.56
\text{model}_{\text{Affection, Yes}} = \frac{RT_{\text{Yes}} \times CT_{\text{Affection}}}{n} = \frac{76 \times 162}{200} = 61.56
\text{model}_{\text{Affection, No}} = \frac{RT_{\text{No}} \times CT_{\text{Affection}}}{n} = \frac{124 \times 162}{200} = 100.44

Given that we now have these model values, all we need to do is take each value in each cell of our data table, subtract from it the corresponding model value, square the result, and then divide by the corresponding model value. Once we've done this for each cell in the table, we just add them up:

\chi^2 = \frac{(28 - 14.44)^2}{14.44} + \frac{(10 - 23.56)^2}{23.56} + \frac{(48 - 61.56)^2}{61.56} + \frac{(114 - 100.44)^2}{100.44}
       = \frac{(13.56)^2}{14.44} + \frac{(-13.56)^2}{23.56} + \frac{(-13.56)^2}{61.56} + \frac{(13.56)^2}{100.44}
       = 12.73 + 7.80 + 2.99 + 1.83
       = 25.35

This statistic can then be checked against a distribution with known properties. All we need to know is the degrees of freedom, and these are calculated as df = (r - 1)(c - 1), in which r is the number of rows and c is the number of columns. Another way to think of it is the number of levels of each variable minus one, multiplied together. In this case we get df = (2 - 1)(2 - 1) = 1. If you were doing the test by hand, you would find a critical value for the chi-square distribution with df = 1, and if the observed value was bigger than this critical value you would say that there was a significant relationship between the two variables. These critical values are produced in Appendix A.4, and for df = 1 the critical values are 3.84 (p = .05) and 6.63 (p = .01), and so because the observed chi-square is bigger than these values it is significant at p < .01. However, if you use SPSS, it will simply produce an estimate of the precise probability of obtaining a chi-square statistic at least as big as (in this case) 25.35 if there were no association in the population between the variables.
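As a check on the arithmetic above, here is a minimal sketch that computes the expected frequencies and Pearson's chi-square for the cat data. It uses Python with NumPy and SciPy as an illustrative assumption (the book's own route is SPSS); the small difference from 25.35 arises because the hand calculation rounds each term to two decimal places.

```python
import numpy as np
from scipy import stats

# Observed frequencies from Table 18.1 (rows: danced yes/no; columns: food/affection)
observed = np.array([[28, 48],
                     [10, 114]])

# Expected frequencies under no association: E_ij = row total_i * column total_j / n
n = observed.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
print(expected)  # [[14.44, 61.56], [23.56, 100.44]]

# Pearson's chi-square (equation 18.1) and its p-value on df = (2-1)(2-1) = 1
chi_square = ((observed - expected) ** 2 / expected).sum()
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print(chi_square, stats.chi2.sf(chi_square, df))  # ~25.36, p < .001

# The same test in one call; correction=False switches off Yates's correction (section 18.3.4)
chi2, p, dof, exp = stats.chi2_contingency(observed, correction=False)
print(chi2, p, dof)
```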
18.3.2. Fisher's exact test

There is one problem with the chi-square test, which is that the sampling distribution of the test statistic has an approximate chi-square distribution. The larger the sample is, the better this approximation becomes, and in large samples the approximation is good enough for us not to worry about the fact that it is an approximation. However, in small samples the approximation is not good enough, making significance tests of the chi-square distribution inaccurate. This is why you often read that to use the chi-square test the expected frequencies in each cell must be greater than 5 (see section 18.4). When the expected frequencies are greater than 5, the sampling distribution is probably close enough to a perfect chi-square distribution for us not to worry. However, when the expected frequencies are too low, it probably means that the sample size is too small and that the sampling distribution of the test statistic is too deviant from a chi-square distribution to be of any use.

Fisher came up with a method for computing the exact probability of the chi-square statistic that is accurate when sample sizes are small. This method is called Fisher's exact test (Fisher, 1922) even though it's not so much of a test as a way of computing the exact probability of the chi-square statistic. This procedure is normally used on 2 × 2 contingency tables (i.e. two variables each with two options) and with small samples. However, it can be used on larger contingency tables and with large samples, but on larger contingency tables it becomes computationally intensive and you might find SPSS taking a long time to give you an answer. In large samples there is really no point because it was designed to overcome the problem of small samples, so you don't need to use it when samples are large.

18.3.3. The likelihood ratio

An alternative to Pearson's chi-square is the likelihood ratio statistic, which is based on maximum-likelihood theory. The general idea behind this theory is that you collect some data and create a model for which the probability of obtaining the observed set of data is maximized, then you compare this model to the probability of obtaining those data under the null hypothesis. The resulting statistic is, therefore, based on comparing observed frequencies with those predicted by the model:

L\chi^2 = 2 \sum_{i,j} \text{observed}_{ij} \ln\!\left(\frac{\text{observed}_{ij}}{\text{model}_{ij}}\right)

in which i and j are the rows and columns of the contingency table and ln is the natural logarithm (this is the standard mathematical function that we came across in Chapter 8 and you can find it on your calculator, usually labelled as ln or log). Using the same model and observed values as in the previous section, this would give us:

L\chi^2 = 2\left[28 \ln\!\left(\frac{28}{14.44}\right) + 10 \ln\!\left(\frac{10}{23.56}\right) + 48 \ln\!\left(\frac{48}{61.56}\right) + 114 \ln\!\left(\frac{114}{100.44}\right)\right]
        = 2[(28 \times 0.662) + (10 \times -0.857) + (48 \times -0.249) + (114 \times 0.127)]
        = 2[18.54 - 8.57 - 11.94 + 14.44]
        = 24.94

As with Pearson's chi-square, this statistic has a chi-square distribution with the same degrees of freedom (in this case 1). As such, it is tested in the same way: we could look up the critical value of chi-square for the number of degrees of freedom that we have. As before, the value we have here will be significant because it is bigger than the critical values of 3.84 (p = .05) and 6.63 (p = .01). For large samples this statistic will be roughly the same as Pearson's chi-square, but it is preferred when samples are small.
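Continuing the same hypothetical Python/SciPy sketch (again, the book's own workflow is SPSS), the likelihood ratio statistic falls out of the formula above in a couple of lines, and SciPy's fisher_exact gives the exact p-value discussed in section 18.3.2 for the same 2 × 2 table.

```python
import numpy as np
from scipy import stats

observed = np.array([[28, 48],
                     [10, 114]])
n = observed.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

# Likelihood ratio statistic: 2 * sum of observed * ln(observed / expected)
likelihood_ratio = 2 * (observed * np.log(observed / expected)).sum()
print(likelihood_ratio, stats.chi2.sf(likelihood_ratio, df=1))  # ~24.94, p < .001

# Fisher's exact test on the same table; with a sample this large it adds little,
# but with small expected frequencies it is the safer option
odds_ratio, exact_p = stats.fisher_exact(observed)
print(odds_ratio, exact_p)
```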
18.3.4. Yates's correction

When you have a 2 × 2 contingency table (i.e. two categorical variables each with two categories) then Pearson's chi-square tends to produce significance values that are too small (in other words, it tends to make a Type I error). Therefore, Yates suggested a correction to the Pearson formula (usually referred to as Yates's continuity correction). The basic idea is that when you calculate the deviation from the model (the observed_ij − model_ij in equation (18.1)) you subtract 0.5 from the absolute value of this deviation before you square it. In plain English, this means you calculate the deviation, ignore whether it is positive or negative, subtract 0.5 from the value and then square it. Pearson's equation then becomes:

\chi^2 = \sum_{i,j} \frac{(|\text{observed}_{ij} - \text{model}_{ij}| - 0.5)^2}{\text{model}_{ij}}

For the data in our example this just translates into:

\chi^2 = \frac{(13.56 - 0.5)^2}{14.44} + \frac{(13.56 - 0.5)^2}{23.56} + \frac{(13.56 - 0.5)^2}{61.56} + \frac{(13.56 - 0.5)^2}{100.44}
       = 11.81 + 7.24 + 2.77 + 1.70
       = 23.52

The key thing to note is that it lowers the value of the chi-square statistic and, therefore, makes it less significant. Although this seems like a nice solution to the problem, there is a fair bit of evidence that this overcorrects and produces chi-square values that are too small! Howell (2006) provides an excellent discussion of the problem with Yates's correction for continuity, if you're interested; all I will say is that, although it's worth knowing about, it's probably best ignored!

18.4. Assumptions of the chi-square test

It should be obvious that the chi-square test does not rely on assumptions such as having continuous, normally distributed data like most of the other tests in this book (categorical data cannot be normally distributed because they aren't continuous). However, the chi-square test still has two important assumptions:

1 Pretty much all of the tests we have encountered in this book have made an assumption about the independence of data, and the chi-square test is no exception. For the chi-square test to be meaningful it is imperative that each person, item or entity contributes to only one cell of the contingency table.
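Before moving on, here is the same hypothetical Python/SciPy sketch applied to Yates's correction from section 18.3.4: the corrected statistic is computed by hand and then via scipy.stats.chi2_contingency, which applies the correction by default whenever the table has one degree of freedom.

```python
import numpy as np
from scipy import stats

observed = np.array([[28, 48],
                     [10, 114]])
n = observed.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

# Yates's continuity correction: subtract 0.5 from each absolute deviation before squaring
corrected_chi_square = ((np.abs(observed - expected) - 0.5) ** 2 / expected).sum()
print(corrected_chi_square)  # ~23.52, matching the hand calculation above

# chi2_contingency applies the same correction by default when df = 1
with_correction = stats.chi2_contingency(observed)              # correction=True by default
without_correction = stats.chi2_contingency(observed, correction=False)
print(with_correction[0], without_correction[0])                # ~23.52 vs ~25.36
```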
