AP Statistics    Name:
Chapter 11
Activity 11: "I Didn't Get Enough Blues!"

Materials needed: One 1.69-ounce bag of plain M&M's per student, calculator, computer, and AP Stat textbook

DO NOT EAT any M&M's until you have completed the experiment.

* The M&M/Mars Company, headquartered in Hackettstown, New Jersey, makes plain and peanut chocolate candies. In 1995, the company decided to replace the tan-colored M&M's with a new color. After conducting an extensive national preference survey, it chose to replace the tan M&M's with blue M&M's.
* The company's Consumer Affairs Department announced the following (this is an updated announcement; yes, I know the distribution adds up to 101%):

According to the M&M website, "On average, each 1.69-ounce package of Milk Chocolate M&M's should contain the following percentages of M&M's":

24% blue | 14% brown | 16% green | 20% orange | 13% red | 14% yellow

They explained: "While we mix the colors as thoroughly as possible, the above ratios may vary somewhat, especially in the smaller bags. This is because we combine the various colors in large quantities for the last production stage (printing). The bags are then filled on high-speed packaging machines by weight, not by count."

Purpose of this Activity
Compare the color distribution of M&M's in your individual bag with the advertised distribution. In order to use as random a sample as possible, it is best if the bags of M&M's are purchased at different stores and not obtained from one or a few sources of supply.

1. Open your bag and carefully count the number of M&M's of each color (brown, yellow, red, orange, green, and blue) as well as the total number of M&M's in the bag.
2. Fill in the counts, by color, and the total number of M&M's in the "Observed" row in Table 1.
3. To obtain the expected counts, multiply the total number of M&M's in your bag by the company's stated percentages (expressed in decimal form) for each of the colors. Write these values in the "Expected" row in Table 1.
4. For EACH color, perform this calculation: $(\text{observed} - \text{expected})^2 / \text{expected}$, and enter the result in the last row in Table 1.
5. Then add up all of these calculated values in the last row of Table 1 and name the sum $\chi^2$.

Table 1: (Make a table in your notebook)

Color | Brown | Yellow | Red | Orange | Green | Blue | Total
Observed
Expected
(O-E)^2/E

Answer the following questions in your notebook:
6. Does your sample reflect the distribution advertised by the M&M/Mars Company?
7. Are the entries in the last row all about the same, or do any of the quantities stand out because they are "significantly" larger?
8. Did you get more of a particular color than you expected?
9. Did you get fewer of a particular color than you expected?
10. On my teacher website, click the link for the Excel form. Answer the questions using your table above.

Chapter 11: Inference for Tables: Chi-Square Procedures
11.1 - Chi-Square Test for Goodness of Fit

From Chapter 9: (Inference for proportions)
* Performed significance tests with proportions.
* p = the proportion of blue M&M's
  H0: p = 0.24
  Ha: p < 0.24
* Performing 5 more of these tests, one for each of the other M&M colors in the bag, would be inefficient.
* Doing this wouldn't tell us how likely it is that six sample proportions differ from the values stated by M&M/Mars as much as our sample does.

Chi-Square ($\chi^2$) Goodness of Fit Test
* Determines whether a specified population distribution seems valid.
* A single test used to see if the observed sample distribution is significantly different from the hypothesized population distribution.
* Null hypothesis: A population distribution is the same as a reference distribution.
* Alternative hypothesis: A population distribution is different from a reference distribution.
* Hypotheses can be stated in words:
  H0: The age group distribution in 1996 is the same as the 1980 age group distribution.
  Ha: The age group distribution in 1996 is different from the 1980 age group distribution.
* Hypotheses can be stated in notation (the proportions that make up the distribution):
  H0: p_brown = 0.14, p_blue = 0.24, p_green = 0.16, p_red = 0.13, p_yellow = 0.14, p_orange = 0.20
  Ha: At least one of the proportions differs from the stated values.
* Can be applied to see if the observed sample distribution is significantly different from the hypothesized population distribution.
* The more the observed counts differ from the expected counts, the more evidence we have to reject the null hypothesis.
* $\frac{(O - E)^2}{E}$ measures how well the observed counts fit the expected counts, if the null were true.
* $\frac{(O - E)^2}{E}$ is calculated for each category of the distribution.
* The SUM of $\frac{(O - E)^2}{E}$ is called the Chi-Square Statistic $\chi^2$.
* The larger the difference between the observed and expected values, the larger the Chi-Square Statistic $\chi^2$.

Chi-Square Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$

Chi-Square ($\chi^2$) Distributions
Properties of the Chi-Square Distribution
* Family of distributions that take only positive values
* Skewed to the right
* The total area under a chi-square curve is 1.
* Based on degrees of freedom: "n - 1" or "categories - 1" or "proportions - 1"
  o We are dealing with percentages, so five of the six values are free to vary, but the sixth is not, because they must (should) all add up to 100%.
* A chi-square curve is specified by its degrees of freedom.
  o Each row in the chi-square chart is a distribution based on the degrees of freedom.
* As degrees of freedom increase:
  o The density curves become less skewed.
  o Larger values are more probable.
  o The curve becomes more and more symmetrical and more like a normal curve.
* Each chi-square curve:
  o Begins at 0 on the horizontal axis, increases to a peak, and then approaches the horizontal axis asymptotically from above.
  o The exceptions are the curves for df = 1 and df = 2, which have no peak and only decrease.

P-Value
* Area under the curve to the right of the chi-square test statistic.
* The probability of observing a value of $\chi^2$ at least as extreme as the one actually observed, if H0 is true.
* The larger the value of the chi-square statistic, the smaller the P-value.
* The smaller the P-value, the more evidence against the null hypothesis.

Conditions for Performing a Chi-Square Test for Goodness of Fit
* Random: The data come from a well-designed random sample or randomized experiment.
  o 10%: When sampling without replacement, check that $n \le \frac{1}{10}N$.
* Large counts: All expected counts are at least 5.

1. The chi-square test statistic compares observed and expected counts. Don't try to perform calculations with the observed and expected proportions in each category.
2. When checking the Large Counts condition, be sure to examine the expected counts, not the observed counts.

The Chi-Square Test for Goodness of Fit
Suppose the conditions are met. To determine whether a categorical variable has a specified distribution in the population of interest, expressed as the proportion of individuals falling into each possible category, perform a test of

H0: The stated distribution of the categorical variable in the population of interest is correct.
Ha: The stated distribution of the categorical variable in the population of interest is not correct.

Start by finding the expected count for each category assuming that H0 is true. Then calculate the chi-square statistic

$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

where the sum is over the k different categories. The P-value is the area to the right of $\chi^2$ under the density curve of the chi-square distribution with k - 1 degrees of freedom.

Always record the test statistic, P-value, and degrees of freedom.

11.2: Inferences for Two-Way Tables

Statistical Methods for Multiple Comparisons
* An overall test (that calculates a chi-square statistic) to see if there is good evidence of any differences among the parameters of interest.
* A detailed follow-up analysis to decide which of the parameters differ and to estimate how large the differences are.
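The goodness-of-fit recipe from 11.1 (expected counts, the sum of (O - E)^2 / E, k - 1 degrees of freedom, right-tail P-value) can be sketched in Python. This is only an illustration under stated assumptions: the observed counts are made up for a hypothetical bag of 55 M&M's, the names `chi_square_sf` and `goodness_of_fit` are invented here, and the P-value routine uses a standard recurrence for whole-number degrees of freedom in place of Table C or a calculator command.

```python
import math


def chi_square_sf(x, df):
    """Area to the right of x under the chi-square curve with df degrees of freedom.

    Assumes df is a positive whole number.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting point: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        a, area = 0.5, math.erfc(math.sqrt(y))
    else:
        a, area = 1.0, math.exp(-y)
    # Each pass adds the term that raises the degrees of freedom by 2.
    while a < df / 2.0:
        area += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return area


def goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, degrees of freedom, P-value)."""
    total = sum(observed.values())
    # Expected count = total count x stated proportion for that category.
    expected = {c: total * p for c, p in proportions.items()}
    # Large Counts condition is checked on the EXPECTED counts.
    if min(expected.values()) < 5:
        raise ValueError("Large Counts condition fails: an expected count is below 5")
    statistic = sum(
        (observed[c] - expected[c]) ** 2 / expected[c] for c in proportions
    )
    df = len(proportions) - 1
    return statistic, df, chi_square_sf(statistic, df)


# Stated percentages from the activity (they total 101%, as noted there).
stated = {"blue": 0.24, "brown": 0.14, "green": 0.16,
          "orange": 0.20, "red": 0.13, "yellow": 0.14}
# Hypothetical counts for one bag; NOT real data.
bag = {"blue": 10, "brown": 9, "green": 8, "orange": 12, "red": 7, "yellow": 9}

statistic, df, p_value = goodness_of_fit(bag, stated)
print(f"chi-square = {statistic:.3f}, df = {df}, P-value = {p_value:.3f}")
```

With these made-up counts the statistic is small for a chi-square curve with 5 degrees of freedom, so the P-value is large and the bag would give no convincing evidence against the advertised distribution. Because the stated percentages total 101%, the expected counts add up to slightly more than the bag total.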
Two-Way Tables
* Organize data about categorical variables.
* Summarize large amounts of data by grouping outcomes into categories.
* The first step in this type of chi-square test is to arrange the data in a two-way table.
* "r × c" table (rows × columns)

To test H0, compare the observed counts with the expected counts.

Expected Counts
* Counts we would expect (except for random variation) if H0 were true.
* The expected count in any cell of a two-way table when H0 is true is

  $\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$

Chi-Square Test with Two-Way Tables
* $\chi^2$ is a measure of the distance between the observed counts and the expected counts.
* It is a distance: it is always zero or a positive value.
* When it equals zero, the observed counts and the expected counts are exactly equal.
* Large values of $\chi^2$ are evidence against H0 because the observed counts are far from what we would expect if H0 were true.
* It is an approximate method that becomes more accurate as the counts in the cells of the table get larger.

Chi-Square Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
The sum is over all r × c cells in the table.

Chi-Square Test for Homogeneity of Populations
* Compares several population proportions.

Arrange into a two-way table:
* Select an independent SRS from each of "c" (several) populations.
* Classify each individual in a sample according to a categorical response variable with "r" possible values.
* There are "c" different sets of proportions to be compared, one for each population.

State Hypotheses:
H0: The distribution of the response variable is the same in all "c" populations.
Ha: These "c" distributions are not all the same. (This allows any other relationship among the population proportions.)

Check Conditions
* Data must come from independent SRSs from the populations of interest.
* Populations are at least 10 times larger than the samples.
* All individual expected counts are at least 5.

If H0 is true, the chi-square statistic has approximately a chi-square distribution with a specified number of degrees of freedom.

Degrees of Freedom = (r - 1)(c - 1)

The P-value for the chi-square statistic is the area to the right of $\chi^2$ under the chi-square density curve with that d.f. (degrees of freedom).

The Chi-Square Test for Independence
* A single SRS is drawn from one population.
* Observations are classified according to two categorical variables (these variables can have levels).
* Tests the null hypothesis:
  o H0: There is no relationship between the row variable and the column variable, OR
  o H0: There is no relationship between the two categorical variables.
* "The row and column variables are not related to each other."

Set Up the Two-Way Table
* Marginal totals = each row total and each column total.
* Calculate conditional probabilities.

Descriptive Statistics
Analysis of data: describe the relationship between categorical variables by comparing percents (not counts).
* Compute conditional distributions.
* Graph the data to examine it visually (bar chart).

The Chi-Square Test for Independence
* A single SRS is drawn from a single population.
* Tests that there is no relationship between the row variable and the column variable.
* Assesses whether the observed association is statistically significant.

Example Hypothesis Phrasing:
H0: Smoking and SES are independent.
Ha: Smoking and SES are dependent.

Example Hypothesis Phrasing:
H0: There is no association between smoking and SES.
Ha: There is an association between smoking and SES.

Check Conditions
* Data must come from a well-designed random sample or randomized experiment.
* When sampling, check that the population is at least 10 times larger than the sample.
* All expected counts are at least 5.

Degrees of Freedom = (r - 1)(c - 1)

The P-value for the chi-square statistic is the area to the right of $\chi^2$ under the chi-square density curve with that d.f. (degrees of freedom).

Note: If a conclusion is made that there is an association, this does not show or prove causation.

15. What's your sign? The University of Chicago's General Social Survey (GSS) is the nation's most important social science sample survey. For reasons known only to social scientists, the GSS regularly asks a random sample of people their astrological sign. Here are the counts of responses from a recent GSS:

Sign:  Aries | Taurus | Gemini | Cancer | Leo | Virgo
Count: 321 | 360 | 367 | 374 | 383 | 402

Sign:  Libra | Scorpio | Sagittarius | Capricorn | Aquarius | Pisces
Count: 392 | 329 | 331 | 354 | 376 | 355

If births are spread uniformly across the year, we expect all 12 signs to be equally likely. Do these data provide convincing evidence that all 12 signs are not equally likely? If you find a significant result, perform a follow-up analysis.

27. Why men and women play sports Do men and women participate in sports for the same reasons? One goal for sports participants is social comparison: the desire to win or to do better than other people. Another is mastery: the desire to improve one's skills or to try one's best. A study on why students participate in sports collected data from independent random samples of 67 male and 67 female undergraduates at a large university. Each student was classified into one of four categories based on his or her responses to a questionnaire about sports goals. The four categories were high social comparison-high mastery (HSC-HM), high social comparison-low mastery (HSC-LM), low social comparison-high mastery (LSC-HM), and low social comparison-low mastery (LSC-LM). One purpose of the study was to compare the goals of male and female students. Here are the data displayed in a two-way table:

       | Gender
Goal   | Female | Male
HSC-HM | 14 | 31
HSC-LM | 7 | 18
LSC-HM | 21 | 5
LSC-LM | 25 | 13

(a) Calculate the conditional distribution (in proportions) of the reported sports goals for each gender.
(b) Make an appropriate graph for comparing the conditional distributions in part (a).
(c) Write a few sentences comparing the distributions of sports goals for male and female undergraduates.

29. Why women and men play sports Refer to Exercise 27. Do the data provide convincing evidence of a difference in the distributions of sports goals for male and female undergraduates at the university?
(a) State appropriate null and alternative hypotheses for a significance test to help answer this question.
(b) Calculate the expected counts. Show your work.
(c) Calculate the chi-square statistic. Show your work.

31. Why women and men play sports Refer to Exercise 27 and Exercise 29.
(a) Check that the conditions for performing the chi-square test are met.
(b) Use Table C to find the P-value. Then use your calculator's χ²cdf command.
(c) Interpret the P-value from the calculator in context.
(d) What conclusion would you draw? Justify your answer.

33. Python eggs How is the hatching of water python eggs influenced by the temperature of the snake's nest? Researchers randomly assigned newly laid eggs to one of three water temperatures: hot, neutral, or cold. Hot duplicates the extra warmth provided by the mother python, and cold duplicates the absence of the mother. Here are the data on the number of eggs that hatched and didn't hatch:

          | Water Temperature
Hatched?  | Cold | Neutral | Hot
Yes       | 16 | 38 | 15
No        | 11 | 18 | 29

(a) Compare the distributions of hatching status for the three treatments.
(b) Are the differences between the three groups statistically significant? Give appropriate evidence to support your answer.

45. Regulating guns The National Gun Policy Survey asked a random sample of adults, "Do you think there should be a law that would ban possession of handguns except for the police and other authorized persons?" Here are the responses, broken down by the respondent's level of education:

     | Education
     | Less than high school | High school grad | Some college | College grad | Postgrad degree
Yes  | 58 | 84 | 169 | 98 | 7
No   | 58 | 129 | 294 | 135 | 99

Does the sample provide convincing evidence of an association between education level and opinion about a handgun ban in the adult population?

2003 AP® STATISTICS FREE-RESPONSE QUESTIONS

5. A random sample of 200 students was selected from a large college in the United States. Each selected student was asked to give his or her opinion about the following statement.

"The most important quality of a person who aspires to be the President of the United States is a knowledge of foreign affairs."

Each response was recorded in one of five categories. The gender of each selected student was noted. The data are summarized in the table below.

Response Category: Strongly Disagree | Somewhat Disagree | Neither Agree nor Disagree | Somewhat Agree | Strongly Agree
Male 10
Female 20 25 15

Is there sufficient evidence to indicate that the response is dependent on gender? Provide statistical evidence to support your conclusion.

2002 AP® STATISTICS FREE-RESPONSE QUESTIONS

STATISTICS SECTION II
Part B
Question 6
Spend about 25 minutes on this part of the exam.
Percent of Section II grade: 25

Directions: Show all your work. Indicate clearly the methods you use, because you will be graded on the correctness of your methods as well as on the accuracy of your results and explanation.

6. A survey given to a random sample of students at a university included a question about which of two well-known comedy shows, S or F, students preferred. The students were asked the question, "Do you prefer S or F?" The responses are shown below.

Preference
S | F | Total
185 | 139 | 324

(a) Based on the results of this survey, construct and interpret a 95% confidence interval for the proportion of students in the population who would respond S to the question, "Do you prefer S or F?"
(b) What is the meaning of "95% confidence" in part (a)?
(c) In a follow-up survey, a separate group of randomly selected students was asked "Do you prefer F or S?" The responses are shown below.

Preference
S | F | Total
68 | 88 | 156

Based on these two surveys, is there evidence that the stated preference depends on the order in which the comedy shows were listed in the survey question? Justify your answer.
(d) Suppose the test in part (c) indicates that the order in which the shows were listed does make a difference. Is the pooled value $\hat{p} = \frac{185 + 68}{324 + 156} = 0.527$ a reasonable estimate for the proportion of students at the university who would respond S? If so, justify your answer. If not, what would be a more reasonable estimate? Explain why.

2008 AP® STATISTICS FREE-RESPONSE QUESTIONS

5. A study was conducted to determine where moose are found in a region containing a large burned area. A map of the study area was partitioned into the following four habitat types.
(1) Inside the burned area, not near the edge of the burned area,
(2) Inside the burned area, near the edge,
(3) Outside the burned area, near the edge, and
(4) Outside the burned area, not near the edge.

The figure below shows these four habitat types.
Note: Figure not drawn to scale.

The proportion of total acreage in each of the habitat types was determined for the study area. Using an aerial survey, moose locations were observed and classified into one of the four habitat types. The results are given in the table below.

Habitat Type | Proportion of Total Acreage | Number of Moose
1 | 0.340 | 25
2 | 0.101 | 22
3 | 0.104 | 30
4 | 0.455 | 40
Total | 1.000 | 117

(a) The researchers who are conducting the study expect the number of moose observed in a habitat type to be proportional to the amount of acreage of that type of habitat. Are the data consistent with this expectation? Conduct an appropriate statistical test to support your conclusion. Assume the conditions for inference are met.
(b) Relative to the proportion of total acreage, which habitat types did the moose seem to prefer? Explain.
© 2008 The College Board. All rights reserved. Visit apcentral.collegeboard.com (for AP professionals) and www.collegeboard.com/apstudents (for students and parents).

Test 11A    AP Statistics    Name:

Part 1: Multiple Choice. Circle the letter corresponding to the best answer.

Use the following for questions 1 - 3:
A well-known chewing gum maker wants to determine if any of its four flavors of gum are more popular than the others. A random sample of 80 people who say they chew gum regularly is asked to identify their favorite flavor of gum. Here are the results:

Flavor    | Peppermint | Cinnamon | Wintergreen | Spearmint
Frequency | 25 | 19 | 22 | 14

1. Which of the following would be an appropriate null hypothesis for the company to test?
(a) A= mr
(b) The observed counts are all equal to 20.
(c) Flavor preferences for the population are evenly distributed across the four flavors.
(d) At least one of the four flavor preferences in the population is different from the other three.
(e) The observed counts are equal to the expected counts.

2. Which of the following are conditions that must be met in order to test this hypothesis using a chi-square test?
I. If p = proportion of gum-chewers in the population, then np ≥ 10 and n(1 - p) ≥ 10.
II. All expected cell counts are at least 5.
III. The sample size is no more than 10% of the population size.
(a) I and II only
(b) II and III only
(c) I and III only
(d) only
(e) I, II, and III

3. Which of the following represents the component of the chi-square statistic for Wintergreen?
(a) 22 7 © a © (ae)

© BFW Publishers    The Practice of Statistics for AP*, 5/e

Use the following for questions 4 - 6:
Do male and female children respond differently to colors? A study of color association in children asked separate random samples of male and female fourth-graders what emotion they associated with the color red.
Here are the results for each group:

       | Emotion
       | Anger | Happiness | Love | Pain | Total
Female | 27 | 19 | 39 | 17 | 102
Male   | 34 | 12 | 38 | 28 | 112
Total  | 61 | 31 | 77 | 45 | 214

4. Which of the following would be the appropriate null hypothesis for this test?
(a) The distribution of emotional associations with the color red is the same for male and female fourth-graders.
(b) Gender is dependent upon emotional association with the color red.
(c) Emotional associations with the color red are independent of gender.
(d) The number of observations in each cell is the same for each emotional association.
(e) 25% of all fourth graders associate the color red with each of the four listed emotions.

5. Under the assumption that the null hypothesis is true, which of the following represents the expected count for female children who associate the color red with love?
(a) 39
(b) $\frac{(77)(214)}{102}$
(c) $\frac{(77)(102)}{214}$
(d) $\frac{(39)(102)}{77}$
(e) $\frac{39}{214}$

6. The chi-square statistic for these data is $\chi^2$ = 4.629. Which of the following intervals contains the P-value for this test?
(a) 0.005 ≤ P-value ≤ 0.01
(b) 0.01 ≤ P-value ≤ 0.025
(c) 0.025 ≤ P-value ≤ 0.05
(d) 0.05 ≤ P-value ≤ 0.1
(e) P-value ≥ 0.1

Use the following for questions 7 - 8:
State traffic engineers want to characterize the types of vehicles found on four state roads. They take a random sample of vehicles on each road over a two-week period and get the results in the table for the number of vehicles of each type on each road. The engineers perform a chi-square test of homogeneity, using the null hypothesis that there is no difference in the distribution of vehicle types on the four roads.

          | Vehicle type
          | Cars | Light trucks/SUVs | Heavy trucks/trailers
Route 9   | 126 | a2 | 16
Route 47  | 216 | 31 | 35
Route 116 | 271 | 41 | 56
Route 176 | 413 | 37 | a

7. For this chi-square test, what are the correct degrees of freedom?
(a) 3
(b) 5
(c) 6
(d) 11
(e) 12

8. Below are the individual components for the chi-square statistic for this test:

          | Cars | Light trucks/SUVs | Heavy trucks/trailers
Route 9   | 1.8 | 14.1 | 0.3
Route 47  | 1.3 | 3.8 | 0.9
Route 116 | 0.6 | 0.9 | 10.7
Route 176 | 6.0 | 9.5 | 1d

Based on the original data and the components, which of the following statements is true?
(a) The observed count of heavy trucks/trailers on Route 176 is much higher than the expected count.
(b) There are many more light trucks on Route 9 than we would expect if the null hypothesis were true.
(c) The number of observed cars on Route 116 is much lower than we would expect if the null hypothesis were true.
(d) The greatest difference between observed and expected counts is for heavy trucks/trailers on Route 9.
(e) The chi-square statistic for this test is less than 30.

9. Which of the following statements about chi-square distributions are true?
I. A chi-square distribution with fewer than 10 degrees of freedom is roughly symmetric.
II. The more degrees of freedom a chi-square distribution has, the larger the median of the distribution.
III. For all chi-square distributions, P($\chi^2$ ≥ 0) = 1.
(a) I only
(b) II only
(c) III only
(d) II and III
(e) All three statements are true

10. Is the accident rate for some car colors different than for other car colors? An insurance company selects a random sample of cars that it insures and records their color (using five categories: white, silver, black, red, or "all others") and whether or not they have been involved in an accident in the last three years. They perform a chi-square test of association and obtain a test statistic of $\chi^2$ = 8.474, which yields a P-value of 0.0758. Using a significance level of α = 0.05, which of the following is the appropriate conclusion for this test?
(a) Reject H0: there is convincing evidence of an association between car color and proportion of cars involved in accidents.
(b) Accept H0: there is convincing evidence that car color and proportion of cars involved in accidents are independent.
(c) Reject H0: there is insufficient evidence to establish an association between car color and proportion of cars involved in accidents.
(d) Fail to reject H0: there is insufficient evidence to establish an association between car color and proportion of cars involved in accidents.
(e) Fail to reject H0: there is convincing evidence that car color and proportion of cars involved in accidents are independent.

Part 2: Free Response
Show all your work. Indicate clearly the methods you use, because you will be graded on the correctness of your methods as well as on the accuracy and completeness of your results and explanations.

11. Big Box Electronics, a large national chain store, has one store in the city of Kingston. One factor in deciding whether to build a second store in the city is whether the current store is serving all residents equally well, or whether unequal proportions of residents from different parts of town are using the store because it's located on one side of town. The national managers of Big Box divide Kingston into four geographical regions and determine the percentage of residents who live in each region. Here's what they find:

Region                   | North | South | East | West
Percentage of population | 40% | 24% | 22% | 14%

Then the managers take a simple random sample of 250 shoppers at Kingston's Big Box store and determine which part of town they come from by asking for their zip code when they are checking out:

Region             | North | South | East | West
Number of shoppers | 120 | 48 | 62 | 20

Is Kingston's only Big Box store used by a higher proportion of the residents in some parts of town than others? Support your conclusion with an appropriate statistical test.
12. A few weeks before the senatorial election between incumbent Senator Smirk and his challenger, former Governor Graff, the senator's polling organization wants to know where he should concentrate his campaigning. They take simple random samples of potential voters in the southern and northern portions of the state, and ask them if they have decided who to vote for or are still undecided. Here are the results:

Region | Decided on a candidate | Still undecided | Total
North  | 116 | 60 | 176
South  | 148 | 52 | 200
Total  | 264 | 112 | 376

(a) Do these data provide convincing evidence that there is a difference in the distribution of voters who have decided or are still undecided in the two regions? Use a chi-square test to support your conclusion.
(b) The pollsters are concerned that while all 200 people in the "South" sample responded, 24 people (out of the original SRS of 200) in the "North" sample did not respond. Is it possible that the opinions of these people would change the pollsters' conclusions? Explain.
