Introduction to Hypothesis Testing

The concept of evidence collection

A well-known story goes something like this: Four students missed the midterm for their statistics class. They went to the professor together and said, “Please let us make up the exam. We carpool together, and on our way to the exam, we got a flat tire. That's why we missed the exam.” The professor didn’t believe them, but instead of arguing he said, “Sure, you can make up the exam. Be in my office tomorrow at 8.”

The next day, they met in his office. He sent each student to a separate room and gave them an exam. The exam consisted of only one question: “Which tire?”

We don’t know the outcome of this story, but let's imagine that all four students answer, “left rear tire.” The professor is surprised. He had assumed that the students were lying. “Maybe,” he thinks, “they just got lucky. After all, if they just guessed, they could still all choose the same tire.” But then he does a quick calculation and figures out that the probability that all four students will guess the same tire is only 1.6%. Reluctantly, he concedes that the students were not lying, and now he must give all of them an A on the exam.

The statistics professor has just performed a hypothesis test. Hypothesis testing is a formal procedure that enables us to choose between two hypotheses when we are uncertain about our measurements. We call hypothesis testing a formal procedure because it is based on particular terminology and a rather well-specified set of steps. However, we hope to show you that this “formal” procedure has a generous helping of common sense supporting it.

Start with: A Pair of Hypotheses

In a formal hypothesis test, hypotheses are always statements about population parameters. Remember, you would never make a guess at something you already know: hypotheses are always about parameters, not statistics!
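Returning to the tire story for a moment, the professor's 1.6% figure can be checked with a short calculation. This is only a sketch: it assumes each student guesses independently and picks each of the four tires with equal probability, and the function name prob_all_match is made up for illustration.

```python
from fractions import Fraction

def prob_all_match(n_students: int, n_tires: int = 4) -> Fraction:
    """Chance that every student names the same tire when all are guessing.

    Assumption: guesses are independent and uniform over the tires.
    The first student may name any tire; each of the others must then
    match it, which happens with probability 1 / n_tires.
    """
    return Fraction(1, n_tires) ** (n_students - 1)

p = prob_all_match(4)
print(p, float(p))  # 1/64 0.015625
```

Rounded to one decimal place, 1/64 is the 1.6% quoted in the story.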
Hypotheses come in mutually exclusive pairs:

The null hypothesis, which we write H0 (and pronounce “H-naught” or simply “the null hypothesis”), is the conservative, status-quo, business-as-usual statement about a population parameter. In the context of researching new ideas, the null hypothesis often represents “no change,” “no effect,” or “no difference.”

The alternative hypothesis, Ha (pronounced “H-A”), is the research hypothesis. It is usually a statement about the value of a parameter that we hope to demonstrate is true.

The most important step of a formal hypothesis test is choosing the hypotheses. In fact, there are really only two steps of a formal hypothesis test that a computer cannot do, and this is one of those steps. (The other step is checking to make sure that the conditions necessary for the probability calculations to be valid are satisfied.)

Hypothesis tests are like criminal trials. In a criminal trial, two hypotheses are placed before the jury: the defendant is not guilty, or he is guilty. These hypotheses are not given equal weight, however. The jury is told to assume the defendant is not guilty until the evidence suggests this is not so. (Defendants charged with a crime in Canada must be found guilty “beyond all reasonable doubt.”)

The null hypothesis is, by default, assumed true. This means that when no substantial evidence is provided, the null is “not dethroned.” This does not guarantee the truthfulness of the null; it means only that, at the moment, we can’t discount it. Should we then claim H0 is true? Or that we failed to show H0 is false?

Hypothesis tests follow the same principles. The statistician plays the role of the prosecuting attorney, who hopes to show that the defendant is guilty.
The hypothesis that the statistician or researcher hopes to establish, called the ‘claim,’ plays the role of the alternative hypothesis. The null hypothesis is chosen to be a neutral, noncontroversial statement. Just as in a jury trial, where we ask the jury to believe that the defendant is not guilty unless the evidence against this belief is overwhelming, we will believe that the null hypothesis is true in the beginning. But once we examine the evidence, we may reject this belief if the evidence is overwhelmingly against it.

Using Mathematical Signs to Set Up Hypotheses

An extremely powerful tool for setting up the correct hypotheses is to consider the condition of the statement given as it relates to the following mathematical notation (=, ≤, ≥ or ≠, >, <).

Any sign that includes the equality (=, ≤, ≥) must be located in the null hypothesis!

Conversely, the corresponding non-equal sign (≠, >, <) must be located in the alternative hypothesis!

The null hypothesis always contains the equality. It is also acceptable to always write it with only the equal sign.

Consider the following statements. In each, state the appropriate null and alternative hypotheses.

(a) One flips a coin n times to test the claim that the coin is a “fair” coin.
    H0: p = 0.5    Ha: p ≠ 0.5

(b) A doctor claims that the average cost of a magnetic resonance imaging (MRI) test is less than $1,200.
    H0: μ ≥ 1200    Ha: μ < 1200

(c) An economist wishes to test whether the March unemployment rate in Alberta (the percentage of able-working Albertans who are unemployed) is higher than February’s Alberta unemployment rate.

(d) Researchers believe that a new chemotherapy treatment will prolong the lifetime of patients afflicted with liver cancer.
The mean survival time of liver cancer patients using the current chemotherapy regime is 43 months.
    H0: μ ≤ 43 months    Ha: μ > 43 months

(e) An epidemiologist wants to see if better detection methods and improved treatment has ...

If p-value > α, we conclude that the null hypothesis is supported: “we fail to reject H0” (FRH0). This implies that if H0 is true, we got a typical response.

If p-value < α, we conclude that the null hypothesis is not supported: “we reject H0” (RH0). This implies that if H0 is true, something rare has occurred.

It is improper (maybe even impolite/incorrect!) to say that you have “accepted” the null hypothesis when your p-value is bigger than 0.05. Instead, we say “We have failed to reject H0” or “We cannot reject H0.” The reason for this is that several factors might make it difficult to determine whether the null hypothesis is false.

Coming to Incorrect Decisions

Mistakes are an inevitable part of the hypothesis-testing process. The trick is not to make them too often. (We are not talking about mathematical mistakes here; we are referring to mistakes of conclusion.)

One mistake we might make is to reject the null hypothesis when it is true. This is called a Type I error. For example, recall that the default or null hypothesis of a court trial is that the defendant is innocent. If a truly innocent defendant is found guilty, this is considered a Type I error.

However, we might mistakenly fail to reject the null hypothesis when it is actually false. This is called a Type II error. In our example, this would be described as finding a truly guilty defendant innocent.
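To make the two kinds of error concrete, here is a rough sketch that computes both probabilities for a right-tailed test of a proportion. Everything specific in it is an assumption chosen for illustration and not taken from these notes: the hypotheses H0: p = 0.5 versus Ha: p > 0.5, the sample size n = 100, the "true" proportion 0.6, and the function name error_rates. It also relies on the normal approximation for the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # the N(0, 1) distribution

def error_rates(alpha, p0, p_true, n):
    """Return (P(Type I error), P(Type II error)) for H0: p = p0 vs Ha: p > p0.

    Assumptions (illustrative only): the sample proportion is approximately
    normal, and p_true is the real population proportion when H0 is false.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # We reject H0 whenever the sample proportion lands above this cutoff.
    cutoff = p0 + z_crit * sqrt(p0 * (1 - p0) / n)
    se_true = sqrt(p_true * (1 - p_true) / n)
    # Type II error: the sample proportion stays below the cutoff although H0 is false.
    type2 = NormalDist(p_true, se_true).cdf(cutoff)
    return alpha, type2

for alpha in (0.05, 0.01):
    t1, t2 = error_rates(alpha, p0=0.5, p_true=0.6, n=100)
    print(f"P(Type I) = {t1:.2f}   P(Type II) = {t2:.3f}")
```

Under these assumptions, demanding a smaller chance of a Type I error raises the chance of a Type II error, since the cutoff for rejecting H0 moves farther from p0.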
            H0 True                        H0 False
RH0         α = significance level         1 − β = Power
            = P(RH0 | H0 true)             = P(RH0 | H0 false)
            = P(Type I error)              no error
FRH0        1 − α = confidence level       β = P(FRH0 | H0 false)
            = P(FRH0 | H0 true)            = P(Type II error)
            no error

Calculating the power is tricky and somewhat complex, in part because it requires that we know the true value of the population proportion. We leave this calculation to a future statistics course. For now, be aware that if you do a hypothesis test and do not reject the null, then there is always the chance that you have made a mistake because your power is too low. You simply don’t have enough evidence to tell the difference between the null hypothesis and the truth.

The Tradeoff between Significance Level and Power

We are free to choose any value we wish for the significance level. Typically, we set this probability at 0.05, but sometimes we go as low as 0.01. But why don't we make it arbitrarily small? Say, 0.0000001? That way, we'd almost never make this mistake.

We can’t make the significance level as small as we would like because we have a price to pay. The price is that if we make the significance level smaller, then the power gets smaller too!

To see this, think about our criminal justice example. We can make the significance level, the probability of convicting an innocent man, α = 0 by following a simple rule: Free every defendant. If everyone goes free, then it is impossible to convict an innocent person because you are convicting no one. But now the power, the probability of correctly convicting a guilty person, is 1 − β = 0%. Know that although α and β are inversely related, 1 − α is not how we calculate β.

Clearly we have a lot of p symbols; let's review their meanings...

Hypothesis Testing for Categorical Variables, Proportions

A slight modification to the CLT.
Test Statistic

$$ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} $$

where p is the population proportion, p̂ is the sample proportion, and p0 represents the hypothesized proportion.

Example: If the number of people sampled was 100 000 and the number of ‘successes’ was 25 214, could you conclude the population proportion (i) was 0.25? (ii) was less than 0.25? (iii) was more than 0.25?

Understanding how to interpret the p-value is crucial to understanding hypothesis testing. The computer might compute the p-value for you, but you need to understand how the computer did this calculation if you are to successfully perform a hypothesis test.

The p-value is all about extremes. The meaning of the phrase “as extreme as or more extreme than” depends on the alternative hypothesis. There are three basic pairs of hypotheses:

    Two-tailed        Left-tailed       Right-tailed
    H0: p = p0        H0: p = p0        H0: p = p0
    Ha: p ≠ p0        Ha: p < p0        Ha: p > p0

If the alternative hypothesis is Ha: p ≠ p0 (the true value of p is either bigger or smaller than what the null hypothesis claims), then “as extreme as or more extreme than” means “even farther away from 0 than the value you observed.” This corresponds to finding the probability in both tails of the N(0, 1) distribution. This is called a two-tailed hypothesis.

If the alternative hypothesis is Ha: p < p0 (the true value is less than the value claimed by the null hypothesis), then “as extreme as or more extreme than” means “less than or equal to the observed value.” This corresponds to finding the probability in the left tail of N(0, 1). This is an example of a one-tailed hypothesis.
This alternative is also sometimes called a left-tailed or lower-tailed hypothesis, because the p-value area is in the left tail.

Finally, if the alternative hypothesis is Ha: p > p0 (the true value is greater than the value claimed by the null hypothesis), then “as extreme as or more extreme than” means “greater than or equal to the observed value.” This corresponds to finding the probability in the right tail of N(0, 1). This is another one-tailed hypothesis. This alternative hypothesis is called a right-tailed or upper-tailed hypothesis.

The p-value is always the tail area(s) of the curve; which side corresponds to the sign in the alternative hypothesis!

Example: BRCA1 is a gene that has been linked to breast cancer. It is believed that not all women who have a family history of breast cancer have the BRCA1 mutation. Researchers used DNA analysis to search for BRCA1 mutations in 169 randomly chosen women. Of the 169 women tested, 27 had the BRCA1 mutation. Does this data suggest that more than 12% of women with a family history of breast cancer have the BRCA1 mutation?

(a) State the appropriate null and alternative hypotheses.
    H0: p ≤ 0.12    Ha: p > 0.12

(b) Find the value of the test statistic.
    p̂ = 27/169 ≈ 0.1598, so z = (0.1598 − 0.12) / sqrt(0.12(0.88)/169) ≈ 1.59

(c) Find the P-value.
    P-value = P(Z ≥ 1.59) = 1 − 0.9441 = 0.0559 > 0.05, so we fail to reject H0. Based on this data, we cannot conclude that the proportion of women with a family history of breast cancer who have the BRCA1 mutation is greater than 0.12.

Hypothesis Testing for Numerical Variables, Means

Test Statistic

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$

where μ0 represents the hypothesized average or mean. Use t when σ is unknown, which is pretty much all the time.
Use z instead when σ is known, which is pretty much never.

You will always use one of the following three pairs of hypotheses for the one-sample t-test:

    Two-tailed        Left-tailed       Right-tailed
    H0: μ = μ0        H0: μ = μ0        H0: μ = μ0
    Ha: μ ≠ μ0        Ha: μ < μ0        Ha: μ > μ0

[Figure: the p-values for each alternative hypothesis, all using the same t-statistic value of t = 2.1 and the same sample size of n = 30.]

Example: Hypothesis Testing of the Population Mean μ

In a study conducted by Patel et al., the immune system status was measured on 15 randomly selected people who are infected with HIV. One’s immune system status is measured by a CD4 count. Higher CD4 values are associated with better immune system function.

16, 324, 256, 536, 321, 190, 818, 355, 465, 519, 87, 108, 190, 573, 1032

Using MINITAB, the mean, median, and standard deviation were computed and are provided below.

Descriptive Statistics: CD4

Variable   N    Mean    StDev   Q1      Median   Q3
CD4        15   386.0   279.2   190.0   324.0    536.0

Can you infer from this data that the mean CD4 count of all those infected with HIV is less than 500?

(a) State the appropriate null and alternative hypotheses.
    H0: μ ≥ 500    Ha: μ < 500

(b) Find the value of the test statistic. Assume the CD4 counts of people infected with HIV are Normally distributed.
    t = (386.0 − 500) / (279.2/√15) ≈ −1.58, with n − 1 = 14 degrees of freedom

(c) Find the P-value.
    0.05 < P-value < 0.10

(d) Provide a conclusion, testing at a level of significance of 5%.
    Since the P-value exceeds 0.05, we fail to reject H0. From this data, we cannot infer that the mean CD4 count of all those infected with HIV is less than 500.

More Examples:

Example: An office manager believes the amount of time spent by male office workers playing fantasy football at work is a normally distributed random variable with a mean exceeding 23 minutes per week. The office manager randomly selected 18 male office workers (known to participate in such gaming) and had their ‘fantasy football website’ activity monitored for a week. The amount of time each male spent over the course of the week is given below, to the nearest minute.
35, 23, 48, 13, 29, 9, 44, 11, 17, 30, 21, 42, 32, 37, 28, 43, 34, 48

The mean, median, and standard deviation of this sample are: x̄ = 30.22, x̃ = 31, s = 12.47

Does this sample support the office manager's belief? Assuming that the amount of time a male in this particular office spends per week playing fantasy football is normally distributed, conduct the appropriate statistical test using a level of significance of 5%.

Example: Premier Redford’s approval rating was at 32% in September 2013. That is, 32% of Albertans approved of her performance as the Premier of Alberta. In a Leger survey, one thousand Alberta residents were randomly chosen between February 24th and 28th, ... ‘approved’ of Premier Redford’s performance as premier.

Does this recent poll indicate that Premier Redford’s approval rating has declined since September 2013? Test at α = 0.05.
    H0: p ≥ 0.32    Ha: p < 0.32

Example: A cigarette manufacturer claims that the average amount of nicotine in a certain brand of cigarette is 1.5 milligrams. A random sample of ... cigarettes of this particular brand was taken; the amount of nicotine in each cigarette was measured. The average and standard deviation of this sample were 1.6 mg and 0.2 mg.

Does this data suggest that, on average, the amount of nicotine in this brand of cigarette is equal to 1.5 milligrams? Use α = 0.05.
    H0: μ = 1.5    Ha: μ ≠ 1.5
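The one-sample t-tests in these examples can be checked numerically. The sketch below applies the t statistic to the fantasy football times listed above, taking the hypothesized mean to be 23 minutes as printed in that example's statement. The function names (t_pdf, t_cdf, one_sample_t) are made up for illustration, and the tail area is found by numerically integrating the t density with Simpson's rule, which is one convenient way to avoid a t-table and not necessarily how statistical software does it.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def one_sample_t(data, mu0, alternative):
    """Return (t statistic, p-value); alternative is 'less', 'greater' or 'two-sided'."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    left = t_cdf(t, n - 1)
    if alternative == "less":
        p = left                       # left-tail area
    elif alternative == "greater":
        p = 1 - left                   # right-tail area
    elif alternative == "two-sided":
        p = 2 * min(left, 1 - left)    # both tails
    else:
        raise ValueError("unknown alternative")
    return t, p

times = [35, 23, 48, 13, 29, 9, 44, 11, 17, 30, 21, 42, 32, 37, 28, 43, 34, 48]
t_stat, p_value = one_sample_t(times, mu0=23, alternative="greater")
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```

The same function handles the other two alternatives by changing the alternative argument, for example "less" for a left-tailed test.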
Example: An Ipsos-Reid poll conducted in early September 2013 suggested that the percentage of Canadians who will use their cell phones while they are driving is 19%. A random sample of 60 drivers in the City of Calgary is taken, of which ... were observed to have a cell phone in their hand while driving.

Does this sample suggest that the percentage of Calgary drivers who use a cell phone while operating a vehicle is greater than the national percentage? Test using α = 0.01.
    H0: p = 0.19    Ha: p > 0.19
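The tests for a proportion in these notes all follow the same recipe, so a small sketch may help tie them together. It is applied here to the earlier example with 100 000 people sampled and 25 214 successes, testing against 0.25 under each of the three alternatives. The function name one_prop_z and its arguments are made up for illustration; the calculation assumes the normal approximation is appropriate.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z(successes, n, p0, alternative):
    """Return (z statistic, p-value) for a one-sample test of a proportion.

    alternative is 'less', 'greater' or 'two-sided', matching the sign in Ha.
    """
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    left = NormalDist().cdf(z)           # area to the left of z under N(0, 1)
    if alternative == "less":
        p = left                         # left tail
    elif alternative == "greater":
        p = 1 - left                     # right tail
    elif alternative == "two-sided":
        p = 2 * min(left, 1 - left)      # both tails
    else:
        raise ValueError("unknown alternative")
    return z, p

for alt in ("two-sided", "less", "greater"):
    z, p = one_prop_z(25_214, 100_000, 0.25, alt)
    print(f"{alt:>9}: z = {z:.2f}, p-value = {p:.3f}")
```

Only the tail changes between the three calls; the z statistic is the same each time, which is why the choice of alternative hypothesis has to be made before looking at the p-value.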
