Many elementary concepts have been skipped. At this stage, it is assumed that you should know them well. In particular, you MUST know how to do HATPC for each of the 8½ hypothesis tests. Only important things, or those that inter-connect several topics together, are elaborated here. You have ABSOLUTELY NO hope of passing STAT170 if you do not know the 8½ HATPCs. This PP file will NOT push you from F to P. The contents of this file will only help the P or above students, given the presumed basic knowledge.
Binding things together
Review of: • 5 types of graphics • 5 types of research questions • 8½ statistical tests • 8 or MORE types of reports
Displaying Data: 5 types of graphics
(The following table conveys the same information as the previous slide.)
Combination of variable(s): Graphic
• One categorical (Lecture 2, 11): bar chart or pie chart
• One numerical (Lecture 2, 7): histogram or stem-and-leaf plot
• Two categorical (Lecture 2, 11, 12): clustered bar chart
• Two numerical (Lectures 2, 9 & 10): scatter plot
• One categorical and one numerical (Lecture 2, 8): comparative box plots
5 types of graphics
STAT170 is restricted to only 5 types of combinations of variables, 5 different types of graphics, and 5 possible research questions.
The most important step is correctly identifying the types of variables: NUMERICAL vs CATEGORICAL. Surprisingly, many students have difficulty in this very first step.
The correct/wrong identification of variables would lead you to the correct/wrong:
• Type of graphic
• Research question, and
• Statistical test.

How to comment on graphics:
1. Comments on a single histogram (or stem-and-leaf plot)
1. Comment on shape (skewed left/right, normal)
2. Range from xxxx to xxxx
3. Majority (high frequencies) of data about xxxx
4. Comment on outliers (if present)
5. Comment on any unusual features (if present)
[Figure: histogram, horizontal axis 0 to 30, frequency axis 0 to 500.]

Assessment Example:
[Figure: histogram of Individual Days, horizontal axis 0 to 12, frequency axis 0 to 100.]
• U-shaped, high frequencies near both ends, lowest frequencies near the centre
• U-shaped, but slightly skewed left
• Range from 0 to 12

2. Comments on a single bar chart (seldom asked)
Comment depends on whether variable is ordinal or nominal
• Ordinal: comment similar to histogram
• Nominal: comment on which categories have the highest and lowest frequencies
[Figure: bar chart of diet (meat, vegetarian, vegan), frequency axis 0 to 400.]
"Skewed to the right." This doesn't make any sense!
3. Comments on clustered bar charts
Compare the shapes of the clusters, NOT the sizes.
• Shapes (not size) similar ⇒ The 2 variables independent (ie have no association) (since % are the same)
• Shapes (not size) not similar ⇒ The 2 variables not independent (ie have association) (because % are not the same)

Comments on clustered bar charts: explanation
Never compare the actual frequencies (sizes). Only compare % (or proportions) (shapes).
Since proportions are almost the same, ie about 1/3 and 2/3 for smokers and non-smokers, smoking status is independent of Activity Level (no association).

4. Comments on comparative boxplots
• Compare medians
• Compare spread (IQR)
• Compare outliers (Even when there are no outliers, say "no outliers".)
[Figures: comparative boxplots by Class (day, evening); recoverable axis labels are Age and UAI.]

5. Comments on scatter plot
• Comment on linear/curved?
• Positive or negative slope?
• Comment on amount of scatter (big or small?)
• Comment on outliers, if any
• Comment on residuals: symmetric on both sides of the line/normal? Constant SD?
[Figures: scatter plots; recoverable axis labels include Birth Rate, Median Age, husband age and GPA.]
Comments on clustered bar charts: explanation
Never compare the actual frequencies (sizes). Only compare % (or proportions) (shapes).
[Figures: two pairs of clusters, one pair similar in shape (although different sizes), one pair different in shape (although same size).]
Since percentages of smokers and non-smokers are obviously different for males and females, there is an association between smoking status and gender.

The 8½ hypothesis tests in STAT170
DATA
• categorical alone: bar chart, z-test of proportion or chi sq test of proportions
• categorical with categorical: clustered bar chart, chi sq test of association + OR
• categorical with numerical: comparative boxplots, 2-sample t test
• numerical alone: histogram, 1-sample z or t test
• numerical with numerical: scatter plot, t-test of β
Note: 7 tests above + paired t-test + "OR" = 8½ tests in STAT170.

Determining numerical vs. categorical
You only need to be able to identify between numerical and categorical. No need to further classify into continuous or discrete (=integer), nor further classify into nominal or ordinal. If you cannot distinguish between nominal and ordinal, you'll only lose a few marks in Q.1. But how about numerical vs categorical? See next slide.
Example: Numerical vs Categorical
• Age: age in years ⇒ Numeric (continuous) ⇒ Histogram / stem-leaf ⇒ z-test or t-test
• Age: 0-12 children (1), 13-18 teenager (2), > 18 adult (3) ⇒ Categorical (ordinal) ⇒ bar chart / pie chart ⇒ Chi sq test of proportions (GOF test)
The key is to look at the definition, not the meaning we use in daily language. Read the question! The results are unchanged if we use the names ABC or XYZ instead of AGE.

A mistake will cost you at least 6 marks in HATPC, plus other marks in subsequent parts of the questions. No one can help you … How many such mistakes can you afford to make in exam? 3 such mistakes => you'll fail in STAT170.
You have absolutely no hope of passing STAT170 if you cannot distinguish between numerical and categorical variables, since the whole philosophy of STAT170 is based on classifying categorical and numerical variables. (This is unlike other 1st-year stat courses in other universities.)

Absolute bottom line:
1. HOW MANY variables?
2. Are the variables numerical or categorical?
Answering these 2 questions correctly will lead you to one of the 5 cases, and almost the correct test. The HATPC is then, hopefully, bookwork.

How students fail?
But many students already have trouble in the first question: How to determine how many variables are there?
For example:
[Figure: bar chart of frequency (0 to 400) by response (same sex, opposite sex, either) to "Who do you find it easier to make friends with?"]
How many variables are there? 3 or 1?
Think of the survey. How many questions? 3 or 1? How many columns do you need to store the data? 3 or 1?
You are doomed if you choose "3 variables". In fact there is no test in STAT170 that involves 3 variables.
How students fail?
Another example: How many variables are there?
Smoker: Male 4, Female 5
Non-smoker: Male 11, Female 8
1, 2 or 4? You are doomed if you choose "4 variables".

Getting a pass in STAT170
You need to be able to do ALL of the following:
1. Count how many variables
2. Identify the variables as numerical or categorical
3. Do ALL 8½ hypothesis tests
You will fail in STAT170 if you cannot do just ONE of them! (In fact, if you can do ALL of them well, a Cr is guaranteed.)

Beware of the paired t-test
The paired t-test may be mistaken as:
• 2-sample t-test
• Regression
Read the given Research Question. If you see "relation" or "predict" => regression. If you see "difference" => 2-sample t or paired t. Then think! Eg: Weight loss program? Y1=Wt before, Y2=Weight after.

How to determine the appropriate test
Variable(s) | Graphics | Research Question (e.g.) | Answering the research Q: Formal stat test
One categorical | Barchart, pie chart | Is the proportion of smokers equal to 0.3? Are the proportions of meat-eaters, vegetarians & vegans equal to 0.8, 0.15 & 0.05? | z-test of proportion (Lect 7), 2 categories only; χ2 test of proportions (GOF) (Lect 11), 2 or more categories
One numerical | Hist, stem-leaf, boxplot | Is the mean equal to …? | z and t-tests of mean (Lect 7)
Two categorical | Clustered barchart | Is there an association between … and …? | Chi sq test of association (Lect 11, 12) or Odds ratio
Two numerical | Scatter plot | Is there a relation between … and …? | Regression analysis: Test of slope (Lect 9, 10)
One categ (binary) & one numeric | Comparative boxplots | Is there a diff in heights between males and females? | 2-sample t-test (Lect 8)
Note:
1. There is the paired t-test which doesn't fit in any of 5 cases above; perhaps it fits best in the 2nd case (one sample t-test).
2. 7½ tests above + paired t-test = 8½ hypothesis tests in STAT170
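The table of variable combinations above can be sketched as a simple lookup. This is only an illustration: the function name choose_test and the exact wording of the returned strings are assumptions made here, not part of the course notes, and the paired t-test still has to be spotted by reading the research question.

```python
# Hypothetical helper (not from the course notes): maps the types of the
# variables in a question to the graphic and test listed in the table.
LOOKUP = {
    ("categorical",): (
        "bar chart or pie chart",
        "z-test of proportion (2 categories) or chi-sq test of proportions (GOF)",
    ),
    ("numerical",): (
        "histogram or stem-and-leaf plot",
        "1-sample z-test or t-test",
    ),
    ("categorical", "categorical"): (
        "clustered bar chart",
        "chi-sq test of association",
    ),
    ("numerical", "numerical"): (
        "scatter plot",
        "regression: t-test of slope",
    ),
    ("categorical", "numerical"): (
        "comparative box plots",
        "2-sample t-test",
    ),
}


def choose_test(variable_types):
    """Return (graphic, test) for a list of 'categorical'/'numerical' labels."""
    key = tuple(sorted(variable_types))
    if key not in LOOKUP:
        # Covers the "3 variables" and "4 variables" mistakes as well.
        raise ValueError("no STAT170 test for this combination of variables")
    return LOOKUP[key]


# Sex (categorical) and height (numerical): comparative box plots, 2-sample t.
print(choose_test(["numerical", "categorical"]))
```

Counting the variables first matters: passing three or four labels raises an error, which mirrors the "you are doomed" cases on the slides.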
How to determine the appropriate test
Method 1
The ONLY SURE way to determine the correct test is to identify the variable types correctly!
Method 2
IF you cannot do (1), then you may look for keywords in the research questions.
• "Proportion" (singular!), "percentage" => Z-test of proportion
• "Proportions" (plural), "percentages" => Chi-sq test of proportions (GoF)
• "mean", "average" => One-sample z-test or t-test
• "association" => Chi-sq test of association
• "relation", "predict" => Regression (with t-test on slope)
• "difference" => 2-sample t-test, or paired t-test
See the underlined keywords in the previous slide. This is almost 100% certain. But be warned it is NOT 100% fool-proof!
Eg: Are proportions of smokers the same for males and females? => Chi-sq test of association

How to determine the appropriate test (continued)
Method 3 (Easiest for you)
Look at the given graphic, then deduce the appropriate test. This is almost certain, but many questions do NOT show graphs!
• ONE histogram/stem-leaf => z-test or t-test or paired t
• Bar chart/pie chart => chi-sq test of proportions (GOF) (if binary, GOF or z-test of proportion)
• Clustered bar chart => chi-sq test of association
• Scatter plot => regression: test of slope
• TWO histograms/stem-leafs and/OR comparative box plots => 2-sample t

3 types of statistical tests involving categorical data
z-test of proportion
• Keywords in Res. Q: Proportion, %
• Ho: π = π0
• Assumptions: nπ0 ≥ 5, n(1-π0) ≥ 5
• Test statistic: $z = \frac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0) / n}}$
Chi sq goodness of fit (chi sq test of proportions)
• Keywords in Res. Q: Proportions, percentages (plural)
• Ho: π1=…, π2=…, π3=…
• Assumptions: Ei = n*πi ≥ 5
• Test statistic: $\chi^2 = \sum_j \frac{(O_j - E_j)^2}{E_j}$, df = c-1
Chi sq test of association (independence)
• Keywords in Res. Q: association
• Ho: X and Y are independent (no association)
• Assumptions: Eij = (row tot × col tot) / grand total ≥ 5
• Test statistic: $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, df = (r-1)(c-1). Read from computer output.

3 types of statistical tests involving categorical data (CONTINUED)
Conclusion (NOT reject Ho): Copy Ho + "could be". Conclusion (reject Ho): Opposite of Ho + is higher/lower.
z-test of proportion
• Ho: π = π0
• 95% C.I.: $p \pm 1.96 \sqrt{p (1 - p) / n}$
• Conclusion (NOT reject Ho): Proportion π could be equal to π0.
• Conclusion (reject Ho): Proportion π is higher/lower than π0.
Chi sq goodness of fit
• Ho: π1=…, π2=…, π3=…
• 95% C.I.: -----
• Conclusion (NOT reject Ho): The proportions π1=…, π2=…, π3=… COULD be correct.
• Conclusion (reject Ho): The proportions π1=…, π2=…, π3=… are NOT correct.
Chi sq test of association
• Ho: X and Y are independent
• 95% C.I.: -----
• Conclusion (NOT reject Ho): X and Y COULD be independent (not associated).
• Conclusion (reject Ho): X and Y are dependent (associated).
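As a rough check on the categorical-data formulas above, the sketch below computes the z-test of proportion (with its 95% CI) and the chi-sq statistic of association using only the Python standard library. The function names are made up for this illustration, the count of 36 smokers out of 100 is hypothetical, and the 2×2 table is the smoker/sex table used elsewhere in these slides. P-values are left out because they would normally be read from tables or computer output.

```python
import math


def z_test_proportion(successes, n, pi0):
    """z statistic and 95% CI for Ho: pi = pi0 (hypothetical helper)."""
    # Assumption check uses pi0 (pi goes with pi).
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("need n*pi0 >= 5 and n*(1 - pi0) >= 5")
    p = successes / n
    z = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)  # CI uses p (p goes with p)
    return z, (p - half_width, p + half_width)


def chi_sq_association(table):
    """Chi-sq statistic, df and expected counts for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    # E_ij = row total * column total / grand total
    expected = [[r * c / grand for c in col_tot] for r in row_tot]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return chi2, df, expected


# Hypothetical data: 36 smokers in a sample of 100, Ho: pi = 0.3.
z, ci = z_test_proportion(36, 100, 0.3)
print(round(z, 2), [round(x, 3) for x in ci])

# Rows: smoker, non-smoker; columns: male, female.
chi2, df, expected = chi_sq_association([[4, 5], [11, 8]])
print(round(chi2, 3), df, min(min(row) for row in expected) >= 5)
```

For the small smoker table some expected counts fall below 5, so the last value printed is False: the assumption has to be checked before the statistic is used.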
5 types of statistical tests involving continuous data
1-sample z-test of mean
• Keywords in Res. Q: Mean, average
• Ho: µ=µ0 (σ known)
• Assumptions: Normal population, or n ≥ 25 (CLT)
• Test statistic: $z = \frac{\bar{y} - \mu_0}{\sigma / \sqrt{n}}$
1-sample t-test of mean
• Keywords in Res. Q: Mean, average
• Ho: µ=µ0 (σ unknown)
• Assumptions: Normal population, or n ≥ 25 (CLT)
• Test statistic: $t = \frac{\bar{y} - \mu_0}{s / \sqrt{n}}$, df = n-1
Paired t-test
• Keywords in Res. Q: difference
• Ho: µd=µ0
• Assumptions: Difference from normal popn, or n ≥ 25 (CLT)
• Test statistic: $t = \frac{\bar{y}_d - \mu_d}{s_d / \sqrt{n}}$, df = n-1, with µd set to its hypothesised value µ0
2-sample t-test
• Keywords in Res. Q: difference
• Ho: µ1=µ2
• Assumptions: Both groups from normal popn, same SD
• Test statistic: $t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, df = n1+n2-2
Test of linear relation between 2 variables
• Keywords in Res. Q: Relation, predict
• Ho: β=0
• Assumptions: Linear, Res normal, Res const SD
• Test statistic: $t = b / SE_b$, df = n-2

5 types of statistical tests involving continuous data (CONTINUED)
Conclusion (NOT reject Ho): Copy Ho + "could be". Conclusion (Reject Ho): Opposite of Ho + is higher/lower.
1-sample z-test of mean, Ho: µ=µ0 (σ known)
• 95% CI: $\bar{y} \pm 1.96 \, \sigma / \sqrt{n}$
• NOT reject Ho: Ave xxx COULD be equal to µ0
• Reject Ho: Ave xxx is higher/lower than µ0
1-sample t-test of mean, Ho: µ=µ0 (σ unknown)
• 95% CI: $\bar{y} \pm t_{n-1} \, s / \sqrt{n}$
• NOT reject Ho: Ave xxx COULD be equal to µ0
• Reject Ho: Ave xxx is higher/lower than µ0
Paired t, Ho: µd=µ0
• 95% CI: $\bar{y}_d \pm t_{n-1} \, s_d / \sqrt{n}$
• NOT reject Ho: The difference COULD be µ0 on ave
• Reject Ho: The difference is higher/lower than µ0 on ave
2-sample t, Ho: µ1=µ2
• 95% CI: $(\bar{y}_1 - \bar{y}_2) \pm t_{\nu} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
• NOT reject Ho: There COULD be no difference between ave xxx and ave xxx
• Reject Ho: Ave xxx is higher/lower than ave xxx
Test of slope, Ho: β=0
• 95% CI: $b \pm t_{n-2} \, SE_b$
• NOT reject Ho: There COULD be no relation between X & Y
• Reject Ho: There is a positive/negative relation.
In ALL hypothesis tests, include CI in the conclusion.

Examples of the 8 HATPCs?
It is assumed that you know them well at this stage. There are tons of examples of EACH in Lecture and Tutorial notes.
You have absolutely no hope of passing STAT170 if you cannot do the 8 HATPCs, since hypothesis tests, and related questions, span more than 60% of exam materials.

8 types of reports
Simple Reports: involve 1 hypothesis test only
1. One sample t-test (See Tutorial 8)
2. One-sample z-test
3. Paired t-test
4. 2-sample t-test
5. Z-test of proportion
6. Regression
7. Chi-sq test of proportions
8. Chi-sq test of independence (See Lect 13)
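The 1-sample and 2-sample t statistics above can be checked numerically with a short sketch like the one below. The function names and the tiny data sets are invented for illustration, and the pooled SD uses the usual pooled-variance formula, which is assumed here to be the same s_p as on the formula sheet. Only the test statistic and df are computed; the critical value t_ν still comes from the t-table.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic and df for Ho: mu = mu0 when sigma is unknown."""
    n = len(sample)
    y_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD (divides by n - 1)
    return (y_bar - mu0) / (s / math.sqrt(n)), n - 1


def two_sample_t(sample1, sample2):
    """Pooled 2-sample t statistic and df for Ho: mu1 = mu2."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
    # Assumed standard pooled SD: weights each variance by its df.
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    return diff / (s_p * math.sqrt(1 / n1 + 1 / n2)), n1 + n2 - 2


# Hypothetical data, small enough to verify by hand.
print(one_sample_t([1, 2, 3, 4, 5], mu0=3))
print(two_sample_t([1, 2, 3], [2, 3, 4]))
```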
Key points to write in the Simple report (Check list): 1 hypothesis test only
Introduction
*What this study is about, and why this study, if known
*Research question: any wording is OK
*Target population
Method
*How the sample was collected (why random and representative)
*Define variables
*Statistical method used
*Null hypothesis
*Justify assumptions [put under Method or Result, depending on the type of test]
Results (NO HATPC, NO calculations)
*Test statistic
*P-val, decision (reject/not reject null)
Conclusion
*Decision in words: There is evidence/no evidence … [Check that the research question is answered, e.g. 95% CI if appropriate.]
*Your conclusion should be almost the same (several sentences) as the conclusion you have in the proper hypothesis test (HATPC).
Note: It is most important that you identify the correct statistical method used (how???). For example, if it is a chi-sq test and you mention t-test, then the rest does not make sense, and you'll lose most of the marks, and your time!

Complex Reports: Involve several hypothesis tests
Reports involving hypothesis tests of the same type:
• regressions
• chi squares
• 2-sample t
Reports involving hypothesis tests of different types:
• 2-sample t & chi squares
• regressions & 2-sample t
Past papers named on this slide: SIBT 2008B, SIBT 2008C, SIBT 2009C, MQC 2009A, MQC 2009B, MQC 2010A, University 2007 Term 2, together with the sittings 2009A, 2009B, 2009C, 2010B, 2010C, 2011A and 2011B.
Note: No matter how complicated it may appear (many X's), there should only be ONE Y. (Several Y's would bring you to post-graduate level!)
Since so many (at least 5) cases are possible, it is stupid to copy a sample report (eg the one in Tute 8) in your crib sheet, since there are
• 8½ possible simple reports
• at least 5 complex reports
1st Example: SIBT 2008B exam (report question)
(I do not have a copy of the exam paper.)
Given 6 regressions (6 tables and 6 scatter plots): Y vs x1, y vs x2, … y vs x6
Research Question: Which variables X1, X2, … X6 are significant predictors for Y, and which best predicts Y?

2nd Example: SIBT 2008C exam
Research question: Which variables X1, X2, X3 and X4 affect Y?
[Figures: Y and X1, Y and X2, Y and X3, Y and X4.]

1st General Rule for COMPLEX report
Discard the bad variables:
• those where assumptions are violated: not valid.
• those whose p-val > 0.05 (ie those where Ho is NOT rejected, because null hypothesis represents no effect) (eg no difference in 2-sample t, no relation in regression, no association in chi-sq test)
Variable | P-val | Significant variable? (Reject Ho?) | Result
X1 | 0.01 | Yes (Reject Ho) | Keep X1
X2 | 0.08 | No (Not reject Ho) | ------------- (Discard X2)
X3 | 0.02 | Yes (Reject Ho) | Keep X3

1st General rule for COMPLEX report
Warning: Common mistake:
• P-val<0.05 => reject Ho => "reject the variable X". WRONG: Keep X
• P-val>0.05 => not reject Ho => "not reject variable X". WRONG: Discard X
Golden rule: You may avoid mentioning Ho!
• p-val<0.05 => Keep X (Small prob (<5%), alarm bell rings)
• P-val>0.05 => Discard X
Warning: If you misunderstand the above, the conclusion of your report will be exactly opposite of what it should be, and you will lose MANY marks!
2nd General rule for complex report
Sometimes the question may ask for the BEST variable that determines Y.
Do NOT compare the p-val of one type of graph with the p-val of another type of graph. (Compare an apple with an apple, compare an orange with an orange.)
Choose the best one within each group.
• Regressions: choose best X
• 2-sample t's: choose best X
• Chi squares: choose best X

What is the "best" X and how to choose it?
• In EACH set, choose the variable with the smallest p-val (ie the one that strongly rejects Ho), EXCEPT regression.
• For regression, choose the largest r2, not smallest p-val

2nd Example: SIBT 2008C exam
Research question: Which variables (X1, X2, X3 and X4) affect Y?
[Figures: Y and X1, Y and X2, Y and X3, Y and X4.]

Example of choosing/discarding variables
An example on regression to illustrate 1st general rule (r2 is needed for choosing the BEST variable(s)):
Variable | Assumptions satisfied? | P-val | r2 | Significant variable? (Reject Ho?) | Result
X1 | No | ---- | ---- | ---- | ----
X2 | Yes | 0.000 | 0.53 | Yes | ----
X3 | Yes | 0.006 | 0.67 | Yes | Best
X4 | No | ---- | ---- | ---- | ----
X5 | Yes | 0.07 | ---- | No (p-val>5%) | ----
Hence only X2 and X3 are significant (important) variables affecting Y. And X3 is the best predictor for Y.
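The two general rules, applied to the regression table above, can be sketched as follows. The dictionary layout and function names are assumptions made for this illustration; the numbers are the ones in the example table.

```python
# Rows of the example table; None stands for the "----" entries.
RESULTS = [
    {"x": "X1", "assumptions_ok": False, "p_val": None, "r2": None},
    {"x": "X2", "assumptions_ok": True, "p_val": 0.000, "r2": 0.53},
    {"x": "X3", "assumptions_ok": True, "p_val": 0.006, "r2": 0.67},
    {"x": "X4", "assumptions_ok": False, "p_val": None, "r2": None},
    {"x": "X5", "assumptions_ok": True, "p_val": 0.07, "r2": None},
]


def keep_variables(results, alpha=0.05):
    """1st general rule: discard invalid tests and those with p-val > alpha."""
    return [
        r for r in results
        if r["assumptions_ok"] and r["p_val"] is not None and r["p_val"] < alpha
    ]


def best_regression_predictor(results, alpha=0.05):
    """2nd general rule for regression: largest r2 among the kept variables."""
    kept = keep_variables(results, alpha)
    if not kept:
        return None
    return max(kept, key=lambda r: r["r2"])["x"]


print([r["x"] for r in keep_variables(RESULTS)])
print(best_regression_predictor(RESULTS))
```

Note that X2 has the smaller p-val but X3 has the larger r2, which is why the regression case is treated as the exception to the smallest-p-val rule.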
[Figures: Y vs Wt, Y vs Starts.]
Compare:
• Y vs WT: p-val = 0.00055
• Y vs STARTS: p-val = 0.0012
Both p-val < 0.05 => both Wt and Starts affect Y, but Wt has a stronger effect (because of smaller p-val).

[Figures: Y vs WIN, Y vs Payout.]
Compare:
• Y vs WIN: p-val = 0.5641
• Y vs PAYOUT: p-val = 0.0000
Hence WIN has no effect on Y. Payout has an effect.

Key points to write in the COMPLEX report (No rigid rules!)
Introduction
*Research question
*Target population
Method
*How the sample was collected (why random and representative)
*Define the Y and X variable(s)
*List ALL statistical methods used
*Check assumptions [put under Method or Result, depending on the type of test] in EACH case. (But AVOID lengthy repetitive checking the assumptions one by one.)
Results (NO HATPC, NO calculations)
*Discard poor ones (assumptions violated, or p-val>0.05) (AVOID lengthy repetitive checking p-val one by one.)
*IF required by the question, pick the best one within each group.
Conclusion
Answer the research question!
---------------------------------------------------------------
BTW, what is the research question like? Two possibilities:
• Which of the variables X1, x2, …, affect variable Y?
• Which of the variables X1, x2, …, BEST affect variable Y?
Hints and Tips: normal tables
1. 2-tailed normal table vs. 1-tailed normal table:
• 1-tailed: probability calculations
• 2-tailed: hypothesis testing
Suggestions: The FIRST thing you should do in exam, before you start writing anything, is (on the two z-tables):
(This applies to the HD students as well!)

Hints and Tips: t and tcrit
2. T statistic and the "tν" in C.I.
• The t-statistic is calculated (not read from tables)
• The "tν" in 95% CI is read from table (row ν and column 0.05)
(This applies to ALL t tests: 1-sample t, 2-sample t, paired t, t in regression slope.)
The SECOND thing you should do in exam, before you start writing anything, is (on the t-table):
(This applies to the HD students as well!)
Hints and Tips: chi sq table
3. You should only use the top few rows of chi sq.

Hints and Tips: y and y-bar
4. $z = \frac{y - \mu}{\sigma}$ vs. $z = \frac{\bar{y} - \mu}{\sigma / \sqrt{n}}$ in probability calculations:
Look for the keyword "mean" or "average" => y-bar.
Note that there are NO such formulas: $z = \frac{\bar{y} - \mu}{\sigma}$ and $z = \frac{y - \mu}{\sigma / \sqrt{n}}$

Hints and Tips: 2-sample t and paired t
5. Paired-t test vs. 2-sample t-test
No rules!
1st clue: Different n1 & n2 => CANNOT be paired t-test, must be 2-sample t-test.
If n1=n2 => either test is possible.
2nd clue: Ask yourself "Can I move the values of one variable without moving the corresponding values of the other variable?"
• Can move values of one variable => independent data => 2-sample t
• Cannot move values of one variable (need to move BOTH variables) => dependent data => paired-t
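To make the paired-t clue concrete, here is a minimal sketch that uses the ten mother/father age pairs from the Lect 13 age-difference example (Baby ID 21 to 30). The function name paired_t is invented for this illustration. It works on the differences within each pair, which is why the values cannot be shuffled independently, and it refuses unequal sample sizes, matching the 1st clue.

```python
import math
import statistics

# Ages for Baby ID 21..30 from the Lect 13 example.
MOTHER = [28, 34, 24, 34, 32, 24, 30, 29, 37, 41]
FATHER = [33, 40, 26, 45, 35, 27, 39, 27, 34, 46]


def paired_t(first, second, mu0=0.0):
    """Paired t statistic and df for Ho: mean difference = mu0."""
    if len(first) != len(second):
        # Different n1 and n2: cannot be a paired t-test.
        raise ValueError("paired data need equal sample sizes")
    diffs = [a - b for a, b in zip(first, second)]  # one difference per pair
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)
    return (d_bar - mu0) / (s_d / math.sqrt(n)), n - 1


t, df = paired_t(FATHER, MOTHER)
print(round(t, 2), df)
```

Swapping two fathers' ages without moving the matching mothers would change the differences and hence the statistic, which is the practical meaning of "move pairwise together".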
Hints and Tips: 2-sample t and paired t
2nd clue: From Lect 13: Age difference between husband and wife
Baby ID | Mother's age | Father's age
21 | 28 | 33
22 | 34 | 40
23 | 24 | 26
24 | 34 | 45
25 | 32 | 35
26 | 24 | 27
27 | 30 | 39
28 | 29 | 27
29 | 37 | 34
30 | 41 | 46
Can we swap the fathers of ages "33" and "46" WITHOUT moving the wives and the babies?
• Move alone => indep't => 2-sample
• Move pairwise together => paired t

Hints and Tips: z and t tests
6. Z-test vs. t-test
• Know population standard deviation σ => z-test
• Do NOT know σ => t-test
Clues:
* "It is known that SD=xxx" => likely σ => z-test
* Given numerical summary of data (MUST be sample): n xxxx, mean xxxx, StDev xxxx. The SD from a data set (sample) MUST be s, never σ => t-test
* Do watch out if both σ and s are given. Once we have σ, s is useless => use z-test.

Hints and Tips: CLT
7. This is a common mistake: "When sample size is large (n≥25), the sample is approximately normally distributed."
The statement means that if we make a histogram of the sample (n≥25), then the histogram should be approximately bell-shaped. This is NOT CLT, it is WRONG! We know that as sample size n becomes larger and larger, the sample histogram looks more and more like the population, which could be anything.
The correct statement of CLT is: "When sample size is large (n≥25), the sample mean (y-bar) is approximately normally distributed."
This applies to the one-sample z or t test, 2-sample t and the paired t-tests.

Tips and Hints: Which condition? nπ≥5 and n(1-π)≥5, or np≥5 and n(1-p)≥5?
8. You have to know which one is the correct condition for checking.
• Lect 5 (prob calculation on p) or Lect 7 (z-test on π): $z = \frac{p - \pi}{\sqrt{\pi (1 - \pi) / n}}$. Check nπ≥5 and n(1-π)≥5
• Lect 6: CI for π: $p \pm 1.96 \sqrt{p (1 - p) / n}$. Check np≥5 and n(1-p)≥5
Rule: p goes with p, π goes with π. p NEVER goes with π together.
Note that although the above 2 formulas are in the formula sheet, the 2 corresponding conditions are not.
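The correct CLT statement in item 7 can be illustrated with a small simulation. Everything in this sketch is an assumption chosen for illustration: the population is exponential with mean 1 and SD 1 (strongly skewed, so clearly not normal), n = 25, and the seed is fixed so the run is repeatable. Individual samples stay skewed like the population, while the sample means cluster around µ with SD close to σ/√n.

```python
import random
import statistics


def simulate_sample_means(n=25, reps=2000, seed=170):
    """Draw `reps` samples of size n from a skewed population; return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.mean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


means = simulate_sample_means()
# Centre should be near mu = 1 and spread near sigma / sqrt(n) = 1 / 5 = 0.2.
print(round(statistics.mean(means), 2), round(statistics.stdev(means), 2))
```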
Tips and Hints: pth percentile (including LQ)
9. Find pth percentile
(a) Given ANY sample of size n, use the formula: n*p/100 (Lect 2). Then check result is integer or non-integer etc.
Eg AGE: 12, 17, 28, 32, 33, 40, 40, 67 (MUST be sorted first!)
(b) Given population (of infinite size) of known (given) µ and σ:
(i) Given normal: Find z from the given area p (1-tailed). Then find y = µ+σ*z
Eg: "It is known that IQ is normally distributed with mean 100 and SD 15. What is the 10th percentile?" What is the LQ?
[Figure: normal curve centred at 100, with µ = 100 and σ = 15.]
(ii) non-normal (or unknown distribution): CANNOT do it!

Hints and Tips: Association
10.
Smoker: Male 4, Female 5
Nonsmoker: Male 11, Female 8
"No association/association between males and females."
"No association/association between smokers and nonsmokers."
(In fact, males, females, smokers and non-smokers are NOT variables.)
It should be: "Could be no association/There is association between Sex and Smoking Status."

Hints and Tips: Writing conclusion when Ho is NOT rejected
11. Many versions, hence students are confused. Eg in 2-sample t-test:
(1) There could be no difference …. (there is no strong evidence to indicate otherwise.)
(2) There is probably no difference …
(3) There is no significant difference …
(4) There is no evidence to indicate a difference …
All of the above are correct! Please stick to (1), which is easiest! (3) and (4) are double negatives, with which you may make mistakes, with (3) being terrible. Keep things simple!
Note that in (2) or (3), if you miss out 'probably' or 'significant', then "There is no difference …" is wrong (accepting the null hypothesis).
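For the normal-population case in item 9(b)(i), the rule y = µ + σ*z can be sketched with Python's standard library, which looks up z from a 1-tailed area in place of the printed table. The function names are invented for this illustration. For a sample, only the position n*p/100 is computed, because the integer/non-integer rule from Lect 2 is not spelled out on these slides.

```python
from statistics import NormalDist


def normal_percentile(p, mu, sigma):
    """pth percentile of a normal population: y = mu + sigma * z."""
    z = NormalDist().inv_cdf(p / 100)  # z with area p% to its left (1-tailed)
    return mu + sigma * z


def percentile_position(n, p):
    """Position n*p/100 in a SORTED sample of size n (Lect 2 formula)."""
    return n * p / 100


# IQ example: normal with mean 100 and SD 15.
print(round(normal_percentile(10, 100, 15), 2))  # 10th percentile
print(round(normal_percentile(25, 100, 15), 2))  # LQ

# AGE example: 8 sorted values, position of the LQ.
print(percentile_position(8, 25))
```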
Try the chi sq test of association:
Ho: There is NO association between X and Y
Suppose we do NOT reject Ho. Conclusion:
(1) There could be no association …. (there is no strong evidence to indicate otherwise.)
(2) There is probably no association …
(3) There is no significant association …
(4) There is no evidence of an association …
Again all of them are correct.

Writing conclusion in HATPC: the rules
Eg 2-sample t: "Ho: There is no difference in exam marks on average for boys and girls."
Eg chi sq test of association: "Ho: There is NO association between X and Y"
Eg regression: "Ho: β=0" (No relation between X and Y)
P-val<0.05 => Reject Ho
• Negate (make opposite) Ho
• Be certain, use the verb "is"
• Also give further info: "is greater/less than", "is longer/shorter" (eg one-sample or 2-sample t), except chi sq
P-val>0.05 => Do not reject Ho
• Copy Ho
• Change the verb "is" to "could be".

Writing conclusion in HATPC: Example 1
Eg 2-sample t: "Ho: There is no difference in exam marks on average between boys and girls."
P-val<0.05 => Reject Ho: "There is a difference in exam marks between boys and girls, with girls having a higher average than boys."
P-val>0.05 => Do not reject Ho: "There could be no difference in average exam marks between boys and girls."

Writing conclusion in HATPC: Example 2
Eg chi sq test of association: "Ho: There is NO association between sex and smoking status"
P-val<0.05 => Reject Ho: "There is association between sex and smoking status."
P-val>0.05 => Do not reject Ho: "There could be no association between sex and smoking status."
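The two wording rules above are mechanical enough to sketch in code. The function write_conclusion and its arguments are hypothetical: when Ho is not rejected it copies Ho and changes the first "is" to "could be"; when Ho is rejected it returns the negated statement supplied by the caller, since the direction ("higher/lower") has to come from the data.

```python
def write_conclusion(h0, p_value, rejected_text, alpha=0.05):
    """Return the conclusion sentence following the two wording rules."""
    if p_value < alpha:
        # Reject Ho: use the negated, certain statement ("is ...").
        return rejected_text
    # Do not reject Ho: copy Ho and change the verb "is" to "could be".
    return h0.replace(" is ", " could be ", 1)


H0 = "There is no association between sex and smoking status."
REJECTED = "There is association between sex and smoking status."

print(write_conclusion(H0, 0.30, REJECTED))
print(write_conclusion(H0, 0.01, REJECTED))
```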
Hints and Tips: Symbols: their writings and meanings
12. Last, but not least, it is not much more difficult than Primary 1 !!!
(a) Confusion of symbols of similar meanings:
Primary 1: This is my book, this is your book, this is Mary's book. My book, your book and Mary's book are not the same. You will be in big trouble if you regard Mary's money as the same as yours.
1st yr Uni, STAT170:
• p=sample proportion, π=population proportion
• µ and y-bar
• s and σ
A confusion between p and π, µ and y-bar, and s and σ will cost you dearly in exam. You will NOT lose "just a few marks", but many!

Hints and Tips: Symbols: their writings and meanings
(b) Confusion of look-alike symbols:
Primary 1: i, j, g, q, a, o, e, c, d, b, h, k, m, n, l, 1, u, v, z, 2, p
1st yr Uni, STAT170: µ and u, σ and θ, β and B, ω and w, Σ and E
Which is more difficult? Surprisingly, students find the symbols in STAT170 more difficult than the 26 English letters in Primary 1. MANY students have lots of problems here. If you have problems in the left column, you will be in big trouble.

Predicting the future
The following happened in past semesters without exceptions, and WILL likewise occur in the future in this semester (prob=0.99999):
1. Someone will not know the meaning of r2.
2. Someone will write "There is an association between males and females". This makes no sense at all.
3. Someone will write u instead of µ.
4. Someone will leave the whole page blank on the hypothesis test on slope in regression, which is the easiest HATPC.
5. Someone will copy a sample report (from past exam papers or Tute 8) onto the crib (pink) sheet.
6. Someone will write $z = \frac{\bar{y} - \mu}{\sigma}$ and $z = \frac{y - \mu}{\sigma / \sqrt{n}}$
7. Someone will use the "formula" for 2-sample t or paired-t: $t = \frac{\bar{y}_d - \mu_d}{s_d / \sqrt{n}} = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{(s_1 - s_2) / \sqrt{n}}$
Ask yourself …
"How many hours did I spend on STAT170 each week, on average?"
Macquarie University recommends (minimum): 3 credit points * 3 hours = 9 hours = 4 hours in class + 5 hours on your own at home, every WEEK.

Profile of students who fail: Failure check list
The following are common characteristics of those who fail:
• Low class attendance
• Did not study on a weekly basis
• No/few attempts of online quizzes
• #Can do at most one hypothesis test in exam
• #Cannot do t-test on regression
• #Cannot count how many variables
• #Cannot distinguish between categorical and numerical
• Do not know parameters vs statistics
• Do not know the symbols µ, σ, π, β and ω
• Mix up p and π, y-bar and µ, s and σ
Note: # = fatal

Failure check list (continued!)
• Did not do the exercises on the tutorial sheets
• Gave up assignment(s)
• Do not know how to use calculator to find SD
• Low marks in Practical Test
• Copy past exam solutions, word by word, onto crib sheet
• Copy report(s), word by word, onto crib sheet
• Do not read past exam papers
How many ticks do you have in the above list? ____
Unfortunately, even just ONE tick, eg "Can do just one hypothesis test", can (and will) make a failure!