3-6DataTests_slides.pdf

© All Rights Reserved


Data Analysis:

Goals

Understand confidence intervals and p-values.
Learn to use basic statistical tests, including chi-square and ANOVA.

Types of Variables

Types of variables indicate which estimates you can calculate and which statistical tests you should use.

Continuous variables:
Always numeric. You can generally calculate measures such as the mean, median and standard deviation.

Categorical variables:
Information that can be sorted into categories. Field investigations are often interested in dichotomous, or binary (2-level), categorical variables. You cannot calculate a mean or median, but you can calculate risk.
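As a small illustration of the difference, the sketch below computes summary measures for a continuous variable and a risk for a dichotomous one using Python's standard `statistics` module. The ages and ill/not-ill values are hypothetical, made up only for this example.

```python
import statistics

# Hypothetical continuous variable: ages (years) of 7 patients
ages = [4, 7, 9, 12, 15, 31, 45]
print("mean:", round(statistics.mean(ages), 1))
print("median:", statistics.median(ages))
print("standard deviation:", round(statistics.stdev(ages), 1))

# Hypothetical dichotomous variable: ill (True) or not ill (False)
# for 10 people who attended an event
ill = [True, False, False, True, False, False, True, False, False, False]
risk = sum(ill) / len(ill)  # proportion of attendees who became ill
print("risk:", risk)
```

A mean of the True/False values would just reproduce the risk, which is why risk (a proportion) is the natural summary for a dichotomous variable.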

Measures of Association

Measures of association describe the strength of the association between two variables, such as an exposure and a disease. The two measures of association used most often are the relative risk, or risk ratio (RR), and the odds ratio (OR). The decision to calculate an RR or an OR depends on the study design.

Interpretation of RR and OR:
- RR or OR = 1: exposure has no association with disease
- RR or OR > 1: exposure may be positively associated with disease
- RR or OR < 1: exposure may be negatively associated with disease

Risk ratio

Used when comparing outcomes of those who were exposed to something with outcomes of those who were not exposed. Calculated in cohort studies. Cannot be calculated in case-control studies, because the entire population at risk is not included in the study.

Odds ratio

Used in case-control studies. The odds of exposure among cases divided by the odds of exposure among controls. Provides a rough estimate of the risk ratio, especially when the disease is rare.

A 2x2 table is commonly used with dichotomous variables to compare groups of people. The table puts one dichotomous variable across the rows and another dichotomous variable along the columns. It is useful in determining the association between a dichotomous exposure and a dichotomous outcome.
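A minimal sketch of both measures, assuming the 2x2 cells are labelled a (exposed, ill), b (exposed, not ill), c (unexposed, ill) and d (unexposed, not ill). This labelling and the function names are choices made for this example. It reuses the cruise-ship cohort counts from Table 2a (tomatoes and Salmonella), where an RR is appropriate because the study is a cohort study.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio for a 2x2 table laid out as:
                 ill   not ill
    exposed       a       b
    unexposed     c       d
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a, b, c, d):
    """Odds ratio (cross-product ratio) ad/bc for the same layout."""
    return (a * d) / (b * c)


# Cruise-ship cohort (Table 2a): 41 of 130 tomato eaters and
# 19 of 170 non-eaters became ill with Salmonella
rr = risk_ratio(41, 130 - 41, 19, 170 - 19)
or_ = odds_ratio(41, 130 - 41, 19, 170 - 19)
print(f"RR = {rr:.2f}, OR = {or_:.2f}")
```

This prints `RR = 2.82, OR = 3.66`. The OR sits further from 1 than the RR here because illness was not rare (60 of 300 people). The OR approximates the RR well only when the outcome is uncommon.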

Table 1. Sample 2x2 table for Hepatitis A at Restaurant A

Outcome          | Ate salsa | Did not eat salsa | Total
Hepatitis A      | 218 (a)   | 21 (b)            | 239
No Hepatitis A   | 45 (c)    | 85 (d)            | 130
Total            | 263       | 106               | 369

The table displays data from a case-control study conducted in Pennsylvania in 2003 (2). From it we can calculate the odds ratio:

$$\mathrm{OR} = \frac{ad}{bc} = \frac{(218)(85)}{(21)(45)} = 19.6$$

Confidence Intervals

Point estimate: a calculated estimate (such as a risk or odds) or a measure of association (such as a risk ratio or odds ratio). The confidence interval (CI) of a point estimate describes the precision of the estimate.

The CI represents a range of values on either side of the estimate. The narrower the CI, the more precise the point estimate (3).

Example: you want to know the percentage of green marbles in a bag but don't want to count every marble. Shake up the bag and select 50 marbles to estimate the percentage of green marbles.

Sample of 50 marbles:

Based on the sample, we conclude that 30% (15 out of 50) of the marbles are green. This 30% is the point estimate. The actual percentage of green marbles could be higher or lower; i.e., a sample of 50 may not reflect the distribution in the entire bag of marbles.

How do you calculate a confidence interval? You can do so by hand or with a statistical program.

Epi Info, SAS, Stata, SPSS and Episheet are common statistical programs.

The default is usually a 95% confidence interval, but this can be adjusted to 90%, 99% or any other level.

Confidence Intervals

A 95% CI means that if we repeated the sampling many times, about 95% of the intervals calculated this way would contain the true population value.

Assume that the 95% CI for our bag of marbles example is 17-43%. We estimated that 30% of the marbles are green:

The CI tells us that the true percentage of green marbles is most likely between 17% and 43%. There is a 5% chance that an interval calculated this way misses the true percentage of green marbles.

Confidence Intervals

A 99% CI has only a 1% chance of missing the true value, but it is wider: the 99% CI for green marbles is 13-47%.
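The slides do not say which method produced the marble intervals. However, both are consistent with the simple normal-approximation (Wald) interval for a proportion, p ± z·sqrt(p(1 − p)/n). A minimal sketch under that assumption (the function name `wald_ci` is made up for this example):

```python
import math
from statistics import NormalDist


def wald_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%, 2.58 for 99%
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# 15 green marbles out of a sample of 50
for level in (0.95, 0.99):
    low, high = wald_ci(15, 50, level)
    print(f"{level:.0%} CI: {low:.0%} to {high:.0%}")
```

This prints `95% CI: 17% to 43%` and `99% CI: 13% to 47%`, matching the example. Programs that offer exact or Wilson intervals will give slightly different bounds for the same data.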

Confidence Intervals

Very narrow confidence intervals indicate a very precise estimate. You can get a more precise estimate by taking a larger sample.

With a larger sample, the point estimate stays the same (30%), but the 95% confidence interval is 21-39% (rather than 17-43% for the original sample).

With an even larger sample, the point estimate is still 30%, and the 95% confidence interval narrows to 24-36%.
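Under the same normal-approximation assumption as the marble example, the half-width of the interval shrinks with the square root of the sample size. The slides do not state which method was used. The sample sizes behind the narrower intervals are also not given; n = 100 below is an assumption that happens to reproduce the 21-39% interval.

$$\text{half-width} = z_{0.975}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \qquad z_{0.975} \approx 1.96$$

$$n = 50:\quad 1.96\sqrt{\frac{0.30 \times 0.70}{50}} \approx 0.127 \;\Rightarrow\; 17\% \text{ to } 43\%$$

$$n = 100:\quad 1.96\sqrt{\frac{0.30 \times 0.70}{100}} \approx 0.090 \;\Rightarrow\; 21\% \text{ to } 39\%$$

Because n sits under a square root, quadrupling the sample size halves the width of the interval.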

Confidence Intervals

Odds ratio = 19.6, with a 95% confidence interval of 11.0-34.9. The lower bound of the CI in this example is 11.0, i.e., greater than 1.

An odds ratio of 1 means there is no difference between the two groups; an OR > 1 indicates a greater risk among the exposed.

Conclusion: because the entire CI lies above 1, people who ate salsa were significantly more likely to become ill than those who did not eat salsa.
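One common way to compute an OR confidence interval by hand is Woolf's (logit) method:

1. Take the natural log of the OR.
2. Add and subtract 1.96 standard errors, where SE = sqrt(1/a + 1/b + 1/c + 1/d).
3. Exponentiate the two limits.

Applied to the Table 1 counts, this reproduces the 11.0-34.9 interval. The slides do not say which method their software used. A sketch (the function name is made up for this example):

```python
import math


def odds_ratio_woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a Woolf (log-based) confidence interval."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    low = math.exp(math.log(or_) - z * se_log_or)
    high = math.exp(math.log(or_) + z * se_log_or)
    return or_, low, high


# Table 1 (hepatitis A and salsa): a=218, b=21, c=45, d=85
or_, low, high = odds_ratio_woolf_ci(218, 21, 45, 85)
print(f"OR = {or_:.1f}, 95% CI {low:.1f}-{high:.1f}")
```

This prints `OR = 19.6, 95% CI 11.0-34.9`. The interval is asymmetric around 19.6 because it is symmetric on the log scale.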

Confidence Intervals

You must include CIs with your point estimates to give a sense of the precision of your estimates. Examples:

Children who ate corn/tuna salad had 6.19 times the risk of becoming ill as children who did not eat the salad (95% confidence interval: 4.81-7.98).

Case-patients had 6.4 times the odds of living with a 6-10-year-old child compared with controls (95% confidence interval: 1.8-23.4) (5).

- Measure of association (risk ratio or odds ratio)
- Confidence interval
- Chi-square test

Chi-Square Statistics

A common analysis asks whether Disease X occurs as often among people in Group A as it does among people in Group B.

People are often sorted into groups based on their exposure to some disease risk factor. We then perform a test of the association between exposure and disease in the two groups.

Example: a retrospective cohort study was conducted. All 300 people on a cruise ship were interviewed, and 60 had symptoms consistent with Salmonella infection.

Questionnaires indicated that many of the case-patients ate tomatoes from the salad bar.

Table 2a. Cohort study: Exposure to tomatoes and Salmonella infection

Salmonella? | Tomatoes | No tomatoes | Total
Yes         | 41       | 19          | 60
No          | 89       | 151         | 240
Total       | 130      | 170         | 300

To see whether there is a statistical difference in the amount of illness between those who ate tomatoes (41/130) and those who did not (19/170), we could conduct a chi-square test.

Requirements: there must be a total of at least 30 observations (people) in the table, and each cell must have an expected count of 5 or more.

To conduct a chi-square test, we compare the observed data (from the study results) with the data we would expect to see if there were no association between exposure and disease.

Table 2b. Row and column totals for tomatoes and Salmonella infection

Salmonella? | Tomatoes | No tomatoes | Total
Yes         |          |             | 60
No          |          |             | 240
Total       | 130      | 170         | 300

The totals give the overall distribution of people who ate tomatoes and of people who became sick. Based on these distributions, we can fill in the empty cells with the expected values.

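Expected Value

For the first cell, people who ate tomatoes and became ill:

$$\text{Expected value} = \frac{\text{column total} \times \text{row total}}{\text{grand total}} = \frac{130 \times 60}{300} = 26$$

The same formula can be used to calculate the expected value for each of the other cells.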

Table 2c. Expected values for exposure to tomatoes

Salmonella? | Tomatoes               | No tomatoes            | Total
Yes         | 130 x 60 / 300 = 26    | 170 x 60 / 300 = 34    | 60
No          | 130 x 240 / 300 = 104  | 170 x 240 / 300 = 136  | 240
Total       | 130                    | 170                    | 300

To calculate the chi-square statistic, use the observed values (O) from Table 2a and the expected values (E) from Table 2c. Calculate (O - E)²/E for each cell of the table, then add the results:

$$\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}$$

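Table 2d. Chi-square calculation for exposure to tomatoes

Salmonella? | Tomatoes                  | No tomatoes                 | Total
Yes         | (41 - 26)² / 26 = 8.7     | (19 - 34)² / 34 = 6.6       | 60
No          | (89 - 104)² / 104 = 2.2   | (151 - 136)² / 136 = 1.7    | 240
Total       | 130                       | 170                         | 300

Adding the four cells gives χ² = 8.7 + 6.6 + 2.2 + 1.7 = 19.2.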
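The expected-value and chi-square steps can be sketched in a few lines of Python for a table of observed counts. This is a minimal illustration (the function name is made up for this example). It also enforces the two requirements stated earlier: at least 30 observations and an expected count of at least 5 in every cell. Without the intermediate rounding used in Table 2d, the statistic comes out at about 19.1 rather than 19.2.

```python
def pearson_chi_square(table):
    """Pearson (uncorrected) chi-square for a table of observed counts.

    `table` is a list of rows, e.g. [[41, 19], [89, 151]] for Table 2a
    (rows: Salmonella yes/no; columns: ate tomatoes / did not).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total < 30:
        raise ValueError("need at least 30 observations")

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if there is no association: row total x column total / N
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected < 5:
                raise ValueError("expected count below 5; consider another test")
            chi_square += (observed - expected) ** 2 / expected
    return chi_square


chi2 = pearson_chi_square([[41, 19], [89, 151]])
print(f"chi-square = {chi2:.2f}")
```

This prints `chi-square = 19.09`.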

Chi-Square Test

What does the chi-square tell you? In general, the higher the chi-square value, the more likely it is that there is a statistically significant difference between the two groups you are comparing. To know for sure, you need to look up the p-value in a chi-square table. We will discuss p-values after a discussion of different types of chi-square tests.

Many computer programs give different types of chi-square tests. Each test is best suited to certain situations. The most commonly calculated chi-square test is Pearson's chi-square.

Parade of Statistics Guys

The right test... to use when:

Test                             | Use when
Pearson chi-square (uncorrected) | Sample size > 100; expected cell counts > 10
                                 | Sample size > 30; expected cell counts ≥ 5
Mantel-Haenszel chi-square       | Sample size > 30; variables are ordinal

In each study below, investigators chose the type of test that best applied to the situation. (Note: while the chi-square value is used to determine the corresponding p-value, often only the p-value is reported.)

Pearson (Uncorrected) Chi-Square : A North Carolina study investigated 955 individuals because they were identified as partners of someone who tested positive for HIV. The study found that the proportion of partners who got tested for HIV differed significantly by race/ethnicity (p-value <0.001). The study also found that HIV-positive rates did not differ by race/ethnicity among the 610 who were tested (p = 0.4). (6)

Additional examples:

Yates (Corrected) Chi-Square: In an outbreak of Salmonella gastroenteritis associated with eating at a restaurant, 14 of 15 ill patrons studied had eaten the Caesar salad, while 0 of 11 well patrons had eaten the salad (p-value <0.01). The dressing on the salad was made from raw eggs that were probably contaminated with Salmonella. (7)

Fisher's Exact Test: A study of Group A Streptococcus (GAS) among children attending daycare found that 7 of 11 children who spent 30 or more hours per week in daycare had laboratory-confirmed GAS, while 0 of 4 children spending less than 30 hours per week in daycare had GAS (p-value <0.01). (8)
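For a 2x2 table, Yates' correction subtracts 0.5 from each |O - E| before squaring: χ² = Σ(|O - E| - 0.5)²/E. As a sketch, the Caesar-salad counts can be arranged as ill patrons (14 ate the salad, 1 did not) and well patrons (0 ate it, 11 did not). The two "did not eat" cells follow from the totals given. With 1 degree of freedom, the p-value can be computed as erfc(sqrt(χ²/2)). The function name is made up for this example.

```python
import math


def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square and its 1-df p-value for the 2x2 table [[a, b], [c, d]]."""
    table = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink each |O - E| by 0.5 before squaring
            chi_square += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Rows: ill, well; columns: ate Caesar salad, did not
chi2, p = yates_chi_square(14, 1, 0, 11)
print(f"Yates chi-square = {chi2:.1f}, p = {p:.1e}")
```

The corrected statistic is about 18.6, and the p-value is far below 0.01, consistent with the reported result.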

P-Values

32% of people who ate tomatoes got Salmonella, compared with 11% of people who did not eat tomatoes. How do we know whether the difference between 32% and 11% is a real difference?

In other words, how do we know whether our chi-square value (calculated as 19.2) indicates a statistically significant difference?

P-Values

Many statistical tests give both a numeric result (e.g., a chi-square value) and a p-value. The p-value ranges between 0 and 1. What does the p-value tell you?

The p-value is the probability of getting a result at least as extreme as the one you got, assuming that the two groups you are comparing are actually the same.

P-Values

Start by assuming there is no difference in outcomes between the groups. Then look at the test statistic and p-value to see whether they indicate otherwise.

A low p-value means that, if the groups were the same, the probability of observing results like these by chance would be very small.

A high p-value means that the observed difference between the two groups is consistent with chance. A p-value of 1 means that the data show no difference at all between the two groups.

P-Values

Generally, if the p-value is less than 0.05, the difference observed is considered statistically significant, i.e., unlikely to have happened by chance alone. You may use a number of statistical tests to obtain the p-value.

If the chi-square statistic is small, the observed and expected data were not very different, and the p-value will be large. If the chi-square statistic is large, the p-value is generally small, and the difference could be statistically significant.

Example: outbreak of E. coli O157:H7 associated with swimming in a lake (1). Case-patients were much more likely than controls to have taken lake water into their mouths (p-value = 0.002) and to have swallowed lake water (p-value = 0.002). Because each p-value was less than 0.05, both exposures were considered statistically significant risk factors.
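For a 2x2 table, the chi-square statistic has 1 degree of freedom. In that case the p-value can be computed without a lookup table: with 1 df the statistic is the square of a standard normal variable, so P = erfc(sqrt(χ²/2)). A minimal sketch (the function name is made up for this example), applied to the familiar 3.84 cut-off and to the unrounded tomato statistic:

```python
import math


def chi_square_p_value_1df(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi_square / 2))


for label, statistic in [("cut-off", 3.84), ("tomatoes", 19.09)]:
    print(f"{label}: chi-square = {statistic}, p = {chi_square_p_value_1df(statistic):.5f}")
```

The cut-off of 3.84 gives p of about 0.05. The tomato statistic gives p well below 0.001, so the difference between 32% and 11% is statistically significant. Tables with more rows or columns need the chi-square distribution with more degrees of freedom, which statistical software provides.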

Note: Assumptions

Statistical tests such as the chi-square assume that the observations are independent.

If this assumption is not true, you may not use the chi-square test. Do not use chi-square tests with:
- Repeat observations of the same group of people (e.g., pre- and post-tests)
- Matched-pair designs, in which cases and controls are matched on variables such as sex and age

Data do not always fit into discrete categories. Continuous numeric data may be of interest in a field investigation, such as:
- Clinical symptoms between groups of patients
- Average age of patients compared with average age of non-patients
- Respiratory rate of those exposed to a chemical vs. respiratory rate of those who were not exposed

ANOVA

You may compare continuous data with the Analysis of Variance (ANOVA) test. Most statistical software programs will calculate ANOVA.

Output varies slightly in different programs. For example, Epi Info software generates 3 pieces of information: ANOVA results, Bartlett's test and the Kruskal-Wallis test.

ANOVA

ANOVA uses either the t-test or the F-test. Example: testing age differences between 2 groups.
- Use a t-test for comparing 2 groups.
- Use an F-test for comparing 3 or more groups.
Both tests result in a p-value.

If the groups have similar average ages and a similar distribution of age values, the t-statistic will be small and the p-value will not be significant. If the average ages of the 2 groups are different, the t-statistic will be larger and the p-value will be smaller (a p-value < 0.05 indicates that the two groups have significantly different ages).
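To show how the two statistics relate, here is a sketch that computes the pooled two-sample t-statistic and the one-way ANOVA F-statistic for the same two groups. With exactly two groups, F equals t squared, which is why either test can be used in that case. The ages are hypothetical. The p-values, which need the t and F distributions, are left to statistical software such as Epi Info.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Two-sample t-statistic assuming equal variances (pooled variance)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def anova_f(*groups):
    """One-way ANOVA F-statistic: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical ages (years) of ill and well persons
ill = [23, 25, 28, 30, 34]
well = [31, 35, 38, 40, 41]
t = pooled_t(ill, well)
f = anova_f(ill, well)
print(f"t = {t:.2f}, t^2 = {t * t:.2f}, F = {f:.2f}")
```

Here t² and F both come out at 11.57. `anova_f` also accepts three or more groups, where only the F-test applies.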

A critical assumption of t-tests and F-tests is that the groups have similar variances (e.g., spread of age values). As part of the ANOVA analysis, software conducts a separate test to compare variances: Bartlett's test for equality of variance.

Bartlett's test produces a p-value:
- If Bartlett's p-value is > 0.05 (not significant), it is OK to use the ANOVA results.
- If Bartlett's p-value is < 0.05, the variances in the groups are NOT the same, and you cannot use the ANOVA results.
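Bartlett's statistic compares the log of the pooled variance with the logs of the individual group variances. If the variances are equal, it approximately follows a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups. A sketch for the two-group case, where the 1-df p-value can be computed with `math.erfc`, using hypothetical ages; the function name is made up for this example:

```python
import math
from statistics import variance


def bartlett_two_groups(group1, group2):
    """Bartlett's test for equal variances in two groups (chi-square with 1 df)."""
    groups = [group1, group2]
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [variance(g) for g in groups]
    total_df = sum(dfs)  # N - k
    pooled_var = sum(df * v for df, v in zip(dfs, variances)) / total_df
    numerator = total_df * math.log(pooled_var) - sum(
        df * math.log(v) for df, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / df for df in dfs) - 1 / total_df) / (3 * (k - 1))
    statistic = numerator / correction
    p_value = math.erfc(math.sqrt(statistic / 2))  # valid only for k - 1 = 1 df
    return statistic, p_value


# Hypothetical ages (years) of ill and well persons
stat, p = bartlett_two_groups([23, 25, 28, 30, 34], [31, 35, 38, 40, 41])
print(f"Bartlett statistic = {stat:.3f}, p = {p:.2f}")
```

The two variances here (18.5 and 16.5) are close, so the p-value is well above 0.05 and the ANOVA results could be used. For three or more groups, the p-value needs the chi-square distribution with more degrees of freedom, for example from `scipy.stats.bartlett`.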

Kruskal-Wallis Test

The Kruskal-Wallis test is used only if Bartlett's test reveals variances dissimilar enough that you can't use ANOVA. It does not make assumptions about variance; instead, it examines the distribution of values within each group. It generates a p-value:
- If the p-value is > 0.05, there is not a significant difference between groups.
- If the p-value is < 0.05, there is a significant difference between groups.
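Kruskal-Wallis works on ranks rather than raw values. All observations are pooled and ranked, and then H = 12/(N(N+1)) Σ R_i²/n_i − 3(N + 1), where R_i is the rank sum and n_i the size of group i. A minimal sketch for two groups, with these assumptions:

- Tied values share an average rank, but no tie-correction divisor is applied (the example data have no ties).
- The p-value uses the chi-square distribution with 1 df.
- The respiratory rates and the function name are made up for this example.

```python
import math


def kruskal_wallis_two_groups(group1, group2):
    """Kruskal-Wallis H for two groups (no tie correction) and its 1-df p-value."""
    groups = [group1, group2]
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Average 1-based ranks, so tied values share the same rank
    rank_of = {}
    for value in set(pooled):
        positions = [i + 1 for i, x in enumerate(pooled) if x == value]
        rank_of[value] = sum(positions) / len(positions)
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    p_value = math.erfc(math.sqrt(max(h, 0.0) / 2))  # chi-square with k - 1 = 1 df
    return h, p_value


# Hypothetical respiratory rates (breaths/min), exposed vs. unexposed
exposed = [22, 24, 25, 28, 30, 35]
unexposed = [14, 15, 16, 17, 18, 20]
h, p = kruskal_wallis_two_groups(exposed, unexposed)
print(f"H = {h:.2f}, p = {p:.4f}")
```

Every exposed value outranks every unexposed value here, so H is large (about 8.31) and p falls below 0.05.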

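Figure 1. Decision tree for analysis of continuous data.

Bartlett's test for equality of variance: p-value > 0.05?
- YES: use the ANOVA results (p < 0.05: significant difference; p > 0.05: no significant difference)
- NO: use the Kruskal-Wallis results (p < 0.05: significant difference; p > 0.05: no significant difference)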

Conclusion

In field epidemiology, a few calculations and tests make up the core of analytic methods. Learning these methods will provide a good set of field epidemiology skills.

Further data analysis may require methods to control for confounding, including matching and logistic regression.

References

1. Bruce MG, Curtis MB, Payne MM, et al. Lake-associated outbreak of Escherichia coli O157:H7 in Clark County, Washington, August 1999. Arch Pediatr Adolesc Med. 2003;157:1016-1021.
2. Wheeler C, Vogt TM, Armstrong GL, et al. An outbreak of hepatitis A associated with green onions. N Engl J Med. 2005;353:890-897.
3. Gregg MB. Field Epidemiology. 2nd ed. New York, NY: Oxford University Press; 2002.
4. Aureli P, Fiorucci GC, Caroli D, et al. An outbreak of febrile gastroenteritis associated with corn contaminated by Listeria monocytogenes. N Engl J Med. 2000;342:1236-1241.
5. Schafer S, Gillette H, Hedberg K, Cieslak P. A community-wide pertussis outbreak: an argument for universal booster vaccination. Arch Intern Med. 2006;166:1317-1321.
6. Centers for Disease Control and Prevention. Partner counseling and referral services to identify persons with undiagnosed HIV - North Carolina, 2001. MMWR Morb Mortal Wkly Rep. 2003;52:1181-1184.
7. Centers for Disease Control and Prevention. Outbreak of Salmonella Enteritidis infection associated with consumption of raw shell eggs, 1991. MMWR Morb Mortal Wkly Rep. 1992;41:369-372.
8. Centers for Disease Control and Prevention. Outbreak of invasive group A streptococcus associated with varicella in a childcare center - Boston, Massachusetts, 1997. MMWR Morb Mortal Wkly Rep. 1997;46:944-948.
