Jan 04, 2015

intro to biostat

Final Exam

You may bring 5 pages of notes

You MUST bring full copies of

statistical tables (on Blackboard)

You MUST bring a calculator

Hypothesis Testing for a single mean and

proportion, and for two means

One-way ANOVA

Chi-square Tests

Power and Sample size

Regression and Correlation

Logistic Regression

Survival analysis

Single mean μ

Single proportion p

Paired (or matched) data μd

Define null and research hypotheses

Define test statistic, level of

significance and decision rule

Compute the test statistic from the sample data.

Use decision rule or p-value to decide

whether to reject or not reject the null

hypothesis.

For a single mean

If n ≥ 30, use z-test statistic

If n < 30 use t-test statistic

Use z-test statistic

Check assumptions

For comparing two means μ1 - μ2

If n1 and n2 both ≥ 30, use z-test statistic

If n1 and/or n2 < 30 use t-test statistic

Use chi-square test

Type I error occurs when we reject the null hypothesis when we shouldn't.

Pr(Type I error) = α

Type II error occurs when we don't reject the null hypothesis when we should have.

Pr(Type II error) = β

One-Way ANOVA

Used when we want to compare the means

of three or more groups from independent

populations.

Continuous outcome measured on each

subject.

We set up an analysis of variance table and compare the between-group variance with the within-group variance.

An F-test is used with two different degrees

of freedom terms.

Chi-Square Test

Chi-square goodness of fit test

Assess whether responses fit a specified

distribution for one sample of people

Test if two discrete variables are associated in

some way for a sample of people

Compare distributions of proportions among two

or more independent groups

Need a large enough sample to ensure

you have the pre-specified amount of

precision in analysis

Sample size determined based on type

of planned analysis:

Confidence interval

Hypothesis test

We always round up our calculation.

Need to account for possible dropout

from study. This always increases the

required sample size.

Power

Linked up with Type II error

Power = 1 - β

=P(Reject H0 | H0 false)

= Probability of correctly

rejecting H0 when H0 is false.

Correlation

Correlation measures the nature and

strength of linear association between

two variables at a time.

Regression equation that best

describes relationship between

variables.

Correlation Coefficient

Population correlation is ρ (rho)

Sample correlation is r where

-1 ≤ r ≤ +1

Sign indicates nature of relationship

(positive or direct, negative or inverse)

Linear Regression

A very popular method for describing

the linear relationship between two

variables (usually continuous

variables).

We use a scatterplot to display the

data graphically

the two variables.

Y = Dependent, Outcome variable

X = Independent, Covariate, Predictor

variable

$\hat{y} = b_0 + b_1 x$

Useful when we want to jointly

examine the effect of several X

variables on the outcome Y variable.

Y = continuous outcome variable

X1, X2, ..., Xp = set of independent or predictor variables

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$

Linear Regression

Predictors can be continuous, indicator

variables (0/1) or a set of dummy variables

Confounding: the effect of a risk factor on an outcome is somehow changed due to the effect of another factor.

Effect Modification: a different relationship between the risk factor and an outcome depending on the level of another variable.

Logistic Regression

Used when the outcome is dichotomous

(binary), e.g. diseased , not diseased.

Our goals remain the same as for linear

regression:

is there an association between a

variable X and our outcome variable Y?

If so, what type?

We model the probability p of having

the disease.

$p = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}}$

$\text{logit}(p) = \ln\left(\frac{p}{1-p}\right) = b_0 + b_1 x$

Outcome is dichotomous (1=event,

0=non-event) and p=P(event)

Outcome is modeled as log odds

$\ln\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$

Exp(bi) = OR
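As a small numeric illustration of the logit model above, the following Python sketch converts a linear predictor into a probability and checks that exp(b1) equals the odds ratio for a one-unit increase in X. The coefficients b0 = -2.0 and b1 = 0.5 and the helper names are hypothetical, chosen only for illustration; they do not come from the slides.

```python
import math

def inverse_logit(linear_predictor):
    """Probability p = e^z / (1 + e^z) for a linear predictor z = b0 + b1*x."""
    return math.exp(linear_predictor) / (1 + math.exp(linear_predictor))

def odds(p):
    """Odds p / (1 - p) corresponding to a probability p."""
    return p / (1 - p)

# Hypothetical coefficients, for illustration only (not from the slides).
b0, b1 = -2.0, 0.5

p_at_1 = inverse_logit(b0 + b1 * 1)
p_at_2 = inverse_logit(b0 + b1 * 2)

# The ratio of the odds for a one-unit increase in x equals exp(b1).
odds_ratio = odds(p_at_2) / odds(p_at_1)
print(f"p(x=1) = {p_at_1:.3f}, p(x=2) = {p_at_2:.3f}")
print(f"odds ratio = {odds_ratio:.3f}, exp(b1) = {math.exp(b1):.3f}")
```

The two printed odds-ratio values agree, which is why exponentiating a logistic regression coefficient gives the odds ratio for that predictor.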

Survival Analysis

Outcome is the time to an event.

An event could be time to heart attack,

cancer remission or death.

(Yes/No) and if so, their time to event.

Determine factors associated with longer

survival.

Survival Analysis

Incomplete follow-up information

Censoring

Measure follow-up time and not time to

event

We know survival time > follow-up time

two or more independent groups

Model:

ln(h(t)/h0(t)) = b1X1 + b2X2 + ... + bpXp

Model used to jointly assess effects of

independent variables on outcome

(time to an event).

Final Exam

Problem 1.

Suppose a cross-sectional study is

conducted to investigate cardiovascular risk

factors among a sample of patients seeking

medical care at one of three local hospitals.

A total of 300 patients are enrolled. Using

the following data, test if there is an

association between enrollment site (i.e.,

hospital) and family history of CVD. Run

the appropriate test at a 5% level of

significance.

Problem 1.

Family Hx    Hosp 1   Hosp 2   Hosp 3
Definite       24       14       22
Probable        8       14        8
No             68       72       70
Total         100      100      100

Problem 1.

H0: Site and family history are independent

H1: H0 is false

α = 0.05

Df = (r-1)(c-1) = (3-1)(3-1) = 4.

Reject H0 if χ² > 9.49

Problem 1.

Observed (expected) counts:

Family Hx    Hosp 1     Hosp 2     Hosp 3
Definite     24 (20)    14 (20)    22 (20)
Probable      8 (10)    14 (10)     8 (10)
No           68 (70)    72 (70)    70 (70)
Total        100        100        100

Problem 1.

$\chi^2 = \frac{(24-20)^2}{20} + \frac{(14-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(8-10)^2}{10} + \frac{(14-10)^2}{10} + \frac{(8-10)^2}{10} + \frac{(68-70)^2}{70} + \frac{(72-70)^2}{70} + \frac{(70-70)^2}{70}$

$= 0.8 + 1.8 + 0.2 + 0.4 + 1.6 + 0.4 + 0.06 + 0.06 + 0 = 5.32$

Do not reject H0 because 5.32 < 9.49. We do not have significant evidence, α = 0.05, to show that site and family history are not independent.
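The chi-square calculation for Problem 1 can be checked with a short Python sketch. It derives the expected counts from the row and column totals and sums (O - E)²/E over all cells. The function name is an illustrative choice, and the critical value 9.49 is taken from the slides rather than computed. Without rounding the individual terms, the statistic comes out at about 5.31, compared with 5.32 on the slide.

```python
# Observed counts from Problem 1: rows = family history, columns = hospital.
observed = [
    [24, 14, 22],  # Definite
    [8, 14, 8],    # Probable
    [68, 72, 70],  # No
]

def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, expected

stat, df, expected = chi_square_independence(observed)
CRITICAL_VALUE = 9.49  # chi-square table value for df = 4, alpha = 0.05 (from the slides)
reject = stat > CRITICAL_VALUE
print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {reject}")
```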

Problem 2.

The following table summarizes data collected

in the study described in problem 1. The

variable summarized below is body mass

index (BMI) computed as the ratio of weight

in kilograms to height in meters squared.

BMI        N     Mean   Std Dev
Overall    300   24.8   2.5
Hosp 1     100   21.6   2.1
Hosp 2     100   24.8   1.8
Hosp 3     100   27.9   1.3

Problem 2.

Test if there is a significant difference in the mean BMI

scores among hospitals. Show all parts of the test and

use a 5% level of significance. (HINT: MSE = 3.1).

H0: μ1 = μ2 = μ3

H1: means not all equal

α = 0.05

$SS_b = \sum n_j (\bar{X}_j - \bar{X})^2$

$= 100\left((21.6-24.8)^2 + (24.8-24.8)^2 + (27.9-24.8)^2\right)$

$= 100(10.24 + 0 + 9.61) = 1985$

Problem 2.

Source     SS       Df    MS      F
Between    1985     2     992.5   320.2
Error      920.7    297   3.1
Total      2905.7   299

F = 320.2

Reject H0 since 320.2 > 3.09. We have significant evidence, α = 0.05, to show that the means are not all equal.
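The ANOVA table for Problem 2 can be rebuilt from the group summaries alone, as in the Python sketch below. It is an illustration under one assumption: the error sum of squares is computed from the reported group standard deviations as the sum of (n_j - 1)s_j², whereas the slides take MSE = 3.1 from the hint. Because the slides also round the overall mean to 24.8, the unrounded F here is close to, but not exactly, 320.2. The critical value 3.09 is the one quoted on the slide.

```python
# Group summaries from Problem 2: (n, mean, standard deviation) per hospital.
groups = [(100, 21.6, 2.1), (100, 24.8, 1.8), (100, 27.9, 1.3)]

n_total = sum(n for n, _, _ in groups)
grand_mean = sum(n * mean for n, mean, _ in groups) / n_total

# Between-group sum of squares: sum of n_j * (group mean - grand mean)^2.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
# Within-group (error) sum of squares: sum of (n_j - 1) * s_j^2.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

df_between = len(groups) - 1
df_within = n_total - len(groups)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

F_CRITICAL = 3.09  # critical value used on the slide for alpha = 0.05
print(f"SSb = {ss_between:.1f}, MSE = {ms_within:.2f}, F = {f_stat:.1f}")
print("Reject H0" if f_stat > F_CRITICAL else "Do not reject H0")
```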

Problem 3.

Suppose each participant in the study

described in problem 1 is assigned a

cardiovascular risk (a value between 0 and

100 with higher scores indicative of more

risk of cardiovascular disease). The mean

cardiovascular risk is 21.7 with a standard

deviation of 5.6. Suppose that the

covariance between BMI and cardiovascular

risk is 4.5.

Problem 3.

Compute the sample correlation coefficient between

BMI and cardiovascular risk.

Var(BMI) = $s_x^2 = 2.5^2$

Var(Risk) = $s_y^2 = 5.6^2$

$r = \frac{\text{Cov}(X,Y)}{\sqrt{s_x^2 s_y^2}} = \frac{4.5}{\sqrt{(2.5)^2 (5.6)^2}} \approx 0.3$

Run the appropriate test at a 5% level of significance.

H0: ρ = 0

H1: ρ ≠ 0

α = 0.05

$Z = r\sqrt{\frac{n-2}{1-r^2}}$

Reject H0 if Z < -1.96 or if Z > 1.96

$Z = 0.3\sqrt{\frac{298}{1-(0.3)^2}} = 5.4$

Reject H0 since 5.4 > 1.96. We have significant evidence, α = 0.05, to show that ρ ≠ 0.
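A brief Python sketch of the Problem 3 arithmetic follows. It computes r from the covariance and the two standard deviations, then the test statistic. The unrounded correlation is about 0.32; the sketch rounds it to one decimal place before computing Z, which is an assumption made here to match the value 0.3 carried through on the slides.

```python
import math

n = 300                      # patients enrolled in the study
sd_bmi, sd_risk = 2.5, 5.6   # standard deviations of BMI and cardiovascular risk
cov_xy = 4.5                 # covariance between BMI and cardiovascular risk

# Sample correlation: covariance divided by the product of standard deviations.
r = cov_xy / (sd_bmi * sd_risk)

# Assumption: round r to one decimal place, as the slides do.
r_rounded = round(r, 1)

# Test statistic for H0: rho = 0.
z = r_rounded * math.sqrt((n - 2) / (1 - r_rounded ** 2))

reject = abs(z) > 1.96
print(f"r = {r:.3f}, Z = {z:.2f}, reject H0: {reject}")
```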

Problem 4.

Compute the equation of the line that best describes

the relationship between BMI and cardiovascular risk

(Assume that cardiovascular risk is the dependent

variable).

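$b_1 = r\frac{s_y}{s_x} = 0.3\left(\frac{5.6}{2.5}\right) = 0.67$

$b_0 = \bar{Y} - b_1\bar{X} = 21.7 - 0.67(24.8) = 5.08$

$\hat{y} = 5.08 + 0.67X$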
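The slope and intercept in Problem 4 can be reproduced from the summary statistics, as sketched below in Python. The intercept uses the standard least-squares identity b0 = mean(Y) - b1 × mean(X), with the means 21.7 and 24.8 given in Problems 2 and 3. The `predict_risk` helper is a hypothetical name added for illustration.

```python
mean_bmi, sd_bmi = 24.8, 2.5     # X: body mass index
mean_risk, sd_risk = 21.7, 5.6   # Y: cardiovascular risk score
r = 0.3                          # sample correlation from Problem 3

# Slope, rounded to two decimals as on the slide.
b1 = round(r * sd_risk / sd_bmi, 2)
# Intercept: the fitted line passes through the point of means.
b0 = mean_risk - b1 * mean_bmi

def predict_risk(bmi):
    """Predicted cardiovascular risk for a given BMI (hypothetical helper)."""
    return b0 + b1 * bmi

print(f"y-hat = {b0:.2f} + {b1:.2f} X")
print(f"Predicted risk at BMI 30: {predict_risk(30):.1f}")
```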

Problem 5.

Suppose we restrict our attention to the

subgroup of patients at high risk for

cardiovascular disease (cardiovascular

risk score of 30 or more).

Using the following data, test if BMI is

significantly different in men versus

women. Use a 5% level of significance.

Problem 5.

H0: μ1 = μ2

H1: μ1 ≠ μ2

α = 0.05

BMI        Men     Women
N          20      10
Mean       31.6    28.1
Std Dev    1.7     2.1

$t = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

Df = 20 + 10 - 2 = 28

Reject H0 if t < -2.048 or if t > 2.048

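Problem 5.

$S_p = \sqrt{\frac{19(1.7)^2 + 9(2.1)^2}{20 + 10 - 2}} = 1.84$

$t = \frac{31.6 - 28.1}{1.84\sqrt{\frac{1}{20} + \frac{1}{10}}} = 4.91$

Reject H0 since 4.91 > 2.048. We have significant evidence, α = 0.05, to show there is a difference in mean BMI between men and women.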
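The pooled two-sample t-test in Problem 5 can be checked with the Python sketch below. It works from the summary statistics only, and the critical value 2.048 is taken from the slides. Using the unrounded pooled standard deviation gives t of about 4.92, compared with 4.91 when Sp is first rounded to 1.84.

```python
import math

n1, mean1, sd1 = 20, 31.6, 1.7   # men
n2, mean2, sd2 = 10, 28.1, 2.1   # women

df = n1 + n2 - 2
# Pooled standard deviation: weighted average of the two sample variances.
sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
# Test statistic for H0: mu1 = mu2.
t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

T_CRITICAL = 2.048  # two-sided critical value for df = 28, alpha = 0.05 (from the slides)
reject = abs(t) > T_CRITICAL
print(f"Sp = {sp:.2f}, t = {t:.2f}, df = {df}, reject H0: {reject}")
```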

Problem 6.

How many men and women would be required to estimate a difference in mean BMI with a 95% confidence interval and a margin of error not exceeding 1 unit? (Use data from problem 5 as needed.)

$n_i = 2\left(\frac{Z\sigma}{E}\right)^2$

Use Sp from #5:

$n_i = 2\left(\frac{1.96(1.84)}{1}\right)^2 = 26.01$

Rounding up, 27 men and 27 women are required.
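The sample size formula for Problem 6 is sketched in Python below, including the round-up step described earlier in the review ("We always round up our calculation"). The function name is a hypothetical choice for illustration.

```python
import math

def n_per_group(z, sigma, margin):
    """Per-group sample size for estimating a difference in two means."""
    return 2 * (z * sigma / margin) ** 2

# z = 1.96 for 95% confidence, sigma = pooled SD from Problem 5, margin of error = 1.
raw = n_per_group(1.96, 1.84, 1.0)
# Always round up to the next whole participant.
n_required = math.ceil(raw)

print(f"raw n per group = {raw:.2f}, required n per group = {n_required}")
```

If dropout is expected, the enrolled number would need to be larger still, as the review notes.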

Problem 7.

The following table was constructed based on a

comparison of various sociodemographic

characteristics between men and women enrolled in

the study of cardiovascular risk factors.

Which, if any, of the characteristics shown

above are significantly different between men

and women? Justify.

Problem 7.

Characteristic     Men (n=160)   Women (n=140)   p
[missing]          45            47              0.7256
Race                                             0.0354
  % White          32            38
  % Black          41            37
  % Hispanic       25            19
  % Other
% HS Graduate      78            64              0.0245
[missing]          47            31              0.0001
% No Insurance                                   0.9876

Problem 8.

men and women?

Two sample test for equality of independent

means.

What test was used to compare race between

men and women?

Chi-square test of independence.

What test was used to compare educational

level (% high school graduates) between men

and women?

Two sample test for equality of independent

proportions or chi-square test of independence.

Problem 9.

Two different scales are used in a particular

laboratory. There is some concern that one

scale gives different readings than the other.

Ten specimens are randomly selected and

weighed on each scale. The data are shown

below.

Test whether there is a difference in weights between the two scales at α = 0.05.

Problem 9.

Specimen   Scale 1   Scale 2
1          1.2       2.1
2          3.5       3.6
3          1.8       1.9
4          4.0       4.0
5          5.0       4.9
6          1.9       2.0
7          2.7       2.7
8          2.2       2.3
9          2.8       2.9
10         3.5       3.7

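With diff = Scale 2 - Scale 1 for each specimen:

$\bar{X}_d = \frac{\sum \text{diff}}{n} = \frac{1.5}{10} = 0.15$

$s_d = \sqrt{\frac{\sum \text{diff}^2 - (\sum \text{diff})^2/n}{n-1}} = \sqrt{\frac{0.91 - (1.5)^2/10}{9}} = 0.276$

H0: μd = 0

H1: μd ≠ 0, α = 0.05

$t = \frac{\bar{X}_d}{s_d/\sqrt{n}}, \quad df = n - 1$

$t = \frac{0.15}{0.276/\sqrt{10}} = 1.72$

Do not reject H0 since 1.72 < 2.262 (the two-sided critical value of t with 9 df). We do not have significant evidence at α = 0.05 to show that μd ≠ 0.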
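The paired analysis in Problem 9 can be reproduced directly from the ten pairs of readings, as in the Python sketch below. Two assumptions are made here: the differences are taken as Scale 2 minus Scale 1, which matches the positive sum of 1.5 on the slide, and 2.262 is used as the standard two-sided critical value of t with 9 degrees of freedom at α = 0.05.

```python
import math

scale1 = [1.2, 3.5, 1.8, 4.0, 5.0, 1.9, 2.7, 2.2, 2.8, 3.5]
scale2 = [2.1, 3.6, 1.9, 4.0, 4.9, 2.0, 2.7, 2.3, 2.9, 3.7]

# Difference for each specimen (assumed direction: Scale 2 - Scale 1).
diffs = [b - a for a, b in zip(scale1, scale2)]
n = len(diffs)

mean_d = sum(diffs) / n
# Sample standard deviation of the differences.
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
# Paired t statistic with n - 1 degrees of freedom.
t = mean_d / (sd_d / math.sqrt(n))

T_CRITICAL = 2.262  # two-sided critical value for df = 9, alpha = 0.05
reject = abs(t) > T_CRITICAL
print(f"mean diff = {mean_d:.2f}, sd = {sd_d:.3f}, t = {t:.2f}, reject H0: {reject}")
```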

Problem 10.

Patients with hypertension are generally

recommended to follow a low salt diet.

Surveys report that approximately 75% of

patients adhere to these diets. In a random

sample of 100 patients with hypertension,

70% report following a low-salt diet. Are

these patients significantly low in terms of

adherence? Run the test at = 0.05.

Problem 10.

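H0: p = 0.75

H1: p < 0.75

α = 0.05

$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$

Reject H0 if Z < -1.645

$Z = \frac{0.70 - 0.75}{\sqrt{\frac{0.75(1-0.75)}{100}}} = -1.15$

Do not reject H0 since -1.15 > -1.645. We do not have significant evidence at α = 0.05 to show that p < 0.75.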
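The one-sample test for a proportion in Problem 10 can be sketched in Python as follows. The lower-tailed critical value -1.645 is the standard normal value for α = 0.05 and is stated here as an assumption, since the decision rule is not legible on the slide.

```python
import math

p0 = 0.75      # adherence rate reported by surveys (null value)
p_hat = 0.70   # sample proportion following a low-salt diet
n = 100        # sample size

# Z statistic uses the null proportion in the standard error.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

Z_CRITICAL = -1.645  # lower-tailed critical value for alpha = 0.05
reject = z < Z_CRITICAL
print(f"Z = {z:.2f}, reject H0: {reject}")
```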

Problem 11.

The following table was presented in a journal and describes

the associations between demographic and clinical risk

factors and systolic blood pressure.

Outcome = Systolic Blood Pressure

Risk Factors                   Regression Coefficient   p
Intercept                      105.3                    0.0001
Age                            1.2                      0.0042
Male Sex                       4.5                      0.0956
Current Smoker                 -0.5                     0.2354
Number of Hrs Exercise/Week    -2.4                     0.0003

Problem 11.

a) What type of analysis generated the results summarized

above?

Multiple linear regression analysis because the outcome

(systolic blood pressure) is continuous.

b) Which of the risk factors are significantly associated with

systolic blood pressure?

Age and number of hours of exercise per week are statistically significant at the 5% level (both have p values < 0.05). Male sex is marginally significant with a p value of 0.0956.

Problem 11.

c) What is the relative importance of the risk factors?

The most important (statistically significant) risk factor is number of

hours of exercise per week, followed by age and then male sex.

Current smoking status is not statistically significant.

d) How would you interpret the regression coefficient associated with

male sex? With number of hours of exercise per week?

Men's systolic blood pressure is 4.5 units higher than women's

holding age, smoking status and number of hours of exercise

constant. Each additional hour of exercise per week is associated

with a reduction of 2.4 units of systolic blood pressure holding age,

sex and current smoking status constant.

Problem 12.

The following table was presented in a journal and describes

the associations between demographic and clinical risk factors

and hypertension.

Outcome = Hypertension

Risk Factors                   Regression Coefficient   p
Intercept                      3.5                      0.0001
Age                            0.02                     0.0357
Male Sex                       0.27                     0.0264
Current Smoker                 -0.005                   0.7564
Number of Hrs Exercise/Week    -0.36                    0.0111

Problem 12.

a) What type of analysis generated the results summarized above?

Multiple logistic regression analysis because the outcome

(hypertension) is dichotomous.

b) Which of the risk factors are significantly associated with

hypertension?

Age, male sex and number of hours of exercise are statistically significant at the 5% level (all have p values < 0.05).

c) What is the relative importance of the risk factors?

The most important (statistically significant) risk factor is number of

hours of exercise per week, followed by male sex and then age.

Current smoking status is not statistically significant.

Problem 12.

d) Compute odds ratios for each of the risk factors.

Outcome = Hypertension

Risk Factors                   Regression Coefficient   Odds Ratio
Age                            0.02                     1.02
Male Sex                       0.27                     1.31
Current Smoker                 -0.005                   0.99
Number of Hrs Exercise/Week    -0.36                    0.70

e) How would you interpret the odds ratios associated with male sex? With number of hours of exercise per week?

Men have 1.31 times the odds of hypertension compared with women, holding age, current smoking status and number of hours of exercise per week constant.

Each additional hour of exercise per week is associated with a 30% reduction in the odds of hypertension, holding age, sex and current smoking status constant.
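The odds ratios in Problem 12 come from exponentiating each regression coefficient, as in the short Python sketch below. Note that exp(-0.005) is about 0.995, which rounds to 1.00 at two decimals; the slide lists 0.99.

```python
import math

# Logistic regression coefficients from the Problem 12 table.
coefficients = {
    "Age": 0.02,
    "Male Sex": 0.27,
    "Current Smoker": -0.005,
    "Number of Hrs Exercise/Week": -0.36,
}

# Odds ratio for each risk factor: exp(b_i).
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, value in odds_ratios.items():
    print(f"{name}: OR = {value:.3f}")
```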

Problem 13.

A study is conducted to assess whether there is a difference in physicians' opinions regarding the treatment of early stage throat cancer. Specifically, physicians were asked if they would recommend radiation, surgery or neither upon initial diagnosis. Based on the data below, is there a relationship between treatment recommendations and physicians' age? Run the test at a 5% level of significance.

Age      Radiation   Surgery   Neither   Total
<40      35          15        50        100
40-59    29          30        41        100
60-79    40          43        22        105
Total    104         88        113       305

Problem 13.

H0: Age and treatment recommendation are independent

H1: H0 is false

α = 0.05

$\chi^2 = \sum \frac{(O - E)^2}{E}$

Df = (r-1)(c-1) = (3-1)(3-1) = 4.

Reject H0 if χ² > 9.49

Observed (expected) counts:

Age      Radiation    Surgery      Neither      Total
<40      35 (34.1)    15 (28.9)    50 (37.0)    100
40-59    29 (34.1)    30 (28.9)    41 (37.0)    100
60-79    40 (35.8)    43 (30.3)    22 (38.9)    105
Total    104          88           113          305

$\chi^2 = \frac{(35-34.1)^2}{34.1} + \frac{(15-28.9)^2}{28.9} + \frac{(50-37)^2}{37} + \frac{(29-34.1)^2}{34.1} + \frac{(30-28.9)^2}{28.9} + \frac{(41-37)^2}{37} + \frac{(40-35.8)^2}{35.8} + \frac{(43-30.3)^2}{30.3} + \frac{(22-38.9)^2}{38.9} = 25.66$

Reject H0 because 25.66 > 9.49. We have significant evidence, α = 0.05, to show that age and treatment recommendation are not independent.

Problem 14.

For each of the following scenarios,

indicate which test would be used. Use

the letters below to indicate the test in

the space provided. Note that the same

test might be used for more than one

scenario.

Problem 14.

a)
b) Compare proportion to historical/external control
c) Compare two independent means
d) Compare two matched/paired means
e) Analysis of variance
f) Chi-square goodness of fit test
g) Chi-square test of independence
h) Correlation analysis
i) Linear regression analysis
j) Logistic regression analysis
k) Survival analysis

Problem 14.

Scenario and Test

1. We want to test if there is a significant association between BMI (kg/m²) and incident myocardial infarction adjusting for age, sex, systolic blood pressure and smoking.
Test: j

2. We want to test if a new environmental intervention is effective in reducing exposure to second-hand smoke. Each participant in the study has levels of exposure measured before and after the intervention is implemented.
Test: d

3. We wish to test if there is a significant association between GRE scores and first year GPA in MPH students who matriculated in fall 2011.
Test: h or i

4. We want to determine if there are significant differences in ages of participants enrolled in a study comparing those with a family history of cardiovascular disease to those without.
Test: c

5. A study reports that 15% of college freshmen smoke. We want to test if significantly more BU freshmen smoke.
Test: b

6. We want to test if there is a difference in preterm versus term deliveries among women of black, Hispanic and white race.
Test: g

7. We want to test if nutritional supplements prolong life (minimize time to death) in persons over 65 years of age, adjusted for sex and other comorbid conditions.
Test: k

8. A clinical trial is run to assess the safety of a new drug compared to a standard drug and the outcome is development of skin rash or not.
Test: g or j

9. We want to test if there is a difference in mean time to complete a physical task when comparing 12, 13, 14 and 15 year olds.
Test: e

10. We want to test whether smoking in pregnancy increases the risk of infection in newborns.
Test: g or k
