Epidemiology and Statistics

PRACTICAL SESSIONS:

EPIDEMIOLOGY & STATISTICS

FACULTY OF MEDICINE

UNIVERSITI KEBANGSAAN MALAYSIA

KUALA LUMPUR

Scenario

An outbreak of gastroenteritis occurred in Bandar Tun Razak, a suburban neighborhood, on the evening of April 28. A total of 89 people went to the emergency departments of the three local hospitals during that evening. No more cases were reported afterward.

The patients complained of headache, fever, nausea, vomiting and diarrhea. The disease was severe enough in 19 patients to require hospitalization for rehydration.

The local health department was immediately notified of a potential food-borne outbreak of gastroenteritis in Bandar Tun Razak.

Exercise 1

1. Define epidemic, endemic and pandemic.

2. Describe the gastroenteritis outbreak in terms of disease transmission and the epidemiological triad.

3. What are the possible causes of the outbreak?

4. List and discuss the steps that should be taken in an outbreak investigation.

5. What further information is needed?

Exercise 2

The epidemic team, including a medical epidemiologist (public health physician or Health Officer), health inspectors and a nurse, visited the local hospitals to interview the attending physicians, the patients and some of their relatives. Stool samples were obtained from some patients for microbiological identification of the causative agent.

The distribution of the disease by person (age and gender) was found as follows:

Age group

0 - 5 yr

6 - 10 yr

11 yr and

older

Total by

gender

by Age and Gender

Female

Male

Total by age

No

%Females

No

%Male

No

%

1

1

38

37

10

Please calculate the totals for each column and row and their corresponding percentages to determine whether there are any important differences by age or by gender. Interpret your findings.

Exercise 3

Next, the epidemic team investigated the places where the affected persons, their relatives and neighbors ate that day (April 28). The following table shows the team's findings:

Place                         People who   Ill      Attack   People who did   Ill      Attack   Relative
                              attended     people   rate     not attend       people   rate     risk
Cafeteria LRT                 207          61                157              47
Kedai Makan Ali               246          25                122              13
Restaurant ABC                475          68                189              29
Elementary school cafeteria   239          67                495              22

Please calculate the attack rates per 100 (incidence rates per 100) by place to determine where the contaminated meal was served. For each place, compare the attack rate (AR) for those who attended with the attack rate for those who did not, using the relative risk (i.e., RR = AR in attendees / AR in non-attendees).

Interpret your findings.
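A short Python sketch can be used to check the arithmetic above. It applies AR = ill / total x 100 and RR = AR(attendees) / AR(non-attendees) to the counts in the table. The names `PLACES`, `attack_rate` and `place_summary` are illustrative only and are not part of the exercise.

```python
# Attack rates (per 100) and relative risk for each place in Exercise 3.
# Tuple order: (attended, ill among attendees, did not attend, ill among non-attendees)
PLACES = {
    "Cafeteria LRT": (207, 61, 157, 47),
    "Kedai Makan Ali": (246, 25, 122, 13),
    "Restaurant ABC": (475, 68, 189, 29),
    "Elementary school cafeteria": (239, 67, 495, 22),
}


def attack_rate(ill, total):
    """Attack rate per 100 people exposed."""
    return 100.0 * ill / total


def place_summary(attended, ill_att, not_att, ill_not):
    """Return (AR in attendees, AR in non-attendees, relative risk)."""
    ar_att = attack_rate(ill_att, attended)
    ar_not = attack_rate(ill_not, not_att)
    return ar_att, ar_not, ar_att / ar_not


results = {name: place_summary(*counts) for name, counts in PLACES.items()}

if __name__ == "__main__":
    for name, (ar_att, ar_not, rr) in results.items():
        print(f"{name:28s} AR attended {ar_att:5.1f}  AR not attended {ar_not:5.1f}  RR {rr:5.2f}")
```

An RR close to 1 means that attending a place made no difference to the risk of illness. A place whose RR stands well above 1 is the more likely source of the contaminated meal. The same two functions can be reused for the food items in Exercise 4.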

Exercise 4

Once the implicated place was determined, the investigation centered on the food. The following table lists the food items served in that place on April 28:

Food

Item

Beef

rendang

Burger

Ate the food item

Did not eat the food item

No.

Ill

Attack

No.

Ill

Attack

people

people

rate

people

people

rate

Salad

276

218

105

28

21

49

266

131

297

27

14

15

Baked

potato

139

11

213

31

88

48

279

25

175

18

203

49

Fruit

cocktail

Ice

cream

Relative

risk

Important note: None of the kitchen personnel were ill. The kitchen personnel and their participation in the food preparation were as follows: Ms Mary prepared the beef rendang and the potatoes; Johan prepared the salad and the fruit cocktail; Salmah served all dishes except the ice cream; and Jamilah prepared the burgers and served the ice cream. The ice cream was a commercial brand bought at a nearby supermarket.

Please calculate the attack rates per 100 (incidence rates per 100) by food item to determine which one was probably contaminated. Compare the attack rate (AR) for those who ate each food item with the attack rate for those who did not, using the relative risk (i.e., RR = AR in those who ate the food / AR in those who did not eat the food).

Interpret your findings.

Exercise 5

Given that the epidemic team worked fast enough and that the implicated meal(s) were identified before all food leftovers were discarded, food samples from some meal leftovers were taken to the laboratory. In addition, stool samples were taken from the kitchen personnel who prepared or handled each food item.

The laboratory confirmed that Salmonella toxin was present in some of the food samples and that one of the kitchen personnel of that place carried the same Salmonella species. Furthermore, the Salmonella species found in the food and in the kitchen worker was the same species found in the stool samples of the patients.

Please discuss these findings and identify the kitchen worker possibly responsible for the outbreak.

Discuss the general principles of prevention and control of gastroenteritis outbreaks.

Screening: Definition

Screening Test

individuals, of those who are sufficiently at

risk of a specific disorder

Screening program: Requirements (I)

Screening vs Diagnosis
- In ... make a definitive diagnosis or offer therapeutic intervention solely based on a positive result

... understood
- Have an agreed policy on whom to treat
- Prevalence of undiagnosed disease high
- Disease has high morbidity and mortality
- Of public health concern
- Early treatment easier and more effective

Screening program: Requirements (II)
- Signs present to indicate disease presence
- Screening test acceptable and harmless
- Screening test must be valid
- Yield of screening must be high
- Diagnostic work-up for a positive test must have acceptable morbidity
- Screening exercise must be cost-effective

... a screening test
- Resembles an observational study
- Same concepts apply to a diagnostic test
- Designed to determine how well a test can discriminate between diseased and non-diseased
- A predictor variable (the test result)
- An outcome variable (presence or absence of disease)

... a screening test

... screening tests:
- Dichotomous: +ve or -ve
- Continuous

Validity: Sensitivity and specificity

                          TRUTH
                          Disease              No disease
TEST RESULT   Positive    A  True-positive     B  False-positive
              Negative    C  False-negative    D  True-negative

- TEST RESULT: the predictor variable (categorical)
- TRUTH: the disease as outcome variable; presence or absence determined by a gold standard

Sensitivity = A / (A + C) x 100
Sensitivity is the proportion of those with the disease who tested positive. It indicates how good a test is at identifying the diseased.

Specificity = D / (B + D) x 100
Specificity is the proportion of those without the disease who tested negative. It indicates how good a test is at identifying the non-diseased.

RULE OUT the disease

CONFIRM the presence of disease

Predictive values
- Assess the usefulness of a test
- A test of efficient use of time and resources
- PVs estimate the probability of disease
- PVs describe the frequency of correct identification
- Positive PV and Negative PV

                          TRUTH
                          Disease              No disease
TEST RESULT   Positive    A  True-positive     B  False-positive
              Negative    C  False-negative    D  True-negative

PV+ = A / (A + B) x 100
The PV of a positive test is the proportion of individuals who test +ve and have the disease: the likelihood that a person who tests positive has the disease.

PV- = D / (C + D) x 100
The PV of a negative test is the proportion of individuals who test -ve and don't have the disease: the likelihood that a person who tests negative is actually disease free.

... implement a screening program

Comments

Predictive values: Mammography

                  Disease status
Test result       Cancer    No cancer    Total
Positive          132       985          1117
Negative          47        62295        62342
Total             179       63280        63459

Prevalence: 179/63459 x 100 = 0.3%
Sensitivity: 132/179 x 100 = 73.7%
Specificity: 62295/63280 x 100 = 98.4%
+ve PV: 132/1117 x 100 = 11.8%
-ve PV: 62295/62342 x 100 = 99.9%

- False +ve tests outnumber the true +ve tests by over 7:1 (PV+ = 12%)
- ~7 in every 8 patients who had positive mammograms had normal biopsies
- Predictive value for a positive test is low (12%)
- (False -ve: 47/62342, about 0.08%)
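The same indices can be computed with a small Python sketch. The class below is a hypothetical helper, not part of the lecture. It stores the four cells A, B, C and D of the 2x2 table and applies the formulas for sensitivity, specificity, PV+ and PV-. Prevalence is added as (A + C) / N.

```python
# Validity and predictive values from a 2x2 table (cell names as on the slide).
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    a: int  # test +ve, disease present (true positive)
    b: int  # test +ve, disease absent  (false positive)
    c: int  # test -ve, disease present (false negative)
    d: int  # test -ve, disease absent  (true negative)

    def sensitivity(self) -> float:
        return 100.0 * self.a / (self.a + self.c)

    def specificity(self) -> float:
        return 100.0 * self.d / (self.b + self.d)

    def ppv(self) -> float:
        return 100.0 * self.a / (self.a + self.b)

    def npv(self) -> float:
        return 100.0 * self.d / (self.c + self.d)

    def prevalence(self) -> float:
        n = self.a + self.b + self.c + self.d
        return 100.0 * (self.a + self.c) / n


mammography = TwoByTwo(a=132, b=985, c=47, d=62295)

if __name__ == "__main__":
    for name in ("sensitivity", "specificity", "ppv", "npv", "prevalence"):
        print(f"{name:12s} {getattr(mammography, name)():6.1f}%")
```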

Predictive values are affected by the prevalence of disease

SUMMARY
- A screening test study determines the usefulness of a test in identifying those at risk of a disease.
- Students must be able to calculate and interpret sensitivity, specificity and predictive values.

Year-2, Semester-1

Trigger:

You are the State Medical Officer for AIDS/HIV of Negeri Sembilan and you are expected to conduct sentinel surveillance for HIV amongst:

o


To select the appropriate screening test, you did a literature review and collated the following tables.

Calculate the sensitivity, specificity, PPV and NPV of each test to help you decide.

                   Test Positive   Test Negative   Total
Disease Present    TP              FN              TP + FN
Disease Absent     FP              TN              FP + TN
Total              TP + FP         FN + TN         N

TP = True Positive
FP = False Positive
FN = False Negative
TN = True Negative

Sensitivity = TP/(TP+FN) x 100%
Specificity = TN/(TN+FP) x 100%
PPV = TP/(TP+FP) x 100%
NPV = TN/(TN+FN) x 100%

EIA (blood)    Gold Standard +   Gold Standard -   Total
Test +         1000              9                 1009
Test -         0                 8991              8991
Total          1000              9000              10,000

PA             Gold Standard +   Gold Standard -   Total
Test +         999               270               1269
Test -         1                 8730              8731
Total          1000              9000              10,000

Rapid Test     Gold Standard +   Gold Standard -   Total
Test +         998               180               1178
Test -         2                 8820              8822
Total          1000              9000              10,000

Oral           Gold Standard +   Gold Standard -   Total
Test +         930               180               1110
Test -         70                8820              8890
Total          1000              9000              10,000

               EIA      PA       Rapid    Oral
Sensitivity
Specificity
PPV
NPV
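The following Python sketch applies the four formulas to each table. It assumes that the fourth table is the oral test, following the column order of the summary table. The dictionary and function names are illustrative.

```python
# Sensitivity, specificity, PPV and NPV (%) for the four HIV tests.
# Each tuple is (TP, FP, FN, TN) read off the corresponding table.
TESTS = {
    "EIA": (1000, 9, 0, 8991),
    "PA": (999, 270, 1, 8730),
    "Rapid": (998, 180, 2, 8820),
    "Oral": (930, 180, 70, 8820),
}


def validity(tp, fp, fn, tn):
    """Return the four indices as percentages."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }


summary = {name: validity(*cells) for name, cells in TESTS.items()}

if __name__ == "__main__":
    print(" " * 12 + "".join(f"{name:>9s}" for name in TESTS))
    for index in ("sensitivity", "specificity", "ppv", "npv"):
        print(f"{index:12s}" + "".join(f"{summary[n][index]:9.2f}" for n in TESTS))
```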


Based on the earlier analysis, HIV EIA, a test with a sensitivity of 100.0% and a specificity of 99.9%, was selected for the sentinel surveillance in Negeri Sembilan. You decided to include the inmates of Pusat Serenti Tampin and Pusat Serenti Jelebu in the sentinel surveillance. Each study population consisted of 10,000 people. Calculate the PPV, NPV and prevalence rate of HIV for each study population.

Antenatal mothers

              Disease Present   Disease Absent   Total
Positive      ____              10               13
Negative      ____              9987             9987
Total         ____              9997             10000

PPV =
NPV =

Blood donors

              Disease Present   Disease Absent   Total
Positive      ____              10               19
Negative      ____              9981             9981
Total         ____              9991             10000

PPV =
NPV =

IVDU

              Disease Present   Disease Absent   Total
Positive      2000              8                2008
Negative      ____              7992             7992
Total         2000              8000             10000

PPV =
NPV =

Population          Population with HIV   Population without HIV   TOTAL    Prevalence rate
Antenatal mothers   3                     9987                     10,000
Blood donors        9                     9991                     10,000
IVDUs               2000                  8000                     10,000

Since the sensitivity and specificity are the same for all three study populations, please discuss how PPV and NPV are affected by the prevalence of the disease in each study population.

PPV and NPV can also be calculated using the following formulas:

PPV = (Prev x Sen) / [(Prev x Sen) + (1 - Prev) x (1 - Sp)]

NPV = [(1 - Prev) x Sp] / [(1 - Prev) x Sp + Prev x (1 - Sen)]
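The two formulas translate directly into code. This Python sketch takes prevalence, sensitivity and specificity as proportions (for example 0.999 rather than 99.9%). It prints PPV and NPV for the HIV EIA (sensitivity 100%, specificity 99.9%) at several prevalences. The function names are illustrative.

```python
# PPV and NPV from prevalence, sensitivity and specificity (all proportions).
def ppv(prev, sen, sp):
    return (prev * sen) / (prev * sen + (1 - prev) * (1 - sp))


def npv(prev, sen, sp):
    return ((1 - prev) * sp) / ((1 - prev) * sp + prev * (1 - sen))


if __name__ == "__main__":
    # HIV EIA: sensitivity 100.0%, specificity 99.9%
    for prev in (0.0001, 0.0003, 0.0009, 0.01, 0.05, 0.10, 0.20, 0.30):
        print(f"prevalence {prev:7.2%}  PPV {ppv(prev, 1.0, 0.999):7.2%}  "
              f"NPV {npv(prev, 1.0, 0.999):7.2%}")
```

Because the formulas use exact expected counts rather than counts rounded to whole people, the results match the PPV column of the prevalence table.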

Population = 10,000     Sensitivity = 100.00%     Specificity = 99.90%

             Gold standard +   Gold standard -   Total
Test +       a (TP)            b (FP)            a+b
Test -       c (FN)            d (TN)            c+d
Total        a+c               b+d               a+b+c+d

Prevalence   TP (a)   FP (b)   FN (c)   TN (d)   a+c    b+d     PPV a/(a+b)   NPV d/(c+d)
0.01%        1        10       0        9,989    1      9,999   9.09%         100.00%
0.02%        2        10       0        9,988    2      9,998   16.67%        100.00%
0.03%        3        10       0        9,987    3      9,997   23.08%        100.00%   <- Antenatal
0.05%        5        10       0        9,985    5      9,995   33.34%        100.00%
0.09%        9        10       0        9,981    9      9,991   47.39%        100.00%   <- Blood Donors
1.00%        100      10       0        9,890    100    9,900   90.99%        100.00%
5.00%        500      10       0        9,491    500    9,500   98.14%        100.00%
10.00%       1000     9        0        8991     1000   9,000   99.11%        100.00%
20.00%       2000     8        0        7992     2000   8,000   99.60%        100.00%
30.00%       3000     7        0        6993     3000   7,000   99.77%        100.00%

PPV at different prevalences for other combinations of sensitivity and specificity:

Sensitivity %   Specificity %   Prevalence: 20.0%   10.0%   5.0%    1.0%    0.1%
99%             99%                         96.1%   91.7%   83.9%   50.0%   9.0%
95%             95%                         82.6%   67.9%   50.0%   16.1%   1.9%
90%             90%                         69.2%   50.0%   32.1%   8.3%    0.9%
80%             80%                         50.0%   30.8%   17.4%   3.9%    0.4%


References:

Osman Ali. 1990. Kaedah Epidemiologi. Publisher: Dewan Bahasa Dan Pustaka.

UNAIDS/WHO. 2004. UNAIDS/WHO Policy Statement on HIV Testing. http://www.who.int/ethics/topics/hivtestingpolicy_who_unaids_en_2004.pdf

WHO. March 1997. Revised Recommendation for the Selection and Use of HIV Antibody Tests. Weekly Epidemiological Record, No. 12. http://www.who.int/docstore/wer/pdf/1997/wer7212.pdf

WHOSEA. 1998. Standard Operating Procedures for Diagnosis of HIV Infection. http://w3.whosea.org/bct/332/diagnosis1.htm

CDC. 2005. What are the different HIV screening tests available in the U.S.? http://www.cdc.gov/hiv/pubs/faq/faq8.htm

USFDA. 2006. Donor Screening Assays for Infectious Agents and HIV Diagnostic Assays. http://www.fda.gov/cber/products/testkits.htm

Joseph Hellweg. 2005. Narrative and Secrecy: Sentinel Surveillance and Alternative Epidemiologies of HIV/AIDS in Northwestern Côte d'Ivoire. Africa Conference 2005: African Health and Illness. http://www.utexas.edu/conferences/africa/2005/panels/hellweg.html

Trisha Greenhalgh. 1997. How to read a paper: Papers that report diagnostic or screening tests. BMJ 1997;315:540-543 (30 August). http://bmj.bmjjournals.com/cgi/content/full/315/7107/540

PRACTICALS GUIDE

Medicine & Society Module (FF2613)

INTRODUCTION

In this module there will be 4 practical sessions for the research project and statistical exercises. Students will be guided by the lecturer/tutor assigned to their lab.

The schedule for the practical sessions for this semester is as follows:

DATE: 14/07/10   TIME: 10.30-12.30
TOPIC: Descriptive Statistics & Research Project 1
CONTENT: ... given dataset, including calculating the measures of central tendency and variability using statistical formulas. Determine the title, objective, problem framework, hypothesis and methodology. Once the above has been agreed upon, as homework, students are expected to write up the proposal, including the questionnaire, which will be discussed during the second practical session.

DATE: 21/07/10   TIME: 10.30-12.30
TOPIC: Analysis of Quantitative Data & Research Project 2
CONTENT: ... proportionate tests using the given dataset. Presentation of the complete research proposal. Upon acceptance, as homework, students are expected to distribute the questionnaires and collect the data for the study. All completed forms are to be brought to the third practical session.

DATE: 20/08/10   TIME: 2.30-4.30
TOPIC: Correlation & Research Project 3
CONTENT: ... regression using the given dataset. Students are guided on how to enter the data into the computer using Excel or SPSS. Each lab is required to prepare a notebook for the session. For homework, students will complete the data entry for all collected data and bring the complete file to the fourth practical session.

DATE: 27/08/10   TIME: 10.30-12.30
TOPIC: Research Project 4
CONTENT: ... and chi-square tests using the given dataset. Each lecturer will demonstrate how to analyse the data using the computer and advise on the interpretation of results. For homework, students will complete the analysis and prepare a PowerPoint presentation for the final practical session.

DATE: 20/09/10 and 24/09/10   TIME: 2.00-4.00 and 10.00-12.00
TOPIC: Research Project 5
CONTENT: ... the students will prepare a written report of the study, to be submitted two weeks after their presentation.

Practical 1

Descriptive Statistics

Introduction

In the old curriculum, the practical sessions were slotted immediately after the respective lectures. In the past we had 25 hours of lectures and 8 practical sessions just for statistics and research methodology. In the new curriculum we have only 7 hours of lectures and 4 practical sessions for statistics and research methodology. Whenever possible, we try to slot the practical sessions according to the lectures, but we can't cover everything; therefore students are also expected to learn on their own. Please be patient and persist in doing the exercises.

For this session, we will learn about measures of central tendency and variability, which we use to describe the data that we have collected. The measures of central tendency are the mean, mode and median; the measure of variability used here is the standard deviation (sd). Kindly refer to your formula sheet or your books for help.

Measures of Central Tendency for Quantitative Data

1. Write down the formulas for the mean in the boxes below:

Basic Formula

2. Calculate the mean, mode and median for the ages of the following respondents:

35 24 36 21 21 20 34 29 37 30 26 27 29 34 33 33 27 25 21 26 32 30 33 36 28 33 19 29 27 29 22 23 31 32 31

Total = ___________

Mean = __________

n = ________

Median = __________

Mode = __________
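Once you have worked Q2 by hand, you can check your answers with Python's standard `statistics` module. The sketch uses `statistics.multimode` (Python 3.8 or later) rather than `statistics.mode`, because `multimode` reports every value that shares the highest frequency.

```python
# Mean, median and mode(s) of the 35 respondents' ages in Q2.
import statistics

AGES = [35, 24, 36, 21, 21, 20, 34, 29, 37, 30, 26, 27, 29, 34, 33, 33, 27, 25,
        21, 26, 32, 30, 33, 36, 28, 33, 19, 29, 27, 29, 22, 23, 31, 32, 31]

total = sum(AGES)
n = len(AGES)
mean = total / n
median = statistics.median(AGES)      # middle value of the sorted list
modes = statistics.multimode(AGES)    # every value with the highest frequency

if __name__ == "__main__":
    print(f"Total = {total}, n = {n}, mean = {mean:.2f}")
    print(f"Median = {median}, Mode(s) = {modes}")
```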

3. Write down the formulas for standard deviation in the boxes below:

Basic Formula

4. Using the data from Q.2, calculate the standard deviation and variance of the respondents' ages.

x        x-mean   (x-mean)²
19.00
20.00
21.00
21.00
21.00
22.00
23.00
24.00
25.00
26.00
26.00
27.00
27.00
27.00
28.00
29.00
29.00
29.00
29.00
30.00
30.00
31.00
31.00
32.00
32.00
33.00
33.00
33.00
33.00
34.00
34.00
35.00
36.00
36.00
37.00
Total

Therefore standard deviation s = _________________
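The worksheet method can be sketched in Python: sum the (x-mean)² column, divide by n - 1, then take the square root. The sketch assumes the sample standard deviation with n - 1 in the denominator. `statistics.stdev` also uses n - 1, so the two results are compared as a check.

```python
# Sample variance and standard deviation of the ages (worksheet method).
import math
import statistics

AGES = [35, 24, 36, 21, 21, 20, 34, 29, 37, 30, 26, 27, 29, 34, 33, 33, 27, 25,
        21, 26, 32, 30, 33, 36, 28, 33, 19, 29, 27, 29, 22, 23, 31, 32, 31]

mean = sum(AGES) / len(AGES)
sum_sq_dev = sum((x - mean) ** 2 for x in AGES)   # Total of the (x-mean)^2 column
variance = sum_sq_dev / (len(AGES) - 1)           # sample variance s^2
sd = math.sqrt(variance)                          # standard deviation s

# Cross-check against the standard library implementation.
assert math.isclose(sd, statistics.stdev(AGES))

if __name__ == "__main__":
    print(f"sum of (x-mean)^2 = {sum_sq_dev:.2f}, variance = {variance:.2f}, s = {sd:.2f}")
```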

It is easy to calculate the mean and standard deviation for data with few observations, but for studies with a large number of samples it is much harder. Therefore, for large studies, the quantitative data are sorted into frequency tables such as the one below:

5. These data are from a case-control study to identify factors associated with small for gestational age (SGA) among newborn babies. For the table below, the factor being studied is the weight of the mothers during the first trimester (first three months of pregnancy) and the incidence of babies with low birth weight.

Weight during first     All           Frequency   Frequency
trimester in kg         frequencies   of cases    of controls
30.0-39.9               5             5           0
40.0-49.9               69            48          21
50.0-59.9               82            43          39
60.0-69.9               45            10          35
70.0-79.9               10            2           8
80.0-89.9               3             1           2
90.0-99.9               4             1           3
Total                   218           110         108

For the following exercise, calculate the mean, mode, median and standard deviation for both cases and controls. To simplify matters, just fill in the tables below:

For cases:

Weight in kg   Frequency   m.p     f.mp   f.mp²   f cumulative
30.0-39.9      5           34.95                  5
40.0-49.9      48          44.95                  53
50.0-59.9      43          54.95                  96
60.0-69.9      10          64.95                  106
70.0-79.9      2           74.95                  108
80.0-89.9      1           84.95                  109
90.0-99.9      1           94.95                  110
Total          110

For controls:

Weight in kg   Frequency   m.p     f.mp   f.mp²   f cumulative
30.0-39.9      0           34.95   0      0       0
40.0-49.9      21          44.95                  21
50.0-59.9      39          54.95                  60
60.0-69.9      35          64.95                  95
70.0-79.9      8           74.95                  103
80.0-89.9      2           84.95                  105
90.0-99.9      3           94.95                  108
Total          108

Fill in your answers in the table below:

                     Case     Control
Mean
Mode
Median
Standard deviation
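For grouped data, every observation in a class is represented by the class midpoint (m.p). This Python sketch computes the f.mp and f.mp² totals and returns the mean and sample standard deviation for cases and controls, assuming an n - 1 denominator. The grouped median and mode need their own interpolation formulas and are not covered here.

```python
# Mean and sample SD from a grouped frequency table, using class midpoints.
import math

MIDPOINTS = [34.95, 44.95, 54.95, 64.95, 74.95, 84.95, 94.95]
CASES = [5, 48, 43, 10, 2, 1, 1]
CONTROLS = [0, 21, 39, 35, 8, 2, 3]


def grouped_mean_sd(freqs, mids):
    n = sum(freqs)
    sum_fx = sum(f * m for f, m in zip(freqs, mids))        # total of f.mp
    sum_fx2 = sum(f * m * m for f, m in zip(freqs, mids))   # total of f.mp^2
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)        # sample variance
    return mean, math.sqrt(variance)


case_mean, case_sd = grouped_mean_sd(CASES, MIDPOINTS)
ctrl_mean, ctrl_sd = grouped_mean_sd(CONTROLS, MIDPOINTS)

if __name__ == "__main__":
    print(f"Cases:    mean = {case_mean:.2f} kg, sd = {case_sd:.2f} kg")
    print(f"Controls: mean = {ctrl_mean:.2f} kg, sd = {ctrl_sd:.2f} kg")
```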

Practical 1b

Research Proposal

Each lab group is required to come up with a research proposal, collect the required data, analyse the data, present their findings and write up the final report for submission.

For this session, the students are expected to agree on the:

Title of the research

Objectives

Problem Framework

Hypothesis

Methodology

Once the above has been agreed upon, as homework, they are expected to write up the proposal, including the questionnaire, which will be discussed during the second practical session.

Practical 2

Inferential Statistics

Statistical Tests & Types of Variables

In general there are 2 types of variables: qualitative and quantitative. When you want to test the association between 2 variables, the type of test to be used depends on the type of variables. The tables below give a general guide to the correct statistical test for the respective variable types.

Qualitative Data Analysis

Parametric Analysis (normally distributed data)

Variable 1                Variable 2                Criteria                                             Type of Test
Qualitative Dichotomous   Quantitative                                                                   Student's t Test
Qualitative Polynomial    Quantitative                                                                   ANOVA
Quantitative              Quantitative              Repeated measurement of the same individual & item   Paired t Test
                                                    (e.g. Hb level before & after treatment)
Quantitative continuous   Quantitative continuous                                                        Linear Regression

Non-Parametric Analysis

Variable 1                Variable 2                Criteria                                             Type of Test
Qualitative Dichotomous   Qualitative Dichotomous   Sample size < 20 or (< 40 but with at least one      Fisher Test
                                                    expected value < 5)
Qualitative Dichotomous   Quantitative              Data not normally distributed                        Wilcoxon Rank Sum Test or Mann-Whitney U Test
Qualitative Polynomial    Quantitative              Data not normally distributed                        Kruskal-Wallis One Way ANOVA Test
Quantitative              Quantitative              Repeated measurement of the same individual & item   Wilcoxon Signed Rank Test
Quantitative continuous   Quantitative continuous   Data not normally distributed                        Spearman/Kendall Rank Correlation

Practical 2

This is the second practical session for this module. In this session, we will be conducting exercises on Student's t-test, the paired t-test and the proportionate test.

Students t-test

1a. Write down the formula for Student's t-test in the boxes below:

Basic Formula

b. Based on the results from the previous session (Q5), complete the boxes below:

                     Case     Control
Mean
Standard deviation
n                    110      108

There is a difference in first trimester body weight between the cases (mothers with SGA babies) and controls (mothers with non-SGA babies).

c. Write down the null hypothesis:

e. Please refer to tables A1 and A3, and try to estimate the p value from the calculated t value. Discuss which table is more appropriate for this exercise.

f. Based on the above p value, is the null hypothesis rejected?

g. Is there a significant difference in first trimester weight between the two groups? Explain your answer.

2. During the examination, we will not tell you which test to use. Instead, students are expected to choose the appropriate one based on the problem and the data given. For example, try the exercise below:

A case-control study to identify factors that can cause small for gestational age (SGA) was conducted. Among the factors studied were the mothers' heights. It is believed that shorter mothers were at higher risk of having SGA babies.

              Total of samples (n)   Total of height (Σx)   Total of (x-mean)²
Case          110                    16620                  2326
Control       108                    16439                  3605
Both groups   218                    33059                  5931

a. State the hypothesis and null hypothesis for the above problem.

c. Using the data given, conduct the statistical test.
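The table already gives n, the sum of x and the sum of (x-mean)² for each group, so the pooled-variance Student's t statistic can be computed without the raw data. The Python sketch below assumes equal variances (pooled t-test). `pooled_t` is an illustrative name.

```python
# Pooled-variance Student's t-test from summary totals (n, sum of x, sum of (x-mean)^2).
import math


def pooled_t(n1, sum1, ss1, n2, sum2, ss2):
    mean1, mean2 = sum1 / n1, sum2 / n2
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)         # pooled variance s_p^2
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # SE of the difference in means
    t = (mean1 - mean2) / se
    return mean1, mean2, t, n1 + n2 - 2


mean_case, mean_ctrl, t_value, df = pooled_t(110, 16620, 2326, 108, 16439, 3605)

if __name__ == "__main__":
    print(f"mean(case) = {mean_case:.2f} cm, mean(control) = {mean_ctrl:.2f} cm")
    print(f"t = {t_value:.3f} with df = {df}")
```

A negative t means that the case mean is lower than the control mean. The hypothesis is directional (shorter mothers at higher risk), so compare |t| with the one-tailed critical value for 216 degrees of freedom.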

Paired t-test

3a. Write down the formula for the paired t-test in the box below:

Basic Formula

b. Thirty of the pregnant mothers were found to be anaemic during their second trimester follow-up. They were treated with haematinics for 2 months and their haemoglobin levels were measured again. To measure the effectiveness of the treatment, please complete the table below.

No.   Hb1    Hb2    D    D²
1     9.3    9.5
2     9.5    10.0
3     9.5    10.0
4     9.6    11.0
5     9.7    12.0
6     9.8    9.0
7     9.8    9.6
8     10.0   7.2
9     10.0   9.6
10    10.0   10.0
11    10.0   10.0
12    10.0   10.0
13    10.0   10.0
14    10.0   10.0
15    10.0   10.0
16    10.0   10.0
17    10.0   10.0
18    10.0   10.3
19    10.0   10.5
20    10.0   10.6
21    10.0   10.8
22    10.0   11.0
23    10.0   11.0
24    10.0   11.0
25    10.0   11.0
26    10.0   11.5
27    10.0   13.0
28    10.0   13.0
29    10.0   13.0
30    10.1   11.0
Total

c. Is the intervention effective? Do a paired t-test analysis using the data above.
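The paired t-test works on the differences only. This Python sketch computes ΣD and ΣD², the mean and standard deviation of D, and t = mean(D) / (sd(D) / √n) with n - 1 degrees of freedom. It assumes D = Hb2 - Hb1 (after minus before), so a positive mean difference means haemoglobin rose.

```python
# Paired t-test on haemoglobin before (Hb1) and after (Hb2) haematinics.
import math

HB1 = [9.3, 9.5, 9.5, 9.6, 9.7, 9.8, 9.8] + [10.0] * 22 + [10.1]
HB2 = [9.5, 10.0, 10.0, 11.0, 12.0, 9.0, 9.6, 7.2, 9.6] + [10.0] * 8 + [
    10.3, 10.5, 10.6, 10.8, 11.0, 11.0, 11.0, 11.0, 11.5, 13.0, 13.0, 13.0, 11.0]


def paired_t(before, after):
    d = [b - a for a, b in zip(before, after)]       # D = after - before
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    mean_d = sum_d / n
    sd_d = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))
    t = mean_d / (sd_d / math.sqrt(n))
    return mean_d, sd_d, t, n - 1


mean_d, sd_d, t_value, df = paired_t(HB1, HB2)

if __name__ == "__main__":
    print(f"mean D = {mean_d:.2f}, sd D = {sd_d:.3f}, t = {t_value:.3f}, df = {df}")
```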

Proportionate Test

4a. Write down the formula for the proportionate test in the box below:

Basic Formula

The rate of SGA for mothers exposed to cigarette smoke (passive smokers) was 89/156. The rate of SGA for mothers not exposed to cigarette smoke was 20/61.

b. State the appropriate null hypothesis.

c. Do the proportionate test and discuss its result using 0.05 as the level of significance (the z value in the normal distribution table for a two-tailed 0.05 level of significance is 1.96).
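Here is a minimal Python sketch of the proportionate test. It assumes the usual pooled-proportion standard error, SE = √[p(1 - p)(1/n1 + 1/n2)] with p = (x1 + x2)/(n1 + n2). The function name is illustrative.

```python
# Two-proportion z test with a pooled standard error.
import math


def two_proportion_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                        # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return p1, p2, (p1 - p2) / se


p_smoke, p_non, z = two_proportion_z(89, 156, 20, 61)
significant = abs(z) > 1.96                          # two-tailed, alpha = 0.05

if __name__ == "__main__":
    print(f"p1 = {p_smoke:.3f}, p2 = {p_non:.3f}, z = {z:.2f}, significant: {significant}")
```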

Research Project 2

Presentation of the complete research proposal. Upon acceptance of the proposal, as homework, the students are expected to distribute the questionnaires and collect the data for the study. All completed forms are to be brought to the third practical session.

Practical 3

Inferential Statistics 2

Introduction

This is the third practical session. In this session we will do exercises on Pearson correlation and linear regression.

Pearson Correlation

1a. Write down the formula for Pearson correlation in the boxes below:

Basic Formula for r

r = Σ(x - mean x)(y - mean y) / √[Σ(x - mean x)² × Σ(y - mean y)²]
  = [Σxy - (Σx)(Σy)/n] / √{[Σx² - (Σx)²/n] × [Σy² - (Σy)²/n]}

As you can see from the formulas above, to calculate the correlation coefficient (r), you need to identify the following:

Total of the first variable (Σx),

Total of the first variable squared (Σx²),

Total of the second variable (Σy),

Total of the second variable squared (Σy²) and

Total of the two variables multiplied (Σxy).

Just imagine the number of calculations that you have to do before you can even calculate the correlation coefficient (r). If the sample size is 150, you will have to do more than 455 calculations. Since you'll be doing these calculations manually, the chance of error is quite high indeed.

For this exercise, complete the following table and measure the time required to complete it. Once done, note that you may have to do the same thing again for a dataset 5 times larger than this.

2. A case-control study to identify factors that can cause small for gestational age (SGA) was conducted.

In the past exercise, we showed that there is an association between the mother's first trimester weight and SGA.

Now we want to see whether there is an association between the mother's first trimester weight (WEIGHT2) and the child's birth weight (BIRTHWGT).

Please complete the following table:

INDEX   WEIGHT2   WEIGHT2²   BIRTHWGT   BIRTHWGT²   xy
9       42.00                2.40
10      40.00                2.30
12      66.00                2.30
20      51.50                2.10
21      47.50                2.23
29      39.50                2.49
31      40.00                2.46
32      46.50                2.52
34      55.00                2.28
43      49.20                2.20
60      45.00                2.48
70      63.50                2.00
72      52.40                2.31
79      52.30                2.15
90      47.50                2.55
97      62.00                2.41
117     55.10                3.46
126     72.00                3.50
131     61.50                2.97
138     86.00                3.48
145     60.80                3.00
146     44.00                2.84
156     58.00                3.55
159     70.00                3.19
171     44.00                3.09
173     59.50                3.56
174     47.50                3.16
175     53.00                3.10
178     62.50                3.27
181     92.00                3.00
TOTAL

a. State the null hypothesis for the correlation test between the two variables.

b. Conduct the correlation test and calculate r (the correlation coefficient). How strong is the relationship between the two variables?
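The table above can be checked with the computational formula for r, which needs exactly the five totals listed earlier (Σx, Σx², Σy, Σy², Σxy). This Python sketch also computes t = r√(n - 2)/√(1 - r²) for the test of significance, with n - 2 degrees of freedom. The helper name is illustrative.

```python
# Pearson correlation between first-trimester weight and birth weight (Q2).
import math

WEIGHT2 = [42.00, 40.00, 66.00, 51.50, 47.50, 39.50, 40.00, 46.50, 55.00, 49.20,
           45.00, 63.50, 52.40, 52.30, 47.50, 62.00, 55.10, 72.00, 61.50, 86.00,
           60.80, 44.00, 58.00, 70.00, 44.00, 59.50, 47.50, 53.00, 62.50, 92.00]
BIRTHWGT = [2.40, 2.30, 2.30, 2.10, 2.23, 2.49, 2.46, 2.52, 2.28, 2.20,
            2.48, 2.00, 2.31, 2.15, 2.55, 2.41, 3.46, 3.50, 2.97, 3.48,
            3.00, 2.84, 3.55, 3.19, 3.09, 3.56, 3.16, 3.10, 3.27, 3.00]


def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    num = sxy - sx * sy / n
    den = math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return num / den


r = pearson_r(WEIGHT2, BIRTHWGT)
n = len(WEIGHT2)
t_value = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # df = n - 2

if __name__ == "__main__":
    print(f"r = {r:.3f}, t = {t_value:.2f}, df = {n - 2}")
```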

the one below;

[Figure: scatter plot of Babies' Birthweight (y-axis, 0.0 to 3.6) against Mothers' Weight (x-axis, 0 to 100), annotated r = 0.431, p = 0.017, Rsq = 0.1874]

Expecting students to calculate all of that during the examination would be rather cruel. Instead, all the required data will usually be given, along with some extraneous data, just to confuse the students. It is up to the students to select the appropriate data and use it in the appropriate statistical test.

3. A case-control study to identify factors that can cause small for gestational age (SGA) was conducted. Among the factors studied was whether there is an association between the mother's height in cm (HEIGHT) and the child's birth weight in kilograms (BIRTHWGT).

n = 218

           Mean     Standard    Σ(observation)   Σ(observation²)   Σ(observation 1 x observation 2)
                    deviation
HEIGHT     151.65   5.26        33059.00         5019291.00        92386.35
BIRTHWGT   2.79     0.54        608.46           1760.98

a. Name the appropriate statistical test to test the association between the two variables.

b. State the null hypothesis for the above statistical test.

c. Conduct the statistical test, including the test of significance. Discuss the result of the test.
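With summary totals only, the same computational formula still works. This Python sketch uses the totals from the table rather than the rounded means and standard deviations, which would add rounding error. It then applies the t test for r with df = n - 2.

```python
# Pearson r and its t statistic from summary totals (HEIGHT vs BIRTHWGT, n = 218).
import math

n = 218
sum_x, sum_x2 = 33059.00, 5019291.00   # HEIGHT: sum and sum of squares
sum_y, sum_y2 = 608.46, 1760.98        # BIRTHWGT: sum and sum of squares
sum_xy = 92386.35                      # sum of HEIGHT x BIRTHWGT

sxx = sum_x2 - sum_x ** 2 / n          # corrected sum of squares of x
syy = sum_y2 - sum_y ** 2 / n          # corrected sum of squares of y
sxy = sum_xy - sum_x * sum_y / n       # corrected sum of cross-products

r = sxy / math.sqrt(sxx * syy)
t_value = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # df = n - 2

if __name__ == "__main__":
    print(f"r = {r:.3f}, t = {t_value:.2f}, df = {n - 2}")
```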

Linear Regression

4a. Write down the formula for linear regression in the boxes below:

Basic Formula

b. Using the data from Q2, conduct the test for linear regression and calculate the regression coefficient (b) and constant (a).

d. Draw a rough diagram of the final equation from the calculation.
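The least-squares line y = a + bx uses b = Σ(x - mean x)(y - mean y) / Σ(x - mean x)² and a = mean y - b × mean x. This Python sketch applies those two formulas to the Q2 data. The helper name is illustrative.

```python
# Least-squares regression of birth weight (y) on first-trimester weight (x), Q2 data.
WEIGHT2 = [42.00, 40.00, 66.00, 51.50, 47.50, 39.50, 40.00, 46.50, 55.00, 49.20,
           45.00, 63.50, 52.40, 52.30, 47.50, 62.00, 55.10, 72.00, 61.50, 86.00,
           60.80, 44.00, 58.00, 70.00, 44.00, 59.50, 47.50, 53.00, 62.50, 92.00]
BIRTHWGT = [2.40, 2.30, 2.30, 2.10, 2.23, 2.49, 2.46, 2.52, 2.28, 2.20,
            2.48, 2.00, 2.31, 2.15, 2.55, 2.41, 3.46, 3.50, 2.97, 3.48,
            3.00, 2.84, 3.55, 3.19, 3.09, 3.56, 3.16, 3.10, 3.27, 3.00]


def least_squares(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b = sxy / sxx                # regression coefficient (slope)
    a = mean_y - b * mean_x      # constant (intercept)
    return a, b


a, b = least_squares(WEIGHT2, BIRTHWGT)

if __name__ == "__main__":
    print(f"BIRTHWGT = {a:.3f} + {b:.4f} x WEIGHT2")
```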

Research Project 3

Students will be guided on how to enter the data that they have collected into the computer using Excel or SPSS. Each lab is required to prepare a notebook for the session.

For homework, students are required to complete the data entry for all collected data and bring the completed file to the fourth practical session.

Practical 4

Inferential Statistics 3

Introduction

In this practical session, we will do exercises on the chi-square test and non-parametric analysis.

Chi-Square Test (χ²)

This is the statistical analysis most frequently tested in our examination, so make sure that you really understand it. This analysis is done to test for an association between two qualitative variables.

Observed data are sorted accordingly in a contingency table. The expected value table is then calculated using the row and column totals, as illustrated below:

Observation Table

        +       -
+       a       b       g
-       c       d       h
        e       f       n

Expected Value Table

        +       -
+       eg/n    fg/n    g
-       eh/n    fh/n    h
        e       f       n

Chi-square is calculated by summing up (observed - expected)²/expected for each cell.

χ² = Σ (O - E)² / E

df = (r - 1)(c - 1)

Do the following exercise. Then compare your answer with the answer for Q.4c from Practical 2.

1. The rate of SGA for mothers exposed to cigarette smoke ("passive smokers") is 89/156. The rate of SGA for mothers not exposed to cigarette smoke is 20/61.

Observation table

                  SGA     Normal   Total
Passive Smoker    89      67       156
Non-Smoker        20      41       61
Total             109     108      217

a. Complete the table of expected values below:

                  SGA     Normal
Passive Smoker
Non-Smoker

b. What is the null hypothesis?

c. What is the rate of SGA for passive smokers and non-smokers? Is there any difference?

d. Conduct the appropriate statistical test to prove your hypothesis. Discuss your findings.
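The steps above (expected = row total x column total / grand total, then χ² = Σ(O - E)²/E with df = (r - 1)(c - 1)) can be sketched in Python as follows. The function name is illustrative, and no continuity correction is applied.

```python
# Chi-square test of association for a contingency table (no continuity correction).
OBSERVED = [
    [89, 67],   # passive smoker: SGA, normal
    [20, 41],   # non-smoker:     SGA, normal
]


def chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi2, df


expected, chi2, df = chi_square(OBSERVED)

if __name__ == "__main__":
    print("expected:", [[round(e, 2) for e in row] for row in expected])
    print(f"chi-square = {chi2:.2f}, df = {df} (critical value 3.84 at 0.05)")
```

Without a continuity correction, χ² for a 2x2 table equals the square of the z statistic from the pooled proportionate test. This is the link to the comparison with Practical 2 asked for above.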

Fisher's Exact Test

Fisher's Exact Test is conducted to test the association between 2 qualitative variables when the sample size is small: less than 20, or less than 40 with one of the expected values less than 5. The formula is as follows:

        +       -
+       a       b       g
-       c       d       h
        e       f       n

probability p = e! f! g! h! / (n! a! b! c! d!)

2. From the earlier SGA study, 23 of the respondents had had miscarriages in the past. By analysing this group of patients with a poor obstetric history, is there an association between exposure to cigarette smoke and SGA? Based on the following contingency table, conduct the appropriate statistical test.

          Passive Smoker   Non-Smoker   Total
SGA       10               0            10
Normal    7                6            13
Total     17               6            23

a. What is the p value?

b. What conclusion can you make from the above results?
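The factorial formula gives the probability of one particular table with the observed margins. This Python sketch computes that probability. It also sums it with every other table of the same margins that is no more likely, which is one common definition of the two-sided p value. Since b = 0, the observed table is already the most extreme in its direction, so its own probability is also the one-tailed p. The function names are illustrative.

```python
# Fisher's exact test for a 2x2 table via p = e! f! g! h! / (n! a! b! c! d!).
from math import factorial


def table_probability(a, b, c, d):
    e, f = a + c, b + d          # column totals
    g, h = a + b, c + d          # row totals
    n = a + b + c + d
    num = factorial(e) * factorial(f) * factorial(g) * factorial(h)
    den = (factorial(n) * factorial(a) * factorial(b)
           * factorial(c) * factorial(d))
    return num / den


def fisher_two_sided(a, b, c, d):
    """Sum the probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    g, e = a + b, a + c
    n = a + b + c + d
    p_obs = table_probability(a, b, c, d)
    total = 0.0
    for x in range(max(0, g + e - n), min(g, e) + 1):
        p = table_probability(x, g - x, e - x, n - g - e + x)
        if p <= p_obs * (1 + 1e-9):      # small tolerance for rounding error
            total += p
    return p_obs, total


# SGA row: 10 passive smokers, 0 non-smokers; Normal row: 7 and 6.
p_observed, p_two_sided = fisher_two_sided(10, 0, 7, 6)

if __name__ == "__main__":
    print(f"p (observed table, one-tailed) = {p_observed:.4f}")
    print(f"p (two-sided) = {p_two_sided:.4f}")
```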

Wilcoxon Rank Sum Test

This test is the non-parametric equivalent of Student's t test, but is conducted on data that are not normally distributed. It is used to test for an association between a qualitative dichotomous variable and a quantitative variable. The method is simple: sort the data in ascending order, rank them, sum up the ranks according to groups and compare the value with the table of critical values for the Wilcoxon Rank Sum Test.

3. For practice, do the following exercise. The data are a subset of the earlier study. We are trying to see whether there is any association between exposure to cigarette smoke and the weight of the baby. Since the sample size is quite small, the appropriate test is a non-parametric analysis.

a. What is the null hypothesis?

b. Conduct the appropriate statistical test to prove your hypothesis. Discuss your findings.

Non-Smoker (n=15)

Birth Weight   Rank
4.20
3.96
3.70
3.61
3.26
3.15
3.12
3.00
2.98
2.84
2.81
2.57
2.44
2.43
2.10

Passive Smoker (n=15)

Birth Weight   Rank
3.76
3.60
3.55
3.48
3.25
3.06
3.05
2.55
2.47
2.46
2.45
2.45
2.43
2.30
2.09
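The ranking procedure described above can be sketched in Python. It pools both groups, ranks from smallest to largest with tied values sharing the average of their ranks, and then sums the ranks per group. The smaller rank sum is the value to compare with the critical-value table for n1 = n2 = 15. The helper name is illustrative.

```python
# Wilcoxon rank sum: pooled ranking with average ranks for ties.
NON_SMOKER = [4.20, 3.96, 3.70, 3.61, 3.26, 3.15, 3.12, 3.00,
              2.98, 2.84, 2.81, 2.57, 2.44, 2.43, 2.10]
PASSIVE_SMOKER = [3.76, 3.60, 3.55, 3.48, 3.25, 3.06, 3.05, 2.55,
                  2.47, 2.46, 2.45, 2.45, 2.43, 2.30, 2.09]


def average_ranks(values):
    """Map each distinct value to its (average) rank in the pooled data."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j + 1) / 2   # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


ranks = average_ranks(NON_SMOKER + PASSIVE_SMOKER)
rank_sum_non = sum(ranks[v] for v in NON_SMOKER)
rank_sum_passive = sum(ranks[v] for v in PASSIVE_SMOKER)

if __name__ == "__main__":
    print(f"Rank sum, non-smokers: {rank_sum_non}; passive smokers: {rank_sum_passive}")
```

As a check, the two rank sums must always add up to N(N + 1)/2, which is 465 for N = 30.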

Wilcoxon Signed Rank Test

This test is the non-parametric equivalent of the paired t test, but is conducted on data that are not normally distributed. It is used to test whether there is any association between 2 quantitative variables that are repeated measures on the same individual, or of the same thing, at different times. As indicated by the name, the calculation depends on the sign and relative magnitude of the data, not on the actual values of the data.

4. For practice, do the following exercise. The data are a subset of the earlier study. We are trying to see whether haematinics can increase the haemoglobin level of anaemic mothers. Since the sample is quite small, the appropriate test is a non-parametric analysis.

a. What is the null hypothesis?

b. Conduct the appropriate statistical test to prove your hypothesis. Discuss your findings.

INDEX   HB2    HB3    Hb Diff   Rank
1       10.4   10.0
5       10.7   11.0
13      10.5   11.0
17      10.6   10.8
19      10.5   11.0
20      10.7   11.0
29      10.3   9.5
60      9.3    9.5
61      10.0   11.5
92      10.5   10.8
94      10.4   8.2
95      10.0   7.2
134     10.2   11.0
188     10.0   10.0
197     10.2   10.0
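The signed rank procedure can be sketched in Python. Take each difference (HB3 - HB2 is assumed here), drop zero differences, and rank the absolute differences, with ties sharing the average rank. Then sum the ranks of the positive and of the negative differences separately. The smaller sum is compared with the critical-value table for the number of non-zero differences. The helper name is illustrative.

```python
# Wilcoxon signed rank test on paired haemoglobin readings (HB2 -> HB3).
HB2 = [10.4, 10.7, 10.5, 10.6, 10.5, 10.7, 10.3, 9.3, 10.0, 10.5, 10.4, 10.0, 10.2, 10.0, 10.2]
HB3 = [10.0, 11.0, 11.0, 10.8, 11.0, 11.0, 9.5, 9.5, 11.5, 10.8, 8.2, 7.2, 11.0, 10.0, 10.0]


def signed_rank_sums(before, after):
    # Round to 1 decimal place so floating-point noise does not break ties.
    diffs = [round(b - a, 1) for a, b in zip(before, after)]
    diffs = [d for d in diffs if d != 0]          # zero differences are dropped
    ordered = sorted(abs(d) for d in diffs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + j + 2) / 2     # average of ranks i+1 .. j+1
        i = j + 1
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus, len(diffs)


w_plus, w_minus, n_nonzero = signed_rank_sums(HB2, HB3)

if __name__ == "__main__":
    print(f"W+ = {w_plus}, W- = {w_minus}, non-zero pairs = {n_nonzero}")
```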

Research Project 4

Each lecturer will demonstrate how to analyse the data using the computer and advise the students on how to interpret the results. For homework, the students will complete the analysis and prepare a PowerPoint presentation for the final practical session.

All rights reserved: Dr Azmi Mohd Tamil
Amali6.doc 24-8-06

