
FF2613

MEDICINE & SOCIETY II

PRACTICAL SESSIONS:
EPIDEMIOLOGY & STATISTICS

FOR YEAR 2 STUDENTS ONLY

DEPARTMENT OF COMMUNITY HEALTH


FACULTY OF MEDICINE
UNIVERSITI KEBANGSAAN MALAYSIA
KUALA LUMPUR

Scenario
An outbreak of gastroenteritis occurred in Bandar Tun Razak, a suburban
neighborhood, on the evening of April 28. A total of 89 people went to the
emergency departments of the three local hospitals during that evening. No more
cases were reported afterward.
The patients complained of headache, fever, nausea, vomiting and diarrhea. The
disease was severe enough in 19 patients to require hospitalization for
rehydration.
The local health department was immediately notified of a potential food-borne
outbreak of gastroenteritis in Bandar Tun Razak.
Exercise 1
1. Define epidemic, endemic and pandemic.
2. Describe the gastroenteritis outbreak according to disease transmission and
epidemiological triad.
3. What are the possible causes of the outbreak?
4. List and discuss the steps that should be taken in outbreak investigations.
5. What further information is needed?

Exercise 2
The epidemic team, including a medical epidemiologist (public health physician
Health Officer), health inspectors and a nurse, visited the local hospitals to
interview the attending physicians, the patients and some of their relatives. Some
stool samples were obtained from patients for microbiologic identification of the
causative agent.
The distribution of the disease by person (age and gender) was found as follows:

Age group
0 - 5 yr
6 - 10 yr
11 yr and
older
Total by
gender

Gastroenteritis Outbreak Findings by Person, Case Distribution


by Age and Gender
Female
Male
Total by age
No
%Females
No
%Male
No
%
1
1
38
37
10

Please calculate the totals for each column and row and their corresponding
percentages to try to determine if there are any important differences by age or
by gender. Interpret your findings.

Discuss the epidemic curve above

Exercise 3
Therefore the epidemic team investigated the places where affected persons,
their relatives and neighbors ate that day (April 28). The following table shows
the team's findings:

Gastroenteritis Outbreak Findings by Place

                               People who attended            People who did not attend
Place                          People   Ill   Attack rate     People   Ill   Attack rate     Relative risk
Cafeteria LRT                  207      61    ____            157      47    ____            ____
Kedai Makan Ali                246      25    ____            122      13    ____            ____
Restaurant ABC                 475      68    ____            189      29    ____            ____
Elementary school cafeteria    239      67    ____            495      22    ____            ____

Please calculate the attack rates per 100 (incidence rates per 100) by place to try
to determine where the contaminated meal was served. For each place compare
attack rates (AR) for those who attended with attack rates for those who did not,
by using the relative risk (i.e., RR = AR in attendees/AR in non attendees).
Interpret your findings.
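
The calculation requested for Exercise 3 can be checked with a short script. This is only a sketch: the counts are transcribed from the "Findings by Place" table, and the names `places`, `attack_rate`, `relative_risk` and `results` are illustrative, not part of the handout.

```python
# Counts transcribed from the "Findings by Place" table:
# (attended, ill among attendees, did not attend, ill among non-attendees)
places = {
    "Cafeteria LRT": (207, 61, 157, 47),
    "Kedai Makan Ali": (246, 25, 122, 13),
    "Restaurant ABC": (475, 68, 189, 29),
    "Elementary school cafeteria": (239, 67, 495, 22),
}


def attack_rate(ill, total):
    """Attack rate per 100 persons."""
    return 100.0 * ill / total


def relative_risk(ill_att, n_att, ill_not, n_not):
    """RR = AR in attendees / AR in non-attendees."""
    return attack_rate(ill_att, n_att) / attack_rate(ill_not, n_not)


# results[place] = (AR attendees, AR non-attendees, RR)
results = {}
for place, (n_att, ill_att, n_not, ill_not) in places.items():
    results[place] = (
        attack_rate(ill_att, n_att),
        attack_rate(ill_not, n_not),
        relative_risk(ill_att, n_att, ill_not, n_not),
    )

for place, (ar_att, ar_not, rr) in results.items():
    print(f"{place:28s} AR attended {ar_att:5.1f}  AR not attended {ar_not:5.1f}  RR {rr:4.2f}")
```

A relative risk close to 1 means attendance made no difference to the risk of illness; a place whose RR is well above 1 is the one to examine further.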

Exercise 4
Once the implicated place was determined, the investigation centered on the
food. The following table includes the food items served in that place on April 28:

Gastroenteritis Outbreak Findings by Person

                  Ate the food item                           Did not eat the food item
Food item         No. people   Ill people   Attack rate       No. people   Ill people   Attack rate       Relative risk
Beef rendang      276          28           ____              266          27           ____              ____
Burger            218          21           ____              131          14           ____              ____
Salad             105          49           ____              297          15           ____              ____
Baked potato      139          11           ____              213          31           ____              ____
Fruit cocktail    88           48           ____              279          25           ____              ____
Ice cream         175          18           ____              203          49           ____              ____

Important note: None of the kitchen personnel were ill. The names of the kitchen
personnel and their participation in the food preparation are as follows: Ms Mary
prepared the beef rendang and the potatoes, Johan prepared the salad and the
fruit, Salmah served all dishes except the ice cream, and Jamilah prepared the
burgers and served the ice cream. The ice cream was a commercial brand and
was bought at a nearby supermarket.
Please calculate the attack rates per 100 (incidence rates per 100) by food item
to try to determine the one that was probably contaminated. Compare attack
rates (AR) for those who ate the food item with attack rates for those who did not
eat the food item, by using the relative risk (i.e., RR = AR in those who ate the
food/AR in those who did not eat the food).
Interpret your findings.

Exercise 5
Given that the epidemic team worked fast enough and the implicated meal(s)
was (were) identified before all food leftovers were discarded, food samples from
some meal leftovers were taken to the laboratory. In addition, stool samples were
taken from the kitchen personnel who prepared or handled each different food
item.
The laboratory confirmed that Salmonella toxin was present in some of the food
samples and that one of the kitchen personnel of that place had the same
Salmonella species. Furthermore, the Salmonella species found in the food and
the kitchen worker was the same species found in stool samples of the patients.
Please discuss these findings and identify the kitchen worker possibly
responsible for the outbreak.
Discuss the general principles of prevention and control of gastroenteritis outbreaks.

Screening Test

Screening: Definition
- The identification, amongst apparently healthy individuals, of those who are sufficiently at risk of a specific disorder

Screening vs Diagnosis
- In screening, there is no intention to make a definitive diagnosis or offer therapeutic intervention solely based on a positive result

Screening program: Requirements (I)
- Natural history of disease must be understood
- Have an agreed policy on whom to treat
- Prevalence of undiagnosed disease high
- Disease has high morbidity and mortality
- Of public health concern
- Early treatment easier and more effective

Screening program: Requirements (II)
- Signs present to indicate disease presence
- Screening test acceptable and harmless
- Screening test must be valid
- Yield of screening must be high
- Diagnostic work-up for a positive test must have acceptable morbidity
- Screening exercise must be cost-effective

The ideal screening test
- Would always give the right answer
- Quick, safe and simple
- Painless, reliable and inexpensive

Structure of a study involving a screening test
- Resembles an observational study
- Same concepts applied for diagnostic test
- Designed to determine how well a test can discriminate between diseased and non-diseased
- A predictor variable (the test result)
- An outcome variable (presence or absence of disease)

Structure of a study involving a screening test
- The test result
  - Dichotomous: +ve or -ve
  - Categorical: +, ++, +++, ++++
  - Continuous: mg/dl, ng/L, etc.
- The disease as outcome variable
  - Presence or absence determined by a gold standard

Measures of accuracy for screening tests:
- Validity: Sensitivity and specificity
- Predictive values (PV): Positive PV and Negative PV

Evaluation of a screening test

                  TRUTH
TEST RESULT       Disease               No disease
Positive          A (True-positive)     B (False-positive)
Negative          C (False-negative)    D (True-negative)

Sensitivity = A / (A + C) x 100
Specificity = D / (B + D) x 100

Sensitivity
- Sensitivity is the proportion of those with the disease who tested positive
- Indicates how good a test is at identifying the diseased

Specificity
- Specificity is the proportion of those without the disease who tested negative
- Indicates how good a test is at identifying the non-diseased

Sensitivity and specificity
- Describe the performance of a test
- A test with a high sensitivity is useful to RULE OUT the disease
- A test with a high specificity is useful to CONFIRM the presence of disease

Predictive values (PV)
- Assess the usefulness of a test
- A test of efficient use of time and resources
- PV estimate the probability of disease
- PV describe the frequency of correct identification
- Positive PV and Negative PV

Predictive values

                  TRUTH
TEST RESULT       Disease               No disease
Positive          A (True-positive)     B (False-positive)
Negative          C (False-negative)    D (True-negative)

PV+ = A / (A + B) x 100
PV- = D / (C + D) x 100

- PV of a positive test is the proportion of individuals who test +ve and have the disease
- The positive PV estimates the likelihood that a person who tests positive has the disease
- PV of a negative test is the proportion of individuals who test -ve and don't have the disease
- The negative PV indicates the likelihood that a person who tests negative is actually disease free

Predictive values
- Greatest value in deciding whether to implement a screening program
- Not useful if positive PV is low

Mammography

                  Disease status
TEST RESULT       Cancer    No cancer    Total
Positive          132       985          1117
Negative          47        62295        62342
Total             179       63280        63459

Sensitivity, specificity and PVs
- Prevalence: 179/63459 x 100 = 0.3%
- Sensitivity: 132/179 x 100 = 73.7%
- Specificity: 62295/63280 x 100 = 98.4%
- +ve PV: 132/1117 x 100 = 11.8% (False +ve 88.2%)
- -ve PV: 62295/62342 x 100 = 99.9% (False -ve 0.1%)

Comments
- Mammography had an excellent specificity (98%)
- False +ve tests outnumber the true +ve tests by over 7:1 (PV+ = 12%)
- ~7 in every 8 patients who had positive mammograms had normal biopsies
- Predictive value for a positive test is low (12%)

Shapiro et al., 1988

Predictive Value Of A Test Is Affected By Prevalence Of Disease

SUMMARY
- A screening test study determines the usefulness of a test in identifying those at risk of a disease
- Students must be able to calculate and interpret sensitivity, specificity & predictive values.

THANK YOU

Year-2, Semester-1

Trigger:
You are the State Medical Officer for AIDS/HIV of Negeri Sembilan and you are expected
to conduct sentinel surveillance for HIV amongst:
o Pregnant mothers (Antenatal Screening)
o STD clinic patients


Data Information Sheet-1 Choosing The Appropriate Screening Test

To select the appropriate screening test, you did a literature review and collated the following tables.
Calculate the sensitivity, specificity, PPV and NPV of each test to help you decide.

                 Disease Present   Disease Absent   Total
Positive         TP                FP               TP + FP
Negative         FN                TN               FN + TN
Total            TP + FN           FP + TN          N

TP = True Positive
FP = False Positive
FN = False Negative
TN = True Negative

Sensitivity = TP/(TP+FN) x 100%
Specificity = TN/(TN+FP) x 100%
PPV = TP/(TP+FP) x 100%
NPV = TN/(TN+FN) x 100%

HIV Enzyme Immuno Assay (EIA)

                 Gold Standard
EIA (blood)      +        -        total
+                1000     9        1009
-                0        8991     8991
total            1000     9000     10,000

HIV Particle Agglutination Test

                 Gold Standard
PA               +        -        total
+                999      270      1269
-                1        8730     8731
total            1000     9000     10,000

HIV Rapid Test Kit

                 Gold Standard
Rapid Test       +        -        total
+                998      180      1178
-                2        8820     8822
total            1000     9000     10000

Oral Rapid Test Kit

                 Gold Standard
Oral Test Kit    +        -        total
+                930      180      1110
-                70       8820     8890
total            1000     9000     10000

                 EIA      PA       Rapid    Oral
Sensitivity      ____     ____     ____     ____
Specificity      ____     ____     ____     ____
PPV              ____     ____     ____     ____
NPV              ____     ____     ____     ____

Which is the best screening test?
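
The four formulas on this sheet can be applied to each kit with a few lines of code. The sketch below assumes the cell counts are read from the four tables as (TP, FP, FN, TN); the function name `screening_metrics` and the dictionary `tests` are illustrative.

```python
def screening_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }


# (TP, FP, FN, TN) transcribed from the tables against the gold standard
tests = {
    "EIA": (1000, 9, 0, 8991),
    "PA": (999, 270, 1, 8730),
    "Rapid": (998, 180, 2, 8820),
    "Oral": (930, 180, 70, 8820),
}

metrics = {name: screening_metrics(*cells) for name, cells in tests.items()}

for name, m in metrics.items():
    print(
        f"{name:6s} Sen {m['sensitivity']:6.2f}%  Sp {m['specificity']:6.2f}%  "
        f"PPV {m['ppv']:6.2f}%  NPV {m['npv']:6.2f}%"
    )
```

Comparing the rows side by side makes it easier to argue which kit suits screening, where missing a true case (low sensitivity) is usually the costlier error.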


Data Information Sheet-2 Effect of Prevalence on Sensitivity & Specificity


Based on the earlier analysis, HIV EIA, a test with sensitivity of 100.0% and specificity of 99.9% was
selected to be used for the sentinel surveillance in Negeri Sembilan. You decided to include the
inmates of Pusat Serenti Tampin and Pusat Serenti Jelebu in the sentinel surveillance. Each study
population consisted of 10,000 people. Calculate the PPV, NPV and prevalence rate of HIV for each
study population.
Antenatal mothers
                 Disease Present   Disease Absent   Total
Positive         3                 10               13
Negative         0                 9987             9987
Total            3                 9997             10000

PPV =
NPV =

Blood donors
                 Disease Present   Disease Absent   Total
Positive         9                 10               19
Negative         0                 9981             9981
Total            9                 9991             10000

PPV =
NPV =

IVDU
                 Disease Present   Disease Absent   Total
Positive         2000              8                2008
Negative         0                 7992             7992
Total            2000              8000             10000

PPV =
NPV =

Population           Population with HIV   Population without HIV   TOTAL     Prevalence rate
Antenatal mothers    3                     9997                     10,000    ____
Blood donors         9                     9991                     10,000    ____
IVDUs                2000                  8000                     10,000    ____

Since the sensitivity and specificity are the same for all three study populations, please discuss how
PPV and NPV are affected by the prevalence of the disease in each study population.

PPV and NPV can also be calculated using the following formulas:

PPV = (Prev x Sen) / [(Prev x Sen) + (1 - Prev) x (1 - Sp)]

NPV = [(1 - Prev) x Sp] / [(1 - Prev) x Sp + Prev x (1 - Sen)]

where Prev = prevalence, Sen = sensitivity and Sp = specificity, each written as a proportion.
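
The two prevalence-based formulas can be tried out numerically. This is a minimal sketch, assuming the EIA figures quoted on this sheet (sensitivity 100.0%, specificity 99.9%) and prevalences of 3, 9 and 2000 per 10,000 for the three populations; the function and variable names are illustrative.

```python
def ppv(prev, sen, sp):
    """Positive predictive value from prevalence, sensitivity and specificity (proportions)."""
    return (prev * sen) / (prev * sen + (1 - prev) * (1 - sp))


def npv(prev, sen, sp):
    """Negative predictive value from prevalence, sensitivity and specificity (proportions)."""
    return ((1 - prev) * sp) / ((1 - prev) * sp + prev * (1 - sen))


SEN, SP = 1.000, 0.999  # EIA figures quoted on this sheet

# Prevalence = population with HIV / total population
populations = {
    "Antenatal mothers": 3 / 10000,
    "Blood donors": 9 / 10000,
    "IVDUs": 2000 / 10000,
}

table = {name: (ppv(p, SEN, SP), npv(p, SEN, SP)) for name, p in populations.items()}

for name, (pos, neg) in table.items():
    print(f"{name:18s} PPV {100 * pos:6.2f}%  NPV {100 * neg:6.2f}%")
```

With the test held fixed, only the prevalence changes between rows, so any difference in PPV is due to prevalence alone.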


Data Information Sheet-3 Effect of Prevalence on Sensitivity & Specificity

Hypothetical Illustration of Screening Programme with Test Kit

Population: 10,000    Sensitivity: 100.00%    Specificity: 99.90%

          +       -
+         a       b       a+b
-         c       d       c+d
          a+c     b+d     a+b+c+d

Prevalence   TP (a)   FP (b)   FN (c)   TN (d)   a+c     b+d      PPV a/(a+b)   NPV d/(c+d)
0.01%        1        10       0        9,989    1       9,999    9.09%         100.00%
0.02%        2        10       0        9,988    2       9,998    16.67%        100.00%
0.03%        3        10       0        9,987    3       9,997    23.08%        100.00%      <- Antenatal
0.05%        5        10       0        9,985    5       9,995    33.34%        100.00%
0.09%        9        10       0        9,981    9       9,991    47.39%        100.00%      <- Blood Donors
1.00%        100      10       0        9,890    100     9,900    90.99%        100.00%
5.00%        500      10       0        9,491    500     9,500    98.14%        100.00%
10.00%       1000     9        0        8991     1000    9,000    99.11%        100.00%
20.00%       2000     8        0        7992     2000    8,000    99.60%        100.00%      <- Pusat Serenti
30.00%       3000     7        0        6993     3000    7,000    99.77%        100.00%

PPV based on Prevalence, Sensitivity & Specificity

              sensitivity %   99%      95%      90%      80%
prevalence    specificity %   99%      95%      90%      80%
20.0%                         96.1%    82.6%    69.2%    50.0%
10.0%                         91.7%    67.9%    50.0%    30.8%
5.0%                          83.9%    50.0%    32.1%    17.4%
1.0%                          50.0%    16.1%    8.3%     3.9%
0.1%                          9.0%     1.9%     0.9%     0.4%


References:
Osman Ali. 1990. Kaedah Epidemiologi. Penerbit: Dewan Bahasa Dan Pustaka.
UNAIDS/WHO. 2004. UNAIDS/WHO Policy Statement on HIV Testing
http://www.who.int/ethics/topics/hivtestingpolicy_who_unaids_en_2004.pdf
WHO (March 1997) Revised Recommendation for the Selection and Use of HIV Antibody Tests.
Weekly Epidemiological Record, No. 12. http://www.who.int/docstore/wer/pdf/1997/wer7212.pdf
WHOSEA. 1998. Standard Operating Procedures for Diagnosis of HIV Infection.
http://w3.whosea.org/bct/332/diagnosis1.htm
CDC. 2005. What are the different HIV screening tests available in the U.S.?
http://www.cdc.gov/hiv/pubs/faq/faq8.htm
USFDA. 2006. Donor Screening Assays for Infectious Agents and HIV Diagnostic Assays
http://www.fda.gov/cber/products/testkits.htm
Joseph Hellweg. 2005. Narrative and Secrecy: Sentinel Surveillance and Alternative Epidemiologies of
HIV/AIDS in Northwestern Côte d'Ivoire. Africa Conference 2005: African Health and Illness.
http://www.utexas.edu/conferences/africa/2005/panels/hellweg.html
Trisha Greenhalgh. 1997. How to read a paper: Papers that report diagnostic or screening tests. BMJ
1997;315:540-543 (30 August) http://bmj.bmjjournals.com/cgi/content/full/315/7107/540


PRACTICALS GUIDE
Medicine & Society Module (FF2613)
INTRODUCTION
In this module there will be 4 practical sessions for the research project and statistical exercises.
Students will be guided by the respective lecturer/tutor assigned to each lab.
The schedule for the practical sessions for this semester is as stated below:

DATE: 14/07/10
TIME: 10.30 - 12.30
TOPIC: Descriptive Statistics & Research Project 1
CONTENT: Manipulation and presentation of data using the given dataset, including
calculating the measures of central tendency and variability using statistical formulas.
Determine the title, objective, problem framework, hypothesis and methodology.
Once the above has been agreed upon, as homework, the students are expected to write
up the proposal, including the questionnaire, which will be discussed during the second
practical session.

DATE: 21/07/10
TIME: 10.30 - 12.30
TOPIC: Analysis of Quantitative Data & Research Project 2
CONTENT: Calculation and interpretation of t-tests and proportionate tests using the
given dataset. Presentation of the complete research proposal. Upon acceptance, as
homework, the students are expected to distribute the questionnaires and collect the
data for the study. All completed forms are to be brought to the third practical session.

DATE: 20/08/10
TIME: 2.30 - 4.30
TOPIC: Correlation & Research Project 3
CONTENT: Calculation and interpretation of correlation and regression using the given
dataset. Students are guided on how to enter the data into the computer using Excel or
SPSS. Each lab is required to prepare a notebook for the session. For homework,
students will complete the data entry for all collected data and bring the complete file
to the fourth practical session.

DATE: 27/08/10
TIME: 10.30 - 12.30
TOPIC: Chi-Square, Non-Parametric and Research Project 4
CONTENT: Calculation and interpretation of non-parametric and chi-square tests using
the given dataset. Each lecturer will demonstrate how to analyse the data using a
computer and advise on the interpretation of results. For homework, the students will
complete the analysis and prepare a PowerPoint presentation for the final practical
session.

DATE: 20/09/10 and 24/09/10
TIME: 2.00 - 4.00 (20/09/10) and 10.00 - 12.00 (24/09/10)
TOPIC: Research Project 5
CONTENT: Presentation of their findings. For homework, the students will prepare a
written report of the study, to be submitted within two weeks of their presentation.

Practical 1
Descriptive Statistics
Introduction
In the old curriculum, the practical sessions were slotted immediately after the
respective lectures. In the past we had 25 hours of lectures and 8 practical sessions
just for statistics and research methodology. Now we only have 7 hours of lecture and
4 practical sessions for statistics and research methodology in the new curriculum.
Whenever possible, we try to slot the practical sessions according to lectures. But we
can't cover everything; therefore students are also expected to learn on their own.
Please be patient and persist in doing the exercises.
For this session, we will learn about measures of central tendency and
variability. We use these measures of central tendency and variability to describe the
data that we collected. The measures of central tendency are mean, mode and median.
For variability, it is standard deviation (sd). Kindly refer to your formula sheet or your
books for help.
Measures of Central Tendency for Quantitative Data
1. Write down the formulas for mean in the boxes below;
Basic Formula

Formula for grouped data (Formula A)

2. Calculate the mean, mode and median for the age i of the following respondents;
35 24 36 21 21 20 34 29 37 30 26 27 29 34 33 33 27 25 21 26 32 30 33 36 28 33 19
29 27 29 22 23 31 32 31
Total = ___________
Mean = __________

n = ________

Median = __________
Mode = __________
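
A hand calculation for Q.2 can be cross-checked with Python's standard `statistics` module. The sketch below simply types in the 35 ages from the question; `multimode` is used because a dataset may have more than one mode.

```python
import statistics

# The 35 ages listed in Q.2
ages = [
    35, 24, 36, 21, 21, 20, 34, 29, 37, 30, 26, 27, 29, 34, 33, 33, 27, 25,
    21, 26, 32, 30, 33, 36, 28, 33, 19, 29, 27, 29, 22, 23, 31, 32, 31,
]

n = len(ages)
total = sum(ages)
mean = total / n
median = statistics.median(ages)    # middle value of the sorted data
modes = statistics.multimode(ages)  # every value that shares the highest frequency

print(f"n = {n}, total = {total}, mean = {mean:.2f}")
print(f"median = {median}, mode(s) = {modes}")
```

If the list of modes has more than one entry, the data are multimodal and all of those values should be reported.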

3. Write down the formulas for standard deviation in the boxes below;
Basic Formula

Formula for grouped data (Formula A)

4. Using the data from Q.2, calculate the standard deviation and variance of the age i
of respondents.
x
x-mean (x-mean)2
19.00
20.00
21.00
21.00
21.00
22.00
23.00
24.00
25.00
26.00
26.00
27.00
27.00
27.00
28.00
29.00
29.00
29.00
Total

x
29.00
30.00
30.00
31.00
31.00
32.00
32.00
33.00
33.00
33.00
33.00
34.00
34.00
35.00
36.00
36.00
37.00

x-mean (x-mean)2

Total

Total (x-mean)2 = _______________


Therefore standard deviation s = _________________
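
The worksheet for Q.4 amounts to summing the squared deviations from the mean and dividing by the degrees of freedom. The sketch below assumes the sample formula with n - 1 in the denominator (check this against your formula sheet); the variable names are illustrative.

```python
import math

# Ages from Q.2, sorted as in the Q.4 worksheet
ages_sorted = [
    19, 20, 21, 21, 21, 22, 23, 24, 25, 26, 26, 27, 27, 27, 28, 29, 29, 29,
    29, 30, 30, 31, 31, 32, 32, 33, 33, 33, 33, 34, 34, 35, 36, 36, 37,
]

n = len(ages_sorted)
mean = sum(ages_sorted) / n

# Column "(x-mean)^2" of the worksheet, summed over all rows
sum_sq_dev = sum((x - mean) ** 2 for x in ages_sorted)

variance = sum_sq_dev / (n - 1)  # assumption: sample variance, n - 1 denominator
sd = math.sqrt(variance)

print(f"Total (x-mean)^2 = {sum_sq_dev:.2f}")
print(f"variance = {variance:.2f}, standard deviation = {sd:.2f}")
```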
It is easy to calculate the mean and standard deviation for data with few observations.
But for studies with large number of samples, it is much harder. Therefore for large
studies, the quantitative data are sorted in frequency tables such as the one below;
5. These are data from a case-control study to identify factors that are associated with
small for gestational age amongst newborn babies. For the table below, the factor
being studied is the weight of the mothers during first trimester (first three months of
pregnancy) and the incidence of babies with low birth weight.
Weight during first    All            Frequency    Frequency of
trimester in kg        Frequencies    of Cases     Controls
30.0-39.9              5              5            0
40.0-49.9              69             48           21
50.0-59.9              82             43           39
60.0-69.9              45             10           35
70.0-79.9              10             2            8
80.0-89.9              3              1            2
90.0-99.9              4              1            3
Total                  218            110          108

For the following exercise, calculate the mean, mode, median and standard deviation
for both cases and controls. To simplify matters, just fill up the table below;
For cases;
Weight in kg    Frequency    m.p      f.mp    f.mp^2    f cumulative
30.0-39.9       5            34.95    ____    ____      5
40.0-49.9       48           44.95    ____    ____      53
50.0-59.9       43           54.95    ____    ____      96
60.0-69.9       10           64.95    ____    ____      106
70.0-79.9       2            74.95    ____    ____      108
80.0-89.9       1            84.95    ____    ____      109
90.0-99.9       1            94.95    ____    ____      110
Total           110                   ____    ____

For controls;
Weight in kg    Frequency    m.p      f.mp    f.mp^2    f cumulative
30.0-39.9       0            34.95    0       0         0
40.0-49.9       21           44.95    ____    ____      21
50.0-59.9       39           54.95    ____    ____      60
60.0-69.9       35           64.95    ____    ____      95
70.0-79.9       8            74.95    ____    ____      103
80.0-89.9       2            84.95    ____    ____      105
90.0-99.9       3            94.95    ____    ____      108
Total           108                   ____    ____

f.mp^2 means frequency x (midpoint)^2, not (f.mp)^2

Fill up your answers in the table below;
                      Case    Control
Mean                  ____    ____
Mode                  ____    ____
Median                ____    ____
Standard deviation    ____    ____

The answers above will be used in the coming practical sessions.
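
For the grouped data in Q.5, the f.mp and f.mp^2 columns can be totalled in code to check the mean and standard deviation. This is a sketch only: it assumes the grouped sample formula with n - 1 in the denominator, and the function name `grouped_mean_sd` is illustrative. The mode and median for grouped data depend on the interpolation formula in your formula sheet and are not computed here.

```python
import math

midpoints = [34.95, 44.95, 54.95, 64.95, 74.95, 84.95, 94.95]
cases = [5, 48, 43, 10, 2, 1, 1]
controls = [0, 21, 39, 35, 8, 2, 3]


def grouped_mean_sd(freqs, mids):
    """Mean and SD from a frequency table, using class midpoints."""
    n = sum(freqs)
    sum_fmp = sum(f * m for f, m in zip(freqs, mids))       # total of f.mp
    sum_fmp2 = sum(f * m * m for f, m in zip(freqs, mids))  # total of f.mp^2
    mean = sum_fmp / n
    # assumption: sample variance with n - 1 in the denominator
    variance = (sum_fmp2 - sum_fmp ** 2 / n) / (n - 1)
    return n, mean, math.sqrt(variance)


n_case, mean_case, sd_case = grouped_mean_sd(cases, midpoints)
n_ctrl, mean_ctrl, sd_ctrl = grouped_mean_sd(controls, midpoints)

print(f"Cases:    n = {n_case}, mean = {mean_case:.2f} kg, sd = {sd_case:.2f} kg")
print(f"Controls: n = {n_ctrl}, mean = {mean_ctrl:.2f} kg, sd = {sd_ctrl:.2f} kg")
```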

Copyright reserved Dr Azmi Mohd Tamil
Amali1.doc 8-4-07.

Practical 1b
Research Proposal
Each lab group is required to come up with a research proposal, collect the data
required, analyse the data, present their findings and write up the final report for
submission.
For this session, the students are expected to agree on the;
Title of the research
Objectives
Problem Framework
Hypothesis
Methodology
Once the above has been agreed upon, as homework, they are expected to write up the
proposal, including the questionnaire, which will be discussed during the second
practical session.

Practical 2
Inferential Statistics
Statistical Tests & Types of Variables
In general there are 2 types of variables: qualitative & quantitative. When you want to
test the association between 2 variables, the type of test to be utilised depends on the
type of variables. The tables below give a general guide on the correct statistical test
for the respective variable types.
Qualitative Data Analysis

Parametric Analysis
Variable 1                 Variable 2                   Criteria                                    Type of Test
Qualitative Dichotomous    Quantitative                 Normally distributed data                   Student's t Test
Qualitative Polynomial     Quantitative                 Normally distributed data                   ANOVA
Quantitative               Quantitative                 Repeated measurement of the same            Paired t Test
                                                        individual & item (e.g. Hb level
                                                        before & after treatment).
                                                        Normally distributed data
Quantitative continuous    Quantitative - continuous    Normally distributed data                   Pearson Correlation &
                                                                                                    Linear Regression

Non-Parametric Analysis
Variable 1                 Variable 2                   Criteria                                    Type of Test
Qualitative Dichotomous    Qualitative Dichotomous      Sample size < 20 or (< 40 but with at       Fisher Test
                                                        least one expected value < 5)
Qualitative Dichotomous    Quantitative                 Data not normally distributed               Wilcoxon Rank Sum Test or
                                                                                                    Mann-Whitney U Test
Qualitative Polynomial     Quantitative                 Data not normally distributed               Kruskal-Wallis One Way
                                                                                                    ANOVA Test
Quantitative               Quantitative                 Repeated measurement of the same            Wilcoxon Signed Rank Test
                                                        individual & item
Quantitative continuous    Quantitative - continuous    Data not normally distributed               Spearman/Kendall Rank
                                                                                                    Correlation
Practical 2
This is the second practical session for this module. In this session, we will be
conducting exercises on Student's t-test, paired t-test and proportionate test.
Student's t-test
1a. Write down the formula for Student's t-test in the boxes below;
Basic Formula

Sample size > 30

Small sample size & equal variance

b. Based on results from the previous session, Q5, complete the boxes below;
                      Case    Control
Mean                  ____    ____
Standard deviation    ____    ____
n                     110     108

The hypothesis that we want to test out is that;


There is a difference of first trimester body weight between the cases (mothers with
SGA babies) and controls (mothers with non-SGA babies).
c. Write down the null hypothesis;

d. Calculate the t for Student's t-test for the above exercise;

e. Please refer to table A1 and A3, and try to estimate the p value from the t value
calculated. Discuss which table is more appropriate for this exercise.
f. Based on the above p value, is the null hypothesis rejected?
g. Is there a significant difference of first trimester weight between the two groups?
Explain your answer.
2. During the examination, we will not tell you what test to use. Instead the students
are expected to choose the appropriate one based on the problem and the data given.
For example, try to do the exercise below;
A case-control study to identify factors that can cause small for gestational age (SGA)
was conducted. Among the factors studied were the mothers' heights. It is believed
that shorter mothers were at higher risk of having SGA babies.

                        Case     Control    Both groups
Total of samples n      110      108        218
Total of height x       16620    16439      33059
Total of (x-mean)^2     2326     3605       5931

a. State the hypothesis and null hypothesis for the above problem.

b. What is the appropriate statistical test to prove this hypothesis?


c. Using the data given, conduct the statistical test.

d. What is your conclusion, based on your answers in Q2c?
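
For Q.2, a two-sample t statistic can be worked out directly from the totals in the table. The sketch below assumes the equal-variance (pooled) form of Student's t-test; check it against the formula sheet. The function name `pooled_t` is illustrative.

```python
import math


def pooled_t(n1, sum1, ss1, n2, sum2, ss2):
    """Student's t from group sizes, totals and sums of squared deviations.

    Assumption: both groups share a common variance (pooled estimate).
    """
    mean1, mean2 = sum1 / n1, sum2 / n2
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Case: n = 110, total = 16620, sum of squares = 2326
# Control: n = 108, total = 16439, sum of squares = 3605
t, df = pooled_t(110, 16620, 2326, 108, 16439, 3605)

print(f"t = {t:.3f} with {df} degrees of freedom")
```

The sign of t only reflects which group is subtracted from which; compare its absolute value with the critical value in the t table.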

Paired t-test
3a. Write down the formula for paired t-test in the box below;
Basic Formula

b. Thirty of the pregnant mothers were found to be anaemic during their second
trimester follow-up. They were treated with haematinics for 2 months and their
haemoglobin levels were measured again. To measure the effectiveness of the
treatment, please complete the table below.
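No.      Hb1     Hb2     D       D^2
1        9.3     9.5     ____    ____
2        9.5     10.0    ____    ____
3        9.5     10.0    ____    ____
4        9.6     11.0    ____    ____
5        9.7     12.0    ____    ____
6        9.8     9.0     ____    ____
7        9.8     9.6     ____    ____
8        10.0    7.2     ____    ____
9        10.0    9.6     ____    ____
10       10.0    10.0    ____    ____
11       10.0    10.0    ____    ____
12       10.0    10.0    ____    ____
13       10.0    10.0    ____    ____
14       10.0    10.0    ____    ____
15       10.0    10.0    ____    ____
16       10.0    10.0    ____    ____
17       10.0    10.0    ____    ____
18       10.0    10.3    ____    ____
19       10.0    10.5    ____    ____
20       10.0    10.6    ____    ____
21       10.0    10.8    ____    ____
22       10.0    11.0    ____    ____
23       10.0    11.0    ____    ____
24       10.0    11.0    ____    ____
25       10.0    11.0    ____    ____
26       10.0    11.5    ____    ____
27       10.0    13.0    ____    ____
28       10.0    13.0    ____    ____
29       10.0    13.0    ____    ____
30       10.1    11.0    ____    ____
Total                    ____    ____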

c. Is the intervention effective? Do a paired t-test analysis using the data above.

d. Discuss the result of your statistical test.
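
The paired t-test in Q.3 works on the differences D = Hb2 - Hb1 for each mother. The sketch below assumes the usual form t = mean(D) / (sd(D) / sqrt(n)) with n - 1 degrees of freedom; the variable names are illustrative.

```python
import math

# Haemoglobin before (hb1) and after (hb2) treatment, in table order
hb1 = [9.3, 9.5, 9.5, 9.6, 9.7, 9.8, 9.8] + [10.0] * 22 + [10.1]
hb2 = [
    9.5, 10.0, 10.0, 11.0, 12.0, 9.0, 9.6, 7.2, 9.6,
    10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0,
    10.3, 10.5, 10.6, 10.8, 11.0, 11.0, 11.0, 11.0,
    11.5, 13.0, 13.0, 13.0, 11.0,
]

d = [after - before for before, after in zip(hb1, hb2)]
n = len(d)
sum_d = sum(d)
sum_d2 = sum(x * x for x in d)

mean_d = sum_d / n
sd_d = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))  # SD of the differences
t = mean_d / (sd_d / math.sqrt(n))
df = n - 1

print(f"Total D = {sum_d:.1f}, Total D^2 = {sum_d2:.2f}")
print(f"mean D = {mean_d:.2f}, sd D = {sd_d:.3f}, t = {t:.2f}, df = {df}")
```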

Proportionate Test
4a. Write down the formula for proportionate test in the box below;
Basic Formula

The rate of SGA for mothers exposed to cigarette smoke (passive smoker) was
89/156. The rate of SGA for mothers not exposed to cigarette smoke was 20/61.
b. State the appropriate null hypothesis.

c. Do the proportionate test and discuss its result using 0.05 as the level of
significance (the z value in the normal distribution table for 0.05 as the level of
significance is 1.96).
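
The proportionate test in Q.4 compares the two SGA rates using a z statistic. The sketch below assumes the pooled-proportion form of the standard error, which is one common textbook version; use the formula from your own sheet if it differs.

```python
import math

# SGA among exposed (passive smokers) and non-exposed mothers
x1, n1 = 89, 156
x2, n2 = 20, 61

p1 = x1 / n1
p2 = x2 / n2

# assumption: standard error based on the pooled proportion under the null hypothesis
p_pooled = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, z = {z:.2f}")
print("Reject the null hypothesis at 0.05" if abs(z) > 1.96 else "Do not reject the null hypothesis at 0.05")
```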

Research Project 2
Presentation of the complete research proposal. Upon acceptance of the proposal, as
homework, the students are expected to distribute the questionnaires and collect the
data for the study. All completed forms are to be brought to the third practical session.

Copyright reserved Dr Azmi Mohd Tamil
Amali4.doc 8-8-06.

Practical 3
Inferential Statistics 2
Introduction
This is the third practical session. In this session we will do exercises on Pearson
correlation and linear regression.
Pearson Correlation
1a. Write down the formula for Pearson Correlation in the boxes below;
Basic Formula for r

Σ(x - mean x)^2

Σ(y - mean y)^2

Σ(x - mean x)(y - mean y)

As you can see from the formulas above, to calculate the correlation coefficient (r),
you need to identify the following:
Total of the first variable (Σx),
Total of the first variable squared (Σx^2),
Total of the second variable (Σy),
Total of the second variable squared (Σy^2) and
Total of the two variables multiplied (Σxy).
Just imagine the number of calculations that you have to do before you even get to
calculate the correlation coefficient (r). If the sample size is 150, you will have to do
more than 455 calculations. Since you'll be doing these calculations manually, the
chance of error occurring is quite high indeed.
For exercise, complete the following table. Measure the time required to complete it.
Once done, please note that you may have to do the same thing again for a dataset 5
times larger than this.
2.

A case-control study to identify factors that can cause small for gestational age
SGA was conducted.
In the past exercise, we have proven that there is an association between the
mothers' first trimester weight and SGA.
Now we want to see whether there is an association between the mothers' first
trimester weight (WEIGHT2) and the child's birth weight (BIRTHWGT).
Please complete the following table;

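INDEX    WEIGHT2    WEIGHT2^2    BIRTHWGT    BIRTHWGT^2    xy
9        42.00      ____         2.40        ____          ____
10       40.00      ____         2.30        ____          ____
12       66.00      ____         2.30        ____          ____
20       51.50      ____         2.10        ____          ____
21       47.50      ____         2.23        ____          ____
29       39.50      ____         2.49        ____          ____
31       40.00      ____         2.46        ____          ____
32       46.50      ____         2.52        ____          ____
34       55.00      ____         2.28        ____          ____
43       49.20      ____         2.20        ____          ____
60       45.00      ____         2.48        ____          ____
70       63.50      ____         2.00        ____          ____
72       52.40      ____         2.31        ____          ____
79       52.30      ____         2.15        ____          ____
90       47.50      ____         2.55        ____          ____
97       62.00      ____         2.41        ____          ____
117      55.10      ____         3.46        ____          ____
126      72.00      ____         3.50        ____          ____
131      61.50      ____         2.97        ____          ____
138      86.00      ____         3.48        ____          ____
145      60.80      ____         3.00        ____          ____
146      44.00      ____         2.84        ____          ____
156      58.00      ____         3.55        ____          ____
159      70.00      ____         3.19        ____          ____
171      44.00      ____         3.09        ____          ____
173      59.50      ____         3.56        ____          ____
174      47.50      ____         3.16        ____          ____
175      53.00      ____         3.10        ____          ____
178      62.50      ____         3.27        ____          ____
181      92.00      ____         3.00        ____          ____
TOTAL    ____       ____         ____        ____          ____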

a. State the null hypothesis for correlation test between the two variables.
b. Conduct the correlation test and calculate the r (correlation coefficient). How
strong is the relationship between the two variables?

c. Is the r significant? What is the p value? How is it calculated?
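
The five totals needed for Q.2 (Σx, Σx^2, Σy, Σy^2, Σxy) and the resulting r can be checked with the sketch below. It assumes the computational form of Pearson's formula and tests significance with t = r x sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom; the variable names are illustrative.

```python
import math

# WEIGHT2 (x) and BIRTHWGT (y) for the 30 mothers, in table order
weight2 = [
    42.00, 40.00, 66.00, 51.50, 47.50, 39.50, 40.00, 46.50, 55.00, 49.20,
    45.00, 63.50, 52.40, 52.30, 47.50, 62.00, 55.10, 72.00, 61.50, 86.00,
    60.80, 44.00, 58.00, 70.00, 44.00, 59.50, 47.50, 53.00, 62.50, 92.00,
]
birthwgt = [
    2.40, 2.30, 2.30, 2.10, 2.23, 2.49, 2.46, 2.52, 2.28, 2.20,
    2.48, 2.00, 2.31, 2.15, 2.55, 2.41, 3.46, 3.50, 2.97, 3.48,
    3.00, 2.84, 3.55, 3.19, 3.09, 3.56, 3.16, 3.10, 3.27, 3.00,
]

n = len(weight2)
sum_x, sum_y = sum(weight2), sum(birthwgt)
sum_x2 = sum(x * x for x in weight2)
sum_y2 = sum(y * y for y in birthwgt)
sum_xy = sum(x * y for x, y in zip(weight2, birthwgt))

# Corrected sums of squares and cross-products
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

r = sxy / math.sqrt(sxx * syy)
t = r * math.sqrt((n - 2) / (1 - r ** 2))  # compare with the t table, df = n - 2

print(f"r = {r:.3f}, t = {t:.2f}, df = {n - 2}")
```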

If the r is significant, it is best to demonstrate it using a scatter diagram like
the one below;

[Scatter diagram: Babies' Birthweight (y-axis) against Mothers' Weight (x-axis); r = 0.431, p = 0.017, Rsq = 0.1874]

To expect the students to calculate all that during the examination, would be
rather cruel. Instead, usually, all the required data will be given, along with some
extraneous data, just to confuse the students. It is up to the students to select the
appropriate data and use it in the appropriate statistical test.
3. A case-control study to identify factors that can cause small for gestational age
(SGA) was conducted. Among the factors studied was whether there is an
association between the mother's height in cm (HEIGHT) and the child's birth
weight in kilogram (BIRTHWGT).

n = 218                               HEIGHT        BIRTHWGT
Mean                                  151.65        2.79
Standard deviation                    5.26          0.54
Σ(observation)                        33059.00      608.46
Σ(observation^2)                      5019291.00    1760.98
Σ(observation 1 x observation 2)      92386.35

a. Name the appropriate statistical test to test the association between the two
variables.
b. State the null hypothesis for the above statistical test.
c. Conduct the statistical test including the test of significance. Discuss the result of
the test.
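
For Q.3 only the summary totals are given, which is enough to compute r. The sketch below plugs the tabulated totals into the computational form of Pearson's formula and then into t = r x sqrt((n - 2) / (1 - r^2)); the variable names are illustrative.

```python
import math

n = 218
sum_x, sum_x2 = 33059.00, 5019291.00  # HEIGHT: total and total of squares
sum_y, sum_y2 = 608.46, 1760.98       # BIRTHWGT: total and total of squares
sum_xy = 92386.35                     # total of HEIGHT x BIRTHWGT

sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

r = sxy / math.sqrt(sxx * syy)
t = r * math.sqrt((n - 2) / (1 - r ** 2))  # compare with the t table, df = n - 2

print(f"r = {r:.3f}, t = {t:.2f}, df = {n - 2}")
```

Note that with a sample this large, a weak correlation can still be statistically significant, so report the strength of r as well as the p value.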

Linear Regression
4a. Write down the formula for linear regression in the boxes below;
Basic Formula

b. Using the data from Q2, conduct the test for linear regression and calculate the
regression co-efficient (b) and constant (a).

c. Write down the final equation of the calculation.


d. Draw a rough diagram of the final equation from the calculation.

Research Project 3
Students will be guided on how to enter the data that they have collected into the
computer using Excel or SPSS. Each lab is required to prepare a notebook for the
session.
For homework, students are required to complete the data entry for all collected data
and bring the completed file to the fourth practical session.
Copyright reserved Dr Azmi Mohd Tamil
Amali5.doc 16-8-06.

!"#$%&$#'()(
*+,-"-+%&#'(.%#%&/%&$(0(
(
*+%"123$%&1+!
!
"#! $%&'! ()*+$&+*,! '-''&.#/! 0-! 0&,,! 1-! 2.&#3! -4-)+&'-'! .#! +%&5'67*)-! $-'$! *#2! #.#5
(*)*8-$)&+!*#*,9'&':!!
!
45&6.73#"-(8-/%(9:;<(
!
;%&'!&'!$%-!8.'$!<)-67-#$!'$*$&'$&+*,!*#*,9'&'!$%*$!&'!$-'$-2!<.)!27)&#3!-4*8&#*$&.#:!=.!
8*>-! '7)-! $%*$! 9.7! )-*,,9! 7#2-)'$*#2! &$:! ;%&'! *#*,9'&'! &'! 2.#-! $.! $-'$! <.)! *''.+&*$&.#!
1-$0--#!$0.!67*,&$*$&?-!?*)&*1,-':!!
!
@1'-)?-2!2*$*!0.7,2!1-!'.)$-2!*++.)2&#3,9!&#!*!+.#$&#3-#+9!$*1,-:!;%-#!$%-!-4(-+$-2!
?*,7-!$*1,-!&'!+*,+7,*$-2/!7'&#3!$%-!).0'!*#2!+.,78#!$.$*,'/!*'!&,,7'$)*$-2!1-,.0A!!
!
@1'-)?*$&.#!;*1,-!
!
B!
5!
!
B!
*!
1!
3!
5!
+!
2!
%!
!
-!
<!
#!
!
C4(-+$-2!D*,7-!;*1,-!
!
B!
5!
!
B!
-3E#!
<3E#!
3!
5!
-%E#!
<%E#!
%!
!
-!
<!
#!
!
!
F%&5'67*)-! &'! +*,+7,*$-2! 19! '788&#3! 7(! G.1'-)?-2! H! -4(-+$-2IJE-4(-+$-2! <.)! -*+%!
+-,,:!
!
!
J!
KJ L!M!!G@5CI
!
!!!
C!
!
2<!L!G)!H!NI!G+!H!NI!
!
!
O.!$%-!<.,,.0&#3!-4-)+&'-:!;%-#!+.8(*)-!0&$%!$%-!*#'0-)!<.)!P:Q+!<).8!R)*+$&+*,!J:!!
!

N: ;%-! )*$-! .<! =ST! <.)! 8.$%-)'! -4(.'-2! $.! +&3*)-$$-! '8.>-! GU(*''&?-! '8.>-)VI! &'!
WXENYZ:!;%-!)*$-!.<!=ST!<.)!8.$%-)'!#.$!-4(.'-2!$.!+&3*)-$$-!'8.>-!&'!J[EZN:!!
!
@1'-)?*$&.#!$*1,-!
!
=ST!
\.)8*,!
!
R*''&?-!=8.>-)!
WX!
Z]!
NYZ!
\.#5=8.>-)!
J[!
QN!
ZN!
!
N[X!
N[W!
JN]!
!
a. Complete the table of expected values below:

                  SGA     Normal   Total
Passive Smoker
Non-Smoker
Total

b. What is the null hypothesis?


c. What is the rate of SGA for passive smokers and non-smokers? Is there any
difference?


d. Conduct the appropriate statistical test to test your hypothesis. Discuss your
findings.


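
To check a hand calculation for exercise 1, the chi-square steps described above can be sketched in Python. This is an illustration only: the function name `chi_square` is an assumption, and no continuity correction is applied because the formula in this handout does not include one.

```python
def chi_square(observed):
    """Return (expected table, chi-square, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count for each cell = row total x column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df


# Observation table from exercise 1
# rows: passive smoker, non-smoker; columns: SGA, normal
observed = [[89, 67], [20, 41]]
expected, chi2, df = chi_square(observed)
for row in expected:
    print([round(e, 2) for e in row])
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The computed value is then compared with the critical value from a chi-square table (3.84 for df = 1 at the 0.05 level) to decide whether to reject the null hypothesis.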
Fisher's Exact Test

Fisher's Exact Test is used to test the association between 2 qualitative variables
when the sample size is small: less than 20, or less than 40 with one of the
expected values less than 5. The formula is as follows:

             +       -       Total
+            a       b       g
-            c       d       h
Total        e       f       n

$$p = \frac{e!\, f!\, g!\, h!}{n!\, a!\, b!\, c!\, d!}$$

2. From the earlier SGA study, 23 of the respondents had miscarriages in the past.
By analysing this group of patients with poor obstetric history, is there an
association between exposure to cigarette smoke and SGA? Based on the
following contingency table, conduct the appropriate statistical test.

          Passive Smoker   Non-Smoker   Total
SGA       10               0            10
Normal    7                6            13
Total     17               6            23

a. What is the p value?

b. What conclusion can you make from the above results?


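
The factorial formula above can be evaluated directly, as in the following Python sketch for the table in exercise 2. The function name `fisher_table_probability` is an assumption for this illustration. It returns the probability of the single observed table only; whether further tables must be added depends on whether more extreme tables exist and on whether a one-tailed or two-tailed p value is wanted.

```python
from math import factorial


def fisher_table_probability(a, b, c, d):
    """Probability of one 2 x 2 table with fixed margins: e!f!g!h! / (n!a!b!c!d!)."""
    g, h = a + b, c + d      # row totals
    e, f = a + c, b + d      # column totals
    n = g + h                # grand total
    numerator = factorial(e) * factorial(f) * factorial(g) * factorial(h)
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator


# Table from exercise 2
# rows: SGA, normal; columns: passive smoker, non-smoker
p = fisher_table_probability(10, 0, 7, 6)
print(f"p = {p:.4f}")
```

Because one cell of this table is already 0, no table with the same margins is more extreme in the same direction, so the value printed is the one-tailed probability.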
Wilcoxon Rank Sum Test

This test is the non-parametric equivalent of the Student's t test, used when the
data are not normally distributed. It tests for association between a qualitative
dichotomous variable and a quantitative variable. The method is simple: sort the
data in ascending order, rank them, sum the ranks for each group, and compare the
value with the table of critical values for the Wilcoxon Rank Sum Test.

3. For practice, do the following exercise. The data is a subset of the earlier study.
We are trying to see whether there is any association between exposure to
cigarette smoke and the weight of the baby. Since the sample size is quite small,
the appropriate test is a non-parametric analysis.

a. What is the null hypothesis?

b. Conduct the appropriate statistical test to test your hypothesis. Discuss your
findings.

Non-Smoker (n=15)
Birth Weight   Rank
4.20
3.96
3.70
3.61
3.26
3.15
3.12
3.00
2.98
2.84
2.81
2.57
2.44
2.43
2.10

Passive Smoker (n=15)
Birth Weight   Rank
3.76
3.60
3.55
3.48
3.25
3.06
3.05
2.55
2.47
2.46
2.45
2.45
2.43
2.30
2.09
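
The ranking step for exercise 3 can be sketched in Python as follows. This is an illustration rather than a prescribed method: the helper name `average_ranks` is an assumption, and tied values are given the mean of the ranks they occupy, which is the usual convention.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Birth weights from the two tables above
non_smoker = [4.20, 3.96, 3.70, 3.61, 3.26, 3.15, 3.12, 3.00,
              2.98, 2.84, 2.81, 2.57, 2.44, 2.43, 2.10]
passive_smoker = [3.76, 3.60, 3.55, 3.48, 3.25, 3.06, 3.05, 2.55,
                  2.47, 2.46, 2.45, 2.45, 2.43, 2.30, 2.09]

# Rank all 30 values together, then sum the ranks within each group
ranks = average_ranks(non_smoker + passive_smoker)
sum_non_smoker = sum(ranks[:len(non_smoker)])
sum_passive = sum(ranks[len(non_smoker):])
print(sum_non_smoker, sum_passive)
```

As a check, the two rank sums should add up to n(n + 1)/2 = 465 for n = 30. The rank sum is then compared with the table of critical values.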

Wilcoxon Signed Rank Test

This test is the non-parametric equivalent of the paired t test, used when the data
are not normally distributed. It tests whether there is any association between 2
quantitative variables that are repeated measures of the same thing on the same
individual at different times. As indicated by the name, the calculation depends on
the sign and relative magnitude (rank) of the differences, not on their actual values.

4. For practice, do the following exercise. The data is a subset of the earlier study.
We are trying to see whether the intervention of haematinics can increase the level
of haemoglobin of the anaemic mothers. Since the sample is quite small, the
appropriate test is a non-parametric analysis.

a. What is the null hypothesis?

b. Conduct the appropriate statistical test to test your hypothesis. Discuss your
findings.


INDEX   HB2     HB3     Hb Diff   Rank
1       10.4    10.0
5       10.7    11.0
13      10.5    11.0
17      10.6    10.8
19      10.5    11.0
20      10.7    11.0
29      10.3    9.5
60      9.3     9.5
61      10.0    11.5
92      10.5    10.8
94      10.4    8.2
95      10.0    7.2
134     10.2    11.0
188     10.0    10.0
197     10.2    10.0

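
The signed-rank calculation for exercise 4 can be sketched in Python as below. Several points are assumptions made for this illustration: the difference is taken as HB3 minus HB2 (HB3 is assumed to be the later measurement), zero differences are dropped, and tied absolute differences share the mean of their ranks.

```python
# Haemoglobin values from the table above, in the same row order
hb2 = [10.4, 10.7, 10.5, 10.6, 10.5, 10.7, 10.3, 9.3,
       10.0, 10.5, 10.4, 10.0, 10.2, 10.0, 10.2]
hb3 = [10.0, 11.0, 11.0, 10.8, 11.0, 11.0, 9.5, 9.5,
       11.5, 10.8, 8.2, 7.2, 11.0, 10.0, 10.0]

# Differences are rounded to 1 decimal place so that floating-point noise
# does not break ties; zero differences are dropped
diffs = [round(after - before, 1) for before, after in zip(hb2, hb3)]
diffs = [d for d in diffs if d != 0]

# Rank the absolute differences; tied values share the mean of their ranks
abs_sorted = sorted(abs(d) for d in diffs)
mean_rank = {}
for value in set(abs_sorted):
    first = abs_sorted.index(value) + 1          # first 1-based position
    last = first + abs_sorted.count(value) - 1   # last 1-based position
    mean_rank[value] = (first + last) / 2

# Sum the ranks separately for positive and negative differences
sum_positive = sum(mean_rank[abs(d)] for d in diffs if d > 0)
sum_negative = sum(mean_rank[abs(d)] for d in diffs if d < 0)
t_statistic = min(sum_positive, sum_negative)
print(sum_positive, sum_negative, t_statistic)
```

The smaller of the two rank sums is the test statistic, compared against the table of critical values for the number of non-zero differences (14 here).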
Research Project 4

Each lecturer will demonstrate how to analyse the data using the computer and advise
the students on how to interpret the results. For homework, the students will complete
the analysis and prepare a PowerPoint presentation for the final practical session.
Copyright reserved Dr Azmi Mohd Tamil
Amali6.doc 24-8-06.