
STATISTICAL METHODS IN

NURSING

Dr. R. C. Ram
Professor (Demography & Statistics)
Department of Community Medicine
Pt. JNM Medical College, Raipur
Means of data analysis
• Manual
• M. S. Excel programme
• Software packages:
i) Statistical Analysis Systems (SAS)
- one of the most comprehensive statistical systems

ii) Statistical Product and Service Solutions (SPSS)


- formerly Statistical Package for Social Sciences
- latest version 17.0
iii) Epi- Info
- Developed by CDC Atlanta
- free public domain software package
- can be downloaded from
http://www.cdc.gov/epo/epiinfo.htm
- for epidemiological investigations and
surveys

iv) BMDP (Bio Medical Data Processing)


Data Type

• Nominal
• Ordinal
• Numerical (quantitative)
Nominal/ categorical data

•Sex
•marital status
•caste
•religion
•residence etc.

(Studied by calculating proportions/ percentages/
rates/ ratios etc.)
Ordinal data
(Assigning various levels to the variable)

•Grade: I, II, or III


•Malnutrition level: mild, moderate, or severe
•Blood pressure level: high or low
(Studied by calculating proportions/ percentage/
rate/ ratio etc.)
Numerical data

•Weight
•age
•height
•blood pressure
•income etc.
(Studied by using measures of average,
variability, skewness, correlation, regression etc.)
Types of Analysis

• Descriptive: Summarization of data


• Inferential: Hypothesis testing, i.e. selecting
representative sample(s) from the population(s)
and drawing inferences about the population
parameters based on sample statistics.
Descriptive Analysis
• Presentation of data
• Diagrammatic/ Graphical display
• Central tendency/ Averages
• Dispersion/ Variability
• Correlation
• Multiple correlation
• Simple Regression
• Multiple Regression
• Logistic Regression etc.
Inferential Analysis
Hypothesis Testing using:

• Z - test (Normal test)


• t – test (Student’s t – test)
• χ2- test (Chi-square test)
• F – test
• Analysis of Variance (ANOVA)
• Multivariate Analysis of Variance (MANOVA)
• Analysis of Covariance (ANCOVA)
I. Descriptive Analysis
(Summarization of data)
Presentation of data

 By Tables
 By Drawings
Tabular Presentation
 Contingency tables (r × c) (for qualitative i.e.
nominal/ categorical data)

 Frequency tables (for quantitative data)


-Univariate
-Bivariate
-Multivariate
Presentation by drawings

• A picture is worth 1000 words


• Pictures leave a better and more lasting visual
impression on the mind.
• Easy to comprehend
• Include diagrammatic display for nominal
data and graphical display for numerical
data
Diagrammatic display for nominal data

• Bar chart
• Circular diagram
• Map diagram
• Pictogram
Graphical display for numerical data:

• Histogram
• Frequency polygon
• Frequency curve
• Ogive
• Scatter plot
• Line graph
Measures of central tendency/ Averages

• Mean (Arithmetic Mean)


• Median
• Mode
Mean (Arithmetic Mean)

x̄ = Σx / n
Ex. Diastolic BP in mmHg of 10 healthy persons
90, 70, 80, 84, 82, 72, 78, 84, 90, 80

x̄ = Σx / n = 810 / 10 = 81

Most sensitive to extreme values.
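The calculation can be sketched in a few lines of Python (standard library only); note that the listed values sum to 810, so the mean is 81:

```python
# Diastolic BP sample from the slide (n = 10)
bp = [90, 70, 80, 84, 82, 72, 78, 84, 90, 80]

mean = sum(bp) / len(bp)  # x-bar = (sigma x) / n
print(sum(bp), mean)      # 810 81.0
```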


Median

Value of the middle most observation after placing in


ascending/ descending order.
Median=(n+1)/2 th value in ascending order.
Ex. The incubation period in years of AIDS for 7 cases:
3, 5, 2, 4, 3, 15, 2
Ans. Values in ascending order
2, 2, 3, 3, 4, 5, 15
Median= 4th value in ascending order
= 3 years
Note: Not sensitive to extreme values, therefore used
for skewed data
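The median example can be verified with Python's statistics module:

```python
import statistics

# Incubation period (years) of AIDS for 7 cases, from the slide
incubation = [3, 5, 2, 4, 3, 15, 2]

print(sorted(incubation))             # [2, 2, 3, 3, 4, 5, 15]
# (n + 1)/2 = 4th value in ascending order
print(statistics.median(incubation))  # 3
```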
Mode
• The value that occurs most frequently.
• Used when the researcher wants to designate the
value that takes place most often.
Exa. The incubation period of polio in years for 100 cases
Incubation period No. of cases
17 2
18 4
19 11
20 70
21 10
22 3
Ans. The modal incubation period of polio is 20 years
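A minimal sketch of picking the mode out of the slide's frequency table:

```python
# (incubation period in years -> number of cases) from the slide
cases = {17: 2, 18: 4, 19: 11, 20: 70, 21: 10, 22: 3}

# The mode is the value with the highest frequency
mode = max(cases, key=cases.get)
print(mode)  # 20
```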
Measures of Dispersion/ Variability/
Spread
• Range
• Quartile Deviation
• Mean Deviation
• Standard Deviation
• Coefficient of Variation
Range
• It is the difference between the highest value and the
lowest value.
• Used when only the two extreme values need to be
emphasized.

Exa. Diastolic BP in mmHg of 10 healthy persons


90, 70, 80, 84, 82, 72, 78, 84, 90, 80

Range= Highest value - Lowest value


= 90 – 70
= 20
Standard Deviation

For n ≥ 30 (large sample)

S.D. = √[ Σ(x − x̄)² / n ]

For n < 30 (small sample)

S.D. = √[ Σ(x − x̄)² / (n − 1) ]
Mean ± 1 S.D.: contains 68% of observations

Mean ± 2 S.D.: contains 95.4% of observations

Mean ± 3 S.D.: contains 99.7% of observations
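Both forms of the formula can be checked against the BP sample used earlier (a sketch, standard library only); statistics.pstdev and statistics.stdev implement the same two divisors:

```python
import math
import statistics

bp = [90, 70, 80, 84, 82, 72, 78, 84, 90, 80]
m = sum(bp) / len(bp)

# Divide by n (the slide's large-sample form) ...
sd_n = math.sqrt(sum((x - m) ** 2 for x in bp) / len(bp))
# ... or by n - 1 (the slide's small-sample form)
sd_n1 = math.sqrt(sum((x - m) ** 2 for x in bp) / (len(bp) - 1))

print(round(sd_n, 2), round(sd_n1, 2))  # 6.28 6.62

# The library functions agree with the hand-rolled formulas
assert math.isclose(sd_n, statistics.pstdev(bp))
assert math.isclose(sd_n1, statistics.stdev(bp))
```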


Coefficient of Variation

 C.V. = (SD/ Mean)*100


 A measure of relative variation among two or
more groups of data
 The group having the higher CV is considered to
have more spread than the other group(s).
 Frequently used in laboratory testing and quality
control procedures.
Exa. Mean Hb. SD
Male 12 3
Female 15 3.3
Which group shows more variability in their Hb.
level?
Ans. CV (males) = (3/12)*100 = 25%
CV (females) = (3.3/15)*100 = 22%

Since CV(males) > CV(females), males show more
variability in their Hb levels, even though the
SD of females is greater than that of males.
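The CV comparison from the example, written as a small helper:

```python
def cv(sd, mean):
    """Coefficient of variation as a percentage: (SD / mean) * 100."""
    return sd / mean * 100

# Hb example from the slide
print(round(cv(3, 12), 1))    # 25.0 (males)
print(round(cv(3.3, 15), 1))  # 22.0 (females)
```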
Relationship among variables

 Between two variables


 Among more than two variables
Relationship between two variables
 Scatter Plot
 Correlation Coefficient
 Rank Correlation
 Odds Ratio
 Relative Risk
 Linear Regression
Relationship among three or more
variables
 Multiple Correlation
 Multiple Regression
 Logistic Regression
Measures of relationship between two
variables
Scatter Plot/ Scatter Diagram:
 A graphical display
 Not precise measure
 Before finding correlation coefficient, draw it to
ensure that the relationship is linear
Correlation coefficient:

 Relationship between two numerical


characteristics

 Correlation does not imply causation, e.g. unsafe


sex practices are linked with AIDS but the
causative agent of AIDS is HIV.
Karl Pearson coefficient of correlation:

r = Σ(x − x̄)(y − ȳ) / √[ Σ(x − x̄)² · Σ(y − ȳ)² ]
-1 ≤ r ≤ 1
Type                     Correlation coefficient
Negative and perfect     r = −1
Negative and partial     −1 < r < 0
No correlation           r = 0
Positive and partial     0 < r < 1
Positive and perfect     r = 1
However, the significance of r can only be
examined by using Student’s t test.

Exercise: In two separate studies the correlation
between body weight and systolic BP level was found to be
i) n = 11, r = 0.6, p > 0.05  Not significant
ii) n = 102, r = 0.6, p < 0.001  Significant

Using the t-test:  t = r√(n − 2) / √(1 − r²)

with d.f. = n-2
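A sketch (standard library only) of Pearson's r and the t statistic used to test its significance; with r = 0.6 the two sample sizes in the exercise give clearly different t values:

```python
import math

def pearson_r(xs, ys):
    """r = sigma(x - xbar)(y - ybar) / sqrt(sigma(x - xbar)^2 * sigma(y - ybar)^2)"""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with d.f. = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(t_for_r(0.6, 11), 2))   # 2.25  (small sample)
print(round(t_for_r(0.6, 102), 2))  # 7.5   (large sample)
```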


Spearman Rank Correlation Coefficient:
 Relationship between two ordinal characteristics
 A non-parametric method
 Can also be used when one or both numerical
variables are skewed

rs = 1 − 6Σd² / [ n(n² − 1) ]
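A small implementation of the rank-correlation formula (a sketch that assumes no tied values, an assumption not stated on the slide):

```python
def spearman_rs(xs, ys):
    """rs = 1 - 6 * sigma(d^2) / (n * (n^2 - 1)); d = rank difference per pair."""
    def ranks(values):
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]  # no ties assumed
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(spearman_rs([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0  (perfect agreement)
print(spearman_rs([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0 (perfect reversal)
```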
Odds Ratio (OR)

Relationship between two nominal characteristics

That is, the relationship between a risk factor and
the occurrence of a given outcome (say disease).

Provides a way to look at risk in case- control
studies.

In case-control studies one group with the disease and
another without the disease are selected, the opposite
of cohort studies.

OR and RR are the two commonly used risk ratios.
OR = (Odds that a person with the adverse outcome was
at risk) / (Odds that a person without the adverse
outcome was at risk)
Odds Ratio (OR)

                      Disease   No Disease   Total
Risk factor present   a         b            a+b
Risk factor absent    c         d            c+d
Total                 a+c       b+d          a+b+c+d

OR = (a/ b)/ (c/ d) = ad/ bc


Odds Ratio (OR)

Group                With respiratory    Without respiratory   Total
                     distress (cases)    distress (controls)
TRH (Thyrotropin     260                 132                   392
Releasing Hormone)
Placebo              244                 133                   377
Total                504                 265                   769

OR = (a/ b)/ (c/ d) = 1.07


OR ≈ 1.07 means that the odds of developing Respiratory
Distress Syndrome for an infant in the TRH group are about
1.07 times those of an infant in the placebo group.

The OR was not statistically significant.
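The TRH example can be reproduced directly from the 2×2 cell counts:

```python
def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d) = ad/bc for a 2x2 table
    (a, b = outcome present/absent among exposed; c, d = among unexposed)."""
    return (a * d) / (b * c)

# TRH vs placebo cell counts from the slide
print(round(odds_ratio(260, 132, 244, 133), 2))  # 1.07
```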


Relative Risk (RR):

Relationship between two nominal characteristics

Provides a way to look at risk in cohort studies.

In cohort studies one group of subjects with the risk
factor and another without it are identified,
then followed through time to determine which
persons develop the outcome (say disease) of interest.

RR =Incidence among exposed / Incidence among non-


exposed
Relative Risk (RR)

                      Disease   No Disease   Total
Risk factor present   a         b            a+b
Risk factor absent    c         d            c+d
Total                 a+c       b+d          a+b+c+d

RR = (a/(a+b)) / (c/(c+d))


Relative Risk (RR)

Group     MI    Without MI   Total
Aspirin   139   10898        11037
Placebo   239   10795        11034
Total     378   21693        22071

RR = 0.581
RR = 0.58, which is < 1, means that patients on aspirin
were 0.58 times as likely to have an MI as those in
the placebo group (a 42% risk reduction).

The RR value can be tested for its significance.
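The aspirin/MI relative risk, reproduced from the table's cell counts:

```python
def relative_risk(a, b, c, d):
    """RR = (a / (a + b)) / (c / (c + d)) for a 2x2 cohort table."""
    return (a / (a + b)) / (c / (c + d))

# Aspirin vs placebo cell counts from the slide
print(round(relative_risk(139, 10898, 239, 10795), 3))  # 0.581
```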


Regression Analysis

Functional relationship between a response variable
and one or more explanatory variables

Used in prediction

Types:
i) Simple/linear regression
ii) Multiple regression
iii) Logistic regression
Linear regression
 Only one explanatory (independent)variable is
used to predict an outcome
 Correlation and regression measure only a
straight line or linear relationship between two
variables
 Y= a+ bx is the regression equation of y on x,
where x is independent and y is dependent
variable
Regression coefficient
If y: dependent/ response variable
x: independent/ predictor variable
Then the regression coefficient of y on x
byx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²

It is nothing but ‘b’ in the regression line of


y on x : y = a + bx
Exa. y: Insulin sensitivity among hyperthyroid
women
x: BMI
Regression line of y on x:
Y= 2.336 – 0.077* X
IS= 2.336 – 0.077*BMI
• Insulin sensitivity levels can be predicted
for BMI among hyperthyroid women
• Predicted Insulin sensitivity decreases by
approximately 0.077 for every unit increase in
BMI
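A sketch of fitting y = a + bx by least squares, plus a prediction using the slide's fitted coefficients:

```python
def fit_line(xs, ys):
    """Least-squares estimates for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Sanity check on an exact line y = 1 + 2x
print(fit_line([1, 2, 3], [3, 5, 7]))  # (1.0, 2.0)

# Prediction with the slide's fitted coefficients
def predicted_is(bmi):
    return 2.336 - 0.077 * bmi

print(round(predicted_is(25), 3))  # 0.411
```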
Measures of relationship among three or
more characteristics

One dependent and two or more independent


variables
Multiple correlation
(Three or more numerical variables)
Generalization of simple correlation
X1 = Birth weight of new born (response/ outcome)
X2 = Age of the mother (explanatory)
X3 = Mother’ nutritional status (explanatory)

The multiple correlation of X1 on X2 and X3 is the


correlation coefficient between X1 and the joint impact
of X2 and X3 on X1.
Represented by R 1.23
where 0 ≤ R 1.23 ≤ 1
If R 1.23 = 0
Then X1 is completely uncorrelated with X2
and X3

If R 1.23 = 1
Then the correlation is perfect.
Multiple Regression
 It is generalization of simple regression
 Two or more explanatory variables are used to predict
an outcome
 All variables are numerical
Y = a + b1 x1 +b2 x2
Exa.
Predicted IS = 2.291 – 0.068*BMI – 0.0045*Age
Logistic Regression
 Also named as logistic model or logit model
 Used for prediction of the prob. of occurrence of
an event , say HD
 The outcome variable is binary/ dichotomous
 The independent variables include both numerical
and nominal measures
 Before applying the Logistic Regression, apply χ2
-test to determine whether an independent variable
adds significantly to the prediction
The logistic model for 3 predictors
logit (p) = ln(p/(1−p)) = b0 + b1x1 + b2x2 + b3x3
p : prob. of the occurrence of the outcome, say HD
x1 : Sex ( male-1, female-0)
x2 : BP
x3 : Age
logit (p) = 2 + 1.2 (sex) + 1.01 (BP) + 1.04 (age)
Exponentiating a coefficient gives its odds ratio
(OR = e^b). Reading the quoted values 1.2, 1.01 and
1.04 as odds ratios:
i) males have 1.2 times the odds (20% higher risk) of HD than females
ii) for every unit increase in BP, the risk of HD
increases by a factor of 1.01 (or by 1%)
iii) for every 1 year increase in age, the risk of HD
increases by a factor of 1.04 (or by 4%)
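The model can be sketched as code; the coefficients below are the slide's quoted values, and the logistic inverse converts a logit back into a probability:

```python
import math

# logit(p) = b0 + b1*sex + b2*BP + b3*age (values quoted on the slide)
B0, B_SEX, B_BP, B_AGE = 2.0, 1.2, 1.01, 1.04

def prob(sex, bp, age):
    """Invert the logit: p = 1 / (1 + e^(-logit(p)))."""
    logit = B0 + B_SEX * sex + B_BP * bp + B_AGE * age
    return 1 / (1 + math.exp(-logit))

# Higher predictor values push the predicted probability of HD upward
assert prob(0, 0.1, 0.5) < prob(1, 0.1, 0.5)
```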
II. Inferential Analysis
(Hypothesis Testing/ Tests of Significance)

An approach to statistical inference resulting


in a decision to accept or reject the null
hypothesis.
Types of tests of significance

• Parametric Tests
• Non- parametric Tests
Parametric Tests
 Statistical tests that make assumptions
regarding the distribution of the observations.
 Some parametric tests are:
 Z- test
 t- test
 F- test
 ANOVA
 MANOVA
 ANCOVA
Non- parametric Tests
 Statistical tests that make no assumptions
regarding the distribution of the observations.
 Also called distribution free methods.

 Some non- parametric tests are:


 χ2- test
 Sign median test
 Mann- Whitney test
Some Parametric Tests

These tests of significance are based on the


assumption that the observations follow normal
distribution.
Z - test (Normal test)
(Large sample test: n≥ 30)

Applications:
• Significance of difference between two
means
• Significance of difference between two
proportions
• Significance of difference between two
standard deviations
t – test (Student’s t – test)
(Small sample test: n< 30)

Applications:
• Significance of difference between two
means
i) Unpaired t – test (Independent samples)
ii) Paired t- test (Dependent samples)
• Significance of correlation coefficient
F – test

Applications:
• Equality of two variances
• Equality of several means :
(Analysis of Variance- ANOVA)
Analysis of Variance (ANOVA)
 A statistical procedure that determines
whether any difference exists among 3 or more
groups of subjects on one or more factors.
 F- test is used in ANOVA.
 χ2- test can be extended to 3 or more groups
when the outcome is categorical (counted).
 When the outcome is numerical, means are
used, t-test can be used for comparison of 2
groups, and ANOVA can be used for
comparison among 3 or more groups.
Multivariate Analysis of Variance
(MANOVA)

An advanced statistical method that provides
a global test when there are multiple
dependent and independent variables, and the
independent variables are nominal.

It is a simple extension of univariate ANOVA
design.

If the results from MANOVA are statistically
significant, using a multivariate statistic called
Wilks' lambda, follow-up ANOVAs may be
done to investigate the individual outcomes.
Problem:
A study was conducted to identify attitudinal
differences of nurses, nursing assistants and
residents (three groups) as barrier in effective
pain management . Information regarding
their beliefs about 12 components of chronic
pain management was collected.
Ans. The study involves 12 outcome (dependent)
variables, so using ANOVA would require 12
separate univariate ANOVAs.
A single MANOVA is the right choice.
Analysis of Covariance (ANCOVA)

A special type of ANOVA or regression used to
control for the effect of a possible confounding
factor.

A confounding factor is a variable more likely
to be present in one group of subjects than the
other that is related to the outcome of interest,
and thus potentially confuses or confounds the
results.
Problem:
In a study, when BMI alone is used to predict
Insulin Sensitivity (IS) in hyperthyroid women, the
regression equation was
IS = 2.336 – 0.077*BMI
which means that for every unit increase in
BMI, IS is predicted to decrease by 0.077.
Age is a confounding factor that affects BMI
as well as IS. A way to control for the possible
confounding effect of age is to include that variable
in the regression equation .
The regression equation with age included is
IS = 2.291 – 0.0045*Age - 0.068* BMI
Using this equation, the women’s IS level is
predicted to decrease by 0.068 for every unit
increase in BMI.
Woman’s age   BMI   Predicted IS
50 years      25    0.366
60 years      25    0.321
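The age-adjusted predictions follow directly from the equation (a sketch using the slide's coefficients):

```python
def predicted_is_adjusted(age, bmi):
    """IS predicted with age included as a covariate (the slide's equation)."""
    return 2.291 - 0.0045 * age - 0.068 * bmi

print(round(predicted_is_adjusted(50, 25), 3))  # 0.366
print(round(predicted_is_adjusted(60, 25), 3))  # 0.321
```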
Some Non-Parametric Tests

These statistical tests of significance do not


presume that the observations should follow any
distribution.
χ2- test (Chi-square test)
(A non-parametric test)

Applications:
• Test of association / independence of
two attributes
• Test of goodness of fit
Sign Median Test
A non-parametric test used for testing a hypothesis
about the median in a single group.
Problem: The standard median energy consumption
for 2-year-old children is 1286 kcal. Such data were
recorded for 94 children. Did the children in the study
have this median level of energy intake?
Soln. H0 : Median intake= 1286 kcal
H1 : Median intake≠ 1286 kcal
Use the test statistic
z = (x − nπ − ½) / √[ nπ(1 − π) ]
where x is the number of observations above the
hypothesized median and π = 0.5 under H0.
Mann-Whitney Test/ Wilcoxon Rank Sum
Test/ Mann-Whitney-Wilcoxon Test

• A non-parametric test for comparing two
independent samples with ordinal data or with
numerical observations that are not normally distributed.
• Test available in most statistical computer
packages
How to apply:
 Rank all the scores/ values from lowest to
highest (or vice-versa) ignoring the group they are
in.
 The ranks are analyzed as if they were the
original observations.
 The means and SDs are calculated for each
group.
 H0 : Mean ranks are equal
H1 : Mean ranks are not equal
 Apply t-test
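The ranking step above can be sketched as follows; tied values share their average rank:

```python
def mean_ranks(group1, group2):
    """Rank the pooled values (ties get their average rank),
    then return the mean rank of each group."""
    pooled = sorted(group1 + group2)
    def rank(v):
        positions = [i + 1 for i, x in enumerate(pooled) if x == v]
        return sum(positions) / len(positions)
    r1 = [rank(v) for v in group1]
    r2 = [rank(v) for v in group2]
    return sum(r1) / len(r1), sum(r2) / len(r2)

print(mean_ranks([1, 2, 3], [4, 5, 6]))  # (2.0, 5.0)
print(mean_ranks([1, 2], [2, 3]))        # (1.75, 3.25)
```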
