
Nursing Research
Data Analysis

Florenda F. Cabatit RN MA
Facilitator
DATA ANALYSIS
Data analysis is the process by which
information is rendered meaningful and
intelligible (Polit and Hungler, 1995).
It is the systematic organization and
synthesis of research data and the testing
of research hypotheses using those data
(2004).
Statistical Analysis
Quantitative analysis deals with numerical
analysis of information.
It is the manipulation of numeric data through
statistical procedures for the purpose of
describing phenomena or assessing the
magnitude and reliability of relationships
among them.
Statistics is the scientific method used in
quantitative analysis.
Statistics

Statistics helps to:
• Organize data
• Summarize data
• Evaluate data
• Present data in an easily understood form
Statistics
Two branches of Statistics:
• Descriptive statistics – statistics used
to describe and summarize data
• Inferential statistics – statistics that
permit inferences on whether
relationships observed in a sample are
likely to occur in the larger population.
Considerations in the choice of
appropriate statistical methods

• The purpose of the research
• The level of measurement of the
variables
• The number of groups/variables
involved
• The type of groups being studied
Purposes of Research
• To describe
• To compare or determine
differences
• To seek relationships
Levels of Measurement
• Nominal - the lowest level
- involves assigning numbers to classify
characteristics into categories
- numeric codes assigned in nominal
measurement do not convey quantitative
information.
- the numbers are merely symbols that
represent different values.
- categories must be mutually exclusive
and collectively exhaustive.
Ordinal Measurement
• This involves sorting objects on the
basis of their relative standing or
ranking on an attribute.
• The numbers are not arbitrary: they
signify incremental values. They do
not, however, tell us anything about how
much greater one level is than
another.
Interval Measurement

• A measurement in which an
attribute of a variable is rank
ordered on a scale that has
equal distances between
points on that scale.
Ratio Scale

• A quantitative measurement in which
intervals are equal and there is a true
zero point.
• The highest level of measurement
• All arithmetic operations are permissible
with this measurement (add, subtract,
multiply, and divide numbers on this
scale).
Descriptive Statistics
Three characteristics to fully
describe a set of data:
• Shape of the distribution of values
• Central tendency
• Variability
Review of Descriptive Stats.
• Descriptive Statistics are used to
present quantitative descriptions in a
manageable form.
• This method works by reducing lots of
data into a simpler summary.
• Example:
– 37° Centigrade as the average adult body
temperature
– SU’s quality-point system
Univariate Analysis
• This is the examination across cases of
one variable at a time.
• Frequency distributions are used to group
data.
• One may set up margins that allow us to
group cases into categories.
• Examples include
– Age categories
– Price categories
– Temperature categories.
Distributions
Two ways to describe a univariate
distribution
• A table
• A graph (histogram, bar chart)
Distributions (cont.)
• Distributions may also be displayed using
percentages.
• For example, one could use percentages
to describe the following:
– Percentage of people under the poverty level
– Over a certain age
– Over a certain score on a standardized test
Distributions (cont.)

A Frequency Distribution Table

Category    Percent
Under 35       9%
36-45         21%
46-55         45%
56-65         19%
66+            6%
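
A frequency distribution like the one above can be built by assigning each case to one category and then counting. The following is a minimal sketch, assuming a small list of ages invented purely for illustration; the function name `age_category` and the category boundaries are also assumptions, chosen so that the categories are mutually exclusive and collectively exhaustive.

```python
from collections import Counter

# Hypothetical ages, invented only to illustrate the grouping step.
ages = [23, 31, 38, 41, 44, 47, 49, 50, 52, 53, 55, 58, 61, 64, 70, 72]

def age_category(age):
    """Assign an age to exactly one category (assumed boundaries)."""
    if age <= 35:
        return "35 and under"
    if age <= 45:
        return "36-45"
    if age <= 55:
        return "46-55"
    if age <= 65:
        return "56-65"
    return "66+"

labels = ["35 and under", "36-45", "46-55", "56-65", "66+"]
counts = Counter(age_category(a) for a in ages)

# Frequency distribution: (count, percentage) for each category.
table = {
    label: (counts[label], 100 * counts[label] / len(ages))
    for label in labels
}

for label, (n, pct) in table.items():
    print(f"{label:<14}{n:>3}{pct:>8.1f}%")
```

Because every case falls into exactly one category, the counts add up to the number of cases and the percentages add up to 100.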
Distributions (cont.)
A Histogram

[Figure: histogram of Percent (vertical axis, 0 to 45) by category: Under 35, 36-45, 46-55, 56-65, 66+]
Central Tendency
• An estimate of the “center” of a
distribution
• Three different types of estimates:
– Mean
– Median
– Mode
Mean
• The most commonly used method of
describing central tendency.
• One basically totals all the results and
then divides by the number of units or
“n” of the sample.
• Example: The NCM 104 Quiz mean
was determined by the sum of all the
scores divided by the number of
students taking the exam.
Median
• The median is the score found at the exact
middle of the set.
• One must list all scores in numerical order
and then locate the score in the center of
the sample.
• Example: If there are 499 scores in the
list, score #250 would be the median. If
there are 500 scores, the median is the
average of scores #250 and #251.
• Unlike the mean, the median is little
affected by outliers.
Mode
• The mode is the most repeated score in
the set of results.
• Let's take the set of scores:
15, 20, 21, 20, 36, 15, 25, 15
• Again, we first line up the scores in order:
15, 15, 15, 20, 20, 21, 25, 36
• 15 is the most repeated score and is
therefore the mode.
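
The three estimates of central tendency can be checked on the same set of scores. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]  # the set of scores used above

mean = statistics.mean(scores)      # sum of all scores divided by n
median = statistics.median(scores)  # middle of the sorted scores; with an
                                    # even n, the average of the two middle scores
mode = statistics.mode(scores)      # the most repeated score

print(sorted(scores))
print(mean, median, mode)
```

With eight scores there is no single middle score, so the median is the average of the fourth and fifth sorted scores.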
Central Tendency
• If the distribution is normal (i.e., bell-
shaped), the mean, median and mode are
all equal.
• In our analyses, we’ll use the mean.
Dispersion
• Two types of estimates:
– Range
– Standard deviation
• The standard deviation is the more
detailed estimate: it uses every score,
whereas the range depends only on the two
extreme scores, so a single outlier
can greatly extend it.
Range
• The range is used to identify the
highest and lowest scores.
• Let's take the set of
scores: 15, 20, 21, 20, 36, 15, 25, 15.
• The scores run from 15 to 36, so the
range is 36 - 15 = 21: 21 points
separate the highest score from the
lowest.
Standard Deviation
• The standard deviation is a value that
shows the relation that individual
scores have to the mean of the sample.
• If scores are said to be standardized to
a normal curve, there are several
statistical manipulations that can be
performed to analyze the data set.
Standard Dev. (cont.)
• Assumptions may be made about the
percentage of scores as they deviate from
the mean.
• If scores are normally distributed, one can
assume that approximately 68% of the scores
in the sample fall within one standard
deviation of the mean. Approximately 95% of
the scores would then fall within two
standard deviations of the mean.
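
The share of a normal distribution that lies within a given number of standard deviations of the mean can be computed directly. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_one_sd = share_within(1)
within_two_sd = share_within(2)

print(f"within 1 SD: {within_one_sd:.1%}")
print(f"within 2 SD: {within_two_sd:.1%}")
```

These proportions apply only when the scores are normally distributed; for skewed data they can be far off.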
Standard Dev. (cont.)
• The standard deviation is the square
root of the average squared deviation
from the mean: square each score's
deviation from the mean, sum the squares,
divide by the number of scores, and take
the square root.
• Squaring keeps positive and negative
deviations from the mean from cancelling
each other out.
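
The two estimates of dispersion can be worked through step by step on the set of scores used earlier. This is a minimal sketch; it divides by the number of scores, as described above, and also shows the variant that divides by n - 1, which many texts use when the scores are a sample rather than the whole population. The variable names are illustrative.

```python
import math

scores = [15, 20, 21, 20, 36, 15, 25, 15]

# Range: distance between the highest and lowest scores.
score_range = max(scores) - min(scores)

# Standard deviation, step by step.
n = len(scores)
mean = sum(scores) / n
deviations = [x - mean for x in scores]          # positive and negative
squared_deviations = [d ** 2 for d in deviations]

population_sd = math.sqrt(sum(squared_deviations) / n)     # divide by n
sample_sd = math.sqrt(sum(squared_deviations) / (n - 1))   # divide by n - 1

print(score_range, round(population_sd, 2), round(sample_sd, 2))
```

The raw deviations sum to zero, which is why they are squared before being averaged.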
RESEARCH QUESTION: DESCRIBE

LEVEL            TYPE OF DESCRIPTION   STATISTICAL TOOL

NOMINAL          Distribution          Frequency Distribution, Contingency Table
                 Central Tendency      Mode

ORDINAL          Distribution          Frequency Distribution, Contingency Table, Scatterplot
                 Central Tendency      Mode, Median

RATIO/INTERVAL   Distribution          Frequency Distribution, Contingency Table, Scatterplot
                 Central Tendency      Mode, Median, Mean
                 Variability           Range, Variance, Standard Deviation
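
The central tendency rows of this table can be expressed as a small lookup: the higher the level of measurement, the more measures are meaningful. This is a minimal sketch; the function name `central_tendency` and all three data sets are hypothetical, invented for illustration.

```python
import statistics

def central_tendency(values, level):
    """Return only the central-tendency measures suited to the level of measurement."""
    level = level.lower()
    if level not in ("nominal", "ordinal", "interval", "ratio"):
        raise ValueError(f"unknown level of measurement: {level}")
    result = {"mode": statistics.mode(values)}          # valid at every level
    if level in ("ordinal", "interval", "ratio"):
        result["median"] = statistics.median(values)    # needs rank order
    if level in ("interval", "ratio"):
        result["mean"] = statistics.mean(values)        # needs equal intervals
    return result

# Hypothetical data, invented for illustration.
blood_type_codes = [1, 2, 2, 3, 2, 4, 1]        # nominal: codes are labels only
pain_ratings = [1, 3, 2, 3, 3, 2, 1]            # ordinal: ranked categories
body_temps_c = [36.8, 37.0, 37.1, 36.9, 37.0]   # interval: degrees Centigrade

print(central_tendency(blood_type_codes, "nominal"))
print(central_tendency(pain_ratings, "ordinal"))
print(central_tendency(body_temps_c, "interval"))
```

A mean of the blood type codes could be computed mechanically, but it would carry no quantitative meaning, which is why the function leaves it out.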
Inferential statistics
• Based on the law of probability
• It provides a means for drawing
conclusions about a population, given
data from a sample
• It estimates population parameters
from sample statistics
Inferential Statistics
Statistical Inference consists of two
techniques:
1. Estimation of parameters
2. Hypothesis testing
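
Estimation of parameters can be illustrated with an interval estimate of a population mean. This is a minimal sketch under stated assumptions: the blood pressure readings are hypothetical, and the interval uses a normal approximation, which is rough for a sample this small (a t-based interval would be somewhat wider).

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
sample = [118, 125, 132, 121, 140, 128, 135, 119, 127, 131, 124, 138]

n = len(sample)
sample_mean = statistics.mean(sample)                     # point estimate
standard_error = statistics.stdev(sample) / math.sqrt(n)  # spread of the sample mean

z = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"mean = {sample_mean:.1f}, 95% interval = ({lower:.1f}, {upper:.1f})")
```

The sample mean is the single best guess at the population mean; the interval expresses how far off that guess may plausibly be.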
Hypothesis Testing
Statistical hypothesis testing provides
objective criteria for deciding whether
hypotheses are supported by empirical
evidence.
• It is a process of disproof or rejection.
• Researchers seek to reject the null
hypothesis through various statistical
tests.
• Hypothesis testing uses samples to draw
conclusions about relationships within
the population.
Type I and Type II Errors
Type I Error – researchers make a type I
error when a true null hypothesis is
rejected.

Type II Error – researchers make a type
II error when a false null hypothesis is
accepted.
Level of Significance
This refers to the risk of making a
type I error in a statistical analysis.
The value selected beforehand
signifies the risk or the probability of
rejecting a true null
hypothesis.
The two most frequently used
significance levels (referred to as alpha
or α) are:
.05
.01
Level of Significance
• With a .05 significance level, we are
accepting the risk that out of 100 samples
drawn from a population, a true null
hypothesis would be rejected only 5 times.

• With a .01 level of significance, the risk of
a type I error is lower: in only 1 sample out
of 100 would we erroneously reject the
null hypothesis.
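
The meaning of the significance level can be checked by simulation: draw many samples from a population in which the null hypothesis is true and count how often it is rejected. This is a minimal sketch; the population mean, standard deviation, sample size, and the choice of a two-tailed z test with a known standard deviation are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

ALPHA = 0.05
MU_0, SIGMA, N = 100.0, 15.0, 25   # hypothetical population and sample size
critical_z = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed cutoff

def rejects_null(sample):
    """Two-tailed z test of 'population mean = MU_0' with known SIGMA."""
    z = (mean(sample) - MU_0) / (SIGMA / math.sqrt(N))
    return abs(z) > critical_z

trials = 10_000
rejections = sum(
    rejects_null([random.gauss(MU_0, SIGMA) for _ in range(N)])
    for _ in range(trials)
)
type_i_rate = rejections / trials

print(f"true null rejected in {type_i_rate:.1%} of samples")
```

Every rejection in this simulation is a type I error, because the samples really do come from a population with mean `MU_0`; the rejection rate should land close to the chosen alpha.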
Critical Region
This refers to the area in the
sampling distribution representing
values that are “improbable” if the
null hypothesis is true.
It is defined by the level of
significance
Statistical Tests
Two-tailed test – both ends or tails of
the sampling distribution are used to
determine improbable values.
One-tailed test – the critical region of
improbable values lies entirely in one tail
of the distribution: the tail
corresponding to the direction of the
hypothesis.
[Figure: an example of the critical regions of a two-tailed test]
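
The boundaries of the critical region follow from the level of significance and the number of tails. This is a minimal sketch, assuming a test statistic that follows the standard normal distribution (a z test) and, for the one-tailed case, a hypothesis in the upper direction; the function name `critical_values` is illustrative.

```python
from statistics import NormalDist

def critical_values(alpha, tails):
    """Cutoffs of the critical region for a z test at the given alpha."""
    z = NormalDist()
    if tails == 2:
        # alpha is split evenly between the two tails
        cutoff = z.inv_cdf(1 - alpha / 2)
        return (-cutoff, cutoff)
    if tails == 1:
        # all of alpha sits in the upper tail
        return (z.inv_cdf(1 - alpha),)
    raise ValueError("tails must be 1 or 2")

two_tailed = critical_values(0.05, tails=2)
one_tailed = critical_values(0.05, tails=1)

print(two_tailed, one_tailed)
```

At the same alpha, the one-tailed cutoff is less extreme than the two-tailed cutoffs, because the whole region of improbable values is placed in a single tail.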
Types of Statistical Tests
• Parametric Tests – a class of inferential
statistical tests that involve:
a. Assumptions about the distribution of
the variables
b. The estimation of a parameter
c. The use of interval or ratio measures.
Statistical Tests
• Non-parametric Tests –statistical tests
that do not estimate parameters
- also called distribution-free statistics.
Steps in Hypothesis testing
1. State the alternative hypothesis
2. State the null hypothesis
3. Establish the level of significance
4. Select a one-tailed or two-tailed test
5. Compute a test statistic
6. Calculate the degrees of freedom
7. Obtain a tabled value for the statistical test
8. Compare the test statistic with the tabled
value.
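
The steps above can be sketched for one common case, a one-sample t test. Everything specific here is an assumption made for illustration: the scores are hypothetical, the null hypothesis is that the population mean is 20, and the tabled value 2.262 is the standard two-tailed t value for alpha = .05 with 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical quiz scores, invented for illustration.
scores = [22, 25, 19, 24, 27, 21, 26, 23, 28, 25]

# Steps 1-2: alternative: population mean differs from 20; null: it equals 20.
MU_0 = 20
# Step 3: level of significance.
ALPHA = 0.05
# Step 4: two-tailed test, since the alternative has no direction.

# Step 5: compute the test statistic.
n = len(scores)
standard_error = statistics.stdev(scores) / math.sqrt(n)
t_statistic = (statistics.mean(scores) - MU_0) / standard_error

# Step 6: degrees of freedom.
degrees_of_freedom = n - 1

# Step 7: tabled t value for a two-tailed test, alpha = .05, df = 9.
TABLED_T = 2.262

# Step 8: compare the test statistic with the tabled value.
reject_null = abs(t_statistic) > TABLED_T

print(f"t = {t_statistic:.2f}, df = {degrees_of_freedom}, reject null: {reject_null}")
```

The tabled value is tied to this alpha, this number of tails, and these degrees of freedom; a different sample size would require looking up a different value.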
The Decision Matrix
In reality:
• Null true, alternative false
– There is no real program effect
– There is no difference, gain
– Our theory is wrong
• Null false, alternative true
– There is a real program effect
– There is a difference, gain
– Our theory is correct

What we conclude:
• Accept null, reject alternative. We say...
– There is no real program effect
– There is no difference, gain
– Our theory is wrong

When in reality the null is true: 1-α, THE CONFIDENCE LEVEL
The odds of saying there is no effect or gain when in fact there is none.
# of times out of 100 when there is no effect, we'll say there is none.

When in reality the null is false: β, TYPE II ERROR
The odds of saying there is no effect or gain when in fact there is one.
# of times out of 100 when there is an effect, we'll say there is none.

• Reject null, accept alternative. We say...
– There is a real program effect
– There is a difference, gain
– Our theory is correct

When in reality the null is true: α, TYPE I ERROR
The odds of saying there is an effect or gain when in fact there is none.
# of times out of 100 when there is no effect, we'll say there is one.

When in reality the null is false: 1-β, POWER
The odds of saying there is an effect or gain when in fact there is one.
# of times out of 100 when there is an effect, we'll say there is one.
Decision Matrix
If you try to increase power, you increase
the chance of winding up in the bottom
row and of Type I error.

If you try to decrease Type I errors, you
increase the chance of winding up in
the top row and of Type II error.
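
This trade-off can be made concrete by computing power at two significance levels while holding everything else fixed. This is a minimal sketch, assuming a one-tailed z test and a hypothetical true effect equal to 2 standard errors; the function name `power_one_tailed` and the parameter `effect_in_se` are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()

def power_one_tailed(alpha, effect_in_se):
    """Power (1 - beta) of an upper-tailed z test.

    effect_in_se is the true effect expressed in standard errors (assumed known).
    """
    critical = Z.inv_cdf(1 - alpha)           # cutoff set by alpha
    return 1 - Z.cdf(critical - effect_in_se)  # chance the statistic exceeds it

for alpha in (0.05, 0.01):
    power = power_one_tailed(alpha, effect_in_se=2.0)
    print(f"alpha = {alpha}: power = {power:.2f}, beta = {1 - power:.2f}")
```

Lowering alpha from .05 to .01 moves the cutoff further into the tail, so a real effect of the same size is detected less often: power falls and the type II error rate rises. When the true effect is zero, the same function returns alpha itself, the type I error rate.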
