Ph.D. in Nursing
Research Methods in Nursing
Module No: 7 DATA PREPARATION
Name of the subtopic
7.3 Data Verification and Qualitative vs. Quantitative Data Analysis Techniques
Faculty Name
Date:
Subject Code:
School of Nursing
Learning Objectives
At the end of this module, the students will be able to:
Understand the meaning of data verification
List the errors in data
Explain the steps in data verification
Describe the importance of data cleaning and maintaining consistency
Discuss the data verification procedures using WinDEM
Describe how to get data ready for analysis
Explain how to select a data analysis strategy
Explain the quantitative and qualitative analytical techniques
2
List of contents
Introduction
Meaning of data verification
Errors in data
Steps in data verification
Importance of data cleaning and maintaining consistency
Data verification procedures using WinDEM
Getting ready for analysis
Selecting a Data Analysis Strategy
Quantitative and qualitative analytical techniques
Summary
References
3
Introduction
Nursing science is the body of knowledge that supports evidence-based practice.
The format of the data after the data entry and data verification processes have been completed is often not the best format for use in data analyses. In order to manipulate, analyze, and report the information collected in a convenient and efficient way, the data need to be organized in a database system. Such a database system is a structured aggregation of data elements which can be related and linked to each other through specified relationships. The data elements can then be accessed through specified criteria and through a set of transaction operations, which are usually implemented through a data-retrieval language.
4
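As a minimal illustration of such linked storage and a data-retrieval language, the following Python sketch uses SQLite; all table names, column names, and values here are hypothetical, chosen only for demonstration:

```python
import sqlite3

# Two related tables linked through a specified relationship (nurse_id),
# accessed through specified criteria via SQL. Names are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nurses (nurse_id INTEGER PRIMARY KEY, ward TEXT)")
con.execute("CREATE TABLE responses (resp_id INTEGER PRIMARY KEY, "
            "nurse_id INTEGER REFERENCES nurses(nurse_id), score INTEGER)")
con.execute("INSERT INTO nurses VALUES (1, 'ICU'), (2, 'Oncology')")
con.execute("INSERT INTO responses VALUES (10, 1, 4), (11, 1, 5), (12, 2, 3)")

# Retrieve linked data elements: mean score per ward.
for row in con.execute("SELECT n.ward, AVG(r.score) FROM nurses n "
                       "JOIN responses r ON n.nurse_id = r.nurse_id "
                       "GROUP BY n.ward"):
    print(row)
```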
Meaning of Data verification
A critical step in the management of survey data is the
verification of the data. It must be ensured that the data are
consistent and conform to the definitions in the codebook and
are ready for analytical use.
5
Causes of data errors
Whenever data are collected, errors are almost inevitably introduced from various sources. Substantial delays occur when these errors are not anticipated and safeguards are not built into the procedures.
Errors in the data may be caused:
(a) by faulty preparation of the field monitoring and survey
tracking instruments,
(b) by the assignment of incorrect identifications to the
respondents,
(c) during the field administration,
(d) during the preparation of the instruments including the
coding of the responses, and
(e) during transcription of the data.
6
Causes of data errors-contd
Questions may have been skipped or answered in a way not intended because of misunderstandings during translation and/or ambiguities in the question. Such deviations can result in:
(a) variables which have been skipped but which should have
been included; or, variables which have been included when
they should not have been (e.g. when they were misprinted);
(b) incorrectly coded variables;
(c) variables which have a content different from that specified
by the codebook.
7
Causes of data errors-contd
The amount of work involved in resolving these problems, often called "data cleaning", can be greatly reduced by using well-designed instruments, qualified field administration and coding personnel, and appropriate transcription mechanisms.
8
Data verification steps
The steps that must be undertaken to verify the data are
implied in the quality standards that have been defined for the
corresponding survey. Procedures must be implemented for
checking invalid, incorrect and inconsistent data, which may
range from simple deterministic univariate range checks to
multivariate contingency tests between different variables and
different respondents.
9
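As a hedged illustration of these two kinds of checks, the following Python sketch applies a univariate range check and a two-variable contingency check to invented records; the variables and limits are hypothetical, not taken from any particular codebook:

```python
# Hypothetical records with an ID and two related variables.
records = [
    {"id": 1, "age": 34, "years_in_nursing": 10},
    {"id": 2, "age": 25, "years_in_nursing": 30},   # inconsistent
    {"id": 3, "age": 150, "years_in_nursing": 5},   # out of range
]

for rec in records:
    # Simple deterministic univariate range check.
    if not (18 <= rec["age"] <= 100):
        print(f"Record {rec['id']}: age {rec['age']} out of range")
    # Contingency check between two variables (assumes a nursing
    # career cannot start before roughly age 17).
    if rec["years_in_nursing"] > rec["age"] - 17:
        print(f"Record {rec['id']}: experience inconsistent with age")
```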
Data verification steps
The criteria on which these checks are based depend, on the one hand, on the variable type (i.e. different checks may apply to data variables, control variables, and identification variables) and, on the other hand, on the manner and sequence in which questions are asked.
For some questions a certain number of responses are required,
or responses must be given in a special way, due to a
dependency or logical relationship between questions.
11
Data verification steps
Depending on the data collection procedures used, it must be
defined when and at what stage data verification steps are
implemented.
For example, some range validation checks may be applied
during data entry whereas more complex checks for the
internal consistency of data or for outliers may be more
appropriately applied after the completion of the data entry.
Problems that have been detected through verification
procedures need to be resolved efficiently and quickly.
12
Data verification steps-(validation check)
Substruction: Operational System
Pain Intensity
  Instrument: VAS, 10 cm scale (low to high pain)
  Scale: continuous or discrete?
Functional Status
  Instrument: 1-5 Likert scale, 1 = low & 5 = high function
  Scale: continuous or discrete?
13
Data verification steps-(validation check)
Scaling
Discrete: non-parametric (Chi-square)
  Nominal: e.g. gender
  Ordinal: e.g. low, medium, high income
Continuous: parametric (t or F tests)
  Interval: e.g. Likert scale, 1-5 functionality
  Ratio: e.g. money, age, blood pressure
14
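The pairing of scale type with test family can be illustrated with a short sketch using scipy; the data below are invented for demonstration only:

```python
from scipy import stats

# Discrete (nominal) data -> non-parametric chi-square test.
# Hypothetical 2x2 table: gender vs. preference for a care model.
observed = [[30, 20], [25, 25]]
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"Chi-square: chi2={chi2:.2f}, p={p:.3f}")

# Continuous (interval/ratio) data -> parametric t test.
# Hypothetical pain scores (VAS, 0-10 cm) in two groups.
group_a = [3.1, 4.2, 2.8, 5.0, 3.9]
group_b = [5.5, 6.1, 4.9, 6.8, 5.7]
t, p = stats.ttest_ind(group_a, group_b)
print(f"t test: t={t:.2f}, p={p:.3f}")
```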
Rigor in Quantitative Research
Theoretical grounding: axioms & postulates – substruction – validity of hypothesized relationships
Design validity (internal & external) of the research design
Instrument validity and reliability
Statistical assumptions met (scaling, normal curve, linear relationship, etc.)
(Note: Polit & Beck: reliability, validity, generalizability, objectivity)
15
Data verification steps-(validation check)
Design Validity
Statistical conclusion validity
Construct validity of Cause & Effect (X & Y)
Internal validity
External validity
16
Data verification steps-(validation check)
Design Validity
Statistical Conclusion Validity (is the correlation r_xy real?)
Type I error (alpha, e.g. 0.05)
Type II error (beta); Power = 1 - beta; inadequate power, i.e. low sample size
Reliability of measures
Can you trust the statistical findings?
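As a sketch of how power, alpha, and sample size relate, the calculation below uses statsmodels with a hypothetical medium effect size (Cohen's d = 0.5):

```python
from statsmodels.stats.power import TTestIndPower

# Power = 1 - beta. With alpha = 0.05 and a hypothetical medium
# effect size (d = 0.5), find the sample size per group needed
# for power = 0.80.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required n per group: {n_per_group:.0f}")   # roughly 64

# Conversely, a low sample size (n = 20 per group) yields inadequate power.
power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=20)
print(f"Power with n = 20 per group: {power:.2f}")  # roughly 0.34
```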
Data verification steps-(validation check)
Design Validity
Construct Validity of Putative Cause & Effect (X → Y?)
Theoretical basis linking constructs and concepts
(substruction)
Outcomes sensitive to nursing care
Link intervention with outcome theoretically
Is there any theoretical rationale for why X and Y should be
related?
Data verification steps-(validation check)
Design Validity
Internal Validity
Threat of history (intervening event)
Threat of maturation (developmental change)
Threat of testing (instrument causes an effect)
Threat of instrumentation (reliability of measure)
Threat of mortality (subject drop out)
Threat of selection bias (poor selection of subjects)
Are any Z variables causing the observed changes in Y?
Data verification steps-(validation check)
Design Validity
External Validity
Threat of low generalizability to people, places, & time
Can we generalize to others?
Data verification steps-(validation check)
Building Knowledge
Goal is to have confidence in our descriptive, correlational, and
causal data.
Rigor means following the required techniques and strategies for increasing our trust and confidence in the research findings.
Data verification -contd
Some problems can, using certain assumptions, be resolved
automatically on the basis of cross-checks in the datafiles.
Other problems will require further inspection and manual resolution. Where problems cannot be resolved, the problematic data values must be recoded to special missing codes.
The criteria on which the checking is based depend, on the one hand, on the type of variables used to code the information (for example, different criteria apply to data variables,
22
Data verification -contd
identification variables, control variables, filter variables, and dependent variables) and, on the other hand, on the way and sequence in which questions are asked (for example, for some questions a certain number of responses are required, or responses must be given in a special way, or there is a dependency or a logical relationship between certain questions).
23
Data verification steps
Usually, data verification is undertaken through a series of
steps, which for each survey need to be established and
sequenced in accordance with the quality requirements, the
type of data collected, and the field operations and data entry
procedures applied.
24
Data verification steps-contd
Common data verification steps are:
• the verification of returned instruments for completeness and
correctness;
• the verification of identification codes against field
monitoring and survey tracking instruments;
• a check for the file integrity, in particular, the verification of
the structure of the datafiles against the specifications in the
codebook;
25
Data verification steps-contd
• the verification of the identification codes for internal
consistency;
• the verification of the data variables for each student or
teacher against the validation criteria specified in the
codebook;
• the verification of the data variables for each student and
teacher against certain control variables in the datafiles;
26
Data verification steps-contd
• the verification of the data obtained for each respondent for internal consistency (for example, where questions were coded through split variables, the answers to these can be cross-validated);
• the cross-validation of related data between respondents;
• the verification of linkages between related datafiles,
especially in the case of hierarchically structured data; and
• the verification of the handling of missing data.
27
Data verification steps-contd
1. Verification of file integrity
As a first step it needs to be ensured that the overall structure
of the datafiles conforms to the specifications in the codebook.
For example, if raw datafiles are used, then each record should
have the same length in correspondence with the codebook.
Often it is useful to introduce column-control variables into the codebook at regular intervals, which the coder should code with blanks.
28
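A minimal sketch of such a file-integrity check in Python, assuming a fixed-length raw datafile named survey_raw.dat and hypothetical codebook values (record length 80, column-control positions at columns 20 and 40):

```python
# Hypothetical codebook specifications.
RECORD_LENGTH = 80          # every record must have this length
CHECK_COLUMNS = [20, 40]    # column-control positions coded as blanks

with open("survey_raw.dat") as f:       # hypothetical raw datafile
    for lineno, record in enumerate(f, start=1):
        record = record.rstrip("\n")
        if len(record) != RECORD_LENGTH:
            print(f"Record {lineno}: length {len(record)}, "
                  f"expected {RECORD_LENGTH}")
        for col in CHECK_COLUMNS:
            # Columns are 1-based in the codebook, 0-based in Python.
            if len(record) >= col and record[col - 1] != " ":
                print(f"Record {lineno}: column-control at {col} "
                      f"is not blank (possible column shift)")
```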
Data verification steps-contd
Special recodings
Sometimes it is necessary to recode certain variables before they can be used in the data analyses. Examples of such situations are:
• The sequence of questions or items has been changed for some reason and is no longer in accordance with the codebook;
• A question or item may have been asked of some respondents in a different way or format than was intended;
• A question or item has not been asked of some respondents but the missing information can be derived from other variables.
29
Data verification steps-contd
Value validation
Background questions and test items to which a fixed set of codes rather than open-ended values applies need to be checked against the range validation criteria defined in the codebook. Variables with open-ended values (e.g. "Student age") need to be checked against theoretical ranges.
30
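A hedged sketch of value validation in Python; the codebook entries, code sets, and theoretical ranges below are hypothetical:

```python
# Hypothetical codebook: fixed code sets and theoretical ranges.
codebook = {
    "gender":  {"valid": {1, 2, 9}},   # 9 = missing code
    "stu_age": {"range": (5, 25)},     # theoretical range
}

def validate(variable, value):
    """Check a value against the codebook's validation criteria."""
    spec = codebook[variable]
    if "valid" in spec and value not in spec["valid"]:
        return f"{variable}={value}: not in valid code set"
    if "range" in spec and not (spec["range"][0] <= value <= spec["range"][1]):
        return f"{variable}={value}: outside theoretical range"
    return None

for var, val in [("gender", 3), ("stu_age", 42), ("stu_age", 12)]:
    problem = validate(var, val)
    if problem:
        print(problem)
```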
Data verification steps-contd
Treatment of duplicate identification codes
Each datafile should be checked for duplicate identification codes. It is often useful to distinguish between the following two cases:
• Respondents who have identical identification codes but different values for a number of key data variables. These respondents have probably been assigned invalid identification codes.
• Respondents who have identical identification codes and at the same time identical data values for the key data variables. These respondents are most likely duplicate entries, in which case the second occurrence of the respondent's data should be removed from the datafiles.
31
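The two cases can be separated programmatically; the following pandas sketch uses invented IDs and key variables:

```python
import pandas as pd

# Hypothetical datafile: an ID column plus key data variables.
df = pd.DataFrame({
    "id":    [101, 102, 102, 103, 103],
    "age":   [30, 25, 25, 41, 39],
    "score": [4, 5, 5, 3, 2],
})
key_vars = ["id", "age", "score"]

# Case 2 above: identical ID *and* identical key values -> most likely
# a duplicate entry; the second occurrence can be removed.
duplicate_entries = df[df.duplicated(subset=key_vars, keep="first")]
df = df.drop(duplicate_entries.index)

# Case 1 above: identical ID but different key values -> probably an
# invalid identification code; flag for manual resolution.
conflicting = df[df.duplicated(subset="id", keep=False)]
print("Removed duplicate entries:\n", duplicate_entries)
print("Conflicting IDs to resolve manually:\n", conflicting)
```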
Data verification steps-contd
Internal validation of a hierarchical identification system
If a hierarchical identification system is used for identifying
respondents at different levels, then the structure of this system
can be verified for internal consistency. The number of errors in identification codes, which are often a serious threat to the use of the data, can thus be dramatically reduced. Often inconsistencies can then be resolved automatically or even avoided during the entry of data.
32
Data verification steps-contd
Verification of the linkages between datafiles
Often data are collected at different levels; for example, data are collected for students and for the teachers who teach the classes in which the students are enrolled. It is then important to verify the linkages between such levels of data collection.
33
Data verification steps-contd
• There may be cases where a teacher in the teacher datafile has no students linked to him or her in the student datafile (that is, for a certain Teacher ID in the teacher datafile there is no matching Teacher ID in the student datafile);
• There may be cases in the student datafile which do not have a match in the teacher datafile even though they are associated with a Teacher ID; and
• There may be cases where the Class IDs of the students linked to a teacher differ from the Class ID of this teacher in the teacher datafile.
34
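These linkage checks can be expressed as simple anti-joins; a hypothetical pandas sketch (the IDs below are invented):

```python
import pandas as pd

# Hypothetical hierarchical datafiles linked by teacher_id.
teachers = pd.DataFrame({"teacher_id": [1, 2, 3], "class_id": [10, 20, 30]})
students = pd.DataFrame({
    "student_id": [100, 101, 102],
    "teacher_id": [1, 1, 4],        # 4 has no match in the teacher file
    "class_id":   [10, 11, 40],     # 11 differs from the teacher's class
})

# Teachers with no linked students in the student datafile.
unlinked = teachers[~teachers.teacher_id.isin(students.teacher_id)]
print("Teachers without students:\n", unlinked)

# Students whose Teacher ID has no match in the teacher datafile.
orphans = students[~students.teacher_id.isin(teachers.teacher_id)]
print("Students with unmatched Teacher IDs:\n", orphans)

# Students whose Class ID differs from their teacher's Class ID.
merged = students.merge(teachers, on="teacher_id", suffixes=("_stu", "_tch"))
mismatch = merged[merged.class_id_stu != merged.class_id_tch]
print("Class ID mismatches:\n", mismatch)
```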
Data verification steps-contd
Verification of participation indicator variables against data variables: for example, in the first situation we would base the student score only on the answers given in the first testing session and exclude the items in the second testing session from scoring, whereas in the second situation we would score all items in the second testing session as wrong.
To allow this verification, the datafile should, besides the variables with the questions and test items, also contain information about the participation status of the respondents in the different
35
Data verification steps-contd
testing sessions. It is often useful to group the variables in the codebook into blocks. Each of these blocks can then begin with a variable whose code indicates the participation of the student in the respective testing session. It can then be verified whether the participation indicator variables match the data in the corresponding data variables.
36
Data verification steps-contd
Verification of exclusions of respondents
Checking for inconsistencies in the data
37
Data verification procedures using WinDEM
WinDEM offers some simple data verification procedures. All
the procedures are found under the “Verify” menu. The
following section describes some of the fundamental ones:
38
Data verification procedures using WinDEM
1. Unique ID check
There must be only one record within a file for each unit of
analysis surveyed. This verification procedure checks whether each record has been allocated a unique identification code.
2. Column shift check
When a series of similar variables exists in a file, it is possible that the enterer skips a variable or enters a variable twice, so that a column shift occurs. This can be avoided if you introduce variables into the datafile at regular positions in the codebook, into which the data entry personnel must enter a blank value.
39
Data verification procedures using WinDEM-contd
2. Column shift check-contd
In order to be recognized by the automatic checking routines of the WinDEM program, the names of these variables must have the prefix "CHECK". A column shift should not occur if the data enterers follow these directions for entering the blank values. You can also verify that the data entry proceeded correctly by looking at the "Table entry" from the "View" menu.
40
Data verification procedures using WinDEM-contd
3. Validation criteria check
As mentioned before, WinDEM ensures that the values are within the range specified in the structure file unless the data enterer explicitly confirms the out-of-range values entered. This validation criteria check will show all the variables of all the cases that have been "confirmed" to contain out-of-range values. This can be especially useful when many data enterers are involved in the survey study.
41
Data verification procedures using WinDEM-contd
Merge check
WinDEM allows you to check the consistency between
variables. This check detects records in a datafile that do not
have matches in a related datafile for a higher level of data
aggregation.
42
Data verification procedures using WinDEM-contd
Using “Merge check” from the “Verify” menu, you can select
the variables (or variable combinations) by which the records
in the selected data file are matched against the records in the
higher-level aggregated data file. The software will ask you to
specify the datafile against which to check the merge of the
current datafile in the “File Open Dialog”.
The program will notify you if any errors are found and will ask whether you want to open the data verification report.
43
Data verification procedures using WinDEM-contd
Double coding check
In order to produce high-quality data, it is sometimes recommended to enter the data twice on two different computers (requiring two different data enterers). This allows you to examine whether the two files have exactly the same structure and values on all records. In order to check this, however, you will have to have these two files under different names and the two corresponding codebooks on one computer.
44
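A minimal sketch of comparing two independently entered files with pandas; the file names are hypothetical, and DataFrame.compare requires pandas 1.1 or newer:

```python
import pandas as pd

# Hypothetical: the same instruments entered independently by two
# data enterers into two files with identical codebooks.
entry1 = pd.read_csv("entry_a.csv").sort_values("id").reset_index(drop=True)
entry2 = pd.read_csv("entry_b.csv").sort_values("id").reset_index(drop=True)

# Structures must match exactly before values can be compared.
assert list(entry1.columns) == list(entry2.columns), "Structures differ"
assert len(entry1) == len(entry2), "Record counts differ"

# Report every cell where the two entries disagree.
diffs = entry1.compare(entry2)
if diffs.empty:
    print("Files are identical: double entry verified")
else:
    print("Discrepancies to resolve against the source instruments:")
    print(diffs)
```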
GETTING DATA READY FOR ANALYSIS
After data are obtained through questionnaires, interviews, observation, or secondary sources, they need to be edited. Blank responses, if any, have to be handled in some way, the data coded, and a categorization scheme set up. The data will then have to be keyed in, and some software program used to analyze them.
45
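As a hedged example of handling blank responses and coding, the following pandas sketch recodes blanks to an explicit missing code; the variable name and code values are invented:

```python
import pandas as pd

# Hypothetical sketch: recode blank responses to an explicit missing
# code and map text categories to numeric codes before analysis.
df = pd.DataFrame({"satisfaction": ["high", "", "low", "medium", ""]})

codes = {"low": 1, "medium": 2, "high": 3}
df["satisfaction_code"] = (
    df["satisfaction"].map(codes).fillna(9).astype(int)  # 9 = missing code
)
print(df)
```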
Selecting a Data Analysis Strategy
The strategy is shaped by:
• Earlier steps (1, 2, 3) of the research process
• Known characteristics of the data
• Properties of statistical techniques
• Background & philosophy of the researcher
→ Data analysis strategy
Quantitative techniques of data analysis
A Classification of Univariate Techniques
Univariate Techniques
• Metric Data
  - One Sample: t test, Z test
  - Two or More Samples
    - Independent: two-group t test, Z test, one-way ANOVA
    - Related: paired t test
• Non-numeric Data
  - One Sample: frequency, Chi-square, K-S (Kolmogorov-Smirnov), runs, binomial
  - Two or More Samples
    - Independent: Chi-square, Mann-Whitney, median, K-S, K-W (Kruskal-Wallis) ANOVA
    - Related: sign, Wilcoxon, McNemar, Chi-square
A Classification of Multivariate Techniques
Multivariate Techniques
• Dependence Techniques
  - One Dependent Variable: cross-tabulation, analysis of variance and covariance, multiple regression, conjoint analysis
  - More Than One Dependent Variable: multivariate analysis of variance and covariance, canonical correlation, multiple discriminant analysis
• Interdependence Techniques
  - Variable Interdependence: factor analysis
  - Interobject Similarity: cluster analysis, multidimensional scaling
Types of Statistical Analysis Used in Research
Descriptive Analysis:
Mean, Mode, Median and Standard deviation.
Inferential Analysis:
Hypothesis testing and estimation of true population values.
Differences Analysis:
Determination of whether significant differences exist in the population.
Associative Analysis:
Investigation of how two or more variables are related.
Predictive Analysis:
Used to enhance the researcher's prediction capabilities, e.g. regression analysis.
Understanding Data Via Descriptive Analysis
Measure of Central Tendency:
Mode:
The value occurring most frequently in a set of values.
Median:
The value in the middle of an ordered set of values.
Mean:
Arithmetic average of a set of numbers.
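These three measures can be computed directly with Python's standard statistics module; the values below are invented:

```python
import statistics

values = [2, 3, 3, 4, 5, 5, 5, 7]            # hypothetical scores

print("Mode:  ", statistics.mode(values))    # most frequent value -> 5
print("Median:", statistics.median(values))  # middle of ordered set -> 4.5
print("Mean:  ", statistics.mean(values))    # arithmetic average -> 4.25
```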
Understanding Data Via Descriptive Analysis (cont.)
Measure of Variability:
Frequency Distribution:
The number of times each different value appears.
Range:
The distance between the lowest and the highest value in an ordered set of values.
Standard Deviation:
Indicates the degree of variation or diversity in the values, interpretable against a normal bell-shaped distribution.
Understanding the Data Via Descriptive Statistics (cont.)
Other Descriptive Measures:
Measure of Skewness:
Reveals the degree and direction of asymmetry in a distribution. A value of 0 indicates a symmetric distribution, a negative value indicates a distribution with a tail to the left, and a positive value indicates a distribution with a tail to the right.
Kurtosis:
How pointed or peaked a distribution appears. A value of 0 indicates a bell-shaped distribution, a negative value indicates a flatter distribution, and a positive value indicates a distribution more peaked than the bell-shaped curve.
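Both measures are available in scipy; a short sketch on invented, right-tailed data:

```python
from scipy import stats

values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]    # hypothetical, right-tailed

# Positive skewness -> tail to the right; 0 -> symmetric.
print("Skewness:", round(stats.skew(values), 2))
# Fisher kurtosis: 0 -> bell shaped, negative -> flatter,
# positive -> more peaked than the normal curve.
print("Kurtosis:", round(stats.kurtosis(values), 2))
```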
Other Quantitative techniques of data analysis
Programming techniques:
a) Linear programming
b) Inventory control
c) Game theory
d) Network analysis
e) Non-linear programming
Limitations of quantitative techniques
Inherent limitations arise from expressing problems in mathematical form.
High costs are involved in the use of quantitative techniques.
Quantitative techniques do not take into consideration intangible factors, i.e. non-measurable human factors.
Quantitative techniques are just tools of analysis and not the complete decision-making process.
Qualitative data analysis techniques
The qualitative researcher, however, has no system for pre-coding; therefore, a method of identifying and labelling or coding data needs to be developed that is bespoke to each research project. This method is called content analysis.
Content analysis can be used when qualitative data has been
collected through:
Interviews
Focus groups
Observation
Documentary analysis
56
Qualitative data analysis techniques
The analysis of qualitative research aims to uncover and/or understand the big picture, using the data to describe the phenomenon and what it means. Both qualitative and quantitative analysis involve labelling and coding all of the data so that similarities and differences can be recognised. Responses from even an unstructured qualitative interview can be entered into a computer to be coded, counted and analysed.
57
Qualitative data analysis techniques
Content analysis is '...a procedure for the categorisation of
verbal or behavioural data, for purposes of classification,
summarisation and tabulation.'
The content can be analysed on two levels:
Basic level or the manifest level: a descriptive account of the
data i.e. this is what was said, but no comments or theories as
to why or how
Higher level or latent level of analysis: a more interpretive
analysis that is concerned with the response as well as what
may have been inferred or implied
58
Qualitative data analysis techniques
Content analysis involves coding and classifying data, also referred to as categorising and indexing, and the aim of content analysis is to make sense of the data collected and to highlight the important messages, features or findings.
59
Six Steps in Qualitative Data Analysis
1. Assign codes to the notes.
2. Note personal reflections in the margin.
3. Sort and sift the notes to identify similar and different
relationships between patterns.
Six Steps in Qualitative Data Analysis-contd
4. Identify these patterns, similarities and differences.
5. Elaborate a small set of generalizations that cover the
consistencies.
6. Examine those generalizations and form grounded theory.
Grounded Theory Analysis
Strategies
Grounded theory:
A process of constructing theory from data.
An inductive process of collecting, analyzing and comparing data systematically.
Theory is grounded in the data to explain the phenomena.
The main purpose is to develop theory through understanding concepts that are related by means of statements of relationships.
Grounded Theory Analysis
Strategies-contd
Recur by moving back and forth with the data: analyzing, collecting more data, and analyzing some more until reaching conclusions.
An iterative method of theory building by comparing and analyzing the data.
Grounded Theory Analysis
Strategies-contd
Three steps in the grounded theory analytic process:
1. Open coding:
Break data into small parts, compare for similarities and differences, and explain the meanings of the data by focusing on "who, when, where, what, how much, why" (ask questions to get a clear story).
Grounded Theory Analysis
Strategies-contd
2. Axial coding:
After open coding, make connections (sort) between categories and confirm or disconfirm your hypotheses.
3. Selective coding:
Select the core category (match hypotheses) and explain the
minor category (against hypotheses) with additional supporting
data.
Coding process:
Open coding → Axial coding → Selective coding
Interpretation Issues in Qualitative Data Analysis
A. Triangulating Data
Use multiple methods and data sources to support the strength of interpretations and conclusions
Ex) semi-structured interviews, consent form, grounded
theory
Interpretation Issues in Qualitative Data Analysis-contd
B. Audits
Questions to examine the data for interpretations and conclusions
1. Is sampling appropriate to ground the findings?
2. Are coding strategies applied correctly?
3. Is the category process appropriate?
4. Do the results link to the hypotheses? (examine the literature review)
5. Are the negative cases explained? (minority’s voice)
Suggestions
Four steps of negative case testing
1. Make a rough hypothesis
2. Conduct a thorough search
3. Discard or reformulate hypothesis
4. Examine all relevant cases
Interpretation Issues in Qualitative Data Analysis-contd
C. Cultural bias
Discuss cultural differences with different groups of
participants
To see whether divergence is based on culturally different
interpretations
Interpretation Issues in Qualitative Data Analysis-contd
D. Generalization
Statistical generalization is not appropriate for qualitative research.
Two perspectives for generalization:
1. Case-to-case translation (transferability):
provide thick description so findings can be applied to another setting
2. Analytic generalization:
from a particular set of results to a broader theory
Ex) use of deviant cases
Summary
This module explained data verification and the quantitative and qualitative techniques of data analysis. Their validity, strengths and weaknesses were discussed in detail. In the next module we will cover data analyses and parametric tests in detail.
72
References
Carol Leslie Macnee (2008). Understanding Nursing Research: Using Research in Evidence-based Practice. Lippincott Williams & Wilkins. ISBN 0781775582, 9780781775588.
[Link], [Link] (2013). Nursing Research: Principles and Methods, revised edition. Philadelphia: Lippincott.
[Link]methods/instrument-validity-reliability/
[Link]research-matters/validity/
[Link]
[Link]and_reliability.html
Thanks
Next Topic >> PARAMETRIC TEST
74