
BIOSTATISTICS AND EPIDEMIOLOGY

Data Collection, Processing, Analysis and Presentation

Data Sources

 Primary
o Survey
o Observation
o Experimental
 Secondary
o Existing Records
o Registry
o Census
o Hospitals, Clinics, Laboratories and Physician’s Offices
 Primary Data Sources
o Advantages:
 The investigator collects data specific to the problem under study.
 The investigator has first-hand knowledge of the quality of the data collected.
 If required, it may be possible to obtain additional data during the study period.
o Disadvantages:
 The investigator has to contend with all the hassles of data collection: deciding why, what, how,
and when to collect; getting the data collected (personally or through others); obtaining funding and
dealing with funding agencies; and addressing ethical considerations (consent, permissions, etc.).
 The investigator must ensure the data collected are of a high standard: all desired data are obtained
accurately and in the required format, there are no fake or fabricated data, and unnecessary or
useless data have not been included.
 Cost of obtaining the data is often the major expense in studies.
 Secondary Data Sources
o Advantages:
 No hassles of data collection
 It is less expensive
o Disadvantages:
 Data collected in one location may not be suitable for another because of differing environmental
factors.
 Data become obsolete with the passage of time.
 Secondary data can distort the results of the research. Special care is required to amend or
modify secondary data before use.

Data Collection

Methods of Data Collection

o Census
o Survey
o Observation
o Experiment
o Simulations
o Review of documents and records
 Survey
o To query (someone) in order to collect data for the analysis of some aspect of a group or area
(Merriam-Webster)
o Solicits information from people
o Steps:
 Interview
 Verbal communication between the researcher and the participant, during which
information is collected
 Types:
 UNSTRUCTURED
i. More conversational
ii. Allows flexibility in questioning
 STRUCTURED
i. Operates within a formal interview schedule
ii. The order of questions is set prior to the interview
 Key Informant Interview
 One-on-one interview with a point person
 Focus Group Discussion
 Small group of people interviewed at the same time to discuss specific topics under the
guidance of a moderator
 Questionnaire
 A series of questions designed to collect information
 Most common type of instrument used
 Typically filled out by participants
 Types:
i. Open-Ended
 Can elicit more detailed responses
 Responses require more effort to encode for data analysis
ii. Closed-Ended
 Easy to administer
 Uniform and pre-coded
 Can be encoded and analyzed in a short time
 Observation

o Naturalistic Observation
 Advantages:
 Good for observing specific subjects
 Provides ecologically valid recording of natural behavior
 Spontaneous behaviors are more likely to happen
 Disadvantages:
 Ethics: when consent is not obtained, details may be used which infringe confidentiality
o Structured Observation
 Advantages:
 Allows control of extraneous variables
 Reliability of results can be tested by repeating the study
 Provides a safe environment to study contentious concepts such as infant attachment
 Disadvantages:
 The implementation of controls may have an effect on behavior
 Lack of ecological validity
 Observer effect
 Observer bias
o Unstructured Observation
 Advantages:
 Gives a broad overview of a situation
 Useful where the situation or subject matter to be studied is unclear
 Disadvantages:
 Only really appropriate as a ‘first step’ to give an overview of a situation, concept or idea
o Participant Observation
 Advantages:
 Gives an ‘insider’s’ view
 Behaviors are less prone to misinterpretation because the researcher was a participant
 Opportunity for the researcher to become an ‘accepted’ part of the environment
 Disadvantages:
 Observer effect
 Possible lack of objectivity on the part of the observer
o Non-Participant Observation
 Advantages:
 Avoidance of observer effect
 Disadvantages:
 Observer is detached from the situation and so relies on their own perception, which may be
inaccurate

 Experiment
o Apply a treatment and observe the response
 Control group (a group receiving no treatment or a placebo)
 Used to compare the effectiveness of a treatment
 Simulation
o Uses a mathematical, physical, or computer model to replicate the conditions of a process or situation
o Frequently used when the actual situation is too expensive, dangerous, or impractical to replicate in real
life
 Review of Records
o Collection of data from existing records using an abstraction form
o Examples:
 Hospital or facility records
 Computer databases
 Government reports
 Census data
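
The simulation method described above can be illustrated with a small computer model. The sketch below is only an illustration under made-up assumptions: a 10% prevalence, a sample of 20 people, and the function name simulate_detection are all hypothetical. It estimates how often a random sample contains at least one case and compares the estimate with the exact probability.

```python
import random

def simulate_detection(prevalence, sample_size, trials, seed=0):
    """Estimate the chance that a random sample contains at least one case."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Each sampled person is a case with probability `prevalence`.
        if any(rng.random() < prevalence for _ in range(sample_size)):
            hits += 1
    return hits / trials

# Assumed, illustrative inputs (not taken from any real study).
estimate = simulate_detection(prevalence=0.10, sample_size=20, trials=20000)
exact = 1 - (1 - 0.10) ** 20  # closed-form answer for comparison
print(f"simulated: {estimate:.3f}, exact: {exact:.3f}")
```

Here the exact answer is easy to compute, which makes it possible to check the model; simulation becomes more useful when no such formula is available.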

ADVANTAGES & DISADVANTAGES

 Questionnaire
o Advantages:
 Can assess a large group quickly
 Easy to analyze if constructed correctly
o Disadvantages:
 Requires “good” (clear, well-chosen) language
 Social desirability bias
 Not very good at getting in-depth information
 Interview
o Advantages:
 Best when you want to know what people think, believe, or perceive
o Disadvantages:
 Recall bias
 Social desirability bias
 Review of Records
o Advantages:
 Relatively inexpensive
 Faster than collecting the original data again
o Disadvantages:
 Coding errors (missing or incomplete data)
 Data may not be exactly what is needed
 Difficulty in getting access
 Validity and reliability of the data need to be verified
 Observation
o Advantages:
 Relatively inexpensive
 Collects data on actual vs. self-reported behavior or perceptions
o Disadvantages:
 Hawthorne effect
 Observer bias
 Can be labor intensive

Qualities of Statistical Data

 Timeliness
o Data are up to date (current)
o Information is available on time
 Validity
o Data measure what they are intended to measure
 Reliability
o Data are measured and collected consistently according to standard definitions and methodologies
o Results are the same when measurements are repeated
 Completeness
o All data elements are included (as per the definition and methodologies specified)
 Precision
o Repeatability
o Data have sufficient detail
 Integrity
o Data can be relied on to draw valid conclusions
o Data are protected from deliberate bias or manipulation for political or personal reasons

Data Processing and Analysis

 Data Processing
o Systematic procedure to ensure that the information/data gathered are complete, consistent and
suitable for data analysis (1: Data Coding, 2: Data Encoding & 3: Data Editing)
o Data Coding
 Transforming collected information/observation into numbers (cohesive categories) which can
be more easily encoded, counted and tabulated
 Allows rapid storage of data
 Minimizes errors in encoding data
 Sometimes necessary so that the statistical software can perform various analyses on the data
 Guidelines: the number of codes must be kept to a minimum (preferably fewer than 8), and the codes
should be exhaustive and mutually exclusive
o Data Encoding
 Entering of data in a spreadsheet
 Use computer programs for encoding
o Data editing
 Inspection and correction of any errors or inconsistencies in the information collected
 Purpose:
 To make changes/corrections as early as possible
 To ensure completeness, consistency and legibility of data entries
 To prepare the data for analysis
 Data Analysis
o The process of evaluating data using analytical and statistical tools to discover useful information
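
The data coding and data editing steps described above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the codebook, the question it belongs to, and the code 9 for missing answers are all assumptions made for the example.

```python
# Hypothetical codebook for one closed-ended question (labels and codes are illustrative).
SMOKING_CODES = {"never": 0, "former": 1, "current": 2}
MISSING = 9  # assumed catch-all code for blank or unrecognized answers

def code_response(raw):
    """Data coding: map a raw answer to exactly one numeric code."""
    cleaned = (raw or "").strip().lower()
    # Every answer receives one code (exhaustive) and never more than one
    # (mutually exclusive), because unknown answers fall back to MISSING.
    return SMOKING_CODES.get(cleaned, MISSING)

def edit_check(coded):
    """Data editing: list the positions of records that need follow-up."""
    return [i for i, code in enumerate(coded) if code == MISSING]

raw_answers = ["Never", "current ", "", "Former", "sometimes"]  # made-up responses
coded = [code_response(r) for r in raw_answers]
flagged = edit_check(coded)
print(coded)    # numeric codes ready for encoding in a spreadsheet
print(flagged)  # records to inspect and correct before analysis
```

Flagging records as early as possible matches the stated purpose of data editing: corrections are cheapest while the original respondent or record is still accessible.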

Considerations in Choosing a Statistical Test

 Objective of Analysis
o Relationship tests
 Test for the significance of the relationship of variables
o Difference tests
 Test for the significance of differences in the groups being compared
 Level of measurement of the variable
o Parametric tests
 Make assumptions about the parameters of the population distribution(s) from which the data
are drawn
o Non-Parametric tests
 Make no assumptions about the parameters of the population distribution(s) from which the
data are drawn
 Study Design
o Number of groups to be compared
o Whether the samples are independent or related
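
The three considerations above (objective, parametric or non-parametric, and study design) can be combined into a simple lookup. The sketch below is only a rough guide under simplifying assumptions; the function name suggest_test is hypothetical, and the test names are common textbook choices that are not listed in these notes.

```python
def suggest_test(objective, parametric, n_groups=2, related=False):
    """Suggest a commonly used test from the three considerations."""
    if objective == "relationship":
        return "Pearson correlation" if parametric else "Spearman rank correlation"
    if objective != "difference":
        raise ValueError("objective must be 'relationship' or 'difference'")
    if n_groups < 2:
        raise ValueError("a difference test needs at least two groups")

    # Key: (parametric?, more than two groups?, related samples?)
    choices = {
        (True, False, False): "independent-samples t-test",
        (True, False, True): "paired t-test",
        (True, True, False): "one-way ANOVA",
        (True, True, True): "repeated-measures ANOVA",
        (False, False, False): "Mann-Whitney U test",
        (False, False, True): "Wilcoxon signed-rank test",
        (False, True, False): "Kruskal-Wallis test",
        (False, True, True): "Friedman test",
    }
    return choices[(parametric, n_groups > 2, related)]

print(suggest_test("difference", parametric=True, n_groups=2, related=True))
print(suggest_test("difference", parametric=False, n_groups=3))
print(suggest_test("relationship", parametric=False))
```

A real choice also depends on checking each test's assumptions against the data, which a lookup like this cannot do.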

Data Presentation

 The method of summarizing, organizing and communicating information using a variety of tools
 Purpose:
o Display data clearly and effectively
o Summarize large quantities of data for the reader
o Facilitate analysis of trends, comparisons or relationships between variables
 Methods of Data Presentation
o Tabular Method
o Graphical Method
 Guidelines in Table Construction
o All tables must be simple, direct and clear.
o A table should appear immediately after the text where it is first cited.
o All tables should have a uniform style.
o Categories must be mutually exclusive.
o The unit of measurement should be well defined.
o Ideally, limit each table to 3 or 4 variables.
o If there are many observations, they can be split across 2 or 3 tables.
o Tables should be self-explanatory
 Parts of Statistical Table
o Title
 Explanatory
 Gives a clear and concise description of the data
 Answer the following questions:
 What
 Who
 Where
 When
o Box Head (Column Heading)
 Indicates the basis of classification of the columns or vertical series
o Stubs (Row Heading)
 Indicates the basis of classification in rows or horizontal series
o Body
 Main part of the table (composed of cells)
 Figures within the cell should be aligned
 Consistency in the number of decimal places
 Align all plus, minus, and plus-minus signs
 Empty cells should be indicated with a zero (0) or hyphen (-)
 Should be uniform in terms of decimals
 Include parentheses for sign
 Can add space instead of a symbol to avoid problems having to translate between languages
o Foot Notes
 Appear immediately below the body of the table
 Designated by letters instead of numbers
 Provides additional information that cannot be easily understood from the title, box head or
stub
o Source of data
 Exact reference of the information
 Includes the information about compiling agency, publication, etc.
 Source should not be placed as a footnote to the page

Master Table

 Shows the distribution of observations across several variables of interest in a study
 Shows detailed statistical data and facilitates generation and tabulation of smaller tables

Dummy Table

 Skeleton tables that give a preview of what table outputs may be expected from the study
 Purpose:
o Help researcher clarify instrument
o Help protocol reviewer
o Help statistician/computer programmer

Frequency Distribution Table

 Show either the actual number of observations falling in each range or the percentage of observations
 Parts of Frequency distribution table
o Class interval
 The range of values that defines each class (class width)
o Frequency
 Records the number of times a result appears in each class interval
o Cumulative frequency
 Add the frequency of the previous row to the frequency of the current row
o Percentage
 List the percentage of the frequency in each class interval
o Cumulative percentage
 Add the percentage of the previous row to the percentage of the current row
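
The parts listed above can be computed directly from raw observations. The following is a minimal sketch that assumes integer data and equal-width classes; the ages are made up, and the function name frequency_table is hypothetical.

```python
def frequency_table(values, start, width, n_classes):
    """Build frequency, cumulative frequency, percentage and cumulative percentage."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - start) // width)  # which class interval the value falls in
        if 0 <= idx < n_classes:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the class intervals")

    rows, cum_freq = [], 0
    for i, freq in enumerate(counts):
        lower = start + i * width
        cum_freq += freq  # previous rows' frequencies plus the current row
        rows.append({
            "class": f"{lower}-{lower + width - 1}",  # integer class limits assumed
            "freq": freq,
            "cum_freq": cum_freq,
            "pct": round(100 * freq / total, 1),
            "cum_pct": round(100 * cum_freq / total, 1),
        })
    return rows

ages = [21, 25, 34, 38, 39, 42, 47, 51, 55, 58]  # made-up observations
rows = frequency_table(ages, start=20, width=10, n_classes=4)
for row in rows:
    print(row)
```

The cumulative percentage is derived from the cumulative frequency rather than by adding rounded percentages, which gives the same result as the row-by-row addition described above while avoiding rounding drift.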

Types of Graphs

 Guidelines
o Should be self-explanatory
o Source should be cited if data is secondary
o Title may be placed at the top or bottom of the graph
o Vertical and horizontal scales should be properly labeled
o Properly identify trend lines or curves with labels or legend
o Frequencies are placed on vertical axis while basis for classifications is on the horizontal
o Vertical scale should always start at zero
o Use colors or degrees of shading for emphasis or to differentiate between
 Pie chart
o Describe how a whole is divided into parts/slices
o Shows the percentage of the total number of observations falling into each category of a qualitative
variable
 Bar Graph
o Compare data between different categories
o Qualitative variable or discrete quantitative variable
o Height of each bar is proportional to its value
o Bars should be of equal width and separated by gaps
 Horizontal Bar Graph
o Qualitative variable
 Vertical Bar Graph
o Discrete Quantitative Variable
 Component Bar Diagram
o Compares the compositions of two or more different groups (a pie chart shows only one)
o Qualitative data
o Each bar shows how a whole is made up of its component parts
 Histogram
o Represents frequency distribution of continuous quantitative variables
o Horizontal axis shows the unit of measurement
o Vertical scale gives the frequencies
o The area of each rectangle is proportional to the frequency of its class; when class widths are equal,
the height is proportional to the frequency as well
 Frequency Polygon
o Displays the frequency of continuous quantitative variable
o Advantageous when two or more distributions are depicted in a single graph
o Frequencies are plotted against the corresponding midpoints of the classes
 Line Graph/ Line Diagram
o Time series or time charts
o Quantitative variable over a period of time
o Intended to show trends or changes in the variable with time
 Scatterplot
o Relationship between two quantitative variables
o A graph in which the values of two variables are plotted along two axes
 Box Plot
o Shows skewness of data through the position of the median within the box and the relative lengths of
the whiskers
o Useful for showing description of large quantitative data including range, quartiles, spread, shape, tail
lengths and outliers
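
The quantities a box plot displays can be computed before any drawing is done. The sketch below is illustrative only: the data are made up, the function name five_number_summary is hypothetical, and the 1.5 × IQR rule for flagging outliers is a common convention assumed here rather than something stated in these notes.

```python
import statistics

def five_number_summary(values):
    """Return the values a box plot is drawn from, plus flagged outliers."""
    data = sorted(values)
    # Quartiles from the standard library; "inclusive" treats the data as a full population.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # assumed outlier convention
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "iqr": iqr, "outliers": outliers,
    }

measurements = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up observations
summary = five_number_summary(measurements)
print(summary)
```

Different software packages compute quartiles slightly differently, so the box edges may not match exactly between tools.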
