
Data Preparation for Statistical Analysis
Practical Session for Biostatistics
Session 1
Department of Biostatistics, Epidemiology and Population Health
Pre-test Scenario
In 2012, WHO stated that stroke is one of the two leading causes of death in
the world. Several risk factors are associated with stroke, and one of them
is smoking. Smoking doubles the risk of stroke, and in Indonesia 29.2% of
the population smoke tobacco daily and 34.8% are current tobacco smokers.
Thus, a student wants to conduct a study entitled “A Descriptive Study:
Assessment of Smoking Habit in Stroke Patients Treated in Dr. Sardjito
Hospital Yogyakarta”.
This study aims to describe the prevalence of smoking among stroke patients,
including their detailed smoking habits.
Variable: Example
Gender: a. Male; b. Female
Age: ## years old
Smoking category: a. Active Smoker; b. Passive Smoker
   or: a. Current Smoker; b. Ex-smoker; c. Non-smoker
Number of cigarettes per day: ## cigarettes per day
   or: a. <3 packs per day; b. 3-5 packs per day; c. >5 packs per day
Smoking duration: ## years
   or: a. <10 years; b. 10-20 years; c. >20 years
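The numeric and the categorical version of a variable are related: a duration entered in years can later be recoded into the three duration categories listed above. A minimal Python sketch of that recoding follows. The function name `duration_category` is hypothetical, and the sketch assumes that exactly 10 and exactly 20 years belong to the middle category.

```python
def duration_category(years: float) -> str:
    """Recode smoking duration (in years) into the three listed categories."""
    if years < 0:
        raise ValueError("duration cannot be negative")
    if years < 10:
        return "a. <10 years"
    if years <= 20:  # assumption: 10 and 20 both fall in the middle group
        return "b. 10-20 years"
    return "c. >20 years"


durations = [3, 10, 20, 35]
print([duration_category(y) for y in durations])
```

Entering the raw number and deriving the category afterwards keeps more information than entering only the category.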
Data Management – Aims
• Ensure the quality of the data
• Guarantee the integrity of the data analyses
• Be confident that the results of these data analyses are adequate
for answering the intended research and policy questions
Data Management – Processing Steps
• Data entry
• Coding
• Editing
• Checking
• Update or correction
Type of Data
• Numerical data
• Consists of numbers that come from measurement
• Also called quantitative data
• Expressed in numeric form
• Example: height, weight, age, income, days of sickness, duration of smoking
• Categorical data
• Consists of categorized data (not actual numbers)
• Sometimes in the form of ranks
• Example: educational level, gender, blood type, personality type, disease
status, category of nutritional status (Fat, Slim, Normal)
Software Data Entry – Microsoft Excel
• Strength
- No need to prepare electronic questionnaire
- Can do simple analysis
• Weakness
- Not flexible for large data and many variables
- Errors in data entry cannot be controlled
- Data cannot be entered via mobile phone
- Licensed
Software Data Entry – Epi Data
• Strength
- Flexible for large data and many variables
- Minimizing errors in data entry
- Free
• Weakness
- Need to prepare electronic questionnaire
- Data cannot be entered via mobile phone
- Data has to be exported for complex analysis
Software Data Entry – Epi Info
• Strength
- Flexible for large data and many variables
- Minimizing errors in data entry
- Can do simple analysis
- Data can be entered in real time via “Epi Info for mobile phone”
- Free
• Weakness
- Need to prepare electronic questionnaire
In-Class Exercise
• Create Questionnaire File
• Get the questionnaire “Morbid Form” from gamel. Make a “.qes” file
using the morbid form.
Introduction to EpiData
• Data entry and documentation
• Free program
• Based on EpiInfo
• Windows format
• No limit on the number of observations
(tested with >100,000)
EpiData (I)
• Creating questionnaire
• Controlled data entry
• Documenting and printing data
• Correction of questionnaires, records
• Importing and exporting data
EpiData (II)
• Simple surveys – one questionnaire
• Complicated surveys – a few questionnaires

If each questionnaire has an ID variable, the data can be merged
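The idea of merging on an ID can be sketched in Python. This is only an illustration of the principle, not how EpiData performs the merge; the function `merge_by_id` and the field names `id`, `sex` and `cigs_per_day` are hypothetical.

```python
def merge_by_id(first, second, id_field="id"):
    """Join two lists of records (dicts) on a shared ID field.

    Only IDs that appear in both questionnaires are kept.
    """
    second_by_id = {rec[id_field]: rec for rec in second}
    merged = []
    for rec in first:
        match = second_by_id.get(rec[id_field])
        if match is not None:
            merged.append({**rec, **match})
    return merged


demographics = [{"id": 1, "sex": "Male"}, {"id": 2, "sex": "Female"}]
smoking = [{"id": 1, "cigs_per_day": 12}, {"id": 3, "cigs_per_day": 5}]
print(merge_by_id(demographics, smoking))
```

Records whose ID is missing from one questionnaire are dropped here, which is why a consistent ID on every form matters.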


EpiData files
• When you create a questionnaire in EpiData, it is saved as
a .QES file
• When you create a data file and enter data into it, the file is
saved as a .REC file
• If you decide to include data checks that check data
*during* the data entry process (to reduce the likelihood of
making data entry errors), these are saved as a .CHK file
EpiData workflow
1. Define Data
2. Make Data File
3. Set up Checks
4. Enter Data
5. Document
6. Export Data
Creating Questionnaire
Creating a new questionnaire file

• Select NEW.QES to start a new form or OPEN.QES to open
an existing form. The file extension for questionnaire files is
always .QES.
Creating a new questionnaire file
• When you open a new form you will see several
toolbar options below the work process tools.
• When creating a questionnaire, always follow the conventions of good
questionnaire design:
• avoid complicated or overly long questions,
• define concepts *very* clearly,
• avoid jargon,
• make sure the response lists include all possible choices.
New questionnaire
• Type in window
• Cut and paste from Word documents
• Preview questionnaire
• (click Make data file > preview data form)
Structure of questionnaire
Field name (variable)   Text describing field                   Input definition (number/letters/date)
Study                   Study number (identification number)    ##
Name                    Name of patient                         <A____________________________>
DOB                     Date of birth                           <dd/mm/yyyy>
Age                     Age in years                            ##
Sex                     Sex of patient 1.Male 2.Female          #


Field name (variable)
• No more than 10 characters
• Begin with a letter
• No spaces or punctuation marks
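The three naming rules above can be expressed as a single pattern. The sketch below is an illustration in Python, not part of EpiData; the helper `is_valid_field_name` is hypothetical, and it assumes that only letters and digits may follow the first letter.

```python
import re

# One letter followed by up to nine letters or digits (10 characters in total).
FIELD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{0,9}")


def is_valid_field_name(name: str) -> bool:
    """Check a proposed field name against the three rules above."""
    return FIELD_NAME.fullmatch(name) is not None


for candidate in ["Age", "smokingcat", "smokingcats", "1age", "date of birth"]:
    print(candidate, is_valid_field_name(candidate))
```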
Text variables
• Hold text and/or numbers
• Used for holding information (e.g. names, addresses)
• UPPER CASE
• Can only hold upper-case (capital) letters
• Lower-case input is automatically converted to upper case (e.g. Egypt
becomes EGYPT)
• No mathematical operations
• Length (number of characters)
• ___________
Numeric variables
• Numerical information
• Hold integers (whole numbers) or numbers with a
decimal point
• Length (number of digits, number of decimals after the decimal point)
• # or ##.#
Other variables
• Boolean variables (also called logical variables or YES/NO variables)
• only two possible answers: Yes or No
• <Y>

• Date variables:
• Hold information on dates
• American format <MM/DD/YYYY>
• European format <DD/MM/YYYY>

• Soundex:
• Coding of words (anonymous, e.g. A-123)
• Code to limit spelling errors (e.g. Rome and Roma receive the same code)
• <S >
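To show how a Soundex code absorbs small spelling differences, here is a minimal Python sketch of the standard American Soundex rules, written with a hyphen to match the A-123 format above. It is an assumption that EpiData follows exactly these rules, so its codes may differ in detail.

```python
# Letter groups of standard American Soundex; vowels and Y carry no code.
CODES = {}
for group, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")]:
    for letter in group:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """Return a code such as 'R-163': first letter plus three digits."""
    letters = [c for c in word.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets the code, so repeats are kept
    return first + "-" + "".join(digits)[:3].ljust(3, "0")


print(soundex("Rome"), soundex("Roma"))
```

Both spellings produce the same code, which is how a Soundex field limits the effect of spelling errors and hides the original name.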
System variables
• Values generated automatically
• Today's date: the date of data entry
• <Today-dmy>
• <Today-mdy>
• Auto identification number: Counts the records entered
• <IDNUM>
Variable type
• Define variables using “Pick List” or “Code writer”
• Choose type of variable:
• Numeric
• Text
• Date
• Soundex
• Boolean (Yes/No)
• Autonumber
Exercise
• Now convert your smoking form (smoking form.docx) into EpiData format
Preview data form (for data entry)
Open your .qes file, then press Ctrl+T
Save Questionnaire
Create Data file
Preventing errors
• Use a standardised and previously tested questionnaire
• Train the interviewers and data entry clerks
• Check and validate the paper forms of the questionnaires
• Check during data entry (Check module of EpiData)
• Validation: enter the data twice, by different operators
• Check after data entry (Analysis module of Epi Info)
Checks (I)
• Reduce errors in input
• Checks help with data entry
• Many different types
• Examples:
• Limit entry of numbers to specific range
• Forcing entry to be made in field
• Conditional jumps
• Copying the data from the previous record
• Help messages
• Conditional operations (e.g. if...then operations)
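Three of the check types listed above (forced entry, range limit, conditional jump) can be sketched in Python to show the logic they apply. This is not EpiData .CHK syntax. The field names `sex`, `age`, `smoker` and `cigs_per_day` and the 0 to 120 age range are hypothetical, and numeric fields are assumed to hold numbers already.

```python
def check_record(record: dict) -> list:
    """Return a list of error messages for one record (empty if it passes)."""
    errors = []

    # Forced entry: the field may not be left empty.
    if record.get("sex") in (None, ""):
        errors.append("sex: entry required")

    # Range check: age must lie within a plausible interval.
    age = record.get("age")
    if age is None or not 0 <= age <= 120:
        errors.append("age: must be between 0 and 120")

    # Conditional jump: non-smokers skip the cigarettes-per-day question,
    # so a value in that field points to an entry error.
    if record.get("smoker") == "N" and record.get("cigs_per_day") is not None:
        errors.append("cigs_per_day: must be empty for non-smokers")

    return errors


print(check_record({"sex": "", "age": 154, "smoker": "N", "cigs_per_day": 3}))
```

In EpiData the same rules run while the operator types, so the error is caught before the record is saved.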
File structure
Click “Document” > “File structure”
Data Entry (I)
• Click Enter data button > choose .REC file
Data Entry (II)
• Record navigation:

• Delete records:
• Click cross to delete
• Record marked for deletion, but can be recovered
Document Tools
• File Structure
• Data entry notes ( .NOT file)
• Used to write comments during data entry, e.g. handwriting that is
difficult to read
• View Data
• List Data
• Codebook
• Basic descriptive statistics on all variables
• Validate duplicate files
• Check consistency after double entry
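The consistency check after double entry amounts to comparing the two entered files record by record. A minimal Python sketch of that comparison follows. It is not EpiData's own validation tool; the function `compare_entries` and the sample data are hypothetical.

```python
def compare_entries(first: dict, second: dict) -> dict:
    """Compare two data-entry passes, each keyed by record ID.

    Returns {record_id: {field: (value_in_first, value_in_second)}} for every
    disagreement. A record missing from one pass is reported under "_record".
    """
    differences = {}
    for rec_id in sorted(set(first) | set(second)):
        a, b = first.get(rec_id), second.get(rec_id)
        if a is None or b is None:
            differences[rec_id] = {"_record": (a is not None, b is not None)}
            continue
        changed = {field: (a.get(field), b.get(field))
                   for field in sorted(set(a) | set(b))
                   if a.get(field) != b.get(field)}
        if changed:
            differences[rec_id] = changed
    return differences


operator_1 = {1: {"age": 54, "sex": "M"}, 2: {"age": 61, "sex": "F"}}
operator_2 = {1: {"age": 45, "sex": "M"}, 2: {"age": 61, "sex": "F"}}
print(compare_entries(operator_1, operator_2))
```

Each reported difference is then resolved by going back to the paper form.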
Document Tools

View Data
Export to other programs

• Click Export data button
• Choose program
• Including Excel, Stata, SPSS
• For Epi Info, open the .REC file directly
Scales of Variables – Nominal
• Nominal scales assign numbers as labels to identify objects or classes
of objects. The assigned numbers carry no additional meaning except
as identifiers. Note that the order has no meaning here, and the
difference between identifiers is meaningless. In practice it is often
useful to assign numbers instead of letters.
• Example : Gender and Religion
Scales of Variables – Ordinal
• Ordinal scales assign numbers to objects to reflect a rank ordering on
an attribute in question. Order does matter in these variables (unlike
nominal scale variables).
• Example : Education
Scales of Variables – Interval
• In an interval scale, numbers are assigned to objects such that the
differences between the numbers can be meaningfully interpreted.
Ratios of interval scale variables have limited meaning because there
is not an absolute zero for interval scale variables.
• Example: temperature (in Celsius or Fahrenheit) – 0 °C does not mean 0 °F
Scales of Variables – Ratio
• Ratio scales have all the attributes of interval scale variables and one
additional attribute – i.e. ratio scales include an absolute “zero”
point.
• Example: weight – 0 kg has the same meaning as 0 g
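The difference between interval and ratio scales can be checked with a little arithmetic. The Python sketch below uses made-up values: it compares the ratio of two temperatures in Celsius and in Fahrenheit, then the ratio of two weights in kilograms and in grams.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


def kg_to_g(kg: float) -> float:
    return kg * 1000


# Interval scale: the ratio of two temperatures changes with the unit,
# because neither scale has an absolute zero.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)
print(ratio_celsius, ratio_fahrenheit)

# Ratio scale: the ratio of two weights is the same in any unit,
# because 0 kg and 0 g both mean "no weight".
ratio_kg = 80 / 40
ratio_g = kg_to_g(80) / kg_to_g(40)
print(ratio_kg, ratio_g)
```

Saying that 20 °C is "twice as hot" as 10 °C is therefore not meaningful, while saying that 80 kg is twice as heavy as 40 kg is.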
