
Nature of Biostatistics • Did it pass the 4 phases of clinical trials?

Phase 0 laboratory,
phase 1 for animal model, 2 for soldiers voluntary, 3 for small
I. Introduction community voluntary 4 several communities voluntary

• Bio: life • What are the adverse effects?

• Statistics: deals with the collection, organization, analysis and Informed consent – consent needed for parents with different side
interpretation of numerical data. (This is the scientist’s bread-and- effect and have an option to get out of the study. You must
butter because this is where they earn income, and it is always used translate or explain it to the participant so they clearly know what the
in education to give more accurate data) research is about. African sleeping sickness, Trypanosoma brucei.
Many side effect but must be good.
2 Important properties of the research
• Do the pros outweigh the cons?
Validity: Yielded data are real numbers
2. Biostat techniques are used in the design and evaluation of
Reliability: Significance of the result research projects
A. Usage • Ex. Philippine Lymphatic Filariasis (LF) Elimination Project
(1998) 2 billion pesos. Compare the epidemiology before and after
• Public Health (Generalization or the community, More on control
Less than 1 out of 1000 are affected it means it is eliminated.
and prevention)
Failed in data dissemination and education program
– Used in planning, monitoring and evaluating
• Design was based on LF programmes from
programs
Western Africa
– Used in determining vital and health statistics
• Evaluation: will it achieve the targeted
• Vital statistics: number of births, deaths and marriages
(these add to the population), immigration and emigration Prevalence rate before 2020?
• Health statistics: morbidity data, hospital and clinic statistics, – 9.7%/1000 population
service statistics
3. Biostat is used by health administrators/planners/workers in the
• Biology systematic collection of data from w/c health indicators can be
derived and used as tools for decision-making
– Determine the efficacy of a test substance as an alternative to
commercially available products (antibacterial (Antibacterial – Problem identification (Mosquito bite, 4 genera of mosquito,
resistance) (zone of inhibition test), antihypertensive, antilarva, Culex, Aedes (Dengue), Anopheles (malaria), etc.) wrong
antihelminthes etc) dissemination (nausea, after eating meal)
– Establish occurrence of microevolution in a specific area – Needs assessment
– Determine the extent of speciation in a given area – Allocation of limited resources (What is more prevalent, more
children died in waterborne diseases)
– Establish correlation between phenotype and certain
environmental factors – Program evaluation
• Medicine (clinical setup, in hospital, Outpatient, Diagnosis and C. Branches (1st sem – Descriptive, 2nd sem – inferential)
treatment)
1. Descriptive statistics: summarizes and presents data in forms
– Establish correlation between a specific condition and certain that will make them easier to analyze and interpret.
risk factors
Narrative – when you have collected little data, Tabular – when
– Determine the efficacy of certain tx/tx programs you have collected a lot of data, Graphs – comparing populations
B. Applications in Policy Making Example
1. Biostat is a tool needed in information-based decision-making
 Central tendencies
Primary investigator: only you control the content  Prevalences (quartile etc.)
 Variance and standard deviation
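The descriptive measures listed above can be sketched with Python's standard `statistics` module. The egg counts below are hypothetical values made up for illustration; they are not data from these notes.

```python
import statistics

# Hypothetical egg counts for seven subjects (illustrative only).
egg_counts = [12, 15, 15, 20, 24, 30, 45]

mean = statistics.mean(egg_counts)          # central tendency: arithmetic mean
median = statistics.median(egg_counts)      # central tendency: middle value
mode = statistics.mode(egg_counts)          # central tendency: most frequent value
variance = statistics.variance(egg_counts)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(egg_counts)      # sample standard deviation
quartiles = statistics.quantiles(egg_counts, n=4)  # three cut points: Q1, Q2, Q3

print(mean, median, mode, variance, std_dev, quartiles)
```

The standard deviation is the square root of the variance, so either one can be reported as long as the units are stated.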
MDA: Mass Drug Administration
2. Inferential statistics: refers to methods involved in order to make
• Ex. Mass Administration of a drug/vaccine generalizations and conclusions about a target population based on
results from a sample population.
• Is it as good as the standard tx? STH (soil-transmitted helminths
due to airborne of egg) Cyst or trophozoites Target population or population and sample: Inferential: sample
population, prevalence HIV
a. Estimation of parameters E.g., sex, occupation, disease status

Parameters: properties of your target population Some variables can be measured qualitatively

b. Testing of hypothesis or quantitatively

in rounding of age, 9 years and 363 days, 9 years

E.g., height: measured in cm or categorized as

short, medium, tall

G. Types of Variables According to Scale of Measurement

1. Nominal

Gr. Nomos = name

Qualitative
D. The Phenomenon of Variation
E.g., gender, sex, disease status, immune status
• Variation- refers to the tendency of a measurable characteristic to
change from one individual or one setting to another, or from one 2. Ordinal
instant of time to another instant within the same individual or
setting Variables that can be ranked/ordered

E. Types of Data Can be qualitative or quantitative

• Biostatistics deals with both qualitative and quantitative data Ex. Severity of disease, Intensity of infection (number of eggs)

Constants- a phenomenon whose value remains the same from 3. Interval


person to person, from time to time or from place to place
Exact distance between categories can be measured
Ex: 1 hr = 60 mins, π = 3.14159, 1 dozen = 12 pcs
but 0 is arbitrary (0 doesn’t necessarily mean zero) like
Variable- a phenomenon whose values or categories cannot be temperature, or location, or longitude
predicted w/ certainty
4. Ratio
Ex: Infection status, Parasitaemia, Length of gestation,
Weight/Height Educational attainment Exact distance between categories can be measured

F. Types of Variables 0 is fixed (weight, height) (For applied science)

1. Qualitative: categories are used as labels to distinguish one “What is the purpose of classifying variables?”
group fr. another. One group is not better than the other.
To know what to use for the tool or methodology for the research.
Ex: Sex (presence of Y chromosome), Urban/rural, Religion,
Region, Disease status. Linear Regression - Quantitative

2. Quantitative: categories can be measured and ordered according Logistic, Chi-Square – Qualitative
to quantity/amount
Quantitative Discrete (1 variable) – Bar graph (with spaces)
Categories can be expressed numerically
Quantitative Continuous (1 variable) – Histogram
Ex: Population size Birthweight Number of hospital beds
Time elements – Line graph
2. Quantitative
2 variables for correlation – Scatterplot
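As a small sketch of why variables are classified, the pairings listed above can be stored in a lookup table. The dictionary keys and the function name `suggest_tools` are hypothetical labels chosen for this example; the pairings themselves are the ones given in the notes.

```python
# Graph to use for each kind of data, as listed in the notes.
GRAPH_FOR = {
    "quantitative discrete": "bar graph (with spaces)",
    "quantitative continuous": "histogram",
    "time elements": "line graph",
    "two variables (correlation)": "scatterplot",
}

# Statistical tool to use for each kind of outcome, as listed in the notes.
TEST_FOR = {
    "quantitative": "linear regression",
    "qualitative": "logistic regression or chi-square",
}

def suggest_tools(data_kind, outcome_kind):
    """Return the (graph, test) pairing for the given variable types."""
    return GRAPH_FOR[data_kind], TEST_FOR[outcome_kind]

print(suggest_tools("quantitative continuous", "quantitative"))
```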
Subtypes:
Data collection
a. Discrete- can only assume integral values
I. Categories of Data According to Sources
E.g., household size, hospital capacity, parasitaemia (number of
parasites per ml of blood) A. Primary

b. Continuous- includes fractions and decimals • Data obtained by the investigator

E.g., birthweight, arm circumference, age • Advantage:

Some variables are always qualitative in nature • investigator can control the quality of data, the design of your
study is tailor-fitted to your objectives.
• Disadvantage: - Data collector and subjects give them reward or monetary reward

• more time consuming and more expensive

B. Secondary (Meta-analysis, Record’s review) (by other C. Questionnaires


investigators)
• If an interview is not possible, questionnaires can be given to the
• Existing data obtained by other people for the purposes not respondents
necessarily those of the investigator’s
• Instruments: Questionnaire
• Advantages:
• Advantages:
• Pre-collected data
• Less expensive
• Cheaper
• Timesaving (simultaneous, 50 subjects)
• less tedious
• Disadvantages:
• Timesaving
• Lower yield of respondents
Disadvantages (the quality will suffer)(past collector has different
objectives) such as for description purposes. - 1-2 pages must be done and not 4-5 pages

• The user has no control how the data were collected (what if not - Discard the answers
well-trained data collector)
- data is bad, sample size must be doubled
Trophozoites, cyst and pseudoparasite because not well-trained
III. Qualities of Statistical Data (Validity and Reliability)
• Problems in timeliness (20 years ago or 10 years ago)
A. Timeliness (Be ready for assholes)
• Problems in confidentiality (STD, do not release for hospital)
(Primary source) • Interval between the date of occurrence of the different events
considered and the time the data is ready to be used or
• The objectives behind the collected data differ from those of the disseminated
researcher
• Reasons for delay:
• The definition of terms may differ from the researcher
• Numerous forms that need to be filled-up and filed
Amoebiasis – parasite is present, change physiology of the host
• Difficulty in completing sample size (high sample size)
Definition of amoebiasis – parasite present and acidic diarrhea
• Problem in transportation and communication
Basic – viral or bacterial diarrhea
• Bureaucracy
II. Methods of Data Collection
• Accomplishment of too many permits
A. Physical observations (using senses for collecting data)
• Lack of funds, manpower etc.
• Investigator may directly or indirectly (hiring research assistant)
(i.e. thru Ras/trained data collectors) observe the sample B. Completeness
population/experimental subjects
• 2 components:
• Instruments:
• Completeness of coverage 99/100 = incomplete, 6 districts
• Cameras, Recorders, Laboratory apparatus, Diagnostic kits must have people from all 6 districts, High school student = those not in
school are left out, or state this in the scope and limitations
B. Interview
• “Do the data cover the entire geographic area and target
• One on one interaction between the subject and the investigator population within the area of interest?”
(High response rate and lower rate of lying) (low temptation of
lying) • Completeness in accomplishing all the items in every form

• Conducted if some variables (e.g. subjects’ opinions/feelings) are • Incomplete questionnaires


not amenable to observation
C. Accuracy = Validity of the research
• Answers should be written verbatim (word for word)
External validity – data collected from the sample can
• Instruments: be extrapolated to the target population

• Interview schedule Internal validity – Research is free from bias and confounding,
with low natural errors.
• Key to attaining relevance

• Having well-defined objectives for data collection

Researcher’s Bias – can be controlled by blinding • There should be constant communication between data producers
and data
Single blinding – the treated participant does not know the medication
users
Double blinding – the data collector also does not know the medication
F. Adequacy (Enough data to answer all of the objectives)
Triple blinding – data processor/epidemiologist/statistician didn’t
know it. • “Do the collected data provide all the basic information needed to
meet the requirements of the user?”
Natural error – innate in research, it’s always present
Steps in Planning and Conducting Research
How to control natural error: You increase the sample size

Confounders can be controlled after data collection


Research
Another way to improve accuracy: Probability sampling – every member of the
population has a chance of being selected • Systematic (we follow the scientific method), objective (must
know the outcome even if it failed)
Non-Probability
Researcher’s Bias: negative results are not published, or are manipulated
Purposive sampling – a biased sampling, such as choosing one barangay. and reproducible activity
Convenience sampling – e.g., classmates
• Systematic: follows the scientific method of inquiry
Quota sampling – once you reach 100 people, that is enough
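Probability sampling, as described above, can be sketched with a simple random sample in which every unit in the sampling frame has the same chance of selection. The household IDs, the frame size of 200 and the function name `simple_random_sample` are assumptions made for this example; the sample size of 56 echoes the figure mentioned elsewhere in these notes.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)    # a seed makes the draw reproducible
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical sampling frame of 200 household IDs.
households = [f"HH-{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(households, 56, seed=1)
print(len(sample), sample[:5])
```

Purposive, convenience and quota sampling skip this random draw, which is why they count as non-probability methods.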
• Objective: conclusions based on empirical evidence
• Refers to how close the measurement or the data is to its true
value • Reproducible: repeatable

• “Do data reflect the true situation?” - Do what you love

Improve your research design Steps in Conducting Research

2 Types of Research I. Identify the problem/topic

1. Experimental A. Selection of topic

2. Observational B. Formulation of objectives

 Cross Sectional – sample population without II. Review of related literature


classifying the exposure and outcome status
(simultaneous), you cannot establish causation, only III. Statement of the problem
correlation)
 Case Control – Retrospective; subjects are grouped by their outcome IV. Formulating testable
(who is positive or not), and only then is the
hypothesis
exposure variable assessed
 Cohort – Most valid; the sample population is divided V. Research design
by exposure status (exposed and non-exposed).
Effect of smoking on cervical cancer; after 20 years, VI. Tools for data collection
causal. Losses to follow-up. Funding.
VII. Plan for data analysis
Exposure variable = Independent variable
VIII.Data collection
Outcome variable = dependent variable
IX. Data processing
D. Precision/Consistency = Reliability = Repeatability
X. Data analysis
• Refers to the extent to which similar information is obtained
when a measurement is performed or an observation is made more XI. Written report
than once (different population and comparable in result) (quality
setting) XII. Dissemination

E. Relevance XIII. Utilization

• Consistency of the data produced with the needs of the data users I. Identify the problem/topic
(fund of agency)(must be lower province or Manila only)
A. Selection of topic
1. Researcher characteristic (Topic Interest, you will pick what is • To determine the antihelminthic potential of Tabernaemontana
your topic) and pick an adviser pandacaqui bark extract

a. Personal interest and inclination • Specific objectives (Most substantial form, most simple form
possible and measurable)
b. Training
 Limit it like using methanolic extract only
c. Previous experience (OJT so it will have normal experience)  Bacteria: Gram positive, Gram negative, fungal species
 Loophole is bad
Bureau of Plants
Example: To identify the gastrointestinal parasite present in fishes.
2. Nature of topic to be investigated
(This is wrong)
a. Timeliness (must not be outdated)
• Identify in detail and in measurable terms the aims of the
b. Relevance (Topics that the academe wanted) (read research
papers from journals) (8 am – 12 noon)
• Breaks down what needs to be accomplished into smaller logical
c. Duplication (must not be redundant) components

• Indicators that will make the objectives measurable should be


d. Applicability or practical value of results (Do you have an included here
outcome?) (There must be funding)
General: To determine the antimicrobial potential of Mangifera
e. Cost-effectiveness (Molecular characteristics of Narra) indica leaf extract
3. Feasibility (will not graduate on time because of research due to • To determine which of the following concentrations, 25%, 50%,
their feasibility) 75% and 100% of the aqueous leaf
a. Availability of subjects/test organisms extract will exhibit the largest zone of inhibition against
Staphylococcus aureus, Escherichia coli and Candida albicans
b. Availability of special equipment
• To determine which among the aqueous, methanolic or ethanolic
c. Necessity for special working conditions (sterile environment,
leaf extract will produce the largest zone of inhibition against
every day with UV light)
Staphylococcus aureus, Escherichia coli and Candida albicans
d. Degree of sponsorship or administrative cooperation required
General: To determine the risk factors associated with Entamoeba
e. Hazards, handicaps to be encountered histolytica infection (Risk factors: Sex, Age, Occupation, Water
supply, Socio-economic status, Type of school)
f. Time requirements or duration of the project
Specific: To determine the correlation between type of water
g. Availability of research funds source and EH infection among Barangay 123 (among grade
school student of Manila Elementary school in Barangay 686).
B. Formulation of research objectives (the Why)
General: To determine the antihelminthic potential of
• The objectives should reflect the questions whose answers the Tabernaemontana pandacaqui bark extract.
investigator wants the study to yield
To determine the antihelminthic potential of pandacaqui
- Used as guides in specifying the variables of the study methanolic bark extract against Helicobacter using the L3 larva
assay.
Ex.
* Notes on identifying specific objectives
 To determine the relationship between music genre and
pulse rate 1. Review theories and research related on the topic (Always check
 What is the relationship between music genre and pulse recommendation)
rate?
2. Replicate earlier studies to verify the results
* Categories of research objectives
If it was done in Barangay 123, just do it in Barangay 124
General objective- generic statement w/c describes in broad terms
what the study wishes to accomplish, (use layman terminology) • Ensure that the topic is not over-studied

Ex: 3. Repeat past researches but with modifications (sample studies –


call centers -> factory workers)
• To determine the antimicrobial potential of Mangifera indica
(mango) leaf extract Rapid Diagnostic Tests – Use it to other types of infection

• To determine the risk factors associated with Entamoeba Rapid diagnostic tests (RDTs) are a type of point-of-care
histolytica infection diagnostic, meaning that these assays are intended to provide
diagnostic results conveniently and immediately to the patient
while still at the health facility, screening site, or other health care IV. Formulate testable hypothesis and define basic concepts and
provider. variables

4. Methods used in one area can be applied into other related areas. • Hypothesis: assertion/proposition about the relationship between
two or more variables (Must be parallel to the objectives)
3. Characteristics of research objectives
Number of specific objectives = Number of hypotheses, and they must be parallel
a. Objectives should be phrased clearly, unambiguously and
specifically • Each specific objective should have its own hypothesis

b. Objectives should be stated in measurable and operational terms • Each hypothesis should be expressed clearly

Ex: • Hypothesis should be expressed in observable and


measurable terms w/c would allow objective evaluation
•To determine effect of poverty (Household income, socio-
economic status) on the rate (prevalence- over population size, Categories of variables (test statistics)
incident-specific time period over person time (hours of at risk of
exposure to patient)) of STH infection 1. Independent variable: presumed to cause, influence or stimulate
the outcome (Xs, X-axis)
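The two rates named in the poverty example above (prevalence over population size, incidence over person-time at risk) can be sketched as follows. The function names and all counts are hypothetical; only the definitions come from the notes.

```python
def prevalence_per_1000(existing_cases, population_size):
    """Prevalence: existing cases divided by population size, per 1000 people."""
    return existing_cases / population_size * 1000

def incidence_rate(new_cases, person_time_at_risk):
    """Incidence rate: new cases in a period divided by person-time at risk."""
    return new_cases / person_time_at_risk

# Hypothetical counts, for illustration only.
print(prevalence_per_1000(97, 10_000))  # cases per 1000 population
print(incidence_rate(12, 4_000))        # cases per person-hour at risk
```

Prevalence is a proportion at one point in time, while the incidence rate has units of cases per unit of person-time, so the two are not directly comparable.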
II. Review of related literature
• AKA exposure variable, cause, risk factor
Significance
2. Dependent variable: refers to the output, the outcome or
• To know current state of knowledge re selected topic response variable, disease
• To identify gaps in knowledge 3. Control variable: a variable that may influence the result of the
study.
• To integrate and criticize other related studies
Covariate -Randomize study location and increase sample size
Note the ff when writing the rrl
Exposure and Outcome
• Who have done work on the research topic being planned
• May interfere with the interaction between the IV and DV
• Research design utilized
• Thus, needs to be controlled, held constant or randomized to
• Study results neutralize/cancel out the effect
- 56 students are low, public and private must-have 3. Control variable
• Problems encountered during research • Can be a covariate or a confounder
 How the problems were resolved • Covariate- characteristic of the population that may be related to
the outcome
III. Statement of the problem (refinement or tuning of your specific
objectives) • May or may not be associated with the DV
• Involves defining the actual problem for investigation in clear • Confounder- characteristic of the population that may interfere
specific terms with the interaction between the IV and DV
• Purpose • Associated with both the IV and the DV

• Further refinement of objectives Sex and malaria – Sex (independent/Exposure), Malaria


• Narrows the scope of the study (Dependent/Outcome), Occupation (Confounder).
• Restricts and limits the coverage of the study
Covariate – associated with the outcome variable.
Research algorithm – flowchart (For experimental)
Confounder – associated with the exposure and the outcome
Research Paradigm – Theoretical and not experimental variable.

Ex

• Gen Obj: To determine the morbidity of sth among children Examples

• Spc Obj: To determine the specific prevalence rates of sth among • Prevalence of Trichomonas vaginalis infection among 16-66 yo
pre-school and grade school children in Intramuros. women

• Statement of the problem: To determine the specific prevalence Age (Exposure), diseases status/Infection status (Outcome),
rates of Ascaris lumbricoides, Trichuris trichiura and hookworm Number of sex partners, frequency of intercourse, location (more
infection among the 3-6 and 7-12-year-old residents of Brgy 654, adventurous), Urban/Rural area (Covariate)
Intramuros, Manila
• Antihelminthic potential of Tabernaemontana pandacaqui 2. Natural Error – Explanation due to chance; to decrease this,
aqueous increase the sample size, Alpha of your study, allowable error.

- methanolic and ethanolic leaf extracts against the L3 of 3. Effect of Confounder


Caenorhabditis elegans, E= type of extract (methanolic etc.) O=
L3 or mortality rate of the larvae C= Area where your leaf get.

• Sex as a risk factor for malaria

• The effect of exercise on colon cancer Design the tools for data collection

E= Level of exercise/activity O= disease status C= Weight, age • Can be based on related studies or can be designed by
the investigator
• Diabetes as a risk factor for dementia
• Ex: interview schedule, questionnaire, experimental set-up
E= Diabetes (type 1 or type 2) O= Dementia C= Age
VII. Design the plan for data analysis
Research Design (90% of your methodology)
• Dummy tables- skeleton tables drawn to help conceptualize how
• Represents the strategy of the researcher in answering the the data will be organized and presented (To ensure that all of your
objectives of the study objectives will be answered) No dummy table = insufficient data
(what if it was not done during the rainy season). To prevent it, construct
• Should be based on: dummy table but the E and O variable is plotted. Prevalence during
July – August, Prevalence during Oct-Nov. Test for different
• The objectives
proportion.
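The test for a difference in proportions mentioned above can be sketched as a two-sided two-proportion z-test. This is one common way to compare two prevalences and is an assumption here, since the notes do not say which test is meant; the counts filling the dummy table are hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions.

    x1, x2 are the numbers positive; n1, n2 are the numbers examined.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value

# Hypothetical dummy table: positives / examined in each season.
z, p = two_proportion_z_test(40, 100, 20, 100)  # Jul-Aug vs Oct-Nov
print(round(z, 2), round(p, 4))
```

Filling the dummy table with made-up counts before data collection shows early whether both seasons are needed to answer the objective.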
• Feasibility
VIII Data collection (methodology must be good)
• Ethics (Getting all permits – Invasive or non-invasive, benefits,
• Most expensive and most time-consuming phase of the research
objectives, risk, compensation, in local dialect, English – para sa
project
institution)
• Problems/handicaps depend on how well steps 1-7 are executed
• Economy and efficiency (Time-frame)
• If research involves human subjects, the community must be well
Components:
oriented before data collection.
• Sampling method (probability sampling and what type? and non-
Invasive – blood collection; Non-invasive – stool collection.
probability, why would you pick this?)
IX. Data Processing (Data editing)
• Sample size (Formula) N=56 Can be used, It will depend on your
indicator. Mean? Proportion? Percentage? Different proportion? - Encoding the data
If sample size, you can use other paper for it. Data editing – if a questionnaire was not answered, remove it
• Strategies for controlling and manipulating relevant variables Data doctoring – changing the answers so that they satisfy the study. If
the result is negative, do not doctor it.
• Evaluation criteria for outcome instrumentation (Questionnaires,
Translated, and tested) Legibility of entries – discarded from the start.
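The notes say that the sample size formula depends on the indicator (mean, proportion, and so on) but do not give one. As a hedged sketch, a widely used formula for estimating a single proportion is n = z² p(1 − p) / d². The function name and the default values (z = 1.96 for 95% confidence, p = 0.5, d = 0.05) are assumptions, not figures from the notes.

```python
import math

def sample_size_for_proportion(p=0.5, d=0.05, z=1.96):
    """Minimum n to estimate a proportion p within allowable error d.

    z is the standard normal value for the chosen confidence level
    (1.96 corresponds to 95%). Assumes a large population.
    """
    n = (z ** 2) * p * (1 - p) / (d ** 2)
    return math.ceil(n)  # round up to a whole subject

print(sample_size_for_proportion())        # most conservative case, p = 0.5
print(sample_size_for_proportion(p=0.10))  # rarer condition, same error margin
```

A smaller allowable error or a proportion nearer 0.5 raises the required sample size, which is why the indicator has to be chosen first.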
Research Design Cause of death – Cervical cancer and Procorpio
- The research design should be able to ensure the external • Raw data is transformed into information
(represent your target (and internal validity (biases) of the research
and control confounder • Where data is edited and responses/observations are coded
External validity – Sample population = good representation of the • Editing: investigator checks the data collection tool for:
target population.
• completeness,
Internal validity – Real relationship or the accurate relationship
between your Independent and dependent variable • legibility of entries

-> To determine how close your acquired value from the real value • Consistency of responses

If Correlation study = Interaction between E (Independent) and O • Purpose: facilitate data analysis and interpretation
(Dependent) variable
X. Data Analysis (What would be the statistical test available)
How to know? Eliminate 3 variable
- Parametric (ANOVA, P-Test or T-test. If probability sampling or
1. Bias – systematic error committed by the researcher. (Non- normal distribution, Equal distribution of variance,) and Non-
probability or sample size is small) (Blindings) Parametric (non Probability)
• Involves quantification, description and classification of data the Exposure variable) EMM – you don’t have to correct but
explain.
• Researcher must know the basic concepts, procedures and
limitations of each statistical test (Decision tree)

XI. Write the research report Is there an assoc between variable and outcome?

• 3 copies of your written report (overall and separately among exposed and unexposed)

- Investigator is required to submit a report of all research-related If Yes: Is there an assoc between variable and exposure (Must not
activities he’s done and his findings be biological)

XII. Result dissemination (You have to contribute to the pool of No: Variable is not a confounder
knowledge)
If yes (2): Variable is a confounder
• Via scientific journals, scientific fora
No: Variable is not a confounder
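The decision tree above can be sketched as a small function. The function name and the third parameter, which encodes the criterion that a confounder must not be an intermediate step in the causal pathway, are assumptions added for illustration.

```python
def is_confounder(assoc_with_outcome, assoc_with_exposure,
                  intermediate_step=False):
    """Follow the decision tree: both associations must hold."""
    if not assoc_with_outcome:
        return False  # No: variable is not a confounder
    if not assoc_with_exposure:
        return False  # No: variable is not a confounder
    # A variable on the causal pathway is not treated as a confounder.
    return not intermediate_step

# Example from the notes: occupation in the sex and malaria relationship,
# assumed here to be associated with both exposure and outcome.
print(is_confounder(assoc_with_outcome=True, assoc_with_exposure=True))
```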
• Beware of predatory conferences/publishers (they only collect a
convention fee and issue a certificate) https://beallslist.weebly.com/

XIII. Result utilization (Based on number of citations)

The best way to determine value of a research is through the extent

to w/c the results are utilized

• Baseline study results= used in determining the effects of an Noncausal


intervention

• Needs assessment study results= “””“ design for interventions Causal, unidirectional
• Clinical trials/experimental study results= “”” recommendations but not sure.
for most

effective Tx
Causal,
Confounding unidirectional
Internal Validity: association between exposure and outcome is
true if the study rules out the 3 alternative explanations Odds Ratio = odds of the cases being exposed over the odds of the
controls being exposed (unexposed group)
 Bias- systematic error in the investigator’s design or
conduct of the study that leads to false assoc. between AD/CB
exposure and outcome
 Random error- probability that the observed result is Epidemiology – Case can also be called “diseased”
due to chance (Increase sample size to minimize
random error) - Control (non-diseased)
 Confounding
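The odds ratio defined above (AD/CB) can be computed from a 2x2 table. The cell labels follow the usual convention (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls); the counts and function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds of exposure among cases (a/c) over odds among controls (b/d)."""
    return (a * d) / (b * c)

def interpret(or_value):
    """Read the direction of the association from the odds ratio."""
    if or_value > 1:
        return "risk factor"
    if or_value < 1:
        return "protective factor"
    return "no association"

# Hypothetical 2x2 table, for illustration only.
crude = odds_ratio(a=40, b=10, c=20, d=30)
print(crude, interpret(crude))
```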
Criterion (3)
Confounding: third/extraneous/nuisance variable that distorts the
 The confounder is not an intermediate variable in the
relationship between exposure and outcome
causal pathway between exposure and outcome
 Can be an independent risk factor for the outcome  The variable should not be affected by exposure or
outcome, although it may affect exposure or outcome
Covariate (other risk factors that you’re not interested in), EMM
(effect measure modifier: it cannot be corrected, only Sex and Malaria – EMM (Chromosomes of Y)
explained) diseased
Malaria and Occupation – Odds Ratio of 5.31 (odds 5.31 times as high, i.e. 431% higher)
Criteria
Protective – Anemia (negative association, odds ratio lower than 1)
1. The variable is causally associated w/ the outcome
III. Identifying Confounders (If you know the Confounder)
2. The variable is causally or noncausally associated with the
1. Strength/Magnitude of association Less than 5% - do not
exposure (double-headed), the rooster crows, there is no sun, but
touch it
reversed

3. The variable is not an intermediate step in the causal pathway


between exposure and outcome (It shouldn’t be a substep if not
biological or physical in nature interaction) (if you’re c precedes
= Underestimate the interaction to the exposure and outcome

Adjusted for L. loa parasitaemia OR A = 5x more severe

(Qualitative confounding) Opposite

= exposed to protective factor (1.78 – first world) (adjusted: lower


than 0.72 (lower than 28%) )

S= Substeps (EMM)

Orc = Confounding

ORa = Adjusted

There is a confounder if the association seen across the strata are of


similar magnitude but different from the crude estimate.

2. Direction of Confounding

 Confounding pulls the observed association away from the true association
 It exaggerates the true association (Positive Confounding): OR u > OR A


 It hides the true association (Negative Confounding): OR u < OR A
 O. volvulus patients taking Ivermectin VS Severe Adverse Effect: OR u = 1
 Adjusted for L. loa parasitaemia: OR A = 5

 It reverses the direction of the association (Qualitative Confounding)

OR A = [ (a1 × d1)/n1 + (a2 × d2)/n2 ] / [ (b1 × c1)/n1 + (b2 × c2)/n2 ]
(Positive Confounding) OR u > OR A

= The confounder actually exaggerates the relationship between


the exposure and outcome (Unadjusted is bigger)

(Negative confounding) OR u < OR A
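The comparison of the unadjusted and adjusted odds ratios can be sketched as below. The adjusted estimate uses the Mantel-Haenszel pooled odds ratio across strata, which is an assumption about the method intended in these notes, and the direction rule follows the notes as written (it is framed for associations above 1). Function names and stratum counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Crude odds ratio from one 2x2 table: ad / bc."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio over strata, each given as (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def confounding_direction(or_u, or_a):
    """Classify confounding from unadjusted (or_u) and adjusted (or_a) ORs."""
    if (or_u - 1) * (or_a - 1) < 0:
        return "qualitative"  # the association changes direction
    if or_u > or_a:
        return "positive"     # unadjusted exaggerates the association
    if or_u < or_a:
        return "negative"     # unadjusted hides the association
    return "none"

# Ivermectin example from the notes: OR u = 1, OR A = 5.
print(confounding_direction(1, 5))

# Hypothetical strata (a, b, c, d), for illustration only.
print(mantel_haenszel_or([(40, 10, 20, 30), (10, 5, 8, 12)]))
```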
