
Research Methodology and Biostatistics

Dr. K. P. Suresh, Ph.D. (Biostatistics)

National Institute of Veterinary Epidemiology and
Disease Informatics (NIVEDI), Hebbal, Bangalore 560024

A statistician is someone who, with his head in an oven and his feet in a bucket
of ice water, when asked how he feels, responds: "On the average, I feel fine."

What is Research?
Research is the systematic process of collecting and analyzing information to
increase our understanding of the phenomenon under study. It is the function of the
researcher to contribute to the understanding of the phenomenon and to communicate
that understanding to others.
Invention: Invest money to generate knowledge
Innovation: Invest knowledge to generate money

The logic of scientific reasoning

The whole point of science is to uncover the truth.
We have our senses, through which we experience the world and make observations.
We have the ability to reason, which enables us to make logical inferences.
In science we impose logic on those observations.

Inductive Inference:
Statistics as the Technology of the
Scientific Method
Statistical methods are objective methods by which group trends are
abstracted from observations on many separate individuals.
Summarizing data: averages, percentages, presentation of tables and charts.
A major part of statistics involves the drawing of
inferences from samples to a population in regard to
some characteristic of interest.
In statistical reasoning, then, we make inductive
inferences, from the particular (sample) to the general
(population). Thus, statistics may be said to be the
technology of the scientific method.

CLINICAL RESEARCH PROCESS


Pre-clinical testing
Investigational New Drug Application (IND)
Phase I (assess safety)
Phase II (test for effectiveness)
Phase III (large-scale testing)
Licensing (approval to use)
Approval (available for prescription)
Post-marketing studies (special studies and
long-term effectiveness/use)

Scientific enterprise
Values, Ethics and Standards in Scientific
Research
Research is based on the same ethical
values that apply in everyday life, including
honesty, fairness, objectivity, openness,
trustworthiness, and respect for others.
A scientific standard refers to the
application of these values in the context of
research. Examples are openness in
sharing research materials, fairness in
reviewing grant proposals, respect for
one's colleagues and students, and honesty
in reporting research results.

Scientific misconduct
The most serious violations of standards have come to be known as
scientific misconduct. The U.S. government defines misconduct as
fabrication, falsification, or plagiarism (ffP) in proposing, performing, or
reviewing research, or in reporting research results.
Scientists who violate standards other than ffP are said to engage in
questionable research practices. Scientists and their institutions should
act to discourage questionable research practices (QRPs) through a broad
range of formal and informal methods in the research environment.
Fabrication is making up data or results.
Falsification is manipulating research materials, equipment, or processes,
or changing or omitting data or results such that the research is not
accurately represented in the research record.
Plagiarism is the appropriation of another person's ideas, processes,
results, or words without giving appropriate credit.
Questionable Research Practices: for example, deliberately dividing research
results into the least publishable units to increase the count of one's
publications.

Intellectual Property rights in Research
Discoveries made through scientific research can have great
value to researchers in advancing knowledge, to
governments in setting public policy, and to industry in
developing new products.
Researchers should be aware of this potential value and of
the interest of their laboratories and institutions in it, know
how to protect their own interests, and be familiar with the
rules governing the fair and proper use of ideas.
Intellectual Property rights:
Benefiting from a new idea may require establishing
intellectual property rights through patents and copyrights,
or by treating the idea as a trade secret. Intellectual
property is a legal right to control the application of an idea
in a specific context.
Patent: Control the Application of ideas
Copyright: Control the expression of ideas

Research methods vs Research methodology
Research methods: usually refers to specific activities designed to generate
data (questionnaires, interviews, focus groups, observation, experiments)
Research Methodology: is more about your attitude to and your
understanding of research and the strategy you choose to correctly

Why do research?
Research allows you to gain appreciation for the practical
applications of knowledge, and to step outside your
classroom and learn about the theories, tools, resources, and
ethical issues that scholars and professionals encounter on a
daily basis.
1. Fascination
2. Answer to unsolved problems
3. Gain insight into a particular issue
4. Develop many transferable skills for the benefit of the community
5. Personal satisfaction and achievement
Research is a method of making new ideas to be implemented into
practice and checking if it works.

Increasing Mind Power
NEGATIVE: Anger, Irritability, Jealousy, Blame, Complain, Anxiety, Inferiority, Ego personality
POSITIVE: Preoccupation/Work, Laughing therapy, Yoga, Music therapy, Read, Play

How to be Happy
Keep your heart free
from hate, your
mind from worry.
Live simply, expect
little, give much,
sing often, pray
always. Fill your life
with love, scatter
sunshine, forget

Stages of Scientific Knowledge

1. Explorative research
Undertaken when few or no previous studies exist. The aim is to look for
patterns or hypotheses that can be tested and will form the basis for
further research: Meta-analysis

2. Descriptive research
We describe the data generating process. Description would answer questions
such as: 1. What is the range of prostate volumes (ml) for a sample of urology
patients? 2. What is the difference in average volume between patients
with negative biopsy results and those with positive results?

3. Explanation or analytical research
We seek to infer characteristics of the data generating process. Inference
would answer questions such as: for a sample of patients with prostate
problems, can we expect the average volume of patients with positive
biopsy results to be less than that of patients with negative biopsy results?

4. Predictive research
We seek to make predictions about a characteristic of the data generating
process. Such prediction would answer questions such as: on the basis of a
patient's negative digital rectal examination, Prostate Specific antigen

Internal Validity
Internal validity is a crucial measure in quantitative studies, where it
ensures that a researcher's experiment design closely follows the
principle of cause and effect.
The researcher can eliminate almost all of the
potential confounding variables and set up strong controls to
isolate other factors.

External Validity

External validity is one of the most difficult of the validity types to achieve,
and is at the foundation of every good experimental design.
External validity asks the question of generalizability: To what
populations, settings, treatment variables and measurement
variables can this effect be generalized?

Confounding: Confounding is the distortion of the
effect of one risk factor by the presence of another.
Confounding occurs when another risk factor for a
disease is also associated with the risk factor being
studied but acts separately.
Confounding can be controlled by
restriction, by matching on the
confounding variable or by including it in
the statistical analysis.
Bias (Systematic Error): Any process or
effect at any stage of a study from its design to
its execution to the application of information
from the study, that produces results or
conclusions that differ systematically from the
truth.

Extraneous variables
Extraneous variables are those having a relationship with the
main variables (X and Y).
Confounder (Z)
Mediator (Z)
Moderator: a variable (Z) such that X and Y have a different
relationship at different levels of Z
Suppressor (Z)
Covariate
[Figure: path diagrams linking X, Y and Z for each type of extraneous variable]

Decision theory and Statistics

Decision theory is the theory of decision making. The subject is
not a unified one. Decision theory is concerned with
goal-directed behavior in the presence of options.

Statistics for decision making
Data analysis
Statistical tests
Predictions
Simulations

Accuracy and Precision


Accuracy is the
degree of veracity
(adherence to
truthfulness)
Precision is the
degree of
reproducibility
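
The distinction between accuracy and precision can be sketched numerically. The snippet below is only an illustration: the reference value and both sets of readings are hypothetical, and summarising accuracy by the mean deviation from the reference and precision by the sample standard deviation is one common convention, not the only one.

```python
from statistics import mean, stdev

TRUE_VALUE = 10.0  # hypothetical known reference value

# Hypothetical repeated readings from two instruments
instrument_a = [10.1, 9.9, 10.0, 10.2, 9.8]    # centred on the truth, more scatter
instrument_b = [11.0, 11.1, 10.9, 11.0, 11.0]  # tightly grouped, but offset


def bias(readings, true_value):
    """Accuracy summary: mean deviation of the readings from the true value."""
    return mean(readings) - true_value


def spread(readings):
    """Precision summary: sample standard deviation of repeated readings."""
    return stdev(readings)


bias_a, sd_a = bias(instrument_a, TRUE_VALUE), spread(instrument_a)
bias_b, sd_b = bias(instrument_b, TRUE_VALUE), spread(instrument_b)

print(f"A: bias={bias_a:+.3f}, sd={sd_a:.3f}")  # accurate, less precise
print(f"B: bias={bias_b:+.3f}, sd={sd_b:.3f}")  # precise, less accurate
```

In this made-up example instrument A is the more accurate one and instrument B the more precise one, which is why the two properties need to be assessed separately.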

Waiting hurts. Forgetting hurts. But not knowing which decision to take can
sometimes be the most painful...

Decision based on
sample data

Decision
making
system

Testing Surveillance System

Criteria: surveillance system result (rows) against actual condition in the population (columns)

Surveillance system    Actual: infection present        Actual: infection absent
Infection present      True Positive                    False Positive (Type II error)
Infection absent       False Negative (Type I error)    True Negative

Conclusion (rows) against Reality (columns)

Conclusion               Reality: effect does not exist    Reality: effect exists
Effect does not exist    Correct Decision                  Type 2 Error
Effect exists            Type 1 Error                      Correct Decision

How to control errors


Definition of population
Study design
Research hypothesis
Null hypothesis
Alternative hypothesis
One tailed/two tailed study
Sample size
Sample selection
Randomization
Blinding
Data collection procedures
Data management procedures
Suitable statistical method/s
Interpretations

Testing Surveillance System

Criteria: surveillance system result (rows) against actual condition in the population (columns)

Surveillance system    Actual: infection present (Ho)        Actual: infection absent (H1)
Infection present      True Positive (TP)                    False Positive (FP, Type II error)
Infection absent       False Negative (FN, Type I error)     True Negative (TN)

Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)
PPV = TP/(TP+FP)
NPV = TN/(TN+FN)
Accuracy = (TP+TN)/N
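
As a minimal sketch, the formulas above can be computed directly from the four cells of the 2x2 table. The function name `diagnostic_metrics` and the counts used below are hypothetical and chosen only for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return the summary measures defined above from 2x2 table counts."""
    n = tp + fp + fn + tn  # total number of units tested
    return {
        "sensitivity": tp / (tp + fn),  # detected among truly infected
        "specificity": tn / (tn + fp),  # cleared among truly uninfected
        "ppv": tp / (tp + fp),          # truly infected among system positives
        "npv": tn / (tn + fn),          # truly uninfected among system negatives
        "accuracy": (tp + tn) / n,      # all correct calls over all units
    }


# Hypothetical counts for 1000 animals
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the system has high sensitivity and specificity, yet the PPV is noticeably lower because the infection is uncommon in the tested population.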

infection or disease; or to detect an exotic or new disease
so that control action can be quickly instituted
Monitoring: Systematic collection, analysis and
dissemination of information about the level (occurrence,
incidence and prevalence) of infections or disease that
are known to occur in a specified population
Surveys: the tools for data collection.
An investigation using the systematic collection of
information from a population that is not under the control of the
investigator.
Passive Surveillance: passive or general surveillance typically
takes the form of a disease reporting system. If a producer notices
a disease problem, this is reported and recorded in a systematic
manner.
Active Surveillance: uses structured disease surveys to collect
high-quality disease information quickly and cheaply.
Representative Sample: one that is similar to the population.
Inference is valid only when a representative sample is chosen.
Random Sample: Every element in the population has the same
probability of being selected in the sample.

Universal TRUTH we learnt


sun rises in the east
Fact:sun neither rises nor sets, only
earth rotates
Moral:
Education spoils our

Applying Inclusion and Exclusion criteria
for defining study population
All clinical trials have guidelines about who can participate; these
are specified in the inclusion/exclusion criteria:
Factors that allow someone to participate in a clinical trial are
"inclusion criteria"
Factors that exclude or do not allow participation in a clinical
trial are "exclusion criteria"
These factors may include: age, gender, the type and stage of
disease, previous treatment history, specific lab values, other
medical conditions.
Inclusion and exclusion criteria are not used to reject people
personally. The criteria are used to:
1. Identify appropriate participants
2. Keep them safe
3. Help ensure that researchers can answer the questions they
want answered
One of the crucial components of successful research or a trial is the

Testing as a Scientific Knowledge
1. Specify clearly, completely and unequivocally the question you are asking
2. Identify, specify in detail and plan how to measure the variable(s) to answer that question
3. Review your definitions of population and sample and verify the appropriateness of generalization
4. Review the sampling scheme to obtain the data
5. Specify exactly the null and alternative hypotheses
6. Select the risks for Type I and Type II error
7. Choose the form of statistical test
8. Verify that your sample size is adequate to achieve the proposed statistical power
9. At this point, obtain your data
10. Identify, list and test possible biases
11. Perform the statistical test and form the conclusion
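
The step of checking that the sample size is adequate for the proposed power can be sketched for one simple case. The slides do not give a formula, so the code below assumes a two-sided comparison of two group means with a common known standard deviation and uses the usual normal-approximation formula n = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2 per group. The function name and the numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a mean difference `delta`.

    Assumes two equal-sized groups, a common standard deviation `sigma`,
    a two-sided test at level `alpha`, and the normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for Type I risk
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical planning values: SD of 10 units, clinically relevant difference of 5
n = n_per_group(sigma=10, delta=5)
print(n)  # 63 per group under these assumptions
```

Lowering the Type II risk (higher power) or looking for a smaller difference increases the required sample size, which is why the risks are chosen before the data are obtained.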

Study designs

In many ways the design of a study is more important
than the analysis. A badly designed study can never
be retrieved, whereas a poorly analysed one can
usually be re-analysed.

Consideration of design is also important
because the design of a study will govern how the
data are to be analysed.

Study Designs
Observational Studies: Nature determines who is
exposed to the factor of interest and who is not
Cross-Sectional studies
Case-Control studies
Prospective
Experimental Study Designs: Investigator
determines who is exposed
Correlation studies and Modeling:
Feasibility study:
In vitro studies, case series studies, pilot
studies

Case series
Descriptive account of an interesting
characteristic
In one patient
In a small group of patients
Usually involves patients seen over a short
period of time
Does not involve controls
No research hypothesis
Leads to formulation of hypotheses, other
types of studies

Cross sectional studies
Analyze data collected at a single point in time
Provide information on status quo (e.g. prevalence of a condition, or
disease characteristics)
Quick to complete, cheap
Cannot examine outcomes
May lead to biased conclusions about disease progression

Case control studies
Longitudinal, retrospective design
Starts with the outcome
Cases: those with the outcome
Controls: those without the outcome

Case-control advantages
Shorter, cheaper and useful to study rare diseases or
diseases that take a long time to manifest, or to
explore preliminary hypotheses
Case-control disadvantages
Difficult to control for bias
May depend entirely on quality of existing records
Can be difficult to designate an appropriate control group

Advantages/Disadvantages of Case-Control study

Advantages
Quick
Require reasonably small numbers
Reasonably economical
Sensible for study of rare disease
No loss to follow-up
Can test current hypothesis
Consistency of measurement easily maintained

Disadvantages
Uncertain if exposure preceded disease
Potential for recall bias
Selection bias (recruitment influenced by exposure)
Unable to estimate disease incidence
Case control studies are less reliable than either randomized
controlled trials or cohort studies.

RETROSPECTIVE STUDY

Experimental Design
The investigator determines who is exposed. These designs
may prove causation.
Determine causes:
True experimental design is regarded as the most accurate form of
experimental research, in that it tries to prove or disprove a
hypothesis mathematically, with statistical analysis.
A double blind experiment is an experimental method used
to ensure impartiality, and avoid errors arising from bias.
Quasi Experimental Design: where lack of randomization

Prospective Study designs
The most powerful studies are prospective
studies, and the paradigm for these is the
randomised controlled trial. In this, subjects
with a disease are randomised to one of two
(or more) treatments, one of which may be a
control treatment.
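
Randomising subjects to treatments can be sketched in a few lines. The slides do not prescribe a particular scheme, so the example below assumes permuted-block randomisation, one common choice that keeps the arms balanced as recruitment proceeds. The function name, arm labels, block size and seed are all hypothetical.

```python
import random


def block_randomize(n_subjects, block_size=4, arms=("treatment", "control"), seed=2024):
    """Allocate subjects to arms in shuffled blocks so group sizes stay balanced."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    allocation = []
    while len(allocation) < n_subjects:
        # each block contains every arm equally often, in random order
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_subjects]


allocation = block_randomize(20)
print(allocation.count("treatment"), allocation.count("control"))
```

In a real trial the allocation list would be generated and held by someone independent of recruitment, so that the next assignment cannot be predicted by the investigators.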

Parallel Study designs


A parallel group design is one in which
treatment and control are allocated to different
individuals. To allow for the therapeutic effect of
simply being given treatment, the control may
consist of a placebo, an inert substance that is
physically identical to the active compound.

Randomized Controlled Study

1. Randomized controlled trials require one or more control
groups for purposes of comparison.
2. The selection of control groups depends on the objectives
of the study.
3. In the evaluation of traditional medicine,
a concurrent control group should be used.
The control groups may involve (not in order of priority):
well established treatment
non-treatment
different doses of the same treatment
sham or placebo treatment
full-scale treatment
minimal treatment
alternative treatment.

Cross-over Study designs


A crossover study is one in which two or more treatments
are applied sequentially to the same subject.
The advantages are that each subject then acts as their own
control and so fewer subjects may be required.
The main disadvantage is that there may be a carry over
effect in that the action of the second treatment is affected
by the first treatment.

Cohort studies
Cohort: a group of people who have something in common and who
remain part of a group over an extended period of time
Outcomes determined after follow-up: longitudinal, prospective
studies

COHORT STUDY

Advantages/Disadvantages of Cohort study

Advantages
Can collect exposure information as exposure happens
Can collect multiple different exposures
Exposure information can be relatively reliable
Can collect information as outcome happens

Disadvantages
Duration of study: may take decades to complete
Subjects must be followed over time
Losses potentially invalidate the study
Very expensive
Can you afford to wait decades for your answer?

Comparison of Case-Control and Cohort Study Design

Case-control works from outcome (or presence of disease) to
treatment (or exposure).
Cohort works from treatment (or exposure) to outcome (or
presence of disease).

Nested Case-Control Study

Cases of a disease that occur in a
defined cohort are identified and, for
each, a specified number of matched
controls is selected from among those
in the cohort who have not developed
the disease by the time of disease
occurrence in the case.

This potentially offers impressive reductions
in costs and efforts of data collection and
analysis compared with the full cohort
approach, with relatively minor loss in
statistical efficiency.

Case-Cohort studies
The case-cohort design is most useful in analyzing time to
failure in a large cohort in which failure is rare.
Covariate information is collected from all failures and a
representative sample of censored observations.
Sampling is done without respect to time or disease status,
and, therefore, the design is more flexible than a nested
case-control design.
Despite the efficiency of the methods, case-cohort designs
are not often used because of perceived analytic
complexity.

IMPROVED EXPERIMENTAL DESIGN

Shift effect or symptomatic effect
Slope effect or disease modifying effect

[Figure: WITHDRAWAL DESIGN. Performance (better upward) against time for ACTIVE and PLACEBO groups, showing the slope effect / disease modifying effect]

Blinding
To further eliminate bias, randomized trials are sometimes
"blinded" (also called masked).
Single-blinded trials are those in which participants do not
know which group they are in and therefore which
intervention they are receiving until the conclusion of the study.
Double-blinded trials are those in which neither the
participant nor the investigators know to which
group the participant has been assigned until the conclusion
of the study.
Triple-blinded trials are those in which the statistician is also blinded.

1. A study is done to examine the association
between a mother's education and risk of a
congenital heart defect in her offspring. The
investigator enrolls a group of mothers of babies
with birth defects and a group of mothers of
babies without birth defects. The mothers are
then asked a series of questions about their
education
1.Case series
2.Case-control study
3.Nested case-control study,
4.Prospective cohort study
5. Retrospective cohort study,
6. Randomized clinical trial,

2.A study on the association of coffee


consumption and performance on a
memory test randomly assigns half of the
enrolled subjects to drink coffee one hour
before taking the memory test and the
other half to not drink coffee one hour
before taking the memory test.
1. Case series
2. Case-control study
3. Nested case-control study,
4. Prospective cohort study
5. Retrospective cohort study,
6. Randomized clinical trial,
7.Cross-sectional study,
8.Ecological study and

3.A study examining the association


between meat consumption and heart
disease compares the average number of
kilograms of meat consumed per person
for 50 different countries to the incidence
rate of heart disease in the same 50
countries
1. Case series
2. Case-control study
3. Nested case-control study,
4. Prospective cohort study
5. Retrospective cohort study,
6. Randomized clinical trial,
7.Cross-sectional study,
8.Ecological study and

4.A study describes a group of hospital patients all of


whom suffer from migraine with aura and experienced an
ischemic stroke.
1. Case series
2. Case-control study
3. Nested case-control study,
4. Prospective cohort study
5. Retrospective cohort study,
6. Randomized clinical trial,
7.Cross-sectional study,
8.Ecological study and
9.Case cohort study

5.An investigator enrolls a group of healthy


individuals and distributes questionnaires to
collect information on sex and blood type. The
investigator then examines the association
between sex and blood type.
1. Case series
2. Case-control study
3. Nested case-control study,
4. Prospective cohort study
5. Retrospective cohort study,
6. Randomized clinical trial,
7.Cross-sectional study,
8.Ecological study and
9.Case cohort study

6. A researcher uses a database of medical
records to identify a group of retired factory
workers. He reviews each person's medical
records to follow their factory exposures over
time and see which of these subjects has
developed skin cancer in the past 25 years.
1. Case series
2. Case-control study
3. Nested case-control study,
4. Prospective cohort study
5. Retrospective cohort study,
6. Randomized clinical trial,
7.Cross-sectional study,
8.Ecological study and
9.Case cohort study

Boy: Where Are You Going?


Girl: For Suicide..
Boy: Then, Why so Much Make-Up?
Girl: You Idiot..!! Tomorrow My Photo
will Come In Newspaper..........

Superiority, Equivalence and Non-Inferiority studies


Superiority study:
A Trial or Research with the objective of showing that the
response to the investigational product is superior to a
comparative agent (Active or placebo control)

Significant results are good outcome

Equivalence study:
Research with the objective of showing that the difference
between control and study treatments is not large in either
direction. The investigational product is compared
to a reference treatment without the objective of showing
superiority.
Non-significant/significant results are good outcome

Non-inferiority study:
If the study objective is to demonstrate that

Measurement error
If more than one operator is used in a study,
then measurement (gauge) error has two
components of variance:

$\sigma^2_{total} = \sigma^2_{product} + \sigma^2_{gauge}$
$\sigma^2_{gauge} = \sigma^2_{repeatability} + \sigma^2_{reproducibility}$

Repeatability: $\sigma^2_{repeatability}$ - variance due to the measuring instrument
Reproducibility: $\sigma^2_{reproducibility}$ - variance due to different operators
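
As a small numerical sketch of this decomposition: gauge variance is taken as the sum of the repeatability (instrument) and reproducibility (operator) variances, and total variance as product variance plus gauge variance. The function names and the variance values are hypothetical; in practice the components would be estimated from a designed gauge study.

```python
def gauge_variance(var_repeatability, var_reproducibility):
    """Measurement (gauge) variance: instrument plus operator components."""
    return var_repeatability + var_reproducibility


def total_variance(var_product, var_gauge):
    """Observed variance: true product variation plus measurement variation."""
    return var_product + var_gauge


# Hypothetical variance components (squared measurement units)
var_gauge = gauge_variance(var_repeatability=0.04, var_reproducibility=0.01)
var_total = total_variance(var_product=1.00, var_gauge=var_gauge)
gauge_share = var_gauge / var_total  # fraction of observed variance due to measuring

print(f"gauge variance = {var_gauge:.2f}")
print(f"total variance = {var_total:.2f}")
print(f"share due to measurement = {gauge_share:.1%}")
```

A small share indicates that most of the observed variation reflects real differences between the items measured rather than the measuring process.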

What is Hypothesis testing?

A statistical hypothesis is an assumption about a population
parameter. This assumption may or may not be true. Hypothesis
testing refers to the formal procedures used by statisticians to
accept or reject statistical hypotheses.
Statistical Hypothesis
The best way to determine whether a statistical hypothesis is true
would be to examine the entire population. Since that is often
impractical, researchers typically examine a random sample from
the population. If sample data are not consistent with the
statistical hypothesis, the hypothesis is rejected.
Null hypothesis. The null hypothesis, denoted by H0, is usually the
hypothesis that sample observations result purely from chance.
Alternative hypothesis. The alternative hypothesis, denoted by H1
or Ha, is the hypothesis that sample observations are influenced by
some non-random cause.

Example
For example, suppose we wanted to determine whether a coin was fair
and balanced. A null hypothesis might be that half the flips would result in
Heads and half, in Tails. The alternative hypothesis might be that the
number of Heads and Tails would be very different. Symbolically, these
hypotheses would be expressed as
H0: P = 0.5
Ha: P ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails.
Given this result, we would be inclined to reject the null hypothesis. We
would conclude, based on the evidence, that the coin was probably not
fair and balanced.
A hypothesis test can have one of two outcomes: reject the null
hypothesis or fail to reject the null hypothesis.
Note the distinction between "acceptance" and "failure to reject":
Acceptance implies that the null hypothesis is true.
Failure to reject implies that the data are not sufficiently persuasive for
us to prefer the alternative hypothesis over the null hypothesis.
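
The coin example (40 Heads in 50 flips) can be checked with an exact binomial calculation. This is a sketch rather than the slides' own method: it assumes a two-sided test in which the p-value is the total probability, under H0: P = 0.5, of all outcomes no more likely than the one observed. The function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(heads, flips, p=0.5):
    """Exact two-sided binomial p-value under the null probability `p`."""
    # probability of every possible number of heads if H0 is true
    probs = [comb(flips, k) * p**k * (1 - p) ** (flips - k) for k in range(flips + 1)]
    observed = probs[heads]
    # add up outcomes at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties)
    return sum(q for q in probs if q <= observed * (1 + 1e-9))


p_value = binomial_two_sided_p(heads=40, flips=50)
print(f"p-value = {p_value:.2e}")
```

The resulting p-value is far below conventional significance levels, which matches the conclusion that the coin is probably not fair and balanced.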

Null hypothesis/Alternative hypothesis


The null hypothesis, H0, represents a theory that has been put
forward, either because it is believed to be true or because it
is to be used as a basis for argument, but has not been
proved. For example, in a clinical trial of a new drug, the null
hypothesis might be that the new drug is no better, on
average, than the current drug. We would write
H0: there is no difference between the two drugs on average.
We give special consideration to the null hypothesis. This is due
to the fact that the null hypothesis relates to the statement
being tested, whereas the alternative hypothesis relates to
the statement to be accepted if / when the null is rejected.
The final conclusion once the test has been carried out is always
given in terms of the null hypothesis. We either "Reject H0 in
favour of H1" or "Do not reject H0"; we never conclude
"Reject H1", or even "Accept H1".
If we conclude "Do not reject H0", this does not necessarily
mean that the null hypothesis is true, it only suggests that
there is not sufficient evidence against H0 in favour of H1.

Null Hypothesis: Example
Suppose we are testing the efficacy of a new drug on
patients with myocardial infarction (heart attack).
We divide the patients into two groups, drug and no drug,
according to good design procedures,
and use as our criterion measure mortality in the two
groups.
It is our hope that the drug lowers mortality,
but to test the hypothesis statistically, we have to set it up in a sort
of backward way.
We say our hypothesis is that the drug makes no difference, and
what we hope to do is to reject the "no difference"
hypothesis, based on evidence from our sample of patients.
This is known as the null hypothesis.

Alternative Hypothesis
We test this against an alternate hypothesis, known as HA , that the
difference in death rates between the two groups does not equal 0.
We then gather data and note the observed difference in
mortality between group A and group B.
If this observed difference is sufficiently different from zero, we
reject the null hypothesis.
If we reject the null hypothesis of no difference, we accept the alternate
hypothesis, which is that the drug does make a difference.
1. I will assume the hypothesis that there is no difference is true;
2. I will then collect the data and observe the difference between
the two groups;
3. If the null hypothesis is true, how likely is it that by chance
alone I would get results such as these?
4. If it is not likely that these results could arise by chance under
the assumption that the null hypothesis is true, then I will
conclude it is false, and I will accept the alternate hypothesis.
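
The four steps above can be sketched for the mortality example. The slides do not name a test, so this illustration assumes a two-sided two-proportion z-test with a pooled standard error, which is one standard choice for comparing death rates in two groups. The function name and the death counts are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z_test(deaths_a, n_a, deaths_b, n_b):
    """Two-sided z-test of H0: equal death rates in groups A and B."""
    p_a, p_b = deaths_a / n_a, deaths_b / n_b
    # Step 1: assume no difference, so pool both groups to estimate one rate
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    # Step 2: observed difference, expressed in standard-error units
    z = (p_a - p_b) / se
    # Step 3: chance of a difference at least this large if H0 is true
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical data: 30 deaths among 200 on the drug, 50 among 200 without it
z, p_value = two_proportion_z_test(30, 200, 50, 200)
print(f"z = {z:.2f}, p = {p_value:.4f}")

# Step 4: reject H0 if such results would be unlikely under it
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

With these made-up counts the observed difference would be unlikely under the null hypothesis at the 5% level, so the null hypothesis of no difference would be rejected.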

Why Do We Test the Null Hypothesis?


Suppose we believe that drug A is better than drug B in preventing death
from a heart attack.
Why don't we test that belief directly and see which drug is better, rather
than testing the hypothesis that drug A is equal to drug B?
The reason is that there is an infinite number of ways in which
drug A can be better than drug B, so we would have to test an
infinite number of hypotheses.
If drug A causes 10% fewer deaths than drug B, it is better. So first
we would have to see if drug A causes 10% fewer deaths.
If it doesn't cause 10% fewer deaths, but if it causes 9% fewer deaths, it is
also better.
Then we would have to test whether our observations are
consistent with a 9% difference in mortality between the two
drugs.

One tailed vs two tailed


A one-tailed test looks for an increase or
decrease in the parameter whereas
a two-tailed test looks for any change in
the parameter (which can be any
change- increase or decrease).
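
The practical difference between the two kinds of test can be sketched for a test statistic that follows a standard normal distribution under the null hypothesis. That distributional assumption, the function name and the value z = 1.8 are illustrative choices, not taken from the slides.

```python
from statistics import NormalDist


def tail_p_values(z):
    """One-tailed and two-tailed p-values for a standard normal statistic."""
    std_normal = NormalDist()
    upper = 1 - std_normal.cdf(z)       # one-tailed: looking for an increase
    lower = std_normal.cdf(z)           # one-tailed: looking for a decrease
    two_sided = 2 * min(upper, lower)   # two-tailed: any change
    return {"upper": upper, "lower": lower, "two_sided": two_sided}


p = tail_p_values(1.8)
print(f"one-tailed (increase): {p['upper']:.4f}")
print(f"two-tailed (any change): {p['two_sided']:.4f}")
```

For the same data the two-tailed p-value is twice the smaller one-tailed value, so a result can fall below 0.05 one-tailed while remaining above it two-tailed.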

One tailed or two tailed tests


When is a one-tailed test appropriate?
Because the one-tailed test provides more power to detect an effect, you
may be tempted to use a one-tailed test whenever you have a hypothesis
about the direction of an effect. Before doing so, consider the
consequences of missing an effect in the other direction. Imagine you
have developed a new drug that you believe is an improvement over an
existing drug. You wish to maximize your ability to detect the
improvement, so you opt for a one-tailed test. In doing so, you
fail to test for the possibility that the new drug is less effective
than the existing drug. The consequences in this example are
extreme, but they illustrate a danger of inappropriate use of a
one-tailed test.
So when is a one-tailed test appropriate? If you consider the
consequences of missing an effect in the untested direction and conclude
that they are negligible and in no way irresponsible or unethical,
then you can proceed with a one-tailed test.
When is a one-tailed test NOT appropriate?
Choosing a one-tailed test for the sole purpose of attaining significance is
not appropriate. Choosing a one-tailed test after running a two-tailed test that failed to reject the null hypothesis is not
appropriate, no matter how "close" to significant the two-tailed test

Examples
We could use a one-tailed test to see if
the stream has a higher pH than one
year ago, for which we would use the
alternate hypothesis HA: prev < current.
However, we may want a more rigorous
test, with the alternate hypothesis HA: prev ≠
current. This covers both directions,
prev < current and prev > current, so that
rejecting the null hypothesis indicates
a significant difference between
the means in either direction.

Factorial Designs
In this experiment, the process engineer's goal is to determine how the yield
of an adhesive application process can be improved by adjusting three
(3) process parameters: mixture ratio, curing temperature, and curing
time. For each of these input parameters, two levels will be defined for
use in this 2-level experiment. For the mix ratio, the high level is set at
55%, while the low level is set at 45%. For the curing temp., the high
level is set at 150 deg C while the low level is set at 100 deg C. For the
curing time, the high level is set at 90 minutes, while the low level is set
at 30 minutes. As mentioned, the output response monitored is process
yield. Assume further that the data were gathered by performing just a
single replicate (n=1) per combination treatment.
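
The treatment combinations of this 2-level, 3-factor experiment can be enumerated as a short sketch. The factor levels are those given above; the dictionary keys and variable names are hypothetical labels, and the yield values are not included because the slides do not list them.

```python
from itertools import product

# Low and high level of each process parameter, as stated in the example
factors = {
    "mix_ratio_pct": (45, 55),
    "cure_temp_c": (100, 150),
    "cure_time_min": (30, 90),
}

# Full factorial: every combination of one level per factor (2 x 2 x 2 runs)
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for number, run in enumerate(runs, start=1):
    print(number, run)
```

With a single replicate per combination this gives eight runs in total, one yield measurement for each row.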

ADVANTAGES
Factorial designs are the
ultimate designs of choice
whenever we are interested in
examining treatment
variations.
Factorial designs are
efficient. Instead of
conducting a series of
independent studies, we
are effectively able to
combine these studies into

P value
The probability value (p-value) of a
statistical hypothesis test is the
probability of getting a value of
the test statistic as extreme as or
more extreme than that observed
by chance alone, if the null
hypothesis H0 is true.
It is compared with the significance level,
which is the probability of wrongly
rejecting the null hypothesis if it
is in fact true.

1. Significant figures

+ Suggestive significance (P value: 0.05 < P < 0.10)
* Moderately significant (P value: 0.01 < P ≤ 0.05)
** Strongly significant (P value: P ≤ 0.01)

The Confidence Level


The confidence level is the statistical measure of the number of
times out of 100 that results can be expected to be within a
specified range.
For example, a confidence level of 90% means that results of an
action will probably meet expectations 90% of the time.
The basic idea described in the Central Limit Theorem is that when a
population is repeatedly sampled, the average of the sample values of an
attribute approaches the true population value. In other words, if the
confidence level is 95%, it means about 95 out of 100 samples will
have the true population value within the range of precision.
Degree of Variability.
Depending upon the target population and attributes under
consideration, the degree of variability varies considerably. The
more heterogeneous a population is, the larger the sample size is
required to get an optimum level of precision. Note that a

Measurements
Identity : each value on the measurement scale
has unique meaning
Magnitude: Values on the measurement scale
have an ordered relationship to one another. That
is, some values are larger and some are smaller.
Equal intervals. Scale units along the scale are
equal to one another. This means, for example,
that the difference between 1 and 2 would be
equal to the difference between 19 and 20.
A minimum value of zero. The scale has a true
zero point, below which no values exist.

Types of Measurements
1. Nominal Scale of Measurement
The nominal scale of measurement only satisfies the identity property
of measurement.
2. Ordinal Scale of Measurement
The ordinal scale has the property of both identity and magnitude
3. Interval Scale of Measurement
The interval scale of measurement has the properties of identity,
magnitude, and equal intervals.
4. Ratio Scale of Measurement
The ratio scale of measurement satisfies all four of the properties of
measurement: identity, magnitude, equal intervals, and a minimum
value of zero.

Normal distribution and Standard Deviation

No effect: d < 0.20
Mild effect: 0.20 < d < 0.50
Moderate effect: 0.50 < d < 0.80
Large effect: 0.80 < d < 1.20
Very large effect: d > 1.20

In statistics, an effect size is a measure of the strength of the
relationship between two variables in a statistical population, or a
sample-based estimate of that quantity. An effect size calculated from
data is a descriptive statistic that conveys the estimated magnitude of a
relationship without making any statement about whether the apparent
relationship in the data reflects a true relationship in the population.

Effect size is simply the change in the scale from before to after
treatment, divided by the standard deviation at baseline.
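
A minimal sketch of this definition, together with the effect-size bands listed earlier on this slide. The scores are invented for illustration only, and the function names `effect_size` and `effect_label` are hypothetical. Values that fall exactly on a cut-off are assigned to the higher band here, which is an assumption because the slide leaves the boundaries open.

```python
from statistics import mean, stdev


def effect_size(before, after):
    """Change from before to after treatment divided by the baseline SD."""
    return (mean(after) - mean(before)) / stdev(before)


def effect_label(d):
    """Map an effect size onto the bands listed on the slide."""
    size = abs(d)
    if size < 0.20:
        return "No effect"
    if size < 0.50:
        return "Mild effect"
    if size < 0.80:
        return "Moderate effect"
    if size < 1.20:
        return "Large effect"
    return "Very large effect"


# Hypothetical scores on some scale, measured before and after treatment.
before = [10, 12, 14, 16, 18]
after = [12, 14, 16, 18, 20]

d = effect_size(before, after)
print(round(d, 3), effect_label(d))
```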

Question: "How to Kill an Ant??" Asked in an Exam for 10 Marks!!

Student:
Mix Chilli Powder with Sugar, & keep It Outside the Ant's
Hole..!
After eating, Ant will Search for some Water near a Water
tank.
Push ant in to it.. =!!

Now Ant will go to Dry itself Near Fire,


When it Reaches fire, Put a Bomb into D fire..!!
Then Admit Wounded Ant in ICU..!! =O
And Then Remove Oxygen Mask from it's Mouth and Kill the
Ant.. !! =|
MORAL:
Don't Play with Students.. !!

Sample Size
Determining the sample size to be selected is an important step in any
research study. For example let us suppose that some researcher wants to
determine prevalence of eye problems in school children and wants to
conduct a survey.
The important question that should be answered in all sample
surveys is "How many participants should be chosen for a survey"?
However, the answer cannot be given without considering the
objectives and circumstances of investigations.
The choosing of sample size depends on non-statistical
considerations and statistical considerations. The non-statistical
considerations may include availability of resources, manpower,
budget, ethics and sampling frame.
The statistical considerations will include the desired precision of
the estimate of prevalence and the expected prevalence of eye
problems in school children.
The Level of Precision: Also called sampling error, the level of precision
is the range in which the true value of the population is estimated to be.
This range is expressed in percentage points. Thus, if a researcher finds

Sample size: Estimation of Optimum Sample Size

Too small a sample size leads to more insignificant results.
Too many samples produce unnecessarily significant results and a
costly experiment.
Criteria for estimation of sample size
1. Type I error (0.05 or 5%)
2. Statistical power (0.80 or 80%)
3. Expected difference
4. Standard deviation

Sample size :
Dose Escalation
Dose limiting toxicity (DLT) must
be defined
Decide a few dose levels (e.g. 4)
At least three patients will be
treated on each dose level
(cohort)
Not a power or sample size
calculation issue

Dose Escalation
Enroll 3 patients
If 0/3 patients develop DLT
Escalate to new dose

If DLT is observed in 1 of 3
patients
Expand cohort to 6
Escalate if 3/3 new patients do not
develop DLT (i.e. 1/6 develop DLT)

Dose Escalation
Maximum Tolerated Dose (MTD)
Dose level immediately below the
level at which 2 patients in a
cohort of 3 to 6 patients
experienced a DLT

Usually go for safe dose


MTD or a maximum dosage that is
pre-specified in the protocol
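
The escalation rule on the preceding slides can be sketched as a small decision function. This is only an illustration of the rule as stated there (0/3 escalate, 1/3 expand to 6, two or more DLTs stop); the function name and the returned strings are hypothetical.

```python
def three_plus_three(dlt_first3, dlt_next3=None):
    """Decision for one dose level.

    dlt_first3: number of the first 3 patients with a dose limiting toxicity.
    dlt_next3:  number of the 3 additional patients with a DLT, or None if
                the cohort has not been expanded yet.
    """
    stop = "stop: MTD is the dose level immediately below"
    if dlt_first3 == 0:
        return "escalate to next dose"
    if dlt_first3 == 1:
        if dlt_next3 is None:
            return "expand cohort to 6"
        # 1/6 with DLT allows escalation; 2 or more in the cohort does not.
        return "escalate to next dose" if dlt_next3 == 0 else stop
    return stop


print(three_plus_three(0))      # no DLT in the first three patients
print(three_plus_three(1))      # one DLT: cohort must be expanded
print(three_plus_three(1, 0))   # 1/6 with DLT
print(three_plus_three(1, 1))   # 2/6 with DLT
```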

Phase II/III: Number of Patients to Enroll?
Ratio of two arms: 1:1, 1:1.5 or 1:2
Power of study: minimum 80.0%, or 1−β = 0.80
Difference of outcome
Standard deviation
One tailed/Two tailed
Type I error, α = 0.05/0.01

Sample size: Example Survey type

$$N = \frac{Z_{\alpha/2}^{2} \, P\,(1-P) \, D}{E^{2}}$$

Where P is the prevalence or proportion of the event of interest for
the study, and E is the precision (or margin of error) with which a
researcher wants to measure something. Generally E will be 10%
of P. $Z_{\alpha/2}$ is the normal deviate for a two-tailed alternative
hypothesis at level of significance α; for example, for 5% level of
significance $Z_{\alpha/2}$ is 1.96 and for 1% level of significance it is 2.58, as
shown in table 2.
D is the design effect, which reflects the sampling design used in the
survey type of study. This is 1 for simple random sampling and
higher values (usually 1 to 2) for other designs such as stratified,
systematic, cluster random sampling etc., estimated to
compensate for deviation from the simple random sampling
procedure. The design effect for cluster random sampling is
taken as 1.5 to 2.
For the purposive sampling, convenience or judgment sampling,

1. Sample size estimation for proportion in survey type of studies
Example: A researcher is interested to know the sample size for
conducting a survey for measuring the prevalence of obesity in a
certain community. Previous literature gives the estimate of
obesity at 20% in the population to be surveyed. Assuming a
95% confidence interval or 5% level of significance and 10%
margin of error, the sample size can be calculated as follows:

$$N = \frac{(1.96)^{2} \times 0.20 \times (1-0.20) \times 1}{(0.02)^{2}} = 1536.64 \approx 1537$$

for a simple random sampling design. Hence a sample size of
1537 is required to conduct a community based survey to
estimate the prevalence of obesity. Note: E is the margin
of error; in the present example it is 10% × 0.20 = 0.02.
To find the final adjusted sample size, allowing a non-response rate of 10% in the above example, the adjusted
sample size will be 1537/(1-0.10) = 1537/0.90 = 1708.
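
The obesity example can be reproduced with a few lines of code. This is a sketch under the assumptions stated on the slide (E taken as a fraction of P, normal deviates 1.96 and 2.58); the function names and the rounding-up choice are illustrative.

```python
import math

# Two-tailed normal deviates quoted in the text.
Z_TWO_TAILED = {0.05: 1.96, 0.01: 2.58}


def survey_sample_size(p, relative_precision=0.10, alpha=0.05, design_effect=1.0):
    """N = Z^2 * P * (1 - P) * D / E^2, with E = relative_precision * P."""
    e = relative_precision * p
    n = Z_TWO_TAILED[alpha] ** 2 * p * (1 - p) * design_effect / e ** 2
    return math.ceil(n)  # round up to whole participants


def adjust_for_nonresponse(n, nonresponse_rate):
    """Inflate the sample size so that n responders remain."""
    return math.ceil(n / (1 - nonresponse_rate))


n = survey_sample_size(p=0.20)
print(n)                                # 1537 for simple random sampling
print(adjust_for_nonresponse(n, 0.10))  # 1708 after 10% non-response
```

Passing `design_effect=1.5` or `2.0` would scale the result for a cluster design, in line with the range quoted for D.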

2. Sample size estimation with single group mean

$$N = \frac{Z_{\alpha/2}^{2} \, s^{2}}{d^{2}}$$

where s is the standard deviation obtained from a
previous study or pilot study,
d is the accuracy of estimate, or how close to the true
mean, and
$Z_{\alpha/2}$ is the normal deviate for a two-tailed alternative
hypothesis at level of significance α.
Example: A study for estimating the weight of a
population wants the error of estimation to be less
than 2 kg of the true mean (that is, expected difference of
weight to be 2 kg). The sample standard deviation was 5,
and with a probability of 95% (that is, at an error
rate of 5%) the sample size is estimated to be
$N = (1.96)^{2} (5)^{2} / 2^{2}$, which gives a sample of 24 subjects, if the
allowance of 10% for missing, losses to follow-up,

3. Sample size estimation with two means

$$N = \frac{(r+1)\,(Z_{\alpha/2} + Z_{1-\beta})^{2}\, \sigma^{2}}{r\, d^{2}}$$

Where $Z_{\alpha/2}$ is the normal deviate at α
level of significance ($Z_{\alpha/2}$ is 1.96
for 5% level of significance and
2.58 for 1% level of significance)
and $Z_{1-\beta}$ is the normal deviate at
(1−β)% power with β% of type II
error (0.84 at 80% power and
1.28 at 90% statistical power).
r = n1/n2 is the ratio of sample
sizes required for the two groups;
generally it is one, for keeping
equal sample sizes for the two groups.
r = 0.5 gives the sample size
distribution as 1:2 for two

Let us say a clinical
researcher wants to compare
the effect of two drugs, A and
B, on systolic blood pressure
(SBP). On literature search the
researcher found that the mean SBP
in the two groups were 120 and
132, with a common standard
deviation of 15. The total
sample size for the study with
r=1 (equal sample size), α=5%
and power at 80% and 90%
were computed as
24, and for 90% of statistical
power the sample size will be
32. In unequal sample size of
1:2 (r=0.5) with statistical
power of 90% at 5% level
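
The SBP example can be checked numerically. The sketch below assumes the two-means formula in its usual form, N = (r+1)(Zα/2 + Z1−β)² σ² / (r d²), with d the difference between the two means; the function name `two_means_n` is hypothetical. The unrounded values are 24.5 and about 32.8, which the slide reports as 24 and 32.

```python
def two_means_n(mean1, mean2, sd, z_alpha=1.96, z_power=0.84, r=1.0):
    """Unrounded sample size from the two-means formula.

    z_alpha: normal deviate for the significance level (1.96 for 5%).
    z_power: normal deviate for the power (0.84 for 80%, 1.28 for 90%).
    r:       ratio n1/n2 of the two group sizes.
    """
    d = mean1 - mean2
    return (r + 1) * (z_alpha + z_power) ** 2 * sd ** 2 / (r * d ** 2)


n_80 = two_means_n(120, 132, 15, z_power=0.84)
n_90 = two_means_n(120, 132, 15, z_power=1.28)
print(round(n_80, 2), round(n_90, 2))
```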

4. Sample size estimation with two proportions

$$N = \frac{\left[ Z_{\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + Z_{1-\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)} \right]^{2}}{(p_1 - p_2)^{2}}$$

Where $p_1$ and $p_2$ are the proportions of the event of
interest (outcome) for group I and group II, $\bar{p}$
is $(p_1+p_2)/2$, $Z_{\alpha/2}$ is the normal deviate at α level of significance
and $Z_{1-\beta}$ is the normal deviate at (1−β)% power with
β% of type II error; normally type II error is
considered 20% or less.
If the researcher is planning to conduct a study with
unequal groups, he or she must calculate N as if
using equal groups, and then calculate the
modified sample size. If r = n1/n2 is the ratio of
sample size in two groups, then the required

4. Sample size estimation with two proportions


Example: It is believed that the proportion of patients
who develop complications after undergoing one type of
surgery is 5% while the proportion of patients who
develop complications after a second type of surgery is
15%. How large should the sample be in each of the two
groups of patients if an investigator wishes to detect,
with a power of 90% , whether the second procedure has
a complications rate significantly higher than the first at
the 5% level of significance?
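
A sketch of how this question could be answered, assuming the usual normal-approximation formula for two proportions with the average proportion (p1 + p2)/2 in the first term. The slide does not show the worked answer, so the function name and the choice between a two-tailed deviate (1.96) and a one-tailed deviate (1.645) are assumptions to be settled by the investigator.

```python
import math


def two_proportions_n(p1, p2, z_alpha=1.96, z_power=1.28):
    """Unrounded per-group sample size for comparing two proportions."""
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return numerator / (p1 - p2) ** 2


# Complication rates of 5% and 15%, 90% power, 5% level of significance.
two_tailed = two_proportions_n(0.05, 0.15, z_alpha=1.96)
one_tailed = two_proportions_n(0.05, 0.15, z_alpha=1.645)
print(math.ceil(two_tailed), math.ceil(one_tailed))
```

Because the question asks whether the second rate is higher, a one-tailed deviate may be defensible, and it gives a smaller sample than the two-tailed calculation.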

5. Sample size estimation with correlation co-efficient
N

1
4

Z /2 Z1

1 r
log

Example: According to the literature, the correlation


between salt intake and systolic blood pressure is around
0.30. A study is conducted to test this correlation in a
population, with the significance level of 1% and power of
90%. The sample size for such a study can be estimated as
2
follows:
2.58

1.28

1
1 0.3
log e

4
1

0.3

the sample size for 90%


power at 1% level of
significance was 99 for
two tailed alternative
test and 87 for one

6. Sample size estimation with odds ratio

Sample size (per group) estimation: a rule of thumb
Two groups: n = 16 s²/d²
Three groups: n = 22 s²/d²
Four groups: n = 26 s²/d²
Where s is the within-group standard deviation and d is the
smallest difference of means

One tailed: 20% less
Pre-post design: 50% less
Cross-over design: 25% of two tailed

K:1 ratio for unequal sample size
Increase total sample size by (k+1)²/4k
Total sample size for two groups is 26
Want in 2:1
Then 26*9/8 ≈ 30 (20:10)

Factors affecting the sample size
1. The sample size increases as SD increases
2. The sample size increases as the significance level is made
smaller (<0.05)
3. The sample size increases as the required power increases
(>0.80)
4. The sample size increases with a decrease in difference
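
The rule-of-thumb multipliers and the k:1 adjustment can be sketched as follows. The multipliers 16, 22 and 26 and the factor (k+1)²/4k are taken from the slide; the function names and rounding up to whole subjects are illustrative assumptions.

```python
import math

# Multipliers for n = c * s^2 / d^2 per group, by number of groups.
MULTIPLIER = {2: 16, 3: 22, 4: 26}


def per_group_n(groups, s, d):
    """Rule-of-thumb sample size per group."""
    return math.ceil(MULTIPLIER[groups] * s ** 2 / d ** 2)


def unequal_total(total_equal, k):
    """Total sample size when groups are allocated k:1 instead of 1:1."""
    return math.ceil(total_equal * (k + 1) ** 2 / (4 * k))


print(per_group_n(groups=2, s=5, d=5))   # 16 per group when s equals d
total = unequal_total(26, k=2)           # 26 * 9 / 8 = 29.25, rounded up
print(total, "split as", total * 2 // 3, ":", total // 3)
```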

Sample Size calculation

n=111

n=141
n=69
for =0.70

Yoga teacher to a woman: Has


yoga any effect over your husbands
drinking habit?
Woman: Yes, An Amazing Funny
Effect !! Now he drinks the whole
bottle standing upside down over his
head.

Selection of Controls
Generally control: Cases ratio is 1:1
1.Historical controls: comparison of group
which was treated earlier period using
another form of therapy/intervention
2.Geographical control : Comparison with a
group treated elsewhere with a different
form of therapy /intervention
3.Volunteer control: May not be matched group
4.Concurrent control : Control group observed
simultaneously with the treated group
Placebo control vs Active control: A placebo is
an inactive pill, liquid or powder that has
no treatment value. Control is the standard by
which the experimental observations are
evaluated. In many clinical trials an

How many controls

Case-Control Study
Sample size calculation says n = 13 cases and controls are needed
Only have 11 cases!
Want the same precision
n0 = 11 cases
k·n0 = number of controls

$$k = \frac{n}{2 n_0 - n}$$

k = 13 / (2*11 − 13) = 13 / 9 = 1.44
k·n0 = 1.44*11 ≈ 16 controls (and 11 cases)
Same precision as 13 controls and 13 cases
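
The control-to-case ratio above is easy to compute directly. In this sketch the function name is hypothetical, and the guard for 2·n0 ≤ n is an added assumption: the formula gives no positive ratio when fewer than half the required cases are available.

```python
import math


def controls_per_case(n, n0):
    """k = n / (2*n0 - n): controls per case needed to keep the precision
    of a 1:1 design with n cases when only n0 cases are available."""
    if 2 * n0 <= n:
        raise ValueError("too few cases: no number of controls can compensate")
    return n / (2 * n0 - n)


k = controls_per_case(n=13, n0=11)
controls = math.ceil(k * 11)
print(round(k, 2), controls)
```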

Question by a student !!
If a single teacher can't
teach us all the subjects,
Then...
How could you expect a single
student
to learn all subjects ?

Randomization
An important aspect of any research,
which should be clearly stated in
the final report, is the method used
to assign treatments (or other
interventions) to participants.
Random assignment has been used
for more than 50 years and is the
preferred method of assignment.
Randomization eliminates the source
of bias in treatment assignment.
Subjects in the various groups should not
differ in any systematic way.
If treatment groups are systematically
different, research results will be
biased.
Inadequate randomization can
overestimate the treatment effect by 40%.
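
A minimal sketch of random assignment to two arms. Shuffling the participant list and alternating arms is one simple way to obtain balanced groups; it is shown as an illustration, not as the method prescribed in the slides, and the function name, arm labels and seed are hypothetical.

```python
import random


def randomize(participants, arms=("A", "B"), seed=None):
    """Randomly assign participants to arms with (nearly) equal group sizes."""
    rng = random.Random(seed)       # a recorded seed keeps the list reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # After shuffling, alternate arms so the groups stay balanced.
    return {person: arms[i % len(arms)] for i, person in enumerate(shuffled)}


allocation = randomize(range(1, 11), seed=2024)
for person in sorted(allocation):
    print(person, allocation[person])
```

Recording the seed, or the generated list itself, is what allows the method of assignment to be stated clearly in the final report.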

Commonly used terms in research

Protocol

A protocol is a study plan on which all clinical
trials are based. The plan is carefully
designed to safeguard the health of the
participants as well as answer specific
research questions. A protocol describes
what types of people may participate in the
trial; the schedule of tests, procedures,
medications, and dosages; and the length of
the study. While in a clinical trial,
participants following a protocol are seen
regularly by the research staff to monitor
their health and to determine the safety and
effectiveness of their treatment.

Informed Consent form


Informed Consent An agreement signed by all volunteers
participating in a clinical research study, indicating their
understanding of:

(1)why the research is being done;


(2)what researchers hope to learn;
(3)what will be done during the trial and for how
long;
(4)what risks are involved;
(5)what, if any, benefits can be expected from
the trial;
(6)what other interventions are available;
(7) the participant's right to leave the trial at any
time.

Two principles of data analysis
Intent-to-treat Analysis (ITT), or Full Analysis data Set
Patients in a trial are assigned to one treatment group but,
for a variety of reasons, may receive another treatment
(withdrawal or failure to comply). If this occurs, subjects
should be analyzed as if they had completed the study in
their assigned treatment group. If the composition of the
treatment groups is altered, one negates the intention of a
randomized trial: to have a random distribution of
unmeasured characteristics that may affect the outcome
(confounders). Regardless of protocol deviations, subject
compliance or withdrawal, analysis is performed according
to the assigned treatment group. ITT admits non-compliance and
protocol deviations.
Per Protocol Analysis (PP), or Per Protocol Analysis data Set,
efficacy sample or evaluable sample
Only patients who sufficiently complied with the trial
protocol should be considered in the analysis. Compliance
covers exposure to treatment, availability of measurements
and absence of major protocol violations, also called as

Types of Clinical trials

Treatment trials: Testing new drugs/new approaches
Prevention trials: Better ways to prevent diseases. Vaccines, medicines, minerals etc
Diagnostic trials: Find better tests for diagnosing the diseases
Screening trials: Test the best way to detect certain diseases or health conditions
Quality of Life trials: Explore the ways to improve comfort and quality of life for
individuals with chronic illness

Phases of Clinical trials

Phase I
Researchers test an experimental drug or treatment in a
small group of healthy people (20-80) for the first time to
evaluate its safety, determine a safe dosage range, and
identify side effects.

Phase II
The experimental study drug or treatment is given to a
larger group of patients (100-300) to see if it is effective
and to further evaluate its safety.

Phase III
The experimental study drug or treatment is given to large
groups of patients (1,000-3,000) to confirm its
effectiveness, monitor side effects, compare it to
commonly used treatments, and collect information that
will allow the experimental drug or treatment to be used
safely.

Phase IV
Post-marketing studies delineate additional information
including the drug's risks, benefits, and optimal use.

Teacher fell asleep in class and a little naughty boy walked up


to him,
Little boy: "Teacher are you sleeping in class?"
Teacher: "No I am not sleeping in class."
Little boy: "What were you doing sir ?"
Teacher: "I was talking to God."
The next day the naughty boy fell asleep in class and the same
teacher walks up to him...
Teacher: "young man, you are sleeping in my class."
Little boy: "No not me sir, I am not sleeping."
Angry teacher: "What were you doing.??"
Little boy: "I was talking to God."
Angry teacher: "What did He say??"

Data management
Involves Screening for missing data
Is the Missing due to incomplete data
collection
Is the missing due to non-response
Is the pattern of missing is random
Data validity: Screening for data validity; wrong
entry etc
Outliers: Screening for outliers
Normality test:
Kolmogorov-Smirnov normality test
Shapiro-Wilk W test

Missing values, outliers and strategies

Since 1960, the principle of Intent-to-treat (ITT) has become widely accepted for the
analysis of controlled clinical trials. In this context, the question of how to
perform such an analysis in the presence of missing information about a main
endpoint is of major importance. 10-20% additional samples are required to
adjust for the drop-out rate.
Missing data can lead to a drastic increase in the Type I error and a substantial
decrease in the power of the study.
Type of Missing:
1. Missing completely at random (MCAR): when missing values are randomly
distributed across all observations. It can be confirmed by dividing respondents into
those with and without missing data, and using the t-test or chi-square test to establish
that the two groups do not differ significantly.
2. Missing at random (MAR): a condition which exists when missing values are not
randomly distributed across all the observations but are randomly distributed within
one or more subsamples.
Methods of estimating missing values
1. LOCF (Last observation carry forward)
2. Mean value
3. By Regression method
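
The first two estimation methods are simple enough to sketch in a few lines; the regression method is omitted. Missing values are represented by `None`, and the function names and example values are hypothetical.

```python
from statistics import mean


def locf(values):
    """Last observation carried forward: fill each gap with the previous value.

    Leading gaps stay missing because there is nothing to carry forward.
    """
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    observed_mean = mean(v for v in values if v is not None)
    return [observed_mean if v is None else v for v in values]


visits = [5, None, 7, None, None]   # hypothetical repeated measurements
print(locf(visits))
print(mean_impute([2, None, 4]))
```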

Data sheet on missing

SAMPLING
It is a scientific method of data collection.
The main principle behind sampling is that we seek knowledge about the total units(called
population) by observing a few units(called sample) and extend our inference about the
sample to the entire population.

DIFFERENCE BETWEEN SAMPLING AND CENSUS
IN CENSUS METHODOLOGY EACH AND EVERY ELEMENT OF THE UNIVERSE
IS CONTACTED,
WHEREAS IN SAMPLING METHODOLOGY FEW ELEMENTS ARE SELECTED
FROM THE UNIVERSE FOR THE RESEARCH.

NEED OF SAMPLING
1. POPULATION IN MANY CASES MAY BE SO LARGE AND SCATTERED THAT A
COMPLETE COVERAGE MAY NOT BE POSSIBLE.
2. IT OFFERS HIGH DEGREE OF ACCURACY BECAUSE IT DEALS WITH A SMALL
NUMBER OF PERSONS.
3. IN SHORT PERIOD OF TIME VALID AND COMPARABLE RESULTS CAN BE
OBTAINED
4. SAMPLING IS LESS DEMANDING IN TERMS OF REQUIREMENTS OF
INVESTIGATORS.
5. IT IS ECONOMICAL SINCE IT CONTAINS FEWER PEOPLE.

Role of Surveys in disease control

For the purpose of disease control, improving the health and
productivity of animals or aquatic animals and thereby the well-being of
the people, information is needed to
1. Identify what diseases exist in the country
2. Determine the level and location of diseases
3. Determine the importance of different diseases
4. Set priorities for the use of resources for disease control activities
5. Plan, implement, and monitor disease control programs
6. Respond to disease outbreaks
7. Meet reporting requirements of international organisations (OIE, WHO etc)
8. Demonstrate disease status for trading partners

Population, Sample and Sampling Method

Who is the target group for the study? This is called the study population.
Who in the target group should be surveyed? This is called the sample.
How many people should be surveyed? This is called the sample size.
How should the people to be surveyed be selected? This is called the sampling method.

PRINCIPLES OF SAMPLING
1. SAMPLE UNITS MUST BE CHOSEN IN A SYSTEMATIC AND OBJECTIVE
MANNER.
2. SAMPLE UNITS MUST BE CLEARLY DEFINED AND EASILY IDENTIFIABLE
3. SAMPLE UNITS MUST BE INDEPENDENT OF EACH OTHER.
4. SAME UNITS OF SAMPLE SHOULD BE USED THROUGHOUT THE
STUDY.
5. THE SELECTION PROCESS SHOULD BE BASED ON SOUND CRITERIA
AND SHOULD AVOID
ERRORS, BIAS AND DISTORTIONS.

Population

Sample

Convenience Sampling

Researcher selects units to be included based
on ease of obtaining them or simple availability

Volunteer Sampling

[Figure: volunteers telling the researcher "I'll do it!"]

Researcher uses only people who volunteer to participate
in the research

Network/Snowball Sampling

Researcher selects a few participants, who then
suggest others
who may be willing to participate

Simple Random Sampling

2, 6, 7, 12, 18

Each member of the population is listed in fashion (e.g., numerically


and then a sample is drawn by randomly selecting members of the p

Systematic/Sequential Random Sampling

Desired Sample Size: 5
Population Size: 20
Random Start: 2
Increment: 20/5 = 4

A random start in the sequence is selected, and the sample is selected by
selecting cases sequentially in the list to produce the desired sample size
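
The slide's example (population of 20, desired sample of 5, so an increment of 4) can be sketched as below. The start position of 2 is assumed for illustration; in practice it is drawn at random. The function name is hypothetical.

```python
import random


def systematic_sample(population, sample_size, start=None, seed=None):
    """Pick every k-th unit after a random start, where k = N // n."""
    step = len(population) // sample_size
    if start is None:
        # Random start position within the first interval (0-based index).
        start = random.Random(seed).randrange(step)
    return [population[start + i * step] for i in range(sample_size)]


units = list(range(1, 21))                      # units numbered 1..20
print(systematic_sample(units, 5, start=1))     # index 1 is unit number 2
```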

Cluster Sampling

Probability Sampling
1. Random Sampling

Random sampling is one of the most popular types
of random or probability sampling.

2. Stratified Sampling Method


Stratified sampling is a
probability sampling
technique wherein the
researcher divides the
entire population into
different subgroups or
strata, then randomly
selects the final subjects
proportionally from the
different strata.
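
A minimal sketch of proportional stratified sampling as described above. The strata, their sizes and the function name are hypothetical. Rounding each stratum's share can make the total differ from the requested size by a unit or so when the proportions are not whole numbers.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample from each stratum, proportional to its size.

    strata: dict mapping stratum name -> list of units in that stratum.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        share = round(sample_size * len(units) / total)
        sample[name] = rng.sample(units, share)
    return sample


# Hypothetical population: 60 urban and 40 rural households.
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}
chosen = stratified_sample(strata, sample_size=10, seed=7)
print({name: len(units) for name, units in chosen.items()})
```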

3. Systematic Sampling
Systematic sampling is a random sampling technique which is
frequently chosen by researchers for its simplicity and its
periodic quality.

Starting number: The researcher selects an integer that must be less


than the total number of individuals in the population. This integer will
correspond to the first subject.
Interval: The researcher picks another integer which will serve as the
constant difference between any two consecutive numbers in the

4. Cluster Sampling
In cluster sampling, instead of selecting all the subjects from
the entire population right off, the researcher takes several
steps in gathering his sample population.

5. Area Probability Sample


An area probability sample is one in which geographic areas are
sampled with known probability. While an area probability sample
design could conceivably provide for selecting areas that are
themselves the units being studied,
In survey research an area probability sample is usually one in
which areas are selected as part of a clustered or multi-stage
design. In such designs, households, individuals, businesses, or
other organizations are studied, and they are sampled within the
geographical areas selected for the sample.
Area sampling is basically multistage sampling in which maps,
rather than lists or registers, serve as the sampling frame. This is
the main method of sampling in developing countries where
adequate population lists are rare. The area to be covered is
divided into a number of smaller sub-areas from which a sample is
selected at random within these areas; either a complete
enumeration is taken or a further sub-sample.

6. Multi-stage Sampling
A multi-stage sample is one in which sampling is done
sequentially across two or more hierarchical levels, such as
first at the county level, second at the census tract level, third
at the block level, fourth at the household level, and ultimately
at the within-household level.
Single-stage samples include simple random sampling, systematic
random sampling, and stratified random sampling. In single-stage
samples, the elements in the target population are assembled into a
sampling frame; one of these techniques is used to directly select a
sample of elements.
In contrast, in multi-stage sampling, the sample is selected in
stages, often taking into account the hierarchical (nested)
structure of the population. The target population of elements
is divided into first-stage units, often referred to as primary
sampling units (PSUs), which are the ones sampled first. The
selected first-stage secondary .
IN THIS METHOD , SAMPLING IS SELECTED IN VARIOUS STAGES

7. MULTI-PHASE SAMPLING
IN THIS TYPE OF SAMPLING THE PROCESS IS SAME AS IN
MULTI-STAGE SAMPLING . IN THIS EACH SAMPLE IS
ADEQUATELY STUDIED BEFORE ANOTHER SAMPLE IS DRAWN
FROM IT.
IN MULTISTAGE SAMPLING ONLY THE FINAL SAMPLE IS STUDIED
WHEREAS
IN MULTI-PHASE SAMPLING ALL SAMPLES ARE RESEARCHED.

Summing up.
Hypothesis Tests
Statisticians follow a formal process to determine whether to
reject a null hypothesis, based on sample data. This process, called
hypothesis testing, consists of four steps.
1.State the hypotheses. This involves stating the null and
alternative hypotheses. The hypotheses are stated in such a way
that they are mutually exclusive. That is, if one is true, the other
must be false.
2.Formulate an analysis plan. The analysis plan describes how to
use sample data to evaluate the null hypothesis. The evaluation
often focuses around a single test statistic.
3.Analyze sample data. Find the value of the test statistic (mean
score, proportion, t-score, z-score, etc.) described in the analysis
plan.

Non-Probability Sampling:
1.Convenience Sampling

Convenience
sampling is a nonprobability sampling
technique where
subjects are selected
because of their
convenient
accessibility and
proximity to the
researcher.

2. Sequential Sampling Method


Sequential sampling is a non-probability sampling
technique wherein the researcher picks a single or a
group of subjects in a given time interval, conducts his
study, analyzes the results then picks another group of
subjects if needed and so on.

3. Quota Sampling
Quota sampling is a non-probability sampling technique wherein
the assembled sample has the same proportions of individuals
as the entire population with respect to known characteristics,
traits or focused phenomenon.

4. Judgmental Sampling
Judgmental sampling is a non-probability sampling technique where the
researcher selects units to be sampled based on their knowledge and
professional judgment.

5. Snowball sampling
Snowball sampling is a non-probability sampling technique that is used
by researchers to identify potential subjects in studies where subjects
are hard to locate.

Types of Snowball Sampling

Decision Errors
Two types of errors can result from a hypothesis test.
Type I error. A Type I error occurs when the researcher rejects a null
hypothesis when it is true. The probability of committing a Type I error is
called the significance level. This probability is also called alpha, and is
often denoted by α.
Type II error. A Type II error occurs when the researcher fails to reject a
null hypothesis that is false. The probability of committing a Type II error
is called Beta, and is often denoted by β. The probability of not
committing a Type II error is called the Power of the test.

Decision Rules
The analysis plan includes decision rules for rejecting the null
hypothesis.
P-value. The strength of evidence in support of a null hypothesis is
measured by the P-value. Suppose the test statistic is equal to S. The P-value is the probability of observing a test statistic as extreme as S,
assuming the null hypothesis is true. If the P-value is less than the
significance level, we reject the null hypothesis.
Region of acceptance. The region of acceptance is a range of
values. If the test statistic falls within the region of acceptance, the null
hypothesis is not rejected. The region of acceptance is defined so that
the chance of making a Type I error is equal to the significance level.
One tailed or two tailed
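
The P-value decision rule can be illustrated for a test statistic that follows the standard normal distribution (a z-score). This is a sketch under that assumption only; the function names are hypothetical, and the one-tailed branch assumes the observed statistic lies in the direction of the alternative hypothesis.

```python
from statistics import NormalDist


def z_test_p_value(z, two_tailed=True):
    """P-value for a z statistic under the null hypothesis."""
    tail = 1 - NormalDist().cdf(abs(z))   # area beyond |z| in one tail
    return 2 * tail if two_tailed else tail


def reject_null(p_value, alpha=0.05):
    """Decision rule: reject H0 when the P-value is below the significance level."""
    return p_value < alpha


for z in (1.5, 1.96, 2.58):
    p = z_test_p_value(z)
    print(z, round(p, 4), "reject H0" if reject_null(p) else "do not reject H0")
```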

Car driver: You cheated me. You sold me useless


radio.
Shopkeeper: No, I sold a good radio to you.
Car driver: Radio label shows "Made in Japan" but
radio says: This is all India Radio.

Preparation of masterchart or data sheet

CONSTRUCTION OF
SCALES

Construction of
questionnaire or scale
What a person knows (knowledge of
information)
What a person likes and dislikes (values and
preferences)
What a person thinks (opinions, attitudes,
beliefs, perceptions)
What experiences have taken place
(biography)
What is occurring at present (facts)

Likert scaling (1932)


Likert scales were developed in 1932 as the familiar five-point bipolar
response format most people are familiar with today. These scales always
ask people to indicate how much they agree or disagree, approve or
disapprove, believe to be true or false.

Frequency

Important

Quantity

Likert scaling format

Writing questions: key points


Anonymity and Confidentiality
An anonymous study is one in which nobody (not even the study directors)
can identify who provided data on completed questionnaires.
The Length of a Questionnaire: As a general rule, long questionnaires
get less response than short questionnaires (Brown, 1965; Leslie, 1970).
Color of the Paper : Berdie, Anderson and Neibuhr (1986) suggest that
color might make the survey more appealing.
Incentives: Many researchers have examined the effect of providing a
variety of nonmonetary incentives to subjects. These include token gifts
such as small packages of coffee, ball-point pens, postage stamps, or key
rings .
The "Don't Know", "Undecided", and "Neutral" Response Options
Response categories are developed for questions in order to facilitate the
process of coding and analysis. Respondents were more likely to choose the
"undecided" category when it was off to the side of the scale.
Question Wording

Steps to construct a scale


Phase I: Item generation
Face Validity
Content Validity
Phase II: Scale Development (pilot
study)
Construct validity
Criterion related validity
Reliability
Internal Consistency
Phase III: Scale Evaluation (Large scale
data collection)
Reliability
Internal Consistency
Discriminant validity

Table 1: Statements selected from various sources for face validity

Sources                              Number of statements    %
Previous literature
  Thesis                              40                      8.0
  Peer reviewed papers                40                      8.0
  Bulletins/Manuals/Annual report     30                      6.0
Experts consulted
  Professors                          75                     15.0
  Student Entrepreneurs               25                      5.0
  Entrepreneurs                       50                     10.0
Internet source
  Entrepreneurship websites          100                     20.0
  Online library                      15                      3.0
  Online journal                      20                      4.0
  Discussion forum                     5                      1.0
Others                               100                     20.0
Total                                500                    100.0

Table 2: Content validity by five experts for developing
Entrepreneurial skills questionnaire for graduate students

Description                                               Number of statements    %
Number of statements screened at face validity phase      250                    100.0
Number of statements evaluated by experts                 250                    100.0
Number of statements satisfied Aiken's Index >0.70        110                     44.0
Number of statements not satisfied Aiken's Index          140                     56.0
Number of statements considered for Pilot study           110                     44.0

Table 4: Criterion related validity

Entrepreneurial skills   Present study scale   Standard scale   Correlation   Significance
General                  11.17 ± 42.85         33.20 ± 5.99     0.528         <0.001**
Managerial               108.68 ± 41.25        33.20 ± 5.99     0.591         <0.001**
Manufacturing            93.68 ± 31.56         33.20 ± 5.99     0.522         <0.001**
Marketing                85.99 ± 33.02         33.20 ± 5.99     0.599         <0.001**
Total                    399.72 ± 132.38       33.20 ± 5.99     0.604         <0.001**

Table 3: Content and Construct validity by Item-Total correlation

Statements
1. I am a person who is ready to take responsibility
2. I want to be economically independent
3. I feel responsible for my mistakes and take corrections
4. I persevere till I can achieve my dream
5. I have a supportive network of friends, family and advisers
6. I'm flexible and able to take advice
7. I am a person who makes decisions within a reasonable time frame
8. I set goals & articulate a vision
9. It is important to me to make a mark in this life
10. I have self confidence and self esteem
11. I have a strong need to work independently
12. I don't start something without a clear vision and plan of action
13. Once I start a project I pursue it in spite of challenges

                                                 Factor loadings
Statement   Aiken's Index   Item-Total correlation   Factor 1   Factor 2   Factor 3   Factor 4
1           0.850           0.847                    0.848      0.291      0.265      0.243
2           0.850           0.858                    0.821      0.298      0.291      0.263
3           0.850           0.856                    0.833      0.312      0.263      0.256
4           0.750           0.850                    0.832      0.278      0.293      0.251
5           0.750           0.863                    0.827      0.317      0.261      0.277
6           0.750           0.858                    0.828      0.297      0.283      0.264
7           0.750           0.848                    0.824      0.290      0.283      0.255
8           0.750           0.855                    0.802      0.312      0.276      0.280
9           0.850           0.857                    0.823      0.312      0.261      0.273
10          0.850           0.850                    0.844      0.291      0.253      0.266
11          0.900           0.852                    0.831      0.274      0.266      0.291
12          0.900           0.846                    0.823      0.305      0.262      0.256
13          0.850           0.839                    0.843      0.277      0.263      0.248

Table 6: Exploratory factor analysis: Extraction and Rotation Sums of Squared Loadings

            Initial Eigenvalues              Extraction Sums of               Rotation Sums of
                                             Squared Loadings                 Squared Loadings (Varimax)
Component   Total   % of       Cumulative    Total   % of       Cumulative    Total   % of       Cumulative
                    Variance   %                     Variance   %                     Variance   %
1           74.83   68.03      68.03         74.83   68.03      68.03         27.49   24.99      24.99
2            7.95    7.23      75.26          7.95    7.23      75.26         23.83   21.66      46.65
3            6.58    5.98      81.24          6.58    5.98      81.24         22.15   20.13      66.78
4            5.27    4.79      86.03          5.27    4.79      86.03         21.18   19.25      86.03
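
The percentage columns in Table 6 are consistent with dividing each eigenvalue by the number of items. A minimal sketch follows, assuming the 110 pilot-study items were analysed on a correlation matrix so that the total variance equals 110. That assumption is inferred from the figures and is not stated in the table.

```python
def variance_explained(eigenvalues, n_items):
    """Return (percent, cumulative percent) lists for the given eigenvalues."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    percents = [100.0 * ev / n_items for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in percents:
        running += p
        cumulative.append(running)
    return percents, cumulative


# "Total" column of the initial eigenvalues in Table 6.
initial = [74.83, 7.95, 6.58, 5.27]
pct, cum = variance_explained(initial, n_items=110)

for i, (p, c) in enumerate(zip(pct, cum), start=1):
    print(f"component {i}: {p:.2f}% of variance, {c:.2f}% cumulative")
```

Intermediate cumulative values can differ from the table in the last digit, because the table appears to add percentages that were already rounded.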

Table 7: Test-retest reliability (stability) and Cronbach alpha (consistency) coefficient
based on pilot study

Entrepreneurial   Number     Max     Cronbach   Correlation   Reliability   P value    Remark
skills            of items   score   alpha                    Index
General           30         150     0.996      0.7381        0.8493        <0.001**   Highly reliable
Managerial        30         150     0.994      0.7610        0.8643        <0.001**   Highly reliable
Manufacturing     25         125     0.993      0.6652        0.7990        <0.001**   Very reliable
Marketing         25         125     0.990      0.7149        0.8337        <0.001**   Highly reliable
Total             110        550     0.996      0.7707        0.8705        <0.001**   Highly reliable
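
The Reliability Index column in Table 7 agrees, to within rounding, with 2r / (1 + r) applied to the Correlation column, which is the Spearman-Brown form. The table does not name the formula, so the sketch below treats it as an assumption and simply checks it against the reported values.

```python
def reliability_index(r):
    """Spearman-Brown style index 2r / (1 + r) for a correlation r."""
    if r <= -1:
        raise ValueError("r must be greater than -1")
    return 2 * r / (1 + r)


# (correlation, reported reliability index) pairs copied from Table 7.
table7 = {
    "General": (0.7381, 0.8493),
    "Managerial": (0.7610, 0.8643),
    "Manufacturing": (0.6652, 0.7990),
    "Marketing": (0.7149, 0.8337),
    "Total": (0.7707, 0.8705),
}

for skill, (r, reported) in table7.items():
    computed = reliability_index(r)
    print(f"{skill}: computed {computed:.4f}, reported {reported:.4f}")
```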

VALIDITY AND RELIABILITY
Validity refers to the accuracy or truthfulness of a measurement: are we measuring what we
think we are? "Validity itself is a simple concept, but the determination of the validity of a
measure is elusive."
Face validity is based solely on the judgment of the researcher.
Content validity: expert opinions, literature searches, and pretest open-ended questions help to
establish content validity.
Criterion-related validity can be either predictive or concurrent. Predictive validity refers to the
ability of an independent variable (or group of variables) to predict a future value of the dependent
variable. Concurrent validity is concerned with the relationship between two or more variables at
the same point in time.
Construct validity looks at the underlying theories or constructs that explain a phenomenon.
Reliability is synonymous with repeatability. A measurement that yields consistent results over
time is said to be reliable.
Test-retest reliability: the same instrument is administered to the same group of people at two
different points in time.
Equivalent-form technique: the researcher creates two different instruments designed to
measure identical constructs.
Split-half reliability: a measure of internal consistency.
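
Internal consistency is commonly summarised with Cronbach's alpha, the coefficient reported in Table 7. A minimal sketch under the standard formula alpha = k / (k - 1) x (1 - sum of item variances / variance of the total score), where k is the number of items. The response matrix is invented for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every respondent needs the same k >= 2 items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 4 items on a 1 to 5 scale.
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```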

Statistical Methods

OUTCOME        COMPARISON               PARAMETRIC                        NON-PARAMETRIC
MEAN/SD        ONE GROUP                ONE SAMPLE Z TEST                 RUN TEST
NO (%)         ONE GROUP                EXACT TESTS, CHI-SQUARE TEST, FISHER EXACT TEST
MEAN/SD        TWO GROUPS               STUDENT T TEST                    MANN-WHITNEY U TEST
MEAN/SD        TWO GROUPS (PRE-POST)    STUDENT T TEST (PAIRED)           WILCOXON SIGNED RANK TEST
NO (%)         TWO GROUPS               CHI-SQUARE TEST, FISHER EXACT TEST
MEAN/SD        THREE OR MORE GROUPS     ANOVA                             KRUSKAL-WALLIS TEST
NO (%)         THREE OR MORE GROUPS     CHI-SQUARE TEST, FISHER EXACT TEST
RELATIONSHIP                            PEARSON CORRELATION, REGRESSION   SPEARMAN CORRELATION

Some Important scales


The Beck Depression Inventory. The 1970 version of the Beck
Depression Inventory is a 21-item self-report inventory. Each item consists
of four alternative statements that represent gradations of a given
symptom rated in severity from 0 to 3. The scale is scored by summing the
item ratings; the total scores can range from 0 to 63. The instrument was
either self-administered by the patients or read aloud by one of the
research assistants.
The Hopelessness Scale. The Hopelessness Scale consists of 20 true-false
statements that assess the extent of pessimism. Each of the 20 items
is scored 1 or 0; the total score is the sum of the individual item scores. The
possible range of scores is from 0 to 20. The method of administration was
similar to the procedure used for the Beck scale.
The Scale for Suicide Ideation. This scale quantifies the severity of
current suicidal ideas and wishes. It was developed on the basis of
systematic clinical observations and interviews with suicidal patients. The
scale includes 19 items; each of these is composed of three choices that
range from 0 (least severe) to 2 (most severe) of the given construct. The
total score is computed by summing the item ratings; the scores can range
from 0 to 38. The items quantify the frequency and duration of suicidal
thoughts as well as the patients' attitudes toward them.
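
All three scales are scored by summing item ratings, so the stated score ranges follow directly from the number of items and the per-item range. A minimal sketch of that arithmetic: the item counts and ranges are taken from the descriptions above, while the function names and the validation behaviour are illustrative assumptions and not part of any official scoring manual.

```python
# (number of items, minimum item score, maximum item score) as described above.
SCALES = {
    "Beck Depression Inventory": (21, 0, 3),
    "Hopelessness Scale": (20, 0, 1),
    "Scale for Suicide Ideation": (19, 0, 2),
}


def score_range(scale):
    """Lowest and highest possible total score for a scale."""
    n_items, lo, hi = SCALES[scale]
    return n_items * lo, n_items * hi


def total_score(scale, item_scores):
    """Sum the item ratings after checking their count and range."""
    n_items, lo, hi = SCALES[scale]
    if len(item_scores) != n_items:
        raise ValueError(f"{scale} expects {n_items} items")
    if any(not lo <= s <= hi for s in item_scores):
        raise ValueError(f"{scale} items must lie between {lo} and {hi}")
    return sum(item_scores)


for name in SCALES:
    print(name, score_range(name))
```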

Japanese Prime Minister:


Give me Bihar for 3 years, we
will turn it into Japan.
Laloo: Give me Japan for 3
months, I will turn it into Bihar.

Fundamental Technique of Life table or Survival analysis
Analyses time to event
Time to death, time to relapse, recovery, pregnancy, receiving an organ
transplant, treatment failure
Uses the time of entry
Answers the question of the chance of survival after being diagnosed with
a disease or after beginning treatment

Example
A group of 200 subjects is followed for three years.
Deaths (events) occur throughout the three years.
What is the chance of surviving to the end of three years?

Interval   It    dt   qt     pt     Pt
1          200   20   0.1    0.9    1.0
2          180   30   0.17   0.83   0.9
3          150   40   0.27   0.73   0.747

Cumulative survival at the end of interval 3: 0.73 x 0.747 = 0.545

It: Number alive at the beginning of time interval t
dt: Number of deaths during the time interval
qt = dt/It: Probability of dying during the time interval
pt = 1 - qt: Probability of surviving the time interval
Pt: Cumulative probability of survival at the beginning of the time interval, that is, at
the end of the previous time interval
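
The life-table arithmetic above can be sketched in a few lines. The function name `life_table` and the dictionary layout are illustrative choices. Carried at full precision, the three-year survival works out to 110/200 = 0.55; the 0.545 in the worked example comes from multiplying the rounded values 0.73 and 0.747.

```python
def life_table(n_start, deaths):
    """Build life-table rows and return them with the final cumulative survival."""
    rows = []
    alive = n_start
    cum_surv = 1.0  # Pt at the beginning of the first interval
    for t, d in enumerate(deaths, start=1):
        if alive <= 0 or d > alive:
            raise ValueError("deaths cannot exceed the number still alive")
        q = d / alive   # qt: probability of dying during the interval
        p = 1 - q       # pt: probability of surviving the interval
        rows.append({"interval": t, "It": alive, "dt": d,
                     "qt": q, "pt": p, "Pt": cum_surv})
        cum_surv *= p   # becomes Pt for the next interval
        alive -= d
    return rows, cum_surv


# Example from the text: 200 subjects, deaths of 20, 30 and 40 in years 1 to 3.
rows, survival_3y = life_table(200, [20, 30, 40])
for r in rows:
    print(f"{r['interval']}  It={r['It']}  dt={r['dt']}  "
          f"qt={r['qt']:.2f}  pt={r['pt']:.2f}  Pt={r['Pt']:.3f}")
print(f"chance of surviving three years = {survival_3y:.3f}")
```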

Meta-analysis
There are about 40,000 journals for the sciences, and researchers are filling those journals
at the rate of one article every 30 seconds.
As results accumulate, it becomes increasingly difficult to understand what they tell us and
to find the knowledge in this flood of information.
Meta-analysis is a rapidly expanding area of research that has been
relatively underutilized in Animal Nutrition and Physiology.
Meta-analysis is also a form of evidence-based research.
Meta-analysis is an analysis of analyses: the statistical analysis of a
large collection of results from individual studies for the purpose
of integrating the findings.

Meta-analysis aims to quantitatively combine the results of different studies.
Pooled estimates should be a weighted average of all studies included in the meta-analysis.
Pooling increases the statistical power.
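
One common way to form the weighted average is to weight each study by the inverse of its variance (a fixed-effect model). The text does not say which weights are meant, so the weighting scheme below is an assumption, and the effect sizes and standard errors are invented for illustration.

```python
import math


def pooled_estimate(effects, std_errors):
    """Fixed-effect pooled estimate and its standard error (inverse-variance weights)."""
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes and standard errors from three studies.
effects = [0.30, 0.10, 0.25]
std_errors = [0.10, 0.20, 0.05]

pooled, pooled_se = pooled_estimate(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which combining studies increases statistical power.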


Clinical Study Protocol


Administrative Structure
Sponsor Signature page
Investigator Signature Page
Facilities
1.0 List of abbreviations and definitions of terms
2.0 Introduction
2.1 Background
2.1.1 Non clinical Summary
2.1.2 Clinical Summary
2.2 Benefits and Risks
2.2.1 Benefits and Risks of the study
2.2.2 Benefits and Risks of the study Drug
2.3 Rationale for the study

3.0 Study Objectives


3.1 Primary objectives
3.2 Secondary objectives
3.3 Exploratory Objectives
4.0 Investigational Plan
4.1 Study design
4.2 Blinding
4.3 Randomization/ treatment Groups
4.4 Study endpoints
4.4.1 Pharmacokinetic endpoints
4.4.2 Pharmacodynamic endpoints
4.4.3 Efficacy endpoints
4.4.4 Safety Endpoints
4.4.5 Immunogenicity Endpoints
4.5 Duration of the Study
4.5.1 Part 1 treatment and Evaluation
4.5.2 Part 2 Recovery and follow-up

4.6 Selection of Study population


4.6.1 Inclusion Criteria
4.6.2 Exclusion Criteria
4.6.3 Protocol Required Restrictions
4.6.4 Patient Withdrawal
4.6.5 Screen failure or Rescreening
4.7 Study Assessments
4.7.1 Clinical laboratory Evaluations
4.7.2 Vital Signs
4.7.3 Physical Examination
4.7.4 Electrocardiogram
4.7.5 Chest X-ray and QuantiFERON-TB Gold Test
4.8 Study Periods
4.8.1 Part 1 Treatment and evaluation
4.8.1.1 Screening
4.8.1.2 Randomization Visit 1 (week 0), Day 1
4.8.1.3 Evaluation Visit 2, Visit 3, Visit 4.
4.8.3 End-of-study

5.0 Study treatments


5.1 Treatment administered
5.2 Investigational products
5.3 Packaging and Labelling
5.4 Storage
5.5 Preparation
5.6 Study Medication Accountability
5.7 Prior and concomitant Treatments
5.8 Rescue Therapy
5.8.1 Prohibited Concomitant Medications
5.8.2 Allowed Medications
5.9 Treatment Compliance
6.0 Pharmacokinetic, Pharmacodynamic, efficacy outcome and
immunogenicity assessment
6.1 Pharmacokinetics and Pharmacodynamics
6.1.1 Pharmacokinetics
6.1.2 Pharmacodynamics

6.2 Efficacy Outcomes
6.2.1 Health assessment questionnaire Disability index
6.3 Immunogenicity
6.4 Appropriateness of Measurements
7.0 Safety
7.1 Drug associated adverse events- warnings, precautions and treatment
recommendations
7.2 Adverse Events
7.2.1 Definitions and general guidelines
7.2.2 Clinical laboratory abnormalities and other abnormal assessments
7.2.3 Pregnancy
7.2.4 Safety monitoring

8.0 Statistics
8.1 Determination of sample size
8.2 Statistical Methods
8.2.1 Interim Analysis
8.2.2 Efficacy Analysis
8.2.3 Safety Analysis
8.2.4 Analysis Populations
8.3 Safety
8.3.1 Adverse Events
8.3.2 Clinical Laboratory Evaluations
8.3.3 Vital signs Measurements, physical findings and other safety Evaluations
8.3.4 Immunogenicity
8.4 Pharmacokinetic Analysis
8.5 Pharmacodynamic analysis
9.0 Ethics
9.1 Institutional Review Board or Independent Ethics Committee

10.0 Data Integrity and Quality Assurance


10.1 Monitoring
10.2 Data Management/ Coding
10.3 Quality Assurance Audit
10.4 Ethical Conduct of the study
10.5 Patient Information and informed consent
10.6 Patient Data Protection
11.0 Study Administration
11.1 Administrative Structure
11.2 Study and study Centre closure
11.3 Study discontinuation
11.4 Data handling and Record Keeping
11.5 Direct Access to source Data/Documentation
11.6 Investigator Information

11.6.1 Investigator Obligations


11.6.2 Protocol Signatures
11.6.3 Publication Policy
11.7 Financing and Insurance
11.8 Interpretation of the Protocol/Protocol Amendment(s)
11.9 Protocol Deviations/Violations
12.0 Appendices
12.1 Appendix 1:Schedule of Assessments
(and so on: list of appendices)
13.0 References
List Of Tables

Contact details: 9341321900/9341364359/9663590140


sureshkp97@gmail.com

The winners in life think constantly in terms of I can, I will, and I am. Losers, on
the other hand, concentrate their waking thoughts on what they should have or
would have done, or what they can't do.

Thank you