
Research Methods Assignment 1

Question 1
State and explain ways of calculating sample size.

Calculating an adequate sample size is an important part of research design in biomedical
statistics. Different study designs need different methods of sample size calculation, and no
single formula can be used in all designs. Several approaches are used to determine the sample
size. These include using a census for small populations, adopting the sample size of similar
studies, using published tables, and applying formulas to calculate a sample size.

Using a census for small populations: one approach available to biomedical researchers is to
use the entire population as the sample. Although cost considerations make this impossible for
large populations, a census is attractive for small ones. A census eliminates sampling error
and provides data on all the individuals in the population. In a small population, virtually
the entire population would have to be sampled anyway to achieve a desirable level of
precision.

Using the sample size of a similar study: this approach is to use the same sample size as
studies similar to the one you plan. Without reviewing the procedures employed in these
studies, you run the risk of repeating errors that were made in determining their sample
sizes. However, a review of the literature can provide guidance about the sample sizes that
are typically used.

Using published tables: a third way to determine the sample size is to rely on published
tables, which provide the sample size for a given set of criteria. The tables present the
sample sizes that would be necessary for given combinations of precision, confidence level
and variability. Two points should be noted. Firstly, these sample sizes reflect the number of
obtained responses, and not necessarily the number of surveys mailed or interviews planned
(this number is often increased to compensate for non-response). Secondly, the sample sizes
presume that the attributes being measured are distributed normally or nearly so. If this
assumption cannot be met, then the entire population may need to be surveyed.

Using formulas to calculate sample size: although the tables can provide a useful guide for
determining the sample size, you may need to calculate the necessary sample size for a
different combination of precision, confidence level and variability. The fourth approach to
determining the sample size is therefore to apply one of several formulas.
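
As an illustration of the formula approach, the sketch below uses Cochran's formula for
estimating a proportion, with an optional finite population correction. The text above does
not name a specific formula, so this choice is an assumption made for illustration, as are the
function name and the default values (95% confidence, maximum variability p = 0.5).

```python
import math
from statistics import NormalDist


def cochran_sample_size(margin_of_error, confidence=0.95, proportion=0.5, population=None):
    """Illustrative sample size for estimating a proportion (Cochran's formula).

    margin_of_error: desired precision, e.g. 0.05 for plus or minus 5 percentage points
    confidence:      confidence level, e.g. 0.95
    proportion:      assumed variability; 0.5 is the most conservative choice
    population:      if given, apply the finite population correction
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction for small populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(cochran_sample_size(0.05))                   # large population
print(cochran_sample_size(0.05, population=1000))  # population of 1000
```

The result is the number of obtained responses required, so in practice it would be inflated
to allow for expected non-response.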

Question 2
When carrying out research, explain how you will deal with missing data.

Missing data are values that are not stored for a variable in the observation of interest.
They can reduce the statistical power of a study and can produce biased estimates, leading to
invalid conclusions.

There are three main classes of missing data:

a). Missing Completely at Random (MCAR): the probability that the data are missing is related
neither to the specific value that was supposed to be obtained nor to the set of observed
responses. The missing values are independent of both the observed and the unobserved
characteristics in the data set. MCAR data might result if a survey respondent
unintentionally failed to answer a question that the researcher is using as a variable in an
analysis.
b). Missing at Random (MAR): the probability that the responses are missing depends on the
set of observed responses, but is not related to the specific missing values that were
expected to be obtained. Data that are MAR can be predicted using the observed variables.
c). Missing not at Random (MNAR): data that do not meet the conditions for MCAR or MAR fall
into this category. When the data are MNAR, the researcher cannot approximate the missing
values, because the values of other relevant variables are also not observed.
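
The three classes can be made concrete with a small simulation. The sketch below is only an
illustration under assumed conditions: the variables (age, income), the missingness
probabilities and the cut-offs are all hypothetical. Age is always observed, and income goes
missing under each mechanism in turn.

```python
import random
from statistics import mean

random.seed(0)
n = 10_000

# Hypothetical survey data: age is always observed, income may be missing.
ages = [random.uniform(20, 70) for _ in range(n)]
incomes = [30 + 0.5 * age + random.gauss(0, 5) for age in ages]


def observed(values, is_missing):
    """Keep only the values that were not flagged as missing."""
    return [v for v, missing in zip(values, is_missing) if not missing]


# MCAR: every income has the same 30% chance of being missing.
mcar = [random.random() < 0.3 for _ in range(n)]
# MAR: missingness depends only on an observed variable (age).
mar = [random.random() < (0.6 if age > 45 else 0.1) for age in ages]
# MNAR: missingness depends on the unobserved income value itself.
mnar = [random.random() < (0.6 if income > 52.5 else 0.1) for income in incomes]

true_mean = mean(incomes)
mcar_mean = mean(observed(incomes, mcar))
mar_mean = mean(observed(incomes, mar))
mnar_mean = mean(observed(incomes, mnar))

print(f"true mean of income:      {true_mean:.2f}")
print(f"observed mean under MCAR: {mcar_mean:.2f}")
print(f"observed mean under MAR:  {mar_mean:.2f}")
print(f"observed mean under MNAR: {mnar_mean:.2f}")
```

Under MCAR the observed mean stays close to the true mean. Under MAR and MNAR it is pulled
downwards, but only in the MAR case can the shift be explained by a variable that was actually
observed.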

The best way of dealing with missing data is to prevent the problem by planning the study
well and collecting the data carefully. When missing data occur nonetheless, researchers use a
variety of techniques to accommodate them and minimise their potential negative effects.
Little (1988b) identifies the three most commonly used approaches: a) examining the
incomplete cases, b) replacing values for missing data and c) providing statistical weights to
complete cases. Within the general category of data replacement, there are specific techniques
that vary in complexity. The two most commonly used are single imputation via mean
replacement and multiple imputation. The remainder of this answer examines listwise or case
deletion and then these alternative techniques.

Listwise or case deletion is also known as complete case analysis. It is the most common
approach to missing data: the cases with missing data are simply omitted and the remaining
data are analysed. Two conditions should be met for listwise deletion to be appropriate: the
missing data are MCAR, and the sample remains large after the deletion of individual
observations. Deleting observations for non-response is less consequential if the values are
MCAR because, if missingness is completely random, the deleted data are also a random subset
and their removal does not cause the loss of important variation.

If the data are instead MAR (missing at random) or MNAR (missing not at random), they are
inconsistent with the assumptions of listwise deletion. Its use may result in the sample mean
differing from the population mean. It may also affect estimates in a manner similar to
selection bias: if a set of respondents systematically chooses not to answer a question and
those observations are then deleted from the sample, the observations that remain in the
analysis may be meaningfully different from the larger population. The other issue with
listwise deletion is that it reduces the sample size and therefore the statistical power of
the analysis. Smaller samples are more likely to produce null results for effects that a
larger sample would have detected.
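
A minimal sketch of listwise deletion is shown below. The records and variable names are
hypothetical, and missing values are represented by None.

```python
from statistics import mean

# Hypothetical survey records; None marks a missing response.
records = [
    {"age": 34, "income": 52.0, "score": 7},
    {"age": 41, "income": None, "score": 6},
    {"age": 29, "income": 44.5, "score": 8},
    {"age": 57, "income": 61.0, "score": None},
    {"age": 48, "income": 58.5, "score": 5},
]


def listwise_delete(rows):
    """Keep only the cases in which every variable was observed."""
    return [row for row in rows if all(value is not None for value in row.values())]


complete_cases = listwise_delete(records)

print(f"cases before deletion: {len(records)}")
print(f"cases after deletion:  {len(complete_cases)}")
print(f"mean income (complete cases): {mean(r['income'] for r in complete_cases):.2f}")
```

Two of the five cases are lost even though each is missing only one value, which shows how
quickly listwise deletion can shrink the sample.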

Single imputation is a general term that describes a family of missing data replacement
techniques, including last value replacement, mean replacement, and single regression
replacement. Last value replacement, which can be used with panel or time-series data,
replaces a missing value with the most recent observed value for the same case. Carrying the
last known value forward is often regarded as giving a conservative estimate of the treatment
effect when a post-test value is missing, although this is not guaranteed in every design.
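
Last value replacement can be sketched as follows. The function name and the example
measurements are hypothetical, and None marks a missed visit.

```python
def carry_last_value_forward(series):
    """Replace each missing value with the most recent observed value.

    Values that are missing before the first observation stay missing,
    because there is nothing to carry forward.
    """
    filled = []
    last_seen = None
    for value in series:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical repeated measurements for one participant.
visits = [120, 118, None, None, 115, None]
print(carry_last_value_forward(visits))
```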

A second version of value replacement, sometimes referred to as hot-decking, uses the
information from similar observations to replace the missing data. It is built around a
premise like that of propensity score matching: if observations can be matched with others
that look similar across the known values for a set of variables, missing responses can be
replaced by the values of their matched observations with observed responses. This technique
is limited to data that are MCAR (missing completely at random) or MAR (missing at random).

Mean replacement replaces missing observations with the mean value of that variable from the
observed responses in the sample. This preserves the overall mean of each variable but reduces
the variation in the sample. This technique may be appropriate when the degree of missingness
is small and the sample size is large. The smaller the amount of missingness, the less impact
this has on the overall variance estimate.
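
The effect of mean replacement on the mean and the variance can be checked with a few
hypothetical values, as in the sketch below (None marks a missing observation).

```python
from statistics import mean, variance

# Hypothetical variable with two missing observations.
values = [4.0, None, 7.0, 5.0, None, 8.0, 6.0]

observed_values = [v for v in values if v is not None]
fill_value = mean(observed_values)

# Replace every missing observation with the mean of the observed ones.
imputed_values = [fill_value if v is None else v for v in values]

print(f"mean before:     {mean(observed_values):.2f}")
print(f"mean after:      {mean(imputed_values):.2f}")
print(f"variance before: {variance(observed_values):.2f}")
print(f"variance after:  {variance(imputed_values):.2f}")
```

The mean is unchanged, while the sample variance falls because the imputed values add
observations without adding any spread.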

Multiple imputation is an extension of the single imputation regression replacement method.
As its name suggests, missing values are estimated multiple times. Analysing multiply
imputed data follows three steps: a) the imputation of missing data, b) the running of
independent statistical analyses on the resulting individual data sets, and c) the pooling of
the results across the imputations.

The first step of multiple imputation is similar to the single regression replacement method:
variables that are theoretically related or statistically correlated to the target variable
are identified and used in a regression model to predict the values of the missing data.
However, in multiple imputation this replacement process is repeated numerous times to
incorporate the uncertainty in the prediction process. Randomness is introduced through the
error term, so that the uncertainty in predicting the missing values is carried into the
imputed values themselves.

Multiple imputation creates numerous data sets, each containing somewhat different estimates
of the missing values. Rubin’s (1978) formula suggests three to ten imputations are necessary
to produce results that incorporate enough variation in the prediction process. This ensures
that the uncertainty inherent in the prediction of missing values is accounted for to
appropriately increase the standard errors in the analysis.
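
The pooling step can be sketched with the combining rules commonly known as Rubin's rules:
the pooled estimate is the average of the per-imputation estimates, and the total variance
adds the between-imputation variance to the average within-imputation variance. The function
name and the numbers below are hypothetical; they stand in for the results of the same
analysis run on five imputed data sets.

```python
import math
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Pool per-imputation results into one estimate and one standard error.

    estimates: the point estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # variance across the imputations
    total = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total)


# Hypothetical regression coefficient estimated on five imputed data sets.
estimates = [2.0, 2.2, 1.8, 2.1, 1.9]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]

pooled_estimate, pooled_se = pool_estimates(estimates, variances)
print(f"pooled estimate:       {pooled_estimate:.3f}")
print(f"pooled standard error: {pooled_se:.3f}")
print(f"average within-imputation standard error: {math.sqrt(mean(variances)):.3f}")
```

The pooled standard error is larger than the average within-imputation standard error, which
is how the uncertainty in the imputed values reaches the final analysis.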


