
A Gaussian, or normal, distribution follows a classical bell-shaped curve and is symmetrical around its
mid-point. It is a continuous probability distribution.

In a Gaussian distribution the mean, median and mode are equal:

  The median is the middle point of the observations
  The mode is the most commonly observed measurement
  The mean is the arithmetic average

In a skewed distribution the median is the preferred measure of central tendency.

A p value is conventionally considered statistically significant when it is less than 0.05, i.e. 5% or 1 in 20.

The null hypothesis states that there is no difference between the groups. 

Statistical significance is not the same as clinical significance. Although a study may show a benefit from
a treatment or intervention there are many other factors that need to be considered before it can be
stated as being clinically significant.

p value:
  < 0.05 is taken to be statistically significant and the null hypothesis is rejected
  > 0.05 means there is not a statistically significant difference between the two groups and the null
hypothesis is not rejected.

The correlation coefficient is used to measure the strength of linear dependence between two
variables.
  It is denoted by r and indicates how closely the plotted points lie to a straight line. r takes a value
between -1 and +1:
 When r is positive there is a positive correlation
 When r is negative there is a negative correlation
  When r is zero there is no linear correlation between the variables
 The closer r is to zero the less the linear association between the variables
 Values of -1 or +1 imply a perfect linear correlation between the variables

 The degree of correlation is as follows:


  High correlation: r = 0.5 to 1.0 or -0.5 to -1.0
  Medium correlation: r = 0.3 to 0.5 or -0.3 to -0.5
  Low correlation: r = 0.1 to 0.3 or -0.1 to -0.3
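
As an illustration of the correlation coefficient described above, the following is a minimal sketch that computes Pearson's r from two lists of paired measurements and labels it using the bands quoted in these notes. The function names and the "negligible" label for values below 0.1 are assumptions made for this example and are not part of the original notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the two means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def degree_of_correlation(r):
    """Label the size of r using the bands quoted in the notes."""
    size = abs(r)
    if size >= 0.5:
        return "high"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "low"
    return "negligible"  # hypothetical label: the notes do not name this band

# Points lying exactly on a rising straight line give r = +1
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(degree_of_correlation(-0.7))
```

The sign of r gives the direction of the association and its absolute value gives the strength, which is why the banding uses abs(r).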

Regarding the validity of a study

 Selection bias is a form of systematic error. 


 Random error is always present in a measurement.
  It is caused by unpredictable fluctuations in the readings of a measurement apparatus or in
the experimenter’s interpretation of the instrumental reading. 
 Random error will reduce with increasing sample size. 
 Randomisation will reduce confounding.
  A confounding factor is a background variable that is not of direct interest to the study but is
associated with both the exposure and the outcome.

Scales of measurement:

 The nominal scale is for mutually exclusive, but not ordered data. It collects qualitative data. 
 The ordinal scale involves ranking of the variable. The order matters but not the difference
between values. 
 The interval scale has meaningful differences between the values.
The Fahrenheit scale is an interval scale, since each degree is equal but there is no absolute zero
point. 
 The ratio scale has all the properties of the interval scale, and also has a clear definition of zero.
When the variable equals zero there is none of that variable. Variables like height, weight, and
enzyme activity are measured on the ratio scale.

The unpaired Student's t-test is used to compare the means of two small samples and requires the
observations to be normally distributed. 

The numbers of patients in the two groups are not required to be the same.
For the unpaired Student's t-test there will be n-2 degrees of freedom, where n is the total number of
people in the trial. In the paired Student's t-test each patient receives both drugs and therefore acts
as their own control. The number of degrees of freedom for the paired Student's t-test is therefore
n-1, where n is the number of patients (pairs).
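
The degrees of freedom described above can be illustrated with a short sketch. It assumes the pooled-variance (equal variance) form of the unpaired test, which is the form that gives n-2 degrees of freedom. The function names are hypothetical and the sample values are made up for illustration.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 denominator

def unpaired_t(group_a, group_b):
    """Pooled-variance unpaired t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2  # total number of people minus 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df

def paired_t(first, second):
    """Paired t statistic: each patient contributes one within-patient difference."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    se = math.sqrt(variance(diffs) / n)
    return mean(diffs) / se, n - 1  # number of pairs minus 1

# Groups of unequal size are allowed in the unpaired test
t_unpaired, df_unpaired = unpaired_t([1, 2, 3], [4, 5, 6, 7])
t_paired, df_paired = paired_t([10, 12, 14], [11, 14, 17])
print(df_unpaired, df_paired)
```

The t statistic would then be compared against the t distribution with the stated degrees of freedom to obtain a p value.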

The null hypothesis states that there is no difference between the groups. 

 A type I error occurs when the null hypothesis has been rejected when it is actually true. 
  A type II error occurs when the null hypothesis has not been rejected when it is actually false.
 Even when the null hypothesis is true some difference would be expected because of random
variation.
  The p value is the probability of observing a difference of at least that magnitude if the null
hypothesis is true. The smaller the p value the more significant the result. 
 The p value is misleadingly low with a type I error and misleadingly high with a type II error.
  If the p value is less than a pre-determined significance level (often 0.05) then the null
hypothesis can be rejected and the result is regarded as statistically significant. 
  The z-score indicates how many standard deviations a result is from the mean. It is an indicator
of heterogeneity. If the z-score is 0 then the result is equal to the mean. If the z-score is more
than 1.96 standard deviations above or below the mean then the two-sided p value is less than 0.05
and the null hypothesis can be rejected. 
 The power of a study is the probability of correctly rejecting the null hypothesis when it is false.

Screening tests are used to identify individuals that are at risk of a disease. A positive screening test
does not necessarily mean that the individual tested has the disease. Individuals who are positive on
screening tests should be investigated further to determine if they actually have the disease. Some of
those that screen positive will not have the disease and some of those that actually have the disease
may test negative.

  Sensitivity is the proportion of people with the disease who are correctly identified by a positive
test (true positives).
  Specificity is the proportion of people without the disease who are correctly identified by a
negative test (true negatives). 
 The positive predictive value is the proportion of those that test positive that actually have the
disease. 
 The negative predictive value is the proportion of those who test negative that do not have the
disease.
 The positive and negative predictive values can only be estimated using data from cross-
sectional studies or other population based studies in which valid prevalence estimates can be
obtained. The sensitivity and specificity can be estimated from case-control studies.
  The variance is a measure of the spread of observations around the mean value.
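
The four screening measures defined above can all be read off a 2x2 table of test result against true disease status. The following is a minimal sketch. The function name and the counts are hypothetical and chosen only for illustration.

```python
def screening_metrics(tp, fp, fn, tn):
    """Screening test measures from a 2x2 table.

    tp: diseased, test positive     fp: not diseased, test positive
    fn: diseased, test negative     tn: not diseased, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # of those with disease, fraction testing positive
        "specificity": tn / (tn + fp),  # of those without disease, fraction testing negative
        "ppv": tp / (tp + fp),          # of those testing positive, fraction with disease
        "npv": tn / (tn + fn),          # of those testing negative, fraction without disease
    }

# Made-up example: 100 people with the disease and 900 without
result = screening_metrics(tp=90, fp=45, fn=10, tn=855)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are calculated within the diseased and non-diseased columns respectively, whereas the predictive values are calculated within the test-positive and test-negative rows, which is why the predictive values change with prevalence.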

L’Abbe plots are commonly used to display data visually in a meta-analysis of clinical trials that
compare treatments against control intervention or placebos. 

They are essentially scatter-plots of results of individual studies with the treatment group results on the
vertical axis and the control group results on the horizontal axis. Trials in which experimental treatment
is better than control are displayed in the upper left hand side of the plot. Trials in which control or
placebo are better than the experimental treatment are displayed on the lower right of the plot. The
size of the trial is reflected by the size of the circle used. 

Examples of observational studies include:

 Case-control studies
 Cross-sectional studies
 Cohort studies
 Ecological studies

Examples of experimental studies include:

  Randomized Controlled Studies
  Crossover Studies

Box and whisker plots are a convenient way of displaying quantitative variables that have a
skewed distribution.

Box and whisker plots are constructed as follows:

 75% of the values lie below the upper quartile


 25% of the values lie below the lower quartile
 The box represents 50% of the data
 The box is divided by the median
 Outliers may be plotted as individual points

A confidence interval (CI) is a range of values for a variable of interest, constructed so that the range
has a specified probability of including the true value of that variable. A 95% confidence interval
means that, if the study were repeated many times, 95% of the intervals constructed in this way would
be expected to contain the true value.

The 95% confidence interval can be calculated regardless of the size of the sample. It will be wider if the
sample size is smaller and narrower if the sample size is larger. 

The 95% confidence interval is narrower than the 99% confidence interval. 
95% CI = sample mean +/- 1.96 times the standard error of the mean

The sample proportion +/- the standard error of the proportion is approximately the 68% confidence
interval for the proportion.
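
The confidence interval calculations above can be sketched as follows. This is a normal-approximation example only. It assumes the usual formula sqrt(p(1 - p)/n) for the standard error of a proportion, which is not stated in the notes, and the function names and data are hypothetical.

```python
import math
from statistics import mean, stdev  # stdev() is the sample standard deviation

def mean_ci(sample, z=1.96):
    """Normal-approximation CI for a mean: mean +/- z * SEM (z = 1.96 for 95%)."""
    m = mean(sample)
    sem = stdev(sample) / math.sqrt(len(sample))
    return m - z * sem, m + z * sem

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation CI for a proportion: p +/- z * SE(p)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # assumed standard error of a proportion
    return p - z * se, p + z * se

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_ci(sample))            # 95% interval
print(mean_ci(sample, z=2.576))   # 99% interval, wider than the 95% interval
print(proportion_ci(50, 100))
```

For small samples a multiplier taken from the t distribution would normally replace 1.96, giving a somewhat wider interval.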

Nominal variables are variables that are discontinuous or non-quantitative in populations, such as hair
or eye colour, gender and racial grouping.

 The central tendency of a nominal variable is given by its mode. Neither the mean nor the median can
be defined. 

Nominal variables can be statistically analyzed using the chi-squared test.
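
As an illustration of the chi-squared test applied to nominal data, the following is a minimal sketch that computes the chi-squared statistic and its degrees of freedom from a table of observed counts. The function name and the counts are hypothetical, and no continuity correction is applied.

```python
def chi_squared(table):
    """Chi-squared statistic and degrees of freedom for a table of observed counts.

    `table` is a list of rows, e.g. [[10, 20], [20, 10]] for a 2x2 table.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = chi_squared([[10, 20], [20, 10]])
print(round(stat, 2), df)
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p < 0.05.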

The terms nominal variable and categorical variable are often used interchangeably. A nominal variable
is one that has two or more categories, but there is no intrinsic ordering to the categories.

Meta-analysis is a statistical procedure that integrates the results of multiple independent studies with
common features with the goal of identifying patterns and variability amongst the study results. It is
useful when individual studies are too small to give reliable answers alone. 

By combining studies, a meta-analysis increases the sample size and therefore the power to study the
effects of interest.

The principal advantages of meta-analysis are as follows:

 Allows the integration and summary of results from individual studies


 Allows analysis of differences in results from individual studies
 Increases precision in estimating effects
 Can overcome small sample sizes

Meta-analysis can determine whether new studies are needed to further investigate an effect, but it
cannot eliminate the need for further studies.

To avoid bias, both positive and negative studies should be included. Research has shown that studies
with a positive result are more likely to be published than those with a negative result. This is referred to
as ‘publication bias’, and proper selection of studies for meta-analysis can help to overcome it.

In order to avoid bias, unpublished but properly conducted studies should also be included.

A case-control study is a type of observational study in which two groups of patients, one with the disease
and one without, are compared on the basis of a proposed causative factor that occurred in the past.
Case-control studies are therefore retrospective in nature and are useful in hypothesis generation.

They are suitable to be used when investigating a rare disease or as a preliminary study in cases where
little is known about the disease and the proposed aetiological factor. They can look at multiple risk
factors (exposures) but can only look at a single outcome.

They are not good at identifying rare exposures. Cohort studies are better for this, as one group with a
particular exposure is compared to a control group without that exposure.

Case-control studies allow the assessment of the influence of predictors on outcome via the calculation
of an odds ratio.
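
The odds ratio mentioned above compares the odds of exposure among cases with the odds of exposure among controls. The following is a minimal sketch. The function name and the counts are hypothetical and chosen only for illustration.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a case-control 2x2 table."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls

# Made-up example: 30 of 100 cases were exposed, 10 of 100 controls were exposed
print(round(odds_ratio(30, 70, 10, 90), 2))
```

An odds ratio above 1 suggests the exposure is associated with the disease, and an odds ratio below 1 suggests it is associated with absence of the disease.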

Compared with randomised controlled trials, case-control studies are generally relatively inexpensive to
run but provide less evidence for causal inference.

Compared with prospective cohort studies, case-control studies are usually less expensive and also
shorter in duration.

It can be difficult to obtain reliable information regarding exposure over time to the proposed
aetiological factor in case-control studies. This can significantly reduce their reliability compared with
experimental studies. Bias is a major problem with case-control studies.

A cohort study is a form of longitudinal, observational study that follows a group of patients (the cohort)
forward in time to monitor the effects of a proposed aetiological factor upon them.

They follow a cohort of patients without a disease and evaluate the absolute and relative risk of
contracting the disease state after exposure to the aetiological agent.
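
The absolute and relative risks described above can be sketched as follows. The function name, the dictionary keys and the counts are hypothetical. The risk difference is included as one common way of expressing the change in absolute risk and is not named in the notes.

```python
def cohort_risks(exposed_with_disease, exposed_total,
                 unexposed_with_disease, unexposed_total):
    """Absolute risks, relative risk and risk difference from cohort counts."""
    risk_exposed = exposed_with_disease / exposed_total
    risk_unexposed = unexposed_with_disease / unexposed_total
    return {
        "risk_exposed": risk_exposed,        # absolute risk in the exposed group
        "risk_unexposed": risk_unexposed,    # absolute risk in the comparison group
        "relative_risk": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
    }

# Made-up example: 20 of 100 exposed and 5 of 100 unexposed patients develop the disease
for name, value in cohort_risks(20, 100, 5, 100).items():
    print(f"{name}: {value:.2f}")
```

Because the cohort is followed forward from exposure, the risks in each group are measured directly, which is what makes the relative risk calculable in this design.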

An example of a cohort would be a group of patients that have been exposed to a particular drug. This
group could then be followed longitudinally to see if they develop a particular side effect or disease as a
consequence of this drug exposure. The comparison group can be the general population from which
the cohort was drawn, or alternatively another subgroup of the cohort itself.

They are therefore good at investigating rare exposures as the study design sets the exposure. A rare
outcome is unlikely to appear during the study time and they are therefore poor at detecting rare
outcomes.

Cohort studies are the best way to determine the incidence of a disease.

Cohort studies are often compared to case-control studies. They are generally more expensive and
longer in duration than case-control studies but usually provide more useful and reliable information.

Subject-selection and loss to follow-up are two major potential sources of bias with cohort studies and
confounding variables are the main problem with analysis of the results.

Cross-sectional studies involve observations of the frequency and characteristics of a disease in a
population at one specific point in time. They are descriptive studies that aim to provide data on the
entire population within the study.

They can be used to describe a feature of the population, such as the prevalence of a condition, or they
may be used to support or refute inferences of cause and effect.

They are relatively quick to perform but are also moderately expensive to run. They are also not suitable
for the study of rare diseases but can be used to study multiple outcomes. They can be used to study
both acute and chronic conditions.

Cross-sectional studies are the best way to determine the prevalence of a disease.

The UK national census, which takes place every 10 years, is an example of a cross-sectional study.

Cross-sectional studies cannot be used to calculate the relative risk of a condition.

Cross-sectional studies do not themselves differentiate between cause and effect or establish the
sequence of events.

Crossover design is a modification of the randomised controlled trial (a type of experimental study), in
which each patient receives treatment and placebo in a random order.

Crossover studies are only suitable for chronic diseases that are not curable but for which treatment
may give short-lived, temporary relief. They are particularly helpful when the outcome is measured by
reports of subjective symptoms, such as pain relief from an analgesic. Outcome is monitored during
each period of treatment, and in this way each patient can serve as their own control.

There is often a ‘wash-out’ period between tests to remove any carry over effects from the treatment.

Fewer patients are required because many between-patient confounders may be removed.

In observational studies, pre-defined groups are observed to see what happens to them. The different
types of observational study consider events or aetiological factors in the past, present or future in an
attempt to identify differences between the groups.

A case-control study is a type of observational study in which two groups of patients, one with the disease
and one without, are compared on the basis of proposed causative factors that occurred in the past.

A cross-sectional study involves collecting data at a set, defined time point. It can be used to assess
the prevalence of a condition, to answer questions about the cause of a disease or to assess the results
of a medical intervention.

A cohort study is a form of longitudinal, observational study that follows a group of patients (the cohort)
forward in time to monitor the effects of a proposed aetiological factor upon them.

Experimental studies are characterised by the fact that the study subjects are allocated by the
investigator to the different study groups through the use of randomisation. This randomisation serves
to minimise selection bias and confounding.

Clinical trials are experimental studies and may be:

1. Double-blind: where neither the patient nor the investigator (or the treating clinician) is aware of
which treatment the patient has been randomised to receive.

2. Single-blind: where either the patient or the researcher (or the treating clinician) is not aware which
treatment the patient has been randomised to. In these studies it is usually the patient that is not aware
of the treatment type.

3. Unblinded: these are also known as ‘open studies’. Both the patient and the researcher (or treating
clinician) are aware of the treatment type.

The variance is a measure of the spread of the observations around the mean value. It is most useful
as a summary of spread when the data are approximately normally distributed.
 
The standard deviation gives a measure of the spread of the distribution values. The standard deviation
can be calculated by:
 
              Standard deviation = √variance
 
 
The smaller the standard deviation (or variance) the more tightly grouped the values are.
 
The standard error of the mean (SEM) is a measure of how precisely the sample mean approximates the
population mean.
 
When n is the sample size the SEM can be calculated by:
 
              SEM = standard deviation / √n
 
 
The SEM is directly proportional to the standard deviation and inversely proportional to the square root
of the sample size. It therefore decreases as the sample size increases and increases as the sample size
decreases. A larger sample provides more information and therefore a more precise estimate with a
smaller standard error.
 
The SEM can be used to construct confidence intervals. When the sample means are normally distributed:
  Approximately 68% of sample means lie within +/- 1 SEM of the population mean
  Approximately 95% of sample means lie within +/- 2 SEM of the population mean
  More precisely, 95% of sample means lie within +/- 1.96 SEM of the population mean
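
The relationship between the variance, the standard deviation and the SEM can be illustrated with a short sketch. It assumes the sample variance (n - 1 denominator), which the notes do not specify. The function name and data are hypothetical.

```python
import math
from statistics import variance  # sample variance, n - 1 denominator

def spread_summary(sample):
    """Variance, standard deviation and standard error of the mean for a sample."""
    n = len(sample)
    var = variance(sample)
    sd = math.sqrt(var)        # standard deviation = square root of the variance
    sem = sd / math.sqrt(n)    # SEM = standard deviation / square root of n
    return {"n": n, "variance": var, "sd": sd, "sem": sem}

small = [4, 8, 6, 5, 3, 7, 9, 6]
large = small * 4  # same spread of values, four times as many observations

print(spread_summary(small))
print(spread_summary(large))
```

The standard deviation of the two samples is similar, but the SEM of the larger sample is roughly half that of the smaller one, because the SEM falls with the square root of the sample size.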
 
The interquartile range lies between the 25th and the 75th centiles.

A likelihood ratio: 

 Applies to a piece of diagnostic information


 Tells you how useful that information is when making a diagnosis
 Is a number between zero and infinity
 If greater than one, indicates that the information increases the likelihood of the suspected
diagnosis
 If less than one, indicates that the information decreases the likelihood of the suspected
diagnosis

Likelihood ratios are defined as the ratio of the probability of a test result among patients with a target
disorder to the probability of that same test result among patients who do not have the target disorder.
They provide a more useful way of presenting diagnostic data and can be applied to individual patients
in a way that sensitivity and specificity cannot.

Likelihood ratios are used in evidence-based medicine for assessing the value of performing a diagnostic
test. They use the sensitivity and specificity of the test to convert the pre-test probability of having a
condition into a post-test probability.
 
The likelihood ratio for a positive test result (LR+) can be calculated by:
 
              LR+ = sensitivity / (1 – specificity)
 
The likelihood ratio for a negative test result (LR-) can be calculated by:
 
              LR- = (1 – sensitivity) / specificity
 
A likelihood ratio of greater than 1 indicates that the result is associated with the presence of the
disease. A likelihood ratio of less than 1 indicates that the result is associated with the absence of the
disease.
 
Likelihood ratios may be multiplied by pre-test odds to calculate post-test odds. They are independent
of prevalence.
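
The likelihood ratio formulas and the step from pre-test odds to post-test odds can be sketched as follows. The conversion between probability and odds (odds = p / (1 - p)) is standard but is not spelled out in the notes. The function names and example values are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)

# Made-up example: sensitivity 0.9, specificity 0.8, pre-test probability 0.2
lr_pos, lr_neg = likelihood_ratios(0.9, 0.8)
print(round(lr_pos, 2), round(lr_neg, 3))
print(round(post_test_probability(0.2, lr_pos), 3))  # after a positive result
print(round(post_test_probability(0.2, lr_neg), 3))  # after a negative result
```

The same likelihood ratios apply whatever the pre-test probability, but the post-test probability changes with it, which is how a single test can be applied to individual patients with different prior risks.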

Statistical significance tests can be either parametric or non-parametric. Parametric tests usually
assume that data is normally distributed, whereas non-parametric tests are usually based on rank order.

The following are examples of non-parametric significance tests:

 Mann-Whitney U
 Wilcoxon paired
 Sign
 Spearman’s rank
 Kendall’s test
 Kruskal-Wallis test
 Friedman test

The following are examples of parametric significance tests:

  Unpaired Student’s t
  Paired Student’s t
 Pearson’s test

The chi-squared test can be used with both parametric and non-parametric data.

The chi-squared test can only be used with nominal data.


Audits are a necessary part of medical practice and now form part of the revalidation process. It is
important to understand the process of performing an audit and the stages involved in an audit cycle.

The clinical audit cycle is as follows:

1. Identification of a problem
2. Defining the criteria and the setting of standards
3. Observation of practice and data collection
4. Comparison of performance with the criteria and standards
5. Implementation of change

Hospital Episode Statistics (HES) is a data collection process that involves the collection of details of
all hospital admissions, outpatient appointments and A&E attendances at NHS hospitals in England.

This data is collected during a patient's time at hospital and is submitted to allow hospitals to be paid for
the care they deliver.

The benefits of HES include:

 The monitoring of trends and patterns in NHS hospital activity


 Assessment of effectiveness of care delivery
 The support of local service planning
 It provides the basis for national indicators of clinical quality
 It reveals health trends over time
 Helps to inform patient choice

Local Patient Administration System data will only give information on a local scale and not assess the
national hospital MRSA rates.

Hospital Patient Feedback Questionnaires will only provide subjective data that is difficult to quantify.

Quality and Outcomes Framework data deals solely with primary care data collection and is not
applicable to hospitals.

National census data will only give information on demographics.
