

. There are 2 main objectives of epidemiological studies: descriptive and analytic.
. Descriptive epidemiology deals with rates, ratios and distributions; it describes
the disease in terms of time, place and person.
. Analytical epidemiological studies consist of observational studies and
experimental studies.
. Observational studies include: case-control, cohort and cross-sectional studies.
. CASE CONTROL STUDY (Retrospective study):
. The movement is from the effect (the disease) back to the cause (the exposure).
. The researcher begins with a population with a certain outcome, and subjects
are classified as
either "cases" or "controls" based on the outcome status.
. The cases and controls are assessed retrospectively for the presence of risk
factors (information is collected about exposure to risk factors).
. It is very popular in exploring an exposure-disease association.
. Subjects with the disease of interest are compared with an otherwise similar
group that is disease free.
. It is cheaper and easier than a cohort study.
. It is a retrospective study aiming at determining the association between risk
factors and disease occurrence.
. Used to compare the exposure of people with the disease (cases) to the exposure of
people without the disease (controls).
. The main measure of association is the odds ratio, which can be calculated in a
case-control study, but the incidence of the disease can't.
. One of the drawbacks of a case-control study is that the risk cannot be derived
directly from its results.
- Incidence measures (e.g. relative risk or relative rate) can't be directly
measured in a case-control study,
- because the people being studied are those who have already developed the
disease.
- Relative risk and relative rate are calculated in cohort studies, where people
are followed over time for the occurrence of the disease.
. A Prospective or longitudinal COHORT STUDY:
. Divides the study group into "exposed" and "non-exposed" to the risk factors.
. Each subject is then followed prospectively until the disease develops.
. Is a prospective observational study in which groups are chosen based upon the
presence or absence of one or more risk factors.
. All subjects are then observed over time for the development of the disease of
interest,
. thus allowing estimation of the incidence within the total population and
comparison of incidences between subgroups.

. It is best for determining the incidence of the disease & comparing the
incidence of the disease in 2 populations
(one with and one without a given risk factor), which allows for calculation of a
relative risk.
. It provides stronger evidence than a case-control study or a cross-sectional study.

- Median survival: used to compare the median survival times in two or more
groups of patients (e.g. receiving
a new treatment or placebo).
- Median survival is calculated in cohort studies or clinical trials.
- Prevalence odds ratio: is calculated in cross-sectional studies to compare the
prevalence of the disease
between two different populations.
. INCIDENCE: (typical for USMLE)
. It is the frequency of new cases of a disease arising in a population at risk over
a specified time period.
. It is the measure of the appearance of new cases.
. PREVALENCE: is the measure of those with the disease in the population at a
particular point in time.
. The relation between them in a stable population (little migration) can be
demonstrated by:
Prevalence = (incidence) x (average disease duration)
. So if the incidence is fixed in a stable population, the prevalence is increased if
there are factors
that prolong survival (i.e. disease duration), e.g. improved quality of care.
. Prevalence of disease in a population = number of existing cases / total population
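As a rough numerical sketch of the relation prevalence = incidence x disease duration in a stable population: the function name and the figures below are illustrative assumptions, not data from any study.

```python
def steady_state_prevalence(incidence_per_year, mean_duration_years):
    """Approximate prevalence in a stable population: incidence x average duration."""
    return incidence_per_year * mean_duration_years

# Hypothetical figures: 2 new cases per 1000 people per year.
incidence = 0.002
before = steady_state_prevalence(incidence, 5)   # average disease duration 5 years
after = steady_state_prevalence(incidence, 10)   # better care prolongs survival

print(f"prevalence with 5-year duration:  {before:.3f}")
print(f"prevalence with 10-year duration: {after:.3f}")
```

With the incidence held fixed, doubling the disease duration doubles the prevalence in this sketch.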
. A Retrospective cohort study:
. Starts at some point between the exposure and the outcome.
. The researcher reviews past records, classifies subjects into "exposed"
and "non-exposed", and then follows them until the outcome.
. In a cohort study, the study subjects are free of the outcome at the time the study
begins.
. CROSS-SECTIONAL STUDY:
. Both the exposure and the outcome are studied at one point in time (at one
cross section of time).
. Since both exposure and outcome have been present for some time before the study, it
is not possible to
determine the temporal association between the exposure and outcome from a
cross-sectional study.
. Takes a sample of individuals from a population at one point in time.
. It allows determination of disease prevalence (the total number of cases in a
population at a given time).
. Disease incidence can't be determined.

. A study involving only patients already diagnosed with the condition of interest.
. It is helpful in determining the natural history of uncommon conditions.
. But provides no information about the disease incidence.
. Compares the therapeutic benefit of different interventions in patients already
diagnosed with a particular disease.
. Can't be used to determine disease incidence.


. A randomized controlled trial is a type of experimental study.
. It is considered the gold standard for studying the efficacy of a treatment or
a procedure.
. Subjects are randomly assigned to an experimental or control group.
. This type of study has the least bias and helps to show a strong causal
relationship.
. Relative risk (RR) is used as a measure of association in cohort studies.
. It is the ratio of the risk in an exposed group to that of the unexposed group.
. The NULL value of RR is 1.0.
. A RR of 1 means that there is no association between the risk factor and the
outcome.
. A relative risk > 1 means that there is a positive association between the risk
factor and the outcome.
. A relative risk < 1 means that there is a negative association between the risk
factor and the outcome.
. The farther the value of the RR from 1, the stronger the association.
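The definition of relative risk can be sketched in a few lines. The function name and the cohort counts are hypothetical and only illustrate the three cases (RR > 1, RR = 1, RR < 1).

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical cohorts of 1000 people each.
rr_positive = relative_risk(30, 1000, 10, 1000)  # exposure associated with more disease
rr_null = relative_risk(10, 1000, 10, 1000)      # no association
rr_negative = relative_risk(5, 1000, 10, 1000)   # exposure associated with less disease

for label, rr in [("positive", rr_positive), ("null", rr_null), ("negative", rr_negative)]:
    print(f"{label:8s} RR = {rr:.2f}")
```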
. Example: the RR of bronchogenic cancer in smokers is greater than 2 -->
a strong association between smoking (risk factor) and bronchogenic carcinoma.
. When exposure is measured on a continuous scale (number of smoked
cigarettes per day or packs per day, PPD),
classification into two or more ordinal categories enables the risk to be
assessed as a function of exposure,
. and the DOSE RESPONSE EFFECT can be calculated from the exposure and the
outcome.
. The present example illustrates a dose-response relationship between smoking
and bronchogenic cancer
(the RR of bronchogenic lung cancer increases as the number of smoked PPD
increases).
. One weakness of the RR is that it gives no clue whether such a finding can be
explained by chance alone.
. The confidence interval and the "P" value can help strengthen the finding of the
study.
. For the study to be statistically significant:
1- The confidence interval must not contain the null value (1).
2- The "P" value should be less than 0.05 (i.e. < 5% chance that the result obtained
was due to chance alone).
3- The RR is not the null value (1).
- The "P" value is used to strengthen the results of the study;
it is defined as the probability of obtaining a result at least as extreme as the
one observed by chance alone (i.e. if there were truly no association).
- e.g. a "P" value of 0.01 means that the probability of obtaining the result by
chance alone is 1%.
- The commonly accepted upper limit (cut-off point) of the "P" value for the study
to be considered statistically significant is 0.05 (i.e. less than 5%).
- The "P" value deals with random variability, not bias.
- If the "P" value is less than 0.05 (i.e. the study is statistically significant), the
95% confidence interval doesn't contain 1.0 (the null value for RR).
- A relative risk of 0.71 shows that the drug decreased the risk of mortality by
29% (the null value for RR is 1).
e.g.: A case of RR 1.6 (greater than 1) & a confidence interval of 1.02-2.15 (doesn't
contain the null value 1),
so for the study to be statistically significant the "P" value must be less than
0.05.
. N.B.: Absolute Risk Reduction (ARR):
. In a study of 2 drugs or interventions, one drug may reduce the risk more
than the other.
. Absolute risk reduction (ARR) = event rate with the comparison drug (or control) -
event rate with the more effective drug (a difference of absolute risks, not of
relative risks).
. Number needed to treat (NNT): is the number of people that should receive a
treatment to prevent one defined event.
. Is calculated as the inverse of the absolute risk reduction.
. NNT = 1/ARR.
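A small worked sketch of NNT = 1/ARR, taking ARR as the difference between the event rate in the control group and the event rate in the treated group. The event rates and function names are made up for illustration.

```python
def absolute_risk_reduction(control_event_rate, treatment_event_rate):
    """Difference in absolute risk between the control and the treated group."""
    return control_event_rate - treatment_event_rate

def number_needed_to_treat(arr):
    """NNT is the inverse of the absolute risk reduction."""
    if arr <= 0:
        raise ValueError("NNT is only meaningful when the treatment lowers the risk")
    return 1 / arr

# Hypothetical trial: 20% of controls and 15% of treated patients have the event.
arr = absolute_risk_reduction(0.20, 0.15)
nnt = number_needed_to_treat(arr)
print(f"ARR = {arr:.2f}, NNT = {nnt:.1f}")
```

In practice the NNT is usually quoted as a whole number of patients, rounded up.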
- The power of a study is the ability to detect a difference between two groups
(treated versus non-treated, exposed versus non-exposed) when one truly exists.
- Increasing the sample size --> increases the power of the study and
consequently makes
the confidence interval of the point estimate (e.g. relative risk) tighter.
- If the sample size is small --> low power of the study to detect the difference
between exposed and non-exposed subjects & this makes the confidence interval
of the study wide (e.g. 0.8-3.1) and makes the study statistically insignificant.
- If we increase the sample size --> the confidence interval will be tighter and the
study is more likely to be statistically significant.
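The effect of sample size on the confidence interval can be illustrated with a sketch. It assumes the common log-based (Katz) approximation for the 95% CI of a relative risk, which these notes do not specify, and uses invented counts with the same risks in a small and a large study.

```python
import math

def rr_with_ci(exposed_cases, exposed_total, unexposed_cases, unexposed_total, z=1.96):
    """Relative risk with an approximate 95% CI (log method, an assumed choice)."""
    rr = (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)
    se_log_rr = math.sqrt(1 / exposed_cases - 1 / exposed_total
                          + 1 / unexposed_cases - 1 / unexposed_total)
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high

# Same risks (20% vs 10%) in a small and in a ten-times-larger hypothetical study.
small = rr_with_ci(10, 50, 5, 50)
large = rr_with_ci(100, 500, 50, 500)

for name, (rr, low, high) in [("small", small), ("large", large)]:
    verdict = "contains 1" if low <= 1 <= high else "excludes 1"
    print(f"{name}: RR = {rr:.2f}, 95% CI {low:.2f}-{high:.2f} ({verdict})")
```

The point estimate is identical in both runs; only the width of the interval changes with the sample size.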
. Selection bias results from the manner in which the subjects are selected for the
study, or from selective losses to follow-up.
. It is seen when the estimate of the exposure-outcome association is biased
because the study sample isn't representative of the target population with
respect to the joint distribution of exposure and outcome.
. Measurement bias occurs due to imperfect assessment of the association between the
exposure and outcome,
as a result of errors in the measurement of exposure and outcome status.
. It can be minimized by using standardized techniques for surveillance and
measurement of outcomes,
as well as trained observers to measure the exposure and outcome.
. Occurs from poor data collection with inaccurate results.
. Lead-time bias should be considered while evaluating any screening test.
. It happens when two interventions are compared to diagnose a disease,
and one of them diagnoses the disease earlier than the other without an effect on
the outcome (survival).
. What actually happens is that detection of the disease was made at an earlier
point in time,
but the disease course itself or the prognosis did not change.
. So the screened patients appear to live longer from the time of diagnosis until
the time of death.
. Think of LEAD-TIME BIAS when you see "a new screening test" for poor-prognosis
diseases like lung cancer or pancreatic cancer.
. Observer bias occurs when the observer may be influenced by prior knowledge or
details of the study that can affect the results.
. Refers to misclassification of an outcome and/or exposure.
. e.g.: labeling diseased subjects as non-diseased and vice versa.
. Blinded studies usually avoid this bias by preventing the observer from knowing
which treatment or intervention the participants are receiving.
. These biases are related to the design of the study (the scenario will describe how
the study was designed).

. Recall bias occurs when a study participant's answer to a question is affected by
prior knowledge.
. This is more common in case-control studies than in randomized clinical trials.
. Occurs when the outcome of the test is obtained by the patient's response not
by objective diagnostic methods (e.g. migraine headache).
. Is a type of selection bias where a treatment regimen is selected for a patient
based on the severity of their condition, without taking into account other
possible confounding variables.
. Offline case20.
. Confounding occurs when at least part of the exposure-disease relationship can be
explained by another variable (the confounder).
. It is due to the presence of one or more variables associated independently with both
the exposure and the outcome.
. For example: cigarette smoking can be a confounding factor in studying the
association between maternal alcohol drinking and low birth weight babies,
as cigarette smoking is independently associated with alcohol consumption and
low birth weight babies.
. Hawthorne effect:
. It is the tendency of a study population to affect the outcome because these
people are aware that they are being studied.
. This awareness leads to consequent change in behavior while under observation
--> seriously affecting the validity of the study.
. It is usually seen in studies that concern behavioral outcomes or outcomes that
can be influenced by behavioral changes.
. In order to minimize the Hawthorne effect, the studied subjects can be kept
unaware that they are being studied.
1- Selection bias can be controlled by choosing a representative sample of the
population for the study & achieving a high rate of follow up.
2- Observer's bias can be controlled by blinding technique.
3- Ascertainment bias can be controlled by selecting a strict protocol of case
ascertainment.
4- Confounders: can be avoided by 3 methods in the design stage of the study:
matching, restriction and randomization.
- Randomization is commonly employed in clinical trials; its purpose is to balance
various factors (confounders) that can influence the estimate of association
between the treatment and placebo groups so that the un-confounded effect of
the exposure can be isolated.

- A very important advantage of randomization when compared to other
methods is the possibility to control
the known risk factors (e.g. age, severity of the disease) as well as unknown &
difficult-to-measure confounders
(e.g. level of stress, socioeconomic status) and make all confounders evenly
distributed between the treatment
group and the placebo group.
- In clinical trials, randomization is said to be successful when there is similarity
in the distribution of
the baseline characteristics (age, race, prevalence...) between the treatment
and placebo groups,
i.e. the confounders are evenly distributed between the treatment and the
placebo groups.
. The hazard ratio is the ratio of the chance of an event occurring in the treatment
arm (drug or group of interest)
compared to the chance of that event occurring in the control arm (the other
drug or group) during a set period of time.
. Hazard ratio = event rate in the test group / event rate in the control
group.
. So, the lower the hazard ratio, the less likely the event will occur in the
treatment arm.
. The higher the ratio, the more likely the event will occur in the treatment arm.
. A ratio close to 1 indicates no significant difference between the 2 groups.
. Example: Hazard ratios of 2 drugs A & B for bleeding complications:
. Hazard ratio for major bleeding = 0.93, i.e. close to 1, means that both groups
are similar to each other in this event.
. Hazard ratio for intracranial bleeding = 0.41 (indicates a lower chance of drug
"A" causing intracranial bleeding than drug "B").
. Hazard ratio for GIT bleeding = 1.50 (indicates that drug "A" has a higher
chance of causing GIT bleeding than drug "B").
. Hazard ratio for life-threatening bleeding = 0.80 (indicates a lower chance of
drug "A" causing life-threatening bleeding than drug "B").
. Hazard ratio for total bleeding = 0.91 (indicates a slightly lower chance of drug
"A" causing bleeding overall than drug "B").
. In case number (11 offline) you should focus on the baseline value in the case,
take the corresponding hazard ratio in the study, then
. decide which one of them has the greater hazard of hyperkalemia (N.B. Ca
channel blockers affect GFR).
. You should learn case 19 in offline 2013. :)
. In any randomized clinical study, the goal of successful randomization is:
1- To eliminate bias in treatment assignments.
2- To blind the investigators to the identity of the patients who receive the
treatment arm.
3- To minimize the confounding variables.
. Ideal randomization allows for adequate statistical power and should include:
1- Equal patient group sizes.
2- Low selection bias.
3- Low probability of confounding variables.

. A listing of the baseline characteristics of the patients in each arm would
show
if the two arms had patients with similar characteristics and would ensure that
proper randomization occurred in the study.
. Two SAMPLE "T test":
. It is commonly used to compare two means, not proportions.
. The basic requirements needed to perform this test are:
--> the two mean values - the sample variances - the sample sizes.
. "T test" is then done to obtain the "P" value.
. If the "P" value is less than 0.05 --> the null hypothesis (that there is no
difference between the two groups) is rejected, and the two means are assumed
to be statistically different.
. If the "P" value is large --> the null hypothesis is retained.
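The three ingredients listed for the two-sample t test (means, sample variances, sample sizes) are enough to compute the test statistic. The sketch below assumes the unequal-variance (Welch) form, which is one common variant and not necessarily the one intended here. The blood pressure figures are invented, and the "P" value itself would still be looked up from the t distribution with the computed degrees of freedom.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic and approximate degrees of freedom from summary statistics."""
    se1, se2 = var1 / n1, var2 / n2          # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical systolic blood pressure in two groups of 25 patients each.
t_stat, dof = welch_t(140.0, 100.0, 25, 132.0, 100.0, 25)
print(f"t = {t_stat:.3f} with about {dof:.0f} degrees of freedom")
```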
. TWO SAMPLE "Z test":
. Can also be used to compare two means, but
population (not sample) variances are employed in the calculations.
. Because the population variances are not usually known --> this test has limited
applicability.
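For comparison, a two-sample z test can be sketched with the standard library alone, because the normal distribution is available in `statistics.NormalDist`. The population variances are simply assumed to be known here, and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, mean2, pop_var1, pop_var2, n1, n2):
    """z statistic and two-sided P value when population variances are known."""
    se = math.sqrt(pop_var1 / n1 + pop_var2 / n2)
    z = (mean1 - mean2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z_stat, p_val = two_sample_z(140.0, 132.0, 100.0, 100.0, 25, 25)
print(f"z = {z_stat:.3f}, two-sided P = {p_val:.4f}")
print("reject the null hypothesis" if p_val < 0.05 else "retain the null hypothesis")
```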
. ANOVA test:
. I.e. analysis of variance (ANOVA).
. Used to compare three or more means.
. Chi Square test:
. Used to compare proportions (of a categorized outcome, e.g. high or low)
cross-tabulated against the exposure (present or not present).
. A 2x2 table may be used to compare the observed values to the expected
values.
. If the difference between the observed and expected values is large, this means
there is an association between the exposure and the outcome.
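The observed-versus-expected comparison in a 2x2 table can be sketched as below. The counts are hypothetical, no continuity correction is applied, and the P value uses the closed form that holds for 1 degree of freedom only.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and P value (1 df) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if no association
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Rows: exposure present / absent. Columns: outcome high / low.
observed_counts = [[30, 70], [15, 85]]
chi2_stat, chi2_p = chi_square_2x2(observed_counts)
print(f"chi-square = {chi2_stat:.2f}, P = {chi2_p:.4f}")
```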
. Meta-analysis is an epidemiologic method for pooling the data from several studies
to do an analysis having relatively large statistical power.
. A factorial design involves 2 or more experimental interventions, each with 2 or
more variables that are studied independently.
. For example:
. A study uses 3 different interventions:
beta blocker (metoprolol), calcium channel blocker (amlodipine) or ACEI (ramipril),
with two different blood pressure endpoints (102-107 mmHg or < 92 mmHg).
# Patient Randomization:
1) ACEI: - higher BP goal - lower BP goal.
2) Beta blocker: - higher BP goal - lower BP goal.
3) Ca channel blocker: - higher BP goal - lower BP goal.
. Cluster analysis is the grouping of different data points into similar categories.
. It usually involves randomization at the level of groups rather than at the level of
individuals.
. Crossover study: a group of participants is randomized to one treatment for a
period of time,
. and the other group is given an alternate treatment for the same period of time.
. At the end of the time period, the two groups then switch treatments for another
set period of time.
. Parallel study: randomizes one treatment to one group and another treatment to the
other group,
. such as treatment drug to one group versus a placebo to the other group.
. There are usually no other variables measured.
. Effect modification occurs when the effect of a main exposure on an outcome is
modified by another variable.
. It is not a bias.
. It is a natural phenomenon that should be described, not corrected, as it is not a
bias or confounding.
. Example: the effect of oral contraceptives on breast cancer is modified by the
family history,
. i.e. women with a +ve family history have an increased risk, while women without a
+ve family history don't have an increased risk.
. Other examples: studying the effect of estrogen on the risk of venous
thrombosis (modified by smoking).
. Also studying the risk of lung cancer in people exposed to asbestos (greatly
depends on / modified by smoking).
. The latent period is the time period required for an exposure to start having its
effect, i.e. the time required from exposure to outcome.
. In infectious diseases it is relatively short, while in chronic diseases (e.g. cancer
or CAD)
it may be very long, and an extended period of exposure may be required to affect
the outcome.
. The latent period is a natural phenomenon, not a bias.
. OUTLIER: extreme observation:
. It is defined as an extreme and unusual value observed in a dataset.
. It may be the result of a recording error, a measurement error or a natural
phenomenon.
. It affects the measures of central tendency as well as measures of dispersion,
for example:
. The mean: is extremely sensitive to outliers and easily shifts towards them.
. The standard deviation is sensitive to outliers because it is the measure of
dispersion within the data set,
. and outliers significantly increase the dispersion (SD = deviation of values
around the mean).
. The range = maximum value - minimum value (so it is definitely changed).
. The mode is not changed by outliers as they don't change the most frequent
value observed.
. The median is much more resistant to outliers as it is located in the middle of
the dataset where the observations usually don't differ much from each other.
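A short sketch makes the effect of a single outlier concrete. The dataset is invented, and the summary statistics come from Python's `statistics` module.

```python
from statistics import mean, median, mode, stdev

data = [4, 5, 5, 6, 7]
with_outlier = data + [60]   # one extreme observation added

def summarize(values):
    """Central tendency and dispersion measures discussed above."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "sd": stdev(values),
        "range": max(values) - min(values),
    }

clean_stats = summarize(data)
outlier_stats = summarize(with_outlier)
for key in clean_stats:
    print(f"{key:6s} without outlier: {clean_stats[key]:.2f}   with outlier: {outlier_stats[key]:.2f}")
```

The mean, standard deviation and range move a lot, the median moves slightly, and the mode does not move at all.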
. The normal distribution is symmetrical and bell shaped.
. All measures of central tendency are equal i.e. mean = median = mode.
. The degree of dispersion from the mean is determined by the standard
deviation.
. 68% of data --> within 1 standard deviation from the mean (mean +/- 1 SD).
. 95% of data --> within 2 standard deviations from the mean (mean +/- 2 SD).
. 99.7% of data --> within 3 standard deviations from the mean (mean +/- 3 SD).
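The 68 / 95 / 99.7 figures can be checked numerically with `statistics.NormalDist` from the Python standard library. The helper name below is only illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k_sd):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k_sd) - standard_normal.cdf(-k_sd)

within = {k: fraction_within(k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"mean +/- {k} SD covers {share:.1%} of the data")
```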
. Sensitivity --> the proportion of true +ve results among all subjects who actually
have the disease (Sensitivity = true +ve by the test / all who are actually diseased).
. Indicates the ability of a test to detect those patients with the disease.
. A higher sensitivity --> the better the test detects patients with the disease -->
fewer false negatives.
. Specificity --> the proportion of true -ve results among all subjects who are
actually free of the disease
(Specificity = true -ve by the test / all who are actually disease free).
. Is a measure of the true negative rate and indicates how well a test identifies
those without a given condition
(excludes those without the disease).
. The higher the specificity, the more likely that most healthy patients will have
-ve test results.
. The higher the specificity --> the less likely the false +ves.
. They are fixed values that do not vary with the pre-test probability of a disease
or with the prevalence of the
disease.
. The ideal diagnostic test should have high sensitivity and specificity.
- Raising the cutoff point of a diagnostic test --> decreases its sensitivity but
increases its specificity.
- Lowering the cutoff point of a diagnostic test --> increases its sensitivity but
decreases its specificity.
. Exposure Odds ratio:

. Is the measure of association in a case-control study.
. It compares the odds of exposure in cases to the odds of exposure in controls.
. It is not the same as relative risk.
. RR can be calculated in follow-up studies by comparing the risk of exposed
individuals to the risk of unexposed individuals.
. Direct calculation of RR in a case-control study is not possible, because the study
design doesn't include following people over time.
. But sometimes the RR can be approximately equal to the odds ratio.
. If the prevalence of the disease is low --> the odds ratio approximates the
relative risk (RR).
. This is called the rare disease assumption.
. Increasing the sample size will decrease the "P" value of the odds ratio and
make the confidence interval tighter.
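A minimal sketch of the exposure odds ratio, followed by a numerical check of the rare disease assumption. All counts are hypothetical, and the second example uses cohort-style counts only to show how close the odds ratio gets to a relative risk of 2 when the disease is rare.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls

# Hypothetical case-control study: 100 cases and 100 controls.
or_case_control = odds_ratio(40, 60, 20, 80)

# Rare disease: 20/10000 exposed and 10/10000 unexposed people get the disease (RR = 2).
or_rare_disease = odds_ratio(20, 10, 9980, 9990)

print(f"case-control OR = {or_case_control:.2f}")
print(f"rare-disease OR = {or_rare_disease:.3f} (the RR is 2)")
```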
. Pre and post-test probabilities (+ve predictive value (PPV) & -ve predictive
value (NPV)):
A. Positive predictive value (PPV):
-------------------------------------------------
. Describes the probability of having the disease if the test result is +ve.
. The post-test probability of having the disease is directly related to the PPV.
. If the PPV is 25%, i.e. low, then even if the test result is positive, the
post-test probability of having the disease is low.
. The post-test probability also depends on the sensitivity, specificity and
pre-test probability of having the disease.

B. Negative predictive value (NPV):
----------------------------------------------------
. Describes the probability of not having the disease if the test result is -ve.
. NPV will vary with the pre-test probability of a disease (important), i.e.
. a patient with a high probability of having a disease will have a low NPV,
. and a patient with a low probability of having a disease will have a high NPV.
. If the NPV is 96% this means that if the test result is -ve, the chance that the
patient doesn't have the disease is high (96%),
. and the chance that the patient has the disease is low (100 - 96 = 4%).
1- BREAST CANCER & FNA test results:
. A patient with a high pre-test probability of having the disease (1st degree
relative having breast cancer or age > 40 yrs) has a low NPV.
. A patient with a low pre-test probability of having breast cancer (less than 40 yrs
old) has a high NPV.
2- HIV & ELISA test results:
. A patient who belongs to a high-risk group (e.g. multiple sexual partners, no
condom use, IV drug abuse)
--> has a high pre-test probability of having HIV infection --> so he will have a low NPV.
. On the other hand, a patient who belongs to a low-risk group (one sexual
partner, using condoms and no IV drug abuse)
--> has a low pre-test probability of having HIV infection --> so has a high NPV.

. The prevalence of the disease is directly related to the pre-test probability of
having the disease and to the PPV, & inversely related to
the NPV, so increased prevalence
--> low NPV but high PPV and vice versa.
. If the test result is -ve, the probability of the patient to have the disease = 1 - NPV.
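How the predictive values move with prevalence (pre-test probability) can be sketched with Bayes' rule. The sensitivity and specificity of 90% and the two prevalences are assumed values chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV for a test applied at a given pre-test probability."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

low_ppv, low_npv = predictive_values(0.90, 0.90, prevalence=0.01)    # low-risk group
high_ppv, high_npv = predictive_values(0.90, 0.90, prevalence=0.50)  # high-risk group

print(f"prevalence  1%: PPV = {low_ppv:.3f}, NPV = {low_npv:.3f}, 1 - NPV = {1 - low_npv:.3f}")
print(f"prevalence 50%: PPV = {high_ppv:.3f}, NPV = {high_npv:.3f}, 1 - NPV = {1 - high_npv:.3f}")
```

The same test gives a higher PPV and a lower NPV in the high-prevalence group, while its sensitivity and specificity stay fixed.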
. Cases and diagnostic tests that are high-yield USMLE questions in probabilities:
- Coronary artery disease and ECG stress test.
- Pulmonary embolism and ventilation-perfusion scanning.
- Prostate cancer and serum PSA level.
. Validity represents the appropriateness of the test (i.e. the test's ability to
measure what it is supposed to measure).
. In order to determine the validity of a test, the results are compared to those
obtained from the gold standard test.
. It doesn't depend on the pre-test probability of the disease.
N.B.: The sensitivity and specificity of a test are also determined by comparing its
results to the results obtained by the gold standard test.

. Test-retest reliability.
. A reliable test gives similar or very close results on repeat measurements.
. Receiver Operating Characteristic (ROC) curve:
. It emphasizes the importance of choosing the appropriate cutoff value, although
overlapping of normal & abnormal results makes it difficult.
. Any cutoff point demonstrates a trade-off between SENSITIVITY and (1 - SPECIFICITY).
. Sensitivity (positivity in disease) --> is the proportion of subjects who have the
target condition and give positive results.
. Sensitivity = TP/(TP + FN).
. Specificity (negativity in health) --> is the proportion of subjects without the
target condition who give -ve results.
. Specificity = TN/(TN + FP).
. CLINICALLY:
. ++ Sensitivity --> ++ true +ve & -- false -ve (diagnosed as normal but he is
diseased).
. ++ Sensitivity --> allows not to miss any diseased patient (not to miss any true
+ve).
. ++ Specificity --> ++ true -ve & -- false +ve (diagnosed as diseased but he is
normal).
. ROC --> aiming at decreasing false -ve and false +ve results (i.e. increasing
sensitivity and specificity).
- In ROC curve: sensitivity = true positive rate while (1 - specificity) = false
positive rate.
. Positive predictive value (ppv) --> is the probability of having the disease if the
test results are +ve.
. PPV = TP/(TP + FP).
. Negative predictive value (NPV) --> is the probability of not having the disease
if the test result is -ve.
. NPV = TN/(TN + FN).
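The four formulas above can be applied directly to the cells of a 2x2 table. The counts below are hypothetical (100 diseased and 900 healthy subjects).

```python
# Hypothetical results of a test compared with the gold standard.
tp, fn = 80, 20    # diseased subjects: test +ve / test -ve
fp, tn = 30, 870   # healthy subjects:  test +ve / test -ve

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)

print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
print(f"PPV         = {ppv:.3f}")
print(f"NPV         = {npv:.3f}")
```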
. Positive likelihood ratio (LR+) = sensitivity/(1 - specificity).
. (LR+) --> is the ratio of the proportion of patients who have the target condition
& test positive to
the proportion of patients without the target condition who also test positive.
. Negative likelihood ratio (LR-) = (1 - sensitivity)/specificity.
. (LR-) --> is the ratio of the proportion of patients who have the target condition
who test negative to
the proportion of patients without the target condition who also test negative.
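A quick sketch of both likelihood ratios, using LR+ = sensitivity/(1 - specificity) and LR- = (1 - sensitivity)/specificity. The test characteristics are assumed values.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios of a diagnostic test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

# Hypothetical test with 90% sensitivity and 80% specificity.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```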
. ROC curve has 2 axes: the vertical axis (Y) for sensitivity and the horizontal
axis (X) for (1 - specificity).
. Large Y values --> indicate high sensitivity.
. Small X values --> indicate high specificity.
. Low cutoff --> increases sensitivity (better ability to identify patients with the
disease, i.e. increases true positives),
although this decreases specificity (the test falsely identifies more
subjects as diseased although they are not) and vice versa.
. High cutoff --> Decrease sensitivity and Increase specificity.
. Low cutoff --> High Sensitivity --> higher negative predictive value (NPV) -->
decrease false -ve results (Ruling out probability).
. High cutoff --> Higher Specificity --> higher positive predictive value (PPV) -->
decrease false +ve results (Ruling in probability).
. A shift of the ROC curve upwards for a given cutoff indicates increased
sensitivity and vice versa.
. A shift of the curve to the right for a given cutoff indicates decreased specificity
and vice versa.

. The curve usually shows that an increase in sensitivity is offset by a decrease in
specificity.
. Both sensitivity and specificity depend on the cutoff value of a given test for
a given condition.
. Raising the cutoff value makes it more difficult to diagnose the condition, i.e.
it makes it harder to obtain +ve results and easier to obtain -ve results --> this
will increase specificity but
decrease sensitivity.
. Lowering the cutoff value makes it easier to obtain +ve results and harder to
obtain -ve results,
i.e. it increases sensitivity and decreases specificity.
. Increased sensitivity --> increased -ve predictive value (NPV) due to fewer
false -ve results.
. Increased specificity --> increased +ve predictive value (PPV) due to fewer
false +ve results.
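The cutoff trade-off can be seen by sweeping a cutoff over two overlapping sets of test values. The values are invented, and a result is called positive when it is at or above the cutoff (an assumption of this sketch).

```python
def sens_spec(diseased_values, healthy_values, cutoff):
    """Sensitivity and specificity when values >= cutoff are called positive."""
    true_pos = sum(v >= cutoff for v in diseased_values)
    true_neg = sum(v < cutoff for v in healthy_values)
    return true_pos / len(diseased_values), true_neg / len(healthy_values)

diseased = [4.2, 5.1, 6.3, 7.0, 8.4]   # hypothetical test values in diseased subjects
healthy = [2.0, 3.1, 3.9, 4.8, 5.5]    # hypothetical test values in healthy subjects

roc_points = {}
for cutoff in (4.0, 5.0, 6.0):
    sens, spec = sens_spec(diseased, healthy, cutoff)
    roc_points[cutoff] = (sens, spec)
    print(f"cutoff {cutoff}: sensitivity = {sens:.1f}, specificity = {spec:.1f}, "
          f"ROC point (x, y) = ({1 - spec:.1f}, {sens:.1f})")
```

Each cutoff gives one point on the ROC curve; raising the cutoff trades sensitivity for specificity.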
. Precision is the proportion of the true +ve results out of the total number of +ve
results of the test (-ve results are not taken into account).
. Precision is equivalent to +ve predictive value, i.e. true +ve/all +ve.
. It is the measure of the random error in the study.
. The study is precise if the results are not scattered widely; this is reflected by
a tight confidence interval.
. So, if the first study has a wider confidence interval than the second study -->
the second study is more precise.
. Accuracy is the proportion of the true results (true +ve and true -ve) out of all
results of the test.
. The closer the plotted curve approaches the left and top borders of the ROC
curve, the more accurate the test.
. Accuracy can also be measured by the total area under the plotted curve on
ROC curve.
. Increase of the total area under the curve --> increases the accuracy of the test.
. Both accuracy and precision depend upon sensitivity and specificity of the test
as well as the prevalence of the condition in the population tested.
. Validity and accuracy are measures of systematic errors (bias).
. Accuracy is reduced if the sample doesn't reflect the true value of the
parameter measured.
. Increasing the sample size --> increases the precision of the study, but doesn't
affect the accuracy.


. The correlation coefficient assesses a linear relationship between two variables.
. The null value for the correlation coefficient is 0 (no association).
. And the range of possible values is from -1 to 1.
. The sign of the correlation coefficient indicates a positive or negative
association.
. The closer the value to its margins (-1 or 1), the stronger the association.
. The correlation coefficient shows the strength of association but doesn't
necessarily imply causality.
. The association is statistically significant if the P value is low.
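A hand-rolled sketch of the correlation coefficient, assuming the usual Pearson definition (these notes do not name a specific coefficient). The data points are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(spread_x * spread_y)

exposure = [1, 2, 3, 4, 5]
r_positive = pearson_r(exposure, [2, 4, 5, 4, 5])    # outcome tends to rise with exposure
r_negative = pearson_r(exposure, [10, 8, 6, 4, 2])   # outcome falls in a perfect line

print(f"positive association: r = {r_positive:.3f}")
print(f"perfect negative association: r = {r_negative:.3f}")
```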
. Risk:
. It measures the incidence of the disease.
. It is calculated by dividing the number of diseased subjects by the number of
people at risk or of interest.
. Mean --> is the sum of observations divided by the number of observations.
. Mean (X') = ΣX/N, i.e. = sum of obs. / no. of obs.
. Median --> is the middle observation in a series of observations after arranging
them in an ascending or descending manner.
. If the number of observations is odd --> the median is the observation at
position (n+1)/2.
. If the number of observations is even --> the median is the mean of the
observations at positions n/2 and (n/2)+1.
. Mode --> is the most frequently occurring value in the data.
. EXAMPLE: 5,6,7,5,10,3
. Mean = (5+6+7+5+10+3)/6 = 36/6 = 6.
. Mode --> 5.
. EX2: 5,6,8,9,11
. Median position = (5+1)/2 = 3, so the median is the 3rd observation --> median = 8.
. EX3: 5,6,8,9
. n/2 = 4/2 = 2, so the median lies between the 2nd and 3rd observations.
. Median will be the mean of observations 2&3 --> (6+8)/2 = 7.
. Range: is a measure of variation (dispersion).
. Range: is the difference between the largest and the smallest values.
. Range = largest value - smallest value
e.g. for EX3: Range = 9-5 = 4.
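The three small examples above can be reproduced with Python's `statistics` module. The variable names are only for illustration.

```python
from statistics import mean, median, mode

example_1 = [5, 6, 7, 5, 10, 3]
example_2 = [5, 6, 8, 9, 11]   # odd number of observations
example_3 = [5, 6, 8, 9]       # even number of observations

mean_1 = mean(example_1)
mode_1 = mode(example_1)
median_2 = median(example_2)
median_3 = median(example_3)
range_3 = max(example_3) - min(example_3)

print(f"EXAMPLE: mean = {mean_1}, mode = {mode_1}")
print(f"EX2: median = {median_2}")
print(f"EX3: median = {median_3}, range = {range_3}")
```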


. Scatter plots are useful for crude analysis of data.
. They can demonstrate the type of association (linear or nonlinear).
. If a linear association is present, the correlation coefficient can be calculated.
. The association is positive (if the outcome increases with the increase in the
exposure) --> +ve correlation coefficient, while
the association is negative (if the outcome decreases with the increase in exposure)
--> -ve correlation coefficient.
. The correlation coefficient in an almost perfect linear association is close to 1
(or to -1 if the association is negative).
. Crude analysis of association using scatter plots doesn't account for
possible confounders.
. N.B:
1- It is very important to consider the natural history of a disease when
evaluating the effectiveness of a drug
in a trial, e.g. common cold --> natural resolution within one week should be
taken into consideration while evaluating an anti-viral drug used in the treatment
of the common cold.
2- It is difficult to comment on a drug's effectiveness unless a comparison is
made with the control group and
the statistical significance of the difference is assessed.


. The null hypothesis is always the statement of NO relationship between the exposure
and the outcome.
. To state the null hypothesis correctly you should recognize the study design.
. In a cross-sectional study: the 2 variables (e.g. CRP & colon cancer) are studied at
the same point in time, so
the temporal relationship between the 2 variables can't be evaluated.
. So you can't measure the relationship between the 2 variables --> the null
hypothesis is better considered.
. The alternative hypothesis opposes the null hypothesis.
. It states that there is a relationship between the exposure and the outcome.
. It is better for studies in which a relationship between the 2 variables
exists to consider the alternative hypothesis.
. External validity is the applicability of the obtained results beyond the cohort
that was studied.
. External validity answers the question "how generalizable are the results of a
study to other populations?"
. For example: if the cohort is restricted to middle-aged women, the results of the
study are applicable only to middle-aged women & not applicable to elderly men.

Dr. Hisham Elkilany