5/20/2021
[Flowchart: steps of the research process]
▪ Literature review to understand the issue (usually review data…)
▪ Develop rationale to support the investigation
▪ Develop a hypothesis or research question
▪ Develop research protocol & questionnaires (hypothesis testing, sample size, variables, questions)
▪ Pre-test the questionnaire & research process
▪ Ethical approval from an ERB
▪ Data management and analysis (preliminary analysis, many steps…)
▪ Conclusions and recommendations (numbers)
▪ Dissemination and response (crisp numbers)
▪ The methods and tools of biostatistics are used to analyze data for
decision making.
▪ They allow us to make valid inferences from known samples about the
populations from which they were drawn.
DATA ANALYSIS
There are various types of Statistical Analysis:
▪ Descriptive analysis: used to describe the data set
▪ Inferential analysis: used to generate conclusions about the
population’s characteristics based on the sample data
o Differences analysis: used to compare the mean of the responses
of one group to that of another group
o Associative analysis: determines the strength and direction of
relationships between two or more variables
▪ Predictive analysis: allows one to make forecasts for future events
UNDERSTANDING VARIABLES
▪ Any population or sample characteristic that we want to study is called
a “variable”, e.g., age, sex, years of education, HIV status, income.
▪ The term "variable" makes sense because the value of the characteristic
varies from one subject to another. This variation arises from inherent
differences among individuals and from errors, called measurement errors,
made in measuring and recording a subject's value on a characteristic.
Types of variables:
▪ QUALITATIVE: Categorical
▪ QUANTITATIVE: Numerical/Scale
Summary measures by type of variable:
▪ Quantitative (numerical/scale): measures of central tendency (mean, median, mode) and variance
▪ Qualitative (categorical): frequencies/proportions
• Variance is the sum of the squared deviations from the mean divided by N.
• Population variance is given by $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
• Sample variance is given by $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
STANDARD DEVIATION
▪ The most commonly used measure of dispersion
▪ To calculate the standard deviation, take the square root of the
variance.
▪ Population standard deviation is given by $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
▪ Sample standard deviation is given by $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
▪ The average or mean is always presented along with the standard deviation
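The variance and standard deviation formulas can be checked with a short Python sketch. The data set below is hypothetical and was chosen only so the arithmetic is easy to follow; the helper function names are illustrative.

```python
import math
import statistics

# Hypothetical data set (not from the slides), chosen for easy arithmetic.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

pop_var = population_variance(data)   # sigma squared
samp_var = sample_variance(data)      # s squared
pop_sd = math.sqrt(pop_var)           # sigma
samp_sd = math.sqrt(samp_var)         # s

print(f"mean                = {statistics.mean(data)}")
print(f"population variance = {pop_var:.3f}, SD = {pop_sd:.3f}")
print(f"sample variance     = {samp_var:.3f}, SD = {samp_sd:.3f}")
```

The standard library functions `statistics.pvariance`, `statistics.variance`, `statistics.pstdev` and `statistics.stdev` compute the same four quantities and can be used to cross-check the hand-written versions.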
▪ Whereas the range measures where a data set begins and ends, the
interquartile range (IQR) measures where the bulk of the values lie. That is why it is
preferred over many other measures of spread when reporting things like school
performance or SAT scores.
▪ The interquartile range is the first quartile subtracted from the
third quartile:
IQR = Q3 – Q1.
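As a small illustration of IQR = Q3 – Q1, the sketch below uses Python's `statistics.quantiles` on a hypothetical data set. Quartile conventions differ between textbooks and software, so other tools may give slightly different Q1 and Q3 for the same data; this example relies on the function's default ("exclusive") method.

```python
import statistics

# Hypothetical data set (not from the slides).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                        # spread of the middle half of the data
data_range = max(data) - min(data)   # distance between the two extremes

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"IQR = {iqr}, range = {data_range}")
```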
NORMAL DISTRIBUTION
▪ The most famous probability distribution in statistics.
▪ Also called the “Gaussian distribution” or “bell-shaped curve”.
▪ It is continuous, smooth and bell shaped, with only one peak (unimodal).
▪ The curve is symmetrical about the mean (the shape is the same on both sides).
▪ The mean, median and mode are equal and located at the center of the distribution.
▪ Two parameters define the normal distribution: the mean (μ) and the standard
deviation (σ).
▪ Since it is a probability distribution, the total area under the curve is 1.00 or 100%.
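These properties can be checked numerically with `statistics.NormalDist` from the Python standard library. The values μ = 100 and σ = 15 are assumptions made for illustration only; any other pair would show the same pattern.

```python
from statistics import NormalDist

# Hypothetical parameters: mean (mu) and standard deviation (sigma).
dist = NormalDist(mu=100, sigma=15)

# Symmetry about the mean: half of the total area lies below the mean.
below_mean = dist.cdf(dist.mean)

def area_within(k):
    """Area under the curve within k standard deviations of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower

print(f"mean = {dist.mean}, median = {dist.median}, mode = {dist.mode}")
print(f"area below the mean = {below_mean:.2f}")
for k in (1, 2, 3):
    print(f"area within {k} SD of the mean = {area_within(k):.4f}")
```

The three areas come out at roughly 68%, 95% and 99.7%, which is the pattern that z score tables are built on.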
SKEWNESS OF DATA
TYPES OF DISTRIBUTION/CURVE
▪ Normal distribution: the mean & median are the same
▪ Positive skew: the median lies to the left of the mean (median < mean)
▪ Negative skew: the mean lies to the left of the median (mean < median)
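The relative position of the mean and median in each type of curve can be seen in a few lines of Python. The three small data sets are hypothetical and were constructed only to show a symmetric, a positively skewed and a negatively skewed case.

```python
import statistics

# Hypothetical data sets (not from the slides).
symmetric = [1, 2, 3, 4, 5, 6, 7]
positive_skew = [1, 2, 2, 3, 3, 4, 20]      # long tail to the right
negative_skew = [1, 17, 18, 18, 19, 19, 20]  # long tail to the left

def mean_and_median(values):
    """Return the (mean, median) pair for a list of numbers."""
    return statistics.mean(values), statistics.median(values)

for name, values in [
    ("symmetric", symmetric),
    ("positive skew", positive_skew),
    ("negative skew", negative_skew),
]:
    mean, median = mean_and_median(values)
    print(f"{name:14s} mean = {mean}, median = {median}")
```

The extreme value pulls the mean towards the tail, while the median stays near the bulk of the data.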
OUTLIERS
▪ An outlier is commonly defined as any observation that falls more than
3 standard deviations away from the mean.
▪ Outliers are extremely important because they can significantly skew
distributions that would otherwise be normal.
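A minimal sketch of the 3 standard deviation rule is shown below. The data set and the function name `find_outliers` are hypothetical, and the sample standard deviation is used as an assumption. Note that in very small samples no value can lie 3 standard deviations from the mean, so the rule only becomes useful with a reasonable number of observations.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [x for x in values if abs(x - mean) / sd > threshold]

# Hypothetical data: 20 values close to 50 plus one extreme observation.
data = [48, 49, 50, 51, 52] * 4 + [120]

outliers = find_outliers(data)
print(f"mean with the extreme value    = {statistics.mean(data):.2f}")
print(f"mean without the extreme value = {statistics.mean(data[:-1]):.2f}")
print(f"flagged as outliers: {outliers}")
```

Comparing the two means shows how a single extreme value shifts the mean away from the bulk of the data.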
NORMAL DISTRIBUTION
Z SCORE TABLE
AREA UNDER THE CURVE
(X = μ + Zσ)
▪ To solve this, look for 90% (an area of 0.90) in the table
and read off the corresponding z score.
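The table lookup for 90% can be reproduced with `NormalDist.inv_cdf`, which returns the z score below which a given area lies. The values μ = 100 and σ = 15 are hypothetical and serve only to show how X = μ + Zσ converts the z score back to the original scale.

```python
from statistics import NormalDist

# z score with 90% of the area under the standard normal curve to its left.
z = NormalDist().inv_cdf(0.90)

# Hypothetical population parameters, for illustration only.
mu, sigma = 100, 15

# Convert the z score back to the original scale: X = mu + Z * sigma.
x = mu + z * sigma

print(f"z score for 90% = {z:.4f}")
print(f"X = {mu} + {z:.4f} * {sigma} = {x:.2f}")
```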
▪ Even if the sample size is large, in reality there will almost never be a
situation where you know the true population variance or standard deviation…
T DISTRIBUTION
▪ The t distribution (Student’s t-distribution) is a probability distribution that is used
to estimate population parameters when
i. the sample size is small
ii. and/or the population variance is unknown.
The t statistic is
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
where $\bar{x}$ is the sample mean, μ is the population mean, s is the standard deviation of the
sample, and n is the sample size. The distribution of the t statistic is called the t
distribution or the Student t distribution.
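A minimal sketch of the t statistic, computed as the difference between the sample mean and the population mean divided by s/√n, follows. The sample values and the population mean of 4 are assumptions made for illustration, and the function name `t_statistic` is hypothetical.

```python
import math
import statistics

def t_statistic(sample, population_mean):
    """t = (sample mean - population mean) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1)
    return (x_bar - population_mean) / (s / math.sqrt(n))

# Hypothetical sample and hypothesised population mean.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mu = 4

t = t_statistic(sample, mu)
print(f"n = {len(sample)}, sample mean = {statistics.mean(sample)}")
print(f"t = {t:.4f} with {len(sample) - 1} degrees of freedom")
```

The resulting value is then compared with the t table at n − 1 degrees of freedom.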
Following are the characteristics of the t distribution:
(1) The t statistic can take any value in −∞ < t < ∞.
(2) The probability distribution is symmetric about t = 0.
(3) The probability distribution is bell-shaped.
(4) The density curve looks like a standard normal curve, but the tails of the
t-distribution are "heavier" than the tails of the normal distribution. That is, we are
more likely to get extreme t-values than extreme z-values. There are no outliers.
(5) As the degrees of freedom increase, the t-distribution approaches the
standard normal z-distribution.
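Characteristics (2), (4) and (5) can be illustrated by evaluating the density curves directly. The sketch below writes out the standard textbook density of the t distribution using only the Python standard library; the function names and the choice of evaluation point (3) and degrees of freedom (5, 30, 1000) are illustrative assumptions.

```python
import math

def normal_pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def t_pdf(t, df):
    """Density of the t distribution with `df` degrees of freedom at t."""
    # lgamma (log of the gamma function) avoids overflow for large df.
    log_coef = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_coef - (df + 1) / 2 * math.log(1 + t * t / df))

# Height of each curve far out in the tail, 3 units from the center.
print(f"standard normal : {normal_pdf(3):.5f}")
for df in (5, 30, 1000):
    print(f"t with df={df:<5}: {t_pdf(3, df):.5f}")
```

With few degrees of freedom the t curve is noticeably higher in the tail than the normal curve, and the gap shrinks as the degrees of freedom grow.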
T SCORE TABLE
AREA UNDER THE CURVE
CHI SQUARE TABLE
▪ No negative values
▪ The value of χ² is different for each degree of freedom
▪ What will be the value of the χ² statistic if df = 1 and the confidence level is 95%?
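The table value for df = 1 at the 95% level can be cross-checked without a chi-square routine. This sketch relies on the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable, so 95% of the χ² area lies below the square of the z score that cuts off 2.5% in each normal tail. The shortcut applies to df = 1 only.

```python
from statistics import NormalDist

confidence = 0.95

# Two-sided z score: 2.5% of the area lies in each tail of the normal curve.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# For df = 1 only, the chi-square critical value is the square of that z.
chi_square_critical = z ** 2

print(f"z = {z:.4f}")
print(f"chi-square critical value (df=1, 95%) = {chi_square_critical:.3f}")
```

The result, about 3.84, matches the entry in a printed chi-square table; it can never be negative because it is a squared quantity.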
PROBABILITY DISTRIBUTIONS
▪ Discrete
▪ Continuous
Descriptive statistics
▪ Bar charts
▪ Histograms/frequency polygons
▪ Pie charts
▪ Scatter plots
▪ Mean, median
[Bar chart: percentages (0% to 100%) for BHWL, BNU, DGK, GJRN, GJRT, HYD, KHI, KSUR, LRK, MPK, NWB, PSH, QTA, RWP, SHKP, SLKT, SKKR, TRBT and OVERALL]
[Pie chart segments: Street 14%; Hotel/Msg 3%; KK 36%; Brothel 1%]
DESCRIPTIVE ANALYSIS
▪ Understand our data set
▪ Scale/Numerical data:
o Check the distribution: skewness & kurtosis (symmetrical or skewed; remember the z distribution and t distribution)
o Measures of central tendency (mean, median, mode)
o Measures of dispersion (standard deviation, variance, range, IQR)
o DECIDE: do you want to create categories or present the data as such?
▪ Qualitative/Categorical data:
o Check the proportions within each category
o Tests on categorical variables use the chi-square distribution
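The checklist above can be sketched as two small helper functions, one for a numerical variable and one for a categorical variable. The variable names, the data values and the function names are all hypothetical and serve only to show the kind of output a descriptive analysis produces.

```python
import statistics
from collections import Counter

def numeric_summary(values):
    """Central tendency and dispersion for a scale/numerical variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.stdev(values),
        "variance": statistics.variance(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

def category_proportions(values):
    """Proportion of observations falling in each category."""
    counts = Counter(values)
    return {category: n / len(values) for category, n in counts.items()}

# Hypothetical survey data (not from the slides).
ages = [19, 22, 22, 25, 27, 30, 31, 35, 41, 58]
sex = ["F", "M", "F", "F", "M", "F", "M", "F", "F", "M"]

summary = numeric_summary(ages)
proportions = category_proportions(sex)

for name, value in summary.items():
    print(f"{name:8s} = {value:.2f}")
for category, share in proportions.items():
    print(f"{category}: {share:.0%}")
```

Here the mean age lies above the median, which points to a positive skew, so the median and IQR may describe these ages better than the mean and standard deviation.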
THANKS
PLEASE READ THESE BASIC CONCEPTS….