
Biostatistics notes on:

Variables, graphical presentation and Normal distribution curve
Dr. Ehab Abo Ali
• What is Biostatistics?
Biostatistics is a branch of applied statistics that is concerned with the
application of statistical methods to medicine and other biological fields.
• It deals with the application of the most appropriate statistical methods for:
– Collection of data.
– Organization and summarization of collected data.
– Presentation and analysis of the summarized data.
– Interpretation and decision making on the basis of analyzed data.


Data → Statistics → Information

Statistical Analysis:
Statistical analysis is concerned with making sense of data to reach valid
conclusions or inferences, which enable us to make wise decisions in the
face of uncertainty.

There are two phases (types) of statistical analysis: the descriptive phase and
the inferential (analytic) phase.
• Classification (types) of Variables:
The first step in any statistical analysis is to identify
the type of data (variables) you have.
• There are three ways to classify variables:
• Quantitative vs Qualitative.
• Continuous vs Discrete.
• Dependent vs Independent.
1- Quantitative versus Qualitative
Quantitative (Numerical, Metric) Variables:
• They are variables that yield measurements whose values have numerical
meaning.

• Examples of quantitative variables include height (inches or centimeters),
blood pressure (mmHg), age (years), weight (kg), heart rate (beats/min), and number of
beds in a hospital.

Qualitative (Categorical) Variables:

• They are variables that yield observations by which individuals can be categorized
according to some characteristic or quality. They are variables that:
– Cannot be measured numerically.
– Have elements that can only be assigned to different categories (groups).

• Examples of qualitative variables include gender, marital status (single – married –
divorced – widowed), educational level (illiterate – read and write – high education),
occupation, and nationality.
2- Continuous versus Discrete
Continuous Variables:
• A continuous variable is one that can theoretically take an infinite (unlimited) number of possible values in any interval.
It can take values at any point along its scale of measurement.

• Height (1.83, 1.74… m), weight (48.72, 65.83… kg), temperature, time, blood pressure, and blood sugar level
are commonly used continuous variables.

Discrete Variables:
• A discrete variable is one that can take values only at specific points along its scale of measurement.

• Characteristics of discrete variables:
– Their values are usually whole numbers (integers), e.g. number of children in a family (two or three
children but not 2.5).
– Counting is the mathematical operation most often used with them.
– They have gaps: no real intermediate values exist between the specific points the variable can take.
– Examples include blood group, sex, educational level, heart rate, white blood cells in a given sample of
blood, number of cigarettes per day, and number of live births per mother.

• All qualitative variables are discrete.
• Quantitative variables may be continuous or discrete.

3- Dependent versus Independent
Dependent / Response:
• The variable of primary interest (e.g. blood pressure in an antihypertensive drug trial).
• It is not controlled by the experimenter.

Independent / Predictor:
• When an experiment is conducted, some variables are manipulated by the
experimenter (independent variables) and their effects are measured on a
response variable (dependent variable).
• An easy way to distinguish the independent variables in an experiment is to ask
the question, “What would be the effect of (the independent variable) on (the
dependent variable)?” For example, a new statin drug (independent variable) may be given to patients to
see if it lowers their cholesterol levels (dependent variable).
Scales of Measurement
(Scales used to measure variables)
• Nominal - categories only.
• Ordinal - categories with some order.
• Interval - differences but not ratios (no natural starting point, i.e. no true zero).
• Ratio - differences and ratios (a natural starting point, i.e. a true zero).
Meaning of the measurement scales (scale: characteristic question; examples):
• Nominal: Is A different from B? Examples: marital status, eye colour, gender, religion.
• Ordinal: Is A bigger than B? Examples: stage of disease, severity of pain, level of satisfaction.
• Interval: By how many units do A and B differ? Examples: temperature, calendar date, IQ test.
• Ratio: How many times bigger than A is B? Examples: distance, height, weight.
Graphical presentation of data
(Graphs or Charts)
- Pie chart.
- Bar chart.
- Histogram.
- Frequency polygon.
- Scatter diagram.
- Box plot.
Pie chart:

• It is suitable for summarizing data arranged in categories and on a
percentage basis. It is especially useful in presenting data that consist of
a small number of categories.
• A pie chart is a circle divided into wedges (segments or slices) that
correspond to the percentages or frequencies of the distribution, i.e. the
size of each slice is proportional to the frequency or percentage of cases
belonging to the category it represents.
• Because there are 360° in a circle, each 1% of the distribution can
be represented by a sector of the circle with a central angle of 3.6°.
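The rule of 3.6° per 1% can be sketched in a few lines of Python. This is only an illustration: the blood-group frequencies and the helper name pie_angles are made up for the example and are not part of the lecture data.

```python
def pie_angles(counts):
    """Return the central angle (in degrees) of each pie slice."""
    total = sum(counts.values())
    # Each slice takes a share of the 360° circle proportional to its frequency.
    return {category: 360 * n / total for category, n in counts.items()}

# Hypothetical blood-group frequencies (illustrative values only).
blood_groups = {"A": 40, "B": 10, "AB": 5, "O": 45}
angles = pie_angles(blood_groups)
for group, angle in angles.items():
    print(f"{group}: {angle:.1f} degrees")
```

Because the hypothetical frequencies add up to 100, each frequency is also a percentage, so group A (40%) gets 40 × 3.6° = 144°.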
Bar chart:
• It is a tool for presentation of categorical variables where the various
categories are represented on one axis (usually the horizontal) and
frequencies or percentages of each category along the other axis
(usually the vertical).
• A vertical bar represents each category, and the height of each bar
represents the frequency (or relative frequency) corresponding to each
one. The bars should be separate and of equal width.
• Bar chart may be simple when it represents one variable or it may
be multiple (clustered or stacked) when it represents the comparison of
more than one variable in the different comparison groups.
Histogram:
• A continuous metric variable can take a very large number of values, so it is usually
impractical to plot them without first grouping the values. The grouped data is plotted
using a frequency histogram. It is a special form of a bar chart that presents categories
(intervals) of a grouped quantitative variable. The bars are not separated by any space on
the X axis. The frequency or percentage of data in each category is depicted on the Y axis.

• A histogram looks like a bar chart used with discrete data, except that each bar in a
histogram represents an interval (category or class) of possible values rather than a single
value, and there are no gaps between adjacent bars. This emphasizes the continuous nature
of the underlying variable.

• The width of the bar represents the interval of each category and the total area of each
bar is proportional to the corresponding frequency or percentage of each category.
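The grouping step that comes before drawing a histogram can be sketched as follows. This is a minimal illustration, assuming equal-width intervals; the blood pressure readings, the starting point of 110 mmHg and the width of 10 mmHg are hypothetical choices, not values from the lecture.

```python
from collections import Counter

def group_into_intervals(values, start, width):
    """Count how many values fall in each interval [lower, lower + width)."""
    counts = Counter()
    for v in values:
        # Index of the equal-width interval that contains v.
        k = int((v - start) // width)
        counts[start + k * width] += 1
    return dict(sorted(counts.items()))

# Hypothetical systolic blood pressures in mmHg (illustrative values only).
pressures = [112, 118, 121, 125, 127, 130, 131, 134, 138, 142, 147, 155]
width = 10
table = group_into_intervals(pressures, start=110, width=width)
for lower, freq in table.items():
    # One '#' per observation gives a rough text histogram.
    print(f"{lower} to <{lower + width} mmHg | {'#' * freq}")
```

With equal-width intervals, the height of each bar is proportional to its frequency, so the text bars above show the same shape a drawn histogram would.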
Histogram
Frequency Polygon
Scatter Diagrams
• The relationship between two variables can be shown graphically in a
scatter diagram, as shown in Fig. A scatter diagram is a graph in which each
individual or unit measured is entered as a point, the position of each point
being determined by the values for the two characteristics measured.
Box-plot
• Boxplots provide a graphical summary of a distribution based on the three quartile
values (Q1, median, Q3), the minimum and maximum values, and any outliers. Like the pie chart, the
boxplot can only represent one variable at a time, but a number of boxplots can be set
alongside each other. They are useful for comparing large sets of data.

• A box-plot is a visual description of the distribution based on:
– Minimum
– Q1
– Median
– Q3
– Maximum
Box plots are particularly useful when comparing similar data for two or more
populations.
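The five values behind a box plot can be computed with Python's standard statistics module, as in the sketch below. The hospital-stay data are hypothetical, and the "inclusive" quartile method is an assumption made for the example: textbooks and software use several slightly different quartile conventions, so Q1 and Q3 may differ a little from one tool to another.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # n=4 splits the data into quarters, giving the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

# Hypothetical lengths of hospital stay in days (illustrative values only).
stay_days = [2, 3, 3, 4, 5, 6, 7, 9, 12]
summary = five_number_summary(stay_days)
for label, value in zip(("Minimum", "Q1", "Median", "Q3", "Maximum"), summary):
    print(f"{label}: {value}")
```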
Data Distribution
• Data can be “distributed” (spread out) in
different ways
A Bell Curve
What are some examples of things
that follow a Normal Distribution?
• Heights of people
• Errors in measurements
• Blood Pressure
• Test Scores

Note: all of these are QUANTITATIVE variables.
Normal Distribution Curve
• mean=median=mode
• Symmetry about the center
• 50% of the values less than the mean and 50%
greater than the mean
The Standard Deviation:
It is a measure of how spread out values are.
• 68% of values are within 1 standard deviation of the mean
• 95% of values are within 2 standard deviations of the mean
• 99.7% of values are within 3 standard deviations of the mean
Why do we need to know Standard
Deviation?
• Any value is
– likely to be within 1 standard deviation of the
mean
– very likely to be within 2 standard deviations
– almost certainly within 3 standard deviations
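The words "likely", "very likely" and "almost certainly" can be given numbers with a short check. The sketch below uses NormalDist from Python's standard statistics module; the helper name share_within is made up for the example.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# prints:
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The rounded figures 68%, 95% and 99.7% used in these notes are the usual approximations of these values.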
LET’S RECAP!
The properties of a normal distribution:
• It is a bell-shaped curve.
• It is symmetrical about the mean, μ. (The mean, the mode and the median all have the same value).
• The total area under the curve is 1 (or 100%).
• 50% of the area is to the left of the mean, and 50% to the right.
• Approximately 68% of the area is within 1 standard deviation, σ, of the mean.
• Approximately 95% of the area is within 2 standard deviations of the mean.
• Approximately 99.7% of the area is within 3 standard deviations of the mean.

[Figure: bell curve with the horizontal axis marked at μ - 3σ, μ - 2σ, μ - σ, μ, μ + σ, μ + 2σ, μ + 3σ]
LET’S PRACTICE!
WE DO: 95% of students at school are
between 1.1 m and 1.7 m tall.
• Assuming this data is normally distributed, can
you calculate the mean and standard
deviation?
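One way to work this out is sketched below, assuming the 95% range is centred on the mean and spans 2 standard deviations on each side (4 in total). The function name is made up for the example.

```python
def mean_and_sd_from_95_interval(lower, upper):
    """Estimate mean and SD, assuming [lower, upper] is the mean ± 2 SD range."""
    mean = (lower + upper) / 2   # the mean sits at the centre of the range
    sd = (upper - lower) / 4     # the range is 4 standard deviations wide
    return mean, sd

mean, sd = mean_and_sd_from_95_interval(1.1, 1.7)
print(f"mean = {mean:.2f} m, SD = {sd:.2f} m")
# prints: mean = 1.40 m, SD = 0.15 m
```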
YOU DO: 68% of mothers' own
children are between 4 and 6 years old.
Assuming this data is normally distributed, can you
calculate the mean?
WE DO: The reaction times for a hand-eye coordination test
administered to 1800 teenagers are normally distributed with a
mean of 0.35 seconds and a standard deviation of 0.05
seconds.
• Represent this information on a bell curve.
• About how many teens had reaction times
between 0.25 and 0.45 seconds?
• What is the probability that a teenager selected
at random had a reaction time greater than 0.4
seconds?
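The two numerical questions can be checked with a short sketch using NormalDist from Python's standard statistics module. The variable names are chosen for this example only. Since 0.25 s and 0.45 s are 2 standard deviations either side of the mean, and 0.4 s is 1 standard deviation above it, the rule-of-thumb answers are about 95% of 1800 = 1710 teens and about (100% - 68%) / 2 = 16%.

```python
from statistics import NormalDist

# Reaction times: mean 0.35 s, standard deviation 0.05 s (from the exercise).
reaction = NormalDist(mu=0.35, sigma=0.05)
n_teens = 1800

# Share of teens between 0.25 s and 0.45 s (mean ± 2 SD).
share_between = reaction.cdf(0.45) - reaction.cdf(0.25)
# Probability of a reaction time above 0.40 s (more than 1 SD above the mean).
p_slower = 1 - reaction.cdf(0.40)

print(f"between 0.25 s and 0.45 s: {share_between:.1%} of {n_teens} teens, "
      f"about {round(share_between * n_teens)}")
print(f"P(time > 0.40 s) = {p_slower:.1%}")
```

The computed values are slightly different from the rule-of-thumb answers because 68% and 95% are rounded figures.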
YOU DO: The waiting times for an elevator are normally distributed
with a mean of 1.5 minutes and a standard deviation of 20 seconds.

• Represent this information on a bell curve.
• Find the probability that a person waits longer
than 2 minutes 10 seconds for the elevator.
YOU DO: Dr. Ehab gave a test in PHC class. The scores were normally
distributed with a mean of 85 and a standard deviation of 3.

• Represent this information on a bell curve.
• What percent would you expect to score
between 82 and 88?
YOU DO: The heights of 250 twenty-year-old women are
normally distributed with a mean of 1.68 m and standard
deviation of 0.06 m.

• Represent this information on a bell curve.
• Find the probability that a woman has a
height between 1.56 m and 1.74 m.
Thank You
