

Descriptive statistics

Dr. N Shiukashvili

What is biostatistics???
Almost every day, news portals report findings such as:
A new treatment for HIV disease works better than current therapies
High blood pressure is demonstrated to be associated with heart
A study suggests that a certain pollutant may be harmful to
Such results are the work of multidisciplinary teams of researchers, including
public and environmental health specialists
Biostatisticians play essential roles in
designing the studies
analyzing the data
creating new methods for addressing these problems.

Descriptive Statistics
Class A
IQs of 13 Students

Class B
IQs of 13 Students



Which Group is Smarter?

Descriptive Statistics
Which group is smarter now?
Class A--Average IQ

Class B--Average IQ

They're roughly the same!

With a summary descriptive statistic, it is much easier to answer our question.
Descriptive statistics merely describe, organize, or summarize data; they refer only
to the actual data available.
For example: the mean blood pressure of a group of patients
or the success rate of a surgical procedure.

Descriptive Statistics



A population is the complete set of measurements of interest, for example the IQs of
all the students of a university taken as a whole.
A subset of those measurements, evaluated separately from the rest of the
population, makes up a sample.

Biostatistics is also used in modeling and hypothesizing.
Given a set of data, scientists combine biostatistics and probability theory to
estimate how likely diseases are to strike populations, how likely drugs are to cure
those diseases, and how people will react to those drugs.
In this way, biostatistics promises to be as good at predicting the future as it is at
analyzing the past.
What is probability???
A physician may say that a patient has a 50-50 chance of surviving a certain
A physician may say that she is 95 percent certain that a patient has a particular
As these examples suggest, most people express probabilities in terms of

We measure the probability (p) that an event occurs by a number
between zero and one.

An event that is less likely to occur has a probability closer to 0,
whereas an event that is more likely to occur has a probability closer to 1.
An event that cannot occur has a probability of zero, and an event that is certain to
occur has a probability of one.

Addition rule
Two events are said to be dependent if they DO affect one another.
If there are 4 cards, one of each suit, what is the probability of drawing a heart at random?

25%
What is the probability of drawing a red card?
25% + 25% = 50%

The addition rule of probability states that:

If events A and B are mutually exclusive (they cannot both happen), then the
probability that either A or B occurs is equal to the sum of their individual
probabilities: P(A or B) = P(A) + P(B).
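The card example can be checked with a few lines of Python. This is only a sketch: the 4-card deck with one card of each suit is an assumption inferred from the 25% figure, and the helper name `probability` is made up for illustration.

```python
from fractions import Fraction

# Assumed deck: 4 cards, one of each suit (inferred from the 25% answer).
deck = ["hearts", "diamonds", "clubs", "spades"]


def probability(event_suits, cards):
    """Probability of drawing one card whose suit is in event_suits."""
    favourable = sum(1 for suit in cards if suit in event_suits)
    return Fraction(favourable, len(cards))


p_heart = probability({"hearts"}, deck)
p_diamond = probability({"diamonds"}, deck)

# A single card cannot be both a heart and a diamond, so the two events
# are mutually exclusive and their probabilities can simply be added.
p_red = p_heart + p_diamond

print(p_heart, p_diamond, p_red)  # 1/4 1/4 1/2
```

Counting the red cards directly gives the same 1/2, which is what the addition rule promises for mutually exclusive events.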

Multiplication rule
Two events are said to be independent if they do NOT affect one another.
A method for finding the probability that both of two events occur together.
A - blue eyes
B - high IQ
If the probability that a newborn girl has blue eyes is 25%,
and the probability that she has a high IQ is 1%,
what is the probability that a newborn girl has both blue eyes and a high IQ?
Expressing the probabilities on the 0-1 scale:
0.25 × 0.01 = 0.0025 (0.25%)
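The same arithmetic as a short Python sketch. It assumes, as the slide does, that eye colour and IQ are independent; the variable names are illustrative only.

```python
# Probabilities on the 0-1 scale, taken from the example above.
p_blue_eyes = 0.25
p_high_iq = 0.01

# Multiplication rule for independent events: P(A and B) = P(A) * P(B).
p_both = p_blue_eyes * p_high_iq

print(f"{p_both:.4f} ({p_both:.2%})")
```

If the two traits were not independent, this product would no longer give the joint probability.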

Binomial Distribution
Representation of descriptive statistics data:

Organize Data

Summarize Data
Central Tendency

Binomial Distribution
The binomial distribution is a probability distribution.
It has discrete values. It counts the number of successes in yes/no-type experiments.
There are two parameters:
the number of times an experiment is done (n)
the probability of a success (p).
Example: tossing a coin 10 times and counting the number of heads (n = 10, p = 1/2).
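For the coin example, the probability of each possible number of heads can be computed from the two parameters. The sketch below uses the standard binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k), which the slides do not spell out; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent yes/no trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # 10 coin tosses, probability of heads 1/2

for k in range(n + 1):
    print(f"{k:2d} heads: {binomial_pmf(k, n, p):.4f}")
```

The eleven probabilities add up to 1, and the most likely result is 5 heads.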

Binomial Distribution
If a coin is tossed twice, the four possible outcomes are:
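The two-toss case is small enough to enumerate. A minimal sketch, writing H for heads and T for tails (labels chosen here, not taken from the slides):

```python
from collections import Counter
from itertools import product

# Every ordered result of two tosses.
outcomes = ["".join(tosses) for tosses in product("HT", repeat=2)]
print(outcomes)  # ['HH', 'HT', 'TH', 'TT']

# Count how many outcomes give 0, 1 or 2 heads.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)
for heads in sorted(heads_counts):
    print(f"{heads} heads: {heads_counts[heads]} of {len(outcomes)} outcomes")
```

With a fair coin each outcome has probability 1/4, so one head (HT or TH) is twice as likely as either no heads or two heads.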

Frequency Distributions
You are a researcher conducting a study of arterial tension in the normal population.
You have data from 500 people.
What's the next step?
Organize the data in order from the highest to the lowest, recording the
frequency with which each score occurs.
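A frequency table of this kind can be sketched in Python. The eight readings below are invented purely for illustration (the real study would have 500), and each reading is stored as a (systolic, diastolic) pair in mm Hg.

```python
from collections import Counter

# Hypothetical readings (systolic, diastolic) in mm Hg; not real study data.
readings = [
    (120, 80), (110, 70), (130, 85), (120, 80),
    (110, 70), (120, 80), (140, 90), (110, 70),
]

# Count how often each reading occurs.
frequency = Counter(readings)

# List the scores from highest to lowest with their frequencies.
for (systolic, diastolic), count in sorted(frequency.items(), reverse=True):
    print(f"{systolic}/{diastolic} mm Hg: {count}")
```

A reading that never occurs in the data, such as 260/200 in this made-up sample, simply has a frequency of 0.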


What will be the frequency of 60/40 mm Hg?



What will be the frequency of 110/70 mm Hg?



What will be the frequency of 260/200 mm Hg?


Arterial tension

Frequency Distribution
Grouped frequency

Frequency Distribution
Relative frequency transforms the data to show the percentage of all the elements
that fall within each class interval.
If 18 people out of 50 had the same value, the relative frequency is 36% (18/50 × 100).
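Grouped and relative frequencies can be computed together. In this sketch the systolic values and the 10 mm Hg class width are assumptions made for illustration; only the 18-out-of-50 check comes from the slide.

```python
from collections import Counter


def relative_frequency(count, total):
    """Percentage of all elements that fall in one class interval."""
    return count * 100 / total


# Hypothetical systolic readings (mm Hg), grouped into 10 mm Hg intervals.
systolic = [104, 112, 118, 121, 125, 127, 133, 138, 142, 156]
width = 10
grouped = Counter((value // width) * width for value in systolic)

for lower in sorted(grouped):
    count = grouped[lower]
    share = relative_frequency(count, len(systolic))
    print(f"{lower}-{lower + width - 1} mm Hg: f = {count}, relative = {share:.0f}%")

print(relative_frequency(18, 50))  # 36.0, the example from the slide
```

The relative frequencies of all class intervals add up to 100%.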

Normal Distribution
Take the same example: arterial tension in the normal population,
gathered from 500 people.
Graphically, it will be represented like this:




Gaussian distribution

Arterial tension

Non-Normal Distribution
A distribution is not always symmetrical. Asymmetric frequency distributions
are called skewed distributions.
Depending on the location of the tail of the curve, a distribution can be:
Positively (or right) skewed distributions
negatively (or left) skewed distributions (Because the long "tail" is on the negative side of the

Negative skew - a relatively large number of high scores and a
small number of low scores.

Positive skew - a relatively large number of low scores and a
small number of very high scores.

Non-Normal Distribution
There is also another non-normal distribution, called the bimodal distribution.
Ex: Goodpasture syndrome: a very rare disease with a bimodal age distribution,
20-30 years and 60-70 years.

Measures of Central Tendency

What is "central tendency," and why do we want to know it?
Imagine this situation:
You take a 5-point quiz in Behavioral Science.
The next day you learn that your score is "3/5" (60%).
How do you react?
Are you happy with your score of 3, or disappointed?
How do you decide?
What additional information do you need before you know how to feel?

What did the other students get???

Are you like most students???
Your 60% might be the highest in the group. Or the lowest.

Measures of Central Tendency

Comparing individual scores to a distribution of scores is fundamental to statistics.
Which of the three datasets would make you happiest?

Dataset B is a depressing outcome, even though your score is no different from
the one in Dataset A.
The problem is that the other four students had higher grades, so if we
draw a graph, your mark will be below the center of the distribution.

Measures of Central Tendency

Measures of central tendency are:
The "mean" (the same as the "mathematical average") is the number you get when you
add up all the numbers and then divide by the number of numbers.
These are the ages at which some disease affects teenagers:
13, 18, 13, 14, 13, 16, 14, 21, 13
The mean age of onset for this disease will be:
(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) ÷ 9 = 135 ÷ 9 = 15
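The mean calculation above, as a minimal Python sketch using the ages from the slide:

```python
# Ages of onset from the example above.
ages = [13, 18, 13, 14, 13, 16, 14, 21, 13]

# Mean = sum of all values divided by the number of values.
mean_age = sum(ages) / len(ages)

print(sum(ages), len(ages), mean_age)  # 135 9 15.0
```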

Measures of Central Tendency

In a normal distribution, the mean lies directly in the middle.
In a negatively skewed distribution, the mean is pulled furthest in the negative direction.
In a positively skewed distribution, the mean is pulled furthest in the positive direction.
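How skew drags the mean toward the tail can be seen on tiny made-up datasets. All three lists below are hypothetical, and the helpers `mean` and `middle_value` are illustrative names.

```python
def mean(values):
    return sum(values) / len(values)


def middle_value(values):
    """Middle element of an odd-length list once it is sorted."""
    return sorted(values)[len(values) // 2]


symmetric = [1, 2, 3, 4, 5]          # no tail
right_skewed = [1, 2, 3, 4, 50]      # one very high score (positive skew)
left_skewed = [1, 47, 48, 49, 50]    # one very low score (negative skew)

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    print(f"{name}: middle = {middle_value(data)}, mean = {mean(data)}")
```

In the symmetric list the mean equals the middle value; a single extreme score moves the mean toward that score while the middle value stays put.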

Measures of Central Tendency

The "median" is the "middle" value in the list of numbers.
To find the median, your numbers have to be listed in numerical order.
These are the ages at which some disease affects teenagers:
13, 18, 13, 14, 13, 16, 14, 21, 13
Rewritten in numerical order:
13, 13, 13, 13, 14, 14, 16, 18, 21
There are 9 numbers, so the middle one is the 5th: the median is 14.
The mode is the number that is repeated more often than any other. If no
number is repeated, then there is no mode for the list.
13, 13, 13, 13, 14, 14, 16, 18, 21
So in the list above, 13 is the mode.
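Median and mode for the same ages can be checked with Python's standard `statistics` module. A minimal sketch:

```python
import statistics

ages = [13, 18, 13, 14, 13, 16, 14, 21, 13]

print(sorted(ages))             # [13, 13, 13, 13, 14, 14, 16, 18, 21]
print(statistics.median(ages))  # 14 (the middle, 5th, value of the sorted list)
print(statistics.mode(ages))    # 13 (it occurs four times)
```

`statistics.median` sorts the data internally, so the list does not have to be ordered first.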

Measures of Variability
There are two normal distributions (A and B) with identical means, modes, and medians.
Despite these similarities, these two distributions are obviously different.
This means that central tendency alone is not enough!!!

The scores forming distribution A are clearly more scattered than are those
forming distribution B.
They differ in terms of their variability

Measures of Variability

Blood glucose level

If these 2 graphs depict the effects of two drugs, which drug is more effective???

Drug B is better, as fewer patients on this distribution have very high or
very low glucose levels.

Patient number

There are three important measures of variability:

Range.
Variance.
Standard deviation.

Measures of Variability
The range is the difference between the highest and the lowest scores in the distribution.
13, 13, 13, 13, 14, 14, 16, 18, 21
The largest value in the list is 21, and the smallest is 13,
so the range is 21 - 13 = 8.
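The range of the same list, as a short Python sketch (the name `value_range` is used only to avoid shadowing Python's built-in `range`):

```python
ages = [13, 13, 13, 13, 14, 14, 16, 18, 21]

# Range = highest score minus lowest score.
value_range = max(ages) - min(ages)

print(max(ages), min(ages), value_range)  # 21 13 8
```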

Measures of Variability
The variance measures how far each number in the set is from the mean.
You and your friends have just measured the heights of your dogs (in
millimetres).
The heights (at the shoulders) are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.
Mean (average) height is 394 mm.

Measures of Variability

Now we calculate each dog's difference from the Mean:

Measures of Variability
To calculate the Variance, take each difference, square it, and
then average the result:

So, the Variance is 21,704.
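The variance steps can be followed in Python. One assumption is needed: the fifth height is taken to be 300 mm, the value that reproduces the stated mean of 394 mm. The sketch divides by the number of dogs (population variance), which is what gives 21,704.

```python
# Heights in mm; the fifth value (300) is inferred from the stated mean of 394.
heights = [600, 470, 170, 430, 300]

mean_height = sum(heights) / len(heights)

# Step 1: each dog's difference from the mean.
differences = [h - mean_height for h in heights]

# Step 2: square each difference, then average the squares.
squared = [d**2 for d in differences]
variance = sum(squared) / len(squared)

print(mean_height)   # 394.0
print(differences)   # [206.0, 76.0, -224.0, 36.0, -94.0]
print(variance)      # 21704.0
```

Squaring makes every difference positive, so dogs shorter than the mean do not cancel out dogs taller than the mean.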

A variance equal to zero indicates that all values within a set of numbers are identical.
A large variance indicates that numbers in the set are far from the mean and each
other, while a small variance indicates the opposite.
On its own, variance has limited use, because it is expressed in squared units (here, mm²).

Measures of Variability
Standard Deviation
It is just the square root of the variance, so:
Standard Deviation: σ = √21,704 = 147.32... ≈ 147 mm
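The square root step, as a minimal Python sketch using the variance from the dog example:

```python
import math

variance = 21704  # mm squared, from the dog-height example

# Standard deviation = square root of the variance, back in the original unit (mm).
standard_deviation = math.sqrt(variance)

print(f"{standard_deviation:.2f} mm")  # 147.32 mm
print(round(standard_deviation))       # 147
```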

So, using the Standard Deviation we have a "standard" way of knowing what is
normal, and what is extra large or extra small.
Rottweilers are tall dogs. And Dachshunds are a bit short ...

but don't tell them!
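One way to read "normal" here is "within one standard deviation of the mean". The sketch below applies that reading to the dog heights; the fifth height (300 mm) is inferred from the stated mean, and the one-standard-deviation cut-off is an assumption about what the slide intends.

```python
import statistics

# Heights in mm; the fifth value (300) is inferred from the stated mean of 394.
heights = [600, 470, 170, 430, 300]

mean_height = statistics.mean(heights)
sd = statistics.pstdev(heights)  # population standard deviation, about 147 mm

lower, upper = mean_height - sd, mean_height + sd

extra_small = [h for h in heights if h < lower]
normal = [h for h in heights if lower <= h <= upper]
extra_large = [h for h in heights if h > upper]

print(f"normal band: {lower:.0f} to {upper:.0f} mm")
print("extra small:", extra_small)  # [170]
print("normal:", normal)            # [470, 430, 300]
print("extra large:", extra_large)  # [600]
```

The 600 mm and 170 mm dogs fall outside the band, which matches the remark about tall and short breeds.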

Measures of Variability
Samples can be very uniform, with the data all clustered around the mean, or they
can be spread out a long way from the mean.
The standard deviation measures this spread.

68-95-99.7 rule
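The rule says that in a normal distribution about 68% of values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. These shares can be checked with `statistics.NormalDist` from the Python standard library; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The shares are the same for any mean and standard deviation, because they depend only on the distance from the mean measured in standard deviations.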

Measures of Variability

What is biostatistics???
According to statistics, every 6th person on Earth is Chinese

How many of you are here???

Which of you is Chinese???

Do NOT take statistics TOO seriously

Thanks for your attention

Dr. Nino Shiukashvili