Descriptive statistics

Dr. N Shiukashvili

What is biostatistics???

Almost every day, news portals report findings like these:

A new treatment for HIV disease works better than current therapies

High blood pressure is shown to be associated with heart disease

A study suggests that a certain pollutant may be harmful to humans

Such results are the work of multidisciplinary teams of researchers, including

Physicians

public and environmental health specialists

BIOSTATISTICIANS

Biostatisticians play essential roles in

designing the studies

analyzing the data

creating new methods for addressing these problems.

Descriptive Statistics

Class A, IQs of 13 students:

102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110

Class B, IQs of 13 students:

127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109


Which group is smarter now?

Class A average IQ: 110.54

Class B average IQ: 110.23

With a summary descriptive statistic, it is much easier to answer our question.
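As a quick check of the two averages, here is a minimal Python sketch. It assumes that the first 13 IQ values listed belong to Class A and the next 13 to Class B (a split that reproduces the stated averages); the variable names are illustrative only.

```python
from statistics import mean

# Assumed split: the first 13 listed values are Class A, the next 13 are Class B.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

mean_a = mean(class_a)
mean_b = mean(class_b)

print(f"Class A average IQ: {mean_a:.2f}")  # 110.54
print(f"Class B average IQ: {mean_b:.2f}")  # 110.23
```

Two numbers now stand in for 26, which is the point of a summary statistic.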

Descriptive statistics merely describe, organize, or summarize data; they refer only to the actual data available.

For example: the mean blood pressure of a group of patients, or the success rate of a surgical procedure.

Descriptive Statistics

Population: the entire group of interest, for example university students taken as a whole.

Sample: a few of those measurements, evaluated separately from the rest of the population, make up a sample.

Probability

Biostatistics is also used in modeling and hypothesizing.

Given a set of data, scientists combine biostatistics and probability theory to determine the likelihood that diseases will hit populations, that drugs will cure those diseases, and how people will react to those drugs.

In this way, biostatistics promises to be as good at predicting the future as it is at analyzing the past.

What does probability mean???

A physician may say that a patient has a 50-50 chance of surviving a certain operation.

A physician may say that she is 95 percent certain that a patient has a particular disease.

As these examples suggest, most people express probabilities in terms of percentages.


We measure the probability (p) of the occurrence of some event by a number between zero and one (0-1).

An event that is less likely to occur has a probability closer to 0, whereas an event that is more likely to occur has a probability closer to 1.

An event that cannot occur has a probability of zero, and an event that is certain to occur has a probability of one.


Addition rule

Two events are said to be dependent if they DO affect one another.

If there are 4 cards, one of each suit, what is the probability that a card taken at random is a heart?

25%

What is the probability of getting a red card (a heart or a diamond)?

25% + 25% = 50%

If events A and B are mutually exclusive (they cannot both happen), then the probability that either one occurs is equal to the sum of their individual probabilities: P(A or B) = P(A) + P(B).
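The card example can be checked with a short Python sketch. It assumes the four cards are one of each suit (which is what the 25% figure implies); the helper `probability` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction

# Assumed setup: four cards, one of each suit, one card drawn at random.
cards = ["hearts", "diamonds", "clubs", "spades"]
red_suits = {"hearts", "diamonds"}

def probability(event, outcomes):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    favorable = sum(1 for outcome in outcomes if outcome in event)
    return Fraction(favorable, len(outcomes))

p_heart = probability({"hearts"}, cards)
p_diamond = probability({"diamonds"}, cards)

# A single card cannot be both a heart and a diamond (mutually exclusive),
# so the addition rule applies: P(heart or diamond) = P(heart) + P(diamond).
p_red_by_addition = p_heart + p_diamond
p_red_direct = probability(red_suits, cards)

print(p_heart, p_red_by_addition, p_red_direct)  # 1/4 1/2 1/2
```

Counting the red cards directly gives the same answer as adding the two probabilities, which holds only because the two events cannot happen together.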


Multiplication rule

Two events are said to be independent if they do NOT affect one another.

The multiplication rule is a method for finding the probability that both of two independent events occur together.

A - blue eyes

B - high IQ

If the probability for a newborn girl to have blue eyes is 25%, and the probability of a high IQ is 1%, what is the probability that a newborn girl has both blue eyes and a high IQ?

Expressing the probabilities on the 0-1 scale:

0.25 × 0.01 = 0.0025 (0.25%)
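The same arithmetic as a minimal Python sketch. The variable names are illustrative, and the independence of eye color and IQ is the assumption made by the example itself.

```python
p_blue_eyes = 0.25  # P(A), taken from the example
p_high_iq = 0.01    # P(B), taken from the example

# Multiplication rule for independent events: P(A and B) = P(A) * P(B).
p_both = p_blue_eyes * p_high_iq

print(f"{p_both:.4f} ({p_both:.2%})")  # 0.0025 (0.25%)
```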

Binomial Distribution

Representation of descriptive statistics data:

Organize data: tables, graphs

Summarize data: central tendency, variation


The binomial distribution is a probability distribution.

It has discrete values. It counts the number of successes in yes/no-type experiments.

There are two parameters:

the number of times an experiment is done (n)

the probability of a success (p).

Example:

Tossing a coin 10 times and counting the number of heads (n = 10, p = 1/2).
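For the coin example, the probability of each possible number of heads can be sketched in Python with the standard binomial formula, P(k) = C(n, k) · p^k · (1 - p)^(n - k). The function name `binomial_pmf` is an assumption made for this illustration, not something defined in the lecture.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5  # ten tosses of a fair coin
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(distribution):
    print(f"{k:2d} heads: {prob:.4f}")
```

The probabilities over all values of k from 0 to n add up to 1, and for a fair coin the most likely count is 5 heads.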


If a coin is tossed twice, the four possible outcomes are: heads-heads, heads-tails, tails-heads, and tails-tails.
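The two-toss case is small enough to enumerate. The sketch below, assuming a fair coin and using H and T as illustrative labels, lists the outcomes and counts how many of them contain 0, 1, or 2 heads.

```python
from collections import Counter
from itertools import product

# All equally likely outcomes of two tosses: H = heads, T = tails.
outcomes = ["".join(tosses) for tosses in product("HT", repeat=2)]
print(outcomes)  # ['HH', 'HT', 'TH', 'TT']

# Count how many outcomes give 0, 1, or 2 heads.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)
for heads in sorted(heads_counts):
    print(f"{heads} heads: {heads_counts[heads]}/{len(outcomes)}")
```

The resulting probabilities of 1/4, 2/4, and 1/4 are the binomial distribution with n = 2 and p = 1/2.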

Frequency Distributions

You are a researcher conducting a study of arterial tension (blood pressure) in a normal population.

You have data from 500 people.

What's the next step?

Organize the data in order from the highest to the lowest, recording the frequency with which each score occurs.
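A minimal Python sketch of this step follows. The ten readings are hypothetical values invented for illustration, not data from the study, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical readings (mmHg), for illustration only; not data from the study.
readings = [120, 115, 130, 120, 125, 110, 120, 130, 125, 120]

# Frequency: how many times each score occurs.
frequency = Counter(readings)

# List the scores from highest to lowest, each with its frequency.
table = sorted(frequency.items(), reverse=True)
for score, count in table:
    print(f"{score:>5} {count:>3}")
```

With 500 real measurements the code stays the same; only the list of readings grows.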

[Figure: frequency distribution. Horizontal axis: arterial tension (low to high). Vertical axis: population (0 to 500).]

Frequency Distribution

Grouped frequency


RELATIVE FREQUENCY DISTRIBUTIONS

A relative frequency distribution transforms the data to show the percentage of all the elements that fall within each class interval.

If 18 people out of 50 fall within the same class interval, the relative frequency is 36% (18/50 × 100).
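The calculation is a single division, sketched here in Python; the function name `relative_frequency` is hypothetical.

```python
def relative_frequency(count, total):
    """Percentage of all observations that fall in one class interval."""
    return 100 * count / total

print(relative_frequency(18, 50))  # 36.0
```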

Normal Distribution

Take the same example: arterial tension in a normal population, gathered from 500 people.

Graphically, the distribution is called symmetrical, bell-shaped, or Gaussian.

[Figure: bell-shaped curve. Horizontal axis: arterial tension. Vertical axis: population (0 to 500).]

Non-Normal Distribution

A distribution is not always symmetrical. Asymmetric frequency distributions are called skewed distributions.

Depending on the location of the tail of the curve, a distribution can be:

Positively (or right) skewed: a large number of low scores and a small number of very high scores (the long "tail" is on the positive side of the peak).

Negatively (or left) skewed: a large number of high scores and a small number of low scores (the long "tail" is on the negative side of the peak).


There is also another non-normal distribution, called the bimodal distribution.

Example: Goodpasture syndrome, a very rare disease with a bimodal age distribution (peaks at 20-30 years and 60-70 years).

What is "central tendency," and why do we want to know it?

Imagine this situation:

You have a 5-point quiz in behavioral science.

The next day your score is posted as "3/5" (60%).

How do you react?

Are you happy with your score of 3, or disappointed?

How do you decide?

What additional information do you need before you know how to feel?

Are you like most students???

Your 60% might be the highest score in the group, or the lowest.

Comparing individual scores to a distribution of scores is fundamental to statistics.

Which of the three datasets would make you happiest?

The one in Dataset A.

The problem is that the other four students had higher grades, so if we draw a graph, your mark will be below the center of the distribution.

Measures of central tendency are:

Mean

Median

Mode

MEAN

The "mean" (the same as the mathematical average) is the number you get when you add up all the numbers and then divide by the number of numbers.

This is the age at which some disease affects teenagers:

13, 18, 13, 14, 13, 16, 14, 21, 13

The mean age for this disease onset will be:

(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) / 9 = 135 / 9 = 15
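The same calculation as a minimal Python sketch (variable names are illustrative):

```python
ages = [13, 18, 13, 14, 13, 16, 14, 21, 13]

# Mean = sum of the values divided by how many values there are.
mean_age = sum(ages) / len(ages)

print(sum(ages), len(ages), mean_age)  # 135 9 15.0
```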

MEAN

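In a normal distribution, the mean lies directly in the middle.

In a negatively skewed distribution, the mean is pulled furthest in the negative direction (toward the tail).

In a positively skewed distribution, the mean is pulled furthest in the positive direction (toward the tail).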

MEDIAN

The "median" is the "middle" value in the list of numbers.

To find the median, your numbers have to be listed in numerical order

This is the age at which some disease affects teenagers:

13, 18, 13, 14, 13, 16, 14, 21, 13

Rewritten in numerical order:

13, 13, 13, 13, 14, 14, 16, 18, 21

There are 9 values, so the median is the 5th value: 14.
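The sort-then-pick-the-middle procedure can be sketched in Python. The function below is a hypothetical helper; its handling of an even number of values (averaging the two middle values) is the usual convention and is an assumption here, since the lecture only shows an odd-length list.

```python
def median(values):
    """Middle value of the sorted list; average of the two middle values if the count is even."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

ages = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(sorted(ages))  # [13, 13, 13, 13, 14, 14, 16, 18, 21]
print(median(ages))  # 14
```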

MODE

The mode is the number that is repeated more often than any other. If no number is repeated, then there is no mode for the list.

13, 13, 13, 13, 14, 14, 16, 18, 21

In the numbers above, 13 occurs four times, more than any other value, so 13 is the mode.
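A minimal Python sketch of this definition follows. The helper `mode` is hypothetical; it returns `None` when no value is repeated, and as a simplification it does not treat ties specially (it reports the first of the most frequent values).

```python
from collections import Counter

def mode(values):
    """Most frequent value, or None if no value is repeated."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > 1 else None

ages = [13, 13, 13, 13, 14, 14, 16, 18, 21]
print(mode(ages))       # 13
print(mode([1, 2, 3]))  # None
```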

Measures of Variability

There are two normal distributions (A and B) with identical means, modes, and medians.

Despite these similarities, the two distributions are obviously different.

This means that central tendency alone is not enough!!!

The scores forming distribution A are clearly more scattered than those forming distribution B.

They differ in terms of their variability.


If these two graphs depict the effects of two drugs, which drug is more effective???

[Figure: two distributions of glucose levels by patient number; an annotation notes that patients on one distribution have very high or very low glucose levels.]

Range

Variance

Standard deviation.


RANGE

The range is the difference between the highest and the lowest scores in the distribution.

13, 13, 13, 13, 14, 14, 16, 18, 21

The largest value in the list is 21, and the smallest is 13,

so the range is 21 - 13 = 8.
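In Python the range is one subtraction; the sketch below uses the illustrative name `value_range` to avoid shadowing the built-in `range`.

```python
ages = [13, 13, 13, 13, 14, 14, 16, 18, 21]

# Range = highest score minus lowest score.
value_range = max(ages) - min(ages)

print(max(ages), min(ages), value_range)  # 21 13 8
```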


VARIANCE

The variance measures how far each number in the set is from the mean.

You and your friends have just measured the heights of your dogs (in millimeters).

The heights (at the shoulders) are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.

The mean (average) height is (600 + 470 + 170 + 430 + 300) / 5 = 394 mm.

To calculate the variance, take each difference from the mean, square it, and then average the result:

Variance = (206² + 76² + (-224)² + 36² + (-94)²) / 5 = 108,520 / 5 = 21,704

A variance of zero indicates that all values within a set of numbers are identical.

A large variance indicates that numbers in the set are far from the mean and from each other, while a small variance indicates the opposite.

On its own, variance has limited use because it is expressed in squared units.
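The variance of the dog heights can be worked through step by step in Python. This is a sketch with illustrative variable names; it divides by the number of values (the population variance), matching the "average the result" instruction, whereas a sample variance would divide by one less than that.

```python
heights_mm = [600, 470, 170, 430, 300]

mean_height = sum(heights_mm) / len(heights_mm)      # 394.0
differences = [h - mean_height for h in heights_mm]  # 206, 76, -224, 36, -94
squared = [d ** 2 for d in differences]              # 42436, 5776, 50176, 1296, 8836

# Population variance: the average of the squared differences.
variance = sum(squared) / len(squared)

print(mean_height, variance)  # 394.0 21704.0
```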


Standard Deviation

It is just the square root of the variance, so:

Standard deviation = √21,704 = 147.32... ≈ 147 mm

So, using the standard deviation, we have a "standard" way of knowing what is normal, and what is extra large or extra small.

Rottweilers are tall dogs, and Dachshunds are a bit short...
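To make "extra large or extra small" concrete, the sketch below computes the standard deviation of the five dog heights and flags any height more than one standard deviation from the mean. The one-standard-deviation cutoff and the helper name `classify` are assumptions made for this illustration.

```python
from math import sqrt

heights_mm = [600, 470, 170, 430, 300]

mean_height = sum(heights_mm) / len(heights_mm)
variance = sum((h - mean_height) ** 2 for h in heights_mm) / len(heights_mm)
std_dev = sqrt(variance)  # about 147.32 mm

# Assumed cutoff: "normal" means within one standard deviation of the mean.
low, high = mean_height - std_dev, mean_height + std_dev

def classify(height):
    """Label a height relative to the mean plus or minus one standard deviation."""
    if height > high:
        return "extra large"
    if height < low:
        return "extra small"
    return "within one standard deviation"

for h in heights_mm:
    print(h, classify(h))
```

Under this cutoff the 600 mm dog is flagged as extra large and the 170 mm dog as extra small, while the other three fall in the normal band.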


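Samples can be very uniform, with the data all clustered around the mean, or they can be spread out a long way from the mean.

The standard deviation measures this spread.

68-95-99.7 rule: in a normal distribution, about 68% of values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.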
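The percentages in this rule describe how much of a normal distribution lies within 1, 2, and 3 standard deviations of the mean. They can be reproduced with Python's standard library, using the fact that for a normal distribution this fraction equals erf(k / √2); the function name `fraction_within` is hypothetical.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

The third value is closer to 99.7% than to 99%, which is why the rule is often written as 68-95-99.7.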


What is biostatistics???

According to statistics, every sixth person on Earth is Chinese.

Which of you is Chinese???

