
Descriptive statistics

Armand B. Hisona Jr, RMT, MSPHue


Statistics
• Many studies generate large numbers of data
points, and to make sense of all that data,
researchers use statistics that summarize the
data, providing a better understanding of overall
tendencies within the distributions of scores.
Reasons for using statistics
• aids in summarizing the results
• helps us recognize underlying trends and
tendencies in the data
• aids in communicating the results to others
Types of statistics

1. descriptive (which summarize some characteristic of a sample)
• Measures of central tendency
• Measures of variability
• Measures of skewness

2. inferential (which test for significant differences between groups and/or significant relationships among variables within the sample)
• t-ratio, chi-square, beta-value
Descriptive statistics

• Descriptive statistics is a series of procedures designed to illuminate the data, so that its principal characteristics and main features are revealed.
• This may mean sorting the data by size;
perhaps putting it into a table, maybe
presenting it in an appropriate chart, or
summarising it numerically.
Descriptive Statistics is a
tool or technique that is used
to describe and organize the
characteristics of a collection
of information or data. The
collection is called a data set
or just data.
Why use it?

Descriptive Statistics is used in research to answer five basic questions based on five key concepts.
Concept: Finding middle scores

Question: What is the middle set of scores for this data set?
Concept: Finding the spread of scores

Question: How spread out are the scores of this data set?
Concept: Finding the rank of scores

Question: How does a particular score compare to the rest of the set of scores for this data set?
Concept: Finding relationships between variables

Question: How are different variables related in this data set?
Key terms

• Central Tendency measures. They are computed to give a “center” around which the measurements in the data are distributed.

• Variation or Variability measures. They describe “data spread” or how far away the measurements are from the center.

• Relative Standing measures. They describe the relative position of specific measurements in the data.
Measures of Central Tendency

• Mean
• Median
• Mode
The Mean (average value)
• The sum of all the scores divided by the number of scores.
• A good measure of central tendency for roughly symmetric distributions.
• Can be misleading in skewed distributions, since it can be greatly influenced by extreme scores; in that case other statistics, such as the median, may be more informative.
• Formula: $\mu = \sum X / N$ (population); $\bar{x} = \sum x_i / n$ (sample), where $\mu$ / $\bar{x}$ is the population/sample mean and $N$ / $n$ is the number of scores.
Example of Mean

Measurements (x)    Deviation (x - mean)
3                   -1
5                    1
5                    1
1                   -3
7                    3
2                   -2
6                    2
7                    3
0                   -4
4                    0
Sum: 40             Sum: 0

• MEAN = 40/10 = 4
• Notice that the sum of the “deviations” is 0.
• Notice that every single observation enters into the computation of the mean.
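The arithmetic in this example can be checked with a short Python sketch. It reuses the ten measurements from the table; the variable names are illustrative only.

```python
# The ten measurements from the example table.
measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

n = len(measurements)
mean = sum(measurements) / n                    # 40 / 10

# Deviation of each measurement from the mean (x - mean).
deviations = [x - mean for x in measurements]

print("sum of scores:", sum(measurements))      # 40
print("mean:", mean)                            # 4.0
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))    # 0.0
```

Every measurement appears in the sum, which is why a single extreme value can shift the mean.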
The mean

Features:
1. One advantage of the mean over the median is that it
uses all of the information in the data set.
2. It is affected by skewness in the distribution, and by the presence of outliers in the data.
3. It cannot be used with ordinal data.
The median
When the data are sorted from lowest to highest, the middle value is the median: half of the values are equal to or less than the median, and half are equal to or greater than it.
Exercise
• The following are the survival days of 11 rats:
4, 10, 7, 50, 3, 15, 2, 9, 13, >60, >60
Question: what is the average survival time in days?

Day: 2, 3, 4, 7, 9, 10, 13, 15, 50, >60, >60
Rank: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
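One way to work the exercise is sketched below. The two ">60" values are censored (the exact survival time is unknown), so a mean cannot be computed from them, but the median only needs the rank order. Representing ">60" as positive infinity is an assumption made here so that those values sort last; it is not part of the original exercise.

```python
import math
import statistics

# Survival days from the exercise. ">60" is stored as +infinity
# (an assumption for illustration) so that it sorts after every known value.
survival_days = [4, 10, 7, 50, 3, 15, 2, 9, 13, math.inf, math.inf]

ordered = sorted(survival_days)
middle_rank = (len(ordered) + 1) // 2       # rank 6 of 11
median_days = ordered[middle_rank - 1]      # convert rank to 0-based index

print("middle rank:", middle_rank)          # 6
print("median survival days:", median_days) # 10
print("agrees with statistics.median:",
      statistics.median(survival_days) == median_days)
```

The median is the value at rank 6, and it does not change no matter how far beyond 60 days the last two rats survived.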
The median

Features:
1. The median is not much affected by skewness in the distribution, or by the presence of outliers.
2. It discards a lot of information, because it ignores most of the values, apart from those in the centre of the distribution.
Normal Distributions

• The curve is basically bell shaped, extending from $-\infty$ to $\infty$.
• Symmetric, with scores more concentrated in the middle (i.e. around the mean) than in the tails.
• Mean, median and mode coincide.
• Normal distributions differ in how spread out they are.
• The area under each curve is 1.
• The height of a normal distribution can be specified mathematically in terms of two parameters: the mean ($\mu$) and the standard deviation ($\sigma$).
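These properties can be checked numerically with a small sketch that uses `NormalDist` from Python's standard `statistics` module. The mean and standard deviation below are hypothetical values chosen only for illustration, and the area is approximated with a simple midpoint sum rather than computed exactly.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0
narrow = NormalDist(mu, sigma)
wide = NormalDist(mu, 2 * sigma)    # same mean, more spread out

# Symmetry: the curve has the same height at equal distances from the mean.
left_height = narrow.pdf(mu - 10)
right_height = narrow.pdf(mu + 10)

def area_under_curve(dist, step=0.01, width=8.0):
    """Midpoint sum of the density over mean +/- `width` standard deviations."""
    start = dist.mean - width * dist.stdev
    n_steps = int(round(2 * width * dist.stdev / step))
    return sum(dist.pdf(start + (i + 0.5) * step) for i in range(n_steps)) * step

print("heights at mu - 10 and mu + 10:", left_height, right_height)
print("peak height, narrow vs wide:", narrow.pdf(mu), wide.pdf(mu))
print("approximate area, narrow:", area_under_curve(narrow))
print("approximate area, wide:", area_under_curve(wide))
```

The wider curve is lower at its peak, yet both areas come out at approximately 1.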
Mode
• The most frequently occurring score
• Look at the simple frequency of each score
• Report mode when using nominal scale, the
most frequently occurring category
• If you have a rectangular distribution do not
report the mode
Features:
1. The mode is a measure of common-ness or typical-ness.
2. The mode is not particularly useful with metric continuous data, where no two values may be the same.
Example of Mode

Measurements (x): 3, 5, 5, 1, 7, 2, 6, 7, 0, 4

• In this case the data have two modes: 5 and 7.
• Both measurements occur twice.
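The mode example can be reproduced by counting the simple frequency of each score. This is a minimal sketch using `collections.Counter` and `statistics.multimode` (available from Python 3.8) on the ten measurements of the example.

```python
from collections import Counter
from statistics import multimode

# The ten measurements from the example.
measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

# Simple frequency of each score.
frequencies = Counter(measurements)

# multimode returns every value that shares the highest frequency.
modes = multimode(measurements)

print("frequencies:", dict(frequencies))
print("modes:", modes)    # [5, 7]
```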
Measures of Variability

• Range
• Variance
• Standard deviation
Range
• Distance between the highest and lowest scores in a distribution
• Sensitive to extreme scores
• Can compensate by calculating the interquartile range (the distance between the 25th and 75th percentile points), which represents the range of scores for the middle half of a distribution
• Usually used in combination with other measures of dispersion
Range (example)

unit 1    unit 2
 9.7       9.0
11.5      11.2
11.6      11.3
12.1      11.7
12.4      12.2
12.6      12.5
13.1      13.2
13.5      13.8
13.6      14.0
14.8      15.5
16.3      15.6
26.9      16.2
          16.4

[Figure: back-to-back stem-and-leaf plot of unit 1 and unit 2, stems 9 to 26]

• R: range(x)
• It would be better to use the midspread…
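The effect of one extreme score on the range can be seen with the unit 1 and unit 2 values from this example. The sketch below is in Python; the midspread (interquartile range) is computed with `statistics.quantiles` using its default method, which is an assumption, since other quartile conventions give slightly different numbers.

```python
import statistics

# Values read from the example table.
unit_1 = [9.7, 11.5, 11.6, 12.1, 12.4, 12.6, 13.1, 13.5, 13.6, 14.8, 16.3, 26.9]
unit_2 = [9.0, 11.2, 11.3, 11.7, 12.2, 12.5, 13.2, 13.8, 14.0, 15.5, 15.6, 16.2, 16.4]

def value_range(data):
    """Distance between the highest and lowest scores."""
    return max(data) - min(data)

def midspread(data):
    """Interquartile range: distance between the 25th and 75th percentile points."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # default quartile method
    return q3 - q1

for name, data in (("unit 1", unit_1), ("unit 2", unit_2)):
    print(name, "range:", round(value_range(data), 2),
          "midspread:", round(midspread(data), 2))

# Dropping the single extreme score 26.9 shrinks the unit 1 range sharply.
print("unit 1 range without 26.9:", round(value_range(unit_1[:-1]), 2))
```

The range of unit 1 is driven almost entirely by the single value 26.9, while the midspread ignores it.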


Variance
• The difference between an observed value and the
mean is called the deviation from the mean
• The variance is the mean squared deviation from
the mean
• i.e. you subtract the mean from each value, square each result and then take the average.

• Because it is squared it can never be negative

$\sigma^2 = \sum (x_i - \bar{x})^2 / n$
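As a worked sketch of these steps, the Python below applies the formula to the ten measurements used in the earlier mean example (reusing that data here is a choice made for illustration).

```python
# The ten measurements from the mean example.
measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

n = len(measurements)
mean = sum(measurements) / n                                  # 4.0

# Step 1: deviation from the mean; step 2: square each deviation.
squared_deviations = [(x - mean) ** 2 for x in measurements]

# Step 3: average the squared deviations.
variance = sum(squared_deviations) / n

print("squared deviations:", squared_deviations)
print("sum of squared deviations:", sum(squared_deviations))  # 54.0
print("variance:", variance)                                  # 5.4
```

No squared deviation can be negative, so the variance cannot be negative either.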
Standard Deviation (SD)

• A summary statistic of how much scores vary from the mean
• Square root of the Variance
▫ expressed in the original units of measurement
▫ Represents the average amount of dispersion
in a sample
▫ Used in a number of inferential statistics
The standard deviation
What is a standard deviation (in English)?

• the mean of deviations from the mean (sort of)

What is:
• σ (lowercase sigma) is the population standard deviation.
• S is the sample standard deviation.
• ŝ (s-hat) is the sample estimate of σ.

The deviation (definitional) formula for the population standard deviation:

$\sigma = \sqrt{\sum (X - \bar{X})^2 / N}$
• The larger the standard deviation the more variability
there is in the scores
• The standard deviation is somewhat less sensitive to
extreme outliers than the range (as N increases)
• $x - \bar{x}$: the deviation of a score from the mean
Estimating the population
standard deviation from a sample

S, the sample standard deviation, is usually a little smaller than the population standard deviation. Why?

The sample mean minimizes the sum of squared deviations (SS). Therefore, if the sample mean differs at all from the population mean, the SS computed around the sample mean will underestimate the SS computed around the population mean.

Therefore, statisticians alter the formula of the sample standard deviation by subtracting 1 from N in the denominator.
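The argument can be illustrated with a short Python sketch. It treats the ten measurements from the earlier examples as a sample and uses a population mean of 5, which is a purely hypothetical value chosen for illustration, to show that the sum of squares is smallest around the sample mean.

```python
import math
import statistics

# Treated here as a sample; the values come from the earlier examples.
sample = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]
n = len(sample)
sample_mean = sum(sample) / n                     # 4.0

def sum_of_squares(data, center):
    """Sum of squared deviations (SS) of the data around `center`."""
    return sum((x - center) ** 2 for x in data)

hypothetical_population_mean = 5.0                # assumption, for illustration only

ss_around_sample_mean = sum_of_squares(sample, sample_mean)
ss_around_population_mean = sum_of_squares(sample, hypothetical_population_mean)

# Dividing by n - 1 instead of n makes the estimate slightly larger.
sd_dividing_by_n = math.sqrt(ss_around_sample_mean / n)
sd_dividing_by_n_minus_1 = math.sqrt(ss_around_sample_mean / (n - 1))

print("SS around sample mean:", ss_around_sample_mean)            # 54.0
print("SS around population mean:", ss_around_population_mean)    # 64.0
print("SD dividing by n:", round(sd_dividing_by_n, 3))
print("SD dividing by n - 1:", round(sd_dividing_by_n_minus_1, 3))
print("statistics.stdev:", round(statistics.stdev(sample), 3))
```

Python's `statistics.stdev` divides by n - 1, while `statistics.pstdev` divides by n.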
• The population variance could be interpreted as
the average squared difference from the
population mean, and the sample variance has
almost the same interpretation about the sample
mean.
Features
• The variance and the standard deviation are the most appropriate measures of variation when the data come from a symmetric distribution; they describe the spread of a numeric variable.
Skewness of distributions
• Measures look at how lopsided distributions are
—how far from the ideal of the normal curve
they are
• When the median and the mean are different,
the distribution is skewed. The greater the
difference, the greater the skew.
• Distributions that trail away to the left are
negatively skewed and those that trail away to
the right are positively skewed
• If the skewness is extreme, the researcher should
either transform the data to make them better
resemble a normal curve or else use a different
set of statistics—nonparametric statistics—to
carry out the analysis
Positive and negative Skewness

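• Positive skew (tail to the right): Mode, Median, Mean in increasing order
• Negative skew (tail to the left): Mean, Median, Mode in increasing order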
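Comparing the mean with the median gives a quick, informal check on the direction of skew. The Python sketch below uses the unit 1 values from the range example, which include one large score (26.9); the choice of data set is for illustration only, and the comparison is a rough indicator rather than a formal skewness statistic.

```python
import statistics

# Unit 1 values from the range example; 26.9 is a single large score.
unit_1 = [9.7, 11.5, 11.6, 12.1, 12.4, 12.6, 13.1, 13.5, 13.6, 14.8, 16.3, 26.9]

mean_value = statistics.fmean(unit_1)
median_value = statistics.median(unit_1)

print("mean:", round(mean_value, 2))
print("median:", round(median_value, 2))

if mean_value > median_value:
    print("mean > median: the tail trails to the right (positive skew)")
elif mean_value < median_value:
    print("mean < median: the tail trails to the left (negative skew)")
else:
    print("mean equals median: no skew indicated by this check")
```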
Percentiles

• The p-th percentile is a number such that at most p% of the measurements are below it and at most (100 - p)% of the measurements are above it.
• Example: if the 85th percentile of a data set is 340, then at most 15% of the measurements are above 340 and at most 85% of the measurements are below 340.

• Notice that the median is the 50th percentile
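The definition can be turned into a direct check. The Python sketch below tests whether a candidate value satisfies the "at most p% below, at most (100 - p)% above" condition; the function name `is_pth_percentile` is hypothetical, and the data are the ten measurements from the earlier examples.

```python
import statistics

# The ten measurements from the earlier examples.
measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

def is_pth_percentile(data, value, p):
    """True if at most p% of data lie below value and at most (100 - p)% lie above."""
    n = len(data)
    percent_below = 100 * sum(x < value for x in data) / n
    percent_above = 100 * sum(x > value for x in data) / n
    return percent_below <= p and percent_above <= 100 - p

median_value = statistics.median(measurements)

print("median:", median_value)
print("median is a 50th percentile:",
      is_pth_percentile(measurements, median_value, 50))
print("6 is a 50th percentile:", is_pth_percentile(measurements, 6, 50))
print("7 is an 85th percentile:", is_pth_percentile(measurements, 7, 85))
```

With small data sets more than one value can satisfy the definition, which is why software packages may report slightly different percentiles for the same data.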


So
• Descriptive statistics are used to summarize data
from individual respondents, etc.
▫ They help to make sense of large numbers of
individual responses, to communicate the essence
of those responses to others
• They focus on typical or average scores, the
dispersion of scores over the available responses,
and the shape of the response curve

End

Thank You For Listening!
