Descriptive Statistics Lecture 2
1
Data Types
Numerical/Quantitative
Discrete: A variable is discrete if its set of
possible values either is finite or else can be listed
in an infinite sequence (one in which there is a
first number, a second number, and so on).
Example: the number of persons arriving for
service during a particular period (e.g. 1, 2, 4, 6, 7, ...).
Based on enumeration/counting
2
Data Types
Numerical/Quantitative
Continuous: A variable is continuous if its
possible values consist of an entire interval on the
number line.
Example: weight of an individual, reaction time for
a particular process (e.g. 2.5, 2.55).
Based on measurement.
3
Data Types
Qualitative/Categorical
Ordinal: classes with a natural ordering, e.g. juniors,
seniors and graduate students, or excellent, good,
fair, poor, worst
Nominal (arbitrary): classes with no natural ordering, e.g. black, green, yellow, white
4
Plotting Data: describing spread of data
Frequency Table and Histogram
Example:
◦ A researcher is investigating short-term memory
capacity: the number of symbols remembered is
recorded for each of 20 participants:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
10, 6, 8, 9, 3, 5, 6, 4, 11, 6
◦ We can describe our data by using a Frequency
Distribution
5
Illustration of Frequency Table
X    f    p     %
11   1    0.05  5%
10   2    0.10  10%
9    1    0.05  5%
8    2    0.10  10%
7    2    0.10  10%
6    4    0.20  20%
5    3    0.15  15%
4    3    0.15  15%
3    2    0.10  10%
Frequency tables can display more detailed information about the distribution
X = Memory Score (number of symbols remembered)
f = Frequency
N = ∑ f = 20
Percentages and proportions
p = fraction of total group associated with each score (relative frequency)
p = f/N, ∑ pi = 1
% = p(100) = 100(f/N)
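The table can be reproduced from the raw scores. The following is a minimal sketch using only the Python standard library; the names `scores` and `frequency_table` are illustrative and not part of the lecture.

```python
from collections import Counter

# Memory scores for the 20 participants, copied from the example above.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]


def frequency_table(data):
    """Return rows (X, f, p, percent), sorted from highest X to lowest."""
    counts = Counter(data)          # f for each distinct score
    n = len(data)                   # N = sum of all frequencies
    return [(x, f, f / n, 100 * f / n)
            for x, f in sorted(counts.items(), reverse=True)]


table = frequency_table(scores)
for x, f, p, pct in table:
    print(f"{x:>2}  {f}  {p:.2f}  {pct:.0f}%")
```

The rows come out in the same order as the slide (highest score first), and the proportions sum to 1.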
Important Use
Proportion of individuals/participants who remembered up to 6
symbols = 0.1 + 0.15 + 0.15 + 0.2 = 0.6 = 60%
6
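The same cumulative proportion can be checked directly from the raw scores. This is a small sketch; `proportion_at_most` is a hypothetical helper name introduced here for illustration.

```python
# Memory scores for the 20 participants, copied from the example above.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]


def proportion_at_most(data, limit):
    """Fraction of observations whose value is <= limit."""
    return sum(1 for x in data if x <= limit) / len(data)


p_up_to_6 = proportion_at_most(scores, 6)
print(f"{p_up_to_6:.2f}")  # 12 of the 20 scores are 6 or lower
```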
Histogram
A Histogram displays a range of values of a variable
that have been broken into groups or intervals.
7
Histogram (Discrete Data)
First, determine the frequency and relative
frequency of each x value.
Mark possible x values on a horizontal scale.
Above each value, draw a rectangle whose height is the
relative frequency (or alternatively, the frequency) of
that value.
This ensures that the area of each rectangle is
proportional to the relative frequency of the value.
If the relative frequencies of x=1 and x = 5 are .35
and .07, respectively, then the area of the rectangle
above 1 is five times the area of the rectangle above
5.
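As a rough sketch of these steps, the snippet below computes the relative frequency of each value and draws a text histogram with the standard library only. It reuses the memory scores from the earlier example, and it draws the bars horizontally rather than as vertical rectangles, which is a simplification made here. The function names are illustrative.

```python
from collections import Counter

# Memory scores from the earlier frequency-table example.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]


def relative_frequencies(data):
    """Map each distinct value to its relative frequency f / N."""
    n = len(data)
    return {x: f / n for x, f in sorted(Counter(data).items())}


def text_histogram(rel_freq, scale=100):
    """One line per value; bar length is proportional to relative frequency."""
    lines = []
    for x, p in rel_freq.items():
        bar = "#" * round(p * scale)
        lines.append(f"{x:>3} | {bar} {p:.2f}")
    return "\n".join(lines)


rel = relative_frequencies(scores)
print(text_histogram(rel))
```

Because every bar has the same thickness, bar length and bar area are both proportional to relative frequency, which is the property the slide describes.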
8
Histogram (Simple)
9
Histogram (Simple)
Mode = value of the variable
with the highest
frequency (or relative
frequency)
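A short sketch of finding the mode from raw data, again using the memory scores from the earlier example. The helper `modes` is a name chosen here; it returns a list because a data set can have more than one value tied for the highest frequency.

```python
from collections import Counter

# Memory scores from the earlier frequency-table example.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]


def modes(data):
    """Return every value that attains the highest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(x for x, f in counts.items() if f == top)


print(modes(scores))  # [6], since 6 occurs four times
```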
10
Histogram
(large and/or continuous data)
11
Grouped Frequency Distribution Tables
(Class Interval)
X       f
95-99   1
90-94   1
85-89   0
80-84   1
75-79   2
70-74   4
65-69   7
60-64   0
55-59   6
50-54   3
◦ Sometimes the spread of data is too wide
◦ Grouped tables present scores as class intervals
About 5-20 intervals
An interval should preferably be of equal width
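The grouping step can be sketched as follows. The slide does not give the raw scores, so `raw_scores` below is a hypothetical data set chosen only so that its counts match the table; `grouped_frequencies` is likewise an illustrative name, and it assumes integer scores and equal-width classes.

```python
# Hypothetical raw scores (not from the lecture), chosen to match the table.
raw_scores = [50, 52, 54, 55, 56, 57, 57, 58, 59,
              65, 66, 66, 67, 68, 68, 69,
              70, 71, 73, 74, 76, 78, 82, 91, 97]


def grouped_frequencies(data, low, high, width):
    """Count integer scores in classes low..low+width-1, and so on up to high."""
    table = []
    for start in range(low, high + 1, width):
        end = start + width - 1
        f = sum(1 for x in data if start <= x <= end)
        table.append((f"{start}-{end}", f))
    return table[::-1]  # highest class first, as on the slide


grouped = grouped_frequencies(raw_scores, low=50, high=99, width=5)
for label, f in grouped:
    print(f"{label:<7} {f}")
```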
12
Histogram with Class Intervals/Class
Widths (Continuous Data)
13
Histogram with Equal Class
Intervals/Class Widths
14
Histogram with Class Intervals/Class
Widths
Class Formation
There are no hard-and-fast rules concerning
either the number of classes or the choice of
classes themselves.
Between 5 and 20 classes will be satisfactory for
most data sets.
Generally, the larger the number of observations
in a data set, the more classes should be used.
A reasonable rule of thumb is
Number of classes ≈ √(number of observations)
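A minimal sketch of this rule of thumb. Clamping the result to the 5-20 range combines the square-root rule with the "between 5 and 20 classes" guideline above; that combination is an assumption made here, not something the lecture prescribes, and `suggested_class_count` is a hypothetical name.

```python
import math


def suggested_class_count(n_observations, lower=5, upper=20):
    """Square-root rule of thumb, kept within the 5-20 guideline."""
    k = round(math.sqrt(n_observations))
    return min(max(k, lower), upper)


for n in (20, 49, 100, 1000):
    print(n, suggested_class_count(n))
```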
15
Histogram with Un-equal Class
Intervals/Class Widths
Equal-width classes may not be a sensible choice if a
data set “stretches out” to one side or the other.
After determining frequencies and relative
frequencies, calculate the height of each rectangle
using the formula
rectangle height = (relative frequency of the class) / (class width)
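As a sketch of this calculation, the snippet below assumes the usual density convention, in which rectangle height is the relative frequency of the class divided by the class width. The class limits and frequencies are hypothetical values invented for illustration.

```python
def density_heights(class_limits, frequencies):
    """Height of each rectangle = (relative frequency) / (class width)."""
    n = sum(frequencies)
    heights = []
    for (low, high), f in zip(class_limits, frequencies):
        width = high - low
        heights.append((f / n) / width)
    return heights


# Hypothetical unequal-width classes and their frequencies.
class_limits = [(0, 2), (2, 4), (4, 8), (8, 16)]
frequencies = [8, 12, 12, 8]

heights = density_heights(class_limits, frequencies)
areas = [h * (high - low) for h, (low, high) in zip(heights, class_limits)]
print(heights)
print(sum(areas))  # total area is 1, up to floating-point rounding
```

With heights defined this way, each rectangle's area equals the relative frequency of its class, so wide classes are not visually exaggerated.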
18
Histogram - Qualitative Data
19
Histogram Shapes-Data Distribution
Unimodal: One peak
20
Frequency Distribution: the Normal
Distribution
◦ Bell-shaped: symmetrical around the midpoint, where
the greatest frequency of scores occurs
21
Skewness
Measures of symmetry of data (location
of concentration of data)
◦ Positively Skewed
Skewed to the right (data concentrated at low values)
Longer right tail towards high values
Mean > Median
22
Skewness
Measures of symmetry of data
◦ Negatively Skewed
Skewed to the left (data concentrated at high values)
Longer left tail towards low values
Mean < Median
23
Skewness
Measures of symmetry of data
◦ Symmetric : Bell shaped (Normal
Distribution)
Mean ~ Median
Mirror image on both sides of centre
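The mean-versus-median comparison on these slides can be sketched as a quick check. It is only a rule of thumb, not a formal skewness measure, and the three data sets below are hypothetical examples made up for illustration.

```python
from statistics import mean, median


def skew_direction(data, tol=1e-9):
    """Classify skew by comparing mean and median (rule of thumb only)."""
    m, med = mean(data), median(data)
    if m - med > tol:
        return "positive"       # mean > median: longer right tail
    if med - m > tol:
        return "negative"       # mean < median: longer left tail
    return "approximately symmetric"


# Hypothetical data sets, not taken from the lecture.
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 12]
long_left_tail = [1, 9, 10, 10, 11, 11, 11, 12]
mirror_image = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("long right tail", long_right_tail),
                   ("long left tail", long_left_tail),
                   ("mirror image", mirror_image)]:
    print(name, mean(data), median(data), skew_direction(data))
```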
24
Distribution shapes
[Figure: three histograms illustrating positively skewed, symmetric, and negatively skewed distributions]
25
Shape of Data
26
Skewness
Measures of asymmetry of data
◦ Positive or right skewed: Longer right tail
◦ Negative or left skewed: Longer left tail
29
Kurtosis Formula
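Let $x_1, x_2, \ldots, x_n$ be $n$ observations with mean $\bar{x}$. Then,
\[
\text{Kurtosis} = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left( \sum_{i=1}^{n} (x_i - \bar{x})^2 \right)^2} - 3
\]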
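A minimal sketch of the calculation, assuming the formula reads n Σ(x_i − x̄)^4 / (Σ(x_i − x̄)^2)^2 − 3. With the "− 3" term, a normal distribution scores about 0. The function name `excess_kurtosis` is chosen here for illustration, and the data reuse the memory scores from the earlier example.

```python
def excess_kurtosis(data):
    """n * sum((x - mean)^4) / (sum((x - mean)^2))^2 - 3."""
    n = len(data)
    xbar = sum(data) / n
    sum_fourth = sum((x - xbar) ** 4 for x in data)
    sum_squares = sum((x - xbar) ** 2 for x in data)
    return n * sum_fourth / sum_squares ** 2 - 3


# Memory scores from the earlier frequency-table example.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

print(excess_kurtosis(scores))
```

A negative result indicates a flatter shape with lighter tails than the normal curve, and a positive result indicates heavier tails.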
30