Lec 2 - Descriptive Statistics

Pictorial and Tabular Methods
in Descriptive Statistics
Data Types
Numerical/Quantitative
Discrete: A variable is discrete if its set of
possible values either is finite or else can be listed
in an infinite sequence (one in which there is a first
number, a second number, and so on).
Example: the number of persons arriving for
service during a particular period. (1,2,4,6,7……)
Based on enumeration/counting
Data Types
Numerical/Quantitative
Continuous: A variable is continuous if its
possible values consist of an entire interval on the
number line.
Example: weight of an individual, reaction time
for a particular process (2.5, 2.55, etc.).
Based on measurement.
Data Types
Qualitative/Categorical
Ordinal: A natural ordering of classes; juniors,
seniors and graduate students or excellent, good,
fair, poor, worst
Nominal (arbitrary): No natural ordering of classes; black, green, yellow, white
Plotting Data: describing spread of data
Frequency Table and Histogram
Example:
◦ A researcher is investigating short-term memory capacity:
the number of symbols remembered is recorded for 20
participants:
4, 6, 3, 7, 5, 7, 8, 4, 5,10
10, 6, 8, 9, 3, 5, 6, 4, 11, 6
◦ We can describe our data by using a Frequency Distribution
◦ This can be presented as a table or a graph. It always presents:
 The set of categories
 The frequency of each score/category
Three important characteristics: shape, central tendency, and
variability
Illustration of Frequency Table
Frequency tables can display more detailed information about a distribution.

X    f    p      %
11   1    0.05   5%
10   2    0.10   10%
9    1    0.05   5%
8    2    0.10   10%
7    2    0.10   10%
6    4    0.20   20%
5    3    0.15   15%
4    3    0.15   15%
3    2    0.10   10%

 X = Memory score (number of symbols remembered)
 f = Frequency
 N = ∑ f = 20
 Percentages and proportions
 p = fraction of total group associated with each score (relative frequency)
 p = f/N, ∑ p_i = 1
 % = p(100) = 100(f/N)
Important Use
Proportion of individuals/participants who remembered up to 6
symbols = 0.1 + 0.15 + 0.15 + 0.2 = 0.6 = 60%
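The frequency table can be rebuilt from the raw scores. The sketch below is a minimal illustration using only the Python standard library; the function name `frequency_table` is an assumption made for this example and is not part of the lecture.

```python
from collections import Counter

# Memory scores for the 20 participants, copied from the example.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

def frequency_table(data):
    """Return rows (X, f, p), sorted from the highest to the lowest score."""
    n = len(data)
    counts = Counter(data)
    return [(x, counts[x], counts[x] / n) for x in sorted(counts, reverse=True)]

table = frequency_table(scores)
for x, f, p in table:
    # Columns: score, frequency, proportion, percentage.
    print(f"{x:>2}  {f}  {p:.2f}  {p:.0%}")

# Proportion of participants who remembered up to 6 symbols:
# add the frequencies of all scores <= 6 and divide by N.
up_to_6 = sum(f for x, f, _ in table if x <= 6) / len(scores)
print(f"Proportion with X <= 6: {up_to_6:.2f}")
```

Summing frequencies first and dividing once avoids accumulating rounding error from adding the individual proportions.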
Histogram
A Histogram displays a range of values of a
variable that have been broken into groups or
intervals.
Histograms are useful if you are trying to graph a
large set of data
Histogram (Discrete Data)
First, determine the frequency and relative
frequency of each x value.
Mark possible x values on a horizontal scale.
Above each value, draw a rectangle whose height is
the relative frequency (or alternatively, the
frequency) of that value.
This ensures that the area of each rectangle is
proportional to the relative frequency of the value.
If the relative frequencies of x=1 and x = 5 are .35
and .07, respectively, then the area of the rectangle
above 1 is five times the area of the rectangle above
5.
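As a rough illustration of these steps, the sketch below computes the rectangle heights (relative frequencies) for the memory-score data and prints a text version of the histogram. The helper name `relative_frequencies` and the text rendering are assumptions for this example only.

```python
from collections import Counter

# Memory scores from the frequency-table example.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

def relative_frequencies(data):
    """Map each distinct x value to its relative frequency f/N, in ascending x order."""
    n = len(data)
    counts = Counter(data)
    return {x: counts[x] / n for x in sorted(counts)}

heights = relative_frequencies(scores)

# Text rendering: one '#' per observation, so bar length is proportional to height.
for x, h in heights.items():
    print(f"{x:>2} | {'#' * round(h * len(scores))} {h:.2f}")

# With equal-width rectangles, the ratio of areas equals the ratio of heights.
ratio = heights[6] / heights[3]
print(f"Area above 6 is {ratio:.1f} times the area above 3")
```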
Histogram (Simple)
Histogram (Simple)
Mode = value of the variable
with the highest
frequency/relative
frequency
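A small sketch of finding the mode, assuming Python 3.8 or later for `statistics.multimode`, which returns every value tied for the highest frequency. The second list is a made-up data set used only to show a tie.

```python
from statistics import multimode

# Memory scores from the frequency-table example.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

# The score 6 occurs four times, more often than any other score.
modes = multimode(scores)
print(modes)  # [6]

# Hypothetical data with two equally frequent values (two modes).
two_peaks = multimode([1, 1, 2, 3, 3])
print(two_peaks)  # [1, 3]
```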
Histogram
(large and/or continuous data)
To make a histogram for large and/or continuous
data:
◦ divide the data into intervals
◦ count the number of observations in each interval
◦ represent each interval with a bar indicating the
number of observations
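The three steps above can be sketched as follows. The reaction-time values, the class boundaries, and the function name `grouped_frequencies` are all hypothetical, chosen only to illustrate the counting step; each class here includes its lower boundary and excludes its upper one.

```python
def grouped_frequencies(data, start, width, n_classes):
    """Count observations in classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for value in data:
        k = int((value - start) // width)  # index of the class containing value
        if 0 <= k < n_classes:
            counts[k] += 1
    return [(start + k * width, start + (k + 1) * width, counts[k])
            for k in range(n_classes)]

# Hypothetical reaction times (continuous data).
times = [2.5, 2.55, 3.1, 3.4, 3.9, 4.2, 4.4, 4.45, 5.0, 5.8]

classes = grouped_frequencies(times, start=2, width=1, n_classes=4)
for low, high, f in classes:
    # One bar per interval; bar length is the number of observations.
    print(f"{low}-{high}: {'#' * f} ({f})")
```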
Grouped Frequency Distribution Tables
(Class Interval)
X       f
95-99   1
90-94   1
85-89   0
80-84   1
75-79   2
70-74   4
65-69   7
60-64   0
55-59   6
50-54   3

◦ Sometimes the spread of data is too wide
◦ Grouped tables present scores as class intervals
 About 5-20 intervals
 An interval should preferably be of equal width
Histogram with Class Intervals/Class Widths (Continuous Data)
Histogram with Equal Class Intervals/Class Widths
Histogram with Class Intervals/Class Widths
Class Formation
 There are no hard-and-fast rules concerning
either the number of classes or the choice of
classes themselves.
 Between 5 and 20 classes will be satisfactory for
most data sets.
 Generally, the larger the number of observations
in a data set, the more classes should be used.
 A reasonable rule of thumb is
number of classes ≈ √(number of observations)
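A minimal sketch of the square-root rule of thumb. Clamping the result to the 5 to 20 range is an assumption of this example that combines two of the guidelines above; the lecture does not state that the two should be applied together, and the function name is hypothetical.

```python
import math

def suggested_class_count(n_observations, low=5, high=20):
    """Square-root rule of thumb, kept within the 5 to 20 guideline (an assumption)."""
    k = round(math.sqrt(n_observations))
    return max(low, min(high, k))

for n in (20, 25, 100, 1000):
    print(n, suggested_class_count(n))
```

For 25 observations the rule gives 5 classes; for 1000 observations the square root is about 32, which the clamp reduces to 20.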
Histogram with Unequal Class Intervals/Class Widths
 Equal-width classes may not be a sensible choice if a
data set “stretches out” to one side or the other
 After determining frequencies and relative
frequencies, calculate the height of each rectangle
using the formula
Density = (relative frequency of class) / (class width)
 The resulting rectangle heights are usually called
densities, and the vertical scale is the density scale.
 When class widths are unequal, not using a density
scale will give a picture with distorted areas.
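The density calculation can be sketched as below. The class boundaries and frequencies are made-up values with deliberately unequal widths, and the function name `densities` is an assumption for this example.

```python
def densities(class_edges, frequencies):
    """Rectangle heights: (relative frequency of class) / (class width)."""
    n = sum(frequencies)
    heights = []
    for low, high, f in zip(class_edges, class_edges[1:], frequencies):
        heights.append((f / n) / (high - low))
    return heights

# Hypothetical classes: narrow where data are concentrated, wide near the extremes.
edges = [0, 2, 4, 6, 10, 20]
freqs = [4, 10, 12, 8, 6]

dens = densities(edges, freqs)
for low, high, d in zip(edges, edges[1:], dens):
    print(f"{low}-{high}: density {d:.3f}")

# On a density scale, area = density * width = relative frequency,
# so the rectangle areas add up to 1.
total_area = sum(d * (high - low) for d, low, high in zip(dens, edges, edges[1:]))
print(f"Total area: {total_area:.2f}")
```

The last two classes show why the density scale matters: the 10-20 class holds 6 observations and the 6-10 class holds 8, yet the 10-20 rectangle is far shorter because its width is much larger.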
 If a large number of equal-width classes are used, many
classes will have zero frequency.
 Using a small number of equal-width classes results in
almost all observations falling in just one or two of the
classes.
 A sound choice is to use a few wider intervals near extreme
observations and narrower intervals in the region of high
concentration.
Histogram -Qualitative Data
Histogram Shapes-Data Distribution
Unimodal: One peak
Bimodal: Two Peaks
Frequency Distribution: the Normal
Distribution
◦ Bell-shaped: symmetrical around the midpoint,
where the greatest frequency of scores occurs
Skewness
Skewness
Measures of symmetry of data (location of
concentration of data)
◦ Positively Skewed
 Skewed to the right
 Longer right tail towards high values
 Mean > Median
Skewness
Measures of symmetry of data
◦ Negatively Skewed
 Skewed to the left
 Longer left tail towards low values
 Mean < Median
Skewness
Measures of symmetry of data
◦ Symmetric : Bell shaped (Normal
Distribution)
 Mean ~ Median
 Mirror image on both sides of centre
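The mean-versus-median comparison from the skewness slides can be tried on data. The sketch below treats it as a rough rule of thumb rather than a definitive test, since the comparison can mislead on some data sets; the function name `skew_direction` and the two short lists are hypothetical.

```python
from statistics import mean, median

def skew_direction(data):
    """Rule-of-thumb label based on comparing the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "symmetric"

# Memory scores from the frequency-table example: mean 6.35, median 6.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

print(mean(scores), median(scores), skew_direction(scores))
print(skew_direction([1, 2, 3, 4, 5]))   # mean equals median
print(skew_direction([1, 5, 6, 6, 7]))   # mean 5 is below median 6
```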
Distribution shapes
[Figure: example histograms in three panels: Symmetric, Positively skewed, Negatively skewed]
Shape of Data
Shape of data is measured by
◦ Skewness
◦ Kurtosis
Skewness
 Measures the asymmetry of data
◦ Positive or right skewed: Longer right tail
◦ Negative or left skewed: Longer left tail
Let x_1, x_2, ..., x_n be n observations with mean \bar{x}. Then,

\[
\text{Skewness} = \frac{\sqrt{n}\,\sum_{i=1}^{n} (x_i - \bar{x})^3}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{3/2}}
\]
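A minimal sketch of the skewness calculation, assuming the moment-based form Skewness = √n · Σ(x_i − x̄)³ / (Σ(x_i − x̄)²)^(3/2). Statistical packages may apply a different sample-size correction, so results can differ slightly from other tools.

```python
import math

def skewness(data):
    """Moment-based skewness: sqrt(n) * sum(d^3) / (sum(d^2))^(3/2), d = x - mean."""
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data)
    s3 = sum((x - xbar) ** 3 for x in data)
    return math.sqrt(n) * s3 / s2 ** 1.5

# Memory scores from the frequency-table example (longer right tail).
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

print(skewness(scores))            # positive: longer right tail
print(skewness([1, 2, 3, 4, 5]))   # symmetric data: cubed deviations cancel
print(skewness([1, 8, 9, 9, 10]))  # negative: longer left tail (made-up data)
```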
Kurtosis
 Kurtosis is a measure of whether the data are
peaked or flat relative to a normal distribution
(relative flatness or peakedness of a distribution).
 A standard normal distribution (blue line: µ = 0; σ
= 1) has kurtosis = 0.
 Data sets with high kurtosis tend to have a
distinct peak near the mean, decline rather
rapidly, and have heavy tails.
 Data sets with low kurtosis tend to have a flat top
near the mean rather than a sharp peak.
 Positive kurtosis indicates a "peaked" distribution
and negative kurtosis indicates a "flat"
distribution.
Kurtosis
Kurtosis Formula
Let x_1, x_2, ..., x_n be n observations with mean \bar{x}. Then,

\[
\text{Kurtosis} = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{2}} - 3
\]
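A minimal sketch of the kurtosis calculation, assuming the form Kurtosis = n · Σ(x_i − x̄)⁴ / (Σ(x_i − x̄)²)² − 3, so that a normal distribution scores 0. The function name `excess_kurtosis` and the sample lists are assumptions for this example.

```python
def excess_kurtosis(data):
    """n * sum(d^4) / (sum(d^2))^2 - 3, where d = x - mean."""
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data)
    s4 = sum((x - xbar) ** 4 for x in data)
    return n * s4 / s2 ** 2 - 3

# Evenly spread values give a flat shape, hence negative kurtosis.
flat = excess_kurtosis([1, 2, 3, 4, 5])
print(flat)

# Made-up data: most values at the centre plus two far-out values (heavy tails).
peaked = excess_kurtosis([0, 5, 5, 5, 5, 5, 5, 5, 5, 10])
print(peaked)
```

For the first list the sums are Σd² = 10 and Σd⁴ = 34, giving 5 · 34 / 100 − 3 = −1.3. For the second, Σd² = 50 and Σd⁴ = 1250, giving 10 · 1250 / 2500 − 3 = 2.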
