
Pictorial and Tabular Methods in Descriptive Statistics

Data Types
Numerical/Quantitative
Discrete: A variable is discrete if its set of
possible values either is finite or else can be listed
in an infinite sequence (one in which there is a first
number, a second number, and so on).
Example: the number of persons arriving for
service during a particular period (e.g., 1, 2, 4, 6, 7, ...).
Based on enumeration/counting.

Data Types
Numerical/Quantitative
Continuous: A variable is continuous if its
possible values consist of an entire interval on the
number line.
Example: weight of an individual, or reaction time
for a particular process (e.g., 2.5, 2.55).
Based on measurement.

Data Types
Qualitative/Categorical
Ordinal: the classes have a natural ordering, e.g., juniors,
seniors and graduate students, or excellent, good,
fair, poor, worst.
Nominal: the classes are arbitrary labels with no natural
ordering, e.g., black, green, yellow, white.

Plotting Data: describing spread of data
Frequency Table and Histogram
Example:
◦ A researcher is investigating short-term memory capacity.
The number of symbols remembered is recorded for each of 20
participants:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10
10, 6, 8, 9, 3, 5, 6, 4, 11, 6

◦ We can describe our data by using a Frequency Distribution.

◦ This can be presented as a table or a graph. It always presents:
  ▪ The set of categories
  ▪ The frequency of each score/category

◦ Three important characteristics: shape, central tendency, and variability
Illustration of Frequency Table

X    f    p      %
11   1    0.05   5%
10   2    0.10   10%
9    1    0.05   5%
8    2    0.10   10%
7    2    0.10   10%
6    4    0.20   20%
5    3    0.15   15%
4    3    0.15   15%
3    2    0.10   10%

▪ Frequency tables can display more detailed information about the distribution
▪ X = Memory score (number of symbols remembered)
▪ f = Frequency
▪ N = ∑ f = 20
▪ Percentages and proportions
  ▪ p = fraction of the total group associated with each score (relative frequency)
  ▪ p = f/N, ∑ p = 1
  ▪ % = p(100) = 100(f/N)

Important Use
Proportion of individuals/participants who remembered up to 6
symbols = 0.1 + 0.15 + 0.15 + 0.2 = 0.6 = 60%
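The table and the 60% figure can be reproduced with a short script. The following is a minimal Python sketch, not part of the original slides; the function names `frequency_table` and `proportion_up_to` are illustrative choices.

```python
from collections import Counter

# Memory scores for the 20 participants, taken from the example above.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

def frequency_table(data):
    """Return rows (X, f, p, percent), highest score first."""
    n = len(data)
    counts = Counter(data)
    return [(x, f, f / n, 100 * f / n)
            for x, f in sorted(counts.items(), reverse=True)]

def proportion_up_to(data, limit):
    """Proportion of observations less than or equal to limit."""
    return sum(1 for x in data if x <= limit) / len(data)

table = frequency_table(scores)
for x, f, p, pct in table:
    print(f"{x:>2} {f:>2} {p:.2f} {pct:.0f}%")

# Proportion of participants who remembered up to 6 symbols.
print(proportion_up_to(scores, 6))
```

Counting observations directly, rather than adding rounded proportions, avoids accumulating rounding error when the table has many rows.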
Histogram

A histogram displays a range of values of a variable that have been broken into groups or intervals.

Histograms are useful if you are trying to graph a large set of data.
Histogram (Discrete Data)
First, determine the frequency and relative
frequency of each x value.
Mark possible x values on a horizontal scale.
Above each value, draw a rectangle whose height is
the relative frequency (or alternatively, the
frequency) of that value.
This ensures that the area of each rectangle is
proportional to the relative frequency of the value.
If the relative frequencies of x=1 and x = 5 are .35
and .07, respectively, then the area of the rectangle
above 1 is five times the area of the rectangle above
5.
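The steps above can be sketched in a few lines of Python. This is only an illustrative text rendering, assuming the 20 memory scores from the earlier example; `text_histogram` and its `width` parameter are hypothetical names, and each bar is drawn sideways with a length proportional to the relative frequency.

```python
from collections import Counter

# Memory scores from the earlier frequency-table example.
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

def text_histogram(data, width=40):
    """Return one line per x value; bar length is proportional
    to the relative frequency of that value."""
    n = len(data)
    counts = Counter(data)
    lines = []
    for x in range(min(data), max(data) + 1):
        rel = counts[x] / n  # a Counter returns 0 for values never seen
        bar = "#" * round(rel * width)
        lines.append(f"{x:>3} | {bar} {rel:.2f}")
    return lines

for line in text_histogram(scores):
    print(line)
```

Because every bar has the same thickness, bar length and bar area are both proportional to relative frequency, which is the property the slide describes.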
Histogram (Simple)

Mode = the value of the variable with the highest
frequency (or relative frequency)
Histogram
(large and/or continuous data)

To make a histogram for large and/or continuous data:
◦ divide the data into intervals
◦ count the number of observations in each interval
◦ represent each interval with a bar indicating the
number of observations
Grouped Frequency Distribution Tables
(Class Interval)

X        f
95-99    1
90-94    1
85-89    0
80-84    1
75-79    2
70-74    4
65-69    7
60-64    0
55-59    6
50-54    3

◦ Sometimes the spread of data is too wide
◦ Grouped tables present scores as class intervals
  ▪ About 5-20 intervals
  ▪ An interval should preferably be of equal width
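A grouped table like this one can be built by counting how many scores fall in each class interval. The sketch below is a minimal illustration in Python; the raw scores behind the table are not given in the slides, so `exam_scores` is made-up data, and `grouped_frequencies` is a hypothetical helper that assumes integer scores and equal-width intervals.

```python
def grouped_frequencies(data, width=5, start=None):
    """Count integer observations in equal-width class intervals.

    Each interval covers `width` consecutive integers, e.g. 50-54.
    Rows are returned highest interval first, as in the table above.
    """
    if start is None:
        # Begin at the multiple of `width` at or below the minimum.
        start = (min(data) // width) * width
    rows = []
    lower = start
    while lower <= max(data):
        upper = lower + width - 1
        f = sum(1 for x in data if lower <= x <= upper)
        rows.append((f"{lower}-{upper}", f))
        lower += width
    return rows[::-1]

# Hypothetical exam scores, invented for this illustration only.
exam_scores = [52, 58, 61, 67, 68, 71, 73, 74, 79, 83, 91, 97]

grouped = grouped_frequencies(exam_scores)
for interval, f in grouped:
    print(f"{interval:<6} {f}")
```

Empty intervals are kept with a frequency of 0, so gaps in the data stay visible in the table.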

Histogram with Class Intervals/Class Widths (Continuous Data)

Histogram with Equal Class Intervals/Class Widths

Histogram with Class Intervals/Class Widths

Class Formation
▪ There are no hard-and-fast rules concerning
either the number of classes or the choice of
classes themselves.
▪ Between 5 and 20 classes will be satisfactory for
most data sets.
▪ Generally, the larger the number of observations
in a data set, the more classes should be used.
▪ A reasonable rule of thumb is

$$\text{Number of classes} \approx \sqrt{\text{number of observations}}$$
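As a rough illustration of the rule of thumb, the sketch below takes the square root of the number of observations, rounds it, and then keeps the result within the range of 5 to 20 classes mentioned above. Combining the two guidelines in this way is an assumption of the sketch, not something the slides prescribe, and `suggested_class_count` is a hypothetical name.

```python
import math

def suggested_class_count(n_observations, low=5, high=20):
    """Square-root rule of thumb, limited to the range low..high."""
    raw = math.sqrt(n_observations)
    return max(low, min(high, round(raw)))

for n in (20, 25, 100, 1000):
    print(n, suggested_class_count(n))
```

For small samples the square root falls below 5 and for very large samples it exceeds 20, so the limits decide the answer in those cases.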

Histogram with Unequal Class Intervals/Class Widths

▪ Equal-width classes may not be a sensible choice if a
data set "stretches out" to one side or the other.
▪ After determining frequencies and relative
frequencies, calculate the height of each rectangle
using the formula

$$\text{Density} = \frac{\text{relative frequency of class}}{\text{class width}}$$

▪ The resulting rectangle heights are usually called
densities, and the vertical scale is the density scale.
▪ When class widths are unequal, not using a density
scale will give a picture with distorted areas.
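The density calculation can be checked numerically: with height equal to relative frequency divided by class width, the area of each rectangle equals its relative frequency, so the areas add up to 1. The sketch below is a minimal Python illustration; the data values and class boundaries are invented, and `density_heights` is a hypothetical helper that treats each class as including its lower boundary but not its upper one, except for the last class.

```python
def density_heights(data, edges):
    """Return (lower, upper, relative_frequency, density) per class.

    Classes are [lower, upper); the last class also includes its
    upper boundary so the maximum value is not lost.
    """
    n = len(data)
    last = len(edges) - 2
    rows = []
    for i in range(len(edges) - 1):
        lower, upper = edges[i], edges[i + 1]
        if i == last:
            f = sum(1 for x in data if lower <= x <= upper)
        else:
            f = sum(1 for x in data if lower <= x < upper)
        rel = f / n
        rows.append((lower, upper, rel, rel / (upper - lower)))
    return rows

# Hypothetical data that stretch out to the right.
values = [1, 2, 2, 3, 3, 4, 5, 7, 12, 28]
# Narrow classes where data are concentrated, wide ones in the tail.
edges = [0, 2, 4, 8, 16, 32]

rows = density_heights(values, edges)
for lower, upper, rel, dens in rows:
    print(f"{lower:>2}-{upper:<2} rel={rel:.2f} density={dens:.5f}")

total_area = sum(dens * (upper - lower) for lower, upper, _, dens in rows)
print(total_area)
```

Plotting relative frequency as the height instead would make the two wide tail classes look far more important than they are, which is the distortion the slide warns about.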
Histogram with Unequal Class Intervals/Class Widths

▪ If a large number of equal-width classes are used, many
classes will have zero frequency.
▪ Using a small number of equal-width classes results in
almost all observations falling in just one or two of the
classes.
▪ A sound choice is to use a few wider intervals near extreme
observations and narrower intervals in the region of high
concentration.
Histogram with Unequal Class Intervals/Class Widths

Histogram - Qualitative Data

Histogram Shapes - Data Distribution
Unimodal: one peak
Bimodal: two peaks

Frequency Distribution: the Normal Distribution
◦ Bell-shaped: symmetrical around the midpoint,
where the greatest frequency of scores occurs

Skewness

Skewness
Measures of symmetry of data (location of
concentration of data)
◦ Positively Skewed
  ▪ Skewed to the right
  ▪ Longer right tail towards high values
  ▪ Mean > Median

Skewness
Measures of symmetry of data
◦ Negatively Skewed
  ▪ Skewed to the left
  ▪ Longer left tail towards low values
  ▪ Mean < Median

Skewness
Measures of symmetry of data
◦ Symmetric: bell shaped (Normal Distribution)
  ▪ Mean ≈ Median
  ▪ Mirror image on both sides of centre

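The relationship between mean and median in the three cases can be illustrated with small samples. The sketch below uses Python's standard `statistics` module; the three data sets are invented for illustration (one with a long right tail, one with a long left tail, one symmetric), and `mean_vs_median` is a hypothetical helper.

```python
from statistics import mean, median

def mean_vs_median(data):
    """Report how the mean compares with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "mean > median"
    if m < med:
        return "mean < median"
    return "mean = median"

# Hypothetical samples, invented for this illustration only.
right_tail = [1, 2, 2, 3, 3, 4, 15]       # one large value pulls the mean up
left_tail = [1, 12, 13, 13, 14, 14, 15]   # one small value pulls the mean down
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, sample in [("long right tail", right_tail),
                     ("long left tail", left_tail),
                     ("symmetric", symmetric)]:
    print(name, mean_vs_median(sample))
```

The mean moves toward the long tail because extreme values enter it directly, while the median depends only on the middle of the ordered data.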
Distribution shapes
[Figure: example histograms of a symmetric, a positively skewed, and a negatively skewed distribution]

Shape of Data
Shape of data is measured by
◦ Skewness
◦ Kurtosis

Skewness
▪ Measures of asymmetry of data
◦ Positive or right skewed: Longer right tail
◦ Negative or left skewed: Longer left tail

Let $x_1, x_2, \ldots, x_n$ be $n$ observations with mean $\bar{x}$. Then,

$$\text{Skewness} = \frac{\sqrt{n}\,\sum_{i=1}^{n} (x_i - \bar{x})^3}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{3/2}}$$
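A direct implementation helps to see how the sign of the skewness follows the longer tail. The sketch below is a minimal Python version that assumes the moment-based form of the coefficient, that is, √n times the sum of cubed deviations divided by the sum of squared deviations raised to the power 3/2. Statistical packages may apply small-sample corrections, so their results can differ slightly; the sample data are invented.

```python
import math

def skewness(data):
    """Moment-based skewness:
    sqrt(n) * sum((x - mean)^3) / (sum((x - mean)^2))^(3/2)."""
    n = len(data)
    mean = sum(data) / n
    sum_sq = sum((x - mean) ** 2 for x in data)
    sum_cube = sum((x - mean) ** 3 for x in data)
    return math.sqrt(n) * sum_cube / sum_sq ** 1.5

# Hypothetical samples, invented for this illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 1, 1, 10]
left_tail = [-10, -1, -1, -1, -1]

for sample in (symmetric, right_tail, left_tail):
    print(sample, round(skewness(sample), 3))
```

The function divides by the sum of squared deviations, so it is undefined for data in which every value is the same.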
Kurtosis
▪ Kurtosis is a measure of whether the data are
peaked or flat relative to a normal distribution
(relative flatness or peakedness of a distribution).
▪ A standard normal distribution (blue line: µ = 0; σ
= 1) has kurtosis = 0 when kurtosis is measured as
excess kurtosis, i.e., after subtracting 3.
▪ Data sets with high kurtosis tend to have a
distinct peak near the mean, decline rather
rapidly, and have heavy tails.
▪ Data sets with low kurtosis tend to have a flat top
near the mean rather than a sharp peak.
▪ Positive kurtosis indicates a "peaked" distribution
and negative kurtosis indicates a "flat"
distribution.
Kurtosis

Kurtosis Formula
Let $x_1, x_2, \ldots, x_n$ be $n$ observations with mean $\bar{x}$. Then,

$$\text{Kurtosis} = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{2}} - 3$$

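The kurtosis formula can be tried on small samples to see the sign change between flat and peaked data. The sketch below is a minimal Python version that assumes the excess-kurtosis form, n times the sum of fourth-power deviations divided by the squared sum of squared deviations, minus 3. Statistical packages may apply small-sample corrections, so their results can differ; the sample data are invented.

```python
def excess_kurtosis(data):
    """Excess kurtosis:
    n * sum((x - mean)^4) / (sum((x - mean)^2))^2 - 3."""
    n = len(data)
    mean = sum(data) / n
    sum_sq = sum((x - mean) ** 2 for x in data)
    sum_fourth = sum((x - mean) ** 4 for x in data)
    return n * sum_fourth / sum_sq ** 2 - 3

# Hypothetical samples, invented for this illustration only.
flat = [1, 2, 3, 4, 5]              # evenly spread, no peak
peaked = [0] * 8 + [-5, 5]          # most values at the centre, two far out

print(round(excess_kurtosis(flat), 3))
print(round(excess_kurtosis(peaked), 3))
```

The evenly spread sample gives a negative value and the sample with a sharp centre and two distant values gives a positive one, matching the "flat" and "peaked" descriptions.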
