
STA 404 : STATISTICS FOR BUSINESS AND SOCIAL SCIENCES CHAPTER 2

CHAPTER 2: DESCRIPTIVE STATISTICS

2.1 ORGANISING AND GRAPHING DATA

              Qualitative Data          Quantitative Data

Tabulate      Frequency distribution    Frequency distribution
              Contingency table
Graph         Bar chart                 Histogram
              Pie chart                 Box and Whisker plot
                                        Stem and Leaf plot

2.1.1 ORGANISING AND GRAPHING DATA (QUALITATIVE DATA)

a) Frequency Distribution

A frequency distribution for qualitative data lists all categories and the
number of elements that belong to each of the categories.

Table 2.1: Number of students using different modes of transportation

Mode Frequency
Car 11
School bus 8
Taxi 3
Others 3

b) Contingency Table

A contingency table summarizes information involving two variables.

Table 2.2: Number of students using different modes of transportation
according to gender

Gender
Mode Male Female
Car 4 7
School bus 6 2
Taxi 2 1
Others 1 2
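
As a quick check, the row totals of Table 2.2 should reproduce the frequencies in Table 2.1. The sketch below copies the counts from Table 2.2 and adds up the rows and columns; the variable names are illustrative only.

```python
# Counts copied from Table 2.2: (male, female) for each mode of transportation.
table_2_2 = {
    "Car": (4, 7),
    "School bus": (6, 2),
    "Taxi": (2, 1),
    "Others": (1, 2),
}

# Row totals: ignore gender to get the frequency of each mode.
row_totals = {mode: male + female for mode, (male, female) in table_2_2.items()}

# Column totals: ignore mode to get the number of males and females.
male_total = sum(male for male, _ in table_2_2.values())
female_total = sum(female for _, female in table_2_2.values())

print(row_totals)                # {'Car': 11, 'School bus': 8, 'Taxi': 3, 'Others': 3}
print(male_total, female_total)  # 13 12
```

The row totals match Table 2.1, and the two column totals add up to the same 25 students.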

LECTURER: U. H LAU 1

c) Bar Chart

A bar chart is a graph of qualitative data made of bars whose heights
represent the frequencies of the respective categories.

[Figure: three bar charts titled "Modes of transportation", each with Mode
(Car, School bus, Taxi, Others) on the horizontal axis and Number of students
on the vertical axis: a simple bar chart, a multiple / compound / cluster bar
chart by gender (Male, Female), and a stacked / component bar chart by gender
(Male, Female).]

d) Pie Chart

A pie chart is a circle divided into sectors representing the categories of
a qualitative variable.

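Angle of each sector = $\frac{f}{\sum f} \times 360^\circ$

where $f$ is the frequency of the category and $\sum f$ is the total frequency.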

A pie chart is more suitable than a simple bar chart when the variable
has many categories (more than four).
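
The sector angle formula can be applied to the frequencies in Table 2.1. The following is a minimal sketch; the variable names are illustrative.

```python
# Frequencies copied from Table 2.1.
frequencies = {"Car": 11, "School bus": 8, "Taxi": 3, "Others": 3}
total = sum(frequencies.values())  # sum of f = 25

# Angle of each sector = f / (sum of f) x 360 degrees.
angles = {mode: f / total * 360 for mode, f in frequencies.items()}
# Share of each sector as a percentage of the whole circle.
percents = {mode: f / total * 100 for mode, f in frequencies.items()}

for mode in frequencies:
    print(f"{mode:<10} {angles[mode]:6.1f} degrees {percents[mode]:5.1f}%")
```

The four angles add up to 360 degrees, and the percentages are 44%, 32%, 12% and 12%.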

[Figure: pie chart "Modes of transportation": Car 44.0%, School bus 32.0%,
Taxi 12.0%, Others 12.0%.]


2.1.2 ORGANISING AND GRAPHING DATA (QUANTITATIVE DATA)

a) Frequency Distribution

A frequency distribution for quantitative data is a table that lists all the
values (or classes of values) in the data and the number of times each one
occurs (its frequency).

Ungrouped:                           Grouped:

Number of children   Frequency       Number of children   Frequency
0                    3               0–1                  8
1                    5               2–3                  19
2                    11              4–5                  3
3                    8
4                    2
5                    1
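
In the tables above, each class frequency in the grouped table equals the sum of the matching single-value frequencies, which suggests that both tables summarise the same 30 observations. The sketch below rebuilds the grouped table from the ungrouped one; the class limits are copied from the table and the variable names are illustrative.

```python
# Ungrouped frequency distribution: number of children -> frequency.
ungrouped = {0: 3, 1: 5, 2: 11, 3: 8, 4: 2, 5: 1}

# Class limits (inclusive) copied from the grouped table.
classes = [(0, 1), (2, 3), (4, 5)]

# The frequency of a class is the sum of the frequencies of the values it contains.
grouped = {
    f"{low}-{high}": sum(f for value, f in ungrouped.items() if low <= value <= high)
    for low, high in classes
}

print(grouped)                # {'0-1': 8, '2-3': 19, '4-5': 3}
print(sum(grouped.values()))  # 30
```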

b) Histogram

Purpose: Summarise distribution of univariate data set graphically.

Shows: i) center of the data
       ii) spread of the data
       iii) skewness of the data
       iv) presence of outliers

[Figure: histogram of service time. Mean = 1.03, Std. Dev = 0.18, N = 60.]


[Figure: example histograms of right skewed, normal and left skewed distributions.]

c) Box and Whisker Plot

Purpose: (same as histogram)

Shows: (same as histogram)

Extras: i) Outliers can be identified easily.
        ii) The center and spread of different sets of data can be
            compared.

[Figure: box and whisker plot, labelled from top to bottom: maximum value,
third quartile, median, first quartile, minimum value.]


DISTRIBUTION     DESCRIPTION OF BOX PLOT

Right skewed     Median line towards bottom
                 Top whisker longer
                 Bottom whisker shorter

Normal           Median line at center
                 Top whisker = Bottom whisker

Left skewed      Median line towards top
                 Top whisker shorter
                 Bottom whisker longer


d) Stem and Leaf Plot

Purpose: (same as histogram)

Shows: (same as histogram)

Extras: i) Outliers can be identified easily.
        ii) The value of each individual observation can be recovered.

[Figure: example stem and leaf plots of right skewed, normal and left skewed distributions.]


2.2 NUMERICAL DESCRIPTIVE MEASURES

2.2.1 MEASURES OF CENTRAL TENDENCY

Measures of Central Tendency

Mean Median Mode

Measures of central tendency are numerical descriptive measures
indicating the center of a data set.

a) MEAN

i) Mean will best represent the data if the normality (the bell-shaped
curve) assumption is met.

ii) $\bar{x} = \frac{\sum x}{n}$

    where $\sum x$ is the sum of all values
          $n$ is the sample size
          $\bar{x}$ is the sample mean
          $\mu$ is the population mean

iii) Disadvantage: very sensitive to outliers or extreme values (values that
are very small or very large relative to the majority of the values in a
data set).

Example 1:

The following are the ages of all eight employees of a small company:

53 32 61 27 39 44 49 57

Find the mean age of these employees.
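
A minimal sketch of the calculation for Example 1, applying the mean formula directly. Because the data cover all eight employees, the result is a population mean. The variable names are illustrative.

```python
# Ages of all eight employees (Example 1).
ages = [53, 32, 61, 27, 39, 44, 49, 57]

total = sum(ages)      # sum of x
n = len(ages)          # number of values
mean_age = total / n   # mean = (sum of x) / n

print(total, n, mean_age)  # 362 8 45.25
```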


b) MEDIAN

1. Median is the value of the middle term in a data set that has been
ranked in increasing order.

2. The appropriate average to apply for ordinal data, and for interval and
ratio data that do not meet the normality criteria.

3. To find the median,

i) Arrange the data set in ascending order.

ii) Determine the position of the middle term.

    Position of the middle term = $\frac{n + 1}{2}$

iii) Find the value of the median.

    a) If n is an odd number,
       median = middle value in the sequence

    b) If n is an even number,
       median = average of the 2 middle values

4. Median is not affected by extreme values.

Example 2:

The following are the number of patients that visited a clinic over 8
days.

60, 95, 98, 100, 100, 110, 112, 115

Calculate the median.
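
The three steps for finding the median can be sketched as follows and applied to Example 2. The function name is illustrative.

```python
def median(values):
    """Median following the steps above: sort, locate the middle term, read it off."""
    ordered = sorted(values)   # step i: ascending order
    n = len(ordered)
    position = (n + 1) / 2     # step ii: 1-based position of the middle term
    if n % 2 == 1:
        # n odd: the position is a whole number, so take that value.
        return ordered[int(position) - 1]
    # n even: the position falls halfway between the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


patients = [60, 95, 98, 100, 100, 110, 112, 115]
print(median(patients))  # 100.0
```

Here n = 8, so the position is 4.5 and the median is the average of the 4th and 5th values, both of which are 100.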

c) MODE

1. Mode is the value that occurs with the highest frequency in a data set.

2. A data set may have no mode or more than one mode.

3. Mode is not affected by extreme values.

4. Mode can be used for both numerical and categorical data.


5. Ungrouped data

Mode = value that is repeated most often in the data set.

Example 3:

Find the mode of each of the following:

a) The speeds (in km/hr) of 6 cars that were stopped for speeding
violations are
100, 115, 95, 100, 120 and 110.

b) Prices for the same brand of TV set at 8 stores are

495, 486, 503, 495, 470, 505, 470 and 499.

c) Incomes (in RM) of five randomly selected families are

26150, 65750, 34985, 47490 and 13730.
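
The three cases in Example 3 (one mode, more than one mode, no mode) can be checked with a short sketch. It assumes the convention that a data set in which no value repeats has no mode; the function name is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: every value occurs once, so there is no mode.
        return []
    return sorted(value for value, count in counts.items() if count == highest)


speeds = [100, 115, 95, 100, 120, 110]
prices = [495, 486, 503, 495, 470, 505, 470, 499]
incomes = [26150, 65750, 34985, 47490, 13730]

print(modes(speeds))   # [100]
print(modes(prices))   # [470, 495]
print(modes(incomes))  # []
```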


2.2.2 MEASURES OF DISPERSION

Measures of Dispersion

Variance Standard Deviation Coefficient of Variation

Measures of dispersion are numerical descriptive measures that indicate
how widely spread the data set is.

a) VARIANCE AND STANDARD DEVIATION

1. Variance approximates the average of the squared deviations of the
measurements from the mean.

2. Deviation refers to the difference between a data value and the
mean.

3. Ungrouped Data:

   $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ or $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

4. Standard deviation is the positive square root of the variance.
   It is recommended as the appropriate measure of spread for
   symmetrically distributed data.

   $s = \sqrt{s^2}$

5. Ungrouped data:

   $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ or $s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$


Example 4:

The following data show the number of patients visiting an outpatient clinic
in a particular week. Calculate the sample variance and standard
deviation.

110, 112, 98, 100, 115, 95, 100
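
A minimal sketch of Example 4 that evaluates both forms of the sample variance formula (the definitional form and the computational shortcut) and confirms they agree. The variable names are illustrative.

```python
import math

patients = [110, 112, 98, 100, 115, 95, 100]
n = len(patients)

# Definitional form: sum of squared deviations from the mean, divided by n - 1.
mean = sum(patients) / n
var_definition = sum((x - mean) ** 2 for x in patients) / (n - 1)

# Shortcut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1).
sum_x = sum(patients)
sum_x2 = sum(x * x for x in patients)
var_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Standard deviation: positive square root of the variance.
sd = math.sqrt(var_definition)

print(sum_x, sum_x2)                                    # 730 76498
print(round(var_definition, 2), round(var_shortcut, 2)) # 61.57 61.57
print(round(sd, 2))                                     # 7.85
```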

b) COEFFICIENT OF VARIATION

1. Used to compare the variability of 2 or more groups.

2. Widely used by researchers.

3. Coefficient of variation, CV = $\frac{\text{sample standard deviation}}{\text{sample mean}} \times 100\%$ = $\frac{s}{\bar{x}} \times 100\%$

4. It is used to judge how large or small the deviation is relative to
   the data, which is represented by the mean.

5. The larger the percentage, the greater the variation.

   Larger variation implies less consistency.
   Smaller variation implies better consistency.

Example 5:

A factory makes two types of tires, X and Y. The table below shows the
distance traveled before the tires burst.

Distance traveled (km)
X   100  120  121  130  125  130  140  110  115  125
Y   125  130  126  123  131  129  124  123  125  128

The table below shows the mean and standard deviation of the distance
covered before bursting for both types of tires.

Type   Mean    Standard Deviation
X      121.6   11.31
Y      126.4   2.91

Which type of tires has a more uniform (consistent) strength?
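
A sketch of Example 5 that recomputes the mean and sample standard deviation from the raw distances and then compares the coefficients of variation. It uses `statistics.stdev` from the Python standard library, which divides by n - 1; the helper name `coefficient_of_variation` is illustrative.

```python
import statistics

x = [100, 120, 121, 130, 125, 130, 140, 110, 115, 125]
y = [125, 130, 126, 123, 131, 129, 124, 123, 125, 128]


def coefficient_of_variation(values):
    """CV = sample standard deviation / sample mean x 100%."""
    return statistics.stdev(values) / statistics.mean(values) * 100


cv_x = coefficient_of_variation(x)
cv_y = coefficient_of_variation(y)

print(round(statistics.mean(x), 1), round(statistics.stdev(x), 2))  # 121.6 11.31
print(round(statistics.mean(y), 1), round(statistics.stdev(y), 2))  # 126.4 2.91
print(round(cv_x, 1), round(cv_y, 1))                               # 9.3 2.3
```

Type Y has the smaller coefficient of variation (about 2.3% against about 9.3%), so its strength is more consistent.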


2.2.3 MEASURES OF SKEWNESS

1. Skewness refers to the shape of a distribution: whether it is symmetric,
skewed to the right (positively skewed) or skewed to the left (negatively skewed).

2. Outliers tend to pull the mean in their direction but leave the median
and the mode unchanged. This will result in a lopsided or not
symmetrical distribution as shown below.

3. The mean tends to be overestimated when the outliers are extremely large,
which results in a positively skewed distribution.

4. The mean tends to be underestimated when the outliers are extremely small,
and hence a negatively skewed distribution exists.

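5. Relationship between the mean ($\bar{x}$), median ($\tilde{x}$) and mode ($\hat{x}$):

   Normal distribution (symmetric):                 $\bar{x} = \tilde{x} = \hat{x}$

   Right skewed distribution (positively skewed):   $\bar{x} > \tilde{x} > \hat{x}$

   Left skewed distribution (negatively skewed):    $\bar{x} < \tilde{x} < \hat{x}$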


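6. Pearson’s Coefficient of Skewness is used to confirm whether the
   distribution is skewed.

   Pearson’s Coefficient of Skewness = $\frac{\bar{x} - \hat{x}}{s}$ or $\frac{3(\bar{x} - \tilde{x})}{s}$

   where $\bar{x}$ is the mean, $\hat{x}$ the mode, $\tilde{x}$ the median and $s$ the standard deviation.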

7. Type of distribution based on the sign of the coefficient.

   Sign        Type of Distribution
   Positive    positively skewed (skewed to the right)
   Zero        symmetric
   Negative    negatively skewed (skewed to the left)

Example 6:

Determine whether the distribution is normal, positively or negatively
skewed using Pearson’s Coefficient of Skewness if

$\bar{x} = 115.60$, $\tilde{x} = 114.50$, $s = 14.48$
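
A sketch of Example 6, assuming that 115.60 is the mean, 114.50 is the median and 14.48 is the standard deviation, so that the median form of Pearson’s coefficient applies. The function names are illustrative.

```python
def pearson_skewness_from_median(mean, median, sd):
    """Pearson's coefficient of skewness, median form: 3(mean - median) / s."""
    return 3 * (mean - median) / sd


def describe(coefficient):
    """Classify the distribution by the sign of the coefficient."""
    if coefficient > 0:
        return "positively skewed"
    if coefficient < 0:
        return "negatively skewed"
    return "symmetric"


# Assumption: the second value given in Example 6 is the median.
coefficient = pearson_skewness_from_median(115.60, 114.50, 14.48)
print(round(coefficient, 3), describe(coefficient))  # 0.228 positively skewed
```

If the second value were instead the mode, the mode form would give (115.60 - 114.50) / 14.48, which is also positive, so the conclusion does not change.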


2.2.4 MEASURES OF POSITION

Measures of Position

Quartiles Percentile

A measure of position determines the position of a single value in relation to other
values in a sample or a population data set.

Quartiles

1. Quartiles are values that divide an array into four equal quarters.

2. The first quartile, Q1, is the value such that at most ¼ of the measurements
are less than Q1 and at most ¾ are greater than Q1.

3. The second quartile, Q2 is the median.

4. The third quartile, Q3, is the value such that at most ¾ of the
measurements are less than Q3 and at most ¼ are greater than Q3.

5. Location of quartiles in a distribution:

   A1  |  A2  |  A3  |  A4
       Q1     Q2     Q3


6. For ungrouped data:

   Q1 = the value corresponding to the position $\frac{n + 1}{4}$

   Q3 = the value corresponding to the position $\frac{3(n + 1)}{4}$

Example 7:

The number of patients visiting the outpatient clinic:

110, 112, 98, 100, 115, 95, 100, 60

Find Q1 and Q3.
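
A sketch of Example 7 using the position formulas for ungrouped data. The notes do not say how to handle a fractional position, so this sketch assumes linear interpolation between the two neighbouring ordered values; the helper name `value_at_position` is illustrative.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in sorted data.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring values.
    """
    whole = int(position)          # whole part of the position
    fraction = position - whole    # fractional part of the position
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    upper = ordered[whole]
    return lower + fraction * (upper - lower)


patients = [110, 112, 98, 100, 115, 95, 100, 60]
ordered = sorted(patients)   # [60, 95, 98, 100, 100, 110, 112, 115]
n = len(ordered)

q1 = value_at_position(ordered, (n + 1) / 4)      # position 2.25
q3 = value_at_position(ordered, 3 * (n + 1) / 4)  # position 6.75

print(q1, q3)  # 95.75 111.5
```

Other conventions for fractional positions, such as rounding to the nearest position, give slightly different values.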


APPENDIX: SPSS OUTPUT FOR DESCRIPTIVE STATISTICS
