
STATA 273 CHAPTER 2

Categorical data ( words )

Numerical data ( numbers )

EXAMPLE#1 A survey conducted in a resort city showed that 40 tourists arrived by the following means of transportation:

car, car, plane, plane, plane, bus, train, car, car, car, plane, car, plane, train, car, car, bus, car, plane, plane,

train, train, plane, plane, car, car, train, car, car, plane, car, car, plane, bus, plane, bus, car, plane, car, train.

EXAMPLE#2 The following are the grades that 50 students obtained on a STATISTICS test:

73 65 82 70 45 50 70 54 32 75 75 67 65 60 75 87 83 40 72 64 58 75 89 70 73

55 61 78 89 93 43 51 59 38 65 71 75 85 65 85 49 97 55 60 76 75 69 35 45 63

EX#1: Draw a pie chart for the following distribution.

Percentage = Frequency / Total × 100%
Central Angle = Frequency / Total × 360°

Car Service   Frequency   Percentage   Central Angle
excellent         45         30%           108°
very good         60         40%           144°
good              25         16.7%          60°
poor              20         13.3%          48°
Total            150        100%           360°

[Pie chart: very good 40%, excellent 30%, good 17%, poor 13%]
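
The percentage and central-angle columns can be checked with a short Python sketch. The dictionary and function names below are illustrative choices, not part of the notes; the frequencies are the ones from the car-service table.

```python
# Frequencies from the car-service table in EX#1.
frequencies = {"excellent": 45, "very good": 60, "good": 25, "poor": 20}

def pie_slices(freqs):
    """Return {category: (percentage, central angle in degrees)}."""
    total = sum(freqs.values())
    return {name: (f * 100 / total, f * 360 / total) for name, f in freqs.items()}

slices = pie_slices(frequencies)
for name, (pct, angle) in slices.items():
    print(f"{name:10s} {pct:5.1f}%  {angle:5.1f} degrees")
```

The central angles always add up to 360°, which is a quick way to catch an arithmetic slip.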


EX#2: The following are determinations of a river's annual maximum flow in cubic meters per second: 405, 355, 419, 267,
370, 391, 612, 383, 434, 462, 288, 317, 540, 295, 508. Construct a stem-and-leaf display with two-digit leaves.

minimum = 267 (smallest value), so the first stem is 2
maximum = 612 (largest value), so the last stem is 6

Leaves in order of appearance      Leaves sorted
2 | 67 88 95                       2 | 67 88 95
3 | 55 70 91 83 17                 3 | 17 55 70 83 91
4 | 05 19 34 62                    4 | 05 19 34 62
5 | 40 08                          5 | 08 40
6 | 12                             6 | 12
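
A minimal Python sketch of the same construction is given below. The function name `stem_and_leaf` and its `leaf_digits` parameter are assumptions made for illustration. The sketch only lists stems that actually occur, so an empty stem would have to be added by hand.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_digits=1):
    """Map each stem to its sorted leaves; the last `leaf_digits` digits form the leaf."""
    base = 10 ** leaf_digits
    display = defaultdict(list)
    for v in sorted(values):
        display[v // base].append(v % base)
    return dict(display)

flows = [405, 355, 419, 267, 370, 391, 612, 383, 434, 462, 288, 317, 540, 295, 508]
display = stem_and_leaf(flows, leaf_digits=2)
for stem, leaves in display.items():
    # Two-digit leaves are zero-padded, so 8 prints as 08.
    print(stem, "|", " ".join(f"{leaf:02d}" for leaf in leaves))
```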

EX#3: The following are figures on a well's daily production of oil in barrels: 214, 203, 226, 198, 243, 225, 207, 203,
208, 200, 217, 202, 208, 212, 205, 220. Construct a stem-and-leaf display with stem labels 19, 20, …, and 24.

19 | 8
20 | 0 2 3 3 5 7 8 8
21 | 2 4 7
22 | 0 5 6
23 |
24 | 3

EX#4: The following are the numbers of automobile accidents that occurred at 60 major intersections in a certain city during
the 4th of July weekend. Group the data into a frequency distribution showing how often each value occurs.

0 2 5 0 1 4 1 0 2 1
5 0 1 3 0 0 2 1 3 1
1 4 0 2 4 1 2 4 0 4
3 5 0 1 3 6 4 2 0 2
0 2 3 0 4 2 5 1 1 2
2 1 6 5 0 3 3 0 0 4

no. of automobile accidents   frequency
0                                15
1                                12
2                                11
3                                 7
4                                 8
5                                 5
6                                 2
TOTAL                            60

If these numbers are replaced with words, we call the table a categorical distribution.
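
The tally can be reproduced with Python's `collections.Counter`. This is only a sketch; the variable names are illustrative and the data are the 60 counts listed above.

```python
from collections import Counter

accidents = [
    0, 2, 5, 0, 1, 4, 1, 0, 2, 1,
    5, 0, 1, 3, 0, 0, 2, 1, 3, 1,
    1, 4, 0, 2, 4, 1, 2, 4, 0, 4,
    3, 5, 0, 1, 3, 6, 4, 2, 0, 2,
    0, 2, 3, 0, 4, 2, 5, 1, 1, 2,
    2, 1, 6, 5, 0, 3, 3, 0, 0, 4,
]

freq = Counter(accidents)  # maps each value to how often it occurs
for value in sorted(freq):
    print(value, freq[value])
print("TOTAL", sum(freq.values()))
```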


EX#5: The following are the grades of 50 students on a STATISTICS test.


73 65 82 70 45 50 70 54 32 75 75 67 65 60 75 87 83 40 72 64 58 75 89 70 73

55 61 78 89 93 43 51 59 38 65 71 75 85 65 85 49 97 55 60 76 75 69 35 45 63

Group the given data into a frequency distribution with seven classes.

n = data size → n = 50

k = number of classes → k = 7

r = range = max − min → r = 97 − 32 = 65

$\text{gap} = \frac{r}{k}$, rounded up to the number of decimal places in your data → $\text{gap} = \frac{65}{7} = 9.285714 \rightarrow 10$


Grade Frequency
30 – 39 3
40 – 49 5
50 – 59 7
60 – 69 11
70 – 79 15
80 – 89 7
90 – 99 2
Total 50
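
The grouping steps can be sketched in Python as follows. The starting lower limit of 30 is taken from the table above and is a choice rather than something the formula dictates; the variable names are illustrative.

```python
import math

grades = [
    73, 65, 82, 70, 45, 50, 70, 54, 32, 75, 75, 67, 65, 60, 75, 87, 83, 40, 72, 64, 58, 75, 89, 70, 73,
    55, 61, 78, 89, 93, 43, 51, 59, 38, 65, 71, 75, 85, 65, 85, 49, 97, 55, 60, 76, 75, 69, 35, 45, 63,
]

k = 7                               # number of classes
r = max(grades) - min(grades)       # range: 97 - 32 = 65
gap = math.ceil(r / k)              # 9.2857... rounded up to a whole number
start = 30                          # assumed first lower limit, as in the table

counts = [0] * k
for g in grades:
    counts[(g - start) // gap] += 1  # index of the class containing g

for i, c in enumerate(counts):
    low = start + i * gap
    print(f"{low} - {low + gap - 1}: {c}")
```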

EX#6: From the given frequency distribution, complete the table to the right.

Class      Frequency   Class Limits   Class Boundaries
20 – 34        5
35 – 49       11
50 – 64       14
65 – 79        6
80 – 94        4
TOTAL         40

The class limits have the same number of decimal places as the data.
The class boundaries have one more decimal place than the given data.

Then determine the following:

(a) the lower class limits

(b) the upper class limits

(c) the lower class boundaries

(d) the upper class boundaries

(e) the class interval / width

(f) the class marks


SOLUTION

(a) the lower class limits: 20, 35, 50, 65, 80

(b) the upper class limits: 34, 49, 64, 79, 94

(c) the lower class boundaries: 19.5, 34.5, 49.5, 64.5, 79.5

(d) the upper class boundaries: 34.5, 49.5, 64.5, 79.5, 94.5

Class Limits   Class Boundaries
20 – 34        19.5 – 34.5
35 – 49        34.5 – 49.5
50 – 64        49.5 – 64.5
65 – 79        64.5 – 79.5
80 – 94        79.5 – 94.5

[Figure: number line showing the class limits of the first three classes (20–34, 35–49, 50–64) with the class boundaries 34.5 and 49.5 between them]

(e) the class interval / the class width: 15

NOTES
- Don't use the limits of a single class (34 − 20 = 14 is not the width).
- Use the boundaries of a class: 34.5 − 19.5 = 15.
- Use successive limits: 35 − 20 = 15.
- Use successive boundaries: 49.5 − 34.5 = 15.

(f) the class marks: 27, 42, 57, 72, 87

$\text{class mark} = \frac{\text{lower limit} + \text{upper limit}}{2}$ OR $\frac{\text{lower boundary} + \text{upper boundary}}{2}$
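
The same quantities can be derived from the class limits with a few lines of Python. This is a sketch under the assumption that the data are recorded to the nearest whole number, so boundaries sit half a unit outside the limits; the names `unit` and `half` are illustrative.

```python
classes = [(20, 34), (35, 49), (50, 64), (65, 79), (80, 94)]  # (lower limit, upper limit)

unit = 1          # assumed precision of the data (whole numbers)
half = unit / 2   # boundaries lie half a unit beyond the limits

boundaries = [(lo - half, hi + half) for lo, hi in classes]
marks = [(lo + hi) / 2 for lo, hi in classes]
width = boundaries[0][1] - boundaries[0][0]   # use boundaries, not limits

print("boundaries:", boundaries)
print("class marks:", marks)
print("class width:", width)
```

Computing the width from the limits of one class would give 14, which is why the notes insist on boundaries or successive limits.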


EX#7: The class marks of a distribution of temperature readings (given to the nearest degree Celsius) are 16, 25, 34, and 43.

(a) Find the class boundaries.


(b) Find the class limits.

SOLUTION

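Class marks: 16, 25, 34, 43

Each interior boundary is the midpoint of two successive class marks:

$\frac{16 + 25}{2} = 20.5$, $\frac{25 + 34}{2} = 29.5$, $\frac{34 + 43}{2} = 38.5$

The class width is 25 − 16 = 9, so the outer boundaries are 20.5 − 9 = 11.5 and 38.5 + 9 = 47.5.

Class Boundaries   Class Limits
11.5 – 20.5        12 – 20
20.5 – 29.5        21 – 29
29.5 – 38.5        30 – 38
38.5 – 47.5        39 – 47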

EX#8: Use the following frequency distribution to find how many students obtained:

Test Score   Frequency
15 – 29          4
30 – 34          9
35 – 39          7
40 – 44          6
45 – 49          4
50 – 54         10
TOTAL           40

(a) at least 35
(b) at most 35
(c) less than 40
(d) more than 40
(e) 30 or less
(f) 30 or more
(g) exactly 50

SOLUTION

(a) at least 35     27  (7 + 6 + 4 + 10)
(b) at most 35      NO ANSWER
(c) less than 40    20  (4 + 9 + 7)
(d) more than 40    NO ANSWER
(e) 30 or less      NO ANSWER
(f) 30 or more      36  (9 + 7 + 6 + 4 + 10)
(g) exactly 50      NO ANSWER

NO ANSWER means the cutoff would split a class, so the count cannot be determined from the grouped table.


HW#1: The following are determinations of a river's annual maximum flow in cubic meters per second: 405, 355, 419, 267,
370, 391, 612, 383, 434, 462, 288, 317, 540, 295, 508. Construct a stem-and-leaf display with two-digit leaves.

2 | 67 88 95
3 | 17 55 70 83 91
4 | 05 19 34 62
5 | 08 40
6 | 12

HW#2: The following are figures on a well's daily production of oil in barrels: 214, 203, 226, 198, 243, 225, 207, 203,
208, 200, 217, 202, 208, 212, 205, 220. Construct a stem-and-leaf display with stem labels 19, 20, …, and 24.

19 | 8
20 | 0 2 3 3 5 7 8 8
21 | 2 4 7
22 | 0 5 6
23 |
24 | 3

HW#3: The weights of certain mineral specimens, given to the nearest tenth of an ounce, are 15.4, 17.6, 10.9, 11.3, 13.1, 14.5,
20.2, 18.8, 16.3, 19.9, 12.7.

(a) Group these data into a table having the classes [10.5, 12.5), [12.5, 14.5), [14.5, 16.5), [16.5, 18.5), [18.5, 20.5) ounces,
where the left-hand endpoint is included but the right-hand endpoint is excluded.

Class             Frequency
[ 10.5 , 12.5 )       2
[ 12.5 , 14.5 )       2
[ 14.5 , 16.5 )       3
[ 16.5 , 18.5 )       1
[ 18.5 , 20.5 )       3
Total                11

(b) Use the previous part to find

- the class marks

$\frac{10.5 + 12.5}{2} = 11.5$, $\frac{12.5 + 14.5}{2} = 13.5$, $\frac{14.5 + 16.5}{2} = 15.5$, $\frac{16.5 + 18.5}{2} = 17.5$, $\frac{18.5 + 20.5}{2} = 19.5$

- the class interval

The difference of successive class marks = 13.5 – 11.5 = 2


EX#9: Use the given frequency distribution to answer the relevant questions below.

Class      Frequency
31 – 40        4
41 – 50        9
51 – 60        7
61 – 70        6
71 – 80        4
TOTAL         30

(a) Construct a cumulative less than distribution.
(b) Construct a cumulative less than or equal distribution.
(c) Construct a cumulative more than distribution.
(d) Construct a cumulative more than or equal distribution.

SOLUTION

Cumulative Less Than (Less Than Or Equal) Distribution

Start at zero, add each class frequency in turn, and end at the total.

Class Boundaries    Cumulative Frequency
less than 30.5           0
less than 40.5           4
less than 50.5          13
less than 60.5          20
less than 70.5          26
less than 80.5          30

Cumulative More Than (More Than Or Equal) Distribution

Start at the total, subtract each class frequency in turn, and end at zero.

Class Boundaries    Cumulative Frequency
more than 30.5          30
more than 40.5          26
more than 50.5          17
more than 60.5          10
more than 70.5           4
more than 80.5           0
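
Both cumulative tables follow from running sums of the class frequencies. The sketch below uses `itertools.accumulate`; the list names are illustrative.

```python
from itertools import accumulate

boundaries = [30.5, 40.5, 50.5, 60.5, 70.5, 80.5]
frequencies = [4, 9, 7, 6, 4]
total = sum(frequencies)

# "Less than": start at zero and add each class frequency in turn.
less_than = [0] + list(accumulate(frequencies))
# "More than": whatever is not below a boundary lies above it.
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"less than {b}: {lt:2d}    more than {b}: {mt:2d}")
```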


EX#10: Represent the given distribution graphically by

- a histogram
- a frequency polygon
- an upward ogive
- a downward ogive

Grade      Frequency
50 – 59       10
60 – 69       12
70 – 79       20
80 – 89       15
90 – 99        3
Total         60

SOLUTION

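Histogram

Class Boundaries   Frequency
49.5 – 59.5           10
59.5 – 69.5           12
69.5 – 79.5           20
79.5 – 89.5           15
89.5 – 99.5            3

[Figure: histogram; horizontal axis: class boundaries from 49.5 to 99.5; vertical axis: frequency]

Frequency Polygon

Class Marks                     Frequency
$\frac{50 + 59}{2} = 54.5$         10
$\frac{60 + 69}{2} = 64.5$         12
$\frac{70 + 79}{2} = 74.5$         20
$\frac{80 + 89}{2} = 84.5$         15
$\frac{90 + 99}{2} = 94.5$          3

[Figure: frequency polygon; horizontal axis: class marks from 44.5 to 104.5; vertical axis: frequency]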


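Upward Ogive

Class Boundaries   Cumulative Frequency
49.5                    0
59.5                   10
69.5                   22
79.5                   42
89.5                   57
99.5                   60

[Figure: upward ogive; horizontal axis: class boundaries from 49.5 to 99.5; vertical axis: cumulative frequency from 0 to 60]

Downward Ogive

Class Boundaries   Cumulative Frequency
49.5                   60
59.5                   50
69.5                   38
79.5                   18
89.5                    3
99.5                    0

[Figure: downward ogive; horizontal axis: class boundaries from 49.5 to 99.5; vertical axis: cumulative frequency from 0 to 60]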


Ungrouped Data

Grouped Data = Frequency Distribution

EX#11: Given the following marks of 20 students in the final exam of STATISTICS course: 50 43 68 81 90 93 76 70
69 94 25 55 89 93 71 65 72 71 84 57.

(a) Calculate the average (mean) mark of the class.

(b) Calculate the average (mean) mark of those students who achieved at least 90.

SOLUTION - (a)

$\bar{x} = \frac{\sum x}{n} = \frac{\text{sum of data}}{\text{number of data}}$

$\bar{x} = \frac{50 + 43 + 68 + 81 + 90 + 93 + 76 + 70 + 69 + 94 + 25 + 55 + 89 + 93 + 71 + 65 + 72 + 71 + 84 + 57}{20} = \frac{1416}{20} = 70.8$

SOLUTION - (b)

$\frac{90 + 93 + 94 + 93}{4} = \frac{370}{4} = 92.5$

EX#12: By mistake, an instructor has erased the grade that one of ten students in her class received in the final examination.
However, she knows the class average (mean) is 71 and the other nine students’ grades are 96, 44, 82, 70, 47, 74, 94, 78,
and 56. What must have been the grade that the instructor erased?

$\text{average (mean)} = \frac{\text{sum of data}}{\text{number of data}}$

$71 = \frac{96 + 44 + 82 + 70 + 47 + 74 + 94 + 78 + 56 + x}{10}$

$71 = \frac{641 + x}{10}$

(71)(10) = 641 + x

710 = 641 + x

710 − 641 = x

69 = x
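
The same rearrangement (total = mean × n, then subtract the known grades) can be checked in Python. The variable names are illustrative.

```python
known_grades = [96, 44, 82, 70, 47, 74, 94, 78, 56]
n = 10            # class size, including the erased grade
class_mean = 71

# The mean fixes the total: sum of all grades = mean * n.
missing = class_mean * n - sum(known_grades)
print(missing)

# Check: putting the grade back reproduces the stated mean.
print(sum(known_grades + [missing]) / n)
```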

Mode: the most frequent value

EX#13: Find the mode of the following data:

(a) 70 82 84 40 57 48 39 75 55 39 69 57 45 80 57 91 100. → MODE = 57

(b) 90 90 66 66 57 57 46 46 32 32 18 18. → NO MODE

(c) 11 34 59 11 60 78 96 35 60 31 21 74. → MODES = 11 and 60
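
A small Python sketch of the mode, following the convention used in part (b) that a dataset has no mode when every value occurs equally often. The function name `modes` is an illustrative choice, and other textbooks may treat the tied case differently.

```python
from collections import Counter

def modes(data):
    """Return the sorted list of modes; an empty list means 'no mode'."""
    counts = Counter(data)
    if len(set(counts.values())) == 1:
        return []                      # every value equally frequent: no mode
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

a = [70, 82, 84, 40, 57, 48, 39, 75, 55, 39, 69, 57, 45, 80, 57, 91, 100]
b = [90, 90, 66, 66, 57, 57, 46, 46, 32, 32, 18, 18]
c = [11, 34, 59, 11, 60, 78, 96, 35, 60, 31, 21, 74]

print(modes(a), modes(b), modes(c))
```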


EX#14: You are given the following dataset 11 , 10 , 23 , 40 , 12 , 15 , 17 , 19 , 50 , 36.


Calculate the percentile P0.20

Calculate the percentile P0.33

Calculate the percentile P0.40

Calculate the percentile P0.61

How to calculate the percentiles?

(1) Order the data.

(2) Compute the index = np.

(3) Find the answer:

Case 1 (the index is a whole number): $\text{answer} = \frac{\text{the indicated number} + \text{the number that follows}}{2}$

Case 2 (the index is a decimal number): round the index up; the answer is the value at that position.

SOLUTION

Ordered Data: 10, 11, 12, 15, 17, 19, 23, 36, 40, 50

Calculate the percentile P0.20

index = np = (10)(0.20) = 2 (whole number)

$\text{answer} = \frac{11 + 12}{2} = 11.5$ (the 2nd and 3rd values)

Calculate the percentile P0.33

index = np = (10)(0.33) = 3.3 (decimal number)

answer = 15 (the 4th value)

Calculate the percentile P0.40

index = np = (10)(0.40) = 4 (whole number)

$\text{answer} = \frac{15 + 17}{2} = 16$ (the 4th and 5th values)

Calculate the percentile P0.61

index = np = (10)(0.61) = 6.1 (decimal number)

answer = 23 (the 7th value)
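
The two-case rule can be written as a short Python function. This is a sketch of the rule as stated in these notes; statistical libraries generally use other interpolation schemes and may return different values. The function name `percentile` and the tolerance-based whole-number test are choices made here.

```python
import math

def percentile(data, p):
    """Percentile by the index = n*p rule, for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    ordered = sorted(data)
    index = len(ordered) * p
    nearest = round(index)
    if nearest >= 1 and math.isclose(index, nearest):
        # Case 1: whole number -> average the indicated value and the next one.
        return (ordered[nearest - 1] + ordered[nearest]) / 2
    # Case 2: decimal -> round up and take the value at that position.
    return ordered[math.ceil(index) - 1]

data = [11, 10, 23, 40, 12, 15, 17, 19, 50, 36]
for p in (0.20, 0.33, 0.40, 0.61):
    print(p, percentile(data, p))
```

The `math.isclose` test guards against floating-point products such as 10 × 0.33, which is stored as slightly more than 3.3.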


EX#15: You are given the following stem and leaf display.

1 | 2 8
2 | 6 8
3 | 1 3 5
4 | 4 4 5 9
5 | 3
6 | 1 1 3
7 | 0 5
8 | 0 0 9
9 | 6

(a) Calculate the range.
(b) Calculate the midrange.
(c) Calculate the 1st quartile Q1.
(d) Calculate the 2nd quartile Q2 (median).
(e) Calculate the 3rd quartile Q3.
(f) Calculate the inter-quartile range (IQR).
(g) Draw a boxplot to represent the data graphically.

SOLUTION

Ordered data (n = 21): 12, 18, 26, 28, 31, 33, 35, 44, 44, 45, 49, 53, 61, 61, 63, 70, 75, 80, 80, 89, 96

(a) range = max − min = 96 − 12 = 84

(b) $\text{midrange} = \frac{\max + \min}{2} = \frac{96 + 12}{2} = \frac{108}{2} = 54$

(c) Q1 = P25%: index = np = (21)(0.25) = 5.25 (decimal), so take the 6th value: Q1 = 33

(d) Q2 = P50%: index = np = (21)(0.50) = 10.5 (decimal), so take the 11th value: Q2 = 49

(e) Q3 = P75%: index = np = (21)(0.75) = 15.75 (decimal), so take the 16th value: Q3 = 70

(f) IQR = Q3 − Q1 = 70 − 33 = 37

$\text{mid-quartile} = \frac{Q3 + Q1}{2} = \frac{70 + 33}{2} = \frac{103}{2} = 51.5$


(g) The boxplot

[Figure: boxplot on an axis from 10 to 100, with minimum 12, Q1 = 33, median 49, Q3 = 70, and maximum 96]

EX#16: Use the given boxplot to answer the relevant questions below.

[Figure: boxplot on an axis from 1 to 25, marked in steps of 2]

SOLUTION

Reading the boxplot: minimum = 5, Q1 = 7, Q2 = 15, Q3 = 19, maximum = 23.

(a) IQR = Q3 − Q1 = 19 − 7 = 12

(b) range = maximum − minimum = 23 − 5 = 18

(c) median = Q2 = 15

(d) $\text{midrange} = \frac{\text{maximum} + \text{minimum}}{2} = \frac{23 + 5}{2} = \frac{28}{2} = 14$

(e) minimum = 5

(f) first quartile = Q1 = 7


EX#17: Find the mean, variance, standard deviation and coefficient of variation for the following data: 7 14 22 19
20 13 25 11.2

SOLUTION

mean: $\bar{x} = \frac{\sum x}{n}$

variance: $s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}$

standard deviation: $s = \sqrt{\text{variance}}$

coefficient of variation: $\text{CV} = \frac{s}{\bar{x}}$

x        x²
7         49
14       196
22       484
19       361
20       400
13       169
25       625
11.2     125.44
SUM 131.2   2409.44

$\text{mean} = \frac{131.2}{8} = 16.4$

$\text{variance} = \frac{8(2409.44) - (131.2)^2}{8(7)} \approx 36.82$

$\text{standard deviation} = \sqrt{36.82} \approx 6.07$

$\text{coefficient of variation} = \frac{6.07}{16.4} \approx 0.37$
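
The computation can be verified with a short Python sketch that uses the same shortcut formula for the sample variance. The variable names are illustrative.

```python
import math

data = [7, 14, 22, 19, 20, 13, 25, 11.2]
n = len(data)
sum_x = sum(data)                    # 131.2
sum_x2 = sum(x * x for x in data)    # 2409.44

mean = sum_x / n
variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))   # sample variance, shortcut form
sd = math.sqrt(variance)
cv = sd / mean                       # coefficient of variation (unitless)

print(round(mean, 2), round(variance, 2), round(sd, 2), round(cv, 2))
```

Because the coefficient of variation has no units, it can be used to compare the relative variability of datasets measured on different scales.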

HW#4: If the mean annual salary paid to the chief executives of three engineering firms is $125,000, can one of them receive
$400,000?

HW#5: In recent years, the price of copper was 69.6, 66.8, and 66.3 cents per pound, and the price of bituminous coal
was 19.43, 19.82, and 22.40 dollars per ton. Which of the two sets of prices is relatively more variable?


EX#18: Use the given frequency distribution (grouped data) to find the following descriptive measures:

Class      Frequency
01 – 05        4
06 – 10        3
11 – 15        7
16 – 20        5
21 – 25        6

(a) The modal class.
(b) The mean.
(c) The variance and standard deviation.
(d) The percentile P62%.
(e) The first quartile Q1.

SOLUTION – (a), (b), (c)

n = sum of frequencies, f = frequencies, x = class marks

mean: $\bar{x} = \frac{\sum x f}{n}$

variance: $s^2 = \frac{n \sum x^2 f - (\sum x f)^2}{n(n - 1)}$

standard deviation: $s = \sqrt{\text{variance}}$

Class      f     x     x²    x·f    x²·f
01 – 05    4     3      9     12      36
06 – 10    3     8     64     24     192
11 – 15    7    13    169     91    1183
16 – 20    5    18    324     90    1620
21 – 25    6    23    529    138    3174
TOTAL   n = 25                355    6205

(a) The modal class = class with the largest frequency → 11 – 15

(b) The mean $= \frac{\sum x f}{\sum f} = \frac{\sum x f}{n} = \frac{355}{25} = 14.2$

(c) The variance $= \frac{n \sum x^2 f - (\sum x f)^2}{n(n - 1)} = \frac{25(6205) - (355)^2}{(25)(24)} = \frac{155{,}125 - 126{,}025}{600} = \frac{29{,}100}{600} = 48.5$

The standard deviation $= \sqrt{\text{variance}} = \sqrt{48.5} \approx 6.96$
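
The grouped-data table and the three measures can be reproduced with the following Python sketch, in which every observation in a class is represented by its class mark. The variable names are illustrative.

```python
import math

classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25)]   # class limits
freqs = [4, 3, 7, 5, 6]

marks = [(lo + hi) / 2 for lo, hi in classes]   # class marks x
n = sum(freqs)
sum_xf = sum(x * f for x, f in zip(marks, freqs))
sum_x2f = sum(x * x * f for x, f in zip(marks, freqs))

modal_class = classes[freqs.index(max(freqs))]  # class with the largest frequency
mean = sum_xf / n
variance = (n * sum_x2f - sum_xf ** 2) / (n * (n - 1))
sd = math.sqrt(variance)

print(modal_class, mean, variance, round(sd, 2))
```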

SOLUTION – (d) & (e)

$\text{percentile} = \text{lower boundary} + \frac{np - \text{previous cumulative frequency}}{\text{class frequency}} \times \text{class interval}$

Class      f    Cumulative Frequency
01 – 05    4         4
06 – 10    3         7
11 – 15    7        14
16 – 20    5        19
21 – 25    6        25
TOTAL   n = 25

(d) The percentile P62%

- index: np = 25 × 0.62 = 15.5; the first class with cumulative frequency > np is 16 – 20
- lower boundary = 15.5, previous cumulative frequency = 14, class frequency = 5, class interval = 5
- $P_{62\%} = 15.5 + \frac{15.5 - 14}{5} \times 5 = 15.5 + 1.5 = 17$

(e) The first quartile Q1 = P25%

- index: np = 25 × 0.25 = 6.25; the first class with cumulative frequency > np is 06 – 10
- lower boundary = 5.5, previous cumulative frequency = 4, class frequency = 3, class interval = 5
- $Q_1 = 5.5 + \frac{6.25 - 4}{3} \times 5 = 5.5 + 3.75 = 9.25$
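
The interpolation formula can be sketched in Python as below. The function name `grouped_percentile` and its parameters are assumptions made for illustration; `unit` is the precision of the data, so each lower boundary sits half a unit below the lower limit.

```python
def grouped_percentile(lower_limits, freqs, width, p, unit=1):
    """Percentile of grouped data by linear interpolation inside the percentile class."""
    n = sum(freqs)
    target = n * p                       # the index np
    cumulative = 0                       # cumulative frequency of the previous classes
    for lo, f in zip(lower_limits, freqs):
        if cumulative + f > target:      # first class whose cumulative frequency exceeds np
            lower_boundary = lo - unit / 2
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("p must be less than 1")

lower_limits = [1, 6, 11, 16, 21]
freqs = [4, 3, 7, 5, 6]

p62 = grouped_percentile(lower_limits, freqs, width=5, p=0.62)
q1 = grouped_percentile(lower_limits, freqs, width=5, p=0.25)
print(p62, q1)
```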
