
JANUARY SEMESTER 2023

NBHS1202

STATISTICS
TABLE OF CONTENTS
PART 1
QUESTION 1
QUESTION 2
i. Mean, Median, and Mode for each data set:
Data Set A:
Data Set B:
Data Set C:
ii. Quartiles for Set C:
iii. Coefficient of variation for each distribution:
CV for Set A:
CV for Set B:
CV for Set C:
QUESTION 3
d. Relative frequency distribution table:
e. Histogram, frequency polygon, and cumulative frequency polygon:
(b) a
PART 1

QUESTION 1
Quantitative and qualitative variables are the two main types of variable used in research and data analysis, and each can be divided into subcategories. The most important contrast between them is that quantitative variables are numerical and measurable, whereas qualitative variables are non-numerical and descriptive.

Quantitative variables are data points that can be measured or counted numerically. A person's age, height, weight, and income are examples of quantitative variables. These variables are further divided into discrete and continuous variables. A discrete variable can take only separate, countable values; the number of children in a family is an example of a discrete variable. A continuous variable, such as weight or height, can take any value within a given range.

Qualitative variables, by contrast, are not numerical and are used for descriptive purposes. They are used when something needs to be described or categorised by its characteristics or qualities. Examples of qualitative variables include education level, marital status, occupation, and gender. For categorisation purposes, these variables are commonly split into nominal and ordinal variables. Nominal variables, such as gender or race, are categorical and have no inherent order. Ordinal variables, such as educational level or income range, have a natural ordering.

In a nutshell, qualitative variables are descriptive and do not take numerical values, whereas quantitative variables take numerical values that can be measured or counted. Quantitative variables are further categorised as discrete or continuous, while qualitative variables are categorised as nominal or ordinal.
QUESTION 2

i. Mean, Median, and Mode for each data set:

Data Set A:

• Mean: sum of all data points divided by the number of data points.
Sum = 70+84+119+127+84+127+157+80+84+119+140+126+105+153+88+130+116+76+85+115 = 2185

2185 / 20 = 109.25 (Mean)

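• Median: middle value when the data are arranged in order (sorted data set):
70, 76, 80, 84, 84, 84, 85, 88, 105, 115, 116, 119, 119, 126, 127, 127, 130, 140, 153, 157

With n = 20 (an even number), the median is the average of the 10th and 11th values:
115 + 116 = 231

231 / 2 = 115.5 (Median)

• Mode: most frequent value in the data set = 84, which appears three times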

Data Set B:

• Mean:

69+84+120+127+74+127+160+79+84+110+140+126+105+152+80+130+132+74+88+135 = 2196

2196 / 20 = 109.8 (Mean)

• Median: middle value when the data are arranged in order (sorted data set):
69, 74, 74, 79, 80, 84, 84, 88, 105, 110, 120, 126, 127, 127, 130, 132, 135, 140, 152, 160

The median is the average of the 10th and 11th values:
110 + 120 = 230

230 / 2 = 115 (Median)

• Mode: most frequent value in the data set = 74, 84 and 127 (each appears twice, so Set B has three modes)


Data Set C:

• Mean:
48+94+119+121+83+129+159+80+82+117+140+127+102+153+56+139+110+83+90+129 = 2161

2161 / 20 = 108.05 (Mean)

• Median: middle value when the data are arranged in order (sorted data set):
48, 56, 80, 82, 83, 83, 90, 94, 102, 110, 117, 119, 121, 127, 129, 129, 139, 140, 153, 159

The median is the average of the 10th and 11th values:
110 + 117 = 227

227 / 2 = 113.5 (Median)

• Mode: most frequent value in the data set = 83 and 129 (each appears twice, so Set C is bimodal)
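As a cross-check on the hand calculations above, the three data sets can be summarised with Python's standard `statistics` module. This is a minimal sketch, not part of the original answer; `multimode` (Python 3.8+) is used because Sets B and C have more than one mode, and the helper name `summarise` is hypothetical.

```python
from statistics import mean, median, multimode

# Data sets as listed in Question 2 (in their original, unsorted order)
data_sets = {
    "A": [70, 84, 119, 127, 84, 127, 157, 80, 84, 119,
          140, 126, 105, 153, 88, 130, 116, 76, 85, 115],
    "B": [69, 84, 120, 127, 74, 127, 160, 79, 84, 110,
          140, 126, 105, 152, 80, 130, 132, 74, 88, 135],
    "C": [48, 94, 119, 121, 83, 129, 159, 80, 82, 117,
          140, 127, 102, 153, 56, 139, 110, 83, 90, 129],
}

def summarise(values):
    """Return (mean, median, modes); multimode lists every tied mode."""
    return mean(values), median(values), sorted(multimode(values))

for name, values in data_sets.items():
    m, med, modes = summarise(values)
    print(f"Set {name}: mean={m}, median={med}, mode(s)={modes}")
```

For an even number of values, `median` averages the two middle values, which is the same rule applied by hand above.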


ii. Quartiles for Set C:
• To find the quartiles, we first arrange the data set in ascending order (n = 20): 48, 56, 80, 82, 83, 83, 90, 94, 102, 110, 117, 119, 121, 127, 129, 129, 139, 140, 153, 159.
The position of the k-th quartile is $\frac{k(n+1)}{4}$:
$Q_1$ position $= \frac{1(20+1)}{4} = 5.25$, so $Q_1$ lies between the 5th and 6th values, 0.25 of the way above the 5th value.
$Q_2$ position $= \frac{2(20+1)}{4} = 10.5$, so $Q_2$ lies between the 10th and 11th values, 0.5 of the way above the 10th value.
$Q_3$ position $= \frac{3(20+1)}{4} = 15.75$, so $Q_3$ lies between the 15th and 16th values, 0.75 of the way above the 15th value.

$Q_1 = 83 + 0.25(83 - 83) = 83.0$
$Q_2 = 110 + 0.5(117 - 110) = 113.5$
$Q_3 = 129 + 0.75(129 - 129) = 129.0$
25% of the data fall below 83.0, 50% fall below 113.5, and 75% fall below 129.0.
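The (n + 1) position rule used for the quartiles can be sketched in Python as below. The function `quartile` is a hypothetical helper written for illustration and assumes the data are already sorted. Python's `statistics.quantiles` with its default `method='exclusive'` also places quantiles using (n + 1)-based positions, so the two results should agree for Set C.

```python
import statistics

def quartile(sorted_data, p):
    """Quantile by the (n + 1)p position rule with linear interpolation."""
    n = len(sorted_data)
    pos = p * (n + 1)              # 1-based position, e.g. 0.25 * 21 = 5.25
    if pos <= 1:                   # clamp positions outside the data
        return float(sorted_data[0])
    if pos >= n:
        return float(sorted_data[-1])
    k = int(pos)                   # whole part: the k-th value
    frac = pos - k                 # fraction of the way to the (k+1)-th value
    lower, upper = sorted_data[k - 1], sorted_data[k]
    return lower + frac * (upper - lower)

set_c = sorted([48, 94, 119, 121, 83, 129, 159, 80, 82, 117,
                140, 127, 102, 153, 56, 139, 110, 83, 90, 129])
q1, q2, q3 = (quartile(set_c, p) for p in (0.25, 0.5, 0.75))
print(q1, q2, q3)

# Cross-check with the standard library's default ("exclusive") method
print(statistics.quantiles(set_c, n=4))
```

Other textbooks and software use different position rules (for example n·p or (n − 1)·p + 1), which can give slightly different quartiles on the same data.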
iii. Coefficient of variation for each distribution:

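• The coefficient of variation (CV) is a measure of the relative variability of a data set. It is the standard deviation divided by the mean, expressed as a percentage: $CV = \frac{S}{\bar{X}} \times 100\%$, where the sample standard deviation is $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$.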
CV for Set A:
X X − X̄ (X − X̄)²
70 70-109.25=-39.25 1540.5625
76 76-109.25=-33.25 1105.5625
80 80-109.25=-29.25 855.5625
84 84-109.25=-25.25 637.5625
84 84-109.25=-25.25 637.5625
84 84-109.25=-25.25 637.5625
85 85-109.25=-24.25 588.0625
88 88-109.25=-21.25 451.5625
105 105-109.25=-4.25 18.0625
115 115-109.25=5.75 33.0625
116 116-109.25=6.75 45.5625
119 119-109.25=9.75 95.0625
119 119-109.25=9.75 95.0625
126 126-109.25=16.75 280.5625
127 127-109.25=17.75 315.0625
127 127-109.25=17.75 315.0625
130 130-109.25=20.75 430.5625
140 140-109.25=30.75 945.5625
153 153-109.25=43.75 1914.0625
157 157-109.25=47.75 2280.0625
Total: ΣX = 2185, so X̄ = 2185 / 20 = 109.25; Σ(X − X̄)² = 13221.75
$S^2 = \frac{13221.75}{19} = 695.882$
$S = \sqrt{695.882} = 26.38$
$CV = \frac{26.38}{109.25} \times 100 = 24.15\%$
CV for Set B:
X X − X̄ (X − X̄)²
69 69-109.8= -40.8 1664.64
140 140-109.8= 30.2 912.04
84 84-109.8= -25.8 665.64
126 126-109.8=16.2 262.44
120 120-109.8=10.2 104.04
105 105-109.8= -4.8 23.04
127 127-109.8= 17.2 295.84
152 152-109.8= 42.2 1780.84
74 74-109.8= -35.8 1281.64
80 80-109.8=-29.8 888.04
127 127-109.8=17.2 295.84
130 130-109.8=20.2 408.04
160 160-109.8=50.2 2520.04
132 132-109.8=22.2 492.84
79 79-109.8=-30.8 948.64
74 74-109.8=-35.8 1281.64
84 84-109.8=-25.8 665.64
88 88-109.8=-21.8 475.24
110 110-109.8=0.2 0.04
135 135-109.8=25.2 635.04
Total: ΣX = 2196, so X̄ = 2196 / 20 = 109.8; Σ(X − X̄)² = 15601.2
$S^2 = \frac{15601.2}{19} = 821.116$
$S = \sqrt{821.116} = 28.655$
$CV = \frac{28.655}{109.8} \times 100 = 26.10\%$

CV for Set C:
X X − X̄ (X − X̄)²
48 48-108.05=-60.05 3606.0025
140 140-108.05=31.95 1020.8025
94 94-108.05=-14.05 197.4025
127 127-108.05=18.95 359.1025
119 119-108.05=10.95 119.9025
102 102-108.05=-6.05 36.6025
121 121-108.05=12.95 167.7025
153 153-108.05=44.95 2020.5025
83 83-108.05=-25.05 627.5025
56 56-108.05=-52.05 2709.2025
129 129-108.05=20.95 438.9025
139 139-108.05=30.95 957.9025
159 159-108.05=50.95 2595.9025
110 110-108.05=1.95 3.8025
80 80-108.05=-28.05 786.8025
83 83-108.05=-25.05 627.5025
82 82-108.05=-26.05 678.6025
90 90-108.05=-18.05 325.8025
117 117-108.05=8.95 80.1025
129 129-108.05=20.95 438.9025
Total: ΣX = 2161, so X̄ = 2161 / 20 = 108.05; Σ(X − X̄)² = 17798.95
$S^2 = \frac{17798.95}{19} = 936.79$
$S = \sqrt{936.79} = 30.61$
$CV = \frac{30.61}{108.05} \times 100 = 28.33\%$

• Based on these calculations, Set C has the highest coefficient of variation (28.33%), indicating the greatest variability relative to its mean. Conversely, Set A has the lowest coefficient of variation (24.15%), indicating the least relative variability, and Set B (26.10%) falls between Sets A and C. Overall, the coefficient of variation gives insight into the relative dispersion of the data in each set and is useful for comparing variability across data sets with different means.
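The CV calculations can be checked with a short Python sketch. It assumes the sample standard deviation (divisor n − 1, as in the hand calculations), which is what `statistics.stdev` computes; `coefficient_of_variation` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Data sets as listed in Question 2
data_sets = {
    "A": [70, 84, 119, 127, 84, 127, 157, 80, 84, 119,
          140, 126, 105, 153, 88, 130, 116, 76, 85, 115],
    "B": [69, 84, 120, 127, 74, 127, 160, 79, 84, 110,
          140, 126, 105, 152, 80, 130, 132, 74, 88, 135],
    "C": [48, 94, 119, 121, 83, 129, 159, 80, 82, 117,
          140, 127, 102, 153, 56, 139, 110, 83, 90, 129],
}

def coefficient_of_variation(values):
    """Sample CV in percent: s / mean * 100, where s uses the n - 1 divisor."""
    return stdev(values) / mean(values) * 100

for name, values in data_sets.items():
    print(f"Set {name}: s = {stdev(values):.2f}, "
          f"CV = {coefficient_of_variation(values):.2f}%")
```

Using `statistics.pstdev` instead would divide by n and give slightly smaller values, so the divisor should match the one used by hand.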

QUESTION 3
a) Number of classes (Sturges' rule):

$K = 1 + 3.3\log(n)$, where $n = 60$

$K = 1 + 3.3\log(60) = 1 + 3.3(1.778) = 6.87$ (about 6 to 7 classes)


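b)

$\text{Class width} = \frac{79 - 11}{6} = 11.3 \approx 11$

With a class width of 11 starting at 11, six classes end at 76, so a seventh class (77-87) is needed to include the maximum value of 79. The frequency table therefore has 7 classes.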
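Sturges' rule and the class-width calculation can be reproduced in a few lines. This sketch uses the 3.3 log₁₀(n) form from the answer, with the minimum (11) and maximum (79) taken from the class-width formula; the helper names are hypothetical. It also counts how many classes of width 11 are needed to cover every whole number from 11 to 79.

```python
import math

def sturges_classes(n):
    """Approximate number of classes by Sturges' rule: K = 1 + 3.3 log10(n)."""
    return 1 + 3.3 * math.log10(n)

def class_width(minimum, maximum, k):
    """Range divided by the number of classes."""
    return (maximum - minimum) / k

k = sturges_classes(60)            # about 6.87
width = class_width(11, 79, 6)     # about 11.33, rounded to 11 in the answer
# Number of classes of width 11 needed to cover the values 11 to 79 inclusive
classes_needed = math.ceil((79 - 11 + 1) / 11)
print(round(k, 2), round(width, 2), classes_needed)
```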

c) Frequency distribution table

A frequency table is an effective way to summarise and organise a data set.

Classes Tally Frequency

11-21 //// /// 8


22-32 //// //// //// //// //// 25

33-43 //// //// // 12

44-54 //// //// 10

55-65 // 2

66-76 // 2

77-87 / 1

Total 60
CLASSES  LOWER BOUNDARY  UPPER BOUNDARY  MIDPOINT           FREQUENCY
11-21    10.5            21.5            (11+21)/2 = 16     8
22-32    21.5            32.5            (22+32)/2 = 27     25
33-43    32.5            43.5            (33+43)/2 = 38     12
44-54    43.5            54.5            (44+54)/2 = 49     10
55-65    54.5            65.5            (55+65)/2 = 60     2
66-76    65.5            76.5            (66+76)/2 = 71     2
77-87    76.5            87.5            (77+87)/2 = 82     1
Total                                                       60

Each class boundary lies halfway between adjacent class limits, e.g. (21+22)/2 = 21.5 and (32+33)/2 = 32.5.
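The boundaries and midpoints follow mechanically from the class limits. The sketch below assumes the underlying data are whole numbers, so each boundary sits 0.5 below the lower limit or 0.5 above the upper limit; the variable names are illustrative.

```python
# Class limits and frequencies from the frequency distribution table
classes = [(11, 21), (22, 32), (33, 43), (44, 54), (55, 65), (66, 76), (77, 87)]
freqs = [8, 25, 12, 10, 2, 2, 1]

rows = []
for (lower, upper), f in zip(classes, freqs):
    lower_boundary = lower - 0.5       # halfway to the previous class's upper limit
    upper_boundary = upper + 0.5       # halfway to the next class's lower limit
    midpoint = (lower + upper) / 2     # class mark
    rows.append((f"{lower}-{upper}", lower_boundary, upper_boundary, midpoint, f))

for row in rows:
    print(*row)
print("Total", sum(freqs))
```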

d) Relative frequency distribution table:

$\text{Relative frequency} = \frac{\text{Frequency}}{\text{Total frequency}}$

Class    Frequency  Relative frequency  Percentage
11-21    8          8/60 = 0.13         0.13 × 100 = 13%
22-32    25         25/60 = 0.42        0.42 × 100 = 42%
33-43    12         12/60 = 0.20        0.20 × 100 = 20%
44-54    10         10/60 = 0.17        0.17 × 100 = 17%
55-65    2          2/60 = 0.03         0.03 × 100 = 3%
66-76    2          2/60 = 0.03         0.03 × 100 = 3%
77-87    1          1/60 = 0.02         0.02 × 100 = 2%
Total    60         1.00                100%

Cumulative frequency distribution:

Classes  Frequency  Upper boundary  Calculation  Cumulative frequency
0-10     0          10.5            0            0
11-21    8          21.5            0+8          8
22-32    25         32.5            8+25         33
33-43    12         43.5            33+12        45
44-54    10         54.5            45+10        55
55-65    2          65.5            55+2         57
66-76    2          76.5            57+2         59
77-87    1          87.5            59+1         60
Total    60
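The relative and cumulative frequency columns can likewise be generated from the class frequencies. A minimal sketch using `itertools.accumulate` for the running totals; the variable names are illustrative.

```python
from itertools import accumulate

classes = ["11-21", "22-32", "33-43", "44-54", "55-65", "66-76", "77-87"]
upper_boundaries = [21.5, 32.5, 43.5, 54.5, 65.5, 76.5, 87.5]
freqs = [8, 25, 12, 10, 2, 2, 1]
total = sum(freqs)

relative = [f / total for f in freqs]      # frequency / total frequency
cumulative = list(accumulate(freqs))       # running total of frequencies

for c, f, r, ub, cf in zip(classes, freqs, relative, upper_boundaries, cumulative):
    print(f"{c}: f={f}, relative={r:.2f} ({r:.0%}), below {ub}: {cf}")
```

The last cumulative value equals the total frequency, which is a quick check that no class was missed.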

(B)a.
B)

A bar graph would be used to present the number of cases by gender and region.

Each bar shows the number of cases for one gender in one region, so the graph makes it easy to compare the number of cases in each region between males and females.


c)

Region I has the highest number of cases, while Region III has the lowest. All regions except Region IV have more cases among females than among males.

Part 2 Online participation - OCP
