
DATA ORGANIZATION

Raw data

Tally

Tallying is a quick way of organizing raw data into suitable intervals.

SAMITH NISHSHANKA (BSC) 1


Definitions

Interval of Class marks for histogram, cumulative frequency table



Histogram

Stem-and-leaf diagram



Numerical and quantitative data

Measures of central tendency

Mean



Median

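If $x_1, x_2, x_3, \ldots, x_n$ is a set of data arranged in ascending order, the median is the value in the $\frac{1}{2}(n+1)$th place.

If $\frac{1}{2}(n+1)$ is not a whole number, so that the place falls midway between the $r$th and $(r+1)$th terms, the median is $x_r + \frac{1}{2}(x_{r+1} - x_r)$.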



Mode

The mode is the value which occurs most frequently.

Examples

1,2,2,2,2,2,3,5,8,3,7

Mode is 2
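
The counting behind this example can be checked with a short Python sketch. The function name `modes` is only illustrative (it is not from these notes), and it returns a list because a data set may have more than one most frequent value.

```python
from collections import Counter

def modes(values):
    """Return every value that attains the highest frequency, in ascending order."""
    counts = Counter(values)          # value -> number of occurrences
    top = max(counts.values())        # the highest frequency
    return sorted(v for v, c in counts.items() if c == top)

data = [1, 2, 2, 2, 2, 2, 3, 5, 8, 3, 7]
print(modes(data))  # [2]
```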

Exercises



Measures of dispersion

Range

Range = maximum value of data - minimum value of data

Example

1,5,6,8,4,6,2,3,8,7,5,9,12

Range =12-1=11

This means the data values are spread over an interval of width 11.

Inter quartile range

Inter quartile range = 3rd quartile - 1st quartile

How to find quartiles

Example

1,2,5,8,7,9,5,2,4,7,10,6,3,9,13,15

Be sure to arrange the values in ascending order.

1,2,2,3,4,5,5,6,7,7,8,9,9,10,13,15
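
$$Q_1 = \tfrac{1}{4}(n+1)\text{th term}$$
$$Q_2 = \text{Median} = \tfrac{1}{2}(n+1)\text{th term}$$
$$Q_3 = \tfrac{3}{4}(n+1)\text{th term}$$

Here $n = 16$.

Place of $Q_1$: $\tfrac{1}{4}(16+1) = 4.25$, so
$$Q_1 = 4\text{th term} + \tfrac{1}{4}(5\text{th term} - 4\text{th term}) = 3 + 0.25 = 3.25$$

Place of $Q_2$: $\tfrac{1}{2}(16+1) = 8.5$, so
$$Q_2 = 8\text{th term} + \tfrac{1}{2}(9\text{th term} - 8\text{th term}) = 6 + 0.5 = 6.5$$

Place of $Q_3$: $\tfrac{3}{4}(16+1) = 12.75$, so
$$Q_3 = 12\text{th term} + \tfrac{3}{4}(13\text{th term} - 12\text{th term}) = 9 + 0 = 9$$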

Therefore, the inter quartile range is $Q_3 - Q_1 = 9 - 3.25 = 5.75$.
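
The same working can be sketched in Python. The helper names `term_at` and `quartiles` are illustrative assumptions, not part of these notes; the sketch uses the $(n+1)$ place rule from this example with linear interpolation between neighbouring terms, and it does not guard against very small samples where the place would fall beyond the last term.

```python
def term_at(sorted_values, position):
    """Value at a 1-based, possibly fractional, place in ordered data."""
    r = int(position)              # whole part: the r-th term
    frac = position - r            # fractional part of the place
    lower = sorted_values[r - 1]   # r-th term (lists are 0-based)
    if frac == 0:
        return lower
    upper = sorted_values[r]       # (r+1)-th term
    return lower + frac * (upper - lower)

def quartiles(values):
    """Q1, Q2, Q3 taken as the k(n+1)/4 th terms, k = 1, 2, 3."""
    data = sorted(values)          # arrange in ascending order first
    n = len(data)
    return tuple(term_at(data, k * (n + 1) / 4) for k in (1, 2, 3))

data = [1, 2, 5, 8, 7, 9, 5, 2, 4, 7, 10, 6, 3, 9, 13, 15]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, q3 - q1)  # 3.25 6.5 9.0 5.75
```

Other textbooks and software packages use slightly different place rules, so their quartiles may differ a little from the values here.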

Mean deviation

Data

$x_1, x_2, x_3, \ldots, x_n$

$$\bar{x} = \frac{1}{n}\sum x$$

$$\text{Mean deviation} = \frac{1}{n}\sum |x - \bar{x}|$$
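
As a small numerical illustration of the mean deviation formula, here is a Python sketch. The function name `mean_deviation` and the data values are chosen only for illustration.

```python
def mean_deviation(values):
    """(1/n) * sum of |x - mean| over all the data."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

data = [2, 4, 6, 8, 10]          # illustrative data, mean = 6
print(mean_deviation(data))      # 2.4
```

Here the absolute deviations from the mean are 4, 2, 0, 2, 4, which add up to 12, and 12 / 5 = 2.4.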



Variance
$$v(x) = \sigma^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$$

Standard deviation

$$S_x = \sigma = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$$

An alternative form of variance and standard deviation


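$$\sigma^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{n}\sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2) = \frac{1}{n}\left(\sum x_i^2 - 2\bar{x}\sum x_i + \sum \bar{x}^2\right) = \frac{1}{n}\left(\sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2\right)$$

$$\sigma^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$$

$$\sigma = \sqrt{\frac{1}{n}\sum x_i^2 - \bar{x}^2}$$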
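
The two forms of the variance can be compared numerically with a short Python sketch. The function names are illustrative assumptions; both divide by $n$, as in the formulas of this section.

```python
import math

def variance_definition(values):
    """(1/n) * sum of (x - mean)^2."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def variance_alternative(values):
    """(1/n) * sum of x^2, minus mean^2."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # mean 5.5, sum of squares 385
print(variance_definition(data), variance_alternative(data))  # 8.25 8.25
print(round(math.sqrt(variance_alternative(data)), 3))        # 2.872
```

The alternative form is usually quicker by hand because only $\sum x_i$ and $\sum x_i^2$ are needed.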

With frequencies
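$$v(x) = \sigma^2 = \frac{\sum f(x_i - \bar{x})^2}{\sum f} \quad\text{or}\quad \sigma^2 = \frac{\sum f x_i^2}{\sum f} - \bar{x}^2$$

$$\sigma = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{\sum f}} \quad\text{or}\quad \sigma = \sqrt{\frac{\sum f x_i^2}{\sum f} - \bar{x}^2}$$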

Combining the mean and standard deviation of two sets of data

Let $\bar{x}, \sigma_x$ be the mean and standard deviation of $x_1, x_2, x_3, \ldots, x_n$ and $\bar{y}, \sigma_y$ those of $y_1, y_2, y_3, \ldots, y_m$. Let $\bar{z}, \sigma_z$ be the mean and standard deviation of the combined set $z_r$, which consists of $x_1, x_2, x_3, \ldots, x_n$ together with $y_1, y_2, y_3, \ldots, y_m$. Notation as usual.
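$\bar{x} = \frac{1}{n}\sum x$ and then $n\bar{x} = \sum x$

$\bar{y} = \frac{1}{m}\sum y$ and then $m\bar{y} = \sum y$

$$\sigma_x^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$$

$$\sigma_y^2 = \frac{1}{m}\sum (y_i - \bar{y})^2 = \frac{1}{m}\sum y_i^2 - \bar{y}^2$$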



For combined set of data,
$$\bar{z} = \frac{(x_1 + x_2 + x_3 + \cdots + x_n) + (y_1 + y_2 + y_3 + \cdots + y_m)}{n+m} = \frac{\sum x + \sum y}{n+m}$$

$$\bar{z} = \frac{n\bar{x} + m\bar{y}}{n+m}$$

$$\sigma_z^2 = \frac{1}{n+m}\sum z_i^2 - \bar{z}^2$$

$$\sum z_i^2 = x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2 + y_1^2 + y_2^2 + y_3^2 + \cdots + y_m^2 = \sum x_i^2 + \sum y_i^2$$

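Since $\sigma_x^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$, we have $\sum x_i^2 = n\sigma_x^2 + n\bar{x}^2$, and similarly $\sum y_i^2 = m\sigma_y^2 + m\bar{y}^2$. So

$$\sum x_i^2 + \sum y_i^2 = n\sigma_x^2 + n\bar{x}^2 + m\sigma_y^2 + m\bar{y}^2$$

Therefore

$$\sum z_i^2 = n\sigma_x^2 + m\sigma_y^2 + n\bar{x}^2 + m\bar{y}^2$$

$$\sigma_z^2 = \frac{1}{n+m}\left(n\sigma_x^2 + m\sigma_y^2 + n\bar{x}^2 + m\bar{y}^2\right) - \bar{z}^2$$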

A few examples of combined sets of data

1. The means of two maths classes are 55 and 70, and their standard deviations are 10 and 8 respectively. The numbers of students in these classes are 40 and 20 respectively. Find the mean and standard deviation when both classes are combined.
2. Two sets of data are given by $x_i = 1,2,3,4,5,6,7,8,9,10$ and $y_i = 2,4,6,8,10$, with $\bar{x}, \bar{y}, \sigma_x, \sigma_y$ in the usual notation. If $z$ is the combined set of the two given sets of data, find $\bar{z}$ and $\sigma_z$. If the value 7 in the set $x_i$ has to be corrected to 8, find the new $\bar{x}$ and $\sigma_x$, and the new $\bar{z}$ and $\sigma_z$.
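
The first example can be worked through with a short Python sketch. It assumes the figures are read as means 55 and 70, standard deviations 10 and 8, and class sizes 40 and 20. The function name `combine` is illustrative. The sum of squares of each set is recovered from $\sigma^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$, which gives $\sum x_i^2 = n(\sigma_x^2 + \bar{x}^2)$.

```python
import math

def combine(n, mean_x, sd_x, m, mean_y, sd_y):
    """Mean and standard deviation of two data sets taken together."""
    mean_z = (n * mean_x + m * mean_y) / (n + m)
    # sum of squares of each set, recovered from its variance and mean
    sum_sq = n * (sd_x ** 2 + mean_x ** 2) + m * (sd_y ** 2 + mean_y ** 2)
    var_z = sum_sq / (n + m) - mean_z ** 2
    return mean_z, math.sqrt(var_z)

mean_z, sd_z = combine(40, 55, 10, 20, 70, 8)
print(mean_z, round(sd_z, 3))  # 60.0 11.747
```

By hand: $\bar{z} = (40 \times 55 + 20 \times 70)/60 = 60$ and $\sigma_z^2 = 224280/60 - 60^2 = 138$. The combined standard deviation is larger than either class's own because the two class means differ.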

Linear transformation
Let $y_i$ be such that $y_i = \frac{x_i - a}{b}$.

Connection with mean

$$x_i = by_i + a$$

$$\sum x_i = \sum by_i + \sum a$$

$$n\bar{x} = bn\bar{y} + na$$

$$\bar{x} = b\bar{y} + a$$



Connection with variance and standard deviation
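$$\sigma_x^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$$

Substituting $x_i = by_i + a$ and $\bar{x} = b\bar{y} + a$,

$$\sigma_x^2 = \frac{1}{n}\sum [by_i + a - (b\bar{y} + a)]^2$$

$$\sigma_x^2 = \frac{1}{n}\sum (by_i - b\bar{y})^2 = b^2 \cdot \frac{1}{n}\sum (y_i - \bar{y})^2$$

$$\sigma_x^2 = b^2\sigma_y^2$$

$$\sigma_x = b\sigma_y \quad (\text{for } b > 0; \text{ in general } \sigma_x = |b|\sigma_y)$$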

Why use a linear transformation?

See the table given below.

x        f
175.5    8
275.5    9
375.5    11
475.5    13
575.5    7
675.5    2

Because $\bar{x} = \frac{\sum fx}{\sum f}$, calculations such as the given example are hard to work with directly. Therefore, a linear transformation is the best way to work with them.

Since $x_i = by_i + a$, $y_i = \frac{x_i - a}{b}$.

Let's take $a = 375.5$ and $b = 100$, so $y_i = \frac{x_i - 375.5}{100}$, and then complete the table again with the y values.

x        f     y_i    y_i^2    f y_i    f y_i^2
175.5    8     -2     4        -16      32
275.5    9     -1     1        -9       9
375.5    11    0      0        0        0
475.5    13    1      1        13       13
575.5    7     2      4        14       28
675.5    2     3      9        6        18

$\sum f = 50$, $\sum f y_i = 8$, $\sum f y_i^2 = 100$

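$$\bar{y} = \frac{\sum f y_i}{\sum f} = \frac{8}{50} = 0.16$$

Since $\bar{x} = b\bar{y} + a$, $\bar{x} = 100 \times 0.16 + 375.5 = 391.5$

Since $\sigma_y^2 = \frac{\sum f y_i^2}{\sum f} - \bar{y}^2$, $\sigma_y^2 = \frac{100}{50} - (0.16)^2 = 1.9744$

$\sigma_y = 1.405$

Since $\sigma_x = b\sigma_y$, $\sigma_x = 100 \times 1.405 = 140.5$

$\sigma_x = 140.5$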
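
The coded-value calculation for this table can be reproduced with a short Python sketch. The variable names are illustrative, and $a = 375.5$, $b = 100$ are the values chosen in this example. Carrying out the arithmetic gives $\sigma_y^2 = 2 - 0.0256 = 1.9744$.

```python
import math

xs = [175.5, 275.5, 375.5, 475.5, 575.5, 675.5]   # x values from the table
fs = [8, 9, 11, 13, 7, 2]                         # frequencies
a, b = 375.5, 100                                 # chosen origin and scale

ys = [(x - a) / b for x in xs]                    # coded values: -2, -1, 0, 1, 2, 3
total_f = sum(fs)                                 # 50
sum_fy = sum(f * y for f, y in zip(fs, ys))       # 8
sum_fy2 = sum(f * y * y for f, y in zip(fs, ys))  # 100

mean_y = sum_fy / total_f
var_y = sum_fy2 / total_f - mean_y ** 2
mean_x = b * mean_y + a                           # undo the transformation
sd_x = abs(b) * math.sqrt(var_y)

print(round(mean_x, 2), round(var_y, 4), round(sd_x, 1))  # 391.5 1.9744 140.5
```

Working with the small whole numbers $y_i$ keeps $\sum f y_i$ and $\sum f y_i^2$ easy to compute by hand, and the results are only converted back to $x$ at the end.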



Calculating mode, median and quartiles

x        f
10-20    5
20-30    8
30-40    9
40-50    12
50-60    7
60-70    6
70-80    3
$\sum f = 50$

The mode must be in the interval having the highest frequency. Therefore, the interval of the mode is 40-50.

The mode can be calculated using the histogram.

AGB and CGD are similar triangles.

Therefore, $\frac{EG}{AB} = \frac{GF}{DC}$,

$$\frac{x}{3} = \frac{10 - x}{5}, \quad x = 3.75$$

Therefore mode = 40 + 3.75 = 43.75
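
Solving $\frac{x}{d_1} = \frac{h - x}{d_2}$ for $x$ gives $x = h\,\frac{d_1}{d_1 + d_2}$, where $d_1$ and $d_2$ are the differences between the modal frequency and its two neighbours and $h$ is the class width. The Python sketch below applies this to the table; the function name `grouped_mode` and the symbols $d_1$, $d_2$, $h$ are assumptions introduced here, and equal class widths are assumed.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Estimate the mode of grouped data with equal class widths."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])    # index of modal class
    prev_f = freqs[i - 1] if i > 0 else 0                 # frequency of class before
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0    # frequency of class after
    d1 = freqs[i] - prev_f
    d2 = freqs[i] - next_f
    return lower_bounds[i] + width * d1 / (d1 + d2)

lower_bounds = [10, 20, 30, 40, 50, 60, 70]
freqs = [5, 8, 9, 12, 7, 6, 3]
print(grouped_mode(lower_bounds, 10, freqs))  # 43.75
```

Here $d_1 = 12 - 9 = 3$ and $d_2 = 12 - 7 = 5$, so $x = 10 \times 3/8 = 3.75$.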



Cumulative frequency charts
x        Upper bound    f     cf
10-20    x < 20         5     5
20-30    x < 30         8     13
30-40    x < 40         9     22
40-50    x < 50         12    34
50-60    x < 60         7     41
60-70    x < 70         6     47
70-80    x < 80         3     50
$\sum f = 50$

$Q_1$: $\frac{50}{4} = 12.5$th place

$Q_2 = \text{med}$: $\frac{50}{2} = 25$th place

$Q_3$: $3 \times \frac{50}{4} = 37.5$th place

The cf graph can be used to find quartiles.

Finding 𝑸𝟏

ABE and ACD are similar triangles.

Therefore, $\frac{AE}{AD} = \frac{EB}{DC}$,

$$\frac{x}{10} = \frac{7.5}{8}, \quad\text{then } x = 9.375$$

So 𝑄1 = 20 + 9.375 = 29.375
Similarly 𝑄2 , 𝑄3 can be found.



Finding 𝑸𝟐

ABE and ACD are similar triangles.

Therefore, $\frac{AE}{AD} = \frac{EB}{DC}$,

$$\frac{x}{10} = \frac{3}{12}, \quad\text{then } x = 2.5$$

So 𝑄2 = 40 + 2.5 = 42.5

Finding 𝑸𝟑

ABE and ACD are similar triangles.

Therefore, $\frac{AE}{AD} = \frac{EB}{DC}$,

$$\frac{x}{10} = \frac{3.5}{7}, \quad\text{then } x = 5$$

So $Q_3 = 50 + 5 = 55$
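
The similar-triangle argument is linear interpolation inside the class that contains the required place. A minimal Python sketch under that reading is given below; the function name `grouped_quantile` is illustrative, and the places $n/4$, $n/2$, $3n/4$ are the ones used in this section. For $Q_3$ the arithmetic is $\frac{3.5}{7} \times 10 = 5$.

```python
def grouped_quantile(boundaries, freqs, position):
    """Value at a given cumulative-frequency place, by linear interpolation.

    boundaries holds the class boundaries, one more entry than freqs.
    """
    cum = 0                                   # cumulative frequency below the class
    for i, f in enumerate(freqs):
        if cum + f >= position:               # the place falls in class i
            lower, upper = boundaries[i], boundaries[i + 1]
            return lower + (position - cum) / f * (upper - lower)
        cum += f
    raise ValueError("position exceeds the total frequency")

boundaries = [10, 20, 30, 40, 50, 60, 70, 80]
freqs = [5, 8, 9, 12, 7, 6, 3]
n = sum(freqs)                                # 50

q1 = grouped_quantile(boundaries, freqs, n / 4)       # 12.5th place
q2 = grouped_quantile(boundaries, freqs, n / 2)       # 25th place
q3 = grouped_quantile(boundaries, freqs, 3 * n / 4)   # 37.5th place
print(q1, q2, q3, q3 - q1)  # 29.375 42.5 55.0 25.625
```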

Exercises
1. The grouped data for corona-infected people in a small scheme are given below.
Age (x)         10-20   20-30   30-40   40-50   50-60   60-70
Patients (f)    2       4       8       7       5       4

I. Find the mean


II. Find the mode
III. Find the median and quartiles
IV. Find the inter quartile range
V. Find the standard deviation



2. The weights of the students of a combined maths class are given in the table below.
Weight (x)      30-35   35-40   40-45   45-50   50-55   55-60
Students (f)    10      15      25      20      18      12

I. Find the mean


II. Find the mode
III. Find the median and quartiles
IV. Find the inter quartile range
V. Find the standard deviation

3. The table given below contains the number of vehicles owned by businessmen in Colombo district.
Vehicles (x)    2-5     6-9     10-13   14-17   18-21   22-25
Owners (f)      3       5       12      10      6       4

I. Find the mean


II. Find the mode
III. Find the median and quartiles
IV. Find the inter quartile range
V. Find the standard deviation

4. The salary levels of a company are given below. Using the following data, answer all the questions.

Monthly Salary level (X) Number of workers (f)


10000-15000 40
15000-20000 60
20000-25000 80
25000-30000 90
30000-35000 120
35000-40000 100
40000-45000 90
45000-50000 90
50000-55000 80
55000-60000 50

I. Find the mean


II. Find the mode
III. Find the median and quartiles
IV. Find the inter quartile range
V. Find the standard deviation



Skewness
When the graph of frequency is symmetrical, its shape is like a bell (normal distribution)

When the graph of frequency is skewed positively

When the graph of frequency is skewed negatively

Mo=Mode, Md= Median, Me= Mean


Karl Pearson's measures of skewness are as follows. Their sign, positive or negative, indicates the direction of the skewness.

$$Sk_1 = \frac{Me - Mo}{\sigma} \qquad Sk_2 = \frac{3(Me - Md)}{\sigma}$$
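
A small Python sketch of the two coefficients follows, taking the second one in its standard form $3(\text{mean} - \text{median})/\sigma$. The function names and the input values (mean 50, median 48, mode 44, standard deviation 10) are hypothetical, chosen only to show how the sign is read.

```python
def pearson_sk1(mean, mode, sd):
    """First Pearson coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / sd

def pearson_sk2(mean, median, sd):
    """Second Pearson coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

# hypothetical summary values, not taken from any table in these notes
print(pearson_sk1(50, 44, 10), pearson_sk2(50, 48, 10))  # 0.6 0.6
```

Both values are positive here because the mean lies above the median and the mode, which is the pattern of a positively skewed distribution; a mean below them would give negative values.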
