DATA MINING:
Summary Statistics
Prof. Sherica Lavinia Menezes
Asst. Professor
Computer Engineering Department
Goa College of Engineering
SUMMARY STATISTICS
30-09-2020
AGENDA
Measures of
02 Median
04 Statistics
LEARNING OBJECTIVES
01 Define different measures of exploring data
02 Compare the mean and median
04 mean/median/range/variance of a given dataset
150 flowers, 50 each from 3 species: Iris Setosa, Iris Versicolor, Iris Virginica
5 attributes:
Sepal Length in cm
Sepal Width in cm
Petal Length in cm
Petal Width in cm
Class: {setosa, versicolor, virginica}
MODE
PERCENTILES
Can be used for ordered data
Input: attribute x of m instances and percentile p
1. Index = m*p
2. Sort dataset in ascending order
3. Is index a whole number?
   N: round up index, then $x_{p\%} = x[index]$
   Y: $x_{p\%} = (x[index] + x[index + 1])/2$
Example of Percentiles
{85, 34, 42, 51, 84, 86, 78, 85, 87, 69, 94, 74,
65, 56, 97}
{34, 42, 51, 56, 65, 69, 74, 78, 84, 85, 85, 86,
87, 94, 97}
Index = 15 * 0.8 = 12 is a whole number, so
$x_{80\%} = \frac{x[12] + x[13]}{2} = \frac{86 + 87}{2} = 86.5$
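The percentile rule above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the function name `slide_percentile` is hypothetical, the percentile is passed as a whole-number percentage (for example 80) so that the "is the index a whole number?" test uses exact integer arithmetic, and 0 < percent < 100 is assumed. Other textbooks and libraries use different interpolation rules, so their results may differ.

```python
def slide_percentile(values, percent):
    """Percentile by the index = m*p / round-up-or-average rule.

    `percent` is an integer such as 80 (an assumption of this sketch).
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percent < 100:
        raise ValueError("percent must be strictly between 0 and 100")
    xs = sorted(values)                          # ascending order
    m = len(xs)
    index, remainder = divmod(m * percent, 100)  # index = m * p
    if remainder == 0:
        # Whole-number index: average x[index] and x[index + 1] (1-based).
        return (xs[index - 1] + xs[index]) / 2
    # Otherwise round the index up and take x[index] (1-based position).
    return xs[index]


scores = [85, 34, 42, 51, 84, 86, 78, 85, 87, 69, 94, 74, 65, 56, 97]
print(slide_percentile(scores, 80))  # 86.5, matching the worked example
```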
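With the m values sorted in ascending order as $x_{(1)} \le x_{(2)} \le \dots \le x_{(m)}$:
$x_{(1)} = \min(x)$, $x_{(m)} = \max(x)$
$\text{mean}(x) = \bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i$
$\text{median}(x) = \begin{cases} x_{(r+1)} & \text{if } m \text{ is odd, i.e. } m = 2r + 1 \\ \frac{1}{2}\left(x_{(r)} + x_{(r+1)}\right) & \text{if } m \text{ is even, i.e. } m = 2r \end{cases}$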
{34, 42, 51, 56, 65, 69, 74, 78, 84, 85, 85, 86, 87, 94, 97}
$\text{Median}(x) = 78$
{2, 34, 42, 51, 56, 65, 69, 74, 78, 84, 85, 85, 86, 87, 94, 97}
$\text{Median}(x) = \frac{74 + 78}{2} = 76$
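As a quick check of the mean and median definitions, here is a small Python sketch. The function names `mean` and `median` are chosen for illustration; the two datasets are the ones from the median example.

```python
def mean(values):
    """Arithmetic mean: (1/m) * sum of x_i."""
    return sum(values) / len(values)


def median(values):
    """Median by the odd/even rule on the sorted values."""
    xs = sorted(values)
    m = len(xs)
    r = m // 2
    if m % 2 == 1:
        # m = 2r + 1: the middle value x_(r+1), which is xs[r] when 0-based.
        return xs[r]
    # m = 2r: average of x_(r) and x_(r+1), i.e. xs[r - 1] and xs[r].
    return (xs[r - 1] + xs[r]) / 2


odd_data = [34, 42, 51, 56, 65, 69, 74, 78, 84, 85, 85, 86, 87, 94, 97]
even_data = [2] + odd_data

print(median(odd_data))   # 78
print(median(even_data))  # 76.0
print(mean(odd_data), mean(even_data))
```

Adding the single small value 2 shifts the median only from 78 to 76, which is one way to start comparing how the mean and the median react to an extreme value.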
{1, 2, 34, 42, 51, 56, 65, 69, 74, 78, 94, 97}
$\mathrm{TM}@40\% = \frac{1}{8}(34 + 42 + 51 + 56 + 65 + 69 + 74 + 78) \approx 58.6$
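The trimmed-mean example can be reproduced with the sketch below. The trimming rule is an assumption inferred from the example, since the definition itself is not in the text: half of the percentage is dropped from each end, and the number of dropped values is rounded down (20% of 12 is 2.4, so 2 values per end). The function name `trimmed_mean` is hypothetical.

```python
def trimmed_mean(values, percent):
    """Mean after dropping percent/2 of the values from each end.

    Assumption: the per-end count floor(m * percent / 200) is rounded
    down, which reproduces the TM@40% example on 12 values.
    """
    xs = sorted(values)
    m = len(xs)
    k = (m * percent) // 200      # values dropped from each end
    kept = xs[k:m - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)


data = [1, 2, 34, 42, 51, 56, 65, 69, 74, 78, 94, 97]
print(trimmed_mean(data, 40))  # 58.625, shown rounded as 58.6 above
```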
$\text{variance}(x) = s_x^2 = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{x})^2$
$\text{stddev}(x) = \sqrt{\text{variance}(x)} = s_x$
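A minimal Python sketch of the variance and standard deviation formulas follows. The function names are illustrative, and the input {1, ..., 10} is used only as a small example; note the divisor m - 1 rather than m.

```python
import math


def sample_variance(values):
    """Variance with the 1/(m - 1) divisor."""
    m = len(values)
    if m < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / m
    return sum((x - x_bar) ** 2 for x in values) / (m - 1)


def sample_stddev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(sample_variance(values))


x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(sample_variance(x1), sample_stddev(x1))
```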
x1 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
x2 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 30}
range(x1) = 9, range(x2) = 29
Interquartile Range
$\mathrm{IQR}(x) = x_{75\%} - x_{25\%}$
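To compare the range with the interquartile range, the sketch below applies both to the two datasets from the range example. It assumes the percentile rule from the Percentiles slide (index = m*p, round up, or average two neighbours when the index is whole); the helper names are hypothetical, and other percentile conventions can give a different IQR.

```python
def value_range(values):
    """Range: max(x) - min(x)."""
    return max(values) - min(values)


def percentile(values, percent):
    """Percentile by the index = m*p rule; percent is an integer in 1..99."""
    xs = sorted(values)
    index, remainder = divmod(len(xs) * percent, 100)
    if remainder == 0:
        return (xs[index - 1] + xs[index]) / 2  # average x[index], x[index+1]
    return xs[index]                            # index rounded up (1-based)


def iqr(values):
    """Interquartile range: x_75% - x_25%."""
    return percentile(values, 75) - percentile(values, 25)


x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(value_range(x1), value_range(x2))  # 9 29
print(iqr(x1), iqr(x2))                  # 5 5
```

Under this rule the single large value 30 triples the range but leaves the IQR unchanged.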
Let $\bar{x} = \{\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n\}$ be the mean values of n attributes.
$\text{covariance}(x_i, x_j) = \frac{1}{m-1} \sum_{k=1}^{m} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)$
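The covariance formula can be sketched in Python as below. The layout is an assumption: each row is one of the m instances and column i holds attribute x_i. The function name `covariance` and the three-row toy dataset are made up purely for illustration.

```python
def covariance(rows, i, j):
    """Covariance of attributes i and j with the 1/(m - 1) divisor."""
    m = len(rows)
    if m < 2:
        raise ValueError("need at least two instances")
    mean_i = sum(row[i] for row in rows) / m
    mean_j = sum(row[j] for row in rows) / m
    total = sum((row[i] - mean_i) * (row[j] - mean_j) for row in rows)
    return total / (m - 1)


# Toy data (hypothetical): the second attribute is twice the first.
toy = [(1, 2), (2, 4), (3, 6)]
print(covariance(toy, 0, 1))  # 2.0
print(covariance(toy, 0, 0))  # 1.0, the variance of attribute 0
```

The covariance of an attribute with itself reduces to its variance, which is a convenient sanity check on an implementation.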
THANKS