CSE-4107
■ Transaction data
■ Behavioral data
■ Networked data
■ Missing values
■ Duplicate data
A mistake or a millionaire?
Missing values
■ Data cleaning
■ Fill in missing values, smooth noisy data, identify or remove
outliers, and resolve inconsistencies
■ Data integration
■ Integration of multiple databases, data cubes, or files
■ Data transformation
■ Normalization and aggregation
■ Data reduction
■ Obtains reduced representation in volume but produces the same
or similar analytical results
■ Data discretization
■ Part of data reduction but with particular importance, especially
for numerical data
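As a rough illustration of the cleaning and transformation tasks listed above, the sketch below removes duplicate records, fills a missing value with the mean of the known values, and applies min-max normalization. The records, the field name `income`, and the helper names are hypothetical; mean imputation and min-max scaling are only one common choice for each step.

```python
from statistics import mean

# Hypothetical records (illustrative values only); None marks a missing value.
records = [
    {"id": 1, "income": 30000.0},
    {"id": 2, "income": None},
    {"id": 3, "income": 50000.0},
    {"id": 3, "income": 50000.0},  # exact duplicate of the previous record
    {"id": 4, "income": 70000.0},
]

def remove_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fill_missing(rows, field):
    """Replace missing values of `field` with the mean of the known values."""
    fill = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def min_max_normalize(rows, field):
    """Rescale `field` to [0, 1]; assumes the values are not all equal."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    return [{**r, field: (r[field] - lo) / (hi - lo)} for r in rows]

cleaned = min_max_normalize(
    fill_missing(remove_duplicates(records), "income"), "income"
)
normalized_incomes = [r["income"] for r in cleaned]
print(normalized_incomes)
```

Duplicates are removed before imputation so that a repeated record does not pull the fill value toward itself.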
Population vs. Sample
■ Example I
- Data: 8, 4, 2, 6, 10
■ Example II
– Sample: 10 trees randomly selected from Battle Park
– Diameter (inches):
9.8, 10.2, 10.1, 14.5, 17.5, 13.9, 20.0, 15.5, 7.8, 24.5
Here, population is the weighting factor and the average income is the
variable of interest
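A weighted mean multiplies each value by its weight, sums the products, and divides by the total weight. The sketch below assumes three hypothetical regions; the income and population figures are made up purely to show the arithmetic.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical regions: average income (variable of interest)
# and population (weighting factor).
avg_income = [20000, 30000, 40000]
population = [1000, 1000, 3000]

overall = weighted_mean(avg_income, population)
simple = sum(avg_income) / len(avg_income)
print(overall, simple)
```

With these illustrative numbers the weighted mean (34000) is higher than the simple mean (30000), because the most populous region also has the highest average income.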
■ Example I
■ Data: 8, 4, 2, 6, 10 (mean: 6)
■ Ordered data: 2, 4, 6, 8, 10 (median: 6)
■ Example II
– Sample: 10 trees randomly selected from Battle Park
– Diameter (inches):
9.8, 10.2, 10.1, 14.5, 17.5, 13.9, 20.0, 15.5, 7.8, 24.5
(mean: 14.38)
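The two examples can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are illustrative.

```python
from statistics import mean, median

example_1 = [8, 4, 2, 6, 10]
trees = [9.8, 10.2, 10.1, 14.5, 17.5, 13.9, 20.0, 15.5, 7.8, 24.5]

mean_1, median_1 = mean(example_1), median(example_1)

# Round the floating-point results to two decimals for display.
mean_trees = round(mean(trees), 2)
median_trees = round(median(trees), 2)

print(mean_1, median_1)
print(mean_trees, median_trees)
```

For the tree sample there is an even number of observations, so the median is the average of the two middle values of the ordered data (13.9 and 14.5), which is 14.2.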
Monthly rent (Rs)    Number of Libraries (f)
500-1000             5
1000-1500            10
1500-2000            8
2000-2500            16
2500-3000            14
3000 & Above         12
Total                65

L1 = lower bound of modal class
f1 = frequency of modal class
f0 = frequency of previous class
f2 = frequency of next class
i = class interval
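The formula itself did not survive in the text, so the sketch below assumes the standard grouped-data formula that matches the symbols defined above: Mode = L1 + (f1 - f0) / (2·f1 - f0 - f2) × i. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_bounds, freqs, width):
    """Estimate the mode of grouped data with equal class widths."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # index of modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                  # previous class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0     # next class
    return lower_bounds[k] + (f1 - f0) * width / (2 * f1 - f0 - f2)

# Classes from the rent table; the last class is open-ended ("3000 & Above").
lower_bounds = [500, 1000, 1500, 2000, 2500, 3000]
freqs = [5, 10, 8, 16, 14, 12]

mode = grouped_mode(lower_bounds, freqs, width=500)
print(mode)
```

Here the modal class is 2000-2500 (f1 = 16, f0 = 8, f2 = 14, i = 500), so the estimate is 2000 + (8 / 10) × 500 = 2400.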
Mode
Variance
Coefficient of Variation
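As a small worked example of these measures, the sketch below uses the Example I data (8, 4, 2, 6, 10) and the standard definitions: variance is the average squared deviation from the mean (dividing by n for a population, by n - 1 for a sample), and the coefficient of variation is the standard deviation divided by the mean, expressed as a percentage. Using the sample standard deviation for the CV is an assumption; the slides do not say which version is intended.

```python
from statistics import mean, pvariance, stdev, variance

data = [8, 4, 2, 6, 10]

pop_var = pvariance(data)   # sum of squared deviations divided by n
samp_var = variance(data)   # sum of squared deviations divided by n - 1

# Coefficient of variation: spread relative to the mean, in percent.
cv_percent = stdev(data) / mean(data) * 100

print(pop_var, samp_var, round(cv_percent, 1))
```

The squared deviations from the mean of 6 are 4, 4, 16, 0, 16 (total 40), giving a population variance of 8 and a sample variance of 10.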
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
■ The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
■ Q2 is the same as the median (50% are smaller, 50%
are larger)
■ Only 25% of the observations are greater than the
third quartile
4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12
Lower Quartile = 5½    Median = 8    Upper Quartile = 9
Inter-Quartile Range = 9 - 5½ = 3½
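The values above are consistent with the "median of each half" method: sort the data, take the median, then take the median of the lower half and of the upper half (leaving out the middle value when the number of observations is odd). A minimal sketch of that method follows; the function name `quartiles` is illustrative.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For an odd count, the middle value belongs to neither half.
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    return median(lower), median(values), median(upper)

q1, q2, q3 = quartiles([4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12])
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Other quartile conventions exist (for example, the interpolation methods in `statistics.quantiles`) and can give slightly different values for small data sets.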
Lower Quartile = 4    Median = 8    Upper Quartile = 10
Inter-Quartile Range = 10 - 4 = 6
[Box plot of the data 4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12 on an axis from 4 to 12: lower quartile = 5½, median = 8, upper quartile = 9]
[Box plot on an axis from 3 to 15: lower quartile = 4, median = 8, upper quartile = 10]
137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186
Lower Quartile = 158    Median = 171    Upper Quartile = 180
[Box plots comparing the heights of boys and girls]
1. The boys are taller on average.
5 10 10 15 15 15 15 20 20 20 25 30 30 40 40 45 60 60 65 85