Professional Documents
Culture Documents
Statistical+Learning+ +Measures+of+Data
Statistical+Learning+ +Measures+of+Data
1. Raw Data
2. Frequency Distribution - Histograms
3. Cumulative Frequency Distribution
4. Measures of Central Tendency
5. Mean, Median, Mode
6. Measures of Dispersion
srashti1993tripathi@gmail.com
A62KUCY7WL
7. Range, IQR, Standard Deviation,coefficient of
variation
8. Normal distribution, Chebyshev Rule.
9. Five number summary, boxplots, QQ plots, Quantile
plot, scatter plot.
10.Visualization: scatter plot matrix.
11.Correlation analysis
srashti1993tripathi@gmail.com
A62KUCY7WL
Histogram depicts the pattern of the distribution emerging from the characteristic
being measured.
15 12
10
Fre quency
srashti1993tripathi@gmail.com
7
A62KUCY7WL
5 2 3
1
0
8-11 11-14 14-17 17-20 20-23
Classes
Total 25 1.00
30
24 25
srashti1993tripathi@gmail.com
A62KUCY7WL
Cumulative 20 21
Frequency 10 9
0 2
11 14 17 20 23
Torque(less than
value)
X = Arithmetic Mean
X
srashti1993tripathi@gmail.com
A62KUCY7WL
X=
∑
Applying the formula
n
We get mean = (565+570+572+568+585)/5 =572
45 40 60 80 90 65 55
90 80 65 60 55 45 40
srashti1993tripathi@gmail.com
A62KUCY7WL
Cannot be treated Cannot be treated
Can be treated algebraically. That is, algebraically. That is,
algebraically. That is, Medians of several Modes of several
Means of several groups groups cannot be groups cannot be
can be combined. combined. combined.
Range = = 18-9=9
Data Set: 12, 14, 11, 18, 10.5, 12, 14, 11, 9
Arranging in ascending order, the data set becomes
9, 10.5, 11, 11, 12, 12, 14, 14, 18
srashti1993tripathi@gmail.com
A62KUCY7WL
IQR=Q3-Q1=14-10.75=3.25
srashti1993tripathi@gmail.com
A62KUCY7WL
srashti1993tripathi@gmail.com
A62KUCY7WL
srashti1993tripathi@gmail.com
A62KUCY7WL
68%
μ
μ ± 1σ
95% 99.7%
μ ± 2σ μ ± 3σ
The five numbers that help describe the center, spread and shape
of data are:
▪ Xsmallest
srashti1993tripathi@gmail.com
A62KUCY7WL
▪ First Quartile (Q1)
▪ Median (Q2)
▪ Third Quartile (Q3)
▪ Xlargest
srashti1993tripathi@gmail.com
A62KUCY7WL
Q1 Q2 Q3 Q 1 Q2 Q3 Q1 Q 2 Q3
> ≈ <
Xlargest – Q3 Xlargest – Q3 Xlargest – Q3
> ≈ <
srashti1993tripathi@gmail.com
A62KUCY7WL
Example:
25% of data 25% 25% 25% of data
of data of data
Xsmallest Xlargest
Q1 Median Q3
40
This file is meant for personal use by srashti1993tripathi@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Five Number Summary:
Shape of Boxplots
• If data are symmetric around the median then the box
and central line are centered between the endpoints
srashti1993tripathi@gmail.com
A62KUCY7WL
Xsmallest Xlargest
Q1 Median Q3
• A Boxplot can be shown in either a vertical or
horizontal orientation
srashti1993tripathi@gmail.com
A62KUCY7WL
Q1 Q2 Q3 Q 1 Q2 Q3 Q 1 Q2 Q3
5
The data are right skewed in the following plot
srashti1993tripathi@gmail.com
A62KUCY7WL
00 22 33 55 2 27 7
0 5 10 15 20 25 30
Displays all of the data (allowing the user to assess both the
overall behavior and unusual occurrences)
Plots quantile information
For a data xi data sorted in increasing order, fi indicates that
approximately 100 fi% of the data are below or equal to the
value xi
srashti1993tripathi@gmail.com
A62KUCY7WL
srashti1993tripathi@gmail.com
A62KUCY7WL
srashti1993tripathi@gmail.com
A62KUCY7WL
srashti1993tripathi@gmail.com
A62KUCY7WL
For a data set of m dimensions, create m windows on the screen, one for
each dimension
The m dimension values of a record are mapped to m pixels at the
corresponding positions in the windows
The colors of the pixels reflect the corresponding values
srashti1993tripathi@gmail.com
A62KUCY7WL
(a) Income (b) Credit Limit (c) transaction volume (d) age
51
This file is meant for personal use by srashti1993tripathi@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Scatterplot Matrices
srashti1993tripathi@gmail.com
A62KUCY7WL
srashti1993tripathi@gmail.com
A62KUCY7WL
• Histograms
• Measures of central tendency: mean, mode, median
• Measures of dispersion: range, IQR, variance, std deviation,
coefficient of variation.
• Normal distribution, Chebyshev Rule.
• Five number summary, boxplots, QQ plots, Quantile plot,
scatter plot.
srashti1993tripathi@gmail.com
A62KUCY7WL