Professional Documents
Culture Documents
1
Classification :: Statistics
• Categorical Data
– The Objects are grouped into categories based on some Qualitative Trait
– The resultant data are merely labels or categories
– Example:
• Hair Color: Brown / Black / Red
• Smoking Status: Favor / Neutral / Against
• Measurement Data
– The Objects are “measured” on some Quantitative Trait
– The resultant data is a set of numbers
– Example:
• Age of the Students
• JEMAT Score
• Number of Students Not Attending Class
2
Categorical Data
• Nominal Data
– A type of categorical data in which numbers act as a label without having
any specific meaning
– Example:
• Male : 1
• Female : 2
• Ordinal Data
– A type of categorical data in which numbers act as an guide to the level of
importance of the object
– Example:
• Mild
• Moderate
• Severe
3
Measurement Data
• Discrete Data
– Only Certain Values are Possible
– There are gaps between the possible value
– Are generated through the process of Counting
– Example:
• Number of students in the class
• Number of Employees Absent from Work
• Continuous Data
– Any value within an interval is possible with a suitable measuring device
– Theoretically, the number can be accurate to any desired number of
decimal places
– Are generated through the process of Measurement
– Example:
• Height in cm
• Time to complete the assignment
4
Classification :: Scaling Theory
• Nominal Data ORDER DISTANCE ORIGIN
– A type of categorical data in which numbers act as a label without having
any specific meaning
– Example:
• Male : 1
• Female : 2
• Ordinal Data
– A type of categorical data in which numbers act as an guide to the level of
importance of the object
– Example:
• Mild
• Moderate
• Severe
7
Some Points to Remember
• Tend to use Interval Scales
• Data need not be comparable with other studies
• Data has to make sense in your context
• Students fail to understand the importance of Data
– Wrong Approach
• “Data Collect Kore Niyechi… Ebar Ki Kori”
– Right Approach
• “Amar Ki Data Dorkar? Kano Daokar? Kothay Pabo? Kibhabe
Analyse Kore Uttor Pabo”
8
Descriptive Statistics
:: A Quick Review
9
Measures of Central Tendency
• Central tendency is “loosely” defined as the concept of
location of the center of a distribution of data
• Three basic measures
– Arithmetic Mean
– Median
– Mode
10
Arithmetic Mean
• Advantages:
– Easy to Compute
– Affected by every value in the set of observations
– Defined by rigid mathematical formulation
– It is relatively reliable
– It represents the “center of gravity” of the data
• Disadvantages:
– Unduly affected by small and / or large values
– Cannot be calculated for data with open ended class
– Is a good measure only when the distribution is fairly symmetric
11
Median
• Advantages
– Refers to the “Middle Value” of the distribution
– It is a “positional measure”
– Useful in case of open ended class
– Not seriously affected by Extreme Values
– Most appropriate for dealing with Qualitative Rank Data
– Has a series of related positional measures like Quartiles, Deciles,
Percentiles
• Disadvantages:
– It does not take every value into consideration
– It is not capable of algebraic treatment
– It is erratic if the number of items are smalle
12
Mode
• Advantages:
– It is the most typical or representative value of a distribution
– Not unduly affected by extreme values
– It can be used to describe qualitative phenomenon
• Disadvantages:
– Mode may not be there in a distribution or may be present more
than once in a distribution
– Not capable of algebraic treatment
– It is not rigidly defined for calculation
13
Relation Between the 3 Measures
• In moderately skewed distribution:
Mode = 3 Median – 2 Mean
14
Measures of Dispersion
• Dispersion is defined as the degree to which data tends to
spread about a central value
• Four Absolute & Relative Measures
– Range Coefficient of Range
– Quartile Deviation Coefficient of Quartile Deviation
– Mean Absolute Deviation Coefficient of MAD
– Standard Deviation Coefficient of Variation
15
Range
• Range
• Advantages
– Simplest to understand and compute
• Disadvantages:
– Not based on each and every item in the data
– Does not take into account the shape of distribution
– Cannot be computed in case of open ended classes
16
Quartile Deviation
• Inter Quartile Range (IQR)
• Coefficient of QD
17
Quartile Deviation
• Advantages:
– Can measure variation in open ended distributions
– It is extremely useful in case of erratic or badly skewed data
– It is not affected by extreme values
• Disadvantages:
– Ignores 50% of the data
– Is not capable of mathematical manipulation
– Is not considered as a measure of dispersion:
• Effectively shows the distance between two positional points
18
Mean Absolute Deviation
• Mean Absolute Deviation (MAD) defined as:
19
Standard Deviation
• Defined as “Root Mean Squared Deviation from Mean”
• Coefficient of Variation
20
Standard Deviation
• Advantages:
– Best Measure of Dispersion
– Possible to calculate the combined standard deviation of two or
more groups
– Chebycheff’s Theorem (1821-1894)
• What so ever be the distribution at least 75% of the values will fall
within +/- 2 sd from the mean of the distribution and at least 89% will
fall within +/- 3 sd from the mean of the distribution
– Has relation with other measures:
• QD = 0.667 SD
• MD = 0.80 SD
21
Skewness
• Refers to the asymmetry in the shape of the distribution
22
Skewness - Measures
• Karl Pearson’s Measure of Skewness:
Mean – Mode OR
3(Mean – Median)
Standard Deviation Standard Deviation
- Skewness coefficient > 0 is positively skewed
- Skewness coefficient < 0 is negatively skewed
- Skewness coefficient = 0 is symmetrical
• Bowley’s Measure
• Moments Measure
23
Kurtosis
• Kurtosis means “Bulginess”
• Refers to the degree of flatness or peaked-ness in the
region about the mode of the distribution:
– Lepto-Kurtic : If the curve is more peaked than Normal Curve
– Meso-Kurtic : If the curve is the same as the Normal Curve
– Platy-Kurtic : If the curve is less peaked than Normal Curve
24
KURTOSIS - Measures
• Kurtosis
Excess Kurtosis
Kurtosis
25
Interpretation
• A normal distribution has kurtosis exactly 3 (excess
kurtosis exactly 0). Any distribution with kurtosis ≈3
(excess ≈0) is called mesokurtic.
• A distribution with kurtosis <3 (excess kurtosis <0) is
called platykurtic. Compared to a normal distribution, its
central peak is lower and broader, and its tails are shorter
and thinner.
• A distribution with kurtosis >3 (excess kurtosis >0) is
called leptokurtic. Compared to a normal distribution, its
central peak is higher and sharper, and its tails are longer
and fatter.
Kurtosis: Leptokurtic
Kurtosis: Mesokurtic
Kurtosis: Platykurtic
Uses of Skewness and Kurtosis
• Most stock prices and asset returns are positive or
negative skew. Skewed data can be used to determine
whether a given or future data point can be more or less
than the mean. Basically related to asymmetries (or
risks) in information. Higher risks lead to higher
returns
• Kurtosis is used to describe volatility around the mean.
For example, if past data yields leptokurtic distribution,
the stock will have a relatively low amount of variance.
This further implies the return values are close to the
mean hence less volatile. Platykurtic distribution
expect more volatilty (or losses ) in the future.
What is Descriptive Statistics?
• The following Needs to Be Reported:
– Arithmetic Mean
– Median
– Mode
– Standard Deviation
– Variance
– Kurtosis
– Skewness
– Range
– Minimum
– Maximum
– Sum
– Count
31