• Types:
– descriptive statistics
– inferential statistics
Source: Statistics And Probability Tutorial | Statistics And Probability for Data Science | Edureka
Descriptive Statistics: Central Tendency
• Mean: the sum of all values divided by the number of values.
Median:
• describes the center of the data set when the data is ordered
by value.
• If two numbers are in the middle then the median is the
average of the two.
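The median rule above can be sketched in a few lines of Python. The helper name `median_of` and the sample values are hypothetical, chosen only for illustration; the standard-library `statistics.median` follows the same rule and is used here as a cross-check.

```python
import statistics

def median_of(values):
    """Return the middle value of the data once it is ordered by value."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: two values share the middle, so average them.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [7, 1, 5]       # ordered: 1, 5, 7
even_sample = [7, 1, 5, 3]   # ordered: 1, 3, 5, 7

odd_median = median_of(odd_sample)
even_median = median_of(even_sample)
print(odd_median, even_median)
```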
Descriptive Statistics: Central Tendency
Mode:
This is the most commonly occurring value in the dataset.
• The mode is robust to outliers as well.
• In the normal distribution the mean = median = mode.
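As a rough illustration of robustness to outliers, the sketch below compares the three measures of central tendency on a small made-up data set before and after one extreme value is appended. The data values are assumptions for demonstration only.

```python
import statistics

# Hypothetical data; 100 is an artificial outlier added for the comparison.
data = [2, 3, 3, 3, 4, 5]
with_outlier = data + [100]

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

mode_before = statistics.mode(data)
mode_after = statistics.mode(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("mode:  ", mode_before, "->", mode_after)
```

In this example the mean is pulled strongly toward the outlier, while the median and the mode stay at 3.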
http://www.syque.com/quality_tools/tools/Tools63.htm
https://www.slideshare.net/indramani332211/measures-of-central-tendency-and-dispersion
https://bjcvs.org/article/3002/en-US/operating-with-data---statistics-for-the-cardiovascular-surgeon--part-iii--comparing-groups
Descriptive Statistics: Variability (Spread)
Variance: This is the sum of the squared distances of each
measurement from the mean, divided by a constant (the number of
measurements n for a population, or n − 1 for a sample). The
larger the variance, the more spread out the data are.
Standard deviation (SD): This is the square root of variance. It
is more commonly reported than variance because it is on the
same scale as the measurements themselves. For example, if the
measurements are in inches, then variance is in square inches
while the SD is in inches as well.
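The definitions of variance and standard deviation can be worked through by hand as follows. This is a minimal sketch: the measurements in `heights_in` are hypothetical, and the choice of divisor (n for a population, n − 1 for a sample) is the usual convention rather than something fixed by the text above.

```python
import math
import statistics

# Hypothetical measurements in inches.
heights_in = [60.0, 62.0, 64.0, 66.0, 68.0]

n = len(heights_in)
mean = sum(heights_in) / n

# Sum of squared distances of each measurement from the mean.
sum_sq = sum((x - mean) ** 2 for x in heights_in)

pop_var = sum_sq / n            # population variance, in square inches
sample_var = sum_sq / (n - 1)   # sample variance, in square inches

pop_sd = math.sqrt(pop_var)         # back in inches
sample_sd = math.sqrt(sample_var)   # back in inches

print(pop_var, sample_var, pop_sd, sample_sd)
```

The standard-library functions `statistics.pstdev` and `statistics.stdev` compute the population and sample standard deviations directly and should agree with the hand calculation.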
https://web.stanford.edu/~kjytay/courses/stats32-aut2018/Session%202/Summary%20Statistics.pdf
Descriptive Statistics: Variability (Spread)
• To get back to the original units, we take the square root of the
variance: this is called the standard deviation and is signified
by σ for a population and s for a sample.
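• For a population, the formula for the standard deviation is:
$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$, where $\mu$ is the population mean and $N$ is the population size.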
Outliers
• There is no absolute agreement among statisticians about
how to define outliers, but nearly everyone agrees that it is
important that they be identified and that appropriate
analytical techniques be used for data sets that contain
outliers.
• Basically, an outlier is a data point or observation whose value
is quite different from the others in the data set being
analyzed.
Descriptive Statistics: Variability (Spread)
• Interquartile range (IQR): This is the 3rd quartile minus the 1st
quartile.
• Note: The first 2 measures (Variance & SD) can be heavily
influenced by outliers, while the third (Interquartile range) is
robust to them.
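The contrast between the IQR and the standard deviation can be sketched as below. The data sets are made up, and the `iqr` helper is hypothetical. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" method of `statistics.quantiles` (Python 3.8+), so other tools may report slightly different quartiles for the same data.

```python
import statistics

def iqr(values):
    """Interquartile range: 3rd quartile minus 1st quartile."""
    # Assumption: the 'inclusive' quartile convention; others exist.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical data: the second set replaces 9 with an extreme value.
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]

iqr_clean = iqr(clean)
iqr_outlier = iqr(with_outlier)
sd_clean = statistics.stdev(clean)
sd_outlier = statistics.stdev(with_outlier)

print("IQR:", iqr_clean, "->", iqr_outlier)
print("SD: ", sd_clean, "->", sd_outlier)
```

Here the IQR is unchanged by the extreme value, while the standard deviation grows by roughly two orders of magnitude.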
https://web.stanford.edu/~kjytay/courses/stats32-aut2018/Session%202/Summary%20Statistics.pdf
https://www.slideshare.net/Sazedur92/measures-of-dispersion-73562437
Summary:
• “True” value, error, and uncertainty
https://www.iso.org/sites/JCGM/GUM/JCGM100/C045315e-html/C045315e_FILES/MAIN_C045315e/AD_e.html
• Fourth Chapter: Descriptive Statistics and Graphics
https://www.slideshare.net/HardikAgarwal3/applications-of-central-tendency
Standard deviation in the Normal distribution
http://www.syque.com/quality_tools/tools/Tools63.htm