Descriptive Statistics Slides PDF
Data Analytics
Section 1
• What is statistics?
• The study of data and its relationship with probability
• Data
• Mean
• Median
• Mode
Data
• One of the goals of business statistics is to transform raw data
into actionable information
• What is raw data?
• A list of values that correspond to physical events
• Roll a die 100 times and record which number it lands on
• Daily sales of milk at your local store
• Thickness of tire wall manufactured at a factory
• Whom a group of people voted for in an election
• It can be very difficult to understand a large list of numbers
• We will learn to summarize data
Mean
Median
• To find the median, sort the data and find the middle value
• Example: Let’s find the median of the milk sale data
• First sort the data
• 565, 570, 572, 568, 585
• 565, 568, 570, 572, 585
• The median is 570
• If we have one more day of data
• 565, 570, 572, 568, 585, 580
• 565, 568, 570, 572, 580, 585
• The median is somewhere between 570 and 572
• The convention is to define the median as the number halfway
between (571)
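The sort-and-take-the-middle rule can be sketched in Python. The helper name `median` below is our own choice for illustration and is not from the slides; Python's built-in `statistics.median` follows the same halfway convention.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value
        return ordered[mid]
    # Even count: take the number halfway between the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


five_days = [565, 570, 572, 568, 585]
six_days = five_days + [580]

print(median(five_days))  # 570
print(median(six_days))   # 571.0
```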
Median
• Suppose the store closed early on the first day and only sold 100
units of milk
• 100, 570, 572, 568, 585
• 100, 568, 570, 572, 585
• The median is still 570!
• Even though there is an outlier in the data, the median stays
the same
• In general, the median is far less affected by outliers than the mean
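As a quick check of the early-closing example, the following sketch compares the median of the two weeks using Python's standard `statistics` module. The variable names are ours.

```python
import statistics

normal_week = [565, 570, 572, 568, 585]
early_close = [100, 570, 572, 568, 585]  # first day replaced by the 100-unit outlier

median_normal = statistics.median(normal_week)
median_outlier = statistics.median(early_close)

# Both weeks have the same middle value after sorting
print(median_normal, median_outlier)  # 570 570
```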
Skewness
• By comparing the median to the mean we can get a good idea
of the asymmetry of the data
• If the data is symmetric, the median and the mean are equal, so a median close to the mean suggests roughly symmetric data
• Suppose the mean is much lower than the median
• Then the data points less than the median tend to be further
from the median than the data points above the median
• Then the data is said to be skewed to the left
• The opposite is true if the mean is much higher than the
median
Skewness
• Let’s look at the milk data when the store closed early
• 100, 570, 572, 568, 585
• The mean is 479
• The median is 570
• The day with only 100 sales skewed the data to the left
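The mean-versus-median comparison can be sketched as below. `skew_direction` is a hypothetical helper written for this illustration; it applies only the rough rule of thumb from the slides and is not a formal skewness statistic.

```python
import statistics


def skew_direction(values):
    """Rule of thumb: mean below median suggests left skew, mean above suggests right skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean < median:
        return "left"
    if mean > median:
        return "right"
    return "no skew indicated"


early_close = [100, 570, 572, 568, 585]
mean_sales = statistics.mean(early_close)
median_sales = statistics.median(early_close)

print(mean_sales, median_sales)      # 479 570
print(skew_direction(early_close))   # left
```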
Mode
• Interquartile Range
• 5-Number Summary
• Variance
• Standard Deviation
Dispersion
• To understand standard deviation we will first define variance, and then use that to
get standard deviation
• $\mathrm{Var}(X) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
• $\mathrm{Sd}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}$
Standard Deviation
• Variance measures the average squared distance from the mean
– Why $n - 1$ instead of $n$?
• By taking the square root of this we get something like the average distance
from the mean
– $\sqrt{a + b} \neq \sqrt{a} + \sqrt{b}$
• If we want the average distance, why not calculate
– $\frac{1}{n} \sum_{i=1}^{n} |X_i - \bar{X}|$
– This is called the mean absolute deviation
• We will see that standard deviation is mathematically much more convenient
when we get to probability.
• The two measures usually give similar values.
Standard Deviation
• Let’s go back to the milk sales example
• 565, 570, 572, 568, 585
• The mean is 572
• To calculate variance we first subtract the mean from each data point
and square the difference
• $(-7)^2, (-2)^2, 0^2, (-4)^2, 13^2$
• Then we add all these up and divide by 5-1=4
• $\mathrm{Var}(X) = \frac{1}{4}(49 + 4 + 0 + 16 + 169) = 59.5$
• The standard deviation is the square root of this
– $\mathrm{Sd}(X) = \sqrt{59.5} \approx 7.71$
• Mean absolute deviation is 5.2
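The milk sales calculation can be reproduced step by step. This is a minimal sketch with variable names of our own choosing; it divides the squared differences by n - 1 for the variance and the absolute differences by n for the mean absolute deviation, matching the numbers on the slide.

```python
import math

sales = [565, 570, 572, 568, 585]
n = len(sales)
mean = sum(sales) / n

# Subtract the mean from each data point and square the difference
squared_diffs = [(x - mean) ** 2 for x in sales]

# Sample variance: add the squared differences and divide by n - 1
variance = sum(squared_diffs) / (n - 1)

# Standard deviation: square root of the variance
sd = math.sqrt(variance)

# Mean absolute deviation: average absolute distance from the mean (divides by n)
mad = sum(abs(x - mean) for x in sales) / n

print(mean)          # 572.0
print(variance)      # 59.5
print(round(sd, 2))  # 7.71
print(round(mad, 2)) # 5.2
```

The standard library's `statistics.variance` and `statistics.stdev` also use the n - 1 divisor, so they should agree with the hand calculation.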
Dan Mitchell
Section 4
VISUALIZING DATA
Outline
• Histogram
• Box Plot
Histogram
• A histogram is a graph that plots the relative
frequency of data
• 10.61 12.18 11.73 11.28 4.43 9.83 10.13 9.48 9.27
5.95 8.74 9.96 10.73 13.26 10.95 11.13 13.42 9.13
8.77 6.77 9.53 7.09 9.48 11.23 8.35 12.26 6.47
12.45 10.82 2.50 14.26 11.56 10.76 10.98 7.52
10.38 10.35 15.93 8.40 10.57 7.88 12.43 8.95
10.10 12.10 11.13 7.18 10.77 11.54 8.03
• Almost no two data points are the same (only 9.48 and 11.13 appear twice)
• Let’s group the data into a few bins
Histogram
• There is 1 number less than 4
• There are 2 numbers between 4 – 6
• There are 6 numbers between 6 – 8
• There are 13 numbers between 8 – 10
• …
• To make a histogram we group data together like this and then
make a bar chart
• The width of the bars represents the range of the groups
• The heights of the bars represent how many data points are
in that group
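The grouping step can be sketched as follows. Two assumptions are ours rather than the slides': the bins are 2 units wide and start at multiples of 2, and the run-together value in the data listing is read as the two numbers 15.93 and 8.40. The text bar chart is only a stand-in for a plotted histogram.

```python
from collections import Counter

data = [
    10.61, 12.18, 11.73, 11.28, 4.43, 9.83, 10.13, 9.48, 9.27,
    5.95, 8.74, 9.96, 10.73, 13.26, 10.95, 11.13, 13.42, 9.13,
    8.77, 6.77, 9.53, 7.09, 9.48, 11.23, 8.35, 12.26, 6.47,
    12.45, 10.82, 2.50, 14.26, 11.56, 10.76, 10.98, 7.52,
    10.38, 10.35, 15.93, 8.40, 10.57, 7.88, 12.43, 8.95,
    10.10, 12.10, 11.13, 7.18, 10.77, 11.54, 8.03,
]

bin_width = 2  # assumed bin width, matching the 4-6, 6-8, 8-10 groups

# Map each value to the lower edge of its bin, then count values per bin
counts = Counter(int(x // bin_width) * bin_width for x in data)

# One row per bin: the bar length is the number of data points in that bin
for start in sorted(counts):
    print(f"{start:>2} - {start + bin_width:<2} {'#' * counts[start]}")
```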
Histogram