
Statistical Lingo

You may have come to this
presentation because you
really like statistics, but there’s
also the possibility that you’d
rather be somewhere else…
…like maybe playing golf at a
fancy resort or something?
The irony is that sports probably
refers to statistics more than any
other segment of our society.

Statistical Lingo
Virtually everyone has a pretty good
understanding of what is meant by
the word “average”.
A golfer who shot rounds of 78, 84,
and 87 could compute her average,
or what statisticians would call her
“mean” (μ or x̄).
NOTE: In general, a Greek letter is
used if an entire population’s data is
being checked…
…but in the case of a sample, the
regular letter is used.
[Figure: the scores 78, 84, and 87 marked on a number line running from 75 to 90]
Statistical Lingo
The golfer who shot rounds of 78, 84, and 87 could compute her average, or “mean” (μ or x̄), like this:

  78 + 84 + 87 = 249
  249 / 3 = 83
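The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation on the slide; the variable names are made up for the example.

```python
# Scores from the golfer example.
scores = [78, 84, 87]

# The mean is the sum of the values divided by how many values there are.
total = sum(scores)
mean = total / len(scores)

print(total)  # 249
print(mean)   # 83.0
```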

Statistical Lingo
Each of these scores deviates from the average (83) by some amount. These deviations can be combined to calculate what is called a “standard” deviation.

  Score   Deviation from 83
  78      -5
  84      +1
  87      +4

Statistical Lingo
But if we want to calculate the “standard deviation” we can’t simply add the deviations up – they’ll cancel each other out and we’ll get zero. On the other hand, squaring the deviations will prevent that problem.

  Score   Deviation   Squared
  78      -5          25
  84      +1          1
  87      +4          16

Statistical Lingo
Then we can add the squares up – this helps to get an estimate of how much variation is present. (The concept of adding up squares of differences like this is called “the sum of the squares”.)

  25 + 1 + 16 = 42
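As a rough sketch, the deviations, their squares, and the sum of the squares can be reproduced in Python. The variable names are illustrative only.

```python
scores = [78, 84, 87]
mean = sum(scores) / len(scores)  # 83.0

# How far each score sits from the mean.
deviations = [x - mean for x in scores]  # [-5.0, 1.0, 4.0]

# Added up as they are, the deviations cancel out.
print(sum(deviations))  # 0.0

# Squaring removes the signs, so the total no longer cancels.
squared = [d ** 2 for d in deviations]  # [25.0, 1.0, 16.0]
sum_of_squares = sum(squared)

print(sum_of_squares)  # 42.0
```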

Statistical Lingo
Then we divide this sum by the number of scores in the list (N) minus 1. (This is because we only have a sample of all this person’s golf scores – if we had all of their golf scores we would simply divide by N.)

  42 / 2 = 21

Statistical Lingo
If we just leave it like this, it’s called the variance (σ² or s²). If we take the square root (which cancels out the fact that we squared the deviations earlier) we’ll get the standard deviation (σ or s). (Also, we divide by 2 because it’s the number of data points in the sample minus 1.)

  42 / 2 = 21
  √21 ≈ 4.6
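A minimal sketch of the last two steps, cross-checked against Python's standard `statistics` module. Here `variance` and `stdev` divide by N minus 1 (the sample case), while `pvariance` divides by N (the whole-population case).

```python
import math
import statistics

scores = [78, 84, 87]
sum_of_squares = 42  # from the previous step

# Sample variance: divide by N - 1 because these three rounds are only a sample.
sample_variance = sum_of_squares / (len(scores) - 1)

# The standard deviation is the square root of the variance.
sample_std_dev = math.sqrt(sample_variance)

print(sample_variance)           # 21.0
print(round(sample_std_dev, 1))  # 4.6

# The standard library gives the same answers.
library_variance = statistics.variance(scores)
library_std_dev = statistics.stdev(scores)

# If these three rounds were the entire population, we would divide by N instead.
population_variance = statistics.pvariance(scores)
```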

Statistical Lingo
Another common term is the median. It’s the “middle value” of the data once the values are sorted, and it is largely insensitive to extreme values in the set. For the scores 78, 84, and 87, the median is 84. Real estate folks might refer to a median income level for an area – it’s virtually unaffected by Bill Gates moving into (or out of) the neighborhood.
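To see why the median shrugs off an extreme value, here is a small sketch. The extra round of 120 is a hypothetical score added purely for illustration; it is not part of the original example.

```python
import statistics

scores = [78, 84, 87]
median_before = statistics.median(scores)
mean_before = statistics.mean(scores)

# Hypothetical: one unusually bad round is added to the list.
with_outlier = scores + [120]
median_after = statistics.median(with_outlier)
mean_after = statistics.mean(with_outlier)

print(median_before, median_after)  # 84 85.5
print(mean_before, mean_after)      # 83 92.25
```

The median moves by 1.5 strokes while the mean jumps by more than 9.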

Statistical Lingo
In a few short slides, we’ve covered a number of the most frequently used statistical terms.

  Term                     Calculation                   Result
  mean (x̄)                 249 / 3                       83
  deviation                each score minus the mean     -5, +1, +4
  sum of the squares       25 + 1 + 16                   42
  variance (s²)            42 / 2                        21
  standard deviation (s)   √21                           about 4.6
  median                   middle value of 78, 84, 87    84

Statistical Lingo
Of course, if you had to manually compute:
• an average
• a deviation for each data point
• a square of all the deviations
• a sum of the squares
• a variance
• a standard deviation
• a median
every time you got some data, things could get crazy, especially if there’s a lot of data. Thankfully, we have Minitab.

Getting Basic Stats From Minitab
1. Enter whatever data you want to analyze into a column in Minitab.
2. Click on Stat, then on Basic Statistics, then on Display Descriptive Statistics.

Getting Basic Stats From Minitab
3. In the box labeled Variable, indicate the column containing the data.
4. Click on the box labeled “Graphs”.
5. Check “Graphical summary”.
6. Click OK.
7. Click OK.

Getting Basic Stats From Minitab
Minitab will provide a summary of the data that looks something like this. We’ll break this down in pieces to explain all the information displayed.

Getting Basic Stats From Minitab
Does the data “fit” a normal distribution well enough to assume normality? (p < 0.05: no; p > 0.05: yes)

If a set of data is normally distributed it means that when it is plotted as a histogram it has a symmetric, bell-shaped distribution.

[Figure: three example histograms, labeled Normal, Not Normal, and Not Normal]

If data is normally distributed, it allows for a number of predictions and analytical methods that would otherwise not be valid. For example, the mean and standard deviation can be used to predict the odds of having values fall within certain ranges (like within specified tolerances).

Getting Basic Stats From Minitab
Mean, x̄ (sample) or μ (population): The average value of all the data points. (If calculated using a sample of data from a population it may be written x̄; if calculated using all the data in the population it may be written μ.)

StDev, s (sample) or σ (population): The standard deviation of all the data points. It can be thought of as the “average distance that data points are from the mean” – the larger the standard deviation, the greater the variation. (If calculated using a sample of data from a population it’s usually written s; if calculated using all the data in the population it’s usually written σ.)

Getting Basic Stats From Minitab
Variance, s² (sample) or σ² (population): Equal to the standard deviation squared.

Skewness: A measure of asymmetry – the further from zero, the more skewed the data. Typically, the skewness value will range from negative 3 to positive 3. For example, if a distribution has a large tail at the upper end of its distribution, skewness will likely be positive.

Kurtosis: A number reflecting how much the sample data resembles a normal distribution in shape. The kurtosis value is approximately zero for a normal distribution. A very negative kurtosis indicates a distribution that is flatter than usual; a very positive kurtosis indicates a distribution that is more peaked than usual.

N (sample size): The number of data points used in the creation of this summary.
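For a feel of how skewness and kurtosis behave, here is a rough sketch using the simple moment-based formulas. This is an assumption made for illustration: Minitab may apply sample-size adjustments, so its reported values can differ somewhat from these. The function name and both data sets are hypothetical.

```python
def shape_statistics(data):
    """Return (skewness, excess kurtosis) from simple moment-based formulas."""
    n = len(data)
    mean = sum(data) / n
    # Central moments: averages of the deviations raised to powers 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    # Subtracting 3 makes a normal distribution score approximately zero.
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis

# Hypothetical data: perfectly symmetric and fairly flat.
symmetric = [1, 2, 3, 4, 5]
skew_symmetric, kurt_symmetric = shape_statistics(symmetric)

# Hypothetical data: a long tail at the upper end.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]
skew_right, kurt_right = shape_statistics(right_tailed)

print(skew_symmetric, kurt_symmetric)
print(skew_right, kurt_right)
```

The symmetric set has zero skewness and a negative kurtosis (flatter than a bell curve), while the set with the long upper tail has positive skewness.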

Getting Basic Stats From Minitab
Minimum: The lowest value data point in the sample.
1st Quartile: The value which 25% of the data points fall below.
Median: The value which 50% of the data points fall below.
3rd Quartile: The value which 75% of the data points fall below.
Maximum: The highest value data point in the sample.
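These five values can be sketched with Python's standard library. Note that quartile conventions vary between tools, so it is an assumption that `statistics.quantiles` (with its default method) matches Minitab's figures on every data set. The seven scores are hypothetical.

```python
import statistics

# Hypothetical sample of seven golf scores.
data = [75, 78, 80, 84, 85, 87, 90]

# n=4 cuts the sorted data into quarters and returns the three cut points.
q1, median, q3 = statistics.quantiles(data, n=4)

summary = {
    "Minimum": min(data),
    "1st Quartile": q1,
    "Median": median,
    "3rd Quartile": q3,
    "Maximum": max(data),
}

for name, value in summary.items():
    print(name, value)
```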

Getting Basic Stats From Minitab
Confidence Intervals: Because we only gave Minitab a sample of data from a presumably larger population, it can only estimate what the entire population is like. Minitab can help us to understand how good our estimates of things like the mean (Mu), the standard deviation (Sigma), and median are. Minitab does this by calculating an interval within which it is 95% certain that these parameters actually reside if the whole population were to be included.

Getting Basic Stats From Minitab
The vertical line part way through each of the red boxes is the calculated mean (top) and median (bottom) for the sample of data entered. Around these points, Minitab calculates an interval within which it is 95% certain that the population mean and median actually reside.

For example, in the case of the top red bar, the vertical line in the middle of the red bar shows a mean of about 50.6. While this is probably not the EXACT mean for the population, using the number of data points and the amount of variation they exhibited it can be estimated with good confidence (95%) that the mean for the population falls somewhere between 48.9 and 52.3.
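A common way to build such an interval for the mean is the t-based formula: mean plus or minus t times s divided by the square root of N. The sketch below applies it to the three golf scores from earlier. The function name is hypothetical, the t value is passed in by hand, and it is an assumption that this mirrors Minitab's calculation exactly.

```python
import math
import statistics

def mean_confidence_interval(data, t_critical):
    """Return (low, high) using mean +/- t * s / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    # Standard error: how much the sample mean itself is expected to vary.
    std_error = statistics.stdev(data) / math.sqrt(n)
    margin = t_critical * std_error
    return mean - margin, mean + margin

scores = [78, 84, 87]

# 4.303 is the two-sided 95% t value for N - 1 = 2 degrees of freedom.
low, high = mean_confidence_interval(scores, t_critical=4.303)

print(round(low, 1), round(high, 1))  # 71.6 94.4
```

With only three data points the interval is very wide; more data (or less variation) narrows it.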

Getting Basic Stats From Minitab
Histogram of the data (with Minitab’s best estimate of what normal curve fits the data best).

The “Box and Whisker plot” divides data into “quarters”, marked by the 1st quartile, the median, and the 3rd quartile.

NOTE: Data points with values lower than Q1 - 1.5(Q3 - Q1) or greater than Q3 + 1.5(Q3 - Q1) are considered “outliers” and appear as individual dots.
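The outlier rule in the note can be sketched as follows. The data set, including the round of 120, is hypothetical, and the quartiles come from Python's `statistics.quantiles`, which may use a slightly different quartile convention than Minitab.

```python
import statistics

# Hypothetical scores with one suspiciously high round.
data = [75, 78, 80, 84, 85, 87, 90, 120]

q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # the width of the "box"

# Fences from the note: Q1 - 1.5(Q3 - Q1) and Q3 + 1.5(Q3 - Q1).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence)  # 62.375 105.375
print(outliers)                  # [120]
```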

Once you have the basic stats, what’s next?
Given a process with a mean = 83 & std dev = 4.6:

68% of the population will be captured within one standard deviation of the mean.
95% of the population will be captured within two standard deviations of the mean.
99.73% of the population will be captured within three standard deviations of the mean.

[Figure: normal curve centered on the mean of 83, with bands marking 68%, 95%, and 99.73% of the population]
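These percentages can be checked with `NormalDist` from Python's standard library. This is a sketch that assumes the process is exactly normal with the mean and standard deviation given above.

```python
from statistics import NormalDist

mean, std_dev = 83, 4.6
process = NormalDist(mu=mean, sigma=std_dev)

coverage = {}
for k in (1, 2, 3):
    low = mean - k * std_dev
    high = mean + k * std_dev
    # Share of the population falling between low and high.
    coverage[k] = process.cdf(high) - process.cdf(low)
    print(k, round(low, 1), round(high, 1), round(coverage[k] * 100, 2))
```

The exact two-standard-deviation figure is closer to 95.45%; 95% is the usual rounded rule of thumb.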

5 92 96. height.5% 2. That information characterizes the box for them. 78.36% 96.5% 92 13.73% 13.36% 69. and weight of the box. width.5% 2. mean. and standard deviation) help to characterize the process [or the performance of a process].5 34% 34% 95% 68% 34% 34% 99.6 69. what’s next? Given a process with a mean = 83 & std dev = 4.5 .5 87.5 83 87.5 74 Note that the three items mentioned (shape.5 13. In other words. they know what to expect when they come to get it.5% 68% 74 13.Once you have the basic stats. It’s somewhat like when you ship a box for overnight delivery: the courier wants to know the length.5 68% 34% 34% 78.

Once you have the basic stats, what’s next?
Understanding a process this well has some rather powerful implications. For example, once you know the mean and standard deviation of a process that’s normally distributed, predicting the percentage of times something will fall above or below any given value (like a tolerance limit, for instance) is relatively easy. In other words, we can tell how often the process will perform “properly”. That’s the topic of another tool time: Process Capability.
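As a small illustration of that kind of prediction, the sketch below estimates how often a normally distributed process with mean 83 and standard deviation 4.6 would exceed a limit. The upper limit of 90 is a hypothetical value chosen for the example, not one from the presentation.

```python
from statistics import NormalDist

process = NormalDist(mu=83, sigma=4.6)

# Hypothetical tolerance limit, chosen only for illustration.
upper_limit = 90

# cdf gives the share at or below the limit; the rest falls above it.
share_below = process.cdf(upper_limit)
share_above = 1 - share_below

print(round(share_above * 100, 1))  # 6.4
```

So under these assumptions, roughly 6 results in every 100 would land above the limit.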