# Statistics

ST 361: Statistics for Engineers Descriptive Statistics: Additional Topics

Kimberly Weems ksweems@ncsu.edu 5260 SAS Hall

Outline

• Transformations
• Outliers
• Quartiles, Five Number Summary & Interquartile Range (IQR)
• Boxplots
• Histogram Review

Transformations

• Often we change units of the data. What happens?

  – Change feet to centimeters?
  – Change pounds to kilograms?
  – Add 20 points to each student's score?

Transformations

• Multiplying each value in a data set by a constant multiplies the mean by that constant and the standard deviation by its absolute value. The variance is multiplied by the square of the constant.
  – Variance is on the squared scale.
  – Mean and SD are on the scale of the data.

Transformations

• Adding the same value to each item in a data set changes the mean by that amount but does not change the standard deviation or variance.
  – Spread of the items does not change.
  – Everything shifts together.

Example

• Soda consumption: The soda amounts were given in ounces. One fluid ounce is equal to 29.57 milliliters. If we had measured the amounts in milliliters, what would the mean and standard deviation have been?
• Mean: 15*29.57 = 443.55
• Standard Deviation: 11.46*29.57 = 338.87
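The two transformation rules and the soda conversion can be checked numerically. The following is a minimal Python sketch, not part of the original slides. The list `ounces` is a hypothetical sample (the slides give only summary values), while the mean of 15 ounces, the standard deviation of 11.46 ounces, and the factor 29.57 come from the example.

```python
import statistics

ML_PER_OUNCE = 29.57  # conversion factor quoted in the example

# Hypothetical sample in ounces; these values are NOT from the slides.
ounces = [8.0, 12.0, 12.0, 20.0, 23.0]


def summarize(values):
    """Return (mean, sample standard deviation, sample variance)."""
    return (statistics.mean(values),
            statistics.stdev(values),
            statistics.variance(values))


mean_oz, sd_oz, var_oz = summarize(ounces)

# Multiply every value by a constant c:
# mean and SD are multiplied by c, variance by c squared.
mean_ml, sd_ml, var_ml = summarize([x * ML_PER_OUNCE for x in ounces])

# Add a constant to every value:
# the mean shifts by that constant, SD and variance stay the same.
mean_shift, sd_shift, var_shift = summarize([x + 20 for x in ounces])

# Soda example, using only the summary values from the slide.
soda_mean_ml = 15 * ML_PER_OUNCE
soda_sd_ml = 11.46 * ML_PER_OUNCE
print(round(soda_mean_ml, 2), round(soda_sd_ml, 2))
```

Because only the summary values are rescaled in the soda example, the raw data are not needed: the rule for multiplying by a constant is enough.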

Outliers

• Outliers: unusual values that do not fit with the overall group
  – Cause the mean to be unrepresentative

Causes of Outliers

• Data entry errors
  – Person records values incorrectly
  – Should be corrected
• Value from another population
  – Should be a teenager but is really a toddler
  – Should be corrected
• Actual unusual values
  – Sometimes you have a student who is 7’ tall
  – Should be explored and verified
  – Ask why it occurred; may give more information about the phenomenon

Impact of Outliers

• Can substantially change the mean and standard deviation
  – Relative to data without the outlier
• Changes the median little, if at all
  – The median is resistant to outliers
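To make the contrast between the mean, standard deviation, and median concrete, here is a small Python sketch. The heights are hypothetical values chosen for illustration and do not come from the slides.

```python
import statistics

# Hypothetical heights in inches; these values are NOT from the slides.
heights = [64, 66, 67, 68, 69, 70, 71]
with_outlier = heights + [84]  # add one 7-foot (84-inch) student


def describe(values):
    """Return (mean, sample standard deviation, median)."""
    return (statistics.mean(values),
            statistics.stdev(values),
            statistics.median(values))


mean_a, sd_a, median_a = describe(heights)
mean_b, sd_b, median_b = describe(with_outlier)

print(f"without outlier: mean={mean_a:.2f} sd={sd_a:.2f} median={median_a}")
print(f"with outlier:    mean={mean_b:.2f} sd={sd_b:.2f} median={median_b}")
```

In this made-up sample the single tall student moves the mean by about two inches and inflates the standard deviation, while the median moves by only half an inch.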

Five Number Summary

• Five numbers that tell us about a data set
  – Includes min, Q1, median (Q2), Q3, and max
  – Q1, Q2, and Q3 are quartiles

Quartiles

• Q1 = Q(.25) is the 1st (lower) quartile.
  – value that has 25% of the data below it
• Q2 = Q(.50) is the median.
  – value that has 50% of the data below it
• Q3 = Q(.75) is the 3rd (upper) quartile.
  – value that has 75% of the data below it

Quartiles

• Q1 is the median of the ordered observations that lie to the left of Q2.
• Q3 is the median of the ordered observations that lie to the right of Q2.

Quartiles

• Order the data from smallest to largest: 56, 67, 68, 72, 74, 75, 88, 90, 97, 99
• The median splits the data into two parts. We want the same number of observations on both sides of the median. Any point between 74 and 75 would serve the purpose. As a convention, we take the average of the two, (74+75)/2 = 74.5, as the median.

Quartiles

Ex. 56, 67, 68, 72, 74, 75, 88, 90, 97, 99

• There are different conventions for finding quartiles! We choose to use the following convention: to find Q1 and Q3, find the medians of the two halves of the data determined by Q2.
• So Q1 is ____, Q3 is ________

Quartiles

Ex. 56, 67, 68, 72, 74, 75, 88, 90, 97, 99, 100

• If the number of data points n is odd, then our convention is that Q2 is counted as part of both the lower half and the upper half.
• Q2 = 75, Q1 = ____, Q3 = _____
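The quartile convention used in these examples can be sketched in a few lines of Python. The function name `quartiles` is a hypothetical helper, not something from the slides, and other software may use a different convention and return slightly different values.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the convention from these slides.

    Q1 and Q3 are the medians of the lower and upper halves of the
    ordered data. When n is odd, Q2 is counted in both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2        # size of each half; includes Q2 when n is odd
    lower = data[:half]        # smallest `half` observations
    upper = data[n - half:]    # largest `half` observations
    return (statistics.median(lower),
            statistics.median(data),
            statistics.median(upper))


even_example = [56, 67, 68, 72, 74, 75, 88, 90, 97, 99]
odd_example = [56, 67, 68, 72, 74, 75, 88, 90, 97, 99, 100]

print(quartiles(even_example))
print(quartiles(odd_example))
```

Running the sketch on the two example data sets is one way to check the blanks after working them by hand.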

Interquartile Range

• Measure of variability
  – Width of the middle 50%: IQR = Q3 - Q1
  – Resistant to outliers: not as heavily influenced by them
  – Does not summarize all values

Boxplots

• Also called box and whisker plots
• Present the 5 number summary in graphical form
  – Helps understand data

Boxplot

[Figure: boxplot with the minimum, Q1, median, Q3, and maximum labeled]

Shapes from boxplots

• Shapes of distributions from boxplots
  – Whiskers indicate the long tail of a distribution
  – Skewed or symmetric
• Long whisker to the right: long tail to the right
• Equal whiskers: both whiskers about the same
• Long whisker to the left: long tail to the left
• Note: We cannot readily determine if a distribution is multimodal from a boxplot.

Outliers

• Computer programs identify outliers automatically
  – Whiskers extend to the largest/smallest non-outliers
  – An asterisk or dot marks each outlier
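The slides do not say which rule the software applies. One common convention, assumed here, is the 1.5 × IQR rule: a value is flagged when it lies more than 1.5 IQRs below Q1 or above Q3. The sketch below uses that assumed rule together with the quartile convention from these slides. The function names and the height data are hypothetical.

```python
import statistics


def quartiles(values):
    """(Q1, Q2, Q3): medians of the halves; Q2 is in both halves if n is odd."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    return (statistics.median(data[:half]),
            statistics.median(data),
            statistics.median(data[n - half:]))


def boxplot_parts(values, k=1.5):
    """Split data into whisker ends and outliers using the k * IQR rule.

    The k = 1.5 default is an assumed convention, not stated in the slides.
    """
    data = sorted(values)
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "iqr": iqr,
        "whisker_low": inside[0],    # smallest non-outlier
        "whisker_high": inside[-1],  # largest non-outlier
        "outliers": outliers,
    }


# Hypothetical heights in inches; NOT the data behind the slides.
heights = [63, 68, 69, 70, 70, 71, 72, 73, 74]
print(boxplot_parts(heights))
```

Plotting packages often expose the multiplier as a setting, so the points drawn as outliers can differ from one program to another.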

Comparative Boxplots

[Figure: comparative boxplots with an outlier and the smallest nonoutlier labeled]

• Males located about 7 inches higher
  – Outlier among males
• Variability about the same
• Both distributions are roughly symmetric

Outlier in the males

• What is the cause of this outlier?
  – Data entry error? Should be 73?
  – Wrong population? A female incorrectly recorded as a male?
  – Actual unusual value? A male that is really 63 inches tall?
• We should explore each of these three causes.

Histograms: Review

• The most common graphical summary of quantitative data is a histogram
• Describes the observed distribution of a single quantitative variable
• Graphs the frequencies or relative frequencies of a single quantitative variable

Things to look for in a histogram

• Shape
• Location
• Spread

Things to look for in a histogram: Location

• For symmetric, single-peaked distributions, mean = median = mode
• For skewed distributions, the three measures differ.

[Figure: histograms with the mode, mean, and median marked]

How can we obtain the median for this histogram?

[Figure: histogram]

Determining the median for a given histogram

• Add the heights of the rectangles to determine n (n = 75)
• Median Q2 is located at position (n+1)/2 = 38
• Add up the rectangle heights starting from the left until reaching 38.
• May not be able to determine the exact value (from StatCrunch: median Q2 = 90)
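The counting procedure above can be sketched in Python. The bins and counts below are hypothetical: the slide's histogram is not reproduced here, so the counts were simply chosen to add up to n = 75. The sketch finds the bin that contains the median, not its exact value.

```python
from itertools import accumulate

# Hypothetical bins (lower, upper) and bar heights; NOT the slide's data.
bins = [(60, 70), (70, 80), (80, 90), (90, 100), (100, 110)]
counts = [5, 12, 20, 28, 10]


def median_bin(bins, counts):
    """Return (n, median position, bin containing the median).

    Written for odd n, where the median is the (n+1)/2-th ordered value.
    For even n this uses the lower of the two middle positions.
    """
    n = sum(counts)                # add the bar heights to get n
    position = (n + 1) // 2        # position of the median
    # Add up the bar heights from the left until the position is reached.
    for bin_range, running_total in zip(bins, accumulate(counts)):
        if running_total >= position:
            return n, position, bin_range
    raise ValueError("counts must contain at least one observation")


print(median_bin(bins, counts))
```

The result is only an interval because a histogram records how many values fall in each bin, not the values themselves.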