# SOURCE: http://exploringdata.net/

Welcome to the Exploring Data website. This website provides curriculum support materials for teachers of introductory statistics.

can be done.

Normal Distribution
Discover the link between the 1.5*IQR Rule and the normal distribution, and why you should use light bulbs to burn those traditional statistics textbooks.

Probability
Probability is a wonderful subject to teach! There are so many activities for teaching concepts, puzzles and problems with non-intuitive answers and a variety of contexts for the exercises. This page contains a small collection.

Sampling
Contains a nice little activity - the JellyBlubbers - which was modified from a problem in a leading textbook which was modified from an activity in Activity Based Statistics. And the original activity started as a bucket of rocks on the desk of a statistics teacher. An activity with impeccable lineage.

Confidence Intervals
Are you 95% confident that you can correctly teach your students the correct meaning of confidence intervals?

Hypothesis Testing
Teaching students to understand hypothesis testing is a difficult business indeed. This page contains some activities that will give students some hands-on experiences with the underlying concepts.

Curve Fitting
Contains Anscombe's famous dataset, and a comprehensive manual on using technology to fit functions to data.
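The link between the 1.5*IQR Rule and the normal distribution, mentioned in the Normal Distribution entry, can be sketched with a short calculation. This is the standard textbook result, not a reproduction of the site's own page, and it assumes a normal population with mean \mu and standard deviation \sigma (symbols introduced here for illustration).

```latex
\begin{aligned}
Q_1 &\approx \mu - 0.6745\,\sigma, \qquad Q_3 \approx \mu + 0.6745\,\sigma \\
\mathrm{IQR} &= Q_3 - Q_1 \approx 1.349\,\sigma \\
Q_3 + 1.5\,\mathrm{IQR} &\approx \mu + 0.6745\,\sigma + 2.0235\,\sigma = \mu + 2.698\,\sigma \\
Q_1 - 1.5\,\mathrm{IQR} &\approx \mu - 2.698\,\sigma \\
P\left(|Z| > 2.698\right) &\approx 0.007
\end{aligned}
```

Under these assumptions, the rule would flag roughly 0.7% of values from a normal population as outliers.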

Copyright
Most of the resources on this site are the property of Education Queensland. You are welcome to use the resources on this site freely for educational purposes if you acknowledge Education Queensland as the owner of the resource. You are not permitted to sell these materials for commercial gain without the express written consent of Education Queensland. Exceptions to this are material that is owned by another person or organisation and for which permission has been granted for use on this site. If you wish to use these resources you must seek permission of the owner of the copyright. If the resource you wish to use contains an acknowledgement (e.g. a dataset from another source), then you should acknowledge that source also.

Introduction to Exploring Data
Statistics is a fascinating subject, to both learn and teach! It is also an important one, as we are bombarded with statistics every day of our lives. Knowledge in the subject allows us to make informed judgements about the statistics presented by others to persuade us. It is worth noting that at the tertiary level more students study statistics than study calculus subjects. As teachers we need to give our students an understanding of the place of statistics in modern society, an interest in the subject and a solid grounding for further study.

Setting the Scene (from How to Make Statistics Boring)
Ah! I'll bet some of you thought, "Hey, it doesn't need to be made boring, statistics is already boring." Given the sort of statistics to which we've been subjecting ourselves and our students over the years, such an attitude would not be surprising. Consider the following exercise on constructing a boxplot, which is from a popular Math A text.

Construct a box-and-whisker graph for the following data which are the masses in kilograms of nine Year-11 girls: 35 47 48 50 51 53 54 70 75

This was chosen only because it is a typical example of the statistics that many of us are teaching our students. I am not picking on this particular textbook. All of the Maths A and B texts that I have examined are loaded with similar examples. If this exercise doesn't convince you that statistics can be boring, there are many more where this came from.

Actually, boring is not the most important issue. There are other things wrong with this exercise, other than the fact that it is boring. The data are fake. It is trivial. It is pointless.

Source: Boggs, R. (1996). How to Make Statistics Boring. Teaching Mathematics, QAMT, Brisbane.
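For reference, the textbook exercise quoted above (the nine masses) can be worked in a few lines. This is a minimal sketch, not the textbook's or the website's method: the function names are hypothetical, and the quartiles use the median-of-halves rule with the overall median left out of both halves when the count is odd, which is only one of several definitions in use.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: for an odd number of values the overall median is left out
    of both halves. Other sources define the quartiles differently.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def outlier_fences(q1, q3):
    """Return the lower and upper 1.5*IQR fences of a standard boxplot."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Masses (kg) of the nine Year-11 girls from the exercise.
masses = [35, 47, 48, 50, 51, 53, 54, 70, 75]
summary = five_number_summary(masses)
low_fence, high_fence = outlier_fences(summary[1], summary[3])
outliers = [m for m in masses if m < low_fence or m > high_fence]

print("five-number summary:", summary)
print("fences:", low_fence, high_fence)
print("outliers:", outliers)
```

With this quartile rule no mass falls outside the fences, so the whiskers would simply run to 35 and 75.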

Themes
This website has four underlying themes:

- Data should be central to the study of statistics, and our students should study real problems with real data.
- Students need to be actively involved in the study of statistics. Whenever possible, a collaborative, activity-based approach to statistics should be used.
- Computers and graphing calculators have an essential role in statistics as they excel at drawing graphs and doing calculations. Students should be concentrating on the underlying concepts, looking for patterns and notable features in data, learning to make appropriate decisions on the choice of summary statistics and analyses, and justifying these decisions.
- Teachers must know more statistics than that outlined in the syllabus or contained in the textbook. This website has extensive extension material which can give the teacher a broader background to the subject.

One of the first tasks of the statistician when analysing a set of data is viewing the data in a variety of ways, both graphically and numerically, looking for intriguing patterns, unusual observations and the general characteristics of the dataset. This aspect of statistics is a focus of this website.

From the AP-Statistics Guidebook
I thought this was nicely written, so I will share it with you.

Exploratory analysis of data makes use of graphical and numerical techniques to study patterns and departures from patterns. In examining distributions of data, students should be able to detect important characteristics, such as shape, location, variability, and unusual values. From careful observations of patterns in data, students can generate conjectures about relationships among variables. The notion of how one variable may be associated with another permeates almost all of statistics, from simple comparisons of proportions through linear regression. The difference between association and causation must accompany this conceptual development throughout.

Statistics and Chocolate
One of my favourite Internet publications is ZiMaths, which is published by the University of Zimbabwe and which is meant to be 'a device that would educate Zimbabwean schoolkids about the virtues of real maths, as opposed to the spiceless variety taught in most schools'. An article that threads its way through the first three issues follows a bar of chocolate from its raw materials to its marketing, noting how statistics is used in assisting the process. It gives a nice real-world example of queueing, sampling, process control, calibrating machinery, surveying and forecasting.

And I'm the Point 3!
Jane Watson from the University of Tasmania talks on the Australian Broadcasting Corporation's Radio National program about the need for statistical literacy in the Australian community.

The Six Characteristics of a Dataset
From the Exploring Data website - http://curriculum.qed.qld.gov.au/kla/eda/ © Education Queensland, 1997

Once some data have been gathered, the first step in working with the data is to look at it in a variety of ways. These six characteristics of a dataset are a good starting point in analysing a dataset, although a fuller analysis extends to looking for unexpected anomalies and patterns in the data. For example, in the Metric dataset which consists of forty-four estimates of the width of a lecture hall, there are a large number of estimates of 10 m and 15 m. This is almost certainly due to subjects rounding off their estimate and is a feature more of our number system than the size of the hall.

The Old Faithful Dataset
The Old Faithful dataset has some interesting features and hence will be the example used in this article. Old Faithful is a geyser in Yellowstone National Park in Wyoming. The graphical displays below are based on 222 measurements of the duration of the geyser's eruptions, in minutes.

Shape
The shape of a dataset will be the main factor in determining which set of summary statistics best summarises the dataset, so it should be the first characteristic to be noted. Shape is commonly categorised as symmetric, left-skewed or right-skewed, and as uni-modal, bi-modal or multi-modal. The shape of the Old Faithful dataset is bi-modal. Note that both the histogram and the dotplot do a good job of showing this while the boxplot doesn't indicate this at all.

Clustering
Clustering implies that the data tends to bunch up around certain values, eg annual wages for a factory may cluster around \$20 000 for unskilled factory workers, \$35 000 for tradespersons and \$50 000 for management. Clustering shows up most clearly on a dotplot. The Old Faithful dataset shows two clusters centred around 2 minutes and 4.5 minutes.

Granularity
Granularity implies that only certain discrete values are allowed, eg a company may only pay salaries in multiples of \$1 000. A dotplot shows granularity as stacks of dots separated by gaps. By default, discrete data has some granularity as only certain values are possible. Continuous data can show granularity if the data is rounded. The Old Faithful dataset shows evidence of granularity. By examining the original data it becomes clear that this is the result of the data being rounded to one decimal place and is not a feature of the data itself.

Other Features
With the availability of computers and low cost statistics software it is possible to calculate summary statistics and generate graphical displays very rapidly. One reason for using computers and graphing calculators when studying statistics is the necessity of viewing the data in a variety of ways. Without technology to draw the graphs this would be impossible to do efficiently.

The choice of bin width of a histogram can markedly alter the apparent shape of the data, especially if the data is not uni-modal. As histograms are so quick to generate, it may be worth our while looking at some alternative histograms to see what they show.
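The effect of bin width on apparent shape can be tried out without any plotting software. The sketch below is illustrative only: it uses the forty-four lecture-hall guesses of the Metric dataset listed later in this document, the helper names `bin_counts` and `text_histogram` are hypothetical, and the starting value of 8 is simply the smallest guess.

```python
def bin_counts(values, width, start):
    """Count the values falling in each bin [start + k*width, start + (k+1)*width).

    Assumes whole-number data and a whole-number bin width.
    """
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1
    return counts


def text_histogram(values, width, start):
    """Return one text row per bin, with one star per value in the bin."""
    rows = []
    for k, count in enumerate(bin_counts(values, width, start)):
        low = start + k * width
        rows.append(f"{low:>3}-{low + width - 1:<3} {'*' * count}")
    return rows


# The 44 guesses (metres) of the lecture hall width from the Metric dataset.
guesses = [8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 13, 13, 13,
           14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 17, 17,
           17, 17, 18, 18, 20, 22, 25, 27, 35, 38, 40]

for width in (2, 5):
    print(f"bin width = {width}")
    print("\n".join(text_histogram(guesses, width, start=8)))
    print()
```

With a bin width of 2 the separate peaks near 10 and 15 are visible; with a bin width of 5 starting at 8 they merge into a single hump, which is the kind of distortion the text warns about.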

The choice of bin width (and hence the number of bins) does change the appearance of the histogram. Which one 'best' gives a true picture of the data is subjective, hence a statistician will view the data in different ways. It is a worthwhile exercise to give students a dataset that is not unimodal and ask them to choose the best histogram and then defend their decision.

Stemplots
The purpose of displaying data graphically is to give a visual display of the interesting and important features of the dataset. Which particular displays are best is not a question that can necessarily be answered before the data is viewed. The main point is that the plot should quickly inform us about the salient features of the dataset.

Constructing a stemplot is often the first step in analysing a dataset. A stemplot shows the shape of the distribution and indicates whether there are potential outliers, and helps to determine what analysis is appropriate. Stemplots are useful for displaying small datasets with only positive values. They are quicker and easier to construct by hand than histograms. They are also appropriate if it is important to retain the original data.

How to display a particular dataset with a stemplot often requires judgement. How to split the stems, how to represent outliers and whether to truncate the data are decisions that often have to be made. The choices of 'bin width' are limited, so for some datasets that otherwise meet the criteria, a stemplot may not be very useful. The bins may be too large or too small to properly display the distribution of the data. For these datasets, a histogram is preferable.

Once a stemplot is constructed, students should consider these questions: What is the location of the data? How much is the data spread out? What is its overall shape: is it symmetric, skew or bi-modal? Are there any unusual data values such as outliers?
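The mechanics of a basic stemplot (unsplit stems, no special treatment of outliers) can be sketched as follows. The function name `stemplot` and its `unit` parameter are hypothetical choices made for this illustration, and the code assumes non-negative whole-number data. The example data are the nine masses from the textbook boxplot exercise quoted earlier.

```python
from collections import defaultdict


def stemplot(values, unit=1):
    """Build a basic stemplot and return it as a list of text rows.

    Each value is truncated to a multiple of `unit`; the last remaining
    digit is the leaf and the digits before it form the stem.
    Assumes non-negative whole numbers and a whole-number unit.
    """
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v // unit, 10)
        leaves_by_stem[stem].append(leaf)

    rows = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem:>3} | {leaves}".rstrip())
    return rows


masses = [35, 47, 48, 50, 51, 53, 54, 70, 75]
print("\n".join(stemplot(masses)))
```

Keeping the empty stem for the sixties is a deliberate choice: the gap between 54 and 70 is one of the features a reader would want to see.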

be scaled, by multiplying or dividing by a power of 10. For example NCSS Jr scales the data to remove decimal points.

Three Examples

The Density of the Earth Dataset
In 1798, Henry Cavendish measured the density of the earth using an instrument called a torsion balance. While the density of the earth is obviously not uniform, the value of the mean density is important in determining the earth's composition. The units are grams / cm3.

Density Measurements (shown in ascending order)

    4.07  4.88  5.10  5.26  5.27  5.29  5.29  5.30  5.34  5.34
    5.36  5.39  5.42  5.44  5.46  5.47  5.50  5.53  5.55  5.57
    5.58  5.61  5.62  5.63  5.65  5.75  5.79  5.85  5.86

Here is the NCSS 6.0 Jr stemplot of this dataset along with some comments.

Stem-Leaf Plot Section of Density

    Depth   Stem | Leaves
             Low | 407
       2      48 | 8
       2      49 |
       2      50 |
       3      51 | 0
       7      52 | 6 7 9 9
      12      53 | 0 4 4 6 9
      (4)     54 | 2 4 6 7
      13      55 | 0 3 5 7 8
       8      56 | 1 2 3 5
       4      57 | 5 9
       2      58 | 5 6

    Unit = .01   Example: 1 |2 Represents 0.12

Comments
The Depth column records the cumulative number of data values, counting in from each end. The entry in brackets locates the row that contains the median, and gives the number of entries in that row. For this dataset NCSS multiplies each value by 100 to remove the decimal point. NCSS has chosen a two-digit stem with single digit leaves. The outlier is labelled as 'Low' and the entire value (407, representing 4.07) is given in the 'Leaves' column. The scale is given at the bottom. For example, 54 | 2 represents a value of 542. Multiply this by the unit (.01) to return the original value: 542 x .01 = 5.42.
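The Depth column described in the comments can be reproduced from the number of leaves in each row. The sketch below is an interpretation of that description, not NCSS's actual code; the function name `depth_column` is hypothetical. It counts the 'Low' row as a row of one value, so it returns a depth of 1 for that row, which the NCSS display leaves blank.

```python
def depth_column(row_counts):
    """Return the depth label for each stemplot row, given the leaves per row.

    Rows before the median are labelled with the running count from the top,
    rows after it with the running count from the bottom, and the row that
    holds the median with its own count in brackets.
    """
    n = sum(row_counts)
    # 1-based positions of the middle value(s); equal when n is odd.
    lo_mid, hi_mid = (n + 1) // 2, n // 2 + 1
    depths, seen = [], 0
    for count in row_counts:
        first, last = seen + 1, seen + count  # positions covered by this row
        seen = last
        if count and first <= lo_mid and hi_mid <= last:
            depths.append(f"({count})")
        elif last <= lo_mid:
            depths.append(str(last))           # counting in from the top
        else:
            depths.append(str(n - first + 1))  # counting in from the bottom
    return depths


# Leaves per row of the Density stemplot: Low, 48, 49, ..., 58.
density_rows = [1, 1, 0, 0, 1, 4, 5, 4, 5, 4, 2, 2]
print(depth_column(density_rows))
```

Apart from the first entry (the 'Low' row), the labels agree with the Depth column shown in the stemplot above.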

The Metric Dataset
Shortly after metric units were introduced in Australia, a group of 44 students was asked to guess, to the nearest metre, the width of the lecture hall in which they were sitting. The true width of the hall was 13.1 metres.

Guesses (Metres)

     8   9  10  10  10  10  10  10  11  11  11
    11  12  12  13  13  13  14  14  14  15  15
    15  15  15  15  15  15  16  16  16  17  17
    17  17  18  18  20  22  25  27  35  38  40

Stem-Leaf Plot Section of Guess

    Depth   Stem | Leaves
       2      .  | 89
      12      1* | 0000001111
      17      T  | 22333
     (11)     F  | 44455555555
      16      S  | 6667777
       9      .  | 88
       7      2* | 0
       6      T  | 2
       5      F  | 5
            High | 27, 35, 38, 40

    Unit = 1   Example: 1 |2 Represents 12

Comments
To achieve the best display, NCSS Jr has split the stems into five parts. This is a common method of splitting stems. The labels used are as follows: '*' represents 0-1, 'T' represents 2-3, 'F' represents 4-5, 'S' represents 6-7, and '.' represents 8-9. There are four high outliers which are given in the 'High' row at the end of the stemplot. The data exhibits two peaks, which are due to students choosing 10 and 15 more often than numbers near to those. Since the original data are retained, the reason for the two peaks can be determined from the stemplot. It is a reflection of our number system and the rounding inherent in estimation.

The Fleas Dataset
Researchers at the Purdue University School of Veterinary Medicine deposited 25 female and 10 male fleas in the fur of a cat, in order to study the egg production of the flea. The number of eggs produced by the fleas over 27 consecutive days is given below. Source: Introduction to the Practice of Statistics, David Moore and George McCabe.

    Day          1    2    3    4    5    6    7    8    9   10   11   12   13   14
    No. of eggs  436  495  575  444  754  915  945  655  782  704  590  411  547  584

    Day          15   16   17   18   19   20   21   22   23   24   25   26   27
    No. of eggs  550  487  585  549  475  435  523  390  425  415  450  395  405

Stem-Leaf Plot Section of No_Fleas

    Depth   Stem | Leaves
       2       3 | 99
       9       4 | 0112334
      13       4 | 5789
      (3)      5 | 244
      11       5 | 57899
       6       6 |
       6       6 | 5
       5       7 | 0
       4       7 | 58
            High | 91, 94

    Unit = 10   Example: 1 |2 Represents 120

Comments
The stemplot shows the data has two clusters. The last digit of the data was truncated. While some detail from the original data is lost, the stemplot is easier to read. The two high outliers are listed at the bottom of the stemplot to two significant figures.

This stemplot shows an alternative method of displaying split stems. Instead of using the symbols '*' and '.', the leading digit was repeated. Both methods are common, though the '*' and '.' are traditional.

As the data was gathered over time a timeplot should also be constructed as this shows how the number of eggs changed over time.

STEPS
The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including the graphical display of data. Visit the STEPS page for further information and a list of the modules available.

Dotplots
A traditional dotplot resembles a stemplot lying on its back, with dots replacing the values on the leaves. It does a good job of displaying the shape, location and spread of the distribution, as well as showing evidence of clusters, granularity and outliers. For smallish datasets a dotplot is easy to construct, so the dotplot is a particularly valuable tool for the statistics student who is working without technology.

Here is an assessment item from a test on Al Coons' website to illustrate these features. His website supports AP (Advanced Placement) Statistics, a course designed to give successful high school students university credit for introductory statistics.

Two machines, C1 and C2, are making pins which must have a diameter of 8 cm ± .01 cm or they are rejected. Dotplots of 50 pins from each machine are displayed below. They are both on the same scale.

1. By simply looking at the dotplots, without doing any calculations or counting, compare C1 and C2 in light of "the six features that are often of interest when analyzing a distribution of data", i.e. centre, variation, symmetry, outliers, clustering and granularity.
2. In what sense is machine C1 'better' at producing pins? Justify your argument.
3. In what sense is machine C2 'better' at producing pins? Justify your argument.

An Alternative Method of Constructing a Dotplot
Here is a dotplot from NCSS 97 of the time between eruptions from the Old Faithful dataset. As there are over two hundred data values it would not have been feasible to use a more traditional dotplot. This plot displays the scale along a vertical axis. The value of each dot is given by its vertical component. The horizontal component is randomised so that not all points are plotted at exactly the same location. The darker points represent two or more values plotted at the same location.

Which characteristics of the dataset does this dotplot highlight? This dotplot shows that the data is bimodal, and gives a good feel for the spread of the data. There is some granularity evident, and there are no outliers. This type of dotplot doesn't give a good feel for the shape of the distribution of the data or allow the student to accurately estimate the location of the centre.

For many real datasets a single type of display doesn't suffice, but each display adds to the overall picture that we are trying to form. Access to statistics software is vital if the student is to generate these displays without getting bogged down in this stage of the analysis.

Histograms
As a teacher of junior maths and Maths in Society, I used to think that a histogram was a rather trivial statistical object, sort of a bar graph with the gaps removed to save space. I never realised that statisticians actually find histograms to be useful!

A modern data-centred approach to statistics starts with viewing the data in a variety of ways. What is meant by viewing the data? Features of interest to a statistician are the overall shape of the data, symmetry, the location and the spread, existence of outliers and evidence of clusters or gaps. A histogram with a scale on the horizontal axis is generally useful for showing all of these features, though for some distributions the features of a dataset can be disguised or distorted due to a particular choice of bin width.

One application of the humble histogram is determining if a set of data is approximately normally distributed, though a histogram is most effective for this purpose if the dataset is large. Normality is a pre-condition for certain analyses of data, including many hypothesis tests. While there are formal tests of normality, often a quick look at a histogram of the data is sufficient. And no statistician would rely strictly on formal tests without viewing the data also.

Histograms and Stemplots Compared
A histogram shows much the same information as a stemplot, though for a given dataset one or the other of these methods of displaying the data may be preferable. Some points to note:

- Histograms are preferable for larger datasets as stemplots become unwieldy.
- With histograms, unlike the stemplot, the original data are usually lost.
- Histograms take more time than a stemplot to construct by hand, therefore stemplots are preferable for a small dataset.
- For a histogram, the choice of bin size or number of bins is not restricted.

Bin Width
Statistics computer programs and graphical calculators will generate a default histogram if bin width or the number of bins is not specified. It is interesting that there is no clear winner in the choice of algorithm used for choosing the number of bins or the bin width. The article How Wide Is Your Bin? contains an interesting thread (i.e. a discussion topic) from the Ed-Stats mailing list.

Matching Histograms and Boxplots
Students will improve their ability to interpret the information given in a boxplot by matching boxplots of sample data drawn from different distributions with their associated histograms.

Matching Histograms and Summary Statistics
Students will improve their ability to visualise the shape of a distribution given the summary statistics.

The Density Trace

With the widespread use of computers in modern statistics, new methods of displaying a dataset have been invented. NCSS 6.0 Jr allows the user to add a display called a density trace to a histogram. It is displayed as the curved line in the diagram. The density trace can be thought of as a smoothed histogram in which the problems caused by fixed bin widths are obviated. The article The Density Trace (which is the NCSS Jr help file on this topic) discusses this plot further.

Beware the Humble Histogram!
Ideally a histogram should show the shape of the distribution of the data. For some datasets, the choice of bin width can have a profound effect on how the histogram displays the data. To see this for yourself, have a look at the Histogram Applet, from R. Webster West, Dept. of Statistics, Univ. of South Carolina, USA (you will need a Java-enabled browser to see the applet). It is a histogram of the interruption time (i.e. time between eruptions) of the Old Faithful Geyser in Wyoming. Slide the bar to change bin widths, and watch how that affects the shape of the histogram. Will you ever trust a histogram again?

As most classrooms don't have Internet access on tap, the Word document Old Unfaithful contains a series of histograms of the interruption time of the Old Faithful geyser, each with a different bin width. The series nicely shows the effect of bin size on the appearance of the histogram.

The Histogram and Stemplot Compared
A histogram is an alternative to a stemplot for displaying data. A stemplot is restricted by our number system to certain bin widths. However, a histogram is under no such restriction. With a histogram, you usually lose the actual data values. When constructing a histogram by hand, a decision about bin sizes and the number of bins has to be made when tabulating the data, and constructing a histogram by hand is a tedious process. A poor decision can result in a histogram that either gives misleading information about the data or fails to inform the viewer about some aspect of the data. A computer is of value here, as a variety of histograms can be constructed. Which histogram is preferred depends upon which aspects of a dataset are to be featured.

Histograms Worksheet

Datasets and Stories for Histograms
There is benefit in students using the same datasets for different analyses. It is efficient, as students don't need to acquaint themselves with a new story for each display. If they are using a graphing calculator the students don't need to enter a new set of data into the lists. (I recommend you read the article by Al Coons, Efficient Storing of Data on the TI-83, about storing data in a program for later use.) Another benefit is the opportunity to contrast the features of the data highlighted by each display. For these reasons I suggest the students use the datasets and stories on the stemplots worksheet when learning about histograms, as well as the data generated when the students played Greed!. Other datasets that are appropriate for histograms include Air Pollution, Bradmanesque, Oscar Winners, Speed of Light and Wild Horses. Follow the links to the datasets and from there to their stories.

Introducing Histograms
Give students a set of data and the accompanying story. One approach would be to give each small group a different dataset and story and have them produce the display (say on a graphical calculator), discuss within the group the characteristics of the data brought out by the display, and then report to the class. Students need to practise writing a short report on the interesting features brought out by a graphical display.

Analysing a Histogram
After the histogram is drawn, students should

- locate the approximate centre of the distribution by eye,
- determine the spread of the data, and look for potential outliers,
- note the overall shape of the distribution,
- look for any other features of interest such as clustering or gaps.

Students should realise that a dataset doesn't have a single histogram but many histograms, one for each choice of bin width. For 'nice' data, i.e. data that is symmetric and with no clustering or outliers, the set of histograms may all give the same general picture so the choice of histogram is not critical. With data that isn't so nice, different choices of bin widths may give histograms that look markedly different. For such datasets students will need to produce a variety of histograms, and then make and defend their choice as to which is 'best'.

Using Technology
As noted elsewhere, it is quite time-consuming to draw a histogram, especially to match the quality and accuracy of a histogram drawn by even the simplest computer statistics program. Drawing a single histogram by hand should be sufficient for the students to get a feel for the mechanics of drawing a histogram, so additional histograms should be constructed using a computer or a graphical calculator. Note that graphical calculators and computer statistics programs don't necessarily choose the best display by default and hence it is an unwise student who doesn't construct a few histograms of varying bin widths as part of their analysis.

These remarks apply equally to other graphical displays. It follows that students shouldn't be required to construct any of these displays by hand for assessment purposes - it is a trivial exercise, and possibly the least important task you could ask a student to do in statistics. Let the technology shine in its sphere (repetitive algorithmic processes) and let the students shine in their sphere (looking for patterns, and deciding what the data say).

Matching Histograms and Boxplots
Objective: Students will improve their ability to interpret the information given in a boxplot, by matching boxplots of sample data drawn from different distributions with their associated histograms.
Materials: One worksheet per student or small group.
Time: 20 minutes.
Instructions: Students are given a worksheet which contains a series of histograms in the left column and a series of boxplots in the right column. They are to match each boxplot with its associated histogram. They need to be able to defend their decisions in a subsequent whole class discussion. The distribution from which each sample was drawn is given in the solutions, for your interest, of course.
Note: the sample data for this worksheet was generated with a nifty little freeware Windows program called PQRS, which is available from the Resources page of the Exploring Data website.

Matching Histograms and Boxplots
Match each histogram with its corresponding boxplot, by writing the letter of the boxplot in the space provided.

[Worksheet figure: five histograms, numbered 1 to 5, each with a blank for the letter of the matching boxplot, and five boxplots, labelled A to E.]

Matching Histograms and Boxplots - Solutions
[Solutions figure: the samples were drawn from Normal, Geometric, Uniform and Weibull distributions; the pairing of each boxplot letter with its distribution is not legible in the source.]

Measures of Location
Do you think teaching about the mean, median and mode is boring? Well, maybe, but there are some interesting little side alleys to this topic that are worth exploring. Before I go into detail on these, I must say I was intrigued to see that the topic of finding an average in Statistics: Concepts and Controversies by David S. Moore is delayed until page 237! This illustrates two very important points - calculating summary statistics is a waste of time until the user decides what is important about the data and which summary statistics may be useful. Well maybe that is only one important point.

is obviously not very robust and hence is not particularly useful.

Interquartile Range
The interquartile range, while simple in concept, has caused much grief to introductory statistics teachers since different respectable sources define it in different respectable ways! The article Ticky-Tacky Boxes discusses the different methods of finding Q1 and Q3, in the context of constructing a boxplot, and makes a recommendation as to which is 'best'.

Mean Deviation
Until recently I was never able to satisfactorily answer the question, "The mean deviation is simple. Why is the standard deviation used rather than the mean deviation?" An email by Paul Gardner from Monash University gives a clear explanation, and is the basis of the article, I'm Not Mad about MAD.

Standard Deviation
The Measures of Spread worksheet contains some lovely questions on standard deviation, courtesy of Pat Ballew. In fact I would rate these questions as being at least 1.5 standard deviations above the average question.

STEPS
The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including summary statistics. Visit the STEPS page for further information and a list of the modules available.

Boxplots
The treatment of boxplots in current senior secondary textbooks highlights the need for Queensland high school teachers to use resources other than the textbook when teaching introductory statistics.

There are two basic flavours of boxplot. The 'simple' boxplot has the whiskers drawn out to the maximum and minimum values. While this is suitable for a quick analysis, it doesn't give as much information about the data as the standard boxplot, which draws the whiskers no longer than 1.5 IQRs from the box and locates points beyond that individually, if they exist. Such points are called outliers.

Unfortunately boxplots in our texts tend to only be simple boxplots. This shouldn't be surprising as the Mathematics A syllabus only makes mention of simple boxplots. Nonetheless statisticians almost exclusively draw boxplots with outliers, so I recommend that your students learn to draw standard boxplots as well as simple ones. The process isn't difficult, even if done by hand. The article How to Construct a Boxplot explains how to do it.

Even the TI-83 graphical calculator, the calculator of choice for AP-Statistics students in the US, draws boxplots two ways, and the default boxplot ignores outliers. It could have been worse - an early model of a graphical calculator available in Queensland used the mean rather than the median to mark the centre of the dataset.

When to Choose the Boxplot

When to Choose the Boxplot
Boxplots are most useful when comparing two or more sets of sample data. Differences in the centres and spread of the datasets are clearly visible with a boxplot. A boxplot also gives a picture of the symmetry of a dataset, and shows outliers very clearly. Both of these features are important when deciding which summary statistics would best describe the dataset. Prior to conducting a hypothesis test, a statistician looks at the data, and histograms and a boxplot would be the displays most often chosen. A condition of many hypothesis tests is that the data is approximately normally distributed and a boxplot can assist in determining this.

The Boxplots worksheet contains data drawn from physics, cricket and biology. The Codeine Concentrations worksheet has some data suitable for displaying using boxplots.

Matching Histograms and Boxplots
Students will improve their ability to interpret the information given in a boxplot by matching boxplots of sample data drawn from different distributions with their associated histograms.

Ozone and Outliers
The 'ozone hole' above Antarctica provides the setting for one of the most infamous outliers in recent history. It is a great story to tell students who wantonly delete outliers from a dataset merely because they are outliers. Visit the Ozone and Outliers page for all of the fascinating details.

STEPS
The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including the graphical display of data. Visit the STEPS page for further information and a list of the modules available.

There is assessment available from the Assessment page, where a variety of graphical displays, including boxplots, may be needed for a solution.

Constructing Boxplots
It is interesting that there is general agreement among statisticians about how to construct the whiskers and determine outliers (which is where the problem lies with our texts) but very little agreement on how to construct the box. Using the KISS principle, I teach students a method that is easy to remember and easy to do. Read the article How to Construct a Boxplot for details.

Ticky-Tacky Boxes
If you are interested in learning about different methods of calculating the 1st and 3rd quartiles (and the angst this has caused among AP-Stats teachers), you may find the article Ticky-Tacky Boxes interesting. The article is based on emails from the AP-Stat and Edstat mailing lists. Thanks to Bob Hayden, who has provided much of the information for this article. Warning - whatever you do, please don't try to tell your students about all of this. You will only confuse the cherubs.
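The disagreement over how to construct the box can be seen with Python's standard library, whose statistics.quantiles function offers two interpolation conventions, 'exclusive' and 'inclusive'. This is only an illustration using the nine masses from the textbook exercise quoted earlier on this site. Neither convention is claimed to be the one recommended in Ticky-Tacky Boxes, and box_edges is a hypothetical helper name.

```python
from statistics import quantiles

# Masses (kg) of nine Year-11 girls from the textbook exercise.
masses = [35, 47, 48, 50, 51, 53, 54, 70, 75]


def box_edges(data, method):
    """Return (Q1, Q3) using one of statistics.quantiles' two conventions."""
    q1, _median, q3 = quantiles(data, n=4, method=method)
    return q1, q3


for method in ("exclusive", "inclusive"):
    q1, q3 = box_edges(masses, method)
    print(f"{method}: Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```

For these data the exclusive convention gives Q1 = 47.5 and Q3 = 62, while the inclusive convention gives Q1 = 48 and Q3 = 54. The box changes, and because the whiskers depend on the IQR, so does the verdict on which points are outliers.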

The 1970 Draft Lottery
No discussion of boxplots should leave out the story of the 1970 Draft Lottery, the first lottery held to select those chosen to serve in Vietnam, which gave rise to possibly the single most famous set of boxplots in existence. Yours truly was given a free ticket in the lottery, so the story on this page is of uncommon interest to me. Based on what the boxplots show, it turns out that this October-born lad was even luckier than was thought at the time.

Normal Plots
The normal probability plot (sometimes called a quantile plot) is a useful tool for determining the normality of a dataset, which is one of the assumptions on which the t-test is based. Normal plots are not mentioned in the Mathematics A, B or C syllabus. However they are a valuable tool for determining if a dataset is normal. At a minimum a student studying the Probability and Statistics optional unit in Mathematics C should be able to interpret a normal plot, and preferably understand how to construct one. The article Normally I Wouldn't Reveal the Plot... introduces the normal plot, discusses why it is a valuable tool in your arsenal of graphical displays and gives you a recipe for making your very own plot.

Scatterplots

The Challenger Disaster
A worksheet that gives a brief background to the Challenger disaster and the dataset that gave warning of the disaster. A nice introduction to scatterplots and the importance of displaying data clearly.

Using a Scatterplot to Find a Friend
Peter Smith from Mechanicsburg High School in Pennsylvania shares this great introductory activity that helps students learn about scatterplots and correlation. Great fun.

3-D Scatterplot Java Applet
Given three columns of data, this site generates a VRML file for viewing the data in 3-dimensions, including 'flying' right in the middle of the data. You will need a VRML browser to view the scatterplot.

Anscombe's Datasets
Two overhead transparencies - the first has F.J. Anscombe's famous quartet of datasets and the second has the scatterplots of these datasets. Absolutely convincing proof of the need to look at the data first. It convinced me anyway. The scatterplots were produced using NCSS 6.0 Jr.

Assessment
As one person's worksheet is another's assignment, it is often rather difficult to classify a particular document as one or the other. Nonetheless the collection here consists of documents that were specifically intended to be assessment.

Bradmanesque (available in Word 2.0 only)

A lovely assignment that requires students to draw from their pool of knowledge about descriptive statistics.

The Age of Female Actor Oscar Winners
This dataset has some intriguing patterns. Students are asked to determine if the average age of female actor Oscar winners is increasing.

Pecking Order in Chickens
A researcher on animal behavior wants to study the relationship between pecking order and weight. He places four chickens in each of seven pens and observes the pecking order that emerges in each pen. As the researcher's assistant, you have been asked to analyse this data (and possibly generate some graphical displays) and write a report on the relationship, if any, between pecking order and weight.

Pricing Diamond Rings
Pricing diamond rings in Singapore can be viewed as an interesting exercise in statistical modelling. The price equals the current market value of the gold content of the ring, a craftsmanship fee plus the cost of the diamond. In this assignment the student tries to find a mathematical function which allows the price of a diamond ring to be determined from the size of the diamond.

Galileo's Gravity and Motion Experiments
This dataset may need some dusting off as it is over 400 years old! Galileo produced this data when he was studying motion under gravity. The assignment includes a graphic of some of Galileo's original notes.

Topics covered by these assignments: graphical display of data, summary statistics, curve fitting, linear regression.

More Stories
Here are some more stories, with their associated datasets available from the Datasets page. They are categorised for convenience. These can be turned into assessment items or further examples or exercises, as you desire.

Introduction to Business Statistics at Georgia State University
An absolutely enormous item bank of statistics questions, many of them multiple choice, from Georgia State University. As an indication of the size of the item bank the Descriptive Statistics section (one section of eight sections) has a filesize of about 150K.

AP Stats Assessment
A number of teachers of Advanced Placement Statistics (a first year tertiary statistics course taught in high schools, mainly in the US) maintain websites with worksheets, datasets and assessment. The AP-Statistics course covers all of the statistics in Maths A, B and C and more.
Topics: various

Al Coons at Buckingham Browne & Nichols School

Al's website is a real treasure for anyone teaching AP Statistics for the first time. Follow the link to Projects/Student Papers. Note that some of the projects are outside of our syllabus areas (eg Chi-Square).

Paul Myers at Woodward Academy
Follow the link to Assessment. Paul is posting his tests in html format. His complete set of tests from 1997 is currently available.

Datasets
These datasets support the activities, worksheets, assessment and articles in the Exploring Data website. Datasets are available in three formats - Excel 4.0, NCSS Jr. 6.0 and Tab Delimited.

Notes: Clicking on the name of the dataset will give you the story behind the dataset. NCSS Jr 6.0 datasets require that you download two files, with extensions .s0 and .s1.

Students should work with real data gathered by others for purposes of solving problems. But they should also gather data themselves. The article Student-Generated Data discusses a few ways that this can be done.

Dataset
1970 US Draft Lottery
1971 US Draft Lottery
AIDS / HIV
Air Pollution
Alligator!
Anscombe's Dataset
Bradman - an Outlier?
Bradmanesque
Carbon Dioxide
Carbon Emissions
Challenger
Cloud Seeding
Codeine Concentration
Cricket (the Insect)
Density of the Earth
Density of Nitrogen
Diamond Rings
Fleas
Topics: boxplot, dotplot, scatterplot, graphical display, data display, summary statistics, curve fitting, linear regression, nonlinear regression, time series, t-test
Formats: Excel, NCSS, Tab

Galileo's Experiments
Global Temperature 1
Global Temperature 2
Metric Estimates
Oil Production
Old Faithful
Olympic Gold
Oscar Winners
Pecking Order
Pottery
Smoking and Cancer
Speed of Light
Stride Rate
Wild Horse
World Population
Year 10 Certificates
Topics: boxplots, histogram, scatterplot, graphical display, summary statistics, curve fitting, regression, linear regression, polynomial regression, exponential regression

Linear Regression
The study of functions in Maths B can be enriched by including authentic applications which illustrate how mathematics can model aspects of the world. In real life functions often arise from data gathered from experiments or observations, and such data rarely falls neatly into a straight line or along a curve. There is variability in real data that needs to be explained and measured, and it is the task of the student to find the function that best 'fits' the data in some sense. The first functions we study in Maths B are linear, so it makes sense to start with problems whose data are linear in nature.

Using Statistics in Human Movements
One measure of form for a runner is stride rate, defined as the number of steps per second. A runner is considered to be efficient if the stride rate is close to optimum. The stride rate is related to speed: the greater the speed, the greater the stride rate. This article gives a fully-worked solution to finding the stride rate as a function of speed using the statistics functions of the TI-83 graphics calculator.

Looking at the Data
When fitting a function to data, the student MUST first plot the data, and this activity shows why. F.J. Anscombe invented these datasets to demonstrate the importance of graphing the data before finding the correlation and line of regression. They present a very striking picture.
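Anscombe's point can be checked numerically. The sketch below fits a least-squares line to the first two of his four datasets (values as published in his 1973 paper). The helper fit_line is a hypothetical name written for this illustration and is not taken from any of the articles on this site.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return slope, intercept, r


# Anscombe's datasets I and II share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

for name, ys in (("I", y1), ("II", y2)):
    slope, intercept, r = fit_line(x, ys)
    print(f"Dataset {name}: y = {intercept:.2f} + {slope:.2f}x, r = {r:.2f}")
```

Both fits come out at roughly y = 3 + 0.5x with r near 0.82, yet a scatterplot shows dataset I scattered about a line and dataset II following a smooth curve. The numbers alone cannot tell the two apart, which is why the plot has to come first.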