SOURCE: http://exploringdata.


Welcome to the Exploring Data website. This website provides curriculum support materials for teachers of introductory statistics.

TABLE OF CONTENTS

Read Me First!
What's here, and how to find it. Also copyright information.

Introduction to Exploring Data
Statisticians do it! And so should you.

Looking for Patterns
The most valuable feature of a dataset may be that which is unexpected. Looking at the data in a variety of ways may reveal interesting and surprising patterns.

Stemplots
All you need to know about this useful graphical display. Activities, worksheets and extension material are available from this page.

Dotplots
Dotplots are often the neglected cousin in the family of graphical displays of data. But they are easy to construct and can tell us much about a dataset.

Histograms
Histograms are very useful, but care needs to be taken in constructing and interpreting them. Remarkably, research about histograms is still being conducted.

Measures of Location
So you think the mean, median and mode are boring? Well, maybe, but there are some interesting little side alleys to this topic that are worth exploring.

Measures of Spread
Visit here, and you will learn about some very useful statistics.

Boxplots
Visit this page and you may possibly learn more than you ever wanted to know about boxplots.

Normal Plots
Not included in Queensland syllabuses, a normal plot of a dataset shows at a glance if a dataset is approximately normally distributed. A very useful display to put in your display cabinet.

Scatterplots
When working with bivariate data, the absolutely first thing a statistician does is construct a scatterplot. So the absolutely first thing a student working with bivariate data should do is construct a scatterplot.

Assessment
Some nice assignments and test questions are available here.

Datasets
Students should work with real data. So here is some, available in tab-delimited, Excel 4.0 and NCSS 6.0 Jr formats.

Linear Regression
Linear regression has the potential to integrate the topics of Introduction to Function and Applied Statistical Analysis in Maths B. This page explains how it can be done.

Normal Distribution
Discover the link between the 1.5*IQR Rule and the normal distribution, and why you should use light bulbs to burn those traditional statistics textbooks.

Probability
Probability is a wonderful subject to teach! There are so many activities for teaching concepts, puzzles and problems with non-intuitive answers and a variety of contexts for the exercises. This page contains a small collection.

Sampling
Contains a nice little activity - the JellyBlubbers - which was modified from a problem in a leading textbook which was modified from an activity in Activity Based Statistics. And the original activity started as a bucket of rocks on the desk of a statistics teacher. An activity with impeccable lineage.

Confidence Intervals
Are you 95% confident that you can correctly teach your students the correct meaning of confidence intervals?

Hypothesis Testing
Teaching students to understand hypothesis testing is a difficult business indeed. This page contains some activities that will give students some hands-on experiences with the underlying concepts.

Curve Fitting
Contains Anscombe's famous dataset, and a comprehensive manual on using technology to fit functions to data.

Read This First!

This website is an outcome of the 1997 Raybould Tutorial Fellowship. Each year the Raybould Fellow spends the second semester on a project to support senior secondary mathematics in Queensland. For my project I have chosen to provide curriculum support for the topic of Exploring Data and have chosen a website rather than a booklet as my mode of publication.

Resources

This website contains activities, worksheets, overhead transparency masters, datasets and assessment to support data exploration. It also contains an extensive collection of articles designed to enhance the statistics knowledge of the teacher. There is a resources page that gives a select list of the finest resources available to support introductory statistics, including texts, websites, datasets, java applets and mailing lists.

To make the resources accessible to a wide range of people, the majority of the resources are available as web pages and as Word 2.0 documents. A web page can be accessed by pointing to one image, while the other image leads to a Word 2.0 document.

FYI

Many secondary mathematics teachers in Queensland completed their formal study of statistics years ago, or have never studied statistics. Their knowledge of modern statistics may not extend much beyond what is in the high school textbooks and thus they may feel uncomfortable when a more inquisitive student wants to know more about a topic than what is in the text. To assist these teachers, many pages of this website include a section headed FYI (For Your Information) that contains articles that discuss topics beyond those listed in the syllabuses. Many of the articles originated from discussions between the knowledgeable statistics educators who populate two Internet mailing lists devoted to statistics education.

Datasets

Datasets will be available in three formats:

Tab delimited
This is a format that any spreadsheet or statistics program should be able to read. This format is also suitable for uploading datasets to the TI graphics calculators.

Excel 4.0
Excel is widely used in Queensland high schools.

NCSS Jr. 6.0
NCSS is a professional statistics program written by Jerry Hintze. NCSS Jr. 6.0 is a 'light' version of the full NCSS 6.0 program. It provides excellent software support for all of the statistics in Mathematics A, B and C. It has an Excel-like interface, is easy to use and is absolutely free! Click here and follow the link to the downloads page to download it now.

Copyright

Most of the resources on this site are the property of Education Queensland. You are welcome to use the resources on this site freely for educational purposes if you acknowledge Education Queensland as the owner of the resource. If the resource you wish to use contains an acknowledgement (e.g. a dataset from another source), then you should acknowledge that source also. You are not permitted to sell these materials for commercial gain without the express written consent of Education Queensland.

Exceptions to this are material that is owned by another person or organisation and for which permission has been granted for use on this site. If you wish to use these resources you must seek permission of the owner of the copyright.

Introduction to Exploring Data

Statistics is a fascinating subject, to both learn and teach! It is also an important one, as we are bombarded with statistics every day of our lives. Knowledge in the subject allows us to make informed judgements about the statistics presented by others to persuade us. It is worth noting that at the tertiary level more students study statistics than study calculus subjects. As teachers we need to give our students an understanding of the place of statistics in modern society, an interest in the subject and a solid grounding for further study.

Setting the Scene (from How to Make Statistics Boring)

Ah! I'll bet some of you thought, "Hey, it doesn't need to be made boring, statistics is already boring." Given the sort of statistics to which we've been subjecting ourselves and our students over the years, such an attitude would not be surprising.

Consider the following exercise on constructing a boxplot, which is from a popular Maths A text.

Construct a box-and-whisker graph for the following data which are the masses in kilograms of nine Year-11 girls:
35 47 48 50 51 53 54 70 75

If this exercise doesn't convince you that statistics can be boring, there are many more where this came from. This was chosen only because it is a typical example of the statistics that many of us are teaching our students. I am not picking on this particular textbook. All of the Maths A and B texts that I have examined are loaded with similar examples.

There are other things wrong with this exercise, other than the fact that it is boring. Actually, despite the title, boring is not the most important issue. The data are fake. It is trivial. It is pointless.

Source: Boggs, R. (1996). How to Make Statistics Boring. Teaching Mathematics, QAMT, Brisbane.
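For readers who want to check the arithmetic behind the textbook exercise quoted above, here is a minimal sketch of the five-number summary a box-and-whisker graph is built from. It assumes the median-of-halves convention for quartiles (textbooks and software differ on this), and the function name is only illustrative. It also applies the 1.5*IQR rule mentioned in the table of contents.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # values above it (middle value skipped if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

masses = [35, 47, 48, 50, 51, 53, 54, 70, 75]   # kg, from the exercise above
low, q1, med, q3, high = five_number_summary(masses)

# 1.5*IQR rule: values beyond these fences would be flagged as outliers
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outliers = [m for m in masses if m < fences[0] or m > fences[1]]

print(low, q1, med, q3, high)
print(fences, outliers)
```

Under this convention no value falls outside the fences, so the whiskers would simply run to the smallest and largest masses.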

Themes

One of the first tasks of the statistician when analysing a set of data is viewing the data in a variety of ways, both graphically and numerically, looking for intriguing patterns, unusual observations and the general characteristics of the dataset. This aspect of statistics is a focus of this website.

This website has four underlying themes:

• Data should be central to the study of statistics, and our students should study real problems with real data.
• Students need to be actively involved in the study of statistics. Whenever possible, a collaborative, activity-based approach to statistics should be used.
• Computers and graphing calculators have an essential role in statistics as they excel at drawing graphs and doing calculations. Students should be concentrating on the underlying concepts, looking for patterns and notable features in data, learning to make appropriate decisions on the choice of summary statistics and analyses, and justifying these decisions.
• Teachers must know more statistics than that outlined in the syllabus or contained in the textbook. This website has extensive extension material which can give the teacher a broader background to the subject.

From the AP-Statistics Guidebook

I thought this was nicely written, so I will share it with you.

Exploratory analysis of data makes use of graphical and numerical techniques to study patterns and departures from patterns. In examining distributions of data, students should be able to detect important characteristics, such as shape, location, variability, and unusual values. From careful observations of patterns in data, students can generate conjectures about relationships among variables. The notion of how one variable may be associated with another permeates almost all of statistics, from simple comparisons of proportions through linear regression. The difference between association and causation must accompany this conceptual development throughout.

Statistics and Chocolate

One of my favourite Internet publications is ZiMaths, which is published by the University of Zimbabwe and which is meant to be 'a device that would educate Zimbabwean schoolkids about the virtues of real maths, as opposed to the spiceless variety taught in most schools'. An article that threads its way through the first three issues follows a bar of chocolate from its raw materials to its marketing, noting how statistics is used in assisting the process. It gives a nice real-world example of queueing, process control, sampling, calibrating machinery, surveying and forecasting.

And I'm the Point 3!

Jane Watson from the University of Tasmania talks on the Australian Broadcasting Corporation's Radio National program about the need for statistical literacy in the Australian community.

Looking For Patterns

On January 28, 1986 the space shuttle Challenger exploded. Seven astronauts died because two rubber O-rings leaked during takeoff. These rings had lost their resiliency because of the low temperature at the time of the flight. The air temperature was about 0° Celsius, and the temperature of the O-rings about 6 degrees below that.

The link between O-ring damage and ambient temperature had been established prior to the flight. The engineers at Morton Thiokol, Inc had recommended that the flight be delayed, but they failed to display the link between ambient temperature and O-ring damage in a clear and unambiguous fashion to those making the final decision. The launch proceeded with disastrous consequences.

A simple scatterplot showing the link between O-ring damage and ambient temperature during previous launches may have changed the decision about launching. How much damage would you have expected at 0° Celsius?

adapted from: Tufte, E.R., Visual Explanations

An Unusual Incident

High school textbooks often start a unit on statistics with the calculation of the mean of a set of numbers (usually made up, and put into a trivial setting, but that is a different issue). For a number of reasons, that is the wrong place to start an analysis of data. And it certainly isn't what a statistician does. After gathering data, statisticians like to look at the data in as many ways as possible. Any unusual or interesting patterns in the data should be flagged for further investigation.

An Unusual Incident is an engaging activity to introduce a unit on statistics. Students are asked to discover the many intriguing patterns in the data and hence to deduce what the incident was.

Weather Data for Capital Cities
Weather Data for Queensland Towns

These are follow-up activities to An Unusual Incident. For each location the mean monthly maximum and minimum temperatures and the median monthly rainfall are given in graphical form. Students are asked to match each city or town with its climate chart. The Queensland Towns activity also contains the original data in table format. Due to the large number of graphics these activities are only available as Word 2.0 documents.

The Six Characteristics of a Dataset

From the Exploring Data website - http://curriculum.qed.qld.
© Education Queensland, 1997

Once some data have been gathered, the first step in working with the data is to look at it in a variety of ways. These six characteristics of a dataset are a good starting point in analysing a dataset, although a fuller analysis extends to looking for unexpected anomalies and patterns in the data. For example, in the Metric dataset which consists of forty-four estimates of the width of a lecture hall, there are a large number of estimates of 10 m and 15 m. This is almost certainly due to subjects rounding off their estimate and is a feature more of our number system than the size of the hall.

The Old Faithful Dataset

The Old Faithful dataset has some interesting features and hence will be the example used in this article. Old Faithful is a geyser in Yellowstone National Park in Wyoming. The graphical displays below are based on 222 measurements of the duration of the geyser, in minutes.

Shape

The shape of a dataset will be the main factor in determining which set of summary statistics best summarises the dataset, so it should be the first characteristic to be noted. Shape is commonly categorised as symmetric, left-skewed or right-skewed, and as uni-modal, bi-modal or multi-modal.

The shape of the Old Faithful dataset is bi-modal. Note that both the histogram and the dotplot do a good job of showing this while the boxplot doesn't indicate this at all.

Note that the three displays complement each other in the information they provide about the data.

Location

Statisticians often use the term 'location' for what Queensland texts often call the 'measure of central tendency'. 'Location' is both simpler and more descriptive than 'measure of central tendency', so it is the term I've adopted for this website.

Common measures of location are the mean and median. Less common measures of location are the mode (the most frequent value), the mid-range (the value midway between the minimum and maximum values) and the truncated mean (where a fixed percentage of the largest and smallest scores are deleted from the dataset and the mean of the remaining data is calculated).

When initially examining a dataset only an approximate location is needed, often just estimated by eye. After further analysis the choice of measure of location should become clearer.

For the Old Faithful dataset, I would say none of these are a good measure of location! The mean is 3.6 and the median is 4, and it is fairly obvious that these values tell us very little about the data. A more sophisticated description of location would be to say that the data is bi-modal with one peak about 2 and the other about 4.5. If more accurate values are wanted then the dataset could be broken into two sections and the mean or median of each section calculated independently.

This example illustrates an important point - blindly following a procedure will not always give the best results. Looking at the data and using judgement about how to describe the location of the data are needed.

Spread

This is a measure of the amount of variation in the data. Common measures of spread are variance, standard deviation and the interquartile range. Less commonly used is the range, as it is not very robust. Again, an approximate value is sufficient initially, with the choice of measure of spread being informed by the shape of the data, and its intended use. Either the standard deviation or the interquartile range could be used, depending on which measure of location was chosen.

For the Old Faithful dataset the standard deviation doesn't give a good picture of the spread of the data, as it usually is used when the data is approximately normally distributed, or at least uni-modal and reasonably symmetric. The interquartile range again is unsatisfactory as it doesn't give a true picture of how the data is distributed. Probably the best description of the spread would be found by dividing this dataset in two sections, and discussing the spread of each section.

Outliers

Outliers are data values that lie away from the general cluster of other data values. Each outlier needs to be examined to determine if it represents a possible value from the population being studied, in which case it should be retained, or if it is non-representative (or an error) in which case it can be excluded. It may be that an outlier is the most important feature of a dataset.

There is a true story that the ozone hole above the South Pole had been detected by a satellite years before it was detected by ground-based observations, but the values were tossed out by a computer program because they were smaller than thought possible. Read the Ozone and Outliers article at this website to learn more about this fascinating story.

The best choice of display when looking for outliers is the boxplot. A glance at the boxplot of the Old Faithful dataset shows that this dataset contains no outliers.

Clustering

Clustering implies that the data tends to bunch up around certain values, eg annual wages for a factory may cluster around $20 000 for unskilled factory workers, $35 000 for tradespersons and $50 000 for management. Clustering shows up most clearly on a dotplot.

The Old Faithful dataset shows two clusters centred around 2 minutes and 4.5 minutes.

Granularity

Granularity implies that only certain discrete values are allowed, eg a company may only pay salaries in multiples of $1,000. A dotplot shows granularity as stacks of dots separated by gaps. By default, discrete data has some granularity as only certain values are possible. Continuous data can show granularity if the data is rounded.

The Old Faithful dataset shows evidence of granularity. By examining the original data it becomes clear that this is the result of the data being rounded to one decimal place and is not a feature of the data itself.

Other Features

With the availability of computers and low cost statistics software it is possible to calculate summary statistics and generate graphical displays very rapidly. One strong argument for the need to use computers and graphing calculators when studying statistics is the necessity of viewing the data in a variety of ways. Without technology to draw the graphs this would be impossible to do efficiently.

The choice of bin width of a histogram can markedly alter the apparent shape of the data, especially if the data is not uni-modal. As they are so quick to generate, it may be worth our while looking at some alternative histograms to see what they show.
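The effect of bin width on apparent shape can be sketched in a few lines. The values below are hypothetical (invented for illustration, recorded as whole numbers of tenths of a minute); they are not the Old Faithful measurements, and the function name and the starting point of 15 are assumptions made for this example only.

```python
def bin_counts(values, width, start):
    """Count how many integer values fall in each bin of the given width.

    Bins are [start, start + width), [start + width, start + 2*width), ...
    Assumes every value is an integer >= start.
    """
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1
    return counts

# Hypothetical two-cluster data, invented for illustration only
values = [17, 18, 19, 20, 20, 21, 23, 41, 43, 44, 45, 45, 46, 47, 48]

for width in (5, 10, 20):
    print(f"bin width {width}")
    for i, count in enumerate(bin_counts(values, width, start=15)):
        left = 15 + i * width
        print(f"  {left:>2}-{left + width - 1:<2} {'#' * count}")
```

With narrow bins the gap between the two clusters is obvious; with a bin width of 20 the same values collapse into two bars of nearly equal height and the gap disappears.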

The choice of bin width (and hence the number of bins) does change the appearance of the histogram. The bins may be too large or too small to properly display the distribution of the data. Which one 'best' gives a true picture of the data is subjective. It is a worthwhile exercise to give students a dataset that is not unimodal and ask them to choose the best histogram and then defend their decision.

Stemplots

The purpose of displaying data graphically is to give a visual display of the interesting and important features of the dataset. Which particular displays are best is not a question that can necessarily be answered before the data is viewed, hence a statistician will view the data in different ways.

Constructing a stemplot is often the first step in analysing a dataset. A stemplot shows the shape of the distribution and indicates whether there are potential outliers, and helps to determine what analysis is appropriate.

Stemplots are useful for displaying small datasets with only positive values. They are quicker and easier to construct by hand than histograms. They are also appropriate if it is important to retain the original data. The choices of 'bin width' are limited, so for some datasets that otherwise meet the criteria, a stemplot may not be very useful. For these datasets, a histogram is preferable.

How to display a particular dataset with a stemplot often requires judgement. How to split the stems, how to represent outliers and whether to truncate the data are decisions that often have to be made. The main point is that the plot should quickly inform us about the salient features of the dataset.

Once a stemplot is constructed, students should consider these questions:

What is the location of the data?
How much is the data spread out?
What is its overall shape, is it symmetric, skew or bi-modal?
Are there any unusual data values such as outliers?
Is there evidence of clustering?

Students should also note any unusual features of the dataset not highlighted by the above questions.

Constructing a Stemplot

If you have never constructed a stemplot, visit the webpage Greed!. It is a nice activity where amongst other things students will learn how to construct a stemplot.

Advanced Stemplots

Some might say 'advanced stemplots' is an oxymoron. Nonetheless, there may be times when a stemplot is desired and constructing it involves a greater effort than usual. The worked solutions to the worksheet Advanced Stemplots illustrate some common practices.

Advanced Stemplots

'Advanced stemplots' is really a contradiction - stemplots by their nature should be simple to construct. However with some datasets it may be necessary to split the stems, and with others to truncate the data. There is also the decision on what to do with outliers - should we include a large number of empty rows between an outlier and its nearest value? Finally how do we handle very large and very small numbers?

An advanced stemplot includes one or more of these features:

Split stems
One purpose of a stemplot is to display the shape of the distribution. To achieve a satisfactory display of some datasets, the stem is best split into two parts, with one part containing unit values from 0 to 4 and the other part from 5 to 9. Other datasets may benefit if the stem is split into five parts: 0-1, 2-3, 4-5, 6-7, 8-9. It is a matter of judgement when to adopt this approach. For example one row may have many more elements in it than the other rows. The student might ask, 'Is this a random occurrence or is it a relevant feature of this dataset?' Often answering such questions isn't easy.

Truncated data values
For data with a large number of significant digits, it is common to decide how many digits are needed and then truncate the data. Data is truncated rather than rounded as it is easier to do. Not much is lost by doing this, as the essence of the original data is still retained.

Outliers
Imagine a dataset that contains an extreme outlier. It isn't sensible to extend the stem to include the outlier, which means including row after row of empty stems at the top or bottom of the stemplot as appropriate. Most computer-generated stemplots display the outlier as a data value outside of the stemplot proper, and labelled as HIGH or LOW.

Scaling
If the values to be plotted are extremely large or extremely small the data has to be scaled.
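The split-stem and truncation ideas above can be sketched as a small function. This is only an illustration under stated assumptions: it handles non-negative integers, takes an integer `unit` (the place value of the leaf digit), prints the stem digit on every row, and does no special handling of outliers. The function name, parameters and sample values are hypothetical.

```python
def stemplot(values, unit=1, parts=1):
    """Return the rows of a text stemplot for non-negative integers.

    unit  -- place value of the leaf digit; lower digits are truncated, not rounded
    parts -- rows per stem: 1 (no split), 2 (leaves 0-4, 5-9) or 5 (0-1, 2-3, ...)
    """
    if parts not in (1, 2, 5):
        raise ValueError("parts must be 1, 2 or 5")
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v // unit, 10)            # v // unit truncates
        key = stem * parts + leaf // (10 // parts)    # which row the leaf belongs to
        rows.setdefault(key, []).append(leaf)
    lines = []
    for key in range(min(rows), max(rows) + 1):       # keep empty rows so gaps show
        leaves = "".join(map(str, rows.get(key, [])))
        lines.append(f"{key // parts:>3} | {leaves}".rstrip())
    return lines

print("\n".join(stemplot([12, 15, 21, 38])))             # plain stemplot
print("\n".join(stemplot([12, 15, 21, 38], parts=2)))    # each stem split in two
print("\n".join(stemplot([436, 495, 575], unit=10)))     # last digit truncated
```

Keeping the empty rows is a deliberate choice: dropping them would hide gaps in the data, which are often the feature of interest.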

NCSS has chosen a two-digit stem with single digit leaves.01 = 5.65 5. scales the data to remove decimal points. While the density of the earth is obviously not uniform.01 Example: 1 |2 Represents 0.46 5.07 5. representing 4.26 5.34 5.29 5.85 5. counting in from each end. and gives the number of entries in that row. The units are grams / cm3.62 5.12 .88 5. Henry Cavendish measured the density of the earth using an instrument called a torsion balance.29 5. Multiply this by the unit (. The entry in brackets locates the row that contains the median.42 The outlier is labelled as µLow¶ and the entire value (407.79 5.5 5. Unit = . 54 | 2 represents a value of 542. For example.01) to return the original value: 542 x .be scaled. Density Measurements 5.07) is given in the 'Leaves' column.42 5. the value of the mean density is important in determining the earth¶s composition. Stem-Leaf Plot Section of Density Depth Stem | Leaves Low | 407 2 2 2 3 7 12 (4) 13 8 4 2 48 | 8 49 | 50 | 51 | 0 52 | 6 7 9 9 53 | 0 4 4 6 9 54 | 2 4 6 7 55 | 0 3 5 7 8 56 | 1 2 3 5 57 | 5 9 58 | 5 6 The scale is given at the bottom.75 4. For example NCSS Jr.27 5.39 5. Three Examples The Density of the Earth Dataset In 1798.0 Jr stemplot of this display along with some comments.63 5.58 5.57 5.44 5. by multiplying or dividing by a power of 10. Comments The Depth column records the cumulative number of data values. For this dataset NCSS multiplies each value by 100 to remove the decimal point.86 Here is the NCSS 6.55 5.1 5.34 5.3 4.53 5.47 5.61 5.36 5.
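The Depth column described in the comments above (cumulative counts from each end, with the median row shown in brackets) can be reproduced from the number of leaves in each row. The sketch below is an assumption about how such a column can be computed, not a description of NCSS internals; the row counts are read off the density stemplot, with the Low outlier counted as its own row.

```python
def depth_column(counts):
    """Return the depth entry for each stemplot row, given leaves per row.

    Rows before the median count in from the top, rows after it count in
    from the bottom, and the row holding the median shows its own count
    in brackets.
    """
    n = sum(counts)
    depths, cum = [], 0
    for c in counts:
        below = cum          # values in earlier rows
        cum += c
        above = n - cum      # values in later rows
        if c > 0 and below < n / 2 and above < n / 2:
            depths.append(f"({c})")                  # this row contains the median
        else:
            depths.append(str(min(cum, n - below)))  # count from the nearer end
    return depths

# Leaves per row in the density stemplot: Low, then stems 48 to 58
density_counts = [1, 1, 0, 0, 1, 4, 5, 4, 5, 4, 2, 2]
print(depth_column(density_counts)[1:])   # the Low row has no depth entry in the display
```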

Since the original data are retained. The data exhibits two peaks. The number of eggs produced by the fleas over 27 consecutive days is given below. The true width of the hall was 13. in order to study the egg production of the flea. | 89 1* | 0 0 0 0 0 0 1 1 1 1 T | 22333 F | 44455555555 S | 6667777 . which are due to students choosing 10 and 15 more often than numbers near to those. and µ. This is a common method of splitting stems.The Metric Dataset Shortly after metric units were introduced in Australia.27 . There are four high outliers which are given in the 'High' row at the end of the stemplot. It is a reflection of our number system and the rounding inherent in estimation. the reason for the two peaks can be determined from the stemplot. Source: Introduction to the Practice of Statistics. The labels used are as follows: '*¶ represents 0-1 µT¶ represents 2-3 µF¶ represents 4-5 µS¶ represents 6-7. 38. The Fleas Dataset Researchers at the Purdue University School of Veterinary Medicine deposited 25 female and 10 male fleas in the fur of a cat. a group of 44 students was asked to guess. David Moore and George McCabe. NCSS Jr has split the stems into five parts.1 metres. Guesses (Metres) 8 14 17 9 14 17 10 14 18 10 15 18 10 15 20 10 15 22 10 15 25 10 15 27 11 15 35 11 15 38 11 15 40 11 16 12 16 12 16 13 17 13 17 13 Stem-Leaf Plot Section of Guess Depth Stem Leaves 2 12 17 (11) 16 9 7 6 5 . | 88 2* | 0 T | 2 F | 5 High | 27. 40 Unit = 1 Example: 1 |2 Represents 12 Comments To achieve the best display. to the nearest metre.¶ represents 8-9. p. 35. the width of the lecture hall in which they were sitting.

Day           1   2   3   4   5   6   7   8   9  10  11  12  13  14
No. of eggs 436 495 575 444 754 915 945 655 782 704 590 411 547 584

Day          15  16  17  18  19  20  21  22  23  24  25  26  27
No. of eggs 550 487 585 549 475 435 523 390 425 415 450 395 405

Stem-Leaf Plot Section of No_Fleas

Depth   Stem | Leaves
   2       3 | 99
   9       4 | 0112334
  13       4 | 5789
  (3)      5 | 244
  11       5 | 57899
   6       6 |
   6       6 | 5
   5       7 | 0
   4       7 | 58
        High | 91, 94

Unit = 10   Example: 1 |2 Represents 120

Comments

The stemplot shows the data has two clusters.

The last digit of the data was truncated. While some detail from the original data is lost, the stemplot is easier to read.

This stemplot shows an alternative method of displaying split stems. Instead of using the symbols '*' and '.', the leading digit was repeated. Both methods are common, though the '*' and '.' are traditional.

The two high outliers are listed at the bottom of the stemplot to two significant figures.

As the data was gathered over time a timeplot should also be constructed as this shows how the number of eggs changed over time.

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including the graphical display of data. Visit the STEPS page for further information and a list of the modules available.

Dotplots

A traditional dotplot resembles a stemplot lying on its back, with dots replacing the values on the leaves. It does a good job of displaying the shape, location and spread of the distribution, as well as showing evidence of clusters, granularity and outliers. For smallish datasets a dotplot is easy to construct, so the dotplot is a particularly valuable tool for the statistics student who is working without technology.

Here is an assessment item from a test on Al Coons' website to illustrate these features. His website supports AP (Advanced Placement) Statistics, a course designed to give successful high school students university credit for introductory statistics.

Two machines, C1 and C2, are making pins which must have a diameter of 8 cm ± .01 cm or they are rejected. Dotplots of 50 pins from each machine are displayed below. They are both on the same scale.

1. By simply looking at the dotplots, without doing any calculations or counting, compare C1 and C2 in light of "the six features that are often of interest when analyzing a distribution of data", i.e. centre, variation, symmetry, outliers, clustering and granularity.
2. In what sense is machine C1 'better' at producing pins? Justify your argument.
3. In what sense is machine C2 'better' at producing pins? Justify your argument.
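The traditional dotplot described above, a stack of dots over each value on a common scale, can be sketched as plain text. The sample values are hypothetical whole numbers invented for illustration (they are not the pin diameters from the assessment item), and the function name is an assumption.

```python
from collections import Counter

def dotplot(values):
    """Return the rows of a text dotplot for a small set of integers."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)   # include empty positions so gaps show
    height = max(counts.values())
    lines = []
    for level in range(height, 0, -1):           # build the stacks from the top row down
        row = "".join(" * " if counts[x] >= level else "   " for x in axis)
        lines.append(row.rstrip())
    lines.append("".join(f"{x:^3}" for x in axis).rstrip())   # the scale
    return lines

sample = [3, 4, 4, 5, 5, 5, 7]   # hypothetical values
print("\n".join(dotplot(sample)))
```

The empty column above 6 is what a gap looks like on a dotplot; stacks separated by regular gaps would indicate granularity.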

An Alternative Method of Constructing a Dotplot

Here is a dotplot from NCSS 97 of the time between eruptions from the Old Faithful dataset. As there are over two hundred data values it would not have been feasible to use a more traditional dotplot.

This plot displays the scale along a vertical axis. The value of each dot is given by its vertical component. The horizontal component is randomised so that not all points are plotted at exactly the same location. The darker points represent two or more values plotted at the same location.

Which characteristics of the dataset does this dotplot highlight?

This dotplot shows that the data is bimodal, and gives a good feel for the spread of the data. There is some granularity evident, and there are no outliers. This type of dotplot doesn't give a good feel for the shape of the distribution of the data or allow the student to accurately estimate the location of the centre.

For many real datasets a single type of display doesn't suffice, but each display adds to the overall picture that we are trying to form. Access to statistics software is vital if the student is to generate these displays without getting bogged down in this stage of the analysis.

Histograms

As a teacher of junior maths and Maths in Society, I used to think that a histogram was a rather trivial statistical object, sort of a bar graph with the gaps removed to save space. I never realised that statisticians actually find histograms to be useful!

A modern data-centred approach to statistics starts with viewing the data in a variety of ways. What is meant by viewing the data? Features of interest to a statistician are the overall shape of the data, symmetry, the location and the spread, existence of outliers and evidence of clusters or gaps. A histogram with a scale on the horizontal axis is generally useful for showing all of these features, though for some distributions the features of a dataset can be disguised or distorted due to a particular choice of bin width.

One application of the humble histogram is determining if a set of data is approximately normally distributed. Normality is a pre-condition for certain analyses of data, including many hypothesis tests. While there are formal tests of normality, often a quick look at a histogram of the data is sufficient, though a histogram is most effective for this purpose if the dataset is large. And no statistician would rely strictly on formal tests without viewing the data also.

Histograms and Stemplots Compared

A histogram shows much the same information as a stemplot, though for a given dataset one or the other of these methods of displaying the data may be preferable. Some points to note:

• Histograms are preferable for larger datasets as stemplots become unwieldy.
• With histograms, unlike the stemplot, the original data are usually lost, therefore stemplots are preferable for a small dataset.
• Histograms take more time than a stemplot to construct by hand.
• For a histogram, the choice of bin size or number of bins is not restricted.

Bin Width

Statistics computer programs and graphical calculators will generate a default histogram if bin width or the number of bins is not specified. It is interesting that there is no clear winner in the choice of algorithm used for choosing the number of bins or the bin width. The article How Wide Is Your Bin? contains an interesting thread (i.e. a discussion topic) from the Ed-Stats mailing list.

Matching Histograms and Boxplots

Students will improve their ability to interpret the information given in a boxplot by matching boxplots of sample data drawn from different distributions with their associated histograms.

Matching Histograms and Summary Statistics

Students will improve their ability to visualise the shape of a distribution given the summary statistics.

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including the graphical display of data. Visit the STEPS page for further information and a list of the modules available.

The Density Trace

With the widespread use of computers in modern statistics, new methods of displaying a dataset have been invented. NCSS 6.0 Jr allows the user to add a display called a density trace to a histogram. It is displayed as the curved line in the diagram. The density trace can be thought of as a smoothed histogram in which the problems caused by fixed bin widths are obviated. The article The Density Trace (which is the NCSS Jr help file on this topic) discusses this plot further.

The Histogram and Stemplot Compared

A histogram is an alternative to a stemplot for displaying data. A stemplot is restricted by our number system to certain bin widths; a histogram is under no such restriction. However, you usually lose the actual data values, and constructing a histogram by hand is a tedious process. A computer is of value here, as a variety of histograms, each with a different bin width, can be constructed. Which histogram is preferred depends upon which aspects of a dataset are to be highlighted.

Beware the Humble Histogram!

Ideally a histogram should show the shape of the distribution of the data. For some datasets, the choice of bin width can have a profound effect on how the histogram displays the data. To see this for yourself, have a look at the Histogram Applet, from R. Webster West, Dept. of Statistics, Univ. of South Carolina (you will need a Java-enabled browser to see the applet). It is a histogram of the interruption time (i.e. time between eruptions) of the Old Faithful Geyser in Wyoming, USA. Slide the bar to change bin widths, and watch how that affects the shape of the histogram. Will you ever trust a histogram again?

As most classrooms don't have Internet access on tap, the Word document Old Unfaithful contains a series of histograms of the interruption time of the Old Faithful geyser. The series nicely shows the effect of bin size on the appearance of the histogram.

When constructing a histogram by hand, a decision about bin sizes and the number of bins has to be made when tabulating the data. A poor decision can result in a histogram that either gives misleading information about the data or fails to inform the viewer about some aspect of the data.

From the Exploring Data website - http://curriculum.qed.qld.au/kla/eda/
© Education Queensland, 1997

Histograms Worksheet

Datasets and Stories for Histograms

There is benefit in students using the same datasets for different analyses. It is efficient, as students don't need to acquaint themselves with a new story for each display. If they are using a graphing calculator the students don't need to enter a new set of data into the lists. (I recommend you read the article by Al Coons, Efficient Storing of Data on the TI-83, about storing data in a program for later use.) Another benefit is the opportunity to contrast the features of the data highlighted by each display. For these reasons I suggest the students use the datasets and stories on the stemplots

worksheet when learning about histograms as well as the data generated when the students played Greed!. Other datasets that are appropriate for histograms include Air Pollution, Bradmanesque, Oscar Winners, Speed of Light and Wild Horses. Follow the links to the datasets and from there to their stories.

Introducing Histograms

Give students a set of data and the accompanying story. For 'nice' data, i.e. data that is symmetric and with no clustering or outliers, the set of histograms may all give the same general picture so the choice of histogram is not critical. With data that isn't so nice, different choices of bin widths may give histograms that look markedly different. For such datasets students will need to produce a variety of histograms, one for each choice of bin width, and then make and defend their choice as to which is 'best'. Students should realise that a dataset doesn't have a single histogram but many histograms.

Analysing a Histogram

After the histogram is drawn, students should
• locate the approximate centre of the distribution by eye,
• note the overall shape of the distribution,
• determine the spread of the data, and look for potential outliers,
• look for any other features of interest such as clustering or gaps.

Students need to practise writing a short report on the interesting features brought out by a graphical display. One approach would be to give each small group a different dataset and story and have them produce the display (say on a graphical calculator), discuss within the group the characteristics of the data brought out by the display, and then report to the class.

Using Technology

As noted elsewhere, it is quite time-consuming to draw a histogram, especially to match the quality and accuracy of a histogram drawn by even the simplest computer statistics program. Drawing a single histogram by hand should be sufficient for the students to get a feel for the mechanics of drawing a histogram, so additional histograms should be constructed using a computer or a graphical calculator. Note that graphical calculators and computer statistics programs don't necessarily choose the best display by default, and hence it is an unwise student who doesn't construct a few histograms of varying bin widths as part of their analysis.
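The advice to construct a few histograms of varying bin widths can be tried without any special software. The following is a minimal sketch only: the waiting times are made up for illustration (they are not the Old Faithful data), and the function names `histogram_counts` and `print_histogram` are hypothetical helpers, not part of any statistics package mentioned on this site.

```python
import math


def histogram_counts(data, bin_width):
    """Count how many values fall into each bin of the given width.

    Bins start at the largest multiple of bin_width that does not
    exceed the smallest data value.
    """
    start = math.floor(min(data) / bin_width) * bin_width
    n_bins = int((max(data) - start) // bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // bin_width)] += 1
    return start, counts


def print_histogram(data, bin_width):
    """Print a sideways text histogram, one row per bin."""
    start, counts = histogram_counts(data, bin_width)
    print(f"bin width = {bin_width}")
    for i, count in enumerate(counts):
        low = start + i * bin_width
        print(f"  {low:>3} to <{low + bin_width:<3} {'*' * count}")


# Hypothetical waiting times in minutes, invented for this example.
times = [51, 53, 54, 55, 56, 58, 60, 62, 75, 77,
         78, 79, 80, 81, 82, 83, 84, 86, 88, 90]

# The same data, two bin widths, two rather different pictures.
for width in (5, 25):
    print_histogram(times, width)
```

With a bin width of 5 the two clusters and the gap between them are visible; with a bin width of 25 the same data collapse into two bars and the gap disappears. Students can change `times` and the widths to see how much, or how little, the picture changes for their own dataset.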

The remarks above about drawing histograms by hand apply equally to other graphical displays. It follows that students shouldn't be required to construct any of these displays by hand for assessment purposes - it is a trivial, and possibly the least important, task you could ask a student to do in statistics. Let the technology shine in its sphere (repetitive algorithmic processes) and let the students shine in their sphere (looking for patterns, and deciding what the data say).

Matching Histograms and Boxplots

Objective: Students will improve their ability to interpret the information given in a boxplot, by matching boxplots of sample data drawn from different distributions with their associated histograms.

Materials: One worksheet per student or small group.

Time: 20 minutes

Instructions: Students are given a worksheet which contains a series of histograms in the left column and a series of boxplots in the right column. They are to match each boxplot with its associated histogram. They need to be able to defend their decisions in a subsequent whole class discussion.

Note: the sample data for this worksheet was generated with a nifty little freeware Windows program called PQRS, which is available from the Resources page of the Exploring Data website. The distribution from which each sample was drawn is given in the solutions, for your interest of course.

Matching Histograms and Boxplots

Match each histogram with its corresponding boxplot, by writing the letter of the boxplot in the space provided.

_______ A._______ D._______ B.Geometric(1) .1) A .1. 1997 Matching Histograms and Boxplots . 5. 4._______ [To Instructions] [To Solutions] E. 2. 3._______ C.Solutions D .Normal(0. © Education Queensland.

Weibull(4.calculating summary statistics is a waste of time until the user decides what is important about the data and which summary statistics may be useful. I must say I was intrigued to see that the topic of finding an average inStatistics.Geometric(1) B . Before I go into detail on these.9) C . Concepts and Controversies by David S.1. STEPS .1) B . median and mode are boring? Well. but there are some interesting little side alleys to this topic that are worth exploring. Moore is delayed until page 237! This illustrates two very important points .4) Measures of Location o you think teaching about the mean.A .Uniform(1.9) E -Uniform(1. Well maybe that is only one important point.5) [To Instructions] [To Worksheet] E .Weibull(4.4) D .Weibull(4.Weibull(4. maybe.Normal(0.1.5) C .

Which Average?

The choice of measure of location requires understanding of the properties of each measure. The A Rather Average Worksheet contains three nice problems on this topic.

Sex and Dating

The results of a sex survey conducted in the Chicago area gave the average number of lifetime sex partners for men as 6, and for women as 2. This statistic wasn't questioned until someone posting to the rec.puzzles newsgroup asked, 'Hey, is this possible?' Read the article Sex Survey to find out more. Note: you may need to change the context before you introduce this little puzzle to students!

Abolish the Mean!

I once had a clever idea - we can ignore the mean as a descriptor of a dataset and put the entire burden of locating a dataset onto the median. So I told some statisticians about it. If you are interested in their responses read The Mean? Who Needs It!

Finally, did you know that the great majority of people have more than the average number of legs? Amongst the 19 million people in Australia there are probably 2 000 people who have only one leg and no one has three or more legs. Therefore the average number of legs is:

(2 000 x 1 + 18 998 000 x 2) / 19 000 000 = 1.999895

Since most people have two legs...

Simpson's Paradox

Have you ever noticed that a government can give tax cuts to the population and still earn more money than ever before? Did you realise that it is possible for Steve Waugh to have a better batting average than his brother Mark in each of two Ashes Series and yet have a worse average overall? Read the article Simpson's Paradox to learn about these and other intriguing examples of this paradox.

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including summary statistics. Visit the STEPS page for further information and a list of the modules available.

Vary Useful Statistics

The measure of spread of a dataset is a vary useful statistic! When summarising a dataset, at least two measures are needed - one to locate the dataset and another to indicate the spread of the data. The mean and the median are the common measures of location, while the standard deviation and interquartile range are commonly used to summarise the spread of the data.

Range

One measure of the usefulness of a statistic is its robustness. A robust measure is one that is little affected by outliers. On this basis the range, which is simply calculated as

Range = largest data value - smallest data value

is obviously not very robust and hence is not particularly useful.

Interquartile Range

The interquartile range, while simple in concept, has caused much grief to introductory statistics teachers since different respectable sources define it in different respectable ways! The article Ticky-Tacky Boxes discusses the different methods of finding Q1 and Q3, in the context of constructing a boxplot, and makes a recommendation as to which is 'best'.

Mean Deviation

Until recently I was never able to satisfactorily answer the question, "The mean deviation is simple. Why is the standard deviation used rather than the mean deviation?" An email by Paul Gardner from Monash University gives a clear explanation, and is the basis of the article, I'm Not Mad about MAD.

Standard Deviation

The Measures of Spread worksheet contains some lovely questions on standard deviation, courtesy of Pat Ballew. In fact I would rate these questions as being at least 1.5 standard deviations above the average question.

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including summary statistics. Visit the STEPS page for further information and a list of the modules available.

Boxplots

The treatment of boxplots in current senior secondary textbooks highlights the need for Queensland high school teachers to use resources other than the textbook when teaching introductory statistics.

There are two basic flavours of boxplot. The 'simple' boxplot has the whiskers drawn out to the maximum and minimum values. While this is suitable for a quick analysis, it doesn't give as much information about the data as the standard boxplot, which draws the whiskers no longer than 1.5 IQRs from the box and locates points beyond that individually, if they exist. Such points are called outliers.

Unfortunately boxplots in our texts tend to only be simple boxplots. This shouldn't be surprising as the Mathematics A syllabus only makes mention of simple boxplots. It could have been worse - an early model of a graphical calculator available in Queensland used the mean rather than the median to mark the centre of the dataset. Even the TI-83 graphical calculator, the calculator of choice for AP-Statistics students in the US, draws boxplots two ways, and the default boxplot ignores outliers. Nonetheless statisticians almost exclusively draw boxplots with outliers, so I recommend that your students learn to draw standard boxplots as well as simple ones. The process isn't difficult, even if done by hand. The article How to Construct a Boxplot explains how to do it.

When to Choose the Boxplot

Boxplots are most useful when comparing two or more sets of sample data. Differences in the centres and spread of the datasets are clearly visible with a boxplot. A boxplot also gives a picture of the symmetry of a dataset, and shows outliers very clearly. Both of these features are important when deciding which summary statistics would best describe the dataset.

Prior to conducting a hypothesis test, a statistician looks at the data, and histograms and a boxplot would be the displays most often chosen. A condition of many hypothesis tests is that the data is approximately normally distributed and a boxplot can assist in determining this.

Constructing Boxplots

It is interesting that there is general agreement among statisticians about how to construct the whiskers and determine outliers (which is where the problem lies with our texts) but very little agreement on how to construct the box. Using the KISS principle, I teach students a method that is easy to remember and easy to do. Read the article How to Construct a Boxplot for details.

Ticky-Tacky Boxes

If you are interested in learning about different methods of calculating the 1st and 3rd quartiles (and the angst this has caused among AP-Stats teachers), you may find the article Ticky-Tacky Boxes interesting. The article is based on emails from the AP-Stat and Edstat mailing lists. Thanks to Bob Hayden, who has provided much of the information for this article. Warning - whatever you do, please don't try to tell your students about all of this. You will only confuse the cherubs.

Ozone and Outliers

The 'ozone hole' above Antarctica provides the setting for one of the most infamous outliers in recent history. It is a great story to tell students who wantonly delete outliers from a dataset merely because they are outliers. Visit the Ozone and Outliers page for all of the fascinating details.

Matching Histograms and Boxplots

Students will improve their ability to interpret the information given in a boxplot by matching boxplots of sample data drawn from different distributions with their associated histograms.

The Boxplots worksheet contains data drawn from physics, cricket and biology. The Codeine Concentrations worksheet has some data suitable for displaying using boxplots. There is assessment available from the Assessment page, where a variety of graphical displays, including boxplots, may be needed for a solution.

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including the graphical display of data. Visit the STEPS page for further information and a list of the modules available.

The 1970 Draft Lottery

No discussion of boxplots should leave out the story of the 1970 Draft Lottery, the first lottery held to select those chosen to serve in Vietnam, which gave rise to possibly the single most famous set of boxplots in existence. Yours truly was given a free ticket in the lottery, so the story on this page is of uncommon interest to me. Based on what the boxplots show, it turns out that this October-born lad was even luckier than was thought at the time.

Normal Plots

The normal probability plot (sometimes called a quantile plot) is a useful tool for determining the normality of a dataset, which is one of the assumptions on which the t-test is based. Normal plots are not mentioned in the Mathematics A, B or C syllabus. However they are a valuable tool for determining if a dataset is normal. At a minimum a student studying the Probability and Statistics optional unit in Mathematics C should be able to interpret a normal plot, and preferably understand how to construct one. The article Normally I Wouldn't Reveal the Plot... introduces the normal plot, discusses why it is a valuable tool in your arsenal of graphical displays and gives you a recipe for making your very own plot.

Scatterplots

The Challenger Disaster

A worksheet that gives a brief background to the Challenger disaster and the dataset that gave warning of the disaster. A nice introduction to scatterplots and the importance of displaying data clearly.

Using a Scatterplot to Find a Friend

Peter Smith from Mechanicsburg High School in Pennsylvania shares this great introductory activity that helps students learn about scatterplots and correlation. Great fun.

3-D Scatterplot Java Applet

Given three columns of data, this site generates a VRML file for viewing the data in 3-dimensions, including 'flying' right in the middle of the data. You will need a VRML browser to view the scatterplot.

Anscombe's Datasets

Two overhead transparencies - the first has F. J. Anscombe's famous quartet of datasets and the second has the scatterplots of these datasets. Absolutely convincing proof of the need to look at the data first. It convinced me anyway. The scatterplots were produced using NCSS 6.0 Jr.

Assessment

As one person's worksheet is another's assignment, it is often rather difficult to classify a particular document as one or the other. Nonetheless the collection here consists of documents that were specifically intended to be assessment.

Bradmanesque (available in Word 2.0 only)

The Age of Female Actor Oscar Winners This dataset has some intriguing patterns. As an indication of the size of the item bank the Descriptive Statistics section (one section of eight sections) has a filesize of about 150K. mainly in the US) maintain websites with worksheets. Pecking Order in Chickens A researcher on animal behavior wants to study the relationship between pecking order and weight. Topics: summary statistics. In this assigment the student trys to find a mathematical function which allows the price of a diamond ring to be determined from the size of the diamond. Topic: curve fitting. AP Stats Assessment A number of teachers of Advanced Placement Statistics (a first year tertiary statistics course taught in high schools. Al Coons at Buckingham Browne & Nichols School . The AP-Statistics course covers all of the statistics in Maths A. As the researcher¶s assistant. Pricing Diamond Rings Pricing diamond rings in Singapore can be viewed as an interesting exercise in statistical modelling. graphical displays. a craftsmanship fee plus the cost of the diamond. you have been asked to analyse this data (and possibly generate some graphical displays) and write a report on the relationship. They are categorised for convenience. datasets and assessment. many of them multiple choice from Georgia State University. Topics: graphical display of data. if any.A lovely assignment that requires students to draw from their pool of knowledge about descriptive statistics. B and C and more. summary statistics. curve fitting. Topics: various Introduction to Business Statistics at Georgia State University An absolutely enormous item bank of statistics questions. with their associated datasets available from the Datasets page. More Stories Here are some more stories. The price equals the current market value of the gold content of the ring. Students are asked to determine if the average age of female actor Oscar winners is increasing. summary statistics. 
He places four chickens in each of seven pens and observes the pecking order that emerges in each pen. between pecking order and weight. The assignment includes a graphic of some of Galileo's original notes. as you desire. These can be turned into assessment items or further examples or exercises. Galileo's Gravity and Motion Experiments This dataset may need some dusting off as it is over 400 years old! Galileo produced this data when he was studying motion under gravity. Topics: graphical display of data. Topics: linear regression.

an Outlier? Bradmanesque Carbon Dioxide Carbon Emissions Challenger Cloud Seeding Codeine Concentration Cricket (the Insect) Density of the Earth Density of Nitrogen Diamond Rings Fleas Topics boxplot boxplot scatterplot. worksheets.0 datasets require that you download two files. assessment and articles in the Exploring Data website.Follow the link to Projects/Student Papers.0 . curve fitting scatterplot. Notes: Clicking on the name of the dataset will give you the story behind the dataset. summary statistics nonlinear regression data display. NCSS Jr. 6. Paul Myers at Woodward Academy Follow the link to Assessment.Excel 4.0 and Tab Delimited . NCSS Jr 6. time series Formats Excel NCSS Tab . summary statistics boxplot linear regression. The articleStudent-Generated Data discusses a few ways that this can be done. His complete set of tests from 1997 is currently available. Note that some of the projects are outside of our syllabus areas (eg Chi-Square). Datasets These datasets support the activities. nonlinear regression boxplot. Datasets are available in three formats . with extensions . summary statistics curve fitting curve fitting scatterplot boxplots. Paul is posting his tests in html format. dotplot boxplots.Al's website is a real treasure for anyone teaching AP Statistics for the first time.s0 and . Dataset 1970 US Draft Lottery 1971 US Draft Lottery AIDS / HIV Air Pollution Alligator! Anscombe's Dataset Bradman . scatterplot boxplot. graphical display graphical display. t-test linear regression graphical display. But they should also gather data themselves.s1 Students should work with real data gathered by others for purposes of solving problems.

Dataset (continued): Galileo's Experiments, Global Temperature 1, Global Temperature 2, Metric Estimates, Oil Production, Old Faithful, Olympic Gold, Oscar Winners, Pecking Order, Pottery, Smoking and Cancer, Speed of Light, Stride Rate, Wild Horse, World Population, Year 10 Certificates

Topics (continued): polynomial regression, linear regression, exponential regression, regression, curve fitting, scatterplot, histogram, boxplots, graphical display, summary statistics

Linear Regression

The study of functions in Maths B can be enriched by including authentic applications which illustrate how mathematics can model aspects of the world. In real life functions often arise from data gathered from experiments or observations, and such data rarely falls neatly into a straight line or along a curve. There is variability in real data that needs to be explained and measured, and it is the task of the student to find the function that best 'fits' the data in some sense. The first functions we study in Maths B are linear, so it makes sense to start with problems whose data are linear in nature.

Looking at the Data

When fitting a function to data, the student MUST first plot the data, and this activity shows why. F. J. Anscombe invented these datasets to demonstrate the importance of graphing the data before finding the correlation and line of regression. They present a very striking picture.

Using Statistics in Human Movements

One measure of form for a runner is stride rate, defined as the number of steps per second. A runner is considered to be efficient if the stride rate is close to optimum. The stride rate is related to speed: the greater the speed, the greater the stride rate. This article gives a fully-worked solution to finding the stride rate as a function of speed using the statistics functions of the TI-83 graphics calculator.
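For readers without a TI-83 to hand, the same kind of fit can be sketched in a few lines of Python. This is a rough illustration only: the speeds and stride rates below are invented for the example (they are not the data from the article, and the speed units are an assumption), and `fit_line` and `residuals` are hypothetical helper names. It uses ordinary least squares, the usual method behind a calculator's linear regression function.

```python
def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed value minus fitted value for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data, invented for illustration: speed (assumed units)
# and stride rate in steps per second.
speeds = [15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0]
stride_rates = [3.05, 3.12, 3.17, 3.25, 3.36, 3.46, 3.55]

slope, intercept = fit_line(speeds, stride_rates)
print(f"stride rate = {intercept:.3f} + {slope:.4f} * speed")

# Residuals show how far each point sits from the fitted line.
for speed, r in zip(speeds, residuals(speeds, stride_rates, slope, intercept)):
    print(f"  speed {speed:4.1f}: residual {r:+.3f}")
```

Printing the residuals is no substitute for plotting the data first, but a pattern in their signs (for example, positive at both ends and negative in the middle) is a hint that a straight line may not be the best model.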

Student Generated Linear Data

At times we should use real data, gathered to give insight into real problems, as this illustrates how fitting a function to data may be done in real life. But we should also get our students to generate their own data, which gives them ownership of the data and an understanding of the process (often difficult) of collecting reliable and valid data. This article contains examples from high schools in the U.S.

Olympic Gold Medal Performances

The Olympics are coming to Australia in the year 2000, so datasets about the Olympics are worth their weight in gold medals. In this worksheet the data for the gold medal performances in long jump, high jump and discus throw since 1896 are supplied. Students are asked to find a linear model for each set of data, and predict the gold medal performance in Sydney in the year 2000.

Linear Regression Java Applet

This applet teaches students the effect on a regression line of adding an additional point.

Normal Distribution

Why 1.5?

Many students are curious about the '1.5*IQR Rule', i.e. why do we use Q1 - 1.5*IQR (or Q3 + 1.5*IQR) as the value for deciding if a data value is classified as an outlier? Paul Velleman, a statistician at Cornell University, was a student of John Tukey, who invented the boxplot and the 1.5*IQR Rule. When he asked Tukey, 'Why 1.5?', Tukey answered, 'Because 1 is too small and 2 is too large.' It has been shown that this is a reasonable rule for determining if a point is an outlier, for a variety of distributions. This worksheet asks the student to demonstrate this for the normal distribution.

Light Bulbs and Dead Batteries

Don Kerr, of Brisbane-based Zeno Educational Consultants, once told me that he believed that the lifetime of light bulbs and car batteries both have a decaying exponential distribution. I was intrigued by this, as every statistics textbook I have ever used always had a question that started, 'Assume the lifetime of light bulbs is normally distributed, with mean...'. So I decided to ask my colleagues, which resulted in this interesting exchange.

Normal Approximation to the Binomial

A Java applet that visually demonstrates how accurately the normal distribution approximates the binomial distribution for given values of n and p.

Probability

Probability is a wonderful subject to teach! There are so many activities for teaching concepts, puzzles and problems with non-intuitive answers and a variety of contexts for the exercises. This page contains a small collection.
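As one small probability illustration, the comparison made by the Normal Approximation to the Binomial applet mentioned above can be roughed out numerically. This is a sketch under stated assumptions, not a description of how the applet works: the continuity correction of 0.5 and the choice to report the largest gap between the two cumulative distributions are choices made for this example, and the function names are hypothetical.

```python
import math


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))


def normal_cdf(x, mean, sd):
    """P(Y <= x) for Y ~ Normal(mean, sd), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def max_approximation_error(n, p):
    """Largest gap between the exact binomial CDF and the normal
    approximation (with a 0.5 continuity correction) over k = 0..n."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(abs(binomial_cdf(k, n, p) - normal_cdf(k + 0.5, mean, sd))
               for k in range(n + 1))


# A small skewed case, a moderate case and a large symmetric case.
for n, p in [(10, 0.1), (30, 0.5), (100, 0.5)]:
    print(f"n = {n:3d}, p = {p}: largest error = "
          f"{max_approximation_error(n, p):.4f}")
```

Students can vary n and p to see for themselves when the approximation is good (large n, p near 0.5) and when it is poor (small n, p near 0 or 1).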

Unders and Overs

Now having thrown out that challenge, the first activity I am going to suggest to you involves dice! But here the dice are being used in the context of a once popular gambling game and not as a dry as toast exercise with little relevance. Unders and Overs was once a popular game at school fetes in Queensland. It was illegal, as was all gambling, but people turned a blind eye as the money raised went to a school.

A Special Set of Dice

Here is a neat little trick to play on your students, based on a special set of dice. Bring into your class a special set of dice, and then explain the rules: 'I have this set of four dice. We will each choose a die, in fact since I am such a generous person, I'll let you choose first. We'll roll the two dice and the winner is the person whose die has the highest number. The first person to record five wins is the champion. If you are the champion, then you are excused from homework for one week. Now if I am the champion, you have to do an extra hour of homework tonight. Any takers? After all, you've got to be in it to win it.' Of course, the student will be excused from doing the extra homework if the class can figure out why the teacher wins almost all of the time. I have used this activity in the past, but not with the flair that Bill Simpson demonstrated at a Fun of Mathematics night at the University of Queensland.

The Monty Hall Problem

The rec.puzzles newsgroup regulars get very annoyed when a newbie finds the newsgroup and immediately posts his favourite puzzle, which usually turns out to be a puzzle that was posted last week and the week before that and... Now the reason these puzzles are so popular is because they are great puzzles! The puzzle most commonly posted by newbies is the 'Three Men and the Bellboy' puzzle, but breathing down the neck of the bellboy is the 'Monty Hall' puzzle, which deals with elementary probability. The Monty Hall puzzle can even boast about a little collection of websites devoted to it.

STEPS

The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including the binomial distribution and conditional probability. Visit the STEPS page for further information and a list of the modules available.

Dice Difference

Dice Difference is really a dice game with a difference! Rather than add the numbers on the two dice together, subtract the smaller from the larger. Person A gets the totals 0, 1 and 2, while person B gets the totals 3, 4 and 5. Is this game fair?

A Dice-Free Worksheet

What message is being given to students about the importance of understanding probability when a large proportion of the exercises in our texts are based on coins, dice, cards, marbles and urns? A Dice-Free

A Dice-Free Worksheet gives examples of realistic applications of probability that are suitable for Maths A and Maths B students. While such exercises take some effort to create, I think the effort is necessary if we are to help students realise why we study this topic. Note that this worksheet is in Word 6.0 format as I haven't been able to convert the graphic to Word 2.0 format (yet).

Chance and Basic Probability
A website containing a collection of articles from the Hobart Mercury newspaper that illustrate various aspects of probability in the news.

Data Collection and Sampling
A website containing a collection of articles from the Hobart Mercury newspaper that illustrate both good and poor methods of sampling from a population.

Sampling JellyBlubbers
A hands-on introduction to simple random samples and the importance of sample size. The worksheet containing the JellyBlubbers population may be useful for hypothesis testing as well.

STEPS
The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including simple random samples and the distribution of sample means. Visit the STEPS page for further information and a list of the modules available.

Confidence Intervals
The concept of a confidence interval is quite difficult for beginning statistics students, and sometimes for beginning statistics teachers! For example, assume that our population parameter of interest is the population mean. What is the meaning of a 95% confidence interval in this situation? Many students want to say that a 95% confidence interval means that there is a 95% chance that the confidence interval contains the population mean. The correct interpretation is based on repeated sampling. If samples of the same size are drawn repeatedly from a population, and a confidence interval is calculated from each sample, then 95% of these intervals should contain the population mean. But any particular confidence interval either contains the population mean, or it doesn't. The confidence interval shouldn't be interpreted as a probability.

Understanding Confidence Intervals is an activity which helps students understand confidence intervals. The activity requires a TI-83 graphing calculator.
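The repeated-sampling interpretation of a confidence interval described above can be checked with a short simulation. This is only a minimal sketch, not part of the activities listed here: the population mean of 50, the standard deviation of 10, the sample size of 25, the seed and the function names are all assumptions chosen for illustration, and the population standard deviation is treated as known so that a simple z-interval can be used.

```python
import random
import statistics


def ci_covers(rng, mu, sigma, n, z=1.96):
    """Draw one sample of size n and report whether its 95% z-interval contains mu."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = statistics.fmean(sample)
    half_width = z * sigma / n ** 0.5  # sigma is assumed known (illustrative only)
    return mean - half_width <= mu <= mean + half_width


def coverage(trials=10_000, mu=50.0, sigma=10.0, n=25, seed=1):
    """Proportion of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = sum(ci_covers(rng, mu, sigma, n) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(f"Proportion of intervals containing the mean: {coverage():.3f}")
```

The proportion printed should be close to 0.95. Each individual call to `ci_covers` returns either True or False, which mirrors the point that any particular interval either contains the population mean or it doesn't.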

Confidence Intervals Java Applet
The applet helps students understand confidence intervals. Each of the 50 lines on the graph represents one confidence interval for the mean.

STEPS
The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including confidence intervals. Visit the STEPS page for further information and a list of the modules available.

Hypothesis Testing

Introduce the Concept Early!
The concept of hypothesis testing should be introduced informally when first constructing stemplots and histograms. For example, after students have constructed a stemplot of Henry Cavendish's data on the density of the earth, the following issues can be discussed:

• The data is only a sample of all possible measurements.
• Why measurements, even of the same quantity, are not identical.
• The true value of the density can be estimated. The idea of a point estimate and an interval estimate arises quite naturally.
• The likelihood of the true density being as low as 5.3. A conclusion might be, 'It is highly unlikely that the actual density of the earth is 5.3 or lower.' Students who understand this conclusion are well on their way towards understanding hypothesis testing.

It is worth mentioning at this early stage that there are statistical procedures that allow us to make precise statements about likelihood and that the students will meet these later in the unit.

The terminology relating to populations and samples can be introduced early in the study of statistics and used consistently throughout the unit.

Inference in the News
A website containing a collection of articles from the Hobart Mercury newspaper that illustrate various aspects of drawing conclusions from data.

STEPS
The STEPS modules are a collection of hypertext-based tutorials covering a wide range of statistics topics, including hypothesis testing. Visit the STEPS page for further information and a list of the modules available.

The 'Barramundi' Dataset
This dataset contains 1000 integers (having a normal distribution with mean=55 and sigma=12). I have used this data to simulate a population of barramundi in the Fitzroy River, but you of course can make up your own scenario. The data fit nicely onto both sides of an A4 sheet of paper. N.B. The HTML version of the dataset contains 800 integers as that was as many as would fit onto two pages.
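For teachers who would like a population similar to the 'Barramundi' dataset to experiment with, the sketch below generates 1000 integers from a normal distribution with mean 55 and sigma 12 and then draws repeated simple random samples from it. It is a stand-in, not the actual dataset from this site: the seeds, the sample size of 25, the 200 repeats and the function names are assumptions made for illustration.

```python
import random
import statistics


def make_population(size=1000, mean=55, sigma=12, seed=0):
    """Simulated stand-in population: normal draws rounded to integers."""
    rng = random.Random(seed)
    return [round(rng.gauss(mean, sigma)) for _ in range(size)]


def sample_means(population, n=25, repeats=200, seed=1):
    """Means of `repeats` simple random samples (without replacement) of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]


if __name__ == "__main__":
    fish = make_population()
    means = sample_means(fish)
    print(f"population mean:      {statistics.fmean(fish):.2f}")
    print(f"mean of sample means: {statistics.fmean(means):.2f}")
    print(f"sd of sample means:   {statistics.stdev(means):.2f}")
```

With samples of size 25, the standard deviation of the sample means should come out roughly at 12/√25 = 2.4, which gives students a concrete link between the population, a sample, and the distribution of sample means.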

Central Limit Theorem
This applet demonstrates the central limit theorem using simulated dice-rolling experiments.

Hypothesis Testing Joke
This joke is from Mark Eakin (eakin@omega.uta. ...), who has kindly given permission for me to include it on the website.

Most of you do not know that when Santa was a young man he had to take a statistics course. When the class started covering two-sided hypothesis tests, he had a lot of trouble remembering where to put the equal sign. He started repeating to himself "The equal sign goes in the null hypothesis. The equal sign goes in the null hypothesis. The equal sign goes in the null hypothesis." Eventually Santa had to shorten this phrase to make it easier to remember. In fact to this day you can still hear him say "Ho, Ho, Ho."

Anscombe's Dataset
F. J. Anscombe was a pioneer in demonstrating the importance of looking at a set of data before choosing which analyses were appropriate. He created a quartet of paired datasets that wonderfully illustrate this. Anscombe's Datasets contain masters of two overhead transparencies: the first has F. J. Anscombe's famous quartet of datasets with some summary statistics and the second has the scatterplots of these datasets.

Curve Fitting
Nonlinear regression, also known as curve fitting, nicely integrates statistics and the study of functions. And if, in the study of polynomials, exponential or log functions, or periodic functions, students in your classroom are working with real problems containing real data (and I hope they are), then there is no choice about including this topic. It's already there.

Curve Fitting
The paper Curve Fitting, available as a Word 2 document only, discusses the mathematics needed to understand nonlinear regression, including using a curve fitting software program such as CurveExpert. It includes some fully worked examples of how to determine which nonlinear function best fits a set of data as well as a sample assignment. Note that the file is quite large (888K) as it contains numerous screen graphics from a graphics calculator and statistics software.

Pricing Diamond Rings Assignment
This document is an assignment on finding a regression equation relating the price of a diamond ring with the size of the diamond. It looks at both linear and nonlinear models. Students will need to be familiar with both linear and nonlinear regression. The data and the idea for this assignment came from the article Diamond Ring Pricing Using Linear Regression in the Journal of Statistics Education v.4, n.3 (1996) by Singfat Chu.

Future Developments

There has been an interesting discussion on the ap-stat mailing list about the interpretation of r and r² when dealing with nonlinear data. I haven't absorbed it all yet, but when I do I will make a document that discusses this issue available from this page.
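One way to see why r and r² need care with nonlinear data is to compute them for Anscombe's quartet. The sketch below is an illustration only: the data values are the commonly published ones from Anscombe's 1973 paper, typed in here as an assumption rather than taken from the transparencies on this site, and the function names are hypothetical.

```python
import statistics

# Anscombe's quartet as commonly published (assumed values, see lead-in).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def summary(xs, ys):
    """Mean of x, mean of y, r and r squared for one paired dataset."""
    r = pearson_r(xs, ys)
    return {"mean_x": statistics.fmean(xs), "mean_y": statistics.fmean(ys),
            "r": r, "r_squared": r * r}


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summary(xs, ys)
        print(f"{name:>3}: mean x = {s['mean_x']:.2f}, mean y = {s['mean_y']:.2f}, "
              f"r = {s['r']:.3f}, r^2 = {s['r_squared']:.3f}")
```

All four datasets give almost the same means and an r of about 0.816, even though dataset II follows a curve and dataset IV gets its correlation from a single point. The summary statistics alone cannot show this, which is why a scatterplot should come first.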