# STATISTICAL / DATA PRESENTATION TOOLS

Descriptive statistics enable us to understand data through summary values and graphical presentations. Summary values include not only the average (mean) but also the median and mode, and measures of spread such as the range and standard deviation. It is important to look at summary statistics along with the data set itself to understand the entire picture, as the same summary statistics may describe very different data sets. Descriptive statistics can be made easier to understand by presenting them graphically with statistical and data presentation tools.
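As a minimal sketch of why summary values should be read alongside the data, the following Python snippet compares two small data sets. The data sets `a` and `b` and the helper `summarize` are hypothetical, chosen only for illustration. Both sets share the same mean, median, and mode, yet their range and standard deviation differ.

```python
import statistics

# Two hypothetical data sets chosen for illustration; they are not from the source.
a = [4, 5, 5, 5, 6]
b = [1, 5, 5, 5, 9]

def summarize(data):
    """Return common summary values for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }

summary_a = summarize(a)
summary_b = summarize(b)

# Same centre (mean, median, mode), different spread (range, stdev).
for name, summary in (("a", summary_a), ("b", summary_b)):
    print(name, summary)
```

Looking only at the mean, median, and mode would suggest the two sets are alike; the measures of spread, or a plot of the raw values, show that they are not.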

When creating graphic displays, keep in mind the following questions: What am I trying to communicate? Who is my audience? What might prevent them from understanding this display? Does the display tell the entire story?

Several types of statistical/data presentation tools exist, including: (a) charts displaying frequencies (bar, pie, and Pareto charts); (b) charts displaying trends (run and control charts); (c) charts displaying distributions (histograms); and (d) charts displaying associations (scatter diagrams).

Different types of data require different kinds of statistical tools. There are two types of data. Attribute data are countable data or data that can be put into categories: e.g., the number of people willing to pay, the number of complaints, percentage who want blue/percentage who want red/percentage who want yellow. Variable data are measurement data, based on some continuous scale: e.g., length, time, cost.

CHOOSING DATA DISPLAY TOOLS

| To Show | Use | Data Needed |
|---|---|---|
| Frequency of occurrence: simple percentages or comparisons of magnitude | Bar chart, Pie chart, Pareto chart | Tallies by category (data can be attribute data or variable data divided into categories) |
| Trends over time | Line graph, Run chart, Control chart | Measurements taken in chronological order (attribute or variable data can be used) |
| Distribution: variation not related to time (distributions) | Histograms | Forty or more measurements (not necessarily in chronological order, variable data) |
| Association: looking for a correlation between two things | Scatter diagram | Forty or more paired measurements (measures of both things of interest, variable data) |

BOXPLOT
In descriptive statistics, a box plot is a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). A boxplot may also indicate which observations, if any, might be considered outliers. Boxplots display differences between populations without making any assumptions of the underlying statistical distribution: they are non-parametric. The spacings between the different parts of the box help indicate the degree of dispersion (spread) and skewness in the data, and identify outliers. Boxplots can be drawn either horizontally or vertically.
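The five-number summary behind a boxplot can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive method: the data set is the one used in this document's stemplot example, the quartiles use the `inclusive` convention of Python's `statistics.quantiles` (other conventions give slightly different Q1 and Q3), and the 1.5 × IQR rule for flagging outliers is a common convention that the source does not specify. The helper names `five_number_summary` and `flag_outliers` are hypothetical.

```python
import statistics

# Data set reused from the stemplot example in this document.
data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # Quartile conventions vary; "inclusive" is one choice among several.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

def flag_outliers(values, k=1.5):
    """Flag points more than k * IQR beyond the quartiles (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

summary = five_number_summary(data)
outliers = flag_outliers(data)
print(summary)   # (44, 63.0, 68.0, 76.0, 106)
print(outliers)  # [106]
```

Under these assumptions the box would span 63 to 76 with the median at 68, and the value 106 would be drawn as a separate point.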


STEMPLOT
A stemplot (or stem-and-leaf plot), in statistics, is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution. Unlike histograms, stemplots retain the original data to at least two significant digits, and put the data in order, thereby easing the move to order-based inference and non-parametric statistics. A basic stemplot contains two columns separated by a vertical line. The left column contains the stems and the right column contains the leaves.

Figure: Stem-and-leaf graph used for a Japanese train timetable.

CONSTRUCTING A STEMPLOT
To construct a stemplot, the observations must first be sorted in ascending order. Here is the sorted set of data values that will be used in the following example:

44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106

Next, we must determine what the stems will represent and what the leaves will represent. Typically, the leaf contains the last digit of the number and the stem contains all of the other digits. In the case of very large numbers, the data values may be rounded to a particular place value (such as the hundreds place) that will be used for the leaves. The remaining digits to the left of the rounded place value are used as the stem. In this example, the leaf represents the ones place and the stem will represent the rest of the number (tens place and higher).

The stemplot is drawn with two columns separated by a vertical line. The stems are listed to the left of the vertical line. It is important that each stem is listed only once and that no numbers are skipped, even if it means that some stems have no leaves. The leaves are listed in increasing order in a row to the right of each stem. When a number is repeated in the data (such as two 44's), the plot must reflect this: the numbers 44 44 46 47 49 would give the row 4 | 4 4 6 7 9.

     4 | 4 6 7 9
     5 |
     6 | 3 4 6 8 8
     7 | 2 2 5 6
     8 | 1 4 8
     9 |
    10 | 6

key: 6|3 = 63
leaf unit: 1.0
stem unit: 10.0
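The construction steps described above can be sketched in Python. This is a minimal illustration, assuming non-negative integers with a leaf unit of 1 and a stem unit of 10; the function name `stemplot` is hypothetical. It uses the sorted data set given earlier and lists every stem between the smallest and largest, including stems with no leaves.

```python
from collections import defaultdict

# Data set from the stemplot example in this document.
data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]

def stemplot(values):
    """Return stemplot rows for non-negative integers (leaf unit 1, stem unit 10)."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in increasing order
        stem, leaf = divmod(v, 10)    # stem = tens and higher, leaf = ones digit
        leaves[stem].append(leaf)
    rows = []
    # List every stem once, without skipping stems that have no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {leaf_text}".rstrip())
    return rows

rows = stemplot(data)
print("\n".join(rows))
```

Repeated values need no special handling here: each occurrence contributes its own leaf, so the two 68's appear as two 8's on stem 6.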

Rounding may be needed to create a stemplot. Based on the following set of data, the stemplot below would be created:

-23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8

For negative numbers, a negative is placed in front of the stem unit, which is still the value X / 10. Non-integers are rounded. This allows the stem-and-leaf plot to retain its shape, even for more complicated data sets, as in the example below:

    -2 | 4
    -1 | 2
    -0 | 3
     0 | 4 6 6
     1 | 7
     2 | 5
     3 |
     4 |
     5 | 7
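A sketch of the rounding and negative-stem handling described above follows. The source does not say which rounding rule is used; this example assumes halves are rounded away from zero (Python's built-in `round` rounds halves to the nearest even integer, which is why it is not used here). The function names `round_half_away` and `signed_stemplot` are hypothetical, and the ordering of several leaves on one negative stem is left as an unaddressed detail, since the source example has only one leaf per negative stem.

```python
import math

# Data set from the rounding example in this document.
data = [-23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8]

def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero (assumed rule)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def signed_stemplot(values):
    """Return stemplot rows for a non-empty list of real numbers, rounded first."""
    neg, pos = {}, {}
    for r in sorted(round_half_away(v) for v in values):
        stem, leaf = divmod(abs(r), 10)   # stem is still the value / 10, sign kept apart
        (neg if r < 0 else pos).setdefault(stem, []).append(leaf)
    # Negative stems run from the most negative down to -0, then 0 upward.
    labelled = [(f"-{s}", neg.get(s, [])) for s in range(max(neg, default=-1), -1, -1)]
    labelled += [(str(s), pos.get(s, [])) for s in range(0, max(pos, default=-1) + 1)]
    width = max(len(label) for label, _ in labelled)
    return [
        f"{label:>{width}} | {' '.join(map(str, leaves))}".rstrip()
        for label, leaves in labelled
    ]

rows = signed_stemplot(data)
print("\n".join(rows))
```

Keeping separate `-0` and `0` stems lets values such as -3 and 4 stay on opposite sides of zero rather than sharing a row.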

SCATTER PLOT
A scatter plot or scattergraph is a type of mathematical diagram using Cartesian coordinates to display values for two variables for a set of data. The data is displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis. This kind of plot is also called a scatter chart, scatter diagram, or scatter graph.

A scatter plot is used when a variable exists that is under the control of the experimenter. If a parameter exists that is systematically incremented and/or decremented by the other, it is called the control parameter or independent variable and is customarily plotted along the horizontal axis. The measured or dependent variable is customarily plotted along the vertical axis. If no dependent variable exists, either type of variable can be plotted on either axis and a scatter plot will illustrate only the degree of correlation (not causation) between two variables.
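The degree of correlation that a scatter plot illustrates can also be summarized numerically. The sketch below computes the Pearson correlation coefficient by hand; the source does not name a particular coefficient, so this choice is an assumption, as are the function name `pearson_r` and the paired measurements `hours` and `score`, which are hypothetical. Only five pairs are used for brevity, far fewer than the forty or more recommended in this document.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements: x is the controlled (independent) variable,
# y is the measured (dependent) variable.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

r = pearson_r(hours, score)
print(round(r, 3))
```

A value of `r` near 1 or -1 indicates a strong linear association and a value near 0 a weak one; as with the plot itself, it says nothing about causation.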

A 3D scatter plot allows for the visualization of multivariate data of up to four dimensions. The scatter plot takes multiple scalar variables and uses them for different axes in phase space. The different variables are combined to form coordinates in the phase space, and they are displayed using glyphs and colored using another scalar variable.

HISTOGRAM
In statistics, a histogram is a graphical representation showing a visual impression of the distribution of experimental data. It is an estimate of the probability distribution of a continuous variable. A histogram consists of tabular frequencies, shown as adjacent rectangles, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval. The height of a rectangle is also equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data. A histogram may also be normalized, displaying relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the total area equalling 1. The categories are usually specified as consecutive, non-overlapping intervals of a variable. The categories (intervals) must be adjacent, and often are chosen to be of the same size.
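The relationship between frequency, bin width, and bar height can be checked with a short Python sketch. The bin edges below are hypothetical and deliberately unequal, the data set is borrowed from this document's stemplot example, and the function name `bin_counts` is an assumption for illustration.

```python
# Data set reused from the stemplot example in this document.
data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]

# Hypothetical, deliberately unequal bin edges; bins are adjacent and non-overlapping.
edges = [40, 60, 70, 80, 110]

def bin_counts(values, edges):
    """Count values into bins [edges[i], edges[i+1]); the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

counts = bin_counts(data, edges)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

# Bar height = frequency density = frequency / bin width.
density = [c / w for c, w in zip(counts, widths)]
total_area = sum(d * w for d, w in zip(density, widths))      # number of data

# Normalized histogram: relative frequency / bin width.
norm_density = [c / (len(data) * w) for c, w in zip(counts, widths)]
norm_area = sum(d * w for d, w in zip(norm_density, widths))  # 1

print(counts, total_area, norm_area)
```

With unequal bins, the last bin holds four values but has the lowest bar because those values are spread over a width of 30; it is the area, not the height, that represents the frequency.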

Histograms are used to plot density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the lengths of the intervals on the x-axis are all 1, then a histogram is identical to a relative frequency plot.

BAR CHART
A bar chart or bar graph is a chart with rectangular bars with lengths proportional to the values that they represent. The bars can also be plotted horizontally.

Bar charts are used for plotting discrete (or 'discontinuous') data, i.e. data which has discrete values and is not continuous. Some examples of discontinuous data include 'shoe size' or 'eye color', for which you would use a bar chart. In contrast, some examples of continuous data would be 'height' or 'weight'. A bar chart is very useful if you are trying to record certain information whether it is continuous or not continuous data.

PIE CHART
A pie chart (or a circle graph) is a circular chart divided into sectors, illustrating proportion. In a pie chart, the arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents. When angles are measured with 1 turn as unit, a number of percent is identified with the same number of centiturns. Together, the sectors create a full disk. It is named for its resemblance to a pie which has been sliced.

Figure: Pie chart of populations of English native speakers.
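The proportionality between a quantity and its sector's central angle can be sketched as follows. The survey counts in `votes` are hypothetical, loosely following the blue/red/yellow example of attribute data used earlier in this document, and the function name `sector_angles` is an assumption for illustration.

```python
def sector_angles(quantities):
    """Return the central angle in degrees for each category of a pie chart."""
    total = sum(quantities.values())
    # Each sector's angle is its share of the total, scaled to a full turn of 360 degrees.
    return {name: 360 * value / total for name, value in quantities.items()}

# Hypothetical survey counts, not taken from the source.
votes = {"blue": 50, "red": 30, "yellow": 20}

angles = sector_angles(votes)
print(angles)  # {'blue': 180.0, 'red': 108.0, 'yellow': 72.0}
```

Because the counts here happen to sum to 100, each count is also its percentage, so 50 percent corresponds to 50 centiturns, or half of the disk.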

DOT PLOTS
A dot chart or dot plot is a statistical chart consisting of a group of data points plotted on a simple scale. Dot plots are used for continuous, quantitative, univariate data. Data points may be labelled if there are few of them. Dot plots are one of the simplest statistical plots, and are suitable for small to moderate sized data sets. They are useful for highlighting clusters and gaps, as well as outliers. Their other advantage is the conservation of numerical information. When dealing with larger data sets (around 20 to 30 or more data points), the related stemplot, box plot, or histogram may be more efficient, as dot plots may become too cluttered after this point.
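A text-only dot plot is easy to sketch, which shows how directly the chart maps to the data. The example below assumes single-digit integer values so that the columns line up; the data values and the function name `dot_plot` are hypothetical.

```python
from collections import Counter

def dot_plot(values):
    """Return text rows of a dot plot for single-digit integer values."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    rows = []
    # Build from the top row down: a dot appears where a value occurs at least `level` times.
    for level in range(max(counts.values()), 0, -1):
        cells = ("* " if counts[x] >= level else "  " for x in range(lo, hi + 1))
        rows.append("".join(cells).rstrip())
    rows.append(" ".join(str(x) for x in range(lo, hi + 1)))  # the scale
    return rows

# Hypothetical small data set, not taken from the source.
values = [0, 1, 1, 3, 3, 3, 4, 7, 9]

rows = dot_plot(values)
print("\n".join(rows))
```

In the printed result the stack of dots at 3 shows a cluster and the empty columns at 5 and 6 show a gap, while every individual value remains readable from the scale.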

Figure: A dot plot of 50 random values from 0 to 9.

LINE GRAPHS
A line chart or line graph is a type of graph which displays information as a series of data points connected by straight line segments. It is a basic type of chart common in many fields. It is an extension of a scatter graph, and is created by connecting a series of points that represent individual measurements with line segments. A line chart is often used to visualize a trend in data over intervals of time (a time series), thus the line is often drawn chronologically.