# 6/5/2011

Histogram Construction

Constructing a Histogram
Here is the data on starting salaries of 1995 Psychology graduates. When constructing a histogram it is helpful to sort the observations.

8820 10800 12000 12500 13000 14000 15000 16000 16500
16600 16700 16900 16900 17000 17000 17600 17880 18000
18000 18000 18000 18000 18000 18000 18000 18000 18000
18500 18680 19100 20000 20000 20000 20000 20000 20300
20900 22000 23000 23000 23000 23000 23400 24000 25000
25000 26000 26000 27000 30000 30000 32500 37000 48000

Minimum = 8820, Maximum = 48000, Range = 39180.

To begin, decide how many intervals you would like. A good rule of thumb is to use the square root of the number of observations (after rounding). Here, the square root of 54 is approximately 7.35; round up and use 8.

The interval width should then be approximately equal to the range divided by the number of intervals. Range / # Intervals = 39180/8 = 4897.5; I'll round up to the conveniently even figure of 5000. (It is quite helpful to use a round number.)

Start the first interval at a convenient value below the minimum. Here the minimum is 8820, so begin at 7500 (other choices are equally acceptable). The intervals then begin at 7500 and have a width of 5000. So, the first interval runs from 7500 to 12500, the second from 12500 to 17500, and so on.

By convention we agree that an interval includes the lower boundary point, but does not include the upper boundary point. So, for instance, a value of 7500 falls in the [7500, 12500) interval, but a value of 12500 does not. A value of 12500 falls instead in the [12500, 17500) interval.

Construct a simple table including each interval, the count of observations in that interval, and the relative frequency or percentage of observations in the interval.

| Interval | Count | Percentage |
|---|---|---|
| 7500-12499 | 3 | 5.56 |
| 12500-17499 | 12 | 22.22 |
| 17500-22499 | 23 | 42.59 |
| 22500-27499 | 11 | 20.37 |
| 27500-32499 | 2 | 3.70 |
| 32500-37499 | 2 | 3.70 |
| 37500-42499 | 0 | 0.00 |
| 42500-47499 | 0 | 0.00 |
| 47500-52499 | 1 | 1.85 |
| Total | 54 | 99.99 |
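The arithmetic above can be checked with a short script. This is only a sketch: the variable names and the `frequency_table` helper are made up for illustration, and the width of 5000 and starting value of 7500 are simply the choices made in the text, not computed values.

```python
import math

# Starting salaries copied from the sorted listing above.
salaries = (
    [8820, 10800, 12000, 12500, 13000, 14000, 15000, 16000, 16500,
     16600, 16700, 16900, 16900, 17000, 17000, 17600, 17880]
    + [18000] * 10
    + [18500, 18680, 19100]
    + [20000] * 5
    + [20300, 20900, 22000]
    + [23000] * 4
    + [23400, 24000, 25000, 25000, 26000, 26000, 27000,
       30000, 30000, 32500, 37000, 48000]
)

n = len(salaries)
data_range = max(salaries) - min(salaries)

# Rule of thumb: about sqrt(n) intervals, rounded up.
target_intervals = math.ceil(math.sqrt(n))
raw_width = data_range / target_intervals

# Assumed choices taken from the text: a round width and a convenient start.
width = 5000
start = 7500


def frequency_table(values, start, width):
    """Return rows of (lower, upper, count, percentage) for each interval.

    Each interval includes its lower boundary and excludes its upper one.
    """
    n_classes = (max(values) - start) // width + 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1
    rows = []
    for i, count in enumerate(counts):
        lower = start + i * width
        pct = round(100 * count / len(values), 2)
        rows.append((lower, lower + width - 1, count, pct))
    return rows


table = frequency_table(salaries, start, width)
for lower, upper, count, pct in table:
    print(f"{lower:>5}-{upper:<5} {count:>3} {pct:>6.2f}")
print("Total of rounded percentages:", round(sum(r[3] for r in table), 2))
```

Integer division by the width places each value in its interval and automatically honors the convention that a boundary value such as 12500 belongs to the interval it opens.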

Take, for instance, the interval from 12500 to 17499 (the red row). Scroll back to the listing of the data: the observations that fall in this interval are red. There are 12 such observations. The relative frequency of observations falling in this interval is then 12/54 = 0.2222, which is equivalent to 22.22%. The remainder of the table is constructed in this fashion.

You might notice that the percentages do not add up to exactly 100%. This is due to accumulated round-off error. The exact percentage of observations in the 12500-17499 class is 22.22222..., and the slight differences between exact values and values rounded to the nearest 0.01 are to blame. No big deal. Generally, if your total is within 0.1 of 100%, this artifact may be safely ignored.

You might also notice that we have 9 classes rather than the desired 8. (Sometimes you might get fewer intervals than you set out for.) This happened because of our choices of starting value and interval width. They were somewhat subjective. If you really have to have 8 intervals, you might change the class width to 6000 and adjust the starting value to match (starting at 6000, for example, gives exactly 8 classes).

Now, draw a grid for your histogram. The horizontal axis should stretch from the lower endpoint of the first interval (7500) to the upper endpoint of the final interval (52500). The vertical axis should be marked high enough to accommodate the highest percentage interval. Effective displays tend to have a width to height ratio of about 4:3.

Label your axes. The horizontal axis measures the variable Salary, measured in \$. Include the unit of measurement. The vertical axis measures the % of observations falling within an interval.

Note that the tick marks are labeled once every two intervals. This is to avoid crowding the tick mark labels (including the 12500 would crowd the labels). You could include all interval endpoints if you wrote smaller or omitted the final two digits of each label: 12500 would become 125 and a legend would indicate that all figures are in 100s.
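The grid described above can be laid out numerically before anything is drawn. The sketch below is illustrative only: `axis_layout` is a hypothetical helper, and rounding the vertical axis up to the next multiple of 10 is an assumption, since the text only asks that the axis be high enough for the tallest bar.

```python
import math


def axis_layout(start, width, maximum, label_every=2):
    """Return (edges, labeled_ticks) for the horizontal axis.

    edges holds every interval endpoint; labeled_ticks keeps every
    `label_every`-th endpoint so the labels do not crowd each other.
    """
    n_classes = (maximum - start) // width + 1
    edges = [start + i * width for i in range(n_classes + 1)]
    labeled = edges[::label_every]
    return edges, labeled


edges, labeled = axis_layout(start=7500, width=5000, maximum=48000)
print("Axis runs from", edges[0], "to", edges[-1])
print("Number of classes:", len(edges) - 1)
print("Labeled tick marks:", labeled)

# Assumed convention: top of the vertical axis is the next multiple of 10
# above the highest percentage in the table (42.59).
highest_pct = 42.59
y_top = math.ceil(highest_pct / 10) * 10
print("Vertical axis runs from 0 to", y_top)
```

Calling the same helper with a hypothetical start of 6000 and width of 6000 yields 8 classes, which is one way to hit the original target.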

5.56% of the observations fall in the first interval (from 7500 to 12499). Draw a bar over that interval with height 5.56. 22.22% of the observations fall in the second interval (from 12500 to 17499). Draw a bar over that interval with height 22.22. Continue until all intervals have been exhausted.

Here's the final product!
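As a rough stand-in for the drawing, the bars can be printed as text. This sketch turns the histogram on its side (one row per interval) and uses one `#` for roughly every 2 percentage points; both choices, along with the `text_histogram` helper, are assumptions made for illustration rather than part of the original procedure.

```python
# (lower bound, percentage) pairs copied from the frequency table.
rows = [
    (7500, 5.56), (12500, 22.22), (17500, 42.59),
    (22500, 20.37), (27500, 3.70), (32500, 3.70),
    (37500, 0.00), (42500, 0.00), (47500, 1.85),
]
WIDTH = 5000


def text_histogram(rows, width, percent_per_mark=2.0):
    """Return one text line per interval, with bar length set by percentage."""
    lines = []
    for lower, pct in rows:
        bar = "#" * round(pct / percent_per_mark)
        label = f"{lower}-{lower + width - 1}"
        lines.append(f"{label:>11} | {bar} {pct:.2f}%")
    return lines


lines = text_histogram(rows, WIDTH)
for line in lines:
    print(line)
```

The two empty intervals still get a row, which keeps the gap before the 48000 observation visible, just as the empty classes leave a gap in the drawn histogram.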

oswego.edu/~srp/stats/hist_con.htm