Lecture 2 : Diagrammatic & Graphical Presentation of Data Department of Decision Sciences

Objective of the Lecture

 

To introduce diagrammatic and graphical statistical methods that allow managers to summarize data visually to produce useful information. To understand the importance of the graphical methods commonly used to summarize both qualitative and quantitative data. To know how they are prepared and how they should be interpreted.

Introduction

The most common and simple forms of pictorial representation of data are:
(i) Bar diagram
(ii) Histogram
(iii) Pie diagram
(iv) Stem-Leaf display
(v) Frequency Polygon
(vi) Ogive

Though the first two approaches above are similar in nature, the bar diagram is meant for categorical data whereas the histogram and stem-leaf display are meant solely for quantitative data. On the other hand, a pie diagram can be used for both types of data.

Example 1.1:

University Placement Office Survey

The student placement office at a university conducted a survey of last year's business school graduates to determine the general areas in which the graduates found jobs. The placement office intended to use the resulting information to help decide where to concentrate its efforts in attracting companies to campus to conduct job interviews. Each graduate was asked in which area he or she found a job. The areas of employment are:
• Accounting
• Finance
• General management
• Marketing/Sales
• Other

The responses were recorded using the codes 1, 2, 3, 4, and 5, respectively. Construct a frequency and relative frequency distribution for these data and graphically summarize the data by producing a bar chart and a pie chart. The data follow.

Amity Business School Data 1 4 5 1 2 1 5 4 5 1 2 1 1 3 4 4 4 4 4 3 1 1 1 1 3 4 3 4 4 1 2 2 5 1 4 4 4 5 2 1 5 3 1 1 1 2 1 4 3 1 5 2 2 4 1 4 5 4 2 2 1 4 2 4 4 1 2 2 5 1 4 3 5 5 4 2 3 2 1 3 3 2 5 2 2 2 3 1 4 1 1 4 3 4 1 3 4 2 4 1 4 1 5 1 2 1 1 4 5 1 3 1 5 1 2 4 3 2 1 1 3 4 4 1 3 4 3 4 2 1 1 2 1 4 2 2 4 4 4 4 5 4 1 4 1 3 5 2 1 5 1 4 1 4 2 2 4 4 1 2 4 4 2 3 2 5 5 1 3 2 2 2 2 1 3 4 2 3 5 1 1 1 2 4 4 1 4 4 5 3 3 5 2 4 4 4 5 2 4 5 3 5 4 3 4 1 2 2 1 3 1 4 1 3 5 1 1 3 4 1 1 1 4 4 2 3 2 2 1 2 4 1 2 1 3 4 2 1 3 5 1 1 3 1 3 2 2 1 2 1 3 1 .

Scanning the data produces no real information. To extract the information requires the application of a statistical or graphical technique. To choose the appropriate technique we must first identify the type of data. In this example the data are nominal because the numbers represent categories. The only calculation permitted on nominal data is to count the number of occurrences of each category. The list of the categories and their counts constitutes the frequency distribution. The relative frequency distribution is produced by converting the frequencies into proportions. The frequency and relative frequency distributions are combined in Table 1.1.

Table 1.1 Frequency and Relative Frequency Distribution for Example 1.1

Area                  Frequency   Relative Frequency
Accounting                73           28.85%
Finance                   52           20.55%
General Management        36           14.23%
Marketing / Sales         64           25.30%
Other                     28           11.07%
Total                    253          100%

Interpretation: Accounting is the most popular area of employment, followed by marketing/sales, finance, general management and others.
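The counting and conversion to proportions described above can be sketched in Python. This is a minimal illustration, not part of the original lecture: the function name `frequency_distribution` is hypothetical, and the list `responses` is rebuilt from the Table 1.1 counts rather than retyped from the raw survey data, so it does not preserve the original response order.

```python
from collections import Counter

# Area codes from Example 1.1 (labels come from the survey description).
AREAS = {1: "Accounting", 2: "Finance", 3: "General Management",
         4: "Marketing/Sales", 5: "Other"}


def frequency_distribution(codes):
    """Return {code: (frequency, relative frequency)} for nominal data."""
    counts = Counter(codes)          # the only calculation allowed: counting
    n = len(codes)
    return {code: (counts[code], counts[code] / n) for code in sorted(counts)}


# Assumption: stand-in for the 253 responses, rebuilt from the Table 1.1 counts.
responses = [1] * 73 + [2] * 52 + [3] * 36 + [4] * 64 + [5] * 28

table = frequency_distribution(responses)
for code, (freq, rel) in table.items():
    print(f"{AREAS[code]:<20}{freq:>5}{rel:>9.2%}")
print(f"{'Total':<20}{len(responses):>5}{1.0:>9.2%}")
```

Passing the actual list of 253 codes in place of `responses` should give the same table, since the counts do not depend on order.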

MS Excel Commands for Frequency

• Type data into one or more columns.
• Activate any empty cell.
• Click fx and select the category Statistical, and the function COUNTIF.
• In the Range box specify the input range of the data (A1:A254). In the Criteria box type the code you want to count (1) (2) (3) (4) (5). The frequency will appear in the dialog box. Change the criteria to produce the frequency of another category.

Alternatively, type the following into any active cell:
=COUNTIF([Data range], [Criteria])

Minitab Commands for Frequency

• Type data into one or more columns.
• Click Stat, Tables and Tally Individual Variables.
• Type or use the Select button to specify the name of the variable or the column where the data are stored in the Variable box (Area). Under Display click Counts and Percents.

SPSS Commands for Frequency

• Click on Analyze at the SPSS menu bar.
• Click on Descriptive Statistics, followed by Frequencies.
• On the dialogue box which appears, select the variables for which Frequency Tables are required, by clicking on the right arrow to transfer them from the variable list on the left to the Variables box on the right.
• Click OK to get the tables with counts and percentages, for each of the selected variables.

Bar and Pie Chart

• Graphical techniques generally catch a reader's eye more quickly than does a table of numbers. Two graphical techniques can be used to display the results shown in the table. A bar chart is often used to display frequencies; a pie chart graphically shows relative frequencies.

Figure: Bar Chart for Example 1.1
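As a rough stand-in for the bar chart, the frequencies of Table 1.1 can be drawn as a text chart with the Python standard library alone. This is only a sketch: the helper `text_bar_chart` and its `scale` parameter are hypothetical names introduced here, and a spreadsheet or plotting package would normally be used instead.

```python
# Frequencies taken from Table 1.1.
frequencies = {
    "Accounting": 73,
    "Finance": 52,
    "General Management": 36,
    "Marketing/Sales": 64,
    "Other": 28,
}


def text_bar_chart(freqs, scale=2):
    """Return one line per category; each '#' stands for `scale` observations."""
    width = max(len(label) for label in freqs)
    lines = []
    for label, count in freqs.items():
        bar = "#" * (count // scale)     # bar length proportional to frequency
        lines.append(f"{label:<{width}} | {bar} {count}")
    return lines


chart = text_bar_chart(frequencies)
print("\n".join(chart))
```

Each category gets its own separate bar, which matches the use of the bar chart for categorical data.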

Pie Chart

• If we wish to emphasize the relative frequencies instead of drawing the bar chart, we draw the pie chart. A pie chart is simply a circle subdivided into slices that represent the categories. It is drawn so that the size of each slice is proportional to the percentage corresponding to that category. For example, since the entire circle is composed of 360 degrees, a category that contains 25% of the observations is represented by a slice of the pie that contains 25% of 360 degrees, which is equal to 90 degrees. The number of degrees for each category in Example 1.1 is shown in Table 1.2.

2 91.23% 25.2 Area Proportion of Graduates 28.85% 20.55% 14.00 39.1 Other Total 11.07% 100.Amity Business School Table 1.30% Slice of Pie Accounting Finance General Management Marketing / Sales 103.2 Proportion in Each Category in Example 1.8 360 .9 74.0 51.

Figure: Pie Chart for Example 1.1

MS Excel Commands for Bar and Pie Chart

• After creating the frequency distribution, highlight the column of frequencies.
• For a bar chart click the Chart Wizard, Column and Finish.
• Click Chart (on Tool Bar), Chart Options, and make whatever changes you think make the chart look best.
• For a pie chart click Pie instead of Column.

Minitab Commands for Bar and Pie Chart

For a bar chart:
• Click Graph and Bar Chart.
• In the Bars represent box click Counts of unique values and select Simple.
• Type or use the Select button to specify the variable in the Variables box (Area).
• We clicked Labels and added the title, then clicked Data Labels and used y-value labels to display the frequencies at the top of the columns.

For a pie chart:
• Click Graph and Pie Chart.
• Click Chart raw data and in the Categorical variables box type or use the Select button to specify the variable (Area).
• We clicked Labels and added the title.
• We clicked Slice Labels and clicked Category name and Percent.

SPSS Commands for Bar and Pie Chart

1. Click on Analyze at the SPSS menu bar.
2. Click on Descriptive Statistics, followed by Frequencies.
3. On the dialogue box which appears, select the variables for which Frequency Tables are required, by clicking on the right arrow to transfer them from the variable list on the left to the Variables box on the right.
4. Click OK to get the tables with counts and percentages, for each of the selected variables.

Charts can be requested by clicking on Charts on the main dialogue box, selecting the required type of charts, and clicking Continue before step 4 above.

Alternatively: click on Graphs at the SPSS menu bar followed by Chart Builder.

Histogram

• Example 1.2: A random sample of 40 days gave the following information about the total number of people treated per day at a community hospital emergency room (ER).

40  35  42   6  13  50  60  27
 8  42  53  17  25  23  24  12
26  32  28  28  31  29  30  28
21  46  22  19  20  30  31  30
36  30  40  38  30  29  31  41

Here, the population = collection of days over a long period of time, and the sample = collection of 40 days. The (quantitative) variable = number of people being treated at the ER per day.

Since the variable is quantitative and can take many possible values (much more than a typical categorical variable), it does not make sense to have frequencies for distinct entries (we might end up with 40 distinct entries with each having frequency 1). So, here we first find the minimum (min) and maximum (max) entries to get a spread of the variable (in the sample). There is a systematic way of finding the min and max. First, find the min and max for each column, which is easy to do, since there are far fewer entries in a single column (compared to the whole array). Next, find

(overall) min = minimum of column minimums and (overall) max = maximum of column maximums.

By this method, we get
Column minimums = 8, 30, 22, 6, 13, 23, 24 and 12.
Column maximums = 40, 46, 53, 38, 31, 50, 60 and 41,
and hence min = 6 and max = 60.
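The column-by-column search for the minimum and maximum can be sketched in Python as follows. The grouping of the 40 values into 8 columns of 5 is an assumption made here; it was chosen because it reproduces the column minimums and maximums quoted in the text.

```python
# Emergency room data from Example 1.2, one inner list per column.
# Assumption: 8 columns of 5 entries each (consistent with the quoted
# column minimums and maximums).
columns = [
    [40, 8, 26, 21, 36],
    [35, 42, 32, 46, 30],
    [42, 53, 28, 22, 40],
    [6, 17, 28, 19, 38],
    [13, 25, 31, 20, 30],
    [50, 23, 29, 30, 29],
    [60, 24, 30, 31, 31],
    [27, 12, 28, 30, 41],
]

# Step 1: min and max of each column.
column_mins = [min(col) for col in columns]
column_maxs = [max(col) for col in columns]

# Step 2: overall min = minimum of column minimums, likewise for the max.
overall_min = min(column_mins)
overall_max = max(column_maxs)

print("Column minimums:", column_mins)
print("Column maximums:", column_maxs)
print("min =", overall_min, "max =", overall_max)
```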

• Note that the unit here (i.e., the smallest possible increment of the quantitative variable) is 1 (or 1 patient). We modify the range (6, 60) by extending it by one half of a unit on both sides. This is called a modified range, and for the present data set our modified range is (5.5, 60.5). The idea behind the modified range is that it includes the boundary values (6 and 60) properly. The lower limit of the modified range is 5.5, and the upper limit is 60.5. The length (L) of the modified range is L = upper limit – lower limit = 60.5 – 5.5 = 55.

• This length L is now divided into several subintervals which gives us a few classes. The number of classes, say k, is a convenient number, usually taken between 5 and 8. For the present case take k = 5 and then l = length of each class = L/k = 11. (The notation l is used to denote the length of each class or subinterval.) Therefore, we can divide the modified range (5.5, 60.5) into successive contiguous classes:

(5.5, 5.5 + l) = (5.5, 16.5),
(16.5, 16.5 + l) = (16.5, 27.5),
(27.5, 27.5 + l) = (27.5, 38.5),
(38.5, 38.5 + l) = (38.5, 49.5) and
(49.5, 49.5 + l) = (49.5, 60.5).
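The construction of the modified range and the k classes can be written as a short Python sketch. The function name `class_boundaries` and the `unit` parameter are illustrative names, not part of the lecture; the arithmetic follows the steps described above.

```python
def class_boundaries(minimum, maximum, k, unit=1):
    """Split the modified range into k contiguous classes of equal length."""
    lower = minimum - unit / 2        # extend by half a unit: 6 - 0.5 = 5.5
    upper = maximum + unit / 2        # extend by half a unit: 60 + 0.5 = 60.5
    length = (upper - lower) / k      # l = L / k
    return [(lower + i * length, lower + (i + 1) * length) for i in range(k)]


classes = class_boundaries(6, 60, k=5)
for low, high in classes:
    print(f"({low}, {high})")
```

With min = 6, max = 60 and k = 5 this gives L = 55 and l = 11, so the classes match the list in the text.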

Table 1.3: Frequency table for number of individuals treated at ER per day

Class           Frequency
5.5 – 16.5          4
16.5 – 27.5        10
27.5 – 38.5        17
38.5 – 49.5         6
49.5 – 60.5         3
Total              40
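A frequency table of this kind can be reproduced by counting how many observations fall between each pair of class boundaries. The following Python sketch uses the Example 1.2 data; the helper name `class_frequencies` is a hypothetical choice.

```python
# Emergency room data from Example 1.2 (40 days).
er_data = [40, 8, 26, 21, 36, 35, 42, 32, 46, 30,
           42, 53, 28, 22, 40, 6, 17, 28, 19, 38,
           13, 25, 31, 20, 30, 50, 23, 29, 30, 29,
           60, 24, 30, 31, 31, 27, 12, 28, 30, 41]

# Class boundaries of the modified range (5.5, 60.5) with k = 5.
boundaries = [5.5, 16.5, 27.5, 38.5, 49.5, 60.5]


def class_frequencies(data, bounds):
    """Count how many observations fall between each pair of boundaries."""
    counts = [0] * (len(bounds) - 1)
    for value in data:
        for i in range(len(counts)):
            # Boundaries are half-units, so no integer value can sit on one.
            if bounds[i] < value < bounds[i + 1]:
                counts[i] += 1
                break
    return counts


frequencies = class_frequencies(er_data, boundaries)
relative = [f / len(er_data) for f in frequencies]

for i, freq in enumerate(frequencies):
    print(f"{boundaries[i]:>5} - {boundaries[i + 1]:<5} {freq:>3} {relative[i]:>7.3f}")
print("Total", sum(frequencies))
```

The relative frequencies computed at the end are the bar heights that a histogram of these data would use.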

Histogram for Example 1.2

• Now we use the emergency room data to illustrate a histogram. Draw and label the x-y axes. Usually, the y-axis represents the relative frequency and the x-axis represents the class (or interval) boundaries. Now using the relative frequencies as the heights, draw vertical bars for each class, like the bar diagram.

• Given a frequency table (with fixed number of classes and class boundaries), the histogram of a dataset is unique (unlike the bar diagram). This is due to the natural ordering of the classes. Another departure from the bar diagram is the absence of a fixed gap between two successive classes.
• A bar graph and a histogram are essentially the same thing; both are graphical presentations of the data in a frequency distribution. A histogram is just a bar graph with no separation between bars. The separation between bars is appropriate for qualitative data because the data are discrete; no intermediate values are possible. For discrete quantitative data, a separation between bars is also appropriate.

Frequency Polygon

• The histogram gives rise to another simple concept called the relative frequency polygon. Find the midpoint of each class (the midpoint of a class is found by adding the two endpoints of the class and then dividing by 2), and then plot the relative frequencies (on y-axis) against the midpoints (on x-axis). Connect the adjacent points with straight line segments, and the resultant diagram is a frequency polygon.
• A frequency polygon shows the trend in the data in terms of frequency (which is also evident in the histogram). From the frequency polygon in Figure 1.3 it is clear that for the emergency room dataset, the frequency or relative frequency increases as the number of patients per day increases to 33, and beyond this the frequency starts falling. Roughly, we see that there are more days when we treat 25 patients per day than 15 patients per day. Similarly, there are fewer days when we treat 45 patients per day than 35 patients per day.

Figure: Relative Frequency Polygon for the data in Example 1.2

If a frequency polygon with a single hump has approximately equal left and right tails (i.e., looks symmetric) then it is said to have a bell shape. If a frequency polygon has a longer right (left) tail than the left (right) one along with a single hump, then the frequency polygon (or the histogram) is called positively (negatively) skewed.
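The points of the relative frequency polygon for Example 1.2 can be computed with a few lines of Python. This sketch only produces the coordinates (midpoint, relative frequency); drawing the line segments is left to whichever charting tool is in use. The variable names are illustrative.

```python
# Class boundaries and frequencies from Table 1.3.
boundaries = [5.5, 16.5, 27.5, 38.5, 49.5, 60.5]
frequencies = [4, 10, 17, 6, 3]

n = sum(frequencies)

# Midpoint of a class = (lower endpoint + upper endpoint) / 2.
midpoints = [(low + high) / 2 for low, high in zip(boundaries, boundaries[1:])]
relative = [f / n for f in frequencies]

# Points to plot: midpoints on the x-axis, relative frequencies on the y-axis.
polygon_points = list(zip(midpoints, relative))
peak_midpoint, peak_height = max(polygon_points, key=lambda point: point[1])

for x, y in polygon_points:
    print(f"midpoint {x:>5}: relative frequency {y:.3f}")
print("peak at midpoint", peak_midpoint)
```

The peak falls at the midpoint 33, which is where the text says the frequency stops rising and starts falling.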

MS Excel Commands for Histogram

• Type the data into one column. In another column type the upper limits of the class intervals. Excel calls them bins.
• Click Tools, Data Analysis …, and Histogram. If Data Analysis does not appear in the menu box, you have to install it by using Excel Options and Add-ins.
• Specify the Input Range and the Bin Range. Click Chart Output. Click Labels if the first row contains names.
• To remove the gaps, place the cursor over one of the rectangles and click the right button of the mouse. Click (with the left button) Format Data Series.... Click Options, move the pointer to Gap Width and change the number from 150 to 0.
• Click Chart and Chart Options... to make cosmetic changes.
• Note that the numbers along the horizontal axis represent the upper limits of each class although they appear to be placed in the centers. Except for the first class, Excel counts the number of observations in each class that are greater than the lower limit and less than or equal to the upper limit.

Minitab Commands for Histogram

• Type or import the data into one column.
• Click Graph, Histogram..., and Simple.
• Type or use the Select button to specify the name of the variable in the Graph variables box. Click Data View.
• Click Data Display and Bar. Minitab will create a histogram using its own choices of class intervals.
• To choose your own classes, double-click the horizontal axis. Click Binning.
• Under Interval Type choose Cutpoint. Under Interval Definition choose Midpoint/Cutpoint positions and type in your choices.
• Note that Minitab counts the number of observations in each class that are strictly less than the upper limit and greater than or equal to the lower limit.

Stem and Leaf Display

• The stem-leaf display is an extremely useful way of studying data structure for a quantitative variable. A frequency table and the corresponding histogram provide a useful organization and pictorial representation of data. However, in a frequency table (like Table 1.3) we do lose individual values of the observations. A stem-leaf display is a simple device that groups the whole dataset and produces a histogram or bar diagram like picture, yet allows us to recover the original dataset if required. We illustrate this with the following example.

• Example 1.3: Table 1.4 gives the one-way commuting distance (in nearest miles) of 30 working mothers in a large city.

Table 1.4 Commuting Distance Data

13   7  12   6  34  14  47  25  45   2
13  26  10   8   1  14  41  10   3  21
 8  13  28  24  16  19   4   7  50  36

• To make a stem-leaf display, we partition the digits of each individual observation (numeric value) into two components: stem and leaf. The left side group of digits of the entry is called a stem and the right side group of digits is called a leaf. The number of digits to be included in the stem is chosen conveniently so that the number of stems in the display is between 5 and 20.

an entry = 8: stem 0 (tens digit), leaf 8 (units digit)

(A single digit entry, say 8, is read as 08 before being broken into 'stem' and 'leaf'.)

• For the data in Table 1.4, where all entries are one- or two-digit numbers, we use the tens digit of an entry to form the stem and the units digit to form the corresponding leaf. For the first entry 13, the stem is 1 and the leaf is 3. The entry 8 is treated as 08, meaning 0 for its stem and 8 for its leaf. Figure 1.5 gives the stem-leaf display of the above mentioned data. The horizontal length of the leaves represents the frequency for the corresponding stem which is essentially a class. The stem 1 represents the class 10-19 miles, or more correctly the class 9.5-19.5 miles, since the data entries are rounded values and hence anyone commuting 9.5 (or 9.6 or 9.7 or 9.8 or 9.9) miles would be assigned the value 10. From Figure 1.5, it is clear that most of the entries are in the 10-mile range [i.e., (10, 19) miles], followed by the 0-mile range [i.e., (0, 9) miles].

A Stem-and-Leaf display for the data in Example 1.3

Stem   Leaf
0      7 6 2 8 1 3 8 4 7
1      3 2 4 3 0 4 0 3 6 9
2      5 6 1 8 4
3      4 6
4      7 5 1
5      0
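The partition into stems and leaves can be sketched in Python using integer division by 10. This is an illustrative sketch only: the function name `stem_and_leaf` is hypothetical, and leaves are kept in the order in which the values appear in Table 1.4 rather than sorted.

```python
from collections import defaultdict

# Commuting distances from Table 1.4 (30 working mothers).
commute = [13, 7, 12, 6, 34, 14, 47, 25, 45, 2,
           13, 26, 10, 8, 1, 14, 41, 10, 3, 21,
           8, 13, 28, 24, 16, 19, 4, 7, 50, 36]


def stem_and_leaf(data):
    """Group values by tens digit (stem), keeping units digits (leaves)."""
    display = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)   # 8 -> (0, 8), 13 -> (1, 3)
        display[stem].append(leaf)
    return dict(sorted(display.items()))


display = stem_and_leaf(commute)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

# Unlike a frequency table, the display lets us recover the original values.
recovered = sorted(10 * stem + leaf
                   for stem, leaves in display.items() for leaf in leaves)
```

The number of leaves on each stem is the class frequency, so the printed rows read like a sideways histogram.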

MS Excel Commands for Stem and Leaf Display

• Type the data into one column.
• Click Tools, Data Analysis Plus, and Stem and Leaf Display.
• Specify the Input Range. Click one of the values of Increment. (The increment is the difference between stems.)

Minitab Commands for Stem and Leaf Display

• Type the data into one column.
• Click Graph and Stem-and-Leaf…
• Type or use the Select button to specify the variable in the Variable box. Type the increment in the Increment box.

SPSS Commands for Stem and Leaf Display

• Enter the data into one column.
• Click on Analyze at the SPSS menu bar.
• Click on Descriptive Statistics, followed by Explore…
• Select the variable and transfer it into the Dependent List box and select plots for display.
• Click on Plots to open the Explore: Plots dialog box. Select None for Boxplots and Stem-and-Leaf for Descriptive and then click on Continue to return to the Explore dialog box. Then click OK.

Ogive

• The frequency distribution lists the number of observations that fall into each class interval. In some situations we may wish to highlight the number of observations that lie below each of the class limits. In such cases we create the cumulative frequency distribution. Table 1.5 displays this type of distribution for Example 1.2.

Class           Frequency   Cumulative Frequency
5.5 – 16.5          4               4
16.5 – 27.5        10              14
27.5 – 38.5        17              31
38.5 – 49.5         6              37
49.5 – 60.5         3              40

Table 1.5 Cumulative Frequency table for number of individuals treated at ER per day

From Table 1.5 we can see that, for example, 77.5% of the data is less than or equal to 38.5 and that 92.5% were less than or equal to 49.5.

Ogive for the data in Example 1.2

• Another way of presenting this information is the ogive, which is a graphical representation of the cumulative frequencies. Figure 1.5 illustrates an ogive for the cumulative frequencies in Example 1.2.
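The cumulative frequencies and the points of an ogive for Example 1.2 can be computed as in the Python sketch below. The starting point at the lowest boundary with cumulative frequency 0 is a common convention assumed here; the lecture does not state it explicitly.

```python
from itertools import accumulate

# Upper class boundaries and frequencies for the ER data of Example 1.2.
upper_limits = [16.5, 27.5, 38.5, 49.5, 60.5]
frequencies = [4, 10, 17, 6, 3]

# Running total of the frequencies gives the cumulative frequencies.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]
cumulative_pct = [100 * c / n for c in cumulative]

# Ogive points: (upper boundary, cumulative frequency).
# Assumption: the curve starts at the lowest boundary with a count of 0.
ogive_points = [(5.5, 0)] + list(zip(upper_limits, cumulative))

for limit, cum, pct in zip(upper_limits, cumulative, cumulative_pct):
    print(f"<= {limit:>5}: {cum:>3} observations ({pct:.1f}%)")
```

Joining `ogive_points` with straight line segments gives the ogive; the percentages printed for 38.5 and 49.5 are the ones quoted in the interpretation of the cumulative frequency table.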

Summary

• A set of data, even if modest in size, is often difficult to interpret directly in the form in which it is gathered. Graphical methods provide procedures for organizing and summarizing data so that patterns are revealed and the data are more easily interpreted. Frequency distributions, relative frequency distributions, percent frequency distributions, bar graphs, and pie charts were presented as tabular and graphical procedures for summarizing qualitative data. Frequency distributions, relative frequency distributions, percent frequency distributions, cumulative frequency distributions, histograms, and ogives were presented as ways of summarizing quantitative data. A stem-and-leaf display provides an exploratory data analysis technique that can be used to summarize quantitative data.

Self Test

1. A frequency distribution is a tabular summary of data showing the
a. fraction of items in several classes
b. percentage of items in several classes
c. relative percentage of items in several classes
d. number of items in several classes

2. Qualitative data can be graphically represented by using a(n)
a. histogram
b. frequency polygon
c. ogive
d. bar graph

3. The relative frequency of a class is computed by
a. dividing the midpoint of the class by the sample size
b. dividing the frequency of the class by the midpoint
c. dividing the sample size by the frequency of the class
d. dividing the frequency of the class by the sample size

4. The percent frequency of a class is computed by
a. multiplying the relative frequency by 10
b. dividing the relative frequency by 100
c. multiplying the relative frequency by 100
d. adding 100 to the relative frequency

5. Fifteen percent of the students in a school of Business Administration are majoring in Economics, 20% in Finance, 35% in Management, and 30% in Accounting. The graphical device(s) which can be used to present these data is (are)
a. a line graph
b. only a bar graph
c. only a pie chart
d. both a bar graph and a pie chart

6. A cumulative relative frequency distribution shows
a. the proportion of data items with values less than or equal to the upper limit of each class
b. the proportion of data items with values less than or equal to the lower limit of each class
c. the percentage of data items with values less than or equal to the upper limit of each class
d. the percentage of data items with values less than or equal to the lower limit of each class

7. The most common graphical presentation of quantitative data is a
a. histogram
b. bar graph
c. relative frequency
d. pie chart

8. In constructing a frequency distribution, the approximate class width is computed as
a. (largest data value - smallest data value)/number of classes
b. (largest data value - smallest data value)/sample size
c. (smallest data value - largest data value)/sample size
d. largest data value/number of classes

9. When a histogram has a longer tail to the right, it is said to be
a. symmetrical
b. skewed to the left
c. skewed to the right
d. none of these alternatives is correct

10. A histogram is said to be skewed to the left if it has a
a. longer tail to the right
b. shorter tail to the right
c. shorter tail to the left
d. longer tail to the left

Exhibit 1

Michael's Rent-A-Car, a national car rental company, has kept a record of the number of cars they have rented for a period of 80 days. Their rental records are shown below:

Number of Cars Rented    Number of Days
0 - 19                         5
20 - 39                       15
40 - 59                       30
60 - 79                       20
80 - 99                       10
Total                         80

11. Refer to Exhibit 1. The class width of the above distribution is
a. 0 to 100
b. 20
c. 80
d. 5

12. Refer to Exhibit 1. The lower limit of the first class is
a. 5
b. 80
c. 0
d. 20

13. Refer to Exhibit 1. If one develops a cumulative frequency distribution for the above data, the last class will have a frequency of
a. 10
b. 100
c. 0 to 100
d. 80

14. Refer to Exhibit 1. The percentage of days in which the company rented at least 40 cars is
a. 37.5%
b. 62.5%
c. 90.0%
d. 75.0%

15. Refer to Exhibit 1. The number of days in which the company rented less than 60 cars is
a. 20
b. 30
c. 50
d. 60

16. There are 800 students in the School of Business Administration. There are four majors in the School: Accounting, Finance, Management, and Marketing. The following shows the number of students in each major.

Major          Number of Students
Accounting           240
Finance              160
Management           320
Marketing             80

Develop a percent frequency distribution and construct a bar chart and a pie chart.

17. To help determine the need for more golf courses, a survey was undertaken. A sample of 75 self-declared golfers was asked how many rounds of golf they played last year. These data are as follows:

18 30 15 28 29 24 18 26 35 19 36 26 30 10 16 14 27 20 17 20 14 35 20 28 30 32 13 16 30 18 9 26 36 23 28 15 24 17 12 24 3 19 18 21 28 31 29 28 10 15 25 25 13 18 5 42 18 18 23 26 38 14 22 19 29 20 22 31 24 25 23 24 30 36 13

a. Draw a histogram.
b. Draw a stem-and-leaf display.
c. Draw an ogive.
d. Describe what you have learned.