Lecture 2: Diagrammatic & Graphical Presentation of Data
Department of Decision Sciences

Objective of the Lecture

 

To introduce diagrammatic and graphical statistical methods that allow managers to summarize data visually and produce useful information. To understand the importance of the graphical methods commonly used to summarize both qualitative and quantitative data. To know how these graphs are prepared and how they should be interpreted.

Introduction

The most common and simple forms of pictorial representation of data are:
(i) Bar diagram
(ii) Histogram
(iii) Pie diagram
(iv) Stem-Leaf display
(v) Frequency Polygon
(vi) Ogive

Though the first two approaches above are similar in nature, the bar diagram is meant for categorical data whereas the histogram and stem-leaf display are meant solely for quantitative data. On the other hand, a pie diagram can be used for both types of data.

Example 1.1:

University Placement Office Survey

The student placement office at a university conducted a survey of last year's business school graduates to determine the general areas in which the graduates found jobs. The placement office intended to use the resulting information to help decide where to concentrate its efforts in attracting companies to campus to conduct job interviews. Each graduate was asked in which area he or she found a job. The areas of employment are:
Accounting
Finance
General management
Marketing/Sales
Other

The responses were recorded using the codes 1, 2, 3, 4, and 5, respectively. Construct a frequency and relative frequency distribution for these data and graphically summarize the data by producing a bar chart and a pie chart. The data are listed below.

Amity Business School Data 1 4 5 1 2 1 5 4 5 1 2 1 1 3 4 4 4 4 4 3 1 1 1 1 3 4 3 4 4 1 2 2 5 1 4 4 4 5 2 1 5 3 1 1 1 2 1 4 3 1 5 2 2 4 1 4 5 4 2 2 1 4 2 4 4 1 2 2 5 1 4 3 5 5 4 2 3 2 1 3 3 2 5 2 2 2 3 1 4 1 1 4 3 4 1 3 4 2 4 1 4 1 5 1 2 1 1 4 5 1 3 1 5 1 2 4 3 2 1 1 3 4 4 1 3 4 3 4 2 1 1 2 1 4 2 2 4 4 4 4 5 4 1 4 1 3 5 2 1 5 1 4 1 4 2 2 4 4 1 2 4 4 2 3 2 5 5 1 3 2 2 2 2 1 3 4 2 3 5 1 1 1 2 4 4 1 4 4 5 3 3 5 2 4 4 4 5 2 4 5 3 5 4 3 4 1 2 2 1 3 1 4 1 3 5 1 1 3 4 1 1 1 4 4 2 3 2 2 1 2 4 1 2 1 3 4 2 1 3 5 1 1 3 1 3 2 2 1 2 1 3 1 .

Scanning the data produces no real information. To extract the information requires the application of a statistical or graphical technique. To choose the appropriate technique we must first identify the type of data. In this example the data are nominal because the numbers represent categories. The only calculation permitted on nominal data is to count the number of occurrences of each category. The list of the categories and their counts constitutes the frequency distribution. The relative frequency distribution is produced by converting the frequencies into proportions. The frequency and relative frequency distributions are combined in Table 1.1.

Table 1.1 Frequency and Relative Frequency Distribution for Example 1.1

Area                  Frequency   Relative Frequency
Accounting                   73               28.85%
Finance                      52               20.55%
General Management           36               14.23%
Marketing / Sales            64               25.30%
Other                        28               11.07%
Total                       253                 100%

Interpretation: Accounting is the most popular area of employment, followed by marketing/sales, finance, general management and others.
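The counting step behind Table 1.1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the helper name `frequency_table` is hypothetical, and the list `codes` is rebuilt from the counts in Table 1.1 rather than copied from the raw survey responses.

```python
from collections import Counter

# Area codes as defined in Example 1.1.
AREAS = {1: "Accounting", 2: "Finance", 3: "General Management",
         4: "Marketing/Sales", 5: "Other"}

def frequency_table(codes):
    """Return {code: (frequency, relative frequency)} for nominal codes."""
    counts = Counter(codes)   # counting is the only calculation allowed on nominal data
    n = len(codes)
    return {code: (counts[code], counts[code] / n) for code in sorted(counts)}

# Stand-in for the survey responses, rebuilt from the counts in Table 1.1
# (an assumption made for illustration; the original ordering is not used).
codes = [1] * 73 + [2] * 52 + [3] * 36 + [4] * 64 + [5] * 28

table = frequency_table(codes)
for code, (freq, rel) in table.items():
    print(f"{AREAS[code]:<20}{freq:>5}{rel:>9.2%}")
```

Because the data are nominal, the codes are only ever counted, never averaged or otherwise combined.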

MS Excel Commands for Frequency

1. Type data into one or more columns.
2. Activate any empty cell.
3. Click fx and select the category Statistical, and the function COUNTIF.
4. In the Range box specify the input range of the data (A1:A254). In the Criteria box type the code you want to count (1) (2) (3) (4) (5). The frequency will appear in the dialog box. Change the criteria to produce the frequency of another category.

Alternatively, type the following into any active cell: =COUNTIF([Data range], [Criteria])

Minitab Commands for Frequency

1. Type data into one or more columns.
2. Click Stat, Tables and Tally Individual Variables.
3. Type or use the Select button to specify the name of the variable or the column where the data are stored in the Variable box (Area). Under Display click Counts and Percents.

SPSS Commands for Frequency

1. Click on Analyze at the SPSS menu bar.
2. Click on Descriptive Statistics, followed by Frequencies.
3. On the dialogue box which appears, select the variables for which Frequency Tables are required, by clicking on the right arrow to transfer them from the variable list on the left to the Variables box on the right.
4. Click OK to get the tables with counts and percentages, for each of the selected variables.

Bar and Pie Chart

Graphical techniques generally catch a reader's eye more quickly than does a table of numbers. Two graphical techniques can be used to display the results shown in the table. A bar chart is often used to display frequencies; a pie chart graphically shows relative frequencies.

Bar Chart for Example 1.1

Pie Chart

If we wish to emphasize the relative frequencies instead of drawing the bar chart, we draw the pie chart. A pie chart is simply a circle subdivided into slices that represent the categories. It is drawn so that the size of each slice is proportional to the percentage corresponding to that category. For example, since the entire circle is composed of 360 degrees, a category that contains 25% of the observations is represented by a slice of the pie that contains 25% of 360 degrees, which is equal to 90 degrees. The number of degrees for each category in Example 1.1 is shown in Table 1.2.

Table 1.2 Proportion in Each Category in Example 1.1

Area                  Proportion of Graduates   Slice of Pie (degrees)
Accounting                             28.85%                    103.9
Finance                                20.55%                     74.0
General Management                     14.23%                     51.2
Marketing / Sales                      25.30%                     91.1
Other                                  11.07%                     39.8
Total                                 100.00%                      360
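The slice angles in Table 1.2 follow from one multiplication per category: proportion times 360 degrees. The sketch below recomputes them from the counts in Table 1.1; the function name `slice_degrees` is a hypothetical helper chosen for illustration.

```python
# Counts from Table 1.1; proportions are computed as count / total.
counts = {"Accounting": 73, "Finance": 52, "General Management": 36,
          "Marketing/Sales": 64, "Other": 28}
total = sum(counts.values())

def slice_degrees(proportion):
    """Angle of a pie slice: the category's share of the full 360-degree circle."""
    return proportion * 360.0

angles = {area: slice_degrees(c / total) for area, c in counts.items()}
for area, deg in angles.items():
    print(f"{area:<20}{deg:>7.1f}")
```

Working from the unrounded proportions rather than the rounded percentages keeps the angles summing to 360 degrees.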

Pie Chart for Example 1.1

MS Excel Commands for Bar and Pie Chart

1. After creating the frequency distribution, highlight the column of frequencies.
2. For a bar chart click the Chart Wizard, Column and Finish.
3. Click Chart (on Tool Bar), Chart Options, and make whatever changes you think make the chart look best.
4. For a pie chart click Pie instead of Column.

Minitab Commands for Bar and Pie Chart

For a bar chart:
1. Click Graph and Bar Chart.
2. In the Bars represent box click Counts of unique values and select Simple.
3. Type or use the Select button to specify the variable in the Variables box (Area).
4. We clicked Labels and added the title, then clicked Data Labels and used y-value labels to display the frequencies at the top of the columns.

For a pie chart:
1. Click Graph and Pie Chart.
2. Click Chart raw data and in the Categorical variables box type or use the Select button to specify the variable (Area).
3. We clicked Labels and added the title.
4. We clicked Slice Labels and clicked Category name and Percent.

SPSS Commands for Bar and Pie Chart

1. Click on Analyze at the SPSS menu bar.
2. Click on Descriptive Statistics, followed by Frequencies.
3. On the dialogue box which appears, select the variables for which Frequency Tables are required, by clicking on the right arrow to transfer them from the variable list on the left to the Variables box on the right.
4. Click OK to get the tables with counts and percentages, for each of the selected variables.

Charts can be requested by clicking on Charts on the main dialogue box, selecting the required type of charts, and clicking Continue before step 4 above.

Alternatively: click on Graphs at the SPSS menu bar, followed by Chart Builder.

Histogram

Example 1.2: A random sample of 40 days gave the following information about the total number of people treated per day at a community hospital emergency room (ER).

40  35  42   6  13  50  60  27
 8  42  53  17  25  23  24  12
26  32  28  28  31  29  30  28
21  46  22  19  20  30  31  30
36  30  40  38  30  29  31  41

Here, the population = collection of days over a long period of time, and the sample = collection of 40 days. The (quantitative) variable = number of people being treated at the ER per day.

Since the variable is quantitative and can take many possible values (many more than a typical categorical variable), it does not make sense to have frequencies for distinct entries (we might end up with 40 distinct entries, each having frequency 1). So, here we first find the minimum (min) and maximum (max) entries to get the spread of the variable (in the sample). There is a systematic way of finding the min and max. First, find the min and max for each column, which is easy to do, since there are far fewer entries in a single column (compared to the whole array). Next, find

(overall) min = minimum of column minimums and
(overall) max = maximum of column maximums.

By this method, we get
Column minimums = 8, 30, 22, 6, 13, 23, 24 and 12.
Column maximums = 40, 46, 53, 38, 31, 50, 60 and 41,
and hence min = 6 and max = 60.
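The column-by-column search for the minimum and maximum can be sketched as follows. The grouping of the 40 ER values into 8 columns of 5 is an assumption made here: it takes the listed values five at a time, which reproduces the column minimums and maximums quoted above.

```python
# ER data from Example 1.2, grouped into 8 columns of 5 values each
# (assumed layout: the listed values taken five at a time).
columns = [
    [40, 8, 26, 21, 36],
    [35, 42, 32, 46, 30],
    [42, 53, 28, 22, 40],
    [6, 17, 28, 19, 38],
    [13, 25, 31, 20, 30],
    [50, 23, 29, 30, 29],
    [60, 24, 30, 31, 31],
    [27, 12, 28, 30, 41],
]

# Step 1: min and max within each (short) column.
col_mins = [min(col) for col in columns]
col_maxs = [max(col) for col in columns]

# Step 2: overall min is the smallest column minimum,
# overall max is the largest column maximum.
overall_min = min(col_mins)
overall_max = max(col_maxs)

print("Column minimums:", col_mins)
print("Column maximums:", col_maxs)
print("min =", overall_min, "max =", overall_max)
```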

Note that the unit here (i.e., the smallest possible increment of the quantitative variable) is 1 (or 1 patient). We modify the range (6, 60) by extending it by one half of a unit on both sides. This is called a modified range, and for the present data set our modified range is (5.5, 60.5). The lower limit of the modified range is 5.5, and the upper limit is 60.5. The idea behind the modified range is that it includes the boundary values (6 and 60) properly. The length (L) of the modified range is L = upper limit – lower limit = 60.5 – 5.5 = 55.

This length L is now divided into several subintervals, which gives us a few classes. The number of classes, say k, is a convenient number, usually taken between 5 and 8. For the present case take k = 5 and then l = length of each class = L/k = 11. (The notation l is used to denote the length of each class or subinterval.) Therefore, we can divide the modified range (5.5, 60.5) into successive contiguous classes:

(5.5, 5.5 + l) = (5.5, 16.5)
(16.5, 16.5 + l) = (16.5, 27.5)
(27.5, 27.5 + l) = (27.5, 38.5)
(38.5, 38.5 + l) = (38.5, 49.5)
(49.5, 49.5 + l) = (49.5, 60.5)

Table 1.3: Frequency table for number of individuals treated at ER per day

Class           Frequency
5.5 – 16.5          4
16.5 – 27.5        10
27.5 – 38.5        17
38.5 – 49.5         6
49.5 – 60.5         3
Total              40
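The construction of the modified range, the class boundaries and the frequencies in Table 1.3 can be checked with a short sketch. The function names `class_boundaries` and `class_frequencies` are hypothetical helpers introduced only for this illustration.

```python
# The 40 ER observations from Example 1.2.
er_data = [40, 8, 26, 21, 36, 35, 42, 32, 46, 30,
           42, 53, 28, 22, 40, 6, 17, 28, 19, 38,
           13, 25, 31, 20, 30, 50, 23, 29, 30, 29,
           60, 24, 30, 31, 31, 27, 12, 28, 30, 41]

def class_boundaries(data, k, unit=1):
    """Extend (min, max) by half a unit on each side and split into k equal classes."""
    lower = min(data) - unit / 2        # 6 - 0.5 = 5.5
    upper = max(data) + unit / 2        # 60 + 0.5 = 60.5
    width = (upper - lower) / k         # L / k = 55 / 5 = 11
    return [lower + i * width for i in range(k + 1)]

def class_frequencies(data, bounds):
    """Count the observations falling strictly inside each class."""
    freqs = [0] * (len(bounds) - 1)
    for x in data:
        for i in range(len(freqs)):
            if bounds[i] < x < bounds[i + 1]:
                freqs[i] += 1
                break
    return freqs

bounds = class_boundaries(er_data, k=5)
freqs = class_frequencies(er_data, bounds)
for lo, hi, f in zip(bounds, bounds[1:], freqs):
    print(f"{lo:>5} - {hi:<5} {f:>3}")
```

Because every boundary ends in .5 while the data are whole numbers, no observation can sit exactly on a boundary, so the strict inequalities lose nothing.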

Histogram for Example 1.2

Now we use the emergency room data to illustrate a histogram. Draw and label the x-y axes. Usually, like the bar diagram, the y-axis represents the relative frequency and the x-axis represents the class (or interval) boundaries. Now, using the relative frequencies as the heights, draw vertical bars for each class.

Given a frequency table (with fixed number of classes and class boundaries), the histogram of a dataset is unique (unlike the bar diagram). Another departure from the bar diagram is the absence of a fixed gap between two successive classes. This is due to the natural ordering of the classes.

A bar graph and a histogram are essentially the same thing; both are graphical presentations of the data in a frequency distribution. A histogram is just a bar graph with no separation between bars. The separation between bars is appropriate for qualitative data because the data are discrete; no intermediate values are possible. For discrete quantitative data, a separation between bars is also appropriate.

Frequency Polygon

The histogram gives rise to another simple concept called the relative frequency polygon. Find the midpoint of each class (the midpoint of a class is found by adding the two endpoints of the class and then dividing by 2), and then plot the relative frequencies (on the y-axis) against the midpoints (on the x-axis). Connect the adjacent points with straight line segments, and the resultant diagram is a frequency polygon.

A frequency polygon shows the trend in the data in terms of frequency (which is also evident in the histogram). From the frequency polygon in Figure 1.3 it is clear that for the emergency room dataset, the frequency or relative frequency increases as the number of patients per day increases to 33, and beyond this the frequency starts falling. Roughly, we see that there are more days when we treat 25 patients per day than 15 patients per day. Similarly, there are fewer days when we treat 45 patients per day than 35 patients per day.
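The points that a relative frequency polygon joins can be computed directly from Table 1.3. The sketch below uses the class boundaries and frequencies from that table; the variable names are illustrative only.

```python
# Class boundaries and frequencies from Table 1.3.
bounds = [5.5, 16.5, 27.5, 38.5, 49.5, 60.5]
freqs = [4, 10, 17, 6, 3]
n = sum(freqs)

# Midpoint of a class = (lower endpoint + upper endpoint) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]

# Relative frequency of a class = frequency / sample size.
rel_freqs = [f / n for f in freqs]

# The polygon joins these (midpoint, relative frequency) points in order.
points = list(zip(midpoints, rel_freqs))
peak_midpoint = max(points, key=lambda p: p[1])[0]

for x, y in points:
    print(f"{x:>5.1f}  {y:.3f}")
print("Highest point at midpoint", peak_midpoint)
```

The highest point sits at the midpoint 33, which is where the text says the frequency stops rising and starts to fall.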

Relative Frequency Polygon for the data in Example 1.2

If a frequency polygon has a longer right (left) tail than the left (right) one along with a single hump, then the frequency polygon (or the histogram) is called positively (negatively) skewed. If a frequency polygon with a single hump has approximately equal left and right tails (i.e., looks symmetric) then it is said to have a bell shape.

MS Excel Commands for Histogram

1. Type the data into one column. In another column type the upper limits of the class intervals. Excel calls them bins.
2. Click Tools, Data Analysis..., and Histogram. If Data Analysis does not appear in the menu box, you have to install it by using Excel Options and Add-ins.
3. Specify the Input Range and the Bin Range. Click Labels if the first row contains names. Click Chart Output.
4. To remove the gaps, place the cursor over one of the rectangles and click the right button of the mouse. Click (with the left button) Format Data Series.... Click Options, move the pointer to Gap Width and change the number from 150 to 0.
5. Click Chart and Chart Options... to make cosmetic changes.

Note that the numbers along the horizontal axis represent the upper limits of each class although they appear to be placed in the centers. Except for the first class, Excel counts the number of observations in each class that are greater than the lower limit and less than or equal to the upper limit.

Minitab Commands for Histogram

1. Type or import the data into one column.
2. Click Graph, Histogram..., and Simple.
3. Type or use the Select button to specify the name of the variable in the Graph variables box.
4. Click Data View.
5. Click Data Display and Bar. Minitab will create a histogram using its own choices of class intervals.
6. To choose your own classes, double-click the horizontal axis. Click Binning.
7. Under Interval Type choose Cutpoint. Under Interval Definition choose Midpoint/Cutpoint positions and type in your choices.

Note that Minitab counts the number of observations in each class that are strictly less than the upper limit and greater than or equal to the lower limit.
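The two counting conventions described for Excel and Minitab differ only in which class receives an observation that falls exactly on a class limit. The sketch below illustrates the difference on a small made-up dataset; the function names and the sample values are hypothetical, and the special handling Excel gives its first class is left out.

```python
def count_upper_inclusive(data, bounds):
    """Classes (lower, upper]: a value on a limit is counted in the class below it."""
    freqs = [0] * (len(bounds) - 1)
    for x in data:
        for i in range(len(freqs)):
            if bounds[i] < x <= bounds[i + 1]:
                freqs[i] += 1
                break
    return freqs

def count_lower_inclusive(data, bounds):
    """Classes [lower, upper): a value on a limit is counted in the class above it."""
    freqs = [0] * (len(bounds) - 1)
    for x in data:
        for i in range(len(freqs)):
            if bounds[i] <= x < bounds[i + 1]:
                freqs[i] += 1
                break
    return freqs

# Made-up values: two observations sit exactly on the limit 20.
sample = [12, 20, 20, 25, 31]
limits = [10, 20, 30, 40]

print(count_upper_inclusive(sample, limits))   # the two 20s land in the first class
print(count_lower_inclusive(sample, limits))   # the two 20s land in the second class
```

With half-unit class boundaries such as 5.5, 16.5 and so on, no whole-number observation can fall on a limit, so both conventions give the same frequency table.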

Stem and Leaf Display

A frequency table and the corresponding histogram provide a useful organization and pictorial representation of data. However, in a frequency table (like Table 1.3) we do lose the individual values of the observations. A stem-leaf display is a simple device that groups the whole dataset and produces a histogram-like or bar-diagram-like picture, yet allows us to recover the original dataset if required. The stem-leaf display is an extremely useful way of studying data structure for a quantitative variable. We illustrate this with the following example.

Example 1.3: Table 1.4 gives the one-way commuting distance (in nearest miles) of 30 working mothers in a large city.

Table 1.4 Commuting Distance Data

13   7  12   6  34  14  47  25  45   2
13  26  10   8   1  14  41  10   3  21
 8  13  28  24  16  19   4   7  50  36

To make a stem-leaf display, we partition the digits of each individual observation (numeric value) into two components: stem and leaf. The left side group of digits of the entry is called a stem and the right side group of digits is called a leaf. The number of digits to be included in the stem is chosen conveniently so that the number of stems in the display is between 5 and 20.

an entry = 8  →  0 (tens digit)  8 (units digit)

(A single digit entry, say 8, is read as 08 before being broken into 'stem' and 'leaf'.)

For the data in Table 1.4, where all entries are one- or two-digit numbers, we use the tens digit of an entry to form the stem and the units digit to form the corresponding leaf. For the first entry 13, the stem is 1 and the leaf is 3. The entry 8 is treated as 08, meaning 0 for its stem and 8 for its leaf. Figure 1.4 gives the stem-leaf display of the above mentioned data. The horizontal length of the leaves represents the frequency for the corresponding stem, which is essentially a class. The stem 1 represents the class 10-19 miles, or more correctly the class 9.5-19.5 miles, since the data entries are rounded values and hence anyone commuting 9.5 (or 9.6 or 9.7 or 9.8 or 9.9) miles would be assigned the value 10. From the display, it is clear that most of the entries are in the 10-mile range [i.e., (10, 19) miles], followed by the 0-mile range [i.e., (0, 9) miles].

A Stem-and-Leaf display for the data in Example 1.3

Stem   Leaf
0      1 2 3 4 6 7 7 8 8
1      0 0 2 3 3 3 4 4 6 9
2      1 4 5 6 8
3      4 6
4      1 5 7
5      0
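The stem-and-leaf construction for two-digit data amounts to splitting each value into its tens digit and units digit. The following sketch applies that rule to the commuting data in Table 1.4; the function name `stem_and_leaf` is a hypothetical helper, and sorting the leaves within each stem is a presentation choice made here.

```python
from collections import defaultdict

# Commuting distances (miles) from Table 1.4.
distances = [13, 7, 12, 6, 34, 14, 47, 25, 45, 2,
             13, 26, 10, 8, 1, 14, 41, 10, 3, 21,
             8, 13, 28, 24, 16, 19, 4, 7, 50, 36]

def stem_and_leaf(values):
    """Group values by tens digit (stem); keep the units digits (leaves) sorted."""
    display = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # 8 -> stem 0, leaf 8; 13 -> stem 1, leaf 3
        display[stem].append(leaf)
    return {stem: sorted(display[stem]) for stem in sorted(display)}

display = stem_and_leaf(distances)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")

# The original values can be recovered from the display (order aside).
recovered = [10 * stem + leaf for stem, leaves in display.items() for leaf in leaves]
```

The length of each printed row is the frequency of that stem, and `recovered` shows that, unlike a frequency table, the display loses none of the individual values.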

MS Excel Commands for Stem and Leaf Display

1. Type the data into one column.
2. Click Tools, Data Analysis Plus, and Stem and Leaf Display.
3. Specify the Input Range. Click one of the values of Increment. (The increment is the difference between stems.)

Minitab Commands for Stem and Leaf Display

1. Type the data into one column.
2. Click Graph and Stem-and-Leaf…
3. Type or use the Select button to specify the variable in the Variable box. Type the increment in the Increment box.

SPSS Commands for Stem and Leaf Display

1. Enter the data into one column.
2. Click on Analyze at the SPSS menu bar.
3. Click on Descriptive Statistics, followed by Explore…
4. Select the variable and transfer it into the Dependent List box and select Plots for display.
5. Click on Plots to open the Explore: Plots dialog box. Select None for Boxplots and Stem-and-Leaf for Descriptive and then click on Continue to return to the Explore dialog box. Then click OK.

Ogive

The frequency distribution lists the number of observations that fall into each class interval. In some situations we may wish to highlight the number of observations that lie below each of the class limits. In such cases we create the cumulative frequency distribution. Table 1.5 displays this type of distribution for Example 1.2.

Table 1.5 Cumulative Frequency table for number of individuals treated at ER per day

Class           Frequency   Cumulative Frequency
5.5 – 16.5          4               4
16.5 – 27.5        10              14
27.5 – 38.5        17              31
38.5 – 49.5         6              37
49.5 – 60.5         3              40

From Table 1.5 we can see that, for example, 77.5% of the data are less than or equal to 38.5 and that 92.5% are less than or equal to 49.5.

Another way of presenting this information is the ogive, which is a graphical representation of the cumulative frequencies. Figure 1.5 illustrates an ogive for the cumulative frequencies in Example 1.2.

Ogive for the data in Example 1.2
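The cumulative frequencies and the points an ogive joins can be derived from the class frequencies with a running sum. This is a minimal sketch using the frequencies from the ER example; starting the ogive at zero at the lowest class boundary (5.5) is a common convention assumed here, not something stated in the lecture.

```python
from itertools import accumulate

# Upper class limits and frequencies for the ER data (Example 1.2).
upper_limits = [16.5, 27.5, 38.5, 49.5, 60.5]
freqs = [4, 10, 17, 6, 3]

# Running total of the frequencies gives the cumulative frequencies.
cum_freqs = list(accumulate(freqs))
n = cum_freqs[-1]

# Cumulative percent: share of observations at or below each upper limit.
cum_pct = [100 * c / n for c in cum_freqs]

# Ogive points: start at 0% at the lowest boundary (assumed convention),
# then one point per upper class limit.
ogive_points = [(5.5, 0.0)] + list(zip(upper_limits, cum_pct))

for limit, c, pct in zip(upper_limits, cum_freqs, cum_pct):
    print(f"<= {limit:>5}  {c:>3}  {pct:>6.1f}%")
```

The printed percentages include the 77.5% and 92.5% figures quoted for the limits 38.5 and 49.5.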

Summary

A set of data, even if modest in size, is often difficult to interpret directly in the form in which it is gathered. Graphical methods provide procedures for organizing and summarizing data so that patterns are revealed and the data are more easily interpreted. Frequency distributions, relative frequency distributions, percent frequency distributions, bar graphs, and pie charts were presented as tabular and graphical procedures for summarizing qualitative data. Frequency distributions, relative frequency distributions, percent frequency distributions, histograms, cumulative frequency distributions, and ogives were presented as ways of summarizing quantitative data. A stem-and-leaf display provides an exploratory data analysis technique that can be used to summarize quantitative data.

Self Test

1. A frequency distribution is a tabular summary of data showing the
a. fraction of items in several classes
b. percentage of items in several classes
c. relative percentage of items in several classes
d. number of items in several classes

2. Qualitative data can be graphically represented by using a(n)
a. histogram
b. frequency polygon
c. ogive
d. bar graph

3. The relative frequency of a class is computed by
a. dividing the midpoint of the class by the sample size
b. dividing the frequency of the class by the midpoint
c. dividing the sample size by the frequency of the class
d. dividing the frequency of the class by the sample size

4. The percent frequency of a class is computed by
a. multiplying the relative frequency by 10
b. dividing the relative frequency by 100
c. multiplying the relative frequency by 100
d. adding 100 to the relative frequency

5. Fifteen percent of the students in a school of Business Administration are majoring in Economics, 20% in Finance, 35% in Management, and 30% in Accounting. The graphical device(s) which can be used to present these data is (are)
a. a line graph
b. only a bar graph
c. only a pie chart
d. both a bar graph and a pie chart

6. A cumulative relative frequency distribution shows
a. the proportion of data items with values less than or equal to the upper limit of each class
b. the proportion of data items with values less than or equal to the lower limit of each class
c. the percentage of data items with values less than or equal to the upper limit of each class
d. the percentage of data items with values less than or equal to the lower limit of each class

7. The most common graphical presentation of quantitative data is a
a. histogram
b. bar graph
c. relative frequency
d. pie chart

8. In constructing a frequency distribution, the approximate class width is computed as
a. (largest data value - smallest data value)/number of classes
b. (largest data value - smallest data value)/sample size
c. (smallest data value - largest data value)/sample size
d. largest data value/number of classes

9. A histogram is said to be skewed to the left if it has a
a. longer tail to the right
b. shorter tail to the right
c. shorter tail to the left
d. longer tail to the left

10. When a histogram has a longer tail to the right, it is said to be
a. symmetrical
b. skewed to the left
c. skewed to the right
d. none of these alternatives is correct

Exhibit 1

Michael's Rent-A-Car, a national car rental company, has kept a record of the number of cars they have rented for a period of 80 days. Their rental records are shown below:

Number of Cars Rented   Number of Days
0 - 19                         5
20 - 39                       15
40 - 59                       30
60 - 79                       20
80 - 99                       10
Total                         80

11. Refer to Exhibit 1. The class width of the above distribution is
a. 0 to 100
b. 20
c. 80
d. 5

12. Refer to Exhibit 1. The lower limit of the first class is
a. 5
b. 80
c. 0
d. 20

13. Refer to Exhibit 1. If one develops a cumulative frequency distribution for the above data, the last class will have a frequency of
a. 10
b. 100
c. 0 to 100
d. 80

14. Refer to Exhibit 1. The percentage of days in which the company rented at least 40 cars is
a. 37.5%
b. 62.5%
c. 90.0%
d. 75.0%

15. Refer to Exhibit 1. The number of days in which the company rented fewer than 60 cars is
a. 20
b. 30
c. 50
d. 60

16. There are 800 students in the School of Business Administration. There are four majors in the School: Accounting, Finance, Management, and Marketing. The following shows the number of students in each major.

Major         Number of Students
Accounting          240
Finance             160
Management          320
Marketing            80

Develop a percent frequency distribution and construct a bar chart and a pie chart.

17. To help determine the need for more golf courses, a survey was undertaken. A sample of 75 self-declared golfers was asked how many rounds of golf they played last year. These data are as follows:

18 30 15 28 29 24 18 26 35 19 36 26 30 10 16
14 27 20 17 20 14 35 20 28 30 32 13 16 30 18
9 26 36 23 28 15 24 17 12 24 3 19 18 21 28
31 29 28 10 15 25 25 13 18 5 42 18 18 23 26
38 14 22 19 29 20 22 31 24 25 23 24 30 36 13

a. Draw a histogram.
b. Draw a stem-and-leaf display.
c. Draw an ogive.
d. Describe what you have learned.