# Introduction

Excel Guide


The Excel Guide
to accompany Practical Business Statistics by Andrew F. Siegel

PREPARED BY ANDREW F. SIEGEL

Excel is a registered trademark of Microsoft, Inc.


Introduction and Sample Excel Session
Excel is a powerful computing environment with statistical capabilities. You can type data into the worksheet, analyze and manipulate the data, and write text to identify and explain it. Summaries, charts, and detailed calculations are easily done using Excel’s menu commands and functions. In this introductory chapter we cover some of the basics of Excel with hints and tips including: entering data, doing arithmetic, using functions, selecting and naming cells, using UNDO, formatting, working with data files from Practical Business Statistics, sorting, and making a chart. If you are new to Excel, remember that the best way to learn is by experimenting. Explore the menu system and try things out to see how they work. Use Help for guidance. This manual gives you step-by-step instructions for many tasks. If you are an experienced Excel user, this manual will show you many ways in which Excel can be used for statistical calculations.

Moving Around and Typing Data into the Worksheet
I enjoy the freedom of working in spreadsheets like Excel. You can click on any cell you want, type anything you want - text or number or function - and it stays there when you hit the Enter key. To move around, you can use the mouse or the cursor keys ← , ↑ , → , and ↓ . Here is a worksheet that has some text, some numbers and a function. Note that you can see what function is in the selected cell (in this case, “=C3+C4”) by looking at the Formula Bar near the top.

In this case, the formula starts, as always, with an equals sign “=”. To enter the formula that adds Jim’s and Adrian’s sales together, you might either type the formula directly and hit Enter, or construct it by pointing to cells as follows:
1. Select cell C6 by clicking on it or moving to it with the cursor keys.
2. Hit the = key.
3. Click on Jim’s sales in cell C3.
4. Hit the + key.
5. Click on Adrian’s sales in cell C4.
6. Hit Enter.

Using Formulas to do Basic Arithmetic
Any cell can contain a formula that uses basic arithmetic with numbers and references to numbers (or formula results) in other cells. Here are some of the rules of basic arithmetic in Excel.
1. Start by selecting the cell where you want the result to go and hitting the = key.
2. Use operators “+” for addition, “−” for subtraction, “*” for multiplication, “/” for division, and “^” for exponentiation (to raise to a power). Here are some examples (in column B) with the formulas written out (in column C). Note that 2 ^ 3 means three 2s multiplied together, so it’s 2 * 2 * 2 = 8. Note also that the last formula, in cell B7, adds the results of two other formulas to find 18 as 8 + 10.
3. Rules of arithmetic say that these operations are performed in the following order:
   a. Exponentiation “^” is done first.
   b. Multiplication “*” and division “/” happen next. You will want to use parentheses so that expressions with multiplication after division, like 2 / (3 * 4), are correctly evaluated.
   c. Addition “+” and subtraction “-” are done last. Thus 6 + 4 * 2 ^ 3 is evaluated as 6 + 4 * 8, which is 6 + 32, which is 38.
   d. If you want something to happen first, put it in parentheses. For example, (2 + 3) * 4 makes the addition happen before the multiplication.
   e. If you have a minus sign that is not subtracting, be careful! It happens even before exponentiation! Thus -2 ^ 4 is evaluated as (-2) ^ 4 which is 16. If you wanted -(2 ^ 4) you would need to include the parentheses to make the exponentiation happen first and to get -16 as the answer.
4. Percentages are used as if they were already divided by 100. For example, if you enter a percent like “20%” directly into a cell, its value is taken to be 0.20. This makes it easy, for example, to find 20% of a number: you simply multiply the number by 20%.

Using Functions to Compute a Number
Excel has a vast collection of useful functions. One easy way to browse them is to select an empty cell and choose Insert/Function from the main menu. Here is how it looks if you select the Statistical Function Category and then the AVERAGE function:

This is a nice way to insert a function into the worksheet because Excel will help you fill in the details in the correct order, so that you don’t have to memorize what goes where, which is especially useful with functions that need more than one piece of information. To insert the AVERAGE function, click OK to see a dialog box that is ready for you to select one or more cells by clicking or dragging the mouse across cells with the numbers you want to average. You may move this dialog box out of the way by dragging most anywhere on it. Here is how it looks after dragging down cells B2 through B7:

When you click OK, the result is placed into the worksheet in the cell that was selected when you first chose Insert/Function from the menu. Here is the result:

You could achieve exactly the same result by selecting cell B9 and typing “=AVERAGE(B2:B7)” without the quotation marks and then hitting Enter. Another way to do this is to type “=AVERAGE(” without the quotation marks, then use the mouse to drag down cells B2 through B7, then type “)” without the quotation marks and hit Enter.
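As a cross-check on what AVERAGE computes, here is a small Python sketch of the same calculation. The six numbers are hypothetical stand-ins for cells B2 through B7 (the actual worksheet values appear only in the figure), and `mean` comes from Python's standard `statistics` module.

```python
from statistics import mean

# Hypothetical values standing in for worksheet cells B2 through B7.
cells_b2_to_b7 = [8, 10, 12, 6, 9, 15]

# Same idea as typing =AVERAGE(B2:B7): add the values and divide by how many there are.
average = mean(cells_b2_to_b7)
print(average)
```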

Selecting a Range of Cells
You will probably want to do many things to cells: put things in them, format them, calculate with them. The way Excel works, you will need to know how to select cells in order to change them or use them. To select a rectangular range of worksheet cells, simply drag the mouse from one corner diagonally to the opposite corner. The result will look something like this:


Another way to select these cells would be to use the cursor keys to move to one corner, say C70. Then hold the Shift key while you move right → twice. Then hit the End key (with or without Shift). Finally, hold the Shift key while you hit the down arrow ↓ . When you use the End key, the next movement (left, right, up, or down) will go to the end of the row or column you are working in. Holding the Shift key expands the selection.

Naming a Range of Cells
It is much more convenient to refer to a list of data using an Excel Range Name like “Sales” instead of an Excel address like “D3:D6”. It is a good idea to also have a column heading like “Sales” in the cell above the data, but this may not be enough. Some versions of Excel will try to figure out which cells you wish to work with, but the best way to be sure that the name is associated with the correct data is to explicitly give the range a name. Here is one way to create an Excel range name for a column of sales numbers with a label at the top:
1. Begin by selecting the sales numbers (just the numbers, not the label) by dragging the mouse down the column. It should look something like this:
2. Choose Insert/Name/Define from the main menu system. Because the label is at the top and you have selected cells below it, Excel knows what you want to do and proposes to give the range name “Sales” to the data in cells D3 through D6. Here is how it should look:
   You can also use this Define Name dialog box to see what other names are defined and to check that they refer to the correct worksheet range. You cannot just choose any name for a range. The first character must be a letter or the underscore character “_”. The other characters can be letters, numbers, periods, and underscore characters, but not spaces (use underscores instead). Names cannot be the same as a cell reference (e.g. C16, R3C5, R and C are not allowed). There is no distinction between uppercase and lowercase letters, so “Sales”, “sales”, “SALES”, and “sALeS” all refer to the same worksheet cells.
3. When you choose OK, the range name is assigned. Whenever you select this range, its name (“Sales”) will appear in the name box near the top left corner of the worksheet, at the left end of the formula bar. You can select this range quickly by choosing its name in the name box.
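The naming rules just described can be sketched as a small Python check. This is a simplified illustration under stated assumptions, not Excel's actual validation: the function names are hypothetical, only unaccented letters are handled, and the cell-reference patterns are rough approximations of the A1 and R1C1 styles.

```python
import re

# First character: a letter or underscore; the rest: letters, digits, periods, underscores.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._]*")
# Rough approximations of cell references (assumptions, not Excel's exact rules).
_A1_REFERENCE = re.compile(r"[A-Za-z]{1,3}[0-9]+")                       # e.g. C16
_R1C1_REFERENCE = re.compile(r"[Rr][0-9]*[Cc][0-9]*|[Rr][0-9]*|[Cc][0-9]*")  # e.g. R3C5, R, C


def looks_like_valid_range_name(name):
    """Return True if the name passes the rules listed above (simplified)."""
    if not _NAME_PATTERN.fullmatch(name):
        return False
    if _A1_REFERENCE.fullmatch(name) or _R1C1_REFERENCE.fullmatch(name):
        return False
    return True


def same_range_name(first, second):
    """Names are not case sensitive, so compare them in lowercase."""
    return first.lower() == second.lower()


for candidate in ["Sales", "Net_Sales", "Net Sales", "2Sales", "C16", "R3C5", "R"]:
    print(candidate, looks_like_valid_range_name(candidate))
```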


The Fill Handle
At the lower right-hand corner of a selection is the fill handle. Here’s one nice thing it can do if you drag it with the mouse: extend a sequence automatically:

Another nice thing the fill handle can do is automatically copy a selected cell’s formula down a column by dragging the fill handle as far as you want. If the cell is next to a column with data in it, then double-clicking the fill handle will automatically copy the cell’s contents down the column!


Copying and Pasting
To copy and paste text, a number, or a formula, you select the source cell(s), choose Edit/Copy from the main menu, select a cell at the destination, and choose Edit/Paste. To move the contents of a cell or cells, select the source cell(s) and choose Edit/Cut, select a cell at the destination, and choose Edit/Paste (or just hit Enter). To paste just the numbers but not the formulas, when you paste, choose Edit/PasteSpecial/Values. If a formula adds the two numbers to its left, then the way it copies depends on how the cell addresses are specified. With relative addressing the formula changes to reflect its new location. Suppose that the formula “=A5+B5” is in cell C5. When this formula is copied to another cell, the resulting formula will change so that it adds the two cells to the left of the destination. For example, if this formula is copied to cell C6, the formula will change to “=A6+B6”. With absolute addressing the formula remains the same. If the formula includes dollar signs to read “=\$A\$5+\$B\$5”, then it remains unchanged when it is copied, always adding these two cells.
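To make the difference between relative and absolute addressing concrete, here is a rough Python sketch of what happens to the row numbers when a formula is copied down a column. The function name `copy_formula_down` is hypothetical, and the sketch is deliberately limited: it only adjusts rows, and it assumes a simple formula with no function names that end in digits (such as LOG10), which it would mistake for cell references.

```python
import re

# Optional "$", column letters, optional "$", row digits, e.g. A5, $A$5, A$5.
_CELL_REFERENCE = re.compile(r"(\$?)([A-Z]+)(\$?)([0-9]+)")


def copy_formula_down(formula, row_offset):
    """Shift every relative row number by row_offset; rows locked with "$" stay put."""

    def shift(match):
        column_lock, column, row_lock, row = match.groups()
        new_row = int(row) if row_lock else int(row) + row_offset
        return f"{column_lock}{column}{row_lock}{new_row}"

    return _CELL_REFERENCE.sub(shift, formula)


print(copy_formula_down("=A5+B5", 1))       # relative addressing: rows move with the copy
print(copy_formula_down("=$A$5+$B$5", 1))   # absolute addressing: formula is unchanged
```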

Using UNDO
Thank goodness for UNDO! No need to worry if you have just erased your precious data by accidentally hitting the delete key, so long as you react reasonably quickly. Just choose Edit/Undo from the main menu, and your valuable data will reappear as if by magic. Excel now has multiple UNDO levels, so that you can undo more than one action.

Formatting a Range of Cells
To make your worksheet look nice, you will need to format cells. Select the cells first, then use Format/Cells from the main menu. You will then have control over how numbers appear (number of decimal places, percentage, dollar signs, dates, etc.), how cells are aligned (left or right, top or bottom), what font is used (including color, size, underline, italic, bold), how cell borders are indicated, and what patterns and colors fill the cells. To show numbers with 2 decimal places and commas for thousands separation, here is the Format/Cells dialog box with the Number tab chosen. Don’t forget to select the cell(s) first!


To show numbers as percentages with one decimal place, you would use the Percentage category on the Number tab, with Decimal places set to 1.
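Formatting changes only how a number is displayed, not the value stored in the cell. As a loose analogy outside of Excel, the following Python sketch displays two made-up numbers with the same two formats (commas with 2 decimal places, and percentage with one decimal place) while leaving the underlying values untouched.

```python
# Made-up values; formatting affects only the displayed text, not the number itself.
revenue = 1234567.891
share = 0.944

formatted_number = f"{revenue:,.2f}"   # commas for thousands, 2 decimal places
formatted_percent = f"{share:.1%}"     # shown as a percentage with 1 decimal place

print(formatted_number)
print(formatted_percent)
```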


Working with Data Files from Practical Business Statistics
For use with Excel, each chapter of Practical Business Statistics has its own data file that includes the data tables from examples and problems. To access it, use File/Open from Excel’s menu. Each column of numbers is named and ready to use. For example, the data sets from Chapter 3 are in the file named Chapter03.xls, and the employee database from Appendix A of the textbook is in the file named EmployeeDatabase.xls. A list of the names used for each individual data table within a file can be found in the Appendix to this Excel guide. To work with a column of numbers from a data file, you may use its name in a formula, such as “=AVERAGE(yield)” to place the average of a column of numbers named “yield” into a cell in your worksheet. Alternatively, you may drag the mouse down the numbers in the data set to select them if you wish.

Sorting to Put a Range in Order
When you want to put a column of data in order, smallest to largest or largest to smallest, simply select your data, then choose Data/Sort from the main menu. If you have a larger database with more than one variable measured on each elementary unit, be sure to select the entire data set before sorting. Here is a small database:

To sort it by revenues, you may either start by selecting A6 through C9, or let Excel do it for you when you choose Data/Sort from the main menu. Here is how it should look as you prepare to sort by Revenues, with both columns of data selected along with the identifying labels.


When you choose OK, the cities are sorted in order by revenues, and their expenses have correctly remained associated with them:
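The reason for selecting the entire data set before sorting is that each row must travel as a unit. Here is a minimal Python sketch of that idea, using hypothetical city names and figures (the actual database appears only in the figure): each row is kept together as one record, so sorting by revenues cannot separate a city from its expenses.

```python
# Hypothetical (city, revenues, expenses) rows; the real values are in the worksheet figure.
database = [
    ("Seattle", 120, 95),
    ("Portland", 85, 60),
    ("Spokane", 150, 110),
    ("Tacoma", 40, 35),
]

# Sort whole rows by the revenues column (position 1), smallest to largest.
sorted_by_revenues = sorted(database, key=lambda row: row[1])

for city, revenues, expenses in sorted_by_revenues:
    print(city, revenues, expenses)
```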

Making a Chart
Here is how to create a chart in Excel.
1. Select your data, either one column or multiple columns. In some cases you will want to select the label at the top of the column for Excel to use.
2. Choose Insert/Chart from the main menu or click on the Chart Wizard icon on the toolbar. The dialog box gives you many chart options:
   Of particular interest in statistics are the XY (Scatter) used for bivariate and multivariate data and the Line chart used in time series analysis. Creating a histogram will require some computation before the chart is created. Details on creating particular types of charts will be covered as situations arise in this Excel Guide.
3. As you click on Next > to go through the sequence of dialog boxes, you will have the option to add titles, as well as to add or take away gridlines or legends.
4. If you choose to put the chart back “As Object in” your worksheet, you will be able to move and size it near the data it came from. In addition, if you don’t like the gray background in a chart, double-click on it and set the Patterns in the Area to None. To change the size of the chart, drag a sizing handle (these appear in the corners and in the middle of the sides when you click just inside the edge of the chart). To move the chart to a different place in the worksheet, drag just inside the edge but not on a sizing handle. To add or change titles, right-click just inside the chart, select Chart Options from the little pop-up menu, and choose the Titles tab. To change the font size, right-click on the item (a title or an axis) and choose Format from the little pop-up menu.

Using the Data Analysis ToolPak
Some statistical methods, such as regression and the analysis of variance, can be performed in Excel by using the “Data Analysis ToolPak” which is part of Excel, but you may need to install it before you can use it.


To find out if the Data Analysis ToolPak is installed on your system, look under the Tools menu for Data Analysis. If you cannot find Data Analysis under Excel’s Tools menu, select Add-Ins from the Tools menu and make sure the Analysis ToolPak is checked. If the Analysis ToolPak was not installed when Excel was installed on your computer, you may need to install it from the Excel CD-ROM.

Hints, Tips, and Troubleshooting
Here are some general comments that fall into the categories of hints, tips, and troubleshooting.

Experiment! Explore the menu system. Try things out to see how they work. And check your work for reasonableness: don’t just believe it has to be correct because you did it on a computer.

Save your work often so that if the computer shuts off unexpectedly you will not be sad. If your work is important, then keep more than one copy of it in more than one place.

Use the help system to learn more about Excel, either from the menu or by hitting the F1 key. Personally, I find that Help/ContentsAndIndex/Index from the main menu is the most useful.

Be familiar with the Tools/Options menu choice, which gives you control over many worksheet features. Here are some highlights:
1. With the View tab of Tools/Options, if something like a formula bar or scroll bar disappears from your worksheet, you will be able to bring it back. If you want to get rid of those gridlines, you can.
2. With the Calculation tab of Tools/Options, you can make sure that the worksheet is set to calculate automatically. If calculation is set to Manual, then you may need to hit the F9 key to see correct up-to-date results.
3. With the Edit tab of Tools/Options, you can control whether the selection moves down, or some other direction, or stays in the cell when you hit Enter. You can also ask Excel not to guess what you mean when you start typing, by un-checking the box at “Enable AutoComplete for cell values”.
4. With the General tab of Tools/Options, you can choose the default font and size.

To widen a column so that you can see all that is in a cell, select the cell and then use Format/Column/AutoFitSelection from the main menu.

There are additional toolbars; in particular, the drawing toolbar can be useful for placing arrows and other drawing objects on the worksheet. To see them, right-click in the open area near the top of the window.

If you are not sure how to get Excel to do something, try right-clicking or double-clicking on the object. The context-sensitive menu that appears when you right-click with the mouse can be very helpful, by making suggestions that are appropriate to the object you are interested in. Try this on a range or on part of a chart when you are not sure what to do. The Esc key makes this pop-up menu go away if you decide not to use it.


Histograms (Chapter 3)
Here is how to produce a histogram in Excel by first creating a column of bins to hold the frequencies, then using Excel’s COUNTIF function to count how many data values fall into each bin, and finally creating a bar chart of these frequencies with labels and connected bars. You have two alternatives to these procedures while staying in Excel. First, with StatPad, creating a histogram is quick and easy. Second, with the data analysis add-in (“Analysis ToolPak”), creating a histogram requires more steps and the final result (after eliminating gaps between bars) can be counterintuitive because a data value that falls on a bin boundary may be placed in the bin to its left, instead of the bin to its right (so that, e.g., 60 would be counted as “50 to 60” instead of “60 to 70”).

Example: Computer Ownership Rates (Histogram)
Consider the data for rates of computer ownership (Table 3.5.2 of Practical Business Statistics). Here are the steps involved in creating a histogram:
1. Create a column of bin boundaries, in this case from 30% to 70% by 5% (a reasonable choice because the data values range from 37.2% to 66.1%). To do this, you might type “30%” in cell E271, hit Enter, then use Excel’s menu commands Edit/Fill/Series with Series in Columns, Step value 5% and Stop value 70% as shown here. (Typing “30%” in the cell is the same as typing “0.30” in the cell and then using Format/Cells/Number to specify percentage format with two decimal places.)
2. Compute the counted frequencies using the COUNTIF function. Select the cell to the right of the first bin boundary amount. We want the number of data values from 30% to 35% (remember that 35% is the same as 0.35 in Excel). Since 30% is in cell E271 and 35% is in cell E272, we can use the formula
=COUNTIF(computer_owners,"<"&E272)-COUNTIF(computer_owners,"<"&E271)

which has been carefully crafted in this form so that all counts can be found by copying down the column, to the next-to-last cell (representing data values from 65% to 70%). For this formula to work, the column of data must have a name such as “computer_owners” here (if your data does not yet have a name, then select the numbers in the data column and use Excel’s menu command Insert/Name/Define to give your data a name). To copy and paste after typing the formula and hitting Enter, you may use the menu command Edit/Copy, then select the cells of the column and then use Edit/Paste (or just double-click the little fill handle at the lower right of the selected cell, then delete the last one in the column). Here is the result so far:
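The logic of the COUNTIF formula (count the values below the upper boundary, then subtract the count below the lower boundary) can be sketched in Python as follows. This is an illustration only: the helper names are hypothetical and the data values are mostly made up (only the smallest, 37.2%, and the largest, 66.1%, come from the text). Notice that a value sitting exactly on a boundary, such as 45%, lands in the bin to its right.

```python
def count_less_than(values, limit):
    """Mimics COUNTIF(range, "<" & limit): how many values are strictly below limit."""
    return sum(1 for value in values if value < limit)


def bin_counts(values, boundaries):
    """One count per bin: values at or above the lower boundary and below the upper one."""
    return [
        count_less_than(values, upper) - count_less_than(values, lower)
        for lower, upper in zip(boundaries, boundaries[1:])
    ]


# Mostly hypothetical ownership rates, written as fractions (0.372 means 37.2%).
computer_owners = [0.372, 0.415, 0.45, 0.482, 0.50, 0.537, 0.60, 0.661]

# Bin boundaries from 30% to 70% by 5%, rounded to avoid tiny floating-point drift.
boundaries = [round(0.30 + 0.05 * step, 2) for step in range(9)]

for lower, count in zip(boundaries, bin_counts(computer_owners, boundaries)):
    print(f"{lower:.0%} bin: {count}")
```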

3. Prepare for charting by selecting the bin boundaries and the counts, INCLUDING THE BLANK TOP ROW, which will convince Excel to draw the bar chart correctly, using the bin boundaries as the category axis. Here is how it should look as you select Insert/Chart from the menu (or click on the Chart Wizard icon on the toolbar):

4. Use the standard Column Chart Type with first Sub-Type:

5. Click on Next > twice, then eliminate some unnecessary features. Delete the legend by selecting the Legend tab and unselecting the “Show legend” checkbox, and eliminate gridlines by selecting the Gridlines tab and unselecting anything checked there:

6. Click on Finish to place the chart in the worksheet.
7. Eliminate the gaps between the bars by right-clicking on a bar to bring up a little menu from which you choose “Format Data Series”.
8. Choose the Options tab, then decrease the Gap Width to 0 to make it into a true histogram:

9. Click OK to complete this task. You now have a histogram in the worksheet!

10. Here are some optional steps. If you don’t like the gray background, double-click on it and set the Patterns in the Area to None. Similarly, by double-clicking inside a bar, you may change or eliminate the color. To change the size of the histogram, drag a sizing handle (these appear in the corners and in the middle of the sides when you click just inside the edge of the chart). To move the chart to a different place in the worksheet, drag just inside the edge but not on a sizing handle. To add titles, right-click just inside the chart, select Chart Options from the little pop-up menu, and choose the Titles tab. To change the font size, right-click on the item (a title or an axis) and choose Format from the little pop-up menu. To format the horizontal axis as percent, double-click on the axis, then choose Number and Percent. Here is one possible result:

[Figure: Histogram of Computer Ownership. Vertical axis: Number of States (0 to 20). Horizontal axis: Percent of Households (30% to 65%).]

Example: Assets of Commercial Banks (Transformation)
This example shows how you can transform a data set using logarithms. We use the Excel function =LOG10( ) to find the base 10 logarithm of each data value, but you may use the natural logarithm (base e) instead, with the =LN( ) function. Consider the assets, in billions, of commercial banks in the Fortune 1000 (Table 3.4.1 of Practical Business Statistics). A histogram of these values, found using the methods explained earlier in this chapter, is very skewed:


To compute the logarithms of the data values, begin by computing the logarithm of the first data value. To do this, select the cell to its right, then use Excel’s Insert/Function menu command. You will find the LOG10 function under the Math & Trig category:

Select OK to see the LOG10 dialog box, then click on the first data value (you may need to drag the dialog box out of the way to see it) to tell Excel which number to take the logarithm of, as follows (in this case, the number in cell E79, which you specify by clicking on it):


Select OK, then double-click on the fill handle to copy this formula down the column of data, resulting in a new column containing the logarithms of the data (if you prefer, you may use Edit/Copy and Edit/Paste instead):


Now give these logarithms a name, for example, logAssets, while they are still selected, by choosing the Insert/Name/Define menu command and typing the name logAssets:


Now we are ready to construct the histogram of logAssets, using the methods explained earlier in this chapter, but this time for the logAssets data. Here is the resulting histogram, which is much less skewed than the original data:
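To see why the logarithm reduces skewness, here is a brief Python sketch of the same transformation using `math.log10`. The asset values are hypothetical round numbers chosen for illustration (the actual bank data are in Table 3.4.1 of the textbook); notice how values that differ by factors of ten become evenly spaced after the transformation.

```python
import math

# Hypothetical assets in billions; each value is ten times the one before.
assets = [1.0, 10.0, 100.0, 1000.0]

# Same idea as filling =LOG10(cell) down the column next to the data.
log_assets = [math.log10(value) for value in assets]

for value, log_value in zip(assets, log_assets):
    print(value, log_value)
```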


Landmark Summaries (Chapter 4)
Excel can quickly compute many statistical summaries and, with some effort, draw the related graphs. In this chapter we consider the average, median, weighted average, five-number summary, boxplot, and cumulative distribution function.

Example: How Many Defective Parts? (Average, Median)
This example shows how to use Excel to find the average, median, quartiles, and percentiles. Consider the data for defective parts (from the example in Chapter 4 of Practical Business Statistics). If your data are not yet named, begin by giving a name (such as “Defects” here) to your column of numbers by highlighting the numbers and then using Excel’s menu command Insert/Name/Define. Next, select the cell where you want to put the average. You may either
1. select Average from the statistical functions listed under the menu command Insert/Function, hit OK, and then either type “Defects” directly into the dialog box, or drag the mouse down your column of numbers to tell Excel which data set to use. Then select OK.
or
2. type “=AVERAGE(Defects)” directly into the cell and hit Enter.


Either way, the result is the same. After selecting another cell to hold the median and repeating these steps to find the median, the result (average is 5.1, median is 4.5) is as follows:
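For readers who want to verify these two summaries outside of Excel, here is a minimal Python sketch using the standard `statistics` module. The list of defect counts is hypothetical (it is not the textbook data, so it will not reproduce the 5.1 and 4.5 shown above).

```python
from statistics import mean, median

# Hypothetical defect counts, NOT the textbook data set.
defects = [3, 7, 1, 4, 9, 2]

average_defects = mean(defects)     # like =AVERAGE(Defects)
median_defects = median(defects)    # like =MEDIAN(Defects)

print(average_defects, median_defects)
```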

This example shows how to compute a weighted average, given two columns of numbers: one with values and the other with the weights. Consider the data on grades (from the example in Chapter 4 of Practical Business Statistics). A grade point average is the weighted average grade where credits define the weights.


Be sure each column of numbers has a name (select the column of numbers and use Excel’s Insert/Name/Define menu command if needed). The weighted average can then be computed using the expression “=SUMPRODUCT(Credits, Grade)/SUM(Credits)”. The SUMPRODUCT function multiplies credits by grade for each course and adds them up, while the SUM function finds the total credits. Remember always to divide by the sum of the weights (in this example, the credits). The result here is a grade point average of 3.45:
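The SUMPRODUCT/SUM calculation can be sketched in Python as follows. The credits and grades are hypothetical (they are not the textbook example, so the result is not 3.45), and the function name `weighted_average` is made up for this illustration.

```python
def weighted_average(values, weights):
    """Multiply each value by its weight, add these up, and divide by the sum of the weights."""
    sum_product = sum(value * weight for value, weight in zip(values, weights))
    return sum_product / sum(weights)


# Hypothetical courses: credits serve as the weights for the grades.
credits = [3, 3, 4]
grades = [4.0, 3.0, 2.0]

grade_point_average = weighted_average(grades, credits)
print(grade_point_average)
```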

Example: How Many Defective Parts? (Quartiles, 5-Number Summary, Percentiles)
To find the quartiles, recall that the rank of the lower quartile is (1 + int[(1 + n)/2])/2, where int means to discard any fractional part. You can find n, the number of data values, by using Excel’s COUNT function. To convince Excel to find the data value at this rank (and to average two data values if the rank includes a fractional part), we can use Excel’s PERCENTILE function, with a few modifications, as shown below. To find the upper quartile, the formula changes only slightly. You can use these formulas to find the quartiles of any data set by substituting the data set name in place of “Defects”. Here are the results for the Defects data:


The 5-number summary consists of the smallest, lower quartile, median, upper quartile, and largest. You can use Excel’s MIN and MAX functions to find the smallest and largest. Here is the 5-number summary:
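Here is a hedged Python sketch of the rank-based quartile calculation and the resulting 5-number summary. Two things are assumptions rather than statements from this guide: the lower quartile rank is taken as (1 + int[(1 + n)/2])/2, and the upper quartile rank is taken by symmetry as n + 1 minus the lower rank. The function names and the data values (simply 1 through 10) are hypothetical.

```python
import math
from statistics import median


def value_at_rank(ordered, rank):
    """Data value at a 1-based rank; a rank ending in .5 averages the two neighbors."""
    below = ordered[math.floor(rank) - 1]
    above = ordered[math.ceil(rank) - 1]
    return (below + above) / 2


def five_number_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    lower_rank = (1 + (1 + n) // 2) / 2   # assumed rank formula for the lower quartile
    upper_rank = n + 1 - lower_rank       # assumption: mirror image of the lower rank
    return (
        ordered[0],
        value_at_rank(ordered, lower_rank),
        median(ordered),
        value_at_rank(ordered, upper_rank),
        ordered[-1],
    )


# Hypothetical data: the whole numbers 1 through 10.
print(five_number_summary(range(1, 11)))
```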

To find a percentile when you have the percentage, you may use Excel’s PERCENTILE function, which needs to know the data set and the percentage. Here is the 85th percentile for the Defects data:
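As a rough guide to what such a percentile calculation involves, here is a Python sketch that interpolates linearly between sorted data values. It rests on an assumption about the method (the position is taken as the percentage times n − 1, counting from zero), which is my understanding of how PERCENTILE behaves rather than something stated in this guide. The function name and data are hypothetical.

```python
import math


def percentile_by_interpolation(values, fraction):
    """Interpolate between sorted values; fraction is the percentage as a number from 0 to 1."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)      # assumed zero-based position
    below = math.floor(position)
    above = min(below + 1, len(ordered) - 1)
    weight = position - below                      # how far along toward the next value
    return ordered[below] + weight * (ordered[above] - ordered[below])


# Hypothetical data set.
data = [10, 20, 30, 40, 50]
print(percentile_by_interpolation(data, 0.85))
```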

Given a number (not necessarily a data value, but in the same units as the data values) you may use Excel’s PERCENTRANK function to find the percentage that tells what percentile it is. This example shows that 11 is the 94th percentile. That is, about 94% of the data values are smaller than 11. To get the number 0.944 to show as 94.4%, you may select the cell and format it as a percentage (using the menu command Format/Cells/Number/Percentage).


Example: CEO Compensation (Boxplot)
This example shows how to draw a box plot, once you have the 5-number summary, which involves a particular arrangement of the five numbers in a table. A simpler alternative is to use StatPad. Consider a data set of CEO compensation, with five-number summary 100,000, 1,000,000, 1,497,500, 2,101,000, and 7,730,000. Here are the steps involved in creating a box plot:
1. Arrange the 5-number summary exactly as follows, repeating some summaries and leaving a space before the median as shown here:
2. To the left of these numbers, type in the numbers 1, 2, 3 in exactly the following sequence. This will tell Excel how to draw the lines to create the box plot (the number 2 is in the middle, while 1 will place it to the left and 3 to the right).

3. Select both columns of numbers all the way down (including the blank line) and choose Insert/Chart from the menu as follows:

4. Choose “XY (Scatter)” as the Chart Type, and choose “Scatter with data points connected by lines without markers” as the Chart sub-type, as follows:

5. Click Next > twice, then eliminate some unnecessary features. Delete the X Axis by selecting the Axes tab and unselecting the “Value (X) Axis” checkbox. Delete the legend by selecting the Legend tab and unselecting the “Show legend” checkbox, and eliminate gridlines by selecting the Gridlines tab and unselecting anything checked there. You may also add titles by clicking on the Titles tab:

6. Click on Finish to place the chart in the worksheet. The chart is selected so you see the sizing handles around it and the data it was made from.

7. Drag the sizing handles to make it larger. In addition, if you don’t like the gray background, double-click on it and set the Patterns in the Area to None. To move the chart to a different place in the worksheet, drag just inside the edge but not on a sizing handle. To add or change titles, right-click just inside the chart, select Chart Options from the little pop-up menu, and choose the Titles tab. To change the font size, right-click on the item (a title or an axis) and choose Format from the little pop-up menu. Here is the result:

Example: Defects Data (Cumulative Distribution Function)
This example shows how to draw a cumulative distribution function (CDF), which involves arranging two copies of the data set together with the percentages in a table. A simpler alternative is to use StatPad. Consider the data for defective parts in production (from an example in Chapter 4 of Practical Business Statistics). Here are the steps involved in creating the CDF:
1. Select all of the numbers in the data column and get ready to make copies of them using Edit/Copy from the main menu. One quick way to select the numbers is to click on the first number, then hit the End key, and then hold the Shift key while you hit the down arrow ↓ .
2. Click on a wide-open area of the worksheet with room for two columns not touching any other data in your worksheet. Paste the data once (using Edit/Paste from the main menu), then select the empty cell under the last data value (one quick way is to hit End, ↓ , and ↓ ) and paste it again. Here is how it looks after pasting once, just before the second pasting:

3. Now sort this double data set as follows. First, select any single data value within the column (Excel should sort the entire column). Then choose Data/Sort from Excel’s main menu and select OK from the dialog box. You will then have two copies, sorted. Here is the worksheet just before sorting:

4. Create the column of percentages. Place the number 0 in the empty cell just to the right of the top cell of your sorted double data set by typing 0, Enter. Just below it, type the formula “=1/COUNT(Defects)” where you would substitute your data set name for “Defects” here. Just below that, type the = key, click on the cell with the 0 you just entered, then type “+1/COUNT(Defects)”, substituting your data set name for “Defects” and hit Enter. Finally, double-click the fill handle to complete the column (or copy this cell to the cells under it to fill out the column). Here is the result just before double-clicking on the fill handle - note that the cell P10 is where the zero was entered.

5. Select both columns of numbers and choose Insert/Chart from the menu as follows:

6. Choose “XY (Scatter)” as the Chart Type, and choose “Scatter with data points connected by lines without markers” as the Chart sub-type, as follows:

7. Click Next > twice, then eliminate some unnecessary features. Delete the legend by selecting the Legend tab and unselecting the “Show legend” checkbox, and eliminate gridlines by selecting the Gridlines tab and unselecting anything checked there. You may also add titles by clicking on the Titles tab:

8. Click on Finish to place the chart in the worksheet. The chart is selected so you see the sizing handles around it and the data it was made from.

9. Drag the sizing handles to make it larger. Then double-click on the Cumulative Percent axis (or on any number on this Y axis), select the Number tab, choose Percentage with 0 Decimal places as follows:

10. In addition, if you don’t like the gray background, double-click on it and set the Patterns in the Area to None. To move the chart to a different place in the worksheet, drag just inside the edge but not on a sizing handle. To add or change titles, right-click just inside the chart, select Chart Options from the little pop-up menu, and choose the Titles tab. To change the font size, right-click on the item (a title or an axis) and choose Format from the little pop-up menu. Here is the result:
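If you want to check the two-column table outside Excel, the construction in steps 1 through 4 can be sketched in Python. This is only an illustration: the function name `cdf_points` and the defect counts are made up for the example and are not part of the guide's data.

```python
def cdf_points(data):
    """Return (value, cumulative fraction) pairs for a step-function CDF plot."""
    n = len(data)
    # Two copies of the data, sorted together, so each value appears twice in a row.
    doubled = sorted(list(data) + list(data))
    # Percent column: 0, then 1/n, then each cell is the cell two rows up plus 1/n,
    # which is what the filled-down Excel formula computes.
    percents = [0.0, 1.0 / n]
    for i in range(2, 2 * n):
        percents.append(percents[i - 2] + 1.0 / n)
    return list(zip(doubled, percents))


# Hypothetical defect counts, for illustration only.
defects = [3, 0, 5, 1]
points = cdf_points(defects)
for value, percent in points:
    print(value, round(percent, 4))
```

Plotting these pairs as connected points gives the staircase shape of the CDF: each data value contributes one vertical jump of 1/n.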

Variability (Chapter 5)
Excel can quickly compute the basic variability measures. In this chapter we consider the standard deviation, the range, the coefficient of variation, and the variance.

Example: The Advertising Budget (Standard Deviation, Range, Coefficient of Variation, Variance)
This example shows how to find four measures of variability: the standard deviation, range, coefficient of variation, and variance. Consider the data for the advertising budget of firms within an industry group (from the example in Chapter 5 of Practical Business Statistics). For these formulas to work, the column of data should have a name such as “Budget” here (if your data does not yet have a name, then select the numbers in the data column and use Excel’s menu command Insert/Name/Define to give your data a name). Use Excel’s STDEV function to find the sample standard deviation. To find the range, subtract the smallest from the largest using Excel’s MIN and MAX functions. To find the coefficient of variation, recall that we divide the standard deviation (STDEV function) by the average (AVERAGE function). Finally, to find the variance, use Excel’s VAR function. Here are the results:

If you need the population standard deviation instead of the sample standard deviation, you may use the function STDEVP instead of STDEV.
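For comparison, here is a minimal Python sketch of the same four measures using the standard `statistics` module. The budget figures are hypothetical, chosen only to show the calculation, and the comments name the Excel function each line corresponds to.

```python
import statistics

# Hypothetical advertising budgets (in millions of dollars), for illustration only.
budget = [8.0, 19.0, 22.0, 20.0, 27.0]

sample_sd = statistics.stdev(budget)                      # like STDEV(Budget)
population_sd = statistics.pstdev(budget)                 # like STDEVP(Budget)
data_range = max(budget) - min(budget)                    # like MAX(Budget)-MIN(Budget)
coef_of_variation = sample_sd / statistics.mean(budget)   # like STDEV(Budget)/AVERAGE(Budget)
sample_variance = statistics.variance(budget)             # like VAR(Budget)

print(sample_sd, population_sd, data_range, coef_of_variation, sample_variance)
```

The sample versions divide by n-1 and the population version divides by n, which is why the population standard deviation always comes out a little smaller.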

Probability (Chapter 6)
Most of the probability chapter requires thinking, and perhaps a calculator, to get the answers. Of course you can use Excel to do your arithmetic for you - just select a cell, hit the = key, type an expression such as (0.1+0.3)*0.4, and hit Enter to see the answer. Excel can also be used to demonstrate the law of large numbers, to show you how the (random) relative frequency of an event becomes closer to the probability as the number of trials grows larger.

Example: The Law of Large Numbers
Suppose an event has probability 0.4. There is nothing random about this number. The randomness is in whether the event happens or doesn't each time you run the random experiment. If you run it 10 times, the event might happen exactly 4 times, but it also might happen twice, 6 times, or just once. In this example you will see how the relative frequencies, while being random, get closer to the probability as the number of trials n increases.

1. Start with a new worksheet (File/New) and then type “Probability” in cell A2 and 0.4 in cell B2.

2. With cell B2 still selected, use Insert/Name/Define from the main menu to name it “Probability”.

3. In cell A8, type the formula “=IF(RAND()<Probability,1,0)” and hit Enter.

4. Hit the F9 key (called the “Recalculation key”). Each time you do, a new random number RAND() will be compared to the Probability: if it is smaller, then 1 is displayed and the event “happens”, otherwise you will see 0. Hit F9 over and over to get a sense of how a random event with probability 0.4 might occur. If you wish, select cell B2 and type in a different probability number, hit Enter, then recalculate over and over again with F9. Try it with probability 0.1 and 0.9 and others if you wish.

5. Now select cell A8 and choose Edit/Copy from the main menu. Next, click once with the mouse on cell A9. To select lots of cells from A9 on down, hold the Shift key while you hit Pg Dn over and over. When you have selected a few hundred or a few thousand cells, choose Edit/Paste from the main menu. You now have repeated the random experiment many times, once in each cell starting with A8:

6. Compute the relative frequencies as follows: enter the formula “=SUM(\$A\$8:A8)/COUNT(\$A\$8:A8)” into cell B8, being careful about the \$ signs, which keep the start of the range fixed at A8 while the end of the range moves down when you copy the formula down the column by double-clicking on the fill handle, as shown here:

7. Hit the F9 key to see how the relative frequencies might change. Here is one possibility: note that the relative frequencies are 0 for the first two trials because the event didn’t happen yet. After 3 trials, the relative frequency is 1 out of 3, or 0.333333. After 4 trials it drops to 1 out of 4, or 0.25, and so forth:

8. To create a graph, first select the column of relative frequencies. This might be done by selecting cell B8, hitting End, then holding down Shift while you hit the down arrow ↓ . Then choose Insert/Chart from the main menu and choose a Line Chart with the first Chart sub-type:

9. Click Next > twice, then delete the legend by selecting the Legend tab and unselecting the “Show legend” checkbox. You may also add titles by clicking on the Titles tab:

10. Click on Finish to place the chart in the worksheet, and resize it with the sizing handles. Note how the graph of the relative frequencies hovers fairly near to the probability of 0.4. Hit the recalculation key (F9) a few times to see how else it might have come out, with different randomness each time.

11. You can see what relative frequencies look like with different probabilities. Here is how they might look if you change Probability to 0.9:
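The same experiment can be sketched in Python. This is an illustration rather than a translation of the worksheet: the function name `relative_frequencies` and the `seed` argument (used so a run can be repeated) are assumptions of this sketch.

```python
import random


def relative_frequencies(probability, trials, seed=None):
    """Simulate repeated trials and return the running relative frequency."""
    rng = random.Random(seed)
    successes = 0
    freqs = []
    for n in range(1, trials + 1):
        # Like =IF(RAND()<Probability,1,0) in column A.
        if rng.random() < probability:
            successes += 1
        # Like =SUM($A$8:A8)/COUNT($A$8:A8) in column B.
        freqs.append(successes / n)
    return freqs


freqs = relative_frequencies(0.4, 10000, seed=1)
# Relative frequency after 10, 100, and 10,000 trials.
print(freqs[9], freqs[99], freqs[-1])
```

Changing the seed plays the role of hitting F9: the early values jump around, while the later ones settle near the probability.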

Random Variables (Chapter 7)
Excel can be used to perform, or help with, many of the basic calculations involving random variables. This chapter will cover discrete random variables (mean and standard deviation), binomial probabilities, normal probabilities, Poisson probabilities, and exponential probabilities.

Example: Profit and Economic Scenarios (Mean and Standard Deviation of a Discrete Distribution)
This example shows how to use Excel to help with the calculation of the mean and standard deviation of a discrete distribution. Consider the profits example (from the example in Chapter 7 of Practical Business Statistics). For these formulas to work, each column of data should have a name such as “Profit” and “Probability” here (if your data does not yet have a name, then select the numbers in a data column and use Excel’s menu command Insert/Name/Define to give your data a name). The mean, 3.65, is the sum of the products of value times probability, hence the formula is “=SUMPRODUCT(Profit,Probability)”. Give this cell (which now contains the mean) the name “Mean”. The standard deviation, 4.40, is the square root (SQRT function) of the sum of the products of the square of value minus mean times probability, hence the formula is “=SQRT(SUMPRODUCT((Profit-Mean)^2,Probability))”. These formulas give us 3.65 for the mean and 4.40 for the standard deviation:
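As a check on the two formulas, here is a small Python sketch. The profit values and probabilities below are assumed for illustration (this guide does not list them); they were chosen because they reproduce the mean of 3.65 and standard deviation of 4.40 quoted above.

```python
import math

# Assumed illustrative scenario values, not taken from this guide.
profit = [10, 5, 1, -4]
probability = [0.20, 0.40, 0.25, 0.15]

# Like =SUMPRODUCT(Profit,Probability)
mean = sum(x * p for x, p in zip(profit, probability))

# Like =SQRT(SUMPRODUCT((Profit-Mean)^2,Probability))
sd = math.sqrt(sum((x - mean) ** 2 * p for x, p in zip(profit, probability)))

print(round(mean, 2), round(sd, 2))
```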

Example: How Many of Five Possibilities Will Succeed? (Binomial Probabilities)
This example shows how to find probabilities for the binomial distribution, given the number of trials n and the probability π for each one. Consider the example in which n = 5 and π = 0.8. That is, you have 5 independent possibilities and each one has probability 0.8 of success.

To use Excel to compute binomial probabilities, use the formula “=BINOMDIST(a,n,π,FALSE)” to find the probability P(X=a) of being equal to a, and use the formula “=BINOMDIST(a,n,π,TRUE)” to find the probability P(X≤a) of being less than or equal to a. The “FALSE” and “TRUE” in Excel’s binomial distribution formula refer to whether or not the probability is cumulative. Here are some results. The probability that exactly 3 succeed is 0.2048, the probability that 3 or fewer succeed is 0.2627, and the probability that 3 or more succeed is 0.9421 (evaluated as “not 2 or less”, that is, one minus the probability of 2 or fewer):
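These three probabilities can be reproduced from the binomial formula directly. In the following Python sketch the helper names `binom_pmf` and `binom_cdf` are made up for the example; they stand in for BINOMDIST with FALSE and TRUE respectively.

```python
from math import comb


def binom_pmf(a, n, p):
    """P(X = a), like BINOMDIST(a, n, p, FALSE)."""
    return comb(n, a) * p ** a * (1 - p) ** (n - a)


def binom_cdf(a, n, p):
    """P(X <= a), like BINOMDIST(a, n, p, TRUE)."""
    return sum(binom_pmf(k, n, p) for k in range(a + 1))


n, p = 5, 0.8
exactly_3 = binom_pmf(3, n, p)
at_most_3 = binom_cdf(3, n, p)
at_least_3 = 1 - binom_cdf(2, n, p)   # "not 2 or less"

print(round(exactly_3, 4), round(at_most_3, 4), round(at_least_3, 4))
```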

Example: Standard Normal Probabilities
This example shows how to find probabilities for the standard normal distribution in Excel. The standard normal probability table (Chapter 7 of Practical Business Statistics) gives the probability that a normal distribution with mean 0 and standard deviation 1 will be less than a given value. For example, the probability that a standard normal is less than 1.38 is 0.9162. These may easily be found using Excel’s NORMSDIST function as follows:

Example: Sales Forecasting (Normal Probabilities)
This example shows how you can solve normal probability problems without standardizing the numbers. Because you tell Excel the mean and standard deviation, you can ask about probabilities concerning the original numbers (no need to subtract the mean and divide by the standard deviation; Excel’s NORMDIST function will do that for you). Consider the sales forecasting example (from Chapter 7 of Practical Business Statistics). Sales are forecast as having a mean of \$20 million and a standard deviation of \$3 million. Here you find the probability (0.0478) that sales will be less than \$15 million, as well as three other probabilities. To compute these probabilities, use the function “NORMDIST(value,mean,standardDeviation,TRUE)” to find the probability that a normal distribution with the specified mean and standard deviation is less than some value. The first calculation is straightforward because it is a probability of being less. The second calculation is one minus the NORMDIST function because it is a probability of being greater. The third calculation is the difference of two NORMDIST calculations because it is the probability of being between two values. The fourth calculation is one minus the difference of two NORMDIST calculations because it is the probability of NOT being between two values. Here are the results:
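The four kinds of calculation can be sketched with Python's `statistics.NormalDist`. Only the first probability (sales below 15) comes from the text; the other cutoffs (22, 16, and 23) are hypothetical values picked to show the pattern.

```python
from statistics import NormalDist

# Standard normal check, like NORMSDIST(1.38).
print(round(NormalDist().cdf(1.38), 4))

# Sales forecast: mean 20, standard deviation 3 (in millions of dollars).
sales = NormalDist(mu=20, sigma=3)

p_less_15 = sales.cdf(15)                     # like NORMDIST(15,20,3,TRUE)
p_more_22 = 1 - sales.cdf(22)                 # greater than: one minus
p_between = sales.cdf(23) - sales.cdf(16)     # between: a difference
p_not_between = 1 - p_between                 # not between: one minus the difference

print(round(p_less_15, 4), p_more_22, p_between, p_not_between)
```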

Example: How Many Warranty Returns (Poisson Probabilities)
This example shows how to find probabilities for the Poisson distribution. Consider the warranty returns example (from Chapter 7 of Practical Business Statistics) where you expect 1.3 of your products to be returned, on average, each day for warranty repairs. Assuming a Poisson distribution, the POISSON function can give you either the probability that a particular number will be returned, or the cumulative probability that a particular number or less will be returned on a particular day. Here is how to use Excel’s function “POISSON(value,mean,FALSE)” to find the probability that a Poisson random variable is exactly equal to some value, and how to use “POISSON(value,mean,TRUE)” to find the probability that a Poisson random variable is less than or equal to some value. The terms TRUE and FALSE in the function refer to whether the probability is cumulative or not. Here are the results:
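Here is a hedged Python sketch of the two Poisson calculations. The helper names are invented for the example, and the particular questions asked (no returns, two or fewer returns) are illustrative choices rather than the ones shown in the worksheet.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """P(X = k), like POISSON(k, mean, FALSE)."""
    return exp(-mean) * mean ** k / factorial(k)


def poisson_cdf(k, mean):
    """P(X <= k), like POISSON(k, mean, TRUE)."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))


mean_returns = 1.3
p_none = poisson_pmf(0, mean_returns)
p_two_or_fewer = poisson_cdf(2, mean_returns)

print(round(p_none, 4), round(p_two_or_fewer, 4))
```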

Example: Customer Arrivals (Exponential Probabilities)
This example shows how to find probabilities for the exponential distribution. Consider the customer arrivals example (from Chapter 7 of Practical Business Statistics) where customers arrive independently at a constant mean rate of 40 per hour. The random variable is the waiting time until the next customer arrives. The mean waiting time is 1.5 minutes, computed as 60 minutes per hour divided by 40 expected arrivals in that time. Using Excel’s function EXPONDIST(value,1/mean,TRUE), you can find the probability that an exponential random variable with a given mean is less than or equal to the given value. Note that Excel’s EXPONDIST function uses 1/mean, not the mean itself. Here are two calculations, the probability of waiting 5 minutes or less for the next customer, and the probability of waiting 2 minutes or less:
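The exponential calculation is short enough to write out. In this Python sketch the function name `expon_cdf` is an assumption; it takes the mean and converts it to the rate 1/mean, which is the quantity EXPONDIST expects.

```python
from math import exp


def expon_cdf(value, mean):
    """P(X <= value) for an exponential distribution with the given mean."""
    rate = 1 / mean                 # EXPONDIST takes 1/mean, not the mean itself
    return 1 - exp(-rate * value)


mean_wait = 60 / 40                 # 1.5 minutes between customers, on average
p_within_5 = expon_cdf(5, mean_wait)
p_within_2 = expon_cdf(2, mean_wait)

print(round(p_within_5, 4), round(p_within_2, 4))
```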

Random Sampling (Chapter 8)
Excel can choose a random sample with or without replacement. The standard error of the average may easily be found using Excel formulas.

Example: Choosing a Random Sample of 3 from a Population of 10
Here is how to use Excel to choose a random sample of size n = 3 from a population of size N = 10 by shuffling the population, using a column of random numbers placed next to the population listing.

1. Create a column of frame numbers, in this case from 1 to 10. To do this quickly (even for much larger N), you might type “1” in cell A3, hit Enter, then use Excel’s menu commands Edit/Fill/Series with Series in Columns, Step value 1 and Stop value 10 as shown here:

2. Insert random numbers by typing “=RAND()” in cell B3, just to the right of the first frame number, hit Enter, and then copy the result down the column to produce a column of random numbers (this is quickly done by double-clicking the little fill handle at the lower right corner of the selected cell B3).

3. To shuffle the population, first select both columns of numbers (the frame numbers and the random numbers). For a large population, this is easily done by selecting the first frame number (cell A3 here), holding Shift while you hit the right arrow → , hitting End, and holding Shift while you hit the down arrow ↓ . Then use Data/Sort from Excel’s main menu, being sure to sort by the random numbers.

4. After the columns are sorted randomly, you may take the first three frame numbers to obtain your random sample, which results in selection of items 7, 10, and 2 in this example.
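The shuffle-and-take-the-top procedure can be sketched in Python as follows. The function name `sample_by_shuffling` and the `seed` argument are assumptions of this sketch, and a different seed (like a different recalculation in Excel) gives a different sample.

```python
import random


def sample_by_shuffling(population_size, sample_size, seed=None):
    """Sample without replacement by sorting frame numbers on random keys."""
    rng = random.Random(seed)
    frame = list(range(1, population_size + 1))        # frame numbers 1..N
    keys = [rng.random() for _ in frame]               # like =RAND() beside each one
    shuffled = [item for _, item in sorted(zip(keys, frame))]  # like Data/Sort
    return shuffled[:sample_size]                      # take the first n


sample = sample_by_shuffling(10, 3, seed=42)
print(sample)
```

Python's built-in `random.sample(range(1, 11), 3)` does the same job in one call; the longer version is shown only because it mirrors the worksheet steps.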

Example: Shopping Trips (Standard Error of the Average)
This example shows how to find the standard error of the average for a column of data, once you have the standard deviation, by dividing it by the square root of n. Consider the shopping trips example (from Chapter 8 of Practical Business Statistics). Suppose you put the standard deviation, S = 8.63, into cell A15 and the sample size, n = 200, into cell A16. The standard error of 0.610 may then be found using the formula “=A15/SQRT(A16)” as follows:

Alternatively, you can compute the standard error all at once with the formula “=STDEV(rangeName)/SQRT(COUNT(rangeName))”, where “rangeName” is the name of your data.
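As a quick arithmetic check of the shopping trips figures, here is the same calculation in Python, together with a version that starts from raw data. The list `trips` is hypothetical and stands in for your own named range.

```python
import math
import statistics

# From the summaries in the text: S = 8.63 and n = 200.
standard_error = 8.63 / math.sqrt(200)      # like =A15/SQRT(A16)
print(round(standard_error, 3))

# From raw data, like =STDEV(rangeName)/SQRT(COUNT(rangeName)).
trips = [12.0, 30.5, 18.2, 25.0, 9.9, 21.4]   # hypothetical data
se_from_data = statistics.stdev(trips) / math.sqrt(len(trips))
print(se_from_data)
```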

Confidence Intervals (Chapter 9)
You can use Excel to compute confidence intervals for you, given a sample of data, at any specified confidence level. Excel will even look up the t table value for you.

Example: Controlling the Average Thickness of Paper (Confidence Interval)
This example shows how to construct confidence intervals for a sample of data. Consider the example of paper thickness (Table 9.1.2 of Practical Business Statistics). Here is how to use Excel to find the confidence interval. First, if needed, give the data column a name (such as “Thickness” here) by selecting the numbers and using Excel’s Insert/Name/Define menu command. Next, use Excel’s AVERAGE, STDEV, and COUNT functions to compute the average, the standard deviation, and the sample size respectively and name the cells so they can be easily used. The 95% confidence interval formula is then computed as average plus or minus t times the standard error, where we use Excel’s TINV function to find the t value. Excel’s TINV function is shown using “1−0.95” because it needs “one minus the confidence level” instead of the confidence level itself. The term n−1 is used because TINV needs the number of degrees of freedom.

To use a confidence level other than 95%, you need only change the 0.95 in the TINV function. For example, for a 99% confidence interval, you would use 0.99 in place of 0.95.

Example: Controlling the Average Thickness of Paper (One-sided Confidence Intervals)
Here is how to find a one-sided confidence interval. Consider the example of paper thickness (Table 9.1.2 of Practical Business Statistics). In order to find a one-sided 95% confidence interval, the t table value changes to TINV(2*(1-0.95),n-1), placing all the probability of error on one side because the other side extends indefinitely without chance of error. To claim that the population mean paper thickness is at least a certain value, the appropriate calculation is average minus t times standard error (so that the one-sided interval from here to all higher values includes the average). Using the average value of 0.0040147, the standard deviation of 0.0002614, and the sample size of 15, we have:

To use a confidence level other than 95%, you need only change the 0.95 in the TINV function. For example, for a 99% confidence interval, you would use 0.99 in place of 0.95.
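The arithmetic behind both intervals can be sketched in Python using the summaries quoted above. Python's standard library has no equivalent of TINV, so in this sketch the two t values for 14 degrees of freedom are typed in by hand from a standard t table (rounded to three decimals); that is an assumption of the sketch, not something Excel requires.

```python
import math

average, sd, n = 0.0040147, 0.0002614, 15
standard_error = sd / math.sqrt(n)

# Hand-entered t table values for n-1 = 14 degrees of freedom (assumed, rounded).
t_two_sided_95 = 2.145    # the value TINV(1-0.95, 14) looks up
t_one_sided_95 = 1.761    # the value TINV(2*(1-0.95), 14) looks up

# Two-sided 95% interval: average plus or minus t times standard error.
lower = average - t_two_sided_95 * standard_error
upper = average + t_two_sided_95 * standard_error

# One-sided 95% interval: "at least this much".
one_sided_lower = average - t_one_sided_95 * standard_error

print(lower, upper, one_sided_lower)
```

Note that the one-sided bound sits closer to the average than the two-sided lower limit, because all of the 5% error probability is placed on one side.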

Hypothesis Testing (Chapter 10)
Excel can help you perform hypothesis tests for various situations involving population means for which univariate data are available: one- and two-sided tests, various test levels, and two-sample problems (both paired and unpaired). If you are using the confidence interval approach to hypothesis testing (for example, deciding a two-sided test by seeing whether the reference value is in the interval), please use the confidence intervals explained earlier for Chapter 9. Instead of having you specify the test level (for example 5%), Excel can give you the p-value (as well as the t value and basic summaries). You may then complete the test at any level by comparing the computed p-value to the test level. For example, if the reported p-value is less than 5%, the test is significant at the 5% level (otherwise it is not significant). You may wish to review the discussion of p-values in Chapter 10 of Practical Business Statistics.

Example: Controlling Paper Thickness (the t Test: Computing the t Statistic and Finding the p-Value)
This example shows how to test a population mean against a known reference value based on a random sample from the population. Consider the data on paper thickness (Table 9.1.2 of Practical Business Statistics), to be tested against the reference value µ0 = 0.00385. If your data are not yet named, please select your column of numbers and use Excel’s menu command Insert/Name/Define. To find the t statistic, we subtract the reference value, 0.00385, from the average and then divide by the standard error (which is standard deviation divided by square root of n). To find the p-value, we use the Excel formula =TDIST(ABS(t),n-1,2) where t is the computed t statistic and n is the sample size (the “2” tells Excel to find a 2-sided p-value). Here, then, are the results of an ordinary two-sided test for this example:

Example: Controlling Paper Thickness (One-sided t Test)
For a one-sided test, the t statistic and sample size n both stay the same as before, but the p-value must be computed differently. These calculations are different depending on the side being tested. First, consider the case of a one-sided test to see if the sample average is significantly larger than the reference value (that is, the research hypothesis claims that the population mean is larger than the reference value). In this case, the p-value is either =TDIST(ABS(t),n-1,1) or =1-TDIST(ABS(t),n-1,1), depending on whether t is positive or negative respectively. Using the t statistic of 2.4395561 and sample size n = 15 for the paper thickness example, the one-sided p-value is 0.0143, found as follows:

Next, consider the case of a one-sided test to see if the sample average is significantly smaller than the reference value (that is, the research hypothesis claims that the population mean is smaller than the reference value). In this case, the p-value is either =TDIST(ABS(t),n-1,1) or =1-TDIST(ABS(t),n-1,1), depending on whether t is negative or positive respectively. Using the t statistic of 2.4395561 and sample size n = 15 for the paper thickness example, the one-sided p-value is 0.9857, found as follows:

This example has been used to illustrate the calculations. Note that, in real life, you would not compute both of these tests (significantly greater, significantly smaller) on the same data set because you would have to choose the side you wished to test before performing the test.
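To see where these p-values come from, here is a Python sketch that mimics TDIST using only the standard library. It integrates the t density numerically with Simpson's rule, which is a stand-in chosen for this sketch and is not how Excel computes TDIST; the function names are likewise made up for the example.

```python
import math


def t_density(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > |t|), like TDIST(ABS(t), df, 1), via Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3


t, n = 2.4395561, 15
tail = upper_tail(t, n - 1)

p_two_sided = 2 * tail                       # like TDIST(ABS(t), n-1, 2)
p_larger = tail if t > 0 else 1 - tail       # research hypothesis: mean is larger
p_smaller = tail if t < 0 else 1 - tail      # research hypothesis: mean is smaller

print(round(p_two_sided, 4), round(p_larger, 4), round(p_smaller, 4))
```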

Example: Reactions to Advertising (Paired t Test)
This example shows how to perform a paired t test to see whether two paired columns of data are significantly different or not, on average. This test begins by subtracting the two columns (which is permitted because the situation is paired) and then testing these differences against the reference value 0. Consider the data on reactions to advertising (Table 10.6.1 of Practical Business Statistics). The differences are calculated by using Excel’s arithmetic formulas. In this case, the formula =D10-C10 was entered into cell E10 to compute After - Before for the first person. This formula was then copied down the column (either using copy and paste from the main menu, or simply double-clicking the little fill handle at the lower right corner of the selected cell E10). The two-sample paired t test then becomes an ordinary one-sample t test of the differences, using the reference value 0. The result is p = 0.03, and since p < 0.05, we conclude that there is a significant difference between the Before and the After scores. Here are the calculations:
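The reduction from a paired test to a one-sample test is easy to see in a short Python sketch. The Before and After scores here are hypothetical (they are not the values in Table 10.6.1), so the resulting t statistic is for illustration only.

```python
import math
import statistics

# Hypothetical paired scores, for illustration only.
before = [3, 5, 4, 6, 2, 5]
after = [5, 6, 4, 8, 3, 7]

# Like =D10-C10 copied down the column: After - Before for each person.
differences = [a - b for a, b in zip(after, before)]

# Ordinary one-sample t statistic of the differences against reference value 0.
n = len(differences)
standard_error = statistics.stdev(differences) / math.sqrt(n)
t_statistic = (statistics.mean(differences) - 0) / standard_error

print(differences, t_statistic, n - 1)
```

The p-value would then be found from the t statistic and n-1 degrees of freedom, as in the one-sample case.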

Example: Gender Discrimination and Salaries (Two-Sample Unpaired t Test)
This example shows how to perform a two-sample t test for the small-sample situation (see the two formulas for the standard error of the difference in Chapter 10 of Practical Business Statistics). Consider the data on gender discrimination and salaries (Table 10.6.4 of Practical Business Statistics). The hypothesis test (to see if the average salaries of men and women are different from one another) starts with the basic summaries: the average of each group, the standard deviation of each group, and the sample size of each group. Each of these summaries is given a name to make it easy to use (by selecting the cell and using the menu command Insert/Name/Define).

Then you can find the standard error of the average difference, the t statistic, and the p-value from these summaries. The conclusion is that there is a very highly significant difference between men's and women's salaries (p < 0.001). Here are the Excel results:
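Here is a rough Python sketch of the calculation from summaries. It assumes that the small-sample standard error is the usual pooled-variance formula; please check it against the formula in Chapter 10 of the textbook before relying on it. The salary summaries are hypothetical and are not the values from Table 10.6.4.

```python
import math


def pooled_standard_error(s1, n1, s2, n2):
    """Standard error of the difference, pooling the two sample variances (assumed)."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1 / n1 + 1 / n2))


# Hypothetical summaries, for illustration only.
avg_men, sd_men, n_men = 52000.0, 9000.0, 12
avg_women, sd_women, n_women = 45000.0, 8000.0, 10

se_difference = pooled_standard_error(sd_men, n_men, sd_women, n_women)
t_statistic = (avg_men - avg_women) / se_difference
degrees_of_freedom = n_men + n_women - 2

print(se_difference, t_statistic, degrees_of_freedom)
```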

Correlation and Regression (Chapter 11)
Excel provides assorted methods for the analysis of bivariate data: correlation, plotting, and regression analysis.

Example: Contacts and Sales (Correlation)
This example shows how to find the correlation in Excel by using the CORREL function after naming your two columns of numbers (for example, by selecting a column of numbers and using the Insert/Name/Define menu command to name it). Here is how to find the correlation of 0.985 between contacts and sales:
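For readers curious about what CORREL computes, the following Python sketch works it out from the definition. The function name `correlation` and the small data set are hypothetical; they do not reproduce the 0.985 found for the textbook data.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient of two equal-length columns, like CORREL(xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical contacts and sales, for illustration only.
contacts = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = correlation(contacts, sales)
print(round(r, 4))
```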

Example: Internet Usage Ratings (Plotting the Data)
This example shows how to use Excel to create a scatterplot for a bivariate data set. Consider the data on Internet usage ratings (Table 11.1.3 of Practical Business Statistics). It is easiest if the two columns are next to each other, with the X-axis data to the left of the Y-axis data. We will create a scatterplot of Time (vertical) against Pages (horizontal).

1. Begin by selecting both columns of numbers (with the horizontal X axis data to the left).

2. Choose Insert/Chart from the main menu.

3. Choose XY (Scatter) from the list of chart types, and the first Chart sub-type (“Scatter. Compares pairs of values”).

4. Continuing with Excel’s steps, you can create a scatterplot as an object in the worksheet. Here is how the initial dialog box looks after you select the data and begin to insert a chart, together with the finished chart in the worksheet.

5. In addition, if you don’t like the gray background in the chart, double-click on it and set the Patterns in the Area to None. To eliminate the legend at the right in the chart, right-click on it and clear. To eliminate gridlines, right-click on one and clear. To change the size of the chart, drag a sizing handle (these appear in the corners and in the middle of the sides when you click just inside the edge of the chart). To move the chart to a different place in the worksheet, drag just inside the edge but not on a sizing handle. To add or change titles, right-click just inside the chart, select Chart Options from the little pop-up menu, and choose the Titles tab. To change the font size, right-click on the item (a title or an axis) and choose Format from the little pop-up menu. To change the number format of an axis, double-click on it and select Number. Here is one possible result:

[Chart: scatterplot of Time (vertical axis, 0 to 90) against Pages (horizontal axis, 0 to 200)]

Example: Internet Usage Ratings (Plotting the Least-Squares Line)
Here is how to use Excel to add a least-squares line to a scatterplot. We continue with the Internet usage ratings data. Right-click with the mouse on a data point in the chart, then select Add Trendline from the context-sensitive menu that appears, and finally specify Linear as the Trend/Regression type before clicking OK. The initial step of right-clicking on a data point is shown below, followed by the end result after the line has been added.

Example: The Stock Market (Regression Analysis)
Here is how to perform regression analysis with Excel, using data from Table 11.1.6 on the daily percent change in the S&P500 stock market index, trying to predict today’s market movement from yesterday’s. As an alternative, you may wish to consider using StatPad, which will provide more explanation of the results and give more output and charting options.

1. First give a name to each column of numbers if needed (for example, by selecting a column of numbers and using Excel’s Insert/Name/Define menu command).

2. Look under the Tools menu for Data Analysis, and then select Regression. In the resulting dialog box, you may specify the range name for the Y variable (“Today” in this example) and for the X variable (“Yesterday”). If you cannot find Data Analysis under Excel’s Tools menu, select Add-Ins from the Tools menu and make sure the Analysis ToolPak is checked. If the Analysis ToolPak was not installed when Excel was installed on your computer, you will need to install it from the Excel CD-ROM.

3. Click “Output Range” in the dialog box and specify where in the worksheet you want the results to be placed, then click OK. Here is the dialog box and its results, which include the R² value of 0.0132 (which tells you that only 1.32% of the variation in market performance can be explained by yesterday’s market change), the standard error of estimate Se of 0.0087 (which tells you that, after using the predicted value, the actual performance is typically different by about 0.87 percentage points), as well as the estimated regression coefficient b = 0.1114 (which tells you that, for each percentage point of yesterday’s market performance, we expect today’s market performance to be up about an additional one-tenth of that, on average), its standard error Sb = 0.1522, the t statistic t = 0.732, and its p-value of 0.468 (which is not significant because p > 0.05, telling you that there is no significant relationship between yesterday’s and today’s market performance).

From these results, looking at the last table’s Coefficients and recognizing that “X variable 1” refers to the X variable “Yesterday”, you can see that the least-squares prediction equation is

Today = 0.000398 + 0.111421 × Yesterday

Because the R² is 0.0132 or 1.32% (from the first table of Regression Statistics), it is clear that knowing what the market did yesterday does not help you very much to predict what it will do today. To perform the t test, you may look at the t statistic (“t stat” for “X Variable 1” in the last table) of 0.732 and its p-value of 0.468. Because p > 0.05 the relationship between Yesterday’s and Today’s stock market movements is not significant. This is also clear from the 95% confidence interval for the regression coefficient, which extends from -0.196226 to 0.419068 and includes the reference value 0. These numbers are found in the last row of the last table under the headings “Lower 95%” and “Upper 95%”.
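To make the connection between the pieces of output concrete, here is a small Python sketch of a least-squares fit. The function name `least_squares` and the five data pairs are hypothetical (the S&P500 data are not reproduced here); the final lines reuse the coefficient and standard error reported above to show that the t statistic is simply one divided by the other.

```python
def least_squares(xs, ys):
    """Return (intercept, slope, r_squared) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical data, for illustration only.
intercept, slope, r_squared = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(intercept, slope, r_squared)

# With the values reported in the text: t = b / Sb.
t_statistic = 0.1114 / 0.1522
print(round(t_statistic, 3))
```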

Multiple Regression (Chapter 12)
Excel’s Tools/Data Analysis commands allow you to perform multiple regression analysis and correlation analysis of multivariate data. As an alternative, you may wish to consider using StatPad, which will allow you to pick and choose your X variables even if they are not right next to one another, and will also explain the results and give more output and charting options.

Here is how to perform multiple regression analysis on the magazine ad data from Table 12.1.3 in Chapter 12 of Practical Business Statistics, to understand how the cost per page of advertising can be (at least partially) explained by the magazine’s characteristics. Here is the data set we will be working with (there are 55 magazines in all - this is just the top of the database).

1. Look under the Tools menu for Data Analysis, and then select Regression. If you cannot find Data Analysis under Excel’s Tools menu, select Add-Ins from the Tools menu and make sure the Analysis ToolPak is checked. If the Analysis ToolPak was not installed when Excel was installed on your computer, you will need to install it from the Excel CD-ROM.

2. In the resulting dialog box, you may specify the range for the Y variable by selecting the label at the top along with the column of numbers to be predicted by dragging the mouse down the column starting at the label “Page” in cell D9 down to the Page Cost value for the last magazine in cell D64. The X variables must be right next to each other, forming a rectangular range of rows and columns. In this case the X variable range, including labels, is from E9 (the label “Audience”) to G64 (the Income measure for the last magazine), selected by dragging the mouse diagonally from one corner to the other. Here is the resulting dialog box:

What to do if you do not want to use all of the X variables? For example, to leave one out you should create a copy of the X variables (selecting them, using Edit/Copy, selecting a cell in a different part of the worksheet, then using Edit/Paste), select the column of data to be omitted, delete it with the Del key (this is why we use a copy!), select the columns to its right by dragging the mouse diagonally across from one corner to the other, then use Edit/Cut, move to the empty column, and use Edit/Paste to close the gap. You now have a copy of the X data that omits the column you are not using.

3. Click “Labels” in this dialog box because we have included labels at the top of the data columns. This was done to make the results easier to interpret (so that Excel can use the names of the variables instead of just “X variable 2” for example).

4. Click “Output Range” in this dialog box and specify (by clicking the mouse or typing a cell address) where in the worksheet you want the results to be placed, then click OK. The result is not a pretty sight - it still needs to be tidied up because some cells are blocked by others and cannot be read, and the numbers are not aligned nicely.

5. Now tidy it up and format the results. If there is more in a cell than you can see, select it and use the menu command Format/Column/AutoFit Selection in order to make the column wider so that you can see it all. To control the number of decimal places shown, select the cell(s), then use Format/Cells, then under the Number tab you might choose Number and then specify the number of decimal places. The last two columns have been deleted because they contain no new information (they just repeated the columns before them). Here are the results after tidying up:


The results in the first table of Regression Statistics include the R² value of 0.787 (which tells you that 78.7% of the variation in Page Costs can be explained by the X variables) and the standard error of estimate Se of 21,578 (which tells you that Page Costs can be predicted to within about this many dollars). The ANOVA table includes the F test, whose p-value 3.81619E-17 is very small (the “E-17” tells you to move the decimal point to the left 17 places, so actually p = 0.0000000000000000381619). In particular, p < 0.001 and the result is very highly significant. The last table has the Coefficients, including the constant term of 4,042.799 and the regression coefficients: 3.788 for Audience, -123.634 for Male, and 0.903 for Income. The Standard Error column shows standard errors for each of these coefficients. Next are their t statistics and p-values (note that Audience and Income are significant, but Male is not). Finally you have 95% confidence intervals for the regression coefficients - for example, we are 95% sure that the effect of an additional dollar of Income is to increase Page Costs somewhere between \$0.161 and \$1.645, on average.
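For readers who want to cross-check the regression output outside Excel, here is a minimal Python sketch of multiple regression by least squares. It is not how the Analysis ToolPak is implemented; the function names (solve, regress) and the small data set are hypothetical and chosen only for illustration. It returns the constant term and coefficients, R², and the standard error of estimate.

```python
# Minimal sketch: multiple regression via the normal equations, pure Python.
# The data below are hypothetical, NOT the magazine data from the textbook.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular (X columns are not collinear)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def regress(y, xs):
    """Return (coefficients, r_squared, std_error).
    coefficients[0] is the constant term; xs holds one row of X values per case."""
    n, k = len(y), len(xs[0])
    design = [[1.0] + list(row) for row in xs]  # leading 1 for the constant
    xtx = [[sum(design[i][p] * design[i][q] for i in range(n))
            for q in range(k + 1)] for p in range(k + 1)]
    xty = [sum(design[i][p] * y[i] for i in range(n)) for p in range(k + 1)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total
    r_squared = 1 - sse / sst
    std_error = (sse / (n - k - 1)) ** 0.5
    return coef, r_squared, std_error

# Hypothetical example: two X variables, six cases.
example_x = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
example_y = [3.1, 6.8, 7.2, 10.9, 11.1, 14.9]
coef, r2, se = regress(example_y, example_x)
print("coefficients:", coef, "R squared:", r2, "std error:", se)
```

The standard error divides by n - k - 1 (cases minus X variables minus one for the constant), which is the usual degrees-of-freedom adjustment for a regression with a constant term.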

Here is how to find the correlation matrix of a multivariate data set, giving you the correlation of each pair of variables.

1. Look under the Tools menu for Data Analysis, and then select Correlation. In the resulting dialog box, you may select the labels at the top of each column as part of the data range (which must be data columns arranged right next to each other, forming a rectangular range of rows and columns). Also click on “Labels in First Row” so that Excel can use the variable names to help you understand the results. In this case the Input Range is from D9 (the label “Page”) to G64 (the Income measure for the last magazine), selected by dragging the mouse diagonally from one corner to the other. Here is the resulting dialog box:

2. Click on OK. You can see, for example, that the correlation between Page Costs and Audience is the highest, with r = 0.872. The correlation between Audience and Income is negative, with r = -0.353. Here are the results:
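As a cross-check outside Excel, the correlation of each pair of variables can be sketched in a few lines of Python. The function names and the tiny data set are hypothetical; only the Pearson correlation formula itself is standard.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation r between two equal-length columns of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """columns maps a variable name to its list of values.
    Returns a nested dict: matrix[a][b] is the correlation of a with b."""
    names = list(columns)
    return {a: {b: correlation(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical data, for illustration only.
data = {
    "Page": [10.0, 12.0, 15.0, 19.0, 25.0],
    "Audience": [1.0, 2.0, 3.0, 4.0, 6.0],
    "Income": [9.0, 7.0, 8.0, 5.0, 4.0],
}
for name, row in correlation_matrix(data).items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Each variable has correlation 1 with itself, and the matrix is symmetric, which is why Excel shows only the lower triangle.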


Time Series (Chapter 14)
Excel can be used to perform a trend-seasonal analysis of time series data, accomplished by performing a number of detailed steps one at a time to produce the results. As an alternative, you may wish to consider using StatPad, which will perform this analysis automatically, with many output and charting options.

Example: Ford Automotive Sales (Trend-Seasonal Analysis)
This example shows how to perform a trend-seasonal analysis of quarterly data using Excel. This analysis is built up one basic step at a time by finding the moving average, the ratio-to-moving-average, each seasonal index, the seasonally adjusted series, and the long-term trend. Consider the data for Ford Motor Company’s Automotive Sales from Table 14.2.1 of Practical Business Statistics. Here is the data set (it actually extends through 2000 - this is just the top of the database).

1. To find the moving average for a quarterly series like this one, remember that it starts with the third row (so that we can average a full year’s worth of data, with a half-year before and a half-year after). So we start in the third quarter (cell D6 in this case). Note that if we go back two quarters and ahead two quarters there are two “Quarter 1” values, so they must have weight 0.5 each so that quarters 1 through 4 are treated equally. The easiest way to compute this weighted average is actually to average two overlapping full years’ worth of data: the four quarters of 1994 (cells C4:C7 here) with the full year beginning one quarter later (cells C5:C8). This is why, in this case, you can use the formula =AVERAGE(C4:C7,C5:C8) in cell D6 for the first moving-average value. An easy way to enter the formula is to drag down each four-quarter range instead of typing in its address. Here is how it looks so far:

If you have a monthly instead of a quarterly time series, then instead of the “quarter” column with 1, 2, 3, 4, 1, 2, 3, ... you would have a “month” column with 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, ... and the moving average would start in the seventh row instead of the third. The formula for the moving average would again be the average of two overlapping full years’ worth of data: (1) the first 12 months and (2) the full year beginning one month later, with months 2 through 13. With monthly data the moving average is also unavailable for the last six months.

2. Double-click on the fill handle to copy this formula down the column, then select and delete the last two entries of this column because the moving average is unavailable for the last two quarters. (For monthly data, delete the last six entries.) Here is the result so far:

3. Find the ratio-to-moving-average by dividing the Sales value by the Moving Average value (in this case, place the formula =C6/D6 in cell E6), then double-click on the fill handle to copy this formula down the column. Here is the result:
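Outside Excel, steps 1 through 3 can be sketched in Python as a cross-check. The function names are hypothetical, and the short series is made up for illustration rather than taken from the Ford data. The moving average is computed exactly as in the worksheet: the average of two overlapping full-year windows.

```python
def centered_moving_average(series, period=4):
    """Average two overlapping full-year windows around each position.
    Returns None where the moving average is unavailable (the first and
    last period/2 entries). Use period=4 for quarterly, 12 for monthly data."""
    half = period // 2
    result = [None] * len(series)
    for i in range(half, len(series) - half):
        first_year = series[i - half:i + half]            # like C4:C7 for cell D6
        second_year = series[i - half + 1:i + half + 1]   # like C5:C8 for cell D6
        result[i] = (sum(first_year) + sum(second_year)) / (2 * period)
    return result

def ratio_to_moving_average(series, moving_average):
    """Divide each value by its moving average; None where unavailable."""
    return [None if ma is None else value / ma
            for value, ma in zip(series, moving_average)]

# Hypothetical quarterly values, for illustration only.
sales = [20.0, 24.0, 19.0, 22.0, 23.0, 27.0, 21.0, 25.0]
ma = centered_moving_average(sales)
ratios = ratio_to_moving_average(sales, ma)
print(ma)
print(ratios)
```

Summing both windows and dividing by twice the period gives the two end quarters weight 0.5 each, which matches the weighting described in step 1.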

4. The seasonal index can be computed for all quarters, even when the moving average and ratio are unavailable. The seasonal index for a given quarter (1, 2, 3, or 4) is the average of all the ratios for that quarter, averaged over all the years that have a ratio for that quarter. For example, the seasonal index for quarter 1 is the average of the ratio 1.03142 for quarter 1 in 1995 with the ratio 1.000796 for quarter 1 in 1996, and so forth through quarter 1 of 2000. A fairly easy way to compute the seasonal index column is to use the SUMIF(RANGE,CRITERIA,SUM_RANGE) function to sum the ratios for the selected quarter, then divide by the COUNTIF(RANGE,CRITERIA) function, which counts how many there are. In this case the formula to put in cell F4 is

=SUMIF(\$B\$6:\$B\$29,B4,\$E\$6:\$E\$29)/COUNTIF(\$B\$6:\$B\$29,B4)

Note carefully the use of dollar signs in the cell addresses: references with \$ will not change when the formula is copied. The RANGE is \$B\$6:\$B\$29 in both functions (SUMIF and COUNTIF), consisting of those values in the “Quarter” column for which ratios are available, so that the first two and last two rows are excluded. The CRITERIA in both functions (SUMIF and COUNTIF) is simply B4, which refers to the Quarter number, 1, for the first row of data. No dollar signs are used here so that when the formula is copied, the result will be for the appropriate quarter for that row. The SUM_RANGE is \$E\$6:\$E\$29 for the SUMIF function, telling it to sum up the ratio values for the specified quarter number, specifying only those rows for which ratios are available. After entering this formula into cell F4, drag the fill handle down the entire column (or use copy and paste) to find all the seasonal values. Note that they repeat exactly from one year to the next; for example, the quarter 1 seasonal index is always 0.9993252 for all years:
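The SUMIF/COUNTIF calculation above amounts to averaging the available ratios within each quarter. A minimal Python sketch of the same idea follows; the function name and the numbers are hypothetical, and it assumes every quarter has at least one available ratio.

```python
def seasonal_index_column(quarters, ratios):
    """For each row, return the average of all available ratios that share
    the row's quarter number. Rows whose own ratio is None still receive an
    index, just as the worksheet formula fills every row."""
    totals, counts = {}, {}
    for quarter, ratio in zip(quarters, ratios):
        if ratio is None:          # ratio unavailable: skip, like the
            continue               # restricted range $E$6:$E$29
        totals[quarter] = totals.get(quarter, 0.0) + ratio   # SUMIF
        counts[quarter] = counts.get(quarter, 0) + 1         # COUNTIF
    index = {q: totals[q] / counts[q] for q in totals}
    return [index[q] for q in quarters]

# Hypothetical two-year example, for illustration only.
quarters = [1, 2, 3, 4, 1, 2, 3, 4]
ratios = [None, None, 1.10, 0.90, 1.00, 1.00, None, None]
print(seasonal_index_column(quarters, ratios))
```

As in the worksheet, the resulting index values repeat from one year to the next because each quarter has a single average.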

5. The seasonally adjusted values are found by dividing each Sales figure by its Seasonal Index. In this case, the formula is =C4/F4. Enter the formula into the top cell, then copy down the column, perhaps by double-clicking on the fill handle:

6. Before you can find the long-term trend, you need a “time period” column consisting of the numbers 1, 2, 3, ... counting how many time periods have gone by. A quick way to do this is to start with 1 and 2 in the first two rows (H4 and H5 in this example), select both cells, then double-click the little fill handle in the lower right corner of the selected cells.

7. Use this column of time periods to predict the seasonally adjusted column (Y) from the time period (X) using regression analysis. A quick way to do this is with the FORECAST(X,KNOWN_Y’S,KNOWN_X’S) function, using the first time period value for X, using the entire seasonally adjusted series with absolute \$ cell addressing as the KNOWN_Y’S, and using the entire time period column with absolute \$ cell addressing as the KNOWN_X’S. In this case, entering the formula into cell I4 using Insert/Function from the main menu looks like this (be careful to omit \$ for X but to use \$ in the other two ranges):

8. Choose OK to see the resulting long-term trend value in the top cell, then double-click the fill handle to copy the formula down the column:

9. To extend the trend beyond the series and find the seasonally adjusted forecast values, the quickest way is to select the last two rows of the time period and the trend columns (you need two rows so that Excel will know to keep increasing the time period in the next step) as follows:

Then drag the little fill handle at the lower right corner of the selected range down as many rows as you want. It’s like magic!

10. To prepare to forecast by seasonalizing the trend, you will need to extend the columns for year, quarter, and seasonal index (columns A, B, and F here). After extending columns A and B, you may select the last seasonal index (cell F31 here) and drag the fill handle down to extend it (if Excel has not already done this for you):


11. You are now ready to create the forecast values by multiplying the trend by the seasonal index. In this example, enter the formula =I4*F4 into cell J4, then double-click the fill handle (or copy and paste) to complete the forecast column. Congratulations! You are done with the calculations!
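Steps 5 through 11 can be summarized in a short Python sketch: seasonally adjust, fit a straight-line trend against the time period 1, 2, 3, ..., extend the trend, and multiply by the seasonal index. The function names and numbers are hypothetical. The trend is an ordinary least-squares line, which is the kind of prediction the FORECAST function returns.

```python
def linear_trend(values):
    """Least-squares line through (1, values[0]), (2, values[1]), ...
    Returns (intercept, slope). Needs at least two values."""
    n = len(values)
    periods = range(1, n + 1)
    mean_t = (n + 1) / 2
    mean_y = sum(values) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(periods, values))
             / sum((t - mean_t) ** 2 for t in periods))
    return mean_y - slope * mean_t, slope

def seasonalized_forecast(sales, seasonal_index):
    """sales: the observed series. seasonal_index: one index per period,
    already extended past the end of sales to cover the forecast periods.
    Returns trend * seasonal index for every period in seasonal_index."""
    adjusted = [s / i for s, i in zip(sales, seasonal_index)]   # step 5
    intercept, slope = linear_trend(adjusted)                   # steps 6-8
    trend = [intercept + slope * t                              # step 9
             for t in range(1, len(seasonal_index) + 1)]
    return [t * i for t, i in zip(trend, seasonal_index)]       # step 11

# Hypothetical example: four observed periods, two forecast periods.
sales = [5.0, 18.0, 7.0, 24.0]
index = [0.5, 1.5, 0.5, 1.5, 0.5, 1.5]
print(seasonalized_forecast(sales, index))
```

Because zip stops at the shorter list, only the observed periods enter the trend fit, while the extended seasonal index determines how many periods are forecast.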


Example: Ford Automotive Sales (Charting the Series and Forecast)
Here is one way to make a chart of one or more of the columns you have created. In this example we create a chart of the original series (sales) together with the forecast values.

1. To begin, select the Sales column including the label at the top (so Excel can use this label), then choose Insert/Chart from the main menu and specify Chart Type as Line and Chart sub-type as either the first choice, or “Line with markers displayed at each data value” as specified here:

2. To list the years along the horizontal axis, click Next >, choose the Series tab, click in the “Category (X) axis labels:” portion of the dialog box and drag with the mouse down the numbers in the Year column in the spreadsheet (in this example, cells A4:A36, excluding the label at the top this time). The dialog box now looks like this:

3. To add the forecasts to this chart, click Add, then click in the Values area of the dialog box, then drag with the mouse down the Forecast values in the worksheet (just the numbers). Next click in the Name area of the dialog box, then click on the cell with the label “Forecast” (in cell J3 here). Your dialog box now looks like this:

4. Click Next >, make any changes you like, then click Finish to place the chart into the worksheet. After resizing the chart and double-clicking on the gray background to make it white, the chart looks like this:

[Line chart: Sales and Forecast series by year, 1994 to 2002; vertical axis from 0 to 45]


ANOVA (Chapter 15)
Excel can perform one-way and two-way ANOVA. Here is an example of each type of analysis.

Example: Supplier Quality Scores (One-way ANOVA)
This example shows how to perform a basic one-way analysis of variance to test for significant differences among several individual columns of data. Consider the data on supplier quality (Table 15.1.1 of Practical Business Statistics).

1. Look under the Tools menu for Data Analysis, select “Anova: Single Factor”, and choose OK. If you cannot find Data Analysis under Excel’s Tools menu, select Add-Ins from the Tools menu and make sure the Analysis ToolPak is checked. If the Analysis ToolPak was not installed when Excel was installed on your computer, you will need to install it from the Excel CD-ROM.

2. In the dialog box that appears, click in “Input Range” and select your data including labels at the top, being sure to extend down to the last row even if you extend past the end of some data columns. Excel requires that your variables be next to one another so that your Input Range is a rectangle. Click the check box “Labels in First Row” so that Excel will recognize the names of the columns. Click the option button to the left of “Output Range”, click in the box to its right, and then click in a cell in the worksheet where Excel can put the results. So far, here is how it looks:

3. Click OK to see the results. In this case the p-value of 0.005 tells you that the mean quality scores of these three suppliers are highly significantly different from one another (p < 0.01). That is, you may conclude that there are supplier differences. Also shown are the average quality for each supplier (82.056, 80.667, and 87.684) and each supplier's variance. You also find the between-sample variability of 269.081 and the within-sample variability of 45.631 under the MS column of the ANOVA table (MS stands for Mean Square). Here are the results, after tidying up by adjusting column widths (try selecting cells that are not displayed properly, then using Format/Column/AutoFit Selection) and by formatting most cells to show three decimal places (using Format/Cells, selecting the Number tab, then using Category Number with 3 decimal places for these cells).
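To see where the two Mean Square values in the ANOVA table come from, here is a minimal Python sketch of the one-way ANOVA arithmetic. The function name and the three small groups are hypothetical, not the supplier data from the textbook, and the sketch stops at the F statistic rather than computing the p-value.

```python
def one_way_anova(groups):
    """groups: a list of samples (lists of numbers), possibly of unequal size.
    Returns (ms_between, ms_within, f_statistic)."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)        # between-sample variability
    ms_within = ss_within / (n - k)          # within-sample variability
    return ms_between, ms_within, ms_between / ms_within

# Hypothetical quality scores for three suppliers, for illustration only.
scores = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
print(one_way_anova(scores))
```

The F statistic is the ratio of the between-sample to the within-sample Mean Square; Excel converts it to the p-value shown in the ANOVA table.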

4. To find the suppliers’ standard deviations, you may take the square root of each variance, using the SQRT function as follows:


Example: Production Quality by Shift and Supplier (Two-way ANOVA)
This example shows how you might perform a two-way analysis of variance with interaction term. Consider the data on production quality according to shift (day, night, and swing) and supplier (A, B, and C) summarized in Figure 15.4.1, and discussed in Problem 16, with averages listed in Table 15.5.4 of Practical Business Statistics.

1. Look under the Tools menu for Data Analysis, select “Anova: Two-Factor with Replication”, and choose OK. If you cannot find Data Analysis under Excel’s Tools menu, select Add-Ins from the Tools menu and make sure the Analysis ToolPak is checked. If the Analysis ToolPak was not installed when Excel was installed on your computer, you will need to install it from the Excel CD-ROM.

2. In the dialog box that appears, click in “Input Range” and select your data including labels at the top and on the sides. Excel requires that the data be arranged in a table as shown below. In this case there are 5 observations for each combination of shift and supplier, so the “Rows per sample” is set at 5. Click the option button to the left of “Output Range”, click in the box to its right, and then click in a cell in the worksheet where Excel can put the results. So far, here is how it looks:

3. Click OK to see the results, as shown below. First you see summary statistics for each combination of shift and supplier. For example, the average quality for Shift 1 and Supplier 1 is 77.062, the average for Supplier 1 is 82.417 (to the right in the first table, for Supplier 1, under “Total”), and the average for Shift 1 is 80.076 (below, in the table headed “Total” under the column headed Shift 1 at the very top). In the ANOVA table are the results of the hypothesis tests, including a p-value of 0.720 for testing whether the suppliers have equal means or not, a p-value listed as 0.000 for testing whether the shifts have equal means or not, and a p-value of 0.014 for the interaction of shift and supplier. Here are the results, after tidying up by adjusting column widths (try selecting cells that are not displayed properly, then using Format/Column/AutoFit Selection) and by formatting most cells to show three decimal places (using Format/Cells, selecting the Number tab, then using Category Number with 3 decimal places for these cells).


Nonparametrics (Chapter 16)
Excel has functions that can help you with nonparametric testing based on ranks of the data. In this chapter we will illustrate the sign test and the two-sample unpaired nonparametric test (the Mann-Whitney U-test, also called the Wilcoxon rank-sum test).

Example: Local and National Family Income (Sign Test)
The sign test can be used to test whether the median of a random sample differs significantly from a reference value. Consider the example of local family incomes using data from Table 16.1.2 of Practical Business Statistics. The question is: Do these incomes differ significantly from the national median of \$27,735? By using the COUNTIF(RANGE,CRITERIA) function, you can find the modified sample size by using, for the criteria, the condition that the data values are different from the reference value (to say “different” in Excel, you use the less-than and greater-than signs like this: “<> ReferenceValue”). You can also find the number of data values below the reference value by using the less-than sign “< ReferenceValue” in the criteria. You may then find the p-value for the test by using the BINOMDIST and the MIN functions. BINOMDIST takes its arguments in the order (number of successes, number of trials, probability, cumulative). If m denotes the modified sample size, #Below denotes the number of data values below the reference value, and θ0 denotes the reference value, then the p-value is

=2*MIN(BINOMDIST(#Below,m,0.5,TRUE),BINOMDIST(m-#Below,m,0.5,TRUE))

Here is how you could find the modified sample size and the number of data values below the reference value (m = 25 and #Below = 6) for the income data, together with the p-value. The result is that the observed median income is significantly different from the reference value.
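The sign test calculation can be checked outside Excel with a short Python sketch. The function names are hypothetical; binom_cdf plays the role of BINOMDIST with its last argument set to TRUE. One deliberate difference from the worksheet formula, labeled in the code, is that the p-value is capped at 1.

```python
from math import comb

def binom_cdf(successes, trials, p=0.5):
    """Probability of at most `successes` successes in `trials` trials,
    like BINOMDIST(successes, trials, p, TRUE)."""
    return sum(comb(trials, i) * p ** i * (1 - p) ** (trials - i)
               for i in range(successes + 1))

def sign_test_counts(data, reference):
    """Return (m, below): the modified sample size (values different from
    the reference) and the number of values below the reference."""
    m = sum(1 for x in data if x != reference)       # COUNTIF with "<>"
    below = sum(1 for x in data if x < reference)    # COUNTIF with "<"
    return m, below

def sign_test_p_value(m, below):
    """Two-sided p-value: 2 * MIN of the two binomial tail probabilities.
    Capped at 1.0 here (an addition; the worksheet formula has no cap)."""
    p = 2 * min(binom_cdf(below, m), binom_cdf(m - below, m))
    return min(p, 1.0)

# Counts reported in the text for the income data: m = 25, #Below = 6.
print(sign_test_p_value(25, 6))
```

With m = 25 and #Below = 6, this gives a p-value of about 0.0146, which is below 0.05 and agrees with the conclusion that the result is significant.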


Example: Incomes of Mortgage Applicants (Unpaired Two-Sample Test)
The nonparametric test for two unpaired samples is based on the ranks of the overall data set with both samples combined. Both the Mann-Whitney U-test and the Wilcoxon rank-sum test give the same result. Excel can help you perform this procedure.

1. Begin by listing both groups of numbers in a single column, with labels in the column to its left to identify the group of each number. Then, with any data cell selected, use the menu command Data/Sort to sort both columns by data value. Here is the Data/Sort dialog box:

2. Now find the rank of each data value, being careful to average any ties. To do this, create a column (headed “1, 2, 3 ...” below) consisting of the initial ranks (before averaging) of 1, 2, 3, and so forth. Then create a column of ranks with tie averaging by using the formula SUMIF(DataRange,DataValue,123Range)/COUNTIF(DataRange,DataValue), being careful to use absolute \$ addressing for DataRange and 123Range but not for DataValue. Here is the result after copying that formula down the column (for example, by double-clicking on the fill handle after entering the first formula). Note that the averaged rank of 18.5 is used for both income values of 57,000.
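The tie-averaging idea in step 2 can be sketched in Python as follows. The function name and the sample values are hypothetical. Each value receives the average of the initial ranks 1, 2, 3, ... held by all copies of that value, which is what the SUMIF/COUNTIF formula computes.

```python
def averaged_ranks(values):
    """Return the rank of each value (1 = smallest), in the original order,
    with tied values sharing the average of their initial ranks."""
    initial = {}
    for position, value in enumerate(sorted(values), start=1):
        initial.setdefault(value, []).append(position)   # the "1, 2, 3 ..." column
    return [sum(initial[v]) / len(initial[v]) for v in values]

# Hypothetical incomes, for illustration only: two values are tied.
incomes = [30000, 57000, 41000, 57000, 62000]
print(averaged_ranks(incomes))
```

Unlike the worksheet approach, this sketch does not require the data to be sorted first, because the initial ranks are assigned from a sorted copy.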

3. To find the average rank for each group, you may again use the SUMIF and COUNTIF functions, this time as

SUMIF(GroupLabelRange,"Fixed",RanksRange)/COUNTIF(GroupLabelRange,"Fixed")

for the fixed-rate mortgages, changing “Fixed” to “Variable” for the variable-rate mortgages. Here are the results:

4. Now find the average difference in ranks by subtracting these average ranks. Find the standard error by using the sample size for each group (16 and 14, here). Divide the average difference in ranks by the standard error to find the test statistic. Finally, find the p-value using the function

=2*(1-NORMSDIST(ABS(TestStatistic)))

The results are as follows. Note that these two groups are not significantly different from one another because p > 0.05.
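Step 4 can be sketched in Python as shown here. The guide does not spell out the standard error formula, so the one used below is an assumption: the usual large-sample formula for the difference in average ranks, (n1 + n2) * sqrt((n1 + n2 + 1) / (12 * n1 * n2)), with no correction for ties. The function names are hypothetical, and normal_cdf stands in for NORMSDIST.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability, like NORMSDIST(z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def rank_difference_test(ranks1, ranks2):
    """ranks1, ranks2: the (tie-averaged) overall ranks of each group.
    Returns (average_difference, standard_error, test_statistic, p_value)."""
    n1, n2 = len(ranks1), len(ranks2)
    n = n1 + n2
    difference = sum(ranks1) / n1 - sum(ranks2) / n2
    # ASSUMED formula (large-sample, no tie correction); see the lead-in.
    standard_error = n * sqrt((n + 1) / (12 * n1 * n2))
    z = difference / standard_error
    p_value = 2 * (1 - normal_cdf(abs(z)))   # =2*(1-NORMSDIST(ABS(z)))
    return difference, standard_error, z, p_value

# Hypothetical ranks for two small groups, for illustration only.
print(rank_difference_test([1, 4, 5, 8], [2, 3, 6, 7]))
```

Under this assumed formula, group sizes of 16 and 14 give a standard error of about 3.22 rank units.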


Chi-Squared Analysis (Chapter 17)
Chi-squared analysis is used to test for significance in counted data. You can use Excel to test whether population percentages are equal to known reference values; this is done by performing the appropriate steps (e.g., compute the expected counts, etc.). Excel may also be used to perform a test for independence in a two-way table of counts.

Example: Causes of Quality Problems (Chi-squared Test for Known Percentages)
In this example we solve the problem of testing observed counts against known population reference percentages by executing the steps in Excel. Consider the data on quality problems in Tables 17.2.2 and 17.2.3 of Practical Business Statistics.

1. Find the column of expected counts by multiplying each reference percent by the total observed count. Give a name such as “Observed” to the observed counts, and the name “Expected” to the expected counts, perhaps by selecting a column of numbers and using the menu command Insert/Name/Define.

2. To find the chi-squared statistic, since you have named the Observed and Expected numbers, you may use the following matrix formula:

=SUM((Observed-Expected)^2/Expected)

Type it in and hold Ctrl and Shift while you hit Enter. Because this is a matrix formula, it may not give you an answer if you hit only Enter by itself.

3. To find the p-value, you may use the function CHIDIST(ChiSq,DF) evaluated using the chi-squared statistic and its number of degrees of freedom. Here are the results. Note that the observed counts do not differ significantly from the reference percentages because p > 0.05.
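The expected counts and the chi-squared statistic from steps 1 and 2 can be sketched in Python as a cross-check. The function name and the counts are hypothetical, not the quality-problem data from the textbook. The sketch stops before the p-value, which in Excel comes from CHIDIST.

```python
def chi_squared_known_percentages(observed, reference_percents):
    """observed: list of counts; reference_percents: matching percentages
    that sum to 100. Returns (expected, chi_squared, degrees_of_freedom)."""
    total = sum(observed)
    expected = [total * percent / 100 for percent in reference_percents]
    # Same arithmetic as the matrix formula =SUM((Observed-Expected)^2/Expected)
    chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = len(observed) - 1
    return expected, chi_squared, degrees_of_freedom

# Hypothetical counts and reference percentages, for illustration only.
observed = [30, 50, 20]
percents = [25, 50, 25]
print(chi_squared_known_percentages(observed, percents))
```

The degrees of freedom here are one less than the number of categories, which is the value to pass as DF to CHIDIST.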

Example: Market Segmented? (Chi-squared Test for Independence)
In this example we perform the chi-squared test for independence of two categorical variables with Excel’s CHITEST function, which gives the p-value directly. Consider the data on market segmentation (Table 17.3.1 of Practical Business Statistics) which gives the number of consumers of each type (practical or impulsive) who purchased each type of rowing machine. Remember that the chi-squared test requires counts, not percentages or averages.

1. Excel can help you compute the p-value of the chi-squared test for independence using the CHITEST function, but you have to compute the table of expected counts first. In the example below, to create a formula for expected counts that will copy correctly to fill the entire table, note the use of “absolute addressing” using dollar signs in the formula “=B\$6*\$D3/\$D\$6” to find the expected 18.93 purchases of basic machines by practical consumers. This formula can be copied and pasted to fill the table while always taking column totals from row 6 (hence the reference B\$6), always taking the row totals from column D (hence the reference \$D3), and always taking the overall total from cell D6 (hence the reference \$D\$6).

2. The results are shown below: first the original table of counts, next the table of expected counts, and finally the CHITEST function, which uses both the original table and the table of expected counts (but not the totals). The resulting CHITEST p-value is 3.07823E-15, which represents the very small number 0.00000000000000307823 because the scientific notation “E-15” tells you to move the decimal point 15 places to the left. Clearly the result is very highly significant because this p-value is less than 0.001.
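The table of expected counts follows one rule in every cell: column total times row total, divided by the overall total. Here is a minimal Python sketch of that rule and of the chi-squared statistic built from it. The function names and the 2-by-2 table are hypothetical, not the rowing machine data, and the p-value step performed by CHITEST is left out.

```python
def expected_counts(table):
    """table: list of rows of observed counts (no totals).
    Each expected count is row total * column total / overall total,
    like the worksheet formula =B$6*$D3/$D$6."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    overall = sum(row_totals)
    return [[r * c / overall for c in column_totals] for r in row_totals]

def chi_squared_independence(table):
    """Return (expected, chi_squared, degrees_of_freedom) for a table of counts."""
    expected = expected_counts(table)
    chi_squared = sum((o - e) ** 2 / e
                      for obs_row, exp_row in zip(table, expected)
                      for o, e in zip(obs_row, exp_row))
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi_squared, degrees_of_freedom

# Hypothetical counts, for illustration only.
counts = [[10, 20],
          [20, 10]]
print(chi_squared_independence(counts))
```

The expected table always has the same row and column totals as the observed table, which is a quick way to check that the worksheet formula was copied correctly.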


Quality Control (Chapter 18)
You can produce quality control charts in Excel. You arrange the data for the chart in one column, copy the center line number down another column, and similarly set up one column for each of the control limits, then create the chart. Here are examples for XBar and R charts. In a similar way you can also produce a percentage chart. The procedure is simpler if you use StatPad.

Example: Weights of Boxes of Detergent (XBar and R Charts)
Here is how to use Excel to draw an XBar chart for the detergent data from Table 18.3.4 of Practical Business Statistics. Begin with a column containing a list of the averages (of five observations each in this case). Immediately to its right, create a column containing the average XBarBar=16.093 of these averages repeated down the column. Next to it, create a column for the lower control limit XBarBar-A2*RBar = 15.941 and one for the upper control limit XBarBar+A2*RBar = 16.245. Now select all four of these columns (just the numbers) and use Excel’s menu command Insert/Chart (or click on the Chart Wizard icon) to bring up the Chart Wizard dialog box. Under “Chart type” select Line (with markers displayed at each data value) as shown. As you work your way through the Chart Wizard (by clicking on Next), pause at the Chart Options dialog box. Click on the Gridlines tab to specify whether or not you would like gridlines (which were not used here) and click on the Legend tab to delete the legend (by unselecting the “Show legend” check box). Click on Finish, and the XBar chart appears.

To use Excel to draw an R chart for the detergent data, proceed as for the XBar chart, but use the range values R for the first column, their average RBar for the second column, and the appropriate lower and upper control limits D3*RBar = 0 and D4*RBar = 0.556 for the third and fourth columns. Here is the R chart in Excel:
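The center lines and control limits for both charts can be sketched in Python as follows. The constants A2 = 0.577, D3 = 0, and D4 = 2.114 are the standard table values for samples of five observations. The value RBar = 0.263 is an assumption, back-calculated from the limits quoted in the text rather than stated there, and the function name is hypothetical.

```python
# Standard control chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def control_limits(x_bar_bar, r_bar):
    """Return (lower limit, center line, upper limit) for each chart."""
    return {
        "XBar": (x_bar_bar - A2 * r_bar, x_bar_bar, x_bar_bar + A2 * r_bar),
        "R": (D3 * r_bar, r_bar, D4 * r_bar),
    }

# XBarBar is quoted in the text; RBar = 0.263 is ASSUMED (back-calculated
# from the quoted upper limit D4*RBar = 0.556).
limits = control_limits(16.093, 0.263)
for chart, (lower, center, upper) in limits.items():
    print(chart, round(lower, 3), round(center, 3), round(upper, 3))
```

In the worksheet, each of these three numbers is repeated down its own column so that the chart draws it as a horizontal line.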


Appendix: Excel Range Names
For use with Excel, each chapter of Practical Business Statistics has its own file on the CD-ROM that includes the data tables from examples and problems. To access it, use File/Open from Excel’s menu. Each column of numbers is named and ready to use. For example, the data sets from Chapter 3 are in the file named Chapter03.xls, and the employee database from Appendix A of the textbook is in the file named EmployeeDatabase.xls. To work with a column of numbers from a data file, you may use its name in a formula, such as “=AVERAGE(yield)” to place the average of a column of numbers named “yield” into a cell in your worksheet. Alternatively, you may drag the mouse down the numbers in the data set to select them if you wish. Here are the Excel range names assigned to each individual data table within a file. Note that spaces are not allowed in Excel names, and that the underline character (_) is often used instead.

CHAPTER 2, RANGE NAMES.
Characteristics of Food Services Companies. • Profits • Return • Employees • Revenues
Daily Stock Price Information for Home Depot. • Open • High • Low • HD_Close • Volume • HD_Percent_Change
Table 2.6.1. Employment/History Status of Five People. • Salary • Experience
Table 2.6.2. Selected Product Output of Production Facilities. • Employ
Table 2.6.3. Sales and Income. • Sales • Income
Table 2.6.4. Selected Customers and Purchases. • Purchases
Table 2.6.5. Ratings of Cell Phones. • Price
Table 2.6.6. Comparison of Upright Vacuum Cleaners. • Vacuum_Price • Weight
Table 2.6.7. Closing Price and Monthly Change for DJIA Firms. • Close • Change
Table 2.6.8. Daily DJIA for September 1998. • DJIA • Net_Change • Percent_Change

CHAPTER 3, RANGE NAMES.
Table 3.2.1. Home Mortgage Rates. • Interest_Rate
Table 3.2.2. Starting Salaries. • Salary
Table 3.4.1. Assets of Commercial Banks in the Fortune 1000. • Assets
Table 3.5.1. Yields of Money Market Funds. • Taxable_Yield • Tax_Exempt_Yield
Table 3.5.2. Rates of Computer Ownership. • Computer_Owners
Table 3.6.1. Changes in Spending on Syndicated TV Advertising. • Change
Table 3.8.1. The Number of Employees for Food Services Firms. • Employees
Table 3.9.1. Yields of Municipal Bonds. • Yield
Table 3.9.2. Market Response to Stock Buy-Backs. • Price_Change
Table 3.9.3. Active Stock Market Issues. • Stock_Change
Table 3.9.4. CREF'S Investments. • Market_Value
Table 3.9.5. Percent Change in Revenues for Scientific, Photo, and Control Equipment Companies in the Fortune 500. • Revenue_Change
Table 3.9.6. Hospital Charges for Heart Failure and Shock. • Hospital_Charges
Table 3.9.7. CEO Compensation for Food Processing Firms. • CEO_Compensation
Table 3.9.8. Market Share for Seattle Radio Stations. • Listeners
Table 3.9.9. Net Income of Selected Firms. • Net_Income
Table 3.9.10. Cost of Traditional Funeral Service. • Funeral_Cost
Table 3.9.11. Special Exemptions to the Tax Code. • Exemption
Problem 3.21. Defective Motors, Per Batch Of 250. • Defects
Table 3.9.12. Cost to Rent a Car. • Rental
Problem 3.23. Interest Rates. • Rate
Problem 3.24. Market Values. • Market
Problem 3.25. Executive Salaries. • Executive_Salary
Problem 3.26. Order Size. • Order
Problem 3.27. Envelope Prices. • Envelope_Price
Problem 3.28. Market Share. • Share
Table 3.9.13. Percentage Change in Dollar Value. • Dollar_Change
Problem 3.30. Tylenol Prices. • Tylenol
Table 2.6.7. Closing Price and Monthly Change for DJIA Firms. • DJIA_Close • DJIA_Change
Table 2.6.8. Daily DJIA for January 2002. • DJIA_Net_Change • DJIA_Percent_Change
Case. • Material • Manager • Inventory

CHAPTER 4, RANGE NAMES.
Example: How Many Defective Parts? • Defects
Example: Your Grade Point Average. • Credits • Grade
Example: The Firm's Cost of Capital. • Market_Value • Rate_of_Return
Table 4.1.1. Loss at Opening, Crash of 1987. • Loss
Table 4.2.1. CEO Compensation in Technology. • Salary
Table 4.2.2. Business Failures by State. • Failures
Problem 4.1. Cars Requiring Extra Work. • Cars
Table 4.3.1. Last Month's Sales. • Sales
Table 4.3.2. Value Added Tax Rates by Country. • VAT
Table 4.3.3. Profits for General Merchandisers in the Fortune 500. • Profits
Problem 4.7. Beta of Stock Portfolio. • Shares • Cost_Per_Share • Beta
Table 4.3.4. State Population and Taxes. • Population • State_Taxes
Table 4.3.5. Percent Change in Housing Values over Five Years for U.S. Regions. • Percent_Change
Table 4.3.6. Revenues for selected Fortune 500 companies. • Revenues
Table 4.3.7. Percent increases of initial public stock offerings. • Percent_Increase
Problem 4.16. Paper Mill Problems. • Problem
Table 4.3.8. Home Mortgage Loan Fees. • Fee
Problem 4.23. Strength of Cotton Yarn. • Strength
Problem 4.24. Factory Inventory Level. • Inventory
Problem 4.25. Your Products' Share. • Share
Problem 4.26. Monthly Sales. • Monthly_Sales
Table 4.3.9. Changing Value of the Dollar. • Change
Table 3.9.1. Yields of Municipal Bonds. • Yield
Table 3.9.2. Market Response to Stock Buy-Backs. • Price_Change
Table 3.9.4. CREF'S Investments. • CREF_Value
Table 4.3.10. Length in minutes for selected films from a video library. • Time
Table 3.9.6. Hospital Charges for Heart Failure and Shock. • Hospital_Charges
Table 3.9.7. CEO Compensation for Food Processing Firms. • CEO_Compensation
Table 3.9.10. Cost of Traditional Funeral Service. • Funeral_Cost
Table 4.3.11. Sales of Some 'Light' Foods. • Food_Sales
Table 2.6.7. Closing Price and Monthly Change for DJIA Firms. • DJIA_Close • DJIA_Change
Table 2.6.8. Daily DJIA for January 2002. • DJIA_Net_Change • DJIA_Percent_Change
Case. • Chairs • Tables • Bookshelves • Cabinets • Value

CHAPTER 5, RANGE NAMES.
Table 5.1.1. Finding The Deviations From The Average. • Dart_Returns


Example: The Advertising Budget. • Budget Table 5.1.3-5.1.4. Closing Stock Prices and Daily Returns. • Dow_Jones • Dow_Jones_Return Example: S&P 500 Stock Index Volatility. • Standard_Deviation Table 5.2.1. Employee Salaries. • Employee_Salary Table 5.2.2. Hospital Length of Stay. • Days Table 5.5.1. Advertising Accounts in Play. • Ad_Budget Table 5.5.2. Performance of Pharmaceutical Firms. • Stock_Return Table 5.5.3. Largest Stock Mutual Funds. • Return_Mutual_Fund • Assets_Mutual_Fund Problem 5.6. Number of Executives for Seattle Firms. • Executives

Table 5.5.4. Weights for Two Samples of Candy Bars. • Before • After Table 5.5.5. Cost due to traffic congestion, per registered vehicle. • Cost_Traffic Problem 5.17. Rates of Return. • ROR Problem 5.18. Interest Rates • Rate Table 5.5.6. Theme Park Admission Prices. • Admission Table 5.5.7. Changing Value of the Dollar. • Change Problem 5.20. Weights of Sinks. • Weight Table 5.5.8. Hotel Room Prices. • Price Table 5.5.9. Gifts Returned. • Returned Problem 5.23. Airline Ticket Prices • Ticket_Cost Problem 5.24. Productivity Measures. • Productivity Problem 5.25. Sales. • Sales Problem 5.26. Percentage of Gold. • Gold Problem 5.27. Return on Equity.

• ROE

Table 4.3.2. Value Added Tax Rates by Country. • VAT Table 4.3.10. Length in minutes for selected films from a video library. • Time Table 5.5.10. International taxation. • GDP • Taxes Table 3.9.1. Yields of Municipal Bonds. • Yield Table 3.9.2. Market Response to Stock Buy-Backs. • Price_Change Table 3.9.4. CREF'S Investments. • Market_Value Table 3.9.10. Cost of Traditional Funeral Service. • Funeral_Cost Problem 5.40. Defective Motors, Per Batch Of 250. • Defects Table 4.3.1. Last Month’s Sales. • Last_Month_Sales Table 4.3.5. Percent Change in Housing Values over Five Years for U.S. Regions. • Percent_Change Table 4.3.8. Home Mortgage Loan Fees. • Fee

Table 5.5.11. International Bond Mutual Fund Performance. • Performance_Before • Performance_After Table 5.5.12. Age and Cost for Presses. • Age • Cost_Presses Table 2.6.7. Closing Price and Monthly Change for DJIA Firms. • DJIA_Close • DJIA_Change Table 2.6.8. Daily DJIA for January 2002. • DJIA_Net_Change • DJIA_Percent_Change Case. • Part_Size CHAPTER 7, RANGE NAMES. Example. Profit Under Various Economic Scenarios. • Profit • Prob_of_Profit Table 7.6.1. Probability Distribution of Payoff. • Payoff • Prob_of_Payoff Table 7.6.2. Probability Distribution of Downtime. • Downtime • Prob_of_Downtime TABLE 7.6.3. Probabilities for Qualified Technical Applicants. • Applicants • Probability_of_Applicants

Table 7.6.4. Rates of Return and Probabilities. • ROR • Prob_of_ROR Table 7.6.5. Quality Control Problems. • Prob_of_Rework • Rework_Cost Case. • Oil_Price • Prob_of_Oil_Price CHAPTER 8, RANGE NAMES. Table 8.6.1. Project Analysis. • Probability • Profit_or_Loss Table 8.6.2. Industrial Farm Equipment Firms. • Profit Table 8.6.3. Revenue Change for Fortune 500 Soap and Cosmetics Companies. • Revenue_Change Table 8.6.4. Economic Forecasts. • Forecast Problem 8.31. Recent Billings. • Billing Problem 8.33. Quality of Agricultural Produce. • Quality

CHAPTER 9, RANGE NAMES. Table 9.1.2. Thickness of Selected Sheets of Paper. • Thickness Table 9.1.3. Yearly Percentage of Adults Using the Internet. • Internet_Usage Table 9.1.4. Yields of a Chemical Processing Facility. • Tons Problem 9.11. Personal Computer Orders. • Computers Problem 9.14. Weights of Loaves of Bread. • Weight Problem 9.16. Cleaning Cost. • Cleaning_Cost Table 9.6.1. Prices at SuperMall and elsewhere for various items. • SuperMall • Elsewhere • Savings Problem 9.27. Daily Changes in S&P 500 Stock Market Index. • Change Table 9.6.2. Performance of Recommended Stocks. • Performance Problem 9.34. Computer Speed. • Seconds Problem 9.35. Economic Viability of Mining Operation. • ROR

Table 4.3.1. Last Month's Sales. • Sales Problem 9.40. Strength of Cotton Yarn. • Strength Table 5.5.4. Weights for Two Samples of Candy Bars. • Before • After Problem 9.44. Quality scores for agricultural produce. • Quality Problem 9.45. Caffeine in Coffee. • Caffeine Case. • Order_Amount CHAPTER 10, RANGE NAMES. Table 10.6.1. Relaxation Scores. • Before • After Table 10.6.4. Salaries Arranged by Gender. • Women • Men Problem 10.8. Inventory Level. • Inventory Problem 10.9. Weights of Loaves of Bread. • Weight_Bread Table 4.3.7. Percent increases of initial public stock offerings. • Percent_Increase Problem 10.21. Weight of Frozen Foods. • Weight_Food

Problem 10.22. Prices. • Price Problem 10.23. Calorie Content. • Calories Table 10.7.2. Store Returns. • Returned Problem 10.25. Satisfaction Scores. • Satisfaction Problem 10.26. Pollutant Levels. • Pollution Problem 10.27. Component Weights. • Weight_Component Table 10.7.3. Performance of Socially Aware Funds. • ROR Table 10.7.4. World Income Funds One-Year Market Return. • Market_Return Table 10.7.5. Vocal Stress Level. • True_Stress • False_Stress Table 10.7.6. Wine Tasting Scores. • Chardonnay • Sauvignon Table 10.7.7. Days Until Failure. • You • Competitor Table 10.7.8. Monthly Daycare Rates. • Laurelhurst • Other_Areas Table 10.7.9. New Product Preferences. • Milwaukee • Green_Bay

Table 10.7.11. Supplier Quality. • Custom_Cases • International_Plastics Table 5.5.4. Weights for Two Samples of Candy Bars. • Candy_Before • Candy_After Case. • n • Avg • stdDev • stdErr • t • p CHAPTER 11, RANGE NAMES. Table 11.1.1. First Quarter Performance. • Contacts • Sales_Qtr Table 11.1.3. Internet Usage Ratings. • Audience • Reach • Pages • Time Table 11.1.4. Top Merger & Acquisition Advisers. • Deals • Dollars Table 11.1.5. Mortgage Costs. • Fee • Interest Table 11.1.6. Percent Change in Stock Index. • Today • Yesterday

Table 11.1.7. S&P 100 Index Call Options. • Strike_Price • Call_Price Table 11.1.8. Temperature and Yield for an Industrial Process. • Temperature • Yield_Process Table 11.1.9. Fiber-Optics Long-Distance Communications. • Investment • Miles Table 11.1.11. U.S. Treasury Bonds. • Coupon_Rate • Bid_Price Table 11.1.12. Weekly Production. • Number_Produced • Cost_Weekly Data for Figure 11.1.18. Restaurant and Food Store Expenditures by State, millions. • Food_Stores • Restaurant Table 11.2.2. Weekly Production (Outlier Omitted). • Produced • Cost Table 11.2.3. Territory and Performance of Salespeople. • Territory • Sales_Performance Table 11.3.1. Printing Presses. • Age • Cost_Printing

Table 11.3.2. Airline On-Time Performance. • Month_On_Time • Year_On_Time Table 11.3.3. International Closed-End Bond Funds. • NAV • Price_Fund Table 11.3.4. Business Failures By State. • Failures • Population Table 11.3.5. Daily Stock Changes. • McDonalds • Dow_Jones Table 11.3.6. Expense Ratio and One-Year Rate of Return. • WR_Expense_Ratio • WR_Return Table 11.3.7. Votes for Albert Gore, Jr. • Nov7 • Certified • Change Table 11.3.8. Total U.S. Advertising Spending by Retail Firms. • Ads2000 • Ads1999

Table 11.3.9. Market Share and 30-Second Advertising Cost. • Share • Ad_Cost Table 11.3.10. Gold Coins. • Weight • Price_Gold Table 11.3.11. Room for Expansion. • Existing_Units • Capacity Table 11.3.12. Gasoline Prices. • Price_11_30_90 • Price_2_26_91 Table 11.3.13. Salaries and Money Raised Per Capita, Charitable Organizations. • President • Money_Raised Table 11.3.14. Mailing Lists. • Size • Sales

Table 11.3.15. Short-Term Bond Funds. • Maturity • Return Table 11.3.16. Production Data. • Workers • Production Table 11.3.17. Biotechnical Stocks. • EPS • Price_Biotech Table 11.3.18. Newspaper Advertising Rates Per Line. • Circulation • Open_Line_Rate

Table 11.3.19. Newspaper Ad Rates Adjusted for Readership. • Circulation2 • Milline_Rate Table 11.3.20. Defects and Possible Causes. • Defect_Rate • Temperature_Variability • Stoppages Case. • Purifier • Yield_Case CHAPTER 12, RANGE NAMES. Table 12.1.3. Advertising Costs, Characteristics of Magazines. • Page • Audience • Male • Income Table 12.2.2. Computers and Office Equipment Companies in the Fortune 500. • Mkt_Val • Assets • Employees Table 12.2.13. Dividends, Sales of Goods. • Div • Nondur • Durable Table 12.3.3. Temperature and Yield for an Industrial Process. • Temperature_Process • Yield Table 12.4.4. Salary, Experience, and Gender for Employees. • Salary • Experience • Gender

Table 12.5.1. Picasso Paintings. • Price • Area • Year Table 12.5.4. Computer Response Time, Users, and Load. • Response_Time • Users • Load Table 12.5.5. Performance of International Stocks. • US • Europe • Pacific_Rim Table 12.5.7. CEO Salaries, Sales and Return on Equity for Northwest Companies. • NW_Salary • NW_Sales • NW_ROE Table 12.5.8. Brokerage House Asset-Allocation. • Performance • Stocks • Bonds Table 12.5.11. Staff and Contribution Levels for Charities. • Staff • Public • Govt • Other Table 12.5.14. Fiber-Optics Long-Distance Communications. • Invest • Miles

Table 12.5.16. Price and Profit in Test Markets. • Price_Test • Profit Table 12.5.18. Interest Rates. • Fed_Funds • T_Bills • T_Bonds Case. • Temperature • Density • Rate • AM_PM • Defect CHAPTER 13, RANGE NAMES. Data from Appendix of Report: Quick Pricing Formula. • Components • Size • Cost CHAPTER 14, RANGE NAMES. Table 14.1.1. Radio, TV, and Computer Store Sales. • Radio_TV_Computer Table 14.1.3-14.1.4. U.S. Retail Sales. • Unadjusted_Sales • Seasonally_Adjusted_Sales Table 14.1.5. U.S. Treasury Bills. • Yield Table 14.2.1. Ford Motor Company. • Automotive_Sales

Table 14.3.1. Civilian Unemployment Rate. • Unemployment Table 14.4.1. Quarterly Revenues for Walt Disney Company and Subsidiaries. • Disney Table 14.4.2. Quarterly Net Sales for PepsiCo. • PepsiCo Table 14.4.3. Quarterly Sales for Deere & Company. • Deere_Sales Table 14.4.4. Quarterly Sales for Castle & Cooke, Inc. • Castle_Cooke_Sales Table 14.4.5. Quarterly Sales for Nordstrom, Inc. • Nordstrom_Sales Table 14.4.6. Quarterly Sales. • Sales Table 14.4.10. Interest Rate Forecasts. • Forecast • Lower_95 • Upper_95 CHAPTER 15, RANGE NAMES. Table 15.1.1. Quality Scores for Suppliers' Products. • Amalgamated • Bipolar • Consolidated Table 15.5.3. Lengths of Telephone Calls. • Info • Sales

• Service • Other

Table 15.5.4. Original Data. • Quality • Shift • Supplier CHAPTER 16, RANGE NAMES. Table 16.1.2. Incomes of Sampled Families. • Income Table 16.2.2. Level of Creativity. • Ad_1 • Ad_2 Table 16.3.2. Income of Mortgage Applicants. • Fixed • Variable Table 16.4.1. Building Materials Firm Profits. • Building_Profit Table 16.4.2. Aerospace Firm Profits. • Aerospace_Profit Table 16.4.3. Relaxation Scores. • Before • After Table 16.4.4. Stress Levels. • True_Answer • False_Answer Table 16.4.5. Gender Salary Data. • Women • Men Table 16.4.6. Reliability of Products Under Abuse. • Yours • Competitor

Table 16.4.7. Prescription Drug Prices. • United_States • Canada CHAPTER 17, RANGE NAMES. Table 17.1.1. Vehicle Desired. • Vehicle_Count • Vehicle_Percent Table 17.1.2. Responses to the Question on GM Cars. • Boomer • Nonboom • Overall Table 17.2.2. Defective Components. • Defect_Count Table 17.3.1. Rowing Machine Purchases. • Practical • Impulsive Table 17.4.1. Vehicle Desired: This week's count and last year's percentage. • This_Count • Last_Percent Table 17.4.2. Incoming Telephone Calls. • Phone_Count • Phone_Percent Table 17.4.3. Survey of Future Business Conditions. • Managers • Employees Table 17.4.4. Survey on the Chances of a Stock Market Crash. • Stockholders

• Nonstockholders

Table 17.4.5. Order Rates by Region • East • West Table 17.4.6. Status of Mortgage Applications. • Residential • Commercial Table 17.4.7. Household Responses. • Satisfied • Dissatisfied Table 17.4.8. Newsletter Interest Level. • Customer • Potential_customer CHAPTER 18, RANGE NAMES. Table 18.1.1. Defect Causes, with Frequency of Occurrence. • Number_Defects Table 18.3.3. Summaries of Measurements for 8 Samples, n=4. • Average_Meas • Range_Meas Table 18.3.4. Weights of Sampled Boxes of Detergent. • Average_Detergent • Range_Detergent Table 18.4.2. Summaries of Measurements for 12 Samples. • Defects_of_500 Table 18.4.3. Errors in Batches of n=300 Purchase Orders. • Defects_of_300

Table 18.5.1. Frequency of Problems in Candy Manufacturing. • Candy_Problems Table 18.5.2. Problems in Rebate Processing. • Rebate_Problems Table 18.5.3. Thickness of Protective Coating. • Thick_1 • Thick_2 • Thick_3 Table 18.5.4. Length of Broccoli Trees, n=4. • Broc_1 • Broc_2 • Broc_3 • Broc_4 Table 18.5.5. Defective Invoices, n=500. • Errors Table 18.5.6. High Speed Memory Chips. • Chip_Number Table 18.5.7. Baking Oven Temperatures. • Mon_Avg • Mon_Range • Tues_Avg • Tues_Range • Wed_Avg • Wed_Range APPENDIX A, RANGE NAMES. Appendix A. Employee Database. • Salary • Gender • Age • Experience

• Level

APPENDIX B, RANGE NAMES. Appendix B. Donations Database. Note: “_D0” indicates 19,011 nondonors, while “_D1” indicates 989 donors, out of 20,000 overall. • Age • Age_D0 • Age_D1 • Age55_59 • Age55_59_D0 • Age55_59_D1 • Age60_64 • Age60_64_D0 • Age60_64_D1 • AvgGift • AvgGift_D0 • AvgGift_D1 • Cars • Cars_D0 • Cars_D1 • CatalogShopper • CatalogShopper_D0 • CatalogShopper_D1 • Clerical • Clerical_D0 • Clerical_D1 • Donation • Donation_D0 • Donation_D1 • Farmers • Farmers_D0 • Farmers_D1 • Gifts • Gifts_D0 • Gifts_D1 • HomePhone • HomePhone_D0 • HomePhone_D1 • Lifetime • Lifetime_D0 • Lifetime_D1 • MajorDonor • MajorDonor_D0 • MajorDonor_D1 • MedHouseInc • MedHouseInc_D0 • MedHouseInc_D1 • OwnerOccupied • OwnerOccupied_D0 • OwnerOccupied_D1 • PCOwner • PCOwner_D0 • PCOwner_D1 • PerCapIncome • PerCapIncome_D0 • PerCapIncome_D1 • Professional • Professional_D0 • Professional_D1 • Promotions • Promotions_D0 • Promotions_D1 • RecentGifts • RecentGifts_D0 • RecentGifts_D1 • Sales • Sales_D0 • Sales_D1 • School • School_D0 • School_D1 • SelfEmployed • SelfEmployed_D0 • SelfEmployed_D1 • Technical • Technical_D0 • Technical_D1 • YearsSinceFirst