
Excel for Data Analysis

3005 30th Street Boulder, CO 80301 303-444-7863 www.n-r-c.com



Table of Contents

Introduction
Data Entry in Excel
    Unique IDs
    Setting up the Worksheet
    Entering Single-Response, Closed-Ended Questions
    Entering Multiple-Response Questions
    Entering Open-Ended Questions
    Creating a Codebook
Analyzing the Data
    Calculating an Average
    Creating a Frequency Distribution for a Single-Response Question
    Creating a Frequency Distribution for a Multiple-Response Question
    Functions and formulas used for simple descriptive analyses in Excel
Presenting the Results: One Quick Idea
Using Pivot Tables for Basic and Advanced Analyses
    Creating a PivotTable (Basic Analyses)
    Crosstabulation of Data Using PivotTables (Advanced Analyses)
APPENDIX I: Example Completed Surveys for Data Entry
APPENDIX II: Example Codebook
APPENDIX III: Example Analysis, with Formulas
APPENDIX IV: Example of an Annotated Instrument

Introduction
This handbook is designed to instruct program staff on how to set up data entry processes and perform simple analyses of data collected through surveys, course evaluations, or by observation or other record keeping. Throughout this handbook, a common example is used: data representing the results from six surveys completed by fictional participants of a fictional training program. A copy of the completed surveys can be found in Appendix I. The reader may find it helpful to review the surveys before continuing with the rest of the handbook. In fact, it might be beneficial to pull out the six surveys and refer to them periodically while reviewing the handbook.

The individual conducting the data analysis is referred to in this handbook as the analyst. This person may be a program staff member, volunteer, board member or other stakeholder willing to accomplish this task. There is no job description for this analyst. He or she needs only to have a basic understanding of Microsoft Excel, know how to perform calculations using the contents of multiple cells, and be familiar with formulas. Reminders about using Excel are found in text boxes throughout the handbook.

Good luck!

The Staff of NRC

Excel for Data Analysis was written by National Research Center, Inc. 3005 30th Street, Boulder, Colorado 80301 Phone: 303-444-7863 Fax: 303-444-1145 www.n-r-c.com Copyright 2003 by National Research Center, Inc. All rights reserved.


Data Entry in Excel


Before a data set can be analyzed, the data must be entered into an electronic file. This can be done fairly simply using Microsoft Excel.
Unique IDs

Before beginning the data entry, it is advisable to put a unique identifier on each survey or data form. This will allow the analyst to keep track of his or her progress, and will also make it easier to track down and correct any data entry errors. This identifier does not associate the survey with a particular person; rather, it only makes it easier to find a specific survey at a later date. The surveys do not need to be in any particular order; just begin at the top of the stack with 1, and number consecutively.
Setting up the Worksheet

To set up a worksheet for data entry, the analyst will use the first row (row 1) for the question or question part labels. Dedicate the first column (column A) to the IDs. Thus, the analyst will put the label ID in cell A1. Cell B1 would contain the label q1 (for question #1) or whatever is appropriate for the first question or field of data. Cell C1 would contain the label q2 (or whatever is appropriate), etc. Each survey will then be entered into one row; the first survey in row 2 (ID #1), the second survey in row 3, and so on.
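The layout described above can also be sketched outside Excel. The short Python example below is only an illustration, not part of the handbook's procedure: the answer values are placeholders, and the helper name cell_reference is hypothetical. It builds a label row followed by one row per survey, and shows which cell each label lands in.

```python
import csv
import io

# Row 1 holds the labels; column A is reserved for the ID.
header = ["ID", "q1", "q2"]

# One row per survey. These answers are placeholders, not the handbook's data.
surveys = [
    [1, 3, 1],
    [2, 2, 4],
]

def cell_reference(column_index, row_index):
    """Return an A1-style reference for zero-based column and row positions."""
    # Only columns A through Z are handled in this sketch.
    return f"{chr(ord('A') + column_index)}{row_index + 1}"

worksheet = [header] + surveys

# Write the layout as CSV text, a format Excel can open directly.
buffer = io.StringIO()
csv.writer(buffer).writerows(worksheet)
csv_text = buffer.getvalue()

print(cell_reference(header.index("ID"), 0))  # A1
print(cell_reference(header.index("q2"), 0))  # C1
```

Saving csv_text to a file with a .csv extension would give a worksheet with the labels in row 1 and the first survey in row 2.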
Reminder: Cell References
Cells in an Excel spreadsheet are referred to by the intersection of the Column and Row in which they appear. In the example used for this handbook, the cell that contains the label ID is cell A1, because it is in the first column (A) and the first row (1). The cell that contains the answer to question #1 of the third survey entered is B4 (the 2nd column and the 4th row).

Entering Single-Response, Closed-Ended Questions

A closed-ended question means that the respondent chooses an answer by marking a box or circling a number from a given list of possible responses. A single-response question means that the respondent is to only choose one answer from the list. Question #1 (shown below) from the example survey represents a single-response, closed-ended question.
1) How many of the training sessions did you attend?
   [ ] 1 to 2    [ ] 3 to 4    [ ] 5    [ ] 6 or more

When entering and analyzing data, it is easiest to work with numbers. To do this, a number is assigned to each possible response option: 1 to 2 = 1, 3 to 4 = 2, 5 = 3, and 6 or more = 4.

Thus, since the respondent to the first survey said they attended 5 sessions, a 3 would be entered as the answer. The example to the right shows how the answers to question #1 would be entered for all six fictional surveys (from Appendix I).
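As a rough illustration of this coding step, the Python sketch below stores the numeric assignments for question #1 in a lookup table. The function name code_q1 and the choice to represent a skipped question as None are assumptions made for the example.

```python
# Numeric codes for question #1, as assigned in the text.
Q1_CODES = {"1 to 2": 1, "3 to 4": 2, "5": 3, "6 or more": 4}

def code_q1(marked_response):
    """Return the code to enter in the q1 column; None means leave the cell blank."""
    if marked_response is None:
        return None
    return Q1_CODES[marked_response]

# The respondent to the first survey attended 5 sessions, so a 3 is entered.
print(code_q1("5"))  # 3
```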

Entering Multiple-Response Questions

Question #2 from the fictional survey is a multiple-response question, meaning that respondents could give more than one answer to the question; in this example, they may have heard of the program from multiple sources.
2) How did you hear about this training? (Please check all that apply.)
   [ ] Neighborhood newsletter
   [ ] Bulletin boards in community buildings
   [ ] Flyers
   [ ] Your child's school
   [ ] Word of mouth
   [ ] Other

There are two ways the data could be entered from a question of this type. In the first method, a number is assigned to each response, similar to a single-response question. However, more than one column is assigned to the question. The number of columns assigned should be as many as the highest number of answers the analyst believes that the respondent may give; if necessary, assign as many columns as there are possible responses (in case a respondent checks every box). In the example at left, 3 columns were assigned to question #2, and the answers entered as shown.
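A minimal Python sketch of this first method follows. It assumes the responses are numbered 1 through 6 in the order they appear on the survey (the handbook's codebook is the authority on the actual numbers), and the function name q2_columns is hypothetical.

```python
# Assumed codes: responses numbered in the order they appear on the survey.
Q2_CODES = {
    "Neighborhood newsletter": 1,
    "Bulletin boards in community buildings": 2,
    "Flyers": 3,
    "Your child's school": 4,
    "Word of mouth": 5,
    "Other": 6,
}

MAX_ANSWERS = 3  # number of columns set aside for question #2

def q2_columns(checked):
    """Return the values for the question #2 columns; None marks a blank cell."""
    codes = sorted(Q2_CODES[choice] for choice in checked)
    if len(codes) > MAX_ANSWERS:
        raise ValueError("more answers than columns; assign another column")
    return codes + [None] * (MAX_ANSWERS - len(codes))

print(q2_columns(["Flyers", "Neighborhood newsletter"]))  # [1, 3, None]
```

Raising an error when a respondent checks more boxes than there are columns is one way to notice that more columns are needed.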

The second approach to multiple-response questions is to assign a column to each possible response. For the example question #2 (shown above), the following columns would be assigned:
q2a: Neighborhood newsletter
q2b: Bulletin boards in community buildings
q2c: Flyers
q2d: Your child's school
q2e: Word of mouth
If a response was marked, place a 1 in the assigned column. If no response was given, leave it blank, or place a 0 in the column. With this method, it is harder to know if a respondent skipped a question altogether. The analyst may wish to have a column before q2a where he/she marks whether or not the question was left blank (1=blank, 2=not blank). This will help in the analysis, when calculating the percent of respondents giving each answer. The example below shows how the data could be entered for question #2 using this approach.
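The second approach can be sketched in Python as follows. This is an illustration only: the function name q2_indicator_row is hypothetical, and unchecked boxes are entered here as 0 rather than left blank.

```python
# One column per possible response, in the order q2a through q2e.
Q2_OPTIONS = [
    "Neighborhood newsletter",
    "Bulletin boards in community buildings",
    "Flyers",
    "Your child's school",
    "Word of mouth",
]

def q2_indicator_row(checked):
    """Return [blank flag, q2a, q2b, q2c, q2d, q2e] for one survey."""
    # Flag column placed before q2a: 1 = question left blank, 2 = not blank.
    blank_flag = 1 if not checked else 2
    return [blank_flag] + [1 if option in checked else 0 for option in Q2_OPTIONS]

print(q2_indicator_row({"Flyers", "Word of mouth"}))  # [2, 0, 0, 1, 0, 1]
print(q2_indicator_row(set()))                        # [1, 0, 0, 0, 0, 0]
```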

Reminder: Freeze Panes
Freezing the panes allows the labels at the top of the worksheet and the IDs at the left of the worksheet to be always visible. To freeze the panes, put the cursor in the cell where the panes should break (usually B2). Then select Window from the menu bar, and then the option to Freeze Panes. This option works as a toggle; that is, if this option is selected again, the panes will unfreeze. (If the panes are frozen, the menu option will read Unfreeze Panes.) Using this option is quite helpful when there are many variables (columns) or cases (surveys, records of data in rows).


Entering Open-Ended Questions

An open-ended question is one in which respondents are invited to answer in their own words, rather than from a list of responses. Question #6 on the fictional survey represents an open-ended question.
6) Do you have any other comments you would like to make about this training? ________________________________________________________________________ ________________________________________________________________________

Depending on the type of open-ended question asked, the analyst may or may not wish to enter these responses into the dataset at the same time as the other questions are entered. These questions could be entered later into an appendix for a report, or they could be read and assigned codes; that is, like answers could be grouped into categories. Each category or code could be assigned a number, and these codes entered into the dataset in a manner similar to the examples shown above. For this fictional survey, the answers to Question #6 were deemed short enough to enter verbatim into the dataset, as shown in the example below:

However, the answers to Question #7 were considered appropriate for coding.


7) What is your race? ____________________

The answers were entered into the dataset as written in by respondents, as shown below, but then codes were assigned: 1=Latino/a; 2=Asian; 3=White/Caucasian.
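Assigning codes to written-in answers can be sketched as a lookup, as in the Python example below. The three codes come from the text; the spellings listed in VERBATIM_TO_CODE are hypothetical examples of what respondents might write, and any answer not in the list is left for the analyst to review by hand.

```python
# Codes as assigned in the text.
RACE_CODES = {1: "Latino/a", 2: "Asian", 3: "White/Caucasian"}

# Hypothetical written-in answers (lower-cased) and the code each would receive.
VERBATIM_TO_CODE = {
    "latino": 1,
    "latina": 1,
    "asian": 2,
    "white": 3,
    "caucasian": 3,
}

def code_race(written_answer):
    """Return the numeric code, or None when the answer needs manual review."""
    key = written_answer.strip().lower()
    return VERBATIM_TO_CODE.get(key)

print(code_race(" Latina "))   # 1
print(code_race("Caucasian"))  # 3
```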


Creating a Codebook

The examples above showed how the data entry would occur for each type of question. Generally, the analyst will want to set up the data entry spreadsheet before beginning the data entry. By knowing how to enter each type of question, the analyst can determine which questions will be entered into each column, being sure to reserve the first column for the IDs. Appendix II shows the codebook for the fictional survey being used as an example in this handbook. The ID is in column A (shown with a circle around it), question #1 is in column B, question #2, using the first version of multiple-response data entry, is in columns C through E, while question #2 using the second version of multiple-response data entry is in columns F through K (in this example, the others were ignored), the three parts of question #3 are in columns L, M and N, and so on. This codebook also shows the numeric equivalents assigned to each question response. It is a good idea to hang on to this codebook. It will serve as a customized guide in data entry, and in the analysis of the data once the dataset has been created. The example below shows the entered data for the surveys shown in Appendix I.

(Note: the columns for the open-ended questions were shrunk to allow all the columns to show.)


Reminder: Wrapping Text
Sometimes the text entered into a cell is too long to display in its entirety. To turn on text wrapping (the text will automatically move to the next line if it runs out of room), highlight the cells to be formatted, then choose Format from the menu bar, and then Cells.

Click on the Alignment tab, and check the box labeled Wrap text. Click the OK button to apply the formatting. The text should be wrapped in the cell. Note that wrapping text will change the height of the rows.


Analyzing the Data


Now that the data collected by the program has been entered into an electronic dataset, the analyst is ready to start analyzing the information to get answers to the questions posed. This next section will demonstrate how to use formulas and functions within Excel to produce the statistics or summaries of the information needed.
Reminder: Formulas
Formulas are used to perform calculations within a spreadsheet. To insert a formula, as opposed to a number or text, type an equals sign (=) in the cell where the calculation is to be performed, and then type in the rest of the formula. A formula can perform mathematical calculations or execute a wide variety of functions (see below for more on functions). To add or subtract, use the plus (+) or minus (-) symbol. To multiply, use an asterisk (*) and to divide use a slash (/). Use parentheses as necessary to indicate the desired order of operations. For example, if the analyst wanted to know how many seconds there were in 3 hours, he or she could type in the formula: =3*60*60. The result displayed in the cell would be 10,800.

There might have been a cell somewhere on the page that had a value of 3 to indicate three hours; for the sake of an example, this cell is T21. To know how many seconds that represented, use the same formula as above, but exchange the 3 for the cell reference: =T21*60*60. If the number of hours in cell T21 changed, the result of the formula would also change.

Reminder: Functions and Referring to a Range of Cells
Functions can be used within formulas to perform special calculations or manipulations. There are a large number and variety of functions that can be used in Excel. Some of the functions are mathematical, some are logical, some are statistical, and others serve yet more purposes. All functions begin in a similar fashion: the equals sign (=), the function name, immediately followed by an open parenthesis, the references on which the function should operate, each separated by a comma (a different number of references is needed for each function), and a close parenthesis. For example, the SUM function can be used to add the values of several cells. Some functions will refer to a range of cells.

For example, if an analyst wanted to total the number of youth served in the table below, a formula could be used like that found in cell B5: =B2+B3+B4. Alternatively, the SUM function could be used with a reference to the range of cells to be summed, like this: =SUM(B2:B4). The colon indicates that a range of cells is being referred to, starting with (and including) the cell to the left of the colon, and ending with (and including) the cell to the right of the colon. The function SUM indicates what is to be done with this range of cells: total all the values together.
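The two ideas above, a formula that refers to a cell and a function that operates on a range, can be mirrored in a few lines of Python. This is a sketch for illustration: the cell values for B2 through B4 are placeholders rather than the figures in the handbook's table, and sum_range is a hypothetical helper.

```python
SECONDS_PER_HOUR = 60 * 60

# The value 3 stands in for the contents of cell T21.
hours = 3
seconds = hours * SECONDS_PER_HOUR  # same calculation as =T21*60*60
print(seconds)  # 10800

# Placeholder values for cells B2 through B4 (not the handbook's figures).
cells = {"B2": 10, "B3": 20, "B4": 30}

def sum_range(cells, column, first_row, last_row):
    """Mimic =SUM(B2:B4): both ends of the range are included."""
    return sum(cells[f"{column}{row}"] for row in range(first_row, last_row + 1))

print(sum_range(cells, "B", 2, 4))               # 60
print(cells["B2"] + cells["B3"] + cells["B4"])   # 60, same as =B2+B3+B4
```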

Calculating an Average

Calculating the average of a range of cells is a fairly simple procedure within Excel, and appropriate for certain types of data. For example, in the fictional survey for our training program, one of the questions asks respondents to report their annual household income. The average annual income of participants could be calculated and reported.

The function AVERAGE would be used to make this calculation. As shown in the table below, to create this formula an equals sign (=) is first typed, followed by the function, with the range of cells following the function in parentheses; for example, =AVERAGE(AH2:AH7).
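For readers who want to check the arithmetic by hand, the Python sketch below computes an average the way AVERAGE does, skipping blank cells. The income figures are invented placeholders, not the values from the fictional surveys, and None stands in for a blank cell.

```python
# Invented incomes for six surveys; None marks a respondent who left it blank.
incomes = [20000, 35000, None, 50000, 45000, 30000]

# Like AVERAGE, MIN and MAX in Excel, ignore the blank cell.
answered = [value for value in incomes if value is not None]

average_income = sum(answered) / len(answered)
lowest_income = min(answered)
highest_income = max(answered)

print(average_income)  # 36000.0
print(lowest_income)   # 20000
print(highest_income)  # 50000
```

Note that the average is taken over the five respondents who answered, not over all six surveys.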

Reminder: Formatting cells
In many of the spreadsheet examples shown in this handbook, some of the cells are formatted as numbers, and some are formatted as percents. You will want to format the cells appropriately. To format a cell or group of cells, highlight the cells you wish to format, then choose Format from the menu bar, and then Cells. A dialogue box will open, with a number of formatting options. You can format the alignment of the cell contents, the cell shading or border, or the Number. If you choose the Number tab, you will be presented with a list of types of number formats, such as currency, percentage, etc. Choose the type, and then decide how many decimals you want. The highlighted cells will be formatted according to the specifications you choose.


Creating a Frequency Distribution for a Single-Response Question

Creating a frequency distribution, or a count and/or proportion of respondents giving each response to a question, is an intuitively easy process. However, doing it within Excel for a large number of cases is actually a multi-step procedure. The first step is to count how many respondents gave each response. There is a function within Excel that will help automate this step: COUNTIF. To use this function, specify two items:
- What range of cells contains the answers to the question of interest, and
- Which particular answer should be counted (the criterion).
The function is set up as: =COUNTIF(range of cells, criterion).

To know how many people attended the training program just one or two times, the analyst would want to count how many times 1 (the numeric assignment for question #1 to the response 1 to 2) was entered as the answer to question #1. The data for question #1 are in column B, and specifically in rows 2 through 7. The formula to enter to find out how many respondents said they attended one or two sessions would be:

=COUNTIF(B2:B7,1)

The results can be seen in the table below in cell B13. The formula is shown to the right in cell C13.

To get a count of the number of responses to each of the other possible answers, use the same formula, but change the criterion each time. (See the formulas in cells C14, C15, and C16.) In this example, no participants attended 1 to 2 sessions, three participants attended 3 to 4 sessions, two participants attended 5 sessions, and one participant attended 6 or more sessions.
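The counting step can be sketched in Python with a small stand-in for COUNTIF. In the list below, the first value (3) comes from the text; the order of the remaining values is an assumption chosen only to reproduce the counts reported above, and the function name countif is hypothetical.

```python
# Codes entered for question #1 in rows 2 through 7 (order partly assumed).
q1_answers = [3, 2, 2, 4, 3, 2]

def countif(values, criterion):
    """Count how many entries equal the criterion, like =COUNTIF(range, criterion)."""
    return sum(1 for value in values if value == criterion)

# One count per possible code: 1 = 1 to 2, 2 = 3 to 4, 3 = 5, 4 = 6 or more.
counts = {code: countif(q1_answers, code) for code in (1, 2, 3, 4)}
print(counts)  # {1: 0, 2: 3, 3: 2, 4: 1}
```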


To know the proportion (percent) of respondents attending 6 or more times, the analyst would want to divide the number who gave that answer by the total number of those who answered the question. The SUM function can be used to total the number of respondents who answered that question. In the example above, the formula would be: =SUM(B13:B16). In the table below, that formula was entered into cell B11. To determine the proportion of people giving that answer, the contents of cell B16 would need to be divided by cell B11. As shown below, those results are displayed in cell B22. The formulas for calculating the proportion giving each answer to question #1 are also shown.
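The division step can be checked with the short Python sketch below, which starts from the counts reported for question #1. The variable names are illustrative, and the guard against a zero total is an addition for safety rather than something the handbook describes.

```python
# Counts for question #1 as reported in the text.
counts = {"1 to 2": 0, "3 to 4": 3, "5": 2, "6 or more": 1}

# Total number who answered, like =SUM(B13:B16).
total_answered = sum(counts.values())

# Each count divided by the total; avoid dividing by zero if nobody answered.
proportions = {
    label: (count / total_answered if total_answered else 0.0)
    for label, count in counts.items()
}

for label, share in proportions.items():
    print(f"{label}: {share:.0%}")
# 1 to 2: 0%
# 3 to 4: 50%
# 5: 33%
# 6 or more: 17%
```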


Reminder: Absolute versus relative cell references
In a formula, a cell reference can be made in a relative or an absolute manner. For example, looking at the table below, if the analyst wanted to calculate a percent, he or she might create a formula in cell C2 which would display the proportion of youth served who are 12-14 years old. That formula would be: =B2/B5, which would divide the value of B2 (12) by the value of B5 (112).

The analyst may then wish to also calculate the proportion of youth served who are 15-17 years old. If the contents of cell C2 were copied to cell C3, the formula would look like this: =B3/B6. This is because in Excel the cell references in this formula are relative references; that is, Excel has assumed that because in cell C2 the calculated number was derived by dividing the number in the same row and one column to the left by the number three rows below and one column to the left, the same thing should happen in the cell to which the formula is copied. However, cell B6 is blank, so the formula in cell C3 would divide by zero and display an error (#DIV/0!) instead of a proportion. This can be fixed by changing the formula after it has been copied, so that the denominator refers to B5. But, if the formula is then copied to cell C4, the denominator would again have to be manually changed in the formula to refer to the correct cell that contains the total number of youth served. If this manual change was not made, the formulas in column C would look like the formulas in column D in the table below.

If, however, an absolute reference was used to refer to the row that contains the total number of youth served (=B2/B$5), when the formula was copied, the denominator would always refer to row 5. The dollar sign ($) is used to indicate an absolute reference. In this example, it is only used for the row designation, not for the column designation. It can be used for both the row and column designation, or only one or the other. Excel defaults to assuming that all cell references are relative, unless the change is made manually. Knowing how to use relative and absolute references can greatly speed up creation of spreadsheets in Excel.
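The effect of the dollar sign when a formula is copied down a column can be imitated with the Python sketch below. It is a simplified model written for this illustration, not Excel's actual implementation: the function name copy_down is hypothetical, it handles only plain cell references such as B2 or B$5, and it shifts rows only, not columns.

```python
import re

# Optional $, column letters, optional $, row digits (for example B2, B$5, $B$5).
REFERENCE = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def copy_down(formula, row_offset):
    """Rewrite a formula as it would read after being copied row_offset rows down."""
    def shift(match):
        column_lock, column, row_lock, row = match.groups()
        if not row_lock:
            # Relative row reference: it moves along with the copied formula.
            row = str(int(row) + row_offset)
        # A row preceded by $ is absolute and stays where it is.
        return f"{column_lock}{column}{row_lock}{row}"
    return REFERENCE.sub(shift, formula)

print(copy_down("=B2/B5", 1))   # =B3/B6  (denominator drifts to the blank cell)
print(copy_down("=B2/B$5", 1))  # =B3/B$5 (denominator stays on row 5)
```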


Creating a Frequency Distribution for a Multiple-Response Question

The approach used to calculate the results of a multiple-response question depends upon the approach used to enter the data. If the data have been entered using the first approach described, where a numeric assignment is made for each possible response, but more than one column is designated for entry of the results (as in columns D, E and F in the table below), then the counts and proportions can be calculated in a manner quite similar to that of a single-response question. The change would be in the definition of the range of cells to include in the count. Instead of covering only one column, it would cover multiple columns. In this example, the number of people who said they heard of the program through the neighborhood newsletter would be determined using the formula:

=COUNTIF(D2:F7,1)

Calculating the percent of respondents who heard of the program through the neighborhood newsletter would also be changed slightly. Instead of dividing the number of respondents giving a specific answer by the sum of the cells F13 through F17 (which would be the total number of responses, not respondents answering the question), the denominator is the total number of respondents answering the question. To determine this, the number of valid answers entered in column D would need to be counted. This can be done using the COUNT function. This formula is not shown in the table below, but would be entered in cell D11 as follows:

=COUNT(D2:D7)

This function counts the cells in the specified range that contain a number, which here means the non-blank answers. In this case, every respondent gave at least one answer, so the total is 6, the same as the number of returned surveys. This same formula (with the correct cell range specification) was used in cells E11 and F11. The numbers displayed there designate the number of people who gave two or more answers (4 people, see cell E11) or three answers (1 person, see cell F11).
It should be noted when reporting the percentages to a multiple response question that the percents will add to more than 100%, as respondents can give more than one answer.
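A Python sketch of this calculation follows. The individual answers are invented for illustration (with 1 assumed to be the code for the neighborhood newsletter, as in the COUNTIF formula above); only the number of filled cells per column (6, 4 and 1) follows the text. The helper names are hypothetical.

```python
# Three columns for question #2; None marks a blank cell. Values are invented.
q2_rows = [
    [1, 3, None],
    [2, None, None],
    [1, 4, 5],
    [5, None, None],
    [1, 5, None],
    [3, 4, None],
]

def countif_range(rows, criterion):
    """Count matching cells across all columns, like a COUNTIF over several columns."""
    return sum(1 for row in rows for value in row if value == criterion)

def count_column(rows, index):
    """Count non-blank cells in one column, like COUNT over that column."""
    return sum(1 for row in rows if row[index] is not None)

newsletter_count = countif_range(q2_rows, 1)
respondents = count_column(q2_rows, 0)  # first column: everyone who answered

print(newsletter_count)                         # 3
print(respondents)                              # 6
print(f"{newsletter_count / respondents:.0%}")  # 50%
```

Dividing by the count of the first column, rather than by the number of filled cells in all three columns, keeps the denominator equal to the number of respondents.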


If the answers to question #2 were entered as shown in columns H through M, where each possible answer was assigned to a column, and a 1 was used to designate when a box was checked, then a slightly different approach is needed to create the frequency distribution. First, to get the total number of respondents who gave an answer, column H needs to be appropriately analyzed. In this instance, a 1 was entered if a respondent gave no answer to the question, and a 2 was entered if a respondent gave at least one answer. The formula in cell H11 (not shown in the table below) was =COUNTIF($H$2:$H$7,2), to count the number of valid answers to question #2. This formula was copied to cells I11, J11, K11, L11 and M11.

To determine the number of people who indicated each potential source of familiarity with the training, the number of 1 responses in each column was counted, using the COUNTIF function. The formula for cell M13 (the number of respondents indicating they heard of the program by word of mouth) is shown in cell N13. A similar formula was used for each of the other responses.

Next, to determine the proportion of respondents each of those counts represented, the counts were divided by the number of valid responses to question #2. As shown in cell M19, 33% of respondents reported they had heard of the training by word of mouth. The formula used to make that calculation is shown in cell N19. A similar formula was used for each of the other responses.

Again, it should be noted when reporting the percentages to a multiple response question that the percents will add to more than 100%, as respondents can give more than one answer.
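The same steps can be traced in the Python sketch below. The rows are invented for illustration; they were chosen so that all six respondents gave at least one answer and two of them marked word of mouth, which reproduces the 33% reported in the text. The column order (flag, then q2a through q2e) follows the data entry approach described earlier.

```python
# Each row: [blank flag, q2a, q2b, q2c, q2d, q2e]; flag 1 = blank, 2 = not blank.
q2_rows = [
    [2, 1, 0, 1, 0, 0],
    [2, 0, 1, 0, 0, 0],
    [2, 1, 0, 0, 1, 1],
    [2, 0, 0, 0, 0, 1],
    [2, 1, 0, 0, 0, 0],
    [2, 0, 0, 1, 1, 0],
]

FLAG, WORD_OF_MOUTH = 0, 5  # column positions within each row

# Respondents with a valid answer, like =COUNTIF($H$2:$H$7,2).
valid_respondents = sum(1 for row in q2_rows if row[FLAG] == 2)

# Respondents who marked word of mouth: count the 1s in that column.
word_of_mouth = sum(1 for row in q2_rows if row[WORD_OF_MOUTH] == 1)

print(f"{word_of_mouth / valid_respondents:.0%}")  # 33%

# Shares for all five responses add to more than 100%.
shares = [
    sum(1 for row in q2_rows if row[column] == 1) / valid_respondents
    for column in range(1, 6)
]
print(sum(shares) > 1)  # True
```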

PivotTables cannot be used to calculate the frequency distribution of multiple response questions.


Reminder: Functions Revisited
SUM is only one of a large number of functions available in Excel. Some of the functions are mathematical, some are logical, some are statistical, and others serve yet more purposes. All functions begin in a similar fashion: the function, immediately followed by an open parenthesis, the references on which the function should operate each separated by a comma (a different number of references are needed for each function), and a close parenthesis. The functions needed for simple descriptive analyses in Excel are shown below.

Functions and formulas used for simple descriptive analyses in Excel

The table on the next page displays the functions used to perform the analyses described in this handbook. The examples all refer to the spreadsheet and examples shown in Appendix III.


Functions and formulas used for simple descriptive analyses in Excel

ROWS
    Calculate: the number of surveys completed
    By: counting the number of rows of data entered (regardless of whether some cells/rows are blank)
    Arguments: range of cells for which the number of rows should be counted
    Example: =ROWS(B2:B7)
    Value displayed: 6
    What it means: 6 surveys were returned

AVERAGE
    Calculate: the average rating or answer of those who responded
    By: calculating the average of the ratings or answers given by those who gave an answer
    Arguments: range of cells containing the values to be averaged
    Example: =AVERAGE(AH2:AH7)
    Value displayed: $29,000
    What it means: the average annual income as reported for question #10

MIN
    Calculate: the lowest number given as an answer
    By: examining the values in a range of cells, and finding the lowest value
    Arguments: range of cells containing the values to be examined
    Example: =MIN(AH2:AH7)
    Value displayed: $15,000
    What it means: the lowest annual income as reported for question #10

MAX
    Calculate: the highest number given as an answer
    By: examining the values in a range of cells, and finding the highest value
    Arguments: range of cells containing the values to be examined
    Example: =MAX(AH2:AH7)
    Value displayed: $57,000
    What it means: the highest annual income as reported for question #10

COUNTIF
    Calculate: the number of respondents who gave a specific answer*
    By: counting the number of responses of a certain type within a range of cells
    Arguments: 1) the range of cells to be examined; 2) the value to be counted
    Example: =COUNTIF(B$2:B$7,3)
    Value displayed: 2
    What it means: 2 people gave an answer of 5 times to question #1

SUM
    Calculate: the total number of respondents who answered the question**
    By: adding the number of people who gave a valid answer to a question
    Arguments: range of cells to be totaled
    Example: =SUM(B13:B16)
    Value displayed: 6
    What it means: 6 people answered question #1

COUNT
    Calculate: the total number of respondents who answered the question
    By: counting the number of nonblank answers
    Arguments: range of cells to be examined
    Example: =COUNT(E2:E7)
    Value displayed: 4
    What it means: 4 people gave two or more answers to question #2 (as column E contains the second answer people gave to question #2)

(division): [cell reference1]/[cell reference2]
    Calculate: the proportion (percent) of respondents who gave a specific answer
    By: dividing the number of people who gave a specific answer by the total number of people who answered the question
    Arguments: cell reference1 is the cell reference of the numerator; cell reference2 is the cell reference of the denominator
    Example: =B15/B$11
    Value displayed: 33%
    What it means: 33% of respondents gave an answer of 5 times to question #1

* This is used for each row or part of a frequency distribution.
** Or the sum of any list of numbers.

Presenting the Results: One Quick Idea


Once the frequency distributions of the data set have been produced, how will the analyst and other program staff share this information with others? The Excel spreadsheet is not very pretty. One idea is to create an annotated instrument; that is, to type the results into a blank questionnaire.[1] Most evaluation forms or surveys have been created using word processing software such as Word or WordPerfect, and thus are well-suited to this approach. A new file should be created from the electronic version of the survey. The check boxes can then be replaced with the proportion of respondents giving each answer. For example:
1) How many of the training sessions did you attend?
   0%   1 to 2
   50%  3 to 4
   33%  5
   17%  6 or more
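When there are many questions, the annotated lines could be generated rather than typed. The Python sketch below is one possible way to do this, offered as an illustration only; the proportions are those reported for question #1, and the layout of the output lines is an assumption.

```python
question = "1) How many of the training sessions did you attend?"

# Response labels paired with the proportion of respondents giving each answer.
results = [("1 to 2", 0 / 6), ("3 to 4", 3 / 6), ("5", 2 / 6), ("6 or more", 1 / 6)]

# Put the percent where the check box used to be, right-aligned for readability.
annotated_lines = [question] + [
    f"  {share:>4.0%}  {label}" for label, share in results
]

print("\n".join(annotated_lines))
```

The resulting text can be pasted into the word processing file in place of the original question.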

Staff can write a cover memo or report to accompany the annotated instrument that explains the methods used to obtain the data and interprets the results. An example copy of an annotated instrument can be found in Appendix IV.

[1] The term annotated instrument is one created by and used by staff at National Research Center, Inc. It is NOT a commonly used evaluation term, but one that we think is descriptive.

Using Pivot Tables for Basic and Advanced Analyses


Pivot tables are an analytic tool at the disposal of the Excel user. They take a bit of time to set up, but can be very powerful. Pivot tables can be used as an alternate way to create frequency distributions, although they cannot be used for multiple response questions. They can also be used to create crosstabulations of data. For example, the analyst might wish to know whether males and females have a different response to a training, or whether younger respondents feel more positively about staff than older respondents. A useful first step before creating a pivot table is to name the range of cells that will be used for the analyses. This range of cells should include the first row with the variable names.
Reminder: Naming a Range of Cells
To name a range of cells, highlight all the columns and rows that make up the database. Choose Insert from the menu bar, select Name, and then select Define.

In general, when the named range of cells will be used for creating pivot tables, it is a good idea to name the range Database. This is the default name used by Excel in the pivot table wizard. The Define Name dialogue box above shows that the name Database has been typed in. The field labeled Refers to: shows that Database will refer to the cells starting at A1 and going to W7 in the worksheet labeled Data Entry. These are the cells that contain the data entered for the fictional survey.
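
One quick check on a named range is that it covers one header row plus one row per completed survey. The sketch below, in Python rather than Excel, works out how many rows and columns a simple reference such as A1:W7 spans. The helper names column_number and range_shape are hypothetical and are not Excel functions.

```python
import re

def column_number(letters):
    """Convert Excel column letters (A, B, ..., Z, AA, ...) to a 1-based number."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def range_shape(ref):
    """Return (rows, columns) covered by a simple reference such as 'A1:W7'."""
    match = re.fullmatch(r"\$?([A-Za-z]+)\$?(\d+):\$?([A-Za-z]+)\$?(\d+)", ref)
    if match is None:
        raise ValueError(f"not a simple cell range: {ref!r}")
    col1, row1, col2, row2 = match.groups()
    rows = int(row2) - int(row1) + 1
    cols = column_number(col2) - column_number(col1) + 1
    return rows, cols

rows, cols = range_shape("A1:W7")
print(rows, cols)  # 7 rows (1 header row + 6 surveys) and 23 columns
```

If the row count does not equal the number of surveys plus one, the range was probably highlighted incorrectly and should be redefined before any pivot tables are built.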

Once a range of cells has been defined, pivot tables can be created from those data. It is easiest to create the pivot tables on another worksheet within the workbook.


Reminder: Worksheets within a Workbook (or Spreadsheet)
An Excel file is often referred to as a spreadsheet. More precisely, the file is a workbook made up of a group of worksheets. By default, a new workbook in Excel usually contains three worksheets, labeled Sheet1, Sheet2, and Sheet3. The note below was entered in cell B7 on Sheet2. To see a different worksheet, simply click on the tab of the worksheet to be viewed. To rename a worksheet, double-click its tab and type a new name. Worksheet names are limited to 31 characters.


Creating a PivotTable (Basic Analyses)

Before setting up the pivot table, the analyst should place the cursor in the cell where the pivot table is to be generated. To set up a pivot table, go to the Data menu, then select PivotTable and PivotChart Report... The PivotTable and PivotChart Wizard will walk the analyst through the rest of the setup. In the example below, the pivot table will be placed in cell B4.

The PivotTable and PivotChart Wizard
Once PivotTable and PivotChart Report... has been selected from the Data menu, the PivotTable and PivotChart Wizard will start, displaying a series of dialogue boxes. The first dialogue box is shown below as Step 1 of 3. (Note: Different versions of Excel will have slightly different PivotTable and PivotChart Wizard dialogue boxes, but the steps to follow are the same or similar.)

Step 1: Two questions are asked in Step 1 of the Wizard. For the most part, the analyst will select the Wizard's default options. In answer to the first question, the data to be analyzed is an Excel list or database. In answer to the second question, a PivotTable will be created. (Note: PivotCharts are not discussed in this handbook, but the analyst may wish to try this option.) Click Next to continue to the next step of the Wizard.

Step 2: In Step 2, the Wizard asks for the location of the data to be used in the PivotTable. The name Database is automatically inserted as the answer. If another named range is desired, it can be typed into the field. If the range of cells to be used has not been named, it can be selected by clicking on the Browse button. Click Next to continue to the next step of the Wizard.

Step 3: In Step 3, the Wizard asks where the PivotTable should be placed. The default is the location of the cursor when the Wizard was started. At this point, the analyst will choose the data to be displayed in the PivotTable by clicking on the Layout button. When this button is clicked, another dialogue box is displayed.

Layout: The Layout dialogue box displays all the variables or fields available for display in the PivotTable. These fields are shown as a series of buttons in the right half of the dialogue box. If there are a large number of fields, the scroll button below the fields can be used to show additional field buttons. In the left half, a blank template is shown. To select a field for display, simply drag it from the right into one of the areas on the left.

To create a pivot table that displays the frequency of training attendance, the button q1 (How many of the training sessions did you attend?) would be dragged into the row area, so that the values in q1 will be listed vertically as rows. A field is also needed for the data section. It does not really matter which button is dragged into the data section, as it will be used simply as a counter. However, it should be a field that has no missing data; the ID field is ideal for this situation. As shown above, the ID field was dragged into the data area. Usually by default the field in the data area will be shown as a Count. If a different summary is desired, double-click the button, and a dialogue box displaying various options will be displayed.
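
Conceptually, this layout groups the records by the value of q1 and counts the non-blank ID values in each group. The Python sketch below illustrates that logic only; it is not what Excel runs internally. The records are invented for illustration and are not the handbook's data set, the q1 codes other than 1 are assumed, and the function name count_by is hypothetical.

```python
from collections import Counter

# Hypothetical records, not the handbook's actual data set.
# Assumed q1 codes: 2 = "3 to 4", 3 = "5", 4 = "6 or more".
records = [
    {"ID": 1, "q1": 2},
    {"ID": 2, "q1": 3},
    {"ID": 3, "q1": 2},
    {"ID": 4, "q1": 4},
    {"ID": 5, "q1": 3},
    {"ID": 6, "q1": 2},
]

def count_by(records, row_field, data_field):
    """Count records per row_field value, skipping blank data_field values."""
    counts = Counter()
    for rec in records:
        if rec.get(data_field) is not None:
            counts[rec[row_field]] += 1
    return dict(sorted(counts.items()))

print(count_by(records, "q1", "ID"))
```

Because a record with a blank counter field is skipped, a counter field with missing data would undercount respondents. That is the reason for choosing a field such as ID that is filled in on every row.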


PivotTable Field: The Field dialogue box shown to the left is displayed if a button in the data portion of the template is double-clicked. In this example, the data summary chosen is Count. In addition, if the Options>> button is clicked, more options for the display of the data are shown.

In this instance, it would be appropriate to display the information as a proportion, so the option of showing the data as: % of column was selected.
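
The % of column option divides each count by the total of its column. A minimal Python sketch of that calculation follows. The counts are assumed values for six respondents, and the function name percent_of_column is hypothetical.

```python
def percent_of_column(counts):
    """Convert {value: count} into {value: percent of the column total}."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when there are no responses at all.
        return {value: 0.0 for value in counts}
    return {value: 100 * n / total for value, n in counts.items()}

# Assumed counts for illustration (six respondents).
counts = {2: 3, 3: 2, 4: 1}
for value, pct in percent_of_column(counts).items():
    print(value, f"{pct:.0f}%")
```

The unrounded percentages are kept in the result and rounded only when displayed, which mirrors choosing a number format for the cells rather than changing the underlying values.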

Format Cells: To choose a number format for the data display, click on the Number button in the PivotTable Field dialogue box. A Format Cells dialogue box will be displayed, from which an appropriate number format can be selected.


After this, click the OK buttons until the Step 3 dialogue box is showing again. At this point, if the Finish button is clicked, the PivotTable will be displayed. In this example, the PivotTable will appear as shown to the right. Note that when using the PivotTable method for this question, the value 1 (1 to 2 sessions) is not listed because no one selected this response in the survey.
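
Because the PivotTable lists only values that occur in the data, an option that nobody selected has to be added back by hand when the results are reported. The Python sketch below shows one way to do this by looping over a codebook instead of over the data. Only code 1 (1 to 2 sessions) is stated in the text above; the other codes, the responses, and the function name full_distribution are assumptions made for illustration.

```python
from collections import Counter

# Assumed codebook: only code 1 = "1 to 2" is stated in the text;
# the remaining codes and the responses below are illustrative.
codebook = {1: "1 to 2", 2: "3 to 4", 3: "5", 4: "6 or more"}
responses = [2, 3, 2, 4, 3, 2]

def full_distribution(responses, codebook):
    """Count every coded option, including options nobody selected."""
    counts = Counter(responses)
    return {label: counts.get(code, 0) for code, label in codebook.items()}

print(full_distribution(responses, codebook))
```

The unselected option appears with a count of zero, so it can be reported as 0% alongside the other answers.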

Crosstabulation of Data Using PivotTables (Advanced Analyses)

Sometimes it is useful to analyze the data based on certain respondent characteristics; for example, satisfaction ratings by gender or program attended. One of the easiest ways to generate a table like this is through the use of a PivotTable. The example to the right shows the PivotTable layout and resulting table for a crosstabulation of the results to question #5, "How would you rate the overall quality of this training?", by the gender of the respondent. (Of course, crosstabulations are recommended with larger datasets than the one created for these examples, with a sufficient number of cases within each subgroup examined.)

This PivotTable Layout: (Q9, gender, is placed in the column area, while q5, quality rating, is placed in the row area. ID is again used for the data section.)

produces:

Females (1) gave more positive answers than did males (2).
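
A crosstabulation is a count of records for every combination of a row value and a column value. The Python sketch below illustrates that idea with q5 as the row field and Q9 as the column field. The records are hypothetical and are not the handbook's data; the codes for Q9 follow the labels used above, and the function name crosstab is an assumption.

```python
from collections import defaultdict

# Hypothetical records for illustration only.
# q5: 1 = poor ... 4 = excellent; Q9: 1 = female, 2 = male.
records = [
    {"ID": 1, "q5": 4, "Q9": 1},
    {"ID": 2, "q5": 3, "Q9": 2},
    {"ID": 3, "q5": 4, "Q9": 1},
    {"ID": 4, "q5": 1, "Q9": 2},
    {"ID": 5, "q5": 4, "Q9": 1},
    {"ID": 6, "q5": 3, "Q9": 2},
]

def crosstab(records, row_field, col_field):
    """Return {row value: {column value: count}} for two fields."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[row_field]][rec[col_field]] += 1
    return {row: dict(cols) for row, cols in sorted(table.items())}

for rating, by_gender in crosstab(records, "q5", "Q9").items():
    print(rating, by_gender)
```

With only a handful of records per subgroup, as here, the cell counts are too small to support conclusions, which is why larger datasets are recommended for crosstabulations.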


The analysis in the previous example could also be performed using the average quality rating, on a scale from 1 to 4, where 4 = excellent and 1 = poor.

This PivotTable Layout: (Q9, gender, is placed in the column area, while q5, quality rating, is placed in the data area. The type of data summary was changed to Average, and the Number formatting was changed to a number with two decimal places.)

produces:

Again, this shows that females (1) gave higher quality ratings than did males (2).
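
Changing the data summary to Average means that, for each column group, the ratings are added up and divided by the number of ratings. The Python sketch below illustrates this with hypothetical gender and rating pairs, which are not the handbook's data. The function name average_by_group is an assumption.

```python
from statistics import mean

# Hypothetical (gender code, quality rating) pairs for illustration only.
# Gender: 1 = female, 2 = male; rating: 1 = poor ... 4 = excellent.
pairs = [(1, 4), (2, 3), (1, 4), (2, 1), (1, 4), (2, 3)]

def average_by_group(pairs):
    """Return {group: mean rating rounded to two decimal places}."""
    groups = {}
    for group, rating in pairs:
        groups.setdefault(group, []).append(rating)
    return {g: round(mean(r), 2) for g, r in sorted(groups.items())}

print(average_by_group(pairs))
```

An average condenses each group to a single number, which is easier to compare than a full crosstabulation but hides how the individual ratings are spread.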


APPENDIX I: Example Completed Surveys for Data Entry


The following pages show the completed surveys from six participants in a fictional training program. These were used for all the examples in this handbook.


APPENDIX II: Example Codebook


APPENDIX III: Example Analysis, with Formulas


APPENDIX IV: Example of an Annotated Instrument


The next page shows an example of an annotated instrument for the training program using the data examples as included in the previous appendices.


Training Evaluation: Annotated Instrument


1) How many of the training sessions did you attend?
   0% 1 to 2
   50% 3 to 4
   33% 5
   17% 6 or more

2) How did you hear about this training? (Please check all that apply.)
   33% Neighborhood newsletter
   50% Your child's school
   17% Bulletin boards in community buildings
   33% Word of mouth
   50% Flyers
   0% Other

3) Please rate the following aspects of the training:
                                            Very Poor   Poor    Good    Very Good
The instructor's knowledge of the topic     0%          17%     67%     17%
The instructor's presentation style/skills  0%          25%     50%     25%
The handouts or take-home materials         20%         0%      80%     0%

4) Rate the extent to which you agree or disagree with each of the following statements.
                                                                   Strongly Disagree   Disagree   Agree   Strongly Agree
I would strongly recommend this training for my friend             0%                  20%        60%     20%
This training will help improve the quality of life for my family  0%                  17%        50%     33%

5) How would you rate the overall quality of this training?
   Poor   Fair   Good   Excellent
   17%    0%     33%    50%

6) Do you have any other comments you would like to make about this training?
   I think we spent too much time reviewing the background information. I had a lot of fun. I thought Angela was great. This was great! I will definitely apply what I learned at work and at home!

7) What is your race?
   50% Latino/a
   17% Asian
   33% White

8) How long have you lived in Colorado?
   17% 6 years
   50% 7 years
   33% 8 years

9) What is your gender?
   50% Female
   50% Male

10) What is your annual household income? (average annual income = $29,000)
   33% less than $20,000
   33% $20,000 to $29,999
   17% $30,000 to $39,999
   17% $40,000 or more

11) Is your child enrolled in the free lunch program?
   50% Yes
   50% No

Thank you for your answers!

