Using Advanced Excel Techniques to Manage a Large Data Collection (Office 2010)
Application Support & Training
Office of Information Technology, West Virginia University
OIT Help Desk – (304) 293-4444
Instructor: Roman Olynyk
Last revised: March 22, 2012
Table of Contents
Course Description
    What’s Special about Tables?
    Excel or Access?
Creating a Table
    Activity 1 – Creating a Simple Table
Anatomy of an Excel Table
    Table Features
    Structured References
        Table Name
        Column Name
        Special Item Specifiers
    Activity 2 – Modifying and formatting a simple table
    Activity 3 – Working with structured references
    Navigating the Table
    Activity 4 – Working with a Large Data Range
Filtering
    Specifying Criteria – Advanced Filter Techniques
    Creating the Advanced Filter
        Criteria Range
    Activity 5 – Advanced Filter, in place
    Activity 6 – Advanced Filter to another location
    Activity 7 – Advanced Filter, Unique records only
    Advanced Filter – A look behind the Scenes
Data Validation in a Table
    Activity 8 – Create a validation rule and copy it to other cells
    Activity 9 – A table as a validation list
Database Functions
    Syntax of the database function
    Table of Database Functions Featured in Excel 2010
    Activity 10 – Using the “D” Functions
Lookup Functions
    The INDEX and MATCH Functions
    Activity 11 – Using the INDEX and MATCH Functions
Going A Round with Pivot Tables
    Activity 12 – A pivot table to count students
Course Description
This tutorial provides an introduction to Excel tables and how to use them for simple databases. In this tutorial, you will learn about:
- How to create a table
- Table features and components
- Formatting options for tables
- Structured references
- Converting a table to a range
- Working with large data sets
- Working with filters
- Advanced filters as they apply to tables
- Data validation in the context of a table
- Excel’s database functions
- Lookup functions
What’s Special about Tables?
Excel tables (previously known as lists) turn a range of cells into a data table, making it easier to manage large volumes of related data. A table typically contains related data in a series of worksheet rows and columns that have been formatted as a table. Following are some of the benefits and features of an Excel table:
- When you format a range in Excel as a table, you get a working set of data instead of a simple worksheet of values.
- Tables are optimized for handling large volumes of data.
- Column headings are automatically “frozen” for scrolling down.
- Each row is a record.
- When adding rows, there’s no need to worry about updating formula references, formatting options, filter settings, etc.
- Tables make use of structured references, which make it much easier and more intuitive to work with data and formulas. Table names and column headings work like named ranges. For example, a formula would look something like this: =SUM(Table1[Column3]).
- Formulas automatically adjust to accommodate changes in the data range.
- Tables export easily to Pivot Tables and Pivot Charts.
Excel for Databases
Excel or Access?
Microsoft Excel is a spreadsheet application that can perform calculations and provide graphing tools and pivot tables. Microsoft Access, on the other hand, is a database application that can organize data in tables. Excel is good for storing lists of data, and Excel tables can make that task considerably easier. In addition, some of the techniques in this tutorial will show you how to query this data in ways you may not have thought possible. Bear in mind, though, that Microsoft Access may be a more appropriate tool if you want to be able to look at relationships between multiple tables or if you wish to regularly perform complex queries on that data. Your decision to use either Excel or Access is not irreversible. Both of these products are quite capable of importing and exporting data to one another, so your decision to use one or the other can be largely governed by your needs and your own comfort level with either application.
Creating a Table
There are two ways to create a table. You can either insert a table directly in the default table style or you can convert an existing range into a table. The second approach is by far the more common:
1. On a worksheet, select the range of cells that you want to include in the table.
2. On the Insert tab in the Tables group, click on the Table command.
3. A Create Table dialog box will appear. Your selected range appears as an absolute cell reference. If you wish to change the coordinates, you can edit the values or use the browse icon on the right to reselect the range.
   Note that it’s really not necessary to select the range first. If you don’t, Excel tries to find the range for you, allowing you to override the cell coordinates in the “Where is the data for your table?” box or use the graphical “picker” to the right.
4. If your selected range contains data that you want to display as table headers, select the My table has headers check box.
5. Click the OK command button to create the table.
When you have an Excel table selected, you will have access to a Table Tools contextual tab with a single Design sub-tab.
Each time you create a table, Excel assigns it a default table name (e.g., Table1, Table2, etc.), which is shown in the Properties group. The table name is scoped to the entire workbook. You can rename the table to something more meaningful by typing a new name in the Table Name box. This table name refers to the entire range of data in the table with the exception of the header and total rows (see Anatomy of an Excel Table below), and it can be used in structured references.
A note on Activities: The hands-on activities in this tutorial assume that you have access to the Excel file “ExcelTablesData.xlsx,” which should be available in the Excel folder of the OIT Workshops directory.
Activity 1 – Creating a Simple Table
1. In the Excel file provided for this tutorial, click on the worksheet named “SimpleRange,” which contains quarterly sales records for four different products.
2. Select the range A1:E5.
3. Click on the Table command, which is in the Tables group on the Insert tab.
4. In the Create Table dialog box, verify that the absolute range A1:E5 is selected and that the My table has headers box is checked.
5. Click the OK command. Your table should appear. Note that the new table has preserved the original accounting/comma formats for the dollar amounts.
6. With the table selected, click on the Table Tools contextual tab. In the Table Style Options group, click in the Total Row checkbox. Note that a total only appears for Q4.
7. Select the Q4 total cell and copy the formula to the left so that there are totals for all four quarters.
8. In cell F1 type “YearTotal” and hit Enter. Observe that the table has expanded to include the new column.
9. Use the AutoSum command in cell F2 to add up the four quarterly sales figures for Dodads. Take note of the formula that displays before you press Enter. By entering a formula in one cell in a table column, you can create a calculated column in which that formula is instantly applied to all other cells in the table column.
10. The overall sales total is missing in cell F6. Because you have previously activated the Total Row, you can click in cell F6 to display a pull-down arrow. Click the arrow and note the various functions that are available for the Totals Row. Select the Sum function. Also, note the syntax of the formula for calculating the total for YearTotal.
11. Back in the Table Style Options group of the Table Tools tab, click on the check box for Last Column. This will allow an Excel theme to provide additional formatting to the last column.
Anatomy of an Excel Table
A typical table contains the following elements:
1. The entire table (A1:F6).
2. Table data (A2:F5). When you assign a Table Name, this is what is referenced.
3. A column (D1:D6) and column header (Q3).
4. A calculated column (F2:F5). By entering a formula in one cell in a table column, you can create a calculated column in which that formula is instantly applied to all other cells in that column. Last Column formatting, a Table Style Option, has also been applied to this column.
5. The Header row (A1:F1). A newly created table has a header row by default. Notice that every column also has filtering enabled in the header row so that you can filter or sort your table data quickly. This appears when you first create a table, but you can toggle it on or off by clicking on the Filter command in the Data tab.
6. The Total Row (A6:F6). The total row, another Table Style Option, provides access to summary functions (e.g., AVERAGE, COUNT, or SUM). A drop-down list appears in each total row cell so that you can quickly calculate the totals that you want. Total Row formatting has also been applied to this row.
Table Features
- Sorting and filtering: Filter drop-down lists (i.e., AutoFilters) are automatically added in the header row of a table, allowing you to sort in ascending or descending order or by color. You can also filter to show only the data that meets the criteria that you specify.
- Formatting table data: You can use predefined or custom table styles to format your table and make it easier to distinguish between data and other table components. Other style options allow you to show or hide other components, such as total row, header row, last column, or banding.
- Calculated columns: To use a single formula that adjusts for each row in a table, you can create a calculated column. The formula automatically expands to include other rows in the table.
- Insert and delete table rows and columns: You can insert and delete rows and columns as needed, either by right-clicking over a selected row or column, or by using commands in the Cells group of the Home tab.
- Displaying and calculating table data statistics: By displaying the Totals Row at the bottom of the table, you can show Sum, Average, Count, Max, Min, and many other functions.
Structured References
The syntax of structured references is an important aspect of your use of tables and formulas. Structured references can be used in the same way as named ranges. Like named ranges, they can make your formulas easier to read and understand. Unlike named ranges, however, structured references are automatically generated when you create a table. The references will adjust automatically as the table grows or shrinks. If you delete a table, those references get deleted too.
Table Name
A table name is a meaningful name that you provide to reference the actual table data (excluding the header row and the totals row, if any). When you insert a table, Excel automatically assigns it a default name (Table1, Table2, etc.). You can change this name to something more meaningful by changing the entry in the Table Name box (in the Properties group of the Design tab). When you reference a table name in a formula, such as =SUM(Table1), you are operating on every data cell in the table.
Column Name
The column name is derived from the column header. When referenced, the column name is enclosed in square brackets. It references all of the data in a column, excluding the header and totals row. When you specify a column in a formula, such as =SUM(Table1[Q1]), you will be adding all of the values in the column named Q1 in Table1. In order to reference a range of columns, enclose the range in square brackets as well, for example: =SUM(Table1[[Q1]:[Q4]]). Even though all column headers are text strings, they do not require quotation marks when you use them in a structured reference.
Special Item Specifiers
You can also refer to specific portions of a table:

Specifier    Refers to
#All         The entire table, including column headers and totals
#Data        Only the data (excludes headers and totals)
#Headers     Only the headers row
#Totals      Only the Totals row (returns null if there is no Totals row)

For example, use TableName[#All] whenever you need to refer to the entire data table, including column headers and totals. You could use this like a range name in a function.
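To make the mapping concrete, the following Python sketch resolves the special item specifiers to ordinary cell ranges. It is an illustration only, not how Excel works internally. It assumes a hypothetical six-column table (Product, Q1 through Q4, YearTotal) occupying A1:F6 with a header row and a Totals row, and the function name resolve is made up for this example.

```python
# Hypothetical layout (an assumption for this sketch): header in row 1,
# data in rows 2-5, Totals row in row 6, columns A through F.
HEADERS = ["Product", "Q1", "Q2", "Q3", "Q4", "YearTotal"]
HEADER_ROW, FIRST_DATA_ROW, LAST_DATA_ROW, TOTAL_ROW = 1, 2, 5, 6


def col_letter(name):
    """Worksheet column letter for a column header."""
    return "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[HEADERS.index(name)]


def resolve(item="#Data", first_col=None, last_col=None):
    """Return the A1-style range for a special item and an optional column span."""
    first = col_letter(first_col) if first_col else "A"
    last = col_letter(last_col or first_col) if first_col else col_letter(HEADERS[-1])
    top, bottom = {
        "#All": (HEADER_ROW, TOTAL_ROW),
        "#Data": (FIRST_DATA_ROW, LAST_DATA_ROW),
        "#Headers": (HEADER_ROW, HEADER_ROW),
        "#Totals": (TOTAL_ROW, TOTAL_ROW),
    }[item]
    return f"{first}{top}:{last}{bottom}"


print(resolve())                      # like Table1          -> A2:F5
print(resolve("#All"))                # like Table1[#All]    -> A1:F6
print(resolve("#Data", "Q1"))         # like Table1[Q1]      -> B2:B5
print(resolve("#Totals", "Q1", "Q4"))  # Totals row, Q1:Q4   -> B6:E6
```

The point of the sketch is that the table name alone and #Data describe the same cells, while #All widens the range to take in the header and Totals rows.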
Activity 2 – Modifying and formatting a simple table
This activity continues with the simple table you created in Activity 1 – Creating a Simple Table. We’re going to change the default name of the table to something more meaningful, add another record, sort the data, and modify the appearance of the table.
1. Click on the Table Tools Design tab. If you don’t see it, make sure the table is selected by clicking on any cell in the table.
2. In the Properties group, type over the existing name in the Table Name box, changing the name to something like “ProductSales.” Hit Enter. Click anywhere in a calculated column and notice that the formulas have been updated to use the new table name.
3. We want to add another product record, so add a new row above the Total Row (row 6). A convenient way to add a new row is to select the last cell of the last data row (i.e., F5) and press Tab.
4. In the new row, type in the product name “Gadgets.” Using the Tab key to move between cells, type in these quarterly sales amounts:
   Q1    1200
   Q2    1400
   Q3    9876.54
   Q4    1234.56
   Enter only the numbers, because the Accounting format from the original data range will handle the rest of the formatting for you.
5. Using the pull-down to the right of “Product,” sort the records from A to Z.
6. In the Table Styles group of the Table Tools tab, click on the “more” arrow in the lower-right corner to expand the styles. Mouse over some of the styles and note that your table displays a preview of each style that you hover over. Select Table Style Medium #4 (fourth from the left in the first Medium row). Notice that this style does not have a color fill for the YearTotal column. However, because the Last Column table style is still in effect, YearTotal sums are shown in a bold-face font style.
7. In the Table Style Options group, uncheck the box for Banded Rows so that the table no longer has the banded appearance.
8. Click the Save icon on the Quick Access Toolbar to save your work.
Activity 3 – Working with structured references
This activity continues with the simple table you worked with in Activity 2 – Modifying and formatting a simple table. You will transcribe some sample formulas to observe how structured references behave with different portions of table data. Finally, you will convert the table back to a range so that you can see how Excel translates a structured reference back to a standard cell reference.
1. A few lines below the ProductSales table, starting at A10, type the following four labels:
   Table Total
   Q1 Total
   Sales Total
   Quarterly Total
2. To the right of Table Total (B10), type the formula =SUM(ProductSales) and hit Enter. Notice that Table Total is adding up all of the cells in ProductSales, including the YearTotal column, which gives a figure that is exactly double the YearTotal total in F7.
3. To the right of Q1 Total (B11), type the formula =SUM(ProductSales[Q1]) and hit Enter. Q1 Total should be the same amount shown in the total row for Q1 (B7) even though the formula is different from that in B7.
4. To the right of Sales Total (B12), type the formula =SUM(ProductSales[[Q1]:[Q4]]) and hit Enter. This should display the grand total for quarterly product sales, but not include data from the YearTotal column.
5. To the right of Quarterly Total (B13), type the formula =SUM(ProductSales[[#Totals],[Q1]:[Q4]]) and hit Enter. You have just calculated the sum of data in the Totals row of the table, but only in the range Q1 to Q4.
6. Click in the Totals row for Q1 (i.e., B7) and change the summary function from SUM to NONE. Notice that Quarterly Total no longer includes Q1 in its calculation, but Q1 Total and Sales Total remain unchanged.
7. We want to provide this data to another Excel user who doesn’t understand tables. In the Tools group of the Table Tools tab, click on the Convert to Range command. Click “Yes” in response to the dialog prompt “Do you want to convert the table to a normal range?”
8. Look over the formulas in the range version of ProductSales and the four totals formulas that you had just created. Note how the structured references have all been converted to standard cell references.
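The arithmetic behind the four totals can be checked outside Excel. The Python sketch below is only an illustration under stated assumptions: the Gadgets figures come from Activity 2, the Widgets row is hypothetical filler, and the variable names are invented for the example. It shows why summing the whole table gives double the sales total once the YearTotal column exists, and what happens to the Quarterly Total when the Q1 total is switched to None.

```python
from decimal import Decimal

QUARTERS = ["Q1", "Q2", "Q3", "Q4"]

# Data rows of a ProductSales-like table. Gadgets uses the figures from
# Activity 2; the Widgets row is hypothetical.
sales = {
    "Gadgets": [Decimal("1200"), Decimal("1400"), Decimal("9876.54"), Decimal("1234.56")],
    "Widgets": [Decimal("500"), Decimal("750.25"), Decimal("300"), Decimal("410.10")],
}

# Calculated column: one YearTotal per data row.
year_total = {product: sum(q) for product, q in sales.items()}

# Totals row: one sum per quarter column.
totals_row = {
    name: sum(q[i] for q in sales.values()) for i, name in enumerate(QUARTERS)
}

# Like =SUM(ProductSales): every data cell, including the YearTotal column.
table_total = sum(sum(q) for q in sales.values()) + sum(year_total.values())
# Like =SUM(ProductSales[Q1])
q1_total = sum(q[0] for q in sales.values())
# Like =SUM(ProductSales[[Q1]:[Q4]])
sales_total = sum(sum(q) for q in sales.values())
# Like =SUM(ProductSales[[#Totals],[Q1]:[Q4]])
quarterly_total = sum(totals_row.values())

# Step 6: changing the Q1 summary function to None empties that Totals cell.
totals_row_without_q1 = {**totals_row, "Q1": None}
quarterly_total_after = sum(
    v for v in totals_row_without_q1.values() if v is not None
)

print(table_total == 2 * sales_total)                    # True
print(quarterly_total == sales_total)                    # True
print(quarterly_total_after == sales_total - q1_total)   # True
```

Q1 Total and Sales Total read the data cells directly, so only the formula that reads the Totals row is affected by the change in step 6.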
Navigating the Table
The following techniques are not specific to Excel tables, but a refresher might be useful.

To do this:                              Type this:
Go to the bottom of the sheet            Ctrl-End
Go to the top of the sheet               Ctrl-Home
Select all the data in a table column    Select the top data cell, then Ctrl-Shift-Down Arrow
Select all the data in a table row       Select the first cell in a row, then Ctrl-Shift-Right Arrow
Activity 4 – Working with a Large Data Range
Now that you are somewhat comfortable with Excel tables, let’s work with something a little more challenging. An Access table of math and science students was imported into Excel and saved as a range with 941 records. We will create a new table, give it a meaningful name, and add a calculated column.
1. Make a copy of the Students worksheet and call it StudentData. We’ll work with the data in this copy, instead of the data in Students.
2. Make StudentData into a table. This time, however, do not try to select all of the data as we did in Activity 1. Be sure that “My table has headers” is checked. (The data range should be $A$1:$O$942.)
3. Let’s give this table a more descriptive name. In the Properties group of the Table Tools tab, change the Table Name to StudentTable. Press Enter to make the change. “StudentData” is also descriptive, but this way you’ll be able to better differentiate the worksheet name from the table name in the structured references.
4. Scroll down the table and observe that the column headers (Last Name, First Name, etc.) are still visible at the top of the table. As long as you are in a table, the field names will appear “frozen” on the header row. A feature of Excel tables is that the Header Row option is checked by default.
5. Click to select any cell outside of the table, and notice that the column headers turn back into traditional column labels. Note that the Table Tools tab has also disappeared. It is a contextual tab that is only visible when you are working within a table. Select a cell within the table so that the Table Tools tab is back again.
6. Put a check in the Total Row box (Table Tools tab > Table Style Options group). Notice that the total row at the bottom of the table displays a sum for just the GPA4 field. A sum is not a particularly useful value for this type of data.
7. Click to select the cell with the sum, so that a pull-down arrow appears to its right. Click the arrow, and try out some of the other functions, such as Average, Max, and Min. We can settle on Average.
8. Click on the fill handle for the GPA4 average cell and drag it three cells to the left so that you now have averages for GPA1-4 visible.
9. Click the Total cell at the bottom of the Last Name field (A943). From the drop-down, select Count. You should now see the count (941) of the records on display.
10. Click on the filter selector to the right of the Gender column header. Click to uncheck the Select All choice and then click to select just female. Click OK to close the filter dialog.
11. When you filtered Gender for “female,” notice that the Total Row beneath the Last Name column has changed from 941 to 481, reflecting the number of Last Names for females. The GPA averages to the right have also changed to show GPA averages for the females.
12. Click on the filter selector to the right of the Gender column header, and select Clear Filter From “Gender” to remove this filter.
13. Type Ctrl-Home to navigate back up to the top of the table.
14. In the header row at the top, click in the first empty cell to the right of GPA4 (it should be P1). Type the heading “CumulativeAVG” and hit Enter. Notice that the CumulativeAVG field was automatically integrated into the StudentTable table.
15. Select the first cell under CumulativeAVG, and in the Function Library group of the Formulas tab, click the pull-down beneath AutoSum to display the menu. Select Average. The suggested formula =AVERAGE(StudentTable[@[GPA1]:[GPA4]]) is correct. Hit Enter to accept this formula. The formula is applied to every record below the CumulativeAVG field.
16. Click the Save icon on the Quick Access Toolbar to save your work.
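Two behaviors from this activity can be sketched in a few lines of Python: a calculated column evaluates the same formula once per row, and the Total Row recalculates over only the rows left visible by a filter. This is a rough model with three hypothetical student records (the real worksheet has 941 and many more fields), and the helper name total_row is invented for the example.

```python
from statistics import mean

# Hypothetical records; field names loosely follow the StudentTable columns.
students = [
    {"Last Name": "Adams", "Gender": "female", "GPA": [3.0, 3.5, 3.5, 4.0]},
    {"Last Name": "Brown", "Gender": "male", "GPA": [2.0, 2.5, 3.0, 2.5]},
    {"Last Name": "Chen", "Gender": "female", "GPA": [4.0, 3.5, 3.0, 3.5]},
]

# Calculated column: like =AVERAGE(StudentTable[@[GPA1]:[GPA4]]), the same
# formula is evaluated against the current row for every record.
for s in students:
    s["CumulativeAVG"] = mean(s["GPA"])


def total_row(records, keep=lambda r: True):
    """Summarize only the records that a filter leaves visible."""
    visible = [r for r in records if keep(r)]
    return {
        "Count": len(visible),
        "Average": mean(r["CumulativeAVG"] for r in visible),
    }


everyone = total_row(students)
females = total_row(students, keep=lambda r: r["Gender"] == "female")
print(everyone["Count"], females["Count"])
print(females["Average"])
```

Clearing the filter corresponds to calling total_row with no keep argument, which brings the count back to the full set of records.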
Filtering
You’re probably familiar with at least the basics of sorting and filtering in Excel, but it might be worthwhile to review some filtering concepts. When you apply a filter for a certain criterion, for example only Chemistry in the Major field, you are actually hiding the rows of records that do not contain “Chemistry” in the major.
- If you delete the filtered range of chemistry majors, you leave behind all of the hidden records (i.e., all of the other majors).
- You can copy a filtered range to another place in the worksheet or workbook. The copy, however, contains only the filtered range, not any hidden records.
- If you apply any conditional formatting rules, you also gain the ability to filter for those formats. This provides you with a powerful tool to look at specific criteria in your data.
Specifying Criteria – Advanced Filter Techniques
Excel features an Advanced Filter command that lets you use complex criteria to filter a range. This technique is markedly different from the custom AutoFilters that you’ve used so far. It bears much more resemblance to an Access Query.
Let’s first look at why you would use an advanced filter. If you wanted to search the Student table for female freshman geology majors, you could just specify filter criteria for gender, rank, and major. But what if you also wanted to search (don’t ask me why) for female freshmen from New Jersey, regardless of major? You can see that this is much more complicated. If you’d like to generate a list of unique values in a field range, such as all of the majors or, better yet, all of the state codes, the advanced filter will let you easily create such a list.
Depending upon your selection, an advanced filter can perform one of the following actions:
1. Filter the list, in-place
2. Copy to another location
In the next couple of activities, we’ll look at how each of these actions works.
Creating the Advanced Filter
Before we actually begin to use the Advanced Filter, you need to have an area in your worksheet where you can specify the criteria for your filter. Excel will use this criteria range (an actual named range) as the source for the advanced filter criteria. The criteria range is typically placed directly above or below the data. If you place it alongside the data range, you risk hiding it when a filter is applied to the data.
The criteria range should begin at least three rows above the data range, so that when it is complete, there will be at least one blank row between the criteria range and the data range. The criteria range should have at least one column heading in the top row. This heading should match a heading in the data range. These are the basic steps:
1. Insert at least four blank rows above the data range.
2. Copy the first row containing the field names of the data range and paste it in as the top row of what will become your criteria range.
3. Enter the criteria below the appropriate column heading(s) in the criteria range. (If you are familiar with Access Queries, this technique is very similar.)
4. Run the Advanced Filter.
Activity 5 – Advanced Filter, in place
In this activity we’ll use the StudentTable table (in the StudentData worksheet) to create a criteria filter. We’ll filter this list “in place.”
1. Select the first four rows of the StudentTable, and in the Cells group of the Home tab click the Insert Sheet Rows command to create four blank rows above the table.
2. Copy the header row of the StudentTable (A5:P5) up to the first row.
3. Create a filter for female freshman geology majors in row 2: enter “female” under Gender, “1” under Rank, and “Geology” under Major.
   Note: text criteria in the Advanced Filter are not case-sensitive, but it is good practice to type “female” and “Geology” exactly as they appear in the data.
4. Click anywhere in the StudentTable to select it, and then toggle the AutoFilter back on.
5. In the Sort & Filter group of the Data tab, click the Advanced Filter command. You will see the Advanced Filter dialog box.
   a. In the Advanced Filter dialog box, be sure that the radio button for Filter the list, in place is selected.
   b. Select the entire table for the List range ($A$5:$P$947). One way to do this is to use the structured reference: StudentTable[#All].
   c. Click the point & click selector to the right of the Criteria range box and use your mouse to select A1:P2. Close the criteria range picker. The range should now display as StudentData!$A$1:$P$2.
   d. Click the OK button to run the advanced filter.
6. The resulting table should be down to 26 records. Notice that the row numbers are in blue, which is one indication that you’re looking at filtered results.
7. Clear (i.e., turn off) the filter by clicking on the Clear command in the Sort & Filter group.
8. Save these changes to the StudentData worksheet, because we can use this data again later.
Activity 6 – Advanced Filter to another location
In this activity we’ll once again use the StudentTable table (in the StudentData worksheet) to expand the criteria filter. We’re going to search on additional criteria, looking for freshman females from New Jersey. We’ll also explore the option where you copy the filter results to another location. This can be handy if you need to provide an extract of your data.
1. In the blank row (row 3) underneath the current criteria range, add “female” under Gender, “1” under Rank, and “NJ” under State.
2. In the Sort & Filter group of the Data tab, click the Advanced Filter command.
   a. In the Advanced Filter dialog box, click on the radio button for Copy to another location.
   b. The list range should remain the same ($A$5:$P$947).
   c. This step is important: you need to expand the Criteria range to include the additional row. Using the point & click selector to the right of the Criteria range box, reselect the criteria to A1:P3. The range should now display as StudentData!$A$1:$P$3.
   d. Use the point & click selector to the right of the Copy to box to select cell A950. Close the Copy to picker.
   e. Click the OK command button.
3. Scroll down to the end of the table (Ctrl-End will get you there fast), and see that your filtered data is a separate range (A950:P980) outside of the table. At this point, you can copy or move these filtered results to another worksheet or workbook. Unfortunately, Excel will not allow you to specify a different worksheet from the Advanced Filter command. Filter destinations must be on the same worksheet.
4. Select and remove the extracted data from A950:P980. We’re going to recycle that space for one more activity.
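The way the two criteria rows combine can be sketched in Python. Conditions on the same criteria row must all hold (AND), while a record that satisfies any one row is kept (OR). The sketch is a simplification under stated assumptions: the records are hypothetical, the function names are invented, and matching is reduced to plain equality, whereas Excel’s real criteria also accept comparison operators and wildcards.

```python
def matches(record, criteria_row):
    """True when the record satisfies every non-blank cell of one criteria row (AND)."""
    return all(
        record[field] == wanted
        for field, wanted in criteria_row.items()
        if wanted not in ("", None)
    )


def advanced_filter(records, criteria_rows):
    """Keep records that satisfy at least one criteria row (OR across rows)."""
    return [r for r in records if any(matches(r, row) for row in criteria_rows)]


# Hypothetical student records for illustration.
students = [
    {"Last Name": "Avery", "Gender": "female", "Rank": 1, "Major": "Geology", "State": "WV"},
    {"Last Name": "Baker", "Gender": "female", "Rank": 1, "Major": "Chemistry", "State": "NJ"},
    {"Last Name": "Clark", "Gender": "male", "Rank": 1, "Major": "Geology", "State": "NJ"},
    {"Last Name": "Davis", "Gender": "female", "Rank": 2, "Major": "Geology", "State": "NJ"},
]

criteria = [
    {"Gender": "female", "Rank": 1, "Major": "Geology"},  # criteria row 2
    {"Gender": "female", "Rank": 1, "State": "NJ"},       # criteria row 3
]

extract = advanced_filter(students, criteria)
print([r["Last Name"] for r in extract])  # ['Avery', 'Baker']

# A completely blank criteria row places no conditions, so it matches everything.
print(len(advanced_filter(students, criteria + [{}])))  # 4
```

This is also why step 2c matters: a criteria range that stops at row 2 would silently leave out the New Jersey row, and a range that runs one row too far would pick up a blank row and return every record.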
Activity 7 – Advanced Filter, Unique records only
The Unique records only check box for the advanced filter provides you with a great way to collect unique field data that can be used for pick lists, data validation lists, or lookup tables. Let’s extract just the list of states in our table.
1. Make sure that StudentTable is selected.
2. Click on the Advanced filter command:
   a. In the Advanced filter dialog, click the radio button for Copy to another location.
   b. Use the point & click selector for List range to select the header for the State field (J5). Notice that relative reference is being used back in the List range box. Do a Ctrl-Shift-DownArrow keystroke command to select the entire State range. The relative reference should now be StudentTable[[#Headers],[#Data],[State]]. Close the List range picker.
   c. Click in the box for Criteria range and delete the old reference. We don’t want any criteria now.
   d. In the Copy to field, use the picker to select cell A950. The reference should become StudentData!$A$950.
   e. Put a check in the Unique records only box.
   f. Click on the OK button.
3. Go down to the bottom of your table and verify that you have a list of states running from A950:A1001.
4. If we’re going to use this state list, we should sort it first. Click on the “State” header at A950, then press Ctrl-Shift-DownArrow to select the entire range.
5. In the Sort & Filter group, click on the Sort command.
6. In the Sort dialog, be sure to place a check in the box for My data has headers.
7. Click the OK command to exit the dialog and sort the list.
8. Create a new worksheet named StateList, and cut & paste the sorted State list into it.
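The two stages of this activity, extracting unique values and then sorting them, can be sketched in Python. The state codes below are hypothetical sample data and the function name is invented. The sketch assumes the extract keeps the first occurrence of each value in its original order, which is why a separate sort step follows.

```python
def unique_records(values):
    """Return each value once, in order of first appearance."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Hypothetical State column with repeats.
states = ["WV", "PA", "WV", "OH", "NJ", "PA", "MD"]

extracted = unique_records(states)   # unique, but still in table order
state_list = sorted(extracted)       # sorted for use as a pick list

print(extracted)   # ['WV', 'PA', 'OH', 'NJ', 'MD']
print(state_list)  # ['MD', 'NJ', 'OH', 'PA', 'WV']
```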
Advanced Filter – A look behind the Scenes
Before moving on, let us take a moment to review what’s going on with the Advanced Filter. Each time you ran the Advanced Filter command, it automatically created and used range names. To see this, go to the Name Manager command in the Defined Names group of the Formulas tab. You should see entries such as Criteria, Extract, and StudentTable.
Although the Criteria and Extract range names have been created in the system, you should be careful about using them in subsequent references. In the case of the Criteria range, a problem will arise if you add or delete a row of information from an existing criteria range, because the system won’t know that the range has changed. You can see this yourself if you remove the second row of criteria (i.e., the Jersey girls) and rerun the advanced filter. The system will interpret the blank row of criteria as a wildcard, and extract the entire data range, unfiltered. What you could do, if you wanted to, would be to edit the Criteria in the Name Manager, and change the third-row reference (StudentData!$A$1:$P$3) so that it ends at the second row (change $P$3 to $P$2). Something else that you could do, now that you know what’s going on, is to create different range names (for example, criteria1, criteria2, etc.) and use them as references in your advanced filter, using the Use in Formula command in the Formulas tab. It’s up to you.
The Extract range is automatically created when you run the advanced filter with the “Copy to another location” option. However, when you cut & pasted the range of unique states to another sheet, you also moved the Extract range to that new location. If you were to rerun the advanced filter exactly as you did in Activity 7 – Advanced Filter, Unique records only, you would now receive an error message: “You can only copy filtered data to the active sheet.”
Finally, notice the table icon for StudentTable in the Name Manager. If you look carefully at that reference, it’s only for the data portion of the table. It does not include the header row. That’s why you can’t refer to StudentTable by name in the List range of the Advanced Filter dialog: the header references are missing. Remember that the list range, like the criteria range, must include the header row. If you want to refer to the entire student table (headers and data) you can use the special notation StudentTable[#All].
So if you have the third criteria row in place (i.e., the “Jersey girls”) and if your Extract range is still in its original location, you could quite properly run an advanced query that looks like this:
Data Validation in a Table
Aside from simple record-keeping, one of the reasons that we want to work with data is so that we can provide meaningful statistics. For example: What is the ratio of males to females? How many students are in a particular major? Are some majors more popular than others?
To ensure that the statistics are accurate, however, the underlying data must be uniform and consistent. You can use Data Validation rules to ensure that certain information is properly recorded. Although data validation is not specific to Excel tables, it’s worth looking at how table techniques can help us with creating validation rules. Data validation rules for tables are pretty much the same as they are for simple ranges. You should bear in mind that, unlike calculated columns, where formulas are automatically copied, validation on table cells must be applied to selected cells. All of the cells in a data column must have validation set before validation will also work on newly inserted rows.
Activity 8 – Create a validation rule and copy it to other cells
We’re going to explore a couple of tricks for applying validation rules to tables. In the first case, we’ll create a validation rule for the Rank field (e.g., freshman=1, sophomore=2, etc.), restricting entries to a whole number between 1 and 4.
1. Select the first data cell under the Rank column in the table StudentTable (on the StudentData worksheet).
2. In the Data Tools group of the Data tab, click on the Data Validation command. In the Data Validation dialog, select the Settings tab:
   a. In the Allow section, select Whole number.
   b. To the right, remove the checkmark from Ignore blank. This can be a judgment call: if Ignore blank is checked, a blank cell will also be allowed in the Rank field.
   c. In the Data section, select the between value.
   d. In the Minimum section, type “1”.
   e. In the Maximum section, type “4”.
   f. Click the Error Alert tab, and type something like “Rank must be a whole number between 1 and 4” in the Error message area. These messages are not required, but it is a good practice to offer meaningful feedback.
   g. Click the OK button to complete the Data Validation.
3. Test that data validation is operating for the cell you just modified by entering a new value outside of the validation range.
4. Make sure the first Rank cell is selected and copy it (Ctrl-C on the keyboard, or Home → Clipboard → Copy from the Ribbon).
5. We’re going to copy the validation rule to the rest of the Rank column: select the next cell below the one you just copied, and press Ctrl-Shift-DownArrow to select the rest of the range.
6. With the range selected, we want to do a “Paste Special”: in the Clipboard group of the Home ribbon, click the down-arrow beneath Paste, and then select Paste Special from the bottom of the pull-down.
7. In the Paste Special dialog box, click on the Validation radio button. This is how you can apply a cell’s validation rules to other cells.
8. Click OK to close the dialog box and complete the paste operation.
9. Test that the validation rule is working for other cells in the range. If you add a new record, the validation should also work.
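For readers who find it easier to see a rule spelled out, the check configured in this activity can be sketched outside Excel. The Python below is only an illustration of the rule’s logic under stated assumptions; the function name is_valid_rank and its ignore_blank parameter are hypothetical, and this is not how Excel implements validation.

```python
def is_valid_rank(value, ignore_blank=False):
    """Mimic the rule 'whole number between 1 and 4' (illustrative only).

    ignore_blank plays the role of the Ignore blank checkbox: when it is
    True, an empty entry is accepted; when False, it is rejected.
    """
    if value in ("", None):
        return ignore_blank
    # Text and logical values are not whole numbers.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    # Accept 1, 2, 3, 4 (including 2.0), reject 2.5, 0, 5, and so on.
    return float(value).is_integer() and 1 <= value <= 4


print(is_valid_rank(3))                      # a sophomore-to-senior style code
print(is_valid_rank(5))                      # outside the allowed range
print(is_valid_rank("", ignore_blank=True))  # blank allowed only if requested
```

The ignore_blank switch shows why step 2b is a judgment call: the same empty cell passes or fails depending on that single setting.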
Activity 9 – A table as a validation list
If you’ve ever written a validation rule that makes use of a dropdown list, then you’re probably aware that one of the limitations is that the list cannot be dynamic, i.e., you cannot simply add or remove a list item without “breaking” the validation rule. If you refer to a range or a named range as the source for your list, you will need to update the validation rule or update the parameters of the range name. But Excel tables are supposed to be dynamic, right? They are, indeed. You just have to know how to coax them. Here’s how. In this activity we will create a validation rule that does not need to be modified whenever its data source is modified.
1. Go to the StateList sheet that you created as part of “Activity 7 – Advanced Filter, Unique records only.” Turn the range of state codes into an Excel table, and name the table StateTable.
2. If we try to create a validation rule in StudentTable referring to =StateTable, we would get a warning that the formula contains an error. Here’s the trick:
   a. Go to the Name Manager (in the Formulas tab) and click the command to create a new name.
   b. In the New Name dialog box, assign a range name StateLookup.
   c. In the Refers to box, type in the structured reference for the table and column, in this case: =StateTable[State]
   d. Click the OK command button to save the new range name.
   e. Close the Name Manager dialog.
3. Select the first State data cell in the StudentTable, and execute the Data Validation command (Data tab → Data Tools group → Data Validation):
   a. In the Settings tab of the Data Validation dialog, select List in the Allow: category.
   b. In the Source: category, type the reference to the range name: =StateLookup. (Note that you could alternately click the selector on the right and select StateLookup from the Use in Formula command in the Defined Names group.)
   c. Add an appropriate Error Alert message to let the user know that the State field is restricted to valid two-character state codes.
   d. Click OK to complete the data validation and exit the dialog.
4. Test the validation for the first State cell (should be Bradley Kuhl’s record).
5. Copy the state validation to the remaining cells in the State column (see steps 4-9 in “Activity 8 – Create a validation rule and copy it to other cells” above).
6. You can verify that this state validation list is working by adding a new “state” to the StateTable and testing it in the StudentTable.
Click the Save icon in the Quick Access Toolbar to save your work.
Back up in Activity 3 – Working with structured references, you learned how to use structured references to aggregate data in entire tables or columns. If you want to query your data with advanced filters, there are special functions to help you. Excel provides special database functions to calculate statistics such as total, average, maximum, minimum, and count for a particular database field when specific criteria are met. This last part, specific criteria, is the distinctive feature. Say, for example, that you wanted to calculate the average GPA1 with complicated criteria, such as the ones you used in Activity 5 – Advanced Filter, in place (remember the female geology freshmen?). You would need a formula that looks something like this:
=AVERAGEIFS(StudentTable[GPA1],StudentTable[Gender],"female",StudentTable[Major],"Geology",StudentTable[Rank],1)
Using a “D” function, the same formula would be much simpler:
=DAVERAGE(StudentTable[#All],"GPA1",A1:P2)
In this second example, the range A1:P2 represents the criteria range, which is responsible for handling the task of filtering for gender, major, and rank.
Syntax of the database function
The basic database functions all take the same three arguments, as illustrated by this DSUM function:
=DSUM(database, field, criteria)
where
Database specifies the range containing the database. It must include the field names in the top row. If you want to use a structured reference, consider using TableName[#All].
Field is the argument that specifies the field whose values are to be calculated by the database function. Specify this argument by enclosing the name of the field in double quotes.
Criteria is the argument that specifies the address of the range that contains the criteria that you are using to determine which values are calculated. Criteria take the form of the criteria range that you worked with in “Criteria Range” above. At a minimum, the range must include one field name that indicates the field whose values are to be evaluated and one cell with the value or expression to be used in the evaluation.
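To make the role of the three arguments concrete, here is a minimal sketch in Python of how a “D” function such as DAVERAGE could be modeled. It is an illustration only, not Excel’s implementation: the names matches and daverage, the sample records, and the exact-equality comparison are all assumptions made for this sketch (Excel’s own text matching is more lenient). Cells within one criteria row are combined with AND, separate criteria rows are combined with OR, and a blank cell places no restriction on its field.

```python
# Illustrative model of a "D" function; the sample records are made up.

def matches(record, criteria_rows):
    """True if the record satisfies at least one criteria row (rows are OR'ed).

    Within a single row, every non-blank cell must match (cells are AND'ed).
    A blank cell ("" or None) places no restriction on that field.
    """
    for row in criteria_rows:
        if all(record[field] == wanted
               for field, wanted in row.items()
               if wanted not in ("", None)):
            return True
    return False


def daverage(database, field, criteria_rows):
    """Average `field` over the records that match the criteria."""
    values = [rec[field] for rec in database if matches(rec, criteria_rows)]
    if not values:
        raise ValueError("no records match the criteria")
    return sum(values) / len(values)


students = [
    {"Gender": "female", "Major": "Geology", "Rank": 1, "GPA1": 3.0},
    {"Gender": "female", "Major": "Geology", "Rank": 1, "GPA1": 4.0},
    {"Gender": "male",   "Major": "Geology", "Rank": 1, "GPA1": 2.0},
    {"Gender": "female", "Major": "History", "Rank": 2, "GPA1": 3.5},
]

# One criteria row: female AND Geology AND Rank 1.
criteria = [{"Gender": "female", "Major": "Geology", "Rank": 1}]
print(daverage(students, "GPA1", criteria))

# A criteria row whose only cell is blank matches every record.
print(daverage(students, "GPA1", [{"Gender": ""}]))
```

The second call shows why a stray blank row in the criteria range is risky: it behaves as a wildcard, so the statistic is computed over the whole table.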
Table of Database Functions Featured in Excel 2010
Database Function | What It Calculates
DAVERAGE | Averages all the values in a field of the database that match the criteria you specify.
DCOUNT | Counts the number of cells with numeric entries in a field of the database that match the criteria you specify.
DCOUNTA | Counts the number of nonblank cells in a field of the database that match the criteria you specify.
DGET | Extracts a single value from a record in the database that matches the criteria you specify. If no record matches, the function returns the #VALUE! error value. If multiple records match, the function returns the #NUM! error value.
DMAX | Returns the highest value in a field of the database that matches the criteria you specify.
DMIN | Returns the lowest value in a field of the database that matches the criteria you specify.
DPRODUCT | Multiplies all the values in a field of the database that match the criteria you specify.
DSTDEV | Estimates the standard deviation based on the sample of values in a field of the database that match the criteria you specify.
DSTDEVP | Calculates the standard deviation based on the population of values in a field of the database that match the criteria you specify.
DSUM | Sums all the values in a field of the database that match the criteria you specify.
DVAR | Estimates the variance based on the sample of values in a field of the database that match the criteria you specify.
DVARP | Calculates the variance based on the population of values in a field of the database that match the criteria you specify.
Activity 10 – Using the “D” Functions
In this activity, we’ll continue with the StudentData worksheet that you have previously used in Activities 7 and 8. We’re going to calculate the average, maximum, minimum, and count for GPA1 grades for the freshman female geology majors.
1. Be sure that the criteria range contains the following filter for female freshman geology majors in row 2:
2. Two cells to the right of the header for the Criteria range (i.e., R1), type in a header (Statistics) and some place holders for Averages, Max, Min, and Count:
Note that we’re skipping column Q so that the Statistics section will not get automatically included as part of StudentTable.
3. Enter the following functions to the right of the appropriate label:
=DAVERAGE(StudentTable[#All],"GPA1",Criteria)
=DMAX(StudentTable[#All],"GPA1",Criteria)
=DMIN(StudentTable[#All],"GPA1",Criteria)
=DCOUNT(StudentTable[#All],"GPA1",Criteria)
4. Based upon the current criteria, your results should look like the following:
As you can see from all of the “D” functions, the content of the Criteria range plays a major role in determining the basis of your statistics.
There are a number of Excel functions that you can use to look up and return information within a table. The most popular function for most users is VLOOKUP, which searches the first column of a range of cells and then returns a value from any cell on the same row. The inherent limitation of VLOOKUP is that whatever value you want to return must be to the right of that first search column. In StudentTable, for example, you could use the unique identifier Student ID as the basis for a VLOOKUP (see the column headers below). If you did, you could return information to the right, such as Major, Rank, Date of Birth, etc., but you could not return the Last Name, First Name, or Gender, which are to the left of Student ID.
Last Name | First Name | Gender | Student ID | Major | Rank | DOB
If you want to return information from a column to the left of the search column, you will need to do something else.
The INDEX and MATCH Functions
For more versatile lookups, one of the best solutions is to use a MATCH function nested within an INDEX function. Let’s look at these two functions separately first. The INDEX function,
=INDEX(data_range, row_number, column_number)
has three arguments. If you provide the data range (e.g., StudentTable) and a specific row number and column number, INDEX will display what’s in that cell. As you can see, however, INDEX needs to know the exact coordinates of what to look up. Here’s where we need the MATCH function, which returns the relative position of an item in an array that matches a specified value. Here’s the syntax:
=MATCH(lookup_value, lookup_array, [match_type])
where
lookup_value is the value you want to match in lookup_array.
lookup_array is the range of cells being searched.
match_type is an optional argument that controls the kind of match: 0 finds the first value exactly equal to lookup_value; 1 (the default) finds the largest value that is less than or equal to lookup_value; -1 finds the smallest value that is greater than or equal to lookup_value. It is important to note that the exact match does not require that your lookup_array be in any specific order, whereas 1 or -1 requires that lookup_array be sorted in ascending order or descending order, respectively.
Here’s where we get creative: we’re going to use the MATCH function to provide us with the row_number argument for the INDEX function.
=INDEX(data_range, MATCH(lookup_value, lookup_array, 0), column_number)
Notice that we’re using the zero match_type here, because we’re interested in finding exact matches. At first glance, this probably seems very complicated, but you’ll find that it gets easier with practice.
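The nesting can be easier to follow when the two functions are written out as ordinary code. The Python below is a rough sketch of the exact-match case only; the function names match and index mirror the Excel functions, but the sample rows and Student ID values are invented for illustration and do not come from the workshop data.

```python
def match(lookup_value, lookup_array):
    """Exact-match MATCH (match_type 0): 1-based position of the first hit."""
    for position, value in enumerate(lookup_array, start=1):
        if value == lookup_value:
            return position
    raise LookupError(f"{lookup_value!r} not found")  # Excel would show #N/A


def index(data_range, row_number, column_number):
    """INDEX: the value at a 1-based row and column of a rectangular range."""
    return data_range[row_number - 1][column_number - 1]


# Hypothetical rows: Last Name, First Name, Gender, Student ID.
student_rows = [
    ["Green", "Sheila", "F", 1001],
    ["Kuhl", "Bradley", "M", 1002],
]
student_ids = [row[3] for row in student_rows]  # the lookup_array

# Look up by Student ID, then return a column to its LEFT (Last Name = 1).
row_number = match(1002, student_ids)
print(index(student_rows, row_number, 1))
```

Because match only reports a position, index is free to return any column of that row, including columns to the left of the one that was searched, which is exactly what VLOOKUP cannot do.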
Activity 11 – Using the INDEX and MATCH Functions
In this activity, we’ll use the worksheet titled “ClassRegistration,” which is a table of nearly 3,000 Student IDs associated with the CRNs of classes they are taking.
Some background on ClassRegistration: the Excel Data tab has a group titled Get External Data. In that group is a command, From Access, which opens up an import wizard. When importing data from Access, the Import Data dialog lets you Select how you want to view this data in your workbook. The default selection is as a Table. In other words, the default for importing an Access table is as an Excel data table. You should also be aware that the imported table has a data link to the original Access table, meaning that if one were to change a record in the Access version of the table, the change would also appear in the Excel data table. For this tutorial, this link has been removed so that this data is separated from its original source.
We’re going to write a very simple search that uses the unique identifier Student ID to look up the student’s name from StudentTable in the StudentData worksheet. In StudentData, the Student ID column is to the right of Last Name and First Name, so a VLOOKUP function would not work in this situation.
1. Open up the worksheet titled ClassRegistration. Click on the Design tab, and look at the table name in the Properties group. The table name Table_Faculty.accdb was assigned to this table when it was imported from an Access database. We’ll keep this name as it is.
2. At the top of column D, type in a new column label Last Name. Hit the Tab key to move one column to the right. Notice that Last Name is now part of the Table_Faculty.accdb table.
3. In column E, type in a column label First Name and hit Enter.
4. Directly below the Last Name column label (cell D2), enter this formula:
=INDEX(StudentTable[#Data], MATCH([@[Student ID]], StudentTable[Student ID],0),1)
5. Hit Enter on the formula. The last name Green should appear in the cell.
6. Directly below the First Name label (E2), enter this formula:
=INDEX(StudentTable[#Data], MATCH([@[Student ID]], StudentTable[Student ID],0),2)
(Note that this is the same formula as above, except the final column_number value is 2. To save time, you could simply copy the formula from Last Name, paste it into the First Name column, and change the column_number reference to 2.)
7. Hit Enter on the formula. The First Name for Green should be Sheila.
8. Your results should look like this:
Going A Round with Pivot Tables
Pivot Tables were previously covered in the Intermediate Excel tutorial. However, since they provide an excellent way to summarize large amounts of data, we would be remiss to not mention them here. While advanced filter techniques provide an excellent way to ask specific questions about the data, a pivot table gives you an excellent tool for summarizing groups within your data. It helps with your data analysis by letting you group large amounts of data into categories. In the case of the StudentTable, for example, we can use a pivot table to quickly summarize the numbers of students in a variety of ways: How many are in each class rank? How are they distributed across majors? Is gender a significant factor?
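As a rough picture of what a counting pivot table does, the grouping can be imitated in a few lines of Python. This is only a sketch of the idea: the records below are made up, the helper name pivot_count is hypothetical, and Excel’s PivotTable offers far more than this (filters, subtotals, drill-down).

```python
from collections import Counter

# Made-up records standing in for StudentTable; only the fields used here.
students = [
    {"Student ID": 101, "Major": "Geology", "Rank": 1},
    {"Student ID": 102, "Major": "Geology", "Rank": 1},
    {"Student ID": 103, "Major": "Geology", "Rank": 2},
    {"Student ID": 104, "Major": "History", "Rank": 1},
    {"Student ID": 105, "Major": "History", "Rank": 4},
]


def pivot_count(records, row_field, column_field):
    """Count records for each (row label, column label) combination."""
    return Counter((rec[row_field], rec[column_field]) for rec in records)


counts = pivot_count(students, "Major", "Rank")
majors = sorted({major for major, _ in counts})
ranks = sorted({rank for _, rank in counts})

# Print majors down the side and ranks across the top, like row and
# column labels in a pivot table; missing combinations count as 0.
print("Major".ljust(10) + "".join(str(r).rjust(4) for r in ranks))
for major in majors:
    print(major.ljust(10) + "".join(str(counts[(major, r)]).rjust(4) for r in ranks))
```

Swapping the row_field and column_field arguments corresponds to dragging a field from the row axis to the column axis: the same counts, laid out the other way.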
Activity 12 – A pivot table to count students
This activity makes use of data in the StudentTable, which is on the StudentData worksheet.
1. Select the StudentData worksheet, and make sure that the table is selected (i.e., you should see the Table Tools contextual tab).
2. In the Tables group of the Insert tab, click on the PivotTable command icon. This will display a Create PivotTable dialog:
   a. The Select a table or range radio button should be checked by default.
   b. StudentTable should already appear in the Table/Range box.
   c. The New Worksheet radio button should also be checked by default.
   d. Click on the OK button to close this dialog box.
3. A new worksheet will appear. It should have a placeholder for the new pivot table in the upper-left of the sheet, as well as a PivotTable Field List panel running vertically down the right side. Note, also, that there is now a PivotTable Tools contextual tab available at the top of the page. Rename this new worksheet as StudentTable Pivot.
4. In the PivotTable Field List, we need to decide which categories to display. Let’s take a look at how class majors are distributed. Click the check box to the left of Major in the Field List. Major should become a Row Label, displaying along the left side of the sheet, as well as appearing in the Row Labels quadrant of the Field List panel.
5. If we are counting students, the Student ID field should be our best choice, since each value represents a unique student from the StudentTable. Use your mouse pointer to select the Student ID in the Field List, and drag it down into the Values quadrant. A Count of Student ID should appear as values distributed among the Major Row Labels.
6. Now that we have a count of students in each major, let’s see how they are distributed by class rank as well. Use your mouse pointer to select Rank in the Field List. Drag and drop Rank beneath Major in the Row Labels quadrant.
7. Note that the distribution of Rank appears beneath each Major, which is now an expanded category. You can click the minus box to the left of each Major if you want to collapse the Rank distribution.
8. Instead of looking at Rank as a subset of Major, let’s move it to the column axis. Click and drag Rank from the Row Labels quadrant and drop it into the Column Labels quadrant.
This table is easier to read:
9. It might be interesting to see how Gender is distributed among Rank and Major. Use your mouse pointer to select Gender in the Field List and drag it down to the Report Filter quadrant. Gender now appears as a filter above the main pivot table, allowing you to tabulate males and females separately. Set the Gender filter to tabulate for females.
10. With the Gender filter set to female, note that the number of female Geology freshmen is 26, which is the same value you should have gotten from Activity 5 – Advanced Filter, in place. Use your mouse to select the intersection of female freshmen Geology majors (i.e., the 26).
11. Double-click on that selection. Notice that a new worksheet appears which contains a data table of the values you just selected! In case you didn’t know: Pivot Tables allow you to extract any data that you can summarize.
Unfortunately, there’s only so much that can be covered in a finite amount of time. There are many other ways that you can summarize this data, such as looking at the GPA averages or distributions by state. And then there are Pivot Charts, which can give you a dynamic visual representation of your data. We hope that this tutorial gives you a start on looking at what you can do with your own data.
-- The End --