
Data Cleaning & Exploration
Welcome to your first lesson of the HR Data Analyst course. In this lesson, we will
focus on data cleaning and exploration using Excel.

Case Study: Green Generation
You are an HR data analyst at Green Generation, a mid-sized organization specializing in sustainable energy solutions.
The company has experienced significant growth in recent years. Your HR director, Linda, has asked you to provide
data-driven insights for her quarterly board update. She has three specific questions:

1. What is the total headcount?

2. What is the average age in the organization?

3. How many women are employed in the Western region finance department?

Let's answer these questions using Excel. We will start by working with the "HRIS23" Excel file, which contains over
800 rows of employee data across 14 columns. The dataset includes employee information such as ID, demographic data,
tenure, region, department, salary, and performance ratings.
Manipulating Data in Excel
To make data analysis easier, we will convert the dataset into an Excel table. This allows us to filter, sort, and calculate
metrics more efficiently.

1. Click any cell in the dataset and press Ctrl+A to select the entire data range.

2. Click on the "Insert" tab and select "Table" or press Ctrl+T.

3. In the "Create Table" dialog box, check that the range covers all of your data.

4. Tick the "My table has headers" box so that the first row is treated as column headings.

5. Click OK to create the table.
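If you later repeat this work outside Excel, the effect of the headers box can be sketched in Python: the first row becomes the column names, and each column can then be pulled out by its heading, roughly like referring to a table column by name in Excel. This is a minimal illustration only; the sample rows, the column subset, and the helpers `load_table` and `column` are hypothetical and are not part of the HRIS23 file or of Excel.

```python
import csv
import io

# Hypothetical sample in the shape of an HR export; NOT the real HRIS23 data.
RAW = """ID,Gender,Age,Region,Department
101,F,34,West,Finance
102,M,41,East,Sales
"""

def load_table(text):
    """Treat the first row as headers (like ticking 'My table has headers')."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

def column(table, name):
    """Return every value in one column, looked up by its heading."""
    return [row[name] for row in table]

master_table = load_table(RAW)
print(column(master_table, "ID"))  # ['101', '102']
```

Note that csv.DictReader reads every value as text, whereas Excel detects numbers automatically.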

Next, rename the table "MasterTable" in the Table Name box on the Table Design tab. A descriptive name makes the table easier to reference in formulas and data sources.
Features of Tables
Excel tables offer several useful features for data analysis.

• Filter buttons are automatically added to the header row, allowing you to easily filter the data.

• Column headings remain visible even when scrolling, eliminating the need to freeze rows or columns.

• Tables automatically expand to include new rows or columns when new data is added.

You can also tick Total Row on the Table Design tab to add a row that summarizes the data with counts, averages, sums, or
other descriptive statistics.
Calculating Average Age
To answer Linda's second question about the average age of employees, follow these steps:

1. Click on the cell in the total row of the Age column.

2. Select "Average" from the drop-down menu.

3. Excel will calculate the average age for you.

If necessary, reformat the cell to show fewer decimal places.
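When you choose Average in the total row, Excel enters a SUBTOTAL formula (function number 101, for example =SUBTOTAL(101,[Age])) rather than a plain AVERAGE, so rows hidden by a filter or slicer are left out of the result. A minimal Python sketch of that behavior follows; the ages are invented, and the `visible` flag is a hypothetical stand-in for Excel's filter state.

```python
from statistics import mean

# Hypothetical rows; "visible" mimics whether the current filter shows the row.
rows = [
    {"ID": 101, "Age": 34, "visible": True},
    {"ID": 102, "Age": 41, "visible": True},
    {"ID": 103, "Age": 29, "visible": False},  # hidden by a filter
    {"ID": 104, "Age": 38, "visible": True},
]

def total_row_average(table, col):
    """Average only the visible rows, as SUBTOTAL(101, ...) does."""
    return mean(r[col] for r in table if r["visible"])

avg = total_row_average(rows, "Age")
# Formatting with no decimals changes only the displayed text, not avg itself.
print(f"{avg:.0f}")  # 38
```

The same idea explains why reducing decimal places in Excel is safe: the cell keeps its full-precision value and only the display changes.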


Filtering and Sorting Data
You can further analyze your data by using sorting and filtering tools in Excel.

To remove any applied filters, go to the Data tab and click Clear.


Data Cleaning & Exploration | HR DATA
ANALYST: MODULE 1 LESSON 1
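To find the current headcount in the organization, count the number of unique employee IDs in the dataset. Enter the
formula =SUM(1/COUNTIF([ID],[ID])) in cell B837 to count the unique values in column A (the ID column). Each ID that
appears n times contributes n × 1/n = 1 to the sum, so every employee is counted exactly once. The result is 797. In
versions of Excel without dynamic arrays, confirm the formula with Ctrl+Shift+Enter so that it is evaluated as an array
formula.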
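The arithmetic behind =SUM(1/COUNTIF([ID],[ID])) can be checked in Python. This sketch uses a hypothetical list of IDs rather than the HRIS23 data, keeps the sum exact with fractions, and cross-checks the result against a set-based unique count.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical ID column in which employees 102 and 104 appear twice.
ids = [101, 102, 102, 103, 104, 104, 105]

def unique_count_like_excel(values):
    """Mirror =SUM(1/COUNTIF([ID],[ID])): each row adds 1/(occurrences of its ID)."""
    counts = Counter(values)
    # Fraction keeps the sum exact; Excel works in floating point instead.
    return sum(Fraction(1, counts[v]) for v in values)

print(unique_count_like_excel(ids))  # 5
print(len(set(ids)))                 # 5
```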
Identifying and Removing Duplicates
Duplicates bias the analysis, so identify and remove them. Use conditional formatting (Home > Conditional
Formatting > Highlight Cells Rules > Duplicate Values) to highlight duplicate values in the ID column. Then use the
ID column's Filter by Color option and select the light red fill to display only the duplicate rows. There are 76
rows with duplicate IDs, and these rows show inconsistent values in the Department and Hours columns.
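Highlighting duplicate values marks every occurrence of a repeated ID, including the first, which is why the color filter shows both copies of each record. A Python sketch of the same check, using hypothetical rows and a hypothetical helper `highlight_duplicates`:

```python
from collections import Counter

# Hypothetical rows: employee 102 appears twice with conflicting values.
rows = [
    {"ID": 101, "Department": "Sales", "Hours": 40},
    {"ID": 102, "Department": "Sales", "Hours": 20},
    {"ID": 102, "Department": "Finance", "Hours": 20},
    {"ID": 103, "Department": "IT", "Hours": 40},
]

def highlight_duplicates(table, key="ID"):
    """Return every row whose key occurs more than once, like the red highlight."""
    counts = Counter(r[key] for r in table)
    return [r for r in table if counts[r[key]] > 1]

flagged = highlight_duplicates(rows)
print(len(flagged))  # 2
```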
Handling Duplicates
To resolve the duplicates, change the department to "Sales" in all of the filtered rows and double the working
hours to 40. Multiply the salaries by 2 and change the Salary column format to Currency. Then clear the color
filter on the ID column.
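The corrections apply only to the rows left visible by the color filter. A minimal Python sketch, assuming a hypothetical `flagged` marker on each row in place of the filter, invented salary figures, and a dollar sign as the currency symbol:

```python
# Hypothetical duplicate rows left visible by the color filter, plus one normal row.
rows = [
    {"ID": 102, "Department": "Finance", "Hours": 20, "Salary": 25000, "flagged": True},
    {"ID": 102, "Department": "Sales", "Hours": 20, "Salary": 25000, "flagged": True},
    {"ID": 103, "Department": "IT", "Hours": 40, "Salary": 52000, "flagged": False},
]

def fix_flagged(table):
    """Apply the lesson's corrections to flagged rows only."""
    for r in table:
        if r["flagged"]:
            r["Department"] = "Sales"
            r["Hours"] *= 2   # 20 -> 40
            r["Salary"] *= 2

fix_flagged(rows)
for r in rows:
    # Currency formatting changes only how the salary is displayed.
    print(r["ID"], r["Department"], r["Hours"], f"${r['Salary']:,.2f}")
```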
Removing Duplicate Rows
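Remove the duplicate rows by clicking "Remove Duplicates" on the Table Design tab and ticking only the ID checkbox.
Excel keeps the first occurrence of each ID and deletes the rest, removing 38 duplicate values (the 76 highlighted
rows share 38 IDs). This leaves 797 rows, which matches the headcount.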
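A Python sketch of the keep-first rule, using hypothetical rows and a hypothetical helper `remove_duplicates`; the printed line only imitates the style of Excel's confirmation message.

```python
# Hypothetical rows after the corrections: ID 102 is still listed twice.
rows = [
    {"ID": 101, "Department": "Sales"},
    {"ID": 102, "Department": "Sales"},
    {"ID": 102, "Department": "Sales"},
    {"ID": 103, "Department": "IT"},
]

def remove_duplicates(table, key="ID"):
    """Keep the first row for each key and drop later repeats."""
    seen = set()
    kept = []
    for r in table:
        if r[key] not in seen:
            seen.add(r[key])
            kept.append(r)
    removed = len(table) - len(kept)
    return kept, removed

kept, removed = remove_duplicates(rows)
print(f"{removed} duplicate values found and removed; {len(kept)} unique values remain.")
```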
Using Slicers for Filtering
To answer the final question, the number of women in the finance department in the Western region, use slicers.
Click Insert Slicer on the Table Design tab and add slicers for the gender, region, and department columns. Select F in
the gender slicer, West in the region slicer, and Finance in the department slicer. The table displays only the matching
rows, and the total row values update accordingly. The answer is 6 women in the finance department in the Western region.
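Slicers on different columns combine with AND: a row stays visible only if it matches every selection. A Python sketch of that logic, with hypothetical rows and assumed column names (not taken from HRIS23) and a hypothetical helper `slicer_count`:

```python
# Hypothetical cleaned rows; column names are assumed, not from HRIS23.
rows = [
    {"ID": 101, "Gender": "F", "Region": "West", "Department": "Finance"},
    {"ID": 102, "Gender": "M", "Region": "West", "Department": "Finance"},
    {"ID": 103, "Gender": "F", "Region": "East", "Department": "Finance"},
    {"ID": 104, "Gender": "F", "Region": "West", "Department": "Finance"},
    {"ID": 105, "Gender": "F", "Region": "West", "Department": "Sales"},
]

def slicer_count(table, **selections):
    """Count rows matching every slicer selection (slicers combine with AND)."""
    return sum(
        all(r[col] == value for col, value in selections.items())
        for r in table
    )

print(slicer_count(rows, Gender="F", Region="West", Department="Finance"))  # 2
```

In the sample data, only IDs 101 and 104 satisfy all three selections.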
Reporting Findings
You have now answered all three questions and can report your findings to Linda. In this lesson, you learned how to
clean data by removing duplicates and how to use table features like the total row, filters, and slicers for analysis. In the
next lesson, you will summarize workforce demographic data using IF and IFS functions in Excel.
