DATA MANAGEMENT USING MS EXCEL
National Beneficiary Data Management Training
April 28 – May 7, 2016
Kennedy M. Velez
What Types of Analysis Are Performed?
• Completeness Analysis – count of populated records versus blank or null records in the data set.
• Range Analysis – maximum, minimum, average and median values found.
• Pattern Analysis – formats and relations.
• Uniqueness Analysis – unique (distinct) counts found across variables, and duplicates.
General Functions of Excel in Data Profiling and Analysis
1. PivotTable (1 variable) – sum, count, distinct count
2. Statistical Functions
3. Sort & Filter
4. Conditional Formatting
5. Concatenate, Trim and Upper
6. Computing Age
7. CountIF Formulas
8. Vlookup Reference
9. PivotTable (multiple variables) – comparison of variables
10. Logical (if, and, or) Formulas
11. IF ISERROR Formula
12. Detect Duplicates
13. Tables & Graphs/Charts
Do Not Modify Your Original Data Copy!
Back up First!
Data Quality Profiling Using Excel
What is Data Quality Profiling?
• It is the process of statistically examining and analyzing the content of a data source in order to collect information about the data.
• It helps us understand the content, structure, relationships, etc. of the data.
• It assists in the discovery of anomalies in the data.
• It enables us to make early decisions and act accordingly.
Using the PivotTable Functions (1 variable)
- One of Excel's most powerful features: it lets you summarize and extract significance from a large, detailed data set.
Insert > PivotTable
Statistical Functions
1. MAX – to find the maximum value =MAX(range)
2. MIN – to find the minimum value =MIN(range)
3. MEDIAN – to find the middle number =MEDIAN(range)
4. MODE – to find the most frequently occurring number =MODE(range)
5. AVERAGE – to calculate the average of a range of cells. =AVERAGE(range)
6. COUNTBLANK – to count missing or blank cells =COUNTBLANK(range)
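
As a rough illustration of how these functions support completeness and range analysis, the formulas below assume a hypothetical layout in which a numeric variable (for example, age) is stored in B2:B1001. The range is an assumption for illustration only and is not taken from the training data.

```excel
=COUNTBLANK(B2:B1001)
=COUNT(B2:B1001)
=COUNTBLANK(B2:B1001)/ROWS(B2:B1001)
=MAX(B2:B1001)-MIN(B2:B1001)
```

The first formula counts the blank cells, the second counts the cells that contain numbers, the third gives the share of blank cells (format the cell as a percentage), and the fourth gives the spread between the largest and smallest value.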
Alternatively, use the Excel add-in Analysis ToolPak.
• To activate: File > Options > Add-ins > Manage: Excel Add-ins > Go > check Analysis ToolPak > OK
Using the Sort & Filter
Sort – sort one or more columns in ascending or descending order.
Filter – display only the records that meet certain criteria.
Using the Conditional Formatting
- enables you to highlight cells with a certain color, depending on the cell's value.
Concatenate - combines/joins two or more text strings into one.
=CONCATENATE(text1, text2, text3…)
EXAMPLE#1:

EXAMPLE#2: With TRIM & UPPER for consistent data entry
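
A minimal sketch of both examples, assuming a hypothetical layout with the last name in A2, the first name in B2 and the middle name in C2 (the cell positions and the sample name are invented for illustration):

```excel
=CONCATENATE(A2, " ", B2, " ", C2)
=UPPER(TRIM(CONCATENATE(A2, " ", B2, " ", C2)))
```

The first formula joins the three parts with single spaces. The second wraps the result in TRIM, which removes leading and trailing spaces and collapses repeated spaces, and in UPPER, which converts the text to capital letters. With " dela Cruz" in A2, "Juan " in B2 and "santos" in C2, the second formula returns DELA CRUZ JUAN SANTOS.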


Computing Age
=DATEDIF(date1, date2, "Y")
where date1 is the earlier date (e.g. the birth date) and date2 is the later date.
EXAMPLE:

"Y" – complete years
"YM" – months remaining after the complete years
"MD" – days remaining after the complete months

=DATEDIF(date1, date2, "Y") & " Years, " & DATEDIF(date1, date2, "YM") & " Months, " & DATEDIF(date1, date2, "MD") & " Days"

Example Result: 19 Years, 5 Months, 20 Days
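
To make the formula concrete, the sketch below assumes a birth date of 10-Nov-1996 in A2 and a reference date of 30-Apr-2016 in B2. Both cells and both dates are hypothetical, chosen only so that the result matches the example above.

```excel
=DATEDIF(A2, B2, "Y")
=DATEDIF(A2, B2, "Y") & " Years, " & DATEDIF(A2, B2, "YM") & " Months, " & DATEDIF(A2, B2, "MD") & " Days"
=DATEDIF(A2, TODAY(), "Y")
```

With these values the first formula returns 19 and the second returns 19 Years, 5 Months, 20 Days. The third formula computes the age in complete years as of the current date. Note that DATEDIF returns #NUM! when the first date is later than the second, and that Microsoft's documentation warns the "MD" unit can give inaccurate results for some date pairs, so spot-check the day component.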


Using the CountIF Formula
COUNTIF – to count cells that meet a single criterion.
=COUNTIF(range, criteria)
EXAMPLE:

COUNTIFS – to count cells that meet multiple criteria.
=COUNTIFS(range1, criteria1, range2, criteria2)
EXAMPLE: households with multiple updates to the same field

Source: Regular Update Type 5 P1 2016
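
One way to set up both counts, assuming a hypothetical list of updates with the household ID in column A and the name of the updated field in column B (rows 2 to 1001). The layout and the "First Name" label are assumptions for illustration.

```excel
=COUNTIF(B2:B1001, "First Name")
=COUNTIFS($A$2:$A$1001, A2, $B$2:$B$1001, B2)
```

The first formula counts how many updates were made to the First Name field. The second, entered in row 2 and dragged down, counts the rows that share both the household ID and the updated field of the current row, so a result of 2 or more flags a household with multiple updates to the same field.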


Using the Vlookup Reference
- looks for a value in the leftmost column of a table, and then returns a value in the same row
from another column you specify.
=VLOOKUP(lookup_value, table_array, col_index_num, range_lookup)
EXAMPLE #1:

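lookup_value – cell/value to find
table_array – search range
col_index_num – column of the result, counted from the left of the search range
range_lookup – TRUE/1: approximate match; FALSE/0: exact match (recommended)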
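
A minimal sketch of an exact-match lookup, assuming the household ID to find is in A2 and that a hypothetical sheet named Roster holds household IDs in column A and the province in column C. The sheet name and layout are assumptions.

```excel
=VLOOKUP(A2, Roster!$A$2:$C$1001, 3, FALSE)
```

The formula searches the first column of Roster!A2:C1001 for the value in A2 and returns the value from the third column of the same row. The $ signs keep the search range fixed when the formula is dragged down. If no exact match exists, the result is #N/A.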
EXAMPLE #2:
Data Analysis Using MS Excel
• Comparing and contrasting two data sets
• Duplicity Checking
Using the PivotTable Functions (multiple variables)
Insert > PivotTable

Source: Regular Update Type 5 P1 2016


Using the Logical IF Formula
Simple IF Statement:
=IF(condition, result_if_true, result_if_false)
EXAMPLE:

Multiple IF Statement:
=IF(condition1, result_if_true1, IF(condition2, result_if_true2, result_if_false))
EXAMPLE:
AND – returns TRUE only if ALL conditions are met, otherwise FALSE
OR – returns TRUE if ANY condition is met, otherwise FALSE
=IF(AND(condition1, condition2, condition3, …), result_if_true, result_if_false)
EXAMPLE:

=IF(OR(condition1, condition2, condition3, …), result_if_true, result_if_false)


EXAMPLE:
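
The logical formulas above might look like this in practice, assuming a hypothetical roster with the child's age in B2 and the grade level in C2. The cell positions, age cut-offs and result labels are invented for illustration.

```excel
=IF(B2<3, "Below 3", IF(B2<=18, "3-18 YO", "Above 18"))
=IF(AND(B2>=3, B2<=18), "Within range", "Out of range")
=IF(OR(B2="", C2=""), "Incomplete", "Complete")
```

The first formula is a nested IF that sorts the age into three groups. The second uses AND to require both conditions. The third uses OR to flag a record when either cell is blank. Because a blank cell is treated as 0 in a numeric comparison, run the blank check before relying on the age groups.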
Using the IF ISERROR Formula
- a conditional statement that reports whether a value has a match in a range.
=IF(ISERROR(MATCH(lookup_value, lookup_array, 0)), "not match", "match")

EXAMPLE #1: Duplicate check using household id or entry id

1st argument of MATCH – cell/value to find
2nd argument of MATCH – search range
3rd argument of MATCH – 0 (FALSE) for an exact match
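
A sketch of the household ID check, assuming the ID to test is in A2 and the list to check against sits in A2:A1001 of a hypothetical sheet named Sheet2:

```excel
=IF(ISERROR(MATCH(A2, Sheet2!$A$2:$A$1001, 0)), "not match", "match")
```

MATCH returns the position of A2 in the list, or an error when the ID is not found. ISERROR turns that error into TRUE, so IF prints "not match" for IDs that are missing from the list and "match" for IDs that appear in it.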
EXAMPLE #2: Duplicate check using Full Name
3 Steps to Detect Duplicates (Level 1)
1. Create a duplicate reference: concatenate the household names (last name, first name and middle name) into one column.
2. Insert a new column, use the COUNTIF() formula to count how often each full name occurs, then drag the formula down.
3. Filter for counts of 2 and above, sort by name, and you're done! Save the file as the list of possible duplicates.
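
Step 2 could be sketched as follows, assuming the concatenated full names from step 1 are stored in D2:D1001 (a hypothetical range):

```excel
=COUNTIF($D$2:$D$1001, D2)
=IF(COUNTIF($D$2:$D$1001, D2)>1, "possible duplicate", "unique")
```

The first formula returns how many times the full name in the current row appears in the whole column. The second is an optional variant that turns the count into a label that is easy to filter on. On very large lists COUNTIF can take a while to recalculate.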
Step-by-step Duplicity Check
Level 2
1. CREATING A REFERENCE LIST OF POSSIBLE DUPLICATES
- TRIM and UPPER the last name, first name and middle name, then copy-paste each column as values to remove the formulas. (You can delete the old name columns to minimize file size.)
- Concatenate the household names (last name + first name + middle name).
- Count the number of duplicates based on the full name (previously concatenated). Processing time depends on the size of the data.
- Filter for counts of 2 and above, sort by name, then save the file as "Possible Duplicate Reference".
2. MASTER LIST VERSUS POSSIBLE DUPLICATE REFERENCE
- Copy-paste the household IDs from your master list into the Possible Duplicate Reference, use IF ISERROR to flag the matches, then filter to the duplicates.
- The result is the households in your master list that have possible duplicates. Copy-paste their full names (concatenated) to another sheet as a reference list of names.
- Copy-paste your previous reference names again, use IF ISERROR against them, then filter to the duplicates.
- Voilà! The result is the master list household IDs together with their possible duplicates, ready for validation. SAVE THE FILE.
Data Reporting Using MS Excel
Tabular Form (Table and its parts)
Parts of a table: 1) Table Number and Table Title, 2) Column Header, 3) Row Classifier, 4) Body, 5) Source Note

Table 1. Number of Child Beneficiaries per Age and Grade Level

                                   Age Category
Level of Education          3-5 YO     6-14 YO    15-18 YO   Grand Total
No Grade Reported           36,448      77,969      27,099       141,516
Day Care                   103,861      50,582         949       155,392
Kinder                     197,272     160,854       1,791       359,917
Kinder / Day Care            1,109      62,520       1,613        65,242
Grade 1                    121,643     314,199       1,502       437,344
Grade 2                      6,692     683,029       4,194       693,915
Grade 3                      1,028     833,623       8,495       843,146
Grade 4                        589   1,063,993      20,544     1,085,126
Grade 5                        308   1,029,544      38,114     1,067,966
Grade 6                        450   1,105,209     195,563     1,301,222
Grade 7                         97     613,181      77,412       690,690
Grade 8                         69     522,538     181,210       703,817
Grade 9                         79     302,485     300,276       602,840
Grade 10 / 4th Year HS          62      42,561     433,370       475,993
Grade 11                         -          79         423           502
Grade 12                         2          32          90           124
Grand Total                469,709   6,862,398   1,292,645     8,624,752

Source: Pantawid Pamilya Information System as of March 31, 2016
Graphical Forms
When Should You Use Them?
1. Column/Bar Charts
- The most common charts used in presentations; column charts are meant to compare values to each other.
Select data > Insert to see the available chart types.
Sample of 3-D Charts.
Here are the choices if you prefer to modify your charts
2. Pie Charts
- Shaped like a pie; best used when you need to show how much of a larger category is taken up by smaller sub-categories.
• The table shows the categorized approved updates on basic information (update type 9), with a total of 3,000.
• First name has the highest number of updates on basic information and extension name has the lowest.
3. Line Charts
- Typically used to show trends over a period of time.
• The table shows the trend of registered active households from January to April.
• There was a decrease of 2,716, or 6.52%, in the number of registered active households, which might be due to the deactivation of households.
Data Management Using MS Excel Workshop
1. Using the list of approved updates (you can select from Update Type 5, 9 or 11)
• Count the data set records with values or null (blank)
- E.g. distinct count of households, count per field updated, count of new values with blanks, etc.
• Find 3 data inconsistencies
- E.g. old value equals new value, age versus grade level, child beneficiary but not child/grandchild, etc.
2. Using the grantee list and HH roster
• Graph the number of households by client status and province.
• Summarize the households with a count of eligible households by province.
• Compare the grade level and age of the child beneficiaries for education using a two-way table.
3. Using the list of Code 21 households
• Check for possible duplicates using the Level 1 Duplicity Check.
The End
Thank you! 

KEEP IT SIMPLE
