EXCEL
National Beneficiary Data Management Training
April 28 – May 7, 2016
Kennedy M. Velez
What Types of Analysis are performed?
Completeness Analysis – count of populated records in the data set versus
blank or null values
Range Analysis – maximum, minimum, average and median
values found.
Pattern Analysis – formats and relations
Uniqueness Analysis – unique (distinct) count found across
variables & duplicates.
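For readers who want to cross-check their Excel results outside the workbook, the completeness, range and uniqueness analyses above can be sketched in a few lines of Python. This is only an illustration: the function name `profile_column` and the sample `ages` column are hypothetical, and pattern analysis is left out.

```python
from statistics import mean, median

def profile_column(values):
    """Summarize one column; None and blank strings count as missing."""
    present = [v for v in values if v is not None and str(v).strip() != ""]
    numeric = [v for v in present if isinstance(v, (int, float))]
    return {
        "records": len(values),
        "blank": len(values) - len(present),            # completeness
        "min": min(numeric) if numeric else None,       # range
        "max": max(numeric) if numeric else None,
        "average": mean(numeric) if numeric else None,
        "median": median(numeric) if numeric else None,
        "distinct": len(set(present)),                  # uniqueness
        "duplicates": len(present) - len(set(present)),
    }

# Hypothetical sample column with two missing entries
ages = [7, 12, None, 12, 15, "", 9]
profile = profile_column(ages)
```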
General Functions of Excel in Data
Profiling and Analysis
1. PivotTable (1 variable) – sum, count, distinct count
2. Statistical Functions
3. Sort & Filter
4. Conditional Formatting
5. Concatenate, Trim and Upper
6. Computing Age
7. CountIF Formulas
8. Vlookup Reference
9. PivotTable (multiple variables) – comparison of variables
10. Logical (if, and, or) Formulas
11. IF ISERROR Formula
12. Detect Duplicates
13. Tables & Graphs/Charts
Do Not Modify Your Original Data Copy!
Back up First!
Data Quality Profiling Using Excel
What is Data Quality Profiling?
It is the process of statistically examining and analyzing the
content in a data source, hence collecting information about
the data.
It helps us understand the content, structure and relationships
of the data.
It assists the discovery of anomalies in data.
It enables us to make early decisions and act accordingly.
Using the PivotTable Functions (1 variable)
- one of Excel's most powerful features; it lets you summarize and extract meaning from a large,
detailed data set
Insert > PivotTable
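As a rough illustration of what a one-variable PivotTable computes (sum, count and distinct count per group), here is a small Python sketch. The field names `region` and `grant` and the sample rows are made up for the example.

```python
from collections import defaultdict

def pivot_one_variable(rows, group_key, value_key):
    """Group rows by one field and report sum, count and distinct count."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        group: {
            "sum": sum(values),
            "count": len(values),
            "distinct_count": len(set(values)),
        }
        for group, values in groups.items()
    }

# Hypothetical sample data
rows = [
    {"region": "NCR", "grant": 500},
    {"region": "NCR", "grant": 500},
    {"region": "CAR", "grant": 300},
    {"region": "NCR", "grant": 800},
]
pivot = pivot_one_variable(rows, "region", "grant")
```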
Statistical Functions
1. MAX – to find the maximum value =MAX(range)
2. MIN – to find the minimum value =MIN(range)
3. MEDIAN – to find the middle number =MEDIAN(range)
4. MODE – to find the most frequently occurring number =MODE(range)
5. AVERAGE – to calculate the average of a range of cells. =AVERAGE(range)
6. COUNTBLANK – to count missing or blank cells =COUNTBLANK(range)
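The six functions above have close counterparts in Python's standard library, which can be handy for double-checking a result. The sketch below assumes blank cells are represented as `None`; the `scores` list is hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical column; None stands for a blank cell
scores = [4, 8, 8, None, 15, 16, None, 23]
filled = [s for s in scores if s is not None]

stats = {
    "max": max(filled),                 # =MAX(range)
    "min": min(filled),                 # =MIN(range)
    "median": median(filled),           # =MEDIAN(range)
    "mode": mode(filled),               # =MODE(range)
    "average": mean(filled),            # =AVERAGE(range)
    "countblank": scores.count(None),   # =COUNTBLANK(range)
}
```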
Statistical Functions
1. Alternatively, use the Analysis ToolPak add-in
To activate: File > Options > Add-ins > Manage: Excel Add-ins > Go > check Analysis ToolPak > OK
Using the Sort & Filter
Sort – sort one or more columns ascending/descending
Filter – display only the records that meet certain criteria.
Using the Conditional Formatting
- enables you to highlight cells with a certain color, depending on the cell's value.
Concatenate - combines/joins two or more text strings into one.
=CONCATENATE(text1, text2, text3…)
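To show how Concatenate, Trim and Upper work together when building a full-name key, here is a minimal Python sketch. The helper names `clean` and `full_name` are hypothetical, and joining the parts with a single space is a choice made for this example (CONCATENATE itself adds no separator).

```python
def clean(text):
    """Approximate Excel TRIM + UPPER.

    Strips leading/trailing whitespace, collapses internal runs of
    whitespace to one space, then upper-cases. Note: Excel TRIM only
    handles space characters; split() handles all whitespace.
    """
    return " ".join(text.split()).upper()

def full_name(last, first, middle):
    """Join cleaned name parts with a single space, skipping empty parts."""
    parts = (clean(p) for p in (last, first, middle))
    return " ".join(p for p in parts if p)

# Hypothetical sample with messy spacing and casing
key = full_name("  dela  Cruz ", "juan", " santos")
```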
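Computing Age
EXAMPLE #1:
"Y" – complete years
"YM" – remaining months after the complete years
"MD" – remaining days after the complete months
=DATEDIF(date1, date2, "Y") & " Years, " & DATEDIF(date1, date2, "YM") & " Months, " & DATEDIF(date1, date2, "MD") & " Days"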
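The same years/months/days breakdown can be sketched in Python to sanity-check a computed age. This is an approximation under stated assumptions, not a reimplementation of DATEDIF: the function name `age_parts` is hypothetical, it assumes the birth date is not after the reference date, and results may differ from Excel for some end-of-month dates.

```python
import calendar
from datetime import date

def age_parts(birth, today):
    """Return (years, months, days) elapsed from birth to today."""
    # Whole months elapsed, minus one if this month's "birthday" has not arrived
    months_total = (today.year - birth.year) * 12 + (today.month - birth.month)
    if today.day < birth.day:
        months_total -= 1
    years, months = divmod(months_total, 12)

    # Anchor = birth date shifted forward by months_total,
    # with the day clamped to the length of the target month
    shifted = birth.month - 1 + months_total
    anchor_year = birth.year + shifted // 12
    anchor_month = shifted % 12 + 1
    last_day = calendar.monthrange(anchor_year, anchor_month)[1]
    anchor = date(anchor_year, anchor_month, min(birth.day, last_day))

    days = (today - anchor).days
    return years, months, days

# Hypothetical birth date, measured on the first training day
parts = age_parts(date(2010, 5, 15), date(2016, 4, 28))
```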
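Using the Vlookup Reference
=VLOOKUP(value to find, search range, result column, match type)
Value to find – the cell or value to look up
Search range – the table to search; the value is looked up in its first column
Result column – the number of the column in the search range to return
Match type – TRUE/1: approximate match; FALSE/0: exact match (recommended)
EXAMPLE #2: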
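To make the exact-match behaviour concrete, the lookup can be sketched in Python as a scan of the first column. The function name `vlookup_exact` and the sample table are hypothetical; returning `None` stands in for the #N/A error Excel shows when nothing matches.

```python
def vlookup_exact(value, table, col_index):
    """Return column `col_index` (1-based) of the first row whose
    first cell equals `value`; None if no row matches."""
    for row in table:
        if row[0] == value:
            return row[col_index - 1]
    return None  # Excel would display #N/A here

# Hypothetical lookup table: household ID, last name, region
table = [
    ("HH-001", "Cruz", "NCR"),
    ("HH-002", "Reyes", "CAR"),
]
region = vlookup_exact("HH-002", table, 3)
```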
Data Analysis Using MS Excel
• Comparing and contrasting two data sets
• Duplicate checking
Using the PivotTable Functions (multiple variables)
Insert > PivotTable
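A PivotTable with one variable in rows and another in columns is essentially a cross-tabulation of counts. The Python sketch below illustrates the idea; the function name `cross_tab`, the field names and the sample records are assumptions for the example.

```python
from collections import Counter

def cross_tab(rows, row_key, col_key):
    """Count records for every (row value, column value) pair."""
    return Counter((r[row_key], r[col_key]) for r in rows)

# Hypothetical sample records
records = [
    {"age_group": "6-14 YO", "grade": "Grade 1"},
    {"age_group": "6-14 YO", "grade": "Grade 1"},
    {"age_group": "3-5 YO", "grade": "Day Care"},
    {"age_group": "6-14 YO", "grade": "Grade 2"},
]
counts = cross_tab(records, "age_group", "grade")
```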
Using the Logical IF Formula
Multiple IF Statement:
=IF(condition1, result if true1, IF(condition2, result if true2, otherwise false))
EXAMPLE:
AND – returns TRUE only if ALL conditions are met, otherwise FALSE
OR – returns TRUE if ANY condition is met, otherwise FALSE
=IF(AND(condition1, condition2, condition3…), result if true, otherwise false)
EXAMPLE:
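The nested IF, AND and OR patterns map directly onto ordinary conditionals. The Python sketch below is only an illustration: the function names, the age cut-offs and the "Review" rule are hypothetical, and each comment shows a comparable Excel formula.

```python
def age_group(age):
    # =IF(age<=5, "3-5 YO", IF(age<=14, "6-14 YO", "15-18 YO"))
    if age <= 5:
        return "3-5 YO"
    elif age <= 14:
        return "6-14 YO"
    else:
        return "15-18 YO"

def is_monitored(age, enrolled):
    # =IF(AND(age>=3, age<=18, enrolled), "Yes", "No")
    return "Yes" if (age >= 3 and age <= 18 and enrolled) else "No"

def needs_review(age, grade):
    # =IF(OR(age<3, age>18, grade=""), "Review", "OK")
    return "Review" if (age < 3 or age > 18 or grade == "") else "OK"
```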
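Using the IF ISERROR Formula
IF combined with ISERROR wraps a lookup so that a failed match returns a value you choose instead of an error.
Value to find – the cell or value to look up
Search range – the range to search
0 (FALSE) – exact match
EXAMPLE #2: Duplicate check using Full Name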
3 Steps to Detect Duplicates (Level 1)
1. Create a duplicate reference: concatenate the household names (last name, first
name and middle name) as shown below.
2. Insert a new column, use the COUNTIF() formula to count how often each full name occurs, then drag the formula down.
3. Filter for counts of 2 and above, sort by name, and you're
done! Save the file as your list of possible duplicates.
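The three steps above can be sketched in Python as: build a cleaned full-name key, count each key, and keep the keys that occur at least twice. The function name `possible_duplicates` and the sample names are hypothetical, and the trim/upper-case cleaning is an assumption borrowed from the Trim and Upper functions.

```python
from collections import Counter

def possible_duplicates(names):
    """names: iterable of (last, first, middle) tuples.
    Returns sorted (full_name, count) pairs for names seen 2+ times."""
    # Step 1: build the duplicate reference (cleaned, concatenated name)
    keys = [
        " ".join(" ".join(part.split()).upper() for part in name)
        for name in names
    ]
    # Step 2: count occurrences, like COUNTIF over the key column
    counts = Counter(keys)
    # Step 3: keep counts of 2 and above, sorted by name
    return sorted((k, c) for k, c in counts.items() if c >= 2)

# Hypothetical sample names with inconsistent spacing and casing
names = [
    ("Cruz", "Juan", "S"),
    (" cruz ", "JUAN", "s"),
    ("Reyes", "Ana", "M"),
    ("Santos", "Lito", "B"),
    ("reyes", "ana ", "m"),
]
dups = possible_duplicates(names)
```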
Step-by-step Duplicate Check
Level 2
1. CREATING A REFERENCE LIST OF POSSIBLE DUPLICATES
TRIM and UPPER the last name, first name and middle name,
then copy-paste each column as values to remove the
formulas. (You can delete the old name columns to minimize file size.)
Concatenate the household names (last name + first
name + middle name).
Count the number of duplicates based on the full name
(previously concatenated).
- Processing time depends on the size of the data set.
Filter for counts of 2 and above, sort by name, then save the file as
“Possible Duplicate Reference”.
2. Master List versus Possible Duplicate Reference
Copy-paste the household IDs from your master list into the Possible
Duplicate Reference file, then use IF ISERROR as shown
below and filter for duplicates.
Shown below are households in your master list with
possible duplicates. Copy-paste the full names
(concatenated) to another sheet as a reference list of names.
Copy-paste your previous reference names again, then
use IF ISERROR as shown below and filter for duplicates.
VOILÀ! Shown below are the master list household IDs,
including their possible duplicates, for validation. SAVE
THE FILE.
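As a rough cross-check of the Level 2 process, the sketch below builds the set of names that occur more than once and then flags every master list household whose name is in that set. It assumes names are already cleaned and concatenated; the function name `flag_master_list` and the household IDs are hypothetical.

```python
from collections import Counter

def flag_master_list(master):
    """master: list of (household_id, full_name) pairs.
    Returns the rows whose full name occurs 2+ times, sorted by name."""
    # Part 1: reference list of possible duplicates
    counts = Counter(name for _, name in master)
    reference = {name for name, count in counts.items() if count >= 2}
    # Part 2: match the master list against the reference
    # (the role IF ISERROR plays in the workbook)
    flagged = [(hh_id, name) for hh_id, name in master if name in reference]
    return sorted(flagged, key=lambda row: (row[1], row[0]))

# Hypothetical master list
master = [
    ("HH-001", "CRUZ JUAN S"),
    ("HH-002", "REYES ANA M"),
    ("HH-003", "CRUZ JUAN S"),
    ("HH-004", "SANTOS LITO B"),
]
for_validation = flag_master_list(master)
```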
Data Reporting Using MS Excel
Tabular Form (Table and its parts)
Parts of a table:
1) Table Number and Table Title
2) Column Header
3) Row Classifier
4) Body
5) Source Note
Table 1. Number of Child Beneficiaries per Age and Grade Level
Level of Education           Age Category             Grand Total
                        3-5 YO    6-14 YO   15-18 YO
No Grade Reported       36,448     77,969     27,099      141,516
Day Care               103,861     50,582        949      155,392
Kinder                 197,272    160,854      1,791      359,917
Kinder / Day Care        1,109     62,520      1,613       65,242
Grade 1                121,643    314,199      1,502      437,344
Grade 2                  6,692    683,029      4,194      693,915
Grade 3                  1,028    833,623      8,495      843,146
Grade 4                    589  1,063,993     20,544    1,085,126
Grade 5                    308  1,029,544     38,114    1,067,966
Grade 6                    450  1,105,209    195,563    1,301,222
Grade 7                     97    613,181     77,412      690,690
Grade 8                     69    522,538    181,210      703,817
Grade 9                     79    302,485    300,276      602,840
Source: Pantawid Pamilya Information System as of March 31, 2016
Graphical Forms
When Should You Use Them?
1. Column/Bar Charts
- The most common charts used in presentations; column charts are meant for comparing
values with each other.
Select data > Insert, and you’ll see a choice of chart types as shown below.
Sample of 3-D Charts.
Here are the choices if you prefer to modify your charts
2. Pie Charts
- Shaped like a pie; best used when you need to show how much of a
larger category is taken up by its smaller sub-categories.
KEEP IT SIMPLE