You are on page 1of 45

Data Cleaning in Excel

Wow Tricks to make you Faster-Smarter-Better

Founder, YL Academy (Yoda Learning)


Table of Contents
01. Flash Fill
02. Find & Replace (Basic)
02A. Find & Replace with asterisk (*)
02B. Find & Replace with question mark (?)
02C. Find & Replace. Neutralizing Wildcard
02D. Find & Replace in MS Word
02E. Find & Replace – Cell Format
03A. Go To (Special) – Fill Blanks
03B. Go To (Special) – Delete Errors
04A. Text to Columns – Delimiter vs. Fixed Width
04B. Text to Columns – Trailing Minus
04C. Text to Columns - Invalid Dates
05. Data Cleaning using Formulas
06. Bonus: Our videos on YouTube

2
Flash Fill in Excel - Basic to Intermediate Top 30 Data Cleaning Tricks in Excel
https://youtu.be/Mj0Bl_KzGvk https://youtu.be/xg28Q0xB6aI

Text to Columns using Delimited in Excel Excel Data Cleaning Techniques for Data Analyst
https://youtu.be/MmUx13P67uc https://youtu.be/v3PMOr063yg

Text to Columns in Excel Clean Bank Statement (DR/CR) – Excel


https://youtu.be/IKsyUioJ20Y https://youtu.be/TgamD6znfnE

Text to Columns - Split One Column into Multiple Unique Formatting Tricks in Excel
https://youtu.be/f9O-mN6cRvM https://youtu.be/xg28Q0xB6aI

Fill Blank Cells Using Go To (Special)


https://youtu.be/5i8H3DpbB_0

How to Clean up RAW Data in Excel


https://youtu.be/HKA1sPPkfRs

Clean Bank Accounts data for Sales Collection


https://youtu.be/j_FA-0RasPY

3
01 | Flash Fill
Clean data using Flash Fill
Flash Fill (Ctrl + E)
 Flash Fill feature was introduced in Excel 2013. You can find The Flash Fill Option under the Data Tab,
right besides Text to Column.

© YL Academy

© YL Academy

 Flash Fill requires the user to define the pattern in the adjoining / connected cell of the same row with
1-3 output samples.

 You can keep the Flash Fill option ON by default. Go to the File > Options. Click on the Advanced menu
option and ensure that the ”Automatically Flash Fill” box is checked On.

www.yodalearning.com 5
Let’s understand with examples…

www.yodalearning.com 6
 Situation: Your data set has the (1) Bank Name, (2) Bank Account No. of different lengths, and (3) Addl.
IFSC or some Bank code.
 Complexity: You want to extract the Bank Account No. individually in the adjacent column. The relevant
string has been shaded in Yellow. Text to Columns will not help as the data set is unstructured. It doesn’t
have a proper consistent delimiter.

© YL Academy
?
… cont’d

www.yodalearning.com 7
 Solution: Flash Fill Feature needs few output samples. Steps include:
1. Copy the Bank A/c No. 21154000001 from Column B
2. Go to the adjoining cell and paste the Bank A/c No. 21154000001 there with a preceding
apostrophe ‘21154000001 – this will store the number as text
3. Press Enter
4. Now press Ctrl + E to activate Flash Fill
5. Flash Fill Technique will recognize the pattern and automatically fill the desired data in
the blank cells below

How to train Flash Fill?


 Take more samples
 Take diverse samples
 No blank columns in
between – input vs
sample output
 Use Apostrophe for
numbers with
preceding zeroes. E.g. ,
‘007

© YL Academy

www.yodalearning.com 8
02 | Find & Replace (Basic)
Basics

www.yodalearning.com 10
02A | Find & Replace with asterisk (*)
Remove unwanted data quickly
Find & Replace – Using Wildcard characters (* )

Output

Raw data (input)

Asterisk ( * ) : Any number of characters


E.g., Ri* includes Rishi, Rita, Rider

www.yodalearning.com 12
Let’s understand with examples…

www.yodalearning.com 13
 Situation: The data set reflects readings of electricity units consumed (kwh). These have
been measured on different dates and time.
 Complexity: You want to find the total electricity units consumed. To do so, the numbers
highlighted in Yellow should be extracted. E.g. 0.384, 1.201 etc.

 Solution:
1. Choose column A (data set)
2. Press Ctrl + H for Find & Replace
3. Type “*_” (asterisk followed by underscore)
4. Click on Replace All

*_

© YL Academy
© YL Academy

5. “*_” implies that all data set till the last


occurrence of underscore will be removed
Flash Fill can also be used.

www.yodalearning.com 14
02B | Find & Replace with question mark (?)
Remove unwanted data quickly
Find & Replace – Using Wildcard characters (?)

Output

Raw data (input)

Question ( ? ) : Any one character (single)


E.g., S*n includes Sun, Son, Sin

www.yodalearning.com 16
02C | Find & Replace | Neutralizing Wildcard
Neutralizing Wildcard characters to remove them from data
 Situation: The names list contain unnecessary asterisk signs.

 Complexity: Using Find and Replace with asterisk will remove ALL the characters including the names.

© YL Academy

*
?
© YL Academy

 Solution: Use tilde sign


(~) before the asterisk
sign (*) to neutralize ~*
the power of asterisk.
This will only remove
the asterisk signs.
© YL Academy

www.yodalearning.com 18
02D | Find & Replace in MS Word
Target all digits, letters
 Situation: The data set is composed of names and transaction IDs (nos. of different lengths)

 Complexity: You want to split names and numbers in two different columns. Text to Columns will not
help as the data is not structured enough.

 Solution: Use the Find & Replace of MS Word. It has advanced


© YL Academy
options under “More” > “Special”

© YL Academy

Flash Fill can also be used. … cont’d

www.yodalearning.com 20
 The data can be pasted to MS Word twice. On the first, the digits can be removed and on the
other, the letters can be removed.

You can remove


“Any Digit” or “Any letter”.

© YL Academy
© YL Academy

www.yodalearning.com 21
02E | Find & Replace – Cell Format
Target all digits, letters
Find & Replace – Cell Format

• FIND WHAT: Specify the source format


• REPLACE WITH: Specify the target format

www.yodalearning.com 23
03A | Go To (Special)
to fill blank cells with Ctrl + Enter
 Situation: The data set is composed of supplier names along with a series of transactions on the right
side.
 Complexity: You cannot apply Filter, Sort, and Pivot Tables correctly as the cells in between two supplier
names are blank. The list can run in to 1000s of names. How do you fill the blanks?

 Solution:

© YL Academy

?
?
© YL Academy

1. Choose the relevant data set (portion


of Col A & B) … cont’d in the next page

www.yodalearning.com 25
2. Press Ctrl + G to activate Go To Special box
3. Click on “Special” © YL Academy

4. Choose “Blanks” from the options and press


Enter
… cont’d in the next page

2
© YL Academy

www.yodalearning.com 26
5. Press equal to sign (=) and choose the cell above
6. Press Ctrl + Enter

© YL Academy © YL Academy

5 6

With multiple cells selected, Ctrl + Enter will enter the same data / formula logic in all the selected cells at once.

www.yodalearning.com 27
03B | Go To (Special)
to delete errors in the cells
 Situation: Excel data set with many calculations

 Complexity: There are many error cells in the Worksheet. You wish to detect and delete these error cells
in one go.

© YL Academy

… cont’d

www.yodalearning.com 29
 Solution:
1. Choose the data set
2. Press Ctrl + G, click on “Special” button, and press Enter

3. Choose “Formulas” and keep “Errors” option ticked


© YL Academy ON. Press Enter.
4. All the cells with formula-driven errors will be
selected. Press Delete or color the cells from the
Home tab.
5. Note: “Constant” implies no formulas i.e. hard-
3 coded text or number.
4

“Formulas” with “Numbers” can help select those cells whose answers are in number and are based on a calculation.

www.yodalearning.com 30
04A | Text to Columns –
Delimiter vs. Fixed Width
Text to Column with Fixed Width

… cont’d

www.yodalearning.com 32
Trick 1: Ensuring a pre-defined format for exported data @ Step 3 of 3. Applications:

1. Numbers stored as text to “General” format


2. Dates cleaning
3. Retaining prefix zeroes in cases of Credit Card & bank Account nos., ID Codes

www.yodalearning.com 33
Text to Column with Delimiter

www.yodalearning.com 34
04B | Text to Columns –
Cleaning up numbers w. trailing minus sign; replacing Dr/Cr w. +/-
Cleaning up numbers trailing minus sign
▪ Text-to-Columns is also
used to rectify Numbers
with trailing negative (-)
signs. E.g. From 212- to -212

www.yodalearning.com 36
04C | Text to Columns
Correcting invalid Dates
Correcting invalid Dates

• For Correcting Dates –


Apply “Confession Box”.
Choose the mistake or the
current sequence of date
components

• E.g. “DMY” – 29.10.2009 and


“YMD” for 20091031

www.yodalearning.com 38
05 | Data Cleaning using Formulas
LEFT(), RIGHT(), MID()

Extracted Numbers stored


as text. Use VALUE()

www.yodalearning.com 40
SEARCH() vs. FIND() vs. SUBSTITUTE()

Formula version of – Find & Replace

www.yodalearning.com 41
UPPER(), PROPER() & LOWER(); TRIM() & LEN()

www.yodalearning.com 42
VALUE(), T(), N(), REPT()

www.yodalearning.com 43
06 | Bonus: Our videos on YouTube
Flash Fill in Excel - Basic to Intermediate Top 30 Data Cleaning Tricks in Excel
https://youtu.be/Mj0Bl_KzGvk https://youtu.be/xg28Q0xB6aI

Text to Columns using Delimited in Excel Excel Data Cleaning Techniques for Data Analyst
https://youtu.be/MmUx13P67uc https://youtu.be/v3PMOr063yg

Text to Columns in Excel Clean Bank Statement (DR/CR) – Excel


https://youtu.be/IKsyUioJ20Y https://youtu.be/TgamD6znfnE

Text to Columns - Split One Column into Multiple Unique Formatting Tricks in Excel
https://youtu.be/f9O-mN6cRvM https://youtu.be/xg28Q0xB6aI

Fill Blank Cells Using Go To (Special)


https://youtu.be/5i8H3DpbB_0

How to Clean up RAW Data in Excel


https://youtu.be/HKA1sPPkfRs

Clean Bank Accounts data for Sales Collection


https://youtu.be/j_FA-0RasPY

45

You might also like