Task1 KMPG

Uploaded by

MOHAMMAD WASEEM SHAIKH

0% found this document useful (0 votes)

17 views2 pages

Original Title

task1-kmpg (1)

Copyright

Available Formats

DOCX, PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

17 views2 pages

Task1 KMPG

Uploaded by

MOHAMMAD WASEEM SHAIKH

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 2

Search inside document

Dear Manager,

Thank you for providing us with the three datasets from Sprocket Central Pty Ltd. The below table
highlights the summary statistics from the three datasets received. Please let us know if the figures
are not aligned with your understanding.

Table name No. of records Distinct customer ids Date data recieved
Customer 4000 4000 07-06-2020
demographic
Customer address 3999 3999 07-06-2020
Transaction data 20000 3496 07-06-2020

Notable data quality issues that were encountered and the methods used to mitigate the identified
data inconsistencies are as follows. Furthermore, recommendations have been provided to avoid the
reoccurrence of data quality issues and improve the accuracy of the underlying data used to drive
business decisions.

● Additional customer_ids in the ‘Transactions table’ and ‘Customer Address table’ but not in
‘Customer Master (Customer Demographic)’ Mitigation: Please ensure that all tables are from the
same period. Only customers in the Customer Master list will be used as a training set for our model.
This indicates that the data received may not be in sync with each other which may skew the
analysis results if there are missing data records. Please refer to excel file ‘data_outliers.xlsx’ for the
list of outliers between tables.

● Various columns, such as the brand of a purchase, or job title, have empty values in certain records
Mitigation: If only a small number of rows are empty, filter out the record entirely from the training
set for prediction. Else, if it is a core field, impute based on distribution in the training dataset. For
key datasets, such as transactions, less than 1% of transactions (totalling less than 0.1% of revenue)
have missing fields. These records have been removed from the training dataset.

● Inconsistent values for the same attribute (e.g. Victoria being represented as “V”, “Vic” and
“Victoria”) Mitigation: Use regular expression to replaced extended values into abbreviations to
ensure consistency across addresses. Recommendation: Enforce a drop-down list for the user
entering the data rather than a free text field. In order to construct meaningful variables for the
model, the data has been cleaned to avoid multiple representations of the same value. Additionally,
gender records where ‘U’ have been replaced based on the distribution from the training dataset.

● Inconsistent data type for the same attribute (e.g. numeric values for some fields and strings for
others) Mitigation: Convert selected records in characters to numeric. Remove non-numeric
characters from string. Recommendation: Ensure that fact tables in the given database have
constraints on data types. Having different data types for a given field make it difficult to interpret
results at the later stage. Therefore, appropriate data transformations are made to ensure
consistent data types for a given field.

Note: The data and information in this document is reflective of a hypothetical situation and client.
This document is to be used for KPMG Virtual Internship purposes only. Moving forward, the team
will continue with the data cleaning, standardisation and transformation process for the purpose of
model analysis. Questions will be raised along the way and assumptions documented.

After we have completed this, it would be great to spend some time with your data SME to ensure
that all assumptions are aligned with Sprocket Central’s understanding.

Kind regards,

Mimansha Agrawal

Fear: Trump in the White House
From Everand
Fear: Trump in the White House
Bob Woodward
Rating: 3.5 out of 5 stars
3.5/5 (738)
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
From Everand
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
Viet Thanh Nguyen
Rating: 4.5 out of 5 stars
4.5/5 (122)
A Man Called Ove: A Novel
From Everand
A Man Called Ove: A Novel
Fredrik Backman
Rating: 4.5 out of 5 stars
4.5/5 (4610)
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
From Everand
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
Dave Eggers
Rating: 3.5 out of 5 stars
3.5/5 (231)
Grit: The Power of Passion and Perseverance
From Everand
Grit: The Power of Passion and Perseverance
Angela Duckworth
Rating: 4 out of 5 stars
4/5 (589)
The Little Book of Hygge: Danish Secrets to Happy Living
From Everand
The Little Book of Hygge: Danish Secrets to Happy Living
Meik Wiking
Rating: 3.5 out of 5 stars
3.5/5 (401)
Rise of ISIS: A Threat We Can't Ignore
From Everand
Rise of ISIS: A Threat We Can't Ignore
Jay Sekulow
Rating: 3.5 out of 5 stars
3.5/5 (137)
Principles: Life and Work
From Everand
Principles: Life and Work
Ray Dalio
Rating: 4 out of 5 stars
4/5 (599)
Shoe Dog: A Memoir by the Creator of Nike
From Everand
Shoe Dog: A Memoir by the Creator of Nike
Phil Knight
Rating: 4.5 out of 5 stars
4.5/5 (537)
Never Split the Difference: Negotiating As If Your Life Depended On It
From Everand
Never Split the Difference: Negotiating As If Your Life Depended On It
Chris Voss
Rating: 4.5 out of 5 stars
4.5/5 (842)
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
From Everand
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
Margot Lee Shetterly
Rating: 4 out of 5 stars
4/5 (897)
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
From Everand
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
Mark Manson
Rating: 4 out of 5 stars
4/5 (5806)
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
From Everand
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
Ben Horowitz
Rating: 4.5 out of 5 stars
4.5/5 (345)
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
From Everand
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
Gilbert King
Rating: 4.5 out of 5 stars
4.5/5 (266)
John Adams
From Everand
John Adams
David McCullough
Rating: 4.5 out of 5 stars
4.5/5 (2409)
The Emperor of All Maladies: A Biography of Cancer
From Everand
The Emperor of All Maladies: A Biography of Cancer
Siddhartha Mukherjee
Rating: 4.5 out of 5 stars
4.5/5 (271)
Team of Rivals: The Political Genius of Abraham Lincoln
From Everand
Team of Rivals: The Political Genius of Abraham Lincoln
Doris Kearns Goodwin
Rating: 4.5 out of 5 stars
4.5/5 (234)
The World Is Flat 3.0: A Brief History of the Twenty-first Century
From Everand
The World Is Flat 3.0: A Brief History of the Twenty-first Century
Thomas L. Friedman
Rating: 3.5 out of 5 stars
3.5/5 (2259)
Her Body and Other Parties: Stories
From Everand
Her Body and Other Parties: Stories
Carmen Maria Machado
Rating: 4 out of 5 stars
4/5 (821)
Yes Please
From Everand
Yes Please
Amy Poehler
Rating: 4 out of 5 stars
4/5 (1898)
The Glass Castle: A Memoir
From Everand
The Glass Castle: A Memoir
Jeannette Walls
Rating: 4.5 out of 5 stars
4.5/5 (1715)
Angela's Ashes: A Memoir
From Everand
Angela's Ashes: A Memoir
Frank McCourt
Rating: 4.5 out of 5 stars
4.5/5 (440)
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
From Everand
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
Brene Brown
Rating: 4 out of 5 stars
4/5 (1091)
The Light Between Oceans: A Novel
From Everand
The Light Between Oceans: A Novel
M.L. Stedman
Rating: 4.5 out of 5 stars
4.5/5 (789)
A Tree Grows in Brooklyn
From Everand
A Tree Grows in Brooklyn
Betty Smith
Rating: 4.5 out of 5 stars
4.5/5 (1929)
The Art of Racing in the Rain: A Novel
From Everand
The Art of Racing in the Rain: A Novel
Garth Stein
Rating: 4 out of 5 stars
4/5 (4203)
The Perks of Being a Wallflower
From Everand
The Perks of Being a Wallflower
Stephen Chbosky
Rating: 4.5 out of 5 stars
4.5/5 (2104)
Wolf Hall: A Novel
From Everand
Wolf Hall: A Novel
Hilary Mantel
Rating: 4 out of 5 stars
4/5 (3811)
The Woman in Cabin 10
From Everand
The Woman in Cabin 10
Ruth Ware
Rating: 3.5 out of 5 stars
3.5/5 (2520)
Sing, Unburied, Sing: A Novel
From Everand
Sing, Unburied, Sing: A Novel
Jesmyn Ward
Rating: 4 out of 5 stars
4/5 (1104)
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
From Everand
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
Ashlee Vance
Rating: 4.5 out of 5 stars
4.5/5 (474)
On Fire: The (Burning) Case for a Green New Deal
From Everand
On Fire: The (Burning) Case for a Green New Deal
Naomi Klein
Rating: 4 out of 5 stars
4/5 (74)
The Outsider: A Novel
From Everand
The Outsider: A Novel
Stephen King
Rating: 4 out of 5 stars
4/5 (1840)
The Yellow House: A Memoir (2019 National Book Award Winner)
From Everand
The Yellow House: A Memoir (2019 National Book Award Winner)
Sarah M. Broom
Rating: 4 out of 5 stars
4/5 (98)
The Unwinding: An Inner History of the New America
From Everand
The Unwinding: An Inner History of the New America
George Packer
Rating: 4 out of 5 stars
4/5 (45)
The Constant Gardener: A Novel
From Everand
The Constant Gardener: A Novel
John le Carré
Rating: 3.5 out of 5 stars
3.5/5 (104)
Brooklyn: A Novel
From Everand
Brooklyn: A Novel
Colm Tóibín
Rating: 3.5 out of 5 stars
3.5/5 (1946)
Little Women
From Everand
Little Women
Louisa May Alcott
Rating: 4 out of 5 stars
4/5 (104)
Data Wrangling and Visualization
Document48 pages
Data Wrangling and Visualization
Ysa Antonio
No ratings yet
Steve Jobs
From Everand
Steve Jobs
Walter Isaacson
Rating: 4.5 out of 5 stars
4.5/5 (806)
Manhattan Beach: A Novel
From Everand
Manhattan Beach: A Novel
Jennifer Egan
Rating: 3.5 out of 5 stars
3.5/5 (792)
R Data Analysis Cookbook - Sample Chapter
Document29 pages
R Data Analysis Cookbook - Sample Chapter
Packt Publishing
No ratings yet
Bad Feminist: Essays
From Everand
Bad Feminist: Essays
Roxane Gay
Rating: 4 out of 5 stars
4/5 (1016)
Data Cleaning: A Brief Guide To
Document15 pages
Data Cleaning: A Brief Guide To
Subhankari Pradhan
100% (1)
Pytthon For Data Analysis From Scratch
Document37 pages
Pytthon For Data Analysis From Scratch
xwpom2
100% (3)
R For Health Data Science
Document365 pages
R For Health Data Science
ragmar
No ratings yet
Store24 A and B Questions - Fall - 2010
Document2 pages
Store24 A and B Questions - Fall - 2010
Arun Prasad
No ratings yet
Impact of Agriculture On Air Pollution Hristina Harizanova-Bartos
Document6 pages
Impact of Agriculture On Air Pollution Hristina Harizanova-Bartos
MOHAMMAD WASEEM SHAIKH
No ratings yet
Engineering Internship Final Presentation 22209on
Document14 pages
Engineering Internship Final Presentation 22209on
MOHAMMAD WASEEM SHAIKH
No ratings yet
3101.0 National, State and Territory Population TABLE 1. Population Change, Summary - Australia ('000)
Document13 pages
3101.0 National, State and Territory Population TABLE 1. Population Change, Summary - Australia ('000)
MOHAMMAD WASEEM SHAIKH
No ratings yet
Sprocket Central Pty LTD: Data Analytics Approach
Document8 pages
Sprocket Central Pty LTD: Data Analytics Approach
MOHAMMAD WASEEM SHAIKH
No ratings yet
CART v6.0
Document434 pages
CART v6.0
segorin2
No ratings yet
19BCS4030 Final Report
Document107 pages
19BCS4030 Final Report
Shivang Chanda
No ratings yet
A Robust Missing Value Imputation Method For Noisy Data
Document15 pages
A Robust Missing Value Imputation Method For Noisy Data
crackend
No ratings yet
Biological Databases
Document28 pages
Biological Databases
Moiz Ahmed Bhatti
No ratings yet
Early Childhood Predictors of Anxiety in Early Adolescence
Document13 pages
Early Childhood Predictors of Anxiety in Early Adolescence
Dessyadoe
No ratings yet
STROBE Checklist Cross-Sectional
Document2 pages
STROBE Checklist Cross-Sectional
Amalia Riska G
100% (1)
Dealing With Missing Data: Key Assumptions and Methods For Applied Analysis
Document20 pages
Dealing With Missing Data: Key Assumptions and Methods For Applied Analysis
Martin Iglesias
No ratings yet
D3181 PDF
Document6 pages
D3181 PDF
Nasir Majeed
No ratings yet
Implementation of A Smartphone Application in Medical Education: A Randomised Trial (iSTART)
Document9 pages
Implementation of A Smartphone Application in Medical Education: A Randomised Trial (iSTART)
Abdiaziz Walhad
No ratings yet
Rubin 1978
Document26 pages
Rubin 1978
EmmanuelRajapandian
No ratings yet
EM Algorithm: Shu-Ching Chang Hyung Jin Kim December 9, 2007
Document10 pages
EM Algorithm: Shu-Ching Chang Hyung Jin Kim December 9, 2007
Tomislav Petrušijević
No ratings yet
Radford 1974 Vascular Plants Systematics
Document732 pages
Radford 1974 Vascular Plants Systematics
Izabele Munaro
No ratings yet
Epilepsia Levotiracetam Vs Phenobabytal
Document9 pages
Epilepsia Levotiracetam Vs Phenobabytal
BYRON RENE MALDONADO CABRERA
No ratings yet
Bun Bun
Document304 pages
Bun Bun
Alexandra Elena Balanoiu
No ratings yet
Maternal Anxiety and Depression
Document7 pages
Maternal Anxiety and Depression
Cathy Georgiana H
No ratings yet
How To Critically Analyse Psychological Research: The University of Newcastle, Australia
Document13 pages
How To Critically Analyse Psychological Research: The University of Newcastle, Australia
Kate Calhoun
No ratings yet
Welcome To The BI Data Analyst Career Path: Introduction To Data Cheatsheet - Codecademy
Document8 pages
Welcome To The BI Data Analyst Career Path: Introduction To Data Cheatsheet - Codecademy
Rick Hunter
No ratings yet
Iteman Manual
Document32 pages
Iteman Manual
DelikanliX
No ratings yet
Internship Report Sakshi Barapatre
Document36 pages
Internship Report Sakshi Barapatre
Sakshi Barapatre
No ratings yet
Exploratory Data Analysis
Document14 pages
Exploratory Data Analysis
Gagana U Kumar
No ratings yet
Body Image: Meghan M. Gillen, Jamie Dunaev
Document8 pages
Body Image: Meghan M. Gillen, Jamie Dunaev
Rachel Victoriana
No ratings yet
Mice
Document142 pages
Mice
Hasan Tahsin
No ratings yet
Finding Latent Groups in Observed Data in MPLUS
Document11 pages
Finding Latent Groups in Observed Data in MPLUS
Lucía Stephanie Alvarez Nuñez
No ratings yet