Welcome to Scribd!

KPMG Virtual Internship Data Quality Report

Uploaded by

0% found this document useful (0 votes)

200 views2 pages

The document summarizes data quality issues found across three datasets provided by Sprocket Central including missing or inconsistent customer information, transaction details, and addresses. Methods used to address the issues are described along with recommendations to improve data accuracy and consistency such as enforcing dropdown lists and data type constraints. Next steps involve additional data cleaning and standardization before meeting with a Sprocket Central subject matter expert to validate assumptions.

Original Description:

Original Title

ans kpmg

Copyright

Available Formats

DOCX, PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

200 views2 pages

KPMG Virtual Internship Data Quality Report

Uploaded by

ranbir singh

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 2

Search inside document

Note: The data and information in this document is reflective of a hypothetical situation and client.

This document is to be used for KPMG Virtual Internship purposes only.

Dear [Client point-of-contact],

Thank you for providing us with the three datasets from Sprocket Central Pty Ltd. The below table
highlights the summary statistics from the three datasets received. Please let us know if the figures are
not aligned with your understanding.

Table name Accuracy completeness consistency Currency

Customer -DOB inaccurate -Job title blanks -gender inconsistant -diseased

Demogra customers filled
Age missing Customer id incomplete
phic out
Customer Address -customer id incomplete -states inconsistant
Transaction Data -Profit missing
-customer id incomplete -
Online order blanks
Brand blanks
Notable data quality issues that were encountered and the methods used to mitigate the identified data
inconsistencies are as follows. Furthermore, recommendations have been provided to avoid the re-
occurrence of data quality issues and improve the accuracy of the underlying data used to drive business
decisions.
● Additional customer_ids in the ‘Transactions table’ and ‘Customer Address table’
but not in ‘Customer Master (Customer Demographic)’
Mitigation: Please ensure that all tables are from the same period. Only customers in the Customer Master
list will be used as a training set for our model.
This indicates that the data received may not be in sync with each other which may skew the
analysis results if there are missing data records. Please refer to excel file ‘data_outliers.xlsx’ for
the list of outliers between tables.

● Various columns, such as the brand of a purchase, or job title, have empty values
in certain records
Mitigation: If only a small number of rows are empty, filter out the record entirely from the training set for
prediction. Else, if it is a core field, impute based on distribution in the training dataset.
For key datasets, such as transactions, less than 1% of transactions (totalling less than 0.1% of
revenue) have missing fields. These records have been removed from the training dataset.

● Inconsistent values for the same attribute

(e.g. Victoria being represented as “V”, “Vic” and “Victoria”)
Mitigation: Use regular expression to replaced extended values into abbreviations to ensure consistency
across addresses.
Recommendation: Enforce a drop-down list for the user entering the data rather than a free text field.
In order to construct meaningful variables for the model, the data has been cleaned to avoid
multiple representations of the same value. Additionally, gender records where ‘U’ have
been replaced based on the distribution from the training dataset.

● Inconsistent data type for the sameattribute

(e.g. numeric values for some fields and strings for others)
Mitigation: Convert selected records in characters to numeric. Remove non-numeric characters from string.
Recommendation: Ensure that fact tables in the given database have constraints on data types.
Having different data types for a given field make it difficult to interpret results at the later stage.
Therefore, appropriate data transformations are made to ensure consistent data types for a given
field.
Moving forward, the team will continue with the data cleaning, standardisation and transformation process
for the purpose of model analysis. Questions will be raised along the way and assumptions documented.
After we have completed this, it would be great to spend some time with your data SME to ensure that all
assumptions are aligned with Sprocket Central’s understanding.

Kind regards,
[Junior Consultant Name]

KPMG - VI Data Quality Assessment
Document1 page
KPMG - VI Data Quality Assessment
Md Ohidur Rahman Khan
50% (2)
KPMG Data Quality Report for Sprocket Central
Document2 pages
KPMG Data Quality Report for Sprocket Central
Yeji
100% (4)
PLSQL 9 1 Practice
Document7 pages
PLSQL 9 1 Practice
Paola Valdez
0% (3)
#Draft E-MAIL: Godehard - SF
Document6 pages
#Draft E-MAIL: Godehard - SF
Prabhav Garg
100% (1)
ICICI Bank Performance Review
Document32 pages
ICICI Bank Performance Review
Sandesh Kamble
50% (2)
An Introduction On Logica-Be Brilliant Together
Document20 pages
An Introduction On Logica-Be Brilliant Together
Logica Plc
No ratings yet
PM - BD Learnathon
Document3 pages
PM - BD Learnathon
abhinav agarwal
No ratings yet
SAP Engineering Change Management Overview
Document20 pages
SAP Engineering Change Management Overview
Anonymous IVxadA7HR
No ratings yet
Spring Framework Reference Documentation PDF
Document1,194 pages
Spring Framework Reference Documentation PDF
chumix23
No ratings yet
Task 1-Email
Document3 pages
Task 1-Email
hendra kuswantoro
No ratings yet
Data Analytics Consulting: Mohammad Waseem Shaikh 17cs002052
Document16 pages
Data Analytics Consulting: Mohammad Waseem Shaikh 17cs002052
Wasim Shaikh
No ratings yet
Accuracy Completeness Consistency Currency Relevancy Validity
Document2 pages
Accuracy Completeness Consistency Currency Relevancy Validity
Geetanjali
100% (1)
Anjali Final Report
Document51 pages
Anjali Final Report
pioneer professionals
No ratings yet
TASK 1 Data - Quality - Analysis
Document2 pages
TASK 1 Data - Quality - Analysis
hendra kuswantoro
No ratings yet
A Study On Customer Satisfaction of Mobile Wallet Services Provided by Paytm
Document8 pages
A Study On Customer Satisfaction of Mobile Wallet Services Provided by Paytm
IJEMR Journal
No ratings yet
FInal Internship Report-Rajesh Kumar K.R (MB274)
Document58 pages
FInal Internship Report-Rajesh Kumar K.R (MB274)
Bharathi Raju
No ratings yet
Dashboards Vs Story Boards Vs Infographics
Document24 pages
Dashboards Vs Story Boards Vs Infographics
Adil Bin Khalid
No ratings yet
CSR Activity at AMUL
Document12 pages
CSR Activity at AMUL
Tejas Shah
88% (8)
Sprocket Central Pty LTD: Data Analytics Approach
Document7 pages
Sprocket Central Pty LTD: Data Analytics Approach
Raghav Gupta
No ratings yet
Mini Project-Data Mining
Document25 pages
Mini Project-Data Mining
Stuti Prasad
No ratings yet
Prakasha
Document12 pages
Prakasha
Sharan Gowda
No ratings yet
4-Unit Chapter-1 I) G ST Definition:: o The End-Consumer
Document4 pages
4-Unit Chapter-1 I) G ST Definition:: o The End-Consumer
Mohamed Faizal
No ratings yet
Project Report On Airtel
Document56 pages
Project Report On Airtel
Himanshu Chawla
No ratings yet
CRM Project Report
Document125 pages
CRM Project Report
Sameer G
No ratings yet
Comparing Customer Satisfaction with Airtel and Jio
Document12 pages
Comparing Customer Satisfaction with Airtel and Jio
Section D
No ratings yet
Churn Prediction Using Data Mining
Document3 pages
Churn Prediction Using Data Mining
Ozioma Ihekwoaba
No ratings yet
Airtel Brand Management
Document25 pages
Airtel Brand Management
Prakash Bhojwani
No ratings yet
A Study On Drivers of Brand Switching Behaviour of Consumers From Jio To Airtel
Document41 pages
A Study On Drivers of Brand Switching Behaviour of Consumers From Jio To Airtel
Nagarjuna Reddy
No ratings yet
Titan Insurance Case Study Statistical Analysis
Document17 pages
Titan Insurance Case Study Statistical Analysis
BISHNUDEV DEY
No ratings yet
LSCM Assign....
Document13 pages
LSCM Assign....
ARPITA BAGH
No ratings yet
BPOqwr
Document69 pages
BPOqwr
Laxman Zagge
No ratings yet
Reliance Jio Predatory Pricing or Predatory Behaviour Analysis
Document20 pages
Reliance Jio Predatory Pricing or Predatory Behaviour Analysis
Karan Vashee
No ratings yet
SM (Case Study..)
Document3 pages
SM (Case Study..)
gyanesh83
75% (4)
Machine Learning: Types of Algorithms
Document11 pages
Machine Learning: Types of Algorithms
Hernán Ramírez Zavaleta
No ratings yet
Airtel AR 2016-17 CtoC 260717
Document276 pages
Airtel AR 2016-17 CtoC 260717
Sivaganesh Geddada
No ratings yet
Infosys Helps Outotec: Case Study
Document4 pages
Infosys Helps Outotec: Case Study
Shivam
No ratings yet
To Kill or To Reincarnate
Document2 pages
To Kill or To Reincarnate
npraj888312
No ratings yet
Rhythm SIP Report (Clo. Print) PDF
Document27 pages
Rhythm SIP Report (Clo. Print) PDF
chander mauli tripathi
No ratings yet
Study On Customer Satisfaction Regarding E-Commerce Company
Document65 pages
Study On Customer Satisfaction Regarding E-Commerce Company
Shweta Rajput
No ratings yet
KPMG Virtual Internship - Presentation For Client - by - Dhruv Bhatia
Document13 pages
KPMG Virtual Internship - Presentation For Client - by - Dhruv Bhatia
DhruvBhatia
No ratings yet
Default of Credit Card Clients
Document27 pages
Default of Credit Card Clients
deepu
No ratings yet
Customer Churn Prediction
Document19 pages
Customer Churn Prediction
surajkadam
100% (1)
Century Ply Business Solution Design Ver001
Document22 pages
Century Ply Business Solution Design Ver001
Tathagata Dutta
No ratings yet
Indusind Bank Presention
Document12 pages
Indusind Bank Presention
Saurav Sharan
No ratings yet
Casestudy:-Indian Telecom War: Startup Reliance Takes On Leader Airtel in 4G Services
Document7 pages
Casestudy:-Indian Telecom War: Startup Reliance Takes On Leader Airtel in 4G Services
Punith Kumar Reddy
No ratings yet
National Skill India Questionnaire
Document2 pages
National Skill India Questionnaire
Ujlayan Himanshu
100% (1)
BRM Report Writing
Document17 pages
BRM Report Writing
Udit Singh
100% (1)
Case Study Mansi Vishnoi FMS MBA 2020-22-105
Document4 pages
Case Study Mansi Vishnoi FMS MBA 2020-22-105
AAYUSHI MOONKA
No ratings yet
RESEARCH REPORT of PM Rakesh 13
Document10 pages
RESEARCH REPORT of PM Rakesh 13
Rakesh Swain
No ratings yet
Bagedari Sector
Document32 pages
Bagedari Sector
swathi
No ratings yet
MGT1022F20TE1DA218BEC0363
Document13 pages
MGT1022F20TE1DA218BEC0363
Dhruv Kapadia
No ratings yet
Loan risk analysis for personal loan request
Document25 pages
Loan risk analysis for personal loan request
sanjeebani swain
No ratings yet
Main Project PDF
Document56 pages
Main Project PDF
Naira Puneet Bhatia
No ratings yet
Predictive Analysis For Big Mart Sales Using Machine
Document11 pages
Predictive Analysis For Big Mart Sales Using Machine
563 K. Vaishnavi
No ratings yet
Value Chain Analysis of Tata Motors
Document24 pages
Value Chain Analysis of Tata Motors
Nithesh Pawar
No ratings yet
Comparing Financial Performance of ITC and Dabur
Document6 pages
Comparing Financial Performance of ITC and Dabur
sarthak.lad
No ratings yet
Effectieness of CRM at Airtel
Document97 pages
Effectieness of CRM at Airtel
grewal2
67% (3)
Impledge Recruitment Freshers
Document4 pages
Impledge Recruitment Freshers
Shivansh Uttam
No ratings yet
Big Data Governance-Research Report
Document19 pages
Big Data Governance-Research Report
Saif-Ur -Rehman
No ratings yet
Process Modelling N Digital Convergence
Document22 pages
Process Modelling N Digital Convergence
Arshit Mahajan
No ratings yet
Touchpad Plus Ver. 1.1 Class 7
From Everand
Touchpad Plus Ver. 1.1 Class 7
Nisha Batra
No ratings yet
KPMG Task1
Document2 pages
KPMG Task1
2021519199.shafaque
No ratings yet
Addressed Issues
Document3 pages
Addressed Issues
yash rao
100% (1)
0000 Data Management and Data Life Cycle UNHCR
Document14 pages
0000 Data Management and Data Life Cycle UNHCR
Leila Abotira
No ratings yet
SEI CERT C++ Coding Standard 2016 v01
Document435 pages
SEI CERT C++ Coding Standard 2016 v01
John Lau
No ratings yet
8 Online Training BES Design of Bridge Structure and Foundation 24 26 Februry 2021 PDF
Document4 pages
8 Online Training BES Design of Bridge Structure and Foundation 24 26 Februry 2021 PDF
Nilay Gandhi
No ratings yet
Add Operation Manual for MX-M350/M450 Fax Function
Document8 pages
Add Operation Manual for MX-M350/M450 Fax Function
phu polyfax
No ratings yet
Hot Potatoes
Document4 pages
Hot Potatoes
api-233558894
100% (1)
(Overall Equipment Effectiveness) : System Architecture Production Monitoring
Document1 page
(Overall Equipment Effectiveness) : System Architecture Production Monitoring
zaidan.abimanyu
No ratings yet
Manuals
Document239 pages
Manuals
Majid Mehmood
No ratings yet
Chart of Accounts of The Company Code: SAP FICO Minimal Configuration For MM Posting
Document8 pages
Chart of Accounts of The Company Code: SAP FICO Minimal Configuration For MM Posting
vinodbabu_78
No ratings yet
Optus Loop Downloading OptusLoop VoiceUC For Desktop QRG PDF
Document6 pages
Optus Loop Downloading OptusLoop VoiceUC For Desktop QRG PDF
marta
No ratings yet
T50e Replace Drive
Document24 pages
T50e Replace Drive
vcalderonv
No ratings yet
This Study Resource Was: ISO 27001 Documentation Toolkit
Document6 pages
This Study Resource Was: ISO 27001 Documentation Toolkit
fawas hamdi
No ratings yet
LANswitching
Document6 pages
LANswitching
gsicm2
No ratings yet
omxecics530_omegcics_guide_and_reference
Document320 pages
omxecics530_omegcics_guide_and_reference
knobi
No ratings yet
APCS Calculator Time Questions
Document5 pages
APCS Calculator Time Questions
versa005
No ratings yet
RANE Sl2 Manual DJ
Document16 pages
RANE Sl2 Manual DJ
Matthieu Tab
No ratings yet
FT232 FT245 APIv131
Document13 pages
FT232 FT245 APIv131
Kortner
No ratings yet
Javascript
Document81 pages
Javascript
Kirubanandhan Nandagobal
No ratings yet
8086 Memory Segmentation
Document11 pages
8086 Memory Segmentation
MAHALAKSHMI MALINI
No ratings yet
Remedy For GPS Rollover in 2022: GP-80/90/150, GP-1650/1850, SC-50/110 and Others
Document3 pages
Remedy For GPS Rollover in 2022: GP-80/90/150, GP-1650/1850, SC-50/110 and Others
Vikas Singh
No ratings yet
Programming Essentials Webinar Day 1
Document23 pages
Programming Essentials Webinar Day 1
Mimie Cali
No ratings yet
Delivering Quality Customer Service
Document18 pages
Delivering Quality Customer Service
Rhayan De Castro
No ratings yet
Ordercoe - Reu 611 59NVD
Document2 pages
Ordercoe - Reu 611 59NVD
Thor Odin
No ratings yet
Dialog Boxes: Popup child windows for special input/output
Document6 pages
Dialog Boxes: Popup child windows for special input/output
Jversha
No ratings yet
Big Data Workshop Contents
Document2 pages
Big Data Workshop Contents
Sunil Patil
No ratings yet
An Airport Flights Database System Report
Document25 pages
An Airport Flights Database System Report
Bilbil Isma
No ratings yet
2021, Annual Procurement Plan
Document178 pages
2021, Annual Procurement Plan
Mark Santos
No ratings yet
Design and Fabrication of A Password Protected Vehicle Security and Performance Monitoring System
Document6 pages
Design and Fabrication of A Password Protected Vehicle Security and Performance Monitoring System
Raheel Fareed
No ratings yet