
Wrangle Report

Introduction

In this project, I have tried to use most of the techniques shown in the Data Wrangling chapter.

I have organized the project into five main steps: gathering, assessing, cleaning, analyzing, and reporting.

The data wrangling process consists of the first three steps:


1- Data gathering
2- Data assessing
3- Data cleaning

For the first step, gathering and collecting the data needed for the project, we used different techniques for
different data sources, such as JSON files, TSV files, web scraping, the Twitter API, etc.

The project provides three different sources, and the inputs and variables needed are scattered across the three
files; the student's role is to collect them all.

Some of the files were provided ready-made for the project; others were downloaded directly from the URLs
provided.
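The gathering step can be sketched with pandas as below. This is a minimal sketch, not the project's actual code: the file contents are small hypothetical samples held in strings, the archive is assumed to be a CSV, and the column names (`tweet_id`, `p1`, `p1_conf`, `retweet_count`, `favorite_count`) are assumptions. In the real project the TSV would be downloaded from its URL and the JSON lines would come from the Twitter API; web scraping and API calls are not shown.

```python
import io
import json

import pandas as pd

# Hypothetical sample contents standing in for the real files.
ARCHIVE_CSV = (
    "tweet_id,name,rating_numerator,rating_denominator\n"
    "1,Phineas,13,10\n"
    "2,a,12,10\n"
)
PREDICTIONS_TSV = "tweet_id\tp1\tp1_conf\n1\tgolden_retriever\t0.9\n"
TWEET_JSON_LINES = (
    '{"id": 1, "retweet_count": 50, "favorite_count": 300}\n'
    '{"id": 2, "retweet_count": 20, "favorite_count": 120}\n'
)


def load_sources():
    """Load each source into its own DataFrame."""
    # CSV: the default comma separator
    archive = pd.read_csv(io.StringIO(ARCHIVE_CSV))
    # TSV: same reader, tab separator
    predictions = pd.read_csv(io.StringIO(PREDICTIONS_TSV), sep="\t")
    # JSON: one object per line; keep only the fields needed later
    rows = []
    for line in io.StringIO(TWEET_JSON_LINES):
        if line.strip():
            tweet = json.loads(line)
            rows.append({
                "tweet_id": tweet["id"],
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            })
    counts = pd.DataFrame(rows)
    return archive, predictions, counts


archive, predictions, counts = load_sources()
```

Keeping each source in its own DataFrame at this stage makes it easier to assess each one separately before combining them.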

For the second step: after gathering the data, we have all the inputs needed to start the project; the role now is
to assess the data and extract useful information.

The questions we ask most often are: Is the data consistent? Is it valid? Is it tidy?

Both visual and programmatic assessment techniques were used, and many quality and tidiness issues were found,
as requested in the project.
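Programmatic assessment can be sketched as a small helper that collects a few standard pandas checks. This is an illustrative sketch only: the helper name `assess` is hypothetical, it assumes a `name` column exists, and the sample rows are made up.

```python
import pandas as pd


def assess(df: pd.DataFrame) -> dict:
    """Run a few programmatic checks used during assessment."""
    return {
        # Validity: are the column types the expected ones?
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Completeness: how many values are missing per column?
        "missing": df.isnull().sum().to_dict(),
        # Consistency: how many rows are exact duplicates?
        "duplicate_rows": int(df.duplicated().sum()),
        # The most frequent values often reveal placeholders such as "a"
        "top_names": df["name"].value_counts().head(3).to_dict(),
    }


# Hypothetical sample: tweet_id stored as text (a validity issue),
# one duplicated row and one missing name.
sample = pd.DataFrame({
    "tweet_id": ["1", "2", "2", "3"],
    "name": ["Phineas", "a", "a", None],
})
report = assess(sample)
```

Visual assessment (scrolling through the tables) complements these checks, since placeholders such as the string "None" are not counted as missing by `isnull()`.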

Below are some of the findings about quality and tidiness issues:


Quality issue 1: wrong denominator values
Quality issue 2: wrong numerator values
Quality issue 3: missing or invalid names (745 "None", 55 "a", etc.)
Tidiness issue 1: each variable does not form a column (p1/p2/p3)
Tidiness issue 2: the data is split across 3 dataframes; a single dataframe is sufficient to analyze the project
Quality issue 4: repeated URLs (inconsistency)
Quality issue 6: missing rows (completeness); wrong dtypes and variable types (validity)
Quality issue 7: incompleteness (image information is available for only 2075 tweets)
Quality issue 10: names in lower case are invalid
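Some of the issues listed above can be flagged programmatically. The sketch below is hypothetical: it assumes columns named `name` and `rating_denominator`, and it treats any denominator other than 10 as suspicious, which is an assumption about the rating convention rather than a rule stated in this report. The sample rows are made up.

```python
import pandas as pd


def flag_issues(archive: pd.DataFrame) -> pd.DataFrame:
    """Return one boolean flag column per checked quality issue."""
    # Treat real missing values like the "None" placeholder
    names = archive["name"].fillna("None")
    return pd.DataFrame({
        # Quality issue 1 (assumption: valid denominators equal 10)
        "odd_denominator": archive["rating_denominator"] != 10,
        # Quality issue 3: the literal string "None" used as a name
        "missing_name": names == "None",
        # Quality issue 10: lower-case "names" such as "a" are usually words
        "lowercase_name": names.str.islower(),
    })


# Hypothetical sample rows
archive = pd.DataFrame({
    "name": ["Phineas", "a", "None", "Bella"],
    "rating_denominator": [10, 10, 10, 70],
})
flags = flag_issues(archive)
```

Summing the result (`flags.sum()`) gives a count per issue, which is a quick way to document how widespread each problem is.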

In the last wrangling step, the cleaning phase, I tried to resolve all the issues raised during the assessment
phase, using drop and dtype conversions, as well as the important melt and merge features.
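A few of these cleaning operations can be sketched as follows. This is a hypothetical sketch, not the author's code: the function name `clean` and the column names (`tweet_id`, `p1`, `p2`, `p3`) are assumptions, melting `p1`/`p2`/`p3` into long form is only one reading of tidiness issue 1, and inner joins are one possible choice for combining the three tables.

```python
import pandas as pd


def clean(archive, predictions, counts):
    """Sketch of a few cleaning steps; column names are assumptions."""
    # Work on copies so the raw tables can still be re-assessed later
    archive, predictions, counts = [
        df.copy() for df in (archive, predictions, counts)
    ]

    # Validity: give tweet_id the same integer dtype in every table
    for df in (archive, predictions, counts):
        df["tweet_id"] = df["tweet_id"].astype("int64")

    # Quality: drop repeated tweets
    archive = archive.drop_duplicates(subset="tweet_id")

    # Tidiness 1 (one interpretation): melt p1/p2/p3 into rank + breed
    predictions_long = predictions.melt(
        id_vars="tweet_id",
        value_vars=["p1", "p2", "p3"],
        var_name="prediction_rank",
        value_name="breed",
    )

    # Tidiness 2: combine the three tables into one master table.
    # Inner joins keep only tweets present in every source, e.g. only
    # tweets that have image information.
    master = (
        archive.merge(counts, on="tweet_id", how="inner")
        .merge(predictions, on="tweet_id", how="inner")
    )
    return master, predictions_long


# Hypothetical sample tables
archive = pd.DataFrame({"tweet_id": ["1", "2", "2", "3"],
                        "name": ["Phineas", "Tilly", "Tilly", "Archie"]})
predictions = pd.DataFrame({"tweet_id": [1, 2],
                            "p1": ["golden_retriever", "chihuahua"],
                            "p2": ["labrador", "pug"],
                            "p3": ["kuvasz", "beagle"]})
counts = pd.DataFrame({"tweet_id": [1, 2, 3], "retweet_count": [50, 20, 10]})
master, predictions_long = clean(archive, predictions, counts)
```

Converting `tweet_id` to one dtype before merging matters: joining a text key against an integer key would either raise an error or silently fail to match rows.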

Conclusion

These three steps are the foundation of any project; how well they are handled affects the overall project
output, and the analysis may be biased if any of them fails.
Iteration is very important: going back to the beginning, verifying the code, and reviewing the final data
before the analysis step.

