
Wrangle Report


In this project, I have tried to use most of the techniques shown in the data wrangling chapter.

I have organized the project into five main steps: gathering, assessing, cleaning, analyzing, and reporting.

The data wrangling process consists of the first 3 steps:

1- Data gathering
2- Data assessing
3- Data cleaning

For the first step, gathering, I collected the data needed for the project. I used different
techniques to gather data from different sources, such as JSON files, TSV files, web scraping, the Twitter API, etc.

The project provides 3 different data sources, and the inputs and variables needed are scattered across the 3 files.
The student's role is to collect them all.

Some of the files were provided ready-made for the project; others could be downloaded directly from URLs.
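
As a rough illustration of the gathering step, here is a minimal sketch of loading a TSV file and a line-delimited JSON file with pandas. The file contents and the column names (`tweet_id`, `jpg_url`, `p1`, `retweet_count`, `favorite_count`) are assumptions made up for this example, not the project's actual data. Web scraping and API access are left out because they need network access and credentials.

```python
import io
import json

import pandas as pd

# Hypothetical in-memory stand-ins for the gathered files; in a real project
# these would be files on disk or content downloaded from a URL.
tsv_text = (
    "tweet_id\tjpg_url\tp1\n"
    "1\thttp://example.com/a.jpg\tpug\n"
    "2\thttp://example.com/b.jpg\tbeagle\n"
)
json_lines = (
    '{"id": 1, "retweet_count": 10, "favorite_count": 25}\n'
    '{"id": 2, "retweet_count": 3, "favorite_count": 7}\n'
)

# A TSV file is read like a CSV file, with a tab as the separator.
predictions = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# A JSON file with one object per line: parse each line, then build a table.
records = [json.loads(line) for line in json_lines.splitlines() if line.strip()]
counts = pd.DataFrame(records).rename(columns={"id": "tweet_id"})

print(predictions.head())
print(counts.head())
```

Both tables end up sharing a `tweet_id` column, which is what makes it possible to combine the sources later.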

For the second step, assessing: after gathering, we have all the inputs needed to start the project. The task is to
assess the data and extract useful information.

The questions we ask most often are: is the data consistent, is it valid, is it tidy?

Both visual and programmatic assessment techniques were used. Several quality and tidiness issues were
found, as requested in the project.
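
Programmatic assessment can be sketched with a few pandas calls. The small table below is invented for illustration and its column names are assumptions; the checks themselves (missing values, dtypes, duplicates) are standard pandas operations.

```python
import pandas as pd

# Hypothetical sample standing in for one of the gathered tables.
df = pd.DataFrame({
    "tweet_id": [1, 2, 3, 4],
    "name": ["Rex", "None", "a", "Bella"],
    "expanded_urls": ["u1", "u2", "u2", None],
})

# Completeness: how many values are missing in each column?
missing_per_column = df.isna().sum()

# Validity: which dtype did pandas infer for each column?
column_dtypes = df.dtypes

# Consistency: how many non-missing URLs repeat an earlier one?
duplicate_urls = df["expanded_urls"].dropna().duplicated().sum()

print(missing_per_column)
print(column_dtypes)
print("duplicate URLs:", duplicate_urls)
```

Visual assessment complements this: scrolling through the table in a notebook or spreadsheet often reveals problems that summary counts hide.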

Below are some of the findings on quality and tidiness issues:

Quality issue 1: wrong denominator values
Quality issue 2: wrong numerator values
Quality issue 3: missing or invalid names (745 "None", 55 "a", etc.)
Tidiness issue 1: missing info; each variable does not form its own column (P1/P2/P3)
Tidiness issue 2: there are 3 dataframes, but a single dataframe is sufficient for the analysis
Quality issue 4: repeated URLs (inconsistency)
Quality issue 6: missing rows (incompleteness), wrong dtypes and variable types (validity)
Quality issue 7: incompleteness (image info is available for only 2075 entries)
Quality issue 10: names in lower case (invalid)
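
Some of these quality issues can be flagged with simple filters. The sketch below uses invented rows and assumed column names (`name`, `rating_numerator`, `rating_denominator`). It also assumes that a valid rating has a denominator of 10, which this report does not state explicitly.

```python
import pandas as pd

# Hypothetical rows reproducing the kinds of problems listed above.
df = pd.DataFrame({
    "name": ["Rex", "None", "a", "Bella", "the"],
    "rating_numerator": [12, 13, 1776, 11, 9],
    "rating_denominator": [10, 10, 10, 0, 10],
})

# Assumption: any denominator other than 10 is suspicious.
bad_denominator = df[df["rating_denominator"] != 10]

# Names that are the literal string "None" or written entirely in lower case.
is_placeholder = df["name"].eq("None")
is_lowercase = df["name"].str.islower()
bad_names = df[is_placeholder | is_lowercase]

print(bad_denominator)
print(bad_names)
```

Filters like these only flag candidates; each flagged row still needs a visual check before deciding whether to fix or drop it.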

In the last step, the cleaning phase, I tried to resolve all the issues raised during the assessment
phase. I used drop, dtype conversions, and the important melt and merge features.
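
The cleaning operations named above (drop, dtype conversion, melt, merge) can be sketched as follows. The three small tables and their column names are hypothetical stand-ins for the project's dataframes, and the specific fixes are examples rather than the exact ones applied in the project.

```python
import pandas as pd

# Hypothetical stand-ins for the three gathered dataframes.
archive = pd.DataFrame({
    "tweet_id": ["1", "2"],
    "name": ["Rex", "None"],
    "source": ["x", "y"],
})
predictions = pd.DataFrame({
    "tweet_id": [1, 2],
    "p1": ["pug", "beagle"],
    "p2": ["bulldog", "basset"],
    "p3": ["boxer", "hound"],
})
counts = pd.DataFrame({"tweet_id": [1, 2], "retweet_count": [10, 3]})

# Work on a copy so the original data stays available for comparison.
archive_clean = archive.copy()

# Fix a wrong dtype and drop a column that the analysis does not need.
archive_clean["tweet_id"] = archive_clean["tweet_id"].astype("int64")
archive_clean = archive_clean.drop(columns=["source"])

# Melt the p1/p2/p3 columns into one variable column and one value column.
predictions_long = predictions.melt(
    id_vars="tweet_id",
    value_vars=["p1", "p2", "p3"],
    var_name="prediction_rank",
    value_name="breed",
)

# Merge the three tables into a single dataframe keyed on tweet_id.
master = (
    archive_clean
    .merge(counts, on="tweet_id", how="inner")
    .merge(predictions_long, on="tweet_id", how="inner")
)

print(master)
```

The inner merges keep only tweets present in all three tables; a left merge would be the alternative if rows without image predictions should be retained.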


These 3 steps are the foundation of any project. How well they are handled affects the overall project
output: the analysis may be biased if any of the 3 steps fails.
Iteration is very important: going back to the beginning, verifying the code, and reviewing the final data
before the analysis step.
