Data cleaning: hints and tips

Felicity Clemens
Stata Users Group meeting
London, 17 & 18th May 2005

Felicity Clemens 18 May 2005

Introduction
Data cleaning is one of the most time-consuming jobs of all!
There are many ways of attacking the same problem when using Stata
The talk will describe some common problems and propose possible solutions
These are mostly reminders!

Contents
1) Introduction to the first datasets
2) Identifying and removing duplicates by hand
3) Merging data and uses of the merge command
4) Generating a moving target variable

The study
A case-control study carried out across three central European countries
Exposure of interest: exposure to chemicals in the environment
Outcome of interest: cancer



Identifying duplicates in a dataset
This can be done automatically (using the duplicates set of commands)
We will demonstrate a manual method of identifying duplicates
Two different possibilities:
The same data have been entered on more than one occasion;
Different data have been entered using the same identifier (id number)

The merge command

An essential command in the data management of most large studies
There are many different uses of the merge command. We look at two of them:
Simple merge on id
Multiple merge on id
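The Python sketch below illustrates a merge on id under several assumptions. Each dataset is a list of dict records, and `id` uniquely identifies the master records. The using data may hold several records per id. That is one possible reading of a "multiple" merge, and the talk may mean something else. The `_merge` codes mimic Stata's convention (1 = master only, 2 = using only, 3 = matched). As with Stata's default behaviour, master values are kept when both datasets contain the same variable. `merge_on_id` is a hypothetical helper, not a Stata or library API.

```python
from collections import defaultdict


def merge_on_id(master, using, key="id"):
    """Match-merge two lists of dict records on `key`.

    Assumes `key` is unique in `master`; `using` may contain several
    records per key (one-to-many). Each output row carries a `_merge`
    code modelled on Stata's: 1 = master only, 2 = using only, 3 = matched.
    """
    ids = [r[key] for r in master]
    if len(ids) != len(set(ids)):
        raise ValueError(f"{key} does not uniquely identify master records")

    # Index the using records by key
    using_by_id = defaultdict(list)
    for r in using:
        using_by_id[r[key]].append(r)

    out = []
    for m in master:
        matches = using_by_id.get(m[key], [])
        if not matches:
            out.append({**m, "_merge": 1})  # no partner in using
        for u in matches:
            # master values override using values for shared variable names
            out.append({**u, **m, "_merge": 3})

    master_ids = set(ids)
    for u in using:
        if u[key] not in master_ids:
            out.append({**u, "_merge": 2})  # no partner in master
    return out


cases = [{"id": 1, "country": "A"}, {"id": 2, "country": "B"}]
exposure = [{"id": 1, "year": 1990}, {"id": 1, "year": 1991}, {"id": 3, "year": 1990}]
result = merge_on_id(cases, exposure)
```

Checking the distribution of `_merge` after every merge (in Stata, `tabulate _merge`) is a quick way to spot ids that failed to match.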

Identifying a moving target
Scenario: we have data for each town giving the chemical concentration for each year between 1982 and 2002
Problem: counting backwards from 2002, we need to identify the year in which the chemical concentration changed from its 2002 level
Why? We need to overwrite the 2002 value with a new value, and continue overwriting backwards until the year in which the value changed

Identifying a moving target (2)
[Example data table: one row per rescode (1010113 to 1010120), with one concentration variable per year (y1990, y1991, y1992); cell values not preserved]


Identifying a moving target (3)
We will use the forval loop to compare each year's observed value with the observed value for the previous year
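The backward comparison can be sketched in Python as below. This is an illustrative analogue of the Stata `forvalues` (abbreviated `forval`) loop, not the talk's code. It assumes that each town's series is held as a dict mapping year to concentration for 1982 to 2002. The function names are hypothetical, and the replacement value is passed in because the slides do not say where it comes from. Here the change year is taken to be the first year of the 2002 level, that is, the latest year whose value differs from the previous year's value.

```python
def start_of_current_level(values, first_year=1982, last_year=2002):
    """Count backwards from last_year and return the year in which the
    last_year level began: the latest year t whose value differs from
    the value for t - 1. Returns first_year if the series never changes."""
    for year in range(last_year, first_year, -1):
        # Compare this year's observed value with the previous year's
        if values[year] != values[year - 1]:
            return year
    return first_year


def overwrite_current_level(values, new_value, first_year=1982, last_year=2002):
    """Return a copy of `values` in which new_value replaces the last_year
    value and every earlier year back to the start of that level."""
    start = start_of_current_level(values, first_year, last_year)
    updated = dict(values)
    for year in range(start, last_year + 1):
        updated[year] = new_value
    return updated


# Example: level 5.0 up to 1997, then 7.5 from 1998 to 2002
series = {year: 5.0 for year in range(1982, 1998)}
series.update({year: 7.5 for year in range(1998, 2003)})
print(start_of_current_level(series))  # 1998
updated = overwrite_current_level(series, 9.0)
```

Because every year after the change point equals the 2002 value, comparing each year with its predecessor gives the same answer as comparing each year directly with 2002.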


Summary
Identifying duplicates can be done by hand or automatically using the duplicates set of commands
Using the merge command to merge on a specific variable and to carry out multiple merges of datasets
Generating a moving target variable using the forval loop
