NAMP Data Migration SOP

1. Google Drive Links

Raw Data Folder: https://drive.google.com/open?id=1-2ETLaCttVf18EfIhpnGuKLScR7o_Wif

Standard Format: https://drive.google.com/open?id=10RKwCOPmDJXoo1oSRnv91E7na3bhQOMH

Tracking Sheet: https://docs.google.com/spreadsheets/d/1De5JOjuUlgtoxqmeKcjh9B3HrDzGoLxRuBMXun-a80o/edit?usp=sharing

Corrected Data Folder: https://drive.google.com/drive/folders/1kfm0weNk5gi5loxjQaIUdBV9QgunIq6p?usp=sharing

2. SOP FOR CORRECTING THE FORMAT

2.1. Getting the raw data

Raw data is arranged into year-wise folders. Each team will be assigned the data of a specific state for a particular year. The assigned raw data can be downloaded from the Raw Data Folder link provided in Section 1.

2.2. Creating separate Excel files for separate stations

Some raw data files hold multiple stations in a single Excel workbook, with one sheet per station. The NAMP Portal, however, accepts only one sheet per Excel workbook.

Hence, if an Excel workbook contains any additional sheet, that sheet should be moved to a new Excel file, so that each Excel file has exactly one sheet.

For example, the raw data of Chittoor, Andhra Pradesh for 2018 was in the file Chittoor2.xlsx. This file has two sheets, one for station 582 and another for station 742. Hence, while handling this file, two new files are created, one for each station: 2018_Chittoor_582.xlsx and 2018_Chittoor_742.xlsx.

The filename of the corrected file should be in the format <year>_<city name>_<station code>.xlsx
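The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the validation pattern are hypothetical, and the pattern assumes a four-digit year and a numeric station code, as in the Chittoor example.

```python
import re
from typing import Union

def corrected_filename(year: int, city: str, station_code: Union[int, str]) -> str:
    """Build '<year>_<city name>_<station code>.xlsx' for a corrected file."""
    city_clean = city.strip()
    if not city_clean:
        raise ValueError("city name must not be empty")
    return f"{int(year)}_{city_clean}_{str(station_code).strip()}.xlsx"

# Assumption: 4-digit year, any city name, numeric station code.
FILENAME_RE = re.compile(r"^\d{4}_.+_\d+\.xlsx$")

def is_valid_corrected_filename(name: str) -> bool:
    """Return True if the name follows the corrected-file naming format."""
    return FILENAME_RE.match(name) is not None

print(corrected_filename(2018, "Chittoor", 582))  # 2018_Chittoor_582.xlsx
```

A check such as is_valid_corrected_filename("Chittoor2.xlsx") returns False, which can help catch files uploaded under their raw names.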

2.3. Making corrections in the format

1. In the raw data sheet, create two new empty rows at the top.

2. Paste the two header rows from the Standard Format into the sheet being corrected.

3. Check that the data in each column of the sheet matches the parameter name in the newly pasted header rows. Move columns as necessary so that all the data in the sheet corresponds to the header row parameters.

Example:
Raw Data File: https://drive.google.com/open?id=1cndD1qIgQKenduqyhhVuBa629yrUiKAt

Corrected Data File: https://drive.google.com/open?id=1bpgIcfhgahpk5nudajkjuAAnCxuwDav4

Kindly go through these files to understand the differences between a raw data file and a corrected data file.

4. Remove the old header rows once the matching is done. Only two header rows should remain, i.e. the header rows pasted from the Standard Format.

5. If the raw data has any extra column that is not part of our standard format, it should be removed after asking in the group.

6. Similarly, if there are rows at the bottom containing additional calculations such as average, min, max, etc., which do not belong in raw data, these rows should be removed after asking in the group.

7. Once the corrections are done, the file should be uploaded to the Corrected Data Folder in Google Drive, whose link is provided in Section 1.
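Step 6 above can be partly automated by flagging candidate rows for discussion in the group. The following is a minimal sketch, assuming each sheet row has already been read into a list of cell values. The label list and the function name are hypothetical and not part of the SOP, and the function only reports rows, it does not delete them.

```python
# Assumption: summary rows start with one of these labels (case-insensitive).
SUMMARY_LABELS = ("average", "avg", "mean", "min", "max", "total")

def find_summary_rows(rows):
    """Return indices of trailing rows whose first non-empty cell
    looks like a summary label such as 'Average' or 'Max'."""
    flagged = []
    # Walk upwards from the last row until real data is reached.
    for index in range(len(rows) - 1, -1, -1):
        cells = [str(c).strip().lower() for c in rows[index] if c not in (None, "")]
        if not cells:
            continue  # blank row at the bottom, keep looking upwards
        if any(cells[0].startswith(label) for label in SUMMARY_LABELS):
            flagged.append(index)
        else:
            break  # first row that looks like data ends the scan
    return sorted(flagged)

rows = [
    ["01-01-2018", 12.0, 30.5],
    ["02-01-2018", 14.0, 28.0],
    [None, None, None],
    ["Average", 13.0, 29.25],
    ["Max", 14.0, 30.5],
]
print(find_summary_rows(rows))  # [3, 4]
```

The flagged indices are 0-based positions in the list and should be confirmed in the group before any row is removed.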

2.4. Some common types of corrections

1. Dates should be in the format ‘dd-mm-yyyy’. No ‘/’ should be used in a date.

2. In some sheets, extra rows at the bottom contain analysis of the data. These rows are to be removed.

3. Ensure that the columns in the final sheet extend only up to ‘EL’.

4. In some raw data sheets, the headers do not contain the time duration. For example, our standard format has six columns for four-hourly SO2, i.e. SO2_6AM_10AM, SO2_10AM_14PM, SO2_14PM_18PM, SO2_18PM_22PM, SO2_22PM_2AM, SO2_2AM_6AM.

In some files, however, the raw data headers are given simply as SO2, SO2, SO2, SO2, with no time duration mentioned. In that case, the data should be moved to the first four SO2 columns of the standard format.

5. Some columns do not correspond to a valid time period. For example, NH3 (24:00-00:00) is not valid because the initial time is the same as the final time. Such columns generally do not have any data and shall be deleted after mentioning it in the group.
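The date rule in item 1 above can be sketched as a small helper. This is an illustration under stated assumptions: the function name is hypothetical, and the raw dates are assumed to be written day first with either ‘/’ or ‘-’ as the separator and a four-digit year.

```python
from datetime import datetime

def normalize_date(text: str) -> str:
    """Convert a day-first date such as '5/3/2018' to 'dd-mm-yyyy'.

    Raises ValueError if the text is not a valid calendar date.
    """
    cleaned = text.strip().replace("/", "-")
    # strptime validates the date (e.g. rejects 31-02-2018).
    parsed = datetime.strptime(cleaned, "%d-%m-%Y")
    return parsed.strftime("%d-%m-%Y")

print(normalize_date("5/3/2018"))    # 05-03-2018
print(normalize_date("31-12-2018"))  # 31-12-2018
```

Dates stored month first, or as Excel date serial numbers, are outside this sketch and would need separate handling.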

2.5. Updating the Shared Tracking Sheet

The Shared Tracking Sheet (link provided in Section 1) is divided into tabs. The tab ‘Main Tracking Sheet’ is for the executive summary and should not be modified by anyone.

The next tabs are ‘2018’, ‘2017’, and ‘2016’. Go to the tab corresponding to your assigned data and make an entry for it if one is not already there, with one entry per station. In columns ‘G-R’, enter Y for the months for which raw data is available and N for the other months.

Leave column ‘S’, i.e. ‘Verification Status’, empty, as the verification will be done by another team member who has not corrected the data.

Column ‘V’ is for special remarks. Please mention any special observation, such as whether any extra column or row was deleted. Existing remarks in the tracking sheet can be used as a reference.

3. SOP FOR VALIDATING THE DATA

3.1. Assignment Criteria

The verification shall be done by a team member who has not handled the file for corrections.

3.2. Fetching the Raw Data and Corrected Data

Data of assigned stations can be downloaded from the links mentioned in Section 1 for the Raw Data Folder and the Corrected Data Folder.

3.3. Checking the format

Verify that the format of the corrected data corresponds to the standard format. Columns should extend only up to ‘EL’. There should be no extra columns and no rows containing analysis.
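The ‘EL’ limit can be checked numerically by converting the column letters to a position. The sketch below is only an illustration: the helper names are hypothetical, and the header row is assumed to be available as a list of cell values.

```python
def column_index(letters: str) -> int:
    """Convert spreadsheet column letters to a 1-based index (A=1, Z=26, AA=27)."""
    index = 0
    for ch in letters.strip().upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column letter: {ch!r}")
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

LAST_COLUMN = column_index("EL")  # 5 * 26 + 12 = 142

def columns_beyond_limit(header_row, limit=LAST_COLUMN):
    """Return 1-based positions of non-empty header cells after the limit."""
    return [pos for pos, cell in enumerate(header_row, start=1)
            if pos > limit and cell not in (None, "")]

print(LAST_COLUMN)  # 142
```

An empty list from columns_beyond_limit means no header cell was found past column ‘EL’.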

3.4. Checking data integrity

Data integrity is to be checked column-wise.

To check the data integrity of a particular column:
A) Take the sum of that column in the raw data sheet.
B) Take the sum of that column in the corrected data sheet.

If both sums are equal, the data for that column is correct and has not been altered due to any error.
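The sum comparison in A) and B) can be sketched as follows. This is a minimal illustration, not a prescribed tool. It assumes each column has been read into a list of cell values, that blank or non-numeric cells (for example a text marker) are skipped, and that a small tolerance is acceptable for floating-point rounding. The function names and the tolerance value are hypothetical.

```python
import math

def column_sum(values):
    """Sum the numeric cells of a column, skipping blanks and text."""
    numbers = []
    for value in values:
        if value is None or isinstance(value, bool):
            continue
        if isinstance(value, (int, float)):
            numbers.append(float(value))
            continue
        try:
            numbers.append(float(str(value).strip()))
        except ValueError:
            continue  # non-numeric text is ignored
    return math.fsum(numbers)

def columns_match(raw_values, corrected_values, tolerance=1e-6):
    """Return True if both columns have the same sum within the tolerance."""
    return math.isclose(column_sum(raw_values), column_sum(corrected_values),
                        abs_tol=tolerance)

raw = [12.5, "7.5", None, "BDL", 3]
corrected = [12.5, 7.5, 3.0]
print(columns_match(raw, corrected))  # True
```

Equal sums show that the column total is unchanged. Values reordered within the same column would give the same sum, so a row-by-row spot check may still be useful.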

3.5. Updating the Shared Tracking Sheet

After verifying a station, the verifying person shall record in the shared tracking sheet that the station has been verified. This update shall be made in column ‘S’ of the row corresponding to that station. Special observations or comments can be mentioned in column ‘V’ of the tracking sheet.

3.6. Copying the file to Verified Data folder

Once the verification is complete and the entry is made in the shared tracking sheet, the file should be copied to the ‘Verified Data’ folder, from where it will be picked up by the script and uploaded to the NAMP Portal.
