DATA BACKUP/RECOVERY

TRAINING
Dehner M. De Leon
Officer – Database Administration
IRRI Social Sciences Division
WHY IS THERE A NEED?
Unexpected computer glitches can always happen:
• hardware failure,
• corrupted files,
• virus attacks,
• power failures and spikes,
• accidental deletion of an essential file, or
• something else that makes it impossible to open and read a file.

WHY IS THERE A NEED?
As per IRRI’s BCM-RMQA
• Yearly risk assessment (2008-2010)
• Possible loss of research data
o Risk sources – technological failure
o Physically hazardous environments
o Lack of training/awareness
o Disasters/calamities
o Relocation of IT equipment (2011)

SSD Data Backup/ Retrieval Procedure
Objective:
The purpose of this procedure is to safeguard all important working files and databases on IRRI SSD/GIS workstations and laptops, in compliance with IRRI’s Business Continuity Management (BCM). In the event of data loss, a quick recovery plan is in place.
SSD Data Backup/ Retrieval Procedure
Data to be backed up
• Data commonly stored on the secondary partition





WHAT IS A PARTITION?
Disk partitioning is the act of dividing a hard disk drive into multiple logical storage units, referred to as partitions, so that one physical disk drive can be treated as if it were multiple disks.





[Diagram: one 500GB drive (OS? Data? Etc?) compared with one drive split into two 250GB partitions]
Primary Partition (250GB)
• System Drive C:
• Program Files
• Temp Files
Secondary Partition (250GB)
• Data Drive E:
• My Docs
• pst etc.
• Customized Programs
SSD Data Backup/ Retrieval Procedure
Data to be backed up
• Commonly stored on the secondary partition
• Research data files: all of your files, including Microsoft Office documents, spreadsheets, presentations, databases, graphics files, audio and video files, software, etc.
• Outlook – pst files


SSD Data Backup/ Retrieval Procedure
Data that do not need to be backed up
• Files on USB flash “thumb” drives
• Partition Drive C: (Operating System)
• NAS or \Netwin\SSD
• Data stored in the cloud
• Non-official data
• Source data larger than the backup storage capacity (e.g., 520GB > 500GB)
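The capacity condition in the last bullet can be checked before a backup is started. The following is a minimal Python sketch, not part of the SSD procedure itself; the function names `fits`, `folder_size_bytes`, and `fits_on_target` are hypothetical, and the sketch assumes decimal gigabytes.

```python
import shutil
from pathlib import Path

GB = 10**9  # decimal gigabyte (assumption: sizes are quoted the way drive vendors count them)


def fits(source_bytes: int, capacity_bytes: int) -> bool:
    """Return True if the source data is no larger than the available capacity."""
    return source_bytes <= capacity_bytes


def folder_size_bytes(folder: Path) -> int:
    """Add up the sizes of all regular files under the folder."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def fits_on_target(source: Path, target: Path) -> bool:
    """Compare the source folder size with the free space on the target drive."""
    return fits(folder_size_bytes(source), shutil.disk_usage(target).free)


if __name__ == "__main__":
    # The example from the slide: 520GB of source data against a 500GB backup drive.
    print(fits(520 * GB, 500 * GB))  # False
```

In practice `fits_on_target` would be called with the data partition as the source and the external drive as the target; those paths depend on the workstation and are not assumed here.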


What is a Backup?
In information technology, a backup, or the process of backing up, refers to making copies of data so that these additional copies may be used to restore the original after a data loss event.
• Synchronizing/Mirroring: the process of making sure that files in two or more locations are kept up to date according to certain rules.
• Imaging: a disk image is a single file or storage device containing the complete contents and structure of a data storage medium or device.
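A backup copy is only useful for restoring if it matches the original. One common way to check this is to compare checksums of the two files. The sketch below is an illustration under that assumption; checksum verification is not stated in the procedure, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches_original(original: Path, copy: Path) -> bool:
    """Return True if the backup copy has the same contents as the original."""
    return sha256_of(original) == sha256_of(copy)
```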



Modes of Backup
• One-way Backup: Source to Target
• Two-way Backup: Source to Target and Target to Source
One-Way Synchronization (a.k.a. file mirroring / file
replication / file backup):

Files are expected to change in one location only. To
reconcile the changes, the synchronization process
copies files only in one direction. The two locations
are not considered equivalent. One location is
considered the Source and the other is considered
the Target. Files are pushed from Source to Target
(or files are pulled from Source to Target, but always
in one direction only). Source is said to be mirrored
to Target.
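The one-way rule described above can be sketched in a few lines of Python. This is a simplified illustration, not the behavior of any particular backup product: the function name `mirror_one_way` is hypothetical, and "changed" is assumed to mean a newer modification time or a different size.

```python
import shutil
from pathlib import Path


def mirror_one_way(source: Path, target: Path) -> list[Path]:
    """Copy new or changed files from source to target, never the reverse.

    Returns the relative paths that were copied. Files that exist only
    in the target are left untouched.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = target / rel
        changed = (
            not dst_file.exists()
            or src_file.stat().st_mtime > dst_file.stat().st_mtime
            or src_file.stat().st_size != dst_file.stat().st_size
        )
        if changed:
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)  # copy2 also preserves the modification time
            copied.append(rel)
    return copied
```

Because `shutil.copy2` preserves modification times, running the function a second time without changes at the Source copies nothing.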




Two-Way Synchronization (a.k.a. bi-directional
synchronization or both-ways synchronization):

This synchronization process copies files in both directions
to reconcile changes as needed. Files are expected
to change in both locations. The two locations are
considered equivalent.
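A two-way synchronization can be sketched the same way. The Python below is a minimal illustration under stated assumptions: the newer modification time wins when a file exists in both locations, and deletions are not propagated (real tools keep a record of the previous state to do that). The names `sync_two_way`, `_files`, and `_copy` are hypothetical.

```python
import shutil
from pathlib import Path


def _files(root: Path) -> dict[Path, Path]:
    """Map each file's path relative to root onto its full path."""
    return {p.relative_to(root): p for p in root.rglob("*") if p.is_file()}


def _copy(src: Path, dst: Path) -> None:
    """Copy one file, creating parent folders and preserving timestamps."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)


def sync_two_way(a: Path, b: Path) -> None:
    """Reconcile two equivalent locations by copying in both directions."""
    files_a, files_b = _files(a), _files(b)
    for rel in sorted(files_a.keys() | files_b.keys()):
        in_a, in_b = files_a.get(rel), files_b.get(rel)
        if in_b is None:
            _copy(in_a, b / rel)  # exists only in a
        elif in_a is None:
            _copy(in_b, a / rel)  # exists only in b
        elif in_a.stat().st_mtime > in_b.stat().st_mtime:
            _copy(in_a, in_b)  # a is newer
        elif in_b.stat().st_mtime > in_a.stat().st_mtime:
            _copy(in_b, in_a)  # b is newer
```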




Modes of Backup
One-way Backup (Source to Target)
Pros
• File replication
• File backup
• Can keep previous data
Cons
• Consumes more space if unattended
• A new file on the Target will not be reflected in the Source
• A file deleted or renamed at the Source will remain in the Target
Two-way Backup (Source to Target and Target to Source)
Pros
• Source and Target are up to date
Cons
• Once a file is deleted at the Source, it is also deleted from the Target
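One drawback listed for one-way backup is that files deleted or renamed at the Source remain in the Target and consume space. A short Python sketch, with the hypothetical function name `orphaned_in_target`, shows how such leftover files could be listed for review; it only reports them and deletes nothing.

```python
from pathlib import Path


def orphaned_in_target(source: Path, target: Path) -> list[Path]:
    """List files (as relative paths) that exist in target but no longer in source."""
    source_files = {p.relative_to(source) for p in source.rglob("*") if p.is_file()}
    target_files = {p.relative_to(target) for p in target.rglob("*") if p.is_file()}
    return sorted(target_files - source_files)
```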


What are the available Backup Systems out there?
Proprietary
Bundled with your external storage drive: Seagate, Maxtor, Western Digital, etc.
Free Windows Backup Software
Cobian, Todo, DeltaCopy, Ace, SyncToy, Windows 7 built-in backup feature








2 Main assets of SSD
1. Data
2. People
2 Main risks
1. Loss / Quality of Data
2. Loss of Life (endanger)


Let’s focus on Data
The SSD Database
A comprehensive digital database

• It is an integrated data center for socio-economic data on rice production systems at the farm level, and for rice demand, supply, and related data at the national and regional levels.
• It is composed of primary and secondary data collected through the research projects and activities of SSD.
• It is organized and accessible in a user-friendly fashion.



Household Survey Database
• It is a rich collection of actual farm- and household-level data on rice production, collected through personal farmer interviews, farm record keeping, and periodic monitoring of farm activities at various sites in different rice-growing countries of Asia.
Core activities of SSD
• Data Collection
• Data Management
• Data Analysis
SSD Work Process Flow
• Conceptualize
• Pre-testing of the survey questionnaire
• Field survey
• Training of enumerators
• Coding and quality control of the data
• Confirmation
• Data entry/cleaning
• Data analysis
• Incorporate results
• Publications
• Presentations
[Diagram: data processing steps and tools]
• Data entry
• Data cleaning (key variables)
• Upload to public domain
• Share data among members
• Data merge
• Data cleaning
• Variable construction
• Analysis
Tools: CSPro, Excel/STATA, STATA (program)
Current status of the household survey
data sets
• The SSD survey data sets are scattered: each researcher keeps his or her own project data set.
• Each data set is kept in a format known only to the researcher.
• There is no focal person to ask about who keeps which data set.
• There is no standard protocol for the repository of data collected by SSD and NARES collaborators.
• There is no standard system for collecting and organizing the data sets.


Where’s the Data?
• STRASA
• Africa Rice
• CSISA
• PRSSP
• GSR
• VDS
• Bohol Project
• CGISA
• SSD

What steps are involved in building this database?
1. Develop a standard format that defines how the data sets of all projects must be formatted. How do we do this?
• Involve most, if not all, SSD researchers who collect, summarize, analyze, and manage the data sets of all completed and ongoing projects in SSD.
• Hold a series of meetings to develop a common template for the database.
• A workshop was held to apply this template to at least one data set per participant and to refine the template further so that it applies to the majority of the data sets in SSD.
2. Merge and clean all the data sets so that they adhere to a common format, codes, etc., to establish the master file database.
3. With further consultation, agree on which components of the data sets will be freely open to the public and which will have limited access.
4. Do additional processing of selected variables to create the data set that will be accessible to the general public.
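Steps 2 and 4 above can be illustrated with a small Python sketch. Everything specific in it is an assumption for illustration only: the column names, the mapping in `COLUMN_MAP`, the choice of public columns, and the project labels are hypothetical and do not come from the actual SSD template.

```python
import csv
import io

# Hypothetical mapping from project-specific column names to a common template.
COLUMN_MAP = {
    "HHID": "household_id", "hh_id": "household_id",
    "YLD": "yield_kg_ha", "yield": "yield_kg_ha",
    "NAME": "farmer_name", "farmer_name": "farmer_name",
}

# Hypothetical choice of variables that are open to the public.
PUBLIC_COLUMNS = ["project", "household_id", "yield_kg_ha"]


def harmonize(rows, project):
    """Rename known columns to the common template and tag each record with its project."""
    records = []
    for row in rows:
        record = {"project": project}
        for old_name, value in row.items():
            new_name = COLUMN_MAP.get(old_name)
            if new_name is not None:
                record[new_name] = value.strip()
        records.append(record)
    return records


def public_subset(master):
    """Keep only the columns selected for public access."""
    return [{col: rec.get(col, "") for col in PUBLIC_COLUMNS} for rec in master]


if __name__ == "__main__":
    project_a = io.StringIO("HHID,YLD,NAME\n1,4200,Farmer A\n")
    project_b = io.StringIO("hh_id,yield,farmer_name\n7,3900,Farmer B\n")
    master = (harmonize(csv.DictReader(project_a), "Project A")
              + harmonize(csv.DictReader(project_b), "Project B"))
    for record in public_subset(master):
        print(record)
```

The master list keeps every harmonized variable, while the public subset drops limited-access fields such as the farmer's name.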

Steps involved…
[Diagram: the project data sets (CGISA, STRASA, Africa Rice, CSISA, PRSSP, GSR, VDS, Bohol Project) feed into the Unified Database]
Initial accomplishments
Number of data sets: 10
Inclusive period: 1993-2008
Sites: 9 countries
Total number of records: 6,622
Hundreds of variables: inputs and outputs of rice production, demographics, income, land profile, water use, variety planted, etc.

SSD Database
[Diagram: where the data is stored]
• Unified Database
• IRRINAS
• External HDD
• HDD RAID
• SSD Workstations
• Internet
• SSD/GIS Servers, managed in coordination with ITS
• SSD NAS (1TB)
Working files only!
Significance
• Once completed, it will be the first comprehensive digital socioeconomic database on farm-level rice production in the rice-growing areas of Asia (Africa).
• It is for use by researchers, government and academic institutions, donors, and other interested members of society.
• It is a gold mine of information on what is actually happening in the farmer’s field.
To attest SSD’s RMQA compliance
• Visit www.irri.org under Our Sciences\Social Science & Economics
• Farm household survey (http://geo.irri.org:8180/households)
• World rice statistics (http://geo.irri.org:8180/wrs)
• Procedures are in place (Netwin, back-up monitoring)
• SSD RMQA poster
• Awareness (OU head approval, notifications via email and mousepad imprint)
People who work hard to make this happen
SSD Household Database
SSD RMQA Team
THANK YOU!!
What will it contain?
1. List of all completed research projects of SSD, with basic information about each, such as:
• project title
• project sites
• principal researcher
• duration of the project
• major variables collected
• main objective of the research
• project output (reports and papers published)
2. Selected summary tables of basic information about rice production at the farm level
3. Detailed data by individual household observation on the following:
• land use/profile
• household information
• inputs to rice production, e.g., fertilizer, seeds, pesticides, and labor use
• rice production practices: method of crop establishment, variety planted, level of mechanization, etc.
• costs and returns of rice production by individual sample household

What will it contain?