So enterprises need solutions to manage this explosion of information. A data warehouse takes up a huge amount of storage, in terabytes or petabytes of data. Most data stores contain data derived from other data, so there is a chance that a lot of duplicate data is stored inside the data warehouse. Storing a large amount of duplicate data may affect performance, bandwidth, storage consistency, etc.

There are several kinds of storage optimization methods which can be used to store the data: thin provisioning, snapshots, clones, deduplication and compression. The techniques most commonly used in data storage optimization are data deduplication and data compression, which are explained in detail in Sections III and IV.
A. Thin Provisioning
2014 Sixth International Conference on Advanced Computing (ICoAC)
technique which eliminates the unused capacity of the physical disk, thereby achieving higher storage utilization. Fig. 1 shows traditional storage vs. thin-provisioned storage and how storage administrators typically allocate more storage than is required for applications [4]. Volume A contains only 200 GB of physical data out of 500 GB, but the remaining unused 300 GB is allocated for future use.

C. Clones

Clones are an advanced form of writable snapshots. They are essentially a snapshot volume presented as a 'real' volume that can be modified or changed. Initially clones had limited value and were used primarily for test and development applications. With the rise of virtualization, especially desktop virtualization, clones have immense value in reducing the storage footprint [5] that these environments require. They can also help improve performance, since hundreds of virtual machine-based storage images can now be loaded into a cache.
D. Data Deduplication
[Figure: Deduplication taxonomy — by location: Source (Client) vs. Target (Server); by granularity: Single Instance Storage, Fixed Size Chunking, Variable Size Chunking; by timing: In-Line vs. Post-Process.]
1) Source (Client) side Deduplication: As the name implies, source-based deduplication happens at the client side. Fig. 3 shows that the source-based deduplication process is carried out by placing a dedupe agent at the physical or virtual server. Here, the dedupe agent checks for duplicates against the backup server, and only the unique data blocks are transmitted to the disk. This process is done before the data goes over the network. The advantage of source-side deduplication is that it requires less bandwidth, since only the changed data gets backed up.

[Fig. 3: Source-side deduplication — a dedupe agent at the client eliminates duplicate data before it is sent over the network to the backup server.]

Deduplication can also be classified by when it is performed: before the data is written to disk (In-Line), after it is written to the disk (Post-Process), or both before and after it is written to the disk (Hybrid) [8].

1) In-Line Deduplication: Inline deduplication can be done at the client side, or while the data is transferring from the client/source to the server. It is a process where the data is deduplicated before it is written to disk. When a block of data arrives at the process/appliance, it analyses whether the data block has already been processed. If the block was processed before, the redundant block is discarded and a reference to the existing block is written instead. If the block of data is identified as unique, the process/appliance writes the block into storage. Since the analysis of a data block is initiated before it is written, this method of deduplication works in RAM, which minimizes I/O overhead and thereby saves disk space. However, it requires substantial resources, and this can become a network bottleneck [13]. The advantage of inline deduplication is that it does not require extra disk space. Data Domain, Hewlett Packard and Diligent Technologies are a few of the companies offering inline and chunk-based deduplication products [15], [16], [17]. The in-line deduplication process is depicted in Fig. 5.
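The source-side exchange described above can be sketched as a toy client/server protocol: the client hashes its blocks, asks the server which digests it lacks, and transmits only those blocks. This is a minimal illustration, not any vendor's protocol; all names and the block size are illustrative.

```python
import hashlib

BLOCK = 4096  # illustrative block size

class BackupServer:
    """Toy backup server: stores blocks keyed by digest, reports missing ones."""
    def __init__(self):
        self.blocks = {}

    def missing(self, digests):
        # Tell the client which digests the server has never seen.
        return [d for d in digests if d not in self.blocks]

    def put(self, digest, block):
        self.blocks[digest] = block

def source_dedup_backup(server, data: bytes) -> int:
    """Client-side dedupe agent: hash blocks locally, query the server,
    and transmit only unseen blocks. Returns the number of bytes sent."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    digests = [hashlib.sha256(b).hexdigest() for b in blocks]
    need = set(server.missing(digests))
    sent = 0
    for d, b in zip(digests, blocks):
        if d in need:
            server.put(d, b)
            need.discard(d)   # send each unique block only once
            sent += len(b)
    return sent

server = BackupServer()
day1 = b"A" * BLOCK * 8                 # 8 identical blocks
sent1 = source_dedup_backup(server, day1)
day2 = day1 + b"B" * BLOCK              # next backup: one new block appended
sent2 = source_dedup_backup(server, day2)
print(sent1, sent2)  # → 4096 4096
```

Note how the first backup sends a single copy of the repeated block (single-instance behaviour), and the second backup sends only the newly appended block, which is the bandwidth saving claimed for source-side deduplication.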
redundancy. Chunking is a process which breaks the data into a number of small pieces, called chunks or blocks, and stores only the unique chunks. A common algorithm used to determine chunk boundaries is the Rabin fingerprinting algorithm, and each chunk can be converted into a hash value using a common hashing technique such as MD5 or SHA-1.
Consider a large file which is changed in only a few bytes; in this situation this approach causes the chunks to be re-indexed and stored in the backup location. For example, for a document of 5 GB which is changed by the user in only 100 KB, the old and new file have different checksums, but SIS stores the entire new file again.

V. COMPARISON OF METRICS ACROSS DEDUPLICATION METHODS

Metrics               File Level   Fixed Size   Variable Size
Deduplication Ratio   Less         Medium       High
Processing Time       Medium       Less         High
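The trade-off behind the table can be demonstrated at small scale: under file-level (SIS) deduplication even a 4-byte edit changes the whole-file checksum, so the entire file is stored again, while under fixed-size chunking only the touched chunk changes. A minimal sketch; the chunk size and names are illustrative.

```python
import hashlib

CHUNK = 4096  # fixed-size chunking, as in the table's "Fixed Size" column

def file_digest(data: bytes) -> str:
    """File-level (SIS) view: one checksum for the whole file."""
    return hashlib.md5(data).hexdigest()

def chunk_digests(data: bytes):
    """Chunk-level view: one checksum per fixed-size chunk."""
    return [hashlib.md5(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]

old = bytes(range(256)) * 4096          # 1 MiB "file" (256 chunks)
new = bytearray(old)
new[100:104] = b"edit"                  # change only 4 bytes
new = bytes(new)

# File-level: any change gives a new checksum, so SIS must store the
# whole file again.
assert file_digest(old) != file_digest(new)

# Chunk-level: only the chunk containing the edit differs; all other
# chunks deduplicate against the old version.
changed = sum(a != b for a, b in zip(chunk_digests(old), chunk_digests(new)))
print(f"{changed} of {len(chunk_digests(new))} chunks changed")  # → 1 of 256 chunks changed
```

Variable-size chunking improves on this further when bytes are inserted rather than overwritten, since content-defined boundaries keep the unchanged chunks aligned, which is why the table rates its deduplication ratio highest.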
large scale data storage, and also to develop an efficient method to reduce fragmentation and obtain high write and read throughput.

REFERENCES

[…] Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC), pp. 1982-1989, 2013.
[24] Daehee Kim, Sejun Song, Baek-Young Choi, "SAFE: Structure-Aware File and Email Deduplication for Cloud-based Storage Systems," IEEE, pp. 130-137, 2013.
[2] John Gantz, David Reinsel (June 2011), "Extracting Value from Chaos," sponsored by EMC Corporation [Online]. Available: http://www.emc.com/
[3] Min Li, Shravan Gaonkar, Ali R. Butt, Deepak Kenchammana, and Kaladhar Voruganti, "Cooperative Storage-Level Deduplication for I/O Reduction in Virtualized Data Centers," IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 209-218, 2012.
[6] Eunji Lee, Jee E. Jang, Taeseok Kim, Hyokyung Bahn, "On-Demand Snapshot: An Efficient Versioning File System for Phase-Change Memory," IEEE Transactions on Knowledge and Data Engineering, Vol. 25, No. 12, December 2013.
[8] Philipp C. Heckel (2013, May 20), "Minimizing remote storage usage and synchronization time using deduplication and multichunking," [Online]. Available: http://blog.philippheckel.com/
[13] Benjamin Zhu, Kai Li, and Hugo Patterson, "Avoiding the Disk Bottleneck in the Data Domain Deduplication File System," Proc. of the USENIX Conference on File and Storage Technologies, 2008.
[14] D. T. Meyer, W. J. Bolosky (2012), "A Study of Practical Deduplication," [Online]. Available: http://static.usenix.org/