Technical Report - Netapp a-SIS Deduplication - Deployment and Implementation Guide-4th Revision
Network Appliance, Inc. | Bill May, Data Protection and Retention Technical Marketing | 16 April 2008 | TR-3505
4th Revision
TECHNICAL REPORT
NetApp, a pioneer and industry leader in data storage technology, helps organizations understand and meet complex technical challenges with advanced storage solutions and global data management strategies.
Abstract
This guide introduces the NetApp deduplication for FAS technology and describes in detail how to implement and utilize it. It should prove useful for customers requiring assistance in understanding and architecting solutions with deduplication for FAS and NetApp storage systems.
NetApp, Inc.
Technical Report: NetApp Deduplication for FAS Deployment and Implementation Guide
16 April 2008
TR-3505
4th Revision
Table of Contents
1 Introduction
1.1 Intended Audience
1.2 Purpose
1.3 Prerequisites and Assumptions
1.4 Document Conventions
2 Overview
Command Summary
Deduplication Quick Start Guide
Monitoring Deduplication Status
End-to-End Deduplication Configuration Example
Configuring Deduplication Schedules
Deduplication Target Environment
Deduplication Performance
Deduplication Storage Savings
Additional Deduplication Considerations
Number of Deduplication Processes
Deduplication and Active/Active Configuration
Deduplication and Space Savings on Existing Data
Deduplication Best Practices
1 Introduction
1.1 Intended Audience
This technical report is designed for customers who seek education on the NetApp deduplication for FAS capability introduced in Data ONTAP 7.2L1, with the current minimum requirement of Data ONTAP 7.2.4. It will be most beneficial to those who are already familiar with NetApp hardware and software.
1.2 Purpose
The purpose of this paper is to present a guide for implementing NetApp deduplication for FAS. It will address step-by-step configuration examples, introduce known caveats and recommendations to assist the reader in designing optimal solutions, and prepare the audience for performing deployments of the technology in customer environments.
Its use is threefold:
- Provide detailed information to all interested parties.
- Educate prior to performing deployments.
- Serve as a reference for resolving issues that could arise.
This document is not:
- A sales guide (although some high-level thoughts are covered in the Solutions Overview section)
- A competitive comparison
- A complete product design document
2 Overview
This section provides a quick overview of deduplication in general and then introduces what NetApp deduplication for FAS is and how it works at a high level.
To keep track of the many indirect blocks (IND in Figure 2) that are pointing to it, each data block has a block count reference kept in the volume metadata. As additional indirect blocks point to it or existing ones stop pointing to it, this value is incremented or decremented accordingly. When no indirect blocks point to a data block, it is released. Deduplication uses dense volume technology to allow duplicate blocks anywhere in the flexible volume to be deleted.
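The reference-counting behavior described above can be illustrated with a small sketch. This is a toy model, not Data ONTAP code: the class name BlockRefCounts, its methods, and the block identifiers are all hypothetical and exist only to show how a count rises and falls as pointers are added and removed, and how a block is released when its count reaches zero.

```python
from collections import defaultdict


class BlockRefCounts:
    """Toy model of per-block reference counts (hypothetical, not Data ONTAP code)."""

    def __init__(self):
        self.refs = defaultdict(int)  # block id -> number of indirect blocks pointing at it
        self.freed = []               # block ids released once their count reached zero

    def add_pointer(self, block_id):
        # A new indirect block now points at this data block.
        self.refs[block_id] += 1

    def drop_pointer(self, block_id):
        # An indirect block stopped pointing at this data block.
        if self.refs.get(block_id, 0) == 0:
            raise ValueError(f"block {block_id!r} has no references to drop")
        self.refs[block_id] -= 1
        if self.refs[block_id] == 0:
            # No pointers remain, so the block can be released.
            del self.refs[block_id]
            self.freed.append(block_id)


counts = BlockRefCounts()
counts.add_pointer("B1")   # first file references block B1
counts.add_pointer("B1")   # a second file shares the same block
counts.drop_pointer("B1")  # first file is deleted; B1 is still in use
counts.drop_pointer("B1")  # second file is deleted; B1 is released
```

In this model a shared block survives the deletion of any one file that uses it, which is the property that makes block sharing safe.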
Essentially, deduplication only stores unique blocks in the flexible volume and creates a small amount of additional metadata in the process. Notable features of NetApp deduplication for FAS include:
- Works with a high degree of granularity, at the block level.
- Operates on the active file system of the flexible volume. Snapshot copies created after running deduplication enjoy the same storage savings benefits.
- Is a background process that can be configured to run automatically, scheduled, or run manually through the command-line interface.
- Is application transparent and therefore can be used for deduplication of data originating from anywhere in the data center.
- Is enabled and managed using a simple command-line interface.
- Can be enabled on flexible volumes that already contain data, and can deduplicate those existing blocks too.
The remainder of this document goes into great detail on the operation of deduplication, but in general the following occurs:
- Newly saved data on the NearStore is stored in blocks as usual by Data ONTAP.
- Each block of data has a digital fingerprint, which is compared to all other fingerprints in the flexible volume.
- If two fingerprints are found to be the same, a byte-for-byte comparison is done of all bytes in the block. If there is an exact match between the new block and the existing block on the flexible volume, the duplicate block is discarded and its disk space is reclaimed.
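The fingerprint-then-verify flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the product's implementation: the report does not say which fingerprint function is used, so SHA-256 stands in for it here, and the function name deduplicate and the 4096-byte sample blocks are hypothetical.

```python
import hashlib


def deduplicate(blocks):
    """Return (unique_blocks, pointers); pointers[i] is the index of the stored copy of blocks[i].

    Assumption: SHA-256 stands in for the unspecified fingerprint function.
    """
    by_fingerprint = {}  # fingerprint -> indexes into unique_blocks with that fingerprint
    unique_blocks = []
    pointers = []
    for block in blocks:
        fingerprint = hashlib.sha256(block).digest()
        match = None
        for idx in by_fingerprint.get(fingerprint, []):
            # Equal fingerprints are only a hint; confirm with a byte-for-byte comparison.
            if unique_blocks[idx] == block:
                match = idx
                break
        if match is None:
            # No verified duplicate exists, so keep this block as a new unique block.
            unique_blocks.append(block)
            match = len(unique_blocks) - 1
            by_fingerprint.setdefault(fingerprint, []).append(match)
        pointers.append(match)
    return unique_blocks, pointers


block_a = b"A" * 4096  # hypothetical sample data
block_b = b"B" * 4096
stored, refs = deduplicate([block_a, block_b, block_a, block_a])
print(len(stored), refs)
```

The byte-for-byte check is what guards against two different blocks that happen to share a fingerprint; only a verified match causes the duplicate to be discarded.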
FAS3040, N5300: 3TB
FAS3050, N5500: 2TB
FAS3020, N5200: 1TB
FAS2050: 1TB
FAS2020: 0.5TB
Protocols: All file-based and block-based protocols supported by Data ONTAP
Applications: Refer to the Deduplication Target Environment section
Verifies and updates the fingerprint database for the specified flexible volume and includes purging stale fingerprints.
Displays the statistics of flexible volumes that have deduplication enabled.
Converts a deduplication-enabled flexible volume to a normal flexible volume.
sis on <vol>
Create, Modify, Delete Schedules (if not doing manually): Delete or modify the default deduplication schedule that was configured when deduplication was first enabled on the flexible volume, or create the desired schedule.
sis config [-s sched] <vol>
Manually Run Deduplication (if not using schedules):
sis start <vol>
Monitor Status of Deduplication
Monitor Space Savings:
df -s <vol>
Below, from the sis man page, you see the various State, Status, and Progress messages that can be returned when running sis status. Note that if you don't provide a flexible volume name, the status for all flexible volumes that have deduplication enabled will be displayed.
toaster> sis status
Path          State     Status   Progress
/vol/dvol_1   Enabled   Idle     Idle for 10:45:23
/vol/dvol_2   Enabled   Pending  Idle for 15:23:41
/vol/dvol_3   Disabled  Idle     Idle for 37:12:34
/vol/dvol_4   Enabled   Active   25 GB Scanned
/vol/dvol_5   Enabled   Active   25 MB Searched
/vol/dvol_6   Enabled   Active   40 MB (20%) Done
/vol/dvol_7   Enabled   Active   30 MB Verified
/vol/dvol_8   Enabled   Active   10% Merged
The meaning of the status for each flexible volume is as follows:
- dvol_1 is Idle. The last deduplication operation on the flexible volume finished 10:45:23 ago.
- dvol_2 is Pending because of a resource limitation. The deduplication operation on the flexible volume will become Active when the resource is available.
- dvol_3 is Idle because the deduplication operation is disabled on the flexible volume.
- dvol_4 is Active. The deduplication operation is scanning the whole flexible volume (initiated with sis start -s). So far, it has scanned 25GB of data.
- dvol_5 is Active. The operation is searching for duplicate data, and 25MB of data has already been searched.
- dvol_6 is also Active. The operation has saved 40MB of data. This is 20% of the total duplicate data found in the searching stage.
- dvol_7 is Active. It is verifying the metadata of processed data blocks. This process will remove unused metadata.
- dvol_8 is Active. Verified metadata are being merged. This process will merge together all verified metadata of processed data blocks to an internal format that supports fast sis operation.
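For scripted monitoring, the sis status listing can be split into fields. The sketch below is only an illustration: the function name parse_sis_status is hypothetical, and it assumes the four-column layout shown in this section, whitespace-separated columns, and volume paths that begin with /vol/ and contain no spaces.

```python
SAMPLE = """\
toaster> sis status
Path          State     Status   Progress
/vol/dvol_1   Enabled   Idle     Idle for 10:45:23
/vol/dvol_2   Enabled   Pending  Idle for 15:23:41
/vol/dvol_3   Disabled  Idle     Idle for 37:12:34
/vol/dvol_4   Enabled   Active   25 GB Scanned
/vol/dvol_5   Enabled   Active   25 MB Searched
/vol/dvol_6   Enabled   Active   40 MB (20%) Done
/vol/dvol_7   Enabled   Active   30 MB Verified
/vol/dvol_8   Enabled   Active   10% Merged
"""


def parse_sis_status(text):
    """Parse captured 'sis status' text into a list of dicts (assumed column layout)."""
    rows = []
    for line in text.splitlines():
        # Split into at most four fields so the free-form Progress text stays intact.
        parts = line.split(None, 3)
        if len(parts) < 4 or not parts[0].startswith("/vol/"):
            continue  # skip the prompt line, the header line, and blank lines
        path, state, status, progress = parts
        rows.append({"path": path, "state": state,
                     "status": status, "progress": progress.strip()})
    return rows


active = [r["path"] for r in parse_sis_status(SAMPLE) if r["status"] == "Active"]
print(active)
```

A script built this way could, for example, wait until a given volume returns to Idle before checking space savings.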
The general flow of the phases deduplication goes through and the correlating sis status messages when actively running on a flexible volume are shown in Figure 4.
For additional information, the -l option will display detailed status, as shown below.
toaster> sis status -l /vol/dvol_6
Path:                  /vol/dvol_6
State:                 Enabled
Status:                Active
Progress:              41020 KB (20%) Done
Type:                  Regular
Schedule:              sun-sat@0
Last Operation Begin:  Thu Mar 24 13:30:00 PST 2005
Last Operation End:    Fri Mar 25 00:34:16 PST 2005
Last Operation Size:   4732932 KB
Last Operation Error:  -
1. Begin by creating a flexible volume (keeping in mind the maximum allowable volume size for the platform, as specified in the requirements table at the beginning of this section).
r200-rtp01*> vol create VolPST aggr0 200g
Creation of volume 'VolPST' with size 200g on containing aggregate 'aggr0' has completed.
2. Now, as a best practice, we'll disable scheduled Snapshot copies. An alternative to what's shown below would be to use the command snap sched VolPST 0 0 0.
r200-rtp01*> vol status VolPST
Volume  State   Status         Options
VolPST  online  raid_dp, flex
        Containing aggregate: 'aggr0'
r200-rtp01*> vol options VolPST nosnap true
r200-rtp01*> vol status VolPST
Volume  State   Status         Options
VolPST  online  raid_dp, flex  nosnap=on
        Containing aggregate: 'aggr0'
3. Now we'll enable deduplication on the flexible volume and verify that it's turned on. The vol status command will show a sis attribute for flexible volumes that have deduplication turned on. (It can be a bit confusing, since sis is also indicated for those flexible volumes that have been written to by SnapVault for NetBackup.) Note that there needs to be space available in the flexible volume for the sis on command to complete successfully. That is, if the sis on command were attempted on a flexible volume that already had data and was completely full, it would fail (since there is no room to create the required metadata). Note that after turning deduplication on, Data ONTAP lets you know that if this were an existing flexible volume that already contained data prior to deduplication being enabled, you would want to run sis start -s; in this example it's a brand-new flexible volume, so that's not necessary.
r200-rtp01*> sis on /vol/VolPST
SIS for "/vol/VolPST" is enabled.
Already existing data could be processed by running "sis start -s /vol/VolPST".
r200-rtp01*> vol status VolPST
Volume  State   Status         Options
VolPST  online  raid_dp, flex  nosnap=on
                sis
        Containing aggregate: 'aggr0'
4. Another way to verify that deduplication is enabled on the flexible volume is to check the output from running sis status on the flexible volume.
r200-rtp01*> sis status /vol/VolPST
Path          State    Status  Progress
/vol/VolPST   Enabled  Idle    Idle for 00:00:20
5. Next we'll turn off the default deduplication schedule. Since in this example the administrators will be moving large quantities of PST files in as time permits, we'll want to let them run deduplication manually at opportune times.
r200-rtp01*> sis config /vol/VolPST
Path          Schedule
/vol/VolPST   sun-sat@0
r200-rtp01*> sis config -s - /vol/VolPST
r200-rtp01*> sis config /vol/VolPST
Path          Schedule
/vol/VolPST   -
At this point, in our example, the administrator NFS-mounted the flexible volume to /testPSTs on a Solaris host, sunv240-rtp01, and copied lots of PST files from their users' directories into our new PST archive flexible volume. The result from the host perspective is shown below. (The same sort of thing could be accomplished by mapping a CIFS share to a Windows host.)
root@sunv240-rtp01 # pwd
/testPSTs
root@sunv240-rtp01 # df -k .
Filesystem              kbytes     used      avail      capacity  Mounted on
r200-rtp01:/vol/VolPST  167772160  33388384  134383776  20%       /testPSTs
The example continues with examining the flexible volume, running deduplication, and monitoring the status.
6. Use df -s to examine the storage consumed and the space savings provided. Note that no space savings have been achieved by simply copying data to the flexible volume even though deduplication is turned on. What has happened is that all the blocks that have been written to this flexible volume since deduplication was turned on have had their fingerprints written to the change log file.
r200-rtp01*> df -s /vol/VolPST
Filesystem     used      saved  %saved
/vol/VolPST/   33388384  0      0%
7. Start deduplication running on the flexible volume. This causes the change log to be processed, fingerprints to be sorted and merged, and duplicate blocks to be found.
r200-rtp01*> sis start /vol/VolPST
r200-rtp01*> sis status /vol/VolPST
Path          State    Status  Progress
/vol/VolPST   Enabled  Active  11 MB (0%) Done
r200-rtp01*> sis status /vol/VolPST
Path          State    Status  Progress
/vol/VolPST   Enabled  Active  1692 MB (14%) Done
r200-rtp01*> sis status /vol/VolPST
Path          State    Status  Progress
/vol/VolPST   Enabled  Active  10 GB (90%) Done
r200-rtp01*> sis status /vol/VolPST
Path          State    Status  Progress
/vol/VolPST   Enabled  Active  11 GB (99%) Done
r200-rtp01*> sis status /vol/VolPST
Path          State    Status  Progress
/vol/VolPST   Enabled  Idle    Idle for 00:00:07
9. Once sis status indicates the flexible volume is once again in the Idle state, deduplication has finished running, and we can now check the space savings it provided in the flexible volume.
r200-rtp01*> df -s /vol/VolPST
Filesystem     used      saved    %saved
/vol/VolPST/   24072140  9316052  28%
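The 28% figure can be checked by hand. The sketch below assumes that %saved is the saved space divided by the sum of used and saved space; that formula is an inference that matches the numbers in this example, not a definition taken from this report, and the function name percent_saved is hypothetical.

```python
def percent_saved(used_kb, saved_kb):
    """Space savings as a percentage of the space the data would occupy without deduplication.

    Assumption: %saved = saved / (used + saved), inferred from the df -s output above.
    """
    total_kb = used_kb + saved_kb
    if total_kb == 0:
        return 0.0  # an empty volume has nothing to save
    return 100.0 * saved_kb / total_kb


# Values from the df -s output after deduplication finished.
print(round(percent_saved(24072140, 9316052)))
# Values from the df -s output before deduplication ran.
print(round(percent_saved(33388384, 0)))
```

Note that used plus saved (33388192 KB) is within 192 KB of the 33388384 KB reported as used before deduplication ran.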
Run with no arguments, sis config will return the schedules for all flexible volumes that have deduplication enabled. The example below shows the four different formats the reported schedules can have.
toaster> sis config
Path          Schedule
/vol/dvol_1   -
/vol/dvol_2   23@sun-fri
/vol/dvol_3   auto
/vol/dvol_4   sat@6
The meaning of each of these schedule types is as follows:
- On flexible volume dvol_1 deduplication is not scheduled to run.
- On flexible volume dvol_2 deduplication is scheduled to run every day from Sunday to Friday at 11 p.m.
- On flexible volume dvol_3 deduplication is set to auto schedule. This means deduplication will be triggered by the amount of new data written to the flexible volume, specifically when there are 20% new fingerprints in the change log.
- On flexible volume dvol_4 deduplication is scheduled to run at 6 a.m. on Saturday.
When the -s option is specified, the command will set up or modify the schedule on the specified flexible volume. The schedule parameter can be specified in one of four ways:
[day_list][@hour_list]
[hour_list][@day_list]
-
auto
The day_list specifies which days of the week deduplication should run. It is a comma-separated list of the first three letters of the day: sun, mon, tue, wed, thu, fri, sat. The names are not case sensitive. Day ranges such as mon-fri can also be given. The default day_list is sun-sat. The hour_list specifies which hours of the day deduplication should run on each scheduled day. The hour_list is a comma-separated list of the integers from 0 to 23. Hour ranges such as 8-17 are allowed.
Step values can be used in conjunction with ranges. For example, 0-23/2 means "every two hours." The default hour_list is 0 (that is, midnight on the morning of each scheduled day).
If "-" is specified, there won't be a scheduled deduplication operation on the flexible volume.
The auto schedule causes deduplication to run on that flexible volume whenever there are 20% new fingerprints in the change log. This check is done in a background process and occurs every minute.
When deduplication is enabled on a flexible volume the first time, an initial schedule is assigned to the flexible volume. This initial schedule is sun-sat@0, which means "once every day at midnight."
To configure the schedules shown earlier in this section, the following commands would be issued:
toaster> sis config -s - /vol/dvol_1
toaster> sis config -s 23@sun-fri /vol/dvol_2
toaster> sis config -s auto /vol/dvol_3
toaster> sis config -s sat@6 /vol/dvol_4
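The schedule syntax described above can be expanded into explicit days and hours with a short sketch. This is an illustration of the stated rules only, not the parser Data ONTAP uses: the function names are hypothetical, days are numbered 0 (sun) through 6 (sat) for convenience, and day ranges that wrap past Saturday are rejected because the text does not say how they are handled.

```python
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]


def _expand_days(spec):
    """Expand a day_list such as 'mon-fri' or 'sun,wed' into sorted day numbers."""
    days = set()
    for item in spec.lower().split(","):
        first, _, last = item.partition("-")
        start = DAYS.index(first)
        end = DAYS.index(last) if last else start
        if end < start:
            raise ValueError(f"wrapping day range not handled in this sketch: {item!r}")
        days.update(range(start, end + 1))
    return sorted(days)


def _expand_hours(spec):
    """Expand an hour_list such as '23', '8-17', or '0-23/2' into sorted hours."""
    hours = set()
    for item in spec.split(","):
        span, _, step = item.partition("/")
        first, _, last = span.partition("-")
        start = int(first)
        end = int(last) if last else start
        if not 0 <= start <= end <= 23:
            raise ValueError(f"hour out of range: {item!r}")
        hours.update(range(start, end + 1, int(step) if step else 1))
    return sorted(hours)


def parse_schedule(sched):
    """Return None for '-', 'auto' for auto, else a dict of expanded days and hours."""
    if sched == "-":
        return None
    if sched.lower() == "auto":
        return "auto"
    day_spec, hour_spec = "sun-sat", "0"  # defaults stated in the text
    for part in sched.split("@"):
        if not part:
            continue
        # The two sides may come in either order; hours start with a digit.
        if part[0].isdigit():
            hour_spec = part
        else:
            day_spec = part
    return {"days": _expand_days(day_spec), "hours": _expand_hours(hour_spec)}


print(parse_schedule("23@sun-fri"))
print(parse_schedule("sat@6"))
```

Accepting the day and hour parts in either order mirrors the two schedule forms listed earlier, [day_list][@hour_list] and [hour_list][@day_list].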
4 Operating Characteristics
This section discusses where deduplication makes sense and the behavior that you can expect.
If there is very little new data, run deduplication infrequently, because it doesn't make sense to unnecessarily consume CPU resources. How often you run it will depend on the change rate of the data in the flexible volume.
The best options are:
- Use the auto mode so that deduplication only runs when significant additional data has been written to each particular flexible volume (this will tend to naturally spread out when deduplication runs).
- Stagger deduplication schedules for the flexible volumes so it runs on alternate days.
- Run deduplication manually.
Run deduplication before creating Snapshot copies, as this will ensure no undeduplicated data gets locked in Snapshot copies. If a Snapshot copy is created on a flexible volume before deduplication has a chance to run or complete on that flexible volume, this could result in lower space savings.
The Snapshot reserve should be greater than 0 if Snapshot copies are to be used. (An exception to this might be in a SAN environment, where often it is set to zero for thin provisioning of LUNs.)
There must be some free space in the flexible volume to allow deduplication to operate and create the metadata it requires. As necessary, flexible volumes can be resized, with no impact to data access, to accommodate this.
5.1 Licensing
Make sure deduplication is properly licensed and, if the platform is not an R200, make sure the NearStore option is also properly licensed:
fas3070-rtp01*> license
a_sis <license>
nearstore_option <license>
Also note that there needs to be free space available in the flexible volume for the sis on command to complete successfully. If a flexible volume is full, deduplication will not run. However, as noted earlier, flexible volumes can be resized with no impact to data access to accommodate this.
r200-rtp01*> sis status /vol/VolReallyBig2
Path                 State    Status  Progress
/vol/VolReallyBig2   Enabled  Idle    Idle for 11:11:13
r200-rtp01*> sis off /vol/VolReallyBig2
SIS for "/vol/VolReallyBig2" is disabled.
r200-rtp01*> sis status /vol/VolReallyBig2
Path                 State     Status  Progress
/vol/VolReallyBig2   Disabled  Idle    Idle for 11:11:34
r200-rtp01*> sis undo /vol/VolReallyBig2
Wed Feb 7 11:13:15 EST [wafl.scan.start:info]: Starting SIS volume scan on volume VolReallyBig2.
r200-rtp01*> sis status /vol/VolReallyBig2
Path                 State     Status   Progress
/vol/VolReallyBig2   Disabled  Undoing  424 MB Processed
Note that the undo option of the sis command is only available in the diag mode, accessed using the command priv set diag.
No status entry found.
r200-rtp01*> df -s /vol/VolReallyBig2
Filesystem            used      saved  %saved
/vol/VolReallyBig2/   24149560  0      0%
Note that if sis undo starts processing and there is not enough space to undeduplicate, it will stop, report a message about insufficient space, and leave the flexible volume dense. All data is still accessible, but some block sharing is still occurring. Use df -s to understand how much free space you really have and then either grow the flexible volume or delete data or Snapshot copies to provide the needed free space.
Key points in this scenario are:
- The nearstore_option must be licensed on both the source and destination.
- Deduplication must be licensed at the primary location (source).
- Deduplication does not need to be licensed at the destination. However, if there is a situation in which the primary site is down and the secondary location becomes the new primary, deduplication needs to be licensed for continued deduplication to occur. Thus, the best practice is to have deduplication licensed at both locations.
- Deduplication is only enabled, run, and managed from the primary location.
- The flexible volume at the secondary location will inherit all the deduplication attributes and storage savings through SnapMirror.
- Only unique blocks are transferred, so deduplication reduces network bandwidth usage too.
Key points in this scenario are:
- The nearstore_option must be licensed on the destination.
- Deduplication is only licensed at the secondary location (destination).
- Deduplication is enabled, run, and managed on a flexible volume at the secondary location.
- Deduplication doesn't yield any network bandwidth savings, as QSM works at the logical layer.
- Storage savings benefit at the QSM destination is achieved by running deduplication on the destination after QSM has finished transferring the data.
© 2008 NetApp, Inc. All rights reserved. Specifications subject to change without notice. NetApp, the NetApp logo, Data ONTAP, FlexClone, FlexVol, NearStore, SnapMirror, SnapVault, and WAFL are registered trademarks and NetApp and Snapshot are trademarks of NetApp, Inc. in the U.S. and other countries. Solaris is a trademark of Sun Microsystems, Inc. Windows and Microsoft are registered trademarks of Microsoft Corporation. UNIX is a registered trademark of The Open Group. NetBackup is a trademark of Symantec Corporation or its affiliates in the U.S. and other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.