
NetApp From the Ground Up – A Beginner's Guide

Index
FEBRUARY 15, 2015 / WILL ROBINSON
Below is a list of all the posts in the NetApp From the Ground Up – A Beginner's
Guide series:

NetApp From the Ground Up Part 1


NetApp From the Ground Up Part 2
NetApp From the Ground Up Part 3
NetApp From the Ground Up Part 4
NetApp From the Ground Up Part 5
NetApp From the Ground Up Part 6
NetApp From the Ground Up Part 7
NetApp From the Ground Up Part 8
NetApp From the Ground Up Part 9
NetApp From the Ground Up Part 10
NetApp From the Ground Up Part 11
NetApp From the Ground Up Part 12
NetApp From the Ground Up Part 13

NetApp From the Ground Up – A Beginner's Guide – Part 2
NOVEMBER 16, 2014 / WILL ROBINSON

Overview
Source #1
Reference: Datadisk NetApp Overview
The NetApp filer, also known as NetApp Fabric-Attached Storage (FAS), is a type of
disk storage device which owns and controls a filesystem and presents files and
directories over the network. It uses an operating system called Data ONTAP
(based on FreeBSD).

NetApp Filers can offer the following:

Supports SAN, NAS, FC, SATA, iSCSI, FCoE and Ethernet all on the same
platform
Supports SATA, FC and SAS disk drives
Supports block protocols such as iSCSI, Fibre Channel and AoE
Supports file protocols such as NFS, CIFS, FTP, TFTP and HTTP
High availability
Easy Management
Scalable
The NetApp Filer, also known as NetApp Fabric-Attached Storage (FAS), is a data
storage device. It can act as a SAN or as a NAS, serving storage over a network
using either file-based or block-based protocols:
File-Based Protocols: NFS, CIFS, FTP, TFTP, HTTP
Block-Based Protocols: Fibre Channel (FC), Fibre Channel over Ethernet
(FCoE), Internet SCSI (iSCSI)
The most common NetApp configuration consists of a filer (also known as
a controller or head node) and disk enclosures (also known as shelves). The
disk enclosures are connected by Fibre Channel or parallel/serial ATA, and the filer
is then accessed by other Linux, Unix or Windows servers via a network (Ethernet or
FC).

All filers have battery-backed NVRAM, which allows them to commit
writes to stable storage quickly, without waiting on the disks. (Note: this is
disputed in the NVRAM page.)
It is also possible to cluster filers to create a high-availability cluster with a
private high-speed link using either FC or InfiniBand. Clusters can then be grouped
together under a single namespace when running in the cluster mode of the Data
ONTAP 8 operating system.

The filer is an Intel- or AMD-based computer using PCI. Each filer
has a battery-backed NVRAM adapter to log all writes for performance and to
replay them in the event of a server crash. The Data ONTAP operating system
implements a single proprietary file system called WAFL (Write Anywhere File
Layout).

WAFL is not a filesystem in the traditional sense, but a file layout that supports
very large, high-performance RAID arrays (up to 100TB). It provides mechanisms
that enable a variety of filesystems and technologies to access disk
blocks. WAFL also offers:

snapshots (up to 255 per volume can be made)


snapmirror (disk replication)
syncmirror (mirrors RAID arrays for extra resilience; arrays can be mirrored up to
100km away)
snaplock (write once, read many; data cannot be deleted until its retention
period has been reached)
read-only copies of the file system
read-write snapshots called FlexClone
ACLs
quick defragmentation

Source #2
Reference: NetApp University Introduction to NetApp Products
At its most basic, data storage supports production; that is, real-time read and
write access to a company's data. Some of this data supports infrastructure
applications, such as Microsoft Exchange or Oracle databases, which typically
use SAN technologies, FC, and iSCSI. Most environments also have large volumes
of file data, such as users' home directories and departmental shared folders.
These files are accessed by using NAS technologies, SMB, and NFS. One of the
most important features of the Data ONTAP operating system is its ability to
support SAN and NAS technologies simultaneously on the same platform. What
once required separate systems with disparate processes is now unified, allowing
greater economies of scale and reducing human error.
NetApp Filer

Reference: Datadisk NetApp Overview

Reference: Storage-switzerland What is a Storage Controller?


The most common NetApp configuration consists of a filer (also known as
a controller or head node) and disk enclosures (also known as shelves). The
disk enclosures are connected by Fibre Channel or parallel/serial ATA, and the filer
is then accessed by other Linux, Unix or Windows servers via a network (Ethernet or
FC).

Storage arrays all have some form of a processor embedded into a controller. As
a result, a storage array's controller is essentially a server that's responsible for
performing a wide range of functions for the storage system. Think of it as a
storage computer. This computer can be configured to operate by itself (single
controller), in a redundant pair (dual controller) or even as a node within a
cluster of servers (scale-out storage). Each controller has an I/O path to
communicate with the storage network or the directly-attached servers, an I/O path
that communicates with the attached storage devices or shelves of devices, and a
processor that handles the movement of data as well as other data-related
functions, such as RAID and volume management.

In the modern data center the performance of the storage array can
be directly impacted (and in many cases determined) by the speed and
capabilities of the storage controller. The controller's processing capabilities
are increasingly important, for two reasons. The first is the high-speed
storage infrastructure. The network can now easily send data at 10
Gigabits per second (using 10GbE) or even 16Gbps on Fibre Channel. That
means the controller needs to be able to process and perform actions on this
inbound data at even higher speeds, generating RAID parity for example.

Also, the storage system may have many disk drives attached to it, and the storage
controller has to be able to communicate with each of these. The more drives,
the more performance the storage controller has to maintain. Thanks to Solid
State Drives (SSDs), a very small number of drives may be able to generate more
I/O than the controller can support. The controller used to have time between drive
I/Os to perform certain functions. With high quantities of drives or high-performance
SSDs, that time or latency is almost gone.

The second reason that the performance capabilities of the storage controller are
important is that the processor on the controller is responsible for an increasing
number of complex functions. Besides basic functions, such as RAID and volume
management, today's storage controller has to handle tasks such as:

snapshots
clones
thin provisioning
auto-tiering
replication
Snapshots, clones and thin provisioning are particularly burdensome since
they dynamically allocate storage space as data is being written or changed on the
storage system. This is a very processor-intensive task. The storage system
commits to the connecting host that the capacity it expects has already been set
aside, and then in the background the storage system works to keep those
commitments.

Automated tiering moves data between types or classes of storage so that the
most active data is on the fastest type of storage and the least active is on the
most cost-effective class of storage. This moving of data back and forth is a lot of
work for the controller. Automated tiering also requires that the controller analyze
access patterns and other statistics to determine which data should be where.
You don't want to promote every accessed file, only files that have reached a
certain level of consistent access. The combined functions represent a significant
load on the storage processors.

There are, of course, future capabilities that will also require some of the
processing power of the controller. An excellent example is deduplication, which
is becoming an increasingly popular feature on primary storage. As we expect
more from our storage systems the storage controller has an increasingly
important role to play, and will have to get faster or be able to be clustered in
order to keep up.

From a use perspective it's important to pay attention to the capabilities of the
controller. Does it have enough horsepower to meet your current and future
needs? What are the options if you reach the performance limits of the processor?
In order to maintain adequate performance as storage systems grow and add CPU-
intensive features, companies need to budget for extra headroom in their
controllers. These systems need to either have plenty of extra headroom or the
ability to add controller processing power, or companies should look at one of the
scale-out storage strategies such as the Storage Hypervisor, Scale-Out Storage or Grid
Storage.

Shelf with Controller Integrated

Reference: Mtellin NetApp Introduces New Controllers


Reference: FAS2240: An Inside Look
The FAS2240 is a storage shelf with the controllers inserted into the
back. The 2240-2 is a 2U system based on the current DS2246 SAS shelf,
while the 2240-4 is a 4U system based on the current DS4243 shelf. The
FAS2240-2 utilizes 2.5" SAS drives and supports either 450GB or 600GB drives as of
today. The FAS2240-4 utilizes 3.5" SATA drives and supports 1, 2 or 3TB SATA
drives as of today. Both systems can be ordered with either 12 or 24 drives.

If you outgrow your FAS2240-2 or FAS2240-4, you can convert the base
chassis into a standard disk shelf by replacing the FAS2240 controller(s) (which
we call processor controller modules, or PCMs) with the appropriate I/O modules
(IOMs). Connect the converted shelf to a FAS3200 or FAS6200 series storage
system and you are back in business, with no data migration required and a
minimum of downtime.

Rear view of the FAS2240-2 controller

Rear view of the FAS2240-4 controller

NetApp From the Ground Up – A Beginner's Guide – Part 1
NOVEMBER 16, 2014 / WILL ROBINSON
I have recently started working on a FlexPod environment. The environment relies
on NetApp for all of its storage needs, but unfortunately I hadn't worked with
NetApp products before. Because of this I have been spending most of my spare
time reading and labbing so that I could build my skills up as quickly as possible.

An issue I ran into, and what I plan to resolve with a series of posts, is the lack of
structured information that is aimed at people who have not worked with NetApp
products before. (Im happy to be proven wrong so please feel free to comment on
this post or e-mail me if you feel I am incorrect).

Don't get me wrong, there is a limitless amount of well-written NetApp
documentation out there, as are there blogs, instructional videos, NetApp
University, Technical Reports – the list goes on. However, as these resources aren't
really aimed at complete newcomers, the material doesn't easily flow from topic to
topic, so newcomers are left to piece bits and pieces of information together.
Having said this, I hope I don't sound like I'm complaining, because I have
thoroughly enjoyed the learning process.

One other thing I would like to point out is that most of the notes I have collected
during my studies (and therefore, most of the information in this series of posts)
were gathered from numerous sources. I'm simply putting the
information together in a structured manner. Because of this, I will include
references to each of the sources so that the authors will get the recognition that
they deserve. Further to this, in most cases I have only collected snippets from
each source, so please also visit the reference links if you'd like to view an
unedited/unsnipped version of the information.

Finally, once I have completed this series of posts, I will release them in PDF format
along with a table of contents to enable easy searching and navigation.

Now that I have the intro out of the way, my next post will jump straight into it. I
hope you enjoy it.

NetApp From the Ground Up – A Beginner's Guide – Part 3
NOVEMBER 16, 2014 / WILL ROBINSON
Storage
Summary
Reference: Me :)
Vserver: contains one or more FlexVol volumes, or a single Infinite Volume
Volume: is like a partition that can span multiple physical disks
LUN: is a big file that sits inside the volume; the LUN is what gets presented to the host
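The layering in the summary above (Vserver/aggregate at the bottom, volumes inside it, LUNs inside volumes) can be modelled as a toy data structure. This is purely an illustration; the class names and sizes are made up and have nothing to do with real ONTAP APIs:

```python
# Toy model of the storage hierarchy: an aggregate holds volumes, a volume
# can hold LUNs, and the LUN is the only object a SAN host ever sees.

class Aggregate:
    def __init__(self, name):
        self.name, self.volumes = name, []

    def add_volume(self, vol):
        self.volumes.append(vol)
        return vol

class Volume:
    def __init__(self, name):
        self.name, self.luns = name, []

    def add_lun(self, lun):
        self.luns.append(lun)
        return lun

class LUN:
    def __init__(self, name, size_gb):
        self.name, self.size_gb = name, size_gb

aggr1 = Aggregate("aggr1")
vol1 = aggr1.add_volume(Volume("vol1"))
lun1 = vol1.add_lun(LUN("lun1", 500))

# The host is presented only the LUN, never the volume or aggregate behind it.
print(f"{aggr1.name} -> {vol1.name} -> {lun1.name} ({lun1.size_gb}GB)")
```

The nesting mirrors the descriptions that follow: the aggregate is physical, everything above it is logical.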
RAID, Volumes, LUNs and Aggregates
Response #1
Reference: NetApp Community: Understanding Aggregate And LUN
An aggregate is the physical storage. It is made up of one or more raid groups of disks.
A LUN is a logical representation of storage. It looks like a hard disk to the client; it looks like a file inside of
a volume.
Raid groups are protected sets of disks, consisting of 1 or 2 parity disks and 1 or more data disks. We don't build raid groups;
they are built automatically behind the scenes when you build an aggregate. For example:
In a default configuration you are configured for RAID-DP and a 16-disk raid group (assuming FC/SAS disks). So, if I create a
16-disk aggregate I get 1 raid group. If I create a 32-disk aggregate, I get 2 raid groups. Raid groups can be adjusted in
size; for FC/SAS they can be anywhere from 3 to 28 disks, with 16 being the default. You may be tempted to change the
size, so I have a quick/dirty summary of reasons.
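The splitting rule described above (default raid group size of 16, so 32 disks gives two groups) can be sketched in a few lines of Python. This is just an illustration of the arithmetic, not real Data ONTAP code:

```python
# Sketch of how an aggregate's disk count is split into RAID groups.
# The group size of 16 is the FC/SAS default mentioned in the text.

def raid_groups(total_disks, rg_size=16):
    """Split a disk count into full RAID groups plus an optional partial one."""
    full, remainder = divmod(total_disks, rg_size)
    groups = [rg_size] * full
    if remainder:
        groups.append(remainder)
    return groups

print(raid_groups(16))  # [16]     -> one full RAID group
print(raid_groups(32))  # [16, 16] -> two full RAID groups
print(raid_groups(24))  # [16, 8]  -> one full group plus a partial group
```

The 24-disk case matches the behaviour described later in the RAID Groups and Aggregates section.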

Aggregates are collections of raid groups. They consist of one or more Raid Groups.
I like to think of aggregates as a big hard drive. There are a lot of similarities in this. When you buy a hard drive you
need to partition it and format it before it can be used. Until then it's basically raw space. Well, that's an aggregate: it's just
raw space.
A volume is analogous to a partition. It's where you can put data. Think of the previous analogy: an aggregate is the raw
space (hard drive), the volume is the partition; it's where you put the file system and data. Some other similarities include
the ability to have multiple volumes per aggregate, just like you can have multiple partitions per hard drive, and you can
grow and shrink volumes, just like you can grow and shrink partitions.
A qtree is analogous to a subdirectory. Let's continue the analogy: aggregate is hard drive, volume is partition,
and qtree is subdirectory. Why use them? To sort data, the same reason you use them on your personal PC. There are 5
things you can do with a qtree that you can't do with a directory, and that's why they aren't just called directories:
Oplocks
Security style
Quotas
Snapvault
Qtree SnapMirror
Last but not least, LUNs. A LUN is a logical representation of space off your SAN. But the normal question is when to use a LUN
instead of a CIFS or NFS share/export. I normally say it depends on the application. Some applications need local storage;
they just can't seem to write data into a NAS (think CIFS or NFS) share. Microsoft Exchange and SQL are this way. They
require local hard drives. So the question is, how do we take this network storage and make it look like an internal hard
drive? The answer is a LUN. It takes a bit of logical space out of the aggregate (actually just a big file sitting in
a volume or qtree) and it gets mounted up on the Windows box, looking like an internal hard drive. The file system makes
normal SCSI commands against it. The SCSI commands get encapsulated in FCP or iSCSI and are sent across the network to
the SAN hardware, where they are converted back into SCSI commands and then reinterpreted as WAFL reads/writes.
Some applications know how to use a NAS for storage (think Oracle over NFS, or ESX with NFS datastores) and
they don't need LUNs; they just need access to shared space and they can store their data in it.
Response #2
Aggregates are the raw space in your storage system. You take a bunch of individual disks and aggregate them
together into aggregates. But an aggregate can't actually hold data; it's just raw space. You then layer on partitions, which
in NetApp land are called volumes. The volumes hold the data.
You make aggregates for various reasons. For example:
Performance Boundaries: A disk can only be in one aggregate, so each aggregate has its own discrete
drives. This lets us tune the performance of the aggregate by adding in however many spindles we need to achieve
the type of performance we want. This is kind of skewed by having Flash Cache cards and such, but it's still roughly
correct.
Shared Space Boundary: All volumes in an aggregate share the hard drives in that aggregate. There is no way to
prevent the volumes in an aggregate from mixing their data on the same drives. I ran into a problem at one
customer that, due to regulatory concerns, couldn't have data type A mixed with data type B. The only way to
achieve this is to have two aggregates.
For Volumes: You can't have a flexible volume without an aggregate. FlexVols are logical; aggregates
are physical. You layer one or more FlexVols on top of (inside) an aggregate.
Response #3
An aggregate is made of Raid Groups.
Let's do a few examples using the command to make an aggregate:
aggr create aggr1 16
If the default raid group size is 16, then the aggregate will have one raid group. But if I use the command:
aggr create aggr1 32
Now I have two full raid groups, but still only one aggregate. So the aggregate gets the performance benefit of 2 RGs'
worth of disks. Notice we did not build a raid group; Data ONTAP built the RG based on the default RG size.
I explain this in more detail in a previous post in this thread.
Response #4
Volumes are accessed via NAS protocols: CIFS/NFS.
LUNs are accessed via SAN protocols: iSCSI/FCP/FCoE.
In the end you can put data in a LUN, and you can put data in a Volume. It's how you get there that's the question.
Response #5
LUNs are logical. They go inside a volume, or in a qtree.
From a NetApp perspective they are really just one big file sitting inside of a volume or qtree.
From a host perspective, they are like a volume, but use a different protocol to access them (purists will argue with that but
I'm simplifying). LUNs provide a file system, like Volumes provide a file system; the major difference is who controls the
file system. With a LUN the storage system can't see the file system; all it sees is one big file. The host mounts the file
system via one of the previously mentioned protocols and lays a file system down inside. The host then controls that file
system.
I normally determine LUN/Volume usage by looking at the application. Some apps won't work across a network; Microsoft
SQL and Exchange are two examples of this. They require local disks, and LUNs look like local disks. Some applications work just
fine across the network using NFS, like Oracle. In the end it's normally the application that will determine whether you get
your filesystem access through a LUN or a Volume.
Some things like Oracle or VMware can use either LUNs or NFS volumes, so with them it's whatever you find easier or
cheaper.
Response #6
The underlying filesystem is always WAFL in the volume.
When you share out a volume it looks like NTFS to a Windows box, or it looks like a UNIX filesystem to a UNIX box, but in the
end it's just WAFL in the volume.
With a LUN it's a bit different.
You first make a volume, then you put a LUN in the volume. The volume has WAFL as the file system; the LUN looks
like one big file in the volume.
You then connect to the storage system using a SAN protocol. The big file we call a LUN is attached to the host via the SAN
protocol and looks like a big hard drive. The host then formats the hard drive with NTFS or whatever file system the UNIX
box is using. That file system (NTFS or whatever) is inside the LUN, which is a big file inside of a Volume, which
has WAFL as its file system.
Response #7
Response #8
You create your volume group (or dynamic disk pool) and volumes (i.e. LUNs) on top of that.
If you have access to Field Portal, you can find more detailed info here:
https://fieldportal.netapp.com/e-series.aspx#150496
This is a good picture from one of the presos describing the architectural difference between FAS & E-Series:

Qtrees
Overview
Reference: NetApp
Qtrees represent the third level at which node storage can be partitioned. Disks are organized into aggregates, which
provide pools of storage. In each aggregate, one or more flexible volumes can be created. Traditional volumes may also
be created directly, without the previous creation of an aggregate. Each volume contains a file system. Finally, the volume
can be divided into qtrees.
Information
Reference: NetApp
The qtree command creates qtrees and specifies attributes for qtrees.
A qtree is similar in concept to a partition. It creates a subset of a volume to which a quota can be applied to limit its
size. As a special case, a qtree can be the entire volume. A qtree is more flexible than a partition because you can change
the size of a qtree at any time.
In addition to a quota, a qtree possesses a few other properties.
A qtree enables you to apply attributes such as oplocks and security style to a subset of files and directories rather than to
an entire volume.
Single files can be moved across a qtree without moving the data blocks. Directories cannot be moved across a qtree.
However, since most clients use recursion to move the children of directories, the actual observed behavior is that
directories are copied and files are then moved.
If there are files and directories in a volume that do not belong to any qtrees you create, the node considers them to be
in qtree 0. Qtree 0 can take on the same types of attributes as any other qtree.
You can use any qtree command whether or not quotas are enabled on your node.
More Information
Reference: NetApp Forum
There are 5 things you can do with a qtree that you can't do with a directory, and that's why they aren't just called directories:
Oplocks
Security style
Quotas
Snapvault
Qtree SnapMirror
RAID-DP
Understanding RAID-DP disk types
Reference: Understanding RAID disk types
Data ONTAP classifies disks as one of four types for RAID: data, hot spare, parity, or dParity. The RAID disk type is determined
by how RAID is using a disk; it is different from the Data ONTAP disk type.
Data disk: Holds data stored on behalf of clients within RAID groups (and any data generated about the state of the
storage system as a result of a malfunction).
Spare disk: Does not hold usable data, but is available to be added to a RAID group in an aggregate. Any
functioning disk that is not assigned to an aggregate but is assigned to a system functions as a hot spare disk.
Parity disk: Stores row parity information that is used for data reconstruction when a single disk drive fails within
the RAID group.
dParity disk: Stores diagonal parity information that is used for data reconstruction when two disk drives fail within
the RAID group, if RAID-DP is enabled.
RAID Groups and Aggregates
Reference: RAID Groups and Aggregates
In the course of teaching Netapp's Data ONTAP Fundamentals course, I have noticed that one of the areas that students
sometimes struggle with is RAID groups as they exist in Data ONTAP.
To begin with, Netapp uses dedicated parity drives, unlike many other storage vendors. Parity information is constructed
for a horizontal stripe of WAFL blocks in a RAID group within an aggregate and then written to disk at the same time the
data disks are updated. The width of the RAID group (the number of data disks) is independent of the parity disk or disks.
Take a look at this screenshot from FilerView:

Notice that the RAID group size is 16. This is the default RAID group size for RAID-DP with Fibre Channel disks. Notice also
that the number of disks in aggr1 is actually 5.
When I created aggr1 I used the command:
aggr create aggr1 5
This caused Data ONTAP to create an aggregate named aggr1 with five disks in it. Let's take a look at this with the
following command:
sysconfig -r
If you look at aggr1, you can see that it contains 5 disks. Three disks are data disks and there are
two parity disks, parity and dparity. The RAID group was created automatically to support the aggregate. I have
a partial RAID group in the sense that the RAID group size is 16 (look at the FilerView screenshot) but I only asked for an
aggregate with 5 disks, so aggr1 is an aggregate with one RAID group and 5 disk drives in it.
It is fully usable in this state. I can create volumes for NAS or SAN use and they are fully functional. If I need more space, I
can add disks to the aggregate and they will be inserted into the existing RAID group within the aggregate. I can add 3
disks with the following command:
aggr add aggr1 3
Look at the following output:

Notice that I have added three more data disks to /aggr1/plex0/rg0.


The same parity disks are protecting the RAID group.
Data ONTAP is able to add disks from the spare pool to the RAID group quickly if the spare disks are pre-zeroed. Before
the disks can be added, they must be zeroed. If they are not already zeroed, then Data ONTAP will zero them first. This may
take a significant amount of time. Spares as shipped by Netapp are pre-zeroed, but drives that join the spare pool after you
destroy an aggregate are not.
The inserted disks are protected by the same parity calculation that existed on the parity drives before they were inserted.
This works because the new WAFL blocks that align with the previous WAFL blocks in a parity stripe contain only zeroes.
The new (zeroed) disks have no effect on the parity drives.
Once the drives are part of the RAID groups within the aggregate, that space can be made available to volumes and used
by applications.
An aggregate can contain multiple RAID groups. If I had created an aggregate with 24 disks, then Data ONTAP would
have created two RAID groups. The first RAID group would be fully populated with 16 disks (14 data disks and two
parity disks) and the second RAID group would have contained 8 disks (6 data disks and two parity disks). This is a
perfectly normal situation.
For the most part, it is safe to ignore RAID groups and simply let Data ONTAP take care of things. The one situation you
should avoid, however, is creating a partial RAID group with only one or two data disks. (Using a dedicated aggregate to
support the root volume would be an exception to this rule.) Try to have at least three data disks in a RAID group for
better performance.
There is a hierarchy to the way storage is implemented with Data ONTAP. At the base of the hierarchy is the aggregate,
which is made up of RAID groups. The aggregate provides the physical space for the flexible volumes (flexvols) that
applications see. Applications, whether SAN or NAS, pull space that has been assigned to the volume from
the aggregate and are not aware of the underlying physical structure provided by the aggregate.
This is why we say that the aggregate represents the physical storage and the volumes provide the logical storage.
RAID Groups
Reference: Playing with NetApp After Rightsizing
Before all the physical hard disk drives (HDDs) are pooled into a logical construct called an aggregate (which is what
ONTAP's FlexVol is about), the HDDs are grouped into a RAID group. A RAID group is also a logical construct, one that
combines the HDDs into data and parity disks. The RAID group is the building block of the aggregate.
So why a RAID group? Well, first of all (although likely possible), it is not wise to group a large number of HDDs into a single
group with only 2 parity drives supporting the RAID. Even though one can maximize the allowable aggregated capacity
from the HDDs, the data reconstruction or resilvering operation following an HDD failure (disks are supposed to fail once
in a while, remember?) would slow the RAID operations to a trickle because of the large number of HDDs the
operation has to address. Therefore, it is best to spread them out into multiple RAID groups with a recommended fixed
number of HDDs per RAID group.
The RAID group is important because it is used to balance a few considerations:
Performance in recovery if there is a disk reconstruction or resilvering
Combined RAID performance and availability, through a Mean Time Between Data Loss (MTBDL) formula
Different ONTAP versions (and also different disk types) have different numbers of HDDs constituting a RAID group. For
ONTAP 8.0.1, the table below shows its recommendations.

So, given a large pool of HDDs, the NetApp storage administrator has to figure out the best layout and the optimal number
of HDDs to get to the capacity he/she wants. And there is also a best practice to set aside 2 hot spare HDDs for a RAID-DP
configuration with every 30 or so HDDs so that they can be used in the event of HDD failures. Also, it is best practice to
take the default recommended RAID group size most of the time as opposed to the maximum size.
I would presume that this is all getting very confusing, so let me show that with an example. Let's use the common 2TB SATA
HDD and let's assume the customer has just bought a 100-HDD FAS6000. From the table above,
the default (and recommended) RAID group size is 14. The customer wants to have maximum usable capacity as well. In
a step-by-step guide:
1. Consider the hot sparing best practice. The customer wants to ensure that there will always be enough spares, so
using the rule-of-thumb of 2 spare HDDs per 30 HDDs, 6 disks are kept aside as hot spares. That leaves 94
HDDs from the initial 100 HDDs.
2. There is a root volume, rootvol, and it is recommended to put this into an aggregate of its own so that it gets
maximum performance and availability. To standardize, the storage administrator configures 3 HDDs as 1 RAID
group to create the rootvol aggregate, aggr0. Even though the total capacity used by the rootvol is just a few
hundred GBs, it is not recommended to place user data into rootvol. Of course, this situation cannot be avoided
in most of the FAS2000 series, where a smaller HDD count is sold and implemented. With 3 HDDs used up as
rootvol, the customer now has 91 HDDs.
3. With 91 HDDs, and using the default RAID group size of 14, for the next aggregate of aggr1, the storage
administrator can configure 6 x full RAID group of 14 HDDs (6 x 14 = 84) and 1 x partial RAID group of 7
HDDs. (Note that as per this post, there's nothing wrong with partial RAID groups.)
4. RAID-DP requires 2 disks per RAID group to be used as parity disks. Since there are a total of 7 RAID
groups from the 91 HDDs, 14 HDDs are parity disks, leaving 77 HDDs as data disks.
This is where the rightsized capacity comes back into play again. 77 x 2TB HDDs is really 77 x 1.69TB = 130.13TB, down
from an initial 100 x 2TB = 200TB.
If you intend to create more aggregates (in our example here, we have only 2 aggregates, aggr0 and aggr1), there will be
more considerations for RAID group sizing and parity disks, further reducing the usable capacity.
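The step-by-step sizing above can be reproduced in code. All of the figures (6 spares, a 3-disk rootvol aggregate, RAID group size 14, the 1.69TB rightsized capacity) come from the worked example, not from a real system:

```python
# Worked version of the sizing example: 100 x 2TB SATA HDDs on a FAS6000,
# RAID-DP, default RAID group size of 14.

total_hdds    = 100
spares        = 6       # roughly 2 hot spares per 30 HDDs
rootvol_disks = 3       # dedicated 3-disk RAID group for aggr0 / rootvol
rg_size       = 14
parity_per_rg = 2       # RAID-DP: one parity + one dparity disk per group

remaining = total_hdds - spares - rootvol_disks     # 91 HDDs left for aggr1
full_rgs, partial = divmod(remaining, rg_size)      # 6 full RGs + a 7-disk partial
num_rgs = full_rgs + (1 if partial else 0)          # 7 RAID groups in total
data_disks = remaining - num_rgs * parity_per_rg    # 91 - 14 parity = 77 data disks

usable_tb = data_disks * 1.69                       # rightsized 2TB disk = 1.69TB
print(data_disks, round(usable_tb, 2))              # 77 130.13
```

Changing `rg_size` or the spare count shows quickly how the data-disk count (and therefore usable capacity) shifts.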
More RAID Information
Reference: Playing with NetApp final usable capacity
An aggregate, for the uninformed, is the disk pool from which the flexible volume, FlexVol, is derived, as in the simple
picture below.

OK, the diagram's in Japanese (I am feeling a bit cheeky today :P)!


But it does look a bit self-explanatory with some help, which I shall provide now. If you start from the bottom of the
picture, 16 x 300GB disks are combined together to create a RAID group, and there are 4 RAID groups created: rg0,
rg1, rg2 and rg3. These RAID groups make up the ONTAP data structure called an aggregate. From ONTAP version 7.3
onward, there were some minor changes to how ONTAP reports capacity but fundamentally, it did not change much from
previous versions of ONTAP. Also note that ONTAP takes a 10% overhead of the aggregate for its own use.
With the aggregate, the logical structure called the FlexVol is created. A FlexVol can be as small as several megabytes or
as large as 100TB, grown by any increment on-the-fly. This logical structure also allows shrinking of the capacity of the
volume online and on-the-fly as well. Eventually, the volumes created from the aggregate become the next building blocks
of NetApp NFS and CIFS volumes and also LUNs for iSCSI and Fibre Channel. Also note that, for a more effective organization
of logical structures from the volumes, using qtrees is highly recommended for file and ONTAP management reasons.
However, for both the aggregate and the FlexVol volumes created from it, a Snapshot reserve is recommended. The
aggregate takes a 5% overhead of the capacity for its snapshot reserve, while every FlexVol volume gets a 20% snapshot
reserve. While both snapshot percentages are adjustable, it is recommended to keep them as best practice
(except for FlexVol volumes assigned for LUNs, which can be adjusted to 0%).
Note: Even if the Snapshot reserve is adjusted to 0%, there are still some other rule sets for these LUNs that will further
reduce the capacity. When dealing with NetApp engineers or pre-sales, ask them about space reservations and how they
do snapshots for fat LUNs and thin LUNs, and their best practices in these situations. Believe me, if you don't ask, you will be
very surprised by the final usable capacity allocated to your applications.
In a nutshell, the dissection of capacity after the aggregate would look like the picture below:

We can easily quantify the overall usable capacity with a little formula that I have used for some time:
Rightsized Disk Capacity x # Disks x 0.90 x 0.95 = Total Aggregate Usable Capacity
Then remember that each volume takes a 20% Snapshot reserve overhead. That's what you have got to play with when it
comes to the final usable capacity.
Though the figure is not 100% accurate, because there are many variables in play, it gives the customer a way to
manually calculate their potential final usable capacity.
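As a rough sketch (my own toy arithmetic, not a NetApp calculator), the formula and the per-volume 20% reserve can be applied like this:

```python
# A rough sketch of the rule-of-thumb above (toy arithmetic, not a
# NetApp calculator): 10% WAFL overhead, 5% aggregate snap reserve,
# then a 20% Snapshot reserve per FlexVol volume.
def usable_capacity_tb(rightsized_tb, data_disks,
                       wafl_overhead=0.10, aggr_snap_reserve=0.05,
                       vol_snap_reserve=0.20):
    aggr = rightsized_tb * data_disks * (1 - wafl_overhead) * (1 - aggr_snap_reserve)
    vol = aggr * (1 - vol_snap_reserve)
    return round(aggr, 2), round(vol, 2)

# 77 data disks at a right-sized 1.69TB each (the example above):
aggr_tb, vol_tb = usable_capacity_tb(1.69, 77)
print(aggr_tb, vol_tb)  # roughly 111.26TB of aggregate, 89.01TB after volume reserves
```

So the "200TB" of raw disk in the example ends up delivering under half of that to applications, which is exactly the point the post is making.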
Please note the following best practices, and that this applies to 1 data aggregate only. For more aggregates, the same
formula has to be applied again.
1. A RAID-DP, 3-disk rootvol0, for the root volume is set aside and is not accounted for in usable capacity
2. A rule-of-thumb of 2-disks hot spares is applied for every 30 disks
3. The default RAID Group size is used, depending on the type of disk profile used
4. Snapshot reserves default of 5% for aggregate and 20% per FlexVol volumes are applied
5. Snapshots for LUNs are subject to space reservation, either full or fractional. Note that there are considerations of
2x + delta and 1x + delta (ask your NetApp engineer) for iSCSI and Fibre Channel LUNs, even though snapshot
reserves are adjusted to 0% and snapshots are likely to be turned off.
Another note to remember is not to use any of those capacity calculators given out. These calculators are designed to give
the advantage to NetApp, not necessarily to the customer. Therefore, it is best to calculate these things by hand.
Regardless of what the customer gets as the overall final usable capacity, it is important to understand the NetApp
philosophy of doing things. While we have, perhaps, gone overboard explaining the usable capacity and the nitty-gritty that
comes with it, all these things are done for a reason: to ensure simplicity and ease of navigating data management in the
storage networking world. Other NetApp solutions such as SnapMirror and SnapVault, and also the SnapManager suite of
products, rely heavily on this.
Right-Sizing
See Part 11 for more information on Right-Sizing.
Other Posts in this Series:
See the NetApp From the Ground Up - A Beginner's Guide Index post for links to all of the posts in this series.
As always, if you have any questions or have a topic that you would like me to discuss, please feel free to post a comment at
the bottom of this blog entry, e-mail me at will@oznetnerd.com, or drop me a message on Twitter (@OzNetNerd).
Note: This website is my personal blog. The opinions expressed in this blog are my own and not those of my
employer.

NetApp From the Ground Up - A Beginner's Guide - Part 4
NOVEMBER 16, 2014 / WILL ROBINSON

WAFL
Reference: Bitpushr: How Data ONTAP caches, assembles and writes data
WAFL is our Write Anywhere File Layout. If NVRAM's role is the most-commonly
misunderstood, WAFL comes in 2nd. Yet WAFL has a simple goal, which is to write
data in full stripes across the storage media. WAFL acts as an intermediary of sorts:
there is a top half where files and volumes sit, and a bottom half (reference)
that interacts with RAID, manages Snapshots and some other things. WAFL isn't a
filesystem, but it does some things a filesystem does; it can also contain
filesystems. WAFL contains mechanisms for dealing with files & directories, for
interacting with volumes & aggregates, and for interacting with RAID. If Data
ONTAP is the heart of a NetApp controller, WAFL is the blood that it pumps.

Although WAFL can write anywhere we want, in reality we write where it makes the
most sense: in the closest place (relative to the disk head) where we can write a
complete stripe, in order to minimize seek time on subsequent I/O requests. WAFL
is optimized for writes, and we'll see why below. Rather unusually for storage
arrays, we can write client data and metadata anywhere.

A colleague has this to say about WAFL, and I couldn't put it better:

There is a relatively simple 'cheating at Tetris' analogy that can be used to
articulate WAFL's advantages. It is not hard to imagine how good you could be at
Tetris if you were able to review the next thousand shapes that were falling into
the pattern, rather than just the next shape.

Now imagine how much better you could be at Tetris if you could take any of the
shapes from within the next thousand to place into your pattern, rather than being
forced to use just the next shape that is falling.

Finally, imagine having plenty of time to review the next thousand shapes and plan
your layout of all 1,000, rather than just a second or two to figure out what to do
with the next piece that is falling. In summary, you could become the best Tetris
player on Earth, and that is essentially what WAFL is in the arena of data allocation
techniques onto underlying disk arrays.

The Tetris analogy is incredibly important, as it directly relates to the way that
NetApp uses WAFL to optimize for writes. Essentially, we collect random I/O that is
destined to be written to disk, reorganize it so that it resembles sequential I/O as
much as possible, and then write it to disk sequentially. Another way of explaining
this behavior is write coalescing: we reduce the number of operations that
ultimately land on the disk, because we re-organize them in memory before we
commit them to disk, and we wait until we have a bunch of them before committing
them to disk via a Consistency Point. Put another way, write coalescing allows us to
avoid the common (and expensive) RAID workflow of read-modify-write.

Note: We write the client's data from RAM (not from NVRAM) to disk. (Reference).
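As a toy illustration of the idea (not actual Data ONTAP code; the stripe width of 4 data blocks is an arbitrary assumption), write coalescing can be modelled as buffering random writes and laying them out as full stripes at a consistency point:

```python
import random

# Toy illustration of write coalescing (not Data ONTAP code; the stripe
# width of 4 data blocks is an arbitrary assumption).
STRIPE_WIDTH = 4

def consistency_point(buffered_writes):
    """Coalesce buffered (block_id, data) pairs into full-stripe writes."""
    latest = dict(buffered_writes)     # rewrites of the same block collapse together
    blocks = sorted(latest.items())    # random I/O now resembles sequential I/O
    return [blocks[i:i + STRIPE_WIDTH]
            for i in range(0, len(blocks), STRIPE_WIDTH)]

# 16 random writes plus one rewrite land as at most 4 stripe writes,
# with no per-block read-modify-write against the disks.
writes = [(random.randrange(100), b"x") for _ in range(16)]
writes.append((writes[0][0], b"y"))    # a rewrite of an already-buffered block
print(len(consistency_point(writes)))
```

The point of the sketch is the shape of the work, not the numbers: many scattered client writes become a handful of sequential full-stripe writes.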
More Information

Reference: NetApp University Introduction to NetApp Products


The Data ONTAP operating system supports production environments with robust
and distinctive data protection technologies that are not found in other storage
systems. How can a storage system that uses disk drives just like everyone else
provide such unique functionality? The secret to the success of the Data ONTAP
operating system is found in its core technologies: Write Anywhere File Layout
(WAFL), NVRAM, and RAID-DP. These components power the exclusive data
management capabilities of the Data ONTAP operating system.
The WAFL file system was developed with three architectural principles: optimize
for writes, integrate nonvolatile storage, and support RAID. The WAFL layer
optimizes write activity by organizing blocks of incoming data so that they can be
written simultaneously, across multiple disks, to enable maximum parallelism.
Internal file system metadata, known as inodes or pointers, is written alongside
production data, minimizing the need for the performance-impacting seek
operations common to other file systems.
In NAS environments, users access the WAFL file system directly through shares
and exports. In SAN environments, WAFL technology functions as a virtualization
layer, which enables the Data ONTAP process of optimizing block layout to remain
independent from the host's proprietary layout inside the LUN.

The WAFL virtualization layer does a lot more than just put blocks on a disk. This
additional processing could introduce latency, but it does not. NVRAM is the key
component for delivering fast, low-latency data access while WAFL technology
virtualizes the storage subsystem. Each write or update request that the storage
system receives is logged to NVRAM and mirrored to the partner system's NVRAM.
Because the data is now protected by battery backup and the partner mirror, the
system can send the write acknowledgement without waiting for the storage layer,
which is much slower than NVRAM. In this way, data center production proceeds
over a purely electronic data path, resulting in high-speed, low-latency write and
update activity. The WAFL layer commits the writes to the storage medium, disk or
flash, independently. Each block of data must be successfully written before it is
cleared from NVRAM. NVRAM secures data and increases performance while the
WAFL layer intelligently organizes the destination storage structure.
NVRAM
Reference: Bit Pushr: How Data ONTAP caches, assembles and writes data
Physically, NVRAM is little more than RAM with a battery backup. Our NVRAM
contains a transaction log of client I/O that has not yet been written to disk from
RAM by a consistency point. Its primary mission is to preserve that not-yet-written
data in the event of a power outage or similar, severe problem. Our controllers
vary from 768MB of NVRAM (FAS2220) to 4GB of NVRAM (FAS6290). In my
opinion, NVRAM's function is perhaps the most-commonly misunderstood part of
our architecture. NVRAM is simply a double-buffered journal of pending write
operations. NVRAM, therefore, is simply a redo log; it is not the write cache!
After data is written to NVRAM, it is not looked at again unless you experience a
dirty shutdown. This is because NVRAM's importance to performance comes from
software.
In an HA pair environment where two controllers are connected to each other,
NVRAM is mirrored between the two nodes. Here, its primary mission is to preserve
that not-yet-written data in the event a partner controller suffers a power outage
or similar severe problem. NVRAM mirroring happens for HA pairs in Data ONTAP
7-mode, HA pairs in clustered Data ONTAP and HA pairs in MetroCluster
environments.
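The "double-buffered journal" behaviour can be sketched with a toy model (assumptions of my own, not ONTAP internals): incoming writes are logged and acknowledged immediately, and when one buffer fills it flushes via a consistency point while the other keeps absorbing new writes.

```python
# Toy sketch (not ONTAP internals) of NVRAM as a double-buffered
# journal: writes are logged and acknowledged without waiting on disk;
# a full buffer is committed via a consistency point while the other
# buffer keeps taking new log entries.
class ToyNvram:
    def __init__(self, buffer_size=4):
        self.buffers = [[], []]
        self.active = 0              # buffer currently taking new log entries
        self.buffer_size = buffer_size
        self.flushed = []            # stands in for data committed to disk

    def log_write(self, entry):
        self.buffers[self.active].append(entry)
        if len(self.buffers[self.active]) == self.buffer_size:
            self._consistency_point()
        return "ack"                 # client is acknowledged immediately

    def _consistency_point(self):
        full, self.active = self.active, 1 - self.active
        self.flushed.extend(self.buffers[full])  # commit, then clear the journal
        self.buffers[full] = []

nv = ToyNvram()
for i in range(6):
    nv.log_write(f"op{i}")
print(len(nv.flushed), len(nv.buffers[nv.active]))  # -> 4 2
```

Six writes were acknowledged at RAM speed; four have been committed at the first consistency point and two still sit safely in the journal.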

NetApp From the Ground Up - A Beginner's Guide - Part 5
NOVEMBER 17, 2014 / WILL ROBINSON

FlexVolume
Reference: NetApp University Introduction to NetApp Products
Data production and data protection are the basic capabilities of any storage
system. Data ONTAP software goes further by providing advanced storage
efficiency capabilities. Traditional storage systems allocate data disk by disk, but
the Data ONTAP operating system uses flexible volumes to drive higher rates of
utilization and to enable thin provisioning. NetApp FlexVol technology gives
administrators the flexibility to allocate storage at current capacity, rather than
having to guess at future needs. When more space is needed, the administrator
simply resizes the flexible volume to match the need. If the system is nearing its
total current capacity, more storage can be added while in production,
enabling just-in-time purchase and installation of capacity.
Note: You can't have a flexible volume without an aggregate.
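As a hedged sketch only (Data ONTAP 7-mode syntax; the volume and aggregate names are made up, and exact syntax varies by version), the allocate-now, grow-later workflow looks roughly like this:

```
# Create a 100GB flexible volume inside an existing aggregate
vol create vol1 aggr1 100g

# Demand grew: resize the flexible volume on the fly
vol size vol1 +50g
```

The resize happens online, without disrupting clients, which is what makes the just-in-time capacity approach described above practical.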

Infinite Volume
Target Workloads and Use Cases

Reference: NetApp: Introduction to NetApp Infinite Volume


Infinite Volume was developed to provide a scalable, cost-effective solution for big
content workloads. Specifically, Infinite Volume addresses the requirements for
large unstructured repositories of primary data, which are also known as enterprise
content repositories.

Enterprise content repositories can be subdivided into workloads with similar
access patterns, data protection requirements, protocol requirements, and
performance requirements.
Infinite Volume is focused on use cases that can be characterized by input/output
(I/O) patterns in which data is written once and seldom changed. However, this
data is used for normal business operations, and therefore content must be kept
online for fast retrieval, rather than being moved to secondary storage. One
example of this type of workload is a video file archive. Libraries of large video files
are kept in a repository from which they are periodically retrieved and sent to
broadcast sites. These repositories typically grow as large as 5PB.

Another example is enterprise content management storage. This can be used to
store large amounts of unstructured content such as documents, graphics, and
scanned images. These environments can commonly contain a million or more
files.

More Information

Reference: Chuck's Blog


Enter the world of vServers and the quintessentially named 'infinite volumes'. If
you want to do seriously large file systems (like an Isilon normally does), there are
some interesting restrictions in play.

In ONTAP 8.1.1, to get meaningfully large file systems, you start with a dedicated
hardware/software partition within your 8.1.1 cluster. This partition will support one
(and apparently only one) vServer, or visible file system. Between the two
constructs exists a new entity: the infinite volume, an aggregator of,
well, aggregates running on separate nodes.

This partitioned world of dedicated hardware, a single vServer and the new
infinite volume is the only place where you can start talking about seriously large
file systems.

Additional Information

Reference: NetApp: Introduction to NetApp Infinite Volume


Big content storage solutions can be categorized into three main categories based
on the storage, management, and retrieval of the data into file services, enterprise
content repositories, and distributed content repositories. NetApp addresses the
business challenges of big content by providing the appropriate solution for all of
these different environments.

File services represent the portion of the unstructured data market in which
NetApp has traditionally been a leader, including project shares and home
directory use cases.
The enterprise content repository market, by contrast, is less driven by direct
end users and more by applications that require large container sizes with an
increasing number of files.
Distributed content repositories take advantage of object protocols to provide
a global namespace that spans numerous data centers.
Infinite Volume addresses the enterprise content repository market and is
optimized for scale and ease of management. Infinite Volume is a cost-effective
large container that can grow to PBs of storage and billions of files. It is built on
NetApp's reliable fabric-attached storage (FAS) and V-Series systems, and it
inherits the advanced capabilities of clustered Data ONTAP.

By providing a single large container for unstructured data, e-mail, video, and
graphics, Infinite Volume eliminates the need to build data management
capabilities into applications with big content requirements. For these
environments, Infinite Volume takes advantage of native storage efficiency
features, such as deduplication and compression, to keep storage costs low.

Further, since Infinite Volume is built into clustered Data ONTAP, the customer is
able to host both Infinite Volume(s) and FlexVol volumes together in a unified
scale-out storage solution. This provides the customer with the ability to host a
variety of different applications in a multi-tenancy environment, with nondisruptive
operations and the ability to use both SAN and NAS in the same storage
infrastructure leveraging the same hardware.

Advantages of Infinite Volume

Reference: NetApp: Introduction to NetApp Infinite Volume


Infinite Volume offers many business advantages for enterprise content
repositories. For example, an Infinite Volume for an enterprise content repository
solution can be used to:

Reduce the cost of scalability
Lower the effective cost per GB
Efficiently ingest, store, and deliver large amounts of data
Reduce complexity and management overhead
Simplify and automate storage management operations
Provide seamless operation and data and service availability
Infinite Volume leverages dense storage shelves from NetApp with the effective
use of large-capacity storage disks. The solution is built on top of the proven
foundation of Data ONTAP with storage efficiency features like deduplication and
compression.

Infinite Volume gives customers a single, large, scalable container to help them
manage huge amounts of growth in unstructured data that might be difficult to
manage by using several containers. Data is automatically load balanced across
the Infinite Volume at ingest. This manageability allows storage administrators to
easily monitor the health state and capacity requirements of their storage systems.

Infinite Volumes are configured within a Data ONTAP cluster and do not require
dedicated hardware. Infinite Volumes can share the same hardware with FlexVol
volumes.

Overview of Infinite Volume

Reference: NetApp: Introduction to NetApp Infinite Volume


NetApp Infinite Volume is a software abstraction hosted over clustered Data ONTAP.
It provides a single mountpoint that can scale to 20PB and 2 billion files, and it
integrates with NetApp's proven technologies and products, such as deduplication,
compression, and NetApp SnapMirror replication technology.

Infinite Volume writes an individual file in its entirety to a single node but
distributes the files across several controllers within a cluster.

Figure 1 shows how an Infinite Volume appears as a single large container with
billions of files stored in numerous data constituents.
In the first version of Infinite Volume, data access was provided over the NFSv3
protocol. Starting in clustered Data ONTAP 8.2, Infinite Volume added support for
NFSv4.1, pNFS, and CIFS. Like a FlexVol volume, Infinite Volume data is protected
by using NetApp Snapshot, RAID-DP, and SnapMirror technologies, and NFS- or
CIFS-mounted tape backups.

FlexVol Vs Infinite Vol

Reference: NetApp: Comparison of FlexVol volumes and Infinite Volumes


Both FlexVol volumes and Infinite Volumes are data containers. However, they
have significant differences that you should consider before deciding which type of
volume to include in your storage architecture.

The following table summarizes the differences and similarities between FlexVol
volumes and Infinite Volumes:

FlexClone

Reference: Back to Basics: FlexClone


In the IT world, there are countless situations in which it is desirable to create a
copy of a dataset: Application development and test (dev/test) and the
provisioning of new virtual machines are common examples. Unfortunately,
traditional copies don't come free. They consume significant storage capacity,
server and network resources, and valuable administrator time and energy. As a
result, your operation probably makes do with fewer, less up-to-date copies than
you really need.

This is exactly the problem that NetApp FlexClone technology was designed to
solve. FlexClone was introduced to allow you to make fast, space-efficient copies of
flexible volumes (FlexVol volumes) and LUNs. A previous Tech OnTap article
describes how one IT team used the NetApp rapid cloning capability built on
FlexClone technology (now incorporated as part of the NetApp Virtual Storage
Console, or VSC) to deploy a 9,000-seat virtual desktop environment with
flexible, fast reprovisioning and using a fraction of the storage that would
normally be required. NetApp uses the same approach for server provisioning in its
own data centers.

Figure 1: FlexClone technology versus the traditional approach to data copies.

Using FlexClone technology instead of traditional copies offers significant
advantages. It is:

Fast. Traditional copies can take many minutes or hours to make. With
FlexClone technology even the largest volumes can be cloned in a matter of
seconds.
Space efficient. A clone uses a small amount of space for metadata, and
then only consumes additional space as data is changed or added.
Reduces costs. FlexClone technology can cut the storage you need for
dev/test or virtual environments by 50% or more.
Improves quality of dev/test. Make as many copies of your full production
dataset as you need. If a test corrupts the data, start again in seconds.
Developers and test engineers spend less time waiting for access to datasets
and more time doing productive work.
Lets you get more from your DR environment. FlexClone makes it
possible to clone and fully test your DR processes, or use your DR
environment for dev/test without interfering with ongoing replication. You
simply clone your DR copies and do dev/test on the clones.
Accelerates virtual machine and virtual desktop provisioning. Deploy
tens or hundreds of new VMs in minutes with only a small incremental
increase in storage.
Most Tech OnTap readers probably know about the use of FlexClone for cloning
volumes. What's less well known is that, starting with Data ONTAP 7.3.1, NetApp
also gave FlexClone the ability to clone individual files and improved the capability
for cloning LUNs.

This chapter of Back to Basics explores how NetApp FlexClone technology is
implemented, the most common use cases, best practices for implementing
FlexClone, and more.
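As a toy space model (my own illustration with made-up numbers, not NetApp code), the "small metadata plus only the deltas" behaviour of a clone looks like this:

```python
# Toy space model of a writable clone (an illustration, not NetApp
# code): the clone shares the parent's blocks and consumes new space
# only for a little metadata plus the blocks changed after cloning.
def clone_space_mb(parent_blocks, changed_blocks, block_mb=1, metadata_mb=100):
    """MB consumed by a clone, vs. parent_blocks * block_mb for a full copy."""
    return metadata_mb + changed_blocks * block_mb

# Clone a 1,000,000-block (~1TB) volume, then change 2% of it:
print(clone_space_mb(1_000_000, 20_000))  # -> 20100 (MB), vs 1000000 for a full copy
```

That ratio is the whole pitch: ten dev/test clones of a 1TB dataset that each drift by a few percent cost a few hundred GB, not 10TB.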

NetApp From the Ground Up - A Beginner's Guide - Part 6
NOVEMBER 17, 2014 / WILL ROBINSON

Backups
Reference: NetApp Training Fast Track 101: NetApp Portfolio
Affordable NetApp protection software safeguards your data and business-critical
applications.
Explore the range of NetApp protection software products available to protect your
valuable data and applications and to provide optimal availability, IT efficiency,
and peace of mind.
We have a number of different types of data protection applications; they are:

Disk-to-Disk Backup and Recovery Solutions
Application-Aware Backup and Recovery Solutions for Application and Backup Admins
Business Continuity and High Availability Solutions

Let's look at these in detail.
Disk-to-Disk Backup and Recovery Solutions
Reference: NetApp Training Fast Track 101: NetApp Portfolio
Disk-to-Disk Backup and Recovery Solutions
SnapVault software speeds and simplifies backup and data recovery,
protecting data at the block level.
Open Systems SnapVault (OSSV) software leverages block-level
incremental backup technology to protect Windows, Linux, UNIX, SQL Server,
and VMware systems running on mixed storage.
SnapRestore data recovery software uses stored Data
ONTAP Snapshot copies to recover anything from a single file to multi-
terabyte volumes, in seconds.
Application-Aware Backup and Recovery Solutions for Application and
Backup Administrators

Reference: NetApp Training Fast Track 101: NetApp Portfolio


Application-Aware Backup and Recovery Solutions for Application and Backup
Admins
The SnapManager management software family streamlines storage
management and simplifies configuration, backup, and restore operations.
SnapProtect management software accelerates and simplifies backup and
data recovery for shared IT infrastructures.
OnCommand Unified Manager automates the management of physical
and virtual storage for NetApp storage systems and clusters.
Business Continuity and High Availability Solutions

Reference: NetApp Training Fast Track 101: NetApp Portfolio


Business continuity and High Availability Solutions
SnapMirror data replication technology provides disaster recovery
protection and simplifies the management of data replication.
MetroCluster high-availability and disaster recovery software delivers
continuous availability, transparent failover protection, and zero data loss.

Snapshot
Basic Snapshots

Reference: Why NetApp Snapshots Are Awesome


In the beginning, snapshots were pretty simple: a backup, only faster. Read
everything on your primary disk, and copy it to another disk.

Simple. Effective. Expensive.

Think of these kinds of snapshots as being like a photocopier. You take a piece of
paper, and write on it. When you want a snapshot, you stop writing on the paper,
put it into the photocopier, and make a copy. Now you have 2 pieces of paper.

A big database might take up 50 pieces of paper. Taking a snapshot takes a while,
because you have to copy each page. And the cost adds up. Imagine each piece of
paper cost $5k, or $10k.

Still, it's faster than hand-copying your address book into another book every
week.

It's not a perfect analogy, but it's pretty close.

Copy-on-Write Snapshots

Reference: Why NetApp Snapshots Are Awesome


Having to copy all the data every time is a drag, because it takes up a lot of space,
takes ages, and costs more. Both taking the snapshot, and restoring it, take a long
time because you have to copy all the data.

But what if you didn't have to? What if you could copy only the bits that changed?

Enter copy-on-write snapshots. The first snapshot records the baseline, before
anything changes. Since nothing has changed yet, you don't need to move data
around.

But as soon as you want to change something, you need to take note of it
somewhere. Copy-on-write does this by first copying the original data to a (special,
hidden) snapshot area, and then overwriting the original data with the new data.
Pretty simple, and effective.
And now it doesn't take up as much space, because you're just recording the
changes, or deltas.

But there are some downsides.

Each time you change a block of data, the system has to read the old block, write
it to the snapshot area, and then write the new block. So, for each write, the disk
actually does two writes and one read. This slows things down.

It's a tradeoff. You lose a bit in write performance, but you don't need as much disk
to get snapshots. With some clever caching and other techniques, you can reduce
the performance impact, and overall you save money but get some good benefits,
so it was often worth it.

But what if you didn't have to copy the original data?

NetApp Snapshots

Reference: Why NetApp Snapshots Are Awesome


NetApp snapshots (and ZFS snapshots, incidentally) do things differently. Instead of
copying the old data out of the way before it gets overwritten, the NetApp just
writes the new information to a special bit of disk reserved for storing these
changes, called the SnapReserve. Then, the pointers that tell the system where
to find the data get updated to point to the new data in the SnapReserve.

That's why the SnapReserve fills up when you change data on a NetApp. And
remember that deleting is a change, so deleting a bunch of data fills up the
SnapReserve, too.

This method has a bunch of advantages. You're only recording the deltas, so you
get the disk savings of copy-on-write snapshots. But you're not copying the original
block out of the way, so you don't have the performance slowdown. There's a small
performance impact, but updating pointers is much faster, which is why NetApp
performance is just fine with snapshots turned on, so they're on by default.

It gets better.

Because the snapshot is just pointers, when you want to restore data
(using SnapRestore) all you have to do is update the pointers to point to the
original data again. This is faster than copying all the data back from the snapshot
area over the original data, as in copy-on-write snapshots.
So taking a snapshot completes in seconds, even for really large volumes (like,
terabytes) and so do restores. Seconds to snap back to a point in time. How cool is
that?
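The pointer-update behaviour described above can be modelled with a toy Python class (an illustration only, not WAFL internals): a snapshot is just a frozen copy of the block-pointer map, new writes land in fresh blocks, and a restore swaps the pointer map back.

```python
# Toy model (not WAFL internals) of pointer-based snapshots: a snapshot
# freezes the block-pointer map; writes go to fresh blocks; a restore
# swaps the pointer map back without copying any data.
class ToyVolume:
    def __init__(self):
        self.blocks = {}   # physical block store: addr -> data
        self.ptrs = {}     # active map: logical block -> addr
        self.snaps = {}    # snapshot name -> frozen pointer map
        self._next = 0

    def write(self, logical, data):
        self.blocks[self._next] = data     # new data goes to new space
        self.ptrs[logical] = self._next    # only the pointer is updated
        self._next += 1

    def snapshot(self, name):
        self.snaps[name] = dict(self.ptrs)   # metadata-only, near-instant

    def restore(self, name):
        self.ptrs = dict(self.snaps[name])   # swap pointers back; no copying

    def read(self, logical, snap=None):
        ptrs = self.snaps[snap] if snap else self.ptrs
        return self.blocks[ptrs[logical]]

v = ToyVolume()
v.write(0, "old")
v.snapshot("hourly.0")
v.write(0, "new")                           # old block is untouched
print(v.read(0), v.read(0, "hourly.0"))     # -> new old
v.restore("hourly.0")
print(v.read(0))                            # -> old
```

Note that reading through the snapshot name is exactly the "snapshots are views" behaviour described next: the old data was never moved, only re-pointed.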

Snapshots Are Views

Reference: Why NetApp Snapshots Are Awesome


It's much better to think of snapshots as a View of your data as it was at the time
the snapshot was taken. It's a time machine, letting you look into the past.

Because it's all just pointers, you can actually look at the snapshot as if it was the
active filesystem. It's read-only, because you can't change the past, but you can
actually look at it and read the data.

This is incredibly cool.

Seriously. It's amazing. You get snapshots with almost no performance overhead,
and you can browse through the data to see what it looked like yesterday, or last
week, or last month. Online.

So if you accidentally delete a file, you don't have to restore the entire G:, or suck
the data off a copy on tape somewhere. You can just wander through
the .snapshot (or ~snapshot) directory and find the file, and read it. You can
even copy it back out into the active file system if you want.

All without ringing the helpdesk.

inodes & pointers

Reference: NetApp snapshots


A snapshot is a point-in-time copy of a volume/file system. Snapshots are
useful for backup and recovery purposes. With snapshot technology, the file
system/volume can be backed up in a matter of a few seconds. With traditional
backups to tape, recovery of a file/directory involves checking the media onto which
the backup was written, loading that media into the tape library and restoring the file. This
process takes a long time, and in some cases users/application developers need the
file urgently to complete their tasks. With snapshot technology, the snapshot of a
file/volume is stored in the system itself, and an administrator can restore the file
within seconds, which helps users to complete their tasks.

How snapshot works


Snapshot copies the file system/volume when requested to do so. If we had to
copy all the data in a file system/volume using traditional OS mechanisms, it would take a
lot of time and consume a lot of space in the system. Snapshots overcome this
problem by copying only the blocks that have changed. This is explained below.

If we take a snapshot of a file system/volume, no new data is created and no new space
is consumed in the system. Instead, the system copies the inode information of
the file system to the snapshot volume. inode information consists of:

file permissions
owner
group
access/modification times
pointers to data blocks
etc
The inode pointers of the snapshot volume point to the same data blocks of the file
system for which the snapshot was created. In this way, a snapshot consumes very minimal
space (the metadata of the original file system).

What happens if a block is changed in the original file system? Before changing
the data block, the system copies the data block to the snapshot area
and overwrites the original data block with new data. The inode pointer will
be updated in the snapshot to point to the data block that was written to the snapshot
area. In this manner, changing a data block involves reading the original data
block (one read operation), then writing it to the snapshot area and overwriting the original data
block with new data (two write operations). This causes
performance degradation to some extent. But you don't need much disk space
with this method, as we record only the changes made to the file system/volume.
This is called a Copy-On-Write (COW) snapshot.

Now we will see how Netapp does it differently.

When changing a block in a volume that has a snapshot, instead of copying the original block to the snapshot area, NetApp writes the new block to the snap reserve space. This involves only one write, instead of the two writes and one read of a COW snapshot. This approach also has some performance impact, because it involves changing the inode pointers of the file system or volume, but it is minimal compared to COW, which is why the NetApp snapshot method is superior when compared to other vendors' snapshots.

During restores, NetApp likewise just changes the file system's pointers to the snapshot blocks, whereas with a COW snapshot we need to copy the file from the snapshot volume back to the original volume. So restores with NetApp are faster when compared to COW snapshots.

Thus the NetApp snapshot methodology is superior and faster compared to COW snapshots.
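For contrast, the pointer-based approach described above can be sketched the same way. This is only a conceptual model of the idea (write the new data to free space and repoint the live inode, leaving the snapshot's pointers alone); it is not the actual WAFL on-disk layout:

```python
# Toy model of a pointer-based (redirect-style) snapshot.
# The snapshot is a frozen copy of the block map; a write places the
# new data in free (snap reserve) space and updates the live pointer.

blocks = {10: "A", 11: "B", 12: "C"}   # physical blocks on disk
live_map = {0: 10, 1: 11, 2: 12}       # live inode: file block -> physical block
snap_map = dict(live_map)              # snapshot: frozen copy of the pointers
io_ops = []
next_free = 100                        # next free physical block number

def redirect_write(file_block, new_data):
    """Write new data to free space and repoint the live inode."""
    global next_free
    blocks[next_free] = new_data       # single write, to free space
    io_ops.append("write")
    live_map[file_block] = next_free   # pointer update (metadata only)
    next_free += 1

redirect_write(1, "B2")
print(blocks[live_map[1]])   # 'B2' (live data sees the new block)
print(blocks[snap_map[1]])   # 'B'  (snapshot still sees the original)
print(io_ops)                # ['write'] -> one write, no read

# A restore is just another pointer change: repoint the live inode
# at the snapshot's blocks instead of copying data back.
live_map.update(snap_map)
print(blocks[live_map[1]])   # 'B'
```

Comparing the two sketches shows why the restore is fast: nothing is copied, only the `live_map` pointers change.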

SnapVault
Reference: NetApp Training Fast Track 101: NetApp Portfolio
Snapshot copies are stored on the local disk to enable fast backup and fast
recovery. But what if the local disk goes offline or fails? NetApp SnapVault software
protects against this type of failure and enables long-term backup and recovery.
SnapVault software delivers disk-to-disk backup, protecting NetApp primary
storage by replicating Snapshot copies to inexpensive secondary storage.
SnapVault software replicates only new or changed data using Snapshot copies,
and stores the data in native format, keeping both backup and recovery times
short and minimizing the storage footprint on secondary storage systems.

SnapVault software can be used to store months or even years of backups, so that
fewer tape backups are needed and recovery times stay short. As with the rest of
the NetApp Integrated Data Protection portfolio, SnapVault software uses
deduplication to further reduce capacity requirements and overall backup costs.
SnapVault software is available to all NetApp storage systems that run the Data
ONTAP operating system.

More Information

Reference: NetApp SnapVault


Providing speed and simplicity for data backup and recovery, NetApp SnapVault software leverages block-level incremental replication and NetApp Snapshot copies for reliable, low-overhead disk-to-disk (D2D) backup.

NetApp's flagship D2D backup solution, SnapVault provides efficient data protection by copying only the data blocks that have changed since the last backup, instead of entire files. As a result, you can back up more frequently while reducing your storage footprint, because no redundant data is moved or stored. And with direct backups between NetApp systems, SnapVault D2D minimises the need for external infrastructure and appliances.

By changing the backup paradigm, NetApp SnapVault software simplifies your adaptation to data growth and virtualisation, and it streamlines the management of terabytes to petabytes of data. Use the SnapVault backup solution as a part of NetApp's integrated data protection approach to help create a flexible and efficient shared IT infrastructure.

By transferring only new or changed blocks of data, traditional backups that would usually take hours or days to complete take only minutes. Further to this, it also enables you to store months or years of point-in-time backup copies on disk. When used in conjunction with deduplication, tape backups are reduced or even eliminated.
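The block-level incremental idea (move only the blocks that are new or changed since the last backup) can be sketched in a few lines of Python. This is purely illustrative and is not the SnapVault wire protocol; the function names are invented:

```python
import hashlib

def block_checksums(blocks):
    """Checksum each block so changed blocks can be detected cheaply."""
    return {n: hashlib.sha256(data.encode()).hexdigest()
            for n, data in blocks.items()}

def incremental_transfer(source, destination, last_sums):
    """Copy only blocks that are new or changed since the last backup."""
    transferred = []
    current = block_checksums(source)
    for n, digest in current.items():
        if last_sums.get(n) != digest:
            destination[n] = source[n]   # only this block crosses the wire
            transferred.append(n)
    return current, transferred

source = {0: "A", 1: "B", 2: "C"}
dest = {}
sums, moved = incremental_transfer(source, dest, {})    # baseline: all blocks move
source[1] = "B2"                                        # one block changes
sums, moved = incremental_transfer(source, dest, sums)  # update: one block moves
print(moved)   # [1] -- only the changed block was transferred
```

Because each update moves only the changed blocks, the backup window shrinks from the size of the dataset to the size of the change, which is the effect the paragraph above describes.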

Backups can be used for:

Development and Testing
Reporting
Cloning
DR Replication

SnapMirror
Summary

Reference: Overview of NetApp Replication and HA features


SnapMirror is volume-level replication, which normally works over an IP network (SnapMirror can work over FC, but only with FC-VI cards, and it is not widely used).

SnapMirror Asynchronous replicates data according to a schedule.
SnapMirror Sync uses NVLOG shipping (described briefly in my previous post) to synchronously replicate data between two storage systems.
SnapMirror Semi-Sync is in between, and synchronizes writes at the Consistency Point (CP) level.
SnapMirror provides protection from data corruption inside a volume, but with SnapMirror you don't have automatic failover of any sort. You need to break the SnapMirror relationship and present the data to clients manually, then resynchronize the volumes when the problem is fixed.

Information

Reference: NetApp Training Fast Track 101: NetApp Portfolio


Because customers require 24x7 operations, they must protect business applications and virtual infrastructures from site outages and other disasters.

SnapMirror software is the primary NetApp data protection solution. It is designed to reduce the risk, cost, and complexity of disaster recovery. It protects customers' business-critical data and enables customers to use their disaster recovery sites for other business activities, which greatly increases the utilization of valuable resources. SnapMirror software is for disaster recovery and business continuity, whereas SnapVault software is for long-term retention of point-in-time backup copies.

SnapMirror software is built into the Data ONTAP operating system, enabling a
customer to build a flexible disaster recovery solution. SnapMirror software
replicates data over IP or Fibre Channel to different models of NetApp storage and
to storage not produced by NetApp but managed by V Series systems. SnapMirror
software can use deduplication and built-in network compression to minimize
storage and network requirements, which reduces costs and accelerates data
transfers.

Customers can also benefit from SnapMirror products if they have virtual
environments, regardless of the vendors they use. NetApp software integrates
with VMware, Microsoft Hyper-V, and Citrix XenServer to enable simple
failover when outages occur.

More Information

Reference: NetApp SnapMirror Data Replication


SnapMirror data replication leverages NetApp unified storage to turn disaster recovery into a business accelerator.

Built on NetApp's unified storage architecture, NetApp SnapMirror technology provides fast, efficient data replication and disaster recovery (DR) for your critical data, to get you back to business faster. With NetApp's flagship data replication product, use a single solution across all NetApp storage arrays and protocols, for any application, in both virtual and traditional environments, and in a variety of configurations. Tune SnapMirror technology to meet recovery point objectives ranging from zero seconds to hours.

Cost-effective SnapMirror capabilities include new network compression technology to reduce bandwidth utilization; accelerated data transfers to lower RPO; and improved storage efficiency in a virtual environment using NetApp deduplication. Integrated with FlexClone volumes for instantaneous, space-efficient copies, SnapMirror software also lets you use replicated data for DR testing, business intelligence, and development and test without business interruptions.

Combine SnapMirror with NetApp MultiStore and Provisioning Manager software to gain application transparency with minimal planned downtime.

Disaster Recovery

Reference: NetApp Training Fast Track 101: NetApp Portfolio


In addition to enabling rapid, cost-effective disaster recovery, SnapMirror
software simplifies testing of disaster recovery processes to make sure they
work as planned before an outage occurs. Typically, testing is painful for
organizations. Companies must bring in people on weekends, shut down
production systems, and perform a failover to see if applications and data appear
at the remote site. Studies indicate that disaster recovery testing can negatively
impact customers and their revenues and that one in four disaster recovery tests
fail.

Because of the difficulty of testing disaster recovery, many customers do not perform the testing that is necessary to ensure their safety. With SnapMirror products and FlexClone technology, customers can test failover any time without affecting production systems or using much storage.

To test failover with SnapMirror products and FlexClone technology, a customer first creates space-efficient copies of disaster recovery data instantaneously. The customer uses the copies for testing. After finishing testing, the customer can delete the clones in seconds.

SnapVault Vs SnapMirror
Reference: SnapVault vs SnapMirror what is the difference?
When I was getting into NetApp I had big trouble understanding the difference between SnapVault and SnapMirror. I heard an explanation: "SnapVault is a backup solution, whereas SnapMirror is a DR solution." And all I could do was say "ooh, ok", still not fully understanding the difference...

The first idea that popped into my head was that SnapMirror is mainly set up at the volume level, whereas SnapVault is at the qtree level. But that is not always the case; you can easily set up QSM (Qtree SnapMirror).

The key to understanding the difference is to really understand the sentence:

SnapVault is a backup solution, whereas SnapMirror is a DR solution.

What does it mean that SnapVault is a backup solution?

Let me bring in a picture to help me explain:

The example has a few assumptions:

We've got filerA in one location and filerB in another location
The customer has a connection to both filerA and filerB, although all shares to customers are available from filerA (via CIFS, NFS, iSCSI or FC)
All customer data is being transferred to filerB via SnapVault

What can we do with SnapVault?

As a backup solution, we can have a longer snapshot retention time on filerB, so more historic data will be available on filerB. If filerB has slower disks, this solution is smart, because slower disks = cheaper disks, and there is no need to use 15k rpm disks on a filer that is not serving data to the customer.
If the customer has a network connection and access to shares on filerB, he can restore some data to filerA by himself, even single files.
If there is a disaster within filerA and we lose all data, we can restore the data from filerB.
What can we not do with SnapVault?

In case of a disaster within filerA, we cannot set filerB up as the production side. We cannot reverse the relationship, making the qtrees on filerB the source and making them read/write: they are SnapVault destinations, so they are read-only.
(Having a SnapMirror license available on filerB, we can convert a SnapVault qtree to a SnapMirror qtree, which solves that issue.)
What does it mean that SnapMirror is a DR solution?

Again, let me bring in a small picture to help me explain. The example has a few assumptions:

We've got filerA in one location and filerB in another location
The customer has a connection to both filerA and filerB, although all shares to customers are available from filerA
All customer data is being transferred to filerB via SnapMirror

What can we do with SnapMirror?

As a backup solution, we can restore accidentally deleted or lost data on filerA, if the SnapMirror relationship has not been updated in the meantime.
If there is some kind of issue with filerA (from a network problem to a total disaster), we can easily reverse the relationship. We can make the volume or qtree on filerB the source, make it read/write, provide a network connection to the customer, and voila, we are back online! After the issue has been solved, we can resync the original source with the changes made at the destination and reverse the relationship again.
To sum up

This is not the only difference between SnapMirror and SnapVault, but I would say it is the main one. Some other differences are that SnapMirror can run in sync or semi-sync mode, and its async mode can be updated even once a minute, whereas a SnapVault relationship cannot be updated more often than once an hour. Also, if we have a few qtrees on the same volume with SnapVault, they share the same schedule, while with QSM they can have different schedules, etc.
;)
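The behavioural difference summed up above (a SnapVault destination is read-only and cannot simply be promoted, while a SnapMirror relationship can be broken to make the destination read/write) can be modelled as a tiny sketch. The class and method names here are hypothetical, invented only to illustrate the failover semantics:

```python
class Relationship:
    """Minimal model of a replication relationship's failover semantics."""

    def __init__(self, kind):
        self.kind = kind                 # "snapvault" or "snapmirror"
        self.destination_writable = False

    def break_relationship(self):
        """Promote the destination to read/write (DR failover)."""
        if self.kind != "snapmirror":
            raise RuntimeError("SnapVault destinations are read-only and "
                               "cannot be promoted directly")
        self.destination_writable = True

sm = Relationship("snapmirror")
sm.break_relationship()                  # DR: the filerB volume becomes read/write
print(sm.destination_writable)           # True

sv = Relationship("snapvault")
try:
    sv.break_relationship()              # a backup destination cannot be promoted
except RuntimeError as err:
    print(err)
```

In real life the escape hatch mentioned earlier applies: with a SnapMirror license on the secondary, a SnapVault qtree can be converted to a SnapMirror qtree and then promoted.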

NetApp From the Ground Up A Beginners Guide


Part 7
NOVEMBER 18, 2014 / WILL ROBINSON

SyncMirror
Summary

Reference: Overview of NetApp Replication and HA features


SyncMirror mirrors aggregates and works at the RAID level. You can configure mirroring between two shelves of the same system and prevent an outage in case of a shelf failure.

SyncMirror uses a concept of plexes to describe mirrored copies of data. You have two plexes: plex0 and plex1. Each plex consists of disks from a separate pool: pool0 or pool1. Disks are assigned to pools depending on cabling. Disks in each of the pools must be in separate shelves to ensure high availability. Once the shelves are cabled, you enable SyncMirror and create a mirrored aggregate using the following syntax:

aggr create aggr_name -m -d disk-list -d disk-list

Plexes

Reference: Overview of NetApp Replication and HA features


SyncMirror mirrors aggregates and works at the RAID level. You can configure mirroring between two shelves of the same system and prevent an outage in case of a shelf failure.

SyncMirror uses a concept of plexes to describe mirrored copies of data. You have
two plexes: plex0 and plex1. Each plex consists of disks from a separate
pool: pool0 or pool1. Disks are assigned to pools depending on cabling. Disks in
each of the pools must be in separate shelves to ensure high availability.

Plex & Disk Pools

Reference: NetApp Forum: What Is A Disk Pool?


By default, Data ONTAP without a SyncMirror license keeps all disks in pool0 (the default), so you will have only one plex.

You need a SyncMirror license to get two plexes, which will enable RAID-level mirroring on your storage system.

The following FilerView online help will give more information on this.

Managing Plexes
The SyncMirror software creates mirrored aggregates that consist of two plexes,
providing a higher level of data consistency through RAID-level mirroring. The two
plexes are simultaneously updated; therefore, the plexes are always identical.

When SyncMirror is enabled, all the disks are divided into two disk pools, and a
copy of the plex is created. The plexes are physically separated, (each plex has
its own RAID groups and its own disk pool), and the plexes are updated
simultaneously. This provides added protection against data loss if there is a
double-disk failure or a loss of disk connectivity, because the unaffected plex
continues to serve data while you fix the cause of the failure. Once the plex that
has a problem is fixed, you can resynchronize the two plexes and reestablish the
mirror relationship.

You can create a mirrored aggregate in the following ways:

You can create a new aggregate that has two plexes.
You can add a plex to an existing, unmirrored aggregate.

An aggregate cannot have more than two plexes.

Note: Data ONTAP names the plexes of the mirrored aggregate. See the Data ONTAP Storage Management Guide for more information about the plex naming convention.

How Data ONTAP selects disks

Regardless of how you create a mirrored aggregate, Data ONTAP determines which disks to use. Data ONTAP uses the following disk-selection policies when selecting disks for mirrored aggregates:

Disks selected for each plex must come from different disk pools.
The number of disks selected for one plex must equal the number of disks
selected for the other plex.
Disks are first selected on the basis of equivalent bytes per sector (bps) size,
then on the basis of the size of the disk.
If there is no equivalent-sized disk, Data ONTAP selects a larger-capacity disk
and uses only part of the disk.
Disk selection policies if you select disks

Data ONTAP enables you to select disks when creating or adding disks to a mirrored aggregate. You should follow the same disk-selection policies that Data ONTAP follows when selecting disks for mirrored aggregates. See the Data ONTAP Storage Management Guide for more information.
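The disk-selection policies listed above can be expressed as a short sketch. This is a hypothetical helper (each spare disk is described by an invented dict of pool and size), not how Data ONTAP itself is implemented:

```python
def select_mirror_disks(spares, count):
    """Pick `count` disks for each plex, following the stated policies:
    the two plexes draw from different pools, both plexes get the same
    number of disks, and disks are paired smallest-first so equal sizes
    match up where possible (a larger disk would be only partly used)."""
    pool0 = sorted((d for d in spares if d["pool"] == 0), key=lambda d: d["size"])
    pool1 = sorted((d for d in spares if d["pool"] == 1), key=lambda d: d["size"])
    if len(pool0) < count or len(pool1) < count:
        raise ValueError("not enough spares in one of the pools")
    return pool0[:count], pool1[:count]

spares = [
    {"name": "0a.16", "pool": 0, "size": 450},
    {"name": "0a.17", "pool": 0, "size": 450},
    {"name": "0b.32", "pool": 1, "size": 450},
    {"name": "0b.33", "pool": 1, "size": 600},  # larger disk: only part is used
]
plex0, plex1 = select_mirror_disks(spares, 2)
print([d["name"] for d in plex0])  # ['0a.16', '0a.17']
print([d["name"] for d in plex1])  # ['0b.32', '0b.33']
```

The point of the sketch is the constraint structure: pool separation and equal disk counts are hard requirements, while size matching is best-effort.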
More Information

Reference: NetApp Forum: What Is A Plex


Diagram: NetApp
A plex is a complete copy of an aggregate. If you do not have mirroring enabled, you'll only be using plex0. If you enable mirroring, plex1 will be created. Plex1 will synchronise with plex0, so you will have two complete copies of the one aggregate. This provides full redundancy should plex0's shelf go offline or suffer a multi-disk failure.

SyncMirror protects against data loss by maintaining two copies of the data
contained in the aggregate, one in each plex.
A plex is one half of a mirror (when mirroring is enabled). Mirrors are used to increase fault tolerance. A mirror means that whatever you write on one disk gets written on a second disk (at least, that is the general idea) immediately. Thus mirroring is a way to prevent data loss from losing a disk.
If you do not mirror, there is no real reason to call the disks in an aggregate a plex, but it is easier (for consistency, etc.) to call the first bunch of disks that make up an aggregate plex0. Once you decide to make a mirror (again, to ensure fault tolerance), you need the same number of disks the aggregate is made of for the second half of the mirror. This second half is called plex1. So, bottom line: unless you mirror an aggregate, plex0 is just a placeholder that should remind you of the ability to create a mirror if needed.
By default all your RAID groups will be tied to plex0; the moment you enable SyncMirror, things change. After enabling the SyncMirror license, you move disks from the default pool, pool0, to pool1. Then, when you SyncMirror your aggregate, you will find the pool0 disks tied to plex0 and the pool1 disks under plex1.
A plex is a physical copy of the WAFL storage within the aggregate. A
mirrored aggregate consists of two plexes; unmirrored aggregates contain a
single plex.
A plex is a physical copy of a filesystem or the disks holding the data. A Data ONTAP volume normally consists of one plex. A mirrored volume has two or more plexes, each with a complete copy of the data in the volume. Multiple plexes provide safety for your data: as long as you have one complete plex, you will still have access to all your data.
A plex is a physical copy of the WAFL storage within the aggregate. A
mirrored aggregate consists of two plexes; unmirrored aggregates contain
a single plex. In order to create a mirrored aggregate, you must have a filer
configuration that supports RAID-level mirroring. When mirroring is
enabled on the filer, the spare disks are divided into two disk pools. When an
aggregate is created, all of the disks in a single plex must come from the
same disk pool, and the two plexes of a mirrored aggregate must consist of
disks from separate pools, as this maximizes fault isolation.

Protection provided by RAID and SyncMirror


Reference: Protection provided by RAID and SyncMirror
Combining RAID and SyncMirror provides protection against more types of drive
failures than using RAID alone.

You can use RAID in combination with the SyncMirror functionality, which also
offers protection against data loss due to drive or other hardware component
failure. SyncMirror protects against data loss by maintaining two copies of the data
contained in the aggregate, one in each plex. Any data loss due to drive failure in
one plex is repaired by the undamaged data in the other plex.

For more information about SyncMirror, see the Data ONTAP Data Protection Online
Backup and Recovery Guide for 7-Mode.

The following tables show the differences between using RAID alone and using
RAID with SyncMirror:
Lab Demo
See this page for a lab demonstration.

SyncMirror Vs SnapVault
Reference: LinkedIn
SyncMirror synchronously mirrors aggregates on the same system, or on a remote system in the case of MetroCluster. While it is not exactly the same, it might help to think of it as being analogous to RAID-10. As aggregates in the NetApp world store volumes, once you have a sync-mirrored aggregate, any volume (and the subsequent data placed in it) is automatically mirrored in a synchronous manner.

SnapVault is very different. It takes qtrees (directories managed by Data ONTAP) from a source system and replicates them (asynchronously) to a volume on a destination system. Usually many source qtrees are replicated into one volume, compressed, deduplicated and then archived via a Snapshot on the destination. In general terms, SnapVault enables backup and archive of data from one or many source systems to a centralised backup system. This methodology enables many copies of production data to be retained on a secondary system, with only the block-differential data being transferred between each backup.

SyncMirror Vs SnapMirror
Reference: LinkedIn
SyncMirror is for mirroring data between aggregates synchronously, usually
on the same system, but can be on a remote system in the case of MetroCluster.

SnapMirror operates at the volume level (it can be at the qtree level as well, but that differs slightly from volume SnapMirror) and is usually deployed for asynchronous replication. It has no distance limitations (whereas SyncMirror in a MetroCluster configuration is limited to 100km), replicates over IP (MetroCluster requires fibre links between the sites), has compression, and is dedupe-aware. If your source volume has been deduplicated, then SnapMirror will replicate it in its deduplicated format. SnapMirror also has features built in to make it easy to fail over, fail back, break the mirror and resync the mirror. It's compatible with SRM, and also integrates with other NetApp features such as MultiStore for configuring DR at a vFiler level.

More Information

Reference: LinkedIn
Absolutely, you can do SyncMirror within the same system. You just need to create two identical aggregates for this purpose, and you will have two synchronously mirrored aggregates, with all the volumes they contain, on the one system.

SnapMirror is a great asynchronous replication method; instead of replicating aggregates, it is set up at the volume layer. So you have a source volume and a destination volume, and there will be some lag time between the two based on how frequently you update the mirror (e.g. 15 minutes, 30 minutes, 4 hours, etc.). You can very easily make the SnapMirror destination read/write (it is read-only while replication is taking place), and also resynchronise back to the original source after a failover event.
One of the issues with mirroring data is that the system will happily mirror corrupt data from the application perspective; it just does what it is told. SnapVault fits into the picture here, as it offers longer-term Snapshot retention of data on a secondary system that is purely for backup purposes. By this I mean that the destination copy of the data needs to be restored back to another system; it is generally never made read/write. SnapMirror and SyncMirror destinations contain an exact copy of the source volume or aggregate, including any Snapshots that existed when replication occurred. SnapVault is different because the Snapshot that is retained on the secondary system is actually created after the replication update has occurred. So you can have a fan-in type effect for many volumes or qtrees into a SnapVault destination volume, and once the schedule has completed from the source system(s), the destination will create a Snapshot for retention purposes, then deduplicate the data. You can end up with many copies of production data on your SnapVault destination; I have a couple of customers that are retaining a year's worth of backups using SnapVault.

It is extremely efficient, as compression and deduplication work very well in this case, and it allows customers to keep many generations of production data on disk. The customer can also stream data off to tape from the SnapVault destination if required. As with SnapMirror, the SnapVault destination does not need to be the same as the source, so you can have a higher-performance system in production (SAS or SSD drives) and a more economical, dense system for your backup (3TB SATA, for example).

In a nutshell, SnapMirror and SyncMirror come under the banner of HA, DR and BCP, whereas SnapVault is really a backup and archive technology.

Open Systems SnapVault (OSSV)


Information

Reference: NetApp Training Fast Track 101: NetApp Portfolio


Open Systems SnapVault provides the same features as SnapVault, but for storage not produced by NetApp. SnapVault disk-to-disk backup capability is a unique differentiator: no other storage vendor enables replication for long-term disk-to-disk backup within an array. Open Systems SnapVault (OSSV) enables customers to back up non-NetApp data to a secondary NetApp target. Open Systems SnapVault leverages the block-level incremental backup technology found in SnapVault to protect Windows, Linux, UNIX, SQL Server, and VMware systems running on mixed storage.

More Information

Reference: NetApp Open Systems SnapVault


Designed to safeguard data in open-storage platforms, NetApp Open Systems SnapVault (OSSV) software leverages the same block-level incremental backup technology and NetApp Snapshot copies found in our SnapVault solution. OSSV extends this data protection to Windows, Linux, UNIX, SQL Server, and VMware systems running on mixed storage.

OSSV improves performance and enables more frequent data backups by moving data and creating backups from only changed data blocks rather than entire changed files. Because no redundant data is moved or stored, you need less storage capacity and a smaller storage footprint, giving you a cost-effective solution. OSSV is well suited to centralizing disk-to-disk (D2D) backups for remote offices.

NetApp From the Ground Up A Beginners Guide


Part 8
DECEMBER 12, 2014 / WILL ROBINSON

HA Pair
Summary

Reference: Overview of NetApp Replication and HA features


An HA Pair is basically two controllers which both have connections to their own and their partner's shelves. When one of the controllers fails, the other one takes over. This is called Cluster Failover (CFO). The controllers' NVRAM is mirrored over an NVRAM interconnect link, so even data which hasn't been committed to disk isn't lost.

Note: An HA Pair can't fail over when a disk shelf fails, because the partner doesn't have a copy of the data to service requests from.

Mirrored HA Pair
Summary

Reference: Overview of NetApp Replication and HA features


You can think of a Mirrored HA Pair as an HA Pair with SyncMirror between the systems. You can implement almost the same configuration on an HA pair with SyncMirror inside (not between) each system, because the odds of a whole storage system (controller + shelves) going down are very low. But it can give you more peace of mind if it's mirrored between two systems.

It cannot fail over like MetroCluster when one of the storage systems goes down; the whole process is manual. The reasonable question here is: why can it not fail over if it has a copy of all the data? Because MetroCluster is a separate functionality, which performs all the checks and carries out the cutover to the mirror. This is called Cluster Failover on Disaster (CFOD). SyncMirror is only a mirroring facility and doesn't even know that the cluster exists.

MetroCluster
Summary

Reference: Overview of NetApp Replication and HA features


MetroCluster provides failover at the storage-system level. It uses the same SyncMirror feature beneath it to mirror data between two storage systems (instead of between two shelves of the same system, as in a pure SyncMirror implementation). Now even if a storage controller fails together with all of its storage, you are safe: the other system takes over and continues to service requests.

An HA Pair can't fail over when a disk shelf fails, because the partner doesn't have a copy of the data to service requests from.

Information

Reference: NetApp Training Fast Track 101: NetApp Portfolio


After Disaster Recovery, let's consider Continuous Availability. Customers must be able to recover critical applications after a system failure seamlessly and instantaneously. Critical applications include financial applications and manufacturing operations, which must be continuously available with near-zero RTO and zero RPO. NetApp MetroCluster is a unique array-based clustering solution that can extend up to 200km and enable a zero RPO (no data loss) with zero or near-zero RTO.

MetroCluster enhances the built-in redundancy of NetApp hardware and software, providing an additional layer of protection for the entire storage and host environment. MetroCluster seamlessly maintains application and virtual infrastructure availability when storage outages occur (whether the outage is due to a network connectivity issue, loss of power, a problem with cooling systems, a storage array shutdown, or an operational error). Most MetroCluster customers report that their users experience no application interruption when a cluster recovery occurs.

MetroCluster enables single-command failover for seamless cutover of applications that is transparent to the end user.

It supports a distance of up to 100 kilometers.

Integrated Data Protection


Information

Reference: NetApp Training Fast Track 101: NetApp Portfolio


When discussing data protection strategies with your customers, you should consider the overall NetApp data protection portfolio, which we call NetApp Integrated Data Protection. NetApp Integrated Data Protection enables customers to deliver:

Backup
High availability
Business continuity
Continuous availability

from a single platform. It is a single suite of integrated products that works across all NetApp solutions and with non-NetApp storage. Because your customers can use a single platform for all data protection, the process of building, implementing, and managing data protection over time is simpler: they have fewer systems from fewer vendors to install and manage. And because the portfolio uses NetApp storage efficiency technology, cost and complexity can be up to 50% lower than for competitive solutions.

NetApp Snapshot copies are the answer to shrinking backup windows. They are nearly instantaneous and, as a result, do not impact the application, so multiple Snapshot copies can be made per day: hourly or even more often. They are the primary solution for protecting against user errors or data corruption.

MetroCluster is NetApp's solution for Continuous Availability, enabling zero data loss in the event of a wide variety of failure scenarios. MetroCluster, in conjunction with VMware's HA and Fault Tolerance capabilities, gives your customers continuous availability of their ESX servers and storage. For single failures, the storage systems will perform an automated, transparent failover. MetroCluster has been certified with VMware's High Availability and Fault Tolerance solutions.

SnapMirror provides asynchronous mirroring across unlimited distances to enable Disaster Recovery from a secondary site. SnapMirror, in conjunction with VMware's Site Recovery Manager, delivers automated, global failover of the entire virtual environment to a recovery site, and integrates with SnapMirror and FlexClone.

NetApp enables your customers to use less expensive SATA drives as nearline
storage, and lower-cost controllers for asymmetric backups and backup
consolidation from multiple sources. NetApp solutions enable rapid search and
retrieval of backup data, and also support the re-use of backup data for other
business uses via our unique, near-zero impact FlexClones.

NetApp enables your customers to perform flexible backup vaulting (disk-to-disk-to-tape), full tape management, as well as full cataloging of disk and tape backups. In addition, NetApp allows your customers to choose how they want to manage their data protection workflows. Your customers can use NetApp products such as SnapProtect for end-to-end backup management, including catalog and disk-to-disk-to-tape; and for specific applications your customers can leverage the SnapManager software.

Considering the Overall Data Protection Portfolio

Reference: NetApp Training Fast Track 101: NetApp Portfolio


NetApp Integrated Data Protection enables customers to deliver high availability, business continuity, continuous availability, and backup and compliance from a single platform: a single platform that works across all NetApp solutions and with non-NetApp storage. Because customers can use a single platform for all data protection, the process of building, implementing, and managing data protection over time is simpler: they have fewer systems from fewer vendors to install and manage. And because the portfolio uses NetApp storage efficiency technology, cost and complexity can be up to 50% lower than for competitive solutions. For example, a customer can use MetroCluster to provide a zero RPO and then replicate data with SnapVault software to a remote site for long-term backup and recovery. If the customer later decides to implement long-distance disaster recovery with a short RPO, the customer can use SnapMirror software to do so.
===========================================================================

NetApp From the Ground Up - A Beginner's Guide


Part 9
JANUARY 25, 2015 / WILL ROBINSON

SnapRestore
Reference: NetApp SnapRestore
NetApp SnapRestore software uses stored Snapshot copies to recover entire file
systems or data volumes in seconds.

Whether you want to recover a single file or a multi-terabyte data volume,
SnapRestore software makes data recovery automatic and almost instantaneous,
regardless of your storage capacity or number of files. With a single simple
command, you can choose and recover data from any NetApp Snapshot copy on
your system.
Whereas traditional data recovery requires that all the data be copied from the
backup to the source, the SnapRestore process is fast and takes up very little of
your storage space. With SnapRestore, you can:

Restore data files and databases fast
Test changes with easy restores to your baseline copy
Recover at once from virus attacks, or after user or application error
In addition, SnapRestore software requires no special training, which reduces both
the likelihood of operator error and your costs to maintain specialized staffing
resources.

SnapManager
Reference: Back to Basics: SnapManager Suite
The more a backup application understands about the way an application works,
the more efficient the backup process will be. Unfortunately, back-end storage
systems typically know little or nothing about the application data they contain, so
you either have to use brute-force methods to perform backups on the storage
system or you have to let each application perform its own backup. Neither
alternative is particularly desirable.

To address this shortcoming, NetApp created its SnapManager software, a suite of
intelligent tools that allow applications and storage to coordinate activities to make
backup fast and space efficient, speed the restore process, and simplify common
data management tasks. The SnapManager suite represents thousands of
man-hours of effort going back to the original SnapManager for Exchange product,
which was introduced in 2000.
NetApp users today can choose from seven SnapManager tools that provide deep
integration to coordinate storage management activities with popular enterprise
software programs (Microsoft SQL Server, Exchange, SharePoint, Oracle, and SAP)
as well as virtual infrastructure (VMware and Microsoft Hyper-V). These tools offer
significant advantages for application backup. They:

Integrate closely with the unique features and capabilities of each application.
Take full advantage of NetApp data protection features, including Snapshot,
SnapMirror, SnapRestore, and FlexClone technologies to provide fast, efficient
backup and restore, replication for DR, and cloning. (Cloning is not supported
by all SnapManager products.)
Allow backups to be completed in much less time (typically minutes versus
hours), so backups can be completed more often with less disruption to the
application.
Off-load most of the work of data protection from servers.
Provide application-centric interfaces that let application administrators
perform backups without having to understand details of storage or involve
storage administrators in routine activities.
Support both Data ONTAP technology operating in 7-Mode and clustered Data
ONTAP.
See this page for more information.

Snap Creator
Reference: Snap Creator Framework
OS-independent Snap Creator Framework integrates NetApp data protection with a
broad range of third-party applications.

NetApp Snap Creator Framework lets you standardize and simplify backup, restore,
and DR in any environment. It's a unified data protection solution for standard and
custom applications.

Snap Creator plug-ins integrate NetApp features with third-party applications,
operating systems, and databases, including Oracle, VMware, Citrix Xen, Red Hat
KVM, DB2, Lotus Domino, MySQL, Sybase ASE, and MaxDB. Snap Creator also
accommodates custom plug-ins and has an active developer community.

The Snap Creator Framework provides:

Application-consistent data protection. You get a centralized solution for
backing up critical information, integrating with existing application
architectures to assure data consistency and reduce operating costs.
Extensibility. Achieve fast integration using NetApp modular architecture
and policy-based automation.
Cloud readiness. OS-independent Snap Creator functionality supports
physical and virtual platforms and interoperates with IT-as-a-service and
cloud environments.
SnapProtect
Information
Reference: Back to Basics: SnapProtect
An important reason why IT teams choose NetApp storage is because it provides
the ability to use integrated data protection technologies
like Snapshot copies, SnapMirror replication, and SnapVault disk-to-disk
backup. These capabilities dramatically accelerate and simplify backup and
replication for DR and other purposes.

Still, we saw a need for deeper integration with backup applications, especially
for those who need to include tape in their backup environments.

NetApp introduced its SnapProtect management software about a year ago to
provide these and other features. NetApp partnered with CommVault to integrate
core components of CommVault Simpana with core NetApp technologies. The
combination delivers all the benefits you expect
from Snapshot copies, SnapMirror, and SnapVault, plus it offers significant
advantages including:

Accelerated backup and restore operations
Full tape support
Cataloging of Snapshot copies, replicas, and tape
Built-in support for VMware, Hyper-V, and popular applications
Automated provisioning of secondary storage
Cascading and fan-out configurations
Flexible scheduling and retention
Reporting
Simple single-pane-of-glass management for all features
How SnapProtect Is Implemented
Reference: Back to Basics: SnapProtect
SnapProtect uses a variety of components. Most of these are familiar NetApp
technologies such as:

Snapshot copies
SnapMirror replication
SnapVault for disk-to-disk backup
FlexClone technology for cloning and indexing
SnapRestore technology for rapid restore of whole volumes and single files
OnCommand software (formerly NetApp Operations Manager) for
provisioning and replication
In addition, SnapProtect adds several components that enable
cataloging, coordination, management, and so on.

SnapProtect Server: Runs Windows, Microsoft SQL Server, and
management software
MediaAgents: Additional servers that help spread the data protection
workload during SnapProtect operations
iDataAgents (iDAs): Software agents installed on backup clients that are
responsible for data consistency during backup operations
SnapProtect Console
Reference: NetApp Training Fast Track 101: NetApp Portfolio
SnapProtect provides a single interface (the SnapProtect Console) from which your
customers can manage disk-to-tape or disk-to-disk-to-tape backup workflows. They
can use the SnapProtect Console to help reduce backup and recovery times,
minimize storage and operational costs, and meet requirements for physical and
virtualized environments.

In addition to allowing your customers to create and manage Snapshot copies from
the single console, the SnapProtect solution lets them manage policies for
SnapVault and SnapMirror replication to secondary storage; backup and restore
virtual machines, applications, and data; catalog Snapshot copies across both disk
and tape for rapid search and retrieval; and work with tape for long-term retention.

The SnapProtect solution is integrated with NetApp storage efficiency, Snapshot,
and thin replication technologies. Together, the SnapProtect solution and NetApp
technologies reduce backup and recovery windows by up to 98% relative to
traditional backup, reduce storage requirements by up to 90%, and reduce network
bandwidth utilization. With controller-based licensing, customers can grow their
environments and keep software licensing costs down.

NetApp & CommVault Partnership


Reference: NetApp to resell CommVault's backup technology
Reference: NetApp and CommVault Unite to Modernize Backup
===========================================================================

NetApp From the Ground Up - A Beginner's Guide


Part 10
FEBRUARY 2, 2015 / WILL ROBINSON

OnCommand Overview
Reference: NetApp Training Fast Track 101: NetApp Portfolio
OnCommand management software helps your customers to monitor and manage
their NetApp storage as well as multi-vendor storage environments, offering cost-
effective and efficient solutions for their clustered, virtualized and cloud
environments. With OnCommand, our customers are able to optimize utilization
and performance, automate and integrate processes, minimize risk and meet their
SLAs. Our objective is to simplify the complexity of managing today's IT
infrastructure, and improve the efficiency of storage and service delivery.

Multiple Clustered NetApp Systems


Reference: NetApp Training Fast Track 101: NetApp Portfolio
Manage and automate your NetApp storage at scale. Your customers who are
growing and require a solution to manage multiple clustered NetApp systems
can turn to OnCommand Unified Manager, Performance Manager, and Workflow
Automation. These three products work together to provide a comprehensive
solution for today's software-defined data center. Your customers can also analyze
their complex virtualized environment and cloud infrastructure using NetApp
OnCommand Balance.

NetApp Storage Management


Reference: NetApp Training Fast Track 101: NetApp Portfolio

OnCommand Insight (Multi-vendor Storage Management)
Reference: NetApp Training Fast Track 101: NetApp Portfolio

Integration
Reference: NetApp Training Fast Track 101: NetApp Portfolio

System Manager

Reference: NetApp Training Fast Track 101: NetApp Portfolio


Many of our NetApp customers start out using OnCommand System Manager
for simple, device-level management of individual or clustered
systems. System Manager features a simple, browser-based interface with a
dashboard, graphical reports, and automated workflows. It is designed to provide
effective storage management for virtualized data centers through a simple user
interface. For instance, using OnCommand System Manager, a customer was able
to simplify storage management and achieve more than 80% storage utilization,
while keeping costs low and using less power and space. It also supports the latest
features in clustered Data ONTAP, such as High Availability pairs, Quality of
Service, and Infinite Volumes.

Unified Manager

Reference: NetApp Training Fast Track 101: NetApp Portfolio

Performance Manager

Reference: NetApp Training Fast Track 101: NetApp Portfolio

Workflow Automation

Reference: NetApp Training Fast Track 101: NetApp Portfolio

Balance

Reference: NetApp Training Fast Track 101: NetApp Portfolio


Insight

Reference: NetApp Training Fast Track 101: NetApp Portfolio


OnCommand Insight provides your customers with capabilities such as capacity
planning, configuration and change management, showback and chargeback
reporting, virtual machine optimization, and monitoring to provide added insight
into multivendor, multiprotocol shared infrastructures. These analytics help your
customer better understand how to optimize the data and storage to help make
better decisions, improve efficiency, and reduce costs.
===========================================================================

NetApp From the Ground Up - A Beginner's Guide


Part 11
FEBRUARY 4, 2015 / WILL ROBINSON

Capacity
Right-Sizing
Reference: NetApp Forum
Disk drives from different manufacturers may differ slightly in size even though
they belong to the same size category. Right sizing ensures that disks are
compatible regardless of manufacturer. Data ONTAP right sizes disks to
compensate for different manufacturers producing different raw-sized disks.

More Information
Reference: Storage Gaga
Much has been said about usable disk storage capacity and unfortunately, many of
us take the marketing capacity number given by the manufacturer verbatim. For
example, 1TB does not really equate to 1TB in usable terms, and that is something
you engineers out there should be explaining to your customers.

NetApp, ever since the beginning, has been subjected to the scrutiny of
customers and competitors alike about its usable capacity, and I intend to correct
this misconception. The key to this misconception is to understand the capacity
before rightsizing (BR) and after rightsizing (AR).

(Note: Rightsizing in the NetApp world is well documented, though views on it
differ. It is part of how WAFL uses the disks, but be aware that not many other
storage vendors publish their rightsizing process, if any.)

Before Rightsizing (BR)


First of all, we have to know that there are two systems of unit prefixes:

Base-10 (decimal), fit for human understanding
Base-2 (binary), fit for computer understanding
So according to the International System of Units, the SI prefixes for Base-10 are:

In the computing context, where the binary, Base-2 system is relevant, the
binary prefixes for Base-2 are:

And we must know that storage capacity is in Base-2 rather than in Base-10.
Computers are not humans.

With that in mind, the next issue is the disk manufacturers. We should have an
axe to grind with them for misrepresenting the actual capacity. When they say
their HDD is 1TB, they are using the Base-10 system, i.e. 1TB = 1,000,000,000,000
bytes. THIS IS WRONG!
Let's see how that 1TB works out to be in gigabytes in the Base-2 system:

1,000,000,000,000 / 1,073,741,824 = 931.3225746154785 gigabytes

Note: 2^30 = 1,073,741,824

That result of 1TB, when rounded, is only about 931GB! So, the disk manufacturers
aren't exactly giving you what they have advertised.
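The decimal-to-binary conversion above is easy to check with a couple of lines of Python (a quick sketch, nothing NetApp-specific; the function name is mine):

```python
# Convert a vendor's decimal ("marketing") terabytes into binary
# gigabytes (gibibytes), as described above.
def marketed_tb_to_gib(tb: float) -> float:
    # decimal bytes divided by bytes per GiB (2^30 = 1,073,741,824)
    return tb * 10**12 / 2**30

print(round(marketed_tb_to_gib(1), 4))  # -> 931.3226
```

The same division applies at any scale, which is why the "missing" capacity grows with drive size.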

Thirdly, and also the most important factor in the BR (Before Rightsizing) phase, is
how WAFL handles the actual capacity before the disk is presented to WAFL/ONTAP
operations. Note that this is all done before any of the logical structures of
aggregates, volumes and LUNs are created.
In this third point, WAFL formats the actual disks (just like NTFS formats new
disks), and this reduces the usable capacity even further. As a starting point, WAFL
uses 4K (4,096 bytes) per block.

Note: It appears that the 4K block size is not the issue; it's the checksum that is
the problem.

For Fibre Channel disks, WAFL formats these blocks as 520 bytes per sector.
Therefore, for each block, 8 sectors (520 x 8 = 4,160 bytes) fill one 4K block,
leaving a remainder of 64 bytes (4,160 - 4,096 = 64 bytes). This additional 64
bytes per block is used as a checksum; it is not displayed by WAFL or ONTAP
and not accounted for in the usable capacity, so the capacity seen by users
is further reduced.

SATA/SAS disks are formatted at 512 bytes per sector, and each 4K block
consumes 9 sectors (9 x 512 = 4,608 bytes). 8 sectors are used for WAFL's 4K
block (4,096/512 = 8 sectors), while the 9th sector (512 bytes) is used partially for
the 64-byte checksum. As with the Fibre Channel disks, the unused 448 bytes
(512 - 64 = 448 bytes) in the 9th sector are not displayed and not part of the
usable capacity of WAFL and ONTAP.
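The sector arithmetic above can be redone in a few lines; this sketch just verifies the per-block numbers for both disk types:

```python
BLOCK = 4096  # WAFL block size in bytes, per the text above

# Fibre Channel: 8 sectors of 520 bytes hold one 4K block plus checksum.
fc_raw = 8 * 520              # 4160 bytes consumed on disk per block
fc_checksum = fc_raw - BLOCK  # 64 bytes of checksum per block

# SATA/SAS: 9 sectors of 512 bytes hold one 4K block; the 9th sector
# carries the 64-byte checksum, leaving the rest unused.
sata_raw = 9 * 512               # 4608 bytes consumed on disk per block
ninth_sector = sata_raw - BLOCK  # 512 bytes beyond the 4K of data
sata_unused = ninth_sector - 64  # 448 bytes unused in the 9th sector

print(fc_checksum, ninth_sector, sata_unused)  # -> 64 512 448
```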

WAFL also compensates for the ever-so-slight irregularities of hard disk
drives, even though they are labelled with similar marketing capacities. That is to
say, 1TB from Seagate and 1TB from Hitachi will differ in actual
capacity. In fact, a 1TB Seagate HDD with firmware 1.0a (for ease of clarification)
and a 1TB Seagate HDD with firmware 1.0b (note the a and b) could be different in
actual capacity even when both are shipped with a 1.0TB marketing capacity label.

So, with all these things in mind, WAFL does what it needs to do (Right Size) to
ensure that nothing gets screwed up when WAFL uses the HDDs in its aggregates
and volumes. All for the right reason (data integrity), but often criticized for
wrongdoing. Think of WAFL as your vigilante superhero, wanted by the
law for doing good for the people.

In the end, what you are likely to get Before Rightsizing (BR) from NetApp for each
particular disk capacity would be:
* The size of 34.5GB was for the Fibre Channel Zone Checksum mechanism,
employed prior to ONTAP version 6.5, which used 512 bytes per sector. From
ONTAP 6.5, block checksum at 520 bytes per sector was employed for greater
data integrity protection and resiliency.

From the table, the percentage of lost capacity is shown, and to the uninformed
this could look significant. But since the percentage value is relative to the
manufacturer's marketing capacity, this is highly inaccurate. Therefore,
competitors should not use these figures as FUD, and NetApp should use them as
a way to properly inform their customers.
NetApp Figures

Reference: NetApp

RAID & Right Sizing


See Part 3 for information on RAID & Right Sizing.

4K Blocks
LinkedIn Discussion
Flash DBA
===========================================================================

NetApp From the Ground Up - A Beginner's Guide


Part 12
FEBRUARY 5, 2015 / WILL ROBINSON

Volume and Aggregate Reallocation
Reference: Understanding NetApp Volume and Aggregate Reallocation
Summary
Volume Reallocation: Spreads a volume across all disks in an aggregate
Aggregate Reallocation: Optimises free space in the aggregate by
ensuring free space is contiguous.
Details
One of the most misunderstood topics I have seen with NetApp FAS systems is
reallocation. There are two types of reallocation that can be run on these systems:
one for files and volumes, and another for aggregates. Both processes run in the
background and aim to optimize the placement of data blocks, but they serve
different purposes. Below is a picture of a 4-disk aggregate with 2 volumes,
one orange and one yellow.

If we add a new disk to this aggregate and we don't run a volume-level
reallocation, all new writes will happen on the area in the aggregate that has the
most contiguous free space. As we can see from the picture below, this area is the
new disk. Since new data is usually the most accessed data, you now have this
single disk servicing most of your reads and writes. This will create a hot disk, and
performance issues.
Now if we run a volume reallocation on the yellow volume, the data will be spread
out across all the disks in the aggregate. The orange volume is still unoptimized
and will suffer from hot-disk syndrome until we run a reallocation on it as well.

This is why, when adding only a few new disks to an aggregate, you must run a
volume reallocation against every volume in the aggregate. If you are adding
many disks to an aggregate (16, 32, etc.) it may not be necessary to run the
reallocation. Imagine you add 32 disks to a 16-disk aggregate. New writes will go
to the 32 new disks instead of the 16 you had prior, so performance will be much
better without any intervention. As the new disks begin to fill up, writes will
eventually hit all 48 disks in your aggregate. You could, of course, speed this
process up by running a manual reallocation against all volumes in the aggregate.
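The hot-disk effect described above can be illustrated with a toy model (an assumption-laden sketch of my own, not how ONTAP actually places blocks): if every write lands wherever the most free space is, a single newly added disk absorbs all of them.

```python
# Toy model: per-disk free blocks in a 4-disk aggregate that just gained
# one much emptier disk. Each write goes to the disk with the most free
# space, so the new disk becomes "hot" until reallocation (or time)
# spreads the data out.
free = {"disk1": 10, "disk2": 10, "disk3": 10, "disk4": 10, "new": 100}
writes = {d: 0 for d in free}

for _ in range(60):
    target = max(free, key=free.get)  # most contiguous free space wins
    free[target] -= 1
    writes[target] += 1

print(writes)  # every one of the 60 writes lands on the new disk
```

In the model, spreading the existing volumes across all five disks first (what a volume reallocation does) is what prevents the new disk from taking every write.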

The other big area of confusion is what an aggregate reallocation actually does.
Aggregate reallocation (reallocate -A) will only optimize free space in the
aggregate. This helps your system with writes: the easier it is to find
contiguous free space, the more efficient those operations will be. Take the
diagram below as an example of an aggregate that could benefit from reallocation.

This is our expanded aggregate where we only reallocated the yellow volume. We
see free space in the aggregate where the blocks were distributed across the other
disks. We also see how new writes for the orange volume stacked up on the new
disk, as that is where we had the most contiguous free space. I wonder if the
application owner has been complaining about performance issues with his orange
data? The picture below shows us what happens after the aggregate reallocation.

We still have the unoptimized data from the volume we did not reallocate. The only
thing the aggregate reallocate did was make the free space in it more contiguous
for writing new data. It is easy to see how one could be confused by these similar
but different processes, and I hope this helps explain how and why you would use
the different types of reallocation.

Zeroing & Sanitising Disks
Information
Reference: Spares FAQ
If a disk has been moved around, or previously had data on it, you'll need to zero it
before it can be re-used. The disk zero spares command does the job. How long
it takes depends on the size of the disk, but it is usually no more than 4 hours,
even for the largest of disks (at the time of writing, 1TB disks).

More Information
Reference: How to securely erase your data on a NetApp
When drives in a NetApp are being obsoleted and replaced, we need to make sure
we securely erase all the data that used to be on them. Unless, that is, you're just
going to crush your disks.

In this example we've got an aggregate of 14 disks (aggr0) that need to be wiped
and removed from our NetApp so they can be replaced with new, much larger
disks.

There are two methods that you can use to wipe disks using your NetApp. The
first is to simply delete the aggregate they are a member of, turning them into
spares, and then run disk zero spares from the command line on your
NetApp. This only does a single pass and only zeros the disks. There are
arguments I've seen where some people say this is enough. I honestly don't know,
and we have a requirement to do a 7-pass wipe in our enterprise. You could run
the zero command 7 times, but I don't imagine that would be as effective as option
number two. The second option is to run the disk sanitize command, which
allows you to specify which disks you want to erase and how many passes to
perform. This is what we're going to use.

The first thing you'll need to do is get a license for your NetApp to enable disk
sanitize. It's a free license (so I've been told) and you can contact your sales rep
to get one. We got ours for free and I've seen forum posts from other NetApp
owners saying the same thing.

There is a downside to installing the disk sanitization license. Once it's installed on
a NetApp it cannot be removed. It also restricts the use of three commands once
installed:

dd (to copy blocks of data)
dumpblock (to print dumps of disk blocks)
setflag wafl_metadata_visible (to allow access to internal WAFL files)
There are also a few limitations regarding disk sanitization you should know about:
It is not supported in takeover mode for systems in an HA configuration. (If a
storage system is disabled, it remains disabled during the disk sanitization
process.)
It cannot be carried out on disks that were failed due to readability or
writability problems.
It does not perform its formatting phase on ATA drives.
If you are using the random pattern, it cannot be performed on more than
100 disks at one time.
It is not supported on array LUNs.
It is not supported on SSDs.
If you sanitize both SES disks in the same ESH shelf at the same time, you
see errors on the console about access to that shelf, and shelf warnings are
not reported for the duration of the sanitization. However, data access to that
shelf is not interrupted.
I've also read that you shouldn't sanitize more than 6 disks at once. I'm going to
sanitize our disks in batches of 5, 5 and 4 (14 total). I've also read you do not want
to sanitize disks across shelves at the same time.

Fractional Reserve
Information on Fractional Reserve:

http://alethiakaiepisto.blogspot.com.au/2011/08/volume-fractional-reserve-vs-snap.html
http://blog.hernanjlarrea.com.ar/index.php/what-is-fractional-reserve-option/
http://www.linkedin.com/groups/hi-All-What-is-fractional-93470.S.155142981
https://library.netapp.com/ecmdocs/ECMP1196995/html/GUID-596042AF-8E9C-4187-969C-633DFDD5A936.html
https://niktips.wordpress.com/tag/fractional-reserve/
https://library.netapp.com/ecmdocs/ECMP1196995/html/GUID-AA594113-8BA8-48BC-8982-928CA4B93B11.html
As per the following links, Fractional Reserve should be disabled for LUNs:

https://niktips.wordpress.com/2013/05/22/netapp-thin-provisioning-for-vmware-luns/
https://niktips.wordpress.com/2012/03/15/netapp-thin-provisioning-for-vmware/
===========================================================================

NetApp From the Ground Up - A Beginner's Guide


Part 13
FEBRUARY 23, 2015 / WILL ROBINSON

Snap Reserve
Overview

Reference: NetApp Understanding Snapshot copy reserve


Snapshot copy reserve sets a specific percent of the disk space for storing
Snapshot copies. If the Snapshot copies exceed the reserve space, they spill into
the active file system; this process is called Snapshot spill.
The Snapshot copy reserve must have sufficient space allocated for the Snapshot
copies. If the Snapshot copies exceed the reserve space, you must delete existing
Snapshot copies from the active file system to recover the space for the use of
the file system. You can also modify the percent of disk space that is allotted
to Snapshot copies.
Defaults

Reference: NetApp What the Snapshot copy reserve is


The Snapshot copy reserve sets a specific percent of the disk space for Snapshot
copies. For traditional volumes, the default Snapshot copy reserve is set to 20
percent of the disk space. For FlexVol volumes, the default Snapshot copy reserve
is set to 5 percent of the disk space.
The active file system cannot consume the Snapshot copy reserve space, but the
Snapshot copy reserve, if exhausted, can use space in the active file system.

Diagram

Reference: NetApp Community: Understanding Aggregate And LUN


Note: See Part 3 for more information on the diagram below

Can NetApp Snapshots be used as Backups?

Information

Reference: Server Fault - Can NetApp Snapshots be used as Backups?
Backups serve two functions.

First and foremost, they're there to allow you to recover your data if it
becomes unavailable. In this sense, snapshots are not backups. If you lose
data on the filer (volume deletion, storage corruption, firmware error, etc.), all
snapshots for that data are gone as well.
Secondly, and far more commonly, backups are used to correct for routine
things like accidental deletions. In this use case, snapshots are backups.
They're arguably one of the best ways to provide this kind of recovery,
because they make the earlier versions of the data available directly to the
users or their OS, as a .snapshot hidden directory that they can directly read
their files from.
No retention policy

That said, while we have snapshots and use them extensively, we still do nightly
incrementals on Netbackup to tape or Data Domain. The reason is that snapshots
cannot reliably uphold a retention policy. If you tell users that they will be able to
recover at a daily granularity for a week, then a weekly granularity for a month,
you can't keep that promise with snapshots.

On a Netapp volume with snapshots, deleted data contained in a snapshot
occupies snap reserve space. If the volume isn't full and you've configured it this
way, you can also push past that snapshot reserve and have snapshots that
occupy some of the unused data space. If the volume fills up, though, all the
snapshots but the ones supported by data in the reserved space will get deleted.
Deletion of snapshots is determined only by available snapshot space, and if the
system needs to delete snapshots that are required for your retention policy, it
will.

Consider this situation:

A full volume with regular snapshots and a 2-week retention requirement.
Assume half of the reserve is in use for snapshots, based on the normal rate of
change.
Someone deletes a lot of data (more than the snapshot reserve), drastically
increasing the rate of change, temporarily.
At this point, your snapshot reserve is completely used, as is as much of the
data free space you've allowed OnTap to use for snapshots, but you haven't
lost any snapshots yet. As soon as someone fills the volume back up with
data, though, you'll lose all the snapshots contained in the data section,
which will push your recovery point back to the time just after the large
deletion.
Summary

Netapp snapshots don't cover you against real data loss. An errant deleted volume
or data loss on the filer will require you to rebuild data.

They are a very simple and elegant way to allow for simple routine restores, but
they aren't reliable enough to replace a real backup solution. Most of the
time, they'll make routine restores simple and painless, but when they're not
available, you are exposed.

Volume Snap Reserve

Reference: NetApp: What the Snapshot copy reserve is


The Snapshot copy reserve sets a specific percent of the disk space for Snapshot
copies. For traditional volumes, the default Snapshot copy reserve is set to 20
percent of the disk space. For FlexVol volumes, the default Snapshot copy
reserve is set to 5 percent of the disk space.
The active file system cannot consume the Snapshot copy reserve space, but the
Snapshot copy reserve, if exhausted, can use space in the active file
system. (See Snapshot Spill below for more information).
Aggregate Snap Reserve
Information

Reference: NetApp Managing aggregate Snapshot copies


An aggregate Snapshot copy is a point-in-time, read-only image of
an aggregate. You use aggregate Snapshot copies when the contents of
an entire aggregate need to be recorded.
An aggregate Snapshot copy is similar to a volume Snapshot copy, except that it
captures the contents of the entire aggregate, rather than any particular volume.
Also, you do not restore data directly from an aggregate Snapshot copy. To restore
data, you use a volume Snapshot copy.
How you use aggregate Snapshot copies depends on whether you use
the SyncMirror or MetroCluster functionality.
If you use SyncMirror or MetroCluster, you must
enable automatic aggregate Snapshot copy creation and keep your
aggregate Snapshot reserve at 5 percent or higher.
If you use SyncMirror or MetroCluster and you need to break the
mirror, an aggregate Snapshot copy is created automatically before
breaking the mirror to decrease the time it takes to resynchronize the
mirror later.
Also, if you are making a global change to your storage system and you
want to be able to restore the entire system state if the change produces
unexpected results, you take an aggregate Snapshot copy before making
the change.
If you do not use either SyncMirror or MetroCluster, you do not need to
enable automatic aggregate Snapshot copy creation or reserve space for
aggregate Snapshot copies.
If the aggregate file system becomes inconsistent, aggregate
Snapshot copies can be used by technical support to restore the file
system to a consistent state. If that is important to you, you can ensure
that automatic aggregate Snapshot copy creation is enabled. However,
disabling automatic aggregate Snapshot copy creation and keeping your
aggregate Snapshot reserve at 0 percent increases your storage
utilization, because no disk space is reserved for aggregate Snapshot
copies. Disabling automatic aggregate Snapshot copy creation and
setting the aggregate Snapshot reserve to 0 percent does not affect
normal operation, except for making more free space available for data.
Note: The default size of the aggregate Snapshot reserve is 5 percent of the
aggregate size. For example, if the size of your aggregate is 500 GB, then 25 GB is
set aside for aggregate Snapshot copies.
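The reserve arithmetic above is easy to sanity-check. The helper below is a hypothetical illustration (not part of any NetApp tooling) that reproduces the 500 GB example:

```python
def snapshot_reserve_mb(aggregate_size_mb, reserve_percent=5.0):
    """Space set aside for aggregate Snapshot copies, in MB."""
    return aggregate_size_mb * reserve_percent / 100.0

# The documentation's example: a 500 GB (512000 MB) aggregate
# with the default 5% reserve yields 25 GB (25600 MB).
print(snapshot_reserve_mb(500 * 1024))  # 25600.0
```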
Note: Unlike volume Snapshot copies, aggregate Snapshot
copies cannot consume any space outside of their Snapshot reserve, if automatic
aggregate Snapshot copy deletion is enabled. If automatic aggregate Snapshot
copy deletion is disabled, then aggregate Snapshot copies can consume space
outside of their Snapshot reserve.
More Information

Reference: Me
As per this page, Aggregate Snapshots aren't used very often. As seen in the
comments, people prefer using Volume Snapshots instead.
Snapshot Spill
Information

Reference: NetApp How Snapshot copies and Snapshot reserve use space in a volume
Understanding the Snapshot reserve area of a FlexVol volume or Infinite Volume
and what Snapshot spill is can help you correctly size the Snapshot reserve. For
FlexVol volumes, it can help you decide whether to enable the Snapshot autodelete
capability.

When Snapshot copies use more space than the Snapshot reserve, they spill over
and use space in the active file system. The Snapshot reserve area of a volume is
the space reserved exclusively for Snapshot copies. It is not available to the user
data or metadata area of the volume. The size of the Snapshot reserve is a
specified percentage of the current volume size, and does not depend on the
number of Snapshot copies or how much space they consume.

If all of the space allotted for the Snapshot reserve is used but the active file
system (user data and metadata) is not full, Snapshot copies can use more space
than the Snapshot reserve and spill into the active file system. This extra space is
called Snapshot spill.

The following illustration shows a FlexVol volume with no Snapshot spill occurring.
The two blocks on the left show the volume's used and available space for user
data and metadata. The two blocks on the right show the used and unused
portions of the Snapshot reserve. When you modify the size of the Snapshot
reserve, it is the blocks on the right that change.
The following illustration shows a FlexVol volume with Snapshot spill occurring. The
Snapshot reserve area is full, and Snapshot copies are spilling over into a spill area
that is part of the user data and metadata area's available space. The size of the
Snapshot reserve remains the same.
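The accounting described above can be sketched numerically. This is a hypothetical illustration (the function name is mine, not NetApp's):

```python
def snapshot_spill_mb(volume_size_mb, reserve_percent, snapshot_used_mb):
    """Space snapshots consume beyond their reserve; this spill comes
    out of the active file system (user data and metadata) area."""
    reserve_mb = volume_size_mb * reserve_percent / 100.0
    return max(0.0, snapshot_used_mb - reserve_mb)

# No spill: 150 MB of snapshots fit inside a 20% (204.8 MB) reserve.
print(snapshot_spill_mb(1024, 20, 150))            # 0.0
# Spill: 410 MB of snapshots overflow the same reserve.
print(round(snapshot_spill_mb(1024, 20, 410), 1))  # 205.2
```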

Why Managing Snapshot Reserve is Important


Information

Reference: Why Managing Snapshot Reserve is Important


This post is essentially a continuation of Understanding NetApp Snapshots (I was
thinking of calling it Part II, except this subject deserves its own title!) Here we'll
see what happens when snapshots exceed their snapshot reserve, and what can
potentially happen if snapshots are allowed to build up too much.

At the end of the aforementioned post, we had our 1024MB volume with 0MB
file-system space used, and 20MB of snapshot reserve used. The volume has a 5%
snapshot reserve (for completeness: it is also thin-provisioned/space-guarantee=none,
and has a 0% fractional reserve.)

Filesystem               total  used  avail  capacity
/vol/cdotshare/          972MB  0MB   972MB  0%
/vol/cdotshare/.snapshot 51MB   20MB  30MB   40%
Our client sees an empty volume (share) of size 972MB (95% of 1024MB).

Image: Empty Share

What does the client see if we increase the snapshot reserve to 20%?

::> volume modify cdotshare -percent-snapshot-space 20


Image: Empty share, but less user-visible space due to increased snapshot reserve

Our client sees the volume size has reduced to 819MB (80% of 1024MB).

So, we now have 204MB of snapshot reserve with 20MB of that reserve used.
::> df -megabyte cdotshare
Filesystem total used avail capacity
/vol/cdotshare/ 819MB 0MB 819MB 0%
/vol/cdotshare/.snapshot 204MB 21MB 183MB 10%
What happens if we overfill the snapshot reserve to, say, 200% (408MB) by adding
data, creating a snapshot, and then deleting all the data? What does the client see?

Step 1: Add ~380MB to the share via the client


Step 2: Take a snapshot
::> snapshot create cdotshare -snapshot snap04 -vserver vs1
Step 3: Using the client, delete everything in the share so the additional 380MB is
also locked in snapshots
So, our snapshot capacity is now at 200% (that's twice its reserve) and the df
output shows there's only 613MB available user capacity.

Filesystem               total  used   avail  capacity
/vol/cdotshare/          819MB  205MB  613MB  25%
/vol/cdotshare/.snapshot 204MB  410MB  0MB    200%
The 613MB available user capacity is confirmed via the client, even though the
volume is actually empty.
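A back-of-the-envelope reconstruction of the df output (numbers rounded as df reports them) shows where the 613MB figure comes from:

```python
# Rounded figures from the df output above.
user_total_mb = 819   # 80% of the 1024 MB volume
reserve_mb = 204      # 20% snapshot reserve
snap_used_mb = 410    # snapshots at 200% of their reserve

# Whatever the snapshots use beyond their reserve spills into,
# and is subtracted from, the user-visible area.
spill_mb = max(0, snap_used_mb - reserve_mb)
print(spill_mb)                  # 206
print(user_total_mb - spill_mb)  # 613 MB available to the client
```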

Image: Share now occupied but not by active file-system data

Image: The share says it is empty!

If you ever get questions like "my volume's empty but something's using up my
space; what is it?", now you know what the answer might be: snapshots going
over their reserve!

::> snapshot show cdotshare


Vserver Volume Snapshot State Size Total% Used%
-------- ------- ----------- -------- -------- ------ -----
vs1 cdotshare
snap01 valid 92KB 0% 15%
snap02 valid 20.61MB 2% 98%
snap03 valid 184.5MB 18% 100%
snap04 valid 205MB 20% 100%
To take it to the extreme, I could completely fill the volume with snapshots (by
adding data, taking a snapshot, then deleting the data), leaving no space for any
active file system. That is, the user will see an empty but completely full volume!
Or, better put: a volume empty of active user data, but otherwise full due to
snapshot retention of past changes/deletions.
Image: Empty but completely full volume!

Filesystem               total  used    avail  capacity
/vol/cdotshare/          819MB  818MB   0MB    100%
/vol/cdotshare/.snapshot 204MB  1022MB  0MB    499%

What are (some) ways to manage snapshots?

1) The volume should be sized correctly for the amount of data that's going to go
in it (keeping in mind the need for growth), and the snapshot reserve should be of
a size that will contain the changes (deletions/block modifications) over the
retention period required.
2) The snapshot policy should be set appropriately:
::> volume modify -vserver VSERVER -volume VOLUME -snapshot-policy POLICY
Note: Changing the snapshot policy will require manual deletion of snapshots that
were controlled by the previous policy.
3) Consider making use of these space and snapshot management features:
::> volume modify -vserver VSERVER -volume VOLUME -?
-autosize {true|false}
-max-autosize {integer(KB/MB/GB/TB/PB)}
-autosize-increment {integer(KB/MB/GB/TB/PB)}
-space-mgmt-try-first {volume_grow/snap_delete}

Monitoring, Events and Alerting

OnCommand Unified Manager (OCUM) is available and free to use. OCUM can
monitor your snapshot reserve utilization levels, and much more!

Finally: Fractional Reserve

Going back to our example, what would it look like if we set the fractional reserve
to 100%?

The answer is no different:

::> volume modify -vserver vs1 -volume cdotshare -fractional-reserve 100%

::> df -megabyte cdotshare


Filesystem total used avail capacity
/vol/cdotshare/ 819MB 818MB 0MB 100%
/vol/cdotshare/.snapshot 204MB 1022MB 0MB 499%
The volume is still full of snapshots, with no space for active filesystem data!

More Snapshot Information

See the following pages for more information:

https://library.netapp.com/ecmdocs/ECMP1196991/html/GUID-132FA703-6109-4BAB-9C04-D375E1DB0184.html
https://library.netapp.com/ecmdocs/ECMP1196991/html/GUID-4547DD0A-4A55-4982-89A0-90AD8A1C86F4.html
https://library.netapp.com/ecmdocs/ECMP1196991/html/GUID-7FD0912C-6C8C-420E-B1FD-B05A3E3D3180.html
http://broadcast.oreilly.com/2009/02/understanding-snapshot-consump.html
=====================================================================
Interacting with NetApp APIs, Part 1
APRIL 3, 2017 / WILL ROBINSON
If you're a regular reader of this blog, you'll see that I've been posting
about automation and Python quite a lot recently. The reason being that it's not
only fun, but I feel it's the way of the future. But I digress.
The reason for this post is to discuss my recent experience with NetApp's APIs. As I
got off to a pretty slow start (which I feel was due to a lack of documentation), I'll
also provide setup and usage guidance in the hope that you can get up and
running sooner than I did.

What do the NetApp APIs do?


To quote NetApp's documentation, "You can create an application for seamless
integration with NetApp solutions by using the ONTAP APIs. ONTAP APIs are
available for security management, license management, backup and recovery,
data replication, data archiving, and so on" (see the following figure).

Getting Started
Now that we know what the APIs are capable of, let's get started. The first things
you'll want to do are:

1. Download the API documentation


2. Download the NetApp Manageability SDK
Now this is where things get tricky. At the time of writing, the latest version of the
SDK is 5.6, as is the latest version of the documentation. However, the information
contained in the documentation pertains only to Data ONTAP v9.0 & 9.1 which run
ONTAPI versions 1.100 and 1.110 respectively. Because the simulators in my lab
are running CDoT 8.3.2, the documentation does not meet my needs.

In order to obtain the correct information, I need to go back to the 5.4
documentation, as it provides descriptions of all the APIs for Data ONTAP
8.3.1. It's not quite the version I'm running, but it's close enough.
So to recap, I'm running version 5.6 of the NetApp SDK software, but am reading
version 5.4 of the NetApp SDK documentation.
Now that we've got that out of the way, extract both of the ZIP files to a location of
your choosing.

Documentation
Just a quick FYI, you'll need to use Internet Explorer or Edge when opening the
SDK_help.htm file because it won't work with Chrome.
Also, while the Type column tells you which entries are mandatory (e.g. string)
or optional (e.g. integer option), this information can actually be misleading. I
say this because if you've got a choice between two or more mandatory fields
which achieve the same outcome, both fields will be marked as optional.
For example, the below screenshot is from the aggr-create entry. It tells us that
either disk-count or disks must be specified, yet they're both marked
as optional.
The reason NetApp would have done this is to avoid having users think they
need to specify both, which would be the case if they were marked as mandatory.
So while it might be a little confusing at first, it is definitely the right way to do it.
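In other words, the real constraint is "one of the two", which a single Mandatory/Optional column cannot express. A hypothetical client-side check (the function is mine; only the parameter names come from the aggr-create entry) might look like:

```python
def validate_aggr_create(params):
    """Enforce the either/or rule described above: exactly one of
    disk-count or disks (names from the aggr-create entry) is given."""
    given = [k for k in ("disk-count", "disks") if k in params]
    if len(given) != 1:
        raise ValueError("specify exactly one of disk-count or disks")

validate_aggr_create({"disk-count": 5})   # fine
# validate_aggr_create({})                # would raise ValueError
```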

Cluster-Mode & Vserver APIs


As you'll see when you browse the documentation, as well as ZExplore (which will
be discussed next), there are two types of API: Clustered and Vserver.
The Clustered API is used to issue commands which relate to the cluster
as a whole, such as aggr . The Vserver API, on the other hand, is for commands
which relate to SVMs, such as volume .
Note: It is important to remember that if you're using the Vserver API you
need to connect to an SVM LIF. You will also need to use an account that has the
required ONTAPI permissions.

ZExplore Development Interface (ZEDI)


To quote NetApp's documentation again:
ZEDI is a utility with graphical user interface bundled with NetApp Manageability
SDK (NMSDK). This utility enables you to test DATA ONTAP APIs and OnCommand
Unified Manager APIs. This utility allows you to generate raw XML request for any
given API.
You can supply necessary arguments in the XML request before invoking the API
through HTTP or HTTPS and you can view the response in raw XML format or tree
format. Additionally, for a given API, the utility can generate sample codes in Java,
Perl, C, C#, Python and Ruby to demonstrate how the said API can be invoked
using NMSDK Core APIs.
You can choose to include comments (API documentation descriptions) and/or
optional parameters while generating the XML request and sample codes. You can
generate workflows by sequencing multiple APIs in logical order. It also supports
vFiler and Vserver tunneling, which enables a DATA ONTAP API to be invoked
directly on a vFiler or a Vserver respectively.
Now that you know what ZExplore does, let's take it for a quick spin. You can find
it in the Zedi directory of the netapp-manageability-sdk ZIP file you
downloaded earlier.
Here's what the Connect window looks like in my lab:

Note: Upon running ZExplore, you might see the following error: "ReferenceError:
importPackage is not defined in <eval> at line number 2". Don't worry if you do;
the tool appears to work just fine anyway. Having said this, it might be worth
rebooting your PC, as that is what I did and I haven't seen the error since.
After clicking Connect, I'm presented with the following message:
The reason for this is that CDoT 8.3.2 runs ONTAPI 1.32, but the version of
ZExplore I'm running does not contain a 1.32 API document. This is interesting
given that it does have documents for 1.100 and 1.110 (Data ONTAP v9.0 & 9.1
respectively). In other words, API documents are provided for CDoT 8.3.1, 9.0
and 9.1, but not for CDoT 8.3.2. How odd! :)
Having said this though, it isn't too much of a problem. As mentioned by Andrew in
this NetApp forum thread:
"Zexplore is simply lacking awareness of ZAPI 1.32. You can safely use version
1.31 for the vast majority of operations; you might encounter an occasional issue
if there is a discrepancy between 1.31 and 1.32, where Zexplore is using the 1.31
specification but 1.32 differs in some way.
If you're looking at the ZAPI docs, and not relying on Zexplore 100%, you'll be able
to spot the differences."
If you encounter a similar issue, what you'll need to do is select the closest API
version from the dropdown menu located in the top left hand corner:

This will ensure the greatest amount of compatibility between ZExplore and the
version of ONTAPI running on your controllers.

Continue onto Part 2


In Part 2 of this series I delve further into ZExplore's GUI and walk you through
the process of making your first API call.

Interacting with NetApp APIs, Part 2


APRIL 5, 2017 / WILL ROBINSON
Picking up where I left off in Part 1 of this series, let's continue our exploration of
ZExplore :)

Mandatory Parameters
In Part 1 I touched on the fact that the API documentation can be a little confusing
when it comes to mandatory fields. Unfortunately the same is true for ZExplore.
However, NetApp's documentation explains it well:
"Red colored arrows indicate mandatory parameters whereas Blue colored arrows
indicate optional parameters.
Note: In some APIs when either one of the two input parameters is
required, both these input parameters are marked by Blue color arrow
and not Red."
As I mentioned in my previous post, by doing this NetApp avoids confusing users
who might otherwise try to set both parameters if they were both marked as
required.

XML Configuration
The API entries are listed on the left hand side and are laid out in a hierarchical
fashion, similar to that of a directory structure. To use it, you first find the
directory you're interested in and expand it by clicking the +.
Next you select the entry you want, right click on it, and then click Generate.
If you'd like more information on an entry, you can either read the documentation
or hover your mouse over the entry in ZExplore. The image below is an example of
the information that is presented when an entry is hovered over.
Clicking Generate adds the XML code to the Execute pane. The entry is also
added to the Added APIs pane:

Next you need to fill in the required fields. You can do that by:

1. Editing the XML code directly in the Execute pane.


2. Right clicking on the parameters in the Added APIs pane and then
clicking Edit.
My preference is the former as I find it quicker, but the latter is perfectly fine too.

The great news is that as you add more entries, the Added APIs pane and the
XML code are updated automatically. This is a massive time saver, as you don't
have to worry about ensuring your XML hierarchy is set up correctly; ZExplore
does that for you.

Show Commands/Calls
There are two important things to note when dealing with show commands through
the API. The first is that they're actually referred to as get API calls. This isn't a
big deal, but it is worth mentioning in case you were looking for show
entries/documentation.
The second is that the structure of the get calls differs from the set calls. In the
previous screenshot we saw that the disk-fail entry has a flat hierarchy. It
simply has two components under the root, and that's it. get calls, on the other
hand, have a much deeper hierarchy:
Thankfully they're actually very easy to work with. The hierarchy is broken down
into three main categories:

desired-attributes : The information you'd like to retrieve.
max-records : The maximum number of records you'd like to receive.
query : Filters you'd like to apply to your call.
For example, let's say I'd like to achieve the following:

Retrieve all information aggr-get-iter has to offer.
Limit the amount of records that are retrieved to 2.
The aggregate names must begin with aggr0.
To achieve this I simply need to do the following:
Generate aggr-get-iter and leave the desired-attributes as is. Doing so
will ensure all attributes are retrieved.
Set max-records to 2.
Delete all aggr-attributes except for aggregate-name . Set this parameter
to aggr0*
Note: The last step is extremely important. As mentioned above, the
aggr-attributes entries are filters. Unless your query results meet all attribute
entries, they will not be returned. Therefore I advise you to remove all attributes
which you do not set a value for.
Once you've done this, your settings should look like this:

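Outside of ZExplore, the same three-step request body can be sketched with Python's standard library. This is only an illustration of the XML structure; the element names follow the aggr-get-iter example in this post, and the transport/authentication wrapper is omitted:

```python
import xml.etree.ElementTree as ET

# Build the aggr-get-iter request body described above: leave
# desired-attributes empty (retrieve everything), cap the results
# at 2 records, and filter on aggregate names matching aggr0*.
api = ET.Element("aggr-get-iter")
ET.SubElement(api, "desired-attributes")      # empty => all attributes
ET.SubElement(api, "max-records").text = "2"
query = ET.SubElement(api, "query")
attrs = ET.SubElement(query, "aggr-attributes")
ET.SubElement(attrs, "aggregate-name").text = "aggr0*"

print(ET.tostring(api, encoding="unicode"))   # the request body as XML
```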
Next I click the Play button to run my query. If all goes well, I should receive just
over 200 lines of XML code, which I do. As most of the output isn't relevant to this
post, I've only provided the interesting parts below:
<?xml version='1.0' encoding='UTF-8'?>
<netapp version='1.32' xmlns='http://www.netapp.com/filer/admin'>
  <!-- Output of aggr-get-iter [Execution Time: 47ms] -->
  <results status='passed'>
    <!-- ...OMITTED... -->
    <num-records>2</num-records>
  </results>
</netapp>
The first highlighted line tells us that our query ran successfully, while the
second highlighted line tells us how many records were returned. If our query
had failed (e.g. we did not provide a mandatory parameter), the results status would
contain an error message as opposed to the word "passed".
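Checking those two fields programmatically is straightforward. Here is a minimal sketch using Python's standard library against a trimmed-down response in the shape shown above (note the XML namespace on the netapp element):

```python
import xml.etree.ElementTree as ET

# A cut-down response in the shape shown above.
response = """<netapp version='1.32' xmlns='http://www.netapp.com/filer/admin'>
  <results status='passed'>
    <num-records>2</num-records>
  </results>
</netapp>"""

# All elements inherit the default namespace, so map a prefix for find().
ns = {"na": "http://www.netapp.com/filer/admin"}
results = ET.fromstring(response).find("na:results", ns)
print(results.get("status"))                    # passed
print(results.find("na:num-records", ns).text)  # 2
```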
Now, because no one in their right mind would want to sift through over 200 lines
of XML, ZExplore has a nice Tree feature that allows you to browse the results in
a hierarchical fashion. In the image below we can see the two aggr-attributes
records which were returned:

Drilling down the hierarchy shows us the actual values which were collected as
specified by the desired-attributes entries, as shown below:

And there we have it! We've completed our first API call.
Continue onto Part 3
In Part 3 of this series I demonstrate how to use ZExplore to convert XML
queries to languages including Python, Perl and Ruby, just to name a few. I also
explain why you should only specify the desired-attributes which you intend to use.

Interacting with NetApp APIs, Part 3


APRIL 6, 2017 / WILL ROBINSON
In Part 2 of this series we made our first API call and received over 200 lines of
XML as a result. The reason we received so much output is that we didn't
remove any desired-attributes, and therefore the call retrieved about 90 pieces of
information. When you multiply that by the number of queries we ran (2), you get
180 pieces of information being requested.
This post will discuss why you should limit your queries to only the pieces of
information you're interested in. It will also cover how you can use ZExplore to
convert its XML configuration to languages such as Python, Perl and Ruby.

Converting Code
As discussed previously in this series, ZExplore outputs your queries into
the Execute tab in XML. However, you can also have it convert the queries to
any one of the following languages:
C#
C
Java
Perl
Python
Ruby
You do this by navigating to Preferences -> Languages. Once you've selected
a language, the converted code is available in the Develop tab.
This is a fantastic feature because it allows you to add your NetApp API calls to
your scripts simply by copying and pasting the generated code. Using this feature
I'm able to combine my NetApp API calls with my Cisco MDS code, which will enable
me to automate the provisioning of storage for new hosts connected to the
SAN network.

Cleaner Code
Cleaner code means you, and anyone else who works with your code, are able to do
your jobs more efficiently. For example, say I'm only interested in retrieving:

A list of nodes which own aggregates with the name aggr0*.

Instead of generating over 140 lines of XML code:
XML Code

<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.32">
  <aggr-get-iter>
    <desired-attributes>
      <aggr-attributes>
        <aggr-64bit-upgrade-attributes>
          <aggr-check-attributes>
            <added-space></added-space>
            <check-last-errno></check-last-errno>
            <cookie></cookie>
            <is-space-estimate-complete></is-space-estimate-complete>
          </aggr-check-attributes>
          <aggr-start-attributes>
            <min-space-for-upgrade></min-space-for-upgrade>
            <start-last-errno></start-last-errno>
          </aggr-start-attributes>
          <aggr-status-attributes>
            <is-64-bit-upgrade-in-progress></is-64-bit-upgrade-in-progress>
          </aggr-status-attributes>
        </aggr-64bit-upgrade-attributes>
        <aggr-fs-attributes>
          <block-type></block-type>
          <fsid></fsid>
          <type></type>
        </aggr-fs-attributes>
        <aggr-inode-attributes>
          <files-private-used></files-private-used>
          <files-total></files-total>
          <files-used></files-used>
          <inodefile-private-capacity></inodefile-private-capacity>
          <inodefile-public-capacity></inodefile-public-capacity>
          <maxfiles-available></maxfiles-available>
          <maxfiles-possible></maxfiles-possible>
          <maxfiles-used></maxfiles-used>
          <percent-inode-used-capacity></percent-inode-used-capacity>
        </aggr-inode-attributes>
        <aggr-ownership-attributes>
          <home-id></home-id>
          <home-name></home-name>
          <owner-id></owner-id>
          <owner-name></owner-name>
        </aggr-ownership-attributes>
        <aggr-performance-attributes>
          <free-space-realloc></free-space-realloc>
        </aggr-performance-attributes>
        <aggr-raid-attributes>
          <checksum-status></checksum-status>
          <checksum-style></checksum-style>
          <disk-count></disk-count>
          <ha-policy></ha-policy>
          <has-local-root></has-local-root>
          <has-partner-root></has-partner-root>
          <is-checksum-enabled></is-checksum-enabled>
          <is-hybrid></is-hybrid>
          <is-hybrid-enabled></is-hybrid-enabled>
          <is-inconsistent></is-inconsistent>
          <mirror-status></mirror-status>
          <mount-state></mount-state>
          <plex-count></plex-count>
          <plexes>
            <plex-attributes>
              <is-online></is-online>
              <is-resyncing></is-resyncing>
              <plex-name></plex-name>
              <raidgroups>
                <raidgroup-attributes>
                  <checksum-style></checksum-style>
                  <is-recomputing-parity></is-recomputing-parity>
                  <is-reconstructing></is-reconstructing>
                  <raidgroup-name></raidgroup-name>
                  <recomputing-parity-percentage></recomputing-parity-percentage>
                  <reconstruction-percentage></reconstruction-percentage>
                </raidgroup-attributes>
              </raidgroups>
              <resyncing-percentage></resyncing-percentage>
            </plex-attributes>
          </plexes>
          <raid-lost-write-state></raid-lost-write-state>
          <raid-size></raid-size>
          <raid-status></raid-status>
          <state></state>
        </aggr-raid-attributes>
        <aggr-snaplock-attributes>
          <is-snaplock></is-snaplock>
          <snaplock-type></snaplock-type>
        </aggr-snaplock-attributes>
        <aggr-snapmirror-attributes>
          <dp-snapmirror-destinations></dp-snapmirror-destinations>
          <ls-snapmirror-destinations></ls-snapmirror-destinations>
          <mv-snapmirror-destinations></mv-snapmirror-destinations>
        </aggr-snapmirror-attributes>
        <aggr-snapshot-attributes>
          <files-total></files-total>
          <files-used></files-used>
          <maxfiles-available></maxfiles-available>
          <maxfiles-possible></maxfiles-possible>
          <maxfiles-used></maxfiles-used>
          <percent-inode-used-capacity></percent-inode-used-capacity>
          <percent-used-capacity></percent-used-capacity>
          <size-available></size-available>
          <size-total></size-total>
          <size-used></size-used>
        </aggr-snapshot-attributes>
        <aggr-space-attributes>
          <hybrid-cache-size-total></hybrid-cache-size-total>
          <percent-used-capacity></percent-used-capacity>
          <size-available></size-available>
          <size-total></size-total>
          <size-used></size-used>
          <total-reserved-space></total-reserved-space>
        </aggr-space-attributes>
        <aggr-striping-attributes>
          <member-count></member-count>
        </aggr-striping-attributes>
        <aggr-volume-count-attributes>
          <flexvol-count></flexvol-count>
          <flexvol-count-collective></flexvol-count-collective>
          <flexvol-count-not-online></flexvol-count-not-online>
          <flexvol-count-quiesced></flexvol-count-quiesced>
          <flexvol-count-striped></flexvol-count-striped>
        </aggr-volume-count-attributes>
        <aggr-wafliron-attributes>
          <last-start-errno></last-start-errno>
          <last-start-error-info></last-start-error-info>
          <scan-percentage></scan-percentage>
          <state></state>
        </aggr-wafliron-attributes>
        <aggregate-name></aggregate-name>
        <aggregate-uuid></aggregate-uuid>
        <nodes>
          <node-name></node-name>
        </nodes>
        <striping-type></striping-type>
      </aggr-attributes>
    </desired-attributes>
    <max-records>2</max-records>
    <query>
      <aggr-attributes>
        <aggregate-name>aggr0*</aggregate-name>
      </aggr-attributes>
    </query>
    <tag></tag>
  </aggr-get-iter>
</netapp>
and over 200 lines of Python code:
Python Code

import sys
sys.path.append("<path_to_nmsdk_root>/lib/python/NetApp")
from NaServer import *

s = NaServer("192.168.0.205", 1, 32)
s.set_server_type("FILER")
s.set_transport_type("HTTPS")
s.set_port(443)
s.set_style("LOGIN")
s.set_admin_user("admin", "<password>")

api = NaElement("aggr-get-iter")

xi = NaElement("desired-attributes")
api.child_add(xi)

xi1 = NaElement("aggr-attributes")
xi.child_add(xi1)

xi2 = NaElement("aggr-64bit-upgrade-attributes")
xi1.child_add(xi2)

xi3 = NaElement("aggr-check-attributes")
xi2.child_add(xi3)

xi3.child_add_string("added-space","<added-space>")
xi3.child_add_string("check-last-errno","<check-last-errno>")
xi3.child_add_string("cookie","<cookie>")
xi3.child_add_string("is-space-estimate-complete","<is-space-estimate-complete>")

xi4 = NaElement("aggr-start-attributes")
xi2.child_add(xi4)

xi4.child_add_string("min-space-for-upgrade","<min-space-for-upgrade>")
xi4.child_add_string("start-last-errno","<start-last-errno>")

xi5 = NaElement("aggr-status-attributes")
xi2.child_add(xi5)

xi5.child_add_string("is-64-bit-upgrade-in-progress","<is-64-bit-upgrade-in-progress>")

xi6 = NaElement("aggr-fs-attributes")
xi1.child_add(xi6)

xi6.child_add_string("block-type","<block-type>")
xi6.child_add_string("fsid","<fsid>")
xi6.child_add_string("type","<type>")

xi7 = NaElement("aggr-inode-attributes")
xi1.child_add(xi7)

xi7.child_add_string("files-private-used","<files-private-used>")
xi7.child_add_string("files-total","<files-total>")
xi7.child_add_string("files-used","<files-used>")
xi7.child_add_string("inodefile-private-capacity","<inodefile-private-capacity>")
xi7.child_add_string("inodefile-public-capacity","<inodefile-public-capacity>")
xi7.child_add_string("maxfiles-available","<maxfiles-available>")
xi7.child_add_string("maxfiles-possible","<maxfiles-possible>")
xi7.child_add_string("maxfiles-used","<maxfiles-used>")
xi7.child_add_string("percent-inode-used-capacity","<percent-inode-used-capacity>")

xi8 = NaElement("aggr-ownership-attributes")
xi1.child_add(xi8)

xi8.child_add_string("home-id","<home-id>")
xi8.child_add_string("home-name","<home-name>")
xi8.child_add_string("owner-id","<owner-id>")
xi8.child_add_string("owner-name","<owner-name>")

xi9 = NaElement("aggr-performance-attributes")
xi1.child_add(xi9)

xi9.child_add_string("free-space-realloc","<free-space-realloc>")

xi10 = NaElement("aggr-raid-attributes")
xi1.child_add(xi10)

xi10.child_add_string("checksum-status","<checksum-status>")
xi10.child_add_string("checksum-style","<checksum-style>")
xi10.child_add_string("disk-count","<disk-count>")
xi10.child_add_string("ha-policy","<ha-policy>")
xi10.child_add_string("has-local-root","<has-local-root>")
xi10.child_add_string("has-partner-root","<has-partner-root>")
xi10.child_add_string("is-checksum-enabled","<is-checksum-enabled>")
xi10.child_add_string("is-hybrid","<is-hybrid>")
xi10.child_add_string("is-hybrid-enabled","<is-hybrid-enabled>")
xi10.child_add_string("is-inconsistent","<is-inconsistent>")
xi10.child_add_string("mirror-status","<mirror-status>")
xi10.child_add_string("mount-state","<mount-state>")
xi10.child_add_string("plex-count","<plex-count>")

xi11 = NaElement("plexes")
xi10.child_add(xi11)

xi12 = NaElement("plex-attributes")
xi11.child_add(xi12)

xi12.child_add_string("is-online","<is-online>")
xi12.child_add_string("is-resyncing","<is-resyncing>")
xi12.child_add_string("plex-name","<plex-name>")

xi13 = NaElement("raidgroups")
xi12.child_add(xi13)

xi14 = NaElement("raidgroup-attributes")
xi13.child_add(xi14)

xi14.child_add_string("checksum-style","<checksum-style>")
xi14.child_add_string("is-recomputing-parity","<is-recomputing-parity>")
xi14.child_add_string("is-reconstructing","<is-reconstructing>")
xi14.child_add_string("raidgroup-name","<raidgroup-name>")
xi14.child_add_string("recomputing-parity-percentage","<recomputing-parity-percentage>")
xi14.child_add_string("reconstruction-percentage","<reconstruction-percentage>")
xi12.child_add_string("resyncing-percentage","<resyncing-percentage>")
xi10.child_add_string("raid-lost-write-state","<raid-lost-write-state>")
xi10.child_add_string("raid-size","<raid-size>")
xi10.child_add_string("raid-status","<raid-status>")
xi10.child_add_string("state","<state>")

xi15 = NaElement("aggr-snaplock-attributes")
xi1.child_add(xi15)

xi15.child_add_string("is-snaplock","<is-snaplock>")
xi15.child_add_string("snaplock-type","<snaplock-type>")

xi16 = NaElement("aggr-snapmirror-attributes")
xi1.child_add(xi16)

xi16.child_add_string("dp-snapmirror-destinations","<dp-snapmirror-destinations>")
xi16.child_add_string("ls-snapmirror-destinations","<ls-snapmirror-destinations>")
xi16.child_add_string("mv-snapmirror-destinations","<mv-snapmirror-destinations>")

xi17 = NaElement("aggr-snapshot-attributes")
xi1.child_add(xi17)

xi17.child_add_string("files-total","<files-total>")
xi17.child_add_string("files-used","<files-used>")
xi17.child_add_string("maxfiles-available","<maxfiles-available>")
xi17.child_add_string("maxfiles-possible","<maxfiles-possible>")
xi17.child_add_string("maxfiles-used","<maxfiles-used>")
xi17.child_add_string("percent-inode-used-capacity","<percent-inode-used-capacity>")
xi17.child_add_string("percent-used-capacity","<percent-used-capacity>")
xi17.child_add_string("size-available","<size-available>")
xi17.child_add_string("size-total","<size-total>")
xi17.child_add_string("size-used","<size-used>")

xi18 = NaElement("aggr-space-attributes")
xi1.child_add(xi18)

xi18.child_add_string("hybrid-cache-size-total","<hybrid-cache-size-total>")
xi18.child_add_string("percent-used-capacity","<percent-used-capacity>")
xi18.child_add_string("size-available","<size-available>")
xi18.child_add_string("size-total","<size-total>")
xi18.child_add_string("size-used","<size-used>")
xi18.child_add_string("total-reserved-space","<total-reserved-space>")

xi19 = NaElement("aggr-striping-attributes")
xi1.child_add(xi19)

xi19.child_add_string("member-count","<member-count>")

xi20 = NaElement("aggr-volume-count-attributes")
xi1.child_add(xi20)

xi20.child_add_string("flexvol-count","<flexvol-count>")
xi20.child_add_string("flexvol-count-collective","<flexvol-count-collective>")
xi20.child_add_string("flexvol-count-not-online","<flexvol-count-not-online>")
xi20.child_add_string("flexvol-count-quiesced","<flexvol-count-quiesced>")
xi20.child_add_string("flexvol-count-striped","<flexvol-count-striped>")

xi21 = NaElement("aggr-wafliron-attributes")
xi1.child_add(xi21)

xi21.child_add_string("last-start-errno","<last-start-errno>")
xi21.child_add_string("last-start-error-info","<last-start-error-info>")
xi21.child_add_string("scan-percentage","<scan-percentage>")
xi21.child_add_string("state","<state>")
xi1.child_add_string("aggregate-name","<aggregate-name>")
xi1.child_add_string("aggregate-uuid","<aggregate-uuid>")

xi22 = NaElement("nodes")
xi1.child_add(xi22)

xi22.child_add_string("node-name","<node-name>")
xi1.child_add_string("striping-type","<striping-type>")
api.child_add_string("max-records","2")

xi23 = NaElement("query")
api.child_add(xi23)

xi24 = NaElement("aggr-attributes")
xi23.child_add(xi24)

xi24.child_add_string("aggregate-name","aggr0*")
api.child_add_string("tag","<tag>")

xo = s.invoke_elem(api)
if xo.results_status() == "failed":
    print("Error:\n")
    print(xo.sprintf())
    sys.exit(1)

print("Received:\n")
print(xo.sprintf())
I can achieve the same results using less than 20 lines of XML:
XML Code

<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.32">
  <aggr-get-iter>
    <desired-attributes>
      <aggr-attributes>
        <aggregate-name></aggregate-name>
        <nodes>
          <node-name></node-name>
        </nodes>
      </aggr-attributes>
    </desired-attributes>
    <max-records>2</max-records>
    <query>
      <aggr-attributes>
        <aggregate-name>aggr0*</aggregate-name>
      </aggr-attributes>
    </query>
    <tag></tag>
  </aggr-get-iter>
</netapp>
And less than 50 lines of Python code:
Python Code

import sys
sys.path.append("<path_to_nmsdk_root>/lib/python/NetApp")
from NaServer import *

s = NaServer("192.168.0.205", 1, 32)
s.set_server_type("FILER")
s.set_transport_type("HTTPS")
s.set_port(443)
s.set_style("LOGIN")
s.set_admin_user("admin", "<password>")

api = NaElement("aggr-get-iter")

xi = NaElement("desired-attributes")
api.child_add(xi)

xi1 = NaElement("aggr-attributes")
xi.child_add(xi1)
xi1.child_add_string("aggregate-name", "<aggregate-name>")

xi2 = NaElement("nodes")
xi1.child_add(xi2)
xi2.child_add_string("node-name", "<node-name>")

api.child_add_string("max-records", "2")

xi3 = NaElement("query")
api.child_add(xi3)

xi4 = NaElement("aggr-attributes")
xi3.child_add(xi4)
xi4.child_add_string("aggregate-name", "aggr0*")

api.child_add_string("tag", "<tag>")

xo = s.invoke_elem(api)
if xo.results_status() == "failed":
    print("Error:\n")
    print(xo.sprintf())
    sys.exit(1)

print("Received:\n")
print(xo.sprintf())
This is a massive reduction in code and complexity. It is a very welcome result too, considering that you'll likely be combining this code with other intelligence
(e.g. configuring MDS switches) and/or other API calls.

Further to this, even without any comments in the code, it is immediately clear what the latter
script is doing, which cannot be said for the former.
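It's also worth noting that the XML version isn't just documentation: as best I can tell, it is essentially what the SDK POSTs to the filer over HTTP(S). The sketch below builds such a request using nothing but the Python standard library. Be aware that the servlet path and the use of HTTP Basic authentication are my assumptions about the ZAPI transport, and the host and credentials are placeholders:

```python
import base64
import urllib.request

# The same ZAPI request shown above, wrapped in the <netapp> envelope.
ZAPI_BODY = """<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.32">
  <aggr-get-iter>
    <desired-attributes>
      <aggr-attributes>
        <aggregate-name></aggregate-name>
        <nodes>
          <node-name></node-name>
        </nodes>
      </aggr-attributes>
    </desired-attributes>
    <max-records>2</max-records>
    <query>
      <aggr-attributes>
        <aggregate-name>aggr0*</aggregate-name>
      </aggr-attributes>
    </query>
  </aggr-get-iter>
</netapp>
"""

def build_request(host, user, password):
    # Build (but don't send) an HTTPS POST carrying the ZAPI body.
    url = "https://%s/servlet/netapp.servlets.admin.XMLrequest_filer" % host
    req = urllib.request.Request(url, data=ZAPI_BODY.encode("utf-8"),
                                 method="POST")
    req.add_header("Content-Type", "text/xml")
    creds = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + creds)
    return req

req = build_request("192.168.0.205", "admin", "<password>")
print(req.full_url)
```

Calling `urllib.request.urlopen(req)` against a live filer should then return the same XML response that the SDK pretty-prints via `sprintf()`.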
Network Traffic
I'll now put my network engineer hat on for a moment to show what goes on
behind the scenes with our API calls.

To make my Wireshark captures easier to read, I've connected to the CDoT simulators over HTTP as opposed to HTTPS.

Please do not do this on production equipment though, as it is completely insecure: your username and password will be transmitted in clear text, as will all of the
data that is requested and received.
Looking at the capture of our full, unedited query, we can see seven full-sized
packets (1,514 bytes each) being sent from the cluster to my PC.

On the other hand, when we modify the request to include only the pieces of
information we want, we don't even see a single full-sized packet.

The difference between the two is massive.

You'd be forgiven for thinking that it isn't that much data in the grand scheme of
things, but when you consider that some of your remote sites might have
high-latency, low-bandwidth and/or lossy connections, you begin to see the benefits
of reducing the payload size.
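The responses aren't the only thing that shrinks, either; the requests do too. As a rough, self-contained illustration (standard library only, with the nested attribute tree flattened for brevity, and only a representative slice of the ~190 attributes the generated script asks for), here is the size difference between a request for "everything" and the trimmed two-attribute request:

```python
import xml.etree.ElementTree as ET

def zapi_request(attributes):
    # Serialise an aggr-get-iter call that asks only for the given attributes.
    # (The real attribute tree is nested; it is flattened here for brevity.)
    root = ET.Element("netapp", {"xmlns": "http://www.netapp.com/filer/admin",
                                 "version": "1.32"})
    call = ET.SubElement(root, "aggr-get-iter")
    desired = ET.SubElement(ET.SubElement(call, "desired-attributes"),
                            "aggr-attributes")
    for attr in attributes:
        ET.SubElement(desired, attr)
    ET.SubElement(call, "max-records").text = "2"
    query = ET.SubElement(ET.SubElement(call, "query"), "aggr-attributes")
    ET.SubElement(query, "aggregate-name").text = "aggr0*"
    return ET.tostring(root)

# A representative slice of the attributes the generated script requests.
everything = ["aggregate-name", "aggregate-uuid", "raid-size", "raid-status",
              "size-available", "size-total", "size-used", "files-total",
              "files-used", "flexvol-count", "state", "member-count",
              "hybrid-cache-size-total", "percent-used-capacity",
              "total-reserved-space", "is-snaplock", "snaplock-type",
              "scan-percentage", "striping-type", "node-name"]
trimmed = ["aggregate-name", "node-name"]

print(len(zapi_request(everything)), "bytes vs",
      len(zapi_request(trimmed)), "bytes")
```

The gap only widens on the wire, because every attribute requested also comes back populated in the response for every matching aggregate.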

Treat APIs like SNMP

My final point on this subject is that you should treat APIs like SNMP: you wouldn't
walk an entire SNMP tree just to get a device's hostname, and nor should you make
unnecessary API calls.

As always, if you have any questions or have a topic that you would like me to discuss, please feel free to post a comment at
the bottom of this blog entry, e-mail me at will@oznetnerd.com, or drop me a message on Twitter (@OzNetNerd).
Note: This website is my personal blog. The opinions expressed in this blog are my own and not those of my
employer.