Welcome To The Isilon Fundamentals Course.: Publish Date: February 2016

Welcome to the Isilon Fundamentals course.
Copyright ©2016 EMC Corporation. All Rights Reserved. Published in the USA. EMC believes the information in this publication is
accurate as of its publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR
WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. The
trademarks, logos, and service marks (collectively "Trademarks") appearing in this publication are the property of EMC
Corporation and other parties. Nothing contained in this publication should be construed as granting any license or right to use
any Trademark without the prior written permission of the party that owns the Trademark.
EMC, EMC², AccessAnywhere, Access Logix, AdvantEdge, AlphaStor, AppSync, ApplicationXtender, ArchiveXtender, Atmos,
Authentica, Authentic Problems, Automated Resource Manager, AutoStart, AutoSwap, AVALONidm, Avamar, Bus-Tech, Captiva,
Catalog Solution, C-Clip, Celerra, Celerra Replicator, Centera, CenterStage, CentraStar, EMC CertTracker, CIO Connect,
ClaimPack, ClaimsEditor, Claralert, CLARiiON, ClientPak, CloudArray, Codebook Correlation Technology, Common Information
Model, Compuset, Compute Anywhere, Configuration Intelligence, Configuresoft, Connectrix, Constellation Computing, EMC
ControlCenter, CopyCross, CopyPoint, CX, DataBridge, Data Protection Suite, Data Protection Advisor, DBClassify, DD Boost,
Dantz, DatabaseXtender, Data Domain, Direct Matrix Architecture, DiskXtender, DiskXtender 2000, DLS ECO, Document
Sciences, Documentum, DR Anywhere, ECS, eInput, E-Lab, Elastic Cloud Storage, EmailXaminer, EmailXtender, EMC Centera,
EMC ControlCenter, EMC LifeLine, EMCTV, Enginuity, EPFM, eRoom, Event Explorer, FAST, FarPoint, FirstPass, FLARE,
FormWare, Geosynchrony, Global File Virtualization, Graphic Visualization, Greenplum, HighRoad, HomeBase, Illuminator,
InfoArchive, InfoMover, Infoscape, Infra, InputAccel, InputAccel Express, Invista, Ionix, ISIS, Kazeon, EMC LifeLine, Mainframe
Appliance for Storage, Mainframe Data Library, Max Retriever, MCx, MediaStor, Metro, MetroPoint, MirrorView, Multi-Band
Deduplication, Navisphere, Netstorage, NetWorker, nLayers, EMC OnCourse, OnAlert, OpenScale, Petrocloud, PixTools,
Powerlink, PowerPath, PowerSnap, ProSphere, ProtectEverywhere, ProtectPoint, EMC Proven, EMC Proven Professional,
QuickScan, RAPIDPath, EMC RecoverPoint, Rainfinity, RepliCare, RepliStor, ResourcePak, Retrospect, RSA, the RSA logo,
SafeLine, SAN Advisor, SAN Copy, SAN Manager, ScaleIO, Smarts, EMC Snap, SnapImage, SnapSure, SnapView, SourceOne,
SRDF, EMC Storage Administrator, StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix, Symmetrix DMX,
Symmetrix VMAX, TimeFinder, TwinStrata, UltraFlex, UltraPoint, UltraScale, Unisphere, Universal Data Consistency, Vblock,
Velocity, Viewlets, ViPR, Virtual Matrix, Virtual Matrix Architecture, Virtual Provisioning, Virtualize Everything, Compromise
Nothing, Virtuent, VMAX, VMAXe, VNX, VNXe, Voyence, VPLEX, VSAM-Assist, VSAM I/O PLUS, VSET, VSPEX, Watch4net,
WebXtender, xPression, xPresso, Xtrem, XtremCache, XtremSF, XtremSW, XtremIO, YottaYotta, Zero-Friction Enterprise
Storage.
Publish Date: February 2016
Copyright 2016 EMC Corporation. All rights reserved. Isilon Fundamentals 1

This course introduces EMC’s Big Data storage product, Isilon. Isilon is a scale-out NAS
storage solution with an architecture that differs substantially from other EMC storage. This
e-Learning introduces the architecture, features, and capabilities of Isilon to audiences who
have not encountered it previously.
This course is for you if you will be promoting, selling, or using Isilon in any way. You’ll also
find value in the course if you are an EMC employee seeking familiarity with all of EMC’s
product portfolio.
When you have completed the course, you’ll know how Isilon differs from other storage
products; you’ll understand Isilon’s node-based architecture; you’ll know what Isilon can
do; and you’ll be well set up to understand deeper, more technical training about Isilon.

Because Isilon Fundamentals is a foundational course, there are no formal prerequisites for
taking this training. There are no other classes you must have passed before taking this
course.
However, note that Isilon Fundamentals is, itself, a prerequisite course you must pass
before you can proceed along most other learning paths related to Isilon.

While there are no formal course prerequisites for this training, this training was created for
professionals who have some experience with networking and data storage. You will have
difficulty following this course unless you already know the following:
• The meaning of basic computer terms; for example, processor, RAM, and hard drive
• Basic Ethernet and TCP/IP networking concepts, such as IP addressing, routing, DNS,
VLANs, and switch management
• The basics of administering a Windows-based local area network
• The basics of UNIX-style user and file management
• Concepts and technologies used in backup and recovery solutions
• Basic awareness of applications that EMC customers typically use on their business
networks, such as file servers, databases, and email systems.
If you are already familiar with other data storage systems besides Isilon, that could work
for or against you. You’ll be able to understand the terminology because you’ve already
been using it. However, Isilon’s scale-out architecture differs enough from some other
storage systems that you may also have to unlearn concepts you thought pertained to all
storage. Keep an open mind, and you’ll probably feel intrigued by the innovations Isilon
offers.
If you are not already familiar with all these concepts, you are not part of the intended
audience for this training. Consult with your manager about appropriate courses to take
before taking this one.

Here’s what to expect from this course.
Module 1 sets the context for today’s data storage offerings by refreshing your mind about
what came before. How did we go from slow computers the size of refrigerators to today’s
modern data centers? Why is some data just called “data” but other data is “Big Data”?
What are DAS, SAN, and NAS? How do scale-up NAS and scale-out NAS differ? Module 1
explores those topics.
Module 2 delves into what the Isilon product is, in terms of hardware and architecture. Are
there models of Isilon products? If so, what are they for? How would you know when to use
one model versus another?
Module 2 tells you what Isilon is, while Module 3 starts explaining what Isilon does. What
protocols does Isilon support? How do you authenticate to the cluster? What’s the
management interface like?
Module 4 concludes the course by explaining how you manage and protect data using
Isilon. To give these issues real-world context, Modules 3 and 4 are written from the
perspective of a fictitious storage administrator who is retiring. He briefs you, his successor,
on what you need to know in order to take over his Isilon clusters.
We hope you find this training enjoyable and enlightening. Let’s get started!

Isilon is an industry-leading, scale-out data storage system made up of multiple servers
called ‘nodes’. Software combines the nodes into a ‘cluster’, which
behaves as a single, central storage system for a company’s unstructured data.

The advantages of Isilon include a single file system, a single volume, fast and easy
scalability, simplified management, different levels of data protection and security, a rich
feature set of enterprise-ready components, and the ability to reduce silos of storage by
using Isilon as a data lake. In the next slides, we’ll show how the Isilon storage system
implements these features.

Isilon offers industry-leading enterprise features including security, compliance, replication,
snapshots, tiering, and CloudPools. Having all of your storage infrastructure in one data
lake eases management and gives you new insight into your data, which you can use to
expand your business or improve your processes.

This module focuses on setting the context in which Isilon operates. It briefly considers the
history of digital storage, trends in the way corporations use digital data today, and what
separates conventional data from Big Data. You’ll also learn about major categories of data.
The module concludes by explaining how the most popular storage architectures came to
be, and Isilon’s position within that context.

After completing this lesson, you will be able to explain what Isilon is, examine the history of
computer storage, illustrate changes in data storage needs, define Big Data, and identify
what makes a Data Lake.

The first general-purpose computer became operational in 1946. It was called the Electronic
Numerical Integrator And Computer (ENIAC) and used more than 17,000 vacuum tubes and
70,000 resistors to hold a ten-digit decimal number in memory. The data was output as IBM
punch cards, a format that continued well into the 1960s.
In the 1960s, magnetic tape eclipsed punch cards as the way to store corporate computer
data.
Later, mag tape gave way to the hard disk drive. The first IBM hard drive was the size of
two refrigerators and required 50 disks to store less than four megabytes of data.
In the 1980s, the personal computer revolution introduced miniaturization, bringing a wide
array of storage form factors. Less than 30 years after two refrigerator-sized units stored
less than 4 megabytes, the average consumer could store a good portion of that, about
one-third, on a three-and-a-half inch plastic disk.

Unstructured data continues to grow by greater amounts every year. An IDC study
published in 2008 showed that the amount of digital data created, captured, and replicated
worldwide grew tenfold in just five years. This finding was based on the proliferation of
then-new technologies such as Voice over IP, RFID, smartphones, and consumer use of
GPS; and the continuance of data generators such as digital cameras, HD TV broadcasts,
digital games, ATMs, email, videoconferencing, medical imaging, and so on.
A 2012 study from IDC found that the digital universe is still expanding at a breathtaking
pace. To understand the results, it helps to realize that the prefix “exa” means one billion
billion, or one quintillion. An exabyte (EB) is one quintillion bytes. Another way to say it is,
an exabyte is one billion gigabytes.
In 1986, the entire world had the technical capacity to store merely 2.6 exabytes. By 2020,
the world will need to store more than 40,000 exabytes. Much of this growth occurs
because a person formerly had to sit in an office to use a computer, but today, billions of
individuals generate data, all day, everywhere they go, from mobile devices.
Thus, studies document that the world’s data storage needs are not merely growing; they
are mushrooming.
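
The unit arithmetic above can be checked in a few lines of Python. This is only a sketch: the constant and function names are illustrative, decimal (SI) units are assumed, and the two capacity figures are the ones quoted in the text.

```python
# Decimal (SI) byte units; the names are illustrative, not from the course.
GIGABYTE = 10**9    # one billion bytes
EXABYTE = 10**18    # one quintillion bytes ("exa" = one billion billion)


def exabytes_to_gigabytes(exabytes):
    """Convert exabytes to gigabytes using decimal units."""
    return exabytes * EXABYTE / GIGABYTE


# Figures quoted in the text above.
capacity_1986_eb = 2.6
need_2020_eb = 40_000

# How many times larger the 2020 need is than the 1986 capacity.
growth_factor = need_2020_eb / capacity_1986_eb

print(exabytes_to_gigabytes(1))  # one exabyte expressed in gigabytes
print(round(growth_factor))      # rough growth multiple from 1986 to 2020
```

In other words, the projected 2020 requirement is roughly fifteen thousand times the world's 1986 capacity.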

When talking about data storage, there are two main types of data: structured data and
unstructured data.
Structured data often resides in a fixed field inside a larger record or file. A large file usually
requires a data model that defines the type of data (i.e., is the data numeric or
alphanumeric?), how the data will be accessed, and how it will be processed. Today,
structured data is most often expressed as a relational database. The rigid table structure
makes structured data easy to query. Spreadsheets, library catalogs, inventory sheets,
phone directories, and customer contact information are all examples of structured data
that would fit neatly into the rows and columns of a database.
Unstructured data does not fit into neat rows and columns because it has little or no
classification data. Image files, photographs, graphics files, video and audio files are all
examples of unstructured data. Imagine that you have a spreadsheet with information
about your pet dog. The spreadsheet might have the dog’s name, birthdate, breed, color,
weight, parents’ names and information, breeder information, location, etc., and this data
would be very easy to plug into the predefined fields of a database, as the information deals
with classifying individual traits. Now imagine what would happen if you tried to fit a
photograph of your dog into those same fields: it wouldn’t fit. There is no way to classify an
image in the same way that we list out names, birthdates, and weights.
According to industry analysts, the creation rate of unstructured data outpaces structured
data, with unstructured data comprising 80 to 90% of ALL digital data.
Isilon specializes in storing unstructured data.
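
To make the dog example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample row are invented for illustration. It shows that structured traits can be queried field by field, while a photograph is just a run of bytes with no fields to query.

```python
import sqlite3

# Structured data: each trait of the dog has a predefined, typed column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dogs "
    "(name TEXT, birthdate TEXT, breed TEXT, color TEXT, weight_kg REAL)"
)
conn.execute(
    "INSERT INTO dogs VALUES (?, ?, ?, ?, ?)",
    ("Rex", "2012-05-01", "Beagle", "tricolor", 11.5),  # made-up sample row
)

# Because the table structure is rigid, a query can target a single field.
beagles = conn.execute(
    "SELECT name, weight_kg FROM dogs WHERE breed = ?", ("Beagle",)
).fetchall()

# Unstructured data: a photograph is an opaque sequence of bytes.
# These four placeholder bytes stand in for the contents of an image file.
photo_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0])

print(beagles)
print(len(photo_bytes), "bytes with no columns to filter on")
```

The photo could be stored in a database as a single binary value, but nothing inside it maps onto name, breed, or weight columns.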

Another way of categorizing data storage systems is to describe them as block-based or
file-based. Block data is usually found in SAN (storage area network) technology, for
example, the VNX, whereas file data is usually associated with NAS (network attached
storage) technology, such as Celerra and Isilon.
A block of data is a sequence of bits or bytes of a fixed length; the length is determined by
the file system. To save a single file, the operating system, or OS, breaks
the file into blocks and writes each block to a particular sector (area) of the drive. A
single file may be made up of many, many blocks. Block data is especially
useful when working with small bits of information that need to be accessed or written
frequently; for example, a large database full of postal codes. Someone querying the
database probably wants only one or a few of the postal codes, but rarely wants all of them.
Block data makes it easy to gather information in partial sets and is particularly adept at
handling high volumes of small transactions, such as stock trading data, which could
generate one billion 18 KB files in only a few hours. Block format is the go-to for flexibility and
for when you need intensive speed of input and output operations.
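
The idea of breaking a file into fixed-length blocks can be sketched as follows. The 4096-byte block size and both function names are assumptions made for illustration. Real file systems choose their own block size and track blocks with far more bookkeeping.

```python
BLOCK_SIZE = 4096  # bytes; an assumed value, real file systems choose their own


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Break a byte string into fixed-length blocks, zero-padding the last one."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append(chunk.ljust(block_size, b"\x00"))
    return blocks


def read_block(blocks, index):
    """Fetch a single block without touching the others."""
    return blocks[index]


file_contents = b"x" * 10_000  # a 10,000-byte stand-in for a file
blocks = split_into_blocks(file_contents)

print(len(blocks))                 # number of blocks the file occupies
print(len(read_block(blocks, 1)))  # every block has the same fixed length
```

Because each block can be fetched by its index, a reader who needs only a small part of the data never has to load the rest.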
File data is created depending upon the application and protocol being used. Some
applications store data as a whole file, which is broken up and sent across the network as
packets. All of the data packets are required to reassemble the file. Unlike block storage, where you
can retrieve a single postal code on its own, file storage requires the whole file content
in order for it to be useful. For example, a PDF file is generally not readable unless you have
all of it downloaded; having only part of the file will generate an error and not allow the file
to be opened. File-based data is organized in chunks too large to work well in a database or
in an application that deals with intense amounts of transactions. In IT applications, block
data usually relates to structured data while file data usually relates to unstructured data.
Isilon specializes in handling file-based data. Can Isilon do block-based storage?
Technically, yes, but if you are looking for a block-based solution there are other EMC
products that specialize in block and would best handle that type of workflow.

The term Big Data is being used across the technology industry, but what exactly is Big
Data? Big Data is defined as any collection of data sets so large, diverse, and fast changing
that it is difficult for traditional technology to process and manage efficiently. What exactly
makes computer data “Big Data”?
The storage industry says that Big Data is digital data having too much volume, velocity, or
variety to be stored traditionally. To make sure the three V’s of Big Data are perfectly clear,
let’s consider some examples.

What do we mean by volume? Consider any global website that works at scale. YouTube’s
press page says YouTube ingests 100 hours of video every minute. That is one example of
Big Data volume.
What’s an example of velocity? Machine-generated workflows produce massive volumes of
data. For example, the longest stage of designing a computer chip is physical verification,
where the chip design is tested in every way to see not only if it works, but also if it works
fast enough. Each time researchers fire up a test on a graphics chip prototype, sensors
generate many terabytes of data per second. Storing terabytes of data in seconds is an
example of Big Data velocity.
Perhaps the best example of variety is the world’s migration to social media. On a platform
such as Facebook, people post all kinds of file formats: text, photos, video, polls, and more.
According to a CNET article from June 2012, Facebook was taking in more than 500
terabytes of data per day, including 2.7 billion Likes and 300 million photos. Every day.
That many kinds of data at that scale represents Big Data variety.
The “Three Vs” – volume, velocity, and variety – often arrive together. When they combine,
administrators truly feel the need for high performance, higher capacity storage. The three
Vs generate the challenges of managing Big Data.
Growing data has also forced an evolution in storage architecture over the years, due to the amount
of data that needs to be maintained, sometimes for years on end. Isilon is a Big Data
solution because it can handle the volume, velocity, and variety that define the
fundamentals of Big Data. These topics will be addressed as the course continues.
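
To put the quoted figures on a common footing, the short calculation below converts them into daily and per-second rates. It is a back-of-the-envelope sketch: the input numbers are the ones cited above, decimal units are assumed, and the variable names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60
TERABYTE = 10**12  # decimal units assumed
GIGABYTE = 10**9

# Volume example: 100 hours of video ingested every minute.
video_hours_per_minute = 100
video_hours_per_day = video_hours_per_minute * MINUTES_PER_DAY

# Variety example: 500 terabytes taken in per day, as an average rate.
terabytes_per_day = 500
gigabytes_per_second = terabytes_per_day * TERABYTE / SECONDS_PER_DAY / GIGABYTE

print(video_hours_per_day)             # hours of video arriving each day
print(round(gigabytes_per_second, 2))  # average sustained ingest in GB/s
```

Seen this way, the volume and variety examples are also velocity problems: the storage has to absorb several gigabytes every second, around the clock.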

Let’s start with the first of the three Vs, managing Big Data volume.
Challenge: Complex data architecture. SAN and scale-up NAS data storage
architectures encounter a logical limit at 16 terabytes, meaning that no matter what volume of
data arrives, a storage administrator has to subdivide it into partitions smaller than 16
terabytes. This is part of why customers wind up with silos of data. To simplify this
challenge, scale-out NAS such as an Isilon cluster holds everything in one single volume
with one LUN. Isilon is like one gigantic bucket for your data, and it can scale
seamlessly without architectural hard stops forcing subdivisions on the data.
Challenge: Low utilization of raw capacity. SAN and scale-up NAS architectures must
reserve much of the raw capacity of the system for management and administrative
overhead, such as RAID parity disks, metadata for all those LUNs and mega-LUNs, duplicate
copies of the file system, and so on. As a result, conventional SAN and NAS architectures
often use only half of the raw capacity available, because you have to leave headroom on
each separate stack of storage. Suppose you have seven different silos of data. As soon as
you put them in one big volume, you immediately get back the headroom from six of the
seven stacks. In that way, Isilon offers high utilization. Isilon customers routinely use 80%
or more of raw disk capacity.
Challenge: Non-flexible data protection. When you have Big Data volumes of
information to store, it had better be there, dependably. If an organization relies on RAID to
protect against data loss or corruption, the failure of a single disk drive causes
disproportionate inconvenience. The most popular RAID implementation scheme allows the
failure of only two drives before data loss. (A sizable Big Data installation will easily have
more than 100 individual hard drives, so odds are at least one drive is down at any given
time.) The simpler answer is to protect data using a different scheme. Shortly you’ll learn
about Isilon’s clustered architecture based on nodes that do not use RAID. Nodes full of
hard drives rely less on any single drive and can recover a failed drive as a non-emergency.
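
Two of the volume challenges above come down to simple arithmetic, sketched here. The 16-terabyte limit and the seven-silo scenario come from the text. The function names, the 100-terabyte data set, and the 10 terabytes of headroom per silo are hypothetical values chosen for illustration.

```python
import math

PARTITION_LIMIT_TB = 16  # logical limit cited in the text for SAN and scale-up NAS


def partitions_needed(data_tb, limit_tb=PARTITION_LIMIT_TB):
    """Minimum number of partitions when none may exceed the limit."""
    return math.ceil(data_tb / limit_tb)


def reclaimed_headroom_tb(silo_count, headroom_per_silo_tb):
    """Headroom recovered when separate silos merge into one volume.

    One volume still keeps a single headroom reserve, so the savings
    come from the other (silo_count - 1) reserves.
    """
    return (silo_count - 1) * headroom_per_silo_tb


# Hypothetical inputs: a 100 TB data set, and seven silos that each
# hold back 10 TB of headroom.
print(partitions_needed(100))        # partitions a 16 TB limit forces
print(reclaimed_headroom_tb(7, 10))  # TB returned by consolidating
```

Under these assumed numbers, a single volume avoids seven partitions and returns 60 TB of capacity that was previously held in reserve.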

What advantages does scale-out NAS offer for administrators coping with high velocity, the second V
of Big Data? Here are some examples.
Challenge: Difficult to scale performance. Some data storage architectures use two controllers,
sometimes referred to as servers or filers, to run a stack of many hard drives. You can scale capacity
by adding more hard drives, but it’s difficult to scale performance. In a given storage stack, the hard
drives offer nothing but capacity -- all the intelligence of the system, including computer processing
and RAM, must come from the two filers. If the horsepower of the two filers becomes insufficient, the
architecture does not allow you to pile on more filers. You have to start over with another stack and
two more filers. In contrast, every node in an Isilon cluster contains capacity PLUS compute power
PLUS memory. The nodes can work in parallel, so the cluster scales out linearly with each node you add. In other
words, every resource grows together, including capacity and performance.
Challenge: Silos of data. Due to the architectural restrictions we just discussed, SAN and scale-up
NAS end up with several isolated stacks of storage. Many customer sites have a different storage
stack for each application or department. If the R&D stack performs product testing that generates
results at Big Data velocity, the company may establish an HPC stack, which could reach capacity
rapidly. Other departments or workflows may have independent storage stacks that have lots of
capacity left, but there’s no automated way for R&D to offload their HPC overflow to, for example, a
backup storage stack. Instead, an administrator has to manually arrange a data migration. In
contrast, an Isilon cluster distributes data across all its nodes to keep them all at equal capacity. You
don’t have one node taking a pounding while other nodes sit idle. There are no hot spots, and thus, no
manual data migrations. Automated balancing makes much more sense if the goal is to keep pace
with Big Data velocity.
Challenge: Concurrency. In conventional storage, a file is typically confined to a RAID stripe. That
means that the maximum throughput of reading that file is limited to how fast those drives can deliver
the file. But in modern workflows, you may have a hundred engineers or a thousand digital artists all
needing access to a file, and those RAID drives can’t keep up. Perhaps the two filers on that stack
can’t process that many requests efficiently. Isilon’s answer is that every node has at least a dozen
drives, plus more RAM and more computer processing, for more caching and better concurrent
access. When there is heavy demand for a file, several nodes can deliver it.
Challenge: Many manual processes. Besides manual data migrations, conventional storage has
many more manual processes. An administrator of a SAN or a scale-up NAS product spends a
significant amount of time creating and managing LUNs, partitioning storage, establishing mounts,
launching jobs, and so on. In contrast, Isilon is policy-driven. Once you define your
policies, the cluster does the rest automatically.

A scale-out data lake is a large storage system where enterprises can consolidate vast
amounts of their data from other solutions or locations, into a single store called a data
lake. This helps address the variety issue with Big Data. The data can be secured, analysis
can be performed, and actions can be taken based on the insights that surface. Enterprises
can then eliminate the cost of having silos or “islands” of information spread across their
enterprises. The scale-out data lake further enhances this paradigm by providing scaling
capabilities in terms of capacity, performance, security, and protection.

This lesson briefly reviewed how business entities have stored digital data since it first came
into the mainstream in the 1950s. We examined several trends in data storage, including
the fact that the ever-growing amount of stored data forces changes in
storage formats. We learned that Big Data is defined by its volume, velocity, and variety, all
of which occur at too great a scale for traditional means to process, store, and manage. Finally,
we took a quick look at what makes a data lake.

In this lesson, we introduce what makes Isilon's OneFS operating system unique.
We then delve into the challenges of using Big Data and the uses of multi-protocol access. We'll
also explore the Edge-to-Core-to-Cloud solution and where Isilon fits into this overall
storage landscape. Let’s get started!

In the early days of computer data, corporations stored data on hard drives in the server. The
company’s intellectual property depended entirely upon that hard drive continuing to work. Thus, to
minimize risk, corporations mirrored the data on a Redundant Array of Independent Disks, or RAID,
for short. RAID disks were directly attached to a server so that the server thought the hard drives
were part of it. This technique is called Direct Attached Storage, or DAS.
As applications proliferated, soon there were many servers, each with its own DAS. This worked fine,
with some drawbacks. If one server’s DAS was full while another server’s DAS was half empty, the
empty DAS couldn’t share its space with the full DAS. People thought, “What if we took all these
individual storage stacks and put them in one big stack, then used the network to let all the servers
access that one big pool of storage? Then our servers could share capacity!”
Accomplishing that approach required a traffic cop to keep track of what data went with what
application. Thus, the Volume Manager was invented. Adding a volume manager to the storage
system created the Storage Area Network, or SAN.
SAN was optimized for block data. It worked fine until employers began giving their employees
computers. Employees then needed to get to the stored data, but they couldn’t: SAN was set up for
servers, not personal computers. PCs worked differently from the storage file server, and network
communication passes only from one file system to another. The answer arrived
when corporations put employee computers on the network, and added to the storage a file system to
communicate with users. Thus, Network Attached Storage, or NAS, was born.
NAS works pretty well. But it could be improved. For example, now the server is spending as much
time servicing employee requests as it is doing the application work it was meant for. The file system
doesn’t know where data is supposed to go, because that’s the volume manager’s job. The volume
manager doesn’t know how the data is protected; that’s RAID’s job. If high-value data needs more
protection than other data, you need to migrate the data to a different volume that has the protection
level that data needs. So there is opportunity to improve NAS.
To alleviate these issues, Isilon combined the file system, the volume manager, and the data
protection into one seamless, self-aware OS: OneFS.
Some advantages of this approach include the simplicity of having all data in a single file system and
a single volume. When you have storage capacity without hard architectural limitations, your system
is easier to manage and grow.
Isilon was designed to work in a mixed environment. Even if the clients attached to the server are a
mix of Windows, UNIX, and Mac OS X operating systems, Isilon offers a single unified
platform for all.

Though we’ve defined DAS, SAN, and NAS, let’s draw attention to the distinction between
two kinds of NAS architectures: scale-up and scale-out.
Scale-up NAS came first, represented here with a green line. In this architecture, a pair of
controllers or filers manages a stack of disk trays. You can readily add capacity – if you
need more storage, you simply add more drives. But the architecture doesn’t let you pile on
more filers. As disk space grows, computing resources do not.
In contrast, scale-out NAS, represented here with a blue line, uses nodes. Each node
contains drives, but also more processing and more memory. By adding nodes,
performance and capacity scale out in proportion. The green line shows that over time, the
filers must work harder and harder to manage the growing capacity. Result: performance
slows.
The blue line shows that as you add nodes, performance improves, because every node can
exploit all the resources of every other node.
DAS, SAN, and scale-up NAS have their places, but were invented before the Big Data era.
Scale-out NAS systems were built for Big Data. Thus, in many regards, scale-out NAS
architecture makes managing Big Data less challenging. The next three slides give
examples of how.
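
The difference between the two lines on the slide can be modeled in a few lines of Python. This is a simplified sketch, not a description of any product's sizing. The class, the function names, and every figure are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    capacity_tb: float
    cpu_cores: int
    ram_gb: int


def scale_up(controllers: Resources, trays: int, tray_capacity_tb: float) -> Resources:
    """Scale-up: disk trays add capacity only; compute stays with the filer pair."""
    return Resources(
        capacity_tb=controllers.capacity_tb + trays * tray_capacity_tb,
        cpu_cores=controllers.cpu_cores,
        ram_gb=controllers.ram_gb,
    )


def scale_out(node: Resources, node_count: int) -> Resources:
    """Scale-out: every node contributes capacity, processing, and memory."""
    return Resources(
        capacity_tb=node.capacity_tb * node_count,
        cpu_cores=node.cpu_cores * node_count,
        ram_gb=node.ram_gb * node_count,
    )


# All figures below are invented for illustration.
filer_pair = Resources(capacity_tb=0, cpu_cores=16, ram_gb=64)
node = Resources(capacity_tb=36, cpu_cores=8, ram_gb=32)

grown_stack = scale_up(filer_pair, trays=10, tray_capacity_tb=36)
grown_cluster = scale_out(node, node_count=10)

print(grown_stack)    # capacity grew, cores and RAM did not
print(grown_cluster)  # capacity, cores, and RAM all grew tenfold
```

Both systems end up with the same capacity in this example, but only the scale-out cluster has proportionally more processing and memory to serve it.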

Isilon scale-out NAS architecture simplifies managing Big Data. Isilon’s innovative operating
system, OneFS, gets its name from the fact that it can scale to more than 50
petabytes of storage in one volume, one namespace, and one file system.
Isilon was purpose-built to ease the challenges of processing, storing, managing, and
delivering data at scale. Isilon’s positioning is to provide simple yet powerful answers for
Big Data storage administrators.

Isilon's centralized file storage system scales easily for each customer’s needs and provides
access to data by using standardized protocols. The single central repository of data can be
accessed through all of these protocols simultaneously. Even the Hadoop Distributed File System,
or HDFS, is offered as a protocol, eliminating the need for separate Name Nodes and
Storage Nodes for Hadoop functionality.

EMC IsilonSD Edge is a new software-defined storage product that expands the data lake by
bringing in data from the edge (remote and branch offices), enabling you to consolidate
and protect unstructured data and to simplify its management. This new product will enable you to
consolidate data from edge locations to your core data center and then leverage the multi-
protocol capabilities to support a wide range of 2nd and 3rd platform applications, including
big data analytics, to enable you to gain value and insight from the enterprise edge data.
Management at the edge is simplified by using the familiar software tools and the
automated features found in OneFS. By leveraging off-the-shelf hardware and virtual server
environments located at your remote locations, you can deploy an economical software-
defined storage solution with the power of Isilon OneFS. You can also increase efficiency
and storage utilization at the edge to over 80% by aggregating unused storage capacity.
Finally, by using IsilonSD Edge, you can increase data protection by automatically
replicating data to your core data center, eliminating the need for manual data backup
processes and protecting data consistently at all of your remote locations.

The new Isilon CloudPools software—which enables your customers to select from a number
of public cloud services or use a private cloud based on EMC Elastic Cloud Storage (ECS)—
provides the policy-based, automated tiering that enables your customers to seamlessly
integrate with the cloud as an additional storage tier from the Isilon cluster at their data
center.
CloudPools lets your customers address rapid data growth and optimize data center storage
resources by using the cloud as a highly economical storage tier with massive storage
capacity for cold or frozen data that is rarely used or accessed. This enables more valuable
on-premises storage resources to be used for more active data and applications. To secure
data that is archived in the cloud, CloudPools encrypts data that is transmitted from the
Isilon cluster at the core data center to the cloud storage service. This data remains
encrypted in the cloud until it is retrieved and returned to the Isilon cluster at the data
center.
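
The general idea of policy-based tiering can be sketched as follows. This is not the CloudPools interface or its policy syntax. The class, the function, the file paths, and the 180-day threshold are all hypothetical, chosen only to show how a rule based on last access time might separate cold data from active data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    last_accessed: datetime


def choose_tier(file, now, cold_after_days=180):
    """Return "cloud" for files untouched longer than the threshold, else "cluster"."""
    if now - file.last_accessed > timedelta(days=cold_after_days):
        return "cloud"
    return "cluster"


# Hypothetical files and dates, for illustration only.
now = datetime(2016, 2, 1)
files = [
    FileInfo("/data/projects/render_final.mov", datetime(2014, 6, 1)),
    FileInfo("/data/projects/storyboard.pdf", datetime(2016, 1, 20)),
]

# Apply the policy to every file; no per-file manual decision is needed.
placement = {f.path: choose_tier(f, now) for f in files}
print(placement)
```

Once a rule like this is defined, the system applies it to every file automatically, which is the point of a policy-driven approach.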

You can use EMC Isilon to consolidate file-based, unstructured data into a data lake that
can eliminate costly storage silos, simplify management, increase data protection, and
acquire more value from your data assets.
With built-in multi-protocol capabilities, Isilon can support a wide range of traditional and
next-generation applications on a single platform—including powerful Big Data analytics
that provide you with better insight and use of your stored information.
Data at edge locations (e.g., remote or branch offices) is growing. These edge locations are
often inefficient islands of storage, running with limited IT resources and inconsistent data
protection practices. Data at the edge generally lives outside of the business data lake,
making it difficult to incorporate into data analytics projects. The new edge-to-core-to-cloud
approach can extend your Isilon data lake to edge locations and out into the cloud, thus
enabling consolidation, protection, management, and backups of remote edge location data.

Isilon provides the industry-leading scale-out clustered storage solution. It provides a single
volume of data storage at a massive scale that is easy to use and manage, offering linear
scalability and readiness for a customer’s performance applications, Hadoop analytics, and
other workflows. A data lake is a single central data repository that can store data from a
variety of sources, such as file shares, web apps, and the cloud. It enables businesses to
access the same data for a variety of uses and enables the data to be manipulated using a
variety of clients, analyzers, and applications. The data is real-time production data and
does not need to be copied or moved from an external source, like another Hadoop cluster,
into the data lake. The data is stored on the central storage solution and provides secure
access to the data. The data lake provides tiers based on data usage, and the ability to
instantly increase the storage capacity when needed. The above slide identifies the key
characteristics of a scale-out data lake.

This lesson covered how Isilon OneFS came to be. We identified Isilon's Big Data position,
examined IsilonSD Edge and the Isilon CloudPools feature, and discussed how to use these
features to provide data management from Edge-to-Core-to-Cloud.

This module focused on setting the context in which Isilon operates. We briefly considered
the history of digital storage, trends in the way corporations use digital data today, and
what separates conventional data from Big Data. You also learned about major categories
of data. The module concluded by explaining how the most popular storage architectures
came to be, and Isilon’s position within that context.

In this module, we turn our focus from abstract ideas about storage, to the tangible parts of
the Isilon product and how they’re arranged. After this module, you should understand how
the parts fit together. You’ll also become familiar with our principal node models, and
understand at a high level, which node typically goes with what uses.

This lesson explains what pieces make up an Isilon cluster, and how the pieces fit together
to communicate with one another. Let’s start right in.

The basic building block of an Isilon NAS cluster is a node. Our smallest cluster begins with
three nodes; combine them and you have a cluster. Clusters can scale from 3 to 144 nodes
and can be composed of different node types.
Architecturally, every Isilon node is equal to every other Isilon node in a cluster. There is
NO node that is THE controller or THE filer. Instead, OneFS unites the entire cluster in a
globally coherent pool of memory, CPU, and capacity. OneFS writes files in stripes across
the nodes for built-in high availability. For reads, whichever node receives the request for a
file can manage the other nodes in re-assembling the file, then delivering it.

This slide provides an overview of the Isilon scale-out NAS architecture.
• Starting at the Client/Application layer, the Isilon NAS architecture supports mixed
modes. Windows, UNIX, and OS X operating systems can all connect to an Isilon cluster
and access the same files.
• At the networking level, the Isilon OneFS operating system supports key industry-
standard protocols over Ethernet, including Network File System (NFS), Server Message Block
(SMB), HTTP, FTP, Hadoop Distributed File System (HDFS) for data analytics, SWIFT, and
REST for object and cloud computing requirements. As a file-based storage system,
Isilon does not support protocols associated with block data.
• Nodes are combined into one volume by the OneFS operating system. All information is
shared among nodes, thus allowing a client to connect to any node in the cluster and
access any directory in the file system.
• On the back end, all the nodes are connected with an InfiniBand fabric switch for low-
latency internal communication with one another.
That’s the overview of the system. Next, let’s look at each portion in further depth.

The external networking components of an EMC Isilon cluster provide client access over a
variety of protocols. Each storage node connects to one or more external Ethernet networks
using 1 Gigabit Ethernet (1 GbE) or 10 GbE connections.
The 1 GbE and 10 GbE interfaces support link aggregation. Link aggregation creates a
logical interface that clients connect to. In the event of a NIC or connection failure, clients
do not lose their connection to the cluster. For stateful protocols, such as SMB and NFSv4,
this prevents client-side timeouts and unintended reconnection to another cluster. Instead,
clients maintain their connection to the logical interface and continue operating normally.
Support for Continuous Availability, or CA, for stateful protocols like SMB and NFSv4 is
available with OneFS 8.0.

The nodes in the cluster communicate internally using InfiniBand, which was designed as a
high-speed interconnect for high performance computing. The reliability and performance of
the interconnect is very important in creating a true scale-out storage system. The
interconnect needs to provide both high throughput and very low latency. InfiniBand meets
this need, acting as the backplane of the cluster, enabling each node to contribute to the
whole.
A single front-end operation can generate multiple messages on the back end, since the
nodes coordinate work among themselves when they write or read data. Thus, the dual
backend InfiniBand switches handle all intra-cluster communication and provide redundancy
in the event that one switch fails.

Isilon does not use RAID. This section details what Isilon does instead.
In a single cluster, how many hardware failures can the system withstand while offering the
customer 100% data availability? That depends. The answer requires understanding N + M.
N + M comes from the Reed-Solomon algorithm, an industry standard developed to
enhance data integrity when it’s undesirable to have data retransmitted from another
source. Most DVDs and television broadcasts use Reed-Solomon codes so that you can view
video data without interruption.
In the N+M data model, N represents the number of nodes, and M represents the number
of simultaneous hardware failures that the cluster can withstand without incurring data loss.
“Hardware failures” refers to drives, nodes, or a combination of drives AND nodes.
As the system writes the data, it also protects the data with parity bits. The OneFS
operating system spreads the data across numerous drives in multiple nodes so that if part
of the data goes missing, the missing data can be recalculated and restored. This involves
complex mathematics, but to illustrate the concept, we’ll use a basic example. In this
calculation, 5 plus 3 plus 1 represents data stored on three different drives. What is the
sum of 5 plus 3 plus 1? Obviously, 9. Nine represents a parity bit; a value that OneFS sets
to show what total should result when the binary data is added together.
Suppose the drive holding the “3” stops working. Knowing that five plus something plus one
must equal nine, OneFS can easily rebuild the missing data. With the aid of the parity bit,
any one value could vanish, and OneFS could readily recalculate and restore it.
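The arithmetic in this example can be sketched in a few lines of Python. This is an illustration only: it uses plain addition, exactly as in the example above, not the Reed-Solomon mathematics that OneFS relies on, and the function names are invented for this sketch.

```python
def make_parity(values):
    """Return the simple additive 'parity' from the example: the sum of all values."""
    return sum(values)


def rebuild_missing(surviving_values, parity):
    """Recover the one missing value from the survivors and the stored parity."""
    return parity - sum(surviving_values)


data = [5, 3, 1]              # values stored on three different drives
parity = make_parity(data)    # 9, stored alongside the data

# The drive holding the 3 stops working; only 5 and 1 survive.
recovered = rebuild_missing([5, 1], parity)
print(recovered)
```

The same call recovers any one of the three values, which is the point of the example: a single parity value protects against a single loss.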

This Reed-Solomon approach of using parity bits to reconstruct data is called Forward Error
Correction, or FEC. FEC allows the customer to choose how many bits of parity to
implement. One bit of parity for many disks is known as N + 1; two bits of parity for many
disks is known as N + 2, and so on.
With N + 1 protection, data is 100% available even if a single drive or node fails. With N +
2 protection, two components can fail, but the data will still be 100% available. OneFS
supports up to N+4 – users can organize their cluster so that as many as four drives, or
four entire nodes, can fail without loss of data or of access to the data.
RAID is disk-based, so when you choose a level of protection – that is, how many parity bits
– you’ve chosen for the entire RAID volume. With Isilon’s FEC approach, you can set
different levels of protection for different nodes, directories, or even different files. You
can also change protection levels on the fly, non-disruptively. With RAID, by contrast, the
same protection level applies across all the disks and cannot be changed without
reformatting them.
When a client connects through a single node and saves data, the write operation occurs
across multiple nodes in the cluster. This is also true for read operations. When a client
connects to a node and requests a file from the cluster, the node that the client has
connected to uses the backend InfiniBand network to coordinate with other nodes to retrieve,
rebuild, and deliver the file back to the client.

During a write operation, when OneFS stripes data across nodes, the system breaks the file
data into smaller logical sections called stripe units. The smallest element in a stripe unit is
8 kilobytes, and each stripe unit is 128 kilobytes, or sixteen 8-kilobyte blocks. If the file is
larger than 128 kilobytes, the next part of the file is written to a second node. If the file is larger than 256
kilobytes, the third part is written to a third node, and so on. OneFS stripes these 128
kilobyte units across the cluster, using advanced algorithms to determine data layout for
maximum efficiency and performance.
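To make the sizes concrete, here is a minimal Python sketch that cuts a file into 128-kilobyte stripe units and hands them to nodes in turn. The simple rotation across nodes is an assumption made for illustration; it ignores parity and the layout algorithms that OneFS actually uses, and the function names are invented for this sketch.

```python
STRIPE_UNIT_SIZE = 128 * 1024   # 128 KB per stripe unit, as described above
BLOCK_SIZE = 8 * 1024           # 8 KB blocks; sixteen of them fill one stripe unit


def split_into_stripe_units(data: bytes):
    """Cut file data into consecutive chunks of at most 128 KB."""
    return [data[i:i + STRIPE_UNIT_SIZE] for i in range(0, len(data), STRIPE_UNIT_SIZE)]


def assign_to_nodes(stripe_units, node_count):
    """Place unit 0 on node 0, unit 1 on node 1, and so on, wrapping around.

    Plain rotation is an assumption of this sketch, not the OneFS layout logic.
    """
    return [(index % node_count, unit) for index, unit in enumerate(stripe_units)]


file_data = bytes(300 * 1024)   # a 300 KB file: larger than 256 KB, so three units
units = split_into_stripe_units(file_data)
placement = assign_to_nodes(units, node_count=3)
for node, unit in placement:
    print(f"node {node}: {len(unit) // 1024} KB")
```

A 300-kilobyte file produces two full stripe units and one partial unit, each landing on a different node of a three-node cluster.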
The process of striping spreads all write operations from a client across the nodes of a
cluster. The example in this animation demonstrates how a file is broken down into chunks,
after which it is striped across disks in the cluster along with parity, also known as forward
error correction (FEC).
Though a client connects to only one node, when that client saves data to the cluster, the
write operation occurs in multiple nodes in the cluster. Each node contains between 12 and
60 hard disk drives, or a combination of SSDs and disk drives. As the system lays data
across the cluster, it sequences stripe units across nodes, and each node in turn may utilize
numerous drives, if the file is large enough. This method minimizes the role of any specific
drive or node. If that piece of hardware stops working, the data it contains can be
reconstructed.

A client connects to a single node at a time. When that client requests a file from the
cluster, the node that the client has connected to will not have the entire file on its local
drives.
The node with the requesting client then uses the backend InfiniBand network to coordinate
with other nodes to retrieve, rebuild, and deliver the file.

This lesson explained what pieces make up an Isilon cluster and how the pieces fit together
to communicate with one another.

In this lesson, we dive deeper into various models of Isilon nodes. You’ll get an overview of
the intended uses for each model, then some of the technical detail supporting the
overview. You’ll also see how users can add nodes to scale their Isilon cluster. The lesson
concludes with an explanation of optional enhancements to nodes, such as encryption and
accelerators.

Nodes comprise the storage capacity and processing power of the Isilon scale-out NAS
platform. Each node provides network connectivity, storage, memory, non-volatile RAM (or
NVRAM) and processing power (or CPUs). There are also different types of nodes that can
be mixed and matched to meet specific requirements.

The Isilon product family consists of four storage node series: S-Series, X-Series, NL-
Series, and the new HD-Series.
The S-Series is for ultra-performance primary storage and is designed for high-transactional
and IO-intensive tier 1 workflows.
The X-Series strikes a balance between large capacity and high-performance storage.
X-Series nodes are best for high-throughput and high-concurrency tier 2 workflows and also
for larger files with fewer users.
The NL-Series is designed to provide a cost-effective solution for tier 3 workflows, such as
nearline storage and data archiving. It is ideal for nearline archiving and for disk-based
backups.
The HD-Series is the new high-density, deep archival platform. This platform is used for
archival level data that must be retained for long, if not indefinite, periods of time.

A primary purpose of scale-out NAS is the “scale-out” part. An administrator should be able
to expand the storage at will by adding a new node. In Isilon’s case, once the node is
racked and cabled, adding it to the cluster takes about one minute. That’s because
automated policies discover the node, set up addresses for the node,
incorporate the node into the cluster, and begin rebalancing capacity on all
nodes to take advantage of the new space. In that brief time, the node fully configures
itself, is ready for new data writes, and begins taking on data from the other nodes to
AutoBalance the entire cluster.
The video on this slide was shot by an enthused Isilon customer who posted it on YouTube
with the comment that adding a node was “insanely fast.” It’s not especially exciting to
watch, until you realize that accomplishing the same tasks with another NAS solution takes
26 steps and multiple hours.
If you look at the free space available in the pie chart, this Isilon customer took his system
from 280 terabytes to 403 terabytes, adding roughly 120 terabytes of storage in a minute.

In addition to the base node models, Isilon offers some optional add-ons.
DARE stands for “Data At Rest Encryption.” Data at rest is any data sitting on your drives.
DARE is used for confidential or sensitive information. If somehow a hostile party infiltrated
your network, and IF they were able to access your Isilon cluster, and even IF they
somehow acquired the various levels of permissions and access to see the data striped on
the cluster, with DARE they still could not read the data, because it’s encrypted.
In fact, many vertical markets require DARE. For instance, federal governments, financial
services, and HIPAA-compliant health care providers all must encrypt stored data.
A less obvious benefit of DARE occurs when it’s time to upgrade your hardware. If you run a
corporation in one of the regulated industries, and you need to dispose of an old drive, you
have a problem. The data on it is readable. Once you pull the drive from your systems, you
can’t just issue a GUI command to erase it. Further, many “erase” programs don’t literally
delete the data – they just mark the sector of the drive that holds the data as available for
overwrite. Unless something overwrites the data, it is still there, and can be recovered with
hacker tools. For that reason, a whole industry has sprung up around physically destroying
retired hardware. DARE provides an easier solution. It means that all data at rest has been
encrypted. If the data has been encrypted with a 256-bit key for all its life, you can recycle
the drive as is, without fear that anyone can read it. Isilon implements DARE by offering
optional Self-Encrypting Drives, or SEDs.

An Isilon accelerator is made for customers who don’t need more capacity but need certain
workflows to go faster. An accelerator contains processors and memory, but no storage. It
dedicates performance to a single client or group of clients. In essence, an accelerator acts
as a large cache to increase single stream performance.
The accelerator is based on a single hardware platform, but comes in two variations: a
model for pure performance, and a model to speed up backups. Both models have Intel
Sandy Bridge Hex-Core 2 GHz processors. The main difference between the models is that
the pure performance accelerator has more memory, 256 GB of RAM, used for L1 cache –
while the model for accelerating backups is the only Isilon node that offers Fibre Channel
ports.

This lesson dove deeper into various models of Isilon nodes. You got an overview of the
intended uses for each model then some of the technical detail supporting the overview.
You saw how users can add nodes to scale their Isilon cluster. The lesson concluded with an
explanation of optional enhancements to nodes, such as encryption and accelerators.

In this module, we turned our focus from abstract ideas about storage to the tangible parts
of the Isilon product and how they’re arranged. You should now be able to understand how
the parts fit together. You became familiar with our principal node models, and now
understand, at a high level, which node typically goes with what uses.

In this module, we'll go through an overview of OneFS, the management interface,
authentication, multiprotocol support, and your options for getting remote technical
support.

In this lesson, we will look more deeply at what OneFS is, understand the management
options, discuss authentication, understand how we do multiprotocol support, and identify
options for getting remote technical support.

The key to Isilon’s scale-out NAS is the architecture of OneFS, which is a distributed
clustered file system. That means one file system spans across every node in the storage
cluster. If you add more nodes, that file system automatically redistributes content to take
advantage of the entire cluster. When the system needs to write a file, it breaks data into
smaller sections called “stripes,” and then, for performance reasons, as well as for data
protection, OneFS stripes data across all the nodes and drives in a cluster.
As the system writes the data, it also protects the data. We already spoke about N + M
protection and parity bits. The technical way to describe that kind of fault tolerance
protection level is to say that OneFS uses Reed-Solomon forward error correction
algorithms. The system can continuously reallocate data and make storage space more
usable and efficient. Isilon calls this ability FlexProtect.
Unlike scale-up NAS, in scale-out NAS there is no single master node or device that controls
the cluster. Each node is a peer that shares the management workload and acts
independently as a point of access for incoming data requests. That way, you don’t get
bottlenecks when there are a bunch of simultaneous requests. Thus, there is a copy of
OneFS on every node in the cluster. This approach prevents downtime, since every node
can take over activities for another node that happens to go offline.
Internally in the cluster, OneFS coordinates all the nodes on the back end, across an
InfiniBand switch.

The OneFS architecture is designed to optimize processes and applications across the
cluster. Usually one node doesn’t do a service or function unless it gets more nodes to help.
The shared infrastructure permits access to resources on any node in the cluster from any
other node in the cluster and you get the performance benefits of parallel processing. The
results are improved utilization of cluster resources for compute power, disk, memory and
networking. Because all the nodes work together, the more nodes you add, the more
powerful the cluster gets.

OneFS supports access to the same file using different protocols and authentication
methods at the same time. SMB clients that are authenticating using AD, and NFS clients
that are authenticating using LDAP, can access the same file with their appropriate
permissions applied. The permissions activities are seamless to the client.
To enable multiprotocol file access, Isilon translates Windows Security Identifiers, or SIDs,
and UNIX User Identities, or UIDs, into a common identity format. OneFS stores these
identities on the cluster, tracking all the user IDs from the various authentication sources.
OneFS also stores the appropriate permissions for each identity or group. We call this
common identity format stored on the cluster the “on-disk representation” of users and
groups.
So, for instance, the SMB protocol exclusively uses SIDs—that is, Security Identifiers—for
authorization data. If a user needs to retrieve a file for a Windows client, as OneFS starts to
retrieve the file, it converts the on-disk identity into a Windows-friendly SID and checks
permissions. Or, if the user is saving a file, OneFS would do the same kind of translation,
from the on-disk representation to SIDs, before saving. This works the same way on the
UNIX side, only using UIDs and GIDs instead of SIDs. And that’s how all users can access
OneFS files in our mixed-platform client environment.
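The translation idea can be pictured with a small Python sketch. Everything in it is hypothetical: the record layout, the function name, and the sample SID and UID are invented for illustration and do not reflect the actual on-disk format OneFS uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnDiskIdentity:
    """Hypothetical common record for one user; not the real OneFS format."""
    name: str
    sid: str   # Windows Security Identifier
    uid: int   # UNIX user ID


def identity_for_protocol(identity: OnDiskIdentity, protocol: str):
    """Return the identifier form that the client's protocol expects."""
    if protocol == "SMB":
        return identity.sid
    if protocol == "NFS":
        return identity.uid
    raise ValueError(f"unsupported protocol: {protocol}")


# Sample values are made up for this sketch.
user = OnDiskIdentity(name="jsmith", sid="S-1-5-21-1004336348-1177238915-682003330-1105", uid=2001)
print(identity_for_protocol(user, "SMB"))
print(identity_for_protocol(user, "NFS"))
```

The point of the sketch is only that one stored identity can answer to either protocol, so permissions are checked against the same user no matter how the client connects.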

Authentication services offer a layer of security by verifying users’ credentials before
allowing them to access and modify files. Authentication answers the question, “Are you
really who you say you are?”
OneFS supports four methods for authenticating users: Active Directory, LDAP, NIS, and
Local (or file provider) accounts on the cluster.
You are likely already familiar with Active Directory. While Active Directory can serve
many functions, the primary reason for joining the cluster to an AD domain is to let the AD
domain controller perform user and group authentication. Each node in the cluster shares
the same Active Directory computer account, making it easy to administer and manage.
You probably know LDAP, too. A primary advantage of LDAP is the open nature of its
directory services and the ability to use LDAP across many platforms. OneFS can use LDAP
to authenticate user and group access to the cluster.
Network Information Service (NIS) is Sun Microsystems’ directory access protocol. By
the way, NIS differs from NIS+, which the Isilon cluster does not support.
Local / File Provider – Isilon supports local user and group authentication using the web
administration interface.
If you want to enable multiple authentication methods to the cluster, be sure you test how
they interact. You have to work methodically, or you can get confused about what is
authenticating who.

Each and every node contains a copy of the OneFS operating system and the cluster’s
configuration files. To execute its functions, OneFS creates automated policies. Managing by
automated policies makes processes repeatable, which decreases the time you spend
manually managing the cluster. You can change any policy as required. If you change a
configuration, the configuration updates on every node. The cluster executes policies as one
cohesive system.
Policies drive every process in OneFS. That includes the way data is distributed across the
cluster (and on each node); how client connections get distributed among the nodes; and
when and how maintenance tasks execute. Policies are kind of a big deal in OneFS, because
they enable so many automated activities.

You have three options for managing the cluster. You can use the web administration
interface, the CLI, or the Platform Application Programming Interface (PAPI). The web
administration interface is pretty robust, but if you’re willing to dive into the CLI, you can
do a bit more advanced configuration. Some management functionality is only available in
the web administration interface. Conversely, sometimes the CLI offers a feature that’s not
available in the web administration interface.

The third way to access the cluster is through the application programming interface, or
API. The API enables customer and third-party applications to programmatically execute
management commands and directly access data on the cluster.
The APIs are separated into two types of operations: cluster management and access to
data. Cluster management is performed using the platform API, or PAPI. A limited number
of OneFS commands are currently PAPI enabled. The other API is RESTful access to the
namespace, or RAN. An API that adheres to the principles of REST does not require the
client to know anything about the structure of the API. Rather, the server provides
whatever information the client needs to interact with the service. Our internal developers
really like accessing the cluster using RAN; it’s one of the reasons the API accesses the cluster
using good ol’ HTTP.

There are three different types of support available. You can manually upload log files to
the EMC Isilon support FTP site as needed. The log files provide detailed information about the
cluster activities if an issue arises and a client needs technical support. EMC’s support
personnel request these files at the beginning of a support call.
The second option, called SupportIQ, is integrated into OneFS. The cluster automatically
generates and uploads log files to the EMC Isilon support site, on a schedule. This saves
clients time and provides multiple log files to EMC, which is very useful if a technical issue
has changed over time. System alerts called events are sent to EMC Isilon Support as part
of the service. Isilon provides some proactive support based on events and Isilon tech
support may reach out to clients when they see that a cluster is filled to 90% of its
capacity, for example. If they choose to, clients can grant permission for Isilon support to
remotely access the cluster through a secure connection.
For the third option, OneFS can use the EMC Secure Remote Support service (ESRS). This is
similar to SupportIQ and uses the same secure remote administration service. Data centers
that already have EMC devices under remote support can choose this option to keep their
support unified.

In this lesson, we looked more deeply at what OneFS is, discussed the management options
and authentication, understood how to provide multiprotocol support, and identified options
for getting remote technical support.

This module covered Isilon general functionality, including an overview of OneFS, the
management interface, authentication, multiprotocol support, and three remote support
options.

This module goes into more detail about how to manage the data on Isilon clusters, and it
outlines various ways OneFS assures data security, integrity, and availability.

After completing this lesson, you should be able to understand connection management,
explain data distribution, I/O optimization, and data protection, configure
management roles, manage the cluster’s capacity, identify data visibility and analytics, and
examine deduplication.

Clients can connect to a cluster in various ways. If you want to keep it simple, OneFS lets
you give the cluster a virtual host name and clients can access the cluster using DNS. If you
have clients that must connect to specific interfaces into the cluster, you can assign static
IP addresses to the clients; and you can assign static IP addresses to nodes in the cluster.
You can also use both DNS and direct access to static IP addresses, depending on the
workflows.
Connectivity is based on standard networking and DNS principles. You can assign multiple
subnets to the cluster. Isilon refers to these as SmartConnect zones. Each zone is assigned
a group of external interfaces from a set of nodes.
Using the virtual host name, clients are assigned an IP address to connect to when they
access the cluster. The standard distribution policy uses round-robin, assigning clients to
the next available node interface IP address.
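Round-robin distribution is easy to picture in code. The following Python sketch is a simplified model, not the SmartConnect implementation: the class name and IP addresses are invented, and it ignores details such as node failures.

```python
from itertools import cycle


class RoundRobinZone:
    """Hands out node interface IP addresses in rotation."""

    def __init__(self, interface_ips):
        self._next_ip = cycle(interface_ips)

    def resolve(self):
        """Return the IP address the next connecting client should use."""
        return next(self._next_ip)


# Example addresses are hypothetical.
zone = RoundRobinZone(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
assigned = [zone.resolve() for _ in range(4)]
print(assigned)
```

The fourth client wraps around to the first address again, so connections spread evenly as long as clients behave similarly.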

Other enhanced functionality includes additional policies for distributing connections in
addition to the standard round-robin style. With the enhanced license, you can have clients
connect to a given node based on criteria you define. For example, you can have clients
connect to the IP address with the lowest number of connections at the moment. Or, you
can direct client connections to the interface currently showing the least throughput, or the
least CPU usage.
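Each of these criteria amounts to "pick the interface with the lowest value of some metric." Here is a minimal Python sketch of that selection. The policy names, field names, and sample figures are assumptions made for illustration; they are not OneFS identifiers.

```python
from dataclasses import dataclass


@dataclass
class InterfaceStats:
    ip: str
    connections: int
    throughput_mbps: float
    cpu_percent: float


# Each policy names the metric to minimize when choosing an interface.
POLICY_METRIC = {
    "connection_count": lambda s: s.connections,
    "throughput": lambda s: s.throughput_mbps,
    "cpu_usage": lambda s: s.cpu_percent,
}


def pick_interface(stats, policy):
    """Return the IP of the interface with the lowest value for the policy's metric."""
    return min(stats, key=POLICY_METRIC[policy]).ip


stats = [
    InterfaceStats("10.0.0.11", connections=40, throughput_mbps=300.0, cpu_percent=20.0),
    InterfaceStats("10.0.0.12", connections=12, throughput_mbps=900.0, cpu_percent=55.0),
    InterfaceStats("10.0.0.13", connections=25, throughput_mbps=150.0, cpu_percent=70.0),
]
for policy in POLICY_METRIC:
    print(policy, "->", pick_interface(stats, policy))
```

Notice that the three policies can each choose a different node for the same moment in time, which is why the right policy depends on the workflow.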
The enhanced functionality includes continuous availability for SMB, NFSv3, and NFSv4.
This feature allows SMB, NFSv3, and NFSv4 clients to dynamically move to another node in
the event the node they are connected to goes down.
This feature applies to Microsoft Windows 8, Windows 10 and Windows Server 2012 R2
clients. This feature is part of Isilon's non-disruptive operation initiative to give customers
more options for continuous work and less downtime. The CA option allows seamless
movement from one node to another with no manual intervention on the client side. This
enables a continuous workflow from the client side with no apparent disruption to their
working time. CA supports home directory workflows as well.

Let’s start with the data distribution – we’re talking about how OneFS spreads data across
the cluster.
You can have various models of Isilon nodes, or “node types,” in a cluster. Nodes are
assigned to 'node pools' based on model type. The cluster can have multiple node pools
within a single cluster, and groups of node pools can be combined to form tiers of storage.
The data target can be a tier or, if SmartPools is licensed, a specific individual node pool.
Several policies determine how the data distribution occurs. The default policy is for data to
write anywhere in the cluster. Data distributes among the different node pools based on
percentage of most available space.
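One way to read the default policy is "send new data to the pool with the highest percentage of free space." The Python sketch below models only that reading; the pool names and capacities are invented, and the real placement logic in OneFS is more involved.

```python
def pick_node_pool(pools):
    """Return the name of the pool with the highest percentage of free space.

    `pools` maps a pool name to a (used_tb, total_tb) pair.
    """
    def free_fraction(item):
        _, (used_tb, total_tb) = item
        return (total_tb - used_tb) / total_tb

    name, _ = max(pools.items(), key=free_fraction)
    return name


# Hypothetical pools and capacities.
pools = {
    "s_series_pool": (80, 100),     # 20% free
    "x_series_pool": (150, 300),    # 50% free
    "nl_series_pool": (400, 500),   # 20% free
}
print(pick_node_pool(pools))
```

Comparing percentages rather than raw terabytes keeps a small pool from being ignored just because a larger pool has more absolute free space.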

You can optimize Data Input/Output to match the workflows in the environment.
Optimization can be managed cluster wide – that’s the default – and at the level of
individual directories and even individual files. The data access pattern can be optimized for
random access, sequential access or concurrent access.
Pre-fetch is a guess at what you’ll need before you ask for it. When clients open larger files,
especially streaming formats like video and audio, the cluster assumes that you will
generally watch minute 4 of that video after minute 3. So it proactively loads minutes 4, 5,
and maybe even 6 into memory ahead of when your computer requests it. Then delivering
those minutes will be faster than if the cluster had to go to the hard drive repeatedly upon
each request. That’s the concept behind “pre-fetch.” With OneFS, you can configure the
pre-fetch cache characteristics to work best with the selected access pattern.
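The read-ahead idea can be sketched as a small Python class. This is a conceptual model under simplifying assumptions: the class and parameter names are invented, the cache never evicts anything, and it does not represent how OneFS implements its caches.

```python
class PrefetchingReader:
    """Reads numbered chunks and loads the next few into memory ahead of demand."""

    def __init__(self, read_from_disk, read_ahead=2):
        self._read_from_disk = read_from_disk   # callable: chunk number -> data
        self._read_ahead = read_ahead
        self._cache = {}
        self.disk_reads = 0

    def _load(self, chunk):
        """Fetch a chunk from disk only if it is not already in memory."""
        if chunk not in self._cache:
            self._cache[chunk] = self._read_from_disk(chunk)
            self.disk_reads += 1

    def read(self, chunk):
        """Return (data, was_cached) for one chunk, then prefetch the chunks after it."""
        was_cached = chunk in self._cache
        self._load(chunk)
        for upcoming in range(chunk + 1, chunk + 1 + self._read_ahead):
            self._load(upcoming)
        return self._cache[chunk], was_cached


reader = PrefetchingReader(lambda n: f"minute {n}", read_ahead=2)
_, hit_first = reader.read(3)    # minute 3 comes from disk; 4 and 5 are prefetched
_, hit_second = reader.read(4)   # minute 4 is already in memory
print(hit_first, hit_second)
```

A larger `read_ahead` suits streaming access, while random access gains little from it, which is why the pre-fetch settings are tied to the access pattern.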

Data protection refers to how many components in a cluster can malfunction without loss of
data. Data protection on OneFS is manageable and flexible.
The system is enabled by default for virtual hot spares (VHS). VHS enables you to allocate
disk space to hold the data to be rebuilt when a disk drive fails.
Your earlier training acquainted you with Forward Error Correction. RAID works at the disk
level – once you’ve chosen a RAID type, that whole RAID volume can only be that type of
RAID; and if you want to change the RAID type, you’d have to move all the data off the
RAID disks before you can reformat.
But since Isilon uses FEC for protection, you can set the protection level differently based on
tier, node pool, directory, and even by the individual file. Extra protection creates extra
overhead because OneFS writes more parity bits. You decide how to trade off extra capacity
(meaning, less protection) with greater redundancy (meaning, less capacity). So based on
the value of the data, you can adjust the protection level. For example, an R&D department
has a node pool dedicated to testing. Test data is not true production data so they’ve set
minimal +1 protection. But their customer database is the company’s most valuable asset.
Customer data hits a different node pool set to +4 protection. Protection is flexible with
OneFS. They could even mirror files, with up to eight mirrors of each file. This is not
space efficient, but for very frequently read files it can really speed things up.
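The capacity trade-off can be estimated with simple arithmetic. The Python sketch below assumes, for illustration only, that each protected stripe holds a fixed number of data units plus M parity units, and that "eight mirrors" means eight stored copies; the stripe width of eight data units is an arbitrary example, not a OneFS default.

```python
def fec_overhead(data_units, parity_units):
    """Fraction of raw capacity spent on parity for one stripe."""
    return parity_units / (data_units + parity_units)


def mirror_overhead(copies):
    """Fraction of raw capacity spent on redundant copies when a file is mirrored."""
    return (copies - 1) / copies


# Assumed stripe width of 8 data units, chosen only for this example.
for parity_units in (1, 4):
    print(f"+{parity_units}: {fec_overhead(8, parity_units):.0%} overhead")
print(f"8 copies: {mirror_overhead(8):.1%} overhead")
```

Under these assumptions, moving from +1 to +4 protection roughly triples the share of capacity spent on parity, and heavy mirroring costs far more than either.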
The standard functionality is called unlicensed SmartPools; or sometimes, SmartPools Basic.
If you license SmartPools, you get enhanced capabilities.

In addition to the built-in root and admin roles, OneFS provides the capability for role-based
access control (RBAC). RBAC means you can define privileges for an administrator to
customize management access. The privileges apply to the web administration interface
and the CLI, and a smaller set of privileges applies to PAPI management. There are four built-in
roles, or you can create custom roles to fit your needs or your business model. A user can
be assigned to more than one role at a time.
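
Because a user can hold more than one role, the user's effective privileges are the union of what each role grants. Here is a minimal sketch of that idea. The role and privilege names are hypothetical placeholders, not the actual OneFS role or privilege identifiers.

```python
# Hypothetical roles and privileges, for illustration only.
ROLES = {
    "QuotaViewer": {"view_quotas"},
    "SnapshotOperator": {"view_snapshots", "create_snapshots"},
}


def effective_privileges(user_roles, roles=ROLES):
    """Union of the privileges granted by every role the user holds."""
    privileges = set()
    for role in user_roles:
        privileges |= roles[role]
    return privileges


def is_allowed(user_roles, privilege):
    """True if any of the user's roles grants the requested privilege."""
    return privilege in effective_privileges(user_roles)


both = effective_privileges(["QuotaViewer", "SnapshotOperator"])
```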

One of the ways you can subdivide capacity is by assigning storage quotas to users or
groups. You manage the quotas by policy. Quotas can be set by user, by group or by
directory or path. You can also nest quotas; that is, apply quotas within quotas. For example, you
can place a quota on a whole department, then a smaller quota on each user within that
department, and even a different quota on a File Share they all use, and yet another on the
sub-directories of that File Share. All these are flexible and can be applied or modified on
the fly.
Quotas let you implement thin provisioning. For example, the day you tell a group “You can
use up to one terabyte of storage!” that group won’t instantly fill up the full terabyte. They
may NEVER fill it. But with quota-based thin provisioning, you can keep showing the group
an available terabyte of storage, even if you don’t have a full terabyte actually available on
the cluster currently.
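
Thin provisioning boils down to promising more capacity than is physically present. A minimal sketch of the arithmetic follows; the group names and the capacity figures are made up for illustration.

```python
def oversubscription_ratio(quota_limits_tb, physical_capacity_tb):
    """How many times over the physical capacity has been promised."""
    promised = sum(quota_limits_tb.values())
    return promised / physical_capacity_tb


# Hypothetical groups, each told it may use up to one terabyte.
quotas = {"engineering": 1.0, "marketing": 1.0, "finance": 1.0}
ratio = oversubscription_ratio(quotas, physical_capacity_tb=2.0)
```

A ratio above 1 means the quotas together promise more than the cluster holds, which works as long as actual usage stays well below the promised limits.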
OneFS has three primary types of quotas: accounting or advisory quotas; plus two levels of
enforcement quotas, soft limit and hard limit. Advisory quotas are informational only. If a
user exceeds their advisory storage quota, OneFS lets them; but the cluster provides a
comparison between the quota allocation and actual usage. In contrast, if a user exceeds a
soft limit quota, the system notifies the user with a warning email. If the user exceeds a
hard limit quota, the system will from then on deny the user the ability to write. It also
notifies the user that the quota has been violated.
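
The three quota behaviors can be sketched as a single check performed on each write. This is a simplified model under stated assumptions: the Quota class and check_write function are hypothetical names, and the sketch denies any write that would cross the hard limit, which is a simplification of the behavior described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quota:
    advisory: Optional[int] = None  # informational threshold
    soft: Optional[int] = None      # warning threshold
    hard: Optional[int] = None      # writes are denied beyond this


def check_write(quota, used, write_size):
    """Return (allowed, events) for a write of write_size on top of used."""
    new_usage = used + write_size
    if quota.hard is not None and new_usage > quota.hard:
        return False, ["hard limit reached: write denied, user notified"]
    events = []
    if quota.soft is not None and new_usage > quota.soft:
        events.append("soft limit exceeded: warning email sent")
    if quota.advisory is not None and new_usage > quota.advisory:
        events.append("advisory threshold exceeded: usage reported")
    return True, events


quota = Quota(advisory=50, soft=80, hard=100)
under = check_write(quota, used=40, write_size=5)
warned = check_write(quota, used=70, write_size=15)
denied = check_write(quota, used=95, write_size=10)
```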
You can customize the quota notifications in OneFS so that they meet your requirements.
Quotas are enhanced functionality that requires licensing. To get the feature, you must
purchase a SmartQuotas license for each node in the cluster.

Another enhanced function provides advanced data visibility and analytics. InsightIQ, a
powerful tool, monitors one or more clusters, then presents data visually in a robust
graphical interface with reports you can export. You can customize the reports. You have
the ability to drill down into the information and break out specific information as desired,
and even take advantage of usage growth and prediction features.
The tool monitors many aspects of system performance, such as CPU utilization and
interface throughput. The tool also reports on the file system analytics including quota
usage, files per user, files per directory, average file size, and more, and lets you export the
analytics if you want them.
An external VMware system or standalone Linux server is required for this enhanced
functionality. The separate server runs external to the cluster and collects data from the
cluster in scheduled intervals. To enable these capabilities, you get a free InsightIQ license
for each cluster.

OneFS can implement deduplication. Deduplication provides an automated way to increase
storage efficiency. OneFS achieves deduplication by analyzing to find duplicate sets of data
blocks, then storing only a single copy of any data block that is duplicated. Deduplication
runs as a post-process job; in other words, on data already stored on the cluster.
Deduplication works at the 8K block level on files over 32K in size.
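
Block-level deduplication can be sketched in a few lines. The 8K block size and the 32K minimum file size come from the text above; everything else is an assumption for illustration, including the use of SHA-256 to recognize duplicate blocks and the dedupe function name. This is not the SmartDedupe implementation.

```python
import hashlib

BLOCK_SIZE = 8 * 1024      # deduplicate at the 8K block level
MIN_FILE_SIZE = 32 * 1024  # only files over 32K are considered


def dedupe(files):
    """Store each unique block once; files become lists of block digests."""
    store = {}   # digest -> the single stored copy of that block
    layout = {}  # file name -> list of digests, or raw bytes for small files
    for name, data in files.items():
        if len(data) <= MIN_FILE_SIZE:
            layout[name] = data  # too small: left untouched
            continue
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # keep only the first copy
            digests.append(digest)
        layout[name] = digests
    return store, layout


files = {
    "a": b"A" * (BLOCK_SIZE * 5),
    "b": b"A" * (BLOCK_SIZE * 4) + b"B" * BLOCK_SIZE,
    "small": b"A" * 1024,
}
store, layout = dedupe(files)
```

Here ten 8K blocks across the two large files collapse to two stored blocks, while the small file is skipped.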
You can run deduplication jobs against a specific directory path or on the entire directory
structure.
OneFS also provides a dry-run deduplication assessment tool as standard functionality. This
allows you to test drive deduplication to see how much capacity you would save if you ran
the actual deduplication process. Enabling full deduplication functionality requires a
SmartDedupe license for each node in the cluster.

Having completed this lesson, you can now understand connection management, explain
data distribution, I/O optimization and data protection, configure management roles,
manage the cluster’s capacity, identify data visibility and analytics, and examine
deduplication.

After completing this lesson, you should be able to define data integrity, understand data
resiliency, and explain data recovery and data retention.

We’ve already discussed data protection and that OneFS uses the Reed-Solomon algorithm
for forward error correction, or FEC, instead of RAID. In earlier training, you saw how Isilon
breaks data into stripes and spreads it across nodes.
Each stripe is protected separately with FEC blocks, or parity. Stripes are spread across the
nodes and not contained in a single node. Only 1 or 2 data or protection stripe units are
contained on a single node for any given data stripe. Protecting at this granular level allows
you to vary your protection levels and set them separately for node pools, directories, or
even individual files.
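
To see how a protection unit lets the cluster rebuild a lost stripe unit, consider the simplest possible case. The sketch below uses plain XOR parity, which tolerates the loss of one unit; it is a stand-in chosen for brevity and is not the Reed-Solomon code that OneFS uses. The function names and the sample stripe are illustrative.

```python
def xor_parity(units):
    """Compute one parity unit as the byte-wise XOR of equal-length units."""
    parity = bytearray(len(units[0]))
    for unit in units:
        for i, byte in enumerate(unit):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_missing(surviving_units, parity):
    """XOR the survivors with the parity unit to recover the lost unit."""
    return xor_parity(list(surviving_units) + [parity])


# Three stripe units, imagined as living on three different nodes.
stripe = [b"node", b"pool", b"data"]
parity = xor_parity(stripe)

# The node holding the second unit fails; rebuild it from the rest.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

Reed-Solomon generalizes this idea so that several units can be lost at once, which is what makes protection levels such as +2 or +4 possible.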
What’s the point of all this? Well, in most popular SANs and in typical scale-up NAS, you
have a pair of heads so that one can back up the other, and that’s what provides high
availability. With OneFS, you could say that high availability is baked right in to every data
transaction, because the data is spread onto many drives and multiple nodes, all of them
ready to pitch in and help reassemble the data if a component fails. This approach creates
an amazingly resilient platform.

OneFS protects all metadata by mirroring it. Up to 8 copies of the metadata are maintained
for any file, depending upon the file’s protection level. OneFS also allows directories to be
protected one level higher than the data. This creates additional metadata copies and
ensures the data is always accessible.
In addition to FEC protection, OneFS checks the block and file integrity by implementing a
cyclic redundancy check, or CRC. CRCs are often referred to as checksums. Checksums run
at each stage of the storage process, ensuring each block’s integrity. Since OneFS checks
all the data as it is being written or read, you don’t have to do a show-stopping, entire-
system integrity check, as with some rival platforms.
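
The check-on-every-read idea can be sketched with the CRC-32 function in Python's standard library. The write_block and read_block helpers and the dictionary layout are assumptions for illustration; the actual checksum format OneFS uses is not described here.

```python
import zlib


def write_block(data):
    """Store a block together with the checksum computed at write time."""
    return {"data": data, "crc": zlib.crc32(data)}


def read_block(stored):
    """Recompute the checksum on read and refuse to return corrupt data."""
    if zlib.crc32(stored["data"]) != stored["crc"]:
        raise ValueError("checksum mismatch: block is corrupt")
    return stored["data"]


block = write_block(b"customer record")
intact = read_block(block)

# Simulate silent corruption of the stored bytes.
damaged = {"data": b"customer recorD", "crc": block["crc"]}
```

Reading the damaged block raises an error instead of handing back bad data, so corruption is caught at the moment of access rather than during a separate system-wide scan.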
All of the FEC protection, metadata, and CRC data integrity functionality come standard in
OneFS.

OneFS supports the Internet Content Adaptation Protocol (ICAP) for integration with major
anti-virus provider applications. ICAP is an HTTP-like protocol used to manage the off-
cluster anti-virus scan engines. The ICAP vendors that OneFS supports include Symantec,
McAfee, TrendMicro and Kaspersky.
OneFS supports different types of anti-virus scans. You can scan files when they are
accessed, when a file is opened for reading or modifying, and when a file is closed or saved.
OneFS also supports scheduled scans, based on policies. The policies let you refine how and
when scans occur.

OneFS also empowers you to decide how to treat a threat once it’s detected. You can set
policies to merely record an event; attempt to repair the file; quarantine the file; or
truncate the file.
You can also configure OneFS to send you an alert when a threat is detected, or if issues
occur with an ICAP server.
OneFS logs its scans, and you can pull reports from those logs.
ICAP support is a standard OneFS functionality.

Data resiliency refers to the ability to recover past versions of a file that has changed over
time. Sooner or later, every storage admin gets asked to roll back to a previous “known
good” version of a file. OneFS provides this capability using snapshots. Snapshots capture
the changed blocks and metadata information for the file.
OneFS uses the copy-on-write snapshot methodology. This approach keeps the live version
of data intact while storing differences in a snapshot. Because the system is only writing
changes, the writes are very fast.
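
Here is a minimal sketch of the copy-on-write idea: a snapshot starts out empty and only receives the original contents of a block at the moment that block is overwritten. The Volume class and its methods are hypothetical and greatly simplified compared with a real file system.

```python
class Volume:
    """Toy volume with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # the live data
        self.snapshots = []         # each snapshot: {block index: old content}

    def take_snapshot(self):
        # Taking a snapshot copies nothing; it just starts tracking changes.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, content):
        # Before overwriting, preserve the old block in every snapshot
        # that has not already saved its own copy of this block.
        for snapshot in self.snapshots:
            snapshot.setdefault(index, self.blocks[index])
        self.blocks[index] = content

    def read_snapshot(self, snapshot_id):
        # Saved blocks come from the snapshot, unchanged ones from live data.
        saved = self.snapshots[snapshot_id]
        return [saved.get(i, block) for i, block in enumerate(self.blocks)]


volume = Volume(["a", "b", "c"])
monday = volume.take_snapshot()
volume.write(1, "B")
tuesday = volume.take_snapshot()
volume.write(1, "BB")
volume.write(2, "C")
```

Each snapshot stores only the blocks that changed after it was taken, yet reading a snapshot still reconstructs the full "known good" version from that point in time.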
Snapshot policies are used to determine the snapshot schedule, the path to the snapshot
location, and snapshot retention periods. Snapshot deletions happen as part of a scheduled
job, or you can also delete them manually. Yes, you can delete them out of chronological
order.
Some OneFS system processes use snapshots internally. No license is required for system-
based snapshot usage. However, using snapshots for data resiliency requires a SnapshotIQ
license for each node in the cluster.

When we think data recovery, first we think of data backup. Isilon supports Network Data
Management Protocol (NDMP) for integration with backup applications provided by major
manufacturers such as Symantec, EMC, CommVault, and IBM.
A backup application external to the cluster manages the backup process. You can set this
up in one of two ways: send cluster data over your LAN to the backup device; or, send data
directly from the cluster to the backup device using Isilon backup accelerators.
Depending upon the amount of data and the interfaces selected on the external network,
backing up across a network might not be as efficient as using the backup accelerator. The
backup accelerator provides access to the data across the fast InfiniBand internal network
and delivers it to the backup device over Fibre Channel ports.
This NDMP support comes standard with OneFS.

While NDMP backup comes standard with OneFS, replication is an enhanced data recovery
option. Replication keeps a copy of one cluster’s data on another cluster. OneFS performs
replication during normal operations, from one Isilon cluster to another. Replication can
occur over a LAN or over a WAN. Replication may be from one to one, or from one to many
Isilon clusters. Synchronization only works in one direction.
OneFS supports two types of replication – copy and synchronization. With copy, any new
files on the source are copied over to the target, while files that have been deleted on the
source remain unchanged on the target. With synchronization, both the source and target
clusters maintain identical file sets, except that files on the target are read-only. When
synchronization occurs, the changed data blocks and the associated file metadata are sent
to the target.
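
The difference between copy and synchronization comes down to what happens to files that were deleted on the source. The sketch below models each cluster as a dictionary of file names to contents; the function names are hypothetical, and real replication moves changed blocks rather than whole dictionaries.

```python
def replicate_copy(source, target):
    """Copy: new and changed files go to the target; nothing is removed."""
    updated = dict(target)
    updated.update(source)
    return updated


def replicate_sync(source, target):
    """Synchronization: the target ends up with the same file set as the source."""
    return dict(source)


source = {"report.doc": "v2", "new.txt": "v1"}
target = {"report.doc": "v1", "deleted-on-source.txt": "v1"}

after_copy = replicate_copy(source, target)
after_sync = replicate_sync(source, target)
```

After a copy, the file that was deleted on the source is still present on the target; after a synchronization, it is gone and the two file sets match.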

Replication is policy-based and runs as a synchronization job. You can set replication
policies to run synchronization jobs whenever you want, or to replicate automatically if the
source data changes. The policies can be set up per directory, and for specific data types.
You can set up exceptions to include or exclude specific files.
OneFS also empowers you to limit the bandwidth used for replication, in order to optimize
the traffic for more important workflows.
Replication requires a license with OneFS. The license is called SyncIQ.

Data retention is the ability to prevent data from being deleted or modified before some
future date. In OneFS, you can configure data retention at the directory level, so different
directories can have different retention policies. You can also use policies to automatically
commit certain types of files for retention.
OneFS offers two types of data retention: Enterprise and Compliance.
Enterprise is more flexible than Compliance, and meets most companies’ retention
requirements. It can allow privileged deletes by an administrator.
The Compliance level of retention is even more secure, designed to meet SEC regulatory
requirements. In Compliance mode, once data is committed to disk, no one can change or
delete the data until the retention clock expires. A common hacker ploy for beating
retention safeguards is to temporarily change the system clock to some date way in the
future, thus releasing all files. Compliance mode defeats this approach by relying upon a
specialized clock that prohibits clock changes.
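
The two retention modes can be sketched as a single delete check. This is a simplified model: the RetainedFile class, the can_delete function, and the mode strings are hypothetical names, and the trusted clock is represented simply by the now value passed in.

```python
from datetime import datetime, timedelta


class RetainedFile:
    def __init__(self, name, committed_at, retain_for):
        self.name = name
        self.retain_until = committed_at + retain_for


def can_delete(file, now, mode, privileged=False):
    """Decide whether a delete is allowed under the given retention mode."""
    if now >= file.retain_until:
        return True  # the retention period has expired
    # Before expiry, only Enterprise mode permits a privileged delete.
    return mode == "enterprise" and privileged


committed = datetime(2016, 2, 1)
record = RetainedFile("trade.log", committed, timedelta(days=365))
during = datetime(2016, 6, 1)
after = datetime(2017, 3, 1)
```

In this model, an administrator can remove the file early in Enterprise mode but no one can in Compliance mode, which is why Compliance mode depends on a clock value that cannot be tampered with.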
You can still use SyncIQ to replicate the files that have retention policies applied.
Retention in OneFS is an enhanced function and requires a license called SmartLock.

Having completed this lesson, you should be able to define data integrity, understand data
resiliency, and explain data recovery and data retention.

This module went into more detail about how to manage the data on Isilon clusters, and
outlined various ways OneFS assures data security, integrity, and availability.

This concludes the Isilon Fundamentals training course. Thank you for your participation.
