Storage Basics: An Introduction to the Fundamentals of Storage Technology
Contents
Section 1 – The information society: saving data and knowledge at new levels
Section 2 – Tiered storage: intelligent information management in the company
Section 3 – Online storage: disks and reliability
Section 4 – Storage networks: spoilt for choice
Section 5 – Backup & Restore: an unloved compulsory exercise
Section 6 – Storage management: making complex storage networks manageable
Section 7 – Virtualization: some catching up is necessary regarding storage topics
Section 8 – The storage strategy of Fujitsu Siemens Computers and its partners
Forecast: Future storage trends
Glossary
As well as articles about storage from our partners Brocade, CA, EMC, NetApp, Sun and Symantec
Storage Basics
An introduction to the fundamentals
of storage technology
January 2009
Copyright
Fujitsu Siemens Computers 2009
Text, editing, production: ZAZAmedia / Hartmut Wiehr
Printed in Germany.
Published by
Fujitsu Siemens Computers GmbH
Mies-van-der-Rohe-Strasse 8
80807 Munich, Germany
Contact
www.fujitsu-siemens.com/contact
All rights reserved. Delivery options and technical specifications are subject to change. The names reproduced in this document may be trademarks whose use by third parties for their own purposes could violate the rights of the owners.
Contents
Preface
Section 1 – The information society: saving data and knowledge at new levels
Section 2 – Tiered storage: intelligent information management in the company
Section 3 – Online storage: disks and reliability
Section 4 – Storage networks: spoilt for choice
Section 5 – Backup & Restore: an unloved compulsory exercise
Section 6 – Storage management: making complex storage networks manageable
Section 7 – Virtualization: some catching up is necessary regarding storage topics
Section 8 – The storage strategy of Fujitsu Siemens Computers and its partners
Forecast: Future storage trends
Remarks
Glossary
Partners
Preface
Dear Reader,
Coming to grips with the ever-growing flood of data is still the greatest challenge as far as storage is concerned. This statement no longer applies only to major customers and their data centers, but also affects SME businesses and even smaller companies. Coming to grips means secure data storage and accessibility with due regard to agreed quality standards (service levels) and at a reasonable cost (CAPEX and OPEX). Many of the new technologies, such as ILM, deduplication, SSD, virtualization and thin provisioning, regardless of whether they are already established or still just hype, promise to help here.
We advise our customers to develop a storage solution that suits them from this variety of established and new technologies. We have a comprehensive portfolio of best-in-class products in this regard, but it is neither our intention nor are we able to do everything ourselves. The way in which we monitor technology helps us to make the right choices here. We have developed strategic partnerships with technology leaders in storage; we integrate their products into solutions and provide the appropriate services.
If we see that our customers have problems for which there is no suitable solution on the market, we then develop our own products and solutions. The CentricStor solutions are prime examples of this.
In addition, we have begun not only to develop and implement storage solutions at our customers, but also to operate them ourselves on our customers' behalf. Fujitsu Siemens Computers is set to increase its investment in this managed storage business. Our managed storage customers receive an invoice and the pertinent report every month. The advantages are quite obvious: transparency as regards costs and performance, improved efficiency and return on investment. This also gives us the advantage that we are becoming better at selecting and developing products that enable storage solutions to be run on a cost-optimized basis and at defined quality standards.
Thanks to this strategy Fujitsu Siemens Computers has also developed into one of
the most successful providers of storage solutions in Europe.
The aim of this book is to provide you with an introduction to storage technologies and storage networks, and to highlight the value they add for your company. We also introduce some of our partners in the storage sector and provide a comprehensive overview of the current storage product portfolio of Fujitsu Siemens Computers. Our products, and the associated services delivered by us or our certified partners, are the basis for storage solutions that help you contend with the growing flood of data!
Yours, Helmut Beck
Vice President Storage
Fujitsu Siemens Computers
Section 1
In 2008 the World Wide Web celebrated its 15th birthday, and Google, the Internet search engine company, turned 10. Such milestones represent a new element in the history of computer technology: nowadays, anyone can simply click a mouse and browse through a vast, almost indescribable amount of information and entertainment data at any time and anywhere. Search engines such as Google bring a certain amount of order to this knowledge chaos, and it was this particular invention by two Stanford students that revolutionized access to global knowledge within a very short time. [1]
Accessing data quickly assumes that the data can be read and is available somewhere on the World Wide Web. Its readability is based on file formats, classifications and index or metadata, which must be defined before the data can be accessed by the search engine. Its existence is based on the fact that it is stored somewhere on electromagnetic storage systems. That is only possible using state-of-the-art technology which has been developed over the last 60 to 70 years, i.e. information technology (IT).
Digitizing knowledge
The means and technologies used by man to document experience and knowledge, either for a short time or indeed long-term, have developed greatly throughout history, whereby the term knowledge is used here without any value attached to it, i.e. without an assessment of the content. However, the reason why humans or social groups pass on information is still the same.
And it is the advance of computers and the World Wide Web which has now moved society away from an industrial society to one based on information. The production of goods in the classical sense has moved more into the background, and it is now services and information which have established themselves as products. Distributing information and entertainment has become a separate, profitable business and has constantly changed the way in which society interacts [4].
More and more digital information is entered, saved and provided via networks. On the one hand, this growth is based on the amount of accrued company data. Accounting was always at the center of automation developments: some of the oldest clay tablets found in Persia contained book-keeping records, and the first computers were predominantly used for book-keeping purposes. Today IT has spread throughout companies like wildfire. More and more operational areas are now electronic, and each new sector produces an increasing amount of digital data. Depending on the business model, some companies today exist only virtually within computers, and that includes many Internet businesses. Classic companies started with solutions for ERP (Enterprise Resource Planning) and moved via CRM (Customer Relationship Management), SCM (Supply Chain Management), Data Warehousing and Business Intelligence to new areas in Web 2.0 and social media. It is frequently production-related and commercial data that is entered the most and processed in applications from SAP, Oracle or Microsoft.
This structured data, which is systematically stored in database fields, can be easily accessed via queries. Its evaluation and interpretation have become very complex due to the sheer quantity and constant expansion. This is why Data Warehousing exists, where the collected data is sorted and edited according to business intelligence criteria. For example, airline companies want to know, for marketing and flight organization purposes, how often their customers fly, their destinations and also their chosen class, i.e. business or economy. Databases alone do not provide such interpretations.
[Figure: orders of magnitude of digital storage, from megabyte (MB) and gigabyte (GB) via terabyte (TB) and petabyte (PB) to exabyte (EB). Source: Horison Information Strategies; UC Berkeley study "How Much Information?"; IDC]
The so-called Web 2.0 with its new interaction options for network participants, for example YouTube, MySpace, LinkedIn or Xing, will also result in huge data quantities being stored by the providers responsible. Most of today's blade servers and storage arrays are being shipped to such companies. This development will increase as new technologies expand, such as leased software (Software as a Service, SaaS) or cloud computing, where the user accesses programs and data stored in giant data centers somewhere in the Internet cloud. Medium-sized companies and start-ups will enjoy low-priced options that enable them to use such a sophisticated infrastructure.
Amazon, with its large data centers, is renting out computing and storage capacity to external customers. Of course, the appropriate network bandwidths must exist and the provider must be 100% reliable. It is clear that new technologies which are first used in the consumer environment will expand into the world of business IT. Risk analyses for particular situations are essential, especially when looking at security topics and cost savings.
The options of our information society have simply not yet been exhausted. New processes for transferring knowledge and storing information have joined the existing procedures [5]. Information technology has enormous potential but, just like all technical progress before it, it is simply a tool for specific purposes. It all depends on how it is used and for which objectives. First of all, it has to be ensured that data storage itself becomes more reliable. The storage example shows both the technological opportunities as well as the restrictions. And that is why, particularly in this environment, there is a whole range of new fundamental inventions and gradual improvements.
Section 2
Almost all the forecasts and estimates about rapid data and storage growth have so far proven to be true, despite the skepticism of many analysts and market observers. Such estimates have often proven to be even too conservative. In particular, recent years have shown that, in addition to company IT and its storage requirements, other groups in society now digitize their analog information. These include film and video, voice and music recordings, medical x-rays (medical imaging), TV cameras in major cities or at security-conscious locations (security & surveillance), as well as the conversion from analog to digital radio in police, fire brigade and rescue services. An additional factor is the so-called social networks (social communities), such as YouTube or Facebook, with their enormous amounts of photo and video data.
Digital information is being saved everywhere to a great extent, including in data centers which are expanding daily. But is all this data really worth saving? Do all those holiday, birthday and family snapshots, which used to bore us all at family slideshow evenings, really have to be saved on state-of-the-art technology for ever and ever? Does a company really have to save everything electronically, beyond what legal requirements specify?
According to IDC, the growth of all non-structured data (file-based), which is increasingly collected outside the company, will exceed that of structured data (block-based) for the first time in 2008. The balance within the company between structured data entered in databases or ERP applications and non-structured data resulting from e-mails (including attachments), Office files, presentations, videos and so on has also shifted.
[Figure: block-based vs. file-based data volumes in petabytes, 2005–2008. Source: IDC 2007. According to IDC, in 2008 file-based data will for the first time experience stronger growth than block-based data.]
The problem of retrieving structured data via database queries and interpreting it via Data Warehousing or Business Intelligence has today basically been solved, although there is no comparable solution for non-structured data [1]. The same cannot be said, however, for the actual storage of such data quantities: the main type of storage, i.e. fast access on expensive hard disks, is limited and assigned only to really business-critical applications.
Moving data
Moving data records and files from servers and applications to less performant storage areas is necessary for several reasons:
– New data materializes every day, indeed every hour, in business-critical applications. Booking, purchasing and sales data must all remain in direct access, yet after a certain period it becomes unimportant and even out-of-date. It then has to be moved for space reasons, i.e. from primary online storage to slower and lower-priced disk systems (nearline or secondary storage).
– Other files, such as presentations, are not necessarily business-critical, yet have to be stored nearby (nearline), as such files are often modified and used again and again. Such data is also a typical example of data that is saved several times and used or read by numerous employees. The analysts of the Enterprise Strategy Group (ESG) have developed the iceberg model for this: there are two main types of data, dynamic data which is continually changed (the visible part of the data iceberg) and permanent data which is static or fixed and will not be changed any more (the invisible part) [2].
– Legal regulations, or just careful commercial thinking, require long-term data storage without the contents having to be constantly available during daily business. Such data can move to an archive in one form or another. In the early days it was stored on tape; nowadays there are also so-called Virtual Tape Libraries, which simulate tape storage in disk systems. This is a third type of storage, but here the data is a long way from its original starting-point, the servers and primary disk storage units.
– Last but not least: data loss must be avoided, depending on the value of the data, by making immediate or delayed copies which can be recovered as required (backup, restore, snapshot, CDP/Continuous Data Protection). These procedures are complemented by up-to-date deduplication concepts, which fish out duplicate copies during the backup process, although only on the backup media and not on the primary or secondary disk storage units. More information about this complex topic can be found in section 5.
Even if a company only uses one aspect of these procedures, it is still using a tiered data backup system, as data is being moved, even if only from the server to a backup medium, and even though such a minimal approach would be considered inadequate from a professional storage viewpoint.
[Figure: the stored data moves increasingly out of direct access through the various tiers involved: from servers and fast primary storage via slower storage to backup mechanisms and archiving.]
… using software to classify data. EMC bought Documentum, a manufacturer of a document management solution (DMS), in order to set up packages for both its old and new customers. ILM can be seen as a continuation of HSM in the world of Unix and Windows. ILM manages data from its origin to its archiving or deletion and stores it on data media of varying speed and power, depending on its individual value. This type of storage management is based on HSM technology and uses company policies to establish an optimal match between data values and the respective storage subsystems. Even if companies are often not aware of it, they all practice some form of ILM. Even those who keep their data for a year or even longer on high-performance and expensive online storage have made a decision regarding the assumed value of their data. But whether such a decision can be justified is not clear, as in the meantime the data could have been saved on cheaper data media.
A similar approach is Tiered Storage, which is basically the same as HSM and ILM but looks more at the combination of IT infrastructure (the hardware basis) and data contents. The stored data moves increasingly out of direct access through the various tiers involved: from servers and fast primary storage (online storage for data that requires immediate business access) via slower storage (nearline storage for data that is only required occasionally) to backup mechanisms and archiving. Such a structure based on the value of the data exists in every company, whatever they may call it.
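How such a tiering policy might look in practice can be sketched in a few lines of Python. This is a minimal illustration only, not a real policy engine; the tier names, age thresholds and file attributes are invented for the example:

from dataclasses import dataclass

@dataclass
class FileInfo:
    name: str
    days_since_access: int
    business_critical: bool

def assign_tier(f: FileInfo) -> str:
    # Map a file to a storage tier according to a simple company policy.
    if f.business_critical and f.days_since_access < 30:
        return "tier 1: online storage (FC/SAS disk)"
    if f.days_since_access < 90:
        return "tier 2: nearline storage (SATA disk)"
    if f.days_since_access < 365:
        return "tier 3: backup (virtual tape library)"
    return "tier 4: archive (magnetic tape)"

files = [FileInfo("orders.db", 1, True),
         FileInfo("slides.ppt", 45, False),
         FileInfo("report_2005.pdf", 900, False)]
for f in files:
    print(f.name, "->", assign_tier(f))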
The recommendation of the SNIA Data Management Forum is to add intelligence to these tiering efforts by integrating them into a broader ILM-based practice. More information at www.snia.org/dmf.
HSM and ILM can be regarded as a high-level, indeed strategic approach which establishes the data backup stages and criteria in a justified sequence. Many manufacturers selling HSM or ILM promised their customers that they would, above all, reduce their storage costs. This refers to the classic storage processes, such as data saving, backup, restore and disaster recovery, which must be classified so that the data can be saved, according to its value for the company, on a storage medium of appropriate price and power. That is more easily said than done: how can someone decide which data should be stored at which stage, on which medium and for how long? [4]
The traditional data hierarchy was split in two: data saved on hard disks with direct and fast access, and backup or archive data saved on cheap tape, which is not in direct access and thus partially bunkered away somewhere. Those who select HSM or ILM as their strategy want to move away from this old concept and save data according to usage, i.e. its direct significance for the business process. Those who plan such a step with specific criteria can save money immediately [5].
Even if this approach is not always accepted [6], HSM and ILM have had an effect: Tiered Storage is today seen by companies as quite normal. A real hierarchy now almost completely dominates the world of storage. The two tiers have now become four or five and, in an ideal situation, they exactly reflect the value of the data on each tier against the corresponding costs. In other words, the hierarchy reaches from expensive primary storage (Fibre Channel and SAS disks) down to less expensive secondary storage (disk-to-disk, D2D) on SATA disks, which still has to be accessed via servers or applications, and then down to different forms of backup: either backup storage on cheap SATA disks, which takes over the function of the older type of tape backup (Virtual Tape Libraries, VTL), or classic backup and archive on magnetic tapes.
Tier 3: Sensitive data, about 25% of the data, moderate response times, SATA disk, IP-SAN (iSCSI), virtual tape libraries, MAID, periodic disk-to-disk-to-tape backups, 99.9% availability, recovery time objective: minutes, retention period: years.
Tier 4: Non-critical data, about 40% of the data, tape, FC-SAN or IP-SAN (iSCSI), 99.0% availability, recovery time objective: hours/days, retention period: unlimited.
HSM, ILM or Tiered Storage require clear-cut and continuous data classification. With non-structured data in particular, this can often only be handled manually, which in turn is much too expensive. The price of the equivalent software on the market, such as Data …
[Figure: tiered storage hierarchy with the key ILM components, a policy engine and a data mover. Mission-critical OLTP data resides on enterprise-class primary disk (99.999% availability, very high I/O and throughput, no scheduled downtime, recovery time objective in milliseconds, protected by mirroring, replication and CDP with synchronous and asynchronous remote copies). Vital, sensitive application data follows at 99.99% availability. Operational, fixed-content and backup/recovery reference data sits on secondary storage such as SATA disk and virtual tape (99.9% availability, less than 5 hours scheduled downtime per year, point-in-time snapshots and deduplication, recovery in seconds). Reference and archive data, e.g. video, medical and government-regulated fixed content, goes to tape, VTLs, SATA/JBOD and MAID for long-term retention (99.0% availability, 10 or more hours scheduled downtime per year, recovery in hours to days, remastering). The amount of data per tier grows, and the probability of reuse and the value of the data fall, with the average days since creation (from 0 to 30+ days).]
It is increasingly important to understand that the value of data changes throughout its lifetime.
Therefore, where data should optimally reside and how it should be managed changes during its
lifespan.
Section 3
In the early days of electronic data processing, which took place exclusively on mainframes, punch cards were originally used to save data. This was followed in 1952 by magnetic tapes. Both methods were based on binary data (consisting of two digits: 1 or 0), either by punching or not punching a hole in paper cards or by magnetizing or not magnetizing spots on the tape. This form of storage is still fundamental today, as computers can only handle information which has been reduced or converted to such a binary system. In other words: this numbering system consists of just a 1 and a 0, because the heart of the computer, namely the processor, can only operate on such a basis. This means in turn that data storage on this basis is inherently somewhat insecure.
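As a small illustration of this binary principle, the following Python snippet shows how a piece of text is reduced to the ones and zeros that are ultimately represented as punched holes or magnetized spots (a sketch only; real media add encoding and error-correction layers on top):

# Every stored character is ultimately a pattern of two states: 1 or 0.
for ch in "IT":
    bits = format(ord(ch), "08b")  # 8-bit binary code of the character
    print(ch, "->", bits)          # I -> 01001001, T -> 01010100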
Magnetic tapes were fast and could save what was in those days a large amount of data, namely five megabytes (= 5 million characters, corresponding to several books or the complete works of Shakespeare; see the overview in section 1). But as early as 1956 an alternative data medium started to appear: the magnetic disk, the forerunner of the modern hard disk. The first data medium of this type consisted of a stack of 51 disks with a diameter of 60 cm (IBM RAMAC). Hard disks as known today have several rotating platters arranged above each other in an airtight housing. In contrast to magnetic tapes, where data is written and read sequentially, which slows down access and retrieval, the data is written to coated disks. A read/write head is moved by a motor above these disks and can skip to any position required. In contrast to sequential storage, this type of access is known as random access and is much faster.
[Figure: worldwide hard disk shipments by interface (Parallel SCSI, SAS, Fibre Channel, ATA/SATA) up to 2009, in percent. According to Gartner Dataquest, the SAS interface is evolving into the number one disk technology, while the previously dominant Parallel SCSI is sinking into insignificance.]
The first drives had no electronics of their own and were completely managed by an external controller. Today's standard hard disks are based on standards such as IDE/EIDE (Integrated Drive Electronics and Enhanced Integrated Drive Electronics) and ATA (Advanced Technology Attachment), which come from the consumer sector, or are known as SCSI disks (Small Computer Systems Interface), specially developed for enterprise use. Many different devices could be connected to a SCSI controller, from hard disks to scanners. The parallel data transfer rate was much higher than with the previous sequential transport methods. Since 2001 the ATA development SATA (Serial Advanced Technology Attachment) has become more widespread; here the data is no longer transferred in parallel but serially.
SATA hard disks now provide competition for Fibre Channel hard disks, as today they have a high degree of reliability and have fallen in price. Fibre Channel technology as a whole has been regarded as particularly powerful for enterprises since the introduction of storage area networks (SANs); FC disks read reliably and quickly. SAS disks (Serial Attached SCSI) today play a significant part in this professional sector, as they are gradually replacing SCSI disks. As they are compatible with SATA, both can be installed together in a joint array, which can result in tier 1 and tier 2 being combined within one single device.
[Table: hard disk types compared. Source: Horison Information Strategies]
Optimized for: Fibre Channel, SAS and SCSI – online storage and transaction data; SATA – low-end file storage.
Rotation speed: 10,000/15,000 rpm for Fibre Channel, SAS and SCSI; 7,200 rpm for SATA.
Seek time: 3–4.5 ms for Fibre Channel and SAS; 3–4 ms for SCSI; 8–10 ms for SATA.
Typical average access time: 5.5–7.5 ms for Fibre Channel, SAS and SCSI; 13–15 ms for SATA.
Power-on time (hours x days): 24 x 7 for Fibre Channel, SAS and SCSI; 10 x 5 for SATA.
I/O duty cycle: high for Fibre Channel, SAS and SCSI; low for SATA.
MTBF: more than 1.4 million hours for Fibre Channel, SAS and SCSI; 600,000 hours for SATA.
Maximum bus speed: 4 Gbit/s for Fibre Channel; 3 Gbit/s for SAS; 3.2 Gbit/s for SCSI.
Interactive error management: yes for Fibre Channel, SAS and SCSI; no for SATA.
The technical options of the various hard disk types have not yet been fully exploited, and SATA will probably expand further in professional storage, although it will in turn be replaced by SAS in nearline storage. The advantage of Fibre Channel is that, in addition to better equipment with internal microprocessors for mechanical and error control, it can be positioned far away from other devices in the storage network (up to 10 kilometers, whereas SCSI only reaches 25 meters). This was decisive in setting up Storage Area Networks (SANs) since the end of the nineties, as decentralized locations, such as buildings on extensive factory premises or within a town, could be connected to each other via the storage network. The use of IP protocols for storage networks has since extended the range of FC and SCSI/SAS hard disks to cover very large, global distances.
Only powerful FC, SAS and SCSI disks are used in online storage as part of the data and storage hierarchy [1]. Solid State Disks (SSDs), already installed by some manufacturers in their storage systems, are becoming significant as a kind of second cache (RAM) due to their high access rates. As they have no mechanical parts, they have a lifespan that is longer than that of classic hard disks. But they too reach their end: the SSD lifecycle comes to a conclusion after 10,000 to 1,000,000 write accesses [2], according to manufacturer specifications.
[Figure: annualized failure rate (AFR, in percent, up to 8%) of hard disks by age, from 3 months to 5 years. Not only the age of the disks is reflected in the annual failure rates, but also the different disk types.]
Even if the disks last somewhat longer than a three-year cycle in particular situations, it is advisable to make the change and recopy the data in good time. How significant are the (constantly falling) price of new disks and the administration hours involved in comparison to a self-inflicted data catastrophe whose costs could damage a company beyond belief?
[Table: milestones in hard disk capacity]
Company   Model               Year   Formatted capacity
IBM       350 RAMAC           1956   4.4 MB
IBM       1301                1962   21.6 MB
IBM       2302-3              1965   112 MB
STC       8800 Super Disk     1975   880 MB
IBM       3380                1981   1,260 MB
Seagate   Barracuda 180       2001   181.6 GB
HGST      7K500               2005   500 GB
HGST      7K1000              2007   1 TB
Seagate   Barracuda 7200.11   2008   1.5 TB
But what would happen if, during recovery from a disk failure, the spare disk or another disk in the array were also to fail? For such cases RAID 6 keeps a second parity calculation ready, which becomes active when a second disk fails. In such a situation, the performance of the controller drops by more than 30% compared with a simple RAID 5 failure.
Most manufacturers recommend a specific RAID configuration for their systems or applications. For example, Oracle recommends a combination of RAID 1 and 5 for its database in order to increase performance. RAID 3 is more suitable for video streaming, and NetApp has selected RAID 4 for its NAS filers, as very fast read/write actions can occur on the disks [4].
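The parity principle behind RAID 5, and in doubled form behind RAID 6, can be illustrated in a few lines of Python: the parity block is the bitwise XOR of the data blocks, and any single lost block can be recomputed from the surviving blocks. This is a simplified sketch; real controllers stripe data and rotate parity across the disks:

from functools import reduce

def xor_blocks(blocks):
    # Bitwise XOR over equally sized byte blocks, the basis of RAID 5 parity.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

disks = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks on three disks
parity = xor_blocks(disks)            # parity block on a fourth disk

# Disk 2 fails: its block is reconstructed from the other disks plus parity.
recovered = xor_blocks([disks[0], disks[2], parity])
assert recovered == disks[1]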
Storage networks must also be protected against misuse. Today's SAN and NAS infrastructures unfortunately have only rudimentary security mechanisms, both at the Fibre Channel level and on an iSCSI basis. They thus frequently do not meet the security policy requirements of the company's IT.
Zoning in FC-SAN switches ensures that access control is enforced for each storage system. This zoning can be run on a hardware or software basis. Soft zoning means that devices only receive information about those systems with which they are meant to exchange data. Hard zoning means that the hardware checks all packets and forwards them only to the permitted addresses. LUN masking is the function in an FC-SAN which makes only those storage areas visible to an application which the latter needs to carry out its tasks.
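A toy model may make the effect of soft zoning and LUN masking clearer. All device names and LUN numbers here are invented; real zoning is configured in the switch and LUN masking in the storage array, not in application code:

# Soft zoning: an initiator only learns about targets that share a zone with it.
zones = {
    "zone_db":   {"server_db", "array_port_1"},
    "zone_mail": {"server_mail", "array_port_2"},
}

def visible_targets(initiator):
    return {member
            for zone in zones.values() if initiator in zone
            for member in zone if member.startswith("array_")}

print(visible_targets("server_db"))       # {'array_port_1'}

# LUN masking: within a visible port, only selected LUNs are exposed to a host.
lun_masks = {("array_port_1", "server_db"): {0, 1}}

def visible_luns(port, host):
    return lun_masks.get((port, host), set())

print(visible_luns("array_port_1", "server_db"))   # {0, 1}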
With IP-SANs on an iSCSI basis, IPsec is used for authentication and for securing data streams, for example via encryption [5].
Mirroring (the RAID 1 principle, where one disk is an exact mirror of another) can also be applied to an entire storage system. An identical server can be positioned at a second location, possibly several kilometers away, and a storage landscape can be set up there as a contingency data center. All the data is constantly transferred from location A to location B so that the same data exists at both locations. In the event of a catastrophe, productive IT, including the stored data, is switched from A to B. As everything is redundant and mirrored, IT operations can be continued.
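Reduced to its essence, such a synchronous mirror acknowledges a write only after both locations have stored it, so both sites always hold identical data. A toy sketch, with two in-memory dictionaries standing in for the storage systems at locations A and B:

class SynchronousMirror:
    def __init__(self):
        self.location_a = {}   # primary data center
        self.location_b = {}   # contingency data center, kilometers away

    def write(self, block_id, data):
        self.location_a[block_id] = data   # write at the primary site
        self.location_b[block_id] = data   # replicate before acknowledging
        return "ack"                       # confirmed only after both writes

mirror = SynchronousMirror()
mirror.write("block-17", b"booking data")
assert mirror.location_a == mirror.location_b   # both sites are identical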
The software elements of data backup are logically based on processes for backup and restore, Continuous Data Protection (CDP) and snapshots. (See section 5 for more details.) All these processes have their origin in the basic problem of storing data on electromagnetic media, such as hard disk or tape (and likewise DVD or Blu-ray): despite all the advantages of such technology, the stored data could suddenly disappear into thin air! An old medium such as paper can be longer lasting, and proven methods exist against dangers such as fire or natural catastrophes. Protecting electronic data media on a permanent basis against failure or damage, however, remains a tricky and never-ending story to which IT managers must give their full attention. There is no silver bullet.
Section 4
Hard disks that are installed in servers and PCs, or are directly connected to the servers in storage arrays, are still the most widespread structure in small to medium-sized companies; this setup is known as Direct Attached Storage (DAS). Small to medium-sized businesses have discovered the productive power of IT and use it for their business processes. At the same time, however, their financial resources limit their investments in an IT infrastructure of their own.
[Figure: distribution of storage topologies, with SAN at around 60%. Source: IDC, 2008. Even though SAN determines the topology of storage systems as a whole, the share of DAS is still large, particularly in small and medium-sized businesses.]
Furthermore, fewer experts are available, and those cannot specialize in the same way as their colleagues in large-scale companies, who each only have to support a small part of the IT. Consequently, small to medium-sized businesses do not follow every trend and concentrate on the basics. Although keeping a DAS structure does not meet the state of the art of current storage technology, it can be used as the starting point for a gradual transformation.
But what does DAS really mean? Users who connect one or more storage arrays per server have a dedicated, exclusive storage array for precisely the application that is installed on that server. This silo structure may become somewhat complex with time and use a great deal of space and energy in the server room or data center, but it is ultimately easy to monitor and manage. The disadvantage is obvious: if not split into several partitions, each directly attached storage unit is, just like the individual server, not used to full capacity. In the server, the superfluous capacity and computing power is held in reserve for peaks on special occasions, such as internal monthly billing or external accesses for web-based purchase orders in the pre-Christmas period; in storage, it is reserved for the corresponding write and read operations. In other words, there is a gap between the investment made and the benefit achieved. According to analysts, individual servers only run at about 15 to 20% capacity, and for storage the average value is about 40%.
In other companies, where a client/server infrastructure was implemented via the internal network, storage structures were provided in which several server units had access to the same storage arrays, but separately for mainframes and open systems (Unix computers, and later also Windows servers). However, the amounts of data to be moved in the local area network (LAN) became increasingly large, which was to the detriment of the transfer speed and caused data losses. Networks based on the Internet Protocol (IP), which had originally only been developed for the transport of messages [1], thus reached the limits of their capacity.
[Figure: PC clients connect via an Ethernet LAN to servers, which in turn access disk storage and tape storage over a storage area network. A storage area network (SAN) constitutes a separate storage infrastructure which is intended solely for data transport.]
In an FC network, special switches are given the task of connecting storage arrays with servers and also with each other. A switch works as a kind of multiple socket, to which various devices can be connected. [2] In contrast to the widespread image of Fibre Channel as difficult to set up and manage, specialists describe it as easy to handle. This is, for example, the opinion of Mario Vosschmidt, Technical Consultant with the American IT manufacturer LSI.
In its early days this Fibre Channel architecture was particularly linked with the Californian company Brocade, which was founded in 1995. Today, this company is the market leader in FC switches, which work as the nerve center of a SAN and were equipped with more intelligence in the course of their development. This means that such switches can take on tasks within the network, such as zoning or virtualization. With their help it is possible to set up a fabric, a structure that forms the core of a SAN.
One particular aspect is the configuration of different storage zones (zoning). The administrator can define which devices and data are, and which are not, to be connected with each other. This serves to protect against unauthorized access both inside and outside a company. If a SAN is extended, additional switches and zones can be set up, depending on the availability of ports (connections for cables). The name Director has become widely accepted for larger FC switches with at least 128 ports. Brocade has taken over several providers (McData, CNT and Inrange) who were greatly involved with directors [3]. The intention of the manufacturer with these purchases was to strengthen its market position vis-à-vis Cisco.
Cisco, the worldwide leader in Ethernet switches, has also had Fibre Channel solutions in its portfolio for several years and has thus positioned itself as a competitor to Brocade in Fibre Channel. Not for the first time in the history of information technology are the cards being re-shuffled between the companies involved, a recurring development that is being accelerated by a forthcoming new technology: currently, Fibre Channel over Ethernet (FCoE) is an attempt to bring together the separate networks for message transport (Ethernet or TCP/IP) and data storage (Fibre Channel and iSCSI) to again form a common network. IP-SANs on an iSCSI basis would already be in a position to do this, but the communication and storage transport networks are mostly kept separate for performance reasons.
A new FCoE network calls for standards and agreement between the various providers. However, before this is finally the case, hard-fought conflicts over market positioning will rage. Every manufacturer wants to be involved in FCoE, even if it means switching over almost completely to new products. Some providers have obviously still not forgotten that a previous rival technology of Ethernet, named Token Ring, lost the race because the manufacturers behind it concentrated too much on their core product and thus ultimately did not keep up with the competition [4].
The historical achievement of FC-SANs, which are the prevailing storage infrastructure in large companies and institutions today, consists in providing efficient, fast transport services that are less susceptible to errors. Although the technology is simple in comparison with a classic network, problems still frequently occur in practice because server and network administrators are too unfamiliar with data storage. Compared with Ethernet, Fibre Channel has ultimately remained a niche technology, in which there is still a lack of standards in many places. And over the last few years false expectations have in part been raised, because the obstacles (and prices) for FC training courses were set too high. The storage arrays attached to the SAN are mostly managed via the tools supplied by the manufacturers, which in turn only require a short familiarization period and are directly supported by the suppliers [5].
[Figure: in DAS, application, file system and disk storage reside together on one system; in NAS, the network sits between the application and the file system with its disk storage; in a SAN, the network sits between the file system and the disk storage. Source: Fujitsu Siemens Computers. Every topology pursues a different concept, but the goal is the same: protecting application data.]
[Table: SAN vs. NAS compared. SAN: Fibre Channel – complex, expensive, a closed but secure system; optimized for drives; carries all types of data; shares drives and resources; provides storage for servers (data center); supports all drive types. NAS: IP – simple implementation, cost-effective, an open system in which security must be borne in mind; TCP/IP is fast but carries very high overhead (up to 40% of net throughput); simple implementation, open and fast communication over long distances; carries only files; shares files and stored content; provides storage for clients (workgroups); supports only disks.]
… so the files and their contents and structures cannot be directly accessed, either. All PC users know that their data is archived in certain files and folders and thus has a logical structure. They also know that their data is ultimately scattered over the hard disk: after lengthy use the files are fragmented (distributed, torn apart), because with each storage process free blocks are occupied first, regardless of the context of the file content. Consequently, the operating system needs an increasingly long time to open files. First of all, the various blocks have to be found on the physical level and consolidated into one entity that is visible to the user. By using the Defragment command, the suffering Windows user puts things in order again on the hard disk, at least for a while.
In a NAS the focus is placed on the network functions [6] and less on the performance of the hard disks used. Many users consider it a lower-cost alternative to a SAN. Which version a company decides in favor of depends on a great many factors, some of them perhaps very individual. David Hitz, one of the founders and now Executive Vice President of Engineering at NetApp, expressed a frank opinion in an interview: NAS and SAN are like two flavors of the same ice cream. NAS is chocolate-flavored and SAN is strawberry-flavored. And everything the customer needs to know about the two technologies is only that both systems can be used at any time for data storage. What intelligent person would be disturbed by the fact that someone does not like chocolate-flavored ice cream, but prefers strawberry? [7] This somewhat flippant statement can also be interpreted in such a way that companies with SAN and NAS have two storage architectures to choose from, which can be individually adapted depending on their requirements. No-one needs to have any reservations.
[Table: DAS, NAS and SAN compared. DAS: not networked; up to 15 devices; copper cabling; low performance; poor scalability; maximum distance 25 m; SCSI protocol. NAS: networked; copper cabling; high performance; moderate scalability; Ethernet. SAN: networked; up to 126 devices; fiber-optic (glass) cabling; very high performance; very good scalability; distances up to 10 km; FCP.]
A third version has been under discussion for some years now: iSCSI networks for storage units (also known as IP-SANs) evidently overcame their lengthy introductory phase a year ago and have achieved significant sales figures. The attraction of this architecture is its ability to use the existing TCP/IP infrastructure for data storage. This makes the installation and maintenance of a second infrastructure (set up only for storage) superfluous, and administrators can fall back upon their existing IP know-how. In practice, however, there have been greater obstacles in integrating the various tasks of the LAN and the iSCSI storage network. Nevertheless, new prospects result from the new transfer speed of 10 Gbit/s for Ethernet, because this technology is currently faster than Fibre Channel, which at present only reaches 8 Gbit/s. However, customers incur additional costs due to the new cabling that becomes necessary. In the meantime, it is generally assumed that an iSCSI infrastructure is mainly suited to small and medium-sized companies and has found its true position there.
Section 5
The penetration of society and the economy by IT has only just begun. More and more parts of daily life, from communication and information procurement right through to healthcare, are dominated by IT systems, and economic processes today depend on electronic support in almost all branches of industry and in companies of all sizes. This interdependence of business processes and electronically generated and processed information makes it absolutely necessary for companies of all magnitudes to ensure secure data storage.
[Figure: causes of data loss. Hardware or system malfunction: 44%; human error: 32%; software corruption or program malfunction: 14%; computer viruses: 7%; natural disasters: 3%. Source: Fujitsu Siemens Computers. Software errors and viruses are relatively rarely the cause of data loss; most faults are caused by the hardware or system, followed by human error.]
In addition to the aforesaid external influences, the direct reasons for data loss are software errors, which impair data integrity or can even cause entire systems to fail, as well as various hardware errors, which can range from power supply units and processors [3] via hard disks [4] to other components, and even to redundant assemblies (on a double or multiple basis) such as hard disk arrays. Added to these are user and operating errors, which even proficient administrators can make, although a great deal in this regard disappears under a dense veil of silence. Which IT department and which company gladly admits to having done something wrong?
Quite apart from looming disasters, it is frequently these technical errors of everyday IT, or simply the end of the life-span of the components and media used, whose sudden expiry can also mean the demise of the data stored upon them. A separate branch of industry, which has become greatly consolidated over the course of the last few years, looks after the recovery of data from storage systems of various types [5].
In order to specify the durability of hard disks, their capacity and expected life-span are expressed in the term MTBF (Mean Time Between Failures), which is an extrapolated value for the probable failure of a drive. Many manufacturers specify values of up to one million hours and more for high-end drives, which would mean a biblical life expectancy of 114 years (= one million hours). The underlying tests assume a very high number of parallel disks, on the basis of which possible failure rates are calculated. However, the implied optimal conditions are in practice the exception, so real failure rates can be much higher. Redundancy through disk arrays (see section 3) and sophisticated backup mechanisms have to guard against this. Since the term MTBF has, on account of its inaccuracy, increasingly come under criticism, other units of measurement, for example AFR (Annualized Failure Rate), are used today. AFR is established in the same way as the MTBF, but specifies the anticipated annual failure rate as a percentage of the installed number of disks. If 8.7 out of 1,000 disks fail in a year, the annual failure rate or AFR is 0.87% [6].
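The arithmetic behind these two metrics is simple enough to reproduce (a sketch in Python, using the figures from the text):

HOURS_PER_YEAR = 8760

def mtbf_to_afr_percent(mtbf_hours):
    # Approximate annualized failure rate implied by an MTBF value.
    return HOURS_PER_YEAR / mtbf_hours * 100

print(round(1_000_000 / HOURS_PER_YEAR))        # 114 years of "life expectancy"
print(round(mtbf_to_afr_percent(1_000_000), 2)) # 0.88% AFR for 1 million hours MTBF

failed, installed = 8.7, 1000                   # the example from the text
print(failed / installed * 100)                 # 0.87% AFR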
The average life-span of hard disks is at present three to five years, in individual cases even longer. Companies should only rely on longer periods if they use automatic error controls with direct notification of the manufacturer's service, which, depending on the stipulated Service Level Agreement (SLA), ensures a replacement before the disk finally fails. As capacity increases, disks become more and more inexpensive (in summer 2008 Seagate announced a 1.5 TB disk), and the magnetic tapes that are still used in storage are becoming more efficient. For example, their throughput has in the meantime risen to more than 500 MB/s, while capacities are also clearly on the increase and now lie at about 1 TB (LTO-4). The life-span of magnetic tapes is specified as up to 30 years for DLT/SDLT and LTO and is thus clearly beyond that of hard disks and solid state disks [7].
[Figure: RPO and RTO surrounding a disaster event; the gap between them is narrowing. RPO = Recovery Point Objective (data loss), RTO = Recovery Time Objective (downtime). Source: Fujitsu Siemens Computers. RPO: the amount of data which has to be recovered after a data outage in order to be able to resume business as usual (measured in time). RTO: the maximum recovery time which can be tolerated before business must be resumed.]
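Expressed in code, RPO and RTO are simply two time spans around the disaster event (a sketch with invented timestamps):

from datetime import datetime

last_backup = datetime(2009, 1, 12, 2, 0)    # last successful backup
disaster    = datetime(2009, 1, 12, 14, 30)  # the outage occurs
back_online = datetime(2009, 1, 12, 18, 30)  # business resumes

print("RPO (data loss):", disaster - last_backup)   # 12:30:00
print("RTO (downtime): ", back_online - disaster)   # 4:00:00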
On the target
Deduplication takes place on the storage medium itself, which helps keep the data volume to be stored to a minimum, but the entire storage process is extended as a result.
At source
In addition to the low data volume, the other advantage here is that the reduced quantity of data can be transferred more quickly. This is relevant for branch offices, because in some cases only analog data links are available there. (A minimal sketch of the underlying mechanism follows below.)
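The mechanism is the same in both cases: data blocks are fingerprinted, and a block whose fingerprint is already known is stored only once and referenced thereafter. A minimal sketch of hash-based block deduplication (simplified; real products segment streams into variable-length chunks and handle collisions and garbage collection):

import hashlib

store = {}        # fingerprint -> unique block (the deduplicated pool)
stream = []       # the logical data stream, as a list of fingerprints

def dedup_write(block: bytes):
    fp = hashlib.sha256(block).hexdigest()
    if fp not in store:
        store[fp] = block   # physically store only previously unseen blocks
    stream.append(fp)       # the logical stream always grows

for block in [b"mail body", b"attachment", b"mail body"]:
    dedup_write(block)

print(len(stream), "blocks written,", len(store), "blocks stored")  # 3 written, 2 stored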
[Table: service classes and their service levels. Class 1 (Mission Critical): > 99.99% availability, downtime < 1 h/year or < 1 h/month, recovery time objective < 15 minutes, recovery point objective of seconds. Class 2 (Business Critical): 99.9% availability, downtime < 10 h/year or < 2 h/month, recovery time objective < 1 hour, recovery point objective of seconds. Class 3 (Business Important): 99% availability, downtime < 8 h/month, recovery time objective of hours, recovery point objective < 4 hours. Class 4 (Non Critical): 97% availability, no commitment or only intermittent operation, recovery time objective of up to 96 hours, recovery point objective of 24 to 48 hours. Further targets per class range from 97%, 95% and 90% down to unspecified, and from < 30 minutes via < 45 minutes and < 2 hours to not specified.]
Unlike a backup, which is intended for restore, in an archive you should be able to find data relatively easily and quickly without having to perform a restore.
The copies of the original data mirrored onto the backup media are usually only stored for a short period of time and are permanently replaced by more up-to-date copies from subsequent backup runs. As they are not actually intended for use, only a few specific requirements arise, such as currency or completeness. In the event of a disaster, all information from before the oldest existing backup and after the most recent one is irretrievably lost.
However, these data backups also do not need to be kept for a long time, because the copies are only needed in the event of a disaster. You will seldom have to restore a data backup that is a few months old.
Archiving is an altogether different matter. This storage procedure, which takes place when information leaves the productive process, is about preparing data on a separate medium and making it available for later use. Archiving media are not put to one side like backup tapes, whose use is only considered in extreme cases. Whoever archives has reuse at some later point in time as his objective.
Archiving can be done of one's own free will (the bandwidth ranges from precautionary storage for as yet unforeseeable purposes right through to sheer acquisitiveness) or because legislation, banks or other institutions have issued mandatory regulations or at least recommend archiving. Whoever does not want to use the archived data later, and also does not have to comply with regulations concerning any possible resubmission, should consider deletion, which saves resources, space and costs.
Section 6
The confusion in storage management software can possibly be overcome by keeping to the tools supplied by the manufacturers. Integrated configuration tools, web tools or component managers help at the start but do not cover the entire planning of the overall architecture and its elements. The large storage management suites for administering the entire IT infrastructure, as offered by some manufacturers, require a great deal of experience and are thus, on account of their complexity, only feasible for large-scale companies.
Planning and monitoring of storage installations should be set up systematically right from the beginning and include a constantly updated record of all phases, stages and modifications. This is necessary as otherwise chaos would ensue when staff change jobs. The IT Infrastructure Library (ITIL), with its standard requirements and procedures, is an important aid in documenting such processes. Such standards are also of assistance in discussions between manufacturers and customers or during company mergers,
when different IT worlds have to be united. ITIL comprises a series of publications which are aimed at helping a company to structure its IT and business processes.
For example, Change Management describes the procedure for the continuous management of modifications, which replaces simple but irregularly updated Excel tables and which is based on ITIL and the management tools supplied by the respective manufacturer. It is thus possible to avoid, right from the start, any storage wilderness with all types of products that are only flimsily interconnected. Such a lack of clarity results in errors and failures, and thus extra overnight work for employees. However, if standard solutions or Storage out of the Box are used, overall management is much simpler because suitable software is already in place.
Medium-sized companies have fewer financial resources, which can result in them using quick but not fully tested implementations for long periods, longer than is financially viable. It is no coincidence that it is these companies which choose technologies that they know, or at least seem to know: DAS is well established among SMEs today, and iSCSI is widespread here thanks to its close relationship with LAN architecture. As long as these are well-tested and proven solutions, such cautious behavior is certainly not wrong; it just makes the company less flexible than its larger competitors. The latter can afford well-trained employees and longer planning phases, which enables them to test new and more effective technologies for a longer period and then apply them productively. This particularly applies to information management, which requires both investment and know-how. Medium-sized customers can usually not afford most of the tools used in this sector. Their purchase would also hardly be sensible when compared with the data quantities that have to be managed.
Data management is thus frequently without good planning and basically chaotic. As a range of Windows licenses already exists, together with unused servers, non-structured data such as Office documents tends to be saved in a non-systematic manner, with all the ugly consequences regarding classification, indices and search options for such documents. Larger companies are one step further and provide dedicated servers in a SAN. As a SAN has already been configured, it probably also holds data which could be saved more cheaply on other storage levels with somewhat less performance; that would be chaos at a higher level. When the IT department realizes the mix-up, it usually adds its own negative element: additional NAS filers are now procured so that the non-structured data can be saved suitably. And then at some point the question arises about merging or integrating the various stand-alone storage solutions, in which sometimes the block level (SAN) and sometimes the file level (NAS) is the dominant element. These, too, have their own solutions, which in turn require more investment and basically add yet another layer of complexity to the overall storage architecture.
Storage Management
(1) Storage Resource Management: SRM (Storage Resource Management) initiatives began in earnest in the late 1990s. This was a Unix, Windows and later a Linux
market which held great promise 2-4 years ago but has faded in recent years. There
were several reasons why the 20+ SRM companies faded and lost momentum:
1. SRM products had a hard time moving from a reactive, reporting tool to a proactive tool that could make decisions and take actions based on user-defined
policies,
2. SRM products were mainly homogeneous, thus failing to provide support for
heterogeneous environments, and
3. SRM products only dealt with disk space allocation and lacked any insight into
disk performance issues. SRM users were worn down with all the alerts and
decisions that they had to perform manually.
Today's reality is that organizations will need to integrate a variety of vendor and homegrown tools. Storage organizations must accept that the structure of storage is going to be split up by vendor and type of array and that, organizationally, minimizing the number of vendors and storage pools is one way to reduce storage administration overheads.
(Fred Moore, Horison Information Strategies)
(2) Information Management
(Information Lifecycle Management) The discipline and function of oversight and
control of information resources.
Information management services: The processes associated with managing information as it progresses through various lifecycle states associated with a Business
Process. These services exploit information about data content and relationships in
making decisions. Examples include records management and content management
applications.
(SNIA Dictionary)
The consequence is that, due to such structures, storage management becomes difficult, as problems, failures, hardware and software faults cannot be immediately identified. Counter-measures are usually taken too late in such situations. However, monitoring, reporting and constant error analysis could be implemented with software support, whereby such tools are usually supplied by the manufacturers. If many different components are used, storage management is often faced with the problem of controlling such heterogeneity.
[Figure: IDC taxonomy of the storage market. Storage hardware comprises storage systems (disk systems, tape libraries, optical jukeboxes) and storage mechanisms (HDDs, tape drives, optical drives), plus removable media. Storage software comprises data protection and recovery, storage management, archive and HSM, storage replication, file systems, storage device management and other storage software. Storage services, the storage infrastructure services, comprise consulting, implementation, management and support. Source: IDC. According to IDC, storage can be divided into three main groups, to which various fields are assigned; storage management is only one of many areas.]
Among the elements of management are the sufficient supply of storage to users, departments and applications, as well as provisioning. The latter is understood as providing, and if necessary procuring, the required storage on time: sufficient storage arrays and hard disks must be available to meet all requirements. If, for example, seasonal peaks such as the Christmas business must be taken into account, this would mean low system usage during the other periods, which could lie 20 to 30% below the actual capacity. The technology known as Thin Provisioning has been developed to counteract this problem. These are procedures which plan the flexible, changing assignment of storage to various applications: capacities are assigned and withdrawn as required [1].
Stranded storage is storage which was originally provided for specific applications or users but never used after all. The objective of thin provisioning is basically to put stranded storage capacities back to use. Thin provisioning also uses virtualization [2], via which different physical hard disks and disk arrays are combined into logical or virtual units. This enables a very exact assignment of storage space, without having to consider the maximum and minimum physical disk limits.
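The mechanics can be sketched in a few lines of Python (a deliberately simplified model for illustration only; the class and all names are invented here and do not correspond to any particular product):

```python
# Simplified model of a thin-provisioned storage pool (illustrative only).
class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb      # real disk capacity in the pool
        self.used_gb = 0                    # capacity actually consumed
        self.volumes = {}                   # volume name -> promised logical size

    def create_volume(self, name, logical_gb):
        # The promised (logical) size may exceed remaining physical capacity:
        # capacity is only consumed when data is actually written.
        self.volumes[name] = logical_gb

    def write(self, name, gb):
        # Physical capacity is allocated on demand, at write time.
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("pool exhausted - administrator must add disks")
        self.used_gb += gb

pool = ThinPool(physical_gb=1000)
pool.create_volume("mail", logical_gb=800)   # promised sizes exceed ...
pool.create_volume("sql", logical_gb=800)    # ... physical capacity (1600 > 1000)
pool.write("mail", 120)
pool.write("sql", 200)
print(pool.used_gb, "GB of", pool.physical_gb, "GB physically in use")
```

The point of the model is that the sum of the promised logical sizes may exceed the physical pool; monitoring the actually used capacity tells the administrator when real disks must be added.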
The IT department should remain independent, in order to free itself of any blame and to be able to point to the manufacturer who has forced the user to adopt a specific SAN, NAS or iSCSI infrastructure. One Fujitsu Siemens customer has gone its own way and deliberately selected and configured separate stand-alone solutions for different architectures: a SAN for mainframes including mirroring to a second (backup) data center, a virtualization solution for SQL and mail servers based on Windows and Linux, and finally NAS for FlexFrame for SAP with Fujitsu Siemens servers and NetApp storage. All three sectors are managed separately; the extra effort this requires guarantees that other areas are not automatically affected in an error situation. In other words: the often vilified silo solutions can certainly make sense in certain customer situations, just like DAS systems.
This approach can also be seen in the use of software: it is often the case that too many tools from various sources are used instead of relying on a single provider. Whereas storage systems such as SAN, NAS or IP-SAN can be configured with the supplied software, a central tool should be used for the subsequent administration, thus enabling end-to-end management. Almost all storage manufacturers today offer appropriate programs.
IP-SANs build on familiar Ethernet technology, but it would be a mistake to believe that storage data can simply be moved around on top of the existing network paths. A separate IP infrastructure with the attached storage systems must be set up in order to have a working IP-SAN [3].
Storage management itself sits above these transfer techniques and consists, first of all, of the tools supplied by the manufacturers, also known as element managers. A web server is installed for easier operation so that administrators can access the tools via a browser. More complex networks are monitored and controlled via monitoring services which connect many storage components such as HBAs (Host Bus Adapters), switches and storage systems via interfaces. Some examples are ControlCenter from EMC, BrightStor SRM from CA or SANPoint Control from Symantec.
In the past it was frequently the case that storage equipment from various manufacturers did not understand each other in the network, as they either did not have suitable interfaces (APIs = Application Programming Interfaces) or were not compatible with those from other manufacturers. In order to make storage networks user-friendly for companies, the SNIA (Storage Networking Industry Association, which includes almost all the storage manufacturers) organized a management initiative to create a standard for all devices. Many years of work were spent by the SNIA boards before they submitted a standardization proposal known as the Storage Management Initiative Specification (SMI-S). The manufacturers now work much closer together, swap information about their interfaces and ensure mutual licensing. The communication interfaces covered by SMI-S now often have priority over the proprietary APIs.
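To illustrate what a standardized management interface looks like in practice: SMI-S is based on CIM/WBEM, so a management tool can, in principle, enumerate the systems an SMI-S provider exposes with a generic WBEM client. The following sketch uses the open-source pywbem library; the host, credentials and property shown are placeholders, and the classes a given array actually registers vary by vendor and SMI-S version.

```python
# Sketch: enumerate storage systems via a vendor's SMI-S (CIM/WBEM) provider.
# Host, port and credentials below are placeholders for illustration.
import pywbem

conn = pywbem.WBEMConnection("https://smi-provider.example.com:5989",
                             ("admin", "secret"))

# SMI-S providers typically advertise managed systems in the interop namespace.
for system in conn.EnumerateInstances("CIM_ComputerSystem", namespace="interop"):
    print(system["ElementName"])
```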
According to Frank Bunn, SNIA Europe and storage specialist at Symantec, SMI-S has an important long-term influence on storage management: "SMI-S is not an overnight solution. It is a constant process which began as early as 2002. SMI-S creates better interoperability. It enables the standardized management of different products and provides a consistent view of the SAN and NAS environment. Users are often very enthusiastic because they can finally see their entire storage environment reflected in SMI-S. Customers often do not even know which storage equipment they have, let alone the quantity. But that is just the first step. The second step can greatly facilitate the management of storage systems." [4]
As far as Bunn was concerned, the subject of SAN was previously predominantly controlled by the larger companies. Small and medium-sized companies were more skeptical, according to the motto "too complex, too expensive and doesn't work anyway". Bunn: "And they were right more often than not. However, SMI-S makes SAN management much easier. Partners who are not complete SAN specialists can thus implement and support storage networks. SMI-S versions 1.1 and 1.2 take Fibre Channel SANs as well as NAS and iSCSI into consideration, which greatly expands the environment for integrators."
However, the standardization process has not been completed despite the many years of effort. Not every manufacturer implements the adapted interfaces in its devices in a way that is compatible with those of other providers. The mutual licensing process often takes much longer than the subject matter itself really requires. Furthermore, various SMI-S versions are in use. These circumstances have resulted in user acceptance not being particularly high.
Section 7
According to IDC, companies are holding back from using virtualization in their storage environment as they do not regard it as absolutely essential. Storage is often regarded as an unavoidable necessity for data storage, backup or archiving, one that does not contribute in any way towards increasing productivity or improving operational procedures. Yet the experience gathered from successful server virtualization can also be applied to storage. Many users follow exactly this procedure in practice: they expand virtualization to other sectors on a step-by-step basis.
Virtual storage space can help companies use existing storage resources efficiently as well as centralize and simplify administration. It was the x86 sector which drove virtualization forward, as average system utilization there was very poor. To Dr. Joseph Reger, CTO (Chief Technology Officer) at Fujitsu Siemens Computers, this is all very obvious: "Approximately 10% utilization is simply a bad value compared to existing systems such as mainframes or Unix platforms." [1] Of all the options available to improve the situation, hardware virtualization is the best: "The reason is that the superior layers no longer need to take care of this," says Reger. By superior layers he means everything beyond the hardware layer, i.e. operating systems and applications, which also benefit from virtualization.
Reger continues his explanation: "By simulating various machines, pieces of hardware that do not even exist, the result is a peculiar layer of transparency. With this technology operating systems and applications need not even know that they are running in a virtualized environment. Consequently, the average degree of utilization was dramatically increased and a great deal of money was saved." [2]
According to Reger there are, in principle, three large areas in which virtualization technologies can be applied today: hardware virtualization, operating system virtualization and application virtualization. "With the first group it is a matter of pretending as if we had more hardware than is actually available. This applies to servers and storage in the same way. You simulate virtual levels of hardware that do not even exist. Physically, the hardware exists once only; virtually, however, more is made of it. This means that the operating system does not know it is running on virtual instances: from the view of the operating system there really are ten different servers or storage arrays. If the operating system is virtualized, this means that the application thinks there are several instances of the operating system, whereas in actual fact only one is running." [3] By means of thin provisioning it is, for example, possible to have more logical address space available in a storage array than is physically available.
Storage virtualization is in part misused as a term by the IT industry. It originally meant the mapping of storage resources for the servers or applications involved, i.e. the consolidation or regrouping of physical storage units to form logical ones. Today it is usually used for the allocation of arbitrary storage resources, including data replication mechanisms. This covers terms and technologies such as volume management, virtual disk volumes, file systems, virtual tape, virtual ports or virtual SANs. What they all have in common is the approach of separating the physical view from the logical one, i.e. splitting physical storage into partitions or consolidating several physical hard disks into one or even several logical units.
When dealing with virtualization within the storage network itself (SAN device virtualization), i.e. virtualization at the switch infrastructure level, a distinction should be made between three different approaches:
1) With so-called in-band virtualization, the control entity for the data connection, the metadata and the data transport itself are on the same appliance. The scaling of these solutions is determined by the transport performance of this appliance. The providers FalconStor and DataCore were among the first manufacturers to offer such solutions.
2) With out-of-band virtualization, a single appliance takes care of only the metadata and of controlling the data path, while the hosts and servers organize the transport of the storage data to and from the storage devices.
3) The third approach consists of separating the control entity from the data path, which is done by an intelligent network device. This technology is known as Split Path Architecture for Intelligent Devices (SPAID). Switch manufacturers like Brocade and Cisco provide suitable devices for this purpose. The separation of the instances performed here results in increased data transport speed and enables the concept to scale.
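What all three approaches share is a metadata map that translates a virtual block address into a physical array and address; they differ only in which component holds the map and which component carries the data. A minimal sketch of such a map (invented names and granularity, not any real product's format):

```python
# Sketch: the metadata at the heart of SAN virtualization - a map from
# virtual extents to (physical array, physical extent). In an in-band
# appliance this table and the data path live on the same box; with
# out-of-band and SPAID designs, only the lookup is handled centrally
# and the data flows directly between host and storage.
EXTENT = 1024 * 1024  # 1 MiB extents (illustrative granularity)

mapping = {
    ("vdisk1", 0): ("arrayA", 4711),   # virtual extent 0 -> array A, extent 4711
    ("vdisk1", 1): ("arrayB", 17),     # neighbouring extents may sit on
    ("vdisk1", 2): ("arrayA", 4712),   # different physical arrays
}

def resolve(vdisk, byte_offset):
    """Translate a virtual address into a physical array and byte address."""
    extent, offset = divmod(byte_offset, EXTENT)
    array, phys_extent = mapping[(vdisk, extent)]
    return array, phys_extent * EXTENT + offset

print(resolve("vdisk1", 1 * EXTENT + 512))  # -> ('arrayB', 17 MiB + 512)
```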
Such virtualization solutions normally have two goals. The first goal is to remove the constraints of a particular storage array and/or manufacturer. The second goal is to provide manufacturer-independent data services, such as pool building (consolidation of storage capacities of the same service quality) and data replication services such as snapshots, remote mirroring or disaster recovery.
Something all these virtualization solutions have in common is that they permit a coordinated selection of the optimal storage arrays for a certain task. In this way, storage resources can be made available at will and changed dynamically, independently of the storage array. Only the basic configuration of the elements to be virtualized is still performed by the proprietary management applications; the users therefore have to use this element manager together with the selected virtualization software.
The majority of users of virtualization solutions use them as a means of improving storage management. Virtualization increases the users' freedom of choice: solutions from several storage array manufacturers can be used together in a pool. Storage systems are frequently already completely partitioned during first installation and are afterwards only managed via virtualization: "The approach has proved successful for years, particularly for very dynamic environments, for example with service providers or users with a large number of small application islands, such as in public administration. Wherever high scaling is required for thousands of LUNs and hundreds of servers, the use of split-path technology is absolutely necessary." [4]
In contrast to server virtualization, storage virtualization has not yet made its breakthrough, nor is a market standard becoming apparent. This is certainly also due to the fact that with the LUN concept every SAN storage array already provides rudimentary hard disk virtualization. However, increasing importance is attached to online storage virtualization in virtualized server environments [5].
It is a different story entirely with file-based storage systems, the NAS systems. There
are a number of very promising approaches for file virtualization, but a market standard
has not established itself here, either [6].
The most progress can be seen in storage virtualization with magnetic tapes. When it comes to backup on tape, Virtual Tape Libraries (VTL) are at present best practice. CentricStor VT is currently the leading virtual tape product in data centers.
Section 8
Until recently every application had its own fixed, allocated infrastructure: a collection of servers, network components and storage systems. But now the aim is to allocate only those infrastructure resources that the applications actually require. This objective is achieved via virtualization: large resource pools exist which can then be used dynamically depending on the requirements involved. Fujitsu Siemens Computers is moving in this direction as part of its Dynamic Infrastructure strategy, of which storage is an integral part.
Terri McClure, an analyst working for the Enterprise Strategy Group (ESG), concluded the following in a report: "The recently launched CentricStor FS from Fujitsu Siemens Computers is a file-based high-end storage system via which scaling is possible at a very fine level, thus fulfilling capacity, availability and performance requirements as defined by file server consolidation initiatives and Web 2.0 applications. The use of standard components and the excellent cluster features make CentricStor FS a scalable, easy-to-manage file storage solution with a low starting price that has been specially designed for the real world of increased file quantities." [2]
FibreCAT SX systems are user-friendly and easy to put into operation. Administration is uncomplicated thanks to the intuitive web interface, and the systems are suitable for a wide range of applications.
The analyst Hamish Macarthur from Macarthur Stroud International said the following regarding the FibreCAT SX series: "Managing and protecting the information assets of an organization is critical in today's markets. The systems in which the data is resident must be secure, reliable and easy to manage. The FibreCAT SX range supports reliable primary storage as well as the need for faster backup and recovery. The new arrays, with management tools included, will be a sound investment to meet the business and compliance requirements of small, medium and large organizations."
Managed Storage
Growth in storage means above all an immense increase in data quantities which must all be managed, saved, provided and stored. The demand for online storage capacity increases, as does the demand for backup storage volumes. No limits to such growth rates are appearing on the horizon.
Against this backdrop SAP asked itself whether they wanted to continue managing
the required storage volumes themselves or to practice what their own hosting experts
recommend to their customers, namely outsource the work that is not part of their core
competence and concentrate on the important elements essential for their core business.
SAP managers thus placed the operation and support of their process-supporting storage infrastructure into the skilled hands of external specialists. SAP has found such a competent partner in Fujitsu Siemens Computers.
Fujitsu Siemens Computers took on the role of general contractor for SAP and the entire responsibility for providing online storage capacity for data backup: 4 petabytes monthly at the start, later moving up to more than 200 terabytes daily. Furthermore, appropriate reserve capacities were provided in order to meet any additional requirements in time.
Fujitsu Siemens Computers thus supports one of the largest Managed Storage projects in Europe and also manages the cooperation activities with the strategic partners involved, namely EMC and NetApp, which provide products in the SAN and NAS environment and whose specialists are involved in the corresponding service sector.
Customers benefit even more from the fact that Fujitsu Siemens Computers works together very closely with many partners on storage matters. One example of such excellent cooperation can be seen in the quotes from the CEOs of our partners EMC and NetApp:
Joe Tucci, EMC President & CEO: "The combination of EMC's networked storage solutions and the server-based solutions from Fujitsu Siemens Computers creates a wide-ranging offer of end-to-end infrastructure solutions that meet the requirements of our customers. EMC solutions play a central role in the Fujitsu Siemens Computers vision of a Dynamic Data Center, and we will continue to concentrate our joint operations on offering our customers the most comprehensive solution portfolio available on the market."
Dan Warmenhoven, CEO at NetApp: "The strategic partnership with Fujitsu Siemens Computers has contributed a lot to our success and is still growing in the EMEA region. Our jointly developed solution FlexFrame for mySAP Business Suite, the implementation of a fast backup solution for Oracle and the Center of Excellence which has been set up with Oracle are just some of the excellent examples that have resulted from our cooperation so far."
Forecast
Storage traffic over Ethernet can be handled in exactly the same way as Fibre Channel (FCoE). In short: 10 Gb Ethernet is available, and 40 Gb as well as 100 Gb are on the horizon. However, there are also plans to extend Fibre Channel beyond 8 Gbit/s, even up to 16 Gbit/s. A conversion to FCoE requires a high amount of infrastructure investment, which in due course will be balanced by the savings from operating a single type of network.
Remarks
Section 1
[1] The triumph of large numbers; 10 years of Google; Neue Zürcher Zeitung, April 25, 2008.
[2] Details about typeface development from Charles Panati, The Browser's Book of Beginnings: Origins of Everything Under, and Including, the Sun, New York 1998, page 67 ff.
[3] Der Spiegel, edition dated 11. 8. 2008, cover feature "Addicted to data", page 88.
[4] Many authors have written about the positive and negative effects of this development. For example, Neil Postman, Joseph Weizenbaum, Nicholas Negroponte or Nicholas Carr. See the interview in Hendrik Leitner / Hartmut Wiehr, Die andere Seite der IT: Business-Transformation durch Services und dynamische Infrastruktur, Munich 2006, page 215 ff.
[5] Cf. The following passage from an interview in the German weekly Die Zeit.
Question: Which development has changed the most the way we handle knowledge in
recent years?
Answer: Two. Firstly, hard disks now hardly cost anything. It is no longer a utopian idea to have man's entire published works on disk. Secondly, nobody in the world is more than a day's walk from an Internet café. We now have the communication infrastructure to provide the world's great libraries to youngsters in Uganda or the poorer areas of the USA or Germany. (Interview with Brewster Kahle, Director of the Internet Archive in San Francisco;
Die Zeit, 17. 1. 2008)
Section 2
[1] The fact that major storage manufacturers have been buying up companies, such as Cognos,
Business Objects, Hyperion, Documentum or FileNet, that developed software for document
management (DMS) or business intelligence (BI) proves that storage hardware and the criteria for stored data are merging. It can also be seen as an attempt to integrate classic storage
equipment with ILM or HSM.
[2] Steve Duplessie, File Infrastructure Requirements in the Internet Computing Era, Enterprise
Strategy Group (ESG), July 2008, Page 5.
[3] Fred Moore / Horison Information Strategies, Storage Spectrum (2007), Page 76.
[4] Every fourth user has implemented ILM at a certain point or in a certain area but only
3.5% use such solutions throughout the company. Users, who are seriously looking at such
a topic, must realize that ILM cannot be simply bought as a single product nor implemented via a one-off project. Wolfram Funk, Experton Group (quoted in the German magazine
Computerwoche 46/2006, Page 40)
[5] See also the interview with Michael Peterson in http://www.searchstorage.de/topics/
archiving/e-mail/articles/107136/.
[6] Cf. for example Dan Warmenhoven, NetApp CEO: "My view (regarding ILM) is that a work-intensive data management process should not be set up if there is an automated one. And ILM is very work-intensive. (…) The user wants to migrate his data from expensive storage forms to cheaper ones. And he requires an online archive in order to retrieve the data quickly and also for compliance reasons. NetApp talks of data archiving and migration. Calling all that ILM is confusing." (Interview in Computerwoche, 9. 11. 2007).
Section 3
[1] Cf. Hartmut Wiehr, Disk systems technology and products, in iX extra, edition 12/2007
(free download: http://www.heise.de/ix/extra/2007/ie0712.pdf).
[2] Cf. the forecast in this book, page 69.
[3] There is an early warning system in professional disk systems. Disks report during running
operations as to whether they are going to fail. Drives, which have not been in use for a
while, are addressed periodically to make sure that they still work. Do they react immediately
to a signal or only with the retries? The latter would suggest that the magnetic interface is
no longer 100%. All the corresponding information is collected and one of the spare disks is
triggered by the system if certain threshold values are exceeded. Or else a message is sent to
the service team.
[4] ICP Vortex has provided a white paper about the various RAID levels. The document is in German and can be downloaded from: http://vortex.de/NR/rdonlyres/82BA6504-885D-444E-AC71-7AC570CF56A3/0/raid_d.pdf.
[5] See Günter Thome / Wolfgang Sollbach, Fundamentals of Information Lifecycle Management, Berlin Heidelberg 2007, page 214 ff.
Section 4
[1] For the history of the Ethernet protocol and the Internet, which were developed on behalf
of the US Army and were to ensure the transportation of messages in the event of war see
http://www.informatik.uni-bremen.de/grp/unitel/referat/timeline/timeline-2.html.
[2] For more information about SANs and Fibre Channel see http://www.snia.org/education/
tutorials/ and http://www.fibrechannel.org/.
[3] Seen from a historical viewpoint, CNT first took over its competitor Inrange, but was then
itself purchased by McData shortly afterwards. Brocade now unites all the former FC switch
providers against Cisco, the company with the largest market force in the network sector.
Seen globally, Brocade is still the market leader for FC switches, whereas Cisco has continuously extended its market share, particularly in USA.
[4] Good interpretations of the development of the IT industry are to be found in Paul E. Ceruzzi's A History of Modern Computing, Cambridge and London 2003, and in Clayton M. Christensen's The Innovator's Dilemma, New York 2000.
[5] Cf. Mario Vosschmidt/Hartmut Wiehr, Storage networks and their instruments of administration; in: iX, vol. 8/2008, page 122 ff.
[6] A good overview is to be found in the brochure NetApp Products & Solutions.
[7] Interview with David Hitz in project 57 Journal for Business Computing and Technology,
Special 01/05, page 39 ff. Cf. also the interview with NetApp CEO Dan Warmenhoven in
Computerzeitung, issue 38/2008, page 10.
Section 5
[1] Many storage manufacturers supply their own backup programs together with their hardware. EMC, a partner of Fujitsu Siemens Computers, has had a complete solution (NetWorker) in its portfolio since the takeover of Legato in 2003. Of the independent providers, CA with
ARCserve and Symantec with NetBackup are particularly worth mentioning. CommVault and
BakBone can frequently be found in the Windows environment.
[2] See in this respect Dave Russell / Carolyn DiCenzo, MarketScope for Enterprise Backup/Recovery Software 2008, April 2008.
[3] An example of error corrections is ECC (Error Correction Code). The correction is used to
detect errors during the storage and transfer of data. The errors are then automatically
remedied in the second step.
[4] Hard disks with S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) have a
function that constantly monitors them, with technical values, such as temperature, startup
time or track reliability of the read and write heads being controlled.
[5] Specialist service providers, such as Kroll Ontrack or Seagate Services, can often recover hard
disks damaged by fire and water and thus save the lost data.
[6] See Hartmut Wiehr, Begrenzte Lebensdauer (= Limited life-span): Disks, Status Quo and
Trends; in iX extra, issue 12/2007; download: http://www.heise.de/ix/extra/2007/ie0712.pdf.
Other terms and contexts relating to this topic are explained there.
[7] For more details see www.speicherguide.de: Focus on tape drives and tapes (http://www.
speicherguide.de/magazin/bandlaufwerke.asp?mtyp=&lv=200). The durability of DVDs and
Blu-ray Disc (BD) is subject to very large fluctuations. The market is broad, particularly in the consumer segment. A good overview of the durability of these optical media with due regard to their suitability for archiving purposes can be found in c't Magazin für Computertechnik, issue 16/2008, page 116 ff.
[8] As early as 1987 StorageTek presented the 4400 Automated Cartridge System (ACS), which
was the basis for the 4400 Tape Libraries (Nearline) which were introduced in 1988. The large
PowderHorn Libraries introduced in 1993 were a further development of this technology,
with which very fast access to tape cartridges is possible. A great many PowderHorns are
still in use today, and after StorageTek was taken over by Sun in 2005 the new owner had
to repeatedly extend the maintenance cycles on account of the pressure exerted by major
customers. These customers saw no reason to phase out their tried-and-trusted tape libraries and replace them with follow-up models.
[9] Cf. The Top 10 Storage Inventions of All Time, in Byteandswitch, June 16, 2008.
[10] Hartmut Wiehr, Dedup turns backup inside out, in Computerzeitung, issue 28/2008,
page 10.
Section 6
[1] Fujitsu Siemens Computers has developed two dynamic infrastructure solutions FlexFrame
for SAP and FlexFrame for Oracle which combine storage, server and network resources on a
single platform. Resources can thus be assigned and moved in run mode in a virtual environment and depending on current requirements.
[2] See section 7 for more about virtualization.
[3] See Mario Vosschmidt / Hartmut Wiehr, Gut eingebunden: Speichernetze und ihre Verwaltungsinstrumente (Well incorporated: storage networks and their administration tools), in: iX Magazin für professionelle Informationstechnik, issue 8/2008, page 123.
[4] Published in tecchannel, 26th January 2006. For more information, see the article SMI-S
is holding the storage networking industry together (Mit SMI-S hat sich die Storage-Networking-Industrie ihren inneren Zusammenhalt gegeben) www.searchstorage.de (September 5, 2007).
Section 7
[1] Virtualization drives IT industrialization forward: Interview by Hartmut Wiehr with Dr. Joseph Reger, at www.searchstorage.de, April 10, 2004: http://www.searchstorage.de/topics/
rz-techniken/virtuelle-systeme/articles/117015/.
[2] Ibid.
[3] Ibid.
[4] Mario Vosschmidt / Hartmut Wiehr, Gut eingebunden: Speichernetze und ihre Verwaltungsinstrumente (Well incorporated: storage networks and their administration tools), in: iX Magazin für professionelle Informationstechnik, issue 8/2008, page 124.
[5] LUN masking (also LUN mapping) means that only the storage area it needs to perform its
work is allocated and made visible to an application. As a result of this segmentation general
access to certain storage areas is prohibited, which at the same time increases the security
of all the applications. With SAN zoning the same principle is applied to the division of a
network into virtual subnetworks so that servers of the one zone cannot access storage
systems of another zone.
[6] Steve Duplessie, File Infrastructure Requirements in the Internet Computing Era, Enterprise
Strategy Group (ESG), July 2008.
Section 8
[1] Josh Krischer, Krischer & Associates, CentricStor Virtual Tape: the Swiss Army Knife for data
protection, September 2008
[2] Terri McClure, Enterprise Strategy Group (ESG), CentricStor FS von Fujitsu Siemens Computers, July 2008
Glossary
A
Asynchronous Replication
After data has been written to the primary storage site, new writes to that site can be accepted without having to wait for the secondary (remote) storage site to also finish its writes. Asynchronous replication does not have the latency impact that synchronous replication does, but has the disadvantage of incurring data loss should the primary site fail before the data has been written to the secondary site.
B
Backup/Restore
A two-step process. Information is first copied to non-volatile disk or tape media. In the event of computer problems (such as disk drive failures, power outages, or virus infection) resulting in data loss or damage to the original data, the copy is subsequently retrieved and restored to a functional system.
Block Data
Raw data which does not have a file structure imposed on it. Database applications such as Microsoft SQL Server and Microsoft Exchange Server transfer data in blocks. Block transfer is the most efficient way to write to disk.
Business Continuity
The ability of an organization to continue to function even after a disastrous event, accomplished through the deployment of redundant hardware and software, the use of fault-tolerant systems, as well as a solid backup and recovery strategy.
C
Compliance
In data storage terminology, the word compliance refers to industry-wide government regulations and rules that cite how data is managed and the need for organizations to be in compliance with those regulations. Compliance has become a major concern for organizations and businesses, due largely to increasing regulatory requirements which often require organizations to invest in new technologies in order to address compliance issues.
D
Data Deduplication
Deduplication technology segments the
incoming data stream, uniquely identifies
these data segments, and then compares them
to segments previously stored. If an incoming
data segment is a duplicate of what has already
been stored, the segment is not stored again
but a reference is created for it (pointer). This
process operates at a very low level of granularity or atomic level to identify as much
redundancy as possible. The trade-offs in this process are the computational effort needed to identify and compare segments in exchange for the capacity savings achieved.
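As a hedged illustration of the principle described above, the following toy model uses fixed-size segments and SHA-256 fingerprints (real products use finer, often variable-size segmenting and their own fingerprinting schemes):

```python
# Toy model of segment-based deduplication with fixed-size chunks.
import hashlib

store = {}      # fingerprint -> stored segment
recipe = []     # sequence of fingerprints ("pointers") describing the stream

def ingest(data, segment_size=4096):
    for i in range(0, len(data), segment_size):
        segment = data[i:i + segment_size]
        fp = hashlib.sha256(segment).hexdigest()
        if fp not in store:          # new segment: store it once
            store[fp] = segment
        recipe.append(fp)            # duplicates keep only a reference

ingest(b"A" * 8192 + b"B" * 4096 + b"A" * 4096)
print(len(recipe), "segments referenced,", len(store), "actually stored")
# -> 4 segments referenced, 2 actually stored
```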
E
Ethernet
Local area network (LAN) topology commonly
operating at 10 megabits per second (Mbps)
over various physical media such as coaxial
cable, shielded or unshielded twisted pair, and
fiber optics. Future plans call for 1, 10 and 100
gigabit Ethernet versions. Ethernet standards
are maintained by the IEEE 802.3 committee.
F
Failover
In the event of a physical disruption to a network component, data is immediately rerouted
to an alternate path so that services remain
uninterrupted. Failover applies both to clustering and to multiple paths to storage. In the
case of clustering, one or more services (such as Exchange) are moved over to a standby server
in the event of a failure. In the case of multiple
paths to storage, a path failure results in data
being rerouted to a different physical connection to the storage.
Fault Tolerance
Fault tolerance is the ability of computer
hardware or software to ensure data integrity
when hardware failures occur. Fault-tolerant
features appear in many server operating systems and include mirrored volumes, RAID
volumes, and server clusters.
File Data
Data which has an associated file system.
Fibre Channel (FC)
A high-speed interconnect used in storage
area networks (SANs) to connect servers to
shared storage. Fibre Channel components
include HBAs, hubs, switches, and cabling. The
term Fibre Channel also refers to the storage
protocol.
Fibre Channel over Ethernet (FCoE)
A technology that encapsulates Fibre Channel
frames in Ethernet frames, allowing FC traffic
to be transported over Ethernet networks.
Standards are being worked on in various standardization committees. Products have been announced for 2009 or 2010. FCoE could be an alternative to
classical Fibre Channel technology.
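A rough sketch of the encapsulation idea (heavily simplified; the real FCoE frame format, standardized in FC-BB-5, adds version fields, SOF/EOF delimiters and padding):

```python
# Sketch: the idea behind FCoE - an unmodified Fibre Channel frame is
# carried as the payload of an Ethernet frame (EtherType 0x8906).
# Header layout is simplified for illustration.
import struct

def fcoe_frame(dst_mac, src_mac, fc_frame):
    ethertype = struct.pack("!H", 0x8906)   # EtherType registered for FCoE
    return dst_mac + src_mac + ethertype + fc_frame

frame = fcoe_frame(b"\x0e\xfc\x00\x00\x00\x01",  # illustrative MAC addresses
                   b"\x0e\xfc\x00\x00\x00\x02",
                   b"<encapsulated FC frame bytes>")
print(len(frame), "bytes on the wire")
```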
G
Global File System
In some configurations, as with clusters or
multiple NAS boxes, it is useful to have a means
to make the file systems on multiple servers or
devices look like a single file system. A global
or dispersed file system would enable storage
administrators to globally build or make
changes to file systems. To date this remains
an emerging technology.
H
High Availability
A continuously available computer system is
characterized as having essentially no downtime in any given year. A system with 99.999%
availability experiences only about five minutes of downtime. In contrast, a high availability system is defined as having 99.9% uptime,
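The downtime figures quoted here follow directly from the percentages; a quick check:

```python
# Quick check of the availability figures quoted above.
minutes_per_year = 365 * 24 * 60

for availability in (0.99999, 0.999):
    downtime = (1 - availability) * minutes_per_year
    print(f"{availability:.3%} availability -> {downtime:.0f} minutes downtime/year")
# 99.999% -> ~5 minutes; 99.9% -> ~526 minutes (almost nine hours)
```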
I
ILM (Information Lifecycle Management)
The process of managing information growth,
storage, and retrieval over time, based on its
value to the organization. Sometimes referred
to as data lifecycle management.
iSCSI (Internet SCSI)
A protocol that enables transport of block data
over IP networks, without the need for a specialized network infrastructure, such as Fibre
Channel.
ITIL (Information Technology Infrastructure
Library)
ITIL refers to a documentation of best practice
for IT Service Management. Used by many
hundreds of organisations around the world, a
whole ITIL philosophy has grown up around
the guidance contained within the ITIL books
and the supporting professional qualification
scheme. ITIL consists of a series of books giving
guidance on the provision of quality IT services, and on the accommodation and environmental facilities needed to support IT. ITIL has
been developed in recognition of organisations growing dependency on IT and embodies
best practices for IT Service Management. ITIL
is often implemented when different enterprises work together, and can also facilitate
mergers and acquisitions.
J
JBOD (Just a Bunch of Disks)
As the name suggests, a group of disks housed
in its own box; JBOD differs from RAID in not
having any storage controller intelligence or
data redundancy capabilities.
L
LAN
Local Area Network. Hardware and software
involved in connecting personal computers
and peripherals within close geographic confines, usually within a building, or adjacent
buildings.
Load Balancing
Referring to the ability to redistribute load
(read/write requests) to an alternate path
between server and storage device, load balancing helps to maintain high performance
networking.
LTO
Linear Tape Open. A family of half-inch open tape technologies; the Ultrium format is a cartridge format targeted at ultra-high capacity requirements.
LUN (Logical Unit Number)
A logical unit is a conceptual division (a subunit) of a storage disk or a set of disks. Logical
units can directly correspond to a volume drive
(for example, C: can be a logical unit). Each
logical unit has an address, known as the logical unit number (LUN), which allows it to be
uniquely identified.
80
LUN Masking
A method to restrict server access to storage
not specifically allocated to that server. LUN
masking is similar to zoning, but is implemented in the storage array, not the switch.
M
MAN
Metropolitan Area Network. A network capable
of high-speed communications over distances
up to about 80 kilometers.
Metadata
The information associated with a file but separate from the data in the file; required to
identify data in the file and its physical location on a disk.
Mirroring
A disk data redundancy technique in which
data is recorded identically and either synchronously or asynchronously on multiple
separate disks to protect data from disk failures. When the primary disk is off-line, the
alternate takes over, providing continuous
access to data. Normally used for mission-critical data, mirroring is classified as a RAID 1 configuration and doubles disk costs.
N
NAS (Network Attached Storage)
A NAS device is a server that runs an operating
system specifically designed for handling files
(rather than block data). Network-attached
storage is accessible directly on the local area
network (LAN) through LAN protocols such as
TCP/IP. Compare to DAS and SAN.
P
Partition
A partition is the portion of a physical disk or
LUN that functions as though it were a physically separate disk. Once the partition is created, it must be formatted and assigned a drive
letter before data can be stored on it.
Port
The physical connection point on computers,
switches, storage arrays, etc, which is used to
connect to other devices on a network. Ports
on a Fibre Channel network are identified by
their Worldwide Port Name (WWPN) IDs; on
iSCSI networks, ports are commonly given an
iSCSI name. Not to be confused with TCP/IP
ports, which are used as virtual addresses
assigned to each IP address.
R
RAID (Redundant Array of Independent
Disks)
A way of storing the same data over multiple
physical disks to ensure that if a hard disk fails
a redundant copy of the data can be accessed
instead. Example schemes include mirroring
and RAID 5.
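The redundancy in parity-based schemes such as RAID 5 rests on the XOR operation; a toy sketch of how a lost block is reconstructed (illustrative block sizes, not a real on-disk layout):

```python
# Sketch: the XOR parity idea behind RAID 5 - the parity block is the XOR
# of the data blocks, so any single lost block can be reconstructed.
blocks = [b"\x10\x20", b"\x03\x04", b"\xff\x00"]   # data blocks on three disks

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

parity = b"\x00\x00"
for blk in blocks:
    parity = xor_blocks(parity, blk)

# Disk 1 fails: rebuild its block from the surviving blocks and the parity.
rebuilt = xor_blocks(xor_blocks(blocks[0], blocks[2]), parity)
assert rebuilt == blocks[1]
print("reconstructed block:", rebuilt)
```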
Redundancy
The duplication of information or hardware
equipment components to ensure that should
a primary resource fail, a secondary resource
can take over its function.
Replication
Replication is the process of duplicating mission critical data from one highly available site
to another. The replication process can be synchronous or asynchronous; duplicates are
known as clones, point-in-time copies, or
snapshots, depending on the type of copy
being made.
S
SAN (Storage Area Network)
A storage area network (SAN) is a specialized
network that provides access to high performance and highly available storage subsystems
using block storage protocols. The SAN is made
up of specific devices, such as host bus adapters (HBAs) in the host servers, switches that
help route storage traffic, and disk storage
subsystems. The main characteristic of a SAN
is that the storage subsystems are generally
available to multiple hosts at the same time,
which makes them scalable and flexible. Compare with NAS and DAS.
SAS/SATA
SAS: Serial Attached SCSI. While SATA (Serial
ATA) is designed for desktops, making it a good
choice in storage environments requiring configuration simplicity or optimal cost/capacity,
SAS delivers the high performance, scalability
and reliability required for mainstream servers
and enterprise storage.
SCSI (Small Computer System Interface)
A set of standards allowing computers to communicate with attached devices, such as storage devices (disk drives, tape libraries etc) and
printers. SCSI also refers to a parallel interconnect technology which implements the SCSI
protocol. SCSI is available in two flavours: Parallel SCSI and Serial Attached SCSI. Parallel
SCSI has been the standard in connectivity for
more than 20 years, and is known for its stability and reliability. Serial Attached SCSI (SAS) is
the newest generation of SCSI, supporting both Serial ATA (SATA) and SAS drives.
Snapshot
A virtual copy of a device or filesystem. Snapshots imitate the way a file or device looked at
the precise time the snapshot was taken. It is
not a copy of the data, only a picture in time of
how the data was organized. Snapshots can be
taken according to a scheduled time and provide a consistent view of a filesystem or device
for a backup and recovery program to work
from.
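One common way to achieve such a "picture in time" without copying the data is copy-on-write; a toy model (illustrative only, not any particular product's implementation):

```python
# Toy copy-on-write snapshot: the snapshot stores only blocks that change
# after it was taken; unchanged blocks are read from the live volume.
volume = {0: "boot", 1: "data-v1", 2: "logs"}   # block -> contents
snapshot_saved = {}                              # blocks preserved on first overwrite

def write_block(block, contents):
    if block in volume and block not in snapshot_saved:
        snapshot_saved[block] = volume[block]    # preserve the old data once
    volume[block] = contents

def read_snapshot(block):
    # The snapshot view: preserved old blocks first, live volume otherwise.
    return snapshot_saved.get(block, volume.get(block))

write_block(1, "data-v2")
print(volume[1], read_snapshot(1))   # -> data-v2 data-v1
```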
Solid State Disk (SSD)
A solid state disk is a high-performance plug-and-play storage device that contains no moving parts. SSD components include either DRAM or EEPROM memory boards, a memory bus board, a CPU, and a battery card. Because they contain their own CPUs to manage data storage, they are a lot faster than conventional rotating hard disks and therefore produce the highest possible I/O rates. SSDs are most effective for server applications and server systems where I/O response time is crucial. Data stored in DRAM-based SSDs is volatile, which is why a battery card protects it against power loss.
T
Tape Library
In data storage, a tape library is a collection of
magnetic tape cartridges and tape drives. An
automated tape library is a hardware device
that contains multiple tape drives for reading
and writing data, access ports for entering and
removing tapes and a robotic device for
mounting and dismounting the tape cartridges
without human intervention. To mount means
to make a group of files in a file system structure accessible to a user or user group.
Target
A target is the device to which the initiator
sends data. Most commonly the target is the
storage array, but the term also applies to
bridges, tape libraries, tape drives or other
devices.
TCP/IP
Transmission Control Protocol/Internet Protocol. A set of transport and network layer protocols developed under the auspices of the U.S.
Department of Defense. Has emerged as the
de-facto standard for communications among
Unix systems, particularly over Ethernet.
Thin Provisioning
Thin provisioning is most commonly used in
centralized large storage systems such as SANs
and also in storage virtualization environments
where administrators plan for both current
and future storage requirements and often
over-purchase capacity, which can result in
wasted storage. Since thin provisioning is
designed to allocate exactly what is needed,
exactly when it is needed, it removes the element of paid-for but wasted storage capacity.
Additionally, as more storage is needed additional volumes can be attached to the existing
consolidated storage system.
Tiered Storage
Data is stored according to its intended use.
For instance, data intended for restoration in
the event of data loss or corruption is stored
locally, for fast recovery. Data required to be retained long term, for example for compliance reasons, can be moved to less expensive storage tiers.
V
VTL (Virtual Tape Library)
Refers to an intelligent disk-based library that
emulates traditional tape devices and tape formats. Acting like a tape library with the performance of modern disk drives, data is deposited onto disk drives just as it would onto a
tape library, only faster. Virtual tape backup
solutions can be used as a secondary backup
stage on the way to tape, or as their own
standalone tape library solution. A VTL generally consists of a Virtual Tape appliance or
server, and software which emulates traditional tape devices and formats.
Virtualization
In storage, virtualization is a means by which
multiple physical storage devices are viewed as
a single logical unit. Virtualization can be
accomplished in-band (in the data path) or out-of-band. Out-of-band virtualization does
not compete for host resources, and can virtualize storage resources irrespective of whether
they are DAS, NAS or SAN.
Volume
A volume is an area of storage on a hard disk. A
volume is formatted by using a file system, and
typically has a drive letter assigned to it. A single hard disk can have multiple volumes, and
volumes can also span multiple disks.
Z
Zoning
A method used to restrict server access to storage resources that are not allocated to that
server. Zoning is similar to LUN masking, but is
implemented in the switch and operates on
the basis of port identification (either port
numbers on the switch or by WWPN of the
attached initiators and targets).
(Sources: Adaptec, Fujitsu Siemens Computers,
Horison Information Strategies, Microsoft,
SNIA, Webopedia, ZAZAmedia)
Information infrastructures
in enterprises
The principle of profitable trading increasingly demands efficient handling of information in the enterprise, especially when you consider that the volume of information is growing by an average of 60 percent per year. With its solutions, EMC is striving to make optimal use of this capital as well as to protect, manage, store and archive it. EMC thereby rings in a paradigm change by moving the information itself, rather than the applications, into the center of the infrastructure. As a result, the demands on the infrastructure must be focused on the paths the information takes in the enterprise: from creation, capture and utilization through to archiving and deletion. The optimum strategy for setting up an information infrastructure includes intelligent data storage, protection against data loss or misuse, optimization of IT management and services, and utilization of the value-added potential of information. Alongside the top priority of cost reduction, enterprises mainly want to improve their compliance with all legal requirements and enhance support of their business processes. Business demands on IT are therefore given noticeably higher priority than technological goals such as better data security or better structured data.
[Figure: Tiered storage before and after: production data is distributed across Tier 1, Tier 2 and Tier 3.]
[Figure: Active archiving before and after: in many e-mail or file system environments, more than 75 percent of the data is not modified, which makes it an ideal candidate for active archiving.]
[Figure: Backup data before and after: backup data is moved to Tier 3.]
[Figure: Snapshots and clones before and after: snapshots provide up to 10 times reduction in the capacity required for local replication.]
EMC Storage integration with Microsoft is implemented via Fujitsu Siemens Computers BladeFrame technology
Common solution for Grid Computing based on Fujitsu Siemens Computers
PRIMERGY server systems
OEM and reseller agreements for EMC Networker
For more information on EMC solutions, visit www.emc.com
Introduction
In January 2008, Brocade introduced 8 Gbit/sec capabilities for the Brocade 48000 Director and the new Brocade DCX Backbone platform. Brocade is expanding this leadership position with the introduction of an entire family of 8 Gbit/sec switch products targeting a range of data center environments, from the enterprise to Small and Medium Business (SMB). In addition, Brocade is launching 8 Gbit/sec Host Bus Adapters (HBAs), providing the industry's first end-to-end 8 Gbit/sec solution for SMB to enterprise customers. These high-performance solutions are driven by a new family of Brocade 8 Gbit/sec ASICs, which process and route data with much higher levels of efficiency. In addition to doubling performance throughput, these new ASICs offer new capabilities that align with growing data center requirements for IT process automation, energy efficiency, and reduced Operating Expenses (OpEx). Steady increases in performance and functionality have been the hallmark of Fibre Channel evolution over the past decade. With the periodic doubling of transport speed from 1 to 2 Gbit/sec and from 2 to 4 Gbit/sec, storage administrators have quickly exploited the new performance capabilities and advanced features to build more optimized storage networks.
With the introduction of Brocade 8 Gbit/sec switches and HBAs, it is now possible to
fully integrate advanced functionality that extends from the fabric all the way to the
server platform. In trying to decide where enhanced performance and capabilities can
be applied in your own environment, consider the following:
Storage Growth. Storage Area Network (SAN) storage capacity has dramatically
increased year over year in almost all data centers. As SAN storage grows, so do the
fabrics that interconnect storage with servers.
Large Fabrics. As fabrics grow, more Inter-Switch Links (ISLs) are used to keep pace
with storage and server scaling.
Higher Levels of Performance. In large-scale data centers, moving SAN bandwidth-intensive hosts to 8 Gbit/sec connectivity enables the servers to achieve higher levels
of performance using fewer HBAs and a smaller cabling infrastructure.
Server Virtualization. Hosting multiple operating system instances on a single host
platform dramatically increases storage I/O demands, which in turn drives up host
SAN throughput.
Tiered Services. In a shared environment, in which IT may be using chargeback to
serve internal customers, a tiered services model requires the ability to specify service levels for hosted applications and to monitor and manage these services end to
end, all capabilities of Brocade 8 Gbit/sec solutions.
Backup. Large amounts of traffic to tape or disk during backups require the fastest
SAN speeds possible to fit within backup windows.
Operational Flexibility. While not all hosts, storage, and ISLs currently require maximum speed capability, it is much easier to architect data center fabrics when high-speed ports are available.
Investment Protection. Existing SANs can be significantly enhanced with new capabilities enabled by 8 Gbit/sec port speed. Integrated Routing and Adaptive Networking services are compatible with legacy SAN equipment, extending their Return on
Investment (ROI) as data center fabrics scale.
Data centers may have some or all of these needs today. Although meeting these
needs may not require an immediate upgrade to 8 Gbit/sec for all storage applications,
future plans for expansion, virtualization, and fabric scaling will make acquiring 8 Gbit/
sec capabilities today a safe and well-founded decision. As fabrics scale, for example,
only half the number of ISLs is required with 8 Gbit/sec links compared with 4 Gbit/sec links. Likewise, the ISL oversubscription ratio is halved by upgrading the ISLs from 4 to 8 Gbit/sec while using the same number of links.
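The arithmetic behind these claims is simple enough to check; a small sketch with illustrative numbers (not taken from any particular installation):

```python
# Illustrative arithmetic: ISL oversubscription ratio in a two-switch fabric.
def oversubscription(host_ports, host_speed_gbit, isls, isl_speed_gbit):
    """Aggregate host bandwidth divided by aggregate inter-switch bandwidth."""
    return (host_ports * host_speed_gbit) / (isls * isl_speed_gbit)

# 32 hosts at 4 Gbit/sec behind 4 ISLs:
print(oversubscription(32, 4, 4, 4))   # 8.0 : 1 with 4 Gbit/sec ISLs
print(oversubscription(32, 4, 4, 8))   # 4.0 : 1 after upgrading the ISLs to 8 Gbit/sec
```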
At long distances, 8 Gbit/sec can earn a very fast ROI compared to 4 Gbit/sec, due to
the high cost of dark fiber or WDM links. Almost all of these native FC extension links
support 8 Gbit/sec speeds, so utilization can be doubled on links that usually cost thousands, if not tens of thousands, of dollars per month. This can quickly justify the equipment cost for the increased speed capability. Building a high-performance foundation
that provides the flexibility to selectively deploy 8 Gbit/sec as needed simplifies data
center fabric management and accommodates the inevitable growth in applications
and data over time.
Brocades new family of 8 Gbit/sec switches supports the rapidly growing data center
by delivering 8 Gbit/sec performance on every port with no oversubscription. A completely non-oversubscribed switching architecture enhances server scalability by enabling the rapid growth of virtual servers without compromising data center performance.
Integrated Routing (IR) fabric service is a new option on the Brocade DCX Backbone
and Brocade 5300 and 5100 Switches with the release of Fabric OS (FOS) 6.1. As of FOS
6.1, IR can be activated on FC8 port blades with up to 128 IR ports per Brocade DCX
chassis. (When there are two Brocade DCX chassis connected via Inter-Chassis Links, a
total of 256 IR ports are available.) No additional hardware is required to enable per-port Fibre Channel Routing; only an optional IR software license is required. IR can be
enabled on the maximum number of ports on the Brocade 5300 (80 ports) and Brocade
5100 (40 ports) via user configuration. Brocade 8 Gbit/sec HBA ASICs support a maximum of 500,000 I/Os per second (IOPS) per port (more than 1 million IOPS on a dual-port HBA) to free up
the host processors and meet virtualization productivity goals. In the future, two 8
Gbit/sec HBA ports will be able to be combined into a single, ultra-high-speed 16 Gbit/
sec connection using Brocade ISL Trunking technology, which balances traffic flows at
the frame level. Currently, the benefits of Brocade 8 Gbit/sec switching technology are
extended directly to VMs via N_Port ID Virtualization (NPIV), so that special Brocade
features, such as Top Talkers and QoS Traffic Prioritization, can be applied to individual
VMs. This end-to-end fabric and host integration is unique to Brocade and offers the
industry's highest I/O performance for virtualized environments.
Brocade 8 Gbit/sec HBAs complement industry-leading performance with advanced
storage functionality to further streamline virtualized server operations. To meet regulatory compliance requirements, for example, Brocade 8 Gbit/sec HBAs implement the
industry standard Fibre Channel Security Protocol (FC-SP) and will support in-flight
data encryption for secure network transactions.
In addition, the new Brocade fabric service, Adaptive Networking, provides configurable Quality of Service (QoS) for each VM. With the increasing use of VM mobility to
shift application workloads from one hardware platform to another, conventional networking methods are no longer sufficient. Brocade meets the needs of more dynamic
virtualized environments by providing an integrated fabric and HBA solution that can
selectively deploy security and QoS to VM-hosted applications as required.
Brocade supports the transition to 8 Gbit/sec with the release of Fabric OS 6.1 and a full family of new
switches and HBAs for end-to-end connectivity in the data center:
Brocade 815 (single port) and 825 (dual port) HBAs
Brocade 300 Switch with 8, 16, and 24 ports
Brocade 5100 Switch with 24, 32, and 40 ports
Brocade 5300 Switch with 48, 64, and 80 ports
FC8-16, FC8-32, and FC8-48 port blades for the Brocade 48000 Director
Brocade 8 Gbit/sec switches comply with industry standards, and fabrics with 4 and
8 Gbit/sec devices interoperate seamlessly. Visit the Brocade Web site for data sheets
describing these products: www.brocade.com
Conclusion
The speed increase in Brocade switching platforms is one of many advantages of Brocade's next-generation ASIC family. Higher speed in the data center brings the immediate benefit of higher-performing ISLs and increased scalability; since ISL performance
is doubled, more ports can be used for servers and storage. In addition, 8 Gbit/sec is
needed for server virtualization, scaling of fabrics, backups, and high-performance
computing requirements. New capabilities, such as Adaptive Networking and Integrated
Routing, plus the enhanced power efficiencies of the new switch platforms are also
important drivers for adoption of 8 Gbit/sec technology. Every data center user has or
will have these needs in the future, and as data center plans are developed, Brocades
integrated end-to-end 8 Gbit/sec solution provides the broadest choice of capabilities
with the highest performance and efficiency.
NetApp
Innovative solutions for storage
and data management
NetApp embodies innovative storage and data management with excellent cost efficiency. The commitment to simplicity, innovation and the success of its customers has enabled the company to become one of the fastest growing storage and data management manufacturers. The wide-ranging solution portfolio for server-to-storage virtualization, business applications, data security and much more has persuaded customers worldwide to opt for NetApp.
NetApp ensures that your business-critical data is constantly available and can also simplify your business processes. Based on the motto "Go further, faster", NetApp helps companies to be successful. The storage requirement for company data will continue to grow fast in the coming years. This presents IT managers with the challenge of purchasing an ever increasing quantity of storage equipment and also having to manage those devices. With the help of its powerful unified storage architecture, NetApp helps companies overcome these challenges efficiently: extremely low operating costs (TCO), very fast backup and restore processes, high availability levels, consolidation and virtualization options as well as simplified, easy management of the entire storage environment are behind the NetApp slogan "Go further, faster".
Solutions
Microsoft, VMware, Oracle and SAP are important strategic NetApp partners, and NetApp has developed a wide range of tools for their database and application software.
Together with providers, such as VMware, NetApp offers solutions and best practices
for developing a virtualized infrastructure from servers to storage that provide a
number of advantages:
Scalable and consistent I/O performance for all ESX protocols (NFS, iSCSI and FC)
Flexible, fast, simple and low-priced provisioning and data management solutions
First-class virtualized storage solution for thin provisioning in heterogeneous storage environments
NetApp deduplication in the ESX environment
NetApp deduplication is one of the fundamental components of the Data ONTAP operating system. The elimination of redundant data objects and exclusive referencing to the original object permits more efficient use of the existing storage.
SnapManager for virtualized infrastructures (VI)
SnapManager for VI provides customers with an automated solution for backing up and
restoring virtual machines in a VMware ESX environment. The two main advantages of
this solution are:
The backups created using NetApp Snapshot technology use only a fraction of the storage space that traditional systems would require; the sketch below illustrates the principle.
The system performance of the ESX environment, and thus of the applications, is hardly impaired by the SnapManager backup and restore processes.
More than 5,000 customers worldwide (as of March 2008) already benefit from the advantages of a VMware solution with NetApp storage.
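Why is a snapshot backup so cheap? Because taking a snapshot copies only the map of block references, never the blocks themselves, so extra space is consumed only by data written after the snapshot. The following Python miniature illustrates this copy-on-write idea in generic form; the class and method names are our own invention, not the Data ONTAP interface.

    # Hypothetical copy-on-write snapshot in miniature (illustrative only).
    class Volume:
        def __init__(self):
            self.block_map = {}   # logical block number -> block data
            self.snapshots = {}   # snapshot name -> frozen reference map

        def write(self, lbn, data):
            self.block_map[lbn] = data

        def snapshot(self, name):
            # Shallow copy: unchanged blocks stay shared with the live volume.
            self.snapshots[name] = dict(self.block_map)

        def read(self, lbn, snapshot=None):
            source = self.snapshots[snapshot] if snapshot else self.block_map
            return source[lbn]

    vol = Volume()
    vol.write(0, b"original")
    vol.snapshot("backup-monday")
    vol.write(0, b"changed")              # live data moves on...
    assert vol.read(0) == b"changed"
    assert vol.read(0, snapshot="backup-monday") == b"original"  # ...backup stays intact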
Support for a virtual desktop infrastructure
In addition to server virtualization, the VMware Virtual Desktop Infrastructure (VDI) offers a further resource-saving virtualization technology. Application environments no longer run on the user's desktop but in virtual machines in the data center.
NetApp FlexClone can configure thousands of such virtual machines within minutes. Data deduplication enables storage capacity savings of approximately 90%.
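An illustrative calculation with assumed figures: 1,000 virtual desktops with a 10 GB system image each would nominally occupy 10 TB; because the operating system and application blocks are largely identical across the clones, deduplication at the quoted rate of roughly 90% shrinks the physical footprint to about 1 TB.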
NetApp solutions for SAP
As a worldwide technology partner of SAP, NetApp has a successful history of developing solutions which significantly simplify SAP data management. As one of the founding members of the Adaptive Computing Initiative for SAP, NetApp has been awarded numerous certificates for the compatibility of its storage solutions and is on the SAP compliance list for the SAP Adaptive Computing Services for Unix, Linux and Windows platforms.
NetApp won the SAP Pinnacle Award for Technical Innovation and Cooperation in the Adaptive Computing NetWeaver Innovation category for FlexFrame™ for mySAP Business Suite™, a joint development with Fujitsu Siemens Computers. Integral components are NetApp system cloning and backup/recovery scenarios.
The NetApp Unified Storage model provides SAN/IP SAN and NAS connections with block and file access methods within a single storage architecture.
Data management solutions such as FlexClone™ are used to clone SAP production systems within a few minutes, without affecting performance and without any additional initial storage requirement. This significantly simplifies the addition and management of systems for QA, test, development, reporting, interfaces and training.
The combination of NetApp Snapshot™ and SnapRestore provides SAP customers with fast, simple backup and restore processes for several TB of SAP data as well as efficient, simple upgrades and migrations of SAP systems. NetApp Adaptive Computing solutions enable SAP customers to react dynamically, flexibly and economically to business requirements.
NetApp also offers the following for those companies using SAP:
A comprehensive range of products for Windows, Unix and Linux environments with
unified NAS/SAN storage solutions.
ILM solutions: storage consolidation, backup and recovery, archiving and compliance
via ArchiveLink™ and/or WebDAV
High-availability and disaster recovery solutions as well as data encryption
SnapManager for SAP: this SAP-certified solution simplifies the creation of application-consistent Snapshot copies, automates error-free data restores and permits application-specific disaster recovery. Clones of the SAP database can also be created automatically.
A worldwide support agreement between NetApp and SAP ensures that the customer has 24x7 SAP infrastructure support.
Further information about this and other NetApp solutions can be found at www.netapp.com
CA RECOVERY MANAGEMENT
Benefits
The solution can be seamlessly integrated into existing IT management solutions so that Enterprise IT Management is simplified and extended.
CA Recovery Management:
a complete data protection solution
CA Recovery Management offers the comprehensive and integrated data protection and recovery functions which your company requires. Robust and proven technologies are used which are connected via one simplified interface. These technologies provide multi-level data protection which can be aligned to your company's targets, requirements and guidelines and which covers numerous hardware and software platforms.
Sun Microsystems
Markets
Sun Microsystems is a system provider that develops both hardware and software. Since its software development concentrates on resolving system-related tasks or setting strategically important milestones in line with Sun's vision, Sun does not compete with the developers of application programs. On the contrary, firm partnerships exist with numerous renowned software manufacturers in order to develop offerings together. In this way, customers retain their freedom, because they can decide in favor of the best solution on the market. To ensure that systems are integrated at an early stage, Sun Microsystems has set up a number of partner programs, which have pioneered both industry-specific and task-specific methods. There is a solution portfolio of almost 13,000 commercial and technical applications for Sun systems on the SPARC/Solaris platform.
In addition to the partnerships with independent software manufacturers, Sun is
very much committed to long-term sales partnerships with innovative distributors and
resellers. These partnerships have enabled fast and competent solutions for end customers on a widespread basis.
Highlights
Flexible scaling, for which only the capacity actually used is charged.
Cost savings through less space and lower power consumption.
Simpler storage management through partitioning and sharing.
Innovative technologies, risk reduction and development of new opportunities.
Support and services for successful installation, optimization and maintenance.
Highlights
Data consolidation: Excellent scaling as well as support for mixed media.
Shared resources: Designed for use with mainframes, Unix, Linux and Windows.
Higher availability: Upgrades without downtime, redundant operation.
Higher throughput: High performance in both throughput and capacity.
Simple scaling: Growth according to your requirements, without any downtime.
Low space requirements: High density of media slots, optimal utilization.
For more information on Sun solutions, visit www.sun.com
Symantec:
Confidence in a connected world
Symantec is a global leader in providing security, storage and systems management solutions to help businesses and consumers secure and manage their information. Headquartered in Cupertino, California, Symantec has operations in more than 40 countries.
Market Categories
Consumer Products; Security and Compliance; Storage and Availability Management;
Symantec Global Services.
Symantec's leaders bring decades of diverse experience and a history of success.
Combining business acumen with technical savvy, these executives guide more than
17,500 talented employees to create innovative products and solutions that enable
customers around the world to have confidence in their infrastructure, information and
interactions.