
Best Practice Guide

February 2014
This document was created to be an aid for those who install and configure DataCore Software Storage Virtualization
solutions. It is a collection of insights which have proved to be beneficial over time. It documents good storage and
storage network design as well as good software configurations that optimize utility (stability), performance, and
availability.
The Best Practice Guide is intended for use by trained personnel. We assume that standard terms such as Hosts, Fibre
Channel and/or iSCSI, etc. are understood. We also assume that common tasks such as creation of pools and Virtual
Disks, connection of Hosts, configuration of RAID, and serving of volumes to Hosts are also understood.

Cumulative Change Summary

July, 2010 – Memory and Hardware sections updated. Added iSCSI settings for Windows 2008. Added information
about running two different failover applications on the same Windows application server, added FAQ, Qualified
Lists and Manual/Admin guide links. Added 2TB storage LUN support on SANsymphony-V 7.0.3 and 3.0.3.

October, 2010 – Added User-Mode Crash Dump settings. Updated Network Configuration section. Added use of
Long I/O metrics to Pool Performance Aspects section. Updated Synchronous Mirroring over Long Distances with
example. Edited Snapshot Best Practices about reasons for 1MB SAU size for Snapshot Destination Pool.

February, 2011 – Added SANsymphony-V. Removed references to 2.x and SANsymphony-V 6.x.

April, 2011 – Updated Network Connection paragraph on page 6.

June, 2011 – Added recommendation to not disable the DataCore iSCSI driver/adapter for those channels which
won't be used for iSCSI.

January, 2012 – Overhaul to several chapters – too many to list individually. Removed previous recommendations
about NIC adapter settings as these have been found to be non-optimal in many cases.

February, 2014 – Rewritten to include new features and terminology only for SANsymphony-V.

COPYRIGHT
Copyright © 2014 by DataCore Software Corporation. All rights reserved.

DataCore, the DataCore logo, and SANsymphony-V are trademarks of DataCore Software Corporation. Other DataCore product or
service names or logos referenced herein are trademarks of DataCore Software Corporation. All other products, services and
company names mentioned herein may be trademarks of their respective owners.

ALTHOUGH THE MATERIAL PRESENTED IN THIS DOCUMENT IS BELIEVED TO BE ACCURATE, IT IS PROVIDED “AS IS” AND
USERS MUST TAKE ALL RESPONSIBILITY FOR THE USE OR APPLICATION OF THE PRODUCTS DESCRIBED AND THE
INFORMATION CONTAINED IN THIS DOCUMENT. NEITHER DATACORE NOR ITS SUPPLIERS MAKE ANY EXPRESS OR
IMPLIED REPRESENTATION, WARRANTY OR ENDORSEMENT REGARDING, AND SHALL HAVE NO LIABILITY FOR, THE USE
OR APPLICATION OF ANY DATACORE OR THIRD PARTY PRODUCTS OR THE OTHER INFORMATION REFERRED TO IN
THIS DOCUMENT. ALL SUCH WARRANTIES (INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY, NON-
INFRINGEMENT, FITNESS FOR A PARTICULAR PURPOSE AND AGAINST HIDDEN DEFECTS) AND LIABILITY ARE HEREBY
DISCLAIMED TO THE FULLEST EXTENT PERMITTED BY LAW.

No part of this document may be copied, reproduced, translated or reduced to any electronic medium or machine-readable form
without the prior written consent of DataCore Software Corporation.
Table of Contents

BEST PRACTICE GUIDE

1 – General Outline
2 – High Level Design
3 – DataCore Server Hardware Design Guidelines
Hardware & Software Requirements
General Hardware recommendations
SPOF (Single Point of Failure)
Determine performance requirements
Number of iSCSI ports
Amount and Speed of CPUs
Amount of Memory
RAM Sizing
General Operating System Notes
Microsoft Windows (non-OEM) versions vs. Microsoft OEM Windows versions
Language version
Windows Operating System Settings
4 - Network Configuration
Virus, Malware and other protection
General
Installation
Backup and restore a DataCore Server
5 – SAN Design Guide
Avoiding Single Points of Failure (SPOF)
Cabling and Port Settings
Zoning Considerations (Fibre Channel Switch Configuration)
iSCSI (LAN) Cabling and Port Settings
6 - Pool performance aspects
Understanding DataCore Thin Provisioning Pool Technology
Disk Types
Amount of Disks
RAID Layout
General Storage/Disk Pool notes
7 – Synchronous Mirror Best Practices
Match Size and Performance Characteristics
Synchronous Mirroring over Long Distances
8 – Snapshot Best Practices
Snapshot Destination Pool Considerations
Snapshot Performance Considerations
9 – Replication Best Practices
Link Throughput
Buffer Location and Size
Backup Applications use timestamps
Access Replication Destination Virtual Disk
Bidirectional Replication
10 – Continuous Data Protection (CDP)
11 – Host Best Practices
1 – General Outline

Best Practices are flexible

Each customer environment is unique, which makes giving general advice somewhat difficult. The
recommendations given in this document are guidelines – not rules. Even following all recommendations in this
guide does not guarantee a perfect result in every regard, because outcomes depend on individual factors.
However, following these guidelines should provide a stable, well-performing and secure system.

Guidelines do not supersede technical training or professional skills

This document does not supersede DataCore technical training courses provided by DataCore Software or a
DataCore Authorized Training Partner. Attending one or more of the following training courses is mandatory for
any installer of high availability (HA) environments:

 SANsymphony-V Administration (SYMV9P) – 3 days


 DataCore DCIE Development Training (in-house) – 2 days
or
 SANsymphony-V Administration and DCIE Development (SSVDCIE) – 5 days

A valid DataCore Certified Implementation Engineer (DCIE) certification is also required. In addition,
professional skills in dealing with storage devices, RAID technology, networking, Storage Area Network (SAN)
infrastructures, and the Fibre Channel and/or iSCSI protocols are necessary.

If you do not fulfill the above-mentioned requirements and/or have any difficulties understanding terms or procedures
described in this document, please contact your DataCore sales representative or the DataCore training department
for information on obtaining a DCIE certification and the required skills.



2 – High Level Design

Theory of Operation
First of all, prior to any task of system planning or implementation being performed, a Theory of Operation
should be developed. The Theory of Operation is a description of how a system should work and what goals
must be achieved. Operation, especially in terms of availability or safety, is always an end-to-end process from
the user to the data or service. Storage or storage services cannot be considered separately since they are not
isolated instances but integrated pieces of IT infrastructure.
Several books have been written on this topic, and this document does not attempt to supersede them. The
following list highlights some aspects of a safe IT environment which are too often overlooked.

Keep it simple
Avoid complexity at any level. Sophisticated environments may be smart in many aspects – however, if they are
error-prone, unstable or complex to support, a simpler approach is often more beneficial.

Separation is key
 Any dependency on a possible single point of failure should be avoided. Some dependencies are often
disregarded but can impact a whole environment.
 Keep storage components away from the public network. Limit the users who can access (or are aware of the
existence of) a central device.
 Distribute devices across separate racks, rooms, buildings and sites. Create separated hazard zones to
isolate disruptive impacts.
 Consider redundant power sources. Avoid connecting redundant devices to the same power circuit.
 Use a UPS-protected power supply and connect every device to it. For example, a UPS backup does not
help much if the UPS fails to notify the Host to shut down because a management LAN switch was
considered 'minor' and therefore not connected to the UPS-backed power circuit.
 Consider failsafe networks (such as LAN, WAN, SAN infrastructure) – a highly available IT system may
become worthless quickly if it can no longer be accessed due to a network outage.
 Do not forget environmental components (air conditioning, physical location, etc.). The failure of a non-redundant
air conditioner may bring down all redundant systems located in the same datacenter. Rooms on the same floor
may be affected by a pipe burst at the same time even if they are separated from each other. Datacenters
in the same building may be affected by a fire if a coffee machine ignites a curtain somewhere else in the
building.

Control access
DataCore Servers should be accessed by qualified (trained and skilled) personnel only. Make sure that
everyone understands the difference between a 'normal' server and a DataCore Server as explained in this
document.

Monitoring and notification


Highly available systems typically recover automatically from errors or keep the service alive even if half of the
environment fails. However, those conditions must be recognized and fixed as soon as possible by the
responsible personnel to avoid future problems.

Knowledge and documentation


Document the environment well, keep the documentation up-to-date and available. Establish 'shared
knowledge' – make sure that at least two people are familiar with any area on the SAN/DataCore setup.



3 – DataCore Server Hardware Design Guidelines
Hardware & Software Requirements
For minimum hardware and software requirements as well as supported hardware and software (Fibre Channel
HBAs, Switches, 3rd Party Software, etc.) check the Qualified Lists available on the DataCore Support Web site:
http://datacore.custhelp.com/app/answers/detail/a_id/283

If you will be running the DataCore Server in a Virtual Machine please check FAQ #1155
http://datacore.custhelp.com/app/answers/detail/a_id/1155

General Hardware recommendations


DataCore Servers play a key role for performance and availability in a storage environment since all traffic
between Application Server/Hosts and disk devices flows through these appliances. Therefore the hardware
platform hosting SANsymphony-V should be a machine of adequate quality. An industry standard server of any
of the major brands is normally a good choice.

Important points:
 The boot drive (C:\) should be two hard disks in RAID 1 configuration
 Use separate RAID controllers for boot drive (C:\) and RAID sets to be used for disks used in Pools or
Replication buffers
 Equip the server with redundant power supplies
 Cover hardware components with an appropriate service contract to ensure quick repair in case of failure
 Network connection that does not rely on DNS so that inter-node communication can always take place
 Separate ports used for iSCSI from those carrying regular LAN traffic
 DataCore Servers should be protected against sudden power outages (UPS)

PCI bus

RAID controllers, Fibre Channel HBAs, iSCSI HBAs, SSD cards, NICs, etc. can generate a lot of traffic which
needs to be transported over the buses of the DataCore Server. When selecting the hardware platform for the
DataCore Server make sure that the system bus and backplane can handle the expected workload.
PCI-Express (PCIe) is a serial bus architecture and best used with other serial protocols like Fibre Channel,
SAS/SATA and iSCSI. For a DataCore Server, adequate server hardware with appropriate/independent buses
should be chosen rather than workstation or blade server hardware which is typically not designed for heavy
backplane IO.

RAID controllers

RAID controllers or SAN storage controllers which are used to control physical disk drives used for DataCore
Virtual Disks have an essential requirement—they must be capable of delivering high performance. Bear in
mind that those controllers typically do not host disks for one server, but need to handle a heavy load of I/O
requests from multiple Hosts.
Here a rule of thumb applies: Low-end RAID controllers deliver low-end performance. For example, an
integrated onboard RAID controller that comes free with a server box may be sufficient to control two disks in a
RAID 1 configuration for the boot drive. RAID controllers which are capable of controlling numerous disks and
delivering adequate performance typically have their own CPU (RAID accelerator chip), battery protected
cache, multiple ports etc.
Fibre Channel HBAs / iSCSI interfaces

Fibre Channel HBAs and network adapters used for iSCSI are available in single-port, dual-port and quad-port
cards. From a performance standpoint, there is often not much of a difference if two single-port cards or one
dual-port card is used. From an availability standpoint there is a difference. If a dual-port or quad-port card fails,
most likely all of the ports are affected. For minimizing the risk of multiple port failures, a larger number of cards
are preferred. Keep in mind that you also need a machine that has enough slots to support those cards.

Network Connection

The network connection between DataCore Servers is critical for inter-node communication. User Interface
updates also require that the DataCore Servers can communicate with each other. Name resolution is needed
for proper communication, so if DNS is down or holds stale information, responses may be delayed. It
is recommended that this link be dedicated and not shared with iSCSI, Replication or other server management
activities.

SPOF (Single Point of Failure).


Always be aware of hardware which might be a potential SPOF. Always test these components during the Functional Test
Plan, so the end customer/administrator knows what impact the failure of that part will have on the setup/SAN.
Types of hardware that can be a SPOF:
 Multiport HBAs
 Multiport NICs
 Single storage array as in the case of Shared Multiport Array (SMPA) configurations
 Single power supply
 All servers in the same location
See the section on SAN Design in this document for more information.



Determine performance requirements

A DataCore Server’s hardware design is key to superior performance. Before the hardware is set up, learn what
the requirements are first.
In a Storage Area Network (SAN) without DataCore being involved all I/Os from the Hosts to the disks and back
are transported by the SAN infrastructure. This is known as the total throughput and measured in I/Os per
second and Megabytes per second (MB/s).

[Diagram: Host – SAN throughput (I/Os & MB/s) – Disks / Storage Arrays]

The value IO/s declares how many I/O operations per second are processed while MB/s specifies the amount
of data being transported. Both values correlate with each other, but must be considered separately.
For example, a database application which creates many small I/Os may generate a high amount of IO/s, but
little MB/s due to the fact that the I/Os are small. A media streaming application may generate massive MB/s
with much fewer IO/s.
As a rule, throughput is never sustained; workload may vary greatly over time. User behavior influences
workload, and application tasks like backup jobs, archive runs, data mining, invoicing, etc. may generate
load peaks.
In a DataCore environment, all I/Os go through the SANsymphony-V Servers. In order to ensure an appropriate
hardware design, the performance requirements must be known. Good sources for getting performance
analysis are management applications, SAN switch logs, performance monitoring, etc. If those values are
unknown and cannot be measured, they may be estimated. However, hardware design based on assumptions
may turn out to be insufficient and may need to be adjusted.

When designing a DataCore Server, four points are crucial:

 Number of Fibre Channel (FC) / iSCSI ports
 Number of CPUs
 Amount of memory
 Speed of the storage and/or SCSI controllers



Those requirements can be calculated as follows.

Table 1 – Maximum HBA performance (according to vendor specification)

Port Speed    IO/s         MB/s half-duplex    MB/s full-duplex
2 Gb/s        100,000      200                 400
4 Gb/s        150,000      400                 800
8 Gb/s        200,000      800                 1600
16 Gb/s       1,200,000    1600                3200

The table above (Table 1) shows the maximum values that Fibre Channel ports can achieve. These are vendor
specifications; they do not reflect the effective amount of user data which can be transported.
In “real life” scenarios, Fibre Channel ports may not be utilized over 66% of their maximum IO/s due to network
protocol overhead. Table 2 shows more realistic values.

Table 2 – Realistic values*

Port Speed    IO/s         MB/s half-duplex    MB/s full-duplex
2 Gb/s        65,000       180                 360
4 Gb/s        100,000      360                 720
8 Gb/s        130,000      720                 1440
16 Gb/s       792,000      1440                2880
* These values are theoretical averages. In practice the real performance depends on many factors and may be
higher or lower. However, from a practical standpoint these values can be used for calculations.

On the basis of these values, the number of Fibre Channel ports needed to carry the load between Hosts and
disks can be determined. The number of ports must be sufficient to satisfy load peaks and should leave some
headroom for future growth – do not choose hardware that is only minimally sufficient.

Symmetric design

Best practice design rule is using separate physical ports for “frontend” traffic (to Hosts), “backend” traffic (to
storage/disks) and mirror traffic (to other DataCore Servers). Basically it is possible to share ports. However, a
clean design employs dedicated ports.

A rule of thumb applies: use an equal count of HBA ports for frontend, backend and mirror traffic. If the
number of ports per role was calculated as 2, a total of 6 Fibre Channel ports is needed:

 2 dedicated ports for frontend traffic
 2 dedicated ports for backend traffic
 2 dedicated ports for mirror traffic

[Diagram: DataCore Server with dedicated frontend, mirror and backend port pairs]

If equipped with 8 Gb/s FC-HBAs, this DataCore Server should be able to transport 260,000 IO/s at up to
1440 MB/s.


Number of iSCSI ports

The number of network interface cards (NICs) required for iSCSI traffic is more difficult to determine than the
number of Fibre Channel ports. iSCSI performance typically has higher dependencies on external factors
(quality of iSCSI initiators, network infrastructure, switches and their settings, CPU power, etc.). iSCSI is often
combined with other interconnect technologies, as shown in the diagram below.

[Diagram: DataCore Server with iSCSI frontend, Fibre Channel mirror and SAS/SATA backend connections]

The table below shows average iSCSI performance values. Note that these values are just indications. The
nature of iSCSI is that throughput highly depends on a variety of factors.

Port Speed    IO/s       MB/s
1 Gb/s        12,000     80
10 Gb/s       80,000     530

CPU considerations for iSCSI

iSCSI traffic generates more CPU load than Fibre Channel due to the fact that most iSCSI protocol overhead
(encapsulation of SCSI commands in IP packets) is handled by the DataCore Server's CPUs. Typically, iSCSI
initiator ports consume more CPU cycles than iSCSI target ports.

There is no rule of thumb for calculating CPU load for iSCSI traffic as this depends on the operating system as
well as on the CPU architecture and the number of NIC cards/ports. As a minimum:

For every 2 iSCSI ports (frontend, mirror or backend ports), add one additional CPU (core).

Amount and Speed of CPUs



Depending on how many IO/s must be processed by the DataCore Server, the appropriate number of CPUs
must be selected.
Modern server CPUs are able to execute millions of commands per second. CPU capability is often
underestimated and, as a result, DataCore Servers are frequently "overpowered" in terms of CPU power.

Depending on how many independent buses the architecture provides, one CPU core can handle one FC-HBA
triplet (frontend, backend, mirror port) or two dual-port FC HBAs.

Following this rule, a DataCore Server with 6 Fibre Channel ports needs 2 CPU cores. Or the other way around: a
quad-core CPU is adequate to serve 12 Fibre Channel ports (4 frontend, 4 backend, 4 mirror ports). This rule
assumes the use of the latest available CPU technology.

iSCSI note: Please be aware that iSCSI traffic causes much higher CPU load than Fibre Channel because the
encapsulation of SCSI commands in IP packets (and vice versa) is performed by the CPU. A DataCore Server
which handles heavy iSCSI traffic might have a significantly higher demand for CPU power.
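Combining the FC triplet rule with the iSCSI rule from the previous section, a minimal sketch of the core estimate
could look as follows; the port counts used are hypothetical examples.

    # Estimate CPU cores from the rules of thumb above:
    #  - one core per FC-HBA triplet (frontend + backend + mirror port)
    #  - one additional core per 2 iSCSI ports (any role)
    $fcPorts    = 6   # example: 2 FE + 2 BE + 2 MR Fibre Channel ports
    $iscsiPorts = 4   # example: iSCSI ports of any role

    $coresForFc       = [math]::Ceiling($fcPorts / 3)
    $coresForIscsi    = [math]::Ceiling($iscsiPorts / 2)
    $recommendedCores = $coresForFc + $coresForIscsi
    "Recommended minimum number of CPU cores: $recommendedCores"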

Speed of CPU

The faster a CPU runs, the more commands it can execute per second. However, practical experience has
shown that clock speed is a secondary factor in terms of I/O performance. Two slower CPUs are preferred over
one fast one.
This does not mean that clock speed does not matter. It means within a CPU family the clock speed difference
is minor, for example 3.0 GHz compared to 3.2 GHz clock speed of the same CPU type.

Type of CPU

Basically any x64 CPU is adequate; DataCore Software has not recognized any significant performance
difference between CPUs from different vendors with similar architecture and clock speed. However, we
recommend the use of “server class” CPUs instead of CPUs which are intended to be used in consumer
workstations. Intel Itanium processors are not supported in DataCore Servers.

Servers now come with CPUs that are multi-core and multi-threaded in order to deal with the high demands of
multi-tasking, variable applications. This can provide a high overall throughput benchmark when carrying out
many independent tasks in parallel, but there are pros and cons of power and CPU efficiency versus low-latency
optimization. Dynamically switching, monitoring and balancing load increases system latency, which affects
low-latency applications like SANsymphony-V that are optimized to deal with a very specific single task. The load
detection related to power saving will put the CPU into a low-power, power-saving state when it thinks that
the CPU is idle. SANsymphony-V relies on its internal poller being constantly available to deal with I/O as it
arrives rather than waiting for an interrupt to be told about the I/O arrival. Fortunately, all servers have BIOS
settings that can be adjusted so that performance is not affected. Set the BIOS power management setting to
"static high" to disable any power saving.
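The "static high" setting itself is made in the server BIOS and is vendor specific. Purely as a hedged, complementary
check, you can also make sure Windows itself is not applying power saving by selecting the built-in High
Performance power plan:

    # Show the currently active Windows power plan
    powercfg /getactivescheme

    # Switch to the built-in "High performance" plan so that Windows does not
    # throttle the CPUs in addition to any BIOS-level power management
    powercfg /setactive SCHEME_MIN

This complements, but does not replace, the BIOS power management setting.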

Amount of Memory



A portion of memory (RAM) of a DataCore Server is allocated to DataCore Cache and is used for Read/Write
caching of I/O operations between Hosts and the physical disks in the storage backend. Cache is the
“workhorse” and should not be undersized. Cache that is oversized can drive up the cost of hardware and
delivers no real benefit. So it is important to size the amount of RAM used in the DataCore Server properly.

The Windows operating system of a DataCore Server does not need much system memory; 2 GB is usually
sufficient. DataCore Cache is set to use about 80% of the total amount of physical RAM. This value can be
adjusted if necessary, but should not be unless advised by DataCore Technical Support or published in
Technical Bulletin 7b.

When more than 512 GB of memory is installed in a server running SANsymphony-V, the software will not
utilize as much of it as it should unless the registry is edited. The remaining memory stays available to the
Windows operating system but cannot be allocated to DataCore Cache. Please refer to Technical Bulletin 7b for
more detailed information on how to adjust the registry.
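As a quick sanity check of the 80% default mentioned above, the following sketch reads the installed physical
memory and prints the approximate amount that would be used as DataCore Cache; it only reports an estimate and
changes nothing.

    # Query installed physical RAM and estimate the default DataCore Cache size (~80%)
    $totalBytes = (Get-WmiObject -Class Win32_ComputerSystem).TotalPhysicalMemory
    $totalGB    = [math]::Round($totalBytes / 1GB, 1)
    $cacheGB    = [math]::Round($totalGB * 0.8, 1)
    "Installed RAM: $totalGB GB - expected DataCore Cache: approximately $cacheGB GB"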

RAM Sizing

Exact calculation of RAM requires very detailed analysis and only satisfies a specific point in time, which is likely
to be out of date very quickly. Hence it is always better to oversize and have more RAM than undersize and
have too little RAM in a DataCore Server.

General Operating System Notes


Remember that a DataCore Server is not an “ordinary” Windows server. It is a Storage Hypervisor – common
rules or company policies do not apply to these machines.

Selecting the right Windows operating system edition for SANsymphony-V:


Up to 32 GB of physical memory: Windows 2008 Server R2 Standard Edition (64-bit)
32 GB – 2 TB physical memory: Windows 2008 Server R2 Enterprise & Datacenter Edition (64-bit)
Up to 32 GB of physical memory: Windows 2012 Foundation Edition
Up to 64 GB physical memory: Windows 2012 Essentials Edition
Up to 2 TB physical memory: Windows 2012 Standard Edition
Up to 4 TB physical memory: Windows 2012 Datacenter Edition

Microsoft Windows (non-OEM) versions vs. Microsoft OEM Windows versions


DataCore recommends using Microsoft Windows non-OEM versions for DataCore Servers. OEM
versions are not recommended because they often contain modified system settings, 3rd party drivers,
monitoring software, tools, utilities, etc. which may interfere with SANsymphony-V.
If you wish to use vendor software (such as hardware monitoring agents), please install necessary drivers only
and make sure that DataCore system functions are not affected (perform the Functional Test Plan found on
FAQ 1301).



Language version
The recommended language version for Windows operating systems used for DataCore Servers is English.
Basically every language is supported. However, DataCore software products are primarily developed and
tested on English platforms. More important is that the English version helps to ensure quick and reliable help in
case of malfunction. English is the common language used in Technical support. For support reasons, we
strongly recommend using English Windows operating system versions only.

3rd party software


A DataCore Server acts as a Storage Hypervisor. It controls I/O operations between Hosts and disks – nothing
else. A DataCore Server should not be used for hosting backup software, network services (DHCP, DNS, etc.),
browsing the internet, downloading software, or performing any other functions that are unrelated to DataCore
software.
For this reason, a DataCore Server does not need to have any 3rd party software installed, except for:
 virus scanning application
 event notification tool/agent
 UPS communication software/agent
See section "Securing a DataCore Server" in this document for more information.

Before running any 3rd party diagnostic tools on a DataCore Server, please contact DataCore Support and state
exactly what tool is being proposed to run and why. DataCore has no control over what the diagnostic tool may
do; by its nature it is very likely to be intrusive to the running SAN and may disable components. DataCore
strongly recommends not running diagnostic tools on a production SAN environment unless you know
exactly what effect they will have on the SAN. As a minimum, before running such tools, please stop the DataCore
Software, as you would in a maintenance situation.

Windows Operating System Settings

Naming of a DataCore Server

Each object's name must start with a letter, but can end with any alphanumeric character. Only ASCII
characters are permitted, with a limit of 15 characters. Underscores and spaces are not allowed, but hyphens
may be used. Object names are not case sensitive. For more details please refer to ‘Rules for Naming SAN
Components’ in the SANsymphony-V online help guide.

Important Note: Before installing the DataCore Software make sure that the Server machine name is no longer
than 15 characters and that it does not contain any underscores.

Communication difficulties when a Server has an underscore in the machine name

Underscores in Server machine names may cause communication difficulties between DataCore Servers in the
Group.
Note: Attempting to name your server with an underscore will generate a warning from the Windows Operating
System.
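As an illustration only, a proposed server name can be checked against the rules above before installation; the
example name below is hypothetical.

    # Validate a proposed DataCore Server name: 15 characters or fewer, starts with a
    # letter, ASCII letters/digits/hyphens only, no underscores or spaces
    $name  = 'DCS-NODE1'   # hypothetical example name
    $valid = ($name.Length -le 15) -and ($name -match '^[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?$')
    if ($valid) { "'$name' follows the naming rules." } else { "'$name' violates the naming rules." }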

Windows Automatic updates, patches and service packs:

Windows Automatic Updates should always be disabled on any DataCore Server, as updates often result in
reboots. Each update should be reviewed carefully and should be applied during scheduled maintenance
windows only.

Only full Service Packs are qualified. DataCore does not qualify individual Security updates, Pre-Service Packs
or Hotfixes. See FAQ 839 for more detailed information.
This does not mean that DataCore Software will not function correctly with them installed but as with any 'non-
qualified' software installed on a server it should be tested in a non-production environment to verify there are
no 'adverse effects' caused by them. Please remember that during any troubleshooting that may arise with the
Server software, DataCore may request the removal of the software to help resolve your issue.

Prior to installing any Microsoft update you should stop the DataCore software and stop presenting vdisks from
that server as many updates from Microsoft require a reboot and may affect the functioning of the Windows
Operating system that the DataCore software is installed on.

DataCore will not begin to qualify any Microsoft Service Packs until their official release date. So there may be a
slight delay while qualification takes place when a service pack is finally released to the general public.

Should any issues arise during qualification, DataCore will post documentation and/or code changes through its
support website that apply to the use of the specific Service Pack as a 'known issue'. We strongly urge you to
subscribe to FAQ 838 so that if any issues do occur, you will be notified by email directly.

DataCore does not recommend combining operating system updates with other software or hardware
maintenance on a Server; if problems occur, making multiple changes at once makes solving the
problem much more difficult.

Pagefile size and memory dump settings

On a DataCore Server the page file (pagefile.sys) is not used by SANsymphony-V. However, it needs to be
configured with an appropriate size to record debugging information in case of an unexpected stop of the
DataCore Server (memory dump).

As best practice, configure "Kernel memory dump" in the Windows Startup and Recovery settings and match the
size of the page file to the amount of RAM installed in the DataCore Server.

For more information regarding memory dump file options for Windows please see Microsoft KB article #254649
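A hedged example of how this setting can be checked or applied from an elevated PowerShell prompt is shown
below; the CrashControl values used are standard Windows settings (CrashDumpEnabled = 2 selects a kernel
memory dump), not DataCore-specific ones, and a reboot is required for a change to take effect.

    # Inspect the current crash dump configuration
    Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl' |
        Select-Object CrashDumpEnabled, DumpFile

    # Select "Kernel memory dump" (value 2)
    Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl' -Name CrashDumpEnabled -Value 2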

Send and save User-Mode dumps

On Windows Servers, if the DataCore UI crashes, a Windows "Problem Reports and Solutions" window may open
prompting you to send information to Microsoft. Please do so, as DataCore can analyze the User-Mode dumps
sent to Microsoft.

Also configure Windows 2008 to save these User-Mode dumps on each DataCore Server:
 Open regedit and create the registry key:
 HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps
 Close regedit; no reboot is required.

Now if the DataCore UI crashes a user mode dump will be saved on the DataCore Server and can be sent to
DataCore Support for analysis. For more information about Collecting User-Mode Dumps see
http://msdn.microsoft.com/en-us/library/bb787181(VS.85).aspx
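For reference, the same registry key from the steps above can be created from an elevated PowerShell prompt
instead of regedit; the path is exactly the one quoted in this section.

    # Create the LocalDumps key so Windows Error Reporting saves user-mode dumps locally
    New-Item -Path 'HKLM:\SOFTWARE\Microsoft\Windows\Windows Error Reporting' -Name 'LocalDumps' -Force | Out-Null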

Time synchronization

In SANsymphony-V it is important that the DataCore Servers are time synchronized; if they are not, license
activation, CDP, and error reporting in tasks and alerts may be affected. In addition, for troubleshooting reasons
(such as log comparison) it is helpful if the system times on all DataCore Servers in a Group and all Hosts are
synchronized.

Help on time synchronization:


http://support.microsoft.com/kb/939322
http://technet.microsoft.com/en-us/library/cc773061(WS.10).aspx#BKMK_Config
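On servers that are not domain members, time can be kept in sync with the built-in Windows Time service; a hedged
example using w32tm is shown below, where the NTP server name is a placeholder for your own time source.

    # Point the Windows Time service at an NTP source and resynchronize
    w32tm /config /manualpeerlist:ntp.example.com /syncfromflags:manual /update
    w32tm /resync
    w32tm /query /status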

All other Windows settings

Leave all other Windows settings at their default values. Do not try to optimize Windows; the default settings are sufficient.



4 - Network Configuration

Keep DataCore Server away from public network

A DataCore Server is a Storage Hypervisor and must not be seen or accessed by any user besides the storage
administrators. DataCore recommends connecting the DataCore Servers to a dedicated management LAN or
VLAN, not to the public (user) network.

Domain membership or workgroup

It is recommended, where possible, that DataCore Servers within the same server group be placed in
the same workgroup and not made members of a Windows domain*. Domain membership could apply policies,
restrictions, user rights, etc. to the machine which may interfere with proper SANsymphony-V functionality.
Therefore, DataCore Servers in a Group that do not need to be in a domain should be placed in a
dedicated Windows workgroup, such as "DATACORE". Do not leave them in the default workgroup
"WORKGROUP". This forces one of the DataCore Servers to be the workgroup’s Master Browser and ensures
quick responses.
*NOTE: Some features require domain membership to function, such as when SANsymphony-V is installed
on a Windows Hyper-V cluster. Please refer to the specific documentation for that configuration.

Use static IP addresses

DataCore Servers should use fixed IP addresses. Do not assign IP addresses dynamically by a DHCP server.

Name resolution

DataCore Servers communicate with their peers by hostname. For simplicity, DataCore does not recommend
registering the DataCore Servers on a DNS server, but using the HOSTS and LMHOSTS files for static name
resolution instead. Enter the hostnames and IP addresses of all DataCore Servers within a Group into the HOSTS file
located in: C:\WINDOWS\system32\drivers\etc.
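A minimal example of what such static entries could look like is shown below; the hostnames and IP addresses are
placeholders for your own DataCore Servers.

    # Append static name resolution entries for all DataCore Servers in the Group
    # (hostnames and addresses below are hypothetical examples)
    $entries = @(
        '10.0.10.11    DCS-NODE1'
        '10.0.10.12    DCS-NODE2'
    )
    Add-Content -Path "$env:windir\System32\drivers\etc\hosts" -Value $entries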

SANsymphony-V will also need to have port 3793 open on the firewall for inter-node communications, to send a
support bundle and for replication. Port 3260 will need to be opened for iSCSI connections.
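If a firewall is active between the DataCore Servers, these two ports can be opened from an elevated prompt, for
example with netsh; the rule names used here are arbitrary examples.

    # Allow SANsymphony-V inter-node communication, support bundles and replication (TCP 3793)
    netsh advfirewall firewall add rule name=DataCore-InterNode-3793 dir=in action=allow protocol=TCP localport=3793

    # Allow iSCSI connections to DataCore targets (TCP 3260)
    netsh advfirewall firewall add rule name=DataCore-iSCSI-3260 dir=in action=allow protocol=TCP localport=3260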

Dedicated Network Link

It is recommended that DataCore Servers in the same Group communicate with each other over the LAN and
have a dedicated LAN connection that is not used for iSCSI Traffic.

See the following diagram as an example of how a network connection can be set up:

[Diagram: example dedicated network connection between DataCore Servers]
Securing a DataCore Storage Server

Virus, Malware and other protection:

Given the ongoing virus and worm attacks against Microsoft Windows operating systems, it is extremely important to
make the Server on which the SANsymphony-V software resides especially immune to these types of attacks.
This section is intended to educate administrators in the task of creating, adapting and applying existing policies to the
Server specifically.

NOTE: The topic of computer security is very broad, encompassing many areas. The most recognized
independent authority on this topic is the Computer Emergency Response Team (CERT), and it
classifies computer security into four areas: confidentiality, integrity, availability and mutual
authentication. Making a server immune to virus and worm attacks addresses the area of availability,
and this section focuses on that area. The other areas are beyond the scope of this document; however, several
additional resources on other topics are provided for your convenience.

General

As a first step, the storage administrator is encouraged to research and become familiar with the existing
corporate policy for Host security and anti-virus protection. Adopting a best practice that is the same as or as
similar as possible to that of the corporate policy increases the level of ‘buy in’ from other IT and operational
organizations within the enterprise. This is important in order to maximize and leverage expertise already in
house and to minimize the possibility for error due to lack of familiarity or understanding by administrators
outside of the storage group. Becoming knowledgeable about the current corporate security environment helps
the storage administrators focus research efforts and assess threats.

The next step is to leverage publicly available resources for guidance, recommendations and awareness. There
are many to choose from ranging from general to specific information. The following is a sampling of excellent
resources used by DataCore and many of our customers:

 Securing Network Servers http://www.cert.org/work/organizational_security.html
 Best Practices for Securing Your IT Resources http://protect.iu.edu/cybersecurity/best-practices/
 80-20 Rule of Computer Security
http://securityresponse.symantec.com/avcenter/security/Content/security.articles/fundamentals.of.info.security.html
 Security TechCenter http://technet.microsoft.com/en-US/security/default.aspx

In reviewing these resources, keep in mind that these guidelines are written for general applications, everything
from home computers to corporate servers. At a minimum, DataCore Servers should receive the same level of
high attention as any other mission critical, special-purpose enterprise server (e.g. SQL, Exchange and file/print
servers). If you have any questions about the applicability of specific third-party virus/worm protection
procedures to a DataCore Server, consult DataCore Technical Support.



Installation

The following list of practices is designed to ensure a clean installation and includes a means to recover quickly
in the event that an infection should occur on any given DataCore Server. These practices follow the guidelines
of the documents referenced above and have been successfully applied by DataCore’s customers, our
professional services team, our certified installation partners, and our own corporate Information Technology
department.

1. Establish a firewall – Few enterprises today have the luxury of physical separation between the WAN
and LAN. Although Microsoft recommends enabling the Windows firewall, DataCore does not
recommend its use. During installation the DataCore software will change the settings that it needs to
function. DataCore Servers should always be located behind at least one hardware firewall. If the
enterprise has multiple levels of trusted networks, then the DataCore Server should be placed on the
network with the highest level of security or behind an intranet firewall. If communication between DataCore
Servers needs to cross a firewall, or if there is a reason to activate firewall software (such as the Windows
firewall), then certain ports must be opened. Port 3260 is needed for iSCSI traffic and port 3793 for
inter-node communication and replication.
2. Create a separate Windows Workgroup for Storage Servers – Making them members of the general
domain adds to network overhead.
3. Load the base Operating System and the latest DataCore-qualified Service Pack – DataCore
qualifies the latest Microsoft Service Pack (SP) upon general release. Check the DataCore website
prerequisites page, particularly with new SP releases, to determine if it has been qualified for your OS.
4. Change the default Administrator password, control distribution of the password and change it
regularly – This common-sense recommendation immediately increases virus immunity significantly.
5. Review Security Bulletins and apply appropriate Security Hotfixes for the Operating System and
Windows Explorer – For a comprehensive list of bulletins go to the Microsoft Technet Website
http://www.microsoft.com/security/default.mspx
6. Disable unused TCP/UDP ports – NOTE: DataCore recommends applying these changes as part of
the installation, before running the Functional Test Plan (FTP), to verify proper operation before putting
any DataCore Servers into production.
7. Apply anti-virus software with the latest virus signatures – Configure the software to scan system
resources including formatted disks only. Raw (unformatted and partitioned) volumes should not be
scanned as this will interfere with normal operation. Virus scanning policies at the Host will ensure that
the data stored on DataCore managed disks remains virus free. DataCore does not recommend or
qualify specific virus scanning software products. Any known issues are listed in FAQ 1277. As
DataCore does not carry out interoperability testing with virus scanning software, in the course of
troubleshooting an issue we may ask for it to be temporarily uninstalled.
8. Create a backup of the system (boot) disk – This is a good precaution to take. In the event that a
virus or worm infects the DataCore Server, it will speed up recovery of the Operating System on which
the SANsymphony-V software is reinstalled.
9. Install DataCore software, Optional Products and latest Product Service Packs – Check
DataCore’s Technical Support website for the latest product release.
10. Do not install any other services or software – The DataCore Server serves a special purpose as a
high-end disk controller. It should not be used to perform other functions not directly related to
SANsymphony-V other than downloading service packs, hotfixes or performing other SAN diagnostic
functions.
11. If Replication is used, exclude the Replication source buffers from being scanned.

Protect against power loss

If DataCore Servers from the same Group are located in the same room or data center, they should never be
connected to the same power circuit. In case of power loss to one DataCore Server, the other DataCore Server
can take over I/O handling for mirrored vdisks and switch off write caching of the affected mirrors to prevent
data loss or corruption. If both DataCore Servers lose power simultaneously, this safety mechanism will fail
and cache content will be lost. DataCore Servers and all components in the I/O path should be connected to
battery-backed power to prevent unexpected power loss.



SANsymphony-V is UPS compliant provided that the Windows operating system is configured properly.
SANsymphony-V also includes PowerShell cmdlets to react to UPS events:
http://www.datacore.com/SSV-Webhelp/UPS_Support.htm

Non-Windows compliant UPS: When a UPS is used that is non-compliant with Windows, SANsymphony-V is
unable to monitor the UPS power state in order to stop the DataCore Servers as the low battery state is
reached. The DataCore Server therefore must be manually stopped to avoid an unclean shutdown.

As part of the controlled shutdown process, the following command needs to be run on the DataCore Server:

net stop dcsx

It may also be possible for SANsymphony-V users to implement one or more of SANsymphony-V's PowerShell
cmdlets in their own scripts.

See: http://www.datacore.com/SSV-Webhelp/Getting_Started_with_SANsymphony-V_Cmdlets.htm

DataCore does not have specific guidelines on which PowerShell cmdlets to use in this case, so if you are in any
doubt, simply run the 'net stop dcsx' command mentioned above.
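Purely as an illustration, a UPS agent that can execute a command on a low-battery event could call a small script
like the following; it simply wraps the 'net stop dcsx' command described above and is not an official DataCore
shutdown procedure.

    # Minimal shutdown hook for a non-Windows-compliant UPS agent (illustrative only).
    # Run elevated: logs the action to a file and stops the DataCore software (dcsx service).
    Add-Content -Path 'C:\UpsShutdown.log' -Value "$(Get-Date -Format s)  UPS low battery: stopping DataCore software"
    net stop dcsx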

Set up event log monitoring alerts

SANsymphony-V comes with a System Health monitoring tool which can be configured. Please refer to the
SANsymphony-V online help system for more information.
http://www.datacore.com/SSV-Webhelp/System_Health_Tool.htm

Install the agent of your monitoring application (Microsoft Systems Manager, HP OpenView, What's up Gold,
Nagios, EventSentry etc.) on the DataCore Server. The agent should be configured so that it scans the
Windows System and Application event logs for events with Type "Error" or "Warning" and reports them to
the monitoring management application or to the administrator’s or help desk’s email address.

In addition, you should configure automated tasks to send alerts via SMTP or configure SNMP Support.
DataCore also has a SCOM component that is a free download from our website.
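If no full monitoring agent is available, a scheduled script along these lines could perform a basic scan of the System
and Application event logs for recent errors and warnings; the 24-hour window and console output are arbitrary
examples and would normally be replaced by your own notification mechanism.

    # Basic check: list Error and Warning events from the last 24 hours
    $since = (Get-Date).AddHours(-24)
    foreach ($log in 'System', 'Application') {
        Get-EventLog -LogName $log -EntryType Error, Warning -After $since -ErrorAction SilentlyContinue |
            Select-Object TimeGenerated, EntryType, Source, EventID, Message |
            Format-Table -AutoSize
    }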



Backup and restore a DataCore Server

General: It is not recommended to install a backup agent or backup software on a DataCore Server and to run
full or incremental backups from time to time, for the following reasons:

1. Backup software may try to get exclusive access to a file during the backup process. If the backup
software tries to lock a DataCore system or configuration file it may cause a malfunction or crash.

2. Restoring a DataCore Server from a full backup (on tape or disk) may be a long process: Install the
Windows operating system, install service packs and hot fixes, install backup agent, connect to backup
server or tape drive, begin restore and wait to complete. There are smarter and faster ways to restore a
DataCore Storage Server.

The DataCore Server itself does not hold much configuration information. The majority of the configuration
information like pool configuration, virtual disk information etc. is stored in the meta data header on the physical
disks in the backend. The information is also stored in the xconfig.xml file located in C:\Program
Files\DataCore\SANsymphony-V.
These files are always identical on every DataCore Server within a Group and are updated on each update of
the configuration.

Backup Configuration
If a hardware failure (e.g., disk failure) or software failure should occur, your configuration will be simple to recreate.
Configuration files can be preserved from the SANsymphony-V Management Console or by using a Windows
PowerShell™ script file provided in the SANsymphony-V file folder. Configuration files are restored by running
the Windows PowerShell script file. See the SANsymphony-V Help system for more information.

Boot Partition Image

It might be helpful to create an image file of the boot partition (C:\) before carrying out major maintenance tasks,
such as installing a new Windows Service Pack. Make sure that all DataCore services are stopped before
creating the image file. This image represents a specific point in time and conserves the status of all mirrors in the
registry, so it cannot be used at a later point. Restore this image only if your immediate attempt to update fails. It is
not suitable for use in any other situation or at a later time since the actual mirror states will not be preserved. Loss
of data may result if this backup is reapplied improperly.

Reinstalling the Windows operating system can be significantly sped up if an image of the C:\ partition is available.

Restoring a DataCore Server


The description of the following steps is just informational. Please do not try to restore a DataCore
Server in a production environment without consulting or assistance from DataCore Technical Support.

In order to restore a DataCore Server, install the Windows operating system and then SANsymphony-V from
scratch. Rejoin the Group then restore the configuration as outlined in the Online Help system under ‘Backing
Up and Restoring’. If you have not backed up the configuration before attempting to restore and you need to
get it back, please contact DataCore Technical Support for assistance.



5 – SAN Design Guide

Avoiding Single Points of Failure (SPOF)

Pooling and sharing resources is a good idea in many regards. However, centralizing carries a risk too. If one
instance fails, several services might be affected. For this reason, modern IT infrastructures – not only storage
environments – provide various levels of fault tolerance and redundancy.

In a true High Availability (HA) environment, the outage of any single device must not impede the availability of
the whole system, although some constraints may still apply. For example, performance may be temporarily degraded
in the event of a failure and during the process of recovery. To ensure this, any "Single Point of Failure" (SPOF)
must be strictly eliminated from the design.

High Availability is an end-to-end process; from the user to the data. If only some links in the user-data-chain
are highly available, it cannot be considered a true HA environment. In this document we just pay attention to
what we can “see” from our storage perspective. However, many more factors should be considered like
application availability (such as clustering), network availability, power supply, climate control, and so on.

The logical diagram below shows an HA storage environment – everything is doubled (at the very least). Notice
that there are two HBAs/NICs in the Host connected to two independent networks or fabrics, and two DataCore
Servers each controlling separate storage; all data is mirrored and the links between all devices are redundant. For any
single failure, there is always a second path to reach the physical storage location.

[Diagram: HA storage environment – Host with two HBAs/NICs, two independent fabrics/networks, two DataCore Servers, mirrored storage]

In addition to redundancy, separate components limit the effect of environmental impact. Place components in
different locations (racks, rooms, buildings), connect devices to different power circuits, and ensure that Hosts,
network infrastructure, and air conditioning units are also redundant.

Whether Fibre Channel or iSCSI networks are used, physical and logical isolation of pathing and routing
is needed to achieve the highest degree of availability and proper failover in the event of a failure.



Fabric-1 Serving Diagram

Diagram Fabric-1 shows a recommended serving of a mirror virtual disk to a typical clustered application.
 All arrows are from initiator to target.
 Blue is the primary initiator/target path for the hosts.
 Red is the secondary initiator/target path for the hosts.
 The mirror paths are shown in green and pink.
Things to note about this properly mapped diagram:
 Each host has two Fibre Channel/iSCSI connections, with physically separate primary and secondary paths
 Hosts should use the same switch for primary I/O
 No mirror path should follow any Host initiator/target path, particularly those that are part of the same
mirrored virtual disk.
 Mirror paths should be routed through separate switches/fabrics to prevent a double failure and assure the
highest availability.

Cabling and Port Settings

In order to ensure stable and manageable operations, some basic rules for SAN setup and cabling should be
followed. SANs are typically highly customized to the particular customer environment, which
makes it challenging to issue generally valid recommendations. Especially with regard to the cabling layout, these
recommendations will have to be adjusted to satisfy various needs.



Frontend ports (to Hosts)

First consider the frontend ports of the environment. Pay attention to following points:

 Connect all frontend ports to the same fabric or switch (as shown in diagrams below).
 Configure the ports for their purpose (FE (frontend), MR (mirror) or BE (backend))

[Diagrams: example frontend cabling layouts – all frontend ports connected to the same fabric/switch, with switches linked by ISLs]

Some scenarios require a different type of cabling layout, such as for long-distance ISLs or to meet operating
system vendor recommendations. Please refer to Technical Bulletins available on the DataCore Technical
Support Website and check with your Host operating system vendor for other requirements.

By default, all ports are set to target & initiator mode. Frontend ports should operate in target-only SCSI mode
because some operating systems get "confused" if they see a port operating in both modes simultaneously.
When a port is configured as FE, the local DataCore Server turns the port off if its services are stopped. Some
operating systems require this behavior, and it speeds up failover if a DataCore Server is shut down.



Mirror ports

DataCore Server mirror ports can be connected either directly (point-to-point) or through switches. Both
solutions have pros and cons and their importance must be decided case by case for your environment. In
general, at least two physically separated mirror ports should be used to avoid single points of failure.

Direct connection (point-to-point):

PRO
 Simplicity (less configuration, no switch involved)
 Mirrors do not break if a switch goes down (such as for firmware upgrades)

CON
 Some hardware components have issues with point-to-point connections, please check the DataCore
Qualified Lists
 Distance limitation of cables
 Not feasible with more than two DataCore Servers

Switched connection:

PRO
 No issues with point-to-point connections
 Allows configurations with more than two DataCore Servers
 Longer distances possible (such as with stretched fabrics)

CON
 Switch outage (such as for firmware upgrades) may cause mirror recoveries
 Mirror port direction (initiator to target relationship) must be configured exactly as shown in the diagram
above



Zoning Considerations (Fibre Channel Switch Configuration)

Why zone?

Zoning is a fabric-based service in a Fibre Channel SAN that groups host and storage ports that need to
communicate. Zoning allows nodes to communicate with each other only if they are members of the same zone.
Ports can be members of multiple zones.
Zoning not only prevents a host from unauthorized access of storage assets, it also stops undesired host-to-
host communication and fabric-wide Registered State Change Notification (RSCN) disruptions. RSCNs are
issued by the fabric name server and notify end devices of events in the fabric, such as a storage node going
offline. Zoning isolates these notifications to the nodes that require the update. This is important for non-
disruptive I/O operations, because RSCNs have the potential to disrupt storage traffic.

Zoning approach

There are multiple ways to group host ports and storage ports for a zoning configuration. Hosts (initiators)
rarely need to interact directly with each other, and storage ports never initiate SAN traffic by their nature as
targets.
The recommended grouping method for zoning is "Single Initiator Zoning (SIZ)" as shown in the diagram below.
With SIZ, each zone contains a single initiator port and one or more storage ports. If the initiator port needs to
access both disk and backup storage devices, then two zones should be created: one zone with the port and the
disk devices, and a second zone with the port and the backup devices. SIZ is optimal because it prevents any
host-to-host interaction and limits RSCNs to the zones that need the information within the RSCN.
On initiators, TPRLO has to be turned off.

[Diagram: Single Initiator Zoning – each initiator placed in its own zone with its target ports on the SAN switch]



Soft or hard zoning

There are two types of zoning available in a fabric: soft zoning (WWN or alias based) and hard zoning (port
based).

Soft zoning is the practice of identifying and grouping end nodes by their World Wide Name (WWN) or their
respective alias name. The WWNs or aliases are entered in the fabric name server database. The name server
database is synchronized and distributed across the fabric to each switch. Therefore, no matter which port an
HBA is plugged into, it queries the name server to discover the devices in its zone.

With hard zoning, nodes are identified and grouped by the switch port number they are connected to. Switch
ports in the same zone can communicate with each other regardless of “who” is connected to those ports. The
fabric name server is not involved in this zoning mechanism.

Soft zoning implies a slightly higher administrative effort when setting up the fabric compared to hard zoning.
However, it has significant advantages concerning management and solving connectivity issues. Due to the fact
that soft zoning is handled by the name server, a node can easily be moved to another switch/port within the
fabric. Conversely, a node cannot be plugged into a wrong port by accident – it simply does not matter where a
node is plugged in, it always sees the correct zone members.
Furthermore, soft zoning provides valuable information for solving fabric issues, especially if people who are not
familiar with the fabric setup are involved. For instance, it is easier to understand that alias
SQL_SERVER_FabA can communicate (is in the same zone) with alias DATACORE1_TARGET_FabA.
Troubleshooting in a hard zoned fabric tends to be more cumbersome and requires the precise (and up-to-date)
documentation of the physical connections between hosts and fabric (SAN diagram, cabling scheme and so
on).

Naming conventions

Naming conventions are very important to simplifying zoning configuration management. User-friendly alias
names ensure that zone members can be understood at a glance and configuration errors minimized. Good
practice for host aliases is to reference the hostname plus the particular HBA (number, slot, WWN). In the case of
DataCore Servers, it might also be helpful to point out the port role (frontend, backend, or mirror). For storage
arrays with multiple controllers and/or ports, this is also useful information to include in the alias
name. Following are some examples of alias names:

Hostname + HBA port e.g. SAPPROD_HBA0


DataCore Server name + Function + last 4 digits of WWN e.g. SDS01_Frontend_2E5C
Storage Array + Controller + Port # e.g. ArrayXYZ_Ctr1_Port0

Zone names should reflect their members. Following the rule that there should be just one initiator member per
zone, the initiator alias is unique and is a good name for the zone name too, optionally supplemented by the
targets this initiator can access. For example:

A zone that connects the DataCore Server’s backend port to the storage array XYZ with two FC ports:
Zone name: SDS01_Backend-XYZ contains the members:
 SDS01_Backend_2E5D (initiator)
 ArrayXYZ_CtrA_Port0 (target)
 ArrayXYZ_CtrB_Port1 (target)
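
The naming convention above can also be generated consistently by a small script. The following is a hypothetical PowerShell helper (the function name and the example WWPN are made up for illustration; it is not a DataCore tool):

function New-FcAlias {
    param(
        [Parameter(Mandatory)][string]$HostName,   # e.g. SDS01 or SAPPROD
        [Parameter(Mandatory)][string]$Role,       # e.g. Frontend, Backend, Mirror, HBA0
        [Parameter(Mandatory)][string]$Wwpn        # e.g. 21:00:00:24:ff:4c:2e:5d
    )
    # Keep only hex digits, then take the last four so the alias stays short but unique
    $hex = ($Wwpn -replace '[^0-9a-fA-F]', '').ToUpper()
    '{0}_{1}_{2}' -f $HostName, $Role, $hex.Substring($hex.Length - 4)
}

New-FcAlias -HostName 'SDS01' -Role 'Backend' -Wwpn '21:00:00:24:ff:4c:2e:5d'   # -> SDS01_Backend_2E5D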


iSCSI (LAN) Cabling and Port Settings

In general, the same rules as for Fibre Channel environments apply to iSCSI environments. Since iSCSI uses the IP
protocol and Ethernet infrastructure, these additional general rules apply:

 iSCSI traffic should be strictly separated from the public (user) network.
 Use separate hardware (LAN switches) for iSCSI storage traffic or at least different virtual LANs
(VLANs).
 Use separate NICs for iSCSI than those used for inter-Server and management communication.
 Deactivate every other protocol or service except for "Internet Protocol (TCP/IP)" on NICs used for iSCSI.
 Port 3260 needs to be open on the firewall to allow iSCSI traffic (see the sketch after this list).
 IPv6 is not supported for iSCSI traffic.
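
On Windows Server 2012 or later, the binding and firewall items above can be scripted. A minimal sketch (the NIC name "iSCSI1" is a placeholder; older releases can use netsh and the Network Connections control panel instead):

# Leave only TCP/IPv4 bound on the dedicated iSCSI NIC
Disable-NetAdapterBinding -Name 'iSCSI1' -ComponentID ms_msclient   # Client for Microsoft Networks
Disable-NetAdapterBinding -Name 'iSCSI1' -ComponentID ms_server     # File and Printer Sharing
Disable-NetAdapterBinding -Name 'iSCSI1' -ComponentID ms_tcpip6     # IPv6 (not supported for iSCSI traffic)

# Allow inbound iSCSI (TCP 3260) to the DataCore Server target ports
New-NetFirewallRule -DisplayName 'iSCSI Target (TCP 3260)' -Direction Inbound `
    -Protocol TCP -LocalPort 3260 -Action Allow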

Do not disable any DataCore Software iSCSI Adapters under Server Manager > Diagnostics > Device
Manager > DataCore Fibre-Channel Adapters.

If some DataCore Software iSCSI Adapters are disabled and the DataCore Server is rebooted, the wrong
channels may become unavailable after the reboot. This can lead to broken iSCSI connections to DataCore
Software iSCSI targets.

Instead, rename the Server Ports in SANsymphony-V so it is clear that these DataCore Software iSCSI Adapters are
not to be used for anything else. Alternatively, remove the Front End (FE) and Mirror (MR) port roles from the
Server Port in SANsymphony-V to prevent use or session login by an iSCSI initiator.

iSCSI frontend ports (to Hosts)

First consider the frontend ports of the environment. Pay attention to the following points:
 Connect all frontend ports to the same switch or VLAN (as shown in diagrams below).
 Allow only one IQN per Host initiator to login to the same DataCore target port. Multiple iSCSI sessions
from the same host IP address to a single DataCore Server iSCSI target are not supported.
 By default, no more than 16 iSCSI initiator connections are allowed to an iSCSI target at any one time;
please refer to FAQ 1235 for more information.


Some scenarios require a different type of cabling / iSCSI connection layout, such as with Wide Area Network
(WAN) links or special operating system vendor recommendations. Please refer to Technical Bulletins available
on the DataCore Technical Support Website and check with your Host operating system vendor for other
requirements. The DataCore driver only operates in Target mode on ports used for iSCSI and is therefore used
for frontend and mirror targets. The Microsoft iSCSI Initiator is used for any mirror and backend initiator
functions.

iSCSI mirror ports

DataCore Server mirror ports can be connected either directly (crossover cable) or through switches. Both
solutions have pros and cons and their importance must be decided case by case for your environment. In
general, at least two physically separated iSCSI mirror ports should be used to avoid single points of failure.
In order to use iSCSI for mirroring, the Microsoft iSCSI Initiator software must be enabled on the DataCore Servers.
Do not configure NIC teaming or bonding when using the MS iSCSI Initiator, as outlined in its release notes.

Customers are encouraged to check configurations and eliminate any situations where multiple sessions from
one initiator to a single target have been created. Multiple iSCSI sessions from the same IP address to a single
DataCore Server iSCSI target are not supported.
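
For reference, a minimal sketch of establishing a persistent mirror session with the Microsoft iSCSI Initiator, assuming the Windows Server 2012 iSCSI cmdlets (the portal address and the IQN filter below are placeholders; older releases use iscsicli.exe):

# Enable and start the Microsoft iSCSI Initiator service
Set-Service -Name MSiSCSI -StartupType Automatic
Start-Service -Name MSiSCSI

# Register the mirror partner's target portal and log in persistently
New-IscsiTargetPortal -TargetPortalAddress '192.168.0.4'
$target = Get-IscsiTarget | Where-Object { $_.NodeAddress -like '*sds02*' }
Connect-IscsiTarget -NodeAddress $target.NodeAddress -IsPersistent $true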


(Diagrams: direct-connected and switched iSCSI mirror port layouts between the two DataCore Servers, using dedicated subnets such as 192.168.0.x and 192.168.1.x.)

Note: There is one dedicated iSCSI mirror path between DataCore Servers in each direction.

Direct connection (crossover cable):
PRO
 Simplicity (less configuration, no switch involved)
 Mirrors do not break if a switch goes down (such as for firmware upgrades)
CON
 Distance limitation of cables
 Not feasible with more than two storage servers

Connection through switches:
PRO
 Allows configurations with more than two storage servers
 Longer distances possible (such as for WAN links)
CON
 Switch outage (such as for firmware upgrades) may cause mirror recoveries

6 - Pool performance aspects

Disk performance is one of the most important characteristics in a storage environment. For designing and
implementing appropriate solutions it is crucial to understand the difference between a "classic" storage array
and a DataCore thin provisioned pool (disk storage pool). In a classic environment, Hosts have exclusive
access to a disk or RAID set. The RAID set handles the I/O traffic of this particular server or application and
should be optimized accordingly. Within a DataCore thin provisioned pool, physical disks or RAID sets are
typically shared among multiple Hosts, so the I/O pattern those disks experience may look very different.

Understanding DataCore Thin Provisioning Pool Technology

In a DataCore thin provisioned pool, physical disks are put together into a pool consisting of multiple storage
LUNs from which Virtual Disks are created. These Virtual Disks are then served to one or more Hosts.

The DataCore pool provisions storage allocation units (SAUs) from the physical disks as requested by the
writes from the Hosts. The data content of the Virtual Disks by default is equally distributed across all physical
disks within a pool.

Several factors make a significant difference in performance characteristics, such as:

 Disk type (SSD, SAS, SATA)
 Amount of disks
 Rotation speed
 RAID level
 LUN configuration
 Connection type (iSCSI, FC, direct attached SCSI)

(Diagram: Host – Virtual Disk – Disk Pool)

Disk Types

Different disk technologies have significant performance differences. Disk performance is primarily represented
by three key factors:
Average Seek Time – the time the read/write head needs to physically move to the correct place.
IOs per Second – the amount of operations a disk can handle per second.
MBs per Second – the amount of data a disk can transfer per second.

Typical average disk performance values:

Fibre Channel – avg. seek time 3.5 ms (15K RPM) / 3.8 ms (10K RPM); avg. 180 IOPS (15K RPM) / 150 IOPS (10K RPM); avg. 20 MB/s (15K RPM) / 15 MB/s (10K RPM). Application: performance-intensive, high availability, random access.

SAS – avg. seek time 3.2 ms (15K RPM) / 3.5 ms (10K RPM); avg. 200 IOPS (15K RPM) / 160 IOPS (10K RPM); avg. 23 MB/s (15K RPM) / 16 MB/s (10K RPM). Application: performance-intensive, high availability, random access.

SATA – avg. seek time 9.5 ms (7.2K RPM) / 12 ms (5.4K RPM); avg. 75 IOPS (7.2K RPM) / 60 IOPS (5.4K RPM); avg. MB/s varies, see paragraph below. Application: capacity-intensive, sequential access.

Compared to drive manufacturers' data sheets these values may seem low. Performance values published by
drive and storage array vendors are often misleading and generally represent the best-case scenario. In an
environment where physical disks are shared among several Hosts, the access pattern is typically highly random,
which causes a lot of repositioning of the actuators (the disks' read/write head assemblies). For this reason the
average "real world" performance is often much lower than the benchmark maximum measured in the lab.

Fibre Channel and SAS disks have comparable technologies and performance characteristics. SAS disks
typically have slightly better performance because the average seek time is shorter on a smaller platter. FC and
SAS disks can respond quickly to random I/O requests and are the first choice for performance-intensive and
highly random I/O patterns such as database and email server applications.

SATA disks use a different technology inside the box compared to FC and SAS disks. They have a less
expensive mechanical design that results in slower rotation speeds and higher seek times for positioning the
read/write heads. SATA disks can perform very well (up to 90% of FC/SAS disk performance) if they are
accessed by sequential read/write traffic with large I/O sizes. If SATA disks need to respond to a large number
of small, random I/O requests, they may deliver poor response times. SATA drives are the first choice for
capacity-hungry applications with low performance requirements or mainly sequential I/O, such as archive
systems, media streaming or backup to disk.

Typical Disk Type Usage (RAID levels discussed later).

Disk Type RAID Level Storage Tier Typical Applications

FC & SAS RAID 1 / 10 Tier 1 Heavily used database, email, ERP systems etc.

FC & SAS RAID 5 Tier 2 File Service, lower loaded database applications

SATA RAID 5 / 6 Tier 3 Archive, Media Storage (x-ray, video), Backup to disk


Amount of Disks

An old rule of thumb is that the more spindles a disk array contains, the more performance it delivers. This rule
still applies. A RAID controller or storage controller in a storage array can distribute the incoming I/O requests to
all disks in a RAID set. The disks act independently of each other, so the performance of the RAID set is roughly
the sum of all disks grouped in the set.
The following example shows two RAID 5 sets of the same capacity. One set is built with a few large, relatively
slow SATA disks and one set is built with many small, relatively fast SAS disks. The calculated performance of
the RAID set with the small SAS disks is roughly 13 times higher than that of the set with large SATA disks.

3 x 750 GB SATA drives: one SATA disk delivers 75 IOPS; RAID 5; total capacity 2 TB; total approx. 225 IOPS.
15 x 146 GB SAS drives: one SAS disk delivers 200 IOPS; RAID 5; total capacity 2 TB; total approx. 3000 IOPS.

When designing the disk layout of a storage solution, the number of disks being used is a significant factor. Not
just the requested capacity is essential; the overall performance requirements also define the number of disks
needed to satisfy performance needs.
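
A quick sketch of the arithmetic behind the example above (raw read IOPS only; real-world results are lower once the RAID write penalty and the access pattern are factored in):

$sataSetIops = 3 * 75      # 3 x 750 GB SATA drives  -> 225 IOPS
$sasSetIops  = 15 * 200    # 15 x 146 GB SAS drives  -> 3000 IOPS
'{0:N1}x more IOPS at roughly the same capacity' -f ($sasSetIops / $sataSetIops)   # -> 13.3x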

RAID Layout
Physical disks connected to a RAID controller can be grouped in RAID sets and carved into LUNs presented to
the host in several ways. Understanding the relationship between physical disk/LUN grouping and the DataCore
pool technology is crucial to get best performance results. The block size must be 512 bytes.

Spindle counts in a RAID set


As shown above, the more disks a RAID set contains, the higher its performance typically is. In order to get good
performance out of a RAID set, a certain number of spindles is necessary. On the other hand, as the number of
physical spindles increases, the rebuild time to recover from a disk failure also increases. Of course, this is
highly dependent on the chosen RAID level and the disk type. A balance must be found between performance
and rebuild time, and it is difficult to make general recommendations.

Total LUN counts


Each LUN is seen by the operating system as one disk and has one I/O queue. The I/O queue is the “pipe” that
transports I/Os between host and disk. The more I/O queues (disks) are seen by the host, the more I/Os can be
transported in parallel. This is another reason for having more, smaller RAID sets instead of one large one.

Quantity of partitions carved from a RAID set


Limit the number of LUNs carved from a RAID set to as few as possible. The DataCore pool algorithm
distributes data and I/Os across all disks within a pool. If those “disks” are in reality LUNs residing on the same
RAID set, this may lead to disk thrashing as the actuators are incessantly repositioned from one partition to
another on the physical disks. This is especially the case with disks having high seek times (time to position the
head) and may result in poor performance. DataCore recommends configuring only one LUN per RAID set per
pool to avoid disk thrashing.


What is the optimal stripe size of a Storage Array attached to a DataCore Server?

A DataCore Server can send writes to the storage array of anywhere between 4 KB and 1 MB per I/O. The
Server can merge writes depending on how they are sent from the Host, and this can change the size of a given
write I/O at any time. DataCore therefore recommends leaving the Storage Array's stripe size at the storage
vendor's recommended default.

The following examples demonstrate the relationship between number of disks, RAID set, LUNs and the
DataCore pool technology.

Example 1 – 15 Disks in 1 RAID Set, 1 LUN exported to the pool:

Pro:
 Good performance due to striping across 15 physical
spindles
 small storage loss for RAID overhead (depending on
chosen RAID level)

Con:
 DataCore has just a single I/O queue to the disks which
may result in congestion
 If one physical disk fails, the whole LUN is affected by
RAID rebuild
 RAID rebuild may take a long time on large RAID sets
and degrade performance

Cons outweigh Pros. Configuration is OK, but there might be a better solution.

Example 2 – 15 Disks in 1 RAID Set, 3 LUNs created from RAID set and exported to the pool:

Pro:
 small storage loss for RAID overhead (depending on
chosen RAID level)

Con:
 If one physical disk fails, all three LUNs are affected by
RAID rebuild
 RAID rebuild may take a long time on large RAID sets
and degrades performance
 The DataCore pool concept of distributing allocated
blocks conflicts with the RAID layout – creates
additional seek and rotational latency on the physical
disks, thus degrading performance

Avoid this type of configuration due to too many disadvantages and the probability of poor performance.

Example 3 – 15 Disks in 3 RAID Sets, 1 LUN for each RAID Set exported to the pool:

Pro:
 DataCore has three I/O queues to the disks.
 The distribution algorithm spreads the load across LUNs and increases performance
 A failed physical disk affects just one LUN
 RAID rebuild is quicker with fewer disks.

Con:
 Greater storage loss for RAID overhead (depending on
RAID level)

Pros outweigh Cons. Good balance between performance/availability and costs. Recommended configuration in this type of situation.

Example 4 - No RAID level – “just a bunch of disks” (JBOD)

Pro
 Full use of capacity
 Many I/O queues to disks
 Pool spreads loads across spindles

Con
 One failed disk will affect the entire pool(s)
 Disk failure may cause long recovery time
 Bad blocks on disks are not recognized

If JBODs are used in a pool, a physical disk failure will impact every virtual disk in the pool. DataCore mirroring
of all virtual disks is mandatory to prevent outage and data loss. Depending on the amount of data that must be
recovered from the mirror partner, long recovery times may result.


RAID 0 – Striping

Pro
 Full use of capacity
 Highest write performance
 Highest read performance

Con
 Highest risk potential
 DataCore has just a single I/O queue to the disks which
may result in congestion
 One failed disk destroys all data in the pool
 Disk failure may cause long recovery time

RAID 0 sets have very high performance typically and the highest risk potential. DataCore Server mirroring of all
virtual disks is mandatory to prevent outage and data loss. A single failed disk destroys the whole affected
LUN/pool and causes all data to be recovered from the mirror partner side.

RAID 1 – Mirroring

Pro
 High security level
 High sequential read/write performance
 High random read/write performance
 DataCore spreads load across all RAID1 sets
 Failed disk doesn’t significantly affect performance
 Quick recovery from disk failure

Con
 50% capacity loss

Pools containing multiple RAID 1 sets typically have the highest security level and high performance. The
DataCore pool algorithm distributes the load across all LUNs. Disk failures affect just one LUN and usually
recover very quickly. Pools with multiple RAID 1 sets are recommended for non-mirrored volumes and
applications which cause lots of small random I/Os (like database and email applications). Pools which contain
numerous volumes accessed by many Hosts may experience highly random I/O too.


RAID 10 (respectively RAID 01) – Striping across Mirrors

Pro
 High security level
 High sequential read/write performance
 Highest random read/write performance

Con
 50% capacity loss
 Fewer I/O queues compared to multiple RAID 1 sets

RAID 10 sets do not have significant advantages compared to multiple RAID 1 sets in a pool. Only with some
specific access patterns (e.g. heavy sequential reads) may the Host benefit from the underlying block-oriented
striping of the RAID set.
Generally, multiple RAID 1 sets provide the same result plus additional advantages (such as more I/O queues)
and are usually preferred over RAID 10 configurations.

RAID 5 – Striping with Parity

Pro
 Moderate capacity loss
 High sequential read/write performance
 High random read performance

Con
 Low random write performance
 Moderate security level
 Failed disk / rebuild impacts performance

RAID 5 sets perform very well with highly sequential access patterns and random reads. Due to the nature of
recalculating and updating parity information, heavy random writes may suffer from low performance. Disk
failures cause a significant performance decrease during rebuild. RAID 5 sets are good for applications with
mainly sequential I/O or highly random reads, such as file servers and moderately loaded databases. Creating
multiple, smaller RAID 5 sets is preferred over fewer large ones.


General Storage/Disk Pool notes

Same type of disk, speed, size per Storage Tier

Physical disks or RAID sets used in a particular storage tier should be equal in regard to their capacity, disk
technology and performance characteristics.
The DataCore pool algorithm distributes the allocated data blocks equally across all disks in a pool, so it makes
little sense to have slower and faster, or larger and smaller, "spots" (disks) within one tier. If pool capacity is to
be expanded and the original disk type/capacity etc. is no longer available, it may be worthwhile to create a new
pool and migrate the Virtual Disks.

Different pools for different purposes

Match pool performance and capacity characteristics with application needs. For example, a pool containing a
high count of spindles of smaller capacity typically has significantly higher I/O performance than one with fewer
disks of high capacity. If performance/capacity requirements vary between applications (as is typically the case),
create multiple pools with different characteristics. Virtual disks created from a pool can later be migrated to
another pool if requirements change over time.

The following table shows examples of different pool characteristics*:

"GOLD" pool – FC disks / RAID 1 – Tier 1 (High Performance) – ERP systems, loaded database applications, email systems with lots of users
"SILVER" pool – SAS disks / RAID 5 – Tier 2 (Economy Storage) – file & print services, test and development systems, less loaded databases
"BRONZE" pool – SATA disks / RAID 5 – Tier 3 (High Capacity) – media storage, archive applications, backup-to-disk

*Note: These are examples. It does not necessarily mean that a pool of RAID 5 sets is not suitable for email
applications or that a pool with SATA disks is not capable of serving a file server. The best RAID layout depends
entirely on the effective performance requirements of the particular environment.

Auto-tiering:
A fixed tier assignment may not always be optimal; with SANsymphony-V auto-tiering, better use can be made of
the different types of physical disks in a pool.
Access patterns within sections of a virtual disk can vary dramatically. For example, some files within a file
system may be more popular than others at any given time (frequently accessed blocks, moderately accessed
blocks, infrequently accessed blocks). This is more pronounced with clusters of virtual servers sharing a
common virtual disk. With a fixed tier assignment, this results in poor use of high-performance, premium-priced
disks (especially SSDs) and prevents other performance-oriented workloads from getting adequate response.
More can be found in the SANsymphony-V help: http://www.datacore.com/SSV-Webhelp/Auto-tiering_and_Storage_Profiles.htm


7 – Synchronous Mirror Best Practices

Match Size and Performance Characteristics

The two parts of a mirrored Virtual Disk (primary/preferred and secondary/alternate) have to be of the same size.
Primary and secondary volumes of a mirror should also have the same performance characteristics. In a
synchronous mirror relationship, the slowest member determines the resulting performance. In DataCore
environments where both volumes have independent cache instances, disk performance differences can be
compensated to a certain degree by caching I/Os.
However, in some scenarios the true disk performance comes into effect, for instance after a disk failure occurs.
In this case, caching is switched off to avoid any loss of data kept in cache and I/O requests are rerouted directly
to the secondary volume (see diagram below). If the secondary volume has significantly lower performance
characteristics than the primary volume, the Host will experience notably higher response times from its disk.
The secondary disk will not only carry the Host I/O load alone; it will also serve the resynchronization traffic
when the primary disk comes back up.

For this reason, both sides of a mirror should have the same performance characteristics. On the other hand,
there is nothing wrong with using disks of different performance – as long as the possible consequences are
acceptable in the particular environment.

(Diagram: an Application Server/Host writes to a mirrored virtual disk served by two DataCore Servers; Preferred Disk: SAS 15K RPM, Alternate Disk: SATA 7200 RPM.)

When one physical disk goes down, write caching is disabled for the affected virtual disk and a log is started.
The DataCore Server then sends the I/O to the other, which also operates in write-thru mode and waits for a
return. The write I/O is acknowledged to the Host only after the data is written to the surviving physical disk.


Synchronous Mirroring over Long Distances

There is no set rule concerning the distance between the primary/preferred and secondary/alternate volumes of
a synchronous mirror. The limit is determined by the acceptable latency of the application using the mirrored
virtual disk.
Read I/Os are always processed locally (primary volume) while write I/Os need to be transmitted to the remote
DataCore Server (secondary volume). Write I/Os to a stretched mirror therefore experience longer latency: the
time an I/O needs to travel to the remote side plus the same amount of time for the acknowledgement to return.

In environments where a direct connection (dark fibre network) between sites is used, latency is normally not a
problem, for example:

 Dark fibre links can be stretched up to 10 km (with 1300 nm laser) or 35 km (with 1550 nm laser). A
dark fibre link adds a latency of around 5 microseconds (µs) per km.
o A microsecond (µs) is equal to one millionth of a second, or one thousandth of a millisecond (ms).
 A typical SCSI transaction traverses the link 8 times, or four round trips.

This means a dark fibre link of 35 km adds a latency of 5 µs * 35 km * 8 (trips) = 1400 microseconds (µs) = 1.4
milliseconds (ms), which is negligible for most applications, but it could affect time-sensitive transactional Hosts
such as databases that can send a lot of small I/Os per second.
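
The same calculation as a small sketch, for adapting to other distances (values taken from the figures above):

$distanceKm     = 35
$usPerKm        = 5
$traversals     = 8
$addedLatencyUs = $distanceKm * $usPerKm * $traversals
'Added latency: {0} us = {1} ms per I/O' -f $addedLatencyUs, ($addedLatencyUs / 1000)   # -> 1400 us = 1.4 ms
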
However, there are other considerations that affect the reliability and latency of a link and need to be taken into
account:
 Degradation of the signal along the cable; this tends to increase with the length of the link and reduces
reliability unless extra hardware is used to correct the degradation.
 If the link (HBAs, FC switches, link hardware) does not have enough FC buffer credits, latency can
increase. The higher the FC speed in use (2 Gbps, 4 Gbps, 8 Gbps, etc.) and the longer the link, the
more FC buffer credits are needed to fully utilize the link.

DataCore recommends that you talk to your HBA, FC switch, link hardware and cable provider if you have
concerns about any of the above points.


Long distance iSCSI connections, inter-site links which use routers for tunneling Fibre Channel through IP
(FCIP, iFCP, FCoE) or other WAN infrastructures may significantly increase latency. In those cases, the real
link latency must be correctly measured and the impact considered in regards to the response time of the
particular application.

In general, link distances between synchronous mirror members should neither exceed 35 km nor traverse
WAN connections. If this is a requirement in your environment, please contact your DataCore reseller and/or
integrator prior to setup.


8 – Snapshot Best Practices

Snapshot Destinations Pool Considerations

Snapshot source and destination virtual disks must be of the same size. A snapshot destination normally
contains much less data than the source volume (just the changed blocks). Because of this, it is recommended
to use a small storage allocation unit (SAU) size for the pool which contains the snapshot destination virtual disks.

With snapshots, the first change occurring on a source virtual disk causes the migration of a corresponding
chunk of the original source blocks to the destination. For this reason, given heavy utilization of snapshot
relationships with many small write I/Os (typical for email applications), it is advisable to create the snapshot
pool with a small allocation unit size. A small allocation unit size results in better capacity utilization of the
snapshot pool, as illustrated below. The appropriate unit size depends on the size of the write I/Os to the source;
the minimum available unit size is 4 MB.
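
The effect of the SAU size on snapshot pool consumption can be shown with a simple worst-case sketch (the write count is an arbitrary illustrative figure; each scattered write is assumed to touch a different chunk of the source, so one full allocation unit is migrated per write):

$chunksTouched = 10000
foreach ($sauMB in 4, 128) {
    'SAU {0,3} MB -> up to {1:N0} GB allocated in the snapshot destination pool' -f $sauMB, ($chunksTouched * $sauMB / 1024)
}
# -> roughly 39 GB with a 4 MB SAU versus 1,250 GB with a 128 MB SAU
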
Snapshot Performance Considerations

Snapshot destination Virtual Disks on fast disks

Snapshot copy-on-first-write process: for every incoming write I/O to an unmodified chunk, the original chunk
needs to be relocated to the snapshot destination before the Host receives the write acknowledgement. The
faster the disks behind the destination, the quicker this can be accomplished. If the disks behind the snapshot
destination are significantly slower than the source, this may impact the overall performance of the production
virtual disk (source).

Limit the number of snapshots per virtual disk

Every active snapshot relationship adds a certain amount of additional load to the source disk. SANsymphony-V
has no set limit on the number of snapshots per source; however, a high number of active snapshots should be
avoided. While a couple of snapshots may not noticeably influence performance, numerous snapshots can slow
down the source significantly.

In addition, there is no valid reason for having many snapshots per source virtual disk. Snapshot technology is
sometimes mistaken for a replacement for backup or continuous data protection. In such cases, other solutions
should be considered, such as DataCore Continuous Data Protection (CDP) or Replication.
Quiesce Host and flush cache before snap

A snapshot is a virtual image of a given point in time. In order to ensure valid data in the snapshot image it is
crucial that the snapshot source is in an idle and consistent state at the moment the snapshot is enabled. The
following five steps must be observed (in order):

1. Quiesce the application / stop I/O to the disk – ensures that no more write changes occur during the
creation (enable) of the snapshot.
2. Flush the application cache and/or operating system cache – ensures that all data is written to the disk
and no data remains in cache instances on the Host.
3. Create the snapshot relationship – the snapshot source virtual disk is in a consistent state.
4. Resume normal operation of the application – after the snapshot is enabled, normal operation of the
application can be resumed.
5. Serve or mount the snapshot destination virtual disk to another Host (optional) – the snapshot can
now be served to another Host. NOTE: If the snapshot is already served, it must be unmounted and
remounted to force the file system to import the changed state. Some operating systems need to be
rebooted in order to see the changes.

The above steps are usually automated with scripts (PowerShell, Volume Shadow Copy Service (VSS), and so
forth); a simplified outline is sketched below. For details, please refer to the SANsymphony-V Help system and
DataCore training course manuals.
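
A simplified sketch of such an automation wrapper. The service name, drive letter and the snapshot step are placeholders (not actual DataCore cmdlets); check the SANsymphony-V cmdlet reference for the snapshot cmdlets available in your version:

Stop-Service -Name 'MyAppService'      # 1. quiesce the application / stop I/O to the disk

Write-VolumeCache -DriveLetter E       # 2. flush the OS write cache for the source volume
                                       #    (Storage module, Windows Server 2012 R2 or later)

# 3. create the snapshot relationship here, e.g. via the SANsymphony-V PowerShell cmdlets,
#    the SANsymphony-V Console, or a VSS-aware backup integration

Start-Service -Name 'MyAppService'     # 4. resume normal operation of the application

# 5. (optional) serve the snapshot destination to another Host and rescan disks there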


9 – Replication Best Practices
Relocate Host page files to non-replicated Virtual Disks
Pagefiles (such as the Windows pagefile.sys or VM swap files) are continuously altered by the OS. If a pagefile
resides on a virtual disk that is replicated, all these changes are transmitted too and may cause high replication
traffic. The pagefile content is, however, irrelevant on the Replication destination side. Especially in larger
deployments it is recommended to relocate page and swap files to Virtual Disks which are not being replicated,
to avoid unnecessary traffic.
Link Throughput
The "pipe" which connects the source with the remote site must be large enough to transport all the changes
(write I/Os) which happen to the source volume. In other words, if 10 GB of data changes during the day, 10 GB
must be replicated to the remote site and the inter-site link must be capable of transporting that data.
Calculation values: a dedicated link of 1 Mb/s can carry about 300 MB/hr, or 7.2 GB/day.
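
As a sketch, the rule of thumb above can be turned into a quick feasibility check (the change rate and link speed are example values):

$dailyChangeGB = 10
$linkMbps      = 2
$linkGBPerDay  = $linkMbps * 300 * 24 / 1024        # ~14.1 GB/day of usable throughput
if ($linkGBPerDay -lt $dailyChangeGB) {
    'Link too small: replication will fall behind'
} else {
    'Link can carry the daily change rate ({0:N1} GB/day capacity)' -f $linkGBPerDay
}
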
Buffer Location and Size
The disks where the source buffer resides should have RAID protection to survive disk failures. The buffer
should also be on a dedicated, well-performing RAID set locally attached to the DataCore Server (not on virtual
disks) and should not be shared with other LUNs – especially not with the DataCore Server OS partition or
LUNs to be used for pools.
Note: The replication buffer and the OS should not come from the same RAID set! Use of the same SCSI
controller for the buffer and OS can also negatively impact performance.

Source buffer
The source buffer stores all I/Os which have not yet been transmitted to the remote site, for all replicated
volumes. The size of the source buffer should be considered after determining the maximum allowable IP link
downtime. Replication is asynchronous only in the sense that the destination virtual disk can be out of sync and
can contain older data than the source at any point in time. Host I/O to source virtual disks can be degraded if
the source buffer has relatively high latency, so it is best practice to place the source buffer on very fast storage
with low latency.
Size the buffer after considering the possibility of IP link downtime between the source and destination servers.
The appropriate size of a buffer is determined by multiplying the amount of data that is expected to be
transferred daily by the maximum allowable IP link downtime.
For example, suppose your IP link goes down over a weekend. If the amount of data change is 20 GB/day and
the IP link downtime could go uncorrected for two days, create a buffer that is at least 40 GB. It is better to
over-size the buffer to allow for unforeseen increases in data transfers or miscalculations. If your buffer is 100 GB,
then changes for several days can be safely stored. A general rule of thumb applies: use a fast local RAID 1 of
100 GB for the buffer and expand if needed.
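
The same sizing rule as a small sketch (example values; the 50% headroom factor is an assumption, not a fixed rule):

$dailyChangeGB   = 20
$maxDowntimeDays = 2
$headroomFactor  = 1.5
$bufferGB = [math]::Ceiling($dailyChangeGB * $maxDowntimeDays * $headroomFactor)
"Provision at least $bufferGB GB for the replication buffer"    # -> 60 GB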

Exclude Replication buffers from anti-virus applications


If the DataCore Server has an anti-virus application installed, exclude the source buffer directory from being
scanned. Replication files contain SCSI commands and raw data – nothing a virus scanner would detect.
Scanning these files will slow down replication processing and transmission unnecessarily.


Backup applications should use timestamps
Backup solutions which back up files and reset the 'Archive Bit' cause a lot of data changes because they touch
every file being backed up. A full backup, for instance, may cause a lot of data changes and accordingly a high
volume of data needs to be replicated to the remote site. (Note: This behavior is not specific to DataCore; it
applies to every asynchronous replication solution.)

In order to avoid a high data change rate due to archive-bit based backups, timestamp-based backups should
be used instead. Backups relying on timestamps typically do not touch or change the files being backed up.
Today almost all major backup applications are capable of doing timestamp-based backups, so this issue can
easily be eliminated.

Access Replication Destination Virtual Disk


During normal Replication operation, the destination is continuously accessed and updated by the source. For
this reason, it cannot be accessed by another server without stopping and breaking the Replication set or using
the Replication Test Mode feature.

If the Replication destination is intended to be accessed for long periods of time, such as for testing or backup
purposes, a snapshot should be taken and served to the Host. To ensure a consistent state in the snapshot,
some rules apply as discussed in the Snapshot chapter of this document. For more details, see the
SANsymphony-V Help system and DataCore training course manuals.

Bidirectional Replication

Replication can be reversed if licensed for bidirectional replication.


When a controlled or emergency failover is necessary, the roles of the source (active) and destination (standby)
virtual disks must be swapped in order to reverse the direction of the replication. This can be accomplished by
either activating the standby side or deactivating the active side of a replication. When replication direction is
reversed, the paths from the host to the virtual disk being deactivated will be disabled and the paths from the
host to the virtual disk being activated will be enabled. When the process is complete, paths to the replication
virtual disks at the failover site will be active and the virtual disks are in read/write mode. Data changes made by
the host at the failover site are added to the replication buffer on the DataCore Server at the failover site.
When the production site is functional again, the virtual disks at the production site can be activated again
(failback). Data changes will be transferred from the replication buffer on the DataCore Server at the failover site
to the DataCore Server at the production site.


10 – Continuous Data Protection (CDP)

Data protection requires adequate resources (memory, CPU, disk capacity) and should not be enabled on
DataCore Servers with limited resources.

Use dedicated pools for data-protected virtual disks. Disk pools used for data-protected virtual disks and history
logs should have sufficient free space at all times. Disk pool thresholds and email notification via tasks should
be configured so that a notification is sent when disk pool free space reaches the attention threshold, to ensure
sufficient free space. Alternatively, PowerShell scripts can be created to automatically add raw disks to the pools
and be used as actions for the task (see the sketch below).
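
A generic sketch of the kind of threshold check such a task or script could perform (the free-space value is a placeholder; read the real figure from SANsymphony-V and adjust the addresses and SMTP server to your environment):

$attentionThresholdGB = 200
$poolFreeGB           = 150          # placeholder: obtain from SANsymphony-V
if ($poolFreeGB -lt $attentionThresholdGB) {
    Send-MailMessage -From 'sds01@example.com' -To 'storage-team@example.com' `
        -Subject "Disk pool free space low: $poolFreeGB GB" -SmtpServer 'smtp.example.com'
}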

Enabling data protection for a virtual disk may decrease I/O performance and should be used with caution to
protect mission critical data only.

The default history log size (5% of the virtual disk size, with a minimum size of 8 GB) may not be adequate for all
virtual disks. The history log size should be set according to the I/O load and retention time requirements. Once
set, the retention period can be monitored and the history log size increased if necessary. The current actual
retention period for the history log is provided in the Virtual Disk Details > Info tab (see Retention period).

Copying large amounts of data to virtual disks can fill the history log and cause the retention time to be reduced;
enable data protection after copying the data to avoid a significant I/O load.

After an event that requires restoration of data, I/O to the affected virtual disk should be immediately suspended
and then rollbacks should be created. In this manner, older data changes will stop being destaged and rollbacks
will not expire. Keep I/O suspended until virtual disk recovery is complete.

A particular restore point must be chosen in order to create a rollback. There are two types of rollbacks:
Persistent and Expiring. The type cannot be changed after creation, so careful planning is needed when creating
a rollback.

Persistent - The history log will be blocked from destaging in order to keep the rollback restore point intact. If
the history log becomes full or reaches the retention period, it will fail any new writes to the CDP-enabled virtual
disk. This will cause mirrored virtual disks to become redundancy failed or, in the case of a single virtual disk,
will fail new writes from the Host(s). To allow destaging to occur and unblock writes, the rollback must be split or
deleted.
Expiring - Will allow the history log to destage if it becomes full or reaches the retention period, but will not block
new writes. The rollback, however, will no longer be valid and will need to be deleted.

Do not send large amounts of writes to a rollback or keep it enabled for a long period of time. This could result
in the pool where the rollback history log is stored filling up. Rollbacks are meant to be enabled for short periods
of time and then split or reverted to once the desired data has been found or recovered.

Rollbacks should only be created for the purpose of finding a consistent condition prior to a disastrous event
and restoring the virtual disk data using the best rollback. Delete rollbacks if they are no longer needed.


11 – Host Best Practices

Special instructions apply to some Host operating systems. Please refer to the DataCore Technical Support
Website Technical Bulletins:

DataCore Technical Bulletins

Usage of MPIO (fail over software)


On Windows Hosts you cannot use any other failover software if you use DataCore MPIO, as it uses parts of
Microsoft MPIO. You might not be able to install DataCore MPIO if other third-party failover software is already
installed.

Exchange or Database applications:

How should a MS Exchange database be configured on Virtual Disks?

Microsoft recommends putting the database files and transaction logs on different disks. See the following
Microsoft KB articles:

How to protect Exchange data from hard disk failure

"To provide fault tolerance in case a hard disk fails, keep your Exchange transaction log files and database files
on separate physical hard disks. If you keep these log files and database files on separate hard disks you can
significantly increase hard disk I/O performance."

What Causes Exchange Disk I/O

"Because each Exchange store component is written to differently, you will experience better performance if you
place the .edb files and corresponding .stm files for one storage group on one volume, and place your
transaction log files on a separate volume."

Therefore, when using DataCore pools, use different pools for the transaction logs and the database files to
ensure different physical disks are used. For performance, also consider using RAID 0 for the transaction logs.
