Best Practice Guide 02-2014 PDF
February 2014
This document was created to be an aid for those who install and configure DataCore Software Storage Virtualization
solutions. It is a collection of insights which have proved to be beneficial over time. It documents good storage and
storage network design as well as good software configurations that optimize utility (stability), performance, and
availability.
The Best Practice Guide is intended for use by trained personnel. We assume that standard terms such as Hosts, Fibre
Channel and/or iSCSI, etc. are understood. We also assume that common tasks such as creating pools and Virtual
Disks, connecting Hosts, configuring RAID and serving volumes to Hosts are understood.
Revision history:
February 2011 – Added SANsymphony-V. Removed references to 2.x and SANsymphony-V 6.x.
June 2011 – Added recommendation not to disable the DataCore iSCSI driver/adapter for those channels which won't be used for iSCSI.
January 2012 – Overhaul of several chapters – too many to list individually. Removed previous recommendations about NIC adapter settings, as these have been found to be non-optimal in many cases.
February 2014 – Rewritten to include new features and terminology for SANsymphony-V only.
COPYRIGHT
Copyright © 2014 by DataCore Software Corporation. All rights reserved.
DataCore, the DataCore logo and SANsymphony-V are trademarks of DataCore Software Corporation. Other DataCore product or
service names or logos referenced herein are trademarks of DataCore Software Corporation. All other products, services and
company names mentioned herein may be trademarks of their respective owners.
ALTHOUGH THE MATERIAL PRESENTED IN THIS DOCUMENT IS BELIEVED TO BE ACCURATE, IT IS PROVIDED “AS IS” AND
USERS MUST TAKE ALL RESPONSIBILITY FOR THE USE OR APPLICATION OF THE PRODUCTS DESCRIBED AND THE
INFORMATION CONTAINED IN THIS DOCUMENT. NEITHER DATACORE NOR ITS SUPPLIERS MAKE ANY EXPRESS OR
IMPLIED REPRESENTATION, WARRANTY OR ENDORSEMENT REGARDING, AND SHALL HAVE NO LIABILITY FOR, THE USE
OR APPLICATION OF ANY DATACORE OR THIRD PARTY PRODUCTS OR THE OTHER INFORMATION REFERRED TO IN
THIS DOCUMENT. ALL SUCH WARRANTIES (INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY, NON-
INFRINGEMENT, FITNESS FOR A PARTICULAR PURPOSE AND AGAINST HIDDEN DEFECTS) AND LIABILITY ARE HEREBY
DISCLAIMED TO THE FULLEST EXTENT PERMITTED BY LAW.
No part of this document may be copied, reproduced, translated or reduced to any electronic medium or machine-readable form
without the prior written consent of DataCore Software Corporation.
Each customer environment is unique, which makes giving general advice somewhat difficult. The
recommendations given in this document are guidelines – not rules. Even following all recommendations in this
guide does not guarantee a perfect result in every regard, because of dependencies on individual factors.
However, following the guidelines will most likely provide a stable, well-performing and secure system.
This document does not supersede DataCore technical training courses provided by DataCore Software or a
DataCore Authorized Training Partner. Attending one or more of the following training courses is mandatory for
any installer of high availability (HA) environments:
A valid DataCore Certified Implementation Engineer (DCIE) certification is also necessary. In addition,
professional skills in dealing with storage devices, RAID technology, networking, Storage Area Network (SAN)
infrastructures, and the Fibre Channel and/or iSCSI protocols are required.
If you do not fulfill the above-mentioned points and/or have any difficulties understanding terms or procedures
described in this document, please contact your DataCore sales representative or the DataCore training department
for information on obtaining a DCIE certification and the required skills.
Theory of Operation
First of all, prior to any task of system planning or implementation being performed, a Theory of Operation
should be developed. The Theory of Operation is a description of how a system should work and what goals
must be achieved. Operation, especially in terms of availability or safety, is always an end-to-end process from
the user to the data or service. Storage or storage services cannot be considered separately since they are not
isolated instances but integrated pieces of IT infrastructure.
Several books have been written about this topic, and this document does not attempt to supersede them. The
following list provides some aspects of a safe IT environment which are too often overlooked.
Keep it simple
Avoid complexity at any level. Sophisticated environments may be smart in many respects – however, if they are
error-prone, unstable or complex to support, a simpler approach is often more beneficial.
Separation is key
Any dependency on a possible single point of failure should be avoided. Some dependencies are often
disregarded but can impact a whole environment.
Keep storage components away from the public network. Limit users who can access (or are aware of the
existence of) a central device.
Distribute devices across separate racks, rooms, buildings and sites. Create separated hazard zones to
isolate disruptive impacts.
Consider redundant power sources. Avoid connecting redundant devices to the same power circuit.
Use a UPS-protected power supply and connect every device to it. For example, a UPS backup does not
help much if the UPS fails to notify the Host to shut down because a management LAN switch was
considered 'minor' and therefore not connected to the UPS-backed power circuit.
Consider failsafe networks (such as LAN, WAN and SAN infrastructure) – a highly available IT system may
quickly become worthless if it can no longer be accessed due to a network outage.
Do not forget environmental components (air conditioning, physical location, etc.). A failed non-redundant air
conditioner may bring down all redundant systems located in the same datacenter. Rooms on the same floor
may be affected by a burst pipe at the same time even if they are separated from each other. Datacenters
in the same building may be affected by a fire if a coffee machine ignites a curtain somewhere else in the
building.
Control access
DataCore Servers should be accessed by qualified (trained and skilled) personnel only. Make sure that
everyone understands the difference between a 'normal' server and a DataCore Server as explained in this
document.
If you will be running the DataCore Server in a Virtual Machine, please check FAQ #1155:
http://datacore.custhelp.com/app/answers/detail/a_id/1155
Important points:
The boot drive (C:\) should consist of two hard disks in a RAID 1 configuration
Use separate RAID controllers for the boot drive (C:\) and for RAID sets used for Pool disks or
Replication buffers
Equip the server with redundant power supplies
Cover hardware components with an appropriate service contract to ensure quick repair in case of failure
Use a network connection that does not rely on DNS so that inter-node communication can always take place
Separate ports used for iSCSI from regular LAN traffic
DataCore Servers should be protected against sudden power outages (UPS)
PCI bus
RAID controllers, Fibre Channel HBAs, iSCSI HBAs, SSD cards, NICs, etc. can generate a lot of traffic which
needs to be transported over the buses of the DataCore Server. When selecting the hardware platform for the
DataCore Server make sure that the system bus and backplane can handle the expected workload.
PCI-Express (PCIe) is a serial bus architecture and best used with other serial protocols like Fibre Channel,
SAS/SATA and iSCSI. For a DataCore Server, adequate server hardware with appropriate/independent buses
should be chosen rather than workstation or blade server hardware which is typically not designed for heavy
backplane IO.
RAID controllers
RAID controllers or SAN storage controllers which are used to control physical disk drives used for DataCore
Virtual Disks have an essential requirement—they must be capable of delivering high performance. Bear in
mind that those controllers typically do not host disks for one server, but need to handle a heavy load of I/O
requests from multiple Hosts.
Here a rule of thumb applies: Low-end RAID controllers deliver low-end performance. For example, an
integrated onboard RAID controller that comes free with a server box may be sufficient to control two disks in a
RAID 1 configuration for the boot drive. RAID controllers which are capable of controlling numerous disks and
delivering adequate performance typically have their own CPU (RAID accelerator chip), battery protected
cache, multiple ports etc.
Fibre Channel HBAs / iSCSI interfaces
Fibre Channel HBAs and network adapters used for iSCSI are available as single-port, dual-port and quad-port
cards. From a performance standpoint, there is often not much difference between using two single-port cards
or one dual-port card. From an availability standpoint there is a difference: if a dual-port or quad-port card fails,
most likely all of its ports are affected. To minimize the risk of multiple port failures, a larger number of cards
is preferred. Keep in mind that you also need a machine with enough slots to support those cards.
Network Connection
The network connection between DataCore Servers is critical for inter-node communication. User Interface
updates also require that the DataCore Servers can communicate with each other. Name resolution is needed
for proper communication, so if DNS is down or has stale information, responses may be delayed. It
is recommended that this link be dedicated and not shared with iSCSI, Replication or other server management
activities.
A DataCore Server’s hardware design is key to superior performance. Before the hardware is set up, learn what
the requirements are first.
In a Storage Area Network (SAN) without DataCore involved, all I/Os from the Hosts to the disks and back
are transported by the SAN infrastructure. This is known as the total throughput and is measured in I/Os per
second (IO/s) and megabytes per second (MB/s).
[Diagram: SAN throughput between Host and disks, measured in I/Os and MB/s.]
The IO/s value indicates how many I/O operations per second are processed, while MB/s specifies the amount
of data being transported. Both values correlate with each other, but must be considered separately.
For example, a database application which creates many small I/Os may generate a high amount of IO/s, but
little MB/s due to the fact that the I/Os are small. A media streaming application may generate massive MB/s
with much fewer IO/s.
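The relationship between the two metrics can be illustrated with a short calculation. The workload figures below are illustrative assumptions, not measured values:

```python
def throughput_mb_s(io_per_second, io_size_bytes):
    """Convert an IO/s rate at a given average I/O size into MB/s."""
    return io_per_second * io_size_bytes / 1_000_000  # decimal megabytes

# A database doing many small (8 KB) I/Os: high IO/s, modest MB/s.
db_mb_s = throughput_mb_s(25_000, 8_000)

# A media streamer doing few large (1 MB) I/Os: low IO/s, high MB/s.
stream_mb_s = throughput_mb_s(300, 1_000_000)

print(db_mb_s, stream_mb_s)  # 200.0 MB/s vs. 300.0 MB/s
```

Both example workloads move a broadly similar amount of data per second, yet differ by two orders of magnitude in IO/s – which is why the two values must be sized for separately.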
As a rule, there is no such thing as sustained throughput; workload may vary greatly over time. User behavior
influences workload, and application tasks like backup jobs, archive runs, data mining, invoicing etc. may
generate load peaks.
In a DataCore environment, all I/Os go through the SANsymphony-V Servers. In order to ensure an appropriate
hardware design, the performance requirements must be known. Good sources of performance data are
management applications, SAN switch logs, performance monitoring tools, etc. If those values are
unknown and cannot be measured, they may be estimated. However, hardware design based on assumptions
may turn out to be insufficient and may need to be adjusted.
The table above (Table 1) shows the maximum values that Fibre Channel ports can achieve. These are specifications
and manufacturer information; they do not reflect the effective amount of user data which can be transported.
In "real life" scenarios, Fibre Channel ports may not be utilized beyond 66% of their maximum IO/s due to network
protocol overhead. Table 2 shows more realistic values.
On the basis of these values, the number of Fibre Channel ports needed to carry the load between Hosts and
disks can be determined. The number of ports must be sufficient to satisfy load peaks and should leave some
headroom for future growth – do not choose hardware that is only minimally sufficient.
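That sizing logic can be sketched as follows. The per-port IO/s rating is a placeholder assumption (the real figures come from Tables 1 and 2 and depend on link speed); the 66% derating factor is taken from the text, and the headroom factor is an illustrative choice:

```python
import math

def fc_ports_needed(peak_io_s, port_rated_io_s, derate=0.66, headroom=1.25):
    """Estimate FC port count: derate the manufacturer rating to a
    realistic ceiling, then add headroom for peaks and future growth."""
    usable_per_port = port_rated_io_s * derate
    return math.ceil(peak_io_s * headroom / usable_per_port)

# Example: a 150,000 IO/s peak against ports rated at 100,000 IO/s each.
print(fc_ports_needed(150_000, 100_000))  # 3 ports
```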
Symmetric design
A best practice design rule is to use separate physical ports for "frontend" traffic (to Hosts), "backend" traffic (to
storage/disks) and mirror traffic (to other DataCore Servers). It is technically possible to share ports; however, a
clean design employs dedicated ports.
[Diagram: a DataCore Server with frontend, backend and mirror ports. If equipped with 8 Gb/s FC-HBAs, this DataCore Server should be able to transport 260,000 IO/s and up to 1,440 MB/s.]
The number of network interface cards (NICs) required for iSCSI traffic is more difficult to determine than the
number of Fibre Channel ports. iSCSI performance typically depends more heavily on external factors
(quality of iSCSI initiators, network infrastructure, switches and their settings, CPU power, etc.). iSCSI is often
combined with other interconnect technologies, as shown in the diagram below.
[Diagram: a DataCore Server with an iSCSI frontend, a Fibre Channel mirror link and a SAS/SATA backend.]
The table below shows average iSCSI performance values. Note that these values are only indications; by its
nature, iSCSI throughput depends highly on a variety of factors.
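As the referenced table is not reproduced here, the sketch below uses a placeholder figure of roughly 80 MB/s of usable throughput per 1 GbE port – an assumption for illustration, not a DataCore specification – to show how a first estimate of the port count could be made:

```python
import math

USABLE_MB_S_PER_GBE_PORT = 80  # assumed average; actual values vary widely

def iscsi_ports_estimate(required_mb_s, usable_per_port=USABLE_MB_S_PER_GBE_PORT):
    """Rough first estimate of 1 GbE iSCSI ports for a given MB/s load."""
    return math.ceil(required_mb_s / usable_per_port)

print(iscsi_ports_estimate(300))  # 4 ports for a 300 MB/s requirement
```

Any such estimate should be validated against the actual initiators, switches and settings in use.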
There is no rule of thumb for calculating CPU load for iSCSI traffic, as this depends on the operating system as
well as on the CPU architecture and the number of NIC cards/ports. At a minimum, you should have:
For every 2 iSCSI ports (frontend, mirror or backend ports), add one additional CPU (core).
One CPU can handle one FC-HBA triplet (frontend, backend, mirror port) or two dual port FC HBAs.
Following this rule, a DataCore Server with 6 Fibre Channel ports needs 2 CPUs. Or the other way around—a
quad-core is adequate to serve 12 Fibre Channel ports (4 frontend, 4 backend, 4 mirror ports). This rule
assumes the usage of latest available CPU technology.
iSCSI note: Please be aware that iSCSI traffic causes a much higher CPU load than Fibre Channel, because the
encapsulation of SCSI commands in IP packets (and vice versa) is performed by the CPU. A DataCore Server
which handles heavy iSCSI traffic may have a significantly higher demand for CPU power.
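The rules of thumb above can be combined into a small calculation (a sketch of the stated rules only; it assumes current-generation CPUs, as the text requires, and ignores the extra load that heavy iSCSI traffic may add):

```python
import math

def cores_needed(fc_ports=0, iscsi_ports=0):
    """One core per FC triplet (frontend, backend, mirror = 3 ports),
    plus one additional core for every 2 iSCSI ports."""
    return math.ceil(fc_ports / 3) + math.ceil(iscsi_ports / 2)

print(cores_needed(fc_ports=6))                 # 2 cores, as in the example
print(cores_needed(fc_ports=12))                # a quad-core serves 12 FC ports
print(cores_needed(fc_ports=6, iscsi_ports=4))  # 2 FC cores + 2 iSCSI cores
```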
Speed of CPU
The faster a CPU runs, the more commands it can execute per second. However, practical experience has
shown that clock speed is a secondary factor in terms of I/O performance. Two slower CPUs are preferred over
one fast one.
This does not mean that clock speed does not matter. It means within a CPU family the clock speed difference
is minor, for example 3.0 GHz compared to 3.2 GHz clock speed of the same CPU type.
Type of CPU
Basically any x64 CPU is adequate; DataCore Software has not recognized any significant performance
difference between CPUs from different vendors with similar architecture and clock speed. However, we
recommend the use of “server class” CPUs instead of CPUs which are intended to be used in consumer
workstations. Intel Itanium processors are not supported in DataCore Servers.
Servers now come with multi-core, multi-threaded CPUs in order to deal with highly variable multi-tasking
workloads. This can provide a high overall throughput benchmark when carrying out many independent tasks in
parallel, but there are pros and cons of power and CPU efficiency versus low-latency optimization. Dynamically
switching, monitoring and balancing load increases system latency, which affects low-latency applications like
SANsymphony-V that are optimized to deal with a very specific single task. The power-saving load detection of
the system will put the CPU into a low-power state when it thinks the CPU is idle. SANsymphony-V relies on its
internal poller being constantly available to deal with I/O as it arrives, rather than waiting for an interrupt to
signal the I/O arrival. Fortunately, all servers have BIOS settings that can be adjusted so that performance is
not affected. Set the BIOS power management setting to "static high" to disable any power saving.
Amount of Memory
The Windows operating system of a DataCore Server does not need much system memory; 2 GB is usually
sufficient. The DataCore cache is set to use about 80% of the total amount of physical RAM. This value can be
adjusted if necessary, but should not be unless advised by DataCore Technical Support or published in
Technical Bulletin 7b.
When more than 512 GB of memory is installed in a server running SANsymphony-V, the software will not
utilize as much as it should unless the registry is edited. The remaining memory stays available to the
Windows operating system but cannot be allocated to the DataCore cache. Please refer to Technical Bulletin 7b
for more detailed information on how to adjust the registry.
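The default allocation can be estimated as follows (the 80% figure comes from the text above; raising the effective cap beyond 512 GB requires the registry adjustment described in Technical Bulletin 7b):

```python
def default_cache_gb(installed_ram_gb, cache_fraction=0.80):
    """Approximate DataCore cache size: about 80% of physical RAM."""
    return installed_ram_gb * cache_fraction

print(default_cache_gb(64))   # ~51.2 GB of cache on a 64 GB server
print(default_cache_gb(256))  # ~204.8 GB
```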
RAM Sizing
Exact calculation of RAM requires a very detailed analysis and is only valid for a specific point in time, which is
likely to be out of date very quickly. Hence it is always better to oversize and have more RAM than to undersize
and have too little RAM in a DataCore Server.
Before running any 3rd-party diagnostic tools on a DataCore Server, please contact DataCore Support and state
exactly what tool is proposed to be run and why. DataCore has no control over what a diagnostic tool may do;
by its nature it is very likely to be intrusive to the running SAN and may disable components. DataCore
strongly recommends not running diagnostic tools in a running production SAN environment unless you know
exactly what effect they will have on the SAN. As a minimum, before running such tools, please stop the DataCore
Software, as you would in a maintenance situation.
Each object's name must start with a letter, but can end with any alphanumeric character. Only ASCII
characters are permitted, with a limit of 15 characters. Underscores and spaces are not allowed, but hyphens
may be used. Object names are not case-sensitive. For more details please refer to 'Rules for Naming SAN
Components' in the SANsymphony-V online help guide.
Important Note: Before installing the DataCore Software, make sure that the Server machine name is no longer
than 15 characters and that it does not contain any underscores.
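The naming rules above can be checked with a short validator. This is a sketch based only on the rules as stated here; consult 'Rules for Naming SAN Components' in the online help for the authoritative list:

```python
import re

# Starts with a letter, ends with a letter or digit, hyphens allowed in
# between; ASCII only; at most 15 characters; names are case-insensitive.
NAME_RE = re.compile(r'^[A-Za-z](?:[A-Za-z0-9-]{0,13}[A-Za-z0-9])?$')

def valid_object_name(name):
    return bool(NAME_RE.fullmatch(name))

print(valid_object_name("Pool-01"))   # True
print(valid_object_name("my_pool"))   # False: underscores are not allowed
print(valid_object_name("1st-disk"))  # False: must start with a letter
print(valid_object_name("a" * 16))    # False: longer than 15 characters
```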
Windows Automatic Updates should always be disabled on any DataCore Server, as they often result in
reboots. Each update should be reviewed carefully and should be applied during scheduled maintenance
windows only.
Only full Service Packs are qualified. DataCore does not qualify individual Security updates, Pre-Service Packs
or Hotfixes. See FAQ 839 for more detailed information.
This does not mean that DataCore Software will not function correctly with them installed but as with any 'non-
qualified' software installed on a server it should be tested in a non-production environment to verify there are
no 'adverse effects' caused by them. Please remember that during any troubleshooting that may arise with the
Server software, DataCore may request the removal of the software to help resolve your issue.
Prior to installing any Microsoft update you should stop the DataCore software and stop presenting vdisks from
that server as many updates from Microsoft require a reboot and may affect the functioning of the Windows
Operating system that the DataCore software is installed on.
DataCore will not begin to qualify any Microsoft Service Packs until their official release date. So there may be a
slight delay while qualification takes place when a service pack is finally released to the general public.
Should any issues arise during qualification, DataCore will post documentation and/or code changes through its
support website that apply to the use of the specific Service Pack as a 'known issue'. We strongly urge you to
subscribe to FAQ 838 so that if any issues do occur, you will be notified by email directly.
DataCore does not recommend combining operating system updates with other software or hardware
maintenance on a Server, because if problems occur, multiple simultaneous changes make solving the
problem much more difficult.
On a DataCore Server, the page file (pagefile.sys) is not used by SANsymphony-V. However, it needs to be
configured with an appropriate size to record debugging information in case of an unexpected stop of the
DataCore Server (memory dump).
As a best practice, configure "Kernel memory dump" in the Windows Startup and Recovery settings and match
the size of the page file to the amount of installed RAM in the DataCore Server.
For more information regarding memory dump file options for Windows please see Microsoft KB article #254649
On Windows Servers, if the DataCore UI crashes, a "Windows Problem Reports and Solutions" window may
open prompting you to send information to Microsoft. Please do so, as DataCore can analyze the User-Mode
dumps sent to Microsoft.
Also configure Windows 2008 to save these User-Mode dumps on each DataCore Server: open regedit and
create the registry key
HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps
then close regedit. No reboot is required.
Now if the DataCore UI crashes a user mode dump will be saved on the DataCore Server and can be sent to
DataCore Support for analysis. For more information about Collecting User-Mode Dumps see
http://msdn.microsoft.com/en-us/library/bb787181(VS.85).aspx
Time synchronization
In SANsymphony-V it is important that the DataCore Servers are time-synchronized; license activation, CDP,
and error reporting in tasks and alerts may be affected if they are not. In addition, for troubleshooting purposes
(such as log comparison), it is helpful if the system times on all DataCore Servers in a Group and on all Hosts
are synchronized.
Leave Windows settings at their default values. Do not try to optimize Windows—the default settings are sufficient.
A DataCore Server is a storage hypervisor and must not be seen or accessed by any user besides the storage
administrators. DataCore recommends connecting the DataCore Servers to a dedicated management LAN or
VLAN, not to the public (user) network.
Where possible, it is recommended that DataCore Servers within the same server group be placed in the same
workgroup and not be members of a Windows domain*. Domain membership could apply policies,
restrictions, user rights, etc. to the machine which may interfere with proper SANsymphony-V functionality.
Therefore, DataCore Servers in a Group that do not need to be in a domain should be placed in a
dedicated Windows workgroup, such as "DATACORE". Do not leave them in the default workgroup
"WORKGROUP". This forces one of the DataCore Servers to be the workgroup's Master Browser and ensures
quick responses.
*NOTE: Some features require workgroup membership to function, such as when SANsymphony-V is installed
on a Windows Hyper-V cluster. Please refer to the specific documentation for that configuration.
DataCore Servers should use fixed IP addresses. Do not assign IP addresses dynamically by a DHCP server.
Name resolution
DataCore Servers communicate with their peers by hostname. For simplicity, DataCore recommends not
registering the DataCore Servers on a DNS server but using the HOSTS and LMHOSTS files for static name
resolution instead. Enter the hostnames and IP addresses of all DataCore Servers within a Group into the
HOSTS file located in: C:\WINDOWS\system32\drivers\etc.
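Entries in the HOSTS file are plain "IP-address hostname" pairs, one per line. The snippet below illustrates the format and how such entries resolve to addresses; the server names and IP addresses are hypothetical:

```python
# Hypothetical HOSTS-file content for two DataCore Servers in a Group.
hosts_content = """\
192.168.10.11  DCS-NODE1
192.168.10.12  DCS-NODE2
"""

def parse_hosts(text):
    """Parse 'IP hostname [aliases...]' lines into a {hostname: ip} map."""
    mapping = {}
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name.upper()] = ip  # hostnames are case-insensitive
    return mapping

print(parse_hosts(hosts_content)["DCS-NODE1"])  # 192.168.10.11
```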
SANsymphony-V will also need to have port 3793 open on the firewall for inter-node communications, to send a
support bundle and for replication. Port 3260 will need to be opened for iSCSI connections.
It is recommended that DataCore Servers in the same Group communicate with each other over the LAN and
have a dedicated LAN connection that is not used for iSCSI Traffic.
See the following diagram as an example of how a network connection can be set up:
Given the ongoing virus and worm attacks against Microsoft Windows operating systems, it is extremely
important to make the Server on which the SANsymphony-V software resides especially immune to these types
of attacks. This section is intended to educate administrators in the task of creating, adapting and applying
existing policies to the Server specifically.
NOTE: The topic of computer security is very broad, encompassing many areas. The most recognized
independent authority on this topic is the Computer Emergency Response Team (CERT), which
classifies computer security into four areas: confidentiality, integrity, availability and mutual
authentication. Making a server immune to virus and worm attacks addresses the area of availability,
and this section focuses on that area. Other areas are beyond its scope; however, several additional
resources on other topics are provided for your convenience.
General
As a first step, the storage administrator is encouraged to research and become familiar with the existing
corporate policy for Host security and anti-virus protection. Adopting a best practice that is the same as or as
similar as possible to that of the corporate policy increases the level of ‘buy in’ from other IT and operational
organizations within the enterprise. This is important in order to maximize and leverage expertise already in
house and to minimize the possibility for error due to lack of familiarity or understanding by administrators
outside of the storage group. Becoming knowledgeable about the current corporate security environment helps
the storage administrators focus research efforts and assess threats.
The next step is to leverage publicly available resources for guidance, recommendations and awareness. There
are many to choose from ranging from general to specific information. The following is a sampling of excellent
resources used by DataCore and many of our customers:
In reviewing these resources, keep in mind that these guidelines are written for general applications, everything
from home computers to corporate servers. At a minimum, DataCore Servers should receive the same high
level of attention as any other mission-critical, special-purpose enterprise server (e.g. SQL, Exchange and file/print
servers). If you have any questions about the applicability of specific third-party virus/worm protection
procedures to a DataCore Server, consult DataCore Technical Support.
The following list of practices is designed to ensure a clean installation and includes a means to recover quickly
in the event that an infection should occur on any given DataCore Server. These practices follow the guidelines
of the documents referenced above and have been successfully applied by DataCore’s customers, our
professional services team, our certified installation partners, and our own corporate Information Technology
department.
1. Establish a firewall – Few enterprises today have the luxury of physical separation between the WAN
and LAN. Although Microsoft recommends enabling the Windows firewall, DataCore does not
recommend its use; during installation the DataCore software will change the settings that it needs to
function. DataCore Servers should always be located behind at least one hardware firewall. If the
enterprise has multiple levels of trusted networks, then the DataCore Server should be placed on the
network with the highest level of security or behind an intranet firewall. If communication between
DataCore Servers needs to cross a firewall, or if there is a reason to activate firewall software (such as
the Windows firewall), then certain ports must be opened: port 3260 is needed for iSCSI traffic and
port 3793 for inter-node communication and replication.
2. Create a separate Windows Workgroup for Storage Servers – Making them members of the general
domain adds to network overhead.
3. Load the base Operating System and the latest DataCore-qualified Service Pack – DataCore
qualifies the latest Microsoft Service Pack (SP) upon general release. Check the prerequisites page on
the DataCore website, particularly with new SP releases, to determine if it has been qualified for your OS.
4. Change the default Administrator password, control distribution of the password and change it
regularly – This common-sense recommendation immediately increases virus immunity significantly.
5. Review Security Bulletins and apply appropriate Security Hotfixes for the Operating System and
Windows Explorer – For a comprehensive list of bulletins go to the Microsoft Technet Website
http://www.microsoft.com/security/default.mspx
6. Disable unused TCP/UDP ports – NOTE: DataCore recommends applying these changes as part of
the installation, before running the Functional Test Plan (FTP), to verify proper operation before putting
any DataCore Servers into production.
7. Apply anti-virus software with the latest virus signatures – Configure the software to scan system
resources and formatted disks only. Raw (unformatted) volumes should not be scanned, as this will
interfere with normal operation. Virus scanning policies at the Host will ensure that the data stored on
DataCore-managed disks remains virus-free. DataCore does not recommend or qualify specific virus
scanning software products; any known issues are listed in FAQ 1277. As DataCore does not carry
out interoperability testing with virus scanning software, in the course of troubleshooting an issue we
may ask for it to be temporarily uninstalled.
8. Create a backup of the system (boot) disk – This is a good precaution to take. In the event that a
virus or worm should infect the DataCore Server, it will speed up recovery of the Operating System
onto which the SANsymphony-V software is reinstalled.
9. Install DataCore software, Optional Products and latest Product Service Packs – Check
DataCore's technical support website for the latest product release.
10. Do not install any other services or software – The DataCore Server serves a special purpose as a
high-end disk controller. It should not be used to perform other functions not directly related to
SANsymphony-V other than downloading service packs, hotfixes or performing other SAN diagnostic
functions.
11. If Replication is used, exclude the Replication source buffers from being scanned.
If DataCore Servers from the same Group are located in the same room or data center, they should never be
connected to the same power circuit. In case of power loss to one DataCore Server, the other DataCore Server
can take over I/O handling for mirrored vdisks and switch off write caching of the affected mirrors to prevent
data loss or corruption. If both DataCore Servers lose power simultaneously, this safety mechanism will fail
and cache content will be lost. DataCore Servers and all components in the I/O path should be connected to
battery-backed power to prevent unexpected power loss.
Non-Windows compliant UPS: When a UPS is used that is non-compliant with Windows, SANsymphony-V is
unable to monitor the UPS power state in order to stop the DataCore Servers as the low battery state is
reached. The DataCore Server therefore must be manually stopped to avoid an unclean shutdown.
As part of the controlled shutdown process, the following command needs to be run on the DataCore Server:
net stop dcsx
It may also be possible for SANsymphony-V users to implement one or more of SANsymphony-V's PowerShell
cmdlets in their own scripts.
See: http://www.datacore.com/SSV-Webhelp/Getting_Started_with_SANsymphony-V_Cmdlets.htm
DataCore does not have specific guidelines on which PowerShell cmdlets to use in this case, so if you are in
any doubt, simply run the 'net stop dcsx' command mentioned above.
SANsymphony-V comes with a System Health monitoring tool which can be configured. Please refer to the
SANsymphony-V online help system for more information.
http://www.datacore.com/SSV-Webhelp/System_Health_Tool.htm
Install the agent of your monitoring application (Microsoft Systems Manager, HP OpenView, WhatsUp Gold,
Nagios, EventSentry, etc.) on the DataCore Server. The agent should be configured so that it scans the
Windows System and Application event logs for events of type "Error" or "Warning" and reports them to
the monitoring management application or to the administrator's or help desk's email address.
In addition, you should configure automated tasks to send alerts via SMTP or configure SNMP Support.
DataCore also has a SCOM component that is a free download from our website.
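The filtering rule described above can be sketched as follows. This is a minimal illustration of what the monitoring agent should implement, not the agent's actual API; the event records are hypothetical samples (a real agent would collect them from the Windows event logs via its own mechanism, e.g. WMI).

```python
# Sketch of the filtering rule a monitoring agent should apply to the
# Windows System and Application event logs. Event records here are
# hypothetical samples, not real log output.

ALERT_LEVELS = {"Error", "Warning"}
WATCHED_LOGS = {"System", "Application"}

def events_to_report(events):
    """Return the events that should be forwarded to the monitoring
    management application or help desk email address."""
    return [e for e in events
            if e["log"] in WATCHED_LOGS and e["level"] in ALERT_LEVELS]

events = [
    {"log": "System", "level": "Error", "id": 7},
    {"log": "System", "level": "Information", "id": 7036},
    {"log": "Application", "level": "Warning", "id": 1000},
    {"log": "Security", "level": "Error", "id": 4625},
]

for e in events_to_report(events):
    print(e["log"], e["level"], e["id"])
```

Only the System error and the Application warning pass the filter; informational events and other logs are ignored.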
General: It is not recommended to install a backup agent or backup software on a DataCore Server and
perform full or incremental backups from time to time, for the following reasons:
1. Backup software may try to get exclusive access to a file during the backup process. If the backup
software tries to lock a DataCore system or configuration file it may cause a malfunction or crash.
2. Restoring a DataCore Server from a full backup (on tape or disk) may be a long process: Install the
Windows operating system, install service packs and hot fixes, install backup agent, connect to backup
server or tape drive, begin restore and wait to complete. There are smarter and faster ways to restore a
DataCore Storage Server.
The DataCore Server itself does not hold much configuration information. The majority of the configuration
information, such as pool configuration and virtual disk information, is stored in the metadata header on the physical
disks in the backend. The information is also stored in the xconfig.xml file located in C:\Program
Files\DataCore\SANsymphony-V.
These files are always identical on every DataCore Server within a Group and are updated on each update of
the configuration.
Backup Configuration
If a hardware failure (e.g., disk failure) or software failure should occur, your configuration will be simple to recreate.
Configuration files can be preserved from the SANsymphony-V Management Console or by using a Windows
PowerShell™ script file provided in the SANsymphony-V file folder. Configuration files are restored by running
the Windows PowerShell script file. See the SANsymphony-V Help system for more information.
It might be helpful to create an image file of the boot partition (C:\) before carrying out major maintenance tasks,
such as installing a new Windows Service Pack, etc. Make sure that all DataCore services are stopped before
creating an image file. This image conserves the status of all mirrors in the registry at this specific point in
time and cannot be used later. Restore this image only if your immediate attempt to update fails. It is not
suitable for use in any other situation or at a later time, since actual mirror states will not be preserved. Loss
of data may result if this backup is reapplied improperly.
Installing the Windows operating system can be significantly sped up if an image of the C:\ partition is available.
In order to restore a DataCore Server, install the Windows operating system and then SANsymphony-V from
scratch. Rejoin the Group then restore the configuration as outlined in the Online Help system under ‘Backing
Up and Restoring’. If you have not backed up the configuration before attempting to restore and you need to
get it back, please contact DataCore Technical Support for assistance.
Pooling and sharing resources is a good idea in many regards. However, centralization carries a risk too: if one
instance fails, several services might be affected. For this reason, modern IT infrastructures – not only storage
environments – provide various levels of fault tolerance and redundancy.
In a true High Availability (HA) environment, the outage of any single device must not impede the availability of
the whole system, though some constraints may still apply. For example, performance may be temporarily
degraded in the event of a failure and during the process of recovery. To ensure this, any "Single Point of Failure"
(SPOF) must be strictly eliminated from the design.
High Availability is an end-to-end process, from the user to the data. If only some links in the user-data chain
are highly available, it cannot be considered a true HA environment. In this document we pay attention only to
what we can "see" from our storage perspective. However, many more factors should be considered, such as
application availability (clustering), network availability, power supply, climate control, and so on.
The logical diagram below shows an HA storage environment – everything is doubled (at the very least). Notice
that there are two HBAs/NICs in the Host connected to two independent networks or fabrics, two DataCore
Servers each controlling separate storage, all data mirrored, and redundant links between all devices. For any
single failure, there is always a second path to reach the physical storage location.
[Diagram: Host – DataCore Servers – Storage/Data]
In addition to redundancy, separate components limit the effect of environmental impact. Place components in
different locations (racks, rooms, buildings), connect devices to different power circuits, and ensure that Hosts,
network infrastructure, and air conditioning units are also redundant.
Whether fibre channel or iSCSI networks are used, separate physical and logical isolation of pathing and routing
is needed to achieve the highest degree of availability and proper failover in the event of a failure.
Diagram Fabric-1 shows a recommended serving of a mirror virtual disk to a typical clustered application.
All arrows are from initiator to target.
Blue is the primary initiator/target path for the hosts.
Red is the secondary initiator/target path for the hosts.
The mirror paths are shown in green and pink.
Things to note about this properly mapped diagram are:
Each host has two Fibre Channel/iSCSI ports with physically separate primary and secondary paths
Hosts should use the same switch for primary I/O
No mirror path should follow any Host initiator/target paths. Particularly those that are part of the same
mirrored virtual disk.
Mirror paths should be routed through separate switches/fabrics to prevent double failure and assure
highest availability.
In order to ensure stable and manageable operations, some basic rules for SAN setup and cabling should be
followed. SANs are typically highly customized to the particular customer environment. This circumstance
makes it challenging to issue generally valid recommendations. Especially in regards to the cabling layout, this
will have to be adjusted to satisfy various needs.
First consider the frontend ports of the environment. Pay attention to following points:
Connect all frontend ports to the same fabric or switch (as shown in diagrams below).
Configure the ports for their purpose (FE (frontend), MR (mirror) or BE (backend))
[Diagrams: recommended cabling layouts with inter-switch links (ISLs)]
Some scenarios require a different type of cabling layout, such as for long-distance ISLs or to meet operating
system vendor recommendations. Please refer to Technical Bulletins available on the DataCore Technical
Support Website and check with your Host operating system vendor for other requirements.
By default, all ports are set to target & initiator mode. Frontend ports should operate in target-only SCSI mode
because some operating systems get "confused" if they see a port operating in both modes simultaneously.
When a port is configured as FE, the local DataCore Server turns the port off if services are stopped. Some
operating systems require this behavior, and it speeds up failover if a DataCore Server is shut down.
DataCore Server mirror ports can be connected either directly (point-to-point) or through switches. Both
solutions have pros and cons and their importance must be decided case by case for your environment. In
general, at least two physically separated mirror ports should be used to avoid single points of failure.
Direct connection (point-to-point)
PRO
Mirrors do not break if a switch goes down (such as for firmware upgrades)
CON
Distance limitation of cables
Not feasible with more than two DataCore Servers
Connection through switches
PRO
Allows configurations with more than two DataCore Servers
Longer distances possible (such as with stretched fabrics)
CON
Switch outage (such as for firmware upgrades) may cause mirror recoveries
Why zone?
Zoning is a fabric-based service in a Fibre Channel SAN that groups host and storage ports that need to
communicate. Zoning allows nodes to communicate with each other only if they are members of the same zone.
Ports can be members of multiple zones.
Zoning not only prevents a host from unauthorized access of storage assets, it also stops undesired host-to-
host communication and fabric-wide Registered State Change Notification (RSCN) disruptions. RSCNs are
issued by the fabric name server and notify end devices of events in the fabric, such as a storage node going
offline. Zoning isolates these notifications to the nodes that require the update. This is important for non-
disruptive I/O operations, because RSCNs have the potential to disrupt storage traffic.
Zoning approach
There are multiple ways to group host ports and storage ports for a zoning configuration. Hosts (initiators)
rarely need to interact directly with each other, and storage ports never initiate SAN traffic by their nature as
targets.
The recommended grouping method for zoning is "Single Initiator Zoning (SIZ)" as shown in the diagram below.
With SIZ, each zone has a single initiator port and one or more storage ports. If the initiator port needs to access
both disk and backup storage devices, then two zones should be created: one zone with the port and the disk
devices, and a second zone with the port and the backup devices. SIZ is optimal because it prevents any
host-to-host interaction and limits RSCNs to the zones that need the information within the RSCN.
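The SIZ grouping rule above can be sketched as follows. This is an illustration of the rule, not a switch API; the helper function and the aliases (taken from the naming examples later in this chapter) are for demonstration only.

```python
# Single Initiator Zoning sketch: build one zone per initiator alias,
# containing that initiator and the target aliases it needs to reach.
# Aliases are illustrative, reusing the naming examples in this chapter.

def single_initiator_zones(access_map):
    """access_map: initiator alias -> list of target aliases.
    Returns {zone_name: sorted member list}; each zone is named after
    its (unique) initiator, as recommended for zone naming."""
    return {initiator: sorted([initiator] + targets)
            for initiator, targets in access_map.items()}

zones = single_initiator_zones({
    "SQL_SERVER_FabA": ["DATACORE1_TARGET_FabA", "DATACORE2_TARGET_FabA"],
    "SDS01_Backend_2E5D": ["ArrayXYZ_CtrA_Port0", "ArrayXYZ_CtrB_Port1"],
})
# Every zone contains exactly one initiator, so host-to-host
# communication and cross-host RSCN delivery are prevented.
```

With this grouping, the two initiators never share a zone, which is exactly what limits RSCN propagation.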
On initiators, TPRLO (Third Party Process Logout) has to be turned off.
[Diagram: single-initiator zones – each initiator zoned to its targets through the SAN switch]
There are two types of zoning available in a fabric: soft zoning (WWN or alias based) and hard zoning (port
based).
Soft zoning is the practice of identifying and grouping end nodes by their World Wide Name (WWN) or their
respective alias name. The WWNs or aliases are entered in the fabric name server database. The name server
database is synchronized and distributed across the fabric to each switch. Therefore, no matter what port an
HBA is plugged into, it queries the name server to discover the devices in its zone.
With hard zoning, nodes are identified and grouped by the switch port number they are connected to. Switch
ports in the same zone can communicate with each other regardless of “who” is connected to those ports. The
fabric name server is not involved in this zoning mechanism.
Soft zoning implies a slightly higher administrative effort when setting up the fabric compared to hard zoning.
However, it has significant advantages concerning management and solving connectivity issues. Due to the fact
that soft zoning is handled by the name server, a node can easily be moved to another switch/port within the
fabric. Conversely, a node cannot be plugged into a wrong port by accident – it simply does not matter where a
node is plugged in, it always sees the correct zone members.
Furthermore, soft zoning provides valuable information for solving fabric issues, especially if people who are not
familiar with the fabric setup are involved. For instance, it is easier to understand that alias
SQL_SERVER_FabA can communicate (is in the same zone) with alias DATACORE1_TARGET_FabA.
Troubleshooting in a hard zoned fabric tends to be more cumbersome and requires the precise (and up-to-date)
documentation of the physical connections between hosts and fabric (SAN diagram, cabling scheme and so
on).
Naming conventions
Naming conventions are very important for simplifying zoning configuration management. User-friendly alias
names ensure that zone members can be understood at a glance and configuration errors are minimized. Good
practice for host aliases is to reference the hostname plus the particular HBA (number, slot, WWN). In the case of
DataCore Servers, it might also be helpful to point out the port role (frontend, backend, or mirror). For storage
arrays which have multiple controllers and/or ports, this is also useful information to mention in the alias
name. Following are some examples of alias names (used throughout this chapter):
SQL_SERVER_FabA
DATACORE1_TARGET_FabA
SDS01_Backend_2E5D
ArrayXYZ_CtrA_Port0
Zone names should reflect their members. Following the rule that there should be just one initiator member per
zone, the initiator alias is unique and is a good name for the zone name too, optionally supplemented by the
targets this initiator can access. For example:
A zone that connects the DataCore Server’s backend port to the storage array XYZ with two FC ports:
Zone name: SDS01_Backend-XYZ contains the members:
SDS01_Backend_2E5D (initiator)
ArrayXYZ_CtrA_Port0 (target)
ArrayXYZ_CtrB_Port1 (target)
In general, the same rules for Fibre Channel environments apply to iSCSI environments. Since iSCSI uses the
IP protocol and Ethernet infrastructure, these additional general rules apply:
iSCSI traffic should be strictly separated from the public (user) network.
Use separate hardware (LAN switches) for iSCSI storage traffic or at least different virtual LANs
(VLANs).
Use separate NICs for iSCSI than those used for inter-Server and management communication.
Deactivate every other protocol or service except for "Internet protocol (TCP/IP)" on NICs used for
iSCSI.
Port 3260 needs to be open on the firewall to allow iSCSI traffic.
IPv6 is not supported on iSCSI traffic.
Do not disable any DataCore Software iSCSI Adapters under Server Manager, Diagnostics, Device
Manager, or DataCore Fibre-Channel Adapters.
If some DataCore Software iSCSI Adapters are disabled, rebooting the DataCore Server can lead to the
wrong channels becoming unavailable after the reboot. This can lead to broken iSCSI connections to
DataCore Software iSCSI targets.
Rename the Server Ports in SANsymphony-V so it is clear that these DataCore Software iSCSI Adapters are
not to be used for anything else.
Other possibilities are as follows: In SANsymphony-V, remove the Front End (FE) and Mirror (MR) port
roles from the Server Port to prevent use or session log in by an iSCSI initiator.
First consider the frontend ports of the environment. Pay attention to the following points:
Connect all frontend ports to the same switch or VLAN (as shown in diagrams below).
Allow only one IQN per Host initiator to login to the same DataCore target port. Multiple iSCSI sessions
from the same host IP address to a single DataCore Server iSCSI target are not supported.
By default, no more than 16 iSCSI initiator connections are allowed to an iSCSI target at any one time;
please refer to FAQ 1235 for more information.
DataCore Server mirror ports can be connected either directly (crossover cable) or through switches. Both
solutions have pros and cons and their importance must be decided case by case for your environment. In
general, at least two physically separated iSCSI mirror ports should be used to avoid single points of failure.
In order to use iSCSI for mirroring, the Microsoft iSCSI Initiator software must be enabled on the DataCore Servers.
Do not configure NIC teaming or bonding when using the MS iSCSI Initiator, as outlined in Microsoft's release notes.
Customers are encouraged to check configurations and eliminate any situations where multiple
initiator-to-single-target sessions have been created. Multiple iSCSI sessions from the same IP address to a
single DataCore Server iSCSI target are not supported.
Note: There is one dedicated iSCSI mirror path between DataCore Servers in each direction.
Direct connection (crossover cable)
PRO
Simplicity (less configuration, no switch involved)
Mirrors do not break if switch goes down (such as for firmware upgrades)
CON
Distance limitation of cables
Not feasible with more than two storage servers
Connection through switches
PRO
Allows configurations with more than two storage servers
Longer distances possible (such as for WAN links)
CON
Switch outage (such as for firmware upgrades) may cause mirror recoveries
Disk performance is one of the most important characteristics in a storage environment. For designing and
implementing appropriate solutions it is crucial to understand the difference between a "classic" storage array
and a DataCore thin provisioned pool (disk storage pool). In a classic environment, Hosts have exclusive
access to a disk or RAID set. The RAID set handles the I/O traffic of this particular server or application and
should be optimized accordingly. Within a DataCore thin provisioned pool, physical disks or RAID sets are
typically shared among multiple Hosts, so the I/O pattern those disks experience may look very different.
In a DataCore thin provisioned pool, physical disks are put together into a pool consisting of multiple storage
LUNs from which Virtual Disks are created. These Virtual Disks are then served to one or more Hosts.
The DataCore pool provisions storage allocation units (SAUs) from the physical disks as requested by the
writes from the Hosts. The data content of the Virtual Disks by default is equally distributed across all physical
disks within a pool.
[Diagram: Host – Virtual Disk – Disk Pool]
Different disk technologies have significant performance differences. Disk performance is primarily represented
by three key factors:
Average Seek Time The time the read/write head needs to physically move to the correct place.
IOs per Second The amount of operations a disk can handle per second.
MBs per Second The amount of data a disk can transfer per second.
Compared to drive manufacturers' data sheets, real-world values seem low. Published performance values by
drive and storage array vendors are often misleading and generally represent the best-case scenario.
environment where physical disks are shared among several Hosts the access pattern is typically highly random
which causes a lot of repositioning of the actuators (disk’s read/write head assemblies). For this reason the
average “real world” performance is often much lower than the benchmark maximum measured in the lab.
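The mechanical factors above can be turned into a rough per-disk estimate: a random I/O costs roughly one average seek plus half a rotation. The seek times and rotation speeds below are illustrative assumptions, not vendor data.

```python
# Rough single-disk random IOPS estimate from mechanical characteristics.
# Seek times and rotation speeds are illustrative assumptions.

def random_iops(avg_seek_ms, rpm):
    """One random I/O costs roughly one seek plus half a rotation."""
    rotational_latency_ms = 0.5 * 60000.0 / rpm  # half a revolution, in ms
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)

print(round(random_iops(3.5, 15000)))  # fast FC/SAS disk
print(round(random_iops(9.0, 7200)))   # SATA disk
```

With these assumptions a 15k rpm disk lands near 180 random IOPS while a 7.2k rpm SATA disk lands near 75 – consistent with the "real world much lower than the data sheet" observation above.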
Fibre Channel and SAS disks have comparable technologies and performance characteristics. SAS disks
typically have slightly better performance because the average seek time is shorter on a smaller platter.
FC and SAS disks can respond quickly to random I/O requests and are the first choice for
performance-intensive and highly random I/O patterns like database applications and email server applications.
SATA disks have a different technology inside the box compared to FC and SAS disks. They have a less
expensive mechanical design that results in slower rotation speeds and higher seek times for positioning the
read/write heads. SATA disks can perform very well (up to 90% of FC/SAS disks performance) if they are
accessed by sequential read/write traffic and large I/O sizes. If SATA disks need to respond to a large number
of small, random I/O requests they may deliver poor response times. SATA drives are the first choice for
capacity-hungry applications with low performance requirements or mainly sequential I/O, like archive systems,
media streaming, or backup to disk.
FC & SAS RAID 1 / 10 Tier 1 Heavily used database, email, ERP systems etc.
FC & SAS RAID 5 Tier 2 File Service, lower loaded database applications
SATA RAID 5 / 6 Tier 3 Archive, Media Storage (x-ray, video), Backup to disk
An old rule of thumb is that the more spindles a disk array contains the more performance it delivers. This rule
still applies. A RAID controller or storage controller in a storage array can distribute the incoming I/O requests to
all disks in a RAID set. The disks are acting independently from each other so the performance of the RAID set
is the sum of all disks grouped in a RAID set.
Following are examples showing two RAID 5 sets of the same capacity. One set is built with a few large
relatively slow SATA disks and one set is built with many small relatively fast SAS disks. The calculated
performance of the RAID set with the small SAS disks is approximately 15 times higher compared to the set
with large SATA disks.
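The comparison can be reproduced with assumed per-disk figures. The original example's concrete numbers accompanied a graphic that is not reproduced here, so the disk counts, capacities, and per-disk IOPS below are illustrative assumptions chosen to give roughly equal usable capacity:

```python
# Two RAID 5 sets of similar usable capacity, with assumed per-disk
# random IOPS (illustrative figures, not vendor data).

def raid5_set(disks, capacity_gb, iops_per_disk):
    usable_gb = (disks - 1) * capacity_gb      # RAID 5: one disk's worth of parity
    return usable_gb, disks * iops_per_disk    # spindles act independently

sata_usable, sata_iops = raid5_set(disks=4,  capacity_gb=2000, iops_per_disk=75)
sas_usable,  sas_iops  = raid5_set(disks=24, capacity_gb=300,  iops_per_disk=185)

print(sata_usable, sata_iops)        # 6000 GB usable, 300 IOPS
print(sas_usable, sas_iops)          # 6900 GB usable, 4440 IOPS
print(round(sas_iops / sata_iops))   # roughly 15x with these assumptions
```

Many small, fast spindles deliver an order of magnitude more random IOPS than a few large, slow ones at comparable capacity.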
When designing the appropriate disk layout of a storage solution, the number of disks being used is a significant
factor. Not only is the requested capacity essential; the overall performance requirements also define the number
of disks needed to satisfy performance goals.
RAID Layout
Physical disks connected to a RAID controller can be grouped into RAID sets and carved into LUNs presented to
the host in several ways. Understanding the relationship between physical disk/LUN grouping and the DataCore
pool technology is crucial to getting the best performance results. The block (sector) size must be 512 bytes.
A DataCore Server can send a write to the storage array of anywhere between 4 KB and 1 MB per I/O. The
Server can merge writes depending on how they are sent from the Host, and this can change the size of a given
write I/O at any time; therefore, DataCore recommends leaving the storage array's stripe size at the storage
vendor's recommended default.
The following examples demonstrate the relationship between number of disks, RAID set, LUNs and the
DataCore pool technology.
Example 1 -- 15 Disks in 1 RAID Set, 1 LUN created from RAID set and exported to the pool:
Pro:
Good performance due to striping across 15 physical
spindles
small storage loss for RAID overhead (depending on
chosen RAID level)
Con:
DataCore has just a single I/O queue to the disks which
may result in congestion
If one physical disk fails, the whole LUN is affected by
RAID rebuild
RAID rebuild may take a long time on large RAID sets
and degrade performance
Example 2 -- 15 Disks in 1 RAID Set, 3 LUNs created from RAID set and exported to the pool:
Pro:
small storage loss for RAID overhead (depending on
chosen RAID level)
Con:
If one physical disk fails, all three LUNs are affected by
RAID rebuild
RAID rebuild may take a long time on large RAID sets
and degrades performance
The DataCore pool concept of distributing allocated
blocks conflicts with the RAID layout – creates
additional seek and rotational latency on the physical
disks, thus degrading performance
Example 3 -- 15 Disks in 3 RAID Sets, 1 LUN created from each RAID set and exported to the pool:
Pro:
DataCore has three I/O queues to the disks.
The distribution algorithm spreads out the load across
LUNs and increases performance
A failed physical disk affects just one LUN
RAID rebuild is quicker with fewer disks.
Con:
Greater storage loss for RAID overhead (depending on
RAID level)
JBOD – Single Disks
Pro
Full use of capacity
Many I/O queues to disks
Pool spreads loads across spindles
Con
One failed disk will affect the entire pool(s)
Disk failure may cause long recovery time
Bad blocks on disks are not recognized
If JBODs are used in a pool, a physical disk failure will impact every virtual disk in the pool. DataCore mirroring
of all virtual disks is mandatory to prevent outage and data loss. Depending on the amount of data that must be
recovered from the mirror partner, long recovery times may result.
RAID 0 – Striping
Pro
Full use of capacity
Highest write performance
Highest read performance
Con
Highest risk potential
DataCore has just a single I/O queue to the disks which
may result in congestion
One failed disk destroys all data in the pool
Disk failure may cause long recovery time
RAID 0 sets typically have very high performance and the highest risk potential. DataCore Server mirroring of all
virtual disks is mandatory to prevent outage and data loss. A single failed disk destroys the whole affected
LUN/pool and causes all data to be recovered from the mirror partner side.
RAID 1 – Mirroring
Pro
High security level
High sequential read/write performance
High random read/write performance
DataCore spreads load across all RAID1 sets
Failed disk doesn’t significantly affect performance
Quick recovery from disk failure
Con
50% capacity loss
Pools containing multiple RAID 1 sets typically have the highest security level and high performance. The
DataCore pool algorithm distributes the load across all LUNs. Disk failures affect just one LUN and usually
recover very quickly. Pools with multiple RAID 1 sets are recommended for non-mirrored volumes and
applications which cause lots of small random I/Os (like database and email applications). Pools which contain
numerous volumes accessed by many Hosts may experience highly random I/O too.
RAID 10 – Striped Mirrors
Pro
High security level
High sequential read/write performance
Highest random read/write performance
Con
50% capacity loss
Fewer I/O queues compared to multiple RAID 1 sets
RAID 10 sets do not have significant advantages compared to multiple RAID 1 sets in a pool. Only with some
specific access patterns (e.g., heavy sequential reads) may the Host benefit from the underlying block-oriented
striping of the RAID set.
Generally, multiple RAID 1 sets provide the same result plus more advantages (such as more I/O queues) and
are usually preferred over RAID 10 configurations.
RAID 5 – Striping with Parity
Pro
Moderate capacity loss
High sequential read/write performance
High random read performance
Con
Low random write performance
Moderate security level
Failed disk / rebuild impacts performance
RAID 5 sets perform very well with highly sequential access patterns and random reads. Due to the nature of
recalculating and updating parity information, heavy random writes may suffer low performance. Disk failures
cause a significant performance decrease during rebuild. RAID 5 sets are good for applications with mainly
sequential I/O or highly random reads, like file servers and averagely loaded databases. Creating multiple
smaller RAID 5 sets is preferred over fewer large ones.
Physical disks or RAID sets used in a particular storage tier should be equal in regard to their capacity, disk
technology and performance characteristics.
The DataCore pool algorithm distributes the allocated data blocks equally across all disks in a pool – so it
makes little sense to have slower and faster or larger and smaller "spots" (disks) within one tier. If a pool's
capacity is intended to be expanded and the original disk type/capacity etc. is no longer available, it might be
worthwhile to create a new pool and migrate Virtual Disks.
Match pool performance and capacity characteristics with application needs. For example, a pool containing a
high count of spindles of smaller capacity typically has significantly higher I/O performance than one with fewer
disks of high capacity. If performance/capacity requirements vary between applications (as is typically the
case), create multiple pools with different characteristics. Virtual disks created from a pool can later be
migrated to another pool if requirements change over time.
*Note: These are examples. It does not necessarily mean that a pool of RAID 5 sets is not suitable for email
applications or that a pool with SATA disks is not capable of serving a file server. The best RAID layout depends
on the effective performance requirements of the particular environment.
Auto-tiering:
Fixed tier assignment may not always be optimal; with SANsymphony-V auto-tiering, better use can be made
of different types of physical disks in a pool.
Access patterns within sections of a virtual disk can vary dramatically. For example, some files within a file
system may be more popular than others at any given time: frequently accessed blocks, moderately
accessed blocks, infrequently accessed blocks. This is more pronounced with clusters of virtual servers sharing
a common virtual disk. Fixed tier assignment:
Results in poor use of high-performance, premium-priced disks (especially SSDs)
Prevents other performance-oriented workloads from getting adequate response
More can be found in SANsymphony-V help: http://www.datacore.com/SSV-Webhelp/Auto-
tiering_and_Storage_Profiles.htm
The two parts of a mirrored Virtual Disk (primary/preferred and secondary/alternate) have to be of the same size.
Primary and secondary volumes of a mirror should have the same performance characteristics. In a
synchronous mirror relationship, the slowest member determines the resulting performance. In DataCore
environments where both volumes have independent cache instances, disk performance differences can be
compensated for to a certain degree by caching I/Os.
However, in some scenarios the true disk performance comes into effect, for instance after a disk failure occurs.
In this case, caching is switched off to avoid any loss of data kept in cache, and I/O requests are rerouted to the
secondary volume directly (see diagram below). If the secondary volume has significantly lower performance
characteristics than the primary volume, the Host will experience notably higher response times from its disk.
The secondary disk will not only solely carry the I/O load from the Host; in addition, it will serve the
resynchronization traffic when the primary disk comes up again.
For this reason, both sides of a mirror should be of the same performance characteristics. On the other hand,
there is nothing wrong with using disks of different performance – as long as the possible results are acceptable
in the particular environment.
[Diagram: after a disk failure, write I/Os from the Application Server/Host are rerouted to the secondary volume and logged; caches are disabled]
There is no set rule concerning the distance between the primary/preferred and secondary/alternate volumes of
a synchronous mirror. The limit will be determined by the acceptable latency of the application using the
mirrored virtual disk.
Read I/Os are always processed locally (primary volume) while write I/Os need to be transmitted to the remote
DataCore Server (secondary volume). Write I/Os to a stretched mirror experience a longer latency: the time
an I/O needs to travel to the remote side plus the same amount of time for the acknowledgement to return.
In environments where a direct connection (dark fibre network) between sites is used, latency is normally not a
problem, for example:
Dark fibre links can be stretched up to 10 km (with 1300 nm laser) or 35 km (with 1550 nm laser). A
dark fibre link adds a latency of around 5 microseconds per km.
o A microsecond (µs) is equal to one millionth of a second or one thousandth of a millisecond
(ms).
Typical SCSI transactions require a transaction to traverse the link 8 times, or four round trips.
This means a dark fibre link of 35 km adds a latency of 5 µs/km × 35 km × 8 (trips) = 1400 microseconds (µs) =
1.4 milliseconds (ms), which is negligible for most applications, but it could affect time-sensitive transactional
Hosts such as databases, which can send a lot of small I/Os per second.
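The calculation above can be generalized to any link length, using the figures from the text (around 5 µs per km, 8 one-way traversals per SCSI transaction):

```python
# Round-trip latency added by a dark fibre link, using the figures from
# the text: ~5 microseconds per km and 8 one-way link traversals
# (four round trips) per typical SCSI transaction.

def added_latency_ms(distance_km, us_per_km=5, traversals=8):
    return distance_km * us_per_km * traversals / 1000.0

print(added_latency_ms(35))   # 1.4 ms for a 35 km link
print(added_latency_ms(10))   # 0.4 ms for a 10 km link
```

The result scales linearly with distance, which is why synchronous mirror latency budgets are usually stated as a maximum link length.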
However, there are other considerations that can affect the reliability and latency across a link and need to
be taken into account:
Degradation of the signal along the cable; this tends to increase with the length of the link and brings
down reliability unless extra hardware corrects the degradation.
If the link (HBAs, FC switches, link hardware) does not have enough FC buffer credits, latency can
increase. The faster the FC speed in use (2 Gbps, 4 Gbps, 8 Gbps, etc.) and the longer the link, the
more FC buffer credits you will need to fully utilize the link.
DataCore recommends that you talk to your HBA, FC switch, link hardware, and cable provider if you have
concerns about any of the above bullet points.
In general, link distances between synchronous mirror members should neither exceed 35 km nor traverse
WAN connections. If this is a demand in your environment, please contact your DataCore reseller and/or
integrator prior to setup.
Snapshot source and destination virtual disks must be of the same size. A snapshot destination normally
contains much less data than the source volume (just the changed blocks). Because of this, it is recommended
to use small storage allocation unit (SAU) size for the pool which contains Snapshot destination virtual disks.
With snapshots, the first change occurring on a source virtual disk causes the migration of a corresponding
chunk of the original source blocks to the destination. For this reason, given heavy utilization of snapshot
relationships with many small write I/Os (typical for email applications) it is advisable to create the snapshot
pool with a small allocation unit size. The small allocation unit size will result in better capacity utilization of the
snapshot pool. The best unit size will depend on the size of the write I/Os to the source; the minimum unit size
can be 4 MB.
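The effect of the SAU size on snapshot pool consumption can be sketched numerically. Each first write to an unmodified region migrates one whole allocation unit; the write pattern below is an illustrative assumption (many small writes scattered across the source disk):

```python
# Why a small SAU helps a snapshot pool: every first write to an
# unmodified region migrates one whole allocation unit to the
# destination. Sizes in MB; the write offsets are illustrative.

def snapshot_pool_usage_mb(write_offsets_mb, sau_mb):
    """Each distinct SAU touched by a first write consumes sau_mb
    in the snapshot pool, regardless of the write size."""
    touched_saus = {offset // sau_mb for offset in write_offsets_mb}
    return len(touched_saus) * sau_mb

# 1000 small writes scattered every 128 MB across the source disk:
offsets = [i * 128 for i in range(1000)]
print(snapshot_pool_usage_mb(offsets, sau_mb=4))    # 4000 MB consumed
print(snapshot_pool_usage_mb(offsets, sau_mb=128))  # 128000 MB consumed
```

For the same scattered write pattern, the 4 MB SAU consumes a fraction of the capacity that a large SAU would, which is the rationale for the recommendation above.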
Snapshot Performance Considerations
Snapshot copy-on-first-write process: for every incoming write I/O to an unmodified chunk, the original chunk
must be relocated to the snapshot destination before the Host receives the write acknowledgement. The
faster the disks behind the destination, the quicker this can be accomplished. If the disks behind the snapshot
destination are significantly slower than the source, this may impact the overall performance of
the production virtual disk (source).
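A minimal latency model makes the point: the first write to each chunk pays for the chunk migration (read from source, write to destination) on top of the host write itself. The millisecond figures below are hypothetical examples, not measurements:

```python
def cofw_latency_ms(host_write_ms: float, chunk_read_ms: float,
                    dest_write_ms: float) -> float:
    """Effective latency of the first write to an unmodified chunk under
    copy-on-first-write: the original chunk is read and written to the
    snapshot destination before the host write is acknowledged."""
    return chunk_read_ms + dest_write_ms + host_write_ms

fast_dest = cofw_latency_ms(0.5, 1.0, 1.0)   # e.g. SSD-class destination
slow_dest = cofw_latency_ms(0.5, 1.0, 15.0)  # e.g. slow SATA destination
```

A slow destination multiplies the first-write latency seen by the production virtual disk, even though the source storage itself is fast.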
Every active snapshot relationship adds a certain amount of additional load to the source disk. SANsymphony-V
sets no limit on the number of snapshots per source; however, a high number of active snapshots should be
avoided. While a couple of snapshots may not noticeably influence performance, numerous snapshots can slow
down the source significantly.
In addition, there is rarely a valid reason to keep many snapshots per source virtual disk. Snapshot technology is
sometimes mistaken for a replacement for backup or continuous data protection; for those use cases, consider
other solutions such as DataCore Continuous Data Protection (CDP) or Replication.
Quiesce Host and flush cache before snap
A snapshot is a virtual image of a given point in time. To ensure valid data in the snapshot image, it is
crucial that the snapshot source is in an idle and consistent state at the moment the snapshot is enabled. The
following steps must be observed (in order):
1. Quiesce application / stop I/O to disk - ensures that no more write changes occur during the
creation (enable) of the snapshot.
2. Flush application cache and/or Operating System cache - ensures that all data is written to disk
and no data remains in cache instances on the Host.
3. Create snapshot relationship - the snapshot source virtual disk is in a consistent state.
4. Resume normal operation of application - once the snapshot is enabled, normal operation of the
application can be resumed.
The above steps are usually automated and can be achieved with scripts, PowerShell, Volume
Shadow Copy Service (VSS), and so forth; please refer to the SANsymphony-V Help system and DataCore
training course manuals.
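The required ordering can be sketched as follows. All function names here are hypothetical placeholders; a real deployment would invoke application-specific tools, VSS, or DataCore PowerShell cmdlets at each step:

```python
steps_run = []  # records the order in which steps execute

def quiesce_application():  steps_run.append("quiesce")   # placeholder: stop application I/O
def flush_caches():         steps_run.append("flush")     # placeholder: flush app/OS caches
def create_snapshot():      steps_run.append("snapshot")  # placeholder: enable the snapshot
def resume_application():   steps_run.append("resume")    # placeholder: resume normal I/O

def consistent_snapshot():
    """Enforce the ordering required for a consistent snapshot image."""
    quiesce_application()      # 1. no new writes during snapshot creation
    try:
        flush_caches()         # 2. all host-side data lands on disk
        create_snapshot()      # 3. source is now in a consistent state
    finally:
        resume_application()   # 4. always resume, even if a step fails
```

The try/finally guard reflects a practical point: the application should be resumed even if snapshot creation fails, so a scripting error does not leave production I/O quiesced.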
Source buffer
The source buffer stores, for all replicated volumes, all I/Os which have not yet been transmitted to the remote
site. The size of the source buffer should be determined after establishing the maximum allowable IP link
downtime. Replication is asynchronous only in the sense that the destination virtual disk can be out of sync and
contain older data than the source at any point in time, and not in any other sense. Host I/O to source virtual
disks can be degraded if the source buffer has relatively high latency, so it is best practice to place the source
buffer on very fast, low-latency storage.
Size the buffer after considering the possibility of IP link downtime between the source and destination servers.
The appropriate size of a buffer is determined by multiplying the amount of data expected to be transferred
daily by the maximum allowable IP link downtime.
For example, suppose your IP link goes down over a weekend. If the data change rate is 20 GB/day and the IP
link downtime could go uncorrected for two days, create a buffer of at least 40 GB. It is better to oversize the
buffer to allow for unforeseen increases in data transfers or miscalculations; with a 100 GB buffer, changes for
several days can be safely stored. As a general rule of thumb: use a fast local RAID 1 of 100 GB for the buffer
and expand it if needed.
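The sizing rule above is simple arithmetic; a small helper makes the calculation and the headroom recommendation explicit (the 50% headroom factor is an illustrative default, not a DataCore-specified value):

```python
def replication_buffer_gb(daily_change_gb: float, max_downtime_days: float,
                          headroom: float = 1.5) -> float:
    """Replication source buffer size: expected daily data change multiplied
    by the maximum tolerated IP link downtime, with extra headroom for
    unforeseen increases in data transfers or miscalculations."""
    return daily_change_gb * max_downtime_days * headroom

# The example above: 20 GB/day of changes, link down for two days
# -> 40 GB bare minimum (headroom=1.0), 60 GB with 50% headroom.
```
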
To avoid a high data change rate due to archive-bit-based backups, timestamp-based backups should
be used instead. Backups relying on timestamps typically do not touch or change the files they back up. Today
almost all major backup applications are capable of timestamp-based backups, so this issue can easily be
eliminated.
If the Replication destination is intended to be accessed for long periods of time, such as for testing or backup
purposes, a snapshot should be taken and served to the Host. To ensure a consistent state in the snapshot,
some rules apply as discussed in the Snapshot chapter of this document. For more details, see the
SANsymphony-V Help system and DataCore training course manuals.
Bidirectional Replication
Data protection requires adequate resources (memory, CPU, disk capacity) and should not be enabled on
DataCore Servers with limited resources.
Use dedicated pools for data-protected virtual disks. Disk pools used for data-protected virtual disks and history
logs should have sufficient free space at all times. Configure disk pool thresholds and email notification via
tasks so that you are notified when disk pool free space reaches the attention threshold. Alternatively,
PowerShell scripts can be created to automatically add raw disks to the pools and be used as task actions.
Enabling data protection for a virtual disk may decrease I/O performance and should be used with caution, to
protect mission-critical data only.
The default history log size (5% of the virtual disk size, with a minimum of 8 GB) may not be adequate for all
virtual disks. The history log size should be set according to I/O load and retention time requirements. Once set,
the retention period can be monitored and the history log size increased if necessary. The current actual
retention period for the history log is shown in the Virtual Disk Details > Info tab (see Retention period).
Copying large amounts of data to virtual disks can fill the history log and reduce the retention time; enable data
protection after copying the data to avoid this significant I/O load.
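The default sizing rule stated above can be expressed directly; this is only a starting point, to be tuned upward based on measured write load and the required retention period:

```python
def default_history_log_gb(vdisk_gb: float) -> float:
    """Default CDP history log size per this guide: 5% of the virtual disk
    size, with a minimum of 8 GB. Actual requirements depend on I/O load
    and retention time, so monitor and increase if necessary."""
    return max(0.05 * vdisk_gb, 8.0)

# A 100 GB virtual disk gets the 8 GB minimum; a 1 TB disk gets ~50 GB.
```
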
After an event that requires restoration of data, I/O to the affected virtual disk should be immediately suspended
and then rollbacks should be created. In this manner, older data changes will stop being destaged and rollbacks
will not expire. Keep I/O suspended until virtual disk recovery is complete.
A particular restore point must be chosen in order to create a rollback. There are two types of rollbacks,
Persistent and Expiring; the type cannot be changed after creation, so careful planning is needed when creating
a rollback.
Persistent - The history log will be blocked from destaging in order to keep the rollback restore point intact. If
the history log becomes full or reaches the retention period, any new writes to the CDP-enabled virtual disk will
fail. This will cause mirrored virtual disks to become redundancy failed or, in the case of a single virtual disk,
will fail new writes from the Host(s). To allow destaging to occur and unblock writes, the rollback must be split
or deleted.
Expiring - The history log is allowed to destage if it becomes full or reaches the retention period, and new
writes are not blocked. The rollback, however, will then no longer be valid and will need to be deleted.
Do not send large amounts of writes to a rollback or keep it enabled for a long period of time; this could fill up
the pool where the rollback history log is stored. Rollbacks are meant to be enabled for short periods of time
and then split or reverted to once the desired data has been found or recovered.
Rollbacks should only be created for the purpose of finding a consistent condition prior to a disastrous event
and restoring the virtual disk data using the best rollback. Delete rollbacks if they are no longer needed.
Special instructions apply to some Host operating systems. Please refer to the Technical Bulletins on the
DataCore Technical Support Website.
Microsoft recommends putting the database files and transaction logs on different disks. See the following
Microsoft KB articles:
"To provide fault tolerance in case a hard disk fails, keep your Exchange transaction log files and database files
on separate physical hard disks. If you keep these log files and database files on separate hard disks you can
significantly increase hard disk I/O performance."
"Because each Exchange store component is written to differently, you will experience better performance if you
place the .edb files and corresponding .stm files for one storage group on one volume, and place your
transaction log files on a separate volume."
Therefore, when using DataCore pools, place transaction logs and database files in different pools to ensure
that different physical disks are used. For performance, also consider using RAID 0 for transaction logs.