VMware Site Recovery Manager

Reference & Troubleshooting Guide
(Version: SRM Reference Guide_x.docx)

Information to help you with your SRM experience! This guide has been created for VMware SEs and for our partners' SEs who are responsible for working with our products at customer sites. It provides design, scalability, troubleshooting, and general information about SRM. This guide is intended for knowledgeable practitioners who are VMware staff or VMware partners. The information in this guide can help an experienced virtualization system engineer, but it can also hurt if you do not know what you are doing. This information comes with no warranty, implied or otherwise, and it is not sanctioned or warrantied by VMware. Corrections and suggestions are gratefully welcomed at mwhite@vmware.com.

Contents
Background
Educational materials
Some things to think about for a successful SRM project
When is SRM not a good solution?
Install / Uninstall Information
    Where should I get SRAs from?
    Install account
    Install and Configure information for specific environments
    Install Overview
    Install Test Outline
    Uninstall Information
    Installing (uninstalling) on Windows 2008

Upgrade / Patch Information
    What are the SRM build numbers?
    Upgrading to SRM 4.1.1
    Upgrading to SRM 4.1 – including upgrading to vSphere VirtualCenter 4.1
        VirtualCenter 4.1
        Upgrade to SRM 4.1
        Migration - SRM
        Undo
    To our next major release – SRM 4.0

Design Guidelines
    What goes wrong in SRM projects?
    Large VI environments
    Suggested Recommendations – aka “Best Practices”
    Failback Outline
    Bandwidth Usage
    Multiple Tier Applications
    Application References
    Protecting View Desktops
    Physical to virtual disaster recovery - P2V DR
    Shared Recovery
    Failback (plug-ins)
    A lost protected site and failing back to it
    A sample recovery plan for testing an application
        Exchange Recovery Plan
    Adding scripts to a Recovery Plan in a call out
    What should the PowerShell command look like to have it called from SRM?
    How can I see the environment variables that the admin guide says are available for scripts?
    Can a script execution in a recovery plan impact the inside of a protected VM?
    Will a non-zero script exit in a recovery plan stop the recovery plan?
    User designed callout has returned a non-zero value: 1

SRM Administration Information

SRM Reference Guide

Page 2 of 166

    What VM parameters are not failed over?
    Does number of PG impact order of start for high priority VMs?
    What about backing up the SRM databases?
    Can I change the Run button to work like the Test button?
    Can I use VMware Heartbeat to protect SRM and VC?
    How can I capture the log and configuration information for support to work with?
    Where are the SRM server logs stored?
    How do I capture the SRM plug-in log and config info?
    Where are the Linux Image Customization logs stored?
    I would like to retain the SRM logs longer
    What happens when…
        I add a new hard drive to an existing and successfully protect VM?
        I add CPU and memory to an existing protected VM
        I add a network card to an existing protected VM
        I add a new VM to an existing protection group
        I remove a protected VM from a protection group
    What ‘travels’ with VM’s between PG and recovery plans?
    How can I tell the SRM version from the log files?
    Installation logs
    Automated Install
    Changing log details
    I would like to have a automated SRM type solution without SRM
    How can I have SSL communications between SRM and NetApp
    What happens when I Storage VMotion a protect VM – or how does changes to VM storage affects protection?
    What should I know about using the bulk IP utility?
    SRM Licensing Information
        How does the SRM 4.1 licensing work?
        How does the SRM 4.0 licensing work?
        How does the SRM 1.0 licensing work?
        What does it look like if my VI is licensed for SRM?
        What does it look like if my vSphere is licensed for SRM – after Update 1?
        What will happen if my license expires?
    What is the account that is asked for during install used for?
    Is Essentials and Essentials Plus supported for SRM?
    How do I plan for disk utilization due to SRM database?
    I would like to use trusted certificates with SRM – help!
    Can I change the IP information for the SRM server?
    Can network customization work for operating systems other than Windows?
    Understanding order of operation for bringing VM’s back online
    How many VM’s can SRM start?
    Can I start more than, or less than, 2 VM’s per host?
    What does the Repair button do?
    Is it all over when the recovery plan fails?
    Can I move an SRM server to a new host?
    How can I configure a second HBA rescan?
    Recommended minimum alarm notifications
    SRM VirtualCenter events
    Is thin provisioned VM’s support with SRM?
    What does Microsoft offer for licenses for DR test?

    What vendors have application consistency options?
    What vendors have application consistency options that work with continuous replication?
    What rights does a user require to be a DR operator?
    SRM service doesn’t start, and event logs show errors with event ID of 7000 and 7009
    How can I have syntax highlighting to help read SRM log files?
        EditPlus
        Text Wrangler
Troubleshooting
    Things to watch out for
    How can I change the command Timeout?
    My Celerra prepare storage fails, and the error has a –null in it
    Where is the new Run and Test privileges?
    I have accidently deleted my Shadow VM’s – what should I do to fix this?
    SQL Authentication, and database access issues
    Why cannot I customize Windows 2008?
    Why is my IP customization taking about 10 minutes extra per VM?
    ESX 2.5 accessing protected datastore will cause recomputed datastore failures
    What causes the Recompute Datastore Group task?
    Why does my recovery plan show error on VM status but the VM’s are ok?
    I would like to avoid the messages about shutdown ...
    When using Bulk Import I get column errors
    Unable to find any array script files – Please check your SRM installation
    dr.secondary.fault.WrongVmInventoryPlacement
    Pairing Issues
    My Linux VM’s don’t have the host file changed after IP customization
    What time guidelines can I expect for protecting VM’s?
    What time guidelines can I expect for failing over VM’s?
    I cannot run more than one simultaneous recovery plan with my MirrorView SRA
    When trying to do Inventory Mappings the VI Client hangs
    “Failed to connect to the management system address when executing the discoverArrays command.”
    How can I re-initialize the SRM database
    Error LUNs with duplicate IDs or numbers received from SAN integration scripts
    I cannot uninstall SRM successfully – what can I do?
    SRM unlicensed error in logs but you have a good license
    SRM doesn’t start, and you just uninstalled an SRA
    Error: Failed to recover datastore:
    Network device needed by recovered virtual machine could not be found at recovery or test time
    Only three Recovery Plans can run at the same time
    Unable to create placeholder virtual machine at the recovery site: host, resource pool, and datastore are not compatible
    SRM doesn’t start and nothing in SRM logs or event logs – what to do?
    I can’t install the plug in – get an error
    Failed to test failover luns. Existing with failure
    Why is Port 80 used in the install but port 443 later?
    Unexpected MethodFault (dr.san.fault.ManagementSystemNotFound)
    For SQL server use, does the SRM DB user need the DB_OWNER permission?

    Changing passwords after SRM is working
    My recovery site is only using x number of hosts to start VM’s but it should be using y number
    Error:Expected virtual machine file path … vm-vmname/vm-vmname.vmx cannot be found
    Permission to perform this operation failed
    What does SRM database corruption look like?
    Error: A general system error occurred: cannot execute scripts
    Priority Levels in Recovery Plan don’t reflect my changes
    SRM 4.0 cannot start – I just updated to vSphere 4.0 Update 1
    Errors with using Network Customization
    No available Customization specifications found
    Database access issues
    My script needs more time to execute
    ESXi – not supported at 1.0 nor is ESX / VC Update 2
    Operation Timeout error when doing test recovery
    Grayed out options for creating and editing of protection group
    Recovery Plan error: Unable to access the VM config error message
    Is there a limitation of DR failover LUNs for some iSCSI arrays and some Hosts?
    Net::SSLeay::load_error_strings
    Array with key “xxxxxxxxx” not found error message
    Not sure the error name but interesting problem
    Can I have a VM with multiple VMDKs spread across two NetApp SRA’s?
    Failed to launch SAN integration scripts
    No visible LUN’s during configuration of the array
    Failed to connect to NFC during test failover with IP customization
    How do I find the Managed object reference (MoRef) for a VM?
    Missing testbubble switch on recovery host
    Null parameter name:key error
    Review Replicate Datastores window of Array Manager is blank
    Install hangs at 90%, and install log shows VIEINSTUTIL: Failed to open service control manager
    Protection Group configuration times out
    Failed to update Perl installation directories
    Error occurred – MirrorViewSRACore.dll not found
    You do not hold system privilege “System.View” on ServiceInstance “DrServiceInstance”
    Execution of scripts is disabled on this system
    Error: The operation is not supported on this object
    Operation failed…Details: VI API Version 4.1 is not supported
    You do not see a newly ‘added’ LUN when creating a PG?
    SRM LUN discovery, test, failover fail with file write errors
SRM SRA Errata
    NetApp
    LeftHand Networks
    EMC
    FalconStor
    IBM
    Dell EqualLogic
    Compellent
    HP

Miscellaneous Information
    URL’s
    Syntax highlight module info
        EditPlus
        Text Wrangler language module
Lab Exercises
    Lab 1 – Installing SRM
    Lab 2 – Configuring SRM
    Lab 3 – IP Customization
        Reference Materials
        Helpful Starters
        Procedure Hints
        Sample 1 – Bulk IP Load Screenshot
        Conclusion
    Lab 4 – Script Intro
        Scenario
        Reference Materials
        Helpful Starters
        Procedure Tips
        Things to Remember about Scripts
        Conclusion
What’s New – additions or deletions or changes

Background

This document has been designed to help your interaction with VMware Site Recovery Manager (SRM), and to make your time with it more productive. It is an attempt to share information among users of SRM to provide knowledge and share experience. For that reason please share corrections, suggestions or comments with the author (Michael White – mwhite@vmware.com). This document is for the person who has installed SRM once or twice and needs a little help, as well as people working with a new SRA, or trying to technically sell someone SRM, and will now also help with design and troubleshooting. It continues to grow as people submit new or updated information.

Prior to SRM it was still easier to do DR with virtualization than with a totally physical environment, even though it was manual. For a very good understanding of that visit http://www.vmware.com/resources/techresources/1063

Educational materials

There is a guide called the SRM Evaluation Guide. This is the most well written and informative guide on SRM. It is important to read every page and fully understand it before implementing SRM at a customer site. It can be found at http://www.vmware.com/files/pdf/vcenter-srm-evaluators-guide.pdf

The SRM documentation is found at the URL below and the Admin guide is very useful! It has lots of important information so you should be familiar with this very useful guide.
http://www.vmware.com/support/pubs/srm_pubs.html

There is a book called Administering VMware's Site Recovery Manager by Mike Laverick that is interesting. Find it at http://www.lulu.com/product/paperback/administering-vmwares-site-recoverymanager/3688988?productTrackingContext=center_search_results

Troubleshooting is sometimes quickest when you are familiar with the release notes. It surprises me that I get questions where the answers are in the release notes. In addition, the reason that release notes are HTML instead of PDF is that they are updated as necessary.
SRM 4.1 - http://www.vmware.com/support/srm/srm_releasenotes_4_1.html
SRM 4.1.1 - http://www.vmware.com/support/srm/srm_releasenotes_4_1_1.html

Some things to think about for a successful SRM project

Here are some things to think about that can really help in your SRM project.

- A Business Impact Assessment or an existing run book can really help make sure that SRM is successful by protecting what is actually important to the business. The BIA or run book identifies the key apps and their dependencies. Often department heads will debate what are the important apps. Remember important is defined from the company point of view!

• A Corporate sponsor is useful to help break blockage when two different business units declare their app as most important. They can also help with funding and vendor / BU relationships.
• A strong team that will enhance the success of the project will include storage, virtualization, server, and network resources. Senior and experienced in each category is of particular importance.
• Have a strong partner to help. Use VMware PSO or someone else but make sure they have experience. Get proof in the form of references!
• A strong plan is a big part of success. Start small and go one step at a time. Pick only one app and its dependencies and work all the way through including a fallback.
• Lab work or a proof of concept is very important to make sure that the entire DR / BC team fully understands the building blocks. This should also help understand what might go wrong in an SRM implementation and how to manage or mitigate it.
• Storage understanding is key. A close relationship with your technical staff at your storage vendor is very helpful. Really worry about the storage and the SRA.
• Triple check for compatibility issues! Before you start the actual work! It is very important to make sure that ESX, VC, and the storage are on the compatibility lists, and in fact are at the specific patch or firmware level as necessary. Do not assume that everything is compatible! Make sure by checking the matrixes. This information can be found easily at the URLs below.
VMware SRM compatibility matrixes - http://www.vmware.com/pdf/srm_compat_matrix_4_1.pdf
Storage compatibility - http://www.vmware.com/pdf/srm_storage_partners.pdf

When is SRM not a good solution?
While SRM is a great DR tool, that is also very good for datacenter migration projects, it is not for everything. Generally anything that is real time, or just minutes, is something that SRM cannot work with. In simple terms we can handle an RPO of zero or near zero, but we cannot handle the same in RTO. Bear in mind it needs to start VMs. Starting VMs takes time. If you have 10 VMs, and each requires 10 minutes to start, then your RTO cannot be shorter than 100 minutes. That alone will determine if an RTO is too aggressive for SRM to handle. If a customer needs an RTO of zero they need a High Available solution which VMware doesn't have.

Install / Uninstall Information
SRM is a simple application to install. There is a useful order (it is described below in the Install Overview section) that helps it be a smoother process but it is still simple. It is easy to install the Storage Replication Adapter (SRA) but troubleshooting them is quite complicated sometimes. They are often poorly understood and poorly documented. Some SRAs require very different things than others.

Where should I get SRAs from?
It should be noted that the only place to get certified SRAs is at the VMware web site. Do not use SRAs that have been found elsewhere as they are potentially not certified and you may have issues!! I have two examples personally where there was a stalled and angry SRM install where the problem was an SRA that was not certified and not from VMware, so avoid the issue and ONLY use SRAs from www.vmware.com.

Install account
When you install SRM you are prompted for an account and password to connect to VC. This account will be stored in a protected fashion and will be used by SRM to talk to VC, so it should be treated like a service account. The username has a limit of 31 characters and the password must be all ASCII. The host name for VC is limited to 32 characters, and the account name field for dr-ip-customizer.exe is 25 or so characters. Update 8/8/10 – I believe that this has been fixed and the character length is now 80, but I have not confirmed that. You should not change the password of this account without other steps or SRM will not work. You can find information on this later in the document.

I recommend that you use a service type account for the install, which is domain admin, and admin in VC, and later after the install it will become SRM admin too. It should be used for the ODBC account, and to run the VC service as well. It has been brought to my attention (thanks Brock) that our admin guide suggests using the local administrator account for the install, and for running repair activities. I have never done this, and many customers I have worked with do not have access to local admin accounts. I am still using the domain account and will continue to unless there is actually a technical reason to not do this – which I am not aware of.
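The limits above are easy to trip over in the middle of an install, so here is a minimal sketch of a pre-check. The function name and the way the limits are applied are my assumptions for illustration (for example, I count the whole string you type as the username); only the numbers 31 and 32 and the all-ASCII password rule come from this guide.

```python
# Hypothetical pre-install check; the limits are the ones quoted in this guide.
MAX_USERNAME_LEN = 31      # install account username limit
MAX_VC_HOSTNAME_LEN = 32   # VC host name limit


def check_install_account(username, password, vc_hostname):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    if len(username) > MAX_USERNAME_LEN:
        problems.append(
            f"username is {len(username)} characters, limit is {MAX_USERNAME_LEN}"
        )
    if not password.isascii():
        problems.append("password contains non-ASCII characters")
    if len(vc_hostname) > MAX_VC_HOSTNAME_LEN:
        problems.append(
            f"VC host name is {len(vc_hostname)} characters, "
            f"limit is {MAX_VC_HOSTNAME_LEN}"
        )
    return problems


if __name__ == "__main__":
    # Example values only; substitute your own service account details.
    for problem in check_install_account("svc_srm", "Passw0rd!", "vc01.example.com"):
        print("WARNING:", problem)
```

If the 80 character limit mentioned in the update turns out to be correct for your build, change the constants rather than the logic.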

Install and Configure information for specific environments
The links below provide install info that is most useful for education and working in labs. Much of it is with virtual storage. It is still a VERY good way to learn, as well as useful to test out ideas and learn additional skills.
EMC Celerra VSA - http://nickapedia.com/2010/10/04/play-it-again-sam-celerra-uber-v3-2/ and http://nickapedia.com/2011/02/05/how-to-uber-new-celerra-uber-vsa-guide/
FalconStor NSS Virtual Appliance - http://communities.vmware.com/docs/DOC-11410
LeftHand Networks VSA - http://communities.vmware.com/docs/DOC-11408
SRM with Left Hand Networks in a box – http://www.virtuallifestyle.nl/2008/11/vmware-site-recoverymanager-with-lefthand-vsa/#more-60
SRM with NetApp in a box - http://tendam.files.wordpress.com/2008/11/site-recovery-manager-in-a-boxpreview.pdf

At the end of this document there is information on various different SRA related issues. Make sure to read through the section that pertains to your install.

Install Overview
It is important to understand the SRM installation overview. You must install using the order of operation as shown in the lab section of this document. You must do this on the protected site first, followed by the recovery side. Here is the outline:

1. You will need to create a DB at both sides before you start.
2. SRM application installed at Protected Site
3. SRM application plug-in installed in VI clients that connect with the Protected Site
4. SRA installed at the Protected Site
5. SRM application installed at Recovery Site
6. SRM application plug-in installed in VI clients that connect with the Recovery Site
7. SRA installed at the Recovery Site
8. SRM configured at the Protected Site
a. SRM server pairing
b. Array Configured – both Protected Site and Recovery Site
c. Inventory Mapping
d. Protection Group
9. SRM configured at the Recovery Site
a. Recovery Plan created

You should now test and tweak SRM. Remember the goal is to have the required VMs running at the recovery site in the least amount of time. Remember when you are testing that you are testing for the applications to fail over in the shortest amount of time, and be functional when they are failed over!
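When you are testing for the shortest failover time, it helps to know the floor you cannot get under. The section on when SRM is not a good solution makes the point that 10 VMs at 10 minutes each means an RTO of no less than 100 minutes. A minimal sketch of that arithmetic follows; the `concurrent_starts` parameter is my own assumption to show the effect of starting VMs in parallel waves, and it is not an SRM setting.

```python
import math


def minimum_rto_minutes(vm_count, minutes_per_vm, concurrent_starts=1):
    """Lower bound on RTO from VM start time alone.

    Assumes VMs start in waves of `concurrent_starts` (hypothetical
    parameter) and every VM takes the same time to start. Storage
    failover and any callouts come on top of this number.
    """
    if vm_count < 0 or minutes_per_vm < 0 or concurrent_starts < 1:
        raise ValueError("counts must be non-negative and concurrency at least 1")
    waves = math.ceil(vm_count / concurrent_starts)
    return waves * minutes_per_vm


if __name__ == "__main__":
    print(minimum_rto_minutes(10, 10))      # the guide's example: 100
    print(minimum_rto_minutes(10, 10, 5))   # two waves of five VMs: 20
```

Compare the number you measure in a test failover against this floor. If the business wants an RTO below it, no amount of tweaking will get you there.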

Install Test Outline
When you have your storage ready, and SRM installed, here is a recommended test overview to maximize learning, but also to make sure things work in appropriate order. Some of these steps, if you are doing them with the customer, will require stops for education.
1. Simple test failover
a. Use no IP customization, or network changes
b. Do each CG / LUN individually in a PG (remember, single app or business unit are most common models for organizing storage). However, another idea that may have serious merit at many customers is to organize by tiers – meaning all Tier 1 apps recovered together. This may help order of operation and improve performance.
c. Now do single RP that covers off each PG
2. Enhanced test failover
a. IP customization
b. Isolated VLAN
c. Callouts
3. Performance
a. Does the SRA support simultaneous test failovers?
b. Does the storage support simultaneous access?
c. Use the info in this document to try and improve failover time
Now you know everything works.

Uninstall Information
It is good to uninstall the SRAs first, then the plug-ins, and finally SRM. Make sure to clean up the database and other plug-ins. Do this on both sides. Sometimes the scripts folder will be left in the SRM folder after an uninstall. This is due to some miscellaneous SRA files not being removed during the uninstall. To be tidy, and to avoid potential issues when you install SRM again on this machine, you should remove those folders. If you are doing this on Win2K8 check out page 11.


If you do re-install make sure you have not missed anything above, and make sure the SRM database has been deleted and recreated as well.
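To check for the leftover folders mentioned above, something like the sketch below can list what remains after an uninstall. The install path is an assumption on my part, based on the default config location given later in this guide (C:\Program Files\VMware\VMware Site Recovery Manager); adjust it to your install. It only lists, it does not delete anything.

```python
from pathlib import Path

# Assumed default install folder; change it if you installed elsewhere.
DEFAULT_SRM_DIR = r"C:\Program Files\VMware\VMware Site Recovery Manager"


def leftover_entries(install_dir=DEFAULT_SRM_DIR):
    """Return the names of files and folders still present under install_dir.

    An empty list means the folder is gone or already empty.
    """
    root = Path(install_dir)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    remaining = leftover_entries()
    if remaining:
        print("Left behind after uninstall:", ", ".join(remaining))
    else:
        print("Nothing left behind.")
```

Review the list by hand before removing anything, since a scripts folder may hold callout scripts you still want.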

Installing (uninstalling) on Windows 2008
If you have an install on Win2K8 with UAC turned on you will have issues with doing custom installs. You will also not be able to uninstall SRM. An attempted repair or uninstall will hang at around 80% forever. This occurs for me on Win2K8 R2 as well. The solution is to right-click on the installer file and use Run as Administrator. See http://kb.vmware.com/kb/1028443 for more info.

Upgrade / Patch Information
Currently we have not had many updates or any that were complicated. It is important to understand that two different versions of SRM cannot communicate and will cause errors in the logs. Generally it is best to start by upgrading one SRM server, then the plug-in, followed by the other side. It is also a good idea to uninstall the SRM plug-in first. The release notes should generally have details on doing upgrades as well. Sometimes the patches have recommended uninstalling the plug-in. If they don't mention that you can likely skip that step. Sometimes the release notes may indicate something else with respect to the plug-in so be aware. When doing patching, the above is likely all that is necessary. However when upgrading it is quite possible you may need to upgrade your SRA as well as SRM. So watch out for that. As an example, I upgraded my storage and that required a new SRA and it was not noted anywhere (but in this document now) so I had some errors that were cleared up when I upgraded the SRA.

What are the SRM build numbers?
SRM 4.1.1 – 340092 – Feb 10, 2011
SRM 4.1 – 267817 – July 13, 2010
SRM 4.0.1 – 236215
SRM 4.0 – 192291
SRM 1.0 Update 1 – 128004
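Since two different versions of SRM cannot communicate, it is worth confirming during an upgrade that both sides report the same build. A minimal sketch using the build numbers listed above follows; the function names are mine, and how you obtain the build number from each site is left to you.

```python
# Build numbers as listed in this guide.
SRM_BUILDS = {
    340092: "4.1.1",
    267817: "4.1",
    236215: "4.0.1",
    192291: "4.0",
    128004: "1.0 Update 1",
}


def version_for_build(build):
    """Return the SRM version for a build number, or None if it is not listed."""
    return SRM_BUILDS.get(build)


def sites_match(protected_build, recovery_build):
    """True only when both builds are listed and map to the same version."""
    protected = version_for_build(protected_build)
    recovery = version_for_build(recovery_build)
    return protected is not None and protected == recovery


if __name__ == "__main__":
    print(version_for_build(267817))     # 4.1
    print(sites_match(340092, 267817))   # False: 4.1.1 on one side, 4.1 on the other
```

A build that is not in the table returns None, which is treated as a mismatch so that an unknown patch level gets looked at by a person.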

Upgrading to SRM 4.1.1
This is an easy upgrade since you don't need to worry about upgrading to 64-bit OS environments. While technically you do not need to upgrade to vCenter 4.1 Update 1 first, you must for supportability. Fortunately that is an easy upgrade. If you need to upgrade to vCenter from prior to vCenter 4.1 see the information below. I do need to mention that this upgrade will only upgrade SRM 4.1 and not older versions. This SRM upgrade is minor, and there are no new features, but there is a significant list of fixes. Find out more at http://www.vmware.com/support/srm/srm_releasenotes_4_1_1.html . I have done a number of these upgrades without issue. My blog on this is at http://blogs.vmware.com/uptime/2011/02/vmware-vcenter-site-recovery-manager-411-isreleased.html .


Upgrading to SRM 4.1 – including upgrading to vSphere VirtualCenter 4.1
The upgrade process for 4.1 is a lot more complex than previous upgrades. This is more accurately referred to as migration since we are moving from a 32-bit host operating system to a 64-bit operating system. In fact, it would be easier to delete and install new. The instructions below have been used by me a few times and they work. I recommend you read this document and its references completely, and understand them carefully, and then plan an appropriate outage and work all the way through. You do want to minimize the outage window of both VC and SRM! A very useful reference is the release notes (link here), and the upgrade guide (link here).

The steps below will help you move from a SRM 4.0 / VC 4.0 environment where SRM and VC are co-located (although that doesn't impact this process much if they are not) and SQL remote. I had SQL Express on the VC / SRM server and that was not a good upgrade path. I changed my lab to use off host SQL and the upgrade process was much easier. I will try to point out useful information along the way to help in other migration scenarios.

Some interesting background
• VirtualCenter ISO build – 4.1 build 259021
• ESXi – 4.1 build 260247
• ESX – 4.1 build 260247
• SRM – 4.1 build 267817
• You must use a 64-bit DSN for VC and remember to make it using SQL Native. You can find SQLncli_x64.msi near the bottom of the page at http://www.microsoft.com/downloads/details.aspx?FamilyId=50b97994-8453-49988226-fa42ec403d17&displaylang=en .
• You will need to use a 32-bit DSN for VUM so see the KB article at http://kb.vmware.com/kb/1010401 for help in making the 32-bit DSN in a 64-bit OS.
• Your new host that is 64-bit will need to have the same name and IP as the old host. So you will need to build it when it is not on the network. Avoid conflict with the existing VC.
• You need to preserve the VC and SRM FQDN name through the migration. This is important.

Things to get ready
• Make sure you have a good backup of everything that is going to change – which means your VC server, and database.
• Make sure you have access to your service account information for VC and SRM.
• Remove your vSphere Client plug-ins. This is not always necessary but it helped sometimes in this upgrade process.
• The files we need:
o VirtualCenter ISO – we need the ISO as it comes with a folder we need, that the normal .zip doesn't. The folder is called datamigration. You can extract the ISO

to a location that you will have access to when working on either the old or new VC.
o datamigration folder that is only present when you have downloaded the ISO
o If you have a spreadsheet that details the VM to LUN relationship that is good to have.
o An outage – you will have no VC and no SRM for a number of hours. With preparation, and understanding of what is needed, you should be able to keep your outage to around 3 – 4 hours. But this will vary widely! SRM should be available approximately 1 hour after your VC is again available. But that will vary depending on your prep work.
• Your database for VC is remote, but still make sure to have a backup of it.

Migration - VC
1. You should be logged into the current VC (or current VC / SRM host).
2. Then, either by using / mounting the ISO, or if you have extracted the files from it, click on autorun so that you get the main screen of the install. Near the bottom of it, under Utility, select Agent Pre-Upgrade Check.
a. If you have any issues that the check finds, you need to resolve them before continuing. They will generally have KB articles to help.

[Figure: Autorun screen with the Agent Pre-upgrade check highlighted.]
Make sure to use the Windows credentials that is your VC service account, or your domain admin.
3. On the existing VC / SRM machine, copy the datamigration folder to the local hard drive and expand it.

4. Make sure that VMware Update Manager (VUM), VMware VirtualCenter, VMware VirtualCenter Management Web Services, and SRM are stopped. Use the commands:
a. net stop "VMware VirtualCenter Server"
b. net stop "VMware Update Manager Service"
c. net stop "VMware vCenter Site Recovery Manager Server" – this may not be on this host if your SRM is not co-located with your VC.
5. Now you need to use the backup.bat file that is in the datamigration folder to do a backup of your Virtual Infrastructure environment.
a. When you execute the batch file, it will normally only have a few questions. It will ask about if you wish to include ESX or VM patches. And you should generally say yes.
b. This backup doesn't backup your remote database, but it does backup the port settings in use, SSL certificates, and licensing information.
6. The datamigration folder has a data folder now that contains the backup. Note the log folder? The backup.log file will provide info on how the backup went – backup.log will echo the work done.
7. Now copy the entire datamigration folder to a location that you can copy it from in the future to the new VC host.
8. Now you must turn off your existing host. Disconnect the network from it to make sure it is not accidentally turned on.
9. You will now turn on your new host, which has the same IP and FQDN. Do what is necessary to make it part of your domain and 'healthy'.
a. You may need to patch it now, or join it to the domain. This includes the 64-bit SQL Native client install, and creating the 64-bit DSN, and creating a 32-bit DSN. The URLs earlier can help find what you need.
10. You need to copy the entire datamigration folder to the new host.
11. On the new host, you need to have access to the install media so map a drive. Use the install.bat file from the datamigration folder to start the install process. The install.bat is in the datamigration folder.
[Figure: The start of the install.bat process.]
a. You will be asked for the path to VC and then VUM. If you are using the ISO, or have extracted the ISO, the path will be the same for both VC and VUM.

b. You will see the normal install prompts for VC.
c. Enter your VC service credentials.
d. Use the same DSN information, including with the correct credentials.
e. Use the same ports.
f. Select the same path as you had previously used (on the old VC).
g. The next prompt is about the size of the JVM memory. Use the default or make a more appropriate choice.
h. Notice how you have a choice at some point to do an automatic, or manual update of the VC agents on hosts? I used automatic. A nice improvement!
12. After the VC install is finished the install will return you to the install.bat file and start the VUM install process.
i. Accept the defaults.
ii. Use the appropriate 32-bit DSN.
iii. Use the same ports.
13. If you, like me, don't have the 32-bit DSN for VUM, you can start the install batch again after you have the 32-bit DSN and it will continue where it should!
14. VUM will now be installed. After the install is finished you will be returned to the install.bat file, and exit. There is a restore.log file in the logs folder if you need to see what was done.
15. Confirm that VUM, VC, and the VC Web service are running. They are likely NOT.
16. Now install the VI client from the autorun screen. Connect to the VC.

It is important to understand that the install.bat is very smart.

It will not redo an unnecessary install but rather start where you last finished successfully. If you make a mistake, you can cancel the VC or VUM install process, fix the DSN issue, and restart the install batch file. A very nicely done install.bat file!
17. Install the VUM plugin. This could be on your VC or your desktop.
18. Now check your VUM config, and any other items to make sure what you have is 4.1 and your config. After the upgrade, your VC should show a version of something like above.

You have now upgraded one of your VirtualCenter servers. You need to do the other one now!
Note1: In all of the work I did, we always had the VC / VUM services NOT start, and we had to assign the proper credentials – instead of the Local Service, and then it worked.
Note2: Be careful with the 64-bit and 32-bit DSNs as it is easy to get careless.

Upgrade to SRM 4.1
This too is more a migration, but this time we don't have the lovely datamigration script to help us! But the steps below will help! The process below is for migrating to 4.1 when you are using a 32-bit host OS. If you are in fact already using a 64-bit OS, you can do an in-place upgrade, and while we do not step through that process below, the information below might help. But as I mentioned earlier, remove the plug-ins first.
Release notes URL – http://www.vmware.com/support/srm/srm_releasenotes_4_1.html
SRM Admin guide URL - http://www.vmware.com/pdf/srm_admin_4_1.pdf

Some interesting background
• SRM 4.1 build 267817

Things to get ready
• You will need to have your VC infrastructure already upgraded! If you need help with that see above in the VC section.
• Make sure that SRM can do a test failover!
• You will need to confirm that your SRA has been upgraded or certified for 4.1. Check the VMware download site for SRM and the current version of SRAs, and the SRA download for a readme that talks about what it is certified for.
• SRM 4.1 bits
• SRA bits
• Be aware that SRM requires a 32-bit DSN, which is NOT the default on a 64-bit OS.
• Remove your SRM plug-ins.

• You need an SRM backup, but it needs to be taken at the same time – as if it was in a consistency group. BTW, it needs to be restored like that too. You should also have history reports as hardcopy just in case.
• Remember that your new SRM host must have the same name / FQDN so you will need to turn off your old SRM host after you have your backups and .xml file so you can deploy the new host.

Migration - SRM
1. Backup the SRM database on each of the two (or more) sides.
2. Copy your vmware-dr.xml file from each SRM server to a location where you will be able to access the file later. The default location is C:\Program Files\VMware\VMware Site Recovery Manager\config .
3. Turn off the old SRM host.
4. Turn on the new SRM host.
5. Make sure you have a 32-bit DSN.
6. Create a new install of SRM 4.1 on the new host. Important to note:
a. You will be prompted about there being an SRM extension installed already. You should select Yes.
b. You will need to select the Automatically create the certificate choice.
c. If you are re-using the SRM 4.0 database, make sure to use a copy and not the original. Errors or a cancellation could corrupt your database. You will see a prompt that is due to using your old database with a new install. Select Yes at this prompt.

d. SRM will likely NOT start. Change the credentials with it to the proper SRM service account, and hit retry. It should continue fine.
e. Remember the DSN is 32-bit.
7. If the plug-in has not been removed, remove it, and install it again.
8. Several times in my testing, the plugin had the name of vDr instead of the fully spelled out name. It still worked, and after a restart of the VI Client the name changed.
9. Install the SRA. Make sure to do that before you continue on.
10. You now have SRM running, but not configured completely. Now get the other side done.
11. This is the time, right after the SRM upgrade, if you have changed advanced settings where you will need to migrate them. See the section below for help.
12. Now you will need to re-create the site pair, and reconfigure the array manager credentials, in particular the authentication information. When doing this, there is a small thing to remember. When you re-enter your credentials to the Array Manager, make sure to select your array! After entering the correct credentials, you will need to select the array.
13. You should now do a test and make sure it works!

You are now complete. If you have any issues, please do not hesitate to contact our support organization but also leave me a comment!

Migrating Changes to Advanced Settings
If you have not made any changes to Advanced Settings, you do not need to do this section – which should be true for most customers. Changes would be things like SanProvider.CommandTimeout or SanProvider.hostRescanRepeatCnt. If you know the changes you made you can just add them to your new install. But if you are not sure, you will need to work through the process below. See below for a screenshot of the Advanced Settings categories. See this by <right+click> on the Site Recovery lightning bolt.

[Figure: This is how you access Advanced Settings.]
Start by loading your vmware-dr.xml file. Load the one from the Recovery Site when editing the Advanced Settings for the Recovery Site and do the same for the Protected Site. When you are in the Advanced Settings window, you will need to work through each category. One example is localSiteStatus. See below for that.
1. Search your vmware-dr.xml file for that phrase, in our example localSiteStatus.
2. In the section of the vmware-dr.xml file where you find the category, look for variables that match in the Advanced Settings category and change the value to match what is in your vmware-dr.xml file. See below for an example.
[Figure: This is a sample of the vmware-dr.xml file.]
3. After we see what is in the vmware-dr.xml file we record it in the Advanced Settings.
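Searching the file by eye for every category gets tedious, so the steps above can be roughly sketched as a comparison between the vmware-dr.xml you saved from the old host and the one on the new install. This is a sketch under assumptions: I am assuming the file is plain well-formed XML where each category is an element and each setting is a child element, and the sample XML in the example is made up for illustration, not copied from a real vmware-dr.xml. Repeated element names at the same level will overwrite each other in this simple version.

```python
import xml.etree.ElementTree as ET


def flatten_settings(xml_text):
    """Map dotted element paths below the root to their text values."""
    root = ET.fromstring(xml_text)
    settings = {}

    def walk(element, prefix):
        children = list(element)
        if not children:
            settings[prefix] = (element.text or "").strip()
            return
        for child in children:
            walk(child, f"{prefix}.{child.tag}" if prefix else child.tag)

    walk(root, "")
    return settings


def changed_settings(old_xml, new_xml):
    """Return {name: (old value, new value)} where the old file differs."""
    old = flatten_settings(old_xml)
    new = flatten_settings(new_xml)
    return {
        name: (value, new.get(name))
        for name, value in old.items()
        if new.get(name) != value
    }


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    old_file = "<Config><SanProvider><CommandTimeout>600</CommandTimeout></SanProvider></Config>"
    new_file = "<Config><SanProvider><CommandTimeout>300</CommandTimeout></SanProvider></Config>"
    for name, (old, new) in changed_settings(old_file, new_file).items():
        print(f"{name}: old={old} new={new}")
```

Anything it reports is a candidate to re-enter through the Advanced Settings window. Do not copy the old file over the new one.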

[Figure: LocalSiteStatus of the Advanced Settings.]
Remember you will need to work through this process for each category.

Some issues I found
I have mentioned these issues elsewhere but thought I would mention them here again.
• None of the VC services started when they were supposed to. I kept finding LocalService but once changed all was good.
• Didn't know VUM needed a 32-bit DSN. So created one and restarted!
• Didn't know to install the 64-bit SQL Native client on the Win2K8R2 SRM server. So I did. And it worked.
• Used a 64-bit DSN for SRM. Then tried the 32-bit and it worked!
• The SRM service never started, but when I changed local service to the proper credentials it did and all was good. Changing the credentials on the service for the correct ones solved the issue easily.
• Did not see the Site Recovery Manager plug-in, but saw one that was called vcDr and it worked. Restarting the vSphere Client also cleared it up.
• Forgot to update the credentials for the arrays.
• It may not be connected to the upgrade, but after three successful test failovers, I had one fail. One VM was successfully recovered but three were not. The error was Error:Error occurred: failed to prepare shadowVM for recovery. I removed the protection group that held the VMs, then made sure that the folders on the ShadowVM LUN associated with

those newly unprotected VMs were gone (several were not). I then recreated the PG, attached it back to the recovery plan and it worked fine. For many tests with no issues.

Undo
If you wish to undo this migration it is almost easy. You would turn off the new hosts, and turn on the old ones. They would not be happy since the databases that would still be in use would be the new ones. You would need to stop all of VC and SRM services, and restore the backup copy of the databases I mentioned you needed to have. Then start the services and you should be good to go. This would work fine.

To our next major release – SRM 4.0
Note: I will delete this section in one of the next updates of this document to save space.
You will need to use the steps below to successfully and smoothly upgrade to the next release.
Release notes - http://www.vmware.com/support/srm/srm_releasenotes_4_0.html
Upgrade KB article - http://kb.vmware.com/kb/1013166
Upgrade blog posting - http://blogs.vmware.com/uptime/2009/10/srm-40-is-here-the-wait-for-vsphereand-nfs-support-is-over.html

Important Note: You can upgrade an existing site with no changes – especially database, OR you can install new. If you try to install new and use the old database there will likely be corruption. However, there is a workaround in that you would do a simple upgrade, allow it to do the database upgrade, and then install new and point it at the upgraded database.
Important Note two: You need to check the SRM download page to see if there are new SRAs that you should use. With NFS support there will be new versions of the SRA, but if you are not going to use NFS, there may not be new versions. But you should check.
Important Note three: You will need to have your new license, and you may need to restart your SRM service one extra time after the update to deal with license related issues. You can learn about the new licenses on page 42.
Important Note three: I was using a legitimate SRM 4.0 license before the vSphere Update 1 upgrade, and after the Update 1, I discovered that I had old style licensing, and when I should have seen the new SRM Solution Licensing I saw nothing and SRM didn't work. I don't think any customers would have this sort of license but you are warned now. If this happens to you call support and tell them I sent you!

1. Recovery Site
a. Make backup of VC database
b. Upgrade to VC 4.0
c. Make sure your SRM plug-in is enabled. If it is not, you will not be able to enable it after the upgrade.
d. Make backup of SRM database in case you need to rollback.
e. Install / upgrade the SRM server.
f. Upgrade the plug-in.
g. Restart the VI client.
2. Protection Site
a. Same steps as above.

b. Upgrade quickly so minimal outage / exposure.
3. Now install the licenses! See screen shot below. I suggest that you use linked mode for VC so that you can work with SRM easier than having two clients open. When in linked mode, you can also only license in one place and yet select both sides to apply the license to. So it is a bit easier for licensing too. See below for the place to add the license.
4. Test the test failover, and as soon as possible test a real failover.
5. When you decide to start upgrading ESX hosts to vSphere, remember that ESX 4 cannot failover to ESX 3 IF the VMs have been upgraded to virtual hardware version 7 (VH7). But they can failover if they have not been upgraded to VH7.
a. It might be easier in a Protected Site / Recovery Site situation to start upgrading ESX hosts on the recovery side first AND not upgrade to VH7 or VMware Tools, until after the protected site is also upgraded.
b. If both sites are hosting protected it could be interesting! But the same idea might be good, to update everything to ESX 4 but without updating the VH or VMware Tools, until all was done. This can be done if necessary at the cluster level too I think.

Notes:
a. Make sure that DHCP client, Protected Storage, Server (lanmanserver), and Workstation (lanmanworkstation) are all running on the SRM server before the upgrade.
b. If you upgrade the OS to Win2K8 as part of the upgrade, make sure the Protected Storage mentioned above is running.
c. Hand made modifications to any of the <SRM root>/conf/* files will be overwritten. There should be backup copies of those files from which you can then copy and paste back the custom entries.
d. If you have issues, and cannot proceed, you should uninstall / reinstall SRM. But first rollback your VC to 2.5 Ux.
e. A recovery will run fine if the protection site is upgraded but not the recovery side. Try to avoid this but it does work.
f. If you try a test recovery on the recovery side while the protected side is being upgraded there may be issues – so try to not do this.
g. As noted above, make sure to watch out for the VH levels as you can get errors trying to configure a PG to fail over inappropriately – i.e. ESX 4 hosted VMs that are at VH7 to a cluster that is held by an ESX 3 host.

[Figure: Advanced Settings option in SRM 4.0 to replace manually editing the vmware-dr.xml file, and where to add the license.]

Design Guidelines
This section will look at some design information. The Admin guide has some very good information but we will look at things in this section that are not covered in our guide.

It is important to understand that for a test, or a real failover, having all of your VMs in the same LUN(s) provides the best situation. Remember the whole LUN must failover! It is worth thinking about having a department, or an application worth of VMs in a LUN or LUNs to provide the best flexibility in a test or real failover. As part of this I would include some XP VMs for test purposes.

Some vendors use Consistency Groups (CG) to group LUNs and this becomes the granularity that is seen through SRM. It is often very successful for the greatest granularity during a failover or test failover when each CG hosts one or more LUNs that hold only one APP or one business unit.

A PowerShell script that can help with understanding where VMs and their disk files are and on which LUN can be found at http://www.peetersonline.nl/index.php/vmware/another-way-to-gather-vmware-disk-info-withpowershell/ .

A VM must have its VMDK(s) on the same storage vendor arrays and NOT on two different storage vendors' storage.

If VC is using trusted certificates then SRM must too. This is not simple but instructions are in this document that will make it much easier!

Look at the SRA information in this document, as it will sometimes provide information that will impact your design.

What goes wrong in SRM projects?
The issues in SRM projects that become difficult or expensive generally fall into three different categories.

1. Storage Organization – I have seen one potential customer for SRM that used 4 TB LUNs throughout their virtualization world. This was due to it appearing to be easier for them than anything smaller. I suspect they still have not upgraded to vSphere! But another issue is when there is no pattern for where applications are stored. So Exchange might be scattered on 10 different LUNs. This will mean in a failover that all of the apps on the 10 different LUNs will need to fail over. And there will not be any granularity. The best idea is to slowly migrate your applications to be protected to new replication LUNs. You will get the granularity for testing or failover, and it will be easier to upgrade the array in the future. Most people have little storage organization so this will need to be taken into account!
2. Application knowledge – we need to know what the corporation thinks is the most important app, not just what an IT manager thinks. We then need to know all of the upstream and downstream services that application needs to be considered working. This can be quite big when you consider all of the applications that companies might have! In addition, most companies are in the category of not entirely sure what apps or what services they need. Sometimes these requirements are not written anywhere easy. If the customer has a Business Impact Assessment (BIA) report it will help enormously but most don't have that either. Change Control will sometimes have very good info to help with understanding applications and their necessary services. All of that information is necessary to build a test plan.
3. Storage Replication Adapter (SRA) – this little tiny piece of software can cause a great deal of grief. Sometimes it needs a path change that its own installer didn't do. Sometimes it needs a special license like SnapView for MirrorView or space efficient for IBM. In addition, often the SRAs don't support all of the features that the replication supports. So this can confuse and frustrate customers. EMC has finally gotten very good release notes, but they are hard to find as they don't ship with the SRA. So investigate the SRA carefully.

Large VI environments
If you have a large number of hosts, and VMs, you may have some issues with SRM. These issues are considered scalability issues in both the platform and UI. They normally only occur when there are very large numbers of VMs and hosts. We are working hard and fast to make these problems go away but in the meantime here is some useful information. Each of the next major releases will continue to allow more scalability.

Design your SRM infrastructure in a POD design. The pod should only manage approximately 750 unprotected VMs (and 1000 protected VMs) and less than 150 replicated LUNs. This will allow SRM to work better as the full 1000s of VMs both protected and not protected are not seen by SRM. Each POD would have separate SRM and VC servers, and VC and SRM installed on separate servers, and hopefully would be backed by a corporate production SQL or Oracle cluster.

So if you have 2500 VMs total, and 1500 are protected, I would create two PODs, and if possible divide the protected VMs and the unprotected VMs between them. So each POD would have up to approximately 750 unprotected VMs and up to 1000 protected VMs. However, more likely is the division by business or departmental guidelines. It is a good idea in this example to separate SRM and VC. Also, align each POD with a business or departmental unit and it will lessen the impact of the extra VCs to manage. Linked Mode in vSphere will help too.
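The pod arithmetic in the 2500 VM example can be sketched in a few lines of Python. This is only an illustration of the guidance in this section: the function names pods_needed and even_split are made up for this sketch, the three constants are the approximate per-pod figures quoted above (not enforced product limits), and an even split is just one option since the division is more likely to follow business or departmental lines.

```python
import math

# Approximate per-pod guidance quoted in this section (assumed here as hard caps).
MAX_UNPROTECTED_PER_POD = 750
MAX_PROTECTED_PER_POD = 1000
MAX_REPLICATED_LUNS_PER_POD = 150


def pods_needed(total_vms, protected_vms, replicated_luns=0):
    """Return the smallest pod count that keeps every pod inside the guidance."""
    unprotected_vms = total_vms - protected_vms
    if protected_vms < 0 or unprotected_vms < 0 or replicated_luns < 0:
        raise ValueError("counts must be non-negative and protected <= total")
    return max(
        1,
        math.ceil(protected_vms / MAX_PROTECTED_PER_POD),
        math.ceil(unprotected_vms / MAX_UNPROTECTED_PER_POD),
        math.ceil(replicated_luns / MAX_REPLICATED_LUNS_PER_POD),
    )


def even_split(count, pods):
    """Divide a VM count across pods as evenly as possible."""
    base, extra = divmod(count, pods)
    return [base + 1 if i < extra else base for i in range(pods)]


if __name__ == "__main__":
    pods = pods_needed(total_vms=2500, protected_vms=1500)
    print("pods:", pods)
    print("protected per pod:", even_split(1500, pods))
    print("unprotected per pod:", even_split(1000, pods))
```

For the example in the text (2500 VMs, 1500 protected) the sketch arrives at two pods, which matches the recommendation above.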

Some other recommendations would include:
- Large recovery plans may require more resources (processor / RAM / ESX servers) at the recovery site than at the protected side due to the nature of failovers and trying to start everything so quickly.
- To maximize performance you should, when doing simultaneous recoveries, try to have each recovery plan target a separate cluster.
- You should separate the VC and SRM databases as they are heavily used during a recovery.
- VMware Tools speed recoveries, as if they are not installed we must wait for the timeout to occur!
- High priority recovery should only be used where necessary as it slows things down. Of course, that is as designed so that we can exactly determine the order of VM recoveries. So only use it when you need to, but do not hesitate to use what is necessary.
- Fewer PGs speed up recoveries. A general comment would be that adding VMs to protection groups is less costly in resource usage than adding PGs.

Page 23 in the 1.0.1 U1 Admin guide shows the SRM maximums. They include:
- 500 protected VMs – enforced
- 150 Protection Groups – enforced
- 150 Replicated LUNs – advisory only (this could be more than an actual 150 LUNs depending on how your LUNs are managed)
- 3 running recovery plans – advisory only

Approximately page 11 in the SRM 4.0 Admin guide shows the new SRM maximums.
- 1000 protected VMs – enforced
- 500 protected VMs in a single protection group – enforced
- 150 Protection Groups – enforced
- 150 Replicated LUNs – advisory only (this could be more than an actual 150 LUNs depending on how your LUNs are managed)
- 3 running recovery plans – advisory only

When you need to build SRM in pods like this it can increase the complexity of management, or perhaps just increase the frustration factor. Try to minimize this by building the pods within the limits above but also as department / business unit / or maybe even application / service based. This may help minimize the frustration and make the management a little more logical.

Another way of doing things, that may help the need to use a POD design, is to do a 3 year sizing forecast and figure out what the end state architecture needs to be to support the number of projected workloads, and the RTOs (i.e. how much horizontal scaling), then backdate the end state picture to what you will be implementing on day 1. That way you will know it will scale without breaking. As well, do the storage layout just so everyone agrees on it and how it will grow. If however you are starting with 3000 VMs to protect on day 1 the POD design will help.

Suggested Recommendations – aka "Best Practices"
People are always looking for easy answers, but so much of the work around DR is outside of SRM it is hard for VMware to have suggestions or best practices. SRM is a very small part of a DR solution. We can make some suggestions around SRM and some other general observations. These recommendations are not the easy answers that you may be looking for. I am concerned about the terminology of best practices because much of our customer base will consider them best for them but we cannot do best practices for everyone. The recommendations below are the first recommendations I have done for SRM and I think they should apply to most, but still, please make sure they apply to you before implementing them.

1. I like the idea of doing a Health Check before starting an SRM project. In particular it is worth doing it on the DR site to make sure it will be healthy when it is required.
2. Patch regularly – SRM is not frequently updated, but it is important to upgrade when those patches are available. When you unexpectedly need SRM to work you really need it to go and patches can solve issues that would stop SRM from working when you need it.
3. Use the account suggestions in page 9.
4. Service Level Agreements – SLAs are something you should tread carefully around since they can sometimes be a factor in a problem with SRM. SRM must start VMs, and that time is something that needs to be measured before any SLA should be agreed to! This means while we can support an RPO of zero or near zero, we cannot support the same in RTO as we need time to start VMs. Applications can make big differences in this testing. The RTO should always include the time to make the decision to execute a failover. Plus, remember that the decision time must be part of the RTO.
5. Script usage – you should think about your scripts. Should you use the idea of one big script, or many small scripts when it comes to IP Customization? Definitely you should store those scripts in one place – which should be on the SRM server. I would recommend strongly the use of the VIX API for the scripts as well. You can see page 33 and page 35 for more info.
6. More protection groups lengthen failover. Fewer shorten it. Adding more PGs is more costly than adding VMs. For example, in internal tests, 100 VMs in 1 PG failed over in approximately the same time as 30 VMs in 30 PGs. Within reason of course, but it is a good idea to – where reasonable – minimize the number of PGs.
7. Plan specifically for a partial failover. Experience suggests customers will need partial failovers more often than complete ones. Meaning you can fail your individual tier one applications over without failing anything else. This is accomplished by organizing the storage so that you can in fact fail over just one app. This makes testing much easier. This is of course, tricky to manage but it is also powerful, and provides significant opportunity for the customer to have very granular failovers if they need them.
8. High priority provides maximum control but the slowest execution. Perhaps it is best to minimize the use of high priority and to plan for the use of recovery plans to provide the control instead of priority.
9. Maximum power on – can this be changed? By default we power on 2 VMs per host to a maximum of 10 hosts. You can change this if you have lots of resources in terms of memory and processor and storage bandwidth as well. See how to do this on page 46.
10. Increase the number of threads – If you are not using SRM 4.1 you should increase the number of threads in use to avoid some time out issues. This change is in SRM 4.1. See how to do this on page 67.
11. Log Settings – you should increase the settings around logs. Think about keeping 100x 10 MB files. The log files compress very well after they are used, and generally there are a lot of big drives in physical machines, and in virtual machines you could have big drives. The 10 MB files will compress very well! See page 37 for more info on this.
12. Let replication finish before adding the newly replicated LUN to SRM, and make sure it is visible in the Array Configuration of SRM before attempting to use it.
13. With NFS it is less costly to have fewer and bigger mounts, and more costly to have more and smaller. Costly in this case impacts time to mount / dismount.
14. People should look at the recommended best practices and see if they apply, or at least think of them and decide to not use them.
15. Alarms – you should configure the minimum set of alarms. See the recommendations at page 48.

16. You should check out the events and tweak as appropriate. See more of them on 48.
17. It will likely help most SRM projects, especially in troubleshooting of failed tests, to use some sort of software like vADM. This is application discovery and mapping software that can tie servers into applications and help understand what is missing from a test.
18. I recommend if possible having a 5 GB shadow VM location so that the size will prevent any confusion by people putting real VMs on it.
19. Do not use multi extent volumes since SRM will have issues with them.
20. Each tier 1 application should have its own PG / RP, as well as be in an RP for a larger group, perhaps the whole company.
21. I have started to only use SQL accounts for vCenter, VUM, and SRM and have been very happy with that. Starting to think that this should be a recommendation.
22. Do not co-mingle protected and not protected VMs in the same replicated LUN.

Failback Outline
EMC is providing automated failback tools, such as the Celerra plug-in for VC, and there are other vendors like FalconStor that are doing this. I expect to see more from EMC as well. On page 53 of the SRM 1.0 Admin guide there is a very nice checklist for doing a failback (page 41 in the SRM 4.0 Admin guide). In addition, both NetApp and FalconStor have good documents for doing failback that include both the storage and VMware steps. It is ideal to have one of these documents if possible. But it is important to understand what the outline should be so you understand the big picture better. A general idea of the failback is to do what you have already done in reverse. The steps might look like:

1. Setup the original protected site – but first clean up!
   a. On the protected side, rescan the HBA, and the failed over VMs and PGs are seen as invalid. Delete them.
   b. Clean up the Protection Groups and mappings at the previously protected site, and the recovery plan(s) at the previously recovery side and start over.
2. Configure replication to now be back to the original protected site.
   a. Be aware a number of vendors start replication automatically after a failover. Some HP SRAs, some EMC SRAs, and HDS do this. So this may be done already.
   b. Setup replication.
   c. Make sure the replication finishes. This is important and do not forget it.
3. Set up SRM to failback – which means you are setting up SRM to fail over to the original protected site.
   a. Reconfigure the array manager for the new direction.
   b. Configure array manager, inventory mappings, etc.
4. Cleanup – clean up any artifacts that remain from the original failover and the subsequent failback.
   a. Remove the PG and RP you used to failback.
   b. Remove the placeholder VMs.
   c. Remove the recovered VMs from VC and delete them from storage at the recovery site.
   d. On the recovery side, delete the shadow VMs, inventory mappings, etc.

Multiple Tier Applications
These kinds of applications often don't handle well with our serial or parallel recovery models. However, if you take the first tier of each of the multiple tier applications, and add them to a recovery plan, then take the second tier of each of the multiple tier applications, and add them to a second recovery plan, and then the third tier, and then the fourth tier, you will end up with 3, 4, 5, or maybe 6 different recovery plans where in each one you can use normal or parallel operation for the best speed. But the recovery plans ensure that you get the proper tier at the proper time and you will end up with your multiple tier apps working much faster than if you used serial or parallel operations alone.

Bandwidth Usage
I don't have specific numbers, but in general SRM consumes very little bandwidth between the sites. Once protection is up and running and SRM is essentially idle, the bandwidth between the sites should be almost nil (just periodic heartbeat/ping messages, and summaries of changes to the VC inventories). There can be brief spikes if SRM's connection to the remote VC server drops and gets reestablished, but this should likewise not involve more than 100s of KB per VM. During operations such as protection and unprotection of VMs, there is some traffic between the sites, but I would estimate this to be on the order of 100s of KB per VM during protection, and almost none during unprotection. Don't take these numbers as gospel, but I cannot imagine a real-world situation in which SRM bandwidth is not utterly dwarfed by that of the SAN.

Application References
Are there any application references for SRM and application X? This is a spot where I will start to accumulate links to application SRM support or implementation guides.
SAP - http://www.vmware.com/files/pdf/partners/sap-srm-cx-final.pdf
FUJIFILM Medical Systems - http://www.vmware.com/files/pdf/FUJI_SRM_Final.pdf
PTC Windchill Solutions - http://www.vmware.com/resources/techresources/10064

Protecting View Desktops
This is something that comes up often. In View 4.x it was something that was only possible with extensive and complex scripting that I was not comfortable with. There was too much scripting for safety. However there are changes in View 4.5 that should allow SRM to work in protecting View desktops. But that has not been developed or confirmed yet. In the meantime, when I talk to customers about this, I do provide the solution below. Which doesn't require SRM at this time.

A possible solution would involve two URLs – one to the production View environment and one to the DR site View environment. Both environments would use the same version of templates to provide the same application and desktop experience. The users' home drive, or shared data drives, would be replicated regularly between the two sites. The DR site would not have the personalization that the users would or could do in the production site. This would, with some AD help, allow the users to have a new desktop with their personal and shared data at the DR site when it was necessary. This is not perfect, but it would provide the basic requirement.

Additional information on protecting View can be found in TECH-EUC-301 from PartnerExchange 2011. It is a presentation titled "Designing Disaster Recovery for View" and was done by Matt Coppinger and Mark Benson. It is very well done and can be found on PartnerCentral.
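The tier-by-tier grouping described under Multiple Tier Applications can be sketched as a small Python helper. This is a planning aid only, not anything SRM provides: the function name plans_by_tier and the application and VM names are invented for the illustration, and it assumes you already know which VMs make up each tier of each application.

```python
from collections import defaultdict


def plans_by_tier(applications):
    """Group VMs into one recovery plan per tier.

    applications maps an application name to its tiers, first tier first;
    each tier is a list of VM names. The result is a list of plans where
    plan 1 holds every application's first tier, plan 2 every second tier,
    and so on.
    """
    plans = defaultdict(list)
    for tiers in applications.values():
        for tier_number, vms in enumerate(tiers, start=1):
            plans[tier_number].extend(vms)
    return [plans[number] for number in sorted(plans)]


if __name__ == "__main__":
    # Hypothetical applications, used only to show the shape of the result.
    apps = {
        "crm": [["crm-db"], ["crm-app1", "crm-app2"], ["crm-web"]],
        "hr": [["hr-db"], ["hr-web"]],
    }
    for number, vms in enumerate(plans_by_tier(apps), start=1):
        print("Recovery plan", number, "->", ", ".join(vms))
```

Running the plans in order gives each application its tiers in sequence, while the VMs inside one plan can still use normal or parallel operation.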

P2V DR
Physical to virtual disaster recovery. This is something that is often talked about, and is often requested that VMware provide it. It is not something that VMware is currently talking about doing. The idea of P2V DR is to have physical machines recovered as virtual in a crisis. This is often the most acceptable way for an application to be virtualized, and there are at least two software packages that facilitate this and Symantec and Acronis provide them. What is important about these apps is:
- Incremental imaging that makes for short backups which minimize impact
- Universal restore which allows or facilitates restore to a new (virtual) hardware platform

Below is an outline of the steps to make this work. The general outline of P2V DR would be:
1. Install, configuration of your backup tool.
2. Configure daily incremental backups.
3. Test the universal recovery in a VM. If you have as a destination a replicated LUN it will mean you don't need to do a storage vMotion.
4. Each morning, you will need to manually configure SRM to protect the VMs.
5. Do a test failover and make sure everything works.

Shared Recovery
This was created for our developers by our developers and has since been released as Shared Recovery. Shared Recovery is mostly targeted to outsource DR organizations. See the Shared Recovery documentation at http://www.vmware.com/pdf/srm_shared_recovery.pdf .
- If you have Site A, and Site B, protected and recovery at Site C, you should remember that VMs from Site A would go back to Site A, and the same for Site B.
- To protect VMs on Site C, and protect those VMs with another site, you would need to have another SRM install. It could be A or B or D. If using A or B, it is a little tricky since it would be using / seeing ESX hosts that are being used by a different SRM. It would work but is messy and thus you should use Site D. It would be less messy if different storage were in use compared to A or B, but it would need a new SRM instance.

Failback (plug-ins)
Currently, with SRM 4.x, VMware doesn't have the ability to do an automated failback, but it can be done without much pain and suffering, and it will be a little easier in the future. Elsewhere in here (page 29) there is a fairly straightforward outline of how to do failback. It is not that hard – but does have an order of operation to follow. Currently, SRM 4.1 doesn't have an API on the protected side. It is likely in the future, perhaps the very next release, where it will have an API and that would mean scripting of some of the steps above would be possible.

Vendors are now providing failback plug-ins for vSphere. It is important to understand that the majority of them actually do storage failback, and not VM failback. This will improve at some point, but with any and all failback plug-ins, make sure they do start order management, and IP Customization (back to the original no less!) and if they do not, they likely are not good enough for your customer. I am not aware at this point (12/31/10) of any vendor plug-ins that can do failback with these two necessary features.

A lost protected site and failing back to it
If you lose your protected site, you can fail over to the recovery site and get your people back working. That is the whole premise of SRM. However, when you bring that lost site back, you will NOT be able to fail back to it. This is because you have lost all of your hardware and software and will need to buy and build new. This new gear will not be the same as the old so you will not be able to fail your storage back. It will have a completely new infrastructure and you will NOT be able to fail back to it. You will however, be able to protect your applications and fail over to that site. You will need to start from scratch and do your replication, build PG / RP and test.

SRM Administration Information
The information in this section will help with using SRM.

A sample recovery plan for testing an application
It is sometimes hard to organize information to create a recovery plan or to prepare for a RP. Sometimes preparing for a test of the failover is harder than the actual failover. Below is a sample recovery plan. While this example RP for testing an app is for testing, it can be very easily adapted for doing a real failover, but it is important to successfully test an application first and that is why we have this section!

Exchange Recovery Plan
This is a sample to help you get started.

Goal: Successfully test Exchange

Defined:
1) Send and receive emails using users / groups
2) Book meeting with user / group / room

Required: The various things that would be needed, as part of this test would be:
1) Address information: users, groups
2) Credentials to access the accounts – this would need to be current information
3) Domain controller – to provide credentials
4) XP desktops including Outlook
5) Mailbox servers, including OWA servers
6) Security servers – such as anti-virus or anti-spyware, maybe a PGP server.
Note: Make sure to understand what is required. We need to make sure we think of all of the upstream or downstream services that this app requires.

Setup: How will we set this up?
1) Domain controller will need to be in this plan, and it will need to be current so it has the current AD information. You can use a weekly script to take a cold clone of a DC, and move it to the proper test network. It can be a manually executed script.
2) XP workstations need to be available as well
3) LUN organization needs to support this – meaning that the LUN(s) have all the necessary VMs stored on it, or available to it. It should not have extra VMs as that could be a problem in a real failover, as the whole LUN must be failed over.
4) Isolated VLAN – This is important as it allows testing without impact of production resources. The isolated VLAN will need to be added to all the recovery side hosts as well as the recovery plan. You need to have an understanding of the IP scheme, and if you need external connections – and you will likely need them for a conference room or something like that. For example there may be an AIX or other midsize box that needs to be connected to the test network. Make sure it is a partitioned part of that AIX or midsize box before connecting it.

5) External Resources – this would be anti-spam or anti-virus, or if you have a hardware appliance, sometimes they have a spare network port that can be used for the isolated test VLAN, or perhaps there is a spare appliance on the recovery side that you can use.
6) Exchange – this is the subject of the test after all! But do we need to take all of it for the test, or can we take a subset? And what subset should we take?

Test: This is the test plan itself, so after the recovery plan has been executed, we would use this information to test the application. A form that is signed after the test would be best.
1) Exchange Test Plan Name:______________ Date: ____________ Pass / Fail: _______
   a. Login with your normal account?
   b. Start Outlook client with no errors?
   c. Access your mailbox via OWA with no errors?
   d. Address message successfully to
      i. Partner (in test)
      ii. Stranger (not in test or in your cache)
      iii. Group
   e. Book meeting successfully with your partner?
   f. Look up phone number for someone?
   g. And so on.

Build Plan - Infrastructure: this is the information to build out the plan and its infrastructure.
1) Isolated VLAN – this covers the network side (cabling and configs) as well as the VI team (virtual switches)
2) DC in or on the VLAN – clone or whatever method you use.
3) XP VMs – must be built, configured, and have Office on them. They should be tested and in the proper LUNs to be available for the test and during the test.
4) Exchange – we need to get copies of the Exchange servers in the test VLAN. Again, you can take hot clones.
5) Replication – is it working and is everything in place for us?
6) It is suggested to have a detailed to-do list with name / date info to make sure it is done smoothly.

Build Plan (SRM) – this covers off building out the SRM infrastructure to support this plan.
1) Protection groups – make sure proper LUN!
2) Recovery plan – watch order of recovery – DC first for example.

Approval section
When this test plan is a written document it should have a number of names on it – some for approval, but some for simple communications. This document, when created, and approved is very useful to have at the recovery site.
1) The approval would come from the data owner who is sometimes called the application owner.
2) Some other info would include:
   a. Network contact,
   b. Virtualization / server / operations contact,
   c. DR team contact
   d. Application owner test representative contact

Adding scripts to a Recovery Plan in a call out
When you add a script to a call out in a recovery plan, it is an empty dialog. Use the information below to add a script that will work as expected. It is important to understand that the scripts or commands must be in the path on the SRM server.
- Use full paths to all executables – for example "c:\windows\system32\cmd.exe" instead of "cmd.exe".
- Command line scripts can only call executables. You can use .exe or .com files only!
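The two callout rules above (full path, and only .exe or .com executables) are easy to check before pasting a command into the empty dialog. The following Python sketch is a hypothetical pre-flight check written for this guide, not an SRM feature; the function names are made up, and it only inspects the first token of the command line using the Windows path rules in the standard ntpath module.

```python
import ntpath

# Callouts may only start .exe or .com files, per the rule above.
ALLOWED_EXTENSIONS = {".exe", ".com"}


def split_executable(command_line):
    """Return the first token of a command line, honouring double quotes."""
    text = command_line.strip()
    if text.startswith('"'):
        end = text.find('"', 1)
        return text[1:end] if end != -1 else text[1:]
    return text.split(None, 1)[0] if text else ""


def check_callout(command_line):
    """Return a list of problems found in a callout command line."""
    executable = split_executable(command_line)
    if not executable:
        return ["command line is empty"]
    problems = []
    # A full path here means a drive letter plus an absolute path.
    if not ntpath.isabs(executable) or not ntpath.splitdrive(executable)[0]:
        problems.append("executable is not given as a full path")
    if ntpath.splitext(executable)[1].lower() not in ALLOWED_EXTENSIONS:
        problems.append("executable is not a .exe or .com file")
    return problems


if __name__ == "__main__":
    for line in (r'c:\windows\system32\cmd.exe /c "c:\scripts\alarmscript.bat"',
                 "cmd.exe /c dir",
                 r"c:\scripts\call.cmd"):
        print(line, "->", check_callout(line) or "looks fine")
```

A batch file named directly is flagged by this check, which is why batch files are started through cmd.exe with its full path.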

To run a batch file you should start the shell command with "c:\windows\system32\cmd.exe". So it would look like c:\windows\system32\cmd.exe /c "c:\scripts\alarmscript.bat".

These scripts are executed under the Local Security authority of the SRM server. In addition they can be stored where you like but likely best to have them on the local SRM disk and not on a remote network share. If you need to make changes inside a VM, you will need to use something like the VIX API that will allow you to have a script on the SRM server, but yet make changes inside of a VM. If you use PowerShell scripts you may experience an odd issue – find it and the solution on page 71. You can find a blog article on this at: http://blogs.vmware.com/uptime/2010/09/vmware-vcenter-siterecovery-manager-and-scripting-.html and also check out http://blogs.vmware.com/uptime/2010/08/cana-script-or-message-call-out-stop-a-recovery-plan-and-a-little-bit-more.html to learn about script placement.

Example:
Add to a script callout with the line:

    C:\windows\system32\cmd.exe /C "c:\scripts\call.cmd"

Have a c:\scripts folder on the SRM server. In it have a batch file called call.cmd that contains:

    @echo off
    rem Hand over to the script that does the real work
    c:\scripts\test.cmd

In the c:\scripts folder have another file called test.cmd and it will contain – for example:

    @echo off
    rem Append the date, time and SRM callout variables to the log
    date /T >> c:\scripts\test.log
    time /T >> c:\scripts\test.log
    echo Recovery Test %VMware_RecoveryName% Executed! >> c:\scripts\test.log
    echo Running in %VMware_RecoveryMode% mode! >> c:\scripts\test.log
    echo VM name is %VMware_VM_Name% >> c:\scripts\test.log
    echo Executed on %computername% - SRM server >> c:\scripts\test.log
    echo ++++++++++++++++++++++++++ >> c:\scripts\test.log

This will execute during test or recovery and create and update a test.log file with the date / time, and some additional information. This is an easy example for the purpose of showing you how to call a script. You can do anything you want from inside of the test.cmd file. Remember that the script file is stored on the SRM server, and executed on the SRM server. For more information on the environment variables I am using in this script, please see below to see the environment variables and how they can all be displayed or the page in the admin guide to learn more.

What should the PowerShell command look like to have it called from SRM?
You can think of this as a scheduled event but rather than Windows executing it on a schedule it is executed by SRM as required. So write your PowerShell command line as if you were going to put it into a Scheduled Task. But instead put it in the test.cmd file above. You will need to have PowerShell and PowerCLI installed on the SRM server remember! See the example below:

    C:\WINDOWS\system32\windowspowershell\v1.0\powershell.exe -PSConsoleFile "C:\Program Files\VMware\Infrastructure\vSphere PowerCLI\vim.psc1" " & "C:\Scripts\MyScript.ps1"

See more info on this at http://www.virtu-al.net/2009/07/10/running-a-powercli-scheduled-task/.

Can a script execution in a recovery plan impact the inside of a protected VM?
The scripts that are executed by a RP are held on the local hard disk of the SRM server but can execute against or using the VIX API library and impact the inside of a VM. There is no other way I am aware of to have a script execute on the SRM server console yet impact the inside of a VM. If the script is inside the VM, then SRM alone cannot execute it, and the audit trail that SRM provides will not record the execution of the script. For more see http://communities.vmware.com/community/developer .

Will a non-zero script exit in a recovery plan stop the recovery plan?
In both SRM 1.x and the next major release beta documentation it is said if a script callout during a recovery plan has a non-zero return at the end of the script it will stop the recovery plan from finishing. This is a documentation bug, and is NOT correct. It will be deleted from the SRM beta documentation before GA. Check out http://blogs.vmware.com/uptime/2010/08/cana-script-or-message-call-out-stop-a-recovery-plan-and-a-little-bit-more.html for more info on this.

User designed callout has returned a non-zero value: 1
This occurred in my lab in interesting circumstances. I freshly installed SRM again. I used scripts I had used before with no issues, but now during a test failover the error above occurred. Very odd. My SRM server was now Win2K8 R2 instead of Win2K3, and my test.cmd batch I was trying to execute was actually test.cmd.txt. I think when I copied the script over this name change occurred. I had to rename it – must be a better way – at the DOS prompt and the error went away. I probably would not have seen this error if I recreated the script.

How can I see the environment variables that the admin guide says are available for scripts?
The environment variables that SRM puts into the environment during the test are listed in the admin guide on page 51. But if you wish to see them in action, you can use the command below.

    C:\windows\system32\cmd.exe /C "set"

This command will echo all the environment variable values to the SRM log file.

What VM parameters are not failed over?
This may not sound logical, but there are things about a VM that are NOT failed over. It is not obvious or expected. But here are the things to be aware of:
1. If you have any fields in Annotations (VM Summary tab), they will not be failed over. This is due to these fields not being stored in the vmx file.
2. If you have any text in the Notes field in the Annotations area it will be failed over.
3. VM Permissions will not be failed over. This is due to thinking that there may be different security thinking in the failover center.
4. Resources – things like memory / CPU reservations / shares are not failed over. The thinking was due to the resource decisions / standards in the DR side would be different than on the protected site. However, there is a workaround here in that after a failover occurs, the resources configuration of the shadow VM is copied to the recovered VM. So you can edit a shadow VM for the desired resource configuration that is important, and it will be copied to the VM during the recovery operation.

Does number of PG impact order of start for high priority VMs?
No. It turns out that SRM will look at all the protection groups that are in the RP and start all of the high priority VMs followed by the normal / low as if there was only 1 PG.

Can I change the Run button to work like the Test button?
I am setting up SRM for a computer show, and I don't want anyone to use the Run button, and I am not sure about using the role / permissions to manage this. Is there another way? If you are using the current GA version of 4.1, or later, you do have a configuration file option that can do this. In the vmware-dr.xml file on the recovery side you will need to locate the section <RecoverySecondary>, and add to it an indented line that is <testOnly>true</testOnly> and you will then have a Run button that looks like Run when you execute it, but it in fact is a test. The history report will confirm that. Make sure you make this change on the recovery side. Since this is a change to the vmware-dr.xml file directly you will need to restart the SRM service. You can change the true to a false to revert to the normal behavior, or remove the line you added.

Can I use VMware Heartbeat to protect SRM and VC?
You can use HB to protect VC that is hosting SRM, but it will not yet protect SRM. Find out how to do this at http://kb.vmware.com/kb/1014266 .

What about backing up the SRM databases?
You need to understand that you need to keep the two SRM databases in sync. So back them up in a way that you can restore both of them and what is restored is in sync.

How can I capture the log and configuration information for support to work with?
This is most easily done after Update 1 by the use of the "Generate Site Recovery Manager Log Bundle" command in the VMware \ VMware Site Recovery Manager Start Menu folder. Run this command on the SRM server. This command will produce a zipped file on your desktop. It will be in a MM-DD-YYYY-HH-MM.zip format where it is Month – Day – Year – Hours – Minutes. I strongly recommend you use this method. It captures things like core dumps, and configuration info as well as all of the log files! Always provide the logs with your request for help! Please, always send the entire log bundle that is created with this tool. Very often people send to support just one of the support files and support will not be able to help with that. They will need to wait for the other logs.

Where are the SRM server logs stored?
They can be found in:

If something is happening that is generating a lot of info for the lob files. Locate it. Make sure to confirm the number from the index file to make sure you are working with the proper log file.1 logs on Win2K8 R2 servers you can find the SRM log location below. Y is the value for the maximum number of files.wsf . By default it would be c:\Program Files\VMware\Infrastructure\Virtual Infrastructure Client\Plugins\VMware Site Recovery Manager. C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs How do I capture the SRM plug-in log and config info? This can be accomplished by using the command below when you are in the folder that contains the SRM plug-in. In the config folder you will find the Vmware-dr. This is used when you are. In SRM 4. Open the vmwaer-dr.1 (4. 6. and make sure to do this on both sides! These settings are not part of Advanced Settings so you will need to change the VMware-dr.C:\Documents and Settings\All Users\Application Data\VMware\VMware Site Recovery Manager\Logs You will need to check the vmware-dr-index file to see what is the current log file. which is the SRM configuration file. The command is cscript srm-plugin-support. and find the log section which is denoted by <log>. and the other files not in use will be zipped.xml file. 3. <maxFileNum>y</maxFileNum> 5. <maxFileSize>x</maxFileSize> b. When you are finished it should look like the figure below. and the previous one is gzipped. trying to solve an issue with UI scalability.xml file.xml file and restart the SRM service. You will need to add the following lines between the <log> and </log>. for example. I would like to retain the SRM logs longer With the default settings. 2. For SRM 4. X is the value for the maximum file size. The script will produce a zip on your desktop. a. and the number of log files you keep use the information below. 4. 1. Where are the Linux Image Customization logs stored? They are kept in /var/log/vmware/imc and /var/log/vmware-imc folders. 
In the SRM program folder there is a config folder. you could end up rotating through the 10 files and lose something important. the SRM log files will grow to 5 MB and than another log is started. The instructions to increase the size of files. SRM Reference Guide Page 37 of 166 . There is a limit of 10 files.0) the currently used log file will not be zipped.
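As a rough illustration of the log rotation steps above, here is a minimal Python sketch that adds the two elements to a <log> section. It works on a small stand-in XML string rather than a real vmware-dr.xml; the <Config> root element and the idea that the size is expressed in bytes are assumptions, not something this guide confirms. Rewriting a file this way can also drop comments and change formatting, so keep a copy of the original file if you try it on a real configuration.

```python
import xml.etree.ElementTree as ET


def set_log_rotation(config_xml, max_file_size, max_file_num):
    """Return config_xml with maxFileSize and maxFileNum set inside <log>."""
    root = ET.fromstring(config_xml)
    # iter() also matches the root element itself, so a bare <log> works too.
    log = next(root.iter("log"), None)
    if log is None:
        raise ValueError("no <log> section found")
    for tag, value in (("maxFileSize", max_file_size), ("maxFileNum", max_file_num)):
        node = log.find(tag)
        if node is None:
            node = ET.SubElement(log, tag)  # append when the element is absent
        node.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Stand-in fragment (hypothetical); a real vmware-dr.xml holds many more sections.
SAMPLE = "<Config><log><level>verbose</level></log></Config>"

# Assumption: the size is given in bytes, so 10 MB is 10 * 1024 * 1024.
updated = set_log_rotation(SAMPLE, 10 * 1024 * 1024, 100)
print(updated)
```

Running the function a second time on its own output only updates the existing values, so it should not create duplicate elements.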

These changes will not be active until you restart the SRM service. Make sure no one is using SRM before you do that! Also, don't forget to do this on both sides. In the example above, we are changing the settings to 10 MB in size, and keeping 100 copies. Remember that the 10 MB files will be gzipped to a very small size.

What 'travels' with VMs between PG and recovery plans?
When I make changes to things like callout scripts or IP customization to a VM, and when the VM becomes part of a different recovery plan, is anything impacted?
When you make changes like callout scripts or IP customization files to a VM when it is in a PG attached to a RP, and that PG is attached to a different RP, there is no loss of things like callouts or IP customization. This is a situation that is often seen when you have one application in 1 or more protection groups attached to a recovery plan that is specific to that application, and that protection group is also attached to a companywide recovery plan.

What happens when…
This section will cover off a number of specific questions on how SRM handles things.

I add a new hard drive to an existing and successfully protected VM?
Nothing happens. The protection group, and the recovery plan, requires nothing. You do need to wait for replication to finish first before testing this. It works.

I add CPU and memory to an existing protected VM
Nothing happens. It works.

I add a network card to an existing protected VM
In this case I did have a VM displaying that something needed to be configured. So I right clicked on it and selected Configure and that was it. Nothing in the RP needed to be configured. I did wait for replication to occur before testing but the test was fine.

I add a new VM to an existing protection group
I had a warning that the VM was not protected in the PG. So I right clicked on it and selected Configure and it was fine. I did have to configure the PG to protect this VM, but it required no changes in the RP. This new VM was at the bottom of the list of protected VMs in the Priority level that was specified when configuring. After replication I did a test recovery and it worked fine. You can have an email alert (or an SNMP trap) when a new VM is added to an existing PG.

I remove a protected VM from a protection group
This will generate an Invalid setting for the VM in the PG it used to be part of. You can right click and select remove protection and nothing else is required.

How can I tell the SRM version from the log files?
The first line of the SRM log files will hold the release info. The version=1.0 tells the version and build=build-97878 tells the build. One exception to this is SRM 1.0 Patch 3. It didn't change the build level and thus the log file will not reflect the proper build. You will need to check the Add / Remove to see if Patch 3 has been installed or not. I am told that this is now a test by QA so it should not be missed again. It is certainly on my test list!

Installation logs

SRM 1.0
You can create an installation log using the command line parameters of /s /V"lve installlog.txt". The command line will look like:

VMware-srm-1.0.0-<build_number>.exe /s /V"lve installlog.txt"

The log file will be generated in the same folder you execute the command.

SRM 4.0
Installation logs are always created by default and can be found in C:\Documents and Settings\<user name>\Local Settings\Temp\vmsrminst.log. For installation logs on Win2K8 R2 they will be in a different location. That location is:

C:\users\Install_user\AppData\Local\Temp

You can also generate full logs with the command below but you will need to execute it from the command line.

VMware-srm-4.0.0-192291.exe /V"/lve installfull.log"

Automated Install
If you would like to have an automated install, you can use the following command line, but remember to add your own information to it!

vmware-srm-<version information>.exe /s /v"/qn AgreeToLicense=Yes DR_CB_HOSTNAME_IP=<DR hostname> DR_TXT_VCHOSTNAME=<VC hostname> DR_TXT_VCUSR=<Windows user> DR_TXT_VCPWD=<Windows user password> DR_TXT_LSN=<site name> DR_TXT_ADMINEMAIL=<administrator's e-mail address> DR_CB_DC=<SQL Server|Oracle> DR_TXT_DSN=<System DSN> DR_TXT_DBUSR=<DB user> DR_TXT_DBPWD=<DB user's password> DR_RB_CERTSEL=1 DR_TXT_CERTORG=<Arbitrary organization name> DR_TXT_CERTPWD=<arbitrary password> DR_TXT_CERTFILE=\"C:\Program Files\VMware\VMware vCenter Site Recovery Manager\bin\<VC hostname>.p12\" DR_TXT_CERTORGUNIT=<Arbitrary organization Unit> VC_CERTIFICATE_THUMBPRINT=<untrusted VC certificate thumbprint> DR_TXT_PLUGIN_DESC=<extension description> DR_TXT_PLUGIN_COMPANY=<company name>"

Changing log details
You can easily change the log detail level by editing a configuration file. The file name is vmware-dr.xml and is found by default in C:\Program Files\VMware\VMware Site Recovery Manager\config . However, to have that change read by SRM you will need to restart the SRM service. Remember that when you restart the service you will interrupt anyone working with SRM.

Look for the line that looks like:

<directory>C:\Documents and Settings\All Users\Application Data\VMware\VMware Site Recovery Manager\Logs</directory>

Below it you will find a line that looks like:

<level>verbose</level>

You can change the verbose to trivia, which will generate more entries, or to info, which generates less. From least to most reporting the options are: error, warning, info, verbose, and trivia. It is important to understand that if you increase the level of detail, the logs will grow faster and things may rotate and you lose what you need. You can change the roll over detail by using the information on page 37.

You can set a different level of logging at the sub-component level. You can have a default level of verbose for the overall log file but one component could be set to something more detailed. Look for the sections in the config (vmware-dr.xml) file with the names from below. Some of the interesting component levels are:
• Vmware-dr (DR service)
• PrimarySanProvider (protected side array manager)
• SecondarySanProvider (recovery side array manager)
• SanConfigManager (managed storage configuration – datastore computation)

You should confirm that changes like this are seen. The change should be seen in the log as SRM starts. You can therefore confirm the change you made has been accepted.

I would like to have an automated SRM type solution without SRM
It would not be automated, but it would be functional through the use of scripts and senior experience in virtualization / scripting / storage. It would work well and of course virtualization makes it work well. Full information on this can be found in http://www.vmware.com/resources/techresources/1063 . The amount of work should scare anyone into purchasing SRM!

How can I have SSL communications between SRM and NetApp
By default the communications between SRM and NetApp is not secured or encrypted. This has been difficult to encrypt using SSL. It is now much easier since our great guy in Cork has documented it – thanks Cormac. Find it at http://communities.vmware.com/docs/DOC-11545 .

What happens when I Storage VMotion a protected VM – or how do changes to VM storage affect protection?
This is a very complicated area. For the most detailed and complete information please see the wonderful KB article at http://kb.vmware.com/kb/1009900. But here is some of the key information.
• A VM, to be recovered safely, must have all of the datastores that its storage uses recovered at once. Because of this any changes to storage may require editing the PG.
• Storage VMotion, or even migration across PG boundaries is generally not good and you will need to revisit those VMs to confirm their protection – select the Virtual Machines Tab in the RP and clear any unconfigured errors, or do the same in the original PG, or the new PG.
• If the protected VM is migrated to a replicated datastore, which is not part of any Protection Group, it will stay protected and its datastore will be added to the PG.

• If the protected VM is migrated to a replicated datastore, which is part of some other PG, then the VM will become "invalid" and a user will need to re-protect it.

Troubleshooting IP Customization
If you are having serious issues with IP Customization – especially when the VM times out, you can check the logs that are in the VM. Use this process.
1. Insert a Message prompt before any VM would power on.
2. Run a test recovery.
3. Make a note of the time that the VM recovery timed out.
4. Wait for the Plan to stop at the Message Prompt.
5. Use the VI Client to log into one of the VMs that timed out.
6. Make note of the time that you logged into the VM. Also make note if customization kicked off when you logged in.
7. Inside the VM go to c:\windows\temp\vmware-imc – this path may change depending on the version of Windows. You may need to find the temp folder to locate the vmware-imc folder. There should be several log files in this directory. You will need them.

What should I know about using the bulk IP utility?
You should make sure your recovery plan is done so the utility can pull out the VM info. Page 53 of the Admin guide provides additional info but some useful tidbits are here:
• Use the generate parameter to pull info down in the form of a CSV file.
• A sample command would look like dr-ip-customizer.exe --cfg ..\config\vmware-dr.xml --csv c:\example.csv --cmd generate .
• You would take the example.csv, open it in Excel and add your IP changes.
• Use the create parameter to push up info after you tweak the file.
• The recreate parameter deletes then creates.
• Only new customization info is implemented in a Create. If the file has already been used only new info will be executed.
• Make sure that each line for each VM has the VM name, VM ID, and the Adapter id in it!
• The Adapter 0 reference is for all of the potential network cards that may be in a VM. The only information to add to this line is the domain suffix or DNS server.
• For each line, the Adapter 1 reference is for the actual virtual network card in a VM, and you can fill in the DNS domain, IP address, subnet mask, gateway, and DNS server.
• For Linux, do not put DNS domain info on Adapter 1, but rather Adapter 0.
• It appears that if there are values in both line 0 and line 1 that the value in line 1 wins. I have not been able to test this as well as I would like.
• If you use DHCP in the IP column you will not need to add other info.
• If you need to specify multiple default gateways, you will need to have an additional line to specify the second gateway.
• You cannot assign two IPs to the same network card with this utility.
• Do not use the spacebar to clear any fields in the CSV file. It will generate confusing errors.
• The account name field for this utility is approximately 25 characters long. This will be increased soon – early 2010.
• If the VM has been deployed from a template, and has never been turned on, and you try to customize its IP during a test or failover, it will take perhaps 10 minutes. It will I believe show a little yellow triangle. You can solve this by turning it on and letting sysprep finish.
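A few of the bulk IP utility tips can be checked mechanically before the CSV is pushed back up with create. The Python sketch below is only an illustration under stated assumptions: the column names are hypothetical (the real header row comes from the file that the generate command produces), and it checks just three of the tips, namely that every row carries VM ID, VM name and Adapter id, that no field was "cleared" with the spacebar, and that an Adapter 0 row carries nothing beyond the DNS domain or DNS server.

```python
import csv
import io

# Hypothetical column names; adjust them to the header row of the generated file.
REQUIRED = ("VM ID", "VM Name", "Adapter ID")
ADAPTER0_ALLOWED = ("DNS Domain", "DNS Server(s)")


def check_rows(csv_text):
    """Return a list of (line_number, message) problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Keep only named columns that hold plain strings.
        fields = {k: v for k, v in row.items() if k is not None and isinstance(v, str)}
        for name, value in fields.items():
            # A field "cleared" with the spacebar is not really empty.
            if value != "" and value.strip() == "":
                problems.append((line_no, f"{name} contains only spaces"))
        for name in REQUIRED:
            if not fields.get(name, "").strip():
                problems.append((line_no, f"{name} is missing"))
        if fields.get("Adapter ID", "").strip() == "0":
            for name, value in fields.items():
                if name in REQUIRED or name in ADAPTER0_ALLOWED:
                    continue
                if value.strip():
                    problems.append((line_no, f"{name} should be empty on Adapter 0"))
    return problems


HEADER = "VM ID,VM Name,Adapter ID,DNS Domain,IP Address,Subnet Mask,Gateway(s),DNS Server(s)\n"
sample = (
    HEADER
    + "vm-101,web01,0,example.local,,,,10.0.0.53\n"
    + "vm-101,web01,1,,10.0.0.21,255.255.255.0,10.0.0.1,\n"
    + "vm-102,,1, ,dhcp,,,\n"
)
for line_no, message in check_rows(sample):
    print(line_no, message)
```

The sample data (VM IDs, names and addresses) is made up for the example. The check only reports problems; it does not change the file.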

SRM Licensing Information
This section will detail information about SRM licensing.

How does the SRM 4.1 licensing work?
Before sometime in September (2010), it works exactly like SRM 4.0 licensing described below. However, after that point in time, it will be based on per VM licensing. So if you have 10 protected VMs you will need to have 10x per VM SRM licensing. How many VMs do you want to protect (in other words, have in a protection group)? The Licensing Reporting Manager will help, and there are alarms (page 48) that can alert you if you go outside what you have licenses for. This is however an easy system to understand. It should be noted that through to December 15, 2010 you can purchase either per proc, or per VM and you can use either but not both. You can find out more in http://blogs.vmware.com/uptime/2010/09/vmware-vcentersite-recovery-manager-and-per-vm-licensing.html .

How does the SRM 4.0 licensing work?
After SRM 4.0 is released, and before vSphere Update 1 is released, SRM licenses will be added to the License section of the Advanced Settings when you access them via a <right + click> on Site Recovery in the navigation pane (see below). Once vSphere Update 1 is released, and installed, this option will not be available (it will be invisible), and if you had SRM before Update 1 your license information will be migrated to the Solution Licensing area of vSphere, but if you are installing SRM new you will enter the license info in the vSphere Solution Licensing area. If during an upgrade to vSphere Update 1 you don't see SRM in the License area you will need to restart the SRM service. Likely in the next update of this guide I will provide pictures.

SRM 1.0 licenses will not work in SRM 4.0 but new licenses can be obtained from the customer license portal if they have registered their existing SRM licenses. We do not use Flex licensing any longer in SRM. There is no longer a host license and SRM will NOT require a license to work but only to protect VMs, and that is a 25-character license that defines what can be protected. Unlike SRM 1.x the service will still start. The SRM server will continue to work even if it becomes unlicensed – SRM works but no failovers. If the licenses expire, there will be no failover possible. There is no cross-site license communication so licenses will need to be installed at both sites – if appropriate.

SRM will total the number of protected virtual machines and every 24 hours report if there are more than licensed. As well, every time a VM is protected or unprotected the same thing will happen.
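The per VM arithmetic described above (10 protected VMs need 10 per VM licenses) can be written down as a tiny Python sketch. This is only an illustration of the counting, not how SRM implements its check; the function name and the sample VM names are made up.

```python
def license_shortfall(protected_vms, licensed_vm_count):
    """Return how many per VM licenses are missing (0 when there are enough)."""
    # Count each protected VM once, even if the list names it more than once.
    needed = len(set(protected_vms))
    return max(0, needed - licensed_vm_count)


protected = ["web01", "web02", "db01"]
print(license_shortfall(protected, 2))   # one license short
print(license_shortfall(protected, 10))  # enough licenses
```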

Evaluation licenses are checked once per 24 hours to see if they are still active, and this check is not done when there are no evaluation licenses or they have expired. Expiring licenses are managed the same way. Protected VMs are counted whether turned on or not, and the state of protected assets is reported to VC every five minutes.
The SRM license in vSphere Update 1 or later will look different. See below for an example.

How does the SRM 1.0 licensing work?
The philosophy behind the SRM licensing is to do basic checks to help a customer ensure that they maintain license compliance, but not to attempt to strictly enforce compliance. Given that SRM is a DR product, the last thing we want to do is to have any possibility that a failover would fail due to a license compliance check. The key elements in the current SRM implementation are:
• SRM comes with a built-in 60 day evaluation license.
• We require an SRM server license (PROD_SRM) for the SRM server to start on protected and recovery sites.
• We periodically take the list of all VMs in all recovery plans, look to see which ESX hosts they are currently running on, count the number of CPU sockets in those hosts, and then compare that against the number of host capacity licenses (SRM_PROTECTED_HOST) in the license file. If there are insufficient licenses, we create an alert / warning in VC reporting an insufficient licenses error but don't take any specific action.
• If the SRM license expires a failover would still work, until the SRM service was restarted. After the restart no failover would work.
SRM licenses are pooled rather than assigned to specific host CPUs and most elements of license usage are done through periodic reporting rather than through check-in / check-out operations. The customer is responsible for ensuring that they aren't using more host CPUs than they have licenses for. SRM doesn't try to control that.

What does it look like if my VI is licensed for SRM?
See the screen shot below for an example of a licensed SRM install.

If you do not see the licenses you expect, this might be due to an odd issue that SRM has with licensing. While it uses FLEX licensing, if you only drop off the .lic file in the Licenses folder and reread the license file(s) you will not see something like the screen above until you restart SRM!

What does it look like if my vSphere is licensed for SRM – after Update 1?
See the screen below for an example of a licensed SRM install.

What will happen if my license expires?
First it is important to understand that it is recommended to use SRM alerts to make sure you are not surprised by your SRM licenses expiring! In SRM 1.x, if the licenses expire you can still failover, unless you restart the SRM server at the protected side after the license expires. After restart the SRM service will not restart. In SRM 4.x failover will not work if the license expires. The service will restart.

What is the account that is asked for during install used for?
The 1.0 installer prompted for a username during installation. This is the account SRM will use to communicate with the local VC server. Since SRM constantly monitors the local VC inventory, this user will be constantly logged into the local VC server. Please note that this should be an account in the Administrators group. By default, when you install SRM 1.0 or SRM 1.0 U1, and when you log into SRM, all accounts in the Administrators group have complete access to SRM managed objects. Again, this has not changed with U1. Changing the password for this account will make it impossible to use SRM. Please try to use AD accounts when you install SRM. Using local accounts can work, but it is a little tricky. If you need some guidance on using local accounts I can help. This account is NOT the account used by the system – the SRM service uses the Local System Account.

Can network customization work for operating systems other than Windows?
Yes. This includes operating systems from Novell, and Red Hat. SRM 4.0 adds in Ubuntu as well to the Linux flavors that can be customized. The specific version information can be found in the SRM Compatibility Matrix document.

I would like to use trusted certificates with SRM – help!
You can use your own trusted certificates with SRM but it is more complicated than you might expect. You can find detailed info on how to do this on page 85, in Appendix C of the SRM Admin Guide. There is some excellent information to help you be successful at http://viops.vmware.com/home/docs/DOC-1261 . The new URL path is http://communities.vmware.com/docs/DOC-11411 . Also be aware of http://kb.vmware.com/kb/1008390 and http://kb.vmware.com/kb/1021031 .

How do I plan for disk utilization due to SRM database?
Recently we brought out the database-sizing tool. Find it for SRM 1.x at http://www.vmware.com/files/pdf/Site_Recovery_Manager_1.0U1_Database_Sizing_Calculator.xls . You can find the SQL one for SRM 4.x at http://www.vmware.com/files/pdf/Site_Recovery_Manager_4.0_Database_Sizing_Calculator_SQL.xls / The Oracle one for SRM 4.x can be found at http://www.vmware.com/files/pdf/Site_Recovery_Manager_4.0_Database_Sizing_Calculator_ORACLE.xls.

Is Essentials and Essentials Plus supported for SRM?
This is an interesting question. I have heard it a number of times. This is a great example of how people don't look for answers, but rather just ask for the answer. If you check the compatibility guide these are not mentioned – thus are not supported. And I checked to confirm – not supported! The lesson here is if something is mentioned in our compatibility guides it is supported (perhaps with caveats that would be listed) but if it is not mentioned it is not supported.

Can I change the IP information for the SRM server?

SRM 4.x
You can use the Add / Remove in the Control Panel to start the install tool and redo all of the install configuration information. Once the change is done you will also need to pair the two sites again.

SRM 1.x
I would like to change the IP info for the SRM server once it is installed. Is this safe or is there a specific way to do this without issues?
When changing the IP info for the SRM server, or if the credentials (account or password) need to be changed, you will need to use a special utility to accomplish either of these changes.

Understanding order of operation for bringing VMs back online
During the recovery period, the order of recovery VMs is not as obvious as it may suggest. Normal and Low priority protection groups (VMs) will be started one VM per ESX – up to a limit that varies according to version of SRM / VC – see the next point – How many VMs can SRM start. So you could have a number of Normal priority VMs starting at the same time – but spread across various ESX servers. However, High priority starts VMs serially regardless of how many hosts are involved.
Misconfiguration of the security for storage arrays may impact the start order of VMs. For example, if the security of the array means it cannot talk to a particular ESX host then that host will not be used to start VMs during a recovery plan. It is possible to see this without any obvious error messages!

How many VMs can SRM start?
This is something you may need to be aware of when you have a very large SRM install. If you have 45 ESX servers at the recovery side and you expect to use all of them to restart recovery VMs it will not happen. VC 2.5 has a limit of 16 VMs started at the same time, and SRM shares that limit which means 16 hosts can power on VMs. With SRM 4.0, since it works with vSphere 4.0, there is a limit of 20 VMs powering on at the same time, and thus with SRM 4.0 starting two VMs per host, that means only 10 hosts can start VMs simultaneously. See the information below to change the number of concurrent power on VMs value.

Can I start more than, or less than, 2 VMs per host?
Yes. With the current version of SRM (4.1 – and later) you have access to a vmware-dr.xml file setting that can tweak the number of concurrent power ons for a host. Default is 2. In my lab with Nehalem processors I can easily start more than 2 VMs concurrently, but if you have small processors and a small amount of RAM per host, you may want to start less than 2 VMs concurrently. Do not change the default unless you have carefully thought about it, and understand the impact! Also be aware this is not a supported change (currently) but it is always easy to change back if necessary. Bear in mind that Virtual Center rules apply, and that this will not solve all issues.
You will likely need to add a new section to the vmware-dr.xml file on the recovery side. If it is already there that is fine. In the <config> section add:

<Recovery>
   <powerOnsPerHost>x</powerOnsPerHost>
</Recovery>

Where x is the value of the number of concurrent power on operations. This change is not in the UI so it does require a restart of the SRM service.

What does the Repair button do?
The repair button is used when the protected site is not available, and some array reconfiguration is required. Normally it would be done at the protected site, but if it is not available then the repair button can be used. An example of when to use it is when the protected site is gone, and you realize that last week you changed the storage credentials and that is now stopping you from recovering. The Repair option would allow you to correct the credentials and continue with the failover recovery.

Is it all over when the recovery plan fails?
You can have a test recovery plan fail with some sort of error. You could then address and solve the error, and run the recovery plan again and if you have correctly addressed the error your test may in fact correctly complete this time. It will not redo things that it has done correctly already, but it will complete anything that it can complete. Once I had a problem with a VM starting and I let the replication finish, did a manual HBA refresh, and tried again. The two VMs that had already started were not touched, but the third VM that had just finished replicating, was in fact started. In a non-test failover, this may perform differently as it depends on the storage and what stage the issues occur in.

Can I move an SRM server to a new host?
This is possible but requires a number of detailed steps. It would be good to avoid if possible but it can be done. Full info can be found at http://kb.vmware.com/kb/1008426 .

How can I configure a second HBA rescan?
I have been told that my particular array will need a second rescan for my failovers to work. HP has confirmed this is one of their requirements.

SRM 4.x
This can be done in the Advanced Settings that is available after a <right + click> on the SRM lightning bolt icon. See the graphic below.

Advanced Settings – right click on Site Recovery seen at top left

SRM 1.x
This is easy and can be configured. Use the steps below:
• Edit the vmware-dr.xml file on the protected side.
• You will need to add a <hostRescanRepeatCnt> element in the <SanProvider> element.
• The value of <hostRescanRepeatCnt> should be set to 2.
• Make sure no one is using the SRM Plug-in, and restart the SRM service.
• Now do the same thing on the recovery side.
• Below is an example.

<SanProvider>
   . . .
   <hostRescanRepeatCnt>2</hostRescanRepeatCnt>
</SanProvider>

SRM 4.x and 1.x
You should confirm that changes like this are seen. The change should be seen in the log as SRM starts. You can therefore confirm the change you made has been accepted. See http://kb.vmware.com/kb/1008283 as it is now in the kb.

Recommended minimum alarm notifications
We suggest the following alarm notifications. Some of them are not necessarily appropriate on both or either side. Most organizations will utilize email notifications but there are other choices as well. Remember to set these suggested alarm notifications at both sides – as appropriate.
• Remote Site Down
• Remote Site Ping Failed
• Replication Group Removed
• Recovery Plan Destroyed
• License Server Unreachable (SRM 1.x)
• Recovery Plan Started / Recovery Plan Execute Test
• License Expiring / License Expired
• Protected VM limit exceeded
• VM Added (and waiting for you to protect it)
• VM Not Protected (meaning the VM has been added, but something about it requires additional work – for example, a new not replicated VMDK or CD ISO).
• Recovery Plan Prompt Display – important as this may stop the RP, or just a VM from being recovered. So the quicker you acknowledge the quicker the plan will continue.
• Recovery Profile Prompt Response – so you know that something has been acknowledged.
You may want to consider as well:
• VM Protection invalid – I am not sure what triggers this one!
With SRM 4.1 (4.0), these alerts are not part of the improved vSphere environment. You can set them on the Alarm tab of the SRM status summary page. Remember that these alarms are configured at both the protected and recovery sites. So if you set to be alerted on Remote Site Up, you will be alerted very frequently! Check out my blog for more information on this, and I will update it as necessary. It is at http://blogs.vmware.com/uptime/2011/02/recommended-alarms-for-srm-admins-to-watch.html .

SRM VirtualCenter events
SRM will raise VC events for the following conditions:
• Disk space low (on the SRM server)
• CPU use exceeded limit (on the SRM server)
• Memory low (on the SRM server)
• Remote Site not responding
• Remote Site heartbeat failed

• Recovery Plan Test started, ended, succeeded, failed, or cancelled
• Virtual Machine Recovery started, ended, succeeded, failed, or reports a warning
Some of these can be changed in how they are triggered. You can change disk, CPU or memory in the vmware-dr.xml file in the SRM config folder. Search for the terms below (in vmware-dr.xml) to see where to make the change and then restart the SRM service.
• Disk – (minDiskSpace), where the default is 100.
• CPU – (maxCpuUsage), where the default is 80.
• Memory – (minMemory), where the default is 32.
For example, the minimum disk space is 100 MB and you may wish to have it 500 MB.

Are thin provisioned VM's supported with SRM?
Yes. No issues.

What does Microsoft offer for licenses for DR test?
Microsoft offers what they call "disaster recovery rights" for every license that you have covered by software assurance. This is the language from Microsoft's use rights document:
“Cold” Disaster Recovery Rights. For each instance of eligible server software you run in a physical or virtual operating system environment on a licensed server, you may temporarily run a backup instance in a physical or virtual operating system environment on a server dedicated to disaster recovery. The product use rights for the software and the following limitations apply to your use of software on a disaster recovery server.
• The server must be turned off except for (i) limited software self-testing and patch management, or (ii) disaster recovery.
• The server may not be in the same cluster as the production server.
• You may run the backup and production instances at the same time only while recovering the production instance from a disaster.
• Your right to run the backup instances ends when your Software Assurance coverage ends.
This language was developed before SRM and virtualization made DR practical to test and tune on a broad scale. The good part of this grant is that it makes it completely clear that you can run DR testing all you want to prepare for a disaster ("limited" isn't defined – it's purely a matter of what the licensee chooses to do), and that you can run the two production instances (primary and failover) simultaneously when recovering from the disaster. The bad part is that it is extremely expensive. The trick is that every software component must be under SA – OS, apps, the works – for this to be practical. That's 25% of the license fee per year, for every license. It's probably not the best approach to licensing SRM, due to high cost.
If the customer already has SA everywhere, that's great. If not, here's what I recommend:
1. Create the SRM development environment in an isolated, non-production virtual network segment.
2. Be completely sure that production IT traffic cannot leverage resources in the SRM development environment.

3. Acquire enough Microsoft Developer Network (MSDN) subscriptions to license the OS and applications that will be used in the DR site. These are very low cost, but are fully functional and allow any development, non-production use.
4. Test and tune SRM using MSDN licenses until it works as desired.
When the customer is ready to test production failover, they may want to ask for permission from Microsoft to re-assign their licenses on a short-term basis. The "failover" test is permitted – the customer will re-assign all their licenses to the disaster site hardware. However, Microsoft rules state (with some specific exceptions) that re-assignment may not be done more than once every 90 days. The customer would need to either wait 90 days before testing the "recovery" phase, or ask Microsoft to acknowledge that they can test this critical business function without violating the terms of their license. I think an important note is that many corporate accounts have SA in enough volume to make this test process not an issue.

What vendors have application consistency options?
When storage arrays replicate VM's to another location, and when SRM starts them, the condition of the VM will be crash consistent. This is not always an issue but for sophisticated applications such as SQL, Oracle, and Exchange this can be an issue. This is a critical issue as it means that complex and modern applications will be replicated, at best, with crash consistency. The vendors below can help avoid these issues by using application consistent technologies in the replication.

FalconStor has something called Snapshot Director for VMware and more information on it can be found at http://www.falconstor.com/en/pages/index.cfm?pn=VMware&bhcp=1 and a direct link to the product datasheet can be found at http://www.falconstor.com/?tk=3Z18DEF34F7873223089B0D956F6EBD8 . This will work with Oracle, Exchange, SQL, IBM DB2, and Lotus Domino.

NetApp has a tool called SnapManager for Virtual Infrastructure that has agents and can do application consistent snapshots that are replicated. See more at http://www.netapp.com/us/products/managementsoftware/snapmanager-virtual.html .

EMC has a tool called Replication Manager that works with at least four of the five different EMC replication technologies. It has agents that can provide application consistency for a variety of applications including Exchange, Oracle and SQL. See more at http://www.emc.com/products/detail/software/replication-manager.htm .

Dell has something called the Automated Snapshot Manager for VMware, but I am unable to find what applications it supports.

LeftHand Networks, from what customers and storage people tell me, have this capability but I don't know much about it.

Hitachi SAN's do not have any agent based system. This includes the HP SANs that are based on HDS gear. I do not know if there are options that are manual or using scripts.

If you have knowledge, good links or other info, please send it on to me.

What vendors have application consistency options that work with continuous replication?
This is a little different in that with continuous replication it is hard to use agents to work with the point in time snapshot because the replication is perhaps real-time or maybe every 2 seconds – so there is not enough time to work with the agents to produce application consistent snapshots. So everything will show up in crash consistent. When I asked one of the architects at FalconStor about this, I got what is written below – thanks very much David!

"For the Continuous Replication, the way we can achieve better than Crash Consistency is through our "Snapshot Director" (virtual appliance) and our Snapshot Agents (loaded in the VM's). We truly quiesce the VM's applications at the Protected site, we do not play the snapshot quiesce action "offline", but instead of waiting for a "replication interval" (as opposed to Periodical replication), the "state pointer" which is like a bookmark (aka Snapshot Marker) is inserted into the CDR Journal (Continuous Data Replication Journal) at the time right after the filesystem flush, and as the journal is flushed out to the remote site, when the remote site processes the bookmark, it creates the TimeMark on the Replica disk.

So bottom line --> VM's are quiesced, but no NSS Snapshot is created on Protected Site, instead, we just insert a bookmark in the I/O Journal queue, and replicated immediately. The TimeMark is then created on the DR Replica disk, almost right away, as opposed to having to wait for an upcoming replication session (when using Periodical).

As for the Continuous Replication question from your previous email, the main difference, compared to using our "Periodical Replication", is that instead of creating a periodical replication point, we create a "Snapshot Marker" on regular intervals, and that marker gets replicated instantly to the remote site (the TimeMark is then created on arrival on the REPLICA volume). So the end effect is you get incredible RPO using Continuous Replication (any-point crash consistent state), but you also get the benefits of amazing RTO through "application consistent snapshot" via periodical snapshot markers that are trickled down to the DR site via continuous replication (but these quiescent application consistent snapshots are still periodical, thus spaced apart, as if we were doing periodical replication)."

As I learn more about this I will share what other vendors can do. HDS has the ability to have application consistent continuous replication for physical machines but not virtual machines at this time.

What rights does a user require to be a DR operator?
If you want a particular user to be a DR type operator and trigger plans, but without sweeping rights, it can be a little tricky. This will give a user the ability to trigger fail overs without being a VC admin user. A customer figured this out, and a PSO guy confirmed it was accurate. It likely has extra permissions due to the way it was figured out. The rights that are required are:
Protected Site
• Read-Only at the vCenter root (Virtual Machine User – Operator)
• Read-Only at the datacenter inventory object (Virtual Machine – Operator)
• Protection Virtual Machine Administrator role at the virtual machine level (propagate) applied to VM folders

• Protection SRM Administrator role at the SRM site recovery root level (propagate)
• Protection Groups Administrator role at the SRM protection groups level (propagate).
Recovery Site
• Recovery Inventory Administrator role at the vCenter root
• Recovery Datacenter Administrator role at the datacenter level (propagate). Include Virtual Machine Interaction, Console interaction, Reset Guest Information, Host CIM and Rename Datastore
• Recovery Host Administrator role at the host level and cluster (include Browse Datastore, Assign VM to Resource Pool, Power ON/OFF and Reset). Didn't work at customer unless assigned at cluster level.
• Recovery Virtual Machine Administrator at the resource pool and folder levels (propagate).
• Recovery SRM Administrator at the SRM root level (propagate)
• Recovery Plans Administrator at the SRM recovery plans level (propagate).
Thanks to Scott for this great info!

SRM service doesn't start, and event logs show errors with event ID of 7000 and 7009
This will not normally be seen in a production environment where SQL / SRM / VC are well designed, but in a lab with limited resources this can and does happen. This is due to the Windows Service Control Manager expecting a "Service started successfully" message in 30 seconds. You can make a change to a global setting that can increase the 30 to 60 seconds and it appears that will solve this issue. Use the steps below to make this improvement.
1. In Registry Editor, locate, and then right-click the following registry subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
2. Point to New, and then click DWORD Value.
3. In the right pane of Registry Editor, notice that New Value #1 (the name of a new registry entry) is selected for editing. Type ServicesPipeTimeout to replace New Value #1, and then press ENTER.
4. Right-click the ServicesPipeTimeout registry entry that you created in the previous step, and then click Modify. The Edit DWORD Value dialog box appears.
5. In the Value data text box, type 60000, and then click OK (value is in milliseconds).
Now you can restart and you should have no issue with SRM starting since there is more time for SQL and VC to start.

How can I have syntax highlighting to help read SRM log files?
This is very useful and can be done on both the Mac and PC's with a little work. Much less work now that I have done it for you! On the Mac you need to use TextWrangler (http://www.barebones.com/products/TextWrangler/) and on the PC you need to use EditPlus (http://www.editplus.com/ ). It is very popular with developers. TextWrangler is freeware but EditPlus is only shareware. While this is possible with other editors I only show you with these two.

Text Wrangler
In the appendix there is a sample file that you can copy and paste to create a text file called log.plist. Then use the following steps to make it live.
1. If necessary create a folder called Language Modules in your ~/Library/Application Support/TextWrangler folder.
2. Move your file to this folder.

3. In the Suffix Mappings section you can map the .log to the Language Modules – called Log due to the filename.
4. You should now be able to open a file that has an extension of .log and see words like error in color.
See below for a completed set of preferences as well as a sample file. The Preferences file can help adjust as necessary.

EditPlus
In the appendix there is the information to copy and paste that you can use to create a text file called log.stx. Use the following steps to make it live.
1. You will need to copy this file to the C:\Documents and Settings\user_name\Application Data\EditPlus 3 folder.
2. Now in the Documents \ Permanent Settings we need to add this new file into EditPlus.
3. Under the Settings & syntax menu, you will need to define a log file type.
4. In the File extensions section add the log type.
5. In the Syntax file section you should load your log.stx file.
See below for the end result.
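If you do not have either editor handy, a similar effect is possible in a terminal. The block below is only a minimal sketch, not part of the appendix files: it colours the words error, failure and warning with ANSI escape codes. The keyword list and colour choices are my own assumptions, and the function names are hypothetical.

```python
import re

# Hypothetical keyword-to-colour map. The keywords are common search terms
# for SRM logs; the ANSI colours are an arbitrary choice for illustration.
COLOURS = {
    "error": "\033[31m",    # red
    "failure": "\033[31m",  # red
    "warning": "\033[33m",  # yellow
}
RESET = "\033[0m"
PATTERN = re.compile("|".join(COLOURS), re.IGNORECASE)


def highlight(line):
    """Wrap each keyword occurrence in its ANSI colour code."""
    return PATTERN.sub(
        lambda m: COLOURS[m.group(0).lower()] + m.group(0) + RESET, line
    )


def highlight_file(path):
    """Print every line of a log file with the keywords coloured."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            print(highlight(line.rstrip("\n")))
```

Calling highlight_file on a copy of an SRM log prints it with the keywords coloured. Work on a copy rather than the live log file.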

Troubleshooting
This information will help with troubleshooting of SRM and SRM related issues.

Things to watch out for
There are a number of things to check when troubleshooting an issue. Often the errors visible in a Recovery Plan or history report are in English and you can start with the error message in your troubleshooting. The SRM logs are very useful as well, but remember to check either the production site or the recovery side as appropriate. Don't forget that SRA's have logs too, but generally all of the standard output or input of the SRA are captured to the SRM log.

You can often search the log for things like error] but also you can search it for what you see in the history report. You can use the date / time of the report / error to look for information. Some other things that may be useful to search for include credentials, failure or warning. Some of these may occur naturally so be careful. The start of a recovery plan in the logs looks something like [-1] CHILDREN . RootStepList-xxx HAS

Also, where possible, it is useful to troubleshoot when the Continue function has not been issued so the RP is in effect paused. This gives you access to storage that you will not have when the RP is complete and cleaned up.

Sometimes it is worth starting vmware-dr.exe to see if you can see anything that can help. This is particularly useful when you have tried to start the SRM service and it fails, but nothing is seen in the SRM logs. This can mean a problem occurs that stops SRM from starting before it can touch the logs. That generally has helped me.

Use the Array Manager configuration LUN view to see if there are any clues. If the Create Protection Group is grayed out that generally means that SRM cannot see the storage.

There are some odd things that you need to remember, such as an attached CD can be an issue in a failover.

Always check SRM compatibility for your situation (with things like ESX patch level or SAN compatibility) but also do not forget that the SRA often has prereqs that you need to worry about. Always check the release notes as well!

How can I change the command Timeout?
I am using an EMC SRA and I have heard that I may want to extend the execution timeout so that I can avoid timeout errors. Timeout errors can be found in the SRM log on the recovery side. Look in the vmware-dr.xml file that is by default in the C:\Program Files\VMware\VMware Site Recovery Manager\config folder. Search for CommandTimeout to find where to make the change. The default is 300 seconds (5 minutes – see the relevant section of the file below). Remember that after you make a change to the vmware-dr.xml file you need to restart the SRM service.
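The CommandTimeout change described above can be sketched in a few lines of Python. This is only an illustration under an assumption: that the setting is stored in vmware-dr.xml as a simple <CommandTimeout>300</CommandTimeout> element. Check your own file first, as I have not confirmed the layout for every SRM version. The function name is hypothetical, and it works on text rather than on the file so you can try it on a copy.

```python
import re


def set_command_timeout(xml_text, seconds):
    """Return xml_text with the first CommandTimeout value replaced.

    Assumes the setting looks like <CommandTimeout>N</CommandTimeout>;
    raises ValueError if no such element is found.
    """
    pattern = re.compile(r"(<CommandTimeout>\s*)\d+(\s*</CommandTimeout>)")
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{seconds}{m.group(2)}", xml_text, count=1
    )
    if count == 0:
        raise ValueError("CommandTimeout element not found")
    return new_text
```

You would read a backup copy of vmware-dr.xml into a string, pass it through this function, write the result back, and then restart the SRM service so the change is picked up.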

I have heard that if you go from 300 to 1500 that some EMC SRA's will not error and will work. I also know that the next generation of EMC SRA's will be much faster. The information previous is for SRM 1.0. For SRM 4.0 you will make this change in the Advanced Settings dialog and will not need to restart the service.

I have accidentally deleted my Shadow VM's – what should I do to fix this?
The shadow VM's are important for several reasons. SRM will replace them with the 'real' VM's during a failover as well as they are placeholders for you to know where VM's may end up at some point. You can fix this easily by accessing the Protection Group that host these VM's on the Protected side and configure the VMs and they will be created again on the recovery side. RP customization around these VM's will be lost if this is necessary, and will require you to redo your IP customization.
Important Note: When your protected VM's have their shadow VM created they will have a new 'number' assigned to them which will cause you issues with IP customization. The VM name will match before and after the Shadow VM creation and re-creation but the ID will not. You will need to recreate your CSV and reassign your IP customization. In SRM 4.0 there is an option in the UI to fix this situation.

Where is the new Run and Test privileges?
After you update to SRM 1.0 Update 1 you should see a Run and a Test privilege in the roles and privileges area but you may not. Restart VC and you will see them.

My Celerra prepare storage fails, and the error has a –null in it
This is due to a bug in the current Celerra SRA and in the second or third week of December 2009 there should be an update SRA that doesn't have this error. It is related to the name of the Celerra VSA but it occurs with physical arrays as well! As of 1/29/10 EMC has released an updated SRA (4.0.17) that is supposed to solve this issue. I have not yet tested this myself, nor have I heard that it does solve the problem. But it is supposed to! 3/26/10 this error is still in the wild. There is a new SRA 4.0.19 that supposedly fixes the issue but it has not been released yet. This updated 4.0.19 SRA has been released (May 2010) and is said to fix this issue but I have not confirmed that myself.

SQL Authentication, and database access issues
If you have some issues with not being able to start the SRM service it may be an SQL server account issue. You may have an issue like below in bold.
Section for VMware vCenter Site Recovery Manager, pid=1348, version=4.1.0, build=build-267817, option=Release
[2010-08-02 12:19:21.740 00560 warning 'App'] Failed to create console writer
[2010-08-02 12:19:21.740 02940 info 'App'] Set dump dir to 'C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\DumpFiles'
[2010-08-02 12:19:21.740 02940 info 'App'] Intializing the DBManager
[2010-08-02 12:19:21.850 02940 error 'Vdb'] Connection: Could not connect to database: -1
[2010-08-02 12:19:21.850 02940 warning 'App'] DBManager error: Could not initialize Vdb connection: ODBC error: (28000) - [Microsoft][SQL Server Native Client 10.0][SQL Server]Login failed for user 'VMW-NE\SITE1-SRM01$'.
[2010-08-02 12:19:21.850 02940 error 'App'] Application initialization error: Could not initialize Vdb connection: ODBC error: (28000) - [Microsoft][SQL Server Native Client 10.0][SQL Server]Login failed for user 'VMW-NE\SITE1-SRM01$'.
[2010-08-02 12:19:21.850 02228 info 'App'] [serviceWin32,421] vmware-dr service stopped
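To spot this kind of failure quickly in a long log, you can pull out the account name from the "Login failed for user" messages. The block below is a minimal sketch that assumes the message text matches the excerpt above. The function name is hypothetical and nothing here is an SRM tool.

```python
import re

# Matches the SQL Server message seen in the log excerpt above.
LOGIN_FAILED = re.compile(r"Login failed for user '([^']+)'")


def failed_logins(log_lines):
    """Return the distinct accounts named in 'Login failed' messages,
    in the order they are first seen."""
    seen = []
    for line in log_lines:
        match = LOGIN_FAILED.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

An account name ending in $ is a machine account, which suggests the SRM service is running as the local System user while the database expects a Windows login.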

When you use Windows Authentication to access the DB you must run the SRM service as the DB user account. When using SQL Authentication, you can leave the default local System user.

Why cannot I customize Windows 2008?
This ability was added in at the Patch 4 of SRM 1.0.1 timeframe. Upgrade to it or SRM 4.0 and you will be fine. You will find info below about avoiding the Shutdown Tracker situation.

Why does my recovery plan show error on VM status but the VM's are ok?
The reason for this error was introduced in Update 3 of ESX. We adjusted the frequency of how often we check for VMware Tools heartbeat. While the VM's are recovered successfully, the history report does show errors on the VMware Tools Status. You can adjust the Recovery Plan Response Times wait for OS heartbeat from 300 to 450 and you will get rid of the error status, but the test will take longer! You can also use the following information for a better fix. This recently became a supported fix – http://kb.vmware.com/kb/1008059 .
• Start in the ESX Service Console at the command prompt.
• Edit the /etc/vmware/hostd/config.xml file.
• You will need to change the vmsvc section (which will look like <vmsvc>)
• There may be a line that starts with <heartbeatDelayInSecs>XX, and once it is located change the value of XX to 40.
• If not you will need to add the complete section, which will look like below.
<vmsvc>
<heartbeatDelayInSecs>40</heartbeatDelayInSecs>
<enabled>true</enabled>
</vmsvc>
• You will need to restart the management agent to load this change. The command is service mgmt-vmware restart and followed by service vmware-vpxa restart. It is important to note that this may cause an issue with restarting VM's. Avoid this by disabling the automatic start up and shutdown of VM's. See more about this at http://kb.vmware.com/kb/1003490 .
• I am not aware of at this time a way to implement this change in ESXi, as of 1/30/09.
• In addition, instead of adjusting the Recovery Plan wait of Tools timeout, there is a fix so this manual work is not required. This was fixed in Update 1 of SRM. 8/8/10 AOK.

ESX 2.5 accessing protected datastore will cause recomputed datastore failures
If you have ESX 2.5 hosts accessing a protected datastore you will see recomputed datastore failures. Remove the ESX 2.5 host from the datastore. Find out more at http://kb.vmware.com/kb/1006651 .

What causes the Recompute Datastore Group task?
A number of things can cause this including
• Existing VM is deleted or unregistered
• VM is storage VMotioned off
• New disk is attached to a VM on a datastore previously NOT used by that VM
• New datastore is created

• New VM is created.
• Existing datastore is expanded.

Unable to find any array script files – Please check your SRM installation
This can mean a few things. Your SRM install could be to D:\ and your EMC Solution Enabler could be installed to C:\. This error can also occur with any storage vendors if you have not restarted SRM after installing their SRA.

When using Bulk Import I get column errors
This can occur if you edit the .CSV file with Excel 2003. The problem is that Excel 2003 will strip off the last comma, which will screw up your columns. This will not occur with Excel 2007.

My Linux VM's don't have the host file changed after IP customization
This is a current bug in SRM 4.0. The IP customization on the Linux VM does actually work except for the change to the host table. It is being worked on and hopefully will be addressed soon. This was fixed in 4.0.1 and later.

Why is my IP customization taking about 10 minutes extra per VM?
This has been seen to occur when a VM has been deployed from a template and never turned on, and then trying to re-IP during a test or failover. It turns out that sysprep is trying to run while we use sysprep to re-IP. This can be avoided by deploying, turning on and letting the deploy finish. Now it should work much better.

I would like to avoid the messages about shutdown
By default, you will see a request prompt when you log in after a failover about why the computer went down. This sometimes looks poor during a demo. How can I avoid this? You can use GPO to avoid this. It is in Computer Configuration, Administrative Templates, and System.

dr.secondary.fault.WrongVmInventoryPlacement
This error can occur when you are creating a PG and have mapped inventory items that are not compatible. For example, between an ESX host with VM's that are at VH7 level, with an ESX 3 host. Basically it means that host, network, or resource pool are not compatible at the other side. The log will have errors like:
[2009-09-19 18:27:04.252 06140 verbose 'Replication'] Creation of shadow VM failed with error (dr.secondary.fault.WrongVmInventoryPlacement) {
[#2] dynamicType = <unset>,
[#2] faultCause = (vmodl.MethodFault) null,
[#2] host = <unset>,
[#2] resourcePool = 'vim.ResourcePool:resgroup-392',
[#2] datastore = 'vim.Datastore:datastore-2840',
[#2] msg = "Host, resource pool and datastore are not compatible.",
[#2] }
You will need to discover what the resource group 392 and datastore 2840 are to discover where the conflict is. This would be done using the following command: https://vc_recovery_side/mob/mob?moid=datastore-2840
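If the fault lists several objects, the lookup URLs can be generated rather than typed. The block below is a minimal sketch that assumes the IDs appear in the log in the 'vim.Type:id' form shown in the excerpt. It reuses the URL pattern from this guide, and the function name and host name are hypothetical.

```python
import re

# Picks the managed object ID out of values such as
# 'vim.ResourcePool:resgroup-392' or 'vim.Datastore:datastore-2840'.
MOREF = re.compile(r"'vim\.\w+:([\w-]+)'")


def mob_urls(log_text, vc_host):
    """Build one MOB lookup URL per managed object ID quoted in the log text."""
    ids = MOREF.findall(log_text)
    return [f"https://{vc_host}/mob/mob?moid={moid}" for moid in ids]
```

Paste the fault text from the log into log_text and use the name of your recovery side VC for vc_host, then open each URL in a browser.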

You would change the datastore-2840 for the other variables as appropriate.

Pairing Issues
If you have an issue at approximately 24% it could be related to the license file not being live or installed. Reread the license file or restart the license service. If you have an issue at approximately 82 or 84% you should make sure that the account you used to connect to the Recovery site has both VC and SRM admin rights. The specific role for SRM is Protected Site Administrator and on the Recovery Site it is called Recovery Site Administrator. The Administrator role includes both the Protected and Recovery site admin roles. This issue occurs most in a Microsoft domain world. Things to check during troubleshooting of pairing issues would include firewalls between the sites and is the recovery site running VC successfully?

I cannot run more than one simultaneous recovery plan with my MirrorView SRA
I need to run more than one recovery plan at a time so that I can cut my RTO. But I have not been able to do that with my MirrorView SRA. I can in fact do it with other SRAs so I am curious. This is (as of 12/12/09) correct and a precautionary measure to provide better response time while running the recovery plan. In the future NaviSphere engineering will make some design improvements that will allow additional simultaneous recovery plan operation. In the meantime, for experimental purposes only, and in a lab, with no support, you can change from the limit of 1 simultaneous plan to 3 with a registry change. The registry change should be on the SRM server. The key is HKLM\SOFTWARE\EMC\MirrorViewSRA\Options\NumSimultaneousInvocationsAllowed with a value of 3.

What time guidelines can I expect for protecting VM's?
I wonder if my SRM is taking longer than it should for creating a protection group. What is a baseline? Below is a guideline from our QA department. Your mileage may vary.
100 VM's in 7 minutes
200 VM's in 15 minutes
300 VM's in 22 minutes
400 VM's in 27 minutes
500 VM's in 35 minutes

What time guidelines can I expect for failing over VM's?
This is something that varies a lot, especially dependent on your storage. For example, NFS prepares for use much faster than VMFS FC or iSCSI storage. The processor / memory of the recovery hosts, and storage performance can make a big difference as well in terms of how fast we can have a recovery plan complete. IP customization and script execution can add minutes per VM as well.

Important note: this information is for discussion and not indicative of what you can expect. Remember there are a lot of variables! But the info below can be used for understanding if you are seeing good numbers or not.
Another important note: I have seen tests where I thought everything was the same, and yet take different times, sometimes different by even 5 or 6 minutes. So the test time indicated below is very rough and should not be taken seriously.

Number of VM's | Scripts and or IP Customization | Storage Information | Time for a test (including clean up) | Comments
8 Windows | No / No – none | Virtual FalconStor running on dedicated physical FalconStor hardware. | 12 minutes | 2 hosts on recovery side.
8 Windows | Yes / Yes – all | Virtual FalconStor running on dedicated physical FalconStor hardware. | 29 minutes | 2 hosts on recovery side.

A customer reported to me that he failed over 100 virtual machines, from 12 protection groups, in 120 minutes. I have no other info on that. But I am looking for more information for this section.

When trying to do Inventory Mappings the VI Client hangs
This can occur when you have more than 7 ESX hosts. There is a Patch (Patch 2) that can solve this issue. Patch 3 should help this situation further. This can still occur when you have hundreds of ESX servers or thousands of VM's. That is harder to solve but information in the design section can help. SRM 4.0 improved this a lot!

"Failed to connect to the management system address when executing the discoverArrays command."
You should not often see this but it can be addressed by making sure the SRA is in fact installed on the recovery side. You may also need to check routing between the sites (in particular to the Recovery side SRA / storage management interface). I have seen this with the MirrorView SRA and its odd ports, as well as with RecoverPoint.

This can occur after storage is mounted, but the datastore cannot be found. Before continuing the test, check the storage and confirm it is readable and has VM's. There are MANY causes of this error. You need to find the boundary of working and not working in the storage world and then deal with that. Use storage troubleshooting to figure it out.

How can I re-initialize the SRM database
SRM 1.0
You can also do it using the commands below:
Cd <SRM bin folder>
Initdb.exe ..\config\vmware-dr.xml recreate

This is not something you need to do often. In fact I never have. It would perhaps be useful if you suspect your database information is corrupt.

SRM 4.0
In SRM 4.0 you can use the Change option for SRM in the Add / Remove control panel applet and it will allow you to make a number of changes including VC account / password, delete the contents of the SRM database and more.

Error LUNs with duplicate IDs or numbers received from SAN integration scripts
When adding an array in the array configuration manager you may see this error in a popup window. In this example it occurred in an EMC Symmetrix and SRM 1.0 U1 environment. In the SRM logs you could see the same WWN for all LUNs. You will need to talk to the storage team and make sure the correct flags are set on ALL FA ports. EMC will normally recommend the following flags set on all FA ports in an ESX environment.
• Common serial number (C)
• Auto negotiation (EAN) set
• Fibrepath enabled on this port (VCM)
• SCSI 3 (SC3) set (enabled)
• Unique world wide name (UWN)
• SP-2 (Decal) (SPC2) flag is required
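Once you have the LUN numbers and WWNs out of the SRM log, checking for repeats is easy to script. The block below is a minimal sketch. The (LUN number, WWN) pair format is my own assumption for illustration, and how you extract the pairs depends on your SRA and is not shown. The function name is hypothetical.

```python
from collections import defaultdict


def duplicate_wwns(luns):
    """Map each WWN reported for more than one LUN to the LUN numbers sharing it.

    `luns` is an iterable of (lun_number, wwn) pairs. WWNs are compared
    case-insensitively, since logs may mix upper and lower case hex.
    """
    by_wwn = defaultdict(list)
    for lun_number, wwn in luns:
        by_wwn[wwn.lower()].append(lun_number)
    return {wwn: numbers for wwn, numbers in by_wwn.items() if len(numbers) > 1}
```

An empty result means every LUN reported its own WWN. If all LUNs come back under a single WWN, that matches the symptom described above.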

Error: Failed to recover datastore:
This error usually indicates that the recovery side cannot communicate with the array on the recovery side. In the SRM logs on the recovery side you can see a Mapped LUN line (s) that will help you see what the protected side is mapped to on the recovery side. This will sometimes help you fix this error message.

SRM unlicensed error in logs but you have a good license
If you change the SRM license file(s) you may have a small issue, as it is not the same process as changing an ESX or VC license. You would follow the normal steps of dropping the file in the license folder and rereading the license folder in the license tool. This would be enough for VC or ESX but is not enough for SRM. After these steps you could see the license in the VC Admin License view, but would still see the unlicensed errors in the SRM log. You need to restart the SRM service for the new license to take effect. You can find more information on SRM licensing in other parts of this document. For example see SRM Licensing Information.

I cannot uninstall SRM successfully – what can I do?
Uninstalling SRM will normally require access to the VC that it is paired with. If you do not have that VC running it is hard to uninstall SRM. If you don't cleanly uninstall SRM you cannot install it again. It is possible to uninstall with no VC if you read the screens carefully and answer appropriately, but I have seen cases where that doesn't work. It is always best to use the Add / Remove Programs method to uninstall, but if that doesn't work the ideas below should help.
msiexec.exe /qn /x {35A202EA-1549-4592-97A5-65F5E4CCDEC9}

Microsoft‘s uninstall utility: http://support.microsoft.com/kb/29031

SRM doesn’t start, and you just uninstalled an SRA
This was an interesting issue! I decided to stop using a particular storage array, so I migrated all of the protected VMs on it to another storage array. Next, I removed the SRA and deleted the virtual storage arrays. For an unrelated reason I restarted SRM, but it would not start. When I looked at the logs I noticed it was crashing on issues related to the SRA I had just uninstalled. So I installed the SRA again, and removed the Array Manager config for the removed SRA and the associated PGs. When I removed the SRA again, there were no issues. The moral of this story is that when you remove an SRA, you should make sure to remove the Array Manager configs as well as the PGs that point to it!

Unable to create placeholder virtual machine at the recovery site: host, resource pool, and datastore are not compatible
This is a frustrating error message. I first saw it when I started using distributed switches at one site and not the other. This error message means that you have mapped resources that are, for some reason, not compatible. One simple example is when you have mapped a VM network to a network that one host doesn't have access to. You can also confirm that the Shadow VM location is visible to all hosts at the recovery side. You will need client and server logs to investigate this further. Another cause of this issue can be mapping between a 4.x cluster and a 3.x cluster. You can map between a 3.x and a 4.x cluster, which will work for failover but not failback. I also saw this once after an SRM service restart during a test recovery. Restarting both VC and SRM servers solved it.

Network device needed by recovered virtual machine could not be found at recovery or test time
This error will occur when your protected virtual machines are using dVS switches. With 4.0 or 4.0.1, dVS is not supported even though it is supposed to be. The problem is in two parts: the first is a cosmetic issue in VC, and the second is the error above, which stops a recovery from being successful. As of 5/22/10 there is a patch available from GSS that has been confirmed to work, which means you need an SR to get it. We will include this fix in our next major release and in our next patch. Both of these will be available in the summer of 2010. The VC issue will be fixed in vSphere VC 4.0 Update 2. To confirm you have this issue, look for NetworkDeviceNotFound in your SRM log. A few lines after that error you will see dvportgroup-xxxx messages. In the History Report you will get an error something like "Network device needed by recovered virtual machine couldn't be found at recovery or test time". Update: SRM 4.1 doesn't have these issues, and if you use SRM 4.0.2 and VC 4.0 U2 you will not have this issue. The KB article can be found at http://kb.vmware.com/kb/1019890 .
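The log check described above can be scripted. The following is a minimal Python sketch, not a supported tool. It assumes the xxxx in dvportgroup-xxxx is numeric and that "a few lines" means within ten lines. The sample log lines are made up for illustration and do not show the real SRM log layout.

```python
import re

def find_dvs_network_errors(log_lines, window=10):
    """Return (line_number, text) for each NetworkDeviceNotFound entry that is
    followed within `window` lines by a dvportgroup-<digits> reference."""
    dvport = re.compile(r"dvportgroup-\d+")  # assumption: xxxx is numeric
    hits = []
    for i, line in enumerate(log_lines):
        if "NetworkDeviceNotFound" not in line:
            continue
        following = log_lines[i + 1 : i + 1 + window]
        if any(dvport.search(nxt) for nxt in following):
            hits.append((i + 1, line.strip()))
    return hits

# Hypothetical excerpt, for illustration only.
sample = [
    "error: NetworkDeviceNotFound",
    "some unrelated message",
    "reference to dvportgroup-1234",
]
print(find_dvs_network_errors(sample))
```

To use it on a real log, read the file with `open(path).read().splitlines()` and pass the resulting list in.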

SRM doesn’t start and nothing in SRM logs or event logs – what to do?
The reason nothing is in the SRM logs is that SRM really hasn't started yet, so it is not a surprise when there is nothing in the event logs either. I have seen this several times and there are two things to think about. 1. Use depends.exe to determine what missing DLL is hurting SRM. I once had SRM not start for me and it was due to a missing DLL by the name of MSVCP71.dll. By using depends.exe to start vmware-dr.exe (the SRM service) I was able to determine what DLL was missing and replace it with a copy from a different SRM server. Incidentally, depends.exe comes with Visual Studio.


2. Start vmware-dr.exe manually and you may see a message such as msg=Login failed due to a bad username or password. This can occur after changing the password that is tied to SRM. The message may or may not be in the log file. If it is in the SRM log, it can be hard to find.
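As a quick first check before reaching for depends.exe, you can look for a suspect DLL in the directories on PATH. The Python sketch below is only a rough aid, under the assumption that PATH is where the DLL should be found. Windows also searches other places (such as the application directory and the system directories), so "not found" here is a hint, not proof.

```python
import os

def find_on_path(filename, path=None):
    """Return the PATH directories that contain `filename` (case-insensitive)."""
    path = os.environ.get("PATH", "") if path is None else path
    found = []
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')
        if not directory or not os.path.isdir(directory):
            continue
        try:
            names = {name.lower() for name in os.listdir(directory)}
        except OSError:
            continue  # unreadable directory: skip it rather than fail
        if filename.lower() in names:
            found.append(directory)
    return found

if __name__ == "__main__":
    locations = find_on_path("MSVCP71.dll")
    print("MSVCP71.dll ->", locations if locations else "not found on PATH")
```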

Only three Recovery Plans can run at the same time
I am not sure what the error message is if you try to run more than 3, but at least you now know that only 3 should be executed at the same time. Update: this limit is not enforced. The limit reflects the level of QA testing and will be significantly improved in the future. It is rumored that up to 6 running RPs will work without issue, that above 6 there are issues, and that by 10 there are consistent and serious issues. Only three is supported! It is important to note that not all SRAs will support this. For example, due to issues in Navisphere, the MirrorView SRA will only support 1 running RP. Always check the readme or release notes for the SRAs.

Why is Port 80 used in the install but port 443 later?
During the install of SRM, port 80 is specified and you cannot type in 443, but after the install is complete SRM talks to VC on 443, so why is 80 specified in the install? Even though SRM uses SSL when it communicates with VC, it does not use port 443. SRM establishes a TCP connection to port 80, then uses an HTTP CONNECT request to establish a tunnel to the VC server, then does an SSL handshake with the VC over that tunneled connection. The SRM installation enforces these semantics.
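The three steps above (TCP to port 80, HTTP CONNECT, then SSL over the tunnel) can be sketched generically in Python. This is an illustration of the pattern only, not SRM's actual code. The guide does not say what tunnel target SRM puts in the CONNECT request, so `target` is a hypothetical parameter you would have to supply.

```python
import socket
import ssl

def build_connect_request(target):
    """Build an HTTP CONNECT request asking the listener for a raw tunnel.
    `target` is the authority string to tunnel to (assumption: not documented here)."""
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")

def open_tunneled_tls(host, target, listen_port=80, timeout=10.0):
    """Step 1: TCP to port 80. Step 2: CONNECT. Step 3: TLS handshake over the tunnel."""
    sock = socket.create_connection((host, listen_port), timeout=timeout)
    sock.sendall(build_connect_request(target))
    response = b""
    while b"\r\n\r\n" not in response:
        chunk = sock.recv(4096)
        if not chunk:
            break
        response += chunk
    status_line = response.split(b"\r\n", 1)[0]
    parts = status_line.split()
    if len(parts) < 2 or parts[1] != b"200":
        sock.close()
        raise ConnectionError(f"tunnel refused: {status_line!r}")
    # Default context verifies certificates; a self-signed VC certificate would fail here.
    context = ssl.create_default_context()
    return context.wrap_socket(sock, server_hostname=host)
```

The point of the sketch is that the socket is opened to port 80 and encryption starts only after the tunnel is granted, which is why the installer asks for 80 even though the traffic ends up protected by SSL.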

Failed to test failover luns. Exiting with failure
This is from the EMC SRDF SRA. The error snippet is:
[#4] [07/20 07:03:15 CopyLuns.cpp 1089 CopyLuns::ValidateOptionsFileDevicePairs ] Enter
[#4] [07/20 07:03:15 CopyLuns.cpp 1098 CopyLuns::ValidateOptionsFileDevicePairs ] Checking if the number of input devices is same as the number ofsource devices in the options file
[#4] [07/20 07:03:15 CopyLuns.cpp 1101 CopyLuns::ValidateOptionsFileDevicePairs ] [ERROR]: One or more input RDF devices are missing from the device pair list within the options file
[#4] [07/20 07:03:15 CopyLuns.cpp 1258 CopyLuns::ValidateOptionsFileDevicePairs ] Exit
[#4] [07/20 07:03:15 CopyLuns.cpp 0154 CopyLuns::TestFailover ] [ERROR]: Options file device pairs validation succeeded but one/many of the adapter's conditions have not met. Exiting with failure
[#4] [07/20 07:03:15 EmcSrdfSra.h 0040 SymapiSession::~SymapiSession ] SymCommit() and SymExit()
[#4] [07/20 07:03:16 CopyLuns.cpp 0206 CopyLuns::TestFailover ] Exit
[#4] [07/20 07:03:16 EmcSrdfSra.cpp 1203 wmain ] [ERROR]: Failed to test failover luns. Exiting with failure

The question is what RDF devices it is talking about, and which options file? In the adapters directory on the recovery side there should be a file called EmcSrdfSraOptions.xml. In that file you need to specify the R2 devices and their associated BCV pairs as part of the <TestFailoverInfo> information. You need to find the associated BCV device names for each of those devices, for example by using the "symmir" command and specifying the device group containing those devices. Then modify EmcSrdfSraOptions.xml to include entries in the <TestFailoverInfo> stanza such as the following (for example, if 477's BCV is 35F):
<DevicePair>
<Source>0477</Source>
<Target>035F</Target>
</DevicePair>
Then run the test again, since these are the "options" that the SRDF adapter is looking for. You will have to create this pairing information for each R2 device you plan to test. The output from the adapter will summarize what it thinks is specified in the EmcSrdfSraOptions.xml file, for example if the output has:
[#4] [07/16 08:57:16 EmcSrdfSra.cpp 0655 SrdfSraOptionsReader::DisplaySrdfSraOptions] save_pool_name = n/a
[#4] [07/16 08:57:16 EmcSrdfSra.cpp 0673 SrdfSraOptionsReader::DisplaySrdfSraOptions] devices = n/a


[#4] [07/16 08:57:16 EmcSrdfSra.cpp 0676 SrdfSraOptionsReader::DisplaySrdfSraOptions] gold_copy_type = BCV
where "devices = n/a" it thinks you haven't set any DevicePair settings. After you modify EmcSrdfSraOptions.xml you can also run the adapter binary by hand (EmcSrdfSra.exe -env), where the env flag will cause it to print out what it thinks is in the options file. This all assumes you are using standard TimeFinder for snapshots; if you are using BCV clones you will need to modify the EmcSrdfSraOptions.xml file accordingly, including specifying the save pool name. EMC can probably give more details as to the purpose of the options file.

Unexpected MethodFault (dr.san.fault.ManagementSystemNotFound)
This error occurs after you upgrade the EqualLogic PS Series Interface SRA adapter to the Dell EqualLogic PS Series Interface. You can uninstall the new SRA and install the old one as a workaround, but there is another option. You can locate the manifest.xml file in the SRA installation directory, modify the SRA name in it, and restart the SRM service, and you would be good to go.

For SQL server use, does the SRM DB user need the DB_OWNER permission?
For SQL server, the SRM DB user doesn't need the DB_OWNER permission. As long as the schema has the same name as the username, is owned by that user, and is the default schema for that user, then you are ok.

I can't install the plug in - get an error
The information about where to install the plug-in from is held in extension.xml, which is in the install folder. It may have the wrong path. This could be due to an issue during the install. This problem was fixed in Update 1.

Changing passwords after SRM is working
Update - 11/26/09 - For SRM 4.0 and later you can do this sort of thing much more easily via Add / Remove Programs, using the SRM Repair option. The information below is still appropriate for SRM 1.0.
You can have some issues with changing account passwords after everything is working. In theory you can use the installcreds.exe file, but it has been reported to not always work. In the near future there will be an update to make this process easier, but for now you must use the srm-config.exe command. This utility is found in the bin directory of the c:\program files\VMware\VMware Site Recovery Manager\config folder. The format is complex for this command. You can find parameter names (such as the value for -sitename) in the vmware-dr.xml file found in the config folder. You must run it twice, the first time to obtain a thumbprint, and then the second time to actually make the change. When it is complete you will be able to restart the SRM service and have communication between the SRM servers (you will need to repair the communication by doing the pairing again).
Srm-config.exe -cmd confuserbased -sitename <local site name> -cfg <SRM configuration file> -u <username> -vc <host[:port]> [-thumbprint <sha-1 server certificate thumbprint>]
Below is a sample command line.
Srm-config.exe -cmd confuserbased -sitename srm-primary -cfg vmware-dr-primary.xml -u administrator -vc 10.10.10.10 -thumbprint 96:E0:E8:F5:59:1C:BF:6D:81:6C:A2:AB:51:76:24:DE:31:D1:E8

Without the password you will need to use the thumbprint. So run this command the first time without the thumbprint parameter and you will be shown the thumbprint, and then run it again with the thumbprint. If your site name contains spaces, enclose the name in quotes. You will need to worry about this if you cannot get the SRM service to start. You will see messages in the error log about ERROR 1920 Service VMware SRM Service (vmware-dr) failed to start. This is easier in SRM 4.0 and is covered in the admin guide.

Priority Levels in Recovery Plan don't reflect my changes
You have made changes in the Protection Group to the priority level of some of your protected VMs. But when you refresh the Recovery Steps you see your VMs with the original priority and not the new one that you set in the Protection Group. This is correct behavior. It is due to the difference in security permissions on both sides. Otherwise it would be possible for someone on the protected side to make changes that affect VMs on the recovery side. This may or may not be appropriate. It may be improved in the future. Until there is a good solution, just right click on the VM in question and use the Move Up or Move Down options to change its execution order priority.

My recovery site is only using x number of hosts to start VM's but it should be using y number
When I experienced this, it was due to the host that was not starting VMs not having access to the storage array. This was due to it not having a vmkernel port that LHN required. I have seen this with other vendors where there was no security between the ESX host in question and the storage array. There are no error messages associated with this situation, so make sure you test for it. In addition, make sure that DRS is healthy. If there are wide deltas between the build / patch levels of the hosts in the cluster, it is possible that certain hosts will not be used by SRM since DRS is not using them. Test that all hosts can be used by VMotion by setting all hosts one by one in and out of Maintenance mode to confirm things are ok.

Error: A general system error occurred: cannot execute scripts
If you see this error, the manual power on should work fine, but you should provide your logs to VMware support, figure out what the issue is, and get it fixed. I have seen a similar error where the single host at the recovery site didn't have an IP entered for the iSCSI array.

Permission to perform this operation failed
This may occur when you try a variety of different options while using an account that is not the default one the install occurred under, or if you have tweaked the permissions of the account. The SRM log would have errors in it that include (vim.fault.NoPermission). To solve this issue make sure the user account has the protect privilege. Just having VC and local admin rights is not enough. You can see a little more about this on page 44.

What does SRM database corruption look like?
I would like to know what I might see in the SRM logs if my SRM database is corrupted?
[2009-08-04 21:15:18.077 'SecondaryReplication' 1768 verbose] Loading ShadowVm from DB object

[2009-08-04 21:15:18.077 'DrServiceInstance' 1768 warning] Initializing service content: Unexpected exception 'class Vmacore::Xml::XMLParseException' unclosed token
[2009-08-04 21:15:18.077 'App' 1768 error] Application error: unclosed token. Shutting down ...
[2009-08-04 21:15:18.187 'App' 6344 info] [serviceWin32,414] vmware-dr service stopped
Above is an example of what you might see in the SRM log files when the SRM database is corrupt. You can restore the database if necessary, but make sure to do it on both sides and have SRM not running when you do it.

My script needs more time to execute
There is a variable that controls this. It is in the vmware-dr.xml file. It doesn't appear to be in the SRM 4.0 GUI, and so changing it will need a restart. In SRM 4.0 it would look like
<Recovery>
<calloutCommandLineTimeout>500</calloutCommandLineTimeout>
</Recovery>
Remember that if you make this change every script will now have longer to run, and this may impact your recovery RTO.

Database access issues
Use Windows Authentication if the DB server is local to the SRM server, and SQL Authentication if the DB server is remote to the SRM server. Make sure the schema for the database has the same name as the user.

ESXi - not supported at 1.0 nor is ESX / VC Update 2
It is not as obvious as it should be, but ESXi is not supported with 1.0 of SRM. And as the title mentions, nor is Update 2 of ESX or VirtualCenter. This will change when a patch for ESXi is released, which should occur in late January. Update 2 does seem to work, at least for me. This patch has been released and thus ESXi is supported. Even ESXi on FC is supported. However ESXi on anything else is NOT supported. As of Update 1 for SRM, U3 of both ESX and VC are supported.

Error: Expected virtual machine file path … vm-vmname/vm-vmname.vmx cannot be found
This can occur during test or recovery, and it means quite simply that the VM referenced in the error is not in the replicated SAN datastore where it is expected. This most often occurs when you add another VM to the protected datastore and start a test recovery before it has had time to replicate. The solution is to wait until the replication catches up and try the test again. Sometimes this is due to a VM that is Storage VMotioned, or migrated off of a host at the worst possible time!

SRM 4.0 cannot start - I just updated to vSphere 4.0 Update 1
There is a proxy.xml file that maps between SRM and VC. This file is reset to its default values during the vSphere Update 1 process. This means no mapping, and as a result SRM will not be able to start. This can be easily solved after it occurs by using the Repair SRM option in the Add / Remove Programs area. See more at http://blogs.vmware.com/uptime/2009/11/srm-40-license-change-after-upgrading-tovsphere-40-update-1.html including a link there to the KB article.

No available Customization specifications found
You can create customizations using the View \ Edit Customization command in the VI client. This is like sysprep, and you are required to fill in all of the necessary information, but only the network info will be used. This is how you can change a network setting in a recovery. You will need to create your

customization specification on the recovery site. Remember that you can export and import customizations, so if necessary it doesn't take much to move them between your protected and recovery sites.

Recovery Plan error: Unable to access the VM config error message
I have seen this error in a number of different situations.
• Does your recovery server have a software initiator that points to your shared and replicated storage? This is configured with the Properties button (and Configure on the General tab) on the iSCSI adapter in the Storage Adapters area of the configuration area of the ESX server in question.
• For Left Hand SRAs you need to have an Authentication Group on the recovery side, and if you don't this error can occur. Along with the Authentication Group you also need a volume list.

Errors with using Network Customization
This problem is seen when you try to change a VM during recovery from using DHCP to a static IP. Some of the VMs were properly recovered, and some were not. This error should not occur at GA. I confirmed that as of build 97878 this has been fixed. You will likely need to re-pair. If this doesn't work you should check the vmware-dr.xml file on the recovery site for the following line:
<disableNFCServerCertificateChecks>false</disableNFCServerCertificateChecks>
You will need to change the false to true, and then restart the VMware SRM service.

Operation Timeout error when doing test recovery
I have recently seen device timeout errors when doing test recoveries where the storage is RecoverPoint 3. Sometimes this error may occur on older and slower storage, and it may be called operation timeout. In this case the issue is RecoverPoint and there is a forthcoming patch to address it. There is a fix for this that involves a configuration change. On the recovery side, open the vmware-dr.xml file and look for the section below:
<vmacore>
<threadPool>
<initializedCOM>mta</initializedCOM>
</threadPool>
and it continues on . . .
And add the line <TaskMax>20</TaskMax> so the section will look like:
<vmacore>
<threadPool>
<initializedCOM>mta</initializedCOM>
<TaskMax>20</TaskMax>
</threadPool>
and it continues on . . .
Remember to restart the SRM service. Currently, the value of TaskMax is 10. We will increase the value to 20 in current releases, and it will be done in future releases, but you can do it today if you see this operation timeout error.
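The TaskMax change described above can be scripted. The sketch below is in Python and uses the standard library XML parser. It assumes the file's root element is `<Config>` (the guide only shows the `<vmacore>` section), and it works on an in-memory sample, not on the real vmware-dr.xml. Treat it as an illustration and back up the real file before editing it by any method.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for vmware-dr.xml; root name is an assumption.
SAMPLE = """<Config>
  <vmacore>
    <threadPool>
      <initializedCOM>mta</initializedCOM>
    </threadPool>
  </vmacore>
</Config>"""

def set_task_max(root, value=20):
    """Set <TaskMax> under <vmacore><threadPool>, creating it if missing.
    Returns True if the tree was changed."""
    pool = root.find("vmacore/threadPool")
    if pool is None:
        raise ValueError("no <vmacore><threadPool> section found")
    task_max = pool.find("TaskMax")
    if task_max is None:
        task_max = ET.SubElement(pool, "TaskMax")
    changed = task_max.text != str(value)
    task_max.text = str(value)
    return changed

root = ET.fromstring(SAMPLE)
print(set_task_max(root))   # first call adds the element
print(root.find("vmacore/threadPool/TaskMax").text)
```

Note that ElementTree does not preserve comments or the original formatting when writing a file back out, which is one more reason to keep a copy. The SRM service still has to be restarted afterwards.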

Net::SSLeay::load_error_strings
This comes from the Perl module for OpenSSL, which is required by some SRAs (such as NetApp), and means that Perl is not installed on the recovery SRM server.

Array with key "xxxxxxxxx" not found error message
I received this error recently, with a real array name in place of the xxxx, and it was due to me setting the protected site LeftHand Networks SRA to the VIP instead of the management IP. I got through the Array Management configuration with no errors (but no green checkmark), but the test failover failed with the error above. The fix was easy. I used the right IP in the array configuration, saw the green checkmark, and all was good.

Is there a limitation of DR failover LUNs for some iSCSI arrays and some Hosts?
There is a hard limit of 64 iSCSI arrays per host. However, when using SRM there is a limit of approximately 23 recovery iSCSI LUNs on the recovery side only. This is not specific to SRM but applies to any DR setup you might test. For more information about this please visit http://kb.vmware.com/kb/1005867 .

Grayed out options for creating and editing of protection group
This happened several times in earlier builds and I was not able to understand why or what the problem was. But the solution was to log into the VSA LeftHand Networks CMC software. After you expanded and looked at the VSAs all was good. This can also occur when you have no datastore groups. This can also occur when you have a cluster that you are recovering to and some of the hosts in the cluster do not have access to the storage! For example, no iSCSI access to the recovering storage arrays.

Can I have a VM with multiple VMDKs spread across two NetApp SRA's?
No. If you have one VM, with two VMDK files, and one is on the NetApp FC / iSCSI SRA, and one is on NetApp NFS, you will get an error. You cannot spread a VM between arrays. This is true for any SRAs.

Not sure the error name but interesting problem
Shadow VM issue (thanks Jason): Customer cannot configure protection group because SRM throws the following error:
[2009-02-11 16:16:45.804 'SecondarySanProvider' 9896 warning] Failed to prepare shadow vm for recovery: Unexpected MethodFault (vim.fault.FileNotFound) {
[#2] dynamicType = <unset>,
[#2] file = "[DATASTORE-SRM-VDISK1]",
[#2] msg = "A file was not found.
[#2] [DATASTORE-SRM-VDISK1]"
[#2] }
Technically it's not an adapter problem because the adapter successfully returned the replicated LUN. However, the shadow VM needs to be on a temporary datastore at the recovery site, and this datastore name looks a little strange. Further up in the log I see that datastore:

[2009-02-11 16:16:45.460 'SecondarySanProvider' 9896 verbose] Adding datastore 'DATASTORE-SRMVDISK1' with MoId 'datastore-220' and VMFS volume UUID '4992af7e-6a5f6312-7a66-001cc4bd0c2e' spanning 1 LUNs
Hmm, the protection site has a datastore with the same name as the recovery site ... could it be that the customer has somehow exposed the replicated datastore to the recovery site and is trying to use it as the temporary datastore? Further up in the log I see that the datastore UUID is:
[2009-02-11 16:16:45.382 'SecondarySanProvider' 9896 trivia] Added vmfs extent 'host-69.vmhba1:0:2' with key 'host69.4992af7e-6a5f6312-7a66-001cc4bd0c2e.0' LUN vmhba1:0:2
Then in discoverLuns I see that LUN #2 is replicated:
[#2] <ReplicaLunList>
[#2] <ReplicaLun key="\Virtual Disks\SRM.TEST\SRM-VDISK1\ACTIVE">
[#2] <Number initiatorGroupId="\Hosts\SECOURS\LYONSEC1">2</Number>
[#2] <Number initiatorGroupId="\Hosts\SECOURS\LYONSEC2">2</Number>
[#2] <Number initiatorGroupId="\Hosts\SECOURS\LYONSEC3">2</Number>
[#2] </ReplicaLun>
[#2] </ReplicaLunList>
So, it seems the customer thought they needed to specify the replicated datastore as the shadow VM datastore, so perhaps they split replication, made the replicated datastore visible, then resynchronized replication (so the remote LUN is read-only). Now when SRM tries to create the shadow VM there, the creation fails. The customer corrected the issue by selecting a non-replicated datastore at the recovery side for shadow VMs.

Failed to launch SAN integration scripts
If you are using SRDF and get the error below when configuring your array, you have a path issue. The error is "Failed to launch SAN integration scripts to execute discoverArrays command." The issue is a missing path to the SYMCLI folder in the path. The solution is to add the path to the SYMCLI bin folder to the PATH variable under System variables. The default path is C:\Program Files\EMC\SYMCLI\bin, and you will need to restart the SRM server service after the PATH change. This exact error is from an issue with SRDF, but it may occur with other SRAs from other vendors or the same vendor. This issue can also be caused when there is no SRA installed!

Failed to connect to NFC during test failover with IP customization
I 'accidentally' restarted my recovery side SRM server during a test failover with IP customization. I have done this before, without issue. The recovery plan, after SRM starts again, is in paused mode and you just continue it. But this time, when doing IP customization I had an issue. During the rest of the failover the IP customization failed. I found a KB article (http://kb.vmware.com/kb/1009903 ) that I didn't like, and thought would be an issue, so I ignored it! I restarted the recovery side SRM service with no change or improvement. I removed the IP customization and, with no IP customization, everything worked. I attached the IP customization again and the problem came back: failed to connect to NFC. I restarted both SRM services and I had no more issues. I could do test failovers with IP customization with no error.

Null parameter name:key error
If you are adding a protection group and you get an error with a value of null parameter name:key in it, the solution at this time is to restart the SRM service on both the protected and recovery sites.

No visible LUN's during configuration of the array
This will occur if there are NO VMs in the protected datastore. Technically it is looking for the .vmx files. Add a VM to the protected datastore and the LUN will be visible in the array configuration. This has been fixed in SRM 4.0 for no VM or VMFS on the replicated storage. In SRM 4.0, if you have no VMs on your storage you will be able to see it now, but with a warning.

Review Replicate Datastores window of Array Manager is blank
When you are configuring your SRA and the last step is to show you the replicated LUNs, but you see nothing, you have a problem. Using the Rescan button doesn't cause the LUN(s) to be displayed. To work around this issue, use the following steps:
1. In the VI Client,
2. Goto the ESX host configuration area
3. Now select Storage
4. In the upper right area select the Refresh option.
5. Now return to the SRM Array Manager configuration.
6. Select Rescan. Now a message is displayed.
7. Then select Back.
8. Now select Next
9. You should now see your LUN information displayed.
It should be noted that this could occur if there are NO VMs on the LUN. Recently this occurred for a very odd reason. The SPC-2 bit was not set properly in an SRDF environment. See http://www.yellow-bricks.com/2009/07/21/srdf-sra-and-the-spc-2-bit/ for more information.

How do I find the Managed object reference (MoRef) for a VM?
Sometimes you will have to find the MoRef of a VM, and you can use the info below to do that. Sometimes you will not see the name of a VM, but rather its MoRef. This would be in the SRM log, for example, where the VM might be known as vm-xxxx.
By DNS:
1. Visit https://VC_HOST_NAME/mob/?moid=SearchIndex&method=findByDnsName
2. Under dnsName, enter the DNS name of the virtual machine as seen by the vCenter server in the Summary tab
3. Under vmSearch, enter "true"
4. Click "Invoke Method"
By IP:
1. Visit https://VC_HOST_NAME/mob/?moid=SearchIndex&method=findByip
2. Under ip, enter the IP of the virtual machine as seen by the vCenter server in the Summary tab
3. Under vmSearch, enter "true"
4. Click "Invoke Method"
See more info in http://kb.vmware.com/kb/1017126. This very new KB article shows a different method than above. That may be due to the process above being old and not usable any longer. I will test this when I get a chance and correct as necessary. But for now, use the process above if it works!
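If you look up MoRefs often, the two MOB addresses above can be generated and the vm-xxxx identifiers pulled out of a log with a few lines of Python. This is a convenience sketch only. The method names are copied exactly as written in this guide, and the pattern assumes that the xxxx in vm-xxxx is numeric.

```python
import re
from urllib.parse import urlencode

def mob_search_url(vc_host, method):
    """Build the SearchIndex MOB URL; `method` is taken verbatim from this guide
    (findByDnsName or findByip)."""
    query = urlencode({"moid": "SearchIndex", "method": method})
    return f"https://{vc_host}/mob/?{query}"

def extract_vm_morefs(text):
    """Return the distinct vm-<digits> identifiers found in `text`, sorted."""
    return sorted(set(re.findall(r"\bvm-\d+\b", text)))

print(mob_search_url("VC_HOST_NAME", "findByDnsName"))
print(extract_vm_morefs("Loading vm-123 and vm-45, then vm-123 again"))
```

You still have to open the URL in a browser, log in to the MOB, and fill in the dnsName or ip and vmSearch fields by hand as described in the steps.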

Missing testbubble switch on recovery host
When you are checking your test recovery VMs for network connectivity, you find that the VMs on one ESX host can talk to each other, but on other ESX hosts there is no connectivity. Further checking shows that only one recovery ESX host has the testbubble switch, and the other hosts do not have that switch even though the recovery VMs are configured to use it. Therefore the VMs configured to use the test bubble switch that doesn't exist will not be able to communicate. This is a recently discovered bug that will likely be fixed in the very next release (confirmed).

Error occurred - MirrorViewSRACore.dll not found
If you see an "Error occurred" in array configuration, and then in the SRM log you see an "unable to load DLL MirrorViewSRACore.DLL", you have this problem. The cause is not a missing MirrorViewSRACore.DLL file. Instead, you are potentially missing four other DLL files that you need to make sure are in the path. They are: msvcr80.DLL, symapi.DLL, storapi.DLL, and msjava.DLL. I have seen this error when you have 64 bit SE on a 64 bit platform. You need to use 32 bit SE since the SRA is also 32 bit. I have also seen this error myself when Solutions Enabler was not installed.

Execution of scripts is disabled on this system
If you are trying to execute PowerShell scripts during the execution of a recovery plan and it is not working, you may see the above error in the SRM logs. The solution is to enable remote signing. You can get help on this in PowerShell with the command get-help about_signing or you can check out http://technet.microsoft.com/en-us/library/ee176949.aspx#EEAA .

Protection Group configuration times out
If you have done a failover, and are trying to failback, and you get timeouts when you are configuring the protection group, it can be tricky to troubleshoot. The Shadow VMs would be created, but the task would not complete. This problem has been reported (thanks John) when, during the cleanup after the failover, the LUNs with the name snap-xxx were NOT renamed. Once they were renamed the timeout errors went away and the configuration of the protection group went through without issue.

Install hangs at 90%, and install log shows VIEINSTUTIL: Failed to open service control manager
This error can occur when you are installing SRM with a partial admin account. In point of fact you are missing the privilege to add a service. Redo the install with an admin account.

You do not hold system privilege "System.View" on ServiceInstance "DrServiceInstance"
This error is referred to in http://kb.vmware.com/kb/1016875 and is not frequently seen. However, this error can be avoided by using an AD account that is Domain Admin, and admin in both VC and SRM (although this comes after the install adds the SRM objects). This account will be used in the SRM install for the account that is kept by the install. Make sure that account is used by the VC service as well, the account should be the ODBC account as well, and you should log into the VC server with it as well. It will at that point turn into a service account. After that point, you will not use that account again. Other VC / SRM work will be done with your own account. The error above will generally occur when you are not following any of the suggestions above.

You do not see a newly 'added' LUN when creating a PG?
This happened to a friend of mine lately, and now it was happening to me. When you start replicating a new LUN, you should wait until it is finished replicating before adding it to SRM. Before you add it to a new PG, you will generally need to use the Array Manager and next through it until the last screen where you should see your new LUN. I think if you do that, you will not have a problem when creating your new PG. This may not always be necessary, in that there may be some sort of a scheduled activity that allows SRM to see a newly replicated LUN, but I am not sure that is true, and I always use that method and have no issues as a result. So, suggested best practice is to let your replication finish, and then use the Array Manager to make sure you can see that new LUN. That is not required but a suggestion.

Error: The operation is not supported on this object
This was very confusing for me and hard to troubleshoot. It does have a KB now but for the longest time it didn't. The KB is at http://kb.vmware.com/kb/1020796. When the error occurs it is a pop up error - "unable to create placeholder virtual machine at recovery site: Recovery virtual machine could not be created: the operation is not supported on the object". It turns out that this error is due to a .VMX parameter error. I had deployed several virtual machines (from template) that were Win2K8 R2 and they could not be protected. I have not seen this error with XP or Win2K3 but it is in theory possible with them. The VMX file setting that was causing the issue was svga.autodetect="true" together with svga.vramsize=167772161. Neither of which was default. The workaround was to have svga.autodetect say false, or the vramsize parameter say 4194304. For the virtual machines already deployed I could edit the .VMX file and remove / add them back to inventory and they could be protected fine. The template was hard to fix. I actually had to turn it into a VM, power it on, change the video settings (to set it exactly to 4 MB) and turn it back to a template. I think the 4 MB could also be 8 MB but I have not tested that. One issue with this suggested solution is you will see in the VM events a notice that you can have larger video memory. Ignore it. This is a problem that can occur with any of our software that does re-configuration like View or Converter.

Failed to update Perl installation directories
This was reported in our KB specifically with Win2K8 R2, but I have received reports from the field that indicate it may occur with Win2K3 as well. See our KB for help at http://kb.vmware.com/kb/1028918 but also you may need info from MS at http://support.microsoft.com/kb/210638 . BTW, when I had this issue, I could not find the fsutil tool, but I did learn that you can make the change in registry at HKLM\System\CurrentControlSet\Control\FileSystem\NtfsDisable8dot3NameCreation and change the 0 to a 1. Then do the install again. I think between both of these you should be fine. If not, call support, and let me know how it all goes.

Operation failed...Details: VI API Version 4.1 is not supported
This is covered in Primus Article 247587. It is related to MirrorView. It has been seen in the wild once or twice. See below for the solution:

ID: emc247587
Domain: EMC1
Solution Class: 3.X Compatibility
Fact Product: CLARiiON CX4
Fact Product: VMware VCentre Server 4.1
Fact Product: VMware ESX Server 4.x
Fact Application SW: VMware SRM 4.1
Fact Application SW: MirrorView Insight for VMware 1.4.0.15

Fact Application SW: MirrorView Insight for VMware 1.4.0.16
Symptom Error when executing MirrorView Insight for VMware
Symptom Operation failed...Details: VI API Version 4.1 is not supported
Cause At the time of MirrorView Insight for VMware (MVIV) release in the year 2009, vCenter Server 4.1 was not yet available and the official support was only for VMware Virtual Center Server v2.5u2 and vCenter Server 4.0. MVIV was qualified with vCenter Server 4.1 subsequent to the release of vCenter Server 4.1. However, to enable MVIV to recognize vCenter Server 4.1 as the supported version, a registry key must be added.
Fix Follow these steps to create or modify the following registry entry:

For 64-bit machines
1. Start the registry editor.
2. Navigate to: My Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\emc\MirrorViewInsightForVMWare\Preferences
3. Modify the "SupportedVIAPIVersion" data so it reads as follows: 2.5u2,4.0,4.1
If this entry is not there, create a new string value of "SupportedVIAPIVersion" with the data of 2.5u2,4.0,4.1. The entry should look like this:
SupportedVIAPIVersion REG_SZ 2.5u2,4.0,4.1

For 32-bit machines
1. Start the registry editor.
2. Navigate to: My Computer\HKEY_LOCAL_MACHINE\SOFTWARE\emc\MirrorViewInsightForVMWare\Preferences
3. Modify the "SupportedVIAPIVersion" data so it reads as follows: 2.5u2,4.0,4.1
If this entry is not there, create a new string value of "SupportedVIAPIVersion" with the data of 2.5u2,4.0,4.1. The entry should look like this:
SupportedVIAPIVersion REG_SZ 2.5u2,4.0,4.1
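If you would rather import the MVIV registry change than click through the registry editor, the sketch below builds the text of a .reg file for either the 64-bit or the 32-bit key path given in the Primus article. It is only an illustration: the function name build_reg_file is made up for this example, and the default data string "2.5u2,4.0,4.1" is my reading of the article's value (versions separated by commas), so confirm it against emc247587 before importing anything.

```python
# Key paths as given in the Primus article (64-bit hosts use Wow6432Node).
KEY_64 = r"HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\emc\MirrorViewInsightForVMWare\Preferences"
KEY_32 = r"HKEY_LOCAL_MACHINE\SOFTWARE\emc\MirrorViewInsightForVMWare\Preferences"


def build_reg_file(is_64bit, versions="2.5u2,4.0,4.1"):
    """Return .reg file text that sets SupportedVIAPIVersion as a string value.

    versions is an assumption about the exact data string; verify it first.
    """
    key = KEY_64 if is_64bit else KEY_32
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key}]\r\n"
        f'"SupportedVIAPIVersion"="{versions}"\r\n'
    )


if __name__ == "__main__":
    # Write the file next to the script; import it with regedit afterwards.
    with open("mviv_supported_api.reg", "w", newline="") as handle:
        handle.write(build_reg_file(is_64bit=True))
```

Writing a file instead of editing the registry directly keeps the change reviewable, and the same file can be reused on both the protected and recovery side servers.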

SRM LUN discovery, test, failover fail with file write errors
Brock reported this to me so thanks very much for that. It is related to IBM SVC but it is an interesting one. The SRM log will show "Error writing to C:\users\srmadmin\appdata\local\temp\vmwaresrmadmin\dr-sanprovider6984-0" or something similar. It turns out that this is caused by a Java garbage collection issue! For the solution and more details see http://kb.vmware.com/kb/1033871.

SRM SRA Errata
This section has very specific notes on SRAs that I work with, or that others share with me. Don't forget that specific errors with any SRA will be reported above in the troubleshooting section but background information will be below in the appropriate section. All storage vendor SRAs require a restart of the SRM service after the install.

LeftHand Networks
The LHN adapter requires the account / password of the CMC management app. The original certified version is 7.0.01.6066. But now it is currently 8.0.00.1682. Current version of LHN is 9.0 and the SRA is 9.0.0.3561 (11/11/10). There is a new version of the VSA and of the SRA and they both work well with Update 1 of SRM. There are a lot of new features in the SAN/iQ software, but there are not many changes required for this document in terms of install and configure of the array.

The protected side array configuration should reference the SRA installed on the protected side! Both IP fields should contain the same IP information, which should be the VSA on the protected side. Update: the two IP fields for the LHN SRA do not require the same IP information, nor do both need to be filled. Only the first one needs to be used. You can put more than one IP address in the fields by separating them with a comma. If you have five managers it would be a good idea to put at least two of them into the first or first and second IP fields. The SRA must talk to a manager, and NOT to a virtual IP.

An old report of Lessons Learned is still interesting at http://frankdenneman.nl/2009/10/lefthand-sanlessons-learned/ . Good info on using this excellent gear.

Miscellaneous Information
When you install your VSAs make sure to specifically step by step follow the LHN instructions. Then on your protected site use the wizards to configure the VSA to be able to present storage to the protected site ESX server. Make sure it is seen in ESX before continuing. Once this is done you can work on the recovery site VSA but your configuration will be different. Create a Remote Scheduled Copy from the protected site to the recovery site. As part of this create a remote volume on the recovery side. If you stop now you will apparently have a working shared storage that is replicating. But you will get the error mentioned in the Appendix about unable to access the VM configuration. You will need to use the Tasks menu in the CMC to create a Volume List and then an Authentication Group. Once this is done your Recovery Plan should work fine.

LeftHand snap left visible after test recovery
This is appropriate with the current versions of the LeftHand VSA and SRM. It will disappear according to the retention guidelines in the LeftHand CMC.

The LHN VSA uses remote scheduled copies to do the replication and this means when the test fail over is progressing the remote copy process is not stopped. One of the remote copies is mounted for the recovery site to work with but that doesn't stop the replication / copy process.

I recently upgraded from 8.0 to 8.1 and had a few interesting things happen. I forgot to upgrade my SRA. So after I did the VSA upgrade my test failover failed. It only took 5 seconds to fail. The error message in the history report was almost misleading. It said that it failed to authenticate with the array management system during a test failover. It looked like a credentials issue. If it had only said it failed to authenticate it would have been true, but with the extra stuff it was a different issue. Upgrading the SRA cleared this issue.

NetApp
NetApp has mentioned that they designed the SRA to support simultaneous recovery plan operations but did not test it extensively.

The minimum software version on the NetApp devices that the NetApp SRA requires is 7.2.2.

This SRA requires the account / password to the simulator or NetApp device but it only has one IP address field compared to the Left Hand which requires 2. Of significant note, the simulator can be one VM but be two instances so that it can do both the protected and fail over storage but with much less resource usage.

The NetApp SRA uses SSL to talk to the NetApp controller and there are no other ports required. However I have another report that it uses unsecure HTTP over port 80. This can be configured as something else. I have confirmed it works by default over HTTP but that it can in fact be changed to use HTTPS. By default OnTap doesn't have SSH enabled; it should be enabled for SRM, although that is not required. You can troubleshoot communication by using a browser and connecting to the filer as http://aa.bb.cc.dd/na_admin and you should get the FilerView page.

NetApp uses Flexclone to provide storage for the test failover so that means the replication of data is not impacted during a failover.

When working with NetApp it is worth having a volume 'equal' to a LUN. It is sometimes configured as a very large volume that has multiple LUNs on it. Don't do this. It is best, if possible, for flexibility and for working with SRM to have one volume per LUN.

If you have recovery site igroups configured in your protected site igroups you will have errors.

Make sure you put your NFS IP address into the NFS IP field even if you think it is not necessary because you are using the same IPs. When using SRM and NetApp, and when using NFS and OnTap version 8, you may have a configuration issue stopping you from successful configuration, but the workaround is simple. More info on this can be found at the link below.
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=464045

When reading through SRM logs, you will sometimes see lines that start with <StoragePort id="..."> and when what you see after the = starts with 50:0A, it means NetApp. This is useful to know if you think you are using something else such as IBM or EMC.
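The 50:0A tip for StoragePort lines can be turned into a small log filter. The Python sketch below pulls every StoragePort id out of a chunk of SRM log text and keeps the ones that begin with 50:0A. The exact attribute layout of the log line is an assumption based on the description in this guide, and netapp_storage_ports is a name I made up for the example, so adjust the pattern if your log lines look different.

```python
import re

# Assumed line shape, per this guide: <StoragePort id="50:0a:..."> ...
STORAGE_PORT_RE = re.compile(r'<StoragePort\s+id="([^"]+)"')


def netapp_storage_ports(log_text):
    """Return StoragePort ids from log_text whose value starts with 50:0A."""
    ids = STORAGE_PORT_RE.findall(log_text)
    return [port for port in ids if port.upper().startswith("50:0A")]


if __name__ == "__main__":
    sample = (
        '<StoragePort id="50:0a:09:81:00:00:00:01"></StoragePort>\n'
        '<StoragePort id="10:00:00:00:c9:00:00:02"></StoragePort>\n'
    )
    for port in netapp_storage_ports(sample):
        print("NetApp port:", port)
```

The sample ids in the usage block are invented for illustration and do not come from a real array.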

There is a VMware SRM in a NetApp Environment document that is quite useful - find it at http://media.netapp.com/documents/tr-3671.pdf .

There are some NetApp configurations that cause their SRA problems such as having multiple filers at both protected and recovery sites where each site's filers are clustered together, but we should be able to identify that setup from the full SRM log. NetApp have a new SRA coming I believe that supports this - is out now - 3/22/09 - see http://blogs.netapp.com/virtualization/2009/04/some-news-on-the-netapp-srmfront.html#more for more info on this updated SRA.

As of 4/3/09 the updated SRA (or as NetApp says DRA) is available that solves a number of issues and you should be sure to use it with SRM 1.0 Update 1. The new version number of the DRA is v1.0.1. Both it and the IBM N-Series 1.0.1 adapter are available at the VMware SRM download page.

If you get SRM/VC events about "Virtual machines have one or more devices which don't have file backings on the replicated site" this is due to a CD being attached to a VM.

SRM Test recovery error: failed to recover datastore - background information
SRM will set the LVM settings it needs, so you should not need to mess with those. You should see that disablesnapshot and enableresignature are BOTH set to 1.

When you execute a recovery plan in "Test" mode you will not see the snapmirror destination lun status change at all. Its state should remain "online,snapmirrored,read-only". The reason for this is that during a "Test" the netapp SRA will dynamically provision a flexclone of the destination volume and present that device to the ESX hosts being used at the recovery site. Note the customer must have the netapp flexclone license installed to be able to create these devices. During the running of a test, if you open Netapp FilerView and navigate to the "Manage Volumes" screen and keep hitting the refresh button, once you see our "Preparing Storage..." task hit around 25% complete you should see a new volume get created. On the netapp side its name will begin with "testfailoverclone" followed by an identifier (xxxxxxxxxx); on the VMware ESX side it will appear as a snap-xxxxxx<originalvmfsname> datastore.

If you run your recovery plans in "Run" or full-failover mode then you will see the netapp SRA break off the snapmirror relationship and present the actual destination lun to ESX rather than use a flexclone.

If the customer starts to manually attempt to alter the status of the devices SRM+SRA are expecting to control then they will see errors during the recovery plan execution, since the customer has changed the state of the device and SRM+SRA was expecting to do that itself, so it will report the fact something unexpected has happened.

The messages you have copied below are coming from the fact that ESX has rescanned its devices and has just processed the "read-only" snapmirror destination lun and worked out that it's what we call a snapshot lun, but in storage terms is just a replica. We have not done anything else with the device at this stage.

Logs
[2009-02-18 09:41:04.448 'SecondarySanProvider' 4176 trivia] 'Prepare 1 groups for test' took 34.845 seconds
[2009-02-18 09:41:04.448 'SanConfigManager' 4176 trivia] Scheduling lun group computation in 0 seconds
[2009-02-18 09:41:04.448 'SecondarySanProvider' 4176 trivia] Firing CallOnDestruction callback
[2009-02-18 09:41:04.448 'RSStorageOperation-8814-Task' 4176 verbose] Result set to

(dr.secondary.ReplicationManager.SingleVmFailure) [
[#14] (dr.secondary.ReplicationManager.SingleVmFailure) {
[#14] dynamicType = <unset>,
[#14] vm = 'dr.secondary.ShadowVm:shadow-vm-8688',
[#14] fault = (dr.san.fault.RecoveredDatastoreNotFound) {
[#14] dynamicType = <unset>,
[#14] datastore = (dr.vimext.SanProviderDatastoreLocator) {
[#14] dynamicType = <unset>,
[#14] primaryUrl = "sanfs://vmfs_uuid:4994e685-01ee1320-a88a001ec9f48f03/",
[#14] },
[#14] reason = (vmodl.MethodFault) null,
[#14] msg = ""
[#14] },
[#14] }
[#14] ]
---------------
Now the problem seems to be that the replicated LUN is seen as a snapshot by the ESX host, and thus the KB article to help with that at http://kb.vmware.com/kb/1001783 .
------------
vmhba2:0:5 vml.020005000060a98000486e2f39535a4e674c59674e4c554e202020
Feb 18 00:47:13 vmkernel: 27:17:19:37.383 cpu4:1221)LinBlock: 1994: VFS: Disk change detected on device 3:0
Feb 18 00:47:13 vmkernel: 27:17:19:37.408 cpu7:1339)LVM: 5573: Device vml.020005000060a98000486e2f39535a4e674c59674e4c554e202020:1 detected to be a snapshot:
Feb 18 00:47:13 vmkernel: 27:17:19:37.408 cpu7:1339)LVM: 5580: queried disk ID: <type 2, len 22, lun 5, devType 0, scsi 5, h(id) 11890432529146075181>
Feb 18 00:47:13 vmkernel: 27:17:19:37.408 cpu7:1339)LVM: 5587: on-disk disk ID: <type 2, len 22, lun 20, devType 0, scsi 5, h(id) 3407130522988133436>

MetroCluster background information (thanks Lee!)
The short version is that SRM is a DR solution, and MetroCluster is a stretched HA solution. MetroCluster is basically a dual controller NetApp system, stretched across two sites, with the disks from each site synchronously mirrored over fibre to the other site. The idea being that you can lose one site, and the surviving controller takes over - just like a normal controller failover process. If you're going to stretch a storage system across two sites, then the chances are you'll have a decent network between the sites, and you'd want to stretch your ESX HA Clusters across the two sites as well (so you can VMotion from one site to the other). Then losing one site results in a MetroCluster failover, followed shortly by a HA restart of the VMs on the surviving ESX servers in the surviving site. BTW, our HA was not designed for distance. In terms of high-level comparison we could do this:

                               SRM    MetroCluster
Distance Limited               No     Yes
vCenter Integrated             Yes    No
DR Workflow Creation           Yes    No
Transparent Failover           No     Yes
Non-disruptive DR testing      Yes    No
Site Failure VM Protection     Yes    Yes
NFS Support                    Yes    Yes

Campus cluster / stretched HA environments (i.e. MetroCluster) work well if you have the right kind of infrastructure but they are not really DR solutions, as typically the two sites are very close together and most customers I work with do not consider a DR site true DR if it is located within a certain distance of the primary. We had a couple of customers a few years ago whose "campus" solution was wiped out entirely when the UK oil field disaster struck and took out both datacentres at the same time (they were 0.5 miles apart). Extreme example maybe but illustrates the difference. If you can live with the limitations of a campus cluster solution and they fit your needs then they can work well. As we say in the UK, take what the whitepapers say with a pinch of salt until you've tried it yourself.

With any cross site storage architecture I have implemented, there will be **some** kind of pause whilst the system sorts things out. The amount of time this takes depends entirely on what failed. Could be 2 seconds, or it could be 2 minutes or more. Losing a controller in either site for most vendors should be no big deal and the failover operation should take care of the storage side. If you lose the entire site, then manual intervention will (probably) be required to failover. It can sometimes be possible to script round this using staged heartbeats; again it depends on the failure. Then you need to wait for HA to kick in. Again, that still adds time to the failover.

With an SRM recovery plan the storage integration "tells" the storage to come online rather than having to wait for a failover heartbeat or similar to be detected by the storage itself. Indeed if you wanted to automate the initiation of an SRM recovery plan you can do this, though if it were my pair of sites I would want this process at some point to be kick started manually by someone once the true nature of the event was understood. Going back to campus clustering, although array/disk shelf failover can be automated this does not always happen automatically either in my experience. Again sometimes it may require a manual intervention (click a button, or type a command to failover) and you need to have the process defined clearly for that event. So when talking about failover initiation I would not say SRM vs stretched HA solutions are really any different time wise.

If we look at failback, with the campus implementation the process to failback is not as simple as bringing up Site1 and then just vmotioning the VMs back from Site2. If you lost Site1 completely and have had to failover to the disk shelves at Site2 then the VMs will now (once HA has restarted them) all be running from the disk shelves at Site2. If you simply VMotion them back to Site1, when it's ready, then the storage will still be accessed via Site2's controller / disks until you tell the storage arrays to go back to their default configuration, which will require restarting the VMs again and will incur downtime in the same way an SRM failback would. I cannot imagine you would want a situation where Site1 came back online and you vmotioned 50% of your workload back to Site1 but left 100% of your disk workload

running at Site2. I think in all cases customers I have put this in with have wanted the storage to "go back to how it was" ready for the next event or failure.

The biggest difference in terms of customer feedback I receive is that the ability to perform automated, repeatable non-disruptive DR testing is one of the key factors moving customers towards SRM.

The only other items you need to be thinking about with campus cluster are below. I am not adding these to say "SRM is better"; these are simply things I have had to work through when implementing campus cluster, and some of these nuances don't always make it into the whitepapers/datasheets shall we say.

• VC Inventory / Layout, as everything is stretched you need to be very consistent and accurate with naming conventions across all inventory objects the VMs will use.
• DRS/HA settings, if you run the two sites as one big HA/DRS cluster ensure you test out the various failure scenarios. For example if DRS (or manual VMotion) moves a bunch of VMs from Site1 to Site2 but no failure has occurred at that time, you now end up with the VMs' CPU/Memory/Network contexts running on hosts at Site2 but accessing their VMDKs on Site1. This will work but is not always desirable from a latency point of view (might be a non-issue if bandwidth is sufficient). However what happens next if you now suffer a disk outage at Site1? At this point the VMs will not crash immediately at Site2 and it will take HA some time to realise these VMs have an issue. If you disconnect storage from a VM the VM will cling on to life (assuming IO pattern is normal) for quite some time before a bluescreen is seen. Try it and see. With campus clustering ensure that you know which VMs are important and define the correct settings per VM for recovery. Unless you have N+1 capacity spare at each site you will need to put in place HA/DRS settings that bring online the most important VMs first, so you don't end up in a failure situation with all your dev/test VMs online and half the production VMs "down" because you did not set correct priorities in HA. In SRM this is something the recovery plan handles and you can control.
• Storage Presentation, if your vendor wants the zone across the sites to effectively be "open" to all ESX hosts then ensure you understand the implications of the ESX LVM settings with regards to snapshot / disk resignature.
• Split Brain, be careful with the design. You potentially will have ESX hosts that could at some point access both a source and target lun at the same time if someone or something altered the LVM defaults.
• Zoning, if the vsan / zones are truly open or all hosts are in the same zone then certain fabric events can be a potential pain. Any rogue events such as RSCN will disrupt both sites at the same time if all ESX hosts are on the same open fabric, so be careful here. Not something that is too common but I have seen it hurt a few customers. It usually comes down to a bad HBA or cables but can be a real pain to track down.
• VC / ESX limits, as you build the design out for campus cluster ensure the design won't

have you quickly reaching the limits of what is supported in terms of things like max number of VMs/VC, max number of luns/ESX host, max number of paths/lun/ESX host etc.

Both use-cases are valid but ensure you work out what you actually need. As much as I like SRM solutions I also like the campus cluster / single pane of glass approach as well where it works/fits.

And some more on this subject from Lee:
Sometimes I find metrocluster is wrongly sold to customers who really needed DR... by the way, metrocluster's competition is NOT SRM... it's EMC VPLEX... VPLEX is EMC's stretched storage / HA solution... but as with NetApp, EMC also integrate ALL of their platforms with SRM as they know that is what provides DR. Some (not all) netapp account teams will try and only sell metrocluster because that solution is more $$$$$ for them, and sometimes it's because they just don't understand that netapp integrate with SRM very well!!!

Let's compare some component basics. With SRM the architecture is designed so that the recovery process does not depend on any component from the protected site to work. This is not the case with metrocluster. Simple example: SRM uses two vCenters, meaning if the protected site VC dies it does not affect your ability to recover. With metrocluster you're using a single vCenter instance across the two datacenter rooms, so although HA can recover the VMs if vCenter is lost, in your design you're now using a single vCenter namespace across two sites, so this needs to be taken into account when you're adding objects into your vCenter inventory (naming consistency, scale, limits etc). Also there are failover scenarios to think about with metrocluster; it's not ALL automated by VMware HA by any means... those are the situations to be careful of. SRM's strength as a DR solution (combined with netapp snapmirror) is it allows customers to build repeatable recovery workflows that bring their infrastructure back online in a specific order. VMware HA does NOT do that.

Other factors to consider that might not be immediately obvious (don't get me wrong here, I'm not bashing metrocluster with netapp/vmware, it's a good solution, you just need to be sure it's what the customer needs... otherwise SRM/netapp/vmware is a good solution also :)

Other caveats of a metrocluster solution customers need to understand and be happy with are below... if your sites are close together and you truly want stretch HA then you will most likely be fine with these points, BUT if you really wanted DR then these points will usually come as a surprise to customers and annoy them!

• In a metrocluster deployment granularity in the filer is at the aggregate level (highest level!!!) NOT at the volume level... so less flexibility when carving up the storage. If you had say 20 disk shelves and broke those up into 4 aggregates and within each aggregate had say 40 volumes each containing single VMware NFS exports, then with SRM you can simply snapmirror the volumes (exports) you need to replicate, so lots of flexibility and granularity, no need to replicate things you don't need at the DR site. Why waste bandwidth???? With SRM you can simply set up snapmirror for the volumes / luns you want to protect; this is NOT the case with metrocluster.
• ALL disk shelves must be mirrored. With a Metrocluster solution ALL disk shelves MUST be mirrored even if the VMs within them are not needed for DR or are not business critical. In a metrocluster solution you're using SyncMirror NOT Snapmirror.
• Metrocluster has no offline/non-disruptive testing capability as we do with SRM/Netapp Flexclones, so how do you prove you can failover successfully?
• Metrocluster has distance limits (2Gb link 500m stretch or 4Gb link 270m stretch... for greater distance you need fabric/switch MC for up to 100KM max distance span).
• Metrocluster solution *must* use recommended brocade switches to be supported.
• All disks log in to switches as "hosts". Max limit is 672 logins; you cannot go beyond this limit (672 = 6 switches) and would need 6 x 2 x 2 for redundancy at both sites... lot of switches :)
• Metrocluster uses a single vCenter server for both sites. vCenter is the single point of failure for some scenarios here.

If you suspect the customer is being mis-sold metrocluster, ask customers simple questions to try and work out what they want:
- Control of the failover?
- Orchestration, using recovery plans to build recovery workflows that match what their business wants to happen
- Ability to perform non-disruptive tests
- Ability to pre-build recovery workflows that can be tested, validated and invoked during an outage knowing the recovery will take place following the pre-programmed workflow
- Ability to run the recovery in a pre-defined sequence that matches their own business recovery processes and SLAs
- Ability to run pre/post power on scripts
- Ability to run callout scripts that talk to other pieces of their infrastructure
- Ability to customize network at remote sites as part of the recovery
- No single point of failure, i.e. both sites run with their own management layers (vCenter), meaning if one of the sites is lost nothing from that site is needed to recover... in a stretched HA+metrocluster environment that is NOT the case.

If the answer to the above is YES then they are looking for a DR solution... hence they need NetApp Snapmirror combined with NetApp's SRA for SRM.

If the customer has two sites / server rooms that are VERY close together and they do just want to run both rooms as "one" then that might be a good fit for metrocluster. For example if the two rooms were <20KM apart, for some industries that might mean those two sites couldn't be classed by the regulator/auditor as DR anyway because they are too close together, i.e. could both be wiped out by the same disaster such as flood or power blackout to a city... so if that is the case, metrocluster works for them there, as they might be breaking the law by claiming to have DR protection if their datacenters are too close.

Hope this information helps you work out what solution fits for your customer.

HDS
One important thing to remember with HDS is that immediately after a real failover it will reverse direction and start replication in a new direction. This can be changed.

SRM will only look for replicated datastores on devices that are presented to ports on the array that are returned by the discoverArrays command. If discoverArrays does not return a WWPN, then SRM assumes that devices on that port are not for use by SRM, even if LUNs on that port are made visible to the ESX hosts.

The HDS adapter is not returning the port. The reason seems to be some logic in the adapter which determines the ports on the array by looping through the list of source ("L" or local) replicated and shadow image devices on the target array. However, the shadow image snapshot is actually a "remote" volume ("R") because the "L" local volume is actually the replicated target, so the adapter is not returning the port of this volume, which presents the snapshot LUN. Because it is not returning the port, SRM ignores it (by design) and the test fails. This problem does not occur when all volumes are on the same port. VMware Engineering is working on this with HDS.

Currently the HDS SRA doesn't set the path to the perl binary, but it should do that properly in the next release (it does). The SRA also needs the HDS CCI component installed as well.

Help can be found in these two HDS documents:
http://www.hds.com/assets/pdf/hitachi-storage-replication-adapter-softwarevmware-vcenter-site-recovery-manager-deployment-guide.pdf
http://www.hds.com/assets/pdf/implementing-vmware-site-recovery-managerwith-hitachi-enterprise-storage-systems.pdf

EMC
As always, make sure you have checked the SRM HCL but also confirm that your SRA pre-requisites like SE, FLARE/DART/RP versions are correct.
A video that talks about all four EMC replication technologies for SRM: http://www.emc.com/collateral/demos/microsites/mediaplayer-video/video-walsworthtothepoint-vmsrm.htm
VSI version 4 is out - http://itzikr.wordpress.com/2010/12/20/emc-virtual-storage-integratorvsi4-is-out/

Celerra
This supports simultaneous recovery plan operation. Make sure the plans do not step on each other in terms of LUNs / VMs or their components.
A new SRA (4.0.22) is out and I think it is important - http://itzikr.wordpress.com/2010/12/20/new-celerrasra-and-a-celerra-failback-plug-in-for-vmware-srm/
The Celerra 2.0 beta SRA has a log location of [SRM_InstallDir]\scripts\SAN\celerra\log\sra.log
Celerra and VMware Techbook - http://www.emc.com/collateral/hardware/technical-documentation/h5536-vmware-esx-srvr-using-celerra-stor-sys-wp.pdf
Celerra SRA release notes - http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-007023.pdf
Celerra Failback plug-in release notes - http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Software_Download/SRMFailbackWizard_read_me_first.pdf
Celerra and NFS with SRM - http://communities.vmware.com/docs/DOC-11541
Plug-in - http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Software_Download/EMC_Celerra_Failback_Plug-in_for_VMware_vCenter_SRM.zip
Celerra VSA – great for learning and testing - http://nickapedia.com/2010/09/12/ubertastic-celerra-ubervsa-v3-unisphere/

Changing the Celerra passwords
Use the following procedure to reset the root password:
1. Access to the console of the Control Station is required, so either connect the console physically or use a serial console.
2. Boot the Control Station, or reboot or reset the power switch if shutdown commands cannot be issued.
3. When the BIOS checks complete and GRUB is loading, press any key (arrow key is best) to stop it from auto booting in 10 seconds.
4. Press e to edit the line it is highlighting ("Linux" would be the normal word).
5. Select/highlight the line starting with the word "kernel" and press e to edit.
6. At the end of the word, append the word "single" with a space in the front and press ENTER.
7. Now the highlight should show the word "single" at the end.
8. Press b to boot from this modified line.
9. Now the Control Station will boot to single user mode, with a # prompt appearing, which means it logged in as root already.
10. Issue the passwd command and enter the new password (with confirmation of same) to be set, which will be the new password.
11. "init 6" will reboot and it should boot automatically as a normal boot. Use the new password set at step 10 for root.
With this root login, reset the nasadmin password, if required. You would do this after logging in with the root account.

CLARiiON
Currently the CLARiiON has a limit of 32 characters for CG names. The Solutions Enabler API is trimming off the last two characters from the name, which causes SRM issues. Until the next release of the SE software it is best to avoid this issue by using only 30 characters in the CG name.
When using SnapView, remember that SV must snap to THICK LUNs. I have been told that this is a requirement that is not documented anywhere.
On the CX the snap name should have the following prefix: VMWARE_SRM_SNAP
While the Solutions Enabler (SE) can be installed on the SRM server, physical host, or VM, it sometimes will make things easier to have it on the SRM server. This is for both CX and DMX equipment.
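The two CLARiiON naming rules above (keep consistency group names to 30 characters, and start snapshot names with VMWARE_SRM_SNAP) are easy to check before a test failover. The following is only a minimal sketch: the function names are hypothetical, and it checks nothing beyond those two rules as stated in this guide.

```python
# Illustrative pre-flight checks for the CLARiiON naming rules in this guide.
# The constants come from the text; the function names are hypothetical.

CLARIION_CG_NAME_LIMIT = 32   # array limit quoted in the guide
SE_TRIMMED_CHARS = 2          # characters the Solutions Enabler API is said to trim
SAFE_CG_NAME_LENGTH = CLARIION_CG_NAME_LIMIT - SE_TRIMMED_CHARS  # 30
SNAP_PREFIX = "VMWARE_SRM_SNAP"


def cg_name_is_safe(name: str) -> bool:
    """Return True if the consistency group name is short enough to survive trimming."""
    return 0 < len(name) <= SAFE_CG_NAME_LENGTH


def snap_name_is_valid(name: str) -> bool:
    """Return True if the snapshot name starts with the prefix the guide calls for."""
    return name.startswith(SNAP_PREFIX)


if __name__ == "__main__":
    for cg in ("SRM_CG_Exchange", "X" * 31):
        print(cg, "->", "ok" if cg_name_is_safe(cg) else "too long")
    for snap in ("VMWARE_SRM_SNAP_LUN12", "LUN12_snap"):
        print(snap, "->", "ok" if snap_name_is_valid(snap) else "missing prefix")
```

Run against the names a storage administrator plans to use, this flags any that would hit the trimming problem or be ignored by the SRA.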

EMC CLARiiON - http://communities.vmware.com/docs/DOC-11544
http://www.emc.com/collateral/software/solution-overview/h2197-vmware-esx-clariion-stor-syst-ldv.pdf
Something that may be useful – SRM error is failed to create LUN snapshots - http://blog.virtualtacit.com/home/2009/7/30/clariion-cg-snap-session-limit-smack-down-during-srm-testfa.html

DMX
When working with DMX, you cannot use TimeFinder snapshots. This is not a limitation of or by VMware but rather an EMC limitation.
The DMX will need to have its LUNs in a device group.
Remember for DMX equipment the SE will need to have a gatekeeper LUN, and if the SE host is a VM, the gatekeeper LUN will need to be a pRDM.
While the Solutions Enabler (SE) can be installed on the SRM server, physical host, or VM, it sometimes will make things easier to have it on the SRM server. This is for both CX and DMX equipment.
http://www.emc.com/collateral/hardware/solution-overview/h2529-vmware-esx-svr-w-symmetrix-wpldv.pdf
New version of SRA – 2.2.0.3 - http://itzikr.wordpress.com/2010/12/16/new-emc-srdf-sra-for-srm-getthe-scoop-inside-3/ - this is a big release and an important one!
SPC-2 - http://www.yellow-bricks.com/2009/12/08/spc-2-set-or-not/

SRDF
A new tech book on SRDF and SRM is now available. It covers off version 2.2 of the SRA and install / configuration, plus how to use the new features that include:
• Test failover using TimeFinder/Snap off of a SRDF/A R2 (new with 5875)
• Test failover without using TimeFinder technologies and instead directly running the test failover off of the SRDF R2
• How to use the new VSI SRA utilities
• And information in the Appendix on SE licensing, and using BCVs.
There is both hard copy and soft copy available.
Powerlink (soft copy): http://powerlink.emc.com/km/live1/en_US/Offering_Basics/White_Paper/h7061-srdf-adapter-vcentersrm.pdf
Vervante (hard copy): http://store.vervante.com/c/v/V4081409244.html?base_cat=EMC%3a%20EMC%20TechBooks&pard=emc

Important Note – the EMC VSI Plug-in version 4 does NOT write SRDF configuration out to the EmcSrdfSraOptions.xml but it says it did. This happens when the vSphere client has NOT been started with Administrator rights (the right click and start as admin option). This may, or may not, be mentioned in the release notes. It will be mentioned in the future if it is not, and EMC is thinking of other ways to manage this.
http://itzikr.wordpress.com/2011/01/10/srm-automatic-failback-using-emc-symmetrix-vmax/

New version of SRA – 2.2.0.3 - http://itzikr.wordpress.com/2010/12/16/new-emc-srdf-sra-for-srm-getthe-scoop-inside-3/ - this is a big release and an important one!

To make your work with SRDF and SRM successful you will need two documents. The first is the SRA release notes, which will be found in PartnerLink. The second is a new SRDF and SRM techbook, which can be found at https://powerlink.emc.com/nsepn/webapps/btg548664833igtcuup4826/km/live1/en_US/Offering_Basics/White_Paper/h7061-srdf-adapter-vcenter-srm.pdf
Or at http://www.emc.com/collateral/software/technical-documentation/h7061-srdf-adapter-vcenter-srm.pdf
Or http://www.emc.com/collateral/software/technical-documentation/h7061-srdf-adapter-vcentersrm.pdf
I have had troubles with both links at different times, and both links have worked for me at times. If you cannot get the document I can send it to you!

A View/SRDF/SRM white paper - http://www.emc.com/collateral/software/white-papers/h6971-businesscontinuity-view-srdf-wp.pdf
Latest SRDF SRA release notes - http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-010235_a03.pdf

What licenses are necessary to successfully use SRDF and SRM? Generally you will require:
• BASE
• SERVER (to allow it to be an API-SERVER)
• SRDFA (to allow it to manipulate SRDF/A RDF groups)
• SRDF (to allow it to manipulate RDF devices)
• TimeFinder (to allow it to use TimeFinder/Mirror)
• TimeFinder-Clone (to allow it to use BCVs for testing)

A useful SRA and SRM document can be found at: http://www.emc.com/collateral/software/whitepapers/h6368-using-emc-srdf-adapter-v2-vmware-srm-wp.pdf - I believe this may have been replaced with the document above.

12/12/09 – I have heard but have not confirmed that SRDF will immediately after a failover reverse direction and start replication.

10/2/09 – there is an updated SRA and Storage plug-in that makes things work easier! Make sure that you use them. EMC just told me the latest code went on Powerlink today. Look under:
Home <http://powerlink.emc.com/km/appmanager/km/secureDesktop?_nfpb=true&_pageLabel=homePgSecureContentBk>
> Support <http://powerlink.emc.com/km/appmanager/km/secureDesktop?_nfpb=true&_pageLabel=-NULL--&internalId=0b01406680024e1b>
> Product and Diagnostic Tools <http://powerlink.emc.com/km/appmanager/km/secureDesktop?_nfpb=true&_pageLabel=-NULL--&internalId=0b014066800251e5>
> Symmetrix Tools <http://powerlink.emc.com/km/appmanager/km/secureDesktop?_nfpb=true&_pageLabel=-NULL--&internalId=0b01406680270f14>
> Symmetrix Tools for VMware <http://powerlink.emc.com/km/appmanager/km/secureDesktop?_nfpb=true&_pageLabel=image7b&internalId=0b01406680407180&_irrt=true>

When you are performing the failover test, what kind of devices are we working with: sync (SRDF/S) using TimeFinder/Snaps (VDEVs) or async (SRDF/A) using TimeFinder/Clones (aka BCVs)? At the recovery site you need to "pair up" the R2 devices (replicas) in your datastore groups being "tested" with appropriate target devices for testing; these target devices are the TimeFinder devices I just mentioned. VDEVs if you are sync and BCVs if you are async.

The device pair list (as mentioned in your error) is stored in the xml file in the Program Files\Vmware\Vmware Site Recovery Manager\scripts\SAN\EMC Symmetrix folder. The file is called EmcSrdfSraOptions.xml. In that file you need to specify the R2 devices and their associated VDEV/BCV pairs as part of the <TestFailoverInfo> information inside the device pair list element. Example device pair entry:
  <DevicePair>
    <Source>0477</Source>
    <Target>035F</Target>
  </DevicePair>
Once you have a device pair for ALL of the R2 devices in your recovery plan, save the xml file and try the test again.

The purpose of the EMC Storage plugin (latest one) is that it now includes an "EMC SRDF SRA" tab in vCenter that allows you to match up the pairs in vCenter and then save the xml file from that tab so no manual editing is required. I have included a screenshot below of what this looks like. All of this is also covered in the SRDF guide (let me know if you don't have this and I can send separately).

SRDF adapter version 2.0 does not reference the netcnfg file any longer. Instead you are expected to specify a resolvable host name or IP address in the address field of the Array Manager. You can even add a port with it if you are not using the default 2707.

SYMAPI_C_NET_Handshake_FAILED error
This usually occurs when there is a security level mismatch between client and server, and sometimes where the Solutions Enabler versions are mismatched. Check the 'options' file in the symapi/config folder and change the sym server security level from the default of 'ANY' to "NONSECURE". Confirm both sides.

On a Symmetrix using SRDF-A, you may find a failover that successfully proceeds past the storage configuration, but it fails when powering on the VMs. This issue may be due to SPC-2. It needs to be enabled on the front-end adapter on the recovery Symmetrix that is exposing the RDF and BCV LUNs to the ESX host, otherwise SRM cannot match the WWN of the LUN returned by the SRA with the WWN of the LUN presented to the ESX host. You can find information on this in our forums, but also in document emc71378 in EMC's Powerlink KB.
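When there are many R2 devices, typing the device pair entries for EmcSrdfSraOptions.xml by hand is error prone. The sketch below generates `<DevicePair>` elements in the shape of the example entry from this guide. It is illustrative only: the `DevicePairList` wrapper name and the function names are assumptions, not the documented schema, so compare the output with the options file shipped with your SRA before pasting anything in.

```python
# Illustrative only: build <DevicePair> entries like the example in this guide.
# "DevicePairList" is a hypothetical wrapper element; check the real options file.
import xml.etree.ElementTree as ET


def build_device_pairs(pairs):
    """Build an element holding one <DevicePair> per (R2 device, test target) tuple."""
    root = ET.Element("DevicePairList")  # assumed name, not taken from EMC documentation
    for source, target in pairs:
        pair = ET.SubElement(root, "DevicePair")
        ET.SubElement(pair, "Source").text = source   # R2 (replica) device ID
        ET.SubElement(pair, "Target").text = target   # VDEV or BCV used for the test
    return root


def render_device_pairs(pairs):
    """Return the device pair entries as an XML string."""
    return ET.tostring(build_device_pairs(pairs), encoding="unicode")


if __name__ == "__main__":
    # The first pair is the example from the guide; the second is made up.
    print(render_device_pairs([("0477", "035F"), ("0478", "0360")]))
```

Every R2 device in the recovery plan needs an entry, so it is worth generating the list from the same inventory used to build the datastore groups.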

The SPC-2 flag can be set either on the FA or on the initiator itself. By making the change on the initiator, you can avoid moving hosts off the FA to make the change there. - http://www.yellow-bricks.com/2009/12/08/spc-2-set-or-not/

Using EMC SRDF Adapter for VMware Site Recovery Manager - http://www.vmware.com/files/pdf/VMware_SRM_SRDF_bestpractices.pdf

EMC recommends using the Solutions Enabler in a "client-server" fashion because the SRM server typically does not have direct fiber connectivity to the SAN (whereas the ESX host does). To use SE in a client-server fashion, SE needs to be installed on both the SRM server (Windows version) as an SE "client", and also on at least 1 ESX host (RH Linux version) as the SE "server". On the SE "client", you edit the netcnfg file to tell SE who the SE "server" is. The edited line contains a "service name" (which can be arbitrary, whatever you want), the hostname of the SE server (in this case, the ESX server), and the IP address of the SE server. The "service name" is the name that should be entered for the SYMCLI_CONNECT environment variable on the SRM server. That's how the SE "client" identifies the SE "server" to direct its SYMCLI commands to. The use of the netcnfg means that there is a single point of failure. Some clients might use the Control Center as the SYMAPI server to avoid this.

For SRM setups, you should restart the SRM server after you are complete with the SE install and tweaks. In addition, put the path of the SE bin folder into the System Variable PATH. By default it is C:\Program Files\EMC\SYMCLI\bin . You will see errors about this if you don't.

The SYMCLI is what is required by the SRA to talk to SRM. The SRA by default creates a log under \program files\emc\symapi\log with the name of symvmwsrm<date>.log.

If you are using the SRDF SRA you will need the following manual step to avoid errors in the log that appear to indicate a path issue, as well as an error in the UI that is "Failed to launch SAN integration scripts to execute discoverArrays command". This will occur when you are trying to configure the Arrays during the initial SRM setup. The solution is to add a path to the SYMCLI binaries to the "System variables" Path environment variable. The default path to SYMCLI is C:\Program Files\EMC\SYMCLI\bin . After adding the path you will need to restart the SRM server service.

SRDF DM doesn't work with SRM. The EMC adapter seems to be coded to skip any devices in Adaptive Copy state (Data Mobility is the fancy name for Adaptive Copy). As these devices won't be reported to SRM, any VMs on these LUNs cannot be added to a protection group. In addition, SRDF DM copies dirty tracks out of order to the R2 devices, so it is likely not able to guarantee a consistent image and is not a good SRM candidate.

All devices in a consistency group (device group) must be failed over together. This means they can only have one protection group and one recovery plan. If they need more they will need to create multiple consistency groups. This is when using SRDF/A.

If you wish to have application consistent VMs after a failover you will need to use Replication Manager to arrange that. I am not sure if it is yet compatible with SRM.

SRDF issue (thanks Jason for this sample): Customer claims datastore DMX-25-SRM-Testing-955 is on a replicated LUN, however SRM does not create a datastore group including this datastore.

Looking at the SRM log, I see:
[2009-03-11 13:04:34.962 'SanConfigManager' 13084 verbose] Adding datastore 'DMX-25-SRM-Testing-955' with MoId 'datastore-8023' and VMFS volume UUID '49b66ac2-f8e6e22e-e912-002264f6252c' spanning 1 LUNs
I see that this UUID is
[2009-03-11 13:04:32.431 'SanConfigManager' 13084 trivia] Added vmfs extent 'host-7422.vmhba1:3:14' with key 'host7422.49b66ac2-f8e6e22e-e912-002264f6252c.0' vmhba1:3:14.14
i.e. LUN 14 on target 3 of hba1 on host-7422. This host sees this LUN's UUID as:
[2009-03-11 13:04:23.493 'SanConfigManager' 13084 trivia] Added LUN '10:00:00:00:C9:7A:42:65.50:06:04:82:D5:2E:89:09' with keys 'host-7422.vmhba1:3:14' and 'host7422.02000e00006006048000019010205253303039353553594d4d4554'
The LUN WWN is encoded within the UUID (last token of this line) as characters 10 through 42, i.e. 600604800001901020525330303935355
However, discoverLuns returns only 1 replicated LUN, which is not this WWN:
[2009-03-11 14:30:46.094 'PrimarySanProvider' 14848 trivia] 'discoverLuns' returned <?xml version='1.0' standalone='yes'?>
[#2] <Response>
[#2] <LunList arrayId="000190102052">
[#2] <Lun consistencyGroupId="RA::9" id="8F3" wwn="60:06:04:80:00:01:90:10:20:52:53:30:30:38:46:33">
[#2] <Peer>
[#2] <ArrayKey>000187401329</ArrayKey>
[#2] <ReplicaLunKey>738</ReplicaLunKey>
[#2] </Peer>
[#2] </Lun>
[#2] </LunList>
[#2] <ReturnCode>0</ReturnCode>
[#2] </Response>
so SRM cannot map vmhba1:3:14 to a replicated LUN.

How could there be only 1 replicated LUN? Looking further up in the log, the SRDF SRA reports several messages such as:
[#2] 20090311 14:30:45 INFO Skipping SID [000190102052] RDF device [82C] config
[#2] [RDF1+R-5] mode [Adaptive Copy] pair state [SyncInProg]
[#2] star mode [False] meta type [Member]
So it is skipping several LUNs that presumably are RDF1, but they are in the "SyncInProg" state in Adaptive Copy mode, and the SRDF adapter only supports SRDF/S or SRDF/A, so they would have to be in the "Synchronous" or "Asynchronous" mode, not adaptive copy (which in fact is the mode when you do the initial full synch from R1 to R2). So the LUN is being replicated but not in the right mode, and the adapter is skipping it. The solution is for the customer to correct the LUN on which the datastore was created so that it is in synchronous or asynchronous mode.

MirrorView
If you are using MV with Clariion you will need to use Solutions Enabler (for communications) and Navisphere for the replication management. If you use MV with the Celerra platform you will require neither. Replication Manager is useful for both.

As of 12/12/09 you can only run 1 simultaneous recovery plan. Elsewhere in this document you can experimentally change this. You can have only 1 recovery plan active when using this SRA. It will take Navisphere engineering changes to support running more than one RP. Hopefully this will be improved in future releases of the SRA. But see above how you could do it now if you need to test it.

The MV SRA works on ports 80/443 but if they are not used, you will end up using 2162 / 2163.

With MirrorView you will need to make sure the EMC array scripts are in the same folder structure as the SRM install. This is only relevant if you have installed SRM to a different drive.

Consistency Groups must have pure alphanumeric characters in use or a real failover will work but not a test.

You can get a "failed to create LUN snapshots" error when working with MirrorView. It is generally a problem in the EMC configuration. You can sometimes avoid it by using the following steps:
• Create the source volume and mask it to the ESX hosts on the production site
• Build a VMFS datastore on the Source and add a Guest to the Datastore
• Use the Navisphere MirrorView wizard to create a Target volume
• Create the MVs or MVA relationship
• Add the Target volume to the MVa / MVs relationship
• Once synced, create snapshots on both sides
• Add the production side snapshot into the storage group for the Production site ESX hosts
• Add the Target volume and its newly created snapshot into the DR side ESX host storage group
• Create a consistency group on the production array and add the MirrorView relationship(s) to the consistency group.

The snapshot must have "VMWARE_SRM_SNAP" in the name somewhere. It appears to me that you create the snapshot (or the storage admin does) before it is required and then the SRA activates it. Confirmed – this is correct. This is for test failover only.

Some specific suggestions for MirrorView/S on Clariion would include:
1. SRA 1.2 or later
2. Solutions Enabler 6.5.3 or later. SE can be installed with no configuration or extra bits.
3. If you are working on a 64-bit SRM host, you must use the 32-bit Solutions Enabler software.
4. If you have some performance issues with the failover you may be using an old version of the Solutions Enabler. In PowerLink article emc203510 you can find the SE Patch Release 6.5.2.20 that reduces the time required for storage preparation by more than 50%.
5. When installing Solutions Enabler accept all the defaults and perform a "complete" install. In addition, it has been suggested that all of the Solutions Enabler options need to be installed.

Replication Manager
Currently Replication Manager (RM) doesn't co-exist with SRM, but in December 2008 it may be supported. This will impact a number of applications. It has been said it will be dramatically easier to set up the array-to-array replication. There is some disagreement about this so it may work but I am checking. For now the Release notes say no.
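For the SRDF datastore-group example earlier in this section, the comparison between the WWN buried in the host's LUN key and the WWN returned by discoverLuns can be done mechanically. This is a rough sketch under stated assumptions: it takes the 32 hex digits starting at zero-based offset 10 of the key (this guide quotes "characters 10 through 42"), and the function names are hypothetical rather than part of SRM or the SRA.

```python
# Illustrative sketch: compare the WWN embedded in an ESX LUN key with the WWN
# reported by the SRA. The offset and length are assumptions based on the sample
# log lines in this guide, not a documented format.

def wwn_from_lun_key(lun_key: str) -> str:
    """Return the 32 hex digits assumed to hold the LUN WWN inside the key."""
    return lun_key[10:42].lower()


def normalize_wwn(wwn: str) -> str:
    """Strip the colons from a WWN as printed in the discoverLuns response."""
    return wwn.replace(":", "").lower()


def device_label(wwn_hex: str) -> str:
    """Decode the last six bytes as ASCII; in this sample they spell the device ID."""
    return bytes.fromhex(wwn_hex[-12:]).decode("ascii")


if __name__ == "__main__":
    host_key = "02000e00006006048000019010205253303039353553594d4d4554"
    sra_wwn = "60:06:04:80:00:01:90:10:20:52:53:30:30:38:46:33"

    host_wwn = wwn_from_lun_key(host_key)
    reported = normalize_wwn(sra_wwn)
    print("host sees:   ", host_wwn, device_label(host_wwn))
    print("SRA reports: ", reported, device_label(reported))
    print("match" if host_wwn == reported else "no match: SRM cannot build a datastore group")
```

With the values from the sample log the two WWNs differ, which is the same conclusion reached above: the datastore's LUN was not reported as replicated. In this one sample the trailing bytes decode to labels ending in 955 and 8F3, which line up with the datastore name and the device ID in the response; treat that as an observation about this log, not a rule.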

RecoverPoint
You should avoid having spaces in a Consistency Group name. It appears that the SRA cannot handle it – you can see errors in the SRM log about not being able to find CGs. It seems that RecoverPoint and SRM also have issues if the ~ is in the CG name. Avoid that.

It has been reported to me that RecoverPoint will support simultaneous recovery plan operation. You must organize that so that nothing impacts each other, but it works.

With the number of CGs that are currently supported, and that the number will grow in the future, it is suggested to think about having one app, or business unit, per CG. That would be one or more LUNs as that app or business unit would require. This would provide the greatest flexibility in testing and failover.

The CG policies that must be set include reservations support and VMware ESX or VMware ESX Windows as the host. The CG should also be a CRR consistency group for remote replication and not CDP/local or CLR / local-remote.

The account that you use in SRM Array Manager to talk to the RecoverPoint appliance must be configured as admin in the RecoverPoint appliance. If the MUI cannot talk to the RPA neither will the SRA.

The RecoverPoint SRA uses TCP 7115 to talk to the RPAs.

http://www.emc.com/collateral/software/white-papers/h7261-business-continuity-vsphere-recoverpointwp.pdf
The log location for the SRA is c:\program files\EMC\SYMAPI\log .

A customer recently had 19 LUNs in a CG and was failing over unsuccessfully. There were device 0 and device 1 errors for the VM configuration since the VM configuration files were not being seen in the time that SRM required. A manual refresh on the host brought all the VMs online. This is a clue that indicates the solution. You need to do two HBA refreshes. This has been reported as necessary for HP, and sometimes with big HDS and SRDF environments, but now with a large RecoverPoint CG as well. See "How can I configure a second HBA rescan?" for help on fixing this.

In addition, when there were 10 CGs and one VM that had 23 VMDKs attached and spread around those 10 CGs, we were not able to do a failover. Changed it to one CG and the rest the same and it worked. It was the RP SRA 3.1, which is 1.0.2.1. EMC was a very quick help with this issue.

SPC2 issue with RecoverPoint and DMX
If you see the error message below when working in the Array Manager and trying to configure your connection to RecoverPoint that is using DMX storage you may have an SPC2 problem. This error occurred after entering your credentials and selecting Connect. This occurred due to the FA flag on the DMX source storage that was not set for the RPA but was in fact set for the ESX servers.

Test Recovery fails with already accessing image error message
If you do a test failover when using RecoverPoint and it fails, and in the very large error in the history report you see near the bottom a message about already accessing image, you will know that the recovery side (or target) LUN is already set for access before the SRA arranges for it to be set for access, and this generates an error. RecoverPoint engineering identified this and the necessary changes were made in the lab.

WARNING: UNKNOWN_ERROR
When you see an error that looks like
[#1] Fri Mar 20 09:25:05 PDT 2009 WARNING: UNKNOWN_ERROR
it can mean that an older SRA is in use. Make sure you are using v1.0 SP2 P1, which makes v1.0.2.1, or later.

Am I using the latest version of the SRA?
If you are in the RP folder on the SRM server you can run:
C:\Program Files\VMware\VMware Site Recovery Manager\SAN\array-typerecoverypoint>..\..\..\external\perl-5.8.8\bin/perl.exe command.pl –version
And you should see:
EMC RecoverPoint Adapter for VMware Site Recovery Manager version 1.0.2.1
The instructions above are not quite right – there is an issue with the path or format. It would be good if someone could test this.

Unable to connect SRM to the RecoverPoint Management Server
The 3.0 RecoverPoint SRA uses the same ports as the RP GUI (1099 and 4401). So if you can open the RP GUI from the SRM server, the SRA should work.

Site Management IP
Using RecoverPoint (RP), which server talks to the RP management server? During the Array Manager configuration a connection is created to the Site Management IP for RP. It was asked, when the Site Management IP is in a protected management network and a rule is required to be created for the firewall to provide access, which is the source server – the VirtualCenter server, SRM server, or the ESX server? It is the SRM server, which is often located on the VC server, that needs the communication with the RecoverPoint Site Management IP.

DMX and RecoverPoint
This is from a support guy on how something was fixed in a RP issue with a DMX.
"In the connectivity of ESX to the DMX there requires the SPC2 bit be set on the DMX array. SPC2 bit setting can be done at the FA port level or at the initiator level. This bit setting was set on the FA ports that the ESX host was connected to, though somehow the HBA WWNs were excluded on the symmask list. Additionally, the RPA connections to those LUNs on the DMX did not have any SPC2 bit setting in place on the FA ports or for the RPA initiators. This caused a mis-match of LUN UIDs that SRM saw on the ESX host versus the RecoverPoint Appliances. After the SPC2 bit was set, they then required the HBAs on the ESX to be reset, as well as the RPA appliances on both the source and target site. The target site RPAs required to be reset because the target site Journals actually resided on the same source site DMX, as part of this specific LAB environment (would not happen like this in an actual implementation). That is why the SPC2 bit setting issue hit both source and target SRM implementations even though the RecoverPoint target storage was on Clariion (which doesn't require SPC2 bit). So lessons learned are to make sure that the SPC2 bit setting is in place prior to deploying SRM for both the ESX and RPA appliances for those LUNs."
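The connectivity advice above (the SRA uses the same ports as the RP GUI, and it is the SRM server that must reach the Site Management IP) can be checked quickly from the SRM server when a firewall sits in between. This is a minimal sketch: the default ports are the two quoted in this guide, while the function names and the sample IP address are placeholders.

```python
# Illustrative reachability check, meant to be run on the SRM server.
# Ports 1099 and 4401 are the ones quoted in this guide; names are hypothetical.
import socket

RP_GUI_PORTS = (1099, 4401)


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_site_management_ip(host: str, ports=RP_GUI_PORTS) -> dict:
    """Map each port to True/False for the given RecoverPoint Site Management IP."""
    return {port: port_open(host, port) for port in ports}


if __name__ == "__main__":
    # Replace the placeholder address with your Site Management IP.
    for port, ok in check_site_management_ip("192.0.2.10").items():
        print(port, "reachable" if ok else "blocked or closed")
```

A port that shows as blocked here points at the firewall rule rather than at the SRA or the array configuration.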

EMC Q & A
Q: What happens if SRM uses fewer LUNs than those contained in an RA group?
A: EMC's newly published adapter populates the consistencyGroupId field of the SRM XML specification by defining all of the R1 LUNs in an RA group as part of the same consistency group, which means SRM will always try to fail over those LUNs as a unit regardless of the presence of VMs on them.
Q: What if the adapter does not have visibility to all of the LUNs in the RA group?
A: EMC recommends the adapter use EMC Control Center (ECC) as the management server for manipulating the RDF devices. ECC should have the ability to manage any device regardless of its visibility to any host, and if not, then any script (not just an SRA) to manage that group would be impossible.
Q: What if VMs not part of a recovery plan use the same RA group?
A: A customer that replicates VMs using SRDF but does not recover them as part of a recovery plan is probably wasting bandwidth replicating data that is never consumed, so this is probably not a best practice. EMC codes their adapter to best practices.
Q: What if the RA group contains LUNs not used by ESX hosts?
A: This question can be turned against any DR software: what about the script that tries to fail over the LUNs that is running on the non-ESX hosts (e.g. a Solaris cluster, etc.)? It will impact SRM protected VMs. Using the SRM API there is always a way to integrate failover among disparate clusters.
Q: Why is the test recovery required to be performed on the entire RA group when it is possible to snapshot a LUN using BCVs?
A: This is a good question that was posed to the EMC engineering team. My understanding of their answer is that if an RA group is created it represents a consistency set of data that must be tested together, such as a multi-tiered application that requires cross-application consistency.

FalconStor
You should be aware that the FalconStor replication product will not allow a takeover of the replicated data if there are still hosts with live iSCSI connections to the primary volumes. This is designed behavior by FalconStor. In a DR situation this would NOT be an issue since the primary volumes would not exist. But this is an issue for test modes. You can address this by disconnecting primary ESX hosts from the primary targets prior to failover, or they can script this disconnect as part of the recovery plan (after the primary VM shutdown, but before the secondary storage recovery – you could use a script callout for this). It is reported that only FalconStor has a product that integrates with SRM with this particular requirement. I have not been able to confirm that this behavior exists with both IPStor 5.1 and IPStor 6.0 but at this time I believe it does. 4/7/09 – Update – this is not an issue with current versions of the SRA.

FalconStor is more known as a company that provides gateway products rather than storage. The FalconStor virtual appliance is both a gateway and a storage device. This is very useful in the BC / DR space.

FalconStor SRA invalid keycode error
The FalconStor SRA requires a key before it will work. If you have no key you get an invalid keycode error. Sort of makes sense when you think of the error message. Not correct any longer. As of the most current SRA (3/26/10) you will see the keycode is inserted for you.

Error: Non-fatal error information reported during execution of array….
The complete error is "Error: Non-fatal error information reported during execution of array integration script: Failed to create lun snapshots". I have seen this twice. Once when TimeMark was not enabled on the recovery side. Another time I saw this was when the NSS Appliance had not been fully patched. By installing all patches as of 3/17/09 this error went away.

Does the FalconStor log file have any extra info? No. It has the same info, but none of the VMware stuff. As well, sometimes if you have trouble looking in the SRM log for SRA info, it may be easier for you to look at the SRA log. It is possible to do different logging levels for the SRA and produce more info for the SRA log than what goes to the SRM log, but that is not the default method enabled. The SRA level of logging is set by the config.ini – for more info see the FalconStor troubleshooting section of the SRA admin guide. For information on changing the SRM log levels see the SRM admin guide.

IBM
IBM DS4000/5000
IBM branded SRA (SMSRAinstaller-WS32-101.35.01.06.exe) on Win2K8 will not install
This LSI SRA will not install on Windows 2008 and it will quit. You can of course have it run in compatibility mode and it will install fine. Just right click on the installer and select Compatibility, and then run it as Win2K3 (SP1). It will proceed fine.

Old information
The IBM DS4xxx SRA when installed has two issues that stop it from working. The first is that the correct path to perl.exe is not set. The path that needs to be added to the environment is c:\Program Files\VMware\VMware Site Recovery Manager\external\perl5.8.8 . It is a version different in the path. You can confirm this by looking for the directory that perl.exe is in. You should also add c:\program files\VMware\VMware Site Recovery Manager\scripts\SAN\IBM to the path as well. The second is that they have capitalization errors in two files. The two files that you need to edit are command.pl and common.pm and they are both in the C:\Program Files\VMware\VMware Site Recovery Manager\scripts\SAN\IBM folder. You need to look for the $XML_RETURNCODE = "Returncode" line and change it to "ReturnCode". Both of these are mentioned in the readme so make sure to read it.

Another path issue that has been reported (11/26/09) shows itself with errors when trying to configure the array. This has been reported for both the IBM SVC and DS8K. To solve this issue you need to change the following file:
C:\Program Files\VMware\VMware Site Recovery Manager\config\vmware-dr.xml
You will need to look for the line under SanProvider and change the ConfigPath variable from
<ConfigPath>=C:\Program Files\VMware\VMware vCenter Site Recovery Manager\scripts\SAN\</
to read:
<ConfigPath>=C:\Program Files\VMware\VMware Site Recovery Manager\scripts\SAN\</
It is important to note that you need to restart the SRM service to make this work. However, no other vendor SRAs will now work with this change! And one day the IBM SRA will use the new SRM 4.0 path and you will then need to return the file to the way it was!
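The capitalization fix described under "Old information" (changing "Returncode" to "ReturnCode" in command.pl and common.pm) can be scripted so that both files get the same change. This is a minimal sketch, assuming the string appears in the files in double quotes exactly as quoted in this guide. The function names are hypothetical, and you should keep a backup of both files before rewriting them.

```python
# Illustrative sketch of the "Returncode" -> "ReturnCode" edit described above.
# It assumes the quoted string appears in the Perl files as shown in this guide.
from pathlib import Path


def fix_returncode_case(perl_source: str) -> str:
    """Return the Perl source with the quoted element name corrected."""
    return perl_source.replace('"Returncode"', '"ReturnCode"')


def fix_file(path: Path) -> bool:
    """Rewrite one file in place; return True if anything changed."""
    original = path.read_text()
    fixed = fix_returncode_case(original)
    if fixed != original:
        path.write_text(fixed)
        return True
    return False


if __name__ == "__main__":
    ibm_dir = Path(r"C:\Program Files\VMware\VMware Site Recovery Manager\scripts\SAN\IBM")
    for name in ("command.pl", "common.pm"):
        target = ibm_dir / name
        if target.exists():
            print(name, "updated" if fix_file(target) else "already correct")
```

Running it a second time changes nothing, so it is safe to re-run after reinstalling the SRA.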

If the test recovery fails around 3% make sure that the flashcopy is deleted. It is not known currently if an SRA update will be required as well.IBM DS8000 This SRA has a configuration utility that is case sensitive. I have confirmed that the SVC SRA supports simultaneous recovery plan operation. They followed best practices. and hopefully in the 2Q2011 there will be a microcode update for the DS8000 that will fix this.vmware.1 SRA from IBM as well as the unreleased next build of it.x and information on how it can be made to work is in http://communities. and the recovery side.com/docs/DOC-12372 .10713 it seems they have fixed most of the issues below. The manual does talk about the utility. It is still there likely due to a failed test.exe on the SRM server desktop. SRM Reference Guide Page 94 of 166 . IBM SVC IBM recently released additional SRA support for other devices.com/kb/1013616 for examples when this is not done).20. The problem was the SVC management host on the protected side. the IBM SVC SRA installs a utility called IBMSVCSRAUtil. If the customer has only one IBM SVC Console to manage both the protected and recovery array you will run into the errors mentioned in http://kb. but doesn‘t mention it is case sensitive! For example. when you configure a field with p4. It still has some things to understand. The DS800 is only able to pause all of the replicated LUNs. As well. This means if you replicate three LUN‘s and one of them is not managed by SRM. my first experience with the SVC SRA was with a client that had IBM install the SVC hardware. it will still be impacted during a failover. it was that my management server also managed the recovery side.0.com/kb/1013643 and easily solved by having to consoles. You need to understand this utility and the document included with the SRA explains this utility starting on page 11.exe not in the path is a problem for the SVC SRA. Meaning it will be paused along with the SRM managed / used LUNs. 
If the test recovery fails around 14% it is like due to the configuration and you should check it out. This is an IBM issue. The workaround is to put a management agent on the SRM server (on each side) and make sure it can only see / manage the one side it is assigned to! This was using the 1. In my case. but that meant that the SRA didn‘t work.vmware. could manage either side of the SVC. Consistency Groups are not supported SVC host object names must be 9 characters or less LUN‘s assigned to ESX must have vmware in the SVC‘s hostname (see http://kb. The patch error mentioned about perl. Thanks again to Brock for this! Some things to note about the SVC SRA include:         My experience is that it had the perl issue mentioned above. In addition.vmware. it will not work if the array sees that field as P4. With 1. namely SVC. IBM XIV This new hardware is supported by SRM 4. So when the SRA was installed on the SRM server and pointed at the protected side SVC management host it was confused as it saw both sides. Check the Storage Partners compatibility matrix for the specifically supported hardware info.

aspx?docname=4AA2-8848ENW&cc=us&lc=en An online guide that helps with the EVA can be found at: http://h20000. restart the service again and it should work fine. Make sure the path HKEY_LOCAL_MACHINE\SOFTWARE\VMware. but have not confirmed. If you use 1.\VMware Site Recovery Manager] "InstallPath"="C:\\Program Files (x86)\\VMware\\VMware vCenter Site Recovery Manager" Once you have corrected the registry. It is for an old version of SRM but it still is helpful.hp.Dell EqualLogic Dell firmware 4 requires the SRA that is 1.www2.00 [HKEY_LOCAL_MACHINE\SOFTWARE\VMware. The regedit type file would look like: Windows Registry Editor Version 5. http://www.jsp?lang=en&cc=us&taskId=120&prod SeriesId=499896&prodTypeId=18964&objectID=c01493772 In my experience with working with an EVA in a PofC I found the above guide very useful.com/resourcecenter/assetview.com/bizsupport/TechSupport/Document.\VMware Site Recovery Manager\InstallPath exists and it has instead of InstallPath the path to the SRM installation folder without quotes.com/V2/GetDocument.0 with EqualLogic firmware 4 you will have a failover fail with a missing share LUN flag on the disk array management.aspx?id=5261 Compellent Currently there is a small issue with installing the SRA on Win2K8. When you start a failover (not a test) the replication will be automatically be reversed. There is an excellent best practices guide for working with HP and SRM at: http://h20195. or deselect the arrays in the array configuration depending on which side you are working with.hp. You can find help installing the SRA at the URL below.01.www2. account name and password fields filled in for both sides! This means both values – protected site and recovery site are entered in each field and separated by a semi – colon. It did neglect to mention several things. This can be a big surprise and an issue if there is low bandwidth. Confirmed.0. It is due to registry security issues. You need to select. HP EVA I have heard. 
that after a failover the direction will be reversed and replication started. I could not find anywhere that had mentioned the mode of access be set to none. You need to the management IP addresses. Inc. Watch out for this as if you have a slow connection it can cause issues! You need to set the rescan to two times – find out how in How can I configure a second HBA rescan? SRM Reference Guide Page 95 of 166 . install the SRA again.equallogic. Inc. Our site currently has version 1.

com/2009/01/20/sradiscoverluns/ .‖. so on a rescan the ESX kernel doesn‘t learn abut the LUNs.As well here is some background information . The recovery site EVA must have enough space to contain the snapshot volumes. This was seen due to an error 4 in the SRM logs. Replicated vdisks must e in a DR group and ALL vdisks in that DR group must be used by the primary / recovery site Hosts. but it might be useful for background information.yellow-bricks.0.xxx v 8.2xx v 8.1 EVA4400 XCS 9.1xx or 6. Background Information The info below may not be necessary for the latest SRA.0.0.1xx or 6. It has been seen once that the SRM service needed a domain account to talk to the Command View EVA. the driver does not deliver information about new LUNs to the ESX kernel in a timely fashion.2xx v 8. The version / firmware information below is what works with the HP StorageWorks EVA Virtualization Adapter version 1.1xx or 6.  In some combinations of storage hardware and FC drivers.2xx v 8.1 EVA6100 XCS 6. I have not been able to confirm this – no HP gear in my lab – other than LHN but this might change whether the Command View EVA is local not? I have confirmed in a VMware QA lab this change was not required.0. When using Hp Command View with HP EVA. You can configure SRM 1.0.1 Update 1 to do a second scan if necessary. A work around for the EVA right now is available.2xx v 8.2xx v 8.  After setting up the replication pair between two EVA arrays.0.1 EVA4100 XCS 6.http://www.1 EVA8000 XCS 6. If using two command view EVA management nodes then both IP addresses must be entered in the SRA config wizard primary first.0. It occurs when discoverluns runs as part of the setup through the SRM Plug in it fails with error 4.1 Miscellaneous information      Replicated vdisks on the EVA MUST be zoned to both sites (but must be created on the primary command view server).1xx or 6. 
similarly the CV server at the recovery site manages its EVA actively and the protected site EVA passively.1xx or 6.1 EVA8100 XCS 6.1 EVA6000 XCS 6. at the secondary array take a manual snapshot of each of the target Vdisks and present those snapshots to the ESX host using the default LUN number that the EVA management system picks (usually the lowest available SRM Reference Guide Page 96 of 166 . Then either CV server fails the other will takeover with no manual intervention required. HW Models Firmware CommandView EVA4000 XCS 6.0.2xx v 8. than secondary separated by a ―.1xx or 6. A second rescan is necessary in order to deliver info about the new LUN to the ESX kernel. Use the following steps to have a successful test. the HP best practice is to run CV servers active / active where the CV server in the Protected site manages its EVA actively and the remote EVA passively. if vdisks in the DR group are used by other hosts then the SRA discards them.

LUN number). Then rescan twice on the ESX host so that it discovers those LUN numbers for the first time, and destroy the snapshot. Now if you do a test recovery, the SRA for EVA will create a snapshot of each of the replicated LUNs and present those snapshots to the ESX host using the default LUN number. Since ESX has already discovered those numbers during the manual steps above, only a single scan is required and the test will succeed.

Real solutions are not far off for this problem. It has been confirmed that some HP arrays will need a second rescan to make the recovered LUNs visible. You can find information on how to manage that elsewhere in this document.

HP Storage Virtualization EVA Adapter configuration information
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=120&prodSeriesId=499896&prodTypeId=18964&objectID=c01493772

SRM has to be set up with HP Command View (CV) EVA so that the DR site always hosts the HP EVA failover primary site (i.e. manages it actively). This should make sense: in a failover you would lose the primary site, and you would not really want to have to make CV active manually.

One area that seems to cause HP EVA customers a lot of problems is working out which CV setup/config they have in place, and then working out how it should be entered into the GUI in SRM.

Example: let's assume you have two Command View (CV) EVA servers, one in datacenter A (DC A) and one in datacenter B (DC B), and an EVA in each datacenter: EVA-A in DC A and EVA-B in DC B. Usually HP will recommend that the CV servers are configured active/passive, with each CV server actively managing the EVA in its local DC and passively managing the EVA at the opposite DC. So in this example we would have:

CV-DC A manages EVA1 actively and EVA2 passively
CV-DC B manages EVA2 actively and EVA1 passively

When entering the information in the "Configure Array" wizard you can include the IP addresses for both CV servers on the same line, separated with a ";". This assumes that both CV servers can be accessed using a single login username/password. When you hit the "Connect" button two storage arrays will appear. When entering the protected side info, simply check the box for the local EVA at the protected site, and when you get to the recovery side screen select the other EVA.
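The address entry described above (two Command View IP addresses on one line, separated by ";") can be sketched as a small helper. This is only an illustration: the function name and the sample addresses are hypothetical, and the "primary first" ordering is taken from the note on the SRA config wizard later in this section.

```python
import ipaddress


def build_cv_address_field(primary_cv: str, secondary_cv: str) -> str:
    """Return the text for the wizard's address field: primary CV first, then secondary.

    Hypothetical helper, not part of SRM or the HP SRA. It only validates
    the two addresses and joins them with ';'.
    """
    addresses = [primary_cv.strip(), secondary_cv.strip()]
    for address in addresses:
        # ip_address() raises ValueError when the text is not a valid IP address.
        ipaddress.ip_address(address)
    if addresses[0] == addresses[1]:
        raise ValueError("primary and secondary Command View addresses are identical")
    return ";".join(addresses)


if __name__ == "__main__":
    # Sample addresses from the documentation range, not real servers.
    print(build_cv_address_field("192.0.2.10", "192.0.2.20"))
```

Checking the addresses before typing them into the wizard is cheap and avoids a failed Connect caused by a stray space or a typo.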
Some customers have an alternate configuration, which is NOT HP best practice:

CV-DC A manages EVA1 actively and EVA2 actively
CV-DC B manages EVA2 and EVA1 passively

The HP SRA cannot associate vdisks with DR groups when connected to a passive Command View host, and I think this configuration has caused some customers issues during the setup stage. I believe this second config works, but you need to be careful in the Configure Array wizard as we cannot currently force the passive CV to become active.

Below are some other checks that you may want to look at. Verify that the vdisks are correctly presented to the ESX hosts at both sites. I have seen issues where customers don't have the access method set correctly for the vdisks. The HP documentation seems to make customers believe that the replica vdisks at the recovery site need to be made "accessible" (i.e. read/write) to the recovery site ESX hosts at all times. All that is actually required (as with other replicated array configs) is that the replicated LUNs, at the LUN device level, are in the same "zone" as the ESX hosts at the recovery site (i.e. within VC they will appear on rescan in the storage adapter screen but not in the storage/VMFS datastores screen by default).

Other things we have seen include:
• Replicated vdisks on the EVA MUST be zoned to both sites (and must be created on the primary Command View server). Check that this has been done.
• Replicated vdisks must be in a DR group, and ALL vdisks in that DR group must be used by the primary/recovery site ESX hosts. If vdisks in the DR group are used by other hosts, the SRA discards them, so again this needs to be verified.
• If using two Command View EVA management nodes (as described above), both IP addresses must be entered in the SRA config wizard, primary Command View EVA first and then secondary Command View EVA, separated by a ";". The recovery site Command View EVA should be defined as the site that is the failover primary.
• The customer must ensure that the recovery site EVA has enough space to contain the snapshot volumes.

The SRA produces a log (hpsrmeva.log), which is a good place to look for other error messages. We have seen cases where the issue is a misconfiguration of the SRA / Array Manager. During the setup, because of the way Command View works, you are presented with both EVAs in the Protected Arrays and Recovery Arrays screens. You need to uncheck the relevant EVA at each screen. Failure to do so can cause issues when you run test plans.
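When looking through the SRA log mentioned above (hpsrmeva.log), a short script can pull out the lines worth reading first. The sketch below is an assumption-heavy illustration: the log's location and line format are not documented here, so it simply matches the words "error" and "fail/failed/failure" anywhere in a line, and the file name is looked up in the current directory.

```python
import re
from pathlib import Path

# Assumed pattern: the real hpsrmeva.log format is not described in this guide.
ERROR_PATTERN = re.compile(r"\b(error|fail(ed|ure)?)\b", re.IGNORECASE)


def find_error_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for lines that mention an error or failure."""
    matches = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            matches.append((number, line.strip()))
    return matches


if __name__ == "__main__":
    # Hypothetical location: copy the log next to this script, or adjust the path.
    log_path = Path("hpsrmeva.log")
    if log_path.exists():
        for number, line in find_error_lines(log_path.read_text(errors="replace")):
            print(f"{number}: {line}")
```

The same function can be pointed at the SRM log text if you want both views side by side.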

Miscellaneous Information
URLs
SRM 4.1 Release notes - http://www.vmware.com/support/srm/srm_releasenotes_4_1.html
SRM 4.1 upgrade blog - http://blogs.vmware.com/uptime/2010/07/upgrading-to-srm-41-includingupgrading-to-vsphere-virtualcenter-41.html
SRM 4.0 Release notes - http://www.vmware.com/support/srm/srm_releasenotes_4_0.html
SRM 4.0 Upgrade KB article - http://kb.vmware.com/kb/1013166
SRM 1.0 Release Notes - http://www.vmware.com/support/srm/srm_10_releasenotes.html
SRM 1.0 Installation and Configuration guide - http://www.vmware.com/pdf/srm_10_admin.pdf
SRM install and configure video - http://mylearn.vmware.com/register.cfm?course=22279
List of SRM links - http://tendam.wordpress.com/srm-links/
VMware SRM in a NetApp environment - http://media.netapp.com/documents/tr-3671.pdf
SRM 4.0 performance whitepaper - http://www.vmware.com/resources/techresources/10076
Dell EqualLogic guide - http://www.equallogic.com/uploadedFiles/Resources/Tech_Reports/TR1039-Dell-EqualLogic-PS-Series-SAN-and-VMware-SRM.pdf
Using EMC SRDF Adapter for VMware Site Recovery Manager - http://www.vmware.com/files/pdf/VMware_SRM_SRDF_bestpractices.pdf
VMware vCenter SRM in a NetApp Environment - http://media.netapp.com/documents/tr-3671.pdf
VMware Uptime blog (VMware and Business Continuity) - http://blogs.vmware.com/uptime
Availability Zone of VI:OPS - http://viops.vmware.com/home/community/availability - this includes a lot of lab setup info for various storage arrays.
LeftHand Networks SRA Failback Procedure for SRM - http://www.lefthandnetworks.com/document.aspx?oid=a0e0000000000NxAAI
SRM in a can (EMC, with automated failback info) - http://virtualgeek.typepad.com/virtual_geek/2009/07/updated-site-recovery-manager-in-a-can-doc-nowwith-extra-emc-automated-failback--.html
HP Storage Virtualization EVA Adapter configuration information - http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=120&prodSeriesId=499896&prodTypeId=18964&objectID=c01493772

Syntax highlight module info
Text Wrangler language module
Use the info below to create a text file called log.plist.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>BBEditDocumentType</key>
    <string>CodelessLanguageModule</string>
    <key>BBLMLanguageCode</key>
    <string>MWTR</string>
    <key>BBLMLanguageDisplayName</key>
    <string>Log</string>
    <key>BBLMSuffixMap</key>
    <array>
        <dict>
            <key>BBLMLanguageSuffix</key>
            <string>.log</string>
        </dict>
    </array>
    <key>BBLMColorsSyntax</key>
    <true/>
    <key>BBLMIsCaseSensitive</key>
    <false/>
    <key>BBLMKeywordList</key>
    <array>
        <string>authorization</string>
        <string>credentials</string>
        <string>authentication</string>
    </array>
    <key>Language Features</key>
    <dict>
        <key>Identifier and Keyword Characters</key>
        <string>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</string>
        <key>String Pattern</key>
        <string>(error|warning|critical)</string>
    </dict>
</dict>
</plist>

EditPlus
Use the information below to create a text file called log.stx.

#TITLE=XML
; XML syntax file written by ES-Computing.
; This file is required for EditPlus to run correctly.
#DELIMITER=[]:()
#CASE=y
#KEYWORD=Error
error
ERROR
#KEYWORD=Warning
warning
WARNING
#KEYWORD=Verbose
verbose
info
trivia
#

Lab Exercises
Below are a number of lab exercises from the Partner Exchange 2010 SRM bootcamp lab. They may be useful to people new to SRM.

Lab 1 – Installing SRM
VMware vCenter Site Recovery Manager (SRM) Installation – Lab Station 01

Create the SRM database instance
1. Using SQL Server Management Studio Express, create a new database instance.

2. Provide a name for the database instance.
3. Close SQL Server Management Studio Express.

Create the ODBC data source connection for SRM
4. Start > All Programs > Administrative Tools > Data Sources (ODBC)
5. Select the System DSN tab.
6. Click the Add button to open the Create New Data Source window.
7. Scroll down the list and select SQL Native Client.
8. Provide a Name and Description (optional) for the Data Source connection.
9. Using the Server drop-down menu, select the local server\database. The number in the server name must match the station you are sitting at. Example: Station 16 would select STU16-VCA\SQLEXP_VIM when installing in the Protected Site, STU16-VC-B\SQLEXP_VIM when installing in the Recovery Site.
10. For authentication, keep the default settings (which should be With Integrated Windows authentication).
11. Change the default database to the SRM database instance name.
12. On the last window, keep the default settings and click Finish.
13. Click the Test Data Source button, which should result in a "test completed successfully" message.
14. Click the OK button to close the ODBC Data Source Administrator window. This step completes the setup of the SRM database and the ODBC data source configuration.

SRM Installation
15. Locate the SRM installation files in c:\files\SRM
16. Click the VMware vCenter Site Recovery Manager link.
17. Click the Next button.
18. Accept the license agreement and click the Next button.
19. Leave the default settings for the Destination Folder.
20. Enter the vCenter Server address and credentials. The number in the server address must correspond with the station you are sitting at. Example: Station 16 would select STU16-VCA\SQLEXP_VIM when installing in the Protected Site, STU16-VC-B\SQLEXP_VIM when installing in the Recovery Site.
21. If you receive a security warning, click Yes to proceed.
22. Keep the default settings for the Certificate Source.
23. For the Organization and Organization Unit, enter VMware.
24. Provide a Local Site name and Administrator E-mail address.
25. In the Database Configuration window, use the Data Source Name drop-down menu to select the ODBC DSN created earlier. Enter the database user credentials.
26. Click the Install button to initiate and complete the SRM installation.

SRM Plugin installation
27. Going back to the original VMware vCenter Site Recovery Manager Installer window, select the VMware vCenter Site Recovery Manager Plugin link to start the SRM plugin installation.
28. Click the Next button.
29. Accept the license agreement and click the Next button.
30. Click the Install button.
31. Click the Finish button.
32. You can verify the SRM plugin installation by opening the vSphere Client and clicking on the Plug-ins menu item.
33. You can also click on Home in the menu bar and look for the Site Recovery button under Solutions and Applications.

Storage Replication Adapter (SRA) installation
34. Navigate to c:\files\SRA. Double-click the FalconStorSRA executable to begin the SRA installation.
35. Click the Next button to begin the installation.
36. Accept the license agreement and click the Next button.
37. Enter the customer information and click the Next button.
38. You will be prompted for a keycode. This can be found in the text file located in c:\files\SRA. Copy and paste this keycode into the Keycode License window. Click the Next button.
39. Click the Install button.
40. Click the Finish button.
41. It is important to review the readme file included with SRAs. These often include information about features, known issues, etc. that are important to know when implementing SRM.

42. This completes the installation of the SRA.

Lab 2 – Configuring SRM
Start your vSphere client, and select the SRM application. Your unconfigured SRM should look like this on the protected side:

First, connect the protected side to the recovery side (click Configure).

Add the recovery site's VC (stuXX-vc-b; the short DNS name is sufficient):

Click Next and accept the certificate on the error page (this is self-signed, so you should work with a valid one in your production environment):

Authenticate with administrative credentials:

Accept the certificate again:

Wait for it to go through the connection procedure:

Enter administrative credentials for the remote VC and click OK:

Click Finish and you're through.

Now go configure the Array Managers. Click Add…:

Select FalconStor NSS Series from the Manager Type pull-down menu:

Fill out the rest of the information and click Connect:

The array(s) should show up in the lower part of the window:

Click OK and your Protected Site Array Managers should look like this:

Click Next and repeat the process for Recovery Site Array Managers. It should look like this when completed:

Click Next and confirm the datastores:

Click Finish.

Click Configure on Inventory Mappings to pull up:

Select VM Network and click Configure…

Select VM Network and click OK.

Configure the Protected Apps Resource Pool to map to Recovery Apps:

Map the Protected Apps Virtual Machine Folders to Recovery Apps:

Click on Site Recovery to get back to the SRM home screen:

Go to Protection Groups and click Create a Protection Group called Production Group:

Click Next and select the protected datastore. Note the VMs show up at the bottom:

Click Next and select the datastore labeled "esxXXb-shadow" for your placeholder VMs:

Click Finish.

To set up the Recovery Plan, you need to move over to the recovery side vCenter. If you are using vCenter Linked Mode like we are in the lab, it is easy to do. Just pull down and select the appropriate vCenter from the drop-down Site Recovery list in the breadcrumb trail:

You will get a similar window, but notice the Protection Setup section is almost empty. That's OK. We're not protecting anything here. Go to Recovery Plans near the bottom and click Create.

Name the Recovery Plan:

Click Next and select Production Group to protect.

Click Next. Accept the defaults for Response Times:

Click Next. Map VM Network to the Test Network:

Click Next. There are no VMs that we are suspending, but feel free to look:

Select Production Recovery Group on the left:

Click on the Test button:

Confirm:

Move over to the Recovery Steps tab to watch the progress:

At around 54% on the progress bar, the default message will show:

Click Continue. Notice after it finishes step 11, all info disappears! Not to worry, SRM has saved it in a History Report. Click on the History tab and select your RP and click View:

That will pull up an HTML report detailing the entire Test process.

Lab 3 – IP Customization
You have a healthy SRM implementation and are protecting hundreds of virtual machines. You now want to ensure your VMs are connected to the right network on the recovery side for testing and for real failovers. You have created a set of VLANs at the recovery facility that will be used during a test, but since the IP information in the recovery facility is different from what you have in production, you need a way to change the IP information for each VM during a test or actual failover. When you have only a few virtual machines in your recovery plan, it is relatively easy to create a customization specification to change the IP information and connect it to the individual virtual machines. However, when you have 50, 80 or hundreds of virtual machines it becomes much more time consuming to create a custom specification for each one. The bulk IP tool we use here is designed to make it easier to create custom specifications. This lab will help you understand this tool and how to use it.

Helpful Starters
The general idea when working with the Bulk IP Load tool is to create a CSV file that contains vital information about the virtual machines which are a part of the recovery plan. When first created, the CSV file contains a list of virtual machine names, and then several IP related fields adjacent to the name that may be modified to fit your needs. You use this template as a starting place to provide the proper IP information which will be required by the virtual machines at the recovery site. When complete, you import the CSV back into SRM. This process creates a series of customization specifications which will be read during the boot process for the affected virtual machines on the recovery side.

Procedure Hints
1. Think: You are working to customize a recovery plan, which side should you be working in?
2. Run the dr-ip-customizer executable. You will find this utility in the VMware Site Recovery Manager program files folder under the bin directory. You will need to execute the utility from a DOS prompt so that you may provide it with the correct parameters. To tell the utility to pull information from SRM, your command should look as follows:
   a. dr-ip-customizer.exe --cfg ..\config\vmware-dr.xml --csv c:\down.csv --cmd generate
3. You will be prompted to trust a server twice, and you will need to authenticate as well.
4. Did the command succeed? Tip: If the command worked, you should have a file named "down.csv" in the root of the C:\ drive.
5. You can now use Microsoft Excel to open the CSV file. NOTE: Excel is not installed on any of these machines. Instead, Excel.exe is wrapped in a VMware ThinApp package and simply runs as a self contained exe. Check out VMware ThinApp when you have a chance!
6. Please see below for an example of a clean dr-ip-customizer export. The 2nd column contains the name of your virtual machine(s).
7. Now you will introduce some changes to the file so the referenced virtual machines will be associated with new customization specifications. For simplicity's sake, we will be modifying only 1 virtual machine and we will use the one on the bottom of your list. Follow these steps closely when modifying the file:
   a. Click the row header to highlight the last row, and copy it to the clipboard.
   b. In the blank row directly below your last row, paste the contents of the clipboard. Tip: Before you paste, click on the first cell in that new row (in the A column).
   c. In your new row, change the value in the Adapter ID column from 0 to 1. Tip: The values for Adapter ID can range from 0 to 4. 0 means "global", and 1-4 refer to specific adapters. Since we have only 1 adapter, its ID is 1.
   d. In the DNS Domain column, type vmworldtest.com.
   e. In the IP Address column, type dhcp.
   f. Save the CSV file.
8. Import the file back into SRM by executing the following command:
   a. dr-ip-customizer.exe --cfg ..\config\vmware-dr.xml --csv c:\down.csv --cmd create
9. You will be prompted to trust twice again and authenticate.
10. Verify your import was successful. Think! Dr-ip-customizer creates customization specifications. Where in vCenter can you view Customization Specifications? Tip: Go to the Home screen.
11. To complete this lab, you can run a test recovery and when you get to the yellow pause message (in the Recovery Steps tab), go to the virtual machine which was associated with the customization specification and check out its IP information. The DNS domain should now be vmworldtest.com and it should be configured to use DHCP.

Reference Materials
Sample 1 – Bulk IP Load Screenshot
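The spreadsheet edits in this lab (copy the last row, set Adapter ID to 1, set the DNS domain, set the IP address to dhcp) can also be scripted, which may help when many virtual machines need the same change. The sketch below uses Python's csv module and works on the CSV text only. The column header names ("Adapter ID", "DNS Domain", "IP Address") follow the wording of the lab steps and are an assumption; check them against the header row of your own generated file before relying on this.

```python
import csv
import io


def add_adapter_row(csv_text: str,
                    adapter_col: str = "Adapter ID",
                    dns_col: str = "DNS Domain",
                    ip_col: str = "IP Address") -> str:
    """Append a copy of the last row with adapter 1, the lab DNS domain and DHCP.

    Hypothetical helper: the column names are assumed, not taken from a real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if not rows:
        raise ValueError("the CSV has no virtual machine rows")
    for column in (adapter_col, dns_col, ip_col):
        if column not in (reader.fieldnames or []):
            raise ValueError(f"column not found in header: {column}")

    new_row = dict(rows[-1])               # copy of the last virtual machine row
    new_row[adapter_col] = "1"             # 0 means global, 1 is the first adapter
    new_row[dns_col] = "vmworldtest.com"   # domain used in this lab
    new_row[ip_col] = "dhcp"
    rows.append(new_row)

    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return output.getvalue()


if __name__ == "__main__":
    # Made-up sample data that only mimics the shape described in the lab.
    sample = "VM ID,VM Name,Adapter ID,DNS Domain,IP Address\nvm-1,web01,0,,\n"
    print(add_adapter_row(sample), end="")
```

Working on text rather than on c:\down.csv directly keeps the original export untouched until you have reviewed the result.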

You are applying extended attributes to a virtual machine that is being protected by SRM so that when it is recovered, it automatically obtains custom IP attributes by taking direction from the customization specification.

Conclusion
In Lab 2, you utilized the bulk IP customization utility to alter the IP information for a virtual machine in a recovery plan. This utility created a customization specification and associated it with the virtual machine. Dr-ip-customizer saves administrative time and minimizes errors. This is a considerably valuable tool, especially for very large SRM environments where IP information may need to be altered for a large number, or even all, virtual machines.

Lab 4 – Script Intro

Scenario
In this lab, we are going to investigate the use of scripts during a recovery effort. Scripting is quite powerful, and the ability to make callouts during a test or full recovery will serve to even further streamline the recovery effort. For this lab, we will make a simple call to a pre-written script. This script is a control script and thus calls another script, which simply records the date and time of a virtual machine power on operation during a test recovery.

Helpful Starters
1. With scripting, syntax is important. To that end, remember to always use full path references and be sure of your spelling and punctuation.
2. Consider if the script should be executed before the VM power on, or after it boots.
3. Think: What side of SRM contains a list of the actual virtual machines that are being protected?

Procedure
1. Be sure there are two scripts located on your SRM server. There should be scripts named call.cmd and test.cmd. Both should be located in c:\scripts. If you do not see these scripts, please let a lab proctor know.
2. On the Protected side, highlight your protection group in the left pane, and click on the Virtual Machines tab in the right pane.
3. For each virtual machine in the list, click Configure Protection.
4. Click Next through to the very last section, Post Power On.
5. Click Add Command to insert a call out to your control script. Type the following into the Add Command dialog box:
   a. C:\windows\system32\cmd.exe /C "c:\scripts\call.cmd"
6. Click Finish to store the configuration. Be sure to repeat steps 4 and 5 for each virtual machine in the protection group.
7. Your virtual machines are now configured to run a script post power on. Flip to the recovery side and run a test failover.
8. When the virtual machine is powered on for test in the recovery facility, the script will execute.
9. The scripts are configured to record the date and time stamp (for lack of anything else more interesting) to a log file located in the c:\scripts directory of the SRM server on the recovery side.
10. Once the yellow banner appears in the Recovery Steps tab, open up the c:\scripts\test.log file. You should see date and timestamp entries.

Reference Materials
Things to Remember about Scripts

Conclusion
In Lab 4, you learned how to inject scripts into your SRM environment. Scripts are useful for a variety of reasons from simple diagnostics and logging to more complex integration with the DNS environment, load balancers, and other services which may require updating on the recovery side.
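The contents of the lab's call.cmd and test.cmd are not reproduced in this guide. As a rough illustration only, the Python sketch below mimics what the Lab 4 scenario describes: a callout appends a date and time entry to a log file, and the log is read back after the test. The function names record_power_on and read_entries, the message text, and the timestamp format are all hypothetical, and Python merely stands in for the Windows batch scripts the lab actually uses.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def record_power_on(log_path: Path, when: Optional[datetime] = None) -> str:
    """Append one date/time entry to the log file and return the line written.

    Hypothetical stand-in for the lab's logging script; the real test.cmd
    is not shown in the guide.
    """
    stamp = (when or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    line = f"VM power on recorded at {stamp}"
    # Create the log directory if it does not exist yet.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Open in append mode so every test recovery adds a new entry.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(line + "\n")
    return line


def read_entries(log_path: Path) -> List[str]:
    """Return the non-empty lines of the log, oldest first."""
    if not log_path.exists():
        return []
    text = log_path.read_text(encoding="utf-8")
    return [entry for entry in text.splitlines() if entry.strip()]
```

In the lab layout the log would be c:\scripts\test.log on the recovery-side SRM server, so a call might look like record_power_on(Path(r"c:\scripts\test.log")). Passing a full path, as the lab advises for scripts in general, avoids depending on whatever working directory the callout happens to run in.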

What's New – additions or deletions or changes

2/26/11
• Added info on what a PowerShell script should look like to be executed by SRM – see it on page 34.
• Added info on an error – configuring protection group timeout error – on page 71.
• Added some new release info (on SRDF ARA, VSI4, and Celerra SRA) in EMC – page 82.
• Added additional resources for working with certificates on page 45.
• Added another couple of best practices around multi – extents and application discovery and mapping software – on page 27.
• Corrected spelling / grammar issues.
• Added a simpler solution and a KB about it to issues installing / uninstalling on Win2K8 – thanks again Brock – see page 11 for more info.
• Added additional information on NetApp MetroCluster from Lee D on page 77.
• IBM SVC and Java issue added on page 74.
• I added some info to using the dr-ip-customizer tool on page 41.
• Added a reference to the new SRDF techbook in the EMC \ SRDF section.
• Added a section on what rights are required to execute recovery plans when you don't want your user to be a VC admin user – page 51.
• Updated some IBM SVC SRA and issue – thanks to Brock for sharing.
• Added some miscellaneous information to the install account section as a result of comments from Brock (thanks BTW). See page 9 for more info.
• Added some blog links to scripting on page 33.
• Added the Win2K8 R2 install log location on page 39.
• Added a section on failback plug-ins on page 31.
• Add a section on Shared Recovery (Page 31). See this info on page 94.
• I added a section on upgrading to 4.1 on page 11.
• I added a reference to my suggested alarms blog on page 48, as well as updating some of the recommended alarms.
• Added an important new best practice (#20) on page 27.
• Added various URLs (evaluation guide, storage and release notes).
• Added some info to protecting View desktops on page 30.

12/31/10
• Added some info in best practices to improve the section (points 10 through 13 – page 27).
• I added more to the best practices – #15, 16, 17, and 18 on page 27. Also added 19 as well about using SQL accounts.
• Added important note about the VSI and how it may not write SRA configuration information out – page 84.
• I increased the complexity of the sample demo script on page 33.
• Added info about a NetApp issue on page 75.
• Added a section about not being able to failback to a lost protected site on page 32.
• Added a MirrorView issue and solution on page 72.
• Added a URL to a blog about LHN lessons learned – in LHN – page 74.

• Added information on case sensitivity.
• Add URL reference to the new SRDF techbook.

5/22/10
• Added info on Network device needed by recovered virtual machine error. but also a new suggestion (number 14) in best practices on page 27. yet. and a blog reference to SRM 4.
• Added PowerShell signature issue and solution on page 71.
• Added RecoverPoint TCP port info. or dangerous.
• Added some EMC video links.
• Added error info on not visible LUN (not able to create a PG with it) to page 72. 1 build information
• Added info on tweaking the log parameters (Page 37)
• Added some info on best practices (Page 27)
• Updated minimum Alarm notification suggestions (Page 48)
• Added two URLs to help with HDS SRA installation (Page 81)
• Added some info on SQL authentication and starting SRM issues.
• I added some details to the best practices section (Page 27)
• Added a new section to the IBM SRA section IBM DS4000/5000 (Page 93)
• Added a new issue – install hangs at 90% .
• Added info on a script error – page 35.

8/7/10
• Added info on upgrading to SRM 4.
• Added a section on P2V DR – page 31. See page 56. so make sure you know what you are doing!
• How to reset the Celerra root and NASADMIN accounts (Page 82)
• Account solution help for an error (Page 71)
• I added a little more info.
• Updated the network device not found info – page 62. 1 (Page 11)
• Added 4. Page 30). 1 SRM licensing on page 42.
• Some general readability improvements. 0 section a little – page 43.
• Added a brief note on when SRM is not a solution to consider.
• Added some extra info on 4. with its information can be useful.
• Added a Can I change the Run button to work like the Test button section?
• Updated the how to find a name of a VM when I have a MoRef article.
• Added info on a hard to troubleshoot / fix error with a pop up error about not being able to protect a VM – page 72.
• Added information on what VM parameters are not failed over – see page 35
• Added info on high priority start order and multiple protection groups – page 36. on Page 71.
• Updated SRM scripting info in several places.

10/16/10
• Added a little text on the title page to explain this guide. and replicated LUNs issue with the DS8000 SRA – page 94.
• Added Application References section – see page 30. 1 licensing (Page 42)
• Added a little more detail to where SRM doesn't fit (Page 8)
• Added a section on protecting View desktops – be warned it doesn't.
• Added info on install error with Perl – page 72. include SRM.
• Added an error / solution – operation timeout.
• Also clear up the 1.
• Miscellaneous link and text updates – mostly spelling / grammar.

• Added info on the –null Celerra issue – it is supposedly fixed.
• Added a SRDF white paper URL.
• Added info in the script section to show how you could see the variables that are available during the run of the failover or test.
• Added info on time required to protect VMs. x would handle a expired license situation.
• Added additional info on the mirrorviewsracore.dll issue and SE info in the EMC MirrorView section. While they are designed for a specific lab.
• Added information on expectations you can have for how long to fail over.
• Fixed error in script example.

3/26/10
• Added another HP link to online help for the EVA and SRM.
• Added an additional suggested alert condition.
• Added Labs to work with the SRM Boot camp at PEX.
• Added two new alarms to the recommended alarms.
• Working on improving spelling and grammar.
• Added info on SRDF / SRM EMC SRDF licenses.
• Added info on what travels with a VM between recovery plans.
• Added information on FalconStor SRA log levels.
• Added info on the two NetApp SRA question.
• Some additional info for the HP EVA.
• Added a solution to SRM not starting with event log errors 7000 / 7009.

12/12/09
• Add some additional info on redoing the SRM db.
• Added some additional info on IP Customization.
• Did some miscellaneous spelling corrections.
• Added to the EMC RecoverPoint section the issue with using the ~ in CG names.
• Added problem / solution of failed to connect to NFC.
• Added information to change the concurrent power on value. Slowly but will keep at it.
• Added some extra info about which storage arrays support application consistency.
• Added some basic detail on backups of SRM.
• Added KB article to IBM SVC for an error previous reported but KB article has extra info.
• Added information on the three things that derail SRM projects and Proof of Concepts.

2/4/10
• Added some additional Script info. 0 and SRM 4. including a sample.
• Added an update to the Celerra and –null issue.
• Added a workaround for an issue with RecoverPoint and a CG with a large number of LUNS.
• Added a solution to a Compellent SRA issue on Win2K8.
• Corrected the path to the SSL and NetApp document.
• Corrected the path to the how to use trusted certificates document. they are still useful for someone who wants to learn more.
• Added detail about protecting multiple tiered applications.
• Added info on thick LUNs to MirrorView.
• Also provided info on how SRM 1.
• Some general cleanup and adding of info to various sections of the document.
• Added link to issue / solution for CLARiiON issue.

Thanks Rob!!
• SRM build numbers added.
• Added to install section info on 15-character limit for VC username.

8/8/09
• Comments about starting vmware-dr.exe in the general troubleshooting section.
• Added info on extending script timeouts. For HDS.
• Added a section on vendors and their tools that can do application consistent replication. and added SRM 4.0 license info in post Update 1 of vSphere.
• Added some SPC-2 info to EMC DMX area.
• Added info on dr-ip-customizer issues

10/2/09
• Miscellaneous updates for SRM 4.0 new method.
• Update.
• Added info on Heartbeat and SRM
• Finding a VM moid.
• Added a section on vendors who can do application consistent continuous replication.
• Added info on Symmetrix and duplicate WWN issue.
• Added a LHN error / solution (array "xxxx" not found). 0 performance whitepaper
• Add info on repairing in SRM 4.
• Some additional IP customization things to watch out for.
• Added info on avoiding shutdown tracker prompt.
• Added how to catch the install and srm-config log files during install.
• Corrected Changing passwords after SRM is working for SRM 4.0 info.
• Added an MV error and solution. and background on MV in the general section.

11/27/09
• Add info on Linux IP customization issue.
• Added a screen shot for what new license looks like. 0. SRDF.
• Various edits and suggestions from Rob N.
• Fixed a misspelling in EditPlus section.
• Also add some grayed out create PG general help in troubleshooting. and clarity around how many VMs can be started.
• Added the –null Celerra error.
• Added problem and solution of the proxy issue after Update 1.0 (Repair SRM in add remove instead of srmconfig). SRM 4.

10/30/09
• Add additional info.
• Added info on IBM path issues
• Re-initialize SRM database
• URL to SRM 4.
• Added info on how to change the MirrorView SRA to support 3 simultaneous recovery plans instead of the default and supported one.
• Correction for how many VMs can be started in a SRM 1.
• Added some MirrorView port usage info to the EMC MirrorView section.
• Added info on incompatible host / resource pool etc.
• Some HP EVA info was added. 0 world.
• HP EVA setting issue added. EVA added in info about the replication direction changed during a failover.
• Added some additional information on number of character limits for a number of tools.
• Added info around why you cannot see a LUN during array config.

• Updated the app test plan.
• Documentation bug – non-zero exit crash not true
• Added some info on what database corruption looks like.

6/13/09
• Added info on scalability.

6/4/09
• LeftHand VSA upgrade info
• Added info on syntax highlighting.
• Two possible solutions for SRM not starting.
• Added Celerra section to EMC and that it works with simultaneous recovery plan operation.
• Additional info on log file component logging.
• Information on uninstalling an SRA which stops SRM from working!
• Some info on MS licensing and DR testing.
• Celerra 2.0 SRA logs location.
• Added EMC SRDF error and solution to the Troubleshooting section and NOT to the EMC section.
• Updated MirrorView for possible working of simultaneous recovery plan operation. Also said it works for RecoverPoint.
• NetApp MetroCluster background info (thanks Lee!)
• A RecoverPoint / DMX / SPC2 issue!

6/2/09
• Added info on SVC and simultaneous recovery operations.

7/5/09
• Added Linux customization log info
• Upgrade information for next release
• RecoverPoint log location
• Added symapi_c_net_handshake_failed error to SRDF section

6/20/09
• Added some additional info on upgrades.
• Added a post install test outline.
• Some additional clarity on what the Repair button is for.
• Added additional info on uninstalls.
• Added the error message to changing install passwords after install and how to fix.

Filename: Z:\Downloads\SRM Reference Guide_x.docx
Revision: 46
Last Save By: Michael White at 2/26/2011 16:16
Created by: VMware, Inc at 8/8/2010 10:52
Last printed at: 2/26/2011 16:16
Comments / suggestions / corrections / changes to MWhite@VMware.com
