Day 4
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Remote Copy Overview
Agenda
These slides are HP Confidential and are intended for restricted HP internal use only.
To share this information with partners or customers, use the other officially
permitted HP documents/presentations.
What is Remote Copy? (1 of 2)
Remote Copy is the disaster recovery feature for HP 3PAR products.
Modes of operation:
Synchronous (sync):
Data is replicated in real time from the primary to the secondary. This gives an RPO (Recovery Point
Objective) of zero at the expense of increased service times.
What is Remote Copy? (2 of 2)
Starting with HP 3PAR StoreServ OS 3.1.2, some Remote Copy topologies support
mixing synchronous and periodic asynchronous replication between a pair of nodes.
− For the topologies that support mixing synchronous and periodic asynchronous modes, you cannot mix
the modes on a shared pair of RC links; each mode must reside on its own pair of links.
− Four links are required between the same node pair (4 RCIP links, or 2 RCIP and 2 RCFC links).
− Synchronous remote copy groups are created using a synchronous-mode target.
− Periodic groups are created using a periodic-mode remote copy target.
Remote Copy Advantages
Remote Copy Terminology (1 of 2)
Remote Copy Terminology (2 of 2)
RC Transport Layers
Remote Copy Maximum Latencies
• Optical fibre networks typically have a delay of ~5 µs/km (0.005 ms/km)
• Thus 2.6 ms of round-trip latency allows fibre link distances of up to 260 km:
2 × 260 km = 520 km; 520 km × 0.005 ms/km = 2.6 ms
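The round-trip arithmetic above can be checked with a short calculation; the function name is illustrative, while the 5 µs/km delay and the 2.6 ms budget are the figures quoted on this slide:

```python
# Map a round-trip latency budget to a maximum one-way fibre distance.
DELAY_MS_PER_KM = 0.005  # ~5 us/km of optical fibre, one way


def max_one_way_distance_km(round_trip_budget_ms: float) -> float:
    """Longest one-way fibre run whose round trip fits within the budget."""
    round_trip_km = round_trip_budget_ms / DELAY_MS_PER_KM
    return round_trip_km / 2


# A 2.6 ms round-trip budget allows roughly a 260 km one-way link
# (520 km round trip), matching the numbers on the slide.
```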
RC Directions
Bidirectional
− A given storage system pair is said to be under bidirectional replication if replication
is occurring in both directions (each array is primary for some groups and secondary for others).
Remote Copy Links
Sending Link: created manually during the Remote Copy setup.
The commands that configure the sending link are:
• admitrcopylink
• creatercopytarget
HP 3PAR Remote Copy uses sending links to transmit data from a system to its
remote-copy target system (the other system in the remote-copy pair).
Receiving Link: automatically created on all nodes that have sending links configured.
HP 3PAR Remote Copy uses receiving links to:
• listen for remote-copy data and commands from the target system in the remote-copy pair
• read the incoming data and commands
• send the data and commands to the appropriate remote-copy process
Remote Copy Groups
• Volumes on the primary system are added to a Remote Copy group
− The group is the basic configuration unit of Remote Copy
− The data in the volumes of a group is time-consistent
• The group has a target associated with it
− The target is either FC or IP and identifies a remote system
− A target has links associated with it; the links are used to communicate with the remote system
− Targets must be configured on all systems using Remote Copy
• The target of a Remote Copy group is known as the secondary
• Each group is in either periodic or synchronous mode; all the volumes in the group operate in the same mode
• Data replication starts when a group is started
• Secondary group name =
− <primary_group_name>.r<primary_system_ID>
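The naming rule can be illustrated with a tiny helper (make_secondary_name and the sample values are hypothetical; only the <primary_group_name>.r<primary_system_ID> pattern comes from this slide):

```python
def make_secondary_name(primary_group_name: str, primary_system_id: int) -> str:
    """Secondary group name: <primary_group_name>.r<primary_system_ID>."""
    return f"{primary_group_name}.r{primary_system_id}"


# A hypothetical primary group "oracle_rc" on a system with ID 1234
# would get a secondary group named "oracle_rc.r1234".
```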
Snapshots
• Snapshots are critical to Remote Copy operation (no snapshot license is required,
but one is strongly recommended)
• A snapshot is a view of a volume at a particular point in time
• Remote Copy typically uses coordinated snapshots
− Traffic to all volumes in a group is blocked and snapshots are taken
− This ensures that the snapshots of all volumes represent the state of the volumes at a single, distinct
point in time
• Remote Copy uses resync snapshots when syncing data
− A resync snapshot is the starting point of a sync (it can also be used as a recovery point)
− A sync snapshot is the target state of a sync
Remote Copy Synchronous
1. I/O to local array
2. Replicate to remote site (X ms link latency)
3. Acknowledgment from remote
4. Acknowledgment to host
5. I/O complete
Sync Mode Basic Operation
Disaster Tolerant Solution Considerations
Synchronous replication server performance impact
• A host-initiated write is performed on both the active and the backup storage servers before
the host write is acknowledged:
− On the active storage server, the write is written to the write cache of two nodes; this is the standard
redundancy of the 3PAR InServ
− Concurrently, the write is sent via the communication link to the backup storage server
− The write request is written to the write cache on two nodes of the backup before it sends an
acknowledgment to the active system
− The host write is acknowledged once the local cache update completes and the remote
acknowledgment is received
• Server write I/O performance is paced by both the speed of the inter-system links
and the network latency on those links
• Total write IO service time on the primary array includes:
− Local array IO service time
− Replication latency (which can be much higher than the network “ping” time)
− Remote array IO service time
• Server write IO performance decreases as the link latency increases or the link speed decreases
• Provisioning a larger (faster) link can help reduce the IO service time for an individual
replicated IO, but the IO service time the server sees will always be larger than
the link latency plus the local array IO service time (it may be a lot larger)
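As a back-of-the-envelope model of the components listed above (the function is a sketch, and the sample numbers are illustrative, not measurements):

```python
def sync_write_service_time_ms(local_ms: float,
                               replication_latency_ms: float,
                               remote_ms: float) -> float:
    """Lower bound on the host-visible write service time in sync mode:
    local array service time + replication latency + remote array service time.
    The actual time the server sees can be considerably larger."""
    return local_ms + replication_latency_ms + remote_ms


# Illustrative: a 0.5 ms local write, 2.6 ms of link latency, and a 0.5 ms
# remote write give a floor of roughly 3.6 ms per replicated write.
```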
Asynchronous Periodic Mode
• Host writes are performed on the active server
• The host write is acknowledged as soon as the data is sent to the cache of two nodes
(normal host write acknowledgment)
Remote Copy Asynchronous Periodic
Primary site and remote site:
1. Initial copy: snapshot A of the primary volume (P) is copied in full to the remote site (SA)
2. Resynchronization: a new snapshot B is taken and the delta (B − A) is copied to the remote site (SB)
3. Upon completion: the old snapshot A is deleted
Remote Copy Asynchronous Periodic Multi-Volume IO Consistency
• For an application using a single volume by itself, IO consistency is always ensured, as the delta data
is applied in an all-or-nothing fashion
• For an application that spans multiple volumes, a Remote Copy “Volume Group” containing the
volumes ensures that IO consistency is maintained across the volumes
• During the delta resynchronization of volumes in the Volume Group, RC creates snapshots of all the
volumes before the resynchronization starts
• If RC is in the process of updating volumes in the Volume Group when a failover occurs, all
volumes in the Volume Group are promoted back to the last snapshot that was taken, hence
ensuring IO consistency across the volumes
showrcopy
root@inodee4cd3:~# showrcopy
Target Information
Link Information
Group Information
showrcopy -d
During sync:
Group Information
Name ID Target Status Role Mode LocalUserCpg LocalSnapCpg RmUserCpg RmSnapCpg Options
After sync:
Group Information
Name ID Target Status Role Mode LocalUserCpg LocalSnapCpg RmUserCpg RmSnapCpg Options
Supported Topologies
Remote Copy One-to-One (1:1) Topology
Sync
• RCFC or RCIP, 2-4 nodes, bidirectional Sync RC links
Periodic
• RCFC or RCIP, 2-4 nodes, bidirectional Async Periodic RC links
• FCIP, 2 nodes, bidirectional
Mixed Sync and Periodic
• RCIP or RCFC, 4 nodes, bidirectional
• RCIP periodic, RCFC sync, 2-4 nodes, bidirectional
(Diagram: two 3PAR StoreServ arrays replicating bidirectionally; group A on one array replicates to A′ on the other, while group B′ receives from group B in the opposite direction.)
HP 3PAR Remote Copy Many-to-One (N:1) Topology
N:1 Topology
• Only supported with Asynchronous Periodic replication
• StoreServ requirements:
− The current maximum supported is 4:1 InServs
− One of the four primary InServs can mirror bidirectionally with the target, hence protecting the
target array’s data
− If the solution is FCIP based, then the target StoreServ requires two nodes for every primary
StoreServ
• One relationship may be bidirectional
• Cannot mix different transports (all links are RCIP, all RCFC, or all FCIP)
• Sync is not supported
(Diagram: primary sites A, B, and C each replicate (RC) to a single target site.)
One-to-Many (1:N) Topology: 3.1.2 Support
Source periodic
• RCIP, 2-4 nodes, bidirectional
• RCFC or FCIP, 4 nodes, bidirectional
• RCFC/FCIP and RCIP, 4 nodes, bidirectional
Source sync
• RCIP, 2-4 nodes, bidirectional
• RCFC, 4 nodes, bidirectional
• RCFC and RCIP, 4 nodes, bidirectional
N is a maximum of 2. Supported configs:
• Async Periodic between all arrays
• Sync between one pair and Async Periodic between the other pair
Note: different volume groups are being replicated (1:N). Bidirectional replication is supported
starting with HP 3PAR OS 3.1.2.
(Diagram: a source array replicates group P1 to Target 1 as P1′ and group P2 to Target 2 as P2′.)
Remote Copy: Supported Topology
Synchronous Long Distance (SLD) Topology: 3.1.2 Support
Source to sync target
• RCFC or RCIP, 2-4 nodes, bidirectional
• In an SLD configuration, bidirectional synchronous replication is supported between the Source
and the Sync Target starting with HP 3PAR OS 3.1.2
• Supported configs: only one Source and Sync Target pair is allowed to be bidirectional;
the Source/DR pair is unidirectional as before
• Allows customers with 3 datacenters (DCs) to deploy primary applications on their Source and
Sync sites, and protect their data/applications using a storage system located in another DC
Source to periodic target
• RCFC or RCIP, 2-4 nodes
• FCIP, 4 nodes
• Unidirectional
Sync target to periodic target
• RCFC or RCIP, 2-4 nodes
• FCIP, 4 nodes
Two SLD configurations are supported across three arrays, with bidirectional synchronous
replication between the sources via RCFC or FCIP.
(Diagram: two SLD configurations sharing three arrays; each source replicates synchronously to a sync target on the peer array and asynchronously to a shared async target.)
Topology Enhancements in 3.1.3
Many-to-Many (M-to-N) Remote Copy
The latest and greatest 3PAR Remote Copy Config Guide, including 3.1.3
enhancements, is available at:
Disaster Recovery
Disaster Recovery Actions
Supported Disaster Recovery Actions:
• Reverse
• Failover
• Recover
• Restore
• Revert Failover
• Switchover
Reverse
Failover
Recover
• This action can be performed on groups for which the failover command has already
completed successfully.
• The groups are then started and synchronized; delta changes from the backup array
are synced to the original primary array.
• When recover is complete, the Remote Copy system is working normally, but in the
reverse direction.
Restore
• Restore follows Recover in order to return Remote Copy to its original direction
• Used on groups where the recover operation has been performed
• When complete, the groups are operating in the normal, natural direction
Note: this sounds similar to a reverse, but a reverse does not sync the data from the
secondary back to the primary!
Revert failover
You can undo a failover operation by reverting the Remote Copy groups to their normal
state. When Remote Copy starts syncing after the revert, it overwrites any data that was
written to the backup system volumes.
• When you revert, the data on the destination volumes is restored, using the snapshot,
to its state at the time replication was stopped.
• The volumes remain writable (RW) on the source, but on the destination they become
read-only (RO).
Note: the snapshots on both source and destination remain until replication is started.
• Replication has not started at this stage; user action is required to start it
unless autostart is enabled.
• Start replication; when the volumes are synced, both snapshots are deleted.
Switchover
Disaster Recovery Process
2. Reverse the natural direction of data flow and synchronize the systems.
• setrcopygroup recover
Limits
Remote Copy Limits
• Synchronous or mixed (sync/async) configurations support 800 replicated volumes in 3.1.2
and 2400 in 3.1.3
• The maximum number of volumes per group is 100 in 3.1.2 and 300 in 3.1.3
• More volumes means longer snapshot creation, hence longer failover times
• An HBA can be shared between host and RC: a dedicated port for RC, the rest for host use
• StoreServ 10000, E, F, S, and T classes all require dedicated RCFC HBAs if running 3.1.2 MU1 or earlier
• One RCIP and/or one RCFC port per node until 3.1.2; up to 4 RCFC ports per node in 3.1.3
• You can configure one sending link per node per target system
Remote Copy Timeouts
When a failure occurs such that all links between the systems are broken:
Synchronous:
• After 15 seconds, the system marks the sending links as Down
• After another 15 seconds, the system marks the targets as failed
Asynchronous:
• After 25 seconds, the system marks the sending links as Down
• After another 200 seconds, the system marks the targets as failed
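The two-stage timeouts above can be written down directly (the function name is illustrative; the constants are the values from this slide):

```python
# Seconds after total link loss until each state transition, per the slide.
SYNC_LINK_DOWN_S, SYNC_TARGET_FAIL_S = 15, 15
ASYNC_LINK_DOWN_S, ASYNC_TARGET_FAIL_S = 25, 200


def seconds_until_target_failed(mode: str) -> int:
    """Total time from link failure until the target is marked as failed:
    links are marked Down first, then the target fails after a further delay."""
    if mode == "sync":
        return SYNC_LINK_DOWN_S + SYNC_TARGET_FAIL_S
    return ASYNC_LINK_DOWN_S + ASYNC_TARGET_FAIL_S


# Sync targets fail 30 s after link loss; async targets after 225 s.
```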
Gotchas
Performance (impact)
Local Write Request with Remote Copy
After the write data is mirrored between the nodes connected to common HDDs, the write
data must be DMAed to one of the nodes performing Remote Copy replication. Note that
the nodes being used for Remote Copy can (and will) see higher CPU utilization, and
they will consume more write cache than nodes not performing Remote Copy.
• A node can tell from the information in the write request (the LUN and the LBA of the
write) which LD (and hence which node) the write data will ultimately go to
• The node can also tell which nodes are responsible for Remote Copy replication
and hence must receive a copy of the data
• In this example, the write is destined for an LD that is owned by Node 2 or
Node 3, even though the request comes in to Node 2
(Diagram: a four-node 3PAR Gen4 array, each node with Intel multi-core processors, control and data cache, Gen4 ASICs, multifunction controllers, and PCIe switches, with a Remote Copy link and a virtual volume presented to the server. The server sends the write to a node; that node’s processor determines which node the write is for; the data is DMAed into the owning node’s cache and mirrored into its partner node’s cache, and a copy is placed in the cache of the node driving the Remote Copy link.)
Performance Statistics
• statport
− displays read/write (I/O) statistics for ports
• statrcopy
− displays statistics for Remote Copy volume groups
• statrcvv
− displays statistics for Remote Copy volumes
• checkrclink
− Performs a connectivity, latency, and throughput test between two systems
Interrupt Coalescing
• Enabled by default on RC ports
• Disabling can reduce service times for single-threaded apps
• Only use on sync replication
Performance Impact
Remote Service Times
(Chart: remote service times in ms, 0 to 400, against elapsed time in seconds, 0 to 1400, for Group1 and Group2 with the throttle disabled.)
Throughput
(Chart: data rate in kBps, 0 to 900,000, against elapsed time in seconds, 0 to 1400, for Group1 and Group2 with the throttle disabled.)
What’s Happening?
How Writes are Done
Host writes to volumes in sync groups are done in the context of vio server threads:
• There are thousands of these per node
• The local write is done first; then the thread sends the write to the secondary and blocks waiting for a response
• When the remote service times are high, the effect of this blocking mechanism is that throughput is reduced
Solution
• The solution is to reduce the throughput for initial syncs/resyncs in the event of
high service times:
• Generally, when the secondary is overloaded we start seeing a few excessive service times (> 75 ms)
• As soon as an ACK is received with an excessive service time, we reduce the rate by a small amount
• When no excessive service times have been seen for a certain period, we gradually increase the rate
• The rate is controlled on a per-VV, per-node basis
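The feedback behaviour described above can be sketched as a simple loop. Only the 75 ms threshold comes from this slide; the class name, step sizes, and quiet period are illustrative assumptions, not the actual kernel implementation:

```python
class ResyncThrottle:
    """Sketch of the per-VV, per-node resync rate control described above:
    back off a little on each ACK with an excessive service time, and
    creep back up after a quiet period with no excessive times."""

    EXCESSIVE_MS = 75.0   # threshold from the slide
    BACKOFF = 0.95        # illustrative: cut the rate by 5% on a slow ACK
    RECOVERY = 1.02       # illustrative: gradual 2% increase when healthy
    QUIET_ACKS = 50       # illustrative quiet period, measured in ACKs

    def __init__(self, rate_kbps: float):
        self.rate_kbps = rate_kbps
        self.quiet = 0

    def on_ack(self, service_time_ms: float) -> None:
        if service_time_ms > self.EXCESSIVE_MS:
            self.rate_kbps *= self.BACKOFF   # excessive: reduce immediately
            self.quiet = 0
        else:
            self.quiet += 1
            if self.quiet >= self.QUIET_ACKS:
                self.rate_kbps *= self.RECOVERY  # healthy for a while: raise
                self.quiet = 0
```

One instance per volume per node mirrors the "per VV, per node" control the slide describes.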
Performance measurement
Traditional performance checks do not apply:
----- statport -rw -d -iter -----
r/w I/O per second KBytes per sec Svt ms IOSz KB
Port D/C Cur Avg Max Cur Avg Max Cur Avg Cur Avg Qlen
0:1:1 Data r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
0:1:1 Data w 6 6 6 9 9 9 0.4 0.4 1.4 1.4 -
0:1:1 Data t 6 6 6 9 9 9 0.4 0.4 1.4 1.4 0
0:1:2 Data r 98 98 98 8599 8599 8599 1.0 1.0 87.7 87.7 -
0:1:2 Data w 112 112 112 2241 2241 2241 3.6 3.6 19.9 19.9 -
0:1:2 Data t 210 210 210 10840 10840 10840 2.4 2.4 51.5 51.5 4
0:2:1 Data r 223 223 223 422 422 422 1734.1 1734.1 1.9 1.9 -
0:2:1 Data w 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
0:2:1 Data t 223 223 223 422 422 422 1734.1 1734.1 1.9 1.9 382
0:2:2 Data r 91 91 91 356 356 356 4253.4 4253.4 3.9 3.9 -
0:2:2 Data w 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
0:2:2 Data t 91 91 91 356 356 356 4253.4 4253.4 3.9 3.9 382
For RC we use a pre-read of 80 KB, so a long service time means we waited that long for the primary side
to place data in the read buffer.
The timeout is 5 seconds, so 5000 is the highest value you will see there.
Performance in sync mode will improve if IntCoal is disabled on RCFC ports.
Appendix
Background Operations
Code
• Remote Copy spans both user space and kernel space.
• In user space it is part of the sysmgr process; most of the code resides in
rmconfig.c and rmutil.c.
− The sysmgr code is largely responsible for configuration and management.
• Kernel code is contained in mirror.c, rmt_mirror.c, and tickets.c. It is responsible for:
− requesting reads/writes from volumes and identifying differences between snapshots
− managing IO requests and handling link downs/node failures
− sending data to the links
RTI
Ticket Dispenser
• This is a kernel task and is active on a single node
• Whenever Remote Copy wants to send a write to the secondary, it must request
a ticket from the ticket dispenser
• The dispenser’s primary job is to ensure that all Remote Copy IOs are successfully written to the
secondary
• The ticket dispenser checks for overlapping IOs and will not issue a ticket if an
active IO would write to the same memory as the requesting IO
• It also handles load balancing, i.e., it decides which link an IO should be sent on
• In the case of a node down, the ticket dispenser can replay active tickets to
ensure continuity of service
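The overlap check described above can be sketched as follows. The interval representation and all the names here are assumptions for illustration; the real dispenser lives in tickets.c:

```python
def overlaps(a_start: int, a_len: int, b_start: int, b_len: int) -> bool:
    """True if two block/byte ranges intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


class TicketDispenser:
    """Sketch: refuse a ticket while an active IO covers the same range."""

    def __init__(self):
        self.active = {}   # ticket id -> (start, length)
        self.next_id = 0

    def request(self, start: int, length: int):
        # No ticket is issued if an active IO overlaps the requested range.
        for s, l in self.active.values():
            if overlaps(start, length, s, l):
                return None          # caller must retry later
        tid = self.next_id
        self.next_id += 1
        self.active[tid] = (start, length)
        return tid

    def complete(self, tid: int) -> None:
        # The range becomes available again once the IO acknowledges.
        del self.active[tid]
```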
Syncs/Resyncs in Sysmgr
• The rm_util_sync_group() function is called to sync the group
• This spawns threads running rm_util_vv_sync_thread(), one thread per volume
• There is a maximum of 20 threads at any time
• The volume is broken into chunks
− The chunk size is 1/32 of the volume size or 16 GB, whichever is larger
• The thread sends a message to the kernel on each node to sync the particular chunk
− The RMCMD_VVSYNC ioctl is used
• When a chunk is synced on all nodes, the next chunk is synced
• When a chunk is synced, the percentage-synced count is updated
• When a volume is fully synced, rm_vv_sync_cb() is called
• When the final volume is synced, group maintenance is performed in this function
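The chunking rule above (1/32 of the volume size, with a 16 GB floor) can be sketched as follows; the helper names are illustrative, not sysmgr functions:

```python
GIB = 1024 ** 3


def chunk_size_bytes(volume_size_bytes: int) -> int:
    """Chunk size is 1/32 of the volume size or 16 GB, whichever is larger."""
    return max(volume_size_bytes // 32, 16 * GIB)


def num_chunks(volume_size_bytes: int) -> int:
    """Number of chunks the volume is broken into (last may be partial)."""
    chunk = chunk_size_bytes(volume_size_bytes)
    return -(-volume_size_bytes // chunk)   # ceiling division


# A 256 GiB volume: 1/32 is 8 GiB, so the 16 GiB floor wins -> 16 chunks.
# A 1 TiB volume: 1/32 is 32 GiB, larger than the floor -> 32 chunks.
```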
Syncs/Resyncs in Kernel
Sync Mode
• A write request is received from the host and handled in a vio server thread: vol_io_write()
• rm_get_resources() is called to create an RTI and to get a ticket
• The IO is enqueued to the secondary
• The data is copied to the link node if required
• The data is sent out on the link
• The secondary takes the data from the link and enqueues it to a VIO server thread
Remote Copy data verification
Remote Copy data verification
• The need for Remote Copy Data Verification
− Remote Copy does not currently provide any means for customers to compare live primary
and secondary volumes (volumes that belong to remote copy groups in the started state)
− Inconsistencies between the two volumes are not easily detectable
− In fact, there is no easy way to carry out a comparison without using third-party software or
running a file system check (such as fscheck)
• Remote Copy Data Verification
− Provides customers with a data verification tool that compares primary and secondary
volumes and reports inconsistencies
− The key feature is that the comparison can be carried out without stopping the
remote copy group
− Additionally, an option is provided to automatically correct any miscompares that are
discovered
How does it work? (Function)
• checkrcopyvv detects miscompares in passive volumes
− No I/Os occur during the compare operation
− It uses snapshots, rather than the base volumes, during the compare
• This avoids the requirement to quiesce the volumes
• The repair option employs the existing resync code
− It writes the miscompared blocks back to the secondary volume
− The repair option makes some modifications
Rev. 12.41
Configuration and installation
• Remote Copy Data Verification can be used on all Remote Copy configurations
• It can only check one volume at a time; for example, on an SLD system the command
needs to be issued once for the sync target and once for the periodic target
• The command must be issued from the primary 3PAR 7000 system
• The data on the primary 3PAR 7000 system is assumed to be correct; when miscompares
are discovered, the primary volume is used as the source to repair any errors on the
target volume
• The feature requires 3.1.2 or later
• There is no impact to upgrade/downgrade functionality
• There are no special installation or configuration requirements
Performance benefits
• Previously undetected inconsistencies can now be detected and repaired with a single
command.
Any gotchas?
• The command runs at a relatively low priority so as not to impact normal system
operation.
• This means that very large volumes can take a considerable amount of time to
compare/repair.
• It also means that the command can run slowly on very busy systems.
• This is deliberate.
Troubleshooting
• What files to gather (file name and path)
− Output is logged to the sysmgr log: /var/log/tpd/sysmgr
• Commands to run and what output to collect
− No special commands: just the sysmgr log
− The repair operation’s task log can be viewed using the showtask -d <tid> command
Common error messages and limitations
• checkrcopyvv cannot be run using the -v option while the remote copy group is stopped
• checkrcopyvv cannot be run using the -r option while the remote copy group is stopped
Troubleshooting (Cont)
• A second instance of checkrcopyvv cannot be started while another instance is running.
• checkrcopyvv can only be run from the primary 3PAR 7000 system.
• checkrcopyvv cannot continue if there has been a configuration change since the online compare
command was issued.
• The showtask -d command returns a log detailing the progress of the repair sync. The log will
indicate a problem if the repair operation did not complete successfully.
• A secondary snapshot might be left behind if all Remote Copy links go down during a compare
operation. It will expire and be deleted automatically.
What does each error message mean?
• Error messages have been written to be self-explanatory.
• Log messages related to this feature are prefixed with the letters RCDV (Remote Copy Data
Verification).
• Bugs for this feature can be found by searching Bugzilla for the feature ID RCOPY2.
Thank you