Example:
Monitoring alerts using a monitoring tool like EMC Control Center
Follow-up for the pending service requests
Checking ticketing tool for any new tickets
Updating the tickets with the current status
Responding to the assigned service requests
Documenting the service request solutions
Performing Health checks
Preparing Job plans for changes
Participating in bridge calls, if any
Attending internal/customer meetings, etc.
Example:
Providing L2 support for the EMC DMX and Clariion storage
Performing upgrades/downgrades of Firmware
Storage Provisioning for new hosts
Storage Provisioning/ reclamation for existing hosts
Performing Zoning
Troubleshooting Switch issues
Troubleshooting storage issues
Troubleshooting Performance issues
Vendor/customer management during hardware failures
Performing Changes and preparing job plans for changes
Performing TimeFinder/Mirror Operations
Performing SRDF Operations
Troubleshooting failed SRDF/TimeFinder jobs
Monitoring Storage environment using Monitoring Tools (ECC)
Preparing Storage Capacity planning reports
Performing Disaster Recovery Activities, etc.
Example:
Checking the mails for any escalations/ alerts/ new assignments
Checking the monitoring tools for any critical alerts on console
Checking the ticketing tools for newly logged service requests
Checking the pending issues, etc.
Example:
We have one Data Center, one DR Center and 16 branch
offices. The Data Center and DR Center have 2 Symmetrix DMX-4
arrays; each box holds 800 TB of data.
For backup we create mirrors in the Data Center using
TimeFinder/Mirror technology, and for fast local restoration we
create snaps using TimeFinder/Snap technology.
For remote replication (remote backup and DR) we use
SRDF/S technology.
Each branch office has CX-series Clariion arrays. For backup
we create SnapView clones, and for fast local restoration
we create SnapView snapshots.
We use SYMCLI for storage operations on the DMX arrays,
Navisphere Manager for the Clariion arrays, Connectrix Manager for
director management and the CLI for individual Brocade switches. For
monitoring the storage environment, Storage Scope reporting and performance
monitoring we use EMC Control Center. Replication Manager automates
the TimeFinder/Mirror operations, and PowerPath provides path redundancy at
the host end.
Connectivity between the Data Center and the DR Center is over Fibre
Channel.
Connectivity between the branch offices and the DC/DR is over Ethernet.
Example:
We are using SYMCLI/SMC/ECC for DMX storage provisioning
We are using Navisphere Manager for Clariion storage provisioning
We are using Connectrix Manager for director management
We are using the CLI/GUI for individual switch management
We are using EMC Control Center for alert management/monitoring
We are using EMC Control Center for Storage Scope reporting
We are using EMC Control Center for performance monitoring
We are using Replication Manager for automating the TimeFinder/Mirror/Clone jobs
We are using SYMCLI for managing and monitoring SRDF jobs
We are using PowerPath for path management at the host end
Example:
Getting tickets through telephone
Getting tickets through emails
Getting tickets in ticketing tool
Self-logged tickets (if we find a critical alert on a management tool,
we create the ticket ourselves), etc.
Explain the step-by-step procedure to close a received ticket.
Example:
Receive the ticket/service request/service order via telephone,
email, web portal, etc.
If not resolved:
Lack of experience/unknown problem
Change required
Hardware failure
Change Required:
Update the ticket with your findings
Inform the end user and the server/device owner
Raise a change request if you are responsible, or ask the owner to
raise the change request
Coordinate with all teams involved in the change
Schedule the change and get it done
Update and close the ticket after getting confirmation from the end
user
Hardware failure:
Inform the end user of your findings
Provide a workaround/alternate solution for the time being, if possible
Update the ticket with the progress
Check whether the failure can be fixed by the internal team or vendor
support is required
If vendor support is required, log a service request with the vendor
If so, take the necessary approvals from the server/device owner, team lead
or whoever is involved
Schedule the change as per the approvals
Inform the vendor about the change schedule and get the problem resolved
Once it is resolved, check with the users affected by this
problem and get their confirmation/approval to close the ticket
Update the ticket and close it
Example:
Whenever a change is required, I will inform my
manager and get approval to proceed.
I will prepare the job plan for the change.
I will raise the change request with the scheduled timings and get the
necessary approvals from the teams affected by the
change.
I will perform the change according to the job plan.
I will cross-check that all operations are running smoothly.
I will get confirmation from all teams.
I will update the change request and close it.
Describe a critical situation that you have handled in your present
job.
Tell the interviewer why you want to leave your present company;
keep in mind that the answer should be positive.
Example:
Career growth.
Change of location necessity.
Expecting a larger scope of work, etc.
Ask one or two questions if you want clarification on any topic during the
interview.
Example:
Can you please tell me the project details?
What are the working location and shift timings?
May I know the storage environment in this project?
What will my scope of work be in this project?
How is career growth in your organization? etc.
SANs support disk mirroring, backup and restore, archival and retrieval
of archived data, data migration from one storage device to another, and
the sharing of data among different servers in a network. SANs can
incorporate subnetworks with network-attached storage (NAS) systems.
You must have root or superuser authority to use tcpdump in a
UNIX-like environment.
LDAP
Fibre Channel SANs are the de facto standard for storage networking in
the corporate data center because they provide exceptional reliability,
scalability, consolidation, and performance. Fibre Channel SANs provide
significant advantages over direct-attached storage through improved
storage utilization, higher data availability, reduced management costs,
and highly scalable capacity and performance.
Typically, Fibre Channel SANs are most suitable for large data centers
running business-critical data, as well as applications that require high-
bandwidth performance such as medical imaging, streaming media, and
large databases. Fibre Channel SAN solutions can easily scale to meet
the most demanding performance and availability requirements.
What is the need for a separate network for storage? Why can't the LAN
be used?
LAN hardware and operating systems are geared to user traffic, and
LANs are tuned for fast user response to messaging requests.
With a SAN, the storage units can be secured separately from the
servers and totally apart from the user network, and storage is
accessed in data blocks (bulk data transfers), which is advantageous for
server-less backups.
In this RAID level all data is saved on striped volumes which are in
turn mirrored, so a single disk failure does not cause data loss, but it
makes the whole stripe set unavailable. The key difference from RAID 1+0
is that RAID 0+1 creates a second striped set to mirror a primary striped
set. The array continues to operate with one or more failed drives in the
same mirror set, but if drives fail on both sides of the mirror, the data
on the RAID system is lost. In this RAID level, if one disk fails, its
entire side of the mirror is marked inactive and the data is kept on only
one striped volume.
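To make the RAID 0+1 versus RAID 1+0 difference concrete, here is a minimal Python sketch that checks whether data survives a given set of failed disks. It assumes a two-way mirror and labels disks as (side, position) for RAID 0+1 and (pair, member) for RAID 1+0; the function names are hypothetical.

```python
def raid01_survives(failed):
    """RAID 0+1: two stripe sets ("sides" 0 and 1) mirrored to each other.

    `failed` is a set of (side, position) tuples. One failed disk takes its
    whole stripe set offline, so data survives only while at least one side
    has no failed disks at all.
    """
    failed_sides = {side for side, _ in failed}
    return len(failed_sides) < 2


def raid10_survives(failed):
    """RAID 1+0: mirrored pairs that are then striped together.

    `failed` is a set of (pair, member) tuples. Data survives as long as no
    mirrored pair has lost both of its members.
    """
    lost_members = {}
    for pair, member in failed:
        lost_members.setdefault(pair, set()).add(member)
    return all(len(members) < 2 for members in lost_members.values())


if __name__ == "__main__":
    # Two failures that land on different halves of the array.
    print(raid01_survives({(0, 0), (1, 1)}))  # False: both stripe sets are broken
    print(raid10_survives({(0, 0), (1, 1)}))  # True: each pair keeps one member
```

The same two failures lose data under RAID 0+1 but not under RAID 1+0, which is one common argument for preferring RAID 1+0.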
What is a HBA?
Host bus adapters (HBAs) are needed to connect the server (host) to the
storage.
The basic difference between SAN and NAS: SAN is fabric-based and
NAS is Ethernet-based.
SAN – Storage Area Network
It accesses data at block level and presents space to the host in the
form of a disk.
NAS – Network Attached Storage
It accesses data at file level and presents space to the host in the
form of a shared network folder.
Fabric Switch.
FC Controllers.
JBODs.
Each component has its own criticality with respect to business needs of
a company.
SANtricity.
IBM Tivoli Storage Manager.
CA Unicenter.
Veritas Volume Manager.
How do you install device drivers for the HBA the first time, during OS
installation?
If you are installing Linux, type "linux dd" at the installer boot prompt
to load a driver disk for the HBA.
What is an array?
One more probable reason is that you have flashed firmware for
different OEMs on the same hardware.
To fix this, the flash utilities have an option to erase all previous
EEPROM and boot block entries. Use that option to rectify the problem.
First make sure which series your hardware belongs to; you can find this
on the product website.
This usually happens because most testing companies use the same
hardware to test different series of the same hardware type, flashing
the firmware of each series in turn. You can always flash back to the
firmware for the exact hardware type.
Core-edge.
Full-Mesh.
Partial-Mesh.
Cascade.
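For the switch topologies listed above, one quick way to compare them is to count the inter-switch links (ISLs) each needs. The sketch below assumes one ISL per connected switch pair and, for core-edge, that every edge switch connects to every core switch; partial mesh depends on the chosen design, so it is not modelled. The function names are hypothetical.

```python
def isl_full_mesh(switches):
    """Every switch has an ISL to every other switch: n*(n-1)/2 links."""
    return switches * (switches - 1) // 2


def isl_cascade(switches):
    """Switches are chained one after another: n-1 links."""
    return max(switches - 1, 0)


def isl_core_edge(core, edge):
    """Each edge switch has one ISL to each core switch (assumed layout)."""
    return core * edge


print(isl_full_mesh(4))     # 6
print(isl_cascade(4))       # 3
print(isl_core_edge(2, 6))  # 12
```

Full mesh grows quadratically with the number of switches, which is why larger fabrics tend to move to a core-edge design.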
dmesg.
Online.
Degraded.
Rebuilding.
Failed.
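The four array states above can be read as a small state machine. This is a simplified sketch assuming a RAID level that tolerates a single disk failure (such as RAID 5) and a hot spare for the rebuild; the event names are hypothetical. A non-redundant RAID 0 array is modelled as going straight to Failed.

```python
from enum import Enum


class ArrayState(Enum):
    ONLINE = "Online"
    DEGRADED = "Degraded"
    REBUILDING = "Rebuilding"
    FAILED = "Failed"


def next_state(state, event, redundant=True):
    """Return the array state after `event` (simplified single-fault-tolerant model).

    Hypothetical events: "disk_failed", "rebuild_started", "rebuild_done".
    Pass redundant=False for RAID 0, which has no protection at all.
    """
    if state is ArrayState.FAILED:
        return ArrayState.FAILED          # data is lost; nothing here recovers it
    if event == "disk_failed":
        if not redundant or state is not ArrayState.ONLINE:
            return ArrayState.FAILED      # any failure on RAID 0, or a second failure
        return ArrayState.DEGRADED
    if event == "rebuild_started" and state is ArrayState.DEGRADED:
        return ArrayState.REBUILDING      # hot spare takes over for the failed disk
    if event == "rebuild_done" and state is ArrayState.REBUILDING:
        return ArrayState.ONLINE
    return state                          # event does not apply in this state


if __name__ == "__main__":
    state = ArrayState.ONLINE
    for event in ["disk_failed", "rebuild_started", "rebuild_done"]:
        state = next_state(state, event)
        print(event, "->", state.value)
```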
Ethernet.
SCSI.
Fibre Channel.
What is virtualization?
There are many types of tape media available to back up the data, some
of them are:
LTO: Linear Tape Open; a new standard tape format developed by HP,
IBM, and Seagate.
No. Since RAID 0 is not a redundant array, the failure of any disk results
in the failure of the entire array, so a hot spare cannot rebuild a RAID 0 array.
A fault-tolerant technique in which there is more than one physical path
between the CPU of a computer system and its main storage devices,
through the buses, controllers, switches and other bridge devices
connecting them.
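The idea of multiple physical paths can be illustrated with a minimal round-robin path selector that skips failed paths. This is a sketch of the general technique only, not the load-balancing policy of PowerPath or any other product, and the path names are hypothetical.

```python
class MultipathDevice:
    """Round-robin I/O across healthy paths, skipping paths marked failed."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.failed = set()
        self._next = 0

    def fail(self, path):
        self.failed.add(path)

    def restore(self, path):
        self.failed.discard(path)

    def pick_path(self):
        """Return the next healthy path in round-robin order."""
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path not in self.failed:
                return path
        raise OSError("all paths to the device have failed")


dev = MultipathDevice(["hba0->fa7d:0", "hba1->fa10d:0"])
print(dev.pick_path(), dev.pick_path())  # alternates between the two paths
dev.fail("hba0->fa7d:0")
print(dev.pick_path())                   # hba1->fa10d:0 (I/O continues on the surviving path)
```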
What is a disk array?
A set of high-performance storage disks that can store several terabytes
of data. A single disk array can support multiple points of connection to
the network.
8b/10b; as an encoding technique it is able to detect almost all bit
errors.
What is a Fabric?
1. Fabric Login.
2. SNS (Simple Name Server).
3. Fabric Address Notification.
4. Registered State Change Notification (RSCN).
5. Broadcast Servers.
WWN: a 64-bit address that is hard-coded into a Fibre Channel HBA and
is used to identify an individual port (N_Port or F_Port) in the fabric.
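A WWN is usually written as eight colon-separated bytes. The sketch below normalizes and formats a WWN and decodes its first hex digit, the NAA (Network Address Authority) field, plus the IEEE OUI (vendor identifier) for the common NAA 1, 2 and 5 layouts. The sample WWN is the one used in the masking example later in this document; the second WWN and all helper names are hypothetical.

```python
HEX_DIGITS = set("0123456789abcdef")


def normalize_wwn(raw):
    """Return a WWN as 16 lowercase hex digits, accepting colon-separated input."""
    digits = raw.replace(":", "").lower()
    if len(digits) != 16 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a 64-bit WWN: {raw!r}")
    return digits


def format_wwn(raw):
    """Format a WWN in the usual colon-separated byte notation."""
    digits = normalize_wwn(raw)
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


def wwn_naa_and_oui(raw):
    """Decode the NAA nibble and the IEEE OUI of a WWN.

    NAA 1 and 2: the OUI is in hex digits 4-9.
    NAA 5:       the OUI directly follows the NAA nibble (hex digits 1-6).
    Other NAA values are not decoded here and return None for the OUI.
    """
    digits = normalize_wwn(raw)
    naa = int(digits[0], 16)
    if naa in (1, 2):
        return naa, digits[4:10]
    if naa == 5:
        return naa, digits[1:7]
    return naa, None


print(format_wwn("10000000c93f62cf"))       # 10:00:00:00:c9:3f:62:cf
print(wwn_naa_and_oui("10000000c93f62cf"))  # (1, '0000c9')
print(wwn_naa_and_oui("5006016039a0126e"))  # (5, '006016')
```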
What are the different topologies in Fibre Channel?
1. Point-to-Point.
2. Arbitrated Loop.
3. Switched Fabric.
1. FC-0: Physical Media.
2. FC-1: Encoder and Decoder.
3. FC-2: Framing and Flow Control.
4. FC-3: Common Services.
5. FC-4: Upper Level Protocol Mapping.
What is zoning?
1. Software Zoning.
2. Hardware Zoning.
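Whatever the enforcement method (software or hardware), the effect of zoning can be modelled as set membership: two ports may communicate only if some zone in the enabled zone set contains both. The WWPNs and zone names below are hypothetical, and the sketch does not model how a switch actually enforces zones.

```python
# Hypothetical WWPNs, zone names and zone set names, for illustration only.
HOST_A = "10:00:00:00:00:00:0a:01"
HOST_B = "10:00:00:00:00:00:0b:01"
ARRAY_PORT = "50:00:00:00:00:00:00:01"

zone_sets = {
    "prod_cfg": {
        "hostA_array": {HOST_A, ARRAY_PORT},
    },
    "new_cfg": {
        "hostA_array": {HOST_A, ARRAY_PORT},
        "hostB_array": {HOST_B, ARRAY_PORT},
    },
}


def can_communicate(zone_sets, enabled, a, b):
    """True if a zone in the *enabled* zone set contains both ports.

    Zones in a zone set that has not been enabled have no effect.
    """
    return any(a in members and b in members
               for members in zone_sets[enabled].values())


print(can_communicate(zone_sets, "prod_cfg", HOST_B, ARRAY_PORT))  # False
print(can_communicate(zone_sets, "new_cfg", HOST_B, ARRAY_PORT))   # True
print(can_communicate(zone_sets, "new_cfg", HOST_A, HOST_B))       # False
```

Adding host B's zone only takes effect once the zone set containing it is the enabled one, and because each zone holds a single initiator, the two hosts still cannot see each other.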
1. Loop Initialization.
2. Loop Monitoring.
3. Loop Arbitration.
4. Open Loop.
5. Close Loop.
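Loop arbitration on an FC-AL follows a fixed priority: among the ports arbitrating, the one with the lowest AL_PA value wins, then opens the loop to its destination and closes it when done. The sketch below adds a simplified access fairness window, in which a port that has already won waits until the other arbitrating ports have had a turn. The AL_PA values and the function name are hypothetical, and real loop timing and primitives are not modelled.

```python
def arbitration_order(requests):
    """Return the sequence of AL_PAs that win the loop.

    `requests` maps each arbitrating port's AL_PA to the number of loop
    tenancies (open, transfer, close cycles) it wants.
    """
    remaining = {al_pa: n for al_pa, n in requests.items() if n > 0}
    won_in_window = set()
    order = []
    while remaining:
        eligible = [al_pa for al_pa in remaining if al_pa not in won_in_window]
        if not eligible:
            won_in_window.clear()      # every requester has had a turn: new window
            continue
        winner = min(eligible)         # lowest AL_PA value = highest priority
        order.append(winner)           # winner opens the loop, transfers, closes
        won_in_window.add(winner)
        remaining[winner] -= 1
        if remaining[winner] == 0:
            del remaining[winner]
    return order


print([hex(p) for p in arbitration_order({0x01: 2, 0xE8: 1, 0x23: 2})])
# ['0x1', '0x23', '0xe8', '0x1', '0x23']
```

Without the fairness window, port 0x01 would win both of its tenancies back to back before 0x23 or 0xE8 got the loop.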
What is snapshot?
What is hot-swapping?
FC Controllers: these are data transfer media that sit in the PCI slots
of the server; you can configure arrays and volumes on them.
There are many possibilities that might cause this problem. One reason
might be that you are using bad drives that cannot be repaired. In those
cases, replace the disks with working ones.
Other tape media used for backup include:
DLT: Digital Linear Tape; a technology for tape backup/archive of
networks and servers that addresses midrange to high-end tape
backup requirements.
AIT: Advanced Intelligent Tape; a helical-scan technology developed by
Sony for tape backup/archive of networks and servers, specifically
addressing midrange to high-end backup requirements.
What is HA?
What are the daily tasks of a SAN professional? What routine jobs is he
responsible for? What happens in a SAN engineer's day at the office?
Answer:
As in any software professional's life, the day starts with a brief review
of email. Since we depend on many other teams located in different time
zones and geographies, we usually find 50 to 150 new emails in our inbox
when we start the day.
We go through all these mails. Some of them are very relevant to the
day's tasks, such as "Storage product enhancement or development
efforts" and "New fixes which will impact our earlier planning". There are
also lots of organisation-wide mails announcing a new director or finance
manager, lots of network, system or UPS outage mails, and lots of mails
about the CRs (change requests or bugs) that we or one of our teammates
has logged, with the latest updates on these bugs. The list of mails is
endless, so that deserves a separate posting.
Then, based on our mails, our seniors' (manager's or lead's) inputs and
the previous day's work, we make a small "To Do" list for the day,
planned with the "Dreaded Project Deadlines" (end dates for finishing the
assigned tasks) in mind.
The list grows during the day, and some items move to the next day's
To Do list.
Then we start the actual work to accomplish what we have put on our
To Do list.
We may be working with a new release of our SAN product and need to
carry out sanity checks on the builds, creating setups with different
hardware, configurations and operating systems. We try to replicate
customer network setups as far as possible, but a customer like Citibank
can have a huge SAN network costing billions of dollars, and what we can
afford to replicate is nowhere comparable to that. So we set up a
scaled-down SAN network in our lab for all our work.
CLARIION
AX Series
FC Series
NaviCli
Navisphere Manager
SymCli
4,6,10
LUN masking.
Right click on Clariion Array select Properties from the drop down menu.
CX500 – 64 Initiators/Port
CX600 – 32 Initiators/Port
The managefiles command will transfer the data file to the Navisphere
CLI directory where the command was invoked.
Check whether enough free space is available in an existing RAID group
for the required LUN capacity.
Go to the host's storage group properties, open the LUN tab and
add the newly created LUN.
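The free-space check in the first step can be sketched as choosing a RAID group whose free capacity fits the requested LUN. This assumes a simple best-fit policy (use the group with the least free space that still fits) and hypothetical capacities in GB; the model ignores whether the free space inside a RAID group is contiguous, and the function name is hypothetical.

```python
def pick_raid_group(free_gb_by_group, required_gb):
    """Return the RAID group ID to bind the new LUN in, or None.

    Best fit: among groups with at least `required_gb` free, choose the one
    with the least free space, keeping larger free areas for larger LUNs.
    None means no existing RAID group fits and a new one is needed.
    """
    candidates = [(free, rg) for rg, free in free_gb_by_group.items()
                  if free >= required_gb]
    if not candidates:
        return None
    return min(candidates)[1]


# Hypothetical free capacity per RAID group, in GB.
free_space = {0: 120.0, 1: 550.0, 2: 260.0}
print(pick_raid_group(free_space, 200))  # 2
print(pick_raid_group(free_space, 600))  # None
```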
Installing NaviAgent
Create the zone, add the new zone to the zone set, then save and enable the zone set.
Right-click the LUN > select Expand > the Expand Storage Wizard appears;
click Next > select the expansion type (Striped or Concatenate) and click
Next > confirm the Preserve Data dialog > select the member LUNs of the
meta and click Next > select the user capacity and click Next > give the
MetaLUN name, default owner, expansion rate, etc., and click Next >
review the summary and click Finish
3. If the initiator is registered but not logged in, check the zoning
and the physical connectivity.
4. If the host initiators do not show up under Connectivity Status at all,
check the zoning and the physical connectivity. If possible, remove the
zone and create it again. Once you have recreated the zone, don't forget
to enable and save the configuration. Then refresh.
5. Once all these checks pass, log in to Navisphere and update the
array. When the update is complete, go to Connectivity Status and
check again.
Striped or concatenated
Cache implementation.
Memory budgets for caching and for snap sessions, mirrors, clones,
copies.
Process Scheduling.
Boot Management.
The user must explicitly enable this (for both read and write).
Locality - Merge several writes to the same area into a single operation.
Navisphere Manager/NaviCli
Clariion Hardware
Right Click on the LUN and select migrate from the drop down menu.
The LUN becomes private LUN when you add it to the reserved LUN
pool. Since the LUNs in the reserved LUN pool are private LUNs, they
cannot belong to storage groups and a server cannot perform I/O to
them.
Give the user name, Role, access level (Global or Local) and password.
Monitor Agents run on one or more hosts (or SPs) and watch over the
storage systems
What are vault drives and how much capacity do they use?
Clariion Platform   Vault Drives   Vault Overhead per Drive
CX                  0-4            6.22 GB
CX3                 0-4            33 GB
CX4                 0-4            62 GB
AX4-5               0-3            17.4 GB
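Reading the table as drive numbers 0-4 (five drives) and 0-3 (four drives), the total space reserved is the number of vault drives times the per-drive overhead, for example 5 × 62 GB = 310 GB on a CX4. The short sketch below applies that to each row; it assumes the per-drive figure applies equally to every vault drive, and the names are hypothetical.

```python
# Vault drive range and per-drive overhead (GB), copied from the table above.
VAULT = {
    "CX": (range(0, 5), 6.22),
    "CX3": (range(0, 5), 33.0),
    "CX4": (range(0, 5), 62.0),
    "AX4-5": (range(0, 4), 17.4),
}


def total_vault_overhead_gb(platform):
    """Number of vault drives times the per-drive overhead, rounded to 0.01 GB."""
    drives, per_drive_gb = VAULT[platform]
    return round(len(drives) * per_drive_gb, 2)


for platform in VAULT:
    print(platform, total_vault_overhead_gb(platform))
```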
Vault Drives:
All Clariions have vault drives. On CX-series Clariions they are the first
five (5) disks, 0_0_0 through 0_0_4 (per the table above, the AX4-5 uses
disks 0-3). The vault drives contain internal information that is
pre-configured before you start putting data on the Clariion: the vault
area, the PSM LUN, the Flare database LUN and the operating system.
The Vault:
The vault is a ‘save area’ across the first five disks that stores the write
cache from the Storage Processors in the event of a power failure to the
Clariion or a Storage Processor failure.
The Flare Database LUN will contain the Flare Code that is running on
the Clariion. I like to say that it is the application that runs on the Storage
Processors that allows the SPs to create the Raid Groups, Bind the
LUNs, setup Access Logix, SnapView, MirrorView, SanCopy, etc…
Operating System:
SYMMETRIX
Symmetrix DMX-3 and DMX-4 are the latest technology, using redundant
global memory and offering the largest capacity.
What are the different types of Front-end directors and the purpose
of each one?
Possible answers:
The Rule of 17 ensured that FAs used for host connectivity were in
different power zones.
The Rule of 17 is simply a way to make sure that the paths connecting
your host are not on the same director, but on directors physically far
away from each other.
The original Rule of 17 was put in place to ensure that there was a
path on each bus (odd and even). The bus architecture went away in
DMX-1 (Symm6), but there were two power zones: one for directors 1-8
and another for directors 9-16, so the Rule of 17 still had value.
For example, directors 3 (odd) and 4 (even) reside on different buses but
in the same power zone, so even if your host was connected to both 3 and
4, if that power zone went down, your host went down.
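The arithmetic behind the rule is that paired directors add up to 17, so an odd director always pairs with an even one and a director in 1-8 always pairs with one in 9-16. The sketch below checks this, assuming director numbering 1-16 and the two power zones described above; the function names are hypothetical.

```python
def rule_of_17_partner(director):
    """Return the director paired with `director` under the Rule of 17."""
    if not 1 <= director <= 16:
        raise ValueError("director numbers run from 1 to 16")
    return 17 - director


def power_zone(director):
    """Power zone as described above: directors 1-8 in zone 1, 9-16 in zone 2."""
    return 1 if director <= 8 else 2


for d in range(1, 9):
    p = rule_of_17_partner(d)
    # Partners always differ in parity (bus) and in power zone.
    assert d % 2 != p % 2 and power_zone(d) != power_zone(p)
    print(f"director {d:2d} <-> director {p:2d}")

# Directors 3 and 4 are on different buses but share a power zone.
print(power_zone(3) == power_zone(4))  # True
```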
What are the major components of System Bay and Storage Bay in
DMX?
System Bay Components:
Either six or eight disk directors and up to 12 channel directors
(Combined total = 16).
From four to eight global memory directors.
Up to eight power supplies, each having a dedicated battery backup unit
(BBU).
1U service processor with KVM (keyboard, video screen and mouse)
and dedicated UPS.
Three cooling fan assemblies (each containing 3 fans).
Can you explain Read Hit, Read Miss, Fast Write and Delayed Fast
Write?
Read Hit: In a read hit operation, the requested data resides in global
memory. The channel director transfers the requested data through the
channel interface to the host and updates the global memory directory.
Since the data is in global memory, there are no mechanical delays due
to seek and latency.
Read Miss: In a read miss operation, the requested data is not in global
memory and must be retrieved from a disk device. While the channel
director creates space in the global memory, the disk director reads the
data from the disk device. The disk director stores the data in global
memory and updates the directory table. The channel director then
reconnects with the host and transfers the data. Because the data is not
in global memory, the Symmetrix system must search for the data on the
disk and then transfer it to the channel, adding seek and latency times to
the operation.
Fast Write: A fast write occurs when the percentage of modified data in
global memory is less than the fast write threshold. On a host write
command, the channel director places the incoming blocks directly into
global memory. For fast write operations, the channel director stores the
data in global memory and sends a “channel end” and “device end” to
the host computer. The disk director then asynchronously de-stages the
data from global memory to the disk device.
Delayed Fast Write: A delayed fast write occurs only when the fast write
threshold has been exceeded, that is, when the percentage of global
memory containing modified data is higher than the fast write threshold.
In this situation, the Symmetrix system disconnects the channel directors
from the channels, and the disk director de-stages the data to disk.
When sufficient global memory space is available, the channel directors
reconnect to their channels and process the I/O request as a fast write.
The Symmetrix system continues to process read operations during delayed
fast writes. With sufficient global memory present, this type of global
memory operation rarely occurs.
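The four cases above can be tied together in a toy model of global memory. It is a simplified sketch: `slots` and `fast_write_threshold` are illustrative parameters, not real Symmetrix values, cache eviction is not modelled, and de-staging is collapsed into one step that runs only when the threshold is reached. The class name is hypothetical.

```python
class GlobalMemoryModel:
    """Toy model of read hit/miss and fast/delayed fast write handling."""

    def __init__(self, slots, fast_write_threshold):
        self.slots = slots
        self.threshold = fast_write_threshold  # max fraction of slots holding modified data
        self.cached = set()                    # tracks currently in global memory
        self.write_pending = set()             # modified tracks not yet on disk
        self.on_disk = set()                   # tracks that have been de-staged

    def read(self, track):
        if track in self.cached:
            return "read hit"                  # served from memory, no seek or latency
        self.cached.add(track)                 # disk director stages the track into memory
        return "read miss"

    def write(self, track):
        if len(self.write_pending) / self.slots < self.threshold:
            kind = "fast write"                # accepted straight into global memory
        else:
            # Threshold reached: de-stage pending data before accepting the write.
            self.on_disk |= self.write_pending
            self.write_pending.clear()
            kind = "delayed fast write"
        self.cached.add(track)
        self.write_pending.add(track)
        return kind


mem = GlobalMemoryModel(slots=4, fast_write_threshold=0.5)
print([mem.write(t) for t in (1, 2, 3)])  # ['fast write', 'fast write', 'delayed fast write']
print(mem.read(3), "/", mem.read(9))      # read hit / read miss
```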
This feature automatically selects and assigns LUN IDs to the devices
when they are mapped to a port, instead of the administrator manually
assigning an address to each device during mapping.
Save Devices: Special devices (not mapped to the host) that provide
physical storage space for pre-update images or changed tracks during
a virtual copy session of TimeFinder/Snap operations.
Hot Spare: When a physical drive fails, a hot spare drive takes its
place.
The prepare argument performs the preview checks and also verifies the
appropriateness of the resulting configuration definition against the
current state of the Symmetrix array; the argument then terminates the
session without executing the changes.
The commit argument completes all stages and executes the changes in
the specified Symmetrix array
What are the possible device service states and device status
states?
We cannot create disk groups; that must be done by the CE by changing
the BIN file.
We can rename existing disk groups.
Example: symconfigure -sid 207 -cmd "set disk_group 4 disk_group_name = flash_dsks;" -v -nop commit
How do you check the free space by Disk group and Array as
whole?
Create a command file with the following entry to map the device to the
FA port:
map dev 26ca to dir 7d:0, lun=036;
Mask the devices to the host HBA and refresh the Symmetrix configuration:
symmask -sid "SymID" -wwn 10000000c93f62cf -dir 7d -p 0 add devs 26ca -nop
symmask -sid "SymID" refresh
Rescan the disks and refresh PowerPath, or reboot the server, to see the
assigned devices at the host end.
Unmasking
Write Disable
Un-mapping
Dissolve meta
Deleting hypers
3. Write Disable the devices before unmapping from the Director port
symdev -sid 4282 write_disable 26ca -sa 7d -p 0 -noprompt
How do you check the devices that are not mapped or masked?
How do you check the devices which are mapped to FA but not
masked to any host?
Symcli -def
What are Symmetrix external locks, and how do you check and release
them?
Symmetrix external locks are used by SYMAPI (locks 0 to 15) and by
applications assigned by EMC (locks above 15) to lock access to the entire
Symmetrix array during critical operations.
Step 3: Once a load-balancing solution has been developed, the next
phase is to carry out the Symmetrix device swaps. This is done using
established TimeFinder technology, which maintains data protection and
availability. We can specify whether swaps should occur in a completely
automated fashion or require user approval before the action is taken.
The QoS (Quality of Service) feature allows us to adjust the data transfer
pace on specified devices, or devices in a device group, for certain
operations.
We can control the priority service time of devices and control cache
partitions for different device groupings.
The symqos command allows you to create partitions for different device
groupings in addition to the default partition that all devices belong to
initially. Each partition will have a target cache percentage as well as a
minimum and maximum percentage. In addition, you can donate unused
cache to other partitions after a specified donation time
How do you monitor the real time events on symmetrix array with
example?
To monitor 100 real-time event records at a 600-second interval on the
Symmetrix array (use one of -warn, -error or -fatal):
symevent -sid 4282 monitor -i 600 -c 100 -warn
To monitor 100 real-time audit log records at a 30-second interval:
symaudit -sid 4282 monitor -i 30 -c 100
SRDF
PowerPath at the host end; TimeFinder/Mirror, Clone and Snap for local
replication; SRDF for remote replication.
What are the different types of Remote Link directors used for
SRDF?
Adaptive Copy Mode is used primarily for data migrations and data
center moves.
Two
Failover: from the source side to the target side, switching data
processing to the target side.
Failback: from the target side to the source side by switching data
processing to the source side.
Update: updates the source side after a failover, while the target side
may still be operational to its local host.
Establish:
Resumes normal SRDF operations
Preserves data on the source (R1) volumes, discarding changes to the
target (R2) volumes
Split:
Suspends the link between the source (R1) and target (R2) volumes
Enables read and write operations on both source and target volumes
Restore:
Resumes SRDF operations
Preserves data on the target (R2) volumes, discarding changes to the
source (R1) volumes
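The Establish, Split and Restore semantics can be illustrated with a toy model that keeps a changed-track table on each side and, on resynchronization, merges both tables and recopies every changed track in one direction. This is only a sketch: the class and method names echo the symrdf actions but are hypothetical, and it is not how Solutions Enabler or the array implements them.

```python
class SrdfPair:
    """Toy R1/R2 pair; tracks are dict keys and their contents are values."""

    def __init__(self, tracks):
        self.r1 = dict(tracks)
        self.r2 = dict(tracks)
        self.is_split = False
        self.r1_changed = set()   # tracks written on R1 while the link is suspended
        self.r2_changed = set()   # tracks written on R2 while the link is suspended

    def write(self, side, track, data):
        if side not in ("R1", "R2"):
            raise ValueError("side must be 'R1' or 'R2'")
        if side == "R1":
            self.r1[track] = data
            if self.is_split:
                self.r1_changed.add(track)
            else:
                self.r2[track] = data      # link up: the write is mirrored to R2
        elif self.is_split:
            self.r2[track] = data
            self.r2_changed.add(track)
        else:
            raise PermissionError("R2 is write disabled while the pair is established")

    def split(self):
        """Suspend the link; both sides accept reads and writes."""
        self.is_split = True

    def _copy_changed(self, source, target):
        # Merge both track tables: every track changed on either side is recopied.
        for track in self.r1_changed | self.r2_changed:
            if track in source:
                target[track] = source[track]
            else:
                target.pop(track, None)    # track existed only on the discarded side
        self.r1_changed.clear()
        self.r2_changed.clear()
        self.is_split = False

    def establish(self):
        """Resume SRDF, keeping R1 data and discarding R2 changes."""
        self._copy_changed(self.r1, self.r2)

    def restore(self):
        """Resume SRDF, keeping R2 data and discarding R1 changes."""
        self._copy_changed(self.r2, self.r1)


pair = SrdfPair({0: "a", 1: "b"})
pair.split()
pair.write("R1", 1, "b1")
pair.write("R2", 0, "test")
pair.establish()
print(pair.r2)  # {0: 'a', 1: 'b1'}
```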
Examples:
symrdf -g "DgName" set mode sync
symrdf -cg "CgName" set mode semi
symrdf -f FileName set mode async
symrdf -g "DgName" set domino on
symrdf -g "DgName" set domino off
symrdf -g "DgName" set mode acp_wp
symrdf -g "DgName" set mode acp_off
symrdf -g prod set mode acp_disk
symrdf -g prod set mode acp_off
The target (R2) device is Write Disabled to its local host I/O.
Traffic is suspended on the SRDF links.
All the tracks on the target (R2) device are marked invalid.
All tracks on the R2 side are refreshed by the R1 source side. The track
tables are merged between the R1 and R2 side.
Traffic is resumed on the SRDF links.
OR
Yes, we can migrate R1 data to a larger R2, but then we cannot perform a
device swap or SRDF/Star operations, we cannot restore back to the R1
device, and concatenated meta devices are not supported.
How do you create groups for dynamic RDF pairs in a device file?
The dynamic R1/R2 swap feature swaps the SRDF personality of the
SRDF device designations of a specified device or composite group