
Storage

Ratnadeep Bhattacharya
Module 1

INTRODUCTION TO STORAGE
Storage simply means a collection of devices
that store the bits that constitute data.
The storage subsystem can consist of:
- Hard drives (Random access devices)
- Tape drives (Sequential access devices)
- Optical or magnetic drives (CD/DVD/Floppy)
- Memory (Random access device)
Hard Drives
• Data is written and read in a random-access manner. Physically, a disk is a
stack of platters with concentric tracks; tracks at the same radius form a
cylinder, and the read/write heads move across the tracks while the spindle
rotates the platters beneath them.
• Supports a logical addressing scheme, implemented with the
help of file systems.
• File systems assign logical addresses to physical blocks and make them
appear sequential to the external world even though the underlying physical
layout is not (see the sketch after this list).
• Partition tables define the disk geometry and the block
structure of the file system.
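To make the logical-addressing idea above concrete, here is a minimal Python sketch of the classic translation between a logical block address (LBA) and cylinder/head/sector (CHS) geometry. The geometry numbers are made-up examples; modern drives expose only LBA and keep the real layout internal.

# Minimal sketch: mapping a logical block address (LBA) to classic
# cylinder/head/sector (CHS) geometry. The geometry below is a made-up
# example; real drives report their own geometry or hide it behind LBA.

HEADS_PER_CYLINDER = 16
SECTORS_PER_TRACK = 63

def lba_to_chs(lba):
    """Standard CHS translation: sectors are numbered from 1."""
    cylinder = lba // (HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head = (lba // SECTORS_PER_TRACK) % HEADS_PER_CYLINDER
    sector = (lba % SECTORS_PER_TRACK) + 1
    return cylinder, head, sector

def chs_to_lba(cylinder, head, sector):
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)

if __name__ == "__main__":
    for lba in (0, 1, 2048, 1_000_000):
        c, h, s = lba_to_chs(lba)
        assert chs_to_lba(c, h, s) == lba
        print(f"LBA {lba:>9} -> C={c} H={h} S={s}")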
Tape Drives
• Mostly non-intelligent or semi-intelligent systems. Some devices in this
class may allow a few operations (mostly indexing) while the tape is still
in the drive, but random data access is not possible because no file
system structure is defined.
• The format in which the data is written is usually not recognised by the
drive itself, though on some tape drives you can run commands to identify
the date of the backup, the kind of data held and so on.
Optical and magnetic media
• CD/DVD drives use optical technology to save
data and achieve much higher storage densities.
• Floppy drives store data in a similar fashion but
magnetically, at much lower densities.
• The basic principle of storing data is almost the same.
Memory
• A very important unit in both processing and transmission
of information.
• Though not generally seen as a storage unit, memory plays a
very important role in making full use of processor speed.
This has driven many tweaks in how memory modules handle data,
and a basic understanding of memory helps reduce adverse
effects on data such as loss or corruption.
• System memory was introduced to hold data closer to
the processor chip so that the cores can access it faster.
Memory (ctd.)
• This concept was later re-introduced in the form of
caches that hold data at different points along its path from
the disk to the processor over the system bus.
• Lately, three levels of cache have been added to processor
chips, holding duplicate data along with dedicated addressing
structures (such as TAG RAMs) and data access techniques for
faster lookups.
• From a storage point of view, we are most concerned
with caches found on RAID cards, HBAs and controllers.
Storage today
• Today we look at storage in a very different manner – with
awe; with fear; and of course with excitement.
• Different storage devices can be held at remote locations
and accessed by the operating system just like local disks.
This has introduced an awesome array of technologies in
the storage field.
• The key words are:
– Speed
– Availability
– Manageability
– Redundancy
Transmission technologies of today
• Main players in this arena are:
– Fiber Channel
– iSCSI
• Fiber Channel wins in speed and security by using laser light to transmit
data, with protocols like FCP (local) and FCIP/iFCP (remote, over TCP/IP).
• iSCSI wins in costs, familiarity and distance.
• Though FC is touted as lossless, that is not entirely correct. Just as
electrical lines suffer loss from impedance and crosstalk caused by magnetic
flux, optical lines suffer loss from reflection, refraction and deflection.
• This seriously hampers the ability of the FCP protocol to carry data over
long distances.
• FC frames can be transmitted reliably over only 50-100 km, and even then
only by using CWDM/DWDM technologies over dark fiber.
Parallel data transmission protocols

• ATA (Advanced Technology Attachment)


• SCSI (Small Computer System Interface)
• SBCCS (Single Byte Command Code Set)
ATA (also called PATA)
• This is the first protocol that dealt with data
transfers with the help of IDE interfaces.
• Started out as a parallel transfer protocol with a
large number of pins on the IDE connectors.
• The ATA controllers (the IDE interfaces) are traditionally
completely unintelligent, relying purely on their electrical
capabilities.
• With the advent of SATA it competes fiercely with
SAS/SCSI, especially in the low-cost disk segment.
SCSI
• Faster than PATA/SATA due to the intelligent
nature of the controllers.
• This is a connection-oriented protocol that also
takes care of “delivery reports”.
• Much costlier, but more reliable, with a higher
mean time between failures than SATA drives
(though those have also improved a lot).
SBCCS
• Like PATA, a parallel protocol; developed by IBM
for data transmission in IBM mainframes.
Serial data transmission
Serial data transmission gained popularity because it
significantly reduces the noise seen across parallel
electrical lines.
Serial protocols are also a lot faster than the earlier
parallel protocols.
SATA
• SATA signals at 250 mV, compared to the 5 V
used by PATA.
• SATA is a point-to-point protocol.
• The number of pins is reduced to 7 from 40 (PATA).
• SATA gives a speed of 150 MB/s as compared
to 133 MB/s for ATA (see the sketch after this list).
• SATA comes in both 3.5” and SFF form factors.
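As a quick sanity check on the 150 MB/s figure, the small Python calculation below combines the SATA generation-1 line rate of 1.5 Gb/s with the 8b/10b encoding mentioned in the feature list that follows. The arithmetic is standard; the variable names are mine.

# Back-of-the-envelope check of the SATA-1 figure quoted above.
# 8b/10b encoding sends 10 line bits for every 8 data bits, so only
# 80% of the raw line rate carries payload.

LINE_RATE_GBPS = 1.5          # SATA generation 1 line rate
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b

payload_gbps = LINE_RATE_GBPS * ENCODING_EFFICIENCY   # 1.2 Gb/s of payload
payload_mbytes = payload_gbps * 1000 / 8               # 150 MB/s

print(f"SATA-1 usable throughput ~= {payload_mbytes:.0f} MB/s")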
SATA features
- LVD signalling.
- 8b/10b encoding.
- Lower connector pin count.
- Point-to-point (PPP) connections with hot-plug capability.
- SATA RAID controllers are available in the market, but they are very
limited in their offerings (these controllers typically do not support
RAID 5 or above). SATA RAID combines the controller, BIOS, drivers and
the host processor to perform RAID 0, 1 and 0+1.
- SATA RAID cannot mirror boot drives and is slower because it has no
independent processor for read/write and RAID calculations.
SAS
Serial Attached SCSI still uses the SCSI command set. However, the
maximum speed has increased from 640 MB/s (parallel Ultra-640 SCSI)
to 6 Gb/s, with 3 Gb/s being more common.
SAS also supports the SFF form factor with drives ranging from
72 GB to 180 GB.
SAS connectors can also support SATA drives (not vice versa
though).
SAS can support 128 drives as compared to the maximum of 15
in SCSI.
SAS can work on PPP, but it also supports:
Serial Management Protocol (SMP)
Serial SCSI Protocol (SSP)
SATA Tunneling Protocol (STP)
RAID
RAID makes multiple physical drives appear as a single
volume.
RAID can be of two types – hardware and software.
Hardware RAID needs all the physical disks to be of the
same size, while software RAID cannot include the
boot drive in a striped set.
Also, in software RAID, if the OS is corrupted then the RAID
configuration is destroyed. Hardware RAID keeps something
known as the RIS (RAID Information Sector) on the RAID
controller as well as on the hard drives.
RAID 0
Striping.
Read/write speed is high as data is read or
written concurrently. Number of hard drives
that can be used depends on the controller.
Capacity utilisation is 100% but there is no
redundancy.
RAID 1
Mirroring.
Read/write performance is the lowest. However,
redundancy is 100%. Only two hard drives can
be used. Capacity utilisation is 50%.
RAID 5
Distributed Data Guarding
Performance is reasonable. The minimum number of
drives required is three and the maximum
depends on the controller. Parity is calculated
from the stripes and distributed across all the
drives. Resource utilisation is (N-1)/N.
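As a rough illustration of distributed parity, here is a small Python sketch of XOR parity: the parity block is the byte-wise XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the survivors. The block contents are arbitrary examples.

# Minimal sketch of RAID 5 style parity: the parity block is the
# byte-wise XOR of the data blocks in a stripe, so any single lost
# block can be rebuilt by XOR-ing the surviving blocks.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]      # stripe blocks on three data drives
parity = xor_blocks(data)               # stored on a fourth drive position

# Simulate losing the second block and rebuilding it from the rest plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt stripe block:", rebuilt)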
RAID 6
Advanced Data Guarding.
Data is striped and parity is calculated from the
data. Then a second parity is calculated from
the data and the first parity. Performance is slower
than RAID 5 and capacity utilisation is (N-2)/N.
However, RAID 6 can survive the failure of up to
2 hard disks.
RAID at a glance
RAID 0 – striping.
RAID 1 – mirror.
RAID 2 – bit level parity.
RAID 3 – byte level striping with dedicated parity.
RAID 4 – block level striping with dedicated parity.
RAID 5 – distributed parity (Data Guarding)
RAID 6 – dual distributed parity (Advanced Data Guarding)
RAID 3 and RAID 4 differ only in the size of the stripe (byte versus block).
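A small Python sketch of the capacity-utilisation figures quoted for the common levels above, assuming N equally sized drives (hardware RAID normally requires this anyway):

# Usable-capacity sketch for the RAID levels covered in these notes.

def usable_fraction(level, n_drives):
    if level == 0:
        return 1.0                          # striping, no redundancy
    if level == 1:
        return 0.5                          # mirroring, two drives
    if level == 5:
        return (n_drives - 1) / n_drives    # one drive's worth of parity
    if level == 6:
        return (n_drives - 2) / n_drives    # two drives' worth of parity
    raise ValueError("level not covered in these notes")

for level, n in [(0, 4), (1, 2), (5, 4), (6, 6)]:
    print(f"RAID {level} with {n} drives: {usable_fraction(level, n):.0%} usable")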
Signalling types
• SE – single ended
• LVD – low voltage differential
• HVD – high voltage differential
Module 2

SCSI ARCHITECTURE
The SCSI family
• The SCSI family is mainly recognised by their corresponding
ANSI revision numbers.
• SCSI started with a speed of 5 MB/s. Today parallel SCSI is
capable of 640 MB/s, while SAS runs at 3 and 6
Gb/s.
• ANSI versions for SCSI:
– ANSI 1: SCSI 1
– ANSI 2: SCSI 2
– ANSI 3: SCSI 3
– ANSI 4: SCSI 3 SPC 2
– ANSI 5: SCSI 3 SPC 3
SCSI diagram
[Diagram: Initiator - SCSI bus - Terminator]
SCSI architecture
• A SCSI channel always begins at the initiator and ends at the
terminator.
• Traditionally, there are only eight IDs on a SCSI bus – 0 to 7.
• The initiator always has the SCSI ID of 7.
• The terminator does not have an ID. Its sole purpose in life is to
terminate any electrical signal that manages to reach the end of
the bus. This ensures that the signal does not reflect back into the
bus (the similarities between light and electricity are endless),
creating noise and distortion.
• A terminator can be passive (a 50 ohm resistance) or active (an IC
chip).
• The rest of the IDs are open for client devices.
SCSI Bus phases
The most important SCSI bus phases are:
• Bus free – BSY and SEL signals are simultaneously false.
• Arbitration – the device that wants the bus raises the BSY signal
together with its own SCSI ID. If no other device raises a higher
SCSI ID on the bus, the requesting device gains control of the bus,
effectively setting up an I_T nexus (see the sketch after this list).
• Selection – the device that won arbitration (normally the initiator)
selects the target with which it wants to carry out a command or data
transfer operation.
• Reselection – allows a target to re-establish a connection to
the initiator which was previously initiated by the initiator but
suspended by the target.
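A toy Python model of the arbitration rule described above: the highest SCSI ID asserted on the bus wins, which is why the initiator sits at ID 7. This only illustrates the priority rule, not the electrical signalling.

# Toy model of the arbitration phase on the 8-ID bus described above.
# Every device that wants the bus asserts BSY together with its own ID,
# and the highest ID present wins.

def arbitrate(requesting_ids):
    """Return the SCSI ID that wins arbitration, or None if the bus stays free."""
    if not requesting_ids:
        return None
    return max(requesting_ids)

print(arbitrate({2, 5}))     # -> 5
print(arbitrate({2, 5, 7}))  # -> 7, the initiator always wins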
SCSI flavours
• Fast SCSI – runs the bus at 10 million transfers
per second on an 8-bit wide bus (up to 8 devices).
• Wide SCSI – runs at 5 million transfers per second
on a 16- or 32-bit wide bus (generally 16, for up to
16 devices).
• Fast and Wide SCSI – combines the above two.
The SCSI look
• SCSI commands are grouped into blocks called
Command Descriptor Blocks (CDBs).
• A CDB has the following:
– A control byte
– An op-code
– The LUN ID (optional)
– Any command parameters if required
Structure of the op-code
• The op-code is always the first byte of the
CDB.
– Bits 5 to 7 indicate which group the command
belongs to.
– Bits 0 to 4 indicate the actual command code.
Op-code groups
There are 8 op-code groups in all:
1. Group 0 – six byte commands.
2. Group 1 – ten byte commands.
3. Group 2 – also ten byte commands.
4. Group 3 – reserved.
5. Group 4 – sixteen byte commands.
6. Group 5 – twelve byte commands.
7. Group 6 and Group 7 are for vendor specific
commands.
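Tying the last two slides together, here is a minimal Python sketch that pulls the group and command codes out of an op-code byte and uses the group to look up the CDB length listed above. READ(10), with op-code 0x28, is a standard SCSI command used as the example.

# Sketch of decoding a SCSI op-code byte (bits 7-5 = group,
# bits 4-0 = command) and mapping the group to its CDB length.

CDB_LENGTH_BY_GROUP = {0: 6, 1: 10, 2: 10, 4: 16, 5: 12}  # 3 reserved, 6/7 vendor specific

def decode_opcode(opcode):
    group = (opcode >> 5) & 0x7
    command = opcode & 0x1F
    length = CDB_LENGTH_BY_GROUP.get(group)   # None for reserved/vendor groups
    return group, command, length

# READ(10) has op-code 0x28: group 1, command 0x08, a ten-byte CDB.
print(decode_opcode(0x28))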
Module 3

FIBER CHANNEL
Fiber Channel
This is a serial bus architecture.
Each SCSI bus phase is considered a sequence by FC and
broken into 2K chunks.
Any SCSI operation is an ‘Exchange’. Exchanges are broken
into ‘Sequences’ and sequences into ‘Frames’.
SCSI packets from the host are sent to the HBA/SP ports,
which have the GLM/GBIC installed in them. The GBIC is a
transceiver module fitted into the port that generates the
SONET/SDH frames (encapsulating the SCSI packet) using an ITU
standard called the Generic Framing Procedure (GFP).
FC packets
• Start of frame (4 bytes)
• Frame header (24 bytes)
• Data field (up to 2112 bytes)
• CRC error check (4 bytes)
• End of frame (4 bytes)
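Summing the field sizes listed above gives the maximum on-the-wire size of a single FC frame; a trivial Python check:

# Adding up the field sizes gives the maximum FC frame size.
# The data field is "up to" 2112 bytes, so real frames are often smaller.

FIELDS = {
    "start_of_frame": 4,
    "frame_header": 24,
    "data_field_max": 2112,
    "crc": 4,
    "end_of_frame": 4,
}

max_frame = sum(FIELDS.values())
print(f"maximum FC frame size: {max_frame} bytes")   # 2148 bytes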
FC header
• CTL (control information)
• Source address
• Destination address
• Type
• Seq_cnt
• Seq_ID
• Exchange_ID
FC ports
• N_Port – node port. Can be an HBA/SP port.
• NL_Port – arbitrated loop port. When arbitrated loop topology
instead of switched topology is used.
• F_Port – fabric port. Generally stands for a switch port.
• FL_Port – fabric port with arbitrated loop capabilities. Ports on
a switch used to integrate an arbitrated loop topology.
• G_Port – generic port. Always on the switch. Can be an E_Port
or an F_Port.
• E_Port – expansion port. On the switch. Interconnects switches.
• TE_Port – trunking expansion port. Again on the switch. Similar to
link aggregation.
Class of Service
• This defines delivery options for frame
transmission.
• They define connection, in-order delivery and
confirmation of delivery or non-delivery of
frames.
• There are three classes: 1, 2 and 3.
• SCSI over FC uses 3.
• Error recovery is passed completely to the SCSI
layers.
Zoning
• Zoning is used to present LUNs to servers in a secured way
over a switched network.
• Each node has a WWNN and each port has a WWPN.
These numbers are used to create logical connection maps,
which are held in an ‘active’ configuration file. Only one
configuration can be active at any time, although multiple
zones can exist inside or outside that configuration.
• Zoning using WWNs is called soft zoning. In hard zoning, the
logical connection maps are created between physical ports on
the switch.
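A toy Python representation of these zoning ideas, not any vendor's actual configuration syntax: a soft zone lists WWPNs, a hard zone lists (domain, port) pairs, and only one configuration is active at a time. All names and WWPNs below are made up.

# Toy representation of zoning concepts - not a real switch config format.

zones = {
    "oracle_zone": {                       # soft zone: members are WWPNs
        "type": "soft",
        "members": ["10:00:00:00:c9:2a:11:22", "50:06:01:60:aa:bb:cc:dd"],
    },
    "backup_zone": {                       # hard zone: members are switch ports
        "type": "hard",
        "members": [(1, 4), (1, 9)],       # (domain, port)
    },
}

configurations = {"prod_cfg": ["oracle_zone", "backup_zone"]}
active_configuration = "prod_cfg"          # only one configuration may be active

print("zones in effect:", configurations[active_configuration])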
Registered State Change Notification
• Information about all zones (created on the switch)
and all devices (connected to the switch) are held on
an FC database inside the switch.
• Whenever there is any change in this database,
notifications are sent to the attached devices via
RSCN (provided the attached device supports RSCN).
• There are two types of notifications:
– A node event: when a node port generates an event.
– A fabric event: when the switch generates an event.
FCIP (Fiber Channel over IP)
• Connects multi-site FC-SANs over a TCP/IP link as one logical SAN.
• SCSI packets are encapsulated inside FC frames, and the FC frames are then
encapsulated inside IP datagrams.
• The tunnels behave like ISLs and can be used for trunking and load-
balancing in the same manner.
• A disruption in the IP network also affects the local FC network
temporarily and generates RSCNs.
• All the SAN islands appear as one as they start using a common and
shared namespace.
• The major drawback of this protocol is that there is no way to delink the
IP network from the FC network.
• Generally, the IP tunnel is completely transparent but low-level FC-AL
signals cannot traverse the link.
FCIP ctd.
• The frames employed over the IP tunnels to
establish connection are called FSF – FCIP
Special Frames.
• The receiver verifies the packet and if it is
acceptable then echoes the same back to the
tunnel initiator in an unmodified format.
• Then the initiator verifies the packet, after
which transmission can start.
FSF frames
• FC identifier of tunnel initiator.
• FC endpoint identifier of tunnel initiator.
• FC identifier of the intended destination.
• A 64 bit random number to uniquely identify
the FSF.
iFCP (Internet Fiber Channel Protocol)

• Initially, this protocol aimed to replace FC
switching with ethernet switching by connecting
devices directly to ethernet switches.
• However, the deployment is different. FC
switches are used to connect the devices onto
the ethernet switches (much like the FCIP
protocol).
iFCP’s services
iFCP relies heavily on FCNS (FC Name Server)
and FCZS (FC Zone Server).
The functionality of these two services is
provided by a new service called iSNS
(Internet Storage Name Service) rather than
the traditional SLP (Service Location Protocol).
iFCP gateway devices
• There are two modes:
– Address Transparency Mode: this allows all SAN
devices to operate within a single, common
address space, exactly like FCIP. However, this also
introduces FCIP’s drawback.
– Address Translation Mode: this allows each
separate SAN island to have its own address
space. This mitigates the IP connectivity issue to a
great extent.
Module 4

ISCSI OVERVIEW
iSCSI features
• Same old serial SCSI architecture.
• The first protocol to allow block-device access over an ordinary IP network.
• Requires Gigabit networks.
• iSCSI over WAN can have latency issues.
• TOE cards save CPU overhead.
• The same old TCP three-way handshake (SYN, SYN/ACK, ACK) is used.
• TCP takes care of ordering and retry issues.
• Uses DNS and SLP services. Can be configured to use iSNS as well.
• Removes distance limitations inherent in FC.
• Encapsulates SCSI CDBs inside TCP packets turning them into iSCSI
PDUs.
• Automatic target discovery is handled by the ‘SendTargets’ command.
iSCSI functional overview
The iSCSI target ports form something called a target port group. Each port has
the port group tag attached to it.
An iSCSI login happens in the following manner:
• First the initiator port forms its SCSI port name: the iqn name, ‘i’, and
the ISID in hex.
• Then (in the case of automatic discovery) it sends the SendTargets command
out to the target. The target replies with all the iSCSI portals associated
with that target port group.
• The target forms its SCSI port name from: the iqn name, ‘t’, and the target
port group tag in hex. This is the reason that even a single target port
needs to form a target port group.
• The target identifies each session with a number. During the initiator login
this number is always 0. After the login the target identifies the session
with a unique ID.
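A small Python illustration of the SCSI port names described above; the IQN strings, ISID and target port group tag values are made-up examples.

# Illustration of iSCSI SCSI port names as described in these notes:
# initiator name + ",i," + ISID in hex; target name + ",t," + port group tag in hex.

def initiator_port_name(iqn, isid):
    return f"{iqn},i,{isid:#014x}"

def target_port_name(iqn, portal_group_tag):
    return f"{iqn},t,{portal_group_tag:#06x}"

print(initiator_port_name("iqn.1998-01.com.example:host1", 0x00023d000001))
print(target_port_name("iqn.1998-01.com.example:array1", 1))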
iSCSI PDU (Protocol Data Unit)
• PHY header
• IP header
• TCP header
• iSCSI BHS
• CDB
• Data
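A rough Python estimate of the per-PDU overhead implied by this encapsulation stack, assuming a plain Ethernet frame as the PHY header, minimum IPv4 and TCP headers, the 48-byte iSCSI Basic Header Segment, and no optional digests; the payload size is an arbitrary example.

# Rough per-PDU overhead for the iSCSI encapsulation stack listed above.

HEADERS = {
    "ethernet": 14,   # PHY / link-layer header
    "ipv4": 20,
    "tcp": 20,
    "iscsi_bhs": 48,  # Basic Header Segment
}

overhead = sum(HEADERS.values())
payload = 8192        # example data segment size in bytes
efficiency = payload / (payload + overhead)
print(f"overhead {overhead} bytes, efficiency ~{efficiency:.1%}")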
iSCSI identifiers
• iSCSI names – iSCSI nodes have globally unique names.
The iqn, eui format and any aliases are supported by
ESX.
• ISID – iSCSI session ID. TCP relationship between
initiator and target.
• CID – iSCSI connection identifiers. An iSCSI session
may have several logical connections. They aggregate
bandwidth and provide load balancing.
• iSCSI portals – combination of the IP address of
initiator/target and the port number.
Module 5

SAN SWITCH BASICS


Brocade switch commands
• Display defined and currently effective configuration: cfgShow. Shows the
configuration names and then the zoning information for the active
configuration.
• Display the version: version.
• Display the switch name, mode, WWN and role: switchShow. Role indicates
whether the switch is the master or a slave when participating in an ISL and
is shown as either Principal or Subordinate. If the switch mode is Native
then the fabric has only Brocade switches, while a mixed-switch fabric needs
Interop mode.
• fabricShow – displays all the switches in the fabric with a > mark against the
switch on which the command was run.
• Displaying/clearing port statistics and errors – portStatsShow/portStatsClear.
• A concise form of the statistics – diagShow.
Brocade ctd.
• errShow – breaks down a particular error.
• nsShow – shows the FC database. This means details of
all the N_Ports logged into the switch.
• nsAllShow – displays a list of all nodes with their port
numbers logged into the switch.
• supportShow – generates a log by running 25 or more
different commands. Can be captured by saving the
telnet session into a text file.
• portErrShow – displays an error summary for each port. A useful tactic is
to zero out all port statistics (portStatsClear) and then re-run a test.
EMC Connectrix Manager and McData
The Connectrix Manager facilitates management of EMC and
McData switches.
McData switches have:
• Audit log
• Hardware log
• Link Incident log
• Event log
• Session log
• Product Status log
Debugging information can be collected from Data Collection in the
Connectrix Manager: Element Manager -> Hardware View ->
Maintenance Menu.
CISCO switches
There are two management components:
• Fabric manager – zoning, ISL management and
so on.
• Device manager – physical switch view and
connected devices.
CISCO commands
• show version – all information about hardware
and software.
• show interface brief – shows information about
VSAN number, Admin mode, status, speed etc.
• show fcns database – shows the FC name server
database.
• show tech-support details – gathers logs from a
CISCO switch.
