Storage

Ratnadeep Bhattacharya

Module 1

INTRODUCTION TO STORAGE

Storage simply means a collection of devices that store the bits that constitute data. The storage subsystem can consist of:
- Hard drives (random access devices)
- Tape drives (sequential access devices)
- Optical or magnetic drives (CD/DVD/floppy)
- Memory (random access device)

Hard Drives
‡ Data is written to and read from the disk in a random-access manner. The disks themselves are circular platters with concentric tracks defined on their surfaces. A spindle rotates the platters while read/write heads move across the tracks to read/write data.
‡ Supports a logical addressing scheme, implemented with the help of file systems.
‡ File systems simply assign logical addresses to physical blocks and make them appear sequential to the external world, while the underlying physical layout is scattered.
‡ Partition tables define the disk geography, and the file system defines the block structure within each partition.

Tape Drives
‡ Mostly non-intelligent or semi-intelligent systems. This means that although some devices in this class let you perform a few operations (mostly indexing) while the tape is still in the drive, direct data access is not possible because no file system structure is defined.
‡ The format in which the data is written is mostly not recognised by the host, though on some tape drives you can run commands to identify the date of the backup, the kind of data held and so on.

Optical and magnetic media
‡ CD/DVD drives use optical technology (something I am not familiar with) to save data while achieving greater data densities.
‡ Floppy drives store data in a similar manner but magnetically. Data densities are much lower.
‡ The basic principle of storage is almost the same.

Memory
‡ A very important unit in both the processing and the transmission of information.
‡ Though not generally seen as a storage unit, memory plays a very important role in making use of processor speed. This has driven a great deal of tuning in how memory modules handle data. A basic understanding of memory helps reduce adverse effects on data such as loss or corruption.
‡ System memory was introduced to hold data closer to the processor chip so that the core can access it faster.

Memory (ctd.)
‡ This concept was later re-introduced in the form of caches, which store data at different points along its journey from the disk to the processor over the system bus.
‡ More recently, three levels of cache have been introduced on processor chips, holding duplicate data along with their own addressing systems (like TAG-RAMs) and data access techniques for faster access.
‡ From a storage point of view, we are most concerned with the caches found on RAID cards, HBAs and controllers.

Storage today
‡ Today we look at storage in a very different manner: with awe, with fear and, of course, with excitement.
‡ Storage devices can be held at remote locations and accessed by the operating system just like local disks. This has also introduced an awesome array of technologies in the storage field.
‡ The key words are:
± Speed
± Availability
± Manageability
± Redundancy

Transmission technologies of today
‡ Main players in this arena are:
± Fiber Channel
± iSCSI

‡ Fiber Channel wins on speed and security, using lasers to transmit data with protocols like FCP (local) and FCIP and iFCP (remote, using TCP/IP).
‡ iSCSI wins on cost, familiarity and distance.
‡ FC is touted to be lossless, but that is not entirely correct. Just as electrical lines suffer loss due to impedance and crosstalk created by magnetic flux, optical lines suffer loss due to attenuation, dispersion and reflection.
‡ This seriously hampers the ability of the FCP protocol to carry data over long distances.
‡ We can transmit FC frames with any reliability only over 50-100 km, and even that requires CWDM/DWDM technologies over dark fiber.

Parallel data transmission protocols
‡ ATA (Advanced Technology Attachment)
‡ SCSI (Small Computer System Interface)
‡ SBCCS (Single-Byte Command Code Sets)

ATA (also called PATA)
‡ This was the first protocol to handle data transfers through IDE interfaces.
‡ Started out as a parallel transfer protocol with a large number of pins in the IDE connectors.
‡ The ATA controllers (the IDE interfaces) are traditionally completely unintelligent and rely purely on their electrical capabilities.
‡ With the advent of SATA, it competes fiercely with SAS/SCSI, especially in the low-cost disk segment.

SCSI
‡ Faster than PATA/SATA due to the intelligent nature of the controllers.
‡ This is a connection-oriented protocol that also takes care of delivery reports.
‡ Much costlier but more reliable, with a higher mean time between failures than SATA drives (though those have also much improved).

SBCCS
‡ Like PATA, a parallel protocol. Developed by IBM for data transmission in IBM mainframes.

Serial data transmission
The reason serial data transmission is gaining popularity is a significant reduction in the noise (crosstalk) that arises across parallel electrical lines. Serial protocols are also a lot faster than the earlier parallel protocols.

SATA
‡ SATA signals at 250 mV compared to 5 V for PATA.
‡ SATA is a point-to-point protocol.
‡ Number of pins reduced to 7 from 40 (PATA).
‡ SATA gives a speed of 150 MB/s as compared to 133 MB/s in ATA.
‡ SATA comes in both 3.5-inch and SFF (small form factor) versions.

SATA features
‡ LVD signalling.
‡ 8b/10b encoding.
‡ Lower connector pin count.
‡ Point-to-point connections with hot-plug capability.
‡ There are SATA RAID controllers available in the market, which however are very limited in their offerings (SATA RAID does not support RAID 5 or above).
‡ SATA can combine the controller, BIOS, drivers and the processor to perform RAID 0, 1 and 0+1.
‡ SATA RAID cannot mirror boot drives and is slower, as it does not have independent processors for read/write and RAID calculations.
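
The 8b/10b encoding mentioned above explains how a serial line rate turns into a usable byte rate. A minimal sketch of the arithmetic, assuming a first-generation SATA line rate of 1.5 Gb/s (a figure not stated in these slides); the helper name is hypothetical:

```python
def payload_mb_per_s(line_rate_gbps: float) -> float:
    """Usable megabytes per second on an 8b/10b-encoded serial link."""
    bits_per_second = line_rate_gbps * 1_000_000_000
    # 8b/10b puts 10 bits on the wire for every 8 data bits (one byte),
    # so dividing the line rate by 10 gives bytes per second.
    bytes_per_second = bits_per_second / 10
    return bytes_per_second / 1_000_000


# Assumed line rate for first-generation SATA: 1.5 Gb/s.
print(payload_mb_per_s(1.5))
```

The same division by ten applies to any link that uses 8b/10b, which is why serial link speeds quoted in Gb/s do not convert to MB/s by dividing by eight.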

SAS
Serial Attached SCSI still uses the SCSI command set. However, the maximum speed has increased from 640 MB/s to 6 Gb/s; 3 Gb/s is more common. SAS also supports the SFF form factor, with drives ranging from 72 GB to 180 GB. SAS connectors can also accept SATA drives (though not vice versa). SAS can support 128 drives, compared to the maximum of 15 in SCSI. SAS works point to point, and it also supports:
‡ Serial Management Protocol (SMP)
‡ Serial SCSI Protocol (SSP)
‡ SATA Tunneling Protocol (STP)

RAID
RAID makes multiple physical drives appear as a single volume. RAID can be of two types: hardware and software. Hardware RAID needs all the physical disks to be of the same size, whereas software RAID cannot distribute the boot drive. Also, in software RAID, if the OS is corrupted then the RAID configuration is destroyed. Hardware RAID keeps something known as the RIS (RAID Information Sector) on the RAID controller as well as on the hard drives.

RAID 0
Striping. Read/write speed is high as data is read or written concurrently. Number of hard drives that can be used depends on the controller. Capacity utilisation is 100% but there is no redundancy.
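
Striping can be pictured as a simple mapping from a logical block number to a drive and an offset on that drive. The sketch below is one plain round-robin layout, not any particular controller's; `stripe_blocks` (blocks per stripe unit) is a hypothetical parameter:

```python
def locate_block(logical_block: int, num_drives: int, stripe_blocks: int = 1):
    """Return (drive index, block offset on that drive) for a RAID 0 array."""
    # Which stripe unit the block falls in, and its position inside that unit.
    stripe_number, within = divmod(logical_block, stripe_blocks)
    # Stripe units are dealt out to the drives in round-robin order.
    drive = stripe_number % num_drives
    row = stripe_number // num_drives
    return drive, row * stripe_blocks + within


# Consecutive logical blocks land on different drives, so they can be
# read or written concurrently.
for block in range(6):
    print(block, locate_block(block, num_drives=3))
```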

RAID 1
Mirroring. Write performance is the lowest, as every write has to go to both drives. However, redundancy is 100%. Only two hard drives can be used. Capacity utilisation is 50%.

RAID 5
Distributed Data Guarding. Performance is reasonable. The minimum number of drives required is three and the maximum depends on the controller. Parity is calculated from the stripes and distributed across all the drives. Capacity utilisation is (N-1)/N.
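
Parity in RAID 5 is commonly computed as a byte-wise XOR across the data blocks of a stripe. A minimal sketch under that assumption, with made-up two-byte blocks, showing how a lost block is rebuilt from the survivors:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data blocks of one stripe (illustrative values only).
data = [b"\x0f\xf0", b"\x33\x55"]
parity = xor_blocks(data)

# If the drive holding data[1] fails, XOR of the surviving block and the
# parity block gives the missing block back.
rebuilt = xor_blocks([data[0], parity])
print(parity.hex(), rebuilt == data[1])
```

The same function works for any number of data blocks, which is why only one drive's worth of capacity is spent on parity.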

RAID 6
Advanced Data Guarding. Data is striped and parity is calculated from the data. Then a second, independent set of parity is calculated. Performance is slower than RAID 5 and capacity utilisation is (N-2)/N. However, RAID 6 can survive the failure of up to 2 hard disks.

RAID at a glance
RAID 0: striping.
RAID 1: mirroring.
RAID 2: bit-level striping with error-correcting (Hamming) code.
RAID 3: byte-level striping with dedicated parity.
RAID 4: block-level striping with dedicated parity.
RAID 5: distributed parity (Data Guard).
RAID 6: double distributed parity (Advanced Data Guard).
RAID 3 and RAID 4 vary only in the size of the stripe.
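
The capacity figures quoted for RAID 0, 1, 5 and 6 can be collected into one small function. This is a sketch of the formulas given in these slides; the minimum of four drives for RAID 6 is an assumption that the slides do not state:

```python
from fractions import Fraction


def capacity_utilisation(level: int, drives: int) -> Fraction:
    """Fraction of raw capacity usable for data at a given RAID level."""
    if level == 0:
        return Fraction(1)            # striping only, no redundancy
    if level == 1:
        return Fraction(1, 2)         # every block is stored twice
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return Fraction(drives - 1, drives)
    if level == 6:
        if drives < 4:                # assumed minimum, not from the slides
            raise ValueError("RAID 6 needs at least four drives")
        return Fraction(drives - 2, drives)
    raise ValueError("level not covered by this sketch")


print(capacity_utilisation(5, 4), capacity_utilisation(6, 6))
```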

Signalling types
‡ SE: single ended
‡ LVD: low voltage differential
‡ HVD: high voltage differential

Module 2

SCSI ARCHITECTURE

The SCSI family
‡ The members of the SCSI family are mainly recognised by their corresponding ANSI revision numbers.
‡ SCSI started with a speed of 5 MB/s. Today SCSI is capable of 640 MB/s, while SAS deals with speeds of 3 and 6 Gb/s.
‡ ANSI versions for SCSI:
± ANSI 1: SCSI 1
± ANSI 2: SCSI 2
± ANSI 3: SCSI 3
± ANSI 4: SCSI 3 SPC 2
± ANSI 5: SCSI 3 SPC 3

SCSI diagram

[Diagram: an initiator at one end of the SCSI bus and a terminator at the other]

SCSI architecture
‡ A SCSI channel always begins at the initiator and ends at the terminator.
‡ Traditionally, there are only eight IDs on a SCSI bus: 0 to 7.
‡ The initiator conventionally has the SCSI ID of 7.
‡ The terminator does not have an ID. Its sole purpose in life is to terminate any electrical signal that manages to reach the end of the bus. This ensures that the signal does not reflect (the similarities between light and electricity are endless) back into the bus, creating noise and distortion.
‡ A terminator can be passive (a resistor network) or active (an IC chip).
‡ The rest of the IDs are open for client devices.

SCSI Bus phases
The most important SCSI bus phases are:
‡ Bus free: the BSY and SEL signals are simultaneously false.
‡ Arbitration: the device that wants the bus raises the BSY signal and its own SCSI ID on the bus. If no other device raises a higher SCSI ID, the requesting device gains control of the bus.
‡ Selection: the device that won arbitration (the initiator) selects the target it wants to work with, effectively setting up an I_T nexus over which command and data transfer operations can proceed.
‡ Reselection: allows a target to re-establish a connection to the initiator that was originally set up by the initiator but suspended by the target.
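
The arbitration rule (the highest SCSI ID raised on the bus wins) can be sketched in a few lines. This assumes the traditional eight-ID bus described earlier; the function name is hypothetical:

```python
def arbitrate(requesting_ids):
    """Return the SCSI ID that wins arbitration, or None if nobody asked."""
    ids = set(requesting_ids)
    if any(not 0 <= scsi_id <= 7 for scsi_id in ids):
        raise ValueError("this sketch only models IDs 0 to 7")
    # The device that raised the highest ID gains control of the bus.
    return max(ids, default=None)


# An initiator at ID 7 always beats the other devices on the bus.
print(arbitrate([2, 7, 5]))
```

This is also why the initiator is normally given ID 7: it can never lose arbitration to a client device.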

SCSI flavours
‡ Fast SCSI: can process 10 million transfers per second. Has a bus width of 8 bits, which allows 8 devices.
‡ Wide SCSI: can process 5 million transfers per second. Has a bus width of 16 or 32 bits; generally a width of 16 is used, which allows 16 devices.
‡ Fast and Wide SCSI: combines the above two.
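
The throughput of each flavour follows from the transfer rate multiplied by the bus width. A small worked sketch, assuming the widths above are bus widths in bits and that each transfer moves one full bus width of data:

```python
def bus_throughput_mb_per_s(million_transfers_per_s: int, bus_width_bits: int) -> float:
    """Megabytes per second for a parallel bus."""
    # Each transfer carries bus_width_bits / 8 bytes.
    return million_transfers_per_s * bus_width_bits / 8


print("Fast:", bus_throughput_mb_per_s(10, 8))
print("Wide:", bus_throughput_mb_per_s(5, 16))
print("Fast and Wide:", bus_throughput_mb_per_s(10, 16))
```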

The SCSI look
‡ SCSI commands are packaged into blocks called Command Descriptor Blocks (CDBs).
‡ A CDB has the following:
± An op-code
± The LUN ID (optional)
± Any command parameters if required
± A control byte

Structure of the op-code
‡ The op-code is always the first byte of the CDB.
± Bits 5 to 7 (the top three bits) indicate which group the command belongs to.
± Bits 0 to 4 (the low five bits) indicate the actual command.

Op-code groups
There are 8 op-code groups in all:
1. Group 0: six-byte commands.
2. Group 1: ten-byte commands.
3. Group 2: also ten-byte commands.
4. Group 3: reserved.
5. Group 4: sixteen-byte commands.
6. Group 5: twelve-byte commands.
7. Group 6 and Group 7: vendor-specific commands.
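
The group table above can be turned into a small decoder. This sketch assumes the standard layout in which the top three bits of the op-code carry the group (eight groups need three bits) and the low five bits carry the command; the names are hypothetical:

```python
# CDB length in bytes for each group; reserved and vendor-specific
# groups (3, 6, 7) have no fixed length in this sketch.
CDB_LENGTH_BY_GROUP = {0: 6, 1: 10, 2: 10, 4: 16, 5: 12}


def decode_opcode(opcode: int):
    """Split an op-code byte into (group, command, CDB length or None)."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("op-code must fit in one byte")
    group = opcode >> 5        # top three bits
    command = opcode & 0x1F    # low five bits
    return group, command, CDB_LENGTH_BY_GROUP.get(group)


# 0x28 is the op-code of the ten-byte READ command.
print(decode_opcode(0x28))
```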

Module 3

FIBER CHANNEL

Fiber Channel
This is a serial bus architecture. Each SCSI bus phase is treated as a Sequence by FC and broken into chunks of about 2 KB. Any SCSI operation is an Exchange. Exchanges are broken into Sequences, and Sequences into Frames. SCSI packets from the host are sent to the HBA/SP ports, which have the GLM/GBIC installed in them. The GBIC is the pluggable transceiver on the port that puts the frames onto the physical link. Where FC is carried over SONET/SDH, the FC frames (encapsulating the SCSI packet) are mapped into SONET/SDH using an ITU standard called the Generic Framing Procedure (GFP).

FC packets
‡ Start of frame (4 bytes)
‡ Frame header (24 bytes)
‡ Data field (up to 2112 bytes)
‡ CRC error check (4 bytes)
‡ End of frame (4 bytes)
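
From the field sizes above one can estimate how many frames a Sequence needs and how many bytes it occupies in frames. A rough sketch that treats the whole 2112-byte data field as payload and ignores line encoding and inter-frame fill; the function names are hypothetical:

```python
SOF, HEADER, MAX_DATA, CRC, EOF = 4, 24, 2112, 4, 4
OVERHEAD_PER_FRAME = SOF + HEADER + CRC + EOF   # 36 bytes
MAX_FRAME = OVERHEAD_PER_FRAME + MAX_DATA       # 2148 bytes


def frames_needed(sequence_bytes: int) -> int:
    """Number of frames needed to carry a Sequence of the given size."""
    return -(-sequence_bytes // MAX_DATA)        # ceiling division


def framed_size(sequence_bytes: int) -> int:
    """Total bytes once every frame's delimiters, header and CRC are added."""
    return sequence_bytes + frames_needed(sequence_bytes) * OVERHEAD_PER_FRAME


print(MAX_FRAME, frames_needed(8192), framed_size(8192))
```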

FC header
‡ CTL (control information)
‡ Source address
‡ Destination address
‡ Type
‡ Seq_cnt
‡ Seq_ID
‡ Exchange_ID

FC ports
‡ N_Port: node port. Can be an HBA/SP port.
‡ NL_Port: node port on an arbitrated loop. Used when an arbitrated loop topology is used instead of a switched topology.
‡ F_Port: fabric port. Generally stands for a switch port.
‡ FL_Port: fabric port with arbitrated loop capabilities. Ports on a switch used to integrate an arbitrated loop topology.
‡ G_Port: generic port. Always on the switch. Can act as an E_Port or an F_Port.
‡ E_Port: expansion port. On the switch. Interconnects switches.
‡ TE_Port: trunking expansion port. Again on the switch. Similar to link aggregation.

Class of Service
‡ This defines delivery options for frame transmission.
‡ The classes define connection, in-order delivery and confirmation of delivery or non-delivery of frames.
‡ There are three classes: 1, 2 and 3.
‡ SCSI over FC uses class 3.
‡ Error recovery is passed completely to the SCSI layers.

Zoning
‡ Zoning is used to present LUNs to servers in a secure way over a switched network.
‡ Each node has a WWNN and each port has a WWPN. These numbers are used to create logical connection maps, which are held in an active configuration file. Only one configuration can be active at any time, although multiple zones can exist inside or outside the configuration file.
‡ Zoning using WWNs is called soft zoning. In hard zoning, the logical connection maps are created between physical ports on the same switch.
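
The effect of WWN-based zoning can be sketched as a membership check against the active configuration: two ports may talk only if some zone contains both. The zone names and WWPNs below are made up for illustration and this is not any switch vendor's data model:

```python
# Hypothetical active configuration: zone name -> set of member WWPNs.
ACTIVE_CONFIG = {
    "zone_host1_array": {"10:00:00:00:c9:00:00:01", "50:06:01:60:00:00:00:0a"},
    "zone_host2_array": {"10:00:00:00:c9:00:00:02", "50:06:01:60:00:00:00:0a"},
}


def can_communicate(wwpn_a: str, wwpn_b: str, config=ACTIVE_CONFIG) -> bool:
    """True if at least one zone in the configuration holds both ports."""
    return any(wwpn_a in members and wwpn_b in members
               for members in config.values())


# Each host sees the array port, but the two hosts do not see each other.
print(can_communicate("10:00:00:00:c9:00:00:01", "50:06:01:60:00:00:00:0a"))
print(can_communicate("10:00:00:00:c9:00:00:01", "10:00:00:00:c9:00:00:02"))
```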

Registered State Change Notification
‡ Information about all zones (created on the switch) and all devices (connected to the switch) is held in an FC database inside the switch.
‡ Whenever there is any change in this database, notifications are sent to the attached devices via RSCN (provided that the attached device supports RSCN).
‡ There are two types of notifications:
± A node event: when a node port generates an event.
± A fabric event: when the switch generates an event.

FCIP (Fiber Channel over IP)
‡ Connects multi-site FC-SANs over a TCP/IP link as one logical SAN.
‡ SCSI packets are encapsulated inside FC frames, and the FC frames are then encapsulated inside IP datagrams.
‡ The tunnels behave like ISLs and can be used for trunking and load-balancing in the same manner.
‡ A disruption in the IP network also affects the local FC network temporarily and generates RSCNs.
‡ All the SAN islands appear as one, as they start using a common, shared namespace.
‡ The major drawback of this protocol is that there is no way to delink the IP network from the FC network.
‡ Generally, the IP tunnel is completely transparent, but low-level FC-AL signals cannot traverse the link.

FCIP ctd.
‡ The frames employed over the IP tunnels to establish a connection are called FSFs (FCIP Special Frames).
‡ The receiver verifies the frame and, if it is acceptable, echoes it back to the tunnel initiator unmodified.
‡ The initiator then verifies the echoed frame, after which transmission can start.

FSF frames
‡ FC identifier of tunnel initiator.
‡ FC endpoint identifier of tunnel initiator.
‡ FC identifier of the intended destination.
‡ A 64 bit random number to uniquely identify the FSF.
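
The FSF exchange (send, echo unmodified, verify) can be sketched with the four fields listed above. This is only an illustration of the echo check; the field and function names are hypothetical and the real frame layout is not modelled:

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class FSF:
    initiator_fc_id: str
    initiator_endpoint_id: str
    destination_fc_id: str
    nonce: int                      # 64 bit random number


def make_fsf(initiator_fc_id, initiator_endpoint_id, destination_fc_id):
    """Tunnel initiator builds an FSF with a fresh 64 bit random number."""
    return FSF(initiator_fc_id, initiator_endpoint_id,
               destination_fc_id, secrets.randbits(64))


def receiver_reply(fsf, my_fc_id):
    """Receiver echoes the FSF unchanged if it is the intended destination."""
    return fsf if fsf.destination_fc_id == my_fc_id else None


def initiator_accepts(sent, echoed):
    """Initiator starts transmission only if the echo matches what it sent."""
    return echoed is not None and echoed == sent


sent = make_fsf("fabric-A", "endpoint-1", "fabric-B")
print(initiator_accepts(sent, receiver_reply(sent, "fabric-B")))
```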

iFCP (Internet Fiber Channel Protocol)
‡ Initially, this protocol aimed to replace FC switching with Ethernet switching by connecting devices directly to Ethernet switches.
‡ However, actual deployment is different. FC switches are used to connect the devices to the Ethernet switches (much like the FCIP protocol).

iFCP's services
iFCP relies heavily on FCNS (FC Name Server) and FCZS (FC Zone Server). The functionality of these two services is provided by a new service called iSNS (Internet Storage Name Service) rather than the traditional SLP (Service Location Protocol).

iFCP gateway devices
‡ There are two modes:
± Address Transparency Mode: this allows all SAN devices to operate within a single, common address space, exactly like FCIP. However, this also introduces FCIP's drawback.
± Address Translation Mode: this allows each separate SAN island to have its own address space. This mitigates the IP connectivity issue to a great extent.

Module 4

ISCSI OVERVIEW

iSCSI features
‡ Same old serial SCSI architecture.
‡ The first to allow block device access over an ordinary electrical (Ethernet) network.
‡ Requires Gigabit networks.
‡ iSCSI over WAN can have latency issues.
‡ TOE cards save CPU overhead.
‡ The same old three-way (SYN, SYN/ACK, ACK) handshake is used.
‡ TCP takes care of ordering and retry issues.
‡ Uses DNS and SLP services. Can be configured to use iSNS as well.
‡ Removes the distance limitations inherent in FC.
‡ Encapsulates SCSI CDBs inside TCP packets, turning them into iSCSI PDUs.
‡ Automatic target discovery is handled by the SendTargets command.

iSCSI functional overview
The iSCSI target ports form something called a target portal group. Each port has the portal group tag attached to it. An iSCSI login happens in the following manner:
‡ First the initiator port creates a SCSI port name from: the iqn name, the letter i, and the hex value of the ISID.
‡ Then (in case of automatic discovery) it sends the SendTargets command to the target. The target replies with all the iSCSI portals associated with that target portal group.
‡ The target forms its SCSI port name from: the iqn name, the letter t, and the hex value of the target portal group tag. This is why even a single target port needs to form a portal group.
‡ The target identifies each session with a number. During the initial login this number is always 0. After the login the target identifies the session with a unique ID.
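
The way the two SCSI port names are put together can be sketched as plain string formatting. The example iqn names are made up, and the exact hex formatting (a 0x prefix, 12 digits for the 48-bit ISID, 4 digits for the 16-bit group tag) is an assumption of this sketch:

```python
def initiator_port_name(iqn: str, isid: int) -> str:
    """iqn name + ',i,' + ISID in hex (the ISID is 48 bits wide)."""
    if not 0 <= isid < 2 ** 48:
        raise ValueError("ISID must fit in 48 bits")
    return f"{iqn},i,0x{isid:012x}"


def target_port_name(iqn: str, group_tag: int) -> str:
    """iqn name + ',t,' + group tag in hex (the tag is 16 bits wide)."""
    if not 0 <= group_tag < 2 ** 16:
        raise ValueError("group tag must fit in 16 bits")
    return f"{iqn},t,0x{group_tag:04x}"


# Hypothetical node names, for illustration only.
print(initiator_port_name("iqn.2001-04.com.example:host1", 0x00023D000001))
print(target_port_name("iqn.2001-04.com.example:array1", 1))
```

Because the group tag is part of the target's port name, a target with a single port still has to belong to a group before the name can be formed.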

iSCSI PDU (Protocol Data Unit)
‡ PHY header
‡ IP header
‡ TCP header
‡ iSCSI BHS (Basic Header Segment)
‡ CDB
‡ Data

iSCSI identifiers
‡ iSCSI names: iSCSI nodes have globally unique names. The iqn and eui formats and any aliases are supported by ESX.
‡ ISID: Initiator Session ID. Identifies the session, which is the relationship between an initiator and a target carried over TCP.
‡ CID: iSCSI connection identifier. An iSCSI session may have several logical connections, which aggregate bandwidth and provide load balancing.
‡ iSCSI portals: the combination of the IP address of the initiator/target and the port number.

Module 5

SAN SWITCH BASICS

Brocade switch commands
‡ Display the defined and currently effective configuration: cfgShow. Shows the configuration names and then the zoning information for the active configuration.
‡ Display the version: version.
‡ Display the switch name, mode, WWN and role: switchShow. The role says whether the switch is the master or a slave among the switches joined by ISLs; it is either Principal or Subordinate. If the switch mode is Native then the fabric only has Brocade switches, while a mixed-vendor fabric needs the Interop mode.
‡ fabricShow: displays all the switches in the fabric with a > mark against the switch on which the command was run.
‡ Displaying/clearing port statistics and errors: portStatsShow/portStatsClear.
‡ A concise form of the statistics: diagShow.

Brocade ctd.
‡ errShow: breaks down a particular error.
‡ nsShow: shows the FC database, meaning the details of all the N_Ports logged into the switch.
‡ nsAllShow: displays a list of all nodes, with their port numbers, logged into the switch.
‡ supportShow: generates a log by running 25 or more different commands. Can be captured by saving the telnet session into a text file.
‡ portErrShow: shows port errors. A useful tactic is to zero out all port statistics and start any test again.

EMC Connectrix Manager and McData
The Connectrix Manager facilitates management of a large number of EMC and McData switches. McData switches have:
‡ Audit log
‡ Hardware log
‡ Link Incident log
‡ Event log
‡ Session log
‡ Product Status log
Debugging information can be collected from Data Collection in the Connectrix Manager: Element Manager -> Hardware View -> Maintenance Menu.

CISCO switches
There are two management components:
‡ Fabric Manager: zoning, ISL management and so on.
‡ Device Manager: physical switch view and connected devices.

CISCO commands
‡ show version: all information about hardware and software.
‡ show interface brief: shows information about VSAN number, admin mode, status, speed etc.
‡ show fcns database: shows the FC name server database.
‡ show tech-support details: gathers logs from a CISCO switch.
