Dr Punitha K
Overview
• External storage systems
• Organization and structure of disk drives
• Reliability of memory systems
• Error detecting and error correcting
systems
• RAID Levels
• I/O Performance
Storage Technology Drivers
• Driven by the prevailing computing paradigm
– 1950s: migration from batch to on-line processing
– 1990s: migration to ubiquitous computing
• computers in phones, books, cars, video cameras
• nationwide fiber optical network with wireless tails
• Effects on storage industry
– Embedded storage
• smaller, cheaper, more reliable, lower power
– Data utilities
• high capacity, hierarchically managed storage
Types of Storage Devices
• Purpose
– Long-term, nonvolatile storage
– Large, inexpensive, slow level in the storage hierarchy
• Types
– Magnetic storage: disk, floppy, tape
– Optical storage: compact discs (CD), digital video/versatile discs
(DVD)
– Electrical storage: flash memory
• Bus Interface
– IDE: ATA, S-ATA
– SCSI: Small Computer System Interface
– USB: Universal Serial Bus
– IEEE 1394 (FireWire), Fibre Channel, etc.
External storage systems
• External storage comprises devices that store
information outside a computer. Such devices
may be permanently attached to the computer,
may be removable, or may use removable media.
Types of external storage devices
• Examples:
– External HDD
– Linear Tape-Open 8 (LTO-8) tape drive and media
– Automated tape library
– Optical media formats
– USB flash drive
• External storage vs. internal storage
• Security and data protection
Magnetic Disks
Magnetic Disk: Read and Write Mechanisms
• Recording & retrieval via conductive coil called a head
• May be single read/write head or separate ones
• During read/write, head is stationary, platter rotates
• Write
– Current through coil produces magnetic field
– Pulses sent to head
– Magnetic pattern recorded on surface below
• Read (traditional)
– Magnetic field moving relative to coil produces current
– Coil is the same for read and write
• Read (contemporary)
– Separate read head, close to write head
– Partially shielded magnetoresistive (MR) sensor
– Electrical resistance depends on direction of magnetic field
– High frequency operation
– Higher storage density and speed
Inductive Write / Magnetoresistive Read Head
Data Organization and Formatting
Error detection and correction
• Errors
• When bits are transmitted over a computer network, they can be
corrupted by interference and network problems. Corrupted bits cause the
destination to receive spurious data; these corruptions are called
errors.
• Types of Errors
• Single-bit error: in the received frame, only one bit has been corrupted,
i.e. changed either from 0 to 1 or from 1 to 0.
• Multiple-bit error: in the received frame, more than one bit is
corrupted.
• Burst error: in the received frame, two or more consecutive bits are
corrupted.
• Figure: single-bit error
• Figure: burst error
Basic concepts
Networks must be able to transfer data from
one device to another with complete accuracy.
Data can be corrupted during transmission.
For reliable communication, errors must be
detected and corrected.
Error detection and correction
are implemented either at the data link
layer or the transport layer of the OSI
model.
Types of Errors
Single-bit error
• Single-bit errors are the least likely type of error
in serial data transmission, because the noise would
have to last no longer than one bit time, which is very rare.
However, this kind of error can occur in parallel
transmission.
• Example:
If data is sent at 1 Mbps, then each bit lasts only
1/1,000,000 s, or 1 μs.
For a single-bit error to occur, the noise must last
no longer than 1 μs, which is very rare.
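The arithmetic in this example can be checked with a short Python sketch. The function names and the 10 μs noise burst are illustrative assumptions, not part of the slides, and the noise is assumed to start on a bit boundary.

```python
def bit_duration_us(rate_bps: int) -> float:
    """Time one bit occupies on the line, in microseconds."""
    return 1_000_000 / rate_bps


def bits_hit_by_noise(rate_bps: int, noise_us: int) -> int:
    """Number of bit slots overlapped by a noise burst of noise_us microseconds.

    Assumes the noise starts exactly on a bit boundary (integer ceiling division).
    """
    return (noise_us * rate_bps + 999_999) // 1_000_000


print(bit_duration_us(1_000_000))        # 1.0 (each bit lasts 1 microsecond)
print(bits_hit_by_noise(1_000_000, 1))   # 1  (noise must be this short for a single-bit error)
print(bits_hit_by_noise(1_000_000, 10))  # 10 (a longer burst damages many bits)
```

At higher data rates the same noise duration covers more bits, which is why short noise events usually show up as burst errors rather than single-bit errors.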
Burst error
The term burst error means that two or more bits
in the data unit have changed from 1 to 0 or from
0 to 1.
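The error types can be told apart by comparing the sent and received frames bit by bit. The following is a minimal sketch; `classify_error` is a hypothetical helper, and it assumes the common convention that burst length is measured from the first corrupted bit to the last, even if some bits in between arrived intact.

```python
def classify_error(sent: str, received: str) -> str:
    """Classify the difference between two equal-length bit strings."""
    if len(sent) != len(received):
        raise ValueError("frames must have the same length")
    # Positions where the received bit differs from the sent bit.
    flipped = [i for i, (a, b) in enumerate(zip(sent, received)) if a != b]
    if not flipped:
        return "no error"
    if len(flipped) == 1:
        return "single-bit error"
    # Assumed convention: burst length spans first to last corrupted bit.
    burst_len = flipped[-1] - flipped[0] + 1
    return f"burst error of length {burst_len}"


print(classify_error("00000010", "00001010"))                  # single-bit error
print(classify_error("0100010001000011", "0101110101100011"))  # burst error of length 8
```

In the second call only four bits actually changed, but they are spread over a span of eight bit positions, so the burst length is 8 under the assumed convention.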
• RAID Levels and Types: RAID levels are grouped into the following
categories:
• Standard RAID levels
• Non-standard RAID levels
• Nested/hybrid RAID levels
• Additionally, you can choose how to implement RAID on your system:
hardware RAID, software RAID, or firmware RAID.
• The following list explains the standard RAID levels (0, 1, 2, 3, 4, 5, 6) and popular non-standard
and hybrid options (RAID 10).
RAID Levels
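• RAID 0: data striping, no redundancy
• RAID 1: mirrored disks
• RAID 2: parallel access technique; bit-level striping with Hamming-code ECC
• RAID 3: parallel access technique; byte-level striping with a dedicated parity disk
• RAID 4: independent access technique; block-level striping with a dedicated parity disk
• RAID 5: independent access technique; block-level striping with parity distributed round-robin across the disks
• RAID 6: block-level striping with double distributed parity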
Key evaluation points for a RAID System
• Reliability: How many disk faults can the system tolerate?
Disadvantages of RAID 0
• Doesn't provide fault tolerance or redundancy.
Reliability (RAID 1): 1 to N/2
A single disk failure can always be handled, because
every block on that disk has a duplicate on another
disk. If we are lucky and disks 0 and 2 fail, this can
also be handled, because the blocks of these disks
have duplicates on disks 1 and 3. So, in
the best case, N/2 disk failures can be handled.
Capacity: N*B/2
Only half the space is being used to store data. The
other half is just a mirror to the already stored data.
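The "1 to N/2" reliability range and the N*B/2 capacity can be checked by brute force. This is a minimal sketch that assumes disks are mirrored in pairs (0, 1), (2, 3), and so on, matching the four-disk example above; the function names are illustrative.

```python
from itertools import combinations


def mirror_pairs(n_disks: int) -> list[tuple[int, int]]:
    """Assumed layout: disks (0, 1), (2, 3), ... mirror each other."""
    if n_disks < 2 or n_disks % 2:
        raise ValueError("need an even number of disks")
    return [(d, d + 1) for d in range(0, n_disks, 2)]


def survives(n_disks: int, failed: set[int]) -> bool:
    """Data survives as long as no mirror pair has lost both of its disks."""
    return all(not (a in failed and b in failed) for a, b in mirror_pairs(n_disks))


def usable_blocks(n_disks: int, blocks_per_disk: int) -> int:
    """Capacity N*B/2: half the blocks hold mirror copies."""
    return n_disks * blocks_per_disk // 2


print(all(survives(4, {d}) for d in range(4)))  # True: any single failure is tolerated
print(survives(4, {0, 2}))                      # True: the lucky case, one disk from each pair
print(survives(4, {0, 1}))                      # False: both copies of a pair are gone
print(any(survives(4, set(c)) for c in combinations(range(4), 3)))  # False: more than N/2 never survives
print(usable_blocks(4, 100))                    # 200
```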
Advantages of RAID 1
Increased read performance.
Provides redundancy and fault tolerance.
Simple to configure and easy to use.
Disadvantages of RAID 1
Uses only half of the storage capacity.
More expensive (needs twice as many drives).
May require powering down the computer to replace a failed drive (when hot swapping is not supported).
• Disadvantages of RAID 2
Expensive.
Difficult to implement.
Requires entire disks for ECC.
Reliability: 1
RAID-4 allows recovery of at most 1 disk failure
(because of the way parity works). If more than one
disk fails, there is no way to recover the data.
Capacity: (N-1)*B
One disk in the system is reserved for storing the
parity. Hence, (N-1) disks are made available for
data storage, each disk having B blocks.
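Why parity recovers exactly one failed disk can be shown with XOR, the usual parity function. The sketch below is illustrative only: the block contents and the helper name `xor_blocks` are assumptions, and one stripe on three data disks plus one parity disk is modelled.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe spread over three data disks (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # stored on the dedicated parity disk

# Disk 1 fails: XOR of the surviving data blocks and the parity block
# gives back the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'BBBB'
```

If two disks fail, the single parity block yields one equation with two unknowns, so the lost data cannot be reconstructed. That is the reason for the reliability figure of 1.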
Advantages of RAID 4
Fast read operations.
Low storage overhead.
Simultaneous I/O requests.
Disadvantages of RAID 4
The dedicated parity disk is a bottleneck that has a big effect on overall performance.
Slow write operations.
Redundancy is lost if the parity disk fails.
When RAID 4 Should Be Used
• Given its configuration, RAID 4 works best for use cases
that involve sequential reading and writing of huge
files. Still, just as with RAID 3, RAID 4 has been replaced
by RAID 5 in most solutions.
RAID 5: Striping with Parity
Evaluation:
Reliability: 1
RAID-5 allows recovery of at most 1 disk
failure (because of the way parity works). If
more than one disk fails, there is no way to
recover the data. This is identical to RAID-4.
Capacity: (N-1)*B
Overall, space equivalent to one disk is
utilized in storing the parity. Hence, (N-1)
disks are made available for data storage,
each disk having B blocks.
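The difference from RAID 4 is that the parity block moves to a different disk on each stripe, so no single disk holds all the parity. The sketch below assumes one simple rotation (parity starts on the last disk and moves one disk to the left per stripe); real implementations use several different layouts, and the function names are illustrative.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Disk holding the parity block of a stripe (assumed rotation scheme)."""
    return (n_disks - 1 - stripe) % n_disks


def stripe_layout(stripe: int, n_disks: int) -> list[str]:
    """Contents of one stripe: 'P' for parity, 'Dk' for data block k."""
    p = parity_disk(stripe, n_disks)
    next_block = stripe * (n_disks - 1)  # each stripe stores N-1 data blocks
    layout = []
    for disk in range(n_disks):
        if disk == p:
            layout.append("P")
        else:
            layout.append(f"D{next_block}")
            next_block += 1
    return layout


for s in range(4):
    print(s, stripe_layout(s, 4))
# 0 ['D0', 'D1', 'D2', 'P']
# 1 ['D3', 'D4', 'P', 'D5']
# 2 ['D6', 'P', 'D7', 'D8']
# 3 ['P', 'D9', 'D10', 'D11']
```

Each stripe holds N-1 data blocks and one parity block, which is where the (N-1)*B capacity comes from.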
Advantages of RAID 5
High performance and capacity.
Fast and reliable read speed.
Tolerates single drive failure.
Disadvantages of RAID 5
Longer rebuild time.
Uses the equivalent of one disk of storage capacity for parity.
If more than one disk fails, data is lost.
More complex to implement.
When RAID 5 Should Be Used
• RAID 5 is often used for file and application servers because of
its high efficiency and optimized storage. It is also a
cost-effective choice if continuous data access is a priority
and/or the operating system must be installed on the array.
RAID 6: Striping with Double Parity
Advantages of RAID 6
High fault and drive-failure tolerance.
Storage efficiency (when more than four drives are used).
Fast read operations.
Disadvantages of RAID 6
Long rebuild time (a rebuild can take as much as 24 hours).
Slow write performance.
Complex to implement.
More expensive.
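The storage-efficiency remark can be made concrete by comparing usable capacity across levels. This is a rough sketch: the RAID 1, 4 and 5 figures follow the capacity formulas given earlier, while N*B for RAID 0, (N-2)*B for RAID 6 and N*B/2 for RAID 10 are the standard values assumed here. Two-way mirroring and an even disk count are assumed for RAID 1 and RAID 10, and the function names are illustrative.

```python
def data_disks(level: str, n_disks: int) -> int:
    """Number of disks' worth of space left for data (assumed standard values)."""
    table = {
        "RAID 0": n_disks,        # no redundancy
        "RAID 1": n_disks // 2,   # half the disks hold mirror copies
        "RAID 4": n_disks - 1,    # one dedicated parity disk
        "RAID 5": n_disks - 1,    # one disk's worth of distributed parity
        "RAID 6": n_disks - 2,    # two disks' worth of parity
        "RAID 10": n_disks // 2,  # striped mirrors
    }
    return table[level]


def usable_blocks(level: str, n_disks: int, blocks_per_disk: int) -> int:
    return data_disks(level, n_disks) * blocks_per_disk


def efficiency(level: str, n_disks: int) -> float:
    """Fraction of raw capacity available for data."""
    return data_disks(level, n_disks) / n_disks


print(efficiency("RAID 6", 4))   # 0.5  (no better than mirroring)
print(efficiency("RAID 6", 8))   # 0.75 (pays off with more drives)
print(efficiency("RAID 10", 8))  # 0.5
print(usable_blocks("RAID 5", 4, 100))  # 300
```

With four drives RAID 6 gives up half the raw capacity, the same as mirroring; its efficiency advantage only appears as the drive count grows.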
When RAID 6 Should Be Used
• RAID 6 is a good solution for mission-critical applications where
data loss cannot be tolerated. Therefore, it is often used for data
management in defence sectors, healthcare, and banking.
RAID 10: Mirroring with Striping
Advantages of RAID 10
High performance.
High fault-tolerance.
Fast read and write operations.
Fast rebuild time.
Disadvantages of RAID 10
Limited scalability.
Costly (compared to other RAID levels).
Uses half of the disk space capacity.
More complicated to set up.
When RAID 10 Should Be Used
• RAID 10 is often used in use cases that require storing high
volumes of data, fast read and write times, and high fault
tolerance. Accordingly, this RAID level is often implemented for
email servers, web hosting servers, and databases.
Non-Standard RAID
• The RAID levels mentioned above are considered standard or commonly used RAID
implementations. However, there is a myriad of ways you can set up redundant
arrays of independent disks.
• Accordingly, many open-source projects and companies have created their own
configurations to adhere to their needs. As a result, there are many non-standard
RAID implementations, such as:
RAID-DP
Linux MD RAID 10
RAID-Z
Drive Extender
De-clustered RAID
Nested (Hybrid) RAID
• You can combine two or more standard RAID levels to ensure better performance
and redundancy. Such combinations are called nested (or hybrid) RAID levels.
• Hybrid RAID implementations are named after the RAID levels they incorporate. In
most cases, they include two numbers where their order represents the layering
scheme.
• Popular hybrid RAID levels include:
RAID 01 (striping and mirroring; also known as “mirror of stripes”)
RAID 03 (byte-level striping and dedicated parity)
RAID 10 (disk mirroring and straight block-level striping)
RAID 50 (distributed parity and straight block-level striping)
RAID 60 (dual parity and straight block-level striping)
RAID 100 (a stripe of RAID 10s)
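As an illustration of how two levels are layered, the sketch below maps a logical block to its two physical copies in RAID 10. The layout is an assumption for illustration: disks (0, 1), (2, 3), ... form mirror pairs, and logical blocks are striped round-robin across the pairs. `raid10_locations` is a hypothetical helper.

```python
def raid10_locations(block: int, n_disks: int) -> list[tuple[int, int]]:
    """Return (disk, offset) for both copies of a logical block.

    Assumed layout: mirror pairs (0, 1), (2, 3), ...; blocks striped
    round-robin across the pairs.
    """
    if n_disks < 4 or n_disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    n_pairs = n_disks // 2
    pair = block % n_pairs      # striping: choose the mirror pair
    offset = block // n_pairs   # position of the block on each disk of the pair
    return [(2 * pair, offset), (2 * pair + 1, offset)]  # mirroring: two copies


print(raid10_locations(0, 4))  # [(0, 0), (1, 0)]
print(raid10_locations(1, 4))  # [(2, 0), (3, 0)]
print(raid10_locations(5, 4))  # [(2, 2), (3, 2)]
```

The inner level (mirroring) decides how many copies exist; the outer level (striping) decides which mirror pair receives each block.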
RAID Implementation Types
• There are three ways of utilizing RAID:
Hardware-based RAID
• When installing the hardware setup, you insert a RAID controller card in a
fast PCI-Express slot on the motherboard and connect it to the drives.
External RAID drive enclosures with a built-in controller card are also
available.
Software-based RAID
• For the software setup, you connect the drives directly to the computer,
without using a RAID controller. In that case, you manage the disks through
utility software on the operating system.
Firmware/Driver-based RAID
• Firmware-based RAID (also known as driver-based RAID) is a RAID system
whose logic is often stored directly on the motherboard. All of its operations are performed by
the computer's CPU, not by a dedicated processor.
RAID DISK ARRAY
Advantages of RAID
Transfer of large sequential files and graphic images is easier.
Hardware based implementation is more robust.
Software based implementation is cost-effective.
High performance and data protection can be achieved.
Fault tolerance capacity is high.
They require less power.
Controller logic is built-in which helps in error detection and correction functions.
Disadvantages of RAID
In spite of using this technology, backup software is a must.
Mapping logical blocks onto physical locations is complex.
Data chunk size affects the performance of disk array.
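The last two points can be illustrated with the simplest case, RAID 0 striping. The sketch below is a minimal model under assumed conventions: chunks are placed round-robin across the disks, and the function names are illustrative.

```python
def raid0_locate(logical_block: int, n_disks: int, chunk_blocks: int) -> tuple[int, int]:
    """Map a logical block to (disk, block offset on that disk)."""
    chunk = logical_block // chunk_blocks   # which chunk the block falls in
    within = logical_block % chunk_blocks   # position inside that chunk
    disk = chunk % n_disks                  # chunks go round-robin across disks
    stripe = chunk // n_disks               # full rows of chunks before this one
    return disk, stripe * chunk_blocks + within


def disks_touched(start: int, length: int, n_disks: int, chunk_blocks: int) -> int:
    """How many different disks a request of `length` blocks involves."""
    return len({raid0_locate(b, n_disks, chunk_blocks)[0]
                for b in range(start, start + length)})


print(raid0_locate(5, 4, 1))      # (1, 1)
print(raid0_locate(5, 4, 2))      # (2, 1)
print(disks_touched(0, 4, 4, 1))  # 4: small chunks spread one request over all disks
print(disks_touched(0, 4, 4, 4))  # 1: large chunks keep the request on one disk
```

Small chunks let a single large transfer use all disks in parallel, while large chunks let independent small requests be served by different disks at the same time; this trade-off is one reason chunk size affects array performance.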