
Chapter 3

Management of Data Storage
and File Systems
Introduction
• Most frequently, data is stored on local hard disks,
but over the last few years more and more of our
files have moved “into the cloud”, where different
providers offer easy access to large amounts of
storage over the network.
• As system administrators, we are responsible for all
kinds of devices: we build systems running entirely
without local storage just as we maintain the
massive enterprise storage arrays that enable
decentralized data replication and archival.
Introduction …
• We manage large numbers of computers with their
own hard drives, using a variety of technologies to
maximize throughput before the data even gets onto
a network.
• In order to be able to optimize our systems on this
level, it is important for us to understand the
principal concepts of how data is stored, the
different storage models and disk interfaces.
• It is important to be aware of certain physical
properties of our storage media.
Introduction …
• In order to accommodate the ever-growing need for
storage space, we use technologies such as Logical
Volume Management to combine multiple physical
devices in a flexible manner to present a single
storage container to the operating system.
• We use techniques such as RAID to increase capacity,
resilience or performance, and separate data from one
another within one storage device using partitions.
• Finally, before we can actually use the disk devices to
install an operating system or any other software, we
create a file system on top of these partitions.
Storage Models
• We distinguish different storage models by how the
device in charge of keeping the bits in place interacts
with the higher layers:
• by where raw block device access is made
available,
• by where a file system is created to make the disk
space available as a useful unit,
• by which means and protocols the operating
system accesses the file system.
Direct Attached Storage
• Hard drives are attached (commonly via a host
bus adapter and a few cables) directly to the
server; the operating system detects the block
devices and maintains a file system on them.
• The vast majority of hosts (laptops, desktop and
server systems alike) all utilize this method.
• All interfacing components are within the
control of a single server’s operating system (and
frequently located within the same physical case),
and multiple servers each have their own storage
system.
Direct Attached Storage …
• Multiple direct attached disks can be combined to create
a single logical storage unit through the use of a Logical
Volume Manager (LVM) or a Redundant Array of
Independent Disks (RAID). This allows for improved
performance, increased amount of storage and/or
redundancy.
• Direct attached storage need not be physically located in
the same case (or even rack) as the server using it. That
is, we differentiate between internal storage (media
attached inside the server with no immediate external
exposure) and external storage (media attached to a
server’s interface ports, such as Fibre Channel, USB
etc.) with cables.
Advantages and disadvantages
• Advantages
• Since there is no network or other additional
layer in between the operating system and the
hardware, the possibility of failure on that level
is eliminated. Likewise, no performance penalty is
incurred due to network latency.
• Disadvantages
• Since the storage media is, well, directly
attached, it implies a certain isolation from other
systems on the network.
Network Attached Storage
• One host functions as the “file server”, while
multiple clients access the file system over the
network.
Network Attached Storage…
• Advantages
• Data is no longer restricted to a single physical or virtual
host and can be accessed (simultaneously) by multiple
clients.
• By pooling larger resources in a dedicated NAS device,
more storage becomes available.
• Disadvantage
• Data becomes unavailable if the network connection
suffers a disruption.
Storage Area Networks
• To allow different clients to access large chunks of
storage on a block level, we build high performance
networks specifically dedicated to the management
of data storage: Storage Area Networks.
• Central storage media is accessed using high
performance interfaces and protocols such as Fibre
Channel or iSCSI, making the exposed devices
appear local to the clients.
Storage Area Networks …
• Storage area networks are frequently labeled an
“enterprise solution” due to their significant
performance advantages and distributed nature.
• Allow connecting switched SAN components across
a Wide Area Network (or WAN).
• But the concept of network attached storage devices
facilitating access to a larger storage area network
becomes less accurate when end users require
access to their data from anywhere on the Internet.
• Cloud storage solutions have been developed to
address these needs.
Cloud Storage
• We have moved from direct attached storage providing
block-level access, to distributed file systems, and then
back around to block-level access over a dedicated
storage network.
• But this restricts access to clients on this specific
network.
• As more and more (especially smaller or mid-sized)
companies are moving away from maintaining their
own infrastructure towards a model of Infrastructure
as a Service (IaaS) and Cloud Computing, the storage
requirements change significantly, and we enter the
area of Cloud Storage.
Disk Devices and Interfaces
• Hard drives can be made available to a server in a
variety of ways.
• Individual disks are connected directly to a Host
Bus Adapter (HBA) using a single data/control
cable and a separate power cable.
• The traditional interfaces here are
• SCSI (Small Computer System Interface), PATA (Parallel
Advanced Technology Attachment) and SATA (Serial
Advanced Technology Attachment), as well as Fibre
Channel.
Physical Disk Structure
• A disk is composed of circular plates called platters. Each
platter has an upper and lower oxide-coated surface.
• Recording heads, at least one per surface, are mounted on
arms that can be moved to various radial distances from the
center of the platters.
• The heads float very close to the surfaces of the platters,
never actually touching them, and read and record data as
the platters spin around.
Physical Disk Structure…
• When the recording heads are at a particular
position, the portions of the disk that can be read or
written are called a cylinder
• A cylinder is made up of rings on the upper and
lower surfaces of all of the platters. The ring on one
surface is called a track. Each track is divided into
disk blocks (sometimes called sectors)
Disk Partitions
• Disks are divided into logical units called partitions.
• Partitions divide the disk into fixed-size portions
• A disk partition is a grouping of adjacent cylinders through all
platters of a hard drive.
• Dividing a single large disk into multiple smaller partitions is done
for a number of good reasons:
• if you wish to install multiple operating systems, for example, you need
to have dedicated disk space as well as a bootable primary partition for
each OS.
• You may also use partitions to ensure that data written to one location
(log files, for example, commonly stored under e.g. /var/log) cannot cause
you to run out of disk space in another (such as user data under /home).
• Other reasons to create different partitions frequently involve the choice
of file system or mount options, which necessarily can be applied only on
a per-partition basis.
Option 1: Partition a Disk Using parted
Command
• Step 1: List Partitions
• Before making a partition, list available storage
devices and partitions. This action helps identify the
storage device you want to partition: sudo parted -l
• The terminal prints out available storage devices with
information about:
• Model – Model of the storage device.
• Disk – Name and size of the disk.
• Sector size – Logical and physical sector size of the device.
• Partition Table – Partition table type (msdos, gpt, aix, amiga, bsd,
dvh, mac, pc98, sun, and loop).
• Disk Flags – Partitions with information on size, type, file system,
and flags.
Option 1: Partition a Disk Using
parted Command
• Partition types can be:
• Primary – Holds the operating system files. Only four
primary partitions can be created.
• Extended – Special type of partition in which more than
the four primary partitions can be created.
• Logical – Partition that has been created inside of an
extended partition.
Step 2: Open Storage Disk
• Open the storage disk that you intend to partition by
running the following command: sudo parted /dev/sdb
• Always specify the storage device. If you don’t specify a
disk name, the first disk that parted finds is selected.
To change the disk to /dev/sdb run: select /dev/sdb
Step 3: Make a Partition Table
• Create a partition table before partitioning the disk. A partition table is
located at the start of a hard drive and it stores data about the size and
location of each partition.

• Partition table types are: aix, amiga, bsd, dvh, gpt, mac, msdos,
pc98, sun, and loop.

• To create a partition table, enter the following:

mklabel [partition_table_type]

For example, to create a gpt partition table, run the following command:
mklabel gpt

Type Yes to execute:


Step 4: Check Table
• Run the print command to review the partition table. The
output displays information about the storage device:
• Note: Run help mkpart command to get additional help on
how to create a new partition.
Step 5: Create Partition
• Let’s make a new 1854MB partition using the ext4 file
system. The partition will start at 1MB and end at
1855MB.
• To create a new partition, enter the following:
mkpart primary ext4 1MB 1855MB
After that, run the print command to review information on
the newly created partition.
To save your actions and quit, enter the quit command.
Changes are saved automatically with this command.
Option 2: Partition a Disk Using fdisk Command
Step 1: List Existing Partitions
• Run the following command to list all existing partitions:
sudo fdisk -l or sudo fdisk -l /dev/sd*
Step 2: Select Storage Disk
• Select the storage disk you want to create partitions on by
running the following command: sudo fdisk /dev/sda
Step 3: Create a New Partition
1. Run the n command to create a new partition.

2. Select the partition number by typing the default number (2).

3. After that, you are asked for the starting and ending sector of
your hard drive. It is best to type the default number in this
section (3622912).

4. The last prompt is related to the size of the partition. You can
choose to have several sectors or to set the size in megabytes
or gigabytes. Type +2GB to set the size of the partition to
2GB.
Step 4: Write on Disk
• The system created the partition, but the changes are not
written on the disk.
1. To write the changes on disk, run the w command.
2. Verify that the partition is created by running the following
command: sudo fdisk -l
• Once a partition has been created with the parted or fdisk
command, format it before using it.
• Format the partition by running the following command:
sudo mkfs -t ext4 /dev/sdb1
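mkfs can also be tried out without root on an image file; the -F flag tells mke2fs to format a regular file rather than insisting on a block device (file name and size here are arbitrary):

```shell
# Create an empty 64 MiB image and put an ext4 file system on it.
truncate -s 64M fs.img
mkfs.ext4 -q -F fs.img

# Inspect the superblock to confirm the file system was created.
dumpe2fs -h fs.img | grep 'Filesystem features'
```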


Mount the Partition
• To begin interacting with the disk, create a mount point and
mount the partition to it.
1. Create a mount point by running the following command:
sudo mkdir -p /mt/sdb1
2. After that, mount the partition by entering:
sudo mount -t auto /dev/sdb1 /mt/sdb1
• The terminal does not print out an output if the commands are
executed successfully.
3. Verify that the partition is mounted by using the df -hT command:
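A mount made this way does not survive a reboot; to make it persistent, an entry is usually added to /etc/fstab. A hypothetical entry for the partition above might look like:

```
# device     mount point  type  options   dump  pass
/dev/sdb1    /mt/sdb1     ext4  defaults  0     2
```

The last two fields control dump backups and the fsck ordering; using a UUID (as reported by blkid) in place of /dev/sdb1 is more robust against device renaming.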


Analyzing the Hard Drive
• df, which stands for Disk Filesystem, is used to check disk
space. It will display available and used storage of file
systems on your machine.
• Filesystem : provides the name of the file system.
• Size : gives us the total size of the specific file system.
• Used : shows how much disk space is used in the particular file
system.
• Available : shows how much space is left in the file system.
• Use% : displays the percentage of disk space that is used.
• Mounted On : tells us the mount point of a particular file system.


Analyzing the Hard Drive
• By adding a certain option to the df command, you can
check the disk space in Linux more precisely. These are the
most popular options:
o df -h : it will display the result in a human-readable format.
o df -m : this command line is used to display information
of file system usage in MB.
o df -k : to display file system usage in KB.
o df -T : this option will show the file system type (a new
column will appear).
o df /home : it allows you to view information about a
specific file system in a readable format (in this case the /home
file system).
o df --help : it lists down other useful options that you can
use, complete with their descriptions.
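A quick sketch combining the options above (the sizes reported will of course differ per machine):

```shell
# Human-readable sizes plus the file system type column.
df -hT /

# Restrict the report to the file system holding a given path.
df -h /home
```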
Analyzing the Hard Drive
• Another important command is du, short for Disk Usage.
• It will show you details about the disk usage of files and
directories on a Linux computer or server.
• With the du command, you need to specify which folder or
file you want to check. The syntax is as follows:
du <options> <location of directory or file>


Analyzing the Hard Drive
• Let’s take a look at real-world use of the du command
with the Desktop directory:
• du /home/user/Desktop: this command line allows
users to see into the disk usage of their Desktop
folders and files (subdirectories are included as well).
• du -h /home/user/Desktop: just like with df, the option
-h displays information in a human-readable format.
• du -sh /home/user/Desktop: the -s option will give us
the total size of a specified folder (Desktop in this
case).
• du -m /home/user/Desktop: the -m option provides us
with folder and file sizes in Megabytes (we can use -k
to see the information in Kilobytes).
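A self-contained sketch of the same du options; it builds a small throwaway directory (du-demo, an arbitrary name) rather than assuming a Desktop folder exists:

```shell
# Build a sample directory with a 1 MiB file in it.
mkdir -p du-demo/sub
dd if=/dev/zero of=du-demo/sub/file.bin bs=1024 count=1024 2>/dev/null

du du-demo        # per-directory usage, subdirectories included
du -sh du-demo    # -s: one total for the folder, -h: human-readable
du -k du-demo     # sizes in kilobytes
```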
RAID(Redundant Array of Independent
Disks)
• Multiple disks can be combined in a number of
ways to accomplish one or more of these goals:
• increased total disk space,
• increased performance,
• increased data redundancy.
• Like an LVM, RAID hides the complexity of managing
these devices from the OS and simply presents a virtual
disk comprised of multiple physical devices. However,
unlike with logical volume management, a RAID
configuration cannot be expanded or shrunk without
data loss.
RAID levels
• RAID can do two basic things.
• First, it can improve performance by “striping” data across
multiple drives, thus allowing several drives to work
simultaneously to supply or absorb a single data stream.
• Second, it can replicate data across multiple drives, decreasing
the risk associated with a single failed disk.
• Replication assumes two basic forms:
• mirroring, in which data blocks are reproduced bit-for-bit on
several different drives, and
• parity schemes, in which one or more drives contain an error-
correcting checksum of the blocks on the remaining data drives.
• Mirroring is faster but consumes more disk space. Parity
schemes are more disk space-efficient but have lower
performance.
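The parity scheme described above relies on the property that XOR-ing all data blocks yields a checksum from which any single lost block can be rebuilt. A tiny sketch with three one-byte "blocks" (the values are arbitrary):

```shell
# Three data "blocks", each reduced to a single byte for illustration.
d1=170  # 10101010
d2=51   # 00110011
d3=92   # 01011100

# The parity "drive" stores the XOR of all data blocks.
parity=$(( d1 ^ d2 ^ d3 ))

# Simulate losing d2: XOR the surviving blocks with the parity
# to reconstruct it, exactly as a RAID 5 rebuild does per stripe.
rebuilt=$(( d1 ^ d3 ^ parity ))
echo "$rebuilt"   # prints 51, the lost block
```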
RAID levels
• RAID is described in terms of “levels” that specify the
exact details of the parallelism and redundancy
implemented by an array.
• RAID 0 - Striped array (block level), no parity or mirroring
• RAID 1 - Mirrored array, no parity or striping
• RAID 2 - Striped array (bit level) with dedicated parity
• RAID 3 - Striped array (byte level) with dedicated parity
• RAID 4 - Striped array (block level) with dedicated parity
• RAID 5 - Striped array (block level) with distributed parity
• RAID 6 - Striped array (block level) with double distributed
parity
RAID 0
• By writing data blocks in parallel across all available
disks, RAID 0 accomplishes a significant performance
increase. At the same time, available disk space is
linearly increased. However, RAID 0 does not provide
any fault tolerance: any disk failure in the array causes
data loss.
RAID-1
• RAID level 1 is colloquially known as mirroring.
Writes are duplicated to two or more drives
simultaneously.
• This arrangement makes writes slightly slower than
they would be on a single drive. However, it offers
read speeds comparable to RAID 0 because reads
can be farmed out among the several duplicate disk
drives.
RAID levels 1+0 and 0+1
• RAID levels 1+0 and 0+1 are stripes of mirror sets or
mirrors of stripe sets. Logically, they are concatenations
of RAID 0 and RAID 1
• The goal of both modes is to simultaneously obtain the
performance of RAID 0 and the redundancy of RAID 1.
RAID-5
• RAID level 5 stripes both data and parity
information, adding redundancy while
simultaneously improving read performance.
• In addition, RAID 5 is more efficient in its use of
disk space than is RAID 1.
RAID-6
• RAID level 6 is similar to RAID 5 with two parity
disks. A RAID 6 array can withstand the complete
failure of two drives without losing data.
Implementations
Software based RAID:
• Software implementations are provided by
many Operating Systems.
• A software layer sits above the disk device
drivers and provides an abstraction layer
between the logical drives(RAIDs) and physical
drives.
• The server's processor is used to run the RAID
software.
• Used for simpler configurations like RAID0 and
RAID1.
Implementations …
Hardware based RAID:
• A hardware implementation of RAID requires at least
a special-purpose RAID controller.
• On a desktop system this may be built into the
motherboard.
• The processor is not used for RAID calculations, as a
separate controller is present.
• Example: a PCI-bus-based, IDE/ATA hard disk RAID
controller, supporting levels 0, 1, and 10.
Logical Volumes
• Using hard drives with fixed partitions may lead to
a number of problems:
• disk drives are prone to hardware failure, which may
lead to data loss;
• reading data from individual disk drives may suffer
performance penalties depending on the location of the
data on the disk; and,
• perhaps most notably, disks are never large enough.
• A logical volume manager (or LVM) combines
multiple physical devices and presents them to the
OS as a single resource.
Logical Volumes …
• Logical volume groups grant the system
administrator a significant amount of flexibility:
• The total storage space can easily be extended (and the
file system, if it supports this operation, grown!) by
adding new disk drives to the pool;
• File system performance can be improved by striping
data across all available drives;
• Data redundancy and fault tolerance can be improved by
mirroring data on multiple devices.
Logical Volumes …
• The LVM divides the physical volumes into data
blocks, so-called physical extents, and allows the
system administrator to group one or more of these
physical volumes into a logical volume group.
• In effect, available storage space is combined into a
pool, where resources can dynamically be added or
removed.
Logical Volumes …
• Out of such a volume group, individual logical volumes
can then be created, which in turn are divided into the
equivalent of a hard disk’s sectors, so-called logical
extents.
• This step of dividing a logical volume group into logical
volumes is conceptually equivalent to the division of a
single hard drive into multiple partitions; in a way, you
can think of a logical volume as a virtual disk.
• To the operating system, the resulting device looks and
behaves just like any disk device:
• it can be partitioned, and new file systems can be created on
them just like on regular hard drive disks.
Logical Volumes …
Anatomy of LVM
• This diagram gives an overview of the main elements in an
LVM system:
Schematically:
Logical Volume Management
• Physical Volumes: 20GB, 36GB and 34GB disks
(alongside a separate 2GB ext3 /boot partition kept
outside the LVM)
• Together they form a single Logical Volume Group of
90GB
• Out of the group, Logical Volumes are created: /home
(50GB) and / (25GB), leaving 15GB of free space in
the pool
1. Volume group (VG)
• The Volume Group is the highest level abstraction used
within the LVM. It gathers together a collection of Logical
Volumes and Physical Volumes into one administrative unit.
2. physical volume (PV)
• A physical volume is typically a hard disk, though it may
well just be a device that 'looks' like a hard disk (eg. a
software raid device).
3. logical volume (LV)
• The equivalent of a disk partition in a non-LVM system.
The LV is visible as a standard block device; as such the
LV can contain a file system (eg. /home).
4. physical extent (PE)
• Each physical volume is divided into chunks of data,
known as physical extents; these extents have the same
size as the logical extents for the volume group.
5. logical extent (LE)
• Each logical volume is split into chunks of data, known
as logical extents. The extent size is the same for all
logical volumes in the volume group.
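The PV → VG → LV chain above maps directly onto the LVM command-line tools. A hedged outline only (the device names, the volume-group name vg0 and the sizes are illustrative; these commands require root and real block devices):

```shell
# 1. Initialize the disks as Physical Volumes.
sudo pvcreate /dev/sdb /dev/sdc

# 2. Gather the PVs into one Volume Group (the storage pool).
sudo vgcreate vg0 /dev/sdb /dev/sdc

# 3. Carve Logical Volumes out of the pool.
sudo lvcreate -L 50G -n home vg0
sudo lvcreate -L 25G -n root vg0

# 4. Each LV is now a block device; create a file system on it.
sudo mkfs -t ext4 /dev/vg0/home

# Later, the pool can be grown by adding another disk:
# sudo vgextend vg0 /dev/sdd
```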
Linux/Unix File System
• A filesystem is a way of storing, organizing and
accessing files (and/or directories) on a storage device.
• Some examples of filesystems are FAT, NTFS for
Windows/DOS, HFS for MAC OS etc.
• In Linux, the popular filesystems are ext2, ext3 and
ext4 filesystems.
• Some other filesystems such as ReiserFS are also
natively supported by Linux.
• Here, we discuss various features of extended
filesystems in Linux, i.e. ext2, ext3 and ext4.
ext2 - Second Extended Filesystem
• The extended filesystem, ext, implemented in Linux in 1992 was
the first filesystem designed specifically for Linux.
• ext2 filesystem is the second extended filesystem.
• It was the default filesystem in many Linux distros for many years.
• Features of ext2 are:
• Developed by Remi Card
• Introduced in January 1993
• Replacement for extended filesystem
• Maximum file size: 16GiB - 2TiB, depending upon block size (1K, 2K,
4K or 8K)
• Maximum volume/filesystem size: 2TiB - 32TiB
• Maximum filename length: 255 bytes (255 characters)
• Maximum number of files: 10^18
• Filenames: All characters except NULL('\0') and '/' are allowed in a file
name
• Date range: December 14, 1901 - January 18, 2038
ext3 - Third Extended Filesystem
• With ext3, the concept of journaling was introduced.
• With ext2 filesystem, when system crashed, or power failure
occurred, the whole filesystem needed to be checked for
consistency and bad blocks.
• With journaling, the filesystem keeps track of the changes
made in the filesystem before committing them to
filesystem.
• These changes are stored at a specified location in a dedicated
area of the filesystem.
• So in the event of a power failure or system crash, the
filesystems can be brought back much more quickly.
• ext3 filesystem is fully compatible with its previous version,
i.e. ext2 filesystem.
• The other features are:
• Developed by Stephen Tweedie
• Introduced in November 2001 (with Linux 2.4.15)
• Journaled filesystem.
• An ext2 filesystem can be converted to ext3 without any
need of backup.
• Maximum file size: 16GiB - 2TiB
• Maximum volume/filesystem size: 2TiB - 32TiB
• Maximum filename length: 255 bytes (255 characters)
• Maximum number of files: Variable
• Filenames: All characters except NULL('\0') and '/' are
allowed
• Date range: December 14, 1901 - January 18, 2038
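The ext2-to-ext3 conversion mentioned above is just the addition of a journal, which tune2fs can perform in place. A sketch against an image file (name and size arbitrary), so no root access or real disk is needed:

```shell
# Create an ext2 file system on a 64 MiB image file.
truncate -s 64M ext2.img
mkfs.ext2 -q -F ext2.img

# Convert it to ext3 by adding a journal -- no backup needed.
tune2fs -j ext2.img

# The has_journal feature flag is what makes it ext3.
dumpe2fs -h ext2.img | grep -i 'features'
```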
ext4 - Fourth Extended Filesystem
• The ext4 filesystem, developed as an extension to ext3, is the newest
filesystem in the series of extended filesystems (ext's).
• It has many performance improvements over ext3.
• In most modern distros, the default filesystem is ext4.
• The features are:
• Developers: Mingming Cao, Andreas Dilger, Alex Zhuravlev (Tomas), Dave
Kleikamp, Theodore Ts'o, Eric Sandeen, Sam Naghshineh and others (from
wikipedia.org)
• Introduced in October 2008 (stable)
• Journaled filesystem
• Performance enhancements over its predecessor (ext3)
• Maximum file size: 16TiB
• Maximum volume/filesystem size: 1EiB (exbibyte) (1EiB = 1024PiB,
1PiB = 1024TiB, 1TiB = 1024GiB)
• Maximum filename length: 255 bytes (255 characters)
• Maximum number of files: 4 billion
• Filenames: All characters except NULL('\0') and '/' are allowed
• Date range: December 14, 1901 - April 25, 2514
• Total filesystem check time improved (fsck time)
• An ext3 filesystem can be converted to ext4 filesystem
