A FIBRE CHANNEL-BASED ARCHITECTURE FOR INTERNET MULTIMEDIA SERVER CLUSTERS

SHENZE CHEN, MANU THAPAR


Hewlett-Packard Labs
1501 Page Mill Rd., Palo Alto, CA 94304
{szchen,thapar}@hpl.hp.com

In this paper, we present a cluster architecture for Internet multimedia servers, which uses the
Fibre Channel (FC) technology to overcome some of the shortcomings of existing architec-
tures. We also explore the design issues of an FC-based multimedia server cluster. A signifi-
cant advantage of the FC-based cluster is that it allows physical storage attachment to the in-
terconnect. Because of this feature, FC-based clusters will change the fundamental data-
sharing paradigm of existing clusters by eliminating remote data accesses in a cluster. Many
aspects of this architecture are critical to real-time multimedia applications, such as audio and
video services.

1 Introduction
Today thousands of Web servers are managing multiple terabytes of data across the
Internet, which may grow exponentially given the rapidly increasing demand for
multimedia information. Multimedia Web servers differentiate themselves from tra-
ditional file servers by requiring real-time delivery of continuous media streams, such
as audio and video. A recent study [1] has shown that, for popular sites, no single
processor system could handle the current web traffic, and therefore a scalable solu-
tion becomes a necessity.
In this paper, we present a scalable cluster architecture for Internet multimedia
servers, which is based on the newly developed Fibre Channel technology and tries to
overcome some of the shortcomings of existing cluster architectures. Fibre Channel
(FC) is a new serial link defined by the ANSI X3T9 Technical Committee as an in-
dustry standard [2]. It provides a general transport vehicle for Upper Level Protocols
(ULP), such as SCSI, IP, etc. The bandwidth can be as high as one gigabit per sec-
ond. Multiple systems or storage devices can have point-to-point connections through
FC switches. Fibre Channel also defines a loop topology that provides a shared media
to multiple devices and hosts. One of the main advantages of the FC-based cluster is
the physical storage attachment to the cluster interconnect, which is unique to the
Fibre Channel technology. This feature changes the fundamental remote file access
paradigm for existing clusters. For non-FC clusters, storage devices are attached to a
host and the host is attached to the interconnect. Therefore, storage devices attached
to node A are not directly visible to node B. If node B wants to access data stored in
node A, B has to request A's service to retrieve the data. These remote accesses are
costly in terms of performance and resource utilization. In contrast, in an FC-based
cluster, because of the direct storage attachment to the interconnect, all storage de-
vices are immediately visible to all nodes. Thus, an FC-based cluster can eliminate all
remote accesses, thereby achieving both cost and performance improvements. We
will discuss the FC-based cluster architecture and its benefits in detail in the context
of providing lnternet multimedia services. We will also address FC cluster design
issues such as data sharing, load balancing, cluster control mechanism, fault toler-
ance, and server scalability.
In an earlier study [3], we showed that FC's fair arbitration scheme can effec-
tively help the delivery of audio and video streams. Other studies on FC can be found
in [4,5,6]. Early clustering concepts were represented by the DEC VAX clusters.
Currently, the most popular cluster architecture is the LAN-based UNIX workstation
cluster. Recently, server architectures based on ATM or other proprietary switching
technologies have been proposed, which we will discuss in the next section.
2 Current Cluster and Server Architectures
In this section, we describe several existing and proposed cluster and server archi-
tectures. We consider a cluster to be a loosely-coupled multiprocessor system that
shares data and other resources. Since most of the server architectures discussed be-
low fall into this cluster category, we will use the terms cluster and server architec-
ture interchangeably in this paper.
2.1 The HP-UX Cluster
For the most popular UNIX workstation cluster, we take the HP-UX cluster [7] as an
example for ease of discussion. The HP-UX cluster is a collection of workstations
that are interconnected by a typical FDDI or Ethernet LAN and is built on top of the
NFS file system. After booting, each node perceives a single consistent file system
view.
2.2 The NCSA Web Server
The NCSA Web server is based on the AFS distributed file system [1,8]. The server
complex consists of multiple AFS file servers and AFS clients. All of these nodes are
connected via an FDDI ring. Advantages of the AFS-based servers are good scalabil-
ity and support for heterogeneous server nodes. Because of the nature of WWW
workloads and AFS local caching, nearly 90% of the external Web requests are satis-
fied by the AFS clients without crossing the FDDI LAN to the AFS servers. This
caching mechanism works well with the current user access patterns. However, in the
future, audio and video clips are expected to play a larger role in conveying multime-
dia information. Since these media objects are usually very large in size and typically
need to be played back in real time, this changing access pattern will have a signifi-
cant negative impact on the efficiency of the caching strategy, and therefore on the
overall performance of the server cluster.

2.3 ATM-based Server Architecture


There have been several proposals and prototypes for ATM-based media server ar-
chitectures. Buddhikot et al. proposed a Cluster Based Storage (CBS) architecture, in
which multiple independent storage clusters are interconnected through an ATM
switch [9].
In another study, Ito and Tanaka [10] presented a video server architecture,
which consists of a system manager, a File Distributor (FD), multiple Video Segment
File Servers (VFS), and multiple Sequence Control (SC) brokers. All of these nodes
are interconnected through an ATM switch.
Based on the speculation that future multimedia systems will perform not only
data storage and retrieval but also data manipulations, Rooholamini and Cherkassky
[11] proposed using ATM as the subsystem interconnect in a shared-memory multi-
processor multimedia server.
While these architectures are different in many aspects, they share the same re-
mote file access model, i.e., server nodes that receive external requests need to re-
quest remote nodes for file accesses.
2.4 Other Cluster Architectures
Haskin and Williams [12] presented a cluster architecture for Web/video servers,
which is based on the IBM SP-2 proprietary switch technology and each node is im-
plemented with an RS/6000 workstation. Again, this architecture also needs to per-
form remote file accesses.
An interesting on-going research project is the Berkeley NOW (Networks of
Workstations) [13], which tackles cluster computing from multiple perspectives, such
as low overhead communications, cross-net memory sharing, a global layer UNIX,
serverless network file systems, etc. Myrinet is used for the cluster interconnect.

3 A Fibre Channel-based Media Server Cluster Architecture


3.1 An Architecture for Internet Multimedia Server Cluster
In this section, we present a Fibre Channel-based architecture for Internet media
server clusters.
As shown in Figure 1(a), multiple server nodes and storage devices are attached
to a Fibre Channel switch. The mass storage in the figure can be an individual FC
disk, an FC disk array, or multiple FC disks or RAIDs attached to an FC loop. In the
case of an FC loop, although a maximum of 126 disks or RAIDs is allowed on each
loop, in practice fewer than 30 disks or RAIDs are recommended per loop to better
utilize the effective disk bandwidth. The simplified diagram of a cluster node, which
can be a system controller, a Web server, or an authoring station, is shown in Figure
1(b). Here one FC interface replaces multiple SCSI interfaces and possibly the
traditional Ethernet or FDDI LAN interface typically found in existing cluster nodes.

Figure 1: A Fibre Channel-based Media Server Cluster. (a) The cluster; (b) A cluster node.

Each cluster node can have an (optional) local disk for booting and local swap. The
cluster nodes need not be symmetrical. For example, the authoring station may have
more CPU power for the image, video, and audio processing, the system controller
may have higher fault tolerance requirements, and the Web server may be optimized
towards data transfers, including real-time continuous media deliveries such as audio
and video. Since the server cluster is expected to store and manage a large database
for the media objects, a tertiary storage subsystem is required. The data stored in a
tertiary storage subsystem can be staged into disks or delivered directly to server
nodes (Web servers). The communication between server nodes uses the Fibre Chan-
nel IP protocol and the data transfer between server nodes and storage devices uses
the Fibre Channel SCSI protocol. The server cluster is connected to the Internet
through ATM or other network interfaces.
3.2 Advantages of the Fibre Channel-based Architecture
While the architecture presented in Figure 1 looks similar to many of those discussed
in Section 2, there is a major difference between an FC-based cluster and existing
clusters. In fact, many of these cluster architectures discussed in Section 2, such as
NCSA’s Web server, IBM’s Web/video server, the HP-UX cluster, some ATM-
based servers, etc., can be abstracted to the one shown in Figure 2 (a), where the
processor block on the left side of the interconnect corresponds to the AFS server or
Storage Node, etc. For ease of discussion, we use a unified term “storage node” to
refer to these nodes. The main purpose for a storage node in a cluster is to manage
storage devices attached to it and provide data services to other nodes in the cluster.
Some of these storage nodes use full-blown general purpose machines, such as the
SUN Sparc 10 and the IBM RS/6000, and others use custom-designed special pur-
pose storage control systems. The processor blocks on the right side of the intercon-
nect correspond to “server nodes” that receive and service the requests from external
Web clients.

Figure 2: Difference between FC-based Cluster and other Clusters.

An obvious difference between Figure 2 (a) and (b) is that the FC-based system,
instead of connecting storage devices to the storage node, directly connects storage
devices to the interconnect switch, eliminating the storage nodes all together. This is
because Fibre Channel devices, including FC disks, FC switches, and FC host inter-
face cards, can communicate directly using the SCSI protocol, which enables the
“true” direct attachment of storage devices to the interconnect. Notice that for non-
FC systems, although the term “storage direct network attachment” often appeared in
the literature, in practice storage devices are attached to storage nodes and it is the
storage nodes that are attached to the network. In non-FC systems, storage devices,
such as disks, cannot directly communicate with the interconnect or network.
Allowing FC disks to directly attach to an FC switch results in improved
cost/performance. The elimination of storage nodes immediately reduces the system
cost. Even with Ethernet or FDDI-based clusters, which do not incur the switch cost,
the FC switch expense is greatly offset by the savings realized by eliminating multi-
ple storage nodes. In terms of the cost of FC disks, manufacturers estimate a price
comparable to that of fast/wide differential SCSI disks. The performance benefits of
the FC-based cluster, especially for delivering real-time video and audio, will be dis-
cussed in detail later in Section 5.
3.3 Weakness of the FC-based Architecture
On the other side of the coin, the FC switch-based architecture also has some weak-
nesses. First, though Fibre Channel defines rich protocols and many attractive fea-
tures, it is heavy-weight. Implementing these standardized protocols and features
increases the complexity, which in turn can increase the manufacturing cost. For in-
stance, compared to some simple proprietary interconnects, such as Myrinet, the per-
port cost of an FC switch is much higher. Of course, these proprietary interconnects
rely on the server nodes to define their own communication protocols or messaging
systems.
Second, unlike the AFS-based server architecture, which supports heterogeneous
server nodes and can theoretically scale up without limit, the FC-based architecture is
more appropriate for a homogeneous cluster environment, in which all nodes run the
same kernel. The homogeneity is required to simplify the cluster file system imple-
mentation that takes advantage of the FC “direct storage attachment to the intercon-
nect” feature described above. Like any switch-based architecture, the FC-based cluster
can freely scale up and down within the switch capacity. When the cluster needs to
scale beyond the switch capacity, things become more complicated.
4 Fibre Channel-based Cluster Design Issues
4.1 Data Sharing and I/O Load Balancing
Data sharing is one of the fundamental design goals for a cluster of media servers,
i.e., the file system should provide a single consistent view of all files stored in the
cluster to each server node. Fortunately, as a consequence of direct disk attachment
to the switch, the data sharing can be easily achieved in an FC-based cluster. One
simple implementation follows the HP-UX cluster paradigm, i.e., the system con-
troller acts as the root node, and all other server nodes boot from the root node. One
main difference is that in an HP-UX cluster, disks are attached to individual server
nodes and are not visible to other nodes. File sharing is achieved via the NFS mount
mechanism. File systems created on a node’s local disks are attributed as “LOCAL”
only to that node and as “REMOTE” to all other nodes in the cluster
(see Section 5.1 for a discussion of local vs. remote file accesses). In contrast, in
an FC cluster, all disks attached to the switch are immediately visible to each server
node when the node boots. Therefore, all files created on these disks can be accessed
directly by each cluster node. In the simple implementation, after the root node boots,
it calls a Logical Volume Manager (LVM) to create one or more Logical Volume
Groups on the shared disks attached to the FC switch (the LVM is also responsible
for the data striping across multiple disks). These Logical Volume Groups serve as a
shared free pool of storage space in the cluster. Later, a superuser can log on to any
node, grab a chunk of storage space from the free pool of Logical Volume Groups, and
create a new file system on the acquired space. Once mounted, this new file system
will be visible to all nodes, and all files created in this file system are attributed as
“LOCAL” to each node. Since the Logical Volume Group metadata is shared amongst
all nodes, its access must be mutually exclusive. Like NFS-based file systems, this
simple implementation does not guarantee file consistency when multiple users open
the same file from different nodes. In order to prevent concurrent writes to the same
file by multiple users, a file locking mechanism should be implemented.
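To illustrate the mutual-exclusion requirement on the shared Logical Volume Group metadata, the following sketch (Python, with a single-process lock standing in for what in a real cluster would have to be a device-level reservation or a distributed lock service; all names and numbers are illustrative and not part of the HP-UX LVM) shows that any node allocating space from the shared free pool must first gain exclusive access to the metadata:

    import threading
    from contextlib import contextmanager

    # Stand-in for a cluster-wide lock on the volume-group metadata; a real
    # FC cluster would need a device reservation or a lock service, not an
    # in-process mutex.
    _vg_metadata_lock = threading.Lock()

    @contextmanager
    def vg_metadata_exclusive():
        _vg_metadata_lock.acquire()
        try:
            yield
        finally:
            _vg_metadata_lock.release()

    # Shared free pool, modeled as a list of free extents (illustrative only).
    free_extents = list(range(1000))

    def allocate_space(num_extents):
        """Take extents from the shared pool while holding the metadata lock."""
        with vg_metadata_exclusive():
            taken = free_extents[:num_extents]
            del free_extents[:num_extents]
            return taken

The same discipline applies to the per-file locking needed to serialize concurrent writers to a single file.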

In clusters that are based on distributed file systems (such as NFS, AFS, etc.),
files are typically not allowed to be striped across nodes, i.e., each node with disks
manages a set of files and serves other nodes’ access requests for these files. This
paradigm is inferior at load balancing. If some files are hotter than others, then the
node with the hot files may become a bottleneck in the cluster. One solution is to
store replicated copies of these hot files on multiple nodes in order to share the
workload. However, this may be costly for a media server cluster, since multimedia
objects are typically very large in size. It also increases the system complexity, since
load balancing algorithms are required to decide where, when, and for which files the
redundant copies should be created and removed. In an FC-based cluster, all of the
disks are viewed as cluster-wide resources, which enables the file system to manage
these disks in a single consistent way. In addition, for multimedia applications, typi-
cally media objects are striped across multiple disks to avoid any single disk becom-
ing a hot spot. Let’s assume that multiple disks are connected through an FC loop and
multiple loops are attached to FC switch ports. Wide data striping can now be
achieved by striping files not only across disks within a loop, but also across loops.
In this way, file access load can be evenly distributed across all the disks. One may
argue that redundant copies are also needed for fault tolerant reasons. However, once
load balancing is not a problem, fault tolerance can be achieved more cheaply, as we
will see in Section 4.3.
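To make the wide-striping scheme concrete, the sketch below maps a byte offset within a striped media file to a (loop, disk) pair so that consecutive stripe units rotate first across loops and then across the disks within a loop. The stripe unit size and the loop and disk counts are illustrative assumptions, not values prescribed in this paper:

    STRIPE_UNIT = 256 * 1024     # bytes per stripe unit (assumed)
    NUM_LOOPS = 4                # FC loops attached to switch ports (assumed)
    DISKS_PER_LOOP = 8           # disks per loop (assumed)

    def locate_stripe_unit(file_offset):
        """Map a byte offset of a striped file to (loop, disk, offset on disk)."""
        unit = file_offset // STRIPE_UNIT              # stripe unit index
        loop = unit % NUM_LOOPS                        # rotate across loops first,
        disk = (unit // NUM_LOOPS) % DISKS_PER_LOOP    # then across disks in a loop
        row = unit // (NUM_LOOPS * DISKS_PER_LOOP)     # stripe row on each disk
        return loop, disk, row * STRIPE_UNIT + file_offset % STRIPE_UNIT

    # Consecutive 256 KB reads of one video land on different loops and disks.
    for offset in range(0, 8 * STRIPE_UNIT, STRIPE_UNIT):
        print(locate_stripe_unit(offset))

With such a mapping, the read load of even a single hot file is spread over every loop and spindle in the cluster.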
4.2 Distributed vs. Centralized Control
There are two ways to configure server clusters to service requests from external cli-
ents. Consider a Web media server cluster similar to the one in [1]. In the distributed
control mechanism, the Web domain name, http://www.hp.com, is mapped to the IP
addresses of multiple server nodes in a round-robin manner by the Domain Name
Server (DNS). Each server node services Web requests routed to it independently.
This scheme, however, suffers from the caching effect of some local DNSs (recently,
there have been some other proposals on this issue). One advantage of distributed control is
the ease of adding and removing server nodes, since each node is functionally sym-
metrical and services requests independently. A disadvantage of distributed control is
that the round-robin server assignment mechanism fails to utilize workload charac-
teristics to optimize server performance. For example, once a server node services a
request, the data may still be in the node’s main memory when the next request for
the same data arrives. Therefore, assigning the next request to the same node results
in better performance.
In contrast to the distributed control mechanism, the centralized control method
uses one node (e.g., the system controller) as the interface to the external clients, i.e.,
all external requests are routed to this node. This central node may service only small
text file requests and redirect requests for large multimedia objects to other nodes by
using hyperlinks. In this case, the central node can use high-end systems or multi-
processor systems to avoid bottlenecks. In addition, the other server nodes can be
optimized to service multimedia objects, such as audio and video. These optimiza-
tions include data-type-dependent caching algorithms, quality-of-service (QoS) pro-
tocols, direct-storage-to-network data transfers without going to the system main
memory, etc. The central server node can redirect requests to other nodes by using a
simple round-robin mechanism, or by using more intelligent mechanisms that keep
track of access patterns.
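As a simple illustration of the difference between blind round-robin redirection and a policy that exploits access patterns, the sketch below lets the central node remember which server last delivered each object and send repeat requests back to that node, falling back to round-robin otherwise. The node names and the policy itself are hypothetical; the paper does not prescribe a particular algorithm:

    from itertools import cycle

    SERVER_NODES = ["node1", "node2", "node3", "node4"]   # assumed node names
    _round_robin = cycle(SERVER_NODES)
    _last_server = {}       # object URL -> node that served it most recently

    def pick_server(url):
        """Prefer the node that likely still caches url; otherwise round-robin."""
        node = _last_server.get(url)
        if node is None:
            node = next(_round_robin)
        _last_server[url] = node
        return node

    # Repeated requests for the same clip are redirected to the same node,
    # so the object may still be in that node's main memory.
    print(pick_server("/video/clip42.mpg"))
    print(pick_server("/video/clip42.mpg"))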
Finally, a hybrid of the above two control mechanisms can be configured that
maps the Web domain name to a subset of the server nodes, which in turn redirect
requests for media objects to the server nodes optimized for delivery of multimedia
objects. This hybrid configuration can prevent the central server node from becoming a
bottleneck, and, at the same time, it allows the various optimizations on some nodes
to provide better service for media objects.
4.3 Fault Tolerance
For Fibre Channel-based server clusters, fault tolerance can be provided at different
levels, depending on system requirements and the budget.
Disk and I/O Channel Failures
Disks are the most unreliable components in a system. A well-known and simple way
to protect against disk failures is to use RAIDs. That is, instead of using individual
disks, use RAIDs to attach to FC loops or switch ports. However, this only protects
against disk failures, but not against I/O channel failures. One way to protect against
channel failures is to use the “dual-loop” structure as shown in Figure 3 (a). In this
configuration, each FC disk or RAID has dual ports, each of which is connected to an
independent loop. Each loop is attached to a separate switch port. Therefore, by us-
ing the dual-loop structure, fault tolerance is achieved not only for the FC channel
(the loop), but also for the switch port. Furthermore, besides fault tolerance, the
performance is also improved because double the channel bandwidth is now available,
which allows more disks or RAIDs to be attached to the loop.

Figure 3: (a) Dual-Loop Structure; (b) Striping across Loops with Parity Protection.

An alternative to providing fault tolerance on disk, loop, or switch port is shown
in Figure 3(b). In this figure, a file f is striped across multiple loops and parity is
used to protect against failure of either a disk, a loop, or a switch port. This configu-
ration is cheaper than the previous one, but the cluster may suffer performance deg-
radation if any of the three components fails.
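A minimal sketch of the parity arrangement of Figure 3(b), assuming one stripe unit per loop plus one parity unit per stripe (the exact layout is not specified in this paper): the parity unit is the bitwise XOR of the data units, so the contents lost with any single disk, loop, or switch port can be rebuilt from the surviving units:

    def xor_blocks(blocks):
        """Bitwise XOR of equally sized byte blocks."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    # One stripe of file f: three data units on three loops plus one parity unit.
    data_units = [b"loop0-data", b"loop1-data", b"loop2-data"]
    parity = xor_blocks(data_units)

    # If loop 1 fails, its unit is rebuilt from the remaining data and parity.
    rebuilt = xor_blocks([data_units[0], data_units[2], parity])
    assert rebuilt == data_units[1]

The rebuild requires reading every surviving unit of the stripe, which is the source of the performance degradation mentioned above when a component has failed.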
Cluster Server Node and Switch Failures
When a server node fails in a cluster using the distributed control mechanism de-
scribed in Section 4.2, the DNS needs to temporarily remove the failed node from the
round-robin distribution list. Since each node provides the same functionality and
operates independently, the cluster will continue to operate, but with less processing
power. All current connections to the failed server node are interrupted, and there is
no way to recover these connections automatically. The Web clients connected to the
failed node see an error message on their screens, and, if they try to connect again,
their requests are routed to surviving nodes, which treat these requests as new re-
quests and start to service them all over again.
For centralized control, more sophisticated recovery mechanisms can be imple-
mented by maintaining the connection status of each server node at the central system
control node(s). Various degrees of recovery can be achieved depending on the com-
plexity of the states and algorithms. The details of these recovery methods are be-
yond the scope of this paper. The control node is the key component of the cluster. If
desired, a fully redundant fault tolerant control node can be used to protect against
failures. An alternative is to use the hybrid control mechanism, in which multiple
control nodes are configured.

Figure 4: Fully Redundant Cluster Structure.

In the above, we have discussed protection against single port failures for the in-
terconnect Fibre Channel switch. But the switch backplane is still potentially a single
point of failure, i.e., if the switch backplane fails, the whole cluster halts. So, for a non-
stop server without any single point of failure, a fully redundant server can be con-
figured as shown in Figure 4. In this configuration, each server node has two FC in-
terface cards, each connected to an FC port of separate switches. This configuration
protects against any single component failure such that, at any time, there is at least
one data path available between two server nodes or a server node and a storage de-
vice.
4.4 Scalability
Given the Internet growth rate, an Internet media server must be scalable in order to
cope with the increasing number of service requests. Fibre Channel-based server
cluster scalability can be achieved by adding/removing server nodes and/or storage
devices to the interconnect switch. When more server processing power is needed, a
server node can be added to the switch. When more disk bandwidth or storage ca-
pacity is needed, one more loop can be added to the switch. This flexibility adjusts
the server size according to the demand, and the cluster can freely scale up with al-
most linear cost until the available switch ports run out. In this case, new switches
can be cascaded onto the existing switch.
5 Performance Issues
In this section, we address some of the performance issues related to the Fibre Chan-
nel-based media server cluster.
5.1 Remote File Access
As stated earlier, one of the most significant advantages of using Fibre Channel tech-
nology is the feasibility of direct storage attachment to the interconnect. For non-FC
clusters, when a server node receives a request for files that are not cached or stored
in local disks, it must retrieve the files from remote (storage) node(s). A typical data
path for a remote file access is illustrated in Figure 5 (a).

Figure 5: Data Path for File Accesses. (a) Non-FC cluster; (b) FC-based cluster.

As can be seen from the figure, remote accesses involve multiple components
for non-FC-based systems. To read a file, the system call is trapped into the kernel.
The kernel virtual file system (VFS) layer distinguishes local files from remote files.
If the system call implies a remote access, the VFS calls the network protocol stack if
the interconnect is a LAN (Ethernet, FDDI, or ATM), or the proprietary message
system layer if the interconnect is a proprietary switch. When the remote storage
node receives the request, its VFS layer calls the local file system, which translates
the request into a SCSI command and then initiates the disk I/O. The requested file
data is transferred into a memory buffer in the remote storage node, which packages
the data according to the packet or message format required by the interconnect, and
then ships the data back to the requesting Web server node. Many factors may have a
negative impact on the performance of the remote accesses. First, the interconnect
LAN can be slow. Today’s typical Ethernet has 10 Mbit/sec of bandwidth, FDDI 100
Mbit/sec, and an ATM/OC-3 link 155 Mbit/sec. Thus, to transfer 128 KBytes of data
across a LAN takes 100 ms, 10 ms, or 6.5 ms, respectively. Second, remote accesses
suffer from the overhead due to network protocol processing or message processing
and packaging, which can be very high if multiple data copies are involved in the
performance path. Third, since many system resources/components are involved in
remote accesses, such as the interconnect network, storage node processor, memory, bus,
and disks, any contention for these resources will delay the data accesses, which is
especially undesirable for real-time media objects, such as audio and video. Some
studies have reported that remote file accesses are three times slower than local ac-
cesses [14].
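The transfer times quoted above follow directly from the raw link bandwidths; a quick check, ignoring protocol overhead and contention, and taking 128 KBytes as 128 x 1024 x 8 bits:

    block_bits = 128 * 1024 * 8          # 128 KBytes expressed in bits

    links_mbit_per_sec = {"Ethernet": 10, "FDDI": 100, "ATM/OC-3": 155}
    for name, mbps in links_mbit_per_sec.items():
        ms = block_bits / (mbps * 1e6) * 1000
        print(f"{name}: {ms:.1f} ms")    # roughly 105 ms, 10.5 ms, and 6.8 ms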
By contrast, in a Fibre Channel-based cluster, all storage devices attached to the
interconnect switch are immediately visible to all server nodes. Thus any server node
can access any file in the cluster by issuing SCSI commands directly to the storage
devices, as shown in Figure 5 (b). This eliminates “remote” accesses; all file
read/write operations are executed as “local” accesses. Because of the elimination of
both protocol/message processing and data/command format translation overhead for
remote accesses, the performance for file accesses is expected to be higher for FC-
based clusters.
5.2 File Caching
Although the AFS/FDDI-based NCSA Web server performs satisfactorily today, its
success relies on the caching of frequently accessed files in the Web server nodes (AFS
clients). According to current user access patterns, for which only 1% of the total
requests access audio or video files, it is reported that 90% of the total requests can
be serviced by the Web server nodes from their local caches [1,8]. However, audio
and video objects are expected to play a greater role in delivering multimedia infor-
mation. As the Internet infrastructure improves, the requests for real-time play-back
of audio and video will increase rapidly. Since these audio and video objects are
very large in size, caching them in the Web servers’ (AFS clients’) local disks or
memory may force the removal of many other (small) files from the local cache,
which in turn will decrease the local cache hit ratio. Therefore, new data-type-
dependent cache strategies need to be designed. In order to maintain a high cache hit
ratio, probably only a small part of the audio and video data can be cached, with the
rest retrieved from remote storage nodes (AFS servers). As reported in the above
study [1], the 1% of requests accessing audio and video files accounted for 28% of
the total bytes transferred. If the audio and video requests increase by 1%, simple
mathematics reveals that this 2% accounts for 44% of the total bytes, and if 5% of the
total requests are for audio and video then they will account for 67% of the total
bytes transferred. This would significantly increase both the traffic across the inter-
connect and the workload to the storage nodes. As a result, the Web server may be
severely degraded in performance.
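The jump from 28% to 44% to 67% follows from a simple weighted-byte calculation: if 1% of requests carry 28% of the bytes, the average audio/video object is roughly 38-39 times larger than the average of the remaining objects, and raising the audio/video request fraction to 2% or 5% gives the byte fractions quoted above:

    # From [1]: 1% of requests (audio/video) account for 28% of bytes transferred.
    p0, bytes0 = 0.01, 0.28

    # Relative size of an average A/V object vs. an average non-A/V object (~38.5).
    ratio = (bytes0 / (1 - bytes0)) * ((1 - p0) / p0)

    def byte_fraction(p):
        """Fraction of bytes due to A/V requests when they make up fraction p."""
        return p * ratio / (p * ratio + (1 - p))

    for p in (0.01, 0.02, 0.05):
        print(f"{p:.0%} of requests -> {byte_fraction(p):.0%} of bytes")
        # prints roughly 28%, 44%, and 67%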
On the other hand, using the Fibre Channel-based cluster automatically solves the
cache problem faced by AFS-based server clusters, such as the NCSA Web server.
All files are effectively local to every Web server node. This illustrates the benefit of
an FC-based architecture for supporting real-time multimedia applications that in-
volve large files which are difficult to cache.
5.3 Fibre Channel Switch Overhead
Compared to truly local disk accesses (e.g., a disk directly attached to a server node’s
local PCI bus), accesses to the disks attached to the Fibre Channel switch incur an
extra delay in crossing the switch. The minimum across-switch delay (i.e., with no
contention in the switch) for a Class 1 frame is one microsecond, and for Class 2 and
3 frames it is about eight microseconds. The Class 1 service, however, incurs an ad-
ditional upfront connection setup delay in the range of 150-250 microseconds. There
are two types of traffic across the switch: (a) the IP traffic for the communications
between server nodes; and (b) the SCSI traffic for data transfers between server
nodes and storage devices. In an FC-based multimedia server cluster, the inter-
server communication traffic, as compared to server-disk data traffic, is relatively
small, and can be delivered by either Class 2 or Class 3 services.
For the server-disk traffic, we identify two scenarios according to the direction of
traffic. The traffic from server node to disks is typically SCSI commands or control
messages, which are usually very short and can be packed into a single frame (we
assume that disk writes or database updates can be done off-line). Thus, either Class
2 or Class 3 services can be used. On the other hand, the traffic from disks to server
nodes is typically data requested by these nodes. To improve disk utilization, I/O
requests for media objects typically access large blocks of data (e.g., 128 to 256
KBytes, as compared to 4 to 8 KBytes in traditional transaction processing systems).

When multiple disks are connected through an FC loop attached to a switch port,
delivering these large data chunks using Class 2 or Class 3 services may suffer from
the well-known “head-of-line blocking” problem. Some preliminary simulation re-
sults show that, with head-of-line blocking, only 50% of the port bandwidth can
be effectively utilized. This implies that a 1 Gbit/sec port can only sustain a 500
Mbit/sec data rate. In a previous study [3], we observed that, if a 1 Gbit/sec FC loop
is attached directly to a host PCI bus, it can sustain a 700-800 Mbit/sec data rate for
real-time applications such as video-on-demand. Thus, a switch that sustains only a
500 Mbit/sec data rate is apparently a bottleneck between the host and an FC loop.
With Class 1 service, head-of-line blocking can be avoided. The Class 1 con-
nection setup overhead is still a concern, but it can be justified by the large data
transfer within a connection, which typically takes several milliseconds. Therefore,
using Class 1 service for the data traffic from disks to server nodes might be appro-
priate for multimedia servers.
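To put the Class 1 setup cost in perspective, a rough calculation with illustrative values consistent with the numbers above (a 256 KByte media I/O, an effective 800 Mbit/sec loop rate, and a 200 microsecond connection setup):

    transfer_bits = 256 * 1024 * 8    # one 256 KByte media I/O
    effective_rate = 800e6            # bits/sec sustained on the loop (assumed)
    setup = 200e-6                    # Class 1 setup, within the 150-250 us range

    transfer_time = transfer_bits / effective_rate     # about 2.6 ms
    overhead = setup / (setup + transfer_time)         # about 7%
    print(f"transfer {transfer_time * 1e3:.1f} ms, setup overhead {overhead:.1%}")

The setup delay is thus a single-digit percentage of each large transfer, which is what makes Class 1 attractive for the disk-to-server data traffic.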

6 Summary and Future Work


In this paper, we presented an FC-based architecture for server clusters, which allows
storage devices to be physically attached to the interconnect FC switch. This feature
changes the fundamental data sharing model of existing clusters, for which remote file
accesses are necessary. The advantages of this architecture include: (a) it improves
the performance by eliminating all the remote data accesses to storage nodes; (b) it
avoids the “caching large media objects” problem faced by AFS-based clusters; (c) it
reduces costs by eliminating the storage nodes found in existing clusters; (d) it pro-
vides a single uniform mechanism that manages all the storage devices in the cluster,
which enables wide data striping and better load balancing; and (e) it supports vari-
ous levels of fault tolerance and scalability. While our discussions are conducted in
the context of Internet media servers, this architecture can certainly be used for a
wide range of multimedia servers, such as those for video-on-demand, corporate
training, distance learning, etc.
Finally, in this paper we have only discussed the architectural-level design issues
of FC-based multimedia server clusters. There are still many questions that need to be
answered. First, while we claimed performance and cost benefits for the FC-based cluster,
most of these claims are not yet quantified. Further studies and simulations are neces-
sary in order to better understand these issues. Second, the cluster control mecha-
nism, especially the promising hybrid configuration, needs further investigation. We
will leave these issues as topics of our future research.
Acknowledgment:
The authors wish to thank Lucy Cherkasova for her generous help providing some of
the simulation performance results for the Fibre Channel switch.

References:

1. Kwan, T. and McGrath, R., “NCSA’s World Wide Web Server: Design and Per-
   formance,” IEEE Computer, Nov. 1995, pp. 68-74.
2. ANSI Standard X3T9.3, Fibre Channel Physical and Signaling Interface (FC-
   PH), Rev. 4.0, May 1993.
3. Chen, Shenze and Manu Thapar, “Fibre Channel Storage Interface for Video-on-
   Demand Servers,” IS&T/SPIE Proc. Vol. 2667 on Multimedia Computing and
   Networking, San Jose, CA, Jan. 1996.
4. Cummings, Roger, “System Architectures Using Fibre Channel,” Proc. 12th
   IEEE Symposium on Mass Storage Systems, pp. 251-256, 1993.
5. Varma, Anujan, Vikram Sahi, and Robert Bryant, “Performance Evaluation of a
   High-Speed Switching System Based on the Fibre Channel Standard,” Proc. of
   the 2nd IEEE Int’l Symposium on High-Performance Distributed Computing,
   Spokane, Washington, July 1993.
6. Getchell, D. and P. Rupert, “Fibre Channel in the Local Area Network,” IEEE
   LTS, Vol. 3, No. 2, pp. 38-42, May 1992.
7. Hewlett-Packard Co., HP-UX Release 9.0, Managing Clusters of HP 9000 Com-
   puters, Aug. 1992.
8. Kwan, T., R. McGrath, and D. Reed, “User Access Patterns to NCSA’s World
   Wide Web Server,” Tech. Report UIUCDCS-R-95-1934, Dept. of Computer Sci-
   ence, Univ. of Illinois, Urbana-Champaign, Feb. 1995.
9. Dittia, Z. D., J. R. Cox, Jr., and G. M. Parulkar, “Using an ATM Interconnect as
   a High Performance I/O Backplane,” Presentation at Hot Interconnects II, Stan-
   ford, CA, Aug. 1994.
10. Ito, Yukiko and Tsutomu Tanaka, “A Video Server Using ATM Switching
    Technology,” Proc. 5th IEEE COMSOC Int’l Workshop on Multimedia Com-
    munications, Kyoto, Japan, May 1994.
11. Rooholamini, Reza and Vladimir Cherkassky, “ATM-Based Multimedia Serv-
    ers,” IEEE Multimedia, Spring 1995, pp. 39-52.
12. Williams, Robin and Roger Haskin, “Tiger Shark Video Server,” Presentation at
    Hot Interconnects III, Stanford, CA, Aug. 1995.
13. Anderson, Thomas E., et al., “A Case for NOW (Networks of Workstations),”
    IEEE Micro, Vol. 15, No. 1, pp. 54-64, Feb. 1995.
14. Howard, John H., et al., “Scale and Performance in a Distributed File System,”
    ACM Trans. on Computer Systems, Vol. 6, No. 1, pp. 51-81, Feb. 1988.
