Direct NFS was inspired by experience at Oracle's Austin Data Center. Oracle uses NFS to run its applications on tens of thousands of Linux servers accessing many petabytes of NetApp storage. In 2005 they had 12,000 Linux servers and 3 petabytes of NetApp storage. Today's numbers aren't public, but they are much larger.

When an operating system capability becomes sufficiently important, Oracle pulls it into the database. Memory management became critical, so Oracle said, "Just give me the raw pages, and I'll manage them myself." Disk caching became critical, and Oracle said, "Just give me the raw disk blocks, and I'll cache them myself." Now NFS has become critical, so Oracle says, "Just give me a raw TCP/IP socket, and I'll generate NFS requests myself." Steve Kleiman has argued that as Oracle becomes more sophisticated, the operating system becomes little more than a device driver framework that gives the database raw access to the hardware. That sheds new light on Oracle's Unbreakable Linux program.

What exactly does Oracle gain from Direct NFS? The primary benefits are simplicity and performance. It's simpler because you don't have to worry about how to configure NFS. What timeouts should you use? What caching options? It doesn't matter. Oracle looks at how you have NFS configured to figure out where the data lives, but aside from that, your settings don't matter. Oracle takes control. It even works with Windows: just mount the data that Oracle needs using a CIFS share, and Oracle figures out the location of the data and accesses it via NFS. (CIFS is great for home directory sharing, but it isn't designed for database workloads.) Performance is better because Oracle bypasses the operating system and generates exactly the requests it needs. Data is cached just once, in user space, which saves memory; there is no second copy in kernel space. Oracle also improves performance by load balancing across multiple network interfaces, if they are available.
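To make "just give me a raw TCP/IP socket, and I'll generate NFS requests myself" concrete: a user-space NFS client builds ONC RPC messages itself and writes them to its own socket, bypassing the kernel NFS client entirely. The sketch below is not Oracle's implementation; it just assembles the simplest possible NFSv3 request, the NULL "ping" procedure, as raw bytes with TCP record marking, using the well-known constants from the RPC and NFSv3 specifications.

```python
import struct

# ONC RPC constants (RFC 5531); 100003 is the registered NFS program number.
CALL = 0
RPC_VERSION = 2
NFS_PROGRAM = 100003
NFS_V3 = 3
NFSPROC3_NULL = 0  # the no-op "ping" procedure
AUTH_NULL = 0

def build_nfs_null_call(xid):
    """Build the bytes of an NFSv3 NULL call, framed for TCP.

    This is the kind of message a user-space NFS client writes to
    its own TCP socket instead of going through the kernel.
    """
    body = struct.pack(
        ">10I",
        xid,            # transaction id, echoed back in the reply
        CALL,           # this message is a call, not a reply
        RPC_VERSION,
        NFS_PROGRAM,
        NFS_V3,
        NFSPROC3_NULL,
        AUTH_NULL, 0,   # credential: flavor AUTH_NULL, zero length
        AUTH_NULL, 0,   # verifier:   flavor AUTH_NULL, zero length
    )
    # TCP record marking: high bit = last fragment, low 31 bits = length.
    return struct.pack(">I", 0x80000000 | len(body)) + body

msg = build_nfs_null_call(xid=1)
```

Real requests (LOOKUP, READ, WRITE) carry XDR-encoded arguments after this header, but the framing is the same; once the database speaks this protocol itself, it controls timeouts, caching, and load balancing without help from the operating system.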
For more technical details on Direct NFS, check out this article by Kevin Closson. He works for PolyServe, which is a NetApp competitor, but technically speaking, he talks good sense. I also recommend this article, by NetApp's John Elliott, comparing Oracle performance over Fibre Channel, NFS and iSCSI.

NetApp has been closely involved in Direct NFS from the very beginning. Peter Schay came up with the idea while he worked for Oracle's Linux Program Office. He wanted to simplify things for Oracle customers running on Linux, many of whom were hosted on Oracle's On-Demand environment at the Austin Data Center. He worked closely with NetApp engineers to prototype and test the idea. The Oracle ST team used his functional specification to develop the production version of Direct NFS now shipping in 11g. (Today Peter works for NetApp.)

I love how NFS has evolved over the past couple of decades. Twenty years ago, it provided file sharing to small engineering workgroups; today it provides the data backbone for some of the world's largest data centers. What is it about NFS that has allowed it to make this transition? What is it about NFS that led Oracle to build it directly into their database? That's the topic for another post!

NAS vs. SAN: A Quick Introduction

At first glance NAS and SAN might seem almost identical, and in fact many times either will work in a given situation. After all, both NAS and SAN generally use RAID connected to a network, which is then backed up onto tape. However, there are differences -- important differences -- that can seriously affect the way your data is utilized. For a quick introduction to the technology, take a look at the diagrams below.
More Differences

NAS

- Almost any machine that can connect to the LAN (or is interconnected to the LAN through a WAN) can use the NFS, CIFS or HTTP protocol to connect to a NAS and share files.
- A NAS identifies data by file name and byte offsets, transfers file data or file metadata (the file's owner, permissions, creation date, etc.), and handles security, user authentication and file locking.
- A NAS allows greater sharing of information, especially between disparate operating systems such as Unix and NT.
- The file system is managed by the NAS head unit.
- Backups and mirrors (utilizing features like NetApp's Snapshots) are done on files, not blocks, for a savings in bandwidth and time. A Snapshot can be tiny compared to its source volume.

SAN

- Only server-class devices with SCSI Fibre Channel can connect to the SAN. The Fibre Channel of the SAN has a limit of around 10 km at best.
- A SAN addresses data by disk block number and transfers raw disk blocks.
- File sharing is operating system dependent and does not exist in many operating systems.
- The file system is managed by the servers.
- Backups and mirrors require a block-by-block copy, even if blocks are empty. A mirror machine must be equal to or greater in capacity compared to the source volume.
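The addressing difference above is the heart of the comparison: a NAS client names a file and a byte range, while a SAN client names raw block numbers, so something (the file server's file system) must translate between the two. A minimal sketch of that translation, assuming a hypothetical file system with a fixed 4 KB block size and a single contiguous extent per file (real file systems keep far richer metadata):

```python
BLOCK_SIZE = 4096  # bytes per disk block (assumed for illustration)

# Hypothetical extent map: file name -> starting disk block of one
# contiguous extent holding the file's data.
extent_map = {"/data/orders.db": 100_000}

def nas_request(path, offset, length):
    """What a NAS client sends: a file name plus a byte range."""
    return {"file": path, "offset": offset, "length": length}

def to_san_request(req):
    """What reaches the disk: raw block numbers, with no notion of files."""
    start = extent_map[req["file"]] + req["offset"] // BLOCK_SIZE
    last_byte = req["offset"] + req["length"] - 1
    end = extent_map[req["file"]] + last_byte // BLOCK_SIZE
    return {"blocks": list(range(start, end + 1))}

# A 6000-byte read starting at offset 8192 touches two 4 KB blocks.
req = nas_request("/data/orders.db", offset=8192, length=6000)
blocks = to_san_request(req)["blocks"]  # [100002, 100003]
```

With a NAS, this translation (plus locking and permission checks) happens inside the NAS head; with a SAN, each attached server runs its own file system and does the translation itself, which is why SAN file sharing depends on the operating system.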
What's Next?

NAS and SAN will continue to butt heads for the next few months, but as time goes on, the boundaries between NAS and SAN are expected to blur, with developments like SCSI over IP and Open Storage Networking (OSN), the latter recently announced at NetWorld+Interop. Under the OSN initiative, many vendors such as Amdahl, Network Appliance, Cisco, Foundry, Veritas, and Legato are working to combine the best of NAS and SAN into one coherent data management solution.
NAS vs. SAN

When ProfitLine went through the laborious process of choosing a storage technology, Don Lightsey was thrust into the center of the decision-making. This is his first-person account.
Just like the age-old question of which operating system (OS) and computer platform to choose when buying a computer, both network-attached storage (NAS) and storage area network (SAN) technologies have their place in the network storage arena. For companies that need to move beyond direct storage, considering NAS or SAN makes sense. How do you know which one will work for you?

ProfitLine began to look at NAS and SAN solutions for centralized storage, storage management and storage scalability. ProfitLine manages telecom expenses on behalf of large enterprises, so we receive thousands of phone bills daily that we process, audit and pay on our clients' behalf, and we store all the data for a rolling 13 months for trending and reporting purposes. We were adding new clients quickly, and direct storage was clearly no longer a workable solution as the company's storage needs mushroomed. Our new storage solution needed to support our rapidly escalating needs in processing and storing large volumes of call detail. Our storage needs were driven by the amount of client data processed, and that number was growing exponentially. The new storage needed to be scalable, to dynamically allocate and easily manage disk space without incurring unneeded downtime and high administrative overhead. The security and availability of our clients' data were top priorities.

At first glance, NAS and SAN may seem easy to tell apart. A SAN is a dedicated storage area network that is interconnected through the Fibre Channel protocol using either 1-gigabit or 2-gigabit Fibre Channel switches and Fibre Channel host bus adaptors. Devices such as file servers connect directly to the storage area network through the Fibre Channel protocol. Unlike a NAS filer, which uses standard TCP/IP over Ethernet to connect the storage to the network, SANs use the Fibre Channel protocol to connect storage directly to devices/hosts. NAS connects directly to the network using TCP/IP over Ethernet CAT 5 cabling.
In most cases, no changes to the existing network infrastructure need to be made in order to install a NAS solution. The network-attached storage device is attached to the local area network (typically, an Ethernet network) and assigned an IP address just like any other network device.

The more we researched NAS and SAN, however, the muddier the waters became. Storage solutions vendors try to be all things to all people, and the definitions, value and benefits were hard to differentiate. To add to the confusion, SAN and NAS technologies are beginning to converge and blur the line even more.

CROSSOVER FEATURES

Features that were once only available in SAN solutions are now starting to become available in NAS products, and, likewise, products are now available that allow storage administrators to leverage parts of their SAN infrastructures to act as NAS filers through the use of technologies such as iSCSI. iSCSI (Internet Small Computer System Interface) is an Internet Protocol-based storage networking standard for linking data storage facilities. By carrying SCSI commands over IP networks, iSCSI is used to facilitate data transfers over intranets and to manage storage over long distances, loaning SAN a few of the NAS capabilities.

With a lot of research and business use cases, we were able to sort fact from fiction. There are big differences between the technologies, and both have benefits:

- SANs are highly redundant through the implementation of multipathing and the ability to create fully redundant fiber meshes, so there is no single point of failure.
- SANs feature block-level transfers instead of NAS file-level transfers. This is critical if you have database applications that read and write data at the block level.
- SAN products run on the Fibre Channel protocol and are entirely isolated from the IP network, so there is no contention with IP traffic (and no need for a TCP/IP offload engine to optimize throughput). NAS products run over your existing TCP/IP network and, as such, are prone to latency and broadcast storms, and compete for bandwidth with users and other network devices.
- You can leverage an existing SAN to act like a NAS with an iSCSI switch, which saves money.
- With a SAN, there is higher security, because a SAN uses zoning and logical unit number (LUN) security. NAS security is typically implemented at the file-system level through traditional operating system access-control lists.
- There is more flexibility in redundant array of independent disks (RAID) levels. While NAS products do support standard RAID levels, such as 0, 1 and 5, you typically do not get the flexibility to mix RAID levels within the same device.

LOWER COSTS WITH NAS

File servers see SAN-attached volumes as locally attached disks, whereas a NAS presents them as remote Network File System (NFS) or New Technology File System (NTFS) file shares. NTFS is the file system that the operating system uses for storing and retrieving files on a hard disk. NFS is a client/server application that lets a computer user view, and optionally store and update, files on a remote computer as though they were on the user's own computer.
Some applications do not support remote drives and can only use a volume that is local to the OS.
NAS is cheaper than SAN. The initial investment in a SAN is expensive due to the high cost of Fibre Channel switches and host bus adaptors. If you need just simple file storage, NAS is the way to go. A NAS product simply plugs into your existing IP network like any other device and looks like a normal file share on the network. So a NAS can be dropped right into your existing IP network without any additional costs or infrastructure changes. Ethernet is a stable and mature protocol, and almost any IT administrator already knows Ethernet and TCP/IP, so there is no steep learning curve compared with learning and understanding the Fibre Channel protocol.

After carefully weighing all the benefits, ProfitLine chose to go with a Hitachi 9200 SAN infrastructure with Compaq/HP ProLiant servers because of its better support for databases and higher security. Since ProfitLine processes tens of thousands of invoices monthly on behalf of its clients, and the number is only going up with the addition of new clients, we needed the more powerful and scalable SAN technology. One of the limitations of the mid-class SAN that we purchased was the inability to resize existing LUNs without destroying all LUNs created after the LUN being resized. A LUN is a unique logical unit number that is used to create logical partitions of data residing on one or many physical hard drives, and is similar to a volume or file system on a Windows or Unix operating system.

STORAGE SPACE UNAVAILABLE

We originally had not anticipated this limitation being a big issue, given our fairly simple and straightforward storage needs. We found out further down the road, as we began to consume all of the storage space on the SAN, that it was a serious problem for us, because we did not have a clear strategy for allocating storage space to servers. Reclaiming over-allocated and unused disk space was not possible without backing up, rebuilding and restoring data.
We now had a situation where we were running out of storage space to allocate to new servers, but still had roughly 50% of our raw storage unused, just sitting on servers wasting away, because we could not dynamically resize the LUNs. At first, we were disappointed with the SAN solution. This was the same type of problem that we regularly complained about with direct-attached storage on servers, and now here we were in the same situation.

At the same time that this issue started becoming a problem, we noticed that a few of the SQL servers were showing slightly high disk-queuing values during peak times of the day. After some analysis comparing the disk counters in Windows Performance Monitor with the data we were capturing from the Hitachi 9200 SAN and Brocade (SilkWorm 3800) Fibre Channel switches, we determined that the operating systems, and not the SAN, were causing the disk I/O (input/output) performance problems. All of our focus was on the I/O performance of the SAN, so we built our RAID arrays using lots of spindles, knowing that this was a good way to get I/O performance. We neglected to consider the operating system as part of the equation, however. We were allocating large LUN sizes to the operating system and placing the disk I/O bottleneck at the server.

After discussions with our SAN vendor, we came up with a strategy to solve both the disk-allocation problems and the performance issues. To solve the disk-allocation problems, we broke up all of the LUNs on the SAN into two sizes, 10 gigabytes and 25 gigabytes, and used Veritas Volume Manager volume-management software on the Windows servers to stripe multiple LUNs together. The new dynamically resized logical volumes were able to match changing storage needs, and we can now allocate disk space to servers by simply adding one or more LUNs together and striping them in the operating system. For example, a server needing 20 gigabytes would get 2x10-gigabyte LUNs, a server needing 50 gigabytes would get 2x25-gigabyte LUNs, a server needing 30 gigabytes would get either 3x10-gigabyte LUNs or a 25-gigabyte plus a 10-gigabyte LUN, and so on. In addition to making the storage easier to manage, this also helped us eliminate unnecessary costs in purchasing additional storage capacity, by reclaiming almost 40% of the previously unused storage space.

The performance problem was solved with this same basic math. Our previous strategy was to allocate LUNs as large as needed to fit the application requirements. So, if an application needed 100 gigabytes of storage space, we would allocate a single 100-gigabyte LUN that had a single I/O stack. Under the new strategy of 10-gigabyte and 25-gigabyte LUNs, the 100-gigabyte volume would instead be built from 4x25-gigabyte LUNs, striped in the operating system using the volume-management software. The new 100-gigabyte volume would have four I/O stacks to use when reading and writing to the logical volume, which increases disk performance. Our SQL servers have benefited greatly from this new strategy as a result.

We have since extended our SAN to support basic file storage needs for document imaging and EDI file storage for electronic processing. Our current SAN is a workhorse, and I/O performance is great. The SAN's higher performance of Fibre Channel compared to Ethernet, its flexibility to support different RAID levels easily, and its strong support for databases and block-level data transfers made it the better choice for our dramatically growing business. Our clients see flawless, fast throughput of their telecom data, which leads to faster processing and quicker return on investment.
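The allocation arithmetic above can be sketched in a few lines. This is just an illustration of the strategy described, with a hypothetical helper name: greedily take 25 GB LUNs while a whole one fits, then round the remainder up to 10 GB LUNs; each LUN in the striped volume contributes one I/O stack.

```python
def allocate_luns(needed_gb):
    """Cover a storage request using only the two standard LUN sizes.

    Greedy sketch of the article's strategy: 25 GB LUNs first, then
    10 GB LUNs for the remainder (rounding up). The length of the
    result is also the number of I/O stacks the striped volume gets.
    """
    luns = []
    remaining = needed_gb
    while remaining >= 25:
        luns.append(25)
        remaining -= 25
    while remaining > 0:   # remainder rounds up to whole 10 GB LUNs
        luns.append(10)
        remaining -= 10
    return luns

# The article's examples: 20 GB -> 2x10, 50 GB -> 2x25,
# 30 GB -> 25 + 10 (the article also allows 3x10), 100 GB -> 4x25.
```

The performance point falls out of the same function: a 100 GB request returns four LUNs, so the striped volume reads and writes through four I/O stacks instead of one.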
SAN and NAS both have their advantages, so be sure to assess your current and future storage needs, and understand how their differences could help your business succeed.
The Top 10 SANs vs. NAS Decision Factors
by W. Curtis Preston, author of Using SANs and NAS 03/14/2002
Many administrators find themselves answering a new question: Should I use Storage Area Networks (SANs) or Network Attached Storage (NAS) to store my data? NAS filers (dedicated machines serving files via NFS, CIFS, or NCP) were once perceived as "NFS in a box," and no one would think about using them to hold large databases. In contrast, SAN disk arrays were perceived as too expensive for the average user. The decrease in the price of SAN arrays and the increase in the functionality and speed of NAS filers have changed all that.

Which is right for you? Should you buy a large Fibre Channel-based array, put it on a SAN, then use disk virtualization, a volume manager, and filesystem software to allocate the disks to multiple hosts? Or should you buy a NAS filer with its native volume manager and use NFS to share its volumes to multiple hosts? This article gives you 10 reasons that people cite when they make this decision. Many of the comments made in the following paragraphs are summaries of information from the book Using SANs and NAS. For the details behind these summary statements, please see Chapters 2 and 5 of this book.
SANs are faster

Many would argue that SANs are simply more powerful than NAS. Some would argue that NFS and CIFS running on top of TCP/IP create more overhead on the client than SCSI-3 running on top of Fibre Channel. This means, they would say, that a single host can sustain more throughput to a SAN-based disk than to a NAS-based disk. While this may be true on very high-end servers, most real-world applications require much less throughput than the maximum available throughput of a filer. Still, there are applications where SANs will be faster. If your application requires sustained throughput greater than what is available from the fastest filer, your only alternative is a SAN.
NAS is easier to maintain

No one who has managed both a SAN and NAS should argue with this statement. SANs are composed of pieces of hardware from many vendors, including the HBA, the switch or hub, and the disk arrays. Each of these vendors will be new to an environment that has not previously used a SAN. In comparison, filers allow the use of your existing network infrastructure. The only new vendor you'll need to communicate with is the manufacturer of the filer itself. SANs have a larger number of components that can fail, fewer tools to troubleshoot these failures, and more possibilities of finger pointing. The result is that a NAS-based network will be much easier to maintain.
NAS is cheaper

Again, since filers let you leverage your existing network infrastructure, they are usually much cheaper to implement than a SAN. A SAN requires the purchase of a Fibre Channel HBA for each host that will be connected to the SAN, a port on a hub or switch for each host, one or more disk arrays, and the appropriate cables to connect all this together. Even if you choose to install a completely separate LAN for your NAS traffic, the required components will still be much cheaper than their SAN counterparts. Although SANs are getting less expensive every day, a Fibre Channel HBA still costs much more than a standard Ethernet NIC. It's simply a matter of economies of scale: more people need Ethernet than need Fibre Channel.
SANs are still maturing

Many people have criticized SANs for being more hype than reality. Too many vendors' systems are incompatible, and too many software pieces are only now being released. Vendors are still fighting over the Fibre Channel standard. While there are many successfully implemented SANs today, there are many that were not successful. If you connect equipment from the wrong vendors, things just won't work. In comparison, filers are completely interoperable, and the standards they are based on have been around for years. Perhaps in a few years, the vendors will have agreed upon an appropriate standard, SAN management software will do everything that we want it to do, and SAN equipment will be completely interoperable. I sure hope this happens.