FILESTREAM Design and Implementation Considerations
SQL Server Technical Article
Summary: Much of the data used today is unstructured, including text documents, images, and
videos. This unstructured data is typically stored outside a relational database and separate
from structured data. This separation can make data management much more complex, and if
the data is associated with structured storage, the separation can limit performance and file
streaming capabilities.
Microsoft SQL Server 2008 includes an enhancement to data storage called FILESTREAM,
which lets you store unstructured binary large object (BLOB) data directly in the file
system. With FILESTREAM, you can take advantage of the rich set of Win32 streaming
application programming interfaces (APIs) for better streaming performance. FILESTREAM also
provides transactional consistency, so structured and unstructured data are always in sync.
Additionally, you can use Transact-SQL statements to insert, update, query, search, and back
up FILESTREAM data.
This white paper is a companion to the information about FILESTREAM found on TechNet. This
paper delves deeply into selected topics that should be considered when implementing a
solution that uses FILESTREAM, including design considerations, maintenance, and
management of a FILESTREAM environment.
Copyright
This document is provided as-is. Information and views expressed in this document, including
URL and other Internet Web site references, may change without notice. You bear the risk of
using it.
This document does not provide you with any legal rights to any intellectual property in any
Microsoft product. You may copy and use this document for your internal, reference purposes.
© 2011 Microsoft. All rights reserved.
Contents
Introduction
Introducing FILESTREAM
When to Use FILESTREAM
Considerations for Enabling and Configuring FILESTREAM
Enabling FILESTREAM
Failover Clusters
FILESTREAM Storage Considerations
Considerations for Setting Up FILESTREAM Storage Volumes
Special Consideration for Large Environments
RAID Considerations
Security Considerations
Encryption
Management Considerations
Maintenance Tasks
Index Management
Filegroup Management
Backup and Restore
FILESTREAM Transaction Log
Garbage Collector
Log Shipping
Database Mirroring
Database Snapshot
AlwaysON (HADRON)
Considerations for Table Creation
Data Access Considerations
Transact-SQL Access
File System Streaming Access
Statement Model
Storage Namespace
Transacted File System Access
Transactional Durability
Isolation Semantics
Partial Updates
Write-Through from Remote Clients
Migration Considerations
Conclusion
For More Information
Introduction
Today's computer-driven business world generates data at an incredible rate. If this data is to be
useful, organizations must store it in a controlled and efficient way so that it is readily
accessible.
Prior to Microsoft SQL Server 2008, storing unstructured data such as text documents, images,
and videos posed many challenges, such as how to maintain transactional consistency between
the structured and unstructured data, how to manage backup and restore, and how to achieve
storage performance and scalability. Architects of applications that required the storage of binary
large object (BLOB) data could either store the data in the database or store it outside of the
database with a reference stored in the database. This decision was never easy to make
because each option had its own benefits and frustrating limitations. Information about the
tradeoffs can be found in the white paper, FILESTREAM Storage in SQL Server 2008
(http://msdn.microsoft.com/en-us/library/cc949109(SQL.100).aspx).
Microsoft SQL Server 2008 R2 and the current Community Technology Preview (CTP) release
(SQL Server code-named Denali) expand the available options for storing BLOB data to
include:
In-database
FILESTREAM
FileTable
This white paper focuses on design and implementation considerations for using FILESTREAM
storage. Additionally, notes in the paper describe relevant Remote BLOB Store (RBS) and FileTable considerations.
The paper consolidates information from many resources, and adds content from Microsoft
development team members.
Note: This paper is intended for system architects, IT professionals, and database
administrators (DBAs) tasked with evaluating or implementing data storage. It is assumed that
the reader is very familiar with Windows Server and SQL Server and has at least a basic
knowledge of database concepts.
Introducing FILESTREAM
FILESTREAM is a SQL Server 2008 feature that lets you store unstructured BLOB data directly
in the file system. FILESTREAM is not a data type; it is an attribute applied to a varbinary(max)
column to indicate that the data is to be stored directly in the file system while SQL Server
maintains transactional consistency.
A non-FILESTREAM storage format uses the buffer pool when the data pages are accessed.
FILESTREAM uses the NT system cache for caching the file data. This approach helps reduce
the effects that FILESTREAM data has on database engine performance. While the buffer pool
is relieved of managing the varbinary(max) data pages, it is important to appreciate that the
virtual address space (VAS) is still shared between FILESTREAM data and SQL Server data.
When using FILESTREAM, it is important to differentiate between traditional data (called row
data) and FILESTREAM data.
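As a minimal sketch of how the attribute is applied, the following statement creates a table whose varbinary(max) column is marked FILESTREAM. The table and column names (dbo.DocumentStore, DocumentID, DocumentName, Content) are hypothetical, and the statement assumes it runs in a database that already has a FILESTREAM filegroup. SQL Server requires a table with a FILESTREAM column to also have a non-null, unique uniqueidentifier ROWGUIDCOL column.

```sql
-- Hypothetical table; assumes the current database already has a FILESTREAM filegroup.
CREATE TABLE dbo.DocumentStore
(
    -- A table with a FILESTREAM column needs a unique, non-null ROWGUIDCOL column.
    DocumentID   uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    -- Ordinary row data, stored in the database data pages.
    DocumentName nvarchar(260)    NOT NULL,
    -- FILESTREAM data: the bytes of this column are stored in the file system.
    Content      varbinary(max)   FILESTREAM NULL
);
```

In this sketch, DocumentID and DocumentName are row data, while Content is FILESTREAM data.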
Note: Other storage options include RBS and FileTable.
RBS is a library API set incorporated as an add-on feature pack for Microsoft SQL Server. RBS
is designed to move storage of BLOB data from database servers to external Content
Addressable Stores (CAS). A reference to the BLOB is stored in the database. An application
stores and accesses the BLOB data by calling into the RBS client library. RBS manages the
lifecycle of the BLOB; for example, RBS performs garbage collection when needed. For more
information, see Microsoft SQL Server Remote Blob Storage (RBS) Samples
(http://sqlrbs.codeplex.com/).
FileTable builds on existing FILESTREAM capabilities, providing applications with nontransactional
access to a special table (the FileTable) that contains unstructured data. For more
information, see FileTable Overview (http://msdn.microsoft.com/en-us/library/ff929068(v=SQL.110).aspx).
Note that unlike RBS, FILESTREAM is constrained to local volumes. RBS can store BLOB data
on a variety of remote storage devices.
You are developing applications that use a middle tier for application logic.
You should generally store BLOBs smaller than 256 kilobytes (KB) inside the database, and
store BLOBs larger than 1 megabyte (MB) outside the database. For BLOBs sized between 256 KB
and 1 MB, the more efficient storage solution depends on the read:write ratio of the data, and on
the rate of overwrite [1]. Generally, storing smaller BLOBs as varbinary(max) values in the
database provides better streaming performance than storing them outside of the database.
[1] http://msdn.microsoft.com/en-us/library/cc949109(SQL.100).aspx
Use these considerations as a starting point for deciding if BLOBs should be stored outside of
the database. If you plan to store the BLOBs outside the database, you can then evaluate
whether RBS or FILESTREAM is the most appropriate solution.
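One way to apply these size guidelines to existing data is to count how many BLOBs fall into each size band. The query below is only a sketch: the table dbo.LegacyDocuments and its varbinary(max) column LegacyContent are hypothetical names, and the band labels simply restate the 256 KB and 1 MB thresholds given above.

```sql
-- Hypothetical source table: dbo.LegacyDocuments with a varbinary(max) column LegacyContent.
WITH Sized AS
(
    SELECT CASE
               WHEN DATALENGTH(LegacyContent) < 256 * 1024   THEN 'Under 256 KB (in-database)'
               WHEN DATALENGTH(LegacyContent) <= 1024 * 1024 THEN '256 KB to 1 MB (workload dependent)'
               ELSE 'Over 1 MB (outside the database)'
           END AS SizeBand
    FROM dbo.LegacyDocuments
    -- DATALENGTH(NULL) is NULL and would otherwise be counted in the ELSE band.
    WHERE LegacyContent IS NOT NULL
)
SELECT SizeBand, COUNT(*) AS BlobCount
FROM Sized
GROUP BY SizeBand
ORDER BY SizeBand;
```

The distribution is only a starting point; the read:write ratio and rate of overwrite still matter for the middle band.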
For more information, see Chapter 7 (Special Storage) of Microsoft SQL Server 2008
Internals [2]. This chapter provides points to consider when comparing in-database and file system
storage. While not required for implementing FILESTREAM, the discussion in the book is useful
when selecting the most efficient storage medium based on the BLOB size and impact to
database management operations.
Note: SQL Server stores Character Large Objects (CLOBs) as varchar(max) and nvarchar(max)
data types, while BLOBs are stored as varbinary(max) and image data types.
Enabling FILESTREAM
FILESTREAM is not automatically enabled when you install or upgrade SQL Server; before you
can start using FILESTREAM, you must enable it on the instance of the SQL Server Database
Engine.
There are several different ways to enable FILESTREAM. You can enable FILESTREAM by using
SQL Server Configuration Manager, by using Transact-SQL, by using SQL Server Management
Studio, or by enabling FILESTREAM during SQL Server 2008 installation. Note that while
enablement is typically performed by a Windows system administrator, it is critical that the
administrator is able to assign appropriate levels of access to the storage.
Note: Step-by-step guidance for enabling FILESTREAM can be found at TechNet at How to:
Enable FILESTREAM (http://technet.microsoft.com/en-us/library/cc645923.aspx) or on MSDN
at How to: Enable FILESTREAM (http://msdn.microsoft.com/en-us/library/cc645923.aspx).
When enabling FILESTREAM using SQL Server Configuration Manager, you should select all of
the check boxes in the FILESTREAM tab of the SQL Server (SQL 2008) Properties pane in
most cases (Enable FILESTREAM for Transact-SQL access, Enable FILESTREAM for file
I/O streaming access, and Allow remote clients to have streaming access to FILESTREAM
data). You should also create a Windows share name.
[2] Delaney, Kalen, et al., Microsoft SQL Server 2008 Internals, Redmond, WA: Microsoft Press,
2009, ISBN: 0735626243
Note that the share name must not pre-exist. When selecting a name, pay careful attention to
naming conventions, especially if multiple instances reside on the server and each has
FILESTREAM enabled. Note that if the application will run on the same server as SQL Server
(an uncommon configuration), there is no need to enable remote connections.
Microsoft recommends that you only enable the features you need. It is therefore important to
communicate clearly with the Windows system administrator who is responsible for
configuration. For example, without clear guidance, an administrator might configure the
Windows properties to allow only a Transact-SQL connection when the desired configuration is
to include Win32 access.
After the service is configured through the SQL Server Configuration Manager, you must enable
FILESTREAM through the sp_configure process using Transact-SQL. In SQL Server
Management Studio, open a new query window (the Query Editor) and type the following
Transact-SQL code to enable FILESTREAM and to allow remote clients to connect to
FILESTREAM files through the Win32 API.
-- Level 2 enables both Transact-SQL and Win32 streaming access.
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE WITH OVERRIDE;
GO
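SQL Server 2008 supports three levels of FILESTREAM access:
0: FILESTREAM support is disabled for the instance.
1: FILESTREAM is enabled for Transact-SQL access.
2: FILESTREAM is enabled for Transact-SQL and Win32 streaming access.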
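To confirm which access level an instance is actually using, the configured and effective values can be queried. This is a small sketch that uses the sys.configurations catalog view and the FILESTREAM-related SERVERPROPERTY values; the column aliases are illustrative only.

```sql
-- Configured value and running value of the instance-level option.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'filestream access level';

-- Instance properties: the level set through sp_configure, the level in effect
-- (which also depends on the Windows-level setting), and the Windows share name.
SELECT SERVERPROPERTY('FilestreamConfiguredLevel') AS ConfiguredLevel,
       SERVERPROPERTY('FilestreamEffectiveLevel')  AS EffectiveLevel,
       SERVERPROPERTY('FilestreamShareName')       AS ShareName;
```

If the effective level is lower than the configured level, the Windows-level setting in SQL Server Configuration Manager is a likely place to look.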
After FILESTREAM is enabled, the SQL Server instance is ready to hold databases that contain
FILESTREAM filegroups. Because FILESTREAM uses this special type of filegroup, you must
specify the CONTAINS FILESTREAM clause for at least one filegroup when you create the
FILESTREAM-enabled database. Note that it is not possible to point multiple databases to a
common directory for their FILESTREAM requirements.
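The following sketch shows a database created with one FILESTREAM filegroup. The database name, logical file names, and all paths are hypothetical. For the FILESTREAM data container, the parent folder (E:\FileStreamData here) must already exist, while the final folder must not exist, because SQL Server creates it.

```sql
-- Hypothetical names and paths; adjust to the storage layout of the server.
CREATE DATABASE FileStreamDemo
ON PRIMARY
    (NAME = FileStreamDemo_data,
     FILENAME = 'D:\SQLData\FileStreamDemo.mdf'),
-- The CONTAINS FILESTREAM clause marks this filegroup as a FILESTREAM filegroup.
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM
    (NAME = FileStreamDemo_fs,
     -- FILENAME is a directory (the data container), not a file.
     FILENAME = 'E:\FileStreamData\FileStreamDemo_fs')
LOG ON
    (NAME = FileStreamDemo_log,
     FILENAME = 'D:\SQLLogs\FileStreamDemo.ldf');
```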
Note: FileTable is enabled through the FILESTREAM configuration settings. In the database,
the process is different from that for FILESTREAM because FileTable is enabled through
database options, not through the definition of a filegroup.
Failover Clusters
FILESTREAM is fully supported with failover clustering. To use FILESTREAM in a failover
cluster, all nodes in the cluster must have FILESTREAM enabled at the Windows level, and the
FILESTREAM data container(s) must be placed on shared storage so the data is available to all
nodes. If you plan to enable I/O streaming, be sure to use the same Windows share name on all
nodes; you should see the share name appear as a cluster resource.
Note: For step-by-step guidance, see the MSDN article How to: Set Up FILESTREAM on a
Failover Cluster (http://msdn.microsoft.com/en-us/library/cc645886.aspx).
Turn off short file names on FILESTREAM computer systems. Short file names take
significantly longer to create. To disable short file names, use the Windows fsutil utility
(see Fsutil [http://technet.microsoft.com/en-us/library/cc753059(WS.10).aspx]).
Use 64 KB NTFS file system clusters. Compressed volumes must be set to 4 KB NTFS
clusters.
Set up and tune the RAID level for fault tolerance and the performance that is required
by an application.
A FILESTREAM filegroup has only one data container. This is unlike row data filegroups,
which can have multiple files (equivalent to data containers) per filegroup. Note that the
next release of SQL Server will remove the one data container limitation.
When you are using failover clustering, the FILESTREAM filegroups must be on shared
disk resources.
RAID levels differ in terms of read/write performance, resilience to failure, and cost.
RAID 5 is ideal for read-heavy solutions and is relatively low cost. It can handle the failure
of only one drive in the RAID array, however, and it may be unsuitable for write-heavy
workloads.
RAID 10 provides excellent read and write performance and is preferred for update-heavy
solutions. It can handle multiple drive failures (depending on the degree of
mirroring involved), but it is more expensive, given that at least 50 percent of the drives
in the RAID array are redundant.
RAID level choice might be different for the volume on which each user database is stored, and
it might differ between the volume storing the data files and that storing the log files for a single
database.
If the workload will involve high-performance streaming of FILESTREAM data, you may choose
to have the FILESTREAM data container volume use the RAID level that gives the highest read
performance. However, this approach might not provide a high degree of resilience against
failures. On the other hand, you might choose to use the same RAID level as for the other
volumes that store the data for the database, but this approach might not provide the requisite
performance levels that the workload demands.
It is therefore important to make a carefully considered choice for RAID level for the
FILESTREAM data container volumes after considering the tradeoffs.
Note: For more information, see Physical Database Storage Design
(http://technet.microsoft.com/en-us/library/cc966414.aspx) or RAID Levels and SQL Server
(http://technet.microsoft.com/en-us/library/ms190764.aspx).
Security Considerations
The recommended, default configuration only allows access to the FILESTREAM files through
SQL Server, either through Transact-SQL or, if using Win32, through a token. Microsoft recommends
that no account other than the account running the SQL Server service be granted NTFS
permissions to the FILESTREAM data containers.
It is possible to access the files without going through SQL Server (not recommended), but this
requires an explicit action by the system administrator to modify the security settings applied to
the data container.
FILESTREAM takes advantage of the existing authentication and authorization functionality of
SQL Server for controlled access to the data values; these permissions can be applied at the
column level.
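As an illustration of column-level permissions, the sketch below lets a role read the descriptive columns of a table while denying it access to the FILESTREAM column. The role name and the table dbo.DocumentStore (with columns DocumentID, DocumentName, and a FILESTREAM column Content) are hypothetical.

```sql
-- Hypothetical role for users who may browse the catalog but not read the BLOBs.
CREATE ROLE DocumentCatalogReaders;

-- Allow reading only the descriptive row data columns.
GRANT SELECT ON dbo.DocumentStore (DocumentID, DocumentName) TO DocumentCatalogReaders;

-- Explicitly deny reading the FILESTREAM column.
DENY SELECT ON dbo.DocumentStore (Content) TO DocumentCatalogReaders;
```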
Note: When the RBS provider is FILESTREAM, then FILESTREAM applies SQL Server
security, but all other providers are unaware of SQL Server security.
You can use DBCC CHECKDB to help identify orphans, whether they exist in the table or in the
file system. Note that DBCC CHECKDB does not reveal cases of file contents that have been
tampered with, however. For more information, see DBCC CHECKDB (Transact-SQL)
(http://msdn.microsoft.com/en-us/library/ms176064.aspx).
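A consistency check of this kind might be run as follows. The database name FileStreamDemo is hypothetical; the two options suppress informational messages and show every reported error.

```sql
-- Hypothetical database name. Checks logical and physical integrity of the database.
DBCC CHECKDB (N'FileStreamDemo') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```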
Encryption
It is possible to store FILESTREAM data on Encrypting File System (EFS) volumes; however,
you should pay careful attention to the nuances of an EFS volume. For more information, see
The Encrypting File System (http://technet.microsoft.com/en-us/library/cc700811.aspx).
Note that during a SQL Server backup of FILESTREAM data, the data is stored decrypted. A
restore operation to a normal volume therefore will result in decrypted values.
Management Considerations
Following are some considerations for maintenance and management of FILESTREAM data.
Maintenance Tasks
Because FILESTREAM is implemented as a varbinary(max) column and integrated directly into
the Database Engine, most SQL Server management tools and functions work without
modification for FILESTREAM data.
Index Management
Indexes can become fragmented over time and might need to be rebuilt. Rebuilding indexes
only addresses the Database Engine pages and does not impact the FILESTREAM data. Note
that index rebuilds are less resource intensive and can be completed in a much shorter time
than when BLOBs are stored in the database.
Note: The varbinary(max) data type prevents online index operations. This is a good reason to keep
the data table narrow.
When files on a volume grow, they can also become fragmented, meaning that the collection of
clusters allocated to the file is not contiguous. When the file is read sequentially, the underlying
disk heads need to read all the clusters in sequence, which could mean they have to read
different portions of the disk. Even if files do not grow once they have been created, they could
become fragmented if they were created on a volume where the available free space is not in a
single contiguous chunk.
Fragmentation reduces the sequential read performance; this is similar to index fragmentation
within a database, which can slow down query range scan performance. It is therefore essential
that the volume hosting the FILESTREAM objects be periodically defragmented. Also, if the
volume that will be used to host the FILESTREAM data container was previously used, or if it
still contains other data, the fragmentation level should be checked and fixed if necessary.
Filegroup Management
FILESTREAM filegroups inherit characteristics of row data filegroups, except that only one
data container exists per filegroup.
Note: The next major release of SQL Server will allow multiple data containers for
FILESTREAM filegroups.
Many solutions do not take advantage of all of the useful capabilities of filegroups.
FILESTREAM filegroups tend to be extremely large; therefore, Read-Only filegroups, Partial
Restore, and Filegroup Backup operations are often very useful, though potentially complex.
database transaction log and the FILESTREAM transaction log lets the FILESTREAM and
structured data be recovered correctly.
A significant advantage of using FILESTREAM instead of storing the BLOB in the database is
the reduction in the size of the transaction log. Under the full recovery model, inserts into a
non-FILESTREAM varbinary(max) column are fully logged; if the column is FILESTREAM enabled,
the transaction log does not contain the BLOB.
The directory that holds the BLOB files has a folder called $FSLOG, which acts like a
transaction log. Unlike the transaction log of the row data files, however, a copy of the
FILESTREAM file is not stored. The algorithms of this transaction log ensure that the space
consumption is minimal (almost negligible).
The following operations can cause growth similar to that of a row data transaction log:
With an INSERT operation, the new file is created and, in simple terms, very small
(approximately 12 bytes) text files are created to track which files are new.
An UPDATE operation is not performed in-place. The original file is therefore retained,
a new file is created with the updated values, and small files are created in the $FSLOG
directory.
Frequently updated environments with updates that are equal to or larger than the original value
can consume disk space very quickly. The files are cleared from the file system when they
qualify to be removed by the garbage collector.
Garbage Collector
Files that are no longer needed are removed by a garbage collection process. This process is
automatic, unlike that in Windows SharePoint Services, where garbage collection must be
implemented manually on the external BLOB store. FILESTREAM garbage collection is a
background task that is triggered by the database checkpoint process. A checkpoint is
automatically run when enough transaction log has been generated. For most
implementations, an administrator simply needs to know that the process exists, and that it is
the mechanism to remove deleted FILESTREAM files.
Note the following considerations:
The garbage collector kicks in approximately every five minutes and requires the system
to be idle.
The garbage collector first flags files as To be Deleted, and then removes them.
CHECKPOINT is the only way to manually initiate the garbage collector, but this does
not guarantee immediate garbage collector response.
To manage space consumed by the Deferred Update behavior of FILESTREAM data, you
should make sure that a CHECKPOINT and backup have been run, followed about 10 minutes
later with another backup.
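The sequence described above might look like the following sketch. It assumes a database named FileStreamDemo that uses the full recovery model, and it interprets "backup" as a transaction log backup; both are assumptions, as are the backup paths.

```sql
-- Hypothetical database name and backup paths; assumes the full recovery model.
USE FileStreamDemo;
GO

-- Request a checkpoint, which is what triggers the garbage collector.
CHECKPOINT;

-- First backup (assumed here to be a log backup).
BACKUP LOG FileStreamDemo TO DISK = N'X:\Backups\FileStreamDemo_log_1.trn';

-- Wait about 10 minutes, as suggested in the text.
WAITFOR DELAY '00:10:00';

-- Second backup.
BACKUP LOG FileStreamDemo TO DISK = N'X:\Backups\FileStreamDemo_log_2.trn';
```

Even after this sequence, the garbage collector is not guaranteed to respond immediately.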
Log Shipping
Log shipping supports FILESTREAM. Both the primary and secondary servers must be running
SQL Server 2008 or a later version and must have FILESTREAM enabled.
Database Mirroring
As of SQL Server 2008 R2, Database Mirroring is not supported for databases that use
FILESTREAM. However, the upcoming release of SQL Server, code-named Denali, will provide
this functionality.
Database Snapshot
It is possible to create a snapshot of a database that contains FILESTREAM filegroups;
however, the actual FILESTREAM filegroups cannot be part of the definition. In other words, the
FILESTREAM data cannot participate in the database snapshot. Understanding this can be
useful in migration scenarios, where a snapshot can provide a fast rollback option.
AlwaysON (HADRON)
The current Community Technology Preview (CTP) release (SQL Server code-named Denali)
will support FILESTREAM in the High Availability Disaster Recovery - AlwaysON (HADRON)
configuration.
The value of this strategy becomes clear when you are managing large sets of data and
accommodating partitioning strategies that align with filegroup strategies. For example, when
switching data from one table into another, indexes and foreign key relationships must be identical.
Following is an example of a table creation that aligns with a partitioning strategy:
CREATE TABLE [dbo].[FILESTREAMTable] (
[BlobID] uniqueidentifier
,[Blob]
After you store data in a FILESTREAM column, you can access the files by using Transact-SQL
transactions or by using Win32 APIs.
Note: Unlike FILESTREAM, FileTable does allow for in-place updating of BLOB values.
Transact-SQL Access
Transact-SQL access is not the most efficient access to FILESTREAM data. However,
Transact-SQL access provides the ability to introduce FILESTREAM into a solution without
requiring the application to be aware of this new storage format.
When you use Transact-SQL, you can insert, update, and delete FILESTREAM data. Following
are some considerations for using Transact-SQL:
A large amount of data is more efficiently streamed into a file by using Win32 interfaces.
When a FILESTREAM field is set to NULL, the BLOB data associated with the field is
deleted.
Deleting a row finds the respective individual file and marks it as ready to be deleted.
The disk space is not freed until the garbage collector removes the file.
A truncate operation marks the table's directory as ready to be deleted and creates a
new directory.
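The operations above can be sketched in Transact-SQL as follows. The table dbo.DocumentStore, with a uniqueidentifier column DocumentID, a name column DocumentName, and a varbinary(max) FILESTREAM column Content, is hypothetical, as is the sample data.

```sql
-- Insert: the value is converted to varbinary(max) and stored as a file.
INSERT INTO dbo.DocumentStore (DocumentID, DocumentName, Content)
VALUES (NEWID(), N'readme.txt', CAST('Hello, FILESTREAM' AS varbinary(max)));

-- Update: not performed in place; a new file is written and the old one is retained
-- until the garbage collector removes it.
UPDATE dbo.DocumentStore
SET Content = CAST('Updated contents' AS varbinary(max))
WHERE DocumentName = N'readme.txt';

-- Setting the column to NULL deletes the BLOB data associated with the field.
UPDATE dbo.DocumentStore
SET Content = NULL
WHERE DocumentName = N'readme.txt';

-- Deleting the row marks any associated file as ready to be deleted.
DELETE FROM dbo.DocumentStore
WHERE DocumentName = N'readme.txt';
```

Small values like these are used only to keep the sketch short; large values are better streamed through the Win32 interfaces.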
Statement Model
The FILESTREAM file system access models a Transact-SQL statement by using file open and
close. The statement starts when a file handle is opened and ends when the handle is closed.
For example, when a write handle is closed, any possible AFTER trigger that is registered on
the table operates as if an UPDATE statement is completed.
Storage Namespace
In FILESTREAM, the Database Engine controls the BLOB physical file system namespace. A
new intrinsic function, PathName, provides the logical UNC path of the BLOB that corresponds
to each FILESTREAM cell in the table. The application uses this logical path to obtain the
Win32 handle and operate on the BLOB data by using regular Win32 file system interfaces. The
function returns NULL if the value of the FILESTREAM column is NULL.
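A sketch of how an application might obtain the logical path is shown below. The table dbo.DocumentStore and its FILESTREAM column Content are hypothetical. The query also calls GET_FILESTREAM_TRANSACTION_CONTEXT(), a built-in function not named in this paper, which returns the token that the client presents when it opens the Win32 handle; it returns NULL outside a transaction, which is why the query runs inside one.

```sql
BEGIN TRANSACTION;

SELECT
    -- Logical UNC path of the BLOB; NULL when the Content value is NULL.
    Content.PathName()                   AS BlobPath,
    -- Token that ties the file system access to this transaction.
    GET_FILESTREAM_TRANSACTION_CONTEXT() AS TransactionContext
FROM dbo.DocumentStore
WHERE DocumentName = N'readme.txt';

-- The client application opens the Win32 handle with the path and token,
-- reads or writes the BLOB, and closes the handle before the commit.

COMMIT TRANSACTION;
```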
Transactional Durability
With FILESTREAM, upon transaction commit, the Database Engine ensures transaction
durability for FILESTREAM BLOB data that is modified from the file system streaming access.
Isolation Semantics
The isolation semantics are governed by Database Engine transaction isolation levels. Only the
read-committed isolation level is supported for file system access. Repeatable read operations,
and also serializable and snapshot isolations, are supported when the FILESTREAM data is
accessed by using Transact-SQL. Dirty read is not supported.
The file system access open operations do not wait for locks. Instead, the open operations fail
immediately if they cannot access the data because of transaction isolation. The streaming API
calls fail with ERROR_SHARING_VIOLATION if the open operation cannot continue because of
isolation violation.
Partial Updates
To allow for partial updates to be made, the application can issue a device file system control
(FSCTL), FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT, to fetch the old content
into the file that the opened handle references. This will trigger a server-side old content copy.
For better application performance and to avoid running into potential time-outs when you are
working with very large files, Microsoft recommends that you use asynchronous I/O.
If the FSCTL is issued after the handle has been written to, the last write operation will persist.
For more information, see FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT
(http://technet.microsoft.com/en-us/library/cc627407.aspx).
write operations will always be sent to the server. The data can be cached on the server side.
Microsoft recommends that applications that are running on remote client computers
consolidate small write operations to make fewer write operations using larger data size.
Creating memory mapped views (memory mapped I/O) by using a FILESTREAM handle is not
supported. If memory mapping is used for FILESTREAM data, the Database Engine cannot
guarantee consistency and durability of the data or the integrity of the database.
Migration Considerations
Prior to SQL Server 2008, solutions stored BLOB data either within the database as varbinary
objects or externally in a file system. You can use several techniques to migrate from these
options to a FILESTREAM format. While the details of these migration strategies are outside the
scope of this white paper, the following sections describe some migration considerations.
Several factors influence which migration technique is most appropriate:
The application's ability and business tolerance to handle temporarily unavailable BLOB
values.
A simple solution for the migration of data currently residing in a database is to use an INSERT
INTO statement to copy it into the FILESTREAM-enabled table. If the data resides in a file system,
a process or application is required to read the values and insert them into the
FILESTREAM-enabled database.
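For data that already resides in a database, the simple approach could look like the sketch below. Both tables are hypothetical: dbo.LegacyDocuments holds the existing varbinary(max) values in LegacyContent, and dbo.DocumentStore is the FILESTREAM-enabled target with a FILESTREAM column named Content.

```sql
-- Copy existing in-database BLOBs into the FILESTREAM-enabled table.
-- Each value passes through the Database Engine, which writes it to the file system.
INSERT INTO dbo.DocumentStore (DocumentID, DocumentName, Content)
SELECT NEWID(), LegacyName, LegacyContent
FROM dbo.LegacyDocuments
WHERE LegacyContent IS NOT NULL;
```

A single statement like this runs as one transaction, so for large tables it may be preferable to copy the rows in smaller batches.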
Note: It is not possible to simply point FILESTREAM to existing locations. In all cases, a BLOB
must pass through the Database Engine, because internal structures maintain references
between the table row and the location on the file system.
The simple solution is not adequate if the source data is many terabytes in size, or if the
application (or business) cannot tolerate the duration of transferring the data. Also, a
high-performing system generally stores BLOBs smaller than 1 MB in the database and larger
BLOBs outside of the database. You might also want the application to create and store
thumbnails as part of the migration process. These all make the migration more complex, so
carefully consider migration during the planning phase for your solutions.
Note: FileTable eases migration because you can copy and paste; however, it is still important
to carefully consider your migration strategy.
Conclusion
There are many factors to consider when deciding which solution is best for storing BLOBs. This
white paper serves as a companion to the information about FILESTREAM found in many
sources. It delves deeply into selected topics that IT professionals should consider when
implementing a solution that uses FILESTREAM. The links that follow provide further
information.