You are on page 1of 47

Fast Track Data Warehouse 3.

0 Reference Guide
SQL Server Technical Article
Version 3.0.8, Final 2/1/2011
Writers: Eric Kraemer, Mike Bassett
Technical Reviewers: Eric Hanson, Stuart Ozer, Umair Waheed, Mike Ruthruff, Thomas Kejser, Ashit Gosalia, Tamer Farag,
Nicholas Dritsas, Anjan Das, Alexei Khalyako, Chris Mitchell

Version 2.0, Published November 2009
Writers: Dave Salch, Eric Kraemer, Umair Waheed, Paul Dyke
Technical Reviewers: J ose Blakeley, Stuart Ozer, Eric Hanson, Mark Theissen, Mike Ruthruff

Published: 4 February 2011
Updated: 8 January 2011
Applies to: SQL Server 2008 R2 Enterprise

Summary: This paper defines a reference configuration model (known as Fast Track Data
Warehouse) using an I/O balanced approach to implementing a symmetric multiprocessor
(SMP)-based SQL Server data warehouse with proven performance and scalability expectations
for data warehouse workloads. The goal of a Fast Track Data Warehouse reference
configuration is to achieve a cost-effective balance between SQL Server data processing
capability and realized component hardware throughput.


Page 2


Copyright
This document is provided as-is. Information and views expressed in this document, including
URL and other Internet Web site references, may change without notice. You bear the risk of
using it.
Some examples depicted herein are provided for illustration only and are fictitious. No real
association or connection is intended or should be inferred.
This document does not provide you with any legal rights to any intellectual property in any
Microsoft product. You may copy and use this document for your internal, reference purposes.
2011 Microsoft. All rights reserved.


Page 3


Contents
FT 3.0 Highlights ........................................................................................................................ 4
Introduction ................................................................................................................................ 4
Audience ................................................................................................................................ 4
Fast Track Data Warehouse ...................................................................................................... 4
Methodology .............................................................................................................................. 5
FTDW Workload ........................................................................................................................ 7
Choosing a FTDW Reference Configuration .............................................................................10
Option 1: Basic Evaluation .....................................................................................................11
Option 2: Full Evaluation .......................................................................................................13
Option 3: User-Defined Reference Architectures ...................................................................14
FTDW Standard Configuration ..................................................................................................15
Hardware Component Architecture ........................................................................................15
Application Configuration .......................................................................................................18
SQL Server Best Practices for FTDW .......................................................................................22
Data Architecture ...................................................................................................................22
Indexing .................................................................................................................................23
Managing Data Fragmentation ..............................................................................................25
Loading Data .........................................................................................................................27
Benchmarking and Validation ....................................................................................................31
Fast Track Data Warehouse Reference Configurations ............................................................38
Conclusion ................................................................................................................................38
Appendix ...................................................................................................................................39
FTDW System Sizing Tool ........................................................................................................39
Validating a Fast Track Reference Architecture ........................................................................39
Synthetic I/O Testing .............................................................................................................39
Workload Testing ..................................................................................................................43


Page 4


FT 3.0 Highlights
The following table provides a list of notable changes between the FTDW 2.0 and FTDW 3.0
Reference Guides.
Description Note Page
FTDW 3.0
Architecture
Basic component architecture for FT
3.0 based systems.
FT 3.0 Component Architecture
New Memory
Guidelines
Minimum and maximum tested
memory configurations by server
socket count.
Memory
Additional Startup
Options
Notes for T-834 and setting for Lock
Pages in Memory.
Startup Options
Storage Configuration RAID1+0 now standard (RAID1 was
used in FT 2.0).
Storage System
Evaluating
Fragmentation
Query provided for evaluating logical
fragmentation.
Fragmentation
Loading Data Additional options for CI table loads. Loading Data
MCR Additional detail and explanation of
FTDW MCR Rating.
What is MCR?

Introduction
This document defines the component architecture and methodology for the SQL Server Fast
Track Data Warehouse (FTDW) program. The result of this approach is the validation of a
minimal Microsoft SQL Server data management software configuration, including software and
hardware, required to achieve and maintain a baseline level of out of box scalable
performance when deploying a SQL Server database for many data warehousing workloads.
Audience
The target audience for this document consists of IT planners, architects, DBAs, CIOs, CTOs,
and business intelligence (BI) users with an interest in options for their BI applications and in the
factors that affect those options.
Fast Track Data Warehouse
The SQL Server Fast Track Data Warehouse initiative provides a basic methodology and
concrete examples for the deployment of balanced hardware and database configuration for a
data warehousing workload.
Balance is measured across the key components of a SQL Server installation; storage, server,
application settings, and configuration settings for each component are evaluated. The goal is to
achieve an efficient out-of-the-box balance between SQL Server data processing capability and

Page 5


aggregate hardware I/O throughput. Ideally, minimum storage is purchased to satisfy customer
storage requirements and provide sufficient disk I/O for SQL Server to achieve a benchmarked
maximum data processing rate.
Fast Track
The Fast Track designation signifies a component hardware configuration that conforms to the
principles of the FTDW reference architecture (RA). The reference architecture is defined by a
workload and a core set of configuration, validation, and database best practices guidelines.
The following are key principles of Fast Track DW reference architectures:
Detailed and validated hardware component specifications
Validated methodology for database and hardware component evaluation
Component architecture balance between database capability and hardware bandwidth
Value Proposition
The following principles create the foundation of the FTDW value proposition:
Predetermined balance across key system components. This minimizes the risk of
overspending for CPU or storage resources that will never be realized at the application
level.
Predictable out-of-the-box performance. Fast Track configurations are built to
capacity that already matches the capabilities of the SQL Server application for a
selected server and workload.
Workload-centric. Rather than being a one-size-fits-all approach to database
configuration, the FTDW approach is aligned specifically with a data warehouse use
case.
Methodology
Holistic Component Architecture
SQL Server FTDW reference architectures provide a practical framework for balancing the
complex relationships between key components of database system architecture. Referred to
generically as a stack, the component architecture is illustrated in Figure 1.

Page 6



Figure 1: Example Fast Track database component architecture
Each component of the stack is a link in a chain of operations necessary to process data in SQL
Server. Evaluating the stack as an integrated system enables benchmarking that establishes
real bandwidth for each component. This ensures that individual components provide sufficient
throughput to match the capabilities of the SQL Server application for the prescribed stack.
Workload Optimized
Different database application workloads can require very different component architectures to
achieve optimal resource balance. A classic example of this can be found in the contrast
between discrete lookup-based OLTP workloads and scan-intensive analytical data
warehousing. OLTP use cases are heavily indexed to support low latency retrieval of small
numbers of rows from relatively large data sets. These types of database operations induce
significant disk head movement and generate classic random I/O scan patterns. Analytical use
cases, such as data warehousing, can involve much larger data requests and benefit greatly
from the increased total throughput potential of sequential disk scans.
For these contrasting use cases, the implications for a balanced component stack are
significant. Average, per-disk random I/O scan rates for modern SAS disk drives can be a factor
of 10 times slower when compared to sequential scan rates for the same hardware. With Fast
Track data warehousing workloads an emphasis is placed on achieving consistently high I/O
scan rates (MB/s) rather than the more traditional focus on operations per second (IOPS).
The challenge of very different workloads is addressed by clearly defining the attributes of
customer workloads. SQL Server Fast Track workloads comprise a qualitative list of attributes
that uniquely define a common database application use case. In addition, each workload is
represented by quantitative measures including standard benchmark queries. Workload-specific
benchmarking is used to validate database configuration, best practices, and component
hardware recommendations.

Page 7


Validated SQL Server Fast Track Reference Configurations
All published Fast Track reference configurations are validated as conforming to the set of
principles and guidelines provided in this reference guide. Examples of this process can be
found in later sections of this document.
Summary
The SQL Server FTDW specification described in this reference guide is workload-centric and
component balanced. This approach acknowledges that one-size-fits-all provisioning can be
inefficient and costly for many database use cases. Increasingly complex business
requirements coupled with rapidly scaling data volumes demand a more realistic approach. By
presenting a combination of prescriptive reference architectures, benchmarking of hardware and
software components, and clearly targeted workloads, this document provides a practical
approach to achieving balanced component architectures.
FTDW Workload
Data Warehouse Workload Patterns
Typically questions asked of data warehouses require access to large volumes of data. Data
warehouses need to support a broad range of queries from a wide-ranging audience (for
example: finance, marketing, operations, and research teams).
In order to overcome the limitations of traditional data warehouse systems, organizations have
resorted to using traditional RDBMS optimization techniques such as building indexes, pre-
aggregating data, and limiting access to lower levels of data. The maintenance overheads
associated with these approaches can often overwhelm even generous batch windows. As a
data warehouse becomes more mature and the audience grows, supporting these use-case
specific optimizations becomes even more challenging, particularly in the case of late-arriving
data or data corrections.
A common solution to this challenge is to simply add drives; it is not uncommon to see hundreds
of disks supporting a relatively small data warehouse in an attempt to overcome the I/O
performance limitations of mapping a seek-based I/O infrastructure to a scan based workload.
This is frequently seen in large shared SAN environments that are traditionally seek optimized.
Many storage I/O reference patterns and techniques that encourage random I/O access,
introducing disk latency and reducing the overall storage subsystem throughput for a data
warehouse workload that is scan intensive.
Fast Track Data Warehouse is a different way of optimizing for data warehouse workloads. By
aligning database files and configuration with efficient disk scan (rather than seek) access,
performance achieved from individual disk can be many factors higher. The resulting per-disk
performance increase reduces the number of disks needed to generate sufficient I/O throughput
to satisfy the ability of SQL Server to process data for a given workload. Furthermore, some
index-based optimization techniques used to improve disk seek access can be avoided.

Page 8


When considering workloads for Fast Track Data Warehouse based systems, the data
warehouse architect should consider not only whether the current data warehouse design fits
the FTDW workload, but whether the data warehouse could benefit if the FTDW best practices
outlined in this document are adopted.
Workload Evaluation
FTDW configurations are positioned specifically for data warehousing scenarios. The data
warehousing workload has the following core principles relating to SQL Server database
operations; when applying FTDW principles or reference configurations, it is important to
evaluate the affinity of the target workload to these high-level principles.
Scan-Intensive
Queries in a data warehouse workload typically scan a large number of rows. For this reason,
disk scan performance becomes an increasing priority in contrast to transactional workloads
that stress disk seek time. The FTDW reference architecture optimizes hardware and database
software components with disk scan performance as the key priority. This results in more
efficient sequential disk reads and a correlated increase in disk I/O throughput per drive.
Nonvolatile
After data is written, it is rarely changed. DML operations, such as SQL update, that move
pages associated with the same database table out of contiguous alignment should be carefully
managed. Workloads that commonly introduce such volatility may not be well aligned to FTDW.
Where volatility does occur, periodic maintenance to minimize fragmentation is recommended.
Index-Light
Adding nonclustered indexes generally adds performance to discrete lookups of one or few
records. However, if nonclustered indexes are applied to tables in which large numbers of rows
are to be retrieved, additional fragmentation and increased random I/O disk scan patterns can
degrade overall system performance. In addition, maintaining indexes introduces a significant
data management overhead, which can generate challenges in meeting service-level
agreements (SLAs) and load windows.
Sequential scan rates can be many factors higher (10 times or more) than random access rates.
In addition, indexes deliver reduced value when large data scans represent the primary
workload. For these reasons, the net performance benefit to an environment tuned for large
analytical scan patterns can become negative.
FTDW methodology prescribes database optimization techniques that align with the
characteristics of the targeted workload. Clustered index and range partitioning are examples of
data structures that support efficient scan based disk I/O.
Partition-Aligned
A common trait of Fast Track DW workloads is the ability to take advantage of SQL Server
partitioning. Partitioning can simplify data lifecycle management and assist in minimizing

Page 9


fragmentation over time. In addition, query patterns for large scans can take advantage of range
partition qualification and significantly reduce the size of table scans without sacrificing
fragmentation or disk I/O throughput.
Additional Considerations
The following additional considerations should be taken into account during the evaluation of a
database workload:
The implementation and management of an index-light database optimization strategy is
a fundamental requirement for FTDW workloads.
It is assumed that minimal data fragmentation will be maintained within the data
warehouse. This implies the following:
o Expanding the server by adding storage requires that all performance-sensitive
tables be repopulated in a manner consistent with guidelines provided in this
document.
o Implementing highly volatile data structures, such as tables with regular row-level
update activity, may also require frequent maintenance (such as defragmentation
or index rebuilds) to reduce fragmentation.
o Loading of cluster index tables with batches of cluster key IDs that overlap
existing ranges is a frequent source of fragmentation. This should be carefully
monitored and managed in accordance with best practices provided in this
reference guide.
Data warehousing can mean many things to different audiences. Care should be taken
to evaluate customer requirements against FTDW workload attributes.
Qualitative Data Warehouse Workload Attributes
The FTDW workload can be defined through the properties of the following subject areas
related to database operations:
User requirements and access pattern
Data model
Data architecture
Database optimization
The following table summarizes data warehouse workload attributes; contrast is provided
through comparison to an OLTP or operational data store (ODS) workload.

Attribute Workload Affinity
Data Warehouse OLTP/ODS

Page 10



Use Case
Description


Read-mostly (90%-10%)
Updates generally limited to data quality
requirements
High-volume bulk inserts
Medium to low overall query
concurrency; peak concurrent query
request ranging from 10-30
Concurrent query throughput
characterized by analysis and reporting
needs
Large range scans and/or aggregations
Complex queries (filter, join, group-by,
aggregation)

Balanced read-update ratio (60%-40%)
Concurrent query throughput
characterized by operational needs
Fine-grained inserts and updates
High transaction throughput (for example,
10s K/sec)
Medium-to-high overall user concurrency.
Peak concurrent query request ranging
from 50-100 or more
Usually very short transactions (for
example, discrete minimal row lookups)

Data Model


Highly normalized centralized data
warehouse model
Denormalization in support of reporting
requirements often serviced from BI
applications such as SQL Server
Analysis Services
Dimensional data structures hosted on
the database with relatively low
concurrency, high volume analytical
requests
Large range scans are common
Ad-hoc analytical use cases

Highly normalized operational data model
Frequent denormalization for decision
support; high concurrency, low latency
discrete lookups
Historical retention of data is limited
Denormalized data models extracted from
other source systems in support of
operational event decision making

Data Architecture


Significant use of heap table structures
Large partitioned tables with clustered
indexes supporting range-restricted
scans
Very large fact tables (for example,
hundreds of gigabytes to multiple
terabytes)
Very large data sizes (for example,
hundreds of terabytes to a petabyte)

Minimal use of heap table structures
Clustered index table structures that
support detailed record lookups (1 to few
rows per request)
Smaller fact tables (for example, less
than100 GB)
Relatively small data sizes (for example, a
few terabytes)

Database
Optimization


Minimal use of secondary indexes
(described earlier as index-light)
Partitioning is common

Heavy utilization of secondary index
optimization
Choosing a FTDW Reference Configuration
There are three general approaches to using the FTDW methodology described within this
document. The first two are specific to the use of published, conforming Fast Track reference
architectures for data warehousing. These approaches enable the selection of predesigned
systems published as part of the FTDW program. The third approach treats core Fast Track
methodology as a guideline for the creation of a user-defined data warehousing system. This

Page 11


final approach implies detailed workload profiling and system benchmarking in advance of
purchase or deployment. It requires a high degree of technical knowledge.
Option Pros Cons
1. Basic Evaluation Very fast system set-up and
procurement (days to weeks)
Minimize cost of design and
evaluation
Lower infrastructure skill
requirements
Possibility of over-specified storage
or under-specified CPU
2. Full Evaluation Predefined reference architecture
tailored to expected workload
Potential for cost-saving on hardware
Increased confidence in solution
Evaluation takes effort and time
(weeks to months)
Requires detailed understanding of
target workload
3. User-defined
Reference
Architecture
Potential to reuse existing hardware
Potential to incorporate latest
hardware
System highly tailored for your use-
case
Process takes several months
Requires significant infrastructure
expertise
Requires significant SQL Server
expertise

Option 1: Basic Evaluation
The basic evaluation process describes a scenario in which a full platform evaluation (proof of
concept) will not be performed. Instead, the customer has already targeted an FTDW reference
configuration or has alternative methods to determine server and CPU requirements.
Step 1: Evaluate the Customer Use Case
Fast Track Data Warehouse reference configurations are not one-size-fits-all configurations of
software and hardware. Rather, they are configured for the characteristics of a data
warehousing workload.
Workload
FTDW workload definitions provide two key points for use case evaluation. The first is a set of
core principles that define key elements of the workload as it relates to SQL Server
performance. These principles should be measured carefully against a given use case because
conflicts may indicate that a target workload is not appropriate for an FTDW reference
architecture.
The second component to a workload is a general description of the targeted use case. This
provides a useful high-level description of the use case in addition to providing a reasonable
starting point for evaluating workload fit.
Workload Evaluation
The following list outlines a basic process for customer workload evaluation. This is a qualitative
assessment and should be considered a guideline:

Page 12


1. Define the targeted workload requirements. Compare and contrast to FTDW workload
attributes. For more information, see the FTDW Workload Evaluation section of this
document.
2. Evaluate FTDW best practices. Practices relating to database management and data
architecture and system optimization should be evaluated against the target use case
and operational environment.
Making a Decision
The goal of this workload assessment is to ensure that you make a fully informed decision when
you choose a published FTDW reference configuration. In reality most data warehousing
scenarios represent a mixture of conforming and conflicting attributes relative to the FTDW
workload. High priority workload attributes with a strong affinity for Fast Track reference
configurations are listed here; primary customer use cases that directly conflict with any of these
attributes should be carefully evaluated because they may render the methodology invalid for
the use case.
Workload
The following workload attributes are high priority:
Critical workloads feature scan-intensive data access patterns (that is, those that can
benefit from sequential data placement). In general, individual query requests involve
reading tens of thousands to millions (or higher) of rows.
High data capacity, low concurrency relative to common OLTP workloads.
Low data volatility. Frequent update/delete DML activity should be limited to a small
percentage of the overall data warehouse footprint.
Database Management
This includes database administration, data architecture (data model and table structure), and
data integration practices:
Index-light, partitioned data architecture.
Careful management of database fragmentation, through suitable loading and ETL
strategies and periodic maintenance.
Predictable data growth requirements. FTDW systems are prebuilt to fully balanced
capacity. Storage expansion requires data migration.
Step 2: Choose a Published FTDW Reference Configuration
A customer may have a server in mind when performing a simple evaluation based on budget or
experience. Alternatively, the customer may already have a good idea of workload capacity or
an existing system on which to base analysis of bandwidth requirements. In any case, a full
platform evaluation is not performed in an FTDW simple evaluation. Instead, a conforming
FTDW configuration is selected that matches the estimated customer requirements.

Page 13


Option 2: Full Evaluation
Fast Track-conforming reference architectures provide hardware component configurations
paired with defined customer workloads. The following methodology allows for a streamlined
approach to choosing a database component architecture that ensures better out-of-the-box
balance among use case requirements, performance, and scalability.
Process Overview
The following process flow summarizes the FTDW Full Evaluation selection process:
1. Evaluate Fast Track workload attributes against the target usage scenario.
2. Identify server and/or bandwidth requirements for the customer use case. A published
FTDW reference configuration must be chosen to begin an evaluation.
3. Identify a query that is representative of customer workload requirement.
4. Calculate the Benchmark Consumption Rate (BCR) of SQL Server for the query.
5. Calculate the Required User Data Capacity (UDC).
6. Compare BCR and UDC ratings against published Maximum CPU Consumption Rate
(MCR) and Capacity ratings for conforming Fast Track reference architectures.
The following describes individual points of the Full Evaluation process flow in detail.
Step 1: Evaluate the Customer Use Case
Workload Evaluation
This process is the same as for Option 1: Basic Evaluation.
Select FTDW Evaluation Hardware
Before you begin a full system evaluation, you must choose and deploy a published FTDW
reference configuration for testing. You can choose among several methods to identify an
appropriate reference configuration. The following approaches are common:
Budget. The customer chooses to buy the highest-capacity system and/or highest-
performance system for the available budget.
Performance. The customer chooses to buy the highest-performing system available.
In-house analysis. The decision is based on workload analysis the customer has run on
existing hardware.
Ad-hoc analysis. The FTDW Sizing Tool provides a basic approach to calculating FTDW
system requirements based on basic assumptions about the targeted database
workload.
Step 2: Establish Evaluation Metrics
The following three metrics are important to a full FTDW evaluation, and they comprise the key
decision criteria for hardware evaluation. For more information about calculating these metrics,
see the Fast Track Database Validation section of this document.
Maximum CPU Core Consumption Rate (MCR)
Benchmark Consumption Rate (BCR)

Page 14


Required User Data Capacity (UDC)
MCR
This metric measures the maximum SQL Server data processing rate for a standardized query
against a standardized data set for a specific server and CPU combination. This is provided as
a per-core rate and is measured as a simple query-based scan from memory cache. MCR is the
initial starting point for Fast Track system design and represents the maximum required I/O
bandwidth for the server, CPU, and workload.
BCR
BCR is measured by a query or set of queries that are considered definitive of the Fast Track
DW workload. BCR is calculated in terms of total read bandwidth from disk, rather than from
cache as with the MCR calculation. The relative comparison of MCR to BCR values provides a
common reference point when evaluating different reference architectures for unique customer
use cases. The determination of BCR can enable tailoring of the infrastructure for a given
customer use case.
UDC
This is simply the customer-required user data capacity for the SQL Server database. It is
important to take growth rates into account when determining UDC. Note that any storage
expansion beyond initial deployment potentially requires data migration that would effectively
stripe existing data across the new database file locations.
Step 3: Choose a Fast Track Data Warehouse Reference Configuration
After it is calculated, BCR can be compared against published MCR and capacity ratings
provided by Fast Track vendors for each published FTDW reference configuration. BCR can be
seen as a common reference point for evaluating results from the test/evaluation system against
published configurations. This allows the customer to choose the Fast Track option that best
aligns with the test results.
Option 3: User-Defined Reference Architectures
This approach leverages the FTDW methodology to tailor a system for a specific workload or
set of hardware. This approach requires a thorough understanding of both SQL Server and the
hardware components that it runs on. The following steps outline the general approach for
developing a user-defined reference architecture that conforms to FTDW principles.
Step 1: Define Workload
Understanding the target database use case is central to FTDW configurations, and this applies
equally to any custom application of the guidance provided within this document. Guidance for
Fast Track RAs, specifically on the topic of workloads, can be used as a reference model for
incorporating workload evaluation into component architecture design.

Page 15


Step 2: Establish Component Architecture Benchmarks
The following framework provides an outlined approach to developing a reference architecture
for a predefined workload. For more information about this process, see the Error! Reference
source not found. section of this document.
1. Establish the Maximum CPU Core Consumption Rate (MCR) for the chosen server and
CPU.
a. Use the method outlined in the Error! Reference source not found. section to
calculate the standardized MCR.
b. You can also use published MCR ratings for FTDW configurations. In general,
CPUs of the same family have similar CPU core consumption rates for the SQL
Server database.
2. Use the MCR value to estimate storage and storage network requirements and create
an initial system design.
3. Procure a test system based on the initial system design. Ideally this will be the full
configuration specified.
4. Establish a Benchmark Consumption Rate (BCR).
a. Based on workload evaluation, identify a query or in the ideal case a set of
representative queries.
b. Follow the practices described in this reference guide to establish baseline I/O
characteristics and BCR rating.
5. Adjust the system design based on results.
6. Establish the final server and storage configuration.
Step 3: System Validation
The goal of system benchmarking should be configuration and throughput validation of the
hardware component configuration identified in Step 2. For more information about this process,
see the Error! Reference source not found. section of this document.
1. Evaluate component throughput against the established performance requirements. This
ensures that real system throughput matches expectations.
2. Validate system throughput by rebuilding to final configuration and executing final
benchmarks. As a general rule, final BCR should achieve 80 percent or better of the
system MCR.
FTDW Standard Configuration
Hardware Component Architecture
Current Fast Track Data Warehousing reference architectures are based on dedicated storage
configurations (SAN or SAS at time of publication). Disk I/O throughput is achieved through the
use of independent, dedicated storage enclosures and processors. Additional details and
configurations are published by each Fast Track vendor. Figure 2 illustrates the component-
level building blocks that comprise a Fast Track Data Warehousing reference architecture
based on SAN storage.

Page 16



Figure 2: Example storage configuration for a 2-socket, 12-core server
Component Requirements and Configuration
Server
Memory: Memory allocations for Fast Track RAs are based on lab benchmark results with the
goal of balancing maximum logical throughput (total pages read from disk and buffer over time)
with CPU utilization. Recommended memory allocation for FT 3.0 architectures are provided in
the following schedule. The maximum memory values provided are not hard limits, but
represent the maximum configurations tested.
Server Size Minimum Memory Maximum Memory
2 Socket 96 GB 256 GB
4 Socket 128 GB 512 GB
8 Socket 256 GB 512 GB

The following considerations are also important when evaluating system memory requirements:
Query from cache: Workloads that service a large percentage of queries from cache
may see an overall benefit from increased RAM allocations as the workload grows.

Page 17


Hash Joins and Sorts: Queries that rely on large-scale hash joins or perform large-
scale sorting operations will benefit from large amounts of physical memory. With
smaller memory, these operations will spill to disk and heavily utilize tempdb, which
introduces a random I/O pattern across the data drives on the server.
Loads: Bulk inserts can also introduce sorting operations that utilize tempdb if they
cannot be processed in available memory.
Local Disk: A 2-disk RAID1 array is the minimum allocation for operating system and SQL
Server application installation. Remaining disk configuration depends on the use case and
customer preference.
Fibre Channel SAN
HBA SAN: All HBA and SAN network components vary to some degree by make and model.
In addition, storage enclosure throughput can be sensitive to SAN configuration and PCIe bus
capabilities. This recommendation is a general guideline and is consistent with testing
performed during FTDW reference configuration development.
All FC ports should be connected to the FC switch within the same FC zone. Only ports in use
for Fast Track should exist in the zone. Affinity between HBA port and SAN port is not
prescribed by the Fast Track RA. Instead, aggregate bandwidth is assumed across a shared
zone hence a FC switch is required. HBA queue depth should be set appropriate to the
anticipated workload.
Additional, vendor-specific HBA and SAN configuration details can be found for some Fast
Track partner solutions. For Fast Track 3.0 HP, Dell and IBM all publish technical white papers
with additional guidance for SAN configuration and PCIe slot optimization. If differing guidance
is offered, vendor-specific guidance supersedes this Reference Guide.
Multipath I/O (MPIO): MPIO should be configured. Each disk volume / LUN from the SAN
should appear as a separate disk in Windows. Each volume hosted on the SAN arrays should
have multiple MPIO paths defined with at least one Active path.
Round-robin, with subset policy should be used where the Microsoft Device-Specific Module
(DSM) is configured. Vendor-specific DSMs and/or documentation may prescribe different
settings and should be reviewed prior to configuration.
Storage
Logical File System: Mounting LUNs to windows folders (mount points) rather than drive
letters is preferred due to the number of drives added in a Fast Track system.
It can be useful to understand which Windows operating system drive assignment represents
which LUN (volume), RAID Disk Group, and Windows Server Mount Point in the storage
enclosures. You can adopt a naming scheme for the mount points and volumes when mounting
LUNs to Windows folders. Refer to the FTDW 3.0 Schema Wizard found at the SQL Server
2008 R2 Fast Track Portal for examples of Volume naming schemes recommended for FTDW
systems.

Page 18


You can use vendor-specific tools to achieve the recommended volume naming scheme. If a
tool is not available you can make one disk available to Windows at a time from the storage
arrays while assigning drive names to ensure the correct physical-to-logical topology.
Physical File System: For more information, including detailed instructions, see the Application
Configuration section.
Storage Enclosure Configuration: All enclosure settings remain at their defaults. FTDW
specifications for file system configuration require storage enclosures that allow specific
configuration of RAID groupings and LUN assignments. This should be taken into account for
any FTDW reference configuration hardware substitutions or custom hardware evaluations.
Application Configuration
Windows Server 2008
Default settings should be used for the Windows Server 2008 (or Windows Server 2008 R2)
operating system. The Multipath I/O feature is required.
The hotfixes listed below should be applied for Windows Server 2008 R2. The hotfixes will not
deploy if they have been previously applied or are no longer necessary.
a. kb982383
b. kb979149
c. kb976700

SQL Server 2008 R2
Startup Options
-E must be added to the start-up options. This increases the number of contiguous extents in
each file that are allocated to a database table as it grows. This improves sequential disk
access. Microsoft Knowledge Base Article 329526 describes the -E option in more detail.
-T1117 should be added to the start-up options. This trace flag ensures even growth of all files
in a file group in the case that autogrow is enabled. The standard FTDW recommendation for
database file growth is to preallocate rather than autogrow (with the exception of tempdb) as
noted in Storage Configuration Details.
Enable option Lock Pages in Memory. For more information, see How to: Enable the Lock
Pages in Memory Option.
-T834 should be evaluated on a case-by-case basis. This can improve throughput rates for
many DW workloads. This flag enables large page allocations in memory for the SQL Server
buffer pool. Benchmarks for FT 3.0 do not use T834. For more information about this and other
trace flags, see SQL Server Performance Tuning & Trace Flags.

Page 19



Resource Governor
Data warehousing workloads typically include a good proportion of complex queries operating
on large volumes of data. These queries often consume large amounts of memory, and they
can spill to disk where memory is constrained. This has specific implications in terms of
resource management. You can use the Resource Governor technology in SQL Server 2008 to
manage resource usage. Maximum memory settings are of particular importance in this context.
In default settings for SQL Server, Resource Governor is provides a maximum of 25 percent of
SQL Server memory resources to each session. This means that, at worst, three queries heavy
enough to consume at least 25 percent of available memory will block any other memory-
intensive query. In this state, any additional queries that require a large memory grant to run will
queue until resources become available.
You can use Resource Governor to reduce the maximum memory consumed per query.
However, as a result, concurrent queries that would otherwise consume large amounts of
memory utilize tempdb instead, introducing more random I/O, which can reduce overall
throughput. While it can be beneficial for many data warehouse workloads to limit the amount of
system resources available to an individual session, this is best measured through analysis of
concurrency benchmarks and workload requirements. For more information about how to use
Resource Governor, see Managing SQL Server Workloads with Resource Governor in SQL
Server Books Online. Vendor specific guidance and practices for Fast Track solutions should
also be reviewed. In particular, larger 4-socket and 8-socket Fast Track solutions may rely on
specific Resource Governor settings to achieve optimal performance.
In summary, there is a trade-off between lowering constraints that offer higher performance for
individual queries and more stringent constraints that guarantee the number of queries that can
run concurrently. Setting MAXDOP at a value other than the maximum will also have an impact
in effect, it further segments the resources available, and it constrains I/O throughput to an
individual query.
For more information about best practices and common scenarios for Resource Governor, see
the white paper Using the Resource Governor.
Storage System
Managing fragmentation is crucial to system performance over time for Fast Track Data
Warehouse reference architectures. For this reason, a detailed storage and file system
configuration is specified.
Storage System Components
Figure 3 provides a view that combines three primary layers of storage configuration for the
integrated database stack. The database stack contains the following elements:
Physical disk array (4 spindle RAID 1+0)
Operating system volume assignment (LUN)

Page 20


Databases: User, System Temp, System Log

Figure 3: Example comprehensive storage architecture for a FT system based on three storage
enclosures with one LUN per disk group. This is typical of some FT 3.0 2-socket server
reference architectures.
Storage Configuration Details
For each storage enclosure, do the following.
1. Create Disk Groups of four disks each, using RAID 1+0 (RAID 10). The exact number of
Disk Groups, per storage enclosure, can vary by vendor. Refer to vendor-specific
documentation for details. In general, the number is (2) RAID10 and (1) RAID1 disk
group for large form factor (LFF) enclosures and (5) RAID10 disk groups for small form
factor (SFF) enclosures.
2. All but one disk group will be dedicated to primary user data (PRI).

Page 21


a. .All Fast Track 3.0 RAs call for either one or two LUNs to be created per PRI disk
group. Refer to vendor-specific guidance for your chosen RA. These LUNs will
be used to store the SQL Server database files (.mdf and .ndf files).
b. Ensure that primary storage processor assignment for each disk volume
allocated to primary data within a storage enclosure is evenly balanced. For
example, a storage enclosure with four disk volumes allocated for primary data
will have two volumes assigned to storage processor A and two assigned to
storage processor B.
3. Create one LUN on the remaining Disk Group to host the database transaction logs
(LOG). For some larger Fast Track configurations LOG allocations are limited to only the
first few storage enclosures in the system. In this case the additional disk groups are
used for nondatabase staging or left unpopulated to reduce cost.
For each database, do the following:
1. Create at least one filegroup containing one data file per PRI LUN. Be sure to make all
the files the same size. If you plan to use multiple filegroups within a single database to
segregate objects (for example, a staging database to support loading), be sure to
include all PRI LUN as locations for each filegroup.
2. When creating the files for each filegroup, preallocate them to their largest anticipated
size, with a size large enough to hold the anticipated objects.
3. Disable the autogrow option for data files, and manually grow all data files when the
current size limit is being approached.
4. For more information about recommendations for user databases and filegroups, see the
Managing Data Fragmentation section of this document.
For tempdb, do the following:
1. Preallocate space, and add a single data file per LUN. Be sure to make all files the same
size.
2. Assign temp log files onto one of the LUNs dedicated to LOG files.
3. Enable Autogrow; In general the use of a large growth increment is appropriate for DW
workloads. A value equivalent to 10 percent of the initial file size is a reasonable starting
point.
4. Follow standard SQL Server best practices for database and tempdb sizing
considerations. Greater space allocation may be required during the migration phase or
during the initial data load of the warehouse. For more information, see Capacity
Planning for tempdb in SQL Server Books Online.
For the transaction log, do the following:
1. Create a single transaction log file per database on one of the LUNs assigned to the
transaction log space. Spread log files for different databases across available LUNs or
use multiple log files for log growth as required.
2. Enable the autogrow option for log files.
3. Refer to existing best practices for SQL Server transaction log allocation and
management.

Page 22


SQL Server Best Practices for FTDW
Practices for Fast Track workloads are validated and documented in two cases. The first occurs
in the case that a Fast Track practice differs substantively from established SQL Server best
practices. The second case occurs in scenarios where existing practices are missing or not
easily accessible. The practices provided here are not intended to be comprehensive as there is
a rich set of existing documentation for SQL Server database deployment. Existing SQL Server
technical documentation and best practices should be referred to on numerous topics related to
a FTDW deployment.
Data Architecture
Table Structure
The type of table that is used to store data in the database has a significant effect on the
performance of sequential access. It is very important to design the physical schema with this in
mind to allow the query plans to induce sequential I/O as much as possible.
There is a choice to be made with regard to the type of table that is used to store your data. The
decision comes down to how the data in the table will be accessed the majority of the time. The
following decision tree can be used to help determine which type of table should be considered
based on the details of the data being stored.
Heap Tables
Heap tables provide clean sequential I/O for table scans and generally lower overhead with
regards to table fragmentation. They do not inherently allow for optimized (direct-access) range
based scans as found with a clustered index table. In a range scan situation, a heap table scans
the entire table (or appropriate range partition, if partitioning is applied).
Scanning heap tables reaches maximum throughput at 32 files, so use of heaps for large fact
tables on systems with high LUN (more than 32) or core (more than 16) count may require the
use of Resource Governor, DOP constraints, or changes to the standard Fast Track database
file allocation.
It is best to use heap tables where:
The majority of high priority queries against the table reference contain predicates that
reference a variety of disparate columns or have no column predicates.
Queries normally perform large scans as opposed to range-restricted scans, such as
tables used exclusively to populate Analysis Services cubes. (In such cases, the heap
table should be partitioned with the same granularity as the Analysis Services cube
being populated.)
Query workload requirements are met without the incremental overhead of index
management or load performance is of paramount importance heap tables are faster
to load.

Page 23


Clustered Index Tables
In the data warehouse environment, a clustered index is most effective when the key is a range-
qualified column (such as date) that is frequently used in restrictions for the relevant query
workload. In this situation, the index can be used to substantially restrict and optimize the data
to be scanned.
It is best to use clustered index tables if:
There are range-qualified columns in the table that are used in query restrictions for the
majority of the high-priority query workload scenarios against the table:
o For FTDW configurations, the partitioned date column of a clustered index should
also be the clustered index key.
o Choosing a clustered index key that is not the date partition column for a
clustered index table might be advantageous in some cases. However, this is
likely to lead to fragmentation unless complete partitions are loaded at a time,
because new data that overlaps existing clustered index key ranges creates
page splits.
Queries against the table normally do granular or range constrained lookups, not full
table or large multi-range scans.
Table Partitioning
Table partitioning provides an important benefit with regard to data management and minimizing
fragmentation. Because table partitioning groups the table contents into discrete blocks of data,
it enables you to manage these blocks as a whole. For example, using this technique to delete
data from your database in blocks can help reduce fragmentation in your database. In contrast,
deleting row by row induces fragmentation and may not even be feasible for large tables.
In addition, large tables that are used primarily for populating SQL Server Analysis Services
cubes can be created as partitioned heap tables, with the table partitioning aligned with the
cubes partitioning. When accessed, only the relevant partitions of the large table will be
scanned. (Partitions that support Analysis Services ROLAP mode may be better structured as
clustered indexes.)
For more information about table partitioning, see SQL Server Partitioning.
Indexing
The following guidelines for FTDW index creation are recommended for FTDW systems:
Use a clustered index for date ranges or common restrictions.
Reserve nonclustered indexing for tables where query restriction or granular lookup is
required and table partitioning does not provide sufficient performance.
Limit the use of nonclustered indexes to situations where the number of rows searched
for in the average query is very low compared to overall table size.

Page 24


Nonclustered, covering indexes may provide value for some data warehouse workloads.
These should be evaluated on a case-by-case basis.
Database Statistics
When to run statistics and how often to update them is not dependent on any single factor. The
available maintenance window and overall lack of system performance are typically the two
main reasons where database statistics issues are addressed.
For more information, see Statistics for SQL Server 2008.
Best Practices
We recommend the following best practices for database statistics:
Use the AUTO CREATE and AUTO UPDATE (sync or async) options for statistics (the
system default in SQL Server). Use of this technique will minimize the need to run
statistics manually.
If you must gather statistics manually, statistics ideally should be gathered on all
columns in a table. If it is not possible to run statistics for all columns, you should at least
gather statistics on all columns that are used in a WHERE or HAVING clause and on join
keys. Index creation builds statistics on the index key, so you dont have to do that
explicitly.
Composite (multicolumn) statistics are critical for many join scenarios. Fact-dimension
joins that involve composite join keys may induce suboptimal nested loop optimization
plans in the absence of composite statistics. Auto-statistics will not create, refresh, or
replace composite statistics.
Statistics that involve an increasing key value (such as a date on a fact table) should be
updated manually after each incremental load operation. In all other cases, statistics can
be updated less frequently. If you determine that the AUTO_UPDATE_STATISTICS
option is not sufficient for you, run statistics on a scheduled basis.
Compression
The Fast Track data warehouse configurations are designed with page compression enabled. It
is highly recommended that you use page compression on all fact tables. Compression of small
dimension tables (that is, those with less than a million rows) is optional. With larger dimension
tables it is often beneficial to use page compression. In either case, compression of dimension
tables should be evaluated on a use case basis. Row compression is an additional option that
provides reasonable compression rates for certain types of data.
SQL Server 2008 compression shrinks data in tables, indexes, and partitions. This reduces the
amount of physical space required to store user tables which enables more data to fit into the
SQL Server buffer pool (memory). One benefit of this is in a reduction of the number of I/O
requests serviced from physical storage.

Page 25


The amount of actual compression that can be realized varies relative to the data that is being
stored and the frequency of duplicate data fields within the data. If your data is highly random,
the benefits of compression are very limited. Even under the best conditions, the use of
compression increases demand on the CPU to compress and decompress the data, but it also
reduces physical disk space requirements and under most circumstances improves query
response time by servicing I/O requests from memory buffer. Usually, page compression has a
compression ratio (original size/compressed size) of between 2 and 7:1, with 3:1 being a typical
conservative estimate. Your results will vary depending on the characteristics of your data.
Managing Data Fragmentation
Fragmentation can happen at several levels, all of which must be controlled to preserve
sequential I/O. A key goal of a Fast Track Data Warehouse is to keep your data as sequentially
ordered as possible while limiting underlying fragmentation. If fragmentation is allowed to occur,
overall system performance suffers.
Periodic defragmentation is necessary, but the following guidelines can help you minimize the
number of time-consuming defragmentation processes.
File System Fragmentation
Disk blocks per database file should be kept contiguous on the physical platter within the NTFS
file system. Fragmentation at this level can be prevented by preallocation of files to their
expected maximum size upon creation.
NTFS file system defragmentation tools should be avoided. These tools are designed to work at
the operating system level and are not aware of internal SQL Server data file structures.
Extent Fragmentation
Within SQL Server, all of the pages within a file, regardless of table association, can become
interleaved down to the extent size (2M) or page level (8K). This commonly occurs due to
concurrent DML operations, excessive row-level updates, or excessive row-level deletes.
Fully rewriting the table or tables in question is the only way to ensure optimal page allocation
within a file. There are no alternative methods to resolving this type of database fragmentation.
For this reason, it is important to follow guidelines for SQL Server configuration and best
practices for loading data and managing DML.
The following query provides key information for evaluating logical fragmentation for a FTDW
table. The highest priority metric is Average Fragment Size. This value provides an integer that
represents the average number of SQL Server pages that are clustered in contiguous extents.
This value must be greater than or equal to 400 for benchmarks on FTDW 3.0 qualified
reference architectures. Lower values will have an increasingly detrimental impact on disk I/O
throughput and overall performance.
SELECT db_name(ps.database_id) as database_name
,object_name(ps.object_id) as table_name
,ps.index_id

Page 26


,i.name
,cast (ps.avg_fragmentation_in_percent as int) as [Logical Fragmentation]
,cast (ps.avg_page_space_used_in_percent as int) as [Avg Page Space Used]
,cast (ps.avg_fragment_size_in_pages as int) as [Avg Fragment Size In Pages]
,ps.fragment_count as [Fragment Count]
,ps.page_count
,(ps.page_count * 8)/1024/1024 as [Size in GB]
FROM sys.dm_db_index_physical_stats (DB_ID() --NULL = All Databases
, OBJECT_ID('$(TABLENAME)')
, 1
, NULL
, 'SAMPLED') AS ps --DETAILED, SAMPLED, NULL = LIMITED
INNER JOIN sys.indexes AS i
on (ps.object_id = i.object_id AND ps.index_id = i.index_id)
WHERE ps.database_id = db_id()
and ps.index_level = 0;

Index Fragmentation
An index can be in different physical (page) and logical (index) order.
Do not use the ALTER INDEX REORGANIZE command to resolve this type of fragmentation
because doing so can negate the benefits of large allocations. An index rebuild or the use of
INSERTSELECT to insert data into a new copy of the index (which avoids a resort) can
resolve this issue. Any ALTER INDEX REBUILD process should specify
SORT_IN_TEMPDB=TRUE to avoid fragmentation of the destination filegroup. A MAXDOP
value of 1 should also be used to maintain index page order and improve subsequent scan
speeds.
Filegroup Segmentation
Separate filegroups can be created to handle volatile data use cases such as:
Tables or indexes that are frequently dropped and re-created (leaving gaps in the
storage layout that are refilled by other objects).
Indexes for which there is no choice but to support as highly fragmented because of
page splits, such as cases in which incremental data that mostly overlaps the existing
clustered index key range is frequently loaded.

Page 27


Smaller tables (such as dimension tables) that are loaded in relatively small increments,
which can be placed in a volatile filegroup to prevent those rows from interleaving with
large transaction or fact tables.
Staging databases from which data is inserted into the final destination table.
Other tables can be placed in a nonvolatile filegroup. Additionally, very large fact tables can also
be placed in separate file groups.
Loading Data
The Fast Track component architecture is balanced for higher average scan rates seen with
sequential disk access. To maintain these scan rates, care must be taken to ensure contiguous
layout of data within the SQL Server file system.
This section is divided into the following two high-level approaches, incremental load and data
migration. This guidance is specific, but not exclusive, to Fast Track data warehousing.
For more information about SQL Server bulk load, see The Data Loading Performance Guide.
Incremental Loads
This section covers the common day-to-day load scenarios of a data warehouse environment.
This section includes load scenarios with one or more of the following attributes:
Small size relative to available system memory
Load sort operations fit within available memory
Small size relative to the total rows in the target load object
The following guidelines should be considered when you are loading heap and clustered index
tables.
Heap Table Load Process
Bulk inserts for heap tables can be implemented as serial or parallel process. Use the following
tips:
To execute the movement of data into the destination heap table, use BULK INSERT
with the TABLOCK option. If the final permanent table is partitioned, use the
BATCHSIZE option, because loading to a partitioned table causes a sort to tempdb to
occur.
To improve load time performance when you are importing large data sets, run multiple
bulk insert operations simultaneously to utilize parallelism in the bulk process.
Clustered Index Load Process
Two general approaches exist to loading clustered index tables with minimal table
fragmentation:
Option 1: Use BULK INSERT to load data directly into the destination table. For best
performance, the full set of data being loaded should fit into an in-memory sort. All data
loaded should be handled by a single commit operation by using a BATCHSIZE value of

Page 28


0. This setting prevents data in multiple batches from interleaving and generating page
splits. If you use this option, the load must occur single-threaded.
Option 2: Create a staging table that matches the structure (including partitioning) of the
destination table.
o Perform a serial or multithreaded bulk insert into the empty clustered index
staging table using moderate, nonzero batch size values to avoid spilling sorts to
tempdb. Best performance will be achieved with some level of parallelism. The
goal of this step is performance; therefore the page splits and logical
fragmentation induced by parallel and/or concurrent inserts are not a concern.
o Insert from the staging table into the destination clustered index table using a
single INSERTSELECT statement with a MAXDOP value of 1. This MAXDOP
hint setting ensures that data pages are placed contiguously within the SQL
Server data file and will clean up any logical fragmentation created in the prior
step.
Option 3: This option requires the use of two filegroups and two or more tables. The
approach requires a partitioned cluster index table and is best suited for tables that see
high levels of logical fragmentation in the most current partitions with little to no change
activity to older partitions. The overall goal is to place volatile partitions in a dedicated
filegroup and age or roll those partitions to the static filegroup after they stop receiving
new records or changes to existing records.
o Create two filegroups, following FTDW guidance. One will be dedicated to
volatile partitions and the other to static partitions. A volatile partition is one in
which more than 10 percent of rows will change over time. A static partition is
one that is not volatile.
o Build the primary cluster index partitioned table in the static filegroup.
o Build a table consistent with one of the following two general approaches:
A single heap table with a constraint that mirrors the partition scheme of
the primary table. This constraint would represent the volatile range of the
primary data set and may span one or more partition ranges of the
primary table scheme. This is most useful if initial load performance is the
key decision criteria because loads to a heap are generally more efficient
than loads to a cluster index.
A single cluster index table with a partition scheme that is consistent with
the primary table partition. This allows direct inserts with low degree of
parallelism (DOP) to the primary table as volatile partitions age. After they
are aged via insert to the primary table, partitions are dropped and new
ranges added.
o Build a view that unions both tables together. This presents the combination of
the two tables as a single object from the user perspective.
o After the volatile data ranges become static from a change-data perspective, use
an appropriate aging process (often called partition rolling):
If a heap table with constraint is used; move data by partition range to the
static filegroup via insert to staging table. Use CREATE INDEX and
partition switching to move the data into the primary table. For more

Page 29


information about this type of operation for FTDW configurations, see the
following section on Data Migration.
If a partitioned CI is used; Use a low DOP that is, less than or equal to 8)
INSERT, restricted by partition range, directly into the primary table. You
may need to set the DOP as low as one to avoid fragmentation
depending on overall system concurrency.
Data Migration
This covers large one-time or infrequent load scenarios in a data warehouse environment.
These situations can occur during platform migration or while test data is loaded for system
benchmarking. This topic includes load scenarios with one or more of the following attributes:
Load operations that exceed available system memory
High-concurrency, high-volume load operations that create pressure on available
memory
Heap Table Load Process
Follow the guidance provided for incremental load processing.
Clustered Index Load Process
Two general approaches exist to loading clustered index tables with minimal table
fragmentation:
Option 1: Use BULK INSERT to load data directly into a clustered index target table.
Sort operations and full commit size should fit in memory for best performance. Care
must be taken to ensure that separate batches of data being loaded do not have index
key ranges that overlap.
Option 2: Perform a serial or multithreaded bulk insert to an empty clustered index
staging table of identical structure. Use moderate, nonzero batch size to keep sorts in
memory. Next, insert data into an empty clustered index table using a single
INSERTSELECT statement with a MAXDOP value of 1.
Option 3: Use multithreaded bulk inserts to a partition conforming heap staging table,
using moderate nonzero batch size values to keep sorts in memory. Next, use serial or
parallel INSERTSELECT statements spanning each partition range to insert data into
the clustered index table.
Option 4: Partition switch operations can be used in a multi-step process that generally
provides the best results for large load operations. This approach adds more complexity
to the overall process and is designed to demonstrate an approach that is optimal for
raw load performance. The primary goal of this approach is to enable parallel write
activity at all phases of the insert to cluster index operation without introducing logical
fragmentation. This is achieved by staging the table across multiple filegroups prior to
inserting the data into the final destination table.
o Identify the partition scheme for the final, destination cluster index table.
o Create a stage filegroup.
o Create an uncompressed, nonpartitioned heap base staging table in the stage
filegroup.
o Bulk insert data using WITH TABLOCK to the base staging table. Multiple,
parallel bulk-copy operations are the most efficient approach if multiple source

Page 30


files are an option. The number of parallel load operations to achieve maximum
throughput is dependent on server resources (CPU and memory) and the data
being loaded.
o Identify the number of primary filegroups to be supported. This number should be
a multiple of the total number of partitions in the destination table. The number
also represents the total number of INSERT and CREATE INDEX operations to
be executed concurrently in later steps. As an example, for a table with 24
partitions and a server with eight cores eight primary filegroups can be chosen.
This allows execution of eight parallel inserts in the next steps, one for each of
the eight primary filegroups. Each filegroup, in this case, would contain three
partition ranges worth of data.
o Create the number of primary filegroups as determined earlier.
o Create one staging heap table in each primary filegroup for each partition range,
with no compression. Create a constraint on the staging table that matches the
corresponding partition range from the destination table. Using the example
given earlier, there would be three staging tables per primary filegroup created in
this step.
o Create the destination, partitioned cluster index table with page compression.
This table should be partitioned across all primary filegroups. Partitions should
align with the heap staging table constraint ranges.
o Execute one INSERT or SELECT statement from the base staging table to the
staging filegroup tables for each primary filegroup. This should be done in
parallel. Be sure that the predicate for the INSERT or SELECT statement
matches the corresponding partition ranges. Never run more than one INSERT
or SELECT statement per filegroup concurrently.
o Execute one CREATE CLUSTERED INDEX command with page compression
per filegroup for the newly populated staging tables. This can be done in parallel
but never with DOP higher than 8. Never run more than one create index per
filegroup concurrently. Be sure to use the SORT_IN_TEMPDB option whenever
performing a CREATE INDEX operation to avoid fragmenting the primary
filegroups. The optimal number of concurrent create index operations will depend
on server size, memory, and the data itself. In general, strive for high CPU
utilization across all cores without oversubscribing (85-90 percent total
utilization).
o Execute serial partition switch operations from the staging tables to the
destination table. This can also be done at the completion of each staging
CREATE INDEX operation.
Examples
For more information about data loading and optimizations, see The SQL Server 2008 Data
Loading Performance Guide.
An additional resource describing Fast Track incremental and migration load scenarios
compatible with Fast Track 2.0 and Fast Track 3.0 is provided in a Microsoft PowerPoint
presentation FTDW 3.0 Data Loading Overview that can be found at the SQL Server 2008 R2
Fast Track Portal.

Page 31


Benchmarking and Validation
This section describes the metrics and processes used to design and qualify SQL Server Fast
Track DW reference architectures. The process for FTDW validation can be divided into the two
categories described here.
Baseline Hardware Validation
The goal of hardware validation is to establish real, rather than rated, performance metrics for
the key hardware components of the Fast Track reference architecture. This process
determines actual baseline performance characteristics of key hardware components in the
database stack.
Fast Track Database Validation
Establishing SQL Server performance characteristics, based on a FTDW workload, allows for
comparison against the performance assumptions provided by the baseline hardware evaluation
process. In general, database throughput metrics should reflect at least 80 percent of baseline
rates for Fast Track validated reference architectures. The performance metrics calculated in
this process are the basis for published FTDW performance values.
Performing Baseline Hardware Evaluation
Baseline hardware testing is done at the operating system level with a tool such as SQLIO. SQL
Server application testing is not done in this phase, and all tests are synthetic, best-case
scenarios. In addition, this methodology can be used to provide validation that a FTDW system
deployment has been correctly configured.
Windows Server Performance and Reliability Monitor (Perfmon) can be used to track, record,
and report on I/O performance. A tool such as SQLIO can be used to test I/O bandwidth. For
more information about SQLIO, including instructions and download locations, see the SQLCAT
white paper Predeployment I/O Best Practices.
The following components and validation processes are used to generate baseline hardware
benchmarks.
Step 1 - Validate Fibre Channel Bandwidth
The first step in validating a FTDW configuration is to determine the maximum aggregate
throughput that can be realized between the storage I/O network and the server. This involves
removing disk as a bottleneck and focusing on the nondisk components (that is, HBAs, switch
infrastructure, and array controllers). Use the following steps to perform this task using SQLIO:
1. Generate a small data file on each LUN to be used for database files. These files should
be sized such that all data files will fit into the read cache on the array controllers (for
example, 50 MB per file).
2. Use SQLIO to issue sequential reads against the file simultaneously using large block
I/O sizes (512K) and at least two read threads per file. Be sure to calculate aggregate
outstanding reads. For example, 2 read threads with 50 outstanding requests would
account for 100 total outstanding requests to the target LUN.

Page 32


3. Start with a relatively low value for outstanding I/Os (-o) and repeat tests increasing this
value until there is no further gain in aggregate throughput.
The goal of this test is to achieve aggregate throughput that is reasonable compared with the
theoretical limits of the components in the path between the server and storage. This test
validates the bandwidth between the server and the SAN storage processors that is, the Multi-
Path Fibre Channel paths.
Step 2 - Validate LUN and RAID Pair Bandwidth
This test is similar to the previous tests. However, a larger file is used to remove possible
benefits from array cache from controller cache. These test files should be large enough to
simulate the target database file size per LUN, for example, 25 GB per LUN. Similar parameters
should be used for SQLIO as described in step 1.
Large block (512 KB) sequential reads should be issued against the test files on each LUN. We
recommend that you use a single thread per file with an outstanding request depth somewhere
between 4 and 16 (start small and increase until maximum throughput is achieved). First test
each LUN individually and then test the two simultaneously. Disk group throughput varies by
storage vendor and configuration but comparison can always be made to single HDD read
rates. A 4-disk RAID1+0 disk group, for example, should achieve a peak read rate of at least
four times the single HDD read rate for this type of basic read pattern. If this is not achieved, it is
possible that reads are not being serviced by both sides of the mirrored disk pair within the disk
group. These mirrored reads are critical to FTDW scan performance.
Step 3 - Validate Aggregate Bandwidth
In this test, sequential reads should be run across all of the available data LUNs concurrently
against the same files used in step 2. SQLIO should be run using two threads per test file, with
an I/O size of 512K and an optimal number of outstanding I/Os as determined by the previous
test.
The results of this test illustrate the maximum aggregate throughput achievable when reading
data from the physical disks.
Data is read from the large data file, as in the previous test, on each LUN simultaneously.
Aggregate performance from disk should be in the region of 80 percent to 90 percent of the
aggregate Fibre Channel bandwidth, for balanced FTDW systems.
Component Ratings
The following diagram illustrates synthetic benchmark results that are consistent with values
seen in similar Fast Track reference architectures.

Page 33



Figure 4: Example of synthetic benchmark realized bandwidth for a 2-socket, 12-core server
with 3 8Gbps dual-port HBA cards, with 12 4-disk RAID1+0 primary data LUN.
Summary
Baseline hardware benchmarking validates the actual bandwidth capability for the key hardware
components of the database stack. This is done with a series of best-case synthetic tests
executed through a tool like SQLIO.
Performing Fast Track Database Validation
This phase of Fast Track validation measures SQL Server performance for the FTDW workload
in terms of two core metrics. The first, Maximum CPU Consumption Rate (MCR), is a measure
of peak I/O processing throughput. The second, Benchmark CPU Consumption Rate (BCR) is a
measure of actual I/O processing throughput for a query or a query-based workload.
What is MCR?
The MCR calculation provides a per-core I/O throughput value in MB or GB per second. This
value is measured by executing a predefined, read-only, nonoptimized query, from buffer cache
and measuring the time to execute against the amount of data in MB or GB. Because MCR is
run from cache it represents the peak nonoptimized scan rate achievable through SQL Server

Page 34


for the FTDW workload the query represents and the hardware stack being evaluated.
Additionally, MCR can be used as a reference point when comparing the same standardized
metric from any published and validated FTDW reference architecture. This does mean,
however, that MCR is not representative of results for an actual customer workload. Rather,
MCR provides a baseline peak rate for comparison and design purposes. For example, if the
database stack being evaluated does not provide total disk I/O that is comparable to the MCR
rating for the server, the system may be under-provisioned for storage I/O. Validated FTDW 3.0
architectures will have aggregate baseline I/O throughput results that are at least 100 percent of
the server-rated MCR.
In summary:
MCR is not definitive of actual results for a customer workload.
MCR provides a maximum data processing rate baseline for SQL Server and the
associated Fast Track workload.
MCR is specific to a CPU and server. In general, rates for a given CPU do not vary
greatly by server and motherboard architecture but final MCR should be determined by
actual testing.
MCR throughput rating can be used as a comparative value against existing, published
FTDW reference architectures. This can assist with hardware selection prior to
component and application testing.
Calculating MCR
A baseline CPU consumption rate for the SQL Server application is established by running a
standard SQL query defined for the FTDW program. This query is designed to be a relatively
simple representation of a typical query for the workload type (in this case DW) and is run from
buffer cache. The resulting value is specific to the CPU and the server the query is being
executed against. Use the following method to calculate MCR:
1. Create a reference dataset based on the TPC-H lineitem table or similar data set. The
table should be of a size that it can be entirely cached in the SQL Server buffer pool yet
still maintain a minimum two-second execution time for the query provided here.
2. For FTDW 3.0 the following query is used: SELECT sum([integer field]) FROM [table]
WHERE [restrict to appropriate data volume] GROUP BY [col].
3. The environment should:
Ensure that Resource Governor settings are at default values.
Ensure that the query is executing from the buffer cache. Executing the query
once should put the pages into the buffer, and subsequent executions should
read fully from buffer. Validate that there are no physical reads in the query
statistics output.
Set STATISTICS IO and STATISTICS TIME to ON to output results.
4. Run the query multiple times, at MAXDOP = 4.
Record the number of logical reads and CPU time from the statistics output for
each query execution.
Calculate the MCR in MB/s using the formula:
( [Logical reads] / [CPU time in seconds] ) * 8KB / 1024

Page 35


A consistent range of values (+/- 5%) should appear over a minimum of five
query executions. Significant outliers (+/- 20% or more) may indicate
configuration issues. The average of at least 5 calculated results is the FTDW
MCR.

Based on the MCR calcuation, a component architecture throughput diagram can be
constructed. For the purposes of system MCR evaluation, component throughput is based on
vendor-rated bandwidth. This diagram can be useful for system design, selection, and
bottleneck analysis. Figure 5 illustrates an example of this.

Figure 5: Example of Maximum CPU Consumption Rate (MCR) and rated component
bandwidth for a 2 socket, 12 core server based on Intel Westmere CPUs.
For more information about measuring MCR, see Workload Testing in the appendix.



Page 36


Calculating BCR
A benchmark CPU consumption rate for the SQL Server application is established by running a
standard SQL query (or a set of queries) that are specific to your data warehouse workload.
This query should be serviced from disk, not from the SQL Server buffer pool as with MCR. The
resulting value is specific to the CPU, the server, and the workload it is being executed against.
The appendix entry for Workload Testing provides a more detailed example of creating a BCR
workload benchmark.
Use the following method to calculate BCR:
1. Create a reference dataset that contains at least one table. The table should be of
significant enough size that it is not entirely cached in either the SQL Server buffer pool
cache or in the SAN array cache. In absence of customer data, a synthetic dataset can
be used. It is important to attempt to approximate the expected characteristics of the
data for the targeted use case.
2. The basic query form for FTDW 3.0 is the following: SELECT sum([integer field]) FROM
[table] WHERE [restrict to appropriate data volume] GROUP BY [col].
3. For a FTDW customer benchmark it is always ideal to choose a query that is
representative of the target workload. Even better, multiple queries can also be selected
and scheduled as concurrent query workload that is representative of peak historical or
projected activity for the customer environment. The following criteria can be considered
in query selection:
Represent average target workload requirements. This may imply increasing or
decreasing the complexity of the basic query form, adding joins, and/or
discarding more or less data through projection and restriction.
The query should not cause writes of data to tempdb unless this characteristic is
a critical part of the target workload.
The query should return minimal rows. The SET ROWCOUNT option can be
used to manage this. A ROWCOUNT value greater than 100 should be used
(105 is standard for Fast Track benchmarking). Alternatively aggregation can be
used to reduce records returned from large unrestricted scans.
4. The environment should:
Ensure that Resource Governor settings are set at default.
Ensure caches are cleared before the query is run, using DBCC
dropcleanbuffers.
Set STATISTICS IO and STATISTICS TIME to ON to output results.
5. Run the query or workload multiple times, starting at MAXDOP 8. Each time you run the
query, increase the MAXDOP setting for the query, clearing caches between each run.
Record the number of logical reads and CPU time from the statistics output.
Calculate the BCR in MB/s using the formula:
( [Logical reads] / [CPU time in seconds] ) * 8KB / 1024
This gives you a range for BCR. If multiple queries are used, use a weighted
average to determine BCR.

Page 37


BCR Results
Figure 6 illustrates SQL Server workload based benchmark results that are consistent with
values seen in similar Fast Track Data Warehouse reference architectures.



Figure 6: Example of synthetic benchmark realized bandwidth for a 2-socket, 12-core server
with 3 8Gbps dual-port HBA cards, with 12 4-disk RAID1+0 primary data LUN.
Interpreting BCR
If your BCR for the average query is much lower than the standard MCR benchmarked for the
Fast Track RA, you are likely to be CPU bound. In response, you might think about reducing the
storage throughput, for example by reducing the number of arrays, introducing more disks per
array, or increasing the size of the disks these will reduce the cost of the storage infrastructure
to a balanced level. Alternatively you might think about using a higher socket count server or
higher performance CPUs that will take advantage of the surplus of storage I/O throughput. In
either case the goal is to balance database processing capability with storage I/O throughput.

Page 38


Correspondingly, if your BCR is higher than the MCR, you may need more I/O throughput to
process a query workload in a balanced fashion.
Fast Track Data Warehouse Reference Configurations
Detailed hardware reference architecture specifications are available from each participating
Fast Track Data Warehouse partner. Links to each partner solution can be found here: SQL
Server 2008 R2 Fast Track Portal.
Fast Track 3.0 capacity is rated in User Data Capacity (UDC). This calculation assumes, per
FTDW guidelines, that page compression is enabled for all user data. An average compression
factor of 3:1 is used. In addition an allocation of up to 30 percent of noncompressed capacity is
allocated to tempdb before calculating UDC. Note that for larger configurations with more total
capacity this ratio is reduced to as low as 20 percent.
For more information about tempdb sizing, see Capacity Planning for tempdb in SQL Server
Books Online.
Conclusion
SQL Server Fast Track Data Warehouse offers a template and tools for bringing a data
warehouse from design to deployment. This document describes the methodology,
configuration options, best practices, reference configurations, and benchmarking and validation
techniques for Fast Track Data Warehouse.
For more information:
SQL Server Website
SQL Server TechCenter
SQL Server Online Resources
Top10 Best Practices for Building Large Scale Relational Data Warehouses (SQLCAT team)

How to: Enable the Lock Pages in Memory Option (Windows)

Tuning options for SQL Server 2005 and SQL Server 2008 for high performance workloads

How to: Configure SQL Server to Use Soft-NUMA

Database File Initialization

How to: View or Change the Recovery Model of a Database (SQL Server Management Studio)

Monitoring Memory Usage

824190 Troubleshooting Storage Area Network (SAN) Issues


Page 39



Microsoft Storage Technologies Multipath I/O

Storport in Windows Server 2003: Improving Manageability and Performance in Hardware RAID
and Storage Area Networks

SQL Server 2000 I/O Basics White Paper
Data Compression: Strategy, Capacity Planning and Best Practices

Did this paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to 5
(excellent), how would you rate this paper and why have you given it this rating? For example:
Are you rating it high due to having good examples, excellent screen shots, clear writing,
or another reason?
Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback will help us improve the quality of white papers we release.
Send feedback.
Appendix
FTDW System Sizing Tool
The FTDW System Sizing tool is a spreadsheet calculator that assists you with the process of
calculating a customer workload requirement in terms of FTDW system throughput. You can
use this tool in the absence of a test platform or as a starting point when evaluating customer
requirements. The tool can be found at the SQL Server 2008 R2 Fast Track Portal. Additionally,
some partner vendors have built their own conforming Fast Track sizing tools. These can be
found at the partner websites.
Validating a Fast Track Reference Architecture
Synthetic I/O Testing
SQLIO is a tool available for download from Microsoft that enables testing of the I/O subsystem
independent of SQL Server. For more information about using SQLIO, including general
predeployment advice, see the SQL Server Predeployment I/O Best Practices white paper.
Generating Test Files with SQLIO
Running SQLIO causes an appropriate file to be created, if it is not already present. To generate
a file of a specific size, use the F parameter. For example using a parameter file (param.txt)
containing:

Page 40


C:\stor\pri\1\iobw.tst 1 0x0 50
Running SQLIO with the F parameter generates a 50 MB file on first execution:
Eq sqlio -kW -s60 -fsequential -o1 -b64 -LS -Fparam.txt
This process can take some time for large files. Create one file on each data disk on which you
will host SQL Server data and tempdb files. This can be achieved by adding more lines to the
parameter file, which will create the required files one by one. To achieve parallel file creation,
create multiple parameter files and run multiple SQLIO sessions concurrently.
Testing with SQLIO
Use of SQLIO is described more completely in the best practices article. Read tests generally
take the form:
sqlio kR fSequential -s30 -o120 -b512 d:\iobw.tst t1
where R indicates a read test, 30 is the test duration in seconds, 120 is the number of
outstanding requests issued, 512 is the block size in kilobytes of requests made, d:\iobw.tst is
the location of the test file and 1 is the number of threads.
In order to test aggregate bandwidth scenarios, multiple SQLIO tests must be issued in parallel.
This can be achieved using Windows PowerShell scripts or other scripting methods.
The predeployment best practices article also covers how to track your tests using Windows
Server Performance and Reliability Monitor. Recording and storing the results of these tests will
give you a baseline for future performance analysis and issue resolution.
Validate Fibre Channel Bandwidth (from cache)
Using a small test file with a read duration of several minutes ensures that the file is completely
resident in the array cache. Figure 7 shows the Logical Disk > Read Bytes / sec counter against
the disks in an example Fast Track system at various outstanding request numbers and block
size. Tests should run for at least a few minutes to ensure consistent performance. The figure
shows that optimal performance requires an outstanding request queue of at least four requests
per file. Each individual disk should contribute to the total bandwidth.

Page 41



Figure 7: Logical Disk > Read Bytes / sec counter
Validate LUN and Disk Group Bandwidth (from Disk)
These tests ensure all disk volumes presented by the disk arrays to Windows are capable of
contributing to the overall aggregate bandwidth, by reading from each LUN, one at a time. You
may see that some of the LUNs appear to be slightly faster than others. This is not unusual but
differences greater than 15% should be examined.

Figure 8: Validating LUN and RAID pair bandwidth
Run simultaneous tests against one or more LUN that share the same disk group. The picture
shows the output of tests against 8 disk groups.

Page 42



Figure 9: Testing LUNs that share disk groups.
Validate Aggregate Bandwidth (from Disk)
The following test demonstrates the effect of stepping up the I/O throughput, adding in an
additional LUN into the test at regular intervals. As each test runs for a set interval, you see a
step down. You should observe a similar pattern. Peak aggregate bandwidth from disk should
approach 80 percent to 90 percent of the bandwidth demonstrated from cache in the first step.
The graph shows the test at multiple block sizes 512K and 64K.

Figure 10: Aggregate bandwidth at multiple block sizes

Page 43


Workload Testing
Measuring the MCR for Your Server (Optional)
The goal of MCR is to estimate the maximum throughput of a single CPU core, running SQL
Server, in absence of I/O bottleneck issues. MCR is evaluated per core. If you chose to
calculate this for your own server, additional details describing the methodology for calculating
MCR are provided here:
1. Create a reference dataset based on the TPC-H lineitem table or similar data set. The
table should be of a size that it can be entirely cached in the SQL Server buffer pool yet
still maintain a minimum 2 second execution time for the query provided here.
2. For FTDW 3.0 the following query is used: SELECT sum([integer field]) FROM [table]
WHERE [restrict to appropriate data volume] GROUP BY [col].
3. The environment should:
Ensure that Resource Governor is set to default values.
Ensure that the query is executing from the buffer cache. Executing the query
once should put the pages into the buffer and subsequent executions should
read fully from buffer. Validate that there are no physical reads in the query
statistics output.
Set STATISTICS IO and STATISTICS TIME to ON to output results.
4. Run the query multiple times, at MAXDOP = 4.
Record the number of logical reads and CPU time from the statistics output for
each query execution.
Calculate the MCR in MB/s using the formula:
( [Logical reads] / [CPU time in seconds] ) * 8KB / 1024
A consistent range of values (+/- 5%) should appear over a minimum of five
query executions. Significant outliers (+/- 20% or more) may indicate
configuration issues. The average of at least 5 calculated results is the FTDW
MCR.
Measuring the BCR for Your Workload
BCR measurement is similar to MCR measurement, except that data is serviced from disk, not
from cache. The query and dataset for BCR is representative of your target data warehousing
workload.
One approach for BCR is to take a simple query, an average query, and a complex query from
the workload. The complex queries should be those that put more demands on the CPU. The
simple query should be analogous to the MCR and should do a similar amount of work, so that it
is comparable to the MCR.
Creating the Database
Here is an example of a CREATE DATABASE statement for an 8-core Fast Track Data
Warehouse system, with 16 data LUNs.
CREATE DATABASE FT_Demo ON PRIMARY Filegroup FT_Demo
( NAME = N 'FT_Demo_.mdf' , FILENAME = N'C:\FT\PRI\SE1-SP1-DG1-v1' , SIZE = 100MB ,
FILEGROWTH = 0 ),

Page 44


( NAME = N 'FT_Demo_v1.ndf' , FILENAME = N'C:\FT\PRI\SE1-SP1-DG1-v1' , SIZE = 417GB ,
FILEGROWTH = 0 ),
( NAME = N 'FT_Demo_v2.ndf' , FILENAME = N'C:\FT\PRI\SE1-SP1-DG2-v2' , SIZE = 417GB ,
FILEGROWTH = 0 ),
( NAME = N 'FT_Demo_v3.ndf' , FILENAME = N'C:\FT\PRI\SE1-SP2-DG3-v3' , SIZE = 417GB ,
FILEGROWTH = 0 ),
( NAME = N 'FT_Demo_v4.ndf' , FILENAME = N'C:\FT\PRI\SE1-SP2-DG4-v4' , SIZE = 417GB ,
FILEGROWTH = 0 ),

( NAME = N 'FT_Demo_v6.ndf' , FILENAME = N'C:\FT\PRI\SE2-SP1-DG6-v6' , SIZE = 417GB ,
FILEGROWTH = 0 ),
( NAME = N 'FT_Demo_v7.ndf' , FILENAME = N'C:\FT\PRI\SE2-SP1-DG7-v7' , SIZE = 417GB ,
FILEGROWTH = 0 ),
( NAME = N 'FT_Demo_v8.ndf' , FILENAME = N'C:\FT\PRI\SE2-SP2-DG8-v8' , SIZE = 417GB ,
FILEGROWTH = 0 ),
( NAME = N 'FT_Demo_v9.ndf' , FILENAME = N'C:\FT\PRI\SE2-SP2-DG9-v9' , SIZE = 417GB ,
FILEGROWTH = 0 ),

( NAME = N 'FT_Demo_v11.ndf' , FILENAME = N'C:\FT\PRI\SE3-SP1-DG11-v11' , SIZE = 417GB
, FILEGROWTH = 0 ),
( NAME = N 'FT_Demo_v12.ndf' , FILENAME = N'C:\FT\PRI\SE3-SP1-DG12-v12' , SIZE = 417GB
, FILEGROWTH = 0 ),
( NAME = N 'FT_Demo_v13.ndf' , FILENAME = N'C:\FT\PRI\SE3-SP2-DG13-v13' , SIZE = 417GB
, FILEGROWTH = 0 ),
( NAME = N 'FT_Demo_v14.ndf' , FILENAME = N'C:\FT\PRI\SE3-SP2-DG14-v14' , SIZE = 417GB
, FILEGROWTH = 0 ),


LOG ON
( NAME = N 'FT_LOG_v5.ldf' , FILENAME = N 'C:\FT\LOG\SE1-SP2-DG5-v5' , SIZE = 100GB ,
MAXSIZE = 500GB , FILEGROWTH = 50 )
GO


/*****************Configure recommended settings***********************/
ALTER DATABASE FT_Demo SET AUTO_CREATE_STATISTICS ON
GO

ALTER DATABASE FT_Demo SET AUTO_UPDATE_STATISTICS ON
GO

ALTER DATABASE FT_Demo SET AUTO_UPDATE_STATISTICS_ASYNC ON
GO

ALTER DATABASE FT_Demo SET RECOVERY SIMPLE
GO

sp_configure 'show advanced options', 1
go

reconfigure with override
go

/********Make sure all tables go on our filegroup and not the Primary filegroup****/
ALTER DATABASE FT_Demo
MODIFY FILEGROUP FT_Demo
DEFAULT
GO
Creating the Test Tables
Here is an example CREATE TABLE statement.
CREATE TABLE lineitem

Page 45


( l_orderkey bigint not null,
l_partkey integer not null,
l_suppkey integer not null,
l_linenumber integer not null,
l_quantity float not null,
l_extendedprice float not null,
l_discount float not null,
l_tax float not null,
l_returnflag char(1) not null,
l_linestatus char(1) not null,
l_shipdate datetime not null,
l_commitdate datetime not null,
l_receiptdate datetime not null,
l_shipinstruct char(25) not null,
l_shipmode char(10) not null,
l_comment varchar(132) not null
)
ON FT_Demo
GO


CREATE CLUSTERED INDEX cidx_lineitem
ON lineitem(l_shipdate ASC)
WITH( SORT_IN_TEMPDB = ON
, DATA_COMPRESSION = PAGE
)
ON FT_Demo
GO
Loading Data for BCR Measurement
As described elsewhere in this document, Fast Track Data Warehouse systems are sensitive to
the fragmentation of database files. Use one of the techniques this document describes to load
data. During FTDW testing, the clustered index load method described as option 2 was used.
Using the TPC-H datagen tool, lineitem table data was generated to a size of 70 GB, using
options -s100, generating the file in 8 parts, and using the S and C options.
Trace flag 610 was set during all load operations to use minimal logging where possible.
Using BULK INSERT, this data was inserted in parallel into a single clustered index staging
table, using minimal logging; we chose a block size that would not overwhelm available memory
and that would reduce spillage to disk. Disabling page locks and lock escalation on the staging
table improved performance during this phase.
A final insert was performed into an identical target table, with MAXDOP 1 (using the TABLOCK
hint) and avoiding a sort.
Running Queries for BCR Measurement
Use the SQL Server Profiler tool to record relevant information for query benchmarks. SQL
Server Profiler should be set up to record logical reads, CPU, duration, database name, schema
name, the SQL statement, and the actual query plans. Alternatively the statistics session
parameters set statistics io on and set statistics time on can be used.

Page 46


Here are a few example queries (based on queries from the TPC-H benchmark) and the BCR
achieved on reference systems.
Query
Complexity
Per-core BCR
(Page Compressed )
at MAXDOP 4
Simple 201 MB/s
Average 83 MB/s
Complex 56 MB/s

Simple
SELECT
sum(l_extendedprice * l_discount) as revenue
FROM
lineitem
WHERE
l_discount between 0.04 - 0.01 and 0.04 + 0.01 and
l_quantity < 25
OPTION (maxdop 4)

Average
SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count_big(*) as count_order
FROM
lineitem
WHERE
l_shipdate <= dateadd(dd, -90, '1998-12-01')
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus
OPTION (maxdop 4)

Complex

Page 47


SELECT
100.00 * sum(case
when p_type like 'PROMO%'
then l_extendedprice*(1-l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as
promo_revenue
FROM
lineitem,
part
WHERE
l_partkey = p_partkey
and l_shipdate >= '1995-09-01'
and l_shipdate < dateadd(mm, 1, '1995-09-01')
OPTION (maxdop 4)


Factors Affecting Query Consumption Rate
Not all queries will achieve the Maximum CPU Consumption Rate (MCR) or Benchmark
Consumption Rate (BCR). There are many factors that can affect the consumption rate for a
query. Queries simpler than the workload used to generate the consumption rate will have
higher consumption rates, and more complex workloads will have lower consumption rates.
Many factors that can affect this complexity and the consumption rate, for example:
Query complexity: The more CPU intensive the query is, for example, in terms of
calculations and the number of aggregations, the lower the consumption rate.
Sort complexity: Sorts from explicit order by operations or group by operations will
generate more CPU workload and decrease consumption rate. Additional writes to
tempdb caused by such queries spilling to disk negatively affect consumption rate.
Query plan complexity: The more complex a query plan, the more steps and
operators, the lower the CPU consumption rate will be, because each unit of data is
processed through a longer pipeline of operations.
Compression: Compression will decrease the consumption rate of data in real terms,
because consumption rate by definition is measured for queries that are CPU bound,
and decompression consumes CPU cycles. However, the increased throughput benefits
usually outweigh additional CPU overhead involved in compression, unless the workload
is highly CPU intensive. When comparing consumption rates for compressed and
uncompressed data, take the compression factor into account. Another way of looking at
this is to think of the consumption rate in terms of rows per second.
Data utilization: Discarding data during scans (for example, through query projection
and selection) is quite an efficient process. Queries that use all the data in a table have
lower consumption rates, because more data is processed per unit data throughput.

You might also like