
Designing Microsoft® SQL Server™ 2005 Databases
Delivery Guide
Course Number: 2782A

Beta
Information in this document, including URL and other Internet Web site references, is subject to
change without notice. Unless otherwise noted, the example companies, organizations, products,
domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious,
and no association with any real company, organization, product, domain name, e-mail address,
logo, person, place or event is intended or should be inferred. Complying with all applicable
copyright laws is the responsibility of the user. Without limiting the rights under copyright, no
part of this document may be reproduced, stored in or introduced into a retrieval system, or
transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or
otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual
property rights covering subject matter in this document. Except as expressly provided in any
written license agreement from Microsoft, the furnishing of this document does not give you any
license to these patents, trademarks, copyrights, or other intellectual property.

© 2004 Microsoft Corporation. All rights reserved.

Microsoft, Active Directory, BizTalk, Excel, MSDN, PowerPoint, Visio, Visual Basic, Visual C#,
Visual SourceSafe, Visual Studio, Windows, and Windows Server are either registered trademarks
or trademarks of Microsoft Corporation in the U.S.A. and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their
respective owners.

Module 0: Introduction
Time estimated: 30 minutes
Presentation: 30 minutes
Table of contents
Module 0: Introduction
At the end of this module, you will be able to describe this course and its purpose.
Introduction
Introduce yourself, and provide a brief description of your background.
Course Materials
Identify and describe the course materials.
Microsoft Learning Product Types
Facilities
Inform students of class logistics and rules for the training site.
Microsoft Learning
Describe other Microsoft learning courses related to this one.
Microsoft Certification Program
About This Course
Course Outline
Describe the course outline.
Setup
Describe the student computer configuration for this course.
Demonstration: Using Virtual PC
Describe how to use Virtual PC.
What Matters Most?
Introduction to Adventure Works Cycles
Describe the fictitious company used in the lab scenarios.

At the end of this module, you will be able to describe this course and its purpose.

Introduction

Introduce yourself, and provide a brief description of your background.

Course Materials

Identify and describe the course materials


Course kit
The following materials are included with your kit:
Name card
Write your name on both sides of the name card.
Student workbook
The student workbook contains the material covered in class, in addition to the hands-on lab
exercises.
Student Materials compact disc
The Student Materials compact disc (CD) contains the Web page that provides links to resources
pertaining to this course, including additional reading, review and lab answers, lab files,
multimedia presentations, and course-related Web sites. To open the Web page, insert the Student
Materials CD into the CD-ROM drive, and then, in the root directory of the CD, double-click
Autorun.exe or Default.htm.

Course evaluation
You will have the opportunity to provide feedback about the course, training facility, and
instructor by completing an online evaluation near the end of the course.
Document conventions
The following conventions are used in course materials to distinguish elements of the text. An illustrative syntax statement follows the table.

Convention      Use

Bold            Represents commands, command options, and syntax that must be typed
                exactly as shown. It also indicates commands on menus and buttons,
                dialog box titles and options, and icon and menu names.

Italic          In syntax statements or descriptive text, indicates argument names
                or placeholders for variable information. Italic is also used for
                introducing new terms, for book titles, and for emphasis in the text.

Title Capitals  Indicate domain names, user names, computer names, directory names,
                and folder and file names, except when specifically referring to
                case-sensitive names. Unless otherwise indicated, you can use
                lowercase letters when you type a directory name or file name in a
                dialog box or at a command prompt.

ALL CAPITALS    Indicate the names of keys, key sequences, and key combinations,
                for example, ALT+SPACEBAR.

try/Try         Keywords in C# and Microsoft® Visual Basic® .NET are separated by a
                forward slash when casing differs.

monospace       Represents code samples or examples of screen text.

[ ]             In syntax statements, brackets enclose optional items. For example,
                [filename] in command syntax indicates that you can choose to type a
                file name with the command. Type only the information within the
                brackets, not the brackets themselves.

{ }             In syntax statements, braces enclose required items. Type only the
                information within the braces, not the braces themselves.

|               In syntax statements, separates an either/or choice.

►               Indicates a procedure with sequential steps.

...             In syntax statements, specifies that the preceding item may be
                repeated. It also represents an omitted portion of a code sample.
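For example, a syntax statement for a purely hypothetical command (not an actual SQL Server statement) combines several of these conventions:

    MERGEDATA {source_table} TO {target_table} [WITH {CHECK | NOCHECK}] [,...n]

Here, MERGEDATA must be typed as shown, source_table and target_table are placeholders that you replace, the WITH clause is optional, and if you use it you must choose either CHECK or NOCHECK.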

Providing feedback
To provide additional comments or feedback about the course, send e-mail to
support@mscourseware.com. To ask about the Microsoft Certification program, send e-mail to
mcphelp@microsoft.com.

Microsoft Learning Product Types



Microsoft Learning product types
Microsoft Learning offers four instructor-led Official Microsoft Learning Product types. Each type is
specific to a particular audience and level of experience. The various product types also tend to suit
different learning styles. These types are as follows:
Courses are for information technology (IT) professionals and developers who are new to a particular
product or technology and for experienced individuals who prefer to learn in a traditional classroom
format. Courses provide a relevant and guided learning experience that combines lecture and practice
to deliver thorough coverage of a Microsoft product or technology. Courses are designed to address
the needs of learners engaged in the planning, design, implementation, management, and support
phases of the technology adoption lifecycle. They provide detailed information by focusing on
concepts and principles, reference content, and in-depth, hands-on lab activities to ensure knowledge
transfer. Typically, the content of a course is broad, addressing a wide range of tasks necessary for the
job role.
Workshops are for knowledgeable IT professionals and developers who learn best by doing and
exploring. Workshops provide a hands-on learning experience in which participants can use Microsoft
products in a safe and collaborative environment based on real-world scenarios. Workshops are the
learning products in which students learn by doing, through scenario-based and troubleshooting
hands-on labs, targeted reviews, information resources, and best practices, with instructor facilitation.

Clinics are for IT professionals, developers, and technical decision makers. Clinics offer a detailed
presentation that may describe the features and functionality of an existing or new Microsoft product
or technology, provide guidelines and best practices for decision making, and/or showcase product
demonstrations and solutions. Clinics focus on how specific features will solve business problems.
Stand-alone Hands-On Labs provide IT professionals and developers with hands-on experience with
an existing or new Microsoft product or technology. Hands-on labs provide a realistic and safe
environment to encourage knowledge transfer by learning through doing. The labs provided are
completely prescriptive so that no lab answer keys are required. There is very little lecture or text
content provided in hands-on labs, aside from lab introductions, context setting, and lab reviews.

Facilities

Inform students of class logistics and rules for the training site

Microsoft Learning

Describe other Microsoft learning courses related to this one


Introduction
Microsoft Learning develops Official Microsoft Learning Products for computer professionals who
use Microsoft products and technologies to design, develop, support, implement, or manage solutions.
These learning products provide comprehensive, skills-based training in instructor-led and online
formats.
Related courses
Each course relates in some way to another course. A related course might be a prerequisite, a
follow-up course in a recommended series, or a course that offers additional training.
Other related courses might become available in the future, so for up-to-date information about
recommended courses, visit the Microsoft Learning Web site.
Microsoft Learning information
For more information, visit the Microsoft Learning Web site at http://www.microsoft.com/learning/.

Microsoft Certification Program



Introduction
Microsoft Learning offers a variety of certification credentials for developers and IT professionals.
The Microsoft Certified Professional (MCP) program is the leading certification program for
validating your experience and skills, keeping you competitive in today’s changing business
environment.
Related certification exams
This course helps students to prepare for:
MCP certifications
The Microsoft Certification program includes the following certifications.
MCDST on Microsoft Windows®
The Microsoft Certified Desktop Support Technician (MCDST) certification is designed for
professionals who successfully support and educate end users and troubleshoot operating system
and application issues on desktop computers running the Windows operating system.

MCSA on Microsoft Windows Server™ 2003


The Microsoft Certified Systems Administrator (MCSA) certification is designed for
professionals who implement, manage, and troubleshoot existing network and system
environments based on the Windows Server 2003 platform. Implementation responsibilities
include installing and configuring parts of systems. Management responsibilities include
administering and supporting systems.
MCSE on Microsoft Windows Server 2003
The Microsoft Certified Systems Engineer (MCSE) credential is the premier certification for
professionals who analyze business requirements and design and implement infrastructure for
business solutions based on the Windows Server 2003 platform. Implementation responsibilities
include installing, configuring, and troubleshooting network systems.
MCAD
The Microsoft Certified Application Developer (MCAD) for Microsoft .NET credential is
appropriate for professionals who use Microsoft technologies to develop and maintain
department-level applications, components, Web or desktop clients, or back-end data services, or
who work in teams developing enterprise applications. This credential covers job tasks ranging
from developing to deploying and maintaining these solutions.
MCSD
The Microsoft Certified Solution Developer (MCSD) credential is the premier certification for
professionals who design and develop leading-edge business solutions with Microsoft
development tools, technologies, platforms, and the Microsoft Windows DNA architecture. The
types of applications that MCSDs can develop include desktop applications and multiuser,
Web-based, N-tier, and transaction-based applications. The credential covers job tasks ranging from
analyzing business requirements to maintaining solutions.
MCDBA on Microsoft SQL Server™ 2000
The Microsoft Certified Database Administrator (MCDBA) credential is the premier certification
for professionals who implement and administer SQL Server databases. The certification is
appropriate for individuals who derive physical database designs, develop logical data models,
create physical databases, create data services by using Transact-SQL, manage and maintain
databases, configure and manage security, monitor and optimize databases, and install and
configure SQL Server.
MCP
The Microsoft Certified Professional (MCP) credential is for individuals who have the skills to
successfully implement a Microsoft product or technology as part of a business solution in an
organization. Hands-on experience with the product is necessary to successfully achieve
certification.
MCT
Microsoft Certified Trainers (MCTs) demonstrate the instructional and technical skills that qualify
them to deliver Official Microsoft Learning Products through a Microsoft Certified Partner for
Learning Solutions (CPLS).

Certification requirements
Certification requirements differ for each certification category and are specific to the products and
job functions addressed by the certification. The Microsoft Certification Program requires that you
pass rigorous certification exams that provide a valid and reliable measure of technical proficiency
and expertise.

For More Information


See the Microsoft Learning Web site at http://www.microsoft.com/learning/.

You can also send e-mail to mcphelp@microsoft.com if you have specific certification questions.
Acquiring the skills tested by an MCP exam
Official Microsoft Learning Products can help you develop the skills that you need to do your job.
They also complement the experience that you gain while working with Microsoft products and
technologies. However, no one-to-one correlation exists between Official Microsoft Learning
Products and MCP exams. Microsoft does not expect or intend for the courses to be the sole
preparation method for passing MCP exams. Practical product knowledge and experience is also
necessary to pass MCP exams.
To help prepare for MCP exams, use the preparation guides that are available for each exam. Each
Exam Preparation Guide contains exam-specific information, such as a list of the topics on which you
will be tested. These guides are available on the Microsoft Learning Web site at
http://www.microsoft.com/learning/.

About This Course

Describe the audience, prerequisites, and objectives for this course


Description
The purpose of this course is to teach database developers working in enterprise environments to
design databases, using business requirements to guide their decisions (beyond structured third normal
form [3NF] modeling techniques). Students will also learn to incorporate security requirements
throughout their design.
Audience
The audience of this course is professional-level database developers.
Course prerequisites
This course has the following prerequisites:
• Experience with reading user requirements and business-need documents. For example,
development project vision and mission statements or business analysis reports.
• Experience with reading and drawing business process flow charts.
• A basic knowledge of Transact-SQL syntax and programming logic (see the illustrative query after this list).

• Experience with professional-level database design. Specifically, you must fully
understand third normal form (3NF), be able to design a database to 3NF (fully normalized),
and know the tradeoffs when backing out of the fully normalized design (denormalization)
and designing for performance and business requirements. You must also be familiar with
design models, such as Star and Snowflake schemas.
• Basic monitoring and troubleshooting skills.
• A basic knowledge of the operating system and platform. You should be familiar with
how the operating system integrates with the database, what the platform or operating system
can do, and how interaction between the operating system and the database works.
• A basic knowledge of application architecture. You should understand how applications
can be designed in three layers, what applications can do, how interaction between the
application and the database works, and how the interaction between the database and the
platform or operating system works.
• A working knowledge of how to use a data modeling tool.
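
To gauge the level of Transact-SQL fluency assumed, the following query (the table and column names are hypothetical) is the kind you should be comfortable reading and writing before taking this course:

    -- count orders per customer placed since the start of 2005
    SELECT c.CustomerName, COUNT(*) AS OrderCount
    FROM dbo.Customer AS c
    INNER JOIN dbo.SalesOrder AS o
        ON o.CustomerID = c.CustomerID
    WHERE o.OrderDate >= '20050101'
    GROUP BY c.CustomerName
    ORDER BY OrderCount DESC;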

Course objectives
After completing the course, you will be able to:
• Approach database design from a systematic perspective, gather database requirements, and
formulate a conceptual design.
• Analyze and evaluate a logical database design.
• Apply best practices for creating a physical database design.
• Apply best practices when designing for database scalability.
• Apply best practices for designing a database access strategy.
• Use best practices to model database dependencies.

Course Outline

Course outline
Module 1: Approaching Database Design Systematically focuses on the guidelines and best practices
that you should use for gathering requirements and formulating a conceptual model.
Module 2: Modeling a Database at the Logical Level teaches you how to choose the relational model
and emphasizes the best practices for designing it. The best practices covered in the
module include normalization for online transaction processing (OLTP) systems and designing star
and snowflake schemas for relational dimensional systems that will eventually support online
analytical processing (OLAP) databases.
Module 3: Modeling a Database at the Physical Level provides you with the knowledge of how to
translate the logical model into SQL Server 2005. The module covers the guidelines and
considerations for designing physical database objects, constraints, database and server options, and
database security. The module also covers guidelines for data migration for existing or legacy data.
Module 4: Designing Databases for Performance covers the guidelines and best practices for revising
a physical design to include performance and optimization considerations.
Module 5: Designing a Database Access Strategy covers the physical design of database objects (such
as stored procedures and functions) that do not define data but allow users to access it. The
module covers the guidelines for designing secure data access, user-defined functions (UDFs),
and stored procedures.

Module 6: Modeling Database Dependencies covers the best practices for modeling local and remote
dependencies of the database.

Setup

Describe the student computer configuration for this course


Virtual PC configuration
In this course, you will use Microsoft Virtual PC 2004 to perform the hands-on practices and labs.
There is one virtual machine for each module, and the virtual machines are named 2782A-MIA-SQL-nn,
where nn is the module number.

Important
If, when performing the hands-on activities, you make any changes to the virtual machine and do not
want to save them, you can close the virtual machine without saving the changes. This will take the
virtual machine back to the most recently saved state. To close a virtual machine without saving the
changes, perform the following steps:
1. On the virtual machine, on the Action menu, click Close.
2. In the Close dialog box, in the What do you want the virtual machine to do? list, click Turn off
and delete changes, and then click OK.

Software configuration
The classroom computers use the following software:
• Microsoft Windows Server 2003
• Microsoft SQL Server 2005
• Microsoft Office 2003
• Microsoft Visual Studio 2005 Team Developer Edition
• Microsoft Visio® for Enterprise Architects

Course files
There are files associated with the practices and labs in this course. The files are located on
drive D of each student computer.
Classroom setup
Each classroom computer will have the same virtual machine configured in the same way. Windows
Server 2003 is installed in a workgroup, and has the server name MIA-SQL. There is one instance of
SQL Server 2005 installed named SQLINST1.
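As a quick check (not an official lab step), once you are connected to the instance in SQL Server Management Studio or sqlcmd, the following queries should identify the classroom configuration:

    SELECT @@SERVERNAME;                    -- expected: MIA-SQL\SQLINST1
    SELECT SERVERPROPERTY('InstanceName');  -- expected: SQLINST1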
Course hardware level
To ensure a satisfactory student experience, Microsoft Learning requires a minimum equipment
configuration for trainer and student computers in all Microsoft Certified Partner for Learning
Solutions (CPLS) classrooms in which Official Microsoft Learning Products are used. This course
requires computers that meet or exceed the following specification:

Component        Requirement

Processor        Pentium III or equivalent personal computer with a processor speed of 1 GHz or greater

Hard disk        At least 18 GB at 7,200 RPM; larger drives are recommended where storage of multiple
                 Virtual PC courses is desired

RAM              At least 1 GB

DVD/CD           CD-ROM or DVD drive

Network adapter  10/100 Mb/s, full duplex (required)

Sound card       Required

Video adapter    At least 4 MB of video memory

Monitor          Super VGA monitor (17 inch / 43 cm)

Ports            PCI bus meeting the PCI 2.1 specification

Demonstration: Using Virtual PC

Describe how to use Virtual PC


Virtual PC demonstration
In this demonstration, your instructor will help familiarize you with the Virtual PC environment in
which you will work to complete the practices and labs in this course. You will learn:
• How to start Virtual PC.
• How to start a virtual machine.
• How to log on to a virtual machine.
• How to switch between full screen and window modes.
• How to distinguish the virtual machines that are used in the practices for this course.
• That the virtual machines can communicate with each other and with the host computer, but
they cannot communicate with computers that are outside of the virtual environment. (For
example, no Internet access is available from the virtual environment.)
• How to close Virtual PC.
Keyboard shortcuts
While working in the Virtual PC environment, you might find it helpful to use keyboard shortcuts. All
Virtual PC shortcuts include a key that is referred to as the HOST key or the RIGHT-ALT key. By
default, the HOST key is the ALT key on the right side of your keyboard. Some useful shortcuts
include:

• RIGHT-ALT+DELETE to log on to the Virtual PC.
• RIGHT-ALT+ENTER to switch between full-screen and window modes.
• RIGHT-ALT+RIGHT ARROW to display the next virtual machine.
For more information about using Virtual PC, see Virtual PC Help.

What Matters Most?

What matters most in this course
This table captures the most important information that you should take away from this course.

Category: Most important conceptual knowledge and understanding
• The architecture of a database server solution.
• The tradeoffs and choices that need to be made when planning the infrastructure.
• Methods of archiving and backing up your data, and the benefits of different methods.

Category: Most important problems to solve or skills to demonstrate in the classroom
• Given a list of constraints and business requirements, interpret business requirements and design
a disaster recovery plan.
• Evaluate an existing database server infrastructure and recommend improvements.
• Communicate what the business will and will not get within a specified budget to non-technical
business decision makers.
• Evaluate a set of requirements and design a server infrastructure and servers to meet business needs.
• Evaluate an existing database system and recommend improvements.
• Design a database system’s backup and recovery strategy.

Category: Most important products to create during the course
• Database server infrastructure design document
• Database standards, conventions, and policies document (could be one or more than one document)
• Disaster recovery plan document

Category: Dispositions (attitudes, interests, beliefs) that might contribute to success on the job
Professional database administrators who design Microsoft SQL Server 2005 infrastructures should:
• Demonstrate the ability to extract requirements from different people.
• Be willing to try different strategies, technologies, and so on.
• Be meticulous (attentive to details).

Tips for getting the most out of this course


If, as the course progresses, you feel that you have not adequately learned something mentioned in this
table, ask questions of the instructor and your peers until you are satisfied that you understand the
concepts and know how to do these important tasks.
After class each day, review the materials, highlight key ideas and notes, and create a list of questions
to ask the next day.
Your instructor and peers will be able to suggest additional, up-to-date resources for more
information. Ask them about additional resources and record their ideas so that you can continue
learning after this course is over.
As soon as possible after this course is over, share this information with your manager, and discuss
next steps. Your manager should understand that the sooner you apply these skills on the job, the more
likely you will be to remember them long term. However, a person cannot learn everything that there
is to know about this complex job task in a 2-day course, so schedule some additional time to continue
your learning by reading the supplementary materials mentioned throughout the course, offered by
your instructor and peers during the course, and included on the Student Materials compact disk.

Introduction to Adventure Works Cycles

Describe the fictitious company used in the lab scenarios


Your role in Adventure Works Cycles
Throughout this course, you will perform the role of a lead database designer in Adventure Works
Cycles. You will perform database designer tasks based on the instructions and specifications given to
you by the company’s management team. You will work on a new project named HR VASE
(Vacation and Sick Leave Enhancement) that is targeted towards enhancing the current Human
Resources (HR) system. You will perform design changes to the company’s AdventureWorks
database. The major goals of the HR VASE project are:
• Provide managers with current and historical information about employee vacation and sick
leave data.
• Provide individual employees the means to view their vacation and sick leave balances.
• Provide certain workers in the HR department the ability to view and update employee salary
data.
• Provide certain workers in the HR department the ability to view and update employee sick
leave and vacation data.
• Provide the HR manager the ability to view and update all of the data.
• Standardize employee job titles.

Module 1: Approaching Database Design Systematically
Time estimated: 105 minutes
Table of contents
Module 1: Approaching Database Design Systematically
Approach database design from a systematic perspective, gather database requirements, and
formulate a conceptual design.
Lesson 1: Overview of Database Design
Key Steps in the Database Design Process
Best Practices for Database Design
Best Practices for Managing the Scope of a Database Design Project
Discussion: Lessons Learned in Database Design
Lesson 2: Gathering Database Requirements
Strategies for Identifying Database Requirements
Best Practices for Documenting Database Requirements
Considerations for Modifying an Existing Database
Lesson 3: Creating a Conceptual Database Design
Considerations for Choosing a Conceptual Modeling Methodology
Guidelines for Conceptual Modeling Using ORM
Guidelines for Conceptual Modeling Using ER
Guidelines for Conceptual Modeling Using UML
Lab 1: Beginning the Database Design Process
Exercise 1: Gathering Database Requirements
Exercise 2: Creating a Conceptual Database Design

Module objective:
After completing this module, students will be able to:

Approach database design from a systematic perspective, gather database requirements, and formulate
a conceptual design.

Introduction
Quality in a data model does not happen by accident. A good database designer insists on quality
control when creating the design and gathering database requirements.
The conceptual design is the first product in the design process after gathering requirements. In the
conceptual design phase, you focus on conceptual objects that define your business data objects.
However, you should keep the user’s perspective in mind and not commit to a particular type of
database system.
In this module, you will acquire the skills to approach database design with a systematic perspective.
A systematic approach involves formulating your database design process, following guidelines on
how to gather and document database requirements, and following best practices when formulating a
conceptual design.

Lesson 1: Overview of Database Design

Lesson objective:
After completing this lesson, students will be able to:

Apply a systematic approach to database design.


Introduction
A successful database design requires you to do more than just create a database diagram. To create a
successful database design, you must follow a systematic approach including the following steps:
• Create a conceptual model.
• Transform the conceptual model into a logical model.
• Implement the design by using a physical data model.
In this lesson, you will learn the key steps involved in designing a database and the best practices to be
adopted during a database design. This lesson will also cover the best practices for managing the
scope of a database design project. At the end of the lesson, you will share your experiences with
other database professionals about the design process and the consequences of approaching database
design without a clear process.

Key Steps in the Database Design Process

Process: Identify the key steps that are included in the process of designing a database.
Introduction
When progressing from gathering user requirements to a final physical database model, you will find
that it is risky to take shortcuts. When important steps are skipped, critical pieces of information are
missed or lost. Consequently, the resulting database will require you to make frequent changes to meet
the requirements.
Such modifications to the database can create great uncertainty for the rest of your development team.
Often, the database is the application's foundation, and changes to the database could require changes
to higher application layers that rely upon the database. Therefore, you should follow every step of the
database design process. If you skip any step, you should have valid reasons for doing so.
Quoting a Boehm and Papaccio study, Steve McConnell states in Rapid Development that “getting a
requirement in the first place costs 50 to 200 times less than waiting until construction or maintenance
to get it right (Boehm and Papaccio 1988). The typical project experiences a 25% change in
requirements” (page 62).
Key steps in the database design process
The following are the key steps in the database design process:
• Identify and document database requirements
Research, formulate, and document requirements of customers and users. Do not overlook
requirements captured in previous project phases.

• Create a conceptual model


Define the essential relationships and flow of business data objects based on the requirements
gathered. The conceptual model is based on users’ perspectives and the views of customers.
• Transform the conceptual model into a logical model
Design the database from the development team's perspective. You need to decide the type of
database management system (DBMS)—relational, object/relational, or other—that you will
use. If you choose a relational DBMS, identify the functional dependencies, and ensure an
appropriate degree of normalization.
• Implement a physical model from the logical model
The physical model is a refinement of the logical model based on the workings of the DBMS
that you will actually use. Often, the logical design overlaps the physical model, but the
physical model helps you to determine how you will actually implement data types, table
structures, and physical constraints (see the sketch after this list).
• Refine the physical model
It is often difficult to get the physical model right the first time. Usually, you need to refine
the design to adapt to requirements such as performance, security, auditing, availability, and
scalability.
• Build a prototype and test the database design
A prototype helps you validate and test your model on a very small scale. Any problems that
you experience with a prototype can help you refine your design. The prototype can also
serve as the foundation for the rest of your testing plans. When you reach a stage at which the
physical design looks stable, you should develop a full test version of the database that can be
used by application developers.
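As a concrete illustration of the end of this progression, the following sketch shows how a conceptual "a customer places orders" relationship might eventually be implemented in the physical model. The table and column names are hypothetical and kept minimal; a real design would be derived from your own logical model.

    CREATE TABLE dbo.Customer
    (
        CustomerID   int IDENTITY(1,1) NOT NULL,
        CustomerName nvarchar(100)     NOT NULL,
        CONSTRAINT PK_Customer PRIMARY KEY (CustomerID)
    );

    CREATE TABLE dbo.SalesOrder
    (
        SalesOrderID int IDENTITY(1,1) NOT NULL,
        CustomerID   int               NOT NULL,
        OrderDate    datetime          NOT NULL
            CONSTRAINT DF_SalesOrder_OrderDate DEFAULT (GETDATE()),
        CONSTRAINT PK_SalesOrder PRIMARY KEY (SalesOrderID),
        -- the conceptual "places" relationship becomes a foreign key
        CONSTRAINT FK_SalesOrder_Customer
            FOREIGN KEY (CustomerID) REFERENCES dbo.Customer (CustomerID)
    );

Notice how decisions that do not exist at the conceptual level, such as data types, key constraints, and defaults, appear only at the physical level.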
For more information
“Modeling a Database at the Logical Level” and “Modeling a Database at the Physical Level” are
covered later in the course.

Best Practices for Database Design

Principle: Use best practices to effectively choose tools and techniques when planning the design
phase of database development.
Introduction
Different database designers adopt different approaches to designing a database. Sharing your
experiences with other designers and learning from their experiences will help you avoid errors
during the design.
To minimize errors, you can also follow certain best practices that have proven beneficial to designers.
Best practices for database design
Consider the following recommended practices in your own database design process:
• Ensure that you have a clear definition of the database portion of the project scope.
• Verify that the database design goals are within the appropriate project scope.
• Choose a database design methodology. Every design methodology has advantages and
disadvantages. To create the conceptual view, you should analyze the advantages and
disadvantages of the different methodologies, and choose one that meets your requirements.
• Choose a professional data-modeling tool that supports the chosen database design
methodology. Ensure that you become familiar with the tool before beginning the actual
modeling. This will help you avoid expensive errors, such as an inappropriate definition of
the model.

For example, if you are not familiar with the tool, you might not enter all the required data
into the tool, thereby creating an incomplete model. Consider another example. You discover
that the tool you are using is not appropriate for the design model you have chosen, and that
you need to use a different one. Changing the data-modeling tool in the middle of a design
process can prove to be costly for your design.
• Always begin with a conceptual model; do not jump into the logical or physical model
without the conceptual model. The conceptual model is closer to requirements and makes the
problem more transparent.
• Use a source control system for documents and database scripts. A source control system
helps you track changes and recover from costly mistakes.
• Ensure that the application features of the existing database are not lost in the new design.
Normally, users dislike applications that have fewer features than previous versions.
• Weigh business needs and priorities against the ideal design and its quality. Sometimes, it is
reasonable to break the rules to meet the users’ requirements or business rules. For example,
you may need to provide some denormalized columns for performance reasons. You may also
discover that you need to interface a new application design with an existing database that
cannot be changed. To do so, you will need to incorporate tables that are not well designed.
Note
Microsoft® Office Visio® for Enterprise Architects is a tool that provides support for data modeling at
all levels and will be used as the sample design tool for this course.

For more information


In the lesson Creating a Conceptual Database Design, you will learn more about the guidelines for the
three different conceptual design methodologies.

For more information


For more information about MSF best practices, see Course 1846, Microsoft Solutions Framework
Essentials.

Best Practices for Managing the Scope of a Database Design Project

Principle: Manage the scope of a database design project.


Introduction
Unmanaged scope change is one of the main obstacles to the success of IT projects, and it is the main
obstacle in the database design process as well.
The design scope places a boundary around the database design task, containing details of what will
be covered and what will not be covered. It is critical that you manage the scope of your database
design project. If you fail to do so, the risk of your project failing increases significantly.
Best practices
Consider the following best practices when managing the scope of a database design project:
• Obtain an authoritative statement of the project scope
An authoritative statement of the project scope contains details of the customer’s
requirements for the project and the problems that you aim to solve with the project.
You could be solely responsible for creating the project scope, or you could be actively
involved in its creation. After the project scope is created, you will need to present it to the
stakeholders for their approval. The project scope statement helps you stay focused on the
business problem.

• Ensure that the scope is realistic, feasible, and consistent


In the planning phase, you need to seek answers to the “what, how, and when it will be built”
questions. You must ensure that the project scope is realistic and feasible.
• Isolate the database portion of the project scope
The database portion of the project scope will contain details of what will and what will not
be in the database. Business rules are a good example in this context. You should not try to
force all the business rules into the database.
• Clearly identify scope boundaries
The scope boundaries will contain details of whether you should solve a problem in a
business component or in the database. Often, your software architect has the choice of
solving the problem in the database or in a business component.
• Identify the stakeholders and decision makers
Stakeholders and decision makers will measure the effectiveness and success of your work,
regardless of their technical background. You need to ensure that the scope meets the
requirements of the stakeholders and decision makers.
• Implement a change-control process
Change requests occur in every real-world project. If changes are not allowed, the finished
product could meet all the documented requirements, but might not solve the actual business
need. However, if change requests are not controlled, users may increase project requirements
considerably beyond what was originally foreseen. As a result, there might be risks to the
product quality or schedule.
You must make stakeholders aware of the fact that adding features to the project scope will
cost time and resources.
• Identify database deliverables
Milestones and deliverables enable you to show your customers the progress of your projects,
thereby increasing the project’s visibility to customers and stakeholders. Deliverables are the
proof of the work done and are the basis for the next phase.
• Identify risk factors
Risk is inherent in any IT project. It is your responsibility to identify the risks early, because
it forms the foundation for proactive risk management. By identifying risk factors, you can
anticipate and avoid problems rather than react to them.
• Compile all available scoping information into a single document
A single document with all available scoping information reduces the risk of unmanaged
scope changes. A single document also helps you in version control and change management.

Note
MSF offers you two tools to avoid scope creep (the unmanaged expansion of a project’s scope)—
Trade-off Triangle and Project Trade-off Matrix. The Trade-off Triangle represents the relationship
between resources, schedule, and features. The Project Trade-off Matrix is an agreement between the
development team and the customer on the default priorities when making trade-off decisions.

For more information


For further information about MSF Risk Management Discipline, see Module 3: “Managing Product
Risks” in Course 1846, Microsoft Solutions Framework Essentials.

Discussion: Lessons Learned in Database Design

Principle: Learn important principles of database design by discussing design experiences and
approaches with other database developers.
Discussion questions
Q What are the characteristics of a successful database design?
A Answers will vary, but characteristics will generally include a well-managed project scope,
a sufficient budget, stakeholder involvement, and, most often, a successful requirements
document.
Q What kinds of problems or failures have you experienced in database design?
A Answers will vary.
Q What kinds of solutions did you formulate for the problems you experienced in the design
process?
A Answers will vary.

Q Do you agree with the importance of following steps in a sequence, from conceptual to
logical, and then to a physical design?
A Answers will vary. Some of the students are experienced but not formally trained, and
will have their own ways to articulate various steps. For example, some students might
"do everything in their head" and then jump to a physical design. This is not a good
approach at an enterprise level, but it is a valid practice for some students, and could
produce important inputs for the discussion.

Lesson 2: Gathering Database Requirements

Lesson objective:
After completing this lesson, students will be able to:

Devise an appropriate strategy for gathering database requirements for a specified project.
Introduction
To create a good database design, it is essential to have a clear understanding of your requirements.
This lesson will provide guidelines to help you identify and document these requirements. Database
designers know that every aspect of the design process depends on the database requirements captured
in the requirements document. Therefore, you must ensure that your stakeholders clearly understand
the documented requirements and validate them.

Strategies for Identifying Database Requirements

Principle: Apply a systematic strategy for gathering scope information for a database design project.
Introduction
To identify and capture database requirements, you will have to analyze information gathered through
different techniques. During this process, remember to capture the functional requirements (from
customers and users), as well as the operational requirements. Operational requirements include
configuration information and metadata.
Strategies for gathering scope information for a database design project
The following strategies should be adopted while gathering database requirements:
• Identify and interview domain experts
There are several techniques to gather information from domain experts, including interviews,
focus groups, and shadowing. You must include people who really know how the business
works.
• Isolate business objects, rules, and data flows
Business objects are real-world concepts, such as customers and invoices, and do not include
databases and windows applications. Business rules are constraints in which a business
structure is stated (for example, discounts must be under 15 percent). Data flows are
processes that represent interactions between external entities and the system (for example,
the system prints the invoice). In Unified Modeling Language (UML), data flows are closely
correlated to use cases and use case narratives.

You will need to isolate each concept: business objects, rules, and data flows. Business
objects are particularly interesting to capture from the database point of view. One popular
strategy (based on the concepts by Russell Abbott in Program Design by Informal English
Descriptions) is to identify nouns and their associated verbs. In this context, nouns frequently
become objects; verbs become services; and full sentences with subject, verb, and direct
objects indicate the presence of a business object. Applying the same process to the text of
data flows can be valuable in identifying objects and services that may not have been
identified when documenting business objects.
• Establish a consistent naming standard for business objects and rules
Naming conventions help you to identify objects, thus facilitating rapid troubleshooting and
debugging. They also help you remove ambiguity and inconsistencies that may exist between
different business groups served by the project. For example, one business group may refer to
customers and another group to clients, and yet refer to the same object—someone who
purchases a service. It is a good practice to use one term for an object consistently throughout
your process.
• Determine the expected transaction rate and projected growth requirements
When you capture database requirements, remember to include operational requirements,
such as those related to performance and scalability. If you do this at a later stage, it will
result in frequent database redesign.
• Estimate current capacity and projected growth requirements
As part of the strategy to capture database requirements, you should estimate the database
capacity needs, archiving needs, and distribution needs (a sample estimate follows this list).
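
For example, a back-of-the-envelope capacity estimate (the numbers here are hypothetical) might look like this:

    500 orders/day x 2 KB per order row          = ~1 MB of order data per day
    1 MB/day x 250 business days/year            = ~250 MB in the first year
    With 20% annual growth over three years:
    250 MB + 300 MB + 360 MB                     = ~910 MB of order data

Even a rough estimate like this tells you early whether you must plan for archiving, partitioning, or additional storage.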
For more information
For further information about transaction rates and projected growth, see course 2786, Designing
Microsoft SQL Server 2005 Infrastructure and Services.

Best Practices for Documenting Database Requirements

Principle: Apply best practices when documenting database requirements.


Introduction
The requirements document you prepare acts as a contract between your project and your users, and it
helps you to define the scope of the project’s work. Changes to requirements are often requested
later in the project; the requirements document makes clear that such late changes are far more
expensive than requirements captured correctly in the first place.
Best practices
The following are the best practices you may adopt for documenting database requirements:
• Ensure that meeting the requirements will satisfy the database portion of the project
scope
The database must be fairly represented in the requirements document. Sometimes, the focus
is on services and processes that the solution provides, and this tends to minimize data
persistence.

• Document requirements concisely and accurately


The document should be brief enough to retain the interest of non-technical readers and
precise enough to provide developers with all of the necessary information. Ensure that all
requirements are documented. Making changes to database requirements at a later stage might
be much more expensive than in an earlier stage.
In his book, Rapid Development, Steve McConnell states that "A defect that isn’t detected
upstream (during requirements or design) will cost from 10 to 100 times as much to fix
downstream (during testing)" (page 72).
• Clarify and revise ambiguous or unclear requirements
With the help of the information gathered from the requirements document, you can act as a
bridge between the users and the developers. Users state in the document what they expect
from the software (users’ point of view), and developers state what the software will do
(developers’ point of view). The language must be clear for both users and developers.
• Make the requirements document readable for stakeholders
When a requirements document is not readable, it becomes useless for stakeholders.
Remember that an excess of information, especially if written in a highly technical manner,
might mean that few stakeholders will read the document.
• Obtain explicit agreement from all stakeholders
The key word here is all. Different stakeholders will have different perspectives of the
software. It is important for all of them to agree with the content of the document.

Considerations for Modifying an Existing Database

Principle: Identify important facts about an existing database to assess its impact on the requirements
for a database design project.
Introduction
A new database design provides you a lot of freedom in the design process. However, most of the
projects you work on will require you to modify existing databases. As a result, your options are
restricted.
New questions arise when you work with existing databases, such as: Where is the documentation? Is
it up-to-date? Who really understands it? Who owns the data? What changes can be made? What
changes should not be made? How should data be migrated?
Considerations for modifying an existing database
To answer the aforementioned questions and to cope with the challenges posed, when modifying an
existing database, take into account the following considerations:
• Identify database requirements related to the current project
Remember that there will be certain new functions in the software that you are creating. You
must keep the new functionality in mind.
• Review and validate existing database documentation
The rule here is: Trust, but check. Review the documentation because it will help you
understand the design, but always check the actual implementation. Often, software updates
are not reflected in the documentation.

• Reverse engineer the existing database design if required


It is probably easier to create documentation than to rectify mistakes committed because of
inadequate documentation. If there is no documentation, you can reverse engineer the current
database design to create the documentation.
• Develop a deployment strategy
Deploying an application that modifies existing databases is a great risk to the project’s
success. This risk should be appropriately managed. Two common alternatives are:
• Modify in place. In this option, one application replaces the other. The production
environment system is turned off (users are disconnected); and SQL scripts, batch
scripts, and Server Integration Services packages are used to upgrade the database
schema.
Use this option when changes are minimal, fully tested, and it is too expensive to run
both systems in parallel. You must include a rollback or disaster recovery plan in case the
deployment fails (a minimal script sketch follows this list).
• Run in parallel. In this option, the new application is introduced alongside the old one
rather than replacing it immediately. Both systems run in parallel for a short period of time.
When the stakeholders are confident of the stability and performance of the new
system, the old system is switched off. This is usually the safest option, but is also
the most expensive. When systems are running in parallel, ensure that you create a
strict timeline with specific dates indicating when to stop the previous system. This is
because a parallel run places a huge workload on the users. Also, synchronizing data
between two parallel systems can be complex and expensive.
• Plan for migrating data to the new design
When you are building a new solution, do not build new functionality without considering how to migrate the existing data. Business users often assume that existing data will be available in the new system. The project scope should clearly state whether data migration is included. If it is, you must schedule time to determine which data will be migrated, how it will be migrated, and when the migration will occur.
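The following T-SQL sketch illustrates a modify-in-place upgrade combined with a simple data migration. It is illustrative only: the dbo.Employee table and the SickLeaveHours and SickLeaveBalance columns are hypothetical names, and a real deployment would also include backups and a tested disaster recovery plan.

-- Upgrade script sketch: runs while users are disconnected.
BEGIN TRY
    BEGIN TRANSACTION;

    -- Schema change required by the new application.
    ALTER TABLE dbo.Employee ADD SickLeaveBalance INT NULL;

    -- Migrate existing data into the new column. Dynamic SQL is used
    -- because the new column does not exist when this batch is compiled.
    EXEC (N'UPDATE dbo.Employee SET SickLeaveBalance = SickLeaveHours;');

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Rollback plan: any failure leaves the old schema intact.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    PRINT ERROR_MESSAGE();
END CATCH;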


Lesson 3: Creating a Conceptual Database Design

Lesson objective:
After completing this lesson, students will be able to:

Formulate requirements into a conceptual model that serves as a basis for defining entities, attributes,
and relationships.
Introduction
Conceptual database design is the process of analyzing customer and user perspectives and creating a
high-level representation of the solution. The purpose of a conceptual design is to formalize business
requirements, isolate them, and formulate them into general statements. You need to use these general
statements to check whether the conceptual model matches the database requirements already
gathered.
This lesson provides an overview of the guidelines for producing a conceptual model, and for using Object Role Modeling (ORM), Entity-Relationship (ER) modeling, and Unified Modeling Language (UML) for conceptual models.


Considerations for Choosing a Conceptual Modeling Methodology

Principle: Choose the most appropriate conceptual model for a given project.
Introduction
A conceptual model helps you understand the business problem to be solved and formalize the
business, customer, and user requirements.
When producing a conceptual model and transforming it later into a logical and physical model, it is
important that you choose a standard design methodology. The most prominent methodologies in
practice are ORM, ER, and UML.
Considerations for choosing a methodology
When choosing a conceptual design methodology, ensure that it meets the following requirements:
• The methodology supports database design at all levels
As discussed in the previous lesson, the process of designing a database requires you to
perform certain steps that will help you move from the user’s perspective of the solution to
that of the development team. Remember, your methodology must support the design at all
levels.
• Your data-modeling tools must support the methodology
Software and database designers use data-modeling tools that help them to create the
diagrams and models to share with the rest of the team. Therefore, it is critical that the tool
you choose supports you during the entire process.


• Your methodology is acceptable to project sponsors


Diagrams of the conceptual models as well as the conceptual models themselves provide a
communication channel between users, customers, and database designers. Therefore, you
need to ensure that the chosen design methodology is acceptable to project sponsors.
Advantages and disadvantages of a conceptual design methodology
The accompanying table compares the methodologies (ORM, ER, and UML) in terms of their advantages and disadvantages.


Guidelines for Conceptual Modeling Using ORM

Principle: Apply guidelines to produce a conceptual model by using ORM.


Introduction
ORM is a conceptual database design methodology that enables users and designers to express
information as objects and to explore how they relate to other information objects.
ORM Guidelines
Follow these guidelines when working on a conceptual model by using ORM:
• Identify elementary facts
The most important step in conceptual schema design is to transform information examples
into elementary facts. Consider the following example: “Claus Hansen attends a database
design course.” This information is transformed into an elementary fact during the conceptual
schema design.
• Determine fact types
A database designer creates a generalization of the elementary facts and expresses them in
fact types, and then applies the necessary quality checks with the domain expert. An example
of a fact type is “Student attends a course.”


• Determine database objects and roles


ORM involves picturing the world in terms of objects (entities or values) that play roles (parts in relationships). For example, in the ORM world, you are an object of type person, and the course that you attend is an object of type course. The person object plays the role of attending, and the course object plays the role of being attended. You derive objects and roles from fact types.
• Identify mandatory and optional roles
Mandatory roles imply that an object type must play a certain role. Roles identified as mandatory between objects and values are represented as NOT NULL constraints, and optional roles are represented as columns that allow NULL values. Mandatory and optional roles between different objects result in different types of foreign keys.
For example, the fact Person has name can be identified as mandatory for name and is expressed in ORM as: Each person has some name. This is represented in the database as a NOT NULL constraint. An optional role might be: Person has middle name. This fact is represented as a column that allows NULL values.
To use declarative integrity in the database, you must differentiate mandatory roles from optional roles (the sketch after this list shows both cases).
• Identify uniqueness constraints
Uniqueness constraints enable you to prevent the duplication of role instances and fact
redundancy in the model. Uniqueness constraints can be internal or external. Internal uniqueness constraints are enforced within a single predicate. External uniqueness constraints span roles across different predicates. An example of an internal uniqueness constraint is: Each student has at most one name. An example of an external uniqueness constraint is: For each name N and last name L, there is at most one student with name N and last name L.
• Identify types and subtypes
You should inspect optional roles and check that entity types are in primitive forms. Primitive
forms are exclusive, and by refining the model, you can identify these primitive forms (types)
and their derived subtypes.
• Identify other constraints
At the end of the process, you should determine other constraints, such as frequency, value,
and set.
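To illustrate how these ORM constraints eventually surface in a relational schema, the following T-SQL sketch (table and column names are hypothetical, not from the course materials) maps a mandatory role, an optional role, and an external uniqueness constraint to physical constraints:

CREATE TABLE dbo.Student
(
    StudentID  INT          NOT NULL PRIMARY KEY,
    FirstName  NVARCHAR(50) NOT NULL,  -- mandatory role: each person has some name
    MiddleName NVARCHAR(50) NULL,      -- optional role: NULL values allowed
    LastName   NVARCHAR(50) NOT NULL,
    -- External uniqueness: at most one student per name/last name pair.
    CONSTRAINT UQ_Student_Name UNIQUE (FirstName, LastName)
);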
For more information
You can find additional information about Object Role Modeling and database modeling in:
“Visio-Based Database Modeling in Visual Studio .NET Enterprise Architect” by Terry Halpin on the
Microsoft MSDN Web site
In the CD resources directory, you will find a five-part article by Terry Halpin: "Microsoft New Modeling Tool." The article focuses on how Visio for Enterprise Architects supports ORM.


Guidelines for Conceptual Modeling Using ER

Principle: Apply guidelines to produce a conceptual model by using ER.


Introduction
ER conceptual models are a popular way to communicate and share conceptual models. ER abstracts
the world and views it as a collection of relationships between entities.
Guidelines
When working with an ER conceptual model, consider the following guidelines:
• Choose entities, attributes, and relationships based on requirements
An entity is a person, place, item, or concept. An attribute is a characteristic or property of an
entity. A relationship is an association or connection between entities. Eventually, entities
translate into tables, attributes into columns, and relationships into foreign keys.

When modeling by using the ER methodology, you will find candidate entities (concepts that
are capable of becoming entities) from the documentation and the attributes describing these
entities. When searching for entities, you will also find that some of the attributes are
references to other entities. These attributes should be represented as relationships.

For example, if you analyze the phrase a student attends a course, you will find two candidate
entities (student and course). If, from the database point of view, it is important to capture
information about student and course, they should both become entities. The attends predicate
is modeled as a relationship in the ER model between the student and course entities. If only the data about the student is important, and the course matters only as a fact about the student, course may become an attribute of the student entity.
• Identify cardinality and constraints
Cardinality is a relationship property that permits you to specify the number of instances of
an entity that can exist on each side of the relationship. Cardinality is a specific type of
constraint in which the number of instances in the relationship is limited.
Other constraints that can be expressed in the ER model include key, structural, and
specialization constraints. These constraints play a critical role in good database design
because they improve the quality of the data. If you want to disallow invalid data in the
database, you must add database constraints.
• Identify subtypes and supertypes
A subtype denotes a subgroup of entities, within a given supertype, whose attributes differ from those of other subgroups.
A supertype is an entity type whose subtypes share common attributes. The supertype holds the attributes shared by all of the entities, including the identifier.

For example, assume that the Employee, Customer Contact, and Supplier Contact entities share the attributes FirstName and LastName. This suggests that you have identified a Person entity. The Person entity is the supertype, and Employee, Customer Contact, and Supplier Contact are subtypes of the Person entity (see the sketch after this list).
• Produce a conceptual ER diagram
Because an ER diagram is close to the physical database representation, you might be tempted to design the physical schema of the database directly. Instead, first validate the diagram with the domain expert and refine it before creating a logical diagram. Remember that a conceptual ER diagram expresses the domain problem in terms that the stakeholders understand.
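A minimal T-SQL sketch of the supertype/subtype example above (the HireDate column is an illustrative addition, not taken from the requirements):

-- Supertype: holds the shared attributes, including the identifier.
CREATE TABLE dbo.Person
(
    PersonID  INT          NOT NULL PRIMARY KEY,
    FirstName NVARCHAR(50) NOT NULL,
    LastName  NVARCHAR(50) NOT NULL
);

-- Subtype: shares the supertype's key and adds its own attributes.
CREATE TABLE dbo.Employee
(
    PersonID INT      NOT NULL PRIMARY KEY
        REFERENCES dbo.Person (PersonID),
    HireDate DATETIME NOT NULL
);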


Guidelines for Conceptual Modeling Using UML

Principle: Apply guidelines to produce a conceptual model by using UML.


Introduction
UML is a notation system that allows designers to model, create, and communicate object-oriented
systems.
Guidelines
When working with a UML conceptual model, keep in mind the following guidelines:
• Use Case diagrams
• Develop Use Case diagrams to illustrate actors and relationships: Use Case
diagrams represent the interaction between users and the system. In a Use Case
diagram, users or external systems are represented as actors. User interactions with
the system are called Use Cases. Use Case diagrams help you capture the business
processes handled by the solution.
• Develop narratives for Use Cases: Use Case narratives are used to express
interaction between the system and the user. They are the main source of information
for capturing user and business requirements. The Use Case narratives describe the
data flow between the system and the user. By exploring the Use Case narratives,
you will identify the classes and properties needed to design the conceptual model in
a class diagram.
One main characteristic of good narratives is that they describe the messages sent back and forth between the user and the system, not the technology that supports them.
• Class diagrams
• Identify abstract patterns: From the Use Case narratives, you will create a class
diagram to represent the conceptual model. This class diagram will represent the
perspective and vocabulary of the system domain user, not the software objects to be
created.
• Identify object constraints and business rules: After objects and attributes are
identified and validated with the domain expert, study the documentation for
additional constraints and business rules. These constraints are added to the
appropriate class.
• Activity diagrams
• Use activity diagrams to build a conceptual model: UML is a process-oriented
approach to software design. Therefore, to refine the conceptual model, you need to
create activity diagrams that represent the interaction between objects.
For more information
You can find additional information about Unified Modeling Language on the Unified Modeling
Language Web site.


Lab: Beginning the Database Design Process

Time estimated: 45 minutes


Scenario
You are the lead database designer working on the Human Resource Vacation and Sick Leave
Enhancement (HR VASE) project. You are provided with a Requirements document that specifies
business requirements, cost benefits, availability/reliability for business needs, security features, and
performance requirements. In this document, you will also find statements about how the HR
department wants to store information about vacation and sick-leave hours. You are also given a
FactInstances document, which provides information about facts, objects, and roles.
The HR VASE project will enhance the current human resources system. This system is based on the
AdventureWorks sample database in Microsoft SQL Server™ 2005.
The main goals of the project are to:
• Provide managers with current and historical information about employee vacation and
sick-leave data.
• Give individual employees permission to view their vacation and sick-leave balances.
• Give certain employees in the HR department permission to view and update employee
salary data.
• Give certain employees in the HR department permission to view and update employee
sick-leave and vacation data.


• Give the HR manager permission to view and update all of the data.
• Standardize employee job titles.
Preparation
Ensure that the virtual machines for 2781A-90A-DEN-DC and 2782A-MIA-SQL-01 are running. Log
on to the computer by using the following credentials:
• Username: Administrator
• Password: Pa$$w0rd


Exercise 1: Gathering Database Requirements


Introduction
In this exercise, you will gather requirements for the database by classifying each requirement
provided in the Requirements document as a database requirement or a business rule.
Gather requirements for a systematic database design
Summary:
1. Analyze the Requirements document.
2. Fill in the empty column in the Database Requirements document.
3. Discuss the results of your analysis with the class.
Specifications:
• For gathering database requirements, analyze the Requirements document located at D:\Labfiles\Starter. Gather data about database requirements, such as business requirements, cost benefits, availability and reliability for business needs, security features, and performance requirements.
• Record the inputs that you gather in the Requirements document.
• Classify the requirements as a database requirement or a business rule in the appropriate column in the Database Requirements document. The Database Requirements document is located at D:\Labfiles\Starter.

Answer Key

1. In Windows Explorer, browse to D:\Labfiles\Starter.


2. To open the Microsoft Office Word document, in the Starter folder, double-click
Requirements.doc.
3. Using the information provided, find as many requirements as you can in the document.
4. Classify the requirements and then write your answers in the Database Requirements versus
Business Rules table in the Database Requirements document. This document is located at
D:\Labfiles\Starter. Classify the requirements as:
• Database Requirements. Requirements that belong to the database.
• Business Rules. Requirements that should be enforced by the application.
5. Fill in the missing database requirements or the points that require clarification in the Missing
Database Requirements table in the Database Requirements document.
6. You can compare your solution with the sample Database Requirements Solution document.
This document is located at D:\Labfiles\Solution.


Discussion questions
Read the following questions and discuss your answers with the class.
Q Which requirements were you able to isolate?
A Answers will vary. Some students will feel more comfortable with a client-server
architecture and will be tempted to assign all the responsibility to the database.
However, programmers with little or no database experience will assign all the
responsibility to business rules.
Q What requirements are still unclear or missing?
A Answers will vary. Students with HR solutions experience will raise many issues. Non-user requirements are good to explore (for example: How many employees? How many users? How often will reports be queried?)
Q How would you evaluate the scope statement for this project?
A Answers will vary. Students should apply their own strategies and rules to evaluate the
project scope for this project.


Exercise 2: Creating a Conceptual Database Design


Introduction
In this exercise, you will infer fact types from the fact instances provided in the FactInstances
document. You need to identify objects, uniqueness constraints, and cardinality. You will also use the
Fact Editor in Microsoft Visio for Enterprise Architects 2005 Beta to draw an ORM source model to
depict your conceptual design.
Microsoft Visio for Enterprise Architects 2005 Beta is installed on the 2782A-MIA-SQL-01 virtual machine.
Create a conceptual database design
Summary:
1. Analyze the FactInstances document.
2. Follow the necessary guidelines when working on the ORM model.
3. Add to the conceptual model in Microsoft Visio for Enterprise Architects 2005 Beta.
Specifications:
• Access the FactInstances document in D:\Labfiles\Starter.
• Identify elementary fact types, database objects and roles, uniqueness and other constraints, and cardinality.
• Infer objects and their relationship with supplied tables.
• Infer the respective identifier for every object and then add it to the source model. For every fact, add constraints and cardinality.

Answer Key

1. Browse to D:\Labfiles\Starter, and then open and read the FactInstances document.
2. Identify the facts, objects, uniqueness and other constraints, and cardinality in the document.
3. Open Microsoft Visio for Enterprise Architects 2005 Beta.
4. Browse to D:\Labfiles\Starter, and then open and review the ConceptualModel.vsd file.
5. The data for vacation is already filled in the FactInstances document. Use it as a source for the
following tasks:
• On the Main menu, click Database, View, Fact Editor, and then Create a New Fact.
• Refer to the FactInstances document, and then in the Create a New Fact dialog box,
type object names, relationships, and inverse relationships. Infer the object and its
relationship with supplied tables in the FactInstances document. Repeat the process for
every relationship. If you are familiar with ORM, you may be more comfortable working
in freeform style.
• For every fact, add constraints and cardinality. On the Main menu, click Database,
View, and then Business Rules to open the Business Rules window. In the Business
Rules window, double-click the fact to which the object belongs. This displays the Fact
Editor – Edit an Existing Fact dialog box. You can enter your constraint in this dialog
box.
• For every object, infer the respective identifier, and then add it to the source model. In the
Business Rules window (see the previous bullet point for information on opening this
window), double-click the fact to which the object belongs. Select the object, and then add the appropriate reference.


6. Save your model, and then compare your model to the sample solution file in
D:\Labfiles\Solution.

Discussion questions
Read the following questions and discuss your answers with the class.
Q What strategies can be used for solving problems with incomplete data?
A Answers will vary. Possible answers include:
• Repeat user and customer interviews.
• Interview the domain expert.
• Create a prototype and validate with users.



Module 2: Modeling a Database at the Logical Level
Time estimated: 105 minutes
Table of contents
Lesson 1: Building a Logical Database Model...................................................................... 4
Apply best practices when building a new logical database model. .............................. 4
Best Practices for Transforming a Conceptual Database Design into a Logical Model ..... 5
Principle: Apply best practices when transforming a conceptual database design into a
logical model................................................................................................................... 5
Best Practices for Working with Entities and Attributes .................................................... 7
Principle: Apply best practices for working with entities and attributes......................... 7
Considerations When Choosing Primary Keys.................................................................... 9
Principle: Select a primary key for the logical model based on the advantages and
disadvantages of surrogate and natural keys. .................................................................. 9
Best Practices for Finalizing the Logical Model ............................................................... 11
Principle: Apply best practices for finalizing a logical model. ..................................... 11
Lesson 2: Designing for OLTP Activity ............................................................................. 13
Apply guidelines for normalization when designing an OLTP model.......................... 13
Guidelines for Identifying Functional Dependencies ........................................................ 14
Principle: Apply guidelines for identifying the functional dependencies among entities
and attributes. ............................................................................................................... 14
Data Normalization Objectives.......................................................................................... 16
Principle: Identify the objectives of data normalization................................................ 16
Multimedia: Achieving a Normalized Design ................................................................... 17
Principle: Apply guidelines for achieving a normalized design. .................................. 17
Lesson 3: Designing for Data Warehousing ....................................................................... 19
Apply guidelines for designing a data warehouse database. ......................................... 19
Guidelines for Designing Fact Tables ............................................................................... 20
Principle: Apply guidelines for designing fact tables. ................................................. 20
Guidelines for Designing Dimensions............................................................................... 22
Principle: Apply guidelines for designing dimensions.................................................. 22
Multimedia: Guidelines for Designing a Star or Snowflake Schema ................................ 24


Principle: Apply guidelines for designing a star or snowflake schema......................... 24


Lesson 4: Evaluating Logical Models.................................................................................. 26
Evaluate an existing logical model of a database.......................................................... 26
Guidelines for Analyzing an Existing Logical Model ....................................................... 27
Principle: Apply guidelines for analyzing an existing logical model. .......................... 27
Guidelines for Identifying Problems with a Logical Model .............................................. 29
Principle: Apply guidelines to better identify problems with a logical model. ............. 29
Lab 2: Modeling a Database at the Logical Level.............................................................. 31
Exercise 1: Determining Entities, Attributes, Relationships, Keys, and Constraints....... 33
Exercise 2: Normalization ............................................................................................... 35


Module objective:
After completing this module, students will be able to:

Analyze and evaluate a logical database design.


Introduction
In a conceptual model, the focus is on the business user’s perspective. In a logical model, the
focus shifts from the perspective of the user to that of the development team. The model
translates user requirements into a design for a relational database management system (DBMS). In the next
module, you will create the physical design and choose the DBMS vendor.
At the logical design phase, you choose the relational model and emphasize the best practices
for designing the relational model. These best practices include normalization for online
transaction processing (OLTP) systems and best practices for designing star and snowflake
schemas for relational dimensional systems that will eventually support online analytical
processing (OLAP) databases.


Lesson 1: Building a Logical Database Model

Lesson objective:
After completing this lesson, students will be able to:

Apply best practices when building a new logical database model.

Introduction
Because database design at the logical level is a critical step in the design process, many theories address it, such as Codd's rules, set theory, and normal forms. To translate a conceptual database design into a logical database model, you must be familiar with ER
modeling. At the logical modeling stage, the ER diagram is usually mandatory, and it is the
standard for representing the logical model. You must know how to work with entities,
attributes, and relationships, and how to include different kinds of constraints in the model.


Best Practices for Transforming a Conceptual Database Design into a


Logical Model

Principle: Apply best practices when transforming a conceptual database design into a logical
model.

Best Practices
When transforming a conceptual database model into a logical model, consider the following
best practices:
• Use an automated design tool to generate the logical model, and then manually
review it
By using automated design tools, such as Microsoft® Office Visio® for Enterprise
Architects 2005, you can concentrate on the design. Automated design tools help you
increase your productivity and provide assistance in the process.
Some tools can generate logical models from conceptual models, although some designers prefer to create them manually because doing so gives them direct control over the quality of the process. By using an automated design tool to generate the logical model and then manually reviewing the model, you can benefit from both approaches.
• Produce an initial ER diagram of the logical model
If you use ER as the conceptual modeling methodology, you will already have the
initial diagram of the logical model. If you use ORM or UML as the conceptual

MCT USE ONLY. STUDENT USE PROHIBITED


Module 2: Modeling a Database at the Logical Level 6

modeling methodology, you must create an initial ER diagram in this phase. This
diagram will evolve from the conceptual model into a logical model. With an initial ER
diagram, you can visualize the database and communicate and share the design with the
rest of the development team.
• Use portable database types
When creating the ER diagram to represent the logical model, use portable database
types. By using portable database types, you will be able to delay your design decisions
regarding the physical model. This delay provides you with the required flexibility to
make better decisions when designing the physical model. For example, using a generic
“text” column enables you to delay your decision about which of the char types
(CHAR, VARCHAR, NCHAR, or NVARCHAR) to use. In the next module, you will
translate the logical model into the physical database design.
• Use naming conventions that are independent of a physical implementation
When you work with ER diagrams, ensure that you are not affected by decisions
related to the physical model, or you will end up designing the physical database. If
you use a naming convention that is independent of the physical implementation, you
minimize the possibility of changing names frequently to keep the name properly
synchronized with the type. At this phase, make sure that the naming conventions are
independent of database types.
• Revise and clarify the logical model diagram for readability
A logical model is useful when you validate and share the model with other members
of the development team. If your team members understand the model, you have
developed a good design. However, if members of your team have difficulty
understanding the model, you probably need to redesign it.


Best Practices for Working with Entities and Attributes

Principle: Apply best practices for working with entities and attributes.
Introduction
The most critical steps in the logical model require you to define entities and attributes. You
create the definitions in the context of choosing to implement your conceptual model in a
relational database, without committing to a particular physical implementation.
Best Practices
To create a logical model, you must provide a database model. When creating this model, the
following best practices will help you model better entities and attributes:
• Identify independent and dependent entities
One of the first steps in the process of working with entities and attributes is to separate
independent entities from dependent entities. Dependent entities (also known as weak
entities) rely on independent entities (also known as strong entities) for identification.
For example, the Employee entity is an independent entity. It does not rely on any
other entity for identification. The Manager entity is a dependent entity. It relies on the
Employee entity for identification.
• Identify tables vs. views
Some of the entities from the conceptual model will be in the form of tables in the
logical model, and some entities will be views. Ensure that you distinguish tables from
views in the model, and create the appropriate object to represent each. For example,
when working with a human resources model, you might find two entities: Employee Information and Employee Badge. But Employee Badge might be just a view based
on the Employee Information table.
• Identify candidate keys
A candidate key is an attribute or set of attributes used to fully identify the entity. Some
of the attributes included in the candidate key can also be candidate keys. Therefore,
you must ensure that the attribute of a candidate key is not assigned to any subset of a
candidate key. For example, in a table that permits international IDs, a set consisting of
Region, Country, and National Identification is not a candidate key, because a subset
of those attributes consisting of Country and National Identification is also a
candidate key.
Some tables contain multiple candidate keys; you can make any one of them the
primary key. Candidate keys not selected as primary keys will become alternate keys.
These alternate keys will be translated into a unique constraint in the physical database.
For example, in the Employee Information entity, Personnel ID and Employee
Number are candidate keys, because they both fully identify employees. However,
only one of them will become the primary key, and the other will be an alternate key (the sketch after this list shows this pattern).
• Specify attribute constraints
After working with entities and attributes, you should design attribute constraints.
These constraints are controls that relational database management systems (RDBMS)
will enforce on information. These constraints limit possible values stored in columns
in the database. Constraints are critical for ensuring data quality. The most common
attribute constraints are: Nullability, Default, Set, and Domain.
• Identify attributes with special security requirements
Security must be included in the design from an early stage. You might need to design
special security requirements for some attributes. Often, these attributes have special
constraints on auditing tables (such as a default system user), or they might be hidden
from the user by using different views.
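A minimal T-SQL sketch of the candidate-key example above (the column types are assumed): PersonnelID is chosen as the primary key, and the alternate key EmployeeNumber becomes a unique constraint.

CREATE TABLE dbo.EmployeeInformation
(
    PersonnelID    INT          NOT NULL PRIMARY KEY,  -- chosen primary key
    EmployeeNumber NVARCHAR(10) NOT NULL UNIQUE,       -- alternate key
    FirstName      NVARCHAR(50) NOT NULL,
    LastName       NVARCHAR(50) NOT NULL
);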
More information
In Module 3, “Modeling a Database at the Physical Level,” you will review attributes and
security guidelines that will help you add security considerations to the physical model.


Considerations When Choosing Primary Keys

Principle: Select a primary key for the logical model based on the advantages and disadvantages
of surrogate and natural keys.
Introduction
In database design, the choice between a surrogate key and a natural key is frequently debated.
This topic lists the advantages and disadvantages of surrogate and natural keys. This will help
you decide which key to choose while designing the logical model.
Advantages of Surrogate Keys Over Natural Keys
Surrogate keys have the following advantages over natural keys:
• Smaller size
Surrogate keys are smaller than natural primary keys. A surrogate key can replace a natural key of dozens or hundreds of bytes with just a few bytes (for example, a 4-byte integer). Because of their small size, uniqueness, single columns, and distinct values, surrogate keys are ideal for clustered indexes.
• Better support for primary key changes
Changes to data are better supported by surrogate keys than natural keys. When natural
keys are used as primary keys, any change to their values is difficult for the database to
accommodate. There might be rows in many tables that depend on the changed values.
As a result, all related data must be changed, which is a slow process.
• Natural keys non-existent at times
At times, there might be no natural keys for data. Even if there are natural keys for data, they might be missing. For example, if you are designing a database for a
hospital, you might make “Medical ID number” the natural key. However, when an
unconscious patient is admitted to the hospital, this number might not be available. In
this case, there is no natural key to use; there is no way to identify the person, and a
surrogate key must be used.
• Easier joins
The development effort involved in joining tables by using natural keys of six or seven
attributes can be significant and time consuming, especially if many tables are involved
in the join. For a database developer, the complexity of the query increases with natural
keys. It is simple to join tables with surrogate keys, because they require only one
attribute in the join.

Advantages of Natural Keys Over Surrogate Keys


Natural keys have the following advantages over surrogate keys:
• Enforced compliance
When Dr. Edgar F. Codd introduced the word relational, he was trying to avoid the
physical model. A surrogate key brings you closer to the physical model. Often, you
can implement surrogate keys by using identity columns that are not compliant with
American National Standards Institute (ANSI) standards. As a result, your choice of
database vendors is limited to those who support this type of surrogate key, which can
make database migration more difficult.
• Fewer joins to obtain values
When working with natural keys, you require fewer joins. In grandparent-parent-child
relationships, you can use attribute values directly to create joins with the grandparent
table, without involving the parent.
• Automatically created constraints
When using surrogate keys, you must create a unique constraint over the natural key to
avoid the insertion of duplicate data. This constraint is automatically created when
using a natural key.
• User verifiable
Because natural keys are a collection of business attributes, users can verify them. Surrogate keys have no business meaning, so errors in them are difficult for users to detect. (The sketch below contrasts the two kinds of key.)
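The following T-SQL sketch (illustrative names and types) contrasts the two choices. Note the unique constraint that must accompany the surrogate key, as described above:

-- Surrogate primary key, with a unique constraint on the natural key
-- to prevent duplicate business data.
CREATE TABLE dbo.Customer
(
    CustomerID   INT IDENTITY(1, 1) NOT NULL PRIMARY KEY, -- surrogate key
    CustomerCode NVARCHAR(20)       NOT NULL UNIQUE       -- natural key
);

-- Natural primary key: uniqueness of the business attribute is
-- enforced by the key itself.
CREATE TABLE dbo.CountryRegion
(
    CountryRegionCode NVARCHAR(3)  NOT NULL PRIMARY KEY,  -- natural key
    Name              NVARCHAR(50) NOT NULL
);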


Best Practices for Finalizing the Logical Model

Principle: Apply best practices for finalizing a logical model.


Best Practices
When finalizing the logical model, you should adopt the following best practices:
• Identify remaining constraints
When creating a logical model, you might find constraints that cannot be expressed in the standard notation of ER models. Because these constraints relate to the data, you must still identify and validate them. They can be implemented by using stored procedures, triggers, functions, or other database objects. Microsoft SQL Server™ 2005 permits you to use the T-SQL language as well as the Microsoft .NET–compatible languages, such as Microsoft Visual Basic®, Microsoft Visual C#®, and Microsoft Visual C++®. Do not try to build business constraints or solve business logic in the database. Enforce a constraint in the database only if it is closely related to the data and can be easily written in Structured Query Language (SQL); a sketch follows this list. If a constraint includes complex calculations, affects the workflow, is not related to the content of the data, or is difficult to write in SQL, consider writing it in business components.
• Identify schemas for grouping entities
When database models include hundreds or thousands of tables, you might face
difficulties in naming objects and managing them. To solve this problem, you should
use schemas to help developers and users identify database objects. A schema is a
collection of database objects that form a single namespace. SQL Server 2005 supports

MCT USE ONLY. STUDENT USE PROHIBITED


Module 2: Modeling a Database at the Logical Level 12

the ANSI SQL-92 schema standard, enabling you to group entities in appropriate
namespaces.
• Ensure that all tables are fully normalized
Normalization helps you eliminate redundancy in the database. When designing an
OLTP database, you must ensure that every table is fully normalized. From the
viewpoint of the business application, normalizing at the third normal form suffices.
Keep in mind that the logical model should be fully normalized. Sometimes, if
performance issues arise, you may denormalize, but you should do it at the physical
level.
• Ensure that the logical model expresses the conceptual design
The logical model should be an accurate and complete representation of the conceptual
model. The logical database model represents the perspective of the database
development team. The logical database model is one step in the Microsoft Solution
Framework (MSF) Process Model deliverables that help you move from the customer
view of the solution to the specific implementation.
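As a sketch of two of the practices above (the names assume an AdventureWorks-style schema and are illustrative): group entities with an ANSI schema, and keep data-related rules in the database as declarative constraints.

-- Group related entities in a single namespace.
-- (CREATE SCHEMA must run in its own batch.)
CREATE SCHEMA HumanResources;
GO

-- A rule closely related to the data and easy to express in SQL
-- belongs in the database:
CREATE TABLE HumanResources.Employee
(
    EmployeeID    INT      NOT NULL PRIMARY KEY,
    VacationHours SMALLINT NOT NULL
        CONSTRAINT CK_Employee_VacationHours CHECK (VacationHours >= 0)
);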


Lesson 2: Designing for OLTP Activity

Lesson objective:
After completing this lesson, students will be able to:

Apply guidelines for normalization when designing an OLTP model.


Introduction
OLTP systems are usually supported by OLTP databases. These databases are organized into
relational tables to reduce redundant information. When designing an OLTP database, you
should identify all functional dependencies and achieve a normalized design. Normalization
enables you to increase the speed of data modifications (inserts, updates, and deletes) and the number of real-time transactions supported. A normalized design also helps you reduce the risk of inconsistent data.


Guidelines for Identifying Functional Dependencies

Principle: Apply guidelines for identifying the functional dependencies among entities and
attributes.
Introduction
When designing relational databases, you will find frequent mention of the concept of
functional dependencies. A functional dependency is a special relationship among columns in a
table. A set of columns X in a table T is functionally dependent on a set of columns Y if the values in columns Y determine the values in columns X.
For example, when designing the Customer entity, you might find columns named Customer
Code, Customer Name, and Customer Address. The values in the Customer Code column
identify the values of the Customer Name and Customer Address columns. Therefore, the
columns Customer Name and Customer Address are functionally dependent on the column
Customer Code.
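In the arrow notation commonly used for functional dependencies, this example can be written as follows, where the determinant on the left fixes the values on the right:

Customer Code -> {Customer Name, Customer Address}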
Guidelines for Identifying Functional Dependencies
When searching for functional dependencies, consider the following guidelines:
• Identify the components of a relational schema
A relational schema is based on relational theory, which is part of applied set theory. When you are working with other professionals, you should be able to identify the concepts used in the theory. The key concepts are as follows:
• Relation: An entity or a valid combination of entities. For example,
Customer, Invoice, and Product are relations in a relational schema. In SQL,
relations are represented as tables.


• Tuple: An instance of an entity. If the relation is a combination of entities, a


tuple is an instance of that entity combination. For example, the information
from a customer Claus Hansen is a tuple; data from Invoice 8349 is also a
tuple. In SQL, tuples are represented as rows in a table.
• Attribute: A single value (unstructured property) containing the description of
an entity or relation. Name, Last Name, and Email are attributes of the
relation Customer. In SQL, attributes are columns.
• Key: A set of attributes (may be a set with only one element) that uniquely
identifies tuples in a relation. In a relation, Employee, the Employee Number,
and the Social Security Number are keys. Keys become primary keys or
unique constraints in SQL.
• Schema: The attributes that the relation will have, the names of the attributes,
and the keys in the relation.
Note
The concept of schema in a relational theory is completely different from the concept of ANSI
schema supported in Microsoft® SQL Server™ 2005. The ANSI schema concept refers to the
logical container of database objects.

• Identify the semantics of attributes


To identify the meaning of values in attributes, you must understand their semantics.
For example, when you are working with the date “06/05/2005,” the semantics of the
attribute provides information about the meaning of the date format—that is, whether it
is referring to June 5, 2005, or May 6, 2005. The semantics of attributes identify the
meaning of the attribute (06=month, 05=day, 2005=year).
• Identify keys, superkeys, and transitivity
When you work with relational data, you must identify the key or keys of every
relation. When searching for keys, you will frequently find superkeys first. A superkey
is a set of attributes that cannot have the same values in two different tuples. For
example, when designing the relation Employee, you find that the set of attributes
{Name, Last Name, Birth date, and Social Security Number (SSN)} is a superkey of
Employee. This is because two employees cannot have the same values for those
attributes. In addition, another attribute set {Name, Last Name, and Social Security
Number (SSN)} is a superkey, and finally another set {Social Security Number(SSN)}
is also a superkey.

Social Security Number (SSN) is the only key in this example, because it is a minimal
superkey. From the mathematical theory, we know that:
• Every key is a superkey.
• Every attribute set that includes a key is also a superkey.
• Not every superkey is a key.


Data Normalization Objectives

Principle: Identify the objectives of data normalization.

Data Normalization
When normalizing a relational database, consider the following objectives of data
normalization:
• Reduce redundant values in tuples
Redundancy in databases produces undesirable anomalies, such as the same customer appearing with two different addresses or two different names.
• Reduce or eliminate null values in tuples
Null values are often a sign of fat relations, or relations with too many columns. In general, try to avoid using null values. For example, many Products might not have a Photo. Instead of having a column with many null values, you could use a separate Product-Photo relation to avoid null values (see the sketch after this list).
• Avoid generating spurious tuples
A spurious tuple is a false tuple that is generated if you join improperly designed
relations. For example, imagine that you created three relations, Customer,
Invoice, and Employee (as salesman), and you incorrectly assign the Customer
Address attribute to the Invoice relation. By joining Invoice and Employee, you
generate a spurious tuple because the Customer Address attribute has been
assigned to the employee.
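A minimal T-SQL sketch of the Product-Photo example above (names and types are illustrative):

-- Instead of a nullable Photo column on Product, photos live in a
-- separate relation that has rows only for products with a photo.
CREATE TABLE dbo.Product
(
    ProductID INT          NOT NULL PRIMARY KEY,
    Name      NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.ProductPhoto
(
    ProductID INT            NOT NULL PRIMARY KEY
        REFERENCES dbo.Product (ProductID),
    Photo     VARBINARY(MAX) NOT NULL
);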


Multimedia: Achieving a Normalized Design

Principle: Apply guidelines for achieving a normalized design.


Introduction
To create a design that effectively supports the OLTP environment, you must create a logical
database design that is highly normalized. From the logical point of view, you have reviewed
the objectives of a normalized design (reduce redundant values in tuples, reduce the null values
in tuples, and avoid generating spurious tuples). Note that the logical design will provide a solid
foundation for the physical design.
Guidelines
When normalizing your logical database design, consider the following guidelines:
• Achieve high degree of normalization for a robust design
There are several degrees of normalization. In most database designs, the third normal form is a sufficient degree of normalization for the database. The first three normal forms are:
• First normal form: A relation is in first normal form if it does not contain
any repeating groups, and if every attribute is atomic. For example, if you
design a relation with the attributes Country Name, Official Language 1, Official Language 2, and Official Language 3, the design is not in first normal form. This is because the Official Language attribute forms a repeating group.
• Second normal form: A relation is in second normal form when, in addition to being in first normal form, every non-key attribute depends on the whole key. If you create a relation with the attributes Country, State/Province, and Currency, and {Country, State/Province} is the key, the design is not in second normal form, because the Currency attribute depends only on the Country attribute.
• Third normal form: A relation is in third normal form when it is in second normal form and no non-key attribute is transitively dependent on the primary key. An example of an entity that is not in third normal form is the Employee Address relation with the attributes Employee Number, Address Line, Country Code, and Country Name. This is because Country Name depends on Country Code, which in turn depends on Employee Number; Country Name therefore depends transitively on the Employee Number key. (The sketch after this list shows these examples in normalized form.)
There are more normal forms, such as Boyce-Codd normal form, fourth normal form, fifth normal form, and domain-key normal form. Most designers normalize the logical design to the third normal form, because the higher forms help only in cases that do not occur regularly in business applications.
• Avoid update anomalies with normalization
Update anomalies are problems that arise when you modify database information
(insert, delete, or update). In the example provided for first normal form, it is not
possible to add a language before normalization, unless it is the official language of a
country. Normalization reduces update anomalies because most of the redundant data is
removed, allowing only one “good value” to be stored in the database and providing
access to that value through the key.
• Ensure dependency of all values on simple as well as composite keys
When a logical design is in third normal form, the values of every attribute depend only
on the key. The functional dependency is removed from the attributes, except for the
dependency originating in the key.
• Ensure non-loss decomposition of the design
When normalizing the design, remember that the process must be reversible. You
should always be able to re-create the information in its original format by joining the
design relations.
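The following T-SQL sketch (table and column names are illustrative) shows a normalized outcome of the examples above: the repeating Official Language attributes become rows in a child table (first normal form), Currency is stored at the Country level where it depends on the whole key (second normal form), and Country Name lives with Country Code rather than in the employee address relation (third normal form).

CREATE TABLE dbo.Country
(
    CountryCode CHAR(2)      NOT NULL PRIMARY KEY,
    CountryName NVARCHAR(50) NOT NULL,
    Currency    CHAR(3)      NOT NULL  -- depends on Country alone (2NF)
);

-- 1NF: the repeating group becomes rows in a child table.
CREATE TABLE dbo.CountryLanguage
(
    CountryCode CHAR(2)      NOT NULL REFERENCES dbo.Country (CountryCode),
    Language    NVARCHAR(50) NOT NULL,
    PRIMARY KEY (CountryCode, Language)
);

-- 3NF: CountryName is reached through CountryCode, so the address
-- relation stores only the code.
CREATE TABLE dbo.EmployeeAddress
(
    EmployeeNumber INT           NOT NULL PRIMARY KEY,
    AddressLine    NVARCHAR(100) NOT NULL,
    CountryCode    CHAR(2)       NOT NULL REFERENCES dbo.Country (CountryCode)
);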


Lesson 3: Designing for Data Warehousing

Lesson objective:
After completing this lesson, students will be able to:

Apply guidelines for designing a data warehouse database.


Introduction
Data warehouses are used by people at all levels in an organization for making business
decisions. Data warehouses are a valuable asset for any organization. In this lesson, you will
become familiar with the guidelines that you should follow when designing a data warehouse
database.
Most designers accept that a data warehouse schema and its design differ from a traditional
transactional database design. A data warehouse database is highly optimized for queries. A
data warehouse database supports unplanned (ad hoc) queries, and you update it by using an extraction, transformation, and loading (ETL) process. Such a process moves data from one or more OLTP or other relational databases into the data warehouse. Data warehouse databases are often denormalized.
This lesson introduces you to the guidelines for designing fact tables, dimensions, and a star or
snowflake schema. You should consider these guidelines when designing software solutions
that include data warehouses and other business intelligence components, such as OLAP
structures.
For more information
This course will not cover how to translate data from the OLTP database to the data warehouse.
This information can be found in Module 8, “Creating a Strategy for Distributing Data Using
SQL Server 2005,” in Course 2781A, Designing Microsoft SQL Server 2005 Server-Side
Solutions.


Guidelines for Designing Fact Tables

Principle: Apply guidelines for designing fact tables.


Introduction
A fact table is the center table of the star or snowflake schema. A fact table represents a
business event or process. A fact table usually contains aggregated data and rarely includes
detailed facts. In SQL Server 2005, a fact table is associated with a measure group in an OLAP
cube.
The purpose of a fact table is to capture all metrics that measure the event that the fact table
represents. For example, when you are designing a Sales fact table, focus on measures such as
quantity, amount, discount, cost, and profit. All these measures provide users with the business
indicators of the event.
The remaining attributes of the fact table act as foreign keys to dimension tables. These
dimensions add a business perspective to the event. For example, in a Sales fact table, you will
probably find columns related to a Customer Dimension, Date Dimension, Product Dimension,
and Store Dimension. These dimensions provide business context to the measures. (A table sketch follows the guidelines below.)
Guidelines
Consider the following guidelines when designing fact tables of a data warehouse:
• Ensure that each fact row identifies a single event
When you are designing a fact table, keep in mind that you must capture information
about a single event in the fact table. A business event is the entity represented by the
fact table. For example, in the Sales fact table, the invoicing (creation of one invoice) event is captured as one row in the table.


If you do not follow this guideline, users will request more detail from the data warehouse, which usually involves redesigning the fact table and its measure group.
• Consider multiple fact tables for different event types
A common error when designing a data warehouse is to represent different event types
in one fact table. For example, you might try to capture the receipt of the purchase
order event, the invoicing event, and the shipping event as three different rows in the
same fact table, but you should not.
If you represent each event in a different fact table, it will be easier for users to query
the table. Keeping events in separate tables will also make the database less susceptible
to exponential growth.
• Consider the need for factless tables
Occasionally you need only to capture whether an event happened. For example, you
might want to capture employee attendance. You only need to know which employees
came to work (one fact table) and which employees were absent (another fact table). Tables that capture events without measures are known as factless fact tables.
When designing data warehouse solutions, you should identify the situations in which
you need to create factless tables in the solution; they are easily overlooked.
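A minimal T-SQL sketch of a sales fact table (illustrative names; the dimension tables are assumed to exist, so the foreign keys are indicated only in comments):

CREATE TABLE dbo.FactSales
(
    DateKey     INT   NOT NULL, -- foreign key to the Date dimension
    CustomerKey INT   NOT NULL, -- foreign key to the Customer dimension
    ProductKey  INT   NOT NULL, -- foreign key to the Product dimension
    StoreKey    INT   NOT NULL, -- foreign key to the Store dimension
    Quantity    INT   NOT NULL, -- measures that capture the sales event
    Amount      MONEY NOT NULL,
    Discount    MONEY NOT NULL
);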


Guidelines for Designing Dimensions

Principle: Apply guidelines for designing dimensions.


Introduction
A dimension represents a business entity that provides context for the measures included in fact tables. This business entity is arranged in hierarchies of other business entities, enabling users to browse information from a summarized level down to minute details.
Guidelines
Consider the following guidelines when designing dimensions:
• Design solutions for handling changes in large dimensions
Data warehouse designers must keep in mind the business event history and retain it in
the current design. For example, if a customer’s address has reference to the city,
state/province, and country, the data warehouse will group the sales made to that
customer in a particular manner, such as Los Angeles-California-USA. Later, if the
customer relocates to Buenos Aires-Buenos Aires-Argentina, the design must support one
customer instance before the relocation and another after the relocation, so that the
information is arranged properly. You can usually achieve this by adding a new row
with the new customer information.
In large dimensions (hundreds of thousands of customers), adding new rows will create
a performance issue. To handle changes in very large dimensions, create separate
dimensions for frequently changed attributes. When appropriate, you should group
attribute values in their respective ranges. For example, consider demographics
information, such as income, age, and education. These attributes change frequently.
To avoid performance problems, you should create separate dimensions for each
attribute and create income, age, and education groups relevant to the business
perspective.
• Consider causal dimensions
Causal dimensions add value to data warehouses. Most dimensions will help you
answer the what, who, when, and how queries, but causal dimensions help you
understand why something happened. For example, in a Sales fact table, the Customer,
Date, Product, and Store dimensions provide users with important business
information, but they do not tell you the reason the customer had for buying something.
Reasons might include a product promotion, a price or size change, or marketing and
promotion efforts.
Causal dimensions are rarely precise. Often, they simply provide the business user with
a best guess. Consider the use of causal dimensions, and educate users on their limitations.
Explain to users that there could be other reasons that explain the “why” that are not
captured in the data.
• Consider time dimensions
The time dimension is one of the most important dimensions because of how frequently it
is used. Almost every fact table has a reference to the time dimension. Time dimensions
tell you when the event happened. This information is critical for business users. When
working with a time dimension, consider the following guidelines:
• It should be universal enough to support most fact tables
A single time dimension table enables users and OLAP designers to merge
information from different fact tables. This is not always the case when
different time dimensions are used.
For example, when an event has different dates (such as order date, invoice
date, shipping date, and received date), all of the column attributes are
represented as columns of the fact table and have a foreign key relationship
with the same time dimension table.
• It should be built at design time
Most software architects build the time dimension during the design phase
without querying the transactional databases. This practice allows them to
create a dimension that is not dependent on any fact table. When you create
such a time dimension, it serves the purpose of providing a general solution
for many fact tables. SQL Server 2005 Analysis Services provides you with
a wizard to create a time dimension without querying transactional
databases.
• It should include attributes related to time
Certain attributes might have significant business implications. For example,
attributes such as Weekend, Holiday, Pay Day, and Season might have
business relevance in the retail industry.
• Design hierarchies that match the requirements gathered from different business
users
When designing business entities, you should keep in mind the requirements of
different users and customers to present information in a practical and descriptive
manner. For example, for some users it might seem natural to group customer
information in a geographical structure, so you should provide support for a Country-
State/Province-City-Store-Customer hierarchy. For other users, it might be logical to
group customer information by market channels, such as Wholesale, Retail, and E-
Commerce.
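As a sketch of the add-a-new-row approach described in the first guideline, the following hypothetical customer dimension keeps one row per version of a customer, so that sales made before and after a relocation are grouped under the correct city. The surrogate key, business key, and effective-date columns are illustrative assumptions, not a prescribed design.

    -- Hypothetical dimension that preserves history by adding rows.
    CREATE TABLE dbo.DimCustomer (
        CustomerKey   int IDENTITY(1,1) PRIMARY KEY, -- surrogate key used by fact tables
        CustomerID    int          NOT NULL,         -- business key from the source system
        CustomerName  nvarchar(50) NOT NULL,
        City          nvarchar(50) NOT NULL,
        StateProvince nvarchar(50) NOT NULL,
        Country       nvarchar(50) NOT NULL,
        EffectiveDate datetime     NOT NULL,         -- when this version became current
        ExpiryDate    datetime     NULL              -- NULL marks the current version
    );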
Multimedia: Guidelines for Designing a Star or Snowflake Schema

Principle: Apply guidelines for designing a star or snowflake schema.


Introduction
As a data warehouse designer, you may choose between two different schemas: the star schema
and the snowflake schema.
The simplest schema for designing data warehouse databases is the star schema. The star
schema has a center, represented by the fact table, and the points of the star, represented by the
dimension tables. The advantage of the star schema is that joins between the dimensions and the
fact tables are simple.
The snowflake schema is more complicated because it permits you to normalize the dimension
tables. Each hierarchical level in the dimension is modeled in one distinct table. The snowflake
schema requires more complex joins than the star schema.
Guidelines
Consider the following guidelines while designing a star or snowflake schema:
• Choose a star schema when the model centers on a single fact table
Unless a very specific need arises, the star schema is the preferred schema. The star
schema is easy to query, fast, and a natural way to model the data warehouse. (The
sketch following this list contrasts the two schemas.)
• Choose a snowflake schema when fact tables require different levels of granularity
When designing a data warehouse, sometimes you will need different granularity for
the same event. For example, if you are working with budget and expense information,
chances are that the budget is not as detailed as the expense information. Designers
usually work with two different fact tables, but one of them is not related to the lowest
level in every dimension. A snowflake schema in this case is faster, smaller, and
simpler than the star schema.
• Choose a snowflake schema to support large dimensions
Sometimes, you will need to use dimensions that include several hundred thousand, or
even several million, members in a dimension. In this case, normalizing the dimension
might improve performance. For example, consider a fact table for clickstream analysis
(a data warehouse that records how customers navigate a Web site) in which customers
have the Web browser as one of the levels. Because most of the users will have the
same repeating values, a snowflake schema will provide better performance than a star
schema. A snowflake schema permits a more normalized design; therefore, there is no
need to have millions of rows with the same attribute.
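The following hypothetical product dimension illustrates the structural difference between the two schemas: the star schema flattens the hierarchy into one table, while the snowflake schema normalizes each level into its own table.

    -- Star schema: one denormalized dimension table.
    CREATE TABLE dbo.DimProduct_Star (
        ProductKey      int PRIMARY KEY,
        ProductName     nvarchar(50) NOT NULL,
        SubcategoryName nvarchar(50) NOT NULL, -- repeated on every product row
        CategoryName    nvarchar(50) NOT NULL
    );

    -- Snowflake schema: each hierarchical level in a distinct table.
    CREATE TABLE dbo.DimCategory (
        CategoryKey  int PRIMARY KEY,
        CategoryName nvarchar(50) NOT NULL
    );

    CREATE TABLE dbo.DimSubcategory (
        SubcategoryKey  int PRIMARY KEY,
        SubcategoryName nvarchar(50) NOT NULL,
        CategoryKey     int NOT NULL REFERENCES dbo.DimCategory (CategoryKey)
    );

    CREATE TABLE dbo.DimProduct_Snowflake (
        ProductKey     int PRIMARY KEY,
        ProductName    nvarchar(50) NOT NULL,
        SubcategoryKey int NOT NULL REFERENCES dbo.DimSubcategory (SubcategoryKey)
    );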
Lesson 4: Evaluating Logical Models

Lesson objective:
After completing this lesson, students will be able to:

Evaluate an existing logical model of a database.

Introduction
Sometimes, as part of the development process, you will need to work with existing logical
models. In such cases, you often need to validate the model. If the validation reveals flaws in
the existing model, you might need to redesign it. However, you should first evaluate
the risks involved in redesigning: in most cases, the previous functionality must be retained
in the new model, and you must protect the information that is already stored in the database.
In this lesson, you will learn the guidelines for analyzing existing logical models and
identifying problems within them.
Guidelines for Analyzing an Existing Logical Model

Principle: Apply guidelines for analyzing an existing logical model.


Introduction
Before starting any redesign, you should validate whether the model you are working on is
accurate. The following guidelines can help you avoid the costly mistake of working with the
incorrect model, thereby saving time and effort.
Guidelines
Consider the following guidelines for analyzing an existing logical model:
• Validate the accuracy of documentation for an existing logical model
Most companies either do not have strict operations management policies, including
document management controls, or do not enforce them. Therefore, a logical database
model is usually not 100 percent accurate with respect to the actual production environment.
Changes in the production environment might reveal requirements that were not
recognized in the early stages and mistakes that were corrected. These are usually
minor but important changes. You must validate the documentation to ensure that these
changes are retained in the new model.
• Reverse engineer the logical model if required
You will often find documentation that is not precise. Therefore, you should assign
some time in the project to reverse engineer the logical model. Visio for Enterprise
Architects 2005 enables you to reverse engineer databases into ER models or ORM
source models.
Compare the generated model with the documented model. If you find a few minor
changes in the production environment, modify the documented model to reflect the
production environment. With this approach, you retain the value of the supporting
documents beyond the ER diagram, including transcripts of interviews, messages, and reports.
You should use the reverse engineered model if:
• There are many changes.
• The changes to be made are structural in nature.
• The support documents do not exist.
If this phase is necessary, make sure to plan enough time for it.
Guidelines for Identifying Problems with a Logical Model

Principle: Apply guidelines to better identify problems with a logical model.


Introduction
When you are sure that you are working with the correct model, you must identify problems
in the existing database design. A common error is to normalize the model first without
looking at the other requirements. You should first identify the problems and then fix them.
In other words, look for inconsistencies and anomalies, and then normalize.
Guidelines
Consider the following guidelines for identifying problems in the existing logical models:
• Analyze the model for consistency
Database designers use logical models to ensure that only valid data is written to the
database. If that goal is not achieved, problems in the logical design are usually the cause.
While reviewing a logical model, search for inconsistencies that allow invalid data into the
model. Ask yourself, “How can I break the rules of the model?” For example, will you be
able to save an e-mail message in a date attribute? If date attributes are captured in text/char
columns, and if constraints are not properly enforced, you might be successful. Can you
save amounts in an Invoice Header that do not add up correctly? If the total amount is the
sum of the subtotal and the sales tax, saving all three amounts creates an unnecessary
functional dependency (total depends on subtotal and sales tax, which are not keys), which
allows invalid data to be saved in the database. (The sketch after these guidelines shows
how deriving the total removes this dependency.)
• Identify anomalies caused by insufficient normalization


Lack of normalization introduces update anomalies into the model. In addition to
normalizing the database, you must also identify the effects of the lack of normalization.
Identifying these anomalies will enable you to design a strategy to migrate data from one
schema to the other.
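The following sketch illustrates the Invoice Header example above; the table and column names are assumed for illustration. Deriving the total removes the functional dependency that would otherwise let the three amounts disagree.

    -- Storing Subtotal, SalesTax, and Total as three ordinary columns allows
    -- rows in which Total <> Subtotal + SalesTax. Deriving the total instead
    -- makes that inconsistency impossible.
    CREATE TABLE dbo.InvoiceHeader (
        InvoiceID int   PRIMARY KEY,
        Subtotal  money NOT NULL,
        SalesTax  money NOT NULL,
        Total     AS Subtotal + SalesTax  -- computed; cannot contradict its parts
    );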
Lab: Modeling a Database at the Logical Level

Time estimated: Insert time estimated


Scenario
You are the lead database designer working on the Human Resource Vacation and Sick Leave
Enhancement (HR VASE) project. In the previous lab, you created a conceptual model based on
the Requirements document about the information that the HR department wants to store about
employee vacation and sick leave hours. In this lab, you will build a logical model based on the
conceptual model and normalize its entities.
The HR VASE project will enhance the current human resources system. This system is based
on the AdventureWorks sample database included with SQL Server 2005.
The main goals of the project are to:
• Provide managers with current and historical information about employee vacation and
sick-leave data.
• Give individual employees permission to view their vacation and sick-leave balances.
• Give certain employees in the HR department permission to view and update employee
salary data.
• Give certain employees in the HR department permission to view and update employee
sick-leave and vacation data.
• Give the HR manager permission to view and update all of the data.
• Standardize employee job titles.
Preparation
Ensure that the virtual machines for the computers 2781A-90A-DEN-DC and 2782A-MIA-
SQL-02 are running. Log on to the computer by using the following credentials:
• User name: Administrator
• Password: Pa$$w0rd
Exercise 1: Determining Entities, Attributes, Relationships, Keys, and Constraints
Introduction
In this exercise, you will create a logical database model based on the provided conceptual
model, using Visio for Enterprise Architects 2005. You will have to determine entities,
attributes, relationships, keys, and constraints, and you will create an ER diagram.
Determining entities, attributes, relationships, keys, and constraints
Summary:
1. Create a new database model diagram.
2. Add the conceptual model diagram to the database model diagram.
3. Review the ORM conceptual model.
4. Generate the ER logical model.
5. Review the ER logical model.
6. Discuss your views with the class.

Specifications:
• Open Microsoft Office Visio for Enterprise Architects 2005 Beta. Create a new database
model diagram, and save it in the folder D:\Labfiles\Starter as LogicalModel.vsd.
• On the Main menu, click Database, point to Project, and then click Add Existing
Document.
• Browse to D:\Labfiles\Starter, and then select ConceptualModel.vsd. The Project window
is displayed, with the ConceptualModel.vsd file added to the project. Double-click
ConceptualModel.vsd, and when the conceptual model is displayed, review the ORM
conceptual model.
• On the Main menu, click Database, point to Project, and then click Build. From the
Tables and Views window, drag the tables to the design area.
• Review the logical design diagram. Identify the entities, attributes, relationships, keys,
and constraints represented in the diagram.

Answer Key
1. Open Microsoft® Office Visio® for Enterprise Architects 2005 Beta.
2. Create a new database model diagram, and save it in the folder D:\Labfiles\Starter as
LogicalModel.vsd.
3. On the Main menu, click Database, point to Project, and then click Add Existing
Document.
4. Browse to D:\Labfiles\Starter, and then select ConceptualModel.vsd. The Project
window is displayed, with the ConceptualModel.vsd added to the project.
5. Double-click ConceptualModel.vsd, and when the conceptual model is displayed,
review the ORM conceptual model.


6. Close the conceptual model.
7. Save LogicalModel.vsd.
8. On the Main menu, click Database, point to Project, and then click Build.
9. From the Tables and Views window, drag the tables to the design area.
10. Review the logical design diagram. Identify the entities, attributes, relationships, keys,
and constraints represented in the diagram.
11. Save LogicalModel.vsd.
Exercise 2: Normalization
Introduction
In this exercise, you will examine the resulting ER diagram from Exercise 1. Based on the
generated logical model, you will change the model to ensure that all entities are normalized.
All entities must be in at least the third normal form.

Normalization
Summary:
1. Determine whether the design is in third normal form.
2. Normalize the logical database.
3. Review the new facts added when the design was normalized.

Specifications:
• Review the logical model generated in the previous exercise. Determine whether the
design is in third normal form. If it is not, list the attributes and entities that should be
normalized.
• During the discussion, share your findings about the lack of normalization in the logical
design with other students. Based on your findings, normalize the logical database. Create
the necessary entities, and change the attributes to support your normalized design.
• On the Main menu, click Database, point to Project, and then click Update Source
Models. In the Project window, double-click ConceptualModel.vsd. In the Business Rules
window, review the new facts added when the design was normalized.
Answer Key
1. Review the logical model generated in the previous exercise.
2. Determine whether the design is in third normal form. If it is not, list the attributes and
entities that should be normalized.
3. Based on your findings, normalize the logical database. Create the needed entities, and
change the attributes to support your normalized design. At least the attribute Title in
the Employee table should be normalized.
4. On the Main menu, click Database, point to Project, and then click Update Source
Models.
5. In the Project window, double-click ConceptualModel.vsd. If the window is not
displayed, on the Main menu, click Database, point to View, and then click Project to
display the window, and then double-click ConceptualModel.vsd.
6. In the Business Rules window, review the new facts added when the design was
normalized. If the Business Rules window is not displayed, on the Main menu, click
Database, point to View, and then click Business Rules.

Discussion questions
Read the following questions and discuss your answers with the class.
Q Why is the design not in the third normal form?
A The design might not be in the third normal form because of the following:
• The Title column (attribute) in the Employee table (entity)
• The Sick Reason column in the Sick Event table
• The Group Name column in the Department entity
Q How can you normalize the design?
A The design can be normalized in the following ways:
• Create the Employee Title table.
• Create the Sick Reason table.
• Add the Department Group entity.
Module 3: Modeling a Database at the Physical
Level
Time estimated: 180 Minutes
Table of contents

Lesson 1: Designing Physical Database Objects................................................................................. 6


Apply guidelines for designing physical database objects............................................................. 6
Introduction ............................................................................................................................... 6
Guidelines for Establishing Database Object Naming Standards ....................................................... 7
Principle: Follow guidelines for naming database terms, abbreviations, and objects.................... 7
Introduction ............................................................................................................................... 7
Guidelines.................................................................................................................................. 7
Considerations for Choosing Column Data Types.............................................................................. 9
Principle: Consider tradeoffs when choosing column data types. ................................................. 9
Introduction ............................................................................................................................... 9
Considerations when choosing column data types .................................................................... 9
Considerations for Using CLR User-Defined Data Types................................................................ 12
Principle: Consider aspects of data types and processing requirements when using CLR user-
defined data types. ....................................................................................................................... 12
Introduction ............................................................................................................................. 12
Considerations when using CLR UDTs .................................................................................. 12
Guidelines for using the XML Data Type......................................................................................... 14
Principle: Apply guidelines for using the XML data type. .......................................................... 14
Introduction ............................................................................................................................. 14
Guidelines................................................................................................................................ 14
Guidelines for Choosing Computed Columns .................................................................................. 16
Principle: Apply guidelines for choosing computed columns. .................................................... 16
Introduction ............................................................................................................................. 16
Guidelines................................................................................................................................ 16

Lesson 2: Designing Constraints........................................................................................................ 18


Apply best practices when designing constraints to columns, tables, and databases................... 18
Introduction ............................................................................................................................. 18
Guidelines for Designing Column Constraints ................................................................................. 20
Principle: Apply guidelines for designing column constraints. ................................................... 20
Introduction ............................................................................................................................. 20
Guidelines................................................................................................................................ 20
Guidelines for Designing Table Constraints ..................................................................................... 22
Principle: Apply guidelines for designing table constraints......................................................... 22
Introduction ............................................................................................................................. 22
Guidelines................................................................................................................................ 22
Guidelines for Designing Database Constraints................................................................................ 24
Principle: Apply guidelines for designing database constraints................................................... 24
Introduction ............................................................................................................................. 24
DDL triggers ........................................................................................................................... 24

Lesson 3: Designing for Database Security....................................................................................... 26


Implement security best practices in the design of databases. ..................................................... 26
Guidelines for Authentication and Authorization ............................................................................. 27
Principle: Apply guidelines for authentication and authorization................................................ 27
Authentication ......................................................................................................................... 27
Authorization........................................................................................................................... 28
Considerations for Data Protection ................................................................................................... 29
Principle: Consider the methods for protecting data in a database. ............................................. 29
Introduction ............................................................................................................................. 29
Data protection ........................................................................................................................ 29
Data encryption ....................................................................................................................... 30
Considerations for Auditing.............................................................................................................. 31
Principle: Consider the impact of auditing on the physical schema of the database.................... 31
Introduction ............................................................................................................................. 31
Audit patterns .......................................................................................................................... 31
Discussion: Encryption Tradeoffs..................................................................................................... 33
Principle: Discuss the tradeoffs implicit in different encryption scenarios.................................. 33
Introduction ............................................................................................................................. 33
Discussion questions ............................................................................................................... 33

Lesson 4: Designing Database and Server Options ......................................................................... 35


Apply best practices when designing database and server options. ............................................. 35
Introduction ............................................................................................................................. 35
Considerations for Service Settings .................................................................................................. 36
Principle: Consider the service settings in designs that implement cross-database access or
enable CLR integration. ............................................................................................................... 36
Introduction ............................................................................................................................. 36
Enabling CLR integration ....................................................................................................... 36
Cross-database access.............................................................................................................. 38
Guidelines for Specifying Database File Placement and Organization............................................. 39
Principle: Apply guidelines for specifying placement of database files. ..................................... 39
Introduction ............................................................................................................................. 39
Guidelines................................................................................................................................ 40
Guidelines for Choosing Database Options ...................................................................................... 41
Principle: Apply guidelines for choosing database options. ........................................................ 41
Introduction ............................................................................................................................. 41
Database options...................................................................................................................... 41

Lesson 5: Evaluating the Physical Model.......................................................................................... 43


Apply best practices when evaluating the physical model........................................................... 43
Introduction ............................................................................................................................. 43
Reasons for Prototyping the Database Design .................................................................................. 44
Principle: State the reasons for prototyping the database design. ................................................ 44
Reasons to prototype ............................................................................................................... 45
Guidelines for Data Migration .......................................................................................................... 46


Principle: Apply guidelines for migrating data............................................................................ 46
Introduction ............................................................................................................................. 46
Guidelines................................................................................................................................ 46

Lab 3: Modeling a Database at the Physical Level .......................................................................... 48


Time estimated: 50 minutes .................................................................................................... 48
Exercise 1: Specify Database Object Naming Standards ....................................................... 48
Exercise 2: Define Tables and Columns and Choose Data Types........................................... 48
Scenario................................................................................................................................... 48
Preparation .............................................................................................................................. 49
Exercise 1: Specify Database Object Naming Standards.................................................................. 50
Introduction ............................................................................................................................. 50
Specify database object naming standards .............................................................................. 50
Exercise 2: Define Tables and Columns and Choose Data Types ................................................... 51
Introduction ............................................................................................................................. 51
Define tables and columns and choose data types................................................................... 51
Module objective:
After completing this module, you will be able to:

Apply best practices for creating a physical database design.


Introduction
In its planning phase, the Microsoft® Solutions Framework (MSF) Process Model defines three levels
of design: conceptual, logical, and physical. The physical design level that is discussed in this module
and the remaining modules is the culmination of the conceptual and logical design stages.

The following table illustrates the evolution of a database design through its conceptual, logical, and
physical design levels.

Database design question    Database design level        Distinguishing features
Why?                        Conceptual database model    User requirements
What?                       Logical database model       Relational and data warehouse theories
How?                        Physical database model      Technical implementation decisions
The physical design of a database is the last step in the design process and the point at which you
create the physical database model, the third stage in the evolution of the database design. At the
physical stage, the development team has already agreed on the following:
• The business requirements and terminology, and a conceptual database model.
• A set of database entities and relationships in the logical design.
This module provides you with the conceptual information and techniques that you will need to
successfully model a database at the physical level. You will learn how to:
• Design physical database objects.
• Design constraints on columns, tables, and databases.
• Design a physical database more securely to protect sensitive data.
• Design database and server options.
• Evaluate the physical model.
Lesson 1: Designing Physical Database Objects

Lesson objective:
After completing this lesson, students will be able to:

Apply guidelines for designing physical database objects.


Introduction
In this lesson, you will learn about important guidelines for implementing physical database objects
when designing a physical database model. After learning how to establish naming standards for
database objects in general, you will focus on the tables and columns used to implement the entities
and attributes of the logical design.
Guidelines for Establishing Database Object Naming Standards

Principle: Follow guidelines for naming database terms, abbreviations, and objects.
Introduction
When designing the physical database model, it is important to adopt a standard for naming database
objects. By following a naming standard, developers, database designers, and administrators can more
effectively communicate with stakeholders. For example, developers can clearly communicate the
contents of a table, or the purpose of a stored procedure, by the way the object is named. By adhering
to a naming standard, both the development team and the operations team can minimize errors caused
by misleading object names.
Guidelines
Consider the following guidelines when establishing database object naming standards:
• Use names that comply with the rules for forming SQL Server 2005 identifiers
Microsoft SQL Server™ 2005 refers to the names of database objects as identifiers, and it
requires identifiers for most objects. There are two classes of identifiers: delimited and regular.
Delimited identifiers are enclosed in brackets ([ ]) or double quotation marks ("") and can use
any characters in the character set. Regular identifiers, however, are subject to the following
rules:
• The first character must be a Latin character from a to z or from A to Z, a letter from a
different language (Unicode Standard 3.2), or an underscore (_) character. The “at” sign
(@) denotes local variables or parameters, the number sign (#) denotes temporary tables
or procedures, and the double number sign (##) denotes a global temporary object.
(Because SQL Server uses double @ signs in some Transact-SQL functions, to avoid
confusion, you should not use names that begin with @@.)
• The remaining characters must be letters as previously defined, decimal numbers (basic
Latin or national scripts), the @ sign, the dollar sign ($), or _. Embedded spaces and
special characters are not allowed.
• The identifier must not be a Transact-SQL reserved word in any uppercase/lowercase
version.
You should use regular identifiers to name database objects. Using a regular identifier
eliminates the need to use delimiters around object names, making your queries easier to
code and read, as the example following these guidelines illustrates.
• Use descriptive terms
Keep your names brief and intuitive, and ensure that they clearly describe the object that you
are naming, such as a table, view, or stored procedure. For example, if you are naming a table
that contains employee information, the names Employees, Employee, or tblEmployee are
more descriptive than TVHREMP (T=Table, V=VASE project, HR=Human Resources,
EMP=Employee). Even though the latter name is systematically formed, it is not very
descriptive. When you use a simple and descriptive naming style, your users, developers, and
database administrators can easily recognize the contents of a table or identify other database
objects.
• Use only standard abbreviations in names
When naming objects, use well-known abbreviations consistently and avoid using
nonstandard abbreviations. For example, you can choose “Org” as the abbreviation for
Organization, because Org is a well-known abbreviation for Organization. If you use this
abbreviation in all objects (tables, stored procedures, functions, and columns) that use the
word Organization, then users, developers, and database administrators can easily recognize
the meaning of the abbreviation. However, in some databases, Org might be confused with
“Organism” or “Organ,” so you should use careful judgment with abbreviations and make
sure that they are appropriate for the business vocabulary that you are using.
• Name intersection tables consistently
When establishing a naming standard, you must specify how to name intersection tables.
Database intersection tables are used to represent many-to-many relationships. A common
solution is to name the table by concatenating the names of the entities it references. For
example, the relationship between Countries and Languages may be represented as
CountriesLanguages.
• Be consistent across all objects
It is critical that after defining the naming standard you use it consistently across all objects.
To ensure consistency, database object naming reviews must be included in the project as a
strict quality-control measure. When names do not conform to the standard, they should be
added to the bug list. It is vital that this naming review take place in the design phase and that
all related bugs are resolved before any development takes place.
• Document and communicate naming standards
The best way to enforce a selected set of naming standards is to document them carefully and
then communicate them to all team members and stakeholders. The naming standards should
be clear to all stakeholders. As part of the documentation process, you might find it helpful to
create a “Quick Reference” card with a detailed explanation of the naming standards and
some examples. When new members join the development team, you should instruct them on
the established naming standards as part of their training.
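The following short example, with hypothetical object names, shows why regular identifiers are easier to work with than delimited identifiers:

    -- Regular identifiers: no delimiters are needed anywhere they are referenced.
    CREATE TABLE dbo.EmployeeTitle (
        TitleID   int PRIMARY KEY,
        TitleName nvarchar(50) NOT NULL
    );
    SELECT TitleName FROM dbo.EmployeeTitle;

    -- Delimited identifiers: embedded spaces force brackets in every query.
    CREATE TABLE dbo.[Employee Title] (
        [Title ID]   int PRIMARY KEY,
        [Title Name] nvarchar(50) NOT NULL
    );
    SELECT [Title Name] FROM dbo.[Employee Title];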
Considerations for Choosing Column Data Types

Principle: Consider tradeoffs when choosing column data types.


Introduction
During the physical design phase, you define tables and columns based on the entities and attributes in
the logical model. In the logical model, generalized data types might have been indicated for the
attributes, but you will normally choose the actual SQL Server data types during the physical design
phase. Making data type decisions involves many factors and necessitates making tradeoffs in your
choices. These tradeoffs might reflect performance or security considerations based on your
requirement documents as well as any service level agreement (SLA) that might be in effect. For
example, a data type such as a globally unique identifier (GUID) has a high degree of guaranteed
uniqueness, but it also comes with a performance penalty, and only an integer data type can perform
within required ranges.

Considerations
Consider the following tradeoffs when choosing column data types:
• Integer vs. GUID primary keys
It is often helpful to use a surrogate key as a primary key in your base tables. When working
with surrogate keys, you commonly have two choices: either define an integer, such as
smallint, int, or bigint, with an identity property; or use a GUID (uniqueidentifier) with a
NEWID default constraint.
Integers are shorter than GUIDs, usually 4 bytes in size, and are easy to understand. Because
of the difference in size, indexes on integer columns usually perform better than indexes on
GUID columns. GUIDs are longer than integers (16 bytes for a GUID) and are difficult to
read and remember.
One advantage of GUIDs over integer columns is that they are globally unique in a database
and in a server across the world. Therefore, GUID columns are preferred as candidate keys in
distributed databases and are useful for consolidating data from diverse sources, where the
original integer keys might overlap. GUIDs are also required in systems that use merge
replication and transactional replication with immediate updating subscribers.
However, the GUID data type often does not perform as well as an integer key when used in
clustered indexes. A clustered index that is built on an integer value usually performs better
than one built on a GUID column because the size is smaller. In addition, if you can cluster a
table on an ascending integer key (for example, by using the Identity property), inserts will
not cause the table to fragment (although subsequent updates might). Because GUID
values are essentially random, inserts can fragment a table that is clustered on a column
with a GUID data type. (The sketch at the end of this topic contrasts the two key choices.)
For more information
For more information on the performance effects of using GUIDs as primary keys, see the article
“Performance Effects of Using GUIDs as Primary Keys,” written by Brian Moran on the
WindowsITPro web site.

• Fixed vs. variable length columns


If all of the values in a column are of the same or similar length, you should use the CHAR or
NCHAR data types. However, if column data entries vary considerably in length, you should
choose the VARCHAR or NVARCHAR data types. The VARCHAR and NVARCHAR
data types use 2 bytes in each row to record the length of the information contained. When
column lengths vary considerably, the VARCHAR and NVARCHAR data types
can occupy less space than the CHAR and NCHAR data types, because they do not
pad values with spaces.
• VARCHAR (MAX), NVARCHAR (MAX), and VARBINARY (MAX) data types
SQL Server 2005 introduces an extension of the VARCHAR, NVARCHAR, and
VARBINARY data types. Using this extension, you can create columns that can contain
more than 8,000 bytes of data.
The new extension eliminates the need for text and image data types. You should avoid the
TEXT, NTEXT, and IMAGE data types in new solutions that you develop, as they are now
officially deprecated in SQL Server 2005.
• Character column collations and Unicode, and non-Unicode data types
Character column collations are rules that SQL Server uses to determine how data is sorted
and compared. These rules are based on language and locale norms, case sensitivity, accent
marks, and kana (Japanese) character rules. If all users of the database use the same language,
you should use a collation that provides support for that language. You can also use a more
general collation, such as the LATIN1_GENERAL collation.
When the database supports users with different languages, consider using data types that
store Unicode characters (NCHAR and NVARCHAR). You should use Unicode data types
only when clearly needed, such as when multilanguage support is required. Unicode data
types require 2 bytes per character, as opposed to 1 byte per character for the non-Unicode
types.
• Transact-SQL user-defined data types
You can create a Transact-SQL user-defined data type (UDDT) in SQL Server 2005 based on
system-provided data types. With the help of UDDTs, you obtain an extra level of abstraction
by declaring the type in one place and using it in several columns. For example, if Product
Code in a database is of the CHAR (10) type, you can create a Product Code UDDT and use
it in all of the required columns.
The main advantage of traditional Transact-SQL UDDTs is that you can apply constraints to
them by using bound defaults and bound rules. However, the use of Transact-SQL UDDTs is
severely limited now that bound rules and bound defaults have been deprecated in SQL
Server 2005. In place of Transact-SQL UDDTs, you can use common language runtime
(CLR) user-defined data types (UDTs) and code appropriate constraints by using managed
code.
• CLR user-defined types
A new feature of SQL Server 2005 is that you can create your own custom UDTs based on
classes implemented in the Microsoft .NET Framework assemblies. As a result, you can
create customized data types by using any language (Microsoft Visual Basic®, C#, C++, J#,
and others) that supports the .NET Framework.
One advantage of CLR UDTs is that they provide encapsulation that separates the process of
storing and presenting data. For example, the DateTime data type is serialized differently in
the database (a decimal number) and displays differently when queried (date format).
By using CLR UDTs, you can create customized data types with the same behavior. For
example, you can create your own UDT that captures integers and displays them as roman
numerals (2005=MMV).
CLR UDTs will be covered in detail in the next topic, “Considerations for Using CLR User-
Defined Data Types.”
Note
In SQL Server 2005, the abbreviation UDT is used for CLR user-defined types, while UDDT is used
for Transact-SQL user-defined data types. Although both are actually user-defined data types that are
used to limit the domain for columnar data, the CLR managed code data types are usually known
simply as “user-defined types.”

• XML data type


SQL Server 2005 introduces the XML data type, which you can use to store XML documents
and XML fragments. SQL Server verifies whether the stored XML document is well-formed.
Further, the XML document can be validated with an associated XML schema. The XML
data type provides you with a set of XML functions to access and modify the values of a
column by using XQuery, a standard language for querying XML documents. By using the
XML data type, you can also create indexes over the XML content.
If you know the XML schema of the columns at design time, you should create the columns
by using the XML data type associated with the proper schema.

For more information


The guidelines for using the XML data type are covered later in this lesson in the topic Guidelines for
Using the XML Data Type, and XML indexes are covered in Module 4, “Designing for Database
Performance,” Lesson 1, Designing Indexes.

For more information on the XML data type, see Course 2781, Designing Microsoft SQL Server 2005
Server-Side Solutions.
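The sketch below contrasts the two surrogate-key choices discussed in this topic; the object names are illustrative. In SQL Server 2005, the NEWSEQUENTIALID() function can also be used in a default constraint to generate ordered GUIDs that reduce insert fragmentation.

    -- Integer surrogate key: small (4 bytes), ascending, index-friendly.
    CREATE TABLE dbo.CustomerInt (
        CustomerID   int IDENTITY(1,1) PRIMARY KEY,
        CustomerName nvarchar(50) NOT NULL
    );

    -- GUID surrogate key: 16 bytes and globally unique, which helps when
    -- merging or replicating data, but random values fragment a clustered index.
    CREATE TABLE dbo.CustomerGuid (
        CustomerID   uniqueidentifier NOT NULL
                     CONSTRAINT DF_CustomerGuid_ID DEFAULT NEWID()
                     PRIMARY KEY,
        CustomerName nvarchar(50) NOT NULL
    );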
Considerations for Using CLR User-Defined Data Types

Principle: Consider aspects of data types and processing requirements when using CLR user-defined
data types.
Introduction
With the help of CLR UDTs, developers can program their own data types in SQL Server 2005. By
using CLR UDTs, you can use a .NET Framework language to create classes that extend the system
data types provided in SQL Server 2005. In previous releases of SQL Server, only the built-in data
types and the Transact-SQL user-defined data types were available.
Because CLR UDTs embody code that is not readily available to the database administrator to view or
change, it is important that CLR UDTs be fully tested and well designed before you allow them to
affect database data.
Considerations when using CLR UDTs
When using CLR UDTs, consider the following points:
• Consider using CLR UDTs for nonstandard or proprietary data types
CLR UDTs are useful for solving problems in which you need to use data types that are
specific to your applications and different from built-in data types. Most data captured by
business applications can be stored in the built-in SQL Server data types. However, the
following are some examples in which CLR UDTs can be useful:
• Cartesian coordinate data type. A point in a Cartesian plane is represented by X and Y
coordinates (X, Y). In SQL Server 2005, you can create your own type to store the point
as a single column.
• Roman numeral data type. With a roman numeral data type, you can serialize the
information as a number, and present it as a character string.
• E-mail or URL data type. An e-mail or URL data type permits you to validate the value
stored in the database by using languages, such as Visual Basic or C#, and by using the
powerful functions provided in the System.Text.RegularExpressions namespace.
• Account number data type. Banking account numbers and credit card numbers use
different algorithms to validate the account number. You can build this algorithm into the
data type by using CLR UDTs.
• Avoid excessively complex data types
When building CLR UDTs, remember to avoid creating complex data types that can hinder
server performance. For example, you might not want to use CLR UDTs to store a telephone
array type, because there are better standard solutions that address this requirement.
For more information
For more information on considerations involved in using the SQL CLR, including CLR UDTs, see
the white paper “Using CLR Integration in SQL Server 2005” on the Microsoft MSDN Web site.

• Consider the overhead of row-by-row processing


When writing a class that supports a CLR UDT, consider that the code defining the data type
will be executed frequently, one row at time. Therefore, it is critical that the CLR UDT code
be fully optimized. However, even fully optimized code cannot perform as well as the built-in
SQL Server data types, so it is important to use CLR UDTs only when the cost in
performance is small, or at least acceptable.
• Consider the risks of tightly coupling a CLR UDT and the database
Loose coupling of components is one of the architectural patterns that is used in creating
highly scalable and extensible applications. CLR UDTs are tightly coupled to the database,
meaning that the code and the database are intrinsically joined and cannot be separated easily.
Consider this factor when deciding to implement CLR UDTs, because using them might
negatively affect reusability and maintainability. To use the UDT, client applications must
create a binary reference to the assembly that defines the UDT. (The sketch following these
considerations shows the registration pattern.)
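Registering and using a CLR UDT from Transact-SQL follows the pattern sketched below. The assembly path, assembly name, and Point class are assumptions for illustration; the type itself would first be written in a .NET language such as C# and compiled into the referenced DLL.

    -- Hypothetical: register an assembly containing a Point UDT, create the
    -- type from it, and then use the type as a column.
    CREATE ASSEMBLY ComplexTypes
        FROM 'C:\Types\ComplexTypes.dll';   -- assumed path to the compiled assembly

    CREATE TYPE dbo.Point
        EXTERNAL NAME ComplexTypes.[Point]; -- assembly_name.class_name

    CREATE TABLE dbo.StoreLocation (
        StoreID  int       PRIMARY KEY,
        Position dbo.Point NULL             -- an (X, Y) point stored in one column
    );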
Guidelines for Using the XML Data Type

Principle: Apply guidelines for using the XML data type.


Introduction
SQL Server 2005 introduces a built-in XML data type that offers database designers the ability to
store unstructured data that was not natively supported in previous versions of SQL Server. The XML
data type allows you to store XML documents natively in the database and create indexes on the
document.
Guidelines
Consider the following guidelines when working with the XML data type:
• Use the XML data type for data that is not updated
The relational model uses normalization to minimize storage of redundant data in the
database. Therefore, the relational model can provide support to many insert, update, and
delete operations. The XML model does not support redundancy reduction, and it does not
handle inserts, updates, and deletes as effectively as the relational model. If data is updated
frequently, you should work with a relational structure.
• Use typed XML columns
SQL Server 2005 allows designers to use XML Schema definition language (XSD)
documents to create typed XML columns, parameters, and variables. A typed XML value is an
XML element value that has been assigned a data type in an XML schema. To create typed
XML columns, you create an XML schema collection that includes one or more XSD
documents that will validate the column (see the sketch following these guidelines).
Use typed XML columns when you have the schema information of the column. The server
validates typed XML columns to support data integrity in the XML documents. Typed XML
columns also allow the server to optimize storage and queries based on this information. Do
not use typed XML columns when the schema is not available or when the application
validates the XML documents, because the additional overhead of the typed XML column
will be unnecessary.
• Use the XML data type for data that is not relationally structured
When you have structured data, it is better to use the relational model and store the data in
relational tables. For example, most business invoices have certain attributes whose types are
known at design time. Data sets such as invoices, therefore, are good candidates to be stored
in tables. However, at times you might work with data that is semistructured or hierarchical,
and that does not map well in the relational model. For example, you might want to store the
data captured by a custom form for a particular customer in the customer table. The form
differs from customer to customer, but an XML column allows you to store each result per
customer. When data is not relationally structured, consider using the XML data type.
• Use the XML data type for configuration information
Configuration information is semistructured data that is often stored in .ini files or in the
Microsoft Windows® registry. With the introduction of the .NET Framework and the
increasing popularity of XML, modern applications use .xml files to store configuration
settings. Now you can easily store configuration information in an XML data type and ensure
that the data is well formed, validate the schema, and use XQuery and XPath to reference
specific values stored inside the attributes.
• Use the XML data type for data with recursive structures
Sometimes you need to store data that requires a varying recursive structure. An example is a
set of menu items used by a restaurant. To find the cost of a particular menu item, you add the
costs of all of its ingredients. However, some of the ingredients might also be used in the
creation of additional menu items. Therefore, you cannot calculate the cost of the ingredient
without taking into consideration its use in other menu items. This type of convoluted
calculation requires recursive queries, which are difficult to manage in the relational model.
Because XML is hierarchical by nature, it can easily handle these recursive structures.
However, if the recursive structure is simple or does not vary, you might find that Transact-
SQL common table expressions (CTEs) can accommodate those recursive queries in a
relational structure.
• Be alert to the overhead involved in querying for elements
When using the XML data type, take into consideration that you can incur significant
overhead if the server has to extract values from elements embedded in an XML column. Do
not expect the same level of performance obtained from querying attributes that are directly
stored in a relational table’s columns.
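The following minimal sketch shows a typed XML column; the schema namespace and form structure are assumptions for illustration:

    -- Register a hypothetical XSD as an XML schema collection. SQL Server then
    -- validates every value stored in columns typed with this collection.
    CREATE XML SCHEMA COLLECTION CustomerFormSchema AS
    N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                  targetNamespace="http://example.com/customerform"
                  xmlns="http://example.com/customerform"
                  elementFormDefault="qualified">
        <xsd:element name="Form">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="Question" type="xsd:string"
                           maxOccurs="unbounded" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:schema>';

    CREATE TABLE dbo.CustomerForm (
        CustomerID int PRIMARY KEY,
        FormData   xml (CustomerFormSchema)  -- typed: validated against the collection
    );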
Guidelines for Choosing Computed Columns

Principle: Apply guidelines for choosing computed columns.


Introduction
The computed columns feature of SQL Server enables you to create columns whose values are based
on the values of other columns in the same table. With SQL Server 2005, you can persist a computed
column by using the PERSISTED keyword in the CREATE or ALTER TABLE command. In your
physical design, some attributes can be calculated based on other attributes, and can therefore be
candidates for becoming a computed column in the database.
Guidelines
Consider the following guidelines when working with computed columns:
• Use computed columns to derive results from other columns
Consider an Order Detail table, for which you find it useful to include an Extended Price
column. You can implement this column as a computed column that is the result of the
multiplication of the Price and Quantity columns. Every time a user queries the Order Detail
table, the server performs the operation and sends the result of the calculation in the Extended
Price column to the user. If your query includes a nonpersisted computed column, the server
calculates it at run time. Computed columns do not use space unless you specifically add
persistence.
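A minimal sketch, assuming the Order Detail table is implemented as OrderDetail with Price and Quantity columns:

-- Nonpersisted computed column; calculated at query time, no storage used
ALTER TABLE OrderDetail
ADD ExtendedPrice AS (Price * Quantity)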
• Use persisted computed columns for performance
If you need to improve the performance of a computed column and the values of the column


are rarely updated, you can optimize it by altering the column to be persisted. Adding the
PERSISTED keyword causes the column to be materialized (stored on disk).
When a computed column is persisted and a row is inserted or updated, the server generates
the value by using the operations and functions defined for the column. For subsequent
queries to the same row, the server uses the saved data and does not calculate the operations
again.
Persisted computed columns are not always faster than nonpersisted computed columns.
Simple operations in computed columns might be faster when not persisted, because
processors and memory are much faster than disk I/O, and calculating the values of a
computed column at run time can be faster than reading the values from the disk.
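For example, the nonpersisted column sketched in the previous guideline could be dropped and re-created as a persisted column:

ALTER TABLE OrderDetail DROP COLUMN ExtendedPrice

-- PERSISTED stores the computed value on disk and refreshes it on writes
ALTER TABLE OrderDetail
ADD ExtendedPrice AS (Price * Quantity) PERSISTED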
• Avoid the overhead of complex functions in computed columns
Avoid the use of complex functions in computed columns that make the computations
expensive. This is particularly important for nonpersisted columns on frequently queried data,
and for persisted columns on frequently updated data.
When using user-defined functions inside computed columns, you should be cautious of
functions that are slow or that consume processor resources intensively.
• Avoid persisted computed columns on active data
If a persisted computed column’s value is based on columns that are frequently updated, the
additional computation required to store the newly computed values can add significant
overhead to inserts, updates, and deletes on the table. As a result, computed columns on
active tables should not be persisted.
• Protect against numeric overflow and divide by zero
When creating computed columns, you must ensure that all operations and functions are
arithmetically correct. For example, assume that you have an Inventory table with Quantity
and TotalCost columns. You can add a column for unitary cost by using the following
statement:
ALTER TABLE Inventory ADD UnitCost AS
    CASE WHEN Quantity = 0.0 THEN 0 ELSE TotalCost / Quantity END
This removes the possibility of a divide-by-zero error when the quantity is zero.


Lesson 2: Designing Constraints

Lesson objective:
After completing this lesson, students will be able to:

Apply best practices when designing constraints to columns, tables, and databases.
Introduction
In this lesson, you will review guidelines for applying physical constraints as a part of your physical
database model. After you have designed the tables that implement a logical model’s entities and
attributes, you can proceed to design the database constraints that will implement the logical model’s
constraints and relationships.

Database constraints limit the data values that can be stored in the database. SQL Server uses
constraints to validate data changes during INSERT, UPDATE, and DELETE commands. You can
use constraints to help ensure the consistency of the database as well as the overall quality and
accuracy of its data, and therefore the database’s integrity.

You can implement the following four types of integrity by using database constraints:
• Entity integrity
• Domain integrity
• Referential integrity
• User-defined integrity


Although database constraints are the preferred method of enforcing database integrity, you can also
use other custom-coded methods such as stored procedures, functions, triggers, and rules. However,
built-in database constraints have the following three advantages over custom-coded methods:
• Constraints follow standard SQL language formats and are therefore easily grasped.
• Built-in constraints generally offer better performance than coded constraints, because SQL
Server is optimized to validate constraints when data is updated.
• Constraints can be considered by the SQL Server query optimizer and can assist the optimizer
in building more efficient query plans.


Guidelines for Designing Column Constraints

Principle: Apply guidelines for designing column constraints.


Introduction
Column constraints are database restrictions or rules that limit the range of values that can be stored in
a column. Defining these rules enhances the quality of the stored information and is a standard method
for enforcing integrity in the database.
Guidelines
Consider the following guidelines when designing column constraints:
• Declare columns as NOT NULL whenever possible
If possible, you should not allow NULLs in columns, because NULLs add unnecessary
complexity to the database. This complexity is often beyond what developers and database
administrators intend.
For example, if you have a column named HireDate that allows NULLs, you might expect
that if you query the table with the condition HireDate < '20050101' and later query it with the
condition HireDate >= '20050101', the two result sets combined would include all of the rows in
the table. However, your query results will be missing the rows in which the
HireDate value is NULL, because a NULL implies an unknown value, and an unknown
HireDate is neither before nor after one specific date.
When your design requires you to use columns with unknown values, consider using a
specific value to indicate an unknown value. In many cases, an actual value within the
domain of the column’s acceptable values can serve as an indicator that the value is unknown.
For example, a HireDate of 1900-01-01 might be considered an empty or unknown value.


However, you must ensure that all queries are aware of a particular value being an unknown
value. When no actual value could serve to indicate an unknown value, you might be forced
to allow NULL in the column.
• Use ANSI DEFAULT constraints rather than bound defaults
You can assign a default value to a column by using a DEFAULT constraint. The standard
method for giving a default value to a column is by using the ANSI default constraint. The
ANSI default constraint is assigned to a column when it is declared. A nonstandard method of
defining a default value is to define a bound default, and then bind the default object to the
required columns.
ANSI defaults are easier to declare and maintain than bound defaults. Because bound defaults
are deprecated in SQL Server 2005, you should avoid using them in your designs and plan for
their removal when redesigning current databases that use them.
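For example, ANSI DEFAULT constraints are declared inline with the columns; the table and default values below are illustrative:

CREATE TABLE Customer (
    CustomerID INT NOT NULL PRIMARY KEY
  , Country NVARCHAR(60) NOT NULL DEFAULT ('USA')
  , CreatedDate DATETIME NOT NULL DEFAULT (GETDATE()) )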
• Use column CHECK constraints to enforce domain integrity
Domain integrity enhances the quality of data by placing limits on the values a column can
contain. In SQL Server, CHECK constraints are the built-in method of enforcing domain
integrity. Some samples of CHECK constraints are:
- Amount>0
- Gender IN ('F','M')
- LicensePlate LIKE '[A-Z][A-Z][A-Z]-[0-9][0-9][0-9]'
CHECK constraints work best when they are simple. Complex constraints should be placed
in the middle tier, especially if they are subject to frequent change.
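As a sketch, the first expression above could be declared inline on a hypothetical Invoice table:

CREATE TABLE Invoice (
    InvoiceID INT NOT NULL PRIMARY KEY
  , Amount MONEY NOT NULL CHECK (Amount > 0) )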
• Use CHECK constraints rather than bound rules
CHECK constraints are an ANSI standard method for placing explicit limits on the ranges of
values that can be stored in a column. Bound rules are an older, nonstandard alternative to
CHECK constraints. You should always use ANSI CHECK constraints declared at the
column or table level. You should avoid the use of bound rules for the same reasons you
should avoid bound defaults.


Guidelines for Designing Table Constraints

Principle: Apply guidelines for designing table constraints.


Introduction
You have already learned how to enforce entity integrity by using primary keys, and that middle-tier
components usually implement user-defined integrity. This topic covers the guidelines for
implementing domain and referential integrity by using table constraints. Domain and referential
integrity are important tools that you can implement in the database to ensure data integrity.
Domain integrity validates the entries for a given column, and referential integrity defines
relationships between tables when records are entered or deleted.
Guidelines
You can use the following guidelines for implementing domain and referential integrity when
designing table constraints:
• Use DRI
Declarative referential integrity (DRI) is one of the most powerful instruments in a relational
database management system (RDBMS) for maintaining data integrity in the database. In
SQL Server, you enforce DRI by using foreign key constraints. Foreign key constraints
guarantee that a relationship between two tables is reliable. When you create a foreign key
constraint from one column in a table to a referenced column in another table, SQL Server
guarantees that the values of that column in every row of the table will be validated against
the valid values of a column in a referenced table. DRI is faster, safer, more scalable, and
easier to maintain than any other method of maintaining referential integrity.
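A minimal sketch of such a constraint, assuming the Order and Order Detail tables are implemented as [Order] and OrderDetail:

ALTER TABLE OrderDetail
ADD CONSTRAINT FK_OrderDetail_Order
FOREIGN KEY (OrderID) REFERENCES [Order] (OrderID)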


• Specify cascading levels and options


SQL Server lets you decide what the server will do when a row referenced by another table is
deleted or updated. The default option is NO ACTION, which means that the server raises an
error, sends it to the client application, and cancels the delete or update transaction in
the parent table.
You can use ANSI-standard ON DELETE and ON UPDATE cascading options to enforce
complex forms of DRI that would otherwise require unnecessary and potentially slow-running
Transact-SQL code. For example, suppose there is a foreign key constraint between the Order
and Order Detail tables on the OrderID column. If you delete a row in the Order table, and if
the constraint has the CASCADE option, the action will delete all rows in the Order Detail
table that reference the deleted row in the Order table. Alternatively, the new ON DELETE SET
NULL foreign key option in SQL Server lets you set the OrderID column in the child table to
NULL when the parent row is deleted.
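For example, the foreign key sketched in the previous guideline could instead be declared with a cascading delete; substituting ON DELETE SET NULL would nullify OrderID in the child rows instead:

ALTER TABLE OrderDetail
ADD CONSTRAINT FK_OrderDetail_Order
FOREIGN KEY (OrderID) REFERENCES [Order] (OrderID)
ON DELETE CASCADE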
• Use triggers to enforce referential integrity across databases
SQL Server 2005 does not let you establish DRI between databases, only between tables
within a single database. To maintain data integrity between multiple databases, you can use
triggers.
To establish such integrity, you must create triggers to handle four actions. You need one or
two triggers to handle updates and deletes in the parent table, checking for matching rows in
the child table. If a mismatch occurs, the triggers must roll back the transaction and raise an
error, or else they must update all rows in the child table with NULLs in the referencing
columns.
You will also need one or two triggers in the child table to control insert or update actions and
ensure that all references to the parent table are valid.
• Use table-level CHECK constraints to enforce domain integrity at the table level
When a column’s valid values depend on values from another column in the same table, you
can use a table-level CHECK constraint to enforce domain integrity. Some examples of
table-level CHECK constraints are:
- (HireDate<=TerminationDate OR TerminationDate IS NULL)
- ((Country='USA' AND SSN LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]')
  OR (Country<>'USA' AND SSN=''))
Remember that components in the middle tier can enforce some business rules better than
the database. If a table-level CHECK constraint is too complex, or if it has dependencies
that are difficult to enforce in the database, consider moving the constraint logic to a
middle-tier component.
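For example, the first expression above could be attached to a hypothetical Employee table as a named constraint:

ALTER TABLE Employee
ADD CONSTRAINT CK_Employee_Dates
CHECK (HireDate <= TerminationDate OR TerminationDate IS NULL)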


Guidelines for Designing Database Constraints

Principle: Apply guidelines for designing database constraints.


Introduction
You can implement constraints at a database level by using data definition language (DDL) triggers.
DDL triggers are similar to data manipulation language (DML) triggers, in that you can trap certain
events and roll them back. The difference is that DDL triggers capture database-wide DDL statements,
such as CREATE, ALTER, and DROP, rather than DML statements (INSERT, UPDATE, and
DELETE). You can use DDL triggers to implement auditing, DDL security, and a limited form of
database change control. However, DDL triggers are not a substitute for proper security policies and
permissions control in the database, so do not ignore more detailed security issues just because you
have implemented DDL triggers.
Guidelines
Consider the following guidelines for using DDL triggers:
• Use DDL triggers for auditing
You can use a DDL trigger to maintain an audit log that contains information about DDL
statements that are issued to the database. The audit log records the details of the users who
submit DDL statements to the database, allowing you to track the people responsible for
making certain changes. The following is a simple auditing sample.
-- Log table that records who issued each DROP TABLE statement and when
CREATE TABLE DropTableLog(
    DropEventID INT NOT NULL PRIMARY KEY IDENTITY
  , ResponsibleUser NVARCHAR(32) NOT NULL DEFAULT(SYSTEM_USER)
  , DropDate DATETIME NOT NULL DEFAULT(GETDATE())
  , EventCommand NVARCHAR(2000) NOT NULL )

CREATE TABLE Dropable ( Col1 INT )
GO
-- Database-level DDL trigger that captures the text of the DROP TABLE command
CREATE TRIGGER DropTriggerLog
ON DATABASE AFTER DROP_TABLE
AS
SET NOCOUNT ON
INSERT INTO DropTableLog(EventCommand)
VALUES(EVENTDATA().value('(/EVENT_INSTANCE/TSQLCommand)[1]'
    , 'nvarchar(2000)'));
GO
-- Test the trigger and inspect the log
DROP TABLE Dropable
SELECT * FROM DropTableLog

• Use DDL triggers to support security


You can use DDL triggers to help prevent unauthorized changes to database schemas by
preventing users from using the DROP TABLE statement, as illustrated in the following
example.
CREATE TRIGGER NoDropTable
ON DATABASE FOR DROP_TABLE
AS
RAISERROR('DROP Table NOT ALLOWED in this database', 16, 1)
ROLLBACK
GO
• Use DDL triggers to implement database change control
You can use DDL triggers to prevent accidental changes to a database, or to track schema and
other changes. Based on the auditing DDL trigger previously defined, a developer can review
all DDL statements issued against the development database. A developer can also copy the
captured statements to create the script needed to deploy a new version of the database in the
production environment.
In some cases, you might want to prevent unauthorized changes entirely. For example, you
might create a DDL trigger on production databases that will prevent any schema changes.
During official deployments of database changes, you can disable or remove the trigger
temporarily.
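For example, assuming the NoDropTable trigger shown earlier, a deployment script might bracket the schema changes as follows:

DISABLE TRIGGER NoDropTable ON DATABASE
-- deploy the approved schema changes here
ENABLE TRIGGER NoDropTable ON DATABASE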


Lesson 3: Designing for Database Security

Lesson objective:
After completing this lesson, students will be able to:

Implement security best practices in the design of databases.


Introduction
Growing concerns about information security led Microsoft to launch its Trustworthy Computing
Initiative in 2002. Based on this initiative, security issues in SQL Server 2005 were addressed and
new security features were added in the early stages of design. The Trustworthy Computing Initiative
includes the following goals:
• Secure by design
• Secure by default
• Secure in deployment
This lesson provides you with guidelines and considerations that can help you design a more secure
physical database. These guidelines are based on the Trustworthy Computing Initiative and four basic
security concepts: authentication, authorization, data protection, and auditing.


Guidelines for Authentication and Authorization

Principle: Apply guidelines for authentication and authorization.


Introduction
Authentication and authorization are the major tools that you can use when designing security into
your database model. In SQL Server, authentication is the level at which users can establish a
connection to SQL Server; authorization is the level at which connected users are verified for the
appropriate permissions on the database objects.
Authentication
Authentication is the first layer of security in a database server. When you implement authentication
decisions in your database design, you specify how the identity of users is established and
confirmed.
Consider the following guidelines when designing the authentication process:
• User or application authentication
When working with distributed applications, you can use either of two security scenarios for
authorization in the database: user authentication or application authentication. When you
select user authentication, middleware components impersonate the user. By using this
option, you can create a more secure environment with better access control. You can also use
database tools to audit user access.
When you use application authentication, the middleware components have their own
connection information and do not use the user identity to access the database. An application
user name can be passed in as a parameter for further authorization, but that user name is not


used by SQL Server. Application authentication is more scalable than user authentication and
is the most popular solution in enterprise scenarios. From the database perspective, the
application, not the user, is authenticated.
• Windows, SQL Server, or personalized authentication
When designing a SQL Server 2005 database, you have three options for authentication:
Windows Authentication, SQL Server authentication, or personalized authentication.
You should use Windows Authentication when all SQL Server clients are running the
Microsoft Windows operating system. This type of authentication is more secure and allows
easier management than other authentication mechanisms because it is integrated with
Microsoft Active Directory® directory services. Windows Authentication also provides single
sign-on, which is desirable in many situations and eliminates the need for maintaining
multiple passwords.
If you combine application authentication and Windows Authentication, only the application
servers must run Windows and be members of the Active Directory forest. The end user client
PCs can run any operating system and do not need to be members of the Active Directory
forest. This makes the combination of application authentication and Windows Authentication
very attractive for enterprise solutions.
You should use SQL Server authentication when some of the clients are not running the
Windows operating system, and they must connect directly to SQL Server. A new feature in
SQL Server 2005 allows you to enforce password policies defined in Active Directory in SQL
Server authentication, if SQL Server is hosted on Windows Server 2003 or later.
You can also design your own authentication process in which the middleware components
create their personalized authentication process and use SQL Server tables to store user
names and passwords used by the application. For example, Microsoft ASP.NET allows
developers to use forms authentication as a custom authentication mechanism; developers
using this technology will use database tables to store user security information.
Authorization
Authorization is the security layer that is responsible for assigning and checking whether authenticated
users have the required permission to access resources, such as tables, views, stored procedures, and
functions. The recommended implementation process to create a manageable and secure environment
in SQL Server 2005 is as follows:
• Create user roles in the database
User roles let database administrators manage groups of users according to the required level
of access to resources.
• Assign permission to user roles
Based on use cases, database administrators analyze and grant permissions to provide the
required access to the resources. A common solution is to create a security matrix or
workbook with user roles as columns and objects as rows, with every cell detailing the level
of access required (SELECT, INSERT, DELETE, UPDATE, and EXECUTE) for the
object.
• Assign users to user roles
Depending on the authorization mechanism that was selected, database administrators can
assign users to Active Directory user groups and assign the Active Directory group to the
database role. If you use SQL Server authentication, assign the database users to the database
user roles.


Considerations for Data Protection

Principle: Consider the methods for protecting data in a database.


Introduction
When considering the security aspects of your database design, you will soon realize that security and
performance are often competing goals in a database design. Designing too much security into a
database can prevent it from performing adequately. Therefore, you should carefully design data
protection methods that will have the least possible impact on database performance.
Data protection
The main design goal of database security is to provide data protection. There are three primary
elements of data protection: confidentiality, integrity, and non-repudiability. Confidentiality means
providing data access only to the authorized users. Integrity means providing authentic, unaltered, and
reliable information to the users. Non-repudiability means providing mechanisms by which users
cannot deny their actions.

Consider the following guidelines when designing data protection strategies:


• Use views to hide data
One of the main uses of views is to provide data protection in the database. For example, if
you do not want employees to access invoices older than 24 hours, you can create a view with
a WHERE condition of InvoiceDate>DATEADD(day, -1, GETDATE()) and not grant them
access to the base table. Therefore, employees will only be able to access invoices through the
view.


You can also use views to allow access to summarized information only, by using the
GROUP BY clause in the view. This is particularly useful in hospitals, for example, where
access to individual records is limited to doctors, but additional users may access grouped
data.
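A minimal sketch of the invoice view described above, assuming a hypothetical Invoice table:

CREATE VIEW RecentInvoice
AS
SELECT InvoiceID, CustomerID, InvoiceDate, Amount
FROM Invoice
WHERE InvoiceDate > DATEADD(day, -1, GETDATE())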
• Use the WITH CHECK OPTION in views
By using the WITH CHECK OPTION, you can limit modifications to rows in a view. With
this option enabled, if a row is modified, the updated row must conform to the WHERE
condition of the view. In the previously mentioned example of the employee, if the view is
created with this option, the employee cannot insert or update an invoice with a date earlier
than what the filter stipulates, because the employee is limited by the filter and does not have
access to the invoice table.
Note
The WITH CHECK OPTION is a critical security component of views. Without this option, users can
exploit views to infer underlying data that the view does not directly show.
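For example, the RecentInvoice view sketched above could be re-created with the option, so that rows inserted or updated through the view must still satisfy its filter:

ALTER VIEW RecentInvoice
AS
SELECT InvoiceID, CustomerID, InvoiceDate, Amount
FROM Invoice
WHERE InvoiceDate > DATEADD(day, -1, GETDATE())
WITH CHECK OPTION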

• Use stored procedures to protect data


By using stored procedures, you can increase database security by denying direct access to
the data, even more effectively than with views. Stored procedures permit a variety of
Transact-SQL statements and support more complex logic than views; therefore, you should
implement them as one of the security measures in the database. By using stored procedures,
you can declare variables, and use simple workflows and conditions to return data, which you
cannot do with views. When using stored procedures to protect data, you can grant users
execute permissions on the stored procedures without granting permissions to access the
underlying tables directly. In this way, users can access tables only by executing the
procedures.
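The following sketch grants a hypothetical SalesRole role access to invoice data only through a procedure, without any permissions on the underlying table:

CREATE PROCEDURE GetCustomerInvoices
    @CustomerID INT
AS
SELECT InvoiceID, InvoiceDate, Amount
FROM Invoice
WHERE CustomerID = @CustomerID
GO
GRANT EXECUTE ON GetCustomerInvoices TO SalesRole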
Data encryption
Data encryption is a fundamental technique that provides confidentiality to stored information in the
database. You should analyze the need for data encryption based on your specific business
requirements. SQL Server 2005 provides a hierarchical encryption and key management infrastructure
that enables developers to use Transact-SQL functions to encrypt data.
You can use the EncryptByAsymKey and DecryptByAsymKey functions to provide asymmetric
encryption, and the EncryptByKey and DecryptByKey functions to provide symmetric
encryption. Asymmetric encryption uses the RSA algorithm, with private keys of 512, 1024, or
2048 bits. Symmetric encryption uses algorithms such as TRIPLE_DES.
Asymmetric keys provide a more secure environment, but they have higher performance costs.
Symmetric keys do not have the same performance overhead, which means that they run faster, but
they are not as secure. You should use the appropriate keys based on your data requirements and size
the hardware infrastructure accordingly.
For more information
For more information on encryption issues and algorithms, review the Cryptography Overview section
of the .NET Framework Developer’s Guide on the Microsoft MSDN Web site.
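The following sketch illustrates the symmetric functions; the key name, password, and CreditCard table (with an NVARCHAR CardNumber column and a VARBINARY EncryptedNumber column) are all hypothetical:

-- Create and open a symmetric key protected by a password
CREATE SYMMETRIC KEY CardKey
WITH ALGORITHM = TRIPLE_DES
ENCRYPTION BY PASSWORD = 'StR0ng!Passphrase'

OPEN SYMMETRIC KEY CardKey
DECRYPTION BY PASSWORD = 'StR0ng!Passphrase'

-- Encrypt the plaintext column into a VARBINARY column
UPDATE CreditCard
SET EncryptedNumber = EncryptByKey(Key_GUID('CardKey'), CardNumber)

-- Decrypt; DecryptByKey returns VARBINARY, so convert back to NVARCHAR
SELECT CONVERT(NVARCHAR(25), DecryptByKey(EncryptedNumber)) AS CardNumber
FROM CreditCard

CLOSE SYMMETRIC KEY CardKey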

The following are scenarios in which you should consider using data encryption to maintain the
confidentiality of the information:
• When using personalized authentication, the user password is passed in from the application
and stored in the database. Because users frequently use the same password in different
applications, it is critical that you encrypt any passwords that are stored in the database. This
will protect the authentication processes of your application as well as other applications.
• In addition to passwords, you can encrypt highly sensitive data. For example, private financial
information (such as credit card numbers) and medical records are commonly encrypted.


Considerations for Auditing

Principle: Consider the impact of auditing on the physical schema of the database.
Introduction
One of the three primary elements of data protection is non-repudiability. Repudiability is the
possibility of denying an action that occurred. For example, if a user accidentally deletes a row and
denies that he or she did it, it means that the user has repudiated his or her own action. To control
repudiability, you must add application auditing to the solution. By using application auditing, you can
create logs of events (INSERT, DELETE, and UPDATE) that occur in the tables in the database.
You can also include DDL triggers, along with appropriate permissions to protect audit triggers from
malicious users.

This topic covers the different audit patterns that are frequently used on databases for supporting non-
repudiability. You should consider the impact of each pattern on the physical model.
Audit patterns
Databases that support non-repudiability normally use one of the following audit patterns:
• Simple audit in columns
One approach that you can follow is to add columns to tables. These columns can capture
details of the user who inserted the row and the date when the row was inserted. You can also
add columns to capture the same details for an updated row. This solution adds four columns
to each table that you want to audit.


If user authentication is used, you can use a column DEFAULT value combined with insert
and update triggers that use system functions (such as SYSTEM_USER and
CURRENT_TIMESTAMP) to capture the information without the application explicitly
sending it.
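A minimal sketch of the insert-side columns, assuming a hypothetical Invoice table; an update trigger would maintain the corresponding ModifiedBy and ModifiedDate columns:

ALTER TABLE Invoice ADD
    CreatedBy NVARCHAR(128) NOT NULL DEFAULT (SYSTEM_USER)
  , CreatedDate DATETIME NOT NULL DEFAULT (CURRENT_TIMESTAMP)
  , ModifiedBy NVARCHAR(128) NULL
  , ModifiedDate DATETIME NULL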
• Simple audit in tables
A less intrusive method is to create an audit log table containing information that details all
changes to tables in the database. This table often includes a long integer identity column or a
unique identifier column primary key, as well as user and date columns. All audited tables
contain two columns that reference the log table. This solution offers a limited audit log that
lets auditors identify the user who created or updated each row and the time when the change
occurred.
• History tables
When a more comprehensive solution and a full audit log is required, you might consider
creating history tables that mirror each audited table. With this approach, every time a row is
inserted, updated, or deleted, a row is created in the history table. To implement history audit
tables, you use triggers in the audited table. History tables include all columns of the original
table, a long integer identity column or a GUID column as the primary key, and user and date
columns.
• Audit log table with an XML column
This model extends the simple audit in tables to include two additional columns. One column
captures the table name, which is required to control delete statements. The other column is
an XML column that includes the modified information. Because SQL Server 2005 provides
extensive support for XML, this option is somewhat more manageable and scalable than in
previous releases.


Discussion: Encryption Tradeoffs

Principle: Discuss the tradeoffs implicit in different encryption scenarios.


Introduction
Suppose you are designing a Web application that supports approximately 10,000 hits per day.
Visitors to your Web site will be able to create their own personalized account. E-mail addresses will
be used as an account’s principal identity, and users will assign a password to their accounts. To
provide users with confidentiality, you want to store only encrypted passwords in the database.
The development team agrees with your basic design, but some team members want to implement the
encryption process in middle-tier components, while others want to implement the encryption process
in the database.

Discussion questions
Based on the scenario, consider the following questions:
Q What are the advantages of encryption in the middle tier?
A The advantages of encryption in the middle tier are as follows:
• Encryption is a processor-intensive process. Middle-tier components are usually run in
servers with sufficient processor resources.
• Scalability is achieved when encryption is implemented in the middle tier.
• If passwords are encrypted early in the middle tier, their exposure in the network is
minimized.


Q What are the advantages of encryption in the database?


A The advantages of encryption in the database are as follows:
• The encryption process requires a key management infrastructure. The encrypted data
is safe only if the key is secure. SQL Server 2005 offers an extensive infrastructure to
manage keys.
• Based on the defense-in-depth principle, you must assume that attackers will penetrate
every defense. Data encryption is the final level of defense; if middle-tier components
are exposed, data will still be protected.
• Encryption at the database level can be easily implemented without fundamentally
changing the application logic.


Lesson 4: Designing Database and Server Options

Lesson objective:
After completing this lesson, students will be able to:

Apply best practices when designing database and server options.


Introduction
After you have completed an initial draft of your physical database model, you should focus on other
important considerations and decisions. For example, it is important to consider how the database and
server should be configured, because these configurations can significantly affect the rest of the
development team. Also, because SQL Server server-wide settings apply to all databases on an
instance, your design requirements will restrict the servers on which your database can reside, as well
as your design’s compatibility with a given server.

To reduce the risk of conflicts later in the development life cycle, you should make these decisions at
an early stage, and then document and communicate them to the development team and stakeholders.

Note
To facilitate your software development life cycle, including improved documentation and
communication, consider using the Microsoft Visual Studio® 2005 Team System and its Logical
Datacenter designer.


Considerations for Service Settings

Principle: Consider the service settings in designs that implement cross-database access or enable
CLR integration.
Introduction
There are two database server-wide settings that you should specify in your physical design: enabling
CLR integration and enabling cross-database access. Both have a broad impact on how database code
will behave. The CLR integration determines whether the .NET CLR will be enabled in your database.
You must enable it if you have specified any CLR objects in the database. Another setting concerns
cross-database access. If your design specifies that Transact-SQL must make references from one
database to another, you must enable cross-database access.
Enabling CLR integration
With SQL Server 2005, you can specify CLR stored procedures, triggers, user-defined functions
(UDFs), custom aggregates, and custom data types in your database design. These elements can also
be coded in any .NET-compliant language other than Transact-SQL.
CLR integration is determined at the server level: either all databases will have it enabled, or none
will. Because SQL Server 2005 disables CLR integration by default, you must explicitly specify it in
your design. When you do so, SQL Server 2005 directly hosts the Microsoft .NET Framework CLR
Version 2.0 in the SQL Server Database Engine. By being able to run CLR code in the database,
developers can use procedural languages, such as Visual Basic, C#, and C++ to create functionality in
the database that was not possible before.

When deciding whether to enable CLR integration, consider the following:


• Relevant design requirements


As a database designer, you should decide which language to use for creating the functionality
specified in the design requirements. Transact-SQL is the preferred choice when you work
with data access that has little or no procedural logic. Transact-SQL always runs on the
server. However, if you need to write code that is not compatible with Transact-SQL, CLR
becomes an obvious choice. CLR code written in the .NET Framework languages can be used
to create complex algorithms, workflows, and logic, and it can run on the client (user
interface), the application servers (business components), or the database servers.
For example, some bank account numbers require a check digit that is computed with a complex
algorithm. Transact-SQL is not the best language to write that code, because it is an
interpreted language designed and optimized for data access, and it has many limitations
when coding complex logic. Because data integrity is critical and different components and
stored procedures will access the table, a CHECK constraint is required to validate the
account. To resolve this, you can enable CLR integration on the server, create a UDF in a
.NET Framework language, and use the UDF in a CHECK constraint.
If you choose not to use CLR integration in the database server, you should keep the default
configuration (CLR integration disabled) to reduce the overall attack surface of your
server.
For more information
For more information about this subject, review the “CLR Integration Security” topic in SQL
Server 2005 Books Online on the Microsoft TechNet Web site.

• Security and access


When designing the physical model, you should consider the level of access that the CLR
code will require from the server. SQL Server 2005 permits three permissions sets for
assemblies: SAFE, EXTERNAL_ACCESS, and UNSAFE.
The SAFE permission allows only assemblies to run and access the current database, while
disallowing any external access. The EXTERNAL_ACCESS permission allows the assembly
to use the .NET Framework class library to access the file system, registry, network, and Web
services, among others. The SAFE and EXTERNAL_ACCESS permission sets do not allow
the code to access arbitrary memory addresses, thereby protecting the database engine. The
UNSAFE permission set allows code unlimited access to resources within SQL Server and
outside the server. UNSAFE code may access the memory space of the Database Engine,
potentially hindering the security, performance, and stability of the server.
You should follow the principle of least privilege: evaluate the security needs of each
assembly and grant it the least permission set that still allows it to function.
Note
To enable the server to run CLR code, execute the following Transact-SQL statements:
EXEC sp_configure 'clr enabled', '1'
RECONFIGURE

• Performance
When deciding whether to use Transact-SQL or CLR .NET Framework–compatible
languages, you should evaluate the performance effects on your decision. Set-based
operations work better in Transact-SQL, because Microsoft has optimized the query engine to
work with set operations. The query engine has embedded advanced algorithms that allow it
to run Transact-SQL statements exceptionally fast.
Procedural statements and looping constructs run faster in CLR .NET Framework–compatible


languages than in Transact-SQL. Stored procedures that use Object Linking and Embedding
(OLE) automation objects (using sp_OACreate) or user-defined extended stored procedures
can be replaced by better-performing CLR procedures. Note that although SQL Server 2005
supports user-defined extended stored procedures, forthcoming versions of SQL Server will
not have this feature.
Cross-database access
SQL Server 2005 checks permissions on objects every time the ownership chain is broken. An
example of this is a stored procedure that uses a view based on a table. If all of the objects belong to
the same user, SQL Server only checks the user’s permission to run the procedure, but if different
users own the objects, SQL Server demands permissions for every object. Remember that the
aforementioned scenario applies for objects in the same database.
When the objects reside in different databases, SQL Server considers by default that the ownership
chain is broken and checks permissions on the involved objects. If you allow cross-database
ownership, the behavior changes, permitting the ownership chain to include more than one database.
A good practice is to leave cross-database ownership disabled, because users with CREATE
DATABASE rights may use the right to elevate their privileges in a database that they do not own.
Consider the following example: A user with CREATE DATABASE rights wants to gain UPDATE
access to a table named Salaries in the HumanResources database. User Charles owns the Salaries
table, and cross-ownership is enabled. To gain that privilege, the user will:
• Create a new database named MyDatabase.
• Create the MySalaries view in MyDatabase that references Salaries in the HumanResources
database.
• Change ownership of the MySalaries view to Charles.
Because the user is the owner of MyDatabase, he or she will have access to all rights (SELECT,
UPDATE, DELETE, and REFERENCE) in MySalaries. Because Charles owns both MySalaries and
Salaries, permissions will not be required when accessing the Salaries table through the use of the
MySalaries view.
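To guard against this scenario, you can verify that the server-wide option remains at its default, disabled, setting:

EXEC sp_configure 'cross db ownership chaining', 0
RECONFIGURE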


Guidelines for Specifying Database File Placement and Organization

Principle: Apply guidelines for specifying placement of database files.


Introduction
As part of the physical design process, you should specify the file locations for the database, based on
estimated activity and storage requirements.
Every SQL Server 2005 database contains at least two files: a primary data file and the transaction
log. By default, SQL Server stores data, indexes, and every object (including views, stored
procedures, UDFs, and assemblies) in the primary filegroup. When required, database administrators
can add secondary files and filegroups. All objects except system tables can be moved to secondary
files and filegroups.

The transaction log is where SQL Server 2005 records all transactions (database modifications) that
the database receives. The transaction log is a critical database element that guarantees data integrity
and is the source of information for SQL Server in the database recovery processes. By default, every
SQL Server database has at least one transaction log file, but you can specify additional log files for a
database.

For more information


For more information about how SQL Server 2005 stores information and how it assigns and manages
the file system, read the “Physical Database Architecture” section in SQL Server 2005 Books Online.


Guidelines
Consider the following guidelines when specifying database file placement:
• Specify separate disk locations for log and data files
Reasons to have different physical disk locations for the transaction log and database files
include:
- Performance
SQL Server 2005 uses random access to write data to any page in the primary and
secondary files in the database. SQL Server 2005 caches all database updates initially in
memory, rather than to disk. It then writes dirty pages (pages that have changed) to disk
when a checkpoint event is fired or when the lazywriter frees up the dirty pages. By
periodically saving dirty pages to disk, the server reduces the database recovery time
when restarting.
The server accesses the transaction log file in a sequential manner. Every time the
application sends a data modification statement to the server, all required database
changes are first saved in a transaction buffer and then written to disk as quickly as
possible.
By assigning database and log files to separate locations, you can improve system
performance (both server and application) by separating the two types of access.
- Fault tolerance
SQL Server 2005 allows you to back up both the database and the transaction log. If
needed, you can use a full database backup and several transaction log backups to recover
a database.
If you place the two types of database files in separate disk volumes, the possibility of
full recovery is increased. If the database disk subsystem fails, the administrator can back
up the transaction log and use it to recover the database to the point in time at which the
subsystem failed, without losing any committed data.
For more information
Point-in-time recovery and transaction log backup require the database to be in the FULL recovery
model. Database recovery models will be discussed in the next topic.

• Specify multiple data files


You might want to define multiple data files to divide the workload of the disk system into
various subsystems. This can result in improved server performance and the ability to handle
larger workloads. At the physical design level, you should analyze the need for multiple disk
subsystems that might handle the workload and match your performance requirements.
With a very large database (VLDB), you should consider the need for multiple data files
distributed in multiple file subsystems to increase the system’s fault tolerance. Multiple data
files can be backed up separately and do not require a full backup restore in the event of
failure.
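A minimal sketch of a CREATE DATABASE statement that places two data files and the log on separate volumes; the database name, file names, and drive letters are illustrative:

CREATE DATABASE Sales
ON PRIMARY
    ( NAME = Sales_Data1, FILENAME = 'D:\SQLData\Sales_Data1.mdf' ),
    ( NAME = Sales_Data2, FILENAME = 'E:\SQLData\Sales_Data2.ndf' )
LOG ON
    ( NAME = Sales_Log, FILENAME = 'F:\SQLLogs\Sales_Log.ldf' )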


Guidelines for Choosing Database Options

Principle: Apply guidelines for choosing database options.


Introduction
When designing the physical database model, you should also specify the database options that your
design will require. These options affect the behavior of the database and the server. By including and
documenting your decisions in the design at an early stage, you provide other development team
members with the information necessary to create other components in the solution.
Database options
Consider the following guidelines when choosing database options:
• Recovery model of the database
SQL Server 2005 permits three different configurations for the database recovery model: Full,
Bulk Logged, and Simple. The Full recovery model is the recommended option for most
online transaction processing (OLTP) databases, because it provides the best fault tolerance.
The Bulk Logged recovery model is recommended for staging and data warehouse databases
that will benefit from fast bulk-copy operations. The Simple recovery model should be used
only when maintaining the transaction log is not necessary, or when there is no need for better
fault tolerance. For example, you might use the Simple recovery model for development
databases, non-critical end user databases, and read-only databases.
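For example, to specify the Full recovery model for a hypothetical Sales database:

ALTER DATABASE Sales SET RECOVERY FULL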
• Database collation
As mentioned in Lesson 1 of this module, Designing Physical Database Objects, character
collations are rules that SQL Server 2005 uses to determine how data is sorted and compared.
You should choose the database collation that matches the language of the database users. If


users speak multiple languages, use a more general collation such as LATIN1_GENERAL.
You will also need to decide whether the database will be case-sensitive, accent-sensitive, and
kana-sensitive (for Japanese characters). These options help you configure how SQL Server
orders values and resolves comparisons. For example, if the database is configured for case
sensitivity, the condition 'a'='A' is false.
For more information
For more information on database collation, read the topic “International Considerations
for Databases and Database Engine Applications” in SQL Server 2005 Books Online.

• ANSI options
You can set two ANSI options that define the behavior of NULLs at the database level:
ANSI_NULL_DEFAULT and ANSI_NULLS. The ANSI_NULL_DEFAULT option defines
how the server will create a column when you do not explicitly declare it with the NULL or
NOT NULL keywords. When this option is set to ON, columns will allow NULL values by
default. When this option is set to OFF, a NOT NULL constraint will be attached to the
column and, by default, NULL values will not be allowed.
The second option is ANSI_NULLS. When this option is set to ON, a comparison with a
NULL value (for example, NULL = NULL) evaluates to UNKNOWN and is treated as False; when
it is set to OFF, a NULL = NULL comparison evaluates to True.
Because these settings can be overridden at the connection level, you should leave these
options set to their defaults in the database. More importantly, you should make it
compulsory to declare NOT NULL constraints in your CREATE TABLE scripts, and to use
only IS and IS NOT NULL clauses in all your Transact-SQL scripts. Scripts and commands
written with explicit declarations will have the same effect under any condition, regardless of
how the ANSI options are set. Such scripts and commands are also easy to understand for
database designers, developers, and database administrators.
• Database setting options
The database setting options to consider are AUTO_CLOSE, AUTO_SHRINK, and the
automatic statistics options:
- When you set the AUTO_CLOSE option and users disconnect from the database, the
database files are closed and resources are released. The AUTO_CLOSE option lets users
manage database files directly from the operating system. AUTO_CLOSE is useful only
in desktop applications. In a server environment, it is too expensive for database servers
to frequently open and close database files. You should specify that the AUTO_CLOSE
option should always be OFF in production server databases.
- The AUTO_SHRINK option lets the server shrink database files (data and log files)
when more than 25 percent of the file space is unused. For performance reasons, in
database servers, it is better to set the AUTO_SHRINK option to OFF.
- The AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS options help
you control how the server manages column statistics. SQL Server 2005 creates statistics
over the search arguments used in WHERE and JOIN conditions. These statistics keep
track of the distribution of values in the column; for example, how many rows have a
certain value assigned to them.
AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS provide critical
information to SQL Server 2005. This information is used to design query access plans.
You should specify that both options be set to ON.


Lesson 5: Evaluating the Physical Model

Lesson objective:
After completing this lesson, students will be able to:

Apply best practices when evaluating the physical model.


Introduction
Every step in the design process must be validated to ensure the quality and integrity of the design. In
Module 1, “Approaching Database Design Systematically,” and Module 2, “Modeling a Database at
the Logical Level,” you learned how expert users validate the conceptual model and how normal
forms determine how to design the logical model.

In this lesson, you will learn how the physical design is validated by using prototypes. You will also
learn about recommended guidelines for working with database prototypes.
In addition, you will review recommended guidelines for designing a data migration process.


Reasons for Prototyping the Database Design

Principle: State the reasons for prototyping the database design.


Introduction
One of the best practices of modern design techniques is to incorporate different kinds of prototypes
during the development cycle. By using prototypes, you can gather valuable information about how
the software solutions are used. You can use prototypes to create simulations that help you verify
customer requirements, test performance and workload requirements, and validate the technology.
Prototypes provide two valuable services to the design process:
• By using prototypes, you can mitigate the risks associated with investing time and resources
in solutions that have hidden structural flaws. Without prototyping, the development team
might have difficulty detecting structural flaws that could include critical undetected
requirements, serious performance or scalability issues, technology incompatibilities, and
infrastructure stability issues.
• Prototypes help you increase the visibility of the project, which in turn helps to motivate the
development team and the stakeholders. Prototypes are also important tools for program
managers to track schedule and resource targets.
The MSF Process Model assigns different names to prototypes depending on the goal of the prototype
and the Process Model phase it uses. The different names are as follows:
• Prototypes. Used in the envisioning and planning phases for exploring a product feature or
architecture. Use prototypes for demonstration purposes only.


• Technology validation tests. Used to test and evaluate products and technologies in simple
isolated environments.
• Proof of concept. Used to verify that the solution is feasible, by means of testing in a lab
environment. The main difference between a proof of concept and the technology validation
test is that in a proof of concept, the lab environment simulates the production environment.
The proof of concept milestone is the first milestone in the development phase.
Reasons to prototype
When designing a database, you should use prototypes to:
• Validate the design feasibility
During the physical database design process, you should attempt to find ways to validate the
feasibility of the design. For example, for your physical model, you might create an early
prototype to evaluate the use of the .NET Framework technology and CLR UDTs. Later in
the project, you might want to use a technology validation test to check the SQL Server
Service Broker technology. Finally, you might consider using a proof of concept prototype to
verify the adequate performance of complex queries, or SLA compliance under conditions
that match expected production stress levels.
• Support the application prototype
You might need to create prototype databases to support application prototypes required by
the development team. These application prototypes might require different levels of
functionality from the database. Therefore, you should ensure that the database prototype
requirements document clearly states the goal of the prototype and the level of service
required from the database.


Guidelines for Data Migration

Principle: Apply guidelines for migrating data.


Introduction
When the physical database design is ready to support the new features and functionality added to the
existing design solution, you must review the data migration requirements and choose the information
that is going to be imported from other systems.
Guidelines
Consider the following guidelines when migrating data:
• Identify all data inputs and outputs for the data to be migrated
As with any other project, it is critical that the development team and stakeholders share a
clear vision of what they want to build. In projects that involve data migration, the migration
requirements define the inputs and outputs of the data to be migrated. It is essential that the
requirements documentation clearly states the expected results of the migration processes in
terms of all data inputs and outputs.
• Determine the cost of data migration
Development teams usually focus on creating new features. As a result, the cost of data
migration can be overlooked or seriously underestimated. When designing your migration
process, be careful to thoroughly evaluate the resources and time needed to design, deploy,
and execute the process.
• Protect the integrity of the original database
It is important to protect the integrity of the original database by establishing a data migration


protocol. Examples of data migration protocols include data source backup, restoration of the
data source in a test environment, read-only access to the data source, verification checklists,
and deployment tests.
• Determine the method for deploying the new design into production
SQL Server 2005 provides several tools that you can use to migrate data and deploy new
designs into the production environment. Among the available tools are Transact-SQL
command scripts, the Bulk Copy Program (BCP), command-line scripts (SQLCMD, OSQL),
and SQL Server Integration Services (SSIS). Because of its flexibility, rapid development,
and versatility, SSIS is the preferred method to deploy new database designs into production.


Lab: Modeling a Database at the Physical Level

Time estimated: 50 minutes


Scenario
You are the lead database designer working on the Human Resources Vacation and Sick Leave
Enhancement (HR VASE) project. In previous labs, you created conceptual and logical models based on the
Requirements document, which detailed how the HR department wants to store information about the
vacation and sick-leave hours of its employees. In this lab, you will build a physical model based on the
logical model.
The HR VASE project will enhance the company’s current human resources system. This system is
based on the AdventureWorks sample database in SQL Server 2005.
The main goals of the project are to:
• Provide managers with current and historical information about employee vacation and sick-
leave data.
• Give individual employees permission to view their vacation and sick-leave balances.
• Give certain employees in the HR department permission to view and update employee salary
data.
• Give certain employees in the HR department permission to view and update employee sick-
leave and vacation data.
• Give the HR manager permission to view and update all of the data.
• Standardize employee job titles.
Preparation
Ensure that the virtual machines for the computers 2781A-90A-DEN-DC and 2782A-MIA-SQL-02
are running. Also, ensure that you have logged on to the computer by using the following credentials:
• Username: Administrator
• Password: Pa$$w0rd
Exercise 1: Specify Database Object Naming Standards
Introduction
In this exercise, you will specify a set of consistent naming standards for a given set of database object
types. You will then fill in a document specifying a naming convention for each type of database
object.
Specify database object naming standards
Summary
1. Analyze the Naming Standards Template document.
2. Fill in the appropriate naming conventions in the document.

Specifications
1. In Windows Explorer, browse to the D:\Labfiles\Starter folder and double-click the NamingStandardsTemplate.doc file. The document contains placeholders for naming conventions for all major database objects, such as tables, columns, views, constraints, stored procedures and functions, and triggers.
2. Fill in the naming conventions, using simple, brief, and descriptive terms.
Answer Key
1. In Microsoft® Windows® Explorer, browse to the D:\Labfiles\Starter folder and double-click the
NamingStandardsTemplate.doc file.
The document contains placeholders for naming conventions for all major database objects, such
as tables, columns, views, constraints, stored procedures and functions, and triggers.
2. Fill in the naming conventions, using simple, brief, and descriptive terms.
Exercise 2: Define Tables and Columns and Choose Data Types
Introduction
In this exercise, you will design the physical model of the database design based on the logical model,
by choosing tables, columns, column data types, and constraints. You will review the naming
conventions you established in Exercise 1 and create appropriate physical object definitions. You will
then create Transact-SQL scripts to create each object, store the scripts in a SQL Server 2005
Management Studio project, and check the project into the Microsoft Visual SourceSafe® database.
Define tables and columns and choose data types
Summary
1. Save the LogicalModel.vsd file as PhysicalModel.vsd.
2. Generate a Table Report.
3. Modify the PhysicalModel.vsd file.
4. Generate DDL scripts.
5. Create a SQL Server Management Studio project.
6. Add a new connection.
7. Add the DDL to the HRVasePilot project.
8. Save all files in the SQL Server Management Studio project.

Specifications
1. Open Microsoft Office Visio® for Enterprise Architects 2005.
2. Open the LogicalModel.vsd file, located in the D:\Labfiles\Starter folder, and save it as PhysicalModel.vsd in the same folder.
3. Generate a Table Report:
a. On the main menu, click Database, and then click Report.
b. In the New Report Wizard dialog box, click Table Report, and then click Finish.
c. In the Report dialog box, click Export to RTF, name the file PhysicalReview.rtf, and then save the report in the D:\Labfiles\Starter folder.
d. Click Close.
4. Open Visio for Enterprise Architects 2005.
5. Open the PhysicalReview.rtf file in Microsoft Office Word.
6. Refer to the PhysicalReview.rtf file and the following checklist to modify PhysicalModel.vsd:
- Review all physical names and rename them to comply with the naming standards set in Exercise 1.
- Review all column types and change them, if needed.
- Verify that all columns are defined with a NOT NULL constraint.
- If a column needs to hold NULL values, write a justification.
- Review the need for CHECK constraints.
7. Save the PhysicalModel.vsd file.
8. Generate the DDL scripts:
a. On the main menu, click Database, and then click
Generate.
b. In the Generate Wizard dialog box, rename the
file HRVase.SQL, and then save it in the
D:\Labfiles\Starter folder.
c. Click Next, and then in the Installed Visio
drivers list, select Microsoft SQL Server. In the
Database name box, type HRVase, and then
click Finish.
d. In the SQL Server Create Database dialog box,
click Close.
e. In the Microsoft Visio dialog box, click No.
f. Exit Visio.
9. Create a Microsoft SQL Server Management Studio
project:
a. Open Microsoft SQL Server Management Studio.
b. On the main menu, click File, click New, and then
click Project.
c. In the New Project dialog box, name the project
HRVasePilot, and then save it in the
D:\Labfiles\Starter folder.
10. Add a new connection:
a. In Solution Explorer, navigate to the Connections
folder.
b. Right-click the Connections folder, and then
select New Connection.
c. Use your Server and Windows authentication, and
then click OK.
11. Add the DDL created in step 8 to the HRVasePilot project:
a. Using File Explorer or My Computer, navigate to
the D:\Labfiles\Starter folder.
b. Drag and drop the HRVase.SQL file into the
Queries folder.
12. Save all of the files in Microsoft SQL Server Management
Studio.
Answer Key
1. Open Microsoft Office Visio® for Enterprise Architects 2005.
2. Open the LogicalModel.vsd file, located in the D:\Labfiles\Starter folder, and save it as PhysicalModel.vsd in
the same folder.
3. Generate a Table Report:
a. On the main menu, click Database, and then click Report.
b. In the New Report Wizard dialog box, click Table Report, and then click Finish.
c. In the Report dialog box, click Export to RTF, name the file PhysicalReview.rtf, and then save the report in the D:\Labfiles\Starter folder.
d. Click Close.
4. Open Visio for Enterprise Architects 2005.
5. Open the PhysicalReview.rtf file in Microsoft Office Word.
6. Refer to the PhysicalReview.rtf file and the following checklist to modify PhysicalModel.vsd:
- Review all physical names and rename them to comply with the naming standards set in Exercise 1.
- Review all column types and change them, if needed.
- Verify that all columns are defined with a NOT NULL constraint.
- If a column needs to hold NULL values, write a justification.
- Review the need for CHECK constraints.
7. Save the PhysicalModel.vsd file.
8. Generate the DDL scripts:
a. On the main menu, click Database, and then click Generate.
b. In the Generate Wizard dialog box, rename the file HRVase.SQL, and then save it in the D:\Labfiles\Starter folder.
c. Click Next, and then in the Installed Visio drivers list, click Microsoft SQL Server. In the Database name box, type HRVase, and then click Finish.
d. In the SQL Server Create Database dialog box, click Close.
e. In the Microsoft Visio dialog box, click No.
f. Exit Visio.
9. Create a Microsoft SQL Server Management Studio project:
a. Open Microsoft SQL Server Management Studio.
b. On the main menu, click File, click New, and then click Project.
c. In the New Project dialog box, name the project HRVasePilot, and then save it in the D:\Labfiles\Starter folder.
10. Add a new connection:
a. In Solution Explorer, navigate to the Connections folder.
b. Right-click the Connections folder, and then select New Connection.
c. Use your server and Windows authentication, and then click OK.
11. Add the DDL created in step 8 to the HRVasePilot project:
a. Using File Explorer or My Computer, navigate to the D:\Labfiles\Starter folder.
b. Drag and drop the HRVase.SQL file into the Queries folder.
12. Save all of the files in Microsoft SQL Server Management Studio.
13. You can compare your solution with the PhysicalReview Solution.rtf and PhysicalModel Solution.vsd
documents. These documents are located at D:\Labfiles\Solution.
Module 4: Designing Databases for Performance
Time estimated: 130 minutes
Lesson 1: Designing Indexes ............................................................ 4
Apply best practices for designing indexes .............................................. 4
Introduction ............................................................................ 4
Performance Considerations in Choosing a Clustered Index ................................ 5
Principle: Consider performance when choosing a clustered index ......................... 5
Introduction ............................................................................ 5
Performance considerations in choosing a clustered index ................................ 6
Performance Considerations in Choosing a Non-Clustered Index ............................ 7
Principle: Consider performance when choosing a non-clustered index ..................... 7
Introduction ............................................................................ 7
Considerations .......................................................................... 8
Performance Considerations in Choosing an XML Data Type Index ........................... 9
Principle: Consider performance when choosing an XML data type index .................... 9
Introduction ............................................................................ 9
Considerations .......................................................................... 9
Performance Considerations in Choosing a Computed Column Index ......................... 11
Principle: Consider performance when choosing a computed column index .................. 11
Introduction ........................................................................... 11
Considerations ......................................................................... 12
Practice: Choosing Appropriate Indexes ................................................. 13
Introduction ........................................................................... 13
Scenario ............................................................................... 13
Reference Files ........................................................................ 13
Discussion ............................................................................. 14
Lesson 2: Planning for Table Optimization .............................................. 15
Apply guidelines when planning for table optimization .................................. 15
Introduction ........................................................................... 15
Guidelines for Designing Views ......................................................... 16
Principle: Design views according to guidelines ........................................ 16
Introduction ........................................................................... 16
Guidelines ............................................................................. 16
Guidelines for Choosing Indexed Views .................................................. 18
Principle: Apply guidelines for choosing indexed views ................................. 18
Introduction ........................................................................... 18
Guidelines ............................................................................. 18
Best Practices for Partitioning Tables ................................................. 20
Principle: Apply best practices for partitioning tables ................................ 20
Introduction ........................................................................... 20
Best Practices ......................................................................... 20
Best Practices for Creating Summary Tables ............................................. 22
Principle: Apply best practices for creating summary tables ............................ 22
Introduction ........................................................................... 22
Best Practices ......................................................................... 22
Guidelines for Selective Denormalization ............................................... 24
Principle: Apply guidelines to the use of selective denormalization .................... 24
Introduction ........................................................................... 24
Guidelines ............................................................................. 24
Lesson 3: Planning for Database Optimization ........................................... 26
Apply guidelines in choosing additional optimization techniques ........................ 26
Introduction ........................................................................... 26
Best Practices for Choosing Snapshot Isolation ......................................... 27
Principle: Apply best practices for choosing snapshot isolation ........................ 27
Introduction ........................................................................... 27
Best Practices ......................................................................... 28
Guidelines for Sizing the Tempdb Database .............................................. 29
Principle: Follow guidelines for sizing the Tempdb database ............................ 29
Introduction ........................................................................... 29
Guidelines ............................................................................. 29
Guidelines for Testing the Database .................................................... 31
Principle: Follow guidelines for testing the database .................................. 31
Introduction ........................................................................... 31
Guidelines ............................................................................. 32
Lab 4: Designing for Database Scalability .............................................. 33
Time estimated: 20 minutes ............................................................. 33
Introduction ........................................................................... 33
Scenario ............................................................................... 33
Exercise 1: Apply Optimization Techniques .............................................. 35
Introduction ........................................................................... 35
Review methods to increase the query performance ....................................... 35
Discussion questions ................................................................... 35
Module objective:
After completing this module, you will be able to:
• Apply best practices when designing for database scalability.
Introduction
The translation of a logical database model into a physical database model consists of designing the
actual database tables, relationships, views, and constraints. Physical database design is an iterative
process. Once the initial database layout is proposed, it must be revised and enhanced by addressing
issues such as performance, secure database access, and database dependencies. In this module, you
will learn guidelines and best practices for revising a physical design to include performance and
optimization considerations. Module 5, “Designing a Database Access Strategy,” and Module 6,
“Modeling Database Dependencies,” will cover secure database access and database dependencies.
Lesson 1: Designing Indexes
Lesson objective:
• Apply best practices for designing indexes.
Introduction
Choosing an appropriate indexing strategy is an important design decision because it affects the
performance of a database. Well-designed indexes can greatly enhance query performance, especially
when querying large tables. By creating indexes, you can improve query performance and response
time without affecting the design of the underlying tables. As a database designer, your job is to
choose the most important indexes for the tables in your physical model, without imposing a burden
on Data Manipulation Language (DML) statements. You cannot anticipate all the indexes for the
database in advance because you do not know all the queries and the full data sets that an application
might require. Therefore, you should design the indexes you realistically anticipate, keeping in mind
that new indexes might be required in the future after analyzing the live database performance.
In this lesson, you will review guidelines and considerations for clustered and non-clustered indexes,
and learn when to use Extensible Markup Language (XML) indexes and computed column indexes.
Performance Considerations in Choosing a Clustered Index
Principle: Consider performance when choosing a clustered index.
Introduction
When you work with clustered indexes, keep in mind the following technical facts:
• Clustered indexes are B-tree structures that speed the retrieval of rows from tables or views.
• Clustered indexes store data rows at the leaf node of the B-tree structure.
• Clustered indexes determine the physical order of records and pages in a table or indexed
view. Only one clustered index can exist for each table or indexed view because the actual
data rows can be sorted only in one order.
• The total width of the column values included in the index determines how many levels the
index tree will require. The more levels in an index, the less efficient it will be. Although
even the largest Microsoft® SQL Server™ indexes do not have more than three levels, you
should still minimize the index key width when you can.
• The width of each non-clustered index key value includes the width of the clustered key
value. This is because non-clustered indexes reference the clustered index if one exists, and
they use it to reach data rows. The corresponding clustered index key is copied into the leaf
node of every non-clustered index.
Performance considerations in choosing a clustered index
Consider the following factors when designing clustered indexes:
• Create a clustered index on the frequently used columns
Clustered indexes are efficient when they are created on the columns that are frequently used
in the queries that return large numbers of contiguous rows. When the number of rows
involved in the query is small, non-clustered indexes can be as effective as clustered indexes.
When designing clustered indexes, look for columns used as search arguments (SARGs) in
JOIN conditions or in WHERE conditions and that use the equal sign (=), greater than sign
(>), less than sign (<), and BETWEEN operator.
In general, primary keys that are used often in JOINs and WHERE conditions are good
candidates for clustered indexes. Columns with date values can also be good clustered
indexes because they are often used as search ranges in WHERE conditions.
• Consider clustered index data types and column widths
The data type of the columns and column widths determine the total index width. Avoid
choosing a clustered index with a wide key because a wider key takes more resources for
maintenance of the clustered index as well as all the non-clustered indexes that rely on it.
The following table describes the type of indexes that you should use with different types of
column widths.
Index type: When to use
Clustered indexes: Use when columns are narrow and selective. Selective columns have large numbers of distinct values.
Hash indexes: A hash index is created on a computed column that contains the checksum value. Use a hash index when columns are too wide for an efficient regular index key and are involved in queries that return a single row or a few rows.
For more information
To create a hash index, define a persisted computed column by using the CHECKSUM function, and then
create an index on the computed column. To make effective use of the index, you will have to modify
your queries to include the new column in the WHERE clause. For more information, see the topic
"CHECKSUM" in SQL Server Books Online. (A minimal sketch appears at the end of this topic.)
• Consider frequency of data changes
Do not use clustered indexes on columns that undergo frequent changes. When the
columns involved in a clustered index undergo data modification, SQL Server will update
the clustered index structure to move the row from the original position to a new position.
SQL Server will then update the references to the clustered key in all the related non-
clustered indexes. Updating clustered indexes frequently not only results in a
performance cost, but it can also cause page splits and index fragmentation over time.
You can reduce the risk of page splits by rebuilding the clustered index regularly using a
higher fill factor; however, rebuilding indexes will add to your maintenance overhead.
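The following Transact-SQL sketch illustrates the two index types described above. It is a minimal example rather than a prescription: the Orders and Customer tables and all column names are hypothetical.

-- A narrow, ever-increasing date key that is often used in range searches
-- is a reasonable clustered index choice.
CREATE CLUSTERED INDEX CIX_Orders_OrderDate
    ON dbo.Orders (OrderDate);

-- Hash index: persist the CHECKSUM of a wide column, then index it.
ALTER TABLE dbo.Customer
    ADD EmailChecksum AS CHECKSUM(Email) PERSISTED;

CREATE NONCLUSTERED INDEX IX_Customer_EmailChecksum
    ON dbo.Customer (EmailChecksum);

-- Queries must reference the checksum column so the optimizer can use the
-- index; repeating the original column guards against checksum collisions.
SELECT CustomerID
FROM dbo.Customer
WHERE EmailChecksum = CHECKSUM(N'user@example.com')
  AND Email = N'user@example.com';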
Performance Considerations in Choosing a Non-Clustered Index
Principle: Consider performance when choosing a non-clustered index.
Introduction
When you work with non-clustered indexes, consider the following technical facts:
• Non-clustered indexes, like clustered indexes, are B-tree structures that speed the retrieval of
rows from tables or views.
• If a table has a clustered index, all the table’s non-clustered indexes will reference the
clustered index.
• When all the columns needed to answer a query are included in the non-clustered index, the
server will use only the non-clustered index (called a covered query).
• When some of the columns needed to answer a query are not included in the non-clustered
index, the server will use the non-clustered index to access the clustered index to retrieve the
values (called a bookmark lookup).
• A view must have a unique clustered index before it can have any non-clustered indexes; a view
with such an index is called an indexed view. SQL Server 2005 supports up to 249 non-clustered
indexes per table or indexed view.
Considerations
Consider the following guidelines to increase performance when you define non-clustered indexes:
• Consider performance gain versus maintenance cost
When you choose a non-clustered index, you must balance performance gain and
maintenance cost. Non-clustered indexes enable a query to retrieve data from a table without
having to scan the table, thereby potentially improving query performance. However, there is
a maintenance cost to all non-clustered indexes. UPDATE, INSERT, and DELETE
statements that affect the non-clustered columns result in changing not only the data row but
also the content and structure of any related non-clustered indexes.
• Index on frequently used search arguments
When you design non-clustered indexes, look for columns used in search arguments, known
as SARGs, that can be found in WHERE and JOIN clauses of a query. Non-clustered indexes
work best when you estimate that the query will return just one row or a small number of
rows.
• Use on high selectivity columns
Consider non-clustered indexes for columns with high selectivity (i.e., a higher ratio of
distinct values). For example, avoid choosing a non-clustered index on a column such as
Gender that includes only two values (M, F).
• Place on foreign key columns
Consider placing non-clustered indexes on foreign key columns. It is a common practice to
join tables on foreign key values, and if a non-clustered index is placed on the foreign key
values, the optimizer can use it in the join.
• Aim for covered queries
In cases where performance is critical, you can choose a non-clustered index to cover the
query. If the non-clustered index contains all the columns involved in the query, SQL Server
can satisfy the query just from the index and need not access the underlying table, which
causes a bookmark lookup. Following is an example:
SELECT OrderId, ProductID, Quantity
FROM OrderDetail
WHERE OrderID=5869 AND ProductID=13256 AND Quantity = 3

If you specify a composite non-clustered index only on OrderID and ProductID, SQL Server
will need to use the clustered index to retrieve the value of the Quantity column. If you create
the composite non-clustered index with the columns OrderID, ProductID, and Quantity, SQL
Server can satisfy the query using only the non-clustered index.
• Consider using included columns
In special cases where you have wide tables and a critical query will need to retrieve only
some of the column data, you can avoid the cost of a non-clustered index on all the required
columns by using included columns. You still need to index on the search arguments, but you
can include additional columns that are not part of the index key. Consider the following
query:
SELECT OrderId, ProductID, Quantity
FROM OrderDetail
WHERE OrderID=5869 AND ProductID=13256

In this case, Quantity is not a search argument, but it is required to satisfy the query. You can
specify a composite non-clustered index on OrderID and ProductID, with Quantity as an
included column. The index is not as expensive to maintain, but it still covers the query.
Performance Considerations in Choosing an XML Data Type Index
Principle: Consider performance when choosing an XML data type index.
Introduction
Indexes on SQL Server XML columns present a challenge to database designers because the data type
is nonrelational and therefore it becomes difficult to anticipate the kinds of queries users will need. In
Module 3, “Modeling a Database at the Physical Level,” you learned about the guidelines for selecting
the SQL Server 2005 XML data type for tables in your physical database model. The XML data type
is a good choice when the data is nonrelational, contains hierarchical structures, or the schema is
unknown at design time.
You can select XML indexes when you anticipate the need to increase the performance of queries that
involve documents stored in XML columns. XML indexes create data structures that can reduce or
eliminate the frequent XML shredding process. Just like table indexes, there is a cost to maintain
XML indexes. XML indexes are maintained automatically by SQL Server when the XML columns are
updated.
Considerations
Consider the following guidelines to increase performance when choosing an XML data type index.
• Choose an appropriate XML index
SQL Server 2005 supports four different XML indexes:
- XML primary index
The XML primary index creates a structure with all the tags, values, and paths that can be
directly associated with the table’s clustered index. This index is required to create all
other XML secondary indexes.
- XML secondary indexes
Three specialized index types are available for XML secondary indexes: VALUE, PATH,
and PROPERTY. Each index type helps to increase the performance of queries of a
different nature.
• VALUE XML indexes increase the performance of content-oriented queries. Queries
like those in the following example can benefit from a VALUE XML index:
SELECT … FROM …
WHERE xCol.exist('//ItemId[@ID = "1-626-391"]') = 1
• PATH XML indexes increase the performance of structured queries. Queries such as
those used in the following example benefit from a PATH XML index:
SELECT … FROM …
WHERE PurchaseOrder.exist('/Header[ShipNote[1] = "Rush"]') = 1
• PROPERTY XML indexes increase the performance of queries that use Name/Value
predicates. The query in the following example benefits from a PROPERTY XML
index because its SELECT statement retrieves multiple values from an individual
XML column:
SELECT PurchaseOrder.value('(/Header/Charge/@Id)[1]', 'varchar(5)'),
       PurchaseOrder.value('(/Header/Charge/Description)[1]', 'varchar(50)'),
       PurchaseOrder.value('(/Header/Charge/Cost)[1]', 'decimal(18,4)')
FROM …
SQL Server supports XML indexes on typed (with an associated schema) and untyped (no
associated schema) XML columns. (A minimal sketch showing how XML indexes are created
appears at the end of this topic.)
• Determine likely search arguments
Specify XML indexes when you have certain queries that use XPath, Value, or Name/Value
predicates. When XPath, Value, or Name/Value predicate types are required as a search
argument in a SQL SELECT statement, the predicate can be a candidate for an XML index.
• Consider performance issues
Generally, relational indexes perform better than XML indexes do. Updating XML data, as
opposed to deleting or replacing it, requires more SQL Server resources than does updating a
relational index. If you update an XML column often, and it has XML indexes, the cost of
rebuilding the indexes with each update might have a significant impact on performance.
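The following sketch shows how a primary XML index and one secondary index might be created. The PurchaseOrders table and all names are hypothetical; only the syntax reflects SQL Server 2005.

-- XML indexes require a clustered primary key on the base table.
CREATE TABLE dbo.PurchaseOrders
(
    OrderID  int NOT NULL PRIMARY KEY CLUSTERED,
    OrderDoc xml NOT NULL
);

-- The primary XML index must exist before any secondary XML index.
CREATE PRIMARY XML INDEX PXML_PurchaseOrders_OrderDoc
    ON dbo.PurchaseOrders (OrderDoc);

-- A PATH secondary index supports structural, path-based predicates.
CREATE XML INDEX SXML_PurchaseOrders_OrderDoc_Path
    ON dbo.PurchaseOrders (OrderDoc)
    USING XML INDEX PXML_PurchaseOrders_OrderDoc
    FOR PATH;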
Performance Considerations in Choosing a Computed Column Index
Principle: Consider performance when choosing a computed column index.
Introduction
You might also find it beneficial to choose indexes on computed columns, whether or not the column
is persisted, because you will not incur the cost of the computation at run time. The following
conditions should be met to define an index on a computed column:
• The computed column expression must be deterministic and precise. An expression is
deterministic if all functions (built-in, Transact-SQL, or common language runtime (CLR)) in
the expression are deterministic and precise. A function is deterministic if every time you run
the function with the same parameters, it returns the same value. For example, the GETDATE
or CURRENT_TIMESTAMP function is nondeterministic, whereas the ISNULL function is
deterministic. A function is precise if the expression does not involve the use of floating-point
(float or real) data types. When a function is not precise, a computed column must be
PERSISTED to support an index.
• The computed column expression must be based only on values of other columns in the same
row. The expression cannot reference other rows in the same table or columns in other tables.
• If the computed column references CLR functions, they should not perform any system or
user data access.
Considerations
Consider the following factors that affect performance when you are designing an index on computed
columns:
• Assess benefits for common or important queries
In some cases, you can specify indexes on computed columns to increase the performance of
critical or high-frequency queries. For example, in the OrderDetail table, you can define
the ExtendedPrice column based on the following expression: ROUND(Quantity*Price*(1-
Discount/100)*(1+Tax/100),4). If the ExtendedPrice column is used to compute daily
product sales, an index on {OrderID, ProductID, ExtendedPrice} can help the performance
of that query; because ExtendedPrice is part of the index, the query is covered. (A minimal
sketch appears at the end of this topic.)
• Assess performance cost against performance gain
When you define indexes on computed columns, evaluate the performance gain against the
performance cost. When computed columns are persisted or used in indexes, every time the
row is updated, the server will generate the value again based on the expression. If the table is
frequently updated, the performance of UPDATE statements will be affected.
Computed column indexes can be especially helpful in computed columns that are based on
the complex CLR functions, which are seldom updated. You can test the queries with and
without the index in your prototype to confirm that the index is used and whether a
performance gain is achieved.
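Under those conditions, the ExtendedPrice example could be sketched as follows. The OrderDetail table is assumed to exist with the listed columns; decimal types keep the expression precise, so it can be indexed without being persisted.

-- Add the computed column from the example above (names are illustrative).
ALTER TABLE dbo.OrderDetail
    ADD ExtendedPrice AS ROUND(Quantity * Price
                               * (1 - Discount / 100.0)
                               * (1 + Tax / 100.0), 4);

-- Index that covers the daily product sales query described above.
CREATE NONCLUSTERED INDEX IX_OrderDetail_ExtendedPrice
    ON dbo.OrderDetail (OrderID, ProductID)
    INCLUDE (ExtendedPrice);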
Practice: Choosing Appropriate Indexes
Introduction
Based on the sample tables provided in IndexPractice.doc, design the appropriate indexes: clustered,
non-clustered, XML, and computed.
Scenario
You are the database designer of a commercial retail database. You must design indexes for two tables
named InvoiceHeader and InvoiceDetail. These tables are two of the most active tables of the
database. InvoiceHeader experiences more than 5,000 row inserts per day, and InvoiceDetail more
than 30,000 inserts per day. Rows in these tables are never updated and seldom are deleted. The
development team has identified a number of queries that they believe will have performance
problems.
Reference Files
IndexPractice.doc
Discussion
Based on the scenario, consider the following questions:
Q What clustered indexes did you define?
A In the InvoiceHeader table: InvoiceID. You might alternatively suggest {InvoiceDate,
InvoiceNum}.
In the InvoiceDetail table: {InvoiceNum, ProductID}.
Q What non-clustered indexes did you define?
A In the InvoiceHeader table: If {InvoiceDate, InvoiceNum} was not defined as clustered,
define it as non-clustered. If {InvoiceNum} was not defined as clustered, define it as non-
clustered. You might also define {CustomerCode, InvoiceNum}.
In the InvoiceDetail table: {InvoiceNum, CommissionAmount} and {ProductID,
InvoiceNum, Quantity}.
Q What XML indexes did you define?
A In the InvoiceHeader table: Primary XML index on the CustomerOrder column. Also a
PATH XML index on the CustomerOrder column.
Lesson 2: Planning for Table Optimization
Lesson objective:
• Apply guidelines when planning for table optimization.
Introduction
Besides indexes, SQL Server 2005 offers database designers several other kinds of database
objects and techniques to assist in designing for performance. In this lesson, you will learn guidelines
and best practices for using views, indexed views, partitioned views, summary tables, and
denormalization to enhance database performance. Just like indexes, improper or excessive use of
these techniques can result in adverse effects.
Guidelines for Designing Views
Principle: Design views according to guidelines.
Introduction
When you are creating a physical database model, the first kinds of objects you deal with are tables
and columns, and perhaps indexes. After tables are chosen and normalized, information that bridges
those tables can be supplied only by complex queries. You might decide that the physical model also
needs views that expose the information bridging tables in a more intuitive way and hide the
complex joins.
A view is a database object that enables database developers to store a predefined query in the server.
The main advantage of views is that users and applications query views the same way they query
tables. From the user perspective, views are virtual tables. For many applications, there is no practical
distinction between tables and views.
Guidelines
Consider the following guidelines when designing views:
• Use views to hide joins
In a physical database model, objects that you have designed as a single concept can be
materialized in many tables. To help developers and users, you can create views that hide the
complexity of the joins required to collect all the data for that object and that present it in a
more natural schema. Take, for example, the Employee entity; you can split its attributes into
three tables: Person, Address, and Employee. A FullEmployee view might join all the
attributes and enable applications and users to query it as a single table, hiding the complexity
introduced by the normalization process. (A minimal sketch of such a view appears at the end
of this topic.)
• Use views to hide sensitive data
SQL Server enables you to create views based on a selected subset of a table’s columns. If
you grant users SELECT, UPDATE, DELETE, or INSERT statement rights in the view, but
do not grant those rights in the table, users will be able to access only the values from the
columns included in the view, effectively hiding sensitive columns from unauthorized users.
For example, if the Employee table includes a Salary column and you want all users to be
able to query the employee information but not employee salaries, you can create a view that
includes all columns from the Employee table except for the Salary column. To restrict the
salary information, grant all users access to the view, and grant access to the Employee table
only to the users that require access to the Salary column.
If you choose to use views as a security mechanism, be sure to include the WITH CHECK
OPTION clause when creating the view to help ensure that users cannot move data to ranges
of rows that are beyond their permissions.
• Consider alternatives to views
Views are not the only methods available for hiding the complexity of joins. Some
alternatives to views are stored procedures, user-defined functions (UDFs), and
synonyms. The following table describes the advantages of the alternative options to
views.
Alternatives to views

Stored procedures
Description: Stored procedures are used differently from views: stored procedures are executed, whereas tables and views are queried. To query tables, applications use a SELECT statement; to execute procedures, the application uses the EXEC statement.
Advantages over views: Stored procedures allow a wider range of SQL statements than views allow. Passing parameters, declaring variables, using IF conditions, providing error control, and using multiselect statements are only some of the possibilities permitted in stored procedures that are not allowed in views.

UDFs (three types: scalar, multistatement, and inline)
Description: Multistatement and inline functions provide alternatives to views. Inline functions are similar to views because you define them as one SELECT statement. Multistatement functions are similar to stored procedures in the way they are defined and similar to views in the way they are queried.
Advantages over views: Inline functions permit the use of parameters. Multistatement functions allow a wide selection of SQL statements, but return a single table variable.

Synonyms
Description: A synonym can be used as an alternative to a view when all you need is a different name for a view.
Advantages over views: Synonyms let you define an alternate name for database objects and can reference tables, views, stored procedures, UDFs, and other database objects.
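As a minimal sketch of the first guideline, a FullEmployee view might look like the following; the three tables and their join columns are assumed:

CREATE VIEW dbo.FullEmployee
AS
SELECT p.PersonID, p.FirstName, p.LastName,
       a.City, a.PostalCode,
       e.JobTitle, e.HireDate
FROM dbo.Person AS p
JOIN dbo.Address AS a ON a.PersonID = p.PersonID     -- hides the joins
JOIN dbo.Employee AS e ON e.PersonID = p.PersonID;   -- behind one object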
Guidelines for Choosing Indexed Views
Principle: Apply guidelines for choosing indexed views.
Introduction
A SQL Server 2005 indexed view is a database view with a unique clustered index defined on it. When SQL
Server creates the clustered index, it materializes the view and stores the resulting data on disk. The
view is no longer a virtual table because, as an indexed view, it is similar to a regular table that
occupies its own space in the database. From the moment the server creates the index, all INSERT,
UPDATE, and DELETE statements modifying rows in the base tables also update the data rows in the
view.
Guidelines
When defining and designing indexed views, keep in mind the following guidelines:
• Use indexed views to improve performance of reporting queries
Examine the proposed reports and queries in your design that you anticipate will contain
complex joins and aggregations. Indexed views can dramatically improve the performance of
some queries, in particular those involving aggregations.
• Consider performance gain versus cost
Consider the tradeoff between performance gain and cost. Because indexed views are
materialized, the server has to keep the data in the base tables synchronized with the data
rows in the view. The database occupies more space and the server must update the rows in
the view every time the base tables are updated.
Indexed views that reference tables that are actively updated can hinder server performance.
Before creating an indexed view, evaluate the performance impact.
• Consider indexed view requirements
The requirements for creating and using an indexed view include the following:
- Create the view with the SCHEMABINDING option. This option ties the schema in the view to the tables' schema.
- All functions used in the view must be deterministic and created with the SCHEMABINDING option.
- The view can include only tables from the same owner in the same database. Using other views from the same database, tables or views from other databases, or tables or views from other owners is not allowed. You must also use two-part names for all objects referenced in the view.
- The ANSI_NULLS and QUOTED_IDENTIFIER options must be set to ON when you create and use indexed views. Create all base tables and the indexed views with the ANSI_NULLS option and QUOTED_IDENTIFIER option set to ON.
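A minimal sketch that satisfies these requirements follows. The OrderDetail table is assumed, and its Quantity column is assumed to be declared NOT NULL:

SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO
-- SCHEMABINDING and two-part names are required for an indexed view.
CREATE VIEW dbo.ProductSalesSummary
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(Quantity) AS TotalQuantity,
       COUNT_BIG(*)  AS OrderLines   -- required when the view uses GROUP BY
FROM dbo.OrderDetail
GROUP BY ProductID;
GO
-- Creating the unique clustered index materializes the view.
CREATE UNIQUE CLUSTERED INDEX UCX_ProductSalesSummary
    ON dbo.ProductSalesSummary (ProductID);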
Best Practices for Partitioning Tables
Principle: Apply best practices for partitioning tables.
Introduction
If you anticipate that your database design must support one or more large tables, and those large
tables will also be active, you can specify table partitioning as a method for increasing performance.
When a large table is stored in a single database file, SQL Server assigns one thread to perform a read
of the rows in the table. If you specify that the table will be divided into multiple, approximately
equal-sized physical files, SQL Server assigns multiple threads (one per physical file) to read the
table.
You can specify your own custom partitioning of a table and create a custom view over the partitions,
resulting in a union partitioned view. However, when you can satisfy the requirements for a
partitioning key, you should specify SQL Server table partitioning and enable much of the partition
maintenance to be handled automatically.
With table partitioning, you can divide large tables and their indexes into partitions that reside in
different filegroups. Dividing tables or indexes into partitions and assigning them to different
filegroups enables the designer to take advantage of multiple input/output (I/O) channels to enhance
I/O performance.
Best practices
Consider the following best practices when partitioning tables:
• Use only on large tables
Table partitioning is an advanced technique that shows performance benefits only when used
on large tables. Consider the use of table partitioning only in active online transaction
processing (OLTP) databases that contain millions of rows or when loading data into staging
or data warehouse databases.
• Choose the appropriate partition strategy
The two basic partitioning patterns are as follows:
- Balanced partitions: In this type of partition, the database designer divides rows in an
even manner into as many I/O channels as possible. This distributes partition units in a
manner that enables the server to query them simultaneously. For example, if the Orders
table is partitioned using the Customer column as a partitioning column, INSERT
statements will be distributed automatically across the partitions. SELECT statements
that are filtered by OrderId or OrderDate values will benefit from parallel scans of the
partitioned table because SQL Server can extract data simultaneously from multiple
partitions.
- Personalized partitions: In this type of partition, the database designer deliberately
creates unbalanced partition units to increase the performance of some queries at a cost to
others. One way to use this approach is to create a History partition to contain the old
data and another partition to contain the latest data. For example, in the Orders table, you
can create the Current Sales partition that includes only the last quarter’s information and
a History Sales partition that includes the past 5 years of orders. The performance of
queries on recently included orders will increase, and the table will be able to support
queries that require historic information.
• Choose an optimal partitioning function
Table partitioning in SQL Server 2005 uses a built-in partition function that accepts column
values from the row and returns a number. The partitioning schema uses the number
generated by the partitioning function to indicate the filegroup in which the row is stored.
When defining partitioning across multiple tables that are frequently joined, for example,
Orders, OrderDetails, Invoices, and InvoiceDetails, you should use a generic partitioning
function that enables partition management from a single point and workload distribution,
rather than using hard-coded values. (A minimal sketch of a partition function and scheme
appears at the end of this topic.)
• Choose appropriate file placement of partitions
Based on your performance requirements, make sure the storage capacity plan of your
database takes into account the use of partitioned tables. File placement depends on the
chosen partition strategy. If balanced partitions are used, you can include multiple disk
subsystems to distribute database files and filegroups, and distribute partitions accordingly.
If personalized partitions are used, include at least two subsystems for the database: one
redundant array of independent disks (RAID) 1 or RAID 10 subsystem where write
performance is a priority, and one or more RAID 5 subsystems in partitions where read
performance is the priority.
• Consider the use of index alignment
If you create partitioned tables and create indexes on those tables, SQL Server 2005
automatically partitions the indexes using the same partition schema as the table. This is
called index alignment and is usually the best solution. However, in huge volatile tables you
might experiment with separating clustered and non-clustered indexes onto different disk
subsystems.
• Plan for data movement in and out of the partitioned table
Consider the need to manage partitions and the need to move data inside the partitioned table.
These operations usually involve large amounts of data, and if partitions are designed with
these operations in mind, they can be achieved quickly using the built-in SWITCH, MERGE,
and SPLIT commands. Consider the need to archive data, move data to history partitions, and
migrate data into the partitioned table.
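As a sketch of the mechanism, the following partitions an Orders table by order date. The boundary values are arbitrary examples, and the three filegroups are assumed to already exist in the database:

CREATE PARTITION FUNCTION pfOrderDate (datetime)
AS RANGE RIGHT FOR VALUES ('2005-01-01', '2006-01-01');

-- Maps each partition number returned by the function to a filegroup.
CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate TO (fgHistory, fgPrior, fgCurrent);

CREATE TABLE dbo.Orders
(
    OrderID    int      NOT NULL,
    OrderDate  datetime NOT NULL,
    CustomerID int      NOT NULL
) ON psOrderDate (OrderDate);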
Best Practices for Creating Summary Tables
Principle: Apply best practices for creating summary tables.
Introduction
In your physical database design, you might find cases where summary or aggregate data does not
need to be queried in real time, although it does need to be queried from the database. For example, a
report query might require summary values of a prior month’s or several months’ sales data. If the
query can access the monthly summary data separately from the detail data, and if the older detail data
does not change, you can increase query performance by storing data in separate summary tables.
Best practices
Use the following best practices when creating summary tables:
• Use when data is not required in real time
Summary data is relatively static, so you benefit most when that summary data can have some
latency, such as totals as of yesterday or last month. For example, instead of querying the
detail data by scanning the OrdersDetails, Orders, and Products tables every time a report
requires the prior month’s data, you can specify an OrderSummary table with sums of
OrderQuantity, SalesAmount, and CostAmount for each ProductID and CustomerID per month.
(A minimal sketch of such a table appears at the end of this topic.)
• Plan for updating or re-creating summary tables
The drawback of summary tables is that you have to design a strategy to update or re-create
them periodically. In cases where the older data does not change, some summary tables might
need to be created only once. You can specify SQL Server jobs, stored procedures, or
automatic triggers to update the summary tables.
• Consider alternatives to summary tables
Sometimes summary tables are impractical because they involve a lot of overhead or require a lot
of disk space. In such cases, you might need to find alternatives to summary tables that do not
impose overhead on the production server. Such alternatives include the following:
- A separate reporting server: SQL Server 2005 Reporting Services is a server-based
reporting technology that has the ability to schedule report generation with automatic
distribution. Report generation can be scheduled when the database activity is low. You
can specify the use of a data distribution technology such as replication or SQL Server
Integration Services (SSIS) to copy data from production to the reporting server.
- An operational data store (ODS): An ODS is a database used as a staging area for a data
warehouse and is updated through the course of business. ODS databases are designed to
support frequent queries on detailed data.
- A data warehouse: A data warehouse is a database that is designed to contain
summarized information needed to support business analysis processes.
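A minimal sketch of the OrderSummary table described above follows; the Orders and OrdersDetails tables and their columns are assumed:

-- Build the monthly summary once; refresh it on the schedule you design.
SELECT od.ProductID,
       o.CustomerID,
       DATEADD(month, DATEDIFF(month, 0, o.OrderDate), 0) AS SalesMonth,
       SUM(od.Quantity)    AS OrderQuantity,
       SUM(od.SalesAmount) AS SalesAmount,
       SUM(od.CostAmount)  AS CostAmount
INTO dbo.OrderSummary
FROM dbo.OrdersDetails AS od
JOIN dbo.Orders        AS o ON o.OrderID = od.OrderID
GROUP BY od.ProductID, o.CustomerID,
         DATEADD(month, DATEDIFF(month, 0, o.OrderDate), 0);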
Guidelines for Selective Denormalization
Principle: Apply guidelines to the use of selective denormalization.
Introduction
In Module 2, “Modeling a Database at the Logical Level,” you learned about the importance of
normalization in OLTP databases. You learned that to eliminate update anomalies and to reduce
redundancy in the database, at least third normal form should be achieved in the design.
However, on rare occasions you might critically require better performance, and in those cases you
can selectively denormalize data after having first normalized it. Denormalization is the process of
adding redundancy to the database to optimize performance. A best practice in database design is to
achieve a fully normalized database at the logical design level and then, if necessary, introduce some
degree of denormalization at the physical level.
Guidelines
Consider the following guidelines when denormalizing selected parts of the database:
• Verify and assess the performance benefits
Do not assume that redundant data will always translate into better performance. When
denormalizing the physical design, evaluate the performance gain involved in introducing
redundant data.
You can evaluate the performance gain of denormalization by performing a test in which you
create two sets of tables, one normalized and one denormalized, in a test copy of the database.
You then run the sample queries on the tables and compare the actual response time with
query optimization costs.
• Consider the update anomaly cost
Denormalization requires storing the same fact or attribute more than once. This has two
important consequences that you must keep in mind:
- You will need to keep the denormalized facts synchronized because they are stored in more than one place.
- You must be on the alert for potential update anomalies, ensuring that all values of denormalized data match after each update.
The first consequence can hinder query performance in the database, and the second demands
additional measures to detect and correct.
• Consider alternatives to denormalization
In previous topics, you learned about guidelines for applying performance improvement
techniques that could reduce the need for denormalization. You can specify the use of
computed columns, persisted computed columns, views, indexed views, and partitioned views
as alternatives that might offer better or equivalent performance to denormalization without
the update anomaly risk.
• Consider denormalizing data at the middle tier
Another alternative to denormalizing data is to design middle-tier data access components
that query normalized tables and then denormalize them on the middle tier. The middle-tier
data access components can then present the denormalized data to the other components and
to the user interface. However, be careful when using this approach because you might just
push an update anomaly back one level from the database to the middle tier. Then the middle
tier must ensure that denormalized data is updated in multiple places.
Lesson 3: Planning for Database Optimization
Lesson objective:
• Apply guidelines in choosing additional optimization techniques.
Introduction
When you add performance considerations to your database design, some factors will change your
design (such as specifying indexes), whereas others might not directly affect it (such as isolation
levels). In the first two topics of this lesson, you will learn guidelines for specifying snapshot
isolation and for sizing the tempdb system database, two factors that do not directly affect your
design. You can specify one of two snapshot isolation options to improve query accuracy and
performance in some scenarios. The tempdb system database can support better query performance,
but it can also become a bottleneck for the server if it is not sized properly.

In the last topic of the lesson, you will learn guidelines on how to test your solution in a lab
environment and identify performance issues that might arise.


Best Practices for Choosing Snapshot Isolation

Principle: Apply best practices for choosing snapshot isolation.


Introduction
A basic functionality of a relational database such as SQL Server is to provide ACID properties for
transactions. The acronym ACID is used to specify four essential attributes of transactions: atomicity,
consistency, isolation, and durability. These essential attributes are supported by the database
transaction log and are not configurable.

However, as a database designer, you can specify how the isolation property can be configured.
The isolation property refers to the degree to which a transaction is independent from other
transactions. SQL Server 2005 supports all four isolation levels defined in SQL-99: read uncommitted,
read committed, repeatable read, and serializable. SQL Server 2005 also supports two flavors of
snapshot isolation based on row versioning: READ_COMMITTED_SNAPSHOT and SNAPSHOT
ISOLATION.

The READ_COMMITTED_SNAPSHOT database option enables you to specify that all SELECT
statements in a database, using the default read committed isolation level, will not take any shared
locks and will rely on row versioning to read data that is consistent for the duration of the SELECT
command. The SNAPSHOT ISOLATION option ensures that the SELECT statements within a
transaction do not take any shared locks and rely on row versioning to read data that is consistent at
the time the transaction began.
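As a point of reference, the following statements show how the two options are enabled and used. This
is a minimal sketch; the AdventureWorks database name is used only for illustration.

-- Option 1: statement-level row versioning for the default read committed level.
-- (Switching this option requires that no other connections are active in the database.)
ALTER DATABASE AdventureWorks SET READ_COMMITTED_SNAPSHOT ON;

-- Option 2: allow transactions to request the snapshot isolation level.
ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON;

-- A session can then read transactionally consistent data without taking shared locks.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT COUNT(*) FROM HumanResources.Employee; -- sees the data as of transaction start
COMMIT TRANSACTION;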


The main benefits of the snapshot isolation options are the reduction of blocking caused by shared
locks and the guarantee of reading consistent data within the defined scope.
Best practices
Consider the following best practices when working with snapshot isolation:
• Assess the potential for concurrency issues caused by reader/writer blocking
The main reason to use snapshot isolation is to avoid locking conflicts between readers and
writers. If transactions are small and fast, the server will handle the conflicts without a
significant performance impact. However, if transactions are large and slow, and you foresee
the potential for readers blocking writers, you should consider specifying snapshot isolation.
• Identify transactions versus queries for reader/writer conflicts
To determine which snapshot isolation level will be most beneficial, evaluate whether readers
blocking writers is more likely to occur in single SELECT queries (such as simple reports) or
in transactions. Complex reports that require completely consistent data across several
SELECT statements can be wrapped in a transaction that runs using the snapshot isolation
level.
• Identify the risk of update conflicts for snapshot isolation level
A consequence of working with optimistic locking is the risk that two transactions working
with the same row will both try to update it. When this situation occurs, SQL Server detects it
by comparing the transaction row version with the actual row. If SQL Server determines that
the rows are different, it generates an update conflict error. The application or Transact-SQL
transaction must be able to handle the error properly. You can specify that Transact-SQL
transactions use the TRY/CATCH construct to intercept an update conflict and retry, as shown
in the sketch after these best practices.
• Estimate the cost of tempdb activity for the snapshot isolation options
SQL Server uses the tempdb database to store row versions, and the more actively you use
snapshot isolation, the more tempdb usage you must plan for. You will learn about tempdb
sizing guidelines in the next topic, “Guidelines for Sizing the Tempdb Database.”
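The following Transact-SQL fragment is one possible shape for such retry logic. It is a sketch, not
production code: the table, key value, and retry limit are assumptions, and error 3960 is the update
conflict error raised under snapshot isolation.

DECLARE @retries int;
SET @retries = 3;
WHILE @retries > 0
BEGIN
    BEGIN TRY
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
        BEGIN TRANSACTION;
        UPDATE dbo.Inventory SET Quantity = Quantity - 1 WHERE ProductID = 42;
        COMMIT TRANSACTION;
        BREAK; -- success: leave the loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 3960
            SET @retries = @retries - 1; -- update conflict: try again
        ELSE
            BREAK; -- a different error: handle or re-raise elsewhere
    END CATCH
END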


Guidelines for Sizing the Tempdb Database

Principle: Follow guidelines for sizing the tempdb database.


Introduction
When you work with a SQL Server 2005 physical database design, it is critical that you attempt to
estimate tempdb activity and properly size it for your design. Tempdb is one of the SQL Server 2005
system databases. When users and applications create temporary tables and temporary stored
procedures, regardless of which database the application is using, temporary objects are stored in
tempdb.
SQL Server 2005 also uses tempdb to store worktables needed to perform various tasks, including
index creation, hash table creation, row versioning for online operations, triggers, snapshot isolation,
and worktable creation for join and sort operations. Tempdb is a global resource that should be
properly sized; failing to do so can produce a system bottleneck.
Guidelines
When sizing the tempdb database, consider the following guidelines:
• Identify the use of snapshot isolation options
As mentioned earlier, there are two ways to use snapshot isolation in SQL Server 2005:
READ_COMMITTED_SNAPSHOT and SNAPSHOT ISOLATION. If you have specified
either option, you must factor that into your estimate of activity in tempdb. You can use
System Monitor counters to monitor snapshot isolation behavior in your prototypes or in a
test system to form a baseline that can be used to estimate tempdb activity.


• Identify the use of triggers and online indexing for row versioning
SQL Server 2005 uses row versioning automatically for triggers and online indexing. This
occurs in the background and is not configurable.
Row versioning is automatically used for materializing a trigger’s inserted and deleted tables
and for any rows updated in the triggers. For example, assume that an OrderDetail table has a
trigger that updates the Inventory table. When an order is voided, the server will create
versions for all updated OrderDetail rows (inserted and deleted tables) and all updated rows in
the Inventory table.
Another feature that uses row versioning is online indexing, whereby developers and database
administrators can create or rebuild indexes without disrupting concurrent user activity.
Online indexing creates a parallel index structure and switches over to it when the new index
is built. During that process, it uses row versioning to keep both the old and new indexes
synchronized.
If you specify either triggers or online indexing in your physical database design, you must
provision for additional tempdb activity based on the use of row versioning.
For more information
For more information about row versioning for triggers and online indexing, see “Understanding Row
Versioning-Based Isolation Levels,” in SQL Server Books Online.

• Consider the file placement and initial size option of tempdb


When you specify how tempdb should be sized for your database design, consider the
following guidelines:
- Configure tempdb with the auto growth option enabled (the default), but set its initial
database files to a reasonable size to avoid numerous expansions. Frequent database
expansions can hinder server performance and fragment the files. Monitor your
prototypes and test databases to find an adequate starting size.
- Specify that tempdb database files be placed on a fast disk subsystem that uses
RAID 1 or RAID 0+1 (mirrored stripes). Because SQL Server re-creates tempdb
every time the server starts, do not store permanent data in it; the data will be lost
on a restart of the server. SQL Server cannot run without tempdb, however, so ensure
that the disk subsystem is also fault tolerant. If the tempdb disk subsystem fails,
SQL Server will not be able to run, and the database administrator (DBA) will have
to re-create the array and restart the server.
- Consider the use of multiple files for tempdb distributed on different disk subsystems.
One recommendation is to use one tempdb data file per CPU, but you should test to
determine the optimal number for your application.
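The following statements sketch how these guidelines might be applied. The file paths and sizes are
illustrative and must be derived from your own monitoring; the version store query is useful when
estimating the tempdb cost of the snapshot isolation options discussed earlier.

-- Presize the primary tempdb data file (logical name tempdev is the default).
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 512MB, FILEGROWTH = 128MB);

-- Add a second data file on a separate disk subsystem.
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'E:\SQLData\tempdev2.ndf',
          SIZE = 512MB, FILEGROWTH = 128MB);

-- Monitor version store consumption in tempdb (pages are 8 KB each).
SELECT SUM(version_store_reserved_page_count) * 8 AS VersionStoreKB
FROM sys.dm_db_file_space_usage;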


Guidelines for Testing the Database

Principle: Follow guidelines for testing the database.


Introduction
As mentioned in Module 3, “Modeling a Database at the Physical Level,” the Microsoft Solutions
Framework (MSF) Process Model assigns different names to prototypes depending on which phase
the project is in and the objective of the prototype. A proof of concept is essentially a load test used to
determine whether a solution is feasible by simulating the production environment in a lab setting.
The proof of concept is the first milestone of the development phase and applies as much to a database
design as to an application design.

The objectives of load testing the database include the following:


• Determine whether performance goals are achieved.
• Identify bottlenecks and optimize performance.
• Verify reliability.
The first objective is the main goal of load testing. If performance goals are not achieved, database
designers and the software development team must evaluate and revise the architecture and technical
implementation of the solution.


Guidelines
Consider the following guidelines for testing the database:
• Find methods to simulate projected load
To create a good lab simulation, you must find methods to generate appropriate workloads. A
workload is a Transact-SQL script or trace that can be replayed by stress-testing tools
to generate load on the server.
Microsoft Visual Studio® 2005 enables the creation of a testing project where you can
program multiple tests for a project. These tests can be grouped in a special kind of test called
LoadTest. The load test enables the developer to configure the number of users, their
bandwidth, and the mix of tests representing a production environment.
Ensure that tests are never run against production servers; specify that they run on
dedicated test servers instead. A minimal workload sketch appears after these guidelines.
• Ensure the data and query mix is both realistic and representative
When designing the load test, make sure you include a mix of tests that is representative of
the production environment. The load test should include a realistic mix of reports that are
query intensive and operations that include both query and updates.
In addition, the information stored in the database should be similar or identical to what the
production database stores.

• Use a load and stress testing tool


After the test scenarios are generated, you can use SQL Profiler to capture all Transact-SQL
statements generated by the testing tool. SQL Profiler enables you to replay the captured trace
or script. This feature of SQL Profiler is useful when testing database options, file
placement, index alternatives, and so on. SQL Server 2005 also provides the Database Engine
Tuning Advisor, which provides advice on index (including indexed views) and partitioning
strategies.
Also, third-party load testing tools can be used for database load testing.
For more information
For more information about SQL Profiler, see “Replaying Traces,” in SQL Server Books Online.

• Stress test the disk subsystem


Because the nature of any relational database management system (RDBMS) is to store and
retrieve data, the most physically stressed part of a database server is its disk subsystem. As
part of your specification for disk storage for your database design, you should plan for disk
system stress tests that will verify appropriate performance at peak load and higher. Microsoft
provides two tools, SQLIO.EXE and SQLIOStress.EXE, that can be used to determine disk
subsystem throughput. SQLIO.EXE, the SQLIO Disk Subsystem Benchmark Tool, enables
you to measure the throughput of a disk subsystem and vary certain low-level settings.
SQLIOStress.EXE helps you find the maximum throughput of an existing disk subsystem and
enables you to stress test the disk subsystem by running it at high load for a long period of
time.
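As a simple illustration, a workload script can be as small as a loop that repeatedly executes a
representative procedure from several concurrent connections, either directly or replayed through
SQL Profiler or a load tool. The procedure name below is hypothetical.

-- A minimal, illustrative workload fragment; run from multiple connections at once.
DECLARE @i int;
SET @i = 0;
WHILE @i < 1000
BEGIN
    EXEC dbo.GetEmployeeLeaveBalance @EmployeeID = 1;
    SET @i = @i + 1;
END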


Lab: Designing for Database Scalability

Time estimated: 20 minutes

Introduction
In Lab 3, “Modeling a Database at the Physical Level,” you created a database physical model based
on the previously generated conceptual and logical models. In this lab, you will design appropriate
optimization methods that will ensure efficient reports.
Scenario
You are the lead database designer working as a part of the Human Resources (HR) Vacation and Sick
Leave Enhancement (VASE) project.

The HR VASE project will enhance the current Human Resources system. This system is based on the
AdventureWorks sample database built on SQL Server 2005.

You are asked to formulate a list of database requirements that your design must satisfy. The main
goals of the project are the following:
• Give individual employees permission to view their vacation and sick leave balances.
• Provide managers with current and historical information about employee vacation and sick
leave data.
• Give certain employees in the HR department permission to view and update employee salary
data.


• Give certain employees in the HR department permission to view and update employee sick
leave and vacation data.
• Give the HR manager permission to view and update all the data.
• Optimize report performance for managers and HR personnel.


Exercise 1: Apply Optimization Techniques


Introduction
Given the physical data model generated in Lab 3 and data retrieval requirements for the VASE
project, determine appropriate methods for optimizing access to the data. This might include indexed
views, computed columns, summary tables, and denormalized tables.

Review methods to increase the query performance


Summary
1. Based on the physical model and QueryPerformance document, determine appropriate methods to increase the performance of provided queries.
2. Class review.
Detailed Steps
1. Open the QueryPerformance.doc file located at install_folder\Labfiles\Mod04\Starter.
2. Use the PhysicalModel.vsd file located at install_folder\Labfiles\Mod04\Starter.

Answer Key

1. Open the QueryPerformance.doc document located at install_folder\Labfiles\Mod04\Starter.


2. Suggest appropriate methods to increase the performance of the provided queries.
3. You can compare your solution with the sample Query Performance Solution.doc. This
document is located at D:\Labfiles\Solution.
4. Wait for the instructor to review different solutions provided by students.

Discussion questions
Read the following questions and discuss your answers with the class.
Q What was your recommendation to increase the performance of the Manager Report?
A Answers will vary. The recommended solution is to create an indexed view
replacing the Report Manager SQL statement. The view should summarize the
sick and vacation days taken by employees.
Q What was your recommendation to increase the performance of the Department Report?
A Answers will vary. The recommended solution is to create a summary table
containing yesterday’s aggregated data. The table should have three columns:
DepartmentID, AggSickDaysTaken, and AggVacationDaysTaken. Populate the
table in a job using the Department Report SQL statement.
Q What was your recommendation to increase the performance of the Today’s Absenteeism
Report?
A Answers will vary. The recommended solution is to create a persisted computed
column ReturnDate in both tables (SickLeaveEvent and VacationEvent). Test
and evaluate the performance gain. If needed, create an indexed view replacing
the Today’s Absenteeism Report.



Module 5: Designing a Database Access Strategy
Time estimated: 130 minutes
Table of contents
Lesson 1: Designing for Secure Data Access................................................................................... 3
Apply best practices when designing for secure data access..................................................... 3
Guidelines for Designing a Secure Execution Context.................................................................. 4
Principle: Apply guidelines for designing a secure execution context...................................... 4
Guidelines for Defining a Data Access Policy............................................................................... 7
Lesson 2: Designing User-Defined Functions ................................................................................. 9
Apply guidelines for designing user-defined functions (UDFs)................................................ 9
Guidelines for Choosing Between Transact-SQL and CLR User-Defined Functions ................. 10
Principle: Apply guidelines for designing Transact-SQL and CLR UDFs ............................. 10
Guidelines for Designing Transact-SQL User-Defined Functions .............................................. 12
Principle: Apply best practices for designing Transact-SQL UDFs........................................ 12
Guidelines for Designing CLR User-Defined Functions ............................................................. 14
Principle: Apply best practices for designing CLR UDFs....................................................... 14
Considerations for Designing User-Defined Aggregate Functions.............................................. 16
Principle: Apply considerations for designing user-defined aggregate functions. .................. 16
Practice: Specifying a User-Defined Aggregate .......................................................................... 18
Lesson 3: Designing Stored Procedures ........................................................................................ 21
Apply best practices for designing stored procedures. ............................................................ 21
Guidelines for Selecting Between Transact-SQL and CLR Stored Procedures........................... 22
Guidelines for Designing Transact-SQL Stored Procedures........................................................ 24
Guidelines for Designing CLR Stored Procedures ...................................................................... 27
Lab 5: Designing a Database Access Strategy .............................................................................. 29
Exercise 1: Design Data Retrieval Objects ................................................................................. 30
Exercise 2: Design Security for Data Retrieval Objects ............................................................. 32


Module objective:
After completing this module, you will be able to:

Design a database access strategy.


Introduction
During the physical database design process, you should carefully consider the data access strategies
that your solution will employ. In effect, you can design an interface to the database by abstracting the
physical database structure (such as tables and columns) and using functions and stored procedures for
data access. Incorporating correct data access logic into the physical database and application model
enhances both security and extensibility by loosely coupling the database objects and the application
logic.
In this module, you will learn guidelines for revising and enhancing the physical database model with
data access considerations. This includes guidelines for designing:
• Secure access methods.
• Database objects that clients can use for data access, specifically user-defined functions
(UDFs) and stored procedures.


Lesson 1: Designing for Secure Data Access

Lesson objective:
After completing this lesson, students will be able to:

Apply best practices when designing for secure data access.


Introduction
Security is a central operational requirement of today’s applications. After you have designed the
physical database objects such as tables and columns, you can proceed to design database objects that
will provide data access and abstract the physical database structure. This design process requires
knowledge of the Microsoft® SQL Server™ 2005 security model.


Guidelines for Designing a Secure Execution Context

Principle: Apply guidelines for designing a secure execution context.


Introduction
In SQL Server 2005, two security tokens represent the identity of the session. These two tokens—the
login token and the user token—represent the user execution context, against which permissions are
checked. This execution context can be further qualified by the roles (groups of users) to which the
user or login belongs. The following guidelines can help you design a secure execution context for
your database.
Determine the permission granularity
When designing the authorization or permissions strategy for a database, you should decide on a level
of permission granularity that your solution requires. Less restrictive levels of authorization will allow
you to create a model that requires fewer permission settings and less management and administrative
effort. However, implementing finer permission granularity levels provides more control over specific
database object access, which is sometimes required in applications that work with highly sensitive
data. Finer permission granularity requires more administrative effort because a greater number of
permissions must be set. You should consult your database requirements to select the appropriate level
of permission granularity.


For example, consider a database that provides a book catalog to a public library. In this scenario, you
will probably use a very coarse granularity because most of the data can be accessed by a majority of
users. A single schema and three roles—Patron, Librarian, and Administrator—should suffice to meet
your requirements. However, in other contexts, such as a healthcare application, data access must be
restricted for specific database objects by many parameters, such as the user’s department, job
function, and authorization level. In such scenarios, a very granular permissions model must be
applied.
Associate permissions with schemas
The schema object is a new security feature of SQL Server 2005. Previously, schema objects were
associated with the object owners, who were users. Now, however, the concept of a database user is
separated from the schema.
You can use schema objects to group tables, views, stored procedures, and other database objects and
grant users access to them through the schema object. For example, if you create a HumanResources
schema and grant users SELECT access to the schema, users will have select access to all tables and
views defined in the HumanResources schema.
You can design and use schemas to reduce the quantity of needed permissions by grouping database
objects that require the same level of permission. Granting permission through schemas simplifies
database permission management.
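For example, the following sketch grants a role SELECT permission on an entire schema in a single
statement. The schema and role names are illustrative.

CREATE SCHEMA HumanResources;
GO
CREATE ROLE HRReaders;
GO
-- One GRANT covers every table and view in the schema, now and in the future.
GRANT SELECT ON SCHEMA::HumanResources TO HRReaders;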
Consider using EXECUTE AS to fine-tune permissions
SQL Server 2005 allows you to define the security context in which modules (stored procedures,
triggers, functions, and queues) are executed. This ability is useful in two scenarios: when the security
chain is broken (objects belong to different users) and when external access is required.

Note
To set the security context of the module, use the EXECUTE AS statement. The EXECUTE AS
statement allows one of the following options: CALLER, SELF, OWNER, and ‘User.’ CALLER is
the default for all modules except queues.

The EXECUTE AS clause can address the following problems:


• Problem: You do not want users to have access to the Salaries table unless it is accessed
through a particular stored procedure. Each object belongs to a different user.
Solution: Use the EXECUTE AS clause with the SELF option, grant users access to the
stored procedure, and do not grant the users access to the table.
• Problem: You want to provide summarized order information through a stored procedure,
but only if the users have access to the detailed information. Both objects (Table and Stored
Procedure) belong to different users.
Solution: The default option will work. The procedure will fail if the user has no access to the
Order Table.
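The following sketch illustrates the first scenario. The object and role names are assumptions made
for the example.

-- The module runs as its creator rather than the caller, so the creator must have
-- SELECT permission on dbo.Salaries; the callers do not.
CREATE PROCEDURE dbo.GetSalarySummary
WITH EXECUTE AS SELF
AS
    SELECT DepartmentID, AVG(Salary) AS AvgSalary
    FROM dbo.Salaries
    GROUP BY DepartmentID;
GO
-- Callers receive permission on the procedure only, not on the table.
GRANT EXECUTE ON dbo.GetSalarySummary TO PayrollUsers;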


Consider the use of application roles and the sp_setapprole stored procedure
Instead of granting users permissions to database objects, you might consider using application roles.
An application role is a database security principal that represents an application or program. To use
application roles, you can grant all needed permissions to the application role, without granting the
same permissions to users. The advantage is that only users authenticated in the server and with access
to the database can use the application role, yet they are granted no direct permissions to data.

Note:
To activate the application roles, the application executes the sp_setapprole stored procedure with the
application role name and password.

The main purpose of application roles is to deny user access from outside the secured application and
allow data access only through the particular application. For example, if a cashier needs access to the
Invoice table, but only from the Retail Software application, you can create an application role (such
as RetailSoftware) and grant it access to the Invoice table, while not granting the user access to the
table. When the user logs in with the application, the application executes the sp_setapprole stored
procedure and can access the Invoice table. If the user accesses SQL Server from outside the
application by using the same login as the one he or she uses for the application, the user will not have
access to the protected table.
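The following statements sketch the cashier example. The role name, password, and table are
illustrative; in practice the password is held by the application, not by users.

CREATE APPLICATION ROLE RetailSoftware WITH PASSWORD = 'P@ssw0rd!';
GRANT SELECT, INSERT ON dbo.Invoice TO RetailSoftware;

-- At run time the application activates the role; the session then carries
-- the role's permissions instead of the user's.
EXEC sp_setapprole 'RetailSoftware', 'P@ssw0rd!';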


Guidelines for Defining a Data Access Policy

Principle: Apply guidelines for defining a data access policy.


Introduction
After you have chosen secure data access methods, you should define a data access policy. If you
allow developers to access the database in any manner, it will become more difficult to monitor and
troubleshoot queries. More importantly, the database might become tightly coupled to the application,
so that you cannot tune queries without changing the application code. A good data access policy
keeps the database more loosely coupled with applications and sets up an interface that developers can
use to access the physical database objects.
Guidelines for defining a data access policy
Consider the following guidelines when defining a data access policy:
• Define limits to direct data access
The most important data access decision is whether to allow users direct access to the
database tables. From a security perspective, not granting direct access to users is always the
best strategy. Specifying only indirect access to tables through views, stored procedures, and
UDFs provides better access control, easier management, and stronger security. However,
indirect access requires database developers to create additional objects to interface with
applications.


• Prohibit or provide limited direct access to the database


Consider limiting or prohibiting direct access to database data as part of your data access
policy. Normally this implies that applications cannot use ad-hoc queries.
• Identify mechanisms for indirect access
Views, stored procedures, and functions are the objects most frequently used to provide
indirect access to database tables.
Generally, stored procedures are the most effective means of providing indirect access to data. Many
database developers use a policy of one stored procedure per operation and per table. This practice not
only helps to increase the security of the database, but also is a good development practice because it
facilitates abstraction and modular programming. Using stored procedures hides the complexity of the
database from the application and provides a single point of management for all Transact-SQL code.
By implementing stored procedures, you can later tune queries without affecting the application.
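The following fragment sketches this pattern for a hypothetical Employee table, with one procedure
per operation and EXECUTE permission granted in place of table permissions.

CREATE PROCEDURE dbo.Employee_GetByID
    @EmployeeID int
AS
    SELECT EmployeeID, FirstName, LastName
    FROM dbo.Employee
    WHERE EmployeeID = @EmployeeID;
GO
CREATE PROCEDURE dbo.Employee_UpdateName
    @EmployeeID int, @FirstName nvarchar(50), @LastName nvarchar(50)
AS
    UPDATE dbo.Employee
    SET FirstName = @FirstName, LastName = @LastName
    WHERE EmployeeID = @EmployeeID;
GO
-- Applications receive EXECUTE permission only; no table permissions are needed.
GRANT EXECUTE ON dbo.Employee_GetByID TO AppUsers;
GRANT EXECUTE ON dbo.Employee_UpdateName TO AppUsers;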


Lesson 2: Designing User-Defined Functions

Lesson objective:
After completing this lesson, students will be able to:

Apply guidelines for designing UDFs.


Introduction
User-defined functions are an important part of the toolset that you can use to solve problems in your
database design. Because you cannot anticipate all of the granular functions that will be required
during application development, you cannot specify all of the required UDFs in advance. However,
you can anticipate problems or tasks that can be addressed by specifying the use of a UDF. Therefore,
you must become familiar with UDF design patterns so that you can solve specific issues in your
physical database design. In this lesson, you will learn about guidelines for designing the various types
of UDFs.


Guidelines for Choosing Between Transact-SQL and CLR User-Defined


Functions

Principle: Apply guidelines for designing Transact-SQL and CLR UDFs


Introduction
One important choice you face as a database designer is when to specify, or recommend, a common
language runtime (CLR) or Transact-SQL UDF. The choice of which type of function to specify, and
what type of activity to specify in the design of the function, depends on the type of activity that the
function must perform.
Guidelines for choosing between Transact-SQL and CLR UDFs
The following guidelines will help you to specify the appropriate UDF:
• Use Transact-SQL UDFs by default. Transact-SQL UDFs should be your default choice when
specifying UDFs in your database design. Transact-SQL UDFs can be modified, optimized,
and redeployed without necessitating the more elaborate process of managed code
development. There are certain conditions under which CLR UDFs are the best solution, but
when those conditions are not met, you should normally specify Transact-SQL functions.
• Use CLR functions for computationally intensive processing. Computationally intensive code
written in Microsoft .NET Framework languages has several advantages over Transact-SQL
procedural code:


• Easier to write. Development and testing of complex algorithms written in .NET


languages such as Microsoft Visual Basic® .NET and C# is more streamlined than
implementation of these algorithms in Transact-SQL. Because Transact-SQL is a language
designed to provide data access and data manipulation, it lacks the more sophisticated control
statements. For example, Transact-SQL does not have a FOR looping statement or a
SWITCH/CASE statement. Also, development in Transact-SQL lacks much of the
functionality and support that you can benefit from when developing in .NET languages by
using Microsoft Visual Studio® .NET.
• Easier to maintain. For the same reasons, it is also easier to maintain, update, and
version complex procedural code in .NET languages.
• Better performance. CLR functions have the benefit of a quicker invocation path, giving
managed code an important performance advantage over Transact-SQL procedural
statements. This is true only for certain important operations such as computations. However,
Transact-SQL offers a performance advantage for data access–related operations such as
sorting and querying.
As a rule, you should specify that all computationally intensive functions in your database
model be coded by using CLR UDFs. Some examples of computationally intensive
operations are geometry algorithms, statistical processing, compressing, and image and text
processing.
• Use CLR functions for complex string processing. You might also find it necessary to specify
functions that involve complex processing of text strings. These functions can be
implemented in the database as CHECK constraints or inside stored procedures, which are
used to find, split, replace, and transform strings in the database. This functionality is
particularly important in extract, transform, and load (ETL) processes that extract information
from different sources and sometimes extract partial texts from string (CHAR, VARCHAR,
NCHAR, or NVARCHAR) columns, and must parse the strings into column values.
Because string processing is also computationally intensive, and the .NET Framework String
class and the System.Text.RegularExpressions namespace provide the necessary
functionality to process complex string processing, you should specify that this type of
functionality use CLR UDFs. For example, validation of a column that only accepts valid e-
mail addresses requires complex string processing that can involve several functions. In the
.NET Framework, the RegEx class provides the Match method, which allows validation of e-
mail in a single line of code. In Transact-SQL UDFs, this task would require much more
code, be less reliable, and reduce performance.
• Use CLR functions to extend, but not replace, built-in SQL Server Transact-SQL functions.
One advantage of including CLR assemblies in the database is that you can increase the level
of functionality that the database provides. You can easily access and write files, connect to
Web services, create references to external components, access Microsoft Active Directory®
services, send and read e-mail, write logs, add performance counters, and so on. However, the
best use of CLR functions is to extend the functionality of the built-in functions in SQL
Server, not replace them. When you replace SQL Server functions:
• You might confuse users who expect something different from the function.
• Your function might disrupt other system functions that consume the replaced function.
• SQL Server service packs might replace your function.
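To make the division of labor concrete, the following sketch shows the Transact-SQL side of the
e-mail validation example: registering an assembly and binding a CLR UDF that can then back a
CHECK constraint. The assembly, class, and method names are assumptions; the managed code itself is
compiled separately.

CREATE ASSEMBLY StringUtilities
FROM 'C:\Assemblies\StringUtilities.dll'
WITH PERMISSION_SET = SAFE;
GO
CREATE FUNCTION dbo.IsValidEmail (@address nvarchar(320))
RETURNS bit
AS EXTERNAL NAME StringUtilities.[StringUtilities.Validators].IsValidEmail;
GO
-- The function can then enforce complex validation as a constraint.
ALTER TABLE dbo.Customer
ADD CONSTRAINT CK_Customer_Email CHECK (dbo.IsValidEmail(EmailAddress) = 1);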


Guidelines for Designing Transact-SQL User-Defined Functions

Principle: Apply best practices for designing Transact-SQL UDFs.


Introduction
In many cases, you will want to specify the use of Transact-SQL UDFs to help modularize database
code. However, Transact-SQL UDFs perform best when they are not used with large numbers of
rows. When you design a Transact-SQL UDF, you should keep the following guidelines in mind.
Guidelines for designing Transact-SQL UDFs
Consider the following guidelines when designing Transact-SQL UDFs:
• Minimize the number of rows to which a scalar function applies
The most significant performance cost for all types of UDFs results from row-by-row
processing when the functions are used in a Transact-SQL query. This is particularly true for
Transact-SQL scalar functions that are frequently used to modify output columns in SELECT
statements, because these functions will be executed once for every row returned by the
query. If you must specify the processing of a large number of rows, you should consider
some alternatives, in particular computed columns or indexed views.
• Minimize the number of rows returned by a table-valued function
Transact-SQL table-valued functions return results as a table by making internal use of a
table variable. Although SQL Server 2005 implements table variables and temporary tables
similarly, there are significant differences. For example, table variables are well suited for
smaller tables because they are not transactional and cannot have secondary indexes. When
you specify the use of a table-valued function, you should evaluate the performance impact of


the table variable that will be involved. If the number of rows is significant, and especially if
indexes are required, consider specifying a stored procedure with a temporary table instead.
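For reference, the following is a minimal multi-statement table-valued function. Its result rows are
held internally in a table variable, which is why the expected row count should stay small. The
object names are illustrative.

CREATE FUNCTION dbo.GetRecentOrders (@CustomerID int, @Days int)
RETURNS @Orders TABLE (OrderID int, OrderDate datetime, Total money)
AS
BEGIN
    -- Rows are accumulated in the @Orders table variable and returned as the result.
    INSERT INTO @Orders (OrderID, OrderDate, Total)
    SELECT OrderID, OrderDate, Total
    FROM dbo.SalesOrder
    WHERE CustomerID = @CustomerID
      AND OrderDate >= DATEADD(day, -@Days, GETDATE());
    RETURN;
END;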


Guidelines for Designing CLR User-Defined Functions

Principle: Apply best practices for designing CLR UDFs.


Introduction
When you specify the use of CLR UDFs in your database model, you must provide design details
regarding aspects of the functions that you specify. Your design must fulfill not only the user
requirements, but also operational requirements such as performance, scalability, and security.
Guidelines for designing CLR UDFs
Use the following guidelines when designing CLR UDFs:
• Specify the appropriate environment permission levels. When designing CLR UDFs, specify
the level of assembly access based on the principle of least privilege—giving a function only
the type of access absolutely necessary to perform a task.
The SAFE permission level is the default and should be used by functions that perform
computations only and that do not require access to any external resources. Functions that
extend the functionality of SQL Server by using external resources, or that reference other
databases, should use the EXTERNAL_ACCESS permission level. Avoid designing
functions that use the UNSAFE permission level. (A registration sketch showing these
permission levels appears after these guidelines.)
• Group CLR UDFs based on required environment permission levels. When designing CLR
functions, you can choose to specify that all functions with the same SQL Server environment
permission level are placed in the same assembly. Specifying functions with different levels
of access in the same assembly is not recommended. Placing functions into appropriate


assemblies will help you follow the principle of least privilege, which protects the server
and users from any damage that might occur if the code is exploited by a malicious user.
For example, you could specify one assembly that groups all UDFs requiring the
EXTERNAL_ACCESS permission level, while not including functions that do not need this
level of access in these assemblies. This design pattern reduces the attack surface of your
application, thereby protecting it from malicious users.
• Use schemas to group CLR UDFs based on user permissions. Group your designed CLR
UDFs by using object schemas based on the users who will consume the UDF’s services. For
example, if you specify a set of functions that extend SQL Server mathematical support, even
if they will be placed in different assemblies, you should specify a single schema to contain
them all. You can then specify that the EXECUTE permission be granted to the schema and
not to the individual functions.
• Set the Function attribute properties explicitly. The CLR UDF Function attribute has
properties that allow you to define how the server can use your functions. For example, if you
specify that the Function attribute property IsDeterministic should be true, SQL Server will
allow users to create indexes in calculated columns and indexed views that use this function.
When you specify that attributes should be set explicitly, your design will openly
communicate the intention of the function and enable the rest of the development team to plan
accordingly. If you do not specify the attributes explicitly and rely on the defaults, the rest of
the development team might assume the wrong value of the attribute.
• Hide the complexity of the .NET Framework. Design functions that help database developers
validate and manage data. Do not simply expose the .NET Framework classes in the database.
For example, to validate an e-mail message, do not create a Match function that exposes the
Match method of the Regex class, because it exposes too much .NET functionality. When
you do this, you force database developers to learn regular expressions to use the function.
One of the great advantages of CLR UDFs is that you can specify data validation by using
complex algorithms. This enables database developers to create constraints that utilize
complex user-defined validation. For example, if you need a check digit for a bank account
number, you can specify the use of a CLR UDF within the CHECK constraint, which
validates the digit and enforces the constraint in the table.
Sometimes you might need to access external resources from within the database objects. In
previous SQL Server versions, you had to use extended stored procedures or Component
Object Model (COM) components to accomplish this. With CLR functions, you can consume
and expose external resources such as the file system, event log, Web services, and registry in
an organized and easy-to-manage infrastructure.
• Do not design CLR UDFs to embed middle-tier or client functionality in the database.
Because you can specify complex code in CLR UDFs, you might be tempted to design
functionality into the database that is more properly placed in the middle or client tier. CLR
UDFs run in the SQL Server memory space. Therefore, if you use them to embed business
logic that belongs in the middle or client tier, they will compete with the database engine in
terms of CPU cycles and memory. For example, reports might need certain types of complex
string formatting for dates or numeric data. You could specify that this formatting be
performed in the database by using CLR UDFs, but this will negatively affect the scalability
of your design. When you design such business logic into the middle or client tier, and
outside the database server, your database design becomes more scalable.
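The grouping by permission level is expressed when the assemblies are registered, as in the
following sketch. The assembly names and paths are assumptions.

-- Computation-only functions live in a SAFE assembly.
CREATE ASSEMBLY MathFunctions
FROM 'C:\Assemblies\MathFunctions.dll'
WITH PERMISSION_SET = SAFE;

-- Functions that touch the file system live in a separate EXTERNAL_ACCESS assembly.
CREATE ASSEMBLY FileImportFunctions
FROM 'C:\Assemblies\FileImportFunctions.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS;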


Considerations for Designing User-Defined Aggregate Functions

Principle: Apply considerations for designing user-defined aggregate functions.


Introduction
You can also specify the use of user-defined aggregate functions (UDAs) in your SQL Server 2005
database design. UDA functions accept a set of values as a parameter and return a single result as an
aggregate. In Transact-SQL, the built-in functions SUM and COUNT provide aggregate results, but
the number of Transact-SQL aggregate functions is limited and fixed. With CLR UDFs, you can
specify custom aggregate functions.
In your design, specify that the UDA function should use the SqlUserDefinedAggregate attribute,
and specify values for the attribute’s four properties: IsInvariantToDuplicates, IsInvariantToNulls,
IsInvariantToOrder, and IsNullIfEmpty. These values specify the behavior of the function with
regard to duplicates, nulls, order, and empty values. Within the function, you should specify the
operations in each of four methods:
Init
The Init method initializes variables that might be set from previous uses of this instance. You
must initialize each time because SQL Server may reuse the instance.
Accumulate
The Accumulate method aggregates the values. SQL Server invokes this method once for every
row in the accumulated set.


Merge
The Merge method combines multiple partial computations of an aggregation.
Terminate
The Terminate method completes the aggregation and returns the result.
You can supply the pseudo-code for the methods, and let the developer implement them in managed
code.
Considerations for designing UDA functions
UDA functions have a specialized use and some limitations. Consider the following when designing
UDA functions:
• Specify UDAs to replace cursors in iterating through a result set
Complex aggregations are often performed by using cursors. Server-side cursors are usually
very expensive in terms of performance and are recommended only when set operations
based on data manipulation language (DML) statements cannot fulfill the requirements. With
UDAs, you can replace cursors, performing calculations on sets of values and returning a
single value. UDAs offer a significant performance gain over cursors. Additionally,
developing aggregate functions by using .NET procedural code offers easier development and
maintenance.
Analyze the current usage pattern for cursors in your application. Consider replacing server-
side cursors with UDA functions.
• Specify UDAs to extend but not replace Transact-SQL aggregate functions
SQL Server provides the following aggregate functions: AVG, CHECKSUM,
CHECKSUM_AGG, COUNT, COUNT_BIG, GROUPING, MAX, MIN, SUM, STDEV,
STDEVP, VAR, and VARP.
Your solution requirements might call for extending these built-in functions. Customized
aggregated functions such as CountWords, SumIf, and CountNull, or added statistical
functions such as HarmonicMean, Median, Percentile, and Mode might be useful in your
solutions.
• Note the 8-KB restriction on user-defined aggregate functions
When you specify a UDA, you might need to specify that the function persist the state of the
aggregate, thereby serializing it. There is a limit of 8192 bytes (8-KB page size) available to
store the state of the aggregate when using serialization within your UDAs. Beware of the
overhead needed to store the variables in your aggregate. For example, to store a 100-
character string, 202 bytes will be required: 2 bytes per character in the string, and 2 for
control.
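Once the managed implementation exists, the Transact-SQL side is small, as the following sketch
shows. The assembly and aggregate names are assumptions.

-- Bind the compiled UDA to a Transact-SQL name.
CREATE AGGREGATE dbo.HarmonicMean (@value float)
RETURNS float
EXTERNAL NAME StatisticalAggregates.HarmonicMean;
GO
-- Used like any built-in aggregate, replacing a cursor-based computation.
SELECT DepartmentID, dbo.HarmonicMean(HoursWorked) AS HarmonicMeanHours
FROM dbo.TimeRecord
GROUP BY DepartmentID;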


Practice: Specifying a User-Defined Aggregate

Scenario
As part of the Human Resource Vacation and Sick Leave Enhancement (HR VASE) project, you
analyze the report requirements and find several reports that use stored procedures designed with
server-side cursors. Some of the procedures are using a harmonic mean. You anticipate that these
cursors will have an inverse impact on server performance, and therefore you want to replace the
server-side cursors with UDA functions.
Specifying a user-defined aggregate
Procedure overview
1. Review the formula for the harmonic mean. The harmonic mean algorithm is the reciprocal of the arithmetic mean of the reciprocals of the values of a set.
2. Design a UDA for the harmonic mean. Fill in the pseudo-code required for expressing the algorithm, not actual managed code.
3. Discuss your design with the rest of the class.
Procedure list
1. Review the following formula for the harmonic mean:
   H = n / (1/x1 + 1/x2 + ... + 1/xn)
2. Using the UDAPractice.doc template, design a UDA for the harmonic mean. Fill in the pseudo-code required for expressing the algorithm, not actual managed code.
3. Discuss your design with the rest of the class.

Answer Key
1. In Microsoft® Windows® Explorer, browse to the D:\Labfiles\Starter folder and double-click
the UDAPractice.doc file.
2. You can compare your solution with the UDAPractice Solution.doc document. This
document is located at D:\Labfiles\Solution.


Discussion questions
Read the following questions and discuss your answers with the class.
Q Did you encounter any issues in designing the UDA?
A Answers may vary.
One issue might be that students have specified the use of arrays or collections to
maintain all of the supplied values of the function. This is not required. The structure
should keep only the number of rows and an accumulated value. The Accumulate
function should add the inverse value. Other students might have issues with the
division by zero in the harmonic mean, in which case the value of the functions is invalid
and the method should throw an exception.
Q What other uses can you think of for UDAs?
A Answers may vary.
Many UDAs will be special implementations of descriptive and inferential statistics,
taken from financial mathematics. An example would be a special UDA for calculating
an average that removes the top and the least value of the set.


Lesson 3: Designing Stored Procedures

Lesson objective:
After completing this lesson, students will be able to:

Apply best practices for designing stored procedures.


Introduction
The most effective way to implement a database access methodology is to use stored procedures. The
use of stored procedures assists in addressing the security, performance, and programmability
requirements of the database solution. With the introduction of CLR stored procedures, database
designers have new options that were not available in earlier versions of SQL Server. The following
guidelines can help you decide what functionality should be implemented in stored procedures and
what technology should be used for their development.


Guidelines for Selecting Between Transact-SQL and CLR Stored Procedures

Principle: Apply guidelines for selecting between Transact-SQL and CLR stored procedures.
Introduction
Stored procedures are routines of code stored in the server that allow developers to encapsulate code
for reuse. In previous versions of SQL Server, stored procedures could be written only by using
Transact-SQL, but in SQL Server 2005, you can also specify that procedures be written in managed
code.
A Transact-SQL stored procedure is a saved collection of Transact-SQL statements (using DML and/or
data definition language [DDL] statements) that can receive and return parameters, as well as return
data to the caller.
CLR stored procedures are public static methods of a CLR assembly that is registered with the SQL
Server database. You can create a CLR stored procedure in any .NET-compatible language such as
Visual Basic .NET or Visual C#.
During the physical design process, you must decide which technology should be used to implement
stored procedures. The following guidelines will help you to make this decision.


Guidelines for selecting between Transact-SQL and CLR stored procedures


Consider the following guidelines when selecting between Transact-SQL and CLR stored procedures:
• Assess the required stored procedure functionality
Transact-SQL statements should be the preferred method for data access and data
manipulation in the database. As a best practice, you should specify that procedures using
SELECT, UPDATE, DELETE, or INSERT operations be written using Transact-SQL.
If you are designing a stored procedure that requires more functionality than simple data
access, such as advanced procedural statements or the use of external resources, you should
specify that the procedure be written in managed code. For example, it is better to specify a
CLR stored procedure if the functionality requires compressing a string or invoking a Web
service.
• Consider the requirements for monitoring and accessing data
When selecting the type of stored procedures that your application will use, consider
management and operational requirements.
It is easier for database administrators to monitor and control data access when it is
implemented in Transact-SQL stored procedures than in CLR stored procedures because
Transact-SQL stored procedures are built from SQL statements. In most circumstances, the
text of a stored procedure can be recovered from the database without requiring a language
compiler to run. For this reason, Transact-SQL stored procedures will be easier to manage
and edit than CLR stored procedures.
CLR stored procedures are deployed as assemblies (DLL) written in managed code and
compiled into Microsoft Intermediate Language (MSIL). MSIL code is not as easy to read as
Transact-SQL. Even when original source code is available, the database administrator might
not be familiar with the syntax of the language.


Guidelines for Designing Transact-SQL Stored Procedures

Principle: Apply guidelines for designing Transact-SQL stored procedures.


Introduction
In a database design, you should specify Transact-SQL stored procedures for data access purposes.
Because Transact-SQL is more natively adapted for accessing relational data, Transact-SQL stored
procedures can enhance the reusability, performance, and maintainability of database code.
Guidelines for designing Transact-SQL stored procedures
Consider the following guidelines when designing stored procedures:
• Use Transact-SQL stored procedures for all direct access to database tables
When designing for database access, you should specify that application code and users
should query the tables through use of Transact-SQL stored procedures rather than directly
querying the tables. Such an approach provides a more secure design and can deliver better
performance. In addition, using Transact-SQL stored procedures often provides the flexibility
of being able to change the schema of the database without modifying the application.
Note
CLR stored procedures should not have direct access to tables. As a best practice, application code,
including CLR stored procedures, should use Transact-SQL stored procedures to access data in
database tables. The only exception to this rule is when ad-hoc queries are required, which should
only be required for queries. Using a separate online analytical processing (OLAP) or data warehouse
database to serve ad-hoc queries can eliminate the need to run ad-hoc queries in the online transaction
processing (OLTP) system.


• Specify set-based operations and not cursors


The Transact-SQL language is designed to work with sets of rows, and not to process rows
one at a time. Server-side cursors are a Transact-SQL construct that allows developers to
manipulate the information row-by-row. You should specify that Transact-SQL stored
procedures avoid cursors whenever possible.
• Specify the use of native Transact-SQL constructs (such as CTEs, PIVOT, or OUTPUT)
instead of temporary tables
SQL Server 2005 includes Transact-SQL enhancements that help reduce the need for using
temporary tables. Because temporary tables can affect query or procedure performance, you
should learn to specify these new constructs when possible in place of temporary tables.
Common table expressions (CTEs) allow you to create recursive queries that can reference
themselves. You can use CTEs in hierarchical structures such as organizations or tables of
accounts. PIVOT and UNPIVOT operations allow you to rotate a table-value expression.
PIVOT allows you to change the result set from one column to multiple columns. The
OUTPUT clause of the INSERT, UPDATE, and DELETE statements allows you to return
result sets directly from the statement, as shown in the sketch after these guidelines.
• Design for reusable queries
Transact-SQL stored procedures have the advantage that they can be parsed and compiled
once and then cached and reused by SQL Server. When a stored procedure’s query plan is
reused, you help to increase its performance. Transact-SQL stored procedures should be
designed for reuse by specifying parameters and avoiding dynamic SQL.
• Minimize recompilation risks
Design stored procedures that avoid recompilation. SQL Server 2005 compiles Transact-SQL
stored procedures before running the procedures and caching the compiled query plans in
memory, in an area known as the procedure cache. When users execute the stored procedure
again, the server looks in the procedure cache for plan reuse opportunities. However, some
practices can cause stored procedures to recompile unnecessarily, consuming additional CPU
time and delaying their execution. To minimize the risk of recompilation, use SET options
correctly. Do not change session SET options during a connection, and use the same options
in stored procedures that are used at connection time. The SET options that can cause
recompilation if they are placed in a Transact-SQL stored procedure are:
• ANSI_NULL_DFLT_OFF
• ANSI_NULL_DFLT_ON
• ANSI_NULLS
• ANSI_PADDING
• ANSI_WARNINGS
• ARITHABORT
• CONCAT_NULL_YIELDS_NULL
• DATEFIRST
• DATEFORMAT
• FORCEPLAN
• LANGUAGE
• NO_BROWSETABLE

• NUMERIC_ROUNDABORT
• QUOTED_IDENTIFIER
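The following sketch illustrates the guideline (the procedure names are hypothetical): changing
a listed SET option inside the procedure body can force a recompilation whenever the option
value differs from the session setting:

-- Avoid: the option change can trigger a recompilation at execution time.
CREATE PROCEDURE Sales.GetOrders_Unstable
AS
SET ANSI_WARNINGS OFF;
SELECT SalesOrderID, OrderDate FROM Sales.SalesOrderHeader;
GO
-- Prefer: rely on the SET options that were in effect at connection time.
CREATE PROCEDURE Sales.GetOrders_Stable
AS
SELECT SalesOrderID, OrderDate FROM Sales.SalesOrderHeader;
GO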
• Use table variables instead of temporary tables. In general, when the amount of data being
stored temporarily is comparatively small, it is best to use table variables instead of temporary
tables. Creating and populating temporary tables in a Transact-SQL stored procedure can
trigger recompilations.
However, when the number of rows being stored is relatively large, using temporary tables in
stored procedures can provide better performance. You can create secondary indexes on
temporary tables, and SQL Server can automatically generate column statistics. Both of these
operations can help SQL Server create a more optimal query plan, and neither operation is
possible with table variables.
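A brief sketch of both patterns (the names are hypothetical):

-- Small intermediate result: a table variable avoids recompilations.
DECLARE @TopCustomers TABLE (CustomerID int PRIMARY KEY, TotalDue money);
INSERT INTO @TopCustomers (CustomerID, TotalDue)
SELECT TOP (10) CustomerID, SUM(TotalDue)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY SUM(TotalDue) DESC;

-- Large intermediate result: a temporary table supports secondary indexes
-- and automatic column statistics.
CREATE TABLE #OrderTotals (CustomerID int, TotalDue money);
INSERT INTO #OrderTotals (CustomerID, TotalDue)
SELECT CustomerID, SUM(TotalDue)
FROM Sales.SalesOrderHeader
GROUP BY CustomerID;
CREATE INDEX IX_OrderTotals_TotalDue ON #OrderTotals (TotalDue);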
• Use EXEC ... WITH RECOMPILE. If you are designing a Transact-SQL stored procedure
that will accept an atypical parameter value, you might achieve better performance by forcing
a recompilation of the stored procedure. You do this by using the WITH RECOMPILE
option. You should also force a recompilation when the underlying data is skewed. New
executions without the WITH RECOMPILE option will use the previous plan, not the plan
created for the atypical parameter.
• Use CREATE PROCEDURE ... WITH RECOMPILE. Occasionally you might need to
design a Transact-SQL stored procedure that has widely varying parameters, with plans that
are highly dependent on the parameter values supplied. The RECOMPILE option forces the
recompilation of the stored procedure at every execution.
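Both forms are sketched below, reusing the hypothetical Sales.GetOrdersByCustomer
procedure from the earlier example:

-- One-time recompilation for an atypical parameter value:
EXEC Sales.GetOrdersByCustomer @CustomerID = 1 WITH RECOMPILE;
GO
-- Recompilation on every execution for highly parameter-sensitive plans:
CREATE PROCEDURE Sales.GetOrdersByDateRange
    @StartDate datetime,
    @EndDate datetime
WITH RECOMPILE
AS
SELECT SalesOrderID, OrderDate, TotalDue
FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN @StartDate AND @EndDate;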

Guidelines for Designing CLR Stored Procedures

Principle: Apply best practices for designing CLR stored procedures.


Introduction
You can specify the use of CLR stored procedures; the guidelines for their use parallel those for
CLR UDFs. The difference is that CLR stored procedures are invoked as stored procedures from
Transact-SQL, and they can return a result set.
Guidelines for designing CLR stored procedures
Consider the following guidelines when designing CLR stored procedures:
• Use only for complex computational and string processing operations. In the previous lesson
concerning design guidelines for CLR UDFs, you learned that there are conditions that can
make it more advantageous to specify CLR over Transact-SQL code. These conditions also
apply to CLR stored procedures. You can specify CLR stored procedures when there is a
clear need for complex computational operations or string processing operations.
• Use for accessing external data or resources. If your design calls for stored procedures that
must access the file system or consume Web services, specify the use of CLR stored
procedures.
Transact-SQL stored procedures can access operating system functionality by using system
extended stored procedures such as xp_cmdshell. However, this is cumbersome and can
introduce security vulnerabilities (such as not being able to control which applications the
user will be able to run).

• Avoid the use of direct data access or transactions. Do not access data directly from CLR
stored procedures. Design CLR stored procedures that consume Transact-SQL stored
procedures to access tables and manage database transactions. Avoid the use of direct data
access or transactions within the CLR stored procedures.
If you restrict the use of database transactions in CLR stored procedures, you can control the
work done in the database through the Transact-SQL stored procedure and reduce the
duration of locks held during the transaction. Using a transaction within a CLR stored
procedure makes it difficult to monitor and control the transaction’s resource usage.
• Specify security for CLR procedure access. Consider the following table when designing the
permission set of the assembly that includes the CLR stored procedures.

Permission set     Guideline

SAFE               Use when only local database access is required.
EXTERNAL_ACCESS    Use when access to the local database and external resources
                   (files, Web services, registry, and network services) is required.
UNSAFE             Avoid. Use this permission set only when absolutely necessary,
                   and only on highly trusted and tested code.
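For example, an assembly that must read external files could be cataloged as follows (the
assembly name and file path are hypothetical):

CREATE ASSEMBLY HRUtilities
FROM 'C:\Assemblies\HRUtilities.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS;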

• Design for monitoring and performance. When designing CLR stored procedures, remember
that database administrators might not have direct access to the CLR stored procedure source
code, as they typically do with Transact-SQL stored procedures. This limits administrators’ ability
to replace slow-running code with better-performing code, and it also limits their ability to
monitor code inside the stored procedure.
To design CLR stored procedures for optimal performance, you should employ good
programming techniques and follow development best practices. One common technique is to
perform routine code reviews, unit-testing the code and testing for performance.
You should also consider including code that allows operations personnel to monitor and
troubleshoot CLR stored procedures.

Note
The .NET Framework includes the System.Diagnostics namespace, which provides classes to monitor
your code.

Lab: Designing a Database Access Strategy

Time estimated: 30 minutes

Scenario
You are the lead database designer working as part of the HR VASE project. The HR VASE project
will enhance the current Human Resources system. This system is based on the AdventureWorks
sample database built on SQL Server 2005.

You are asked to formulate a list of database requirements that your design must satisfy. The main
goals of the project are to:

• Provide managers with current and historical information about employee vacation and sick-
leave data in their own department.
• Give individual employees permission to view their vacation and sick leave balances.
• Give certain employees in the HR department permission to view all employee sick-leave and
vacation data.
• Give IT personnel the ability to view all employees who, while taking vacation or sick leave,
accessed the network (based on accessing a Microsoft Windows® event log file on another
server).
In this lab, you will determine which objects will be required to satisfy data access requirements,
ensuring that no clients directly access the HR database tables. You must then apply appropriate
security measures to the data retrieval objects.

Exercise 1: Design Data Retrieval Objects


Introduction
Given a set of data retrieval requirements, design the appropriate objects that will satisfy the
requirements. The objects might be views, functions, or stored procedures. In addition, specify the
language (Transact-SQL or CLR) that you will use to create each object.

Design data retrieval objects

Summary
1. Based on the data retrieval requirements and the physical model, determine the appropriate
types of objects that will satisfy the requirements.
2. Class review.

Specifications
1. Open the document 2782A_05_DataRetrievalRequirements.doc, located at
install_folder\Labfiles\Mod05\Starter.
2. For each requirement, suggest appropriate types of objects to satisfy the data access
requirement.
3. For each object, determine the appropriate language to use.
4. Wait for the instructor to review different solutions provided by students.

Answer Key

1. Open the document 2782A_05_DataRetrievalRequirements.doc, located at


install_folder\Labfiles\Mod05\Starter.
2. For each requirement, suggest appropriate types of objects to satisfy the data access
requirement.
3. For each object, determine the appropriate language to use.
4. You can compare your solution with the Data Retrieval Requirements Solution.doc document.
This document is located at D:\Labfiles\Solution.

Discussion questions
Read the following questions and discuss your answers with the class.
Q Which objects will you use to fulfill the Employee VASE history requirement?
A A T-SQL stored procedure; use a system function such as SYSTEM_USER to filter the
information.
Q Which objects will you use to fulfill the Subordinates VASE History Requirement?
A A Transact-SQL stored procedure that uses a function such as SYSTEM_USER to filter
the information is the best approach for filtering history data. Other possibilities include
a Transact-SQL UDF, but a UDF might not perform as well on a large history table.
Q Which objects will you use to fulfill the HR Weekly VASE Report Requirement?
A A T-SQL stored procedure. Use the PIVOT operation to avoid temporary tables.
Q Which objects will you use to fulfill the IT Audit VASE Report Requirement?
A A CLR UDF to access the Active Directory directory service information, with a
Transact-SQL stored procedure to create the report; or a CLR stored procedure to access
the Active Directory information, joined in Transact-SQL to a stored procedure that
accesses the SQL Server data.

Exercise 2: Design Security for Data Retrieval Objects


Introduction
Given the data retrieval objects designed in the previous exercise, specify the security strategy for
each object. Consider how permissions will be granted (to a user, to a role, to an application role,
and so on), where they will be granted (on the object or on the schema), the need for EXECUTE AS,
and so on.

Design security for data retrieval objects

Summary
1. Based on the objects designed in Exercise 1, specify an appropriate security strategy for each
object.
2. Answer the questions and discuss the results with the class.

Specifications
1. Open the document 2782A_05_DataRetrievalRequirements.doc, located at
install_folder\Labfiles\Mod05\Starter.
2. Fill in the Object security strategy table.
3. For each object, determine the appropriate language to use.
4. Wait for the instructor to review different solutions provided by students.

Answer Key

1. Compare your solution with the Data Retrieval Requirements Solution.doc document. This
document is located at D:\Labfiles\Solution.

Discussion questions
Read the following questions and discuss your answers with the class.
Q What security choices did you make and why?
A Answers may vary.
Use an application role to grant access to the Employee VASE History stored procedure.
Employees should not be granted permission to the stored procedure.
Subordinates VASE History and HR Weekly VASE Report should be in one schema,
and permissions should be granted at the schema level to the Human Resources
department.
In the IT Audit VASE Report, the assembly with the CLR UDF or stored procedure
should be created with the EXTERNAL_ACCESS permission set.

Module 6: Modeling Database Dependencies
Time estimated: 100 minutes

Lesson 1: Modeling Local Database Dependencies ........................................................................... 3


Apply guidelines for modeling local database dependencies......................................................... 3
Guidelines for Modeling Cross-Database Access............................................................................... 4
Principle: Apply guidelines for modeling cross-database access................................................... 4
Introduction ............................................................................................................................... 4
Guidelines.................................................................................................................................. 5
Guidelines for Using Extended Stored Procedures ............................................................................. 6
Principle: Apply guidelines for using extended stored procedures................................................ 6
Introduction ............................................................................................................................... 6
Guidelines.................................................................................................................................. 7
Considerations for Specifying COM Components in a Database Design........................................... 8
Principle: Consider security issues and alternatives to designing a database to use COM
components. ................................................................................................................................... 8
Introduction ............................................................................................................................... 8
Considerations ........................................................................................................................... 9

Lesson 2: Modeling Remote Database Dependencies ...................................................................... 11


Apply guidelines for modeling remote database dependencies. .................................................. 11
Introduction ............................................................................................................................. 11
Considerations for Using Linked Servers ......................................................................................... 12
Principle: Consider the major requirements that are involved with using linked servers. ........... 12
Introduction ............................................................................................................................. 12
Considerations ......................................................................................................................... 12
Guidelines for Modeling Data Distribution ...................................................................................... 14
Principle: Apply guidelines for modeling data distribution. ........................................................ 14
Introduction ............................................................................................................................. 14
Guidelines................................................................................................................................ 14
Guidelines for Modeling Database Availability................................................................................ 16
Principle: Apply guidelines for modeling database availability. ................................................. 16
Introduction ............................................................................................................................. 16
Guidelines................................................................................................................................ 16

Lab: Modeling Database Dependencies ............................................................................................ 18


Exercise 1: Design Cross-Database Access ..................................................................................... 19
Introduction ............................................................................................................................. 19
Design cross-database access .................................................................................................. 19
Discussion questions ............................................................................................................... 20
Exercise 2: Design Linked Servers .................................................................................................. 21
Introduction ............................................................................................................................. 21
Design linked servers ............................................................................................................. 21
Discussion questions ............................................................................................................... 21

Module objective:
After completing this module, you will be able to:
Apply best practices for modeling database dependencies.
Introduction
This module covers the guidelines for the last step of the physical database design process, designing
and modeling database dependencies.

Modeling database dependencies is a frequent requirement in applications where data is distributed


across multiple data sources. Documenting database dependencies is an important task in the physical
database design process. Clearly defined dependencies facilitate better technology validation, change
control, and release management procedures. Database dependencies can be classified as follows:
• Local dependencies that refer only to the local instance of Microsoft® SQL Server™ or
database server. Local dependencies can include database objects that refer to objects in other
databases, such as calls to stored procedures in other databases, system messages and
extended stored procedures, and calls to Component Object Model (COM) components on the
server.
• Remote dependencies that refer to SQL Server instances on other database servers, as well as
to other relational database management system (RDBMS) engines. Remote dependencies
use technologies such as linked servers, distribution technologies (such as replication and
Service Broker), and high availability technologies (such as database mirroring and log
shipping).

Lesson 1: Modeling Local Database Dependencies

Lesson objective:
After completing this lesson, students will be able to:

Apply guidelines for modeling local database dependencies.


Introduction
Local dependencies define the relationships a database has with other databases on the same server
and with server objects, such as extended stored procedures and COM components. In this lesson, you
will learn the guidelines for modeling local dependencies as part of your physical database model.

Guidelines for Modeling Cross-Database Access

Principle: Apply guidelines for modeling cross-database access.


Introduction
When you implement a logical database design at the physical level, you will often define a single
database to implement your requirements. Database designers frequently choose a single-database
model for some of the following reasons:
• Often no significant performance difference exists between using one large database and
using multiple smaller databases.
• A single database permits a more consistent use of declared referential integrity (DRI).
• A single database can simplify database management.

Often, however, an application must reference multiple databases. In such cases, queries must refer
across databases to access required objects. If the user executing the query has ownership of objects in
the current database but does not own, or map to an owner of, the objects in the other database, the
ownership chain between the databases is broken and the query will fail unless the caller has explicit
permissions on the referenced objects. For such cross-database queries to succeed without explicit
permissions, the SQL Server 2005 cross-database ownership chaining setting must be enabled. By
default, cross-database ownership chaining is not enabled. You can enable it at both the database and
server levels.
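For example (a sketch; enable chaining only after weighing the security implications discussed
in the guidelines below):

-- Server level: applies to all databases on the instance.
EXEC sp_configure 'cross db ownership chaining', 1;
RECONFIGURE;
-- Database level: limits the exposure to the databases that need it.
ALTER DATABASE AdventureWorks SET DB_CHAINING ON;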

Guidelines
Use the following guidelines when modeling cross-database access:
• Determine the requirements for cross-database access.
Organizations often rely on multiple applications to help manage their business, where each
application uses its own data model and its own database. However, it is also common for
queries in one database to make reference to data in other databases.
If your physical database model requires cross-database access and you do not specify it, your
physical design will be incomplete. It will lack required configuration settings and might not
be prepared for the security vulnerabilities that come with cross-database access.
In general, you should implement cross-database access to avoid duplicating data between
databases. For example, if the Retail Application needs the same product catalog as the E-
Commerce Application, cross-database access will enable both applications to reference the
same data, while keeping the information updated in one database and minimizing
maintenance.
• Consider the security implications of allowing cross-database ownership chaining.
In Module 3, “Modeling a Database at the Physical Level,” you learned that enabling cross-
database ownership can introduce a potential security vulnerability. You learned that it is a
good practice to leave the cross-database ownership option disabled to prevent users from
using the CREATE DATABASE permission to elevate their privileges. If you do enable
cross-database ownership chaining, it requires that you assign permissions to objects in each
database. For example, if you create a stored procedure that queries a table in another
database, grant users EXEC rights to the stored procedure and SELECT rights to the table.
• Use partially qualified names to reference external objects.
When referencing other databases, use partially qualified names that omit the server name
instead of fully qualified names. Use database.schema.object instead of
server.database.schema.object. Partially qualified names imply the server name, enabling
easier deployment, server renaming and upgrade, and multiple instances deployment.
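For example (SQLSERVER01 is a hypothetical server name):

-- Preferred: partially qualified name; the server name is implied.
SELECT ContactID, LastName
FROM AdventureWorks.Person.Contact;

-- Avoid: a fully qualified (four-part) name ties the query to one server name.
SELECT ContactID, LastName
FROM SQLSERVER01.AdventureWorks.Person.Contact;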
• Use views to provide indirect access to external tables.
Create views in the local database that reference external database tables to hide the cross-
database reference and provide a single management point. A view that references the
external table can help during application upgrades by implementing a new structure while
the legacy data tables keep their old structure. For example, to reference the Person.Contact
table in another database, create the following view:
CREATE VIEW Person.ContactNames
AS
SELECT ContactID, FirstName, MiddleName, LastName
FROM AdventureWorks.Person.Contact

• Use triggers to enforce cross-database integrity.


In Module 3, “Modeling a Database at the Physical Level,” you learned that to enforce data
integrity across multiple databases, you should use triggers. Triggers that handle update and
delete actions in the parent table and insert and update actions in the child table are required
to ensure that all rows in the child table reference valid rows in the parent table. When using
this design pattern, consider its performance implications.
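A minimal sketch of this pattern, assuming a hypothetical local child table dbo.LeaveEvent
whose EmployeeID column must reference the AdventureWorks HumanResources.Employee
table (comparable triggers on the parent table would also be required, as noted above):

CREATE TRIGGER InsUpdLeaveEventTrg
ON dbo.LeaveEvent
AFTER INSERT, UPDATE
AS
-- Reject rows that reference a nonexistent employee in the other database.
IF EXISTS (SELECT *
           FROM inserted AS i
           WHERE NOT EXISTS (SELECT *
                             FROM AdventureWorks.HumanResources.Employee AS e
                             WHERE e.EmployeeID = i.EmployeeID))
BEGIN
    RAISERROR (N'LeaveEvent row references a nonexistent employee.', 16, 1);
    ROLLBACK TRANSACTION;
END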

Guidelines for Using Extended Stored Procedures

Principle: Apply guidelines for using extended stored procedures.


Introduction
Extended stored procedures are external routines written in programming languages such as C or C++.
Extended stored procedures use the SQL Server Open Data Services application programming
interface (API) to increase SQL Server functionality and provide access to external resources. When
extended stored procedures are used to provide functionality for a given database, they become a
dependency for that database, even though they reside in the master database.
Important
The purpose of this topic is to provide you with the necessary guidelines for using extended stored
procedures in case you need to support them for backward compatibility. However, it is not
recommended to build any new functionality using extended stored procedures because they are
officially deprecated in SQL Server 2005. Extended stored procedures might be removed in a future
version, and you should plan to port the functionality previously written as extended stored procedures
to CLR stored procedures. Extended stored procedures are supported in SQL Server 2005 for
backward compatibility only.

Guidelines
When using extended stored procedures, consider the following guidelines:
• Consider the security implications of extended stored procedures.
Extended stored procedures run in the security context of the host SQL Server. When extended
stored procedures do not perform input validation correctly, malicious users might be able to
run unauthorized code in the security context of the SQL Server service account. Possible
attacks include memory buffer overruns, SQL injection, elevation of privilege, and
nonrepudiation. The following guidelines will help you reduce the security exposure resulting
from using extended stored procedures:
  • Use service accounts with least privileges: Apply the principle of least privilege when
assigning permissions to service accounts so that the accounts have the permission levels
required for a task but no more. With that principle in mind, do not configure SQL Server
services to use the LocalSystem account or domain accounts with administrative
privileges. Use restricted domain accounts for SQL Server services.
  • Validate all user inputs: Do not trust user input; assume that all user input can be
malicious. All extended stored procedures should validate data type, character set,
minimum and maximum lengths, null values, numeric ranges, legal values, and string
patterns.
  • Do not concatenate user input before validating: String concatenation can result in the
server running malicious SQL statements. Validate the parameters at the beginning of the
extended stored procedure routine.
• Consider the risk to the SQL Server service.
Extended stored procedures share memory space with the SQL Server service (sqlservr.exe).
Memory leaks, unhandled exceptions, and other problems can reduce the stability and
reliability of SQL Server, and in some cases can cause the SQL Server service to terminate or
crash. Use extended stored procedures only from highly trusted sources. Consider using a
separate SQL Server instance to host databases that require the use of extended stored
procedures.

Considerations for Specifying COM Components in a Database Design

Principle: Consider security issues and alternatives to designing a database to use COM components.
Introduction
SQL Server 2005 provides extended stored procedures that enable OLE Automation for access to
COM components by using Transact-SQL. OLE Automation stored procedures are sp_OACreate,
sp_OADestroy, sp_OAGetErrorInfo, sp_OAGetProperty, sp_OAMethod, sp_OASetProperty, and
sp_OAStop.
OLE Automation is disabled by default in SQL Server 2005. To enable OLE Automation, use the
SQL Server Surface Area Configuration (SAC) tool or the following script:
USE Master
GO
sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
sp_configure 'Ole Automation Procedures', 1
GO
RECONFIGURE
GO

However, extended stored procedures are deprecated in SQL Server 2005. Therefore, you should
avoid using the sp_OA* stored procedures. Instead, use CLR procedures or functions. If you cannot
enable CLR integration in your design, and Transact-SQL code must be used to access COM objects,
you must enable OLE Automation and use the sp_OA* stored procedures.

Considerations
Use the following considerations when working with COM components in the database:
• Consider security issues for OLE Automation in the database.
OLE Automation in the database is subject to the same security issues and limitations as
custom extended stored procedures. You can implement generic extended stored procedure
guidelines to reduce the security exposure produced by the use of sp_OA* procedures.
• Use wrapper stored procedures to encapsulate sp_OA* procedures.
If you must use sp_OA* procedures in your database, consider using wrapper stored
procedures that encapsulate sp_OA* procedures. A wrapper stored procedure executes OLE
Automation and prohibits applications or users from using the sp_OA* procedures directly.
Wrapper stored procedures define which COM objects and methods are used and do not allow
applications or users to specify them.
The following example illustrates how to create a wrapper stored procedure to enable
Transact-SQL queries to write to an external disk file without directly calling the sp_OA*
stored procedures:

CREATE PROCEDURE WriteLog(@Str varchar(100))
AS
DECLARE @FS int, @OLEResult int, @FileID int, @File varchar(100)
BEGIN TRY
    SET @File = 'C:\Log.txt';
    -- Create a FileSystemObject COM object.
    -- Note: user-defined messages 50001-50003 must first be registered
    -- with sp_addmessage for the RAISERROR calls below to succeed.
    EXEC @OLEResult = sp_OACreate 'Scripting.FileSystemObject', @FS OUT
    IF @OLEResult <> 0
    BEGIN
        RAISERROR (50001, 16, 1, N'Scripting.FileSystemObject');
    END
    -- Open the log file (IOMode 8 = ForAppending; 1 = create if missing).
    EXEC @OLEResult = sp_OAMethod @FS, 'OpenTextFile', @FileID OUT,
        @File, 8, 1
    IF @OLEResult <> 0
    BEGIN
        RAISERROR (50002, 16, 1, N'OpenTextFile');
    END
    -- Write the log entry.
    EXEC @OLEResult = sp_OAMethod @FileID, 'WriteLine', NULL, @Str
    IF @OLEResult <> 0
    BEGIN
        RAISERROR (50003, 16, 1, N'WriteLine');
    END
    -- Release the COM objects.
    EXEC @OLEResult = sp_OADestroy @FileID
    EXEC @OLEResult = sp_OADestroy @FS
END TRY
BEGIN CATCH
    PRINT 'Error in ' + ERROR_MESSAGE();
END CATCH
GO
EXEC WriteLog 'TestLog'

The WriteLog stored procedure enables a user to write a log entry to a file without directly calling the
internal sp_OA* stored procedures. It thereby insulates the user from internal objects and enables you
to specify more granular security than would be possible if the user called the sp_OA* procedures
directly.

Note
The log writing functionality could be accomplished in an equally effective way by calling a CLR
stored procedure or function.

Lesson 2: Modeling Remote Database Dependencies

Lesson objective:
After completing this lesson, students will be able to:

Apply guidelines for modeling remote database dependencies.


Introduction
Remote database dependencies define the relationships of a database with databases on remote
database servers. Remote dependencies are required to interoperate with databases residing on a
different physical server, with other relational databases, and with databases in different geographic
areas. Using remote databases is sometimes required to increase database availability and scalability.
In this lesson, you will review the guidelines for modeling remote database dependencies.

Considerations for Using Linked Servers

Principle: Consider the major requirements that are involved with using linked servers.


Introduction
To integrate your database design with other applications, you might find it necessary to specify that a
query accesses data from another server running SQL Server. Using a linked server is a practical
solution that simplifies connection management and cross-server access.
Considerations
When identifying the requirements for a linked server, consider the following:
• Determine the requirements for linked server access.
Organizations rely on multiple applications to help manage their business, and although it can
be convenient to have all business data on a single server, it is not always possible. For
example, you might need to specify that queries in your database can access databases
belonging to legacy applications and vendor applications.
You can specify cross-server data access using SQL Server linked servers. However, keep in
mind that linked server queries do not perform as well as queries on local servers. Therefore,
you should not specify linked server data access when you require frequent access to remote
data because the overhead will hinder query performance.
When frequent linked server access is required, use persisted linked servers to access remote
data. Linked servers provide the following important benefits:
  • Single administration point for connection parameters
  • Access control to specific data sources
  • Ability to issue distributed queries

• Determine the requirements for ad hoc name access using linked servers.
Sometimes applications choose the data sources at run time and need not create persistent
links. In those scenarios, SQL Server 2005 provides the ability to access ad hoc names. Ad
hoc names are a means of creating a momentary linked server at run time.
Ad hoc name access is disabled by default. To enable it, use the SQL Server 2005 Surface
Area Configuration (SAC) tool or the Ad Hoc Distributed Queries server option. Following
the security principle of reducing the exposed surface area of the system, enable ad hoc name
access only when applications require it.
To provide your application with cross-server access using ad hoc names, use the
OPENROWSET and OPENDATASOURCE functions. For example, to select contact
information from the HRServer (Human Resources) server, use the following code:
SELECT HRContacts.ContactID, HRContacts.FullName
FROM OPENROWSET('SQLNCLI',
    'Server=HRServer;Trusted_Connection=yes;',
    'SELECT ContactID,
        FirstName + '' '' + COALESCE(MiddleName + '' '', '''') + LastName AS FullName
    FROM AdventureWorks.Person.Contact') AS HRContacts;

• Determine authentication requirements for remote servers.


When you use linked servers, you have the option to control how the server will authenticate
itself to the remote servers. There are two available options:
  • Self-mapping
When self-mapping is enabled, the server emulates the existing security credentials of the
user. If the user is using SQL Authentication credentials, the server passes the account
name and password to the other server. If the user is authenticated with Windows
authentication, account delegation needs to be configured for self-mapping to work, and
the server will impersonate the user. Use self-mapping to create a more secure
environment with better access control and to use database tools to audit user access to
the remote server.
For More Information
For more information about the self-mapping authentication option, see “Configuring
Linked Servers for Delegation” in SQL Server Books Online.

  • Account mapping
Account mapping associates local accounts with remote SQL Server accounts. To create
an account mapping you need to provide the local account name, the remote account
name, and the remote password. The server passes the specified credentials on to the
remote server. Account mapping is more scalable and easier to manage than self-mapping
is. Use account mapping when there is no need to audit user access to the remote server.
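For example, a sketch that creates a linked server and maps all local logins to a single
restricted remote login (the server and login names are hypothetical):

EXEC sp_addlinkedserver
    @server = N'HRServer',
    @srvproduct = N'SQL Server';
-- Account mapping: all local logins use one remote SQL Server login.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'HRServer',
    @useself = 'FALSE',
    @locallogin = NULL,
    @rmtuser = N'LinkedHRReader',
    @rmtpassword = N'<strong password>';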
• Determine requirements for cross-server transactions.
With Microsoft Distributed Transaction Coordinator (MSDTC), SQL Server 2005
provides transactions that are distributed across multiple servers. These transactions are
called distributed transactions, and they require the MSDTC service to be configured and
running properly. Analyze whether your solution requires distributed transactions, and
explicitly document this requirement. Distributed transactions cause tight coupling
between servers, and therefore should not be used except when necessary.
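A minimal sketch of a distributed transaction across a linked server (the linked server,
database, and table names are hypothetical):

BEGIN DISTRIBUTED TRANSACTION;
    -- Update on the remote server through the linked server.
    UPDATE HRServer.HRVase.dbo.VacationBalance
    SET Hours = Hours - 8
    WHERE EmployeeID = 1;
    -- Related update in the local database; both commit or both roll back.
    UPDATE dbo.VacationAudit
    SET LastUpdate = GETDATE()
    WHERE EmployeeID = 1;
COMMIT TRANSACTION;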

Guidelines for Modeling Data Distribution

Principle: Apply guidelines for modeling data distribution.


Introduction
SQL Server 2005 provides technologies that support geographically or logically distributed
applications. Geographically distributed applications demand the ability to move data between
physically separated environments that share the same schema. Logically distributed applications use
an asynchronous programming model, based on messages and queues, to interact with other
applications. Logically distributed applications might or might not be physically separated.

Guidelines
Consider the following guidelines when modeling data distribution:
• Determine requirements for data distribution.
After you are confident in your single-database design or designs, you must address data
distribution requirements. Some guidelines for specifying data distribution requirements are
as follows:
  • Determine the data source for the geographically distributed application: Some
applications need to provide services to geographically distributed users. When designing
these applications, review the architecture of the application and determine whether a
centralized or distributed data source should be used. Centralized data should be the
preferred choice. You should consider data distribution only when the connections among
the users are slow, unreliable, or expensive.

  • Consider the Service Level Agreement (SLA) for distributed applications or mobile
users: Distributed applications provide services to users even when connections are slow
or not available. Review the level of service that is required for distributed applications
and mobile users, and ensure that the communications infrastructure supports that level of
service.
To support mobile users and to raise the level of service for distributed applications when
the communications infrastructure is an issue, consider data distribution technologies
such as merge replication.
  • Consider external system communications: Modern applications use the Service
Oriented Architecture (SOA) to create separate components that interact with each other,
hiding the structural complexity of the solution. Consider the SOA approach that uses an
asynchronous messaging model to communicate with external systems.
• Define inputs and outputs
When working with distributed systems, it is critical that you define which information will
be shared between different geographic sites or applications. For example, in a geographically
distributed retail system, you should evaluate the need to share the detail of every order. It
might turn out that distributing only the summary information is sufficient.
• Specify an appropriate distribution technology
As you extend and refine the physical database model, you should implement the appropriate
distribution technology based on data distribution requirements. Following are some
examples of SQL Server data distribution technologies:
  • Replication: SQL Server replication is a technology that distributes data between
databases, including geographically distributed sites and mobile users. SQL Server
replication uses a publisher and subscriber metaphor with three types of replication:
snapshot, transactional, and merge. Choosing which replication type to implement
demands knowledge of the communication infrastructure, the quantity and volatility of
the data, and data usage patterns.
Consider using SQL Server transactional or snapshot replication to distribute read-only
data, and use merge replication to support mobile users.
  • Service Broker: Introduced in SQL Server 2005, Service Broker is a technology that
provides an asynchronous, queued programming model exposed through Transact-SQL.
Applications using Service Broker use messages to communicate with other applications.
Messages in Service Broker are organized into dialogs (conversations), an essential
concept of SOA applications.
Consider using Service Broker to manage messages in an SOA within or beyond your
application.
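A minimal Service Broker sketch (all object names are hypothetical): define a message type,
contract, queues, and services, and then send a message on a dialog:

CREATE MESSAGE TYPE [//HRVase/LeaveRequest]
    VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//HRVase/LeaveContract]
    ([//HRVase/LeaveRequest] SENT BY INITIATOR);
CREATE QUEUE LeaveClientQueue;
CREATE QUEUE LeaveServiceQueue;
CREATE SERVICE [//HRVase/LeaveClient]
    ON QUEUE LeaveClientQueue;
CREATE SERVICE [//HRVase/LeaveService]
    ON QUEUE LeaveServiceQueue ([//HRVase/LeaveContract]);
GO
-- Send a leave request asynchronously; the target processes it independently.
DECLARE @dialog uniqueidentifier;
BEGIN DIALOG CONVERSATION @dialog
    FROM SERVICE [//HRVase/LeaveClient]
    TO SERVICE '//HRVase/LeaveService'
    ON CONTRACT [//HRVase/LeaveContract];
SEND ON CONVERSATION @dialog
    MESSAGE TYPE [//HRVase/LeaveRequest]
    (N'<LeaveRequest EmployeeID="1" Hours="8" />');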

Guidelines for Modeling Database Availability

Principle: Apply guidelines for modeling database availability.


Introduction
Availability is an operational requirement of every application and measures the ability of the solution
to remain operational for periods of time. Availability is usually measured as a percentage of
operational time.
Guidelines
When selecting availability technologies, consider the following guidelines:
• Determine availability requirements.
Availability requirements must be supported and validated in the context of business
requirements. It is important that stakeholders understand that high levels of availability can
be costly. When the penalty cost of not having the application available is significant, the
organization should consider the use of high availability technologies.
For a business that is open continuously, it is difficult but possible to achieve 99.9 percent
availability (8.76 hours of downtime per year) for a single SQL Server database or database
server by using reliable hardware and without using a specialized high availability
technology. Higher availability levels require high availability technologies, however, such as
clustering, database mirroring, log shipping, and replication. Third-party high availability
solutions are also available in addition to those SQL Server provides.
• Select an appropriate high availability technology for your business requirements.
The following high availability technologies are provided by SQL Server:

  • Clustering: In clustering, two or more independent servers (nodes) work together
and share a common name. When one of the nodes fails, the remaining node or nodes
continue operations without affecting the service.
Failover clustering is the primary high availability technology because it provides
automatic failover at the server level. Note that clustering protects you only from server
failure and not shared disk failure or a local disaster. Failover clustering requires the
following components:
• A shared disk system
• Microsoft Windows Server™ 2003, Enterprise Edition, or Windows Server 2003,
Datacenter Edition
• SQL Server 2005 Standard Edition (2 nodes) or SQL Server 2005 Enterprise Edition
(up to 32 nodes)
  • Database mirroring: Database mirroring provides almost instantaneous failover for a
single database. Database mirroring uses two copies of the same database located on
different servers; only one of the databases is available for write operations. Database
mirroring is supported only in SQL Server 2005 Enterprise Edition and Standard Edition
(Standard Edition allows only the full-safety mode). Database mirroring does not demand
specialized hardware, but it works with only a single database at a time. Use database
mirroring in more cost-conscious scenarios and to provide geographic fault tolerance.
  • Log shipping: Similar to database mirroring, log shipping uses two or more copies of the
database located on different servers, but it does not provide automatic failover. Log
shipping uses a backup and restore strategy to copy data from the principal server to
secondary servers. When the principal server fails, one of the secondary databases must
be brought fully up to date manually. Log shipping is supported in Workgroup Edition,
Standard Edition, and Enterprise Edition of SQL Server 2005. Use log shipping to
maintain a redundant database server that can drastically reduce the recovery interval of
failed hardware.
  • Peer-to-peer transactional replication: Peer-to-peer transactional replication makes a
set of servers peers of one another: each server publishes its data to, and subscribes to,
every other server, so every server contains the same data. The simplest configuration is
two servers, each acting as a publisher for and a subscriber to the other. Because each
server contains all the data, one can act as a standby server for the other. However, you
must provide your own mechanism, manual or coded, to redirect the application’s updates
to the standby server in case of a server failure.
For More Information
For more information about high availability technologies, refer to Course 2788,
Designing High Availability Database Solutions Using Microsoft SQL Server 2005.

Lab: Modeling Database Dependencies

Time estimated: 30 minutes


Scenario
You are the lead database designer working as part of the Human Resources (HR) Vacation and Sick
Leave Enhancement (VASE) project. The physical model of the database has already been generated,
and you must extend the model to support additional requirements.

The HR VASE project will enhance the current HR system. This system is based on the
AdventureWorks sample database built on SQL Server 2005.

You are asked to formulate a list of database requirements that your design must satisfy. The main
goals of the project are as follows:
• Provide managers with current and historical information about employee vacation and sick
leave data.
• Give individual employees permission to view their vacation and sick leave balances.
• Give certain employees in the HR department permission to view and update employee salary
data.
• Give certain employees in the HR department permission to view and update employee sick
leave and vacation data.
• Give the HR manager permission to view and update all the data.
• Place the HR VASE tables separately in their own database.

Exercise 1: Design Cross-Database Access


Introduction
Given the supplied physical data model and database object dependencies for the HR VASE project,
determine the security implications of the decision to use cross-database access. Also, determine the
new objects that are required to support the decision to have a separate database for the HR VASE
project.

Design cross-database access


Summary
1. Review the requirements in the lab scenario.
2. Review the PhysicalModel diagram.
3. Review the DatabaseObjects document.
4. Add new objects required to support a cross-database implementation.

Detailed Steps
1. Open the PhysicalModel.vsd file located in the D:\Labs\Starter folder.
2. Open the DatabaseObjects.doc file located in the D:\Labs\Starter folder.

Answer Key
1. Open the Physical Model diagram (PhysicalModel.vsd) located in the
install_folder\Labs\Mod06\Starter folder.
2. Analyze the new requirement of having a separate database for the HR VASE project using
the database diagram and table relationships.
3. Open the DatabaseObjects document located in the install_folder\Labs\Mod06\Starter folder.
4. Recommend new objects to support cross-database access.
5. Write your answers in the last rows of the DatabaseObjects template.
6. Review the security strategy and permissions required to support cross-database access.
7. Write your answers in the comments column in the DatabaseObjects template.
8. You can compare your solution with the sample Database Objects Solution document. This
document is located at D:\Labfiles\Solution.

Discussion questions
Read the following questions and discuss your answers with the class.
Q What is your recommendation for new objects?
A Answers will vary.
The recommendations for the new objects are as follows:
• Triggers to maintain the integrity of SickLeaveEvent and VacationEvent
tables: InsSickLeaveEventTrg, UpdSickLeaveEventTrg,
InsVacationEventTrg, UpdVacationTrg, DelEmployeeTrg,
UpdEmployeeTrg.
• Views to manage cross-database access: Department, Employee, Shift, and
EmployeeDepartmentHistory. Alternatively, create two views to support the
stored procedures.

Q What is your recommendation for managing cross-database permissions?


A Answers will vary.
The recommendations to manage cross-database permissions are as follows:
• Use the EXECUTE AS option: Create a user account with access to the base
tables (Department, Employee, Shift, and EmployeeDepartmentHistory) in
the AdventureWorks database and to the VASE tables; use the EXECUTE
AS option in the procedures. This is the recommended solution.
• Use Service Broker: It is feasible to use Service Broker to manage messages
between applications in SOA.

Q Discuss the pros and cons of the HR director’s decision to separate the databases.
A Answers will vary.
Having a separate database can help increase the security of the data and can
also help with maintenance by splitting the database into separate files.
However, it might also cause added maintenance tasks because referential
integrity across different databases must be managed.

Exercise 2: Design Linked Servers


Introduction
The HR director has requested that the new HR database reside on its own server. Review the
previous scenario, taking into account this new requirement.

Design linked servers


Summary
1. Design a cross-server access strategy to support the new requirement.

Detailed Steps
1. Open the DatabaseObjects.doc file located in the install_folder\Labs\Mod06\Starter folder.

Answer Key
1. Open the DatabaseObjects document located in the install_folder\Labs\Mod06\Starter folder.
2. Recommend a cross-server strategy to support the HR director’s requirement that the database
reside on its own server.
3. Write your answers in the appropriate columns, and write your consideration in the
DatabaseObjects template.

Discussion question
Read the following question and discuss your answer with the class.
Q How will you configure cross-server access?
A The application does not require ad hoc names. It will use a linked server to the
AdventureWorks server. Mapping all HR VASE accounts to a single account on
the remote server is the recommended solution unless database auditing is
required on the remote server. Because distributed transactions are not
necessary, there is no need to start the MSDTC service.
