You are on page 1of 33

Database Systems

Session 1 – Main Theme


Introduction to Database Systems
Dr. Jean-Claude Franchitti

New York University


Computer Science Department
Courant Institute of Mathematical Sciences

Presentation material partially based on textbook slides


Fundamentals of Database Systems (6th Edition)
by Ramez Elmasri and Shamkant Navathe
Slides copyright © 2011

Agenda

11 Instructor
Instructor and
and Course
Course Introduction
Introduction

22 Introduction
Introduction to
to Database
Database Systems
Systems

33 Database
Database System
System Architecture
Architecture

44 Summary
Summary and
and Conclusion
Conclusion

2
Who am I?

- Profile -
¾ 27 years of experience in the Information Technology Industry, including twelve years of experience working
for leading IT consulting firms such as Computer Sciences Corporation
¾ PhD in Computer Science from University of Colorado at Boulder
¾ Past CEO and CTO
¾ Held senior management and technical leadership roles in many large IT Strategy and Modernization
projects for fortune 500 corporations in the insurance, banking, investment banking, pharmaceutical, retail,
and information management industries
¾ Contributed to several high-profile ARPA and NSF research projects
¾ Played an active role as a member of the OMG, ODMG, and X3H2 standards committees and as a
Professor of Computer Science at Columbia initially and New York University since 1997
¾ Proven record of delivering business solutions on time and on budget
¾ Original designer and developer of jcrew.com and the suite of products now known as IBM InfoSphere
DataStage
¾ Creator of the Enterprise Architecture Management Framework (EAMF) and main contributor to the creation
of various maturity assessment methodology
¾ Developed partnerships between several companies and New York University to incubate new
methodologies (e.g., EA maturity assessment methodology developed in Fall 2008), develop proof of
concept software, recruit skilled graduates, and increase the companies’ visibility
3

How to reach me?

Come on…what else


did you expect? Cell (212) 203-5004

Email jcf@cs.nyu.edu

AIM, Y! IM, ICQ jcf2_2003

MSN IM jcf2_2003@yahoo.com

LinkedIn http://www.linkedin.com/in/jcfranchitti
Woo hoo…find the word
of the day… Twitter http://twitter.com/jcfranchitti

Skype jcf2_2003@yahoo.com

4
What is the class about?

ƒ Course description and syllabus:


» http://www.nyu.edu/classes/jcf/CSCI-GA.2433-001
» http://cs.nyu.edu/courses/fall11/CSCI-GA.2433-001/

ƒ Textbooks:
» Fundamentals of Database Systems (6th Edition)
Ramez Elmasri and Shamkant Navathe
Addition Wesley
ISBN-10: 0-1360-8620-9, ISBN-13: 978-0136086208 6th Edition (04/10)

Icons / Metaphors

Information

Common Realization

Knowledge/Competency Pattern

Governance

Alignment

Solution Approach

66
Course Objectives

„ Gain understanding of fundamental concepts of state-of-the art


databases (more precisely called: Database Management Systems)
„ Get to know some of the tools used in the design and
implementations of databases
„ Know enough so that it is possible to read/skim a database system
manual and
„ Start designing and implementing small data bases
„ Start managing and interacting with existing small to large databases
„ Experiment and practice with industry leading vendor solutions:
„ CA’s Erwin for design of relational database
„ Oracle, IBM DB2, and other DB products for writing relational queries

Key Material Covered (1/2)

„ Methodology used for modeling a business application


during the database design process, focusing on entity-
relationship model and entity relationship diagrams
„ Relational model and implementing an entity-relationship
diagram
„ Relational algebra (using SQL syntax)
„ SQL as data manipulation language
„ SQL as data definition language
„ Refining a relational implementation, including the
normalization process and the algorithms to achieve
normalization

8
Key Material Covered (2/2)

„ Physical design of the database using various file organization and


indexing techniques for efficient query processing
„ Concurrency Control
„ Recovery
„ Query execution
„ Data warehouses
„ Additional topics may be covered as time allows, these topics are
covered in greater depth in other courses but PowerPoint presentations
for them will still be provided
„ The course material is partially derived from the textbook slides and
material covered as part of the Database Systems course offered at
NYU Courant in previous semesters

Software Requirements

„ Microsoft Windows XP (Professional Ed.) / Vista / 7


„ Software tools will be available from the Internet or from the
course Web site under demos as a choice of freeware or
commercial tools
„ Database Modeling Tools
„ Database Management Software Tools
„ etc.
„ References will be provided on the course Web site

10
Agenda

11 Instructor
Instructor and
and Course
Course Introduction
Introduction

22 Introduction
Introduction to
to Database
Database Systems
Systems

33 Database
Database System
System Architecture
Architecture

44 Summary
Summary and
and Conclusion
Conclusion

11

Section Outline

ƒ Introduction
ƒ An Example
ƒ Characteristics of the Database Approach
ƒ Actors on the Scene
ƒ Workers behind the Scene
ƒ Advantages of Using the DBMS Approach
ƒ A Brief History of Database Applications
ƒ When Not to Use a DBMS

12
Overview

ƒ Traditional database applications


ƒ Store textual or numeric information
ƒ Multimedia databases
ƒ Store images, audio clips, and video streams digitally
ƒ Geographic information systems (GIS)
ƒ Store and analyze maps, weather data, and satellite images
ƒ Data warehouses and online analytical processing (OLAP)
systems
ƒ Extract and analyze useful business information from very large
databases
ƒ Support decision making
ƒ Real-time and active database technology
ƒ Control industrial and manufacturing processes

13

Introduction (1/3)

ƒ Database
ƒ Collection of related data
ƒ Known facts that can be recorded and that have implicit meaning
ƒ Mini-world or Universe of Discourse (UoD)
ƒ Represents some aspect of the real world
ƒ Logically coherent collection of data with inherent meaning
ƒ Built for a specific purpose
ƒ Example of a large commercial database
ƒ Amazon.com
ƒ Database management system (DBMS)
ƒ Collection of programs
ƒ Enables users to create and maintain a database
ƒ Defining a database
ƒ Specify the data types, structures, and constraints of the data to
be stored
14
Introduction (2/3)

ƒ Meta-data
ƒ Database definition or descriptive information
ƒ Stored by the DBMS in the form of a database catalog or
dictionary
ƒ Manipulating a database
ƒ Query and update the database miniworld
ƒ Generate reports
ƒ Sharing a database
ƒ Allow multiple users and programs to access the database
simultaneously
ƒ Application program
ƒ Accesses database by sending queries to DBMS
ƒ Query
ƒ Causes some data to be retrieved

15

Introduction (3/3)

ƒ Transaction
ƒ May cause some data to be read and some data to be written into
the database
ƒ Protection includes:
ƒ System protection
ƒ Security protection
ƒ Maintain the database system
ƒ Allow the system to evolve as requirements change over time

16
Database System Environment

17

High-Level Example - Metadata

ƒ UNIVERSITY database
ƒ Information concerning students, courses, and grades in a
university environment
ƒ Data records
ƒ STUDENT
ƒ COURSE
ƒ SECTION
ƒ GRADE_REPORT
ƒ PREREQUISITE
ƒ Specify structure of records of each file by specifying data
type for each data element
ƒ String of alphabetic characters
ƒ Integer
ƒ etc.

18
High-Level Example – Database Implementation (1/2)

ƒ Construct UNIVERSITY database


ƒ Store data to represent each student, course, section, grade report, and
prerequisite as a record in appropriate file
ƒ Relationships among the records
ƒ Manipulation involves querying and updating
ƒ Examples of queries:
ƒ Retrieve the transcript
ƒ List the names of students who took the section of the ‘Database’ course offered in
fall 2008 and their grades in that section
ƒ List the prerequisites of the ‘Database’ course
ƒ Examples of updates:
ƒ Change the class of ‘Smith’ to sophomore
ƒ Create a new section for the ‘Database’ course for this semester
ƒ Enter a grade of ‘A’ for ‘Smith’ in the ‘Database’ section of last semester
ƒ Phases for designing a database:
ƒ Requirements specification and analysis
ƒ Conceptual design
ƒ Logical design
ƒ Physical design
19

High-Level Example – Database Implementation (2/2)

20
Characteristics of the Database Approach

ƒ Traditional file processing


ƒ Each user defines and implements the files needed for a specific
software application
ƒ Database approach
ƒ Single repository maintains data that is defined once and then
accessed by various users
ƒ Main characteristics of database approach
ƒ Self-describing nature of a database system
ƒ Insulation between programs and data, and data abstraction
ƒ Support of multiple views of the data
ƒ Sharing of data and multiuser transaction processing

21

Self-Describing Nature of a Database System

ƒ Database system
contains complete
definition of structure
and constraints
ƒ Meta-data
ƒ Describes structure of
the database
ƒ Database catalog
used by:
ƒ DBMS software
ƒ Database users who
need information about
database structure

22
Insulation Between Programs and Data

ƒ Program-data independence
ƒ Structure of data files is stored in DBMS catalog separately from
access programs
ƒ Program-operation independence
ƒ Operations specified in two parts:
• Interface includes operation name and data types of its arguments
• Implementation can be changed without affecting the interface

23

Data Abstraction

ƒ Data abstraction
ƒ Allows program-data independence and program-operation
independence
ƒ Conceptual representation of data
ƒ Does not include details of how data is stored or how operations
are implemented
ƒ Data model
ƒ Type of data abstraction used to provide conceptual
representation

24
Support of Multiple Views of the Data

ƒ View
ƒ Subset of the database
ƒ Contains virtual data derived from the database files but is not
explicitly stored
ƒ Multiuser DBMS
ƒ Users have a variety of distinct applications
ƒ Must provide facilities for defining multiple views

25

Sharing of Data and Multiuser Transaction Processing

ƒ Allow multiple users to access the database at the same


time
ƒ Concurrency control software
ƒ Ensure that several users trying to update the same data do so in
a controlled manner
• Result of the updates is correct
ƒ Online transaction processing (OLTP) application
ƒ Transaction
ƒ Central to many database applications
ƒ Executing program or process that includes one or more database
ƒ Isolation property
• Each transaction appears to execute in isolation from other
transactions
ƒ Atomicity property
• Either all the database operations in a transaction are executed or
none are
26
Actors on the Scene

ƒ Database administrators (DBA) are responsible for:


ƒ Authorizing access to the database
ƒ Coordinating and monitoring its use
ƒ Acquiring software and hardware resources
ƒ Database designers are responsible for:
ƒ Identifying the data to be stored
ƒ Choosing appropriate structures to represent and store this data
ƒ End users
ƒ People whose jobs require access to the database
ƒ Types
• Casual end users
• Naive or parametric end users
• Sophisticated end users
• Standalone users
ƒ System analysts
ƒ Determine requirements of end users
ƒ Application programmers
ƒ Implement these specifications as programs
27

Workers behind the Scene

ƒ DBMS system designers and implementers


ƒ Design and implement the DBMS modules and interfaces as a
software package
ƒ Tool developers
ƒ Design and implement tools
ƒ Operators and maintenance personnel
ƒ Responsible for running and maintenance of hardware and
software environment for database system

28
Advantages of Using the DBMS Approach (1/3)

ƒ Controlling redundancy
ƒ Data normalization
ƒ Denormalization
• Sometimes necessary to use controlled redundancy to improve the
performance of queries
ƒ Restricting unauthorized access
ƒ Security and authorization subsystem
ƒ Privileged software
ƒ Providing persistent storage for program objects
ƒ Complex object in C++ can be stored permanently in an object-oriented
DBMS
ƒ Impedance mismatch problem
• Object-oriented database systems typically offer data structure compatibility
ƒ Providing storage structures and search techniques for efficient query
processing
ƒ Indexes
ƒ Buffering and caching
ƒ Query processing and optimization
29

Advantages of Using the DBMS Approach (2/3)

ƒ Providing backup and recovery


ƒ Backup and recovery subsystem of the DBMS is responsible for
recovery
ƒ Providing multiple user interfaces
ƒ Graphical user interfaces (GUIs)
ƒ Representing complex relationships among data
ƒ May include numerous varieties of data that are interrelated in
many ways
ƒ Enforcing integrity constraints
ƒ Referential integrity constraint
• Every section record must be related to a course record
ƒ Key or uniqueness constraint
• Every course record must have a unique value for Course_number
ƒ Business rules
ƒ Inherent rules of the data model
30
Advantages of Using the DBMS Approach (3/3)

ƒ Permitting inferencing and actions using rules


ƒ Deductive database systems
• Provide capabilities for defining deduction rules
• Inferencing new information from the stored database facts
ƒ Trigger
• Rule activated by updates to the table
ƒ Stored procedures
• More involved procedures to enforce rules
ƒ Additional implications of using the database approach
ƒ Reduced application development time
ƒ Flexibility
ƒ Availability of up-to-date information
ƒ Economies of scale

31

A Brief History of Database Applications (1/2)

ƒ Early database applications using hierarchical


and network systems
ƒ Large numbers of records of similar structure
ƒ Providing data abstraction and application
flexibility with relational databases
ƒ Separates physical storage of data from its conceptual
representation
ƒ Provides a mathematical foundation for data
representation and querying
ƒ Object-oriented applications and the need for
more complex databases
ƒ Used in specialized applications: engineering design,
multimedia publishing, and manufacturing systems
32
A Brief History of Database Applications (2/2)

ƒ Interchanging data on the Web for e-commerce using


XML
ƒ Extended markup language (XML) primary standard for
interchanging data among various types of databases and Web
pages
ƒ Extending database capabilities for new applications
ƒ Extensions to better support specialized requirements for
applications
ƒ Enterprise resource planning (ERP)
ƒ Customer relationship management (CRM)
ƒ Databases versus information retrieval
ƒ Information retrieval (IR)
• Deals with books, manuscripts, and various forms of library-based
articles

33

When Not to Use a DBMS

ƒ More desirable to use regular files for:


ƒ Simple, well-defined database applications not expected to
change at all
ƒ Stringent, real-time requirements that may not be met because of
DBMS overhead
ƒ Embedded systems with limited storage capacity
ƒ No multiple-user access to data

34
Summary

ƒ Database
ƒ Collection of related data (recorded facts)
ƒ DBMS
ƒ Generalized software package for implementing and maintaining a
computerized database
ƒ Several categories of database users
ƒ Database applications have evolved
ƒ Current trends: IR, Web

35

Agenda

11 Instructor
Instructor and
and Course
Course Introduction
Introduction

22 Introduction
Introduction to
to Database
Database Systems
Systems

33 Database
Database System
System Architecture
Architecture

44 Summary
Summary and
and Conclusion
Conclusion

36
Section Outline

ƒ Data Models, Schemas, and Instances


ƒ Three-Schema Architecture and Data
Independence
ƒ Database Languages and Interfaces
ƒ The Database System Environment
ƒ Centralized and Client/Server Architectures
for DBMSs
ƒ Classification of Database Management Systems

37

Database System Concepts and Architecture

ƒ Basic client/server DBMS architecture


ƒ Client module
ƒ Server module

38
Data Models, Schemas, and Instances

ƒ Data abstraction
ƒ Suppression of details of data organization and storage
ƒ Highlighting of the essential features for an improved
understanding of data
ƒ Data model
ƒ Collection of concepts that describe the structure of a
database
ƒ Provides means to achieve data abstraction
ƒ Basic operations
ƒ Specify retrievals and updates on the database
ƒ Dynamic aspect or behavior of a database application
ƒ Allows the database designer to specify a set of valid
operations allowed on database objects

39

Categories of Data Models (1/2)

ƒ High-level or conceptual data models


ƒ Close to the way many users perceive data
ƒ Low-level or physical data models
ƒ Describe the details of how data is stored on computer storage media
ƒ Representational data models
ƒ Easily understood by end users
ƒ Also similar to how data organized in computer storage
ƒ Entity
ƒ Represents a real-world object or concept
ƒ Attribute
ƒ Represents some property of interest
ƒ Further describes an entity
ƒ Relationship among two or more entities
ƒ Represents an association among the entities
ƒ Entity-Relationship model

40
Categories of Data Models (2/2)

ƒ Relational data model


ƒ Used most frequently in traditional commercial DBMSs
ƒ Object data model
ƒ New family of higher-level implementation data models
ƒ Closer to conceptual data models
ƒ Physical data models
ƒ Describe how data is stored as files in the computer
ƒ Access path
• Structure that makes the search for particular database records efficient
ƒ Index
• Example of an access path
• Allows direct access to data using an index term or a keyword

41

Schemas, Instances, and Database State (1/2)

ƒ Database schema
ƒ Description of a
database
ƒ Schema diagram
ƒ Displays selected
aspects of schema
ƒ Schema construct
ƒ Each object in the
schema
ƒ Database state or
snapshot
ƒ Data in database at a
particular moment in
time

42
Schemas, Instances, and Database State (2/2)

ƒ Define a new database


ƒ Specify database schema to the DBMS
ƒ Initial state
ƒ Populated or loaded with the initial data
ƒ Valid state
ƒ Satisfies the structure and constraints specified in the schema
ƒ Schema evolution
ƒ Changes applied to schema as application requirements change

43

Three-Schema Architecture and Data Independence

ƒ Internal level
ƒ Describes physical storage structure of the database
ƒ Conceptual level
ƒ Describes structure of the whole database for a community of users
ƒ External or view level
ƒ Describes part of the database that a particular user group is interested in

44
Data Independence

ƒ Capacity to change the schema at one level of a database


system
ƒ Without having to change the schema at the next higher level
ƒ Types:
ƒ Logical
ƒ Physical

45

DBMS Languages

ƒ Data definition language (DDL)


ƒ Defines both schemas
ƒ Storage definition language (SDL)
ƒ Specifies the internal schema
ƒ View definition language (VDL)
ƒ Specifies user views/mappings to conceptual schema
ƒ Data manipulation language (DML)
ƒ Allows retrieval, insertion, deletion, modification
ƒ High-level or nonprocedural DML
• Can be used on its own to specify complex database operations
concisely
• Set-at-a-time or set-oriented
ƒ Low-level or procedural DML
• Must be embedded in a general-purpose programming language
• Record-at-a-time
46
DBMS Interfaces

ƒ Menu-based interfaces for Web clients or browsing


ƒ Forms-based interfaces
ƒ Graphical user interfaces
ƒ Natural language interfaces
ƒ Speech input and output
ƒ Interfaces for parametric users
ƒ Interfaces for the DBA

47

The Database System Environment

ƒ DBMS component modules


ƒ Buffer management
ƒ Stored data manager
ƒ DDL compiler
ƒ Interactive query interface
• Query compiler
• Query optimizer
ƒ Pre-compiler
ƒ Runtime database processor
ƒ System catalog
ƒ Concurrency control system
ƒ Backup and recovery system

48
Component Modules of a DBMS

49

Database System Utilities

ƒ Loading
ƒ Load existing data files
ƒ Backup
ƒ Creates a backup copy of the database
ƒ Database storage reorganization
ƒ Reorganize a set of database files into different file organizations
ƒ Performance monitoring
ƒ Monitors database usage and provides statistics to the DBA

50
Tools, Application Environments, and Communications Facilities

ƒ CASE Tools
ƒ Data dictionary (data repository) system
ƒ Stores design decisions, usage standards, application program
descriptions, and user information
ƒ Application development environments
ƒ Communications software

51

Centralized and Client/Server Architectures for DBMSs

ƒ Centralized DBMSs Architecture


ƒ All DBMS functionality, application program execution, and user
interface processing carried out on one machine

52
Basic Client / Server Architectures (1/2)

ƒ Servers with specific functionalities


ƒ File server
• Maintains the files of the client machines.
ƒ Printer server
• Connected to various printers; all print requests by the clients
are forwarded to this machine
ƒ Web servers or e-mail servers
ƒ Client machines
ƒ Provide user with:
• Appropriate interfaces to utilize these servers
• Local processing power to run local applications

53

Basic Client/Server Architectures (2/2)

ƒ Client
ƒ User machine that provides user interface capabilities
and local processing
ƒ Server
ƒ System containing both hardware and software
ƒ Provides services to the client machines
• Such as file access, printing, archiving, or database access

54
Sample Two-Tier Client / Server Architecture

55

Two-Tier Client/Server Architectures for DBMSs

ƒ Server handles
ƒ Query and transaction functionality related to SQL
processing
ƒ Client handles
ƒ User interface programs and application programs
ƒ Open Database Connectivity (ODBC)
ƒ Provides application programming interface (API)
ƒ Allows client-side programs to call the DBMS
• Both client and server machines must have the necessary
software installed
ƒ JDBC
ƒ Allows Java client programs to access one or more
DBMSs through a standard interface
56
Application Server or Web Server

Three-Tier and n-Tier Architectures for Web Applications


ƒ Application server or Web server
ƒ Adds intermediate layer between client and the
database server
ƒ Runs application programs and stores business rules
ƒ N-tier
ƒ Divide the layers between the user and the stored data
further into finer components

57

Sample Logical Three-Tier Client/Server

58
Classification of Database Management Systems

ƒ Data model
ƒ Relational
ƒ Object
ƒ Hierarchical and network (legacy)
ƒ Native XML DBMS
ƒ Number of users
ƒ Single-user
ƒ Multiuser
ƒ Number of sites
ƒ Centralized
ƒ Distributed
ƒ Homogeneous
ƒ Heterogeneous
ƒ Cost
ƒ Open source
ƒ Different types of licensing

59

Classification of Database Management Systems

ƒ Types of access path options


ƒ General or special-purpose

60
Section Summary

ƒ Concepts used in database systems


ƒ Main categories of data models
ƒ Types of languages supported by DMBSs
ƒ Interfaces provided by the DBMS
ƒ DBMS classification criteria:
ƒ Data model, number of users, number of sties, access paths, cost

61

Agenda

11 Instructor
Instructor and
and Course
Course Introduction
Introduction

22 Introduction
Introduction to
to Database
Database Systems
Systems

33 Database
Database System
System Architecture
Architecture

44 Summary
Summary and
and Conclusion
Conclusion

62
Course Assignments

„ Individual Assignments
„ Reports based on case studies / class presentations
„ Textbook problem sets
„ Project-Related Assignments
„ All assignments (other than the individual assessments) will
correspond to milestones in the course project

63

Assignments & Readings

ƒ Readings
» Slides and Handouts posted on the course web site
» Textbook: Chapters 1 & 2

64
Next Session: Relational Data Model and Relational Database Constraints

65

Any Questions?

66

You might also like