
DSE 310: Fundamentals of Database Systems

Topic 1: Introduction
Course Goals
1. Understand the internal workings of database systems
2. Be able to write basic and intermediate SQL queries

Timings:
● Slot C
○ Monday 9:00 AM - 9:55 AM
○ Tuesday 9:00 AM - 9:55 AM
○ Thursday 9:00 AM - 9:55 AM
Similar online courses:
● “Introduction to Database systems” from CMU (Link)
● “Databases” from Stanford (Link)
● “DBMS” in NPTEL (Link)

Course Prerequisites
1. Programming Knowledge
2. Knowledge of data structures
3. Set theory operations
4. Propositions and predicates
5. Relations and functions
Course Evaluation:
● We will decide after registration!!

Books
● Abraham Silberschatz, Henry Korth, and S. Sudarshan. Database System Concepts.
6th Edition, McGraw-Hill Education, 2010.
● Joseph M. Hellerstein and Michael Stonebraker. Readings in Database Systems. 4th
Edition, MIT Press, 2005.
Course contents
1. Introduction to DBMS
2. Relational Algebra and Relational Calculus
3. Entity-Relationship Model, Relational Database Design
4. Structured Query Language (SQL)
5. Transactions and Concurrency Control
6. Recovery Systems
7. Storage and File Structure
8. Application Development
9. Indexing and Hashing
10. Query Processing, Query Optimization
1. Introduction

Objective:
Understand the importance of Database Management Systems
Understand basic terminology
Understand the role of data models and languages
Understand approaches to database design
What is a database?

● A database is a collection of data that is organized so that it can be easily accessed,
managed, and updated.
● A database management system (DBMS) is a software application that allows users
to create, maintain, and use databases.
Examples
● Customer relationship management (CRM):
○ Databases to store customer information: contact details, purchase history, and preferences.
○ This information can be used to personalize marketing campaigns, provide customer support, and
improve customer satisfaction.
● E-commerce websites:
○ Databases store product information, such as product descriptions, prices, and inventory levels.
○ This information can be used to help customers find the products they are looking for, place orders,
and track the status of their orders.
Examples (contd.)
● Online banking systems:
○ Databases store customer account information, such as account numbers, balances, and transaction
history.
○ This information can be used to allow customers to view their account balances, make transfers, and
pay bills online.
● Hospital information systems:
○ Databases store patient information, such as medical records, test results, and allergies.
○ This information can be used to provide patient care, track patient progress, and make informed
decisions about patient treatment.
Examples (contd.)
● Now think: how can a database be used in the following cases?
○ Airline reservation systems
○ Social media platforms
○ Online gaming platforms
○ Logistics and supply chain management systems
○ Scientific research databases
○ Government databases
Why should we learn DBMS?
● Often, we have to work with enormous amounts of data
● A DBMS can store and manage it efficiently
● With a DBMS in the backend, we can easily build applications for our task.

How did people traditionally work?

● File-based systems (e.g. csv, txt, xml etc.)
● The application has to parse multiple files to carry out a task.
● Files can be parsed by sequential or random access
Drawbacks of file based systems
● Redundancy:
○ Same data is present in multiple files
● Consistency:
○ Data is updated in one file but not in the others
● Difficulty to access:
○ Need a new program for a new task
● Data isolation:
○ Multiple files and formats
● Integrity constraints:
○ Difficult to add constraints to an attribute (e.g. account balance > 0 for withdrawal)
○ Hard to update existing constraints
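The integrity-constraint drawback can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The `account` table, its columns and the `withdraw` helper are hypothetical names chosen for illustration, and the rule is written as balance >= 0. The point is that the rule is stated once in the schema and the DBMS rejects any update that violates it, instead of every program re-checking it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint is declared once, in the schema.
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (30841, 500)")
conn.commit()


def withdraw(acc_no, amount):
    """Return True if the withdrawal was applied, False if the DBMS rejected it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, acc_no),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the CHECK constraint would be violated.
        return False


first = withdraw(30841, 200)    # allowed: 500 - 200 stays non-negative
second = withdraw(30841, 1000)  # rejected: the balance would become negative
balance = conn.execute(
    "SELECT balance FROM account WHERE acc_no = 30841"
).fetchone()[0]
```

Changing the rule later means changing one line of the schema, not every program that touches the file.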
Drawbacks contd..
● Atomicity of updates:
○ An operation has to be carried out either fully or not at all. E.g. fund transfer
● Concurrent access:
○ Difficult to avoid inconsistency when multiple users try to access the same data.
E.g. seat booking in railway reservation systems
● Security:
○ Hard to provide partial access of a file to a user.

A DBMS provides solutions to the above problems :-)
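As an illustration of atomicity, here is a small sketch of the fund-transfer case using Python's sqlite3 module. The `account` table and the `transfer` function are assumptions made for this example. The credit is deliberately done before the debit, so that a failed debit shows the earlier credit being undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(30841, 500), (23567, 100)])
conn.commit()


def transfer(src, dst, amount):
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # both updates commit together, or both are rolled back
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT acc_no, balance FROM account"))


ok = transfer(30841, 23567, 200)       # succeeds
after_ok = balances()
failed = transfer(30841, 23567, 1000)  # debit fails, so the credit is undone too
after_failed = balances()
```

After the failed transfer neither account has changed, which is exactly what is hard to guarantee when two separate files are updated by hand.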


View of Data:
[Figure: the three levels of abstraction. View level (View 1, View 2, View 3) at the top, logical level in the middle, physical level at the bottom.]
● Three levels of Database Abstraction
○ Lowest level: Physical abstraction
○ Mid level: Logical abstraction
○ Highest level: View abstraction
● Physical abstraction:
○ How data is physically stored in storage?
○ How binary files are arranged? How indexing is done?
● Logical abstraction:
○ What are the data entries
○ How they are related
● View abstraction:
○ How application program views data
○ Some data can be hidden for security
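The view level can be sketched with an SQL view. The example below uses Python's sqlite3 module; the table layout follows the customer example used in these slides, while the view name `customer_contact` is made up. SQLite has no user accounts, so this only shows how a view hides columns, not how access is granted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " name TEXT, cust_id TEXT PRIMARY KEY, acc_no INTEGER,"
    " mobile_no INTEGER, address TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [
        ("Abdul", "C01", 30841, 886757, "Bhopal"),
        ("Jenifer", "C02", 23567, 998678, "Goa"),
    ],
)

# Hypothetical view: an application reading it never sees account or mobile numbers.
conn.execute("CREATE VIEW customer_contact AS SELECT name, address FROM customer")

cursor = conn.execute("SELECT * FROM customer_contact ORDER BY name")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```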
Schema and Instances
● Analogous to data type and variables
● Presents metadata of a layer
● Logical schema:
○ Describes overall logical structure
○ E.g. Customer schema: Should describe the combination of attributes that is required to describe
customers

Name     Cust ID  Acc. No  Adhar No  Mobile No  Address
String   String   Integer  Integer   Integer    String

● Physical schema:
○ How data is represented in binary files
○ How binary files are indexed in the storage device etc.
Schema and Instances contd.
● Instance refers to the actual records (or a Table)

Name     Cust ID  Acc. No  Adhar No  Mobile No  Address
Abdul    C01      30841    774456    886757     Bhopal
Jenifer  C02      23567    977178    998678     Goa

● Schemas rarely change, instances change frequently
● Between the schemas we need data independence
● Physical data independence
○ The ability to change the physical schema without affecting the logical schema
○ Interfaces between the various levels should be well defined, so that changes in one level do not
influence the others.
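Physical data independence can be sketched by changing only the physical level and checking that the same query still returns the same answer. The example uses Python's sqlite3 module; the index name `idx_purchase_cust` and the `plan` helper are assumptions. The exact wording of SQLite's plan output differs between versions, so the sketch only looks for the index name in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (cust_id TEXT, item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [("C01", "Iphone", 1), ("C02", "pendrive", 4)],
)

query = "SELECT item, qty FROM purchase WHERE cust_id = ?"


def plan(sql, params):
    """Return the human-readable detail strings of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before_rows = conn.execute(query, ("C01",)).fetchall()
before_plan = plan(query, ("C01",))

# Physical-level change only: the logical schema and the query text stay the same.
conn.execute("CREATE INDEX idx_purchase_cust ON purchase (cust_id)")

after_rows = conn.execute(query, ("C01",)).fetchall()
after_plan = plan(query, ("C01",))
```

The rows are identical before and after; only the plan changes, from a full scan to a lookup through the new index.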
Data models
● A collection of tools for describing
○ Data
○ Data relationships
○ Data Semantics
○ Data Constraints
● Types
○ Relational model (Our main focus)
○ Entity relationship data model (Mainly used in DB design)
○ Object based data model
○ Semi structured data model (using tags to store data)
○ Network model
○ Hierarchical data model
Relational model (RDBMS)
● Organises data into tables
● Relationships between tables are defined implicitly by shared attributes

Customer:
Name     Cust ID  Acc. No  Adhar No  Mobile No  Address
Abdul    C01      30841    774456    886757     Bhopal
Jenifer  C02      23567    977178    998678     Goa

Purchase:
Cust ID  Item      Qty
C01      Iphone    1
C02      pendrive  4
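The shared attribute Cust ID is what links the two tables. A minimal sketch with Python's sqlite3 module, where the lower-case column names are assumptions standing in for the slide's headers, joins them on that attribute:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " name TEXT, cust_id TEXT PRIMARY KEY, acc_no INTEGER,"
    " adhar_no INTEGER, mobile_no INTEGER, address TEXT)"
)
conn.execute("CREATE TABLE purchase (cust_id TEXT, item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Abdul", "C01", 30841, 774456, 886757, "Bhopal"),
        ("Jenifer", "C02", 23567, 977178, 998678, "Goa"),
    ],
)
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [("C01", "Iphone", 1), ("C02", "pendrive", 4)],
)

# The relationship is not stored anywhere explicitly: it is the matching cust_id values.
who_bought_what = conn.execute(
    "SELECT c.name, p.item, p.qty"
    " FROM customer AS c JOIN purchase AS p ON p.cust_id = c.cust_id"
    " ORDER BY c.cust_id"
).fetchall()
```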
Entity relationship data model
Object based data model
Semi structured data model
This is a Data Model that is based on Graphs
Network model
Hierarchical data model
Object-relational data models
● Entries in relational models: atomic and simple
● Object relational data model:
a. Includes object-oriented properties in the relational model
b. Adds constructs to deal with additional data types
c. Tuples can have complex value entries, such as a relation/table itself (nested relations).
d. Preserves relational foundations like declarative access to data etc.
e. Compatible with relational languages

Extensible Markup Language (XML)


● Initially proposed to mark up documents (section heads, tags etc.)
● Later found to be very useful for communicating data
● Now it has become a basic format for exchanging data.
● A wide variety of tools is available to parse data in this format.
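As a small sketch of tag-based data exchange, the following parses an XML fragment with Python's standard xml.etree.ElementTree module. The element and attribute names (`customers`, `customer`, `id`, `name`, `address`) are invented for this example and are not a standard format.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every value is wrapped in a tag that names it.
document = """
<customers>
  <customer id="C01"><name>Abdul</name><address>Bhopal</address></customer>
  <customer id="C02"><name>Jenifer</name><address>Goa</address></customer>
</customers>
"""

root = ET.fromstring(document)

# Build a dictionary keyed by the id attribute of each <customer> element.
customers = {
    element.get("id"): (element.findtext("name"), element.findtext("address"))
    for element in root.findall("customer")
}
```

Because the tags travel with the data, the receiver does not need to know the column order in advance.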
Database Languages: DDL and DML
○ Two types
■ Pure/ Mathematical: Relational Algebra/ Relational Calculus
■ Commercial: SQL
Database Languages: DDL and DML

● Data definition language (DDL)


○ Used to define a schema
○ E.g.: CREATE TABLE Customer (ID char(10), Name varchar(20), … )
○ The DDL compiler generates a set of table templates that are stored in the data dictionary
○ The data dictionary contains the metadata
■ Database schema
■ Integrity constraints
■ Authorization etc.
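To see a DDL statement and the resulting metadata together, here is a sketch using Python's sqlite3 module. It uses only the two columns that the example above spells out. `sqlite_master` and `PRAGMA table_info` are SQLite's own way of exposing its catalog; other systems expose their data dictionary differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the schema, no data is stored yet.
conn.execute("CREATE TABLE Customer (ID CHAR(10), Name VARCHAR(20))")

# SQLite keeps its schema metadata in the catalog table sqlite_master.
tables = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info yields one row per column:
# (cid, name, declared type, notnull, default value, pk)
columns = [
    (name, declared_type)
    for _cid, name, declared_type, _notnull, _default, _pk in conn.execute(
        "PRAGMA table_info(Customer)"
    )
]
```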
SQL
● Structured Query Language (SQL) is widely used in commercial systems
● Not a Turing-equivalent language
○ Not everything that is computable can be computed using SQL
● To compute complex functions, SQL is used in combination with a high-level
language like C, Python etc.
● We use an interface like ODBC/JDBC to pass SQL queries to the DB and get the results.
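The division of labour between SQL and a host language can be sketched in Python. Here the built-in sqlite3 module plays a role analogous to ODBC/JDBC (it is Python's own database interface, not either of those): it passes a parameterised query to the database and hands the rows back. The `purchase` rows, including the extra "charger" row, and the choice of the median as the host-side computation are assumptions for illustration.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (cust_id TEXT, item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [("C01", "Iphone", 1), ("C02", "pendrive", 4), ("C02", "charger", 2)],
)


def quantities_at_least(min_qty):
    """Let the database filter and sort; return plain Python integers."""
    cursor = conn.execute(
        "SELECT qty FROM purchase WHERE qty >= ? ORDER BY qty", (min_qty,)
    )
    return [row[0] for row in cursor]


quantities = quantities_at_least(1)
# Any further computation happens in the host language.
median_qty = statistics.median(quantities)
```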
Database design
● Decide general structure of database
● Logical design
○ Deciding on a good logical schema for our job
○ Two decisions
■ Business decision: what should we record in the DB?
■ Computational decision: what relation schemas should we use, and how will attributes be divided
among the schemas?
● Physical design
○ Deciding the physical layout
○ What are the DB files? What indexing will we use?
Database Engine
● Three major components
a. Storage manager
b. Query processing
c. Transaction manager
● Storage manager:
○ Software module that acts as an interface between the low-level data stored in the database and the
application programs and queries submitted to the system
○ Two tasks
■ Interaction with the OS file manager (storage access)
■ Efficient storing, retrieving and updating of data (file organisation, indexing and hashing)
Database Engine contd.
● Query processing
a. Parsing/translation (to a pure language)
b. Optimization
c. Evaluation
● Evaluation
a. A query can be written as several equivalent expressions
b. Each operation can be carried out by different algorithms
● A poorly chosen evaluation plan can cost far more than a good one
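The idea of equivalent expressions can be sketched with two SQL queries that correspond to "join first, then select" and "select first, then join". In relational-algebra terms these are σ(Customer ⋈ Purchase) and σ(Customer) ⋈ Purchase, with the selection address = 'Goa'. The example uses Python's sqlite3 module and a cut-down customer table; a real optimizer may well rewrite both queries into the same plan, so this only shows that the answers agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id TEXT PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("CREATE TABLE purchase (cust_id TEXT, item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("C01", "Abdul", "Bhopal"), ("C02", "Jenifer", "Goa")],
)
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [("C01", "Iphone", 1), ("C02", "pendrive", 4)],
)

# Expression 1: join all customers with all purchases, then keep the Goa rows.
join_then_select = """
    SELECT c.name, p.item
    FROM customer AS c JOIN purchase AS p ON p.cust_id = c.cust_id
    WHERE c.address = 'Goa'
"""

# Expression 2: keep only the Goa customers first, then join the smaller input.
select_then_join = """
    SELECT c.name, p.item
    FROM (SELECT * FROM customer WHERE address = 'Goa') AS c
    JOIN purchase AS p ON p.cust_id = c.cust_id
"""

result_1 = sorted(conn.execute(join_then_select).fetchall())
result_2 = sorted(conn.execute(select_then_join).fetchall())
```

On large tables the second form joins far fewer rows, which is the kind of difference the optimizer weighs when it picks a plan.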
Database Engine contd.
● Transaction Manager
a. Transaction: a set of operations that together perform a single logical function
b. It ensures that the database always remains in a consistent state, even if the system fails.
c. It also controls the interaction among concurrent/parallel transactions so that the database
remains in a consistent state.
Attributes and Tuples
Attribute types
● Types:
○ Numeric (Integer, Decimal), e.g. salary
○ Date and Time (e.g. joining date of an employee)
○ Alphanumeric strings (e.g. IFSC code)
○ Alphabetic strings (e.g. first name, last name)
○ Spatial data type (e.g. GIS coordinates)
○ JSON (to store dictionaries) and so on
● Special type: 'null', which means 'unknown'. It can cause complications
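Some of the complications caused by null can be sketched with Python's sqlite3 module. The two-column `customer` table and the `names_where` helper are assumptions for this example; Python's None is stored as SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, mobile_no INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Abdul", 886757), ("Jenifer", None)],  # Jenifer's mobile number is unknown
)


def names_where(condition):
    """Run a hard-coded condition (demo only, never pass user input this way)."""
    sql = "SELECT name FROM customer WHERE " + condition + " ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


equals_null = names_where("mobile_no = NULL")      # comparing with NULL is never true
is_null = names_where("mobile_no IS NULL")         # the correct test for unknown values
not_abdul = names_where("mobile_no <> 886757")     # unknown is not "different" either

# COUNT(*) counts rows, COUNT(column) skips NULLs.
count_rows, count_mobiles = conn.execute(
    "SELECT COUNT(*), COUNT(mobile_no) FROM customer"
).fetchone()
```

Jenifer is returned neither by `mobile_no = 886757` nor by `mobile_no <> 886757`, which surprises anyone who expects every row to satisfy one of the two.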

Attribute types in MySQL: https://dev.mysql.com/doc/refman/8.0/en/data-types.html


Relation schema and instances
Relations are Unordered
● The order of tuples and the order of attributes are irrelevant.
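A short sketch of what "unordered" means in practice, using Python's sqlite3 module: two tables loaded in different orders hold the same relation, and attributes are picked by name rather than by position. The `load` helper is a made-up name. Note that SQL tables may contain duplicate rows, so comparing them as Python sets matches the relational model only when there are no duplicates, as here.

```python
import sqlite3


def load(rows):
    """Create a fresh in-memory purchase table and insert rows in the given order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchase (cust_id TEXT, item TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)", rows)
    return conn


rows = [("C01", "Iphone", 1), ("C02", "pendrive", 4)]
first = load(rows)
second = load(list(reversed(rows)))

# Same set of tuples, regardless of insertion order.
same_relation = set(first.execute("SELECT * FROM purchase")) == set(
    second.execute("SELECT * FROM purchase")
)

# Attributes are addressed by name; an explicit ORDER BY is needed for a fixed row order.
by_name = first.execute(
    "SELECT qty, cust_id FROM purchase ORDER BY cust_id"
).fetchall()
```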
Keys
Keys..
Keys..
Thank you
