Data refers to raw facts, symbols, or values that have little or no meaning on their own. Data
can be in the form of numbers, text, images, sounds, or any other representation of facts.
Example: A series of numbers (e.g., 5, 10, 15, 20) or a collection of letters (e.g., A, B, C, D) are
examples of data.
Information is data that has been processed, organized, and contextualized to make it
meaningful, relevant, and useful. It provides insight, knowledge, or understanding. Example:
When the numbers 5, 10, 15, and 20 are organized and presented as "These are the scores of four
students on a test: Student A scored 5, Student B scored 10, Student C scored 15, and Student D
scored 20," it becomes information that conveys test scores for specific students.
Data abstraction refers to the process of hiding the complex underlying details of the
database system from users and applications while providing a simplified and consistent view of
the data. It involves creating a level of abstraction that separates the physical storage of data from
its logical view. There are three levels of abstraction:
• Physical level. The lowest level of abstraction describes how the data are actually
stored. The physical level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored
in the database, and what relationships exist among those data. The logical level
thus describes the entire database in terms of a small number of relatively simple
structures. Although implementation of the simple structures at the logical level
may involve complex physical-level structures, the user of the logical level does not
need to be aware of this complexity. This is referred to as physical data independence.
• View level. The highest level of abstraction describes only part of the entire database.
Even though the logical level uses simpler structures, complexity remains
because of the variety of information stored in a large database. Many users of the
database system do not need all this information; instead, they need to access only
a part of the database. The view level of abstraction exists to simplify their interaction
with the system. The system may provide many views for the same database.
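The logical and view levels can be illustrated with a small sketch using Python's built-in sqlite3 module. The instructor table, its columns, the sample rows, and the view name instructor_public are assumptions made up for this illustration; the physical level is left entirely to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data are stored (hypothetical instructor table).
conn.execute(
    "CREATE TABLE instructor ("
    "ID INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Physics", 90000.0), (2, "Ben", "History", 60000.0)],
)

# View level: expose only part of the database (salary is hidden).
conn.execute(
    "CREATE VIEW instructor_public AS "
    "SELECT ID, name, dept_name FROM instructor"
)

cursor = conn.execute("SELECT * FROM instructor_public ORDER BY ID")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

A user who queries only instructor_public never sees the salary column, and never needs to know how SQLite lays out the rows on disk.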
Purpose of Database Systems
Database systems serve a fundamental role in modern information technology and business
operations. Their primary purpose is to efficiently and securely manage large volumes of data,
making it accessible, organized, and available for various applications and users. Here are some
key purposes and benefits of database systems:
Data Organization: Database systems provide a structured and organized way to store and
manage data. Data is typically organized into tables with rows and columns, making it easier to
categorize and retrieve information.
Data Retrieval: Users can easily retrieve specific data using queries and search criteria, enabling
efficient access to relevant information without the need for manual data sorting or searching
through files.
Data Integrity: Database systems enforce data integrity constraints, such as unique keys and
referential integrity, to ensure data accuracy and consistency. This helps prevent data duplication
and maintain data quality.
Data Security: Database systems offer security features like user authentication and
authorization, encryption, and access control to protect sensitive data from unauthorized access
and tampering.
Concurrency Control: In multi-user environments, database systems manage concurrent access
to data to prevent conflicts and ensure data consistency. This is crucial for collaborative work
and transaction processing.
Data Redundancy Reduction: Databases minimize data redundancy by storing information in a
centralized manner. This reduces storage space requirements and decreases the chances of
inconsistent data.
Data Abstraction: Database systems provide a level of abstraction, separating the physical
storage details from the logical view of data. This abstraction simplifies data management and
allows changes to the database structure without affecting applications.
Scalability: Database systems can scale to accommodate growing data volumes and increasing
user loads. This scalability is essential for businesses and applications that expect data growth
over time.
Data Backup and Recovery: Database systems offer tools and mechanisms for regular data
backups and recovery procedures to safeguard against data loss due to hardware failures,
accidents, or other unforeseen events.
Data Analysis: Database systems support data analysis through SQL queries and reporting tools,
enabling businesses to gain insights, make informed decisions, and identify trends in their data.
Data Sharing: Multiple users and applications can access and share data from a central
repository. This promotes collaboration and consistency across an organization.
Data Independence: Database systems allow changes to the database structure without affecting
the applications using the data. This separation of data and application logic simplifies software
maintenance.
Support for Complex Queries: Database systems can handle complex queries involving
multiple tables and conditions, enabling users to retrieve specific data subsets efficiently.
Data Durability: Data changes made within a database are durable, meaning they persist even in
the event of system failures or crashes.
Data Compliance: Database systems can help organizations adhere to regulatory requirements
by providing features for data auditing, tracking, and reporting.
Database systems are foundational to various applications and industries, including business,
healthcare, finance, education, research, and more. Their purpose is to ensure data reliability,
availability, and accessibility while supporting the evolving needs of organizations and users.
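As a minimal sketch of the data integrity point above, the following uses Python's sqlite3 module to show a unique-key constraint rejecting a duplicate value. The customers table and its columns are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO customers VALUES (1, 'john.doe@email.com')")

# A second row with the same email violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO customers VALUES (2, 'john.doe@email.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(duplicate_rejected, row_count)
```

The constraint is declared once in the table definition, so every application that writes to the table gets the same check.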
Attributes are the properties or characteristics of entities, e.g., ROLL_NO, NAME, etc.
Entity refers to a real-world object, concept, or thing that is distinguishable and can be
uniquely identified. Entities are fundamental building blocks used to represent and organize
data within a database.
Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples,
one of which is shown as:
1 RAM DHAKA 9455123451
Data Models
Data modeling in Database Management Systems (DBMS) is the process of creating an abstract
representation (a model) of the data and its relationships within an organization or system. This
representation serves as a blueprint for designing the structure of a database, defining how data
elements are organized, stored, accessed, and related to each other. Data modeling is a crucial
step in database development, as it helps ensure that the database accurately reflects the real-
world domain it represents.
• Relational Model: The relational model uses a collection of tables to represent both
data and the relationships among those data. Each table has multiple columns, and
each column has a unique name. Tables are also known as relations. The relational
model is an example of a record-based model. Record-based models are so named
because the database is structured in fixed-format records of several types. Each
table contains records of a particular type. Each record type defines a fixed number
of fields, or attributes. The columns of the table correspond to the attributes of the
record type. The relational data model is the most widely used data model, and
a vast majority of current database systems are based on the relational model.
• Entity-Relationship Model: The entity-relationship (E-R) data model uses a collection
of basic objects, called entities, and relationships among these objects. An entity
is a “thing” or “object” in the real world that is distinguishable from other
objects. The entity-relationship model is widely used in database design.
• Semi-structured Data Model: The semi-structured data model permits the specification
of data where individual data items of the same type may have different
sets of attributes. This is in contrast to the data models mentioned earlier, where
every data item of a particular type must have the same set of attributes. JSON and
Extensible Markup Language (XML) are widely used semi-structured data representations.
• Object-Based Data Model: Object-oriented programming (especially in Java, C++,
or C#) has become the dominant software-development methodology. This led
initially to the development of a distinct object-oriented data model, but today the
concept of objects is well integrated into relational databases. Standards exist to
store objects in relational tables. Database systems allow procedures to be stored
in the database system and executed by the database system. This can be seen as
extending the relational model with notions of encapsulation, methods, and object identity.
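The semi-structured model described above can be sketched with JSON in Python. The two student items and their attribute names are invented for illustration; the point is only that items of the same type carry different sets of attributes.

```python
import json

# Two "student" items of the same type with different attribute sets
# (hypothetical data, for illustration only).
students_json = """
[
  {"roll_no": 1, "name": "RAM", "phone": "9455123451"},
  {"roll_no": 2, "name": "SITA", "email": "sita@example.com", "hobbies": ["chess"]}
]
"""

students = json.loads(students_json)
attribute_sets = [sorted(student.keys()) for student in students]
common_attributes = sorted(set(students[0]) & set(students[1]))

print(attribute_sets)
print(common_attributes)
```

In a relational table, both rows would need the same columns, with NULLs for missing values; here each item simply omits what it does not have.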
The collection of information stored in the database at a particular moment is called an
Instance of the database.
A database Schema is a logical container or blueprint that defines the structure, organization,
and attributes of a database.
• Physical Schema: The physical schema, also known as the physical data model, defines
how data is physically stored on storage devices such as disks or memory. It specifies data
file organization, indexing methods, storage structures (e.g., B-trees, hash tables), and
access paths. It addresses issues like data compression, partitioning, and clustering.
Example: Defining how tables are stored on specific hard drives, the organization of data blocks
within those files, and the indexing structures used to speed up queries.
• Logical Schema: The logical schema, also known as the logical data model, defines the
structure of the data and how it is related, without specifying how it will be stored or
accessed physically. It describes tables, attributes, relationships, keys, constraints, and
views without concerning itself with the physical storage details. It serves as a bridge
between the physical schema and the application schema.
Example: Defining tables, relationships, and constraints in an Entity-Relationship Diagram
(ERD) or a relational schema, without specifying how the data will be physically stored.
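The difference between a schema and an instance can be sketched with sqlite3. The student table and its column names are assumptions; the first row reuses the sample tuple given earlier in these notes, and the second row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER PRIMARY KEY, name TEXT, address TEXT, phone TEXT)"
)

def schema_of(table):
    # SQLite keeps the logical schema as text in its sqlite_master catalog.
    query = "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?"
    return conn.execute(query, (table,)).fetchone()[0]

schema_before = schema_of("student")

conn.execute("INSERT INTO student VALUES (1, 'RAM', 'DHAKA', '9455123451')")
instance_t1 = conn.execute("SELECT * FROM student").fetchall()

conn.execute("INSERT INTO student VALUES (2, 'SITA', 'DELHI', '9000000000')")  # hypothetical row
instance_t2 = conn.execute("SELECT * FROM student").fetchall()

schema_after = schema_of("student")
print(len(instance_t1), len(instance_t2), schema_before == schema_after)
```

The instance changes with every insert, while the schema stays the same until the table definition itself is altered.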
The Relational Data Model is a conceptual framework to represent and manage data
in a structured manner. It is based on the principles of set theory and mathematical relations.
A Data Manipulation Language (DML) is a language that enables users to
access or manipulate data as organized by the appropriate data model. The types of access are:
• Retrieval of information stored in the database.
• Insertion of new information into the database.
• Deletion of information from the database.
• Modification of information stored in the database.
Integrity: The SQL DDL includes commands for specifying integrity constraints that the
data stored in the database must satisfy. Updates that violate integrity constraints are
disallowed.
View definition: The SQL DDL includes commands for defining views.
Transaction control: SQL includes commands for specifying the beginning and end points
of transactions.
Embedded SQL and dynamic SQL: Embedded and dynamic SQL define how SQL
statements can be embedded within general-purpose programming languages, such as C,
C++, and Java.
Authorization: The SQL DDL includes commands for specifying access rights to relations and
views.
There are basically two types of data-manipulation language:
• Procedural DMLs require a user to specify what data are needed and how to get those
data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify
what data are needed without specifying how to get those data. Declarative DMLs are
usually easier to learn and use than are procedural DMLs. However, since a user does
not have to specify how to get the data, the database system has to figure out an
efficient means of accessing data.
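The contrast between the two styles can be sketched in Python. The employee rows are invented sample data; the loop plays the role of a procedural request, and the SQL query run through sqlite3 plays the role of a declarative one.

```python
import sqlite3

employees = [("Ann", "Lee", "Sales"), ("Bob", "Ray", "HR"), ("Cy", "Dow", "Sales")]

# Procedural style: we spell out HOW to get the data (scan and test each row).
procedural = []
for first_name, last_name, department in employees:
    if department == "Sales":
        procedural.append((first_name, last_name))

# Declarative style: we state WHAT we want; the engine chooses the access path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (FirstName TEXT, LastName TEXT, Department TEXT)"
)
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", employees)
declarative = conn.execute(
    "SELECT FirstName, LastName FROM Employees "
    "WHERE Department = 'Sales' ORDER BY FirstName"
).fetchall()

print(procedural)
print(declarative)
```

Both produce the same rows here, but only the declarative version leaves the database system free to use an index or a different scan order.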
Key components and operations of DML include:
SELECT: The SELECT statement is used to retrieve data from one or more database tables. It allows
users to specify the columns they want to retrieve and apply conditions to filter the results.
SELECT is primarily used for querying and reporting.
Ex-
SELECT FirstName, LastName FROM Employees
WHERE Department = 'Sales';
INSERT: The INSERT statement is used to add new records (rows) to a database table. Users provide
values for each column in the table to insert a new row.
Ex-
INSERT INTO Customers (CustomerID, FirstName, LastName, Email)
VALUES (101, 'John', 'Doe', 'john.doe@email.com');
UPDATE: The UPDATE statement is used to modify existing records in a database table. Users
specify the columns to be updated and the new values. Conditions can be used to identify the
rows to be updated.
Ex-
UPDATE Products
SET Price = 24.99
WHERE ProductID = 123;
DELETE: The DELETE statement is used to remove one or more records from a database table. Users
can specify conditions to identify the rows to be deleted.
Ex-
DELETE FROM Orders
WHERE OrderID = 456;
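The four statements above can be tried end to end with Python's sqlite3 module. This is a sketch on a single hypothetical Products table (the Name column and the sample rows are assumptions); values are passed with ? placeholders rather than pasted into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price REAL)"
)

# INSERT: add new rows.
conn.executemany(
    "INSERT INTO Products (ProductID, Name, Price) VALUES (?, ?, ?)",
    [(123, "Widget", 19.99), (124, "Gadget", 5.00)],
)

def price_of(product_id):
    # SELECT: retrieve one column for the matching row.
    query = "SELECT Price FROM Products WHERE ProductID = ?"
    return conn.execute(query, (product_id,)).fetchone()[0]

price_before = price_of(123)

# UPDATE: modify the rows that match the condition.
conn.execute("UPDATE Products SET Price = ? WHERE ProductID = ?", (24.99, 123))
price_after = price_of(123)

# DELETE: remove the rows that match the condition.
deleted = conn.execute("DELETE FROM Products WHERE ProductID = ?", (124,)).rowcount
remaining_ids = [row[0] for row in conn.execute("SELECT ProductID FROM Products")]

print(price_before, price_after, deleted, remaining_ids)
```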
A Query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language.
Database design is the process of creating a structured plan or blueprint for how a
database system will store, organize, and manage data. It involves making critical decisions
about the structure, organization, and relationships of data within a database to meet specific
business or application requirements efficiently and accurately.
A Database engine is a software component or system responsible for the core
functionality of storing, managing, and retrieving data in a database. The functional components
of a database system can be broadly divided into the storage manager, the query processor
components, and the transaction management component.
The Storage manager is the component of a database system that provides the interface
between the low-level data stored in the database and the application programs and
queries submitted to the system. The storage manager is responsible for the interaction
with the file manager. The raw data are stored on the disk using the file system provided
by the operating system. The storage manager translates the various DML statements
into low-level file-system commands. Thus, the storage manager is responsible for storing,
retrieving, and updating data in the database.
The storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity
constraints and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed
without conflicts.
• File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main
memory, and deciding what data to cache in main memory. The buffer manager is
a critical part of the database system, since it enables the database to handle data
sizes that are much larger than the size of main memory.
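The consistency guarantee attributed to the transaction manager above can be sketched with sqlite3. The account table, the CHECK constraint, and the transfer function are assumptions for this example. It shows only that a failed transaction leaves no partial update behind on one connection; it does not demonstrate crash recovery or concurrent access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

first_ok = transfer(1, 2, 30)    # succeeds
second_ok = transfer(1, 2, 500)  # debit would go negative, so the credit is undone
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(first_ok, second_ok, balances)
```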
The storage manager implements several data structures as part of the physical
system implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in
particular the schema of the database.
• Indices, which can provide fast access to data items. Like the index in this textbook,
a database index provides pointers to those data items that hold a particular value.
For example, we could use an index to find the instructor record with a particular
ID, or all instructor records with a particular name.
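A small sqlite3 sketch of the index idea: the instructor table, its rows, and the index name idx_instructor_name are assumptions. EXPLAIN QUERY PLAN is SQLite-specific, and the exact wording of its output varies between SQLite versions, so the code only checks whether the index name appears in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID INTEGER PRIMARY KEY, name TEXT, dept_name TEXT)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [(1, "Asha", "Physics"), (2, "Ben", "History"), (3, "Asha", "Music")],
)

# Index on name: lets the engine jump to matching rows instead of scanning all.
conn.execute("CREATE INDEX idx_instructor_name ON instructor (name)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM instructor WHERE name = ?", ("Asha",)
).fetchall()
plan_text = " ".join(str(column) for row in plan for column in row)
uses_index = "idx_instructor_name" in plan_text

matches = conn.execute(
    "SELECT ID FROM instructor WHERE name = ? ORDER BY ID", ("Asha",)
).fetchall()
print(uses_index, matches)
```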
A Super key is a set of one or more attributes that, taken collectively, allow us to
identify uniquely a tuple in the relation. For example, the ID attribute of the relation
instructor is sufficient to distinguish one instructor tuple from another. Thus, ID is a
super key.
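The definition can be checked against sample data with a short Python sketch. The function is_superkey and the instructor rows are hypothetical. Note that a super key is a property of the schema (it must hold for every legal instance), so a check on one instance can only refute a candidate, not prove it.

```python
def is_superkey(rows, attrs):
    """Return True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[attr] for attr in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

# Hypothetical instance of the instructor relation.
instructor = [
    {"ID": 1, "name": "Kim", "dept_name": "Physics"},
    {"ID": 2, "name": "Kim", "dept_name": "History"},
    {"ID": 3, "name": "Lee", "dept_name": "Physics"},
]

id_is_key = is_superkey(instructor, ["ID"])
name_is_key = is_superkey(instructor, ["name"])
id_name_is_key = is_superkey(instructor, ["ID", "name"])
print(id_is_key, name_is_key, id_name_is_key)
```

Any set of attributes that contains ID is also a super key, which is why {ID, name} passes even though name alone does not.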
A Primary key is a unique identifier for each record in a table. Example:
Consider a database table named "Students" that stores information about students in a school.
Each student has a unique student ID, and we want to use this unique identifier as the primary
key for the table. Here's what the table might look like:
In this example, the "StudentID" column serves as the primary key for the "Students" table. It
uniquely identifies each student, ensuring that there are no duplicate values in the "StudentID"
column.
A Foreign key establishes a relationship between tables by referencing the primary key of
another table. Example:
Consider another table named "Courses" that stores information about various courses offered by
the school. Each course has a course ID, and we want to establish a relationship between students
and the courses they are enrolled in. To do this, we can use a foreign key in the "Courses" table
to reference the "Students" table. Here's what the "Courses" table might look like:
In this example, the "StudentID" column in the "Courses" table is a foreign key that references
the "StudentID" column in the "Students" table. This establishes a relationship between students
and the courses they are enrolled in. The foreign key ensures that the values in the "StudentID"
column of the "Courses" table match values in the "StudentID" column of the "Students" table.
For example, in the "Courses" table:
StudentID 101 (John Smith) is enrolled in Mathematics and History.
StudentID 102 (Emily Johnson) is enrolled in Science.
StudentID 103 (Michael Davis) is enrolled in English.
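The Students and Courses example above can be sketched with sqlite3. The student names, IDs, and enrollments come from the text; the Name and CourseName column names and the CourseID values are assumptions. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute(
    "CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE Courses ("
    "CourseID INTEGER PRIMARY KEY, "
    "CourseName TEXT NOT NULL, "
    "StudentID INTEGER NOT NULL REFERENCES Students (StudentID))"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?)",
    [(101, "John Smith"), (102, "Emily Johnson"), (103, "Michael Davis")],
)
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?, ?)",  # CourseID values are made up
    [(1, "Mathematics", 101), (2, "History", 101), (3, "Science", 102), (4, "English", 103)],
)

def rejected(sql, params):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

# Primary key: a second student with ID 101 is refused.
duplicate_pk_rejected = rejected("INSERT INTO Students VALUES (?, ?)", (101, "Another Student"))
# Foreign key: a course row pointing at a non-existent student is refused.
dangling_fk_rejected = rejected("INSERT INTO Courses VALUES (?, ?, ?)", (5, "Art", 999))

enrolled = conn.execute(
    "SELECT s.Name, c.CourseName FROM Students s "
    "JOIN Courses c ON c.StudentID = s.StudentID "
    "ORDER BY s.StudentID, c.CourseName"
).fetchall()
print(duplicate_pk_rejected, dangling_fk_rejected, enrolled)
```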
Relational algebra is a theoretical framework and a mathematical foundation for working
with relational databases. It defines a set of operations used to manipulate and query data in
relational database systems. These operations are fundamental for performing various tasks in
database management. Here are the core relational algebra operations:
Selection (σ - Sigma):
The selection operation retrieves rows from a relation (table) that satisfy a specified condition or
predicate. It filters the rows based on a given condition, resulting in a subset of the original
relation.
Example:
σ (Department = 'Sales')(Employees)
Projection (π - Pi):
The projection operation retrieves specific columns (attributes) from a relation while discarding
the rest of the columns. It allows you to create a new relation with a subset of the original
attributes.
Example:
π (FirstName, LastName)(Employees)
Union (∪):
The union operation combines the rows from two relations (tables) into a single result relation. It
eliminates duplicate rows from the result, ensuring that each row is unique.
Example:
Employees ∪ Contractors
Intersection (∩):
The intersection operation returns the common rows between two relations. It retrieves rows that
exist in both relations.
Example:
Employees ∩ Contractors
Difference (-):
The difference operation (also known as set difference or minus) retrieves rows from one relation
that do not exist in another relation. It returns rows that are unique to the first relation.
Example:
Employees - Contractors
Cartesian Product (×):
The cartesian product operation combines every row from the first relation with every row from
the second relation, resulting in a new relation that contains all possible combinations of rows
from both relations.
Example:
Employees × Departments
Join (⨝ - Theta Join):
The join operation combines rows from two relations based on a specified condition. It creates a
new relation by matching rows where the condition is true.
Example:
Employees ⨝ (Employees.Department = Departments.Department) Departments
Rename (ρ - Rho):
The rename operation allows you to change the name of a relation and its attributes. It is often
used for clarity and to avoid naming conflicts in query results.
Example:
ρ (Emp)(Employees)
These relational algebra operations are the building blocks of relational database query languages
like SQL. They provide a formal and mathematical foundation for expressing a wide range of
database operations and queries. More complex queries and expressions can be constructed by
combining these basic operations.
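To make the operations concrete, here is a minimal Python sketch of selection, projection, and theta join over relations stored as lists of dictionaries. The function names, the sample rows, and the DeptName and Floor attributes are assumptions; this is a teaching model, not how a database engine evaluates queries.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attrs):
    """Projection: keep only attrs and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[attr] for attr in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

def theta_join(left, right, condition):
    """Theta join: Cartesian product filtered by a condition.
    Attribute names of the two relations are assumed to be distinct."""
    return [{**l, **r} for l in left for r in right if condition(l, r)]

employees = [
    {"FirstName": "Ann", "LastName": "Lee", "Department": "Sales"},
    {"FirstName": "Bob", "LastName": "Ray", "Department": "HR"},
    {"FirstName": "Cy", "LastName": "Dow", "Department": "Sales"},
]
departments = [
    {"DeptName": "Sales", "Floor": 2},
    {"DeptName": "HR", "Floor": 3},
]

sales = select(employees, lambda row: row["Department"] == "Sales")
sales_names = project(sales, ["FirstName", "LastName"])
distinct_departments = project(employees, ["Department"])
joined = theta_join(
    employees, departments, lambda e, d: e["Department"] == d["DeptName"]
)
print(sales_names)
print(distinct_departments)
print(len(joined))
```

Composing project over select, as in sales_names, mirrors how a SQL query with a WHERE clause and a column list combines the two basic operations.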
Set operations in the context of relational databases involve operations that allow you to
manipulate and combine data from multiple tables or relations. These operations are fundamental
for querying and retrieving data from a database. There are several types of set operations
commonly used in relational database systems:
1. Union: The union operation combines the rows from two or more tables or queries into a
single result set. It eliminates duplicate rows, ensuring that each row is unique. The tables
involved in the union must have the same number of columns, with compatible data types.
2. Intersect: The intersection operation returns the common rows between two or more tables or
queries. It retrieves rows that exist in all the specified tables. Like the union operation, the
inputs must have the same number of columns, with compatible data types.
3. Except: The difference operation (also known as set difference or minus) retrieves rows from
the first table or query that do not exist in the second table or query. It returns rows that are
unique to the first set.
Example:
SELECT FirstName, LastName FROM Employees
UNION -- or INTERSECT, or EXCEPT
SELECT FirstName, LastName FROM Contractors;
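The three operators can be compared side by side with sqlite3. The sample names are invented, with one person ("Bob Ray") deliberately present in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (FirstName TEXT, LastName TEXT)")
conn.execute("CREATE TABLE Contractors (FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)", [("Ann", "Lee"), ("Bob", "Ray")]
)
conn.executemany(
    "INSERT INTO Contractors VALUES (?, ?)", [("Bob", "Ray"), ("Dee", "Fox")]
)

results = {}
for operator in ("UNION", "INTERSECT", "EXCEPT"):
    query = (
        "SELECT FirstName, LastName FROM Employees "
        f"{operator} "
        "SELECT FirstName, LastName FROM Contractors "
        "ORDER BY FirstName"
    )
    results[operator] = conn.execute(query).fetchall()

for operator, rows in results.items():
    print(operator, rows)
```

UNION returns Bob Ray once even though he appears in both inputs; UNION ALL would keep both copies.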
Aggregate functions are SQL functions that perform a calculation on a set of values and
return a single value as the result. These functions are commonly used in SQL queries to
summarize and analyze data within database tables. Aggregate functions operate on a group of
rows specified by the GROUP BY clause or on all rows if no grouping is specified. Here are
some common aggregate functions:
• COUNT() calculates the number of rows in a group or a table. It can be used with the
wildcard (*) to count all rows or with a specific column to count non-null values in that
column.
• SUM() calculates the sum of all values in a numeric column within a group or a table.
• AVG() calculates the average (mean) of values in a numeric column within a group or a
table.
• MIN() returns the minimum (smallest) value in a column within a group or a table.
• MAX() returns the maximum (largest) value in a column within a group or a table.
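The functions listed above can be seen together in one grouped query. This sketch assumes a hypothetical Employees table with Salary and Bonus columns; two Bonus values are NULL to show the difference between COUNT(*) and COUNT(column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "Name TEXT, Department TEXT, Salary INTEGER, Bonus INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [
        ("Ann", "Sales", 100, 10),
        ("Bob", "HR", 150, None),    # NULL bonus
        ("Cy", "Sales", 200, None),  # NULL bonus
    ],
)

summary = conn.execute(
    "SELECT Department, COUNT(*), COUNT(Bonus), "
    "SUM(Salary), AVG(Salary), MIN(Salary), MAX(Salary) "
    "FROM Employees GROUP BY Department ORDER BY Department"
).fetchall()

# Without GROUP BY, the aggregate runs over all rows.
total_rows = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]

for row in summary:
    print(row)
print(total_rows)
```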
Chapter 7
Types Of Attributes:
1. Key Attribute: The attribute which uniquely identifies each entity in the entity set is
called the key attribute. For example, Roll_No will be unique for each student. In an ER
diagram, the key attribute is represented by an oval with the attribute name underlined.
2. Composite Attribute: An attribute composed of many other attributes is called a
composite attribute. For example, the Address attribute of the Student entity type consists
of Street, City, State, and Country. In an ER diagram, the composite attribute is
represented by an oval comprising other ovals.
3. Multivalued Attribute: An attribute consisting of more than one value for a given entity.
For example, Phone_No (can be more than one for a given student). In an ER diagram, a
multivalued attribute is represented by a double oval.
4. Derived Attribute: An attribute that can be derived from other attributes of the entity
type is known as a derived attribute, e.g., Age (can be derived from DOB). In an ER
diagram, the derived attribute is represented by a dashed oval.
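The four attribute types can be mirrored in a small Python sketch. The class layout, field names, and sample values are assumptions; the age method takes the current date as a parameter so the derived value is reproducible.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Address:
    """Composite attribute: made up of several component attributes."""
    street: str
    city: str
    state: str
    country: str

@dataclass
class Student:
    roll_no: int          # key attribute
    name: str
    address: Address      # composite attribute
    dob: date
    phone_nos: List[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from dob instead of being stored."""
        years = today.year - self.dob.year
        if (today.month, today.day) < (self.dob.month, self.dob.day):
            years -= 1  # birthday has not happened yet this year
        return years

student = Student(
    roll_no=1,
    name="RAM",
    address=Address("1 Main Road", "DHAKA", "Dhaka Division", "Bangladesh"),
    dob=date(2000, 6, 15),
    phone_nos=["9455123451", "9000000000"],
)
age_before_birthday = student.age(date(2024, 6, 14))
age_on_birthday = student.age(date(2024, 6, 15))
print(age_before_birthday, age_on_birthday)
```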
The Complete Entity Type Student with its Attributes can be represented as:
The number of times an entity of an entity set participates in a relationship set is known as
Cardinality. Cardinality can be of different types:
1. One-to-One: When each entity in each entity set can take part only once in the
relationship, the cardinality is one-to-one. Let us assume that a male can marry one
female and a female can marry one male. So the relationship will be one-to-one.
2. One-to-Many: When an entity in one entity set can be related to more than one entity in the
other entity set, while each entity in the other set is related to at most one entity in the
first, the cardinality is one-to-many. Such a relationship is typically represented using two
tables. Let us assume that one surgery department can accommodate many doctors. So the
cardinality will be 1 to M. It means one department has many doctors.
3. Many-to-One: When entities in one entity set can take part only once in the relationship
set and entities in other entity sets can take part more than once in the relationship set,
cardinality is many to one. Let us assume that a student can take only one course but one
course can be taken by many students. So the cardinality will be n to 1. It means that for
one course there can be n students but for one student, there will be only one course.
4. Many-to-Many: When entities in all entity sets can take part more than once in the
relationship cardinality is many to many. Let us assume that a student can take more than
one course and one course can be taken by many students. So, the relationship will be
many to many.
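The four cases can be told apart mechanically from a set of related pairs, as in this Python sketch. The function classify_cardinality and the sample pairs are hypothetical. Cardinality is really a constraint declared at design time, so classifying observed data only shows what the data is consistent with.

```python
from collections import Counter

def classify_cardinality(pairs):
    """Classify a relationship given as (left_entity, right_entity) pairs."""
    pairs = set(pairs)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    # A left entity that appears in several pairs relates to many right entities.
    left_has_many = any(count > 1 for count in left_counts.values())
    right_has_many = any(count > 1 for count in right_counts.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"

marriages = [("M1", "F1"), ("M2", "F2")]
department_doctors = [("Surgery", "Dr. A"), ("Surgery", "Dr. B")]
student_single_course = [("S1", "C1"), ("S2", "C1")]
student_courses = [("S1", "C1"), ("S1", "C2"), ("S2", "C1")]

labels = [
    classify_cardinality(marriages),
    classify_cardinality(department_doctors),
    classify_cardinality(student_single_course),
    classify_cardinality(student_courses),
]
print(labels)
```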
Give an expression in the relational algebra to express each of the following queries:
employee (person-name, street, city)
works (person-name, company-name, salary)
company (company-name, city)
manages (person-name, manager-name)