DBMS Unit 1
Data are raw facts. Information is data with context. Knowledge is information with
application.
Information is data that has been processed, organized, and given meaning. When you put
data into a relevant framework, it becomes information. This is where you can start to answer
basic questions about the data.
Data base = Data + Base
For example:
1. An online telephone directory uses a database to store data relating to people, phone
numbers, and other contact details.
2. An electricity service department uses a database for managing bills, handling faults,
and other related issues.
3. A social network such as Facebook needs to store, manipulate, and present data related to
members, their friends, messages, member activities, advertisements, and a lot more.
[Figure: a central DATABASE shared by Finance, Economics, Accounting, R&D, Sales, Marketing,
Planning, and Production]
Data Independence
This is a critical feature that separates the data's structure from the programs that access it.
• Physical data independence means you can change how data is physically stored
(e.g., changing file format or storage device) without affecting the application programs.
• Logical data independence means you can change the logical structure of the
database (e.g., adding a new column to a table) without forcing existing applications to
be rewritten.
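Logical data independence can be illustrated with a small sketch. It uses Python's built-in sqlite3 module; the students table, its columns, and the existing_report function are assumptions made up for this illustration. The "existing application" names the columns it needs, so adding a column later does not change its result.

```python
import sqlite3

# In-memory database with a hypothetical students table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students (student_id, name) VALUES (1, 'Jane Doe')")


def existing_report(connection):
    """Stand-in for an existing application: it names the columns it needs."""
    return connection.execute(
        "SELECT student_id, name FROM students ORDER BY student_id"
    ).fetchall()


before = existing_report(conn)

# Logical change: add a new column to the table.
conn.execute("ALTER TABLE students ADD COLUMN email TEXT")

# The old query is untouched and returns the same rows.
after = existing_report(conn)
print(before == after)
```

An application written with SELECT * would see the extra column, which is one reason explicit column lists tend to survive schema changes better.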
Advantages
• Data Integrity: Enforces rules to maintain data accuracy.
• Data Security: Offers granular access control and user authentication.
• Reduced Redundancy: Eliminates data duplication, saving space and improving consistency.
• Data Sharing: Allows multiple users to access and modify data concurrently.
• Backup/Recovery: Provides automated tools to protect against data loss.
• Data Independence: The ability to change storage methods without affecting applications.
Disadvantages
• High Cost: Requires significant investment in software, hardware, and staff.
• Complexity: Requires specialized skills for design and management.
• Performance Overhead: Can be slower than custom file systems due to its robust features.
• Single Point of Failure: A system crash can impact all applications.
• Vendor Dependence: Can lead to being locked into a single database provider.
• Setup Time: Initial database design and implementation can be a lengthy process.
Q) ER Analysis
A) Introduction:
ER analysis uses the entity-relationship (ER) model, which describes the structure of a
database with the help of a diagram called an Entity-Relationship diagram.
ER diagrams are a visual tool that helps represent the ER model. The model was proposed by
Peter Chen in 1976 to create a uniform convention that can be used for relational databases
and networks, and as a way to use the ER model as a conceptual modelling approach.
Lines: link attributes to entity types, and entity types to relationship types.
Double ellipse: multivalued attribute. Double rectangle: weak entity set.
Dashed ellipse: derived attribute.
[Figure: ER diagram symbols for Relationship, Weak Relationship, Attribute, Multivalued
Attribute, and Weak Entity]
A. Entity
B. Attribute
C. Relationship
Attributes and domains:
Attributes are very important because they help to describe the entities in a database.
Understanding Attributes and Domains
In the context of database management, an attribute is a specific piece of information that
describes an entity. Think of it as a column in a table, representing a characteristic of the items
in that table. For example, in a table of students, attributes could include 'Student ID', 'First
Name', 'Last Name', and 'Date of Birth'.
A domain defines the set of permissible values for an attribute. It establishes the rules and
constraints for the data that can be stored in that attribute, ensuring data integrity and
consistency. The domain specifies the data type (like integer, string, or date), length, and any
other restrictions. For instance, the domain for the 'Date of Birth' attribute might be restricted
to valid dates in the past, while the domain for 'Student ID' might require a unique, 5-digit
number.
❖ Simple attributes
❖ Composite attributes
❖ Single valued attribute
❖ Multi-valued attributes
❖ Derived attributes
❖ Key attributes
❖ Complex attribute
❖ Stored attribute
Integrity constraints:
Integrity constraints are a set of rules used in a database to ensure that data is accurate,
consistent, and reliable. They prevent accidental damage to the database by restricting the
type of data that can be inserted, updated, or deleted. Think of them as the quality control
checks for your data, making sure it follows the logical structure and business rules you've
defined.
1. Domain Constraints
Domain constraints specify the valid set of values for a particular attribute (a column in a
table). They enforce rules about the data type, size, and format of the data. For example, a
domain constraint might ensure that a 'price' column only accepts positive numerical values,
or that a 'date' column only contains valid date formats.
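As a rough sketch of the 'price' example, the following uses Python's sqlite3 module with a CHECK constraint. The products table and the try_insert helper are hypothetical names chosen for this illustration. SQLite is loosely typed, so only the "positive value" rule is demonstrated here, not strict data-type enforcement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        price REAL NOT NULL CHECK (price > 0)  -- domain rule: positive prices only
    )
    """
)


def try_insert(connection, product_id, price):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        connection.execute(
            "INSERT INTO products (product_id, price) VALUES (?, ?)",
            (product_id, price),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(conn, 1, 9.99)   # inside the domain
rejected = try_insert(conn, 2, -5.0)   # outside the domain
print(accepted, rejected)
```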
4. Key Constraints
Key constraints enforce the uniqueness of data within a table. This is a broader category that
includes the primary key and other unique keys. For example, a 'Students' table might have
'StudentID' as its primary key, but a 'Social Security Number' or 'Email Address' could also be
defined as unique keys to ensure that no two students have the same value for those
attributes.
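A minimal sketch of key constraints, again using Python's sqlite3 module. The table layout and the sample e-mail addresses are assumptions for illustration: student_id is the primary key and email is declared as an additional unique key, so a duplicate of either value is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,  -- primary key
        email TEXT UNIQUE                -- additional unique key
    )
    """
)
conn.execute("INSERT INTO students VALUES (1, 'jane@example.com')")

errors = []
# First row repeats the primary key, second row repeats the unique email.
for row in [(1, "other@example.com"), (2, "jane@example.com")]:
    try:
        conn.execute("INSERT INTO students VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(len(errors), count)
```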
Keys:
Super Key
A super key is any set of attributes (columns) that can uniquely identify a row in a table. It's the
most general type of key. A super key can contain extra, unnecessary attributes as long as the
unique identification property is maintained. For instance, if StudentID is a primary key, then
combinations like (StudentID, StudentName) or (StudentID, DateOfBirth) are also
super keys, even though StudentName and DateOfBirth aren't needed for unique
identification.
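The super key idea can be sketched in plain Python. The is_super_key function and the sample rows are hypothetical. Note the limitation: a check like this only shows that a set of attributes is unique within the given rows, while a real super key must be unique for every valid state of the table.

```python
def is_super_key(rows, attributes):
    """Return True if no two rows share the same values for `attributes`."""
    seen = set()
    for row in rows:
        projection = tuple(row[name] for name in attributes)
        if projection in seen:
            return False
        seen.add(projection)
    return True


# Hypothetical sample data; two students share a name, two share a birth date.
students = [
    {"StudentID": 1, "StudentName": "Jane Doe", "DateOfBirth": "2001-04-12"},
    {"StudentID": 2, "StudentName": "Jane Doe", "DateOfBirth": "2002-09-30"},
    {"StudentID": 3, "StudentName": "Ravi Kumar", "DateOfBirth": "2001-04-12"},
]

print(is_super_key(students, ["StudentID"]))
print(is_super_key(students, ["StudentID", "StudentName"]))
print(is_super_key(students, ["StudentName"]))
```

Adding StudentName to StudentID keeps the set unique, which is why it is still a super key even though the extra attribute is unnecessary.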
Surrogate Key
A surrogate key is a special kind of primary key. It's an artificial, system-generated attribute
with no inherent business meaning. These keys are typically auto-incrementing integers,
GUIDs (Globally Unique Identifiers), or other system-generated values. The main
advantage of a surrogate key is that it's simple, reliable, and never changes, even if the data it
represents does. For example, instead of using a complex combination of attributes like
(FirstName, LastName, DateOfBirth) as a primary key, you could simply use a surrogate
key named UserID that automatically assigns a new number to each new user. This approach
is often preferred over using "natural" keys (like social security numbers or email addresses)
that might change or have business meaning.
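A small sketch of the UserID example using Python's sqlite3 module. The users table, its columns, and the sample values are assumptions for illustration. The database assigns user_id automatically, and it stays the same when a natural attribute such as the e-mail address changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        first_name TEXT,
        last_name TEXT,
        email TEXT
    )
    """
)

cursor = conn.execute(
    "INSERT INTO users (first_name, last_name, email) VALUES (?, ?, ?)",
    ("Jane", "Doe", "jane@example.com"),
)
jane_id = cursor.lastrowid  # value generated by the database

# The e-mail (a natural attribute) changes; the surrogate key does not.
conn.execute(
    "UPDATE users SET email = ? WHERE user_id = ?",
    ("jane.doe@example.org", jane_id),
)
row = conn.execute("SELECT user_id, email FROM users").fetchone()
print(row)
```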
1. Atomicity: Each column contains a single, indivisible value. There are no repeating
groups of data within a single cell. For example, a single cell in a 'phone number'
column should not contain multiple phone numbers.
2. Unique Rows: Each row is unique, typically ensured by a primary key.
Benefits of normalization:
• Reduces Data Redundancy: Storing data in multiple places wastes space and can
lead to inconsistencies. For example, if a customer's address is stored in two different
tables and only one is updated, the data becomes inconsistent. Normalization prevents
this by storing data in a single, dedicated location.
• Improves Data Integrity: By removing dependencies and enforcing rules,
normalization ensures that data is accurate and reliable.
• Enhances Database Flexibility: A normalized database is easier to modify and extend
without disrupting the existing structure. It allows you to add new data or entities
without having to change multiple tables.
First Normal Form (1NF):
• Atomicity: Each column contains a single, atomic value. There are no repeating groups
of data within a single cell.
• Unique Rows: Each row is unique, typically ensured by a primary key.
Example (1NF):
Student ID | Student Name | Course | Grade
102 | Jane Doe | Science | A
Using the 1NF table above, the composite primary key is (Student ID, Course). The
Student Name depends only on Student ID, not on the full key (a partial dependency). To
achieve 2NF, we split the table so that every non-key attribute depends on the whole key.
Example (2NF):
Students
Student ID | Student Name

Student ID | Course | Grade
101 | History | A
101 | Math | B
102 | Science | A
102 | English | B
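The 2NF split can be sketched in plain Python. The variable names are made up for this illustration, and the name used for student 101 is a placeholder because it does not appear in these notes. The flat rows are split into one mapping keyed by Student ID and one keyed by (Student ID, Course), and joining them back gives the original rows.

```python
# Flat 1NF rows: (student_id, student_name, course, grade).
flat_rows = [
    (101, "John Smith", "History", "A"),  # name for 101 is a placeholder
    (101, "John Smith", "Math", "B"),
    (102, "Jane Doe", "Science", "A"),
    (102, "Jane Doe", "English", "B"),
]

# Student Name depends only on Student ID, so it moves to its own table.
students = {sid: name for sid, name, _course, _grade in flat_rows}

# Grade depends on the whole key (Student ID, Course).
grades = {(sid, course): grade for sid, _name, course, grade in flat_rows}

# Joining the two tables on Student ID reproduces the original rows.
rejoined = sorted(
    (sid, students[sid], course, grade) for (sid, course), grade in grades.items()
)
print(len(students), len(grades), rejoined == sorted(flat_rows))
```

Each student name is now stored once instead of once per course.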
Books
Book ID | Book Title | Author ID | Author Name
1001 | The Hitchhiker's Guide | 501 | Douglas Adams
1003 | The Restaurant | 501 | Douglas Adams
Here, Author Name depends on Author ID, not directly on Book ID. Author ID is a non-key
attribute, so this is a transitive dependency. To get to 3NF, we create a separate table for
authors.
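Example (3NF):
Books
Book ID | Book Title | Author ID
1001 | The Hitchhiker's Guide | 501
1003 | The Restaurant | 501

Authors
Author ID | Author Name
501 | Douglas Adams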
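A minimal sketch of the 3NF design using Python's sqlite3 module. The table and column names follow the example above, while the changed spelling of the author's name is only an illustrative edit. Because the name is stored once in the authors table, a single UPDATE is reflected for every book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        author_name TEXT NOT NULL
    );
    CREATE TABLE books (
        book_id INTEGER PRIMARY KEY,
        book_title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(author_id)
    );
    INSERT INTO authors VALUES (501, 'Douglas Adams');
    INSERT INTO books VALUES (1001, 'The Hitchhiker''s Guide', 501);
    INSERT INTO books VALUES (1003, 'The Restaurant', 501);
    """
)

# One update in one place (illustrative change only).
conn.execute(
    "UPDATE authors SET author_name = 'Douglas N. Adams' WHERE author_id = 501"
)

rows = conn.execute(
    """
    SELECT b.book_id, b.book_title, a.author_name
    FROM books AS b
    JOIN authors AS a ON a.author_id = b.author_id
    ORDER BY b.book_id
    """
).fetchall()
print(rows)
```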
Student Interests
Student ID | Activity | Hobby
101 | Soccer | Reading
101 | Basketball | Reading
Here, a student can have multiple activities and multiple hobbies, but the two are not related.
To achieve 4NF, you separate these independent attributes.
Example (4NF):
Student Activities
Student ID | Activity
101 | Soccer
101 | Basketball

Student Hobbies
Student ID | Hobby
101 | Reading
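The 4NF split can be checked with a short Python sketch. The natural_join function and the extra hobby "Chess" are assumptions added for illustration. Joining the two smaller tables on Student ID gives back the original Student Interests rows, and adding one hobby needs one new row in the split design but one row per activity in the combined table.

```python
def natural_join(activities, hobbies):
    """Join (student_id, activity) and (student_id, hobby) rows on student_id."""
    return {
        (sid, activity, hobby)
        for sid, activity in activities
        for other_sid, hobby in hobbies
        if sid == other_sid
    }


# Original combined table: (student_id, activity, hobby).
interests = {(101, "Soccer", "Reading"), (101, "Basketball", "Reading")}

# 4NF decomposition: two independent tables.
activities = {(sid, activity) for sid, activity, _hobby in interests}
hobbies = {(sid, hobby) for sid, _activity, hobby in interests}

# The split loses nothing: the join reproduces the original rows.
lossless = natural_join(activities, hobbies) == interests

# One new hobby (hypothetical) is one new row in the split design...
hobbies_plus = hobbies | {(101, "Chess")}
# ...but the combined table would need a row for every activity.
combined_after = natural_join(activities, hobbies_plus)
print(lossless, len(hobbies_plus), len(combined_after))
```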
Components of DBMS:
[Figure: components of a DBMS: Hardware, Software, Data, Procedures, Data Access Language,
and User]
Hardware refers to the physical, tangible parts of a computer system that you can touch. This
includes the electronic components and devices that make up the computer. Examples are
the keyboard, monitor, mouse, hard drive, and the central processing unit (CPU).
Software is the set of instructions and programs that tell the hardware what to do. It's the
intangible part of a computer system. There are two main types:
• System software manages and controls computer hardware, like the operating system
(e.g., Windows, macOS).
• Application software performs specific tasks for the user, such as a word processor or
a web browser.
Data is the raw, unorganized facts and figures that a computer processes. It's the information
that the software and hardware manipulate. This can be anything from text, numbers, images,
or sound files. For example, the name of a customer, their address, and their phone number
are all pieces of data.
Procedures are the rules and guidelines that govern how a computer system and its data are
used. They are the instructions for the users and operators. This can include how to log in, how
to back up a file, or how to handle a system error.
A data access language is a specialized language used to interact with a database. It allows
users and applications to retrieve, insert, update, and delete data. The most common example
is SQL (Structured Query Language), which is the standard language for managing relational
databases.
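The four operations can be sketched with SQL statements run through Python's sqlite3 module. The customers table and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)

# Insert
conn.execute("INSERT INTO customers VALUES (1, 'Jane Doe', '555-0100')")
conn.execute("INSERT INTO customers VALUES (2, 'Ravi Kumar', '555-0101')")

# Update
conn.execute("UPDATE customers SET phone = '555-0199' WHERE customer_id = 1")

# Delete
conn.execute("DELETE FROM customers WHERE customer_id = 2")

# Retrieve
remaining = conn.execute(
    "SELECT customer_id, name, phone FROM customers ORDER BY customer_id"
).fetchall()
print(remaining)
```

SQL is declarative: each statement says what data is wanted, and the DBMS decides how to find it.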
Data models are frameworks that define how data is structured and stored in a database. They
can be categorized into three main types based on their level of abstraction: object-based,
record-based, and physical models.
• Relational Model: Data is organized into two-dimensional tables (called relations) with
rows and columns. Relationships are established through primary and foreign keys, and
data is manipulated using SQL. It's the most widely used data model today.
• Hierarchical Model: Data is organized in a tree-like structure with a parent-child
relationship. Each child record has only one parent, but a parent can have multiple
children. This model is rigid and was common in early database systems.
• Network Model: An extension of the hierarchical model, it allows a child record to have
multiple parents, forming a more complex, graph-like structure that can represent
many-to-many relationships.
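The difference between the hierarchical and network models can be sketched with plain Python dictionaries. All record names and both helper functions are hypothetical. In the hierarchical structure each record stores exactly one parent, while in the network structure a record stores a set of parents.

```python
# Hierarchical model: every record has at most one parent.
hierarchical = {
    "Company": None,
    "Sales": "Company",
    "Marketing": "Company",
    "Alice": "Sales",
}

# Network model: a record may have several parents (many-to-many).
network = {
    "Alice": {"Sales", "Project X"},
    "Bob": {"Sales", "Project X"},
}


def path_to_root(tree, node):
    """Follow the single parent link upwards until the root is reached."""
    path = [node]
    while tree[node] is not None:
        node = tree[node]
        path.append(node)
    return path


def children_of(graph, parent):
    """List every record that names `parent` as one of its parents."""
    return sorted(child for child, parents in graph.items() if parent in parents)


print(path_to_root(hierarchical, "Alice"))
print(children_of(network, "Project X"))
```

The relational model would represent the same many-to-many link with a separate table of (employee, project) pairs instead of explicit pointers.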
Network
The Network Database Model emerged as an evolution of the hierarchical model, specifically
designed to address its primary limitation: the inability to represent complex, many-to-many
relationships directly.
Relational
The Relational Database Model stands as the most dominant and influential database
paradigm of the past several decades, fundamentally reshaping how data is managed and
accessed. Its principles form the backbone of countless information systems worldwide.
Hierarchical Model: This model is best suited for simple, stable data structures that
inherently exhibit clear, one-to-many parent-child relationships, such as file systems,
organizational charts, or product inventory categories. It excels in scenarios requiring
predictable, high-performance top-down data retrieval, particularly in specialized, high-
availability applications like certain banking, healthcare, and telecommunications systems.
Network Model: Representing an evolutionary step, the network model was designed to
overcome the hierarchical model's limitations by allowing for more complex many-to-many
relationships with explicit links between records. It offered a more natural representation of
interconnected data than its predecessor. However, its operational complexity, reliance on
explicit pointer management, and procedural querying limited its long-term viability in the face
of more abstract and flexible alternatives.
Relational Model: The relational model stands as the most versatile and widely adopted
paradigm for structured data management. It offers robust data integrity through ACID
properties and normalization, provides high data independence, and enables powerful, easy-
to-use declarative querying via SQL. It is the ideal choice for the vast majority of transactional
systems, including e-commerce, Enterprise Resource Planning (ERP), Customer Relationship
Management (CRM), and financial applications, where consistency, flexibility, and a mature
ecosystem are paramount.