
ISM UNIT 2 NOTES

Database: A database is a collection of inter-related data which supports efficient retrieval, insertion and deletion of data and organizes the data in the form of tables, views, schemas, reports etc. For example, a university database organizes data about students, faculty and admin staff, which helps in efficient retrieval, insertion and deletion of that data.

Database Management System: The software used to manage a database is called a Database Management System (DBMS). For example, MySQL, Oracle etc. are popular commercial DBMSs used in different applications. A DBMS allows users to perform the following tasks:

Data Definition: It helps in the creation, modification and removal of definitions that define the organization of data in the database.

Data Updation: It helps in insertion, modification and deletion of the actual data
in the database.

Data Retrieval: It helps in retrieval of data from the database which can be used
by applications for various purposes.

User Administration: It helps in registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control and recovering information corrupted by unexpected failure.

Paradigm Shift from File System to DBMS

File System manages data using files on a hard disk. Users are allowed to create, delete and update files according to their requirements. Let us consider the example of a file-based University Management System. Data of students is available to their respective Departments, Academics Section, Result Section, Accounts Section, Hostel Office etc. Some of the data is common to all sections, like Roll No, Name, Father's Name, Address and Phone number of students, but some data is available to a particular section only, like the hostel allotment number maintained by the Hostel Office. Let us discuss the issues with this system:

• Redundancy of data: Data is said to be redundant if the same data is copied at many places. If a student wants to change his phone number, it has to be updated in every section. Similarly, old records must be deleted from every section holding that student's data.
• Inconsistency of Data: Data is said to be inconsistent if multiple copies of the same data do not match each other. If the phone number differs between the Accounts Section and the Academics Section, the data is inconsistent. Inconsistency may be caused by typing errors or by not updating all copies of the same data.

• Difficult Data Access: A user should know the exact location of a file to access its data, so the process is cumbersome and tedious. Consider how difficult it would be to search for one student's hostel allotment number among 10,000 unsorted student records.

• Unauthorized Access: A file system may lead to unauthorized access to data. If a student gets access to the file containing his marks, he can change them in an unauthorized way.

• No Concurrent Access: The access of the same data by multiple users at the same time is known as concurrency. A file system does not allow concurrency, as data can be accessed by only one user at a time.

• No Backup and Recovery: File system does not incorporate any backup and
recovery of data if a file is lost or corrupted.

File Management System vs DBMS

• A file system is a general, easy-to-use system for storing files that require less security and fewer constraints, whereas a database management system is used when security constraints are high.
• Data redundancy is more in a file management system, whereas it is less in a database management system.
• Data inconsistency is more in a file system, whereas it is less in a database management system.
• Centralization is hard to achieve in a file management system, whereas it is achieved in a database management system.
• In a file management system the user locates the physical address of the files to access data, whereas in a database management system the user is unaware of the physical address where data is stored.
• Security is low in a file management system, whereas it is high in a database management system.
• A file management system stores unstructured data as isolated data files/entities, whereas a database management system stores structured data which has well-defined constraints and interrelations.

Structure of DBMS

DBMS (Database Management System) acts as an interface between the user and
the database. The user requests the DBMS to perform various operations such as
insert, delete, update and retrieval on the database.

The components of DBMS perform these requested operations on the database and
provide necessary data to the users.

Applications: - It can be considered as a user-friendly web page where the user enters requests. The user simply enters the details needed and presses buttons to get the data.

End User: - They are the real users of the database. They can be developers,
designers, administrator or the actual users of the database.

DDL: - Data Definition Language (DDL) queries are fired to create databases, schemas, tables, mappings etc. in the database. These commands are used to create objects like tables and indexes in the database for the first time. In other words, they create the structure of the database.

DDL Compiler: - This part of the database is responsible for processing the DDL commands. The compiler breaks down each command into machine-understandable code. It is also responsible for storing metadata information like table name, space used by it, number of columns in it etc.

DML Compiler: - When the user inserts, deletes, updates or retrieves records from the database, he sends a request in a form he understands, for example by pressing buttons. For the database to process the request, it must be broken down into object code. This is done by the DML compiler.

Query Optimizer: - When a user fires a request, he is least bothered about how it will be executed on the database; he may not be aware of the database or its internals at all. But whatever the request, it should be executed efficiently to fetch, insert, update or delete the data. The query optimizer decides the best way to execute the user request received from the DML compiler.

Stored Data Manager: - This is also known as the Database Control System. It is one of the main central components of the database. It is responsible for various tasks:

• It converts the requests received from the query optimizer into machine-understandable form and makes the actual request inside the database, much like fetching the exact part of the brain needed to answer a question.

• It helps to maintain consistency and integrity by applying the constraints. For example, it does not allow inserting, updating or deleting data that has a child entry, and it does not allow entering duplicate values into database tables.

• It controls concurrent access. If there are multiple users accessing the database at the same time, it makes sure all of them see correct data. It guarantees that no data loss or data mismatch happens between the transactions of multiple users.

• It helps to back up the database and recover data whenever required. Since the database can be huge, and reverting changes after an unexpected transaction failure is not easy, it maintains a backup of all data so that it can be recovered.

Data Files: - It has the real data stored in it. It can be stored as magnetic tapes,
magnetic disks or optical disks.

Compiled DML: - Some of the processed DML statements (insert, update, delete) are stored here so that if there are similar requests, they can be re-used.

Data Dictionary: - It contains all the information about the database. As the name suggests, it is the dictionary of all the data items. It contains descriptions of all the tables, views, materialized views, constraints, indexes, triggers etc.
People who deal with Data Base
Application Programmers

Application programmers are the ones who write application programs that use the database. These application programs are written in programming languages like COBOL, PL/1 (Programming Language 1), Java or a fourth-generation language. These programs are made according to user requirements. Retrieving information, creating new information and changing existing information is done by these application programs.

They interact with the DBMS through DML (Data Manipulation Language) calls, and all these functions are performed by generating requests to the DBMS. Without application programmers there would be no creativity in the whole database team.

End Users

End users are those who access the database from the terminal end. They use the developed applications and they don't have any knowledge about the design and working of the database. These are the second class of users and their main goal is just to get their task done. There are basically two types of end users, discussed below:

Casual User

These users have great knowledge of query language. Casual users access data
by entering different queries from the terminal end. They do not write programs
but they can interact with the system by writing queries.

Naïve Users

Any user who does not have any knowledge about the database falls in this category. Their task is just to use the developed application and get the desired results. For example, clerical staff in a bank are naïve users. They don't have any DBMS knowledge but they still use the database and perform their given tasks.

DBA (Database Administrator)

A DBA can be a single person or a group of persons. The Database Administrator is responsible for everything related to the database. He makes the policies and strategies and provides technical support.

The database administrator is responsible for:

Design of the conceptual and physical schemas: interacting with the users of the system to understand what data is to be stored in the DBMS and how it is likely to be used.

Security and authorization: ensuring that unauthorized data access is not permitted.

Data availability and recovery from failures: ensuring that if the system fails, users can continue to access as much of the uncorrupted data as possible.

Database tuning: modifying the database to ensure adequate performance as user requirements change.

System Analyst

A system analyst is responsible for the design, structure and properties of the database. All the requirements of the end users are handled by the system analyst. The feasibility, economic and technical aspects of the DBMS are the main concern of the system analyst.

So, this is all about the Data Base Users.

DBMS Architecture
The design of a DBMS depends on its architecture. It can be centralized or
decentralized or hierarchical. The architecture of a DBMS can be seen as either
single tier or multi-tier. The N-tier architecture divides the whole system into
related but independent n modules, which can be independently modified,
altered, changed, or replaced.

In 1-tier architecture, the DBMS is the only entity where the user directly sits
on the DBMS and uses it. Any changes done here will directly be done on the
DBMS itself. It does not provide handy tools for end-users. Database designers
and programmers normally prefer to use single-tier architecture.

If the architecture of the DBMS is 2-tier, then it must have an application through which the DBMS can be accessed. Programmers use 2-tier architecture where they access the DBMS by means of an application. Here the application tier is entirely independent of the database in terms of operation, design and programming.
3 Tier Architecture:

The 3-tier architecture separates its tiers from each other based on the
complexity of the users and how they use the data present in the database. It is
the most widely used architecture to design a DBMS.

Database (Data) Tier − In this tier, the database resides along with its query
processing languages. We also have the relations that define the data and
their constraints at this level.

Application (Middle) Tier − In this tier reside the application server and
the programs that access the database. For a user, this application tier
presents an abstracted view of the database. End-users are unaware of any
existence of the database beyond the application. At the other end, the
database tier is not aware of any other user beyond the application tier.
Hence, the application layer sits in the middle and acts as a mediator
between the end-user and the database.

User (Presentation) Tier − End-users operate on this tier and they know
nothing about any existence of the database beyond this layer. At this layer,
multiple views of the database can be provided by the application. All views
are generated by applications that reside in the application tier.

Multiple-tier database architecture is highly modifiable, as almost all its components are independent and can be changed independently.
Introduction to Data Models
Data models define how the logical structure of a database is modeled. Data
Models are fundamental entities to introduce abstraction in a DBMS. Data
models define how data is connected to each other and how they are
processed and stored inside the system.

The very first data models were flat data models, where all the data was kept in the same plane. Earlier data models were not very scientific and hence were prone to introduce lots of duplication and update anomalies.

The primary goals of data models are:

• Ensures that all data objects required by the database are accurately
represented. Omission of data will lead to creation of faulty reports
and produce incorrect results.

• A data model helps design the database at the conceptual, physical and logical levels.

• Data Model structure helps to define the relational tables, primary and
foreign keys and stored procedures.

Types of Data Models:

There are mainly three different types of data models:

Conceptual: This Data Model defines WHAT the system contains. This
model is typically created by Business stakeholders and Data Architects. The
purpose is to organize, scope and define business concepts and rules.

Logical: Defines HOW the system should be implemented regardless of the DBMS. This model is typically created by Data Architects and Business Analysts. The purpose is to develop a technical map of rules and data structures.

Physical: This Data Model describes HOW the system will be implemented
using a specific DBMS system. This model is typically created by DBA and
developers. The purpose is actual implementation of the database.
Conceptual Model:
The main aim of this model is to establish the entities, their attributes, and
their relationships. In this Data modeling level, there is hardly any detail
available of the actual Database structure.

The 3 basic elements of Data Model are

Entity: A real-world thing

Attribute: Characteristics or properties of an entity

Relationship: Dependency or association between two entities

Customer and Product are two entities. Customer number and name are
attributes of the Customer entity.

Product name and price are attributes of product entity.

Sale is the relationship between the customer and product.

Logical Model
Logical data models add further information to the conceptual model elements. This model defines the structure of the data elements and sets the relationships between them.

The advantage of the logical data model is that it provides a foundation to form the base for the physical model. However, the modeling structure remains generic.

At this data modeling level, no primary or secondary key is defined. At this level, you need to verify and adjust the connector details that were set earlier for relationships.

Physical Model
A Physical Data Model describes the database specific implementation of
the data model. It offers an abstraction of the database and helps generate
schema. This is because of the richness of meta-data offered by a Physical
Data Model.

This type of Data model also helps to visualize database structure. It helps to
model database columns keys, constraints, indexes, triggers, and other
RDBMS features.
Introduction to Data Base Models
A Database model defines the logical design and structure of a database and
defines how data will be stored, accessed and updated in a database
management system. While the Relational Model is the most widely used
database model, there are other models too:

• Hierarchical Model
• Network Model
• Entity-relationship Model
• Relational Model.

Hierarchical Model

This database model organizes data into a tree-like structure, with a single root to which all the other data is linked. The hierarchy starts from the root data and expands like a tree, adding child nodes to the parent nodes.

In this model, a child node will only have a single parent node.

This model efficiently describes many real-world relationships like the index of a book, recipes etc.

In the hierarchical model, data is organized into a tree-like structure with a one-to-many relationship between two different types of data; for example, one department can have many courses, many professors and of course many students.
NETWORK MODEL
This is an extension of the hierarchical model. In this model data is organized more like a graph, and records are allowed to have more than one parent node.

In this database model data is more related, as more relationships are established. Also, as the data is more related, accessing it is easier and faster. This database model was used to map many-to-many data relationships.

This was the most widely used database model, before Relational Model was
introduced.

Entity Relationship Model


In this database model, relationships are created by dividing the object of interest into an entity and its characteristics into attributes.

Different entities are related using relationships.

E-R Models are defined to represent the relationships into pictorial form to
make it easier for different stakeholders to understand.

This model is good to design a database, which can then be turned into
tables in relational model.

The ER Model creates entity set, relationship set, general attributes and
constraints.

ER Model is best used for the conceptual design of a database.


ER Model is based on −

Entities and their attributes.

Relationships among entities.

Entity − An entity in an ER Model is a real-world entity having properties called attributes. Every attribute is defined by its set of values, called its domain. For example, in a school database, a student is considered as an entity. A student has various attributes like name, age, class, etc.

 Relationship − The logical association among entities is called a relationship. Relationships are mapped with entities in various ways. Mapping cardinalities define the number of associations between two entities.

 Mapping cardinalities −

1. one to one
2. one to many
3. many to one
4. many to many

Example: If we have to design a School Database, then Student will be an entity with attributes name, age, address etc. As Address is generally complex, it can be another entity with attributes street name, pin code, city etc., and there will be a relationship between them.
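The notes do not give a schema for this example, but as a hedged sketch, the Student and Address entities and the relationship between them could be mapped to relational tables roughly as follows (all table and column names here are assumed for illustration):

-- Illustrative mapping of the Student/Address ER example; names and types are assumed.
CREATE TABLE Address (
  address_id  INT PRIMARY KEY,
  street_name VARCHAR(50),
  pin_code    VARCHAR(10),
  city        VARCHAR(30)
);

CREATE TABLE Student (
  student_id INT PRIMARY KEY,
  name       VARCHAR(30),
  age        INT,
  address_id INT,   -- the relationship between Student and Address
  FOREIGN KEY (address_id) REFERENCES Address(address_id)
);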

Attributes:
An attribute can be of many types, here are different types of attributes
defined in ER database model:

Simple attribute: The attributes with values that are atomic and cannot be
broken down further are simple attributes. For example, student's age.
Composite attribute: A composite attribute is made up of more than one
simple attribute. For example, student's address will contain, house
no., street name, pin code etc.

Derived attribute: These are the attributes which are not present in the
whole database management system, but are derived using other attributes.
For example, average age of students in a class.

Single-valued attribute: As the name suggests, these have a single value.

Multi-valued attribute: These attributes can have more than one value; for example, a student may have multiple phone numbers.

Relationship:
When an Entity is related to another Entity, they are said to have a
relationship. For example, A Class Entity is related to Student entity,
because students study in classes, hence this is a relationship.

Depending upon the number of entities involved, a degree is assigned to relationships.

For example, if 2 entities are involved, it is said to be a binary relationship; if 3 entities are involved, it is said to be a ternary relationship, and so on.

Overview of Data Base Design


• Database Design is a collection of processes that facilitate the designing, development, implementation and maintenance of enterprise data management systems.
• It helps produce database systems that meet the requirements of the users and have high performance.
• The main objectives of database designing are to produce the logical and physical design models of the proposed database system.

• The logical model concentrates on the data requirements and the data to be
stored independent of physical considerations. It does not concern itself with
how the data will be stored or where it will be stored physically.
• The physical data design model involves translating the logical design of the
database onto physical media using hardware resources and software
systems such as database management systems (DBMS).

Data Base Development Life Cycle

 The database development life cycle has a number of stages that are
followed when developing database systems.

 The steps in the development life cycle do not necessarily have to be followed religiously in a sequential manner.

 On small database systems, the database system development life cycle is usually very simple and does not involve a lot of steps.

Requirements analysis

Planning - This stage is concerned with planning of the entire Database Development Life Cycle. It takes into consideration the Information Systems strategy of the organization.

System definition - This stage defines the scope and boundaries of the
proposed database system.
Database designing

Logical model - This stage is concerned with developing a database model based on requirements. The entire design is on paper, without any physical implementation or specific DBMS considerations.

Physical model - This stage implements the logical model of the database
taking into account the DBMS and physical implementation factors.

Implementation

Data conversion and loading - this stage is concerned with importing and
converting data from the old system into the new database.

Testing - this stage is concerned with the identification of errors in the newly implemented system. It checks the database against the requirement specifications.

Introduction to Relational Model

The most popular data model in DBMS is the Relational Model. It is a more scientific model than the others. This model is based on first-order predicate logic and defines a table as an n-ary relation.

The basic Terminology of relational model is:-

Relational Model: The relational model represents data in the form of relations or tables.

Relational Schema: A schema represents the structure of a relation. For example, the relational schema of the STUDENT relation can be represented as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE)

Relational Instance: The set of values present in a relation at a particular instance of time is known as a relational instance, as shown in Table 1 and Table 2.

Attribute: Each relation is defined in terms of some properties, each of which is known as an attribute. For example, STUD_NO, STUD_NAME etc. are attributes of the relation STUDENT.
Domain of an attribute: The possible values an attribute can take in a relation is
called its domain. For Example, domain of STUD_AGE can be from 18 to 40.

Tuple: Each row of a relation is known as tuple. e.g.; STUDENT relation given
below has 4 tuples.

NULL values: Values of some attribute for some tuples may be unknown, missing
or undefined which are represented by NULL. Two NULL values in a relation are
considered different from each other.

Table 1 and Table 2 represent relational model having two relations STUDENT
and STUDENT_COURSE.

Degree: The number of attributes in the relation is known as the degree of the relation. The STUDENT relation defined above has degree 6.

Cardinality: The number of tuples in a relation is known as its cardinality. The STUDENT relation defined above has cardinality 4.

Column: A column represents the set of values for a particular attribute. The column STUD_NO is extracted from the relation STUDENT.

NULL Values: A value which is not known or unavailable is called a NULL value. It is represented by a blank space.
Introduction to Keys in Relational Model

A Super Key is defined as a set of attributes within a table that can uniquely identify each record within that table. A super key is a superset of a candidate key. In the STUDENT relation defined above, super keys include STUD_NO, (STUD_NO, STUD_NAME), STUD_PHONE etc.

Candidate Key: The minimal set of attribute which can uniquely identify a
tuple is known as candidate key.

For Example: STUD_NO in STUDENT relation.

The value of Candidate Key is unique and non-null for every tuple.

There can be more than one candidate key in a relation. For Example,
STUD_NO as well as STUD_PHONE both are candidate keys for relation
STUDENT.

The candidate key can be simple (having only one attribute) or composite
as well. For Example, {STUD_NO, COURSE_NO} is a composite
candidate key for relation STUDENT_COURSE.

 Primary Key: There can be more than one candidate key in a relation out of
which one can be chosen as primary key.
For Example, STUD_NO as well as STUD_PHONE both are candidate
keys for relation STUDENT but STUD_NO can be chosen as primary key
(only one out of many candidate keys).

 Alternate Key: A candidate key other than the primary key is called an alternate key.

For example, STUD_NO as well as STUD_PHONE are both candidate keys for the relation STUDENT, but STUD_PHONE will be an alternate key (only one of the candidate keys can be the primary key).

 Foreign key: Foreign keys are the columns of a table that point to the primary key of another table. They act as a cross-reference between tables.

For example, STUD_NO in STUDENT_COURSE is a foreign key referencing STUD_NO in the STUDENT relation.

Unlike the primary key of a relation, a foreign key can be NULL and may contain duplicate values, i.e. it need not follow the uniqueness constraint.

For example, STUD_NO in the STUDENT_COURSE relation is not unique; it is repeated in the first and third tuples. However, STUD_NO in the STUDENT relation is a primary key, so it must always be unique and cannot be NULL.

 Non-key Attributes: Non-key attributes are the attributes or fields of a table other than the candidate key attributes/fields.

Non-prime Attributes: Non-prime attributes are attributes other than the primary key attribute(s).

 Composite Key :- Key that consists of two or more attributes that uniquely
identify any record in a table is called Composite key. But the attributes
which together form the Composite key are not a key independently or
individually.
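As a minimal sketch of how these key types could be declared in SQL (column types and sizes are assumed; the notes only name the attributes):

-- Illustrative DDL only: types/sizes are assumed.
CREATE TABLE STUDENT (
  STUD_NO    INT PRIMARY KEY,          -- primary key (one chosen candidate key)
  STUD_NAME  VARCHAR(30),
  STUD_PHONE VARCHAR(15) UNIQUE,       -- alternate key: a candidate key not chosen as primary
  STUD_AGE   INT
);

CREATE TABLE STUDENT_COURSE (
  STUD_NO   INT,
  COURSE_NO INT,
  PRIMARY KEY (STUD_NO, COURSE_NO),    -- composite candidate key
  FOREIGN KEY (STUD_NO) REFERENCES STUDENT(STUD_NO)   -- foreign key cross-reference
);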
Constraints in Relational Model

While designing the relational model, we define some conditions which must hold for data present in the database; these are called constraints. These constraints are checked before performing any operation (insertion, deletion and updation) in the database. If any constraint is violated, the operation fails.

Domain Constraints: These are attribute-level constraints. An attribute can only take values which lie inside its domain range. For example, if a constraint AGE > 0 is applied on the STUDENT relation, inserting a negative value of AGE will result in failure.
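A small sketch of how such a domain constraint could be expressed, assuming the DBMS in use supports CHECK clauses (column names and types are illustrative):

-- Domain constraint expressed as a CHECK clause.
CREATE TABLE STUDENT (
  ROLL_NO INT PRIMARY KEY,
  NAME    VARCHAR(30),
  AGE     INT CHECK (AGE > 0)   -- inserting a negative AGE now fails
);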

Key Integrity Constraints: Every relation in the database should have at least one set of attributes which defines a tuple uniquely. That set of attributes is called a key. For example, ROLL_NO in STUDENT is a key; no two students can have the same roll number.

So a key has two properties:

 It should be unique for all tuples.


 It can’t have NULL values.

Referential Integrity: When one attribute of a relation can only take values from another attribute of the same relation or of another relation, it is called referential integrity. Let us suppose we have 2 relations:

STUDENT

Roll No  Name    Address  Phone    Age  Branch Code
1        Ram     Delhi    9786547  18   CS
2        Ramesh  Gurgaon  9652456  18   CS
3        Sujit   Rohtak   9657845  20   ECE
4        Suresh  Delhi             18   IT

BRANCH

Branch Code  Branch Name
CS           Computer Science
IT           Information Technology
ECE          Electronics and Communication Engineering
CV           Civil Engineering

BRANCH_CODE of STUDENT can only take the values which are present in BRANCH_CODE of BRANCH; this is called a referential integrity constraint.

The relation which refers to the other relation is called the REFERENCING RELATION (STUDENT in this case) and the relation to which other relations refer is called the REFERENCED RELATION (BRANCH in this case).
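A minimal DDL sketch of this referential integrity constraint (the notes only show the data, so column types and sizes are assumed):

-- Illustrative sketch: types/sizes assumed.
CREATE TABLE BRANCH (
  BRANCH_CODE VARCHAR(5) PRIMARY KEY,
  BRANCH_NAME VARCHAR(60)
);

CREATE TABLE STUDENT (
  ROLL_NO     INT PRIMARY KEY,
  NAME        VARCHAR(30),
  ADDRESS     VARCHAR(50),
  PHONE       VARCHAR(15),
  AGE         INT,
  BRANCH_CODE VARCHAR(5),
  -- STUDENT (referencing relation) may only use codes present in BRANCH (referenced relation)
  FOREIGN KEY (BRANCH_CODE) REFERENCES BRANCH(BRANCH_CODE)
);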

CONCEPT OF NORMALIZATION

Normalization is the process of minimizing redundancy from a relation or set of relations. Redundancy in a relation may cause insertion, deletion and updation anomalies, so normalization helps to minimize the redundancy in relations. Normal forms are used to eliminate or reduce redundancy in database tables.

If a database design is not perfect, it may contain anomalies, which are like a
bad dream for any database administrator. Managing a database with
anomalies is next to impossible.

 Update anomalies − If data items are scattered and are not linked to each
other properly, then it could lead to strange situations. For example, when
we try to update one data item having its copies scattered over several
places, a few instances get updated properly while a few others are left with
old values. Such instances leave the database in an inconsistent state.

 Deletion anomalies − We try to delete a record, but parts of it are left undeleted because, unknown to us, the data is also saved somewhere else.

 Insert anomalies − We try to insert data into a record that does not exist at all.

 Normalization is a method to remove all these anomalies and bring the database to a consistent state.

First Normal Form


If a relation contains a composite or multi-valued attribute, it violates first normal form; conversely, a relation is in first normal form if it does not contain any composite or multi-valued attribute, i.e. if every attribute in that relation is a single-valued attribute.

First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all the attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.

A table is in 1NF iff:

• Every attribute is single-valued.
• Attribute domains do not change.
• Every attribute/column has a unique name.
• The order in which data is stored does not matter.

Table 1: Unorganized Relation


Table 2: First Normal Form
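The original tables referenced above are not reproduced in these notes. As a hedged illustration only, an unorganized relation and its first-normal-form version might look like this (values assumed):

Unorganized: ROLL_NO 1, NAME Ram, SUBJECT (C, C++)   -- SUBJECT is multi-valued

In 1NF:
ROLL_NO  NAME  SUBJECT
1        Ram   C
1        Ram   C++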

Second Normal Form

To be in second normal form, a relation must be in first normal form and relation
must not contain any partial dependency. A relation is in 2NF if it has No Partial
Dependency, i.e., no non-prime attribute (attributes which are not part of any
candidate key) is dependent on any proper subset of any candidate key of the table.

Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it is called a partial dependency.

For a relation to be in Second Normal Form, it must be in First Normal form.

Prime attribute − An attribute which is a part of a candidate key is known as a prime attribute.

Non-prime attribute − An attribute which is not a part of any candidate key is said to be a non-prime attribute.

If we follow second normal form, then every non-prime attribute should be fully functionally dependent on the candidate key. That is, if X → A holds, then there should not be any proper subset Y of X for which Y → A also holds true.

We see in the Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must depend on both of them and not on any of the prime key attributes individually. But we find that Stu_Name can be identified by Stu_ID alone and Proj_Name can be identified by Proj_ID alone. This is called partial dependency, which is not allowed in Second Normal Form.

Table: In Second Normal Form

We break the relation in two, as depicted above, so there exists no partial dependency.
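The decomposed tables are shown in the notes only as a picture; one possible way to express the decomposition in SQL (types assumed) is:

-- Student_Project decomposed to remove partial dependencies; one possible decomposition, types assumed.
CREATE TABLE Student (
  Stu_ID   INT PRIMARY KEY,
  Stu_Name VARCHAR(30)          -- depends on the whole key Stu_ID
);

CREATE TABLE Project (
  Proj_ID   INT PRIMARY KEY,
  Proj_Name VARCHAR(30)         -- depends on the whole key Proj_ID
);

CREATE TABLE Student_Project (
  Stu_ID  INT,
  Proj_ID INT,
  PRIMARY KEY (Stu_ID, Proj_ID),
  FOREIGN KEY (Stu_ID)  REFERENCES Student(Stu_ID),
  FOREIGN KEY (Proj_ID) REFERENCES Project(Proj_ID)
);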

Third Normal Form

A relation is in third normal form if it is in second normal form and there is no transitive dependency for non-prime attributes.

A relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency X → Y:
• X is a super key.
• Y is a prime attribute (each element of Y is part of some candidate key).

We find that in the above Student_detail relation, Stu_ID is the key and only
prime key attribute. We find that City can be identified by Stu_ID as well as
Zip itself. Neither Zip is a superkey nor is City a prime attribute.

Additionally, Stu_ID → Zip → City, so there exists transitive dependency.

 To bring this relation into third normal form, we break the relation into two relations as follows:

Table 1: Stu_ID, Stu_Name, Zip, with Stu_ID as the primary key.

Table 2: Zip, City, with Zip as the primary key.

These two tables are in 3NF.
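A DDL sketch of this 3NF decomposition (table names and types are assumed):

-- Transitive dependency Stu_ID -> Zip -> City removed; names and types assumed.
CREATE TABLE ZipCode (
  Zip  VARCHAR(10) PRIMARY KEY,
  City VARCHAR(30)
);

CREATE TABLE Student_Detail (
  Stu_ID   INT PRIMARY KEY,
  Stu_Name VARCHAR(30),
  Zip      VARCHAR(10),
  FOREIGN KEY (Zip) REFERENCES ZipCode(Zip)
);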

BOYCE CODD Normal Form


Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form
on strict terms.

BCNF states that −

It should be in Third Normal Form.

For any functional dependency, X → A, X must be a super-key.

In simple words, it means that for a dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.
In the example table above, (student_id, subject) form the primary key, which means the subject column is a prime attribute.

But there is one more dependency: professor → subject.

And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by BCNF.

To make this relation (table) satisfy BCNF, we will decompose this table into two tables: a student table and a professor table.

Below we have the structure for both the tables.

Student Table:

 Professor Table:

And now this relation satisfies Boyce-Codd Normal Form.
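The decomposed table structures are not reproduced in these notes; one common way to perform this decomposition (the column p_id and all types are assumed for illustration) is:

-- One possible BCNF decomposition; names and types are illustrative.
CREATE TABLE professor (
  p_id      INT PRIMARY KEY,
  professor VARCHAR(30),
  subject   VARCHAR(30)        -- professor -> subject now has a super key on the left
);

CREATE TABLE student (
  student_id INT,
  p_id       INT,
  PRIMARY KEY (student_id, p_id),
  FOREIGN KEY (p_id) REFERENCES professor(p_id)
);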


QUERYING RELATIONAL DATA USING SQL

Structured Query Language (SQL), as we all know, is the database language by the use of which we can perform certain operations on an existing database and can also create a database. SQL uses certain commands like CREATE, DROP, INSERT etc. to carry out the required tasks.

These SQL commands are mainly categorized into four categories as:

DDL – Data Definition Language
DQL – Data Query Language
DML – Data Manipulation Language
DCL – Data Control Language

DDL (Data Definition Language)


DDL or Data Definition Language consists of the SQL commands that can be used to define the database schema. It deals with descriptions of the database schema and is used to create and modify the structure of database objects in the database. There are several types of DDL commands:

• CREATE
• DROP
• ALTER
• TRUNCATE
• RENAME

CREATE COMMAND
There are two CREATE statements available in SQL:
1. CREATE DATABASE
2. CREATE TABLE

CREATE DATABASE

A Database is defined as a structured set of data. So, in SQL the very first step to
store the data in a well structured manner is to create a database. The CREATE
DATABASE statement is used to create a new database in SQL.
Syntax:
CREATE DATABASE database_name;

CREATE TABLE: The CREATE TABLE statement is used to create a table in SQL. We know that a table comprises rows and columns. So while creating a table we have to provide all the information to SQL about the names of the columns, the type of data to be stored in the columns, the size of the data etc.
CREATE TABLE table_name
(column1 data_type(size),
column2 data_type(size),
column3 data_type(size), .... );
Example:
CREATE TABLE Student
(ROLL_NO int(3),
NAME varchar(20),
SUBJECT varchar(20) );

DROP COMMAND

DROP is used to delete a whole database or just a table. The DROP statement destroys objects like an existing database, table, index or view.

A DROP statement in SQL removes a component from a relational database management system (RDBMS).

DROP object object_name ;

Example:
DROP TABLE table_name;

table_name: Name of the table to be deleted.

DROP DATABASE database_name;

database_name: Name of the database to be deleted.


ALTER COMMAND

ALTER TABLE is used to add, delete/drop or modify columns in an existing table. It is also used to add and drop various constraints on an existing table.

ALTER TABLE – ADD

ADD is used to add columns to an existing table. Sometimes we need to store additional information; in that case we do not need to create the whole table again, as ADD comes to our rescue.

ALTER TABLE table_name ADD
(Columnname_1 datatype,
Columnname_2 datatype,
Columnname_n datatype);

ALTER TABLE – DROP

DROP COLUMN is used to delete unwanted columns from a table.

ALTER TABLE table_name
DROP COLUMN column_name;

ALTER TABLE – MODIFY

It is used to modify existing columns in a table. Multiple columns can also be modified at once.

*Syntax may vary slightly in different databases.

Syntax (Oracle, MySQL, MariaDB):

ALTER TABLE table_name
MODIFY column_name column_type;

Syntax (SQL Server):

ALTER TABLE table_name
ALTER COLUMN column_name column_type;

Sometimes we may want to rename our table to give it a more relevant name. For this purpose we can use ALTER TABLE to rename the table.

*Syntax may vary in different databases.

Syntax (Oracle, MySQL, MariaDB):

ALTER TABLE table_name
RENAME TO new_table_name;

Columns can also be given a new name with the use of ALTER TABLE.

Syntax (Oracle):

ALTER TABLE table_name
RENAME COLUMN old_name TO new_name;

Syntax (MySQL, MariaDB):

ALTER TABLE table_name
CHANGE COLUMN old_name new_name column_type;
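For instance, on the Student table created earlier, a sequence of ALTER operations might look like this (MySQL-style syntax, shown only as an illustration; the AGE column is assumed):

-- Add a column, change its type, then rename it (MySQL/MariaDB syntax).
ALTER TABLE Student ADD (AGE int(3));
ALTER TABLE Student MODIFY AGE int(4);
ALTER TABLE Student CHANGE COLUMN AGE STUDENT_AGE int(4);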

TRUNCATE COMMAND

The TRUNCATE statement is a Data Definition Language (DDL) operation that is used to mark the extents of a table for deallocation (empty for reuse). The result of this operation quickly removes all data from a table, typically bypassing a number of integrity enforcing mechanisms.

The TRUNCATE TABLE my_table statement is logically (though not physically) equivalent to the DELETE FROM my_table statement (without a WHERE clause).
TRUNCATE TABLE table_name;

Truncate means the data will be deleted but the structure will remain in the
memory for further operations.

DQL (Data Query Language)


DQL statements are used for performing queries on the data within schema objects. The purpose of a DQL command is to get some schema relation based on the query passed to it.

SELECT – It is used to retrieve data from the database.

SELECT column_name FROM Table_name
WHERE <condition>;
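For example, using the Student table created earlier (ROLL_NO, NAME, SUBJECT; the subject value is only illustrative):

-- Retrieve the names of all students who study DBMS.
SELECT NAME FROM Student
WHERE SUBJECT = 'DBMS';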

DML (Data Manipulation Language)

The SQL commands that deal with the manipulation of data present in the database belong to DML or Data Manipulation Language, and this includes most of the SQL statements.

INSERT – is used to insert data into a table.
UPDATE – is used to update existing data within a table.
DELETE – is used to delete records from a database table.

INSERT COMMAND

The INSERT INTO statement of SQL is used to insert a new row in a table.
There are two ways of using INSERT INTO statement for inserting rows:

Only values: First method is to specify only the value of data to be inserted
without the column names.
INSERT INTO Table_name VALUES (value1, value2, value3,…);

Table_name: name of the table.

value1, value2,.. : value of first column, second column,… for the new
record.

Column names and values both: In the second method we will specify
both the columns which we want to fill and their corresponding values as
shown below:

INSERT INTO Table_name (column1, column2, column3, ...) VALUES (value1, value2, value3, ...);

Table_name: name of the table.
column1, column2, ...: names of the first column, second column, ...
value1, value2, value3: values for the first column, second column, ... of the new record.
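A concrete example against the Student table defined earlier (the inserted values are illustrative):

-- Insert using values only (column order must match the table definition).
INSERT INTO Student VALUES (1, 'Ram', 'DBMS');

-- Insert using explicit column names and values.
INSERT INTO Student (ROLL_NO, NAME, SUBJECT) VALUES (2, 'Ramesh', 'Networks');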

 Using SELECT in INSERT INTO Statement

We can use the SELECT statement with INSERT INTO statement to copy
rows from one table and insert them into another Table. The use of this
statement is similar to that of INSERT INTO statement.

The difference is that the SELECT statement is used here to select data from
a different table. The different ways of using INSERT INTO SELECT
statement are shown below:

 Inserting all columns of a table: We can copy all the data of a table and insert it into a different table.

INSERT INTO First_Table SELECT * FROM Second_Table;

First_Table: name of the first table.
Second_Table: name of the second table.

Here we have used the SELECT statement to copy the data from one table and the INSERT INTO statement to insert it into a different table.
 Inserting specific columns of a table: We can copy only those columns of a table which we want to insert into a different table.

INSERT INTO First_Table (names_of_columns1) SELECT names_of_columns2 FROM Second_Table;

First_Table: name of the first table.
Second_Table: name of the second table.
names_of_columns1: names of the columns, separated by commas, for table 1.
names_of_columns2: names of the columns, separated by commas, for table 2.

Here we have used the SELECT statement to copy the data of the selected columns only from the second table and the INSERT INTO statement to insert it into the first table.

Copying specific rows from a table: We can copy specific rows from a
table to insert into another table by using WHERE clause with the SELECT
statement. We have to provide appropriate condition in the WHERE clause
to select specific rows.

INSERT INTO table1 SELECT * FROM table2 WHERE condition;

table1: name of the first table.
table2: name of the second table.
condition: condition to select specific rows.

UPDATE COMMAND

The UPDATE statement in SQL is used to update the data of an existing table in the database. We can update single columns as well as multiple columns using the UPDATE statement as per our requirement.

UPDATE Table_name SET column1 = value1, column2 = value2, ...
WHERE condition;

Table_name: name of the table.
column1, column2, ...: names of the first, second, third column ...
value1, value2, ...: new values for the first, second, third column ...
condition: condition to select the rows for which the values of the columns need to be updated.

In the above query the SET clause is used to set new values for the particular columns and the WHERE clause is used to select the rows for which the columns need to be updated. If we do not use the WHERE clause, then the columns in all the rows will be updated, so the WHERE clause is used to choose the particular rows.
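For example, against the STUDENT table shown in the referential integrity section (column names and value are assumed for illustration):

-- Change the phone number of the student with roll number 1.
UPDATE STUDENT SET PHONE = '9999999'
WHERE ROLL_NO = 1;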

DELETE COMMAND

The DELETE statement in SQL is used to delete existing records from a table. We can delete a single record or multiple records depending on the condition we specify in the WHERE clause.

DELETE FROM Table_name WHERE some_condition;

Table_name: name of the table.
some_condition: condition to choose particular records.

We can delete single as well as multiple records depending on the condition we provide in the WHERE clause. If we omit the WHERE clause, then all of the records will be deleted and the table will be empty.

Delete all of the records:

query1: "DELETE FROM Student";
query2: "DELETE * FROM Student"; (note: the * form is non-standard and accepted only by some databases)
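A conditional delete against the same illustrative Student table:

-- Delete only the record of the student named 'Ram'.
DELETE FROM Student WHERE NAME = 'Ram';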

DCL (Data Control Language)


DCL includes commands such as GRANT and REVOKE which mainly deal with the rights, permissions and other controls of the database system.

GRANT – gives users access privileges to the database.
REVOKE – withdraws users' access privileges given by using the GRANT command.
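A short illustration (the user name and the chosen privileges are assumptions, not from the notes):

-- Give user 'ravi' permission to read and insert rows in the Student table.
GRANT SELECT, INSERT ON Student TO ravi;

-- Take the INSERT permission back.
REVOKE INSERT ON Student FROM ravi;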
AGGREGATE FUNCTION
In database management an aggregate function is a function where the
values of multiple rows are grouped together as input on certain criteria to
form a single value of more significant meaning.

Various Aggregate Functions


1) Count()
2) Sum()
3) Avg()
4) Min()
5) Max()

Example:

ID NAME SALARY
1 A 80
2 B 40
3 C 60
4 D 70
5 E 60
6 F NULL

Count():

Count(*): Returns total number of records i.e. 6.

Count(salary): Return number of Non Null values over the column Salary
i.e. 5.

Count(Distinct Salary): Return number of distinct Non Null values over the
column Salary i.e. 4.

Sum():

sum(salary): Sum of all non-NULL values of the column salary, i.e., 310.

sum(Distinct salary): Sum of all distinct non-NULL values, i.e., 250.

Avg():

Avg(salary) = Sum(salary) / Count(salary) = 310/5 = 62.

Avg(Distinct salary) = Sum(Distinct salary) / Count(Distinct Salary) = 250/4 = 62.5.

Min():

Min(salary): Minimum value in the salary column excluding NULL, i.e., 40.

Max():

Max(salary): Maximum value in the salary column, i.e., 80.
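These values could be computed with queries like the following; the table name Employee is assumed, since the notes do not name the example table:

-- Aggregate queries over the example data above (table name 'Employee' assumed).
SELECT COUNT(*), COUNT(SALARY), COUNT(DISTINCT SALARY) FROM Employee;   -- 6, 5, 4
SELECT SUM(SALARY), SUM(DISTINCT SALARY) FROM Employee;                 -- 310, 250
SELECT AVG(SALARY), MIN(SALARY), MAX(SALARY) FROM Employee;             -- 62, 40, 80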

INTRODUCTION TO DATA MINING


“Data Mining” refers to the extraction of useful information from a bulk of
data or data warehouses.

The result of data mining is the patterns and knowledge that we gain at the
end of the extraction process. In that sense, Data Mining is also known as
Knowledge Discovery or Knowledge Extraction.

Nowadays, data mining is used in almost all places where a large amount of data is stored and processed.

Main Purpose of Data Mining

Basically, the information gathered from data mining helps to predict hidden patterns, future trends and behaviors, allowing businesses to take decisions.

Technically, data mining is the computational process of analyzing data from different perspectives, dimensions and angles and categorizing/summarizing it into meaningful information.
Data Mining can be applied to any type of data e.g. Data Warehouses,
Transactional Databases, Relational Databases, Multimedia Databases,
Spatial Databases, Time-series Databases, World Wide Web.

Data Mining as a whole process

The whole process of Data Mining comprises three main phases:

Data Pre-processing – Data cleaning, integration, selection and transformation take place.

Data Extraction – The actual data mining takes place.

Data Evaluation and Presentation – Analyzing and presenting the results.

DATA MINING PROCESS

Applications of Data Mining

1. Financial Analysis
2. Biological Analysis
3. Scientific Analysis etc.
Real life example of Data Mining –

Market Basket Analysis

Market Basket Analysis is a technique which gives a careful study of the purchases made by a customer in a supermarket. The concept is basically applied to identify the items that are bought together by a customer. Say, if a person buys bread, what are the chances that he/she will also purchase butter? This analysis helps companies in promoting offers and deals. The same is done with the help of data mining.

Difference between Data Warehousing and Data Mining

A data warehouse is built to support management functions whereas data mining is used to extract useful information and patterns from data. Data warehousing is the process of compiling information into a data warehouse.

Data Warehousing:

It is a technology that aggregates structured data from one or more sources so that it can be compared and analyzed, rather than used for transaction processing. A data warehouse is designed to support the management decision-making process by providing a platform for data cleaning, data integration and data consolidation. A data warehouse contains subject-oriented, integrated, time-variant and non-volatile data.

A data warehouse consolidates data from many sources while ensuring data quality, consistency and accuracy. It improves system performance by separating analytics processing from transactional databases. Data flows into a data warehouse from the various databases. A data warehouse works by organizing data into a schema which describes the layout and type of data, and query tools analyze the data tables using the schema.
Data Warehousing Process

Difference between Data Warehousing and Data Mining

• A data warehouse is a database system designed for analytical work instead of transactional work, whereas data mining is the process of analyzing data patterns.
• In a data warehouse, data is stored periodically, whereas in data mining data is analyzed regularly.
• Data warehousing is the process of extracting and storing data to allow easier reporting, whereas data mining is the use of pattern recognition logic to identify patterns.
• Data warehousing is solely carried out by engineers, whereas data mining is carried out by business users with the help of engineers.
• Data warehousing is the process of pooling all relevant data together, whereas data mining is the process of extracting data from large data sets.
Introduction to Cloud Computing
In the simplest terms, cloud computing means storing and accessing data and programs on remote servers that are hosted on the internet instead of a computer's hard drive or a local server. Cloud computing is also referred to as Internet-based computing.

Cloud Computing Architecture: Cloud computing architecture refers to the components and sub-components required for cloud computing. These components typically refer to:

Front end (fat client, thin client)
Back end platforms (servers, storage)
Cloud-based delivery and a network (Internet, Intranet, Inter-cloud).

Hosting a cloud:

There are three layers in cloud computing. Companies use these layers based
on the service they provide.

Infrastructure
Platform
Application

At the bottom is the foundation, the Infrastructure where the people start and
begin to build. This is the layer where the cloud hosting lives.
 Hosting :

Let's say you have a company and a website, and the website has a lot of communications that are exchanged between members. You start with a few members talking with each other and then gradually the number of members increases.

As time passes and the number of members increases, there is more traffic on the network and your server slows down. This causes a problem.

A few years ago, websites were put on a server somewhere; you had to run around to buy and set up a number of servers. It cost a lot of money and took a lot of time. You paid for these servers both when you were using them and when you were not. This is called hosting.

This problem is overcome by cloud hosting. With cloud computing, you have access to computing power when you need it. Now your website is put on a cloud server, just as you would put it on a dedicated server. People start visiting your website, and if you suddenly need more computing power, you can scale up according to the need.

Benefits of Cloud Hosting :

Scalability: With Cloud hosting, it is easy to grow and shrink the number
and size of servers based on the need.

This is done by either increasing or decreasing the resources in the cloud. This ability to alter plans due to fluctuations in business size and needs is a superb benefit of cloud computing, especially when experiencing sudden growth in demand.

Instant: Whatever you want is instantly available in the cloud.

Save Money: An advantage of cloud computing is the reduction in hardware cost. Instead of purchasing in-house equipment, hardware needs are left to the vendor.
For companies that are growing rapidly, new hardware can be large, expensive and inconvenient. Cloud computing alleviates these issues because resources can be acquired quickly and easily. Even better, the cost of repairing or replacing equipment is passed on to the vendors.

Along with purchase cost, off-site hardware cuts internal power costs and
saves space. Large data centers can take up precious office space and
produce a large amount of heat. Moving to cloud applications or storage can
help maximize space and significantly cut energy expenditures.

Reliability: Rather than being hosted on a single instance of a physical server, hosting is delivered on a virtual partition which draws its resources, such as disk space, from an extensive network of underlying physical servers. If one server goes offline it will have no effect on availability, as the virtual servers will continue to pull resources from the remaining network of servers.

Physical Security: The underlying physical servers are still housed within
data centers and so benefit from the security measures that those facilities
implement to prevent people accessing or disrupting them on-site.
