Data Update: It supports insertion, modification and deletion of the actual data
in the database.
Data Retrieval: It supports retrieval of data from the database, which applications
can use for various purposes.
A file system manages data using files on a hard disk. Users are allowed to create,
delete, and update files according to their requirements. Consider the
example of a file-based University Management System. Student data is
available to the respective Departments, the Academics Section, the Result Section,
the Accounts Section, the Hostel Office, etc. Some of the data is common to all sections,
such as the Roll No, Name, Father's Name, Address and Phone Number of students, but
some data is available to a particular section only, such as the Hostel allotment number,
which belongs to the Hostel Office. Let us discuss the issues with this system:
• Difficult Data Access: A user must know the exact location of a file to access
its data, so the process is very cumbersome and tedious. Consider how difficult
it would be to find one student's hostel allotment number among 10,000 unsorted
student records.
• No Backup and Recovery: The file system does not provide backup and
recovery of data if a file is lost or corrupted.
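The access-cost problem described in the first issue can be sketched in plain Python; the record layout, field names and hostel-number format below are invented purely for illustration, to contrast a linear scan over unsorted records with an index-style lookup.

```python
# Sketch: finding a hostel allotment number among 10,000 unsorted student
# records. All field names and values here are made up for illustration.
records = [{"roll_no": i, "hostel_no": f"H-{i % 7}-{i}"} for i in range(10000)]

# File-system style: linear scan through every record until a match.
def linear_search(records, roll_no):
    for rec in records:  # worst case touches all 10,000 records
        if rec["roll_no"] == roll_no:
            return rec["hostel_no"]
    return None

# DBMS-style: an index (here simply a dict) gives direct access.
index = {rec["roll_no"]: rec["hostel_no"] for rec in records}

found_by_scan = linear_search(records, 9999)  # scans nearly everything
found_by_index = index[9999]                  # single hash lookup
```

A real DBMS maintains such indexes (typically B-trees) automatically, which is what makes keyed access fast regardless of how the records are physically ordered.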
Structure of DBMS
DBMS (Database Management System) acts as an interface between the user and
the database. The user requests the DBMS to perform various operations such as
insert, delete, update and retrieval on the database.
The components of DBMS perform these requested operations on the database and
provide necessary data to the users.
Applications: - This can be considered a user-friendly web page where the user
enters requests. Here the user simply enters the details needed and presses
buttons to get the data.
End User: - They are the real users of the database. They can be developers,
designers, administrator or the actual users of the database.
DDL Compiler: - This part of the database is responsible for processing DDL
commands. This compiler breaks each command down into
machine-understandable code. It is also responsible for storing metadata
information such as the table name, the space used by the table, the number of columns in it, etc.
DML Compiler: - When the user inserts, deletes, updates or retrieves records
from the database, he sends a request in a form he understands, for instance by
pressing buttons. But for the database to understand and act on the request, it must be
broken down into object code. This is done by the DML compiler.
Query Optimizer: - When a user fires a request, he is least bothered about how it will
be executed on the database; he may not be aware of the database or the way it performs at all.
But whatever the request, it should fetch, insert, update or
delete data from the database efficiently. The query optimizer decides the best way to
execute the user request received from the DML compiler.
Stored Data Manager: - This is also known as the Database Control System. It is one
of the main central systems of the database. It is responsible for various tasks:
• It helps to back up the database and recover data whenever required. Since the
database is huge, and reverting the changes of an unexpected, faulty transaction
is not easy, it maintains a backup of all data so that
the data can be recovered.
Data Files: - It has the real data stored in it. It can be stored as magnetic tapes,
magnetic disks or optical disks.
Compiled DML: - Some of the processed DML statements (insert, update, delete)
are stored here so that when a similar request arrives, they can be re-used.
Data Dictionary: - It contains all the information about the database. As the name
suggests, it is the dictionary of all the data items. It contains description of all the
tables, view, materialized views, constraints, indexes, triggers etc.
People who deal with Data Base
Application Programmers
Application programmers are the ones who write application programs that use
the database. These application programs are written in programming languages
like COBOL, PL/I (Programming Language One), Java or a fourth-generation
language, and they are made according to the user requirements.
Retrieving information, creating new information and
changing existing information are done by these application programs.
They interact with the DBMS through DML (Data Manipulation Language) calls,
and all these functions are performed by generating requests to the DBMS.
Without application programmers, there would be no application development in
the whole database team.
End Users
End users are those who access the database from the terminal end. They use
the developed applications and do not have any knowledge about the design
and working of the database. These are the second class of users, and their main
goal is simply to get their task done. There are basically two types of end users,
discussed below:
Casual User
These users have a good knowledge of the query language. Casual users access data
by entering various queries from the terminal end. They do not write programs,
but they can interact with the system by writing queries.
Naïve Users
Any user who does not have any knowledge about databases falls into this
category. Their task is just to use the developed application and get the desired
results. For example, the clerical staff in any bank are naïve users. They do not
have any DBMS knowledge, but they still use the database to perform their given
tasks.
Data availability and recovery from failures: ensuring that if the system fails,
users can continue to access as much of the uncorrupted data as possible.
System Analyst
DBMS Architecture
The design of a DBMS depends on its architecture. It can be centralized or
decentralized or hierarchical. The architecture of a DBMS can be seen as either
single tier or multi-tier. The N-tier architecture divides the whole system into
related but independent n modules, which can be independently modified,
altered, changed, or replaced.
In 1-tier architecture, the DBMS is the only entity where the user directly sits
on the DBMS and uses it. Any changes done here will directly be done on the
DBMS itself. It does not provide handy tools for end-users. Database designers
and programmers normally prefer to use single-tier architecture.
The 3-tier architecture separates its tiers from each other based on the
complexity of the users and how they use the data present in the database. It is
the most widely used architecture to design a DBMS.
Database (Data) Tier − In this tier, the database resides along with its query
processing languages. We also have the relations that define the data and
their constraints at this level.
Application (Middle) Tier − In this tier reside the application server and
the programs that access the database. For a user, this application tier
presents an abstracted view of the database. End-users are unaware of any
existence of the database beyond the application. At the other end, the
database tier is not aware of any other user beyond the application tier.
Hence, the application layer sits in the middle and acts as a mediator
between the end-user and the database.
User (Presentation) Tier − End-users operate on this tier and they know
nothing about any existence of the database beyond this layer. At this layer,
multiple views of the database can be provided by the application. All views
are generated by applications that reside in the application tier.
The very first data model was the flat data model, where all the data used
is kept in the same plane. Earlier data models were not very scientific,
hence they were prone to introducing lots of duplication and update anomalies.
• Ensures that all data objects required by the database are accurately
represented. Omission of data will lead to creation of faulty reports
and produce incorrect results.
• Data Model structure helps to define the relational tables, primary and
foreign keys and stored procedures.
Conceptual: This Data Model defines WHAT the system contains. This
model is typically created by Business stakeholders and Data Architects. The
purpose is to organize, scope and define business concepts and rules.
Physical: This Data Model describes HOW the system will be implemented
using a specific DBMS system. This model is typically created by DBA and
developers. The purpose is actual implementation of the database.
Conceptual Model:
The main aim of this model is to establish the entities, their attributes, and
their relationships. In this Data modeling level, there is hardly any detail
available of the actual Database structure.
Customer and Product are two entities. Customer number and name are
attributes of the Customer entity.
Logical Model
Logical data models add further information to the conceptual model
elements. It defines the structure of the data elements and set the
relationships between them.
Physical Model
A Physical Data Model describes the database specific implementation of
the data model. It offers an abstraction of the database and helps generate
schema. This is because of the richness of meta-data offered by a Physical
Data Model.
This type of Data model also helps to visualize database structure. It helps to
model database columns keys, constraints, indexes, triggers, and other
RDBMS features.
Introduction to Data Base Models
A Database model defines the logical design and structure of a database and
defines how data will be stored, accessed and updated in a database
management system. While the Relational Model is the most widely used
database model, there are other models too:
• Hierarchical Model
• Network Model
• Entity-relationship Model
• Relational Model.
Hierarchical Model
In this model, a child node will only have a single parent node.
This was the most widely used database model, before Relational Model was
introduced.
E-R Models are defined to represent the relationships into pictorial form to
make it easier for different stakeholders to understand.
This model is good to design a database, which can then be turned into
tables in relational model.
The ER Model creates entity set, relationship set, general attributes and
constraints.
Mapping cardinalities −
1. one to one
2. one to many
3. many to one
4. many to many
Attributes:
An attribute can be of many types, here are different types of attributes
defined in ER database model:
Simple attribute: The attributes with values that are atomic and cannot be
broken down further are simple attributes. For example, student's age.
Composite attribute: A composite attribute is made up of more than one
simple attribute. For example, student's address will contain, house
no., street name, pin code etc.
Derived attribute: These are the attributes which are not present in the
whole database management system, but are derived using other attributes.
For example, average age of students in a class.
Relationship:
When an Entity is related to another Entity, they are said to have a
relationship. For example, A Class Entity is related to Student entity,
because students study in classes, hence this is a relationship.
• The logical model concentrates on the data requirements and the data to be
stored independent of physical considerations. It does not concern itself with
how the data will be stored or where it will be stored physically.
• The physical data design model involves translating the logical design of the
database onto physical media using hardware resources and software
systems such as database management systems (DBMS).
The database development life cycle has a number of stages that are
followed when developing database systems.
The steps in the development life cycle do not necessarily have to be followed
religiously in a sequential manner.
Requirements analysis
System definition - This stage defines the scope and boundaries of the
proposed database system.
Database designing
Physical model - This stage implements the logical model of the database
taking into account the DBMS and physical implementation factors.
Implementation
Data conversion and loading - this stage is concerned with importing and
converting data from the old system into the new database.
The most popular data model in DBMS is the Relational Model. It is a more
scientific model than the others. This model is based on first-order predicate
logic and defines a table as an n-ary relation.
Tuple: Each row of a relation is known as a tuple; e.g., the STUDENT relation given
below has 4 tuples.
NULL values: The values of some attributes for some tuples may be unknown, missing
or undefined; these are represented by NULL. Two NULL values in a relation are
considered different from each other.
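The rule that two NULLs never compare equal can be observed directly in SQL; here is a minimal sketch using Python's sqlite3 module (any SQL engine behaves the same way for this comparison).

```python
import sqlite3

# Sketch: NULL comparison semantics in SQL.
con = sqlite3.connect(":memory:")

# NULL = NULL evaluates to "unknown", returned as NULL (Python None),
# so two NULL values never compare equal -- which is why they are
# treated as distinct from each other.
result = con.execute("SELECT NULL = NULL").fetchone()[0]
```

This is why SQL provides `IS NULL` / `IS NOT NULL` instead of `= NULL` for testing missing values.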
Table 1 and Table 2 represent relational model having two relations STUDENT
and STUDENT_COURSE.
Column: Column represents the set of values for a particular attribute. The
column STUD_NO is extracted from relation STUDENT.
Super Key: A super key is defined as a set of attributes within a table that can uniquely
identify each record in the table. A super key is a superset of a candidate
key. In the table defined above, super keys would include student_id,
(student_id, name), phone, etc.
Candidate Key: The minimal set of attributes which can uniquely identify a
tuple is known as a candidate key.
The value of Candidate Key is unique and non-null for every tuple.
There can be more than one candidate key in a relation. For Example,
STUD_NO as well as STUD_PHONE both are candidate keys for relation
STUDENT.
The candidate key can be simple (having only one attribute) or composite
as well. For Example, {STUD_NO, COURSE_NO} is a composite
candidate key for relation STUDENT_COURSE.
Primary Key: There can be more than one candidate key in a relation out of
which one can be chosen as primary key.
For Example, STUD_NO as well as STUD_PHONE both are candidate
keys for relation STUDENT but STUD_NO can be chosen as primary key
(only one out of many candidate keys).
Alternate Key: A candidate key other than the primary key is called an
alternate key.
Foreign Key: Foreign keys are the columns of a table that point to the
primary key of another table. They act as a cross-reference between tables.
Unlike the primary key of a relation, a foreign key can be NULL and may
contain duplicate values, i.e., it need not follow the uniqueness
constraint.
Composite Key: A key that consists of two or more attributes that together
uniquely identify any record in a table is called a composite key. The attributes
which together form the composite key are not keys independently or
individually.
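The key types above can be sketched as table definitions; the following uses Python's sqlite3, where the data types and the sample row values are assumptions for illustration, while the table and column names follow the text.

```python
import sqlite3

# Sketch of the STUDENT and STUDENT_COURSE relations discussed above.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE STUDENT (
        STUD_NO    INTEGER PRIMARY KEY,   -- the chosen candidate key
        STUD_NAME  TEXT,
        STUD_PHONE TEXT UNIQUE            -- alternate key (other candidate key)
    )
""")
con.execute("""
    CREATE TABLE STUDENT_COURSE (
        STUD_NO   INTEGER,
        COURSE_NO INTEGER,
        PRIMARY KEY (STUD_NO, COURSE_NO)  -- composite candidate key
    )
""")
con.execute("INSERT INTO STUDENT VALUES (1, 'A', '111')")

# A second row with the same STUD_NO violates the primary key,
# so the DBMS rejects it.
duplicate_rejected = False
try:
    con.execute("INSERT INTO STUDENT VALUES (1, 'B', '222')")
except sqlite3.IntegrityError:
    duplicate_rejected = True
```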
Constraints in Relational Model
Referential Integrity: When one attribute of a relation can only take values
that are present in another attribute of the same relation or of any other relation,
this is called referential integrity. Let us suppose we have 2 relations:
STUDENT
BRANCH
BRANCH_CODE of STUDENT can only take the values which are present
in BRANCH_CODE of BRANCH which is called referential integrity
constraint.
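The constraint between STUDENT and BRANCH can be sketched with a FOREIGN KEY clause; a minimal example using Python's sqlite3, where the data types and sample rows are assumptions (SQLite also requires a pragma to enforce foreign keys).

```python
import sqlite3

# Sketch of the referential-integrity constraint from the text.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
con.execute("CREATE TABLE BRANCH (BRANCH_CODE TEXT PRIMARY KEY, BRANCH_NAME TEXT)")
con.execute("""
    CREATE TABLE STUDENT (
        STUD_NO     INTEGER PRIMARY KEY,
        BRANCH_CODE TEXT REFERENCES BRANCH(BRANCH_CODE)  -- foreign key
    )
""")
con.execute("INSERT INTO BRANCH VALUES ('CS', 'Computer Science')")
con.execute("INSERT INTO STUDENT VALUES (1, 'CS')")  # OK: 'CS' exists in BRANCH

# A BRANCH_CODE not present in BRANCH violates referential integrity.
fk_rejected = False
try:
    con.execute("INSERT INTO STUDENT VALUES (2, 'EE')")
except sqlite3.IntegrityError:
    fk_rejected = True
```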
CONCEPT OF NORMALIZATION
If a database design is not perfect, it may contain anomalies, which are like a
bad dream for any database administrator. Managing a database with
anomalies is next to impossible.
Update anomalies − If data items are scattered and are not linked to each
other properly, then it could lead to strange situations. For example, when
we try to update one data item having its copies scattered over several
places, a few instances get updated properly while a few others are left with
old values. Such instances leave the database in an inconsistent state.
Insert anomalies − We tried to insert data in a record that does not exist at
all.
A table is in 1NF iff:
• each attribute contains only atomic (indivisible) values,
• the values stored in an attribute are of the same domain,
• all attribute (column) names are unique, and
• the order in which the data is stored does not matter.
To be in second normal form, a relation must be in first normal form and relation
must not contain any partial dependency. A relation is in 2NF if it has No Partial
Dependency, i.e., no non-prime attribute (attributes which are not part of any
candidate key) is dependent on any proper subset of any candidate key of the table.
We see here in Student_Project relation that the prime key attributes are
Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e.
Stu_Name and Proj_Name must be dependent upon both and not on any of
the prime key attribute individually. But we find that Stu_Name can be
identified by Stu_ID and Proj_Name can be identified by Proj_ID
independently. This is called partial dependency, which is not allowed in
Second Normal Form.
We broke the relation in two as depicted in the above picture, So there exists
no partial dependency.
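The decomposition can be sketched in SQL via Python's sqlite3; the relation and column names follow the text, while the exact split (a Student relation carrying Proj_ID, and a separate Project relation) and the sample data are assumptions for illustration.

```python
import sqlite3

# Sketch: removing the partial dependency by splitting Student_Project.
con = sqlite3.connect(":memory:")
# After decomposition, Stu_Name depends on the whole key of Student and
# Proj_Name on the whole key of Project -- no partial dependency remains.
con.execute("""CREATE TABLE Student (
    Stu_ID INTEGER PRIMARY KEY, Stu_Name TEXT, Proj_ID INTEGER)""")
con.execute("""CREATE TABLE Project (
    Proj_ID INTEGER PRIMARY KEY, Proj_Name TEXT)""")
con.execute("INSERT INTO Project VALUES (101, 'Library System')")
con.execute("INSERT INTO Student VALUES (1, 'A', 101)")

# The original Student_Project view is recovered with a join.
row = con.execute("""
    SELECT s.Stu_Name, p.Proj_Name
    FROM Student s JOIN Project p ON s.Proj_ID = p.Proj_ID
""").fetchone()
```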
A relation is in 3NF if at least one of the following conditions holds for every
non-trivial functional dependency X → Y:
• X is a super key.
• Y is a prime attribute (each element of Y is part of some candidate key).
We find that in the above Student_detail relation, Stu_ID is the key and only
prime key attribute. We find that City can be identified by Stu_ID as well as
Zip itself. Neither Zip is a superkey nor is City a prime attribute.
To bring this relation into third normal form, we break the relation into two
relations as follows:
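One common way to perform this split is into a student relation and a Zip-to-City relation, so that City is reached only through Zip; a sketch using Python's sqlite3, with the second table's name and all sample data assumed for illustration.

```python
import sqlite3

# Sketch of the 3NF decomposition of Student_detail: the transitive
# dependency Stu_ID -> Zip -> City is broken by moving Zip -> City
# into its own relation.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Student_detail (
    Stu_ID INTEGER PRIMARY KEY, Stu_Name TEXT, Zip TEXT)""")
con.execute("""CREATE TABLE Zip_codes (
    Zip TEXT PRIMARY KEY, City TEXT)""")
con.execute("INSERT INTO Zip_codes VALUES ('110001', 'New Delhi')")
con.execute("INSERT INTO Student_detail VALUES (1, 'A', '110001')")

# City is now obtained through the Zip relation via a join.
city = con.execute("""
    SELECT z.City FROM Student_detail s
    JOIN Zip_codes z ON s.Zip = z.Zip WHERE s.Stu_ID = 1
""").fetchone()[0]
```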
To make this relation(table) satisfy BCNF, we will decompose this table into
two tables, student table and professor table.
Student Table:
Professor Table:
SQL commands are mainly categorized into four categories: DDL (Data Definition
Language), DML (Data Manipulation Language), DCL (Data Control Language) and
TCL (Transaction Control Language). The DDL commands are:
• CREATE
• DROP
• ALTER
• TRUNCATE
• RENAME
CREATE COMMAND
There are two CREATE statements available in SQL:
1. CREATE DATABASE
2. CREATE TABLE
CREATE DATABASE
A database is defined as a structured set of data, so in SQL the very first step to
store data in a well-structured manner is to create a database. The CREATE
DATABASE statement is used to create a new database in SQL.
Syntax:
CREATE DATABASE database_name;
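CREATE TABLE can be tried out directly; a minimal sketch using Python's sqlite3 (note that SQLite has no CREATE DATABASE statement: connecting to a file plays that role, and the table and column names below are illustrative assumptions).

```python
import sqlite3

# Sketch of CREATE TABLE. In SQLite, connect() itself creates the
# database; use a filename instead of ":memory:" to persist it.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Student (
        Roll_No INTEGER PRIMARY KEY,
        Name    TEXT,
        Phone   TEXT
    )
""")

# The system catalog records the new table.
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
```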
DROP COMMAND
Example:
DROP TABLE table_name;
ALTER TABLE-ADD
ADD is used to add columns to an existing table. Sometimes we may need to store
additional information; in that case we do not need to create the whole table
again, since ADD comes to our rescue.
Syntax (Oracle, MySQL, MariaDB):
ALTER TABLE table_name ADD column_name column_definition;
ALTER TABLE-MODIFY
MODIFY is used to change the definition (for example, the data type) of an
existing column.
Syntax (Oracle, MySQL, MariaDB):
ALTER TABLE table_name MODIFY column_name column_definition;
ALTER TABLE-RENAME
Columns can also be given a new name with the use of ALTER TABLE.
Syntax (Oracle):
ALTER TABLE table_name RENAME COLUMN old_name TO new_name;
Syntax (MySQL, MariaDB):
ALTER TABLE table_name CHANGE COLUMN old_name new_name column_definition;
TRUNCATE COMMAND
Truncate means that the data will be deleted but the table structure will remain in
memory for further operations.
The SQL commands that deal with the manipulation of data present in the
database belong to DML (Data Manipulation Language), and this includes
most of the SQL statements.
INSERT COMMAND
The INSERT INTO statement of SQL is used to insert a new row in a table.
There are two ways of using INSERT INTO statement for inserting rows:
Only values: First method is to specify only the value of data to be inserted
without the column names.
INSERT INTO Table_name VALUES (value1, value2, value3,…);
value1, value2,.. : value of first column, second column,… for the new
record.
Column names and values both: In the second method we will specify
both the columns which we want to fill and their corresponding values as
shown below:
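Both forms of INSERT INTO can be sketched as follows, using Python's sqlite3; the table name, columns and values are assumptions for illustration.

```python
import sqlite3

# Sketch of the two INSERT INTO forms described above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (Roll_No INTEGER, Name TEXT, Phone TEXT)")

# 1) Only values: one value per column, in table-definition order.
con.execute("INSERT INTO Student VALUES (1, 'A', '111')")

# 2) Column names and values: columns not listed (Phone) are left NULL.
con.execute("INSERT INTO Student (Roll_No, Name) VALUES (2, 'B')")

rows = con.execute("SELECT * FROM Student").fetchall()
# rows -> [(1, 'A', '111'), (2, 'B', None)]
```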
We can use the SELECT statement with INSERT INTO statement to copy
rows from one table and insert them into another Table. The use of this
statement is similar to that of INSERT INTO statement.
The difference is that the SELECT statement is used here to select data from
a different table. The different ways of using INSERT INTO SELECT
statement are shown below:
Inserting all columns of a table: We can copy all the data of a table and
insert it into a different table.
We have used the SELECT statement to copy the data from one table and the
INSERT INTO statement to insert it into a different table.
Inserting specific columns of a table: We can copy only those columns of
a table which we want to insert into a different table.
We have used the SELECT statement to copy the data of the selected
columns only from the second table and the INSERT INTO statement to insert
it into the first table.
Copying specific rows from a table: We can copy specific rows from a
table to insert into another table by using WHERE clause with the SELECT
statement. We have to provide appropriate condition in the WHERE clause
to select specific rows.
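The three INSERT INTO ... SELECT variants reduce to the same pattern; here is a sketch of the row-copying case using Python's sqlite3, with table names, columns and the WHERE condition all invented for illustration.

```python
import sqlite3

# Sketch: copying specific rows from one table into another with
# INSERT INTO ... SELECT ... WHERE.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE OldStudents (Roll_No INTEGER, Name TEXT)")
con.execute("CREATE TABLE NewStudents (Roll_No INTEGER, Name TEXT)")
con.executemany("INSERT INTO OldStudents VALUES (?, ?)",
                [(1, 'A'), (2, 'B'), (3, 'C')])

# Only the rows matching the WHERE condition are copied.
con.execute("""
    INSERT INTO NewStudents (Roll_No, Name)
    SELECT Roll_No, Name FROM OldStudents WHERE Roll_No > 1
""")

copied = con.execute("SELECT * FROM NewStudents").fetchall()
# copied -> [(2, 'B'), (3, 'C')]
```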
UPDATE COMMAND
In the above query, the SET clause is used to assign new values to the
particular column, and the WHERE clause is used to select the rows whose
columns need to be updated. If the WHERE clause is omitted, the columns in
all the rows will be updated, so the WHERE clause is used to choose the
particular rows.
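The effect of the WHERE clause on UPDATE can be sketched with Python's sqlite3; the table, columns and values below are assumptions for illustration.

```python
import sqlite3

# Sketch: UPDATE with a WHERE clause touches only the matching rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (ID INTEGER, SALARY INTEGER)")
con.executemany("INSERT INTO Employee VALUES (?, ?)", [(1, 80), (2, 40)])

# WHERE limits the update to row ID = 1 only.
con.execute("UPDATE Employee SET SALARY = 90 WHERE ID = 1")
# Without WHERE, every row's SALARY would become 90:
# con.execute("UPDATE Employee SET SALARY = 90")

salaries = con.execute("SELECT SALARY FROM Employee ORDER BY ID").fetchall()
# salaries -> [(90,), (40,)]
```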
DELETE COMMAND
Example:
ID  NAME  SALARY
1   A     80
2   B     40
3   C     60
4   D     70
5   E     60
6   F     NULL
Count():
Count(salary): Returns the number of non-NULL values in the Salary column,
i.e., 5.
Count(Distinct Salary): Returns the number of distinct non-NULL values in the
Salary column, i.e., 4.
Sum():
Sum(salary): Sums all non-NULL values of the Salary column, i.e., 310.
Min():
Min(salary): Minimum value in the Salary column excluding NULL, i.e., 40.
Max():
Max(salary): Maximum value in the Salary column excluding NULL, i.e., 80.
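These aggregate results can be reproduced on the sample table above; a sketch using Python's sqlite3, where only the table name Emp is an assumption.

```python
import sqlite3

# Sketch: aggregate functions over the sample table, including a NULL salary.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Emp (ID INTEGER, NAME TEXT, SALARY INTEGER)")
con.executemany("INSERT INTO Emp VALUES (?, ?, ?)", [
    (1, 'A', 80), (2, 'B', 40), (3, 'C', 60),
    (4, 'D', 70), (5, 'E', 60), (6, 'F', None),  # None -> NULL salary
])

row = con.execute("""
    SELECT COUNT(SALARY),           -- 5: the NULL value is not counted
           COUNT(DISTINCT SALARY),  -- 4: 80, 40, 60, 70
           SUM(SALARY),             -- 310: NULL is ignored
           MIN(SALARY),             -- 40: NULL is ignored
           MAX(SALARY)              -- 80: NULL is ignored
    FROM Emp
""").fetchone()
# row -> (5, 4, 310, 40, 80)
```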
The result of data mining is the patterns and knowledge that we gain at the
end of the extraction process. In that sense, Data Mining is also known as
Knowledge Discovery or Knowledge Extraction.
Nowadays, data mining is used in almost all places where a large
amount of data is stored and processed.
1. Financial Analysis
2. Biological Analysis
3. Scientific Analysis etc.
Real life example of Data Mining –
Data Warehousing:
A data warehouse consolidates data from many sources while ensuring data
quality, consistency and accuracy. A data warehouse improves system
performance by separating analytics processing from transactional
databases. Data flows into a data warehouse from the various databases. A
data warehouse works by organizing data into a schema which describes the
layout and type of data. Query tools analyze the data tables using the schema.
Data Warehousing Process
Hosting a cloud:
There are three layers in cloud computing. Companies use these layers based
on the service they provide.
Infrastructure
Platform
Application
At the bottom is the foundation, the Infrastructure, where people start and
begin to build. This is the layer where cloud hosting lives.
Hosting :
Let's say you have a company and a website, and the website has a lot of
communication exchanged between members. You start with a few
members talking with each other, and then gradually the number of
members increases.
A few years ago, websites were put on a server somewhere; you had to go out
and buy and set up a number of servers. It costs a lot of
money and takes a lot of time, and you pay for these servers both when you are
using them and when you are not. This is called Hosting.
Scalability: With Cloud hosting, it is easy to grow and shrink the number
and size of servers based on the need.
Along with purchase cost, off-site hardware cuts internal power costs and
saves space. Large data centers can take up precious office space and
produce a large amount of heat. Moving to cloud applications or storage can
help maximize space and significantly cut energy expenditures.
Physical Security: The underlying physical servers are still housed within
data centers and so benefit from the security measures that those facilities
implement to prevent people accessing or disrupting them on-site.