UNIT – I
Data resource management - Database concepts, the traditional approaches,
the modern approaches (database management approaches), DBMS, data
models, data warehousing and mining.
Data is generally organized into characters, fields, records, files, and databases, which are called the
logical data elements.
Character : A character consists of a single alphabetic, numeric, or other symbol, which is stored as one
or more bytes, each made up of bits.
Field : The ‘field’, which is the next higher level of data, is a combination of related characters. A ‘field’
is also termed a data item. For example, the fields of an employee may be ‘employee_name’, ‘sex’,
‘address’, etc.
Record : A collection of related fields that describes a single instance of an entity is known as a
‘record’. For example, the fields student_name, address, Roll_no, Marks, etc. together form the record
of a student.
File : A group of related records is known as a file. In other words, any collection of related records in
the form of rows and columns (tabular form) is called a file. For example, if there are many students in a
class, then a group of related records would form a ‘student_file’.
Database : A collection of related files is known as a database. An information system application
may have several related files and all the related files would constitute a database for that application. For
example, in a salary processing system, the files may be employee_file, provident_fund_file,
income_tax_file, etc.
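The hierarchy of logical data elements described above can be sketched in a few lines of Python. This is only an illustration: the class name `StudentRecord` and the sample values are assumptions, not part of the original notes.

```python
from dataclasses import dataclass

# A record groups the related fields that describe one instance of an entity.
@dataclass
class StudentRecord:
    roll_no: int        # each attribute is a field (data item)
    student_name: str   # a field is itself a combination of characters
    address: str
    marks: int

# A file is a group of related records (rows of a table).
student_file = [
    StudentRecord(1, "Asha", "Pune", 82),   # hypothetical sample data
    StudentRecord(2, "Ravi", "Delhi", 74),
]

# A database is a collection of related files.
database = {"student_file": student_file}

first = database["student_file"][0]
characters = list(first.student_name)  # the characters that make up one field
print(characters)
print(len(database["student_file"]))
```

Reading from the bottom up, `characters` is the lowest level, `student_name` is a field, `first` is a record, `student_file` is a file, and `database` holds all the files of the application.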
The software that allows an organization to centralize data, manage it efficiently, and provide access to
the database by application programs is known as a Database Management System (DBMS). The
DBMS thus solves the problems of the traditional file processing environment.
Advantages of Database : The database approach provides the following benefits over file
management systems.
Redundancy control : In a file management system, each application has its own data, which causes
duplication of common data items in more than one file. This duplication needs more storage space
as well as multiple updates for a single transaction. This problem is overcome in the database approach,
where data is stored only once.
Data consistency : The problem of updating multiple files in a file management system leads to inaccurate
data, as different files may contain different values of the same data item at a given point of time. In
the database approach, this problem of inconsistent data is automatically solved by the control of
redundancy.
Management queries : The database approach, in most information systems, pools many of the
organization-wide files at one place, known as the central database, and is thus capable of answering
management queries that relate to more than one functional area.
Data independence : The database approach provides independence between the file structure and the
program structure, so the way data is stored can change without the application programs being
rewritten. This gives flexibility to the application programs in a Database Management System (DBMS)
environment.
Data protection : Data protection and security is one of the major concerns in a database. A DBMS protects
the data against access by unauthorized users, physical damage, operating system failure, simultaneous
updates, etc.
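The redundancy and consistency points can be illustrated with a small sketch. It assumes two hypothetical applications (payroll and income tax) that both need an employee's address; the dictionaries stand in for files and for a shared database.

```python
# File-management approach: each application keeps its own copy of the address.
payroll_file = {"E1": {"name": "Asha", "address": "Pune"}}
tax_file = {"E1": {"name": "Asha", "address": "Pune"}}

# The employee moves, but only the payroll application is updated.
payroll_file["E1"]["address"] = "Mumbai"
files_consistent = payroll_file["E1"]["address"] == tax_file["E1"]["address"]

# Database approach: the address is stored once and shared by both applications.
employees = {"E1": {"name": "Asha", "address": "Pune"}}
payroll_view = {"E1": {"employee": employees["E1"], "basic_pay": 30000}}
tax_view = {"E1": {"employee": employees["E1"], "tax_slab": "B"}}

employees["E1"]["address"] = "Mumbai"  # a single update reaches every user
db_consistent = (payroll_view["E1"]["employee"]["address"]
                 == tax_view["E1"]["employee"]["address"])

print(files_consistent, db_consistent)
```

With duplicated files the two copies disagree after one update, while the shared record stays consistent because there is only one stored value to change.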
Hierarchical model : In the hierarchical structure, the relationships between records are stored in the
form of a hierarchy, i.e. an inverted tree with the root at the top and branches below. A lower-level
record is known as the ‘child’ of the record at the next higher level, whereas the higher-level record is
called the ‘parent’ of its child records. Relationships among records are one-to-many: a parent may have
many children, but each child has only one parent.
Network model : The network model allows more complex 1:M or M:M logical relationships among
entities. The relationships are stored in the form of a linked-list structure in which subordinate records,
called members, can be linked to more than one owner (parent).
Relational data model : In a relational structure, data is organized in two-dimensional tables, called
relations, each of which is implemented as a file. In the relational model, each row of the table is referred
to as a ‘tuple’ and each column as an ‘attribute’. A tuple is a set of data item values relating to one
entity.
Object-oriented model : The object-oriented model is an approach to data management that stores both
data and the operations that can be performed upon the data as objects. Whereas traditional DBMS are
designed for homogeneous data that can be structured into pre-defined data fields and records,
object-oriented databases are capable of manipulating heterogeneous data that include drawings, images,
photographs, voice and full-motion video.
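The difference between the hierarchical and network models can be sketched as follows. The record names (`Company`, `Sales`, `Project-X`, and so on) are hypothetical, and plain dictionaries stand in for the stored links.

```python
# Hierarchical model: every child record has exactly one parent (a tree).
hierarchy_parent = {
    "Sales": "Company",
    "Accounts": "Company",
    "Asha": "Sales",
    "Ravi": "Accounts",
}

def path_to_root(record):
    """Follow parent links upward until the root is reached."""
    path = [record]
    while path[-1] in hierarchy_parent:
        path.append(hierarchy_parent[path[-1]])
    return path

# Network model: a member record may be linked to more than one owner.
network_owners = {
    "Asha": ["Sales", "Project-X"],
    "Ravi": ["Accounts", "Project-X"],
}

def members_of(owner):
    """List every member record linked to the given owner."""
    return sorted(m for m, owners in network_owners.items() if owner in owners)

print(path_to_root("Asha"))
print(members_of("Project-X"))
```

In the tree there is a single path from any record to the root, whereas in the network structure `Asha` belongs to both `Sales` and `Project-X`, which a strict hierarchy cannot express without duplicating the record.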
Any query on a single table can be performed by using only two basic operators, namely SELECT and
PROJECT. The SELECT operator selects a set of records (rows) from the table, whereas PROJECT takes
out selected fields (columns) from the table.
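As a rough sketch, the two operators can be written as plain Python functions over a table held as a list of rows. The function names `select` and `project` and the sample rows are assumptions made for illustration; they are not SQL itself.

```python
def select(relation, predicate):
    """SELECT: keep the rows (tuples) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, columns):
    """PROJECT: keep only the named columns, dropping duplicate rows."""
    result = []
    for row in relation:
        reduced = {col: row[col] for col in columns}
        if reduced not in result:
            result.append(reduced)
    return result

# Hypothetical sample table.
course = [
    {"CNO": "C101", "CTITLE": "Databases", "CREDITS": 4, "STDNO": 18},
    {"CNO": "C102", "CTITLE": "Networks", "CREDITS": 3, "STDNO": 35},
    {"CNO": "C103", "CTITLE": "Statistics", "CREDITS": 4, "STDNO": 20},
]

small = select(course, lambda row: row["STDNO"] < 21)   # pick rows
titles = project(small, ["CTITLE"])                     # pick columns
credits = project(course, ["CREDITS"])                  # duplicates collapse
print(titles)
print(credits)
```

Combining the two, as in `project(select(...), ...)`, corresponds to an SQL statement with a column list and a WHERE clause.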
Another operator, JOIN, is used in SQL when the query requires more than one table. JOIN links or
combines two tables over a common field.
CREATE TABLE Course (CNO CHAR(5), CTITLE CHAR(25), CREDITS INTEGER, STDNO INTEGER, TCODE CHAR(3));
CREATE TABLE Teacher (TCODE CHAR(3), NAME CHAR(20), DEPTT CHAR(5), DESIG CHAR(12), PHONE CHAR(6));
For example, suppose we want to know the course(s) in which the number of students is less than 21, using
our earlier database stored as the relation Course (Table 5.1).
SELECT CNO, CTITLE, CREDITS, STDNO FROM Course WHERE STDNO < 21
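The query above, and a JOIN over the common field TCODE, can be tried with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the first table is taken to be named Course, as the query requires, and every inserted row is hypothetical because the contents of Table 5.1 are not reproduced in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course (CNO CHAR(5), CTITLE CHAR(25), CREDITS INTEGER,
                     STDNO INTEGER, TCODE CHAR(3));
CREATE TABLE Teacher (TCODE CHAR(3), NAME CHAR(20), DEPTT CHAR(5),
                      DESIG CHAR(12), PHONE CHAR(6));
""")
# Hypothetical sample rows.
conn.executemany("INSERT INTO Course VALUES (?, ?, ?, ?, ?)", [
    ("C101", "Databases", 4, 18, "T01"),
    ("C102", "Networks", 3, 35, "T02"),
])
conn.executemany("INSERT INTO Teacher VALUES (?, ?, ?, ?, ?)", [
    ("T01", "Mehta", "CS", "Professor", "555101"),
    ("T02", "Khan", "CS", "Lecturer", "555102"),
])

# The query from the text: courses with fewer than 21 students.
small_courses = conn.execute(
    "SELECT CNO, CTITLE, CREDITS, STDNO FROM Course WHERE STDNO < 21"
).fetchall()

# JOIN: combine Course and Teacher over the common field TCODE.
joined = conn.execute(
    "SELECT Course.CTITLE, Teacher.NAME FROM Course "
    "JOIN Teacher ON Course.TCODE = Teacher.TCODE ORDER BY Course.CNO"
).fetchall()
print(small_courses)
print(joined)
```

The WHERE clause plays the role of SELECT (choosing rows), the column list plays the role of PROJECT (choosing columns), and the ON clause names the common field that links the two tables.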
E-R Diagrams
Entity-relationship diagrams, popularly known as E-R diagrams, are graphical representations of
various entities and their relationships. E-R diagrams are used as the first step in designing and
creating a physical data model comprising tables and relationships. In this respect, E-R diagrams are
similar to data flow diagrams, with the difference that E-R diagrams focus on the need for
and use of data.
There are three types of relationships that may exist among entities, namely, one-to-one, one-to-many,
and many-to-many.
A one-to-one (1:1) relationship is an association in which each instance of one entity is related to at
most one instance of the other. For example, the relationship between husband and wife, where the
husband is allowed one wife at a time and vice versa (see Figure 5.9).
A one-to-many (1:M) relationship represents an entity that may have two or more entities associated with
it. For example, a father may have many children and a state may have many districts, but each child has
only one father and each district belongs to only one state (see Figure 5.10).
A many-to-many (M:M) relationship describes entities which may have many relationships both ways.
For example, teachers and students where a teacher teaches many students and a student attends the
classes of many teachers (see Figure 5.11).
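When an E-R diagram is turned into tables, the relationship types are commonly represented as sketched below, again using Python's `sqlite3` module. The table and column names are assumptions chosen to match the state/district and teacher/student examples; the `teaches` junction table is one usual way to hold a many-to-many relationship, not something prescribed by these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 1:M : each district row points to exactly one state.
CREATE TABLE state (state_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE district (district_id INTEGER PRIMARY KEY, name TEXT,
                       state_id INTEGER REFERENCES state(state_id));

-- M:M : a junction table pairs teachers with students.
CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE teaches (teacher_id INTEGER REFERENCES teacher(teacher_id),
                      student_id INTEGER REFERENCES student(student_id),
                      PRIMARY KEY (teacher_id, student_id));

INSERT INTO state VALUES (1, 'Kerala');
INSERT INTO district VALUES (10, 'Kollam', 1), (11, 'Idukki', 1);
INSERT INTO teacher VALUES (1, 'Mehta'), (2, 'Khan');
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO teaches VALUES (1, 1), (1, 2), (2, 1);
""")

districts = conn.execute(
    "SELECT name FROM district WHERE state_id = 1 ORDER BY name").fetchall()
teachers_of_asha = conn.execute(
    "SELECT t.name FROM teacher t JOIN teaches x ON t.teacher_id = x.teacher_id "
    "WHERE x.student_id = 1 ORDER BY t.name").fetchall()
students_of_mehta = conn.execute(
    "SELECT s.name FROM student s JOIN teaches x ON s.student_id = x.student_id "
    "WHERE x.teacher_id = 1 ORDER BY s.name").fetchall()
print(districts, teachers_of_asha, students_of_mehta)
```

A one-to-one relationship can be held the same way as the 1:M case, with the extra condition that the referencing column is declared UNIQUE so that no two rows point to the same partner.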
Normalization of Database
Database normalization is a technique of organizing the data in the database. Normalization is a
systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics
like insertion, update and deletion anomalies.
Update Anomaly : To update the address of a student who occurs twice or more in a
table, we will have to update the S_Address column in all the rows, else the data will become
inconsistent.
Insertion Anomaly : Suppose for a new admission we have the student id (S_id), name and
address of a student, but the student has not opted for any subjects yet. We then have to
insert NULL in the subject column, leading to an insertion anomaly.
Deletion Anomaly : If the student with S_id 401 has only one subject and temporarily drops it, then
when we delete that row, the entire student record will be deleted along with it.
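The update and deletion anomalies can be reproduced on a small unnormalized table, and then compared with a decomposed design. This is a sketch: the table names `flat`, `student` and `enrolment`, the `S_Name` and `Subject` columns, and all sample rows are assumptions; only S_id, S_Address and the student id 401 come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flat (S_id INTEGER, S_Name TEXT, S_Address TEXT, Subject TEXT);
INSERT INTO flat VALUES (401, 'Adam', 'Noida', 'Maths'),
                        (402, 'Alex', 'Delhi', 'Maths'),
                        (402, 'Alex', 'Delhi', 'Physics');
""")

# Update anomaly: changing only one of student 402's rows leaves two addresses.
conn.execute("UPDATE flat SET S_Address = 'Pune' "
             "WHERE S_id = 402 AND Subject = 'Maths'")
addresses = conn.execute(
    "SELECT COUNT(DISTINCT S_Address) FROM flat WHERE S_id = 402").fetchone()[0]

# Deletion anomaly: dropping student 401's only subject removes the student.
conn.execute("DELETE FROM flat WHERE S_id = 401 AND Subject = 'Maths'")
left_flat = conn.execute(
    "SELECT COUNT(*) FROM flat WHERE S_id = 401").fetchone()[0]

# Decomposed design: student details and subject choices are kept apart.
conn.executescript("""
CREATE TABLE student (S_id INTEGER PRIMARY KEY, S_Name TEXT, S_Address TEXT);
CREATE TABLE enrolment (S_id INTEGER REFERENCES student(S_id), Subject TEXT);
INSERT INTO student VALUES (401, 'Adam', 'Noida');
INSERT INTO enrolment VALUES (401, 'Maths');
""")
conn.execute("DELETE FROM enrolment WHERE S_id = 401")
left_normalised = conn.execute(
    "SELECT COUNT(*) FROM student WHERE S_id = 401").fetchone()[0]
print(addresses, left_flat, left_normalised)
```

In the decomposed design a student can also be inserted before choosing any subject, simply by adding a row to `student` and none to `enrolment`, which avoids the NULL described under the insertion anomaly.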
First Normal Form (1NF) : As per First Normal Form, no row of data may contain a repeating
group of information, i.e. each column must hold a single (atomic) value, and each row must be
uniquely identifiable so that it can be fetched without ambiguity.
For example, consider a table which is not in first normal form.
Converting it to first normal form results in the following table:
Student    Age    Subject
Stuart     17     Maths
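The conversion to first normal form can be sketched as a small Python function. The starting rows are assumptions: the "Adam" row with two subjects in one field is hypothetical and is included only to show a repeating group, while the Stuart row matches the table above.

```python
# Hypothetical non-1NF rows: the Subject field may hold several values.
unnormalised = [
    {"Student": "Adam", "Age": 15, "Subject": "Biology, Maths"},
    {"Student": "Stuart", "Age": 17, "Subject": "Maths"},
]

def to_first_normal_form(rows):
    """Emit one row per (student, subject) so that every field is atomic."""
    result = []
    for row in rows:
        for subject in row["Subject"].split(","):
            result.append({"Student": row["Student"], "Age": row["Age"],
                           "Subject": subject.strip()})
    return result

normalised = to_first_normal_form(unnormalised)
for row in normalised:
    print(row["Student"], row["Age"], row["Subject"])
```

After the conversion each row holds exactly one subject, at the cost of repeating the student's name and age, which is the redundancy that the higher normal forms then address.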
Note: The purpose of a data warehouse is permanent storage of detailed information. Data entered into a
data warehouse needs to be processed to ensure that it is clean, complete, and in the proper format.
Often, a data warehouse is subdivided into smaller repositories called data marts. A data mart is
a subset of a data warehouse, in which only the required portion of the data warehouse information is
kept.
Data warehouse has the following important characteristics:
(i) Subject-oriented, i.e., it focuses on modelling and analysis of data relating to a specific area.
(ii) Integrated, i.e., the data warehouse is an integration of data from various different
applications/systems like ERP systems, CRM systems, SCM systems, etc.
(iii) Time-variant, i.e., the data warehouse keeps a historical perspective in its
approach, for example, the past 5-10 years.
(iv) Non-volatile, i.e., data is stored permanently; once stored, it cannot be updated.
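These characteristics, together with the idea of a data mart, can be sketched with Python's `sqlite3` module. Everything named here is an assumption for illustration: the `sales_fact` table, the `north_sales_mart` view, the trigger used to imitate non-volatility, and the sample rows. A real warehouse product would enforce these properties with its own mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Integrated and time-variant: rows from several source systems, kept by year.
CREATE TABLE sales_fact (source_system TEXT, region TEXT,
                         sale_year INTEGER, amount REAL);

-- Non-volatile: reject any attempt to change a stored row.
CREATE TRIGGER sales_fact_no_update BEFORE UPDATE ON sales_fact
BEGIN
    SELECT RAISE(ABORT, 'warehouse rows are read-only');
END;

-- Data mart: only the portion of the warehouse one department needs.
CREATE VIEW north_sales_mart AS
    SELECT sale_year, amount FROM sales_fact WHERE region = 'North';

INSERT INTO sales_fact VALUES ('ERP', 'North', 2019, 120.0),
                              ('CRM', 'South', 2019, 80.0),
                              ('ERP', 'North', 2023, 150.0);
""")

try:
    conn.execute("UPDATE sales_fact SET amount = 0 WHERE sale_year = 2019")
    update_rejected = False
except sqlite3.DatabaseError:
    update_rejected = True  # the trigger aborted the update

mart_rows = conn.execute(
    "SELECT sale_year, amount FROM north_sales_mart ORDER BY sale_year").fetchall()
print(update_rejected, mart_rows)
```

New data is only ever appended with INSERT, so older years remain available for analysis, and the view exposes a subject-specific slice without copying the underlying rows.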
However, data warehouses or data marts in themselves are useless. To make data warehouses useful,
organizations must use BI (Business Intelligence) tools to process data from these huge databases into
meaningful information. These databases are used for data mining and Online Analytical Processing
(OLAP).
The organizations that develop Business Intelligence (BI) tools create interfaces that help managers
to quickly grasp business situations. Such an interface is simple to understand, so interpretation by
managers becomes easy. One such interface is called a dashboard, because it looks similar to a car
dashboard.
Data mining is the process of sorting through large data sets to identify patterns and establish
relationships to solve problems through data analysis. Data mining tools allow enterprises to predict
future trends.
Data mining has four main objectives:
• Sequence or path analysis: Finding patterns where one event leads to another,