
MANAGEMENT INFORMATION SYSTEMS – Dept of MBA SREC

UNIT – I
Data resource management - Database concepts, the traditional approach,
the modern approach (database management approach), DBMS, data
models, data warehousing and mining.

2.1 DATA BASE CONCEPTS:


An entity may be a tangible object, such as an employee, a student, a spare part or a place. It may also be
non-tangible, such as an event, a job title, a customer account, a profit centre or an abstract concept. An
entity can be described by its characteristics or features, such as name, age, designation, etc. These
characteristics or features of an entity are called attributes.

Data is generally organized into characters, fields, records, files, and databases, which are called the
logical data elements.

Character : A character consists of a single alphabetic, numeric, or other symbol. It is the smallest
logical data element and is stored as a group of bits, typically one byte.

Field : The ‘field’, which is the next higher level of data, is a combination of related characters. A ‘field’
is also termed a data item. For example, the fields that describe an employee may be ‘employee_name’,
‘sex’, ‘address’, etc.

Record : A collection of related fields that describes a single instance of an entity is known as a
‘record’. For example, student_name, address, Roll_no, Marks, etc. together form the record of one
student.

S. Md. Riyaz Naik, Assistant proff, Dept of CSE Page 1



File : A group of related records is known as a file. In other words, any collection of related records in
the form of rows and columns (tabular form) is called a file. For example, if there are many students in a
class, then the group of their records would form a ‘student_file’.

Database : A collection of related files is known as a database. An information system application
may have several related files, and all the related files together constitute the database for that
application. For example, in a salary processing system, the files may be employee_file,
provident_fund_file, income_tax_file, etc.
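
The hierarchy of logical data elements described above can be sketched in a few lines of Python. This is only an illustration: the sample values and the name teacher_file are assumptions made for the example, while student_name, address, Roll_no, Marks and student_file come from the text.

```python
# A record is a set of related fields describing one instance of an entity.
record = {"student_name": "Asha", "address": "Address 1", "Roll_no": 1, "Marks": 78}

# A character is the smallest element: each symbol of a field value.
characters = list(record["student_name"])

# A file is a collection of related records (rows with the same fields).
student_file = [
    record,
    {"student_name": "Ravi", "address": "Address 2", "Roll_no": 2, "Marks": 65},
]

# A database is a collection of related files (teacher_file is hypothetical).
database = {"student_file": student_file, "teacher_file": []}

print(characters)
print(len(student_file), "records in student_file")
print(len(database), "files in the database")
```

Each level simply groups the elements of the level below it, which is why the same student data can be read as characters, fields, a record, a file or part of a database.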

2.2 THE TRADITIONAL APPROACH


Traditionally, data files were developed and maintained separately for individual applications. Thus, in
the file processing system, data was segmented across the organization: every functional unit, such as
marketing, finance or production, maintained its own set of application programs and data files.

Problems with Traditional File Processing


This approach proved inadequate, especially when organizations started developing organization-wide
integrated applications. The major drawbacks of the file processing system are as follows.
(i) Data duplication
(ii) Data inconsistency
(iii) Lack of data integration
(iv) Data dependence
(v) Program dependence
Data duplication : In a file system, each application has its own data file, so the same data may have to
be recorded and stored in several files. For example, a payroll application and a personnel application will
both hold data on employee name, designation, etc. This results in unnecessary duplication (redundancy)
of common data items.
Data inconsistency : Data duplication leads to data inconsistency, especially when data is updated.
Inconsistency occurs because the same data items, which appear in more than one file, do not get
updated simultaneously in all the data files.
Lack of data integration : Because the data files are independent, users face difficulty in getting
information on any ad hoc query that requires accessing data stored in more than one file. Either
complicated programs have to be developed to retrieve data from each independent data file, or users
have to manually collect the required information from the outputs of separate applications.
Data dependence : The applications in file processing systems are data dependent. For example, in an
order processing application, the file may be organized as customer records sorted on last name, which
implies that any customer’s record can be retrieved only through his/her last name.
Program dependence : The reports produced by the file processing system are program dependent, which
implies that if any change is made to the format or structure of the data and records in the file, a
corresponding change has to be made in the programs.
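
The link between data duplication and data inconsistency can be illustrated with a small Python sketch. The two dictionaries below stand in for the separate payroll and personnel files; the employee number, name and designations are hypothetical values chosen for the example.

```python
# Each application keeps its own copy of the employee data (data duplication).
payroll_file = {"E01": {"name": "Kiran", "designation": "Clerk"}}
personnel_file = {"E01": {"name": "Kiran", "designation": "Clerk"}}


def find_inconsistencies(file_a, file_b):
    """Return the keys whose records differ between the two files."""
    return [key for key in file_a if key in file_b and file_a[key] != file_b[key]]


# A promotion is recorded by the personnel application only.
personnel_file["E01"]["designation"] = "Officer"

print(find_inconsistencies(payroll_file, personnel_file))
```

Because the update reached only one of the two files, the same employee now has two different designations, which is exactly the inconsistency described above.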

2.3 THE MODERN APPROACH


An alternative to the file processing system is the database management approach, which is also
called the modern approach for managing organizational data. A database is an organized collection of
records and files which are related to each other. In a database system, a common pool of data can be
shared by a number of applications, because the data is kept independent of the programs that use it.


Figure 5.2 Simplified View of a Database System

The software that allows an organization to centralize data, manage it efficiently, and provide
application programs with access to the database is known as a Database Management System (DBMS).
The DBMS thus addresses the problems of the traditional file processing environment.

Objectives of a Database

The specific objectives may be listed as follows.


(i) Controlled data redundancy
(ii) Enhanced data consistency
(iii) Data independence
(iv) Application independence
(v) Ease of use
(vi) Economical
(vii) Recovery from failure

Advantages of Database : The database approach provides the following benefits over file
management systems.
Redundancy control : In a file management system, each application has its own data, which causes
duplication of common data items in more than one file. This duplication needs more storage space
as well as multiple updates for a single transaction. The problem is overcome in the database approach,
where each data item is ideally stored only once.
Data consistency : The need to update multiple files in a file management system leads to inaccurate
data, as different files may contain different values of the same data item at a given point of time. In the
database approach, this problem of inconsistent data is largely solved by controlling redundancy.
Management queries : In most information systems, the database approach brings many of the
organization-wide files together in one place, known as the central database, and is thus capable of
answering management queries that relate to more than one functional area.

Data independence : Unlike the file processing system, the database approach provides independence
between the file structure and the program structure. This gives flexibility to the application programs in
a Database Management System (DBMS) environment.


Disadvantages of a Database

The disadvantages of the database approach are given below.


Centralized database : The data structure may become quite complex because the centralized database
supports many applications in an organization. This may lead to difficulties in its management and
may require a professional, experienced database designer and sometimes extensive training for users.
More disk space : The database approach generally requires more processing than a file management
system and, thus, needs more disk space for program storage.
Operationality of the system : Since the database is used by many users in the organization, any failure
in it, whether due to a system fault, database corruption, etc., affects the functioning of the whole
system, as it renders all users unable to access the database.
Security risk : Because the database is centralized, a single security breach can expose all of the
organization’s data. It therefore needs a high level of security.

Basic Database Architecture


Distributed databases : A distributed database, as the name indicates, is stored in more than one
physical location. Part of the database is stored in one location, while other parts are stored and
maintained in other locations. In other words, a distributed database coordinates data access from
various locations. In this approach, the database is designed as a single logical entity and its parts are
linked through communication networks.
Client-server systems : These systems are closely related to the concept of the distributed database. In
the client/server model, the database and processing power are distributed over the organization rather
than held in one centralized database. This model splits processing between ‘clients’ and ‘servers’ on a
network, assigning each function to the machine that is best able to perform it.

2.4 DATABASE MANAGEMENT SYSTEM


A DBMS is software that facilitates flexible management of data. It is generally composed of three
sub-systems, which are described as follows.
Database definition : In this sub-system, the complete database (schema) is described with the help of a
special language known as the Data Description Language (DDL), also called the Data Definition
Language.
Database manipulation : The stored data may be retrieved or updated later through the Data
Manipulation Language (DML). The manipulation sub-system can retrieve the required elements of data
(the sub-schema) in a variety of sequences.
Database support : This sub-system performs database utility or service functions, such as listing files,
changing file passwords, changing file capacities, printing file statistics and unlocking files.
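
The three sub-systems can be seen side by side in a short sketch that uses Python's built-in sqlite3 module. SQLite is chosen here only because it needs no installation; the employee table, its emp_no column and the sample row are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Database definition (DDL): describe the schema.
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, employee_name TEXT, address TEXT)"
)

# Database manipulation (DML): store, update and retrieve data.
conn.execute("INSERT INTO employee VALUES (1, 'Kiran', 'Address 1')")
conn.execute("UPDATE employee SET address = 'Address 2' WHERE emp_no = 1")
rows = conn.execute("SELECT employee_name, address FROM employee").fetchall()

# Database support: a utility-style request that lists the tables in the database.
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(rows)
print(tables)
```

The CREATE TABLE statement belongs to the definition sub-system, while INSERT, UPDATE and SELECT belong to the manipulation sub-system. The table listing is only a rough stand-in for the service functions of the support sub-system.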
Functions of a Database Management System

A DBMS performs a wide variety of functions, which are discussed as follows.
Data organization : The DBMS organizes data items as per the specifications of the data definition
language. The database administrator decides on the data specifications best suited to each application.
Data integration : Data is inter-related at the element level and can be manipulated in many
combinations during execution of a particular application program.
Physical/logical-level separation : The DBMS separates the logical description and relationships of data
from the way in which the data is physically stored. It also separates application programs from their
associated data.
Data control : The DBMS receives requests for storing data from different programs. It controls how and
where data is physically stored. Similarly, it locates and returns requested data to the program.


Data protection : Data protection and security are among the major concerns in a database. The DBMS
protects the data against access by unauthorized users, physical damage, operating system failure,
conflicting simultaneous updates, etc.

2.5 DATA MODELS


Several logical data models or database structures are used to build the conceptual structure or schema of
the database. The various data models are:
(i) Hierarchical model
(ii) Network model
(iii) Relational model
(iv) Object-oriented model
(v) Multi-dimensional model

Hierarchical model : In the hierarchical structure, the relationships between records are stored in the
form of a hierarchy or tree (an inverted tree, with the root at the top and branches below). A lower-level
record is known as the ‘child’ of the next higher-level record, whereas the higher-level record is called
the ‘parent’ of its child records. Relationships among records are one-to-many.

Network model : The network model allows more complex 1:M or M:M logical relationships among
entities. The relationships are stored in the form of a linked-list structure in which subordinate records,
called members, can be linked to more than one owner (parent).
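
The difference between the two models can be sketched with plain Python dictionaries, where each key is a parent (owner) and its list holds the child (member) records. The record names are hypothetical, and the dictionaries only illustrate the relationships, not how a real hierarchical or network DBMS stores them.

```python
# Hierarchical model: every child record has exactly one parent (a tree).
hierarchy = {
    "Department": ["Course A", "Course B"],  # one parent, many children (1:M)
    "Course A": ["Student 1"],
    "Course B": ["Student 2"],
}

# Network model: a member record may be linked to more than one owner (M:M).
network = {
    "Course A": ["Student 1", "Student 2"],
    "Course B": ["Student 2"],
}


def parents_of(structure, child):
    """Return every record that lists `child` among its subordinate records."""
    return [parent for parent, children in structure.items() if child in children]


print(parents_of(hierarchy, "Student 2"))
print(parents_of(network, "Student 2"))
```

In the hierarchy, Student 2 has a single parent, while in the network the same member is owned by both courses.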

Relational data model : In a relational structure, data is organized in two-dimensional tables, called
relations, each of which is implemented as a file. In the relational model, each row of the table is referred
to as a ‘tuple’ and each column as an ‘attribute’. A tuple is a set of data item values relating to one
entity.

Object-oriented model : The object-oriented model is an approach to data management that stores both
data and the operations that can be performed upon the data as objects. Whereas traditional DBMS are
designed for homogeneous data that can be structured into pre-defined data fields and records,
object-oriented databases are capable of manipulating heterogeneous data that includes drawings, images,
photographs, voice and full-motion video.


Data table for student

Multi-dimensional model : The multi-dimensional model is an extension of the relational model. In this
model, data is organized using a multi-dimensional structure. Multi-dimensional structures can be
visualized as cubes of data and cubes within cubes of data.
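
A cube of data can be sketched in Python as a dictionary whose keys name one position on each dimension. The three dimensions (product, region, year) and the sales figures below are hypothetical, chosen only to show how a value is addressed and how the cube is summarized along one dimension.

```python
from collections import defaultdict

# Each fact is addressed by three dimensions: (product, region, year).
sales = {
    ("Pen", "South", 2023): 100,
    ("Pen", "North", 2023): 80,
    ("Book", "South", 2023): 50,
    ("Pen", "South", 2024): 120,
}


def roll_up(cube, axis):
    """Total the cube over all dimensions except the one at index `axis`."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)


by_product = roll_up(sales, 0)  # collapse region and year
by_year = roll_up(sales, 2)     # collapse product and region
print(by_product)
print(by_year)
```

Summarizing the same cube by product or by year gives two different views of one data set, which is the main attraction of the multi-dimensional structure.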

Structured Query Language (SQL)


SQL is called a structured query language because it follows a rigorous set of rules and procedures in
answering queries. SQL is also termed a fourth-generation language (4GL) to distinguish it from
third-generation (3GL) programming languages like PASCAL, COBOL or C.


Any query on a single table can be performed by using only two basic relational operators, namely SELECT
and PROJECT. The SELECT operator selects a set of records (rows) from the table, whereas PROJECT takes
out selected fields (columns) from the table. In an SQL statement, the WHERE clause performs the
selection of rows, and the column list that follows the SELECT keyword performs the projection.
Another operator, JOIN, is used when the query requires more than one table. JOIN links or
combines two tables together over a common field.
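
The three operators can be sketched as small Python functions working on tables held as lists of dictionaries. This is a teaching sketch and not how a DBMS implements them. The column names CNO, CTITLE, STDNO, TCODE and NAME follow the tables used in this unit, and the rows are hypothetical.

```python
def select(rows, predicate):
    """SELECT: keep the rows (records) that satisfy a condition."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """PROJECT: keep only the named columns (fields) of each row."""
    return [{col: row[col] for col in columns} for row in rows]


def join(left, right, key):
    """JOIN: combine rows of two tables that share the same value of `key`."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


course = [
    {"CNO": "C101", "CTITLE": "MIS", "STDNO": 18, "TCODE": "T1"},
    {"CNO": "C102", "CTITLE": "DBMS", "STDNO": 30, "TCODE": "T2"},
]
teacher = [
    {"TCODE": "T1", "NAME": "Rao"},
    {"TCODE": "T2", "NAME": "Devi"},
]

small = select(course, lambda row: row["STDNO"] < 21)
titles = project(small, ["CTITLE"])
taught_by = project(join(course, teacher, "TCODE"), ["CTITLE", "NAME"])
print(titles)
print(taught_by)
```

Selection reduces the number of rows, projection reduces the number of columns, and the join over TCODE brings the teacher's name alongside each course.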

Typical constructs to create the Course and Teacher tables are as follows.

CREATE TABLE Course (CNO CHAR(5), CTITLE CHAR(25), CREDITS INTEGER, STDNO INTEGER, TCODE CHAR(3));

CREATE TABLE Teacher (TCODE CHAR(3), NAME CHAR(20), DEPTT CHAR(5), DESIG CHAR(12), PHONE CHAR(6));

For example, suppose we want to list the course(s) in which the number of students is less than 21, using
our earlier database stored as the relation Course (Table 5.1).

SELECT CNO, CTITLE, CREDITS, STDNO FROM Course WHERE STDNO < 21
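
The query above can be tried out with Python's built-in sqlite3 module. The sketch assumes a Course table with the columns CNO, CTITLE, CREDITS, STDNO and TCODE. The three sample rows are hypothetical, since the contents of Table 5.1 are not reproduced in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Course (CNO CHAR(5), CTITLE CHAR(25), CREDITS INTEGER, "
    "STDNO INTEGER, TCODE CHAR(3))"
)

# Hypothetical sample rows standing in for Table 5.1.
conn.executemany(
    "INSERT INTO Course VALUES (?, ?, ?, ?, ?)",
    [
        ("C101", "MIS", 4, 18, "T01"),
        ("C102", "Marketing", 3, 45, "T02"),
        ("C103", "Finance", 3, 20, "T03"),
    ],
)

result = conn.execute(
    "SELECT CNO, CTITLE, CREDITS, STDNO FROM Course WHERE STDNO < 21"
).fetchall()
print(result)
```

With these sample rows, only the two courses with fewer than 21 students are returned, and the TCODE column is left out because it is not named after SELECT.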

E-R Diagrams
Entity-relationship diagrams, popularly known as E-R diagrams, are the graphical representation of
various entities and their relationships. E-R diagrams are used as the first step in designing and
creating a physical data model comprising tables and relationships. In this way, E-R diagrams can be
considered similar to data flow diagrams, with the difference that E-R diagrams focus on the need for
and use of data.

Three types of relationships may exist among entities, namely, one-to-one, one-to-many
and many-to-many.

A one-to-one (1:1) relationship is an association in which each instance of one entity is related to only
one instance of the other. For example, the relationship between husband and wife, where the husband is
allowed one wife at a time and vice versa (see Figure 5.9).

A one-to-many (1:M) relationship represents an entity that may have two or more entities associated with
it. For example, a father may have many children and a state may have many districts, but each child has
only one father and each district belongs to only one state (see Figure 5.10).


A many-to-many (M:M) relationship describes entities which may have many relationships both ways.
For example, teachers and students where a teacher teaches many students and a student attends the
classes of many teachers (see Figure 5.11).
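
One common way to store a many-to-many relationship in relational tables is a separate linking table with one row per pair. This technique is not described in these notes, so the sketch below, which uses Python's sqlite3 module, should be read as an assumption. The table names, column names and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
-- The linking table holds one row per (teacher, student) pair.
CREATE TABLE teaches (
    teacher_id INTEGER REFERENCES teacher(teacher_id),
    student_id INTEGER REFERENCES student(student_id),
    PRIMARY KEY (teacher_id, student_id)
);
INSERT INTO teacher VALUES (1, 'T1'), (2, 'T2');
INSERT INTO student VALUES (10, 'S1'), (20, 'S2');
INSERT INTO teaches VALUES (1, 10), (1, 20), (2, 10);
""")

students_of_t1 = conn.execute(
    "SELECT s.name FROM teaches t JOIN student s ON s.student_id = t.student_id "
    "WHERE t.teacher_id = 1 ORDER BY s.name"
).fetchall()
teachers_of_s1 = conn.execute(
    "SELECT te.name FROM teaches t JOIN teacher te ON te.teacher_id = t.teacher_id "
    "WHERE t.student_id = 10 ORDER BY te.name"
).fetchall()
print(students_of_t1)
print(teachers_of_s1)
```

Teacher T1 teaches two students and student S1 attends the classes of two teachers, so the relationship holds in both directions.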

Normalization of Database
Database normalization is a technique of organizing the data in the database. Normalization is a
systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics
like insertion, update and deletion anomalies.

Problem Without Normalization


Without normalization, it becomes difficult to handle and update the database without facing data loss.
Insertion, update and deletion anomalies are very frequent if the database is not normalized.

S_id   S_Name   S_Address   Subject_opted
401    Adam     Noida       Bio
402    Alex     Panipat     Maths
403    Stuart   Jammu       Maths
404    Adam     Noida       Physics

• Update anomaly : To update the address of a student who occurs twice or more in the
table, we have to update the S_Address column in all of those rows, or else the data will become
inconsistent.


• Insertion anomaly : Suppose that, for a new admission, we have the student id (S_id), name and
address of a student, but the student has not yet opted for any subject. We then have to
insert NULL in the Subject_opted column, leading to an insertion anomaly.
• Deletion anomaly : If student 401 has only one subject and drops it temporarily, then when we
delete that row, the entire student record is deleted along with it.
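
The three anomalies can be reproduced on the four rows listed in the text, held here as a Python list of dictionaries. The new address "Delhi" and the new student with S_id 405 are hypothetical. The deletion step uses S_id 402, because Alex appears in only one row of the sample data.

```python
# The unnormalized student table from the text, one dictionary per row.
student = [
    {"S_id": 401, "S_Name": "Adam", "S_Address": "Noida", "Subject_opted": "Bio"},
    {"S_id": 402, "S_Name": "Alex", "S_Address": "Panipat", "Subject_opted": "Maths"},
    {"S_id": 403, "S_Name": "Stuart", "S_Address": "Jammu", "Subject_opted": "Maths"},
    {"S_id": 404, "S_Name": "Adam", "S_Address": "Noida", "Subject_opted": "Physics"},
]

# Update anomaly: changing Adam's address in only one of his rows
# leaves two different addresses for the same student.
student[0]["S_Address"] = "Delhi"
adam_addresses = {row["S_Address"] for row in student if row["S_Name"] == "Adam"}

# Insertion anomaly: a new student with no subject needs a placeholder (None).
student.append({"S_id": 405, "S_Name": "Maya", "S_Address": "Pune", "Subject_opted": None})

# Deletion anomaly: removing Alex's only subject row removes Alex entirely.
student = [row for row in student if row["S_id"] != 402]
alex_known = any(row["S_Name"] == "Alex" for row in student)

print(adam_addresses)
print(alex_known)
```

After these three operations, Adam has two addresses, one row carries an empty subject, and nothing about Alex remains in the table.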

Normalization rules are divided into the following normal forms.

1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF

First Normal Form (1NF) : As per First Normal Form, no row of data may contain a repeating group of
information, i.e., each column of a row must hold a single value rather than a list of values.
For example, consider a table which is not in First Normal Form:

Student   Age   Subject
Adam      15    Biology, Maths
Alex      14    Maths
Stuart    17    Maths

Applying First Normal Form results in the following table:

Student   Age   Subject
Adam      15    Biology
Adam      15    Maths
Alex      14    Maths
Stuart    17    Maths

Second Normal Form (2NF)


As per the Second Normal Form, there must not be any partial dependency of any column on the primary
key. In the table above, the candidate key is {Student, Subject}, but the Age of a student depends only on
the Student column, which violates Second Normal Form.

To satisfy Second Normal Form, the table is split into two tables:

Student   Age
Adam      15
Alex      14
Stuart    17

Student   Subject
Adam      Biology
Adam      Maths
Alex      Maths
Stuart    Maths
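
The Second Normal Form split can be checked with a short Python sketch that starts from the First Normal Form rows (Student, Age, Subject) given in the text. The variable names are assumptions made for the illustration.

```python
# The 1NF table: the key is (Student, Subject), but Age depends on Student only.
rows = [
    ("Adam", 15, "Biology"),
    ("Adam", 15, "Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

# Student table: one row per student, so each Age is stored once.
student_table = sorted({(student, age) for student, age, _ in rows})

# Subject table: which student opted for which subject.
subject_table = sorted({(student, subject) for student, _, subject in rows})

# Joining the two tables on Student reproduces the original rows (no data is lost).
ages = dict(student_table)
rejoined = sorted((student, ages[student], subject) for student, subject in subject_table)

print(student_table)
print(subject_table)
print(rejoined == sorted(rows))
```

Adam's age now appears only once, and joining the two tables on Student gives back the original rows, so the decomposition removes redundancy without losing information.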


Third Normal Form (3NF)


Third Normal Form requires that every non-prime attribute of a table depend directly on the primary
key, and not on another non-prime attribute. Any such transitive functional dependency should be
removed from the table, and the table must also be in Second Normal Form. For example, consider a
table with the following fields.
Student_Detail Table :
In this table, Student_id is the primary key, but street, city and state depend upon Zip. The dependency
between Zip and the other fields is called a transitive dependency. Hence, to apply 3NF, we need to
move the street, city and state to a new table, with Zip as the primary key.
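
The move to Third Normal Form can be sketched in Python as follows. Only the field names Student_id, Street, City, State and Zip come from the text. The rows contain placeholder values, and the helper city_of is a hypothetical function added to show that the data can still be reached through Zip.

```python
# Student_Detail before 3NF: street, city and state repeat for every shared Zip.
student_detail = [
    {"Student_id": 1, "Zip": "Z1", "Street": "Street 1", "City": "City A", "State": "State X"},
    {"Student_id": 2, "Zip": "Z1", "Street": "Street 1", "City": "City A", "State": "State X"},
    {"Student_id": 3, "Zip": "Z2", "Street": "Street 2", "City": "City B", "State": "State Y"},
]

# New table keyed by Zip: the transitively dependent fields are held once per Zip.
address = {
    row["Zip"]: {"Street": row["Street"], "City": row["City"], "State": row["State"]}
    for row in student_detail
}

# The student table keeps only its key and the Zip that links to the new table.
student = [{"Student_id": row["Student_id"], "Zip": row["Zip"]} for row in student_detail]


def city_of(student_id):
    """Look up a student's city through the Zip link."""
    zip_code = next(r["Zip"] for r in student if r["Student_id"] == student_id)
    return address[zip_code]["City"]


print(len(address))
print(city_of(2))
```

Three student rows now share two address rows, so a change to the details of one Zip has to be made in only one place.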

Note: The purpose of a data warehouse is the permanent storage of detailed information. Data entered
into a data warehouse needs to be processed to ensure that it is clean, complete, and in the proper format.
Often, a data warehouse is subdivided into smaller repositories called data marts. A data mart is
a subset of a data warehouse, in which only the required portion of the data warehouse information is
kept.
A data warehouse has the following important characteristics:
(i) Subject-oriented: It focuses on the modelling and analysis of data relating to a specific area.
(ii) Integrated: The data warehouse is an integration of data from various applications/systems,
like ERP systems, CRM systems, SCM systems, etc.
(iii) Time-variant: A data warehouse takes a historical perspective in its approach, for example,
holding data for the past 5-10 years.
(iv) Non-volatile: The data is stored permanently, i.e., data once stored is not normally updated.
However, data warehouses or data marts are of little use by themselves. To make data warehouses useful,
organizations must use BI (Business Intelligence) tools to process data from these huge databases into
meaningful information. These databases are used for data mining and Online Analytical Processing
(OLAP).
The organizations that develop Business Intelligence (BI) tools create interfaces that help managers
to grasp business situations quickly. Such an interface is simple to understand, which makes
interpretation by managers easy. One such interface is called a dashboard, because it looks similar to a
car dashboard.
Data mining is the process of sorting through large data sets to identify patterns and establish
relationships to solve problems through data analysis. Data mining tools allow enterprises to predict
future trends.
Data mining has four main objectives:
• Sequence or path analysis: Finding patterns where one event leads to another,


• Classification: Finding whether certain facts fall into predefined groups,
• Clustering: Finding groups of related facts not previously known, and
• Forecasting: Discovering patterns in data that can lead to reasonable predictions.
The data mining process generally consists of the following sequence of steps:
1. Data cleaning: To remove noise and inconsistent data.
2. Data integration: Where multiple data sources may be combined.
3. Data selection: Data relevant to the analysis task are retrieved from the database.
4. Data transformation: Data are transformed or consolidated into forms appropriate for mining by
performing summary or aggregation operations.
5. Data mining: Process where intelligent methods are applied in order to extract data patterns.
6. Pattern evaluation: To identify the truly interesting patterns representing knowledge, based on some
interestingness measure.
7. Knowledge presentation: Visualization and knowledge representation techniques are used to present
the mined knowledge to the user.
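
The seven steps above can be traced on a toy example in Python. Everything in this sketch is an assumption made for illustration: the two purchase lists, the choice of "items bought together" as the pattern, and the use of the share of customers (support) as the interestingness measure. Real data mining tools use far more sophisticated methods.

```python
from collections import Counter

# Two hypothetical sources of (customer, item) purchase records.
store_sales = [("c1", "pen"), ("c1", "book"), ("c2", "pen"), ("c2", None)]
web_sales = [("c3", "pen"), ("c3", "book"), ("c1", "PEN ")]


# 1. Data cleaning: drop records with a missing item and normalize the text.
def clean(records):
    return [(customer, item.strip().lower()) for customer, item in records if item]


# 2. Data integration: combine the cleaned sources.
integrated = clean(store_sales) + clean(web_sales)

# 3. Data selection: keep only the items relevant to this analysis.
selected = [(customer, item) for customer, item in integrated if item in {"pen", "book"}]

# 4. Data transformation: consolidate into one basket (set of items) per customer.
baskets = {}
for customer, item in selected:
    baskets.setdefault(customer, set()).add(item)

# 5. Data mining: count how often each pair of items is bought together.
pair_counts = Counter()
for items in baskets.values():
    ordered = sorted(items)
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            pair_counts[(first, second)] += 1

# 6. Pattern evaluation: keep pairs bought by at least half of the customers.
min_support = 0.5
interesting = {
    pair: count / len(baskets)
    for pair, count in pair_counts.items()
    if count / len(baskets) >= min_support
}

# 7. Knowledge presentation: report the surviving patterns to the user.
for (first, second), support in interesting.items():
    print(f"{first} and {second} are bought together by {support:.0%} of customers")
```

In this toy data, two of the three customers buy a book and a pen together, so that pair passes the evaluation step and is the only pattern reported.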
