Many believe that the terms “data” and “information” can be used interchangeably and mean the
same thing. However, there is a subtle difference between the two.
Data can be a number, symbol, character, or word. If not put into context, individual pieces of data
mean nothing to humans.
Information, on the other hand, is data put into context, and humans can use it in some
significant way. A computer is a good example of how the conversion happens: it uses
programming scripts, formulas, or software applications to turn data into information.
DATA PROCESSING
Data processing refers to the process of performing specific operations on a set of data or a database.
A database is an organized collection of facts and information, such as records on employees,
inventory, customers, and potential customers. As these examples suggest, numerous forms of data
processing exist and serve diverse applications in the business setting.
Data processing starts with collecting data. To convert the collected data into the desired form, it
must be processed step by step: the data must be stored, sorted, processed, analyzed, and
presented.
Data processing is therefore broadly divided into the following six basic steps:
Data Collection
Storage of Data
Sorting of Data
Processing of Data
Data Analysis
Data Presentation and conclusions
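The six steps above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the sales records, field names and the "top item" analysis are hypothetical and do not come from any particular system.

```python
# Hypothetical raw sales records; the names and values are illustrative only.
raw_records = ["bolt,4", "nut,7", "bolt,6", "washer,3"]

# 1. Collection: gather the raw data.
collected = list(raw_records)

# 2. Storage: keep each record in a structured form.
stored = [{"item": name, "qty": int(qty)}
          for name, qty in (line.split(",") for line in collected)]

# 3. Sorting: order the records by item name.
sorted_records = sorted(stored, key=lambda r: r["item"])

# 4. Processing: total the quantity per item.
totals = {}
for record in sorted_records:
    totals[record["item"]] = totals.get(record["item"], 0) + record["qty"]

# 5. Analysis: find the item with the largest total.
top_item = max(totals, key=totals.get)

# 6. Presentation: format the result for a reader.
report = f"Top item: {top_item} ({totals[top_item]} units)"
print(report)
```

Each step consumes the output of the previous one, which is why the steps are usually described as a sequence.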
DATA MANAGEMENT
Data management is the process of ingesting, storing, organizing and maintaining the data created
and collected by an organization. Effective data management is a crucial piece of deploying the IT
systems that run business applications and provide analytical information to help drive operational
decision-making and strategic planning by corporate executives, business managers and other end
users.
The data management process includes a combination of different functions that collectively aim to
make sure that the data in corporate systems is accurate, available and accessible. Most of the
required work is done by IT and data management teams.
The introduction of shared files solves the problem of duplication and inconsistent data across
different versions of the same file held by different departments, but other problems may emerge,
including:
• File incompatibility: When each department had its own version of a file for processing, each
department could ensure that the structure of the file suited their specific application. If
departments have to share files, the file structure that suits one department might not suit
another. For example, data might need to be sorted in a different sequence for different
applications (for instance, customer details could be stored in alphabetical order, or numerical
order, or ascending or descending order of customer number).
• Difficult to control access: Some applications may require access to more data than others;
for instance, a credit control application will need access to customer credit limit information,
whereas a delivery note printing application will only need access to customer name and
address details. The file will still need to contain the additional information to support the
application that requires it.
• Physical data dependence: If the structure of the data file needs to be changed in some way
(for example, to reflect a change in currency), this alteration will need to be reflected in all
application programs that use that data file.
• Difficult to implement concurrency: While a data file is being processed by one application,
the file will not be available for other applications or for ad hoc queries. This is because, if
more than one application is allowed to alter data in a file at one time, serious problems can
arise in ensuring that the updates made by each application do not clash with one another.
This issue of ensuring consistent, concurrent updating of information is an extremely
important one, and is dealt with in detail for database systems in the chapter on concurrency
control. File-based systems avoid these problems by not allowing more than one application
to access a file at one time.
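The concurrency problem described in the last point can be made concrete with a small, deterministic sketch. It assumes a hypothetical shared record with a single "stock" field and replays two applications' reads and writes in a fixed order; no real file locking or threading is involved.

```python
# A shared "file" holding one record; the field name is hypothetical.
shared_file = {"stock": 100}

# Two applications each read the record before either writes it back.
copy_a = shared_file["stock"]   # application A reads 100
copy_b = shared_file["stock"]   # application B reads 100

# A removes 30 units and writes its result.
shared_file["stock"] = copy_a - 30

# B removes 20 units, but starts from its stale copy and overwrites A's work.
shared_file["stock"] = copy_b - 20

expected = 100 - 30 - 20
print("stored:", shared_file["stock"], "expected:", expected)
```

The stored value reflects only B's change, so A's update is lost. Allowing only one application at a time to use the file avoids this, at the cost of making the other applications wait.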
1) Data redundancy
In a computer system, files are likely to be in different formats, and the programs that use them
are written in different programming languages. Moreover, the same information may be duplicated
in several files. This duplication of data is known as data redundancy.
Example: The address and telephone number of a particular customer may appear in a file that
consists of savings account records and in a file that consists of checking account records.
2) Data inconsistency
Various copies of the same data may no longer agree, which means that different copies of the
same data may contain different values.
Example: A changed customer address may be reflected in savings account records but not elsewhere
in the system.
3) Difficulty in accessing data
In a file processing system it is very difficult to access the data in a specific way, and doing so
requires a special application program to carry out each new task.
4) Data isolation
Because data are scattered in various files and files may be in different formats, writing new
application programs to retrieve the appropriate data is difficult.
5) Integrity problem
The data must satisfy particular consistency constraints. In a file processing system, these
constraints are enforced by adding code to the application programs.
Example: The balance of a bank account may never fall below a prescribed amount.
6) Atomicity problem
A computer system, like any other mechanical or electrical device, is subject to failure. In many
applications, it is crucial that if a failure occurs, the data be restored to the consistent state that existed
prior to the failure.
7) Concurrent access anomalies
If two programs run concurrently, it is important to have supervision. But supervision is difficult to
provide because data is decentralised in a file processing system. In such an environment, interacting
updates may result in inconsistent data.
8) Security problems
Not every user of the database system should be able to access all the data.
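The redundancy and inconsistency problems from the list above can be sketched as follows. The two dictionaries stand in for separate savings and checking files; the customer number, addresses and balances are made up for illustration.

```python
# Two separate "files" that each keep their own copy of a customer's address.
savings_file = {"C001": {"address": "12 Old Road", "balance": 500}}
checking_file = {"C001": {"address": "12 Old Road", "balance": 120}}

# The customer moves, and only the savings application is told.
savings_file["C001"]["address"] = "7 New Street"

# The two copies now disagree: this is data inconsistency.
inconsistent = savings_file["C001"]["address"] != checking_file["C001"]["address"]
print(inconsistent)
```

Because the address is stored twice (redundancy), every update has to be repeated in every file, and a single missed update leaves the copies inconsistent.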
Traditional database applications were developed on top of the databases, which led to challenges
such as data redundancy, isolation, integrity constraints, and difficulty managing data access. A layer
of abstraction was required between users or apps and the databases at a physical and logical level.
• Data security. DBMS allows organizations to enforce policies that enable compliance and
security. The databases are available for appropriate users according to organizational policies.
The DBMS is also responsible for maintaining optimum performance of querying operations
while ensuring the validity, security and consistency of data items updated to a database.
• Data sharing. Fast and efficient collaboration between users.
• Data access and auditing. Controlled access to databases. Logging associated access activities
allows organizations to audit for security and compliance.
• Data integration. Instead of operating islands of database resources, a single interface is used to
manage databases with logical and physical relationships.
• Abstraction and independence. Organizations can change the physical schema of database
systems without necessitating changes to the logical schema that govern database relationships. As
a result, organizations can upgrade storage and scale the infrastructure without impacting database
operations. Similarly, changes to the logical schema can be applied without altering the apps and
services that access the databases.
• Uniform management and administration. A single console interface to perform basic
administrative tasks makes the job easier for database admins and IT users.
A database system is referred to as self-describing because it not only contains the database itself, but
also metadata which defines and describes the data and relationships between tables in the database.
This information is used by the DBMS software or database users if needed. This separation of data
and information about the data makes a database system totally different from the traditional file-
based system in which the data definition is part of the application programs.
In the file-based system, the structure of the data files is defined in the application programs so if a
user wants to change the structure of a file, all the programs that access that file might need to be
changed as well.
On the other hand, in the database approach, the data structure is stored in the system catalogue and
not in the programs. Therefore, one change is all that is needed to change the structure of a file. This
insulation between the programs and data is also called program-data independence.
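Program-data independence can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch, assuming a hypothetical employee table: the function plays the role of an application program, and the table structure is changed once without editing that function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee (name) VALUES ('Asha')")

def employee_names(connection):
    # The "application program": it names the column it needs and nothing else.
    return [row[0] for row in
            connection.execute("SELECT name FROM employee ORDER BY id")]

before = employee_names(conn)

# One change to the stored structure; the program above is not edited.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")

after = employee_names(conn)
print(before, after)
```

The program returns the same result before and after the change because the structure lives in the database's catalogue, not in the program.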
A database supports multiple views of data. A view is a subset of the database, which is defined and
dedicated for particular users of the system. Multiple users in the system might have different views
of the system. Each view might contain only the data of interest to a user or group of users.
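As an illustration of multiple views, the sketch below defines two views over one table using sqlite3. The customer table and the view names delivery_view and credit_view are assumptions chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, address TEXT, credit_limit INTEGER)")
conn.execute("INSERT INTO customer VALUES ('Asha', '7 New Street', 5000)")

# View for a delivery application: name and address only.
conn.execute("CREATE VIEW delivery_view AS SELECT name, address FROM customer")
# View for a credit control application: name and credit limit only.
conn.execute("CREATE VIEW credit_view AS SELECT name, credit_limit FROM customer")

delivery_rows = conn.execute("SELECT * FROM delivery_view").fetchall()
credit_rows = conn.execute("SELECT * FROM credit_view").fetchall()
print(delivery_rows, credit_rows)
```

Both views read the same stored row, but each exposes only the columns of interest to its group of users.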
Current database systems are designed for multiple users. That is, they allow many users to access the
same database at the same time. This access is achieved through features called concurrency control
strategies. These strategies ensure that the data accessed are always correct and that data integrity is
maintained.
The design of modern multiuser database systems is a great improvement from those in the past
which restricted usage to one person at a time.
In the database approach, ideally, each data item is stored in only one place in the database. In some
cases, data redundancy still exists to improve system performance, but such redundancy is controlled
by application programming and kept to a minimum by introducing as little redundancy as possible
when designing the database.
Data sharing
The integration of all the data, for an organization, within a database system has many advantages.
First, it allows for data sharing among employees and others who have access to the system. Second,
it gives users the ability to generate more information from a given amount of data than would be
possible without the integration.
Database management systems must provide the ability to define and enforce certain constraints to
ensure that users enter valid information and maintain data integrity. A database constraint is a
restriction or rule that dictates what can be entered or edited in a table such as a postal code using a
certain format or adding a valid city in the City field.
There are many types of database constraints. Data type, for example, determines the sort of data
permitted in a field, such as numbers only. Data uniqueness, such as the primary key, ensures that
no duplicates are entered. Constraints can be simple (field based) or complex (programming).
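The sketch below shows a few simple, field-based constraints being enforced by the database itself, using sqlite3. The account table and its columns are hypothetical. Note that SQLite is lenient about declared data types, so the example relies on PRIMARY KEY, NOT NULL and CHECK rather than on type checking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        account_no INTEGER PRIMARY KEY,                        -- uniqueness
        owner      TEXT NOT NULL,                              -- a value is required
        balance    INTEGER NOT NULL CHECK (balance >= 0)       -- rule on the value
    )
""")
conn.execute("INSERT INTO account VALUES (1, 'Asha', 100)")

rejected = []
for row in [(1, "Ben", 50),      # duplicate primary key
            (2, None, 50),       # missing owner
            (3, "Cara", -10)]:   # negative balance
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(rejected, row_count)
```

All three invalid rows are refused and only the original row remains, without any checking code in the application.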
Not all users of a database system will have the same accessing privileges. For example, one user
might have read-only access (i.e., the ability to read a file but not make changes), while another
might have read and write privileges, which is the ability to both read and modify a file. For this
reason, a database management system should provide a security subsystem to create and control
different types of user accounts and restrict unauthorized access.
Data independence
Another advantage of a database management system is how it allows for data independence. In other
words, the system data descriptions or data describing data (metadata) are separated from the
application programs. This is possible because changes to the data structure are handled by the
database management system and are not embedded in the program itself.
Transaction processing
A database management system must include concurrency control subsystems. This feature ensures
that data remains consistent and valid during transaction processing even if several users update the
same information.
By its very nature, a DBMS permits many users to have access to its database either individually or
simultaneously. It is not important for users to be aware of how and where the data they access is
stored.
Backup and recovery are methods that allow you to protect your data from loss. The database system
provides a separate process, from that of a network backup, for backing up and recovering data. If a
hard drive fails and the database stored on the hard drive is not accessible, the only way to recover the
database is from a backup.
If a computer system fails in the middle of a complex update process, the recovery subsystem is
responsible for making sure that the database is restored to its original state. These are two more
benefits of a database management system.
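Recovery from a failure in the middle of an update can be sketched with a sqlite3 transaction. The account table, the amounts and the simulated failure are assumptions for illustration; a real recovery subsystem also handles crashes of the whole machine, which this sketch does not attempt to show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

try:
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE account SET balance = balance - 30 WHERE name = 'A'")
        # Simulated failure before the matching credit to B is applied.
        raise RuntimeError("simulated failure between the two updates")
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

The debit from A is undone, so the database returns to the state it had before the incomplete update began.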
Database Architecture
1. Single tier architecture
In this type of architecture, the database is readily available on the client machine, so any request made
by the client doesn’t require a network connection to perform the action on the database.
For example, let’s say you want to fetch the records of an employee from the database and the database is
available on your computer system. The request to fetch employee details will be made by your
computer, and the records will be fetched from the database by your computer as well. This type of
system is generally referred to as a local database system.
2. Two tier architecture
In two-tier architecture, the database system is present on the server machine and the DBMS
application is present on the client machine. These two machines are connected with each other
through a reliable network, as shown in the above diagram.
Whenever the client machine makes a request to access the database on the server using a query
language like SQL, the server performs the request on the database and returns the result to the
client. Application connection interfaces such as JDBC and ODBC are used for the interaction
between server and client.
Logical level: This is the middle level of the 3-level data abstraction architecture. It describes what data
is stored in the database.
View level: This is the highest level of data abstraction. It describes the user interaction with the database
system.
Example: Let’s say we are storing customer information in a customer table. At the physical level these
records can be described as blocks of storage (bytes, gigabytes, terabytes etc.) in memory. These
details are often hidden from the programmers.
At the logical level these records can be described as fields and attributes along with their data types,
and the relationships among them can be logically implemented. The programmers generally work
at this level because they are aware of such things about database systems.
At the view level, users just interact with the system through a GUI and enter details on the screen.
They are not aware of how the data is stored or what data is stored; such details are hidden from
them.
For example: In the following diagram, we have a schema that shows the relationship between three
tables: Course, Student and Section. The diagram only shows the design of the database; it doesn’t
show the data present in those tables. A schema is only a structural view (design) of a database, as
shown in the diagram below.
The design of a database at the physical level is called the physical schema; how the data is stored in blocks of
storage is described at this level.
The design of a database at the logical level is called the logical schema. Programmers and database
administrators work at this level. Here the data can be described as certain types of data records
stored in data structures; however, internal details such as the implementation of those data structures are
hidden at this level (and available at the physical level).
The design of a database at the view level is called the view schema. It generally describes end user interaction
with database systems.
DBMS Instance
Definition of instance: The data stored in a database at a particular moment in time is called an instance
of the database. The database schema defines the variable declarations in tables that belong to a particular
database; the value of these variables at a moment in time is called the instance of that database.
For example, let’s say we have a single table student in the database. Today the table has 100 records,
so today the instance of the database has 100 records. Let’s say we are going to add another 100
records to this table by tomorrow, so the instance of the database tomorrow will have 200 records in the table.
In short, the data stored in the database at a particular moment is called the instance, and it changes over
time as we add or delete data.
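The difference between schema and instance in the student example can be sketched with sqlite3. The column names and the generated student names are assumptions; the helper functions schema and instance_size are written for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

def schema(connection):
    # Column names of the student table, read from SQLite's table_info pragma.
    return [row[1] for row in connection.execute("PRAGMA table_info(student)")]

def instance_size(connection):
    # Number of records currently stored in the student table.
    return connection.execute("SELECT COUNT(*) FROM student").fetchone()[0]

conn.executemany("INSERT INTO student (name) VALUES (?)",
                 [(f"s{i}",) for i in range(100)])
schema_today, size_today = schema(conn), instance_size(conn)

conn.executemany("INSERT INTO student (name) VALUES (?)",
                 [(f"t{i}",) for i in range(100)])
schema_tomorrow, size_tomorrow = schema(conn), instance_size(conn)

print(size_today, size_tomorrow, schema_today == schema_tomorrow)
```

The instance grows from 100 to 200 records while the schema stays the same.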
System Analyst:
A system analyst is a user who analyzes the requirements of parametric end users and checks
whether all the requirements of end users are satisfied.
Sophisticated Users:
Sophisticated users can be engineers, scientists, or business analysts who are familiar with the
database. They can develop their own database applications according to their requirements. They
don’t write program code; instead, they interact with the database by writing SQL queries directly
through the query processor.
Application Programmers:
Application programmers are the back-end programmers who write the code for the application
programs. They are computer professionals. These programs could be written in programming
languages such as Visual Basic, Developer, C, FORTRAN, COBOL etc.
Depending upon the usage requirements, there are following types of databases available in the
market −
• Centralised database.
• Distributed database.
• Personal database.
• End-user database.
• Commercial database.
• NoSQL database.
• Operational database.
• Relational database.
• Cloud database.
• Object-oriented database.
• Graph database.
Centralized Database
The information (data) is stored at a centralized location, and users from different locations can access this data. This type of database
contains application procedures that help the users to access the data even from a remote location.
Various kinds of authentication procedures are applied for the verification and validation of end users. Likewise, the application
procedures provide a registration number, which keeps track of data usage. This is handled by the local area office.
Distributed Database
In contrast to the centralized database concept, the distributed database has contributions from the common database as well as
information captured by local computers. The data is not in one place; it is distributed across various sites of an organization. These
sites are connected to each other by communication links, which help them access the distributed data easily.
You can think of a distributed database as one in which various portions of a database are stored in multiple different (physical) locations,
along with the application procedures, which are replicated and distributed among various points in a network.
There are two kinds of distributed database (DDB): homogeneous and heterogeneous. A DDB in which all physical locations have the same
underlying hardware and run the same operating systems and application procedures is known as a homogeneous DDB. A DDB in which
the operating systems, underlying hardware, or application procedures differ between sites is known as a heterogeneous DDB.
Personal Database
Data is collected and stored on personal computers, and the database is small and easily manageable. The data is generally used by the same
department of an organization and is accessed by a small group of people.
Commercial Database
These are paid versions of huge databases, designed for users who want to access the information for help. These
databases are subject specific, and individual users cannot afford to maintain such a large volume of information themselves. Access to such
databases is provided through commercial links.
NoSQL Database
These are used for large sets of distributed data. Some big data performance issues are not handled effectively by relational
databases; such issues are easily managed by NoSQL databases. They are very efficient at analyzing large unstructured
data sets that may be stored across multiple virtual servers in the cloud.
Operational Database
Information related to the operations of an enterprise is stored in this database. Functional lines like marketing, employee relations,
and customer service require this kind of database.
Relational Databases
These databases are characterized by a set of tables in which data fits into pre-defined categories. Each table consists of rows and
columns: a column holds the data for a specific category, and each row contains an instance of the data defined by those
categories. The Structured Query Language (SQL) is the standard user and application program interface for a relational database.
Various simple operations can be applied to the tables, which make it easier to extend these databases, join two databases
through a common relation, and modify existing applications.
Cloud Databases
Nowadays, data is increasingly stored in the cloud, also known as a virtual environment, whether in a hybrid, public or
private cloud. A cloud database is a database that has been optimized or built for such a virtualized environment. A cloud database has
various benefits, including the ability to pay for storage capacity and bandwidth on a per-use basis,
scalability on demand, and high availability.
A cloud database also gives enterprises the opportunity to support business applications in a software-as-a-service deployment.
Object-Oriented Databases
An object-oriented database combines object-oriented programming with relational database concepts. Items
created using object-oriented programming languages like C++ and Java can be stored in relational databases, but object-oriented
databases are well-suited for those items.
An object-oriented database is organized around objects rather than actions, and data rather than logic. For example, a multimedia record
in a relational database can be a definable data object, as opposed to an alphanumeric value.
Graph Databases
The graph is a collection of nodes and edges where each node is used to represent an entity and each edge describes the relationship
between entities. A graph-oriented database, or graph database, is a type of NoSQL database that uses graph theory to store, map and
query relationships.
Graph databases are basically used for analyzing interconnections. For example, companies might use a graph database to mine data
about customers from social media.
Database Languages
A database system provides a data-definition language to specify the database schema and a data-
manipulation language to express database queries and updates. Both are components of the SQL
language, which is the language used by Oracle Database.
Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate data
as organized by the appropriate data model. The types of access are:
• Retrieval of information stored in the database
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database
There are basically two types:
Procedural DML
Procedural DML is used to tell the system what data is needed and how to retrieve it.
Procedural DML is embedded into a high-level programming language.
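try {
    Statement st = connection.createStatement();
    ResultSet rs = st.executeQuery("SELECT * FROM students"); // what data is needed
    while (rs.next()) {                                       // how to retrieve it, row by row
        String s = rs.getString(1);
        // etc.
    }
} catch (SQLException e) {
    e.printStackTrace();
}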
The SQL query SELECT * FROM students, whose result the ResultSet holds, declares what data is needed,
while the while loop states how the data is retrieved.
Non Procedural DML
Non-procedural DML is used to declare what data is needed, not how the data is retrieved.
Non-procedural DML is also called declarative DML.
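The contrast between the two styles can be sketched with sqlite3. The students table, its grade column and the pass mark of 50 are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, grade INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [("Asha", 82), ("Ben", 47), ("Cara", 91)])

# Non-procedural: state what is wanted; the DBMS decides how to find it.
declarative = [row[0] for row in conn.execute(
    "SELECT name FROM students WHERE grade >= 50 ORDER BY name")]

# Procedural style for comparison: fetch every row and filter step by step.
procedural = []
for name, grade in conn.execute("SELECT name, grade FROM students"):
    if grade >= 50:
        procedural.append(name)
procedural.sort()

print(declarative, procedural)
```

Both approaches return the same names, but only the second spells out the loop, the test and the sort in the host program.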
Data-Definition Language
We specify a database schema by a set of definitions expressed by a special language called a data-
definition language (DDL).
The DDL is also used to specify additional properties of the data.
We specify the storage structure and access methods used by the database system by a set of
statements in a special type of DDL called a data storage and definition language.
The data values stored in the database must satisfy certain consistency constraints. For example,
suppose the university requires that the account balance of a department must never be negative.
The DDL provides facilities to specify such constraints. The database system checks these
constraints every time the database is updated.
The following are the types of constraints:
• Domain Constraints
A domain of possible values must be associated with every attribute (for example, integer types,
character types, date/time types). Declaring an attribute to be of a particular domain acts as a
constraint on the values that it can take. Domain constraints are the most elementary form of
integrity constraint. They are tested easily by the system whenever a new data item is entered into
the database.
• Referential Integrity
There are cases where we wish to ensure that a value that appears in one relation for a given set of
attributes also appears in a certain set of attributes in another relation (referential integrity). For
example, the department listed for each course must be one that actually exists. More precisely, the
dept_name value in a course record must appear in the dept_name attribute of some record of the
department relation. Database modifications can cause violations of referential integrity. When a
referential-integrity constraint is violated, the normal procedure is to reject the action that caused the
violation.
• Assertions
An assertion is any condition that the database must always satisfy. Domain constraints and
referential-integrity constraints are special forms of assertions. However, there are many constraints
that we cannot express by using only these special forms. For example, “Every department must
have at least five courses offered every semester” must be expressed as an assertion. When an
assertion is created, the system tests it for validity. If the assertion is valid, then any future
modification to the database is allowed only if it does not cause that assertion to be violated.
• Authorization
We may want to differentiate among the users as far as the type of access they are permitted on
various data values in the database. These differentiations are expressed in terms of authorization,
the most common being: read authorization, which allows reading, but not modification, of data;
insert authorization, which allows insertion of new data, but not modification of existing data;
update authorization, which allows modification, but not deletion, of data; and delete authorization,
which allows deletion of data. We may assign the user all, none, or a combination of these types of
authorization.
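Of the constraint types above, referential integrity is easy to demonstrate with sqlite3. This is a sketch using the course and department example from the text; the course identifiers are made up, and SQLite only checks foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        dept_name TEXT REFERENCES department (dept_name)
    )
""")
conn.execute("INSERT INTO department VALUES ('Physics')")
conn.execute("INSERT INTO course VALUES ('PHY-101', 'Physics')")   # accepted

try:
    # 'History' does not exist in the department table.
    conn.execute("INSERT INTO course VALUES ('HIS-101', 'History')")
    violation_rejected = False
except sqlite3.IntegrityError:
    violation_rejected = True

course_count = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
print(violation_rejected, course_count)
```

The insert that would break the constraint is rejected, which matches the normal procedure described above.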
The DDL, just like any other programming language, gets as input some instructions (statements)
and generates some output. The output of the DDL is placed in the data dictionary, which contains
metadata, that is, data about data.
The data dictionary is considered to be a special type of table that can only be accessed and
updated by the database system itself (not a regular user). The database system consults the data
dictionary before reading or modifying actual data.
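As one concrete illustration, SQLite keeps its data dictionary in a built-in table named sqlite_master. The sketch below runs a DDL statement and then reads the metadata it produced; the department table is a hypothetical example, and other database systems expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A DDL statement: its output is recorded as metadata, not as user data.
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, budget INTEGER)")

# Read the catalog entry that the DDL statement produced.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'").fetchall()
print(catalog)
```

The table contains no rows yet, but the dictionary already describes it.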
Database Interfaces:
A database management system (DBMS) interface is the user interface that the user sees.
User interfaces are often graphical, or at least partly graphical (GUI – graphical user interface),
and offer tools which make interaction with the DBMS easier.
An interface can be used to manipulate the Database either for adding the data, or deleting some
data, or updating some data, or for viewing the data present in the database.
Menu-based interfaces present the user with lists of options (called menus) that lead the user through
the formation of a request. The basic advantage of menus is that they remove the burden of
remembering specific commands and the syntax of a query language; instead, the query is
composed step by step by picking options from menus shown by
the system. Pull-down menus are a very popular technique in Web-based interfaces. They are
also often used in browsing interfaces, which allow a user to look through the contents of a
database in an exploratory and unstructured manner.
The speech input is detected using predefined words and used to set up the parameters that are
supplied to the queries. For output, a similar conversion from text or numbers into speech takes
place.
Example: Most of you must have used Siri on Apple devices, OK Google, Alexa or Cortana to ask
a question like:
“OK Google, find the value of the square root of 729”
“Alexa, what is the capital of Nepal”
The speech user interface interprets your speech input, processes the data from the
database and, after successful interpretation, answers you back in speech.