
Chapter 1

Introduction

- Prof. Supriya Mahadevkar
What Is Structured Data?

• Structured data is information that has been
formatted and transformed into a well-defined data
model. The raw data is mapped into predesigned
fields that can later be extracted and read
easily through SQL.
• SQL relational databases, consisting of tables with
rows and columns, are the perfect example of
structured data.
• Example: A Student table with name, roll no, and
marks columns.
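The Student table example can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name, column names, and sample rows are assumptions, not part of the slides.

```python
import sqlite3

# In-memory database; the schema and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, marks INTEGER)"
)
conn.executemany(
    "INSERT INTO student (roll_no, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 67), (3, "Meera", 91)],
)

# Every row has the same predefined fields, so SQL can filter and sort on them.
top_students = conn.execute(
    "SELECT name, marks FROM student WHERE marks >= 80 ORDER BY marks DESC"
).fetchall()
print(top_students)  # [('Meera', 91), ('Asha', 82)]
```

Because the fields are fixed in advance, the query needs no parsing logic of its own.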
What is Semi-Structured Data?

• Data is not always strictly structured or unstructured.
A third category lies between the two: data that is only
partially structured. Such data is called semi-structured.
• Semi-structured data has some consistent and definite
characteristics, but it does not conform to a rigid
structure such as that needed for relational databases.
• Example: JSON
• [{ "first_name": "Jane", "last_name": "Smith",
  "order_id": "123456", "order_total": "12.34" }]
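A minimal sketch of what "partially structured" means in practice, using Python's json module. The second record and its coupon field are assumptions added for illustration. Each record describes its own fields, so two records in the same collection need not share the same set of keys.

```python
import json

# The first record follows the slide; the second is an assumed variation
# that omits last_name and adds a coupon field.
raw = """[
  {"first_name": "Jane", "last_name": "Smith",
   "order_id": "123456", "order_total": "12.34"},
  {"first_name": "Ravi", "order_id": "123457",
   "order_total": "5.00", "coupon": "WELCOME"}
]"""

orders = json.loads(raw)

# There is no fixed schema: collect the union of keys actually present.
all_keys = sorted({key for order in orders for key in order})
print(all_keys)

# Missing fields have to be handled by the reader, here with dict.get().
last_names = [order.get("last_name") for order in orders]
print(last_names)  # ['Smith', None]

# Values arrive as strings and must be converted before arithmetic.
total = sum(float(order["order_total"]) for order in orders)
```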
What is Unstructured Data?
• Data in its raw form, with no predefined data model, is
termed unstructured. Such data is difficult to process
because of its complex arrangement and formatting.
• Unstructured data management takes data in many
forms, including social media posts, chats, satellite
imagery, IoT sensor data, emails, and presentations, and
organizes it in a logical, predefined manner.
• Unstructured data can be anything that is not in a
specific format, such as a paragraph from a book that
contains relevant information. Other examples are log
files whose entries are not easy to separate, and social
media comments and posts that need to be analyzed.
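As a rough illustration of why free-form log text is hard to process, the sketch below tries to pull fields out of a few log lines with a regular expression. The log lines and the pattern are assumptions made up for this example. Lines that do not follow the expected layout cannot be split into fields.

```python
import re

# Hypothetical log lines; only some follow a "date time LEVEL message" layout.
log_lines = [
    "2024-01-05 10:32:11 ERROR disk full on /dev/sda1",
    "user jane logged in from 10.0.0.7 at 10:33",
    "2024-01-05 10:35:02 WARN retrying connection",
]

pattern = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)$")

parsed, unparsed = [], []
for line in log_lines:
    match = pattern.match(line)
    if match:
        date, time, level, message = match.groups()
        parsed.append(
            {"date": date, "time": time, "level": level, "message": message}
        )
    else:
        # No schema to fall back on: the line stays as raw text.
        unparsed.append(line)

print(len(parsed), len(unparsed))  # 2 1
```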
Difference between structured, semi-structured,
and unstructured data:
• Organization: Structured data is fully organized
and has the highest level of organization.
Semi-structured data is partially organized, so
its level of organization is lower than that of
structured data but higher than that of
unstructured data. Unstructured data is not
organized at all.
• Flexibility and Scalability: Structured data
depends on a relational database or schema, so
it is less flexible and more difficult to scale.
Semi-structured data is more flexible and
simpler to scale than structured data.
Unstructured data has no schema, which makes
it the most flexible and scalable of the three.
• Versioning: Since structured data is based on
a relational database, versioning is performed
over tuples, rows, and tables. In
semi-structured data, versioning over tuples or
graphs is possible, as only a partial database is
supported. Unstructured data is versioned as a
whole, as there is no database support.
• Transaction Management: Structured data
supports data concurrency and is therefore
usually preferred for multitasking processes.
In semi-structured data, transaction handling is
adapted from the DBMS, but data concurrency
is not available. In unstructured data, neither
transaction management nor data concurrency
is present.
File-based Approach
• Data is stored in one or more separate
computer files.
• Data is then processed by computer programs
– applications.
• Physical structure and Storage of data files are
defined in the application code.

File-based Approach
• Problems/Limitations
– Separation or isolation of data (difficult to access
data from more than two files)
– Data Redundancy/Duplication
– Data dependence
– Data Inconsistency
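The redundancy and inconsistency problems can be sketched as follows. The two "files", their layouts, and the sample values are assumptions for illustration, and in-memory strings stand in for files on disk. Two applications each keep their own copy of a student's address, and an update made by one is not seen by the other.

```python
import csv
import io

# Each application owns its own file and hard-codes that file's layout.
library_file = "roll_no,name,address\n1,Asha,12 MG Road\n"
exam_file = "roll_no,name,address,marks\n1,Asha,12 MG Road,82\n"


def load(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


library_rows = load(library_file)
exam_rows = load(exam_file)

# Redundancy: the same address is stored in both files.
duplicated = library_rows[0]["address"] == exam_rows[0]["address"]

# The library application updates only its own copy...
library_rows[0]["address"] = "7 Park Street"

# ...so the two files now disagree about the same student.
inconsistent = library_rows[0]["address"] != exam_rows[0]["address"]
print(duplicated, inconsistent)  # True True
```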

Functionality missing from the file-based
approach
• No provision for security
• No provision for integrity
• No provision for recovery of data after a
hardware or software failure
• No provision for shared access
Shared File-based Approach
– Data (files) is shared between different
applications
– Data redundancy problem is alleviated
– Data inconsistency problem across different
versions of the same file is solved
– Other problems:
• Rigid data structure: If applications have to share files, the file structure
that suits one application might not suit another
• Physical data dependency: If the structure of the data file needs to be
changed in some way, this alteration will need to be reflected in all
application programs that use that data file
• No support of concurrency control: While a data file is being processed by
one application, the file will not be available for other applications or for
ad hoc queries

Purpose of Database Systems
• The main purpose of database systems is to
manage data. Consider a university that keeps
data about students, teachers, courses, books, etc.
• To manage this data, we need to store it
somewhere we can add new data, delete
unused data, update outdated data, and retrieve data.
• A database management system stores the
data in such a way that all of these operations
can be performed efficiently.
Database Approach
• Reasons for moving to the database
approach:
– Definition of data was embedded in application
programs, rather than being stored separately and
independently
– No control over access and manipulation of data
beyond that imposed by application programs
• Result:
– The Database and Database Management System
(DBMS).

DBMS

Basic Definitions
• Database: A collection of related data.
• Data: Known facts that can be recorded and have an implicit
meaning.
• Mini-world: Some part of the real world about which data is
stored in a database. For example, student grades and transcripts
at a university.
• Database Management System (DBMS): A software package/
system to facilitate the creation and maintenance of a
computerized database.
• Database System: The DBMS software together with the data
itself. Sometimes, the applications are also included.

History of Databases
• 1950s and early 1960s:
– Data processing using magnetic tapes for storage
– Tapes provide only sequential access
– Punched cards for input
• Late 1960s and 1970s:
– Hard disks allow direct access to data
– Network and hierarchical data models in widespread use
– Ted Codd defines the relational data model
– High-performance (for the era) transaction processing
• 1980s:
– Research relational prototypes evolve into commercial systems
– SQL becomes industrial standard
– Parallel and distributed database systems
– Object-oriented database systems
• 1990s:
– Large decision support and data-mining applications
– Large multi-terabyte data warehouses
– Emergence of Web commerce
• 2000s:
– XML and XQuery standards
– Automated database administration
Typical DBMS Functionality
• Defining a database: in terms of data types, structures, and
constraints
• Constructing or loading the database on a secondary storage
medium
• Manipulating the database: querying, generating reports,
insertions, deletions, and modifications to its content
• Concurrent processing and sharing by a set of users and
programs, while keeping all data valid and consistent

Typical DBMS Functionality

• Other features:
– Protection or Security measures to prevent
unauthorized access
– “Active” processing to take internal actions on
data
– Presentation and Visualization of data

Components of DBMS Environment

• Hardware
– Can range from a PC to a network of computers.
• Software
– DBMS, operating system, network software (if
necessary) and also the application programs.
• Data
– Used by the organization and a description of this data
called the schema.

Components of DBMS Environment

• Procedures
– Instructions and rules that should be applied to
the design and use of the database and DBMS.
• People
– Includes database designers, DBAs, application
programmers, and end-users.

Two-Tier Client-Server

• Client manages main business and data
processing logic and user interface.
• Server manages and controls access to
database.

Three-Tier C-S Architecture

Functions of a DBMS

• Data Storage, Retrieval, and Update.
• A User-Accessible Catalog.
• Transaction Support.
• Concurrency Control Services.
• Recovery Services.

Functions of a DBMS

• Authorization Services.
• Support for Data Communication.
• Integrity Services.
• Services to Promote Data Independence.
• Utility Services.

• Data Integrity: Data integrity is the maintenance and
assurance of the accuracy and consistency of data
over its entire life cycle, i.e. throughout its design,
implementation, and usage stages.
• The term data integrity refers to the overall accuracy,
completeness, and reliability of data.
• Data Redundancy: Data redundancy occurs when the
same piece of data is stored in two or more separate
places. Suppose you create a database to store sales
records, and in the record for each sale you enter the
customer address. If you have multiple sales to the same
customer, the same address is entered multiple times.
The address that is repeatedly entered is redundant data.
• Data Consistency: Consistency in database systems
refers to the requirement that any given database
transaction must change affected data only in
allowed ways.
• Data consistency means that each user sees a
consistent view of the data, including visible
changes made by the user's own transactions
and transactions of other users.
• Concurrency: Data concurrency means that many
users can access data at the same time.
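The sales-record example of redundancy can be sketched with sqlite3. The table names, columns, and sample values below are assumptions chosen for illustration. Storing the address once in a customer table, and referring to it from each sale, means an address change is a single update that every sale then sees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    CREATE TABLE sale (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Jane Smith', '12 MG Road');
    INSERT INTO sale VALUES (101, 1, 12.34), (102, 1, 40.00), (103, 1, 7.50);
    """
)

# The address is stored once, so one UPDATE changes it for every sale.
conn.execute(
    "UPDATE customer SET address = ? WHERE customer_id = ?",
    ("7 Park Street", 1),
)

rows = conn.execute(
    "SELECT s.sale_id, c.address "
    "FROM sale AS s JOIN customer AS c ON c.customer_id = s.customer_id "
    "ORDER BY s.sale_id"
).fetchall()
print(rows)
```

Had the address been copied into each sale row, the same change would have needed three updates, and missing one would leave the data inconsistent.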
Advantages of DBMSs
• Reduced redundancy: Data normalization removes most redundancy. Less
data duplication saves storage and can improve access time.
• Data Consistency and Integrity: The root cause of data inconsistency is
data redundancy. Since data normalization takes care of data redundancy,
it also takes care of the inconsistency that redundancy causes.
• Data Security: It is easier to apply access constraints in database systems
so that only authorized users are able to access the data. Each user has a
different set of access rights, so data is protected from issues such as
identity theft, data leaks, and misuse of data.
• Privacy: Limited access means privacy of data.
• Easy access to data: Database systems manage data in such a way
that the data is easily accessible with fast response times.
• Easy recovery: Since database systems keep backups of data, it is
easier to do a full recovery of data in case of a failure.
• Flexible: Database systems are more flexible than file processing systems.
Disadvantages of DBMSs
• Cost: DBMS implementation cost is high
compared to the file system
• Complexity: Database systems are complex
to understand
• Performance: Database systems are generic,
making them suitable for various
applications. However, this generality can
reduce their performance for some applications

Roles in DBMS Environment
• Data Administrator (DA)
– Overall management of the data resource
– Database planning
– Development of policies and procedures
– Conceptual/logical database design
– Consults with senior managers
• Database Administrator (DBA)
– Physical database design
– Security
– Integrity control
– The DBA role is more technical and requires detailed
knowledge of the target DBMS and its environment.
Roles in DBMS Environment (continued)
• Database designers
– Logical database designers (answer the question "what?")
• Identify the data (entities and attributes)
• Identify the relationships between the data
• Identify the constraints
• Target a specific data model, such as relational, network, hierarchical,
or object-oriented
– Physical database designers (answer the question "how?")
• Map the logical design to a set of tables
• Define integrity constraints
• Select methods for good performance
• Design data security measures
• All of these depend on the target DBMS
Roles in DBMS Environment (continued)
• Application Developers
– Develop the functionality required by end users.
– Develop applications for database operations.
• End users
– Naïve users: These users access the database through
application programs. They do not have knowledge
of the database.
– Sophisticated users: These users use a high-level query
language such as SQL for database operations. They do
have database knowledge.
DBMS Languages
Introduction
• Users can access, update, delete, and store data or
information in the database using database
languages.
• The following are the database languages in a
database management system:
• Data Definition Language(DDL)
• Data Manipulation Language(DML)
• Data Control Language(DCL)
• Transaction Control Language(TCL)
Data Definition Language (DDL)
• Data Definition Language is used for defining the structure
or schema of the database. It is also used for creating
tables, indexes, applying constraints, etc. in the database.
• DDL statements record metadata, such as the
number of schemas and tables, their names, indexes,
constraints, and the columns in each table.
• The result of Data Definition Language statements is
a set of tables stored in a special file called the data
dictionary.
• DDL is used to specify the conceptual schema, whose
definitions describe the entities, relationships, and
attributes stored in the database.
Database Schema
• A database schema is the skeleton structure that
represents the logical view of the entire database.
• It defines how the data is organized and how the
relations among them are associated.
• It formulates all the constraints that are to be applied on
the data.
• A database schema defines its entities and the
relationship among them. It contains a descriptive detail
of the database, which can be depicted by means of
schema diagrams.
• It’s the database designers who design the schema to
help programmers understand the database and make it
useful.
Data Definition Language (DDL)
Commands
• Create: This command is used to create a new table or a new
database.
• Alter: This command is used to alter or change the structure of the
database table.
• Drop: This command is used to delete a table, index, or view from
the database.
• Truncate: This command is used to delete all records from
the table, while its structure remains as it is.
• Rename: This command is used to rename an object from the
database.
• Comment: This command is used for adding comments to our
table.
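A few of these commands can be tried with Python's sqlite3 module, as in the sketch below. The table and column names are assumptions. Note that SQLite supports CREATE, ALTER (including RENAME TO), and DROP, but has no TRUNCATE or COMMENT statement, so those two are not shown. Other DBMSs differ in which of these commands they accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names(conn):
    """List user tables recorded in SQLite's catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def column_names(conn, table):
    """List column names; column 1 of PRAGMA table_info is the name."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Create: define a new table.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Alter: change the structure of the table.
conn.execute("ALTER TABLE student ADD COLUMN marks INTEGER")
columns_after_alter = column_names(conn, "student")

# Rename: SQLite spells this as ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE student RENAME TO learner")
tables_after_rename = table_names(conn)

# Drop: remove the table and its definition.
conn.execute("DROP TABLE learner")
tables_after_drop = table_names(conn)

print(columns_after_alter, tables_after_rename, tables_after_drop)
```

The catalog queried by table_names plays the role of the stored metadata that DDL statements maintain.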
Data Manipulation Language (DML)

• A language that offers a set of operations to
support the fundamental data manipulation
operations on the data held in the database.
• Data Manipulation Language (DML)
statements are used to manage data within
schema objects.
The tasks that come under DML:

• SELECT - It retrieves data from a database
• INSERT - It inserts data into a table
• UPDATE - It updates existing data within a
table
• DELETE - It deletes records from a table (all
records if no condition is given); the table
itself and its structure remain
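The four DML statements can be sketched with sqlite3 as follows. The student table and its rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)

# INSERT: add rows to the table.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 67), (3, "Meera", 91)],
)

# UPDATE: change existing data in matching rows.
conn.execute("UPDATE student SET marks = 70 WHERE roll_no = 2")

# DELETE: remove only the rows that match the condition.
deleted = conn.execute("DELETE FROM student WHERE marks < 80").rowcount

# SELECT: retrieve what is left.
remaining = conn.execute(
    "SELECT roll_no, name, marks FROM student ORDER BY roll_no"
).fetchall()
print(deleted, remaining)  # 1 [(1, 'Asha', 82), (3, 'Meera', 91)]
```

A DELETE without a WHERE clause would remove every row while leaving the table definition in place.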
Data Control Language (DCL)
• Besides DDL and DML, there are two other database
sub-languages: DCL and TCL. The Data Control Language
(DCL) is used to control privileges in a database. To
perform any operation in the database, such as creating
tables, sequences, or views, a user needs privileges.
• Privileges are of two types:
• System - creating a session, table, etc. are all types of system
privilege.
• Object - any command or query to work on tables comes
under object privilege.
• DCL provides two commands:
• Grant - It gives a user access privileges to a database.
• Revoke - It takes back permissions from the user.
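In a server DBMS these commands are written in SQL, for example GRANT SELECT ON student TO some_user. SQLite does not implement GRANT or REVOKE, so the sketch below is only a toy model in plain Python of what the two commands do. The PrivilegeTable class and all of its method names are hypothetical, not any real DBMS API.

```python
class PrivilegeTable:
    """Toy model of object privileges: who may do what on which object."""

    def __init__(self):
        # Maps a user name to a set of (privilege, object) pairs.
        self._granted = {}

    def grant(self, user, privilege, obj):
        """Give the user a privilege on an object."""
        self._granted.setdefault(user, set()).add((privilege, obj))

    def revoke(self, user, privilege, obj):
        """Take the privilege back; a no-op if it was never granted."""
        self._granted.get(user, set()).discard((privilege, obj))

    def allowed(self, user, privilege, obj):
        """Check whether the user currently holds the privilege."""
        return (privilege, obj) in self._granted.get(user, set())


privileges = PrivilegeTable()
privileges.grant("clerk", "SELECT", "student")

can_read = privileges.allowed("clerk", "SELECT", "student")
can_write = privileges.allowed("clerk", "UPDATE", "student")

privileges.revoke("clerk", "SELECT", "student")
can_read_after_revoke = privileges.allowed("clerk", "SELECT", "student")

print(can_read, can_write, can_read_after_revoke)  # True False False
```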
Transaction Control Language (TCL)
• Transaction Control statements are used to manage the
changes made by DML statements. They allow
statements to be grouped into logical transactions.
• COMMIT - It saves the work done
• SAVEPOINT - It identifies a point in a transaction to
which you can later roll back
• ROLLBACK - It restores the database to its state as of
the last COMMIT
• SET TRANSACTION - It changes transaction options
such as the isolation level and which rollback segment
to use
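COMMIT, SAVEPOINT, and ROLLBACK can be sketched with sqlite3 as shown below. The account table, its rows, and the savepoint name are assumptions. The connection is opened with isolation_level=None so that the module does not start transactions on its own and each BEGIN, COMMIT, and ROLLBACK is explicit. SQLite has no SET TRANSACTION statement, so it is not shown.

```python
import sqlite3

# isolation_level=None: transactions are controlled only by explicit SQL.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")

conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.execute("COMMIT")  # the two rows are now saved


def balances(conn):
    return conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall()


# Transfer 30 from A to B, undoing one mistaken step along the way.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 30 WHERE name = 'A'")
conn.execute("SAVEPOINT after_debit")
conn.execute("UPDATE account SET balance = balance + 999 WHERE name = 'B'")  # mistake
conn.execute("ROLLBACK TO SAVEPOINT after_debit")  # undo only the mistake
conn.execute("UPDATE account SET balance = balance + 30 WHERE name = 'B'")
conn.execute("COMMIT")
after_commit = balances(conn)

# A full ROLLBACK discards everything since the last COMMIT.
conn.execute("BEGIN")
conn.execute("DELETE FROM account")
conn.execute("ROLLBACK")
after_rollback = balances(conn)

print(after_commit, after_rollback)
```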
