
INFORMATION MANAGEMENT (2013 REGULATION)

UNIT – 1 DATABASE MODELLING, MANAGEMENT AND DEVELOPMENT

Database design and modelling - Business Rules and Relationship;

Java database Connectivity (JDBC), Database connection Manager, Stored Procedures.

Trends in Big Data systems including NoSQL - Hadoop HDFS, MapReduce, Hive, and enhancements.

Database design and modelling :

Data/Information:
 The words Data and Information may look similar, but they do not mean the same thing.

Different forms of data:


 Letter, Word, Number, Image, Sound etc…

Data: Data is in a raw and unorganized form.
Information: After the data is processed, information is organized, structured (or) presented in a given context so as to make it useful.

- Data does not depend on information, but information depends on data.

- Data is not specific and does not carry any meaning, whereas information is specific and meaningful.
For example, consider the following :

Data:
Saranya324917628
Rajkumar476193248
Kamal548429344
Gopal551742186
Latha409723145

Information:

KGISL Institute of Technology

Course: IT Semester: 4

Student Name ID
Saranya 324917628
Rajkumar 476193248
Kamal 548429344
Gopal 551742186
Latha 409723145
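
The step from data to information in the example above can be sketched in code. This is a minimal illustration, assuming every raw string is a name immediately followed by a numeric ID; the class and method names (DataToInformation, parse) are hypothetical and not part of the course material.

```java
import java.util.List;

// Hypothetical helper class: the names DataToInformation and parse are illustrative only.
class DataToInformation {

    // Splits one raw string at its first digit: the characters before it are
    // taken as the name, the rest as the ID.
    static String[] parse(String raw) {
        int i = 0;
        while (i < raw.length() && !Character.isDigit(raw.charAt(i))) {
            i++;
        }
        return new String[] { raw.substring(0, i), raw.substring(i) };
    }

    public static void main(String[] args) {
        List<String> data = List.of("Saranya324917628", "Rajkumar476193248",
                "Kamal548429344", "Gopal551742186", "Latha409723145");

        // The heading and column labels supply the context that turns values into information.
        System.out.println("Course: IT  Semester: 4");
        System.out.printf("%-12s %s%n", "Student Name", "ID");
        for (String raw : data) {
            String[] row = parse(raw);
            System.out.printf("%-12s %s%n", row[0], row[1]);
        }
    }
}
```

The values themselves do not change; only the structure and labels added around them make them meaningful.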

 Data can be Qualitative (or) Quantitative

Qualitative: Ex: “My name is hari”

Quantitative:
- Discrete [Whole number] Ex: 5, 6, 7
- Continuous [Within range] Ex: 3.25, 5.68

File:
A file is a named collection of related information that resides on secondary storage.
What is a file management system?

 A file management system is a type of software that manages data files in a computer system.

 It has limited capabilities and is designed to manage individual or group files, such as documents and office
records.

 It may display report details, like owner, creation date, state of completion and similar features useful in an
office environment.

Advantages of file management system:

1. Simpler to use

2. Less expensive

3. Popular FMSs are packaged along with the operating system (Notepad, WordPad, Microsoft Office, etc.).

4. Fits the needs of many small businesses and home users.


What is DBMS?
DATABASE MANAGEMENT SYSTEM
 A database refers to a collection of related data and the way it is organized. Access to this data is usually provided by a database management system.

 A DBMS is a collection of programs (or) a software application that interacts with users and other application software.

Some of the well-known DBMSs are:

1. Microsoft Access
2. Oracle
3. FoxPro
4. SQLite
5. Firebird
6. Microsoft SQL Server
7. PostgreSQL
8. IBM DB2
9. SAP Sybase
10. R:Base
11. MySQL
12. Microsoft Azure SQL Database (cloud-based), etc.



Database Design:

 Database design is a framework that the database uses for planning, storing and managing data in companies and organizations. (It defines the structure of a database.)

Main phases of database design:

1. Conceptual database design

- This design is created from the business rules. The entity relationship diagram is used to represent the conceptual design. It consists of entities, attributes and relationships.

2. Logical database design

- Here the data is arranged into logical structures and mapped into database management system tables.

3. Physical database design

- The actual physical implementation of the database in a DBMS. It includes the description of data features, data types, indexing, etc.

Database Design: Conceptual Design, then Logical Design, then Physical Design.

Data Modelling:

 Data models show how data is connected and stored in the system.

 They show the relationships between data.

There are different types of data models:

1. Flat data model

2. Entity relationship model

3. Relational model

4. Network model

5. Hierarchical model

6. Object-oriented data model, etc.



1. Flat data model

 A flat database is a simple database system in which each database is represented as a single table. All of the records are stored as single rows of data, with fields separated by delimiters such as tabs or commas.
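
A flat table of this kind can be read with a few lines of code. The sketch below assumes one record per line and a single-character delimiter with no quoting or escaping; FlatFileDemo and parse are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FlatFileDemo {

    // Reads delimiter-separated text: one record per line, one field per delimiter.
    static List<String[]> parse(String text, String delimiter) {
        List<String[]> records = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (!line.isEmpty()) {
                // Pattern.quote makes split() treat the delimiter literally, not as a regex.
                records.add(line.split(Pattern.quote(delimiter)));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String flat = "Saranya,324917628\nRajkumar,476193248\nKamal,548429344\n";
        for (String[] record : parse(flat, ",")) {
            System.out.println(record[0] + " | " + record[1]);
        }
    }
}
```

Because the file has no relationships or constraints of its own, any structure beyond "rows of fields" has to be imposed by the program that reads it.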

2. Entity relationship model (ER Model)

 ER diagrams help you to explain the logical structure of databases.

 The Entity-Relationship model is based on the notion of real-world entities and the relationships between them.
3. Relational model

 The relational model represents the database as a collection of relations.

 A relation is nothing but a table of values. Every row in the table represents a collection of related
data values.
Key Differences Between E-R Model and Relational Model
 An E-R Model describes the data with entity set, relationship set and attributes. However, the
Relational model describes the data with the tuples, attributes and domain of the attribute.

 E-R Model has Mapping Cardinality as a constraint whereas Relational Model does not have such
constraint.

4. Network model

 In the network model, entities are organized as a graph, and some entities in the graph can be accessed through several paths.

5. Hierarchical model

 In the hierarchical model, a parent entity can have several child entities, but at the top there must be only one entity, called the root.

6. Object-oriented data model

 The object-oriented data model is based upon real-world situations.

 These situations are represented as objects with different attributes. (Shape, Circle, Rectangle and Triangle are all objects in this model.)

EX:

 Circle has the attributes Center and Radius.

 Rectangle has the attributes Length and Breadth.
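
The shape example maps directly onto classes. The following is a small sketch, assuming the center is stored as two coordinates; the attribute names follow the text, while the area() method and the ShapeDemo class are additions made only for illustration.

```java
// Attribute names follow the text; area() is an added, illustrative method.
abstract class Shape {
    abstract double area();
}

class Circle extends Shape {
    final double centerX;
    final double centerY;
    final double radius;

    Circle(double centerX, double centerY, double radius) {
        this.centerX = centerX;
        this.centerY = centerY;
        this.radius = radius;
    }

    @Override
    double area() {
        return Math.PI * radius * radius;
    }
}

class Rectangle extends Shape {
    final double length;
    final double breadth;

    Rectangle(double length, double breadth) {
        this.length = length;
        this.breadth = breadth;
    }

    @Override
    double area() {
        return length * breadth;
    }
}

class ShapeDemo {
    public static void main(String[] args) {
        // Each object carries its own attributes and its own behaviour.
        Shape[] shapes = { new Circle(0, 0, 1.5), new Rectangle(4, 2.5) };
        for (Shape s : shapes) {
            System.out.println(s.getClass().getSimpleName() + " area = " + s.area());
        }
    }
}
```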



ER Diagram:

A university registrar’s office maintains data about the following entities: (a) Courses including: number, title, credits, syllabus, and
prerequisites; (b) Course offerings including: course_number, year, semester, section_number, instructor(s), timings, and classroom;
(c) Students including: student-id, name, and program; and (d) Instructors including: identification_number, name, department, and
title. Further, the enrollment of students in courses and grades awarded to students in each course they are enrolled for must be
appropriately modeled. Construct an E-R diagram for the registrar’s office. Document all assumptions that you make about the
mapping constraints.

Normalization :
 Normalization is the process of organizing data in a database.

 This includes creating tables and establishing relationships between those tables according to rules designed both to
protect the data and to make the database more flexible by eliminating two factors: redundancy and inconsistent
dependency.

 Normalization is the analysis of functional dependencies between attributes.

 It is the process of decomposing relations with anomalies to produce well-structured relations.

 A well-structured relation contains minimal redundancy and allows insertion, modification, and deletion without errors or inconsistencies.
Purpose of Normalization :

 Normalization allows us to minimize insert, update, and delete anomalies and helps maintain data consistency in the database.
1. To avoid redundancy by storing each fact within the database only once.
2. To put data into a form that is more able to accurately accommodate change.
3. To avoid certain updating “anomalies”.
4. To facilitate the enforcement of data constraints.
5. To avoid unnecessary coding.
(Extra programming in triggers and stored procedures can be required to handle non-normalized data, and this in turn can impair performance significantly.)

Different forms of Normalization :


First Normal Form (1NF) :
 A table is in first normal form (1NF) if and only if all columns contain only atomic values.

 Each record must be unique, and the order of the rows is irrelevant.
Second Normal Form (2NF) :
 A table is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully
dependent on the primary key.
Third Normal Form (3NF) :
 To be in Third Normal Form (3NF) the relation must be in 2NF and no transitive dependencies may exist
within the relation.

 A transitive dependency is when an attribute is indirectly functionally dependent on the key (that is, the
dependency is through another non-key attribute).
Boyce–Codd Normal Form (BCNF) :
 To be in Boyce–Codd Normal Form (BCNF) the relation must be in 3NF and every determinant must be a
candidate key.
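Fifth Normal Form (5NF) :
 The Fifth Normal Form concerns join dependencies, which are rarely obvious: a relation is in 5NF when every join dependency in it is implied by its candidate keys.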
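
A small decomposition makes the idea of removing partial dependencies concrete. In the sketch below, the wide table has the key (studentId, courseNo), but the student name depends only on studentId and the course title only on courseNo, which violates 2NF. The course numbers and titles are made up, and NormalizationDemo, project and enrollments are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class NormalizationDemo {

    // Unnormalized rows: {studentId, studentName, courseNo, courseTitle}.
    // The key is (studentId, courseNo); course numbers and titles are made up.
    static final String[][] WIDE = {
        {"324917628", "Saranya",  "IT401", "Databases"},
        {"324917628", "Saranya",  "IT402", "Networks"},
        {"476193248", "Rajkumar", "IT401", "Databases"},
    };

    // Projects two columns into a key -> value table, storing each fact once.
    static Map<String, String> project(String[][] rows, int keyCol, int valueCol) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] row : rows) {
            table.put(row[keyCol], row[valueCol]);
        }
        return table;
    }

    // Keeps only the key columns: which student takes which course.
    static Set<String> enrollments(String[][] rows) {
        Set<String> pairs = new LinkedHashSet<>();
        for (String[] row : rows) {
            pairs.add(row[0] + ":" + row[2]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println("Student    " + project(WIDE, 0, 1));
        System.out.println("Course     " + project(WIDE, 2, 3));
        System.out.println("Enrollment " + enrollments(WIDE));
    }
}
```

After the split, renaming a course means changing one entry in the Course table instead of every row that mentions it, which is exactly the update anomaly that normalization is meant to avoid.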
Business Rules and Relationship:

 A business rule is a statement that imposes some form of constraint on a specific aspect of the
database, such as the elements within a field specification for a particular field.

 The business or domain understanding can be provided using the business rules.

Example:

Business Rules: License Inspection Project

 A driver of a vehicle must have a valid driver’s license.

 The driver’s license belongs to the driver.

 The expiry date of the driver’s license is later than the inspection date.

 The physical proof is produced within 24 hours of the inspection date.
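
Two of the rules above are simple enough to express as checks in code. This is only a sketch: it assumes the inspection is recorded as a timestamp so that the 24-hour rule can be measured, and the class name, method names and sample dates are all hypothetical.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;

class LicenseInspectionRules {

    // Rule: the expiry date of the driver's license is later than the inspection date.
    static boolean licenseValidOn(LocalDate expiry, LocalDate inspection) {
        return expiry.isAfter(inspection);
    }

    // Rule: the physical proof is produced within 24 hours of the inspection.
    // Assumption: the inspection is stored as a timestamp, not just a date.
    static boolean proofInTime(LocalDateTime inspectedAt, LocalDateTime proofAt) {
        Duration delay = Duration.between(inspectedAt, proofAt);
        return !delay.isNegative() && delay.compareTo(Duration.ofHours(24)) <= 0;
    }

    public static void main(String[] args) {
        // Sample values, made up for the demonstration.
        LocalDate expiry = LocalDate.of(2026, 3, 31);
        LocalDateTime inspectedAt = LocalDateTime.of(2025, 6, 10, 9, 30);
        LocalDateTime proofAt = LocalDateTime.of(2025, 6, 11, 8, 0);

        System.out.println("license valid: " + licenseValidOn(expiry, inspectedAt.toLocalDate()));
        System.out.println("proof in time: " + proofInTime(inspectedAt, proofAt));
    }
}
```

In a real database the same rules would more likely be enforced as constraints or triggers; the code form is used here only to show that each rule is a testable condition.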



Java database Connectivity (JDBC):


 A relational database is usually the primary data resource in an enterprise application.

 Java Database Connectivity (JDBC) is an application program interface (API) packaged with Java SE that makes it possible to standardize and simplify the process of connecting Java applications to external relational database management systems (RDBMS).

 JDBC can run on different platforms and interact with different DBMSs, and you can send SQL and PL/SQL statements to almost any relational database.

The JDBC API is defined in the following packages: java.sql and javax.sql

JDBC supports a large number of databases.


The JDBC API supports both two-tier and three-tier processing models.

JDBC interfaces, classes and components:



JDBC Architecture consists of two layers −


 JDBC API: This provides the application-to-JDBC Manager connection.

 JDBC Driver API: This supports the JDBC Manager-to-Driver Connection.

 The driver manager is capable of supporting multiple concurrent drivers connected to multiple
heterogeneous databases.

The JDBC API provides the following interfaces and classes:

 DriverManager: This class manages a list of database drivers. It matches connection requests from the Java application with the proper database driver using the communication subprotocol. The first driver that recognizes a given subprotocol under JDBC is used to establish a database Connection.

 Driver: This interface handles the communications with the database server. You will interact directly with Driver objects very rarely. Instead, you use the DriverManager, which manages objects of this type.

 Connection: This interface provides all the methods for contacting a database. The Connection object represents the communication context, i.e., all communication with the database goes through the Connection object only.

 Statement: You use objects created from this interface to submit SQL statements to the database. Some derived interfaces accept parameters in addition to executing stored procedures.

 ResultSet: These objects hold data retrieved from a database after you execute an SQL query using Statement objects. A ResultSet acts as an iterator to allow you to move through its data.

 SQLException: This class represents any errors that occur in a database application.
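
How these pieces fit together can be sketched with the standard java.sql types. The JDBC URL, user name, password, table and column names below are hypothetical placeholders, and a matching JDBC driver must be on the classpath for a connection to succeed; without one, the program only prints the error it receives.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class JdbcSketch {

    // Hypothetical connection URL; the subprotocol ("mysql") selects the driver.
    static final String URL = "jdbc:mysql://localhost:3306/college";

    // Runs a parameterized query through Connection -> PreparedStatement -> ResultSet.
    // The table "student" and its columns are assumptions made for this example.
    static List<String> studentNames(Connection conn, int semester) throws SQLException {
        String sql = "SELECT name FROM student WHERE semester = ?";
        List<String> names = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setInt(1, semester);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {          // the ResultSet is iterated row by row
                    names.add(rs.getString("name"));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // DriverManager picks the first registered driver that accepts the URL.
        try (Connection conn = DriverManager.getConnection(URL, "user", "password")) {
            System.out.println(studentNames(conn, 4));
        } catch (SQLException e) {
            // Also reached when no suitable driver is available for the URL.
            System.out.println("Database error: " + e.getMessage());
        }
    }
}
```

The try-with-resources blocks close the ResultSet, the statement and the connection in reverse order, even when an SQLException is thrown.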

JDBC drivers are divided into four types or levels:


 A JDBC driver is a software component that enables a Java application to interact with the database.

The different types of JDBC drivers are:

1. Type 1: JDBC-ODBC Bridge driver (Bridge)

2. Type 2: Native-API/ Partly Java driver (Native)

3. Type 3: All Java/ Net-protocol driver (Middleware)

4. Type 4: All Java/ Native-protocol driver (Pure)


Steps to connect any Java application with the database using JDBC:

Trends in Big Data systems: Including NoSQL - Hadoop HDFS, MapReduce, Hive, and
enhancements.

Introduction to Big Data:

 Big Data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time.

 The data can be in both structured and unstructured formats.

 It does not mean only a large set of data: the data is also analyzed for insights that lead to better decisions and strategic moves.

 Big Data is all about the 5 V’s: Volume, Velocity, Value, Variety and Veracity.

An example:

 Big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting
of billions to trillions of records of millions of people—all from different sources.

(e.g. Web, sales, customer contact center, social media, mobile data and so on).

What is the need and use of Big Data:


Big data analytics is the process of extracting useful information by analyzing different types of
big data sets. Big data analytics is used to discover hidden patterns, market trends and consumer
preferences, for the benefit of organizational decision making.

Location Tracking: Location analytics helps logistics companies to mitigate risks in transport and to improve speed and reliability in delivery.

Precision Medicine: Analyzing the past records of patients and their medicines.

Fraud Detection & Handling: The banking and finance sector uses big data to predict and prevent cyber crimes, detect card fraud, archive audit trails, etc.

Advertising: Facebook, Google, Twitter and other online giants all keep track of user behavior and transactions.

Entertainment & Media: In the field of entertainment and media, big data focuses on targeting people with the right content at the right time. Based on your past views and your online behavior, you will be shown different recommendations.

How Companies Use Big Data:

Characteristics of Big Data:


Who is using Big Data? 5 Applications


1) Healthcare
 Big Data has already started to create a huge difference in the healthcare sector. With the help of predictive analytics,
medical professionals and HCPs are now able to provide personalized healthcare services to individual patients.
2) Academia
 Big Data is also helping enhance education today. Education is no longer limited to the physical bounds of the classroom (e.g. Udacity, NPTEL).
3) Banking
 The banking sector relies on Big Data for fraud detection. Big Data tools can efficiently detect fraudulent acts in real time, such as misuse of credit/debit cards, archival of inspection tracks, faulty alteration in customer stats, etc.
4) Manufacturing
 According to TCS Global Trend Study, the most significant benefit of Big Data in manufacturing is improving the
supply strategies and product quality.
5) IT
 One of the largest users of Big Data, IT companies around the world are using Big Data to optimize their functioning,
enhance employee productivity, and minimize risks in business operations. By combining Big Data technologies
with ML and AI, the IT sector is continually powering innovation to find solutions even for the most complex
of problems.

NoSQL: (Distributed Database)


 A NoSQL (Not Only SQL) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.

 A NoSQL database does not necessarily follow the strict rules that govern transactions in relational databases. These rules are known by the acronym ACID (Atomicity, Consistency, Isolation, Durability).

For example, NoSQL databases do not use fixed schema structures and SQL joins.

 NoSQL databases are increasingly used in big data and real-time web applications.

For example, companies like Twitter, Facebook and Google collect terabytes of user data every single day.

Why NoSQL:

 The system response time becomes slow when you use an RDBMS for massive volumes of data. To resolve this problem, we could "scale up" our systems by upgrading our existing hardware. This process is expensive.

 The alternative is to distribute the database load across multiple hosts whenever the load increases. This method is known as "scaling out."
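
One simple way to picture scaling out is to assign each record to a host by hashing its key. The sketch below uses plain modulo hashing, which is an assumption chosen for brevity and not something the text prescribes; production systems often use other schemes such as consistent hashing. ScaleOutDemo, hostFor and load are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

class ScaleOutDemo {

    // Chooses a host index for a key. Math.floorMod keeps the result in
    // [0, hostCount) even when hashCode() is negative.
    static int hostFor(String key, int hostCount) {
        return Math.floorMod(key.hashCode(), hostCount);
    }

    // Counts how many of the keys land on each host.
    static int[] load(List<String> keys, int hostCount) {
        int[] perHost = new int[hostCount];
        for (String key : keys) {
            perHost[hostFor(key, hostCount)]++;
        }
        return perHost;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("Saranya", "Rajkumar", "Kamal", "Gopal", "Latha");
        // Adding a host spreads the same keys over more machines.
        System.out.println("2 hosts: " + Arrays.toString(load(keys, 2)));
        System.out.println("3 hosts: " + Arrays.toString(load(keys, 3)));
    }
}
```

With modulo hashing, changing the number of hosts moves many keys to a different host, which is one reason real systems prefer more careful placement schemes.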

Brief History of NoSQL Databases


 1998- Carlo Strozzi used the term NoSQL for his lightweight, open-source relational database.

 2000- Graph database Neo4j is launched.

 2004- Google BigTable is launched.

 2005- CouchDB is launched.

 2007- The research paper on Amazon Dynamo is released.

 2008- Facebook open-sources the Cassandra project.

 2009- The term NoSQL was reintroduced.


Features of NoSQL
 Non-relational

- NoSQL databases never follow the relational model and never provide tables with flat fixed-column records
- Work with self-contained aggregates or BLOBs
- Don't require object-relational mapping and data normalization
- No complex features like query languages, query planners, referential integrity, joins, ACID

 Schema-free

- NoSQL databases are either schema-free or have relaxed schemas
- Do not require any sort of definition of the schema of the data
- Offer heterogeneous structures of data in the same domain

 Simple API

- Offers easy-to-use interfaces for storing and querying data
- APIs allow low-level data manipulation & selection methods
- Text-based protocols, mostly used with HTTP REST with JSON
- Mostly no standards-based query language is used
- Web-enabled databases running as internet-facing services

 Distributed

- Multiple NoSQL databases can be executed in a distributed fashion
- Offers auto-scaling and fail-over capabilities
- Often the ACID concept can be sacrificed for scalability and throughput
- Mostly no synchronous replication between distributed nodes; asynchronous multi-master replication, peer-to-peer, or HDFS replication is used instead
- Only provides eventual consistency
- Shared-nothing architecture. This enables less coordination and higher distribution.

Types of NoSQL Databases:


 There are mainly four categories of NoSQL databases.

 Each of these categories has its unique attributes and limitations.

 No specific database is better to solve all problems. You should select a database based on your product needs.

Four Types:
 Key-value Pair Based

 Document-oriented

 Column-oriented

 Graph-based

List of NoSQL Databases:

 Key-value Pair Based

- Aerospike, Apache Ignite, ArangoDB, Berkeley DB, Couchbase, Dynamo, FoundationDB, InfinityDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, SciDB, SDBM/Flat File dbm, ZooKeeper

 Document-oriented

- Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB

 Column-oriented

- Accumulo, Cassandra, Scylla, Apache Druid, HBase, Vertica

 Graph-based

- AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso

Key-value Pair Based: (Basic type of NoSQL database)

 Data is stored in key/value pairs. It is designed in such a way as to handle lots of data and heavy load. A key is required to retrieve and update data.

 Key-value pair storage databases store data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.

 For example, a key-value pair may contain a key like "Website" associated with a value like "google.com".
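
The hash-table idea behind a key-value store can be sketched with an in-memory map. This is only an illustration of the access pattern (put, get and delete by key), not the API of any particular product; KeyValueStore and its methods are hypothetical, and values are kept as plain strings for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

class KeyValueStore {

    // Each key is unique; putting an existing key again overwrites its value.
    private final Map<String, String> table = new HashMap<>();

    void put(String key, String value) {
        table.put(key, value);
    }

    // Returns null when the key is not present.
    String get(String key) {
        return table.get(key);
    }

    // Returns true if a value was stored under the key and has been removed.
    boolean delete(String key) {
        return table.remove(key) != null;
    }

    public static void main(String[] args) {
        KeyValueStore store = new KeyValueStore();
        store.put("Website", "google.com");
        System.out.println(store.get("Website"));
        System.out.println(store.get("Missing"));
    }
}
```

Note that the only way in is through the key: there is no query on the value, which is what keeps lookups fast and the model simple.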

Column-based:

 Column-oriented databases work on columns and are based on the BigTable paper by Google. Every column is treated separately. The values of a single column are stored contiguously.

 They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc., as the data is readily available in a column.

 Column-based NoSQL databases are widely used to manage data warehouses, business intelligence, CRM, library card catalogs, etc.
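
The difference between the two layouts can be shown with arrays. In this sketch the same three records are held once row by row and once column by column; the marks are made-up sample values, and ColumnStoreDemo with its methods is a hypothetical name.

```java
class ColumnStoreDemo {

    // Row-oriented layout: every record keeps all of its fields together.
    static final Object[][] ROWS = {
        {"Saranya", "IT", 82},
        {"Kamal",   "IT", 74},
        {"Latha",   "IT", 91},
    };

    // Column-oriented layout: each column is one contiguous array.
    static final String[] NAME  = {"Saranya", "Kamal", "Latha"};
    static final int[]    MARKS = {82, 74, 91};

    // SUM over the row layout has to step through whole records.
    static long sumFromRows(Object[][] rows, int column) {
        long total = 0;
        for (Object[] row : rows) {
            total += (Integer) row[column];
        }
        return total;
    }

    // SUM over the column layout scans a single array and touches nothing else.
    static long sumFromColumn(int[] column) {
        long total = 0;
        for (int value : column) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("SUM via rows:   " + sumFromRows(ROWS, 2));
        System.out.println("SUM via column: " + sumFromColumn(MARKS));
    }
}
```

Both methods return the same total; the column version simply reads far less unrelated data, which is where the aggregation advantage comes from.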

Document-Oriented:

 Document-oriented NoSQL databases replace the familiar rows-and-columns structure with a document storage model. They store and retrieve data as a key-value pair, but the value part is stored as a document.

 Each document is structured, frequently using the JavaScript Object Notation (JSON) model. The document data model is associated with object-oriented programming, where each document is an object.

 The document type is mostly used for CMS systems, blogging platforms, real-time analytics & e-commerce applications. It should not be used for complex transactions which require multiple operations or queries against varying aggregate structures.
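
A document can be pictured as a nested structure stored under a key. Since the Java standard library has no JSON type, the sketch below uses nested Map and List objects as a stand-in for a JSON document; the sample blog post and all names (DocumentStoreDemo, save, find, field, samplePost) are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DocumentStoreDemo {

    // The "collection": document key -> document. A nested Map stands in for JSON.
    private final Map<String, Map<String, Object>> documents = new HashMap<>();

    void save(String key, Map<String, Object> document) {
        documents.put(key, document);
    }

    Map<String, Object> find(String key) {
        return documents.get(key);
    }

    // Walks a path of field names into a nested document; returns null if a step is missing.
    @SuppressWarnings("unchecked")
    static Object field(Map<String, Object> document, String... path) {
        Object current = document;
        for (String name : path) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<String, Object>) current).get(name);
        }
        return current;
    }

    // Hypothetical blog-post document; every field here is made up.
    static Map<String, Object> samplePost() {
        return Map.of(
            "title", "Intro to NoSQL",
            "tags", List.of("nosql", "databases"),
            "author", Map.of("name", "Saranya", "semester", 4));
    }

    public static void main(String[] args) {
        DocumentStoreDemo store = new DocumentStoreDemo();
        store.save("post:1", samplePost());
        System.out.println(field(store.find("post:1"), "author", "name"));
        System.out.println(field(store.find("post:1"), "tags"));
    }
}
```

Unlike the plain key-value case, the store can look inside the value, and two documents in the same collection need not have the same fields.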

Graph Based:

 A graph-type database stores entities as well as the relations amongst those entities. Each entity is stored as a node, with the relationships as edges.

 An edge gives a relationship between nodes. Every node and edge has a unique identifier.

 Graph-based databases are mostly used for social networks, logistics and spatial data.
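
The node-and-edge structure can be sketched with two maps, one for nodes and one for edges, each keyed by a unique identifier. The class GraphStoreDemo, its methods and the "FOLLOWS" relationship are hypothetical and only illustrate the model; they do not mirror the API of any graph database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GraphStoreDemo {

    // node id -> entity label
    private final Map<String, String> nodes = new LinkedHashMap<>();
    // edge id -> {fromNodeId, relationship, toNodeId}
    private final Map<String, String[]> edges = new LinkedHashMap<>();

    void addNode(String id, String label) {
        nodes.put(id, label);
    }

    void addEdge(String id, String from, String relationship, String to) {
        if (!nodes.containsKey(from) || !nodes.containsKey(to)) {
            throw new IllegalArgumentException("both end nodes must exist");
        }
        edges.put(id, new String[] { from, relationship, to });
    }

    // Returns the ids of the nodes reached from 'from' over edges of the given relationship.
    List<String> neighbours(String from, String relationship) {
        List<String> result = new ArrayList<>();
        for (String[] edge : edges.values()) {
            if (edge[0].equals(from) && edge[1].equals(relationship)) {
                result.add(edge[2]);
            }
        }
        return result;
    }

    String label(String id) {
        return nodes.get(id);
    }

    public static void main(String[] args) {
        GraphStoreDemo graph = new GraphStoreDemo();
        graph.addNode("n1", "Saranya");
        graph.addNode("n2", "Kamal");
        graph.addNode("n3", "Latha");
        graph.addEdge("e1", "n1", "FOLLOWS", "n2");
        graph.addEdge("e2", "n1", "FOLLOWS", "n3");
        for (String id : graph.neighbours("n1", "FOLLOWS")) {
            System.out.println("Saranya follows " + graph.label(id));
        }
    }
}
```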

Advantages of NoSQL:
 Can be used as Primary or Analytic Data Source

 Big Data Capability

 No Single Point of Failure

 Easy Replication

 No Need for Separate Caching Layer

 It provides fast performance and horizontal scalability.

 Can handle structured, semi-structured, and unstructured data with equal effect

 Object-oriented programming which is easy to use and flexible

 NoSQL databases don't need a dedicated high-performance server

 Support Key Developer Languages and Platforms

 Simpler to implement than an RDBMS

 It can serve as the primary data source for online applications.

 Handles big data which manages data velocity, variety, volume, and complexity

 Excels at distributed database and multi-data center operations

 Eliminates the need for a specific caching layer to store data

 Offers a flexible schema design which can easily be altered without downtime or service disruption

Disadvantages of NoSQL:
 No standardization rules

 Limited query capabilities

 RDBMS databases and tools are comparatively mature

 It does not offer traditional database capabilities such as consistency when multiple transactions are performed simultaneously.

 When the volume of data increases, it becomes difficult to maintain unique values as keys

 Doesn't work as well with relational data

 The learning curve is steep for new developers

 Most options are open source, so they are not yet as popular with enterprises.



Hadoop:

 Hadoop is an open-source distributed processing framework that manages data processing and storage for big data applications running on clustered systems, where the data can be processed in parallel.

 Hadoop is not a programming language.

 The term "Hadoop" is commonly used for the whole ecosystem that runs on HDFS.

Hadoop provides a solution for Big Data:



 Hadoop was started by Doug Cutting to support two of his other well-known projects, Lucene and Nutch.

 Hadoop was inspired by the Google File System (GFS), which was detailed in a paper released by Google in 2003.

 Hadoop, originally called the Nutch Distributed File System (NDFS), split from Nutch in 2006 to become a sub-project of Lucene. At this point it was renamed Hadoop.

Why is Hadoop important?

 Ability to store and process huge amounts of any kind of data, quickly: With data volumes and varieties
constantly increasing, especially from social media and the Internet of Things (IoT), that's a key consideration.
 Computing power. Hadoop's distributed computing model processes big data fast. The more computing nodes
you use, the more processing power you have.
 Fault tolerance. Data and application processing are protected against hardware failure. If a node goes down,
jobs are automatically redirected to other nodes to make sure the distributed computing does not fail. Multiple
copies of all data are stored automatically.

 Flexibility. Unlike traditional relational databases, you don’t have to preprocess data before storing it. You can
store as much data as you want and decide how to use it later. That includes unstructured data like text, images
and videos.
 Low cost. The open-source framework is free and uses commodity hardware to store large quantities of data.
 Scalability. You can easily grow your system to handle more data simply by adding nodes. Little administration is
required.
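
The idea that more nodes give more processing power rests on splitting the work into independent pieces and combining the partial results. The sketch below imitates that pattern in plain Java with a word count; it does not use the Hadoop API, the parallel stream merely stands in for separate nodes, and all names (SplitAndCombineDemo, countWords, combine) are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SplitAndCombineDemo {

    // "Map" step: count the words of one chunk independently of all other chunks.
    static Map<String, Integer> countWords(String chunk) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // "Reduce" step: merge the partial counts into one overall result.
    static Map<String, Integer> combine(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Each chunk plays the role of a block of data held by a different node.
        List<String> chunks = List.of("big data is big", "data is growing");
        List<Map<String, Integer>> partials = chunks.parallelStream()
                .map(SplitAndCombineDemo::countWords)
                .collect(Collectors.toList());
        System.out.println(combine(partials));
    }
}
```

Because each chunk is counted without looking at any other chunk, a failed piece of work can simply be redone elsewhere, which is the same property that lets a cluster redirect jobs when a node goes down.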

Hadoop Ecosystem:

https://data-flair.training/blogs/hadoop-ecosystem-components/
