Information Management Regulation: Unit - 1 Database Modelling, Management and Development
UNIT - I
INFORMATION MANAGEMENT (2013 REGULATION)
WEB SERVERS, SERVLETS
Database design and modelling :
Data/Information:
The words Data and Information may look similar, but they differ as follows.
Data: Data is raw and unorganized. It is not specific and does not carry any meaning.
Information: Information is data after processing. It is organized, structured, or presented in a given context so as to make it useful. It is specific and meaningful.
For example, consider the following :
Data:
Saranya324917628
Rajkumar476193248
Kamal548429344
Gopal551742186
Latha409723145
Information:
Course: IT Semester: 4
Student Name ID
Saranya 324917628
Rajkumar 476193248
Kamal 548429344
Gopal 551742186
Latha 409723145
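The step from data to information above can be sketched in a few lines of Python. This is only an illustration: the function name to_record and the assumption that every raw string is "letters followed by digits" are ours, not part of the notes.

```python
import re

# Raw data: name and ID fused into one string, as in the example above.
raw_data = [
    "Saranya324917628",
    "Rajkumar476193248",
    "Kamal548429344",
    "Gopal551742186",
    "Latha409723145",
]

def to_record(raw):
    """Split a raw string into a name and an ID (assumes letters, then digits)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", raw)
    if match is None:
        raise ValueError(f"unexpected format: {raw!r}")
    return {"name": match.group(1), "id": match.group(2)}

# Information: the same values, now organized under named fields.
records = [to_record(raw) for raw in raw_data]

for record in records:
    print(f"{record['name']:<10} {record['id']}")
```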
Data can be qualitative or quantitative.
Ex: “My name is hari”
File:
A file is a named collection of related information that resides on secondary storage.
A file management system is a type of software that manages data files in a computer system.
It has limited capabilities and is designed to manage individual or group files, such as documents and office
records.
It may display report details, like owner, creation date, state of completion and similar features useful in an
office environment.
1. Simpler to use
2. Less expensive
3. Popular FMSs are packaged along with operating systems (Notepad, WordPad, Microsoft Office, etc.).
What is DBMS?
DATABASE MANAGEMENT SYSTEM
A database refers to a collection of related data and the way it is organized. Access to this data is usually provided by a database management system.
A DBMS is a collection of programs (or a software application) that interacts with users and other application software.
2. Oracle
3. FoxPro
4. SQLite
5. Firebird
6. Microsoft SQL Server
7. PostgreSQL
8. IBM DB2
9. SAP Sybase
10. R:Base
11. MySQL
Database Design:
Database design is a framework that the database uses for planning, storing and managing data in companies and organizations. (It defines the structure of a database.)
- The design is created from the business rules. The entity relationship diagram is used to represent the conceptual design. It consists of entities, attributes and relationships.
- Here the data is arranged into logical structures and mapped into database management system tables.
Database design covers three levels: Conceptual Design, Logical Design and Physical Design.
Data Modelling:
Data models show how the data is connected and stored in the system.
3. Relational model
4. Network model
5. Hierarchical model
A flat database is a simple database system in which each database is represented as a single table. All of the records are stored as single rows of data, with fields separated by delimiters such as tabs or commas.
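A flat database of this kind can be read with Python's standard csv module. The file contents below are a hypothetical comma-delimited example, held in memory so the sketch is self-contained.

```python
import csv
import io

# Hypothetical flat file: one table, one record per row, comma-delimited.
flat_file = io.StringIO(
    "name,id\n"
    "Saranya,324917628\n"
    "Kamal,548429344\n"
)

# DictReader uses the first row as the field names.
rows = list(csv.DictReader(flat_file))

for row in rows:
    print(row["name"], row["id"])
```

For a tab-delimited file, csv.DictReader accepts delimiter="\t".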
The Entity-Relationship (E-R) model is based on the notion of real-world entities and the relationships between them.
3. Relational model
A relation is nothing but a table of values. Every row in the table represents a collection of related
data values.
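As a minimal sketch, a relation can be created and read back with Python's built-in sqlite3 module. The table name student and its columns are assumptions chosen to match the earlier example; each fetched row comes back as a tuple of related values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# A relation: a table with named attributes (columns).
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(324917628, "Saranya"), (476193248, "Rajkumar")],
)

# Every row is a tuple of related data values.
rows = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
print(rows)
```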
Key Differences Between E-R Model and Relational Model
An E-R Model describes the data with entity sets, relationship sets and attributes. The Relational Model, however, describes the data with tuples, attributes and the domains of the attributes.
The E-R Model has mapping cardinality as a constraint, whereas the Relational Model has no such constraint.
4. Network model
In the network model the entities are organized as a graph, and some entities in the graph can be accessed through several paths.
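The "several paths" property can be sketched with a small graph held in a dictionary. The entity names and links below are hypothetical; the point is only that Offering is reachable from Department along two different routes.

```python
# Hypothetical entities: each entry lists the entities it links to.
links = {
    "Department": ["Course", "Instructor"],
    "Course": ["Offering"],
    "Instructor": ["Offering"],
    "Offering": [],
}

def all_paths(graph, start, goal, path=()):
    """Return every path from start to goal that visits no entity twice."""
    path = path + (start,)
    if start == goal:
        return [list(path)]
    found = []
    for nxt in graph[start]:
        if nxt not in path:
            found.extend(all_paths(graph, nxt, goal, path))
    return found

paths = all_paths(links, "Department", "Offering")
for p in paths:
    print(" -> ".join(p))
```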
5. Hierarchical model
In the hierarchical model, a parent entity can have several child entities, but at the top there is only one entity, called the root.
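A minimal sketch of this structure stores, for each entity, its single parent. The entity names are hypothetical. Because every entity has at most one parent, there is exactly one path from the root to any entity.

```python
# Hypothetical hierarchy: each entity maps to its single parent (None for the root).
parent_of = {
    "University": None,
    "Engineering": "University",
    "Science": "University",
    "IT": "Engineering",
}

def find_roots(parents):
    """Entities without a parent; a valid hierarchy has exactly one."""
    return [name for name, parent in parents.items() if parent is None]

def path_from_root(parents, node):
    """Walk up the parent links, then reverse to get the root-to-node path."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]

print(find_roots(parent_of))
print(path_from_root(parent_of, "IT"))
```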
These situations are represented as objects, with different attributes. (Shape, Circle, Rectangle and Triangle are all
objects in this model)
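The shape example can be sketched as Python classes. The attribute names (radius, width, height, base) and the area method are assumptions added for illustration; the notes only name the objects.

```python
import math
from abc import ABC, abstractmethod

class Shape(ABC):
    """Common parent: every shape is an object with its own attributes."""

    @abstractmethod
    def area(self):
        ...

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

class Triangle(Shape):
    def __init__(self, base, height):
        self.base = base
        self.height = height

    def area(self):
        return 0.5 * self.base * self.height

shapes = [Circle(1), Rectangle(2, 3), Triangle(4, 3)]
for shape in shapes:
    print(type(shape).__name__, shape.area())
```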
ER Diagram:
A university registrar’s office maintains data about the following entities:
(a) Courses including: number, title, credits, syllabus, and prerequisites;
(b) Course offerings including: course_number, year, semester, section_number, instructor(s), timings, and classroom;
(c) Students including: student-id, name, and program; and
(d) Instructors including: identification_number, name, department, and title.
Further, the enrollment of students in courses and grades awarded to students in each course they are enrolled for must be appropriately modeled. Construct an E-R diagram for the registrar’s office. Document all assumptions that you make about the mapping constraints.
Normalization :
Normalization is the process of organizing data in a database.
This includes creating tables and establishing relationships between those tables according to rules designed both to
protect the data and to make the database more flexible by eliminating two factors: redundancy and inconsistent
dependency.
A well-structured relation contains minimal redundancy and allows insertion, modification, and deletion without errors or inconsistencies.
Purpose of Normalization :
Normalization allows us to minimize insert, update, and delete anomalies and helps maintain data consistency in the database.
1. To avoid redundancy by storing each fact within the database only once.
2. To put data into a form that is more able to accurately accommodate change.
3. To avoid certain updating “anomalies”.
4. To facilitate the enforcement of data constraints.
5. To avoid unnecessary coding.
(Extra programming in triggers and stored procedures can be required to handle non-normalized data, and this in turn can impair performance significantly.)
Each record must be unique and the order of the rows is irrelevant.
Second Normal Form (2NF) :
A table is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully functionally dependent on the whole primary key (not on just a part of it).
Third Normal Form (3NF) :
To be in Third Normal Form (3NF) the relation must be in 2NF and no transitive dependencies may exist
within the relation.
A transitive dependency is when an attribute is indirectly functionally dependent on the key (that is, the
dependency is through another non-key attribute).
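A transitive dependency and its removal can be sketched with sqlite3. The schema is hypothetical: if a single student table held student_id, name, dept_id and dept_name, then dept_name would depend on the key only through dept_id. Moving dept_name into its own table removes that dependency, so a department name is stored, and updated, in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 3NF decomposition (hypothetical schema): dept_name lives only in department.
conn.executescript("""
CREATE TABLE department (
    dept_id   TEXT PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    dept_id    TEXT NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES ('IT', 'Information Technology');
INSERT INTO student VALUES (324917628, 'Saranya', 'IT');
INSERT INTO student VALUES (548429344, 'Kamal', 'IT');
""")

# One UPDATE changes the fact once; no student row can be left inconsistent.
conn.execute("UPDATE department SET dept_name = 'Info Tech' WHERE dept_id = 'IT'")

result = conn.execute(
    "SELECT s.name, d.dept_name "
    "FROM student s JOIN department d ON d.dept_id = s.dept_id "
    "ORDER BY s.name"
).fetchall()
print(result)
```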
Boyce–Codd Normal Form (BCNF) :
To be in Boyce–Codd Normal Form (BCNF) the relation must be in 3NF and every determinant must be a
candidate key.
Fifth Normal Form (5NF) :
The Fifth Normal Form concerns dependencies that are obscure.
A business rule is a statement that imposes some form of constraint on a specific aspect of the
database, such as the elements within a field specification for a particular field.
Business rules capture the business or domain understanding that the database must respect.
Example:
The expiry date of the driver’s license is later than the inspection date.
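One way to enforce a rule like this is a CHECK constraint, sketched here with sqlite3. The table and column names are hypothetical, and dates are stored as ISO-8601 text (YYYY-MM-DD) so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table; the CHECK clause encodes the business rule.
conn.execute("""
    CREATE TABLE license_inspection (
        license_no      TEXT PRIMARY KEY,
        inspection_date TEXT NOT NULL,
        expiry_date     TEXT NOT NULL,
        CHECK (expiry_date > inspection_date)
    )
""")

# Satisfies the rule: expiry is later than inspection.
conn.execute(
    "INSERT INTO license_inspection VALUES ('A1', '2024-01-10', '2026-01-10')"
)

# Violates the rule: the database rejects the row.
try:
    conn.execute(
        "INSERT INTO license_inspection VALUES ('B2', '2024-01-10', '2023-12-31')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM license_inspection").fetchone()[0]
print(rejected, row_count)
```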
Java Database Connectivity (JDBC) is an application program interface (API) packaged with Java SE that makes it possible to standardize and simplify the process of connecting Java applications to external relational database management systems (RDBMS).
JDBC can run on different platforms and interact with different DBMSs, and you can send SQL and PL/SQL statements to almost any relational database.
The JDBC API supports both two-tier and three-tier processing models.
The driver manager is capable of supporting multiple concurrent drivers connected to multiple
heterogeneous databases.
Driver: This interface handles the communications with the database server. You will interact directly with Driver objects very rarely. Instead, you use DriverManager, which manages objects of this type.
Connection: This interface provides all the methods for contacting a database. The Connection object represents the communication context, i.e., all communication with the database goes through the Connection object.
Statement: You use objects created from this interface to submit the SQL statements to the database. Some
derived interfaces accept parameters in addition to executing stored procedures.
ResultSet: These objects hold data retrieved from a database after you execute an SQL query using
Statement objects. It acts as an iterator to allow you to move through its data.
SQLException: This class handles any errors that occur in a database application.
Steps to connect any java application with the database using JDBC:
Trends in Big Data systems: Including NoSQL - Hadoop HDFS, MapReduce, Hive, and
enhancements.
Big Data does not mean only a large set of data; the data is also analyzed for insights that lead to better decisions and strategic moves.
Big Data is characterized by the 5 V’s: Volume, Velocity, Value, Variety and Veracity.
An example:
Big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting
of billions to trillions of records of millions of people—all from different sources.
(e.g. Web, sales, customer contact center, social media, mobile data and so on).
Location Tracking: Location analytics helps logistics companies mitigate risks in transport and improve speed and reliability in delivery.
Precision Medicine: Analyzing the past records of the patients and the medicines.
Fraud Detection & Handling: Banking and finance sector is using big data to predict and prevent cyber crimes,
card fraud detection, archival of audit trails, etc.
Advertising: Facebook, Google, Twitter and other online giants all keep track of user behavior and transactions.
Entertainment & Media: In the field of entertainment and media, big data focuses on targeting people with the right
content at the right time. Based on your past views and your behavior online you will be shown different
recommendations.
A NoSQL database does not necessarily follow the strict rules that govern transactions in relational databases. These rules are known by the acronym ACID (Atomicity, Consistency, Isolation, Durability).
For example, NoSQL databases typically do not use fixed schema structures or SQL joins.
NoSQL databases are increasingly used in big data and real-time web applications.
For example, companies like Twitter, Facebook and Google collect terabytes of user data every single day.
Why NoSQL:
The system response time becomes slow when you use an RDBMS for massive volumes of data. To resolve this problem, we could "scale up" our systems by upgrading our existing hardware. This process is expensive.
The alternative is to distribute the database load across multiple hosts whenever the load increases. This method is known as "scaling out."
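One simple way to distribute load when scaling out is to assign each record to a host by hashing its key. The sketch below assumes three hypothetical host names and plain modulo placement; real systems often use more elaborate schemes (for example consistent hashing), because with modulo placement adding a host moves many keys.

```python
import hashlib

# Hypothetical hosts that share the database load.
hosts = ["db-host-0", "db-host-1", "db-host-2"]

def host_for(key, host_list):
    """Pick a host deterministically from the key's SHA-256 hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(host_list)
    return host_list[index]

for key in ["user:1", "user:2", "user:3", "user:4"]:
    print(key, "->", host_for(key, hosts))
```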
Features of NoSQL
NoSQL databases do not follow the relational model and do not provide tables with flat, fixed-column records.
No complex features like query languages, query planners, referential integrity, joins, or ACID.
Schema-free
Simple API
Offers easy-to-use interfaces for storing and querying data.
Distributed
Shared-nothing architecture. This enables less coordination and higher distribution.
No specific database is better to solve all problems. You should select a database based on your product needs.
Four Types:
Key-value Pair Based
Document-oriented
Column-oriented
Graph-based
Examples of column-oriented databases: Accumulo, Cassandra, Scylla, Apache Druid, HBase, Vertica.
Key-value Pair Based:
Data is stored in key/value pairs. It is designed to handle lots of data and heavy load. A key is required to retrieve and update data.
Key-value pair storage databases store data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
For example, a key-value pair may contain a key like "Website" associated with a value like "google.com".
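The behaviour described here can be imitated with a Python dictionary, which is itself a hash table with unique keys. The put and get helpers are hypothetical names, not the API of any particular key-value database.

```python
import json

store = {}  # in-memory stand-in for a key-value database

def put(key, value):
    """Insert or overwrite the value stored under a key."""
    store[key] = value

def get(key, default=None):
    """Retrieve a value; the key is the only way to reach it."""
    return store.get(key, default)

put("Website", "google.com")
put("user:1", json.dumps({"name": "Saranya", "semester": 4}))  # value as JSON text

print(get("Website"))
print(json.loads(get("user:1"))["name"])
print(get("missing-key"))
```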
Column-based:
Column-oriented databases work on columns and are based on the BigTable paper by Google.
Every column is treated separately. The values of a single column are stored contiguously.
They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN, etc., as the data is readily available in a column.
Column-based NoSQL databases are widely used to manage data warehouses, business intelligence, CRM, library card catalogs, etc.
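The difference between row and column layout can be sketched with plain Python lists. The sales figures are made up for illustration. Once the values of one column sit together in a single list, an aggregate only has to scan that list.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "north", "amount": 120},
    {"id": 2, "region": "south", "amount": 80},
    {"id": 3, "region": "north", "amount": 200},
]

# Column-oriented layout: the values of each column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregations touch only the one column they need.
amounts = columns["amount"]
total = sum(amounts)
count = len(amounts)
average = total / count
smallest = min(amounts)

print(total, count, average, smallest)
```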
Document-Oriented:
Document-oriented NoSQL databases replace the familiar rows-and-columns structure with a document storage model. They store and retrieve data as key-value pairs, but the value part is stored as a document.
Each document is structured, frequently using the JavaScript Object Notation (JSON) format.
The document data model is associated with object-oriented programming, where each document is an object.
The document type is mostly used for CMS systems, blogging platforms, real-time analytics and e-commerce applications. It should not be used for complex transactions which require multiple operations or queries against varying aggregate structures.
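A document store can be sketched as keys mapped to JSON documents. The collection below is hypothetical (blog posts, in line with the use cases above); note that the two documents do not share the same set of fields, which a fixed-column table would not allow.

```python
import json

# Hypothetical collection: key -> JSON document (stored as text).
posts = {
    "post:1": json.dumps({"title": "Hello", "tags": ["intro"], "views": 10}),
    "post:2": json.dumps({"title": "NoSQL notes", "author": {"name": "Kamal"}}),
}

# Retrieve by key, then work with the document as a nested object.
first = json.loads(posts["post:1"])
second = json.loads(posts["post:2"])

print(first["tags"])
print(second["author"]["name"])

# A simple query over the documents: titles of posts that have tags.
tagged_titles = [
    doc["title"]
    for doc in (json.loads(text) for text in posts.values())
    if doc.get("tags")
]
print(tagged_titles)
```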
Graph Based:
A graph database stores entities as well as the relations amongst those entities. An entity is stored as a node and a relationship as an edge.
An edge gives a relationship between nodes. Every node and edge has a unique identifier.
Graph-based databases are mostly used for social networks, logistics and spatial data.
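The node-and-edge structure can be sketched with two dictionaries keyed by unique identifiers. The people and the FOLLOWS relationship are hypothetical, chosen to resemble a small social network.

```python
# Nodes: unique id -> entity properties.
nodes = {
    1: {"label": "Person", "name": "Saranya"},
    2: {"label": "Person", "name": "Kamal"},
    3: {"label": "Person", "name": "Latha"},
}

# Edges: unique id -> (source node id, relationship, target node id).
edges = {
    101: (1, "FOLLOWS", 2),
    102: (2, "FOLLOWS", 3),
    103: (1, "FOLLOWS", 3),
}

def neighbours(node_id, relation):
    """Ids of the nodes reached from node_id over edges of the given type."""
    return sorted(
        target
        for source, rel, target in edges.values()
        if source == node_id and rel == relation
    )

followed = [nodes[n]["name"] for n in neighbours(1, "FOLLOWS")]
print(followed)
```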
Advantages of NoSQL:
Can be used as Primary or Analytic Data Source
Easy Replication
Can handle structured, semi-structured, and unstructured data with equal effect
Handles big data which manages data velocity, variety, volume, and complexity
Offers a flexible schema design which can easily be altered without downtime or service disruption
Disadvantages of NoSQL:
No standardization rules
It does not offer traditional database capabilities, like consistency when multiple transactions are performed simultaneously.
When the volume of data increases, it becomes difficult to maintain unique values as keys.
Hadoop:
Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications running in clustered systems, where the data can be processed in parallel.
The term "Hadoop" is commonly used for the whole ecosystem that runs on HDFS.
Hadoop was started by Doug Cutting to support two of his other well-known projects, Lucene and Nutch.
Hadoop was inspired by the Google File System (GFS), which was detailed in a paper released by Google in 2003.
Hadoop, originally called the Nutch Distributed File System (NDFS), split from Nutch in 2006 to become a sub-project of Lucene. At this point it was renamed Hadoop.
Ability to store and process huge amounts of any kind of data, quickly: With data volumes and varieties
constantly increasing, especially from social media and the Internet of Things (IoT), that's a key consideration.
Computing power. Hadoop's distributed computing model processes big data fast. The more computing nodes
you use, the more processing power you have.
Fault tolerance. Data and application processing are protected against hardware failure. If a node goes down,
jobs are automatically redirected to other nodes to make sure the distributed computing does not fail. Multiple
copies of all data are stored automatically.
Flexibility. Unlike traditional relational databases, you don’t have to preprocess data before storing it. You can
store as much data as you want and decide how to use it later. That includes unstructured data like text, images
and videos.
Low cost. The open-source framework is free and uses commodity hardware to store large quantities of data.
Scalability. You can easily grow your system to handle more data simply by adding nodes. Little administration is
required.
https://data-flair.training/blogs/hadoop-ecosystem-components/