AN INTRODUCTION TO DATABASES
#1 DATA STORAGE #2 DATA FILES #3 DATABASES #4 DBMS #5 RELATIONAL #6 CAP #7 NON-RELATIONAL #8 RELATIONAL VS NON-RELATIONAL #9 BIG DATA
1.1 DATA STORAGE
Data storage is the recording (storing) of information (data) in a storage
medium.
Handwriting, phonographic recording, magnetic tape, and optical discs are all
examples of storage media.
Electronic documents can be stored in much less space than paper documents.
Barcodes and magnetic ink character recognition (MICR) are two ways of
recording machine-readable data on paper.
Data, information, knowledge, and wisdom are closely related concepts, but each has its own role in relation to
the others, and each term has its own meaning.
Data is often assumed to be the least abstract concept, information the next
least, and knowledge the most abstract.
e.g.,
❏ the height of Mount Everest is generally considered "data",
❏ a book on Mount Everest's geological characteristics may be
considered "information",
❏ and a climber's guidebook containing practical information on the
best way to reach Mount Everest's peak may be considered
"knowledge".
❏ The practical climbing of Mount Everest's peak based on this
knowledge may be seen as "wisdom".
Biological data            Data maintenance     Data integrity
Computer data processing   Data management      Data warehouse
Computer memory            Data mining          Database
Dark data                  Data modeling        Datasheet
Data acquisition           Data point           Environmental data rescue
Data analysis              Data preservation
Data bank                  Data protection      Fieldwork
Data cable                 Data publication     Information engineering
Data curation              Data remanence       Machine learning
Data domain                Data science         Open data
Data element               Data set             Scientific data archiving
Data farming               Data structure       Secondary data
Data governance            Data visualization   Statistics
❏ Text files
A text file (also called an ASCII file) stores information as ASCII characters.
A text file contains human-readable characters. A user can read the contents of a text file or edit it using a text editor.
In text files, each line of text is terminated (delimited) by a special character, or character sequence, known as the EOL (End of Line) marker.
Example of a text file: a text document (often .txt)
❏ Binary files
A binary file contains information in the same format in which the information is held in memory. In a binary
file, there is no delimiter for a line, and no translations occur. As a result, binary files are faster and easier
for a program to read and write than text files.
As long as the file doesn't need to be read by people or ported to a different type of system, binary files are the best
way to store program information. Example of a binary file: a JPEG image (.jpg or .jpeg)
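The difference between the two file types can be sketched in Python. This is only an illustration: the sample values, file names, and the choice of 32-bit little-endian integers are assumptions, not part of the slides. Fixing the byte order explicitly is one way to sidestep the portability concern mentioned above.

```python
import os
import struct
import tempfile

readings = [3, 14, 159, 2653]  # hypothetical sample values

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "readings.txt")    # hypothetical file names
    binary_path = os.path.join(tmp, "readings.bin")

    # Text file: one human-readable number per line, each ended by an EOL character.
    with open(text_path, "w", encoding="ascii", newline="\n") as f:
        for value in readings:
            f.write(f"{value}\n")

    # Binary file: four 32-bit little-endian integers, with no line delimiters.
    with open(binary_path, "wb") as f:
        f.write(struct.pack("<4i", *readings))

    # Reading text requires parsing each line back into a number.
    with open(text_path, "r", encoding="ascii") as f:
        from_text = [int(line) for line in f]

    # Reading binary only requires reinterpreting the stored bytes.
    with open(binary_path, "rb") as f:
        raw = f.read()
    from_binary = list(struct.unpack("<4i", raw))

    text_size = os.path.getsize(text_path)
    binary_size = len(raw)
```

Both files round-trip the same values; the text version can be opened in any editor, while the binary version has a fixed size of four bytes per value.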
1.2 DATA FILE
Data file categories
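Examples of open data file formats include CSV, HTML for storing web pages, and XML-based formats such as SVG for storing scalable graphics. XLS is also widely exchanged, although it originated as a proprietary Microsoft Excel format.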
Serialization
Serialization is the process of translating a data structure or object into a series of bytes that can be stored or transmitted.
In some situations, serializing an object is also called marshalling it.
The opposite operation, extracting a data structure from a series of bytes, is deserialization (also called unserialization or
unmarshalling).
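As a minimal sketch of the round trip, the following uses Python's standard json module. The choice of JSON as the byte format and the sample object are assumptions made for illustration; the slides do not name a specific format.

```python
import json

# Hypothetical in-memory object to serialize.
user = {"name": "Ana", "hobbies": ["chess", "hiking"], "age": 36}

# Serialization (marshalling): data structure -> series of bytes.
payload = json.dumps(user).encode("utf-8")

# Deserialization (unmarshalling): series of bytes -> data structure.
restored = json.loads(payload.decode("utf-8"))
```

The bytes in payload could be written to a file or sent over a network, and restored is an equal copy of the original object.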
1.3 DATABASE
In computing, a database is an organized collection of data
stored and accessed electronically. Small databases can be
stored on a file system, while large databases are hosted on
computer clusters or cloud storage.
History (1/2)
History (2/2)
Storage
Databases as digital objects contain three layers of information which must be stored: the data, the structure, and the
semantics.
Proper storage of all three layers is needed for future preservation and longevity of the database. Putting data into
permanent storage is generally the responsibility of the database engine, also known as the storage engine.
Security (1/2)
❏ Access control
❏ Auditing
❏ Authentication
❏ Encryption
❏ Integrity controls
❏ Backups
❏ Application security
Security (2/2)
Security risks to database systems include, for example:
❏ Unauthorized or unintended activity or misuse by authorized database users, database administrators, or network/systems managers, or
by unauthorized users or hackers;
❏ Malware infections causing incidents such as unauthorized access, leakage or disclosure of personal or proprietary data, deletion of or
damage to the data or programs, interruption or denial of authorized access to the database, and attacks on other systems;
❏ Overloads, performance constraints and capacity issues resulting in the inability of authorized users to use databases as intended;
❏ Physical damage to database servers caused by computer room fires or floods, overheating, lightning, accidental liquid spills, static
discharge, electronic breakdowns/equipment failures and obsolescence;
❏ Design flaws and programming bugs in databases and the associated programs and systems, creating various security vulnerabilities (e.g.
unauthorized privilege escalation), data loss/corruption, performance degradation etc.;
❏ Data corruption and/or loss caused by the entry of invalid data or commands, mistakes in database or system administration processes,
sabotage/criminal damage etc.
Examples of DBMSs include MySQL, PostgreSQL, Microsoft SQL Server, Oracle Database, and Microsoft Access.
1.4 DATABASE MANAGEMENT SYSTEMS
The functionality provided by a DBMS can vary enormously. The core
functionality is the storage, retrieval and update of data.
The core part of the DBMS, which mediates between the database and the application interface, is sometimes referred to as the
database engine.
Often DBMSs will have configuration parameters that can be statically and dynamically tuned, for example the maximum
amount of main memory on a server the database can use. The trend is to minimize the amount of manual configuration, and for
cases such as embedded databases the need to target zero-administration is paramount.
❏ A general-purpose DBMS will provide public application programming interfaces (API) and optionally a processor for
database languages such as SQL to allow applications to be written to interact with and manipulate the database.
❏ A special-purpose DBMS may use a private API and be specifically customized and linked to a single application.
For example, an email system performs many of the functions of a general-purpose DBMS, such as message insertion, message deletion, attachment handling,
blocklist lookup, and associating messages with an email address; however, these functions are limited to what is required to handle email.
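To make the core functionality (storage, retrieval, and update through a public API and SQL) concrete, here is a small sketch using SQLite through Python's standard sqlite3 module. SQLite is used only because it is an embedded, zero-administration DBMS that ships with Python; the messages table and its contents are hypothetical.

```python
import sqlite3

# An in-memory embedded database: nothing to install or configure.
conn = sqlite3.connect(":memory:")

# Hypothetical schema, loosely echoing the email example above.
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT)"
)

# Storage: insert two messages through the API using SQL.
conn.execute(
    "INSERT INTO messages (sender, subject) VALUES (?, ?)",
    ("ana@example.com", "Hello"),
)
conn.execute(
    "INSERT INTO messages (sender, subject) VALUES (?, ?)",
    ("ben@example.com", "Report"),
)

# Update and deletion.
conn.execute(
    "UPDATE messages SET subject = ? WHERE sender = ?",
    ("Hello again", "ana@example.com"),
)
conn.execute("DELETE FROM messages WHERE sender = ?", ("ben@example.com",))

# Retrieval.
rows = conn.execute("SELECT sender, subject FROM messages ORDER BY id").fetchall()
conn.close()
```

After these statements, rows holds the single remaining message with its updated subject.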
The DBMS is also generally expected to provide a set of utilities for administering
the database effectively, including import, export, monitoring, defragmentation, and analysis utilities.
DBMS types: relational and non-relational
1.5 RELATIONAL DATABASE
In a relational database, a relation is a set of tuples that have the same attributes.
A tuple usually represents an object and information about that object. Objects are typically physical objects or concepts.
A relation is usually described as a table, which is organized into rows and columns.
All the data referenced by an attribute are in the same domain and conform to the same constraints.
The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the
attributes.
Relations can be modified using the insert, delete, and update operators.
New tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for updating or deleting.
Tuples are by definition unique. If a tuple contains a candidate or primary key, then it is necessarily unique; however, a primary key need not
be defined for a row or record to be a tuple. Because a tuple is unique, its attributes by definition constitute a superkey.
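The set-based view of a relation described above can be approximated with a Python set of named tuples. This is a rough sketch with hypothetical student data; note that a Python named tuple does fix the order of its attributes, which the relational model does not.

```python
from typing import NamedTuple


class Student(NamedTuple):
    """One tuple of a hypothetical Student relation."""
    student_id: int
    name: str
    city: str


# A relation is a set of tuples with the same attributes: unordered, no duplicates.
relation = {
    Student(1, "Ana", "Lima"),
    Student(2, "Ben", "Oslo"),
}

# Inserting an identical tuple changes nothing, because tuples are unique.
relation.add(Student(1, "Ana", "Lima"))
size_after_duplicate = len(relation)

# Insert: supply explicit values.
relation.add(Student(3, "Chloe", "Kyoto"))

# Delete: a query-like predicate identifies the tuples to remove.
relation = {t for t in relation if t.city != "Oslo"}

# Update: a predicate identifies the tuples to replace.
relation = {
    t._replace(city="Cusco") if t.student_id == 1 else t for t in relation
}

names = sorted(t.name for t in relation)
```

The sort at the end is needed only for display: the relation itself carries no tuple order.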
Domain Constraints
Constraints can apply to single attributes, to a tuple (restricting combinations of attributes) or to an entire relation. Since every
attribute has an associated domain, there are constraints (domain constraints). The two principal rules for the relational model
are known as entity integrity and referential integrity.
A domain constraint restricts the values an attribute may hold to those of its declared data type, such as integer, character,
date, time, or string. If a user enters a value outside the domain, the DBMS rejects it and reports an error. In other words,
the constraint defines the domain, or set of permitted values, for an attribute and ensures that every value taken by the
attribute is an atomic value (one that can't be divided) from its domain.
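One possible way to express a domain constraint is a CHECK clause, sketched here with SQLite via Python's sqlite3 module. The members table, its columns, and the 0 to 150 age range are assumptions chosen for illustration. SQLite is loosely typed by default, so the type test is written out explicitly with typeof().

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: age must be an integer between 0 and 150.
conn.execute(
    """
    CREATE TABLE members (
        name TEXT NOT NULL CHECK (typeof(name) = 'text'),
        age  INTEGER NOT NULL
             CHECK (typeof(age) = 'integer' AND age BETWEEN 0 AND 150)
    )
    """
)

# A value inside the domain is accepted.
conn.execute("INSERT INTO members (name, age) VALUES (?, ?)", ("Ana", 30))

# Values outside the domain are rejected with an error.
rejected = []
for bad_age in (-5, "thirty"):
    try:
        conn.execute(
            "INSERT INTO members (name, age) VALUES (?, ?)", ("Ben", bad_age)
        )
    except sqlite3.IntegrityError:
        rejected.append(bad_age)

accepted = conn.execute("SELECT name, age FROM members").fetchall()
conn.close()
```

Only the row whose age lies in the declared domain is stored; the out-of-range number and the non-integer value both raise an integrity error.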
Primary key
Every relation (table) has a primary key; this is a consequence of a relation being a set.
A primary key uniquely specifies a tuple within a table. While natural attributes (attributes
used to describe the data being entered) are sometimes good primary keys, surrogate keys
are often used instead.
A surrogate key is an artificial attribute assigned to an object which uniquely identifies it (for
instance, in a table of information about students at a school they might all be assigned a
student ID in order to differentiate them).
The surrogate key has no intrinsic (inherent) meaning, but rather is useful through its ability
to uniquely identify a tuple.
A composite key is a key made up of two or more attributes within a table that (together)
uniquely identify a record.
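The surrogate and composite keys described above can be sketched with SQLite via Python's sqlite3 module. The students and enrollments tables are hypothetical; student_id plays the role of the surrogate key from the school example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Surrogate key: student_id has no intrinsic meaning; SQLite assigns it when omitted.
conn.execute(
    "CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# Composite key: student_id and course together identify an enrollment.
conn.execute(
    """
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL,
        course     TEXT NOT NULL,
        PRIMARY KEY (student_id, course)
    )
    """
)

# Two students with the same name still receive distinct identifiers.
conn.execute("INSERT INTO students (name) VALUES ('Ana')")
conn.execute("INSERT INTO students (name) VALUES ('Ana')")
ids = [
    row[0]
    for row in conn.execute("SELECT student_id FROM students ORDER BY student_id")
]

conn.execute("INSERT INTO enrollments VALUES (1, 'Databases')")
conn.execute("INSERT INTO enrollments VALUES (1, 'Networks')")

# Repeating the same (student_id, course) pair violates the composite key.
try:
    conn.execute("INSERT INTO enrollments VALUES (1, 'Databases')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.close()
```

Neither column of the composite key is unique on its own (student 1 appears twice), but the pair is.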
Foreign key
A foreign key is a set of attributes in one relation (the referencing relation) that refers to the
primary key of another relation (the referenced relation).
The concept is described formally as: "For all tuples in the referencing relation
projected over the referencing attributes, there must exist a tuple in the
referenced relation projected over those same attributes such that the values
in each of the referencing attributes match the corresponding values in the
referenced attributes."
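The formal rule can be seen in action with a small SQLite sketch using Python's sqlite3 module. The departments and employees tables are hypothetical. SQLite enforces foreign keys only when the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite

# Referenced relation.
conn.execute(
    "CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# Referencing relation: every dept_id here must exist in departments.
conn.execute(
    """
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES departments (dept_id)
    )
    """
)

conn.execute("INSERT INTO departments VALUES (10, 'Research')")
conn.execute("INSERT INTO employees VALUES (1, 'Ana', 10)")  # matching tuple exists

# No department 99 exists, so this referencing tuple is rejected.
try:
    conn.execute("INSERT INTO employees VALUES (2, 'Ben', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
conn.close()
```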
Normalization
Database Normalization With Examples
Database normalization is easiest to understand through a case study. Assume a video library maintains a
database of movies rented out. Without any normalization, all information is stored in one table. Let's
work through normalizing that table step by step.
In this single table, the Movies Rented column holds multiple values. Now let's move to first normal form (1NF).
1NF (First Normal Form) Rules
2NF (Second Normal Form) Rules
❏ Be in 1NF
❏ Have a single-column primary key that is not functionally dependent on any subset of the candidate key
It is clear that we can't move our simple database to second normal form unless we partition the 1NF table.
We have divided our 1NF table into two tables, Table 1 and Table 2. Table 1 contains member information. Table 2 contains information on movies
rented.
3NF (Third Normal Form) Rules
❏ Be in 2NF
❏ Have no transitive functional dependencies
A transitive functional dependency exists when changing a non-key column might cause any of the other non-key columns to change.
We have again divided our tables and created a new table that stores salutations.
There are no transitive functional dependencies, and hence our table is in 3NF.
In Table 3, Salutation ID is the primary key; in Table 1, Salutation ID is a foreign key referencing the primary key of Table 3.
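The decomposition steps can be traced in plain Python. The slides' original tables are not reproduced here, so the member names, salutations, and movie titles below are hypothetical stand-ins, and the dictionaries only approximate Tables 1 to 3.

```python
# Hypothetical unnormalized data: the "movies" field holds several values.
unnormalized = [
    {"name": "Ana Ruiz", "salutation": "Ms.", "movies": "Movie A, Movie B"},
    {"name": "Ben Okoro", "salutation": "Mr.", "movies": "Movie C"},
    {"name": "Cara Lind", "salutation": "Ms.", "movies": "Movie A, Movie D"},
]

# 1NF: every cell holds a single value, so each rented movie gets its own row.
first_nf = [
    (row["name"], row["salutation"], movie.strip())
    for row in unnormalized
    for movie in row["movies"].split(",")
]

# 2NF: split into a members table (Table 1) and a movies-rented table (Table 2),
# linked by a single-column membership ID.
members = {}
movies_rented = []
for membership_id, row in enumerate(unnormalized, start=1):
    members[membership_id] = {"name": row["name"], "salutation": row["salutation"]}
    for movie in row["movies"].split(","):
        movies_rented.append((membership_id, movie.strip()))

# 3NF: move salutations into their own table (Table 3) and keep only a
# salutation ID, acting as a foreign key, in the members table.
salutation_ids = {}
for member in members.values():
    text = member.pop("salutation")
    member["salutation_id"] = salutation_ids.setdefault(text, len(salutation_ids) + 1)
salutations = {sid: text for text, sid in salutation_ids.items()}
```

Each salutation is now stored once, and members refer to it by ID, mirroring the relationship between Table 1 and Table 3.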
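The CAP theorem comprises three components (hence its name) as they
relate to distributed data stores:
❏ Consistency: every read receives the most recent write or an error.
❏ Availability: every request receives a non-error response, without a guarantee that it contains the most recent write.
❏ Partition tolerance: the system continues to operate despite network failures between nodes.
In normal operations, your data store provides all three functions. But the CAP theorem maintains that
when a distributed database experiences a network failure, you can provide either consistency or
availability, but not both.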
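The trade-off during a network failure can be illustrated with a toy model. The TwoNodeStore class below is purely hypothetical and is not a real distributed system: it keeps two replicas in one process and uses a flag to simulate a partition.

```python
class TwoNodeStore:
    """Toy key-value store with two replicas (illustration only)."""

    def __init__(self, mode):
        self.mode = mode            # "CP" favors consistency, "AP" favors availability
        self.replicas = [{}, {}]
        self.partitioned = False    # True simulates a network failure between replicas

    def write(self, node, key, value):
        if not self.partitioned:
            # Normal operation: every replica receives the write.
            for replica in self.replicas:
                replica[key] = value
            return True
        if self.mode == "CP":
            # Consistency first: refuse the write rather than let replicas diverge.
            return False
        # Availability first: accept the write on the reachable replica only.
        self.replicas[node][key] = value
        return True

    def read(self, node, key):
        return self.replicas[node].get(key)


cp = TwoNodeStore("CP")
cp.write(0, "x", 1)
cp.partitioned = True
cp_accepted = cp.write(0, "x", 2)              # rejected during the partition
cp_views = (cp.read(0, "x"), cp.read(1, "x"))  # both replicas still agree

ap = TwoNodeStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap_accepted = ap.write(0, "x", 2)              # accepted during the partition
ap_views = (ap.read(0, "x"), ap.read(1, "x"))  # replicas now disagree
```

In CP mode the store stays consistent but turns a request away; in AP mode it keeps answering but the two replicas return different values until the partition heals.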
Many modern NoSQL databases were designed with the cloud in mind, which makes them
well suited to horizontal scaling, where many smaller servers are spun
up to handle increased load.
1.7 NON-RELATIONAL vs RELATIONAL DATABASE
Let's consider an example of storing information about a user and their hobbies. We need to store a user's first name, last name,
cell phone number, city, and hobbies.
In a relational database, we'd likely create two tables: one for Users and one for Hobbies.
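A possible version of the two-table design is sketched below with SQLite via Python's sqlite3 module. Table names, column names, and the sample user are assumptions. For contrast, the last lines build the same information as a single nested document, which is roughly how a document-oriented (non-relational) store might hold it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational design: one table for users, one for hobbies (hypothetical schema).
conn.execute(
    """
    CREATE TABLE users (
        id         INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT,
        cell       TEXT,
        city       TEXT
    )
    """
)
conn.execute(
    """
    CREATE TABLE hobbies (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users (id),
        hobby   TEXT
    )
    """
)

conn.execute("INSERT INTO users VALUES (1, 'Ana', 'Ruiz', '555-0100', 'Lima')")
conn.executemany(
    "INSERT INTO hobbies (user_id, hobby) VALUES (?, ?)",
    [(1, "chess"), (1, "hiking")],
)

# Reassembling a user with their hobbies requires a join across both tables.
rows = conn.execute(
    """
    SELECT u.first_name, h.hobby
    FROM users AS u
    JOIN hobbies AS h ON h.user_id = u.id
    ORDER BY h.id
    """
).fetchall()
conn.close()

# Document-style alternative: everything about the user lives in one record.
user_document = {
    "first_name": "Ana",
    "last_name": "Ruiz",
    "cell": "555-0100",
    "city": "Lima",
    "hobbies": [hobby for _, hobby in rows],
}
```

The relational form avoids repeating user details per hobby, while the document form returns the whole user in a single lookup.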