
Designing databases

2021 – UAB – DACSO


Pere Pons
Intuitive approach
When computer science was born (around 1960), people realized that a
tool was needed, a tool to:

• Store information and be sure that it is not lost


• Search for information that is stored
• Be “easy to use”, because the interest was in the information,
not the tool itself.

This is when databases were born…


A bit of history
• 1960 – Navigational databases. Main idea: each data record held a pointer to the following ones, so
that you could navigate through the information. IBM created IMS, among others.
• Worked with disks
• Used the physical addresses where the information was stored.
• Generally used for batch processing (not interactive)
• Used by airline companies, the Apollo program, etc.: only big players.

• 1970 – Birth of relational databases. Main ideas: split the data into tables, creation of primary keys,
relations between tables, and queries executed on the information structure.
• Mathematical concepts were used to define databases
• New concepts were born: tables, keys, queries, etc.
• Databases were searchable (yes, before they were not!)
• The SQL language appeared.
• Pointers to disk addresses were no longer used.
• Query optimization.
• Not only big players any more.

• 1980 – Desktop databases: dBase. Databases for everyone!


• 1990 – Object oriented databases.
• 2000 – NoSQL databases.
Evolution..??..
So what is a database?

It is a set of tools that helps you to
structure, store, retrieve, organize and keep
your information safe.
Examples of databases
• Library files
• Image databases
• Client data (personal data, invoices,
consumption, preferences, etc.)
• F1 telemetry data
• Objects in space (stars, planets, debris…)
• User profiles on the Internet (more than 5,000
characteristics per person in the USA!)
• etc..
Types of databases
As we look closer at our needs, we realize that one name covers
different concepts:

Data
• Library files
• Binary files
• JSON documents
• User status in online games
• Series of telemetry data
• etc.

Needs
• Fast write access
• Specific search
• Ultra-fast read process
• Fault tolerant
• Only floating point numbers
• Many writes at once
• etc.

Technologies
• Relational databases
• NoSQL databases
• Numerical series databases
• In-memory databases
• etc.
Market of databases
We will focus on:

Relational databases
• Manage the relations between the stored data
• Widely used in the market
• High level of standardization

We will use PostgreSQL and a tool named DBeaver, both open source.


https://www.postgresql.org/
https://dbeaver.io/
How will we learn and try it?

You have to download and install:

• VirtualBox http://www.virtualbox.org
• Install VirtualBox.
• And the extension pack too.

• A virtual machine with the Kubuntu distribution, with all the software and
databases ready to use. See https://cv.uab.cat
• You have to download the OVA and import it into your VirtualBox environment.
DBMS ACID Model
The ACID model of databases provides a
strongly consistent view:

• Atomicity: Each transaction either proceeds completely or
fails completely. “All or nothing”.
• Consistency: Only valid data can be applied.
• Isolation: Any concurrent execution of
transactions will produce the same
result as executing them one after
the other.
• Durability: Once a transaction is
committed, it will remain so even after
any error or problem.

Concept of transaction: a group of operations whose atomicity is
guaranteed by the database.
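
As an illustration of the transaction concept, here is a minimal PostgreSQL sketch. The accounts table and its values are invented for this example and are not part of the course material; it assumes an empty scratch database.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE accounts
(
    account_id int PRIMARY KEY,
    balance    numeric NOT NULL CHECK (balance >= 0)
);

INSERT INTO accounts (account_id, balance) VALUES (1, 100), (2, 50);

-- Move 30 from account 1 to account 2 as a single unit of work.
BEGIN;
UPDATE accounts SET balance = balance - 30 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 30 WHERE account_id = 2;
COMMIT;   -- from here on the change is durable

-- A transaction that is rolled back leaves no trace (atomicity).
BEGIN;
UPDATE accounts SET balance = 0 WHERE account_id = 1;
ROLLBACK; -- account 1 keeps the balance it had before BEGIN
```

Both UPDATE statements of the first transaction become visible to other sessions together at COMMIT, never one without the other.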
Structure of a system with a database
DBMS responsibilities:
• Creating the database: managing the low-level files.
• Providing query and update facilities: methods for querying and inserting
data (from any language).
• Multitasking: allowing applications to access data simultaneously.
• Maintaining an audit trail: being able to keep a log of changes in the
database.
• Managing the security of the database: defining and managing users and their
permissions.
• Maintaining the referential integrity.

Layers of the system:
• Application: may be in any language (Java, C#, C/C++, etc.)
• Database driver: there is a language for queries (SQL)
• DBMS (Data Base Management System): the engine for managing data is
separated from the data. It is reusable and provides many utilities.
• Data (stored on disks)

DBMS: Data Base Management System (“The database”)
provides you with:
• a Data Definition Language.
• a Data Manipulation Language.
• a Data Control Language.
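
To make the three sub-languages concrete, here is a small PostgreSQL sketch with one or two statements of each kind. The library_books table and the reader role are hypothetical names chosen for this example, and creating a role assumes your user has the privilege to do so.

```sql
-- Data Definition Language (DDL): defines the structure.
CREATE TABLE library_books
(
    book_id serial PRIMARY KEY,
    title   text NOT NULL
);

-- Data Manipulation Language (DML): works with the data.
INSERT INTO library_books (title) VALUES ('Thinking in Java');
SELECT book_id, title FROM library_books;
UPDATE library_books SET title = 'VB for Dummies' WHERE book_id = 1;
DELETE FROM library_books WHERE book_id = 1;

-- Data Control Language (DCL): controls who can do what.
CREATE ROLE reader;                        -- a role must exist before it can receive privileges
GRANT SELECT ON library_books TO reader;
REVOKE SELECT ON library_books FROM reader;
```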
How are we going to use it?

• Create the structure of the database: we will use the Data Definition Language.
• Insert data into the database: we will use the Data Manipulation Language.
• Use the information stored: we will use the Data Manipulation Language.
Relational Databases: structure of a database
• Elements of a database schema:
• Tables: a set of records that all have the same “columns”.
• Record: a set of values (one per column) that are related to each other.
• Columns or fields: a set of pieces of information that have the same type and the same meaning.
• Keys: a column, or set of columns, whose value represents a unique record. It cannot be
repeated in the table.

• Each column will have:
• A name,
• A type that indicates what kind of values are allowed
• A set of restrictions (can be null?, default value, etc.)

A table is a structure to store data.

Pets
pet_id | pet_name | Pet_type | Pet_owner   | Date_of_birth | Vaccinated | Keeper
1      | Kily     | Cat      | Ms Jones    | 8/5/2016      | Yes        | Mr Sanchez
2      | Boot     | Dog      | Mr Martins  | 18/7/2006     | Yes        | Ms Smith
3      | Marumba  | Elephant | Ms Jones    | 9/4/1999      | Yes        | Mr. Malone
4      | Nick     | Dog      | Mrs Gregson | 3/4/2018      | No         | Ms Smith
5      | Wally    | Dog      | Mr Cooper   | 7/12/2019     | Yes        | Ms Smith


Keys and indexes

• A candidate key is a set of attributes whose combined values are never repeated
and that does not contain a smaller subset with that same property.
• A primary key is a column, or set of columns, that uniquely identifies
a record, for example a person’s DNI or a
username in a users table.
• All tables should have a primary key, which is the reference key
for the table. Some DBMSs create a hidden one if you do not define it;
PostgreSQL does not.
• Indexes are a mechanism to access a record faster and to
introduce restrictions. For example, all owners will have
different mobile phone numbers and you may want to search
by them.
• From another table you can identify a record by its key. That
reference is called a “foreign key”.

Pets (Pet_owner_id is a foreign key to Owners)
pet_id | pet_name | Pet_type | Pet_owner_id | Date_of_birth | Vaccinated | Keeper
1      | Kily     | Cat      | 1            | 8/5/2016      | Yes        | Mr Sanchez
2      | Boot     | Dog      | 2            | 18/7/2006     | Yes        | Ms Smith
3      | Marumba  | Elephant | 1            | 9/4/1999      | Yes        | Mr. Malone
4      | Nick     | Dog      | 3            | 3/4/2018      | No         | Ms Smith
5      | Wally    | Dog      | 4            | 7/12/2019     | Yes        | Ms Smith

Owners (index on Owner_mobile_phone)
owner_id | owner_name  | Owner_mobile_phone
1        | Ms Jones    | 68596524
2        | Mr Martins  | 96684846
3        | Mrs Gregson | 96498464
4        | Mr Cooper   | 896446877
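
The Pets and Owners tables of this slide could be declared in PostgreSQL roughly as follows. This is a sketch: the column types and the index name are assumptions, since the slide only shows the data, and it assumes an empty scratch database.

```sql
CREATE TABLE owners
(
    owner_id           int  PRIMARY KEY,          -- primary key
    owner_name         text NOT NULL,
    owner_mobile_phone text
);

-- A unique index: faster searches by phone, plus the restriction
-- that two owners cannot share the same number.
CREATE UNIQUE INDEX owners_mobile_phone_idx
    ON owners (owner_mobile_phone);

CREATE TABLE pets
(
    pet_id        int  PRIMARY KEY,
    pet_name      text NOT NULL,
    pet_type      text,
    pet_owner_id  int  REFERENCES owners (owner_id),  -- foreign key
    date_of_birth date,
    vaccinated    boolean,
    keeper        text
);

INSERT INTO owners VALUES (1, 'Ms Jones', '68596524'),
                          (2, 'Mr Martins', '96684846');

INSERT INTO pets VALUES (1, 'Kily', 'Cat', 1, '2016-05-08', true, 'Mr Sanchez'),
                        (2, 'Boot', 'Dog', 2, '2006-07-18', true, 'Ms Smith');
```

Because of the foreign key, a pet can only point to an owner_id that already exists in owners, which is why the owners are inserted first.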
Stored procedures
• They are programs designed to be run on the DBMS.
• Because the DBMS knows them in advance, it can apply internal optimizations when running them.
• They can be launched by users (persons or programs).
• They are written in a more or less standard language.
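
A minimal sketch of a stored procedure in PostgreSQL, written in PL/pgSQL. It assumes PostgreSQL 11 or later (where CREATE PROCEDURE and CALL are available) and an empty scratch database; the procedure name mark_vaccinated and the reduced pets table are hypothetical.

```sql
-- Reduced version of the pets table, only for this illustration.
CREATE TABLE pets
(
    pet_id     int PRIMARY KEY,
    pet_name   text NOT NULL,
    vaccinated boolean NOT NULL DEFAULT false
);

INSERT INTO pets (pet_id, pet_name) VALUES (4, 'Nick');

-- The procedure is stored inside the database and runs on the DBMS.
CREATE PROCEDURE mark_vaccinated(p_pet_id int)
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE pets SET vaccinated = true WHERE pet_id = p_pet_id;
END;
$$;

-- Any user or program with permission can launch it.
CALL mark_vaccinated(4);
```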
Entity-relationship diagram
There is a standardized way to represent a
database graphically: the Entity-Relationship
diagram.
• Tables are represented with all their
fields.
• The relations between tables are
drawn too (references, indexes, etc.).
SQL
SQL = Structured Query Language

• SQL is a language designed to interact with DBMSs and it has become a standard.
• The differences between the SQL implementations of different DBMSs are small, but they exist.

• With SQL you can:


• Create and remove databases, users, grant or deny privileges to users.
• Define and modify the structure of a database (create tables, indexes, stored procedures, keys, etc.)
• Insert and remove data from the structure you have defined
• Create queries to be executed on the structure you have defined.
Rules for design
• Up to this point, we have seen that we can create a
structure to store our information.
• This structure can be whatever we want…
• It can be accessed from wherever you want…
• It can be modified whenever you like…
• Initially it sounds great, but…

• It can easily become a mess…

• We need the structure to be maintainable.
• It should not hold repeated information.
• All its information should always be consistent.
• It should be easy for others to understand.
• It should be able to grow and be correctly documented.
• etc.
Rules for design
• There are rules and design principles to
help us in the design of the database.
• The structure of the database should be
“normalized”.
• Normalization is a process defined to
adjust the structure of the database to
the conceptual representation of
reality.

Pets
pet_id | pet_name | Pet_type | Pet_owner   | Date_of_birth | Vaccinated | Keeper
1      | Kily     | Cat      | Ms Jones    | 8/5/2016      | Yes        | Mr Sanchez
2      | Boot     | Dog      | Mr Martins  | 18/7/2006     | Yes        | Ms Smith
3      | Marumba  | Elephant | Ms Jones    | 9/4/1999      | Yes        | Mr. Malone
4      | Nick     | Dog      | Mrs Gregson | 3/4/2018      | No         | Ms Smith
5      | Wally    | Dogh     | Mr Cooper   | 7/12/2019     | Yes        | Ms Smith
Data normalization
Normal Form | Idea | Comments
1NF  | Eliminate repeating groups | Make a different table for each set of related attributes and give each table a primary key.
2NF  | Eliminate redundant data | If an attribute depends on only a part of a composite (multi-column) key, move it to a separate table.
3NF  | Eliminate columns not dependent on the key | If attributes do not contribute to a description of the key, move them to a separate table.
BCNF | Boyce-Codd Normal Form | If there are non-trivial dependencies between candidate key attributes, separate them into distinct tables.
4NF  | Isolate independent multiple relationships |
5NF  | Isolate semantically related multiple relationships |
1NF - Eliminate Repeating Groups
• In the original design, asking “who uses
DB2?” is very inefficient: you have to scan
all records one by one.

• In the new structure these problems are
resolved.

But still:
• “DB2” could be misspelled in some cases…
• Changing a database name is dangerous if
all names are repeated.
2NF – Eliminate redundant data
• The Database table still contains repeated data:
see that Oracle, Access, and DB2 are repeated.
• The same concept, for example “Oracle”, has two
DatabaseIDs… this is not a representation of reality.
• The primary key of the table is DID.
• The table represents both concepts (Databases) and
relations.

We should:
• Have a table with concepts (Databases)
• Have a table with relations (User – Database)
• The table that establishes the relations will have
(Member ID, Database ID) as its primary key.
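
A possible PostgreSQL sketch of that split. The table names, column names and sample rows are assumptions made for illustration (the slide's original tables are not reproduced here), and it assumes an empty scratch database.

```sql
-- Concepts: one row per member and one row per database product.
CREATE TABLE members
(
    member_id   int PRIMARY KEY,
    member_name text NOT NULL
);

CREATE TABLE databases
(
    database_id   int PRIMARY KEY,
    database_name text NOT NULL UNIQUE   -- 'Oracle' can now exist only once
);

-- Relations: which member uses which database.
CREATE TABLE member_database
(
    member_id   int REFERENCES members (member_id),
    database_id int REFERENCES databases (database_id),
    PRIMARY KEY (member_id, database_id)  -- composite primary key
);

-- Hypothetical sample data.
INSERT INTO members   VALUES (1, 'Bill'), (2, 'Mary');
INSERT INTO databases VALUES (1, 'Oracle'), (2, 'DB2');
INSERT INTO member_database VALUES (1, 1), (1, 2), (2, 2);

-- "Who uses DB2?" no longer depends on how the name was typed in each row.
SELECT m.member_name
FROM members m
JOIN member_database md ON md.member_id = m.member_id
JOIN databases d        ON d.database_id = md.database_id
WHERE d.database_name = 'DB2';
```

Renaming a product now means updating a single row in databases.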
3NF – Eliminate columns that do not depend on keys
In this case the member table mixes information
about the member and the company.

• As the fields Company and CompLoc do not
depend on the key of the table (MID), they
should be extracted into a new table.
• The new table should not have repetitions.
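
The extraction could look like this in PostgreSQL. It is a sketch: the names companies, company_id, company_name and company_location are assumptions standing in for the slide's Company and CompLoc fields, and it assumes an empty scratch database.

```sql
-- Company data lives in its own table, one row per company.
CREATE TABLE companies
(
    company_id       int PRIMARY KEY,
    company_name     text NOT NULL UNIQUE,  -- no repeated companies
    company_location text
);

-- The member table keeps only what depends on its key (MID),
-- plus a reference to the company.
CREATE TABLE members
(
    member_id   int PRIMARY KEY,
    member_name text NOT NULL,
    company_id  int REFERENCES companies (company_id)
);
```

If a company moves, its location is changed in one row of companies instead of in every member that works there.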
4NF – Isolate multiple independent relationships
• Primarily applies to key-only tables.
• We want to represent which software each user uses
and a book that the user recommends.
• It is an M:M relationship.
• Software and books might not have a 1:1
relation. They are independent relations and
have to be in different tables.

Original table
MID   | SID    | BID
Bill  | ERWin  | ERWin Bible
John  | VB.NET | VB for Dummies
John  | JAVA   | Thinking in Java
Mary  | ERWin  | N/A
Steve | N/A    | Powerbuilder bible

user
MID | name
1   | Bill
2   | John
3   | Mary
4   | Steve

software
SID | Software_name
1   | ERWin
2   | VB.NET
3   | JAVA

books
BID | book_title
1   | ERWin Bible
2   | VB for Dummies
3   | Thinking in Java
4   | Powerbuilder bible

users_books_software (before the split)
MID | SID | BID
1   | 1   | 1
2   | 2   | 2
2   | 3   | 3
3   | 1   | N/A
4   | N/A | 4

users_software
MID | SID
1   | 1
2   | 2
2   | 3
3   | 1

users_books
MID | BID
1   | 1
2   | 2
2   | 3
4   | 4
5NF – Isolate semantically related multiple relationships
• Now we will add to the previous example
the information about which software is
covered in each book.
• This relation is independent of any other
previous relation.
• It should be in a new table.

user
MID | name
1   | Bill
2   | John
3   | Mary
4   | Steve

software
SID | Software_name
1   | ERWin
2   | VB.NET
3   | JAVA
4   | Powerbuilder

books
BID | book_title
1   | ERWin Bible
2   | VB for Dummies
3   | Thinking in Java
4   | Powerbuilder bible

users_software
MID | SID
1   | 1
2   | 2
2   | 3
3   | 1

book_software
BID | SID
1   | 1
2   | 2
3   | 3
4   | 4

users_books
MID | BID
1   | 1
2   | 2
2   | 3
4   | 4
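
The tables of this example can be sketched in PostgreSQL as follows, using the data from the slide. The column types are assumptions, the user table is named users here because user is a reserved word in PostgreSQL, and an empty scratch database is assumed.

```sql
CREATE TABLE users    (mid int PRIMARY KEY, name text NOT NULL);
CREATE TABLE software (sid int PRIMARY KEY, software_name text NOT NULL);
CREATE TABLE books    (bid int PRIMARY KEY, book_title text NOT NULL);

-- One key-only table per independent relationship.
CREATE TABLE users_software
(
    mid int REFERENCES users (mid),
    sid int REFERENCES software (sid),
    PRIMARY KEY (mid, sid)
);

CREATE TABLE users_books
(
    mid int REFERENCES users (mid),
    bid int REFERENCES books (bid),
    PRIMARY KEY (mid, bid)
);

CREATE TABLE book_software
(
    bid int REFERENCES books (bid),
    sid int REFERENCES software (sid),
    PRIMARY KEY (bid, sid)
);

INSERT INTO users    VALUES (1, 'Bill'), (2, 'John'), (3, 'Mary'), (4, 'Steve');
INSERT INTO software VALUES (1, 'ERWin'), (2, 'VB.NET'), (3, 'JAVA'), (4, 'Powerbuilder');
INSERT INTO books    VALUES (1, 'ERWin Bible'), (2, 'VB for Dummies'),
                            (3, 'Thinking in Java'), (4, 'Powerbuilder bible');

INSERT INTO users_software VALUES (1, 1), (2, 2), (2, 3), (3, 1);
INSERT INTO users_books    VALUES (1, 1), (2, 2), (2, 3), (4, 4);
INSERT INTO book_software  VALUES (1, 1), (2, 2), (3, 3), (4, 4);

-- Which books cover software that Mary uses?
SELECT b.book_title
FROM users u
JOIN users_software us ON us.mid = u.mid
JOIN book_software bs  ON bs.sid = us.sid
JOIN books b           ON b.bid = bs.bid
WHERE u.name = 'Mary';
```

With the slide's data the query returns ERWin Bible. Note that there are no N/A values any more: a missing relationship is simply a missing row.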
Exercise: Create a DB model.
Martian Empire is a company that exploits tourism on Mars.

Martian people have a unique Martian identifier. We store the
name, surname and email of people, and every Martian is assigned to a
base, with some honorary exceptions. In each base we have some resorts
where people can spend their holidays. We keep track of everyone who visits
our resorts and store when each visitor arrived and how long they stayed.

As we have to take care of our visitors, we control the stock of supplies
at resort level. That is to say, we know how many items we have in each
resort, and we control the cost of the stock assuming that each supply
has the same cost regardless of the resort that owns it.

On the other hand, for now we only admit Martian people, because
they are kinder than Earthers, but we have created a marketing
database to store Earthers who may be interested in coming to Mars
and bringing us more wealth for putting up with them.
SQL
Now that we know what we want to do, let’s see how to do it!

SQL is a “standard” language to make requests to the database.

You can make requests to:
• perform administrative tasks (create users, grant permissions, remove them, etc.)
• alter the structure of the database
• add and remove information
• ask the database questions that search in the defined structure

Full documentation: https://www.postgresql.org/docs/12/sql.html


How to create a table
To create a table you will need to know:

• its name
• the names of its “columns”, or fields, and the type of data each one will store
• what the primary key will be, and whether you are going to have other indexes.
Data types
The types supported by PostgreSQL include the following.
It is just a subset; PostgreSQL also includes arrays, geographical
information, etc.

Full info at https://www.postgresql.org/docs/9.3/datatype.html

Name                     | Storage size | Description                      | Range
SMALLINT                 | 2 bytes      | Small range integer              | -32768 to +32767
INT                      | 4 bytes      | Typical integer                  | -2147483648 to +2147483647
BIGINT                   | 8 bytes      | Long range integer               | -9223372036854775808 to +9223372036854775807
DECIMAL                  | Variable     | User-specified precision, exact  | Up to 131072 digits before the decimal point and 16383 after it
REAL                     | 4 bytes      | Variable precision, inexact      | 6 decimal digits precision
DOUBLE PRECISION         | 8 bytes      | Variable precision, inexact      | 15 decimal digits precision
SERIAL                   | 4 bytes      | Auto-incrementing INT            |
DATE                     | 4 bytes      | Date (no time of day)            |
TIMESTAMP                | 8 bytes      | Date and time                    |
INTERVAL                 | 16 bytes     | Time interval                    |
CHARACTER(n), VARCHAR(n) | n characters | Text of up to n characters       |
TEXT                     | Unlimited    | Undetermined length text         |
CIDR                     | 7 or 19 bytes | IPv4 and IPv6 networks          |
INET                     | 7 or 19 bytes | IPv4 and IPv6 hosts and networks |
Table creation
The syntax used to create a table is:

Create table <name_of_the_table>
(
    <field definition>,
    primary key (<field_name>)
);

Example:

Create table users
(
    user_id serial,
    user_name text,
    date_of_birth date,
    primary key (user_id)
);

See: Default, not null, generated columns.
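
A short sketch of those three features together. The products table and its columns are hypothetical; generated columns require PostgreSQL 12 or later.

```sql
CREATE TABLE products
(
    product_id     serial,
    name           text NOT NULL,                     -- a value is mandatory
    price          numeric(10, 2) NOT NULL DEFAULT 0, -- used when no price is given
    tax_rate       numeric(4, 2)  NOT NULL DEFAULT 0.21,
    -- Computed by the DBMS from other columns; it cannot be written directly.
    price_with_tax numeric GENERATED ALWAYS AS (price * (1 + tax_rate)) STORED,
    PRIMARY KEY (product_id)
);

-- tax_rate takes its default and price_with_tax is calculated automatically.
INSERT INTO products (name, price) VALUES ('Leash', 10.00);

SELECT name, price, tax_rate, price_with_tax FROM products;
```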


Table references and constraints
The syntax used to create a table reference is:

Create table <name_of_the_table>
(
    …
    <field definition> REFERENCES <table_name> (<fields>),
    …
);

for example:

create table user_password
(
    p_id serial,
    encrypted_password text not null,
    user_id int references users(user_id)
);

You can add other constraints to the value of the field:

Create table <name_of_the_table>
(
    …
    <field definition> not null,
    salary int check (salary > 0),
    mobile_phone text default null,
    CHECK (salary > 0 AND salary < 1000000) /* ☺ */
);

https://www.postgresql.org/docs/13/ddl-constraints.html
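
To see the constraints at work, here is a sketch that reuses the users and user_password tables and adds a hypothetical salary column. All inserted values are invented, and an empty scratch database is assumed.

```sql
CREATE TABLE users
(
    user_id   serial,
    user_name text NOT NULL,
    salary    int CHECK (salary > 0),
    PRIMARY KEY (user_id)
);

CREATE TABLE user_password
(
    p_id               serial,
    encrypted_password text NOT NULL,
    user_id            int REFERENCES users (user_id)
);

-- Accepted: in a freshly created table this user gets user_id 1.
INSERT INTO users (user_name, salary) VALUES ('pere', 1000);
INSERT INTO user_password (encrypted_password, user_id)
VALUES ('not-a-real-hash', 1);

-- Each of the following statements is rejected by the DBMS:
INSERT INTO users (user_name, salary) VALUES (NULL, 1000);  -- violates NOT NULL
INSERT INTO users (user_name, salary) VALUES ('ana', -5);   -- violates CHECK (salary > 0)
INSERT INTO user_password (encrypted_password, user_id)
VALUES ('not-a-real-hash', 999);                            -- no user 999: violates the reference
```

The rejected statements raise an error and leave the tables unchanged.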
Table removal and alteration
To remove a table:
Drop table <table_name>;

To alter a table:
alter table <table_name> add column <column_name> <type> <constraints>;
alter table <table_name> drop column <column_name>;
alter table <table_name> add check (<constraint>);
alter table <table_name> add constraint <constraint_name> UNIQUE (<column_name>);
alter table <table_name> add foreign key (<column_name>) references <table_name>(<column_name>);

https://www.postgresql.org/docs/9.5/ddl-alter.html
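
The same statements applied to a concrete case, as a sketch. The teams table, the email and team_id columns and the constraint name are hypothetical, and an empty scratch database is assumed.

```sql
CREATE TABLE users
(
    user_id       serial,
    user_name     text,
    date_of_birth date,
    PRIMARY KEY (user_id)
);

CREATE TABLE teams
(
    team_id   serial PRIMARY KEY,
    team_name text
);

ALTER TABLE users ADD COLUMN email text;
ALTER TABLE users ADD COLUMN team_id int;
ALTER TABLE users DROP COLUMN date_of_birth;
ALTER TABLE users ADD CHECK (user_name <> '');
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);
ALTER TABLE users ADD FOREIGN KEY (team_id) REFERENCES teams (team_id);

-- Remove the tables again; users goes first because it references teams.
DROP TABLE users;
DROP TABLE teams;
```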
Exercise

Create a script that creates the database structure that you have
previously defined.

• create a text file using a text editor


• in that file, write all the SQL statements needed to create the database
structure.

Soon we will execute it in a real database!


Thanks!

Next session:
• Have VirtualBox and the VM installed.
• We will talk about SQL + PostgreSQL + DBeaver
