
CS notes – Chapter 8 Databases

Spec 8.1:

There are many ways to store data (information), but there are two you need to know for now. The first is the file-based (flat-file) database, and the second is the relational database.

Now first off, what is a database?


A database is a “structured collection of items of data that can be accessed by different application programs”. Basically, it's data organised into tables.

Flat-file database:

These are just one big table with all the data stored in it (all the data is in one place). The main thing you need to know about them is their limitations/disadvantages:

- Lots of items keep being repeated. For example, if a database stores the genders of people in a school, the words “male” and “female” will be repeated over and over, which takes up a lot of storage if there are 1000 people. This creates data redundancy (data redundancy = the same data stored more than once), which can reduce data integrity (data integrity = how correct and reliable the data is).
- If there are 2 flat-file databases in 2 different applications containing the same data, the data can be altered by one application and not by the other (because they aren’t linked). This makes the data inconsistent.
- All users have access to the whole table/all the data, so the security of the data is a problem.
- Data is much harder to update. Say we have this flat-file database:

Let’s pretend this table goes on and there are 5000 students. Now say some of them join class 5A, for example Joebob and the 4999th student. To record this, you would have to scroll to Joebob and change his “no” to a “yes” in the “in class 5A?” column. Then you would have to scroll alllllll the way down to change just one more word. And if there are 200 new people in class 5A? It would be a nightmare to change every single one of them.

Another problem with file-based approaches is that they can’t really be programmed much. In a database system, built-in programs (such as validation rules) stop data from being entered incorrectly and stop data from being repeated in a table.

Relational database:
A relational database is made up of several tables that are linked together by fields they share (keys). Basically, the different tables share the same data at one point (a common field). Here is an example:

Here we have 2 tables that link together to make a relational database. See how both tables use “Student ID”? Each piece of data is stored in only one place, and the other table just refers to it by its Student ID. So if we ever need to change that data, we only change it once, and the change is seen through the link from the other table too. For example, say “Joebob” joined class 5A. In a flat-file database we would have to type out manually that he joined class 5A. Here we can just write his Student ID in the yellow table. This makes it much easier and faster.

(Note: the above is just an example but it’s not the best example out there)
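Here is a minimal sketch of the “store it once, change it once” idea. It uses Python's built-in sqlite3 module to run the SQL. The tables, IDs and room numbers are made up for illustration, and SQLite's SQL differs slightly from the exam version.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables and data, for illustration only.
con.executescript("""
    CREATE TABLE Class (
        ClassID  TEXT PRIMARY KEY,
        Location TEXT
    );
    CREATE TABLE Student (
        StudentID TEXT PRIMARY KEY,
        Name      TEXT,
        ClassID   TEXT REFERENCES Class(ClassID)   -- link to the Class table
    );
    INSERT INTO Class   VALUES ('5A', 'Room 12'), ('5B', 'Room 7');
    INSERT INTO Student VALUES ('S1', 'Joebob', '5B'), ('S2', 'Ahmed', '5A'), ('S3', 'Sara', '5A');
""")

# Joebob joins 5A: a single field in a single row changes.
con.execute("UPDATE Student SET ClassID = '5A' WHERE StudentID = 'S1'")

# 5A's location is stored once, in Class, so one UPDATE is enough for
# every student in 5A to see the new room when the tables are joined.
con.execute("UPDATE Class SET Location = 'Room 20' WHERE ClassID = '5A'")

rows = con.execute("""
    SELECT Student.Name, Class.Location
    FROM Student INNER JOIN Class ON Student.ClassID = Class.ClassID
    ORDER BY Student.Name
""").fetchall()
print(rows)  # [('Ahmed', 'Room 20'), ('Joebob', 'Room 20'), ('Sara', 'Room 20')]
```

In a flat file, the room would be written next to every 5A student and would have to be changed on every one of those rows.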

Now here are the problems of flat-file databases that relational databases solve:

- Storage space is not wasted, as data items are only stored once, meaning there is little or no redundant data.
- Data altered in one application is available in another application, so the data is consistent.
- Since data is entered only once, it’s faster to make/update these databases. They also take up less storage.

Now, the next part you need to know is the parts of a relational database:

Now, a database is made up of tables. Each column is called a field/attribute and each row is called
a record/tuple.

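An entity is anything that data can be stored about, such as a person, place, thing or event (for example, a student or a class). Each table holds data about one entity. So the word “Ahmed” in the table above is a data value describing one student, and the date of birth, class ID, etc. are attributes of that student.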

Now, let’s talk about keys. Keys are basically like ID numbers (they are normally numbers but can also contain letters). Each table in a relational database has one primary key. This is a field whose value is unique for every record, so it identifies each record in that table. In this example, Student ID is our primary key. A foreign key is like a primary key borrowed from another table (it is the primary key of one table used as a field in a separate table). Now this is a bit confusing, so let me give an example.

(Table about students)

(Table about classes)

Now, here are 2 tables. If you look at the “students” table, it has its primary key “Student ID”, and the “classes” table has its primary key called “Class ID”. However, the students table has another key/ID, which is the Class ID. Class ID is a foreign key, so it's basically a primary key borrowed from another table. In this case, the “students” table borrows the Class ID key from the “classes” table. This is what links the two tables together.

Now, there is one more key called a composite key. It is a primary key made up of multiple fields/attributes, used when one single attribute is not enough to uniquely identify each record in a table. Here is an example:

(here 2 fields in the same table together form the primary key)
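Here is a minimal sketch of a composite key, again using Python's sqlite3 module. The Enrolment table and its values are hypothetical. A student can appear many times and a course can appear many times, but the pair (StudentID, CourseID) can only appear once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table: neither column is unique on its own, so the two
# together form the primary key.
con.execute("""
    CREATE TABLE Enrolment (
        StudentID TEXT,
        CourseID  TEXT,
        Grade     TEXT,
        PRIMARY KEY (StudentID, CourseID)
    )
""")
con.execute("INSERT INTO Enrolment VALUES ('S1', 'C1', 'A')")
con.execute("INSERT INTO Enrolment VALUES ('S1', 'C2', 'B')")  # same student, new course: allowed
con.execute("INSERT INTO Enrolment VALUES ('S2', 'C1', 'C')")  # same course, new student: allowed

try:
    con.execute("INSERT INTO Enrolment VALUES ('S1', 'C1', 'D')")  # same pair again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```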

Referential integrity means making sure the links between tables stay correct, so the data shared between tables must match exactly. For example, every foreign key value must match an existing primary key value in the table it links to. This means a student cannot have a Class ID that does not exist in the classes table.
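Here is a small sketch of a DBMS enforcing referential integrity, assuming SQLite via Python's sqlite3 module. SQLite only checks foreign keys after `PRAGMA foreign_keys = ON`. The table names and values are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys once this is switched on for the connection.
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE Class (ClassID TEXT PRIMARY KEY, Location TEXT);
    CREATE TABLE Student (
        StudentID TEXT PRIMARY KEY,
        Name      TEXT,
        ClassID   TEXT REFERENCES Class(ClassID)
    );
    INSERT INTO Class   VALUES ('5A', 'Room 12');
    INSERT INTO Student VALUES ('S1', 'Ahmed', '5A');
""")

def attempt(sql):
    """Run one statement and report whether the DBMS allowed it."""
    try:
        con.execute(sql)
        return "allowed"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    attempt("INSERT INTO Student VALUES ('S2', 'Sara', '5A')"),  # 5A exists
    attempt("INSERT INTO Student VALUES ('S3', 'Bob', '9Z')"),   # there is no class 9Z
    attempt("DELETE FROM Class WHERE ClassID = '5A'"),           # students still link to 5A
]
print(results)  # ['allowed', 'rejected', 'rejected']
```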

Now, indexing is something databases use to speed up searching in large databases. An index is built from one or more columns of a table. An index in a database is kinda like the index in a book. Instead of reading every page (every row), you look the value up in the index and it tells you where to find it.
Key term               Definition
Entity                 Anything that data can be stored about, such as a person, place, thing or event
Table                  A collection of related data organised in columns (fields) and rows (records)
Field                  A column in a table
Attribute              A column in a table
Record                 A row in a table
Tuple                  A row in a table
Primary key            A field whose value is unique for every record, so it identifies each record in a table
Foreign key            A field in one table that holds the primary key of another table, linking the two tables together
Composite key          A primary key made up of multiple fields/attributes, used when one single attribute is not enough to uniquely identify each record
Referential integrity  How accurate the data is across the links between tables: it makes sure every foreign key value links to an existing primary key and that data shared between tables is correct
Indexing               To speed up searching for data, an index can be used. This is a data structure built from one or more columns in a database table
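Here is a minimal indexing sketch in SQLite (via Python's sqlite3). The table, the 5000 made-up rows and the index name `idx_student_surname` are all hypothetical. `EXPLAIN QUERY PLAN` is SQLite-specific and its wording varies between versions, so the check below only looks for the index name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Surname TEXT)")
# 5000 hypothetical students: Name0, Name1, ..., Name4999.
con.executemany("INSERT INTO Student (Surname) VALUES (?)",
                [(f"Name{i}",) for i in range(5000)])

# Build an index on the column we search by most often.
con.execute("CREATE INDEX idx_student_surname ON Student (Surname)")

# Ask SQLite how it plans to run the search; the last item of each row
# is a text description of that step.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Student WHERE Surname = 'Name4321'"
).fetchall()
uses_index = any("idx_student_surname" in str(row[-1]) for row in plan)

match = con.execute("SELECT * FROM Student WHERE Surname = 'Name4321'").fetchall()
print(uses_index, match)  # True [(4322, 'Name4321')]
```

Without the index, SQLite would have to scan all 5000 rows to answer the same query.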

Now, let’s talk about the relationships between tables in a relational database. There are 3 types:

- One to one: when one item/entity in one table is linked to only one item in another table. E.g. 1 student takes only 1 course (if there was one table for students linked to a table for “courses”). In other words, an entity from table B will only appear in table A once.
- One to many: when one item/entity in one table is linked to many items in another table. E.g. 1 student takes many courses (if there was one table for students linked to a table for “courses”). In other words, an entity from table B will appear in table A multiple times.
- Many to many: when many items in one table are linked to many items in another table. E.g. many students take many courses (if there was one table for students linked to a table for “courses”). In other words, many entities from table B will appear in table A multiple times, and the other way round.
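Many-to-many relationships are usually stored using a third “link” table that holds pairs of foreign keys. This turns one many-to-many relationship into two one-to-many relationships. Here is a minimal sketch in SQLite via Python's sqlite3, with hypothetical students and courses.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Student (StudentID TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE Course  (CourseID  TEXT PRIMARY KEY, Title TEXT);
    -- Link table: each row pairs one student with one course.
    CREATE TABLE StudentCourse (
        StudentID TEXT REFERENCES Student(StudentID),
        CourseID  TEXT REFERENCES Course(CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );
    INSERT INTO Student VALUES ('S1', 'Ahmed'), ('S2', 'Sara');
    INSERT INTO Course  VALUES ('C1', 'Maths'), ('C2', 'Physics');
    INSERT INTO StudentCourse VALUES ('S1', 'C1'), ('S1', 'C2'), ('S2', 'C1');
""")

# One student has many courses...
courses_of_ahmed = con.execute("""
    SELECT Course.Title FROM StudentCourse
    INNER JOIN Course ON StudentCourse.CourseID = Course.CourseID
    WHERE StudentCourse.StudentID = 'S1' ORDER BY Course.Title
""").fetchall()

# ...and one course has many students.
students_in_maths = con.execute("""
    SELECT Student.Name FROM StudentCourse
    INNER JOIN Student ON StudentCourse.StudentID = Student.StudentID
    WHERE StudentCourse.CourseID = 'C1' ORDER BY Student.Name
""").fetchall()

print(courses_of_ahmed)   # [('Maths',), ('Physics',)]
print(students_in_maths)  # [('Ahmed',), ('Sara',)]
```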

There is a way to draw out these relationships in diagram form. This diagram is called an ER (entity
relationship) diagram.

One to one relationship:

One to many relationships:


Many to many relationships:

Let’s move on to normalization:

Normalization is the process used to make sure a database has no redundant or inconsistent data. It minimizes duplication, which allows data to be processed accurately, and it makes sure the database has referential integrity, i.e. it will remain error free and robust when data is added, deleted or changed.

Basically, it is just a way to simplify data and make it more correct/accurate.

Now we need to know how to normalize a database. You will start off from a giant flat-file database.

It will be in a form called 0NF (0 Normal form).

The normalization process is:


0NF → 1NF (1st normal form) → 2NF (2nd normal form) → 3NF (3rd normal form).

- To do the first step (going to 1NF):

There are 3 rules you need to know:
1) All rows need to be unique.
2) Each cell/box must contain only a single value (not an entire list).
3) Each value must not be divisible, i.e. it should be atomic. Basically, it should not be possible to break it down any further (read on and you’ll understand).

Let’s say we want to make this 0NF table into 1NF:

Let’s apply the first rule to it (all rows must be unique):

When we look at this table, we see there is a problem. There are 2 “Bob Jones” rows in the table, which is a problem (because we may think it's an accidental duplicate). However, we can fix this problem by doing this:
What we do is add an “Order ID” as a primary key so we can tell the 2 orders apart. Now the first rule is satisfied.

Let’s now apply the second rule (no lists allowed):


When we see the table, we see that the “Order” field is full of lists.

Now, we need to eliminate these lists because of the second rule. So what we do is split the table into two like this:

Now what we have is a new table of orders. This also means that we have no more lists! So now the second rule is satisfied.

Now let’s look at the 3rd rule (each value should be broken down as much as it can be):

So, when we look at the first table, we notice a problem:

All these names can be broken down / divided / made smaller. Right now we have full names, so we can break them down into first names and surnames like this:
Now every value is broken down as far as it can go (they are all in their simplest form). So we end up with this:

Now this is in 1st normal form as it satisfies the three rules.
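The three 1NF steps can be sketched in plain Python. The orders below are hypothetical because the original table isn't reproduced here. Splitting a name at the first space is a simplifying assumption.

```python
# Hypothetical 0NF data: a repeated customer, full names, and a list of items per row.
unnormalised = [
    ("Bob Jones", ["Pen", "Ruler"]),
    ("Amy Lee",   ["Pen"]),
    ("Bob Jones", ["Stapler"]),
]

customer_orders = []  # one row per order: (OrderID, FirstName, Surname)
order_items = []      # one row per item:  (OrderID, Item)

for order_id, (full_name, items) in enumerate(unnormalised, start=1):
    # Rule 1: an OrderID makes every row unique, even the two Bob Jones orders.
    # Rule 3: split the full name into atomic parts (assumes "First Surname").
    first_name, surname = full_name.split(" ", 1)
    customer_orders.append((order_id, first_name, surname))
    # Rule 2: no lists in a cell, so each item gets its own row in a second table.
    for item in items:
        order_items.append((order_id, item))

print(customer_orders)  # [(1, 'Bob', 'Jones'), (2, 'Amy', 'Lee'), (3, 'Bob', 'Jones')]
print(order_items)      # [(1, 'Pen'), (1, 'Ruler'), (2, 'Pen'), (3, 'Stapler')]
```

In the second table, OrderID and Item together identify each row, so they act as a composite key.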

- Now for 2NF (2nd normal form). Note that we can only make a table that is already in 1NF into 2NF, and we can’t go straight from 0NF to 2NF. 2NF has one rule: none of the non-key columns/fields/attributes should have a partial dependency on the primary key. In other words, they must not depend on only part of a composite key. This might sound really confusing but it's not, and here’s why.

Say we have this table:

We can see that the “Course fee” column depends only on the Course ID part of the composite key. It doesn’t depend on the “Student ID” column at all. So, what we can do is split this table so that this problem doesn’t exist:
Now, Course fee is in a table where it is 100% dependent on the key (which is Course ID). What this does is stop values from repeating and taking up more storage. Like, look at this:

As you can see, these numbers are unnecessarily repeated and take up more and more storage. So, we make it into 2NF so that doesn’t happen anymore.

That’s all you need to know for 2NF.
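Here is a plain-Python sketch of the same 2NF split, using made-up IDs and fees. The fee ends up stored once per course instead of once per enrolment.

```python
# Hypothetical 1NF table: (StudentID, CourseID, CourseFee), key = (StudentID, CourseID).
# CourseFee depends only on CourseID, i.e. on part of the key: a partial dependency.
enrolment_1nf = [
    ("S1", "C1", 300),
    ("S2", "C1", 300),
    ("S3", "C1", 300),
    ("S1", "C2", 450),
]

# Table 1 keeps the full composite key and nothing that depends on only half of it.
enrolment = [(student_id, course_id) for student_id, course_id, _fee in enrolment_1nf]

# Table 2 is keyed by CourseID alone, so each fee is stored once per course.
course_fee = {}
for _student_id, course_id, fee in enrolment_1nf:
    course_fee[course_id] = fee

print(enrolment)   # [('S1', 'C1'), ('S2', 'C1'), ('S3', 'C1'), ('S1', 'C2')]
print(course_fee)  # {'C1': 300, 'C2': 450}
```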

- Now for 3NF (3rd normal form). Note: you can only make something into 3NF if it's already in 2NF. There is one rule for 3NF: every non-key field must depend only on the primary/composite key, not on anything else (such as another non-key field). A dependency that goes through another non-key field is called a transitive dependency.
Let me show you what I mean with an example:

Over here, we have a problem. The field “Winner’s DOB (Date of Birth)” depends on the field “Winner” and not on the composite key. This breaks the 3NF rule, so we need to change it. What we can do is split the table again so that all fields depend on the primary/composite key. So, this happens:
So now we have 2 tables, and all the fields depend on their table’s composite/primary key. In the second table (aka “Winners DOBs”), the “Winner” column is the new primary key of that table, and DOB depends on the Winner column. Now we have this database in 3NF.
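Here is the 3NF split as a plain-Python sketch. The tournament names, years, winners and dates are invented. The point is that each date of birth ends up stored once, keyed by Winner.

```python
# Hypothetical 2NF table: (Tournament, Year, Winner, WinnerDOB), key = (Tournament, Year).
# WinnerDOB depends on Winner, not on the key: a transitive dependency.
results_2nf = [
    ("Open",    2019, "Ana", "1998-04-02"),
    ("Masters", 2019, "Ben", "1995-11-20"),
    ("Open",    2020, "Ana", "1998-04-02"),
]

# Table 1: Winner depends only on the composite key (Tournament, Year).
tournament_winners = [(tournament, year, winner) for tournament, year, winner, _dob in results_2nf]

# Table 2 ("Winners DOBs"): Winner is the primary key and DOB depends on it.
winner_dobs = {}
for _tournament, _year, winner, dob in results_2nf:
    winner_dobs[winner] = dob

print(tournament_winners)  # [('Open', 2019, 'Ana'), ('Masters', 2019, 'Ben'), ('Open', 2020, 'Ana')]
print(winner_dobs)         # {'Ana': '1998-04-02', 'Ben': '1995-11-20'}
```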

Spec point 8.2 DBMS

DBMS (Database management system) is a system which manages and changes/updates databases.
DBMS allows organizations to create, access, search and update data or files.

So, we need to know how DBMS address problems of file based approaches, and how DBMS manage
databases.

- Data dictionary: stores data about the data (metadata), like the definitions of tables, attributes, relationships between tables and any indexing. The data definitions can also include data entry validation rules, which improves data integrity. Data dictionaries are used for managing the databases.
- Data modelling: This is basically a tool that shows you the structure of a database. Examples include E-R diagrams and other diagrams that represent the database’s structure.
- Logical schema?
- Data integrity solution: The DBMS gives data high integrity by storing it in separate linked tables. This reduces duplication, as most items of data are only stored once. Only the items of data used to link tables (foreign keys) are stored more than once.
- Data security: DBMSs keep data secure with backup procedures, where data is backed up at set times. They can also give different users different access rights to different data/tables.
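SQLite keeps a very small data dictionary of its own, the `sqlite_master` table, and can describe a table's fields with `PRAGMA table_info`. Here is a minimal sketch via Python's sqlite3. The Student table, its CHECK validation rule and the index name are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Student (
        StudentID  TEXT PRIMARY KEY,
        Name       TEXT NOT NULL,
        StudentAge INTEGER CHECK (StudentAge BETWEEN 11 AND 19)
    )
""")
con.execute("CREATE INDEX idx_student_name ON Student (Name)")

# sqlite_master is SQLite's catalogue: one row per table/index with its definition.
objects = con.execute("SELECT type, name FROM sqlite_master").fetchall()
print(objects)

# PRAGMA table_info describes each field: (cid, name, type, notnull, default, pk).
for cid, name, col_type, notnull, default, pk in con.execute("PRAGMA table_info(Student)"):
    print(name, col_type, "NOT NULL" if notnull else "", "PRIMARY KEY" if pk else "")

# The validation rule stored with the definition is checked on data entry.
try:
    con.execute("INSERT INTO Student VALUES ('S1', 'Ahmed', 42)")
    age_rule_enforced = False
except sqlite3.IntegrityError:
    age_rule_enforced = True
print(age_rule_enforced)  # True
```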

Now here are some tools that come with DBMS software:

- Developer interface: This allows people to make queries (basically searches of the database) using a language called SQL (Structured Query Language). These searches/queries are processed by the query processor.
- The query processor: The query processor takes a query written in SQL and processes it. It includes a DDL interpreter, a DML compiler and a query evaluation engine (DDL and DML are explained below). Any DDL statements are interpreted and recorded in the database’s data dictionary. DML statements are compiled into low-level instructions that are executed by the query evaluation engine. The DML compiler will also optimise the query.

Spec point 8.3 - DML and DDL:

DBMS (database management system) is software for managing and changing/updating databases. A DBMS allows organizations to create, access, search and update data or files.

Uses include allowing a number of users to access the same data and do work such as searches, reports, and updating and deleting files. It also creates access procedures, which give different users different levels of access to the database, and may make some files read-only or write-only etc.

You might not need to know the advantages and disadvantages, but here they are just in case:

Advantages of DBMS:

- The data returned by a search will always be in the same format.
- Security is improved.
- It is an economical way of storing data, as each item only has to be stored once, saving memory and hardware.

Disadvantages of DBMS:

- Costly to set up: expensive hardware and staff expertise are required to run the system.
- Security is essential, because data is stored centrally and can be very vulnerable.
- Learning how to use a DBMS can be difficult.

A DBMS is just software used to make and manage databases. It uses 2 kinds of language: DDL and DML.

- DBMS uses DDL (data definition language) to create, modify and delete the data “structures” (such as tables) that make up relational databases.

- DBMS uses DML (data manipulation language) to add, modify, delete and retrieve data already inside a relational database.

- Both DDL and DML are written as statements, a bit like lines of a program. DDL is used for working on the relational database structure/making databases, whereas DML is used to work with the data stored in the relational database.

Most companies use a language called SQL (Structured Query language) for their DDL and DML. You
need to know how to program in SQL, which will be explained next.

I’ll first explain SQL (DDL) commands (so SQL that will make databases):

First of all, you need to know that all data in SQL is one of these different datatypes:

Data type     Description
CHARACTER     Text
VARCHAR(n)    Text of variable length, up to n characters long (explained below)
BOOLEAN       Data that is either TRUE or FALSE. Many systems store this as 1 (TRUE) or 0 (FALSE)
INTEGER       Whole numbers
REAL          Numbers with decimal points (like 4.3)
DATE          A date, usually in the format YYYY-MM-DD
TIME          A time, usually in the format HH:MM:SS

VARCHAR(n) - “n” is a number that sets the maximum length of the text. So, for example, if I wrote VARCHAR(4), my text could be up to 4 characters long.

Here are the commands you need to know:

(Remember that command keywords are written in capitals by convention, and each complete statement ends with a “;”.)
Anything written in capitals is part of a command; anything else is just a name.

- CREATE DATABASE name. This makes a database. (Think of it as making a new file where
you will be putting all your tables inside). You write the name of the database next to the
command to name your new database. Example:

CREATE DATABASE school; makes a database called “school”.

- CREATE TABLE name. This makes a table in a database. Remember, there can be multiple tables in one database. You write the name of the table next to the command to name your table. Then, in brackets, you define the “fields”/columns and write their data types next to them. Here is an example:

CREATE TABLE student (
    StudentID CHARACTER,
    Name CHARACTER,
    StudentAge INTEGER,
    Date_of_Birth DATE,
    ClassID CHARACTER);

What this does is it makes a table that looks like this:

StudentID   Name        StudentAge   Date_of_Birth   ClassID
Some text   Some text   A number     A date          Some text
Some text   Some text   A number     A date          Some text

- Here are 3 commands that go together a lot:

ALTER TABLE table_name. Changes the “definition” of a table. Basically, it changes part of the table’s structure (you will see what I mean in a second). You write the name of the table you want to change next to it.
E.g. ALTER TABLE Student … (there is more to this command, you’ll see in a bit)
PRIMARY KEY (field_name). Adds a primary key to the table.

FOREIGN KEY (field_name) REFERENCES Table(field). This command is a 2-part command and it adds a foreign key to the table.

So this is how they work. Let’s first pretend we have this code:
CREATE DATABASE School;
CREATE TABLE Student(
    StudentID CHARACTER,
    FirstName CHARACTER,
    SecondName CHARACTER,
    DateOfBirth DATE,
    ClassID CHARACTER);

CREATE TABLE Class(
    ClassID CHARACTER,
    Location CHARACTER,
    LicenceNumber CHARACTER);

This code makes a database called School with 2 tables inside it called “Student” and “Class”.

Now, say we want to make StudentID a primary key. This is what we would write:

ALTER TABLE Student ADD PRIMARY KEY (StudentID);

ADD means you are adding something new to the table’s definition, in this case a PRIMARY KEY.

Let’s say we also make a primary key for the “class” table (this is for the next example).

So we do:
ALTER TABLE Class ADD PRIMARY KEY (ClassID);

Ok, so now we need to make a foreign key in the Student table called ClassID. Here’s how:
ALTER TABLE Student ADD FOREIGN KEY (ClassID) REFERENCES Class(ClassID);

The ALTER TABLE means we are changing the “Student” table.

ADD FOREIGN KEY means we are making “ClassID” a foreign key

REFERENCES means we are taking the primary key (called “ClassID”) of the table called “Class”.
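You can try the School example with Python's built-in sqlite3 module, but keep two SQLite differences in mind. First, there is no CREATE DATABASE: the database is just the file you connect to. Second, ALTER TABLE cannot add a PRIMARY KEY or FOREIGN KEY, so this sketch declares the same keys inside CREATE TABLE instead. The PRAGMA checks at the end are SQLite-specific.

```python
import sqlite3

# No CREATE DATABASE in SQLite: connecting to "school.db" would create that
# file on disk; ":memory:" keeps this example database in RAM instead.
con = sqlite3.connect(":memory:")

# Same tables as above, with the keys written as table constraints
# because SQLite's ALTER TABLE cannot add them afterwards.
con.executescript("""
    CREATE TABLE Class (
        ClassID       CHARACTER,
        Location      CHARACTER,
        LicenceNumber CHARACTER,
        PRIMARY KEY (ClassID)
    );
    CREATE TABLE Student (
        StudentID   CHARACTER,
        FirstName   CHARACTER,
        SecondName  CHARACTER,
        DateOfBirth DATE,
        ClassID     CHARACTER,
        PRIMARY KEY (StudentID),
        FOREIGN KEY (ClassID) REFERENCES Class(ClassID)
    );
""")

# PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
columns = [row[1] for row in con.execute("PRAGMA table_info(Student)")]
primary_key = [row[1] for row in con.execute("PRAGMA table_info(Student)") if row[5]]
# PRAGMA foreign_key_list rows are (id, seq, table, from, to, on_update, on_delete, match).
fk = con.execute("PRAGMA foreign_key_list(Student)").fetchone()

print(columns)              # ['StudentID', 'FirstName', 'SecondName', 'DateOfBirth', 'ClassID']
print(primary_key)          # ['StudentID']
print(fk[2], fk[3], fk[4])  # Class ClassID ClassID
```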

Now let’s go to SQL (DML), so basically SQL to retrieve and change the data in a database:

First, I will explain how to retrieve/query for data:


- SELECT field_names to choose which fields you want. (The “;” only goes at the very end of the whole query, so it is left off here.) Example of this:

SELECT first_name
You can use the SELECT command on many fields like this:
SELECT first_name, surname, date_of_birth
You can also use the SELECT command on all the fields by doing this:
SELECT *

- FROM name_of_table; allows you to choose which table you want to “select” from/where
you want to retrieve data from. Example of this is:

FROM Class;

- WHERE condition. This command will retrieve all the rows of data that meet the condition. For example:
WHERE Gender = 'male';
This retrieves all the rows where the “Gender” field has male in it. (Text values go inside single quotes.)

Operators that can be used in the WHERE command:
Operator   Description
=          Equals to
>          Is greater than
<          Is less than
<>         Not equal to

Here is an example of all three of these used together in one SQL query:

For context, let’s say we have this table:

SELECT Name, Age, DateOfBirth
FROM Customers
WHERE Gender = 'M';

This will return the names, ages and DOBs of all the males in the table called “Customers”.

You will end up with this:


Name    Age   Date of Birth
Mike    34    10/10/1977
Chris   18    12/12/1992
Ali     35    2/14/1976
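Here is a runnable version of this query, assuming SQLite via Python's sqlite3. Only the three male rows come from the result above. The female rows, the column name DateOfBirth and the column types are assumptions. Note that the “;” appears once, at the end of the whole query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (Name TEXT, Age INTEGER, DateOfBirth TEXT, Gender TEXT)")
# Male rows are from the result table above; the female rows are made up.
con.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?)", [
    ("Mike",  34, "10/10/1977", "M"),
    ("Sarah", 29, "03/05/1982", "F"),
    ("Chris", 18, "12/12/1992", "M"),
    ("Ali",   35, "2/14/1976",  "M"),
    ("Emma",  41, "07/08/1970", "F"),
])

# One statement: SELECT ... FROM ... WHERE ..., ending with a single ';'.
males = con.execute("""
    SELECT Name, Age, DateOfBirth
    FROM Customers
    WHERE Gender = 'M';
""").fetchall()

for row in males:
    print(row)
```

Without an ORDER BY, SQL does not promise any particular order for the returned rows.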

Slightly more complex stuff about these commands:

- You can also use the FROM command with more than one table, separating the table names with commas. For example, if you’re doing a query on 3 tables, you can do this:
FROM School, Students, Classes
(You then normally add a WHERE condition that matches up the linking fields. Otherwise every row of one table is paired with every row of the others.)

- The WHERE command can contain multiple conditions. For example, if you want to retrieve data where Gender = 'male' and Age < 18, you can use the AND operator:
WHERE Gender = 'male'
AND Age < 18

- When there are multiple tables, in the WHERE command you need to write the table name before the field name with a “.” between them. Let me show you what I mean:
WHERE Student.Age > 18
Over here, “Student” is the name of the table and “Age” is the name of the field.
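Here is a sketch combining these points, assuming SQLite via Python's sqlite3 and hypothetical Student and Class tables. It uses two tables in FROM, a WHERE condition that links them, AND for the extra conditions, and Table.Field names throughout.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Student (StudentID TEXT, Name TEXT, Gender TEXT, Age INTEGER, ClassID TEXT);
    CREATE TABLE Class   (ClassID TEXT, Location TEXT);
    INSERT INTO Student VALUES
        ('S1', 'Ahmed', 'male',   17, '5A'),
        ('S2', 'Sara',  'female', 16, '5A'),
        ('S3', 'Bob',   'male',   19, '5B'),
        ('S4', 'Omar',  'male',   15, '5B');
    INSERT INTO Class VALUES ('5A', 'Room 12'), ('5B', 'Room 7');
""")

# Two tables in FROM, so every field is written as Table.Field.
# The first condition links the tables; AND adds the other conditions.
rows = con.execute("""
    SELECT Student.Name, Class.Location
    FROM Student, Class
    WHERE Student.ClassID = Class.ClassID
    AND Student.Gender = 'male'
    AND Student.Age < 18
    ORDER BY Student.Name
""").fetchall()
print(rows)  # [('Ahmed', 'Room 12'), ('Omar', 'Room 7')]
```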

Now, back to more query commands:

- ORDER BY field_name. This sorts the results by a column, in alphabetical or numerical order. Example of use:
ORDER BY Age
This will sort all the results by their age. It sorts in ascending order by default, but if you write DESC after the field name, it will sort in descending order. So:
ORDER BY Age ASC = order by age in ascending order
ORDER BY Age DESC = order by age in descending order

- GROUP BY field_name. This groups together all the selected rows that have the same value in a column, and turns each group into a single row of results. For example, if you did:
GROUP BY Age
you would get one row for the 16 year olds, one row for the 17 year olds, etc. Because each group becomes one row, GROUP BY is normally used with functions like COUNT, SUM or AVG (explained below). For example, SELECT Age, COUNT(Age) FROM Students GROUP BY Age; tells you how many students there are of each age. GROUP BY comes after the SELECT and FROM (and any WHERE) commands.

- INNER JOIN table2 ON table1.field_name = table2.field_name
This command requires 2 tables. You use it after a SELECT and FROM command. It is a bit like a WHERE command: it finds all the pairs of records where the data in the first table’s field is the same as the data in the second table’s field. For example, say I used INNER JOIN on a table of children in a certain sports club and a table about students in a school, matching on a field they both have (like a student ID). It would give us all the rows where the data in those two fields matches.

- COUNT (field_name).
This function is used to count the number of rows that follow a certain condition. For
example, you can make a code that tells you how many rows have the Age of 18 like this:
SELECT COUNT (Age)
FROM Students
WHERE Age=18;

- AVG(field_name).
This function gives you the average of all the values in the selected rows. For example:
SELECT AVG(Age)
FROM Students
This will return the average of all the ages in the students table

- SUM (field_name)
This is the same as the AVG command but instead of giving an average, it gives you a sum of
all the values in a field. For example:
SELECT SUM(Age)
FROM Students
This will return the sum of all the ages in the students table.
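Here are all of the commands above in one small sketch, assuming SQLite via Python's sqlite3. The students, ages and club members are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Students   (StudentID TEXT, Name TEXT, Age INTEGER);
    CREATE TABLE SportsClub (MemberID TEXT, StudentID TEXT, Sport TEXT);
    INSERT INTO Students VALUES
        ('S1', 'Ali', 18), ('S2', 'Bea', 16), ('S3', 'Cal', 18), ('S4', 'Dan', 17);
    INSERT INTO SportsClub VALUES
        ('M1', 'S1', 'Football'), ('M2', 'S3', 'Tennis'), ('M3', 'S9', 'Chess');
""")

def query(sql):
    return con.execute(sql).fetchall()

# ORDER BY sorts the rows; DESC puts the largest first (Name breaks ties).
by_age = query("SELECT Name, Age FROM Students ORDER BY Age DESC, Name")

# GROUP BY collapses rows with the same Age into one row per age,
# so it is normally paired with an aggregate such as COUNT.
per_age = query("SELECT Age, COUNT(Age) FROM Students GROUP BY Age ORDER BY Age")

# INNER JOIN keeps only rows whose StudentID appears in both tables:
# Bea and Dan (not in the club) and member S9 (not a student) drop out.
joined = query("""
    SELECT Students.Name, SportsClub.Sport
    FROM Students
    INNER JOIN SportsClub ON Students.StudentID = SportsClub.StudentID
    ORDER BY Students.Name
""")

# Aggregate functions return a single value each.
count_18 = query("SELECT COUNT(Age) FROM Students WHERE Age = 18")[0][0]
average = query("SELECT AVG(Age) FROM Students")[0][0]
total = query("SELECT SUM(Age) FROM Students")[0][0]

print(by_age)                    # [('Ali', 18), ('Cal', 18), ('Dan', 17), ('Bea', 16)]
print(per_age)                   # [(16, 1), (17, 1), (18, 2)]
print(joined)                    # [('Ali', 'Football'), ('Cal', 'Tennis')]
print(count_18, average, total)  # 2 17.25 69
```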

Now, let’s move on to managing data in already existing databases:

(I’ll be using this table for all my examples):

STUDENTS (<- this is the table name)

- INSERT INTO table_name VALUES(data); This inserts a new row into the table. Inside the brackets you write the values of all the data you want to add to the table. So, for example, you would write:

INSERT INTO Students VALUES('Saim', 'Ahmed', 24, 'July', 2005);


Note that each data value is separated by commas. The position of each word inside the
brackets corresponds to the column of the table. What I mean by this is that, for example,
“Saim” will go into the Name column, “Ahmed” will go into the surname column, etc. We
end up with this:
Now, if there is some data we don’t want to enter, like if we only want to enter the name and surname (and not the day, month and year), we have to do something a bit differently. After the table_name, we have to put, in brackets, the columns that we want to fill with data. For example, say I only want to enter “Saim” in the Name column and “Ahmed” in the Surname column. The way we do it is this:

INSERT INTO Students(Name, Surname) VALUES('Saim', 'Ahmed');

Here, we write “Name” and “Surname” inside the brackets to tell the DBMS which columns we want to fill. We will end up with this:

- DELETE FROM table_name. This deletes rows from the table. It is used together with the WHERE command to choose which rows to delete (without a WHERE, every row in the table is deleted). For example:

DELETE FROM Students
WHERE Day = 4;

What this does is remove any row where the Day field contains the value 4. In this case, the row about Peter Parker will be deleted.
- UPDATE table_name. This command allows you to edit rows and change the data in them. You have to use it with the SET and WHERE commands. Let me show you the format:

UPDATE table_name
SET field_name = new_data, field_name = new_data, …
WHERE condition;

For example:

UPDATE Students
SET Name = 'Bob', Day = 5
WHERE Year = 1975;

What this does is, wherever the Year field has 1975 in it, it changes the Name field to Bob and the Day to 5. So you will end up with this:
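Here are INSERT, DELETE and UPDATE together, as a sketch using SQLite via Python's sqlite3. The column names come from the examples above. The column types and the Peter Parker and Mary Jane rows are assumptions made so that DELETE and UPDATE have something to act on.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Column names follow the examples above; the column types are assumptions.
con.execute("CREATE TABLE Students "
            "(Name TEXT, Surname TEXT, Day INTEGER, Month TEXT, Year INTEGER)")

# INSERT a full row: values fill the columns in order; text goes in single quotes.
con.execute("INSERT INTO Students VALUES ('Saim', 'Ahmed', 24, 'July', 2005)")
# INSERT only some columns: name them first; the others are left empty (NULL).
con.execute("INSERT INTO Students (Name, Surname) VALUES ('Saim', 'Ahmed')")
# Hypothetical extra rows for the DELETE and UPDATE steps.
con.execute("INSERT INTO Students VALUES ('Peter', 'Parker', 4, 'March', 1980)")
con.execute("INSERT INTO Students VALUES ('Mary', 'Jane', 12, 'May', 1975)")

def all_rows():
    return con.execute("SELECT * FROM Students").fetchall()

before = all_rows()
# [('Saim', 'Ahmed', 24, 'July', 2005), ('Saim', 'Ahmed', None, None, None),
#  ('Peter', 'Parker', 4, 'March', 1980), ('Mary', 'Jane', 12, 'May', 1975)]

# DELETE every row where Day = 4 (only Peter Parker here).
con.execute("DELETE FROM Students WHERE Day = 4")
after_delete = [row[0] for row in all_rows()]  # ['Saim', 'Saim', 'Mary']

# UPDATE every row where Year = 1975, changing Name and Day together.
cur = con.execute("UPDATE Students SET Name = 'Bob', Day = 5 WHERE Year = 1975")
updated = con.execute("SELECT * FROM Students WHERE Year = 1975").fetchall()
print(cur.rowcount, updated)  # 1 [('Bob', 'Jane', 5, 'May', 1975)]
```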
