
CS3101 Practical 1 - Database Design

210001111

March 7, 2023

Contents
1 Overview
2 ER Model
2.1 Design decisions
2.1.1 Person
2.1.2 User
2.1.3 Member/Subscriber
2.1.4 Subscription
2.1.5 FilmPerson
2.1.6 Actor/Director
2.1.7 Award
2.1.8 PersonAward/ShowAward
2.1.9 TV Show/Film
2.1.10 Episode
2.1.11 Watchlist
3 Relational Schema
4 Normalisation

1 Overview
This practical involved designing an ER Model for the Scotflix streaming service,
before translating it into a relational schema and discussing what normal form it was
in. The ER Model is shown in figure 1.
Figure 1: ER Model for Scotflix
2 ER Model
2.1 Design decisions
2.1.1 Person
The person entity set has a unique ID as the primary key because the other attributes
of a person may not be unique to them and may even change (such as with an address).
This entity set acts as a superclass for both a filmperson and a Scotflix user, because all
these entities share some common attributes such as name, date of birth and nationality.
The age of a person is a derived attribute because it can be calculated from the date of
birth, and the name is a composite attribute made up of a first name and a surname, as
actors and directors may need to be searched by either name, so having both ensures atomicity.
Storing two names will be sufficient for Scotflix, as it is unlikely two actors, directors or
users will have the same exact name as someone else in the same show or subscription
group. Storing middle names would lead to many null values as not everyone has more
than two names.
Specialisation is used here because a user and a filmperson will not share
all the same attributes, but the person attributes are common to all entity sets. The
inheritance of person is overlapping and has a total completeness constraint because
everyone registered on Scotflix must either be a user or a filmperson, and it’s possible
for someone to be both as actors/directors can also register.

2.1.2 User
The user entity set inherits the attributes from person and has additional attributes as
someone registered with Scotflix. The address attribute is composite because multiple
values are required to identify a house address uniquely, and the system will need to
ensure these are filled out by the user. The attributes for an address are taken from
the examples of members in the specification. Users can review shows, and since each
review only needs a score and a date, it is represented by a many-to-many relationship
set.

2.1.3 Member/Subscriber
Members and subscribers both have the user entity set as a superclass, as they have the
same attributes but must be distinguished from each other due to differing capabilities.
There is a disjoint, total completeness constraint in the inheritance relationship because
users have to be either a member or a subscriber (as profiles are deleted once they are
not part of a subscription anymore), but can’t be both as every user can only be
assigned to one subscription at a time. Up to 5 members may be associated with one
subscription at a time so there is a cardinality limit between member and subscription.
Member and subscriber have total participation in the relationship because they must
be associated with a subscription to be part of Scotflix. However, subscriptions need
not have additional members, hence the partial participation on subscription’s side.
Subscribers have a one-to-one relationship with a subscription, as they must only have
one at any time.

2.1.4 Subscription
The subscription entity set is uniquely identified by an ID as other attributes may
change so will not be suitable primary keys. The payment details are stored within this
entity set instead of subscriber, because subscriptions need to be able to automatically
renew, so it made more sense to have the payment details accessible on the subscription
itself rather than having to perform an extra step to access the subscriber table. This
attribute is composite as it requires a card number, expiration date and security code
according to the examples in the specification. The dates for the subscription’s renewal
and reminder email are derived attributes as they can be calculated from the date of
the subscription.

2.1.5 FilmPerson
This entity set inherits from person, and also stores the date of an Actor’s or Director’s
death, if there is one. The downside to this is that the database may have null values
for people who are still alive, but it is not worth having separate entity sets for actors
or directors who are still alive or not.

2.1.6 Actor/Director
Shows must have at least one Director, but some shows (such as documentaries) may
not have Actors, so there is total participation with Director but partial participation
with Actor. Also, it is possible for Actors to become Directors, so the specialisation is
overlapping. It may have been plausible to represent Actors and Directors with rela-
tionship sets and roles instead of entity sets because they share the same attributes and
are overlapping. However, it is assumed that Scotflix only stores Actors and Directors,
so there is a total completeness constraint which makes it more suitable to represent
them with entities.

2.1.7 Award
The person award and show award entity sets were separated and inherit from the
superclass award because they require the same attributes but have relationships with
different entity sets. Showtitle is required to identify shows because not every awarded
show is on Scotflix as per the specification. There is a disjoint, total inheritance because
all awards must be one of the subclasses, but there exists no award for both a show
and a person. Awards may share the same name, category and awarding entity from year
to year, but are distinguished by the year of the award, so all these attributes are
required for the primary key.

2.1.8 PersonAward/ShowAward
Personaward and showaward have a one-to-one relationship with filmperson and
show because any award can only be won by a single person or show each year. The
participation of showaward in the relationship with show is partial because some awards
may not be related to a show in the Scotflix database. Personaward has total
participation in its relationship with filmperson because it’s assumed each personaward
must have a corresponding person in the database. The reason awards were not
represented by multi-valued attributes was because several attributes are required for an
award, and Scotflix may also want to store additional attributes such as nominees, so
it is better to represent them with an entity set.

2.1.9 TV Show/Film
Films and TV shows inherit with a disjoint, total completeness constraint from a Show
entity set because they share exactly the same attributes, and a show must be either a
TV Show or a Film. There are separate entity sets to prevent films from being able to
have episodes. An ID is needed for the primary key because there could be shows with
the same title and year. There is an attribute for the expiry date of a show, which for
the shows that are part of the permanent collection could be set to null. The languages
attribute is multi-valued because a show may be available in one or more languages.

2.1.10 Episode
This is a weak entity set because it relies on show to be uniquely identified, as there will
be episodes from different shows that have the same title, season number, and episode
number. The discriminator attributes are just the season and episode numbers, as the
title is not needed to distinguish episodes within a show. One show will have several
episodes, explaining the many-to-one relationship, and there is total participation on
episode’s side because every episode must belong to a show. The precedes and follows
relationship sets are used to represent the relationship between episodes, and there is
partial participation as episodes may not have a next or previous episode. The roles to
indicate the current and next or current and previous episodes are needed to specify how
an episode participates in a relationship instance, since it is a recursive relationship.

2.1.11 Watchlist
This could have been a weak entity set, requiring user as its identifying entity set, be-
cause name is not a uniquely identifying attribute. However, weak entity sets usually
have many-to-one relationships whilst users can only have one watch list so it requires
a one-to-one relationship. Thus, it was made to be a strong entity set with a unique ID.
It is assumed watch lists may not contain any shows if the user has not added any and
users may not have a watch list, so there is partial participation in the relationship set
contains and on the user side. However, watch lists on the system must have a user,
so it has total participation.

3 Relational Schema
subscription(ID: INTEGER, subscription fee: FLOAT, date: DATE,
renewal type: INTEGER, card number: INTEGER, expiration date: DATE,
security code: INTEGER)

The subscription fee is £9.99 as per the specification so requires a float type.
Since the renewal is either every 6 months or annually, renewal type can be
stored as an integer (0 or 1) for more efficient memory usage. It is assumed a date
type is available to make it easier to compare dates for when the subscription
expires.
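
Because the renewal date is a derived attribute, it can be computed on demand instead of being stored. The following is a minimal Python sketch of that calculation. It assumes that renewal type 0 means a six-month period and 1 an annual one (the schema fixes only the values 0 and 1, not which is which), and the helper names add_months and renewal_date are hypothetical.

```python
import calendar
from datetime import date

# Assumed encoding: the schema only states that renewal type is 0 or 1.
RENEWAL_MONTHS = {0: 6, 1: 12}


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamped to the last day of that month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def renewal_date(subscribed_on: date, renewal_type: int) -> date:
    """Derive the renewal date from the stored subscription date and renewal type."""
    return add_months(subscribed_on, RENEWAL_MONTHS[renewal_type])


print(renewal_date(date(2023, 3, 7), 0))  # six months after 7 March 2023
```

Clamping the day matters for subscriptions that start at the end of a long month, for example one starting on 31 August and renewing in February.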

user(ID: INTEGER, firstname: STRING, surname: STRING, birth date:
DATE, nationality: STRING, email address: STRING, street number:
INTEGER, street name: STRING, apartment number: STRING, city:
STRING, zip: STRING, password: STRING, subscription ID*: INTEGER)
member(user ID*: INTEGER)
subscriber(user ID*: INTEGER)

apartment number is a string because it could contain letters (e.g. 15A). Mem-
bers and subscribers have their own schemas to distinguish their roles and ca-
pabilities with a subscription, such as removing or adding other members, but
do not need additional local attributes. The other difference is a subscriber also
gives their payment details, but these are stored in subscription to avoid having
to access two tables when automatically renewing the subscription. The foreign
key linking a user and their subscription is stored in the user schema because of
the many-to-one relationship.

filmperson(ID: INTEGER, firstname: STRING, surname: STRING, birth date:
DATE, nationality: STRING, death date: DATE)

The person entity was not represented by its own schema so that querying of data
from filmperson and user is easier, with all the attributes in one place rather
than two. Although there might be redundant data for people who are both
in films and Scotflix users, it is assumed these will be rare cases with Scotflix’s
users. Also, the potentially redundant attributes (name, date of birth, national-
ity) are unlikely to ever change so there is minimal chance for inconsistencies in
the database.

show(ID: INTEGER, title: STRING, year: INTEGER, country: STRING,
category: STRING, expiry date: DATE)

tvshow(show ID*: INTEGER)

film(show ID*: INTEGER)

There is a schema for the superclass show because other schemas such as review
need to reference both TV shows and films, so the general show schema allows
them to have just one foreign key, because it contains all the attributes for both
entities. Subclass schemas are needed because episode will only reference tvshow,
and not a film.

watch list(ID: INTEGER, user ID*: INTEGER, name: STRING)

watch list contains(watchlist ID*: INTEGER, show ID*: INTEGER)

The foreign key linking a user and their watch list is stored in the watch list
schema to prevent null values if a user does not have a watch list.
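
One way to express this in a DBMS is a NOT NULL, UNIQUE foreign key on the watch list table, which gives the one-to-one relationship with total participation on the watch list side while leaving users free to have no watch list. The sketch below uses Python's built-in sqlite3 module as an assumed target; the snake_case names and the sample rows are illustrative, and the user table is cut down to its key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY);
    CREATE TABLE watch_list (
        id      INTEGER PRIMARY KEY,
        -- NOT NULL: every watch list has a user; UNIQUE: at most one list per user
        user_id INTEGER NOT NULL UNIQUE REFERENCES user(id),
        name    TEXT    NOT NULL
    );
    INSERT INTO user (id) VALUES (1), (2);
    INSERT INTO watch_list VALUES (100, 1, 'Weekend films');
""")


def can_add_watch_list(list_id, user_id, name):
    """Try to insert a watch list; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO watch_list VALUES (?, ?, ?)", (list_id, user_id, name))
        return True
    except sqlite3.IntegrityError:
        return False
```

User 2 has no row in watch_list at all, so no null foreign key is needed anywhere to represent a user without a watch list.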

review(user ID*: INTEGER, show ID*: INTEGER, date: DATE,
score: INTEGER)

It is assumed users can review any show only once so the tuple (user ID*, show ID*)
is a candidate key. If they were allowed to submit multiple reviews for the same
show, this pair would no longer be a key for the table. Score is an integer from 1 to 5.
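
The one-review-per-show assumption and the score range can both be stated as constraints. The following sketch uses Python's sqlite3 module as an assumed target system; the column name review_date stands in for the date attribute, the sample rows are invented, and the foreign keys to user and show are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE review (
        user_id     INTEGER NOT NULL,
        show_id     INTEGER NOT NULL,
        review_date TEXT    NOT NULL,
        score       INTEGER NOT NULL CHECK (score BETWEEN 1 AND 5),
        PRIMARY KEY (user_id, show_id)  -- one review per user per show
    )
""")
conn.execute("INSERT INTO review VALUES (1, 10, '2023-03-07', 4)")


def try_insert(row):
    """Insert a review; return False if a key or check constraint rejects it."""
    try:
        conn.execute("INSERT INTO review VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

A second review by user 1 for show 10 is rejected by the composite key, and a score of 6 is rejected by the check constraint, while a different user reviewing show 10 is accepted.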

show language(show ID*: INTEGER, language: STRING)

episode(tvshow ID*: INTEGER, season number: INTEGER,
episode number: INTEGER, title: STRING,
next episode number: INTEGER, next season number: INTEGER,
previous episode number: INTEGER, previous season number: INTEGER)

Episodes in different shows may have the same title so it is not included in the
primary key. The attributes for the next episode number, next season number,
previous episode number and previous season number can be stored in the same
episode table since there is a one-to-one relationship between episodes. The down-
side here is that there will be null values for the first and last episodes in a TV
show.
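
As an illustration of the weak-entity key and the recursive link, the sketch below declares episode in SQLite through Python's sqlite3 module (an assumed target). Only the "next" pointer is modelled, the "previous" pointer would follow the same pattern, and the foreign key to tvshow is omitted. All names and sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("""
    CREATE TABLE episode (
        tvshow_id           INTEGER NOT NULL,
        season_number       INTEGER NOT NULL,
        episode_number      INTEGER NOT NULL,
        title               TEXT    NOT NULL,
        next_season_number  INTEGER,  -- NULL for the last episode of a show
        next_episode_number INTEGER,
        PRIMARY KEY (tvshow_id, season_number, episode_number),
        FOREIGN KEY (tvshow_id, next_season_number, next_episode_number)
            REFERENCES episode (tvshow_id, season_number, episode_number)
    )
""")


def add_episode(row):
    """Insert an episode; return False if a key or foreign key constraint fails."""
    try:
        conn.execute("INSERT INTO episode VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# The last episode is inserted first so the earlier one can point at it.
add_episode((1, 1, 2, "Second", None, None))
add_episode((1, 1, 1, "Pilot", 1, 2))
```

Because the show ID is part of the key, another show can reuse the same title, season number and episode number, while a "next" pointer to an episode that does not exist is rejected.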

show award(awarding entity: STRING, category: STRING, year: INTEGER,
show ID*: INTEGER, show title: STRING)

person award(awarding entity: STRING, category: STRING,
year: INTEGER, filmperson ID*: INTEGER, show title: STRING)

There is no superclass schema for award to avoid having to access two relations
to get information on awards. It is assumed that awards may not have the cor-
responding show in the database, so there is a show title attribute to identify a
show in case show ID is null. The minimal candidate keys do not require the
show ID or filmperson ID because each award can only be awarded to one show
or person each year.

actor(show ID*: INTEGER, filmperson ID*: INTEGER)

director(show ID*: INTEGER, filmperson ID*: INTEGER)

The filmperson superclass schema is used to avoid storing redundant data for
people who are both actors and directors, so these schemas just need to reference
the show and film person with foreign keys.

4 Normalisation
subscription
– ID → subscription fee, date, renewal type, card number, expiration date, security code
– card number → expiration date, security code
user
– ID → firstname, surname, birth date, nationality, email address, street number, street name, apartment number, city, zip, password, subscription ID
– email address → ID, firstname, surname, birth date, nationality, street number, street name, apartment number, city, zip, password, subscription ID
– zip → city
film person
– ID → firstname, surname, birth date, nationality, death date
show
– ID → title, year, country, category, expiry date
watch list
– ID → user ID, name
– user ID → ID, name
review
– user ID, show ID → score, date
show language
– show ID → language
episode
– tvshow ID, episode number, season number → title, next episode number, next season number, previous episode number, previous season number
– previous episode number, next episode number → episode number
– previous season number, next season number → season number
show award
– awarding entity, category, year → show ID, show title
– show ID → show title
person award
– awarding entity, category, year → filmperson ID, show title
Figure 2: (Non-trivial) functional dependencies in the relational model

There was ambiguity regarding whether a film person or show along with their awarding
entity and category would determine the year in which they won that award. It was
assumed that a series may win the same exact award over several years for different
seasons, in which case this functional dependency would be invalid, so it was not
included. Similarly, actors and directors can also win the same award in different years
(and potentially for the same series).
It is assumed there could be different shows with the same title, year, country and
category, so the candidate key is just the show ID. It is also assumed that a TV show
could have episodes with the same title (e.g. a ’pilot’ for every season), so the show ID
and title do not functionally determine the episode or season numbers.
First normal form conditions are satisfied because the domains of all attributes are
atomic for the requirements of this system. Everything that needs to potentially be
searched for is its own attribute and doesn’t need to be broken down. For instance,
actors and directors can be searched for by their first or second names, and dates can
be searched for and compared because they are stored as a date type.
Second normal form is also satisfied because all non-prime attributes are fully func-
tionally dependent on their candidate key. Figure 2 shows that schemas subscription,
user, film person, show, watch list, and show language have a single identifier as
the candidate key, which acts as a superkey for the whole schema. For review, score
and date still fully depend on both the user that submitted the review and the show
being reviewed. For episode, the title is unique to the show, episode number and season
number, and the next/previous episode/season attributes depend on these attributes
too because each show has a different number of episodes and seasons. For the award
schemas, show ID, filmperson ID and show title depend on the whole candidate key
because the same award can be awarded to different shows or people in different years.
The schemas subscription, user and show award are not in third normal form (3NF)
due to transitive dependencies starting from candidate keys. For example, subscription
has a transitive dependency:
ID → card number → expiration date, security code
as card numbers are assumed to be unique to a person, and they determine the other
details on a card.
User also has a transitive dependency:
ID → zip → city
as it is assumed that a city can be uniquely determined by the zip code [6]. This is true
at least for the UK, and whilst there may be exceptions in other countries, it would be
for very small towns, so it can be assumed that cities are determined by a zip code.
Finally, show award also has a transitive dependency:
awarding entity, category, year → show ID → show title
as show title only depends on show ID (if there is a corresponding show in the database).

To be in third normal form, one solution would be to have separate schemas
containing the dependent attributes, with (card number) and (zip, street number) as the
primary keys. However, for show award it is harder to achieve 3NF as show title is
required in the same table in case show ID is null.
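
The card-number decomposition could look roughly like the following sketch, written against SQLite through Python's sqlite3 module (an assumed target). The table name card, the snake_case column names and the sample data are illustrative only. The card details move to their own table keyed by card number, and a join reproduces the original subscription row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE card (
        card_number     INTEGER PRIMARY KEY,
        expiration_date TEXT    NOT NULL,
        security_code   INTEGER NOT NULL
    );
    CREATE TABLE subscription (
        id               INTEGER PRIMARY KEY,
        subscription_fee REAL    NOT NULL,
        start_date       TEXT    NOT NULL,
        renewal_type     INTEGER NOT NULL CHECK (renewal_type IN (0, 1)),
        card_number      INTEGER NOT NULL REFERENCES card(card_number)
    );
""")

# Made-up sample data for illustration.
conn.execute("INSERT INTO card VALUES (4111111111111111, '2025-01-31', 123)")
conn.execute("INSERT INTO subscription VALUES (1, 9.99, '2023-03-07', 0, 4111111111111111)")


def subscription_with_card(subscription_id):
    """Rebuild the original, pre-decomposition row with a join."""
    return conn.execute(
        """
        SELECT s.id, s.subscription_fee, s.start_date, s.renewal_type,
               c.card_number, c.expiration_date, c.security_code
        FROM subscription AS s
        JOIN card AS c ON c.card_number = s.card_number
        WHERE s.id = ?
        """,
        (subscription_id,),
    ).fetchone()
```

The cost of this design is the extra join on renewal, which is the step the single-table schema was chosen to avoid.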
The schemas film person, show, watch list, review, show language, and person award
are all in 3NF and Boyce-Codd normal form (BCNF) because the determinant of every
non-trivial functional dependency is a superkey. For these schemas, the determinant is
always the primary key of the table and thus a superkey by definition. The exception is
watch list, but user ID is still a superkey because each user can only have one watch-
list. For episode, it can’t be in BCNF as the determinants for the last two functional
dependencies do not determine every attribute in the schema.
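
The superkey test behind these BCNF claims can be checked mechanically with the standard attribute-closure algorithm. The Python sketch below is illustrative: the helper names fd, closure and bcnf_violations are hypothetical, and the dependency sets are transcribed from Figure 2 with underscores in place of spaces.

```python
def fd(lhs, rhs):
    """Build a functional dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List the non-trivial dependencies whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != schema]


WATCH_LIST = {"id", "user_id", "name"}
WATCH_LIST_FDS = [fd("id", "user_id name"), fd("user_id", "id name")]

EPISODE = {"tvshow_id", "season_number", "episode_number", "title",
           "next_episode_number", "next_season_number",
           "previous_episode_number", "previous_season_number"}
EPISODE_FDS = [
    fd("tvshow_id episode_number season_number",
       "title next_episode_number next_season_number "
       "previous_episode_number previous_season_number"),
    fd("previous_episode_number next_episode_number", "episode_number"),
    fd("previous_season_number next_season_number", "season_number"),
]

print(bcnf_violations(WATCH_LIST, WATCH_LIST_FDS))     # no violating dependencies
print(len(bcnf_violations(EPISODE, EPISODE_FDS)))      # the last two dependencies
```

Under these dependency sets, watch list has no violating dependency because both ID and user ID close over the whole schema, whereas the last two episode dependencies are reported because their determinants do not.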

Word Count: 2400

References
[1] CS3101 Databases Lecture 3 – Extended ER Model. en-GB. url: https://studres.cs.st-andrews.ac.uk/CS3101/Lectures/L03_Extended_ER_Model.pdf (visited on 03/05/2023).
[2] CS3101 Databases Lecture 5 – ER to Relational. en-GB. url: https://studres.cs.st-andrews.ac.uk/CS3101/Lectures/L05_ER_to_Relational.pdf (visited on 03/05/2023).
[3] CS3101 Databases Lecture 6 – Functional Dependencies. en-GB. url: https://studres.cs.st-andrews.ac.uk/CS3101/Lectures/L06_Functional_Dependencies.pdf (visited on 03/05/2023).
[4] CS3101 Databases Lecture 7 – Normalisation Normal Forms. en-GB. url: https://studres.cs.st-andrews.ac.uk/CS3101/Lectures/L07_Normalisation_NormalForms.pdf (visited on 03/05/2023).
[5] Ron McFadyen. “7.4.4: Recursive Relationships”. en-ca. url: https://ecampusontario.pressbooks.pub/relationaldatabasesandmicrosoftaccess365/chapter/__unknown__-75/ (visited on 03/05/2023).
[6] Postcodes in the United Kingdom. Mar. 2023. url: https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom.
[7] Abraham Silberschatz, Henry F. Korth, and S. Sudarshan. Database system concepts. Seventh edition. New York, NY: McGraw-Hill, 2020. isbn: 9780078022159, 9781260515046.
