
Data modeling

Prof. Amos DAVID

http://ui-n2.loria.fr
Course content
 Why data modeling ?
 Entity-Relation model
 Relational model

2
Why data modeling ?

3
From information problem
statement to data specification
 A case study

 Problem statement
 We want an information system on students

 Questions
 What is an information system ?
 What do we mean by students ?
 Why do we want the information system ?
 The system should provide answers to what questions ?
4
What is an information system ?
 Functional characteristics of an information system
 Store information (creation)
 Retrieve information (access)
 Update information (modification, deletion)

 Components of an information system


 Users
 End-user
 Information system manager/administrator

 Information base (database)

 User interface
 To implement the functional characteristics
 Between the end-user and the information base
5
General schema of an IRS :
functional approach
[Diagram: the user's information problem is transformed into an access expression (common access methods: navigation, query); the expression is matched against the information base, which is fed by Store / Update operations on objects; the matching produces the results.]
6
What do we mean by students ?
 Students viewed by whom ?
 University structures
 Admission office ?
 Post graduate school ?
 Registrar’s office ?
 A university department ?
 Alumni association ?
 Government structures
 By a state government ?
 By other structures ?
 Students considered over what period ?
 From admission to graduation only ?
 Consider ex-students ?
7
Who are the end-users ?
 This will determine how students are viewed
 This will determine the final use of the information
to be accessed

 Examples of end-users in the university structure


 The VC
 The registrar
 The Dean
 The HOD
 Any category of student (in-course, ex-student)
8
Why data modeling ? …
 Represent the real world
 Focus on the use of the elements
 Represent only the necessary elements
 Represent the relationships between the selected
elements

9
Why data modeling ? …
 For efficient computerization
 Reduce data redundancy
 Disk space problem
 Volume of data transfer
 Objects of documentation
 Describe the computerized elements (user manual, technical manual)
 For the programmer
 For the system designer / manager
 For the end-users
 Guarantee data integrity
 Valid information irrespective of context
10
Example on data integrity
 Admission office
 STUDENT (N°, names, marital status, gender, degree,
address)

 The department
 STUDENT (N°, names, courses, address)

 The administrative office


 PERSONNEL (N°, names, marital status, faculty,
department, address)
11
General schema of an IRS :
development approach
[Diagram, from top to bottom: IRS (what is seen by the end-user); Database Management System (DBMS); Event, Operation, Data; Data modeling; Real world. The links are labeled "Implemented for", "Determined by" and "Developed using".]
12
Entity-Relation model

13
Represent the real world elements with four
main concepts
 Entity
 Attributes
 Relation
 Cardinality

 Employs graphic representation


 Intuitive approach

14
Entity
 The basic conceptual or real element

 Examples
 A student
 A member of staff
 A town

 Entities have real existence (the instances)


 They are identifiable
 Amos DAVID
 Charles ROBERT
 Ibadan
15
Entity …
 Each entity is associated with a set of attributes

 The instances of an entity have the same characteristics

 They have the same set of attributes


 For example
 All students have the same set of attributes
 All members of staff have the same set of attributes

16
Attributes
 Attributes are used for describing the entities
 The entities and their attributes are determined according
to the database project
 Taking into account the functions to be accomplished
 Examples
 Represent students at the department for course registration
 Represent members of staff for salaries and promotions

 One of the attributes must be an IDENTIFIER


 Its value is unique for each entity

17
Attributes …
 How to reduce redundancy
 Avoid structured attributes
 Structured attributes should be dissociated

 Example
 Names
 First name, last name
 Address
 Street n°, street name, town, local government, state

 Dissociating structured attributes allows easy access to the component elements
 Example : the town element of an address can be easily extracted instead of performing string extraction on the structured element

18
Attributes …
 Examples …
 Name, Address
 Amos DAVID; Dept computer science, UI Ibadan, Ibadan, Oyo state
 Olu OJO; 23 Aderemo street, Agbowo, Agbowo LGA, Ibadan, Oyo state
 Uche KALU; 5 market road, Anambra, Anambra state

 Problems with this representation


 The addresses do not have the same number of elements, so
how can one obtain a specific component ?
 The nth element ?
 Starting from the nth character ?
 How can one locate the town within an address ?
19
Attributes …
 Examples …
 Dissociate structured elements
 Name, Street number, Street name, Town, Local government, State

 Amos DAVID; Dept computer science; UI Ibadan; Ibadan; ; Oyo state
 Olu OJO; 23; Aderemo street; Agbowo; Ibadan; Oyo state
 Uche KALU; 5; market road; Anambra; ; Anambra state

 Efficiency
 Each entry has the same number of elements
 A component element can be easily extracted using its position
 Example
 The town value is always at the 4th position
 The state value is always at the last position
 The position can be used in string functions or as the column number in tables
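As a minimal sketch of the point above, the following Python fragment splits dissociated entries on the semicolon and reads a component by its position or by its field name. The field names in `FIELDS` and the helper `parse` are illustrative assumptions, not part of the course material.

```python
# Field order taken from the slide; the identifiers themselves are illustrative.
FIELDS = ["name", "street_number", "street_name", "town", "local_government", "state"]

records = [
    "Amos DAVID; Dept computer science; UI Ibadan; Ibadan; ; Oyo state",
    "Uche KALU; 5; market road; Anambra; ; Anambra state",
]

def parse(record: str) -> dict:
    """Split one semicolon-separated entry into named fields."""
    values = [part.strip() for part in record.split(";")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))

parsed = [parse(r) for r in records]

# The town is always the 4th element, the state always the last one.
towns = [p["town"] for p in parsed]
states = [r.split(";")[-1].strip() for r in records]
```

Because every entry has the same number of elements, no string searching inside an address is needed; an empty element simply yields an empty value.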

20
Attributes
reducing redundancy …
 Avoid attributes whose value is a list ;
a new entity should be created

 Example (memory redundancy)


 Courses as attribute of Degree
 We do not know the number of courses for a degree
 Create DEGREE and COURSE
 Associate the two entities (to be seen later)

21
Example (memory redundancy)…
 Computer science, course 1, course 2, course 3
 Biology, course 3, course 6, course 7, course 20
 Chemistry, course 7, course 3, course 8, course 9, course 10

 In terms of memory allocation, how many courses should be anticipated ?

 Because of the unknown number of courses, the anticipated number will either be too few or too many
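One way to avoid guessing the number of courses is sketched below with Python's built-in `sqlite3` module: DEGREE and COURSE become separate tables, and each (degree, course) pair is stored as one row. The table and column names (`degree`, `course`, `degree_course`) and the helper `courses_of` are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE degree (name TEXT PRIMARY KEY);
    CREATE TABLE course (name TEXT PRIMARY KEY);
    -- One row per (degree, course) pair: no fixed number of course columns.
    CREATE TABLE degree_course (
        degree TEXT NOT NULL REFERENCES degree(name),
        course TEXT NOT NULL REFERENCES course(name),
        PRIMARY KEY (degree, course)
    );
""")

# Data taken from the slide.
offered = {
    "Computer science": ["course 1", "course 2", "course 3"],
    "Biology": ["course 3", "course 6", "course 7", "course 20"],
    "Chemistry": ["course 7", "course 3", "course 8", "course 9", "course 10"],
}
for degree, courses in offered.items():
    conn.execute("INSERT INTO degree VALUES (?)", (degree,))
    for course in courses:
        # A course shared by several degrees is stored only once.
        conn.execute("INSERT OR IGNORE INTO course VALUES (?)", (course,))
        conn.execute("INSERT INTO degree_course VALUES (?, ?)", (degree, course))

def courses_of(degree: str) -> list[str]:
    """Return the courses associated with one degree."""
    rows = conn.execute(
        "SELECT course FROM degree_course WHERE degree = ?", (degree,)
    )
    return [row[0] for row in rows]

course_count = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
```

A degree with three courses occupies three rows and a degree with five courses occupies five, so nothing has to be anticipated.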

22
Attributes …
 How to ensure data integrity
 Identify the “functional dependency” of attributes
 Example
 A town belongs to only one state
 Towns are unique

 In a case of functional dependency, use a new entity to group the dependent attributes
 Create a relation between the new entity and the original one

23
Attributes …
 Examples …
 Name, Street number, Street name, Town, Local government, State
 1.Amos DAVID; Dept computer science; UI Ibadan; Ibadan; ; Oyo state
 2.Olu OJO; 23; Aderemo street; Agbowo; Ibadan; Oyo state
 3.Uche KALU; 5; market road; Anambra; ; Anambra state

 Entities 1 and 2 are redundant, hence prone to loss of integrity

 Should a town change from one state to another, all the entities that mention the town are no longer valid
 All these entities must be modified
 Entering the (town, state) pair twice, for entities 1 and 2, may introduce typographical errors
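The risk described above can be illustrated with a short Python sketch. It first keeps the state inside every person entry, then dissociates TOWN into its own entity. The value "Hypothetical state" and the helper names `states_per_town` and `state_of` are invented for the example.

```python
# Redundant layout: the state is repeated in every person entry.
redundant = [
    {"name": "Amos DAVID", "town": "Ibadan", "state": "Oyo state"},
    {"name": "Olu OJO", "town": "Ibadan", "state": "Oyo state"},
]

def states_per_town(rows):
    """Map each town to the set of states recorded for it."""
    seen = {}
    for row in rows:
        seen.setdefault(row["town"], set()).add(row["state"])
    return seen

# A partial update (only the first entry is modified) breaks integrity:
redundant[0]["state"] = "Hypothetical state"
conflicts = {t: s for t, s in states_per_town(redundant).items() if len(s) > 1}

# Dissociated layout: TOWN is its own entity, PERSON only refers to it.
towns = {"Ibadan": {"state": "Oyo state"}}
persons = [
    {"name": "Amos DAVID", "town": "Ibadan"},
    {"name": "Olu OJO", "town": "Ibadan"},
]

def state_of(person):
    """Look the state up through the town the person lives in."""
    return towns[person["town"]]["state"]

# One modification is enough, and every person sees the same value.
towns["Ibadan"]["state"] = "Hypothetical state"
person_states = {state_of(p) for p in persons}
```

In the first layout the same town ends up with two different states; in the second, the state is stored once and cannot disagree with itself.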

24
Entity – graphic representation
 A rectangle divided into two parts
 The name of the entity in the upper part
 The names of the attributes in the lower part
 The identifier is underlined

PERSON            TOWN
Number            Town name
Last name         State
First name        Local government
Date of birth

25
Relation
 A relation specifies the association between two or
more entities

 Example
 Town and Person

 The relation should specify the semantics of the association
 A person lives in a town
26
Relation …
 A relation is symbolized by an oval with its semantics inside the oval
 A relation is further specified by cardinalities that indicate the
number of associated instances

 Example
 A person lives in a minimum of one town and in a maximum of 1 town
 A town is inhabited by a minimum of one person and a maximum of n
(indicating several)

TOWN --(1,n)-- ( Lives in ) --(1,1)-- PERSON
TOWN : Name, Surface area, State, Local government
PERSON : Number, Last name, First name, Date of birth
27
Relation …
 A relation may sometimes have an attribute
 The attribute describes the relation and not the
entities associated

 Example
 The quantity of an article bought by a client, as well as the date of purchase, is an attribute neither of the client nor of the article, but of the association

 The attributes of the relation are indicated in the lower part of the oval that represents the relation

28
Relation …

CLIENT --(1,n)-- ( Bought : Quantity, Date ) --(1,m)-- ARTICLE
CLIENT : Number, Last name, First name, Date of birth
ARTICLE : Name, Unit price

29
Relation …
 Maximum cardinality
 This indicates the maximum cardinalities on the
left and on the right of a relation

 Example
[n:1]
TOWN --(1,n)-- ( Lives in ) --(1,1)-- PERSON
TOWN : Name, Surface area, State, Local government
PERSON : Number, Last name, First name, Date of birth
30
Relation …
 How to read the relations : recall
 A person lives in a minimum of 1 town and in a
maximum of 1 town
 A town is inhabited by a minimum of one person and a maximum of n persons (several)

[n:1]
TOWN --(1,n)-- ( Lives in ) --(1,1)-- PERSON
TOWN : Name, Surface area, State, Local government
PERSON : Number, Last name, First name, Date of birth
31
Relation …
 A relation can link an entity to itself
 Example
 A person is the father of another person
 WARNING : A person is the father of 0 or many persons ; A person
has as father 1 and only 1 person

PERSON --(0,n)-- ( Father of ) --(1,1)-- PERSON
PERSON : Number, Last name, First name, Date of birth
32
REMARKS
 The entity and the association as described above correspond to the description of the concepts.

In the literature, they are termed entity-type and relation-type.

Their instantiations (existence) are termed entity and relation.

 Here, we use entity and entity-type, and relation and relation-type, interchangeably.

33
Relational model

34
Relational model
 The basic concepts
 Relation
 Domain
 Attribute
 Key
 N-uplet (n-tuple)

35
Relation (Table)
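[Figure: the STUDENT relation drawn as a table. The column headings (Last name, First name, Date of birth, Degree) are the attributes; each row is an n-uplet; each column takes its values from one domain (same type of values, e.g. names).]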

36
Domain
 Represents the data type of a column
 Can be defined in intension or in extension
 In intension, it is specified by a formal definition
 Example
 Integer values
 Character strings of fewer than 20 characters
 In extension, it is specified as a finite list of values
 Example
 Town : {Oyo, Ibadan, Lagos}
37
Relation …
 A relation R is represented as R(A1, …, An)
 Where
 A1 takes its values from D1 …
 An takes its values from Dn

38
Attribute
 An attribute specifies a constituent of the relation
(a particular column of the table)

 Attributes are unique within a relation (each column must be distinguished from the others)
 Two columns should not have the same name

 Two attributes may have the same domain


 Example
 First name, Last name : NAMES
39
Cartesian product of a relation
 Let R(A1, A2) be a relation
 The Cartesian product of the relation represents all the possible combinations of the values of the attributes

 Example
 Cars-parked (Mark, Color)
 Where
 Mark ∈ marks of cars (Toyota, Peugeot)
 Color ∈ colors of cars (red, black, white)

40
Cartesian product of a relation …
 Toyota, red
 Toyota, black
 Toyota, white
 Peugeot, red
 Peugeot, black
 Peugeot, white
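The six combinations listed above can be reproduced with `itertools.product` from the Python standard library. The set `cars_parked` is a made-up sample of cars actually observed, added only to show that a relation's content is a subset of the Cartesian product.

```python
from itertools import product

# Domains of the two attributes, as given on the slide.
marks = ["Toyota", "Peugeot"]
colors = ["red", "black", "white"]

# Every possible (Mark, Color) combination.
cartesian = list(product(marks, colors))

# Hypothetical content of Cars-parked: only the combinations really observed.
cars_parked = {("Toyota", "red"), ("Peugeot", "white")}
is_subset = cars_parked <= set(cartesian)
```

With 2 marks and 3 colors the product contains 2 × 3 = 6 pairs, in the same order as the list on the slide.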

41
The intension of a relation
 The intension of a relation specifies how the relation should be interpreted

 Example
 Cars-parked (Mark, Color)
 Cars parked in front of the department of
computer science, University of Ibadan

42
N-uplet
 The set of n-uplets forms the extension of a relation
 Each n-uplet is an element of the Cartesian product of the attribute domains
 A line of the table
 Also called a record

 Example
 Person (First name, Surname)
 Amos, David
 John, Olaoye

43
Schema of a relation
 The schema of a relation specifies the intension of the relation and the associated integrity constraints

44
Constraint of data integrity
 Constraint on a single attribute
 Example
 The values of “vehicle marks” should be German vehicles

 Constraint based on two attributes


 Example
 The date of marriage should be ≥ date of birth

 Constraint on the table n-uplets


 Example
 The number of registrations for a degree in one year should be
limited to one
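The three kinds of constraint can be sketched with SQLite through Python's `sqlite3` module. The table names, the column names, the list of German makes and the helper `accepted` are assumptions for this illustration; the third constraint is read here as "one registration per student, degree and year", which is one possible interpretation of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle (
        plate TEXT PRIMARY KEY,
        -- Constraint on a single attribute (the list of makes is illustrative).
        mark TEXT NOT NULL CHECK (mark IN ('Mercedes', 'BMW', 'Volkswagen'))
    );
    CREATE TABLE person (
        number INTEGER PRIMARY KEY,
        date_of_birth TEXT NOT NULL,   -- ISO dates compare correctly as text
        date_of_marriage TEXT,
        -- Constraint based on two attributes.
        CHECK (date_of_marriage IS NULL OR date_of_marriage >= date_of_birth)
    );
    CREATE TABLE registration (
        student INTEGER NOT NULL,
        degree TEXT NOT NULL,
        year INTEGER NOT NULL,
        -- Constraint on the n-uplets of the table.
        UNIQUE (student, degree, year)
    );
""")

def accepted(sql: str, params: tuple) -> bool:
    """Try an insertion and report whether the constraints allowed it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

For instance, `accepted("INSERT INTO vehicle VALUES (?, ?)", ("AB-2", "Peugeot"))` is rejected by the CHECK on `mark`, and a second identical row in `registration` is rejected by the UNIQUE clause.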
45
Maximum Key
 A set of attributes of a relation whose values are
distinct for each n-uplet

 Example
 Person (Matriculation number, First name, Last
name, Date of birth, email)

 All the attributes combined can form the maximum key

46
Minimum key (the key)
 A key is the minimum set of attributes of a
relation whose values are distinct for each n-uplet

 Examples
 Student N°, first name, last name
 The Student N°, first name combined can be used as key, but
only the Student N° is sufficient

 Student N°, first name, last name, email


 Either Student N° or email can be used as key
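The difference between a set of attributes with distinct values and a minimum key can be checked mechanically on sample data, as in the Python sketch below. The sample students, the e-mail addresses and the function names are invented for the example. Note that such a check can only refute a candidate key: whether an attribute is really a key depends on the meaning of the data, not on one sample.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if the values of `attrs` are distinct for every n-uplet."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def minimal_keys(rows, attributes):
    """All attribute sets with distinct values that contain no smaller such set."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if is_superkey(rows, attrs) and not contains_smaller_key:
                keys.append(attrs)
    return keys

# Hypothetical sample: two students share the same first and last name.
students = [
    {"number": 1, "first_name": "Amos", "last_name": "DAVID", "email": "a.david@example.org"},
    {"number": 2, "first_name": "Amos", "last_name": "OJO", "email": "a.ojo@example.org"},
    {"number": 3, "first_name": "Amos", "last_name": "OJO", "email": "a.ojo2@example.org"},
]

keys = minimal_keys(students, ["number", "first_name", "last_name", "email"])
```

Here (number, first_name) has distinct values but is not minimal, because number alone is sufficient; number and email are each a minimum key for this sample.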
47
Functional dependency
 In the relation R(X, Y, Z),
 There is a functional dependency from X to Y (X → Y) if and only if each value of X is associated with exactly one value of Y

 Example
 Person (N°, FN, LN, town, state)

 There is an FD from town to state (town → state)
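A functional dependency can be tested on a set of n-uplets with a few lines of Python. The function `holds_fd` and the sample persons are assumptions for this sketch; as with keys, a sample can show that a dependency is violated but cannot prove that it holds in general.

```python
def holds_fd(rows, x, y):
    """True if every value of attribute x is associated with a single value of y."""
    seen = {}
    for row in rows:
        # Remember the first y seen for this x, then compare later rows with it.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True

# Hypothetical n-uplets of Person (N°, FN, LN, town, state), reduced to three attributes.
persons = [
    {"number": 1, "town": "Ibadan", "state": "Oyo state"},
    {"number": 2, "town": "Ibadan", "state": "Oyo state"},
    {"number": 3, "town": "Oyo", "state": "Oyo state"},
    {"number": 4, "town": "Lagos", "state": "Lagos state"},
]

town_determines_state = holds_fd(persons, "town", "state")
state_determines_town = holds_fd(persons, "state", "town")
```

The dependency is directional: a town determines its state, but a state contains several towns, so the state does not determine the town.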


48
Relations in 1st normal form
 A relation is said to be in 1st normal form if all the attributes are single-valued

49
Relations in 2nd normal form
 A relation is in 2nd normal form if and only if
it is in 1st normal form and there is no FD
between a subset of the key and the rest of the
attributes
 This means that the key of the relation must be a minimum key

50
Relations in 3rd normal form
 A relation is in 3rd normal form if it is in 2nd normal form and there is no FD between non-key attributes

51
From ERM to RM
1. To each entity corresponds a relation
(Entity name → relation name)
2. To each attribute of an entity corresponds an attribute of the relation
3. The identifier of the entity becomes the key of the relation
4. For associations of maximum cardinality [1:n], add the key of the
relation on the n side to the relation on the 1 side
5. For associations of maximum cardinality [n:m], a new relation
should be created using the concatenation of the keys of the
associated relations as the key. The attributes of the association
should be added as attributes of the new relation
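The five rules can be sketched on the two models used earlier in the course (TOWN / PERSON linked by "Lives in", and CLIENT / ARTICLE linked by "Bought"), here written as SQLite tables through Python's `sqlite3` module. The column names, the SQL types and the helper `key_columns` are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
conn.executescript("""
    -- Rules 1 to 3: one relation per entity, the identifier becomes the key.
    CREATE TABLE town (
        name TEXT PRIMARY KEY,
        surface_area REAL,
        state TEXT,
        local_government TEXT
    );
    -- Rule 4: "Lives in" is [n:1]; PERSON, whose maximum cardinality is 1,
    -- receives the key of TOWN as an imported attribute.
    CREATE TABLE person (
        number INTEGER PRIMARY KEY,
        last_name TEXT,
        first_name TEXT,
        date_of_birth TEXT,
        town_name TEXT NOT NULL REFERENCES town(name)
    );
    CREATE TABLE client (
        number INTEGER PRIMARY KEY,
        last_name TEXT,
        first_name TEXT,
        date_of_birth TEXT
    );
    CREATE TABLE article (
        name TEXT PRIMARY KEY,
        unit_price REAL
    );
    -- Rule 5: "Bought" is [n:m]; it becomes a relation whose key is the
    -- concatenation of the two keys, plus the attributes of the association.
    CREATE TABLE bought (
        client_number INTEGER NOT NULL REFERENCES client(number),
        article_name TEXT NOT NULL REFERENCES article(name),
        quantity INTEGER,
        date TEXT,
        PRIMARY KEY (client_number, article_name)
    );
""")

def key_columns(table: str) -> list[str]:
    """Return the key attributes of a table, in key order."""
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, default, pk); pk is 0 for non-key columns.
    return [row[1] for row in sorted(info, key=lambda r: r[5]) if row[5] > 0]
```

With the concatenated key, a client cannot be recorded as buying the same article twice; if that must be allowed, the key of `bought` has to be redefined.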

52
REMARKS
 A collection of relations obtained from an entity-relation
model as described above will have the following
characteristics
 Each attribute is single-value
 The key contains the least number of attributes
 There are no dependencies between the attributes

 A collection of relations that has the above characteristics is considered to be in 3rd normal form

53
Important problem
 Some relations resulting from the translation
may not have keys

 In this case, define a new key

 This happens occasionally, particularly in the transformation of [n:m] associations
54
Graph of relations
 Specify by pointed arrows the origin of
imported attributes

 Redraw the relations in form of rectangles


 Use the pointed arrows to link the relations

 REMARKS
 There should be no cycle (no closed loop of arrows) in the graph
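Whether a graph of relations contains a closed loop of arrows can be verified with a depth-first search, sketched below in Python. The dictionary maps each relation to the relations from which it imports attributes; the relation names and the function `has_cycle` are illustrative, and the second graph is a deliberately faulty, made-up example.

```python
def has_cycle(imports: dict) -> bool:
    """Detect a loop in a directed graph given as {relation: [imported-from, ...]}."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, being explored, fully explored
    color = {}

    def visit(node) -> bool:
        color[node] = GREY
        for target in imports.get(node, []):
            state = color.get(target, WHITE)
            if state == GREY:  # back to a relation still being explored
                return True
            if state == WHITE and visit(target):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in imports)

# Graph for the models used earlier: arrows point to the origin of imported attributes.
graph = {
    "person": ["town"],
    "bought": ["client", "article"],
    "town": [],
    "client": [],
    "article": [],
}

# Hypothetical faulty design in which two relations import each other's key.
faulty = {"a": ["b"], "b": ["a"]}
```

The first graph has no loop, while the second one does.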
55
Practical
 Model the following types of person in the
university
 the students
 the members of staff

 Be sure to apply the methods for reducing redundancy and guaranteeing data integrity

56
Practical : University students
 Description
 Each student has a number, a name, an address
 A student is registered for a degree
 A student may not register for more than one degree simultaneously
 A student may take several degrees from the university
 To a degree is associated a set of courses
 A degree is managed by a department
 A course is offered by only one department

 Example of questions
 What are the courses associated with a degree ?
 What are the courses taken by a student for a degree ?
 What are the courses offered by a department ?

57
Exercise – Documentary information
system
 A library contains the following types of
document
 Books
 Journals that contain articles
 Proceedings that contain articles
 Write-ups (theses) for master's and PhD work

58

 Books and write-ups are described using title, authors, and a list of keywords

 Journals and proceedings are described using the title, editor and year of publication

 Articles are represented using the title, authors, their addresses and a list of keywords
59

 The authorized keywords for describing the
documents are represented using a thesaurus

 Propose an ER model and the associated relational model to manage the information on the various types of document in the library as well as the thesaurus

60

 A thesaurus
 List of concepts linked by semantic links
 The semantic links are
 Specific – Generic link (hierarchy)
 See also (association)
 Used for (synonymy)

61
Example of a thesaurus
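[Figure: a thesaurus drawn as a tree. Transport is linked to Vehicle by a specific/generic link; Plane, Boat and Car appear below Vehicle; Boeing and Airbus are attached to Plane, and Mercedes and Peugeot to Car, by specific/generic links; one "See also" link is also drawn.]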

62
