Giuseppe Conigliaro
Chief Innovation Officer Humanativa Group
CEO HN Digee
Educational Objectives
Technology is pervasive in our daily experience, but we must never forget that it is a tool at our service. We are surrounded by
data, applications and digital services of all kinds and nature and we ourselves, more or less consciously, are users of these new
digital technologies on a daily basis.
Technology makes it possible to record, to store and to analyze ever-growing quantities of data, to search, to book, to pay for
goods and services, to manage relations with the public administration, to express our opinion on the services and goods we use,
to access virtual realities, to consume entertainment services whenever we want and wherever we are, to communicate for work
or pleasure with anyone wherever they are.
As a result, new problems arise concerning the Control, Quality, Reliability, Certification and Security of the digital platforms on which we operate.
The boundaries between social and private are more blurred, uncertain and indefinite. Data can make us freer and more aware, or
more vulnerable and easier to influence.
Hand in hand with technology, it is necessary to develop the ability to search, integrate, elaborate, imagine and understand.
And together with all this we need ethical awareness and social responsibility.
In this context, great attention is paid to Artificial Intelligence and how it can be applied in different fields and how its
applications are changing the world. Machine Learning in recent years has found wide areas of application, for example in the
field of health.
In this course we will address these issues with an approach where the questions we ask ourselves will be more important than the
answers we will find, together or individually.
SCIENZE GIURIDICHE PER LE NUOVE TECNOLOGIE - Introduction to Digital Technologies
Lessons Time Table
date time topics
Monday 4 March 2024 18-20 New Paradigms for Digital Architectures
Tuesday 5 March 2024 18-20 New Paradigms for Data Architectures
Wednesday 6 March 2024 18-20 Emerging Issues - Data Privacy/Data Governance
Monday 11 March 2024 18-20 GDPR - Regulations & Roles
Tuesday 12 March 2024 18-20 Data Analysis & Design: W6H
Wednesday 13 March 2024 18-20 Data Modeling - ER Model
Monday 18 March 2024 18-20 ER Model: Identification Keys - Hierarchies
Tuesday 19 March 2024 18-20 ER Model: Exercises on ER Diagrams
Wednesday 20 March 2024 18-20 ER Model: Exercises on Transforming ERD into Tables
Monday 25 March 2024 18-20 Data Warehouse-Multidimensional Model
Tuesday 26 March 2024 18-20 Data Warehouse-Star Schema, Snowflake Schema, Galaxy Schemas
Wednesday 27 March 2024 18-20 Data Warehouse-Exercises on Star/Snowflake/Galaxy Schemas
Monday 8 April 2024 18-20 Big Data: Definition; NoSQL models
Tuesday 9 April 2024 18-20 Artificial Intelligence
Wednesday 10 April 2024 18-20 Machine Learning
Monday 15 April 2024 18-20 Test on the first part of the course
Tuesday 16 April 2024 18-20 Presentation and discussion of the papers on the first part of the course
Wednesday 17 April 2024 18-20 Deep Learning & Neural Networks
Monday 22 April 2024 18-20 Deep Learning & Neural Networks
Tuesday 23 April 2024 18-20 Regulation on Artificial Intelligence
Wednesday 24 April 2024 18-20 BlockChain: Cryptocurrencies, Smart Contracts and Certification
Monday 6 May 2024 18-20 Digital Identity: Definition and Regulations
Tuesday 7 May 2024 18-20 Digital Divide: Definition and Regulations
Wednesday 8 May 2024 18-20 The Industry 4.0 national plan: Objectives and enabling technologies
Monday 13 May 2024 18-20 Digital Era: social and behavioral implications of new technologies
Tuesday 14 May 2024 18-20 ChatGPT: Ethical, Social and Economic Implications
Wednesday 15 May 2024 18-20 Test on the second part of the course
Monday 20 May 2024 18-20 Final Discussion about the topics of the course
FROM THE DEVELOPMENT OF CUSTOM APPLICATIONS TO THE "COMPOSITION OF SERVICES" ON CLOUD ARCHITECTURES
FROM SYSTEM INTEGRATOR TO IT SERVICES ORCHESTRATOR
The current scenario:
The development of custom IT services through massive coding has decreasing margins, partly because of the contraction in demand for these services
and the collapse of unit prices per person-day or per Function Point (the traditional standard metric for measuring software applications);
the management of large custom application assets is an activity with very low or no margins and with a high risk of failure;
the management of local “human intensive” data & processing centers involves very high operating costs and continuous investments for revamping,
with inadequate scalability compared to market demand for IT services.
Consequently:
The professions of programmer, system administrator, network expert or storage manager are non-strategic professions in mature and evolved IT
markets and can in any case be acquired cheaply through nearshoring or offshoring;
The emerging and dominant profile is that of the solution architect, who in close contact with the business understands the core processes and
the needs for IT services and packages a flexible and scalable solution by composing elementary services offered "as a service" by the
technological platforms present on a global scale (on cloud).
https://aws.amazon.com/it/about-aws/global-infrastructure/?p=ngi&loc=1
https://cloud.google.com/about/locations?hl=it#regions
Test
Pick some element of digital technology innovations and relate it to some digital opportunity and then relate these …
Laws & Regulations
GDPR
AI Regulation
Digital Identity Regulation
Cryptocurrency Regulation
Smart Contracts Regulation
INTERNET
The internet can be described as a vast network of interconnected computers and devices spanning the globe, facilitating communication, information
exchange, and resource sharing. It is a dynamic and ever-expanding network that enables users to access a wealth of content and services, including
websites, email, social media, online shopping, streaming media, and more.
At its core, the internet operates on a decentralized system of interconnected networks, each comprising servers, routers, switches, and other
infrastructure components. These networks use standardized protocols and technologies, such as TCP/IP (Transmission Control Protocol/Internet
Protocol), to ensure seamless communication between devices regardless of their location or type.
The internet has revolutionized the way we live, work, and interact with one another. It has democratized access to information, enabling individuals to
educate themselves, express their opinions, and engage in public discourse. Additionally, the internet has transformed industries, from commerce and
entertainment to healthcare and education, by providing new avenues for innovation, collaboration, and efficiency.
However, the internet also presents challenges and risks, including privacy concerns, cybersecurity threats, misinformation, and digital divide issues.
As society continues to rely more on the internet for everyday tasks and activities, it becomes increasingly important to address these challenges
while leveraging the internet's potential for positive impact and global connectivity.
CLOUD TECHNOLOGY
If you have a free Gmail account or regularly back up your phone photos to a Google Drive or iCloud storage account, you’re already using cloud
services in your day-to-day life.
While these examples of digital transformation with the cloud are on the more basic end of the spectrum, businesses can leverage the full power of
cloud computing in multiple ways:
Backing up copies of big data
Hosting websites
Deploying software infrastructure to a global workforce
Giving team members remote access to essential programs
While we may use the word “cloud” to describe where this data is located, it does have a terrestrial presence.
Cloud-based data exists on one or more servers separate from your location.
Your Gmail messages, for example, might currently be in a Google data center in Henderson, Nevada, or Middenmeer, Netherlands!
Keeping your data in the cloud—spread across different data centers—is beneficial for safety.
If all of your data is in one location, such as in a server in your building, and it’s damaged, your business operations may go down.
MOBILE
The first tool in digital transformation is one we already have in our hands and use daily: mobile devices.
The rise of mobile technology is a cornerstone of the digital age.
It’s already helped businesses around the world with tasks that range from basic (answering emails while on vacation) to complex (getting medical
equipment into mobile clinics and remote locations).
Each use of mobile technology in business will look different from the next based on industries and needs.
Depending on your business’ focus, you might use mobile technology to:
Enable team members to call into staff meetings while working remotely
Use voice over internet protocol (VoIP) phone systems that keep you connected to your office line
Track patients’ vital signs with the use of wearable sensors
Connect healthcare providers and specialists in remote locations
Unlocking the full benefits of mobile technology does require a few other pieces of tech:
cloud storage
high-speed mobile data
BIG DATA
Big data refers to large volumes of raw data that must be processed and interpreted in order to glean useful results.
Big data collection happens thanks to the Internet of Things, telecommunications, website trackers, social networks, and any other place where
people or systems interact with your business.
Big data is typically stored in databases.
While this itself isn’t new—companies have been using big data analytics since the early days of computing—the ways in which it is processed are
changing.
Many companies are now turning to artificial intelligence (AI) and machine learning to comb through their big data and use the results in the
development of business strategies.
DIGITAL TWINS
Digital twins are digital, data-based business models that replicate people, places, systems, and processes in the real world. In business, you can use a
digital twin to do things like:
Model the impact of changes on a production line
Predict customers’ buying habits and adjust inventory levels
Test different office building layouts for productivity
Model the impact of launching a new operating model
Project how your new products may be received by your market
Determine when customers prefer to receive important information
Try out new business processes without risk
The biggest benefit of a digital twin is that it can be used to forecast and predict potential problems. If you can use a digital twin to predict what
might go wrong—and mitigate the problem before it can happen in the real world—you can save resources, time, and money.
If you want to walk around a business metaverse, the technologies that power digital twins also help to create augmented and virtual reality tools.
BLOCKCHAIN
A blockchain is a digital ledger, or rows of data, shared across (and accessible by) a computer network.
When you add to a blockchain, one of the computers in the blockchain network writes data, called a block, to the ledger.
Once created, an entry on the ledger can’t be edited without breaking the entire chain of information, so the blockchain becomes a reliable record of
transactions and events.
Publicly accessible blockchains are used to record the exchange of digital currency, like Bitcoin, and the sale of digital goods like non-fungible tokens
(NFTs).
When used privately within your organization, though, a blockchain can become a verifiable record of approvals, contracts, data entries, and more.
The blockchain can also be used to store an organization’s website and app data, track inventory through supply chains, or manage information about
international flights.
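The tamper-evidence property described above (an entry cannot be edited without breaking the chain) can be illustrated with a minimal sketch. It is a toy model only: it has no network, no consensus mechanism and no digital signatures, and the names `block_hash`, `add_block` and `is_valid` are hypothetical helpers chosen for this example.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash the block's content together with the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    """Append a new block that points at the hash of the last block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check every link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
    return True


ledger = []
add_block(ledger, "Alice pays Bob 5")
add_block(ledger, "Bob pays Carol 2")
print(is_valid(ledger))   # True

ledger[0]["data"] = "Alice pays Bob 500"   # tamper with an old entry
print(is_valid(ledger))   # False
```

Because each block stores the hash of the previous one, changing an old entry invalidates the stored hashes unless every later block is rewritten as well.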
INFORMATION
Element of knowledge deriving from observation, deductive or inductive analysis, analysis of
relationships/correlations on a set of measurable or non-measurable data
The concept of information is closely linked to the concept of communication
The information, as result of more or less complex analyses, based on implicit or explicit interpretative models,
could contain elements of subjective evaluation
Data, Information and Decision
DECISION
Definitive judgement, which overcomes any pre-existing doubts and uncertainties, which determines a choice, an action
and consequences.
Decisions are arbitrary, but they involve an assumption of responsibility for the consequences.
Decisions are influenced by data, information and opinions.
In complex systems it is impossible to foresee all the possible consequences, direct and indirect, of a decision relevant to
the system.
What can be done is to use data and information to reasonably reduce the degree of uncertainty of the effects of a decision
on the system.
There are no decisions that are neutral with respect to the opinions of the decision maker. There are no objectively right
decisions, unless you evoke moral issues.
But one can aspire to rational decisions, based on neutral data, on partially neutral information and on arbitrary opinions.
Data Value
PDCA CYCLE
Data, Information and Decision
Experience +
Knowledge +
Wisdom
Data, Information and Decision
Emerging Issues:
Data Quality
Data Governance
Data Protection
Emerging Issues: Data Quality
MAIN ASPECTS OF DATA QUALITY
UNIQUENESS OF THE DATA: THE DATA REFERRING TO SPECIFIC EVENTS ARE RECORDED ONLY ONCE WITHIN THE DATABASE
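The uniqueness aspect of data quality (each event is recorded only once in the database) can be checked mechanically. The following is a minimal sketch under the assumption that every event carries an identifier; the sample records and the helper `duplicated_keys` are hypothetical.

```python
from collections import Counter

# Hypothetical event records: (event_id, description)
records = [
    (101, "invoice issued"),
    (102, "payment received"),
    (101, "invoice issued"),   # the same event recorded twice
]


def duplicated_keys(rows):
    """Return the identifiers that appear more than once."""
    counts = Counter(key for key, _ in rows)
    return sorted(key for key, n in counts.items() if n > 1)


print(duplicated_keys(records))  # [101]
```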
Emerging Issues: Data Privacy
REGULATION (EU) 2016/679 (GENERAL DATA PROTECTION REGULATION)
Key GDPR roles:
• Controller
• Processor
• Data Protection Officer (DPO)
• Supervisory Authority
Emerging Issues: Data Protection
Protection of personal data
In January 2012, the EC proposed a comprehensive reform of data protection rules in the EU. On 4 May 2016, the
official texts of the Regulation and the Directive were published in the EU Official Journal in all the official
languages. The Regulation entered into force on 24 May 2016 and has applied since 25 May 2018. The Directive
entered into force on 5 May 2016 and EU Member States had to transpose it into their national law by 6 May 2018.
PERSON IN CHARGE OF THE TREATMENT: THE PERSON WHO DAILY AND …
D.P.O.
DATA PROTECTION OFFICER
• COOPERATE WITH THE EDPS (RESPONDING TO ITS REQUESTS ABOUT INVESTIGATIONS, COMPLAINT HANDLING, INSPECTIONS CONDUCTED BY THE EDPS, ETC.);
• DRAW THE INSTITUTION'S ATTENTION TO ANY FAILURE TO COMPLY WITH THE APPLICABLE DATA PROTECTION RULES.
• IS AN OBLIGATORY CONSULTANT FOR THE DATA CONTROLLER AND FOR THE DATA PROCESSOR
• MUST HAVE ADEQUATE KNOWLEDGE OF THE LEGISLATION AND PRACTICES OF PERSONAL DATA MANAGEMENT
• MUST HAVE NO CONFLICT OF INTEREST, SO AS TO FULFILL HER/HIS DUTIES IN TOTAL AUTONOMY AND INDEPENDENCE
ATTENTION: THE CONTROLLER WHO APPOINTS A D.P.O. WHO DOES NOT QUALIFY IS GUILTY OF "CULPA IN ELIGENDO"
The personnel office provides the worker with a complete INFORMATION NOTICE clearly specifying the use that will be made of the data and which subjects will be able to come into contact with it.
In this phase it is also determined whether for some data it is necessary to ask for explicit consent.

The purchasing office provides the supplier with a complete INFORMATION NOTICE which clearly specifies the use that will be made of the data and which subjects will be able to come into contact with it.
When dealing exclusively with tax data, this task is not necessary.

The sales office provides the customer with a complete INFORMATION NOTICE which clearly specifies the use that will be made of the data and which subjects will be able to come into contact with it.
In this phase it is also determined whether for some data it is necessary to ask for explicit consent.
Emerging Issues: Data Protection
COMMUNICATION
Bring information to the attention of one or more well-identified subjects (mail, letter, phone call, etc.)
DISSEMINATION
Make a plurality of unidentified subjects aware of information (newspaper, radio, internet, TV, etc.)
WHEN ARE THEY LEGAL?
• when the law provides for it
• when the data subject has given consent
THE VIOLATION OF THE GDPR ENTAILS:
• disciplinary sanctions
• administrative sanctions
• criminal penalties
But above all, the violation of the GDPR may have economic consequences in terms of compensation for damages, even for very significant amounts.
Data Modeling
7 Key Questions (W6H)
Which ?
• what kind of data
Why ?
• for knowledge
• for monitoring
• for forecasts
• …
What ?
• which kind of analyses
• which kind of elaborations
• which kind of reports
Where ?
• where do I get the data (sources)
• where do I store them (repository/storage)
Who ?
• who produces the data
• who updates them
• who manages them
• who do I distribute them to
How ?
• how do I get them
• how do I process them
• how do I store them (in aggregated form or in elementary form; historicized or not)
When ?
• how often do I extract them
• how often do I process them
• how often do I distribute them
Data modeling is the process of creating a visual representation of either a whole information
system or parts of it to communicate connections between data points and structures.
The goal is to illustrate the types of data used and stored within the system, the relationships
among these data types, the ways the data can be grouped and organized and its formats and
attributes.
Data models are built around business needs.
Rules and requirements are defined upfront through feedback from business stakeholders so they
can be incorporated into the design of a new system or adapted in the iteration of an existing
one.
Data can be modeled at various levels of abstraction.
The process begins by collecting information about business requirements from stakeholders and
end users.
These business rules are then translated into data structures to formulate a concrete database
design.
A data model can be compared to a roadmap, an architect’s blueprint or any
formal diagram that facilitates a deeper understanding of what is being designed.
Like any design process, database and information system design begins at a high level of
abstraction and becomes increasingly concrete and specific.
Data models can generally be divided into three categories, according to their degree of
abstraction.
The process will start with a conceptual model, progress to a logical model and conclude with a
physical model.
Each type of data model is discussed in more detail in the next slides.
Conceptual data models are also referred to as domain models and offer a big-picture view of what the system will
contain, how it will be organized, and which business rules are involved.
Conceptual models are usually created as part of the process of gathering initial project
requirements.
Typically, they include entity classes (defining the types of things that are important for the
business to represent in the data model), their characteristics and constraints, the relationships
between them and relevant security and data integrity requirements.
The notation is typically simple.
The data modeling discipline invites stakeholders to evaluate data processing and storage in deep
detail.
Data modeling techniques have different conventions that dictate which symbols are used to
represent the data, how models are drawn, and how business requirements are defined.
All approaches provide formalized workflows that include a sequence of tasks to be performed in
an iterative manner.
Those workflows generally look like this:
1. Identify the entities
The process of data modeling begins with the identification of the things, events or concepts that
are represented in the data set that is to be modeled. Each entity should be cohesive and logically
discrete from all others.
5. Assign keys as needed and decide on a degree of normalization that balances the
need to reduce redundancy with performance requirements.
Normalization is a technique for organizing data models (and the databases they represent) in
which numerical identifiers, called keys, are assigned to groups of data to represent relationships
between them without repeating the data.
For instance, if customers are each assigned a key, that key can be linked to both their address and
their order history without having to repeat this information in the table of customer names.
Normalization tends to reduce the amount of storage space a database will require, but it can come at a
cost to query performance.
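The customer example above can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the table and column names (`customer`, `address`, `customer_order`) and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- the key assigned to each customer
    name        TEXT NOT NULL
);
CREATE TABLE address (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    street      TEXT NOT NULL,
    city        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    item        TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Ada');
INSERT INTO address VALUES (1, 'Via Roma 1', 'Palermo');
INSERT INTO customer_order VALUES (10, 1, 'laptop'), (11, 1, 'mouse');
""")

# The joins are the price of normalization: data is stored once, then reassembled.
rows = conn.execute("""
    SELECT c.name, a.city, o.item
    FROM customer c
    JOIN address a ON a.customer_id = c.customer_id
    JOIN customer_order o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [('Ada', 'Palermo', 'laptop'), ('Ada', 'Palermo', 'mouse')]
```

The customer's name and address are stored once, however many orders exist; the two joins needed to rebuild the full picture are the query cost mentioned above.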
Data modeling has evolved alongside database management systems, with model types
increasing in complexity as businesses' data storage needs have grown. Here are two model types:
a) Hierarchical data models represent one-to-many relationships in a treelike format. In this type of
model, each record has a single root or parent which maps to one or more child tables. This
model was implemented in the IBM Information Management System (IMS), which was
introduced in 1966 and rapidly found widespread use, especially in banking. Though this
approach is less efficient than more recently developed database models, it’s still used in
Extensible Markup Language (XML) systems and geographic information systems (GISs).
b) Relational data models were initially proposed by IBM researcher E.F. Codd in 1970. They are still
implemented today in the many different relational databases commonly used in enterprise
computing. Relational data modeling doesn’t require a detailed understanding of the physical
properties of the data storage being used. In it, data segments are explicitly joined through the
use of tables, reducing database complexity.
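The difference between the two model types can be sketched in a few lines: a hierarchical record is a tree in which every node has exactly one parent, while the relational form stores the same facts as flat rows linked by a key. The sample data and the helper `to_rows` are hypothetical and only illustrate the idea.

```python
# Hierarchical view: each record has exactly one parent.
bank = {
    "name": "Branch Rome",
    "children": [
        {"name": "Account 001", "children": [{"name": "Deposit 50", "children": []}]},
        {"name": "Account 002", "children": []},
    ],
}


def to_rows(node, parent_id=None, rows=None):
    """Flatten the tree into relational rows (id, parent_id, name)."""
    if rows is None:
        rows = []
    node_id = len(rows) + 1
    rows.append((node_id, parent_id, node["name"]))
    for child in node["children"]:
        to_rows(child, node_id, rows)
    return rows


for row in to_rows(bank):
    print(row)
# (1, None, 'Branch Rome')
# (2, 1, 'Account 001')
# (3, 2, 'Deposit 50')
# (4, 1, 'Account 002')
```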
Relational databases frequently employ structured query language (SQL) for data management.
These databases work well for maintaining data integrity and minimizing redundancy.
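As an illustration of how a relational database maintains data integrity, the sketch below uses SQLite through Python's `sqlite3` module. The tables `course` and `student` are assumptions made for the example, and other database systems enforce foreign keys without the explicit `PRAGMA` that SQLite needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE student (
        roll_no   INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        course_id INTEGER NOT NULL REFERENCES course(course_id)
    )
""")
conn.execute("INSERT INTO course VALUES (1, 'Data Modeling')")
conn.execute("INSERT INTO student VALUES (1, 'Ada', 1)")       # accepted

try:
    conn.execute("INSERT INTO student VALUES (2, 'Bob', 99)")  # course 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```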
Entity-relationship (ER) data models use formal diagrams to represent the relationships between
entities in a database.
Several ER modeling tools are used by data architects to create visual maps that convey
database design objectives.
Object-oriented data models gained traction with object-oriented programming and became
popular in the mid-1990s.
The “objects” involved are abstractions of real-world entities.
Objects are grouped in class hierarchies and have associated features.
Object-oriented databases can incorporate tables but can also support more complex data
relationships.
Dimensional data models were developed by Ralph Kimball, and they were designed to
optimize data retrieval speeds for analytic purposes in a data warehouse.
While relational and ER models emphasize efficient storage, dimensional models increase
redundancy in order to make it easier to locate information for reporting and retrieval.
This modeling is typically used across OLAP systems.
Two popular dimensional data models are the star schema and the snowflake schema.
In the star schema, data is organized into facts (measurable items) and dimensions (reference information),
and each fact is surrounded by its associated dimensions in a star-like pattern.
The snowflake schema resembles the star schema but includes additional
layers of associated dimensions, making the branching pattern more complex.
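A star schema can be sketched with one fact table and two dimension tables. The example below uses Python's `sqlite3` module; the table names (`fact_sales`, `dim_product`, `dim_date`) and the figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds the measures plus one key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'music');
INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 2, 7.5);
""")

# A typical analytic query: aggregate the facts along the dimensions.
totals = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(totals)  # [('books', 2023, 10.0), ('books', 2024, 15.0), ('music', 2024, 7.5)]
```

In a snowflake schema a dimension such as `dim_product` would itself point to a further table (for example a separate category table), adding one more join.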
Operational database (OLTP) | Data warehouse (OLAP)
It is transaction-oriented. | It is subject-oriented.
High transaction volumes using few records at a time. | Low transaction volumes using many records at a time.
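The two access patterns can be contrasted in a short sketch, assuming the comparison above is between an operational (OLTP) workload and an analytical (OLAP) one. The `orders` table and its content are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "c%d" % (i % 3), float(i)) for i in range(1, 7)],
)

# Transaction-oriented access: one record, located by its key.
one_order = conn.execute("SELECT amount FROM orders WHERE order_id = ?", (4,)).fetchone()

# Analysis-oriented access: every record is read and aggregated.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(one_order)  # (4.0,)
print(totals)     # [('c0', 9.0), ('c1', 5.0), ('c2', 7.0)]
```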
ER Modeling
ER Model is used to model the logical view
of the system from a data perspective
which consists of these components:
Attribute(s):
Attributes are the properties that define the entity type. For example, Roll_No, Name, DoB, Age,
Address and Mobile_No are the attributes that define the entity type Student. In an ER diagram, an attribute
is represented by an oval.
Key Attribute
The attribute that uniquely identifies each entity in the entity set is called the key attribute. For
example, Roll_No will be unique for each student. In an ER diagram, a key attribute is represented by an
oval with the attribute name underlined. A roll number is a unique identification number that can be
assigned to a student during admission or after registration.
Composite Attribute
An attribute composed of several other attributes is called a composite
attribute. For example, the Address attribute of the Student entity type consists
of Street, City, State, and Country. In an ER diagram, a composite attribute is
represented by an oval connected to the ovals of its component attributes.
Multivalued Attribute:
An attribute that can hold more than one value for a given entity. For example, Phone_No (there can be more
than one for a given student). In an ER diagram, a multivalued attribute is represented by a double oval.
Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived attribute.
For example, Age can be derived from Date of Birth. In an ER diagram, a derived attribute is represented by a dashed oval.
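The attribute kinds described for the Student entity type (key, composite, multivalued, derived) can be mirrored in code. The following is a minimal sketch using Python dataclasses; the field names follow the slide's example, while the sample values and the `age` method are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:                      # composite attribute
    street: str
    city: str
    state: str
    country: str


@dataclass
class Student:
    roll_no: int                    # key attribute
    name: str
    dob: date
    address: Address
    phone_nos: list = field(default_factory=list)  # multivalued attribute

    def age(self, today):
        """Derived attribute: computed from dob, never stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


s = Student(1, "Ada", date(2000, 5, 20),
            Address("Via Roma 1", "Palermo", "PA", "Italy"),
            ["333-1", "333-2"])
print(s.age(date(2024, 3, 18)))  # 23
```

Storing only the date of birth and computing the age on request avoids keeping a value that would become wrong every year.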
Relationship Type and Relationship Set:
A relationship type represents the association between entity types.
A relationship set is the collection of all relationships of the same type.
For example, ‘Enrolled in’ is a relationship type that exists between
the entity types Student and Course. In an ER diagram, a relationship type
is represented by a diamond connected to the participating entities by lines.
Degree of a relationship set:
The number of different entity sets participating in a relationship set is called the degree of a relationship set.
1. Unary Relationship:
When there is only ONE entity set participating in a relationship, the
relationship is called a unary relationship. For example, one person is
married to only one person (Person - Married to - Person).
2. Binary Relationship:
When there are TWO entity sets participating in a
relationship, the relationship is called a binary relationship.
For example, a Student is enrolled in a Course.
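The degree of the two relationships above can be computed by counting the different entity sets involved. This is a minimal sketch; the dictionary `relationship_types` and the function `degree` are hypothetical names used only for this example.

```python
# Each relationship type lists the entity set behind each of its roles.
relationship_types = {
    "married_to": ("Person", "Person"),     # both roles are played by Person
    "enrolled_in": ("Student", "Course"),
}


def degree(relationship_name):
    """Number of different entity sets taking part in the relationship."""
    return len(set(relationship_types[relationship_name]))


print(degree("married_to"))   # 1 -> unary
print(degree("enrolled_in"))  # 2 -> binary
```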
Cardinality
The number of times an entity of an entity set participates in a relationship set is known as cardinality.
One-to-one: When each entity in each entity set can take part only once in the relationship, the cardinality is one-to-one.
Let us assume that a person can have only one Tax ID code and a Tax ID code can match only one
person. So, the relationship will be one-to-one.
Many-to-one: When entities in one entity set can take part only once in the relationship set and entities in the other entity set
can take part more than once in the relationship set, the cardinality is many-to-one. Let us assume that a student
can take only one course, but one course can be taken by many students. So, the cardinality will be n to 1. It
means that for one course there can be n students but for one student, there will be only one course.
Many-to-many: When entities in all entity sets can take part more than once in the relationship, the cardinality is many-to-many.
Let us assume that a student can take more than one course and one course can be taken by many
students. So, the relationship will be many-to-many.
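One common way of turning these cardinalities into tables can be sketched with Python's `sqlite3` module. The table names, column names and sample rows below are assumptions made for the example, not the only possible mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One-to-one: the UNIQUE constraint stops two persons sharing a Tax ID code.
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    tax_id    TEXT NOT NULL UNIQUE
);

-- Many-to-many: a junction table holds one row per (student, course) pair.
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE enrolled_in (
    roll_no   INTEGER NOT NULL REFERENCES student(roll_no),
    course_id INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (roll_no, course_id)
);
INSERT INTO student VALUES (1, 'Ada'), (2, 'Bob');
INSERT INTO course VALUES (10, 'Data Modeling'), (20, 'Big Data');
INSERT INTO enrolled_in VALUES (1, 10), (1, 20), (2, 10);
INSERT INTO person VALUES (1, 'Ada', 'TAX-001');
""")

try:
    conn.execute("INSERT INTO person VALUES (2, 'Bob', 'TAX-001')")
    duplicate_tax_id_rejected = False
except sqlite3.IntegrityError:
    duplicate_tax_id_rejected = True

per_course = conn.execute(
    "SELECT course_id, COUNT(*) FROM enrolled_in GROUP BY course_id ORDER BY course_id"
).fetchall()
print(duplicate_tax_id_rejected)  # True
print(per_course)                 # [(10, 2), (20, 1)]
```

For the many-to-one case, the junction table is not needed: a single `course_id` column in `student` would let each student point to exactly one course while a course is shared by many students.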
INTRODUZIONE AL DIRITTO E ALLE TECNOLOGIE DIGITALI
INTRODUCTION TO DIGITAL TECHNOLOGY
3. n-ary Relationship:
When there are n entity sets participating in a relationship, the relationship is called an n-ary relationship.
For example, a Student enrolls in a Course taught by a Professor: three entity sets (Student, Course, Professor) participate.
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality.
One-to-one
Many to one
Many to many
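To make the three cases concrete, the sketch below inspects a binary relationship set given as pairs and reports which cardinality its current contents exhibit. The function name and the sample pairs are assumptions for illustration; note that it classifies an instance of the data, whereas in a real design the cardinality is a constraint declared on the schema.

```python
def classify_cardinality(pairs):
    """Classify a binary relationship set given as (left, right) pairs."""
    left_counts, right_counts = {}, {}
    for left, right in set(pairs):
        left_counts[left] = left_counts.get(left, 0) + 1
        right_counts[right] = right_counts.get(right, 0) + 1
    left_once = all(n == 1 for n in left_counts.values())
    right_once = all(n == 1 for n in right_counts.values())
    if left_once and right_once:
        return "one-to-one"
    if left_once:
        return "many-to-one"   # many left entities may share one right entity
    if right_once:
        return "one-to-many"
    return "many-to-many"


# Hypothetical relationship sets.
married_to = [("m1", "f1"), ("m2", "f2")]
takes_single_course = [("s1", "c1"), ("s2", "c1"), ("s3", "c2")]
takes_many_courses = [("s1", "c1"), ("s1", "c2"), ("s2", "c1")]
```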
Cardinality: One-to-one
When each entity in each entity set can take part only once in the relationship, the cardinality is one-to-one. Let us
assume that a male can marry one female and a female can marry one male. So, the relationship will be one-to-one.
The total number of tables that can be used in this case is 2.
[Figure: one-to-one cardinality represented using sets]
Cardinality: Many-to-one
When entities in one entity set can take part only once in the relationship set and entities in the other entity set can take
part more than once, the cardinality is many-to-one. Let us assume that a student can take only
one course, but one course can be taken by many students. So, the cardinality will be n to 1. It means that for one
course there can be n students, but for one student there will be only one course. The total number of tables that
can be used in this case is 3.
[Figure: many-to-one cardinality represented using sets]
Cardinality: Many-to-many
When entities in all entity sets can take part more than once in the relationship, the cardinality is many-to-many. Let us
assume that a student can take more than one course and one course can be taken by many students. So, the
relationship will be many-to-many. The total number of tables that can be used in this case is 3.
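The three tables mentioned for the many-to-many case can be sketched with Python's built-in sqlite3 module: one table per entity set and one for the relationship. Table names, column names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id INTEGER PRIMARY KEY, title TEXT);
    -- The relationship gets its own table; the composite key allows each
    -- (student, course) pair at most once.
    CREATE TABLE enrolled_in (
        roll_no   INTEGER REFERENCES student(roll_no),
        course_id INTEGER REFERENCES course(course_id),
        PRIMARY KEY (roll_no, course_id)
    );
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Alan');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Law');
    INSERT INTO enrolled_in VALUES (1, 10), (1, 20), (2, 10);
""")

# One student takes several courses...
courses_of_ada = [row[0] for row in conn.execute(
    "SELECT c.title FROM course c JOIN enrolled_in e ON e.course_id = c.course_id "
    "WHERE e.roll_no = 1 ORDER BY c.title")]
# ...and one course is taken by several students.
students_in_databases = [row[0] for row in conn.execute(
    "SELECT s.name FROM student s JOIN enrolled_in e ON e.roll_no = s.roll_no "
    "WHERE e.course_id = 10 ORDER BY s.name")]
```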
Participation Constraint
1. Total Participation
Each entity in the entity set must participate in the relationship. If each student must enroll in a course, the participation of
students will be total. Total participation is shown by a double line in the ER diagram.
2. Partial Participation
An entity in the entity set may or may NOT participate in the relationship. If some courses have no students enrolled,
the participation of the course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with the Student entity set having total participation and the Course entity set having
partial participation.
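Using sets, the difference between total and partial participation can be checked mechanically. The following is a minimal sketch; the function name and the sample data are hypothetical.

```python
def participation(entity_set, relationship_pairs, position):
    """Return 'total' if every entity appears in the relationship, else 'partial'.

    position is 0 for the left entity set of each pair, 1 for the right one.
    """
    participating = {pair[position] for pair in relationship_pairs}
    return "total" if set(entity_set) <= participating else "partial"


# Hypothetical data: every student is enrolled, but course c3 has no students.
students = {"s1", "s2", "s3"}
courses = {"c1", "c2", "c3"}
enrolled_in = {("s1", "c1"), ("s2", "c1"), ("s3", "c2")}
```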
[Figure: total and partial participation represented using sets]
Weak Entity Type and Identifying Relationship
As discussed before, an entity type has a key attribute that uniquely identifies each entity in the entity set.
But there are some entity types for which a key attribute cannot be defined.
These are called weak entity types.
For example, a company may store the information of dependents (Parents, Children, Spouse) of an Employee.
But the dependents cannot exist without the employee.
So Dependent will be a weak entity type and Employee will be the identifying entity type for Dependent.
A weak entity type is represented by a double rectangle.
The participation of weak entity types is always total.
The relationship between the weak entity type and its identifying strong entity type is called the identifying relationship and it is
represented by a double diamond.
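One common way to carry a weak entity type into tables is to build its key from the key of the identifying entity plus a partial key, and to delete its rows together with the owner. The sketch below uses sqlite3; the table names, column names and the choice of the dependent's name as partial key are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    -- Weak entity: identified only together with its owning employee.
    CREATE TABLE dependent (
        emp_id   INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        dep_name TEXT NOT NULL,
        relation TEXT,
        PRIMARY KEY (emp_id, dep_name)
    );
    INSERT INTO employee VALUES (1, 'Maria'), (2, 'Luca');
    INSERT INTO dependent VALUES (1, 'Anna', 'Child'), (2, 'Anna', 'Spouse');
""")

# Removing an employee removes that employee's dependents as well.
conn.execute("DELETE FROM employee WHERE emp_id = 1")
remaining = conn.execute("SELECT emp_id, dep_name FROM dependent").fetchall()
```

Two dependents can share the name 'Anna' because the name is unique only within one employee, which is exactly why Dependent has no key of its own.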
Converting an ER diagram into the tables
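As a hedged illustration of such a conversion (the slides' own figures are not reproduced here), the sketch below applies commonly used mapping conventions to the Student entity type from the earlier examples: the key attribute becomes the primary key, the composite Address is flattened into its parts, the multivalued Phone_No moves to its own table, and the derived Age is not stored. All table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,   -- key attribute becomes the primary key
        name    TEXT,
        dob     TEXT,                  -- Age is derived from dob, so it is not stored
        street  TEXT, city TEXT, state TEXT, country TEXT  -- composite Address, flattened
    );
    CREATE TABLE student_phone (       -- multivalued Phone_No gets a separate table
        roll_no  INTEGER REFERENCES student(roll_no),
        phone_no TEXT,
        PRIMARY KEY (roll_no, phone_no)
    );
    INSERT INTO student VALUES (1, 'Ada', '2000-05-20', 'Via Roma 1', 'Rome', 'Lazio', 'Italy');
    INSERT INTO student_phone VALUES (1, '555-0100'), (1, '555-0101');
""")

phones = [r[0] for r in conn.execute(
    "SELECT phone_no FROM student_phone WHERE roll_no = 1 ORDER BY phone_no")]
columns = [r[1] for r in conn.execute("PRAGMA table_info(student)")]
```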
ER Modeling - Exercises
Exercise 1
Construct an E-R diagram for a car-insurance company whose customers own one or more cars each.
Each car has associated with it zero to any number of recorded accidents.
Exercise 2
Construct an E-R diagram for a hospital with a set of patients and a set of medical doctors.
Associate with each patient a log of the various tests and examinations conducted.
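Exercise 2 (solution, contents of the E-R diagram)
Entity set patients: Id_patient, name, address, telephone number, age
Entity set doctors: Id_doctor, name, specialization
Entity set test: test id, Id_patient, Id_doctor, test results, date
Relationships: patients "takes" test; test "is prescribed by" doctors; patients "is followed by" doctors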
Exercise 3
A university registrar’s office maintains data about the following entities:
b) course offerings, including course number, year, semester, section number, instructor(s), timings, and classroom;
Further, the enrollment of students in courses and grades awarded to students in each course they are enrolled for must be
appropriately modeled.
Exercise 3
In the answer given here, the main entity sets are student, course, course-offering and instructor.
The entity set course-offering is a weak entity set dependent on course.
The assumptions made are:
a) a class meets only at one particular place and time. This E-R diagram cannot model a class meeting at different places at different times.
b) there is no guarantee that the database does not have two classes meeting at the same place and time.
Exercise 4
Consider a database used to record the marks that students get in different exams of different course offerings.
a) Construct an E-R diagram that models exams as entities, and uses a ternary relationship, for the above database.
b) Construct an alternative E-R diagram that uses only a binary relationship between students and course-offerings. Make sure
that only one relationship exists between a particular student and course-offering pair, yet you can represent the marks that a
student gets in different exams of a course offering.
Exercise 5
Construct appropriate tables for each of the E-R diagrams in Exercises 1 to 3.
a) Car insurance tables:
b) Hospital tables
c) University registrar’s tables
Exercise 6
Design an E-R diagram for keeping track of the exploits of your favourite sports team.
You should store the matches played, the scores in each match, the players in each match and individual player statistics for
each match.
Exercise 7
Extend the E-R diagram of the previous question to track the same information for all teams in a league.
Data Warehouse
INTRODUZIONE AL DIRITTO E ALLE TECNOLOGIE DIGITALI
INTRODUCTION TO DIGITAL TECHNOLOGY
Giuseppe Conigliaro
Chief Innovation Officer Humanativa Group
CEO HN Digee
A Data Warehouse (DW), unlike an operational database, has different characteristics and
peculiarities.
This large data set will need to be:
Integrated: a fundamental requirement of a data warehouse is the integration of the collected
data.
Data from multiple transactional systems and external sources converge in the Data Warehouse.
The goal of integration can be achieved by following different paths:
through the use of uniform coding methods,
through the pursuit of a semantic homogeneity of all the variables,
using the same units of measurement
Subject oriented: DW is oriented towards specific business topics rather than applications or
functions.
In a DW, data is stored so that it can be easily read or processed by users.
The goal, therefore, is no longer that of minimizing redundancy through normalization, but that of
providing data organized in such a way as to favor the production of information.
We move from functional design to data modeling that allows a multidimensional view of the same
The Data Warehouse, therefore, describes the process of acquiring, transforming and distributing
information present inside or outside companies as a support to managers and decision-makers.
Compared to traditional relational databases, a data warehouse offers significant advantages in
terms of performance and results.
W. H. Inmon, recognized as the "father" of the Data Warehouse (DW), defines it as follows:
«A subject-oriented, integrated, time-variant and non-volatile collection of data, in support of management's decision
making»
Subject oriented:
organized around specific aspects of the company (customers, sales, orders, etc...)
focused on data useful for decision making, and not on day-to-day operations
aggregated and historicised
Integrated:
integrates data from different and heterogeneous sources (relational databases, text files,
transactional databases, etc...)
ensures the consistency of the integrated data using data cleaning and data integration
techniques.
the data is converted to ensure its consistency and only subsequently entered into the Data
Warehouse
Time-variant:
the data does not only provide current information but has a historical dimension
Non-volatile:
it is an archive physically separate from the databases used for daily operations.
it does not require continuous updating operations and therefore it does not need support for the
management of transactions and concurrency.
the only operations that can be performed on a data warehouse are the initial data load and read
access.
Subject oriented
A data warehouse targets the modeling and analysis of data for decision-makers.
Therefore, data warehouses typically provide a concise and straightforward view of a particular subject, such as customer, product, or sales, instead of the organization's global ongoing operations.
This is done by excluding data that are not useful for the subject and including all data needed by the users to understand the subject.
Integrated
A data warehouse integrates various heterogeneous data sources such as RDBMSs, flat files, and online transaction records.
Data cleaning and integration must be performed during data warehousing to ensure consistency in naming conventions, attribute types, etc., among different data sources.
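A minimal sketch of what such cleaning and integration can look like: two hypothetical sources use different field names, codes and units of measurement, and each is converted to one common format before loading. The source names, fields and sample values are assumptions made for illustration.

```python
# Two hypothetical source systems that encode the same facts differently.
crm_rows = [{"customer": "C1", "gender": "F", "weight_kg": 2.0}]
shop_rows = [{"cust_id": "C2", "sex": "female", "weight_lb": 4.0}]

GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}  # uniform coding
LB_TO_KG = 0.45359237  # exact definition of the international pound


def from_crm(row):
    """Map a CRM row to the common warehouse format."""
    return {"customer_id": row["customer"],
            "gender": GENDER_CODES[row["gender"].lower()],
            "weight_kg": row["weight_kg"]}


def from_shop(row):
    """Map a shop row to the common format, converting pounds to kilograms."""
    return {"customer_id": row["cust_id"],
            "gender": GENDER_CODES[row["sex"].lower()],
            "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3)}


warehouse_rows = [from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows]
```

After conversion every row has the same field names, the same gender coding and the same unit, so the rows can be analyzed together.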
Time-variant
Historical information is kept in a data warehouse.
For example, one can retrieve data from 3 months, 6 months, or 12 months ago, or even older data, from a data warehouse.
This contrasts with a transaction system, where often only the most current data is kept.
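The contrast can be sketched in a few lines: the operational store overwrites the current value, while the warehouse appends a dated row and can therefore answer "as of" questions. The names and values below are hypothetical.

```python
from datetime import date

operational_price = {}   # operational system: one current value per product
warehouse_prices = []    # warehouse: one dated row per observed value


def record_price(product, price, on):
    operational_price[product] = price              # overwrites the old value
    warehouse_prices.append((product, price, on))   # keeps the history


def price_as_of(product, when):
    """Latest warehouse price recorded on or before the given date."""
    known = [(on, price) for p, price, on in warehouse_prices
             if p == product and on <= when]
    return max(known)[1] if known else None


record_price("chair", 50.0, date(2023, 3, 1))
record_price("chair", 55.0, date(2023, 9, 1))
record_price("chair", 60.0, date(2024, 3, 1))
```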
Non-volatile
The data warehouse is a physically separate data store, built by transforming data from the source operational RDBMS.
Operational updates of data do not occur in the data warehouse, i.e., update, insert, and delete operations are not performed.
It usually requires only two procedures for data access: the initial loading of data and access to data.
Therefore, the DW does not require transaction processing, recovery, and concurrency capabilities, which allows for a substantial speedup of data retrieval.
Non-volatile means that, once entered into the warehouse, data should not change.
3) The structure of data warehouses is easier for end-users to navigate, understand, and query.
4) Queries that would be complex in many normalized databases can be easier to build and maintain in data warehouses.
5) Data warehousing is an efficient method to manage demand for lots of information from lots of users.
6) Data warehousing provides the capabilities to analyze a large amount of historical data.
An operational database includes detailed information used to run the day-to-day operations of the business.
The data changes frequently as updates are made and reflects the current value of the last transactions.
Operational database management systems, also called OLTP (Online Transaction Processing) databases, are used to
manage dynamic data in real time.
Data warehouse systems serve users or knowledge workers for the purpose of data analysis and decision-making.
Such systems can organize and present information in specific formats to accommodate the diverse needs of various users.
Data Warehouse and the OLTP database are both relational databases.
Operational database systems | Data warehousing systems
Operational systems are usually concerned with current data. | Data warehousing systems are usually concerned with historical data.
Data within operational systems are mainly updated regularly according to need. | Non-volatile: new data may be added regularly, but once added it is rarely changed.
It is designed for real-time business dealing and processes. | It is designed for analysis of business measures by subject area, categories, and attributes.
It is optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | It is optimized for extensive loads and high, complex, unpredictable queries that access many rows per table.
It is optimized for validation of incoming information during transactions, uses validation data tables. | Loaded with consistent, valid information, requires no real-time validation.
It supports thousands of concurrent clients. | It supports a few concurrent clients relative to OLTP.
Operational systems are widely process-oriented. | Data warehousing systems are widely subject-oriented.
Operational systems are usually optimized to perform fast inserts and updates of relatively small volumes of data. | Data warehousing systems are usually optimized to perform fast retrievals of relatively high volumes of data.
Information systems that rely on a traditional database are often called OLTP (on-line transaction processing)
systems.
Their function is to perform everyday operations: data modification and simple read operations.
A Data Warehouse, on the other hand, is the heart of an OLAP (on-line analytical processing) system.
Its function is to provide support to data analysis operations and decision-making processes.
Differences between OLTP and OLAP:
OLTP
customer-oriented (used by employees or by customers of the organization)
detailed data, often too detailed to be useful for decision making
developed starting from an ER diagram
current data
fast accesses, to be treated in an atomic way, which require control of the concurrency between the various user transactions
OLAP
business-oriented and used by managers, data analysts, etc.
synthetic and aggregated data
developed from star or snowflake diagrams
historical data
read-only but very complex queries
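The two workloads can be contrasted on a toy table with sqlite3. This is only a sketch with assumed table, column names and data; in practice the two kinds of work usually run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, year INTEGER, product TEXT, amount REAL);
    INSERT INTO sale VALUES
        (1, 2022, 'chair', 100.0), (2, 2022, 'table', 300.0),
        (3, 2023, 'chair', 150.0), (4, 2023, 'chair', 50.0);
""")

# OLTP-style work: a short atomic transaction that touches a single row.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE sale SET amount = 120.0 WHERE sale_id = 1")

# OLAP-style work: a read-only query that aggregates many rows.
revenue_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM sale GROUP BY year ORDER BY year").fetchall()
```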
Cube dimensions are the entities against which an organization wants to keep track of its data.
Example: a company can create a “Sales” Data Warehouse to record the company's sales, based
on the dimensions time, product, branch and customer.
In each position of the cube a fact is entered, i.e., the value assumed by the variable to be
analysed.
“Product units sold” and “Sales Revenues” are examples of facts.
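A data cube can be pictured as a mapping from one value per dimension to the facts stored in that position. The sketch below follows the "Sales" example with the dimensions time, product, branch and customer; the concrete keys and numbers are invented for illustration.

```python
# Each cell of the cube is addressed by one value per dimension.
# Key: (time, product, branch, customer); value: the facts stored in that cell.
sales_cube = {
    ("2024-Q1", "chair", "Rome", "C1"): {"units_sold": 4, "revenue": 200.0},
    ("2024-Q1", "table", "Rome", "C2"): {"units_sold": 1, "revenue": 300.0},
    ("2024-Q2", "chair", "Milan", "C1"): {"units_sold": 2, "revenue": 100.0},
}


def total(cube, fact, time=None, product=None, branch=None, customer=None):
    """Sum one fact over all cells matching the fixed dimension values."""
    wanted = (time, product, branch, customer)
    return sum(cell[fact] for key, cell in cube.items()
               if all(w is None or w == k for w, k in zip(wanted, key)))
```

For example, `total(sales_cube, "units_sold", product="chair")` adds up the chair cells across all periods, branches and customers.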
You obtain different cuboids depending on the dimensions that are chosen and the level of detail of each
dimension. For the period dimension, for example, you can choose a quarter as the level of detail (as done in the previous
slides), but also a single month or a semester.
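The idea that each choice of dimensions yields its own cuboid can be sketched as follows. With three dimensions there are 2**3 = 8 subsets, hence 8 cuboids, from the fully detailed one down to the single grand total. Dimension names and data are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("period", "product", "branch")
facts = [  # (period, product, branch, units_sold)
    ("Q1", "chair", "Rome", 4),
    ("Q1", "table", "Rome", 1),
    ("Q2", "chair", "Milan", 2),
]


def cuboid(rows, keep):
    """Aggregate units_sold, keeping only the dimensions named in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[-1]
    return dict(out)


# One cuboid per subset of the dimensions.
all_cuboids = {keep: cuboid(facts, keep)
               for n in range(len(DIMENSIONS) + 1)
               for keep in combinations(DIMENSIONS, n)}
```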
A concept hierarchy is a set of associations between detail concepts and gradually more aggregate concepts
that is associated with a dimension.
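A concept hierarchy can be sketched as a chain of mappings from each level to the next, more aggregate one. The example below assumes a location dimension with the levels city, region and country; the mappings and figures are hypothetical.

```python
from collections import Counter

# Hypothetical hierarchy for a location dimension: city, then region, then country.
CITY_TO_REGION = {"Rome": "Lazio", "Latina": "Lazio", "Milan": "Lombardy"}
REGION_TO_COUNTRY = {"Lazio": "Italy", "Lombardy": "Italy"}

sales_by_city = {"Rome": 10, "Latina": 5, "Milan": 8}


def roll_up(values, mapping):
    """Aggregate values from one hierarchy level to the next, coarser one."""
    totals = Counter()
    for member, value in values.items():
        totals[mapping[member]] += value
    return dict(totals)


sales_by_region = roll_up(sales_by_city, CITY_TO_REGION)
sales_by_country = roll_up(sales_by_region, REGION_TO_COUNTRY)
```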
STAR SCHEMA
A star schema is the elementary form of a dimensional model, in which data are organized into facts and dimensions. A fact is an
event that is counted or measured, such as a sale or a login. A dimension contains reference data about the fact, such as date,
item, or customer.
A star schema is a relational schema whose design represents a multidimensional data model. It is the
simplest data warehouse schema. It is known as a star schema because its entity-relationship diagram
resembles a star, with points diverging from a central table. The center of the schema consists of a large fact table, and
the points of the star are the dimension tables.
Fact Tables
A fact table is a table in a star schema that contains facts and is connected to dimensions. A fact table has two types of columns: those that
contain facts and those that are foreign keys to the dimension tables. The primary key of a fact table is generally a composite
key made up of all of its foreign keys.
A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are
often called summary tables instead). A fact table generally contains facts at the same level of aggregation.
Dimension Tables
A dimension is a structure usually composed of one or more hierarchies that categorize data. If a dimension has no
hierarchies or levels, it is called a flat dimension or list. The primary key of each dimension table is part of the
composite primary key of the fact table. Dimensional attributes help to describe the dimensional value. They are generally
descriptive, textual values. Dimension tables are usually smaller than the fact table.
For example, fact tables store data about sales, while dimension tables store data about the geographic region (markets, cities), clients, products, times, and
channels.
Advantages of Star Schema
Star schemas are easy for end-users and applications to understand and navigate. With a well-designed schema, users can
quickly analyze large, multidimensional data sets.
The main advantages of star schemas in a decision-support environment are:
Query Performance
Because a star schema database has a limited number of tables and clear join paths, queries run faster than they do against OLTP
systems. Small single-table queries, frequently of a dimension table, are almost instantaneous. Large join queries that involve
multiple tables take only seconds or minutes to run. In a star schema database design, the dimensions are connected only through
the central fact table. When two dimension tables are used in a query, only one join path, passing through the fact table, exists
between those two tables. This design feature helps produce accurate and consistent query results.
Load performance and administration
Structural simplicity also decreases the time required to load large batches of records into a star schema database. By defining
facts and dimensions and separating them into different tables, the impact of a load operation is reduced. Dimension tables can
be populated once and occasionally refreshed. We can add new facts regularly and selectively by appending records to a fact
table.
Built-in referential integrity
A star schema has referential integrity built in when information is loaded. Referential integrity is enforced because each record in
a dimension table has a unique primary key, and all keys in the fact table are legitimate foreign keys drawn from the dimension
tables. A record in the fact table that is not correctly related to a dimension cannot be given a valid key value and so cannot be retrieved.
Easily Understood
A star schema is simple to understand and navigate, with dimensions joined only through the fact table. These joins are more
meaningful to the end-user because they represent the fundamental relationships between parts of the underlying business.
Users can also browse dimension table attributes before constructing a query.
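The referential integrity point can be illustrated with sqlite3: when foreign keys are enforced, a fact row whose key does not exist in the dimension table is rejected at load time. Table names, column names and data are assumptions; note that SQLite only enforces foreign keys after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE sales (
        item_key   INTEGER NOT NULL REFERENCES item(item_key),
        units_sold INTEGER
    );
    INSERT INTO item VALUES (1, 'chair');
    INSERT INTO sales VALUES (1, 4);          -- valid: item 1 exists
""")

try:
    conn.execute("INSERT INTO sales VALUES (99, 1)")  # no item 99 in the dimension
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```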
Disadvantage of Star Schema
Some situations cannot be modeled by a star schema. For example, the relationship between users and bank accounts
cannot be described as a star schema, because the relationship between them is many-to-many.
Example: Suppose a star schema is composed of a fact table, SALES, and
several dimension tables connected to it for time, branch, item, and
geographic location.
The TIME table has columns for day, month, quarter, and year. The
ITEM table has columns for item_key, item_name, brand, type, and
supplier_type. The BRANCH table has columns for branch_key,
branch_name, and branch_type. The LOCATION table has columns of
geographic data, including street, city, state, and country.
In this scenario, the SALES table contains only four columns with IDs from the dimension tables TIME, ITEM, BRANCH, and LOCATION,
instead of four columns for TIME data, four columns for ITEM data, three columns for BRANCH data, and four columns for LOCATION data.
Thus, the size of the fact table is significantly reduced. When we need to change an item, we only need to make a single change in the
dimension table, instead of making many changes in the fact table.
We can create even more complex star schemas by normalizing a dimension table into several tables.
The normalized dimension table is called a Snowflake.
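The SALES example can be sketched with sqlite3. The dimension columns follow the text; the surrogate keys time_key and location_key, the table name time_dim (used instead of TIME) and the units_sold measure are assumptions added so that the sketch has something to aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                           quarter TEXT, year INTEGER);
    CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
    CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
    CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           state TEXT, country TEXT);
    CREATE TABLE sales (
        time_key     INTEGER REFERENCES time_dim(time_key),
        item_key     INTEGER REFERENCES item(item_key),
        branch_key   INTEGER REFERENCES branch(branch_key),
        location_key INTEGER REFERENCES location(location_key),
        units_sold   INTEGER,            -- measure (assumed for this sketch)
        PRIMARY KEY (time_key, item_key, branch_key, location_key)
    );
    INSERT INTO time_dim VALUES (1, 4, 3, 'Q1', 2024), (2, 5, 4, 'Q2', 2024);
    INSERT INTO item VALUES (1, 'chair', 'Acme', 'furniture', 'wholesale');
    INSERT INTO branch VALUES (1, 'Center', 'shop');
    INSERT INTO location VALUES (1, 'Via Roma 1', 'Rome', 'Lazio', 'Italy');
    INSERT INTO sales VALUES (1, 1, 1, 1, 4), (2, 1, 1, 1, 6);
""")

# Changing an item touches one dimension row, not every fact row.
conn.execute("UPDATE item SET brand = 'NewBrand' WHERE item_key = 1")

units_by_quarter = conn.execute("""
    SELECT t.quarter, i.brand, SUM(s.units_sold)
    FROM sales s
    JOIN time_dim t ON t.time_key = s.time_key
    JOIN item i ON i.item_key = s.item_key
    GROUP BY t.quarter, i.brand
    ORDER BY t.quarter
""").fetchall()
```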
SNOWFLAKE SCHEMA
A schema is known as a snowflake if one or more dimension tables do not connect directly to the fact table but must join through
other dimension tables.
The snowflake schema is an expansion of the star schema where each point of the star explodes into more points.
It is called a snowflake schema because its diagram resembles a snowflake.
Snowflaking is a method of normalizing the dimension tables in a star schema. When we normalize all the dimension tables entirely,
the resulting structure resembles a snowflake with the fact table in the middle.
Snowflaking is used to improve the performance of specific queries.
The schema is diagrammed with each fact surrounded by its associated dimensions, and those dimensions are related to other
dimensions, branching out into a snowflake pattern.
The snowflake schema consists of one fact table which is linked to many dimension tables, which can be linked to other dimension
tables through a many-to-one relationship.
Tables in a snowflake schema are generally normalized to the third normal form. Each dimension table represents exactly one level in a
hierarchy.
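A minimal sketch of snowflaking, assuming a location hierarchy split into one table per level (city and country). The fact table links to city only, and country is reached through city by a many-to-one link. All names and figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized location hierarchy: each table holds exactly one level.
    CREATE TABLE country (country_key INTEGER PRIMARY KEY, country_name TEXT);
    CREATE TABLE city (city_key INTEGER PRIMARY KEY, city_name TEXT,
                       country_key INTEGER REFERENCES country(country_key));
    CREATE TABLE sales (city_key INTEGER REFERENCES city(city_key), units_sold INTEGER);
    INSERT INTO country VALUES (1, 'Italy'), (2, 'France');
    INSERT INTO city VALUES (1, 'Rome', 1), (2, 'Milan', 1), (3, 'Paris', 2);
    INSERT INTO sales VALUES (1, 4), (2, 6), (3, 5);
""")

# The fact table reaches country only through city: one extra join
# compared with a star schema that stores the country name in the city dimension.
units_by_country = conn.execute("""
    SELECT co.country_name, SUM(s.units_sold)
    FROM sales s
    JOIN city ci ON ci.city_key = s.city_key
    JOIN country co ON co.country_key = ci.country_key
    GROUP BY co.country_name
    ORDER BY co.country_name
""").fetchall()
```

Each country name is stored once instead of once per city, which is the redundancy that normalization removes, at the price of the additional join.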
Advantage of Snowflake Schema
The primary advantage of the snowflake schema is the improvement in query performance due to minimized disk storage
requirements and joins on smaller lookup tables.
It provides greater scalability in the interrelationship between dimension levels and components.
There is no redundancy, so it is easier to maintain.
GALAXY SCHEMA
Data Warehouse
Exercises
Data Warehouse – Exercise 1
You want to create a data warehouse for a company that sells wholesale furniture.
The data warehouse must allow the company to analyze the company's revenues.
Costs and revenues must be analyzed considering the following parameters: furniture, customers, time (day level).
The company is interested in analyzing the furniture with respect to its type (tables, chairs, beds, wardrobes, etc.) and with
respect to its category (kitchen, living room, bedroom, bathroom, office, etc.).
The company wants to analyze customers with respect to their geographical location, considering at least city, region, state.
Data Warehouse – Exercise 1 (solution: star schema)
Furnitures (dimension): Furniture-k, Description, Style, Material, Type, Category
Customers (dimension): Customer-k, First name, Last name, Business name, Address, City, Region, Country
Time (dimension): Time-k, Date, Day of the week, Month, Quarter, Year
Sales (Fact Table): Furniture-k, Customer-k, Time-k, Amount, Total price, Discount
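The schema above can be tried out with sqlite3. The sketch keeps only a subset of the listed columns, uses lowercase names with underscores, names the time table time_dim, and loads invented sample rows; all of these are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE furniture (furniture_k INTEGER PRIMARY KEY, description TEXT,
                            type TEXT, category TEXT);
    CREATE TABLE customer (customer_k INTEGER PRIMARY KEY, business_name TEXT,
                           city TEXT, region TEXT, country TEXT);
    CREATE TABLE time_dim (time_k INTEGER PRIMARY KEY, date TEXT, month INTEGER,
                           quarter INTEGER, year INTEGER);
    CREATE TABLE sales (
        furniture_k INTEGER REFERENCES furniture(furniture_k),
        customer_k  INTEGER REFERENCES customer(customer_k),
        time_k      INTEGER REFERENCES time_dim(time_k),
        amount INTEGER, total_price REAL, discount REAL
    );
    INSERT INTO furniture VALUES (1, 'Oak table', 'table', 'kitchen'),
                                 (2, 'Desk chair', 'chair', 'office');
    INSERT INTO customer VALUES (1, 'Rossi Srl', 'Rome', 'Lazio', 'Italy'),
                                (2, 'Bianchi Spa', 'Milan', 'Lombardy', 'Italy');
    INSERT INTO time_dim VALUES (1, '2024-03-04', 3, 1, 2024);
    INSERT INTO sales VALUES (1, 1, 1, 2, 800.0, 0.0), (2, 1, 1, 10, 500.0, 50.0),
                             (2, 2, 1, 4, 220.0, 0.0);
""")

# Revenue analyzed by furniture category and customer region.
revenue_by_category_region = conn.execute("""
    SELECT f.category, c.region, SUM(s.total_price)
    FROM sales s
    JOIN furniture f ON f.furniture_k = s.furniture_k
    JOIN customer c ON c.customer_k = s.customer_k
    GROUP BY f.category, c.region
    ORDER BY f.category, c.region
""").fetchall()
```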
Data Warehouse – Exercise 2
Starting from the following E/R diagram, let's build the Data Warehouse to analyze library loans:
[E/R diagram: entities LITERARY WORK and User linked by the LOAN relationship, with cardinalities (1,N) and (1,1); attribute labels: work code, author, ISBN code, edition, year, editor id, editor name]
User
• User Code
• Username
• User Last Name
• Date of birth
• User Tax Code
• User address
• User City
• User Region
• User Country
Author
• Author Code
• Author first name
• Author last name
• Date of birth
Loan
• Loan date
• Return date
• User Code
• Serie code
• Author Code
• Work code
Literary series
• Serie code
• Serie name
• Serie year
• Editor name
• Edition code
• Edition year
Literary work
• Work code
• Work title
• Year of publication
• Author code
• Serie code
• ISBN code
• Edition code
• Edition year
ERA Diagrams
Homework: 24 March 2024