
INTRODUZIONE AL DIRITTO E ALLE TECNOLOGIE DIGITALI (INTRODUCTION TO LAW AND DIGITAL TECHNOLOGIES)

INTRODUCTION TO DIGITAL TECHNOLOGY

Giuseppe Conigliaro
Chief Innovation Officer Humanativa Group
CEO HN Digee

SCIENZE GIURIDICHE PER LE NUOVE TECNOLOGIE (Legal Sciences for New Technologies)


SCIENZE GIURIDICHE PER LE NUOVE TECNOLOGIE - Introduction to Digital Technologies

Educational Objectives

Technology is pervasive in our daily experience, but we must never forget that it is a tool at our service. We are surrounded by data, applications and digital services of every kind, and we ourselves, more or less consciously, use these new digital technologies every day.
Technology makes it possible to record, store and analyze ever-growing quantities of data; to search, book and pay for goods and services; to manage relations with the public administration; to express our opinion on the services and goods we use; to access virtual realities; to consume entertainment services whenever we want and wherever we are; and to communicate, for work or pleasure, with anyone wherever they are.
This raises new problems of control, quality, reliability, certification and security of the digital platforms on which we operate.
The boundaries between the social and the private spheres are increasingly blurred and uncertain: data can make us freer and more aware, or more vulnerable and easier to manipulate.
Alongside technology, we need to develop the ability to search, integrate, process, imagine and understand.
And together with all this, we need ethical awareness and social responsibility.
In this context, great attention is paid to Artificial Intelligence: how it can be applied in different fields and how its applications are changing the world. In recent years Machine Learning has found wide application, for example in healthcare.
In this course we will address these issues with an approach in which the questions we ask ourselves will be more important than the answers we find, together or individually.
Lessons Time Table
date time topics
Monday 4 March 2024 18-20 New Paradigms for Digital Architectures
Tuesday 5 March 2024 18-20 New Paradigms for Data Architectures
Wednesday 6 March 2024 18-20 Emerging Issues - Data Privacy/Data Governance
Monday 11 March 2024 18-20 GDPR - Regulations & Roles
Tuesday 12 March 2024 18-20 Data Analysis & Design: W6H
Wednesday 13 March 2024 18-20 Data Modeling - ER Model
Monday 18 March 2024 18-20 ER Model: Identification Keys - Hierarchies
Tuesday 19 March 2024 18-20 ER Model: Exercises on ER Diagrams
Wednesday 20 March 2024 18-20 ER Model: Exercises on Transforming ERD into Tables
Monday 25 March 2024 18-20 Data Warehouse-Multidimensional Model
Tuesday 26 March 2024 18-20 Data Warehouse-Star Schema, Snowflake Schema, Galaxy Schemas
Wednesday 27 March 2024 18-20 Data Warehouse-Exercises on Star/Snowflake/Galaxy Schemas
Monday 8 April 2024 18-20 Big Data: Definition; NoSQL models
Tuesday 9 April 2024 18-20 Artificial Intelligence
Wednesday 10 April 2024 18-20 Machine Learning
Monday 15 April 2024 18-20 Test on the first part of the course
Tuesday 16 April 2024 18-20 Presentation and discussion of the papers on the first part of the course
Wednesday 17 April 2024 18-20 Deep Learning & Neural Networks
Monday 22 April 2024 18-20 Deep Learning & Neural Networks
Tuesday 23 April 2024 18-20 Regulation on Artificial Intelligence
Wednesday 24 April 2024 18-20 BlockChain: Cryptocurrencies, Smart Contracts and Certification
Monday 6 May 2024 18-20 Digital Identity: Definition and Regulations
Tuesday 7 May 2024 18-20 Digital Divide: Definition and Regulations
Wednesday 8 May 2024 18-20 The Industry 4.0 national plan: Objectives and enabling technologies
Monday 13 May 2024 18-20 Digital Era: social and behavioral implications of new technologies
Tuesday 14 May 2024 18-20 ChatGPT: Ethical, Social and Economic Implications
Wednesday 15 May 2024 18-20 Test on the second part of the course
Monday 20 May 2024 18-20 Final Discussion about the topics of the course
New Paradigms for Information Technology

[Figure: the evolution of IT from the 20th to the 21st century in three stages (IT Stone Age, IT Bronze Age, IT Iron Age), along two rows: New Paradigms for IT Infrastructures Management and New Paradigms for Application Governance. Elements shown include Human Powered IT, Automation, Structured Data, Custom Applications, Local IT Infrastructures, Development vs Operations, Internet, Web 2.0, Social Networks, Multimedia Data, Unstructured Data, NoSQL Databases, Service Oriented Architectures, Mobile Technologies & Apps, Multidevice User Experience, Cloud (IaaS, PaaS, SaaS), Smart Connected Objects (IoT – IoE), Auto Generated Data, Big Data, Artificial Intelligence, Machine Learning, Deep Learning & Neural Networks, Micro Services Architectures, DevOps, DaaS and SW Powered IT. Big issues: Data Quality, Data Governance, Data Protection, Cyber Security, Ethics.]

FROM THE DEVELOPMENT OF CUSTOM APPLICATIONS TO THE "COMPOSITION OF SERVICES" ON CLOUD ARCHITECTURES
FROM SYSTEM INTEGRATOR TO IT SERVICES ORCHESTRATOR
The current scenario:
• The development of custom IT services through massive coding has decreasing margins, partly because of shrinking demand for these services and the collapse of unit prices per person-day or per Function Point (the traditional standard metric for measuring the size of software applications);
• the management of large custom application assets is an activity with very low or no margins and a high risk of failure;
• the management of local "human-intensive" data and processing centers involves very high operating costs and continuous revamping investments, with inadequate scalability relative to market demand for IT services.

Consequently:
• The professions of programmer, system administrator, network expert or storage manager are no longer strategic in mature and evolved IT markets, and can in any case be sourced cheaply through nearshoring or offshoring;
• The emerging and dominant profile is that of the solution architect, who, working closely with the business, understands the core processes and IT service needs and packages a flexible, scalable solution by composing elementary services offered "as a service" by global technology platforms (on cloud).

https://aws.amazon.com/it/about-aws/global-infrastructure/?p=ngi&loc=1
https://cloud.google.com/about/locations?hl=it#regions

Test: pick some element of digital technology innovation and relate it to a digital opportunity, then relate these opportunities to some emerging threat, and then relate regulatory acts to the management of these threats.

Technological innovations enable opportunities; opportunities bring threats; laws and regulations manage these threats.

Technological innovations: Internet, Mobile, Cloud, Wide Band – 4G/5G, Big Data, ML & DL, BlockChain, High Performance Computers (processing and storage), …
Opportunities: Data Intelligence, Business Process Automation, Decision Support Systems, Digital Market, Digital Health & Care, Digital Citizenship, Digital Services for Citizens & Companies, Digital Payments, Smart Contracts, Artificial Intelligence Services & Applications, Homeland Security, Citizens Security, Cyber Security, ...
Threats: Digital identity theft, Digital Divide, Digital Frauds, Arbitrary Profiling of Citizens, Hacker Attacks, ...
Laws & Regulations: GDPR, AI Regulation, Digital Identity Regulation, Cryptocurrency Regulation, Smart Contracts Regulation, …

TOP TECHNOLOGIES THAT ENABLE DIGITAL TRANSFORMATION


1. Internet
2. Cloud technology
3. Mobile
4. Big data
5. 5G
6. Internet of Things (IoT)
7. Artificial intelligence (AI) and machine learning
8. Augmented reality (AR) and virtual reality (VR)
9. Digital twins
10. Blockchain
11. Robotic process automation (RPA)

INTERNET
The internet can be described as a vast network of interconnected computers and devices spanning the globe, facilitating communication, information
exchange, and resource sharing. It is a dynamic and ever-expanding network that enables users to access a wealth of content and services, including
websites, email, social media, online shopping, streaming media, and more.
At its core, the internet operates on a decentralized system of interconnected networks, each comprising servers, routers, switches, and other
infrastructure components. These networks use standardized protocols and technologies, such as TCP/IP (Transmission Control Protocol/Internet
Protocol), to ensure seamless communication between devices regardless of their location or type.
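To make the role of TCP/IP concrete, here is a minimal Python sketch that opens a TCP connection over the loopback interface (127.0.0.1) of a single machine and gets a message echoed back. It is an illustration under simplifying assumptions, not a description of how any real internet service works: the helper names `echo_once` and `tcp_round_trip` are hypothetical, and both ends run on the same computer rather than across separate networks.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept a single TCP connection and send back whatever it receives."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def tcp_round_trip(message: bytes) -> bytes:
    """Send `message` over a TCP/IP connection on the loopback interface and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 asks the operating system to pick any free port.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server.settimeout(5)
        port = server.getsockname()[1]

        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()

        # The client only needs an IP address and a port: IP takes care of
        # addressing and routing, TCP of ordered, reliable delivery.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)

        worker.join()
    return reply


if __name__ == "__main__":
    print(tcp_round_trip(b"hello, internet"))
```

The same client code would work unchanged against a server on another continent, which is the point of standardized protocols: only the address changes.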
The internet has revolutionized the way we live, work, and interact with one another. It has democratized access to information, enabling individuals to
educate themselves, express their opinions, and engage in public discourse. Additionally, the internet has transformed industries, from commerce and
entertainment to healthcare and education, by providing new avenues for innovation, collaboration, and efficiency.
However, the internet also presents challenges and risks, including privacy concerns, cybersecurity threats, misinformation, and digital divide issues.
As society continues to rely more on the internet for everyday tasks and activities, it becomes increasingly important to address these challenges
while leveraging the internet's potential for positive impact and global connectivity.

CLOUD TECHNOLOGY
If you have a free Gmail account or regularly back up your phone photos to a Google Drive or iCloud storage account, you’re already using cloud
services in your day-to-day life.
While these examples of digital transformation with the cloud are on the more basic end of the spectrum, businesses can leverage the full power of cloud computing in multiple ways:
• Backing up copies of big data
• Hosting websites
• Deploying software infrastructure to a global workforce
• Giving team members remote access to essential programs
While we may use the word “cloud” to describe where this data is located, it does have a terrestrial presence.
Cloud-based data exists on one or more servers separate from your location.
Your Gmail messages, for example, might currently be in a Google data center in Henderson, Nevada, or Middenmeer, Netherlands!
Keeping your data in the cloud, spread across different data centers, is beneficial for safety.
If all of your data is in one location, such as on a server in your building, and that server is damaged, your business operations may go down.
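The safety benefit of spreading data across data centers can be put into numbers. Assuming, hypothetically, that each data center is reachable with the same probability p and that outages are independent, the chance that at least one of n copies is reachable is 1 - (1 - p)^n. The Python function below (its name is illustrative) is a minimal sketch of that calculation; real outages are often correlated, so treat the result as optimistic rather than as a guarantee.

```python
def combined_availability(site_availability: float, replicas: int) -> float:
    """Probability that at least one replica is reachable.

    Assumes every data center is up with the same probability and that
    failures are independent (an idealisation: real outages can be correlated).
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    probability_all_down = (1.0 - site_availability) ** replicas
    return 1.0 - probability_all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(combined_availability(0.99, n), 6))
```

With p = 0.99, a single site is down about 1% of the time, while two independent sites are both down only about 0.01% of the time.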

MOBILE
The first tool in digital transformation is one we already hold in our hands and use daily: mobile devices.
The rise of mobile technology is a cornerstone of the digital age.
It’s already helped businesses around the world with tasks that range from basic (answering emails while on vacation) to complex (getting medical
equipment into mobile clinics and remote locations).
Each use of mobile technology in business will look different from the next based on industries and needs.
Depending on your business’s focus, you might use mobile technology to:
• Enable team members to call into staff meetings while working remotely
• Use voice over internet protocol (VoIP) phone systems that keep you connected to your office line
• Track patients’ vital signs with wearable sensors
• Connect healthcare providers and specialists in remote locations
Unlocking the full benefits of mobile technology does require a few other pieces of tech:
• cloud storage
• high-speed mobile data

BIG DATA
Big data refers to large volumes of raw data that must be processed and interpreted in order to glean useful results.
Big data collection happens thanks to the Internet of Things, telecommunications, website trackers, social networks, and any other place where
people or systems interact with your business.
Big data is typically stored in databases.
While this itself isn’t new (companies have been analyzing data since the early days of computing), the ways in which it is processed are changing.
Many companies are now turning to artificial intelligence (AI) and machine learning to comb through their big data and use the results in the
development of business strategies.
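A classic way to spread such processing over many machines is the MapReduce pattern: each raw record is first mapped to partial results, which are then reduced into one overall result. The sketch below shows the pattern in plain Python on a few hypothetical customer-feedback records, counting how often each word appears. It runs sequentially in one process, whereas real big data platforms distribute the map and reduce steps across a cluster.

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def map_phase(record: str) -> Counter:
    """Turn one raw record into partial word counts (the 'map' step)."""
    return Counter(word.lower() for word in record.split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Merge two partial results into one (the 'reduce' step)."""
    return left + right


def count_terms(records: Iterable[str]) -> Counter:
    # On a real platform the map calls would run in parallel on many machines;
    # here they run one after another in a single process.
    return reduce(reduce_phase, map(map_phase, records), Counter())


if __name__ == "__main__":
    feedback = ["late delivery", "Late payment", "delivery ok"]
    print(count_terms(feedback).most_common(2))
```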

5G
5G data is ultra-fast, with low latency. This means that you can transmit large quantities of data with virtually no lag.
Think back to the last time you were on a video call and the video or audio started breaking up. That’s the effect of latency and limited data transmission speeds. On a 5G connection, you’re likely to see fewer of those problems.
5G’s benefits go beyond improving video calls for remote work, though. The low latency of 5G connections makes the technology well suited for powering the Internet of Things, allowing for better transmission of data, metrics, video footage, and more.
At the time of writing, 5G networks are still rolling out across the world.
This chart from OpenSignal shows the percentage of time users were able to access 5G signals in different countries as of May 2023.
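The interplay between latency and transmission speed in the video-call example can be estimated with simple arithmetic: delivery time ≈ latency + data size / throughput. The Python sketch below applies that approximation. The figures in the usage lines are illustrative assumptions, not measurements of 4G or 5G networks, and the model ignores protocol overhead, congestion and retransmissions.

```python
def transfer_time_ms(size_megabits: float, throughput_mbps: float, latency_ms: float) -> float:
    """Rough time to deliver a payload: latency plus the time to push the bits onto the link.

    Ignores protocol overhead, congestion and retransmissions.
    """
    if throughput_mbps <= 0:
        raise ValueError("throughput must be positive")
    serialisation_ms = size_megabits / throughput_mbps * 1000.0
    return latency_ms + serialisation_ms


if __name__ == "__main__":
    # Illustrative figures only, not measurements of any real network.
    frame = 2.0  # a hypothetical 2-megabit video frame
    print("slower link:", transfer_time_ms(frame, throughput_mbps=20, latency_ms=50))
    print("faster link:", transfer_time_ms(frame, throughput_mbps=200, latency_ms=10))
```

With these assumed numbers the frame takes about 150 ms on the slower link and about 20 ms on the faster one; both the latency term and the throughput term matter.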

INTERNET OF THINGS (IoT)


The Internet of Things (IoT) is the array of sensors, components, machines, and other equipment that’s connected to the internet.
The IoT can encompass everything from your home refrigerator to heat sensors on a production line.
While the Internet of Things is already active, with use across supply chains, manufacturing, property management, and more, it’ll become even easier
and faster to adopt IoT technologies as 5G access expands.
Thanks to 5G’s low latency, you’ll be able to use the IoT to monitor office temperatures, watch security camera footage, and even control equipment in
near real-time.
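A minimal sketch of the monitoring idea, in Python: readings from hypothetical temperature sensors are checked against an acceptable range, and out-of-range sensors are flagged. Sensor IDs, thresholds and the `Reading` type are assumptions for illustration; how the readings travel from the devices over the network is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    sensor_id: str
    celsius: float


def out_of_range(readings: List[Reading], low: float, high: float) -> List[str]:
    """Return the IDs of sensors whose reading falls outside [low, high]."""
    return [r.sensor_id for r in readings if not (low <= r.celsius <= high)]


if __name__ == "__main__":
    batch = [Reading("office-1", 21.5), Reading("office-2", 27.0), Reading("line-3", 15.0)]
    print(out_of_range(batch, low=18.0, high=25.0))
```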

ARTIFICIAL INTELLIGENCE (AI) AND MACHINE LEARNING


Artificial intelligence (AI) and machine learning are used across many industries, including in ways you may not realize.
If you’ve ever clicked on a suggested product while shopping on Amazon, read a suggested post on LinkedIn, or engaged with a customer service
chatbot when contacting your cell phone provider, you’ve probably interacted with AI.
While AI and machine learning aren’t the same thing, they often go hand in hand.
Machine learning uses existing data points to train computer models that can then carry out actions and make decisions on new data.
As such, machine learning can be used to create and optimize artificial intelligence processes.
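To show what "training" on existing data points means in the simplest possible case, the Python sketch below fits a straight line to example data with ordinary least squares, then uses the learned slope and intercept to make a prediction for a new input. It is a deliberately small stand-in for machine learning in general: the function names and data are hypothetical, and real systems use far richer models and dedicated libraries.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept (the 'training' step)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Tuple[float, float], x: float) -> float:
    """Apply the learned parameters to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept
```

For example, `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` learns a slope of 2 and an intercept of 1, so `predict` returns 21 for an input of 10.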
AI is often used to boost efficiency, improve process automation, speed up service deliveries, transform customer insights into business decisions, and
identify new business opportunities.
McKinsey researchers expect that AI will continue to become more commonplace across software and electronics industries and will be used to
support healthcare and pharmaceutical services in increasingly complicated ways.

AUGMENTED REALITY (AR) AND VIRTUAL REALITY (VR)


Augmented reality (AR) and virtual reality (VR) experiences use technology to create 3D simulations that feel immersive.
The application of AR and VR in business can range from modeling a new building or product in its intended location to holding full business meetings
around a virtual conference table.
The retailer Target (Target.com) is a great example of a business using AR to enhance its operations.
When using the company’s mobile app, Target shoppers can view digital models of products within their living space.
As adoption of the required VR technology, such as headsets, expands, you may find that holding meetings in the metaverse is an appealing way to collaborate with your distributed team.

TM & © 2023 Target Brands, Inc.



DIGITAL TWINS
Digital twins are data-driven digital models that replicate real-world people, places, systems, and processes. In business, you can use a digital twin to do things like:
• Model the impact of changes on a production line
• Predict customers’ buying habits and adjust inventory levels
• Test different office building layouts for productivity
• Model the impact of launching a new operating model
• Project how your new products may be received by your market
• Determine when customers prefer to receive important information
• Try out new business processes without risk
The biggest benefit of a digital twin is that it can be used to forecast and predict potential problems. If you can use a digital twin to predict what
might go wrong—and mitigate the problem before it can happen in the real world—you can save resources, time, and money.
The technologies that power digital twins also help to create augmented and virtual reality tools, for example if you want to walk around a business metaverse.
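The "model the impact of changes on a production line" use case can be illustrated with a toy twin. The Python sketch below represents a serial production line only by the hourly capacity of each station, assumes (as a simplification) that the slowest station sets the pace, and lets you test an upgrade on the model without touching the real configuration. Station rates and function names are hypothetical; a real digital twin would be fed with live data and would model buffers, breakdowns and variability.

```python
from typing import List


def line_throughput(station_rates: List[float]) -> float:
    """Units per hour a serial line can sustain: the slowest station sets the pace."""
    if not station_rates:
        raise ValueError("a line needs at least one station")
    return min(station_rates)


def simulate_upgrade(station_rates: List[float], station: int, new_rate: float) -> float:
    """Try a change on the model instead of the real line and return the new throughput."""
    candidate = list(station_rates)  # copy: the 'real' configuration stays untouched
    candidate[station] = new_rate
    return line_throughput(candidate)
```

With stations running at 10, 8 and 12 units per hour, the model predicts 8 units per hour; upgrading the second station to 11 raises the prediction to 10, now limited by the first station.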

BLOCKCHAIN
A blockchain is a digital ledger, or rows of data, shared across (and accessible by) a computer network.
When you add to a blockchain, one of the computers in the blockchain network writes data, called a block, to the ledger.
Because each block also stores a cryptographic hash of the previous block, an entry on the ledger can’t be edited once created without breaking the entire chain of information, so the blockchain becomes a reliable record of transactions and events.
Publicly accessible blockchains are used to record the exchange of digital currency, like Bitcoin, and the sale of digital goods like non-fungible tokens
(NFTs).
When used privately within your organization, though, a blockchain can become a verifiable record of approvals, contracts, data entries, and more.
The blockchain can also be used to store an organization’s website and app data, track inventory through supply chains, or manage information about
international flights.
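The hash-linking that makes a ledger entry tamper-evident can be sketched in a few lines of Python using SHA-256 from the standard library. This is a simplified illustration, not how Bitcoin or any specific blockchain is implemented: there is no network, no consensus mechanism and no digital signature, and in this sketch the newest block is only protected once another block is added after it.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREVIOUS_HASH = "0" * 64  # placeholder for the first block, which has no predecessor


def block_hash(block: Dict) -> str:
    """SHA-256 of the block's contents, serialised deterministically (sorted keys)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict], data: str) -> None:
    """Add a new block that records the hash of the block before it."""
    previous = block_hash(chain[-1]) if chain else GENESIS_PREVIOUS_HASH
    chain.append({"index": len(chain), "data": data, "previous_hash": previous})


def is_valid(chain: List[Dict]) -> bool:
    """Check that every block still points to the exact hash of its predecessor."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


if __name__ == "__main__":
    ledger: List[Dict] = []
    for entry in ("contract approved", "invoice 17 paid", "goods shipped"):
        append_block(ledger, entry)
    print(is_valid(ledger))  # True
    ledger[1]["data"] = "invoice 17 cancelled"  # edit an earlier entry
    print(is_valid(ledger))  # False: block 2 no longer matches block 1's hash
```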

ROBOTIC PROCESS AUTOMATION (RPA)


Robotic process automation is an offshoot of machine learning and artificial intelligence. When you deploy robotic process automation, you’re actually training software to make the same types of interactions that a human would. The actions and responsibilities of an RPA bot can go beyond those of a more simplistic AI.
You might train an RPA bot to execute specific keystrokes as if it were typing, read what’s on a screen and make decisions, or engage with other RPA bots and even humans.
In 2018, Google rolled out a consumer-focused RPA called Duplex, now part of Google Assistant. The company billed Duplex as an AI assistant for
placing routine calls, such as making reservations, that a person may prefer to outsource to technology. Google has since used humans to help train
Duplex—and intervene when necessary—to make the program more independent.
Companies interested in using RPA for their digital business transformations may want to use these software robots to:
• Help ease the burden placed on customer service agents
• Complete paperwork
• Evaluate data from sensors on production lines
• Respond to social media messages
• Execute data-heavy tasks
New Paradigms for Data Intelligence

FROM SEQUENTIAL OR TABULAR DATA TO UNSTRUCTURED AND HETEROGENEOUS DATA


FROM LOCAL DATABASES TO DISTRIBUTED DATA ARCHITECTURES (LOCAL, CLOUD OR HYBRID ENVIRONMENTS)
The current scenario:
• The development of proprietary databases built vertically around individual IT applications has decreasing margins, partly because of shrinking demand for these services and the collapse of unit prices per person-day;
• running custom application data ecosystems on local data centers is an activity with very low margins and a high risk of failure;
• managing local, human-intensive data centers involves very high operating costs and continuous revamping investments, with inadequate scalability relative to market demand.
Consequently:
• The professions of the traditional data analyst, the SQL programmer, the ETL developer, even the DW/BI expert and the online reporting expert are no longer strategic in the new Data Intelligence scenarios of mature and evolved markets, and can be sourced cheaply through nearshoring or offshoring;
• The emerging and dominant profiles are those of the data architect, the data scientist and the data solution architect, who, working closely with experts in the application domain, understand the core processes and the needs for analysis, processing, exploration, research and exploitation of information, and package a flexible and scalable solution by composing elementary services offered "as a service" by global technology platforms (on cloud).
Data, Information and Decision
DATA
The recording of the value taken, at a given moment, by a variable, a phenomenon, an entity or an event
Data is certainly NEUTRAL

INFORMATION
An element of knowledge deriving from observation, from deductive or inductive analysis, or from the analysis of relationships/correlations on a set of measurable or non-measurable data
The concept of information is closely linked to the concept of communication
Information, as the result of more or less complex analyses based on implicit or explicit interpretative models, may contain elements of subjective evaluation
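A small Python sketch can illustrate the difference between neutral data and information: the readings are simply recorded values, while the summary depends on an interpretative model whose threshold is a human choice. The body-temperature readings, the threshold values and the function name are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def to_information(readings: List[float], alert_threshold: float) -> Dict[str, object]:
    """Derive information from raw data using an explicit interpretative model.

    The readings are just recorded values; the threshold is a choice made by
    whoever builds the model, so the 'alert' verdict carries an element of
    subjective evaluation.
    """
    average = mean(readings)
    return {"average": average, "alert": average > alert_threshold}


if __name__ == "__main__":
    data = [36.5, 37.0, 38.5]  # hypothetical body-temperature readings
    print(to_information(data, alert_threshold=37.0))
    print(to_information(data, alert_threshold=38.0))
```

The same readings produce an alert with a threshold of 37.0 but not with 38.0: the data has not changed, only the interpretation.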
DECISION
A definitive judgement that overcomes any pre-existing doubts and uncertainties and determines a choice, an action and its consequences.
Decisions are arbitrary, but they involve taking responsibility for their consequences.
Decisions are influenced by data, information and opinions.
In complex systems it is impossible to foresee all the possible consequences, direct and indirect, of a decision that affects the system.
What can be done is to use data and information to reasonably reduce the uncertainty about the effects of a decision on the system.
No decision is neutral with respect to the opinions of the decision maker, and there are no objectively right decisions, unless moral principles are invoked.
But one can aspire to rational decisions, based on neutral data, on partially neutral information and on arbitrary opinions.
Data Value

WE NEED DATA FOR KNOWLEDGE
WE NEED KNOWLEDGE TO DECIDE IN A CONSCIOUS WAY
DECISIONS HAVE EFFECTS OVER TIME, WHICH MUST BE MEASURED AND EVALUATED
[Figure: PDCA (Plan-Do-Check-Act) cycle]
[Figure: Experience + Knowledge + Wisdom]

Emerging Issues:
• Data Quality
• Data Governance
• Data Protection
Emerging Issues: Data Quality
MAIN ASPECTS OF DATA QUALITY
• UNIQUENESS: the data referring to a specific event are recorded only once within the database
• COMPLETENESS: the percentage of measured/recorded data compared to the total measurable/recordable data
• TIMELINESS: the recorded data must refer to a moment in time as close as possible to the time the data itself is used
• VALIDITY: data is valid if it conforms to the syntax (format, type, time reference, etc.) expected for what it represents
• ACCURACY: the degree to which the data correctly describes the "real world" object or event it refers to
• CONSISTENCY: the data must be meaningful and effectively usable
• OWNERSHIP: each data item must have a single owner, i.e. someone who has the authority to modify the information it contains
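Several of these aspects can be measured. The Python sketch below computes uniqueness, completeness and validity as ratios between 0 and 1 for a list of records. The field names, the choice of a YYYY-MM-DD date format as the validity rule, and the exact formulas are assumptions for illustration, not a standard definition.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical validity rule for this sketch: dates must be written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")

Record = Dict[str, Optional[str]]


def quality_report(records: List[Record], key: str, fields: List[str], date_field: str) -> Dict[str, float]:
    """Return uniqueness, completeness and validity as ratios between 0 and 1."""
    if not records or not fields:
        raise ValueError("need at least one record and one field")
    total = len(records)

    # Uniqueness: share of records whose key value occurs exactly once.
    key_counts = Counter(r.get(key) for r in records)
    uniqueness = sum(1 for r in records if key_counts[r.get(key)] == 1) / total

    # Completeness: recorded values versus all recordable values (every field of every record).
    cells = [r.get(f) for r in records for f in fields]
    completeness = sum(1 for v in cells if v not in (None, "")) / len(cells)

    # Validity: share of recorded dates that match the expected syntax.
    dates = [r[date_field] for r in records if r.get(date_field)]
    validity = sum(1 for d in dates if DATE_PATTERN.match(d)) / len(dates) if dates else 1.0

    return {"uniqueness": uniqueness, "completeness": completeness, "validity": validity}
```

For instance, four records in which one identifier appears twice, one name is empty, one date is missing and one date is written as 01/05/1980 would score 0.5 for uniqueness, 10/12 for completeness and 2/3 for validity.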
Emerging Issues: Data Governance
DATA MANAGEMENT & GOVERNANCE PROCESSES(*)
• DATA DICTIONARY: detailed descriptions of the physical structure of the data (type, length, technical name, format, etc.)
• DATA CATALOG: a single reference source for locating any data set relevant to the application context
• BUSINESS GLOSSARY: the set of terms used within an organization to refer to its business; it must be easily understandable at all levels of the organization and must define what each term means from a business perspective
• METADATA MANAGEMENT: the process of managing metadata (information about data), i.e. the business and context information about an organization's data
• MASTER DATA MANAGEMENT: systematizing, integrating, standardizing and sharing the master data relevant to the application context
• DATA LINEAGE: a methodology for managing the life cycle of data, tracing where it comes from, what transformations it undergoes, and how and where it is used
• DATA SECURITY: the process of protecting data from unauthorized access and damage throughout its lifecycle
• DATA STEWARDSHIP: the process that ensures the existence of documented procedures and guidelines for accessing and using data
(*) Data Governance establishes policies and procedures around data, while Data Management enacts those policies and procedures to compile and use that data for decision-making (Tableau Software).
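Data lineage can be supported in code by letting each data set carry a record of where it came from and which transformations produced it. The Python sketch below is one hypothetical way to do this; the class name, the source file name and the transformation are invented for illustration, and dedicated lineage tools capture this information automatically across whole pipelines.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TrackedDataset:
    """A data set that carries its own lineage: its origin plus every transformation applied."""
    rows: List[dict]
    source: str
    lineage: List[str] = field(default_factory=list)

    def transform(self, description: str, fn: Callable[[List[dict]], List[dict]]) -> "TrackedDataset":
        # Return a new data set instead of mutating this one, so every step stays traceable.
        return TrackedDataset(fn(self.rows), self.source, self.lineage + [description])

    def history(self) -> List[str]:
        """Where the data comes from, followed by the transformations it has undergone."""
        return [f"source: {self.source}"] + self.lineage


if __name__ == "__main__":
    raw = TrackedDataset([{"amount": 10}, {"amount": -3}], source="crm_export.csv")
    clean = raw.transform("drop negative amounts", lambda rows: [r for r in rows if r["amount"] >= 0])
    print(clean.history())
```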
Emerging Issues: Data Privacy
REGULATION (EU) 2016/679 (GENERAL DATA PROTECTION REGULATION)
Key GDPR roles:
• Controller
• Processor
• Data Protection Officer (DPO)
• Supervisory Authority

NEXT WEEK - GDPR (GENERAL DATA PROTECTION REGULATION)


INTRODUZIONE AL DIRITTO E ALLE TECNOLOGIE DIGITALI
INTRODUCTION TO DIGITAL TECHNOLOGY

Giuseppe Conigliaro
Chief Innovation Officer Humanativa Group
CEO HN Digee

SCIENZE GIURIDICHE PER LE NUOVE


SCIENZE GIURIDICHE PER LE NUOVE Introduction to Digital Technologies
TECNOLOGIE
Lessons Time Table
This week
SCIENZE GIURIDICHE PER LE NUOVE Introduction to Digital Technologies
TECNOLOGIE
Data, Information and Decision

Emerging Issues:
 Data Quality
 Data Governance
 Data Protection
SCIENZE GIURIDICHE PER LE NUOVE Introduction to Digital Technologies
TECNOLOGIE
Emerging Issues: Data Privacy
REGULATION (EU) 2016/679 (GENERAL DATA PROTECTION REGULATION)
Key GDPR roles:
• Controller
• Processor
• Data Protection Officer (DPO)
• Supervisory Authority
Emerging Issues: Data Protection
Protection of personal data
In January 2012, the European Commission proposed a comprehensive reform of data protection rules in the EU. On 4 May 2016, the official texts of the Regulation and the Directive were published in the EU Official Journal in all the official languages. The Regulation entered into force on 24 May 2016 and has applied since 25 May 2018. The Directive entered into force on 5 May 2016, and EU Member States had to transpose it into their national law by 6 May 2018.

Objective of this new set of rules about Personal Data protection


The objective of this new set of rules is to give citizens back control over their personal data and to simplify the regulatory environment for business. The data protection reform is a key enabler of the Digital Single Market, which the Commission has prioritised. The reform allows European citizens and businesses to fully benefit from the digital economy.
Whenever you open a bank account, join a social networking website or book a flight online, you hand over vital
personal information such as your name, address, and credit card number. What happens to this data? Could it
fall into the wrong hands? What rights do you have regarding your personal information? Everyone has the right to
the protection of personal data.
Under EU law, personal data can only be gathered legally under strict conditions, for a legitimate purpose. Furthermore, persons or organisations that collect and manage your personal information must protect it and must respect certain rights of the data subjects (the people the data refer to), which are guaranteed by EU law.
Every day within the EU, businesses, public authorities and individuals transfer vast amounts of personal data across
borders. Conflicting data protection rules in different countries would disrupt international exchanges. Individuals
might also be unwilling to transfer personal data abroad if they were uncertain about the level of protection in
other countries.
Therefore, common EU rules have been established to ensure that your personal data enjoys a high standard of
protection everywhere in the EU. You have the right to complain and obtain redress if your data is misused
anywhere within the EU.
EU data protection law also lays down specific rules for the transfer of personal data outside the EU, to ensure the best possible protection of your data when it is exported abroad.
Data Protection Officer
The publication of the EU General Data Protection Regulation means that the role of the Data Protection Officer has at last been given a pan-European legislative framework.
Traditionally, one of the DPO's first responsibilities has been to manage notifications or registrations with the relevant data protection authority in respect of the data controller's processing activities. Furthermore, the DPO must keep such notifications and registrations up to date and maintain separate notifications for every data processing entity within the corporate group.
DPOs have particular obligations regarding notifications and registrations for the processing of sensitive personal data and for international transfers of personal data (particularly sensitive personal data). It must be remembered that in the EU the process of notification and registration is more than a "tick box" exercise and more than a mere bureaucratic filing formality: in some European jurisdictions, data processing cannot occur without prior registration of the processing activities. In addition, the DPO is responsible for specific notifications relating to whistleblower and ethics hotlines, international transfers of personal data (particularly of a sensitive nature) and data breaches caused by cyber incidents. Another general responsibility of the DPO is to monitor the activities of all data controllers within the DPO's corporate group, including HR, sales and marketing, IT, procurement and outsourcing.
The DPO must have policies and procedures in place that ensure liaison with the relevant departments about any changes to processing activities, for example human resources processing related to staff, resignations or dismissals, job interviews and recruitment, new staff members and the use of agents or subcontractors. The DPO is, or should be, a "C-level" person who reports directly to management on data privacy and related compliance issues.
The DPO needs to implement policies and procedures to manage the outsourcing of data processing activities, including the use of third-party vendors for HR, IT and marketing, particularly where those third-party vendors may process the company's personal data outside the European Economic Area and/or in the Cloud.

DATA PROTECTION: methods, techniques and software tools for the protection of personal data, in private environments and also in cloud environments.
Related topics:
• Data Security
• Data Virtualization
• Data Masking
Emerging Issues: Data Protection
Under European Union law, the protection of the personal data of natural persons has been considered a fundamental individual right for the last 25 years.

Regulation (EU) 2016/679 (General Data Protection Regulation) is a European Union law of general application.
It is binding in its entirety and directly applicable in all Member States.

WHAT DOES PROCESSING OF PERSONAL DATA MEAN?

Any operation (by anyone) involving personal data, on paper or in digital form:
• Acquisition
• Erasure
• Editing
• Processing/Transformation
• Custody (storage)
Emerging Issues: Data Protection

PERSONAL DATA (DATA OF INDIVIDUALS)
DATA THAT ALLOW THE IDENTIFICATION OR IDENTIFIABILITY OF THE PERSON:
• NAME AND SURNAME
• ADDRESS
• TELEPHONE NUMBER OR EMAIL ADDRESS
• TAX ID CODE
• …

SENSITIVE PERSONAL DATA
• HEALTH STATE
• RELIGIOUS OR POLITICAL BELIEFS
• SEXUAL ORIENTATION
• BIOMETRIC DATA
• GENETIC DATA
• …

JUDICIAL PERSONAL DATA
• JUDICIAL RECORDS DATA
• DATA RELATING TO JUDGMENTS
• …

ANONYMOUS OR COMMON DATA
ALL NON-PERSONAL DATA, OR DATA IRREVERSIBLY ANONYMISED (E.G. WITH CERTIFIED MASKING TECHNIQUES).
THE DATA PROTECTION REGULATION DOES NOT APPLY TO ANONYMOUS DATA.
NOTE: PSEUDONYMISED DATA, WHICH CAN STILL BE LINKED BACK TO THE PERSON USING ADDITIONAL INFORMATION, REMAIN PERSONAL DATA UNDER THE GDPR.

SECURITY AND PROTECTION POLICIES
• Sensitive Data and Judicial Data: MAXIMUM POSSIBLE SECURITY MEASURES
• Personal Data: CONFIDENTIALITY OBLIGATION WITH ADEQUATE SECURITY LEVEL
• Anonymous or Common Data: FREE DISTRIBUTION
Emerging Issues: Data Protection

DATA CONTROLLER
HE IS THE SUBJECT OF DUTIES
HE IS RESPONSIBLE FOR THE DATA MANAGEMENT MODEL AND POLICIES IN THE COMPANY:
• IT MUST DEFINE AND IMPLEMENT THE APPROPRIATE SECURITY MEASURES FOR EACH CATEGORY OF PERSONAL DATA, PAPER OR DIGITAL, WHICH ARE MANAGED AND STORED IN THE COMPANY
• IT MUST DEFINE THE CORPORATE ROLES FOR THE PROCESSING OF PERSONAL DATA

DATA OWNER (DATA SUBJECT)
HE IS THE SUBJECT OF RIGHTS: HIS DATA MUST BE PROCESSED IN A TRANSPARENT, LAWFUL AND CORRECT MANNER, AS REQUIRED BY LAW
• right to transparency in the data collection phase
• right to granularity in the expression of consent
• right to be forgotten
• right to portability

RESPONSIBLE FOR THE TREATMENT (DATA PROCESSOR)
HE IS THE MAIN COLLABORATOR OF THE DATA CONTROLLER

PERSON IN CHARGE OF THE TREATMENT
THE PERSON WHO DAILY AND OPERATIONALLY PROCESSES THE DATA

D.P.O. (DATA PROTECTION OFFICER)
THE PRIMARY ROLE OF THE DATA PROTECTION OFFICER (DPO) IS TO ENSURE THAT THE ORGANISATION PROCESSES PERSONAL DATA IN COMPLIANCE WITH THE APPLICABLE DATA PROTECTION RULES.
THE APPOINTMENT OF A DPO MUST BE BASED ON HER/HIS PERSONAL AND PROFESSIONAL QUALITIES, BUT PARTICULAR ATTENTION MUST BE PAID TO KNOWLEDGE OF DATA PROTECTION. A GOOD UNDERSTANDING OF THE WAY THE ORGANISATION OPERATES IS ALSO RECOMMENDED.
THE DPO SHOULD BE ABLE TO PERFORM HER/HIS DUTIES INDEPENDENTLY.
THE DPO HAS TO ENSURE THAT THE DATA PROTECTION RULES ARE RESPECTED, IN COOPERATION WITH THE DATA PROTECTION AUTHORITY (FOR EU INSTITUTIONS, THE EUROPEAN DATA PROTECTION SUPERVISOR, EDPS). THE DPO MUST:
• ENSURE THAT CONTROLLERS AND DATA SUBJECTS ARE INFORMED ABOUT THEIR DATA PROTECTION RIGHTS;
• GIVE ADVICE AND RECOMMENDATIONS TO THE INSTITUTION ABOUT THE INTERPRETATION OR APPLICATION OF THE DATA PROTECTION RULES;
• CREATE A REGISTER OF PROCESSING OPERATIONS WITHIN THE INSTITUTION AND NOTIFY THE EDPS OF THOSE THAT PRESENT SPECIFIC RISKS (SO-CALLED PRIOR CHECKS);
• ENSURE DATA PROTECTION COMPLIANCE WITHIN HER/HIS INSTITUTION AND HELP IT TO BE ACCOUNTABLE IN THIS RESPECT;
• COOPERATE WITH THE EDPS (RESPONDING TO ITS REQUESTS ABOUT INVESTIGATIONS, COMPLAINT HANDLING, INSPECTIONS CONDUCTED BY THE EDPS, ETC.);
• DRAW THE INSTITUTION'S ATTENTION TO ANY FAILURE TO COMPLY WITH THE APPLICABLE DATA PROTECTION RULES.

Emerging Issues: Data Protection

D.P.O. (DATA PROTECTION OFFICER)

THE D.P.O. IS A MANDATORY ADVISOR TO THE DATA CONTROLLER AND TO THE DATA PROCESSOR (RESPONSIBLE FOR THE TREATMENT)

MANDATORY REQUIREMENTS OF THE D.P.O.:

• HAVE ADEQUATE KNOWLEDGE OF THE LEGISLATION AND PRACTICES OF PERSONAL DATA MANAGEMENT

• HAVE NO CONFLICT OF INTEREST, SO AS TO FULFIL HER/HIS DUTIES IN TOTAL AUTONOMY AND INDEPENDENCE

ATTENTION: A DATA CONTROLLER WHO APPOINTS A D.P.O. WHO DOES NOT QUALIFY IS GUILTY OF "CULPA IN ELIGENDO"

(i.e. AT FAULT IN THE CHOICE OF THE APPOINTEE)


Emerging Issues: Data Protection
Violating the Data Protection Regulation does not only mean disclosing confidential data to the outside world: keeping the data carelessly is already enough to violate it.
When we talk about "custody" we must introduce the concept of "minimum security measures", i.e. the set of organizational, behavioral and IT measures that must guarantee the security of confidential and personal data.
MINIMUM IT SECURITY MEASURES:
• each computer must be accessible only through login credentials or physical keys
• personal access credentials must be kept secret and carefully guarded
• the identity and access control system must grant access only to the data for which you are authorized
• the databases where the data reside must be protected by firewalls and antivirus software
• data must be backed up frequently; original data and backups should be stored at different sites
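The access-control measure (access only to the data for which one is authorized) is usually implemented through some form of access control. The sketch below assumes a simple role-based model. The roles and data categories are hypothetical, and a real identity and access management system would also handle authentication, auditing and revocation of rights.

```python
# Hypothetical mapping from roles to the data categories each role may read.
PERMISSIONS = {
    "hr_officer": {"personal", "employment"},
    "sales_agent": {"customer_contacts"},
    "occupational_doctor": {"personal", "health"},
}


def can_access(role: str, data_category: str) -> bool:
    """Least privilege: deny by default, allow only explicitly granted categories."""
    return data_category in PERMISSIONS.get(role, set())


def read_record(role: str, data_category: str, record: dict) -> dict:
    """Return the record only if the role is authorized for its data category."""
    if not can_access(role, data_category):
        raise PermissionError(f"role {role!r} is not authorized for {data_category!r} data")
    return record


print(can_access("hr_officer", "personal"))  # True
print(can_access("sales_agent", "health"))   # False
```

The deny-by-default choice matters. An unknown role, or a category nobody has been granted, yields no access rather than full access.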
Emerging Issues: Data Protection
MANDATORY DISCLOSURE
It specifies:
• the use that will be made of the data
• which subjects may come into contact with the acquired data
• who the data controller is
• who the D.P.O. is
• what the rights of the data owner are
It must be signed by the data owner to prove his consent.

The Mandatory Disclosure must be detailed and specific.

For instance, consent acquired to publish a photo in a book does not cover publishing the same photo on a website.
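Consent is tied to a specific purpose, so a system that records consent should store it per purpose rather than as a single yes/no flag. The sketch below is hypothetical: the class, field names and purpose labels are illustrative. It shows why consent given for the book does not authorize the website.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical per-subject consent register, one entry per purpose."""
    subject: str
    purposes: set[str] = field(default_factory=set)

    def grant(self, purpose: str) -> None:
        self.purposes.add(purpose)

    def revoke(self, purpose: str) -> None:
        self.purposes.discard(purpose)

    def allows(self, purpose: str) -> bool:
        # Consent is specific: only a purpose that was explicitly granted is covered.
        return purpose in self.purposes


consent = ConsentRecord(subject="photographed person")
consent.grant("publish photo in printed book")
print(consent.allows("publish photo in printed book"))  # True
print(consent.allows("publish photo on website"))       # False
```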
Emerging Issues: Data Protection
MANDATORY DISCLOSURE
WHEN IS IT NEEDED?

ON THE OCCASION OF A NEW CONTRACT (WITH AN EMPLOYEE, WITH A SUPPLIER, WITH A CUSTOMER)

• Employee: the personnel office provides the worker with a complete INFORMATION NOTICE clearly specifying the use that will be made of the data and which subjects will be able to come into contact with it. In this phase it is also determined whether explicit consent must be requested for some of the data.
• Supplier: the purchasing office provides the supplier with a complete INFORMATION NOTICE which clearly specifies the use that will be made of the data and which subjects will be able to come into contact with it. When dealing exclusively with tax data, this task is not necessary.
• Customer: the sales office provides the customer with a complete INFORMATION NOTICE which clearly specifies the use that will be made of the data and which subjects will be able to come into contact with it. In this phase it is also determined whether explicit consent must be requested for some of the data.
Emerging Issues: Data Protection

COMMUNICATION
Bringing information to the attention of one or more well-identified subjects (mail, letter, phone call, etc.)

SPREAD (DISSEMINATION)
Making a plurality of unidentified subjects aware of information (newspaper, radio, internet, TV, etc.)

WHEN ARE THEY LEGAL?
• when the law provides for it
• when the data owner has given consent

THE VIOLATION OF THE GDPR ENTAILS:
• disciplinary sanctions
• administrative sanctions
• criminal penalties (under national law)
But above all, a violation of the GDPR may have economic consequences in terms of compensation for damages, even for very significant amounts.
Data Modeling

7 Key Questions (W6H)
Which?
• what kind of data
Who?
• who produces the data
• who updates them
• who manages them
• who do I distribute them to
Why?
• for knowledge
• for monitoring
• for forecasts
• …
How?
• how do I get them
• how do I process them
• how do I store them (in aggregated or elementary form; historicized or not)
What?
• which kind of analyses
• which kind of computations
• which kind of reports
When?
• how often do I extract them
• how often do I process them
• how often do I distribute them
Where?
• where do I get the data (sources)
• where do I store them (repository/storage)

Data Modeling

Data modeling is the process of creating a visual representation of either a whole information
system or parts of it to communicate connections between data points and structures.
The goal is to illustrate the types of data used and stored within the system, the relationships
among these data types, the ways the data can be grouped and organized and its formats and
attributes.
Data models are built around business needs.
Rules and requirements are defined upfront through feedback from business stakeholders so they
can be incorporated into the design of a new system or adapted in the iteration of an existing
one.
Data can be modeled at various levels of abstraction.
The process begins by collecting information about business requirements from stakeholders and
end users.
These business rules are then translated into data structures to formulate a concrete database
design.
A data model can be compared to a roadmap, an architect’s blueprint or any
formal diagram that facilitates a deeper understanding of what is being designed.

Data Modeling

Data modeling employs standardized schemas and formal techniques.


This provides a common, consistent, and predictable way of defining and managing data
resources across an organization, or even beyond.
Ideally, data models are living documents that evolve along with changing business needs.
They play an important role in supporting business processes and planning IT architecture and
strategy.
Adequately modeling the information that the system will have to process is a fundamental
prerequisite for the effectiveness and quality of any system.
Data modeling is a basic discipline: it does not require any prior technological competence. It teaches how to use the mechanisms of abstraction, generalization and association, which are fundamental to all system design activities.
Data modeling has scientific foundations rooted in set theory, but it also has less deterministic aspects. These require experience and the ability to weigh the pros and cons of the different possible solutions to the same problem.

Data Modeling

The data model can be built top-down or bottom-up.

The result may be identical, but the ways to get there are different.
A model is built "top-down" when it is conceived as a single whole, without first analysing separate portions of the system ("subject areas").
It is built "bottom-up" when the final model is the result of aggregating several sectoral models.
For example, in a project that uses use cases for requirements specification, the data model can be constructed in
two alternative ways:
1. An initial version of the model is created before having defined the use cases in detail. Subsequently,
as the individual use cases are detailed, the model is enriched and completed with new attributes
and associations.
2. For each use case already detailed, a partial data model, or "local view", is defined. The overall model
will be derived step by step, through the progressive integration of the local views created for each
use case.
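The second, bottom-up approach can be sketched in a few lines. Each use case contributes a local view, here just a mapping from entity names to attribute sets, and the global model is obtained by merging the views one at a time. The use-case names and attributes are hypothetical. A real integration step would also have to resolve naming conflicts and reconcile relationships, which this sketch ignores.

```python
from functools import reduce

# Hypothetical local views, one per detailed use case: entity -> attributes.
place_order_view = {
    "Customer": {"customer_id", "name"},
    "Order": {"order_id", "date", "customer_id"},
}
ship_order_view = {
    "Order": {"order_id", "shipping_address"},
    "Shipment": {"shipment_id", "order_id", "carrier"},
}


def integrate(model: dict, view: dict) -> dict:
    """Merge one local view into the model built so far (union of attributes)."""
    merged = {entity: set(attrs) for entity, attrs in model.items()}  # copy, do not mutate
    for entity, attrs in view.items():
        merged.setdefault(entity, set()).update(attrs)
    return merged


# Progressive integration: start from an empty model and add one view at a time.
global_model = reduce(integrate, [place_order_view, ship_order_view], {})
print(sorted(global_model["Order"]))
# ['customer_id', 'date', 'order_id', 'shipping_address']
```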

Data Modeling

Like any design process, database and information system design begins at a high level of
abstraction and becomes increasingly more concrete and specific.
Data models can generally be divided into three categories, according to their degree of
abstraction.
The process will start with a conceptual model, progress to a logical model and conclude with a
physical model.
Each type of data model is discussed in more detail in the next slides.

Data Modeling

CONCEPTUAL DATA MODELS

They are also referred to as domain models and offer a big-picture view of what the system will
contain, how it will be organized, and which business rules are involved.
Conceptual models are usually created as part of the process of gathering initial project
requirements.
Typically, they include entity classes (defining the types of things that are important for the
business to represent in the data model), their characteristics and constraints, the relationships
between them and relevant security and data integrity requirements.
The notation is typically simple.

Data Modeling

LOGICAL DATA MODELS


They are less abstract and provide more detail about the concepts and relationships in the
domain under consideration.
Logical data models indicate data attributes, such as data types and corresponding lengths,
and show relationships between entities.
Logical data models do not specify any technical system requirements.
This phase is often omitted in Agile(1) or DevOps(2) practices.
Logical data models can be useful in highly procedural implementation environments or for
projects that are data-oriented in nature, such as data warehouse(3) design or reporting system
development.
(1) Agile is a project management method, used especially for software development, that is characterized by the division of tasks into short phases of work and by frequent reassessment and adaptation of plans.
(2) DevOps is a combination of software development (dev) and operations (ops). It is a software engineering methodology that aims to integrate the work of development and operations teams by fostering a culture of collaboration and shared responsibility.
(3) A data warehouse is a type of data management system designed to support query and reporting activities. Data warehouses are intended for queries and analysis and often contain large amounts of historical data.

Data Modeling

LOGICAL DATA MODELS


The logical data model is detailed and should be as accurate as possible.
A precise logical model includes:
• the full list of attributes for each entity
• the explicit indication of the primary key and any alternative keys of each entity
• the explicit indication of whether each attribute is optional or mandatory
• for each attribute, the explicit indication of the data type, which specifies its format and length
• for each attribute whose domain allows it (a few stable values, or a range), the explicit list or range of permitted values
• the minimum and maximum multiplicity of each relationship in both directions
In short, the logical model must contain all the necessary information that is related not to technical aspects but to knowledge of the application field, and therefore to the meaning of the data.
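The checklist above can be captured in a small, technology-neutral structure. The Python sketch below is a hypothetical representation; the class names and the Student example are illustrative, not a standard format. For each attribute it records the data type, length, optionality and permitted values. It also records the key of each entity and the minimum and maximum multiplicity of a relationship in both directions. A simple validator then checks a row against these definitions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attribute:
    name: str
    data_type: str                        # e.g. "string", "integer", "date"
    length: Optional[int] = None          # maximum length, where relevant
    mandatory: bool = True                # obligation (True) or optionality (False)
    allowed_values: Optional[set] = None  # explicit domain, when small and stable


@dataclass
class Entity:
    name: str
    attributes: list
    primary_key: list                     # names of the primary-key attributes
    alternate_keys: list = field(default_factory=list)


@dataclass
class Relationship:
    name: str
    source: str
    target: str
    # (min, max) multiplicity in each direction; None as max stands for "many".
    targets_per_source: tuple = (0, None)
    sources_per_target: tuple = (0, None)


student = Entity(
    name="Student",
    attributes=[
        Attribute("roll_no", "string", length=10),
        Attribute("name", "string", length=80),
        Attribute("status", "string", length=10,
                  allowed_values={"active", "graduated", "withdrawn"}),
        Attribute("mobile_no", "string", length=20, mandatory=False),
    ],
    primary_key=["roll_no"],
)

# A student is enrolled in 1..many courses; a course has 0..many students.
enrollment = Relationship("enrolled in", "Student", "Course",
                          targets_per_source=(1, None), sources_per_target=(0, None))


def validate(entity: Entity, row: dict) -> list:
    """Check a row against the logical definition; return the problems found."""
    problems = []
    for attr in entity.attributes:
        value = row.get(attr.name)
        if value is None:
            if attr.mandatory:
                problems.append(f"{attr.name} is mandatory")
            continue
        if attr.length is not None and len(str(value)) > attr.length:
            problems.append(f"{attr.name} exceeds length {attr.length}")
        if attr.allowed_values is not None and value not in attr.allowed_values:
            problems.append(f"{attr.name} has value {value!r} outside its domain")
    return problems


print(validate(student, {"roll_no": "S001", "status": "expelled"}))
# ['name is mandatory', "status has value 'expelled' outside its domain"]
```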

Data Modeling

PHYSICAL DATA MODELS


They provide a schema for how the data will be physically stored within a database.
As such, they’re the least abstract of all.
They offer a finalized design that can be implemented as a relational database, including
associative tables that illustrate the relationships among entities as well as the primary keys and
foreign keys that will be used to maintain those relationships.
Physical data models can include database management system (DBMS)-specific properties,
including performance tuning.
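Here is a minimal sketch of a physical model. The DDL below implements a many-to-many relationship between two hypothetical entities, Student and Course, through an associative table whose composite primary key is made of two foreign keys. It uses SQLite through Python's standard sqlite3 module purely for convenience. Type names, indexing and tuning options differ between DBMSs, and that DBMS-specific detail is exactly what a physical model records.

```python
import sqlite3

DDL = """
CREATE TABLE student (
    roll_no   TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    title     TEXT NOT NULL
);
-- Associative table: one row per (student, course) pair.
CREATE TABLE enrollment (
    roll_no   TEXT NOT NULL REFERENCES student(roll_no),
    course_id TEXT NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (roll_no, course_id)
);
-- DBMS-specific tuning: an index to speed up lookups by course.
CREATE INDEX idx_enrollment_course ON enrollment(course_id);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(DDL)
conn.executemany("INSERT INTO student VALUES (?, ?)", [("S1", "Anna"), ("S2", "Luca")])
conn.executemany("INSERT INTO course VALUES (?, ?)", [("C1", "Databases"), ("C2", "Law")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", [("S1", "C2"), ("S2", "C1"), ("S1", "C1")])

# The foreign keys let us rebuild the relationship with joins.
rows = conn.execute(
    """SELECT s.name, c.title
       FROM enrollment e
       JOIN student s ON s.roll_no = e.roll_no
       JOIN course c ON c.course_id = e.course_id
       ORDER BY s.name, c.title"""
).fetchall()
print(rows)  # [('Anna', 'Databases'), ('Anna', 'Law'), ('Luca', 'Databases')]
```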

Data Modeling

The data modeling discipline invites stakeholders to evaluate data processing and storage in great detail.
Data modeling techniques have different conventions that dictate which symbols are used to represent the data, how models are drawn, and how business requirements are defined.
All approaches provide formalized workflows that include a sequence of tasks to be performed in
an iterative manner.
Those workflows generally look like this:
1. Identify the entities
The process of data modeling begins with the identification of the things, events or concepts that
are represented in the data set that is to be modeled. Each entity should be cohesive and logically
discrete from all others.

2. Identify key properties of each entity.


Each entity type can be differentiated from all others because it has one or more unique properties,
called attributes. For instance, an entity called “customer” might possess such attributes as a first
name, last name, telephone number and job title, while an entity called “address” might include a
street name and number, a city, state, country and zip code.

Data Modeling

3. Identify relationships among entities.


The earliest draft of a data model will specify the nature of the relationships each entity has with the
others.
In the above example, each customer “lives at” an address.
If that model were expanded to include an entity called “orders,” each order would be shipped to
and billed to an address as well.
These relationships are usually documented via unified modeling language (UML).

4. Map attributes to entities completely.


This will ensure the model reflects how the business will use the data.

Data Modeling

5. Assign keys as needed and decide on a degree of normalization that balances the
need to reduce redundancy with performance requirements.
Normalization is a technique for organizing data models (and the databases they represent) in
which numerical identifiers, called keys, are assigned to groups of data to represent relationships
between them without repeating the data.
For instance, if customers are each assigned a key, that key can be linked to both their address and
their order history without having to repeat this information in the table of customer names.
Normalization tends to reduce the amount of storage space a database will require, but it can come at a cost to query performance.
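To see what normalization buys, compare a denormalized list of orders, where every row repeats the customer's name and address, with a normalized version. In the normalized version each customer is stored once and orders refer to it by key. The data and field names are hypothetical. The sketch only shows the repetition and the re-join; it does not implement the formal normal forms.

```python
# Denormalized: the customer's name and address are repeated on every order row.
orders_flat = [
    {"order_id": 1, "customer": "Anna Bianchi", "address": "Via Roma 1, Milano", "total": 30},
    {"order_id": 2, "customer": "Anna Bianchi", "address": "Via Roma 1, Milano", "total": 45},
    {"order_id": 3, "customer": "Luca Verdi", "address": "Corso Italia 5, Torino", "total": 12},
]

# Normalized: each customer is stored once and orders refer to it by key.
customers = {
    101: {"name": "Anna Bianchi", "address": "Via Roma 1, Milano"},
    102: {"name": "Luca Verdi", "address": "Corso Italia 5, Torino"},
}
orders = [
    {"order_id": 1, "customer_id": 101, "total": 30},
    {"order_id": 2, "customer_id": 101, "total": 45},
    {"order_id": 3, "customer_id": 102, "total": 12},
]


def join_orders(orders: list, customers: dict) -> list:
    """Rebuild the flat view by following the keys (the query-time cost of normalization)."""
    result = []
    for order in orders:
        customer = customers[order["customer_id"]]
        result.append({
            "order_id": order["order_id"],
            "customer": customer["name"],
            "address": customer["address"],
            "total": order["total"],
        })
    return result


print(join_orders(orders, customers) == orders_flat)  # True
```

The trade-off is visible here. In the normalized form, Anna's change of address touches one record instead of every order row, but reading orders together with addresses now requires the extra lookup done in `join_orders`.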

6. Finalize and validate the data model.


Data modeling is an iterative process that should be repeated and refined as business needs
change.

Data Modeling

Data modeling has evolved alongside database management systems, with model types increasing in complexity as businesses' data storage needs have grown. Here are the main model types:
a) Hierarchical data models represent one-to-many relationships in a treelike format. In this type of
model, each record has a single root or parent which maps to one or more child tables. This
model was implemented in the IBM Information Management System (IMS), which was
introduced in 1966 and rapidly found widespread use, especially in banking. Though this
approach is less efficient than more recently developed database models, it’s still used in
Extensible Markup Language (XML) systems and geographic information systems (GISs).
b) Relational data models were initially proposed by IBM researcher E.F. Codd in 1970. They are still
implemented today in the many different relational databases commonly used in enterprise
computing. Relational data modeling doesn’t require a detailed understanding of the physical
properties of the data storage being used. In it, data segments are explicitly joined through the
use of tables, reducing database complexity.
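The tree structure of the hierarchical model (a) can be illustrated with XML, one of the settings where it is still used. In this hypothetical banking example every record has exactly one parent, and data are reached by walking down from the root. The sketch uses only the standard `xml.etree.ElementTree` module; the element and attribute names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical banking data laid out as a tree: every record has exactly one parent.
XML = """
<bank>
  <customer id="C1" name="Anna">
    <account number="A1">
      <transaction amount="100"/>
      <transaction amount="-40"/>
    </account>
    <account number="A2">
      <transaction amount="250"/>
    </account>
  </customer>
  <customer id="C2" name="Luca">
    <account number="A3">
      <transaction amount="75"/>
    </account>
  </customer>
</bank>
"""


def balances(root: ET.Element) -> dict:
    """Walk the tree from parent to child: customer -> account -> transaction."""
    result = {}
    for customer in root.findall("customer"):
        for account in customer.findall("account"):
            total = sum(int(t.get("amount")) for t in account.findall("transaction"))
            result[(customer.get("name"), account.get("number"))] = total
    return result


root = ET.fromstring(XML.strip())
print(balances(root))  # {('Anna', 'A1'): 60, ('Anna', 'A2'): 250, ('Luca', 'A3'): 75}
```

The limitation appears with many-to-many data. A joint account owned by both customers would have to be duplicated under each parent. In the relational model (b), explicitly joined tables avoid that duplication.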

Data Modeling

Relational databases frequently employ structured query language (SQL) for data management.
These databases work well for maintaining data integrity and minimizing redundancy.
• Entity-relationship (ER) data models use formal diagrams to represent the relationships between entities in a database.
Several ER modeling tools are used by data architects to create visual maps that convey database design objectives.
• Object-oriented data models gained traction with object-oriented programming and became popular in the mid-1990s.
The “objects” involved are abstractions of real-world entities.
Objects are grouped in class hierarchies and have associated features.
Object-oriented databases can incorporate tables but can also support more complex data relationships.

Data Modeling

• Dimensional data models were developed by Ralph Kimball and were designed to optimize data retrieval speeds for analytic purposes in a data warehouse.
While relational and ER models emphasize efficient storage, dimensional models increase redundancy in order to make it easier to locate information for reporting and retrieval.
This modeling is typically used across OLAP systems.

Two popular dimensional data models are the star schema and the snowflake schema. In the star schema, data is organized into facts (measurable items) and dimensions (reference information), and each fact is surrounded by its associated dimensions in a star-like pattern.

The snowflake schema resembles the star schema but includes additional layers of associated dimensions, making the branching pattern more complex.
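A minimal star schema can be sketched with SQLite from Python's standard library. It has one fact table of sales (the measurable items) surrounded by a date dimension and a product dimension. Table and column names and all figures are hypothetical. The typical analytic query joins the fact table to its dimensions and aggregates the measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Fact table: one foreign key per dimension plus numeric measures.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL
);
INSERT INTO dim_date VALUES (20240115, 2024, 1), (20240210, 2024, 2);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Hardware'), (2, 'Antivirus', 'Software');
INSERT INTO fact_sales VALUES
    (20240115, 1, 2, 2000.0),
    (20240115, 2, 5, 250.0),
    (20240210, 1, 1, 1000.0);
""")

# Typical OLAP-style query: total sales by month and product category.
query = """
SELECT d.year, d.month, p.category, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_date d    ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY d.year, d.month, p.category
ORDER BY d.year, d.month, p.category
"""
for row in conn.execute(query):
    print(row)
```

In a snowflake variant, `category` would move into its own table referenced by `dim_product`. That adds one more join to the same query.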

Data Modeling

Benefits of Data Modeling

Data Modeling makes it easier for developers, data architects, business analysts, and other stakeholders to view and understand relationships among the data in a database or data warehouse.
In addition, it can:
• Reduce errors in software and database development.
• Increase consistency in documentation and system design across the enterprise.
• Improve application and database performance.
• Make it easier to map data across your organization.
• Improve communication between developers and business intelligence teams.
• Facilitate and accelerate the database design process at the conceptual, logical and physical levels.

Data Modeling

Data Modeling tools


Several commercial and open-source Computer-Aided Software Engineering (CASE) solutions are widely used today, including multiple data modeling, diagramming and visualization tools.
Here are some examples:
• Erwin Data Modeler is a data modeling tool based on the Integration DEFinition for information modeling (IDEF1X) data modeling language that now supports other notation methodologies, including a dimensional approach.
• Enterprise Architect is a visual modeling and design tool that supports the modeling of enterprise information systems and architectures as well as software applications and databases. It’s based on object-oriented languages and standards.
• ER/Studio is database design software that’s compatible with several of today’s most popular database management systems. It supports both relational and dimensional data modeling.
• Free data modeling tools include open-source solutions such as Open ModelSphere.

Data Modeling

Difference between ER Modeling and Dimensional Modeling


The ER model is used for the logical representation, or conceptual view, of data.
It is a high-level conceptual data model.
It provides a visual representation of data that describes how all the data are related to each other.
It can be a complex diagram used to represent multiple processes.
It helps to describe entities, attributes, and relationships.
It helps to analyze data requirements systematically to produce a well-designed database.
At the view level, the ER model is considered a good option for designing databases.

Data Modeling

Difference between ER Modeling and Dimensional Modeling


Data in a data warehouse are usually organized in multidimensional form.
Dimensional modeling prefers keeping tables denormalized.
The primary purpose of dimensional modeling is to optimize the database for faster data retrieval, in order to enable business intelligence (BI) reporting, query, and analysis.
The concept of Dimensional Modeling was developed by Ralph Kimball and consists of “fact” and “dimension” tables.

Data Modeling

Difference between ER Modeling and Dimensional Modeling


Dimensional modeling is a more flexible form of data modeling from the user's perspective.
Both dimensional and relational models have their specific way of storing data.
Dimensional models are built around business processes.
Dimensional models are able to manage the temporal dimension (i.e. the history) of the data.

Data Modeling

Difference between ER Modeling and Dimensional Modeling

ER Modeling | Dimensional Modeling
Transaction-oriented | Subject-oriented
Entities and relationships | Fact tables and dimension tables
Few levels of granularity | Multiple levels of granularity
Real-time information | Historical information
Eliminates redundancy | Plans for redundancy
High transaction volumes, few records at a time | Low transaction volumes, many records at a time
Highly volatile data | Non-volatile data
Normalization is suggested | De-normalization is suggested
OLTP applications | OLAP applications



Data Modeling

ER Modeling
The ER model describes the logical view of the system from a data perspective.
It consists of these components:

Entity, Entity Type, Entity Set

An entity may be an object with a physical existence (a particular person, car, house, or employee) or an object with a conceptual existence (a company, a job, or a university course).
An entity is an instance of an entity type, and the set of all entities of the same type is called an entity set.
For example, E1 is an entity of entity type Student, and the set of all students is called an entity set. In an ER diagram, an entity type is represented by a rectangle.

Data Modeling

ER Modeling
Attribute(s):
Attributes are the properties that define the entity type. For example, Roll_No, Name, DoB, Age, Address and Mobile_No are the attributes that define the entity type Student. In an ER diagram, an attribute is represented by an oval.

Key Attribute
The attribute that uniquely identifies each entity in the entity set is called the key attribute. For example, Roll_No will be unique for each student. In an ER diagram, a key attribute is represented by an oval with its name underlined. A roll number is a unique identification number that can be assigned to a student during admission or after registration.

Composite Attribute
An attribute composed of several other attributes is called a composite attribute. For example, the Address attribute of the Student entity type consists of Street, City, State, and Country. In an ER diagram, a composite attribute is represented by an oval connected to the ovals of its component attributes.

Data Modeling

ER Modeling
Multivalued Attribute
An attribute that can hold more than one value for a given entity. For example, Phone_No (a student can have more than one). In an ER diagram, a multivalued attribute is represented by a double oval.

Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived attribute. For example, Age can be derived from Date of Birth. In an ER diagram, a derived attribute is represented by a dashed oval.

[Figure: the complete entity type Student with all its attributes]
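The same attribute kinds have a natural counterpart in code. In this hypothetical Python sketch, the field names mirror Roll_No, DoB, Address and Phone_No in snake_case. A composite attribute becomes a nested structure, a multivalued attribute a list, and a derived attribute a method computed from the stored date of birth instead of a stored value.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Address:
    """Composite attribute: made of simpler component attributes."""
    street: str
    city: str
    state: str
    country: str


@dataclass
class Student:
    roll_no: str                       # key attribute: unique for each student
    name: str
    dob: date                          # date of birth (stored)
    address: Address                   # composite attribute
    phone_nos: List[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: Optional[date] = None) -> int:
        """Derived attribute: computed from dob rather than stored."""
        today = today or date.today()
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


s = Student(
    roll_no="S001",
    name="Anna",
    dob=date(2001, 5, 20),
    address=Address("Via Roma 1", "Milano", "MI", "Italy"),
    phone_nos=["333-0000001", "02-0000002"],
)
print(s.age(today=date(2024, 5, 19)))  # 22
```

Computing Age on demand rather than storing it means it can never go out of date, which is the usual reason to model it as a derived attribute.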

Data Modeling

ER Modeling
Relationship Type and Relationship Set:
A relationship type represents the association between entity types.
For example, ‘Enrolled in’ is a relationship type that exists between the entity types Student and Course. In an ER diagram, a relationship type is represented by a diamond connected to the participating entity types by lines.

A set of relationships of the same type is known as a relationship set.

The following relationship set shows that S1 is enrolled in C2, S2 is enrolled in C1, and S3 is enrolled in C3.

Data Modeling

ER Modeling
Degree of a relationship set:
The number of different entity sets participating in a relationship set is called the degree of the relationship set.

1. Unary Relationship:
When only ONE entity set participates in a relationship, the relationship is called unary. For example, in the relationship "Married to" on the entity set Person, one person is married to only one other person.

2. Binary Relationship:
When TWO entity sets participate in a relationship, the relationship is called binary.
For example, a Student is enrolled in a Course.

Data Modeling

ER Modeling
Cardinality

The number of times an entity of an entity set participates in a relationship set is known as cardinality.

Cardinality can be of different types:

One-to-one: when each entity in each entity set can take part only once in the relationship, the cardinality is one-to-one. Let us assume that a person can have only one Tax ID code and a Tax ID code can match just one person. So, the relationship will be one-to-one.
Many-to-one: when entities in one entity set can take part only once in the relationship set and entities in the other entity set can take part more than once, the cardinality is many-to-one. Let us assume that a student can take only one course, but one course can be taken by many students. So, the cardinality will be n to 1: for one course there can be n students, but for one student there will be only one course.
Many-to-many: when entities in all entity sets can take part more than once in the relationship, the cardinality is many-to-many. Let us assume that a student can take more than one course and one course can be taken by many students. So, the relationship will be many-to-many.
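The three cases can be checked mechanically on a sample of relationships. The Python sketch below (the function name `cardinality` and the sample data are illustrative assumptions) infers the tightest cardinality that a given set of (left, right) pairs does not violate. A sample can only suggest a cardinality; the actual constraint is a design decision recorded in the schema.

```python
from collections import Counter


def cardinality(pairs):
    """Infer the tightest cardinality consistent with a set of (left, right) pairs."""
    left = Counter(a for a, _ in pairs)    # how often each left entity participates
    right = Counter(b for _, b in pairs)   # how often each right entity participates
    left_once = all(n == 1 for n in left.values())
    right_once = all(n == 1 for n in right.values())
    if left_once and right_once:
        return "one-to-one"
    if left_once:
        return "many-to-one"   # many left entities share one right entity
    if right_once:
        return "one-to-many"
    return "many-to-many"


tax_ids = {("Anna", "T1"), ("Luca", "T2")}                            # person, tax ID
takes = {("S1", "C1"), ("S2", "C1"), ("S3", "C2")}                    # one course per student
takes_many = {("S1", "C1"), ("S1", "C3"), ("S3", "C3"), ("S4", "C3")}
print(cardinality(tax_ids), cardinality(takes), cardinality(takes_many))
# one-to-one many-to-one many-to-many
```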

Data Modeling

ER Modeling
The ER model is used to model the logical view of the system from a data perspective. It consists of the following components:

Entity, Entity Type, Entity Set


An Entity may be an object with a physical existence – a particular
person, car, house, or employee – or it may be an object with a
conceptual existence – a company, a job, or a university course.
An entity is an instance of an entity type, and the set of all entities of that type is called an entity set.
For example, E1 is an entity of entity type Student, and the set of all students is called an entity set. In an ER diagram, an entity type is represented by a rectangle.

ER Modeling
Attribute(s):
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB, Age, Address and Mobile_No are the attributes that define the entity type Student. In an ER diagram, an attribute is represented by an oval.

Key Attribute
The attribute that uniquely identifies each entity in the entity set is called the key attribute. For example, Roll_No will be unique for each student. In an ER diagram, a key attribute is represented by an oval with the attribute name underlined. A roll number is a unique identification number assigned to a student at admission or after registration.

Composite Attribute
An attribute composed of several other attributes is called a composite attribute. For example, the Address attribute of the Student entity type consists of Street, City, State, and Country. In an ER diagram, a composite attribute is represented by an oval connected to the ovals of its components.

ER Modeling
Multivalued Attribute:
An attribute that can hold more than one value for a given entity is called a multivalued attribute. For example, Phone_No (a student can have more than one phone number). In an ER diagram, a multivalued attribute is represented by a double oval.

Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived attribute. For example, Age can be derived from Date of Birth. In an ER diagram, a derived attribute is represented by a dashed oval.
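The attribute kinds above can be mirrored in code. In this minimal Python sketch the classes `Student` and `Address` and the sample values are invented for illustration: `roll_no` plays the key attribute, `Address` is composite, `phone_nos` is multivalued, and `age` is derived from the date of birth rather than stored.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    """Composite attribute: made of simpler component attributes."""
    street: str
    city: str
    state: str
    country: str


@dataclass
class Student:
    roll_no: int                    # key attribute: unique for each student
    name: str
    dob: date
    address: Address                # composite attribute
    phone_nos: list[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from dob instead of being stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


s = Student(1, "Maria", date(2002, 5, 17),
            Address("Via Roma 1", "Roma", "Lazio", "Italy"),
            ["111-111", "222-222"])
print(s.age(date(2024, 5, 16)))  # 21
```

Storing `dob` and computing `age` on demand avoids keeping a value that silently becomes wrong every year.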

The complete entity type Student with its attributes can be represented as:

ER Modeling
Relationship Type and Relationship Set:
A relationship type represents the association between entity types.
For example, ‘Enrolled in’ is a relationship type that exists between the entity types Student and Course. In an ER diagram, a relationship type is represented by a diamond connected to the participating entities by lines.

A set of relationships of the same type is known as a relationship set.


The following relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as enrolled in C3.
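To make the idea concrete, a relationship set can be pictured as a plain set of (student, course) pairs. The following Python sketch encodes the S1/S2/S3 example; the variable and function names are illustrative assumptions, not part of any ER tool.

```python
# Entity sets from the example (identifiers are illustrative)
students = {"S1", "S2", "S3"}
courses = {"C1", "C2", "C3"}

# 'Enrolled in' relationship set: each relationship is one (student, course) pair
enrolled_in = {("S1", "C2"), ("S2", "C1"), ("S3", "C3")}

# Every relationship must connect entities that exist in the participating entity sets
assert all(s in students and c in courses for s, c in enrolled_in)


def courses_of(student):
    """Return the set of courses the given student is enrolled in."""
    return {c for s, c in enrolled_in if s == student}


print(courses_of("S1"))  # {'C2'}
```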

ER Modeling
Degree of a relationship set:
The number of different entity sets participating in a relationship set is called the degree of the relationship set.

1. Unary Relationship:
When there is only ONE entity set participating in a relationship, the relationship is called a unary relationship. For example, in the ‘Married to’ relationship on the entity set Person, one person is married to only one person.

2. Binary Relationship:
When there are TWO entity sets participating in a relationship, the relationship is called a binary relationship. For example, a Student is enrolled in a Course.

ER Modeling
Degree of a relationship set:
The number of different entity sets participating in a relationship set is called the degree of the relationship set.

3. n-ary Relationship:
When there are n entity sets participating in a relationship, the relationship is called an n-ary relationship.

(Figure: the ternary relationship "Student enrolls in Course taught by Professor", connecting the entity sets Student, Course and Professor.)
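A ternary relationship set can be pictured as a set of triples. In the Python sketch below (identifiers such as S1, C1 and P1 are illustrative), each triple records which professor teaches a course to a student. Splitting it into two binary sets (Student/Course and Course/Professor) would lose the information that S1 takes C1 with P1 while S2 takes C1 with P3.

```python
# Ternary relationship set: each instance links one Student, one Course and one Professor
enrolls = {
    ("S1", "C1", "P1"),   # S1 enrolls in C1 taught by P1
    ("S1", "C2", "P2"),
    ("S2", "C1", "P3"),   # the same course, taught by a different professor
}

# The degree is the number of entity sets taking part in each relationship
degree = len(next(iter(enrolls)))


def professor_for(student, course):
    """Return the professors teaching `course` to `student` in this relationship set."""
    return {p for s, c, p in enrolls if s == student and c == course}


print(degree, professor_for("S2", "C1"))  # 3 {'P3'}
```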

ER Modeling
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality.

Cardinality can be of different types:

• One-to-one

• Many-to-one

• Many-to-many

ER Modeling
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality.

Cardinality: One-to-one
When each entity in each entity set can take part only once in the relationship, the cardinality is one-to-one. Let us assume that a male can marry one female and a female can marry one male. So, the relationship will be one-to-one. The total number of tables that can be used in this case is 2.

(Figure: Person ‘married to’ Person.)

ER Modeling
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality.

Cardinality: One-to-one
Using Sets, it can be represented as:

ER Modeling
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality.

Cardinality: Many-to-one
When entities in one entity set can take part only once in the relationship set and entities in the other entity set can take part more than once, the cardinality is many-to-one. Let us assume that a student can take only one course, but one course can be taken by many students. So, the cardinality will be n to 1: for one course there can be n students, but for one student there will be only one course. The total number of tables that can be used in this case is 3 (one per entity set plus one for the relationship); merging the relationship into the Student table reduces this to 2.

ER Modeling
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality.

Cardinality: many-to-one
Using sets, it can be represented as:

In this case, each student is taking only 1 course, but 1 course has been taken by many students.

ER Modeling
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality.

Cardinality: Many-to-many
When entities in all entity sets can take part more than once in the relationship, the cardinality is many-to-many. Let us assume that a student can take more than one course and one course can be taken by many students. So, the relationship will be many-to-many. The total number of tables needed in this case is 3: one for each entity set and one for the relationship.

ER Modeling
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality.

Cardinality: Many-to-many

Using sets, it can be represented as:

In this example, student S1 is enrolled in C1 and C3, and course C3 is taken by S1, S3, and S4. So it is a many-to-many relationship.

A one-to-many mapping, by contrast, lets an entity on one side be related to more than one entity on the other side, but not vice versa, and it can be implemented with 2 tables.

ER Modeling
Participation Constraint

1. Total Participation
Each entity in the entity set must participate in the relationship. If each student must enroll in a course, the participation of
students will be total. Total participation is shown by a double line in the ER diagram.
2. Partial Participation
An entity in the entity set may or may NOT participate in the relationship. If some courses are not taken by any student, the participation of Course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total participation and Course Entity set having
partial participation.

ER Modeling
Participation Constraint

Using sets, it can be represented as:

Every student in the Student entity set is participating in a relationship, but there exists a course C4 that is not taking part in the relationship.
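The same check can be written with set inclusion: participation is total when every entity of the set appears in at least one relationship. A Python sketch with an illustrative instance that mirrors the example (course C4 is never enrolled); the names are assumptions.

```python
students = {"S1", "S2", "S3", "S4"}
courses = {"C1", "C2", "C3", "C4"}
enrolled_in = {("S1", "C1"), ("S2", "C2"), ("S3", "C3"), ("S4", "C3")}


def participation(entity_set, participants):
    """'total' if every entity of the set appears in the relationship, else 'partial'."""
    return "total" if entity_set <= participants else "partial"


student_part = participation(students, {s for s, _ in enrolled_in})
course_part = participation(courses, {c for _, c in enrolled_in})
print(student_part, course_part)  # total partial
```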

ER Modeling
Weak Entity Type and Identifying Relationship

As discussed before, an entity type has a key attribute that uniquely identifies each entity in the entity set.
But there exist some entity types for which key attributes can't be defined.
These are called weak entity types.
For example, a company may store the information of the dependents (parents, children, spouse) of an Employee.
But the dependents can't exist without the employee.
So Dependent will be a weak entity type, and Employee will be the identifying entity type for Dependent.
A weak entity type is represented by a double rectangle.
The participation of a weak entity type is always total.
The relationship between the weak entity type and its identifying strong entity type is called an identifying relationship, and it is represented by a double diamond.
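In the relational model, a weak entity is commonly given a composite primary key made of its owner's key plus a partial key of its own. The SQLite sketch below (table and column names such as `dep_name` and `kinship` are assumptions for illustration) also uses `ON DELETE CASCADE`, so that deleting an employee removes the employee's dependents, reflecting that dependents cannot exist without the employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    -- Weak entity: identified by the owner's key plus a partial key (dep_name)
    CREATE TABLE dependent (
        emp_id   INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        dep_name TEXT NOT NULL,
        kinship  TEXT,  -- e.g. 'Parent', 'Child', 'Spouse'
        PRIMARY KEY (emp_id, dep_name)
    );
""")
conn.execute("INSERT INTO employee VALUES (1, 'Rossi')")
conn.execute("INSERT INTO dependent VALUES (1, 'Giulia', 'Child')")
conn.execute("DELETE FROM employee WHERE emp_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
print(remaining)  # 0: the dependents disappear with their employee
```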

ER Modeling
Converting an ER diagram into the tables

Rule-01: For Strong Entity Set With Only Simple Attributes


A strong entity set with only simple attributes will require only one table in the relational model.
• The attributes of the table will be the attributes of the entity set.
• The primary key of the table will be the key attribute of the entity set.
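Rule-01 applied to a Student entity set with only simple attributes, sketched with Python's built-in `sqlite3` module (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table; its columns are the entity's attributes and the key attribute is the primary key
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dob     TEXT
    )
""")
info = conn.execute("PRAGMA table_info(student)").fetchall()
columns = [row[1] for row in info]          # column names
pk = [row[1] for row in info if row[5]]     # columns that belong to the primary key
print(columns, pk)  # ['roll_no', 'name', 'dob'] ['roll_no']
```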

ER Modeling
Converting an ER diagram into the tables

Rule-02: For Strong Entity Set With Composite Attributes


A strong entity set with any number of composite attributes will require only one table in the relational model.
During conversion, the simple attributes that make up each composite attribute are taken into account, not the composite attribute itself.
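Applying Rule-02 to the Student entity type with the composite attribute Address: a SQLite sketch in which only the simple components of Address become columns (table name, column names and sample row are illustrative assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The composite attribute Address is not a column itself:
# its simple components (street, city, state, country) are.
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        street  TEXT,
        city    TEXT,
        state   TEXT,
        country TEXT
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Maria', 'Via Roma 1', 'Roma', 'Lazio', 'Italy')")
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(columns)  # ['roll_no', 'name', 'street', 'city', 'state', 'country']
```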

ER Modeling
Converting an ER diagram into the tables

Rule-03: For Strong Entity Set With Multivalued Attributes


A strong entity set with a multivalued attribute will require two tables in the relational model:
• One table will contain all the simple attributes, with the primary key.
• The other table will contain the primary key and the multivalued attribute.
When there are several multivalued attributes, each one is normally given its own table.
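Rule-03 sketched in SQLite for a Student with the multivalued attribute Phone_No. The second table is keyed on the pair (student key, phone number), so the same number is not stored twice for one student; the table names and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table 1: the simple attributes, with the primary key
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    -- Table 2: the primary key plus the multivalued attribute, one row per value
    CREATE TABLE student_phone (
        roll_no  INTEGER REFERENCES student(roll_no),
        phone_no TEXT,
        PRIMARY KEY (roll_no, phone_no)
    );
""")
conn.execute("INSERT INTO student VALUES (1, 'Maria')")
conn.executemany("INSERT INTO student_phone VALUES (?, ?)",
                 [(1, "111-111"), (1, "222-222")])
phones = [row[0] for row in conn.execute(
    "SELECT phone_no FROM student_phone WHERE roll_no = 1 ORDER BY phone_no")]
print(phones)  # ['111-111', '222-222']
```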

ER Modeling
Converting an ER diagram into the tables

Rule-04: Translating Relationship Set into a Table


A relationship set will require one table in the relational model.
The attributes of the table are:
• the primary key attributes of the participating entity sets;
• its own descriptive attributes, if any.
The set of non-descriptive attributes will be the primary key.
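Rule-04 sketched in SQLite for the ‘Enrolled in’ relationship set between Student and Course. The descriptive attribute `enrol_date` is a hypothetical example; the primary key is the pair of participating keys, so the same student cannot be enrolled twice in the same course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE enrolled_in (
        roll_no    INTEGER REFERENCES student(roll_no),   -- key of a participating entity set
        course_id  TEXT    REFERENCES course(course_id),  -- key of a participating entity set
        enrol_date TEXT,                                  -- descriptive attribute (hypothetical)
        PRIMARY KEY (roll_no, course_id)                  -- the non-descriptive attributes
    );
""")
conn.execute("INSERT INTO student VALUES (1, 'Maria')")
conn.execute("INSERT INTO course VALUES ('C1', 'Databases')")
conn.execute("INSERT INTO enrolled_in VALUES (1, 'C1', '2024-10-01')")
try:
    conn.execute("INSERT INTO enrolled_in VALUES (1, 'C1', '2024-10-02')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```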

ER Modeling
Converting an ER diagram into the tables

Rule-05: For Binary Relationships With Cardinality Ratios


The following four cases are possible:

• Case-01: Binary relationship with cardinality ratio m:n

• Case-02: Binary relationship with cardinality ratio 1:n

• Case-03: Binary relationship with cardinality ratio m:1

• Case-04: Binary relationship with cardinality ratio 1:1



ER Modeling
Converting an ER diagram into the tables

Rule-05: For Binary Relationships With Cardinality Ratios


• Case-01: Binary relationship with cardinality ratio m:n

ER Modeling
Converting an ER diagram into the tables

Rule-05: For Binary Relationships With Cardinality Ratios


The following four cases are possible:
• Case-02: Binary relationship with cardinality ratio 1:n

Or, in extended form:


Way-00:
1. A (a1, a2)
2. B (b1, b2)
3. R (a1, b1)
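Way-00 keeps three tables. A common alternative, stated here as general relational-design practice rather than taken from these slides, is to merge R into the table of the entity set on the "n" side, because each B entity is related to at most one A entity. The SQLite sketch assumes A is on the "1" side and reuses the slide's generic names; `BR` is a hypothetical name for the merged table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a1 INTEGER PRIMARY KEY, a2 TEXT);
    -- R merged into the 'n' side: each B row stores the key of its single A
    CREATE TABLE BR (
        b1 INTEGER PRIMARY KEY,
        b2 TEXT,
        a1 INTEGER REFERENCES A(a1)
    );
""")
conn.execute("INSERT INTO A VALUES (1, 'x')")
conn.executemany("INSERT INTO BR VALUES (?, ?, ?)", [(10, "p", 1), (11, "q", 1)])
# One A row is related to many B rows
count = conn.execute("SELECT COUNT(*) FROM BR WHERE a1 = 1").fetchone()[0]
print(count)  # 2
```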

ER Modeling
Converting an ER diagram into the tables

Rule-05: For Binary Relationships With Cardinality Ratios


The following four cases are possible:
• Case-03: Binary relationship with cardinality ratio m:1

Or, in extended form:


Way-00:
1. A (a1, a2)
2. B (b1, b2)
3. R (a1, b1)

ER Modeling
Converting an ER diagram into the tables

Rule-05: For Binary Relationships With Cardinality Ratios


The following four cases are possible:
• Case-04: Binary relationship with cardinality ratio 1:1

Or, in extended form:


Way-00:
1. A (a1, a2)
2. B (b1, b2)
3. R (a1, b1)
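For 1:1 the relationship can be merged into either entity's table. One way to keep the mapping one-to-one in the merged table is a UNIQUE constraint on the foreign key, as in this SQLite sketch; the merged table `AR` and the sample values are hypothetical, and this too is general practice rather than something stated on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (b1 INTEGER PRIMARY KEY, b2 TEXT);
    -- R merged into A; UNIQUE ensures each B row is linked to at most one A row
    CREATE TABLE AR (
        a1 INTEGER PRIMARY KEY,
        a2 TEXT,
        b1 INTEGER UNIQUE REFERENCES B(b1)
    );
""")
conn.execute("INSERT INTO B VALUES (10, 'p')")
conn.execute("INSERT INTO AR VALUES (1, 'x', 10)")
try:
    conn.execute("INSERT INTO AR VALUES (2, 'y', 10)")  # a second A for the same B
    second_link_rejected = False
except sqlite3.IntegrityError:
    second_link_rejected = True
print(second_link_rejected)  # True
```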

ER Modeling - Exercises
Exercise 1
Construct an E-R diagram for a car-insurance company whose customers own one or more cars each.

Each car has associated with it zero to any number of recorded accidents.

ER Modeling - Exercises
Exercise 1

ER Modeling - Exercises
Exercise 2
Construct an E-R diagram for a hospital with a set of patients and a set of medical doctors.

Associate with each patient a log of the various tests and examinations conducted.

ER Modeling - Exercises
Exercise 2
(Figure: ER diagram relating patients and doctors, with the relationship labels "takes test", "is prescribed by" and "is followed by", and the attributes Id_patient, Id_doctor, name, specialization, address, telephone number, age, test id, test results and date.)

ER Modeling - Exercises
Exercise 2

ER Modeling - Exercises
Exercise 3
A university registrar’s office maintains data about the following entities:

a) courses, including number, title, credits, syllabus, and prerequisites;

b) course offerings, including course number, year, semester, section number, instructor(s), timings, and classroom;

c) students, including student-id, name, and program;

d) instructors, including identification number, name, department, and title.

Further, the enrollment of students in courses and grades awarded to students in each course they are enrolled for must be
appropriately modeled.

Construct an E-R diagram for the registrar’s office.



ER Modeling - Exercises
Exercise 3
In the answer given here, the main entity sets are student, course, course-offering and instructor.
The entity set course-offering is a weak entity set dependent on course.
The assumptions made are:
a) a class meets only at one particular place and time. This E-R diagram cannot model a class meeting at different places at different times.
b) there is no guarantee that the database does not have two classes meeting at the same place and time.

ER Modeling - Exercises
Exercise 4
Consider a database used to record the marks that students get in different exams of different course offerings.

a) Construct an E-R diagram that models exams as entities, and uses a ternary relationship, for the above database.

b) Construct an alternative E-R diagram that uses only a binary relationship between students and course-offerings. Make sure that only one relationship exists between a particular student and course-offering pair, yet you can represent the marks that a student gets in different exams of a course offering.

ER Modeling - Exercises
Exercise 4 a)

ER Modeling - Exercises
Exercise 4 b)

ER Modeling - Exercises
Exercise 5
Construct appropriate tables for each of the E-R diagrams in Exercises 1 to 3.

a) Car insurance tables:

b) Hospital tables

c) University registrar’s tables



ER Modeling - Exercises
Exercise 5
a) Car insurance tables:

• person (driver-id, name, address)

• car (license, year, model)

• accident (report-number, date, location)

• participated (driver-id, license, report-number, damage-amount)
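As a sketch of how the car insurance schemas could be declared in SQL, using SQLite: hyphens become underscores, and the column types, the composite primary key of `participated` and the sample rows are assumptions, not part of the exercise solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite check the REFERENCES clauses
conn.executescript("""
    CREATE TABLE person   (driver_id TEXT PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE car      (license TEXT PRIMARY KEY, year INTEGER, model TEXT);
    CREATE TABLE accident (report_number INTEGER PRIMARY KEY, date TEXT, location TEXT);
    -- relationship table: keys of the participants plus the descriptive attribute
    CREATE TABLE participated (
        driver_id     TEXT    REFERENCES person(driver_id),
        license       TEXT    REFERENCES car(license),
        report_number INTEGER REFERENCES accident(report_number),
        damage_amount REAL,
        PRIMARY KEY (driver_id, license, report_number)
    );
""")
conn.execute("INSERT INTO person VALUES ('D1', 'Anna', 'Roma')")
conn.execute("INSERT INTO car VALUES ('AB123CD', 2019, 'Panda')")
conn.execute("INSERT INTO accident VALUES (1, '2024-03-01', 'Milano')")
conn.execute("INSERT INTO participated VALUES ('D1', 'AB123CD', 1, 1500.0)")
total = conn.execute(
    "SELECT SUM(damage_amount) FROM participated WHERE license = 'AB123CD'").fetchone()[0]
print(total)  # 1500.0
```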



ER Modeling - Exercises
Exercise 5
b) Hospital tables

• patients (patient-id, name, insurance, date-admitted, date-checked-out)

• doctors (doctor-id, name, specialization)

• test (testid, testname, date, time, result)

• doctor-patient (patient-id, doctor-id)

• test-log (testid, patient-id)

• performed-by (testid, doctor-id)



ER Modeling - Exercises
Exercise 5
c) University registrar’s tables

• student (student-id, name, program)

• course (courseno, title, syllabus, credits)

• course-offering (courseno, secno, year, semester, time, room)

• instructor (instructor-id, name, dept, title)

• enrols (student-id, courseno, secno, semester, year, grade)

• teaches (courseno, secno, semester, year, instructor-id)

• requires (maincourse, prerequisite)
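The registrar schemas can be declared in the same way. In this SQLite sketch the key choices are assumptions: `course_offering` is keyed by (courseno, secno, year, semester) because it is a weak entity set dependent on course, and `enrols` and `teaches` reference that composite key. Sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT, program TEXT);
    CREATE TABLE course (courseno TEXT PRIMARY KEY, title TEXT, syllabus TEXT, credits INTEGER);
    -- weak entity set: identified by its course plus section number, year and semester
    CREATE TABLE course_offering (
        courseno TEXT REFERENCES course(courseno),
        secno INTEGER, year INTEGER, semester TEXT, time TEXT, room TEXT,
        PRIMARY KEY (courseno, secno, year, semester)
    );
    CREATE TABLE instructor (instructor_id TEXT PRIMARY KEY, name TEXT, dept TEXT, title TEXT);
    CREATE TABLE enrols (
        student_id TEXT REFERENCES student(student_id),
        courseno TEXT, secno INTEGER, semester TEXT, year INTEGER, grade TEXT,
        PRIMARY KEY (student_id, courseno, secno, semester, year),
        FOREIGN KEY (courseno, secno, year, semester)
            REFERENCES course_offering(courseno, secno, year, semester)
    );
    CREATE TABLE teaches (
        courseno TEXT, secno INTEGER, semester TEXT, year INTEGER,
        instructor_id TEXT REFERENCES instructor(instructor_id),
        PRIMARY KEY (courseno, secno, semester, year, instructor_id),
        FOREIGN KEY (courseno, secno, year, semester)
            REFERENCES course_offering(courseno, secno, year, semester)
    );
    -- prerequisites: a course related to another course
    CREATE TABLE requires (
        maincourse   TEXT REFERENCES course(courseno),
        prerequisite TEXT REFERENCES course(courseno),
        PRIMARY KEY (maincourse, prerequisite)
    );
""")
conn.execute("INSERT INTO student VALUES ('ST1', 'Maria', 'Law')")
conn.execute("INSERT INTO course VALUES ('DT101', 'Digital Technologies', 'ER modeling', 6)")
conn.execute("INSERT INTO course_offering VALUES ('DT101', 1, 2024, 'Fall', 'Mon 10:00', 'A1')")
conn.execute("INSERT INTO enrols VALUES ('ST1', 'DT101', 1, 'Fall', 2024, 'A')")
grade = conn.execute("SELECT grade FROM enrols WHERE student_id = 'ST1'").fetchone()[0]
print(grade)  # A
```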



ER Modeling - Exercises
Exercise 6
Design an E-R diagram for keeping track of the exploits of your favourite sports team.

You should store the matches played, the scores in each match, the players in each match and individual player statistics for
each match.

Summary statistics should be modeled as derived attributes.



ER Modeling - Exercises
Exercise 6

ER Modeling - Exercises
Exercise 7
Extend the E-R diagram of the previous question to track the same information for all teams in a league.

ER Modeling - Exercises
Exercise 7
Data Warehouse

A Data Warehouse (DW), unlike an operational database, has its own characteristics and peculiarities.
This large data set will need to be:
• Integrated: a fundamental requirement of a data warehouse is the integration of the collected data.
Data from multiple transactional systems and external sources converge in the Data Warehouse.
The goal of integration can be achieved in different ways:
  • through the use of uniform coding methods,
  • through the pursuit of semantic homogeneity of all the variables,
  • by using the same units of measurement.
• Subject oriented: a DW is oriented towards specific business topics rather than applications or functions.
In a DW, data is stored so that it can be easily read or processed by users.
The goal, therefore, is no longer that of minimizing redundancy through normalization, but that of providing data organized in such a way as to favor the production of information.
We move from functional design to data modeling that allows a multidimensional view of the same

It must also be:


• Time-variant: the data archived within a DW cover a much longer time horizon than the data archived in an operational system.
The DW contains a series of information about the areas of interest, which capture the situation of a given phenomenon over a fairly extended time interval.
This implies that the data contained in a DW are updated up to a certain date which, in most cases, is earlier than the moment at which the user queries the system.
This differs from what occurs in a transactional system, in which the data always correspond to an up-to-date situation and usually cannot provide a historical picture of the phenomenon being analysed.
• Non-volatile: the data contained in the DW are not modified, so access is read-only.
This makes database design simpler than that of a transactional application.
In this context, anomalies due to updates are not considered, nor are complex tools used to manage referential integrity or to lock records being updated so that other users cannot access them.

Data warehousing, therefore, describes the process of acquiring, transforming and distributing information present inside or outside companies, to support managers and decision-makers.
For analysis and reporting, a data warehouse offers significant advantages over traditional relational databases in terms of performance and results.
W. H. Inmon, recognized as the "father" of the Data Warehouse (DW), defines it as follows:
«A subject-oriented, integrated, time-variant and non-volatile collection of data, in support of management's decision making»

Subject oriented:
• organized around specific aspects of the company (customers, sales, orders, etc.)
• focused on data useful for decision making, not on day-to-day operations
• aggregated and historicized
Integrated:
• integrates data from different and heterogeneous sources (relational databases, text files, transactional databases, etc.)
• ensures the consistency of the integrated data using data cleaning and data integration techniques
• the data is converted to ensure its consistency and only then entered into the Data Warehouse
Time-variant:
• the data does not only provide current information but also has a historical dimension
Non-volatile:
• it is an archive physically separate from the databases used for daily operations
• it does not require continuous update operations, and therefore it does not need support for transaction and concurrency management
• the only operations performed on a data warehouse are the initial data load and read access

Subject oriented
A data warehouse targets the modeling and analysis of data for decision-makers.
Therefore, data warehouses typically provide a concise and straightforward view around a particular subject, such as customer, product, or sales, instead of the global organization's ongoing operations.
This is done by excluding data that are not useful for the subject and including all data needed by the users to understand the subject.

Integrated
A data warehouse integrates various heterogeneous data sources like RDBMSs, flat files, and online transaction records.
It requires performing data cleaning and integration during data warehousing to ensure consistency in naming conventions, attribute types, etc., among different data sources.
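A toy illustration of integration: two hypothetical sources describe the same customer with different key formats, codes and units, and small mapping functions convert both to one convention before loading. All field names and the numeric gender coding of source B are assumptions made for this sketch.

```python
# Two hypothetical sources describing customer purchases with different conventions
source_a = [{"cust": "C01", "gender": "M", "amount_eur": 12.5}]
source_b = [{"customer_id": "c01", "sex": 1, "amount_cents": 990}]

# Assumed coding used by source B for gender
SEX_CODES = {1: "M", 2: "F"}


def from_a(row):
    """Source A already uses the target codes and euros; only the key is normalized."""
    return {"customer_id": row["cust"].upper(),
            "gender": row["gender"],
            "amount_eur": row["amount_eur"]}


def from_b(row):
    """Source B uses numeric gender codes and amounts in cents."""
    return {"customer_id": row["customer_id"].upper(),
            "gender": SEX_CODES[row["sex"]],
            "amount_eur": row["amount_cents"] / 100}


integrated = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]
print(integrated)
```

After conversion both records share one key format, one gender code and one unit of measurement, which is what makes them comparable inside the warehouse.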

Time-variant
Historical information is kept in a data warehouse.
For example, one can retrieve data from 3 months, 6 months, 12 months, or even earlier from a data warehouse.
This differs from a transaction system, where often only the most current data is kept.

Non-volatile
The data warehouse is a physically separate data store, populated with data transformed from the source operational RDBMS.
Operational updates of data do not occur in the data warehouse, i.e., update, insert, and delete operations are not performed.
It usually requires only two procedures for data access: the initial loading of data and access to data.
Therefore, the DW does not require transaction processing, recovery, and concurrency capabilities, which allows for a substantial speedup of data retrieval.
Non-volatile means that, once entered into the warehouse, data should not change.

If we want a compact definition of a Data Warehouse, we could say that it is an organic collection of information, even from heterogeneous sources (company databases, databases of other companies, internet, flat files), which:
• is managed separately from the main database of the organization;
• serves as a support for decision-making activities, providing a series of consistent historical data.

A Data Warehouse is needed for the following reasons:


1) Business users: business users require a data warehouse to view summarized data from the past. Since these people are non-technical, the data may be presented to them in an elementary form.
2) Store historical data: a Data Warehouse is required to store time-variable data from the past, so that it can be used for various purposes.
3) Make strategic decisions: some strategies may depend on the data in the data warehouse, so the data warehouse contributes to making strategic decisions.
4) Data consistency and quality: by bringing the data from different sources to a common place, users can effectively work towards uniformity and consistency in the data.
5) Fast response time: a data warehouse has to be ready for somewhat unexpected loads and types of queries, which demands a significant degree of flexibility and a quick response time.

Benefits of Data Warehouse:

1) Understand business trends and make better forecasting decisions.

2) Data Warehouses are designed to perform well on enormous amounts of data.

3) The structure of data warehouses is easier for end-users to navigate, understand, and query.

4) Queries that would be complex in many normalized databases can be easier to build and maintain in Data Warehouses.

5) Data warehousing is an efficient method to manage demand for lots of information from lots of users.

6) Data warehousing provides the capability to analyze a large amount of historical data.

Difference between Operational Database and Data Warehouse


The Operational Database is the source of information for the data warehouse.

It includes detailed information used to run the day-to-day operations of the business.

The data changes frequently as updates are made, and reflects the current value of the latest transactions.

Operational database management systems, also called OLTP (Online Transaction Processing) databases, are used to manage dynamic data in real time.

Data Warehouse systems serve users or knowledge workers for the purpose of data analysis and decision-making.

Such systems can organize and present information in specific formats to accommodate the diverse needs of various users.

These systems are called Online Analytical Processing (OLAP) systems.

Data Warehouse and OLTP databases are both relational databases.

However, the goals of these two kinds of databases are different.



Difference between Operational Database and Data Warehouse


Operational Database (OLTP) | Data Warehouse (OLAP)
Designed to support high-volume transaction processing. | Typically designed to support high-volume analytical processing (i.e., OLAP).
Usually concerned with current data. | Usually concerned with historical data.
Data are mainly updated regularly, according to need. | Non-volatile: new data may be added regularly; once added, data are rarely changed.
Designed for real-time business dealings and processes. | Designed for analysis of business measures by subject area, categories, and attributes.
Optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table.
Optimized for validation of incoming information during transactions; uses validation data tables. | Loaded with consistent, valid information; requires no real-time validation.
Supports thousands of concurrent clients. | Supports few concurrent clients relative to OLTP.
Mainly process-oriented. | Mainly subject-oriented.
Optimized to perform fast inserts and updates of relatively small volumes of data. | Optimized to perform fast retrievals of relatively high volumes of data.
Data in. | Data out.
Small amounts of data accessed per operation. | Large amounts of data accessed per query.
Relational databases created for on-line transaction processing (OLTP). | Data warehouses designed for on-line analytical processing (OLAP).

Two different approaches to querying data:

• OLTP
• OLAP

Information systems that rely on a traditional database are often called OLTP (on-line transaction processing) systems.
• Their function is to perform everyday operations: data modification and simple read operations.

A Data Warehouse, on the other hand, is the heart of an OLAP (on-line analytical processing) system.
• Its function is to provide support to data analysis operations and decision-making processes.
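The difference in workload can be illustrated with a single SQLite table (`sales` and its rows are hypothetical): an OLTP-style operation changes one row identified by its key, while an OLAP-style operation is a read-only aggregation over many rows. In a real setting the two workloads would run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
                 [("North", 2022, 100.0), ("North", 2023, 150.0), ("South", 2023, 80.0)])

# OLTP-style: a short transaction that modifies a single row identified by its key
with conn:
    conn.execute("UPDATE sales SET amount = 120.0 WHERE sale_id = 1")

# OLAP-style: a read-only query that aggregates many rows
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region").fetchall()
print(totals)  # [('North', 270.0), ('South', 80.0)]
```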

Differences between:
• OLTP
• OLAP

| OLTP | OLAP |
|---|---|
| Customer-oriented (used by employees or by customers of the organization) | Business-oriented, used by managers, data analysts, etc. |
| Detailed data, often too detailed to be useful for decision making | Synthetic and aggregated data |
| Developed starting from an ER diagram | Developed from star or snowflake schemas |
| Current data | Historical data |
| Fast accesses, to be handled atomically, which require concurrency control between the various user transactions | Read-only but very complex queries |

Traditional approach: "query-driven", that is: guided by the «queries»

In the query-driven approach, when a query arrives at the integrated system, a mediator generates subqueries for the various heterogeneous DBMSs, aggregates the results, and responds to the original query.
• The mediator's job can be very complex.
• The mediator's subqueries interfere with the queries directed to the individual databases.

"Update-driven" approach, that is: guided by integration

In the update-driven approach, information is integrated in advance.
(Figure: an integrator updates the Data Warehouse from the individual databases; queries are addressed to the Data Warehouse.)
• There is no interference between queries to the data warehouse and queries to the individual databases.
• However, the data in the warehouse is not up to date with the latest transactions: it reflects the sources only as of the last update.
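A minimal sketch of the update-driven alternative, under the same kind of hypothetical setup (in-memory SQLite sources with an `orders(product, qty)` table). The hypothetical `refresh_warehouse` function plays the integrator: it copies the source data into the warehouse ahead of time with a simple full reload, so analytical queries never touch the sources, at the price of seeing only what was loaded at the last refresh.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory database standing in for one operational source."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (product TEXT, qty INTEGER)")
    db.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    db.commit()
    return db

sources = [make_source([("chair", 3), ("table", 1)]),
           make_source([("chair", 2), ("bed", 4)])]

# The warehouse keeps its own copy of the integrated data.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (source INTEGER, product TEXT, qty INTEGER)")

def refresh_warehouse():
    """Update-driven integration: extract from every source and load ahead of any query."""
    with warehouse:  # one transaction per refresh
        warehouse.execute("DELETE FROM orders")  # simplest policy: full reload
        for i, db in enumerate(sources):
            rows = db.execute("SELECT product, qty FROM orders").fetchall()
            warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                                  [(i, product, qty) for product, qty in rows])

def warehouse_total(product):
    """Analytical query: touches only the warehouse, never the sources."""
    (total,) = warehouse.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM orders WHERE product = ?", (product,)
    ).fetchone()
    return total

refresh_warehouse()
with sources[0]:
    sources[0].execute("INSERT INTO orders VALUES ('chair', 10)")  # new source transaction

chairs_before = warehouse_total("chair")  # 5: the new row is not in the warehouse yet
refresh_warehouse()
chairs_after = warehouse_total("chair")   # 15 after the next refresh
```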

MULTIDIMENSIONAL DATA MODEL

• A Data Warehouse is based on a "multidimensional" data model.

• The data is seen in the form of «cubes» or «hypercubes».

• Cube dimensions are the entities with respect to which an organization wants to keep track of its data.
  – Example: a company can create a “Sales” Data Warehouse to record the company's sales, based on the dimensions time, product, branch and customer.

• In each position (cell) of the cube a fact is stored, i.e., the value taken by the variable to be analysed.
  – “Product units sold” and “Sales revenues” are examples of facts.
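A minimal sketch of the multidimensional view in plain Python, assuming hypothetical sales rows: the cube is a dictionary whose keys are coordinates (one member for each of the dimensions time, product, branch and customer) and whose values hold the facts of that cell (units sold and revenue).

```python
# Hypothetical fact rows: (time, product, branch, customer, units_sold, revenue)
rows = [
    ("2023-Q1", "chair", "Rome",  "c1", 3, 150.0),
    ("2023-Q1", "chair", "Rome",  "c1", 1,  50.0),
    ("2023-Q1", "table", "Milan", "c1", 2, 400.0),
    ("2023-Q2", "chair", "Milan", "c3", 5, 250.0),
]

# The cube maps a coordinate (one member per dimension) to the facts of that cell;
# rows that share all four coordinates are summed into the same cell.
cube = {}
for time, product, branch, customer, units, revenue in rows:
    cell = cube.setdefault((time, product, branch, customer),
                           {"units_sold": 0, "revenue": 0.0})
    cell["units_sold"] += units
    cell["revenue"] += revenue

print(len(cube))                                 # 3 non-empty cells
print(cube[("2023-Q1", "chair", "Rome", "c1")])  # {'units_sold': 4, 'revenue': 200.0}
```

Only non-empty cells are stored, which is one simple way to cope with the fact that most combinations of members never occur.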

MULTIDIMENSIONAL DATA MODEL: THE CUBE

The «cube» is a 3D representation of a company's sales, based, for instance, on three dimensions:
• period
• product
• branch

MULTIDIMENSIONAL DATA MODEL: THE CUBE


The "cube" of the previous slide in the classic relational model corresponds to the following table.

MULTIDIMENSIONAL DATA MODEL: THE HYPERCUBE


The «hypercube» is, for example, a 4D representation of a company's sales, based on four dimensions of analysis:
• period
• product
• branch
• supplier

To represent it in a three-dimensional space, we fix the dimension “supplier”, obtaining as many three-dimensional cubes as there are suppliers.

MULTIDIMENSIONAL DATA MODEL: CUBOID AND DATA CUBE

An n-dimensional cube is called a “cuboid”.

You get different cuboids depending on the dimensions that are chosen and on the level of detail of each dimension: for the dimension period you can choose a quarter as the level of detail (as done in the previous slides), but also a single month or a semester.

The set of all cuboids is called a “data cube”.



MULTIDIMENSIONAL DATA MODEL: CUBOID AND DATA CUBE


The "cuboid" is, for example, a 4D representation of a company's sales, based on the following dimensions:
• period
• product
• branch
• supplier
The lattice of cuboids forms a "data cube".
The figure shows the lattice of cuboids that makes up the 4D data cube for the period, product, branch and supplier (vendor) dimensions.
Each cuboid represents a different subset of dimensions.
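The lattice can be enumerated mechanically: each cuboid corresponds to one subset of the dimensions, from the base cuboid (all four dimensions) to the apex cuboid (no dimension, i.e. the grand total). A minimal Python sketch, assuming a single level of detail per dimension:

```python
from itertools import combinations

dimensions = ("period", "product", "branch", "supplier")

# Each cuboid is identified by the subset of dimensions it keeps (its group-by set),
# listed from the base cuboid (4-D) down to the apex cuboid (0-D).
lattice = [
    subset
    for k in range(len(dimensions), -1, -1)
    for subset in combinations(dimensions, k)
]

print(len(lattice))  # 16 = 2**4 cuboids
print(lattice[0])    # ('period', 'product', 'branch', 'supplier'): the base cuboid
print(lattice[-1])   # (): the apex cuboid, a single grand total
```

With n dimensions this gives 2^n cuboids; if each dimension also has a concept hierarchy with L_i levels, the count grows to the product of (L_i + 1) over all dimensions.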

MULTIDIMENSIONAL DATA MODEL: CONCEPT HIERARCHIES

A concept hierarchy, associated with a dimension, is a set of mappings from detailed concepts to progressively more aggregate concepts.

(Figure: concept hierarchy for the attribute "location".)

MULTIDIMENSIONAL DATA MODEL: CONCEPT HIERARCHIES

Many concept hierarchies are implicit in the attributes that make up the database.
• For example, the location dimension is described by the attributes:
  • street,
  • city,
  • province,
  • country.

• In the graphical representation, the attributes are ordered from the most basic to the most general.
• You get a “hierarchy schema”.
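A concept hierarchy can be sketched as a chain of lookup tables, each mapping a value to its parent at the next, more general level. The levels follow the slide (street, city, province, country), while the place names and the `generalize` helper below are hypothetical examples.

```python
# Hierarchy schema, from the most basic level to the most general one.
LEVELS = ["street", "city", "province", "country"]

# Hypothetical lookup tables: value at one level -> parent value at the next level.
PARENT = {
    "street":   {"Via Ostiense": "Rome", "Via Torino": "Milan"},
    "city":     {"Rome": "RM", "Milan": "MI"},
    "province": {"RM": "Italy", "MI": "Italy"},
}

def generalize(value, from_level, to_level):
    """Climb the hierarchy from a detailed value to a more general level."""
    i, j = LEVELS.index(from_level), LEVELS.index(to_level)
    if j < i:
        raise ValueError("to_level must be at least as general as from_level")
    for level in LEVELS[i:j]:
        value = PARENT[level][value]
    return value

print(generalize("Via Torino", "street", "province"))  # MI
print(generalize("Via Ostiense", "street", "country"))  # Italy
```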

MULTIDIMENSIONAL DATA MODEL: CONCEPT HIERARCHIES

• Concept hierarchies can also be obtained by grouping or subdividing the basic values of a given dimension.
• This is referred to as a "set-grouping hierarchy".

(Figure: concept hierarchy for the attribute "price".)
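A set-grouping hierarchy can be sketched by bucketing basic values into ranges. The bands below and their boundaries are hypothetical; `bisect.bisect_right` from Python's standard library finds the band of each price, treating each upper bound as exclusive.

```python
import bisect
from collections import Counter

# Hypothetical price bands: upper bounds (exclusive) and the label of each band.
BOUNDS = [100, 500, 1000]
LABELS = ["0-100", "100-500", "500-1000", "1000+"]

def price_band(price):
    """Map a basic price value to its group in a set-grouping hierarchy."""
    return LABELS[bisect.bisect_right(BOUNDS, price)]

prices = [45.0, 120.0, 480.0, 999.99, 1500.0]
bands = Counter(price_band(p) for p in prices)  # number of items per band

print(price_band(100))  # 100-500 (the bound belongs to the upper band)
```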

MULTIDIMENSIONAL DATA MODEL: OPERATIONS ON CUBOIDS


OLAP systems based on the multidimensional data model provide a series of standard operations on cuboids:
• Roll-up (drill-up): performs aggregations by going up the concept hierarchies or by eliminating some dimensions.
• Drill-down: the inverse of roll-up; it moves from less detailed data to more detailed data by going down the concept hierarchies.
• Slice & Dice: performs a selection on one or more dimensions, obtaining sub-cubes of the starting cube.
• Pivot: rotates the axes of a cuboid, leaving the data unchanged.
• Drill-through: the data warehouse goes below the level of detail of the base cuboid, directly accessing the source data.
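A minimal sketch of roll-up and drill-down on a small cuboid stored as a Python dictionary, assuming hypothetical monthly sales per city. The helper `roll_up` (illustrative, not a library function) sums the cells whose coordinates map to the same coarser coordinate, which covers both ways of rolling up: climbing a hierarchy (month to quarter) and eliminating a dimension (city).

```python
from collections import defaultdict

# Hypothetical base cuboid: (month, city) -> units sold.
base = {
    ("2023-01", "Rome"): 10, ("2023-02", "Rome"): 5, ("2023-04", "Rome"): 7,
    ("2023-01", "Milan"): 4, ("2023-05", "Milan"): 6,
}

def quarter_of(month):
    """Concept hierarchy month -> quarter, e.g. '2023-05' -> '2023-Q2'."""
    year, m = month.split("-")
    return f"{year}-Q{(int(m) - 1) // 3 + 1}"

def roll_up(cuboid, key_map):
    """Aggregate cells whose coordinates map to the same coarser coordinate."""
    result = defaultdict(int)
    for coords, value in cuboid.items():
        result[key_map(coords)] += value
    return dict(result)

# Roll-up 1: climb the time hierarchy (month -> quarter).
by_quarter = roll_up(base, lambda c: (quarter_of(c[0]), c[1]))
# Roll-up 2: eliminate the city dimension altogether.
by_quarter_total = roll_up(by_quarter, lambda c: (c[0],))
print(by_quarter_total)  # {('2023-Q1',): 19, ('2023-Q2',): 13}

# Drill-down is the inverse move: go back to the finer cuboid (here `base`)
# and select the cells that make up one coarse cell.
q1_detail = {c: v for c, v in base.items() if quarter_of(c[0]) == "2023-Q1"}
```

Drill-down is only possible when the finer-grained data is still available, as the base cuboid is here.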

MULTIDIMENSIONAL DATA MODEL: OPERATIONS ON CUBOIDS

• Roll-up (drill-up)

MULTIDIMENSIONAL DATA MODEL: OPERATIONS ON CUBOIDS

• Drill-down

MULTIDIMENSIONAL DATA MODEL: OPERATIONS ON CUBOIDS

• Slice:
A slice of a multidimensional array is a column of data corresponding to a single value for one or more members of a dimension.
Slicing is the act of dividing up the cube to extract the information for a given slice.
It is important because it helps the user visualize and gather information specific to a dimension.
When you think of slicing, think of it as a specialized filter for a particular value in a dimension.

For instance, if a user wanted to know the total number of Wireless Mice sold over the whole time span of the dataset (2000-2003), the user would perform a horizontal slice, as shown in the figure on the right.
The picture shows a slicing that filters the information so that we have only the data for ASIA, for all products and for all the years.
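A minimal sketch of slicing, assuming a hypothetical cube keyed by (region, product, year) with values loosely modelled on the slide's example; `slice_cube` is an illustrative helper, not a library function. Fixing one dimension to a single value removes that dimension from the result.

```python
# Hypothetical cube: (region, product, year) -> units sold.
DIMS = ("region", "product", "year")
cube = {
    ("Asia", "Wireless Mouse", 2000): 120,
    ("Asia", "Keyboard", 2001): 80,
    ("Europe", "Wireless Mouse", 2000): 95,
    ("Europe", "Wireless Mouse", 2003): 60,
    ("North America", "Cell phone", 2000): 300,
}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value; the result has one dimension fewer."""
    i = DIMS.index(dim)
    return {c[:i] + c[i + 1:]: v for c, v in cube.items() if c[i] == value}

asia = slice_cube(cube, "region", "Asia")             # Asia only, all products, all years
mice = slice_cube(cube, "product", "Wireless Mouse")  # Wireless Mice, all regions and years
print(sum(mice.values()))  # 275: all Wireless Mice sold over the whole period
```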

MULTIDIMENSIONAL DATA MODEL: OPERATIONS ON CUBOIDS

• Dice:
Dicing is similar to slicing, but it works a little differently.
Slicing filters the data to focus on a particular attribute value; dicing, on the other hand, is more like a zoom feature that selects a subset over all the dimensions, restricted to specific values of each dimension.
This tool is very useful in allowing the user to get more detailed information on what goes on at a smaller scale.
For instance, the figure shows a graphical representation of dicing for a particular product, over a specific time span, for a particular region: the subset shows the cell phone market in North America only, for the year 2000.
It also incorporates the drilling technique defined previously: as the figure shows, cell phones are subdivided, so we use a lower level of the hierarchy and get information for the various types of cell phones.
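Dicing can be sketched with the same kind of dictionary cube (hypothetical values): instead of fixing one dimension, it keeps every dimension but restricts each constrained one to a chosen set of members, so the result is a smaller sub-cube rather than a lower-dimensional slice. `dice` is an illustrative helper, not a library function.

```python
DIMS = ("region", "product", "year")
cube = {
    ("Asia", "Wireless Mouse", 2000): 120,
    ("Europe", "Wireless Mouse", 2000): 95,
    ("Europe", "Cell phone", 2000): 150,
    ("North America", "Cell phone", 2000): 300,
    ("North America", "Cell phone", 2001): 310,
}

def dice(cube, **allowed):
    """Dice: keep the cells whose members fall in the chosen set for every
    constrained dimension; all dimensions are kept in the result."""
    idx = {DIMS.index(dim): set(values) for dim, values in allowed.items()}
    return {c: v for c, v in cube.items()
            if all(c[i] in values for i, values in idx.items())}

# Cell phones, North America only, year 2000 (as in the slide's example).
sub = dice(cube, region={"North America"}, product={"Cell phone"}, year={2000})
# A wider dice: two regions, one year, any product.
wider = dice(cube, region={"Europe", "North America"}, year={2000})
```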

MULTIDIMENSIONAL DATA MODEL: OPERATIONS ON CUBOIDS

• Pivot:
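Here is a minimal sketch of the pivot operation on a hypothetical 2-D cuboid (product by quarter): pivoting swaps the axes, and every value stays attached to the same pair of members. The helpers `pivot` and `as_grid` are illustrative only.

```python
# Hypothetical 2-D cuboid: (product, quarter) -> units sold.
cells = {("chair", "Q1"): 10, ("chair", "Q2"): 12, ("table", "Q1"): 4}

def pivot(cells):
    """Pivot: swap the two axes; every value stays attached to the same pair of members."""
    return {(col, row): v for (row, col), v in cells.items()}

def as_grid(cells):
    """Render a 2-D cuboid as rows (row header first, missing cells shown as 0)."""
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    return [[r] + [cells.get((r, c), 0) for c in cols] for r in rows]

print(as_grid(cells))         # [['chair', 10, 12], ['table', 4, 0]]
print(as_grid(pivot(cells)))  # [['Q1', 10, 4], ['Q2', 12, 0]]
```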

MULTIDIMENSIONAL DATA MODEL: OPERATIONS ON CUBOIDS

• Drill-through:
Drill-through operations use relational SQL facilities to go from the bottom level of a data cube down to its back-end relational tables.
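A minimal sketch of drill-through with SQLite, assuming a hypothetical back-end `sales` table: first the aggregated value of one cell of the (city, quarter) cuboid, then the underlying rows that produced it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, city TEXT, quarter TEXT, units_sold INTEGER);
INSERT INTO sales (city, quarter, units_sold) VALUES
  ('Rome', 'Q1', 3), ('Rome', 'Q1', 7), ('Rome', 'Q2', 2), ('Milan', 'Q1', 5);
""")

# One cell of the (city, quarter) cuboid: an aggregated value...
(cell,) = db.execute(
    "SELECT SUM(units_sold) FROM sales WHERE city = ? AND quarter = ?", ("Rome", "Q1")
).fetchone()

# ...and the drill-through: the back-end rows that were summed to produce it.
detail = db.execute(
    "SELECT sale_id, units_sold FROM sales WHERE city = ? AND quarter = ? ORDER BY sale_id",
    ("Rome", "Q1"),
).fetchall()
print(cell, detail)  # 10 [(1, 3), (2, 7)]
```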

MULTIDIMENSIONAL DATABASES SCHEMAS

A database for OLTP applications is developed starting from an ER diagram.

For data warehouses, alternative models are used:
• star schema
• snowflake schema
• galaxy schema
Each dimension has an associated dimension table, which describes the attributes that make it up.
For example, the item dimension can contain the attributes name, brand, type.
The core of the data warehouse is stored in a fact table.
Examples of facts are:
• “Product units sold”
• “Sales revenue”

STAR SCHEMA

In the star schema we have a fact table and various dimension tables.
The fact table contains the foreign keys to the dimension tables.
Dimension tables are not normalized.

STAR SCHEMA

Given a star schema, a cuboid is determined by choosing:
• a fact from the fact table
• a set of dimensions
• for each chosen dimension, an attribute of the corresponding dimension table.
For example, the cuboid corresponding to the choice of the fact «units_sold» and the attributes «type», «quarter» and «city» is shown in the figure.

The cuboid corresponds to the result of the SQL query:

    SELECT type, quarter, city, SUM(units_sold)
    FROM sales NATURAL JOIN item NATURAL JOIN location NATURAL JOIN time
    GROUP BY type, quarter, city;
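This query can be tried on a tiny star schema in an in-memory SQLite database. The table names and the key, `type`, `quarter`, `city` and `units_sold` columns follow the query; the remaining columns and all values are hypothetical. The dimension tables deliberately share no column names except the keys, because NATURAL JOIN joins on every column with the same name. As is typical for star schemas, the fact table's primary key is the combination of its foreign keys.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, country TEXT);
CREATE TABLE time     (time_key INTEGER PRIMARY KEY, day TEXT, month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE sales (
    item_key     INTEGER REFERENCES item,
    location_key INTEGER REFERENCES location,
    time_key     INTEGER REFERENCES time,
    units_sold   INTEGER,
    PRIMARY KEY (item_key, location_key, time_key)
);
INSERT INTO item VALUES (1, 'Chair A', 'Acme', 'chair'), (2, 'Chair B', 'Zeta', 'chair'),
                        (3, 'Desk', 'Acme', 'table');
INSERT INTO location VALUES (1, 'Via Roma 1', 'Rome', 'Italy'), (2, 'Corso Como 2', 'Milan', 'Italy');
INSERT INTO time VALUES (1, '2023-01-10', 1, 'Q1', 2023), (2, '2023-04-02', 4, 'Q2', 2023);
INSERT INTO sales VALUES (1, 1, 1, 3), (2, 1, 1, 2), (3, 1, 1, 1), (1, 2, 2, 4);
""")

# The cuboid (type, quarter, city) with the fact units_sold.
cuboid = db.execute("""
    SELECT type, quarter, city, SUM(units_sold)
    FROM sales NATURAL JOIN item NATURAL JOIN location NATURAL JOIN time
    GROUP BY type, quarter, city
    ORDER BY type, quarter, city
""").fetchall()
print(cuboid)  # [('chair', 'Q1', 'Rome', 5), ('chair', 'Q2', 'Milan', 4), ('table', 'Q1', 'Rome', 1)]
```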

STAR SCHEMA

A star schema is the elementary form of a dimensional model, in which data are organized into facts and dimensions. A fact is an event that is counted or measured, such as a sale or a login. A dimension includes reference data about the fact, such as date, item, or customer.
A star schema is a relational schema whose design represents a multidimensional data model. It is the simplest data warehouse schema. It is known as a star schema because its entity-relationship diagram resembles a star, with points diverging from a central table. The center of the schema consists of a large fact table, and the points of the star are the dimension tables.

STAR SCHEMA
Fact Tables
A fact table is a table in a star schema that contains facts and is connected to the dimensions. A fact table has two types of columns: those that contain facts and those that are foreign keys to the dimension tables. The primary key of a fact table is generally a composite key made up of all of its foreign keys.
A fact table may contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often called summary tables). A fact table generally contains facts with the same level of aggregation.
Dimension Tables
A dimension is a structure usually composed of one or more hierarchies that categorize data. If a dimension has no hierarchies and levels, it is called a flat dimension or list. The primary key of each dimension table is part of the composite primary key of the fact table. Dimensional attributes help to describe the dimensional values; they are generally descriptive, textual values. Dimension tables are usually smaller than the fact table.
For example, a fact table may store data about sales, while the dimension tables store data about geographic regions (markets, cities), clients, products, times and channels.

STAR SCHEMA
Advantages of Star Schema
Star schemas are easy for end users and applications to understand and navigate. With a well-designed schema, users can quickly analyze large, multidimensional data sets.
The main advantages of star schemas in a decision-support environment are:
• Query performance
A star schema database has a limited number of tables and clear join paths, so queries run faster than they do against OLTP systems. Small single-table queries, frequently on a dimension table, are almost instantaneous. Large join queries that involve multiple tables take only seconds or minutes to run. In a star schema design, the dimensions are connected only through the central fact table. When two dimension tables are used in a query, only one join path, through the fact table, exists between them. This design feature enforces accurate and consistent query results.

STAR SCHEMA
Advantages of Star Schema
• Load performance and administration
Structural simplicity also decreases the time required to load large batches of records into a star schema database. By describing facts and dimensions and separating them into different tables, the impact of a load operation is reduced. Dimension tables can be populated once and occasionally refreshed. New facts can be added regularly and selectively by appending records to the fact table.
• Built-in referential integrity
A star schema has referential integrity built in when information is loaded. Referential integrity is enforced because each record in a dimension table has a unique primary key, and all keys in the fact table are legitimate foreign keys drawn from the dimension tables. A record in the fact table that is not correctly related to a dimension cannot be given the correct key value to be retrieved.
• Easily understood
A star schema is simple to understand and navigate, with dimensions joined only through the fact table. These joins are more meaningful to the end user because they represent the fundamental relationships between parts of the underlying business. Users can also browse the dimension table attributes before constructing a query.
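A minimal sketch of referential integrity between a fact table and a dimension table, using SQLite foreign keys on hypothetical `item` and `sales` tables. Note that in SQLite the check is performed by the DBMS only when foreign key enforcement is switched on with `PRAGMA foreign_keys = ON`; the schema design alone does not enforce it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.executescript("""
CREATE TABLE item  (item_key INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE sales (item_key INTEGER NOT NULL REFERENCES item(item_key), units_sold INTEGER);
INSERT INTO item VALUES (1, 'chair');
""")

db.execute("INSERT INTO sales VALUES (1, 3)")       # valid: item 1 exists
try:
    db.execute("INSERT INTO sales VALUES (99, 1)")  # no item with key 99
    rejected = False
except sqlite3.IntegrityError:                     # "FOREIGN KEY constraint failed"
    rejected = True
print(rejected)  # True
```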

STAR SCHEMA
Disadvantage of Star Schema
There are some conditions that cannot be met by star schemas: for example, the relationship between users and bank accounts cannot be described as a star schema, because the relationship between them is many-to-many.
Example: suppose a star schema is composed of a fact table, SALES, and several dimension tables connected to it for time, branch, item, and geographic location.
The TIME table has a column for each of day, month, quarter, and year. The ITEM table has the columns item_key, item_name, brand, type, supplier_type. The BRANCH table has the columns branch_key, branch_name, branch_type. The LOCATION table has columns of geographic data, including street, city, state, and country.
In this scenario, the SALES table contains only four key columns with the IDs from the dimension tables TIME, ITEM, BRANCH, and LOCATION, instead of four columns for time data, four columns for ITEM data, three columns for BRANCH data, and four columns for LOCATION data. Thus, the size of the fact table is significantly reduced. When we need to change an item, we only need to make a single change in the dimension table, instead of making many changes in the fact table.
We can create even more complex star schemas by normalizing a dimension table into several tables. The normalized dimension table is called a snowflake.

SNOWFLAKE SCHEMA

Dimension tables are normalized.
This saves space but is less efficient, because queries require multiple joins to run.

SNOWFLAKE SCHEMA
A schema is known as a snowflake if one or more dimension tables do not connect directly to the fact table but must join through other dimension tables.
The snowflake schema is an expansion of the star schema where each point of the star explodes into more points.
It is called a snowflake schema because its diagram resembles a snowflake.
Snowflaking is a method of normalizing the dimension tables of a star schema. When we normalize all the dimension tables entirely, the resulting structure resembles a snowflake with the fact table in the middle.
Snowflaking is used to improve the performance of specific queries.
The schema is diagrammed with each fact surrounded by its associated dimensions, and those dimensions are related to other dimensions, branching out into a snowflake pattern.
The snowflake schema consists of one fact table linked to many dimension tables, which can be linked to other dimension tables through a many-to-one relationship.
Tables in a snowflake schema are generally normalized to the third normal form. Each dimension table represents exactly one level of a hierarchy.
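A minimal sketch of snowflaking with SQLite, assuming a hypothetical item dimension whose "type" level has been moved into its own table: the query that groups sales by type now needs one more join than the equivalent star-schema query.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Snowflaked item dimension: the 'type' level lives in its own table.
CREATE TABLE item_type (type_key INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT,
                   type_key INTEGER REFERENCES item_type(type_key));
CREATE TABLE sales (item_key INTEGER REFERENCES item(item_key), units_sold INTEGER);
INSERT INTO item_type VALUES (1, 'chair'), (2, 'table');
INSERT INTO item VALUES (10, 'Chair A', 1), (11, 'Chair B', 1), (12, 'Desk', 2);
INSERT INTO sales VALUES (10, 3), (11, 2), (12, 1);
""")

# Reaching the 'type' level requires going through item to item_type.
rows = db.execute("""
    SELECT t.type_name, SUM(s.units_sold)
    FROM sales s
    JOIN item i      ON s.item_key = i.item_key
    JOIN item_type t ON i.type_key = t.type_key
    GROUP BY t.type_name
    ORDER BY t.type_name
""").fetchall()
print(rows)  # [('chair', 5), ('table', 1)]
```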

SNOWFLAKE SCHEMA
Advantages of Snowflake Schema
• The primary advantage of the snowflake schema is an improvement in query performance, due to reduced disk storage requirements and to joins on smaller lookup tables.
• It provides greater scalability in the interrelationship between dimension levels and components.
• No redundancy, so it is easier to maintain.

Disadvantages of Snowflake Schema
• The primary disadvantage of the snowflake schema is the additional maintenance effort required by the growing number of lookup tables.
• Queries are more complex and hence more difficult to understand.
• More tables mean more joins, and therefore longer query execution times.

STAR SCHEMA VS SNOWFLAKE SCHEMA


A star schema stores all the attributes of a dimension in one denormalized table. This requires more disk space than a more normalized snowflake schema. Snowflaking normalizes the dimension by moving attributes with low cardinality into separate dimension tables that relate to the core dimension table through foreign keys. Snowflaking for the sole purpose of minimizing disk space is not recommended, because it can adversely impact query performance.
In a snowflake schema, tables are normalized to remove redundancy: dimension tables are decomposed into multiple dimension tables.
The figure shows a simple star schema for sales in a manufacturing company. The sales fact table includes quantity, price, and other relevant metrics. SALESREP, CUSTOMER, PRODUCT, and TIME are the dimension tables.

STAR SCHEMA VS SNOWFLAKE SCHEMA


The star schema for sales shown above contains only five tables, whereas the normalized version extends to eleven tables.
Notice that in the snowflake schema, the attributes with low cardinality in each original dimension table are removed to form separate tables.
These new tables are connected back to the original dimension table through artificial keys.

STAR SCHEMA VS SNOWFLAKE SCHEMA


| | Star schema | Snowflake schema |
|---|---|---|
| Ease of maintenance/change | Has redundant data and hence is less easy to maintain/change | No redundancy and therefore easier to maintain and change |
| Ease of use | Less complex queries, simple to understand | More complex queries and therefore less easy to understand |
| Parent table | A dimension table does not have any parent table | A dimension table has one or more parent tables |
| Query performance | Fewer foreign keys and hence shorter query execution time | More foreign keys and thus longer query execution time |
| Normalization | De-normalized tables | Normalized tables |
| Type of data warehouse | Good for data marts with simple relationships (one-to-one or one-to-many) | Good for a data warehouse core, to simplify complex relationships (many-to-many) |
| Joins | Fewer joins | Higher number of joins |
| Dimension table | A single dimension table for each dimension | May have more than one dimension table for each dimension |
| Hierarchies | The hierarchies of a dimension are stored in the dimension table itself | Hierarchies are broken into separate tables; these help to drill down from the topmost to the lowermost levels of the hierarchy |
| When to use | When the dimension tables contain fewer rows | When dimension tables store a huge number of rows with redundant information and space is an issue, to save space |
| Data warehouse system | Works best in any data warehouse/data mart | Better for small data warehouses/data marts |

GALAXY SCHEMA

Also called a fact constellation schema.
It features several fact tables that share dimension tables.
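A minimal galaxy-schema sketch in SQLite: two fact tables (a hypothetical `sales` and `shipping`) share the same `item` dimension table. Correlated subqueries aggregate each fact table separately, which avoids the row multiplication that a direct join of two fact tables would cause.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Shared dimension table.
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Two fact tables that both reference the same dimension.
CREATE TABLE sales    (item_key INTEGER REFERENCES item(item_key), units_sold INTEGER);
CREATE TABLE shipping (item_key INTEGER REFERENCES item(item_key), units_shipped INTEGER);
INSERT INTO item VALUES (1, 'Chair'), (2, 'Desk');
INSERT INTO sales VALUES (1, 5), (1, 2), (2, 1);
INSERT INTO shipping VALUES (1, 6), (2, 1);
""")

# Facts from both fact tables compared through the shared dimension.
rows = db.execute("""
    SELECT i.item_name,
           (SELECT SUM(units_sold)    FROM sales    s WHERE s.item_key = i.item_key),
           (SELECT SUM(units_shipped) FROM shipping h WHERE h.item_key = i.item_key)
    FROM item i
    ORDER BY i.item_name
""").fetchall()
print(rows)  # [('Chair', 7, 6), ('Desk', 1, 1)]
```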

DATA WAREHOUSE & DATA MART

Differences between Data Warehouse and Data Mart:

• A Data Warehouse collects information on all aspects of an organization: customers, sales, personnel, etc.
• A Data Mart is a subset of the Data Warehouse focused on a single aspect (for example, sales) and managed by a single department.

• For Data Warehouses, the Galaxy Schema is mainly used.
• The Star Schema is generally used for Data Marts.

Data Warehouse
Exercises
Data Warehouse – Exercise 1
You want to create a data warehouse for a company that sells wholesale furniture.
• The data warehouse must allow the company to analyze its revenues.
• Costs and revenues must be analyzed considering the following parameters: furniture, customers, time (day level).
• The company is interested in analyzing the furniture with respect to its type (tables, chairs, beds, wardrobes, etc.) and with respect to its category (kitchen, living room, bedroom, bathroom, office, etc.).
• The company wants to analyze customers with respect to their geographical location, considering at least city, region, state.
Data Warehouse – Exercise 1

Sales (fact table): Furniture-k, Customer-k, Time-k, Amount, Total price, Discount
Furniture: Furniture-k, Description, Style, Material, Type, Category
Customers: Customer-k, First name, Last name, Business name, Address, City, Region, Country
Time: Time-k, Date, Day of the week, Month, Quarter, Year
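One possible way to turn this star schema into tables, sketched with SQLite. Column types, snake_case names and sample rows are assumptions; `amount` is read as a quantity and `total_price` as the revenue of the sale. The composite primary key on `sales` assumes at most one row per furniture, customer and day. Since the schema shown has no cost attribute, the example query analyzes revenues only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE furniture (
    furniture_k INTEGER PRIMARY KEY,
    description TEXT, style TEXT, material TEXT,
    type TEXT,      -- tables, chairs, beds, wardrobes, ...
    category TEXT   -- kitchen, living room, bedroom, ...
);
CREATE TABLE customers (
    customer_k INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, business_name TEXT,
    address TEXT, city TEXT, region TEXT, country TEXT
);
CREATE TABLE time (
    time_k INTEGER PRIMARY KEY,
    date TEXT, day_of_week TEXT, month INTEGER, quarter INTEGER, year INTEGER
);
CREATE TABLE sales (
    furniture_k INTEGER REFERENCES furniture(furniture_k),
    customer_k  INTEGER REFERENCES customers(customer_k),
    time_k      INTEGER REFERENCES time(time_k),
    amount INTEGER, total_price REAL, discount REAL,
    PRIMARY KEY (furniture_k, customer_k, time_k)
);
INSERT INTO furniture VALUES (1, 'Oak table', 'rustic', 'oak', 'table', 'kitchen'),
                             (2, 'Office chair', 'modern', 'steel', 'chair', 'office');
INSERT INTO customers VALUES (1, 'Anna', 'Rossi', 'Rossi Srl', 'Via X 1', 'Rome', 'Lazio', 'Italy'),
                             (2, 'Luca', 'Bianchi', 'Bianchi Spa', 'Via Y 2', 'Milan', 'Lombardy', 'Italy');
INSERT INTO time VALUES (1, '2024-03-01', 'Friday', 3, 1, 2024);
INSERT INTO sales VALUES (1, 1, 1, 2, 800.0, 0.0), (2, 2, 1, 10, 1500.0, 100.0),
                         (2, 1, 1, 4, 600.0, 0.0);
""")

# Example analysis: revenue by furniture category and customer region.
revenue = db.execute("""
    SELECT f.category, c.region, SUM(s.total_price)
    FROM sales s
    JOIN furniture f ON s.furniture_k = f.furniture_k
    JOIN customers c ON s.customer_k = c.customer_k
    GROUP BY f.category, c.region
    ORDER BY f.category, c.region
""").fetchall()
print(revenue)  # [('kitchen', 'Lazio', 800.0), ('office', 'Lazio', 600.0), ('office', 'Lombardy', 1500.0)]
```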
Data Warehouse – Exercise 2
Starting from the following E/R diagram, let's build the Data Warehouse to analyze library loans:
• by user
• by literary work
• by author
• by literary series

(Figure: E/R diagram with the entities LITERARY WORK, EDITION, LITERARY VOLUME, LITERARY SERIES, COPY, EDITOR, LOAN and USER. Attributes shown include work code, author, title, publishing year, ISBN code, edition, year, serie id, serie name, library code, editor id, editor name, loan date, return date, and the user's id, first name, last name, telephone number, date of birth and address.)
Data Warehouse – Exercise 2

Loan: User Code, Serie code, Author Code, Work code, Loan date, Return date
User: User Code, Username, User Last Name, Date of birth, User Tax Code, User address, User City, User Region, User Country
Author: Author Code, Author first name, Author last name, Date of birth
Literary work: Work code, Work title, Year of publication, Author code, Serie code, ISBN code, Edition code, Edition year
Literary series: Serie code, Serie name, Serie year, Editor name, Edition code, Edition year

ERA Diagrams
Homework, 24 March 2024

Send your homework to: gconigliaro@os.uniroma3.it


ERA Diagram – Home Work n°1

Draw an ER diagram representing the following requirements:


• A university student is identified by a student number and is described by a first name, last name, date
and place of birth.
• For a university course you want to register the name, a program, the degree course and the faculty it belongs to. The name of the course and the related degree program uniquely identify a course.
• For each course it is necessary to register the students who passed the relevant exam, with the date and the grade of the exam.

• Remember to indicate the primary key.
• Try to indicate the cardinality of the relationships.
ERA Diagram – Home Work n°2

Draw an ER diagram representing the following requirements:


• A state has a name, a number of inhabitants, a surface area and a density. Neighboring states are also
registered.
• Each state has a president, whose name, surname, political party (if any) and date of election are known.

• Remember to indicate the primary keys, simple or composite.
• Try to indicate the cardinality of the relationships.
• Indicate the derived attributes.
