
Topic 01

What is Business
Intelligence?

ICT394 BI Application
Development
Resources for this topic
• Reading:
– Obeidat, M., North, M., Richardson, R., Rattanak, V., and
North, S., 2015, Business Intelligence Technology,
Applications, and Trends, International Management Review,
11(2): 47-56
• Videos:
– Hitachi Solutions Canada, 2014, What is Business Intelligence
(BI)?, https://www.youtube.com/watch?v=hDJdkcdG1iA
– Microsoft BI TV, 2009, Business Intelligence Customer Case
Study: Premier Bank Card,
https://www.youtube.com/watch?v=ritRCFcyaLw
Learning Outcomes
• At the completion of this topic, you should be able to:
– Provide and discuss a working definition of BI
– Explain a variety of reasons as to why BI exists
– Provide an overview of BI in terms of data, analysis and
presentation
• This topic contributes to the following unit learning
outcomes:
– Demonstrate an understanding of the role of BI in
organisations
– Describe the common data sources that exist in organisations
and their use in BI
What do you need to do for this topic?

• Readings
• Case Study Videos
• Lecture Slides/Recordings
• Tutorial Questions
• Workshop
• End of Topic Quiz
Lecture Outline
1. Introduction
2. What is BI?
3. Why does BI exist?
4. Data, Analysis and Presentation
5. Conclusion
ICT394 Business Intelligence
Application Development
Topic 01: Part 02
What is Business Intelligence?
What is Business Intelligence?

Perhaps a better question is, “What isn’t it?”


- It is not a single technology
- It is not an application
- It is not a methodology
- It isn’t about spying on competitors
- It IS an often-used, but poorly understood
term
There are many definitions…
They range from simplistic
- Rud (2009) states that it is, “…a term that
encompasses all the capabilities required to turn data
into intelligence” (p. 3)
To functional…
- Langit (2007) citing Microsoft, states that BI is “…a
method of storing and presenting key enterprise data
so that anyone in your company can quickly and easily
ask questions of accurate and timely data.” (p. 1)
- Rud (2009) “…getting the right information to the
right people at the right time through the right
channel” (p. 3)
More definitions…
• Williams & Williams (2006) suggest that
BI is a combination of products,
technologies and methods
- This combination leads to better organisation
of key organisational data for improved
decision making, driving (for example):
- Increased sales, reduced costs and
increased profits
What does it do?
It helps decision makers to:
- Answer questions such as:
- “How well are the various parts of the organisation
performing?”
- “Into which market segments do my customers fall,
and what are their characteristics?”
- “How can I tell which transactions are likely to be
fraudulent?”
- Better manage the organisation
- Conform with government regulation
So what is BI?
BI is an organisational outcome, or set of
outcomes
- …based on the combination of
- Data
- Analysis
- Presentation
• …that involve the provision of information to
the users who need it, in a form that is
useful to them, in a timely fashion
ICT394 Business Intelligence
Application Development
Topic 01: Part 03
Why does BI exist?
Why does BI exist?
• In order to understand this question, we need
to understand the costs, the benefits and the
facilitators for organisational BI

http://mimiandeunice.com/wp-content/uploads/2011/08/ME_384_CostBenefit.png
Costs of BI
• BI is expensive to implement
– Dedicated resources
• Hardware, human
– Disruptive
– Design and implementation is non-trivial
• Time consuming
• Not a great history of success!
Critical Success Factors
• The literature points to a number of CSFs for
BI implementation:
– They suggest factors such as:
• Committed management support
• Clear vision and well-established business case
• User-oriented change management
• Successful data quality and integrity
(Yeoh & Popovic, 2016)
Benefits of BI
• Many are claimed!
– Most centre around improved decision making
and the gaining of “insights”
– Often manifested as:
• Improved forecasting
• Improved processes
• Improved data governance
• Empowerment of decision makers
BI is facilitated by
• Improvements in technology
– …and reduction in costs of storage/processing
• Improvements in analytics and visualisation
tools
• Improved understanding of the value of data
• Me too!
– BI successes tend to be reported more than the
failures
Data, Analysis, Presentation
• In the next lecture, we’ll examine these three
aspects of BI…
ICT394 Business Intelligence
Application Development
Topic 01: Part 04
Data, Analysis and Presentation
Data, Analysis and Presentation
• These are the three primary
components/activities of BI
– This lecture is an introduction to each of them
– The rest of the unit is basically divided between
them
Data
• From your previous studies, you are aware
that a database is a collection or set of facts
about something of interest
– Data is the “stuff” that is stored in the database
• In terms of BI, two things happen to data:
– Firstly, it is collected, and
– Secondly, it is prepared for Analysis
Data Collection
• In previous studies, the focus was on data that
came from transactions:
– Enrolments
– Purchases
– Orders
• This type of data is thought of as being structured
• BI makes much use of this type of data
– A challenge for BI is identifying these data sources in
the organisation
Data Collection
• There are other types of data that are not
structured (semi- or un-structured) data
– Emails
– Social media postings
– Help desk calls
– …are examples of data with less structure
• Organisations are increasingly realising the
value of these data
– QANTAS social media fail
Data Preparation
• Identification and collection of data is a start
– The data need to be in a format suitable for
analysis…
• Formats
• Names

http://articles.latimes.com/1999/oct/01/news/mn-17288
Data Preparation
• There are three main phases in data
preparation:
– Extract from sources
– Transform into a format suitable for analysis
– Load into the analysis environment
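The three phases above can be sketched in a few lines of Python. This is a minimal, illustrative example only (the CSV content and table names are invented, not from a real system), but it shows each phase doing its characteristic job: pulling raw rows from a source, fixing formats and names, and loading into the analysis environment.

```python
# A minimal sketch of the three data preparation phases.
# The source data and table names are illustrative assumptions.
import csv, io, sqlite3

# --- Extract: pull raw rows from a source (here, a CSV export) ---
sales_csv = io.StringIO(
    "date,product,amount\n"
    "2024-01-05,Widget,10.50\n"
    "2024-01-06,widget ,3.25\n"
)
raw_rows = list(csv.DictReader(sales_csv))

# --- Transform: clean up names and formats so the data suits analysis ---
clean_rows = [
    (r["date"], r["product"].strip().title(), float(r["amount"]))
    for r in raw_rows
]

# --- Load: insert the conformed rows into the analysis environment ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 13.75
```

Note how the transform phase unifies the two spellings of the product name ("Widget" and "widget "); without it, any analysis grouping by product would silently double-count.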
Analysis
• Analysis of data in BI is where the “insights”
are generated
– Data can be:
• Summarised
• Aggregated
• Joined with other data
– In later topics, we will look at OLAP and statistical
analyses
Presentation
• …and this is where the analysis ends up
– End users having access (hopefully) to the
information they need!
• Presentation is more than simply graphs and
tables
– There is a developing science around data
visualisation
• We will just touch the surface of this!
• http://www.axiis.org/examples/BrowserMarketShare.html
Topic 02

BI Lifecycle

ICT394 BI Application
Development
Resources
• Essential Reading:
– Moss, L.T., and Atre, S., (2003), Business Intelligence
Roadmap: The Complete Project Lifecycle for Decision
Support Applications, Addison-Wesley Professional. E-
book, available from the Topic 2 Readings link on Moodle.
• Case Study:
– Yellowfin Business Intelligence, 2013, A Case Study:
University of Konstanz generates new insight with
Business Intelligence,
https://www.youtube.com/watch?v=zp0BbAO-GHU
Learning Outcomes
• At the completion of this topic, you should be able to:
– Describe a high-level BI implementation roadmap
– Compare and contrast the implementation of BI with system
development projects
– Explain the activities that would typically happen in the Justification
and Planning stages of BI application development
• This topic contributes to the following unit learning outcomes:
– Demonstrate an understanding of the role of BI in organisations
– Describe the common data sources that exist in organisations and
their use in BI
– Demonstrate practical skills in the processes associated with
extraction, transformation and loading (ETL) of organisational data
Lecture Outline
• Introduction to a BI Lifecycle
• Justification of BI Projects
• Planning for BI Projects
• Conclusion
ICT394 Business Intelligence
Application Development
Topic 02: Part 02
Introduction to the BI Lifecycle
BI Lifecycle
• Justification
• Planning
• Business Analysis
• Design
• Construction
• Deployment
Contrast with traditional approaches
• Traditional methods work for the development of
stand-alone or one-off transaction
processing systems
• BI is primarily iterative in nature
– There is always fine-tuning of the deliverables required
– User requirements change in response to changing
environment and changing user understanding of the
environment
– New opportunities arise!
ICT394 Business Intelligence
Application Development
Topic 02: Part 03
Justification of BI Projects
Justification
• BI application development is expensive
– There needs to be a balance between costs and
benefits
• There are four main components in the
justification process:
– Business Drivers
– Business analysis issues
– Cost-benefit analysis
– Risk assessment
Business Drivers
• These are required to ensure there is
alignment between the organisation’s
strategic goals and the BI application(s) being
developed
– This is an iterative process because of the
potential for changes in:
• business requirements
• BI requirements
Business analysis issues
• These usually revolve around “unmet”
information needs
– The required information cannot be easily provided
from the existing systems
• Most often, they weren’t designed for BI, but for processing
particular transactions
– Exacerbating this are the myriad different data
sources, that could be operational, private or external
– …and the varying quality of the data in these various
systems
Business analysis issues
Cost-benefit analysis
Risk analysis
ICT394 Business Intelligence
Application Development
Topic 02: Part 04
Planning for BI Projects
Planning
• There are two main aspects to planning for BI
application development
– Enterprise infrastructure evaluation
– Project planning
Enterprise infrastructure evaluation
• Just about any IT implementation will need to
take into account the:
– Technical infrastructure
• E.g., hardware, middleware, DBMS
– Non-technical infrastructure
• Standards, metadata, business rules, policies
Technical infrastructure
• BI technology has improved markedly in
recent times
– Early BI was slow, expensive and required much
manual creation of code
– As you have seen, this is not necessarily the case
now
– However, it is still vital that the technical
components are chosen with integration with the
existing infrastructure in mind
Technical Infrastructure
• Hardware
• Network
• Middleware
• DBMS
• Tools & Standards
Non-technical infrastructure
• Logical data model
– Do we have existing models of our source
systems?
– Who owns the data in those systems?
– Do we have people who can validate the existing
models?
– Who will be responsible for the data integration?
– What tools do we have for modeling (do we need
any or more?)
Non-technical infrastructure
• Metadata
– What metadata do we already have?
• How is it stored/maintained?
• Is there a repository?
– How is it accessed?
• Who is responsible for it?
Non-technical infrastructure
• Standards, Guidelines and Procedures
Project Planning
• Business Involvement
• Project Scope and Deliverables
• Cost-Benefit Analysis
• Infrastructure
• Staffing and Skills
ICT394 Business Intelligence
Application Development
Topic 02: Part 05
Conclusion
Conclusion
• This topic introduced a basic/generic BI Lifecycle
• There is a need for this because BI projects are
essentially different to “normal” systems
development where a “waterfall” type method
may not be appropriate
• The topic concentrated on the Justification and
Planning steps of the BI lifecycle
– We will meet the other stages as we progress
through the unit
The Next Topic
• Topic 03 addresses Data Warehousing
– At the completion of this topic, you should be able
to:
• Provide a definition of a data warehouse, including
examples as to how and in what circumstances it would be
used
• Describe and provide examples as to the difference
between operational and analytical databases
• Discuss the components of a data warehouse system
• Discuss the basic steps in the development of a data
warehouse
Topic 03
Data Warehousing

ICT394 BI Application
Development
Resources
• As this topic is essentially revision, there is no set
reading; however, you should read over your notes or
textbook from your previous databases study to
refresh your memory about data warehousing. If you
do not have access to those resources, just about any
databases text will have a section on data
warehousing that you will be able to read
• Case Study:
– 2011, Chrysler’s Data Quality Management Case
Study, https://www.youtube.com/watch?v=N78lHpiCD0k
Learning Outcomes
• At the completion of this topic, you should be able to:
– Provide a definition of a data warehouse, including examples as to how
and in what circumstances it would be used
– Describe and provide examples as to the difference between operational
and analytical databases
– Discuss the components of a data warehouse system
– Discuss the basic steps in the development of a data warehouse
• This topic contributes to the following unit learning outcomes:
– Describe the common data sources that exist in organisations and their
use in BI
– Demonstrate practical skills in the processes associated with extraction,
transformation and loading (ETL) of organisational data
– Design and implement a simple data warehouse environment
Lecture Outline
• Introduction
• Operational Vs Analytical Databases
• Data Warehouse Definition
• Data Warehouse Components
• Data Warehouse Development
• Conclusion
ICT394 Business Intelligence
Application Development
Topic 03: Part 02
Operational Vs Analytical Databases
Types of information in organisations
• Two primary types of information
• Operational information (transactional
information)
• the information collected and used in support of
day-to-day operational needs in businesses and other
organisations
• Analytical information
• the information collected and used in support of
analytical tasks
• Analytical information is based on operational
(transactional) information
DW as a separate structure
• The data warehouse is created as a separate
database because:
• The analytical tasks the data warehouse will have
to complete will detract from the performance of
the operational databases
• The analytical tasks will often require data from
multiple operational data sources; it may be
impossible to design an operational database that
will be able to fulfil these requirements
Operational Vs. Analytical
Information
Data Makeup Differences
• Operational: typical time-horizon of days/months;
Analytical: typical time-horizon of years
• Operational: detailed; Analytical: summarized (and/or detailed)
• Operational: current; Analytical: values over time (snapshots)
Technical Differences
• Operational: small amounts used in a process;
Analytical: large amounts used in a process
• Operational: high frequency of access;
Analytical: low/modest frequency of access
• Operational: can be updated; Analytical: read (and append) only
• Operational: non-redundant; Analytical: redundancy not an issue
Functional Differences
• Operational: used by all types of employees for tactical purposes;
Analytical: used by a narrower set of users for decision making
• Operational: application oriented; Analytical: subject oriented
Application Oriented vs. Subject Oriented – Example

An application-
oriented database
serving the Vitality
Health Club Visits and
Payments Application
(from: Jukic, et. al.,
2014, p.211).
Application Oriented vs. Subject Oriented – Example

A subject-oriented
database for the
analysis of the
subject revenue in
the Vitality Health
Club (from: Jukic, et.
al., 2014, p.211).
ICT394 Business Intelligence
Application Development
Topic 03: Part 03
Data Warehouse Definition
The Data Warehouse Definition
• The data warehouse is a structured repository of
integrated, subject-oriented, enterprise-wide,
historical, and time-variant data.
• The purpose of the data warehouse is the
retrieval of analytical information.
• A data warehouse can store detailed and/or
summarized data.
Structured repository
• The data warehouse is a database containing
analytically useful information
• Any database is a structured repository with its
structure represented in its metadata
Integrated
• The data warehouse integrates the analytically useful
data from the various operational databases (and
possibly other sources)
• Integration refers to this process of bringing the
data from multiple data sources into a singular data
warehouse.
Subject-Oriented
• The term subject-oriented refers to the fundamental
difference in the purpose of an operational database
system and a data warehouse.
• An operational database system is developed in
order to support a specific business operation
• A data warehouse is developed to analyze specific
business subject areas
Enterprise-wide
• The term enterprise-wide refers to the fact that
the data warehouse provides an organization-
wide view of the analytically useful
information it contains
Historical
• The term historical refers to the larger time horizon in
the data warehouse than in the operational databases
Time-variant
• The term time variant refers to the fact that a
data warehouse contains slices or snapshots of
data from different periods of time across its
time horizon
• With the data slices, the user can create reports for
various periods of time within the time horizon
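The idea of reporting over data slices can be sketched very simply. In this illustrative example (the periods, store codes, and revenue figures are all invented), each snapshot row belongs to one period in the time horizon, and a report is just a selection over one slice:

```python
# Sketch of time-variant data: the warehouse holds snapshots of data
# from different periods, so reports can target any period in the
# time horizon. All values below are invented for illustration.
snapshots = [
    {"period": "2023-Q4", "store": "S1", "revenue": 100.0},
    {"period": "2024-Q1", "store": "S1", "revenue": 120.0},
    {"period": "2024-Q1", "store": "S2", "revenue": 80.0},
]

def report(period):
    # produce a report for one period of time within the time horizon
    return sum(s["revenue"] for s in snapshots if s["period"] == period)

print(report("2024-Q1"))  # 200.0
```

Because every row carries its period, older slices are never overwritten when new ones arrive; reports for past periods remain reproducible.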
Retrieval of analytical information
• A data warehouse is developed for the retrieval of
analytical information, and it is not meant for direct data
entry by the users.
• The only functionality available to the users of the
data warehouse is retrieval
• The data in the data warehouse is not subject to
changes.
• The data in the data warehouse is referred to as non-
volatile, static, or read-only
Detailed and/or summarized data
• A data warehouse, depending on its purpose,
may include the detailed data or summary
data or both
• A data warehouse that contains the data at the
finest level of detail is the most powerful
The Data Warehouse Definition
• The data warehouse is a structured repository of
integrated, subject-oriented, enterprise-wide,
historical, and time-variant data.
• The purpose of the data warehouse is the
retrieval of analytical information.
• A data warehouse can store detailed and/or
summarized data.
ICT394 Business Intelligence
Application Development
Topic 03: Part 04
Data Warehouse Components
Data Warehouse Components
• Source systems
• Extraction-transformation-load (ETL)
infrastructure
• Data warehouse
• Front-end applications
Example - The use of operational data sources for operational
purposes in an organization

(from: Jukic, et. al., 2014, p.214).


Example - The core components of a data warehousing system

(from: Jukic, et. al., 2014, p.215).


Source systems
• In the context of data warehousing, source systems are
operational databases and other operational data
repositories (in other words, any sets of data used for
operational purposes) that provide analytically useful
information for the data warehouse's subjects of analysis
• Every operational data store that is used as a source
system for the data warehouse has two purposes:
• The original operational purpose
• As a source system for the data warehouse (re-purposing)
• Source systems can include external data sources
Example - A data warehouse with internal and external source
systems

(from: Jukic, et. al., 2014, p.216).


Data Warehouse
• The data warehouse is sometimes referred to
as the target system, to indicate the fact that
it is a destination for the data from the source
systems
• A typical data warehouse periodically retrieves
selected analytically useful data from the
operational data sources
ETL infrastructure
• The infrastructure that facilitates the retrieval of data
from operational databases into the data warehouses
• ETL includes the following tasks:
• Extracting analytically useful data from the operational data
sources
• Transforming such data so that it conforms to the structure
of the subject-oriented target data warehouse model (while
ensuring the quality of the transformed data)
• Loading the transformed and quality assured data into the
target data warehouse
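The quality-assurance side of the transform task can be sketched as below. This is a hedged, illustrative example: the field names, validation rules, and transaction IDs are assumptions, not part of any specific ETL tool. Rows from the operational extract are validated and conformed before loading; rows failing the checks are set aside rather than loaded.

```python
# Sketch of transforming extracted rows while ensuring data quality:
# valid rows are conformed to the target structure, invalid rows are
# rejected. Field names and rules are illustrative assumptions.
extracted = [
    {"tid": "T1", "product": "Widget", "qty": "2",  "price": "4.00"},
    {"tid": "T2", "product": "",       "qty": "1",  "price": "9.99"},  # missing product
    {"tid": "T3", "product": "Gadget", "qty": "-5", "price": "1.00"},  # impossible quantity
]

loaded, rejected = [], []
for row in extracted:
    qty = int(row["qty"])
    if row["product"] and qty > 0:
        # conform to the target structure: typed measures plus a derived total
        loaded.append((row["tid"], row["product"], qty, qty * float(row["price"])))
    else:
        rejected.append(row["tid"])

print(loaded)    # [('T1', 'Widget', 2, 8.0)]
print(rejected)  # ['T2', 'T3']
```

Keeping the rejected identifiers, rather than silently dropping bad rows, is what lets the quality of the transformed data be audited before the load step runs.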
Data warehouse front-end (BI) applications

• Used to provide access to the data warehouse for


users who are engaging in indirect use
• Recall the case study from Topic 2 where end users
were given access to reports from the university system
Example - A data warehouse with front-end applications

(from: Jukic, et. al., 2014, p.217).


Data Warehouse Components
• Source systems
• Extraction-transformation-load (ETL)
infrastructure
• Data warehouse
• Front-end applications
ICT394 Business Intelligence
Application Development
Topic 03: Part 05
Data Warehouse
Development
Steps In The Development Of Data Warehouses

(from: Jukic, et. al., 2014, p.218).


Requirements collection, definition, and
visualization
• This step results in the requirements specifying the
desired capabilities and functionalities of the future
data warehouse
• The requirements are based on the analytical needs
that can be met by the data in the internal data source
systems and available external data sources
• The requirements are collected through interviewing
various stakeholders of the data warehouse
• In addition to interviews, additional methods for
eliciting requirements from the stakeholders can be
used
Requirements collection, definition, and visualization

• The collected requirements should be clearly


defined and stated in a written document, and
then visualized as a conceptual data model
Iterative nature of the data warehouse requirements
collection, definition, and visualization process
(from: Jukic, et. al., 2014, p.219).
Data warehouse modeling
• AKA: logical data warehouse modeling
• creation of the data warehouse data model that is
implementable by the DBMS
• This is the focus of the next topic
Creating The Data Warehouse
• Using a DBMS to implement the data
warehouse data model as an actual data
warehouse
• Typically, data warehouses are implemented using
a relational DBMS (RDBMS) software
Creating ETL infrastructure
• Creating necessary procedures and code for:
• Automatic extraction of relevant data from the
operational data sources
• Transformation of the extracted data, so that its quality is
assured and its structure conforms to the structure of the
modeled and implemented data warehouse
• The seamless load of the transformed data into the data
warehouse
• Due to the amount of details that have to be considered,
creating ETL infrastructure is often the most time- and resource-
consuming part of the data warehouse development process
Developing front-end (BI) applications

• Designing and creating applications for


indirect use by the end-users
• Front-end applications are included in most data
warehousing systems and are often referred to as
business intelligence (BI) applications
• Front-end applications contain interfaces (such as
forms and reports) accessible via a navigation
mechanism (such as a menu)
Data warehouse deployment
• Releasing the data warehouse and its front-
end (BI) applications for use by the end users
Data warehouse use
• The retrieval of the data in the data
warehouse
• Indirect use
• Via the front-end (BI) applications
• Direct use
• Via the DBMS
• Via the OLAP (BI) tools
Data warehouse administration and
maintenance
• Performing activities that support the data
warehouse end user, including dealing with
technical issues, such as:
• Providing security for the information contained in
the data warehouse
• Ensuring sufficient hard-drive space for the data
warehouse content
• Implementing the backup and recovery
procedures
Steps In The Development Of Data Warehouses

(from: Jukic, et. al., 2014, p.218).


ICT394 Business Intelligence
Application Development
Topic 03: Part 06
Topic Summary
Conclusion
• At the completion of this topic, you should be
able to:
– Provide a definition of a data warehouse, including
examples as to how and in what circumstances it
would be used
– Describe and provide examples as to the difference
between operational and analytical databases
– Discuss the components of a data warehouse system
– Discuss the basic steps in the development of a data
warehouse
References
Jukic, N., Vrbsky, S., and Nestorov, S., 2014,
Database Systems: Introduction to Databases
and Data Warehouses, Pearson Education
Inc., Boston.
Topic 04
Data Warehouse Design

ICT394 BI Application
Development
Resources
• Essential Readings
– Jukic, N., Vrbsky, S., & Nestorov, S., 2014,
Database Systems: Introduction to Databases and
Data Warehouses, Pearson. Chapter 8
• Case Study
– “Business Intelligence Software - Desert Schools
Federal Credit Union - BI360 Case Study.” 2013.
https://www.youtube.com/watch?v=PhgywrbSIlw
Learning Outcomes
• At the completion of this topic, you should be able
to:
– Explain how and why dimensional modelling is used to design
data warehouses
– Create a dimensional model from a given case study
– Explain, and be able to resolve, some of the problems often
associated with dimensional modelling
• This topic contributes to the following unit learning
outcomes
– Demonstrate practical skills in the processes associated with
extraction, transformation and loading (ETL) of organisational data
– Design and implement a simple data warehouse environment
Lecture Outline
• Introduction
• Dimensional Modelling
• Star Schema
• Granularity in Dimensional Modelling
• Slowly Changing Dimensions
• Topic Summary
ICT394 Business Intelligence
Application Development
Topic 04: Part 02
Dimensional Modelling
Introduction
• ER modelling
– You will recall that ER modelling is used for visualising
database requirements, used extensively for conceptual
modeling of operational databases
• Relational modelling
– Standard method for logical modelling of operational
databases
• Both of these techniques can also be used during the
development of data warehouses
Dimensional modelling
• A data design methodology used for designing
subject-oriented analytical databases, such as
data warehouses or data marts
– Commonly, dimensional modelling is employed as a relational
data modelling technique
– In addition to using the regular relational concepts (primary
keys, foreign keys, integrity constraints, etc.) dimensional
modeling distinguishes two types of tables:
• Dimensions
• Facts
Dimension tables (dimensions)
• Contain descriptions of the business, organisation,
or enterprise to which the subject of analysis
belongs
• Columns in dimension tables contain descriptive
information that is often textual (e.g., product
brand, product color, customer gender, customer
education level), but can also be numeric (e.g.,
product weight, customer income level)
• This information provides a basis for analysis of the
subject
Fact tables
• Contain measures related to the subject of
analysis and the foreign keys (associating fact
tables with dimension tables)
• The measures in the fact tables are typically
numeric and are intended for mathematical
computation and quantitative analysis
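The dimension/fact split described above can be sketched as a tiny star schema in SQL (run here through Python's built-in sqlite3). This is a minimal illustration under assumed table and column names, not a schema from the case studies: the dimensions carry descriptive attributes, while the fact table holds only numeric measures and foreign keys into the dimensions.

```python
# A minimal star schema sketch: one fact table with foreign keys into
# two dimension tables. Names and data are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,   -- surrogate key
    product_name TEXT,                 -- descriptive attributes
    brand TEXT
);
CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY,
    customer_name TEXT,
    gender TEXT
);
CREATE TABLE sales_fact (              -- measures plus foreign keys only
    product_key INTEGER REFERENCES product_dim(product_key),
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    quantity INTEGER,                  -- numeric measures for computation
    dollar_amount REAL
);
""")

conn.execute("INSERT INTO product_dim VALUES (1, 'Widget', 'Acme')")
conn.execute("INSERT INTO customer_dim VALUES (1, 'Tina', 'F')")
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 3, 12.00)")

# Analysis joins the fact's measures to the dimensions' descriptions
row = conn.execute("""
    SELECT d.brand, SUM(f.dollar_amount)
    FROM sales_fact f JOIN product_dim d ON f.product_key = d.product_key
    GROUP BY d.brand
""").fetchone()
print(row)  # ('Acme', 12.0)
```

The final query shows the basic analysis pattern: aggregate the fact table's measures, grouped by a dimension's descriptive attribute.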
Characteristics of dimensions and
facts
• A typical dimension contains relatively static
data, while in a typical fact table, records are
added continually, and the table rapidly grows
in size.
• In a typical dimensionally modelled analytical
database, dimension tables have orders of
magnitude fewer records than fact tables
ICT394 Business Intelligence
Application Development
Topic 04: Part 03
Star Schema
Star schema
• The result of dimensional modeling is
a dimensional schema containing facts
and dimensions
• The dimensional schema is often
referred to as the star schema

Jukic, et al, 2014, p.226


Dimensional Model Based on A Single Source
Relational schema : ZAGI Retail Company Sales Department
Database (Source)

Jukic, et al, 2014, p.275


Data records: ZAGI Retail Company Sales Department Database
(Source)

Jukic, et al, 2014, p.228


ZAGI Retail Company dimensional model for the subject sales

Jukic, et al, 2014, p.228


Star schema
• In the star schema, the chosen subject of
analysis is represented by a fact table
• Designing the star schema involves
considering which dimensions to use with the
fact table representing the chosen subject
• For every dimension under consideration,
two questions must be answered:
– Question 1: Can the dimension table be useful for the analysis
of the chosen subject?
– Question 2: Can the dimension table be created based on the
existing data sources?
Initial Example: Dimensional Model Based on A Single Source
ZAGI Retail Company dimensional model for the subject sales,
populated with the data from the operational data source

Jukic, et al, 2014, p.229


Surrogate key
• Typically, in a star schema all dimension tables
are given a simple, non-composite, system-
generated key, also called a surrogate key
• Values for the surrogate keys are typically
simple auto-increment integer values
• Surrogate key values have no meaning or
purpose except to give each dimension a new
column that serves as a primary key within the
dimensional model instead of the operational key
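Assigning surrogate keys can be sketched as follows (an illustrative example; the store IDs and zip codes are invented). The dimension receives a system-generated auto-increment integer as its primary key, while the operational key is demoted to an ordinary descriptive attribute:

```python
# Sketch of surrogate keys: a system-generated integer becomes the
# dimension's primary key; the operational key is kept only as an
# attribute. Store IDs and zips are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    store_id TEXT,                                -- operational key, now just an attribute
    store_zip TEXT
)
""")
for store_id, zip_code in [("S1", "60601"), ("S2", "35400")]:
    conn.execute(
        "INSERT INTO store_dim (store_id, store_zip) VALUES (?, ?)",
        (store_id, zip_code))

keys = [r[0] for r in
        conn.execute("SELECT store_key FROM store_dim ORDER BY store_key")]
print(keys)  # [1, 2] - simple auto-increment integer values
```

Because the surrogate key carries no business meaning, it stays stable even if the operational system later renames or reuses its own identifiers.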
Additional possible
fact attributes
• A fact table contains
– Foreign keys connecting the fact table to the dimension tables
– The measures related to the subject of analysis
• In addition to the measures related to the subject of
analysis, in certain cases fact tables can contain other
attributes that are not measures
• Two of the most typical additional attributes that can
appear in the fact table are:
– Transaction identifier
– Transaction time
ZAGI Retail Company dimensional model for the subject sales
with transaction identifier included

Jukic, et al, 2014, p.233


Multiple facts in a dimensional model
• When multiple subjects of analysis can share
the same dimensions, a dimensional model
contains more than one fact table
• A dimensional model with multiple fact tables is
referred to as a constellation or galaxy of stars
• This approach enables:
– Quicker development of analytical databases for multiple subjects of
analysis, because dimensions are re-used instead of duplicated
– Straightforward cross-fact analysis
Relational schema and data records: ZAGI Retail Company Quality
Control Database
(Source 4)

Jukic, et al, 2014, p.240


ZAGI Retail Company dimensional model for the subjects sales
and defects

Jukic, et al, 2014, p.241


Expanded Example:
Dimensional Model
Based on Multiple
Sources
ZAGI Retail Company
dimensional model for
the subjects sales and
defects , populated
with the data from the
four sources

Jukic, et al, 2014, p.242



Snowflake model
• A star schema in which the
dimensions are normalised
• Snowflaking is usually not used in
dimensional modeling
– Not-normalised (not snowflaked)
dimensions provide for simpler analysis
– Normalisation is usually not necessary
for analytical databases
Snowflake Model - Example
A snowflaked version of the ZAGI Retail Company star schema
for the subject sales

Jukic, et al, 2014, p.254


ICT394 Business Intelligence
Application Development
Topic 04: Part 04
Granularity in Dimensional
Modelling
Aggregation
• Fact tables in a dimensional model can contain
either detailed data or aggregated data
– In detailed fact tables each record refers to a single
fact
– In aggregated fact tables each record summarizes
multiple facts
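Deriving an aggregated fact table from a detailed one is essentially a GROUP BY, sketched below with invented data (the table names and values are illustrative, not the ZAGI example). Two detailed records for the same product on the same day collapse into one aggregated record:

```python
# Sketch of building an aggregated fact table from a detailed one:
# each aggregated record summarises multiple detailed facts.
# Table names and data values are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_detailed "
    "(sale_date TEXT, product_key INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO sales_detailed VALUES (?, ?, ?)", [
    ("2024-01-05", 1, 2),
    ("2024-01-05", 1, 3),   # same product, same day -> summarised below
    ("2024-01-05", 2, 1),
])

# One aggregated row per (day, product) combination
conn.execute("""
CREATE TABLE sales_aggregated AS
SELECT sale_date, product_key, SUM(quantity) AS total_quantity
FROM sales_detailed
GROUP BY sale_date, product_key
""")

rows = conn.execute(
    "SELECT * FROM sales_aggregated ORDER BY product_key").fetchall()
print(rows)  # [('2024-01-05', 1, 5), ('2024-01-05', 2, 1)]
```

The trade-off is visible in the output: the aggregated table is smaller, but the identity of the individual transactions that produced the total of 5 can no longer be recovered from it.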
ZAGI Retail Company Sales Department Database (Source 1) with
additional data records included in SALESTRANSACTION and
SOLDVIA tables

Jukic, et al, 2014, p.243


ZAGI Retail Company dimensional model for the subject sales

Jukic, et al, 2014, p.244


Detailed Fact Table Example
ZAGI Retail
Company
dimensional
model for the
subject sales,
populated with
the additional
data records
from Source 1

Jukic, et al, 2014, p.244

Both tables
consist of 11 rows
- DETAILED
Aggregated Fact Table Example 1
ZAGI Retail Company dimensional model with an aggregated fact
table Sales per day, product, customer, and store

Jukic, et al, 2014, p.245


Aggregated Fact Table Example 1
ZAGI Retail Company dimensional model for the subject sales
with an aggregated fact table Sales per day, product, customer,
store, populated with the data

Jukic, et al, 2014, p.246


Granularity of the fact tables
– Granularity describes what is depicted by one row in
the fact table
– Detailed fact tables have a fine level of granularity
because each record represents a single fact
– Aggregated fact tables have a coarser level of
granularity than detailed fact tables as records in
aggregated fact tables always represent
summarisations of multiple facts
Granularity of the fact tables
• Due to their compactness, coarser granularity
aggregated fact tables are quicker to query than detailed
fact tables
– Coarser granularity tables are limited in terms of what
information can be retrieved from them
– One way to take advantage of the query performance
improvement provided by aggregated fact tables, while
retaining the power of analysis of detailed fact tables, is to
have both types of tables coexisting within the same
dimensional model, i.e. in the same constellation
A constellation of detailed and aggregated facts - Example

Jukic, et al, 2014, p.249


Line-item versus transaction-level detailed
fact table
• Line-item detailed fact table
– Each row represents a line item of a particular
transaction
• Transaction-level detailed fact table
– Each row represents a particular transaction
Line-Item Detailed Fact Table Example

Jukic, et al, 2014, p.261


Transaction-Level Detailed Fact Table Example

Jukic, et al, 2014, p.250


ICT394 Business Intelligence
Application Development
Topic 04: Part 05
Slowly Changing Dimensions
Slowly Changing Dimension
• Typical dimension in a star schema contains:
– Attributes whose values do not change (or change extremely
rarely) such as store size and customer gender
– Attributes whose values change occasionally and sporadically
over time, such as customer zip and employee salary.
• A dimension that contains attributes whose values can
change is referred to as a slowly changing dimension
– Most common approaches to dealing with slowly changing
dimensions
• Type 1
• Type 2
• Type 3
Type 1
• Changes the value in the dimension’s record
– The new value replaces the old value.
• No history is preserved
• The simplest approach, used most often when a
change in a dimension is the result of an error
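In SQL terms, a Type 1 change is simply an in-place UPDATE. The sketch below (an illustration using Python's sqlite3, with the Susan/tax-bracket example from the slides; the column names are assumptions) shows that once the update runs, the old value is gone.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE customer (
    customerkey INT PRIMARY KEY,
    customername TEXT,
    taxbracket TEXT)""")
cur.execute("INSERT INTO customer VALUES (1, 'Susan', 'Medium')")

# Type 1: overwrite the value in place -- no history is preserved.
cur.execute("UPDATE customer SET taxbracket = 'High' "
            "WHERE customername = 'Susan'")

row = cur.execute("SELECT taxbracket FROM customer "
                  "WHERE customername = 'Susan'").fetchone()
print(row[0])  # High -- the old 'Medium' value is lost
```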
Type 1 Example
Susan's Tax Bracket attribute value changes from Medium to
High

Jukic, et al, 2014, p.251


Type 2
• Creates a new additional dimension record using a new
value for the surrogate key every time a value in a
dimension record changes
• Used in cases where history should be preserved
• Can be combined with the use of timestamps and row
indicators
– Timestamps - columns that indicate the time interval for
which the values in the records are applicable
– Row indicator - column that provides a quick indicator of
whether the record is currently valid
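A Type 2 change is a pair of statements: expire the current row, then insert a new row under a new surrogate key. The sketch below (again an sqlite3 illustration with assumed column names and dates) includes the timestamp and row-indicator columns described above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE customer (
    customerkey INT PRIMARY KEY,  -- surrogate key
    customerid TEXT,              -- natural key from the source system
    customername TEXT,
    taxbracket TEXT,
    effectivedate TEXT,           -- timestamp columns
    expirydate TEXT,
    currentrow TEXT)              -- row indicator
""")
cur.execute("INSERT INTO customer VALUES "
            "(1, 'C01', 'Susan', 'Medium', '2020-01-01', NULL, 'Yes')")

# Type 2 change: expire the old row...
cur.execute("""UPDATE customer SET expirydate = '2023-06-30', currentrow = 'No'
               WHERE customerid = 'C01' AND currentrow = 'Yes'""")
# ...then insert a new row with a NEW surrogate key holding the new value.
cur.execute("INSERT INTO customer VALUES "
            "(2, 'C01', 'Susan', 'High', '2023-07-01', NULL, 'Yes')")

current = cur.execute("""SELECT taxbracket FROM customer
                         WHERE customerid = 'C01'
                           AND currentrow = 'Yes'""").fetchone()
history = cur.execute("SELECT COUNT(*) FROM customer "
                      "WHERE customerid = 'C01'").fetchone()[0]
print(current[0], history)  # current value is High; both rows kept
```

Fact rows recorded before the change keep pointing at surrogate key 1, so history is preserved.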
Type 2 Example
Susan's Tax Bracket attribute value changes from Medium to
High

Jukic, et al, 2014, p.251


Type 2 Example (with timestamps and row indicator)
Susan's Tax Bracket attribute value changes from Medium to
High

Jukic, et al, 2014, p.252


Type 3
• Involves creating a “previous” and “current”
column in the dimension table for each column
where changes are anticipated
• Applicable in cases in which there is a fixed
number of changes possible per column of a
dimension, or in cases when only a limited
history is recorded.
• Can be combined with the use of timestamps
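A Type 3 change is a single UPDATE over the paired columns: the current value shifts into the "previous" column and the new value becomes current. The sketch below (an sqlite3 illustration with assumed column names) relies on the standard SQL rule that the right-hand side of a SET clause reads the row's pre-update values.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE customer (
    customerkey INT PRIMARY KEY,
    customername TEXT,
    prev_taxbracket TEXT,   -- "previous" column
    curr_taxbracket TEXT)   -- "current" column
""")
cur.execute("INSERT INTO customer VALUES (1, 'Susan', NULL, 'Medium')")

# Type 3 change: shift current into previous, store the new value as current.
# The SET right-hand sides see the old row, so prev gets 'Medium'.
cur.execute("""UPDATE customer
               SET prev_taxbracket = curr_taxbracket,
                   curr_taxbracket = 'High'
               WHERE customername = 'Susan'""")

row = cur.execute("SELECT prev_taxbracket, curr_taxbracket "
                  "FROM customer").fetchone()
print(row)  # ('Medium', 'High')
```

Only one prior value survives per column, which is why Type 3 suits cases with a fixed, limited history.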
Type 3 Example
Susan's Tax Bracket attribute value changes from Medium to
High

Jukic, et al, 2014, p.253


Type 3 Example (with timestamps)
Susan's Tax Bracket attribute value changes from Medium to
High

Jukic, et al, 2014, p.253


ICT394 Business Intelligence
Application Development
Topic 04: Part 06
Summary
Learning Outcomes
• At the completion of this topic, you should be able
to:
– Explain how and why dimensional modelling is used to design
data warehouses
– Create a dimensional model from a given case study
– Explain, and be able to resolve, some of the problems often
associated with dimensional modelling
• This topic contributes to the following unit learning
outcomes
– Demonstrate practical skills in the processes associated with
extraction, transformation and loading (ETL) of organisational data
– Design and implement a simple data warehouse environment
References
Jukic, N., Vrbsky, S., and Nestorov, S., 2014,
Database Systems: Introduction to Databases
and Data Warehouses, Pearson Education
Inc., Boston.
Next topic…
• Data warehouse implementation
Topic 05
Data Warehouse
Implementation and Use

ICT394 BI Application
Development
Resources
• Case Study:
– Meson-BI feature case study: Paul Morris from
Royal Liverpool Hospitals,
https://www.youtube.com/watch?v=syMbchrHIWM
• Essential reading:
– Moss, L.T., and Atre, S., (2003), Business Intelligence
Roadmap: The Complete Project Lifecycle for Decision
Support Applications, Addison-Wesley Professional. E-
book, available from the Topic 5 Readings link on
Moodle. Chapter 9
Learning Outcomes
• At the completion of this topic, you should be able to:
– Explain the role of the ETL process in BI, including describing the
activities involved in ETL design, and deliverables that result from
those activities and the roles involved
– Discuss the different types of ETL programs
– Create a source-to-target mapping document
• This topic contributes to the following Unit Learning
Outcomes:
– 3. Demonstrate practical skills in the processes associated with
extraction, transformation and loading (ETL) of organisational data
– 4. Design and implement a simple data warehouse environment
Lecture Outline
• Data Warehouse Creation
• Extract
• Transform
• Load
• ETL Infrastructure and
documentation
• Topic Summary
ICT394 Business Intelligence
Application Development
Topic 05: Part 02
Data Warehouse Creation
Creating A BI Target Database
• Involves using the functionalities of database
management software to implement the dimension model
as a collection of physically created and mutually
connected database tables
• Most often, our BI target databases are modeled as
relational databases
– Consequently, they are implemented using a relational DBMS

• It is also important to note that, depending on the
particular BI requirement, the “warehouse” may be a
stand-alone database in a DBMS, or a spreadsheet or
similar
Creating a data warehouse - Example
A data warehouse model

Jukic, N., Vrbsky, S., & Nestorov, S., 2014, p.244


Creating a data warehouse - Example
CREATE TABLE statements

CREATE TABLE calendar
( calendarkey INT,
  fulldate DATE,
  dayofweek CHAR(15),
  daytype CHAR(20),
  dayofmonth INT,
  month CHAR(10),
  quarter CHAR(2),
  year INT,
  PRIMARY KEY (calendarkey));

CREATE TABLE store
( storekey INT,
  storeid CHAR(5),
  storezip CHAR(5),
  storeregionname CHAR(15),
  storesize INT,
  storecsystem CHAR(15),
  storelayout CHAR(15),
  PRIMARY KEY (storekey));
Creating a data warehouse - Example
CREATE TABLE statements

CREATE TABLE product
( productkey INT,
  productid CHAR(5),
  productname CHAR(25),
  productprice NUMBER(7,2),
  productvendorname CHAR(25),
  productcategoryname CHAR(25),
  PRIMARY KEY (productkey));

CREATE TABLE customer
( customerkey INT,
  customerid CHAR(7),
  customername CHAR(15),
  customerzip CHAR(5),
  customergender CHAR(15),
  customermaritalstatus CHAR(15),
  customereducationlevel CHAR(15),
  customercreditscore INT,
  PRIMARY KEY (customerkey));
Creating a data warehouse - Example
CREATE TABLE statements

CREATE TABLE sales
( calendarkey INT,
  storekey INT,
  productkey INT,
  customerkey INT,
  tid CHAR(15),
  timeofday TIME,
  dollarssold NUMBER(10,2),
  unitssold INT,
  PRIMARY KEY (productkey, tid),
  FOREIGN KEY (calendarkey) REFERENCES calendar,
  FOREIGN KEY (storekey) REFERENCES store,
  FOREIGN KEY (productkey) REFERENCES product,
  FOREIGN KEY (customerkey) REFERENCES customer);
ICT394 Business Intelligence
Application Development
Topic 05: Part 03
ETL - Extract
Extraction
• The retrieval of analytically useful data from the
operational data sources that will eventually be
loaded into the BI target database
– The data to be extracted is data that is analytically useful in the BI
target database
– What to extract is determined within the requirements and
modeling stages
• Requirements and modeling stages of the BI target database include
the examination of the available sources
• During the process of creation of the ETL infrastructure the data model
provides a blueprint for the extraction procedures
From: Moss & Atre (2003)
Extract
• There are many ways in which data from the
source systems can be extracted; the choice
will depend on:
– The nature of the source system
– The owner of the source system
– The way the source system is used
• E.g., extracting data from an OLTP may reduce its
functionality during the extraction process
Ways to extract data
• Full duplication of source data
– Easiest, but will require more work on the part of the BI
team, particularly if they only require a subset of the original
data
• Would be good to be able to do the work at the source
– Could cause unacceptable load on the source systems
• Will normally be done as some sort of compromise of
these two
– Operational systems have to keep operating optimally
– Source data needs to be extracted as quickly as possible
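One common form of that compromise is incremental extraction: rather than duplicating everything, pull only rows changed since the previous run, recorded via a last-modified column. The sketch below is a hedged illustration (sqlite3, with a hypothetical operational table and column names), not a prescribed approach.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Hypothetical operational table carrying a last-modified timestamp.
cur.execute("CREATE TABLE orders (orderid INT, amount REAL, lastmodified TEXT)")
cur.executemany("INSERT INTO orders VALUES (?,?,?)", [
    (1, 10.0, '2024-01-01'),
    (2, 20.0, '2024-02-15'),
    (3, 30.0, '2024-03-01'),
])

last_extract = '2024-02-01'  # watermark recorded at the end of the previous run
# Pull only rows changed since then, keeping the load on the source low.
changed = cur.execute(
    "SELECT orderid FROM orders WHERE lastmodified > ?",
    (last_extract,)).fetchall()
print(changed)  # only orders 2 and 3 are extracted
```

This only works if the source reliably maintains the timestamp, which is exactly the kind of fact the extraction metadata needs to record.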
Challenges
• There is often a high degree of redundancy in
source systems
– The same data element (e.g., Student Name) can exist in
many different systems
– Need to be sure which is the most useful for BI purposes
• Limited access to operational systems
– As above
• Staging
– Where will the extracted data be held prior to loading?
Extraction - Metadata
• Where is the data required for the DW coming
from?
– Who owns it?
– How is it formatted?
– What does it mean?
– Is it complete?
– How often is it updated?
• …etc
ICT394 Business Intelligence
Application Development
Topic 05: Part 04
ETL - Transform
Transformation
• Transforming the structure of extracted
data in order to fit the structure of the
BI target database model
– E.g. adding surrogate keys
– Transforming to a common data format
• Names
• Data types
• Units
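A small sketch of what "transforming to a common format" can mean in practice (the record layouts, name conventions and unit conversion here are invented for illustration): two sources hold the same kind of record in different shapes, and each is mapped into one target format before surrogate keys are assigned.

```python
# Two hypothetical source formats for the same kind of record: one stores
# "SURNAME, FIRSTNAME" with pounds, the other "Firstname Surname" with kg.
source_a = [{"name": "SMITH, JAN", "weight_lb": 2.2}]
source_b = [{"name": "Jan Smith", "weight_kg": 1.0}]

def from_a(rec):
    # Split "SURNAME, FIRSTNAME", normalise casing, convert lb -> kg.
    last, first = [p.strip() for p in rec["name"].split(",")]
    return {"name": f"{first.title()} {last.title()}",
            "weight_kg": round(rec["weight_lb"] * 0.45359237, 2)}

def from_b(rec):
    # Already in the common format.
    return {"name": rec["name"], "weight_kg": rec["weight_kg"]}

target = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]
# Final step: assign surrogate keys for the target dimension.
target = [{"key": i + 1, **rec} for i, rec in enumerate(target)]
print(target)
```

The per-source mapping functions are precisely what a source-to-target mapping document specifies.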
Source data problems
• Transformation is complicated
because of a range of potential
issues with the source data
– Inconsistent primary keys (see next slide)
– Inconsistent data values
– Different data formats
– Inaccurate data values
– Synonyms and homonyms
– Embedded process logic
– Missing values
From: Moss & Atre (2003)
Data transformations
• Much of the T will involve dealing
with the problems in the source data
as discussed above
• There will also be aggregations
depending on the level of fact
granularity
Transform tools…
• Can be just about anything!
– You have been performing transformations in
Power BI Desktop
– Excel is also used when the volume of data is not
great
– Various programming/transform environments
“T” as a data quality mechanism
• Data quality control and improvement
are part of the transformation process
– Commonly, some of the data in the data sources
exhibit data quality problems
– Data sources often contain overlapping
information
• Data cleansing (scrubbing)
– the detection and correction of low-quality data
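A minimal cleansing sketch (the records, quality problems and the default-value policy are all assumptions made for illustration): trim stray whitespace, normalise casing, substitute a placeholder for missing values, and drop duplicates on the natural key.

```python
# Hypothetical extracted records showing typical quality problems:
# inconsistent casing, stray whitespace, a missing value, a duplicate.
raw = [
    {"id": "C01", "zip": " 6150 ", "gender": "female"},
    {"id": "C01", "zip": "6150", "gender": "Female"},   # duplicate of above
    {"id": "C02", "zip": "", "gender": "MALE"},         # missing zip
]

DEFAULT_ZIP = "0000"  # placeholder policy for missing values (an assumption)

clean, seen = [], set()
for rec in raw:
    rec = {"id": rec["id"].strip().upper(),
           "zip": rec["zip"].strip() or DEFAULT_ZIP,
           "gender": rec["gender"].strip().capitalize()}
    if rec["id"] in seen:          # drop duplicates on the natural key
        continue
    seen.add(rec["id"])
    clean.append(rec)

print(clean)
```

Whatever rules are chosen (which duplicate wins, what replaces a missing value) belong in the transformation metadata.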
Transformation - Metadata
• We will need to clearly record what needs to
be done to the extracted data in order for it to
be transformed into the common format
required in the DW
– Formulae for conversion
– Aggregation to a suitable level of granularity
– What needs to be done to clean the data for the
DW
– Replacement of missing values
• Etc…
ICT394 Business Intelligence
Application Development
Topic 05: Part 05
ETL - Load
Load
• Loading the extracted, transformed, and
quality assured data into the target BI
database
– A batch process that inserts the data into the BI
target database tables in an automatic fashion
with little end-user involvement
Load Stages
• The initial load (first load) populates the initially empty BI
target database
– It can involve large amounts of data, depending on the
desired time horizon of the data in the newly initiated data
warehouse
• Every subsequent load is referred to as a refresh or
incremental load
– Refresh cycle is the period in which the data warehouse is
reloaded with the new data (e.g. hourly, daily)
– Determined in advance, based on the analytical needs of the
business users of the data warehouse and the technical
feasibility of the system
– In active data warehouses, loads occur continuously in
micro-batches
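One way a refresh can cope with both new and changed rows is an upsert-style insert. The sketch below uses SQLite's INSERT OR REPLACE as one hedged example of that idea (other DBMSs use MERGE or ON CONFLICT clauses; the table and batch data are invented).

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE store (storekey INT PRIMARY KEY, storezip TEXT)")

def load(batch):
    # Upsert: a refresh can insert new rows and update changed ones
    # without failing on surrogate keys that already exist.
    cur.executemany("INSERT OR REPLACE INTO store VALUES (?, ?)", batch)

load([(1, '6150'), (2, '6009')])      # initial load into the empty table
load([(2, '6010'), (3, '6102')])      # refresh: one changed row, one new row

rows = cur.execute("SELECT * FROM store ORDER BY storekey").fetchall()
print(rows)  # store 2 updated, store 3 added, store 1 untouched
```

The refresh cycle then just schedules calls like the second one, as a batch process with little end-user involvement.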
Load

From: Moss & Atre (2003)


Initial Load
• Very similar to the processes
involved in system conversion
– Updating systems, creating enterprise systems
(e.g., ERP)
– First process is to map the relevant data elements
Historical Load
• An extension to the initial load
process, except that it is using
historical, static, data
– These data may have been archived
– Data types and formats may have changed
• E.g., changes in student number/unit code formats
– May need to be different transformation
processes
Incremental Load
• Loading of new/changed operational data
– Could be monthly, weekly, daily depending on the need of
the system
• Can be:

From: Moss & Atre (2003)


Load - Metadata
• We will also need to record how the data were
loaded
– What loading schedule?
– How much to load?
– What happens if an attempt is made to load
duplicate data?
ICT394 Business Intelligence
Application Development
Topic 05: Part 06
ETL Infrastructure and
Documentation
ETL Infrastructure
• Typically, the process of creating the ETL infrastructure
includes using specialized ETL software tools and/or writing
code
• Due to the amount of detail that has to be considered,
creating the ETL infrastructure is often the most time- and
resource-consuming part of the data warehouse
development process
• Although labor intensive, the process of creating the ETL
infrastructure is essentially predetermined by the results of
the requirements collection and data warehouse modeling
processes which specify the sources and the target
ETL Documentation
• One of the most important outputs of
the ETL process is the process
documentation/metadata
– There is a need to know what was done to what,
how it was done, and why it was done
• So it can be done again!
• So, if there is some question about the output of our BI
system, it is clear where the data came from and the
processes through which the data were put
Metadata
• What is it?
– “Data about data”
• Is correct, but does not tell enough of the story to
explain why it is so vital in the context of data
warehouse creation and any subsequent analysis
– Not exactly the same as ‘metadata’ in the context
of a RDBMS
Metadata in Action
• 133883 445673 4565465
• A report from the Blah Group dated 4/7/17
states that the European market for repository
tools expanded by 33% in 2015
• Leading gadget vendors: Protz Group 48%,
Harris Goods 29%, Zymurgy Inc 13%
Documentation

From: Moss & Atre (2003)


ICT394 Business Intelligence
Application Development
Topic 05
Topic Summary
Learning Outcomes
• At the completion of this topic, you should be able to:
– Explain the role of the ETL process in BI, including describing the
activities involved in ETL design, and deliverables that result from
those activities and the roles involved
– Discuss the different types of ETL programs
– Create a source-to-target mapping document
• This topic contributes to the following Unit Learning
Outcomes:
– 3. Demonstrate practical skills in the processes associated with
extraction, transformation and loading (ETL) of organisational data
– 4. Design and implement a simple data warehouse environment
Lecture Outline
• Data Warehouse Creation
• Extract
• Transform
• Load
• ETL Infrastructure and
documentation
• Topic Summary
Next Topic: OLAP
• At the completion of this topic, you should be able to:
– Explain how and why OLAP is used
– Explain and give examples of the OLAP operators, slice/dice,
pivot and drill down/up
– Describe the difference between discrete and continuous
data and how they are treated in Tableau
• This topic contributes to the following unit learning
outcomes:
– Present analyses of data using a number of different
techniques
References
• Moss, L.T., and Atre, S., (2003), Business
Intelligence Roadmap: The Complete Project
Lifecycle for Decision Support Applications,
Addison-Wesley Professional.
• Jukic, N., Vrbsky, S., & Nestorov, S., 2014,
Database Systems: Introduction to Databases
and Data Warehouses, Pearson.
Topic 06

OLAP

ICT394 BI Application
Development
Resources
• Connolly, T., and Begg, C.E., 2015, Database
Systems: a practical approach to design,
implementation, and management, 6th Ed.,
Pearson, Boston. E-book (available from My Unit
Readings) CHAPTER 33
• Case Study: Ergo – Topaz Business Intelligence Case
Study
– Available from:
https://www.youtube.com/watch?v=mNzzBaKNEcs&nohtml5=False
Learning Outcomes
• At the completion of this topic, you should be able to:
– Explain how and why OLAP is used
– Explain and give examples of the OLAP operators, slice/dice,
pivot and drill down/up
– Describe the difference between discrete and continuous
data and how they are treated in Tableau
• This topic contributes to the following unit learning
outcomes:
– Present analyses of data using a number of different
techniques
Lecture Outline
• What is OLAP?
• OLAP Tools
• Topic Summary
ICT394 Business Intelligence
Application Development
Topic 06: Part 02
What is OLAP?
What is OLAP?
• Strictly speaking:
– “OnLine Analytical Processing”
• Many more detailed definitions:
– “The dynamic synthesis, analysis, and consolidation of
large volumes of multi-dimensional data” Connolly &
Begg (2015, p1286)
– “The use of a set of graphical tools that provide users
with multidimensional views of their data and allows
them to analyze the data using simple windowing
techniques” Hoffer et. al. (2013, p.448)
Multi-dimensional
• In your previous database studies, you would
have spent a lot of time dealing with
2-dimensional data structures
– E.g., rows and columns
• Multi-dimensional data means that there are
more than just rows and columns
– The easiest way to think about this is in terms of a
cube
Cubes

https://docs.oracle.com/cd/E12839_01/bi.1111/b40105/i_olap_chapter.htm#BIDPU149
Why cubes?
• The data are stored in our BI target database
in such a way so as to facilitate multi-
dimensional queries
– E.g., sum of sales by period/product/location

https://docs.oracle.com/cd/E12839_01/bi.1111/b40105/i_olap_chapter.htm#BIDPU149
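Behind the cube picture sits an ordinary multi-attribute GROUP BY: each cell of the cube is one combination of dimension values. The sketch below (sqlite3, with invented sales data and simplified dimension columns) computes the sum of sales per period, product and location.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE sales "
            "(period TEXT, product TEXT, location TEXT, dollars REAL)")
cur.executemany("INSERT INTO sales VALUES (?,?,?,?)", [
    ('Q1', 'Tent', 'Perth',  100.0),
    ('Q1', 'Tent', 'Perth',   25.0),  # two facts fall in the same cube cell
    ('Q1', 'Tent', 'Sydney',  50.0),
    ('Q2', 'Boot', 'Perth',   80.0),
    ('Q2', 'Tent', 'Perth',   20.0),
])

# One row of the result = one cell of the cube:
# a (period, product, location) combination with its summed measure.
cube = cur.execute("""SELECT period, product, location, SUM(dollars)
                      FROM sales
                      GROUP BY period, product, location
                      ORDER BY period, product, location""").fetchall()
print(cube)
```

Adding a fourth dimension (say, customer) is just one more column in the GROUP BY, which is why cubes are not limited to three dimensions.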
How many dimensions?
• Cubes are most often used to help us understand
what is meant by multi-dimensionality
– It does not mean however, that we are necessarily
limited to only 3 dimensions…
• It’s just that it gets hard to draw models of > 3 dimensions!

http://imaginingthetenthdimension.blogspot.com.au/2011/08/imagining-fourth-dimension.html
nth Dimension
• We could add additional dimensions to the
example cube we looked at earlier:
– E.g., sum of sales by Period by product by location
by customer
• We could perhaps visualise this as a cube of cubes

http://depositphotos.com/13755967/stock-illustration-exploding-head.html
ICT394 Business Intelligence
Application Development
Topic 06: Part 03
OLAP Tools
OLAP/BI Tools
– Require dimensional organization of underlying data for
performing basic OLAP operations (slice, pivot, drill)
– Allow users to query fact and dimension tables by using
simple point-and-click query-building applications
– Based on the point-and-click actions by the user of the
OLAP/BI tool, the tool writes and executes the code in
the language of the data management system (e.g. SQL)
that hosts the data warehouse or data mart that is being
queried
OLAP Tools allow for…
• Ad-hoc direct analysis of dimensionally
modelled data
• Creation of front-end BI applications
A typical OLAP/BI tool query construction space
OLAP Query 1 - OLAP/BI tool query construction actions

Jukic, et al., 2017, p. 284

OLAP Query 1 - Result

Jukic, et al., 2017, p. 284
Basic OLAP operators:
o Slice and Dice
o Pivot (Rotate)
o Drill Down / Drill Up
Slice and Dice
• Adds, replaces, or eliminates specified
dimension attributes (or particular values of the
dimension attributes) from the already displayed
result

Jukic, et al., 2017, p. 285
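When the OLAP/BI tool generates SQL for a slice-and-dice action, restricting to particular dimension values becomes a WHERE clause. The sketch below (sqlite3, with invented data loosely echoing OLAP Query 2's "stores 1 and 2" restriction) shows the idea.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE v (store TEXT, gender TEXT, category TEXT, units INT)")
cur.executemany("INSERT INTO v VALUES (?,?,?,?)", [
    ('S1', 'Female', 'Camping',  5),
    ('S1', 'Male',   'Camping',  3),
    ('S2', 'Female', 'Footwear', 4),
    ('S3', 'Male',   'Camping',  7),   # store 3 is diced out of the result
])

# Dice: keep only particular values of the store dimension attribute.
rows = cur.execute("""SELECT store, gender, category, SUM(units)
                      FROM v
                      WHERE store IN ('S1', 'S2')
                      GROUP BY store, gender, category
                      ORDER BY store, gender""").fetchall()
print(rows)
```

Adding or removing a dimension attribute from the displayed result corresponds to adding or removing a column from the SELECT and GROUP BY lists.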
Slice and Dice - Example
OLAP Query 2 - Text

OLAP Query 2: For stores 1 and 2, show separately for male and female shoppers
the number of product units sold for each product category

Jukic, et al., 2017, pp. 284-285
Slice and Dice – Another Example
OLAP Query 3 - Text

OLAP Query 3: For each individual store, show separately for male and female
shoppers the number of product units sold on workdays and on weekends/holidays

Jukic, et al., 2017, p. 285
Pivot (aka Rotate)
• Reorganizes the values displayed in the original
query result by moving values of a dimension
column from one axis to another

Jukic, et al., 2017, p. 286
Drill-Down/Up
• Drill Down
– Makes the granularity of the data in the query
result finer
• Drill Up
– Makes the granularity of the data in the query
result coarser
Drill Hierarchy
• Set of attributes within a dimension where an attribute is
related to one or more attributes at a lower level but only
related to one item at a higher level
– For example: StoreRegionName → StoreZip →
StoreID
• Used for drill down/drill up operations
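In query terms, drilling up and down means grouping at different levels of the hierarchy. The sketch below (sqlite3; the region → zip → store hierarchy follows the slide's example, the data values are invented) shows the same facts summarised at two granularities.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Drill hierarchy from the slide: StoreRegionName -> StoreZip -> StoreID.
cur.execute("CREATE TABLE sales (region TEXT, zip TEXT, store TEXT, dollars REAL)")
cur.executemany("INSERT INTO sales VALUES (?,?,?,?)", [
    ('West', '6150', 'S1', 100.0),
    ('West', '6150', 'S2',  40.0),
    ('West', '6009', 'S3',  60.0),
    ('East', '2000', 'S4',  90.0),
])

# Drill up: coarser granularity -- totals per region.
up = cur.execute("""SELECT region, SUM(dollars) FROM sales
                    GROUP BY region ORDER BY region""").fetchall()
print(up)

# Drill down: finer granularity -- totals per region and zip.
down = cur.execute("""SELECT region, zip, SUM(dollars) FROM sales
                      GROUP BY region, zip ORDER BY region, zip""").fetchall()
print(down)
```

Each step down the hierarchy adds one more attribute to the GROUP BY, making the result finer; each step up removes one, making it coarser.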
Drill Down – Example
OLAP Query 4 - Text

OLAP Query 4: For each individual store, show separately for male and female
shoppers the number of product units sold for each individual product in each
category.

Jukic, et al., 2017, p. 288
Additional OLAP/BI tool functionalities
• Graphically visualizing the answers
• Creating and examining calculated data
• Determining comparative or relative differences
• Performing exception analysis, trend analysis,
forecasting, and regression analysis
• Number of other analytical functions
Result Visualization Example
OLAP Query 1 – Result, visualized as a chart

Jukic, et al., 2017, p. 288
ICT394 Business Intelligence
Application Development
Topic 06: Part 04
Topic Summary
Learning Outcomes
• At the completion of this topic, you should be able to:
– Explain how and why OLAP is used
– Explain and give examples of the OLAP operators, slice/dice,
pivot and drill down/up
– Describe the difference between discrete and continuous
data and how they are treated in Tableau
• This topic contributes to the following unit learning
outcomes:
– Present analyses of data using a number of different
techniques
Lecture Outline
• What is OLAP?
• OLAP Tools
• Topic Summary
Next Topic: Business Analytics
• At the completion of this topic, you should be able to:
– Define and provide appropriate examples of business analytics
– Explain, using examples, the differences between descriptive,
predictive and prescriptive analytics
– Provide examples of the use of different types of reports
– Perform various analytics techniques on given data sets
• This topic contributes to the following unit learning outcomes:
1. Demonstrate an understanding of the role of BI in
organisations
6. Present analyses of data using a number of different
techniques
References
• Connolly, T., and Begg, C., 2015, Database Systems:
A practical approach to design, implementation,
and management, 6th Ed., Pearson, Boston
(available from MUR)
• Hoffer, J.A., Ramesh, V., and Topi, H., 2013, Modern
Database Management, 11th Ed., Pearson, Boston
• Jukic, N., Vrbsky, S., and Nestorov, S., 2017,
Database Systems: Introduction to Databases and
Data Warehouses, Prospect Press
Topic 07

Business Analytics

ICT394 BI Application
Development
Resources
• Reading:
– Stubbs (2013), Chapters 1 & 2
• Case Study:
– Allrecipes
Learning Outcomes
• At the completion of this topic, you should be able to:
– Define and provide appropriate examples of business analytics
– Explain, using examples, the differences between descriptive, predictive
and prescriptive analytics
– Provide examples of the use of different types of reports
– Perform various analytics techniques on given data sets
– Create a plan for a business analytics investigation
• This topic contributes to the following unit learning outcomes:
1. Demonstrate an understanding of the role of BI in
organisations
6. Present analyses of data using a number of different
techniques
Lecture Outline
• What is Business Analytics?
• Types of Business Analytics
• Types of Reports
• SMART Goals
• Developing a Business Analytics plan
• Topic Summary
ICT394 Business Intelligence
Application Development
Topic 07 Part 02:
What is Business Analytics?
Business Analytics
• A broad term, defined differently by different
authors
– Stubbs (2013) provides one such definition:
• “analytics can be considered any data-driven process
that provides insight” (p.5)
– Seddon et al (2016):
• “the use of data to make sounder, more evidence-based
business decisions”
How does BA provide insight?
• Reporting
• Trending
• Segmentation
• Predictive Modelling
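As a tiny taste of the trending category, the sketch below fits a least-squares line to monthly sales figures (the numbers are invented for illustration) and projects the next month. This is the mathematical core behind the simple trend lines a BI tool draws for you.

```python
# Hypothetical monthly sales figures for six months.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

# Least-squares fit: slope = cov(x, y) / var(x), computed directly.
n = len(months)
mx = sum(months) / n
my = sum(sales) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(months, sales))
         / sum((x - mx) ** 2 for x in months))
intercept = my - slope * mx

forecast = intercept + slope * 7   # projected month-7 sales
print(slope, forecast)  # 10.0 160.0
```

Real trending adds seasonality and confidence bands, but the data-in, mathematics, value-out pattern noted below is already visible here.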
Insight generation
• Each of the broad types of analytics have the
following in common:
– They are based on data
– They use mathematical techniques to transform
and summarize the data
– They add value
https://onlinehelp.tableau.com/current/pro/online/mac/en-us/forecast_how_it_works.html
Advanced Analytics
• Focus is on
– WHY things are happening
– WHAT things are likely to happen
• Requires more complex toolset
– Operations research
– Statistics
– Multivariate analyses
– Decision trees
– Regressions
Business Outcomes
• Business relevancy
• Actionable insight
• Performance measurement and value
measurement
ICT394 Business Intelligence
Application Development
Topic 07 Part 03:
Types of Business Analytics
Descriptive Analytics
• AKA Reporting
– What is happening and understanding the
underlying trends and causes
– Relies on consolidated data sources (e.g., BI target
database/data warehouse)
– Includes a high degree of visualisation
• Some interesting case studies are listed in the
readings for this topic – Seattle Children’s and
Kaleida Health.
Predictive Analytics
• Aims to determine what will happen in the
future
– Will generally be based on statistical techniques
and other techniques that will often fall under the
heading of data mining
– Often used to predict churn, products likely to
appeal, and creditworthiness assessment
(https://www.youtube.com/watch?v=AJQ3TM-p2QI)
– Moneyball!
Prescriptive Analytics
• Goal is to recognise what is happening as well as
the forecast and to be able to make decisions
that will achieve the best possible performance
– Generally aimed at optimisation of system
performance
– The types of recommendations provided by these
types of systems may be yes/no or production of a
recommended amount
• They might also be used directly in automated rules-based
decision systems such as airline pricing systems
ICT394 Business Intelligence
Application Development
Topic 07 Part 04:
Types of Reports
What is a report?
• Document that contains information relevant
to the domain of interest
• Used in management decision making
• Will use data from a variety of sources
• Iterative reporting cycle:
– Data acquisition → Information Generation →
Decision Making → Business Process
Management → Data acquisition, etc…
Metric Management Reports
• Performance against desired outcome
– Service level agreements
– KPI’s
– Six Sigma/Total Quality Management

http://kb.tableau.com/articles/knowledgebase/kpis
Dashboards
• A way of presenting a range of performance
indicators on a single page
• We will address these in more depth later in
the course
– …and, building them in the workshops and
assignment
Balanced Scorecard (BSC)
• The BSC is a system that measures success in
an organisation, but looks beyond just the
financial perspective
– Customer perspective
– Internal Business Process perspective
– Learning and growth perspective
ICT394 Business Intelligence
Application Development
Topic 07 Part 05:
SMART Goals
Business Analytics
• So, BA is about developing or creating insight from
data
– But how?
• In order to develop insight, we need to be sure that
we understand what it is we are trying to do…
– What problem are we trying to solve?

– Watch this video now:


https://www.coursera.org/learn/analytics-tableau/lectur
e/ruazC/rock-
projects
Value
• What is the value of the BA project to the
organisation?
– In order to understand this, we need to
understand what problem it is the organisation is
asking us to solve.
– The best way to do this, is to ASK QUESTIONS…
Stakeholders
• In order to make sure we are asking questions
of the right people, we need to make sure we
know who the right people are!
– By doing this, and asking appropriate questions, we
will be in the best position to be able to work
constructively with the client to solve their problem
– A good place to start might be:
• “What problem is it that you hope to solve by developing
this project?”
Questions…
• …and ask more and more and more!
– “How is this problem affecting the organisation?”
– “What is your ideal outcome of the project?”

• The purpose of asking these questions is to make sure that
you really do understand what the stakeholders’ needs are

• See:
https://www.captechconsulting.com/blogs/interacting-with-stakeholders-as-a-business-analyst-who-are-you-dealing-with
SMART
• Specific
– Greater chance of accomplishing a specific goal than a general goal
• Measurable
– Need to be able to understand if we are making progress (or not) toward
our goal
• Attainable
– No point in setting a goal that cannot be attained regardless of how
much effort you put in
• Relevant
– You need to be willing AND able to achieve the goal
• Time-bound
– Achievable within a given timeframe
How SMART?
• Identify the business metric that needs to
change in order to achieve the SMART goal
– How should it change, and by how much?

• If we can answer this, then our goals are more
likely to be SPECIFIC and MEASURABLE
Dependent and Independent Variables

• Dependent variable is the business metric that
we need to change
• Independent variable is what we use to
change the dependent variable
Example
• Our problem is that too many people are not
returning to our site after they log in for the
first time…
– How to convert first time visitors to returning
visitors?

– A vague project goal would be:
• Increase the number of returning visitors to the website.
• Is it SMART?
SMART
• Better would be:
– Increase the number of returning visitors on a
month-by-month basis by 15% compared to the
same month last year

• What do you think the DV is for this goal?


– Ask yourself, what is it we are trying to change, and
– What data will you need in your dataset to be able
to show the DV?
Relevant?
• R:
– We need to ensure that we are solving the correct
problem. In this case, the number of visitors might
not be the real problem, the real problem might
be a drop in revenue…
– IF that is the case, then a better project goal might
be:
• Determine the website changes that will most
efficiently increase revenues by 15% on a
month-by-month basis compared to the same month last year
Time-bound?
• The goal of the project is to, within 2 months,
determine the website changes that will most
efficiently increase revenues by 15% on a
month-by-month basis compared to the same
month last year

• What’s left? A…
Attainable?
• The client might simply not collect the data
you will need to identify the changes that are
needed to be made to the website…
– If this was the case, then it would be impossible to
achieve the goal in the time stated
• Might need to start a project on redesigning how data
is captured by the web site
• This might take much longer than you have!
Updated goals:
• If the organisation does collect good data from its website:
– In 2 months, analyse archived click-stream data to determine the
website changes that will most efficiently increase revenues by
15% on a month-by-month basis compared to the same month
last year
• If not…
– In 3 months, install a system that will collect and store click-
stream data in a cloud-based relational database. By 2 months
after the system is installed...
• Much SMARTer than our original goal...
– Increase the number of returning visitors to the site...
…and then what?
• These goals need to be discussed and possibly
refined in consultation with the stakeholders
– Everyone needs to be playing the same tune!
• Asking lots of questions will help to determine
what the independent variables might be…
ICT394 Business Intelligence
Application Development
Topic 07 Part 06:
Developing a Business Analytics Plan
Planning…
• As with most things, a lack of planning will
make progress more difficult
– BA is no different
– Planning helps us to know:
• What we know
• What we don’t know
• What we need to know
ICT394 Business Intelligence
Application Development
Topic 07 Part 07:
Topic Summary
Learning Outcomes
• At the completion of this topic, you should be able to:
– Define and provide appropriate examples of business analytics
– Explain, using examples, the differences between descriptive, predictive
and prescriptive analytics
– Provide examples of the use of different types of reports
– Perform various analytics techniques on given data sets
– Create a plan for a business analytics investigation
• This topic contributes to the following unit learning outcomes:
1. Demonstrate an understanding of the role of BI in
organisations
6. Present analyses of data using a number of different
techniques
Lecture Outline
• What is Business Analytics?
• Types of Business Analytics
• Types of Reports
• SMART Goals
• Developing a Business Analytics plan
• Topic Summary
Where to from here…
• The next topic deals with Data Mining
• Following that, we will move into the
Presentation section of the unit which will
discuss visualisation in more detail
Topic 08

Data Mining

ICT394 BI Application
Development
Resources
• See Topic 08 Reading List in Moodle for links
– REQUIRED:
• Sharda Ramesh, Delen Dursun, and Turban Efraim.
2014. Business Intelligence and Analytics: Systems for
Decision Support. Harlow: Pearson Education Limited.
• There are a number of other links throughout the
lecture slides and in the Topic reading list
Learning Outcomes
• At the completion of this topic, you should be able to:
– Define and give examples of data mining as an enabling technology for
business intelligence and analytics
– Understand and give examples of the objectives and benefits of data
mining
– Give examples of a wide range of data mining applications
– Understand the standardised data mining process (CRISP-DM)
• This topic contributes to the following unit learning outcomes:
– Demonstrate an understanding of the role of BI in organisations
– Present analyses of data using a number of different techniques
Lecture Outline
• Data Mining Concepts
• Data Mining Applications
• Data Mining Processes
• Data Mining Methods
• Data Mining Software Tools
• Data Mining Issues
• Topic Summary
ICT394 Business Intelligence
Application Development
Topic 08 Part 02:
Data Mining Concepts
Data Mining Defined
• “…a process that uses statistical, mathematical and
artificial intelligence techniques to extract and
identify useful information ... from large sets of
data.” Sharda et. al. (2014, p.222).
– Also sometimes known as:
• Knowledge extraction (KDD)
• Pattern analysis
• Data Archaeology
• Information harvesting
• Pattern searching
• Data dredging
…another definition
• “the nontrivial process of identifying valid,
novel, potentially useful, and ultimately
understandable patterns in data stored in
structured databases”
[Figure: Data mining at the intersection of Pattern Recognition, Machine Learning, Mathematical Modeling, Databases, and Management Science & Information Systems – Sharda et al. (2014, p. 223)]
Why bother?
• Global competition
• Untapped value of organisational data
• Increasing consolidation of data
• Vast improvements in processing and storage
capabilities and reduction in cost
https://en.wikipedia.org/wiki/Recommender_system#The_Netflix_Prize
How data mining works
• Four major types of patterns:
– Associations
– Predictions
– Clusters
– Sequential Relationships
Data Mining Tasks (Sharda et al., 2014, p. 228)

Data Mining Task   | Learning Method | Popular Algorithms
Prediction         | Supervised      | Classification and Regression Trees, ANN, SVM, Genetic Algorithms
Classification     | Supervised      | Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
Regression         | Supervised      | Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
Association        | Unsupervised    | Apriori, OneR, ZeroR, Eclat
Link analysis      | Unsupervised    | Expectation Maximization, Apriori Algorithm, Graph-based Matching
Sequence analysis  | Unsupervised    | Apriori Algorithm, FP-Growth technique
Clustering         | Unsupervised    | K-means, ANN/SOM
Outlier analysis   | Unsupervised    | K-means, Expectation Maximization (EM)
Where to from here…
• Data Mining Applications

http://dilbert.com/strip/2000-01-05
ICT394 Business Intelligence
Application Development
Topic 08 Part 03:
Data Mining Applications
Data Mining Applications
• Data mining has been, and continues to be,
used in a wide variety of contexts, some
examples are:
– Customer relationship management
– Banking and other financial
– Retailing/logistics
– Insurance
– Brokerage and securities trading
– Manufacturing and Maintenance
CRM
• Customer relationship management
– http://searchcrm.techtarget.com/definition/CRM
• Maximize return on marketing campaigns
• Improve customer retention (churn analysis)
• Maximize customer value (cross-, up-selling)
• Identify and treat most valued customers
Banking and other financial
• Automate the loan application process
– Prediction of most likely defaulters
• Detecting fraudulent transactions
– https://www.youtube.com/watch?v=1zDwIfSDQiE
• Maximize customer value (cross-, up-selling)
• Optimizing cash reserves with forecasting
Retailing and logistics
• Optimize inventory levels at different locations
• Improve the store layout and sales promotions
• Optimize logistics by predicting seasonal
effects
• Minimize losses due to limited shelf life

https://www.linkedin.com/pulse/20140403185417-4785379-diapers-and-beer
Manufacturing and maintenance
• Predict/prevent machinery failures
– http://www.manufacturing.net/article/2014/12/using-big-data-iot-predict-machine-failure
• Identify anomalies in production systems to
optimize the use of manufacturing capacity
• Discover novel patterns to improve product
quality
Insurance
• Forecast claim costs for better business
planning
• Determine optimal rate plans
• Optimize marketing to specific customers
• Identify and prevent fraudulent claim activities
Where to from here…
• Data Mining Process

http://sisbinus.blogspot.com.au/2014/11/processes-in-data-mining.html
https://www.ibm.com/developerworks/bpm/library/techarticles/1407_chandran/
ICT394 Business Intelligence
Application Development
Topic 08 Part 04:
Data Mining Process
Data Mining Process
• CRISP-DM

https://www.ibm.com/developerworks/bpm/library/techarticles/1407_chandran
1. Business/Organisational Understanding
• What is the study for?
– Need for a thorough understanding of the need
for new knowledge and an explicit specification of
the objectives of the study
2. Data Understanding
• Identify relevant data that is required to address the
specific questions posed in the previous step
– Obviously, the clearer the specification, the easier this
step will be 
– As we discussed earlier in the unit, we will need to
understand where the data exist, what format they are
in, how we access the data and so on
– Will often be:
• Demographic
• Sociographic
• Transactional
3. Data Preparation
• AKA Data Pre-processing: the steps from real-world
data to well-formed data (Sharda et al., 2014, p. 237):
– Data Consolidation: collect data, select data, integrate data
– Data Cleaning: impute missing values, reduce noise in data,
eliminate inconsistencies
– Data Transformation: normalize data, discretize/aggregate data,
construct new attributes
– Data Reduction: reduce number of variables, reduce number of
cases, balance skewed data
4. Modelling/Model Building
• Representation of real-world observations
– Application of algorithms to seek out, identify and
display the patterns found in the data
– Models will, generally, classify, predict or both
• E.g., decision trees are predictive in nature but can also
be used to classify our data
5. Testing and Evaluation
• Models are assessed and evaluated for their
accuracy and generality
– Particularly in terms of the business objectives of
the project
6. Deployment
• For the models generated to be of practical
value, they must be deployed
– The end users must be able to access and
understand what the model shows
• Could include preparation of a report, or some more
sophisticated form of interactive system
– Will also include maintenance
– http://www.npr.org/sections/alltechconsidered/2012/10/31/163951263/the-night-a-computer-predicted-the-next-president
Where to from here…
• Data mining methods
ICT394 Business Intelligence
Application Development
Topic 08 Part 05:
Data Mining Methods
Classification
• Most frequently used DM method
• Part of the machine-learning family
• Employ supervised learning
• Learn from past data, classify new data
• The output variable is categorical (nominal or
ordinal) in nature
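The "learn from past data, classify new data" idea can be sketched with a minimal nearest-neighbour classifier in pure Python. The training records below (hours studied, assignments completed) and the pass/fail labels are made up for illustration only; they are not from the unit's data sets.

```python
# A minimal 1-nearest-neighbour classifier: "learn" from labelled past
# data, then classify a new record by the closest training example.

def squared_distance(a, b):
    """Squared Euclidean distance between two numeric records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(record, training_data):
    """Return the label of the training record nearest to `record`."""
    label, _ = min(
        ((lbl, squared_distance(record, feats)) for feats, lbl in training_data),
        key=lambda pair: pair[1],
    )
    return label

# Hypothetical labelled "past data": (hours studied, assignments done).
training = [
    ((2, 1), "fail"), ((3, 2), "fail"),
    ((8, 9), "pass"), ((9, 8), "pass"),
]

print(classify((7, 8), training))  # new record near the "pass" group
```

The output variable is categorical ("pass"/"fail"), matching the definition above.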
Classification Methodology
• Two-step process:
– Model development/training
– Model testing/evaluation
Classification Model Assessment
• Predictive accuracy
• Speed
• Robustness
• Scalability
• Interpretability
Classification Techniques
• Decision tree analysis
• Statistical analysis
• Neural networks
• Case-based reasoning
• Bayesian classifiers
• Genetic algorithms
http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/06prop/id3/id3.html
Classification Techniques
• Decision tree analysis
• Statistical analysis
• Neural networks:
https://www.youtube.com/watch?v=bxe2T-V8XRs
• Case-based reasoning:
http://www.inf.ufsc.br/~awangenh/RP/RBC/introrbc.pdf
• Bayesian classifiers
• Genetic algorithms:
http://www.obitko.com/tutorials/genetic-algorithms/
ICT394 Business Intelligence
Application Development
Topic 08 Part 06:
Data Mining Software Tools
Data mining tool use
http://www.kdnuggets.com/polls/2015/analytics-data-mining-data-science-software-used.html
Some useful links…
• Rapid Miner getting started videos:
https://www.youtube.com/playlist?list=PLssWC2d9JhOZLbQNZ80uOxLypglgWqbJA
• Intro to Python for Data Science:
https://www.datacamp.com/courses/intro-to-python-for-data-science
• Other relevant “Data Camp” courses:
https://www.datacamp.com/courses
ICT394 Business Intelligence
Application Development
Topic 08 Part 07:
Data Mining Issues
Privacy issues
• Any time that transactional data is stored,
there may be identifying information
– Name, address etc
– Purchasing habits
– Loan details etc
• The ownership of that data is questionable
– Data mining uses these data
Myths…
• Data mining …
– provides instant solutions/predictions
– is not yet viable for business applications
– requires a separate, dedicated database
– can only be done by those with advanced degrees
– is only for large firms that have lots of customer
data
– is another name for the good-old statistics
Blunders
• Selecting the wrong problem for data mining
• Ignoring what your sponsor thinks data mining is
and what it really can/cannot do
• Not leaving sufficient time for data acquisition,
selection and preparation
• Looking only at aggregated results and not at
individual records/predictions
• Being sloppy about keeping track of the data
mining procedure and results
ICT394 Business Intelligence
Application Development
Topic 08 Part 08:
Topic Summary
Learning Outcomes
• At the completion of this topic, you should be able to:
– Define and give examples of data mining as an enabling technology for
business intelligence and analytics
– Understand and give examples of the objectives and benefits of data
mining
– Give examples of a wide range of data mining applications
– Understand the standardised data mining process (CRISP-DM)
• This topic contributes to the following unit learning outcomes:
– Demonstrate an understanding of the role of BI in organisations
– Present analyses of data using a number of different techniques
Lecture Outline
• Data Mining Concepts
• Data Mining Applications
• Data Mining Processes
• Data Mining Methods
• Data Mining Software Tools
• Data Mining Issues
• Topic Summary
Where to from here…
• Next topic:
– Visualisation
Correlation and Regression
Correlation is a statistical technique used to determine the
degree to which two variables are related.
Scatter Plot
• Two quantitative
variables
• One variable is
called independent
(X) and the second
is called dependent
(Y)
• Points are not joined
Scatter plots
• The pattern of data is indicative of the type of
relationship between your two variables:
– positive relationship
– negative relationship
– no relationship
Positive relationship / Negative relationship / No relation
[Example scatter plots: a negative relationship is illustrated by reliability against age of car]
Correlation Coefficient
• A statistic showing the degree of relation between two
variables
Simple Correlation coefficient (r)
• Also called Pearson's correlation or product
moment correlation coefficient
• It measures the nature and strength of the relationship
between two variables of the quantitative type
The sign of r denotes the nature of association,
while the value of r denotes the strength of association.
• If the sign is +ve the relation is direct: an increase in one
variable is associated with an increase in the other variable,
and a decrease in one variable is associated with a decrease
in the other variable.
• If the sign is -ve the relationship is inverse or indirect: an
increase in one variable is associated with a decrease in the
other.
The value of r ranges between -1 and +1, and denotes the
strength of the association:
– |r| between 0.75 and 1: strong
– |r| between 0.25 and 0.75: intermediate
– |r| between 0 and 0.25: weak
– r = -1: perfect indirect (negative) correlation
– r = +1: perfect direct (positive) correlation
– r = 0: no relation
Correlation
• Measures the relative strength of the linear
relationship between two variables
• Unit-less
• Ranges between –1 and +1
– The closer to –1, the stronger the negative linear relationship
– The closer to +1, the stronger the positive linear relationship
– The closer to 0, the weaker any linear relationship
Correlation is based on Covariance
Interpreting Covariance
• cov(X,Y) > 0: X and Y are positively correlated
• cov(X,Y) < 0: X and Y are inversely correlated
• cov(X,Y) = 0: X and Y are uncorrelated (no linear
relationship; not necessarily independent)
Correlation coefficient
• Pearson’s Correlation Coefficient is
standardized covariance (unit-less):
r = cov(X,Y) / (sX · sY)
where sX and sY are the standard deviations of X and Y
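As a sketch of this definition, Pearson's r can be computed directly as the covariance divided by the product of the two standard deviations. This is a pure-Python illustration; the sample data are invented to show the perfect-correlation endpoints.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance standardized by the two SDs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# A perfect direct relationship gives r = +1,
# a perfect inverse relationship gives r = -1 (up to float rounding).
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # ~ +1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # ~ -1.0
```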
Scatter Plots of Data with Various Correlation Coefficients
[Six example scatter plots of Y against X, with r = -1, r = -0.6, r = 0, r = +1, r = +0.3 and r = 0]
Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall

Linear Correlation
[Example plots contrasting linear relationships with curvilinear relationships]
Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall
Linear Correlation
[Example plots contrasting strong relationships with weak relationships]
Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004, Prentice-Hall
Correlation is not the same thing as causation..
Question : Is correlation a DESCRIPTIVE or
PREDICTIVE model?
Regression
• If we wish to extend this model further then we
may use regression
Linear regression
• In correlation, the two variables are treated as equals.
In regression, one variable is considered the independent
(= predictor) variable (X) and the other the dependent
(= outcome) variable (Y).
What is “Linear”?
• Remember: Y = mX + B
– m is the slope and B is the Y-intercept
Regression
• Calculates the “best-fit” line for a certain set of data
• The regression line makes the sum of the squares of
the residuals smaller than for any other line
[Scatter plot with fitted line: SBP (mm Hg), 80–220, against Wt (kg), 60–120]
Prediction
• If you know something about X, this knowledge helps you
predict something about Y
Regression equation…
• Expected value of y at a given level of x:
E(y | x) = α + β·x
• Predicted value for an individual:
yi = α + β·xi + random errori
– α + β·xi is fixed (exactly on the line); the random error
follows a normal distribution
Assumptions
• Linear regression assumes that…
1. The relationship between X and Y is linear
2. Y is distributed normally at each value of X
3. The variance of Y at every value of X is the
same (homogeneity of variances)
4. The observations are independent
• (If these assumptions cannot be met we can still
apply regression – but not a simple linear
regression)
Classical Linear Regression (OLS)
• a0: mean response when x = 0 (the y-intercept)
• b1: change in mean response when x increases
by 1 unit (the slope)
• a0 and b1 are unknown parameters (like m and B
in Y = mX + B)
• a0 + b1·x: mean response when the explanatory
variable takes on the value x
• Task: minimize the sum of squared errors
Hours studying and grades
Regressing grades on hours:
• Predicted final grade in class =
59.95 + 3.17 × (number of hours you study per week)
Predict the final grade of…
• Someone who studies for 12 hours:
– Final grade = 59.95 + (3.17 × 12) = 97.99
• Someone who studies for 1 hour:
– Final grade = 59.95 + (3.17 × 1) = 63.12
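The worked predictions above can be checked in a few lines of Python, plugging the slide's fitted coefficients (intercept 59.95, slope 3.17) into the regression equation:

```python
# Regression equation from the slide:
#   grade = 59.95 + 3.17 * (hours studied per week)
INTERCEPT, SLOPE = 59.95, 3.17

def predicted_grade(hours_per_week):
    """Predicted final grade for a given number of weekly study hours."""
    return round(INTERCEPT + SLOPE * hours_per_week, 2)

print(predicted_grade(12))  # 97.99
print(predicted_grade(1))   # 63.12
```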
Choice of model
• Multiple Linear Regression
• Logistic Regression (non-parametric)

K-Means Clustering
Logistic Regression
Association Rules
Clustering
• Clustering is a general task which can be solved by
numerous algorithms. The basic objective is to create k
clusters of data points within the set, such that the members
of a cluster are more similar to each other than they are to
members of other clusters.
Clustering
• K-means is a well-known clustering algorithm which uses
the means of the clusters as the distance measure, hence the
name.
• The k-means algorithm clusters n objects, based on their
attributes, into k partitions, where k < n.
Distance measure
• The grouping is done by minimizing the sum of
squares of distances between the data points and the
corresponding cluster centroids.
• There are various distance measures possible, but
we will consider MSE as this has been used
previously.
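The k-means idea can be sketched in pure Python for one-dimensional data. This is a minimal illustration only: the points and the initial centroids are invented, and a real implementation would also handle random initialisation and multi-dimensional distances.

```python
def kmeans_1d(points, centroids, iterations=20):
    """Lloyd's algorithm on 1-D data: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    for _ in range(iterations):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep a centroid
        # unchanged if its cluster is empty rather than crashing.
        centroids = [sum(m) / len(m) if m else c for c, m in clusters.items()]
    return sorted(centroids)

points = [1, 2, 3, 10, 11, 12]        # two obvious groups
print(kmeans_1d(points, [1, 12]))     # [2.0, 11.0]
```

Note the sensitivity to the starting centroids, which is exactly the weakness discussed on the next slide.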
Weaknesses of K-Means Clustering
1. For small data sets, the initial grouping changes the
outcome significantly.
2. K must be determined beforehand, and each run may
yield different results as clusters depend on the
initial random assignments.
3. A different order of data can yield different clusters,
especially if the data set is small.
4. Initial conditions change the outcome such that it
may never find the best result; the algorithm can
become trapped in a local optimum.
Applications of K-Means Clustering
• The k-means algorithm is useful for undirected knowledge
discovery and is relatively simple. K-means has found
widespread usage in many fields, including unsupervised
learning in neural networks, pattern recognition,
classification analysis, artificial intelligence, image
processing and machine vision.
• It is relatively quick: the algorithm would imply O(n log n),
but heuristics and optimizations make it faster.
If groups are already known?
• Clustering is used if we don’t already know the
groups.
• If we already know the groups, then it’s a simple
classification task. We might use something like a
decision tree.
Case Study

Sonia is a program director for a major health insurance provider.


Recently she has been reading in medical journals and other
articles, and found a strong emphasis on the influence of weight,
gender and cholesterol on the development of coronary heart
disease. The research she’s read confirms time after time that
there is a connection between these three variables, and while
there is little that can be done about one’s gender, there are
certainly life choices that can be made to alter one’s cholesterol
and weight. She begins brainstorming ideas for her company to
offer weight and cholesterol management programs to individuals
who receive health insurance through her employer. As she
considers where her efforts might be most effective, she finds
herself wondering if there are natural groups of individuals who
are most at risk for high weight and high cholesterol, and if there
are such groups, where the natural dividing lines between the
groups occur.
Logistic Regression
• Logistic Regression measures the relationship between a
categorical dependent variable and one or more predictor
variables. This is done using a logistic function, generally
producing a binary outcome. Multinomial logistic
regression is a class of logistic regression which may yield
results in more than 2 categories.
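The logistic function that produces these probabilities can be sketched directly. The coefficients below are invented for illustration; they are not fitted to any data set in this unit.

```python
from math import exp

def logistic(x, intercept, slope):
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1 / (1 + exp(-(intercept + slope * x)))

# A zero linear score gives a predicted probability of exactly 0.5;
# larger scores push the probability towards 1, smaller towards 0.
print(logistic(0, 0.0, 1.0))            # 0.5
print(logistic(55, -10.0, 0.2) > 0.5)   # True for this hypothetical model
```

Classifying a case as "yes" when the probability exceeds 0.5 turns the continuous output into the binary outcome described above.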
Simple linear regression
[Table 1: Age (years) and systolic blood pressure, SBP (mm Hg), among 33 adult women – adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974]
Simple linear regression
• Relation between 2 continuous variables (SBP and age)
• Slope: the regression coefficient b1
– Measures the association between y and x
– The amount by which y changes on average when x
changes by one unit
Multiple linear regression
• Relation between a continuous variable and a set of
continuous variables xi
• Partial regression coefficients bi
– The amount by which y changes on average when xi
changes by one unit and all the other xi remain constant
– Measures the association between xi and y adjusted for
all other xi
• Example: SBP versus age, weight, height, etc.
Multiple linear regression
• Dependent variable = predicted = response variable =
outcome variable
• Independent variables = predictor variables =
explanatory variables = covariables
Multivariate analysis
• Model → Outcome:
– Linear regression → continuous
– Poisson regression → counts
– Logistic regression → binomial
– ...
• Choice of the tool is made according to the study, its
objectives, and the variables
Logistic regression
• Models the relationship between a set of variables xi
which may be:
– dichotomous (eats: yes/no)
– categorical (social class, ...)
– continuous (age, ...)
and a dichotomous variable Y
• A dichotomous (binary) outcome is the most common
situation in biology and epidemiology
Logistic regression (1)
• Table 2: Age and signs of coronary heart disease (CD)
• How can we analyse these data?
– Comparison of the mean age of diseased and
non-diseased women:
• Non-diseased: 38.6 years
• Diseased: 58.7 years (p < 0.0001)
– Linear regression?
Dot-plot: Data from Table 2
Logistic regression (2)
• Table 3: Prevalence (%) of signs of CD according to age
group
[Dot-plot: Data from Table 3 – Diseased % against Age (years)]
The logistic function (1)
[S-shaped curve: probability of disease against x]
Example

Age (<55 and 55+ years) and risk of developing


coronary heart disease (CD)
Regression vs Discriminant Analysis
• Read up on Discriminant Analysis. This also allows
you to predict a categorical dependent variable from
your data set.
• It is a bit stricter on the kind of input data though,
so most of the time we can just use logistic
regression.
Case Study
• Can we predict the chances of the company’s
policy holders suffering second heart attacks?
Using the insurance company case, perhaps we can
help policy holders who have suffered heart attacks by
offering weight, cholesterol and stress management
classes or support groups. By lowering these key heart
attack risk factors, clients will live healthier lives.
Association Rules
• Association Rule Mining discovers links between
attributes in large data sets. The most common example is
called “market basket analysis”. This refers to the purchase
patterns that can be learned from observing shoppers’
purchase histories.
Association rule mining
• Proposed by Agrawal et al. in 1993.
• It is an important data mining model studied
extensively by the database and data mining
community.
• Assume all data are categorical; there is no good
algorithm for numeric data.
• Example rule: Bread → Milk [sup = 5%, conf = 100%]
(Following examples from: CS583, Bing Liu, UIC)
The model: data
• I = {i1, i2, …, im}: a set of items
• Transaction t: a set of items such that t ⊆ I
• Transaction Database T: a set of transactions
T = {t1, t2, …, tn}
Transaction data: supermarket data
• Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}
… …
tn: {biscuit, eggs, milk}
• Concepts:
– An item: an item/article in a basket
– I: the set of all items sold in the store
– A transaction: items purchased in a basket; it may
have a TID (transaction ID)
– A transactional dataset: a set of transactions
Transaction data: a set of documents
• A text document data set. Each document is treated
as a “bag” of keywords:
doc1: Student, Teach, School
doc2: Student, School
doc3: Teach, School, City, Game
doc4: Baseball, Basketball
doc5: Basketball, Player, Spectator
doc6: Baseball, Coach, Game, Team
doc7: Basketball, Team, City, Game
The model: rules
• A transaction t contains X, a set of items (itemset) in
I, if X ⊆ t
• An association rule is an implication of the form:
X → Y, where X, Y ⊂ I, and X ∩ Y = ∅
• An itemset is a set of items
– E.g., X = {milk, bread, cereal} is an itemset
• A k-itemset is an itemset with k items
– E.g., {milk, bread, cereal} is a 3-itemset
Rule strength measures
• Support: the rule holds with support sup in T (the
transaction data set) if sup% of transactions
contain X ∪ Y
– sup = Pr(X ∪ Y)
• Confidence: the rule holds in T with confidence conf
if conf% of transactions that contain X also contain Y
– conf = Pr(Y | X)
• An association rule is a pattern that states that when X
occurs, Y occurs with a certain probability
Goal and key features
• Goal: find all rules that satisfy the user-specified
minimum support (minsup) and minimum
confidence (minconf)
An example
• Transaction data:
t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes
• Assume: minsup = 30%, minconf = 80%
• An example frequent itemset:
{Chicken, Clothes, Milk} [sup = 3/7]
• Association rules from the itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
… …
Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
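The support and confidence figures in the example above can be reproduced with two short pure-Python functions over the same seven transactions:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = sup(X union Y) / sup(X)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# The seven market-basket transactions from the example above.
T = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

print(support({"Chicken", "Clothes", "Milk"}, T))       # 3/7
print(confidence({"Clothes"}, {"Milk", "Chicken"}, T))  # 3/3 = 1.0
```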
Transaction data representation
• A simplistic view of shopping baskets
• Some important information is not considered, e.g.:
– the quantity of each item purchased, and
– the price paid
Many mining algorithms
• They use different strategies and data structures
• Their resulting sets of rules are all the same:
– Given a transaction data set T, a minimum support and a
minimum confidence, the set of association rules existing in
T is uniquely determined
• Any algorithm should find the same set of rules, although
their computational efficiencies and memory requirements
may be different
• The Apriori Algorithm is a popular one
The Apriori algorithm
• Probably the best known algorithm
• Two steps:
1. Find all itemsets that have minimum support
(frequent itemsets, also called large itemsets)
2. Use frequent itemsets to generate rules
• E.g., a frequent itemset:
{Chicken, Clothes, Milk} [sup = 3/7]
and one rule from the frequent itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
Step 1: Mining all frequent itemsets
• A frequent itemset is an itemset whose support is ≥
minsup
• Key idea: the apriori property (downward closure
property): any subset of a frequent itemset is also a
frequent itemset
[Itemset lattice for items A, B, C, D:
ABC ABD ACD BCD
AB AC AD BC BD CD
A B C D]
The Algorithm
• An iterative algorithm (also called level-wise search): find
all 1-item frequent itemsets; then all 2-item
frequent itemsets, and so on
– In each iteration k, only consider itemsets that
contain some (k-1)-item frequent itemset
• Find frequent itemsets of size 1: F1
• From k = 2:
– Ck = candidates of size k: those itemsets of size k
that could be frequent, given Fk-1
– Fk = those itemsets that are actually frequent,
Fk ⊆ Ck (need to scan the database once)
Example – Finding frequent itemsets
Dataset T (TID: Items):
T100: 1, 3, 4
T200: 2, 3, 5
T300: 1, 2, 3, 5
T400: 2, 5
(itemset:count)
1. scan T → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
→ F1: {1}:2, {2}:3, {3}:3, {5}:3
→ C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
2. scan T → C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
→ F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
→ C3: {2,3,5}
3. scan T → C3: {2,3,5}:2 → F3: {2,3,5}
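The level-wise search above can be reproduced with a brute-force sketch in pure Python. Note this is only an illustration: a real Apriori implementation would generate candidates Ck from F(k-1) rather than enumerating every combination of items.

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup_count):
    """Enumerate all itemsets whose support count is >= minsup_count,
    stopping at the first size k with no frequent itemset (by downward
    closure, nothing larger can be frequent)."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, k):
            s = set(combo)
            count = sum(s <= t for t in transactions)
            if count >= minsup_count:
                frequent[frozenset(s)] = count
                found_any = True
        if not found_any:
            break
    return frequent

# Dataset T from the example; minsup of 50% means a count of at least 2.
T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = frequent_itemsets(T, 2)
print(result[frozenset({2, 3, 5})])   # 2, matching F3 above
print(frozenset({1, 2}) in result)    # False: {1,2} appears only once
```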
Generating rules from frequent itemsets
• Frequent itemsets → association rules: one more step
is needed to generate association rules
• For each frequent itemset X:
– For each proper nonempty subset A of X:
• Let B = X - A
• A → B is an association rule if
confidence(A → B) ≥ minconf
– support(A → B) = support(A ∪ B) = support(X)
– confidence(A → B) = support(A ∪ B) / support(A)
Generating rules: an example
• Suppose {2,3,4} is frequent, with sup = 50%
– Proper nonempty subsets: {2,3}, {2,4}, {3,4}, {2},
{3}, {4}, with sup = 50%, 50%, 75%, 75%, 75%, 75%
respectively
• These generate the following association rules:
– 2,3 → 4, confidence = 100%
– 2,4 → 3, confidence = 100%
– 3,4 → 2, confidence = 67%
– 2 → 3,4, confidence = 67%
– 3 → 2,4, confidence = 67%
– 4 → 2,3, confidence = 67%
• All rules have support = 50%
Generating rules: summary
• To recap, in order to obtain A → B, we need
support(A ∪ B) and support(A)
• All the required information for confidence
computation has already been recorded during itemset
generation; there is no need to scan the data T again
• This step is not as time-consuming as frequent
itemset generation
On the Apriori Algorithm
• It seems to be very expensive
• Level-wise search:
– K = the size of the largest itemset
– It makes at most K passes over the data
– In practice, K is bounded (around 10)
• The algorithm is very fast. Under some
conditions, all rules can be found in linear
time
• It scales up to large data sets
More on association rule mining
• Clearly the space of all association rules is
exponential, O(2^m), where m is the number of
items in I
• The mining exploits sparseness of data, and high
minimum support and high minimum confidence
values
• Still, it always produces a huge number of rules:
thousands, tens of thousands, millions, ...
Case Study

Roger is a city manager for a medium-sized, but steadily growing,


city. The city has limited resources, and like most municipalities,
there are more needs than there are resources. He feels like the
citizens in the community are fairly active in various community
organizations, and believes that he may be able to get a number
of groups to work together to meet some of the needs in the
community. He knows there are churches, social clubs, hobby
enthusiasts and other types of groups in the community. What he
doesn’t know is if there are connections between the groups that
might enable natural collaborations between two or more groups
that could work together on projects around town. He decides that
before he can begin asking community organizations to begin
working together and to accept responsibility for projects, he
needs to find out if there are any existing associations between
the different types of groups in the area.
Topic 09

Introduction to Data
Visualisation

ICT394 BI Application
Development
Resources
• Required Reading:
– Knaflic Cole Nussbaumer. 2015. ‘Storytelling with Data.
Chapter 2.’ In Storytelling with Data, 35–69. Hoboken,
New Jersey: John Wiley & Sons. This is available from
the Topic 09 Readings link in Moodle.
• Case Study:
– ‘ViaWest Business Intelligence (BI) Case Study.’
https://www.youtube.com/watch?v=EXEgK-wD4gg
Learning Outcomes
• At the completion of this topic, you should be able to:
– Explain the importance of visualisations in the BI lifecycle
– Create and use a variety of visualisations that are appropriate to the
purpose of the visualisation
– Critique a given visualisation and explain how it could be improved
if required
• This topic contributes to the following unit learning
outcomes:
– Demonstrate an understanding of the role of BI in organisations
– Understand that different users have different requirements for
information
– Present analyses of data using a number of different technique
Lecture Outline
• What is visualisation?
• Types of visualisation
• Line Graphs
• Bar Charts
• Other types of Charts
• Topic Summary
ICT394 Business Intelligence
Application Development
Topic 09 Part 02:
What is Visualisation?
What is visualisation?
• A way to tell a story using graphics rather than words?
– This raises some questions:
• Who is the audience?
• Why are we telling them a story?
• How do we know, or communicate, that the information is
accurate and relevant?
• “The purpose of… data visualizations is to enlighten
people – not to entertain them, not to sell them
products, services or ideas, but to inform them.”
(Cairo, 2016, p.13)
What is the story here?
http://www.businessinsider.com.au/travel-agents-vs-online-hotel-bookings-2015-12
Stories…
• The stories we tell with visualisations are for
the purposes of:
– Analysis
– Communication
– Monitoring
– Planning
Analysis
Communication
Monitoring

http://vizwiz.blogspot.com.au/2014/12/donutcharts.html
Planning
Good visualizations…
• Are:
– Presenting reliable information
– Visually encoded so that relevant patterns become
noticeable
– Organised in a way that enables some exploration,
if appropriate
– Presented in an attractive manner (but remembering
that honesty, clarity, and depth come first)
ICT394 Business Intelligence
Application Development
Topic 09 Part 03:
Types of Visualisation
Visualisations
• The reading for this topic suggests there are
several basic types of visualisation:
– Simple text
– Table
– Points
– Lines
– Bars
– Area
Simple Text

http://www.storytellingwithdata.com/blog/2012/06/power-of-simple-text?rq=simple%20text
Tables
Iteration 1
Group Metric A Metric B Metric C
Group 1 $111 $234 $345
Group 2 $123 $345 $567
Group 3 $234 $567 $678
Group 4 $345 $678 $789
Group 5 $456 $789 $890
Iteration 2
Group Metric A Metric B Metric C
Group 1 $111 $234 $345
Group 2 $123 $345 $567
Group 3 $234 $567 $678
Group 4 $345 $678 $789
Group 5 $456 $789 $890

Iteration 3
Group Metric A Metric B Metric C
Group 1 $111 $234 $345
Group 2 $123 $345 $567
Group 3 $234 $567 $678
Group 4 $345 $678 $789
Group 5 $456 $789 $890
Heatmaps
Caption…
Group Metric A Metric B Metric C
Group 1 $111 $234 $345
Group 2 $123 $345 $567
Group 3 $234 $567 $678
Group 4 $345 $678 $789
Group 5 $456 $789 $890

Values above average
Group Metric A Metric B Metric C
Group 1 $111 $234 $345
Group 2 $123 $345 $567
Group 3 $234 $567 $678
Group 4 $345 $678 $789
Group 5 $456 $789 $890

Low … High
Group Metric A Metric B Metric C
Group 1 $111 $234 $345
Group 2 $123 $345 $567
Group 3 $234 $567 $678
Group 4 $345 $678 $789
Group 5 $456 $789 $890
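The “values above average” variant can be reproduced programmatically. Below is a minimal Python sketch (not from the reading; the figures are the illustrative ones from the table) that flags which cells a heatmap would shade:

```python
# Flag table cells that sit above their column's average,
# mimicking the "values above average" heatmap variant.
table = {
    "Metric A": [111, 123, 234, 345, 456],
    "Metric B": [234, 345, 567, 678, 789],
    "Metric C": [345, 567, 678, 789, 890],
}

def above_average_cells(columns):
    """Return {column: [True/False per row]} for values above the column mean."""
    flags = {}
    for name, values in columns.items():
        mean = sum(values) / len(values)
        flags[name] = [v > mean for v in values]
    return flags

flags = above_average_cells(table)
# Metric A mean = (111+123+234+345+456)/5 = 253.8, so Groups 4 and 5 are shaded
print(flags["Metric A"])  # [False, False, False, True, True]
```

The same boolean mask would drive conditional formatting in a spreadsheet or a cell-colour rule in a BI tool.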
Points
ICT394 Business Intelligence
Application Development
Topic 09 Part 04:
Line Graphs
Lines
Line Graph Best Practices
• Lines should only be used to connect values
along an interval scale
Line Graph Best Practices
• Lines should only be used to connect values
along an interval scale

http://www.perceptualedge.com/articles/visual_business_intelligence/line_graphs_and_irregular_intervals.pdf
Line Graph Best Practices
• Lines should only be used to connect values
along an interval scale
• Intervals should be equal in size
Line Graph Best Practices
• Lines should only be used to connect values
along an interval scale
• Intervals should be equal in size

http://www.perceptualedge.com/articles/visual_business_intelligence/line_graphs_and_irregular_intervals.pdf
Line Graph Best Practices
• Lines should only be used to connect values
along an interval scale
• Intervals should be equal in size
• Lines should only connect values in adjacent
intervals
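These best practices can be enforced before a line is drawn. A small Python sketch (an illustration, assuming x-values arrive as day offsets) that checks whether the intervals are equal:

```python
def equal_intervals(points):
    """True when consecutive x-values are evenly spaced, so a line is safe."""
    xs = sorted(x for x, _ in points)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    return len(set(gaps)) <= 1

regular = [(0, 10), (7, 12), (14, 9), (21, 15)]    # weekly readings
irregular = [(0, 10), (3, 12), (14, 9), (21, 15)]  # gaps of 3, 11, then 7 days

print(equal_intervals(regular))    # True -> a line graph is appropriate
print(equal_intervals(irregular))  # False -> prefer bars or points instead
```

If the check fails, the data can either be resampled onto a regular interval or shown with a chart type that does not imply a constant rate of change between points.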
Slopegraph

http://www.storytellingwithdata.com/blog/2013/11/slopegraph-template
ICT394 Business Intelligence
Application Development
Topic 09 Part 05:
Bar Charts
Bar charts
Grades
[Column chart: number of students by grade (HD, D, C, P, SX, N, NA), y-axis 0 to 90]
Baseline

https://flowingdata.com/2014/04/04/fox-news-bar-chart-gets-it-wrong/
[Bar chart: 27 March vs 31 March Goal, y-axis 0 to 8,000,000]
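The baseline problem behind the Fox News example can be demonstrated in matplotlib (a sketch using the approximate enrolment figures from the chart; the library choice is ours, not the slides’):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

labels = ["27 March", "31 March Goal"]
values = [6_000_000, 7_066_000]  # approximate figures from the example

fig, ax = plt.subplots()
ax.bar(labels, values)
ax.set_ylim(bottom=0)  # bars encode value by length, so the baseline must be zero

# With a truncated axis (e.g. starting around 5,400,000) the second bar would
# look several times taller than the first, although the gap is under 20%.
print(ax.get_ylim()[0])
```

Forcing `bottom=0` is the single most important setting for any bar chart, because readers judge bars by their length, not their endpoints.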
Right/left…
[Two versions of the same bar chart: one with a truncated y-axis (5,400,000 to 7,200,000), one with a zero baseline (0 to 8,000,000)]
Goldilocks???
[Three bar charts comparing bar widths: “Too skinny”, “Too fat”, and “Just right”; categories 27 March, 31 March Goal, 4th April, 11th April]
https://en.wikipedia.org/wiki/
One, two, many…
[Grouped bar charts of the same data (years 2015, 2016 and 2017 across 27 March, 31 March Goal, 4th April and 11th April): one grouped by date, one grouped by year]
Vertical or horizontal?
[The same grouped bar chart drawn as vertical columns and as horizontal bars]
Stacked
[Stacked bar charts of the 2015 and 2016 series: a 100% stacked version (0 to 100%) and an absolute version (0 to 18,000,000)]
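Both stacked variants can be drawn from the same data. A matplotlib sketch (illustrative numbers, not a real dataset) using the `bottom` parameter to stack one series on another:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

labels = ["27 March", "31 March Goal", "4th April", "11th April"]
y2015 = [6_000_000, 7_066_000, 8_400_000, 8_500_000]  # illustrative values
y2016 = [6_500_000, 7_200_000, 8_100_000, 8_900_000]

fig, ax = plt.subplots()
ax.bar(labels, y2015, label="2015")
ax.bar(labels, y2016, bottom=y2015, label="2016")  # stack 2016 on top of 2015
ax.legend()

# The top of each stack is the category total, which is what an absolute
# stacked bar makes easy to compare; the 100% variant divides by that total.
totals = [a + b for a, b in zip(y2015, y2016)]
print(totals)
```

The trade-off to remember: only the bottom segment and the overall total sit on a common baseline, so the middle segments of a stack are hard to compare across bars.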
ICT394 Business Intelligence
Application Development
Topic 09 Part 06:
Other Types of Charts
Pies…
[Pie charts of the 2015 values (27 March: 6,000,000; 31 March Goal: 7,066,000; 4th April: 8,400,000; 11th April: 8,500,000), with and without data labels]
Donuts…

[Donut chart versions of the same 2015 data]
3-D
[3-D grouped bar chart of the same data across 2015, 2016 and 2017]
ICT394 Business Intelligence
Application Development
Topic 09 Part 07:
Topic Summary
Learning Outcomes
• At the completion of this topic, you should be able to:
– Explain the importance of visualisations in the BI lifecycle
– Create and use a variety of visualisations that are appropriate to the
purpose of the visualisation
– Critique a given visualisation and explain how it could be improved
if required
• This topic contributes to the following unit learning
outcomes:
– Demonstrate an understanding of the role of BI in organisations
– Understand that different users have different requirements for
information
– Present analyses of data using a number of different techniques
Lecture Outline
• What is visualisation?
• Types of visualisation
• Line Graphs
• Bar Charts
• Other types of Charts
• Topic Summary
Where to next?
• The next topic is Effective Visual Design
Topic 10

Effective Visualisation
Design

ICT394 BI Application
Development
Resources
• See Topic 10 Readings in Moodle for links
– Interaction design and Gestalt Principles (youtube)
– Perception in Visualisation (Healey)
– Tapping the Power of Visual Perception (Few)
Learning Outcomes
• This topic contributes to the following Unit Learning Outcomes:
– Demonstrate an understanding of the role of BI in organisations
– Understand that different users have different requirements for information
– Present analyses of data using a number of different techniques
• At the completion of this topic, you should be able to:
– Apply your understanding of the concept of cognitive load to visualisation
design
– Demonstrate how to remove “clutter” from visualisations
– Explain and give examples of the Gestalt Principles of Visual Perception in the
context of visualisations used in business intelligence
– Critique and improve a given visualisation
Lecture Outline
• Cognitive Load
• Gestalt Principles of Visual Perception
• Pre-attentive Attributes
• Topic Summary
ICT394 Business Intelligence
Application Development
Topic 10 Part 02:
Cognitive Load
Cognitive Load
• Is the mental effort required to learn new
things
– Recall that visualisations are there to “tell a story”
– If we are to effectively tell the story we want, then
we need to reduce the effort required by the
audience to understand the story
Memory in Visualisation Design
• Iconic
– “Graphics buffer”
• Short term
– RAM
• Long term
– HDD
Short term memory

https://www.perceptualedge.com/articles/ie/visual_perception.pdf
Clutter
• Is the “stuff” in our visualisations that doesn’t
add any value to the story we are telling
– ...making our visualisations more complicated
than necessary
– It increases the cognitive load without adding to
the story
• The audience might not be bothered to try to
understand
Reducing Clutter
• If we think about visualisations as messages
that we are sending, then we can think about:
– Information as the signal
– Clutter as the noise
• …we are trying to increase the signal to noise
ratio!
• Gestalt principles of Visual Perception tell us
how individuals tend to perceive order
Reduce clutter

http://www.storytellingwithdata.com/blog/2011/07/gridlines-are-gratuitous
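Increasing the signal-to-noise ratio is largely a matter of switching default decoration off. A matplotlib sketch (assumed defaults; not from the reading) that drops gridlines, the top/right frame, and tick marks:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [3, 5, 4, 6])

ax.grid(False)                         # gridlines rarely carry signal
ax.spines["top"].set_visible(False)    # the chart frame is pure decoration
ax.spines["right"].set_visible(False)
ax.tick_params(length=0)               # keep the labels, drop the tick marks

print(ax.spines["top"].get_visible())
```

Each line removed is cognitive load removed: nothing in the data changes, but the data-carrying ink now dominates what the audience sees.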
ICT394 Business Intelligence
Application Development
Topic 10 Part 03:
Gestalt Principles of Visual
Perception
Gestalt Principles of Visual Perception

• There are six principles that are of interest to us in terms of designing effective visualisations:
– Proximity
– Similarity
– Enclosure
– Closure
– Continuity
– Connection
Proximity
• Objects that are close together are seen to form
groups

https://au.pinterest.com/hannokoen/proximity-in-gestalt/
Similarity
• Objects of a similar colour, shape, size or
orientation are seen to relate to one another
or form a group

http://www.3rootsstudios.com/inspiration-graphic-design-gestalt/
Enclosure
• Objects that are physically enclosed together
(perhaps by a boundary) are seen to be part of
a group

http://www.excelcharts.com/blog/data-visualization-excel-users/gestalt-laws/
Closure
• Open structures are perceived as closed,
complete, and regular whenever there is a
way that they can be reasonably interpreted
as such

https://au.pinterest.com/hannokoen/proximity-in-gestalt
Continuity
• Objects that are aligned together or appear to
be a continuation of one another are
perceived as a group

https://au.pinterest.com/hannokoen/proximity-in-gestalt
Connection
• Objects that are connected (e.g., by a line) are
perceived as a group

https://au.pinterest.com/hannokoen/proximity-in-gestalt/
ICT394 Business Intelligence
Application Development
Topic 10 Part 04:
Preattentive Attributes
Preattentive Attributes
• These are things that “pop out” from things
you see

http://ed-informatics.org/2010/01/25/medical-computing-8/
Categories of preattentive attributes
• Few (2012) suggests there are three main categories
of preattentive attributes that are relevant to us:
– Form
– Colour
– Spatial position
• The other main category is Motion, though we
make less use of it than the others in the
context of BI
Form
• Length
• Width
• Orientation
• Shape
• Size
• Enclosure
https://www.interaction-design.org/literature/article/preattentive-visual-properties-and-how-to-use-them-in-information-visualization
Colour
• Hue
• Intensity

http://colorbrewer2.org/
Colour
• Hue
• Intensity

https://en.wikipedia.org/wiki/HSL_and_HSV
Spatial Position
• 2-D Position
• Grouping

https://en.wikipedia.org/wiki/HSL_and_HSV
Applying visual attributes to design
• Encoding of quantitative values
– FORM:
• Length, width (limited), size (limited)
– COLOUR
• Intensity (limited)
– Position
• 2D position
Context
• Context can have an impact on preattentive
attributes
http://www.cse.dmu.ac.uk/~sexton/WWWPages/Colour/colour4.html

http://www.thephilosophyresource.co.uk/wp-content/uploads/2013/10/plato_akrasia_illusion.png
Limits to distinct perceptions
• Preattentive symbols can become less useful if
we are not careful with how they are used
– There are limits to how many differences we can
process preattentively
• E.g., about 8 hues, 4 orientations, 4 sizes
– Use them wisely!
• Better to use fewer attributes than more
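One way to respect those limits (our sketch, not from the readings) is to keep only the top N categories as distinct hues and merge the remainder into “Other” before assigning colours:

```python
def limit_categories(counts, max_hues=8):
    """Keep the max_hues-1 largest categories; merge the rest into 'Other'."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    keep = dict(ranked[: max_hues - 1])
    other = sum(v for _, v in ranked[max_hues - 1:])
    if other:
        keep["Other"] = other
    return keep

# Hypothetical sales-by-country counts, purely illustrative
sales = {"AU": 50, "NZ": 30, "UK": 20, "US": 15, "DE": 5, "FR": 4, "JP": 3,
         "SG": 2, "MY": 2, "ID": 1}
print(limit_categories(sales, max_hues=5))
# {'AU': 50, 'NZ': 30, 'UK': 20, 'US': 15, 'Other': 17}
```

The chart then needs only five hues, which stays well inside the roughly eight distinctions a viewer can process preattentively.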
ICT394 Business Intelligence
Application Development
Topic 10 Part 05:
Topic Summary
Learning Outcomes
• This topic contributes to the following Unit Learning Outcomes:
– Demonstrate an understanding of the role of BI in organisations
– Understand that different users have different requirements for information
– Present analyses of data using a number of different techniques
• At the completion of this topic, you should be able to:
– Apply your understanding of the concept of cognitive load to visualisation
design
– Demonstrate how to remove “clutter” from visualisations
– Explain and give examples of the Gestalt Principles of Visual Perception in the
context of visualisations used in business intelligence
– Critique and improve a given visualisation
Lecture Outline
• Cognitive Load
• Gestalt Principles of Visual Perception
• Pre-attentive Attributes
• Topic Summary
Where to from here…
• Next topic is:
– Visualisation Best Practice
• Choosing the right chart type
• Creating effective views
• Designing holistic dashboards
• Perfecting your work
• Evaluating your work
Topic 11

Visualisation Best Practice

ICT394 BI Application
Development
Resources
• All readings are available from the Topic readings list on
Moodle:
– Tableau inc. n.d. ‘Visual Analysis Best Practices: Simple Techniques for
Making Every Data Visualization Useful and Beautiful.’
http://www.tableau.com/sites/default/files/media/whitepaper_visual-analysis-guidebook_0.pdf
– ‘Which Chart or Graph Is Right for You? | Tableau Software.’ n.d.
http://www.tableau.com/learn/whitepapers/which-chart-or-graph-is-right-for-you?ref=wc&signin=d16c3ac7d42e642ceb4fc753a4468325
– Few, Stephen. 2006. ‘Tapping the Power of Visual Perception.’ Chapter 3 in Information Dashboard Design: The Effective Visual Communication of Data, 48–76. Sebastopol, CA: O’Reilly.
Learning Outcomes
• This topic contributes to the following Unit Learning Outcomes:
– Demonstrate an understanding of the role of BI in organisations
– Understand that different users have different requirements for
information
– Present analyses of data using a number of different techniques
 
• At the completion of this topic, you should be able to:
– Design an effective dashboard
– Select the most appropriate chart or graph for a given situation, and
justify the choice
– Explain a process for evaluating your own design choices
Lecture Outline
• Choosing the right chart type
• Creating effective views
• Designing holistic dashboards
• Perfecting your work
• Evaluating your work
Choosing the right chart type
Choosing the right chart type
• Trends over time
• Ranking
• Correlation
• Distribution
• Part to whole
• Geographical data
Trends over time

http://vizwiz.blogspot.com.au/2012/10/stacked-area-chart-vs-line-chart-great.html
Trends over time
http://betterevaluation.org/evaluation-options/slopegraph
http://betterevaluation.org/evaluation-options/split_axis_bar_graph
http://betterevaluation.org/evaluation-options/LineGraph
Ranking
Correlation
Distribution
Part to Whole
Geographical data

https://twitter.com/simongerman600/status/735242824954699776/photo/1
Creating effective views
Emphasise the most important data
• The visualisation tools we use are very powerful
and so allow us to do some very sophisticated
things…
– For example, in Tableau you often have choices as to
where the measures are positioned in your graph:
• X or Y-axis
• Or colour, size or shape
– An important rule is that the more important data
should be shown on the x and y-axes
[Scatterplot with Home Size and Lot Size on the axes]
Chart orientation
Avoid overloading your graphs
• The tools will allow us to do a whole heap of
stuff that may or may not be useful to our
graphs
– Overloading is a common mistake
• because we can, and we get involved in telling our story
– we understand how the graph came together, but it is
not immediately obvious to others
Limit the number of colours and shapes in a
single graph
• Similar to the previous one, and as we have
discussed before, there are limits as to what
can be easily processed and understood
– Limit the number of colours and shapes in a single
graph to 7-10
Designing holistic dashboards
General dashboard design principles
• Most important graph at the top or top left
• If users are expected to move from one graph to
another, then go top to bottom and left to right
with the final view at the bottom right
• Limit the number of graphs to three or four
• Avoid using multiple colour schemes
• Group filters together
• Put the legend(s) with the filter(s)
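These layout principles can be sketched with matplotlib’s gridspec (an illustration only; dashboard tools such as Tableau handle layout their own way): the most important view spans the top, reading order runs left to right then down, and the view count stays small.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

fig = plt.figure()
grid = fig.add_gridspec(2, 2)

# Most important graph: full width across the top.
main = fig.add_subplot(grid[0, :])
# Supporting views follow the top-to-bottom, left-to-right reading order,
# with the final view at the bottom right.
left = fig.add_subplot(grid[1, 0])
right = fig.add_subplot(grid[1, 1])

main.set_title("Key trend")       # hypothetical view names
left.set_title("Breakdown")
right.set_title("Detail")

print(len(fig.axes))  # 3
```

Three views also keeps the dashboard inside the suggested limit of three or four graphs.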
Perfecting Your Work
Perfecting your work
• Colours
– Work together without clashing
– Less than 7-10
• Fonts
– Consistent use throughout
– No more than 3 different ones on a dashboard
• Labels
– Clear, concise
– Placement
– Levelling
• Tooltips
– Useful?
Evaluating your work
How do I know my visualisation is any good?

• You need to ask yourself a number of questions:
– What questions are you trying to answer?
– Do you have the right chart type for your analysis?
– Are your graphs effective?
– Is your dashboard holistic?
– Are there other things you could do to polish your
work up?
Topic 12

Unit Review

ICT394 BI Application
Development
Resources
• Learning Guides for each topic
– These are available on Moodle
• Readings and case studies for each topic
– The links for the readings for each topic are on
Moodle
Lecture Outline
• Topic Reviews
• Exam Mechanics and Technique Tips
Topic Reviews
Topic 01: What is BI?
• At the completion of this topic, you should be able to:
– Provide and discuss a working definition of BI
– Explain a variety of reasons as to why BI exists
– Provide an overview of BI in terms of data, analysis and
presentation
• Sample Exam Question:
– Provide a definition of BI and give an example of a BI
implementation including a clear explanation of how the
organisation was impacted.
• Using the example you provided for the question above, discuss
the major elements of BI; Data, Analysis and Presentation.
Topic 01 Case Study
• Hitachi Solutions Canada
– https://www.youtube.com/watch?v=hDJdkcdG1iA
– Grocery Store Metaphor
• Tesco
– https://www.youtube.com/watch?v=i83dyhyOTtw
– Loyalty Card (Clubcard)
Topic 02: BI Lifecycle
• At the completion of this topic, you should be
able to:
– Describe a high-level BI implementation roadmap
– Compare and contrast the implementation of BI
with system development projects
– Explain the activities that would typically happen
in the Justification and Planning stages of BI
application development
Topic 02: BI Lifecycle
• Sample Exam Questions:
– Moss and Atre (2003) suggest there are 6 major
development steps in the creation of BI. Provide a one
paragraph explanation of each of these steps.
– Explain, using examples as appropriate, how a business
case assessment for a BI project would proceed.
– Planning for a BI project will involve multiple steps:
Enterprise Infrastructure Evaluation and Project
Planning. Give an explanation of the activities that
would be involved with each of those steps.
Topic 02: Case Study
• University of Konstanz
– https://www.youtube.com/watch?v=zp0BbAO-GHU

– BI Implementation
• Reasons for implementing
• Existing systems
• Features of the solution
Topic 03: Data Warehousing
• At the completion of this topic, you should be
able to:
– Provide a definition of a data warehouse, including
examples as to how and in what circumstances it
would be used
– Describe and provide examples as to the difference
between operational and analytical databases
– Discuss the components of a data warehouse system
– Discuss the basic steps in the development of a data
warehouse
Topic 03: Data Warehousing
• Explain the following parts of the data warehouse definition:
– Structured repository
– Integrated
– Subject oriented
– Enterprise-wide
– Historical
– Time-variant
– Developed for the retrieval of analytical information
– May include data at the fine level of detail, summary data, or both
• Explain, using examples as appropriate, why a data warehouse would be created as a
separate data store?
• What are the major components of a data warehouse?
• Briefly describe the process of developing data warehouse front-end applications.
• Explain, using examples, why the data warehouse design process is an iterative process.
Topic 03: Case Study
• Daimler-Chrysler
– https://www.youtube.com/watch?v=N78lHpiCD0k
– Consolidation of multiple customer databases
• Group 1 Software
– Code-1 Plus
Topic 04: Data Warehouse Design
• At the completion of this topic, you should be
able to:
– Explain how and why dimensional modelling is
used to design data warehouses
– Create a dimensional model from a given case
study
– Explain, and be able to resolve, some of the
problems often associated with dimensional
modelling
Topic 04: Data Warehouse Design
• Sample Exam Questions
– Explain the role and give examples of the following:
• Dimension tables
• Fact tables
– How does the use of a dimensional model simplify
analytical queries?
– How is transaction time typically represented in a
dimensional model?
– Explain and give examples of what is meant by the
granularity of a fact table.
– Explain and give an example of a summarisability problem.
Topic 04: Case Study
• Dimensional Modeling Exercise
– Exercise 8.1 Jukic et al (2014) p. 266-7
Topic 05: Data Warehouse Implementation

• At the completion of this topic, you should be able to:
– Explain the role of the ETL process in BI, including
describing the activities involved in ETL design,
and deliverables that result from those activities
and the roles involved
– Discuss the different types of ETL programs
– Create a source-to-target mapping document
Topic 05: Data Warehouse Implementation

• Sample Exam Questions:


– Explain, using relevant examples, the three
different approaches to loading data into a target
BI database.
– Explain, using examples, why it is said that
Transformation takes 80% of the ETL effort.
Topic 05: Case Study
• Meson BI Feature Case Study
– Royal Liverpool Hospital
• Existing platform not delivering
• Benefits
• Controlled Access
• Dimensional Modeling Exercise
– Exercise E8.2 Jukic et al (2014)
Topic 06: OLAP
• At the completion of this topic, you should be
able to:
– Explain how and why OLAP is used
– Explain and give examples of the OLAP operators,
slice/dice, pivot and drill down/up
– Describe the difference between discrete and
continuous data and how they are treated in
Tableau
Topic 06: OLAP
• Sample Exam Questions:
– What is OLAP and what would it be used for?
– Explain the major differences between OLAP and
“traditional” query processing in terms of the
structures and operators
Topic 06: Case Study
• What is OLAP?
– Why is it more efficient for dimensional analysis?
– Challenges in implementing and managing
• OLAP Operators
– Exercise
• Drill-up/down
• Slice/Dice
• Pivot
Topic 07: Business Analytics
• At the completion of this topic, you should be
able to:
– Define and provide appropriate examples of business
analytics
– Explain, using examples, the differences between
descriptive, predictive and prescriptive analytics
– Provide examples of the use of different types of
reports
– Perform various analytics techniques on given data
sets
Topic 07: Business Analytics
• Sample Exam Questions:
– Using examples, list two desirable functionalities of a
reporting tool.
– Give an example of how visualization can assist with
decision making.
– Describe the differences between a metric management
report and a dashboard.
– Provide an example of each of the following: descriptive
analytics, predictive analytics, prescriptive analytics
– Provide an example of how insight is provided by analytics.
Topic 07: Case Study
• Allrecipes
– Use of analytics in this business
– Insight development
– Outcomes
• SMART Goals
– Rock mining video
– How to make goals SMARTer
Topic 08: Data Mining
• At the completion of this topic, you should be able
to:
– Define and give examples of data mining as an enabling
technology for business intelligence and analytics
– Understand and give examples of the objectives and
benefits of data mining
– Give examples of a wide range of data mining
applications
– Understand the standardised data mining process
(CRISP-DM)
Topic 08: Data Mining
• Sample Exam Questions
– Describe two of the main data mining application areas and
include examples of how data mining would be used in those
areas.
– Why is there a need for a standard data mining process? Explain
the steps of CRISP-DM.
– Explain, using an example, why there is a need for data pre-
processing. What are the main tasks and relevant techniques
used in data pre-processing?
– What is the main difference between classification and
clustering? Explain using examples of both.
– Explain why privacy is an issue for data mining.
Topic 08: Case Study
• The Checkout Video (Target)
– How data mining has changed advertising
– Possible outcomes from data mining
– Recommendations as a consumer
Topic 09: Introduction to Data Visualisation

• At the completion of this topic, you should be able to:
– Explain the importance of visualisations in the BI
lifecycle
– Create and use a variety of visualisations that are
appropriate to the purpose of the visualisation
– Critique a given visualisation and explain how it
could be improved if required
Topic 09: Introduction to Data Visualisation

• Sample exam questions


– Evaluations of graphs and charts
Topic 09: Case Study
• ViaWest BI
– Report creation
– Multiple systems as input
– Long term vision
Topic 10: Effective Visual Design
• At the completion of this topic, you should be
able to:
– Apply your understanding of the concept of cognitive
load to visualisation design
– Demonstrate how to remove “clutter” from
visualisations
– Explain and give examples of the Gestalt Principles of
Visual Perception in the context of visualisations used
in business intelligence
– Critique and improve a given visualisation
Topic 10: Effective Visual Design
• Sample Exam Questions:
– Explain and give an example of how cognitive load
impacts on the capacity of a visualisation to tell
the right story.
– Explain and give examples of the Gestalt principles
of Proximity and Similarity and how they could be
used in the design of an effective visualisation.
– Explain and give examples of how preattentive
attributes can be used in visualisation design.
Topic 10: Case Study
• Gestalt Principles
• Visualisation critiques
Topic 11: Visualisation Best Practice

• At the completion of this topic, you should be able to:
– Design an effective dashboard
– Select the most appropriate chart or graph for a
given situation, and justify the choice
– Explain a process for evaluating your own design
choices
Topic 11: Visualisation Best Practice
• Sample Exam Questions:
– Which chart type would be used in the scenario where
you were needing to show the approval ratings for three
US presidential candidates?
• What chart type would be needed if you needed to show how
the approval ratings had changed over the course of the
election campaign?
– List and explain two general best practice guidelines for
the design of an interactive dashboard.
– Where in a chart would you put the most important data
in order to ensure it was the most noticed?
Topic 11: Case Study
• Critique of dashboards
The Exam
When you click on the link…
Mechanics…
• 2 hours + 20 minutes duration
• Online, non-sequential (i.e., you can go back
to earlier questions), open book
• There are 20 questions, each worth 5 marks
• You should be spending 5-6 minutes on each
question and use this time as a guide as to
how much to write for each question
Exam technique…
• Most of the questions do not have right/wrong
answers, but are asking you to explain, discuss,
give an example of…
– The exam is an opportunity for you to demonstrate
what you have understood from the unit
• Most of the time, a one line answer will not attract that
many marks because you haven’t adequately answered
the question
– Of course, make sure you read the question
• ...and answer all of the bits
What are the questions?
• As a rough guide (some questions will cover
multiple topics)
– Topic 01: 1 question
– Topic 02: 1 question
– Topics 03, 04, 05: 6 questions
– Topics 06, 07, 08: 4 questions
– Topics 09, 10, 11: 5 questions + 3 charts
