
Schulich School of Business MBAN 5140

Visual Analytics & Modelling


David Elsner

Class 6 - Discussion

Fall 2022

SCHULICH SCHOOL OF BUSINESS | MBAN 5140 CLASS 6

VISUAL ANALYTICS AND MODELLING OCTOBER 20, 2022
Class 6
• Discussion:
• Hot or Not Visualizations
• Data Modelling Fundamentals
• Comparing Data Connection Options
• 10 Best Practices for Modelling Data

• Watch:
• Data Connection Options
• Showing Breakdowns of the Whole
• Viewing Specific Values
• Tableau Practice Examples

• Readings and Videos:


• “Spreadsheet Thinking vs. Database Thinking” by Robert Kosara
• “The self-sufficiency test” by Kaiser Fung
• Data Model Enhancements by Tableau [57m]
• Create a Delicious Data Cocktail: Joins versus Blends by Tableau [55m]

• Assignments:
• Weekly Quiz 5
• Exercise 3 assigned

Hot or Not

Hot or Not | The quest to power Africa

http://www.getchee.com/?p=15316

Hot or Not | Poll

Hot or Not | The Marvel Cinematic Universe

https://graphics.straitstimes.com/STI/STIMEDIA/Interactives/2018/04/marvel-cinematic-universe-whos-who-interactive/index.html

Hot or Not | Current State Assessment

Data Modelling Fundamentals
• What is Data Modelling?
• Types of Data Models
• Conceptual Data Models
• Logical Data Models
• Physical Data Models
• Data Modelling Process
• Types of Data Modelling
• Understanding Data
• Comparing Data Connection Options
• Relationships
• Joins
• Common Pitfalls
• Cross Database Joins
• Unions
• 10 Best Practices for Modelling Data
What is Data Modelling?
• Data modelling is the process of creating a visual representation of a data model for the data
to be stored. This data model is a conceptual representation of data objects, the associations
between different data objects, and the rules that govern them.
• The goal is to illustrate the types of data used and stored within the model, the relationships
among these data types, the ways the data can be grouped and organized, and its formats
and attributes.
• Data modelling helps in the visual representation of data and enforces business rules,
regulatory compliance, and government policies on the data.

• The Data Model is defined as an abstract model that organizes data description, data
semantics, and consistency constraints of data.
• The data model emphasizes what data is needed and how it should be organized. It is like a
roadmap or a diagram that facilitates a deeper understanding of what is being designed.
• Data models ensure consistency in naming conventions, default values, semantics, and security,
while ensuring the quality of the data.

Types of Data Models
• Like any design process, data model design begins at a high level of abstraction and becomes
more concrete and specific.
• Data models can be divided into three categories, which vary according to their degree of
abstraction.
• The process will start with a conceptual model, progress to a logical model and conclude with a
physical model.

Conceptual Data Models
• A big-picture view of what the model will contain, how it will be organized, and which business rules apply.

• Conceptual models are usually created as part of gathering initial project requirements.

• They include entity classes (defining the types of things that are important for the business to represent in the
data model), their characteristics and constraints, the relationships between them and relevant security and
data integrity requirements.

Entities in the example diagram: Customers, Employees, Orders, Products, Returns

Logical Data Models
• Less abstract and provide greater detail about the concepts and relationships under consideration.
• Formal data modeling notation systems are followed. These indicate data attributes, such as data types and
their corresponding lengths, and show the relationships among entities.
• Logical data models don’t specify any technical system requirements.
• Logical data models can be useful for projects that are data-oriented by nature, such as data warehouse
design or reporting system development.

Customers: Customer ID, Customer Name, First Sale Date, Gender, Market, Region, Rewards Member

Orders: Market, Region, Country, State, City, Employee ID, Customer ID, Order ID, Order Date, Year (OrderDate), Order Priority, Product ID, Product Name, Category, Sub-Category, Segment, Ship Date, Ship Mode, Payment Type, Discount, Profit, Quantity, Sales, Shipping Cost

Employees: Employee ID, Employee Name, Market, Region, Title, Hire Date, Birth Date, Email Address, Marital Status, Gender, Base Rate, Sales Quota, Profit Quota

Products: Product ID, Product Name, Colour, Brand, Category, Sub-Category

Returns: Returned, Order ID, Market

Physical Data Models
• Provide a schema for how the data will be physically stored within a database.
• They are the least abstract of the three models.
• Offer a finalized design that can be implemented as a relational database, including tables that illustrate the
relationships among entities as well as the primary keys that will be used to maintain those relationships

Customers: Customer ID (String), Customer Name (String), First Sale Date (Date), Gender (String), Market (String), Region (String), Rewards Member (String)

Orders: Market (String), Region (String), Country (String), State (String), City (String), Employee ID (String), Customer ID (String), Order ID (String), Order Date (Date), Year (OrderDate) (Number), Order Priority (String), Product ID (String), Product Name (String), Category (String), Sub-Category (String), Segment (String), Ship Date (Date), Ship Mode (String), Payment Type (String), Discount (Number), Profit (Number), Quantity (Number), Sales (Number), Shipping Cost (Number)

Employees: Employee ID (String), Employee Name (String), Market (String), Region (String), Title (String), Hire Date (Date), Birth Date (Date), Email Address (String), Marital Status (String), Gender (String), Base Rate (Number), Sales Quota (Number), Profit Quota (Number)

Products: Product ID (String), Product Name (String), Colour (String), Brand (String), Category (String), Sub-Category (String)

Returns: Returned (String), Order ID (String), Market (String)
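
As a rough illustration of how a physical model becomes an actual database, the sketch below creates a small subset of the tables in an in-memory SQLite database using Python's built-in sqlite3 module. The lowercase column names, the choice of primary keys, and the mapping of String/Date/Number to SQLite types are assumptions made for this example; they are not specified on the slide.

```python
import sqlite3

# Illustrative subset of the model. Key choices and SQLite types are assumptions.
schema = """
CREATE TABLE customers (
    customer_id     TEXT PRIMARY KEY,
    customer_name   TEXT,
    first_sale_date TEXT   -- dates stored as ISO text in this sketch
);
CREATE TABLE orders (
    order_id    TEXT PRIMARY KEY,   -- assumes one row per order
    customer_id TEXT REFERENCES customers(customer_id),
    order_date  TEXT,
    sales       REAL
);
CREATE TABLE returns (
    order_id TEXT REFERENCES orders(order_id),
    returned TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)

# List the tables that now physically exist in the database.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The key columns are what let rows in one table point at rows in another without copying their contents.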

Data Modelling Process
• Data modelling requires evaluating data processing and storage in detail. Data modeling techniques dictate
which symbols are used to represent the data, how models are laid out, and how business requirements are
conveyed. The approach is a workflow with a sequence of tasks that generally looks like this:

1. Identify the entities. The process begins with the identification of the things, events or concepts that are
represented in the data set that is to be modeled.

2. Identify key properties of each entity. Each entity type can be differentiated from others because it has
one or more unique properties, called attributes. For example, an entity called “customer” might
possess such attributes as a first name, last name, and phone number, while an entity called “address”
may include a street name and number, a city, province, country and postal code.

3. Identify relationships among entities. The data model will specify the type of relationships each entity
has with the others. For example, each customer “lives at” an address. If that model were expanded to
include an entity called “orders,” each order would be shipped to and billed to an address as well.

4. Map attributes to entities. This will ensure the model reflects how the business will use the data.

5. Assign keys and decide on a degree of normalization that balances redundancy with performance
requirements. Normalization is a technique for organizing data models in which identifiers, called
keys, are assigned to data to represent relationships between them without repeating the data. For
example, if customers are each assigned a key, that key can be linked to both their address and their
order history without having to repeat this information in the table of customer names.

6. Finalize and validate the data model. Data modeling is an iterative process that should be repeated
and refined as business needs change.
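
The steps above can be sketched in code. The example below is a minimal illustration, not a prescribed method: the class names, field names and sample records are all hypothetical. It models the customer, address and order entities from the text and uses keys so that address details are stored once rather than repeated on every order.

```python
from dataclasses import dataclass

@dataclass
class Address:
    address_id: int          # key (hypothetical name)
    street: str
    city: str
    province: str
    country: str
    postal_code: str

@dataclass
class Customer:
    customer_id: int         # key
    first_name: str
    last_name: str
    phone: str
    address_id: int          # relationship: customer "lives at" an address

@dataclass
class Order:
    order_id: int            # key
    customer_id: int         # relationship: order belongs to a customer
    ship_to_address_id: int  # relationship: order is shipped to an address
    bill_to_address_id: int  # relationship: order is billed to an address

# Made-up sample records, indexed by their keys.
addresses = {1: Address(1, "1 Example St", "Toronto", "Ontario", "Canada", "X0X 0X0")}
customers = {10: Customer(10, "Ada", "Example", "555-0100", address_id=1)}
orders = [Order(100, 10, 1, 1), Order(101, 10, 1, 1)]

# Follow the keys instead of repeating customer and address text on each order.
for order in orders:
    customer = customers[order.customer_id]
    ship_to = addresses[order.ship_to_address_id]
    print(order.order_id, customer.last_name, ship_to.city)
```

Both orders reuse the same address record through its key, which is the redundancy saving that normalization is after.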
Types of Data Modelling
Data modelling has two common model types:
• Hierarchical data models
• Represent one-to-many relationships in a treelike format. In this type of model, each record
has a single root or parent which maps to one or more child tables.
• This model was introduced in 1966 and was widely used in banking. While this approach is less
efficient than more recent models, it’s still used in Extensible Markup Language (XML)
systems and geographic information systems (GISs).
• Relational data models
• Initially proposed by IBM researcher E.F. Codd in 1970. They are still used today in relational
databases used in enterprise computing.
• Relational data modelling doesn’t require a detailed understanding of the physical
properties of the data storage being used. Data segments are explicitly joined using tables,
reducing database complexity.

Types of Data Modelling
• Relational databases typically use structured query language (SQL) for data management. They work well for
maintaining data integrity and minimizing redundancy. They’re often used in POS and transaction systems.

• Entity-relationship (ER) data models use diagrams to represent the relationships between entities in a
database. Data architects use them to create visual maps showing database design objectives.

[ER diagram: the entities Supplier, Delivery, Order Detail Delivery, Product, Order Detail, Order, Store and Headquarters, linked by shared key fields such as Delivery ID, Supplier ID, Order ID, Order Detail ID, Product ID, Headquarters ID and Store ID. Other fields shown include Delivery Date, Order Date and Product Quantity.]

Types of Data Modelling
• Object-oriented data models became popular in the mid-1990s. The “objects” are abstractions of real-
world entities. Objects are grouped in hierarchies and have associated features. Object-oriented
databases can incorporate tables and support more complex data relationships. This approach is
employed in multimedia and hypertext databases.

Object 1: Sales Report
Attribute      Instance
Month          05-15-22
Product Code   1234
Vendor         897-STA
Revenue        $345.67

Object 2: Sales Activity

Customer
Product Code
Product Name
Sales Associate
Date of Sale
Price

Types of Data Modelling
• Dimensional data models were designed to optimize data retrieval speeds for analytic purposes in a
data warehouse. Dimensional models increase redundancy to make it easier to locate information for
reporting and retrieval. This modeling is typically used in OLAP systems.

• A popular dimensional data model is the star schema, with data organized into facts (measurable items) and
dimensions (reference information), with each fact surrounded by its dimensions in a star-like pattern.

Orders facts table: Market, Region, Country, State, City, Employee ID, Customer ID, Order ID, Order Date, Year (OrderDate), Order Priority, Product ID, Product Name, Category, Sub-Category, Segment, Ship Date, Ship Mode, Payment Type, Discount, Profit, Quantity, Sales, Shipping Cost

Customer dimensions table: Customer ID, Customer Name, First Sale Date, Gender, Market, Region, Rewards Member

Employees dimensions table: Employee ID, Employee Name, Market, Region, Title, Hire Date, Birth Date, Email Address, Marital Status, Gender, Base Rate, Sales Quota, Profit Quota

Products dimensions table: Product ID, Product Name, Colour, Brand, Category, Sub-Category

Returns dimensions table: Returned, Order ID, Market

Understanding Data
Tall vs. Wide
• Data usually comes in one of two formats: tall (normalized) or wide (typical Excel format).
• When working with databases and structured data, tall data is best practice as it records an entry for each
observation in the underlying information.
• When connecting data to Tableau, using tall data allows you to use Tableau to its fullest capabilities.

Tall
Province  Year  Value
Ontario   1980  2
Ontario   1990  5
Ontario   2000  2
Ontario   2010  5
Ontario   2020  4
Quebec    1980  3
Quebec    1990  1
Quebec    2000  4
Quebec    2010  1
Quebec    2020  5
Alberta   1980  3
Alberta   1990  9
Alberta   2000  8
Alberta   2010  7
Alberta   2020  5

Wide
Province  1980  1990  2000  2010  2020
Ontario   2     5     2     5     4
Quebec    3     1     4     1     5
Alberta   3     9     8     7     5

Understanding Data

Wide

Value
Province  1980  1990  2000  2010  2020
Ontario   2     5     2     5     4
Quebec    3     1     4     1     5
Alberta   3     9     8     7     5

Inventory
Province  1980  1990  2000  2010  2020
Ontario   25    23    36    40    37
Quebec    21    17    12    24    37
Alberta   22    19    14    35    32

• Joining information between two wide tables is a complicated process that cannot be performed in Tableau.
• Instead of joining two rows of information together, with wide data you must join two intersections of a
matrix. This introduces a high degree of difficulty and is not robust or flexible.
• Years are dimensional and do not need to be recorded as separate columns. Instead, this data should be
pivoted to provide a single row per record in the table and one Year column.
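
The pivot described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, using the Value table from this slide; the function name `wide_to_tall` is hypothetical, and in practice the same reshaping would usually be done with a pivot feature in the analysis tool itself.

```python
# The wide Value table from the slide: one column per year.
wide = [
    {"Province": "Ontario", "1980": 2, "1990": 5, "2000": 2, "2010": 5, "2020": 4},
    {"Province": "Quebec",  "1980": 3, "1990": 1, "2000": 4, "2010": 1, "2020": 5},
    {"Province": "Alberta", "1980": 3, "1990": 9, "2000": 8, "2010": 7, "2020": 5},
]

def wide_to_tall(rows, id_col, value_name):
    """Turn every year column into its own row with a single Year column."""
    tall = []
    for row in rows:
        for col, value in row.items():
            if col != id_col:
                tall.append({id_col: row[id_col], "Year": int(col), value_name: value})
    return tall

tall = wide_to_tall(wide, "Province", "Value")
print(len(tall))   # 3 provinces x 5 years = 15 rows
print(tall[0])     # {'Province': 'Ontario', 'Year': 1980, 'Value': 2}
```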

Understanding Data

Tall (Normalized)

• The primary advantage of using tall data is that when joining information from two tables, you are
referencing individual rows.
• In the example, joining inventory values to the value information is as simple as finding both rows that
have Province = ‘Ontario’ and Year = ‘2000’.
• Tall data functions like a pivot table in Excel, allowing you to efficiently cut, slice and group.

Value                     Inventory
Province  Year  Value     Province  Year  Inventory
Ontario   1980  2         Ontario   1980  25
Ontario   1990  5         Ontario   1990  23
Ontario   2000  2         Ontario   2000  36
Ontario   2010  5         Ontario   2010  40
Ontario   2020  4         Ontario   2020  37
Quebec    1980  3         Quebec    1980  21
Quebec    1990  1         Quebec    1990  17
Quebec    2000  4         Quebec    2000  12
Quebec    2010  1         Quebec    2010  24
Quebec    2020  5         Quebec    2020  37
Alberta   1980  3         Alberta   1980  22
Alberta   1990  9         Alberta   1990  19
Alberta   2000  8         Alberta   2000  14
Alberta   2010  7         Alberta   2010  35
Alberta   2020  5         Alberta   2020  32
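
To make the row-matching idea concrete, here is a small sketch in plain Python that joins a few Ontario rows from the two tall tables on the pair (Province, Year). The variable names are hypothetical and only a subset of the rows is included to keep the example short.

```python
# A few rows from each tall table on the slide.
value_rows = [
    {"Province": "Ontario", "Year": 1980, "Value": 2},
    {"Province": "Ontario", "Year": 1990, "Value": 5},
    {"Province": "Ontario", "Year": 2000, "Value": 2},
]
inventory_rows = [
    {"Province": "Ontario", "Year": 1980, "Inventory": 25},
    {"Province": "Ontario", "Year": 1990, "Inventory": 23},
    {"Province": "Ontario", "Year": 2000, "Inventory": 36},
]

# Index one table by the join key so each row can be found directly.
inventory_by_key = {(r["Province"], r["Year"]): r["Inventory"] for r in inventory_rows}

# Join: keep rows whose (Province, Year) key exists in both tables.
joined = [
    {**r, "Inventory": inventory_by_key[(r["Province"], r["Year"])]}
    for r in value_rows
    if (r["Province"], r["Year"]) in inventory_by_key
]

print(joined[2])  # {'Province': 'Ontario', 'Year': 2000, 'Value': 2, 'Inventory': 36}
```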

Comparing Data Connection Options
• A data source is comprised of data connections. The connections used will depend on what
data needs to be accessed.

Method: Relationship
• Definition: Relationships are a dynamic, flexible way to combine data from multiple tables for analysis.
• Use this when: The data to be analyzed exists in separate tables in the database.

Method: Join
• Definition: A relationship of two tables in a single database using a common field (a key field or indexed field).
• Use this when: The data to be analyzed exists in separate tables in the database.

Method: Cross Database Joins
• Definition: A relationship across two different databases or text tables based on a common field.
• Use this when: The data to be analyzed exists in different data sources.

Method: Union
• Definition: Appended rows from different tables with the same column names.
• Use this when: The tables have the same columns but are not stored in the same file.

Method: Blend
• Definition: A combination of data from different databases or text tables based on a common dimension. Behaves like a left join.
• Use this when: The data to be analyzed exists in different data sources.

Relationships
• Relationships are a new, flexible way to combine data for multi-table analysis in Tableau. You
don’t define join types for relationships, so you won’t see a Venn diagram when you create them.
• No up-front join type. You only need to select matching fields to define a relationship (no
join types). Tableau attempts to create the relationship based on existing key constraints
and matching field names. You can then check to ensure they are the fields you want to
use or add more field pairs to better define how the tables should be related.
• Automatic and context-aware. Tableau automatically selects join types based on the fields
being used in the visualization. During analysis, Tableau adjusts join types intelligently and
preserves the native level of detail in your data.
• Flexible. Relationships can be many-to-many and support full outer joins. When you
combine tables using relationships, it’s like creating a custom, flexible data source for every
viz, all in a single data source for the workbook.
• https://help.tableau.com/current/pro/desktop/en-us/datasource_multitable_normalized.htm

Joins
• Joins relate two or more tables in a database into a single result set.

• The tables must have a common field to define the relationship between the rows (usually a key field).

• Joins can be used to retrieve additional columns to your result set from a different table, but the table
relationship must be first defined in Tableau.

• You can join multiple tables in Tableau, but these relationships need to be defined for each new table included
in the data source.

• There can be multiple clauses within each join and clauses can be different when joining more than two tables.

• In the example tables below, both tables contain a column for the Product field, but the values aren't the same
in each table.

Sales Table
Product  Customer  Sales
1        A         $10
2        B         $20
3        C         $30

Product Table
Product  Detail
1        Furniture
2        Books
4        Supplies

Joins

• Inner: Returns records from sales and product only when there is a matching record in both tables.

• Left: Returns all records from sales and the matching records from product. In this example, sales with a
null or non-matching product are also returned.

• Right: Returns all records from product and the matching records from sales. In this example, products with
a null or non-matching sales are included (i.e. they have no sales).

• Full Outer: Returns all records from sales and product. In this example, matching and non-matching records
from both tables are returned. This is the equivalent of doing a union on a left outer and right outer join.

Inner Join
Sales Table
Product  Customer  Sales
1        A         $10
2        B         $20
3        C         $30

Product Table
Product  Detail
1        Furniture
2        Books
4        Supplies

• An inner join is the default join in Tableau. An inner join returns transactions that have been recorded in both
tables.

• In the following example, using an inner join to relate the Sales to the Product table returns only records where
a match is found for the Product number.

Product Customer Sales Detail


1 A $10 Furniture
2 B $20 Books

• In this example, you would not see the following records returned for an inner join:

• Products that have not been sold, because they do not have a corresponding row in the Sales table.

• A sales transaction with no matching record in the Product table: for example, the sale of Product 3 to
Customer C. The product table does not have an item number 3.

Left Join
Sales Table
Product  Customer  Sales
1        A         $10
2        B         $20
3        C         $30

Product Table
Product  Detail
1        Furniture
2        Books
4        Supplies

• A left join returns all rows from the left table and only matching rows from the right table.

• The following example shows the results of a left join with Sales on the left and Product on the right.

Product  Customer  Sales  Detail
1        A         $10    Furniture
2        B         $20    Books
3        C         $30    Null

• The left join more accurately reflects the actual sales transactions that have occurred.

• Although the Detail is not included because there isn’t a match, the sale of Product 3 to Customer C is included
in this result.
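
For readers who want to reproduce the inner and left join results outside Tableau, the sketch below loads the two example tables into an in-memory SQLite database with Python's built-in sqlite3 module. The lowercase table and column names are assumptions made for this example, and sales amounts are stored as plain integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales   (product INTEGER, customer TEXT, sales INTEGER);
CREATE TABLE product (product INTEGER, detail TEXT);
INSERT INTO sales   VALUES (1, 'A', 10), (2, 'B', 20), (3, 'C', 30);
INSERT INTO product VALUES (1, 'Furniture'), (2, 'Books'), (4, 'Supplies');
""")

# Same query for both join types; only the join keyword changes.
query = """
SELECT s.product, s.customer, s.sales, p.detail
FROM sales AS s
{join} JOIN product AS p ON s.product = p.product
ORDER BY s.product
"""

inner_rows = conn.execute(query.format(join="INNER")).fetchall()
left_rows = conn.execute(query.format(join="LEFT")).fetchall()

print(inner_rows)  # only products 1 and 2
print(left_rows)   # product 3 is kept, with None (null) for detail
```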

Right Join
Sales Table
Product  Customer  Sales
1        A         $10
2        B         $20
3        C         $30

Product Table
Product  Detail
1        Furniture
2        Books
4        Supplies

• A right join uses logic similar to a left join but changes the direction of the join.
• A right join returns all rows from the table on the right and only matching rows from the left table.
• In the following example, a right join is useful for identifying which products have not been sold. If the
database does not support a right join, you can achieve the same result by placing the Product table on the
left and then creating a left join to the Sales table.

Product  Customer  Sales  Detail
1        A         $10    Furniture
2        B         $20    Books
4        Null      Null   Supplies

Full Outer Join
Sales Table
Product  Customer  Sales
1        A         $10
2        B         $20
3        C         $30

Product Table
Product  Detail
1        Furniture
2        Books
4        Supplies

• A full outer join returns all records from both tables and leaves nulls where there is no match between the
two.

Product  Customer  Sales  Detail
1        A         $10    Furniture
2        B         $20    Books
3        C         $30    Null
4        Null      Null   Supplies

• You can use the above result to find missing details about a transaction and identify where product sales are
poor.
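
The right and full outer joins can also be sketched in plain Python, which makes the earlier description of a full outer join as a union of a left and a right join easy to see. This is an illustrative sketch only: the variable names are hypothetical, and it assumes each product appears at most once in the sales data, as in the slide example.

```python
sales = [(1, "A", 10), (2, "B", 20), (3, "C", 30)]      # product, customer, sales
product = {1: "Furniture", 2: "Books", 4: "Supplies"}   # product -> detail

# Left join: every sale, with the product detail when one exists.
left = [(p, c, s, product.get(p)) for p, c, s in sales]

# Right join: every product, with the matching sale when one exists.
sold = {p: (c, s) for p, c, s in sales}  # assumes one sale per product
right = [(p, *sold.get(p, (None, None)), d) for p, d in product.items()]

# Full outer join: union of both results, with matched rows counted once.
full = sorted(set(left) | set(right))

for row in full:
    print(row)
```

Rows that appear in both the left and right results (products 1 and 2) collapse to a single row in the union, while products 3 and 4 each keep their None placeholders.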

Conceptual Review of Joins

Employee Table
Employee ID  Employee Name
1            John
2            Susan
3            Chris

Dependent Table
Employee ID  Dependent Name
1            Karen
2            Lisa
3            Robert
4            Mike

Q1: To see only employees who have dependents, you would use a _________ join.

Q2: To see all employees and who their dependents are, a _________ join would be appropriate.

Q3: To see all dependents and which employees they are associated with, you could use a _________ join.
(In this example, Mike is not associated with any employee.)

Q4: To see all employees and all dependents and leave fields null when there is no match, a _________ join
would be appropriate.

Relationships versus Joins
Relationships
• Are displayed as flexible noodles between logical tables.
• Require you to select matching fields between two logical tables.
• Do not require you to select join types.
• Make all row and column data from related tables potentially available in the data source.
• Maintain each table's level of detail in the data source and during analysis.
• Create independent domains at multiple levels of detail. Tables aren't merged together in the data source.
• During analysis, create the appropriate joins automatically, based on the fields in use.
• Do not duplicate aggregate values (when Performance Options are set to Many-to-Many).
• Keep unmatched measure values (when Performance Options are set to Some Records Match).

Joins
• Are displayed with Venn diagram icons between physical tables.
• Require you to select join types and join clauses.
• Joined physical tables are merged into a single logical table with a fixed combination of data.
• May drop unmatched measure values.
• May duplicate aggregate values when fields are at different levels of detail.
• Can be defined using calculated fields and inequality operators.
• Support scenarios that require a single table of data, such as extract filters and row level security.

Common Pitfalls A
• When analyzing and joining data, it is imperative you fully understand the data you are working with. The
level of granularity of individual tables can have huge implications on joining conditions and results.
• If the granularity of your two tables doesn’t match, you may duplicate information.
• This can also occur if you don’t specify your join with enough detail.

Master Table
Fruit   Facility  Sales
Orange  A         $12.00
Orange  B         $15.00
Apple   A         $21.00
Apple   B         $25.00

Table being joined
Fruit   Inventory
Orange  100
Apple   175

Output Table
Fruit   Facility  Sales   Inventory
Orange  A         $12.00  100
Orange  B         $15.00  100
Apple   A         $21.00  175
Apple   B         $25.00  175

Common Pitfalls B

• When joins are not defined at the correct level of detail, it can increase the total size of your ‘master’
table.
• In this example, Tableau sees two (2) successful joins to the sales table for each fruit because the join
criterion has been set to Fruit.
• Consequently, it duplicates the original inventory record to accommodate this ‘double success’.

Master Table
Fruit   Inventory
Orange  100
Apple   175

Table being joined
Fruit   Facility  Sales
Orange  A         $12.00
Orange  B         $15.00
Apple   A         $21.00
Apple   B         $25.00

Output Table
Fruit   Inventory  Facility  Sales
Orange  100        A         $12.00
Orange  100        B         $15.00
Apple   175        A         $21.00
Apple   175        B         $25.00
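
The practical consequence of this duplication is easiest to see by totalling inventory before and after the join. The sketch below uses the fruit tables from this slide in plain Python; the variable names are hypothetical.

```python
inventory = [("Orange", 100), ("Apple", 175)]                 # fruit, inventory
sales = [("Orange", "A", 12.00), ("Orange", "B", 15.00),
         ("Apple", "A", 21.00), ("Apple", "B", 25.00)]        # fruit, facility, sales

# Join on Fruit only: each inventory row matches two sales rows.
joined = [(fruit, inv, facility, amount)
          for fruit, inv in inventory
          for s_fruit, facility, amount in sales
          if fruit == s_fruit]

true_inventory = sum(inv for _, inv in inventory)
joined_inventory = sum(inv for _, inv, _, _ in joined)

print(len(joined))        # 4 rows instead of the original 2
print(true_inventory)     # 275
print(joined_inventory)   # 550: every inventory figure is counted twice
```

Sales totals are unaffected here, because the sales table is already at the finer (Fruit, Facility) level of detail; only the coarser inventory measure is inflated.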

Cross Database Joins
• Tableau supports data integration, where a single data source may combine data from multiple, potentially heterogeneous data
connections.
• Data integration results in a row-level cross-database join.

Setting Up a Multi-Connection Data Source


• To set up a multi-connection data source, connect to the primary data source and then click the Add link to add a connection.

• If the connector is not available in the Add a Connection list, cross-database joins are not supported for that combination of data
sources.
• For example, you cannot integrate connections to cube data, connections that require an extract, or Tableau Server data sources.
• Instead of joining tables, consider using data blending.

Unions
• Another way to connect tables is by using a union.
• A join appends columns from one table to another table. In contrast, a union appends rows from one table to another table.
• For best results, the tables that you combine must have the same structure. That is, each table must have the same number of
fields and related fields must have matching field names and data types.

• For example, consider two tables of sales data in an Excel file:

First table
Market  Category  Sales
West    A         $10
West    B         $20
East    B         $30

Second table
Market  Category  Sales
East    C         $20
West    A         $40
East    C         $10

• The resulting union, containing rows from both tables:

Market Category Sales


West A $10
West B $20
East B $30
East C $20
West A $40
East C $10

• You can union two or more tables from Excel or Google Sheets workbooks, text files, JSON files, or Google
BigQuery, Microsoft SQL Server and Oracle databases. To union your data, the tables must come from the same connection.
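
Conceptually, a union is just row-wise concatenation of tables that share the same structure. The sketch below shows this with the two sales tables from the example in plain Python; the `union` helper and its field-count check are hypothetical and stand in for the structural requirement described above.

```python
COLUMNS = ("Market", "Category", "Sales")

first = [("West", "A", 10), ("West", "B", 20), ("East", "B", 30)]
second = [("East", "C", 20), ("West", "A", 40), ("East", "C", 10)]

def union(*tables, width=len(COLUMNS)):
    """Append the rows of each table; every row must have the same number of fields."""
    rows = []
    for table in tables:
        for row in table:
            if len(row) != width:
                raise ValueError(f"expected {width} fields, got {len(row)}")
            rows.append(row)
    return rows

combined = union(first, second)
print(len(combined))  # 6 rows: 3 from each table
```

Note that the repeated category rows are kept as-is: a union appends rows and does not remove or aggregate them.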
Unions
Merging fields
• Some tables might contain columns of the same data but might have different names.
• For example, one table might contain a field named Customer ID, but another table might use the field name Cust. ID.
Performing a union on tables with slight differences can result in null values.

• In this example, one table used Billing Rate, and another table used Bill Rate, resulting in a union
containing null values:

Associate  Billing Rate  Bill Rate  Hours
Susan      A             Null       20
John       B             Null       5
Amy        A             Null       15
Mike       Null          B          30
Lisa       Null          A          8
Chris      Null          C          11

• To avoid nulls, you can use the Merge mismatched fields option to edit the union:

Associate  Billing Rate & Bill Rate  Hours
Susan      A                         20
John       B                         5
Amy        A                         15
Mike       B                         30
Lisa       A                         8
Chris      C                         11

• After you merge fields, you can use the field generated from the merge in a pivot or split, or use the field as a join key. You can
also change the data type of the field generated from a merge.
• NOTE: Only merge fields that were mismatched during a union. Using merge to combine unrelated columns can produce unpredictable results.

• How to Merge Mismatched Fields


1. Perform either a manual or a wildcard union.
2. Select the columns in the data preview that you want to merge.
3. Right-click the selected columns and then click Merge Mismatched Fields.
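
The effect of merging mismatched fields can be imitated in plain Python by taking, for each row, whichever of the two columns is not null. This is only a sketch of the idea using a few rows from the Billing Rate example; the `merge_fields` function is hypothetical and is not Tableau's implementation.

```python
first = [{"Associate": "Susan", "Billing Rate": "A", "Hours": 20},
         {"Associate": "John", "Billing Rate": "B", "Hours": 5}]
second = [{"Associate": "Mike", "Bill Rate": "B", "Hours": 30},
          {"Associate": "Lisa", "Bill Rate": "A", "Hours": 8}]

# Union with mismatched names: each row is null (None) in the column it lacks.
columns = ["Associate", "Billing Rate", "Bill Rate", "Hours"]
unioned = [{c: row.get(c) for c in columns} for row in first + second]

def merge_fields(rows, a, b, merged_name):
    """Replace columns a and b with one column holding the non-null value."""
    merged = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in (a, b)}
        new_row[merged_name] = row[a] if row[a] is not None else row[b]
        merged.append(new_row)
    return merged

merged = merge_fields(unioned, "Billing Rate", "Bill Rate", "Billing Rate & Bill Rate")
print(unioned[2])  # Mike has None under Billing Rate before the merge
print(merged[2])   # Mike has B under the merged column afterwards
```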

10 Best Practices for Modelling Data
1. Understand the audience
2. Recognize the life of the model
3. Construct a logical layout
4. Use a consistent colour scheme
5. Provide assumptions and data validation
6. Build error checks and documentation
7. Incorporate effective analysis
8. Complete the formatting
9. Provide workbook independence
10. Keep learning data tools

10 Best Practices for Modelling Data
1. Understand the audience

• You need to have a clear idea about who your audience is. Are you developing the model for
yourself, an expert user, or a novice? This will help you decide what level of detail you need.
• Before building a model, ask the question: “If I were going to use this model, what information
and insights would I need to see?”

KEY TIPS

✓ Customize the model for potential users, those using it today and in the future
✓ Provide easy-to-understand instructions for non-technical users
✓ Ensure the model has a dashboard that can provide results and insights customized for decision makers

10 Best Practices for Modelling Data
2. Recognize the life of the model

• Are you going to use the model for just a few days, or is it something that you can use for a long
time? This is important because you should not spend many hours on something that you need
for just one day. The longer the model will be useful, the more sophistication is justified.
• A good model is flexible and capable of adapting to changing business conditions. It should be
built to allow for future requirements, to reduce the need for major changes when making
updates.

KEY TIPS

✓ Think about how many hours you want to invest in building the model
✓ Incorporate extra capacity for new data and inputs to avoid having to make model/calculation changes
✓ Ensure that future users are considered in developing model functionality

10 Best Practices for Modelling Data
3. Construct a logical layout

• Your model should have a clean layout, with inputs on one side and outputs organized
separately.
• A clear structure will help users understand the model quickly.
• People are used to reading from top-to-bottom, left-to-right. Avoid calculations that move
within and across tables in an unclear pattern.
• Group data and label all inputs and outputs. This will reduce the risk of errors.

KEY TIPS

✓ Set up tables and calculations to function in a logical manner that most people would typically expect
✓ Work from top-to-bottom and left-to-right to make the model flow better and avoid confusion
✓ Arrange and colour-code the worksheets to represent inputs, outputs, and calculations

10 Best Practices for Modelling Data
4. Use a consistent colour scheme

• Follow a colour scheme consistently. One scheme is: keep all calculated values (and titles) in
black, user inputs/assumptions in blue, and references/sourced assumptions in green.
• Colour-code worksheets to represent inputs, outputs, and calculations.

KEY TIPS

✓ Keep colours consistent across worksheets and workbooks
✓ Use colours that follow standard conventions: green = good/up, red = bad/down
✓ Colour-code different groups of information

10 Best Practices for Modelling Data
5. Provide assumptions and data validation

• All assumptions about the inputs should be clearly stated in the form of comments.
• Use data validation (dropdown menu) to avoid accidental wrong input by the user.
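
The idea behind a dropdown can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of allowed values and the function name `validate_choice` are hypothetical, and a spreadsheet tool would enforce the same rule through its own data validation feature.

```python
# Hypothetical list of allowed values, playing the role of a dropdown menu.
ALLOWED_BILLING_RATES = ("A", "B", "C")


def validate_choice(value, allowed):
    """Return the cleaned value if it is an allowed option, otherwise raise."""
    cleaned = value.strip().upper()
    if cleaned not in allowed:
        raise ValueError(f"{value!r} is not one of: {', '.join(allowed)}")
    return cleaned


# A valid entry is normalized; an invalid entry is rejected before it
# can reach any calculation.
rate = validate_choice(" b ", ALLOWED_BILLING_RATES)
```

Rejecting a bad value at the point of entry is the design choice here: the user sees the problem immediately instead of discovering it later in an output.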

KEY TIPS

✓ Ensure users understand all the assumptions used to create the model
✓ Use data validation to make data entry fast, easy, and correct
✓ Review assumptions with users to make sure they are applied correctly to the model

10 Best Practices for Modelling Data
6. Build error checks and documentation

• Error checks should be built into the model at each stage. A 2012 study found that 75% of
spreadsheets contain errors. It’s easier to find and fix errors if models are simple and clear.
• Place the checks where they are visible and use colours to flag errors. This allows for real-time
error checking when changing inputs.
• If you have used programming or complex formulas, try to comment as much as possible so that
it’s easy for anyone to understand and modify it.
• Avoid programming and complexity by treating the problem differently – changing the order of a
calculation, the steps of a calculation, or using simple algebra.
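
One common kind of error check is a reconciliation: confirm that the detail rows add up to the total the model reports. The sketch below is a minimal example, assuming a hypothetical function name `check_total` and a hypothetical tolerance; the hours are borrowed from the union example earlier in the deck.

```python
import math


def check_total(line_items, reported_total, tolerance=0.01):
    """Return 'OK' if the line items add up to the reported total, else 'ERROR'."""
    difference = math.fsum(line_items) - reported_total
    return "OK" if abs(difference) <= tolerance else "ERROR"


# Hours per associate from the union example.
hours = [20, 5, 15, 30, 8, 11]

# The check passes when the reported total matches the detail rows,
# and fails when a total is mistyped (here, two digits transposed).
status_good = check_total(hours, 89)
status_bad = check_total(hours, 98)
```

In a workbook, the same comparison would sit in a visible cell next to the total, with conditional formatting turning the cell red when the status is ERROR.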

KEY TIPS

✓ Keep formulas simple to make it easier to find an error
✓ Incorporate automated error checking to ensure the model produces expected values
✓ Provide documentation for current and future users to understand how the model functions

10 Best Practices for Modelling Data
7. Incorporate effective analysis

• Incorporate tools for changing scenarios.
• Effective models have tools to answer the right questions quickly.
• Models need to be capable of dealing with changing inputs and scenarios.
• A good model will let users run sensitivities on key model drivers. It should be dynamic and
robust to allow different scenarios with only a few clicks.
• An answer should be five seconds away.
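
Running a sensitivity on a key driver can be sketched as follows. This is a toy example, not a prescribed method: the profit formula, the base-case numbers, and the function names `profit` and `sensitivity` are all hypothetical and chosen only to show the mechanics.

```python
def profit(units, price, unit_cost, fixed_cost):
    """Toy model: contribution margin on all units sold, less fixed cost."""
    return units * (price - unit_cost) - fixed_cost


# Hypothetical base-case inputs, kept in one place so scenarios are easy to change.
BASE_CASE = {"units": 1000, "price": 25.0, "unit_cost": 15.0, "fixed_cost": 6000.0}


def sensitivity(model, base_case, driver, changes):
    """Re-run the model with one driver scaled by each relative change."""
    results = {}
    for change in changes:
        scenario = dict(base_case)  # copy so the base case is never modified
        scenario[driver] = base_case[driver] * (1 + change)
        results[change] = model(**scenario)
    return results


# Profit when price moves 10% down, stays flat, or moves 10% up.
price_sensitivity = sensitivity(profit, BASE_CASE, "price", [-0.10, 0.0, 0.10])
```

Because all inputs live in a single base case and the model is a single function, testing another driver such as `unit_cost` takes one more call rather than a rebuilt model.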

KEY TIPS

✓ Ensure the model has the key drivers and a dynamic framework to quickly analyze different scenarios
✓ Allow for quick and easy sensitivity analysis
✓ Reduce the number of clicks required to make changes in the analysis

10 Best Practices for Modelling Data
8. Complete the formatting

• Once the model is complete, you need to format the model.
• Some things you can do: remove clutter, outline the important data, and use consistent font sizes.
• Pre-format the pages so that they print well.
• Unlock input cells and protect the model to avoid accidental changes.

KEY TIPS

✓ Remove clutter so the model stands out visually
✓ Ensure that parts of the model can be printed on standard-sized paper
✓ Lock cells to avoid accidental changes to calculations

10 Best Practices for Modelling Data
9. Provide workbook independence

• Try to reduce the dependency of your workbook on other workbooks. When possible, the model
workbook should be complete in itself.
• Workbooks that are linked to other workbooks often cannot be updated without having the other
workbook open.
• Links across workbooks are easily broken and difficult to reconnect.

KEY TIPS

✓ Place all the data needed in a model in a single workbook
✓ If multiple workbooks are needed, store them in the same folder

10 Best Practices for Modelling Data
10. Keep learning data tools

• Most data modelling tools offer a vast number of features and functions.
• Few people know everything about any tool. The more you learn about a tool, the more you can
incorporate into your models.
• Leverage design templates to build models more quickly and consistently.
• As your skills change, update models with more efficient functions, more powerful calculations,
and more attractive visualizations.

KEY TIPS

✓ Models can always be improved – review and update as your data skills change
✓ Use templates to better understand model building and to learn new techniques
✓ Continue to develop data skills that are useful for model building
