Discussion - MBAN 5140 - Class 6
Class 6 - Discussion
Fall 2022
• Watch:
• Data Connection Options
• Showing Breakdowns of the Whole
• Viewing Specific Values
• Tableau Practice Examples
• Assignments:
• Weekly Quiz 5
• Exercise 3 assigned
http://www.getchee.com/?p=15316
https://graphics.straitstimes.com/STI/STIMEDIA/Interactives/2018/04/marvel-cinematic-universe-whos-who-interactive/index.html
• The Data Model is defined as an abstract model that organizes data description, data
semantics, and consistency constraints of data.
• The data model emphasizes what data is needed and how it should be organized. It is like a
roadmap or a diagram that facilitates a deeper understanding of what is being designed.
• Data models ensure consistency in naming conventions, default values, semantics, and security
while ensuring the quality of the data.
• Conceptual models are usually created as part of gathering initial project requirements.
• They include entity classes (defining the types of things that are important for the business to represent in the
data model), their characteristics and constraints, the relationships between them and relevant security and
data integrity requirements.
Entities in the example data model: Customers, Orders, Products, Employees, Returns

Customers
• Customer ID: String
• Customer Name: String
• First Sale Date: Date
• Gender: String
• Market: String
• Region: String
• Rewards Member: String

Orders
• Market: String
• Region: String
• Country: String
• State: String
• City: String
• Employee ID: String
• Customer ID: String
• Order ID: String
• Order Date: Date
• Year (OrderDate): Number
• Order Priority: String
• Product ID: String
• Product Name: String
• Category: String
• Sub-Category: String
• Segment: String
• Ship Date: Date
• Ship Mode: String
• Payment Type: String
• Discount: Number
• Profit: Number
• Quantity: Number
• Sales: Number
• Shipping Cost: Number

Products
• Product ID: String
• Product Name: String
• Colour: String
• Brand: String
• Category: String
• Sub-Category: String

Employees
• Employee ID: String
• Employee Name: String
• Market: String
• Region: String
• Title: String
• Hire Date: Date
• Birth Date: Date
• Email Address: String
• Marital Status: String
• Gender: String
• Base Rate: Number
• Sales Quota: Number
• Profit Quota: Number

Returns
• Returned: String
• Order ID: String
• Market: String
1. Identify the entities. The process begins with the identification of the things, events or concepts that are
represented in the data set that is to be modeled.
2. Identify key properties of each entity. Each entity type can be differentiated from others because it has
one or more unique properties, called attributes. For example, an entity called “customer” might
possess attributes such as a first name, last name, and phone number, while an entity called “address”
may include a street name and number, a city, province, country and postal code.
3. Identify relationships among entities. The data model will specify the type of relationships each entity
has with the others. For example, each customer “lives at” an address. If that model were expanded to
include an entity called “orders,” each order would be shipped to and billed to an address as well.
4. Map attributes to entities. This will ensure the model reflects how the business will use the data.
5. Assign keys and decide on a degree of normalization that balances redundancy with performance
requirements. Normalization is a technique for organizing data models in which identifiers, called
keys, are assigned to data to represent relationships between them without repeating the data. For
example, if customers are each assigned a key, that key can be linked to both their address and their
order history without having to repeat this information in the table of customer names.
6. Finalize and validate the data model. Data modeling is an iterative process that should be repeated
and refined as business needs change.
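Step 5 can be illustrated with a minimal sketch. It assumes three small tables (customers, addresses, orders) whose names, fields, and values are all hypothetical; the point is only that a customer key links the tables without repeating customer details in each one.

```python
# Hypothetical example data; none of these names or values come from the slides.
customers = {
    1: {"first_name": "Ana", "last_name": "Lopez", "phone": "555-0101"},
    2: {"first_name": "Ben", "last_name": "Chow", "phone": "555-0102"},
}

# Each address row carries the customer key instead of repeating customer details.
addresses = [
    {"address_id": 10, "customer_id": 1, "city": "Toronto", "province": "Ontario"},
    {"address_id": 11, "customer_id": 2, "city": "Calgary", "province": "Alberta"},
]

# Each order points to a customer and to its shipping and billing addresses by key.
orders = [
    {"order_id": 100, "customer_id": 1, "ship_to": 10, "bill_to": 10},
    {"order_id": 101, "customer_id": 1, "ship_to": 10, "bill_to": 10},
    {"order_id": 102, "customer_id": 2, "ship_to": 11, "bill_to": 11},
]


def order_history(customer_id):
    """Return the order IDs linked to one customer through the key."""
    return [o["order_id"] for o in orders if o["customer_id"] == customer_id]


def shipping_city(order_id):
    """Follow the order's ship_to key to the address table."""
    address_by_id = {a["address_id"]: a for a in addresses}
    order = next(o for o in orders if o["order_id"] == order_id)
    return address_by_id[order["ship_to"]]["city"]
```

Here `order_history(1)` looks up both of customer 1's orders, and `shipping_city(102)` resolves the address through the key, so the customer's name and phone number are stored only once.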
SCHULICH SCHOOL OF BUSINESS | MBAN 5140 | CLASS 6
VISUAL ANALYTICS AND MODELLING | OCTOBER 20, 2022
Types of Data Modelling
Data modelling has two common model types:
• Hierarchical data models
• Represent one-to-many relationships in a treelike format. In this type of model, each record
has a single root or parent which maps to one or more child tables.
• This model was introduced in 1966 and was widely used in banking. While this approach is less
efficient than more recent models, it’s still used in Extensible Markup Language (XML)
systems and geographic information systems (GISs).
• Relational data models
• Initially proposed by IBM researcher E.F. Codd in 1970. They are still used today in relational
databases used in enterprise computing.
• Relational data modelling doesn’t require a detailed understanding of the physical
properties of the data storage being used. Data segments are explicitly joined using tables,
reducing database complexity.
• Entity-relationship (ER) data models use diagrams to represent the relationships between entities in a
database. Data architects use them to create visual maps showing database design objectives.
[Figure: example entity-relationship diagram. Recoverable labels: Order Detail ID, Order ID, Product ID, Order Date, Supplier ID, Headquarters ID, Product Quantity, Store, Headquarters, Store ID.]
• Object-oriented data models became popular in the mid-1990s. The “objects” are abstractions of real-
world entities. Objects are grouped in hierarchies and have associated features. Object-oriented
databases can incorporate tables and support more complex data relationships. This approach is
employed in multimedia and hypertext databases.
[Figure: example object-oriented data model. Recoverable labels: Month 05-15-22, Product Code 1234, Vendor 897-STA, Revenue $345.67; Customer, Product Code, Product Name, Sales Associate, Date of Sale, Price.]
• Dimensional data models were designed to optimize data retrieval speeds for analytic purposes in a
data warehouse. Dimensional models increase redundancy to make it easier to locate information for
reporting and retrieval. This modeling is typically used in OLAP systems.
• A popular dimensional data model is the star schema, with data organized into facts (measurable items) and
dimensions (reference information), with each fact surrounded by its dimensions in a star-like pattern.
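A star schema can be sketched in a few lines of plain Python. This is only an illustration: the fact table, the two dimension tables, and every value in them are hypothetical, and a real data warehouse would hold these as database tables rather than dictionaries.

```python
from collections import defaultdict

# Dimension tables: reference information, one row per key (hypothetical values).
dim_product = {
    "P1": {"product_name": "Desk", "category": "Furniture"},
    "P2": {"product_name": "Lamp", "category": "Furniture"},
    "P3": {"product_name": "Pen", "category": "Office Supplies"},
}
dim_date = {
    "2022-10-19": {"year": 2022, "month": 10},
    "2022-10-20": {"year": 2022, "month": 10},
}

# Fact table: measurable items plus a key into each dimension.
fact_sales = [
    {"product_id": "P1", "date_id": "2022-10-19", "quantity": 1, "sales": 200.0},
    {"product_id": "P2", "date_id": "2022-10-19", "quantity": 2, "sales": 50.0},
    {"product_id": "P3", "date_id": "2022-10-20", "quantity": 10, "sales": 15.0},
]


def total_sales(dimension, key_field, attribute):
    """Sum the sales measure grouped by one attribute of one dimension."""
    totals = defaultdict(float)
    for row in fact_sales:
        member = dimension[row[key_field]]
        totals[member[attribute]] += row["sales"]
    return dict(totals)
```

Calling `total_sales(dim_product, "product_id", "category")` groups the facts by product category, and `total_sales(dim_date, "date_id", "year")` groups the same facts by year; each report needs only one lookup from the fact row to a dimension.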
Wide

Value
Province  1980  1990  2000  2010  2020
Ontario      2     5     2     5     4
Quebec       3     1     4     1     5
Alberta      3     9     8     7     5

Inventory
Province  1980  1990  2000  2010  2020
Ontario     25    23    36    40    37
Quebec      21    17    12    24    37
Alberta     22    19    14    35    32

• Joining information between two wide tables is a complicated process impossible to perform in Tableau.
• Instead of joining two rows of information together, with wide data you must join two intersections of a matrix. This introduces a high degree of difficulty and is not robust or flexible.
• Years are dimensional and do not need to be recorded as separate columns. Instead, this data should be pivoted to provide a single row per record in the table and one Year column.
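The pivot described above can be sketched in plain Python using the Value and Inventory figures from the slide. The function and variable names are illustrative assumptions; in Tableau the equivalent step is done with its pivot option rather than with code.

```python
YEARS = [1980, 1990, 2000, 2010, 2020]

# Wide tables copied from the slide: one column per year.
value_wide = {
    "Ontario": [2, 5, 2, 5, 4],
    "Quebec": [3, 1, 4, 1, 5],
    "Alberta": [3, 9, 8, 7, 5],
}
inventory_wide = {
    "Ontario": [25, 23, 36, 40, 37],
    "Quebec": [21, 17, 12, 24, 37],
    "Alberta": [22, 19, 14, 35, 32],
}


def pivot_long(wide):
    """Return {(province, year): measure}, one entry per record."""
    return {
        (province, year): measure
        for province, row in wide.items()
        for year, measure in zip(YEARS, row)
    }


value_long = pivot_long(value_wide)
inventory_long = pivot_long(inventory_wide)

# After pivoting, the join is an ordinary row-to-row match on (Province, Year).
joined = [
    {"Province": p, "Year": y, "Value": v, "Inventory": inventory_long[(p, y)]}
    for (p, y), v in value_long.items()
]
```

Each wide table of 3 rows by 5 year columns becomes 15 long rows, and the two long tables join on the pair (Province, Year) instead of on intersections of a matrix.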
• Relationship: Relationships are a dynamic, flexible way to combine data from multiple tables for analysis. Applies when the data to be analyzed exists in separate tables in the database.
• Join: A relationship of two tables in a single database using a common field (a key field or indexed field). Applies when the data to be analyzed exists in separate tables in the database.
• Cross Database Joins: A relationship across two different databases or text tables based on a common field. Applies when the data to be analyzed exists in different data sources.
• Union: Appended rows from different tables with the same column names. Applies when the tables have the same columns but are not stored in the same file.
• Blend: A combination of data from different databases or text tables based on a common dimension. Behaves like a left join. Applies when the data to be analyzed exists in different data sources.
• The tables must have a common field to define the relationship between the rows (usually a key field).
• Joins can be used to retrieve additional columns to your result set from a different table, but the table
relationship must be first defined in Tableau.
• You can join multiple tables in Tableau, but these relationships need to be defined for each new table included
in the data source.
• There can be multiple clauses within each join and clauses can be different when joining more than two tables.
• In the example tables below, both tables contain a column for the Product field, but the values aren't the same
in each table.
• Inner: Returns records from sales and product only when there is a matching record in both tables.
• Left: Returns all records from sales and the matching records from product. In this example, sales with a null or non-matching product are also returned.
• An inner join is the default join in Tableau. An inner join returns transactions that have been recorded in both
tables.
• In the following example, using an inner join to relate the Sales to the Product table returns only records where
a match is found for the Product number.
• In this example, you would not see the following records returned for an inner join:
• Products that have not been sold, because they do not have a corresponding row in the Sales table.
• A sales transaction with no matching record in the Product table: for example, the sale of Product 3 to
Customer C. The product table does not have an item number 3.
• A left join returns all rows from the left table and only matching rows from the right table.
• The following example shows the results of a left join with Sales on the left and Product on the right.
• The left join more accurately reflects the actual sales transactions that have occurred.
• Although the Detail is not included because there isn’t a match, the sale of Product 3 to Customer C is included
in this result.
• A right join uses logic similar to a left join but changes the direction of the join.
• A right join returns all rows from the table on the right and only matching rows from the left table.
• In the example above, a right join is useful for identifying which products have not been sold. If the database
does not support a right join, you can achieve the same result by placing the Product table on the left and then
creating a left join to the Sales table.
• A full outer join returns all records from both tables and leaves nulls where there is no match between the
two.
• You can use the above result to find missing details about a transaction and identify where product sales are
poor.
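The four join types can be compared with a small plain-Python sketch. The rows below are hypothetical and only modelled on the slide's description (Product 3 sold to Customer C but absent from the Product table, plus a Product 4 that was never sold); the `join` helper is an illustration, not how Tableau implements joins.

```python
# Hypothetical rows modelled on the slide's description.
sales = [
    {"customer": "A", "product": 1},
    {"customer": "B", "product": 2},
    {"customer": "C", "product": 3},  # no matching row in products
]
products = [
    {"product": 1, "detail": "Widget"},
    {"product": 2, "detail": "Gadget"},
    {"product": 4, "detail": "Gizmo"},  # never sold
]


def join(left, right, key, how="inner"):
    """Combine two lists of dicts on `key`; unmatched fields become None."""
    left_fields = [f for f in left[0] if f != key]
    right_fields = [f for f in right[0] if f != key]
    rows = []
    matched_right = set()
    for left_row in left:
        matches = [i for i, r in enumerate(right) if r[key] == left_row[key]]
        matched_right.update(matches)
        for i in matches:
            rows.append({**left_row, **right[i]})
        # Keep unmatched left rows only for left and full outer joins.
        if not matches and how in ("left", "full"):
            rows.append({**left_row, **{f: None for f in right_fields}})
    # Keep unmatched right rows only for right and full outer joins.
    if how in ("right", "full"):
        for i, r in enumerate(right):
            if i not in matched_right:
                rows.append({**{f: None for f in left_fields}, **r})
    return rows


inner_rows = join(sales, products, "product", "inner")
left_rows = join(sales, products, "product", "left")
right_rows = join(sales, products, "product", "right")
full_rows = join(sales, products, "product", "full")
```

In this sketch the inner join drops both the sale of Product 3 and the unsold Product 4, the left join keeps the sale with a null detail, the right join keeps the unsold product with a null customer, and the full outer join keeps both. Swapping the arguments, `join(products, sales, "product", "left")`, returns the same products as the right join.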
Q1: To see only employees who have dependents, you would use a _________ join.
Q2: To see all employees and who their dependents are, a _________ join would be appropriate.
Q3: To see all dependents and which employees they are associated with, you could use a _________ join.
(In this example, Mike is not associated with any employee.)
Q4: To see all employees and all dependents and leave fields null when there is no match, a _________ join
would be appropriate.
• When analyzing and joining data, it is imperative you fully understand the data you are working with. The level of granularity of individual tables can have huge implications on joining conditions and results.
• If the granularity of your two tables doesn’t match, you may duplicate information.
• This can also occur if you don’t specify your join with enough detail.

Master Table
Fruit   Facility  Sales
Orange  A         $12.00
Orange  B         $15.00
Apple   A         $21.00
Apple   B         $25.00

Table being joined
Fruit   Inventory
Orange  100
Apple   175

Output Table
Fruit   Facility  Sales   Inventory
Orange  A         $12.00  100
Orange  B         $15.00  100
Apple   A         $21.00  175
Apple   B         $25.00  175
• When joins are not defined at the correct level of detail, it can increase the total size of your ‘master’ table.
• In this example, Tableau sees two (2) successful joins to the sales table because the join criteria has been set to Fruit.
• Consequently, it duplicates the original inventory record to accommodate for this ‘double success’.

Master Table
Fruit   Inventory
Orange  100
Apple   175

Table being joined
Fruit   Facility  Sales
Orange  A         $12.00
Orange  B         $15.00
Apple   A         $21.00
Apple   B         $25.00

Output Table
Fruit   Inventory  Facility  Sales
Orange  100        A         $12.00
Orange  100        B         $15.00
Apple   175        A         $21.00
Apple   175        B         $25.00
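The duplication can be reproduced with the fruit tables from the slide. The sketch below is plain Python rather than Tableau, and the final aggregation step is only one possible remedy, assumed here for illustration and not prescribed by the slides.

```python
# Tables from the slide: inventory is per fruit, sales is per fruit and facility.
inventory = [
    {"Fruit": "Orange", "Inventory": 100},
    {"Fruit": "Apple", "Inventory": 175},
]
sales = [
    {"Fruit": "Orange", "Facility": "A", "Sales": 12.00},
    {"Fruit": "Orange", "Facility": "B", "Sales": 15.00},
    {"Fruit": "Apple", "Facility": "A", "Sales": 21.00},
    {"Fruit": "Apple", "Facility": "B", "Sales": 25.00},
]

# Join on Fruit only: each inventory row matches two sales rows.
output = [
    {**inv, **sale}
    for inv in inventory
    for sale in sales
    if inv["Fruit"] == sale["Fruit"]
]

true_inventory = sum(row["Inventory"] for row in inventory)
joined_inventory = sum(row["Inventory"] for row in output)

# One possible remedy (an assumption, not from the slides): aggregate sales
# to the Fruit level first so both tables share the same granularity.
sales_by_fruit = {}
for sale in sales:
    sales_by_fruit[sale["Fruit"]] = sales_by_fruit.get(sale["Fruit"], 0.0) + sale["Sales"]

matched = [{**inv, "Sales": sales_by_fruit[inv["Fruit"]]} for inv in inventory]
```

The joined output has four rows, so summing Inventory over it gives 550 instead of the true 275. After aggregating sales per fruit, the join returns two rows and the inventory total is preserved.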
• If the connector is not available in the Add a Connection list, cross-database joins are not supported for that combination of data
sources.
• For example, you cannot integrate connections to cube data, connections that require an extract, or Tableau Server data sources.
• Instead of joining tables, consider using data blending.
• You can union your data to combine two or more tables from Excel or Google Sheets workbook data; text file data; JSON file
data; or Google BigQuery, Microsoft SQL Server, or Oracle database data. To union your data, the tables must come from the same connection.
Unions
Merging fields
• Some tables might contain columns of the same data but might have different names.
• For example, one table might contain a field named Customer ID, but another table might use the field name Cust. ID.
Performing a union on tables with slight differences can result in null values.
• In this example, one table used Billing Rate, and another table used Bill Rate, resulting in a union containing null values:
Associate | Billing Rate | Bill Rate | Hours
• To avoid nulls, you can use the Merge mismatched fields option to edit the union:
Associate | Billing Rate & Bill Rate | Hours
• After you merge fields, you can use the field generated from the merge in a pivot or split, or use the field as a join key. You can
also change the data type of the field generated from a merge.
• NOTE: Only merge fields that were mismatched during a union. Using merge to combine unrelated columns will produce unpredictable results.
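The effect of a union with mismatched field names, and of merging them afterwards, can be sketched as follows. Only the column names come from the slide; the rows, the `union` helper, and the `merge_fields` helper are hypothetical and do not reflect how Tableau implements the feature.

```python
# Hypothetical rows; only the column names come from the slide.
table_1 = [{"Associate": "Ana", "Billing Rate": 100, "Hours": 8}]
table_2 = [{"Associate": "Ben", "Bill Rate": 120, "Hours": 6}]


def union(*tables):
    """Append rows; a column missing from a table is filled with None."""
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    return [{name: row.get(name) for name in columns} for table in tables for row in table]


def merge_fields(rows, first, second, merged_name):
    """Combine two mismatched columns, taking whichever value is not None."""
    merged = []
    for row in rows:
        value = row[first] if row[first] is not None else row[second]
        rest = {k: v for k, v in row.items() if k not in (first, second)}
        merged.append({**rest, merged_name: value})
    return merged


unioned = union(table_1, table_2)
merged = merge_fields(unioned, "Billing Rate", "Bill Rate", "Billing Rate & Bill Rate")
```

Before merging, each unioned row has a null in either Billing Rate or Bill Rate; after merging, a single combined column holds the rate for every associate.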
• You need to have a clear idea about who your audience is. Are you developing the model for
yourself, an expert user, or a novice? This will help you decide what level of details you need.
• Before building a model, ask the question – “If I was going to use this model, what information
and insights would I need to see?”
KEY TIPS
✓ Customize the model for potential users, those using it today and in the future
✓ Provide easy-to-understand instructions for the models for non-technical users
✓ Ensure the model has a dashboard that can provide results and insights customized for decision makers
• Are you going to use the model for just a few days, or is it something that you can use for a long
time? This is important because you don’t have to spend many hours on something that you want
for just one day. Depending on how long it will be useful, you can make it more sophisticated.
• A good model is flexible and capable of adapting to changing business conditions. It should be
built to allow for future requirements, to reduce the need for major changes when making
updates.
KEY TIPS
✓ Think about how many hours you want to invest building the model
✓ Incorporate extra capacity for new data and inputs to avoid having to make model/calculation changes
✓ Ensure that future users are considered in developing model functionality
• Your model should have a clean layout, with inputs on one side and outputs organized
separately.
• A clear structure will help users understand the model quickly.
• People are used to reading from top-to-bottom, left-to-right. Avoid calculations that move
within and across tables in an unclear pattern.
• Group data and label all inputs and outputs. This will reduce the risk of errors.
KEY TIPS
✓ Set up tables and calculations to function in a logical manner that most people would typically expect
✓ Work from top-to-bottom and left-to-right to make the model flow better and avoid confusion
✓ Arrange and colour-code the worksheets to represent inputs, outputs, and calculations
• Follow a colour scheme consistently. One scheme is: keep all calculated values (and titles) in
black, user inputs/assumptions in blue, and references/sourced assumptions in green.
• Colour-code worksheets to represent inputs, outputs, and calculations.
• All assumptions about the inputs should be clearly stated in the form of comments.
• Use data validation (dropdown menu) to avoid accidental wrong input by the user.
KEY TIPS
✓ Ensure users understand all the assumptions used to create the model
✓ Use data validation to make data entry fast, easy, and correct
✓ Review assumptions with users to make sure they are applied correctly to the model
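Outside a spreadsheet, the same idea as a dropdown list can be sketched as a check against a fixed set of allowed values. The input names and allowed values below are hypothetical.

```python
# Hypothetical input names and allowed values, standing in for a dropdown list.
ALLOWED_VALUES = {
    "scenario": ["Base", "Optimistic", "Pessimistic"],
    "region": ["Ontario", "Quebec", "Alberta"],
}


def validate_input(name, value):
    """Return the value if it is in the allowed list; otherwise raise ValueError."""
    allowed = ALLOWED_VALUES[name]
    if value not in allowed:
        raise ValueError(f"{name!r} must be one of {allowed}, got {value!r}")
    return value
```

A call such as `validate_input("scenario", "Base")` passes the value through, while a misspelled or unexpected entry is rejected before it reaches any calculation.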
• Error checks should be built into the model at each stage. A 2012 study found that 75% of
spreadsheets contain errors. It’s easier to find and fix errors if models are simple and clear.
• Place the checks where they are visible and use colours to flag errors. This allows for real-time
error checking when changing inputs.
• If you have used programming or complex formulas, try to comment as much as possible so that
it’s easy for anyone to understand and modify it.
• Avoid programming and complexity by treating the problem differently – changing the order of a
calculation, the steps of a calculation, or using simple algebra.
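A built-in error check can be as simple as a few named tests that are re-run whenever inputs change. The figures and check names below are hypothetical and serve only as a minimal sketch of the idea.

```python
# Hypothetical model figures; the check names are illustrative only.
inputs = {"Ontario": 120.0, "Quebec": 80.0, "Alberta": 50.0}
reported_total = 250.0


def run_checks(inputs, reported_total, tolerance=0.005):
    """Return a dict of check name -> True (pass) or False (fail)."""
    return {
        # Inputs in this sketch are assumed to be non-negative amounts.
        "no_negative_inputs": all(v >= 0 for v in inputs.values()),
        # The reported total should reconcile with the sum of its parts.
        "total_reconciles": abs(sum(inputs.values()) - reported_total) <= tolerance,
    }


checks = run_checks(inputs, reported_total)
failed = [name for name, ok in checks.items() if not ok]
```

Listing the failed checks by name plays the same role as a visible, colour-coded check cell: a non-empty `failed` list signals that something needs attention before the results are used.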
KEY TIPS
✓ Ensure the model has the key drivers and a dynamic framework to quickly analyze different scenarios
✓ Allow for quick and easy sensitivity analysis
✓ Reduce the number of clicks required to make changes in the analysis
• Try to reduce the dependency of your workbook on other workbooks. When possible, the model
workbook should be complete in itself.
• Workbooks that are linked to other workbooks often cannot be updated without having the other
workbook open.
• Links across workbooks are easily broken and difficult to reconnect.
• Most data modelling tools have a vast number of features and functions.
• Few people know everything about any tool. The more you learn about a tool, the more you can
incorporate into your models.
• Leverage design templates to build models more consistently and more quickly.
• As your skills change, update models with more efficient functions, more powerful calculations,
and more attractive visualizations.
KEY TIPS
✓ Models can always be improved – review and update as your data skills change
✓ Use templates to better understand model building and to learn new techniques
✓ Continue to develop data skills that are useful for model building