
Business Analytics

Course Code : BBA 202 L-4 Credit-04


Unit III
Data Visualization: Definition and Techniques

3.1 Introduction to Data Visualization


3.2 Data Visualization Techniques
3.3 Tables: Cross Tabulation & Pivot
3.4 Data Modeling: Concept
3.5 Data Modeling: Role & Techniques
3.6 Introduction to Tableau
……………………………………………………………………………………………………

3.1. Data Visualization [1]


Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way to see
and understand trends, outliers, and patterns in data. Additionally, it provides an excellent way
for employees or business owners to present data to non-technical audiences without confusion.
Data visualization is a critical step in the data science process, helping teams and individuals
convey data more effectively to colleagues and decision makers. Teams that manage reporting
systems typically leverage defined template views to monitor performance. However, data
visualization isn’t limited to performance dashboards. For example, while text mining, an analyst
may use a word cloud to capture key concepts, trends, and hidden relationships within the
unstructured data. Alternatively, they may utilize a graph structure to illustrate relationships
between entities in a knowledge graph. There are a number of ways to represent different types
of data, and it’s important to remember that it is a skillset that should extend beyond your core
analytics team.

Idea generation [2]
Data visualization is commonly used to spur idea generation across teams. Visualizations are
frequently leveraged during brainstorming or Design Thinking sessions at the start of a project by

[1] https://www.tableau.com/learn/articles/data-visualization
[2] https://www.ibm.com/in-en/topics/data-visualization
supporting the collection of different perspectives and highlighting the common concerns of the
collective. While these visualizations are usually unpolished and unrefined, they help set the
foundation within the project to ensure that the team is aligned on the problem that they’re
looking to address for key stakeholders.
Idea illustration
Data visualization for idea illustration assists in conveying an idea, such as a tactic or process. It
is commonly used in learning settings, such as tutorials, certification courses, and centers of
excellence, but it can also be used to represent organization structures or processes, facilitating
communication between the right individuals for specific tasks. Project managers frequently use
Gantt charts and waterfall charts to illustrate workflows.
Visual discovery
Visual discovery and everyday data viz are more closely aligned with data teams. While visual
discovery helps data analysts, data scientists, and other data professionals identify patterns and
trends within a dataset, everyday data viz supports the subsequent storytelling after a new
insight has been found.
Data Visualization

Advantages:
• Easily share information.
• Interactively explore opportunities.
• Visualize patterns and relationships.

Disadvantages:
• Biased or inaccurate information based on assumptions.
• Correlation doesn’t always mean causation.
• Core messages can get lost in translation.

Common Approaches to Data Visualization


1. Showing change over time
2. Showing a part-to-whole composition
3. Looking at how data is distributed
4. Comparing values between groups
5. Observing relationships between variables
6. Looking at geographical data
– Detailed Illustration in: https://chartio.com/learn/charts/how-to-choose-data-visualization/
3.2. Types of data visualizations [3]
The earliest forms of data visualization can be traced back to the Egyptians in the pre-17th
century, largely used to assist in navigation. As time progressed, people leveraged data
visualizations for broader applications, such as in the economic, social, and health disciplines.
Perhaps most notably,
Edward Tufte published The Visual Display of Quantitative Information, which illustrated that
individuals could utilize data visualization to present data in a more effective manner. His book
continues to stand the test of time, especially as companies turn to dashboards to report their
performance metrics in real-time. Dashboards are effective data visualization tools for tracking
and visualizing data from multiple data sources, providing visibility into the effects of specific
behaviors by a team or an adjacent one on performance. Dashboards include common
visualization techniques, such as:

 Tables: These consist of rows and columns used to compare variables. Tables can show a
great deal of information in a structured way, but they can also overwhelm users who are
simply looking for high-level trends.

 Pie charts and stacked bar charts: These graphs are divided into sections that represent
parts of a whole. They provide a simple way to organize data and compare the size of
each component to one another.

 Line charts and area charts: These visuals show change in one or more quantities by
plotting a series of data points over time and are frequently used within predictive
analytics. Line graphs utilize lines to demonstrate these changes while area charts
connect data points with line segments, stacking variables on top of one another and
using color to distinguish between variables.

 Histograms: This graph plots a distribution of numbers using a bar chart (with no spaces
between the bars), representing the quantity of data that falls within a particular range.
This visual makes it easy for an end user to identify outliers within a given dataset.

 Scatter plots: These visuals are beneficial in revealing the relationship between two
variables, and they are commonly used within regression data analysis. However, these
can sometimes be confused with bubble charts, which are used to visualize three
variables via the x-axis, the y-axis, and the size of the bubble.

 Heat maps: These graphical representations are helpful in visualizing behavioral
data by location. This can be a location on a map, or even a webpage.

 Tree maps: These display hierarchical data as a set of nested shapes, typically rectangles.
Tree maps are great for comparing the proportions between categories via their area size.
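The bucketing behind a histogram can be sketched in a few lines. This is a minimal illustration; the order values and bin edges below are invented, not taken from the text:

```python
import numpy as np

# Hypothetical order values (illustrative data only).
orders = np.array([12, 15, 22, 25, 27, 31, 33, 35, 38, 41, 44, 95])

# Count how many values fall into each range, exactly as a
# histogram's bars would (adjacent bins, no gaps between them).
counts, edges = np.histogram(orders, bins=[0, 20, 40, 60, 80, 100])

# A quick text rendering: one '#' per order in each bin. The lone
# value in the 80-100 bin stands out as a potential outlier.
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo}-{hi}: {'#' * c}")
```

The single bar in the highest range is exactly the kind of outlier the text says a histogram makes easy to spot.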

[3] https://chartio.com/learn/charts/how-to-choose-data-visualization/
3.2.1 Data visualization best practices

With so many data visualization tools readily available, there has also been a rise in ineffective
information visualization. Visual communication should be simple and deliberate to ensure that
your data visualization helps your target audience arrive at your intended insight or conclusion.
The following best practices can help ensure your data visualization is useful and clear:

Set the context: It’s important to provide general background information to ground the
audience around why this particular data point is important. For example, if e-mail open rates
were underperforming, we may want to illustrate how a company’s open rate compares to the
overall industry, demonstrating that the company has a problem within this marketing channel.
To drive an action, the audience needs to understand how current performance compares to
something tangible, like a goal, benchmark, or other key performance indicators (KPIs).

Know your audience(s): Think about who your visualization is designed for and then make sure
your data visualization fits their needs. What is that person trying to accomplish? What kind of
questions do they care about? Does your visualization address their concerns? You’ll want the
data that you provide to motivate people to act within their scope of their role. If you’re unsure if
the visualization is clear, present it to one or two people within your target audience to get
feedback, allowing you to make additional edits prior to a large presentation.

Choose an effective visual: Specific visuals are designed for specific types of datasets. For
instance, scatter plots display the relationship between two variables well, while line graphs
display time series data well. Ensure that the visual actually assists the audience in understanding
your main takeaway. Misalignment of charts and data can result in the opposite, confusing your
audience further versus providing clarity.

Keep it simple: Data visualization tools can make it easy to add all sorts of information to your
visual. However, just because you can, it doesn’t mean that you should! In data visualization,
you want to be very deliberate about the additional information that you add to focus user
attention. For example, do you need data labels on every bar in your bar chart? Perhaps you only
need one or two to help illustrate your point. Do you need a variety of colours to communicate
your idea? Are you using colours that are accessible to a wide range of audiences (e.g.
accounting for colour blind audiences)? Design your data visualization for maximum impact by
eliminating information that may distract your target audience.

3.3. Tables: Cross Tabulation & Pivot [4]


Data tables are used to analyze how your final result changes when certain variables in your
function or formula are changed. There are two types of data tables: the one-variable data table
and the two-variable data table.
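What Excel's data tables compute can be sketched in plain Python. This is a minimal sketch assuming a hypothetical loan-payment formula as the function under study; the figures and names are invented for illustration:

```python
def monthly_payment(principal, annual_rate, years=10):
    """Standard annuity payment formula for a fixed-rate loan."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)

# One-variable data table: vary the interest rate, hold principal fixed.
rates = [0.04, 0.05, 0.06]
one_var = {r: round(monthly_payment(100_000, r), 2) for r in rates}

# Two-variable data table: vary rate (rows) and principal (columns).
principals = [100_000, 200_000]
two_var = {(r, p): round(monthly_payment(p, r), 2)
           for r in rates for p in principals}
```

Each entry of `two_var` corresponds to one cell of a two-variable data table: the same formula re-evaluated at one rate/principal combination.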
3.3.1 Cross Tabulation

[4] https://www.voxco.com/blog/cross-tabulation/
A cross-tabulation is a two- (or more) dimensional table that records the number (frequency) of
respondents that have the specific characteristics described in the cells of the table. Cross-
tabulation tables provide a wealth of information about the relationship between the variables.
Cross-tabulation (also written crosstab) is one of the most useful analytical tools and a mainstay
of the market research industry. Cross-tabulation analysis, also known as contingency table
analysis, is most often used to analyze categorical (nominal measurement scale) data. When you
want to conduct a survey analysis and compare the results for one or more variables with the
results of another, cross-tabulation is the tool of choice. At their core, cross-tabulations are
simply data tables that
present the results of the entire group of respondents, as well as results from subgroups of survey
respondents. With them, you can examine relationships within the data that might not be readily
apparent when only looking at total survey responses.
When should you use cross-tabulation?
You typically use cross tabulation when you have categorical variables or data – e.g. information
that can be divided into mutually exclusive groups.
An example of when to use cross-tabulation is with product surveys – you could ask a group of
50 people “Do you like our products?” and use cross-tabulation to get a more insightful answer.
Rather than just recording the 50 responses, you can add another independent variable, such as
gender, and use cross-tabulation to understand how the male and female respondents view your
product. With this information, you might see that your female customers prefer your products
more than your male customers. You can then use these insights to improve your products for
your male customers. As such, using two variables – gender and product likability – along with
cross-tabulation, you can get a more comprehensive breakdown of data sets to identify patterns,
trends, or other useful information. As a result, cross-tabulation is excellent for assessing
categorical variables in market research or survey responses, as you can readily compare data
sets to discover the relationship between two (or more) seemingly unrelated items.
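The gender-versus-product-likability example above can be reproduced with pandas' `crosstab`. The survey responses below are invented for illustration:

```python
import pandas as pd

# Hypothetical answers to "Do you like our products?" (illustrative data).
survey = pd.DataFrame({
    "gender":        ["F", "F", "F", "M", "M", "M", "M", "F"],
    "likes_product": ["Yes", "Yes", "Yes", "No", "No", "Yes", "No", "No"],
})

# Rows are genders, columns are answers, cells are respondent counts;
# margins=True appends row and column totals under the label "All".
ct = pd.crosstab(survey["gender"], survey["likes_product"], margins=True)
print(ct)
```

Reading the cells rather than the totals is what surfaces the subgroup pattern: in this made-up sample, female respondents like the product more often than male respondents do.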

3.3.2 Pivot Tables [5]


A pivot table is a summary of your data, packaged in a chart that lets you report on and explore
trends based on your information. Pivot tables are particularly useful if you have long rows or
columns that hold values you need to track the sums of and easily compare to one another. The
"pivot" part of a pivot table stems from the fact that you can rotate (or pivot) the data in the table
to view it from a different perspective. To be clear, you're not adding to, subtracting from, or
otherwise changing your data when you make a pivot. Instead, you're simply reorganizing the
data so you can reveal useful information.

[5] https://support.microsoft.com/en-us/office/create-a-pivotchart-c1b1e057-6990-4c38-b52b-8255538e7b1c
What are pivot tables used for?

The purpose of pivot tables is to offer user-friendly ways to quickly summarize large amounts of
data. They can be used to better understand, display, and analyze numerical data in detail. With
this information, you can help identify and answer unanticipated questions surrounding the data.
Steps for building a Pivot Table
1. Select the cells you want to create a PivotTable from.
2. Select Insert > PivotTable.
3. This will create a PivotTable based on an existing table or range.
4. Choose where you want the PivotTable report to be placed. Select New Worksheet to
place the PivotTable in a new worksheet, or select Existing Worksheet and choose where
you want the new PivotTable to appear.
5. Click OK.
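The same summarize-and-rotate idea can be sketched with pandas' `pivot_table`. The sales records below are invented for illustration:

```python
import pandas as pd

# Hypothetical sales records (illustrative data only).
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "B", "A"],
    "revenue": [100, 150, 200, 50, 300],
})

# Summarize revenue with regions as rows and products as columns,
# summing values much like Excel's default aggregation.
pivot = pd.pivot_table(sales, values="revenue", index="region",
                       columns="product", aggfunc="sum", fill_value=0)
print(pivot)

# "Pivoting" is only a reorganization of the same data: swapping
# index and columns shows identical totals from the other perspective.
pivoted = pd.pivot_table(sales, values="revenue", index="product",
                         columns="region", aggfunc="sum", fill_value=0)
```

Note that no underlying values are added, subtracted, or changed by the pivot, matching the point made in the text.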

3.4 Data Modeling [6]


Data modeling is the process of creating a visual representation of either a whole information
system or parts of it to communicate connections between data points and structures. The goal is
to illustrate the types of data used and stored within the system, the relationships among these
data types, the ways the data can be grouped and organized and its formats and attributes.
Data models are built around business needs. Rules and requirements are defined upfront through
feedback from business stakeholders so they can be incorporated into the design of a new system
or adapted in the iteration of an existing one.

[6] https://www.ibm.com/in-en/topics/data-modeling
Data can be modeled at various levels of abstraction. The process begins by collecting
information about business requirements from stakeholders and end users. These business rules
are then translated into data structures to formulate a concrete database design. A data model can
be compared to a roadmap, an architect’s blueprint or any formal diagram that facilitates a
deeper understanding of what is being designed.
Data modeling employs standardized schemas and formal techniques. This provides a common,
consistent, and predictable way of defining and managing data resources across an organization,
or even beyond.
Ideally, data models are living documents that evolve along with changing business needs. They
play an important role in supporting business processes and planning IT architecture and
strategy. Data models can be shared with vendors, partners, and/or industry peers.

Types of Data Model – Based on Abstraction


Like any design process, database and information system design begins at a high level of
abstraction and becomes increasingly more concrete and specific. Data models can generally be
divided into three categories, which vary according to their degree of abstraction. The process
will start with a conceptual model, progress to a logical model and conclude with a physical
model. Each type of data model is discussed in more detail below:

1. Conceptual data models. They are also referred to as domain models and offer a big-
picture view of what the system will contain, how it will be organized, and which
business rules are involved. Conceptual models are usually created as part of the process
of gathering initial project requirements. Typically, they include entity classes (defining
the types of things that are important for the business to represent in the data model),
their characteristics and constraints, the relationships between them and relevant security
and data integrity requirements. Any notation is typically simple.
2. Logical data models. They are less abstract and provide greater detail about the concepts
and relationships in the domain under consideration. One of several formal data modeling
notation systems is followed. These indicate data attributes, such as data types and their
corresponding lengths, and show the relationships among entities. Logical data models
don’t specify any technical system requirements. This stage is frequently omitted in agile
or DevOps practices. Logical data models can be useful in highly procedural
implementation environments, or for projects that are data-oriented by nature, such as
data warehouse design or reporting system development.
3. Physical data models. They provide a schema for how the data will be physically stored
within a database. As such, they’re the least abstract of all. They offer a finalized design
that can be implemented as a relational database, including associative tables that
illustrate the relationships among entities as well as the primary keys and foreign keys
that will be used to maintain those relationships. Physical data models can include
database management system (DBMS)-specific properties, including performance tuning.
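A physical data model can be made concrete with a small schema. The two-table order-tracking design below is a hypothetical sketch (the table and column names are invented), using SQLite so the primary and foreign keys described above are explicit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,     -- primary key
        name        TEXT NOT NULL
    );
    -- The foreign key maintains the one-to-many relationship
    -- between a customer and that customer's orders.
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
""")

# Illustrative rows to show the relationship in action.
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, 250.0)")

# The key columns let the two tables be joined back together.
row = conn.execute("""
    SELECT c.name, o.amount
    FROM customer_order o JOIN customer c USING (customer_id)
""").fetchone()
print(row)
```

A conceptual model would only say "customers place orders"; the physical model above pins down the exact tables, column types, and key constraints a DBMS needs.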
Data Modeling Techniques

Data modeling has evolved alongside database management systems, with model types
increasing in complexity as businesses' data storage needs have grown. Here are several model
types:

 Hierarchical data models represent one-to-many relationships in a treelike format. In
this type of model, each record has a single root or parent, which maps to one or more
child tables. This model was implemented in the IBM Information Management System
(IMS), which was introduced in 1966 and rapidly found widespread use, especially in
banking. Though this approach is less efficient than more recently developed database
models, it’s still used in Extensible Markup Language (XML) systems and geographic
information systems (GISs).
 Relational data models were initially proposed by IBM researcher E.F. Codd in 1970.
They are still implemented today in the many different relational databases commonly
used in enterprise computing. Relational data modeling doesn’t require a detailed
understanding of the physical properties of the data storage being used. In it, data
segments are explicitly joined through the use of tables, reducing database complexity.
 Entity-relationship (ER) data models use formal diagrams to represent the relationships
between entities in a database. Several ER modeling tools are used by data architects to
create visual maps that convey database design objectives.
 Object-oriented data models gained traction as object-oriented programming
became popular in the mid-1990s. The “objects” involved are abstractions of real-world
entities. Objects are grouped in class hierarchies, and have associated features. Object-
oriented databases can incorporate tables, but can also support more complex data
relationships. This approach is employed in multimedia and hypertext databases as well
as other use cases.
 Dimensional data models were developed by Ralph Kimball, and they were designed to
optimize data retrieval speeds for analytic purposes in a data warehouse. While relational
and ER models emphasize efficient storage, dimensional models increase redundancy in
order to make it easier to locate information for reporting and retrieval. This modeling is
typically used across OLAP systems.
