
V-SEMESTER

DSCP508 – DATA VISUALISATION LAB



Certified that this is a bona fide record of work done by


Mr./Ms. ____________________________________________________________
Reg. No. ______________________ of B.E. Computer Science and Engineering
(Data Science) in DSCP508 – Data Visualization Lab during the Odd Semester
(July 2022- November 2022).

Staff-In-charge

Internal Examiner External Examiner

Place: Annamalai Nagar
Date: / / 2022
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

VISION
To provide a congenial ambience for individuals to develop and blossom as
academically superior, socially conscious and nationally responsible citizens.

MISSION

M1: Impart high quality computer knowledge to the students through a dynamic
scholastic environment wherein they learn to develop technical, communication
and leadership skills to bloom as a versatile professional.
M2: Develop life-long learning ability that allows them to be adaptive and
responsive to the changes in career, society, technology, and environment.
M3: Build student community with high ethical standards to undertake innovative
research and development in thrust areas of national and international needs.
M4: Expose the students to the emerging technological advancements for meeting
the demands of the industry.

PROGRAM EDUCATIONAL OBJECTIVES (PEO)

PEO     PEO Statements

PEO1    To prepare graduates with potential to get employed in the right role and/or become
        entrepreneurs to contribute to the society.
PEO2    To provide the graduates with the requisite knowledge to pursue higher education and
        carry out research in the field of Computer Science and Engineering.
PEO3    To equip the graduates with the skills required to stay motivated and adapt to the
        dynamically changing world so as to remain successful in their career.
PEO4    To train the graduates to communicate effectively, work collaboratively and exhibit high
        levels of professionalism and ethical responsibility.

COURSE OBJECTIVES:

• To learn the interface in Tableau / MS-Excel for creating visualisations.


• To understand the methods for drawing charts and graphs.
• To learn the use of maps and tables in creating visualisation.
• To prepare dashboard design for data analytics applications.

LIST OF EXERCISES
01. Generating line chart using matplotlib.
02. Generating bar chart using matplotlib.
03. Generating scatter plot using matplotlib.
04. Generating box plot using seaborn.
05. Generating joint plot using seaborn.
06. Generating heatmap using seaborn and matplotlib.
07. Generating pie chart using matplotlib.
08. Generating Stream graph using matplotlib.
09. Generating Radial bar charts using matplotlib.
10. Generating Scatter plot matrix using plotly.
11. Generating Parallel coordinates using plotly.
12. Generating Treemap using squarify
13. Study of Tableau Environment
14. Connecting with various Data Sources
15. Building Views
16. Applying Filters
17. Drawing Charts and Graphs
18. Working with Functions
19. Creating Dashboard

COURSE OUTCOMES:
At the end of this course, the students will be able to
1. Discover the various elements in the interface to load and analyze data.
2. Design filters for data visualization.
3. Develop dashboard design for typical data analytics applications
CONTENTS
Ex.No.  Date  Name of the Experiment                               Page No.  Marks  Sign
1             Generating line chart using matplotlib               1
2             Generating bar chart using matplotlib                4
3             Generating scatter plot using matplotlib             10
4             Generating box plot using seaborn                    13
5             Generating joint plot using seaborn                  16
6             Generating heatmap using seaborn and matplotlib      19
7             Generating pie chart using matplotlib                22
8             Generating Stream graph using matplotlib             24
9             Generating Radial bar charts using matplotlib        27
10            Generating Scatter plot matrix using plotly          31
11            Generating Parallel coordinates using plotly         34
12            Generating Treemap using squarify                    36
13            Study of Tableau Environment                         38
14            Connecting with various Data Sources                 46
15            Building Views                                       50
16            Applying Filters                                     53
17            Drawing Charts and Graphs                            59
18            Working with Functions                               75
19            Creating Dashboard                                   81

EX.NO:1 Generating line chart using matplotlib.

DATE:

Aim :
To use Matplotlib for plotting a line chart.

A) Simple line chart


Code:
import matplotlib.pyplot as plt
x =[3,6,8,11,13,14,17,19,21,24,33,37]
y = [7.5,12,13.2,15,17,22,24,37,34,38.5,42,47]
x2 =[3,6,8,11,13,14,17,19,21,24,33]
y2 = [50,45,33,24,21.5,19,14,13,10,6,3]

plt.plot(x,y, label='First Line')


plt.plot(x2, y2, label='Second Line')
plt.xlabel('Plot Number')
plt.ylabel('Important var')
plt.title('Interesting Graph\n2018 ')
plt.yticks([0,5,10,15,20,25,30,35,40,45,50],
['0B','5B','10B','15B','20B','25B','30B','35B','40B','45B','50B'])
plt.legend()
plt.show()

Output:
B)loading dataset.
Code:

import pandas as pd
dataset = pd.read_csv("salaries.csv")
rank = dataset['rank']
discipline = dataset ['discipline']
phd = dataset['phd']
service = dataset['service']
sex = dataset['sex']
salary = dataset['salary']
dataset.head()

Output:

C)Plotting two fields.


Code:

dataset[["phd","service"]].plot()

Output:
D)Plotting a distribution of two fields.
Code:

import matplotlib.pyplot as plt

# phd and service are the columns loaded in part B;
# each series is plotted against its row index
plt.plot(phd, label='phd')
plt.plot(service, label='service')
plt.xlabel('Record index')
plt.ylabel('phd / service')
plt.title('Phd/service\nDistribution')
plt.legend()
plt.show()

Output:

Result:
Line chart was generated using matplotlib.

EX.NO:2 Generating bar chart using matplotlib.

DATE:

Aim :
To use Matplotlib for plotting a bar chart.

A)simple bar chart:


Code:
(Note: Repeat Ex.No 1. B loading dataset)
import matplotlib.pyplot as plt

Students = [2,4,6,8,10]
Courses = [4,5,3,2,1]
plt.bar(Students,Courses, label="Students/Courses")
plt.xlabel('Students ')
plt.ylabel('Courses')
plt.title('Students Courses Data\n 2018')
plt.legend()
plt.show()

Output:

B)Generating colors:
Code:
Students = [2,4,6,8,10]
Courses = [4,5,3,2,3]
stds = [3,5,7,9,11]
Projects = [1,2,4,3,2]
plt.bar(Students, Courses, label="Courses", color='r')
plt.bar(stds, Projects, label="Projects", color='c')
plt.xlabel('Students')
plt.ylabel('Courses/Projects')
plt.title('Students Courses and Projects Data\n 2018')
plt.legend()
plt.show()

Output:

C)Plotting histogram:
Code:
plt.hist(service, bins=30, alpha=0.4, rwidth=0.8,
color='green', label='service')
plt.hist(phd, bins=30, alpha=0.4, rwidth=0.8,
color='red', label='phd')
plt.xlabel('Services/phd')
plt.ylabel('Distribution')
plt.title('Services/phd\n 2018')
plt.legend(loc='upper right')
plt.show()
Output:

D)Plotting histogram with bins:


Code:
plt.hist(service, bins=10, alpha=0.4, rwidth=0.8,
color='green', label='service')
plt.hist(phd, bins=10, alpha=0.4, rwidth=0.8,
color='red', label='phd')
plt.xlabel('Services/phd')
plt.ylabel('Distribution')
plt.title('Services/phd\n 2018')
plt.legend(loc='upper right')
plt.show()

Output:

E)Tabling using groupby:


Code:
import pandas as pd
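# Sum only the numeric columns; text columns such as rank or sex cannot be summed meaningfully
dataset1 = dataset.groupby(['service']).sum(numeric_only=True)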
dataset1.sort_values("salary", ascending = False, inplace=True)
dataset1.head()

Output:

F)Plotting bar chart:


Code:

dataset1["salary"].plot.bar()

Output:

G)Plotting histogram with colors:


Code:
dataset[['phd', 'service']].head(10).plot.bar(title="Ph.D. Vs Service\n 2018",
                                              color=['g', 'red'])
Output:

H)Stacked bar chart:


Code:
import matplotlib.pyplot as plt
import numpy as np

x=['A','B','C','D']
y1=np.array([10,20,10,30])
y2=np.array([20,25,15,25])
y3=np.array([12,15,19,6])
y4=np.array([10,29,13,19])
plt.bar(x,y1,color='r')
plt.bar(x,y2,bottom=y1,color='b')
plt.bar(x,y3,bottom=y1+y2,color='y')
plt.bar(x,y4,bottom=y1+y2+y3,color='g')
plt.xlabel("Teams")
plt.ylabel("Score")
plt.legend(["Round 1","Round 2","Round 3","Round 4"])
plt.title("Score by Teams in 4 Rounds")
plt.show()
Output:

Result:
Bar chart was generated using matplotlib.
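
In part H, each round is drawn on top of the previous ones by passing the running total of the earlier rounds as bottom. A minimal sketch of that bookkeeping in plain Python, using the same scores as part H (the helper name stacked_bottoms is an illustrative assumption, not a matplotlib function):

```python
from itertools import accumulate

# Scores per team (A, B, C, D) for each round, copied from part H.
rounds = [
    [10, 20, 10, 30],  # Round 1
    [20, 25, 15, 25],  # Round 2
    [12, 15, 19, 6],   # Round 3
    [10, 29, 13, 19],  # Round 4
]


def add_layers(total, layer):
    """Element-wise sum of the running total and the next layer."""
    return [t + v for t, v in zip(total, layer)]


def stacked_bottoms(layers):
    """Return, for each layer, the height at which its bars must start."""
    zero = [0] * len(layers[0])
    # accumulate yields zero, zero + layer1, zero + layer1 + layer2, ...
    running = list(accumulate(layers, add_layers, initial=zero))
    return running[:-1]  # the last entry is the grand total, not a bottom


bottoms = stacked_bottoms(rounds)
for number, (bottom, heights) in enumerate(zip(bottoms, rounds), start=1):
    print(f"Round {number}: bottom={bottom}, height={heights}")
```

Each entry of bottoms could be passed as the bottom argument of the corresponding plt.bar call, which avoids writing y1+y2+y3 by hand when the number of layers grows.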
EX.NO:3 Generating scatter chart using matplotlib.

DATE:

Aim :
To use Matplotlib for plotting Scatter Chart.

A)scatter plot using dataset:


Code:
(Note: Repeat Ex.No 1. B loading dataset)
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

dataset = pd.read_csv("salaries.csv")
plt.scatter(phd,salary, label='Salary/phd', color='g',marker='+', s=80 )
plt.xlabel('phd')
plt.ylabel('salary')
plt.title('phd/ salary\n Spring 2018')
plt.legend()
plt.show()
Output:
B)scatter plot:
Code:

plt.scatter(rank,salary, label='salary/rank',color='g', marker='+', s=50 )


plt.xlabel('rank')
plt.ylabel('salary')
plt.title('salary/rank\n Spring 2018')
plt.legend()
plt.show()

Output:

C)Strip plot:
Code:
import seaborn as sns
sns.stripplot(x=dataset['salary'])

Output:
D)Strip plot with two fields:
Code:
sns.stripplot(x='sex', y='salary', data=dataset)

Output:

Result:

Scatter plot was generated using matplotlib.



EX.NO:4 Generating box chart using seaborn.

DATE:

Aim :
To use Seaborn for plotting Box plot.

A)Box plot using dataset:


Code:
(Note: Repeat Ex.No 1. B loading dataset)
import seaborn as sns
import pandas as pd
import numpy as np
dataset = pd.read_csv("salaries.csv")
sns.boxplot(x = dataset['salary'])

Output:

B) Box plot with notch:


Code:

sns.boxplot(x = dataset['salary'], notch=True)


Output:

C) Box plot with more than one field.


Code:

sns.boxplot(x = 'rank', y = 'salary', data=dataset)

Output:

D)Box plot with subplots.


Code:

sns.boxplot(x = 'rank', y = 'salary', hue='sex', data=dataset, palette='Set3')



Output:

E)Box plot with swarm plot.


Code:

sns.boxplot(x = 'rank', y = 'salary', data=dataset)


sns.swarmplot(x = 'rank', y = 'salary', data=dataset, color='0.25')

Output:

Result:
Box plot was generated using seaborn
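
To read the box plots above it helps to see which numbers a box plot encodes. The sketch below computes them in plain Python, assuming the common convention of quartiles for the box and whiskers that reach the most extreme points within 1.5 times the interquartile range (the default whisker rule in matplotlib and seaborn). The helper name box_summary and the sample numbers are invented for illustration, and the exact quartile interpolation can differ slightly between libraries.

```python
import statistics


def box_summary(values):
    """Return quartiles, whisker ends and outliers using the 1.5 * IQR rule."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Invented numbers for illustration; 50 lies far from the rest.
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(summary)
```

Points listed under outliers are the ones a box plot draws as individual markers beyond the whiskers.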

EX.NO:5 Generating joint plot using seaborn.

DATE:

Aim :
To use Seaborn for plotting Joint plot.

A)Joint plot using dataset:


Code:
(Note: Repeat Ex.No 1. B loading dataset)
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
dataset = pd.read_csv("salaries.csv")
sns.jointplot(x = 'salary', y = 'service', data=dataset)

Output:
B)Joint plot with distribution:
Code:

# x and y are passed by keyword; recent seaborn versions do not accept them positionally
sns.jointplot(x='salary', y='service', data=dataset, kind='reg')

Output:

C) Joint plot with kde.


Code:
sns.jointplot(x='salary', y='service', data=dataset).plot_joint(sns.kdeplot, levels=6)

Output:

Result:
Joint plot was generated using seaborn

EX.NO:6 Generating heatmap using seaborn and matplotlib.

DATE:

Aim :
To use Seaborn and matplotlib for plotting Heatmap.

A) Creating 2D array:
Code:
import seaborn as sns # for data visualization
import pandas as pd # for data analysis
import numpy as np # for numeric calculation
import matplotlib.pyplot as plt # for data visualization
array_2d = np.linspace(1,5,12).reshape(4,3) # create numpy 2D array

print(array_2d)

Output:

B)Simple heatmap:
Code:

sns.heatmap(array_2d)

Output:

C)Heatmap using reshape:


Code:

# create a 2D numpy array with 4 rows and 3 columns
annot_arr = np.arange(1,13).reshape(4,3)
sns.heatmap(array_2d, annot= annot_arr)

Output:
D)Heatmap using dataset:
Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
globalWarming_df = pd.read_csv("Who_is_responsible_for_global_warming.csv")
globalWarming_df = globalWarming_df.drop(columns=['Country Code', 'Indicator Name',
                                                  'Indicator Code']).set_index('Country Name')
plt.figure(figsize=(16,9))

annot_kws={'fontsize':10,
'fontstyle':'italic',
'color':"k",
'alpha':0.6,
'rotation':"vertical",
'verticalalignment':'center',
'backgroundcolor':'w'}

sns.heatmap(globalWarming_df, annot = True, annot_kws= annot_kws)


Output:

Result:
Heat map was generated using seaborn and matplotlib.

EX.NO:7 Generating pie chart using matplotlib.

DATE:

Aim :
To use matplotlib for plotting pie charts.

A)Pie chart with explode:


Code:

import numpy as np
import matplotlib.pyplot as plt

# Creating dataset
cars = ['AUDI', 'BMW', 'FORD',
'TESLA', 'JAGUAR', 'MERCEDES']

data = [23, 17, 35, 29, 12, 41]

# Creating explode data


explode = (0.1, 0.0, 0.2, 0.3, 0.0, 0.0)

# Creating color parameters


colors = ( "orange", "cyan", "brown",
"grey", "indigo", "beige")

# Wedge properties
wp = { 'linewidth' : 1, 'edgecolor' : "green" }

# Creating autopct arguments
def func(pct, allvalues):
    absolute = int(pct / 100.*np.sum(allvalues))
    return "{:.1f}%\n({:d} g)".format(pct, absolute)

fig, ax = plt.subplots(figsize =(10, 7))


wedges, texts, autotexts = ax.pie(data,
autopct = lambda pct: func(pct, data),
explode = explode,
labels = cars,
shadow = True,

colors = colors,
startangle = 90,
wedgeprops = wp,
textprops = dict(color ="magenta"))

# Adding legend
ax.legend(wedges, cars,
title ="Cars",
loc ="center left",
bbox_to_anchor =(1, 0, 0.5, 1))

plt.setp(autotexts, size = 8, weight ="bold")


ax.set_title("Customizing pie chart")

# show plot
plt.show()
Output:

Result:

Pie chart was generated using matplotlib.
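
In part A the label function recovers the absolute value of a wedge from its percentage with int(), which truncates. Because the percentage is a floating-point number, the product can land just below the true value and lose one unit. A minimal sketch that uses round() instead; the helper name label_with_count is hypothetical, and the data list is the one from part A:

```python
def label_with_count(pct, allvalues, unit="g"):
    """Format a wedge label as the percentage plus the recovered absolute value."""
    # round() instead of int(): a product such as 22.9999... becomes 23, not 22.
    absolute = round(pct / 100.0 * sum(allvalues))
    return "{:.1f}%\n({:d} {})".format(pct, absolute, unit)


data = [23, 17, 35, 29, 12, 41]  # same values as part A
for value in data:
    pct = 100.0 * value / sum(data)
    print(repr(label_with_count(pct, data)))
```

It can be passed to ax.pie in the same way as the original function, for example autopct=lambda pct: label_with_count(pct, data).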


EX.NO:8 Generating stream chart using matplotlib.

DATE:

Aim :
To use matplotlib for plotting stream graph.

A)Stream graph:
Code:

import numpy as np
import matplotlib.pyplot as plt

# Creating dataset
x = np.arange(0, 10)
y = np.arange(0, 10)

# Creating grids
X, Y = np.meshgrid(x, y)

# x-component to the right


u = np.ones((10, 10))

# y-component zero
v = np.zeros((10, 10))

fig = plt.figure(figsize = (12, 7))

# Plotting stream plot


plt.streamplot(X, Y, u, v, density = 0.5)

# show plot
plt.show()
Output:

B)Advanced stream plot:


Code:
import plotly.figure_factory as ff
import numpy as np

x=np.linspace(-1,1,10)
y=np.linspace(-1,1,10)
Y,X=np.meshgrid(x,y)
u=1-X**2+Y
v=-1+X-Y**2
fig=ff.create_streamline(x,y,u,v,arrow_scale=.1)
fig.show()
Output:

Result:
Stream graph was generated using matplotlib
EX.NO:9 Generating radial bar chart using matplotlib.

DATE:

Aim:
To use matplotlib for plotting radial bar chart.

A)Radial bar chart:


Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Build a dataset
df = pd.DataFrame(
{
'Name': ['item ' + str(i) for i in list(range(1, 51)) ],
'Value': np.random.randint(low=10, high=100, size=50)
})

# Show 3 first rows


df.head(3)

Output:

Code:
# set figure size
plt.figure(figsize=(20,10))

# plot polar axis


ax = plt.subplot(111, polar=True)

# remove grid
plt.axis('off')

# Set the coordinates limits


upperLimit = 100
lowerLimit = 30

# Compute the maximum value in the dataset
max_value = df['Value'].max()

# Compute heights: each item value is converted to the new coordinates.
# In our example, 0 in the dataset is converted to lowerLimit (30)
# and the maximum is converted to upperLimit (100).
slope = (upperLimit - lowerLimit) / max_value
heights = slope * df.Value + lowerLimit

# Compute the width of each bar. In total we have 2*Pi = 360°


width = 2*np.pi / len(df.index)

# Compute the angle each bar is centered on:


indexes = list(range(1, len(df.index)+1))
angles = [element * width for element in indexes]
angles

# Draw bars
bars = ax.bar(
x=angles,
height=heights,
width=width,
bottom=lowerLimit,
linewidth=2,
edgecolor="white")

Output:

B) Radial bar chart with label:


Code:

# initialize the figure


plt.figure(figsize=(20,10))
ax = plt.subplot(111, polar=True)
plt.axis('off')

# Draw bars
bars = ax.bar( x=angles,
height=heights, width=width,
bottom=lowerLimit, linewidth=2,
edgecolor="white",color="#61a4b2",
)

# little space between the bar and the label


labelPadding = 4

# Add labels
for bar, angle, height, label in zip(bars, angles, heights, df["Name"]):

    # Labels are rotated. Rotation must be specified in degrees
    rotation = np.rad2deg(angle)

    # Flip the labels on the left half of the circle so they are not upside down
    alignment = ""
    if angle >= np.pi/2 and angle < 3*np.pi/2:
        alignment = "right"
        rotation = rotation + 180
    else:
        alignment = "left"

    # Finally add the labels
    ax.text(
        x=angle, y=lowerLimit + bar.get_height() + labelPadding,
        s=label,
        ha=alignment, va='center',
        rotation=rotation, rotation_mode="anchor")
Output:

Result:
Radial bar chart was generated by using matplotlib.
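
The two pieces of arithmetic behind this exercise are the linear conversion of data values into bar heights (0 goes to the lower limit, the maximum to the upper limit, as the comments in part A describe) and the flipping of labels on the left half of the circle. A small sketch in plain Python; the function names and the sample values are assumptions made for illustration, while the limits 30 and 100 are the ones from part A:

```python
import math


def rescale(value, max_value, lower=30, upper=100):
    """Map 0 -> lower and max_value -> upper linearly."""
    return lower + (upper - lower) * value / max_value


def label_orientation(angle):
    """Return (rotation in degrees, horizontal alignment) for a bar label."""
    rotation = math.degrees(angle)
    if math.pi / 2 <= angle < 3 * math.pi / 2:
        # Left half of the circle: flip the text so it is not upside down.
        return rotation + 180, "right"
    return rotation, "left"


values = [10, 55, 99]  # invented sample values
print([rescale(v, max(values)) for v in values])
print(label_orientation(math.pi / 4), label_orientation(math.pi))
```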

EX.NO:10 Generating scatterplot matrix chart using plotly.

DATE:

Aim:
To use plotly for plotting scatter plot matrix.

A)Scatter matrix plot:


Code:

import plotly.express as px
df = px.data.iris()
fig = px.scatter_matrix(df,
dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
color="species")
fig.show()

Output:

B) Scatter matrix plot with selected dimension:


Code:

import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('iris_plotly.csv')
# The Iris dataset contains four data variables, sepal length, sepal width, petal length,
# petal width, for 150 iris flowers. The flowers are labeled as `Iris-setosa`,
# `Iris-versicolor`, `Iris-virginica`.

# Define indices corresponding to flower categories, using pandas label encoding


index_vals = df['class'].astype('category').cat.codes

fig = go.Figure(data=go.Splom(
dimensions=[dict(label='sepal length',
values=df['sepal length']),
dict(label='sepal width',
values=df['sepal width']),
dict(label='petal length',
values=df['petal length']),
dict(label='petal width',
values=df['petal width'])],
text=df['class'],
marker=dict(color=index_vals,
showscale=False, # colors encode categorical variables
line_color='white', line_width=0.5)
))

fig.update_layout(
title='Iris Data set',
dragmode='select',
width=600,
height=600,
hovermode='closest',
)

fig.show()
Output:

Result:
Scatter plot matrix was generated using plotly.
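
Part B colours the markers by turning the class labels into integer codes with pandas label encoding. The idea can be sketched in plain Python as follows, assuming plain string labels without missing values and categories numbered in sorted order (the helper name category_codes is hypothetical):

```python
def category_codes(labels):
    """Return an integer code per label, numbering the sorted unique labels from 0."""
    categories = sorted(set(labels))
    lookup = {name: code for code, name in enumerate(categories)}
    return [lookup[name] for name in labels], categories


classes = ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]
codes, categories = category_codes(classes)
print(codes)
print(categories)
```

The codes carry no numeric meaning; they only give each class its own position on the marker colour scale.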
EX.NO:11 Generating parallel coordinates using plotly
Date:

Aim:
To use plotly for plotting parallel coordinates.

A)Parallel coordinates:
Code:
import plotly.graph_objects as go

fig = go.Figure(data=
go.Parcoords(
line_color='blue',
dimensions = list([
dict(range = [1,5],
constraintrange = [1,2], # change this range by dragging the pink line
label = 'A', values = [1,4]),
dict(range = [1.5,5],
tickvals = [1.5,3,4.5],
label = 'B', values = [3,1.5]),
dict(range = [1,5],
tickvals = [1,2,4,5],
label = 'C', values = [2,4],
ticktext = ['text 1', 'text 2', 'text 3', 'text 4']),
dict(range = [1,5],
label = 'D', values = [4,2])
]),
unselected = dict(line = dict(color = 'green', opacity = 0.5))
)
)

fig.show()

Output:

Result:
Parallel coordinates plot was generated using plotly.
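
The constraintrange set on dimension A decides which lines stay highlighted: a line is kept only if its value on every constrained axis lies inside the chosen range. The sketch below is a conceptual model of that selection in plain Python, not plotly's implementation; the function name rows_in_constraint is hypothetical and the values are the ones from part A.

```python
def rows_in_constraint(columns, constraints):
    """Return the indices of rows whose values satisfy every (low, high) range."""
    n_rows = len(next(iter(columns.values())))
    selected = []
    for row in range(n_rows):
        if all(low <= columns[name][row] <= high
               for name, (low, high) in constraints.items()):
            selected.append(row)
    return selected


# Values per dimension, copied from part A (two lines, four axes).
columns = {"A": [1, 4], "B": [3, 1.5], "C": [2, 4], "D": [4, 2]}
print(rows_in_constraint(columns, {"A": (1, 2)}))
```

With the range [1, 2] on axis A only the first line qualifies, so the second line is the one drawn with the unselected style.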

EX.NO:12 Generating tree chart using squarify.

DATE:

Aim :
To use squarify for plotting a treemap.
A)Simple plot using squarify:
Code:
import squarify
import matplotlib.pyplot as plt
squarify.plot(sizes=[1, 2, 3, 4, 5], color="yellow")

Output:

B) Simple tree:
Code:
data = [300, 400, 120, 590, 600, 760]
colors = ["red", "black", "green", "violet", "yellow", "blue"]
squarify.plot(sizes=data, color=colors)
plt.axis("off")

Output:

C)Treemap with axis:


CODE:

episode_data = [1004, 720, 366, 360, 80]


anime_names = ["One Piece", "Naruto", "Bleach", "Gintama", "Attack On Titan"]
squarify.plot(episode_data, label=anime_names, pad=2)
plt.axis("on")

Output:

Result:
Tree map was generated using squarify.
EX.NO: 13 Study of Tableau Environment
DATE:

Aim:

To study about Tableau workspace, icons and their visual cues in Tableau Desktop.

i) Tableau Workspace
The Tableau workspace consists of menus, toolbar, data pane, cards and shelves, and
one or more sheets. Sheets can be worksheets, dashboards, or stories.

Workspace area

A. Workbook name.
• A workbook contains sheets. A sheet can be a worksheet, a dashboard,
or a story.
• A worksheet is where you build views of your data by dragging and
dropping fields onto shelves.

• A dashboard is a combination of several views that you can arrange for
presentation or to monitor.
• A story is a sequence of views or dashboards that work together to
convey information.

• The sheets display along the bottom of the workbook as tabs.


B. Cards and shelves - Drag fields to the cards and shelves in the workspace to add data to your
view.
C. Toolbar - Use the toolbar to access commands and analysis and navigation tools.
D. View - This is the canvas in the workspace where you create a visualization (also referred to
as a "viz").
E. Click this icon to go to the Start page, where you can connect to data.
F. Side Bar - In a worksheet, the side bar area contains the Data pane and the Analytics pane.
G. Data Source Page. Click this tab to go to the Data Source page and view your data.
H. Status bar - Displays information about the current view.
I. Sheet tabs - Tabs represent each sheet in your workbook. This can include worksheets,
dashboards, and stories.

ii) Visual Cues and Icons in Tableau Desktop


Tableau provides many visual cues to evaluate the type of data that’s displayed in
the Data pane and the state of a data view.
Icons used to describe the type of data sources in the Data pane

• A blue check mark indicates that the data source is the primary data source in
the workbook.

• An orange check mark indicates that the data source is the secondary data
source in the workbook.
Icons used to describe the fields in the Data pane

• Blue icons indicate that the field is discrete.
• Green icons indicate that the field is continuous.
• Icons preceded by the equal sign (=) indicate that the field is a user-defined
calculation or a copy of another field.
iii) Tableau – Data Terminology
1. Alias: Alias is the alternate name that we can appoint to the field or to a
dimension member.

2. Bin: A bin is a user-defined grouping of measures in the data source.

3. Bookmark: A bookmark is a .tbm file found in the Bookmarks folder
of the Tableau repository, which contains a single worksheet. Like web
browser bookmarks, a .tbm file is a convenient way to quickly display different
analyses.
4. Calculated field: A calculated field is a new field that the user creates, using a
formula, to modify the existing fields in a given data source. It is generally
created in order to perform analysis more easily and comfortably.

5. Crosstab: A crosstab is used for the text table view. It utilizes various text tables
to display the numbers correlated with dimension members.

6. Dashboard: A dashboard is a union of several views arranged on a single page. In
Tableau, a dashboard is used to interact with other worksheets and also to
observe and differentiate a variety of data simultaneously.

7. Data Pane: A Data pane is situated on the left side of the workbook. It exhibits
the data sources field which is connected to Tableau. The fields are further
divided into dimensions and measures. A Data pane also unveils custom fields
such as calculations, binned fields, and groups. The views of data sources are
built by dragging fields from the data pane onto the various shelves which is a
part of every worksheet.

8. Data Source Page: As the name suggests, the data source page is a page where
we set up the data source. The data source page mainly contains four main areas
− left pane, join area, a preview area, and metadata area.

9. Dimension: A field of categorical data is known as a Dimension. Dimensions
are used to hold discrete data like hierarchies and members which cannot be
aggregated. Dimensions also hold characteristic values such as dates, names,
and geographical data. Examples are dates, customer names, and customer
segments.
10. Extract: A saved subset of a data source that can enhance performance and
study offline is known as an Extract. We can make an extract by defining limits
and filters that contain the data, which you want in the extract.

11. Filters Shelf: A Filter shelf is also situated on the left side of the workbook. The
function of the filters shelf is to exclude the data from a view by filtering it, using
both dimensions and measures.
12. Format Pane: A Format pane is found on the left side of the workbook, and it
also holds various formatting settings. Its function is to control the entire view of
the worksheet, as well as views of the individual fields.

13. Level Of Detail (LOD) Expression: A Level Of Detail Expression is a syntax
that assists the combination of various dimensions in addition to the view level.
Using a Level Of Detail Expression, the user can attach multiple dimensions to
an aggregate expression.

14. Marks: Marks helps in the visual representation of one or more rows in a data
source. The users can control the type, color, and size of marks. A mark can be
anything like a bar, line, or square, etc.

15. Marks Card: The position of the Marks card is on the left side of the Worksheet.
On the marks card, the user can drag and drop fields to the control mark
properties such as color, type, shape, size, detail, and tooltip.

16. Pages Shelf: The Pages shelf is located on the left side of the view. It
splits a view into a sequence of pages on the
basis of the values and members in a continuous or discrete field. Adding a
field to the Pages shelf works like adding a field to the Rows shelf, i.e.,
for each new row, a new page is created.

17. Rows Shelf: A Row shelf is situated on the top of the workbook. The row shelf
is used in creating the rows of a data table. It can create rows having any
numbers of measures and dimensions. If the user places a dimension on the
Rows shelf, then Tableau will build headers for the members of that dimension.
Whereas if the user places a measure on the Rows shelf, Tableau will build
quantitative axes for that particular measure.

18. Shelves: The Shelves are the named areas found on the top and left of the view.
The views can be created by placing fields onto the shelves. Some shelves
become active when the user selects a particular mark type. For example, the Shape
shelf is available only when the user selects the Shape mark type.

19. Workbook: A Workbook is a file having .twb extension, which can carry one or
more worksheets as well as dashboards and stories.

20. Worksheet: A sheet where we build views of data set by dragging various fields
onto the shelves. The collection of sheets is known as Worksheets.

iv) Tableau Navigation


The main interface with all the available Menu commands.

File menu is used to create a new Tableau workbook and open existing workbooks
from both the local system and Tableau server.

Data menu is used to create new data source to fetch the data for analysis and
visualization. It also allows you to replace or upgrade the existing data source.

The important features in this menu are as follows


• New Data Source allows to view all the types of connections available and choose
from it.

• Refresh All Extracts refreshes the data from the source.

• Edit Relationships option defines the fields in more than one data source for linking.

Worksheet Menu is used to create a new worksheet along with various display
features such as showing the title and captions, etc.

The important features in this menu are as follows −

• Show Summary allows to view the summary of the data used in the worksheet such
as, count, etc.

• Tooltip shows the tooltip when hovering above various data fields.

• Run Update option updates the worksheet data or filters used.

Dashboard Menu is used to create a new dashboard along with various display
features, such as showing the title and exporting the image, etc.

The important features in this menu are as follows −

• Format sets the layout in terms of colors and sections of the dashboard.

• Actions link the dashboard sheets to external URLs or other sheets.

• Export Image option exports an image of the Dashboard.

Story Menu is used to create a new story which has many sheets or dashboards with
related data.

The important features in this menu are as follows −

▪ Format sets the layout in terms of colors and sections of the story.

▪ Run Update updates the story with the latest data from the source.

▪ Export Image option exports an image of the story.

Analysis Menu is used for analyzing the data present in the sheet. Tableau provides many
out-of-the-box features, such as calculating the percentage and performing a forecast, etc.

The important features in this menu are as follows –


• Forecast shows a forecast based on available data.

• Trend Lines shows the trend line for a series of data.

• Create Calculated Field option creates additional fields based on certain calculation on
the existing fields.

Map Menu is used for building map views in Tableau. You can assign geographic
roles to fields in your data.

The important features in this menu are as follows −

• Map Layers hides and shows map layers, such as street names, country borders, and
adds data layers.

• Geocoding creates new geographic roles and assigns them to the geographic fields in
your data.

Format Menu is used for applying the various formatting options to enhance the
look and feel of the dashboards created. It provides features such as borders,
colors, alignment of text, etc.

The important features in this menu are as follows −

▪ Borders applies borders to the fields displayed in the report.

▪ Title & Caption assigns a title and caption to the reports.

▪ Cell Size customizes the size of the cells displaying the data.

▪ Workbook Theme applies a theme to the entire workbook.

Server Menu is used to login to the Tableau server if you have access, and publish
your results to be used by others. It is also used to access the workbooks published
by others.

The important features in this menu are as follows −

▪ Publish Workbook publishes the workbook in the server to be used by others.

▪ Publish Data Source publishes the source data used in the workbook.

▪ Create User Filters creates filters on the worksheet to be applied by various users while
accessing the report.
v) Data Types
In Tableau, there are seven primary data types. Tableau automatically detects
the data type of each field as soon as the data is loaded from the source and
assigns it to the field. These data types are listed below (Date and Date & Time
are described together):

i) String Data type: The collection of characters give rise to the string data
type. A string is always enclosed within a single or double inverted comma.
We can divide String data type into two types, Char and Varchar.

ii) Numeric Data type: This data type consists of both integer type or floating
type.

iii) Date and Time Data type: Tableau supports all forms of date and time like
dd-mm-yy, or mm-dd-yyyy, etc. And the time data values can be in the form
of a decade, year, quarter, month, hour, minutes, seconds, etc. Whenever the
user enters data and time values, Tableau automatically registers it under
Date data type and Date & Time data value.

iv) Boolean Data type: As a result of relational calculations, boolean data type
values are formed. The boolean data values are either True or False.

v) Geographic Data type: All values that are used in maps, comes under
geographic data type. The example of geographic data values is country
name, state name, city, region, postal codes, etc.

vi) Cluster or Mixed Data type: Sometimes data set contains values having a
mixture of data types. Such values are known as cluster group values or
mixed data values. In such a situation, users have the option either to handle
it manually or allow Tableau to operate on it.
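
The idea of detecting a data type from raw values can be illustrated with a short Python sketch. This is not Tableau's detection logic; it is a simplified model under stated assumptions: values arrive as text, only the three date formats listed in DATE_FORMATS are recognised, and the function names are invented for the example.

```python
from datetime import datetime

DATE_FORMATS = ("%d-%m-%Y", "%Y-%m-%d", "%m/%d/%Y")  # assumed formats


def infer_type(text):
    """Guess a coarse data type for one raw text value."""
    value = text.strip()
    if value.lower() in ("true", "false"):
        return "boolean"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return "date"
        except ValueError:
            continue
    return "string"


def infer_column_type(values):
    """A column gets one type only if all values agree; otherwise it is mixed."""
    kinds = {infer_type(v) for v in values}
    if kinds <= {"integer", "float"}:
        return "float" if "float" in kinds else "integer"
    return kinds.pop() if len(kinds) == 1 else "mixed"


print(infer_column_type(["12", "7.5", "3"]))
print(infer_column_type(["25-12-2022", "Chennai"]))
```

A column whose values disagree is reported as mixed, which corresponds to the cluster or mixed case in item vi), where the user decides how the field should be treated.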

Datatypes and Icons

Result: Thus the environment of Tableau is studied.



Ex.NO:14 Connecting with various data sources


DATE:

Aim:
To connect Tableau with the following data sources:
a) Text file b) Excel file c) PDF d) Website e) Database
Procedure:

1. Select the Data tab and connect to data.
2. Click on the corresponding data source.
3. Select the file.
4. Click on Open. This will connect the file to Tableau.
5. The name of the file will be displayed on the left side of the window.

To connect to a website
1. Click on the More option in the To a Server data tab.
2. Select Web Data Connector. This will open a Tableau Web Data Connector window.
3. Enter the URL of the web data.

To connect to a MS Access Database

Sample Input Output:

1. Connecting to Text File Sample-Superstore.csv

(https://drive.google.com/uc?export=download&id=1xV_3jkn7UbHBpLd47DgY5g6obdbmQk1)
47

2. Connecting to Excel file (https://drive.google.com/uc?


export=download&id=1wq60dEFV3NUPqpwXaMKtmQPnxzqZHR5F)

3. Connecting to PDF File with tables:



4. Connecting to Websites (Google Sheet)



5. Connecting to MS Access database

Result:
Thus Tableau was connected with various data sources successfully and the
output was verified.
EX.NO: 15 Building Views
DATE:

Aim:
To build views of the connected file using dimensions and
measures automatically and customize it.
Procedure:
1. Connect the Sample Superstore data source (Excel file) to Tableau.
2. Select the sheet to view and double-click it.
3. Click Sheet 1 in the status bar.
4. Select the fields you want to view from the Data pane and drag them onto the Columns and
Rows shelves.
5. Press the Show Me button on the toolbar.
6. Select the type of view.
7. Tableau automatically creates the view of the data.
8. Customize the view by selecting a custom mark type.

Sample Input and Output:

Input: Connect the SuperStore.xls file

Fields:

Columns : Sales
Rows : SubCategory, Region
1) Automatic View
2) Change Color

3) Change Size

4) Show Label

5) Show tooltip.

Result:

Thus the views have been created using dimensions and measures and customized.
EX.NO:16 Applying Filters
DATE:

Aim :
To apply filters for removing certain values or range of values from a result set.

Concepts
Filters are defined by selecting specific dimension members or a range of measure
values. All fields that are filtered will be in the Filters shelf. A filter can be added
either by selecting data in the view, dragging a field to the Filters shelf, or turning
on quick filters.

i) Creating Filters for Measures


Measures are numeric fields. So, the filter options for such fields involve choosing values.
Tableau offers the following types of filters for measures.
• Range of Values − Specifies the minimum and maximum values of the
range to include in the view.

• At Least − Includes all values that are greater than or equal to a specified
minimum value.

• At Most − Includes all values that are less than or equal to a specified
maximum value.

• Special − Helps you filter on Null values. Include only Null values, Non-
null values, or All Values.

The following worksheet shows these options.
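
The four measure filter types listed above can be mimicked outside Tableau. The following is only a sketch in Python: the sample values and the function names are hypothetical, None stands in for a Null value, and it assumes that the range, At Least and At Most filters leave Null values out.

```python
# Hypothetical measure values; None stands in for a Null in the data source.
sales = [120.0, None, 5400.0, 19999.0, 25000.0, None, 7300.0]


def range_of_values(values, low, high):
    """Range of Values: keep non-null values with low <= value <= high."""
    return [v for v in values if v is not None and low <= v <= high]


def at_least(values, minimum):
    """At Least: keep non-null values greater than or equal to minimum."""
    return [v for v in values if v is not None and v >= minimum]


def at_most(values, maximum):
    """At Most: keep non-null values less than or equal to maximum."""
    return [v for v in values if v is not None and v <= maximum]


def special(values, mode):
    """Special: mode is 'null', 'non-null' or 'all'."""
    if mode == "null":
        return [v for v in values if v is None]
    if mode == "non-null":
        return [v for v in values if v is not None]
    return list(values)


print(range_of_values(sales, 5000, 20000))  # [5400.0, 19999.0, 7300.0]
```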



ii) Creating Filters for Dimensions

Dimensions are descriptive fields whose values are strings. Tableau offers the
following types of filters for dimensions.
• General Filter − allows you to select specific values from a list.
• Wildcard Filter − allows you to specify wildcards such as cha* to keep all string values
starting with cha.
• Condition Filter − applies conditions such as sum of sales.
• Top Filter − chooses the records representing a range of top values.
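
The behaviour of the four dimension filters can be sketched in Python as follows. The rows, the threshold of 500 and the choice of the top 2 members are hypothetical values for illustration, and the wildcard match is assumed to ignore case.

```python
from fnmatch import fnmatchcase

# Hypothetical (sub-category, sales) rows.
rows = [
    ("Chairs", 300.0), ("Chairs", 450.0), ("Tables", 900.0),
    ("Charts", 50.0), ("Phones", 1200.0), ("Paper", 80.0),
]


def total_by_member(rows):
    """Sum the measure for each dimension member."""
    totals = {}
    for member, value in rows:
        totals[member] = totals.get(member, 0.0) + value
    return totals


totals = total_by_member(rows)

# General filter: keep members picked from a list.
general = {m: t for m, t in totals.items() if m in {"Chairs", "Paper"}}

# Wildcard filter: cha* keeps members starting with "cha" (case ignored here).
wildcard = {m: t for m, t in totals.items() if fnmatchcase(m.lower(), "cha*")}

# Condition filter: keep members whose sum of sales exceeds a threshold.
condition = {m: t for m, t in totals.items() if t > 500}

# Top filter: the top 2 members by sum of sales.
top = sorted(totals, key=totals.get, reverse=True)[:2]
```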

Following worksheet shows these options.



Queries

Connect the SuperStore.xls file and perform the following queries.

Data Connection
• Connect a dataset (SuperStore.xls file).
• Drag and drop the needed sheet of the connected dataset.
• Click on Sheet 1 to open the Tableau worksheet.
• You will get all the dataset attributes on the left side and a worksheet to work in.
• Add the dimensions and measures (Sub Category, Region and Sales).
• Prepare a worksheet with some graphs or charts.

1. Display the ship mode and sub-category of the products having an average profit of more
than $50,000.
i. Create a view with ship mode and sub-category in the Columns shelf and profit in the
Rows shelf.
ii. Drag the AVG(Profit) value to the Filters shelf.
iii. Choose Average as the filter mode.
iv. Next, choose "At least" and give the value 50000 to filter the rows.

2. Display profit within a period by specifying a range of dates.

i. Create a view with order date in the Columns shelf and profit in the Rows shelf.
ii. Drag the "order date" field to the Filters shelf.
iii. Choose Range of Dates in the filter dialog box and choose the dates.
iv. On clicking OK, the final view appears showing the result for the chosen range of dates.

3. Select all sub-category names starting with “a”.

i. Create a view with sales in the Columns shelf and sub-category in the Rows shelf.
ii. Next, drag the sub-category field to the Filters shelf.
iii. All the sub-categories appear next to the chart.
iv. Apply wildcard filtering using the expression a*.
v. This selects all sub-category names starting with “a”.

4. Show orders with an average quantity of 26 or more.

i. Drag the Order Quantity measure to the Filters shelf and select Average as the
aggregation.
ii. Choose the At Least filter with the minimum value set to 26.
iii. When finished, click OK.

5. Find the top 10 sub-categories of products for the category called Furniture. (Example
of a context filter)

Step 1 − Drag the dimension Sub-Category to the Rows shelf and the measure Sales
to the Columns shelf. Choose the horizontal bar chart as the chart type. Drag
the dimension Sub-Category again to the Filters shelf. You will get the
following chart.

Step 2 − Right-click on the field Sub-Category in the Filters shelf and go to the fourth tab,
named Top. Choose the option By Field. From the next drop-down, choose
the option Top 10 by Sales Sum.

Step 3 − Drag the dimension Category to the Filters shelf. Right-click to edit and under
the General tab choose Furniture from the list. The result shows three
sub-categories of products, because the Top 10 is still computed across all the
products before the Category filter is applied.

Step 4 − Right-click the Category: Furniture filter and select the option Add
to Context. Tableau now applies the Category filter first, so the final result
shows the top 10 sub-categories computed within the category Furniture only.
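
The effect of adding a filter to the context in query 5 comes down to the order in which the two filters are evaluated. The sketch below illustrates this in Python under the assumption that a Top filter is evaluated before an ordinary dimension filter but after a context filter. The data and the function names are hypothetical, and a top 3 is used instead of a top 10 to keep the example small.

```python
# Hypothetical (category, sub-category, sales) totals.
data = [
    ("Furniture", "Chairs", 900), ("Furniture", "Tables", 700),
    ("Furniture", "Bookcases", 200), ("Technology", "Phones", 1000),
    ("Technology", "Copiers", 800), ("Office Supplies", "Paper", 100),
]


def top_n(rows, n):
    """Return the n rows with the largest sales."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]


def only_category(rows, category):
    """Keep the rows that belong to one category."""
    return [r for r in rows if r[0] == category]


# Ordinary dimension filter: Top N is evaluated first, then Category.
without_context = only_category(top_n(data, 3), "Furniture")

# Context filter: Category is evaluated first, then Top N within it.
with_context = top_n(only_category(data, "Furniture"), 3)

print([r[1] for r in without_context])  # ['Chairs']
print([r[1] for r in with_context])     # ['Chairs', 'Tables', 'Bookcases']
```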
6. Create a view to show only sales between $5000 and $20,000.
Drag Sales to the Filters shelf, select Range of Values, and set the minimum to 5000 and
the maximum to 20000.

7. Show the bottom 10 products based on sales, all products sold in the last 30 days
8. Show the Products that have a Time to Ship that is greater than 10 days.
Use the Condition tab of the Filter dialog box, which offers two options.

By field: select this option to specify a condition based on existing fields in the data source.
Use the first two drop-down lists to select the field and aggregation you want to base
the condition on. Then select a condition operator such as greater than, equal to, etc.
Finally, type a criteria value into the text box. For this query, select Time to Ship and AVG
from the first two drop-down lists. Then select Greater ( > ) from the operator list and
type 10 into the text box.

By formula: select this option for more advanced filter conditions. You can type a
custom formula into the text box or open the formula editing dialog box by clicking
the button to the right of the text box.

9. Create a basic filter on the Container dimension that excludes the Small Pack and Wrap
Bag shipping containers. Show just the top 3 of those orders in terms of sales. Exclude
orders that were shipped via Delivery Truck.

i. Drag the Container dimension to the Filters shelf to open the Filter dialog box.
ii. Click the None button at the bottom of the list to deselect all of the shipping containers.
iii. Then select the Exclude option in the upper right corner of the dialog box.
iv. Finally, select Small Pack and Wrap Bag.
v. When finished, click OK. The view updates to show only orders that were
not shipped in a Small Pack or Wrap Bag.
vi. Now refine the filter on Container by adding a limit. Right-click the
Container field on the Filters shelf and select Filter.
vii. The Filter dialog box opens. Leave the selections as they are. Switch to the Top tab
and select By Field.
viii. Select Top 3 from the first two drop-down lists.
ix. Then select Sales and SUM from the remaining drop-down lists. When finished, click OK.
x. The Top formula is computed after the selections on the General tab, so the view
shows just the top 3 of those orders in terms of sales.
xi. Now add a new filter on Ship Mode to exclude orders that were shipped via
Delivery Truck. Right-click the Delivery Truck row header and select Exclude.
xii. The Delivery Truck ship mode is removed from each region in the view.

10. Create a view to include only orders that were placed between August 2, 2008
and May 1, 2009.
i. Place Order Date onto the Columns shelf and select All Values as the aggregation.
ii. Then place Profit onto the Rows shelf.
iii. Drag the Order Date field to the Filters shelf and select Range of Dates in the
Filter Field dialog box.
iv. Then click Next.
v. The Filter dialog box displays the Order Date limits. Use the
drop-down date controls to specify a new lower limit of August 2, 2008 and an
upper limit of May 1, 2009.
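
The range-of-dates filter in query 10 can be sketched in Python as shown below. The order rows are hypothetical, and the sketch assumes that both the lower and the upper limit are included in the range.

```python
from datetime import date

# Hypothetical (order date, profit) rows.
orders = [
    (date(2008, 7, 15), 120.0),
    (date(2008, 8, 2), 80.0),
    (date(2009, 1, 20), -35.0),
    (date(2009, 5, 1), 410.0),
    (date(2009, 6, 9), 55.0),
]


def in_date_range(rows, start, end):
    """Keep rows whose date lies in the closed interval [start, end]."""
    return [(d, p) for d, p in rows if start <= d <= end]


selected = in_date_range(orders, date(2008, 8, 2), date(2009, 5, 1))
total_profit = sum(p for _, p in selected)
print(len(selected), total_profit)  # 3 455.0
```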

Result:
Thus the given queries have been executed successfully and the outputs are verified.
EX.NO: 17 Drawing Charts and Graphs
DATE:

Aim:
To create various types of charts and graphs using the dimensions and measures

Procedure:
1. Connect the data source.
2. Go to the worksheet.
3. Drag the needed field into the column shelf.
4. Drag needed field into the row shelf.
5. Choose the chart type from the Show Me tool.

Sample Input and Output

i) Bar Chart

ii) Bar Chart with Color Range



iii) Stacked Bar Chart

iv) Simple Line Chart



v) Multiple Measure Line Chart



vi) Line Chart with Label

vii) Simple Pie Chart

viii) Drill-Down Pie Chart

ix) Bubble Charts


Drag the measure Profit and drop it onto the "Size" pane.
Drag the dimension Ship Mode and drop it onto the "Labels" pane.
Also drag the dimension Ship Mode onto the "Color" pane under the "Marks" card.

x) Bubble Chart with Measure Values


1. Drag the measure Sales onto the "Labels" pane.

xi) Bump Chart

The bump chart is used to compare two dimensions using one measure value.
It shows how the rank of a value changes over a time dimension, a place dimension,
or any other relevant dimension.

The bump chart can take two dimensions with zero or more measures.

Find variations between the Ship Mode of the product and the Sub-Category.

Step 1: Drag the dimension Sub-Category to the Columns shelf.

Step 2: Drag the dimension Ship Mode to the Color shelf.

By default, it creates the following view of the chart.



Step 3: Create the calculated field.

1) Go to the Analysis menu and create a calculated field.

2) Give the calculated field a name such as Rank.

3) Write the expression INDEX() in the calculation area. It returns the index of the
current row in the partition.

4) Click on the OK button.

5) The new calculated field Rank will be visible in the Measures section.

6) Drag the Rank field to the Rows shelf.

7) Right-click on the Rank field and convert it to "Discrete".

After that, the following view appears, showing the dimension Sub-Category with each
Ship Mode.

Step 5: Apply some more calculations to the calculated field Rank using the measure Profit.

1) Right-click on the measure Rank.

2) Choose the "Edit Table Calculation" option.

3) It opens the Table Calculation window.

4) Then, choose the "Specific Dimensions" option.

5) Select the Sub-Category field and the Ship Mode field.

6) Choose sorting by the Profit field, partitioning by Sub-Category and addressing by
Ship Mode.

The resulting view is shown in the screenshot below.

After completing all the above steps, you will get the bump chart shown in the
screenshot below. It shows the variation of the Profit for each Ship Mode across
different sub-categories.
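
The Rank field of the bump chart numbers the Ship Modes inside each Sub-Category after sorting them by Profit. A minimal Python sketch of that idea is given below. The profit figures and the function name are hypothetical, and a descending sort (highest profit gets index 1) is an assumption, since the sort direction is chosen in the Table Calculation window.

```python
from collections import defaultdict

# Hypothetical (sub-category, ship mode, profit) totals.
profits = [
    ("Chairs", "First Class", 300), ("Chairs", "Second Class", 500),
    ("Chairs", "Standard Class", 900), ("Phones", "First Class", 700),
    ("Phones", "Second Class", 200), ("Phones", "Standard Class", 400),
]


def rank_within_partition(rows):
    """Map (sub-category, ship mode) to a 1-based index by descending profit."""
    partitions = defaultdict(list)
    for sub_category, ship_mode, profit in rows:
        partitions[sub_category].append((ship_mode, profit))
    ranks = {}
    for sub_category, members in partitions.items():
        ordered = sorted(members, key=lambda m: m[1], reverse=True)
        for index, (ship_mode, _) in enumerate(ordered, start=1):
            ranks[(sub_category, ship_mode)] = index
    return ranks


ranks = rank_within_partition(profits)
print(ranks[("Chairs", "Standard Class")])  # 1
print(ranks[("Phones", "Second Class")])    # 3
```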

xii) Histogram

A histogram is a chart that displays the shape of a distribution. It looks like a bar
chart but groups the values of a continuous measure into ranges (bins). The height of
each bar represents the number of values present in that range.

To create a histogram, we need only one measure. Tableau creates an additional bin field
for the measure.
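
The bin field and the count behind a histogram can be sketched in Python as follows. The quantities and the bin size of 5 are hypothetical; Tableau chooses its own bin size unless one is specified.

```python
from collections import Counter

# Hypothetical order quantities.
quantities = [1, 2, 2, 3, 5, 5, 6, 7, 9, 12, 13, 14]


def bin_counts(values, bin_size):
    """Count the values in each bin, keyed by the lower edge of the bin."""
    counts = Counter((v // bin_size) * bin_size for v in values)
    return dict(sorted(counts.items()))


histogram = bin_counts(quantities, bin_size=5)
print(histogram)  # {0: 4, 5: 5, 10: 3}
```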

For example, consider a data source such as Sample-Superstore, and suppose you want to
find the quantities of sales for the different Segments. For this, follow the
procedure below step by step:

Step 1: Go to the worksheet.

Step 2: Drag the measure Quantity into the columns shelf.

Step 3: Click on "Show Me" in the toolbar and select the histogram chart icon, as shown in the
screenshot below.

NOTE: The histogram chart is available in "Show Me" when the view contains only one
measure and no dimensions.

Step 4: After the histogram chart is selected as the chart type:

o The view changes and shows vertical bars, with a continuous X-axis and Y-axis.
o The measure Quantity with SUM aggregation in the Columns shelf is replaced by the
continuous Quantity (bin) dimension.
o The Quantity field moves to the Rows shelf and its aggregation changes from SUM to
CNT (Count).

Step 5: Drag the dimension Segment and drop it onto the Color shelf under the Marks pane.

After the Segment field is added to the Color shelf, you can see the relationship between the
Segment field and the Quantity of items per order, as shown in the screenshot below.

Step 6: Hold the Ctrl key on the keyboard and drag the CNT(Quantity) field from the Rows
shelf to the Label shelf under the Marks pane.

Step 7: Right-click on the CNT(Quantity) field in the Marks pane. Then:

o Click on the Quick Table Calculation option in the list.
o Select the Percent of Total option.

Now each colored section of each bar shows its percentage of the total quantity, as shown in the
following screenshot.
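
The Percent of Total labels can be approximated in Python as shown below. The rows, the bin size and the function name are hypothetical. The sketch assumes that the total is taken over every mark in the view; in Tableau the Compute Using setting of the table calculation decides the scope of the total.

```python
from collections import Counter

# Hypothetical (quantity, segment) pairs, one per order line.
orders = [
    (1, "Consumer"), (2, "Consumer"), (3, "Corporate"), (4, "Home Office"),
    (6, "Consumer"), (7, "Corporate"), (8, "Corporate"), (9, "Consumer"),
]


def percent_of_total(rows, bin_size):
    """Return {(bin lower edge, segment): percent of all rows}."""
    counts = Counter(((q // bin_size) * bin_size, seg) for q, seg in rows)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}


shares = percent_of_total(orders, bin_size=5)
print(shares[(0, "Consumer")])  # 25.0
```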

Result:
Thus various types of charts and graphs have been created using the dimensions and
measures.

EX.NO: 18 Working with Functions

DATE:

Aim:

To work with inbuilt functions

Concept:

Tableau has a number of inbuilt functions which help in creating expressions for complex
calculations.
Following are the different categories of functions.
• Number Functions
• String Functions
• Date Functions
• Logical Functions
• Aggregate Functions
Queries
1. Find the minimum sales of each category

Procedure

1. Connect to the Sample - Superstore saved data source, which comes with Tableau.

2. Navigate to a worksheet and select Analysis > Create Calculated Field.

3. In the calculation editor that opens, do the following:

4. Name the calculated field Minimum Sales transaction

5. Enter the following formula:

a. MIN(Sales)

6. When finished, click OK.
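
When the calculated field MIN(Sales) is placed in a view next to Category, it returns the smallest sales value within each category. A minimal Python sketch of the same aggregation follows; the transactions and the function name are hypothetical.

```python
# Hypothetical (category, sales) transactions.
transactions = [
    ("Furniture", 261.96), ("Furniture", 48.86), ("Technology", 907.15),
    ("Technology", 22.37), ("Office Supplies", 14.62), ("Office Supplies", 7.28),
]


def minimum_by_category(rows):
    """Return the smallest sales value seen for each category."""
    minimums = {}
    for category, sales in rows:
        if category not in minimums or sales < minimums[category]:
            minimums[category] = sales
    return minimums


minimum_sales = minimum_by_category(transactions)
print(minimum_sales["Furniture"])  # 48.86
```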



2. Display the order number from order ID

1. Connect to the Sample - Superstore saved data source, which comes with Tableau.

2. Navigate to a worksheet.

3. From the Data pane, under Dimensions, drag Order ID to the Rows shelf.

Notice that every order ID contains values for country (CA and US, for
example), year (2011), and order number (100006). For this example, you
will create a calculation to pull only the order number from the field.

4. Select Analysis > Create Calculated Field.

5. In the calculation editor that opens, do the following:

o Name the calculated field Order ID Numbers.

o Enter the following formula:

RIGHT([Order ID], 6)

This formula takes the specified number of characters (6) from the right of the string and
pulls them into a new field.

Therefore, RIGHT('CA-2011-100006', 6) = '100006'.

o When finished, click OK.

The new calculated field appears under Dimensions in the Data
pane. Just like your other fields, you can use it in one or more
visualizations.

6. From the Data pane, drag Order ID Numbers to the Rows shelf. Place it to the right of
Order ID.

Notice how the fields differ now.
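
The behaviour of RIGHT can be reproduced with string slicing. The Python function below is a hypothetical stand-in for the Tableau function, and the second order ID is made up for illustration.

```python
def right(text, count):
    """Return the last `count` characters of text, like RIGHT(string, number)."""
    return text[-count:] if count > 0 else ""


# The first ID is the example used above; the second one is hypothetical.
order_ids = ["CA-2011-100006", "US-2012-118983"]
order_numbers = [right(order_id, 6) for order_id in order_ids]
print(order_numbers)  # ['100006', '118983']
```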

3. Display the year and month of order date

1. In Tableau Desktop, connect to the Sample-Superstore saved data source, which comes
with Tableau.

2. Navigate to a worksheet.

3. From the Data pane, under Dimensions, drag Order Date to the Rows shelf.

4. On the Rows shelf, click the plus icon (+) on the YEAR(Order Date) field.

QUARTER(Order Date) is added to the Rows shelf and the view updates.

5. On the Rows shelf, click the plus icon (+) on the QUARTER(Order Date) field to drill
down to MONTH(Order Date).

6. Select Analysis > Create Calculated Field.

7. In the calculation editor that opens, do the following:

o Name the calculated field Quarter Date.

o Enter the following formula:

DATETRUNC('quarter', [Order Date])

o When finished, click OK.

The new date calculated field appears under Dimensions in the Data
pane. Just like your other fields, you can use it in one or more
visualizations.

8. From the Data pane, under Dimensions, drag Quarter Date to the Rows shelf and place it
to the right of MONTH(Order Date).
The visualization updates with year values. This is because Tableau rolls date
data up to the highest level of detail.
9. On the Rows shelf, right-click YEAR(Quarter Date) and select Exact Date.

10. On the Rows shelf, right-click YEAR(Quarter Date) again and select Discrete.

The visualization updates with the exact quarter date for each row in the table.
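
Truncating a date to its quarter returns the first day of the quarter that contains it. The sketch below shows the arithmetic in Python. The function name is hypothetical, and it assumes a calendar year that starts in January.

```python
from datetime import date


def truncate_to_quarter(day):
    """Return the first day of the quarter containing `day`."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)


print(truncate_to_quarter(date(2011, 5, 17)))  # 2011-04-01
```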

4. Find the categories which are losing money in each state

1. Connect to the Sample - Superstore saved data source, which comes with
Tableau.

2. Navigate to a worksheet.

3. From the Data pane, drag State to the Rows shelf.

4. From the Data pane, drag Category to the Rows shelf and place it to the right of
State.

5. From the Data pane, drag Sales to the Columns shelf.

6. Select Analysis > Create Calculated Field.

7. In the calculation editor that opens, do the following:

o Name the calculated field KPI.

o Enter the following formula:

SUM([Profit]) > 0

This calculation checks whether the total profit of a member is greater than
zero. If so, it returns true; if not, it returns false.

o When finished, click OK.

The new calculated field appears under Measures in the Data pane.
Just like your other fields, you can use it in one or more visualizations.

8. From the Data pane, drag KPI to Color on the Marks card.
You can now see which categories are losing money in each state.
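
The KPI field is evaluated once for every State and Category combination in the view. A minimal Python sketch of that logic is shown below; the rows and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical (state, category, profit) rows.
rows = [
    ("Texas", "Furniture", -120.0), ("Texas", "Furniture", 40.0),
    ("Texas", "Technology", 310.0), ("Ohio", "Office Supplies", -15.0),
    ("Ohio", "Technology", 90.0),
]


def kpi_by_state_category(rows):
    """Return {(state, category): True if the summed profit is above zero}."""
    totals = defaultdict(float)
    for state, category, profit in rows:
        totals[(state, category)] += profit
    return {key: total > 0 for key, total in totals.items()}


kpi = kpi_by_state_category(rows)
losing = sorted(key for key, profitable in kpi.items() if not profitable)
print(losing)  # [('Ohio', 'Office Supplies'), ('Texas', 'Furniture')]
```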

5. Find how many orders the store had in a particular year

1. Use the COUNTD function to count the distinct orders:

COUNTD([Order ID])

2. Break the visualization down by year.
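
COUNTD counts each distinct value only once, even when an order occupies several rows. The Python sketch below counts distinct order IDs per year; the order lines and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical order lines: one order ID can span several rows.
order_lines = [
    ("CA-2011-100006", 2011), ("CA-2011-100006", 2011),
    ("CA-2011-100090", 2011), ("US-2012-100853", 2012),
]


def distinct_orders_by_year(lines):
    """Count the distinct order IDs that fall in each year."""
    ids_by_year = defaultdict(set)
    for order_id, year in lines:
        ids_by_year[year].add(order_id)
    return {year: len(ids) for year, ids in sorted(ids_by_year.items())}


orders_per_year = distinct_orders_by_year(order_lines)
print(orders_per_year)  # {2011: 2, 2012: 1}
```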

Result:
Thus the inbuilt functions have been worked with successfully.
EX.NO:19 Creating Dashboard
DATE:

Aim:
To create and format dashboard for sales/profit analysis.
Concept:
A dashboard is a collection of several worksheets and other related information in a
single place. Dashboards are used for comparing and monitoring a variety of data,
all at once. Data in sheets and dashboards are connected. When a sheet is modified,
any dashboards containing it change, and vice versa. Both sheets and dashboards
update with the latest available data from the data source.
A blank dashboard will appear with the Data window replaced by four sections: a
list of existing worksheets in the workbook, dashboard objects, a layout section, and
a sizing section for customizing dashboard element sizes.

Query:
Create a dashboard showing the sales and profits for different segments and Sub-Category
of products across all the states.
Procedure:
Step 1: Create a blank worksheet by using the add worksheet icon located at
the bottom of the workbook.
i) Drag the dimension Segment to the Columns area and the dimension Sub-Category to the
Rows area.
ii) Drag and drop the measure Sales onto the Color area.
iii) Drag and drop the measure Profit onto the Size area.
iv) This will plot a chart. Name this sheet sales-profits.

Step 2: Create sheet2 to analyze the details of the Sales across the various States.
i) Drag the dimension State to the Rows area and the measure Sales to the Columns area.
ii) Add a filter to the State field to arrange the Sales in ascending or descending order.
iii) Name this worksheet sales state.

Step 3: Create sheet3 to display a map.
i) Double-click on the geographical fields Country and State.
ii) Drag and drop Profit onto the Color area.
iii) Drag and drop State onto Label.
iv) Drag and drop Sales onto Label.

Step 4: Create a blank dashboard by clicking the create new dashboard icon
at the bottom of the workbook, or go to the Dashboard tab and click New
Dashboard.

Step 5: Now, drag the three worksheets that were created in the previous steps onto the
dashboard. Once done, you can see three small icons near the top border of the
sales-profits worksheet. Hovering the mouse over the middle icon shows the prompt Use as
Filter; click this icon.

Step 6: Now, as the last step, click the box that represents the sub-category Machines
and the segment Consumer in the dashboard. Only the states where sales happened for this
amount/criterion of profit are then shown in the right pane.
This shows that the sheets are linked.
Step 7: Add dashboard objects that add visual appeal and interactivity.
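
The linking between sheets in Steps 5 and 6 can be pictured as one sheet passing its selected mark to another sheet as a filter. The Python sketch below is only an illustration of that idea: the rows and the function name are hypothetical, and it does not reproduce how Tableau implements dashboard filter actions.

```python
# Hypothetical (segment, sub-category, state, sales) rows shared by both sheets.
rows = [
    ("Consumer", "Machines", "Ohio", 500.0),
    ("Consumer", "Machines", "Texas", 250.0),
    ("Corporate", "Machines", "Utah", 900.0),
    ("Consumer", "Paper", "Ohio", 40.0),
]


def sales_by_state(rows, selection=None):
    """Total sales per state; `selection` is an optional (segment, sub-category) mark."""
    totals = {}
    for segment, sub_category, state, sales in rows:
        if selection is not None and (segment, sub_category) != selection:
            continue
        totals[state] = totals.get(state, 0.0) + sales
    return totals


unfiltered = sales_by_state(rows)
filtered = sales_by_state(rows, selection=("Consumer", "Machines"))
print(filtered)  # {'Ohio': 500.0, 'Texas': 250.0}
```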

Output:

Result:
Thus the dashboard has been created and formatted.
