0% found this document useful (0 votes)
22 views36 pages

4 - Data Science Application

Uploaded by

Nia Chauhan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Topics covered

  • Matplotlib,
  • Infographics,
  • Data Visualization Challenges,
  • Customer Behavior,
  • Visualization Design,
  • Visualization Customization,
  • Data Visualization Examples,
  • Visualization History,
  • Data Clarity,
  • Chartblocks
0% found this document useful (0 votes)
22 views36 pages

4 - Data Science Application

Uploaded by

Nia Chauhan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Topics covered

  • Matplotlib,
  • Infographics,
  • Data Visualization Challenges,
  • Customer Behavior,
  • Visualization Design,
  • Visualization Customization,
  • Data Visualization Examples,
  • Visualization History,
  • Data Clarity,
  • Chartblocks

Data Science Application

 What is Data Visualization?

 Data visualization is a graphical representation of quantitative information and data


by using visual elements like graphs, charts, and maps.

 Data visualization convert large and small data sets into visuals, which is easy to
understand and process for humans.

 Data visualization tools provide accessible ways to understand outliers, patterns, and
trends in the data.

 Data visualizations are common in your everyday life, but they always appear in the
form of graphs and charts. The combination of multiple visualizations and bits of
information are still referred to as Infographics.

 Data visualizations are used to discover unknown facts and trends.

 You can see visualizations in the form of line charts to display change over time.

 Bar and column charts are useful for observing relationships and making
comparisons. A pie chart is a great way to show parts-of-a-whole. And maps are the
best way to share geographical data visually.

 Today's data visualization tools go beyond the charts and graphs used in the
Microsoft Excel spreadsheet, which displays the data in more sophisticated ways
such as dials and gauges, geographic maps, heat maps, pie chart, and fever chart.

Prepared by: Priti Shiyani 1


Data Science Application
 What makes Data Visualization Effective?

 Effective data visualization is created by communication, data science, and design


collide. Data visualizations did right key insights into complicated data sets into
meaningful and natural.

 To craft an effective data visualization, you need to start with clean data that is well-
sourced and complete. After the data is ready to visualize, you need to pick the right
chart.

 After you have decided the chart type, you need to design and customize your
visualization to your liking. Simplicity is essential - you don't want to add any
elements that distract from the data.

Prepared by: Priti Shiyani 2


Data Science Application
 History of Data Visualization
 The concept of using picture was launched in the 17th century to understand the
data from the maps and graphs, and then in the early 1800s, it was reinvented to the
pie chart.

 Several decades later, one of the most advanced examples of statistical graphics
occurred when Charles Minard mapped Napoleon's invasion of Russia. The map
represents the size of the army and the path of Napoleon's retreat from Moscow -
and that information tied to temperature and time scales for a more in-depth
understanding of the event.

Minard is best known for his cartographic depiction of numerical data on a map
of Napoleon's disastrous losses suffered during the Russian campaign of 1812 (in
French,

 The map of Napoleon's Russian campaign

Prepared by: Priti Shiyani 3


Data Science Application
 Computers made it possible to process a large amount of data at lightning-fast
speeds. Nowadays, data visualization becomes a fast-evolving blend of art and
science that certain to change the corporate landscape over the next few years.

Prepared by: Priti Shiyani 4


Data Science Application
 Importance of Data Visualization

 Data visualization is important because of the processing of information in human


brains. Using graphs and charts to visualize a large amount of the complex data sets
is more comfortable in comparison to studying the spreadsheet and reports.

 Data visualization is an easy and quick way to convey concepts universally. You can
experiment with a different outline by making a slight adjustment.

Data visualization have some more specialties such as:


o Data visualization can identify areas that need improvement or modifications.
o Data visualization can clarify which factor influence customer behavior.
o Data visualization helps you to understand which products to place where.
o Data visualization can predict sales volumes.

 Data visualization tools have been necessary for democratizing data, analytics, and
making data-driven perception available to workers throughout an organization.

 They are easy to operate in comparison to earlier versions of BI software or


traditional statistical analysis software. This guide to a rise in lines of business
implementing data visualization tools on their own, without support from IT.

Prepared by: Priti Shiyani 5


Data Science Application
 Why Use Data Visualization?
1. To make easier in understand and remember.
2. To discover unknown facts, outliers, and trends.
3. To visualize relationships and patterns quickly.
4. To ask a better question and make better decisions.
5. To competitive analyze.
6. To improve insights.

Prepared by: Priti Shiyani 6


Data Science Application
 Top 10 Data Visualization Tools

 There are tools which help you to visualize all your data in a few minutes. They
are already there; only you need to do is to pick the right data visualization tool
as per your requirements.

 Data visualization allows you to interact with data. Google, Apple, Facebook,
and Twitter all ask better a better question of their data and make a better
business decision by using data visualization.

 Here are the top 10 data visualization tools that help you to visualize the data:

Prepared by: Priti Shiyani 7


Data Science Application
1. Tableau

 Tableau is a data visualization tool. You can create graphs, charts, maps, and
many other graphics.

 A tableau desktop app is available for visual analytics. If you don't want to install
tableau software on your desktop, then a server solution allows you to visualize
your reports online and on mobile.
 A cloud-hosted service also is an option for those who want the server solution
but don't want to set up manually. The customers of Tableau include Barclays,
Pandora, and Citrix

Prepared by: Priti Shiyani 8


Data Science Application
2.Infogram

 Infogram is also a data visualization tool. It has some simple steps to process
that:
1. First, you choose among many templates, personalize them with additional
visualizations like maps, charts, videos, and images.
2. Then you are ready to share your visualization.
3. Infogram supports team accounts for journalists and media publishers, branded
designs of classroom accounts for educational projects, companies, and
enterprises.

 An infogram is a representation of information in a graphic format designed to


make the data easily understandable in a view. Infogram is used to quickly
communicate a message, to simplify the presentation of large amounts of the
dataset, to see data patterns and relationships, and to monitor changes in
variables over time.
 Infogram abounds in almost any public environment such as traffic signs,
subway maps, tag clouds, musical scores, and weather charts, among a huge
number of possibilities.

Prepared by: Priti Shiyani 9


Data Science Application

3. Chartblocks

 Chartblocks is an easy way to use online tool which required no coding and
builds visualization from databases, spreadsheets, and live feeds.

 Your chart is created under the hood in html5 by using the powerful JavaScript
library D3.js. Your visualizations is responsive and compatible with any screen
size and device. Also, you will be able to embed your charts on any web page,
and you can share it on Facebook and Twitter.

Prepared by: Priti Shiyani 10


Data Science Application

4. Datawrapper

 Datawrapper is an aimed squarely at publisher and journalist. The Washington


Post, VOX, The Guardian, BuzzFeed, The Wall Street
Journal and Twitter adopts it.

 Datawrapper is easy visualization tool, and it requires zero codings. You can
upload your data and easily create and publish a map or a chart. The custom
layouts to integrate your visualizations perfectly on your site and access to local
area maps are also available.

Prepared by: Priti Shiyani 11


Data Science Application
5. Plotly

 Plotly will help you to create a slick and sharp chart in just a few minutes or in a
very short time. It also starts from a simple spreadsheet.

 The guys use Plotly at Google and also by the US Air Force, Goji and The New
York University.
 Plotly is very user-friendly visualization tool which is quickly started within a few
minutes. If you are a part of a team of developers that wants to have a crack, an
API is available for JavaScript and Python languages.

Prepared by: Priti Shiyani 12


Data Science Application
6. RAW

 RAW creates the missing link between spreadsheets and vector graphics on its
home page.

 Your Data can come from Google Docs, Microsoft Excel, Apple Numbers, or a
simple comma-separated list.
 Here the kicker is that you can export your visualization easily and have a
designer to make it look sharp. RAW is compatible with Inkscape, Adobe
Illustrator, and Sketch. RAW is very easy to use and get quick results.

Prepared by: Priti Shiyani 13


Data Science Application
7. Visual.ly

 Visual.ly is a visual content service. It has a dedicated data visualization service


and their impressive portfolio that includes work for Nike, VISA, Twitter, Ford,
The Huffington post, and the national geographic.

 By a streamlined online process, you can find entire outsource your


visualizations to a third-party where you describe your project and connected
with a creative team that will stay with you for the entire duration of the project.
 Visual.ly sends you an email notification for all the event you are hitting, and
also it will give you constant feedback to your creative team. Visual.ly offer their
distribution network for showcasing your project after it's completed.

Prepared by: Priti Shiyani 14


Data Science Application
8. D3.js

 D3.js is a best data visualization library for manipulating documents. D3.js runs
on JavaScript, and it uses CSS, html, and SVG. D3.js is an open-source and
applies a data-driven transformation to a webpage. It's only applied when data
is in JSON and XML file.

 D3.js emphasis on web standards gives you the full capabilities of modern
browsers without tying yourself to a single framework and combining powerful
visualization components.
 D3.js is as powerful as it is a cutting-edge library, so it comes with no pre-built
charts and only IE9+ supports this library.

Prepared by: Priti Shiyani 15


Data Science Application
9. Ember Charts

 Ember charts are based on the ember.js and D3.js framework, and it uses the
D3.js under the hood. It also applied when the data is in JSON and XML file.

 It includes a bar, time series, pie, and scatter charts which are easy to extend
and modify. These chart components represent our thoughts on best practices
in chart presentation and interactivity.
 The team behind Ember Charts is also the same that created Ember.js. It puts a
lot of focus on best practices and interactivity. Error handling is very graceful,
and your app will not crash after finding irrelevant data or corrupt data.

Prepared by: Priti Shiyani 16


Data Science Application
10. NVD3

 NVD3 is a project that attempts to build reusable charts and components. This
project is to keeps all your charts neat and customizable.

 NDV3 is a simpler interface on the top of the D3.js and keeps all of its powerful
features under the hood.
 The front end engineers develop NDV3, and they use their insight into charting
technology. This charting technology is used to provide powerful analytics to
clients in the financial industry.

Prepared by: Priti Shiyani 17


Data Science Application
 Top Data Visualization Libraries Available in Python, R, and
Javascript
The following are the top Data Visualization Libraries
 Python:
 Matplotlib
 Plotly
 ggplot
 Seaborn
 Altair
 Geoplotlib
 Bokeh
 R:
 ggplot2
 Plotly
 Leaflet
 Esquisse
 Lattice
 Javascript:
 D3.js
 Chart.js
 Plotly

Prepared by: Priti Shiyani 18


Data Science Application
 Matplotlib Tutorial

 Matplotlib is easy to use and an amazing visualizing library in Python.


 It is built on NumPy arrays and designed to work with the broader SciPy stack and
consists of several plots like line, bar, scatter, histogram, etc.

 Matplotlib is a low level graph plotting library in python that serves as a


visualization utility.
 Matplotlib was created by John D. Hunter.
 Matplotlib is open source and we can use it freely.
 Matplotlib is mostly written in python, a few segments are written in C, Objective-C
and Javascript for Platform compatibility.

 Installation of Matplotlib

 If you have Python and PIP already installed on a system, then installation of
Matplotlib is very easy.Install it using this command:

C:\Users\Your Name>pip install matplotlib

If this command fails, then use a python distribution that already has Matplotlib installed,
like Anaconda, Spyder etc.

 Import Matplotlib

Once Matplotlib is installed, import it in your applications by adding


the import module statement:

import matplotlib

 Checking Matplotlib Version


The version string is stored under __version__ attribute.
import matplotlib
print(matplotlib.__version__)

Prepared by: Priti Shiyani 19


Data Science Application
 Matplotlib Pyplot

 Most of the Matplotlib utilities lies under the pyplot submodule, and are usually
imported under the plt alias:

import matplotlib.pyplot as plt

Now the Pyplot package can be referred to as plt.

Example:

Draw a line in a diagram from position (0,0) to position (6,250):


import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([0, 6])


ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()

Output:

Prepared by: Priti Shiyani 20


Data Science Application
 Matplotlib Plotting

Plotting x and y points


 The plot() function is used to draw points (markers) in a diagram.
 By default, the plot() function draws a line from point to point.
 The function takes parameters for specifying points in the diagram.
 Parameter 1 is an array containing the points on the x-axis.
 Parameter 2 is an array containing the points on the y-axis.
 If we need to plot a line from (1, 3) to (8, 10), we have to pass two arrays [1, 8] and
[3, 10] to the plot function.

Example:

Draw a line in a diagram from position (1, 3) to position (8, 10):


import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])


ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints)
plt.show()

Output:

Prepared by: Priti Shiyani 21


Data Science Application
 Matplotlib Markers

You can use the keyword argument marker to emphasize each point with a specified
marker:
Example:
Mark each point with a circle:
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, marker = 'o')


plt.show()

output:

Example:Mark each point with star


...
plt.plot(ypoints, marker = '*')
...
Output:

Prepared by: Priti Shiyani 22


Data Science Application
 Matplotlib Line
Linestyle

You can use the keyword argument linestyle, or shorter ls, to change the style of the
plotted line:
Example: dotted line
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, linestyle = 'dotted')


plt.show()

output:

Example:Use a dashed line:

plt.plot(ypoints, linestyle = 'dashed')

output:

Prepared by: Priti Shiyani 23


Data Science Application
 Shorter Syntax
The line style can be written in a shorter syntax:
linestyle can be written as ls.
dotted can be written as :.
dashed can be written as --.
Example:Shorter syntax:
plt.plot(ypoints, ls = ':')

output:

 Line Styles
 You can choose any of these styles:

Style Or

'solid' (default) '-'

'dotted' ':'

'dashed' '--'

'dashdot' '-.'

'None' '' or ' '

Prepared by: Priti Shiyani 24


Data Science Application
 Line Color
You can use the keyword argument color or the shorter c to set the color of the line:

Example:Set the line color to red:

import matplotlib.pyplot as plt


import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, color = 'r')


plt.show()

output:

You can also use Hexadecimal color values:


Example:Plot with a beautiful green line:
...
plt.plot(ypoints, c = '#4CAF50')
...
Output:

Prepared by: Priti Shiyani 25


Data Science Application
 Or any of the 140 supported color names.

Example:Plot with the color named "hotpink":


...
plt.plot(ypoints, c = 'hotpink')
...

Output:

 Line Width
 You can use the keyword argument linewidth or the shorter lw to change the width
of the line.The value is a floating number, in points:

Example:Plot with a 20.5pt wide line:


import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

plt.plot(ypoints, linewidth = '20.5')


plt.show()

output:

Prepared by: Priti Shiyani 26


Data Science Application
 Multiple Lines
You can plot as many lines as you like by simply adding more plt.plot() functions:

Example:Draw two lines by specifying a plt.plot() function for each line:


import matplotlib.pyplot as plt
import numpy as np

y1 = np.array([3, 8, 1, 10])
y2 = np.array([6, 2, 7, 11])

plt.plot(y1)
plt.plot(y2)

plt.show()

output:

Prepared by: Priti Shiyani 27


Data Science Application
 You can also plot many lines by adding the points for the x- and y-axis for each line
in the same plt.plot() function.
 (In the examples above we only specified the points on the y-axis, meaning that the
points on the x-axis got the the default values (0, 1, 2, 3).)
 The x- and y- values come in pairs:

Example:Draw two lines by specifiyng the x- and y-point values for both lines:
import matplotlib.pyplot as plt
import numpy as np

x1 = np.array([0, 1, 2, 3])
y1 = np.array([3, 8, 1, 10])
x2 = np.array([0, 1, 2, 3])
y2 = np.array([6, 2, 7, 11])

plt.plot(x1, y1, x2, y2)


plt.show()

output:

Prepared by: Priti Shiyani 28


Data Science Application
 Matplotlib Labels and Title
 Create Labels for a Plot
 With Pyplot, you can use the xlabel() and ylabel() functions to set a label for the x-
and y-axis.
ExampleGet your own Python Server
Add labels to the x- and y-axis:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.plot(x, y)

plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.show()

output:

Prepared by: Priti Shiyani 29


Data Science Application
 Create a Title for a Plot
 With Pyplot, you can use the title() function to set a title for the plot.

Example:Add a plot title and labels for the x- and y-axis:


import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.plot(x, y)

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.show()

output:

Prepared by: Priti Shiyani 30


Data Science Application
 Set Font Properties for Title and Labels

 You can use the fontdict parameter in xlabel(), ylabel(), and title() to set font
properties for the title and labels.
Example:Set font properties for the title and labels:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

font1 = {'family':'serif','color':'blue','size':20}
font2 = {'family':'serif','color':'darkred','size':15}

plt.title("Sports Watch Data", fontdict = font1)


plt.xlabel("Average Pulse", fontdict = font2)
plt.ylabel("Calorie Burnage", fontdict = font2)

plt.plot(x, y)
plt.show()

output:

Prepared by: Priti Shiyani 31


Data Science Application
 Position the Title

 You can use the loc parameter in title() to position the title.
 Legal values are: 'left', 'right', and 'center'. Default value is 'center'.
Example
Position the title to the left:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data", loc = 'left')


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)
plt.show()

output:

Prepared by: Priti Shiyani 32


Data Science Application
 Matplotlib Adding Grid Lines
 Add Grid Lines to a Plot
 With Pyplot, you can use the grid() function to add grid lines to the plot.

Example:Add grid lines to the plot:


import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)

plt.grid()

plt.show()

output:

Prepared by: Priti Shiyani 33


Data Science Application
 Specify Which Grid Lines to Display
 You can use the axis parameter in the grid() function to specify which grid lines to
display.
 Legal values are: 'x', 'y', and 'both'. Default value is 'both'.
Example:Display only grid lines for the x-axis:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)

plt.grid(axis = 'x')

plt.show()

output:

Prepared by: Priti Shiyani 34


Data Science Application
ExampleDisplay only grid lines for the y-axis:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)

plt.grid(axis = 'y')

plt.show()

output:

Prepared by: Priti Shiyani 35


Data Science Application
 Set Line Properties for the Grid

You can also set the line properties of the grid, like this: grid(color = 'color', linestyle =
'linestyle', linewidth = number).
Example:Set the line properties of the grid:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])

plt.title("Sports Watch Data")


plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")

plt.plot(x, y)

plt.grid(color = 'green', linestyle = '--', linewidth = 0.5)

plt.show()

output:

Prepared by: Priti Shiyani 36

You might also like