
UNIT-6

GRAPH ANALYTICS AND DATA VISUALIZATION
Prepared By:
Aayushi Chaudhari,
Assistant Professor, CE, CSPIT,
CHARUSAT

Wednesday, April 12, 2023| U & P U. Patel Department of Computer Engineering 1


Agenda
• What is data visualization?
• Benefits of using data visualization
• Why is it required?
• Apache Spark GraphX: Property Graph
• Graph Operator
• SubGraph, Triplet
• Neo4j: Modeling data with Neo4j
• Cypher Query Language: General clauses, Read and Write clauses
• Big Data Visualization with Power BI
• Apache Superset



What is data visualization?
• Data visualization is the practice of translating information into a visual
context, such as a map or graph, to make data easier for the human brain
to understand and pull insights from.
• The main goal of data visualization is to make it easier to identify patterns,
trends and outliers in large data sets.
• The term is often used interchangeably with others, including information
graphics, information visualization and statistical graphics.
• Data visualization is one of the steps of the data science process, which states
that after data has been collected, processed and modeled, it must be
visualized for conclusions to be made.

What is data visualization? (cont.)
• Data visualization is important for almost every career.
• It can be used by teachers to display student test results, by computer
scientists exploring advancements in artificial intelligence (AI) or by
executives looking to share information with stakeholders.
• It also plays an important role in big data projects.
• As businesses accumulated massive collections of data during the early years
of the big data trend, they needed a way to quickly and easily get an
overview of their data.
• Visualization tools were a natural fit.



Need for Data Visualization
• When a data scientist is writing advanced predictive analytics or machine
learning (ML) algorithms, it becomes important to visualize the outputs
to monitor results and ensure that models are performing as intended.
• This is because visualizations of complex algorithms are generally easier
to interpret than numerical outputs.



Example



Importance of Data Visualization
• Data visualization provides a quick and effective way to communicate information in a
universal manner using visual information.
• The practice can also help businesses identify which factors affect customer behavior;
pinpoint areas that need to be improved or need more attention; make data more
memorable for stakeholders; understand when and where to place specific products; and
predict sales volumes.
• It gives viewers the ability to absorb information quickly, improve insights and make faster decisions.
• It provides an increased understanding of the next steps that must be taken to improve the
organization.
• Provides an improved ability to maintain the audience's interest with information they can
understand.



Importance of Data Visualization (cont.)
• Provides an easy distribution of information that increases the
opportunity to share insights with everyone involved.
• It reduces dependence on data scientists for routine questions, since
data becomes more accessible and understandable.
• Provides an increased ability to act on findings quickly and,
therefore, achieve success with greater speed and fewer mistakes.



Data Visualization for Big data
• Data analysis projects have made visualization more important than ever.
• Companies are increasingly using machine learning to gather massive amounts of data that
can be difficult and slow to sort through, comprehend and explain.
• Visualization offers a means to speed this up and present information to business owners
and stakeholders in ways they can understand.
• Big data visualization often goes beyond the typical techniques used in normal
visualization, such as pie charts, histograms and corporate graphs.
• It instead uses more complex representations, such as heat maps and fever charts.
• Big data visualization requires powerful computer systems to collect raw data, process it
and turn it into graphical representations that humans can use to quickly draw insights.



Needs of Organizations to Use Data Visualization
• A visualization specialist is required who can choose appropriate data sets and visual
styles, so that the organization makes the best use of its data.
• IT specialists need to be involved, since the organization may need powerful computer
hardware, efficient storage systems and even a move to the cloud.
• The data used needs to be accurate and under the control of a person responsible for
governing it.



Example of Various Visualization Styles
In the early days of visualization, the most common technique was to use a
Microsoft Excel spreadsheet to transform the information into a table, bar graph or pie
chart. While these visualization methods are still commonly used, more intricate
techniques are now available, including the following:
• infographics
• bubble clouds
• bullet graphs
• heat maps
• fever charts



Example of Infographics



Example of bubble clouds



Example of Bullet chart



Example of heat map



Fever chart example



Apache Spark GraphX
• GraphX is the graph processing library built into Apache Spark.
• It makes use of the Property Graph and the Spark RDD.
• GraphX is a hybrid technology that combines two kinds of systems:
• Data-parallel systems, such as Hadoop and Spark, which focus on
distributing data across multiple nodes.
• Graph-parallel systems, such as Pregel, GraphLab and Giraph, which
efficiently execute graph algorithms through partitioning and
distribution techniques.
• GraphX unifies the data-parallel and graph-parallel approaches.



Table View v/s Graph view



Data parallel v/s Graph parallel



GraphX
• GraphX provides a graph abstraction that extends the Spark
RDD (Resilient Distributed Dataset), which is an
immutable distributed collection of objects.
• Two types of graph are discussed here:
• Directed graph: each edge has a direction associated with it.
• Regular graph: a graph in which every vertex has the same number of
edges.
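
The two definitions above can be checked on a small edge list. The following is a minimal sketch in plain Python, not GraphX code; the edge list and the helper names `degrees` and `is_regular` are hypothetical, and vertices with no edges are assumed not to occur.

```python
from collections import Counter

# Hypothetical directed edge list: (source, destination) pairs.
edges = [(1, 2), (2, 3), (3, 1)]


def degrees(edge_list):
    """Count the edges touching each vertex, ignoring direction."""
    counts = Counter()
    for src, dst in edge_list:
        counts[src] += 1
        counts[dst] += 1
    return dict(counts)


def is_regular(edge_list):
    """A graph is regular when every vertex has the same degree."""
    return len(set(degrees(edge_list).values())) <= 1


print(degrees(edges))
print(is_regular(edges))
```

Each edge is directed because the order of the pair matters: `(1, 2)` and `(2, 1)` would be different edges.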



GraphX property graph
• It is a directed multigraph, which means it can have multiple
parallel edges.
• Every edge and vertex has user-defined properties
associated with it.
• The parallel edges allow multiple relationships between
the same pair of vertices.
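
The idea of a property graph can be illustrated without Spark. The following is a minimal sketch in plain Python, not the GraphX API; the `Edge` class, the vertex names and the relationship labels are all hypothetical and chosen only to show parallel edges carrying different properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: int   # ID of the source vertex
    dst: int   # ID of the destination vertex
    attr: str  # user-defined edge property


# Vertex ID -> user-defined vertex property (hypothetical names).
vertices = {1: "Alice", 2: "Bob"}

# Two parallel edges from 1 to 2, plus one edge in the opposite direction.
edges = [
    Edge(1, 2, "colleague"),
    Edge(1, 2, "friend"),
    Edge(2, 1, "manager"),
]


def relationships(src, dst):
    """Return the properties of every edge going from src to dst."""
    return [e.attr for e in edges if e.src == src and e.dst == dst]


print(relationships(1, 2))
print(relationships(2, 1))
```

Because edges are stored in a list rather than keyed by their endpoints, several edges between the same two vertices can coexist, which is what makes this a multigraph.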



Example of Property Graph



Example
In this scenario, we will analyze three flights. The information for them is given in the tables below:
• Airports will act as vertices
• Routes will act as edges
• Each vertex has an ID and an Airport Name as its property.
• Each edge has a source ID, a destination ID and a Distance as its property.

Table for Routes and Distances

Vertex Table for Airports (ID - Long, Airport Name - String)

ID   Airport Name
1    Ahmedabad
2    Surat
3    Mumbai

Edges Table for Routes (SrcID and DestID - Long, Distance - Double)

SrcID   DestID   Distance
1       2        263.3
2       3        279.4
3       1        524.2
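
The airports and routes listed above can be written down as a vertex collection and an edge collection. The following is a minimal sketch in plain Python rather than GraphX; the names `airports`, `routes` and `triplets` are hypothetical. A triplet here means an edge joined with the properties of its source and destination vertices.

```python
# Vertex table: airport ID -> airport name.
airports = {1: "Ahmedabad", 2: "Surat", 3: "Mumbai"}

# Edge table: (source ID, destination ID, distance).
routes = [(1, 2, 263.3), (2, 3, 279.4), (3, 1, 524.2)]


def triplets(vertices, edges):
    """Join each edge with the properties of its two endpoint vertices."""
    return [(vertices[src], vertices[dst], dist) for src, dst, dist in edges]


for src_name, dst_name, dist in triplets(airports, routes):
    print(f"{src_name} -> {dst_name}: {dist}")

# The longest route is the triplet with the largest distance.
longest = max(triplets(airports, routes), key=lambda t: t[2])
print(longest)
```

The triplet view is convenient because a question such as "which route is the longest?" needs both the edge property (distance) and the vertex properties (airport names) at once.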



Graph Operator
• Big data comes in different shapes and sizes. It can be batch data that is
processed offline: a large set of records is processed and the results and insights
are generated at a later time.
• Or the data can be real-time streams that need to be processed on the fly, producing
insights almost instantaneously.
• Apache Spark can be used for processing batch data (Spark Core) as well as real-time data
(Spark Streaming).



Graph Operator
GraphX makes it easier to run analytics on graph data with the built-in operators and
algorithms.
It also allows us to cache and uncache graph data, which avoids recomputation when we
need to use a graph multiple times.
Basically, there are four types of graph operators:
1. Basic
2. Property
3. Structural
4. Join



Types of Graph Operators

Basic: numEdges, numVertices, inDegrees, outDegrees, degrees
Property: mapVertices, mapEdges, mapTriplets
Structural: reverse, subgraph, mask, groupEdges
Join: joinVertices, outerJoinVertices
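
To give a feel for what some of these operators compute, the following is a minimal sketch in plain Python. It is not the GraphX API (which is written in Scala and runs on distributed RDDs); the snake_case function names only mirror the operator names above, and the airport data is reused from the flights example as an assumption.

```python
from collections import Counter

# Hypothetical in-memory graph: vertex ID -> name, and (src, dst, distance) edges.
vertices = {1: "Ahmedabad", 2: "Surat", 3: "Mumbai"}
edges = [(1, 2, 263.3), (2, 3, 279.4), (3, 1, 524.2)]


# Basic operators: simple counts over the graph.
def num_vertices(vs):
    return len(vs)


def num_edges(es):
    return len(es)


def in_degrees(es):
    """Number of incoming edges per vertex."""
    return dict(Counter(dst for _, dst, _ in es))


def out_degrees(es):
    """Number of outgoing edges per vertex."""
    return dict(Counter(src for src, _, _ in es))


# Property operator: transform vertex properties, keep the structure.
def map_vertices(vs, f):
    return {vid: f(vid, attr) for vid, attr in vs.items()}


# Structural operators: change the structure, keep the properties.
def reverse(es):
    """Flip the direction of every edge."""
    return [(dst, src, attr) for src, dst, attr in es]


def subgraph(vs, es, vpred, epred):
    """Keep vertices passing vpred and edges passing epred between kept vertices."""
    kept_vs = {vid: attr for vid, attr in vs.items() if vpred(vid, attr)}
    kept_es = [
        (s, d, a) for s, d, a in es
        if s in kept_vs and d in kept_vs and epred(s, d, a)
    ]
    return kept_vs, kept_es


print(num_vertices(vertices), num_edges(edges))
print(in_degrees(edges), out_degrees(edges))
print(map_vertices(vertices, lambda vid, name: name.upper()))
print(reverse(edges))
# Keep every airport, but only the routes shorter than 300.
print(subgraph(vertices, edges, lambda vid, name: True, lambda s, d, dist: dist < 300))
```

In this sketch every operator returns a new collection instead of modifying its input, which loosely reflects the immutability of the RDDs that GraphX builds on.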



Thank You.
