
DATA SCIENCE TOOLS

GROUP 3
KDU | FOC | DSBA
W.A.C. Imasha | M.V.D. Nimsliu | B.K.T. Dhananjana
What is ggplot2?
• ggplot2 is an advanced
data visualization package
for the R programming
language.
ggplot2
Plot = Data + Aesthetics + Geometry
Grammar of Graphics
• Data : A data frame

• Aesthetics : Map variables to the X and Y axes and control the color, size, and shape of points

• Geometry : The type of graphic to draw (for example points, lines, or bars)
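The decomposition Plot = Data + Aesthetics + Geometry can be illustrated with a small sketch. ggplot2 itself is an R package; the Python code below only mirrors the three-part idea using Matplotlib and is not ggplot2's API. The column names, the values, and the helper `build_plot` are hypothetical and made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Data: a small table held as a dict of columns (made-up values).
data = {
    "height": [150, 160, 170, 180],
    "weight": [50, 58, 69, 80],
    "group": ["a", "a", "b", "b"],
}

# Aesthetics: which columns map to x, y, and color.
aes = {"x": "height", "y": "weight", "color": "group"}

# Geometry: the kind of mark to draw.
geom = "point"


def build_plot(data, aes, geom):
    """Hypothetical helper: combine data, aesthetics, and geometry into one figure."""
    fig, ax = plt.subplots()
    color_column = data[aes["color"]]
    # Draw one series per distinct value of the color variable.
    for level in sorted(set(color_column)):
        xs = [x for x, c in zip(data[aes["x"]], color_column) if c == level]
        ys = [y for y, c in zip(data[aes["y"]], color_column) if c == level]
        if geom == "point":
            ax.scatter(xs, ys, label=level)
        elif geom == "line":
            ax.plot(xs, ys, label=level)
        else:
            raise ValueError(f"unsupported geometry: {geom}")
    ax.set_xlabel(aes["x"])
    ax.set_ylabel(aes["y"])
    ax.legend(title=aes["color"])
    return fig, ax


fig, ax = build_plot(data, aes, geom)
```

Changing only `geom` to `"line"` redraws the same data and mapping with a different geometry, which is the point of separating the three parts. Calling `fig.savefig("plot.png")` would write the result to a file.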


ggplot2 is part of the Tidyverse.
Example plot types: choropleths, cartograms, hexbins
DataRobot
DataRobot was founded in 2012

DataRobot is a commercial platform for automated machine learning (AutoML)


Open-source libraries used by DataRobot: R, Python, Spark MLlib, H2O
DataRobot offers

● Automated machine learning (ML)


● Automated time series
● Machine learning operations (MLOps)
● Adaptive data preparation
Example use cases:
● Lenovo: forecasting retail demand in Brazil
● United Airlines: predicting which passengers might gate-check bags
Matplotlib
• Matplotlib is a comprehensive library
for creating static, animated, and
interactive visualizations in Python.
• It was originally written by John D.
Hunter in 2003.
• It is a useful plotting library for the
Python programming language.
Comparison with MATLAB
● Pyplot is a Matplotlib module which provides a MATLAB-like interface. Matplotlib is
designed to be as usable as MATLAB, with the ability to use Python, and the
advantage of being free and open-source.
● Several advanced plot types can be produced with Matplotlib.
Examples: histogram, scatter plot, 3D plot, line plot, polar plot, image plot
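As a rough illustration of the MATLAB-like pyplot interface, the sketch below draws three of the plot types named above (histogram, scatter plot, line plot) side by side. The data values are made up for illustration, and the `Agg` backend is an assumption chosen so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no GUI window needed
import matplotlib.pyplot as plt

# Made-up sample data for illustration.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 4, 9, 16]

# pyplot keeps track of a "current" figure and axes, much like MATLAB.
plt.figure(figsize=(9, 3))

plt.subplot(1, 3, 1)      # 1 row, 3 columns, first panel
plt.hist(values, bins=5)
plt.title("Histogram")

plt.subplot(1, 3, 2)      # second panel
plt.scatter(xs, ys)
plt.title("Scatter Plot")

plt.subplot(1, 3, 3)      # third panel
plt.plot(xs, ys)
plt.title("Line Plot")

plt.tight_layout()
fig = plt.gcf()           # handle to the current figure
```

Calling `fig.savefig("examples.png")` would write the three panels to an image file; in an interactive session `plt.show()` displays them instead.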
Toolkits used in Matplotlib
● Mapping toolkits
Basemap
Cartopy
● General toolkits
Mplot3D
AxesGrid
Mpl Data Cursor
GTK Tools
Excel Tools
Natgrid
● High-level plotting
Seaborn
Holoviews
ggplot
prettyplotlib
Introduction of Apache Hadoop
• Hadoop is an open-source software
framework that is part of the Apache suite
of projects.
• Hadoop was created by Doug Cutting and
Mike Cafarella in 2005.
• It is primarily used for storing and analyzing large data sets.
What is Hadoop used for?
• Low-cost data archive
• Integration with a data warehouse
• Internet of Things
• Data lake
• Discovery and analysis

Hadoop components
• HDFS
• MapReduce
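MapReduce, one of the Hadoop components, processes data in three steps: a map step that emits key-value pairs, a shuffle step that groups the pairs by key, and a reduce step that combines the values for each key. The sketch below imitates those steps in plain Python on a single machine for a word count. It does not use Hadoop, and the function names and sample sentences are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# Made-up input records; in Hadoop these would be blocks of files stored in HDFS.
documents = ["Hadoop stores data", "Hadoop processes data"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(counts)
```

In a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle; here everything runs sequentially to keep the idea visible.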
Advantages and Disadvantages
• Advantages
1. Open source
2. Hadoop
3. Cost-effective
4. Fast
5. Data handling
6. Scalable
7. Flexible

• Disadvantages
1. Issue with small files
2. Vulnerable by nature
3. Processing overhead
4. Iterative processing
5. Security
6. Supports only batch processing
THANK YOU !
