Big Data Technologies: the buzzword you hear a lot these days. In this article, we discuss the groundbreaking technologies that helped Big Data spread its branches and reach greater heights.
We need Big Data processing technologies to analyse this huge amount of real-time data and come up with conclusions and predictions that reduce risks in the future.
Firstly, Operational Big Data is the normal day-to-day data that we generate. This could be online transactions, social media activity, the data of a particular organisation, and so on. You can even consider this a kind of raw data that feeds the Analytical Big Data Technologies. Some examples:
Online ticket bookings, which include your rail tickets, flight tickets, movie tickets, etc.
Online shopping on Amazon, Flipkart, Walmart, Snapdeal, and many more.
Data from social media sites like Facebook, Instagram, WhatsApp, and a lot more.
The employee details of any multinational company.
Analytical Big Data is like the advanced version of Big Data Technologies: a little more complex than Operational Big Data. In short, Analytical Big Data is where the actual performance comes into the picture, and the crucial real-time business decisions are made by analysing the Operational Big Data. Some fields where it is applied:
Stock markets.
Space missions, where every single bit of information is crucial.
Weather forecasting.
Medical fields, where a particular patient's health status can be monitored.
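To make the split concrete, here is a minimal, purely illustrative Python sketch in which raw operational records (the field names are hypothetical) are aggregated into the kind of analytical summary a business decision could be based on:

```python
from collections import defaultdict

# Operational data: raw, day-to-day transaction events (hypothetical fields).
transactions = [
    {"day": "Mon", "amount": 120.0},
    {"day": "Mon", "amount": 80.0},
    {"day": "Tue", "amount": 500.0},
]

def daily_totals(records):
    """Analytical step: aggregate raw events into per-day revenue."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["day"]] += rec["amount"]
    return dict(totals)

print(daily_totals(transactions))  # {'Mon': 200.0, 'Tue': 500.0}
```

Real systems run this kind of aggregation over billions of records, which is exactly why the distributed technologies below exist.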
Let us have a look at the top Big Data Technologies being used in the IT Industries.
Data Storage
Data Mining
Data Analytics
Data Visualization
Now let us deal with the technologies falling under each of these categories with
their facts and capabilities, along with the companies which are using them.
Data Storage
Hadoop
MongoDB
RainStor
Hunk
Hunk lets you access data in remote Hadoop clusters through virtual indexes and lets you use the Splunk Search Processing Language to analyse it. With Hunk, you can report on and visualize large amounts of data from your Hadoop and NoSQL data sources.
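As an illustration, a Splunk Search Processing Language query over such data might look like the following (the index and field names are hypothetical):

```
index=web_logs sourcetype=access_combined status>=500
| stats count AS errors BY host
| sort -errors
```

The pipe chains search results through successive commands, much like a Unix pipeline.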
Data Mining
Presto
Companies Using Presto:
Rapid Miner
Companies Using RapidMiner:
Elasticsearch
Companies Using Elasticsearch:
With this, we can now move into Big Data Technologies used in Data Analytics.
Data Analytics
Kafka
Kafka is built around three main roles:
Publisher
Subscriber
Consumer
It is similar to a message queue or an enterprise messaging system.
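The publish-subscribe idea can be sketched with Python's standard library; this is a conceptual stand-in, not the actual Kafka client API:

```python
import queue

# A "topic" modelled as a simple in-memory queue; in Kafka it would be a
# partitioned, replicated log hosted on a broker cluster.
topic = queue.Queue()

def publish(message):
    """Producer side: append a message to the topic."""
    topic.put(message)

def consume():
    """Consumer side: read the next message from the topic, in order."""
    return topic.get()

publish("order-created")
publish("order-shipped")
print(consume())  # order-created
```

The key property illustrated is decoupling: the publisher never needs to know who, or how many, the consumers are.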
Companies Using Kafka:
Splunk
Companies Using Splunk:
KNIME
KNIME allows users to visually create data flows, selectively execute some or all analysis steps, and inspect the results, models, and interactive views. KNIME is written in Java, is based on Eclipse, and makes use of Eclipse's extension mechanism to add plugins that provide additional functionality.
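KNIME itself is visual, but the underlying idea of a data flow, a chain of nodes whose intermediate results can be inspected, can be sketched in plain Python (the node names are hypothetical):

```python
# Each "node" is a function; a workflow is just their composition, and every
# intermediate result can be inspected, much like KNIME's node outputs.
def read_node():
    return [3, 1, 4, 1, 5, 9]

def filter_node(rows):
    return [r for r in rows if r > 2]

def stats_node(rows):
    return {"count": len(rows), "mean": sum(rows) / len(rows)}

raw = read_node()          # inspectable intermediate result
kept = filter_node(raw)    # [3, 4, 5, 9]
print(stats_node(kept))    # {'count': 4, 'mean': 5.25}
```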
Companies Using KNIME:
Spark
R-Language
Blockchain
Shared Ledger: an append-only, distributed system of records shared across a business network.
Smart Contract: business terms are embedded in the transaction database and executed with transactions.
Privacy: ensuring appropriate visibility; transactions are secure, authenticated, and verifiable.
Consensus: all parties in a business network agree to network-verified transactions.
Developed by: Bitcoin
Written in: JavaScript, C++, Python
Current stable version: Blockchain 4.0
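The shared-ledger idea, where each appended record commits to the one before it, can be sketched with Python's hashlib; this is a toy illustration, not a production blockchain:

```python
import hashlib

def block_hash(prev_hash, data):
    """Each block's hash covers the previous hash, linking the chain."""
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

# The ledger starts from a genesis block.
chain = [{"data": "genesis", "hash": block_hash("", "genesis")}]

def append_block(data):
    prev = chain[-1]["hash"]
    chain.append({"data": data, "hash": block_hash(prev, data)})

append_block("alice pays bob 5")
append_block("bob pays carol 2")

def verify():
    """Tampering with any earlier block breaks every later hash, which is
    what makes the shared record tamper-evident."""
    prev = ""
    for block in chain:
        if block["hash"] != block_hash(prev, block["data"]):
            return False
        prev = block["hash"]
    return True

print(verify())  # True
```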
Data Visualization
Tableau
Tableau is a powerful and fast-growing data visualization tool used in the Business Intelligence industry. Data analysis is very fast with Tableau, and the visualizations created are in the form of dashboards and worksheets.
Developed by: Tableau, on 17 May 2013
Written in: Java, C++, Python, C
Current stable version: Tableau 8.2
Plotly
Plotly is mainly used to make creating graphs faster and more efficient. It has API libraries for Python, R, MATLAB, Node.js, Julia, and Arduino, and a REST API. Plotly can also be used to style interactive graphs inside a Jupyter notebook.
Developed by: Plotly in the year 2012
Written in: JavaScript
Current stable version: Plotly 1.47.4
Beam
Apache Beam provides a portable API layer for building sophisticated parallel data-processing pipelines that can be executed across a variety of execution engines, or runners.
Developed by: Apache Software Foundation, on 15 June 2016
Written in: Java, Python
Current stable version: Apache Beam 0.1.0 (incubating)
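Conceptually, such a pipeline is a chain of transforms over a collection. The toy sketch below mimics that shape in plain Python; it is deliberately not the real apache_beam API:

```python
# Toy "pipeline": each stage transforms the whole collection, analogous to
# Beam's FlatMap / Map / CombinePerKey steps as executed by a runner.
def run_pipeline(lines):
    words = [w for line in lines for w in line.split()]   # flat-map
    pairs = [(w.lower(), 1) for w in words]               # map
    counts = {}
    for word, n in pairs:                                 # combine per key
        counts[word] = counts.get(word, 0) + n
    return counts

print(run_pipeline(["big data", "Big pipelines"]))
# {'big': 2, 'data': 1, 'pipelines': 1}
```

In Beam, the same word-count shape would run unchanged on Spark, Flink, or Dataflow, which is the point of the portable API layer.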
Docker
Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.
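As a hedged illustration, a minimal Dockerfile for a hypothetical Python application might look like this (the file names app.py and requirements.txt are assumptions):

```dockerfile
# Base image provides the runtime; everything below is layered on top of it.
FROM python:3.11-slim
WORKDIR /app
# Install the dependencies the application ships with.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application itself and declare how to run it.
COPY app.py .
CMD ["python", "app.py"]
```

Building this file produces a single image that bundles the code with its dependencies, which is the "one package" idea described above.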
Airflow
Kubernetes
Kubernetes is a vendor-agnostic cluster and container management tool, open-sourced by Google in 2014. It provides a platform for automation, deployment, scaling, and operation of application containers across clusters of hosts.
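As a hedged illustration, a minimal Deployment manifest asks Kubernetes to keep several replicas of a container running across the cluster (the names and image are hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3              # Kubernetes keeps three copies running
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example/web:1.0   # hypothetical container image
        ports:
        - containerPort: 8080
```

If a host or container fails, the controller notices the replica count has dropped and starts a replacement, which is the automation the paragraph above refers to.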
With this, we come to the end of this article. I hope I have thrown some light on Big Data and its technologies.