
Multimedia Big Data in OTT Platform

Introduction
What is Big Data?

Big data is data that arrives in greater variety, in increasing volumes and with higher velocity. It comes from many sources and in multiple formats. These three dimensions are known as the three V's, and they are key to understanding how big data is measured and how it is stored.

Big data is larger, more complex data, especially data arriving from many new sources. These collections are called data sets, and they are so large that normal data-processing software cannot handle them. Big data therefore requires a set of techniques and technologies, with new forms of integration, to reveal insights from data sets that are diverse, complex and large in scale.

The three V’s of big data are :-

1. Volume :- Big data is, first of all, about volume, and the volume of data can reach remarkable heights. It is estimated that around 2 quintillion bytes of data are created every day, roughly 300 times the figure for 2005. For some organizations this means tens of terabytes of data on storage devices and servers. Companies use all of this data to design products and drive actions. All in all, volume is the sheer amount of data collected from around the world.
2. Velocity :- Velocity is the rate at which data is received from sources across the world; essentially, it measures how fast data arrives. Some data comes in real time, while other data arrives in batches. This dimension matters most for companies that need their data to flow quickly, because fast-moving data supports better business decisions. It refers both to how quickly data is generated and to how quickly that data moves.
3. Variety :- Variety refers to the many types of data that are available. Data was once collected from one place and delivered in one format; such data types are structured and fit neatly into a database system. With the growth of big data, however, data increasingly arrives in new, unstructured types. Unstructured data does not follow a fixed schema and comes in many different files and formats. Although this data is very useful, it creates more work and requires more processing before it can be interpreted.

Big Data Technologies


To choose the best big data technologies, it is important to review and compare their features. Accordingly, we will highlight the top big data technologies that are ready to transform the technical field.

Recently, many big data technologies have made an impact on the market and the IT industry. They can be divided into four broad categories:

• Data Storage
• Data Mining
• Data Analytics
• Data Visualization

Now, we will discuss each one of them in brief.

Data Storage

This type of big data technology includes infrastructure that allows data to be stored and managed, and that is designed to handle large amounts of data. The most widely used technologies for this purpose are:

1. Apache Hadoop:

Apache Hadoop is a Java-based framework for storing and processing big data, developed by the Apache Software Foundation. The Hadoop library allows large data sets to be processed across many computers using simple programming models. Hadoop is built on HDFS (the Hadoop Distributed File System) and MapReduce. HDFS stores data in blocks across the nodes of a cluster and replicates each block on other nodes in case of system failure. MapReduce is the built-in processing engine that splits a large job across multiple nodes so that the load is balanced and the system's speed increases.
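As a rough illustration of the MapReduce model that Hadoop distributes across a cluster, the classic word-count job can be sketched in a single Python process (the input strings here are made-up sample data):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input split.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Shuffle + reduce: group the pairs by key and sum the values.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data is large", "big data moves fast"]
result = reduce_phase(map_phase(docs))
# result -> {"big": 2, "data": 2, "is": 1, "large": 1, "moves": 1, "fast": 1}
```

In real Hadoop the map and reduce phases run on different nodes and HDFS supplies the input splits; the logic of each phase, however, is exactly this.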

2. Mongo DB:

MongoDB is an open-source, document-oriented database designed to store large-scale data. It is called a NoSQL (Not Only SQL) database because it does not store or retrieve data in the form of tables. It is most often used with languages such as Python, Ruby and JavaScript. A MongoDB database stores data as JSON documents, which map naturally onto the data structures of these programming languages.
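To give a feel for the document model, the sketch below imitates a MongoDB collection with plain Python dictionaries; the `find` helper and the sample documents are illustrative stand-ins, not the real driver API:

```python
import json

# A MongoDB collection holds schema-free, JSON-like documents
# instead of rows in a fixed table.
collection = [
    {"title": "Stream A", "genre": "drama", "views": 120},
    {"title": "Stream B", "genre": "comedy", "views": 300},
]

def find(coll, **criteria):
    # Tiny stand-in for a MongoDB-style equality filter: keep the
    # documents whose fields match all of the given criteria.
    return [d for d in coll if all(d.get(k) == v for k, v in criteria.items())]

hits = find(collection, genre="comedy")
payload = json.dumps(hits[0])  # documents serialize straight to JSON
```

With the official pymongo driver the equivalent query would be written roughly as `collection.find({"genre": "comedy"})`.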

Data Mining
Data mining is the process of finding useful data within raw data and analyzing it. In many cases the raw data is very large and constantly changing, which makes finding anything in it impossible without special techniques. Widely used big data technologies for data mining include:

3. Presto:
Developed by Facebook, Presto is an open-source SQL query engine that is used for query analysis over large amounts of data. This engine supports fast analytic queries on data of different sizes, from gigabytes to petabytes. It does not rely on the MapReduce technique and is capable of retrieving data very quickly, within seconds to minutes. With this technology, multiple data sources can be queried at once.
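Presto speaks standard SQL, so the kind of analytic query it runs looks like the following; here Python's built-in sqlite3 module stands in for a Presto connection, and the table and numbers are invented sample data:

```python
import sqlite3

# sqlite3 stands in for a Presto connection here, purely to show the
# shape of an analytic SQL query: group rows and aggregate a measure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE views (title TEXT, region TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO views VALUES (?, ?, ?)",
    [("A", "EU", 30), ("B", "EU", 50), ("A", "US", 20)],
)
rows = conn.execute(
    "SELECT title, SUM(minutes) FROM views GROUP BY title ORDER BY title"
).fetchall()
# rows -> [("A", 50), ("B", 50)]
```

In Presto the same statement could run unchanged against data held in HDFS, a relational database, or several sources at once.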
4. RapidMiner:
RapidMiner is an advanced open-source data mining tool. It is mostly used in data science so that data scientists and big data analysts can analyze their data quickly. Through its built-in RapidMiner Radoop extension it also runs on the Hadoop framework. It provides access for loading and analyzing any type of data, whether structured or unstructured.

Data Analytics

Big data analytics involves analyzing the data obtained from the raw data during the data mining process and organizing it so that it helps the decision-making process. In this step, we extract more valuable information from the data we collected. Listed below are a few data analytics technologies.

5. Apache Spark:
Apache Spark is an open-source analytics engine that supports big data processing. The Spark platform can execute programs up to 100 times faster than Hadoop MapReduce. Spark provides development interfaces in Java, Python, Scala and R for working with large datasets. It was introduced by the Apache Software Foundation to speed up Hadoop computation.
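Part of Spark's speed comes from lazy evaluation: transformations such as `map` and `filter` only build a plan, and nothing executes until an action is called. Python generators give a single-machine sketch of that idea (this is a conceptual illustration, not the PySpark API itself):

```python
from functools import reduce

data = range(1, 11)

# Transformations: lazy, nothing is computed yet.
squared = (x * x for x in data)
evens = (x for x in squared if x % 2 == 0)

# Action: triggers evaluation of the whole pipeline in one pass.
total = reduce(lambda a, b: a + b, evens)
# total -> 220  (4 + 16 + 36 + 64 + 100)
```

In PySpark the same pipeline would be written against an RDD or DataFrame, with the cluster scheduler deciding where each stage runs.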
6. Splunk:
Splunk is an advanced software platform that searches, analyzes and visualizes machine-generated data from systems, websites and so on. In Splunk, real-time data is captured into a searchable repository, which is very helpful when creating reports. Splunk produces analytical reports including attractive graphs, charts and tables.
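Conceptually, Splunk indexes raw machine-generated log lines and lets you search and aggregate them. The sketch below performs the same kind of aggregation over a few made-up web-server lines:

```python
import re
from collections import Counter

# Count HTTP status codes across raw log lines: the kind of
# search-then-report step Splunk automates at scale.
logs = [
    "10.0.0.1 GET /home 200",
    "10.0.0.2 GET /play 500",
    "10.0.0.1 GET /home 200",
]
status = Counter(re.search(r"(\d{3})$", line).group(1) for line in logs)
# status -> Counter({"200": 2, "500": 1})
```

A result like this, charted over time, is exactly what a Splunk dashboard panel would show.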

Data Visualization
Data visualization is the graphical representation of information and data using charts and graphs, providing an easy way of viewing it. This technique makes it possible to grasp a large amount of information in seconds. Below are some top technologies for data visualization.

7. Tableau:
Tableau is one of the fastest-growing visualization tools; it makes it easy for users to create graphs and charts for visualizing and analyzing data. On this platform data is analyzed easily and quickly. You do not need any programming language to get started, so even people without experience can create visualizations with Tableau. It supports real-time sharing of data in the form of dashboards, sheets and so on.
8. Plotly:
Plotly is a Python library that facilitates visualization of big data. This tool makes it possible to create more effective graphs more quickly. Plotly has many advantages, including user-friendliness, reduced cost and more. It is like drawing on paper: you can draw anything you want. Compared with other visualization tools, Plotly gives full control over what is plotted. It offers a wide range of graphs and charts, including statistical charts, financial charts and more.
9. Clot from RapidMiner:
Clot is a tool used within RapidMiner for big data visualization, and it also helps with data handling.
