
A

Seminar Report
On
Big Data Analytics

Submitted in partial fulfillment of the requirements for the


award of the degree of

Bachelor of Technology
in
Computer Science Engineering
(Session 2018-2022)

Submitted To: Mr. Atul Chaudhary (Guide)
Submitted By: Aditya Rangroo (18EEACS003)

Department of Computer Science and Engineering
Engineering College, Ajmer
Certificate

This is to certify that the seminar work entitled “Big Data Analytics” is a bona fide work carried out
in the seventh semester by “Aditya Rangroo” in partial fulfillment for the award of Bachelor of
Technology in Computer Science and Engineering from Engineering College Ajmer during the
academic year 2021-2022.

Mr. Atul Chaudhary (Guide)
Mr. Deepak Gupta (Seminar Coordinator)
Dr. Jyoti Gajrani (H.O.D., CSE & IT)
Declaration
I hereby declare that the work which is being presented in this report entitled “Big Data Analytics”,
submitted in partial fulfillment for the award of the degree of “Bachelor of Technology” in Computer
Science, Bikaner Technical University, is an authentic record of my own work carried out under the
guidance of my guide Mr. Atul Chaudhary, Engineering College, Ajmer.

Mr. Atul Chaudhary (Guide)

Aditya Rangroo (18EEACS003)
Acknowledgement

First of all, I am indebted to the GOD ALMIGHTY for giving me an opportunity to excel in
my efforts to complete this seminar on time.

I am extremely grateful to Dr. Rekha Mehra, Principal, Engineering College Ajmer and Dr.
Jyoti Gajrani, Head of Department, Computer Science and Engineering, for providing all the
required resources for the successful completion of my seminar.

My heartfelt gratitude to my seminar guide Mr. Atul Chaudhary for his valuable suggestions
and guidance in the preparation of the seminar report.

I would be failing in my duty if I did not acknowledge, with grateful thanks, the authors of the
references and other literature referred to in this seminar. Last but not least, I am very
thankful to my parents, who guided me in every step I took.

Aditya Rangroo

(18EEACS003)
Abstract
The age of big data is now upon us, but traditional data analytics may not be able to handle
such large quantities of data. The questions that arise are how to develop a high-performance
platform to efficiently analyze big data and how to design appropriate mining algorithms to find
useful insights in it. To discuss this issue in depth, this report begins
with a brief introduction to data analytics, followed by a discussion of big data analytics. Big
data refers to datasets that are not only big, but also high in variety and velocity, which makes
them difficult to handle using traditional tools and techniques. Due to the rapid growth of such
data, solutions need to be studied and provided in order to handle and extract value and
knowledge from these datasets. Furthermore, decision makers need to be able to gain valuable
insights from such varied and rapidly changing data, ranging from daily transactions to
customer interactions and social network data. Such value can be provided using big data
analytics, which is the application of advanced analytics techniques on big data. This report aims
to analyze some of the different analytics methods and tools which can be applied to big data, as
well as the opportunities provided by the application of big data analytics in various decision
domains.
Table of Contents

S.no Title
1 Introduction
- What is Big Data?
- Types of Big Data
2 Characteristics of Big Data
3 History and Evolution of Big Data Analytics
4 Big Data Analytics
5 Big Data Analytics Tools and Methods
6 Big Data Storage and Management
7 Big Data Analytics Applications
8 Conclusion
9 References

List of Figures

S.no Title
1 Facebook as an example of big data
2 Characteristics of Big Data
3 History and Evolution of Big Data
4 Big Data Analytics Tools/Techniques
5 Apache Storm
6 Talend
7 CouchDB
8 Spark
9 Splice Machine
10 Plotly
11 HDInsight
12 R Language
13 Skytree
14 Lumify
15 Apache Hadoop
16 Big Data Applications

Introduction

What is Big data?

The quantities, characters, or symbols on which operations are performed by a computer, which may
be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or
mechanical recording media, are called data.

Big Data is also data, but with a huge size. Big Data is a term used to describe a collection of data that
is huge in volume and yet growing exponentially with time. In short, such data is so large and complex
that none of the traditional data management tools are able to store or process it efficiently.

Example of Big Data: Statistics show that 500+ terabytes of new data are ingested into the
databases of the social media site Facebook every day. This data is mainly generated through photo
and video uploads, message exchanges, comments, etc.

Fig(1) : Facebook as an example of big data


Types of Big Data

Big Data can be found in three forms:

1. Structured
2. Unstructured
3. Semi-structured

Structured

Any data that can be stored, accessed, and processed in the form of a fixed format is termed
'structured' data. Over a period of time, talent in computer science has achieved great success in
developing techniques for working with such data (where the format is well known in
advance) and in deriving value out of it. However, nowadays we are foreseeing issues as the size
of such data grows to a huge extent, with typical sizes being in the range of multiple zettabytes.

Unstructured

Any data with an unknown form or structure is classified as unstructured data. In addition to its huge
size, unstructured data poses multiple challenges in terms of processing it to derive value.
A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos, etc. Nowadays organizations have a wealth of data
available with them, but unfortunately they don't know how to derive value out of it, since this data is
in its raw form or unstructured format.

Semi-structured

Semi-structured data can contain both forms of data. We can see semi-structured data as
structured in form, but it is actually not defined by, for example, a table definition in a relational
DBMS. An example of semi-structured data is data represented in an XML file.
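
As an illustration, the small Python sketch below parses a hypothetical XML record of this kind; the element names are invented for the example:

    import xml.etree.ElementTree as ET

    # A hypothetical semi-structured record: the tags give it structure,
    # but there is no fixed relational schema behind it.
    xml_doc = """
    <employee>
        <name>Asha</name>
        <department>Sales</department>
        <skills>
            <skill>SQL</skill>
            <skill>Python</skill>
        </skills>
    </employee>
    """

    record = ET.fromstring(xml_doc)
    print(record.findtext("name"))                   # Asha
    print([s.text for s in record.iter("skill")])    # ['SQL', 'Python']
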
Characteristics of Big Data
Big data is data whose scale, distribution, diversity, and/or timeliness require the use of new technical
architectures, analytics, and tools in order to enable insights that unlock new sources of business value.
Three main features characterize big data: volume, variety, and velocity, or the three V’s. The volume
of the data is its size, and how enormous it is. Velocity refers to the rate at which data is changing,
or how often it is created. Finally, variety includes the different formats and types of data, as well as
the different kinds of uses and ways of analyzing the data.

Fig(2) : Characteristics of Big Data


Data volume is the primary attribute of big data. Big data can be quantified by its size in TBs or PBs, as
well as by the number of records, transactions, tables, or files.

Additionally, one of the things that make big data really big is that it’s coming from a greater variety
of sources than ever before, including logs, clickstreams, and social media. Using these sources for
analytics means that common structured data is now joined by unstructured data, such as text and
human language, and semi-structured data, such as eXtensible Markup Language (XML) or Rich Site
Summary (RSS) feeds. There is also data that is hard to categorize, since it comes from audio, video,
and other devices.

Furthermore, multi-dimensional data can be drawn from a data warehouse to add historic context to big
data. Thus, with big data, variety is just as big as volume. Moreover, big data can be described by its
velocity or speed. This is basically the frequency of data generation or the frequency of data delivery.

The leading edge of big data is streaming data, which is collected in real time from websites. Some
researchers and organizations have discussed the addition of a fourth V, veracity. Veracity focuses
on the quality of the data. It characterizes big data quality as good, bad, or undefined due to data
inconsistency, incompleteness, ambiguity, latency, deception, and approximations.

There are four major characteristics of big data; they are as follows:

(i) Volume – The name Big Data itself is related to a size which is enormous. Size of data plays a
very crucial role in determining value out of data. Also, whether a particular data can actually be
considered as a Big Data or not, is dependent upon the volume of data. Hence, 'Volume' is one
characteristic which needs to be considered while dealing with Big Data.

(ii) Variety – The next aspect of Big Data is its variety.

Variety refers to heterogeneous sources and the nature of data, both structured and unstructured.
During earlier days, spreadsheets and databases were the only sources of data considered by most of
the applications.

Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. are also
being considered in the analysis applications. This variety of unstructured data poses certain issues for
storage, mining and analyzing data.

(iii) Velocity – The term 'velocity' refers to the speed of generation of data. How fast the data is
generated and processed to meet demands determines the real potential in the data. Big Data velocity
deals with the speed at which data flows in from sources like business processes, application logs,
networks, social media sites, sensors, mobile devices, etc. The flow of data is massive
and continuous.

(iv) Variability – This refers to the inconsistency which can be shown by the data at times, thus
hampering the process of being able to handle and manage the data effectively.
History and Evolution of Big Data Analytics

The concept of big data has been around for years; most organizations now understand that if they
capture all the data that streams into their businesses, they can apply analytics and get significant value
from it. But even in the 1950s, decades before anyone uttered the term “big data,” businesses were
using basic analytics (essentially numbers in a spreadsheet that were manually examined) to uncover
insights and trends.

The new benefits that big data analytics brings to the table, however, are speed and efficiency.
Whereas a few years ago a business would have gathered information, run analytics and unearthed
information that could
be used for future decisions, today that business can identify insights for immediate decisions. The
ability to work faster – and stay agile – gives organizations a competitive edge they didn’t have before.

Fig(3) : History and Evolution of Big Data


Big Data Analytics

The term “Big Data” has recently been applied to datasets that grow so large that they become
awkward to work with using traditional database management systems. They are data sets whose size
is beyond the ability of commonly used software tools and storage systems to capture, store, manage,
as well as process the data within a tolerable elapsed time.

Big data sizes are constantly increasing, currently ranging from a few dozen terabytes (TB) to many
petabytes (PB) of data in a single data set.

Consequently, some of the difficulties related to big data include capture, storage, search, sharing,
analytics, and visualizing. Today, enterprises are exploring large volumes of highly detailed data so as
to discover facts they didn’t know before.

Hence, big data analytics is where advanced analytic techniques are applied on big data sets. Analytics
based on large data samples reveals and leverages business change. However, the larger the set of data,
the more difficult it becomes to manage.

In this section, we will start by discussing the characteristics of big data, as well as its importance.
Naturally, business benefit can commonly be derived from analyzing larger and more complex data
sets that require real time or near-real time capabilities; however, this leads to a need for new data
architectures, analytical methods, and tools.

Therefore, the successive section will elaborate the big data analytics tools and methods starting with
the big data storage and management, then moving on to the big data analytic processing. It then
concludes with some of the various big data analyses which have grown in usage with big data.
Big Data Analytics Tools and Methods

With the evolution of technology and the increased multitudes of data flowing in and out of
organizations daily, there has become a need for faster and more efficient ways of analyzing such data.
Having piles of data on hand is no longer enough to make efficient decisions at the right time.

Such data sets can no longer be easily analyzed with traditional data management and analysis
techniques and infrastructures. Therefore, there arises a need for new tools and methods specialized for
big data analytics, as well as the required architectures for storing and managing such data.
Accordingly, the emergence of big data has an effect on everything from the data itself and its
collection, to the processing, to the final extracted decisions.

Consequently, the Big Data, Analytics, and Decisions (B-DAD) framework was proposed, which
incorporates big data analytics tools and methods into the decision-making process. The framework
maps the different big data storage, management, and processing tools, analytics tools and methods,
and visualization and evaluation tools to the different phases of the decision-making process.

Fig(4) : Big Data Analytics Tools/Techniques [5]

Hence, the changes associated with big data analytics are reflected in three main areas: big data storage
and architecture, data and analytics processing, and, finally, the big data analyses which can be applied
for knowledge discovery and informed decision making.
Each area will be further discussed in this section. However, since big data is still evolving as an
important field of research, and new findings and tools are constantly developing, this section is not
exhaustive of all the possibilities, and focuses on providing a general idea, rather than a list of all
potential opportunities and technologies.
With the rise in the volume of big data and the tremendous growth in cloud computing, cutting-edge
big data analytics tools have become the key to achieving a meaningful analysis of data. In this section,
we shall discuss the top big data analytics tools and their key features.
Big Data Analytics Tools
Apache Storm: Apache Storm is an open-source and free big data computation system. It provides a
real-time framework for data stream processing and supports any programming language. It offers a
distributed, real-time, fault-tolerant processing system with real-time computation capabilities. The
Storm scheduler manages workloads across multiple nodes with reference to the topology configuration,
and Storm works well with the Hadoop Distributed File System (HDFS). A brief Python-based sketch
follows the feature list below.

Fig(5) Apache Storm

Features:

 It is benchmarked as processing one million 100-byte messages per second per node
 Storm guarantees that each unit of data will be processed at least once
 Great horizontal scalability
 Built-in fault tolerance
 Auto-restart on crashes
 Written in Clojure
 Works with Directed Acyclic Graph (DAG) topologies
 Output files are in JSON format
 It has multiple use cases – real-time analytics, log processing, ETL, continuous computation,
distributed RPC, machine learning.
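
As a rough sketch of how a Storm component can be written from Python, the example below uses the third-party streamparse library (an assumption of this report, not mentioned above) to define a bolt that splits incoming sentences into words; in a real deployment this class would be wired into a topology definition and submitted to a Storm cluster:

    from streamparse import Bolt

    class SplitSentenceBolt(Bolt):
        """Illustrative bolt: receives a tuple containing a sentence
        and emits one tuple per word downstream in the topology."""

        def process(self, tup):
            sentence = tup.values[0]      # first field of the incoming tuple
            for word in sentence.split():
                self.emit([word])         # pass each word to the next bolt
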
Talend: Talend is a big data tool that simplifies and automates big data integration. Its graphical
wizard generates native code. It also allows big data integration, master data management and checks
data quality.

Fig(6) Talend

Features:

 Streamlines ETL and ELT for Big data.


 Accomplishes the speed and scale of Spark.
 Accelerates your move to real-time.
 Handles multiple data sources.
 Provides numerous connectors under one roof, which in turn will allow you to customize the
solution as per your need.
 Talend Big Data Platform simplifies using MapReduce and Spark by generating native code
 Smarter data quality with machine learning and natural language processing
 Agile DevOps to speed up big data projects
 Streamline all the DevOps processes

Apache CouchDB: It is an open-source, cross-platform, document-oriented NoSQL database that
aims at ease of use and a scalable architecture. It is written in the concurrency-oriented language
Erlang. CouchDB stores data in JSON documents that can be accessed over the web or queried using
JavaScript. It offers distributed scaling with fault-tolerant storage, and it allows data to be replicated
between instances via the Couch Replication Protocol [6]. A brief access sketch is given after the
feature list below.

Fig(7) CouchDB
Features:

 CouchDB is a single-node database that works like any other database
 It allows running a single logical database server on any number of servers
 It makes use of the ubiquitous HTTP protocol and JSON data format
 Document insertion, update, retrieval, and deletion are quite easy
 The JavaScript Object Notation (JSON) format is translatable across different languages
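
Because CouchDB speaks plain HTTP and JSON, it can be used from any language without a special driver. The minimal Python sketch below uses the requests library against a local CouchDB instance; the host, port, credentials, database name, and document fields are assumptions for the example:

    import requests

    BASE = "http://admin:password@127.0.0.1:5984"   # assumed local CouchDB instance
    DB = f"{BASE}/seminar_demo"

    requests.put(DB)                                # create the database over HTTP

    # Insert a JSON document
    doc = {"type": "report", "title": "Big Data Analytics", "year": 2021}
    resp = requests.post(DB, json=doc).json()       # CouchDB returns the generated id and rev

    # Retrieve the same document back over HTTP
    fetched = requests.get(f"{DB}/{resp['id']}").json()
    print(fetched["title"])
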

Apache Spark: Spark is also a very popular, open-source big data analytics tool. Spark has over
80 high-level operators that make it easy to build parallel apps. It is used at a wide range of organizations
to process large datasets [6]. A minimal word-count sketch is given after the feature list below.

Fig(8) Spark

Features:

 It helps run applications in a Hadoop cluster up to 100 times faster in memory and ten
times faster on disk
 It offers lightning-fast processing
 Support for sophisticated analytics
 Ability to integrate with Hadoop and existing Hadoop data
 It provides built-in APIs in Java, Scala, and Python
 Spark provides in-memory data processing capabilities, which are much faster than the disk-based
processing leveraged by MapReduce.
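
A minimal PySpark sketch of the classic word count illustrates the high-level operators mentioned above; the input path is a placeholder:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCount").getOrCreate()

    # Read a text file (placeholder path), split lines into words,
    # and count occurrences with a few high-level RDD operators.
    lines = spark.sparkContext.textFile("hdfs:///data/sample.txt")
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    for word, count in counts.take(10):
        print(word, count)

    spark.stop()
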
Splice Machine: It is a big data analytics tool. Its architecture is portable across public clouds
such as AWS, Azure, and Google Cloud.

Fig(9) Splice Machine

Features:

 It can dynamically scale from a few to thousands of nodes to enable applications at every scale
 The Splice Machine optimizer automatically evaluates every query to the distributed HBase
regions
 Reduce management, deploy faster, and reduce risk
 Consume fast streaming data, develop, test and deploy machine learning models

Plotly: Plotly is an analytics tool that lets users create charts and dashboards to share online.

Fig(10) Plotly

Features:

 Easily turn any data into eye-catching and informative graphics


 It provides audited industries with fine-grained information on data provenance
 Plotly offers unlimited public file hosting through its free community plan
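
A small sketch of how a shareable chart can be produced with Plotly's Python library (plotly.express); the data values are made up for illustration:

    import plotly.express as px

    # Made-up monthly event counts, just to illustrate the API
    months = ["Jan", "Feb", "Mar", "Apr"]
    events = [120, 340, 290, 410]

    fig = px.bar(x=months, y=events,
                 labels={"x": "Month", "y": "Events ingested"},
                 title="Illustrative event volume")
    fig.write_html("events.html")   # self-contained HTML chart that can be shared online
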
Azure HDInsight: It is a Spark and Hadoop service in the cloud. It provides big data cloud offerings
in two categories, Standard and Premium. It provides an enterprise-scale cluster for the organization to
run their big data workloads.

Fig(11) HDInsight

Features:

 Reliable analytics with an industry-leading SLA


 It offers enterprise-grade security and monitoring
 Protect data assets and extend on-premises security and governance controls to the cloud
 A high-productivity platform for developers and scientists
 Integration with leading productivity applications
 Deploy Hadoop in the cloud without purchasing new hardware or paying other up-front costs

R: R is a programming language and free software environment for statistical computing and graphics.
The R language is popular among statisticians and data miners for developing statistical software and
performing data analysis. The R language provides a large number of statistical tests.

Fig(12) R Language
Features:

 R is mostly used along with the JupyteR stack (Julia, Python, R) to enable wide-scale statistical
analysis and data visualization; JupyteR is among the four most widely used big data visualization tools
 More than 9,000 CRAN (Comprehensive R Archive Network) packages and modules allow composing
almost any analytical model, running it in a convenient environment, adjusting it on the go, and
inspecting the analysis results at once
The R language has the following characteristics:
 R can run inside the SQL server
 R runs on both Windows and Linux servers
 R supports Apache Hadoop and Spark
 R is highly portable
 R easily scales from a single test machine to vast Hadoop data lakes
 Effective data handling and storage facilities
 It provides a suite of operators for calculations on arrays, in particular matrices
 It provides a coherent, integrated collection of big data tools for data analysis
 It provides graphical facilities for data analysis which display either on-screen or on hardcopy

Skytree: Skytree is a big data analytics tool that empowers data scientists to build more accurate
models faster. It offers accurate predictive machine learning models that are easy to use.

Fig(13) Skytree

Features:

 Highly Scalable Algorithms


 Artificial intelligence for data scientists
 It allows data scientists to visualize and understand the logic behind ML decisions
 Easy to adopt, via the GUI or programmatically in Java
 Model Interpretability
 It is designed to solve robust predictive problems with data
preparation capabilities
 Programmatic and GUI Access

Lumify: Lumify is considered a big data fusion, analysis, and visualization platform. It helps
users discover connections and explore relationships in their data via a suite of analytic options.

Fig(14) Lumify

Features:

 It provides both 2D and 3D graph visualizations with a variety of automatic layouts


 Link analysis between graph entities, integration with mapping systems, geospatial analysis,
multimedia analysis, real-time collaboration through a set of projects or workspaces.
 It comes with specific ingest processing and interface elements for textual content, images, and
videos
 Its spaces feature allows you to organize work into a set of projects, or workspaces
 It is built on proven, scalable big data technologies
 Supports the cloud-based environment. Works well with Amazon’s AWS.
Hadoop: The long-standing champion in the field of Big Data processing, well known for its
capabilities for huge-scale data processing. It has low hardware requirements, as the open-source Big
Data framework can run on-premises or in the cloud. The main Hadoop benefits and features are as
follows:

 Hadoop Distributed File System (HDFS), oriented at working with huge-scale bandwidth
 MapReduce, a highly configurable model for Big Data processing
 YARN, a resource scheduler for Hadoop resource management
 Hadoop Libraries, the needed glue for enabling third-party modules to work with Hadoop

Fig(15) Apache Hadoop

Apache Hadoop is a software framework employed for clustered file systems and the handling of big
data. It processes big data datasets using the MapReduce programming model, and it is designed to
scale up from single servers to thousands of machines. Hadoop is an open-source framework written in
Java and provides cross-platform support. No doubt, this is the topmost big data tool: over half of the
Fortune 50 companies use Hadoop. Some of the big names include Amazon Web Services, Hortonworks,
IBM, Intel, Microsoft, Facebook, etc. A minimal MapReduce sketch is given after the feature list below.

Features:

 Authentication improvements when using HTTP proxy server


 Specification for Hadoop Compatible File system effort
 Support for POSIX-style file system extended attributes
 It offers a robust ecosystem that is well suited to meet the analytical needs of a developer
 It brings Flexibility In Data Processing
 It allows for faster data Processing
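
Hadoop MapReduce jobs are usually written in Java, but Hadoop Streaming lets any executable act as mapper and reducer. The sketch below is a word-count pair in Python that could be submitted through Hadoop Streaming; it is an illustration under that assumption, and the exact streaming-jar invocation depends on the installation:

    # mapper.py - emit "word <TAB> 1" for every word read from standard input
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py - sum counts per word; Hadoop delivers the mapper output sorted by key
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        line = line.rstrip("\n")
        if not line:
            continue
        word, count = line.rsplit("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)

    if current_word is not None:
        print(f"{current_word}\t{current_count}")
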
Big Data Storage and Management

One of the first things organizations have to manage when dealing with big data is where and how this
data will be stored once it is acquired. The traditional methods of structured data storage and retrieval
include relational databases, data marts, and data warehouses. The data is uploaded to the storage from
operational data stores using Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT)
tools, which extract the data from outside sources, transform the data to fit operational needs, and
finally load the data into the database or data warehouse. Thus, the data is cleaned, transformed, and
catalogued before being made available for data mining and online analytical functions.
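
As a toy illustration of the Extract, Transform, Load pattern described above, the following Python sketch pulls rows from a CSV export, cleans them, and loads them into a SQLite table standing in for the warehouse; the file name and column names are invented:

    import csv
    import sqlite3

    # Extract: read raw rows from an operational export (invented file/columns)
    with open("orders_export.csv", newline="") as f:
        raw_rows = list(csv.DictReader(f))

    # Transform: clean and normalize the data before loading
    clean_rows = [
        (row["order_id"], row["customer"].strip().title(), float(row["amount"]))
        for row in raw_rows
        if row["amount"]          # drop records with a missing amount
    ]

    # Load: write the cleaned rows into the warehouse table
    conn = sqlite3.connect("warehouse.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean_rows)
    conn.commit()
    conn.close()
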

However, the big data environment calls for Magnetic, Agile, Deep (MAD) analysis skills, which differ from
the aspects of a traditional Enterprise Data Warehouse (EDW) environment. First of all, traditional EDW
approaches discourage the incorporation of new data sources until they are cleansed and integrated. Due to
the ubiquity of data nowadays, big data environments need to be magnetic, thus attracting all the data sources,
regardless of the data quality. Furthermore, given the growing numbers of data sources, as well as the
sophistication of the data analyses, big data storage should allow analysts to easily produce and adapt data
rapidly. This requires an agile database, whose logical and physical contents can adapt in sync with rapid
data evolution. Finally, since current data analyses use complex statistical methods, and analysts need to be
able to study enormous datasets by drilling up and down, a big data repository also needs to be deep, and
serve as a sophisticated algorithmic runtime engine.

Accordingly, several solutions, ranging from distributed systems and Massive Parallel Processing
(MPP) databases for providing high query performance and platform scalability, to non-relational or
in-memory databases, have been
used for big data. Non-relational databases, such as Not Only SQL (NoSQL), were developed for
storing and managing unstructured, or non-relational, data. NoSQL databases aim for massive scaling,
data model flexibility, and simplified application development and deployment. Contrary to relational
databases, NoSQL databases separate data management and data storage. Such databases rather focus
on high-performance, scalable data storage, and allow data management tasks to be written in the
application layer instead of having them written in database-specific languages.

On the other hand, in-memory databases manage the data in server memory, thus eliminating disk
input/output (I/O) and enabling real-time responses from the database. Instead of using mechanical
disk drives, it is possible to store the primary database in silicon-based main memory. This results in
orders-of-magnitude improvements in performance, and allows entirely new applications to be
developed. Furthermore, in-memory databases are now being used for advanced analytics on big data,
especially to speed the access to and scoring of analytic models for analysis.
This provides scalability for big data, and speed for discovery analytics. Alternatively, Hadoop is a
framework for performing big data analytics which provides reliability, scalability, and manageability by
providing an implementation for the MapReduce paradigm, which is discussed in the following section,
as well as gluing the storage and analytics together. Hadoop consists of two main components: the HDFS
for the big data storage, and MapReduce for big data analytics [9]. The HDFS storage function provides a
redundant and reliable distributed file system, which is optimized for large files, where a single file is
split into blocks and distributed across cluster nodes. Additionally, the data is protected among the nodes
by a replication mechanism, which ensures availability and reliability despite any node failures.
Big Data Analytics Applications

Fig(16) Big Data Applications

There’s no single technology that encompasses big data analytics. Of course, there’s advanced
analytics that can be applied to big data, but in reality several types of technology work together to
help you get the most value from your information. Here are the biggest players:

Machine Learning: Machine learning, a specific subset of AI that trains a machine how to learn,
makes it possible to quickly and automatically produce models that can analyze bigger, more complex
data and deliver faster, more accurate results – even on a very large scale. And by building precise
models, an organization has a better chance of identifying profitable opportunities – or avoiding
unknown risks.
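
As a minimal, hedged sketch of this idea (using scikit-learn, which is an assumption rather than a library named in the text), a clustering model is learned automatically from a handful of synthetic customer records and then used to score a new one:

    from sklearn.cluster import KMeans
    import numpy as np

    # Synthetic customer records: [annual spend, visits per month]
    customers = np.array([
        [200, 1], [220, 2], [250, 1],       # low-spend group
        [5000, 12], [5200, 15], [4800, 10]  # high-spend group
    ])

    # Learn two segments automatically from the data
    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
    print(model.labels_)                # which segment each customer fell into
    print(model.predict([[4500, 11]]))  # score a new customer
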

Data management: Data needs to be high quality and well-governed before it can be reliably
analyzed. With data constantly flowing in and out of an organization, it's important to establish
repeatable processes to build and maintain standards for data quality. Once data is reliable,
organizations should establish a master data management program that gets the entire enterprise on the
same page.
Data mining: Data mining technology helps you examine large amounts of data to discover patterns in
the data – and this information can be used for further analysis to help answer complex business
questions. With data mining software, you can sift through all the chaotic and repetitive noise in data,
pinpoint what's relevant, use that information to assess likely outcomes, and then accelerate the pace of
making informed decisions.

Hadoop: This open source software framework can store large amounts of data and run applications
on clusters of commodity hardware. It has become a key technology to doing business due to the
constant increase of data volumes and varieties, and its distributed computing model processes big data
fast. An additional benefit is that Hadoop's open source framework is free and uses commodity
hardware to store large quantities of data.

Predictive analytics: Predictive analytics technology uses data, statistical algorithms and machine-
learning techniques to identify the likelihood of future outcomes based on historical data. It's all about
providing a best assessment on what will happen in the future, so organizations can feel more
confident that they're making the best possible business decision. Some of the most common
applications of predictive analytics include fraud detection, risk, operations and marketing.
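
A small sketch of the fraud-detection use case mentioned above, again assuming scikit-learn and using made-up historical transactions; a logistic regression model estimates the likelihood that a new transaction is fraudulent:

    from sklearn.linear_model import LogisticRegression

    # Synthetic history: [transaction amount, hour of day], label 1 = fraud
    X = [[20, 14], [35, 10], [900, 3], [15, 16], [1200, 2], [40, 11], [1500, 4]]
    y = [0, 0, 1, 0, 1, 0, 1]

    model = LogisticRegression().fit(X, y)

    # Estimate the likelihood that a new transaction is fraudulent
    new_txn = [[1100, 3]]
    print(model.predict_proba(new_txn)[0][1])   # probability of the "fraud" class
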

Text mining: With text mining technology, you can analyze text data from the web, comment fields,
books and other text-based sources to uncover insights you hadn't noticed before. Text mining uses
machine learning or natural language processing technology to comb through documents – emails,
blogs, Twitter feeds, surveys, competitive intelligence and more – to help you analyze large amounts
of information and discover new topics and term relationships.
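
A brief sketch of one common text-mining step, extracting characteristic terms with TF-IDF; the library choice (scikit-learn) and the documents are assumptions for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Made-up snippets standing in for emails, tweets, survey answers, etc.
    docs = [
        "delivery was late and the support team never replied",
        "great product, fast delivery, would buy again",
        "support resolved my billing issue quickly",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(docs)

    # Terms the model considers most characteristic of the first document
    terms = vectorizer.get_feature_names_out()
    weights = tfidf.toarray()[0]
    print(sorted(zip(terms, weights), key=lambda t: -t[1])[:3])
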
Conclusion
Big Data Analytics is a security-enhancing tool of the future. Gathering, organizing, and applying
information to users in a personalized fashion would take a human days, weeks, or even months to
accomplish. In a capitalistic market such as the United States of America’s, competition is key.
Time cannot be wasted gathering information and making decisions on incidents that have already
taken place. Stopping incidents in their tracks, completing investigative work, and quarantining
threatening sources need to happen immediately, allowing administrators and management to make
on-the-spot decisions. With big data analytics, more educated decisions can be made and focus can
remain on business operations moving forward.
References

[1] Nada Elgendy, Ahmed Elragal, “Big Data Analytics: A Literature Review Paper.” Available:
https://www.researchgate.net/publication/264555968_Big_Data_Analytics_A_Literature_Review_Paper

[2] Article on Big Data. Available:
https://journalofbigdata.springeropen.com/articles/10.1186/s40537-015-0030-3

[3] Big Data Analytics overview:
https://www.sas.com/en_us/insights/analytics/big-data-analytics.html

[4] Article on Big Data: https://www.guru99.com/what-is-big-data.html

[5] Google Images: https://www.google.com/imghp?hl=en

[6] Big Data Analytics tools and their key features (an article on Edureka). Available:
https://www.edureka.co/blog/bigdata-analytics-tools-and-features/
