
Unit I

Big Data Analytics


Understanding Big Data:
• At a fundamental level, Big Data is just another collection of data that can be
analyzed and utilized for the benefit of the business.
• At the level of business, data generated by business operations can be
analyzed to generate insights that help the business make better
decisions.
• On another level, Big Data differs from traditional data in every
way: space, time, and function.
• The quantity of Big Data can be thousands of times greater than that of
traditional data.
• The speed of data generation and transmission can be around 1,000 times faster.
• The forms and functions of Big Data are roughly 10 times more diverse: from
numbers to text, pictures, audio, videos, web logs, machine data, and more.
The figure shows data usage and growth. As size and
complexity increase, the proportion of unstructured
data types also increases.
Big Data Definitions:
• Big Data is high-volume, high-velocity and/or high-
variety information that requires new forms of
processing for enhanced decision making, insight
discovery and process optimization.
• “A collection of data sets so large or complex that
traditional data processing applications are
inadequate.”
• Big Data refers to data sets whose size is beyond the
ability of typical database software tools to capture,
store, manage and analyze.
Examples for Big Data:
• Social networks and web data, such as Facebook,
Twitter, e-mails, blogs and YouTube.
• Transaction data and business process data,
such as credit card transactions, flight bookings,
medical records, and insurance business data.
• Machine-generated data, such as machine-to-machine
or Internet of Things data, and the data
from sensors, trackers, web logs and computer
system logs.
• Human-generated data, such as biometric data and
human-machine interaction data.
Capturing Big Data:
Volume
• The name Big Data itself is related to an enormous size.
• Big Data consists of vast volumes of data generated daily from many
sources, such as business processes, machines,
social media platforms, networks, human
interactions, and many more.
Variety
• Big Data can be structured, semi-structured, or
unstructured, and is collected from many different
sources.
• In the past, data was collected only from databases and spreadsheets,
but these days it arrives in a wide array of
forms, such as PDFs, emails, audio, social media posts, photos,
videos, etc.
Velocity
• Velocity is the speed at which data is
created, often in real time.
• Big data velocity deals with the speed at which data
flows in from sources such as application logs, business
processes, networks, social media sites,
sensors, mobile devices, etc.
Veracity
• Veracity refers to how reliable the data is. There are
many ways to filter or translate the data to make it more trustworthy.
• Veracity also concerns being able to handle and
manage data efficiently.
Benefits of Big Data:
• Benefits of Big Data in IT Sectors
• Benefits of Big Data in Business
• Benefits of Big Data in Enterprise
• Benefits of Big Data in Other Areas
Benefits of Big Data in IT Sectors:
• Many older IT companies depend on big data to modernize
their outdated mainframes by identifying the root
causes of failures and issues in real time.
• Many organizations are replacing their traditional systems with
open-source platforms like Hadoop. Hadoop is an open-source
framework used to efficiently store and process large
datasets ranging in size from gigabytes to petabytes. (The name is
not an acronym.)
• With the help of big data technologies, IT companies can quickly
process third-party data that is often hard to understand
at first, because these platforms are powerful and inherently
parallel.
Benefits of Big Data in Business:
• Data quality has a direct impact on business process efficiency.
In the purchase-to-pay process, poor-quality vendor data can cause
missing purchase contracts or pricing information, which can
lead to delays in procuring vital goods.
• Systematic analysis of data, or data profiling, is used to assess
the overall health of the data. This supports business
decisions that fit the present situation. Inaccurate data, by contrast,
leads to incorrect management, because
business decisions end up being based on incorrect
information.
Benefits of Big Data in Enterprise:
• Big data might allow a company to collect billions or even
trillions of real-time data points.
• The speed at which data is updated using big data
technologies allows enterprises to more quickly and
accurately respond to customer demands.
• Big data can help enterprises to act more nimbly allowing
them to adapt to changes faster than their competitors.
Benefits of Big Data in Other Areas:
• Big Data technologies are used to predict 'buy' and 'sell' decisions
on the shares of different companies for customers.
• Hospitals are analyzing medical data and records to predict which
patients are likely to seek re-admission within a few months of
discharge. The hospital can then avoid costly stays for those
patients.
• Search engines retrieve large amounts of data from different databases in a
fraction of a second using big data technologies.
For example, Google has used the MapReduce programming model to
build the index that answers a given query.
• Financial services organizations are using big data to mine
customer interactions and slice and dice their users into finely
tuned segments. This helps in creating increasingly relevant and
sophisticated offers.
• Insurance companies are using Big Data analysis to see which home
insurance applications can be processed immediately, and which ones
need a validating in-person visit from an agent.
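The MapReduce model mentioned in the search-engine example can be illustrated with a minimal, single-machine sketch. It builds a tiny inverted index (word to documents) in three steps: map, shuffle, and reduce. The function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) and the sample documents are assumptions made for illustration. This is not Google's implementation, and a real framework such as Hadoop would distribute each step across many machines.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (word, doc_id) pair per word in a document."""
    return [(word.lower(), doc_id) for word in text.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, doc_ids):
    """Reduce step: collapse the grouped values into one result per key."""
    return word, sorted(set(doc_ids))


def build_index(documents):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_phase(doc_id, text))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, ids) for word, ids in grouped.items())


# Hypothetical sample documents.
documents = {"d1": "big data tools", "d2": "Big clusters"}
index = build_index(documents)
print(index["big"])  # documents that contain the word "big"
```

Because each map call only looks at one document and each reduce call only looks at one key, both steps can run in parallel, which is the property that makes the model suitable for very large datasets.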
Big Data Management
• A corporation can use big data management to
analyze a lot of corporate data to understand its
customers better, create new products, and make
crucial financial decisions.
1. Big data management is the organization,
administration and control of large quantities of
structured and unstructured data.
2. Big data management aims to ensure a high level of
data quality and accessibility for applications in
business intelligence and Big Data Analytics.
3. The benefits offered by Big Data can be categorized
into three domains:
• technology
• financial
• competitive advantage
Organizing Big Data:
• Good organization depends upon the purpose of the
organization.
• Given huge quantities of data, it would be desirable to
organize the data to speed up the search process for
finding a specific desired thing in the entire data.
• The cost of storing and processing the data, too, would
be a major driver for the choice of an organizing pattern.
• Given fast and variable speed of data, it would be
desirable to create a scalable number of ingest points.
• Given wide variety in form factors, data need to be
stored and analyzed differently.
• Videos need to be stored separately and used for
serving in a streaming mode.
• Given different quality of data, various data sources
may need to be ranked and prioritized before serving
them to the audience.
• For example, the quality of a webpage and its data
may be evaluated using its PageRank value.
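As a rough illustration of ranking sources by PageRank, the sketch below runs a simplified power iteration over a tiny link graph. The graph, the damping factor of 0.85, and the iteration count are assumptions chosen for the example, and every linked page is assumed to appear as a key in `links`. Production systems use far more elaborate variants.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page site.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
ranks = pagerank(links)
ordered = sorted(ranks, key=ranks.get, reverse=True)
print(ordered)  # pages from highest to lowest rank
```

Pages that receive links from many well-ranked pages end up with higher values, so the ordering can be used to prioritize sources before serving them.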
Analyzing Big Data:
• Big Data can be utilized to visualize a flowing or a
static situation. These are called analyzing Big Data
in motion and Big Data at rest, respectively.

Big Data can be analyzed in two ways:
• The first way is to process the incoming stream of data
in real time for quick and effective statistics about
the data.
• The second way is to store and structure batches of
data and apply standard analytical techniques to them
to generate insights.
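The two approaches can be contrasted with a small sketch. The streaming side keeps running statistics that are updated one value at a time (using Welford's online method, which is a choice made for this example), while the batch side stores all values first and then applies standard library functions. The class name `RunningStats` and the sample readings are hypothetical.

```python
import statistics


class RunningStats:
    """Streaming statistics: updated per value, without storing the data."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0


# Hypothetical sensor readings.
readings = [12.0, 15.0, 11.0, 14.0, 13.0]

# Data in motion: process each value as it arrives.
stream = RunningStats()
for reading in readings:
    stream.update(reading)

# Data at rest: store the batch, then analyze it as a whole.
batch_mean = statistics.mean(readings)
batch_variance = statistics.pvariance(readings)

print(stream.mean, batch_mean)
print(stream.variance, batch_variance)
```

Both paths arrive at the same figures here. The streaming version trades access to the raw data for constant memory use and immediate results.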
• A million points of data can be plotted in a graph
and offer a view of the density of data.
• However, plotting a million points on the graph
may produce a blurred image which may hide,
rather than highlight the distinctions.
• In such a case, binning the data would help, or
selecting the top few frequent categories may
deliver greater insights.
• Data binning, also called discrete binning or
data bucketing, is a data pre-processing technique
used to reduce the effects of minor observation
errors (noisy data).
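The binning and top-category ideas above can be sketched in a few lines. The helper names (`bin_values`, `top_categories`), the bin width, and the sample data are assumptions made for illustration only.

```python
from collections import Counter


def bin_values(values, bin_width):
    """Count how many values fall into each fixed-width bin.

    Each bin is identified by its lower edge.
    """
    counts = Counter()
    for value in values:
        lower_edge = (value // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


def top_categories(labels, k):
    """Return the k most frequent categories with their counts."""
    return Counter(labels).most_common(k)


# Hypothetical sample data.
ages = [21, 22, 25, 31, 34, 38, 39, 45]
channels = ["web", "web", "mobile", "web", "kiosk", "mobile"]

histogram = bin_values(ages, 10)
top_two = top_categories(channels, 2)
print(histogram)
print(top_two)
```

Plotting the handful of bin counts instead of every raw point keeps the chart readable, and small measurement differences inside a bin no longer affect the picture.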
Technology Challenges for Big Data:
Big data challenges include storing and analyzing
extremely large and fast-growing data.
• Sharing and Accessing Data
• Privacy and Security
• Analytical Challenges
• Technical challenges
• Fault tolerance
• Scalability
Sharing and Accessing Data:
• Perhaps the most frequent challenge in big data efforts is the
inaccessibility of data sets from external sources.
• Sharing data can cause substantial challenges.
• These include the need for inter- and intra-institutional legal
documents.
• Accessing data from public repositories leads to multiple
difficulties.
Privacy and Security:
• This is another major challenge with Big Data. It
has sensitive, conceptual, technical as well as
legal significance.
• Most of the organizations are unable to maintain regular
checks due to large amounts of data generation. However, it
should be necessary to perform security checks and
Analytical Challenges:
• The large amounts of data to be analyzed
can be structured (organized data), semi-structured
(semi-organized data) or unstructured
(unorganized data).
• There are two techniques through which decision making
can be done:
• Either incorporate massive data volumes in the analysis.
• Or determine upfront which Big Data is relevant.
Technical challenges: Quality of data:
• Collecting and storing a large amount of data
comes at a cost. Big companies,
business leaders and IT leaders always want large data
storage.
• For better results and conclusions, Big data rather than
Fault tolerance:
• Fault tolerance is another technical challenge. Fault-tolerant
computing is extremely hard and involves intricate
algorithms.
• Newer technologies like cloud computing
and big data are designed so that whenever a failure occurs,
the damage done stays within an acceptable threshold;
that is, the whole task should not have to begin from scratch.
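One common way to avoid restarting a task from scratch is checkpointing: progress is saved periodically so that a rerun resumes from the last saved point. The sketch below is a minimal, single-process illustration. The function name `process_with_checkpoint`, the `fail_at` parameter used to simulate a crash, and the JSON checkpoint file are all assumptions for this example and do not come from any specific big data framework.

```python
import json
import os
import tempfile


def process_with_checkpoint(items, checkpoint_path, fail_at=None):
    """Sum the items, saving progress after each one.

    If a checkpoint file exists, work resumes from the saved position
    instead of starting over. fail_at simulates a crash at that index.
    """
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            state = json.load(handle)

    for index in range(state["next_index"], len(items)):
        if fail_at is not None and index == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += items[index]
        state["next_index"] = index + 1
        # Note: a real system would write the checkpoint atomically.
        with open(checkpoint_path, "w") as handle:
            json.dump(state, handle)
    return state["total"]


items = [1, 2, 3, 4, 5]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    process_with_checkpoint(items, path, fail_at=3)  # crashes after 3 items
except RuntimeError:
    pass

total = process_with_checkpoint(items, path)  # resumes at the fourth item
print(total)
```

The second call only processes the two remaining items, so the cost of the failure is limited to the work done since the last checkpoint.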
Scalability:
• Big data projects can grow and evolve rapidly.
• This leads to various challenges, such as how to run and execute
various jobs so that the goal of each workload can be achieved
cost-effectively.
• It also requires dealing with system failures in an efficient
manner. This again raises the question of what kinds of
storage devices should be used.
Big Data Sources
A significant part of big data is generated from three
primary sources:
• Machine data
• Social data
• Transactional data

In addition to this, companies also generate data
internally through direct customer engagement. This
data is usually stored behind the company's firewall. It is
then imported into the management and
analytics system.
1. Machine Data
• Machine data is automatically generated, either in
response to a specific event or on a fixed schedule.
• This information comes from
multiple sources such as smart sensors, SIEM logs,
medical devices and wearables, road cameras, IoT
devices, satellites, desktops, mobile phones,
industrial machinery, etc. These sources enable
companies to track consumer behavior.
• Data extracted from machine sources grows
exponentially along with the changing external
environment of the market.
2. Social Data
• Social data is derived from social media platforms through
tweets, retweets, likes, video uploads, and
comments shared on Facebook, Instagram, Twitter,
YouTube, LinkedIn, etc.
• Social media data spreads like wildfire and reaches
an extensive audience base. It yields important
insights into customer behavior and customer
sentiment regarding products and services.
• This is why brands capitalizing on social media
channels can build a strong connection with their
online demographic.
3. Transactional Data
• Transactional data is information gathered via online and offline
transactions at different points of sale.
• The data includes vital details like transaction time, location, products
purchased, product prices, payment methods, discounts/coupons
used, and other relevant quantifiable information related to
transactions.

The sources of transactional data include:
• Payment orders
• Invoices
• Storage records
• E-receipts

Transactional data is a key source of business intelligence. The unique
characteristic of transactional data is its timestamp. Since every transactional
record includes a timestamp, the data is time-sensitive and highly volatile.
Applications of Big Data
• The term Big Data refers to large amounts of
complex and unprocessed data.
• Big data is a spreading technology used in nearly every business
sector.

The following are the major sectors:
• Travel and Tourism
• Financial and banking sector
• Healthcare
• Telecommunication and media
• Government and Military
• E-commerce
Travel and Tourism
• The travel and tourism sector is a major user of Big Data.
• It makes it possible to forecast the need for travel facilities
at multiple locations, improve
business through dynamic pricing, and much more.

Financial and banking sector
• The financial and banking sectors use big data
technology extensively.
• Big data analytics helps banks understand customer behavior
on the basis of investment patterns, shopping
trends, motivation to invest, and inputs that are
obtained from personal or financial backgrounds.
Healthcare
• Big data has started making a massive
difference in the healthcare sector, where
predictive analytics helps medical professionals
and health care personnel.
• It can also enable personalized healthcare for individual
patients.
Telecommunication and media
• The telecommunications and multimedia
sectors are among the main users of Big Data.
• These sectors generate enormous volumes of data every day, and
handling data at that scale requires big data
technologies.
Government and Military
• The government and military also use the technology at
high rates.
• In the military, a fighter plane may need to process petabytes of data.
• Government agencies use Big Data for running many agencies,
managing utilities, dealing with traffic jams, and countering crimes
like hacking and online fraud.
• Aadhaar card: The government has a record of 1.21 billion citizens.

E-commerce
• Big Data helps maintain relationships with customers, which
is essential for the e-commerce industry.
• E-commerce websites use Big Data to market
merchandise to customers, manage transactions, and implement
better strategies and innovative ideas to improve their
business.
• Amazon: Amazon is a tremendous e-commerce website dealing
Social Media
• Social media is the largest data
generator.
• Statistics have shown that more than 500 terabytes of
fresh data are generated from social media daily, particularly
on Facebook.
• The data mainly contains videos, photos, message
exchanges, etc.
• Because the stored data runs to terabytes (TB), it takes a lot of time to
process. Big Data technologies are a solution to this problem.
