Unit – 2: Big data Analytics
What is and isn’t big data analytics? Why hype around big data analytics? Classification of analytics, top
challenges facing big data, importance of big data analytics, technologies needed to meet challenges of big
data.
…………………………………………………………………………………………………………………………….
Big data analytics is time-sensitive: it is used to make faster decisions from huge amounts of diverse data and
to find deeper and richer insights about the business. It is one kind of technology-enabled analytics.
2. CLASSIFICATION OF ANALYTICS
Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas
rich with recorded information, analytics relies on the simultaneous application of statistics, computer
programming, and operations research to quantify performance. Analytics often favors data visualization to
communicate insight.
Firms commonly apply analytics to business data to describe, predict, and improve business
performance. Areas within analytics include predictive analytics, enterprise decision management, etc. Since
analytics can require extensive computation (because of big data), the algorithms and software used for
analytics harness the most current methods in computer science.
In a nutshell, analytics is the scientific process of transforming data into insight for making better decisions.
The goal of Data Analytics is to get actionable insights resulting in smarter decisions and better business
outcomes.
It is critical to design and build a data warehouse or Business Intelligence (BI) architecture that provides a
flexible, multi-faceted analytical ecosystem, optimized for efficient ingestion and analysis of large and diverse
data sets.
2 © www.anuupdates.org Prepared by D.Venkata Reddy M.Tech(Ph.D), UGC NET, AP SET Qualified
www.anuupdates.org
1. Predictive (forecasting)
2. Descriptive analytics
3. Prescriptive analytics
4. Diagnostic analytics
Predictive Analytics: Predictive analytics turns data into valuable, actionable information. It uses data to
determine the probable outcome of an event or the likelihood of a situation occurring.
Predictive analytics draws on a variety of statistical techniques from modeling, machine learning, data mining,
and game theory that analyze current and historical facts to make predictions about future
events. Techniques used for predictive analytics include:
• Linear Regression
• Data Mining
• Predictive modeling
• Transaction profiling
Descriptive Analytics: Descriptive analytics looks at data and analyzes past events for insight into how to
approach future events. It examines past performance by mining historical data to understand the causes
of success or failure in the past. Almost all management reporting, such
as sales, marketing, operations, and finance, uses this type of analysis.
The descriptive model quantifies relationships in data in a way that is often used to classify customers or
prospects into groups. Unlike a predictive model, which focuses on predicting the behavior of a single
customer, descriptive analytics identifies many different relationships between customer and product.
Common examples of descriptive analytics are company reports that provide historical reviews, such as:
• Data Queries
• Reports
• Descriptive Statistics
• Data dashboard
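The descriptive statistics item in the list above can be sketched as a small summary report over past records. The sales records, region names and the function name summarize are hypothetical and used only for illustration.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical past sales records as (region, amount) pairs.
sales = [
    ("North", 120.0),
    ("South", 80.0),
    ("North", 100.0),
    ("South", 90.0),
    ("South", 130.0),
]


def summarize(records):
    """Group amounts by region and compute simple descriptive statistics."""
    by_region = defaultdict(list)
    for region, amount in records:
        by_region[region].append(amount)
    return {
        region: {
            "count": len(amounts),
            "total": sum(amounts),
            "mean": mean(amounts),
            "median": median(amounts),
        }
        for region, amounts in by_region.items()
    }


report = summarize(sales)
```

The report only describes what has already happened in each group; it makes no prediction about future sales.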
Prescriptive Analytics: Prescriptive analytics automatically synthesizes big data, mathematical science, business
rules, and machine learning to make a prediction and then suggests a decision option to take advantage of
the prediction.
Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions to benefit from the
predictions and showing the decision maker the implications of each decision option. Prescriptive analytics
anticipates not only what will happen and when it will happen but also why it will happen. Further, prescriptive
analytics can suggest decision options on how to take advantage of a future opportunity or mitigate a future
risk, and illustrate the implications of each decision option.
For example, Prescriptive Analytics can benefit healthcare strategic planning by using analytics to leverage
operational and usage data combined with data of external factors such as economic data, population
demography, etc.
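The step from a prediction to a suggested decision option can be sketched as follows. This is a simplified illustration under stated assumptions: the predicted probability of high demand is taken as given, and the option names and payoff numbers are hypothetical.

```python
def recommend(options, p_high_demand):
    """Pick the option with the highest expected payoff.

    options maps an option name to (payoff_if_high, payoff_if_low).
    p_high_demand is the predicted probability of high demand.
    """
    expected = {
        name: p_high_demand * high + (1 - p_high_demand) * low
        for name, (high, low) in options.items()
    }
    best = max(expected, key=expected.get)
    return best, expected


# Hypothetical decision options for a hospital planning its staffing.
options = {
    "add_staff": (50.0, -20.0),
    "keep_staff": (10.0, 5.0),
}

best_when_likely, payoffs_when_likely = recommend(options, 0.7)
best_when_unlikely, payoffs_when_unlikely = recommend(options, 0.2)
```

Returning the expected payoff of every option, not just the best one, reflects the point that the decision maker should see the implication of each option.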
Diagnostic Analytics: In this analysis, we generally rely on historical data rather than other data to answer a
question or solve a problem. We look for dependencies and patterns in the historical data related to the
particular problem.
For example, companies use this analysis because it gives great insight into a problem, and because they
already keep detailed information at their disposal; otherwise data would have to be collected separately for every
problem, which would be very time-consuming. Common techniques used for diagnostic analytics are:
• Data discovery
• Data mining
• Correlations
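The correlations technique in the list above can be sketched with the Pearson correlation coefficient, which measures how strongly two historical series move together. The delivery, complaint and discount figures are hypothetical and chosen only to make the result easy to check.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical weekly history for one product line.
delivery_delay_days = [1, 2, 3, 4, 5]
complaints = [2, 4, 6, 8, 10]
discount_percent = [5, 4, 3, 2, 1]

r_delay_complaints = pearson(delivery_delay_days, complaints)
r_discount_complaints = pearson(discount_percent, complaints)
```

A value near +1 or -1 points to a strong dependency worth investigating; correlation alone does not prove that one factor causes the other.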
1. Lack of proper understanding of Big Data
Companies fail in their Big Data initiatives due to insufficient understanding. Employees may not know what
data is, or how it is stored and processed, why it matters, and where it comes from. Data professionals may know
what is going on, but others may not have a clear picture.
For example, if employees do not understand the importance of data storage, they might not keep
backups of sensitive data. They might not use databases properly for storage. As a result, when this important
data is required, it cannot be retrieved easily.
Solution
Big Data workshops and seminars must be held at companies for everyone. Basic training programs must be
arranged for all employees who handle data regularly and are part of Big Data projects. A
basic understanding of data concepts must be inculcated at all levels of the organization.
2. Data growth issues
One of the most pressing challenges of Big Data is storing all these huge sets of data properly. The amount of
data being stored in the data centers and databases of companies is increasing rapidly. As these data sets grow
exponentially with time, they become extremely difficult to handle.
Most of the data is unstructured and comes from documents, videos, audio, text files and other sources. This
means that it does not sit in conventional databases. This can pose huge Big Data analytics challenges and must be
resolved as soon as possible, or it can delay the growth of the company.
Solution
To handle these large data sets, companies are opting for modern techniques such as compression,
tiering, and deduplication. Compression reduces the number of bits in the data, thus reducing its
overall size. Tiering places data on different storage tiers according to how often it is needed.
Deduplication is the process of removing duplicate and unwanted data from a data set.
Companies are also opting for Big Data tools, such as Hadoop, NoSQL and other technologies.
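Deduplication and compression, as described above, can be sketched with Python's standard library. This is a minimal illustration, assuming the data arrives as blocks of bytes; the sample blocks and the function name deduplicate are hypothetical, and production storage systems work at a much larger scale.

```python
import hashlib
import zlib


def deduplicate(blocks):
    """Keep only the first copy of each distinct block, compared by SHA-256 digest."""
    seen = set()
    unique = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(block)
    return unique


# Hypothetical data blocks; the first two are exact duplicates.
blocks = [
    b"sensor reading 42\n" * 100,
    b"sensor reading 42\n" * 100,
    b"sensor reading 7\n" * 100,
]

unique = deduplicate(blocks)

# Compress each remaining block, then check that nothing is lost on restore.
compressed = [zlib.compress(block) for block in unique]
restored = [zlib.decompress(data) for data in compressed]

original_size = sum(len(block) for block in blocks)
stored_size = sum(len(data) for data in compressed)
```

Deduplication removes whole repeated blocks, while compression shrinks what remains, so the two techniques complement each other.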
3. Confusion while selecting Big Data tools
Companies often get confused while selecting the best tool for Big Data analysis and storage. Is HBase or
Cassandra the best technology for data storage? Is Hadoop MapReduce good enough, or will Spark be a
better option for data analytics and storage?
These questions bother companies, and sometimes they are unable to find the answers. They end up making
poor decisions and selecting inappropriate technology. As a result, money, time, effort and work hours are
wasted.
Solution
The best way to go about it is to seek professional help. You can either hire experienced professionals who
know much more about these tools, or go for Big Data consulting. In the latter case, consultants will
recommend the best tools based on your company’s scenario. Based on their advice, you can
work out a strategy and then select the best tool for you.
4. Lack of data professionals
To run these modern technologies and Big Data tools, companies need skilled data professionals. These
professionals include data scientists, data analysts and data engineers who are experienced in working
with the tools and making sense of huge data sets.
Companies face a shortage of Big Data professionals. This is because data handling tools have evolved
rapidly, but in most cases the professionals have not. Actionable steps need to be taken to bridge
this gap.
Solution
Companies are investing more money in the recruitment of skilled professionals. They also have to offer
training programs to existing staff to get the most out of them.
Another important step taken by organizations is the purchase of data analytics solutions that are powered
by artificial intelligence/machine learning. These tools can be run by professionals who are not data science
experts but have basic knowledge. This step helps companies save a lot of money on recruitment.
5. Securing data
Securing these huge sets of data is one of the daunting challenges of Big Data. Often companies are so busy
understanding, storing and analyzing their data sets that they push data security to later stages. This is
not a smart move, as unprotected data repositories can become breeding grounds for malicious hackers.
Companies can lose up to $3.7 million for a stolen record or a data breach.
Solution
Companies are recruiting more cybersecurity professionals to protect their data. Other steps taken for
securing data include:
• Data encryption
• Data segregation
• Identity and access control
6. Integrating data from a variety of sources
Data in an organization comes from a variety of sources, such as social media pages, ERP applications,
customer logs, financial reports, e-mails, presentations and reports created by employees. Combining all this
data to prepare reports is a challenging task.
This is an area often neglected by firms. However, data integration is crucial for analysis, reporting and business
intelligence, so it has to be done well.
Solution
Companies have to solve their data integration problems by purchasing the right tools. Some of the best data
integration tools are mentioned below:
• ArcESB
• IBM InfoSphere
• Xplenty
• Informatica PowerCenter
• CloverDX
• Microsoft SQL
• QlikView
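The basic idea behind data integration, combining records about the same entity from several sources, can be sketched as follows. This is not how any of the tools listed above works internally; the source names, field names and the shared customer_id key are assumptions made for this example.

```python
def integrate(*sources):
    """Merge records from several sources into one record per customer_id."""
    merged = {}
    for source in sources:
        for record in source:
            # Create the combined record on first sight, then add the new fields.
            merged.setdefault(record["customer_id"], {}).update(record)
    return merged


# Hypothetical records from two separate systems.
crm_records = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ravi"},
]
erp_records = [
    {"customer_id": 1, "total_orders": 3},
]

combined = integrate(crm_records, erp_records)
```

In practice the hard part is that sources rarely share a clean common key or format, which is why dedicated integration tools are used.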
To put Big Data to the best use, companies have to start doing things differently. Addressing these
Big Data challenges as soon as possible is crucial. This means hiring better staff, changing the management, and
reviewing existing business policies and the technologies being used. To enhance decision making, they can
hire a Chief Data Officer, a step that has been taken by many of the Fortune 500 companies.
Big Data challenges are there in every industry and are very common. Here are some of the challenges of
conventional systems in big data and their solutions.
• Predictive Analysis can be used to find trends that were previously classified.
• To create a data transfer and interchange framework to give the patient individualised treatment.
• To create an appropriate technology powered by AI for combining data from several sources.
Solution
Utilising the information gleaned from the patient’s records, the transmission of data and accessibility were
developed to offer the patient individualised treatment. AI can store all medical records in the same place. It
can also increase the rate of accurate diagnosis.
• Text Analysis
The General Health Records (GHR) database, compiled by gathering medical reports, is utilised to develop
the algorithm. These reports are then digitalised so that the analysis can be considered.
Genomic data analysis thoroughly explains the connections among various genetic tags, alterations, and
states. It has the potential to significantly aid in developing many genetic medicines to treat diseases.
• While “points of access and exit” are frequently guarded, your system’s internal security may not be.
Solution –
• Centralised Management
Centralised key management is more efficient than distributed or application-specific key management.
Security keys and audit logs can be accessed from a single point in centralised management systems.
Companies handling sensitive data need reliable key management systems.
• User Access Control
Basic network security tools include user access control. Big data systems can suffer a great deal from
improper access control measures. Role-based settings and policies are the foundation of a robust user
control policy. With policy-driven access control, complex levels of user control, such as multiple
administrator settings, are automatically managed to prevent insider threats.
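Role-based access control, as described above, can be sketched as a lookup from a role to the set of actions it permits. The role names, action names and the function is_allowed are hypothetical and chosen only for illustration.

```python
# Hypothetical policy: each role maps to the set of actions it may perform.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "manage_keys"},
}


def is_allowed(role, action):
    """Return True only if the policy grants this action to this role."""
    # Unknown roles get an empty permission set, so access is denied by default.
    return action in ROLE_PERMISSIONS.get(role, set())


analyst_can_read = is_allowed("analyst", "read")
analyst_can_write = is_allowed("analyst", "write")
guest_can_read = is_allowed("guest", "read")
```

Keeping the policy in one table means a change of permissions is made in a single place, instead of being scattered across applications.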
• Encryption
Several big data encryption tools can handle large volumes of data. This is why
companies encrypt their data, both machine-generated and manually entered.
Big Data analytics is a process used to extract meaningful insights, such as hidden patterns, unknown
correlations, market trends, and customer preferences. Big Data analytics provides various advantages: it can
be used for better decision making and for preventing fraudulent activities, among other things.
Take the music streaming platform Spotify, for example. The company has nearly 96 million users who generate
a tremendous amount of data every day. From this information, the cloud-based platform automatically
generates suggested songs, through a smart recommendation engine, based on likes, shares, search history,
and more. What enables this are the techniques, tools, and frameworks that come out of Big Data analytics.
If you are a Spotify user, you must have come across the top recommendation section, which is based on
your likes, past history, and other things. It relies on a recommendation engine that uses data filtering tools
to collect data and then filter it using algorithms. This is what Spotify does.
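The idea of filtering collected data to produce recommendations can be sketched with a very simple form of collaborative filtering. This is an illustrative assumption, not Spotify's actual method: songs liked by users with overlapping tastes are scored by the size of the overlap. The user names and song names are hypothetical.

```python
from collections import Counter


def recommend(likes, user, top_n=2):
    """Suggest songs liked by users who share liked songs with the given user."""
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared taste, so ignore this user
        for song in theirs - mine:
            scores[song] += overlap  # weight by how similar the users are
    return [song for song, _ in scores.most_common(top_n)]


# Hypothetical record of which songs each user has liked.
likes = {
    "u1": {"song_a", "song_b"},
    "u2": {"song_a", "song_b", "song_c"},
    "u3": {"song_b", "song_d"},
    "u4": {"song_e"},
}

suggestions = recommend(likes, "u1")
```

Here song_c ranks above song_d because the user who liked it shares two songs with u1, and song_e is never suggested because its only fan shares none.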
Organizations can use big data analytics systems and software to make data-driven decisions that can improve
business-related outcomes. The benefits may include more effective marketing, new revenue opportunities,
customer personalization and improved operational efficiency. With an effective strategy, these benefits can
provide competitive advantages over rivals.
• Spark - used for real-time processing and analyzing large amounts of data