
Big Data Analytics Seminar Report 2020-21

ABSTRACT

Big data is a new driver of economic and societal change around the world. The world’s
data collection is reaching a tipping point for major technological changes that can
bring new ways of making decisions and of managing our health, cities, finance and
education. While the complexity of data is increasing in volume, variety, velocity
and veracity, the real impact hinges on our ability to uncover the
‘value’ in the data through Big Data Analytics technologies. Big Data Analytics
poses a grand challenge for the design of highly scalable algorithms and systems that
integrate the data and uncover large hidden values from datasets that are diverse,
complex, and of a massive scale. Potential breakthroughs include new algorithms,
methodologies, systems and applications in Big Data Analytics that discover useful
and hidden knowledge from Big Data efficiently and effectively.
Big data analytics must also be a team effort cutting across academic institutions,
government, industry and society, and involving researchers from multiple disciplines
including computer science and engineering, health, data science, and social and
policy areas.

Dept. of Computer Engineering GPC Kasaragod



CONTENTS
• INTRODUCTION
• What is Data, Big Data, Big Data Analytics
• Benefits of using Big Data Analytics
• History and Evolution of Big Data Analytics
• Why is Big Data Analytics Important
• Types of Big Data
• Characteristics of Big Data
• Applications of Big Data
• Advantages and Disadvantages of Big Data
• Tools used in Big Data Analytics
• The sources of Big Data
• Impact of Big Data on Business
• How it works and key technologies
• Big Data Analytics uses and challenges
• Lifecycle of Big Data Analytics
• Different types of Big Data Analytics
• CONCLUSION
• REFERENCES


INTRODUCTION

Big data analytics is the process of examining large data sets containing a variety of
data types (i.e., big data) to uncover hidden patterns, unknown correlations, market
trends, customer preferences and other useful business information. The analytical
findings can lead to more effective marketing, new revenue opportunities, better
customer service, improved operational efficiency, competitive advantages over
rival organizations and other business benefits.
The primary goal of big data analytics is to help companies make more informed
business decisions by enabling data scientists, predictive modelers and other
analytics professionals to analyze large volumes of transaction data, as well as other
forms of data that may be untapped by conventional business intelligence (BI)
programs. That could include Web server logs and Internet clickstream data, social
media content and social network activity reports, text from customer emails and
survey responses, mobile-phone call detail records and machine data captured by
sensors connected to the Internet of Things.
With the launch of Web 2.0, a large amount of valuable business data started being
generated beyond the organization by consumers and, generally, by web users. This
data can be structured or unstructured, and can come from multiple sources such as
social networks, products viewed in virtual stores, information read by sensors, GPS
signals from mobile devices, IP addresses, cookies, bar codes, etc.


What is Data?
Data is the quantities, characters, or symbols on which operations are performed by a
computer, and which may be stored and transmitted in the form of electrical signals
and recorded on magnetic, optical, or mechanical recording media.

What is Big Data?


Big Data is also data, but of a huge size. The term describes a
collection of data that is huge in volume and yet growing exponentially with time.
In short, such data is so large and complex that none of the traditional data
management tools can store or process it efficiently.

Data also exists in different formats: structured, semi-structured, and
unstructured. For example, data in a regular Excel sheet is structured data
with a definite format. In contrast, emails are semi-structured, and pictures
and videos are unstructured data. All this data combined makes up Big Data.

What is Big Data Analytics?


Big Data analytics is a process used to extract meaningful insights from data, such as
hidden patterns, unknown correlations, market trends, and customer preferences. It
provides various advantages: among other things, it can be used for better decision
making and for preventing fraudulent activities.

Benefits of using big data analytics:

• Uncover the need for new features or products
• Understand the full customer journey
• More effective marketing
• More effective customer support
• Greater responsiveness to market trends


History and evolution of big data analytics


The concept of big data has been around for years; most organizations now
understand that if they capture all the data that streams into their businesses, they
can apply analytics and get significant value from it. But even in the 1950s,
decades before anyone uttered the term “big data,” businesses were using basic
analytics (essentially numbers in a spreadsheet that were manually examined) to
uncover insights and trends.

The new benefits that big data analytics brings to the table, however, are speed and
efficiency. Whereas a few years ago a business would have gathered information,
run analytics and unearthed information that could be used for future decisions,
today that business can identify insights for immediate decisions. The ability to
work faster – and stay agile – gives organizations a competitive edge they didn’t
have before.


Why is big data analytics important?


Big data analytics helps organizations harness their data and use it to identify new
opportunities. That, in turn, leads to smarter business moves, more efficient
operations, higher profits and happier customers. In the report Big Data in Big
Companies, IIA Director of Research Tom Davenport interviewed more than 50
businesses to understand how they used big data. He found they got value in the
following ways:

1. Cost reduction. Big data technologies such as Hadoop and cloud-based analytics
bring significant cost advantages when it comes to storing large amounts of data –
plus they can identify more efficient ways of doing business.
2. Faster, better decision making. With the speed of Hadoop and in-memory
analytics, combined with the ability to analyze new sources of data, businesses are
able to analyze information immediately – and make decisions based on what
they’ve learned.
3. New products and services. With the ability to gauge customer needs and
satisfaction through analytics comes the power to give customers what they want.
Davenport points out that with big data analytics, more companies are creating new
products to meet customers’ needs.


Types of Big Data
Big Data is generally categorized into three varieties:

• Structured Data
• Semi-Structured Data
• Unstructured Data

• Structured Data has a dedicated data model and a well-defined structure. It follows
a consistent order and is designed so that it can be easily accessed and used by a
person or a computer. Structured data is usually stored in well-defined rows and
columns, typically in databases.

Example: Database Management Systems (DBMS)

• Semi-Structured Data can be considered another form of Structured Data. It
inherits a few properties of Structured Data, but most of it lacks a definite
structure and does not obey the formal structure of data models such as an RDBMS.

Example: Comma Separated Values (CSV) files, JSON and XML documents.

• Unstructured Data is a completely different type: it neither has a structure nor
follows the formal structural rules of data models. It does not even have a
consistent format and varies all the time. Occasionally it may carry information
such as date and time.

Example: Audio Files, Images etc.
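
The three varieties can be illustrated with a minimal Python sketch. The table name, field names and sample values are invented for illustration, and JSON is used here as an assumed stand-in for semi-structured data; this is only one way to picture the distinction.

```python
import json
import sqlite3

# Structured: a fixed schema enforced by a DBMS (an in-memory SQLite database here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Asha", "Kochi"), (2, "Ravi", "Kannur")],
)
# Every row shares the same columns, which the database can report.
structured_columns = [col[1] for col in conn.execute("PRAGMA table_info(customer)")]

# Semi-structured: records describe themselves, but their fields may differ.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "tags": ["new"]}, {"id": 2, "email": "ravi@example.com"}]'
)
field_sets = [sorted(record) for record in semi_structured]

# Unstructured: raw bytes with no data model (these are the leading bytes of a JPEG file).
unstructured = b"\xff\xd8\xff\xe0"

print(structured_columns)
print(field_sets)
print(len(unstructured), "bytes with no schema")
```

The two JSON records do not share the same set of fields, which is exactly what a relational table would not allow without a schema change.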


Characteristics of Big Data

Volume

Volume refers to the enormous amounts of information generated every second
from social media, cell phones, cars, credit cards, M2M sensors, images, video, and
more. Such data is now stored across several locations in distributed systems
and brought together by a software framework such as Hadoop.

Facebook alone generates billions of messages, records about 4.5 billion uses of the
“like” button, and receives over 350 million new posts each day. Such a
huge amount of data can only be handled by Big Data technologies.

Variety

As discussed before, Big Data is generated in multiple varieties. Compared to
traditional data such as phone numbers and addresses, the latest data takes the
form of photos, videos, audio and much more, so that about 80% of the data
is completely unstructured. Structured data is just the tip of the iceberg.


Veracity

Veracity means the degree of reliability that the data has to offer. Since a
major part of the data is unstructured and much of it is irrelevant, Big Data systems
need a way to filter out or translate such data, because reliable data is crucial to
business development.

Value

Value is the major issue that we need to concentrate on. What matters is not just the
amount of data that we store or process, but the amount of valuable, reliable and
trustworthy data that is stored, processed and analyzed to find insights.

Velocity

Last but not least, Velocity plays a major role: there is no
point in investing so much only to end up waiting for the data. So a major aim of
Big Data is to provide data on demand and at a faster pace.


Applications of Big Data


Big Data is considered the most valuable and powerful fuel that can run the massive
IT industries of the 21st century. It is among the most widespread technologies
and is used in almost every business sector. A few examples are
given below.

Travel and Tourism is one of the biggest users of Big Data
technology. It makes it possible to predict the demand for travel
facilities in many places and to improve business through dynamic pricing
and more.

The Financial and Banking sectors use Big Data technology
extensively. Big data analytics can help banks understand
customer behavior based on their investment
patterns, shopping trends, motivation to invest and personal or financial
backgrounds.

Big Data has already started to make a huge difference in
the healthcare sector. With the help of predictive analytics, medical
professionals and healthcare personnel are now able to provide
personalized healthcare services to individual patients.

The Telecommunication and Multimedia sector is one of the primary users
of Big Data. Zettabytes of data are being generated, and
handling such huge volumes requires Big Data
technologies.

Government and the Military also use Big Data technology at a high
rate. Consider the amount of data a government generates in its
records; in the military, a fighter jet may need to process
petabytes of data during a flight.


Benefits or advantages of Big Data


Following are the benefits or advantages of Big Data:
➨Big data analysis leads to innovative solutions. It helps in
understanding and targeting customers and in optimizing business processes.
➨It helps in improving science and research.
➨It improves healthcare and public health through the availability of patient records.
➨It helps in financial trading, sports, polling, security/law enforcement etc.
➨Anyone can access vast amounts of information, for example via surveys, and find
an answer to almost any query.
➨New data is added every second.
➨One platform can carry a vast amount of information.

Drawbacks or disadvantages of Big Data


Following are the drawbacks or disadvantages of Big Data:
➨Storing big data on traditional storage can cost a lot of money.
➨Much of big data is unstructured.
➨Big data analysis can violate principles of privacy.
➨It can be used to manipulate customer records.
➨It may increase social stratification.
➨Big data analysis is not useful in the short run. Data needs to be analyzed over a
longer duration to leverage its benefits.
➨Big data analysis results are sometimes misleading.
➨Rapid updates in big data can cause mismatches with the real figures.


Tools Used in Big Data Analytics

Here are some of the tools used in Big Data analytics:

• Hadoop - helps in storing and analyzing data
• MongoDB - used on datasets that change frequently
• Talend - used for data integration and management
• Cassandra - a distributed database used to handle large amounts of data
• Spark - used for real-time processing and analyzing large amounts of data
• Storm - an open-source real-time computational system
• Kafka - a distributed streaming platform that is used for fault-tolerant storage

The Sources of Big Data


The bulk of big data generated comes from three primary sources: social data,
machine data and transactional data. In addition, companies need to distinguish
between data that is generated internally, that is to say data that resides behind
a company’s firewall, and data that is generated externally and needs to be imported
into a system.

Whether data is unstructured or structured is also an important factor. Unstructured
data does not have a pre-defined data model and therefore requires more resources
to make sense of it.

The three primary sources of Big Data

Social data comes from the Likes, Tweets & Retweets, Comments, Video Uploads,
and general media that are uploaded and shared via the world’s favorite social media
platforms. This kind of data provides invaluable insights into consumer behavior and
sentiment and can be enormously influential in marketing analytics. The public web
is another good source of social data, and tools like Google Trends can be used to
good effect to increase the volume of big data.


Machine data is defined as information which is generated by industrial equipment,
sensors that are installed in machinery, and even web logs which track user behavior.
This type of data is expected to grow exponentially as the Internet of Things grows
ever more pervasive and expands around the world. Sources such as medical devices,
smart meters, road cameras, satellites, games and the rapidly growing Internet of
Things will deliver data of high velocity, value, volume and variety in the very near
future.

Transactional data is generated from all the daily transactions that take place both
online and offline. Invoices, payment orders, storage records and delivery receipts
are all characterized as transactional data. Yet data alone is almost meaningless, and
most organizations struggle to make sense of the data that they are generating and
of how it can be put to good use.

Impact of Big Data on Business


With the help of big data, companies aim at offering improved customer services,
which can help increase profit. Enhanced customer experience is the primary goal
of most companies. Other goals include better target marketing, cost reduction, and
improved efficiency of existing processes.

Big data technologies help companies store large volumes of data while enabling
significant cost benefits. Such technologies include cloud-based analytics and
Hadoop. They help businesses analyze information and improve decision-making.
Furthermore, data breaches create a need for enhanced security, which these
technologies can help address.

Big data has the potential to bring social and economic benefits to businesses.
Therefore, several government agencies have formulated policies for promoting the
development of big data.

Over the years, big data analytics has evolved with the adoption of agile technologies
and an increased focus on advanced analytics. There is no single technology that
encompasses big data analytics. Several technologies work together to help
companies obtain the most value from the information. Among them are machine
learning, artificial intelligence, quantum computing, Hadoop, in-memory analytics,
and predictive analytics. These technology trends are likely to spur the demand for
big data analytics in the coming years.

Earlier, big data was mainly deployed by businesses that could afford the
technologies and channels used to gather and analyze data. Nowadays, both large
and small business enterprises are increasingly relying on big data for intelligent
business insights. Thereby, they boost the demand for big data.

Enterprises from all industries are considering how big data can be used in
business. Its uses are poised to improve productivity, identify customer needs, offer
a competitive advantage, and create scope for sustainable economic development.

How it works and key technologies


There’s no single technology that encompasses big data analytics. Of course, there’s
advanced analytics that can be applied to big data, but in reality several types of
technology work together to help you get the most value from your information.
Here are the biggest players:

Machine Learning. Machine learning, a specific subset of AI that trains a machine
how to learn, makes it possible to quickly and automatically produce models that
can analyze bigger, more complex data and deliver faster, more accurate results –
even on a very large scale. And by building precise models, an organization has a
better chance of identifying profitable opportunities – or avoiding unknown risks.

Data management. Data needs to be high quality and well-governed before it can
be reliably analyzed. With data constantly flowing in and out of an organization, it's
important to establish repeatable processes to build and maintain standards for data
quality. Once data is reliable, organizations should establish a master data
management program that gets the entire enterprise on the same page.


Data mining. Data mining technology helps you examine large amounts of data to
discover patterns in the data – and this information can be used for further analysis
to help answer complex business questions. With data mining software, you can sift
through all the chaotic and repetitive noise in data, pinpoint what's relevant, use that
information to assess likely outcomes, and then accelerate the pace of making
informed decisions.
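
As one small illustration of discovering patterns in data, the sketch below counts which items are frequently bought together. The shopping baskets and the `min_support` threshold are hypothetical, and counting co-occurring pairs is only one simple data mining technique among many.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # sorted() gives every pair one canonical order, so (a, b) and (b, a) merge.
        counts.update(combinations(sorted(items), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(baskets, min_support=2)
for pair, n in sorted(pairs.items(), key=lambda item: -item[1]):
    print(pair, n)
```

Pairs that appear only once are treated as noise and dropped, leaving the combinations that recur across baskets.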

Hadoop. This open source software framework can store large amounts of data and
run applications on clusters of commodity hardware. It has become a key technology
for doing business because data volumes and varieties are constantly increasing and
its distributed computing model processes big data fast. An additional benefit is that
Hadoop's open source framework is free and uses commodity hardware to store large
quantities of data.
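
Hadoop's processing layer is commonly associated with the MapReduce programming model. The sketch below imitates its map, shuffle and reduce phases inside a single Python process. It does not use any Hadoop API, and the sales records and function names are made up for illustration; on a real cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict

# Hypothetical (region, amount) sales records, invented for illustration.
records = [("north", 120.0), ("south", 80.0), ("north", 30.0), ("east", 50.0), ("south", 20.0)]

def map_phase(record):
    """Emit one (key, value) pair per input record."""
    region, amount = record
    yield region, amount

def shuffle(pairs):
    """Group all values that share a key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine the grouped values of one key into a single result."""
    return key, sum(values)

mapped = [pair for record in records for pair in map_phase(record)]
totals = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(totals)
```

Because each key is reduced independently of the others, the work can be spread over many nodes, which is the idea behind processing big data on commodity hardware.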

In-memory analytics. By analyzing data from system memory (instead of from
your hard disk drive), you can derive immediate insights from your data and act on
them quickly. This technology is able to remove data prep and analytical processing
latencies to test new scenarios and create models; it's not only an easy way for
organizations to stay agile and make better business decisions, it also enables them
to run iterative and interactive analytics scenarios.

Predictive analytics. Predictive analytics technology uses data, statistical
algorithms and machine-learning techniques to identify the likelihood of future
outcomes based on historical data. It's all about providing a best assessment on what
will happen in the future, so organizations can feel more confident that they're
making the best possible business decision. Some of the most common applications
of predictive analytics include fraud detection, risk, operations and marketing.
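
A minimal sketch of prediction from historical data is a straight-line trend fitted by ordinary least squares. The monthly sales figures are hypothetical, and a linear trend is an assumption made only to keep the example short; real predictive models are usually far richer.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical historical data: sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predicted sales for month 6
print(round(forecast, 2))
```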

Text mining. With text mining technology, you can analyze text data from the web,
comment fields, books and other text-based sources to uncover insights you hadn't
noticed before. Text mining uses machine learning or natural language
processing technology to comb through documents – emails, blogs, Twitter feeds,
surveys, competitive intelligence and more – to help you analyze large amounts of
information and discover new topics and term relationships.
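
A very small text mining sketch is to count the most frequent meaningful terms across a set of documents. The customer comments and the stop-word list below are invented for illustration; real text mining would normally rely on a natural language processing library rather than a hand-written word filter.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list; an assumption made for this example only.
STOPWORDS = {"the", "a", "is", "and", "was", "to", "but", "very"}

def top_terms(documents, k):
    """Return the k most frequent non-stop-word terms across all documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts.most_common(k)

# Hypothetical customer comments.
feedback = [
    "The delivery was late and the delivery box was damaged",
    "Great product but delivery is slow",
    "Product quality is great",
]

print(top_terms(feedback, 3))
```

Here the term that dominates the comments points to the topic customers mention most often.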


Big data analytics uses and challenges

Big data analytics applications often include data from both internal systems and
external sources, such as weather data or demographic data on consumers compiled
by third-party information services providers. In addition, streaming analytics
applications are becoming common in big data environments as users look to
perform real-time analytics on data fed into Hadoop systems through stream
processing engines, such as Spark, Flink and Storm.
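
The idea behind streaming analytics, computing results as each record arrives instead of after the whole data set is stored, can be sketched with a sliding-window average. This is plain Python and not the API of Spark, Flink or Storm; the class name and the sample readings are hypothetical.

```python
from collections import deque

class SlidingAverage:
    """Keep the average of the most recent `size` values seen on a stream."""

    def __init__(self, size):
        # A deque with maxlen silently discards the oldest value when full.
        self.window = deque(maxlen=size)

    def update(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)

# Hypothetical sensor readings arriving one at a time.
stream = [10, 20, 30, 40, 50]
average = SlidingAverage(size=3)
averages = [average.update(reading) for reading in stream]
print(averages)
```

Only the last three readings are ever held in memory, so the same logic works on a stream that never ends.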

Early big data systems were mostly deployed on premises, particularly in large
organizations that collected, organized and analyzed massive amounts of data. But
cloud platform vendors, such as Amazon Web Services (AWS) and Microsoft, have
made it easier to set up and manage Hadoop clusters in the cloud. The same goes for
Hadoop suppliers such as Cloudera-Hortonworks, which supports the distribution of
the big data framework on the AWS and Microsoft Azure clouds. Users can now
spin up clusters in the cloud, run them for as long as they need and then take them
offline with usage-based pricing that doesn't require ongoing software licenses.

Big data has become increasingly beneficial in supply chain analytics. Big supply
chain analytics utilizes big data and quantitative methods to enhance decision
making processes across the supply chain. Specifically, big supply chain analytics
expands datasets for increased analysis that goes beyond the traditional internal data
found on enterprise resource planning (ERP) and supply chain management (SCM)
systems. Also, big supply chain analytics implements highly effective statistical
methods on new and existing data sources. The insights gathered facilitate better
informed and more effective decisions that benefit and improve the supply chain.

Potential pitfalls of big data analytics initiatives include a lack of internal analytics
skills and the high cost of hiring experienced data scientists and data engineers to
fill the gaps.


Lifecycle of Big Data Analytics

Now, let’s review the lifecycle of Big Data analytics:

Stage 1 - Business case evaluation - The Big Data analytics lifecycle begins with a
business case, which defines the reason and goal behind the analysis.

Stage 2 - Identification of data - Here, a broad variety of data sources are identified.

Stage 3 - Data filtering - All of the identified data from the previous stage is filtered
here to remove corrupt data.

Stage 4 - Data extraction - Data that is not compatible with the tool is extracted and
then transformed into a compatible form.

Stage 5 - Data aggregation - In this stage, data with the same fields across different
datasets is integrated.

Stage 6 - Data analysis - Data is evaluated using analytical and statistical tools to
discover useful information.

Stage 7 - Visualization of data - With tools like Tableau, Power BI, and QlikView,
Big Data analysts can produce graphic visualizations of the analysis.

Stage 8 - Final analysis result - This is the last step of the Big Data analytics lifecycle,
where the final results of the analysis are made available to business stakeholders
who will take action.
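
Stages 3 to 6 of the lifecycle above can be sketched as a small pipeline. The order records, the customer lookup table and all function names are hypothetical, and each stage is reduced to a few lines purely to show how the output of one stage feeds the next.

```python
# Hypothetical raw inputs from two sources, invented for illustration.
orders = [
    {"customer_id": "1", "amount": "250.0"},
    {"customer_id": "2", "amount": "n/a"},    # corrupt amount
    {"customer_id": "1", "amount": "100.0"},
    {"customer_id": None, "amount": "75.0"},  # missing customer
]
customers = {1: "Asha", 2: "Ravi"}

def is_clean(row):
    """Stage 3, data filtering: reject rows with missing or unparseable values."""
    if row["customer_id"] is None:
        return False
    try:
        float(row["amount"])
    except ValueError:
        return False
    return True

def extract(row):
    """Stage 4, data extraction: convert text fields into typed values."""
    return {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}

def aggregate(rows, names):
    """Stage 5, data aggregation: join datasets on the shared customer_id field."""
    return [{**row, "name": names[row["customer_id"]]} for row in rows]

def analyze(rows):
    """Stage 6, data analysis: total spend per customer."""
    totals = {}
    for row in rows:
        totals[row["name"]] = totals.get(row["name"], 0.0) + row["amount"]
    return totals

clean = [extract(row) for row in orders if is_clean(row)]
result = analyze(aggregate(clean, customers))
print(result)
```

The two corrupt rows are dropped in Stage 3, so only the valid orders reach the analysis in Stage 6.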


Different Types of Big Data Analytics

There are four types of Big Data analytics:

• Descriptive Analytics

This summarizes past data into a form that people can easily read. This helps in
creating reports, like a company’s revenue, profit, sales, and so on. Also, it helps
in the tabulation of social media metrics.

Use Case: The Dow Chemical Company analyzed its past data to increase
facility utilization across its office and lab space. Using descriptive analytics,
Dow was able to identify underutilized space. This space consolidation helped
the company save nearly US $4 million annually.
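
In the spirit of this use case, descriptive analytics can be sketched as summarizing past measurements and flagging low values. The room names, occupancy samples and the 0.4 threshold are all hypothetical and are not taken from Dow's data.

```python
from statistics import mean

# Hypothetical occupancy samples (fraction of seats in use) for each space.
occupancy = {
    "Lab A": [0.8, 0.9, 0.7],
    "Lab B": [0.2, 0.1, 0.3],
    "Office 1": [0.5, 0.6, 0.4],
}

def summarize(samples_by_room):
    """Condense the raw samples into one readable average per room."""
    return {room: round(mean(samples), 2) for room, samples in samples_by_room.items()}

def underutilized(summary, threshold):
    """List the rooms whose average occupancy falls below the threshold."""
    return sorted(room for room, average in summary.items() if average < threshold)

report = summarize(occupancy)
flagged = underutilized(report, threshold=0.4)
print(report)
print(flagged)
```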

• Diagnostic Analytics

This is done to understand what caused a problem in the first place. Techniques
like drill-down, data mining, and data recovery are all examples. Organizations
use diagnostic analytics because it provides in-depth insight into a particular
problem.

Use Case: An ecommerce company’s report shows that its sales have gone
down, although customers are adding products to their carts. This can be due to
various reasons: the form did not load correctly, the shipping fee is too high, or
there are not enough payment options available. This is where you can use
diagnostic analytics to find the reason.
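
A drill-down of this kind can be sketched by comparing how many customers survive each step of the checkout. The funnel step names and counts are invented for illustration; the step with the lowest conversion rate is where the investigation would start.

```python
# Hypothetical number of customers reaching each checkout step.
funnel = [("cart", 1000), ("shipping_form", 900), ("payment", 300), ("confirmation", 280)]

def step_conversion(steps):
    """Return the conversion rate between each pair of consecutive steps."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(steps, steps[1:]):
        rates.append((f"{prev_name} -> {name}", count / prev_count))
    return rates

rates = step_conversion(funnel)
worst_step, worst_rate = min(rates, key=lambda item: item[1])
print(worst_step, round(worst_rate, 2))
```

In this made-up data most customers drop out between the shipping form and payment, which would direct attention to shipping fees or payment options.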

• Predictive Analytics

This type of analytics looks at historical and present data to make
predictions about the future. Predictive analytics uses data mining, AI, and
machine learning to analyze current data and forecast what is likely to happen.
It is used to predict customer trends, market trends, and so on.


Use Case: PayPal determines what kind of precautions they have to take to protect
their clients against fraudulent transactions. Using predictive analytics, the
company uses all the historical payment data and user behavior data and builds
an algorithm that predicts fraudulent activities.

• Prescriptive Analytics

This type of analytics prescribes the solution to a particular problem.
Prescriptive analytics works with both descriptive and predictive analytics. Most
of the time, it relies on AI and machine learning.

Use Case: Prescriptive analytics can be used to maximize an airline’s profit.
This type of analytics is used to build an algorithm that will automatically adjust
the flight fares based on numerous factors, including customer demand, weather,
destination, holiday seasons, and oil prices.
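
A toy version of such a fare-adjustment algorithm is sketched below. The function name, the thresholds and the multipliers are arbitrary assumptions chosen for illustration; a real airline system would derive its rules from optimization or machine learning rather than fixed percentages.

```python
def recommend_fare(base_fare, seats_sold, seats_total, is_holiday, fuel_index):
    """Prescribe a fare from demand, season and fuel cost (illustrative rules only)."""
    load_factor = seats_sold / seats_total
    fare = base_fare
    if load_factor > 0.8:
        fare *= 1.20  # high demand: raise the fare
    elif load_factor < 0.4:
        fare *= 0.90  # low demand: discount to fill seats
    if is_holiday:
        fare *= 1.10  # holiday season surcharge
    fare *= fuel_index  # 1.0 means fuel costs are unchanged
    return round(fare, 2)

busy_holiday = recommend_fare(100.0, 90, 100, is_holiday=True, fuel_index=1.0)
quiet_weekday = recommend_fare(100.0, 30, 100, is_holiday=False, fuel_index=1.0)
print(busy_holiday, quiet_weekday)
```

The output is a recommended action (a price to charge) rather than a description or a forecast, which is what distinguishes prescriptive analytics from the other three types.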


CONCLUSION

Big Data Analytics is a security-enhancing tool of the future. Gathering, organizing,
and applying this amount of information to users in a personalized fashion would
take a human days, weeks, or even months. In a capitalistic market such as that of
the United States of America, competition is key. Time cannot be wasted gathering
information and making decisions on incidents that have already taken place.
Stopping incidents in their tracks, completing investigative work, and quarantining
threatening sources need to happen immediately so that administrators and
management can make on-the-spot decisions. With big data analytics, more educated
decisions can be made and focus can remain on business operations moving forward.

The availability of Big Data, low-cost commodity hardware, and new information
management and analytic software have produced a unique moment in the history
of data analysis. The convergence of these trends means that we have the capabilities
required to analyze astonishing data sets quickly and cost-effectively for the first
time in history. These capabilities are neither theoretical nor trivial. They represent
a genuine leap forward and a clear opportunity to realize enormous gains in terms of
efficiency, productivity, revenue, and profitability. The Age of Big Data is here, and
these are truly revolutionary times if both business and technology professionals
continue to work together and deliver on the promise.


REFERENCES

• www.123seminarsonly.com
• www.wikipedia.com
• www.edureka.co
