
Elective – I: Fundamentals of Big Data

Haripriya V, Asst. Professor, Dept. of CS&IT


Module 1

Introduction to Big Data

How much data? How does it matter?

 Data size matters.
 How does it matter? The units below give a sense of scale.

Unit             Value           Size in bytes
bit (b)          0 or 1          1/8 of a byte
byte (B)         8 bits          1 byte
kilobyte (KB)    1000^1 bytes    1,000 bytes
megabyte (MB)    1000^2 bytes    1,000,000 bytes
gigabyte (GB)    1000^3 bytes    1,000,000,000 bytes
terabyte (TB)    1000^4 bytes    1,000,000,000,000 bytes
petabyte (PB)    1000^5 bytes    1,000,000,000,000,000 bytes
exabyte (EB)     1000^6 bytes    1,000,000,000,000,000,000 bytes
zettabyte (ZB)   1000^7 bytes    1,000,000,000,000,000,000,000 bytes
yottabyte (YB)   1000^8 bytes    1,000,000,000,000,000,000,000,000 bytes
 But where, and in which companies?
 Everywhere. (Figures: companies and domains generating Big Data.)

 Where in real time? (Figures: real-time sources of Big Data.)
 Asia's largest and the world's third-largest data centre is in Bengaluru.


Simple to start
What is the maximum file size you have dealt with so far?
◦ Movies, files, or streaming video that you have used?
◦ What have you observed?
What is the maximum download speed you get?
Simple computation:
◦ How much time would it take just to transfer that file? (A small sketch follows.)
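A minimal back-of-the-envelope sketch of that computation in Python; the file size and link speed below are assumed values, not figures from the slides:

# Rough transfer-time estimate for moving a large file over a home connection.
# Both constants are illustrative assumptions.
FILE_SIZE_GB = 50        # assumed file size, e.g. a large video collection
LINK_SPEED_MBPS = 100    # assumed download speed in megabits per second

file_size_bits = FILE_SIZE_GB * 1000**3 * 8             # GB -> bytes -> bits
seconds = file_size_bits / (LINK_SPEED_MBPS * 1000**2)  # bits / (bits per second)

print(f"{FILE_SIZE_GB} GB at {LINK_SPEED_MBPS} Mbps takes about {seconds / 60:.1f} minutes")
# Scaling up: a 1 TB dataset on the same link would need roughly 22 hours,
# which is why big data is usually processed where it is stored.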



Careers in Analytics

Career Options in Analytics

No matter what your educational background or aspirations, you can have a fulfilling career in one of the many fields of Business Analytics.

Fields in Analytics: MIS Reporting | Non-Predictive Analytics | Predictive Analytics | Machine Learning

Key features
• MIS Reporting: data management, data exploration, MIS and report creation, automation of reports
• Non-Predictive Analytics: segmentation, customer profiling, portfolio analysis, trend analysis, forecasting
• Predictive Analytics: probability models, classification and regression trees, time-series models
• Machine Learning: neural networks, multi-layer perceptron, geospatial models, associative rule learning, inductive logic programming

Target audience
• MIS Reporting and Non-Predictive Analytics: BCom, B.E, BTech, MBA, MCA
• Predictive Analytics: B.E, BTech, MSc (Statistics), MBA
• Machine Learning: B.E, BTech, MSc (Statistics)

Indicative job roles: MIS Analyst, Data Analyst, Data Scientist, Statistician, Advanced Analytics – Team Manager, Strategy Analyst, Market / Global Research Analyst, Cost Analyst, Analytics Manager

Career Path and Indicative Salaries (CTC)

Skills required to succeed in the industry
• Strong analytical and critical thinking skills
• Statistics
• Predictive analytics
• SQL knowledge
• Understanding complex data and tools for analytics
• Soft skills and communication
• Business understanding

Career trajectory and average salaries (CTC)
• 0-2 yrs: Analyst, Rs. 4-6 Lakhs
• 2-4 yrs: Senior Analyst, Rs. 6-8 Lakhs
• 4-6 yrs: Manager, Rs. 9-15 Lakhs
• 6-10 yrs: AVP, Rs. 15-20 Lakhs
• 10-15 yrs: VP
• 15+ yrs: Director, Rs. 15-20 Lakhs

Top Companies Hiring for Business Analytics (figure: company logos)


#What is Big Data?
What is Data?
The quantities, characters, or symbols on which operations are
performed by a computer, which may be stored and transmitted in
the form of electrical signals and recorded on magnetic, optical, or
mechanical recording media.
 Big Data is a term used for a collection of data sets that are large
and complex, which is difficult to store and process using available
database management tools or traditional data processing
applications.
 The challenges include capturing, curating, storing, searching, sharing, transferring, analyzing and visualizing this data.
 It is the capability to manage a huge volume of disparate data, at the
right speed, and within the right time frame to allow real-time
analysis and reaction.
Big Data Everywhere!
Lots of data is being collected
and warehoused
◦ Web data, e-commerce
◦ purchases at department/
grocery stores
◦ Bank/Credit Card
transactions
◦ Social Network



Examples Of Big Data
The New York Stock Exchange generates about one terabyte of
new trade data per day.

Social Media: Statistics show that 500+ terabytes of new data get ingested into the databases of the social media site Facebook every day.
 A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data generation reaches many petabytes.
Walmart handles more than 1 million customer
transactions every hour.
230+ million tweets are created every day.
More than 5 billion people are calling, texting,
tweeting and browsing on mobile phones worldwide.
YouTube users upload 48 hours of new video every
minute of the day.



#Why Big Data Analytics?
Why is Big Data Analytics important?
Big data analytics helps organizations harness
their data and use it to identify new
opportunities. That, in turn, leads to smarter
business moves, more efficient operations,
higher profits and happier customers.
# Importance of Big Data
◦ Reduction in cost
◦ Development of new products
◦ Smart decision-making
 Cost reduction. Big data technologies such as Hadoop and cloud-
based analytics bring significant cost advantages when it comes to
storing large amounts of data – plus they can identify more
efficient ways of doing business.
 Faster, better decision making. With the speed of Hadoop and
in-memory analytics, combined with the ability to analyze new
sources of data, businesses are able to analyze information
immediately – and make decisions based on what they’ve learned.
 New products and services. With the ability to gauge customer
needs and satisfaction through analytics comes the power to give
customers what they want. Davenport points out that with big data
analytics, more companies are creating new products to meet
customers’ needs.
Business-related task:
◦ Determining root causes of failures, issues and
defects in near-real time.
◦ Generating coupons at the point of sale based
on the customer’s buying habits.
◦ Recalculating entire risk portfolios in minutes.
◦ Detecting fraudulent behavior before it affects
your organization.



#Sources of Data Explosion
 There are many sources that predict exponential data growth
toward 2020 and beyond.

 Figure: The Big Data explosion


 The advent of the Internet and the World Wide Web has generated
exponential growth in the global user community—users with
ever-expanding access to computing power and bandwidth.
 The interaction of these users with Internet applications has
resulted in unprecedented levels of data and transaction volumes.
 The shift to online advertising supported by the likes of Google,
Yahoo, and others is a key driver in the data boom we are seeing today.
 The overall expansion of the worldwide economy has spurred massive
data growth for traditional commerce (e.g., increased airline travel,
international purchases, online products, etc.).
 The core social networks (e.g., Facebook, Twitter, LinkedIn, and now
Google+), by their very nature, have generated massive new ways for
people to communicate and interact, resulting in correspondingly large
data sets and transaction volume.
 Many specialized social networks have also arisen—everything from
match-making sites to special interest groups, and even “buy-sell”
applications that have generated their own micro-economies.
 An entirely new breed of social network applications has been
spawned, leveraging the inter-connection of social network users in
fascinating ways, driving exponential growth in application volume,
again with huge transaction volumes and data sizes (sometimes
virtually overnight success stories).



 Web- and advertising-analytics applications abound,
crawling and analyzing virtually every aspect of the user
interaction described above, again resulting in massive data
sets with intense database access needs.
 An entirely new breed of chatter trend analytics applications
have emerged, analyzing things like Twitter tweets,
Facebook chats, and so on, requiring massive levels of data
storage and access.
 Last, the world has gone mobile. In fact, in burgeoning (rapidly growing) economies and established countries alike, smartphones and tablets are by far the most readily available, high-growth, and commonly used communication vehicles for much of the world's population, generating a nearly incompressible stream of data, transactions, application interactions, and messaging volume (with no end in sight).
#Types Of Big Data
 Structured
 Unstructured
 Semi-structured
 Structured
◦ Any data that can be stored, accessed and processed in a fixed format is termed 'structured' data.
◦ It refers to highly organized information that can be readily and seamlessly stored in, and accessed from, a database by simple search-engine algorithms.
◦ For instance, the employee table in a company database is structured: the employee details, their job positions, their salaries, etc., are present in an organized manner.
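A minimal sketch of such a fixed-format table using Python's built-in sqlite3 module; the table and column names are illustrative, not taken from the slides:

import sqlite3

# In-memory relational database: every row follows the same fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        position TEXT NOT NULL,
        salary   REAL
    )
""")
conn.executemany(
    "INSERT INTO employee (emp_id, name, position, salary) VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Analyst", 450000.0), (2, "Ravi", "Manager", 900000.0)],
)

# The fixed structure makes querying straightforward.
for row in conn.execute("SELECT name, position FROM employee WHERE salary > 500000"):
    print(row)   # ('Ravi', 'Manager')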
Eg: Structured data (figure)


Unstructured
◦ Any data with unknown form or structure is classified as unstructured data.
◦ In addition to its huge size, unstructured data poses multiple challenges when it comes to processing it to derive value from it.
◦ A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc.



Eg: Unstructured data (figure)



Semi-structured
◦ Semi-structured data contains both of the formats mentioned above, that is, structured and unstructured data.
◦ To be precise, it refers to data that, although not organized into a particular repository (database), nevertheless contains vital information or tags that separate individual elements within the data.
Examples of Semi-structured Data
◦ Personal data stored in an XML file (a small parsing sketch follows).
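A minimal sketch of reading such semi-structured data in Python with the standard-library xml.etree.ElementTree module; the XML record below is an assumed example:

import xml.etree.ElementTree as ET

# Tags give the data some structure, but there is no rigid table-like schema:
# one record may carry fields that another record omits.
xml_data = """
<people>
    <person><name>Asha</name><age>29</age><city>Bengaluru</city></person>
    <person><name>Ravi</name><age>35</age></person>
</people>
"""

root = ET.fromstring(xml_data)
for person in root.findall("person"):
    name = person.findtext("name")
    city = person.findtext("city", default="unknown")  # missing in the second record
    print(name, city)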



Eg: Semi-structured data (figure)



Relational Data
(Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
◦ Social Network, Semantic Web (RDF), …

Streaming Data
◦ You can only scan the data once
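A minimal sketch of that one-pass constraint in Python, using a generator as a stand-in for a live stream; the sample readings are assumed values:

def sensor_stream():
    """Pretend streaming source: values arrive once and are not stored."""
    for reading in [3, 7, 2, 9, 4]:   # assumed sample readings
        yield reading

# Single pass: keep running aggregates instead of keeping the raw data.
count, total, maximum = 0, 0, float("-inf")
for value in sensor_stream():
    count += 1
    total += value
    maximum = max(maximum, value)

print(f"count={count}, mean={total / count:.2f}, max={maximum}")
# A real stream cannot be rewound, so summaries must be computed on the fly.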
#Different V’s of Big Data



Big data spans three dimensions: Volume, Velocity and Variety.
 Volume: Volume refers to the amount of data, which is growing day by day at a very fast pace.
 The size of data generated by humans, machines and their interactions on social media itself is massive.
 Researchers have predicted that 40 zettabytes (40,000 exabytes) of data will be generated by 2020, an increase of 300 times from 2005.
 Example: Amazon handles 15 million customer clickstream records per day to recommend products.



VELOCITY
 Velocity is defined as the pace at which different
sources generate the data every day.
 This flow of data is massive and continuous. There
are 1.03 billion Daily Active Users (Facebook
DAU) on Mobile as of now, which is an increase of
22% year-over-year.
 This shows how fast the number of users is growing on social media and how fast data is being generated daily.
 Late decisions mean missing opportunities.
Examples
◦ E-promotions: based on your current location, your purchase history and what you like, send promotions right now for the store next to you.
◦ 72 hours of video are uploaded to YouTube every minute.
Variety – The next aspect of Big Data is
its variety.
◦ Variety refers to heterogeneous sources and the
nature of data, both structured and unstructured.
In earlier days, spreadsheets and databases were the only sources of data considered by most applications.
◦ Nowadays, data in the form of emails, photos,
videos, monitoring devices, PDFs, audio, etc. are
also being considered in the analysis applications.
◦ This variety of unstructured data poses certain
issues for storage, mining and analyzing data.



#Characteristics of Big Data
The eight (8) ‘V’ Dimension Characteristics
of Big Data:
Part One: Volume, Velocity, Variety
Part Two: Variability (Unpredictability),
Veracity (Reliability), Validity, Visualization
and Value.



 Variability: Variability in big data's context refers to a few different things. One is the number of inconsistencies in the data (in meaning). These need to be found by anomaly and outlier detection methods in order for any meaningful analytics to occur.
 Big data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources. Variability can also refer to the inconsistent speed at which big data is loaded into your database.
Eg: Say a company was trying to gauge sentiment towards a cafe using these tweets (a small sketch of why this is hard follows them):
“Delicious muesli from the @imaginarycafe- what a great way
to start the day!”
“Greatly disappointed that my local Imaginary Cafe have
stopped stocking BLTs.”
“Had to wait in line for 45 minutes at the Imaginary Cafe today.
Great, well there’s my lunchbreak gone…”
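A minimal sketch in Python of how the same word varies in meaning: a naive keyword scorer (the word lists are entirely illustrative) marks the sarcastic third tweet as positive because it contains the word "Great":

# Naive keyword-based sentiment scoring; word lists are illustrative only.
POSITIVE = {"delicious", "great"}
NEGATIVE = {"disappointed", "stopped"}

tweets = [
    "Delicious muesli from the @imaginarycafe - what a great way to start the day!",
    "Greatly disappointed that my local Imaginary Cafe have stopped stocking BLTs.",
    "Had to wait in line for 45 minutes at the Imaginary Cafe today. "
    "Great, well there's my lunchbreak gone...",
]

for tweet in tweets:
    words = [w.strip(".,!?") for w in tweet.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    print(f"score={score:+d}  {tweet[:45]}...")
# The third tweet scores +1 even though it is a complaint: the word "great"
# flips meaning with context, which is exactly the variability problem.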



Veracity
◦ Big Data veracity refers to the biases, noise and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analyzed?
◦ For example, consider a data set of statistics on what
people purchase at restaurants and these items' prices
over the past five years. You might ask: Who created
the source? What methodology did they follow in
collecting the data? Were only certain cuisines or
certain types of restaurants included? Did the data
creators summarize the information? Has the
information been edited or modified by anyone else?



Noisy data (figure)


Validity
Similar to veracity, validity refers to how
accurate and correct the data is for its intended
use.
According to Forbes, an estimated 60 percent of
a data scientist's time is spent cleansing their
data before being able to do any analysis.
The benefit from big data analytics is only as
good as its underlying data, so you need to
adopt good data governance practices to ensure
consistent data quality, common definitions,
and metadata.
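A minimal sketch of that cleansing step in plain Python; the records and validity rules below are assumed, purely to illustrate filtering out invalid rows and normalising fields before analysis:

# Raw records as they might arrive: mixed formats, missing and invalid values.
raw = [
    {"name": "Asha ", "age": "29", "city": "Bengaluru"},
    {"name": "", "age": "-5", "city": "Mumbai"},       # empty name, negative age
    {"name": "Ravi", "age": "thirty", "city": None},   # non-numeric age
    {"name": "Meena", "age": "41", "city": " delhi "},
]

def clean(record):
    """Return a normalised record, or None if it fails basic validity checks."""
    name = (record.get("name") or "").strip()
    try:
        age = int(record.get("age", ""))
    except (TypeError, ValueError):
        return None
    if not name or age <= 0:
        return None
    city = (record.get("city") or "unknown").strip().title()
    return {"name": name, "age": age, "city": city}

valid = [r for r in (clean(rec) for rec in raw) if r is not None]
print(valid)   # only the first and last records survive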
Volatility
◦ Big data volatility refers to how long data is valid and how long it should be stored.
◦ In this world of real time data you need to
determine at what point is data no longer
relevant to the current analysis.
Visualization : Visualization is critical in
today’s world. Using charts and graphs to
visualize large amounts of complex data is much
more effective in conveying meaning than
spreadsheets and reports chock-full of numbers
and formulas.
 Value : Value is the end game. After addressing
volume, velocity, variety, variability, veracity, and
visualization – which takes a lot of time, effort and
resources – you want to be sure your organization is
getting value from the data.
 Substantial value can be found in big data, including
understanding your customers better, targeting them
accordingly, optimizing processes, and improving
machine or business performance.
 You need to understand the potential, along with the
more challenging characteristics, before embarking
on a big data strategy.
#Need of Big Data
 Banking and Securities : For monitoring financial markets
through network activity monitors and natural language processors
to reduce fraudulent transactions. Exchange Commissions or
Trading Commissions are using big data analytics to ensure that no
illegal trading happens by monitoring the stock market.
 Communications and Media: For real-time reportage of events
around the globe on several platforms (mobile, web and TV),
simultaneously. 
 Sports: To understand the patterns of viewership of different events
in specific regions and also monitor the performance of individual
players and teams by analysis.
 Healthcare: To collect public health data for faster responses to
individual health problems and identify the global spread of new
virus strains such as Ebola. 



 Education: To update and upgrade prescribed literature
for a variety of fields which are witnessing rapid
development. 
 Manufacturing: To increase productivity by using big data to enhance supply chain management. Manufacturing companies use these analytical tools to ensure that they are allocating production resources in an optimal manner that yields the maximum benefit.
 Insurance: For everything from developing new
products to handling claims through predictive analytics.
Insurance companies use business big data to keep a
track of the scheme of policy which is the most in
demand and is generating the most revenue.



 Consumer Trade: To predict and manage staffing and
inventory requirements. Consumer trading companies
are using it to grow their trade by providing loyalty
cards and keeping a track of them.
 Transportation: For better route planning, traffic
monitoring and management, and logistics. This is
mainly incorporated by governments to avoid
congestion of traffic in a single place.
 Energy: By introducing smart meters to reduce
electrical leakages and help users to manage their energy
usage.



#Handling Limitations of Big Data
 Need for talent: Data scientists and big data experts are
among the most highly coveted —and highly paid —
workers in the IT field. 
 Data quality: In the Syncsort survey, the number one
disadvantage to working with big data was the need to
address data quality issues. Before they can use big data
for analytics efforts, data scientists and analysts need to
ensure that the information they are using is accurate,
relevant and in the proper format for analysis. 
 Need for cultural change: Many of the organizations
that are utilizing big data analytics don't just want to get
a little bit better at reporting, they want to use analytics
to create a data-driven culture throughout the company. 



 Compliance: Another thorny issue for big data analytics efforts is complying with government regulations. Much of the information included in companies' big data stores is sensitive or personal, and that means firms may need to ensure that they are meeting industry standards or government requirements when handling and storing the data.
 Cybersecurity risks: Storing big data, particularly sensitive data,
can make companies a more attractive target for cyber attackers. 
 Hardware needs: Another significant issue for organizations is the
IT infrastructure necessary to support big data analytics initiatives.
Storage space to house the data, networking bandwidth to transfer it
to and from analytics systems, and compute resources to perform
those analytics are all expensive to purchase and maintain. Some
organizations can offset this problem by using cloud-based
analytics, but that usually doesn't eliminate the infrastructure
problems entirely.



 Costs: Many of today's big data tools rely on open source
technology, which dramatically reduces software costs, but
enterprises still face significant expenses related to staffing,
hardware, maintenance and related services. 



#Technologies Supporting Big Data
 Big Data analytics tools are important for companies and enterprises because of the huge volume of Big Data now generated and managed by modern organizations.
 Big Data analytics tools also help businesses save time and money in gaining insights to inform data-driven decisions.



Analytic Processes and Tools



 1. Apache Hadoop
 Apache Hadoop is a Java-based free software framework that can effectively store large amounts of data in a cluster. The framework runs in parallel on a cluster and can process data across all nodes. The Hadoop Distributed File System (HDFS) is the storage system of Hadoop, which splits big data into blocks and distributes them across many nodes in a cluster. It also replicates data within the cluster, thus providing high availability.
 2. Microsoft HDInsight
 This is a Big Data solution from Microsoft powered by Apache Hadoop, available as a service in the cloud. HDInsight uses Windows Azure Blob storage as the default file system. It also provides high availability at low cost.
 3. NoSQL
 While traditional SQL can be used effectively to handle large amounts of structured data, we need NoSQL (Not Only SQL) to handle unstructured data. NoSQL databases store unstructured data with no particular schema: each row can have its own set of column values. NoSQL gives better performance when storing massive amounts of data. There are many open-source NoSQL databases available for analyzing Big Data (a small document-store sketch follows).
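A minimal sketch of the schema-less idea behind document-style NoSQL stores, using plain Python dictionaries; this illustrates the concept only and is not any particular database's API:

# Each "document" is a dict; unlike a relational row, every record may
# carry a different set of fields.
documents = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "phone": "98450-00000", "city": "Bengaluru"},
    {"_id": 3, "name": "Meena", "interests": ["cricket", "movies"]},
]

def find(collection, **criteria):
    """Return the documents whose fields match all the given criteria."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

print(find(documents, city="Bengaluru"))   # only the second document matches
# No fixed schema is enforced: adding a new field to one document does not
# require altering the others, which is what gives NoSQL its flexibility.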



 4. Hive: A distributed data management layer for Hadoop. It supports an SQL-like query language, HiveQL (HQL), to access big data, and is primarily used for data mining purposes. It runs on top of Hadoop.
 5. Sqoop: A tool that connects Hadoop with various relational databases to transfer data. It can be used effectively to transfer structured data to Hadoop or Hive.
 6. Presto: Facebook developed and open-sourced its query engine (SQL-on-Hadoop) named Presto, which is built to handle petabytes of data. Unlike Hive, Presto does not depend on the MapReduce technique and can retrieve data quickly.
 7. MapReduce: MapReduce is a programming model and software framework first developed by Google. It works similarly to a UNIX pipeline. A MapReduce job splits the input dataset into independent subsets that are processed by map tasks in parallel. This mapping step is then followed by a step of reduce tasks, which use the output of the maps to obtain the final result (a small word-count sketch follows).
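A minimal sketch of the MapReduce idea in plain Python, using the classic word count: map tasks emit (word, 1) pairs and a reduce step sums them per key. This only mimics the model locally; a real Hadoop job distributes the same two phases across a cluster:

from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map task: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    """Reduce task: sum the counts for each word (the shuffle groups the keys)."""
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

lines = ["big data is big", "data about data"]   # stand-in for input splits
mapped = chain.from_iterable(map_phase(line) for line in lines)
print(reduce_phase(mapped))   # {'big': 2, 'data': 3, 'is': 1, 'about': 1}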



# Difference between the Traditional IT Approach and Big Data Technology
 Data architecture:
◦ Traditional systems use a centralized database architecture in which large and complex problems are solved by a single computer system. A centralized architecture is costly and ineffective for processing large amounts of data.
◦ Big data is based on a distributed database architecture, in which a large block of data is divided into several smaller pieces and the solution to a problem is computed by several different computers in a given computer network.
 Types of data:
◦ Traditional database systems are based on structured data.
◦ Big data also uses semi-structured and unstructured data.
 Volume of data:
◦ Traditional database systems can store only small amounts of data, ranging from gigabytes to terabytes.
◦ Big data helps to store and process large amounts of data, consisting of hundreds of terabytes or petabytes of data and beyond.



Data relationship
 In a traditional database system, relationships between data items can be explored easily, as the amount of information stored is small.
 However, big data contains massive or voluminous data, which increases the level of difficulty in figuring out the relationships between data items.
Scaling
 Scaling refers to the demand on the resources and servers required to carry out the computation. Big data is based on a scale-out architecture, under which distributed approaches to computing are employed across more than one server.
 However, achieving scalability in a traditional database is very difficult, because a traditional database runs on a single server and requires expensive servers to scale up.



 Higher cost of traditional data:
◦ A traditional database system requires complex and expensive hardware and software in order to manage large amounts of data. Moving the data from one system to another also requires additional hardware and software resources, which increases the cost significantly.
◦ In the case of big data, the massive amount of data is segregated across various systems, so the amount of data each system handles decreases. Big data systems are therefore comparatively simple to use, relying on commodity hardware and open-source software to process the data.
 Accuracy and confidentiality:
◦ Under a traditional database system it is very expensive to store massive amounts of data, so not all of the data can be stored. This decreases the amount of data available for analysis, which lowers the accuracy and confidence of the results.
◦ In big data systems, the cost of storing voluminous data is lower, so the data is stored, and points of correlation are identified, which provides highly accurate results.



#Capabilities of Big Data
 Big data analytics capability refers to the ability to
manage a huge volume of disparate data to allow users
to implement data analysis and reaction.
 Maximizing enterprise business value should encompass
speed to insight which is the ability to transform raw
data into usable information and pervasive use which is
the ability to use business analytics across the enterprise.
 It also refers to the ability to gather an enormous variety of data (structured, unstructured and semi-structured) from current and former customers to gain useful knowledge that supports better decision-making, to predict customer behavior via predictive analytics software, and to retain valuable customers by providing real-time offers.



 With a lens of analytics adoption, big data analytics
capability can be categorized into three levels:
aspirational, experienced, and transformed.
 The former two levels of analytics capabilities focus
on using business analytics technologies to achieve
cost reduction and operation optimization.
 The last level of capability is aimed at driving customer profitability and making targeted investments in niche analytics.



Big data Architecture capabilities
 Storage and management capabilities
 Database capability
 Processing capability
 Data Integration capability
 Statistical Analysis Capability



#Big Data in Banking Domain
 Banking and the Financial Services Industry is a domain
where the volume of data generated and handled is enormous.
 Each and every activity of this industry generates a digital
footprint backed by data.
 As the number of electronic records grows, financial services
are actively using big data analytics to derive business
insights, store data, and improve scalability.
 Improving Customer Experience : With so many financial
institutions in the market, it gets tough for the customer to
decide which bank to transact with. Customer experience, in
this case, becomes a deciding factor. Big data analysis provides a customized analysis for each customer, thus improving banks' services and offerings.



 Personalized Marketing : Big Data is used for
personalized marketing, targeting customers on the basis
of their individual spends.  Analysis of the customer
behaviour on social media through sentiment analysis
helps banks create credit risk assessment and offer
customized products to the customer.
 Optimized Operations : Big data can be applied to
bring immense value to the bank in the avenues of
effective credit management, fraud management,
operational risks assessment, and integrated risk
management. Systems enabled with Big Data can detect fraud signals and analyse them in real time using machine learning, to accurately predict illegitimate users and/or transactions, thus raising a caution flag.



#Big Data in Ecommerce Domain
 Product Portfolio
 Pricing
 Online/in-store
experience
 Advertising/Marketing experience
 Customer Service
 Inventory

How Can Big Data Help Your Ecommerce Site?
 Provide a Personalized User Experience for Your Site's Visitors: Without personalization, a buyer has to go through all products, with none of the ones he is most interested in highlighted. Using big data analytics, you will be able to recommend items the user usually looks for.
 You will also be able to send personalized promotions to the user, offering those that really mean something to him.
 For example, if a user usually searches for gadgets, he will receive recommendations and promotional material related to smartphones and laptops, but not clothes and shoes.
 Be More Agile when it Comes to Pricing: Pricing strategy is one of the key determinants of whether a consumer buys from you or from your competitors.
 Checking your competitors' prices every now and then would be too tedious. With big data analytics, you will have the ability to check how your competitors have priced their products so that you can respond accordingly.



 Turn Visitors Into Actual Buyers: The ultimate goal of
spending time, effort, and resources into understanding big data is
to help you improve your business and this is evident when you
attain a higher conversion rate or turning more visitors into actual
buyers.
 With big data, you will be able to learn more about why certain visitors end up leaving, as well as why others decide to buy. This way, you will know where you need to improve so that more visitors make purchases.
 Manage Your Inventory Efficiently: Big data analysis also allows
you to predict demand so that you can get the right amount of
supply.
 By determining how many of certain products are needed for
certain periods, then you will not waste money on having too much,
nor would you lose potential sales by turning down transactions
because you are out of stock.



# Big Data in Government Sectors
 Big data and analytics can be applied to just about any public-sector
program to provide tangible outcomes, including:
 Emergency response. Analytics have been used in response to major
natural disasters such as Typhoon Haiyan to identify health issues,
coordinate thousands of displaced individuals and prevent water scarcity
issues. Recently, following Hurricane Maria, analytics was used to identify
areas of need and for more effective resource allocation.
 Anti-money laundering. Analytics are being used to prevent money
laundering and financial crimes, directly impacting terrorist organizations
or unfriendly foreign governments that use illicit financial activities to fund
their operations.
 Insider threats. Using analytics to detect anomalies and irregular behavior,
agencies can greatly reduce the amount of data that gets leaked or stolen.
This helps prevent fraud and cybercrime that drains money and resources
that could otherwise be used for programs to help the citizenry.
 Workforce effectiveness. Agencies can better understand the workforce
gaps that could develop as employees either retire or leave for the private
sector. By ensuring that new employees can fill the gaps, and by
introducing ways to retain employees, agencies can continue to operate
effectively.
 National Security : The security situation in India is unpredictable. The
security agencies and police can analyze the data gathered from
disparate sources and respond to crime, attacks, and other such situations
in the country.
 Impact on Education : Impact of Big data on the Education sector can
play a really vital role. Big data analytics will help to survey the needs of
current generations and enable the government to provide them what
they want. More data about students can help students in various
different ways to find their interest, strength, and help to identify their
weaknesses in ways that are not possible today.
 To Diminish Unemployment Tempo: The government can limit the unemployment rate by foreseeing employment needs based on the proficiency rate. This can be accomplished by investigating the students graduating every year.
 Formulating Economic and Social Policies : Government agencies
hold a significant amount of data collected from surveys, administrative
programs, and public banks. By harnessing the power of big data
analytics, the government can design smarter, citizen-centric services
and policies. This would offer a wide range of social benefits to citizens
and improve their lives considerably.
# Big Data in Hospitals
 1) Patient Predictions for Improved Staffing: A classic problem that any shift manager faces is how many people to put on staff in any given time period. If you put on too many workers, you run the risk of unnecessary labor costs adding up; with too few workers, you can have poor customer service outcomes, which can be fatal for patients in this industry.
 Forbes states: “The result is a web browser-based interface designed to be used
by doctors, nurses and hospital administration staff – untrained in data science –
to forecast visit and admission rates for the next 15 days.
 Extra staff can be drafted in when high numbers of visitors are expected, leading
to reduced waiting times for patients and better quality of care.”
 2) Electronic Health Records (EHRs) : It’s the most widespread
application of big data in medicine. Every patient has his own digital record
which includes demographics, medical history, allergies, laboratory test results
etc. Records are shared via secure information systems and are available for
providers from both public and private sector.
 Every record is comprised of one modifiable file, which means that doctors can
implement changes over time with no paperwork and no danger of data
replication.



 4) Enhancing Patient Engagement : Many consumers – and hence,
potential patients – already have an interest in smart devices that record every
step they take, their heart rates, sleeping habits, etc., on a permanent basis.
All this vital information can be coupled with other trackable data to identify
potential health risks lurking.
 Patients are directly involved in the monitoring of their own health, and
incentives from health insurances can push them to lead a healthy lifestyle.
 Big data to fight cancer :Cancer is rapidly crippling people across the
world. Big data can help to fight cancer more effectively. 
 Healthcare providers will have enhanced ability to detect and diagnose
diseases in their early stages, assigning more effectual therapies based on a
patient’s genetic makeup, and regulate drug doses to minimize side effects
and improve effectiveness. 
 Monitoring patient vitals: The application of big data makes it easier for hospital staff to work more efficiently. Sensors are used beside patient beds to continuously monitor blood pressure, heartbeat and respiratory rate. Any change in pattern is quickly alerted to doctors and healthcare administrators (a small alerting sketch follows).
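A minimal sketch of such an alert rule in Python; the thresholds and readings below are assumed values for illustration, not clinical guidance:

# Simple threshold-based alerting over a stream of bedside readings.
LIMITS = {"heart_rate": (50, 120), "systolic_bp": (90, 180), "resp_rate": (10, 25)}

readings = [
    {"heart_rate": 72, "systolic_bp": 118, "resp_rate": 14},
    {"heart_rate": 131, "systolic_bp": 96, "resp_rate": 22},   # abnormal heart rate
]

def check(reading):
    """Return a list of vitals that fall outside their assumed normal range."""
    alerts = []
    for vital, value in reading.items():
        low, high = LIMITS[vital]
        if not low <= value <= high:
            alerts.append(f"{vital}={value} outside {low}-{high}")
    return alerts

for i, r in enumerate(readings, start=1):
    for alert in check(r):
        print(f"ALERT (reading {i}): {alert}")
# A real system would push alerts to doctors and also look for changes in
# pattern over time, not just single out-of-range values.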



 Fraud Prevention and Detection : Big data helps to
prevent a wide range of errors on the side of health
administrators in the form of wrong dosage, wrong
medicines, and other human errors. It will also be
particularly useful to insurance companies. They can
prevent a wide range of fraudulent claims of insurance.

