1. Websites –
1. Personalization: Big data analytics are used to analyze user behavior and
preferences, enabling websites to offer personalized recommendations,
content, and advertisements. This enhances user engagement and satisfaction.
2. Content Optimization: Websites leverage big data analytics to understand
which content performs best and optimize their offerings accordingly. This
includes identifying popular topics, improving search relevance, and tailoring
content to specific audience segments.
3. Predictive Analytics: Big data enables predictive analytics, allowing websites
to anticipate user needs and behaviors. This can be used for proactive
customer support, predicting churn, and optimizing marketing campaigns.
4. Performance Optimization: Big data tools are employed to monitor website performance in real time, identifying issues such as slow loading times or downtime. This enables timely interventions to maintain a seamless user experience.
Overall, big data is essential for websites to understand their users, improve performance,
drive revenue, and stay competitive in today's digital landscape.
2. Finance –
Big data has revolutionized the finance industry, offering numerous benefits across
various sectors within finance. Here are some key applications of big data in finance:
1. Risk Management: Big data analytics help financial institutions assess and
mitigate risks more effectively. By analyzing vast amounts of historical and
real-time data, including market trends, customer behavior, and
macroeconomic indicators, banks and investment firms can identify potential
risks and take proactive measures to manage them.
2. Fraud Detection: Big data enables financial institutions to detect and prevent fraudulent activities in real time. By analyzing patterns and anomalies in transaction data, banks can identify suspicious behavior and flag potentially fraudulent transactions for further investigation (a minimal sketch follows this list).
3. Customer Analytics: Big data analytics provide insights into customer
behavior, preferences, and needs. Financial institutions use this information to
personalize their offerings, improve customer experience, and target
marketing campaigns more effectively.
4. Algorithmic Trading: Big data and machine learning algorithms are
increasingly used in algorithmic trading strategies. By analyzing market data,
news feeds, social media sentiment, and other relevant sources, algorithms
can make faster and more informed trading decisions, leading to better
investment outcomes.
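To make the fraud-detection idea concrete, here is a minimal sketch using scikit-learn's IsolationForest on synthetic transaction data; the features (amount, hour of day) and the contamination rate are illustrative assumptions, not a production fraud model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Simulate mostly normal transactions (amount, hour of day)
# plus a few extreme outliers standing in for fraud.
normal = rng.normal(loc=[50.0, 14.0], scale=[20.0, 4.0], size=(1000, 2))
outliers = np.array([[5000.0, 3.0], [7500.0, 4.0], [9000.0, 2.0]])
X = np.vstack([normal, outliers])

# contamination is the assumed fraction of anomalous transactions.
model = IsolationForest(contamination=0.005, random_state=42)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal

print(f"Flagged {np.sum(labels == -1)} transactions for review")
```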
Overall, big data has transformed the finance industry by enabling more informed decision-making, improving risk management, enhancing customer experience, and driving innovation across various financial sectors.
3. Healthcare –
4. Internet of Things –
Big data plays a crucial role in the Internet of Things (IoT) ecosystem, where vast amounts of data are generated by interconnected devices and sensors. By analyzing this data, organizations can derive valuable insights, optimize operations, enhance customer experiences, and drive innovation across various industries.
5. Environment –
Big data plays a significant role in environmental management and sustainability efforts.
8. Industry –
Big data has become a game-changer across various industries, revolutionizing how businesses operate, make decisions, and interact with customers.
9. Retail –
1. Customer Insights: Retailers can gather vast amounts of data from various
sources such as transactions, loyalty programs, social media, and website
interactions. Analyzing this data provides valuable insights into customer
preferences, behaviors, and buying patterns, enabling retailers to tailor their
marketing strategies, product assortments, and pricing strategies to meet
customer demands effectively.
2. Inventory Management: Big data analytics help retailers optimize inventory
levels by predicting demand more accurately. By analyzing historical sales
data, current trends, and external factors like weather and economic
conditions, retailers can ensure they have the right products in stock at the
right time and minimize stockouts or overstock situations.
3. Personalized Marketing: Retailers can use big data to create personalized marketing campaigns targeted at specific customer segments. By analyzing purchase history, browsing behavior, and demographic data, retailers can deliver relevant offers and recommendations to individual shoppers.
Overall, big data has become an indispensable tool for retailers looking to stay competitive in today's dynamic and fast-paced market by providing valuable insights, optimizing operations, and enhancing the overall customer experience.
Designing an analytics flow for big data involves several steps to process, analyze,
and derive insights from vast amounts of data efficiently. Here's a generalized flow:
1. Data Collection:
Gather data from various sources such as sensors, logs, databases,
social media, etc.
Utilize tools like Apache Kafka, Flume, or custom APIs for real-time data
ingestion.
Store data in a distributed file system like Hadoop Distributed File
System (HDFS) or cloud storage.
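As a sketch of the ingestion step, the following assumes the kafka-python client and a Kafka broker at localhost:9092; the topic name and event fields are hypothetical.

```python
import json
from kafka import KafkaProducer  # kafka-python client

# Assumes a Kafka broker running on localhost:9092.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# In practice each event would come from a sensor, log shipper, or API.
event = {"sensor_id": "s-17", "temperature": 21.4, "ts": 1700000000}
producer.send("sensor-events", value=event)  # hypothetical topic name
producer.flush()  # block until buffered messages are delivered
```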
2. Data Preprocessing:
Cleanse data by removing duplicates, correcting errors, handling
missing values, etc.
Normalize or standardize data to make it uniform for analysis.
Perform transformations like aggregation, filtering, and feature
engineering.
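A minimal preprocessing sketch with pandas, covering deduplication, imputation, and standardization (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 4],
    "age": [34, 34, None, 29, 41],
    "spend": [120.0, 120.0, 80.0, None, 200.0],
})

df = df.drop_duplicates()                        # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].mean())   # impute missing ages
df["spend"] = df["spend"].fillna(df["spend"].median())

# Standardize spend to zero mean and unit variance.
df["spend_std"] = (df["spend"] - df["spend"].mean()) / df["spend"].std()
print(df)
```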
3. Analysis Types:
1. Descriptive Analytics: Descriptive analytics involves summarizing
historical data to understand what has happened in the past. It includes
techniques such as data aggregation, data mining, and visualization to
provide insights into trends, patterns, and relationships within the data.
2. Diagnostic Analytics: Diagnostic analytics focuses on identifying the
reasons why certain events occurred. It involves digging deeper into the
data to uncover correlations and causal relationships. Techniques such as
root cause analysis and hypothesis testing are often used in diagnostic
analytics.
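As a small example of descriptive analytics, the following pandas aggregation summarizes synthetic historical sales data:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "East"],
    "month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
    "revenue": [1200, 1350, 900, 1100, 750],
})

# What happened: total and average revenue per region.
summary = sales.groupby("region")["revenue"].agg(["sum", "mean"])
print(summary)
```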
5. Visualization –
Big data analytics visualization refers to the process of using visual
representations to explore and understand large and complex datasets. Visualization
techniques are crucial in making sense of vast amounts of data by presenting it in a
more understandable and actionable format. Here are some common techniques and
tools used in big data analytics visualization:
1. Charts and Graphs: Bar charts, line charts, scatter plots, histograms, and pie
charts are commonly used to represent data relationships, distributions, and
trends.
2. Heatmaps: Heatmaps visually represent data using colors to indicate values.
They are useful for identifying patterns and correlations in large datasets.
3. Geospatial Visualization: Mapping data onto geographical maps helps in
understanding spatial patterns and relationships. Geographic Information
Systems (GIS) tools are often used for this purpose.
4. Network Visualization: Network diagrams represent relationships between
entities, such as social networks, communication networks, or network traffic
flows.
Popular tools for big data visualization include:
Tableau
Power BI (Microsoft)
QlikView
D3.js (for custom and interactive visualizations)
Plotly
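For instance, a basic line chart with Plotly Express (one of the tools above) might look like this; the dataset is synthetic:

```python
import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    "day": pd.date_range("2024-01-01", periods=30, freq="D"),
    "visits": range(100, 130),  # synthetic upward trend
})

fig = px.line(df, x="day", y="visits", title="Daily Website Visits")
fig.show()  # renders an interactive chart in a browser or notebook
```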
A big data stack typically refers to the combination of various technologies and tools
used to handle large volumes of data efficiently. Here's a simplified breakdown of a
common big data stack:
1. Data Sources: Data can come from various sources such as databases,
sensors, social media, web logs, etc.
2. Data Ingestion: Tools like Apache Kafka, Apache NiFi, or AWS Kinesis are
often used to ingest data from different sources and route it to appropriate
destinations.
3. Storage: Data is stored in distributed storage systems like Hadoop Distributed
File System (HDFS), Amazon S3, or Google Cloud Storage. Other options
include NoSQL databases like Apache Cassandra, MongoDB, or HBase, and
also traditional relational databases like MySQL or PostgreSQL.
4. Processing and Analysis: Big data processing frameworks like Apache Hadoop (MapReduce, HDFS), Apache Spark, Apache Flink, or Dask are used for processing and analyzing large datasets in parallel across a distributed cluster (see the sketch after this list).
5. Data Warehousing/Analytics: Data warehouses such as Amazon Redshift,
Google BigQuery, or Snowflake are utilized for storing and analyzing
structured data in a way optimized for querying and reporting.
6. Data Visualization and BI Tools: Tools like Tableau, Power BI, or Apache
Superset are used to create visualizations and dashboards to derive insights
from the data.
7. Machine Learning and AI: Libraries and frameworks like TensorFlow, PyTorch,
scikit-learn, or MLlib (part of Apache Spark) are used for building and
deploying machine learning models on big data.
8. Monitoring and Management: Tools like Apache Ambari, Cloudera Manager,
or Datadog are employed for monitoring the health and performance of the
big data infrastructure and applications.
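As a sketch of the processing layer (step 4 above), the following uses Apache Spark's Python API; the input path and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stack-demo").getOrCreate()

# Read a large CSV from distributed storage (path is hypothetical).
df = spark.read.csv("hdfs:///data/transactions.csv",
                    header=True, inferSchema=True)

# Aggregate in parallel across the cluster.
totals = df.groupBy("customer_id").agg(F.sum("amount").alias("total_spend"))
totals.show(10)
```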
This stack can vary depending on specific requirements, preferences, and the scale of
the data being handled. Additionally, cloud-based solutions from providers like
Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure often
offer managed services that simplify many aspects of building and managing a big
data stack.
Putting these pieces together, an end-to-end big data pipeline typically looks like the following:
1. Data Ingestion:
Source data from various systems, such as databases, files, streaming
sources, APIs, etc.
Use technologies like Apache Kafka, Apache Flume, or AWS Kinesis for
real-time streaming data ingestion.
Use technologies like Apache Sqoop, Apache NiFi, or custom scripts for
batch data ingestion.
2. Data Storage:
Store ingested data in a scalable, distributed storage system.
Options include Hadoop Distributed File System (HDFS), Amazon S3,
Google Cloud Storage, or Azure Data Lake Storage.
Utilize NoSQL databases like Apache Cassandra, MongoDB, or HBase
for semi-structured or unstructured data.
Use columnar databases like Apache HBase, Apache Druid, or
ClickHouse for analytical queries.
3. Data Processing:
Perform data transformation, cleaning, and enrichment.
Use distributed processing frameworks like Apache Spark, Apache Flink,
or Apache Beam for batch processing.
Utilize stream processing frameworks like Apache Kafka Streams,
Apache Storm, or Apache Samza for real-time processing.
Implement machine learning pipelines using libraries like Apache Spark
MLlib, TensorFlow, or Scikit-learn.
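To illustrate the machine learning pipeline mentioned under Data Processing, here is a minimal scikit-learn sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),    # normalize features
    ("clf", LogisticRegression()),  # simple baseline classifier
])
pipe.fit(X_train, y_train)
print(f"Test accuracy: {pipe.score(X_test, y_test):.2f}")
```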
4. Data Warehousing:
Load curated data into a warehouse such as Amazon Redshift, Google BigQuery, or Snowflake, optimized for querying and reporting.
Genome data analysis is a representative big data workflow; its main steps are:
1. Data Acquisition: The first step is obtaining the genome data. This can
involve sequencing the DNA of an organism using techniques like whole-
genome sequencing, targeted sequencing, or high-throughput sequencing
methods like Next-Generation Sequencing (NGS).
2. Preprocessing: Raw sequencing data often contain errors and artifacts that
need to be corrected before analysis. Preprocessing involves quality control,
trimming adapters, filtering out low-quality reads, and error correction.
3. Alignment: In this step, the sequenced reads are mapped back to a reference
genome or assembled into contigs if a reference is not available. Alignment
algorithms like Bowtie, BWA, or HISAT are commonly used for this purpose.
4. Variant Calling: Once the reads are aligned, variant calling identifies
differences (variants) between the sequenced genome and the reference
genome. Single nucleotide polymorphisms (SNPs), insertions, deletions, and
structural variations are detected in this step.
5. Annotation: Variants are annotated to determine their functional effects, such
as whether they occur in coding regions (exons), introns, regulatory regions,
or intergenic regions. This helps prioritize variants that may have biological
significance.
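Real pipelines rely on dedicated variant callers (e.g., GATK or bcftools), but the core idea of variant calling can be sketched in a few lines of Python by comparing an aligned read against a reference:

```python
reference = "ACGTACGTACGT"
read      = "ACGTACCTACGT"  # aligned read over the same coordinates

# Report positions where the read differs from the reference (SNPs).
variants = [
    (pos, ref_base, alt_base)
    for pos, (ref_base, alt_base) in enumerate(zip(reference, read))
    if ref_base != alt_base
]

for pos, ref_base, alt_base in variants:
    print(f"SNP at position {pos}: {ref_base} -> {alt_base}")
```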
Weather data analysis is another common big data workflow:
1. Data Collection: Gather historical weather data from reliable sources like
government meteorological agencies, weather stations, or online databases.
Data can include temperature, precipitation, wind speed, humidity, pressure,
etc. Ensure data quality and consistency.
2. Data Cleaning and Preprocessing: Check for missing values, outliers, and
inconsistencies in the data. Impute missing values using techniques like
interpolation or mean substitution. Convert units if necessary and format the
data for analysis.
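A small cleaning sketch with pandas, treating implausible readings as missing and interpolating (column names and the 60 °C threshold are illustrative assumptions):

```python
import pandas as pd

weather = pd.DataFrame({
    "date": pd.date_range("2024-06-01", periods=6, freq="D"),
    "temp_c": [21.0, None, 23.5, 180.0, 22.0, 22.5],
})

# Treat physically implausible readings (> 60 °C) as missing...
weather.loc[weather["temp_c"] > 60, "temp_c"] = None
# ...then fill gaps by linear interpolation between valid neighbors.
weather["temp_c"] = weather["temp_c"].interpolate()
print(weather)
```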
3. Exploratory Data Analysis (EDA): Explore the data to understand its
characteristics and identify patterns. This can involve visualizations such as
time series plots, histograms, scatter plots, and correlation matrices.
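A minimal EDA sketch on synthetic weather data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
weather = pd.DataFrame({
    "temp_c": rng.normal(22, 3, 100),
    "humidity": rng.uniform(40, 90, 100),
    "pressure": rng.normal(1013, 5, 100),
})

print(weather.describe())  # mean, std, quartiles per variable
print(weather.corr())      # pairwise correlations between variables
```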
4. Statistical Analysis: Conduct statistical tests to investigate relationships
between weather variables (e.g., correlation analysis, regression analysis).
Determine if there are significant differences or correlations between different
locations or time periods.
5. Predictive Modeling: Build predictive models to forecast future weather conditions. These range from simple autoregressive models and classical time series models such as ARIMA or SARIMA to machine learning approaches such as neural networks (a sketch follows below).
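A forecasting sketch with statsmodels' ARIMA on a synthetic daily temperature series; the (1, 1, 1) order is an illustrative choice, not a tuned model:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
idx = pd.date_range("2024-01-01", periods=120, freq="D")
temps = pd.Series(20 + np.cumsum(rng.normal(0, 0.5, 120)), index=idx)

model = ARIMA(temps, order=(1, 1, 1))  # (p, d, q): AR, differencing, MA
fitted = model.fit()
print(fitted.forecast(steps=7))  # seven-day-ahead forecast
```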
6. Model Evaluation: Evaluate the performance of predictive models using
metrics like mean absolute error (MAE), root mean square error (RMSE), or
coefficient of determination (R-squared). Validate the models using
techniques like cross-validation or holdout validation.
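An evaluation sketch with scikit-learn, comparing hypothetical forecasts against observed values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

observed = np.array([21.0, 22.5, 23.1, 22.8, 24.0])
forecast = np.array([20.6, 22.9, 23.5, 22.1, 23.4])

mae = mean_absolute_error(observed, forecast)
rmse = np.sqrt(mean_squared_error(observed, forecast))  # root of MSE
r2 = r2_score(observed, forecast)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R^2={r2:.2f}")
```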
7. Visualization and Communication: Present the results of the analysis using
visualizations such as maps, charts, and graphs. Communicate key findings
and insights to stakeholders in a clear and understandable manner.
8. Continuous Monitoring and Updating: Monitor weather conditions in real time and update the analysis as new data becomes available. Incorporate feedback and improve the analysis over time.