Unit 2 (ETI) BDA
1. Web: In the web domain, big data is used by online platforms and e-commerce websites to
analyze user behavior. This includes tracking user clicks, page views, and interactions. For
instance, platforms like Amazon use big data to personalize recommendations based on a
user's browsing and purchase history, enhancing the overall shopping experience.
2. Financial: Financial institutions leverage big data for risk management, fraud detection, and
customer insights. Credit card companies analyze transaction data in real-time to identify
unusual patterns that may indicate fraudulent activities. Additionally, big data analytics is
employed for predicting market trends and optimizing investment strategies.
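As a rough illustration (not any real bank's system — the amounts and the two-sigma cutoff are invented for the example), a minimal "unusual pattern" check can flag transactions that sit far from a customer's historical average:

```python
from statistics import mean, stdev

def flag_unusual(amounts, threshold=3.0):
    """Flag amounts more than `threshold` standard deviations
    from the mean of the series (a simple outlier test)."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    return [a for a in amounts if abs(a - mu) > threshold * sigma]

history = [42.0, 38.5, 55.0, 47.2, 51.3, 39.9, 44.1, 2500.0]
print(flag_unusual(history, threshold=2.0))  # the 2500.0 charge stands out
```

Real fraud-detection systems combine many such signals (location, merchant, timing) and run them over streams of events, but the idea is the same: compare each new transaction against a learned baseline.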
3. Healthcare: In healthcare, big data is applied to enhance patient care and optimize
healthcare processes. Electronic health records (EHRs) are analyzed to identify patterns,
improve treatment plans, and predict disease outbreaks. Big data analytics also plays a crucial
role in genomics, helping researchers and clinicians analyze large-scale genomic data for
personalized medicine.
4. Internet of Things (IoT): IoT devices generate massive amounts of data, and big data
analytics is essential for extracting meaningful insights. In smart cities, sensors on traffic
lights, waste management systems, and public transportation are interconnected. Big data
is used to analyze this data in real-time, optimizing traffic flow, reducing energy
consumption, and improving overall city management.
5. Logistics & Transportation: In logistics and transportation, big data is used for route
optimization, predictive maintenance, and supply chain management. Companies like UPS
use big data analytics to optimize delivery routes, reduce fuel consumption, and enhance
overall operational efficiency. Predictive maintenance helps prevent breakdowns, ensuring
continuous and reliable transportation services.
6. Industry: Manufacturing industries leverage big data for quality control, process
optimization, and predictive maintenance. Sensors on production lines generate vast
amounts of data, which is analyzed in real-time to identify defects, optimize
production processes, and predict when machinery requires maintenance. This
improves overall efficiency and reduces downtime.
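To make the predictive-maintenance idea concrete, here is a toy sketch (the vibration readings, window size, and limit are all made up): a machine is flagged for servicing when the moving average of a sensor reading drifts above a limit.

```python
from collections import deque

def maintenance_alerts(readings, window=3, limit=0.8):
    """Return the time steps at which the moving average of a
    sensor reading exceeds `limit`, signalling possible wear."""
    recent = deque(maxlen=window)
    alerts = []
    for step, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > limit:
            alerts.append(step)
    return alerts

vibration = [0.2, 0.3, 0.4, 0.7, 0.9, 1.1]
print(maintenance_alerts(vibration))  # alert fires near the end of the series
```

Production systems use far richer models (trained on labelled failure histories), but the principle — act on the trend before the breakdown — is what this sketch shows.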
7. Retail: Retailers use big data to analyze customer purchasing patterns, optimize
inventory management, and personalize marketing strategies. For instance,
supermarkets analyze customer purchase data to optimize inventory levels, ensuring
products are always available. Online retailers use big data to personalize
recommendations and promotions based on customer preferences and browsing
history.
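A simple version of "analyzing purchasing patterns" is counting which products are bought together. The baskets below are invented for illustration; real retailers run this over millions of transactions:

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    pairs = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            pairs[pair] += 1
    return pairs

baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
]
counts = co_purchase_counts(baskets)
print(counts[("bread", "milk")])  # bread and milk were bought together twice
```

Pair counts like these feed directly into "customers who bought X also bought Y" recommendations and into inventory decisions (stock items that sell together near each other).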
Analytics flow for big data:
1. *Data Collection*: This is where we gather all the relevant data from various sources
such as databases, sensors, or social media platforms. Think of it as collecting pieces of
a puzzle.
2. *Data Preparation*: After collecting the data, we need to clean and organize it. This
step involves removing any errors or inconsistencies and formatting the data in a way
that's suitable for analysis. It's like sorting and arranging the puzzle pieces so they fit
together neatly.
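The "sorting the puzzle pieces" step looks roughly like this in code — a minimal sketch (the field names `user_id`, `amount`, `timestamp` are invented for the example) that drops incomplete rows, removes duplicates, and normalises types:

```python
def clean_records(raw):
    """Drop records with missing fields, deduplicate, and normalise types."""
    seen = set()
    cleaned = []
    for rec in raw:
        if rec.get("user_id") is None or rec.get("amount") is None:
            continue                      # remove incomplete rows
        key = (rec["user_id"], rec["timestamp"])
        if key in seen:
            continue                      # remove duplicates
        seen.add(key)
        cleaned.append({"user_id": str(rec["user_id"]).strip(),
                        "amount": float(rec["amount"]),
                        "timestamp": rec["timestamp"]})
    return cleaned

raw = [
    {"user_id": " u1 ", "amount": "19.99", "timestamp": 1},
    {"user_id": " u1 ", "amount": "19.99", "timestamp": 1},  # duplicate
    {"user_id": None, "amount": "5.00", "timestamp": 2},     # missing id
]
print(clean_records(raw))  # only one clean record survives
```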
3. *Analysis Types*: There are different ways we can analyze the data depending on
what we want to find out. For example, we might use descriptive analysis to summarize
the data, predictive analysis to forecast future trends, or prescriptive analysis to
recommend actions based on the data.
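The difference between descriptive and predictive analysis fits in a few lines. In this sketch (the sales figures are invented), the descriptive step summarises the past with an average, while the predictive step fits a least-squares line and extrapolates it one month forward:

```python
from statistics import mean

def forecast_next(series):
    """Fit a least-squares line to the series and predict the next point."""
    n = len(series)
    xs = range(n)
    slope = (n * sum(x * y for x, y in zip(xs, series))
             - sum(xs) * sum(series)) / (n * sum(x * x for x in xs) - sum(xs) ** 2)
    intercept = mean(series) - slope * mean(xs)
    return slope * n + intercept

sales = [100, 110, 125, 130, 145]                 # monthly sales
print("average (descriptive):", mean(sales))      # summarise what happened
print("next month (predictive):", forecast_next(sales))  # forecast the trend
```

Prescriptive analysis would go one step further and turn the forecast into an action, e.g. "order more stock if next month's forecast exceeds capacity".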
4. *Analysis Modes*: Once we know what type of analysis we want to perform,
we choose the mode of analysis. This could be batch processing, where we
analyze a large amount of data at once, or real-time processing, where we
analyze data as it's generated. It's like deciding whether to solve the puzzle all
at once or piece by piece as we go.
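The two modes can be contrasted directly: a batch computation sees the whole dataset at once, while a real-time (streaming) computation updates its answer as each value arrives. This toy example computes the same average both ways:

```python
def batch_average(readings):
    """Batch mode: process the whole stored dataset in one pass."""
    return sum(readings) / len(readings)

class StreamingAverage:
    """Real-time mode: update the running result as each value arrives."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count

readings = [10, 20, 30, 40]
print(batch_average(readings))      # one pass over stored data

stream = StreamingAverage()
for r in readings:                  # values processed as they arrive
    latest = stream.update(r)
print(latest)                       # same answer, computed incrementally
```

The streaming version never needs the full dataset in memory, which is why this mode suits sensor feeds and clickstreams.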
Components of the Big Data Stack:
1. *Raw Data Sources*: These are the original sources where data is generated or
collected, such as databases, sensors, or social media platforms. It's like the starting
point where all the data comes from.
2. *Data Access Connectors*: These are tools or interfaces that allow us to access and
retrieve data from different sources. They act as bridges between the raw data sources
and the rest of the big data stack, ensuring smooth data flow. Think of them as
connectors that link the raw data sources to the rest of the system.
3. *Data Storage*: This is where the collected data is stored for future use and analysis.
It could be in traditional databases, data lakes, or distributed file systems like Hadoop
Distributed File System (HDFS). It's like the storage room where we keep all the puzzle
pieces safe and organized.
4. *Batch Analytics*: Batch analytics involves processing and analyzing large volumes
of data in batches or chunks. It's useful for tasks that don't require immediate results,
such as historical analysis or periodic reporting. Think of it as solving the puzzle piece by
piece, but not necessarily in real-time.
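Batch processing in miniature: a large dataset is split into fixed-size chunks and a function is applied to each chunk. The batch size and the records here are invented; real systems (e.g., Hadoop MapReduce or Spark batch jobs) do the same thing across many machines:

```python
def process_in_batches(records, batch_size, fn):
    """Apply `fn` to fixed-size chunks of a dataset and collect the results."""
    results = []
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        results.append(fn(batch))
    return results

totals = process_in_batches(list(range(1, 11)), batch_size=4, fn=sum)
print(totals)  # per-batch sums of [1..4], [5..8], [9, 10]
```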
5. *Interactive Querying*: This refers to the ability to interactively query and explore the
data stored in the system. It allows users to ask ad-hoc questions and receive instant
responses, facilitating exploratory data analysis and troubleshooting. Think of it as being
able to search for specific puzzle pieces and get instant answers.
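An ad-hoc interactive query is just a question phrased in SQL against stored data. This self-contained sketch uses an in-memory SQLite table (the `sales` table and its rows are invented) to stand in for a much larger interactive query engine such as Hive or Presto:

```python
import sqlite3

# In-memory table standing in for stored, processed data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("north", 120.0), ("south", 80.0), ("north", 60.0)])

# An ad-hoc question, answered immediately.
rows = db.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # revenue per region
```

The point of interactive querying is that the analyst did not need to plan this question in advance — a new question is just a new SELECT statement.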
6. *Serving Databases*: Databases designed to serve data quickly and
efficiently, such as NoSQL databases (e.g., MongoDB or Apache Cassandra),
play a role in storing processed data for easy retrieval. It's like having a
well-organized library where you can easily find the book you need.
7. *Web & Visualization Frameworks*: These are the tools and frameworks
used to serve the analyzed data to end-users, whether through databases, web
applications, or visualization tools like Tableau or Power BI. They make the
insights gained from the data accessible and understandable to non-technical
users. It's like putting the puzzle together in a way that others can see and
understand the complete picture.
Mapping the analytics flow to the Big Data Stack means aligning the stages and
processes involved in data analytics with the various components of a Big Data
technology stack. It involves understanding how different tools and technologies within
the Big Data ecosystem can be employed to handle the various aspects of data
processing, storage, and analysis.
By mapping the analytics flow to the Big Data Stack, organizations can optimize
their data processing and analysis workflows, making use of the capabilities
offered by different components of the Big Data ecosystem. This ensures
efficient handling of large datasets and facilitates the extraction of valuable
insights from the data.
Case study on Genome Data Analysis:
In this case study, the Big Data Stack plays a crucial role in handling the vast and complex
genomic data, performing in-depth analysis, and presenting the findings in a way that aids
researchers in understanding the genetic factors influencing the rare disease.
Case study on Weather Data Analysis:
In this case study, the Big Data Stack facilitates the efficient handling of vast and dynamic weather
data, enabling comprehensive analysis, accurate predictions, and timely communication of weather
information to the public and other stakeholders.
Analytics patterns refer to recurring approaches or methodologies used in
data analysis to solve common problems or achieve specific goals. These
patterns provide guidance on how to structure and conduct data analysis tasks
efficiently and effectively. Here are some common analytics patterns: