2. Semi-structured data
This type of data does not follow a fixed schema, but it carries tags or markers that describe its fields (for example, JSON or XML).
3. Unstructured data
This data does not have any predefined data model (for example, free text, images, or video).
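To make the distinction concrete, here is a minimal Python sketch. The records and the note are made-up sample data, and the field names are assumptions chosen only for illustration.

```python
import json

# Semi-structured: each record is self-describing, but fields vary per record.
raw_records = [
    '{"id": 1, "name": "Asha", "email": "asha@example.com"}',
    '{"id": 2, "name": "Ben", "tags": ["vip", "beta"]}',
]
records = [json.loads(line) for line in raw_records]

# The field set is discovered from the data rather than declared up front.
all_fields = sorted({key for record in records for key in record})

# Unstructured: free text with no fields at all; anything we want
# (here, a naive word count) has to be derived by processing the content.
note = "Customer called about a late delivery and asked for a refund."
word_count = len(note.split())
```

The JSON records can be queried by key even though they do not share the same fields, while the free-text note has to be processed before it yields anything.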
Issues regarding data in traditional file systems
1. Volume
2. Velocity
3. Variety
4. Variability
5. Complexity
Examples of Big data Applications
1. Fraud Detection
2. IT log analytics
3. Call center analytics
4. Social Media Analysis
Data Explosion
Data explosion is the rapid growth of data.
One reason for this explosive growth is innovation.
Data Veracity
It refers to the assurance of quality/integrity/credibility/accuracy of the
data. Since the data is collected from multiple sources, we need to check the
data for accuracy before using it for business insights.
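A minimal sketch of such an accuracy check is shown below. The validation rules, field names, and sample records are all hypothetical, and a real pipeline would likely apply many more rules.

```python
def is_valid(record):
    """Return True if the record passes the basic (hypothetical) quality rules."""
    amount = record.get("amount")
    return (
        bool(record.get("customer_id"))        # required field is present
        and isinstance(amount, (int, float))   # amount is numeric
        and not isinstance(amount, bool)       # bool is an int subclass; exclude it
        and amount >= 0                        # amount is in a plausible range
    )

# Records merged from two made-up sources, with typical quality problems.
records = [
    {"source": "crm", "customer_id": "C1", "amount": 120.0},
    {"source": "web", "customer_id": "", "amount": 35.5},     # missing id
    {"source": "web", "customer_id": "C2", "amount": -10},    # negative amount
    {"source": "crm", "customer_id": "C3", "amount": "n/a"},  # not numeric
]

clean = [r for r in records if is_valid(r)]
rejected = [r for r in records if not is_valid(r)]
```

Only the records in `clean` would be passed on for analysis. The rejected ones can be logged or sent back to the source for correction.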
Data Value
• A large volume of collected data has no value unless we derive
insights from it. Value refers to how useful the data is in decision
making.
5 Vs of Big Data
• Raw Data: Volume
• Change over time: Velocity
• Data types: Variety
• Data Quality: Veracity
• Information for Decision Making: Value
Big data Infrastructure and Challenges
• Storage
• Transportation
• Processing: CPU, memory, software
• Speed or Throughput
Big Data Processing Architectures
1) Lambda Architecture
Lambda architecture is mainly designed to handle huge amounts of data by
combining batch and stream processing methods.
It aims to balance latency, throughput, and fault tolerance.
Examples: Twitter, Spotify, LivePerson
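The combination of batch and stream processing can be sketched in Python as follows. This is a toy illustration only: the function names, the page-view events, and the use of in-memory lists in place of real storage and streaming systems are all assumptions.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute the view from the full historical dataset."""
    return Counter(event["page"] for event in master_dataset)

def speed_view(recent_events):
    """Speed layer: cover only events that arrived after the last batch run."""
    return Counter(event["page"] for event in recent_events)

def serve(batch, speed):
    """Serving layer: answer queries by merging the two views."""
    return batch + speed

# Made-up events: history already seen by the batch layer, plus new arrivals.
master_dataset = [{"page": "home"}, {"page": "home"}, {"page": "cart"}]
recent_events = [{"page": "cart"}, {"page": "checkout"}]

page_views = serve(batch_view(master_dataset), speed_view(recent_events))
```

The batch view is accurate but stale, and the speed view is fresh but partial. Merging them at query time is what lets the design trade off latency against throughput.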
2) Kappa Architecture
This is similar to Lambda architecture, but the batch layer is removed, so all
data is processed through a single stream pipeline.
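A minimal Python sketch of this idea follows. The `StreamJob` class and the event log are hypothetical, and a plain list stands in for a durable, replayable log.

```python
class StreamJob:
    """A single streaming job that folds log events into a running view."""

    def __init__(self):
        self.view = {}

    def process(self, event):
        page = event["page"]
        self.view[page] = self.view.get(page, 0) + 1

# Append-only event log: the only source of truth (no separate batch store).
log = [{"page": "home"}, {"page": "cart"}, {"page": "home"}]

live = StreamJob()
for event in log:
    live.process(event)

# Reprocessing (e.g. after a code change) is a replay of the same log
# through a fresh job, instead of a separate batch pipeline.
replayed = StreamJob()
for event in log:
    replayed.process(event)
```

Because there is only one processing path, the same code serves both live processing and historical recomputation.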
Example:
3) Zeta Architecture
It describes a scalable approach for speeding up the integration of data into the
business.
7 pluggable components of Zeta Architecture
1. Distributed file system
2. Real-time data storage
3. Pluggable compute model/execution engine
4. Deployment/container management system
5. Solution architecture
6. Enterprise application
7. Dynamic & global resource management
Benefits of Zeta Architecture
1. It reduces time and cost.
2. It contains fewer moving parts.
3. Less data duplication
4. Simpler testing and troubleshooting
5. Better resource utilization
Difference between lambda and kappa architecture
Data Warehouse
• A Data Warehouse (DW) is a relational database that is designed for query
and analysis rather than transaction processing. It includes historical data
derived from transaction data from one or more sources.
• It is a single, complete and consistent store of data obtained from a variety
of different sources made available to end users.
• Data warehousing is the process of transforming raw data into systematic
information and making it available to users as required, in a
timely manner.
• A data warehouse is a collection of data from heterogeneous sources
organized under a unified schema. There are two approaches to constructing
a data warehouse: top-down and bottom-up.
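The idea of bringing different sources under a unified schema can be sketched with Python's built-in sqlite3 module. The two sources, the `sales_fact` table, and its columns are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Two hypothetical sources that describe sales in different shapes.
store_sales = [("2024-01-05", "S1", 120.0), ("2024-01-06", "S1", 80.0)]
web_orders = [{"day": "2024-01-05", "total_cents": 4500}]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (sale_date TEXT, channel TEXT, amount REAL)")

# Transform each source into the unified schema while loading it.
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, 'store', ?)",
    [(day, amount) for day, _store_id, amount in store_sales],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, 'web', ?)",
    [(order["day"], order["total_cents"] / 100) for order in web_orders],
)

# Analytical query over the consolidated, historical data.
daily_totals = conn.execute(
    "SELECT sale_date, SUM(amount) FROM sales_fact "
    "GROUP BY sale_date ORDER BY sale_date"
).fetchall()
```

Once both sources are loaded into the same table, a single query can report across channels, which is the kind of analysis a warehouse is meant to support.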
Architecture of data Warehouse
There are two approaches: top-down and bottom-up.
1) Top-down approach
2) Bottom-up approach
Characteristics of Data Warehouse
Goals of Data Warehousing
• To help reporting as well as analysis
• Maintain the organization's historical information
• Be the foundation for decision making.
Disadvantages
1. Cost is high
2. Sending data involves software interaction.
3. This technique requires more coordination.
Difference between shared everything and shared nothing architecture
Big data learning approaches
A. Machine Learning
It is a process that gives computers the ability to learn without being explicitly
programmed.
Machine learning is a method of data analysis that automates analytical model
building.
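As a small illustration of learning from data rather than from hand-written rules, the sketch below fits a straight line to examples using ordinary least squares. The training data is made up, and this is just one simple technique chosen for illustration, not the only form of machine learning.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from examples using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data that follows y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 10 + intercept  # predict for an unseen input
```

The rule y = 2x + 1 is never written into the program. The model recovers it from the examples and then applies it to an input it has not seen.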
B. Machine Learning System Model