Big Data Pipelines PDF
Module 1
Agenda
✓ Data Pipelines
✓ Properties of Data Pipelines
✓ Types of Data
✓ Evolution of Data Pipelines
✓ Deployment of Data Pipelines
✓ Analytical Platform for the IoT Landscape
✓ Building Big Data Pipelines
✓ Benefits of Big Data Pipelines
Data Pipelines
• Building data pipelines is a core component of data science at a startup.
• A pipeline collects data and processes it.
• Typically, the destination for a data pipeline is a data lake, such as Hadoop or
Parquet files on S3, or a relational database, such as Redshift.
• A data pipeline views all data as streaming data, and it allows for flexible
schemas.
• A data pipeline does not require the ultimate destination to be a data
warehouse.
• Pipelines are commonplace for everything related to data, whether ingesting
data, storing data, or analyzing it.
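The ideas above can be sketched as a minimal stream-style pipeline: each stage consumes and yields records one at a time, so data is treated as a stream with a flexible schema. This is an illustrative sketch only; the stage names (`ingest`, `transform`, `load`) and the list-backed sink standing in for a data lake or warehouse are assumptions, not part of any specific framework.

```python
import json
from typing import Iterable, Iterator

def ingest(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Parse newline-delimited JSON events; skip malformed lines."""
    for line in raw_lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline would route these to a dead-letter store

def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Normalize field names; tolerate missing keys (flexible schema)."""
    for rec in records:
        yield {"user": rec.get("user_id"), "event": rec.get("event", "unknown")}

def load(records: Iterable[dict], sink: list) -> int:
    """Stand-in for writing to a data lake (e.g. Parquet on S3) or warehouse."""
    count = 0
    for rec in records:
        sink.append(rec)
        count += 1
    return count
```

Because each stage is a generator, the stages compose as `load(transform(ingest(lines)), sink)` and records flow through without the whole data set being held in memory.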
Components of Big Data Pipelines
Properties:
• Scalability
• Interactive Querying
• Versioning
• Monitoring
• Testing
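The monitoring and testing properties can be illustrated with a small wrapper that counts how many records a stage processes and how many it fails on; the counts can then be asserted in tests or exported as metrics. The `StageMetrics` class and `monitored` helper are hypothetical names used only for this sketch.

```python
from typing import Callable, Iterable, Iterator

class StageMetrics:
    """Per-stage counters a monitoring system could scrape."""
    def __init__(self, name: str):
        self.name = name
        self.processed = 0
        self.failed = 0

def monitored(metrics: StageMetrics, fn: Callable, records: Iterable) -> Iterator:
    """Apply fn to each record, counting successes and failures."""
    for rec in records:
        try:
            yield fn(rec)
            metrics.processed += 1
        except Exception:
            metrics.failed += 1  # skip the bad record but keep the count
```

A test can then drive the stage with known good and bad inputs and assert on both the output and the counters, which is the essence of the testing property.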
Data Warehouse vs. Data Lake
Data Pipeline Solutions
• Batch
• Real-time
• Cloud-native
• Open Source
IoT Data Pipelines: Layers
• Data Ingestion Layer
• Data Storage Layer (e.g., MongoDB)
• Visualization Layer — tools such as Tableau, QlikView, D3.js, etc.
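A toy sketch of the ingestion layer above, under the assumption that sensor readings arrive as JSON messages, are buffered on an in-process queue, and are flushed in batches to the storage layer. The Python list here stands in for a MongoDB bulk insert; all function names are illustrative.

```python
import json
from queue import Queue

def enqueue_reading(q: Queue, device_id: str, value: float) -> None:
    """Ingestion layer: accept a sensor reading as a JSON message."""
    q.put(json.dumps({"device": device_id, "value": value}))

def flush_batch(q: Queue, store: list, batch_size: int = 2) -> int:
    """Drain up to batch_size messages into the store; return count written."""
    written = 0
    while written < batch_size and not q.empty():
        store.append(json.loads(q.get()))  # stand-in for a bulk insert
        written += 1
    return written
```

Batching the writes is the usual trade-off in an ingestion layer: fewer round trips to storage at the cost of slightly higher latency per reading.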
Building Big Data Pipelines
Benefits of Big Data Pipelines
• Big data pipelines help in designing better event frameworks.
• Data persistence is maintained.
• Scalability is easy to achieve at the coding end.
• Workflow management is simpler, as the pipeline is automated and built to
scale.
• They provide a serialization framework.
• Data pipelines also have some disadvantages, but these are not major
concerns and have alternative ways to be managed:
• Resource costs may affect performance, as data pipelines are best
suited for large data sets only.
• Job-processing units must be maintained (i.e., cloud management).
• Critical data stored in the cloud loses some privacy.
Thank you