You are on page 1of 1

Introduction to Dataflow

In this lesson you learn about Dataflow. Dataflow is a fully managed service for
transforming and enriching data in stream (real time) or batch (historical) modes. This
means you don't have to do complex workarounds or compromise your pipeline design.

Dataflow uses a serverless approach to resource provisioning and management. You can
access virtually limitless capacity to solve your biggest data processing challenges, while
paying only for what you use.

Pub/Sub Stream Data Third-party


Studio tools

Data
Warehouse
Datastore

BigQuery
Dataflow

Predictive
Apache Avro Process Analytics
Vertex AI

Apache Kafka Batch


Cloud Bigtable
Caching and
Ingest Analyze Serving

Dataflow pipelines are either batch (processing bounded input like a file or database table) or
streaming (processing unbounded input from a source like Pub/Sub).

The lab you do in this lesson is a batch pipeline. In a later module you create a streaming
pipeline.

You might also like