Dataflow Cost Estimator

Estimate Dataflow costs for batch and streaming jobs with worker config, Streaming Engine, FlexRS, and Dataflow Prime.

Last verified: May 2026

Input

Job type, pricing model, machine type (quick pick), vCPUs per worker, memory (GB) per worker, storage (GB) per worker, worker count, hours per day.

Batch vs Streaming

Batch jobs process bounded data sets (e.g., daily ETL). They start, process all data, and terminate. Pricing uses lower vCPU/memory rates and HDD storage.

Streaming jobs process unbounded, real-time data (e.g., Pub/Sub ingestion). They run continuously with higher vCPU/memory rates and SSD storage for low-latency state access.

Streaming Engine

Streaming Engine offloads windowing, state management, and shuffle operations from worker VMs to the Dataflow service backend. This reduces worker CPU/memory consumption and improves autoscaling responsiveness.

At $0.018 per Streaming Compute Unit (SCU) per hour, Streaming Engine often lowers total cost by allowing fewer or smaller workers, and it tends to improve pipeline stability.

FlexRS (Flexible Resource Scheduling)

FlexRS is available for batch jobs only. It uses a mix of preemptible and on-demand VMs, scheduling execution during periods of available capacity. vCPU cost drops from $0.056 to $0.034/hr, roughly a 40% discount.

Trade-off: jobs may be delayed up to 6 hours and may take longer due to preemption. Best for non-time-sensitive workloads like overnight ETL.
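As a quick check on the figures above, the following Python sketch derives the discount implied by the two quoted vCPU rates and applies both rates to a job. The function names and the example job size are illustrative assumptions. Memory and disk charges are left out because this section quotes no rates for them.

```python
# Rates quoted in the FlexRS section above (USD per vCPU-hour).
STANDARD_BATCH_VCPU_HOUR = 0.056
FLEXRS_BATCH_VCPU_HOUR = 0.034


def discount_fraction(standard_rate: float, discounted_rate: float) -> float:
    """Return the fractional saving of discounted_rate relative to standard_rate."""
    return (standard_rate - discounted_rate) / standard_rate


def vcpu_cost(workers: int, vcpus_per_worker: int, hours: float, rate: float) -> float:
    """vCPU-only cost of one run; memory and disk are not included."""
    return workers * vcpus_per_worker * hours * rate


if __name__ == "__main__":
    saving = discount_fraction(STANDARD_BATCH_VCPU_HOUR, FLEXRS_BATCH_VCPU_HOUR)
    print(f"{saving:.1%}")  # 39.3%
    # Hypothetical job: 10 workers x 4 vCPUs for 2 hours.
    print(round(vcpu_cost(10, 4, 2, STANDARD_BATCH_VCPU_HOUR), 2))
    print(round(vcpu_cost(10, 4, 2, FLEXRS_BATCH_VCPU_HOUR), 2))
```

The computed saving is slightly under 40%, which matches the "roughly a 40% discount" wording. Actual FlexRS runs may also take longer because of preemption, which this sketch does not model.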

Dataflow Prime

Dataflow Prime bills per Data Compute Unit (DCU), priced here at $0.086/DCU-hour. Instead of provisioning specific machine types, you let Dataflow manage resource allocation, right-sizing, and vertical autoscaling, and usage is metered in DCUs.

Prime is ideal for variable workloads where manual worker sizing is difficult. It can reduce over-provisioning costs significantly.

Comparison with Spark on Dataproc

Dataflow (Apache Beam) and Dataproc (Apache Spark) both handle large-scale data processing on GCP. Key differences:

• Dataflow is fully managed and serverless, requiring no cluster provisioning. Dataproc requires cluster management.
• Dataflow excels at streaming. Spark Structured Streaming exists but Dataflow's exactly-once semantics are more mature.
• Dataproc can be cheaper for long-running batch jobs using sustained-use discounts and preemptible VMs.
• Dataflow charges per-resource (vCPU, memory, storage). Dataproc charges a management fee on top of Compute Engine costs.

About This Tool

The Dataflow Cost Estimator calculates monthly costs for both batch and streaming Apache Beam jobs running on Google Cloud Dataflow. It factors in worker vCPUs, memory, persistent disk, Streaming Engine, FlexRS (Flexible Resource Scheduling), and Dataflow Prime pricing. The tool helps you compare worker configurations and choose the most cost-effective execution mode.

Real-World Scenario

Your team's nightly ETL job processes 500 GB of data and currently runs on 50 worker VMs (n1-standard-4) for 2 hours. At roughly $60 per run, 30 runs cost about $1,800/month. The estimator models the alternatives. FlexRS saves about 40%, bringing the bill to $1,080/month. Streaming Engine does not apply because this is a batch job. Dataflow Prime with right-sized autoscaling cuts worker-hours by 30% (the right-sized workers are smaller than the over-provisioned default), which gives $1,260/month before the Prime premium and about $1,400/month after it. Best path: FlexRS at $1,080/month, a saving of $720/month.
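The arithmetic in this scenario can be re-derived from the quoted dollar figures. The Python sketch below is only a check on those numbers, not the estimator's actual code. It takes the 11% Prime premium from the tips on this page and backs the worker-hour reduction out of the $1,260 figure rather than assuming it.

```python
COST_PER_RUN = 60.0            # approximate cost per nightly run, from the scenario
RUNS_PER_MONTH = 30
FLEXRS_DISCOUNT = 0.40         # "saves 40%"
PRIME_PREMIUM = 0.11           # Prime premium quoted elsewhere on this page
PRIME_BEFORE_PREMIUM = 1260.0  # scenario's Prime figure before the premium

baseline = COST_PER_RUN * RUNS_PER_MONTH
flexrs = baseline * (1 - FLEXRS_DISCOUNT)
flexrs_saving = baseline - flexrs
prime_after_premium = PRIME_BEFORE_PREMIUM * (1 + PRIME_PREMIUM)
# Worker-hour reduction implied by the pre-premium Prime figure.
implied_reduction = 1 - PRIME_BEFORE_PREMIUM / baseline

if __name__ == "__main__":
    print("baseline:", round(baseline, 2))
    print("FlexRS:", round(flexrs, 2), "saving:", round(flexrs_saving, 2))
    print("Prime after premium:", round(prime_after_premium, 2))
    print("implied worker-hour reduction:", round(implied_reduction, 2))
```

Under these inputs FlexRS remains the cheapest option, because the Prime figure after the premium stays above the FlexRS figure.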

When to Use This Tool

• Estimating monthly costs for a streaming Dataflow pipeline processing real-time events from Pub/Sub.
• Comparing FlexRS batch costs against standard batch pricing for delay-tolerant ETL workloads.
• Calculating the savings from enabling Streaming Engine vs. running in classic worker-based mode.
• Modeling costs for scaling a Dataflow pipeline from development to production worker counts.

Pro Tips

TIP

Streaming Engine should be the default for any streaming Dataflow pipeline in 2026. The legacy worker-based streaming model leaves CPU sitting idle most of the time waiting for shuffles. Streaming Engine bills per Streaming Compute Unit (SCU) — usually 30-50% cheaper than equivalent worker-based pricing for the same throughput.

TIP

FlexRS is the right choice for any nightly/scheduled batch job that can tolerate a delay window. It schedules workers when GCE has surplus capacity, saving up to 40%. The scheduling delay is at most 6 hours — typically much less. Don't use FlexRS for time-sensitive batch jobs (e.g., end-of-day reports needed by 9am).

TIP

Dataflow Prime adds vertical autoscaling (workers can be resized mid-job) and right-sizing recommendations on top of standard Dataflow. The 11% premium price is almost always worth it for batch jobs with variable per-stage resource requirements — the over-provisioning savings exceed the premium.
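To make the "premium versus over-provisioning savings" trade-off concrete, the Python sketch below computes the break-even point. It assumes the 11% premium multiplies the whole bill and that right-sizing reduces billed resource usage by a single fraction; both are simplifications, and the function names are illustrative.

```python
def break_even_reduction(premium: float) -> float:
    """Smallest fractional cut in resource usage that offsets a price premium.

    Solves (1 - r) * (1 + premium) = 1 for r.
    """
    return 1 - 1 / (1 + premium)


def prime_cost_ratio(reduction: float, premium: float = 0.11) -> float:
    """Prime bill divided by the standard bill; below 1.0 means Prime is cheaper."""
    return (1 - reduction) * (1 + premium)


if __name__ == "__main__":
    print(f"{break_even_reduction(0.11):.1%}")  # 9.9%
    print(prime_cost_ratio(0.30) < 1.0)  # a 30% usage cut beats the premium
    print(prime_cost_ratio(0.05) < 1.0)  # a 5% usage cut does not
```

Under these assumptions, Prime pays for itself once right-sizing trims a little under 10% of resource usage.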

How It Works Under the Hood

The estimator computes Dataflow cost across two execution models: classic (worker vCPU-hours × rate + worker memory GB-hours × rate + worker disk GB-month × rate) and Streaming Engine (Streaming Compute Units × per-SCU rate, which bundles compute and shuffle). FlexRS applies a flat discount (~40%) to the worker compute portion. Dataflow Prime adds an 11% premium across the board.
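The two models described above can be sketched in Python as follows. This is a minimal illustration and not the tool's source code. The names `WorkerRates`, `classic_monthly_cost`, and `streaming_engine_monthly_cost` are hypothetical. Treating "worker compute" as vCPU plus memory, prorating the GB-month disk rate over 730 hours, and using a 30-day month are all assumptions.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: used to prorate GB-month disk pricing to hours


@dataclass(frozen=True)
class WorkerRates:
    vcpu_hour: float       # USD per vCPU-hour
    memory_gb_hour: float  # USD per GB-hour of memory
    disk_gb_month: float   # USD per GB-month of persistent disk


def classic_monthly_cost(workers, vcpus, memory_gb, disk_gb, hours_per_day, rates,
                         days=30, flexrs=False, prime=False):
    """Classic model: vCPU-hours + memory GB-hours + prorated disk."""
    worker_hours = workers * hours_per_day * days
    compute = worker_hours * (vcpus * rates.vcpu_hour + memory_gb * rates.memory_gb_hour)
    disk = worker_hours * disk_gb * rates.disk_gb_month / HOURS_PER_MONTH
    if flexrs:
        compute *= 1 - 0.40  # flat ~40% discount on the worker compute portion
    total = compute + disk
    if prime:
        total *= 1.11  # 11% Prime premium across the board
    return total


def streaming_engine_monthly_cost(scus, hours_per_day, scu_hour_rate=0.018, days=30):
    """Streaming Engine model: Streaming Compute Units x per-SCU hourly rate."""
    return scus * hours_per_day * days * scu_hour_rate


if __name__ == "__main__":
    # Only the 0.056 vCPU rate comes from this page; the memory and disk
    # rates below are placeholders for illustration.
    rates = WorkerRates(vcpu_hour=0.056, memory_gb_hour=0.0035, disk_gb_month=0.04)
    print(round(classic_monthly_cost(50, 4, 15, 250, 2, rates), 2))
    print(round(classic_monthly_cost(50, 4, 15, 250, 2, rates, flexrs=True), 2))
    print(round(streaming_engine_monthly_cost(10, 24), 2))
```

Keeping the rates in a separate dataclass makes it easy to swap in current batch or streaming rates without touching the formulas.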

Frequently Asked Questions

How is Dataflow pricing structured?

Dataflow charges per vCPU-hour, per GB-hour of memory, and per GB-month of persistent disk for worker resources. Streaming Engine pricing replaces worker-level charges with pricing per Streaming Compute Unit (SCU), which includes CPU, memory, and state storage. FlexRS offers up to 40% savings for batch jobs with flexible scheduling.

What is Streaming Engine?

Streaming Engine offloads data shuffling and state management from worker VMs to a Google-managed service. This reduces worker resource requirements, improves autoscaling responsiveness, and provides better observability. It is recommended for all new streaming pipelines and is priced per Streaming Compute Unit.

What is FlexRS?

Flexible Resource Scheduling (FlexRS) reduces batch job costs by up to 40% by running workers on a combination of preemptible and standard VMs with flexible scheduling. Dataflow queues the job and starts it within 6 hours when resources are cheapest. FlexRS is ideal for nightly ETL jobs and other delay-tolerant workloads.

Related Learning Guides

Cost Optimization Guide (24 min read)

Disclaimer: This tool runs entirely in your browser. No data is sent to our servers. Always verify outputs before using them in production. AWS, Azure, and GCP are trademarks of their respective owners.