Estimate Dataflow costs for batch and streaming jobs with worker config, Streaming Engine, FlexRS, and Dataflow Prime.
Last verified: May 2026
Batch jobs process bounded data sets (e.g., daily ETL). They start, process all data, and terminate. Pricing uses lower vCPU/memory rates and HDD storage.
Streaming jobs process unbounded, real-time data (e.g., Pub/Sub ingestion). They run continuously with higher vCPU/memory rates and SSD storage for low-latency state access.
Streaming Engine offloads windowing, state management, and shuffle operations from worker VMs to the Dataflow service backend. This reduces worker CPU/memory consumption and improves autoscaling responsiveness.
At $0.018/hr per streaming unit, it often lowers total cost by allowing fewer or smaller workers while improving pipeline stability.
FlexRS is available for batch jobs only. It uses a mix of preemptible and on-demand VMs, scheduling execution during periods of available capacity. vCPU cost drops from $0.056 to $0.034/hr, roughly a 40% discount.
Trade-off: jobs may be delayed up to 6 hours and may take longer due to preemption. Best for non-time-sensitive workloads like overnight ETL.
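The "roughly 40%" FlexRS figure can be checked from the two vCPU rates quoted above. The following is a minimal Python sketch; the constant and function names are illustrative and not part of any Dataflow API.

```python
# vCPU rates quoted in the text above, in USD per vCPU-hour.
STANDARD_BATCH_VCPU_RATE = 0.056
FLEXRS_VCPU_RATE = 0.034


def discount_fraction(standard_rate: float, discounted_rate: float) -> float:
    """Return the saving of discounted_rate relative to standard_rate, as a fraction."""
    return (standard_rate - discounted_rate) / standard_rate


flexrs_discount = discount_fraction(STANDARD_BATCH_VCPU_RATE, FLEXRS_VCPU_RATE)
print(f"FlexRS vCPU discount: {flexrs_discount:.1%}")  # prints "FlexRS vCPU discount: 39.3%"
```

The exact value is a little under 40%, which is why the text rounds it.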
Dataflow Prime introduces per-DCU (Data Compute Unit) pricing at $0.086/DCU-hour. Instead of provisioning specific machine types, you specify processing capacity in DCUs and Dataflow automatically manages resource allocation, right-sizing, and vertical autoscaling.
Prime is ideal for variable workloads where manual worker sizing is difficult. It can reduce over-provisioning costs significantly.
Dataflow (Apache Beam) and Dataproc (Apache Spark) both handle large-scale data processing on GCP. Key differences:
The Dataflow Cost Estimator calculates monthly costs for both batch and streaming Apache Beam jobs running on Google Cloud Dataflow. It factors in worker vCPUs, memory, persistent disk, Streaming Engine, FlexRS (Flexible Resource Scheduling), and Dataflow Prime pricing. The tool helps you compare worker configurations and choose the most cost-effective execution mode.
Your team's nightly ETL job processes 500 GB of data and currently runs on 50 worker VMs (n1-standard-4) for 2 hours. Cost: ~$60/run × 30 runs = $1,800/month. The estimator models the alternatives. FlexRS cuts the bill by 40%, to $1,080/month. Streaming Engine isn't applicable because this is a batch job. Dataflow Prime with right-sized autoscaling reduces worker-hours by 30%, since the default workers are over-provisioned, which gives $1,260/month before the Prime premium and about $1,400/month after it. Best path: FlexRS at $1,080/month, a saving of $720/month.
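The comparison in this example can be reproduced with a few lines of Python. This is a rough sketch rather than the estimator's actual code: the variable names are made up for illustration, the FlexRS discount is applied to the whole bill as the example does, and the Prime worker-hour reduction is set to 30%, the value that reproduces the $1,260 and roughly $1,400 figures.

```python
# Inputs taken from the worked example above.
COST_PER_RUN = 60.0                  # USD per nightly run (approximate)
RUNS_PER_MONTH = 30
FLEXRS_DISCOUNT = 0.40               # flat discount, applied to the whole bill here
PRIME_WORKER_HOUR_REDUCTION = 0.30   # assumption: matches the $1,260 figure
PRIME_PREMIUM = 0.11                 # Prime premium quoted elsewhere on this page

baseline = COST_PER_RUN * RUNS_PER_MONTH

# Monthly cost of each execution mode, in USD.
scenarios = {
    "Standard batch": baseline,
    "FlexRS": baseline * (1 - FLEXRS_DISCOUNT),
    "Dataflow Prime": baseline * (1 - PRIME_WORKER_HOUR_REDUCTION) * (1 + PRIME_PREMIUM),
}

# Print the options from cheapest to most expensive.
for name, cost in sorted(scenarios.items(), key=lambda item: item[1]):
    print(f"{name:15s} ${cost:>9,.2f}/month  (saves ${baseline - cost:,.2f})")

best = min(scenarios, key=scenarios.get)
print(f"Cheapest option: {best}")
```

Changing the constants at the top is enough to model a different job, for example a run that only executes on weekdays.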
Streaming Engine should be the default for any streaming Dataflow pipeline in 2026. The legacy worker-based streaming model leaves CPU sitting idle most of the time waiting for shuffles. Streaming Engine bills per Streaming Compute Unit (SCU) — usually 30-50% cheaper than equivalent worker-based pricing for the same throughput.
FlexRS is the right choice for any nightly/scheduled batch job that can tolerate a delay window. It schedules workers when GCE has surplus capacity, saving up to 40%. The scheduling delay is at most 6 hours — typically much less. Don't use FlexRS for time-sensitive batch jobs (e.g., end-of-day reports needed by 9am).
Dataflow Prime adds vertical autoscaling (workers can be resized mid-job) and right-sizing recommendations on top of standard Dataflow. The 11% premium price is almost always worth it for batch jobs with variable per-stage resource requirements — the over-provisioning savings exceed the premium.
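Whether the premium pays off depends on how much right-sizing actually shrinks resource usage. A small sketch of the break-even point, assuming the premium is a flat multiplier on the whole bill and that savings scale linearly with the reduction in worker-hours (both are simplifications, and the function names are hypothetical):

```python
def break_even_reduction(premium: float) -> float:
    """Smallest fractional worker-hour reduction at which Prime stops costing more.

    Prime is cheaper when (1 - reduction) * (1 + premium) < 1,
    which rearranges to reduction > premium / (1 + premium).
    """
    return premium / (1 + premium)


def prime_is_cheaper(reduction: float, premium: float = 0.11) -> bool:
    """Return True if Prime costs less than standard Dataflow for this reduction."""
    return (1 - reduction) * (1 + premium) < 1


print(f"Break-even reduction: {break_even_reduction(0.11):.1%}")  # prints "Break-even reduction: 9.9%"
print(prime_is_cheaper(0.30))  # True: a 30% reduction outweighs the premium
print(prime_is_cheaper(0.05))  # False: a 5% reduction does not
```

Under these assumptions, Prime only needs to trim about 10% of worker-hours to cover an 11% premium.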
The estimator computes Dataflow cost across two execution models: classic (worker vCPU-hours × rate + worker memory GB-hours × rate + worker disk GB-month × rate) and Streaming Engine (Streaming Compute Units × per-SCU rate, which bundles compute and shuffle). FlexRS applies a flat discount (~40%) to the worker compute portion. Dataflow Prime adds an 11% premium across the board.
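The classic-model formula can be sketched in Python as follows. This is an illustration under stated assumptions, not the estimator's source: the `Rates` class and `classic_cost` function are hypothetical names, the memory and disk rates are placeholders (only the $0.056 vCPU rate comes from this page), disk usage is converted to GB-months using 730 hours per month, and the FlexRS discount is assumed to cover vCPU and memory but not disk.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730      # assumption: converts disk GB-hours to GB-months
FLEXRS_DISCOUNT = 0.40     # flat discount on the worker compute portion
PRIME_PREMIUM = 0.11       # premium applied to the whole bill


@dataclass(frozen=True)
class Rates:
    vcpu_hour: float        # USD per vCPU-hour
    memory_gb_hour: float   # USD per GB-hour of worker memory
    disk_gb_month: float    # USD per GB-month of persistent disk


# Only the vCPU rate is taken from this page; the other two are placeholders.
EXAMPLE_RATES = Rates(vcpu_hour=0.056, memory_gb_hour=0.0035, disk_gb_month=0.04)


def classic_cost(workers: int, vcpus: int, memory_gb: float, disk_gb: float,
                 hours: float, rates: Rates,
                 flexrs: bool = False, prime: bool = False) -> float:
    """Estimate the cost in USD of one job run under the classic worker model."""
    # Compute portion: vCPU-hours and memory GB-hours across all workers.
    compute = workers * hours * (vcpus * rates.vcpu_hour
                                 + memory_gb * rates.memory_gb_hour)
    # Disk portion: GB-months of persistent disk across all workers.
    disk = workers * disk_gb * (hours / HOURS_PER_MONTH) * rates.disk_gb_month
    if flexrs:
        compute *= 1 - FLEXRS_DISCOUNT
    total = compute + disk
    if prime:
        total *= 1 + PRIME_PREMIUM
    return total


# Hypothetical job: 10 workers, 4 vCPUs and 16 GB each, 100 GB disk, 5 hours.
standard = classic_cost(10, 4, 16.0, 100.0, 5.0, EXAMPLE_RATES)
with_flexrs = classic_cost(10, 4, 16.0, 100.0, 5.0, EXAMPLE_RATES, flexrs=True)
with_prime = classic_cost(10, 4, 16.0, 100.0, 5.0, EXAMPLE_RATES, prime=True)
print(f"standard ${standard:.2f}, FlexRS ${with_flexrs:.2f}, Prime ${with_prime:.2f}")
```

Keeping the rates in a separate object makes it easy to swap in current regional prices, or streaming rates, without touching the formula.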