Estimate Vertex AI costs for Gemini API tokens, custom training with GPUs, prediction endpoints, and AutoML.
Gemini 2.0 Flash is the most cost-effective option for high-volume workloads at $0.10 per 1M input tokens. Gemini 1.5 Pro offers the best quality for complex tasks at $1.25-$2.50 per 1M input tokens, depending on context length. Gemini 1.5 Flash balances speed and cost at $0.075 per 1M input tokens. Prompts exceeding 128K tokens are billed at long-context pricing (2x the standard rate).
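The input-token arithmetic above can be sketched in a few lines of Python. This is a rough illustration, not the estimator's actual implementation: the function and constant names are hypothetical, only input tokens are covered because the text quotes no output-token rates, and the sketch assumes the 2x long-context multiplier applies to every model listed, which the text does not spell out.

```python
# Input-token rates in USD per 1M tokens, copied from the text above.
INPUT_RATE_PER_M = {
    "gemini-2.0-flash": 0.10,
    "gemini-1.5-pro": 1.25,
    "gemini-1.5-flash": 0.075,
}

LONG_CONTEXT_THRESHOLD = 128_000   # prompts above this size use long-context pricing
LONG_CONTEXT_MULTIPLIER = 2.0      # "2x rates" per the text


def input_token_cost(model: str, prompt_tokens: int, requests: int = 1) -> float:
    """Estimate input-token cost in USD for `requests` prompts of equal size.

    Assumption: the long-context multiplier is applied per prompt and to
    every model in the table.
    """
    rate = INPUT_RATE_PER_M[model]
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        rate *= LONG_CONTEXT_MULTIPLIER
    total_tokens = prompt_tokens * requests
    return total_tokens / 1_000_000 * rate


if __name__ == "__main__":
    # Ten 100K-token prompts vs. five 200K-token prompts: same token count,
    # but the second set crosses the 128K threshold.
    print(input_token_cost("gemini-1.5-pro", 100_000, requests=10))
    print(input_token_cost("gemini-1.5-pro", 200_000, requests=5))
```

Both calls send 1M input tokens in total, but the second costs twice as much because each prompt crosses the long-context threshold.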
T4 ($0.35/hr): inference and light training. V100 ($2.48/hr): general-purpose training. A100 40GB ($2.95/hr): large model training with high memory bandwidth. A100 80GB ($3.67/hr): for models exceeding 40GB GPU memory. H100 ($12.24/hr): latest generation, best for LLM fine-tuning and large-scale training with up to 3x throughput over A100.
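As a rough sketch of how these hourly rates translate into a training bill, the snippet below multiplies rate, GPU count, and wall-clock hours. The dictionary keys and function name are hypothetical, and the figure covers the GPU charge only: host machine, storage, and any platform fees are assumed to be billed separately.

```python
# GPU rates in USD per GPU-hour, copied from the text above.
GPU_RATE_PER_HOUR = {
    "T4": 0.35,
    "V100": 2.48,
    "A100-40GB": 2.95,
    "A100-80GB": 3.67,
    "H100": 12.24,
}


def training_gpu_cost(gpu: str, gpu_count: int, hours: float) -> float:
    """Estimate the GPU portion of a training job in USD.

    Assumption: every GPU runs for the full duration of the job and the
    host VM is billed separately.
    """
    if gpu_count < 1 or hours < 0:
        raise ValueError("gpu_count must be >= 1 and hours must be >= 0")
    return GPU_RATE_PER_HOUR[gpu] * gpu_count * hours


if __name__ == "__main__":
    # Compare a 10-hour job on four A100 40GB GPUs with the same job on one H100.
    for gpu, count in [("A100-40GB", 4), ("H100", 1)]:
        print(f"{count} x {gpu}: ${training_gpu_cost(gpu, count, 10):.2f}")
```

A comparison like this only makes sense if the job duration is adjusted for each GPU's throughput; the example keeps the hours fixed for simplicity.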
Use preemptible VMs to save roughly 60-91% on training jobs that can tolerate interruption. Start with smaller machine types and scale up based on GPU utilization metrics. Use spot instances for hyperparameter tuning jobs. Consider distributed training across multiple smaller GPUs instead of a single large GPU for a better cost-performance ratio.
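To see what the quoted savings range means for a specific job, one possible sketch is shown below. The function name and the `rerun_overhead` parameter are hypothetical; the overhead is an assumed way to account for work repeated after an interruption and is not something the estimator is known to model.

```python
# Savings range for preemptible capacity, as quoted in the text above.
SPOT_DISCOUNT_RANGE = (0.60, 0.91)


def spot_cost_range(on_demand_cost: float, rerun_overhead: float = 0.0) -> tuple[float, float]:
    """Return (best_case, worst_case) cost in USD on preemptible capacity.

    rerun_overhead is the assumed fraction of extra compute spent redoing
    work lost to interruptions (0.1 means 10% more compute time).
    """
    low_discount, high_discount = SPOT_DISCOUNT_RANGE
    effective = on_demand_cost * (1.0 + rerun_overhead)
    best_case = effective * (1.0 - high_discount)
    worst_case = effective * (1.0 - low_discount)
    return best_case, worst_case


if __name__ == "__main__":
    best, worst = spot_cost_range(100.0, rerun_overhead=0.10)
    print(f"Estimated preemptible cost: ${best:.2f} to ${worst:.2f}")
```

Checkpointing frequently keeps the rerun overhead small, which is what makes the discount worthwhile for interruptible jobs.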
Configure min/max replicas to balance cost and latency. Set minimum replicas to handle baseline traffic without cold starts. Use scale-to-zero for development endpoints to avoid idle costs. Monitor CPU/GPU utilization to right-size machine types. Consider traffic splitting to gradually shift traffic to new model versions.
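The trade-off between minimum and maximum replicas can be approximated with a simple two-level model: the endpoint runs at its minimum replica count most of the time and at its maximum during peaks. The sketch below is an assumption-laden illustration; the function name, the `peak_fraction` parameter, and the node rate in the example are hypothetical, since the text gives no endpoint prices.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


def monthly_endpoint_cost(
    node_rate_per_hour: float,
    min_replicas: int,
    max_replicas: int,
    peak_fraction: float,
    hours: float = HOURS_PER_MONTH,
) -> float:
    """Estimate monthly endpoint cost in USD under a two-level traffic model.

    Assumption: the endpoint sits at max_replicas for peak_fraction of the
    time and at min_replicas otherwise. min_replicas=0 models scale-to-zero.
    """
    if not 0.0 <= peak_fraction <= 1.0:
        raise ValueError("peak_fraction must be between 0 and 1")
    if min_replicas < 0 or max_replicas < min_replicas:
        raise ValueError("require 0 <= min_replicas <= max_replicas")
    avg_replicas = min_replicas * (1.0 - peak_fraction) + max_replicas * peak_fraction
    return node_rate_per_hour * avg_replicas * hours


if __name__ == "__main__":
    rate = 1.00  # hypothetical node rate in USD/hour, for illustration only
    always_on = monthly_endpoint_cost(rate, min_replicas=1, max_replicas=3, peak_fraction=0.25)
    dev = monthly_endpoint_cost(rate, min_replicas=0, max_replicas=1, peak_fraction=0.10)
    print(f"Production-style endpoint: ${always_on:.2f}/month")
    print(f"Scale-to-zero dev endpoint: ${dev:.2f}/month")
```

Real autoscaling moves through intermediate replica counts, so treat the result as a rough bound rather than a forecast.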
The Vertex AI Cost Estimator helps you project monthly costs for Google Cloud's Vertex AI platform, including Gemini API token usage, custom model training with GPUs, online and batch prediction endpoints, and AutoML training. Each pricing dimension varies significantly, and this tool consolidates them into a single estimate with component-level breakdowns.
Disclaimer: This tool runs entirely in your browser. No data is sent to our servers. Always verify outputs before using them in production. AWS, Azure, and GCP are trademarks of their respective owners.