AI Services Across Clouds
Comprehensive comparison of AI services across AWS Bedrock, Azure OpenAI, and GCP Vertex AI covering models, APIs, RAG, embeddings, pricing, and architecture patterns.
Prerequisites
- Basic understanding of generative AI and LLMs
- Familiarity with at least one cloud provider
The AI Services Landscape Across Clouds
Every major cloud provider now offers managed AI services that let you build generative AI applications without managing GPU infrastructure. AWS has Amazon Bedrock, Azure has Azure OpenAI Service, and GCP has Vertex AI with Gemini. While they solve the same fundamental problem (connecting your applications to powerful foundation models), their approaches differ significantly in model selection, API design, enterprise features, pricing, and ecosystem integration.
Choosing between them depends on your existing cloud investment, specific model requirements, compliance needs, and budget. Many enterprises use multiple AI services simultaneously, leveraging each provider's strengths: Anthropic's Claude on Bedrock for complex reasoning, GPT-4o on Azure for Microsoft 365 integration, and Gemini on Vertex for multimodal workloads with native GCP integration.
This guide provides a comprehensive comparison across every dimension that matters for production AI deployments: available models, API interfaces, RAG capabilities, guardrails and safety, pricing, enterprise security, and real-world architecture patterns. Each section includes equivalent code samples so you can see exactly how the same task is accomplished on each platform.
This Comparison Is a Snapshot
AI services are evolving rapidly. Model availability, pricing, and features change monthly. This guide reflects the state as of early 2026. Always check the official documentation for the latest information. The architectural principles and comparison framework remain stable even as specific details change.
Model Availability Comparison
The most important difference between cloud AI services is which models they offer. Each platform provides a mix of first-party models (built by the cloud provider) and third-party models (from AI labs like Anthropic, Meta, and Mistral). Your model choice drives quality, cost, and capabilities.
Model Availability Matrix
| Model Provider | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Anthropic Claude | Claude 3.5 Sonnet, 3.5 Haiku, 3 Opus | Not available | Claude 3.5 Sonnet (via Model Garden) |
| OpenAI GPT | Not available | GPT-4o, GPT-4o mini, GPT-4 Turbo | Not available |
| Google Gemini | Not available | Not available | Gemini 2.0 Flash, 1.5 Pro, 1.5 Flash |
| Meta Llama | Llama 3.1 (8B, 70B, 405B) | Not available | Llama 3.1 (via Model Garden) |
| Mistral | Mistral Large, Small | Mistral Large (select regions) | Mistral (via Model Garden) |
| Amazon/First-Party | Titan Text, Titan Embeddings | N/A | N/A |
| Cohere | Command R, Command R+ | Not available | Not available |
| Stability AI | Stable Diffusion XL | Not available | Stable Diffusion (via Model Garden) |
Model Lock-In Consideration
If model flexibility is important, AWS Bedrock offers the widest selection of third-party models through a unified API. Azure OpenAI provides the deepest integration with OpenAI models but is limited to those models. GCP Vertex AI has Gemini as its primary offering with a broad Model Garden for open-source models. To avoid lock-in, abstract your AI calls behind a service layer that can switch providers.
API Design and Developer Experience
Each platform takes a different approach to API design, which affects how quickly you can build and how portable your code is across providers.
AWS Bedrock: Converse API
```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "What is the capital of France?"}],
        }
    ],
    system=[{"text": "Answer in one sentence."}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.3},
)

print(response["output"]["message"]["content"][0]["text"])
print(f"Tokens: {response['usage']['inputTokens']} in, {response['usage']['outputTokens']} out")
```
Azure OpenAI: Chat Completions API
```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="your-key",
    api_version="2024-10-21",
    azure_endpoint="https://my-resource.openai.azure.com",
)

response = client.chat.completions.create(
    model="gpt-4o",  # deployment name
    messages=[
        {"role": "system", "content": "Answer in one sentence."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.3,
)

print(response.choices[0].message.content)
print(f"Tokens: {response.usage.prompt_tokens} in, {response.usage.completion_tokens} out")
```
GCP Vertex AI: Gemini API
```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")

model = GenerativeModel(
    "gemini-2.0-flash",
    system_instruction="Answer in one sentence.",
)

response = model.generate_content(
    "What is the capital of France?",
    generation_config={"max_output_tokens": 256, "temperature": 0.3},
)

print(response.text)
print(f"Tokens: {response.usage_metadata.prompt_token_count} in, "
      f"{response.usage_metadata.candidates_token_count} out")
```
API Comparison Summary
| Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Unified API across models | Yes (Converse API) | Yes (OpenAI SDK) | Partial (Gemini-specific) |
| Streaming | ConverseStream | stream=True | stream=True |
| Tool use / Function calling | Yes (all models) | Yes (GPT-4o family) | Yes (Gemini family) |
| Multimodal (images) | Yes (Claude, Titan) | Yes (GPT-4o) | Yes (Gemini, native) |
| Video understanding | Limited | Limited | Native (Gemini) |
| SDK languages | Python, JS, Java, Go, .NET | Python, JS, .NET, Java, Go | Python, JS, Java, Go |
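The streaming row above deserves a note: all three SDKs deliver tokens as an iterator of deltas (Bedrock via ConverseStream, Azure OpenAI and Vertex AI via `stream=True`), and the application-side pattern is identical regardless of provider. The sketch below shows that accumulation pattern with a stubbed chunk list standing in for a real stream; `accumulate_stream` and the fake deltas are illustrative names, not SDK APIs.

```python
from typing import Callable, Iterable


def accumulate_stream(chunks: Iterable[str], on_delta: Callable[[str], None] = print) -> str:
    """Consume a stream of text deltas, invoking a callback per chunk,
    and return the fully assembled response."""
    parts = []
    for delta in chunks:
        if delta:  # providers may emit empty keep-alive chunks
            on_delta(delta)
            parts.append(delta)
    return "".join(parts)


# Stand-in for a real provider stream. With Azure OpenAI the deltas would come
# from `chunk.choices[0].delta.content`; with Vertex AI from `chunk.text`.
fake_stream = ["Paris ", "is the ", "capital of France."]
full_text = accumulate_stream(fake_stream, on_delta=lambda d: None)
print(full_text)  # Paris is the capital of France.
```

Keeping the accumulation logic provider-agnostic means only the thin adapter that extracts the delta from each SDK's chunk type changes when you switch providers.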
RAG and Knowledge Base Comparison
All three platforms provide managed RAG solutions that handle document ingestion, chunking, embedding, vector storage, and retrieval. The implementation details and flexibility differ significantly.
| RAG Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Managed RAG | Knowledge Bases | On Your Data | Vertex AI Search + Grounding |
| Vector stores | OpenSearch, Aurora, Pinecone, Redis | Azure AI Search | Vertex AI Vector Search, AlloyDB |
| Document sources | S3, web crawler, Confluence | Blob Storage, AI Search index | Cloud Storage, BigQuery, websites |
| Chunking strategies | Fixed, semantic, hierarchical | Fixed size | Fixed, semantic (via Vertex AI Search) |
| Hybrid search | Yes (OpenSearch) | Yes (AI Search) | Yes (Vertex AI Search) |
| Citation support | Yes (source attribution) | Yes (data references) | Yes (grounding metadata) |
| Web grounding | No | Bing Search integration | Google Search grounding |
Custom RAG vs. Managed RAG
Managed RAG solutions (Bedrock Knowledge Bases, Azure On Your Data, Vertex AI Search grounding) are fastest to set up but give you less control over chunking, retrieval strategy, and prompt construction. For production applications, most teams build custom RAG pipelines using the provider's vector store directly (OpenSearch, AI Search, Vector Search) combined with their own embedding and retrieval logic. This provides better quality at the cost of more engineering effort.
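To make the custom-pipeline trade-off concrete, here is a minimal sketch of the retrieval core such a pipeline implements: cosine similarity over pre-computed chunk embeddings, top-k selection, and prompt assembly. The toy 3-dimensional vectors and the `retrieve`/`build_prompt` helpers are illustrative stand-ins; in production the vectors would come from a real embedding model and live in a managed vector store.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, chunks, k=2):
    """Rank stored chunks by similarity to the query embedding, return top k."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(question, retrieved):
    """Assemble a grounded prompt from the retrieved chunks."""
    context = "\n\n".join(c["text"] for c in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy 3-dimensional vectors stand in for real 1024-dimension embeddings.
chunks = [
    {"text": "Bedrock supports Claude.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Vertex AI hosts Gemini.", "vector": [0.1, 0.9, 0.0]},
    {"text": "Azure hosts GPT-4o.", "vector": [0.2, 0.2, 0.9]},
]
top = retrieve([0.85, 0.15, 0.05], chunks, k=1)
print(top[0]["text"])  # Bedrock supports Claude.
```

Owning these three steps is exactly the control that managed RAG hides: you choose the similarity metric, the value of k, and how context is framed in the prompt.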
Embeddings Comparison
Embeddings are the foundation of RAG and semantic search. Each provider offers different embedding models with varying dimensions, quality, and pricing.
Side-by-Side Embedding Code
```python
# ── AWS Bedrock ──
import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": "Cloud computing fundamentals", "dimensions": 1024}),
)
embedding_aws = json.loads(response["body"].read())["embedding"]
print(f"Bedrock Titan: {len(embedding_aws)} dimensions")

# ── Azure OpenAI ──
from openai import AzureOpenAI

azure_client = AzureOpenAI(api_key="key", api_version="2024-10-21",
                           azure_endpoint="https://my.openai.azure.com")
response = azure_client.embeddings.create(
    model="text-embedding-3-large", input=["Cloud computing fundamentals"], dimensions=1024,
)
embedding_azure = response.data[0].embedding
print(f"Azure OpenAI: {len(embedding_azure)} dimensions")

# ── GCP Vertex AI ──
from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("text-embedding-005")
embeddings = model.get_embeddings(["Cloud computing fundamentals"], output_dimensionality=1024)
embedding_gcp = embeddings[0].values
print(f"Vertex AI: {len(embedding_gcp)} dimensions")
```
Embedding Model Comparison
| Model | Provider | Dimensions | Price per 1M tokens |
|---|---|---|---|
| Titan Embed Text v2 | AWS Bedrock | 256 / 512 / 1024 | $0.02 |
| text-embedding-3-large | Azure OpenAI | 256-3072 | $0.13 |
| text-embedding-3-small | Azure OpenAI | 512-1536 | $0.02 |
| text-embedding-005 | GCP Vertex AI | 768 (default) | $0.00001 |
GCP Has the Cheapest Embeddings
GCP's text-embedding-005 is orders of magnitude cheaper than competitors at $0.00001 per 1M tokens (effectively free for most workloads). If embeddings are a significant cost driver, consider using GCP for embedding generation even if your primary LLM is on another provider. The embedding vectors can be stored in any vector database, regardless of which cloud generated them.
Safety and Content Filtering
All three platforms provide content filtering to prevent harmful content generation. They differ in granularity, customizability, and approach.
| Safety Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Content filtering | Guardrails (customizable) | Built-in (configurable thresholds) | Safety Settings (per-category) |
| PII detection | Yes (anonymize or block) | No (use Azure AI services) | No (use DLP API) |
| Topic restrictions | Yes (deny topics) | No (prompt engineering) | No (prompt engineering) |
| Custom word filters | Yes (word lists) | Custom blocklists | No |
| Jailbreak detection | Yes (prompt attack filter) | Yes (jailbreak detection) | Partial (safety settings) |
| Independent from model | Yes (applies to any model) | Tied to deployment | Per-request configuration |
Pricing Comparison
Pricing is one of the most important factors in choosing an AI service, especially at scale. The following comparison shows per-token pricing for equivalent model tiers. Prices are approximate and change frequently.
Flagship Model Pricing (per 1M tokens)
| Tier | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Flagship (Input) | Claude Sonnet: $3.00 | GPT-4o: $2.50 | Gemini 1.5 Pro: $1.25 |
| Flagship (Output) | Claude Sonnet: $15.00 | GPT-4o: $10.00 | Gemini 1.5 Pro: $5.00 |
| Fast/Cheap (Input) | Claude Haiku: $0.25 | GPT-4o mini: $0.15 | Gemini 2.0 Flash: $0.10 |
| Fast/Cheap (Output) | Claude Haiku: $1.25 | GPT-4o mini: $0.60 | Gemini 2.0 Flash: $0.40 |
| Batch discount | Up to 50% off | 50% off (global batch) | 50% off (batch prediction) |
| Committed pricing | Provisioned Throughput | PTU (Provisioned Throughput Units) | Provisioned Throughput |
Hidden Costs
Token pricing is not the only cost. Consider: vector database costs (OpenSearch Serverless at $700/month minimum vs. AI Search at $250/month vs. Vertex AI Vector Search at ~$200/month), data transfer costs for cross-region or cross-service communication, and the cost of minimum instances for managed RAG services. For many production deployments, infrastructure costs exceed model invocation costs.
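To see how the list prices above compound at scale, here is a rough cost estimator built from the flagship pricing table. The per-token figures are the approximate early-2026 numbers quoted in this guide and will drift; treat the output as a planning estimate, not a quote.

```python
# Approximate per-1M-token list prices from the comparison table above;
# illustrative, not authoritative -- prices change frequently.
PRICES = {
    "claude-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gemini-1.5-pro": {"input": 1.25, "output": 5.00},
    "gemini-2.0-flash": {"input": 0.10, "output": 0.40},
}


def monthly_cost(model, requests_per_day, in_tokens, out_tokens, days=30):
    """Estimate monthly model spend in USD for a given request volume."""
    p = PRICES[model]
    per_request = (in_tokens * p["input"] + out_tokens * p["output"]) / 1_000_000
    return per_request * requests_per_day * days


# 10,000 requests/day, each with 2,000 input and 500 output tokens:
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 10_000, 2_000, 500):,.0f}/month")
```

At that volume, Claude Sonnet works out to roughly $4,050/month against about $120 for Gemini 2.0 Flash, which is why many teams route routine traffic to a fast tier and reserve flagship models for hard requests.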
Enterprise Security Comparison
Enterprise security features determine whether you can deploy AI services in regulated environments. All three platforms provide strong security, but the implementations differ.
| Security Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Private networking | VPC endpoints | Private Endpoints | VPC Service Controls |
| Authentication | IAM (SigV4) | Entra ID / API Key | IAM (OAuth2) |
| Encryption at rest | AWS KMS (CMK) | Azure Key Vault (CMK) | Cloud KMS (CMEK) |
| Data residency | Per region | Per region | Per region + data governance |
| SOC 2 | Yes | Yes | Yes |
| HIPAA | Yes (BAA) | Yes (BAA) | Yes (BAA) |
| FedRAMP | High (GovCloud) | High (Gov regions) | Moderate (select services) |
| Data used for training | No (opt-out by default) | No (not used) | No (not used) |
Architecture Patterns
Here are three common production architecture patterns. Each is optimized for a different cloud provider, but the underlying concepts apply across all three.
Pattern 1: Multi-Provider for Resilience
```python
# Abstract AI provider behind a common interface
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class AIResponse:
    content: str
    input_tokens: int
    output_tokens: int
    provider: str
    model: str


class AIProvider(ABC):
    @abstractmethod
    def complete(self, messages: list, max_tokens: int = 1024) -> AIResponse:
        pass


class BedrockProvider(AIProvider):
    def __init__(self):
        import boto3
        self.client = boto3.client("bedrock-runtime", region_name="us-east-1")

    def complete(self, messages, max_tokens=1024):
        response = self.client.converse(
            modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
            messages=[{"role": m["role"], "content": [{"text": m["content"]}]} for m in messages],
            inferenceConfig={"maxTokens": max_tokens},
        )
        return AIResponse(
            content=response["output"]["message"]["content"][0]["text"],
            input_tokens=response["usage"]["inputTokens"],
            output_tokens=response["usage"]["outputTokens"],
            provider="bedrock", model="claude-3.5-sonnet",
        )


class AzureOpenAIProvider(AIProvider):
    def __init__(self):
        from openai import AzureOpenAI
        self.client = AzureOpenAI(api_key="key", api_version="2024-10-21",
                                  azure_endpoint="https://my.openai.azure.com")

    def complete(self, messages, max_tokens=1024):
        response = self.client.chat.completions.create(
            model="gpt-4o", messages=messages, max_tokens=max_tokens,
        )
        return AIResponse(
            content=response.choices[0].message.content,
            input_tokens=response.usage.prompt_tokens,
            output_tokens=response.usage.completion_tokens,
            provider="azure", model="gpt-4o",
        )


class VertexProvider(AIProvider):
    def __init__(self):
        import vertexai
        from vertexai.generative_models import GenerativeModel
        vertexai.init(project="my-project", location="us-central1")
        self.model = GenerativeModel("gemini-2.0-flash")

    def complete(self, messages, max_tokens=1024):
        prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        response = self.model.generate_content(
            prompt, generation_config={"max_output_tokens": max_tokens},
        )
        return AIResponse(
            content=response.text,
            input_tokens=response.usage_metadata.prompt_token_count,
            output_tokens=response.usage_metadata.candidates_token_count,
            provider="vertex", model="gemini-2.0-flash",
        )


# Failover chain: try each provider in order until one succeeds
PROVIDERS = [BedrockProvider(), AzureOpenAIProvider(), VertexProvider()]


def complete_with_failover(messages, max_tokens=1024):
    for provider in PROVIDERS:
        try:
            return provider.complete(messages, max_tokens)
        except Exception as e:
            print(f"Provider {provider.__class__.__name__} failed: {e}")
    raise Exception("All providers failed")
```
Decision Framework
Use this decision framework to choose the right AI service for your use case.
Choose AWS Bedrock When...
You need access to Claude (Anthropic) models, your infrastructure is primarily on AWS, you want the widest selection of third-party models through a single API, you need built-in guardrails with PII filtering and topic restrictions, or you want seamless integration with S3 and other AWS services for RAG.
Choose Azure OpenAI When...
You need GPT-4o or other OpenAI models, you are already on Azure or use Microsoft 365, you want the most mature content filtering system, you need FedRAMP High compliance in government regions, or you want to leverage Azure AI Search's hybrid search capabilities for RAG.
Choose GCP Vertex AI When...
You need native multimodal understanding (video, audio, images), you want the lowest-cost option (Gemini Flash is cheapest), you need Google Search grounding for up-to-date answers, your data is in BigQuery and you want tight integration, or you want the best model evaluation and fine-tuning tools.
Start Small, Iterate Fast
Do not over-invest in architecture before proving your use case. Start with the cloud you already use, the fastest model to prototype with (Gemini Flash or GPT-4o mini), and a simple prompt-based approach. Add RAG only when you confirm the model needs external data. Add guardrails when you move to production. Optimize costs when you have traffic data. The best AI architecture is the one you can ship this week.
Migration Between Providers
Switching between AI providers is simpler than migrating most cloud services because the interface is essentially "text in, text out." The main challenges are adapting prompts (models respond differently to the same prompt), migrating RAG infrastructure (vector databases and document pipelines), and updating authentication.
Prompt adaptation: Each model family has different strengths and sensitivities. A prompt optimized for GPT-4o may produce different results on Claude or Gemini. Budget 2-3 days for prompt testing and tuning when switching models.
Embedding migration: You cannot mix embeddings from different models in the same vector index. If you switch embedding models, you must re-embed all documents. This is typically a batch job that takes hours for large corpora.
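The re-embedding job itself is usually a straightforward batched loop. The sketch below uses a stubbed `embed_batch` function standing in for whichever provider's embedding API you are migrating to; a production job would add retries, rate limiting, and checkpointing so it can resume after failures.

```python
def reembed_corpus(documents, embed_batch, batch_size=32):
    """Re-embed every document with the new model, in batches.

    `embed_batch` maps a list of texts to a list of vectors, hiding
    whichever provider's embedding API is actually being called.
    """
    reindexed = []
    for i in range(0, len(documents), batch_size):
        batch = documents[i : i + batch_size]
        vectors = embed_batch([d["text"] for d in batch])
        for doc, vec in zip(batch, vectors):
            reindexed.append({**doc, "vector": vec})
    return reindexed


# Stub embedder: a real job would call the new provider's embedding API here.
fake_embed = lambda texts: [[float(len(t))] for t in texts]
docs = [{"id": i, "text": "doc " * (i + 1)} for i in range(5)]
print(len(reembed_corpus(docs, fake_embed, batch_size=2)))  # 5
```

Because the old index and the new one are incompatible, the usual rollout is to build the new index side by side, verify retrieval quality, and then cut traffic over rather than re-embedding in place.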
Use abstraction layers: The multi-provider code example above shows how to abstract provider-specific code behind a common interface. This is the best investment you can make for portability and resilience.
Next Steps
After choosing your primary AI service, dive deep into the provider-specific guides for implementation details:
- AWS Bedrock: Building AI Applications
- Azure OpenAI Service Guide
- GCP Gemini & Vertex AI Guide
Key Takeaways
1. AWS Bedrock offers the widest third-party model selection through a unified Converse API.
2. Azure OpenAI provides the deepest OpenAI model integration with enterprise Azure security.
3. GCP Vertex AI has the cheapest pricing and native multimodal understanding with Gemini.
4. All three providers offer managed RAG solutions, but custom RAG provides better quality and control.
5. GCP has the cheapest embedding models; Bedrock has the most comprehensive guardrails.
6. Abstract your AI calls behind a provider-agnostic interface for portability and resilience.
Frequently Asked Questions
Which cloud AI service is the cheapest?
Can I use multiple cloud AI services together?
Which service is best for RAG?
How do I avoid vendor lock-in with AI services?
Do cloud AI services use my data for training?
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.