AI Services Across Clouds
Comprehensive comparison of AI services across AWS Bedrock, Azure OpenAI, and GCP Vertex AI covering models, APIs, RAG, embeddings, pricing, and architecture patterns.
Prerequisites
- Basic understanding of generative AI and LLMs
- Familiarity with at least one cloud provider
The AI Services Landscape Across Clouds
Every major cloud provider now offers managed AI services that let you build generative AI applications without managing GPU infrastructure. AWS has Amazon Bedrock, Azure has Azure OpenAI Service, and GCP has Vertex AI with Gemini. While they solve the same fundamental problem (connecting your applications to powerful foundation models), their approaches differ significantly in model selection, API design, enterprise features, pricing, and ecosystem integration.
Choosing between them depends on your existing cloud investment, specific model requirements, compliance needs, and budget. Many enterprises use multiple AI services simultaneously, leveraging each provider's strengths: Anthropic's Claude on Bedrock for complex reasoning, GPT-4o on Azure for Microsoft 365 integration, and Gemini on Vertex for multimodal workloads with native GCP integration.
This guide provides a comprehensive comparison across every dimension that matters for production AI deployments: available models, API interfaces, RAG capabilities, guardrails and safety, pricing, enterprise security, and real-world architecture patterns. Each section includes equivalent code samples so you can see exactly how the same task is accomplished on each platform.
This Comparison Is a Snapshot
AI services are evolving rapidly. Model availability, pricing, and features change monthly. This guide reflects the state as of early 2026. Always check the official documentation for the latest information. The architectural principles and comparison framework remain stable even as specific details change.
Model Availability Comparison
The most important difference between cloud AI services is which models they offer. Each platform provides a mix of first-party models (built by the cloud provider) and third-party models (from AI labs like Anthropic, Meta, and Mistral). Your model choice drives quality, cost, and capabilities.
Model Availability Matrix
| Model Provider | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Anthropic Claude | Claude 3.5 Sonnet, 3.5 Haiku, 3 Opus | Not available | Claude 3.5 Sonnet (via Model Garden) |
| OpenAI GPT | Not available | GPT-4o, GPT-4o mini, GPT-4 Turbo | Not available |
| Google Gemini | Not available | Not available | Gemini 2.0 Flash, 1.5 Pro, 1.5 Flash |
| Meta Llama | Llama 3.1 (8B, 70B, 405B) | Not available | Llama 3.1 (via Model Garden) |
| Mistral | Mistral Large, Small | Mistral Large (select regions) | Mistral (via Model Garden) |
| Amazon/First-Party | Titan Text, Titan Embeddings | N/A | N/A |
| Cohere | Command R, Command R+ | Not available | Not available |
| Stability AI | Stable Diffusion XL | Not available | Stable Diffusion (via Model Garden) |
Model Lock-In Consideration
If model flexibility is important, AWS Bedrock offers the widest selection of third-party models through a unified API. Azure OpenAI provides the deepest integration with OpenAI models but is limited to those models. GCP Vertex AI has Gemini as its primary offering with a broad Model Garden for open-source models. To avoid lock-in, abstract your AI calls behind a service layer that can switch providers.
API Design and Developer Experience
Each platform takes a different approach to API design, which affects how quickly you can build and how portable your code is across providers.
AWS Bedrock: Converse API
```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "What is the capital of France?"}],
        }
    ],
    system=[{"text": "Answer in one sentence."}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.3},
)

print(response["output"]["message"]["content"][0]["text"])
print(f"Tokens: {response['usage']['inputTokens']} in, {response['usage']['outputTokens']} out")
```
Azure OpenAI: Chat Completions API
```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="your-key",
    api_version="2024-10-21",
    azure_endpoint="https://my-resource.openai.azure.com",
)

response = client.chat.completions.create(
    model="gpt-4o",  # deployment name
    messages=[
        {"role": "system", "content": "Answer in one sentence."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.3,
)

print(response.choices[0].message.content)
print(f"Tokens: {response.usage.prompt_tokens} in, {response.usage.completion_tokens} out")
```
GCP Vertex AI: Gemini API
```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")

model = GenerativeModel(
    "gemini-2.0-flash",
    system_instruction="Answer in one sentence.",
)

response = model.generate_content(
    "What is the capital of France?",
    generation_config={"max_output_tokens": 256, "temperature": 0.3},
)

print(response.text)
print(f"Tokens: {response.usage_metadata.prompt_token_count} in, "
      f"{response.usage_metadata.candidates_token_count} out")
```
API Comparison Summary
| Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Unified API across models | Yes (Converse API) | Yes (OpenAI SDK) | Partial (Gemini-specific) |
| Streaming | ConverseStream | stream=True | stream=True |
| Tool use / Function calling | Yes (all models) | Yes (GPT-4o family) | Yes (Gemini family) |
| Multimodal (images) | Yes (Claude, Titan) | Yes (GPT-4o) | Yes (Gemini, native) |
| Video understanding | Limited | Limited | Native (Gemini) |
| SDK languages | Python, JS, Java, Go, .NET | Python, JS, .NET, Java, Go | Python, JS, Java, Go |
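The streaming row above deserves a note: all three SDKs deliver tokens as an iterator of deltas (Bedrock via ConverseStream, Azure OpenAI and Vertex AI via `stream=True`), and the application-side pattern is identical regardless of provider. The sketch below shows that accumulation pattern with a stubbed chunk list standing in for a real stream; `accumulate_stream` and the fake deltas are illustrative names, not SDK APIs.

```python
from typing import Callable, Iterable


def accumulate_stream(chunks: Iterable[str], on_delta: Callable[[str], None] = print) -> str:
    """Consume a stream of text deltas, invoking a callback per chunk,
    and return the fully assembled response."""
    parts = []
    for delta in chunks:
        if delta:  # providers may emit empty keep-alive chunks
            on_delta(delta)
            parts.append(delta)
    return "".join(parts)


# Stand-in for a real provider stream. With Azure OpenAI the deltas would come
# from `chunk.choices[0].delta.content`; with Vertex AI from `chunk.text`.
fake_stream = ["Paris ", "is the ", "capital of France."]
full_text = accumulate_stream(fake_stream, on_delta=lambda d: None)
print(full_text)  # Paris is the capital of France.
```

Keeping the accumulation logic provider-agnostic means only the thin adapter that extracts the delta from each SDK's chunk type changes when you switch providers.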
RAG and Knowledge Base Comparison
All three platforms provide managed RAG solutions that handle document ingestion, chunking, embedding, vector storage, and retrieval. The implementation details and flexibility differ significantly.
| RAG Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Managed RAG | Knowledge Bases | On Your Data | Vertex AI Search + Grounding |
| Vector stores | OpenSearch, Aurora, Pinecone, Redis | Azure AI Search | Vertex AI Vector Search, AlloyDB |
| Document sources | S3, web crawler, Confluence | Blob Storage, AI Search index | Cloud Storage, BigQuery, websites |
| Chunking strategies | Fixed, semantic, hierarchical | Fixed size | Fixed, semantic (via Vertex AI Search) |
| Hybrid search | Yes (OpenSearch) | Yes (AI Search) | Yes (Vertex AI Search) |
| Citation support | Yes (source attribution) | Yes (data references) | Yes (grounding metadata) |
| Web grounding | No | Bing Search integration | Google Search grounding |
Custom RAG vs. Managed RAG
Managed RAG solutions (Bedrock Knowledge Bases, Azure On Your Data, Vertex AI Search grounding) are fastest to set up but give you less control over chunking, retrieval strategy, and prompt construction. For production applications, most teams build custom RAG pipelines using the provider's vector store directly (OpenSearch, AI Search, Vector Search) combined with their own embedding and retrieval logic. This provides better quality at the cost of more engineering effort.
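To make the custom-pipeline trade-off concrete, here is a minimal sketch of the retrieval core such a pipeline implements: cosine similarity over pre-computed chunk embeddings, top-k selection, and prompt assembly. The toy 3-dimensional vectors and the `retrieve`/`build_prompt` helpers are illustrative stand-ins; in production the vectors would come from a real embedding model and live in a managed vector store.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, chunks, k=2):
    """Rank stored chunks by similarity to the query embedding, return top k."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(question, retrieved):
    """Assemble a grounded prompt from the retrieved chunks."""
    context = "\n\n".join(c["text"] for c in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy 3-dimensional vectors stand in for real 1024-dimension embeddings.
chunks = [
    {"text": "Bedrock supports Claude.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Vertex AI hosts Gemini.", "vector": [0.1, 0.9, 0.0]},
    {"text": "Azure hosts GPT-4o.", "vector": [0.2, 0.2, 0.9]},
]
top = retrieve([0.85, 0.15, 0.05], chunks, k=1)
print(top[0]["text"])  # Bedrock supports Claude.
```

Owning these three steps is exactly the control that managed RAG hides: you choose the similarity metric, the value of k, and how context is framed in the prompt.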
Embeddings Comparison
Embeddings are the foundation of RAG and semantic search. Each provider offers different embedding models with varying dimensions, quality, and pricing.
Side-by-Side Embedding Code
```python
# ── AWS Bedrock ──
import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": "Cloud computing fundamentals", "dimensions": 1024}),
)
embedding_aws = json.loads(response["body"].read())["embedding"]
print(f"Bedrock Titan: {len(embedding_aws)} dimensions")

# ── Azure OpenAI ──
from openai import AzureOpenAI

azure_client = AzureOpenAI(api_key="key", api_version="2024-10-21",
                           azure_endpoint="https://my.openai.azure.com")
response = azure_client.embeddings.create(
    model="text-embedding-3-large", input=["Cloud computing fundamentals"], dimensions=1024,
)
embedding_azure = response.data[0].embedding
print(f"Azure OpenAI: {len(embedding_azure)} dimensions")

# ── GCP Vertex AI ──
from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("text-embedding-005")
embeddings = model.get_embeddings(["Cloud computing fundamentals"], output_dimensionality=1024)
embedding_gcp = embeddings[0].values
print(f"Vertex AI: {len(embedding_gcp)} dimensions")
```
Embedding Model Comparison
| Model | Provider | Dimensions | Price per 1M tokens |
|---|---|---|---|
| Titan Embed Text v2 | AWS Bedrock | 256 / 512 / 1024 | $0.02 |
| text-embedding-3-large | Azure OpenAI | 256-3072 | $0.13 |
| text-embedding-3-small | Azure OpenAI | 512-1536 | $0.02 |
| text-embedding-005 | GCP Vertex AI | 768 (default) | $0.00001 |
GCP Has the Cheapest Embeddings
GCP's text-embedding-005 is orders of magnitude cheaper than competitors at $0.00001 per 1M tokens (effectively free for most workloads). If embeddings are a significant cost driver, consider using GCP for embedding generation even if your primary LLM is on another provider. The embedding vectors can be stored in any vector database, regardless of which cloud generated them.
Safety and Content Filtering
All three platforms provide content filtering to prevent harmful content generation. They differ in granularity, customizability, and approach.
| Safety Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Content filtering | Guardrails (customizable) | Built-in (configurable thresholds) | Safety Settings (per-category) |
| PII detection | Yes (anonymize or block) | No (use Azure AI services) | No (use DLP API) |
| Topic restrictions | Yes (deny topics) | No (prompt engineering) | No (prompt engineering) |
| Custom word filters | Yes (word lists) | Custom blocklists | No |
| Jailbreak detection | Yes (prompt attack filter) | Yes (jailbreak detection) | Partial (safety settings) |
| Independent from model | Yes (applies to any model) | Tied to deployment | Per-request configuration |
Pricing Comparison
Pricing is one of the most important factors in choosing an AI service, especially at scale. The following comparison shows per-token pricing for equivalent model tiers. Prices are approximate and change frequently.
Flagship Model Pricing (per 1M tokens)
| Tier | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Flagship (Input) | Claude Sonnet: $3.00 | GPT-4o: $2.50 | Gemini 1.5 Pro: $1.25 |
| Flagship (Output) | Claude Sonnet: $15.00 | GPT-4o: $10.00 | Gemini 1.5 Pro: $5.00 |
| Fast/Cheap (Input) | Claude Haiku: $0.25 | GPT-4o mini: $0.15 | Gemini 2.0 Flash: $0.10 |
| Fast/Cheap (Output) | Claude Haiku: $1.25 | GPT-4o mini: $0.60 | Gemini 2.0 Flash: $0.40 |
| Batch discount | Up to 50% off | 50% off (global batch) | 50% off (batch prediction) |
| Committed pricing | Provisioned Throughput | PTU (Provisioned Throughput Units) | Provisioned Throughput |
Hidden Costs
Token pricing is not the only cost. Consider: vector database costs (OpenSearch Serverless at $700/month minimum vs. AI Search at $250/month vs. Vertex AI Vector Search at ~$200/month), data transfer costs for cross-region or cross-service communication, and the cost of minimum instances for managed RAG services. For many production deployments, infrastructure costs exceed model invocation costs.
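To see how the list prices above compound at scale, here is a rough cost estimator built from the flagship pricing table. The per-token figures are the approximate early-2026 numbers quoted in this guide and will drift; treat the output as a planning estimate, not a quote.

```python
# Approximate per-1M-token list prices from the comparison table above;
# illustrative, not authoritative -- prices change frequently.
PRICES = {
    "claude-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gemini-1.5-pro": {"input": 1.25, "output": 5.00},
    "gemini-2.0-flash": {"input": 0.10, "output": 0.40},
}


def monthly_cost(model, requests_per_day, in_tokens, out_tokens, days=30):
    """Estimate monthly model spend in USD for a given request volume."""
    p = PRICES[model]
    per_request = (in_tokens * p["input"] + out_tokens * p["output"]) / 1_000_000
    return per_request * requests_per_day * days


# 10,000 requests/day, each with 2,000 input and 500 output tokens:
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 10_000, 2_000, 500):,.0f}/month")
```

At that volume, Claude Sonnet works out to roughly $4,050/month against about $120 for Gemini 2.0 Flash, which is why many teams route routine traffic to a fast tier and reserve flagship models for hard requests.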
Enterprise Security Comparison
Enterprise security features determine whether you can deploy AI services in regulated environments. All three platforms provide strong security, but the implementations differ.
| Security Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Private networking | VPC endpoints | Private Endpoints | VPC Service Controls |
| Authentication | IAM (SigV4) | Entra ID / API Key | IAM (OAuth2) |
| Encryption at rest | AWS KMS (CMK) | Azure Key Vault (CMK) | Cloud KMS (CMEK) |
| Data residency | Per region | Per region | Per region + data governance |
| SOC 2 | Yes | Yes | Yes |
| HIPAA | Yes (BAA) | Yes (BAA) | Yes (BAA) |
| FedRAMP | High (GovCloud) | High (Gov regions) | Moderate (select services) |
| Data used for training | No (opt-out by default) | No (not used) | No (not used) |
Architecture Patterns
Here are three common production architecture patterns. Each is optimized for a different cloud provider, but the underlying concepts apply across all three.
Pattern 1: Multi-Provider for Resilience
```python
# Abstract AI provider behind a common interface
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class AIResponse:
    content: str
    input_tokens: int
    output_tokens: int
    provider: str
    model: str


class AIProvider(ABC):
    @abstractmethod
    def complete(self, messages: list, max_tokens: int = 1024) -> AIResponse:
        pass


class BedrockProvider(AIProvider):
    def __init__(self):
        import boto3
        self.client = boto3.client("bedrock-runtime", region_name="us-east-1")

    def complete(self, messages, max_tokens=1024):
        response = self.client.converse(
            modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
            messages=[{"role": m["role"], "content": [{"text": m["content"]}]} for m in messages],
            inferenceConfig={"maxTokens": max_tokens},
        )
        return AIResponse(
            content=response["output"]["message"]["content"][0]["text"],
            input_tokens=response["usage"]["inputTokens"],
            output_tokens=response["usage"]["outputTokens"],
            provider="bedrock", model="claude-3.5-sonnet",
        )


class AzureOpenAIProvider(AIProvider):
    def __init__(self):
        from openai import AzureOpenAI
        self.client = AzureOpenAI(api_key="key", api_version="2024-10-21",
                                  azure_endpoint="https://my.openai.azure.com")

    def complete(self, messages, max_tokens=1024):
        response = self.client.chat.completions.create(
            model="gpt-4o", messages=messages, max_tokens=max_tokens,
        )
        return AIResponse(
            content=response.choices[0].message.content,
            input_tokens=response.usage.prompt_tokens,
            output_tokens=response.usage.completion_tokens,
            provider="azure", model="gpt-4o",
        )


class VertexProvider(AIProvider):
    def __init__(self):
        import vertexai
        from vertexai.generative_models import GenerativeModel
        vertexai.init(project="my-project", location="us-central1")
        self.model = GenerativeModel("gemini-2.0-flash")

    def complete(self, messages, max_tokens=1024):
        prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        response = self.model.generate_content(
            prompt, generation_config={"max_output_tokens": max_tokens},
        )
        return AIResponse(
            content=response.text,
            input_tokens=response.usage_metadata.prompt_token_count,
            output_tokens=response.usage_metadata.candidates_token_count,
            provider="vertex", model="gemini-2.0-flash",
        )


# Failover chain: try each provider in order until one succeeds
PROVIDERS = [BedrockProvider(), AzureOpenAIProvider(), VertexProvider()]


def complete_with_failover(messages, max_tokens=1024):
    for provider in PROVIDERS:
        try:
            return provider.complete(messages, max_tokens)
        except Exception as e:
            print(f"Provider {provider.__class__.__name__} failed: {e}")
    raise Exception("All providers failed")
```
Decision Framework
Use this decision framework to choose the right AI service for your use case.
Choose AWS Bedrock When...
You need access to Claude (Anthropic) models, your infrastructure is primarily on AWS, you want the widest selection of third-party models through a single API, you need built-in guardrails with PII filtering and topic restrictions, or you want seamless integration with S3 and other AWS services for RAG.
Choose Azure OpenAI When...
You need GPT-4o or other OpenAI models, you are already on Azure or use Microsoft 365, you want the most mature content filtering system, you need FedRAMP High compliance in government regions, or you want to leverage Azure AI Search's hybrid search capabilities for RAG.
Choose GCP Vertex AI When...
You need native multimodal understanding (video, audio, images), you want the lowest-cost option (Gemini Flash is cheapest), you need Google Search grounding for up-to-date answers, your data is in BigQuery and you want tight integration, or you want the best model evaluation and fine-tuning tools.
Start Small, Iterate Fast
Do not over-invest in architecture before proving your use case. Start with the cloud you already use, the fastest model to prototype with (Gemini Flash or GPT-4o mini), and a simple prompt-based approach. Add RAG only when you confirm the model needs external data. Add guardrails when you move to production. Optimize costs when you have traffic data. The best AI architecture is the one you can ship this week.
Migration Between Providers
Switching between AI providers is simpler than migrating most cloud services because the interface is essentially "text in, text out." The main challenges are adapting prompts (models respond differently to the same prompt), migrating RAG infrastructure (vector databases and document pipelines), and updating authentication.
Prompt adaptation: Each model family has different strengths and sensitivities. A prompt optimized for GPT-4o may produce different results on Claude or Gemini. Budget 2-3 days for prompt testing and tuning when switching models.
Embedding migration: You cannot mix embeddings from different models in the same vector index. If you switch embedding models, you must re-embed all documents. This is typically a batch job that takes hours for large corpora.
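The re-embedding job itself is usually a straightforward batched loop. The sketch below uses a stubbed `embed_batch` function standing in for whichever provider's embedding API you are migrating to; a production job would add retries, rate limiting, and checkpointing so it can resume after failures.

```python
def reembed_corpus(documents, embed_batch, batch_size=32):
    """Re-embed every document with the new model, in batches.

    `embed_batch` maps a list of texts to a list of vectors, hiding
    whichever provider's embedding API is actually being called.
    """
    reindexed = []
    for i in range(0, len(documents), batch_size):
        batch = documents[i : i + batch_size]
        vectors = embed_batch([d["text"] for d in batch])
        for doc, vec in zip(batch, vectors):
            reindexed.append({**doc, "vector": vec})
    return reindexed


# Stub embedder: a real job would call the new provider's embedding API here.
fake_embed = lambda texts: [[float(len(t))] for t in texts]
docs = [{"id": i, "text": "doc " * (i + 1)} for i in range(5)]
print(len(reembed_corpus(docs, fake_embed, batch_size=2)))  # 5
```

Because the old index and the new one are incompatible, the usual rollout is to build the new index side by side, verify retrieval quality, and then cut traffic over rather than re-embedding in place.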
Use abstraction layers: The multi-provider code example above shows how to abstract provider-specific code behind a common interface. This is the best investment you can make for portability and resilience.
Next Steps
After choosing your primary AI service, dive deep into the provider-specific guides for implementation details:
- AWS Bedrock: Building AI Applications
- Azure OpenAI Service Guide
- GCP Gemini & Vertex AI Guide
Key Takeaways
1. AWS Bedrock offers the widest third-party model selection through a unified Converse API.
2. Azure OpenAI provides the deepest OpenAI model integration with enterprise Azure security.
3. GCP Vertex AI has the cheapest pricing and native multimodal understanding with Gemini.
4. All three providers offer managed RAG solutions, but custom RAG provides better quality and control.
5. GCP has the cheapest embedding models; Bedrock has the most comprehensive guardrails.
6. Abstract your AI calls behind a provider-agnostic interface for portability and resilience.
Frequently Asked Questions
Which cloud AI service is the cheapest?
Can I use multiple cloud AI services together?
Which service is best for RAG?
How do I avoid vendor lock-in with AI services?
Do cloud AI services use my data for training?
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.