
AI Services Across Clouds

Comprehensive comparison of AI services across AWS Bedrock, Azure OpenAI, and GCP Vertex AI covering models, APIs, RAG, embeddings, pricing, and architecture patterns.

CloudToolStack Team · 24 min read · Published Mar 14, 2026

Prerequisites

  • Basic understanding of generative AI and LLMs
  • Familiarity with at least one cloud provider

The AI Services Landscape Across Clouds

Every major cloud provider now offers managed AI services that let you build generative AI applications without managing GPU infrastructure. AWS has Amazon Bedrock, Azure has Azure OpenAI Service, and GCP has Vertex AI with Gemini. While they solve the same fundamental problem (connecting your applications to powerful foundation models), their approaches differ significantly in model selection, API design, enterprise features, pricing, and ecosystem integration.

Choosing between them depends on your existing cloud investment, specific model requirements, compliance needs, and budget. Many enterprises use multiple AI services simultaneously, leveraging each provider's strengths: Anthropic's Claude on Bedrock for complex reasoning, GPT-4o on Azure for Microsoft 365 integration, and Gemini on Vertex for multimodal workloads with native GCP integration.

This guide provides a comprehensive comparison across every dimension that matters for production AI deployments: available models, API interfaces, RAG capabilities, guardrails and safety, pricing, enterprise security, and real-world architecture patterns. Each section includes equivalent code samples so you can see exactly how the same task is accomplished on each platform.

This Comparison Is a Snapshot

AI services are evolving rapidly. Model availability, pricing, and features change monthly. This guide reflects the state as of early 2026. Always check the official documentation for the latest information. The architectural principles and comparison framework remain stable even as specific details change.

Model Availability Comparison

The most important difference between cloud AI services is which models they offer. Each platform provides a mix of first-party models (built by the cloud provider) and third-party models (from AI labs like Anthropic, Meta, and Mistral). Your model choice drives quality, cost, and capabilities.

Model Availability Matrix

| Model Provider | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Anthropic Claude | Claude 3.5 Sonnet, Haiku, Opus | Not available | Claude 3.5 Sonnet (via Model Garden) |
| OpenAI GPT | Not available | GPT-4o, GPT-4o mini, GPT-4 Turbo | Not available |
| Google Gemini | Not available | Not available | Gemini 2.0 Flash, 1.5 Pro, 1.5 Flash |
| Meta Llama | Llama 3.1 (8B, 70B, 405B) | Not available | Llama 3.1 (via Model Garden) |
| Mistral | Mistral Large, Small | Mistral Large (select regions) | Mistral (via Model Garden) |
| Amazon/First-Party | Titan Text, Titan Embeddings | N/A | N/A |
| Cohere | Command R, Command R+ | Not available | Not available |
| Stability AI | Stable Diffusion XL | Not available | Stable Diffusion (via Model Garden) |

Model Lock-In Consideration

If model flexibility is important, AWS Bedrock offers the widest selection of third-party models through a unified API. Azure OpenAI provides the deepest integration with OpenAI models but is limited to those models. GCP Vertex AI has Gemini as its primary offering with a broad Model Garden for open-source models. To avoid lock-in, abstract your AI calls behind a service layer that can switch providers.

API Design and Developer Experience

Each platform takes a different approach to API design, which affects how quickly you can build and how portable your code is across providers.

AWS Bedrock: Converse API

python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "What is the capital of France?"}],
        }
    ],
    system=[{"text": "Answer in one sentence."}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.3},
)

print(response["output"]["message"]["content"][0]["text"])
print(f"Tokens: {response['usage']['inputTokens']} in, {response['usage']['outputTokens']} out")

Azure OpenAI: Chat Completions API

python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="your-key",
    api_version="2024-10-21",
    azure_endpoint="https://my-resource.openai.azure.com",
)

response = client.chat.completions.create(
    model="gpt-4o",  # deployment name
    messages=[
        {"role": "system", "content": "Answer in one sentence."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.3,
)

print(response.choices[0].message.content)
print(f"Tokens: {response.usage.prompt_tokens} in, {response.usage.completion_tokens} out")

GCP Vertex AI: Gemini API

python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")

model = GenerativeModel(
    "gemini-2.0-flash",
    system_instruction="Answer in one sentence.",
)

response = model.generate_content(
    "What is the capital of France?",
    generation_config={"max_output_tokens": 256, "temperature": 0.3},
)

print(response.text)
print(f"Tokens: {response.usage_metadata.prompt_token_count} in, "
      f"{response.usage_metadata.candidates_token_count} out")

API Comparison Summary

| Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Unified API across models | Yes (Converse API) | Yes (OpenAI SDK) | Partial (Gemini-specific) |
| Streaming | ConverseStream | stream=True | stream=True |
| Tool use / function calling | Yes (all models) | Yes (GPT-4o family) | Yes (Gemini family) |
| Multimodal (images) | Yes (Claude, Titan) | Yes (GPT-4o) | Yes (Gemini, native) |
| Video understanding | Limited | Limited | Native (Gemini) |
| SDK languages | Python, JS, Java, Go, .NET | Python, JS, .NET, Java, Go | Python, JS, Java, Go |
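All three platforms stream tokens, but the stream event shapes differ per SDK. The following is a minimal provider-agnostic sketch, assuming each SDK's stream has first been adapted to yield plain text chunks (the field paths in the comment reflect the current SDKs and may change):

```python
from typing import Callable, Iterator


def consume_stream(chunks: Iterator[str],
                   on_token: Callable[[str], None] = print) -> str:
    """Accumulate streamed text chunks into the full response.

    Adapt each provider's stream to yield plain text first, e.g.:
      - Bedrock ConverseStream: event["contentBlockDelta"]["delta"]["text"]
      - Azure OpenAI:           chunk.choices[0].delta.content
      - Vertex Gemini:          chunk.text
    """
    parts = []
    for chunk in chunks:
        on_token(chunk)   # e.g. flush to the UI as tokens arrive
        parts.append(chunk)
    return "".join(parts)
```

Keeping the accumulation logic provider-neutral means only the thin adapters change when you switch services.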

RAG and Knowledge Base Comparison

All three platforms provide managed RAG solutions that handle document ingestion, chunking, embedding, vector storage, and retrieval. The implementation details and flexibility differ significantly.

| RAG Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Managed RAG | Knowledge Bases | On Your Data | Vertex AI Search + Grounding |
| Vector stores | OpenSearch, Aurora, Pinecone, Redis | Azure AI Search | Vertex AI Vector Search, AlloyDB |
| Document sources | S3, web crawler, Confluence | Blob Storage, AI Search index | Cloud Storage, BigQuery, websites |
| Chunking strategies | Fixed, semantic, hierarchical | Fixed size | Fixed, semantic (via Vertex AI Search) |
| Hybrid search | Yes (OpenSearch) | Yes (AI Search) | Yes (Vertex AI Search) |
| Citation support | Yes (source attribution) | Yes (data references) | Yes (grounding metadata) |
| Web grounding | No | Bing Search integration | Google Search grounding |

Custom RAG vs. Managed RAG

Managed RAG solutions (Bedrock Knowledge Bases, Azure On Your Data, Vertex AI Search grounding) are fastest to set up but give you less control over chunking, retrieval strategy, and prompt construction. For production applications, most teams build custom RAG pipelines using the provider's vector store directly (OpenSearch, AI Search, Vector Search) combined with their own embedding and retrieval logic. This provides better quality at the cost of more engineering effort.
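As a concrete example of the control a custom pipeline gives you, here is a minimal fixed-size chunker with character overlap. The sizes are illustrative defaults, not provider recommendations; tune them against your retrieval quality metrics:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Overlap preserves context that would otherwise be cut at chunk
    boundaries. Managed RAG services make these choices for you;
    a custom pipeline lets you tune them per corpus.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

From here, custom pipelines typically swap in semantic or structure-aware splitting once fixed-size chunking is measured as the baseline.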

Embeddings Comparison

Embeddings are the foundation of RAG and semantic search. Each provider offers different embedding models with varying dimensions, quality, and pricing.

Side-by-Side Embedding Code

python
# ── AWS Bedrock ──
import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": "Cloud computing fundamentals", "dimensions": 1024}),
)
embedding_aws = json.loads(response["body"].read())["embedding"]
print(f"Bedrock Titan: {len(embedding_aws)} dimensions")

# ── Azure OpenAI ──
from openai import AzureOpenAI

azure_client = AzureOpenAI(api_key="key", api_version="2024-10-21",
                            azure_endpoint="https://my.openai.azure.com")
response = azure_client.embeddings.create(
    model="text-embedding-3-large", input=["Cloud computing fundamentals"], dimensions=1024,
)
embedding_azure = response.data[0].embedding
print(f"Azure OpenAI: {len(embedding_azure)} dimensions")

# ── GCP Vertex AI ──
from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("text-embedding-005")
embeddings = model.get_embeddings(["Cloud computing fundamentals"], output_dimensionality=1024)
embedding_gcp = embeddings[0].values
print(f"Vertex AI: {len(embedding_gcp)} dimensions")

Embedding Model Comparison

| Model | Provider | Dimensions | Price per 1M tokens |
|---|---|---|---|
| Titan Embed Text v2 | AWS Bedrock | 256 / 512 / 1024 | $0.02 |
| text-embedding-3-large | Azure OpenAI | 256-3072 | $0.13 |
| text-embedding-3-small | Azure OpenAI | 512-1536 | $0.02 |
| text-embedding-005 | GCP Vertex AI | 768 (default) | $0.00001 |

GCP Has the Cheapest Embeddings

GCP's text-embedding-005 is orders of magnitude cheaper than competitors at $0.00001 per 1M tokens (effectively free for most workloads). If embeddings are a significant cost driver, consider using GCP for embedding generation even if your primary LLM is on another provider. The embedding vectors can be stored in any vector database regardless of which cloud generated them.

Safety and Content Filtering

All three platforms provide content filtering to prevent harmful content generation. They differ in granularity, customizability, and approach.

| Safety Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Content filtering | Guardrails (customizable) | Built-in (configurable thresholds) | Safety Settings (per-category) |
| PII detection | Yes (anonymize or block) | No (use Azure AI services) | No (use DLP API) |
| Topic restrictions | Yes (deny topics) | No (prompt engineering) | No (prompt engineering) |
| Custom word filters | Yes (word lists) | Custom blocklists | No |
| Jailbreak detection | Yes (prompt attack filter) | Yes (jailbreak detection) | Partial (safety settings) |
| Independent from model | Yes (applies to any model) | Tied to deployment | Per-request configuration |

Pricing Comparison

Pricing is one of the most important factors in choosing an AI service, especially at scale. The following comparison shows per-token pricing for equivalent model tiers. Prices are approximate and change frequently.

Flagship Model Pricing (per 1M tokens)

| Tier | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Flagship (input) | Claude Sonnet: $3.00 | GPT-4o: $2.50 | Gemini 1.5 Pro: $1.25 |
| Flagship (output) | Claude Sonnet: $15.00 | GPT-4o: $10.00 | Gemini 1.5 Pro: $5.00 |
| Fast/cheap (input) | Claude Haiku: $0.25 | GPT-4o mini: $0.15 | Gemini 2.0 Flash: $0.10 |
| Fast/cheap (output) | Claude Haiku: $1.25 | GPT-4o mini: $0.60 | Gemini 2.0 Flash: $0.40 |
| Batch discount | Up to 50% off | 50% off (global batch) | 50% off (batch prediction) |
| Committed pricing | Provisioned Throughput | PTU (Provisioned Throughput Units) | Provisioned Throughput |
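Given the token counts that each API returns in its usage metadata, per-request cost is a straightforward multiplication against these per-million rates. A minimal sketch (prices copied from the table above and certain to drift; verify against current rate cards):

```python
# USD per 1M tokens; illustrative snapshot, not authoritative pricing
PRICING = {
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4o":            {"input": 2.50, "output": 10.00},
    "gemini-1.5-pro":    {"input": 1.25, "output": 5.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request from its token usage."""
    rates = PRICING[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000
```

For example, a GPT-4o call with 2,000 input and 500 output tokens costs about one cent; logging this per request is the cheapest way to catch cost regressions early.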

Hidden Costs

Token pricing is not the only cost. Consider: vector database costs (OpenSearch Serverless at $700/month minimum vs. AI Search at $250/month vs. Vertex AI Vector Search at ~$200/month), data transfer costs for cross-region or cross-service communication, and the cost of minimum instances for managed RAG services. For many production deployments, infrastructure costs exceed model invocation costs.

Enterprise Security Comparison

Enterprise security features determine whether you can deploy AI services in regulated environments. All three platforms provide strong security, but the implementations differ.

| Security Feature | AWS Bedrock | Azure OpenAI | GCP Vertex AI |
|---|---|---|---|
| Private networking | VPC endpoints | Private Endpoints | VPC Service Controls |
| Authentication | IAM (SigV4) | Entra ID / API key | IAM (OAuth2) |
| Encryption at rest | AWS KMS (CMK) | Azure Key Vault (CMK) | Cloud KMS (CMEK) |
| Data residency | Per region | Per region | Per region + data governance |
| SOC 2 | Yes | Yes | Yes |
| HIPAA | Yes (BAA) | Yes (BAA) | Yes (BAA) |
| FedRAMP | High (GovCloud) | High (Gov regions) | Moderate (select services) |
| Data used for training | No (opt-out by default) | No (not used) | No (not used) |

Architecture Patterns

The following production architecture pattern abstracts all three providers behind a common interface for resilience; the concepts apply whichever cloud is your primary.

Pattern 1: Multi-Provider for Resilience

python
# Abstract AI provider behind a common interface
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class AIResponse:
    content: str
    input_tokens: int
    output_tokens: int
    provider: str
    model: str

class AIProvider(ABC):
    @abstractmethod
    def complete(self, messages: list, max_tokens: int = 1024) -> AIResponse:
        pass

class BedrockProvider(AIProvider):
    def __init__(self):
        import boto3
        self.client = boto3.client("bedrock-runtime", region_name="us-east-1")

    def complete(self, messages, max_tokens=1024):
        response = self.client.converse(
            modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
            messages=[{"role": m["role"], "content": [{"text": m["content"]}]} for m in messages],
            inferenceConfig={"maxTokens": max_tokens},
        )
        return AIResponse(
            content=response["output"]["message"]["content"][0]["text"],
            input_tokens=response["usage"]["inputTokens"],
            output_tokens=response["usage"]["outputTokens"],
            provider="bedrock", model="claude-3.5-sonnet",
        )

class AzureOpenAIProvider(AIProvider):
    def __init__(self):
        from openai import AzureOpenAI
        self.client = AzureOpenAI(api_key="key", api_version="2024-10-21",
                                   azure_endpoint="https://my.openai.azure.com")

    def complete(self, messages, max_tokens=1024):
        response = self.client.chat.completions.create(
            model="gpt-4o", messages=messages, max_tokens=max_tokens,
        )
        return AIResponse(
            content=response.choices[0].message.content,
            input_tokens=response.usage.prompt_tokens,
            output_tokens=response.usage.completion_tokens,
            provider="azure", model="gpt-4o",
        )

class VertexProvider(AIProvider):
    def __init__(self):
        import vertexai
        from vertexai.generative_models import GenerativeModel
        vertexai.init(project="my-project", location="us-central1")
        self.model = GenerativeModel("gemini-2.0-flash")

    def complete(self, messages, max_tokens=1024):
        # Gemini uses its own Content/Part message types; flattening roles
        # into a single prompt string is a simplification for this example
        prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        response = self.model.generate_content(
            prompt, generation_config={"max_output_tokens": max_tokens},
        )
        return AIResponse(
            content=response.text,
            input_tokens=response.usage_metadata.prompt_token_count,
            output_tokens=response.usage_metadata.candidates_token_count,
            provider="vertex", model="gemini-2.0-flash",
        )

# Failover chain
PROVIDERS = [BedrockProvider(), AzureOpenAIProvider(), VertexProvider()]

def complete_with_failover(messages, max_tokens=1024):
    for provider in PROVIDERS:
        try:
            return provider.complete(messages, max_tokens)
        except Exception as e:
            print(f"Provider {provider.__class__.__name__} failed: {e}")
    raise Exception("All providers failed")

Decision Framework

Use this decision framework to choose the right AI service for your use case.

Choose AWS Bedrock When...

You need access to Claude (Anthropic) models, your infrastructure is primarily on AWS, you want the widest selection of third-party models through a single API, you need built-in guardrails with PII filtering and topic restrictions, or you want seamless integration with S3 and other AWS services for RAG.

Choose Azure OpenAI When...

You need GPT-4o or other OpenAI models, you are already on Azure or use Microsoft 365, you want the most mature content filtering system, you need FedRAMP High compliance in government regions, or you want to leverage Azure AI Search's hybrid search capabilities for RAG.

Choose GCP Vertex AI When...

You need native multimodal understanding (video, audio, images), you want the lowest-cost option (Gemini Flash is cheapest), you need Google Search grounding for up-to-date answers, your data is in BigQuery and you want tight integration, or you want the best model evaluation and fine-tuning tools.

Start Small, Iterate Fast

Do not over-invest in architecture before proving your use case. Start with the cloud you already use, the fastest model to prototype with (Gemini Flash or GPT-4o mini), and a simple prompt-based approach. Add RAG only when you confirm the model needs external data. Add guardrails when you move to production. Optimize costs when you have traffic data. The best AI architecture is the one you can ship this week.

Migration Between Providers

Switching between AI providers is simpler than migrating most cloud services because the interface is essentially "text in, text out." The main challenges are adapting prompts (models respond differently to the same prompt), migrating RAG infrastructure (vector databases and document pipelines), and updating authentication.

Prompt adaptation: Each model family has different strengths and sensitivities. A prompt optimized for GPT-4o may produce different results on Claude or Gemini. Budget 2-3 days for prompt testing and tuning when switching models.
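One way to contain this adaptation cost is to keep prompts in a per-model template registry, so switching models means editing templates rather than application code. A hypothetical sketch (the wording and model keys are illustrative; the XML-tag style for Claude follows Anthropic's published prompting guidance):

```python
# Hypothetical per-model prompt registry. Unlisted models fall back to
# a default template, so adding a provider never requires code changes.
PROMPT_TEMPLATES = {
    "default": "Summarize the following document in {n} bullet points:\n\n{text}",
    "claude-3.5-sonnet": (
        "You are a precise summarizer. Produce exactly {n} bullet points "
        "covering the document below:\n\n<document>\n{text}\n</document>"
    ),
}


def render_prompt(model: str, **kwargs) -> str:
    """Render the prompt tuned for a model, falling back to the default."""
    template = PROMPT_TEMPLATES.get(model, PROMPT_TEMPLATES["default"])
    return template.format(**kwargs)
```

Pairing each template with a small evaluation set makes the 2-3 days of prompt tuning repeatable instead of ad hoc.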

Embedding migration: You cannot mix embeddings from different models in the same vector index. If you switch embedding models, you must re-embed all documents. This is typically a batch job that takes hours for large corpora.
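Structurally, that batch job is just a loop over your corpus with the new model's embedding call plugged in. A sketch with embed_fn as a stand-in for whichever provider call you migrate to (the callable signature here is an assumption, not a real SDK interface):

```python
from typing import Callable, Iterable, Iterator


def reembed(docs: Iterable[str],
            embed_fn: Callable[[list[str]], list[list[float]]],
            batch_size: int = 100) -> Iterator[tuple[str, list[float]]]:
    """Re-embed every document with the NEW model, in batches.

    Write the results to a fresh vector index and cut traffic over only
    when the full corpus is done; never mix old and new vectors.
    """
    batch: list[str] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield from zip(batch, embed_fn(batch))
            batch = []
    if batch:  # flush the final partial batch
        yield from zip(batch, embed_fn(batch))
```

Batching matters because embedding endpoints bill and rate-limit per request; for large corpora, the providers' batch/async APIs (at their roughly 50% discount) are the natural fit for this job.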

Use abstraction layers: The multi-provider code example above shows how to abstract provider-specific code behind a common interface. This is the best investment you can make for portability and resilience.

Next Steps

After choosing your primary AI service, dive deep into the provider-specific guides for implementation details:

  • AWS Bedrock: Building AI Applications
  • Azure OpenAI Service Guide
  • GCP Gemini & Vertex AI Guide

Key Takeaways

  1. AWS Bedrock offers the widest third-party model selection through a unified Converse API.
  2. Azure OpenAI provides the deepest OpenAI model integration with enterprise Azure security.
  3. GCP Vertex AI has the cheapest pricing and native multimodal understanding with Gemini.
  4. All three providers offer managed RAG solutions, but custom RAG provides better quality and control.
  5. GCP has the cheapest embedding models; Bedrock has the most comprehensive guardrails.
  6. Abstract your AI calls behind a provider-agnostic interface for portability and resilience.

Frequently Asked Questions

Which cloud AI service is the cheapest?
GCP Vertex AI is generally the cheapest: Gemini 2.0 Flash costs $0.10/$0.40 per million tokens (input/output) in its tier, and text-embedding-005 is effectively free. However, total cost includes vector store infrastructure, data transfer, and engineering effort, which can outweigh model costs.
Can I use multiple cloud AI services together?
Yes. Many enterprises use Bedrock for Claude-specific tasks, Azure OpenAI for Microsoft 365 integration, and Vertex AI for multimodal workloads. Abstract provider-specific code behind a common interface and implement failover across providers for resilience.
Which service is best for RAG?
Azure OpenAI with Azure AI Search offers the best managed RAG experience with hybrid search. AWS Bedrock Knowledge Bases provide the most flexible chunking strategies. GCP Vertex AI Search offers Google Search grounding for web-augmented answers. For custom RAG, all three are comparable.
How do I avoid vendor lock-in with AI services?
Use the provider's standard SDK but wrap calls in an abstraction layer. Avoid provider-specific features (agents, knowledge bases) until you've proven your use case. Store embeddings in a portable vector database format. Keep prompts in a template system that can be adapted per model.
Do cloud AI services use my data for training?
No. All three providers (AWS Bedrock, Azure OpenAI, GCP Vertex AI) explicitly state that customer data is not used to train or improve foundation models. Your data stays within your cloud account and is encrypted in transit and at rest.

Written by CloudToolStack Team

Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.

Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.