AWS Bedrock: Building AI Applications
Comprehensive guide to Amazon Bedrock covering model access, agents, knowledge bases, RAG, guardrails, fine-tuning, and production architecture patterns.
Prerequisites
- AWS account with Bedrock access enabled
- Basic understanding of LLMs and prompt engineering
- Familiarity with AWS IAM
Introduction to AWS Bedrock
Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI companies through a single API. Rather than training models from scratch or managing GPU infrastructure, Bedrock lets you build generative AI applications by selecting a model, customizing it with your data, and integrating it into your application using familiar AWS tools. Bedrock supports models from Anthropic (Claude), Amazon (Titan), Meta (Llama), Mistral, Cohere, Stability AI, and AI21 Labs.
What makes Bedrock compelling for enterprise teams is its integration with the broader AWS ecosystem. Your data never leaves your AWS account, models are accessed through private VPC endpoints, and you can use IAM policies to control who can invoke which models. Bedrock also provides built-in features for responsible AI, including Guardrails to filter harmful content, and Knowledge Bases for retrieval-augmented generation (RAG) that ground model responses in your own data.
This guide covers everything you need to go from zero to production with Bedrock: enabling model access, invoking models programmatically, building conversational agents, setting up knowledge bases for RAG, configuring guardrails, and optimizing costs. Every section includes working CLI commands and code samples you can run immediately.
Bedrock Pricing Model
Bedrock uses pay-per-use pricing based on input and output tokens. There are no upfront commitments or minimum fees. For example, Claude 3.5 Sonnet costs $3.00 per million input tokens and $15.00 per million output tokens. You can also purchase Provisioned Throughput for predictable workloads at a discount. Always check the current pricing page since model costs change frequently.
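To make the arithmetic concrete, per-request cost is just tokens divided by a million, times the rate. A quick sketch using the Sonnet rates quoted above (treat them as placeholders and check the current pricing page):

```python
# Rough per-request cost estimate from per-million-token rates.
# Rates below match the Claude 3.5 Sonnet example above; verify against
# the current Bedrock pricing page before relying on them.
INPUT_RATE_PER_M = 3.00    # USD per million input tokens
OUTPUT_RATE_PER_M = 15.00  # USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of one invocation."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# A 2,000-token prompt producing a 500-token answer:
print(f"${estimate_cost(2000, 500):.4f}")  # -> $0.0135
```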
Enabling Model Access
Before you can invoke any foundation model, you must explicitly request access in the Bedrock console. This is a one-time setup per model per region. AWS requires this step because some model providers have specific usage policies and terms of service you must accept.
Navigate to the Amazon Bedrock console, select "Model access" from the left sidebar, and click "Manage model access." Check the boxes for the models you want to use and submit the request. Most models are approved instantly, but some (like certain Anthropic or Meta models) may require a brief review period.
# List all available foundation models
aws bedrock list-foundation-models \
--query 'modelSummaries[].{Provider:providerName, Model:modelId, Input:inputModalities, Output:outputModalities}' \
--output table
# Check which models you have access to
aws bedrock list-foundation-models \
--query 'modelSummaries[?modelLifecycle.status==`ACTIVE`].{Model:modelId, Provider:providerName}' \
--output table
# Get details about a specific model
aws bedrock get-foundation-model \
--model-identifier anthropic.claude-3-5-sonnet-20241022-v2:0

Region Availability
Not all models are available in every AWS region. Claude models are generally available in us-east-1, us-west-2, and eu-west-1. Check the Bedrock documentation for the latest region-model availability matrix. If you need a specific model, ensure you are working in a supported region before building your application.
Invoking Models with the API
Bedrock provides two primary APIs for model invocation: InvokeModel for synchronous single-turn requests and Converse for multi-turn conversations. The Converse API is the recommended approach because it provides a unified interface across all models, handles message formatting automatically, and supports tool use (function calling) natively.
# Simple invocation using the Converse API (recommended)
aws bedrock-runtime converse \
--model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
--messages '[{
"role": "user",
"content": [{"text": "Explain the difference between S3 and EBS in two sentences."}]
}]' \
--inference-config '{"maxTokens": 256, "temperature": 0.3}'
# Stream responses for real-time output
aws bedrock-runtime converse-stream \
--model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
--messages '[{
"role": "user",
"content": [{"text": "Write a Python function to list all S3 buckets."}]
}]' \
--inference-config '{"maxTokens": 1024, "temperature": 0.2}'

Python SDK Integration
import boto3
import json
# Initialize the Bedrock Runtime client
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
# Using the Converse API (recommended for all models)
response = bedrock.converse(
modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
messages=[
{
"role": "user",
"content": [
{"text": "What are the top 3 AWS services for building serverless applications?"}
],
}
],
inferenceConfig={
"maxTokens": 512,
"temperature": 0.3,
"topP": 0.9,
},
system=[{"text": "You are a helpful AWS solutions architect. Be concise and specific."}],
)
# Extract the response text
output_message = response["output"]["message"]
print(output_message["content"][0]["text"])
# Check token usage
usage = response["usage"]
print(f"Input tokens: {usage['inputTokens']}, Output tokens: {usage['outputTokens']}")

Use the Converse API, Not InvokeModel
The older InvokeModel API requires model-specific request/response formatting (each provider has a different JSON schema). The Converse API uses a unified format across all models, making it easy to switch between providers without changing your code. Always prefer Converse for new projects.
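Because the request shape never changes, switching providers is a one-line change. A sketch (the non-Anthropic model ID is an example; confirm availability in your region):

```python
def build_converse_request(model_id: str, question: str) -> dict:
    """Same request shape for every provider; only modelId changes."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": question}]}],
        "inferenceConfig": {"maxTokens": 256, "temperature": 0.3},
    }

def ask(client, model_id: str, question: str) -> str:
    response = client.converse(**build_converse_request(model_id, question))
    return response["output"]["message"]["content"][0]["text"]

# Usage (requires AWS credentials and model access):
# bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
# for model_id in ["anthropic.claude-3-5-sonnet-20241022-v2:0",
#                  "meta.llama3-70b-instruct-v1:0"]:
#     print(ask(bedrock, model_id, "Name one AWS storage service."))
```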
Building Conversational Agents
Bedrock Agents extend foundation models with the ability to take actions. An agent can break down a user request into steps, call external APIs or Lambda functions to gather information, and synthesize a final response. This is essential for building applications that go beyond simple question-answering, such as customer service bots that can look up orders, IT assistants that can query monitoring systems, or data analysts that can run SQL queries.
Agents use a technique called ReAct (Reasoning and Acting) where the model reasons about what to do, executes an action, observes the result, and repeats until the task is complete. You define the available actions as Action Groups, each backed by a Lambda function or an OpenAPI schema.
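Once an agent is created, prepared, and aliased (the steps shown below), you call it through the bedrock-agent-runtime client, which streams the reply back as chunk events. A minimal sketch with placeholder IDs:

```python
def assemble_completion(events) -> str:
    """Join the byte chunks from an InvokeAgent response event stream."""
    return "".join(
        event["chunk"]["bytes"].decode("utf-8")
        for event in events
        if "chunk" in event
    )

def invoke_agent(runtime_client, agent_id, alias_id, session_id, prompt):
    """Call a prepared agent alias and return the assembled reply."""
    response = runtime_client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,  # reuse the same ID to keep conversation state
        inputText=prompt,
    )
    return assemble_completion(response["completion"])

# Usage (placeholder IDs):
# runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
# print(invoke_agent(runtime, "AGENTID123", "ALIASID456", "session-1",
#                    "Which EC2 instances are running?"))
```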
import boto3
import json
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")
# Create an agent
response = bedrock_agent.create_agent(
agentName="cloud-ops-assistant",
agentResourceRoleArn="arn:aws:iam::123456789012:role/BedrockAgentRole",
foundationModel="anthropic.claude-3-5-sonnet-20241022-v2:0",
instruction="""You are a cloud operations assistant. You help engineers
check the status of their AWS resources, look up CloudWatch metrics,
and provide recommendations for cost optimization. Always verify
information before making recommendations.""",
idleSessionTTLInSeconds=600,
)
agent_id = response["agent"]["agentId"]
print(f"Agent created: {agent_id}")
# Define an action group with an OpenAPI schema
bedrock_agent.create_agent_action_group(
agentId=agent_id,
agentVersion="DRAFT",
actionGroupName="resource-lookup",
actionGroupExecutor={
"lambda": "arn:aws:lambda:us-east-1:123456789012:function:resource-lookup"
},
apiSchema={
"payload": json.dumps({
"openapi": "3.0.0",
"info": {"title": "Resource Lookup API", "version": "1.0"},
"paths": {
"/instances": {
"get": {
"summary": "List EC2 instances",
"operationId": "listInstances",
"parameters": [
{
"name": "state",
"in": "query",
"schema": {"type": "string"},
"description": "Filter by instance state (running, stopped)"
}
],
"responses": {"200": {"description": "List of instances"}}
}
}
}
})
},
)
# Prepare and create an agent alias for invocation
bedrock_agent.prepare_agent(agentId=agent_id)
# After preparation completes:
bedrock_agent.create_agent_alias(
agentId=agent_id,
agentAliasName="production"
)

Knowledge Bases and RAG
Retrieval-Augmented Generation (RAG) is the most important pattern for enterprise AI applications. Instead of relying solely on a model's training data, RAG retrieves relevant documents from your own data sources and includes them in the prompt. This grounds the model's responses in your actual documentation, reducing hallucinations and ensuring answers reflect your organization's specific context.
Bedrock Knowledge Bases automate the entire RAG pipeline: ingesting documents from S3, chunking them into appropriate segments, generating vector embeddings, storing them in a vector database, and retrieving relevant chunks at query time. Supported vector stores include Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL (with pgvector), Pinecone, and Redis Enterprise Cloud.
import boto3
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")
# Create a knowledge base with OpenSearch Serverless
response = bedrock_agent.create_knowledge_base(
name="company-docs-kb",
description="Internal documentation and runbooks",
roleArn="arn:aws:iam::123456789012:role/BedrockKBRole",
knowledgeBaseConfiguration={
"type": "VECTOR",
"vectorKnowledgeBaseConfiguration": {
"embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
},
},
storageConfiguration={
"type": "OPENSEARCH_SERVERLESS",
"opensearchServerlessConfiguration": {
"collectionArn": "arn:aws:aoss:us-east-1:123456789012:collection/abc123",
"fieldMapping": {
"metadataField": "metadata",
"textField": "text",
"vectorField": "vector",
},
"vectorIndexName": "company-docs-index",
},
},
)
kb_id = response["knowledgeBase"]["knowledgeBaseId"]
# Add an S3 data source
bedrock_agent.create_data_source(
knowledgeBaseId=kb_id,
name="docs-s3-source",
dataSourceConfiguration={
"type": "S3",
"s3Configuration": {
"bucketArn": "arn:aws:s3:::company-documentation",
"inclusionPrefixes": ["runbooks/", "architecture/", "guides/"],
},
},
vectorIngestionConfiguration={
"chunkingConfiguration": {
"chunkingStrategy": "FIXED_SIZE",
"fixedSizeChunkingConfiguration": {
"maxTokens": 512,
"overlapPercentage": 15,
},
}
},
)
# Start ingestion (sync documents into the vector store)
bedrock_agent.start_ingestion_job(
knowledgeBaseId=kb_id,
dataSourceId="data-source-id"
)

Querying the Knowledge Base
# Query the knowledge base directly (retrieve + generate)
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
response = bedrock_agent_runtime.retrieve_and_generate(
input={"text": "What is our disaster recovery procedure for the payments service?"},
retrieveAndGenerateConfiguration={
"type": "KNOWLEDGE_BASE",
"knowledgeBaseConfiguration": {
"knowledgeBaseId": kb_id,
"modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
"retrievalConfiguration": {
"vectorSearchConfiguration": {
"numberOfResults": 5,
"overrideSearchType": "HYBRID",
}
},
},
},
)
print(response["output"]["text"])
# Print source citations
for citation in response.get("citations", []):
for ref in citation.get("retrievedReferences", []):
source = ref["location"]["s3Location"]["uri"]
print(f" Source: {source}")

Chunking Strategy Matters
The chunking strategy significantly impacts RAG quality. Fixed-size chunking (300-500 tokens with 10-20% overlap) works for general documents. For structured documents like API references or runbooks, use semantic chunking which splits on natural boundaries like headings and paragraphs. Bedrock also supports hierarchical chunking that creates parent-child relationships between chunks for better context retrieval.
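For reference, here is roughly what the alternative strategies look like as vectorIngestionConfiguration payloads. Field names follow the bedrock-agent CreateDataSource API at the time of writing; verify them against the current documentation before use:

```python
# Semantic chunking: split on natural meaning boundaries instead of
# fixed token counts.
semantic_chunking = {
    "chunkingConfiguration": {
        "chunkingStrategy": "SEMANTIC",
        "semanticChunkingConfiguration": {
            "maxTokens": 512,
            "bufferSize": 1,                      # sentences of surrounding context
            "breakpointPercentileThreshold": 90,  # higher = fewer, larger chunks
        },
    }
}

# Hierarchical chunking: large parent chunks provide context, small child
# chunks are what the vector search actually matches against.
hierarchical_chunking = {
    "chunkingConfiguration": {
        "chunkingStrategy": "HIERARCHICAL",
        "hierarchicalChunkingConfiguration": {
            "levelConfigurations": [{"maxTokens": 1500}, {"maxTokens": 300}],
            "overlapTokens": 60,
        },
    }
}
```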
Guardrails for Responsible AI
Bedrock Guardrails let you implement safeguards for your generative AI applications. Guardrails sit between the user and the model, filtering both inputs (prompts) and outputs (responses) based on configurable policies. You can block harmful content, filter sensitive information like PII, enforce topic boundaries, and apply custom word filters.
Guardrails are essential for production applications where you need to ensure the model does not discuss off-topic subjects, reveal sensitive data, or generate inappropriate content. You define a guardrail once and apply it to any model invocation.
bedrock = boto3.client("bedrock", region_name="us-east-1")
# Create a guardrail
response = bedrock.create_guardrail(
name="customer-service-guardrail",
description="Guardrail for customer-facing AI assistant",
# Content filters for harmful categories
contentPolicyConfig={
"filtersConfig": [
{"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "MISCONDUCT", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
]
},
# Block specific topics
topicPolicyConfig={
"topicsConfig": [
{
"name": "competitor-discussion",
"definition": "Discussions comparing our products to competitors or recommending competitor products",
"examples": [
"Is CompetitorX better than our product?",
"Should I switch to CompetitorY?",
],
"type": "DENY",
},
{
"name": "financial-advice",
"definition": "Providing specific financial, investment, or tax advice",
"examples": [
"Should I invest in stocks?",
"What tax deductions can I claim?",
],
"type": "DENY",
},
]
},
# Filter PII from inputs and outputs
sensitiveInformationPolicyConfig={
"piiEntitiesConfig": [
{"type": "EMAIL", "action": "ANONYMIZE"},
{"type": "PHONE", "action": "ANONYMIZE"},
{"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
{"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"},
]
},
# Custom blocked words
wordPolicyConfig={
"wordsConfig": [
{"text": "internal-secret-project"},
],
"managedWordListsConfig": [
{"type": "PROFANITY"},
],
},
blockedInputMessaging="I cannot process this request. Please rephrase without sensitive information.",
blockedOutputMessaging="I cannot provide a response to this query. Please ask something else.",
)
guardrail_id = response["guardrailId"]
# Use the guardrail with model invocation
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
response = bedrock_runtime.converse(
modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
messages=[
{"role": "user", "content": [{"text": "Help me with my account question"}]}
],
guardrailConfig={
"guardrailIdentifier": guardrail_id,
"guardrailVersion": "DRAFT",
},
})

Model Customization and Fine-Tuning
While foundation models are powerful out of the box, you can improve their performance on domain-specific tasks through customization. Bedrock supports two customization approaches: fine-tuning (training the model on your labeled data to adjust its weights) and continued pre-training (exposing the model to unlabeled domain-specific text to expand its knowledge).
Fine-tuning is best when you need the model to follow a specific output format, adopt a particular writing style, or improve accuracy on a narrow task. Continued pre-training is useful when the model lacks knowledge about your domain (for example, proprietary technical terminology or internal processes).
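Fine-tuning data is a JSONL file of prompt/completion pairs, one example per line. A small sketch that writes one (the tickets and labels are hypothetical):

```python
import json

# Hypothetical labeled examples for a support-ticket classifier
examples = [
    {"prompt": "Ticket: My invoice shows a duplicate charge.",
     "completion": "billing"},
    {"prompt": "Ticket: The API returns 500 errors since this morning.",
     "completion": "technical"},
]

with open("fine-tune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Then upload: aws s3 cp fine-tune.jsonl s3://my-bucket/training-data/
```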
# Prepare training data in JSONL format
# Each line: {"prompt": "...", "completion": "..."}
# Upload to S3: s3://my-bucket/training-data/fine-tune.jsonl
bedrock = boto3.client("bedrock", region_name="us-east-1")
# Create a fine-tuning job
response = bedrock.create_model_customization_job(
jobName="support-classifier-v1",
customModelName="support-ticket-classifier",
roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
baseModelIdentifier="amazon.titan-text-express-v1",
customizationType="FINE_TUNING",
trainingDataConfig={
"s3Uri": "s3://my-bucket/training-data/fine-tune.jsonl"
},
outputDataConfig={
"s3Uri": "s3://my-bucket/model-output/"
},
hyperParameters={
"epochCount": "3",
"batchSize": "8",
"learningRate": "0.00001",
"learningRateWarmupSteps": "10",
},
)
job_arn = response["jobArn"]
print(f"Fine-tuning job started: {job_arn}")
# Monitor the job
status = bedrock.get_model_customization_job(jobIdentifier=job_arn)
print(f"Status: {status['status']}") # InProgress, Completed, Failed
# Once complete, create a provisioned throughput for the custom model
# (Required to use custom models - on-demand is not available)
bedrock.create_provisioned_model_throughput(
provisionedModelName="support-classifier-pt",
modelId="arn:aws:bedrock:us-east-1:123456789012:custom-model/support-ticket-classifier",
modelUnits=1,
commitmentDuration="OneMonth", # or SixMonths for a discount
)

Fine-Tuning Costs
Fine-tuning incurs training costs based on the number of tokens processed and the model used. Custom models also require Provisioned Throughput to invoke (no on-demand pricing), starting at approximately $1,800/month for one model unit. Fine-tuning is most cost-effective when you have a high-volume use case where improved accuracy justifies the infrastructure cost. For many use cases, prompt engineering or RAG with a Knowledge Base is more cost-effective.
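A rough break-even check helps with the decision. The sketch below reuses the example rates quoted earlier in this guide as stand-ins; substitute the actual rates for your base and custom models:

```python
# Does ~$1,800/month of Provisioned Throughput beat on-demand at your volume?
PT_MONTHLY = 1800.0              # one model unit, figure cited above
INPUT_RATE = 3.00 / 1_000_000    # USD per input token (Sonnet example rate)
OUTPUT_RATE = 15.00 / 1_000_000  # USD per output token (Sonnet example rate)

def on_demand_monthly(requests_per_day, in_tokens, out_tokens, days=30):
    """Approximate monthly on-demand spend for a steady workload."""
    per_request = in_tokens * INPUT_RATE + out_tokens * OUTPUT_RATE
    return requests_per_day * per_request * days

# 10,000 requests/day, 2,000 input + 500 output tokens each:
cost = on_demand_monthly(10_000, 2_000, 500)
print(f"${cost:,.0f}/month on-demand")  # -> $4,050/month; PT wins here
```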
Tool Use and Function Calling
Tool use (also called function calling) allows models to request the execution of external functions during a conversation. You define the available tools with their parameters, the model decides when and how to call them based on the conversation context, and your application executes the function and returns the result. This is the foundation for building AI assistants that can interact with real systems.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
# Define available tools
tool_config = {
"tools": [
{
"toolSpec": {
"name": "get_ec2_instances",
"description": "Retrieves a list of EC2 instances with their current state",
"inputSchema": {
"json": {
"type": "object",
"properties": {
"region": {
"type": "string",
"description": "AWS region (e.g., us-east-1)"
},
"state": {
"type": "string",
"enum": ["running", "stopped", "terminated"],
"description": "Filter by instance state"
}
},
"required": ["region"]
}
}
}
},
{
"toolSpec": {
"name": "get_cloudwatch_metric",
"description": "Gets CloudWatch metric statistics for a resource",
"inputSchema": {
"json": {
"type": "object",
"properties": {
"namespace": {"type": "string"},
"metric_name": {"type": "string"},
"instance_id": {"type": "string"},
"period_hours": {"type": "integer", "default": 1}
},
"required": ["namespace", "metric_name", "instance_id"]
}
}
}
}
]
}
# Invoke with tool definitions
response = bedrock.converse(
modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
messages=[
{
"role": "user",
"content": [{"text": "What is the CPU utilization of instance i-0abc123def456 in us-east-1?"}]
}
],
toolConfig=tool_config,
)
# Check if the model wants to use a tool
stop_reason = response["stopReason"]
if stop_reason == "tool_use":
tool_block = response["output"]["message"]["content"]
for block in tool_block:
if "toolUse" in block:
tool_name = block["toolUse"]["name"]
tool_input = block["toolUse"]["input"]
tool_use_id = block["toolUse"]["toolUseId"]
print(f"Model wants to call: {tool_name}({tool_input})")
# Execute the tool and return results
# (Your implementation here)
tool_result = {"cpu_utilization": 42.5, "period": "last_hour"}
# Send tool result back to the model
follow_up = bedrock.converse(
modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
messages=[
{"role": "user", "content": [{"text": "What is the CPU utilization of instance i-0abc123def456?"}]},
{"role": "assistant", "content": tool_block},
{
"role": "user",
"content": [
{
"toolResult": {
"toolUseId": tool_use_id,
"content": [{"json": tool_result}],
}
}
],
},
],
toolConfig=tool_config,
)
print(follow_up["output"]["message"]["content"][0]["text"])

IAM Permissions for Bedrock
Bedrock uses standard IAM policies for access control. You need separate permissions for model invocation (runtime), model management (control plane), and agent/knowledge base operations. A well-designed permission model is critical because Bedrock can process sensitive data and generate content that your users will see.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BedrockModelInvocation",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": [
"arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-*",
"arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
]
},
{
"Sid": "BedrockConverseAPI",
"Effect": "Allow",
"Action": [
"bedrock:Converse",
"bedrock:ConverseStream"
],
"Resource": "*"
},
{
"Sid": "BedrockKnowledgeBase",
"Effect": "Allow",
"Action": [
"bedrock:Retrieve",
"bedrock:RetrieveAndGenerate"
],
"Resource": "arn:aws:bedrock:us-east-1:123456789012:knowledge-base/*"
},
{
"Sid": "BedrockAgentInvocation",
"Effect": "Allow",
"Action": [
"bedrock:InvokeAgent"
],
"Resource": "arn:aws:bedrock:us-east-1:123456789012:agent-alias/*"
},
{
"Sid": "BedrockGuardrails",
"Effect": "Allow",
"Action": [
"bedrock:ApplyGuardrail"
],
"Resource": "arn:aws:bedrock:us-east-1:123456789012:guardrail/*"
}
]
}

Use Resource-Level Permissions
Restrict model access to specific models using resource ARNs rather than wildcards. This prevents users from accidentally invoking expensive models. For example, you might allow developers to use Claude Haiku for testing but restrict Claude Opus to production workloads with a separate role.
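As an illustration, a hypothetical developer policy that permits only Haiku-class models might look like this (the ARN pattern is an example; match it to the exact model IDs you approve):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DevHaikuOnly",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-*"
    }
  ]
}
```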
Monitoring and Observability
Monitoring Bedrock usage is essential for cost management and performance optimization. Bedrock publishes metrics to CloudWatch, and you can enable model invocation logging to capture full request and response payloads for debugging and audit purposes.
# Enable model invocation logging
aws bedrock put-model-invocation-logging-configuration \
--logging-config '{
"cloudWatchConfig": {
"logGroupName": "/aws/bedrock/model-invocations",
"roleArn": "arn:aws:iam::123456789012:role/BedrockLoggingRole",
"largeDataDeliveryS3Config": {
"bucketName": "bedrock-invocation-logs",
"keyPrefix": "large-payloads/"
}
},
"textDataDeliveryEnabled": true,
"imageDataDeliveryEnabled": false,
"embeddingDataDeliveryEnabled": false
}'
# View Bedrock CloudWatch metrics
# (the -v date syntax is BSD/macOS; on Linux use: date -u -d '24 hours ago')
aws cloudwatch get-metric-statistics \
--namespace AWS/Bedrock \
--metric-name Invocations \
--dimensions Name=ModelId,Value=anthropic.claude-3-5-sonnet-20241022-v2:0 \
--start-time $(date -u -v-24H '+%Y-%m-%dT%H:%M:%S') \
--end-time $(date -u '+%Y-%m-%dT%H:%M:%S') \
--period 3600 \
--statistics Sum \
--output table
# Create a latency alarm (CloudWatch has no Bedrock cost metric; for spend
# alerts, use AWS Budgets or the AWS/Billing EstimatedCharges metric)
aws cloudwatch put-metric-alarm \
--alarm-name bedrock-latency-alarm \
--alarm-description "Alert when Bedrock invocation latency is elevated" \
--namespace AWS/Bedrock \
--metric-name InvocationLatency \
--statistic Average \
--period 300 \
--evaluation-periods 3 \
--threshold 5000 \
--comparison-operator GreaterThanThreshold \
--alarm-actions "arn:aws:sns:us-east-1:123456789012:bedrock-alerts"
# Query invocation logs with CloudWatch Logs Insights
aws logs start-query \
--log-group-name "/aws/bedrock/model-invocations" \
--start-time $(date -u -v-1H '+%s') \
--end-time $(date -u '+%s') \
--query-string 'fields @timestamp, modelId, inputTokenCount, outputTokenCount
| stats sum(inputTokenCount) as totalInput, sum(outputTokenCount) as totalOutput by modelId
| sort totalOutput desc'

Cost Optimization Strategies
Bedrock costs can grow quickly in production, especially with large context windows and high request volumes. Understanding the pricing model and applying optimization strategies early can reduce your AI spend by 50-80% without sacrificing quality.
Model Selection by Use Case
| Use Case | Recommended Model | Why |
|---|---|---|
| Simple classification/routing | Claude Haiku | Fast, cheap ($0.25/$1.25 per M tokens) |
| General Q&A, summarization | Claude Sonnet | Best quality/cost ratio |
| Complex reasoning, coding | Claude Opus | Highest capability, use selectively |
| Embeddings | Titan Embed v2 | Low cost, good quality ($0.02 per M tokens) |
| High-volume, simple tasks | Titan Text Lite | Lowest cost, adequate for simple tasks |
Batch Inference for Non-Real-Time Workloads
If you have large datasets to process (document summarization, classification, extraction), use Bedrock Batch Inference. Submit jobs with thousands of prompts in a single request, and Bedrock processes them asynchronously at up to 50% lower cost than real-time invocations. Results are delivered to S3 when complete.
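A sketch of submitting a batch job, assuming the CreateModelInvocationJob API shape at the time of writing (bucket names and role ARN are placeholders):

```python
def batch_job_config(job_name, role_arn, model_id, input_uri, output_uri):
    """Build the CreateModelInvocationJob request: prompts are read from a
    JSONL file in S3 and results are written back to S3."""
    return {
        "jobName": job_name,
        "roleArn": role_arn,
        "modelId": model_id,
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": input_uri}},
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": output_uri}},
    }

# Usage (placeholders for bucket and role):
# bedrock = boto3.client("bedrock", region_name="us-east-1")
# job = bedrock.create_model_invocation_job(**batch_job_config(
#     "summarize-docs-batch-1",
#     "arn:aws:iam::123456789012:role/BedrockBatchRole",
#     "anthropic.claude-3-5-sonnet-20241022-v2:0",
#     "s3://my-bucket/batch-input/prompts.jsonl",
#     "s3://my-bucket/batch-output/",
# ))
# print(job["jobArn"])
```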
Production Architecture Patterns
Building production AI applications with Bedrock requires careful architecture decisions around reliability, security, and scalability. Here are the key patterns to follow.
VPC endpoints: Use a VPC endpoint for Bedrock to keep all traffic within the AWS network. This eliminates internet exposure for model invocations and is required by many compliance frameworks.
Caching: Implement response caching for common queries. Bedrock does not provide built-in caching, so use ElastiCache (Redis) or DynamoDB to cache responses keyed by a hash of the prompt and model parameters.
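One way to sketch this: key the cache on a hash of the model ID, prompt, and inference parameters. The in-memory dict below stands in for ElastiCache or DynamoDB:

```python
import hashlib
import json

_cache: dict[str, str] = {}  # stand-in for ElastiCache/DynamoDB

def cache_key(model_id: str, prompt: str, params: dict) -> str:
    """Stable key: hash of model + prompt + sorted inference parameters."""
    payload = json.dumps(
        {"model": model_id, "prompt": prompt, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_converse(client, model_id: str, prompt: str, params: dict) -> str:
    """Return a cached response when the exact request was seen before."""
    key = cache_key(model_id, prompt, params)
    if key not in _cache:
        response = client.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig=params,
        )
        _cache[key] = response["output"]["message"]["content"][0]["text"]
    return _cache[key]
```

In production, add a TTL so cached answers expire, and skip the cache for prompts containing user-specific data.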
Rate limiting: Bedrock has per-model, per-region throttling limits. Implement client-side rate limiting and exponential backoff. For critical workloads, request quota increases through AWS Support or use Provisioned Throughput.
Multi-region failover: Deploy your application in multiple regions with a fallback model configuration. If us-east-1 throttles, automatically failover to us-west-2. Use Route 53 health checks to detect region-level issues.
import boto3
import hashlib
import json
from botocore.config import Config
# Configure retry behavior
bedrock_config = Config(
retries={"max_attempts": 5, "mode": "adaptive"},
read_timeout=120,
)
# Multi-region client setup
REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]
clients = {
region: boto3.client("bedrock-runtime", region_name=region, config=bedrock_config)
for region in REGIONS
}
def invoke_with_failover(messages, model_id, max_tokens=1024):
"""Invoke Bedrock with automatic region failover."""
last_error = None
for region in REGIONS:
try:
response = clients[region].converse(
modelId=model_id,
messages=messages,
inferenceConfig={"maxTokens": max_tokens},
)
return response
except clients[region].exceptions.ThrottlingException as e:
last_error = e
print(f"Throttled in {region}, trying next region...")
continue
except Exception as e:
last_error = e
print(f"Error in {region}: {e}")
continue
raise last_errorCommon Pitfalls and Best Practices
After deploying dozens of Bedrock applications, several patterns emerge that separate successful deployments from problematic ones. Here are the most important lessons learned.
Do not put sensitive data in prompts without guardrails. If your application passes user-provided data to the model, implement input validation and use Guardrails to filter PII. A user might accidentally paste credit card numbers or passwords into a chat interface.
Monitor token usage, not just request counts. A single request with a 100K-token context window costs more than 100 requests with 1K-token contexts. Track both input and output tokens per request to understand your true cost profile.
Use system prompts effectively. A well-crafted system prompt reduces output token usage by making the model more concise and focused. Include explicit format instructions, length constraints, and domain context in the system prompt.
Test with multiple models. Bedrock's unified Converse API makes it trivial to switch models. Benchmark your use case across Claude, Titan, Llama, and Mistral to find the best quality/cost tradeoff for your specific task.
Implement proper error handling. Bedrock can return throttling errors, model timeout errors, and validation errors. Always implement exponential backoff with jitter for throttling, and set appropriate read timeouts for long-running generations.
Data Residency
Bedrock processes data in the region where you make the API call. If you have data residency requirements (GDPR, data sovereignty), ensure you invoke models in a compliant region. Bedrock does not use your data to train or improve foundation models, but you should still review the data processing terms for each model provider.
Next Steps
You now have a solid foundation for building AI applications on AWS Bedrock. The key concepts to internalize are: use the Converse API for model invocation, Knowledge Bases for RAG, Guardrails for safety, and Agents for action-taking capabilities. Start with a simple use case (document Q&A is the most common), prove value with a prototype, and incrementally add sophistication.
For your next steps, explore the Bedrock Playground in the console to experiment with different models and prompts interactively. Build a simple RAG application using a Knowledge Base backed by your own documentation. Then add Guardrails to make it production-ready.
Key Takeaways
1. Bedrock provides a unified API (Converse) for accessing multiple foundation models from different providers.
2. Knowledge Bases automate the RAG pipeline: ingestion, chunking, embedding, and retrieval.
3. Guardrails provide content filtering, PII detection, and topic restrictions independent of the model.
4. Agents combine model reasoning with tool use for action-taking AI assistants.
5. Fine-tuning requires Provisioned Throughput and is best reserved for high-volume, domain-specific use cases.
6. Multi-region deployment with failover is essential for production reliability.
Frequently Asked Questions
What is the difference between Bedrock and SageMaker?
Which Bedrock model should I start with?
Does Bedrock use my data for training?
How do I reduce Bedrock costs?
Can I use Bedrock with a VPC endpoint?
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.