Amazon OpenSearch Service Guide
Deploy and operate OpenSearch on AWS: domains, indexing, search queries, aggregations, Serverless, Dashboards, and ISM lifecycle policies.
Prerequisites
- Basic understanding of search and indexing concepts
- AWS account with OpenSearch permissions
Introduction to Amazon OpenSearch Service
Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) is a fully managed service for deploying, operating, and scaling OpenSearch clusters in the AWS Cloud. OpenSearch is an open-source search and analytics engine derived from Elasticsearch 7.10, used for full-text search, log analytics, application monitoring, clickstream analytics, and security information and event management (SIEM). The service also includes OpenSearch Dashboards (a fork of Kibana) for data visualization.
Amazon OpenSearch Service handles the operational heavy lifting of running a search cluster: provisioning nodes, configuring storage, managing software updates, monitoring cluster health, and creating automated snapshots. It supports VPC-based network isolation, fine-grained access control with IAM and SAML, data encryption at rest and in transit, and cross-cluster replication for disaster recovery.
This guide covers creating and configuring OpenSearch domains, indexing and searching documents, building aggregations for analytics, using OpenSearch Serverless for on-demand capacity, visualizing data with Dashboards, securing your cluster, and optimizing performance and costs.
OpenSearch vs. OpenSearch Serverless
Amazon OpenSearch Service offers two deployment options. Managed clusters (domains) give you full control over instance types, node count, storage, and configuration. OpenSearch Serverless automatically provisions and scales capacity based on workload, with no cluster management required. Use managed clusters for predictable workloads where you need full configuration control. Use Serverless for variable or unpredictable workloads and when you want to avoid cluster operations altogether.
Creating an OpenSearch Domain
An OpenSearch domain is the equivalent of a cluster. It consists of data nodes (store and search data), optional dedicated master nodes (manage cluster state), and optional UltraWarm nodes (cost-effective warm storage for infrequently accessed data). You choose the instance types, node counts, and storage configuration based on your data volume and query patterns.
# Create an OpenSearch domain with production configuration
aws opensearch create-domain \
--domain-name production-logs \
--engine-version "OpenSearch_2.11" \
--cluster-config '{
"InstanceType": "r6g.xlarge.search",
"InstanceCount": 3,
"DedicatedMasterEnabled": true,
"DedicatedMasterType": "m6g.large.search",
"DedicatedMasterCount": 3,
"ZoneAwarenessEnabled": true,
"ZoneAwarenessConfig": {
"AvailabilityZoneCount": 3
},
"WarmEnabled": true,
"WarmType": "ultrawarm1.medium.search",
"WarmCount": 2
}' \
--ebs-options '{
"EBSEnabled": true,
"VolumeType": "gp3",
"VolumeSize": 500,
"Iops": 3000,
"Throughput": 250
}' \
--vpc-options '{
"SubnetIds": ["subnet-abc123", "subnet-def456", "subnet-ghi789"],
"SecurityGroupIds": ["sg-opensearch"]
}' \
--encryption-at-rest-options '{"Enabled": true}' \
--node-to-node-encryption-options '{"Enabled": true}' \
--domain-endpoint-options '{"EnforceHTTPS": true, "TLSSecurityPolicy": "Policy-Min-TLS-1-2-PFS-2023-10"}' \
--advanced-security-options '{
"Enabled": true,
"InternalUserDatabaseEnabled": true,
"MasterUserOptions": {
"MasterUserName": "admin",
"MasterUserPassword": "YourStr0ng_P@ss!"
}
}' \
--auto-tune-options '{"DesiredState": "ENABLED"}' \
--tags Key=Environment,Value=production
# Wait for domain to become active (15-30 minutes)
aws opensearch describe-domain \
--domain-name production-logs \
--query 'DomainStatus.{Endpoint: Endpoints.vpc, Status: Processing, Engine: EngineVersion, Nodes: ClusterConfig.InstanceCount}' \
--output table
Dedicated Master Nodes
Always use dedicated master nodes for production clusters. Without them, data nodes handle both cluster management and search/indexing, which can lead to cluster instability under heavy load. Use three dedicated master nodes (never two: with two, losing either node leaves no majority to elect a master, and the arrangement invites split-brain scenarios). Master nodes do not need to be large: m6g.large.search is typically enough for small clusters (AWS sizing guidance lists it for roughly 10 data nodes or fewer), and larger clusters warrant larger master instance types. Dedicated master nodes are billed at the standard hourly rate for their instance type; there is no separate charge for the master node software.
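The "three, never two" rule follows from majority-quorum arithmetic. The sketch below is only an illustration of that arithmetic (the function names `quorum` and `tolerated_failures` are made up for this example and are not part of any AWS tooling):

```shell
#!/usr/bin/env bash
# Majority quorum for N master-eligible nodes: floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of master nodes that can fail while a majority still remains.
tolerated_failures() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 1 2 3; do
  echo "masters=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Two master nodes need both present to form a majority, so they tolerate no failures, exactly like one node. Three nodes are the smallest count that survives the loss of one.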
Indexing Documents
Data in OpenSearch is stored as JSON documents within indices. An index is similar to a database table: it has a name, a mapping (schema) that defines the fields and their types, and settings that control behavior like the number of shards and replicas. You index documents using the REST API, which OpenSearch Dashboards, the AWS SDK, and tools like Logstash and Fluent Bit use under the hood.
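Whatever tool does the indexing, bulk requests end up in the same newline-delimited JSON shape: an action line followed by the document. As a rough sketch, assuming one JSON document per input line, a helper like the hypothetical `to_bulk` below could produce that body:

```shell
#!/usr/bin/env bash
# to_bulk INDEX: read one JSON document per line on stdin and emit
# _bulk NDJSON: an action line followed by the document itself.
to_bulk() {
  local index="$1" doc
  while IFS= read -r doc; do
    [ -z "$doc" ] && continue   # skip blank lines
    printf '{"index": {"_index": "%s"}}\n' "$index"
    printf '%s\n' "$doc"
  done
}

# Hypothetical sample documents, one per line.
printf '%s\n' \
  '{"level": "INFO", "service": "order-api", "message": "Order created"}' \
  '{"level": "WARN", "service": "payment-api", "message": "Slow response"}' \
  | to_bulk application-logs
```

The output can be piped to `curl --data-binary @-` against the `_bulk` endpoint. Every line, including the last, ends with a newline, which the bulk API requires.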
# Set the domain endpoint variable
ENDPOINT="https://vpc-production-logs-xxx.us-east-1.es.amazonaws.com"
# Create an index with explicit mappings
curl -XPUT "$ENDPOINT/application-logs" \
-H "Content-Type: application/json" \
-u admin:YourStr0ng_P@ss! \
-d '{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"plugins.index_state_management.rollover_alias": "application-logs-alias"
},
"mappings": {
"properties": {
"timestamp": {"type": "date", "format": "strict_date_optional_time||epoch_millis"},
"level": {"type": "keyword"},
"service": {"type": "keyword"},
"message": {"type": "text", "analyzer": "standard"},
"host": {"type": "keyword"},
"trace_id": {"type": "keyword"},
"response_time_ms": {"type": "integer"},
"status_code": {"type": "short"},
"user_agent": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
"client_ip": {"type": "ip"},
"geo_location": {"type": "geo_point"}
}
}
}'
# Index a single document
curl -XPOST "$ENDPOINT/application-logs/_doc" \
-H "Content-Type: application/json" \
-u admin:YourStr0ng_P@ss! \
-d '{
"timestamp": "2026-03-14T10:30:15.123Z",
"level": "ERROR",
"service": "order-api",
"message": "Failed to process payment for order ORD-42891",
"host": "api-server-03",
"trace_id": "abc-123-def-456",
"response_time_ms": 5234,
"status_code": 500,
"client_ip": "203.0.113.50"
}'
# Bulk index documents (much more efficient)
curl -XPOST "$ENDPOINT/_bulk" \
-H "Content-Type: application/x-ndjson" \
-u admin:YourStr0ng_P@ss! \
--data-binary @- << 'EOF'
{"index": {"_index": "application-logs"}}
{"timestamp": "2026-03-14T10:30:16Z", "level": "INFO", "service": "order-api", "message": "Order ORD-42892 created", "response_time_ms": 45}
{"index": {"_index": "application-logs"}}
{"timestamp": "2026-03-14T10:30:17Z", "level": "WARN", "service": "payment-api", "message": "Payment gateway slow response", "response_time_ms": 2100}
{"index": {"_index": "application-logs"}}
{"timestamp": "2026-03-14T10:30:18Z", "level": "INFO", "service": "inventory-api", "message": "Stock updated for SKU-1234", "response_time_ms": 23}
EOF
Searching and Querying
OpenSearch provides a powerful query DSL (Domain Specific Language) for searching documents. Queries range from simple full-text searches to complex boolean combinations with filters, aggregations, highlighting, and scoring modifications. Understanding the query DSL is essential for building effective search experiences and log analytics dashboards.
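A common pattern in a `bool` query is to put free-text matching in `must`, where it contributes to the relevance score, and exact conditions in `filter`, where they only include or exclude documents. The sketch below assumes the field names `message`, `level`, and `timestamp` from the index mapping in this guide; the function name `build_log_query` is hypothetical:

```shell
#!/usr/bin/env bash
# build_log_query TEXT LEVEL SINCE
# TEXT goes in "must" (analyzed, contributes to the relevance score);
# LEVEL and SINCE go in "filter" (yes/no match, no scoring).
# Arguments are inserted verbatim, so they must not contain
# double quotes or backslashes.
build_log_query() {
  local text="$1" level="$2" since="$3"
  cat <<EOF
{
  "query": {
    "bool": {
      "must":   [ {"match": {"message": "$text"}} ],
      "filter": [
        {"term":  {"level": "$level"}},
        {"range": {"timestamp": {"gte": "$since"}}}
      ]
    }
  },
  "size": 20
}
EOF
}

build_log_query "payment failed" "ERROR" "now-24h"
```

The generated body can be passed to the `_search` endpoint with `curl -d`. For untrusted input, a JSON-aware tool would be a safer way to build the body than string interpolation.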
# Full-text search
curl -XGET "$ENDPOINT/application-logs/_search" \
-H "Content-Type: application/json" \
-u admin:YourStr0ng_P@ss! \
-d '{
"query": {
"bool": {
"must": [
{"match": {"message": "payment failed"}}
],
"filter": [
{"term": {"level": "ERROR"}},
{"range": {"timestamp": {"gte": "2026-03-14T00:00:00Z", "lte": "2026-03-14T23:59:59Z"}}},
{"term": {"service": "order-api"}}
]
}
},
"sort": [{"timestamp": {"order": "desc"}}],
"size": 20,
"highlight": {
"fields": {"message": {}}
}
}'
# Aggregation query - error count by service per hour
curl -XGET "$ENDPOINT/application-logs/_search" \
-H "Content-Type: application/json" \
-u admin:YourStr0ng_P@ss! \
-d '{
"size": 0,
"query": {
"bool": {
"filter": [
{"term": {"level": "ERROR"}},
{"range": {"timestamp": {"gte": "now-24h"}}}
]
}
},
"aggs": {
"errors_per_hour": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "1h"
},
"aggs": {
"by_service": {
"terms": {"field": "service", "size": 10}
}
}
},
"avg_response_time": {
"avg": {"field": "response_time_ms"}
},
"response_percentiles": {
"percentiles": {
"field": "response_time_ms",
"percents": [50, 90, 95, 99]
}
}
}
}'
# Multi-field search with boosting
curl -XGET "$ENDPOINT/application-logs/_search" \
-H "Content-Type: application/json" \
-u admin:YourStr0ng_P@ss! \
-d '{
"query": {
"multi_match": {
"query": "connection timeout",
"fields": ["message^3", "service", "host"],
"type": "best_fields"
}
}
}'
OpenSearch Serverless
OpenSearch Serverless removes the need to configure and manage OpenSearch clusters entirely. You create collections (the Serverless equivalent of domains), and AWS automatically provisions, scales, and manages the underlying infrastructure. Serverless supports three collection types: search collections for full-text search workloads, time series collections for log and metrics data, and vector search collections for embedding-based similarity search.
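Serverless security policies refer to a collection by name, so the encryption and network policies have to agree on it. One way to keep them in sync is to render both from a single variable. This is only a sketch: the helper names are hypothetical, and the JSON shapes mirror the policies used in this section.

```shell
#!/usr/bin/env bash
# Render Serverless security policies from one collection name so the
# encryption and network policies cannot drift apart.
encryption_policy() {
  printf '{"Rules":[{"ResourceType":"collection","Resource":["collection/%s"]}],"AWSOwnedKey":true}\n' "$1"
}

# network_policy NAME ALLOW_PUBLIC   (ALLOW_PUBLIC is "true" or "false")
network_policy() {
  local name="$1" allow_public="$2"
  printf '[{"Rules":[{"ResourceType":"collection","Resource":["collection/%s"]},{"ResourceType":"dashboard","Resource":["collection/%s"]}],"AllowFromPublic":%s}]\n' \
    "$name" "$name" "$allow_public"
}

COLLECTION="log-analytics"
encryption_policy "$COLLECTION"
network_policy "$COLLECTION" true
```

Each function's output can be passed as the `--policy` argument of `aws opensearchserverless create-security-policy`. The example keeps the public-access setting used in this section; restricting access to VPC endpoints needs additional policy fields that are not shown here.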
# Create an encryption policy (required before creating a collection)
aws opensearchserverless create-security-policy \
--name logs-encryption \
--type encryption \
--policy '{
"Rules": [
{"ResourceType": "collection", "Resource": ["collection/log-analytics"]}
],
"AWSOwnedKey": true
}'
# Create a network policy
aws opensearchserverless create-security-policy \
--name logs-network \
--type network \
--policy '[{
"Rules": [
{"ResourceType": "collection", "Resource": ["collection/log-analytics"]},
{"ResourceType": "dashboard", "Resource": ["collection/log-analytics"]}
],
"AllowFromPublic": true
}]'
# Create a data access policy
aws opensearchserverless create-access-policy \
--name logs-data-access \
--type data \
--policy '[{
"Rules": [
{
"ResourceType": "index",
"Resource": ["index/log-analytics/*"],
"Permission": ["aoss:CreateIndex", "aoss:UpdateIndex", "aoss:DescribeIndex", "aoss:ReadDocument", "aoss:WriteDocument"]
},
{
"ResourceType": "collection",
"Resource": ["collection/log-analytics"],
"Permission": ["aoss:CreateCollectionItems"]
}
],
"Principal": ["arn:aws:iam::123456789012:role/opensearch-admin"]
}]'
# Create a serverless collection
aws opensearchserverless create-collection \
--name log-analytics \
--type TIMESERIES \
--description "Serverless collection for application logs"
# Get collection endpoint
aws opensearchserverless batch-get-collection \
--names log-analytics \
--query 'collectionDetails[0].{Name: name, Endpoint: collectionEndpoint, Dashboard: dashboardEndpoint, Status: status}'
Serverless Collection Types
Choose TIMESERIES for log analytics and metrics data. Time series collections optimize for append-heavy write patterns and time-based queries, and they support data lifecycle policies that automatically delete data once it passes a retention period. Choose SEARCH for application search, e-commerce product catalogs, and content management where you need full-text search with updates and deletes. Choose VECTORSEARCH for AI/ML use cases like semantic search, recommendation engines, and RAG (Retrieval Augmented Generation) applications.
Data Ingestion Patterns
Getting data into OpenSearch efficiently requires choosing the right ingestion pipeline. AWS provides several managed options for streaming data into OpenSearch without writing custom code.
Ingestion Pipeline Comparison
| Method | Best For | Buffering | Transform |
|---|---|---|---|
| Amazon Data Firehose | CloudWatch Logs, Kinesis streams | Yes (buffer size/time) | Lambda transform |
| OpenSearch Ingestion | Complex pipelines with enrichment | Yes | Built-in processors |
| Logstash | On-premises log collection | Plugin-based | Rich filter plugins |
| Fluent Bit | Container and EC2 log shipping | Lightweight | Filter plugins |
| Direct API | Application-level indexing | Application handles | Application code |
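The "buffer size/time" entry in the table refers to flushing a batch as soon as either a size threshold or an age threshold is reached. A minimal sketch of that rule follows; the 5 MiB and 60 second defaults are hypothetical values chosen for illustration, not service defaults:

```shell
#!/usr/bin/env bash
# should_flush BUFFERED_BYTES AGE_SECONDS [MAX_BYTES] [MAX_AGE_SECONDS]
# Succeeds (exit status 0) when either threshold has been reached.
should_flush() {
  local bytes="$1" age="$2" max_bytes="${3:-5242880}" max_age="${4:-60}"
  [ "$bytes" -ge "$max_bytes" ] || [ "$age" -ge "$max_age" ]
}

for sample in "1024 5" "6000000 5" "1024 61"; do
  set -- $sample   # split "bytes age" into $1 and $2
  if should_flush "$1" "$2"; then
    echo "bytes=$1 age=${2}s -> flush"
  else
    echo "bytes=$1 age=${2}s -> keep buffering"
  fi
done
```

An application that indexes through the Direct API would apply the same rule itself before sending each bulk request.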
# Create an OpenSearch Ingestion pipeline
# (backslashes are doubled because the grok pattern sits in a double-quoted YAML string)
aws osis create-pipeline \
--pipeline-name log-ingestion \
--min-units 1 \
--max-units 4 \
--pipeline-configuration-body 'version: "2"
log-pipeline:
  source:
    http:
      path: "/logs/ingest"
  processor:
    - grok:
        match:
          message:
            - "%{TIMESTAMP_ISO8601:timestamp} \\[%{LOGLEVEL:level}\\] %{GREEDYDATA:msg}"
    - date:
        from_time_received: true
        destination: "@timestamp"
  sink:
    - opensearch:
        hosts: ["https://vpc-production-logs-xxx.us-east-1.es.amazonaws.com"]
        index: "application-logs-%{yyyy.MM.dd}"
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/osis-role"
          region: "us-east-1"'
Index Lifecycle Management
For log and time-series data, Index State Management (ISM), the OpenSearch counterpart to Elasticsearch's Index Lifecycle Management (ILM), automates rolling over indices when they reach a certain size or age, moving older indices to cheaper storage tiers (UltraWarm, then cold storage), and deleting indices past their retention period. This is essential for controlling storage costs in log analytics workloads.
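To make the age-based transitions concrete, the sketch below maps an index age to a lifecycle state. It assumes purely age-based thresholds of 7 days (hot to warm) and 90 days (warm to delete), the values used by the policy in this section; `state_for_age` is a hypothetical helper, not an OpenSearch API.

```shell
#!/usr/bin/env bash
# state_for_age AGE_DAYS [WARM_AFTER] [DELETE_AFTER]
# Prints the lifecycle state an index of the given age would be in,
# assuming purely age-based transitions (hot -> warm -> delete).
state_for_age() {
  local age="$1" warm_after="${2:-7}" delete_after="${3:-90}"
  if [ "$age" -ge "$delete_after" ]; then
    echo "delete"
  elif [ "$age" -ge "$warm_after" ]; then
    echo "warm"
  else
    echo "hot"
  fi
}

for age in 0 6 7 89 90; do
  echo "age=${age}d -> $(state_for_age "$age")"
done
```

Policies are evaluated periodically rather than continuously, so real transitions lag these thresholds somewhat.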
# Create an ISM (Index State Management) policy
curl -XPUT "$ENDPOINT/_plugins/_ism/policies/logs-lifecycle" \
-H "Content-Type: application/json" \
-u admin:YourStr0ng_P@ss! \
-d '{
"policy": {
"description": "Lifecycle policy for application logs",
"default_state": "hot",
"states": [
{
"name": "hot",
"actions": [
{"rollover": {"min_size": "30gb", "min_index_age": "1d"}}
],
"transitions": [
{"state_name": "warm", "conditions": {"min_index_age": "7d"}}
]
},
{
"name": "warm",
"actions": [
{"warm_migration": {}},
{"force_merge": {"max_num_segments": 1}},
{"replica_count": {"number_of_replicas": 0}}
],
"transitions": [
{"state_name": "delete", "conditions": {"min_index_age": "90d"}}
]
},
{
"name": "delete",
"actions": [{"delete": {}}]
}
],
"ism_template": {
"index_patterns": ["application-logs-*"],
"priority": 100
}
}
}'
Security and Access Control
OpenSearch Service provides multiple layers of security: VPC-based network isolation, fine-grained access control (FGAC) with role-based permissions, encryption at rest and in transit, audit logging, and integration with IAM and SAML identity providers for authentication.
Security Best Practices
Always deploy production OpenSearch domains within a VPC, not on public endpoints. Enable fine-grained access control (FGAC) to restrict access at the index, document, and field level. Use IAM-based authentication with Signature Version 4 signing for programmatic access. Enable audit logging to track all access and changes. Enforce HTTPS-only access with TLS 1.2 minimum. Regularly rotate the master user password and review access policies.
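For Signature Version 4 signing from the command line, recent versions of curl (7.75.0 or newer) can sign requests themselves through the `--aws-sigv4` option. The wrapper below is a sketch under several assumptions: credentials are available in the standard `AWS_*` environment variables, the target is a managed domain (service name `es`), and the endpoint shown is a made-up placeholder. The name `sigv4_curl` and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# sigv4_curl REGION URL [extra curl args...]
# Signs the request with Signature Version 4 via curl's --aws-sigv4 option.
# Set DRY_RUN=1 to print the command shape without running it or
# exposing credentials.
sigv4_curl() {
  local region="$1" url="$2"
  shift 2
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl --aws-sigv4 aws:amz:${region}:es --user <redacted> $* ${url}"
    return 0
  fi
  # Temporary credentials (assumed roles) also need the session token header.
  if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
    set -- -H "x-amz-security-token: ${AWS_SESSION_TOKEN}" "$@"
  fi
  curl --silent --show-error \
    --aws-sigv4 "aws:amz:${region}:es" \
    --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
    "$@" "$url"
}

# Placeholder endpoint, printed only (nothing is sent in dry-run mode).
DRY_RUN=1 sigv4_curl us-east-1 "https://search-example.us-east-1.es.amazonaws.com/_cluster/health" -XGET
```

With fine-grained access control enabled, the signing IAM principal still needs to be mapped to an OpenSearch role before the request is authorized.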
Amazon OpenSearch Service is a versatile platform for search, log analytics, and observability. For new deployments, consider OpenSearch Serverless to eliminate operational overhead. For existing workloads, use managed domains with dedicated master nodes, multi-AZ deployment, UltraWarm for cost-effective warm storage, and ISM policies for automated lifecycle management. Monitor cluster health with CloudWatch metrics and set up alerts for red cluster status, high JVM memory pressure, and storage utilization above 80%.
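The 80% storage alert can be expressed as a threshold on free space per data node. The sketch below assumes the `FreeStorageSpace` CloudWatch metric is reported in MiB and that the alarm uses the Minimum statistic, which tracks the node with the least free space. Verify both against the current CloudWatch documentation before relying on the number. `free_space_threshold_mib` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# free_space_threshold_mib VOLUME_GIB MAX_USED_PERCENT
# Per-node free space, in MiB, that corresponds to the given utilization.
free_space_threshold_mib() {
  local volume_gib="$1" max_used_pct="$2"
  echo $(( volume_gib * 1024 * (100 - max_used_pct) / 100 ))
}

# 500 GiB per-node volume (as in the create-domain example), alert at 80% used.
THRESHOLD="$(free_space_threshold_mib 500 80)"
echo "Alarm when FreeStorageSpace (Minimum) <= ${THRESHOLD} MiB"
```

The resulting value would be the `--threshold` for an `aws cloudwatch put-metric-alarm` call on the domain's metrics in the `AWS/ES` namespace.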
Key Takeaways
- OpenSearch Service handles provisioning, patching, scaling, and backups for search clusters.
- OpenSearch Serverless eliminates cluster management with automatic scaling and zero operational overhead.
- UltraWarm and cold storage tiers reduce costs for infrequently accessed log data.
- ISM policies automate index rollover, migration to warm storage, and deletion based on age and size.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.