Testing Infrastructure Code: Terratest, Checkov, OPA, and KICS in Practice
Unit testing with Terratest, policy-as-code with OPA and Rego, static analysis with Checkov and KICS, CI/CD integration patterns, and what to test versus what not to test.
Why Most Infrastructure Code Is Untested
Application developers write tests as a matter of course. Frontend engineers have Jest, backend engineers have pytest or JUnit, and nobody ships code without at least some level of automated verification. Infrastructure engineers, on the other hand, routinely deploy Terraform modules, CloudFormation templates, and Helm charts that have never been tested beyond a manual terraform plan and a prayer.
This is not because infrastructure engineers do not care about quality. It is because infrastructure testing is genuinely harder than application testing. Application code runs in a predictable, local environment. Infrastructure code provisions real cloud resources that take minutes to create, cost money, have unpredictable timing, and interact with dozens of external APIs. The feedback loop for a Terraform integration test can be 15 to 30 minutes, compared to milliseconds for a unit test.
But the cost of untested infrastructure is real. A misconfigured security group, an S3 bucket without encryption, a database with public access -- these are the kinds of errors that cause security breaches and compliance failures. In 2026, with tools like Terratest, Checkov, OPA, and KICS all mature and well-integrated into CI/CD pipelines, there is no excuse for deploying infrastructure code without automated verification.
This article covers four categories of infrastructure testing tools, what each is good at, and how to build a practical testing strategy that catches real problems without slowing down your deployment pipeline to a crawl.
Category 1: Static Analysis with Checkov and KICS
Static analysis tools scan your infrastructure code without executing it. They do not create cloud resources, do not cost money, and run in seconds. This makes them the highest-value, lowest-cost testing layer you can add to your pipeline.
Checkov
Checkov is the most popular open-source infrastructure static analysis tool, and for good reason. It supports Terraform, CloudFormation, Kubernetes manifests, Helm charts, Dockerfiles, ARM templates, Bicep, and Serverless Framework -- basically everything you might use to define infrastructure. Out of the box, it includes over 2,500 built-in policies covering CIS benchmarks, SOC 2, PCI DSS, HIPAA, and general best practices.
Running Checkov against your Terraform code looks like this:
```shell
checkov -d ./terraform --framework terraform --output cli
```

Checkov scans every resource definition and reports violations. A typical first run on an existing codebase produces dozens or hundreds of findings. Do not try to fix them all at once. Instead, establish a baseline by suppressing existing findings with inline skip comments, then enforce zero new violations in CI. This lets you improve incrementally without blocking deployments on legacy issues.
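The baseline-then-ratchet idea is simple enough to sketch in a few lines of Python. This is a hypothetical helper, not part of Checkov itself: it assumes you have reduced scan output (e.g. from checkov's JSON report) to (check_id, resource) pairs, and it fails CI only on findings absent from the recorded baseline.

```python
def new_findings(current, baseline):
    """Return findings present in the current scan but not in the baseline.

    Each finding is a (check_id, resource) pair -- a simplified stand-in
    for Checkov's report entries; the exact report schema is not assumed.
    """
    return sorted(set(current) - set(baseline))

# Baseline captured on day one of adoption.
baseline = [("CKV_AWS_18", "aws_s3_bucket.logs"),
            ("CKV_AWS_21", "aws_s3_bucket.logs")]

# Today's scan: the old findings plus one newly introduced violation.
current = [("CKV_AWS_18", "aws_s3_bucket.logs"),
           ("CKV_AWS_21", "aws_s3_bucket.logs"),
           ("CKV_AWS_21", "aws_s3_bucket.new_reports")]

delta = new_findings(current, baseline)
print(delta)  # [('CKV_AWS_21', 'aws_s3_bucket.new_reports')]
```

In CI, a non-empty delta fails the build; the legacy findings stay visible but non-blocking until the team burns them down.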
The real power of Checkov is custom policies. You can write policies in Python or YAML that encode your organization's specific requirements. For example, a policy that requires all EC2 instances to use a specific AMI prefix, or that all S3 buckets must have a specific tag key. Custom policies catch organization-specific misconfigurations that generic CIS benchmarks miss.
KICS (Keeping Infrastructure as Code Secure)
KICS, developed by Checkmarx, is an alternative to Checkov with a different architecture. Instead of Python-based policies, KICS uses Rego (the same policy language as OPA) for its detection rules. This means you can share policy logic between KICS for static analysis and OPA for runtime enforcement.
KICS excels at detecting security vulnerabilities in Terraform, CloudFormation, Ansible, Kubernetes, Dockerfiles, and OpenAPI specifications. It includes over 2,000 queries out of the box and produces detailed reports with remediation guidance. Where Checkov focuses on compliance frameworks, KICS focuses more on security vulnerabilities -- things like open security groups, unencrypted storage, and excessive IAM permissions.
In practice, I recommend running both Checkov and KICS in your pipeline. They have significant overlap but each catches things the other misses. Checkov is stronger on compliance mapping (telling you which SOC 2 control a violation affects). KICS is stronger on security-specific patterns (detecting complex vulnerability chains across multiple resources). The combined runtime for both tools on a typical Terraform codebase is under 60 seconds, so there is no performance reason not to run both.
IDE integration matters
Both Checkov and KICS offer VS Code extensions that show violations inline as you write infrastructure code. This is dramatically more effective than catching issues in CI. An engineer who sees "this S3 bucket is missing encryption" while writing the code fixes it immediately. An engineer who sees the same finding in a failed CI pipeline 10 minutes after pushing a commit has to context-switch back, find the file, understand the finding, and fix it. Install the IDE extensions and encourage your team to use them.
Category 2: Policy-as-Code with OPA and Rego
Open Policy Agent (OPA) is a general-purpose policy engine that evaluates structured data against policies written in Rego, a declarative query language. While Checkov and KICS are designed specifically for infrastructure code scanning, OPA is a policy engine that can evaluate any structured data -- Terraform plans, Kubernetes admission requests, API requests, CI/CD pipeline configurations, and more.
OPA for Terraform
The most common OPA use case in infrastructure is evaluating Terraform plans. The workflow is: run terraform plan -out=tfplan, convert the plan to JSON with terraform show -json tfplan, and evaluate the JSON against OPA policies. This approach tests the planned changes, not just the code, which means it catches issues that static analysis misses -- like a plan that would change a database instance type during business hours.
Writing Rego policies takes some getting used to. Rego is a declarative language that feels different from imperative languages most engineers are familiar with. A policy that denies S3 buckets without versioning looks like:
```rego
package terraform.s3

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket_versioning"
  resource.change.after.versioning_configuration[_].status != "Enabled"
  msg := sprintf("S3 bucket %v must have versioning enabled", [resource.address])
}
```

The learning curve is the main barrier to OPA adoption. In my experience, it takes a team about two weeks to become productive writing Rego policies and another month to build a comprehensive policy library. The investment pays off because OPA policies are reusable across Terraform, Kubernetes, and application APIs.
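If the Rego syntax is opaque at first, it can help to see the same check expressed imperatively. Here is a plain-Python sketch of the rule above, run over a hand-written stand-in for terraform show -json output -- an illustration of the logic, not a replacement for OPA, and it models only the resource_changes fields the rule touches:

```python
def deny_unversioned_buckets(plan):
    """Mirror of the Rego rule: flag aws_s3_bucket_versioning resources
    whose planned status is anything other than "Enabled"."""
    msgs = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_s3_bucket_versioning":
            continue
        after = rc.get("change", {}).get("after") or {}
        for cfg in after.get("versioning_configuration", []):
            if cfg.get("status") != "Enabled":
                msgs.append(
                    f"S3 bucket {rc['address']} must have versioning enabled")
    return msgs

# Minimal hand-written plan fragment standing in for real plan JSON.
plan = {"resource_changes": [{
    "address": "aws_s3_bucket_versioning.logs",
    "type": "aws_s3_bucket_versioning",
    "change": {"after": {"versioning_configuration": [{"status": "Suspended"}]}},
}]}

print(deny_unversioned_buckets(plan))
```

The Rego version says the same thing declaratively: each line inside deny is a condition, and the rule fires when all conditions hold for some resource.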
OPA for Kubernetes Admission Control
OPA Gatekeeper integrates OPA as a Kubernetes admission controller, evaluating every resource creation and modification request against your policies. This provides runtime enforcement -- even if someone bypasses the CI/CD pipeline and applies a manifest directly with kubectl, the admission controller blocks non-compliant resources.
Common Gatekeeper policies include: requiring resource limits on all containers, prohibiting privileged containers, requiring specific labels on all resources, restricting image registries to approved sources, and preventing hostPath volume mounts. Gatekeeper provides a ConstraintTemplate CRD for defining policies and a Constraint CRD for applying them to specific namespaces or resource types.
The practical advice I always give teams adopting Gatekeeper: start in audit mode (dryrun enforcement), not in deny mode. In audit mode, Gatekeeper logs violations without blocking deployments. Run in audit mode for at least two weeks, review the violation report, fix legitimate issues, and add exceptions for intentional policy deviations. Then switch to deny mode. Teams that start in deny mode inevitably block a critical deployment, panic-disable Gatekeeper, and never re-enable it.
OPA versus Kyverno
Kyverno is a Kubernetes-native alternative to OPA Gatekeeper that uses YAML policies instead of Rego. For teams that only need Kubernetes admission control and do not plan to use OPA for Terraform or API policies, Kyverno is simpler to adopt and maintain. For teams that want a unified policy engine across their entire infrastructure stack, OPA's flexibility justifies the steeper learning curve. Do not adopt OPA just for Kubernetes if Kyverno meets your needs.
Category 3: Integration Testing with Terratest
Terratest is a Go library for writing automated tests that provision real infrastructure, validate it, and tear it down. Unlike static analysis and policy checks, Terratest tests actually deploy resources to a cloud account and verify that they work correctly.
When Integration Tests Are Worth It
Integration tests are expensive. They take 10 to 30 minutes to run, they create real cloud resources that cost money, and they are inherently flaky because they depend on cloud API availability and timing. You should not write integration tests for everything. Reserve them for:
- Shared modules. If a Terraform module is used by 10 teams, an integration test that validates the module works correctly is justified because a bug in the module affects 10 downstream consumers.
- Complex networking. VPC configurations, peering connections, transit gateways, and DNS configurations are notoriously hard to validate with static analysis. An integration test that creates the network and verifies connectivity catches configuration errors that no amount of code review will find.
- Security-critical resources. IAM roles, security groups, bucket policies, and encryption configurations should be integration-tested to verify that the actual deployed resource matches the intended configuration. A Terraform plan might look correct but the applied resource could behave differently due to provider bugs or API quirks.
- Custom modules with conditional logic. Terraform modules that use count, for_each, or dynamic blocks to create different resource configurations based on input variables have a combinatorial surface area that static analysis cannot fully cover.
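The combinatorial point in the last bullet is easy to make concrete: even a modest module interface multiplies quickly, which is why integration tests target a representative subset rather than every configuration. The variable names below are hypothetical:

```python
from itertools import product

# Hypothetical inputs for a database module with conditional logic.
variables = {
    "create_replica": [True, False],
    "storage_type": ["gp3", "io2"],
    "multi_az": [True, False],
}

# Every distinct configuration the module can produce.
combinations = [dict(zip(variables, values))
                for values in product(*variables.values())]

print(len(combinations))  # 2 * 2 * 2 = 8 configurations from just 3 variables
```

Three boolean-ish inputs already yield eight deployable shapes; real modules with a dozen inputs are why "test every combination" is not a viable plan.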
Terratest in Practice
A Terratest test follows a standard pattern: set up variables, run terraform init and terraform apply, validate the outputs and deployed resources, then run terraform destroy to clean up. The key function is defer terraform.Destroy(t, terraformOptions) which ensures cleanup runs even if the test fails.
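The shape of that pattern, with cleanup guaranteed to run even when validation fails, can be sketched tool-agnostically. This Python version stubs the terraform calls purely to show the control flow; in Go, Terratest gets the same guarantee from defer:

```python
calls = []

def terraform(command):
    # Stub standing in for a real `terraform <command>` invocation.
    calls.append(command)
    if command == "validate":
        raise AssertionError("output validation failed")

def run_module_test():
    terraform("init")
    try:
        terraform("apply")
        terraform("validate")   # inspect outputs / query the cloud API
    finally:
        terraform("destroy")    # like defer terraform.Destroy: always runs

try:
    run_module_test()
except AssertionError:
    pass  # the test failed, but cleanup still happened

print(calls)  # ['init', 'apply', 'validate', 'destroy']
```

The point of the pattern is the last line: destroy appears in the call sequence even though validation raised, so a failing test does not leak billable cloud resources.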
The validation step is where Terratest shines. You can use the AWS, Azure, or GCP SDK wrappers in Terratest to inspect the actual deployed resources. For example, after deploying an S3 bucket, you can verify that encryption is enabled, versioning is active, public access is blocked, and the lifecycle policy is configured correctly -- all by querying the AWS API for the actual bucket configuration.
For networking modules, Terratest includes helpers for checking HTTP connectivity, SSH access, and DNS resolution. You can deploy a VPC with subnets, launch an EC2 instance, and verify that it can reach the internet through a NAT gateway -- an end-to-end validation that catches routing table misconfigurations, NAT gateway issues, and security group problems.
Managing Test Costs and Duration
The practical challenges with Terratest are cost and speed. Here are the strategies I use to keep integration tests sustainable:
- Use a dedicated test account. Create a separate AWS account, Azure subscription, or GCP project for integration tests with aggressive resource cleanup (AWS Nuke or cloud-nuke on a schedule) to prevent orphaned resources.
- Run expensive tests nightly, not on every commit. Static analysis and policy checks should run on every pull request. Integration tests should run nightly or on merge to main. This keeps the developer feedback loop fast for routine changes while still catching integration issues before release.
- Use test stages. Terratest supports test stages that let you skip the deploy/destroy phases during development. You deploy once, iterate on the validation code, and destroy only when done. This reduces the feedback loop from 20 minutes to seconds during test development.
- Parallelize tests. Terratest tests are Go tests, so they support t.Parallel(). Running 10 module tests in parallel instead of sequentially reduces total runtime from 200 minutes to 20 minutes. Use unique resource names (Terratest provides random.UniqueId()) to avoid naming conflicts between parallel tests.
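The test-stage mechanism described above is simpler than it sounds: Terratest's test_structure package skips a stage when a SKIP_<stage-name> environment variable is set. A Python sketch of the same convention (illustrative stand-in, not the Terratest API):

```python
import os

def run_stage(name, fn):
    """Run a test stage unless SKIP_<name> is set in the environment,
    mirroring Terratest's SKIP_<stage> skip convention."""
    if os.environ.get(f"SKIP_{name}"):
        return
    fn()

executed = []

# Infrastructure was deployed in an earlier run; skip deploy and destroy
# while iterating on the validation logic.
os.environ["SKIP_deploy"] = "1"
os.environ["SKIP_destroy"] = "1"
os.environ.pop("SKIP_validate", None)

run_stage("deploy", lambda: executed.append("deploy"))
run_stage("validate", lambda: executed.append("validate"))
run_stage("destroy", lambda: executed.append("destroy"))

print(executed)  # ['validate']
```

Flipping environment variables between runs is what turns a 20-minute deploy-validate-destroy cycle into a seconds-long validate-only loop during test development.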
Terraform test as an alternative
Terraform 1.6+ includes a native terraform test command that supports integration testing with HCL-based test files. It is simpler than Terratest for basic validation scenarios -- you do not need to write Go code. However, it lacks the rich validation capabilities, SDK wrappers, and retry logic that Terratest provides. For simple module validation (verify outputs, check resource counts), terraform test is sufficient. For complex validation (verify actual resource configuration via API, test network connectivity, validate IAM permissions), Terratest is still the better tool.
Category 4: CI/CD Integration Patterns
The testing tools only matter if they run automatically. Here is how to integrate them into a practical CI/CD pipeline.
The Testing Pyramid for Infrastructure
Just like application testing has a pyramid (unit tests at the base, integration tests in the middle, end-to-end tests at the top), infrastructure testing has its own pyramid:
- Base: Static analysis (Checkov + KICS). Runs on every commit and pull request. Completes in under 60 seconds. Catches configuration errors, missing encryption, open security groups, and compliance violations. This is your first line of defense and catches 70 to 80 percent of infrastructure defects.
- Middle: Policy evaluation (OPA on Terraform plan). Runs on every pull request as part of the plan phase. Completes in 2 to 5 minutes (mostly the Terraform plan time). Catches plan-level issues like unauthorized resource types, missing tags, non-compliant instance sizes, and changes that violate organizational policies.
- Top: Integration tests (Terratest). Runs nightly or on merge to main. Completes in 10 to 30 minutes. Catches actual deployment issues, connectivity problems, and configuration drift between what the code says and what the cloud API produces.
GitHub Actions Pipeline Example
A typical GitHub Actions pipeline for Terraform modules runs three jobs: lint (Checkov + KICS + terraform validate), plan (terraform plan + OPA evaluation), and test (Terratest, triggered only on merge to main or nightly schedule). The lint job runs in parallel with the plan job since they do not depend on each other. The test job runs only after both lint and plan succeed.
For the plan job, use a GitHub Actions environment with OIDC authentication to your cloud provider. Do not store long-lived cloud credentials as GitHub secrets. Configure the plan output as a pull request comment so reviewers can see both the Terraform plan and the OPA policy evaluation results directly in the PR.
GitLab CI Integration
GitLab CI provides similar capabilities with its pipeline stages. Use a .pre stage for linting, a plan stage for Terraform plan and OPA evaluation, and a test stage for integration tests, guarded by a rules:if condition that limits them to the main branch or scheduled pipelines. GitLab's merge request approvals can be configured to require that the lint and plan stages pass before allowing merge.
What Not to Test
This is as important as knowing what to test. Infrastructure testing has diminishing returns, and over-testing slows down deployments without proportional quality improvement.
- Do not test the cloud provider. You do not need to verify that aws_s3_bucket creates an S3 bucket. The provider is tested by HashiCorp and the community. Test your configuration of the resource, not the resource itself.
- Do not integration-test simple resources. An S3 bucket with standard configuration does not need a Terratest test. Static analysis covers encryption, versioning, and public access checks. Reserve integration tests for resources where the configuration is complex enough that you cannot be confident from code review alone.
- Do not test the same thing at multiple layers. If Checkov already validates that all S3 buckets have encryption enabled, you do not also need an OPA policy and a Terratest assertion for the same check. Each layer should test different things: static analysis for configuration correctness, policy evaluation for organizational compliance, integration tests for deployment behavior.
- Do not test infrastructure that changes frequently. If a Terraform module is modified weekly, a 20-minute integration test on every change becomes a bottleneck. For rapidly-changing modules, rely on static analysis and policy checks, and run integration tests less frequently (weekly or bi-weekly).
Getting Started: The 30-Day Plan
- Week 1: Add Checkov to your CI pipeline and fail the build on any new finding. Suppress existing findings as your baseline and commit to fixing 10 percent of them per sprint.
- Week 2: Add KICS alongside Checkov and review the findings that Checkov missed.
- Week 3: Write your first three OPA policies for your most critical requirements (encryption, tagging, network exposure) and integrate them into the Terraform plan phase.
- Week 4: Write Terratest tests for your two or three most critical shared modules and run them nightly.
This incremental approach gives you meaningful safety improvements in the first week while building toward a comprehensive testing strategy over the month. Teams that try to implement everything at once typically get overwhelmed by the volume of findings and abandon the effort. Start small, demonstrate value, and expand.
Related Tools
- AWS CloudFormation Template Linter -- Lint and validate CloudFormation templates
- ARM Template Linter -- Lint and validate ARM templates
Written by CloudToolStack Team
Cloud architects with 15+ years of production experience across AWS, Azure, GCP, and OCI. We build free tools and write practical guides to help engineers navigate multi-cloud infrastructure.
Disclaimer: This article is for informational purposes. Cloud services and pricing change frequently; always verify with official provider documentation. AWS, Azure, GCP, and OCI are trademarks of their respective owners.