Configure Glue job bookmarks and transformation parameters.
Last verified: May 2026
Required Fields
jobName, enableBookmarks, bookmarkOption, scriptArguments, scriptArguments.--job-language
The Glue Job Bookmark Config Builder helps you configure AWS Glue job bookmarks for incremental data processing. Job bookmarks track what data has already been processed so subsequent job runs only pick up new data. This tool helps you set up bookmark configurations for different data sources like S3, JDBC, and DynamoDB, with proper transformation context and bookmark key settings to ensure reliable incremental processing.
Resetting a bookmark causes the next job run to reprocess all data from the beginning, as if the job has never run before. This is useful when you need to reprocess historical data or when bookmark state becomes corrupted. You can reset bookmarks via the console, CLI, or API.
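As a rough sketch of the API route, assuming boto3 is installed and using a hypothetical job name (`nightly-s3-ingest`), a reset could look like the following. The client is passed in as an argument so the function can be tried without AWS credentials.

```python
def reset_bookmark(glue_client, job_name):
    """Reset the bookmark for one Glue job and return the API response.

    glue_client is expected to be a boto3 Glue client,
    for example boto3.client("glue").
    """
    # ResetJobBookmark clears the stored bookmark state, so the next
    # run of this job reprocesses its sources from the beginning.
    return glue_client.reset_job_bookmark(JobName=job_name)


# Usage (needs boto3 and AWS credentials; the job name is hypothetical):
#   import boto3
#   reset_bookmark(boto3.client("glue"), "nightly-s3-ingest")
```

The CLI counterpart is `aws glue reset-job-bookmark --job-name nightly-s3-ingest`.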
Job bookmarks work with S3 sources (tracking processed files), JDBC sources (tracking incremental rows via bookmark keys), and some other native Glue connectors. Custom connectors may need their own state tracking: the public Glue API offers GetJobBookmark and ResetJobBookmark for inspecting and clearing bookmark state, but no call for setting it directly.
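For the JDBC case, bookmark keys are passed to the read through `additional_options`. The helper below is a minimal sketch that only builds that dictionary. The function name, the `order_id` column, and the database and table names in the comment are illustrative assumptions.

```python
def jdbc_bookmark_options(keys, sort_order="asc"):
    """Build the additional_options entries for a bookmarked JDBC read."""
    if not keys:
        raise ValueError("at least one bookmark key column is required")
    if sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    return {
        "jobBookmarkKeys": list(keys),
        "jobBookmarkKeysSortOrder": sort_order,
    }


options = jdbc_bookmark_options(["order_id"])  # hypothetical key column

# Inside a Glue script the options would be passed to the read, e.g.:
#   frame = glueContext.create_dynamic_frame.from_catalog(
#       database="sales_db",            # hypothetical
#       table_name="orders",            # hypothetical
#       additional_options=options,
#       transformation_ctx="nightly_orders_jdbc_orders",
#   )
```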
Your team's Glue ETL job processes new files in an S3 bucket nightly. Last night it reprocessed 6 months of data, blowing past your DPU budget. Investigation: someone edited the source `create_dynamic_frame.from_catalog` call and changed its transformation_ctx along the way. Glue found no bookmark state under the new value, treated the read as a new source, and started from the beginning. The builder helps you generate consistent transformation_ctx values per source (typically `<job_name>_<source_id>`) so future changes don't trigger silent bookmark resets.
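The `<job_name>_<source_id>` convention can be sketched as a small deterministic helper. This is not the builder's actual code. The function name is hypothetical, and replacing non-alphanumeric characters with underscores is a tidiness choice made here, not a documented Glue requirement.

```python
import re


def make_transformation_ctx(job_name, source_id):
    """Derive a stable transformation_ctx from a job name and a source id.

    The same inputs always give the same string, so editing other parts
    of the read call does not change the bookmark identity.
    """
    raw = f"{job_name}_{source_id}"
    # Assumption: normalise anything outside [A-Za-z0-9_] to "_".
    return re.sub(r"[^A-Za-z0-9_]", "_", raw)


ctx = make_transformation_ctx("nightly-ingest", "s3_raw_events")
print(ctx)  # nightly_ingest_s3_raw_events
```

Keeping the value derived from stable identifiers, instead of typing it by hand at each call site, makes an accidental rename less likely.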
The builder generates Glue job arguments and DynamicFrame read configuration for bookmark behavior: --job-bookmark-option (job-bookmark-enable / job-bookmark-disable / job-bookmark-pause), transformation_ctx values per source, and bookmark keys for JDBC sources. Output is a JSON job argument block ready to pass to `aws glue start-job-run` or to include in a CDK/Terraform Glue job definition.
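To make the shape of that argument block concrete, here is a minimal sketch that validates the bookmark option and prints the JSON. The function name is hypothetical and the tool's real output may contain more fields.

```python
import json

BOOKMARK_OPTIONS = {
    "job-bookmark-enable",
    "job-bookmark-disable",
    "job-bookmark-pause",
}


def build_job_arguments(bookmark_option, extra_args=None):
    """Return a dict of Glue job arguments with a validated bookmark option."""
    if bookmark_option not in BOOKMARK_OPTIONS:
        raise ValueError(f"unknown bookmark option: {bookmark_option!r}")
    args = dict(extra_args or {})
    # Set the bookmark option last so extra_args cannot override it.
    args["--job-bookmark-option"] = bookmark_option
    return args


arguments = build_job_arguments(
    "job-bookmark-enable", {"--job-language": "python"}
)
print(json.dumps(arguments, indent=2))
```

The printed JSON map is the kind of value that `aws glue start-job-run` accepts for its `--arguments` parameter.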
transformation_ctx is the unique identifier Glue uses to track each bookmark. If you reuse the same value across multiple sources or jobs, bookmarks will collide and silently corrupt each other's state. Each DynamicFrame read should have its own transformation_ctx; never copy-paste them across jobs.
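A cheap guard against reuse is to collect the transformation_ctx values a script uses and check them for duplicates before deploying. The sketch below assumes you can list those values yourself. The function name and sample values are hypothetical.

```python
from collections import Counter


def find_duplicate_ctx(ctx_values):
    """Return the transformation_ctx values that appear more than once."""
    counts = Counter(ctx_values)
    return sorted(value for value, n in counts.items() if n > 1)


used = [
    "nightly_ingest_s3_raw_events",
    "nightly_ingest_jdbc_orders",
    "nightly_ingest_s3_raw_events",  # copy-pasted by mistake
]
duplicates = find_duplicate_ctx(used)
print(duplicates)  # ['nightly_ingest_s3_raw_events']
```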
JDBC bookmarks require monotonically increasing bookmark keys (typically a primary key or a created_at timestamp). If your source can have backfilled or out-of-order rows (e.g., async writes), the bookmark will already have moved past their key values, and those rows will be skipped on the next run. For such cases, set --job-bookmark-option to job-bookmark-pause and use custom watermark logic instead.
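One possible shape for such custom watermark logic is a lookback window combined with de-duplication on the primary key. This is a plain-Python sketch, not a Glue feature. The row layout, the one-hour lookback, and the idea of persisting `seen_ids` somewhere are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def select_new_rows(rows, watermark, lookback, seen_ids):
    """Pick unprocessed rows newer than (watermark - lookback).

    Re-reading a window behind the watermark catches late-arriving rows.
    seen_ids filters out rows that were already processed in that window.
    Returns the picked rows and the advanced watermark.
    """
    cutoff = watermark - lookback
    picked = [
        r for r in rows
        if r["created_at"] > cutoff and r["id"] not in seen_ids
    ]
    new_watermark = max([watermark] + [r["created_at"] for r in picked])
    return picked, new_watermark


rows = [
    {"id": 1, "created_at": datetime(2026, 5, 1, 10, 0)},   # already processed
    {"id": 2, "created_at": datetime(2026, 5, 1, 9, 30)},   # late arrival
    {"id": 3, "created_at": datetime(2026, 5, 1, 10, 30)},  # new
    {"id": 4, "created_at": datetime(2026, 5, 1, 7, 0)},    # outside lookback
]
picked, new_watermark = select_new_rows(
    rows, datetime(2026, 5, 1, 10, 0), timedelta(hours=1), seen_ids={1}
)
print([r["id"] for r in picked], new_watermark)
```

Rows older than the lookback window (id 4 here) are still missed, so the window has to be sized to the worst delay you expect from the source.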
Resetting a job bookmark via the console only clears the bookmark for the next run — it doesn't roll back any state already written downstream. If you need to reprocess and the target is idempotent (e.g., S3 with overwrite), this is fine. For non-idempotent targets, you also need to clean up the previous output.
Disclaimer: This tool runs entirely in your browser. No data is sent to our servers. Always verify outputs before using them in production. AWS, Azure, and GCP are trademarks of their respective owners.