Build Data Flow (Spark) application configurations with driver/executor shapes and parameters.
Last verified: May 2026
Required Fields
compartmentId
displayName
language
sparkVersion
fileUri
driverShape
executorShape
numExecutors

The builder constructs OCI Data Flow application configurations from these inputs: the application resource (compartment, name, Spark version, and language: SPARK_SCALA / SPARK_PYTHON / SPARK_JAVA / SPARK_SQL), the file URI (the application JAR, Python, or SQL file in Object Storage), driver shape + count, executor shape + count, parameter overrides, and the warehouse Object Storage bucket for output. Output is generated as oci data-flow commands and Terraform oci_dataflow_application + oci_dataflow_invoke_run resources.
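As a rough illustration of the required-field check described above, the following minimal sketch assembles a configuration and rejects incomplete input. The field names come from the list above; the function name build_application_config, the placeholder OCID, the Spark version string, and the Object Storage URI are all assumptions made for the example, not the tool's actual implementation or output format.

```python
import json

# Field names copied from the "Required Fields" list above.
REQUIRED_FIELDS = (
    "compartmentId", "displayName", "language", "sparkVersion",
    "fileUri", "driverShape", "executorShape", "numExecutors",
)


def build_application_config(**fields):
    """Hypothetical helper: return a config dict or raise ValueError."""
    # A field counts as missing if it is absent, None, or an empty string.
    missing = [name for name in REQUIRED_FIELDS if fields.get(name) in (None, "")]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    # The executor count must be a whole number of at least one.
    if not isinstance(fields["numExecutors"], int) or fields["numExecutors"] < 1:
        raise ValueError("numExecutors must be a positive integer")
    return dict(fields)


example = build_application_config(
    compartmentId="ocid1.compartment.oc1..example",  # placeholder, not a real OCID
    displayName="nightly-etl",
    language="SPARK_PYTHON",                         # label as written in the text above
    sparkVersion="3.5.0",                            # assumed version string
    fileUri="oci://bucket@namespace/etl.py",         # assumed bucket and namespace
    driverShape="VM.Standard.E5.Flex",
    executorShape="VM.Standard.E5.Flex",
    numExecutors=5,
)
print(json.dumps(example, indent=2))
```

Validating before rendering keeps a half-filled form from producing a configuration that only fails later at deploy time.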
This tool helps OCI engineers generate valid configurations quickly without consulting documentation, reducing errors and accelerating infrastructure deployment. All processing runs in your browser with no data sent to external servers.
Your data team's nightly Spark ETL runs on an always-on EMR cluster that costs $1,800/month, yet most jobs run for only 2-3 hours. The builder generates a Data Flow application instead: PySpark code in Object Storage, a VM.Standard.E5.Flex driver with 4 OCPUs, and five VM.Standard.E5.Flex executors with 8 OCPUs each, run on demand and triggered by Object Storage events. New monthly cost: ~$200 (2-3 hours/day instead of 24/7). Annual savings: about $19K ($1,600/month × 12), plus no more EMR cluster maintenance.
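The arithmetic behind this scenario can be checked with a short sketch. The dollar amounts, OCPU counts, and runtime hours are the ones quoted in the scenario; the variable names are illustrative only, and the figures are the scenario's estimates rather than OCI list prices.

```python
# Figures quoted in the scenario above (estimates, not list prices).
always_on_monthly_usd = 1800
on_demand_monthly_usd = 200

monthly_savings_usd = always_on_monthly_usd - on_demand_monthly_usd
annual_savings_usd = monthly_savings_usd * 12

# Cluster size per run: one driver plus five executors.
driver_ocpus = 4
executor_ocpus = 8
num_executors = 5
total_ocpus = driver_ocpus + executor_ocpus * num_executors

# Share of the day the job actually runs, at the upper bound of 3 hours.
runtime_fraction = 3 / 24

print(monthly_savings_usd)  # 1600
print(annual_savings_usd)   # 19200
print(total_ocpus)          # 44
print(runtime_fraction)     # 0.125
```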
OCI Data Flow is managed Spark: it removes the operational burden of running Spark clusters. Submit job → Data Flow provisions cluster → runs job → tears down. You pay only for actual job runtime, which is far simpler than operating an always-on EMR or Dataproc cluster yourself.
Driver and executor shape choice drives cost. Small workloads: VM.Standard.E5.Flex with 4 OCPU. Large analytics: VM.Standard.E5.Flex with 16+ OCPU and 128 GB RAM. GPU workloads (rare for Spark): BM.GPU.A10. Right-size based on the actual job profile rather than over-provisioning "just in case".
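The sizing guidance above can be expressed as a small lookup. This is only a sketch: the profile labels ("small", "large", "gpu") and the function name suggest_shape are hypothetical, and the shape names and sizes are taken directly from the paragraph above rather than from a verified list of shapes that Data Flow supports.

```python
# Sizing guidance from the paragraph above, keyed by a hypothetical profile label.
# None means the text gives no figure for that setting.
SHAPE_GUIDE = {
    "small": {"shape": "VM.Standard.E5.Flex", "ocpus": 4, "memory_gb": None},
    "large": {"shape": "VM.Standard.E5.Flex", "ocpus": 16, "memory_gb": 128},
    "gpu": {"shape": "BM.GPU.A10", "ocpus": None, "memory_gb": None},
}


def suggest_shape(profile):
    """Return a copy of the suggested shape settings for a workload profile."""
    try:
        return dict(SHAPE_GUIDE[profile])
    except KeyError:
        known = ", ".join(sorted(SHAPE_GUIDE))
        raise ValueError(f"unknown profile {profile!r}; expected one of: {known}")


print(suggest_shape("small"))
print(suggest_shape("large"))
```

Treat the returned values as a starting point and adjust them after profiling a real run.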
Use Spot instances for fault-tolerant Spark workloads — Data Flow handles preemption gracefully by re-running affected tasks. Combined with Spot pricing (60% discount), this can cut large analytics job costs significantly.
The tool does not need access to your cloud account. It runs entirely in your browser and generates configuration JSON that you can copy and paste into your infrastructure-as-code templates, CLI commands, or cloud console. It never connects to any cloud account or sends data externally.
The tool produces syntactically valid configurations based on current OCI service specifications. Always review generated configs against your organization's security policies and test them in a non-production environment before deploying.
Disclaimer: This tool runs entirely in your browser. No data is sent to our servers. Always verify outputs before using them in production. AWS, Azure, and GCP are trademarks of their respective owners.