OCI Data Flow Application Builder

Build Data Flow (Spark) application configurations with driver/executor shapes and parameters.

Last verified: May 2026

OCI Data Flow Configuration

Build Data Flow (Spark) application configurations with driver/executor shapes, parameters, and private endpoints.

Required Fields

compartmentId, displayName, language, sparkVersion, fileUri, driverShape, executorShape, numExecutors
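A quick pre-flight check for the required fields can be sketched in a few lines of Python. This is a minimal sketch, not the builder's own validation: the function name missing_required_fields is hypothetical, and it only checks that each key is present and non-empty, not that the value is valid.

```python
# Required keys as listed on this page.
REQUIRED_FIELDS = (
    "compartmentId", "displayName", "language", "sparkVersion",
    "fileUri", "driverShape", "executorShape", "numExecutors",
)


def missing_required_fields(config: dict) -> list:
    """Return the required keys that are absent, None, or empty strings."""
    return [key for key in REQUIRED_FIELDS if config.get(key) in (None, "")]


# A deliberately incomplete configuration for illustration.
partial_config = {
    "compartmentId": "ocid1.compartment.oc1..aaaaaaaexample",
    "displayName": "daily-sales-etl-spark",
    "language": "PYTHON",
}

print(missing_required_fields(partial_config))
```

Running the check before generating output keeps an incomplete configuration from reaching the CLI or Terraform step.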

{
  "compartmentId": "ocid1.compartment.oc1..aaaaaaaexample",
  "displayName": "daily-sales-etl-spark",
  "description": "Spark application for daily sales data ETL from Object Storage to ADW",
  "language": "PYTHON",
  "sparkVersion": "3.5.0",
  "fileUri": "oci://spark-apps@my-tenancy/etl/sales_etl.py",
  "archiveUri": "oci://spark-apps@my-tenancy/etl/dependencies.zip",
  "arguments": [
    "--input", "oci://raw-data@my-tenancy/sales/",
    "--output", "oci://processed-data@my-tenancy/sales/",
    "--date", "${date}",
    "--format", "parquet"
  ],
  "parameters": [
    { "name": "date", "value": "2026-05-01" }
  ],
  "configuration": {
    "spark.sql.shuffle.partitions": "200",
    "spark.sql.adaptive.enabled": "true",
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "20"
  },
  "driverShape": "VM.Standard.E4.Flex",
  "driverShapeConfig": {
    "ocpus": 4,
    "memoryInGBs": 32
  },
  "executorShape": "VM.Standard.E4.Flex",
  "executorShapeConfig": {
    "ocpus": 4,
    "memoryInGBs": 32
  },
  "numExecutors": 5,
  "warehouseBucketUri": "oci://dataflow-warehouse@my-tenancy/",
  "logsBucketUri": "oci://dataflow-logs@my-tenancy/",
  "privateEndpointId": "ocid1.dataflowprivateendpoint.oc1.iad.aaaaaaaexample",
  "metastoreId": "ocid1.datacatalogmetastore.oc1.iad.aaaaaaaexample",
  "maxDurationInMinutes": 180,
  "idleTimeoutInMinutes": 30,
  "freeformTags": {
    "pipeline": "sales-etl",
    "team": "data-engineering"
  }
}
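The arguments array above reaches the application as ordinary command-line arguments, with ${date} acting as a placeholder for a named parameter. The sketch below shows one way the script might consume them. It is an illustration only: the contents of sales_etl.py are not shown on this page, so parse_job_args and resolve_placeholders are hypothetical helpers, the date value is made up, and the substitution is done locally with string.Template to mimic what the service is assumed to do before launching the run.

```python
import argparse
from string import Template


def resolve_placeholders(arguments, parameters):
    """Replace ${name} placeholders with parameter values.

    Raises KeyError in this sketch if a placeholder has no matching parameter.
    """
    return [Template(arg).substitute(parameters) for arg in arguments]


def parse_job_args(argv):
    """Parse the four options used in the sample configuration."""
    parser = argparse.ArgumentParser(description="Hypothetical sales ETL entry point")
    parser.add_argument("--input", required=True, help="source URI")
    parser.add_argument("--output", required=True, help="destination URI")
    parser.add_argument("--date", required=True, help="processing date")
    parser.add_argument("--format", default="parquet", help="output file format")
    return parser.parse_args(argv)


arguments = [
    "--input", "oci://raw-data@my-tenancy/sales/",
    "--output", "oci://processed-data@my-tenancy/sales/",
    "--date", "${date}",
    "--format", "parquet",
]

# Illustrative parameter value, not taken from a real run.
resolved = resolve_placeholders(arguments, {"date": "2026-05-01"})
args = parse_job_args(resolved)
print(args.input, args.output, args.date, args.format)
```

Keeping the date as a parameter lets the same application definition be reused for each nightly run with only the value changing.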

How This Tool Works

The builder constructs an OCI Data Flow application configuration from the following inputs: the application resource (compartment, display name, Spark version, and language: SCALA, PYTHON, JAVA, or SQL), the file URI (the application JAR, Python, or SQL file in Object Storage), the driver shape, the executor shape and executor count, parameter overrides, and the warehouse Object Storage bucket for output. Output is generated as oci data-flow CLI commands and as Terraform oci_dataflow_application and oci_dataflow_invoke_run resources.
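The step from configuration to CLI command can be sketched as follows. This is not the builder's actual generator: render_create_command and to_flag are hypothetical names, and the sketch assumes each CLI flag is the kebab-case form of the API field name (compartmentId becomes --compartment-id). Confirm the flags against oci data-flow application create --help before running the result.

```python
import re
import shlex


def to_flag(field: str) -> str:
    """Convert a camelCase field name to a kebab-case flag (assumed mapping)."""
    return "--" + re.sub(r"(?<!^)(?=[A-Z])", "-", field).lower()


def render_create_command(config: dict, fields) -> str:
    """Build a shell-quoted command string from the selected config fields."""
    parts = ["oci", "data-flow", "application", "create"]
    for field in fields:
        parts += [to_flag(field), str(config[field])]
    return shlex.join(parts)


config = {
    "compartmentId": "ocid1.compartment.oc1..aaaaaaaexample",
    "displayName": "daily-sales-etl-spark",
    "language": "PYTHON",
    "sparkVersion": "3.5.0",
    "fileUri": "oci://spark-apps@my-tenancy/etl/sales_etl.py",
    "driverShape": "VM.Standard.E4.Flex",
    "executorShape": "VM.Standard.E4.Flex",
    "numExecutors": 5,
}

print(render_create_command(config, list(config)))
```

Nested values such as driverShapeConfig are left out on purpose; they need JSON-valued arguments, which this sketch does not attempt to render.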

Overview

This tool helps OCI engineers generate Data Flow (Spark) application configurations quickly, with fewer trips to the documentation, which reduces errors and speeds up infrastructure deployment. All processing runs in your browser; no data is sent to external servers.

How Engineers Use This

• Designing a Data Flow Application configuration for a new OCI workload where the team wants the spec reviewed before any console click happens.
• Sketching a Data Flow Application migration across OCI compartments with policy and tag-namespace implications spelled out.
• Drafting a Data Flow Application change for a Terraform or Resource Manager stack before raising the pull request, so reviewers see intent in plain text.
• Onboarding a new engineer by giving them a Data Flow Application sandbox to iterate on without touching live tenancies.

A Real Example

Your data team's nightly Spark ETL runs on an always-on EMR cluster that costs $1,800/month, yet most jobs run for only 2-3 hours. The builder generates a Data Flow application: PySpark code in Object Storage, a VM.Standard.E5.Flex driver with 4 OCPUs, and five VM.Standard.E5.Flex executors with 8 OCPUs each, run on demand and triggered by Object Storage events. New monthly cost: about $200, because compute is billed for 2-3 hours a day instead of 24/7. Annual savings: about $19K ($1,600 × 12), plus no more cluster maintenance.
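The savings figure follows directly from the two monthly costs quoted in the example. A short check of the arithmetic, using only those numbers:

```python
# Monthly costs quoted in the example above.
always_on_monthly = 1800   # always-on cluster, USD per month
on_demand_monthly = 200    # Data Flow runs, USD per month (approximate)

monthly_savings = always_on_monthly - on_demand_monthly
annual_savings = monthly_savings * 12
reduction = monthly_savings / always_on_monthly

print(f"Monthly savings: ${monthly_savings:,}")
print(f"Annual savings:  ${annual_savings:,}")
print(f"Cost reduction:  {reduction:.0%}")
```

The exact annual figure is $19,200, which the example rounds to $19K.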

Tips & Gotchas

TIP

OCI Data Flow is managed Spark, which removes the operational burden of running Spark clusters. You submit a job, Data Flow provisions the cluster, runs the job, and tears the cluster down. You pay only for actual job runtime. This is considerably simpler than operating your own clusters on EMR or Dataproc.

TIP

Driver and executor shape choice drives cost. Small workloads: VM.Standard.E5.Flex with 4 OCPUs. Large analytics: VM.Standard.E5.Flex with 16 or more OCPUs and 128 GB RAM. GPU workloads (rare for Spark): a GPU shape from the BM.GPU.A10 family, provided Data Flow offers it in your region. Right-size from the actual job profile rather than over-provisioning 'just in case'.
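When right-sizing, it helps to see the total footprint a configuration requests. The sketch below adds up OCPUs and memory for one driver plus its executors, using the shape values from the sample configuration on this page (4 OCPUs and 32 GB each, 5 executors). ShapeConfig and run_footprint are hypothetical names, the 3-hour runtime is the upper bound from the example above, and no prices are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeConfig:
    ocpus: int
    memory_gb: int


def run_footprint(driver: ShapeConfig, executor: ShapeConfig, num_executors: int):
    """Return (total OCPUs, total memory in GB) for one driver plus N executors."""
    total_ocpus = driver.ocpus + executor.ocpus * num_executors
    total_memory_gb = driver.memory_gb + executor.memory_gb * num_executors
    return total_ocpus, total_memory_gb


driver = ShapeConfig(ocpus=4, memory_gb=32)
executor = ShapeConfig(ocpus=4, memory_gb=32)

ocpus, memory_gb = run_footprint(driver, executor, num_executors=5)
ocpu_hours = ocpus * 3  # assumed 3-hour run

print(f"{ocpus} OCPUs, {memory_gb} GB, {ocpu_hours} OCPU-hours per run")
```

Multiplying OCPU-hours by the current rate for the chosen shape gives a per-run estimate to compare against the job's actual profile.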

TIP

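For fault-tolerant Spark workloads, consider discounted interruptible capacity where Data Flow offers it (OCI Compute calls this preemptible capacity rather than Spot). Spark recovers from a lost executor by re-running the affected tasks, so preemption costs time rather than correctness. OCI Compute prices preemptible capacity at 50% below on-demand; confirm availability and the current discount for Data Flow before budgeting around it.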

Questions & Answers

Can I use this Data Flow Application builder for OCI Government Cloud or sovereign regions?

Most Data Flow Application primitives behave the same in commercial and Government Cloud OCI, but the OCID realm differs, region availability is limited, and a handful of services are unavailable. The output is portable in shape; you must adjust realm and verify service availability before applying in a Government Cloud tenancy.
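One way to catch realm mismatches before applying a configuration is to read the realm out of each OCID. The sketch below assumes the documented OCID layout, ocid1.<resource type>.<realm>.[region].<unique id>, where the commercial realm is oc1; ocid_realm and ocids_outside_realm are hypothetical helper names, and only top-level string values are inspected.

```python
def ocid_realm(ocid: str) -> str:
    """Return the realm segment (third dot-separated field) of an OCID."""
    parts = ocid.split(".")
    if len(parts) < 5 or parts[0] != "ocid1":
        raise ValueError(f"not a recognisable OCID: {ocid!r}")
    return parts[2]


def ocids_outside_realm(config: dict, expected_realm: str) -> dict:
    """Map field name to OCID for top-level values whose realm differs."""
    return {
        key: value
        for key, value in config.items()
        if isinstance(value, str)
        and value.startswith("ocid1.")
        and ocid_realm(value) != expected_realm
    }


config = {
    "compartmentId": "ocid1.compartment.oc1..aaaaaaaexample",
    "privateEndpointId": "ocid1.dataflowprivateendpoint.oc1.iad.aaaaaaaexample",
    "metastoreId": "ocid1.datacatalogmetastore.oc1.iad.aaaaaaaexample",
    "displayName": "daily-sales-etl-spark",
}

# Checking the sample (commercial) OCIDs against a Government Cloud realm.
print(ocids_outside_realm(config, expected_realm="oc2"))
```

Every OCID in the sample configuration is reported here, which is the expected result when a commercial configuration is checked against a different realm.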

Will the output of this Data Flow Application builder pass OCI Resource Manager validation or `terraform validate`?

It produces structurally valid output for the OCI schemas it supports. We still recommend running provider validation locally before applying — schemas evolve and a recently-released property may not yet be reflected. When validation does fail, the error points at the exact attribute the schema rejected.

Disclaimer: This tool runs entirely in your browser. No data is sent to our servers. Always verify outputs before using them in production. AWS, Azure, and GCP are trademarks of their respective owners.