OCI Data Science Pipeline Builder

Build Data Science pipeline step configurations with job dependencies and infrastructure settings.

Last verified: May 2026

OCI Data Science Configuration

Build Data Science pipeline step configurations with job dependencies, infrastructure, and environment settings.

Required Fields

compartmentId, projectId, displayName, pipelineSteps

{
  "compartmentId": "ocid1.compartment.oc1..aaaaaaaexample",
  "projectId": "ocid1.datascienceproject.oc1.iad.aaaaaaaexample",
  "displayName": "ml-training-pipeline",
  "description": "End-to-end ML training pipeline with preprocessing, training, and evaluation",
  "pipelineSteps": [
    {
      "stepName": "data-preprocessing",
      "stepType": "ML_JOB",
      "description": "Clean and transform raw data",
      "jobId": "ocid1.datasciencejob.oc1.iad.aaaaaaaexample-preprocess",
      "stepConfigurationDetails": {
        "environmentVariables": {
          "INPUT_BUCKET": "raw-data",
          "OUTPUT_BUCKET": "processed-data",
          "SAMPLE_RATE": "1.0"
        },
        "commandLineArguments": "--format parquet --partition-by date",
        "maximumRuntimeInMinutes": 120
      },
      "stepInfrastructureConfigurationDetails": {
        "shapeName": "VM.Standard.E4.Flex",
        "shapeConfigDetails": {
          "ocpus": 8,
          "memoryInGBs": 128
        },
        "blockStorageSizeInGBs": 200,
        "subnetId": "ocid1.subnet.oc1.iad.aaaaaaaexample"
      }
    },
    {
      "stepName": "model-training",
      "stepType": "ML_JOB",
      "description": "Train the ML model with processed data",
      "jobId": "ocid1.datasciencejob.oc1.iad.aaaaaaaexample-train",
      "dependsOn": ["data-preprocessing"],
      "stepConfigurationDetails": {
        "environmentVariables": {
          "MODEL_TYPE": "xgboost",
          "EPOCHS": "100",
          "LEARNING_RATE": "0.01"
        },
        "maximumRuntimeInMinutes": 360
      },
      "stepInfrastructureConfigurationDetails": {
        "shapeName": "VM.GPU3.1",
        "blockStorageSizeInGBs": 500,
        "subnetId": "ocid1.subnet.oc1.iad.aaaaaaaexample"
      }
    },
    {
      "stepName": "model-evaluation",
      "stepType": "ML_JOB",
      "description": "Evaluate model metrics and register if passing threshold",
      "jobId": "ocid1.datasciencejob.oc1.iad.aaaaaaaexample-eval",
      "dependsOn": ["model-training"],
      "stepConfigurationDetails": {
        "environmentVariables": {
          "MIN_ACCURACY": "0.85",
          "REGISTER_MODEL": "true"
        },
        "maximumRuntimeInMinutes": 30
      },
      "stepInfrastructureConfigurationDetails": {
        "shapeName": "VM.Standard.E4.Flex",
        "shapeConfigDetails": {
          "ocpus": 4,
          "memoryInGBs": 64
        },
        "blockStorageSizeInGBs": 100
      }
    }
  ],
  "logConfiguration": {
    "logGroupId": "ocid1.loggroup.oc1.iad.aaaaaaaexample",
    "logId": "ocid1.log.oc1.iad.aaaaaaaexample"
  },
  "freeformTags": {
    "project": "ml-platform",
    "team": "data-science"
  }
}
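Before you hand a configuration like this to the builder or to OCI, a quick local check can catch structural mistakes. The following is a minimal Python sketch (3.9+, standard library only) that assumes the builder's input schema shown above. It confirms the required fields are present, that every dependsOn entry names an existing stepName, and that the dependencies contain no cycle, then returns one valid execution order. The function name validate_pipeline is hypothetical and is not part of the OCI SDK or CLI.

```python
from graphlib import CycleError, TopologicalSorter

# Top-level keys the builder lists under "Required Fields".
REQUIRED_FIELDS = ("compartmentId", "projectId", "displayName", "pipelineSteps")


def validate_pipeline(config: dict) -> list[str]:
    """Validate a builder-style pipeline config and return one valid step order.

    Raises ValueError on the first problem found: a missing required field,
    a duplicate stepName, a dependsOn entry naming an unknown step, or a cycle.
    """
    missing = [key for key in REQUIRED_FIELDS if not config.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    steps = config["pipelineSteps"]
    names = [step["stepName"] for step in steps]
    if len(names) != len(set(names)):
        raise ValueError("stepName values must be unique")

    # Map each step to the set of steps it must wait for.
    graph: dict[str, set[str]] = {}
    for step in steps:
        deps = set(step.get("dependsOn", []))
        unknown = deps - set(names)
        if unknown:
            raise ValueError(f"{step['stepName']} depends on unknown steps: {sorted(unknown)}")
        graph[step["stepName"]] = deps

    try:
        # static_order() yields each step only after all of its dependencies.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
```

For the example above, the three steps form a simple chain, so the only valid order is data-preprocessing, model-training, model-evaluation.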

About This Tool

Build Data Science pipeline step configurations with job dependencies and infrastructure settings. This tool helps OCI engineers generate well-formed configurations quickly with less back-and-forth in the documentation, reducing errors and speeding up infrastructure deployment. All processing runs in your browser with no data sent to external servers.

Real-World Scenario

Your data science team is repeatedly running the same training workflow (data prep → train → validate → register in the model catalog) by manually launching individual jobs. The builder generates a Pipeline definition with 4 steps, proper dependencies, and infrastructure right-sized per step (small VM for prep, GPU for training, CPU for validation, small VM for registration). Now the team triggers an entire workflow run with one command. Each run is recorded, so when a production model misbehaves, the team can trace it back to the exact training run and its parameters, and to the data version when that is passed in as a parameter.

When to Use This Tool

• Designing a Data Science Pipeline configuration for a new OCI workload where the team wants the spec reviewed before any console click happens.
• Comparing two candidate Data Science Pipeline layouts side by side to pick the one with the cleaner compartment boundaries.
• Onboarding a new engineer by giving them a Data Science Pipeline sandbox to iterate on without touching live tenancies.
• Building a Data Science Pipeline reference implementation for a customer where you cannot rely on their tenancy access.

Pro Tips

TIP

Pipelines orchestrate ML workflows: data prep → training → validation → deployment. Each step runs on its own infrastructure (for example, CPU for data prep, GPU for training, CPU for validation). Without pipelines, you would coordinate jobs and pass artifacts between them by hand, which is error-prone and hard to reproduce.

TIP

Use parameter overrides to run the same pipeline with different hyperparameters. The base pipeline definition stays stable while run-time parameters control learning rates, batch sizes, and model architectures. This is a common way for teams to run hyperparameter sweeps at scale without editing the pipeline itself.
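The merge behind a run-time override can be sketched locally. In this minimal Python sketch, run-time values win over the base definition: environment variables are merged key by key, and any other key in stepConfigurationDetails is replaced outright. The function apply_run_overrides and the shape of its overrides argument are hypothetical and only illustrate the idea; check the OCI Data Science documentation for how pipeline runs actually accept overrides.

```python
import copy


def apply_run_overrides(pipeline: dict, overrides: dict[str, dict]) -> dict:
    """Return a copy of the pipeline with per-step configuration overrides applied.

    `overrides` maps stepName -> partial stepConfigurationDetails (hypothetical
    shape). Environment variables are merged key by key; every other key
    replaces the base value. The input pipeline is never modified.
    """
    result = copy.deepcopy(pipeline)
    steps = {step["stepName"]: step for step in result["pipelineSteps"]}
    for name, patch in overrides.items():
        if name not in steps:
            raise KeyError(f"no step named {name!r}")
        base = steps[name].setdefault("stepConfigurationDetails", {})
        for key, value in patch.items():
            if key == "environmentVariables":
                base.setdefault(key, {}).update(value)
            else:
                base[key] = copy.deepcopy(value)
    return result


if __name__ == "__main__":
    base = {
        "pipelineSteps": [
            {
                "stepName": "model-training",
                "stepConfigurationDetails": {
                    "environmentVariables": {"MODEL_TYPE": "xgboost", "LEARNING_RATE": "0.01"}
                },
            }
        ]
    }
    # One run configuration per learning rate; the base definition never changes.
    for lr in ("0.01", "0.05", "0.1"):
        run = apply_run_overrides(base, {"model-training": {"environmentVariables": {"LEARNING_RATE": lr}}})
        print(run["pipelineSteps"][0]["stepConfigurationDetails"]["environmentVariables"])
```

Deep-copying the base keeps each sweep iteration independent, so a mistake in one run's overrides cannot leak into the stable definition.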

TIP

Every pipeline run is recorded as its own execution, which gives you lineage. When a model misbehaves in production, you can trace it back to the pipeline run and parameters that produced it, and to the data version if the run recorded one (for example, as a parameter). Without this record, debugging an ML model becomes much harder.
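Lineage is only as complete as what each run records. As an illustration, here is a minimal Python sketch that captures the inputs of one run, including a data version you supply yourself, and derives a SHA-256 fingerprint from a canonical JSON encoding so that identical inputs always yield the same fingerprint. build_run_manifest and the dataVersion field are hypothetical names; where you store the manifest is up to you.

```python
import hashlib
import json


def build_run_manifest(pipeline_name: str, step_env: dict[str, dict[str, str]], data_version: str) -> dict:
    """Capture what a run used, plus a deterministic fingerprint of those inputs.

    step_env maps stepName -> environment variables used by that step in this run.
    data_version is whatever identifier your team uses for the input data
    (hypothetical: an Object Storage prefix, a snapshot tag, a commit hash).
    """
    inputs = {
        "pipeline": pipeline_name,
        "steps": step_env,
        "dataVersion": data_version,
    }
    # sort_keys (applied at every nesting level) plus compact separators give
    # the same bytes for the same inputs, regardless of dict insertion order.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    fingerprint = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**inputs, "fingerprint": fingerprint}
```

Two runs with the same fingerprint used the same parameters and the same declared data version, which narrows a production investigation to the code and the environment.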

How It Works Under the Hood

The builder constructs OCI Data Science pipeline configurations in two layers. The pipeline resource carries the compartment, the project association, and configuration overrides. Each pipeline step carries a step type (ML_JOB or CUSTOM_SCRIPT), a dependency array referencing prerequisite step names, custom container image references, infrastructure configuration (shape, OCPU and memory sizing for flexible shapes, and block storage size), environment variables, and parameter overrides. Output is generated as oci data-science pipeline CLI commands and Terraform oci_datascience_pipeline resources.
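The dependency arrays determine which steps must wait and which are free to start together. The minimal Python sketch below groups steps into stages, where every step in a stage depends only on steps in earlier stages, so steps that share a stage have no dependency on each other. execution_stages is a hypothetical helper that illustrates the ordering rule; it is not a description of how the OCI service schedules runs.

```python
def execution_stages(steps: list[dict]) -> list[list[str]]:
    """Group steps into stages; each stage depends only on earlier stages."""
    # Remaining steps and the prerequisites each one still waits on.
    pending = {step["stepName"]: set(step.get("dependsOn", [])) for step in steps}
    done: set[str] = set()
    stages: list[list[str]] = []
    while pending:
        # A step is ready once all of its prerequisites have finished.
        ready = sorted(name for name, deps in pending.items() if deps <= done)
        if not ready:
            # Nothing can start: a cycle or a reference to a missing step.
            raise ValueError(f"unresolvable dependencies among: {sorted(pending)}")
        stages.append(ready)
        done.update(ready)
        for name in ready:
            del pending[name]
    return stages
```

For a layout with one prep step, two training steps that each depend on prep, and an evaluation step that depends on both, this returns three stages, with the two training steps sharing the middle one.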

Frequently Asked Questions

How does this Data Science Pipeline builder keep up with new OCI features?

The Data Science Pipeline options surface what is currently documented in the OCI reference for that service. When Oracle adds a new property or value, we add it here after verifying the schema in a real tenancy. If a recently announced feature is not yet selectable, treat that as a 'not yet supported' signal rather than an opinion that it should not be used.

Can I use this Data Science Pipeline builder for OCI Government Cloud or sovereign regions?

Most Data Science Pipeline primitives behave the same in commercial and Government Cloud OCI, but the OCID realm differs, region availability is limited, and a handful of services are unavailable. The output is portable in shape; you must adjust the realm and verify service availability before applying it in a Government Cloud tenancy.
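Adjusting the realm starts with finding every OCID that still points at the old one. OCIDs follow the documented layout ocid1.<resource type>.<realm>.[region][.future use].<unique ID>, so the realm is the third dot-separated field (oc1 in the commercial-realm examples on this page). The minimal Python sketch below walks a configuration and reports the path of every OCID whose realm differs from the one you expect. The helper names are hypothetical, the expected realm for your Government Cloud tenancy should come from Oracle's documentation, and the check says nothing about whether a service or shape is available there.

```python
def ocid_realm(ocid: str) -> str:
    """Return the realm field of an OCID (the third dot-separated part)."""
    parts = ocid.split(".")
    if len(parts) < 4 or parts[0] != "ocid1":
        raise ValueError(f"not an OCID: {ocid!r}")
    return parts[2]


def find_realm_mismatches(value, expected_realm: str, path: str = "$") -> list[str]:
    """Walk a config recursively; return JSON-style paths of OCIDs outside expected_realm."""
    mismatches: list[str] = []
    if isinstance(value, dict):
        for key, child in value.items():
            mismatches += find_realm_mismatches(child, expected_realm, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            mismatches += find_realm_mismatches(child, expected_realm, f"{path}[{index}]")
    elif isinstance(value, str) and value.startswith("ocid1."):
        # Only strings that look like OCIDs are checked; names and tags are skipped.
        if ocid_realm(value) != expected_realm:
            mismatches.append(path)
    return mismatches
```

Run against the example configuration on this page with expected_realm set to "oc1", it returns an empty list, because every OCID there is in the commercial realm; pointing it at your target realm instead lists each value you still need to replace.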

Disclaimer: This tool runs entirely in your browser. No data is sent to our servers. Always verify outputs before using them in production. AWS, Azure, and GCP are trademarks of their respective owners.