OCI Data Catalog Config Builder

Build Data Catalog configurations with data sources, harvest schedules, and glossary terms.

Last verified: May 2026

OCI Data Catalog Configuration

Required Fields

compartmentId, displayName, dataSources

{
  "compartmentId": "ocid1.compartment.oc1..aaaaaaaexample",
  "displayName": "enterprise-data-catalog",
  "description": "Central metadata catalog for data lake and warehouse assets",
  "dataSources": [
    {
      "displayName": "production-adw",
      "dataSourceType": "ORACLE_AUTONOMOUS_DATA_WAREHOUSE",
      "connectionDetail": {
        "autonomousDbId": "ocid1.autonomousdatabase.oc1.iad.aaaaaaaexample",
        "credentialName": "adw-catalog-cred",
        "walletSecretId": "ocid1.vaultsecret.oc1.iad.aaaaaaaexample"
      },
      "defaultConnection": "adw-prod-conn"
    },
    {
      "displayName": "data-lake-bucket",
      "dataSourceType": "ORACLE_OBJECT_STORAGE",
      "connectionDetail": {
        "namespace": "my-tenancy",
        "bucketName": "raw-data-lake",
        "region": "us-ashburn-1"
      },
      "defaultConnection": "oss-data-lake-conn"
    }
  ],
  "harvestSchedules": [
    {
      "displayName": "daily-adw-harvest",
      "scheduleType": "SCHEDULED",
      "cronExpression": "0 2 * * *",
      "dataSourceName": "production-adw",
      "schemaPatterns": ["ANALYTICS_%", "REPORTING_%"],
      "isIncremental": true
    },
    {
      "displayName": "weekly-oss-harvest",
      "scheduleType": "SCHEDULED",
      "cronExpression": "0 3 * * 0",
      "dataSourceName": "data-lake-bucket",
      "objectPatterns": ["*.parquet", "*.csv"],
      "isIncremental": false
    }
  ],
  "glossary": {
    "displayName": "business-glossary",
    "description": "Enterprise business term definitions",
    "terms": [
      {
        "name": "Customer",
        "description": "An individual or entity that purchases products or services"
      },
      {
        "name": "Revenue",
        "description": "Total income generated from sales before deductions"
      }
    ]
  },
  "customProperties": [
    {
      "displayName": "data_owner",
      "dataType": "TEXT",
      "scope": ["DATA_ASSET", "ENTITY"],
      "isRequired": true
    },
    {
      "displayName": "pii_classification",
      "dataType": "TEXT",
      "scope": ["ATTRIBUTE"],
      "allowedValues": ["none", "low", "medium", "high"],
      "isRequired": false
    }
  ],
  "freeformTags": {
    "project": "data-governance",
    "team": "data-engineering"
  }
}
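Before you hand a configuration like the one above to the builder, a quick structural check can catch broken references. The Python sketch below is a hypothetical helper, not part of the builder or the OCI SDK. It checks the three required fields, confirms that every harvest schedule's dataSourceName matches a declared data source, and assumes cronExpression uses standard five-field cron syntax (which both sample schedules follow).

```python
"""Hypothetical pre-flight checks for a Data Catalog builder config (not an OCI API)."""

REQUIRED_TOP_LEVEL = ("compartmentId", "displayName", "dataSources")


def validate_catalog_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: list[str] = []

    # The builder lists these three top-level fields as required.
    for field in REQUIRED_TOP_LEVEL:
        if not config.get(field):
            problems.append(f"missing required field: {field}")

    # Each harvest schedule must name a data source declared in dataSources.
    declared = {ds.get("displayName") for ds in config.get("dataSources") or []}
    for schedule in config.get("harvestSchedules") or []:
        label = schedule.get("displayName", "<unnamed schedule>")
        source = schedule.get("dataSourceName")
        if source not in declared:
            problems.append(f"{label}: unknown dataSourceName {source!r}")

        # Assumption: SCHEDULED jobs use standard five-field cron syntax.
        if schedule.get("scheduleType") == "SCHEDULED":
            cron = schedule.get("cronExpression", "")
            if len(cron.split()) != 5:
                problems.append(f"{label}: expected a 5-field cron expression, got {cron!r}")

    return problems


if __name__ == "__main__":
    sample = {
        "compartmentId": "ocid1.compartment.oc1..aaaaaaaexample",
        "displayName": "enterprise-data-catalog",
        "dataSources": [{"displayName": "production-adw"}],
        "harvestSchedules": [
            {
                "displayName": "daily-adw-harvest",
                "scheduleType": "SCHEDULED",
                "cronExpression": "0 2 * * *",
                "dataSourceName": "production-adw",
            }
        ],
    }
    print(validate_catalog_config(sample) or "config looks consistent")
```

To check the full example, load it with json.load and pass the resulting dict to validate_catalog_config.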

See It in Action

Your team is building a data lake but has no central metadata view, so analysts hunt for data through Slack DMs and tribal knowledge. The builder generates a Data Catalog instance, automated harvesting from 5 Object Storage buckets and 3 Autonomous Databases, a business glossary whose terms (such as 'customer', 'order', and 'transaction') link technical assets to business concepts, and custom properties for sensitivity classification. After 3 months, the catalog holds more than 10,000 cataloged and tagged assets. Onboarding time for new analysts drops from 2 weeks to 2 days, because they search the catalog instead of asking around for 'where is the customer data'.

What This Tool Does

OCI Data Catalog is a metadata management service that helps you discover, organize, and govern data assets across OCI and external data sources. It automatically harvests metadata from Object Storage, Autonomous Database, MySQL HeatWave, and other data stores, creating a searchable catalog with business glossaries, tags, and data lineage. This builder helps you configure Data Catalog instances with data asset connections, harvesting schedules, custom metadata properties, and glossary structures.

Technical Details

The builder constructs OCI Data Catalog configurations covering: the catalog instance (in the target compartment); data assets, which are connections to Object Storage, Autonomous Database, MySQL, or external sources; harvest jobs, scheduled for incremental updates; folders for organization; business glossaries (terms, categories, and relationships); custom properties for classification, tagging, and ownership; and IAM policies for catalog access. Output is generated as oci data-catalog CLI commands and as Terraform oci_datacatalog_catalog and oci_datacatalog_data_asset resources.
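To make the Terraform side concrete, here is a minimal Python sketch that renders only the catalog resource from the builder's JSON. Mapping compartmentId, displayName, and freeformTags onto the oci_datacatalog_catalog arguments compartment_id, display_name, and freeform_tags is an assumption about how the builder translates fields. Data asset resources are left out because they need connection and type details this sketch does not model.

```python
import json


def hcl_string(value: str) -> str:
    # JSON string escaping matches HCL for plain values; values containing
    # "${" or "%{" would additionally need escaping as HCL template sequences.
    return json.dumps(value)


def render_catalog_resource(config: dict, resource_name: str = "catalog") -> str:
    """Render a Terraform oci_datacatalog_catalog block from the builder config."""
    lines = [
        f'resource "oci_datacatalog_catalog" "{resource_name}" {{',
        f"  compartment_id = {hcl_string(config['compartmentId'])}",
        f"  display_name   = {hcl_string(config['displayName'])}",
    ]
    tags = config.get("freeformTags") or {}
    if tags:
        lines.append("  freeform_tags = {")
        # Sort keys so repeated runs produce identical, diff-friendly output.
        for key, value in sorted(tags.items()):
            lines.append(f"    {hcl_string(key)} = {hcl_string(value)}")
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_catalog_resource({
        "compartmentId": "ocid1.compartment.oc1..aaaaaaaexample",
        "displayName": "enterprise-data-catalog",
        "freeformTags": {"project": "data-governance", "team": "data-engineering"},
    }))
```

Run terraform fmt on the result if you want the tag assignments aligned.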

Common Use Cases

1. Configuring automated metadata harvesting from Object Storage buckets containing data lake files in Parquet, CSV, and JSON formats
2. Setting up data asset connections to Autonomous Database and MySQL HeatWave for automatic schema and table metadata discovery
3. Building a business glossary with terms, categories, and relationships to provide business context for technical data assets
4. Creating custom metadata properties and tags that map to your organization's data classification and governance requirements
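Use cases 1 and 2 depend on how harvest filters select objects and schemas. The example config uses two pattern styles, and this Python sketch assumes schemaPatterns follow SQL LIKE semantics (% matches any run of characters, _ matches exactly one) while objectPatterns are shell-style globs. Under that assumption, the underscore in ANALYTICS_% is itself a wildcard, so the pattern also matches a schema such as ANALYTICSX.

```python
import fnmatch
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a regex: % -> any run, _ -> one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def schema_selected(schema: str, schema_patterns: list[str]) -> bool:
    # fullmatch anchors both ends, as LIKE does; comparison is case-sensitive.
    return any(like_to_regex(p).fullmatch(schema) for p in schema_patterns)


def object_selected(object_name: str, object_patterns: list[str]) -> bool:
    # fnmatchcase is case-sensitive on every OS; its "*" also matches "/".
    return any(fnmatch.fnmatchcase(object_name, p) for p in object_patterns)


if __name__ == "__main__":
    schemas = ["ANALYTICS_SALES", "REPORTING_FINANCE", "STAGING_RAW", "ANALYTICSX"]
    print([s for s in schemas if schema_selected(s, ["ANALYTICS_%", "REPORTING_%"])])
    objects = ["sales/2026/part-0001.parquet", "exports/customers.csv", "docs/readme.json"]
    print([o for o in objects if object_selected(o, ["*.parquet", "*.csv"])])
```

Because "*" crosses "/" in this glob flavor, *.parquet selects Parquet files at any prefix depth in the bucket.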

Common Questions

What data sources can OCI Data Catalog harvest metadata from?

Data Catalog supports native harvesting from OCI Object Storage (auto-detects file formats), Autonomous Database, MySQL HeatWave, Oracle Database, and OCI Data Flow (Spark) applications. For external sources, you can use custom connectors or the Data Catalog API to register metadata from AWS S3, Azure Blob Storage, on-premises databases, and third-party data sources. Harvesting runs on a configurable schedule and detects schema changes automatically.

How does Data Catalog support data governance?

Data Catalog provides several governance features: business glossaries define standardized terminology and link to technical assets, custom properties let you tag assets with classification levels (public, internal, confidential), data lineage tracks how data flows between assets, and recommendations suggest tags and glossary terms based on metadata patterns. Combined with OCI policies, you can restrict data access based on catalog classifications.
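The customProperties block in the example config declares which objects a property applies to (scope), whether it is mandatory (isRequired), and which values are permitted (allowedValues). The hypothetical Python check below applies those declarations to the values set on one catalog object. It is a client-side sketch that assumes the builder's fields mean what their names suggest; it does not describe how Data Catalog itself enforces properties.

```python
def check_custom_properties(object_type: str, values: dict, properties: list[dict]) -> list[str]:
    """Compare one object's property values against builder-style declarations."""
    problems = []
    for prop in properties:
        # Skip properties whose scope does not include this object type.
        if object_type not in prop.get("scope", []):
            continue
        name = prop["displayName"]
        value = values.get(name)
        if value is None:
            if prop.get("isRequired"):
                problems.append(f"{name} is required on {object_type}")
            continue
        allowed = prop.get("allowedValues")
        if allowed and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {allowed}")
    return problems


# Mirrors the customProperties section of the example config.
PROPERTIES = [
    {"displayName": "data_owner", "dataType": "TEXT",
     "scope": ["DATA_ASSET", "ENTITY"], "isRequired": True},
    {"displayName": "pii_classification", "dataType": "TEXT", "scope": ["ATTRIBUTE"],
     "allowedValues": ["none", "low", "medium", "high"], "isRequired": False},
]

if __name__ == "__main__":
    print(check_custom_properties("ENTITY", {}, PROPERTIES))
    print(check_custom_properties("ATTRIBUTE", {"pii_classification": "secret"}, PROPERTIES))
```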

Expert Tips

TIP

Always start with automated metadata harvesting from existing data sources — Object Storage, Autonomous Database, MySQL HeatWave. Manual catalog entry doesn't scale. The harvester discovers schemas, file formats, and column statistics automatically; you add business context (glossary terms, classifications) on top of the auto-discovered metadata.

TIP

Custom metadata properties and tags are how you operationalize data classification. Tag every asset with a sensitivity level (public, internal, confidential, restricted); this becomes the foundation for IAM policies that restrict access based on classification. Without classification, you can't enforce a rule such as 'restricted data access requires elevated approval'.
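The approval rule in this tip can be expressed as an ordered comparison. The Python sketch below is hypothetical: it ranks the four sensitivity levels, denies access when an asset outranks the caller's clearance, and requires an extra approval flag for restricted assets. How the classification is wired into OCI IAM policies is not shown; this only models the decision the tip describes.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    # Ordered from least to most sensitive, matching the tip's four levels.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def access_decision(asset_level: str, user_clearance: str,
                    has_elevated_approval: bool = False) -> str:
    """Return "allow", "deny", or "needs-approval" for a hypothetical access request."""
    asset = Sensitivity[asset_level.upper()]
    clearance = Sensitivity[user_clearance.upper()]
    if asset > clearance:
        return "deny"
    # Even fully cleared users need elevated approval for restricted data.
    if asset == Sensitivity.RESTRICTED and not has_elevated_approval:
        return "needs-approval"
    return "allow"


if __name__ == "__main__":
    print(access_decision("internal", "confidential"))
    print(access_decision("restricted", "restricted"))
```

Using IntEnum keeps the levels comparable with > while still rejecting unknown labels with a KeyError.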

TIP

Data lineage tracking is the killer feature for impact analysis. When a source schema changes, the catalog shows every downstream asset recorded as depending on it. Without lineage, schema changes become 'hope nothing breaks' deployments; with it, you have a clear blast radius for any change.
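Computing a blast radius is a graph traversal over lineage edges. In this Python sketch, lineage is a hypothetical adjacency map from each asset to the assets that consume it (the names are illustrative, loosely based on the example config), and a breadth-first search collects everything downstream of a changed asset. The result can only be as complete as the lineage that has actually been captured.

```python
from collections import deque


def blast_radius(lineage: dict[str, list[str]], changed: str) -> list[str]:
    """Return every asset downstream of `changed`, nearest dependents first."""
    seen = {changed}
    order: list[str] = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for downstream in lineage.get(current, []):
            # The seen set guards against cycles and diamond-shaped dependencies.
            if downstream not in seen:
                seen.add(downstream)
                order.append(downstream)
                queue.append(downstream)
    return order


# Hypothetical lineage edges: producer -> consumers.
LINEAGE = {
    "raw-data-lake/orders.parquet": ["ANALYTICS_ORDERS"],
    "ANALYTICS_ORDERS": ["REPORTING_REVENUE", "REPORTING_CUSTOMERS"],
    "REPORTING_REVENUE": ["revenue-dashboard"],
}

if __name__ == "__main__":
    print(blast_radius(LINEAGE, "ANALYTICS_ORDERS"))
```

Breadth-first order lists direct consumers before indirect ones, which is a natural order for notifying owners.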

Disclaimer: This tool runs entirely in your browser. No data is sent to our servers. Always verify outputs before using them in production. AWS, Azure, and GCP are trademarks of their respective owners.