What is the difference between Data Factory and Synapse pipelines?

Azure Synapse Analytics includes a pipeline service that is based on Data Factory. They share the same underlying technology and most features are identical. The main difference is that Synapse pipelines are part of the unified Synapse workspace and have direct integration with Synapse SQL pools and Spark pools.

Can Data Factory connect to on-premises data sources?

Yes. Data Factory uses a Self-Hosted Integration Runtime (SHIR) to connect to on-premises databases, file systems, and other data sources behind a firewall. The SHIR acts as a bridge between Data Factory in the cloud and your on-premises infrastructure.
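
As a rough illustration of how that bridge is expressed in configuration, the sketch below builds a linked service definition whose connectVia block points at a self-hosted integration runtime. The linked service name, the runtime name, and the placeholder connection string are assumptions made for this example, not values taken from this page.

```python
import json


def sql_server_linked_service(name: str, shir_name: str) -> dict:
    """Build a linked service for an on-premises SQL Server routed through a SHIR."""
    return {
        "name": name,
        "properties": {
            "type": "SqlServer",
            "typeProperties": {
                # Placeholder only (assumption); keep real credentials out of the JSON.
                "connectionString": "<connection-string>",
            },
            # connectVia is what routes this connection through the SHIR.
            "connectVia": {
                "referenceName": shir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# Both names below are hypothetical.
linked_service = sql_server_linked_service("OnPremSqlServer", "OnPremShir")
print(json.dumps(linked_service, indent=2))
```

Datasets that reference this linked service inherit the runtime, so the Copy activity itself does not need to name the SHIR.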

Azure Data Factory Pipeline Builder

Build copy activity pipeline JSON with source and sink configurations.

Last verified: May 2026

Data Factory Configuration

Required Fields

name, properties, properties.activities

About This Tool

The Azure Data Factory Pipeline Builder helps you create Data Factory pipeline definitions with activities, datasets, linked services, and triggers. Data Factory orchestrates data movement and transformation across cloud and on-premises sources. This tool provides a structured interface for building pipelines with copy activities, data flows, conditional logic, and iteration, generating the JSON definition for deployment via ARM templates, Bicep, or the Data Factory SDK.

Real-World Scenario

Your team is building a daily ETL job that ingests 50 source tables from on-premises SQL Server into a Synapse data lake. The builder helps you generate a pipeline in which a ForEach activity iterates over a control table containing the 50 source/destination pairs, with batchCount=10 so that at most 10 copies run in parallel and SQL Server is not overloaded. Each iteration runs a Copy activity that uses a self-hosted integration runtime (SHIR) on the on-premises SQL Server side and AutoResolveIntegrationRuntime on the Synapse side. The end-to-end pipeline definition takes about 30 minutes, versus an estimated day when working from scratch.
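
A minimal sketch of the pipeline JSON for this scenario follows, built as a Python dictionary. It assumes the control table is read with a Lookup activity and that the sink is Parquet; the dataset names, the control table query, and the column names SourceTable and SinkFolder are hypothetical. The integration runtimes do not appear here because they are attached to the linked services behind the datasets.

```python
import json


def dataset_ref(name, parameters=None):
    """Reference a dataset, optionally passing expression-valued parameters."""
    ref = {"referenceName": name, "type": "DatasetReference"}
    if parameters:
        ref["parameters"] = {
            key: {"value": expr, "type": "Expression"}
            for key, expr in parameters.items()
        }
    return ref


def build_ingest_pipeline(name, batch_count=10):
    """Lookup over a control table, then ForEach with a Copy per table pair."""
    lookup = {
        "name": "LookupTablePairs",
        "type": "Lookup",
        "typeProperties": {
            "source": {
                "type": "SqlServerSource",
                # Hypothetical control table and columns.
                "sqlReaderQuery": "SELECT SourceTable, SinkFolder FROM etl.ControlTable",
            },
            "dataset": dataset_ref("ControlTableDataset"),
            "firstRowOnly": False,  # return every row, not just the first
        },
    }
    copy = {
        "name": "CopyOneTable",
        "type": "Copy",
        "inputs": [dataset_ref("OnPremSqlTable", {"tableName": "@item().SourceTable"})],
        "outputs": [dataset_ref("LakeParquetFolder", {"folder": "@item().SinkFolder"})],
        "typeProperties": {
            "source": {"type": "SqlServerSource"},
            "sink": {"type": "ParquetSink"},  # assumption: Parquet files in the lake
        },
    }
    for_each = {
        "name": "ForEachTablePair",
        "type": "ForEach",
        "dependsOn": [
            {"activity": "LookupTablePairs", "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "items": {
                "value": "@activity('LookupTablePairs').output.value",
                "type": "Expression",
            },
            "isSequential": False,
            "batchCount": batch_count,  # cap on concurrent Copy runs
            "activities": [copy],
        },
    }
    return {"name": name, "properties": {"activities": [lookup, for_each]}}


pipeline = build_ingest_pipeline("DailySqlToLake")
print(json.dumps(pipeline, indent=2))
```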

When to Use This Tool

• Build data ingestion pipelines that copy data from SQL databases, blob storage, or REST APIs into a data lake.
• Create ETL pipelines with data flow transformations for cleaning, aggregating, and reshaping data before loading into a data warehouse.
• Design pipelines with ForEach loops and conditional logic for processing variable numbers of files or tables.
• Generate pipeline definitions with tumbling window or schedule triggers for recurring data processing jobs.
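
For the recurring-job case in the last item, a schedule trigger definition might look like the sketch below. The trigger name, pipeline name, and start time are assumptions chosen for illustration; a tumbling window trigger uses a different type and is not shown.

```python
import json


def daily_schedule_trigger(trigger_name, pipeline_name, start_time_utc):
    """Build a schedule trigger that runs one pipeline once per day."""
    return {
        "name": trigger_name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": start_time_utc,
                    "timeZone": "UTC",
                }
            },
            # One trigger can start several pipelines; only one is listed here.
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    }
                }
            ],
        },
    }


# All three argument values are hypothetical.
trigger = daily_schedule_trigger("DailyAt0200", "DailySqlToLake", "2024-01-01T02:00:00Z")
print(json.dumps(trigger, indent=2))
```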

Pro Tips

TIP

Data Factory pipelines bill per activity run and, for data movement, per DIU-hour (Data Integration Units multiplied by run time). Many teams optimize for fewer activities but ignore DIU consumption. Copying 1 TB at 4 DIUs takes longer than the same copy at 16 DIUs, and because the charge is DIUs multiplied by duration, the slower run can cost as much or more despite its lower hourly rate. Right-size DIUs based on data volume, not activity count.
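
The arithmetic behind that trade-off can be sketched as follows. The rate is a made-up number used only to compare settings, and the run times are assumptions: the first comparison assumes throughput scales linearly with DIUs, the second assumes it scales less well.

```python
def copy_cost(dius: int, hours: float, rate_per_diu_hour: float) -> float:
    """Data movement charge for one copy run: DIUs x duration x rate."""
    return dius * hours * rate_per_diu_hour


RATE = 0.25  # hypothetical price per DIU-hour, not a published figure

# Assumed linear scaling: 4 DIUs need 4 hours, 16 DIUs need 1 hour.
slow_run = copy_cost(4, 4.0, RATE)
fast_run = copy_cost(16, 1.0, RATE)

# Assumed sub-linear scaling: 16 DIUs still need 1.5 hours.
fast_run_sublinear = copy_cost(16, 1.5, RATE)

print(slow_run, fast_run, fast_run_sublinear)
```

Under linear scaling the two settings cost the same and the larger one simply finishes sooner; the larger setting only becomes more expensive once throughput stops scaling with DIUs, which is worth measuring on a sample copy before fixing a value.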

TIP

For ETL/ELT workloads where both the source and the destination are Azure storage, Data Factory's mapping data flows, which run on Spark, are typically more cost-effective than copy activities followed by a separate transformation step. The catch is the startup cost of the Spark cluster: data flows are only worth it for jobs with more than about 5 minutes of transformation work.

TIP

ForEach activities run their iterations in parallel by default (batchCount defaults to 20 and can be raised to a maximum of 50). For source systems with rate limits, set isSequential to true or set batchCount to a small number; otherwise you will hammer the source and trigger throttling errors that cost you both runtime and reliability.
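
One way to keep that setting explicit is a small helper that turns a concurrency limit into the matching ForEach properties. The helper name and its single-argument shape are assumptions for this sketch; only isSequential and batchCount are Data Factory property names.

```python
def foreach_concurrency(max_parallel: int) -> dict:
    """Return the ForEach typeProperties that cap concurrent iterations."""
    if not 1 <= max_parallel <= 50:
        raise ValueError("max_parallel must be between 1 and 50")
    if max_parallel == 1:
        # One iteration at a time: gentlest on a rate-limited source.
        return {"isSequential": True}
    return {"isSequential": False, "batchCount": max_parallel}


throttled = foreach_concurrency(1)
moderate = foreach_concurrency(10)
print(throttled, moderate)
```

The returned dictionary is meant to be merged into the ForEach activity's typeProperties alongside items and activities.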

How It Works Under the Hood

The builder constructs Data Factory pipeline JSON with activities (Copy, DataFlow, ForEach, IfCondition, Switch, ExecutePipeline, Lookup, etc.), datasets (source and sink definitions referencing linked services), and parameters. Output is generated as the JSON definition you'd put under Microsoft.DataFactory/factories/pipelines in ARM/Bicep, plus the sequence of az datafactory pipeline create commands.
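
As a sketch of the final packaging step, the function below checks the three required fields and wraps a pipeline definition as an ARM resource of the type named above. The factory and pipeline names are hypothetical, and this is not the builder's actual code; a real template would usually also declare a dependency on the factory resource.

```python
import json


def to_arm_resource(factory_name: str, pipeline: dict) -> dict:
    """Wrap pipeline JSON as a Microsoft.DataFactory/factories/pipelines resource."""
    # Required fields: name, properties, properties.activities
    for key in ("name", "properties"):
        if key not in pipeline:
            raise ValueError(f"pipeline is missing required field: {key}")
    if "activities" not in pipeline["properties"]:
        raise ValueError("pipeline is missing required field: properties.activities")
    return {
        "type": "Microsoft.DataFactory/factories/pipelines",
        "apiVersion": "2018-06-01",
        # Child resources are named <factory>/<pipeline>.
        "name": f"{factory_name}/{pipeline['name']}",
        "properties": pipeline["properties"],
    }


# Hypothetical minimal pipeline with a single Wait activity.
example_pipeline = {
    "name": "ExamplePipeline",
    "properties": {
        "activities": [
            {"name": "Pause", "type": "Wait", "typeProperties": {"waitTimeInSeconds": 5}}
        ]
    },
}
resource = to_arm_resource("example-factory", example_pipeline)
print(json.dumps(resource, indent=2))
```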

Disclaimer: This tool runs entirely in your browser. No data is sent to our servers. Always verify outputs before using them in production. AWS, Azure, and GCP are trademarks of their respective owners.
