ADF + Snowflake Pipeline Configuration Guide
This guide walks through every step required to configure a BimlFlex project that uses Azure Data Factory (ADF) to extract data from a source system and load it into Snowflake Data Warehouse. By the end you will have a working project that generates ADF pipelines, Snowflake DDL scripts, and all the supporting artifacts needed for deployment.
Prerequisites
Before you begin, make sure the following are in place:
- BimlFlex installed and a metadata database provisioned (see the installation guide).
- A Snowflake account with at least one warehouse, database, and a user with sufficient privileges to create schemas, tables, stages, and file formats.
- An Azure subscription containing:
- An Azure Data Factory instance.
- An Azure Blob Storage account with a container for staging files.
- A SAS token with container-level read and write permissions on that container.
- If your source is on-premises (e.g. SQL Server), a Self-Hosted Integration Runtime registered in your Data Factory.
Step 1: Create Your Project
In the BimlFlex App, create a new Project and set the Integration Template to ADF: Source -> Target.
The project requires connections assigned to specific integration stages. The table below lists every connection slot, whether it is required, and the recommended type settings.
| Connection | Required? | ConnectionType | SystemType | IntegrationStage |
|---|---|---|---|---|
| Source (e.g. SQL Server) | Yes | OLEDB | MSSQL | Source System |
| Landing (Azure Blob) | Yes | AzureBlobStorage | DelimitedFormat | Landing Area |
| Staging | Yes | ADONET | Snowflake Data Warehouse | Staging Area |
| PSA | Optional | ADONET | Snowflake Data Warehouse | Persistent Staging Area |
| Target | Yes | ADONET | Snowflake Data Warehouse | Data Vault or Data Mart |
ADF + Snowflake requires a Landing connection backed by Azure Blob Storage. ADF Copy Activities cannot write directly to Snowflake. Instead, data is landed as files in blob storage and then loaded into Snowflake using the high-performance COPY INTO command from an external stage. If you omit the Landing connection, the generated pipelines will have no way to bridge ADF and Snowflake.
When the Staging and Target connections point to the same Snowflake database, you can reuse a single connection for both by assigning it to both integration stages in the project. This simplifies linked-service management in ADF.
Step 2: Configure Connections
Open each connection in the BimlFlex App and fill in the required fields.
Source Connection
For a SQL Server source, provide the standard OLEDB connection details: server name, database, and authentication. If the source is on-premises, set the Integration Runtime to your Self-Hosted Integration Runtime name on the Linked Service tab.
Landing Connection
Set the ConnectionType to AzureBlobStorage and the SystemType to DelimitedFormat. The connection string should point to your Azure Blob Storage account. The Linked Service tab will expose fields for the storage account URL or connection string.
Snowflake Connections (Staging, PSA, Target)
For every Snowflake connection:
- Set ConnectionType to `ADONET` and SystemType to `Snowflake Data Warehouse`.
- Fill in the Catalog field with the Snowflake database name (e.g. `BFX_DW`).
- Check Cloud Enabled to expose the Linked Service tab, which is required for ADF deployments.
Validation PRJ_28005003 enforces that all connections within a single ADF project that target a single-database system must share the same Catalog (database name). If your Staging connection uses BFX_STG and your Target connection uses BFX_DW, create separate projects for each database, or consolidate into a single Snowflake database. Snowflake connections are checked by their Catalog value.
The Cloud Enabled checkbox must be checked on every connection that participates in an ADF pipeline. Without it the Linked Service configuration tab does not appear and the build cannot generate the ARM template for that connection.
Step 3: Configure Linked Services
With Cloud Enabled checked, navigate to the Linked Service tab on each connection to enter the technology-specific details that ADF needs at runtime.
Snowflake Linked Service (SnowflakeV2)
BimlFlex generates a SnowflakeV2 linked service in the ADF ARM template. The following fields are parsed from the linked service configuration:
| Field | Required? | Example | Notes |
|---|---|---|---|
| Account | Yes | xy12345 | Your Snowflake account identifier (without the region suffix) |
| Region | Yes | east-us-2.azure | The Snowflake region. Combined with Account as account.region in the ARM template |
| Warehouse | Yes | LOAD_WH | Default warehouse for the linked service |
| Database Name | Yes | BFX_DW | Must match the connection Catalog |
| Username | Yes | SVC_BIMLFLEX | Snowflake login user |
| Password | Yes | •••••• | Plain-text or Azure Key Vault reference (recommended) |
| Role Name | No | LOADER_ROLE | Snowflake role; omit to use the user's default role |
For production deployments, store the Snowflake password in Azure Key Vault and reference it through the Key Vault linked service. The Linked Service tab supports Key Vault secret references for the password field.
Internally, BimlFlex stores these fields as a synthetic connection string of the form account=xy12345;region=east-us-2.azure;warehouse=LOAD_WH;db=BFX_DW;user=SVC_BIMLFLEX;password=...;role=LOADER_ROLE;. The build process parses each field out of this string when generating the ARM template. You do not need to edit this string directly; the form fields handle it.
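The key/value structure of that synthetic string can be illustrated with a small parser. This is a sketch of the parsing concept only, not BimlFlex's actual implementation; the key names follow the format shown above.

```python
# Illustrative parser for the synthetic 'key=value;...' connection string.
# Not BimlFlex's actual code -- it only demonstrates how the linked-service
# fields map onto the string format documented above.
def parse_snowflake_connection_string(conn_str: str) -> dict:
    """Split 'key=value;...' pairs into a dictionary of linked-service fields."""
    fields = {}
    for pair in conn_str.strip(";").split(";"):
        key, _, value = pair.partition("=")  # only the first '=' separates key from value
        fields[key.strip()] = value.strip()
    return fields

conn = parse_snowflake_connection_string(
    "account=xy12345;region=east-us-2.azure;warehouse=LOAD_WH;"
    "db=BFX_DW;user=SVC_BIMLFLEX;password=secret;role=LOADER_ROLE;"
)
# The ARM template combines Account and Region as '<account>.<region>'.
full_account = f"{conn['account']}.{conn['region']}"
```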
Azure Blob Storage Linked Service
For the Landing connection, configure the blob storage linked service with either:
- A Storage Account connection string, or
- An Account Key or SAS URI.
The Integration Runtime can remain AutoResolveIntegrationRuntime unless the storage account has private endpoints.
Step 4: Azure Storage Settings
These settings control how BimlFlex interacts with Azure Blob Storage as the landing area for Snowflake loads. Configure them in the Settings Editor under the Azure category.
| Setting | Required? | Default | Example | Notes |
|---|---|---|---|---|
| AzureStageAccountName | Yes | None | myprojectstorage | Storage account name for the Snowflake external stage URL. Becomes the AzureStorageAccount pipeline parameter. |
| AzureStageContainer | Yes | staging | staging | Blob container name. Becomes the AzureStageContainer pipeline parameter. |
| AzureStageSasToken | Yes | None | ?sv=2022-11-02&ss=b&srt=co&sp=rwdlacyx&... | SAS token with container-level read/write. Becomes the AzureStorageSasToken pipeline parameter. |
| AzureCopyMethod | No | BulkInsert | BulkInsert | BulkInsert lands data to blob via ADF Copy Activity. PolyBase is for Azure Synapse only and is not applicable to Snowflake. |
| AzureStageOnExtract | No | Y | Y | When enabled, the staging stored procedure executes inline during the extract phase rather than as a separate step. |
The SAS token must grant read and write permissions at the container level. Snowflake's COPY INTO command reads files from blob storage using this token, and ADF's Copy Activity writes files to the same container. A token with insufficient permissions will cause silent failures at runtime.
The AzureStageAccountName, AzureStageContainer, and AzureStageSasToken values are propagated to ADF as pipeline parameters (AzureStorageAccount, AzureStageContainer, and AzureStorageSasToken). This means you can override them at runtime without redeploying the pipeline, which is useful for environment-specific storage accounts.
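To make the relationship between these settings concrete, the following sketch shows how the account name and container combine into the `azure://` URL Snowflake expects for an external stage, plus a quick sanity check that a SAS token's `sp=` parameter grants read and write. The helper names are hypothetical; only the URL format and the SAS permission flags come from Azure/Snowflake conventions.

```python
from urllib.parse import parse_qs

# Hypothetical helpers illustrating how the storage settings are consumed.
def build_stage_url(account: str, container: str) -> str:
    # Snowflake's Azure external stage URL format:
    # azure://<account>.blob.core.windows.net/<container>
    return f"azure://{account}.blob.core.windows.net/{container}"

def sas_has_read_write(sas_token: str) -> bool:
    # The 'sp' query parameter of a SAS token lists granted permissions
    # (r = read, w = write, d = delete, l = list, ...).
    params = parse_qs(sas_token.lstrip("?"))
    permissions = params.get("sp", [""])[0]
    return "r" in permissions and "w" in permissions

url = build_stage_url("myprojectstorage", "staging")
ok = sas_has_read_write("?sv=2022-11-02&ss=b&srt=co&sp=rwdlacyx&sig=abc")
```

A read-only token (`sp=rl`) would fail this check, which is exactly the kind of token that produces the runtime failures described above.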
Step 5: Snowflake Settings
These settings control Snowflake-specific behavior during pipeline execution. Configure them in the Settings Editor under the Snowflake category.
| Setting | Required? | Default | Example | Notes |
|---|---|---|---|---|
| SnowflakeWarehouse | Yes | None | LOAD_WH | Becomes the SnowflakeWarehouse pipeline parameter. Used in all generated ALTER WAREHOUSE and COPY INTO commands. |
| SnowflakeFileFormat | No | (auto-created) | Keep default | Controls the file format used in the COPY INTO statement. By default BimlFlex generates FILE_FORMAT = (TYPE = 'PARQUET') inline. |
| SnowflakeScaleUp | No | Y | Y | When enabled, emits ALTER WAREHOUSE ... RESUME IF SUSPENDED and ALTER WAREHOUSE ... SET WAREHOUSE_SIZE = ... at the start of each batch pipeline. |
| SnowflakeScaleUpSize | No | MEDIUM | LARGE | Warehouse size to scale up to during the load. Becomes the SnowflakeScaleUpSize pipeline parameter. |
| SnowflakeScaleDown | No | N | XSMALL | When enabled, emits ALTER WAREHOUSE ... SET WAREHOUSE_SIZE = ... at batch completion to reduce cost. Becomes the SnowflakeScaleDownSize pipeline parameter. |
| SnowflakeAutoSuspend | No | Y | Y | When enabled, emits ALTER WAREHOUSE ... SUSPEND at the end of the batch pipeline to stop billing. |
| SnowflakeUseDatabaseReferences | No | N | N | Enables cross-database variable references in generated stored procedures. Only enable if your staging and target reside in different Snowflake databases. |
How Scale and Suspend Work
The scale-up, scale-down, and auto-suspend settings generate ALTER WAREHOUSE commands at the batch level, not at the individual object level. A typical batch pipeline looks like this:
- Scale Up (if enabled) -- `ALTER WAREHOUSE <name> RESUME IF SUSPENDED; ALTER WAREHOUSE <name> SET WAREHOUSE_SIZE = <SnowflakeScaleUpSize>;`
- Object pipelines -- The individual extract, stage, and load activities for each source object.
- Scale Down (if enabled) -- `ALTER WAREHOUSE <name> SET WAREHOUSE_SIZE = <SnowflakeScaleDownSize>;`
- Auto Suspend (if enabled) -- `ALTER WAREHOUSE <name> SUSPEND;`
This approach ensures the warehouse is running at full capacity only during the active load window, and is either downsized or suspended immediately afterward to minimize Snowflake credit consumption.
The warehouse name used in these commands comes from the SnowflakeWarehouse pipeline parameter, which means you can override it at trigger time to point different schedules at different warehouses.
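The batch sequence above can be sketched as a small generator that turns the Step 5 settings into pre-load and post-load command lists. This is an illustration of the pattern, not BimlFlex's generated output.

```python
# Illustrative generator for the batch-level warehouse commands described above.
# Parameter names mirror the Step 5 settings; this is a sketch, not BimlFlex output.
def warehouse_commands(warehouse: str, scale_up: bool, scale_up_size: str,
                       scale_down: bool, scale_down_size: str,
                       auto_suspend: bool) -> tuple:
    """Return (pre-load, post-load) command lists for a batch pipeline."""
    pre, post = [], []
    if scale_up:
        pre.append(f"ALTER WAREHOUSE {warehouse} RESUME IF SUSPENDED;")
        pre.append(f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = {scale_up_size};")
    if scale_down:
        post.append(f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = {scale_down_size};")
    if auto_suspend:
        post.append(f"ALTER WAREHOUSE {warehouse} SUSPEND;")
    return pre, post

pre, post = warehouse_commands("LOAD_WH", True, "LARGE", True, "XSMALL", True)
```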
Step 6: Import Metadata
With connections and settings in place, use the Metadata Importer to bring source object definitions into BimlFlex.
- Open the project in the BimlFlex App.
- Navigate to the source connection and click Import Metadata.
- Select the schemas and tables to include.
- Review and accept the imported objects.
For detailed instructions, see Importing Metadata.
Delete detection is not supported for objects that have Snowflake as a source system. If DeleteDetectionEnabled is set to Y on an object with a Snowflake source connection, validation OBJ_21005007 will raise an error. Disable delete detection at the global level or use a setting override for the affected objects.
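The rule behind that validation can be expressed as a simple check over object metadata. The object shape below is hypothetical (BimlFlex stores this metadata in its own schema); only the rule itself comes from the text above.

```python
# Sketch of the rule behind validation OBJ_21005007: delete detection cannot
# be enabled on objects whose source connection is Snowflake. The dictionary
# keys here are illustrative, not BimlFlex's actual metadata schema.
def delete_detection_errors(objects: list) -> list:
    errors = []
    for obj in objects:
        if (obj.get("SourceSystemType") == "Snowflake Data Warehouse"
                and obj.get("DeleteDetectionEnabled") == "Y"):
            errors.append(f"OBJ_21005007: disable delete detection on {obj['Name']}")
    return errors

errors = delete_detection_errors([
    {"Name": "DIM_CUSTOMER", "SourceSystemType": "Snowflake Data Warehouse",
     "DeleteDetectionEnabled": "Y"},
    {"Name": "SALES_ORDER", "SourceSystemType": "MSSQL",
     "DeleteDetectionEnabled": "Y"},
])
```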
Step 7: Configure Object Metadata
After import, review each object in the Object Editor:
- Verify column mappings and data types.
- Set the Change Type (e.g. Full, Delta) for each object.
- Add any column-level overrides or transformations.
For a full walkthrough of object configuration, see the Metadata Configurations reference.
Snowflake uses TO_TIMESTAMP(<date>, 'YYYY-MM-DD') for default date expressions instead of CONVERT(DATETIME2(7), ...) used in SQL Server. BimlFlex handles this automatically in generated code based on the target system type.
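A minimal sketch of that per-target dispatch, assuming a simple function that picks the expression by system type (the dispatch logic is illustrative; BimlFlex performs the equivalent internally):

```python
# Illustrative mapping of default date expressions by target system type,
# mirroring the note above. The function and its branching are hypothetical.
def default_date_expression(target_system: str, column: str) -> str:
    if target_system == "Snowflake Data Warehouse":
        return f"TO_TIMESTAMP({column}, 'YYYY-MM-DD')"
    # SQL Server-family targets use CONVERT with DATETIME2(7)
    return f"CONVERT(DATETIME2(7), {column})"

snowflake_expr = default_date_expression("Snowflake Data Warehouse", "LoadDate")
mssql_expr = default_date_expression("MSSQL", "LoadDate")
```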
Step 8: Data Vault and Data Mart Configuration
If your target integration stage is Data Vault or Data Mart, configure the accelerator and modeling metadata as described in the dedicated Data Vault and Data Mart guides.
The setting DvProcessOnStage controls whether Data Vault processing occurs as part of the staging stored procedure rather than as a separate step. When enabled on a Snowflake target, the generated stored procedure combines the COPY INTO (loading from blob to staging) and MERGE (staging to Data Vault) operations into a single orchestrated flow, reducing the number of ADF activities and simplifying monitoring.
Step 9: Build and Deploy
Once metadata is complete, Build the project in BimlStudio or the BimlFlex App. The build produces two categories of artifacts:
Snowflake DDL Scripts
These SQL scripts are executed against your Snowflake account to prepare the database objects:
- `CREATE SCHEMA IF NOT EXISTS` for each schema.
- `CREATE TABLE IF NOT EXISTS` for staging, PSA, Data Vault, and Data Mart tables.
- `CREATE OR REPLACE FILE FORMAT` for the Parquet file format used by `COPY INTO`.
- `CREATE OR REPLACE STAGE` for external stages pointing to your Azure Blob Storage container.
Deploy these scripts using SnowSQL or any Snowflake SQL client:
snowsql -c my_connection -f output/ddl/my_database.sql
ADF ARM Templates
The ARM templates contain the complete Data Factory pipeline definitions. Each source object generates an ADF pipeline that follows this activity sequence:
- Copy Activity -- Extracts from the source and writes Parquet files to the Landing blob container.
- Script Activity (`SFLDW_CREATE_STAGE`) -- Executes a `CREATE OR REPLACE STAGE` command on Snowflake, pointing to the blob container using the SAS token.
- Script Activity (`COPY INTO`) -- Executes a `COPY INTO` command that bulk-loads the Parquet files from the external stage into the Snowflake staging table.
- Stored Procedure Activity -- Runs the generated `MERGE` or `INSERT` stored procedure to load data from staging into the target tables (Data Vault, Data Mart, or PSA).
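The `COPY INTO` step can be sketched as a template that combines the stage and staging-table names. The exact SQL BimlFlex generates may differ; the statement shape below follows standard Snowflake syntax for Parquet loads.

```python
# Hypothetical sketch of the COPY INTO statement the third activity executes.
# The object-naming convention here is illustrative; the FILE_FORMAT and
# MATCH_BY_COLUMN_NAME clauses are standard Snowflake options for Parquet.
def copy_into_statement(database: str, schema: str, table: str, stage: str) -> str:
    return (
        f"COPY INTO {database}.{schema}.{table} "
        f"FROM @{database}.{schema}.{stage} "
        "FILE_FORMAT = (TYPE = 'PARQUET') "
        "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;"
    )

stmt = copy_into_statement("BFX_DW", "STG", "CUSTOMER", "LANDING_STAGE")
```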
Pipeline Parameters
The generated batch pipelines expose the following parameters, which can be overridden at trigger time:
| Parameter | Source Setting | Purpose |
|---|---|---|
| SnowflakeWarehouse | SnowflakeWarehouse | Warehouse name for all Snowflake commands |
| AzureStorageSasToken | AzureStageSasToken | SAS token for blob storage access |
| AzureStorageAccount | AzureStageAccountName | Storage account name for the stage URL |
| AzureStageContainer | AzureStageContainer | Blob container for staging files |
| AzureArchiveContainer | AzureArchiveContainer | Blob container for archived files |
| AzureErrorContainer | AzureErrorContainer | Blob container for error files |
| SnowflakeScaleUpSize | SnowflakeScaleUpSize | Warehouse size during load (when scale-up is enabled) |
| SnowflakeScaleDownSize | SnowflakeScaleDownSize | Warehouse size after load (when scale-down is enabled) |
Deployment Options
- ARM template deployment -- Use the Azure CLI or Azure Portal to deploy the ARM template directly to your Data Factory. This is the simplest approach for initial setup.
- ADF Git integration -- If your Data Factory is connected to a Git repository, commit the generated JSON pipeline definitions to the repo and publish through the ADF UI.
- CI/CD pipelines -- Incorporate the ARM template into an Azure DevOps or GitHub Actions pipeline for automated deployments.
After the initial deployment, you can re-run the build and redeploy whenever the metadata changes. BimlFlex generates CREATE TABLE IF NOT EXISTS statements, so redeploying the DDL is safe and will not drop existing tables unless you explicitly enable the drop-and-create option.