
ADF + Snowflake Pipeline Configuration Guide

This guide walks through every step required to configure a BimlFlex project that uses Azure Data Factory (ADF) to extract data from a source system and load it into Snowflake Data Warehouse. By the end you will have a working project that generates ADF pipelines, Snowflake DDL scripts, and all the supporting artifacts needed for deployment.

Prerequisites

Before you begin, make sure the following are in place:

  • BimlFlex installed and a metadata database provisioned (see the installation guide).
  • A Snowflake account with at least one warehouse, database, and a user with sufficient privileges to create schemas, tables, stages, and file formats.
  • An Azure subscription containing:
    • An Azure Data Factory instance.
    • An Azure Blob Storage account with a container for staging files.
    • A SAS token with container-level read and write permissions on that container.
  • If your source is on-premises (e.g. SQL Server), a Self-Hosted Integration Runtime registered in your Data Factory.

Step 1: Create Your Project

In the BimlFlex App, create a new Project and set the Integration Template to ADF: Source -> Target.

The project requires connections assigned to specific integration stages. The table below lists every connection slot, whether it is required, and the recommended type settings.

| Connection | Required? | Connection Type | System Type | Integration Stage |
| --- | --- | --- | --- | --- |
| Source (e.g. SQL Server) | Yes | OLEDB | MSSQL | Source System |
| Landing (Azure Blob) | Yes | AzureBlobStorage | DelimitedFormat | Landing Area |
| Staging | Yes | ADONET | Snowflake Data Warehouse | Staging Area |
| PSA | Optional | ADONET | Snowflake Data Warehouse | Persistent Staging Area |
| Target | Yes | ADONET | Snowflake Data Warehouse | Data Vault or Data Mart |
warning

ADF + Snowflake requires a Landing connection backed by Azure Blob Storage. ADF Copy Activities cannot write directly to Snowflake. Instead, data is landed as files in blob storage and then loaded into Snowflake using the high-performance COPY INTO command from an external stage. If you omit the Landing connection, the generated pipelines will have no way to bridge ADF and Snowflake.
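As a sketch of this bridge, the load step amounts to a COPY INTO from an external stage over the blob container. All object names below are placeholders for illustration, not generated defaults:

```sql
-- Illustrative only: stage and table names are placeholders.
-- ADF lands Parquet files in the blob container; Snowflake then bulk-loads them.
COPY INTO STG.CUSTOMER
FROM @LANDING_STAGE/customer/
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```

The generated pipelines issue this command via an ADF Script Activity, as described in Step 9.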

tip

When the Staging and Target connections point to the same Snowflake database, you can reuse a single connection for both by assigning it to both integration stages in the project. This simplifies linked-service management in ADF.

Step 2: Configure Connections

Open each connection in the BimlFlex App and fill in the required fields.

Source Connection

For a SQL Server source, provide the standard OLEDB connection details: server name, database, and authentication. If the source is on-premises, set the Integration Runtime to your Self-Hosted Integration Runtime name on the Linked Service tab.

Landing Connection

Set the ConnectionType to AzureBlobStorage and the SystemType to DelimitedFormat. The connection string should point to your Azure Blob Storage account. The Linked Service tab will expose fields for the storage account URL or connection string.

Snowflake Connections (Staging, PSA, Target)

For every Snowflake connection:

  1. Set ConnectionType to ADONET and SystemType to Snowflake Data Warehouse.
  2. Fill in the Catalog field with the Snowflake database name (e.g. BFX_DW).
  3. Check Cloud Enabled to expose the Linked Service tab, which is required for ADF deployments.
warning

Validation PRJ_28005003 enforces that all connections within a single ADF project that target a single-database system must share the same Catalog (database name). If your Staging connection uses BFX_STG and your Target connection uses BFX_DW, create separate projects for each database, or consolidate into a single Snowflake database. Snowflake connections are checked by their Catalog value.

note

The Cloud Enabled checkbox must be checked on every connection that participates in an ADF pipeline. Without it the Linked Service configuration tab does not appear and the build cannot generate the ARM template for that connection.

Step 3: Configure Linked Services

With Cloud Enabled checked, navigate to the Linked Service tab on each connection to enter the technology-specific details that ADF needs at runtime.

Snowflake Linked Service (SnowflakeV2)

BimlFlex generates a SnowflakeV2 linked service in the ADF ARM template. The following fields are parsed from the linked service configuration:

| Field | Required? | Example | Notes |
| --- | --- | --- | --- |
| Account | Yes | xy12345 | Your Snowflake account identifier (without the region suffix) |
| Region | Yes | east-us-2.azure | The Snowflake region. Combined with Account as account.region in the ARM template |
| Warehouse | Yes | LOAD_WH | Default warehouse for the linked service |
| Database Name | Yes | BFX_DW | Must match the connection Catalog |
| Username | Yes | SVC_BIMLFLEX | Snowflake login user |
| Password | Yes | •••••• | Plain-text or Azure Key Vault reference (recommended) |
| Role Name | No | LOADER_ROLE | Snowflake role; omit to use the user's default role |
tip

For production deployments, store the Snowflake password in Azure Key Vault and reference it through the Key Vault linked service. The Linked Service tab supports Key Vault secret references for the password field.

note

Internally, BimlFlex stores these fields as a synthetic connection string of the form account=xy12345;region=east-us-2.azure;warehouse=LOAD_WH;db=BFX_DW;user=SVC_BIMLFLEX;password=...;role=LOADER_ROLE;. The build process parses each field out of this string when generating the ARM template. You do not need to edit this string directly; the form fields handle it.

Azure Blob Storage Linked Service

For the Landing connection, configure the blob storage linked service with either:

  • A Storage Account connection string, or
  • An Account Key or SAS URI.

The Integration Runtime can remain AutoResolveIntegrationRuntime unless the storage account has private endpoints.

Step 4: Azure Storage Settings

These settings control how BimlFlex interacts with Azure Blob Storage as the landing area for Snowflake loads. Configure them in the Settings Editor under the Azure category.

| Setting | Required? | Default | Example | Notes |
| --- | --- | --- | --- | --- |
| AzureStageAccountName | Yes | None | myprojectstorage | Storage account name for the Snowflake external stage URL. Becomes the AzureStorageAccount pipeline parameter. |
| AzureStageContainer | Yes | staging | staging | Blob container name. Becomes the AzureStageContainer pipeline parameter. |
| AzureStageSasToken | Yes | None | ?sv=2022-11-02&ss=b&srt=co&sp=rwdlacyx&... | SAS token with container-level read/write. Becomes the AzureStorageSasToken pipeline parameter. |
| AzureCopyMethod | No | BulkInsert | BulkInsert | BulkInsert lands data to blob via ADF Copy Activity. PolyBase applies to Azure Synapse only and is not applicable to Snowflake. |
| AzureStageOnExtract | No | Y | Y | When enabled, the staging stored procedure executes inline during the extract phase rather than as a separate step. |
warning

The SAS token must grant read and write permissions at the container level. Snowflake's COPY INTO command reads files from blob storage using this token, and ADF's Copy Activity writes files to the same container. A token with insufficient permissions will cause silent failures at runtime.
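The token is consumed when the pipeline (re)creates the external stage. A sketch with placeholder account, container, and stage names (the real values come from the settings above and the SAS token pipeline parameter):

```sql
-- Sketch only: names and URL are placeholders generated from your settings.
CREATE OR REPLACE STAGE LANDING_STAGE
  URL = 'azure://myprojectstorage.blob.core.windows.net/staging'
  CREDENTIALS = (AZURE_SAS_TOKEN = '?sv=...');  -- supplied via AzureStorageSasToken
```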

tip

The AzureStageAccountName, AzureStageContainer, and AzureStageSasToken values are propagated to ADF as pipeline parameters (AzureStorageAccount, AzureStageContainer, and AzureStorageSasToken). This means you can override them at runtime without redeploying the pipeline, which is useful for environment-specific storage accounts.

Step 5: Snowflake Settings

These settings control Snowflake-specific behavior during pipeline execution. Configure them in the Settings Editor under the Snowflake category.

| Setting | Required? | Default | Example | Notes |
| --- | --- | --- | --- | --- |
| SnowflakeWarehouse | Yes | None | LOAD_WH | Becomes the SnowflakeWarehouse pipeline parameter. Used in all generated ALTER WAREHOUSE and COPY INTO commands. |
| SnowflakeFileFormat | No | (auto-created) | Keep default | Controls the file format used in the COPY INTO statement. By default BimlFlex generates FILE_FORMAT = (TYPE = 'PARQUET') inline. |
| SnowflakeScaleUp | No | Y | Y | When enabled, emits ALTER WAREHOUSE ... RESUME IF SUSPENDED and ALTER WAREHOUSE ... SET WAREHOUSE_SIZE = ... at the start of each batch pipeline. |
| SnowflakeScaleUpSize | No | MEDIUM | LARGE | Warehouse size to scale up to during the load. Becomes the SnowflakeScaleUpSize pipeline parameter. |
| SnowflakeScaleDown | No | N | XSMALL | When enabled, emits ALTER WAREHOUSE ... SET WAREHOUSE_SIZE = ... at batch completion to reduce cost. Becomes the SnowflakeScaleDownSize pipeline parameter. |
| SnowflakeAutoSuspend | No | Y | Y | When enabled, emits ALTER WAREHOUSE ... SUSPEND at the end of the batch pipeline to stop billing. |
| SnowflakeUseDatabaseReferences | No | N | N | Enables cross-database variable references in generated stored procedures. Only enable if your staging and target reside in different Snowflake databases. |

How Scale and Suspend Work

The scale-up, scale-down, and auto-suspend settings generate ALTER WAREHOUSE commands at the batch level, not at the individual object level. A typical batch pipeline looks like this:

  1. Scale Up (if enabled) -- ALTER WAREHOUSE <name> RESUME IF SUSPENDED; ALTER WAREHOUSE <name> SET WAREHOUSE_SIZE = <SnowflakeScaleUpSize>;
  2. Object pipelines -- The individual extract, stage, and load activities for each source object.
  3. Scale Down (if enabled) -- ALTER WAREHOUSE <name> SET WAREHOUSE_SIZE = <SnowflakeScaleDownSize>;
  4. Auto Suspend (if enabled) -- ALTER WAREHOUSE <name> SUSPEND;

This approach ensures the warehouse is running at full capacity only during the active load window, and is either downsized or suspended immediately afterward to minimize Snowflake credit consumption.
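Concretely, the batch-level commands follow this pattern (the warehouse name and sizes are examples drawn from the settings above):

```sql
-- 1. Scale up at batch start
ALTER WAREHOUSE LOAD_WH RESUME IF SUSPENDED;
ALTER WAREHOUSE LOAD_WH SET WAREHOUSE_SIZE = 'LARGE';   -- SnowflakeScaleUpSize

-- 2. ... object pipelines run here ...

-- 3. Scale down and suspend at batch completion
ALTER WAREHOUSE LOAD_WH SET WAREHOUSE_SIZE = 'XSMALL';  -- SnowflakeScaleDownSize
ALTER WAREHOUSE LOAD_WH SUSPEND;
```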

note

The warehouse name used in these commands comes from the SnowflakeWarehouse pipeline parameter, which means you can override it at trigger time to point different schedules at different warehouses.

Step 6: Import Metadata

With connections and settings in place, use the Metadata Importer to bring source object definitions into BimlFlex.

  1. Open the project in the BimlFlex App.
  2. Navigate to the source connection and click Import Metadata.
  3. Select the schemas and tables to include.
  4. Review and accept the imported objects.

For detailed instructions, see Importing Metadata.

warning

Delete detection is not supported for objects that have Snowflake as a source system. If DeleteDetectionEnabled is set to Y on an object with a Snowflake source connection, validation OBJ_21005007 will raise an error. Disable delete detection at the global level or use a setting override for the affected objects.

Step 7: Configure Object Metadata

After import, review each object in the Object Editor:

  • Verify column mappings and data types.
  • Set the Change Type (e.g. Full, Delta) for each object.
  • Add any column-level overrides or transformations.

For a full walkthrough of object configuration, see the Metadata Configurations reference.

note

Snowflake uses TO_TIMESTAMP(<date>, 'YYYY-MM-DD') for default date expressions instead of CONVERT(DATETIME2(7), ...) used in SQL Server. BimlFlex handles this automatically in generated code based on the target system type.
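A minimal side-by-side illustration of the two default date expressions:

```sql
-- Snowflake target
SELECT TO_TIMESTAMP('1900-01-01', 'YYYY-MM-DD');

-- SQL Server target, for comparison:
-- SELECT CONVERT(DATETIME2(7), '1900-01-01');
```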

Step 8: Data Vault and Data Mart Configuration

If your target integration stage is Data Vault or Data Mart, configure the accelerator and modeling metadata as described in the dedicated guides.

tip

The setting DvProcessOnStage controls whether Data Vault processing occurs as part of the staging stored procedure rather than as a separate step. When enabled on a Snowflake target, the generated stored procedure combines the COPY INTO (loading from blob to staging) and MERGE (staging to Data Vault) operations into a single orchestrated flow, reducing the number of ADF activities and simplifying monitoring.

Step 9: Build and Deploy

Once metadata is complete, Build the project in BimlStudio or the BimlFlex App. The build produces two categories of artifacts:

Snowflake DDL Scripts

These SQL scripts are executed against your Snowflake account to prepare the database objects:

  • CREATE SCHEMA IF NOT EXISTS for each schema.
  • CREATE TABLE IF NOT EXISTS for staging, PSA, Data Vault, and Data Mart tables.
  • CREATE OR REPLACE FILE FORMAT for the Parquet file format used by COPY INTO.
  • CREATE OR REPLACE STAGE for external stages pointing to your Azure Blob Storage container.
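A condensed sketch of the generated DDL pattern follows. All object and column names here are placeholders; the real names come from your metadata and settings:

```sql
CREATE SCHEMA IF NOT EXISTS STG;

CREATE TABLE IF NOT EXISTS STG.CUSTOMER (
  CUSTOMER_ID   NUMBER,
  CUSTOMER_NAME VARCHAR,
  LOAD_DATETIME TIMESTAMP_NTZ  -- placeholder audit column
);

CREATE OR REPLACE FILE FORMAT STG.PARQUET_FF TYPE = 'PARQUET';

CREATE OR REPLACE STAGE STG.LANDING_STAGE
  URL = 'azure://myprojectstorage.blob.core.windows.net/staging'
  CREDENTIALS = (AZURE_SAS_TOKEN = '?sv=...');
```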

Deploy these scripts using SnowSQL or any Snowflake SQL client:

```shell
snowsql -c my_connection -f output/ddl/my_database.sql
```

ADF ARM Templates

The ARM templates contain the complete Data Factory pipeline definitions. Each source object generates an ADF pipeline that follows this activity sequence:

  1. Copy Activity -- Extracts from the source and writes Parquet files to the Landing blob container.
  2. Script Activity (SFLDW_CREATE_STAGE) -- Executes a CREATE OR REPLACE STAGE command on Snowflake, pointing to the blob container using the SAS token.
  3. Script Activity (COPY INTO) -- Executes a COPY INTO command that bulk-loads the Parquet files from the external stage into the Snowflake staging table.
  4. Stored Procedure Activity -- Runs the generated MERGE or INSERT stored procedure to load data from staging into the target tables (Data Vault, Data Mart, or PSA).
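The final stored-procedure step typically reduces to a MERGE from staging into the target. A simplified, hypothetical pattern (table, column, and hash names invented for illustration; the generated procedures are considerably more elaborate):

```sql
-- Hypothetical example: compares a row hash to detect changes.
MERGE INTO TGT.CUSTOMER AS tgt
USING STG.CUSTOMER AS src
  ON tgt.CUSTOMER_ID = src.CUSTOMER_ID
WHEN MATCHED AND tgt.ROW_HASH <> src.ROW_HASH THEN
  UPDATE SET tgt.CUSTOMER_NAME = src.CUSTOMER_NAME,
             tgt.ROW_HASH      = src.ROW_HASH
WHEN NOT MATCHED THEN
  INSERT (CUSTOMER_ID, CUSTOMER_NAME, ROW_HASH)
  VALUES (src.CUSTOMER_ID, src.CUSTOMER_NAME, src.ROW_HASH);
```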

Pipeline Parameters

The generated batch pipelines expose the following parameters, which can be overridden at trigger time:

| Parameter | Source Setting | Purpose |
| --- | --- | --- |
| SnowflakeWarehouse | SnowflakeWarehouse | Warehouse name for all Snowflake commands |
| AzureStorageSasToken | AzureStageSasToken | SAS token for blob storage access |
| AzureStorageAccount | AzureStageAccountName | Storage account name for stage URL |
| AzureStageContainer | AzureStageContainer | Blob container for staging files |
| AzureArchiveContainer | AzureArchiveContainer | Blob container for archived files |
| AzureErrorContainer | AzureErrorContainer | Blob container for error files |
| SnowflakeScaleUpSize | SnowflakeScaleUpSize | Warehouse size during load (when scale-up is enabled) |
| SnowflakeScaleDownSize | SnowflakeScaleDownSize | Warehouse size after load (when scale-down is enabled) |

Deployment Options

  • ARM template deployment -- Use the Azure CLI or Azure Portal to deploy the ARM template directly to your Data Factory. This is the simplest approach for initial setup.
  • ADF Git integration -- If your Data Factory is connected to a Git repository, commit the generated JSON pipeline definitions to the repo and publish through the ADF UI.
  • CI/CD pipelines -- Incorporate the ARM template into an Azure DevOps or GitHub Actions pipeline for automated deployments.
tip

After the initial deployment, you can re-run the build and redeploy whenever the metadata changes. BimlFlex generates CREATE TABLE IF NOT EXISTS statements, so redeploying the DDL is safe and will not drop existing tables unless you explicitly enable the drop-and-create option.