Implementing Databricks Using Azure Data Factory
BimlFlex provides an intuitive process to implement Databricks using Azure Data Factory (ADF) for cloud-based data warehousing solutions. The BimlFlex 2026 release introduces significant enhancements, including Pushdown Processing and SQL Scripting options, that dramatically improve performance and reduce costs.
Architecture Overview
BimlFlex supports two primary processing approaches when using Databricks with Azure Data Factory:
Traditional Notebook Approach
The traditional approach uses ADF Copy Activities to land source data in Azure Blob Storage as Parquet files, followed by ADF Notebook Activities that execute Databricks notebooks to load data into tables.
Pushdown Processing (Recommended)
New in BimlFlex 2026, pushdown processing eliminates the need for ADF Notebook Activities by leveraging the Azure Data Factory Databricks Job Activity. This approach:
- Pushes all transformation logic directly into Databricks workflows
- Significantly reduces cluster spin-up overhead
- Supports compute clusters, job clusters, and serverless compute
- Generates native Databricks workflows handling dependencies, delta detection, and restart logic
- Supports smart restartability that skips already-completed ADF copy activities on retry
Early benchmarks demonstrate pipelines completing in under 30 minutes using job clusters, compared to over two hours on larger dedicated clusters—representing up to a 75% reduction in runtime cost.
Pushdown Processing
With pushdown processing enabled, all transformations are executed within Databricks workflows or jobs rather than being orchestrated externally through ADF notebook activities. This approach offers several advantages:
- Lower Runtime Costs: Pay only for actual compute time without idle cluster overhead
- Simplified Orchestration: Fewer ADF activities to manage and monitor
- Better Resource Utilization: Leverage ephemeral job clusters for efficient resource usage
- Native Artifacts: All generated pipelines, jobs, and notebooks remain fully native to Databricks and ADF
Compute and Connectivity Options
BimlFlex supports two connectivity modes for Databricks notebooks to communicate with the BimlCatalog database:
| Mode | Configuration | Compute Support | Driver Requirement |
|---|---|---|---|
| ODBC (default) | DatabricksUtilsDriver = ODBC | Compute clusters, job clusters | Requires bfx_init_odbc.sh init script on the cluster |
| JDBC (pytds) | DatabricksUtilsDriver = JDBC | Compute clusters, job clusters, serverless compute | No driver installation required |
When JDBC mode is enabled, BimlFlex generates a bfxutils.py that uses the pure-Python pytds library instead of pyodbc. This removes the dependency on cluster-level ODBC driver installation and enables serverless compute, which does not support init scripts or custom driver installation.
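The driver selection described above can be sketched as a small helper. This is an illustrative sketch only, not the actual BimlFlex-generated code; the function name is hypothetical, while the setting values and driver behaviors come from the table above.

```python
# Hypothetical sketch: how a generated utility module might pick its
# SQL Server driver from the DatabricksUtilsDriver setting.
def resolve_driver(databricks_utils_driver: str) -> str:
    """Return the Python module used to reach the BimlCatalog database."""
    mode = databricks_utils_driver.strip().upper()
    if mode == "JDBC":
        # pure-Python pytds: no cluster-level driver install, so it also
        # works on serverless compute
        return "pytds"
    if mode == "ODBC":
        # pyodbc requires the bfx_init_odbc.sh init script on the cluster,
        # which serverless compute does not support
        return "pyodbc"
    raise ValueError(f"Unknown DatabricksUtilsDriver value: {databricks_utils_driver!r}")
```

The key design point is that the JDBC path has no native-library dependency, which is what unlocks serverless compute.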
To use serverless compute, set the DatabricksJobCluster setting to Serverless for the relevant objects. BimlFlex will automatically:
- Omit cluster assignment from the Databricks Asset Bundle task definitions
- Reference a bfx_jdbc environment with the required Python dependencies (python-tds, pyOpenSSL, certifi)
- Set disable_auto_optimization: true on serverless tasks
For job clusters (non-serverless) with JDBC mode, the required libraries are attached as PyPI dependencies directly on each task instead.
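The serverless-versus-job-cluster branching above can be sketched as follows. This is an assumed illustration of the generator's decision, not BimlFlex source code; the dictionary key names beyond those quoted in the text (environment_key, job_cluster_key, libraries) follow common Databricks Asset Bundle conventions but are assumptions here.

```python
# Illustrative sketch of the task-level output for JDBC mode, depending
# on the DatabricksJobCluster setting.
def build_task_config(job_cluster_setting: str) -> dict:
    jdbc_libraries = ["python-tds", "pyOpenSSL", "certifi"]
    if job_cluster_setting == "Serverless":
        # Serverless: no cluster assignment; dependencies come from the
        # bfx_jdbc environment, and auto optimization is disabled.
        return {
            "environment_key": "bfx_jdbc",
            "disable_auto_optimization": True,
        }
    # Non-serverless job cluster: attach the libraries as per-task PyPI deps.
    return {
        "job_cluster_key": job_cluster_setting,
        "libraries": [{"pypi": {"package": p}} for p in jdbc_libraries],
    }
```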
Restartability
Pushdown processing includes built-in restart logic that avoids redundant work when a pipeline is re-executed after a failure. This is particularly important for long-running incremental loads where ADF copy activities may have already completed successfully before the Databricks job failed.
When a Databricks job activity fails after the preceding ADF copy activities have succeeded, the framework sets NextLoadStatus='D' (Databricks restart) instead of the standard 'R' (retry). On the next execution:
- The LogExecutionStart stored procedure detects the 'D' status and returns ExecutionStatus='D' along with the LastExecutionID from the previous run
- The ADF pipeline evaluates the ExecutionStatus; when it is 'D', the row_audit_id parameter passed to the Databricks job resolves to LastExecutionID rather than the current ExecutionID
- The Databricks notebooks use the original row_audit_id to locate the data that was already landed by the previous copy activities, and re-execute only the transformation logic
This means that a restart after a Databricks job failure does not re-run the ADF copy activities. The already-landed data is reused and the Databricks workflow resumes from the point of failure.
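The parameter resolution described above reduces to a small rule. The helper below is a hypothetical sketch of the logic (the function name is invented); the status codes and identifiers mirror the NextLoadStatus, ExecutionStatus, ExecutionID, and LastExecutionID semantics described in this section.

```python
# Sketch of the restart rule: on a Databricks restart ('D'), reuse the
# audit id of the previous run so the notebooks find the already-landed
# data; otherwise use the current execution's id.
def resolve_row_audit_id(execution_status, current_execution_id, last_execution_id=None):
    """Pick the row_audit_id passed to the Databricks job."""
    if execution_status == "D" and last_execution_id is not None:
        # 'D' = Databricks restart: copy activities already succeeded,
        # so point the workflow at the previously landed data
        return last_execution_id
    # Normal run or standard retry ('R'): use the current execution
    return current_execution_id
```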
Secrets Configuration
BimlFlex generates a bfx_setup_secrets.ps1 sample file alongside the Databricks Asset Bundle output. This file documents the Databricks CLI commands needed to create the secret scope and store connection credentials for the BimlCatalog database.
When using JDBC mode, connection credentials can be stored as either:
- JSON format (recommended for JDBC): {"server": "host", "database": "db", "user": "u", "password": "p", "port": 1433}
- ODBC connection string format: Existing ODBC connection strings are automatically parsed by bfxutils.py
Both formats are stored in Databricks secrets using databricks secrets put-secret. See the generated bfx_setup_secrets.ps1 file for the complete setup steps.
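To illustrate how a utility like bfxutils.py could accept either secret format, here is a minimal sketch. The actual parsing logic inside the generated bfxutils.py is not shown in this document, so the function below is an assumption; only the two input formats themselves come from the list above.

```python
import json

# Hypothetical sketch: normalize either secret format into one dict
# with server/database/user/password/port keys.
def parse_catalog_secret(secret: str) -> dict:
    text = secret.strip()
    if text.startswith("{"):
        # JSON format (recommended for JDBC)
        return json.loads(text)
    # ODBC-style connection string, e.g. "Server=host;Database=db;Uid=u;Pwd=p"
    keymap = {"server": "server", "database": "database",
              "uid": "user", "user": "user",
              "pwd": "password", "password": "password"}
    out = {}
    for part in text.split(";"):
        if "=" not in part:
            continue
        key, value = part.split("=", 1)
        mapped = keymap.get(key.strip().lower())
        if mapped:
            out[mapped] = value.strip()
    out.setdefault("port", 1433)  # default SQL Server port
    return out
```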
Lakehouse Medallion Architecture Support
BimlFlex supports the medallion architecture pattern for Databricks Lakehouse implementations. The pushdown processing and SQL scripting options apply across all layers:
| Layer | BimlFlex Implementation | Databricks Components |
|---|---|---|
| Bronze | Staging + Persistent Staging | Landing in Blob/ADLS, Delta tables for raw data |
| Silver | Data Vault or Normal Form | Unity Catalog managed Delta tables |
| Gold | Data Mart / Dimensional | Optimized Delta tables for analytics |
Bronze Layer
Raw data lands in Azure Blob Storage or ADLS as Parquet files, then loads to Delta tables. BimlFlex manages:
- Staging tables for current batch processing
- Persistent Staging Area for historical retention
Silver Layer
BimlFlex supports two approaches:
- Data Vault (recommended): Hub, Link, and Satellite patterns with full history
- Normal Form: Traditional relational modeling
Gold Layer
Dimensional models optimized for analytics:
- Star schema patterns with Fact and Dimension tables
- Delta Lake optimizations (Z-ordering, partitioning)
For detailed guidance on implementing medallion architecture, see the Delivering Lakehouse documentation.
SQL Scripting Option
New in BimlFlex 2026, a configuration option enables native SQL-based scripting for Databricks workloads. This provides:
- Greater Readability: SQL-based templates are easier to review and understand
- Easier Debugging: Familiar SQL syntax simplifies troubleshooting
- SQL-Centric Development: Aligns with teams preferring SQL over Python/Scala approaches
Metadata-driven templates now support generating staging, Data Vault, and Data Mart patterns directly in SQL while still leveraging Databricks' scalability and performance.
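The metadata-driven idea can be sketched as a template rendered from object metadata into a native SQL statement. The template text and metadata shape below are illustrative assumptions, not BimlFlex's actual templates; they show only the general pattern of generating staging SQL from metadata.

```python
# Hypothetical staging template: load a Delta table from landed Parquet
# files, stamping each row with a load timestamp.
STAGING_TEMPLATE = """INSERT INTO {catalog}.{schema}.{table}
SELECT {columns}, current_timestamp() AS load_datetime
FROM parquet.`{landing_path}`"""

def render_staging_sql(meta: dict) -> str:
    """Render the staging statement from one object's metadata."""
    return STAGING_TEMPLATE.format(
        catalog=meta["catalog"],
        schema=meta["schema"],
        table=meta["table"],
        columns=", ".join(meta["columns"]),
        landing_path=meta["landing_path"],
    )
```

Because the output is plain SQL, the generated statement can be reviewed and debugged directly in the Databricks SQL editor, which is the readability benefit the option targets.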
Prerequisites
Before implementing Databricks with ADF, ensure you have completed the following:
- Databricks Configuration: Complete the setup outlined in the Databricks Configuration Overview
- Azure Storage: Configure blob storage for landing, staging, archive, and error containers
- Linked Services: Create and configure the Databricks linked service in BimlFlex
Detailed prerequisites and configuration steps are provided in the Databricks Configuration Overview section.
Configuring Databricks in BimlFlex
Loading Sample Metadata
BimlFlex provides sample metadata specifically designed for Databricks with Azure Data Factory. Load the sample from the Dashboard by selecting from the Load Sample Metadata dropdown.
Connection Configuration
Configure your Databricks connections from within the BimlFlex Connections editor:
Source System Connection:
- Enable Cloud option for the source system
- Configure Staging / Landing Environment for Blob Storage with ADF linked services
Databricks Connection:
- Set System Type to Databricks Data Warehouse
- Set Linked Service Type to Databricks
- Configure Integration Template to ADF Source -> Target
Batch Configuration
Prior to building your solution, configure batches from the BimlFlex Batches editor to:
- Assign batches to different compute resources
- Configure scaling parameters
- Set execution priorities
Generated Output
With metadata imported, BimlFlex generates a complete Databricks solution. All generated artifacts are fully native to Databricks and Azure Data Factory, with no proprietary runtime or execution engine required.
BimlFlex generates the following artifacts:
- Table Definitions: DDL scripts for creating Databricks tables
- Stored Procedures: SQL procedures for data transformation logic
- Notebooks/Workflows: Databricks notebooks or workflow definitions (depending on processing mode)
- ADF Pipelines: Azure Data Factory orchestration artifacts ready to deploy
Deployed Solution
Once deployed to Azure Data Factory, the solution provides:
- Visual pipeline representation
- Monitoring and logging capabilities
- Error handling with automatic file archiving
Monitoring and Management
After deployment, you can:
- Scale compute resources up or down
- View copy command completions and errors
- Suspend or resume solution execution
- Monitor execution status and performance
Files that encounter errors are automatically moved to the error container; successfully processed files are archived so that subsequent runs do not pick them up again.
Worked Example: Source to Delta Lake Staging
This example shows the key configuration for a SQL Server → Databricks staging pipeline.
Connection Configuration
| Connection | Connection Type | System Type | Integration Stage | Key Settings |
|---|---|---|---|---|
| AWLT_SRC | OLEDB | SQL Server | Source System | Standard on-prem SQL Server source |
| BFX_LND | ADONET | Azure Blob Storage | Landing Area | Blob container for ADF Copy Activity output |
| BFX_STG_DBR | ADONET | Databricks | Staging Area | Databricks workspace URL, catalog, schema |
Project Configuration
| Field | Value |
|---|---|
| Project | EXT_AWLT_DBR |
| Integration Template | Databricks (DBR) |
| Source Connection | AWLT_SRC |
| Target Connection | BFX_STG_DBR |
| Pushdown Processing | Enabled |
What BimlFlex Generates
For each source object, BimlFlex produces:
- An ADF pipeline that orchestrates the end-to-end flow
- A Copy Activity that extracts data from the source and lands it in Azure Blob Storage
- A Databricks notebook activity that reads from the landing area and writes to a Delta table in the configured catalog and schema
The generated artifacts are organized in ADF under the project folder (EXT_AWLT_DBR), with one pipeline per source object. Databricks notebooks are placed in the workspace path configured by the DatabricksNotebookPath setting (default: /Repos/BimlFlex/@@Repository/Databricks/).
For detailed settings, see Databricks Configuration.