Implementing Databricks Using Azure Data Factory

BimlFlex provides an intuitive process to implement Databricks using Azure Data Factory (ADF) for cloud-based data warehousing solutions. The BimlFlex 2026 release introduces significant enhancements, including Pushdown Processing and SQL Scripting options, which improve performance and reduce runtime costs.

Architecture Overview

BimlFlex supports two primary processing approaches when using Databricks with Azure Data Factory:

Traditional Notebook Approach

The traditional approach uses ADF Copy Activities to land source data in Azure Blob Storage as Parquet files, followed by ADF Notebook Activities that execute Databricks notebooks to load data into tables.
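To make the pattern concrete, the sketch below shows the kind of load such a notebook might perform once the Copy Activity has landed the Parquet files. The storage account, container, and table names are placeholders, not BimlFlex-generated code.

```sql
-- Load the current batch of landed Parquet files into a Delta staging table.
-- Path and table names are illustrative placeholders.
COPY INTO stg.sales_order
FROM 'abfss://landing@examplestorage.dfs.core.windows.net/sales/sales_order/'
FILEFORMAT = PARQUET
COPY_OPTIONS ('mergeSchema' = 'true');
```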

Pushdown Processing Approach

New in BimlFlex 2026, pushdown processing eliminates the need for ADF Notebook Activities by leveraging the Azure Data Factory Databricks Job Activity. This approach:

  • Pushes all transformation logic directly into Databricks workflows
  • Significantly reduces cluster spin-up overhead
  • Supports compute clusters and job clusters configured with the required ODBC init script
  • Generates native Databricks workflows handling dependencies, delta detection, and restart logic

Performance Benefits

Early benchmarks demonstrate pipelines completing in under 30 minutes using job clusters, compared to over two hours on larger dedicated clusters, representing up to a 75% reduction in runtime cost.

Pushdown Processing

With pushdown processing enabled, all transformations are executed within Databricks workflows or jobs rather than being orchestrated externally through ADF notebook activities. This approach offers several advantages:

  • Lower Runtime Costs: Pay only for actual compute time without idle cluster overhead
  • Simplified Orchestration: Fewer ADF activities to manage and monitor
  • Better Resource Utilization: Ephemeral job clusters run only for the duration of the workload
  • Native Artifacts: All generated pipelines, jobs, and notebooks remain fully native to Databricks and ADF

Cluster Requirements

BimlFlex uses the MS SQL ODBC driver for Databricks connectivity. This requires targeting a compute cluster or job cluster configured with an init script that loads the bfx_init_odbc.sh file generated when BimlFlex builds the Databricks Asset Bundles. Serverless compute is not currently supported because of this ODBC driver requirement.

Pushdown processing supports:

  • Staging Layer: Source-to-staging data movement
  • Data Vault Layer: Hub, Link, and Satellite loading patterns
  • Data Mart Layer: Dimensional model population

Lakehouse Medallion Architecture Support

BimlFlex supports the medallion architecture pattern for Databricks Lakehouse implementations. The pushdown processing and SQL scripting options apply across all layers:

Layer  | BimlFlex Implementation      | Databricks Components
Bronze | Staging + Persistent Staging | Landing in Blob/ADLS, Delta tables for raw data
Silver | Data Vault or Normal Form    | Unity Catalog managed Delta tables
Gold   | Data Mart / Dimensional      | Optimized Delta tables for analytics

Bronze Layer

Raw data lands in Azure Blob Storage or ADLS as Parquet files and is then loaded into Delta tables. BimlFlex manages:

  • Staging tables for current batch processing
  • Persistent Staging Area for historical retention
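A minimal sketch of the second half of that split, assuming a staging table stg.customer and a persistent staging table psa.customer with matching columns plus an audit timestamp (all names are hypothetical):

```sql
-- Append the current staging batch to the Persistent Staging Area for history.
-- Table and column names are assumed; actual BimlFlex patterns add further audit columns.
INSERT INTO psa.customer
SELECT s.*, current_timestamp() AS psa_load_datetime
FROM stg.customer AS s;
```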

Silver Layer

BimlFlex supports two approaches:

  • Data Vault (recommended): Hub, Link, and Satellite patterns with full history
  • Normal Form: Traditional relational modeling
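For the Data Vault approach, a Hub load reduces to inserting business keys that have not been seen before. The following is a hedged sketch with hypothetical table and column names, not the exact generated template:

```sql
-- Insert new business keys into the Hub; existing keys are left untouched.
MERGE INTO rdv.hub_customer AS tgt
USING (
  SELECT customer_hk, customer_bk,
         MIN(load_datetime) AS load_datetime,
         MIN(record_source) AS record_source
  FROM stg.customer
  GROUP BY customer_hk, customer_bk
) AS src
ON tgt.customer_hk = src.customer_hk
WHEN NOT MATCHED THEN
  INSERT (customer_hk, customer_bk, load_datetime, record_source)
  VALUES (src.customer_hk, src.customer_bk, src.load_datetime, src.record_source);
```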

Gold Layer

Dimensional models optimized for analytics:

  • Star schema patterns with Fact and Dimension tables
  • Delta Lake optimizations (Z-ordering, partitioning)
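The Delta optimizations mentioned above often come down to routine OPTIMIZE and VACUUM maintenance; the table and column names below are assumptions for illustration:

```sql
-- Co-locate frequently filtered keys to improve data skipping on the fact table.
OPTIMIZE gold.fact_sales ZORDER BY (order_date_key, customer_key);

-- Remove data files no longer referenced by the Delta transaction log.
VACUUM gold.fact_sales;
```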

Tip: For detailed guidance on implementing medallion architecture, see the Delivering Lakehouse documentation.

SQL Scripting Option

New in BimlFlex 2026, a configuration option enables native SQL-based scripting for Databricks workloads. This provides:

  • Greater Readability: SQL-based templates are easier to review and understand
  • Easier Debugging: Familiar SQL syntax simplifies troubleshooting
  • SQL-Centric Development: Aligns with teams preferring SQL over Python/Scala approaches

Metadata-driven templates now support generating staging, Data Vault, and Data Mart patterns directly in SQL while still leveraging Databricks' scalability and performance.
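To give a sense of what a SQL-first pattern looks like, the sketch below inserts a Satellite row only when the attribute hash changes. Table and column names are hypothetical; the actual generated templates may structure this differently.

```sql
-- Insert a new Satellite row only when the hash diff differs from the most
-- recent row for that Hub key. Names are illustrative, not generated output.
INSERT INTO rdv.sat_customer_details
  (customer_hk, load_datetime, record_source, hash_diff, customer_name, city)
SELECT s.customer_hk, s.load_datetime, s.record_source, s.hash_diff, s.customer_name, s.city
FROM stg.customer AS s
LEFT JOIN (
  SELECT customer_hk, hash_diff,
         ROW_NUMBER() OVER (PARTITION BY customer_hk ORDER BY load_datetime DESC) AS rn
  FROM rdv.sat_customer_details
) AS cur
  ON cur.customer_hk = s.customer_hk AND cur.rn = 1
WHERE cur.customer_hk IS NULL
   OR cur.hash_diff <> s.hash_diff;
```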

Prerequisites

Before implementing Databricks with ADF, ensure you have completed the following:

  1. Databricks Configuration: Complete the setup outlined in the Databricks Configuration Overview
  2. Azure Storage: Configure blob storage for landing, staging, archive, and error containers
  3. Linked Services: Create and configure the Databricks linked service in BimlFlex

Note: Detailed prerequisites and configuration steps are provided in the Databricks Configuration Overview section.

Configuring Databricks in BimlFlex

Loading Sample Metadata

BimlFlex provides sample metadata specifically designed for Databricks with Azure Data Factory. Load the sample from the Dashboard by selecting it from the Load Sample Metadata dropdown.

Note: For more information, see the lakehouse and data modeling implementation documentation.

Connection Configuration

Configure your Databricks connections from within the BimlFlex Connections editor:

Source System Connection:

  • Enable Cloud option for the source system
  • Configure Staging / Landing Environment for Blob Storage with ADF linked services

Databricks Connection:

  • Set System Type to Databricks Data Warehouse
  • Set Linked Service Type to Databricks
  • Configure Integration Template to ADF Source -> Target

Batch Configuration

Prior to building your solution, configure batches from the BimlFlex Batches editor to:

  • Assign batches to different compute resources
  • Configure scaling parameters
  • Set execution priorities

Generated Output

With metadata imported, BimlFlex generates a complete Databricks solution. All generated artifacts are fully native to Databricks and Azure Data Factory, with no proprietary runtime or execution engine required.

BimlFlex generates the following artifacts:

  • Table Definitions: DDL scripts for creating Databricks tables
  • Stored Procedures: SQL procedures for data transformation logic
  • Notebooks/Workflows: Databricks notebooks or workflow definitions (depending on processing mode)
  • ADF Pipelines: Azure Data Factory orchestration artifacts ready to deploy
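As an indication of shape only, a generated staging table definition might look roughly like the following; the schema, columns, and properties are assumptions rather than actual BimlFlex output:

```sql
-- Illustrative staging table DDL (columns and properties are placeholders).
CREATE TABLE IF NOT EXISTS stg.customer (
  customer_bk    STRING,
  customer_name  STRING,
  city           STRING,
  customer_hk    STRING,
  hash_diff      STRING,
  load_datetime  TIMESTAMP,
  record_source  STRING
)
USING DELTA;
```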

Deployed Solution

Once deployed to Azure Data Factory, the solution provides:

  • Visual pipeline representation
  • Monitoring and logging capabilities
  • Error handling with automatic file archiving

Monitoring and Management

After deployment, you can:

  • Scale compute resources up or down
  • View copy command completions and errors
  • Suspend or resume solution execution
  • Monitor execution status and performance

Note: Files that encounter errors are automatically moved to an error folder, while successfully processed files are archived, so subsequent runs do not pick up files from earlier batches.
