Fabric Pipeline Configuration Guide

This guide walks through setting up your first BimlFlex pipeline using Microsoft Fabric Data Factory to load data into Fabric Lakehouse or Warehouse. It covers project creation, connection setup, all Fabric-specific settings, and the build and deployment workflow.

Introduction

BimlFlex generates complete Fabric Data Factory pipelines, Spark notebooks, and deployment artifacts from your metadata. Before starting, ensure you have:

  • A Microsoft Fabric workspace with Data Factory enabled
  • Azure Blob Storage or Azure Data Lake Storage Gen2 for the landing area
  • BimlFlex installed and connected to a BimlFlex metadata database
  • Appropriate permissions in both the Fabric workspace and your storage account

Key Differences from Azure Data Factory

If you are coming from an ADF-based BimlFlex implementation, be aware of these fundamental architectural differences:

  • No linked services. Fabric completely skips linked service generation. Connections use inline JSON properties instead.
  • No datasets. Fabric skips dataset generation entirely. Connection details are embedded directly in pipeline activities.
  • ExternalReference replaces connection names. Every Fabric connection requires an ExternalReference containing the artifact GUID from the Fabric portal. This GUID is used in every generated notebook and pipeline activity.
  • Deployment uses .platform files. Instead of ARM templates, BimlFlex generates .platform companion files for each notebook. Deployment uses Fabric git integration.
  • TridentNotebook replaces DatabricksNotebook. Pipeline activities that execute notebooks use the TridentNotebook activity type instead of DatabricksNotebook.
  • runMultiple() for orchestration. Fabric uses notebookutils.notebook.runMultiple() with a DAG definition for parallel notebook execution, which is unique to the Fabric integration.

Step 1: Create Your Project

Create a new project in BimlFlex and set the Integration Template to Data Factory (Fabric). This corresponds to IntegrationTemplateId = 6 internally.

  1. Open the BimlFlex App and navigate to the Projects editor
  2. Click Create to add a new project
  3. Set the Integration Template to Data Factory (Fabric)
  4. Configure the required connection slots:

| Connection Slot | Purpose | Required |
| --- | --- | --- |
| Source | The source system to extract data from | Yes |
| Target | The Fabric Lakehouse or Warehouse where data will be loaded | Yes |
| Stage | Intermediate staging connection (appears based on configuration) | Conditional |
| Persistent Stage | Persistent staging area for history tracking | Conditional |
| Landing | Azure storage for landing extracted files (appears when source is not Fabric) | Conditional |

The Landing connection slot visibility depends on your source configuration. When using Pushdown Extraction from a Fabric source, the landing connection is not required because data stays within Fabric. For non-Fabric sources, a landing area in Azure Blob Storage or ADLS Gen2 is required.

tip

BimlFlex provides two sample metadata sets to help you get started quickly. Load either Fabric Data Vault or Fabric Datamart from the Dashboard to see a pre-configured project.

Step 2: Configure Connections

Fabric connections use system types that determine the SQL dialect, identifier delimiters, and data type mappings for all generated code.

System Types

| System Type | Abbreviation | ID | Use For |
| --- | --- | --- | --- |
| Fabric Lakehouse | FBRLH | 46 | Lakehouse targets. Generates Spark SQL with backtick delimiters (`). Uses Spark data types (STRING, BIGINT, DOUBLE, TIMESTAMP). |
| Fabric Warehouse | FBRDW | 45 | Warehouse targets. Generates T-SQL with bracket delimiters ([ ]). Uses T-SQL data types, with DateTime capped at precision 6. |
| Fabric SQL Database | FBRSQL | 47 | SQL Database targets (preview). |

Key Connection Fields

ExternalReference (Required for all Fabric connections)

The ExternalReference field must contain the artifact GUID for the Fabric Lakehouse or Warehouse. BimlFlex uses this GUID in every generated notebook header and pipeline activity to reference the correct workspace artifact.

To find the GUID:

  1. Open the Fabric portal and navigate to your workspace
  2. Select the Lakehouse or Warehouse artifact
  3. Copy the artifact ID from the URL or the item properties pane

The format is a standard GUID: a1b2c3d4-e5f6-7890-abcd-ef1234567890

If this field is empty, the build produces validation error CON_21005008:

Connection: 'YourConnection' - A Connection that is configured for Fabric must use External Reference.
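
If you want to pre-validate the value before running a build, the GUID check can be sketched in a few lines of Python. This helper is illustrative only and is not part of BimlFlex:

```python
import uuid

def is_valid_artifact_guid(value: str) -> bool:
    """Return True if value is a canonical dashed GUID, the format
    Fabric uses for artifact IDs (stricter than uuid.UUID alone,
    which also accepts braced and undashed forms)."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False

print(is_valid_artifact_guid("a1b2c3d4-e5f6-7890-abcd-ef1234567890"))  # True
print(is_valid_artifact_guid("not-a-guid"))                            # False
```

Running this against each connection's ExternalReference before a build catches the CON_21005008 error early.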

Catalog

The Catalog field maps to the Lakehouse or Warehouse name. This name appears in all generated notebooks as the default Lakehouse/Warehouse context. Set it to the exact name of your Fabric Lakehouse or Warehouse artifact.

ConnectionType

Set to MicrosoftFabric for Fabric connections.

ExternalLocation

Required when using Pushdown Extraction with a source connection. Specify the OneLake path, for example:

abfss://<workspace-id>@onelake.dfs.fabric.microsoft.com/<lakehouse-name>/Files/

If this field is missing when Pushdown Extraction is enabled, the build produces validation error CON_21005010:

Connection: 'YourConnection' - A Connection that is configured for Fabric with Pushdown Extraction must use External Location.

Step 3: No Linked Services or Datasets

Unlike Azure Data Factory, Fabric pipelines do not use linked services or datasets. If you are transitioning from an ADF implementation, you will not find linked service or dataset configuration screens for your Fabric project; this is expected.

When BimlFlex encounters a Fabric project during the build process, it skips linked service and dataset generation entirely. Instead, connection details are embedded as inline RawJsonProperties within each pipeline activity. These properties reference the Fabric artifact using the ExternalReference GUID:

"linkedService": {
"name": "MyLakehouse",
"properties": {
"annotations": [],
"type": "DataWarehouse",
"typeProperties": {
"endpoint": "<sql-endpoint>",
"artifactId": "<ExternalReference-GUID>",
"workspaceId": "<FabricWorkspaceId-GUID>"
}
}
}

This means every connection property you configure in BimlFlex (ExternalReference, Catalog, Connection String) flows directly into the pipeline JSON. There is no separate linked service artifact to deploy or manage.

Steps 4-5: Configure Fabric Settings

BimlFlex provides the Fabric-specific settings listed below, which control workspace configuration, notebook generation, pipeline behavior, and file handling. Configure these in the Settings editor.

Workspace Settings

| Setting | Default | Notes |
| --- | --- | --- |
| FabricWorkspaceId | 00000000-0000-0000-0000-000000000000 | Required. The GUID of your Fabric workspace. Find this in the Fabric portal URL when you navigate to your workspace. This value is referenced in every generated notebook header and pipeline activity. Must be a valid GUID format. |
| FabricLogicalId | 00000000-0000-0000-0000-000000000000 | GUID used in .platform files for git-based deployment. Required when deploying through Fabric git integration. Must be a valid GUID format. |
| FabricOutputPath | @@OutputPath\Fabric | Local build output directory where all generated Fabric artifacts (notebooks, .platform files, pipeline JSON) are written. |
| FabricWorkspaceName | (empty) | Display name of the Fabric workspace. Used for reference only; does not affect generated code. |

Notebook Execution Settings

These settings control how notebooks execute within the Fabric environment when invoked via notebookutils.notebook.runMultiple().

| Setting | Default | Notes |
| --- | --- | --- |
| FabricNotebookConcurrency | 0 (disabled) | Controls the maximum number of notebooks that execute in parallel within a runMultiple() DAG. When set to 0, the concurrency property is omitted from the DAG. Set to a value greater than 0 for production workloads to enable parallel execution. |
| FabricNotebookTimeout | 0 (disabled) | Per-notebook timeout in seconds within a runMultiple() DAG. When set to 0, the timeoutInSeconds property is omitted. |
| FabricNotebookRetryInterval | 0 (disabled) | Retry interval in seconds for notebook execution within a DAG. When set to 0, the property is omitted. |
| FabricNotebookTimeoutPerCell | 0 (disabled) | Per-cell timeout in seconds for individual notebook cells. When set to 0, the property is omitted. |

Pipeline Activity Settings

These settings apply to TridentNotebook activities in the generated Data Factory pipeline JSON.

| Setting | Default | Notes |
| --- | --- | --- |
| FabricActivityRetryAttempts | 0 | Number of times a pipeline activity retries on failure. |
| FabricActivityRetryInterval | 30 | Seconds between retry attempts for pipeline activities. |
| FabricActivityTimeout | 0.12:00:00 | Maximum activity duration. The default of 0.12:00:00 is 12 hours. Uses the Data Factory timespan format (d.hh:mm:ss). |
| FabricActivitySecureInput | N | When set to Y, masks the activity input in Fabric monitoring views. |
| FabricActivitySecureOutput | N | When set to Y, masks the activity output in Fabric monitoring views. |

Notebook Generation Settings

These settings control how BimlFlex names, organizes, and generates supplementary notebooks.

| Setting | Default | Notes |
| --- | --- | --- |
| FabricUseDisplayFolder | N | Controls whether notebooks are organized into subfolders based on their display folder. The default of N differs from the Databricks integration, where folder structures are more commonly used. |
| FabricAppendNotebookName | (empty) | Adds a prefix or suffix to generated notebook names. If the value ends with _ (e.g., PRD_), it is treated as a prefix. If the value starts with _ (e.g., _v2), it is treated as a suffix. |
| FabricAddDropNotebooks | N | When set to Y, generates additional DROP TABLE notebooks for each target table. |
| FabricAddTruncateNotebooks | N | When set to Y, generates additional TRUNCATE TABLE notebooks for each target table. |
| FabricTempTableSchema | (empty) | Schema name to use for temporary tables in generated notebooks. When empty, temporary tables use the default schema. |
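
The FabricAppendNotebookName prefix/suffix rule can be illustrated with a short sketch. The helper below is hypothetical; BimlFlex applies this logic internally during generation:

```python
def apply_append_setting(notebook_name: str, append_value: str) -> str:
    """Apply a FabricAppendNotebookName-style value:
    trailing '_' means prefix, leading '_' means suffix."""
    if not append_value:
        return notebook_name
    if append_value.endswith("_"):
        return append_value + notebook_name   # "PRD_" -> "PRD_HUB_Customer"
    if append_value.startswith("_"):
        return notebook_name + append_value   # "_v2" -> "HUB_Customer_v2"
    # Assumption: values with no leading/trailing underscore are appended.
    return notebook_name + append_value

print(apply_append_setting("HUB_Customer", "PRD_"))  # PRD_HUB_Customer
print(apply_append_setting("HUB_Customer", "_v2"))   # HUB_Customer_v2
```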

File Handling Settings

These settings control how data files are read and written in COPY and Spark operations within generated notebooks.

| Setting | Default | Notes |
| --- | --- | --- |
| FabricCopyFormatOptions | (empty) | FORMAT_OPTIONS clause appended to COPY INTO statements in generated notebooks. Use this to specify file format details such as delimiters or headers. |
| FabricCopyOptions | (empty) | COPY_OPTIONS clause appended to COPY INTO statements. Use this to specify options such as error tolerance or credential details. |
| FabricReadFilesOptions | (empty) | Options appended to spark.read.load() calls in generated notebooks. Use this to add Spark reader options such as mergeSchema or inferSchema. |

Steps 6-7: Import and Configure Metadata

After configuring your connections and settings, import source metadata and configure your objects as you would for any BimlFlex project. There are two Fabric-specific behaviors to be aware of.

Schema Handling

Fabric Lakehouse (FBRLH) connections set IgnoreSchema internally when the schema is not relevant to the Lakehouse catalog structure. Schemas in BimlFlex are still used for logical organization of objects, but they map differently depending on the system type:

  • FBRLH: Schema names are used as Lakehouse schema namespaces with backtick delimiters (`schema`.`table`)
  • FBRDW: Schema names follow standard T-SQL conventions with bracket delimiters ([schema].[table])

Data Type Differences

The system type determines which data type family BimlFlex uses in generated DDL and notebooks:

| Type Category | FBRLH (Lakehouse) | FBRDW (Warehouse) |
| --- | --- | --- |
| Strings | STRING | VARCHAR, NVARCHAR |
| Integers | BIGINT, INT | BIGINT, INT |
| Decimals | DOUBLE, DECIMAL | FLOAT, DECIMAL |
| Dates/Times | TIMESTAMP | DATETIME2(6) |
| Delimiters | Backticks (`) | Brackets ([ ]) |

note

Fabric Warehouse uses DATETIME2(6) for date-time columns such as RowEffectiveToDate, not DATETIME2(7) as used by standard SQL Server targets. BimlFlex applies this precision cap automatically when generating code for FBRDW connections. The default effective-to-date value generated is CAST('9999-12-31' AS DATETIME2(6)).

Step 8: Data Vault and Data Mart Configuration

Supported Constructs

BimlFlex supports the following Data Vault and Data Mart constructs for Fabric Lakehouse targets:

| Construct | Status | Notes |
| --- | --- | --- |
| Hubs | Supported | Core business entity tables |
| Links | Supported | Relationship tables between hubs |
| Satellites (SAT, LSAT, RSAT, REF) | Supported | Descriptive attribute tables with history |
| Dimensions | Supported | Data Mart dimensional tables |
| Facts | Supported | Data Mart fact tables |

Unsupported Constructs

The following constructs are not yet available for Fabric Lakehouse. They are planned for future releases:

  • PIT (Point-in-Time) tables: not yet implemented for Fabric
  • Bridge tables: not yet implemented for Fabric
  • Views for FBRLH: Fabric Lakehouse does not currently support view generation
  • Delete detection output: not yet available in Fabric-generated artifacts
  • Reload staging notebooks: not yet available for Fabric

For pipelines that require these constructs, consider using ADF with Databricks or Snowflake integration templates.

runMultiple DAG Orchestration

Fabric Data Vault loading uses notebookutils.notebook.runMultiple() to orchestrate parallel execution of Hub, Link, and Satellite notebooks. This is unique to the Fabric integration: ADF uses pipeline activities for orchestration, and Databricks uses job workflows.

For each source object, BimlFlex generates:

  1. Individual notebooks for each Hub, Link, and Satellite
  2. A control notebook that defines a DAG (Directed Acyclic Graph) and calls runMultiple() to execute the individual notebooks

The DAG structure looks like this in the generated control notebook:

DAG = {
    "activities": [
        {"name": "HUB_Customer", "path": "HUB_Customer", "args": {"row_audit_id": row_audit_id}},
        {"name": "LNK_Customer_Order", "path": "LNK_Customer_Order", "args": {"row_audit_id": row_audit_id}},
        {"name": "SAT_Customer", "path": "SAT_Customer", "args": {"row_audit_id": row_audit_id}}
    ],
    "concurrency": 3,
    "timeoutInSeconds": 600
}

notebookutils.notebook.runMultiple(DAG, {"displayDAGViaGraphviz": False})

The concurrency property is controlled by FabricNotebookConcurrency, and timeoutInSeconds by FabricNotebookTimeout. When these settings are 0, the properties are omitted from the DAG.
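
The omit-when-zero behavior can be sketched as follows; this helper is illustrative, not BimlFlex code:

```python
def build_dag(activities, concurrency=0, timeout_seconds=0):
    """Assemble a runMultiple() DAG, leaving out properties whose
    setting is 0 (mirrors FabricNotebookConcurrency and
    FabricNotebookTimeout behavior as described above)."""
    dag = {"activities": activities}
    if concurrency > 0:
        dag["concurrency"] = concurrency
    if timeout_seconds > 0:
        dag["timeoutInSeconds"] = timeout_seconds
    return dag

dag = build_dag([{"name": "HUB_Customer", "path": "HUB_Customer"}], concurrency=3)
print(dag)  # concurrency included, timeoutInSeconds omitted
```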

Table deployment notebooks also use runMultiple() to create tables in parallel. BimlFlex generates deploy.tables.<database>.<schema> notebooks that orchestrate table creation for each schema.

Persistent Staging Patterns

When Persistent Staging Area (PSA) is configured, BimlFlex generates one of two distinct notebook patterns based on the PersistentStageHistory setting on the source object:

Merge pattern (PersistHistory = false): Generates a notebook that uses a MERGE statement to upsert records into the PSA table. This maintains only the current state of each record, overwriting previous values. The notebook checks if the PSA table is empty and uses an optimized full-load path for the initial load.

Insert pattern (PersistHistory = true): Generates a notebook that uses INSERT statements to append all incoming records to the PSA table, preserving complete change history. Like the merge pattern, it includes an empty-table check to optimize the initial load.

Both patterns include:

  • Staging source SQL to create temporary views from landed files
  • Empty-table detection with is_empty / is_delta checks for optimized first-load behavior
  • Delta collapse logic for incremental loads
  • Full insert logic for initial loads
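
The choice between the two patterns can be illustrated with a sketch. The statement shapes and the FlexRowHashKey join column below are simplified assumptions for illustration, not the exact SQL BimlFlex generates:

```python
def psa_load_sql(table: str, view: str, persist_history: bool, is_empty: bool) -> str:
    """Pick the PSA load statement shape: full insert on first load,
    then MERGE (current state only) or INSERT (full history)."""
    if is_empty:
        # Optimized full-load path used by both patterns on first load.
        return f"INSERT INTO {table} SELECT * FROM {view}"
    if persist_history:
        # Insert pattern: append every incoming record.
        return f"INSERT INTO {table} SELECT * FROM {view}"
    # Merge pattern: upsert to keep only current state.
    # FlexRowHashKey is an assumed key column name.
    return (f"MERGE INTO {table} AS tgt USING {view} AS src "
            f"ON tgt.FlexRowHashKey = src.FlexRowHashKey "
            f"WHEN MATCHED THEN UPDATE SET * "
            f"WHEN NOT MATCHED THEN INSERT *")
```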

Step 9: System Columns

BimlFlex automatically handles system column generation based on the target system type.

Hashing (FBRLH): Fabric Lakehouse uses SHA2(CAST(...), 256) by default for hash key generation. This is the Spark SQL SHA2 function, which differs from the T-SQL HASHBYTES function used by SQL Server targets. The hash algorithm can be configured through the standard BimlFlex hash settings.

Hashing (FBRDW): Fabric Warehouse uses CONVERT(CHAR(64), HASHBYTES('SHA2_256', CONVERT(VARCHAR(MAX), ...)), 2), consistent with T-SQL patterns but using the Fabric Warehouse engine.

RowEffectiveToDate (FBRDW): Fabric Warehouse generates DATETIME2(6) for the RowEffectiveToDate system column instead of DATETIME2(7) used by standard SQL Server. The default end-date value is CAST('9999-12-31' AS DATETIME2(6)).

RowEffectiveToDate (FBRLH): Fabric Lakehouse uses TO_TIMESTAMP('9999-12-31') as the Spark-native timestamp representation.
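
For intuition, the Lakehouse-style SHA-256 hash key can be reproduced outside Spark with hashlib. The concatenation delimiter and null substitution shown here are assumptions for illustration; the actual behavior follows your configured BimlFlex hash settings:

```python
import hashlib

def spark_style_hash_key(*cols) -> str:
    """Mimic a SHA2(CONCAT(...), 256)-style hash key: cast each column
    to string, join with an assumed '~' delimiter, hash with SHA-256,
    and return the 64-character hex digest."""
    concatenated = "~".join("" if c is None else str(c) for c in cols)
    return hashlib.sha256(concatenated.encode("utf-8")).hexdigest()

# Same business key always yields the same hash key:
print(spark_style_hash_key("CUST-001", "Jane"))
```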

Step 10: Build and Deploy

Generated Artifacts

When you build a Fabric project, BimlFlex generates the following artifacts in the FabricOutputPath directory:

| Artifact | Location | Description |
| --- | --- | --- |
| Notebooks | Notebooks/\<folder\>/\<name\>.Notebook/notebook-content.py | PySpark notebooks with Fabric metadata headers. Each notebook includes the Lakehouse context (workspace ID, Lakehouse ID, Lakehouse name). |
| .platform files | Notebooks/\<folder\>/\<name\>.Notebook/.platform | JSON companion file for each notebook. Required for Fabric git integration deployment. |
| Table creation notebooks | Tables/\<database\>/\<schema\>/create.table.\<db\>.\<schema\>.\<table\>.Notebook/ | Individual table DDL notebooks. |
| Deploy notebooks | Tables/\<database\>/_deploy/deploy.tables.\<db\>.\<schema\>.Notebook/ | Orchestration notebooks that use runMultiple() to create all tables in a schema in parallel. |
| Drop notebooks | Tables/\<database\>/_deploy/drop.tables.\<database\>.Notebook/ | Generated when FabricAddDropNotebooks = Y. Notebooks with DROP TABLE IF EXISTS statements. |
| Pipeline JSON | Pipeline definition files | Uses TridentNotebook activities with inline connection references via RawJsonProperties. |

BimlFlex does not generate datasets or linked services for Fabric projects.

.platform File Format

Every generated notebook is accompanied by a .platform file that Fabric git integration uses to identify and track the artifact:

{
    "$schema": "https://developer.microsoft.com/json-schemas/fabric/gitIntegration/platformProperties/2.0.0/schema.json",
    "metadata": {
        "type": "Notebook",
        "displayName": "deploy.tables.MyLakehouse.dbo",
        "description": "Notebook for MyLakehouse.dbo deploy"
    },
    "config": {
        "version": "2.0",
        "logicalId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
    }
}

The logicalId is derived from the FabricLogicalId setting or generated deterministically from the object's metadata. The displayName follows the notebook naming convention configured through FabricAppendNotebookName and FabricUseDisplayFolder.
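
One common way to derive a stable GUID from metadata is a name-based UUIDv5. The sketch below only illustrates the idea of deterministic generation; it is not BimlFlex's actual algorithm, and the namespace GUID is a placeholder:

```python
import uuid

# Placeholder namespace GUID; BimlFlex's real derivation is internal.
EXAMPLE_NAMESPACE = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")

def deterministic_logical_id(object_name: str) -> str:
    """Derive a stable GUID from an object's name: the same input
    always produces the same UUIDv5, so rebuilds keep the same ID."""
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, object_name))

# Rebuilding with the same metadata yields the same logicalId:
print(deterministic_logical_id("deploy.tables.MyLakehouse.dbo"))
```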

Notebook Header Format

Every generated notebook begins with a Fabric-specific metadata header that sets the Lakehouse context:

# Fabric notebook source

# METADATA ********************

# META {
# META   "kernel_info": {
# META     "name": "synapse_pyspark"
# META   },
# META   "dependencies": {
# META     "lakehouse": {
# META       "default_lakehouse": "<ExternalReference-GUID>",
# META       "default_lakehouse_name": "<Catalog-Name>",
# META       "default_lakehouse_workspace_id": "<FabricWorkspaceId>",
# META       "known_lakehouses": [
# META         {
# META           "id": "<ExternalReference-GUID>"
# META         }
# META       ]
# META     }
# META   }
# META }

This header tells Fabric which Lakehouse to attach when the notebook opens. The values come from FabricWorkspaceId (workspace), ExternalReference (Lakehouse artifact GUID), and Catalog (Lakehouse name).

Deployment Workflow

Fabric does not use ARM template deployment like ADF. Instead, deploy through Fabric git integration:

  1. Build the project in BimlStudio to generate all artifacts to the FabricOutputPath
  2. Push the generated .platform files and notebook folders to a git repository connected to your Fabric workspace
  3. Sync the Fabric workspace from the git repository to import the notebooks and pipelines
  4. Verify in the Fabric portal that all notebooks, tables, and pipeline definitions appear in your workspace

Alternatively, for manual deployment:

  1. Build the project in BimlStudio
  2. Open each generated notebook file and import it into your Fabric workspace manually
  3. Create the pipeline activities by importing the generated pipeline JSON

tip

For production workflows, Fabric git integration is strongly recommended over manual import. It provides version control, change tracking, and the ability to deploy consistently across environments.