Fabric Pipeline Configuration Guide
This guide walks through setting up your first BimlFlex pipeline using Microsoft Fabric Data Factory to load data into Fabric Lakehouse or Warehouse. It covers project creation, connection setup, all Fabric-specific settings, and the build and deployment workflow.
Introduction
BimlFlex generates complete Fabric Data Factory pipelines, Spark notebooks, and deployment artifacts from your metadata. Before starting, ensure you have:
- A Microsoft Fabric workspace with Data Factory enabled
- Azure Blob Storage or Azure Data Lake Storage Gen2 for the landing area
- BimlFlex installed and connected to a BimlFlex metadata database
- Appropriate permissions in both the Fabric workspace and your storage account
If you are coming from an ADF-based BimlFlex implementation, be aware of these fundamental architectural differences:
- No linked services. Fabric completely skips linked service generation. Connections use inline JSON properties instead.
- No datasets. Fabric skips dataset generation entirely. Connection details are embedded directly in pipeline activities.
- ExternalReference replaces connection names. Every Fabric connection requires an ExternalReference containing the artifact GUID from the Fabric portal. This GUID is used in every generated notebook and pipeline activity.
- Deployment uses .platform files. Instead of ARM templates, BimlFlex generates .platform companion files for each notebook. Deployment uses Fabric git integration.
- TridentNotebook replaces DatabricksNotebook. Pipeline activities that execute notebooks use the TridentNotebook activity type instead of DatabricksNotebook.
- runMultiple() for orchestration. Fabric uses notebookutils.notebook.runMultiple() with a DAG definition for parallel notebook execution, which is unique to the Fabric integration.
Step 1: Create Your Project
Create a new project in BimlFlex and set the Integration Template to Data Factory (Fabric). This corresponds to IntegrationTemplateId = 6 internally.
- Open the BimlFlex App and navigate to the Projects editor
- Click Create to add a new project
- Set the Integration Template to Data Factory (Fabric)
- Configure the required connection slots:
| Connection Slot | Purpose | Required |
|---|---|---|
| Source | The source system to extract data from | Yes |
| Target | The Fabric Lakehouse or Warehouse where data will be loaded | Yes |
| Stage | Intermediate staging connection (appears based on configuration) | Conditional |
| Persistent Stage | Persistent staging area for history tracking | Conditional |
| Landing | Azure storage for landing extracted files (appears when source is not Fabric) | Conditional |
The Landing connection slot visibility depends on your source configuration. When using Pushdown Extraction from a Fabric source, the landing connection is not required because data stays within Fabric. For non-Fabric sources, a landing area in Azure Blob Storage or ADLS Gen2 is required.
BimlFlex provides two sample metadata sets to help you get started quickly. Load either Fabric Data Vault or Fabric Datamart from the Dashboard to see a pre-configured project.
Step 2: Configure Connections
Fabric connections use system types that determine the SQL dialect, identifier delimiters, and data type mappings for all generated code.
System Types
| System Type | Abbreviation | ID | Use For |
|---|---|---|---|
| Fabric Lakehouse | FBRLH | 46 | Lakehouse targets. Generates Spark SQL with backtick delimiters (`). Uses Spark data types (STRING, BIGINT, DOUBLE, TIMESTAMP). |
| Fabric Warehouse | FBRDW | 45 | Warehouse targets. Generates T-SQL with bracket delimiters ([ ]). Uses T-SQL data types, with DateTime capped at precision 6. |
| Fabric SQL Database | FBRSQL | 47 | SQL Database targets (preview). |
Key Connection Fields
ExternalReference (Required for all Fabric connections)
The ExternalReference field must contain the artifact GUID for the Fabric Lakehouse or Warehouse. BimlFlex uses this GUID in every generated notebook header and pipeline activity to reference the correct workspace artifact.
To find the GUID:
- Open the Fabric portal and navigate to your workspace
- Select the Lakehouse or Warehouse artifact
- Copy the artifact ID from the URL or the item properties pane
The format is a standard GUID: a1b2c3d4-e5f6-7890-abcd-ef1234567890
If this field is empty, the build produces validation error CON_21005008:
Connection: 'YourConnection' - A Connection that is configured for Fabric must use External Reference.
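As a quick pre-build check, the ExternalReference value can be validated against the standard GUID shape before running a build. This is an illustrative Python sketch, not part of BimlFlex; the function name is hypothetical:

```python
import re

# Pattern for a standard 8-4-4-4-12 GUID, the shape Fabric artifact IDs use.
GUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)

def is_valid_external_reference(value: str) -> bool:
    """Return True when the value looks like a Fabric artifact GUID."""
    return bool(value) and GUID_RE.match(value) is not None

print(is_valid_external_reference("a1b2c3d4-e5f6-7890-abcd-ef1234567890"))  # True
print(is_valid_external_reference(""))                                       # False
```

An empty or malformed value caught here would otherwise surface later as the CON_21005008 build error.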
Catalog
The Catalog field maps to the Lakehouse or Warehouse name. This name appears in all generated notebooks as the default Lakehouse/Warehouse context. Set it to the exact name of your Fabric Lakehouse or Warehouse artifact.
ConnectionType
Set to MicrosoftFabric for Fabric connections.
ExternalLocation
Required when using Pushdown Extraction with a source connection. Specify the OneLake path, for example:
abfss://<workspace-id>@onelake.dfs.fabric.microsoft.com/<lakehouse-name>/Files/
If this field is missing when Pushdown Extraction is enabled, the build produces validation error CON_21005010:
Connection: 'YourConnection' - A Connection that is configured for Fabric with Pushdown Extraction must use External Location.
Step 3: No Linked Services or Datasets
Unlike Azure Data Factory, Fabric pipelines do not use linked services or datasets. If you are transitioning from an ADF implementation, you will not find linked service or dataset configuration screens for your Fabric project---this is expected.
When BimlFlex encounters a Fabric project during the build process, it skips linked service and dataset generation entirely. Instead, connection details are embedded as inline RawJsonProperties within each pipeline activity. These properties reference the Fabric artifact using the ExternalReference GUID:
```json
"linkedService": {
    "name": "MyLakehouse",
    "properties": {
        "annotations": [],
        "type": "DataWarehouse",
        "typeProperties": {
            "endpoint": "<sql-endpoint>",
            "artifactId": "<ExternalReference-GUID>",
            "workspaceId": "<FabricWorkspaceId-GUID>"
        }
    }
}
```
This means every connection property you configure in BimlFlex (ExternalReference, Catalog, Connection String) flows directly into the pipeline JSON. There is no separate linked service artifact to deploy or manage.
Steps 4-5: Configure Fabric Settings
BimlFlex provides 20 Fabric-specific settings that control workspace configuration, notebook generation, pipeline behavior, and file handling. Configure these in the Settings editor.
Workspace Settings
| Setting | Default | Notes |
|---|---|---|
| FabricWorkspaceId | 00000000-0000-0000-0000-000000000000 | Required. The GUID of your Fabric workspace. Find this in the Fabric portal URL when you navigate to your workspace. This value is referenced in every generated notebook header and pipeline activity. Must be a valid GUID format. |
| FabricLogicalId | 00000000-0000-0000-0000-000000000000 | GUID used in .platform files for git-based deployment. Required when deploying through Fabric git integration. Must be a valid GUID format. |
| FabricOutputPath | @@OutputPath\Fabric | Local build output directory where all generated Fabric artifacts (notebooks, .platform files, pipeline JSON) are written. |
| FabricWorkspaceName | (empty) | Display name of the Fabric workspace. Used for reference only; does not affect generated code. |
Notebook Execution Settings
These settings control how notebooks execute within the Fabric environment when invoked via notebookutils.notebook.runMultiple().
| Setting | Default | Notes |
|---|---|---|
| FabricNotebookConcurrency | 0 (disabled) | Controls the maximum number of notebooks that execute in parallel within a runMultiple() DAG. When set to 0, the concurrency property is omitted from the DAG. Set to a value greater than 0 for production workloads to enable parallel execution. |
| FabricNotebookTimeout | 0 (disabled) | Per-notebook timeout in seconds within a runMultiple() DAG. When set to 0, the timeoutInSeconds property is omitted. |
| FabricNotebookRetryInterval | 0 (disabled) | Retry interval in seconds for notebook execution within a DAG. When set to 0, the property is omitted. |
| FabricNotebookTimeoutPerCell | 0 (disabled) | Per-cell timeout in seconds for individual notebook cells. When set to 0, the property is omitted. |
Pipeline Activity Settings
These settings apply to TridentNotebook activities in the generated Data Factory pipeline JSON.
| Setting | Default | Notes |
|---|---|---|
| FabricActivityRetryAttempts | 0 | Number of times a pipeline activity retries on failure. |
| FabricActivityRetryInterval | 30 | Seconds between retry attempts for pipeline activities. |
| FabricActivityTimeout | 0.12:00:00 | Maximum activity duration. The default of 0.12:00:00 is 12 hours. Uses the Data Factory timespan format (d.hh:mm:ss). |
| FabricActivitySecureInput | N | When set to Y, masks the activity input in Fabric monitoring views. |
| FabricActivitySecureOutput | N | When set to Y, masks the activity output in Fabric monitoring views. |
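The d.hh:mm:ss timespan format can be unfamiliar; this small sketch shows how a value like 0.12:00:00 decomposes. The helper name is hypothetical, and it assumes the day part is always present:

```python
from datetime import timedelta

def parse_adf_timespan(value: str) -> timedelta:
    """Parse a Data Factory timespan in d.hh:mm:ss format, e.g. '0.12:00:00'.

    Assumes the leading day component is always present."""
    days_part, time_part = value.split(".", 1)
    hours, minutes, seconds = (int(p) for p in time_part.split(":"))
    return timedelta(days=int(days_part), hours=hours,
                     minutes=minutes, seconds=seconds)

print(parse_adf_timespan("0.12:00:00"))  # 12:00:00 (the 12-hour default)
```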
Notebook Generation Settings
These settings control how BimlFlex names, organizes, and generates supplementary notebooks.
| Setting | Default | Notes |
|---|---|---|
| FabricUseDisplayFolder | N | Controls whether notebooks are organized into subfolders based on their display folder. The default of N differs from the Databricks integration where folder structures are more commonly used. |
| FabricAppendNotebookName | (empty) | Adds a prefix or suffix to generated notebook names. If the value ends with _ (e.g., PRD_), it is treated as a prefix. If the value starts with _ (e.g., _v2), it is treated as a suffix. |
| FabricAddDropNotebooks | N | When set to Y, generates additional DROP TABLE notebooks for each target table. |
| FabricAddTruncateNotebooks | N | When set to Y, generates additional TRUNCATE TABLE notebooks for each target table. |
| FabricTempTableSchema | (empty) | Schema name to use for temporary tables in generated notebooks. When empty, temporary tables use the default schema. |
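The prefix/suffix rule for FabricAppendNotebookName can be sketched as follows. The helper is hypothetical, and the fallback for values with neither a leading nor a trailing underscore is an assumption:

```python
def apply_append_notebook_name(base_name: str, append_value: str) -> str:
    """Apply the FabricAppendNotebookName rule: a value ending in '_'
    acts as a prefix, a value starting with '_' acts as a suffix."""
    if not append_value:
        return base_name
    if append_value.endswith("_"):
        return append_value + base_name
    if append_value.startswith("_"):
        return base_name + append_value
    return base_name  # assumption: other values are left unapplied

print(apply_append_notebook_name("HUB_Customer", "PRD_"))  # PRD_HUB_Customer
print(apply_append_notebook_name("HUB_Customer", "_v2"))   # HUB_Customer_v2
```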
File Handling Settings
These settings control how data files are read and written in COPY and Spark operations within generated notebooks.
| Setting | Default | Notes |
|---|---|---|
| FabricCopyFormatOptions | (empty) | FORMAT_OPTIONS clause appended to COPY INTO statements in generated notebooks. Use this to specify file format details such as delimiters or headers. |
| FabricCopyOptions | (empty) | COPY_OPTIONS clause appended to COPY INTO statements. Use this to specify options like error tolerance or credential details. |
| FabricReadFilesOptions | (empty) | Options appended to spark.read.load() calls in generated notebooks. Use this to add Spark reader options such as mergeSchema or inferSchema. |
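To see where these option strings land, here is a hedged sketch of composing a COPY INTO statement with the two optional clauses appended. The exact statement shape BimlFlex emits may differ; the table name, path, and file format are placeholders:

```python
def build_copy_into(table: str, source_path: str,
                    format_options: str = "", copy_options: str = "") -> str:
    """Compose a COPY INTO statement, appending the optional clauses that
    FabricCopyFormatOptions and FabricCopyOptions would populate."""
    stmt = f"COPY INTO {table}\nFROM '{source_path}'\nFILEFORMAT = CSV"
    if format_options:
        stmt += f"\nFORMAT_OPTIONS ({format_options})"
    if copy_options:
        stmt += f"\nCOPY_OPTIONS ({copy_options})"
    return stmt

print(build_copy_into(
    "stg.Customer",
    "Files/landing/customer/*.csv",
    format_options="'header' = 'true', 'delimiter' = ','",
))
```

When a setting is left empty, the corresponding clause is simply not emitted, matching the "(empty)" defaults above.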
Steps 6-7: Import and Configure Metadata
After configuring your connections and settings, import source metadata and configure your objects as you would for any BimlFlex project. There are two Fabric-specific behaviors to be aware of.
Schema Handling
Fabric Lakehouse (FBRLH) connections set IgnoreSchema internally when the schema is not relevant to the Lakehouse catalog structure. Schemas in BimlFlex are still used for logical organization of objects, but they map differently depending on the system type:
- FBRLH: Schema names are used as Lakehouse schema namespaces with backtick delimiters (`schema`.`table`)
- FBRDW: Schema names follow standard T-SQL conventions with bracket delimiters ([schema].[table])
Data Type Differences
The system type determines which data type family BimlFlex uses in generated DDL and notebooks:
| Type Category | FBRLH (Lakehouse) | FBRDW (Warehouse) |
|---|---|---|
| Strings | STRING | VARCHAR, NVARCHAR |
| Integers | BIGINT, INT | BIGINT, INT |
| Decimals | DOUBLE, DECIMAL | FLOAT, DECIMAL |
| Dates/Times | TIMESTAMP | DATETIME2(6) |
| Delimiters | Backticks (`) | Brackets ([ ]) |
Fabric Warehouse uses DATETIME2(6) for date-time columns such as RowEffectiveToDate, not DATETIME2(7) as used by standard SQL Server targets. BimlFlex applies this precision cap automatically when generating code for FBRDW connections. The default effective-to-date value generated is CAST('9999-12-31' AS DATETIME2(6)).
Step 8: Data Vault and Data Mart Configuration
Supported Constructs
BimlFlex supports the following Data Vault and Data Mart constructs for Fabric Lakehouse targets:
| Construct | Status | Notes |
|---|---|---|
| Hubs | Supported | Core business entity tables |
| Links | Supported | Relationship tables between hubs |
| Satellites (SAT, LSAT, RSAT, REF) | Supported | Descriptive attribute tables with history |
| Dimensions | Supported | Data Mart dimensional tables |
| Facts | Supported | Data Mart fact tables |
The following constructs are not yet available for Fabric Lakehouse. They are planned for future releases:
- PIT (Point-in-Time) tables --- Not yet implemented for Fabric
- Bridge tables --- Not yet implemented for Fabric
- Views for FBRLH --- Fabric Lakehouse does not currently support view generation
- Delete detection output --- Not yet available in Fabric-generated artifacts
- Reload staging notebooks --- Not yet available for Fabric
For pipelines that require these constructs, consider using ADF with Databricks or Snowflake integration templates.
runMultiple DAG Orchestration
Fabric Data Vault loading uses notebookutils.notebook.runMultiple() to orchestrate parallel execution of Hub, Link, and Satellite notebooks. This is unique to the Fabric integration---ADF uses pipeline activities for orchestration, and Databricks uses job workflows.
For each source object, BimlFlex generates:
- Individual notebooks for each Hub, Link, and Satellite
- A control notebook that defines a DAG (Directed Acyclic Graph) and calls runMultiple() to execute the individual notebooks
The DAG structure looks like this in the generated control notebook:
```python
DAG = {
    "activities": [
        {"name": "HUB_Customer", "path": "HUB_Customer", "args": {"row_audit_id": row_audit_id}},
        {"name": "LNK_Customer_Order", "path": "LNK_Customer_Order", "args": {"row_audit_id": row_audit_id}},
        {"name": "SAT_Customer", "path": "SAT_Customer", "args": {"row_audit_id": row_audit_id}}
    ],
    "concurrency": 3,
    "timeoutInSeconds": 600
}

notebookutils.notebook.runMultiple(DAG, {"displayDAGViaGraphviz": False})
```
The concurrency property is controlled by FabricNotebookConcurrency, and timeoutInSeconds by FabricNotebookTimeout. When these settings are 0, the properties are omitted from the DAG.
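The omission behavior can be sketched as a small builder function. This is a hypothetical helper for illustration; the real generator works from metadata:

```python
def build_dag(activities, concurrency=0, timeout_seconds=0):
    """Build a runMultiple() DAG dict, omitting the concurrency and
    timeoutInSeconds properties when the corresponding setting is 0
    (disabled), mirroring FabricNotebookConcurrency and
    FabricNotebookTimeout."""
    dag = {"activities": activities}
    if concurrency > 0:
        dag["concurrency"] = concurrency
    if timeout_seconds > 0:
        dag["timeoutInSeconds"] = timeout_seconds
    return dag

dag = build_dag(
    [{"name": "HUB_Customer", "path": "HUB_Customer", "args": {}}],
    concurrency=3,
)
# timeout_seconds was 0, so the DAG has only activities and concurrency keys
print(sorted(dag.keys()))
```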
Table deployment notebooks also use runMultiple() to create tables in parallel. BimlFlex generates deploy.tables.<database>.<schema> notebooks that orchestrate table creation for each schema.
Persistent Staging Patterns
When Persistent Staging Area (PSA) is configured, BimlFlex generates one of two distinct notebook patterns based on the PersistentStageHistory setting on the source object:
Merge pattern (PersistHistory = false): Generates a notebook that uses a MERGE statement to upsert records into the PSA table. This maintains only the current state of each record, overwriting previous values. The notebook checks if the PSA table is empty and uses an optimized full-load path for the initial load.
Insert pattern (PersistHistory = true): Generates a notebook that uses INSERT statements to append all incoming records to the PSA table, preserving complete change history. Like the merge pattern, it includes an empty-table check to optimize the initial load.
Both patterns include:
- Staging source SQL to create temporary views from landed files
- Empty-table detection with is_empty/is_delta checks for optimized first-load behavior
- Delta collapse logic for incremental loads
- Full insert logic for initial loads
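A minimal sketch of the pattern choice, assuming hypothetical table and view names. The generated notebooks contain considerably more logic, including the empty-table and delta checks described above:

```python
def psa_statement(target: str, source_view: str, key_cols, persist_history: bool) -> str:
    """Return a sketch of the core SQL a PSA notebook runs: an append-only
    INSERT when history is persisted, otherwise a MERGE upsert keyed on
    the PSA key columns. All names here are hypothetical."""
    if persist_history:
        # Insert pattern: append every incoming record, preserving history.
        return f"INSERT INTO {target} SELECT * FROM {source_view}"
    # Merge pattern: upsert so only the current state of each record remains.
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    return (
        f"MERGE INTO {target} AS t USING {source_view} AS s ON {on_clause} "
        f"WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *"
    )

print(psa_statement("psa.Customer", "stg_customer_view", ["CustomerKey"], True))
```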
Step 9: System Columns
BimlFlex automatically handles system column generation based on the target system type.
Hashing (FBRLH): Fabric Lakehouse uses SHA2(CAST(...), 256) by default for hash key generation. This is the Spark SQL SHA2 function, which differs from the T-SQL HASHBYTES function used by SQL Server targets. The hash algorithm can be configured through the standard BimlFlex hash settings.
Hashing (FBRDW): Fabric Warehouse uses CONVERT(CHAR(64), HASHBYTES('SHA2_256', CONVERT(VARCHAR(MAX), ...)), 2), consistent with T-SQL patterns but using the Fabric Warehouse engine.
RowEffectiveToDate (FBRDW): Fabric Warehouse generates DATETIME2(6) for the RowEffectiveToDate system column instead of DATETIME2(7) used by standard SQL Server. The default end-date value is CAST('9999-12-31' AS DATETIME2(6)).
RowEffectiveToDate (FBRLH): Fabric Lakehouse uses TO_TIMESTAMP('9999-12-31') as the Spark-native timestamp representation.
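For cross-checking hash keys outside Fabric, Spark SQL's SHA2(value, 256) can be reproduced with Python's hashlib. This assumes the input string has already been concatenated and sanitized according to your hash settings:

```python
import hashlib

def spark_sha2_256(value: str) -> str:
    """Reproduce Spark SQL's SHA2(value, 256): the lowercase hex
    SHA-256 digest of the UTF-8 encoded string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Note: the T-SQL pattern CONVERT(CHAR(64), HASHBYTES('SHA2_256', ...), 2)
# produces the same digest in uppercase hex, so comparing keys across
# FBRLH and FBRDW targets requires a case-insensitive comparison.
print(spark_sha2_256("CUSTOMER-001"))
```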
Step 10: Build and Deploy
Generated Artifacts
When you build a Fabric project, BimlFlex generates the following artifacts in the FabricOutputPath directory:
| Artifact | Location | Description |
|---|---|---|
| Notebooks | Notebooks/<folder>/<name>.Notebook/notebook-content.py | PySpark notebooks with Fabric metadata headers. Each notebook includes the Lakehouse context (workspace ID, Lakehouse ID, Lakehouse name). |
| .platform files | Notebooks/<folder>/<name>.Notebook/.platform | JSON companion file for each notebook. Required for Fabric git integration deployment. |
| Table creation notebooks | Tables/<database>/<schema>/create.table.<db>.<schema>.<table>.Notebook/ | Individual table DDL notebooks. |
| Deploy notebooks | Tables/<database>/_deploy/deploy.tables.<db>.<schema>.Notebook/ | Orchestration notebooks that use runMultiple() to create all tables in a schema in parallel. |
| Drop notebooks | Tables/<database>/_deploy/drop.tables.<database>.Notebook/ | Generated when FabricAddDropNotebooks = Y. Notebooks with DROP TABLE IF EXISTS statements. |
| Pipeline JSON | Pipeline definition files | Uses TridentNotebook activities with inline connection references via RawJsonProperties. |
BimlFlex does not generate datasets or linked services for Fabric projects.
.platform File Format
Every generated notebook is accompanied by a .platform file that Fabric git integration uses to identify and track the artifact:
```json
{
    "$schema": "https://developer.microsoft.com/json-schemas/fabric/gitIntegration/platformProperties/2.0.0/schema.json",
    "metadata": {
        "type": "Notebook",
        "displayName": "deploy.tables.MyLakehouse.dbo",
        "description": "Notebook for MyLakehouse.dbo deploy"
    },
    "config": {
        "version": "2.0",
        "logicalId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
    }
}
```
The logicalId is derived from the FabricLogicalId setting or generated deterministically from the object's metadata. The displayName follows the notebook naming convention configured through FabricAppendNotebookName and FabricUseDisplayFolder.
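The exact derivation of the deterministic fallback is not documented here. A name-based UUID (RFC 4122 version 5) is one standard way to obtain a stable GUID from object metadata, shown purely as an assumption; the namespace GUID below is hypothetical:

```python
import uuid

# Hypothetical namespace GUID; BimlFlex's actual derivation may differ.
NAMESPACE = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")

def logical_id_for(display_name: str) -> str:
    """Derive a stable, deterministic GUID from a notebook display name
    using a name-based (version 5) UUID."""
    return str(uuid.uuid5(NAMESPACE, display_name))

# The same display name always yields the same logicalId across builds.
print(logical_id_for("deploy.tables.MyLakehouse.dbo"))
```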
Notebook Header Format
Every generated notebook begins with a Fabric-specific metadata header that sets the Lakehouse context:
```python
# Fabric notebook source

# METADATA ********************

# META {
# META   "kernel_info": {
# META     "name": "synapse_pyspark"
# META   },
# META   "dependencies": {
# META     "lakehouse": {
# META       "default_lakehouse": "<ExternalReference-GUID>",
# META       "default_lakehouse_name": "<Catalog-Name>",
# META       "default_lakehouse_workspace_id": "<FabricWorkspaceId>",
# META       "known_lakehouses": [
# META         {
# META           "id": "<ExternalReference-GUID>"
# META         }
# META       ]
# META     }
# META   }
# META }
```
This header tells Fabric which Lakehouse to attach when the notebook opens. The values come from FabricWorkspaceId (workspace), ExternalReference (Lakehouse artifact GUID), and Catalog (Lakehouse name).
Deployment Workflow
Fabric does not use ARM template deployment like ADF. Instead, deploy through Fabric git integration:
- Build the project in BimlStudio to generate all artifacts to the FabricOutputPath
- Push the generated .platform files and notebook folders to a git repository connected to your Fabric workspace
- Sync the Fabric workspace from the git repository to import the notebooks and pipelines
- Verify in the Fabric portal that all notebooks, tables, and pipeline definitions appear in your workspace
Alternatively, for manual deployment:
- Build the project in BimlStudio
- Open each generated notebook file and import it into your Fabric workspace manually
- Create the pipeline activities by importing the generated pipeline JSON
For production workflows, Fabric git integration is strongly recommended over manual import. It provides version control, change tracking, and the ability to deploy consistently across environments.
Related Resources
- Microsoft Fabric Configuration Overview --- Prerequisites and connection setup
- Implementing Fabric Lakehouse --- Lakehouse-specific implementation details
- Implementing Fabric Warehouse --- Warehouse-specific implementation details
- Configuring a Landing Area --- Azure storage configuration for data landing
- Settings Editor --- BimlFlex settings management
- Microsoft Fabric Documentation --- Official Fabric documentation