Fabric Pipeline Configuration Guide

This guide walks through setting up your first BimlFlex pipeline using Microsoft Fabric Data Factory to load data into Fabric Lakehouse or Warehouse. It covers project creation, connection setup, all Fabric-specific settings, and the build and deployment workflow.

Introduction

BimlFlex generates complete Fabric Data Factory pipelines, Spark notebooks, and deployment artifacts from your metadata. Before starting, ensure you have:

  • A Microsoft Fabric workspace with Data Factory enabled
  • Azure Blob Storage or Azure Data Lake Storage Gen2 for the landing area
  • BimlFlex installed and connected to a BimlFlex metadata database
  • Appropriate permissions in both the Fabric workspace and your storage account

Key Differences from Azure Data Factory

If you are coming from an ADF-based BimlFlex implementation, be aware of these fundamental architectural differences:

  • No linked services. Fabric completely skips linked service generation. Connections use inline JSON properties instead.
  • No datasets. Fabric skips dataset generation entirely. Connection details are embedded directly in pipeline activities.
  • ExternalReference replaces connection names. Every Fabric connection requires an ExternalReference containing the artifact GUID from the Fabric portal. This GUID is used in every generated notebook and pipeline activity.
  • Deployment uses .platform files. Instead of ARM templates, BimlFlex generates .platform companion files for each notebook. Deployment uses Fabric git integration.
  • TridentNotebook replaces DatabricksNotebook. Pipeline activities that execute notebooks use the TridentNotebook activity type instead of DatabricksNotebook.
  • runMultiple() for orchestration. Fabric uses notebookutils.notebook.runMultiple() with a DAG definition for parallel notebook execution, which is unique to the Fabric integration.

Step 1: Create Your Project

Create a new project in BimlFlex and set the Integration Template to Data Factory (Fabric). This corresponds to IntegrationTemplateId = 6 internally.

  1. Open the BimlFlex App and navigate to the Projects editor
  2. Click Create to add a new project
  3. Set the Integration Template to Data Factory (Fabric)
  4. Configure the required connection slots:

| Connection Slot | Purpose | Required |
| --- | --- | --- |
| Source | The source system to extract data from | Yes |
| Target | The Fabric Lakehouse or Warehouse where data will be loaded | Yes |
| Stage | Intermediate staging connection (appears based on configuration) | Conditional |
| Persistent Stage | Persistent staging area for history tracking | Conditional |
| Landing | Azure storage for landing extracted files (appears when source is not Fabric) | Conditional |

The Landing connection slot visibility depends on your source configuration. When using Pushdown Extraction from a Fabric source, the landing connection is not required because data stays within Fabric. For non-Fabric sources, a landing area in Azure Blob Storage or ADLS Gen2 is required.

tip

BimlFlex provides two sample metadata sets to help you get started quickly. Load either Fabric Data Vault or Fabric Datamart from the Dashboard to see a pre-configured project.

Step 2: Configure Connections

Fabric connections use system types that determine the SQL dialect, identifier delimiters, and data type mappings for all generated code.

System Types

| System Type | Abbreviation | ID | Use For |
| --- | --- | --- | --- |
| Fabric Lakehouse | FBRLH | 46 | Lakehouse targets. Generates Spark SQL with backtick delimiters (`). Uses Spark data types (STRING, BIGINT, DOUBLE, TIMESTAMP). |
| Fabric Warehouse | FBRDW | 45 | Warehouse targets. Generates T-SQL with bracket delimiters ([ ]). Uses T-SQL data types, with DateTime capped at precision 6. |
| Fabric SQL Database | FBRSQL | 47 | SQL Database targets (preview). |

Key Connection Fields

ExternalReference (Required for all Fabric connections)

The ExternalReference field must contain the artifact GUID for the Fabric Lakehouse or Warehouse. BimlFlex uses this GUID in every generated notebook header and pipeline activity to reference the correct workspace artifact.

To find the GUID:

  1. Open the Fabric portal and navigate to your workspace
  2. Select the Lakehouse or Warehouse artifact
  3. Copy the artifact ID from the URL or the item properties pane

The format is a standard GUID: a1b2c3d4-e5f6-7890-abcd-ef1234567890

If this field is empty, the build produces validation error CON_21005008:

Connection: 'YourConnection' - A Connection that is configured for Fabric must use External Reference.
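
If you want to pre-validate the value before running a build, the GUID check can be sketched in a few lines of Python. This helper is illustrative only and is not part of BimlFlex:

```python
import uuid

def is_valid_artifact_guid(value: str) -> bool:
    """Return True if value is a canonical dashed GUID, the format
    Fabric uses for artifact IDs (stricter than uuid.UUID alone,
    which also accepts braced and undashed forms)."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False

print(is_valid_artifact_guid("a1b2c3d4-e5f6-7890-abcd-ef1234567890"))  # True
print(is_valid_artifact_guid("not-a-guid"))                            # False
```

Running this against each connection's ExternalReference before a build catches the CON_21005008 error early.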

Catalog

The Catalog field maps to the Lakehouse or Warehouse name. This name appears in all generated notebooks as the default Lakehouse/Warehouse context. Set it to the exact name of your Fabric Lakehouse or Warehouse artifact.

ConnectionType

Set to MicrosoftFabric for Fabric connections.

ExternalLocation

Required when using Pushdown Extraction with a source connection. Specify the OneLake path, for example:

abfss://<workspace-id>@onelake.dfs.fabric.microsoft.com/<lakehouse-name>/Files/

If this field is missing when Pushdown Extraction is enabled, the build produces validation error CON_21005010:

Connection: 'YourConnection' - A Connection that is configured for Fabric with Pushdown Extraction must use External Location.

Step 3: No Linked Services or Datasets

Unlike Azure Data Factory, Fabric pipelines do not use linked services or datasets. If you are transitioning from an ADF implementation, you will not find linked service or dataset configuration screens for your Fabric project; this is expected.

When BimlFlex encounters a Fabric project during the build process, it skips linked service and dataset generation entirely. Instead, connection details are embedded as inline RawJsonProperties within each pipeline activity. These properties reference the Fabric artifact using the ExternalReference GUID:

"linkedService": {
"name": "MyLakehouse",
"properties": {
"annotations": [],
"type": "DataWarehouse",
"typeProperties": {
"endpoint": "<sql-endpoint>",
"artifactId": "<ExternalReference-GUID>",
"workspaceId": "<FabricWorkspaceId-GUID>"
}
}
}

This means every connection property you configure in BimlFlex (ExternalReference, Catalog, Connection String) flows directly into the pipeline JSON. There is no separate linked service artifact to deploy or manage.

Steps 4-5: Configure Fabric Settings

BimlFlex provides the Fabric-specific settings listed below, which control workspace configuration, notebook generation, pipeline behavior, and file handling. Configure these in the Settings editor.

Workspace Settings

| Setting | Default | Notes |
| --- | --- | --- |
| FabricWorkspaceId | 00000000-0000-0000-0000-000000000000 | Required. The GUID of your Fabric workspace. Find this in the Fabric portal URL when you navigate to your workspace. This value is referenced in every generated notebook header and pipeline activity. Must be a valid GUID format. |
| FabricLogicalId | 00000000-0000-0000-0000-000000000000 | GUID used in .platform files for git-based deployment. Required when deploying through Fabric git integration. Must be a valid GUID format. |
| FabricOutputPath | @@OutputPath\Fabric | Local build output directory where all generated Fabric artifacts (notebooks, .platform files, pipeline JSON) are written. |
| FabricWorkspaceName | (empty) | Display name of the Fabric workspace. Used for reference only; does not affect generated code. |

Notebook Execution Settings

These settings control how notebooks execute within the Fabric environment when invoked via notebookutils.notebook.runMultiple().

| Setting | Default | Notes |
| --- | --- | --- |
| FabricNotebookConcurrency | 0 (disabled) | Controls the maximum number of notebooks that execute in parallel within a runMultiple() DAG. When set to 0, the concurrency property is omitted from the DAG. Set to a value greater than 0 for production workloads to enable parallel execution. |
| FabricNotebookTimeout | 0 (disabled) | Per-notebook timeout in seconds within a runMultiple() DAG. When set to 0, the timeoutInSeconds property is omitted. |
| FabricNotebookRetryInterval | 0 (disabled) | Retry interval in seconds for notebook execution within a DAG. When set to 0, the property is omitted. |
| FabricNotebookTimeoutPerCell | 0 (disabled) | Per-cell timeout in seconds for individual notebook cells. When set to 0, the property is omitted. |

Pipeline Activity Settings

These settings apply to TridentNotebook activities in the generated Data Factory pipeline JSON.

| Setting | Default | Notes |
| --- | --- | --- |
| FabricActivityRetryAttempts | 0 | Number of times a pipeline activity retries on failure. |
| FabricActivityRetryInterval | 30 | Seconds between retry attempts for pipeline activities. |
| FabricActivityTimeout | 0.12:00:00 | Maximum activity duration. The default of 0.12:00:00 is 12 hours. Uses the Data Factory timespan format (d.hh:mm:ss). |
| FabricActivitySecureInput | N | When set to Y, masks the activity input in Fabric monitoring views. |
| FabricActivitySecureOutput | N | When set to Y, masks the activity output in Fabric monitoring views. |

Notebook Generation Settings

These settings control how BimlFlex names, organizes, and generates supplementary notebooks.

| Setting | Default | Notes |
| --- | --- | --- |
| FabricUseDisplayFolder | N | Controls whether notebooks are organized into subfolders based on their display folder. The default of N differs from the Databricks integration, where folder structures are more commonly used. |
| FabricAppendNotebookName | (empty) | Adds a prefix or suffix to generated notebook names. If the value ends with _ (e.g., PRD_), it is treated as a prefix. If the value starts with _ (e.g., _v2), it is treated as a suffix. |
| FabricAddDropNotebooks | N | When set to Y, generates additional DROP TABLE notebooks for each target table. |
| FabricAddTruncateNotebooks | N | When set to Y, generates additional TRUNCATE TABLE notebooks for each target table. |
| FabricTempTableSchema | (empty) | Schema name to use for temporary tables in generated notebooks. When empty, temporary tables use the default schema. |
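
The FabricAppendNotebookName prefix/suffix rule can be illustrated with a short sketch. The helper below is hypothetical; BimlFlex applies this logic internally during generation:

```python
def apply_append_setting(notebook_name: str, append_value: str) -> str:
    """Apply a FabricAppendNotebookName-style value:
    trailing '_' means prefix, leading '_' means suffix."""
    if not append_value:
        return notebook_name
    if append_value.endswith("_"):
        return append_value + notebook_name   # "PRD_" -> "PRD_HUB_Customer"
    if append_value.startswith("_"):
        return notebook_name + append_value   # "_v2" -> "HUB_Customer_v2"
    # Assumption: values with no leading/trailing underscore are appended.
    return notebook_name + append_value

print(apply_append_setting("HUB_Customer", "PRD_"))  # PRD_HUB_Customer
print(apply_append_setting("HUB_Customer", "_v2"))   # HUB_Customer_v2
```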

File Handling Settings

These settings control how data files are read and written in COPY and Spark operations within generated notebooks.

| Setting | Default | Notes |
| --- | --- | --- |
| FabricCopyFormatOptions | (empty) | FORMAT_OPTIONS clause appended to COPY INTO statements in generated notebooks. Use this to specify file format details such as delimiters or headers. |
| FabricCopyOptions | (empty) | COPY_OPTIONS clause appended to COPY INTO statements. Use this to specify options such as error tolerance or credential details. |
| FabricReadFilesOptions | (empty) | Options appended to spark.read.load() calls in generated notebooks. Use this to add Spark reader options such as mergeSchema or inferSchema. |

Steps 6-7: Import and Configure Metadata

After configuring your connections and settings, import source metadata and configure your objects as you would for any BimlFlex project. There are two Fabric-specific behaviors to be aware of.

Schema Handling

Fabric Lakehouse (FBRLH) connections set IgnoreSchema internally when the schema is not relevant to the Lakehouse catalog structure. Schemas in BimlFlex are still used for logical organization of objects, but they map differently depending on the system type:

  • FBRLH: Schema names are used as Lakehouse schema namespaces with backtick delimiters (`schema`.`table`)
  • FBRDW: Schema names follow standard T-SQL conventions with bracket delimiters ([schema].[table])

Data Type Differences

The system type determines which data type family BimlFlex uses in generated DDL and notebooks:

| Type Category | FBRLH (Lakehouse) | FBRDW (Warehouse) |
| --- | --- | --- |
| Strings | STRING | VARCHAR, NVARCHAR |
| Integers | BIGINT, INT | BIGINT, INT |
| Decimals | DOUBLE, DECIMAL | FLOAT, DECIMAL |
| Dates/Times | TIMESTAMP | DATETIME2(6) |
| Delimiters | Backticks (`) | Brackets ([ ]) |

note

Fabric Warehouse uses DATETIME2(6) for date-time columns such as RowEffectiveToDate, not DATETIME2(7) as used by standard SQL Server targets. BimlFlex applies this precision cap automatically when generating code for FBRDW connections. The default effective-to-date value generated is CAST('9999-12-31' AS DATETIME2(6)).

Step 8: Data Vault and Data Mart Configuration

Supported Constructs

BimlFlex supports the following Data Vault and Data Mart constructs for Fabric Lakehouse targets:

| Construct | Status | Notes |
| --- | --- | --- |
| Hubs | Supported | Core business entity tables |
| Links | Supported | Relationship tables between hubs |
| Satellites (SAT, LSAT, RSAT, REF) | Supported | Descriptive attribute tables with history |
| Dimensions | Supported | Data Mart dimensional tables |
| Facts | Supported | Data Mart fact tables |

Unsupported Constructs

The following constructs are not yet available for Fabric Lakehouse. They are planned for future releases:

  • PIT (Point-in-Time) tables: not yet implemented for Fabric
  • Bridge tables: not yet implemented for Fabric
  • Views for FBRLH: Fabric Lakehouse does not currently support view generation
  • Delete detection output: not yet available in Fabric-generated artifacts
  • Reload staging notebooks: not yet available for Fabric

For pipelines that require these constructs, consider using ADF with Databricks or Snowflake integration templates.

runMultiple DAG Orchestration

Fabric Data Vault loading uses notebookutils.notebook.runMultiple() to orchestrate parallel execution of Hub, Link, and Satellite notebooks. This is unique to the Fabric integration: ADF uses pipeline activities for orchestration, and Databricks uses job workflows.

For each source object, BimlFlex generates:

  1. Individual notebooks for each Hub, Link, and Satellite
  2. A control notebook that defines a DAG (Directed Acyclic Graph) and calls runMultiple() to execute the individual notebooks

The DAG structure looks like this in the generated control notebook:

DAG = {
    "activities": [
        {"name": "HUB_Customer", "path": "HUB_Customer", "args": {"row_audit_id": row_audit_id}},
        {"name": "LNK_Customer_Order", "path": "LNK_Customer_Order", "args": {"row_audit_id": row_audit_id}},
        {"name": "SAT_Customer", "path": "SAT_Customer", "args": {"row_audit_id": row_audit_id}}
    ],
    "concurrency": 3,
    "timeoutInSeconds": 600
}

notebookutils.notebook.runMultiple(DAG, {"displayDAGViaGraphviz": False})

The concurrency property is controlled by FabricNotebookConcurrency, and timeoutInSeconds by FabricNotebookTimeout. When these settings are 0, the properties are omitted from the DAG.
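
The omit-when-zero behavior can be sketched as follows; this helper is illustrative, not BimlFlex code:

```python
def build_dag(activities, concurrency=0, timeout_seconds=0):
    """Assemble a runMultiple() DAG, leaving out properties whose
    setting is 0 (mirrors FabricNotebookConcurrency and
    FabricNotebookTimeout behavior as described above)."""
    dag = {"activities": activities}
    if concurrency > 0:
        dag["concurrency"] = concurrency
    if timeout_seconds > 0:
        dag["timeoutInSeconds"] = timeout_seconds
    return dag

dag = build_dag([{"name": "HUB_Customer", "path": "HUB_Customer"}], concurrency=3)
print(dag)  # concurrency included, timeoutInSeconds omitted
```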

Table deployment notebooks also use runMultiple() to create tables in parallel. BimlFlex generates deploy.tables.<database>.<schema> notebooks that orchestrate table creation for each schema.

Persistent Staging Patterns

When Persistent Staging Area (PSA) is configured, BimlFlex generates one of two distinct notebook patterns based on the PersistentStageHistory setting on the source object:

Merge pattern (PersistHistory = false): Generates a notebook that uses a MERGE statement to upsert records into the PSA table. This maintains only the current state of each record, overwriting previous values. The notebook checks if the PSA table is empty and uses an optimized full-load path for the initial load.

Insert pattern (PersistHistory = true): Generates a notebook that uses INSERT statements to append all incoming records to the PSA table, preserving complete change history. Like the merge pattern, it includes an empty-table check to optimize the initial load.

Both patterns include:

  • Staging source SQL to create temporary views from landed files
  • Empty-table detection with is_empty / is_delta checks for optimized first-load behavior
  • Delta collapse logic for incremental loads
  • Full insert logic for initial loads
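
The choice between the two patterns can be illustrated with a sketch. The statement shapes and the FlexRowHashKey join column below are simplified assumptions for illustration, not the exact SQL BimlFlex generates:

```python
def psa_load_sql(table: str, view: str, persist_history: bool, is_empty: bool) -> str:
    """Pick the PSA load statement shape: full insert on first load,
    then MERGE (current state only) or INSERT (full history)."""
    if is_empty:
        # Optimized full-load path used by both patterns on first load.
        return f"INSERT INTO {table} SELECT * FROM {view}"
    if persist_history:
        # Insert pattern: append every incoming record.
        return f"INSERT INTO {table} SELECT * FROM {view}"
    # Merge pattern: upsert to keep only current state.
    # FlexRowHashKey is an assumed key column name.
    return (f"MERGE INTO {table} AS tgt USING {view} AS src "
            f"ON tgt.FlexRowHashKey = src.FlexRowHashKey "
            f"WHEN MATCHED THEN UPDATE SET * "
            f"WHEN NOT MATCHED THEN INSERT *")
```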

Step 9: System Columns

BimlFlex automatically handles system column generation based on the target system type.

Hashing (FBRLH): Fabric Lakehouse uses SHA2(CAST(...), 256) by default for hash key generation. This is the Spark SQL SHA2 function, which differs from the T-SQL HASHBYTES function used by SQL Server targets. The hash algorithm can be configured through the standard BimlFlex hash settings.

Hashing (FBRDW): Fabric Warehouse uses CONVERT(CHAR(64), HASHBYTES('SHA2_256', CONVERT(VARCHAR(MAX), ...)), 2), consistent with T-SQL patterns but using the Fabric Warehouse engine.

RowEffectiveToDate (FBRDW): Fabric Warehouse generates DATETIME2(6) for the RowEffectiveToDate system column instead of DATETIME2(7) used by standard SQL Server. The default end-date value is CAST('9999-12-31' AS DATETIME2(6)).

RowEffectiveToDate (FBRLH): Fabric Lakehouse uses TO_TIMESTAMP('9999-12-31') as the Spark-native timestamp representation.
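
For intuition, the Lakehouse-style SHA-256 hash key can be reproduced outside Spark with hashlib. The concatenation delimiter and null substitution shown here are assumptions for illustration; the actual behavior follows your configured BimlFlex hash settings:

```python
import hashlib

def spark_style_hash_key(*cols) -> str:
    """Mimic a SHA2(CONCAT(...), 256)-style hash key: cast each column
    to string, join with an assumed '~' delimiter, hash with SHA-256,
    and return the 64-character hex digest."""
    concatenated = "~".join("" if c is None else str(c) for c in cols)
    return hashlib.sha256(concatenated.encode("utf-8")).hexdigest()

# Same business key always yields the same hash key:
print(spark_style_hash_key("CUST-001", "Jane"))
```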

Step 10: Build and Deploy

Generated Artifacts

When you build a Fabric project, BimlFlex generates the following artifacts in the FabricOutputPath directory:

| Artifact | Location | Description |
| --- | --- | --- |
| Notebooks | Notebooks/\<folder\>/\<name\>.Notebook/notebook-content.py | PySpark notebooks with Fabric metadata headers. Each notebook includes the Lakehouse context (workspace ID, Lakehouse ID, Lakehouse name). |
| .platform files | Notebooks/\<folder\>/\<name\>.Notebook/.platform | JSON companion file for each notebook. Required for Fabric git integration deployment. |
| Table creation notebooks | Tables/\<database\>/\<schema\>/create.table.\<db\>.\<schema\>.\<table\>.Notebook/ | Individual table DDL notebooks. |
| Deploy notebooks | Tables/\<database\>/_deploy/deploy.tables.\<db\>.\<schema\>.Notebook/ | Orchestration notebooks that use runMultiple() to create all tables in a schema in parallel. |
| Drop notebooks | Tables/\<database\>/_deploy/drop.tables.\<database\>.Notebook/ | Generated when FabricAddDropNotebooks = Y. Notebooks with DROP TABLE IF EXISTS statements. |
| Pipeline JSON | Pipeline definition files | Uses TridentNotebook activities with inline connection references via RawJsonProperties. |

BimlFlex does not generate datasets or linked services for Fabric projects.

.platform File Format

Every generated notebook is accompanied by a .platform file that Fabric git integration uses to identify and track the artifact:

{
    "$schema": "https://developer.microsoft.com/json-schemas/fabric/gitIntegration/platformProperties/2.0.0/schema.json",
    "metadata": {
        "type": "Notebook",
        "displayName": "deploy.tables.MyLakehouse.dbo",
        "description": "Notebook for MyLakehouse.dbo deploy"
    },
    "config": {
        "version": "2.0",
        "logicalId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
    }
}

The logicalId is derived from the FabricLogicalId setting or generated deterministically from the object's metadata. The displayName follows the notebook naming convention configured through FabricAppendNotebookName and FabricUseDisplayFolder.
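
One common way to derive a stable GUID from metadata is a name-based UUIDv5. The sketch below only illustrates the idea of deterministic generation; it is not BimlFlex's actual algorithm, and the namespace GUID is a placeholder:

```python
import uuid

# Placeholder namespace GUID; BimlFlex's real derivation is internal.
EXAMPLE_NAMESPACE = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")

def deterministic_logical_id(object_name: str) -> str:
    """Derive a stable GUID from an object's name: the same input
    always produces the same UUIDv5, so rebuilds keep the same ID."""
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, object_name))

# Rebuilding with the same metadata yields the same logicalId:
print(deterministic_logical_id("deploy.tables.MyLakehouse.dbo"))
```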

Notebook Header Format

Every generated notebook begins with a Fabric-specific metadata header that sets the Lakehouse context:

# Fabric notebook source

# METADATA ********************

# META {
# META   "kernel_info": {
# META     "name": "synapse_pyspark"
# META   },
# META   "dependencies": {
# META     "lakehouse": {
# META       "default_lakehouse": "<ExternalReference-GUID>",
# META       "default_lakehouse_name": "<Catalog-Name>",
# META       "default_lakehouse_workspace_id": "<FabricWorkspaceId>",
# META       "known_lakehouses": [
# META         {
# META           "id": "<ExternalReference-GUID>"
# META         }
# META       ]
# META     }
# META   }
# META }

This header tells Fabric which Lakehouse to attach when the notebook opens. The values come from FabricWorkspaceId (workspace), ExternalReference (Lakehouse artifact GUID), and Catalog (Lakehouse name).

Deployment Workflow

Fabric does not use ARM template deployment like ADF. Instead, deploy through Fabric git integration:

  1. Build the project in BimlStudio to generate all artifacts to the FabricOutputPath
  2. Push the generated .platform files and notebook folders to a git repository connected to your Fabric workspace
  3. Sync the Fabric workspace from the git repository to import the notebooks and pipelines
  4. Verify in the Fabric portal that all notebooks, tables, and pipeline definitions appear in your workspace

Alternatively, for manual deployment:

  1. Build the project in BimlStudio
  2. Open each generated notebook file and import it into your Fabric workspace manually
  3. Create the pipeline activities by importing the generated pipeline JSON

tip

For production workflows, Fabric git integration is strongly recommended over manual import. It provides version control, change tracking, and the ability to deploy consistently across environments.