Automation and Patterns

This guide shows you how to automate ETL pipeline generation using metadata-driven development. You'll learn to replace manual package creation with reusable patterns that scale from 10 to 1,000+ tables.

Why Automation Matters for Data Engineers

Data engineers spend significant time on repetitive tasks: creating staging tables, building load packages, implementing error handling. Each task follows predictable patterns, yet traditional tools require manual implementation for every object.

The core insight: If you can describe the pattern, Biml can generate the implementation.

Think of it this way: SQL Server and SSIS automate data movement, but you still create each package by hand. Biml automates package creation itself, reading your source schema and generating hundreds of packages in seconds.

Manual vs Automated: The Real Difference

| Task | Manual Approach | Automated Approach |
|---|---|---|
| Create 100 staging tables | Write 100 CREATE TABLE scripts | Generate all from source metadata |
| Build 100 load packages | Create each package in SSDT | Loop through tables, generate packages |
| Add audit columns to all tables | Edit each table definition | Add once in template, regenerate |
| Handle source schema change | Find and update affected packages | Regenerate all packages |
| Implement error handling | Copy/paste into each package | Define pattern once, apply everywhere |

Time comparison:

  • 100-table staging layer, manual: 2-3 weeks
  • 100-table staging layer, automated: 1-2 hours (including testing)

Core Automation Concepts

Metadata-Driven Development

Metadata-driven development separates what you want to build from how it gets built:

  1. Metadata describes your data structures (tables, columns, connections)
  2. Templates define the patterns (staging loads, error handling, logging)
  3. Output is generated automatically (SSIS packages, SQL scripts, pipelines)

When source systems change, update the metadata and regenerate. The templates remain unchanged.
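
A minimal sketch of the separation, using a hard-coded table list as stand-in metadata (real projects read it from a metadata store or import it from the source schema, as shown later in this guide):

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <#
        // Metadata: a hypothetical inline table list; in practice this
        // comes from a metadata database or a schema import
        var tableNames = new[] { "Customer", "SalesOrder", "Product" };
        #>
        <# foreach (var tableName in tableNames) { #>
        <!-- Template: one package generated per metadata entry -->
        <Package Name="Load_<#= tableName #>" ConstraintMode="Linear" />
        <# } #>
    </Packages>
</Biml>

Adding a table name to the list and rebuilding produces a new package; the loop itself never changes.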

The Biml Relational Hierarchy

Biml organizes database metadata in a hierarchy that mirrors your actual data infrastructure:

Connection
└── Database
    └── Schema
        └── Table
            └── Columns

You can import this hierarchy directly from your source systems:

<#@ import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>
<#
// Connect to source and import table definitions
var sourceConnection = RootNode.OleDbConnections["Source"];
var tables = sourceConnection.GetDatabaseSchema(
    new List<string> { "Sales", "Production" }, // Schemas to include
    null,                                       // All tables
    ImportOptions.ExcludeForeignKey | ImportOptions.ExcludeViews
).TableNodes;
#>

This single call retrieves all table and column metadata, ready for use in your templates.
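
Before generating anything, you can sanity-check the import with a small diagnostic nugget (a sketch; emitting XML comments is just one convenient way to surface the results):

<#
// Emit one XML comment per imported table so the expanded Biml
// shows exactly what the import returned
foreach (var table in tables) {
    WriteLine("<!-- Imported {0}.{1} with {2} columns -->",
        table.Schema, table.Name, table.Columns.Count);
}
#>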

The Pattern Approach

A pattern is a reusable solution to a recurring problem. In data engineering:

  • Staging pattern: Truncate target, load from source, add audit columns
  • Incremental pattern: Track high-water mark, load only changed records (see the sketch below)
  • Error handling pattern: Redirect bad rows, log errors, continue processing

Patterns become BimlScript templates. Write the pattern once, apply it to every table:

<# foreach (var table in RootNode.Tables) { #>
<Package Name="Load_<#= table.Name #>" ConstraintMode="Linear">
    <!-- Pattern implementation here -->
</Package>
<# } #>
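
The incremental pattern fits the same shape. Here is a hedged sketch of its high-water-mark step, assuming a hypothetical Orders table whose source rows carry a ModifiedDate column:

<Package Name="Load_Orders_Incremental" ConstraintMode="Linear">
    <Variables>
        <Variable Name="LastLoadDate" DataType="DateTime">1900-01-01</Variable>
    </Variables>
    <Tasks>
        <!-- Read the high-water mark recorded by the previous run -->
        <ExecuteSQL Name="GetHighWaterMark" ConnectionName="Target" ResultSet="SingleRow">
            <DirectInput>SELECT ISNULL(MAX(LoadDateTime), '19000101') FROM [stg].[Orders]</DirectInput>
            <Results>
                <Result Name="0" VariableName="User.LastLoadDate" />
            </Results>
        </ExecuteSQL>
        <!-- Load only rows changed since the last run -->
        <Dataflow Name="LoadChangedRows">
            <Transformations>
                <OleDbSource Name="Source" ConnectionName="Source">
                    <DirectInput>SELECT * FROM [Sales].[Orders] WHERE ModifiedDate &gt; ?</DirectInput>
                    <Parameters>
                        <Parameter Name="0" VariableName="User.LastLoadDate" />
                    </Parameters>
                </OleDbSource>
                <OleDbDestination Name="Target" ConnectionName="Target">
                    <ExternalTableOutput Table="[stg].[Orders]" />
                </OleDbDestination>
            </Transformations>
        </Dataflow>
    </Tasks>
</Package>

Wrapped in the same foreach loop as the staging template, this generates an incremental load for every table that exposes a change-tracking column.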

Practical Example: Automated Staging Layer

This example creates a complete staging layer: target tables and load packages for all source tables.

File Organization

Use a numbered naming convention to control build order:

1-Environment.biml      # Connections, databases (tier 10)
2-ImportMetadata.biml   # Read source schema (tier 20)
3-DeployTables.biml     # Generate DDL package (tier 30)
4-LoadPackages.biml     # Generate load packages (tier 40)

Tier numbering tip: Use increments of 10 (10, 20, 30) rather than 1, 2, 3. This leaves room to insert files between existing tiers without renumbering everything.

Step 1: Define Environment

1-Environment.biml
<#@ template tier="10" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source"
            ConnectionString="Provider=SQLNCLI11;Server=SourceServer;Initial Catalog=AdventureWorks;Integrated Security=SSPI;" />
        <OleDbConnection Name="Target"
            ConnectionString="Provider=SQLNCLI11;Server=TargetServer;Initial Catalog=Staging;Integrated Security=SSPI;" />
    </Connections>
    <Databases>
        <Database Name="Staging" ConnectionName="Target" />
    </Databases>
    <Schemas>
        <Schema Name="stg" DatabaseName="Staging" />
    </Schemas>
</Biml>

Step 2: Import Source Metadata

2-ImportMetadata.biml
<#@ template tier="20" #>
<#@ import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>
<#
var sourceConnection = RootNode.OleDbConnections["Source"];
var includedSchemas = new List<string> { "Sales", "Production", "Person" };
var tables = sourceConnection.GetDatabaseSchema(
    includedSchemas,
    null,
    ImportOptions.ExcludeForeignKey |
    ImportOptions.ExcludeColumnDefault |
    ImportOptions.ExcludeViews
).TableNodes;
#>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Tables>
        <# foreach (var table in tables) { #>
        <Table Name="<#= table.Name #>" SchemaName="Staging.stg">
            <Columns>
                <# foreach (var column in table.Columns) { #>
                <#= column.GetBiml() #>
                <# } #>
                <!-- Audit columns added to every table -->
                <Column Name="LoadDateTime" DataType="DateTime2" IsNullable="false" />
                <Column Name="SourceSystem" DataType="String" Length="50" IsNullable="false" />
            </Columns>
            <Annotations>
                <Annotation Tag="SourceSchema"><#= table.Schema #></Annotation>
                <Annotation Tag="SourceTable"><#= table.Name #></Annotation>
            </Annotations>
        </Table>
        <# } #>
    </Tables>
</Biml>

Step 3: Generate Table Deployment Package

3-DeployTables.biml
<#@ template tier="30" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Deploy_StagingTables" ConstraintMode="Parallel">
            <Tasks>
                <# foreach (var table in RootNode.Tables) { #>
                <ExecuteSQL Name="Create_<#= table.Name #>" ConnectionName="Target">
                    <DirectInput><#= table.GetDropAndCreateDdl() #></DirectInput>
                </ExecuteSQL>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>

Step 4: Generate Load Packages

4-LoadPackages.biml
<#@ template tier="40" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <# foreach (var table in RootNode.Tables) {
            var sourceSchema = table.GetTag("SourceSchema");
            var sourceTable = table.GetTag("SourceTable");
        #>
        <Package Name="Load_<#= table.Name #>" ConstraintMode="Linear">
            <Tasks>
                <!-- Step 1: Truncate target -->
                <ExecuteSQL Name="Truncate" ConnectionName="Target">
                    <DirectInput>TRUNCATE TABLE [stg].[<#= table.Name #>]</DirectInput>
                </ExecuteSQL>

                <!-- Step 2: Load data with audit columns -->
                <Dataflow Name="LoadData">
                    <Transformations>
                        <OleDbSource Name="Source" ConnectionName="Source">
                            <DirectInput>SELECT * FROM [<#= sourceSchema #>].[<#= sourceTable #>]</DirectInput>
                        </OleDbSource>

                        <DerivedColumns Name="AddAuditColumns">
                            <Columns>
                                <Column Name="LoadDateTime" DataType="DateTime2" Length="8">
                                    (DT_DBTIMESTAMP2,7)GETDATE()
                                </Column>
                                <Column Name="SourceSystem" DataType="String" Length="50">
                                    "AdventureWorks"
                                </Column>
                            </Columns>
                        </DerivedColumns>

                        <OleDbDestination Name="Target" ConnectionName="Target">
                            <ExternalTableOutput Table="[stg].[<#= table.Name #>]" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
        <# } #>

        <!-- Master workflow to execute all loads -->
        <Package Name="Master_LoadStaging" ConstraintMode="Parallel">
            <Tasks>
                <# foreach (var table in RootNode.Tables) { #>
                <ExecutePackage Name="Exec_<#= table.Name #>">
                    <Package PackageName="Load_<#= table.Name #>" />
                </ExecutePackage>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>

Step 5: Build and Deploy

  1. Build pass 1: Select files 1-3 and build to generate the DDL package
  2. Run the DDL package: Creates all staging tables on the target server
  3. Build pass 2: Select files 1-4 and build to generate the load packages
  4. Deploy and execute: Run Master_LoadStaging

The staging tables must exist on the target before the load packages can run, which is why the build and deployment happen in two passes.

Platform-Specific Automation

SSIS (SQL Server Integration Services)

The examples above generate SSIS packages directly. Key capabilities:

  • Data Flow Tasks: OleDbSource, transformations, OleDbDestination
  • Control Flow: ExecuteSQL, ExecutePackage, containers
  • Event Handlers: OnError, OnPreExecute for logging (see the sketch after this list)
  • Variables and Parameters: Dynamic configuration
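
As a sketch of the event-handler capability, an OnError handler can be defined once in a pattern and generated into every package; the ETLLog table and its columns below are hypothetical:

<Package Name="Load_Example" ConstraintMode="Linear">
    <Events>
        <!-- Fires whenever any task in the package raises an error -->
        <Event EventType="OnError" Name="LogError" ConstraintMode="Linear">
            <Tasks>
                <ExecuteSQL Name="WriteErrorRow" ConnectionName="Target">
                    <DirectInput>
                        INSERT INTO [dbo].[ETLLog] (PackageName, ErrorDescription, LogDateTime)
                        VALUES (?, ?, GETDATE())
                    </DirectInput>
                    <Parameters>
                        <Parameter Name="0" VariableName="System.PackageName" DataType="String" Length="200" />
                        <Parameter Name="1" VariableName="System.ErrorDescription" DataType="String" Length="4000" />
                    </Parameters>
                </ExecuteSQL>
            </Tasks>
        </Event>
    </Events>
    <Tasks>
        <!-- Regular package tasks go here -->
    </Tasks>
</Package>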

See Biml Basics for SSIS for detailed SSIS patterns.

Azure Data Factory (ADF)

BimlFlex generates ADF pipelines using the same metadata-driven approach:

  • Linked Services: Connection definitions for Azure, on-premises, SaaS
  • Datasets: Source and target table definitions
  • Pipelines: Copy activities, data flows, orchestration
  • Integration Runtimes: Self-hosted for on-premises sources

BimlFlex maps your metadata to ADF JSON definitions, enabling consistent patterns across SSIS and ADF.

See BimlFlex ADF Documentation for implementation details.

Databricks

BimlFlex supports Databricks for modern data lakehouse architectures:

  • Notebooks: Generated Python/SQL for data transformations
  • Delta Lake: Managed tables with ACID transactions
  • Unity Catalog: Metadata and governance integration
  • Workflows: Orchestration of notebook jobs

See BimlFlex Databricks Documentation for patterns.

Snowflake

BimlFlex generates Snowflake-specific artifacts:

  • Stages: External and internal staging areas
  • Stored Procedures: Load logic in Snowflake SQL
  • Tasks: Scheduled execution
  • Streams: Change data capture

See BimlFlex Snowflake Documentation for implementation.

Microsoft Fabric

BimlFlex supports Microsoft Fabric for unified analytics:

  • Lakehouses: OneLake storage with Delta tables
  • Dataflows: Power Query-based transformations
  • Notebooks: Spark-based processing
  • Pipelines: Orchestration and scheduling

See BimlFlex Fabric Documentation for patterns.

CI/CD Integration

Version Control Best Practices

File organization:

/biml
    /environment   # Connection and config files
    /metadata      # Table import scripts
    /patterns      # Reusable pattern templates
    /packages      # Package generation scripts
    /output        # Generated artifacts (gitignore these)

Naming conventions:

  • Use tier prefixes: 1-Environment.biml, 2-Metadata.biml
  • Descriptive names: Load_Staging_Sales.biml
  • Separate concerns: one file per pattern type

Build Pipeline Example

azure-pipelines.yml
trigger:
- main

pool:
  vmImage: 'windows-latest'

steps:
- task: PowerShell@2
  displayName: 'Build Biml'
  inputs:
    targetType: 'inline'
    script: |
      # Build using BimlFlex CLI or BimlStudio command line
      & "C:\Program Files\Varigence\BimlStudio\Biml.exe" `
        /build /project:MyProject.mst /output:$(Build.ArtifactStagingDirectory)

- task: PublishBuildArtifacts@1
  displayName: 'Publish Artifacts'
  inputs:
    PathtoPublish: '$(Build.ArtifactStagingDirectory)'
    ArtifactName: 'BimlOutput'

Key CI/CD Benefits

  1. Reproducible builds: Same metadata produces identical packages
  2. Automated testing: Validate package structure before deployment (see the sketch after this list)
  3. Environment promotion: Dev → Test → Prod with config changes only
  4. Audit trail: Git history tracks all metadata and pattern changes
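
One lightweight form of automated testing is an assertion inside BimlScript itself. This sketch, assuming the audit-column convention from the staging example above, aborts the build if any table is missing its audit column:

<#@ import namespace="System.Linq" #>
<#
// Fail fast: every staging table must carry the audit columns
// before any packages are generated
foreach (var table in RootNode.Tables) {
    if (!table.Columns.Any(c => c.Name == "LoadDateTime")) {
        throw new InvalidOperationException(
            string.Format("Table {0} is missing the LoadDateTime audit column", table.Name));
    }
}
#>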

Automation Benefits Summary

| Metric | Manual Approach | Automated Approach |
|---|---|---|
| 100-table staging layer | 2-3 weeks | 1-2 hours |
| Schema changes | Update each package | Regenerate all |
| New audit requirement | Edit 100+ packages | Update 1 template |
| Error handling | Copy/paste per package | Single pattern |
| Consistency | Variable quality | Guaranteed |
| Documentation | Manual maintenance | Generated from metadata |

Next Steps