Automation and Patterns

This guide shows you how to automate ETL pipeline generation using metadata-driven development. You'll learn to replace manual package creation with reusable patterns that scale from 10 to 1,000+ tables.

Why Automation Matters for Data Engineers

Data engineers spend significant time on repetitive tasks: creating staging tables, building load packages, implementing error handling. Each task follows predictable patterns, yet traditional tools require manual implementation for every object.

The core insight: If you can describe the pattern, Biml can generate the implementation.

Think of it this way: SQL Server and SSIS automate data movement, but you still create each package by hand. Biml automates package creation itself, reading your source schema and generating hundreds of packages in seconds.

Manual vs Automated: The Real Difference

| Task | Manual Approach | Automated Approach |
|---|---|---|
| Create 100 staging tables | Write 100 CREATE TABLE scripts | Generate all from source metadata |
| Build 100 load packages | Create each package in SSDT | Loop through tables, generate packages |
| Add audit columns to all tables | Edit each table definition | Add once in template, regenerate |
| Handle source schema change | Find and update affected packages | Regenerate all packages |
| Implement error handling | Copy/paste into each package | Define pattern once, apply everywhere |

Time comparison:

  • 100-table staging layer, manual: 2-3 weeks
  • 100-table staging layer, automated: 1-2 hours (including testing)

Core Automation Concepts

Metadata-Driven Development

Metadata-driven development separates what you want to build from how it gets built:

  1. Metadata describes your data structures (tables, columns, connections)
  2. Templates define the patterns (staging loads, error handling, logging)
  3. Output is generated automatically (SSIS packages, SQL scripts, pipelines)

When source systems change, update the metadata and regenerate. The templates remain unchanged.
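
A minimal sketch of the separation, using a hard-coded table list as stand-in metadata (real projects read it from a metadata store or import it from the source schema, as shown later in this guide):

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <#
        // Metadata: a hypothetical inline table list; in practice this
        // comes from a metadata database or a schema import
        var tableNames = new[] { "Customer", "SalesOrder", "Product" };
        #>
        <# foreach (var tableName in tableNames) { #>
        <!-- Template: one package generated per metadata entry -->
        <Package Name="Load_<#= tableName #>" ConstraintMode="Linear" />
        <# } #>
    </Packages>
</Biml>

Adding a table name to the list and rebuilding produces a new package; the loop itself never changes.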

The Biml Relational Hierarchy

Biml organizes database metadata in a hierarchy that mirrors your actual data infrastructure:

Connection
└── Database
    └── Schema
        └── Table
            └── Columns

You can import this hierarchy directly from your source systems:

<#@ import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>
<#
// Connect to source and import table definitions
var sourceConnection = RootNode.OleDbConnections["Source"];
var tables = sourceConnection.GetDatabaseSchema(
    new List<string> { "Sales", "Production" }, // Schemas to include
    null,                                       // All tables
    ImportOptions.ExcludeForeignKey | ImportOptions.ExcludeViews
).TableNodes;
#>

This single call retrieves all table and column metadata, ready for use in your templates.
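
Before generating anything, you can sanity-check the import with a small diagnostic nugget (a sketch; emitting XML comments is just one convenient way to surface the results):

<#
// Emit one XML comment per imported table so the expanded Biml
// shows exactly what the import returned
foreach (var table in tables) {
    WriteLine("<!-- Imported {0}.{1} with {2} columns -->",
        table.Schema, table.Name, table.Columns.Count);
}
#>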

The Pattern Approach

A pattern is a reusable solution to a recurring problem. In data engineering:

  • Staging pattern: Truncate target, load from source, add audit columns
  • Incremental pattern: Track high-water mark, load only changed records (see the sketch below)
  • Error handling pattern: Redirect bad rows, log errors, continue processing

Patterns become BimlScript templates. Write the pattern once, apply it to every table:

<# foreach (var table in RootNode.Tables) { #>
<Package Name="Load_<#= table.Name #>" ConstraintMode="Linear">
    <!-- Pattern implementation here -->
</Package>
<# } #>
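
The incremental pattern fits the same shape. Here is a hedged sketch of its high-water-mark step, assuming a hypothetical Orders table whose source rows carry a ModifiedDate column:

<Package Name="Load_Orders_Incremental" ConstraintMode="Linear">
    <Variables>
        <Variable Name="LastLoadDate" DataType="DateTime">1900-01-01</Variable>
    </Variables>
    <Tasks>
        <!-- Read the high-water mark recorded by the previous run -->
        <ExecuteSQL Name="GetHighWaterMark" ConnectionName="Target" ResultSet="SingleRow">
            <DirectInput>SELECT ISNULL(MAX(LoadDateTime), '19000101') FROM [stg].[Orders]</DirectInput>
            <Results>
                <Result Name="0" VariableName="User.LastLoadDate" />
            </Results>
        </ExecuteSQL>
        <!-- Load only rows changed since the last run -->
        <Dataflow Name="LoadChangedRows">
            <Transformations>
                <OleDbSource Name="Source" ConnectionName="Source">
                    <DirectInput>SELECT * FROM [Sales].[Orders] WHERE ModifiedDate &gt; ?</DirectInput>
                    <Parameters>
                        <Parameter Name="0" VariableName="User.LastLoadDate" />
                    </Parameters>
                </OleDbSource>
                <OleDbDestination Name="Target" ConnectionName="Target">
                    <ExternalTableOutput Table="[stg].[Orders]" />
                </OleDbDestination>
            </Transformations>
        </Dataflow>
    </Tasks>
</Package>

Wrapped in the same foreach loop as the staging template, this generates an incremental load for every table that exposes a change-tracking column.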

Practical Example: Automated Staging Layer

This example creates a complete staging layer: target tables and load packages for all source tables.

File Organization

Use a numbered naming convention to control build order:

1-Environment.biml      # Connections, databases (tier 10)
2-ImportMetadata.biml   # Read source schema (tier 20)
3-DeployTables.biml     # Generate DDL package (tier 30)
4-LoadPackages.biml     # Generate load packages (tier 40)

Tier numbering tip: Use increments of 10 (10, 20, 30) rather than 1, 2, 3. This leaves room to insert files between existing tiers without renumbering everything.

Step 1: Define Environment

1-Environment.biml
<#@ template tier="10" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source"
            ConnectionString="Provider=SQLNCLI11;Server=SourceServer;Initial Catalog=AdventureWorks;Integrated Security=SSPI;" />
        <OleDbConnection Name="Target"
            ConnectionString="Provider=SQLNCLI11;Server=TargetServer;Initial Catalog=Staging;Integrated Security=SSPI;" />
    </Connections>
    <Databases>
        <Database Name="Staging" ConnectionName="Target" />
    </Databases>
    <Schemas>
        <Schema Name="stg" DatabaseName="Staging" />
    </Schemas>
</Biml>

Step 2: Import Source Metadata

2-ImportMetadata.biml
<#@ template tier="20" #>
<#@ import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>
<#
var sourceConnection = RootNode.OleDbConnections["Source"];
var includedSchemas = new List<string> { "Sales", "Production", "Person" };
var tables = sourceConnection.GetDatabaseSchema(
    includedSchemas,
    null,
    ImportOptions.ExcludeForeignKey |
    ImportOptions.ExcludeColumnDefault |
    ImportOptions.ExcludeViews
).TableNodes;
#>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Tables>
        <# foreach (var table in tables) { #>
        <Table Name="<#= table.Name #>" SchemaName="Staging.stg">
            <Columns>
                <# foreach (var column in table.Columns) { #>
                <#= column.GetBiml() #>
                <# } #>
                <!-- Audit columns added to every table -->
                <Column Name="LoadDateTime" DataType="DateTime2" IsNullable="false" />
                <Column Name="SourceSystem" DataType="String" Length="50" IsNullable="false" />
            </Columns>
            <Annotations>
                <Annotation Tag="SourceSchema"><#= table.Schema #></Annotation>
                <Annotation Tag="SourceTable"><#= table.Name #></Annotation>
            </Annotations>
        </Table>
        <# } #>
    </Tables>
</Biml>

Step 3: Generate Table Deployment Package

3-DeployTables.biml
<#@ template tier="30" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="Deploy_StagingTables" ConstraintMode="Parallel">
            <Tasks>
                <# foreach (var table in RootNode.Tables) { #>
                <ExecuteSQL Name="Create_<#= table.Name #>" ConnectionName="Target">
                    <DirectInput><#= table.GetDropAndCreateDdl() #></DirectInput>
                </ExecuteSQL>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>

Step 4: Generate Load Packages

4-LoadPackages.biml
<#@ template tier="40" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <# foreach (var table in RootNode.Tables) {
            var sourceSchema = table.GetTag("SourceSchema");
            var sourceTable = table.GetTag("SourceTable");
        #>
        <Package Name="Load_<#= table.Name #>" ConstraintMode="Linear">
            <Tasks>
                <!-- Step 1: Truncate target -->
                <ExecuteSQL Name="Truncate" ConnectionName="Target">
                    <DirectInput>TRUNCATE TABLE [stg].[<#= table.Name #>]</DirectInput>
                </ExecuteSQL>

                <!-- Step 2: Load data with audit columns -->
                <Dataflow Name="LoadData">
                    <Transformations>
                        <OleDbSource Name="Source" ConnectionName="Source">
                            <DirectInput>SELECT * FROM [<#= sourceSchema #>].[<#= sourceTable #>]</DirectInput>
                        </OleDbSource>

                        <DerivedColumns Name="AddAuditColumns">
                            <Columns>
                                <Column Name="LoadDateTime" DataType="DateTime2" Length="8">
                                    (DT_DBTIMESTAMP2,7)GETDATE()
                                </Column>
                                <Column Name="SourceSystem" DataType="String" Length="50">
                                    "AdventureWorks"
                                </Column>
                            </Columns>
                        </DerivedColumns>

                        <OleDbDestination Name="Target" ConnectionName="Target">
                            <ExternalTableOutput Table="[stg].[<#= table.Name #>]" />
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
        <# } #>

        <!-- Master workflow to execute all loads -->
        <Package Name="Master_LoadStaging" ConstraintMode="Parallel">
            <Tasks>
                <# foreach (var table in RootNode.Tables) { #>
                <ExecutePackage Name="Exec_<#= table.Name #>">
                    <Package PackageName="Load_<#= table.Name #>" />
                </ExecutePackage>
                <# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>

Step 5: Build and Deploy

  1. Build pass 1: Select files 1-3 and build to generate the DDL package
  2. Run the DDL package: Creates all staging tables on the target server
  3. Build pass 2: Select files 1-4 and build to generate the load packages
  4. Deploy and execute: Run Master_LoadStaging

The staging tables must exist on the target before the load packages can run, which is why the build and deployment happen in two passes.

Platform-Specific Automation

SSIS (SQL Server Integration Services)

The examples above generate SSIS packages directly. Key capabilities:

  • Data Flow Tasks: OleDbSource, transformations, OleDbDestination
  • Control Flow: ExecuteSQL, ExecutePackage, containers
  • Event Handlers: OnError, OnPreExecute for logging (see the sketch after this list)
  • Variables and Parameters: Dynamic configuration
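
As a sketch of the event-handler capability, an OnError handler can be defined once in a pattern and generated into every package; the ETLLog table and its columns below are hypothetical:

<Package Name="Load_Example" ConstraintMode="Linear">
    <Events>
        <!-- Fires whenever any task in the package raises an error -->
        <Event EventType="OnError" Name="LogError" ConstraintMode="Linear">
            <Tasks>
                <ExecuteSQL Name="WriteErrorRow" ConnectionName="Target">
                    <DirectInput>
                        INSERT INTO [dbo].[ETLLog] (PackageName, ErrorDescription, LogDateTime)
                        VALUES (?, ?, GETDATE())
                    </DirectInput>
                    <Parameters>
                        <Parameter Name="0" VariableName="System.PackageName" DataType="String" Length="200" />
                        <Parameter Name="1" VariableName="System.ErrorDescription" DataType="String" Length="4000" />
                    </Parameters>
                </ExecuteSQL>
            </Tasks>
        </Event>
    </Events>
    <Tasks>
        <!-- Regular package tasks go here -->
    </Tasks>
</Package>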

See Biml Basics for SSIS for detailed SSIS patterns.

Azure Data Factory (ADF)

BimlFlex generates ADF pipelines using the same metadata-driven approach:

  • Linked Services: Connection definitions for Azure, on-premises, SaaS
  • Datasets: Source and target table definitions
  • Pipelines: Copy activities, data flows, orchestration
  • Integration Runtimes: Self-hosted for on-premises sources

BimlFlex maps your metadata to ADF JSON definitions, enabling consistent patterns across SSIS and ADF.

See BimlFlex ADF Documentation for implementation details.

Databricks

BimlFlex supports Databricks for modern data lakehouse architectures:

  • Notebooks: Generated Python/SQL for data transformations
  • Delta Lake: Managed tables with ACID transactions
  • Unity Catalog: Metadata and governance integration
  • Workflows: Orchestration of notebook jobs

See BimlFlex Databricks Documentation for patterns.

Snowflake

BimlFlex generates Snowflake-specific artifacts:

  • Stages: External and internal staging areas
  • Stored Procedures: Load logic in Snowflake SQL
  • Tasks: Scheduled execution
  • Streams: Change data capture

See BimlFlex Snowflake Documentation for implementation.

Microsoft Fabric

BimlFlex supports Microsoft Fabric for unified analytics:

  • Lakehouses: OneLake storage with Delta tables
  • Dataflows: Power Query-based transformations
  • Notebooks: Spark-based processing
  • Pipelines: Orchestration and scheduling

See BimlFlex Fabric Documentation for patterns.

CI/CD Integration

Version Control Best Practices

File organization:

/biml
    /environment   # Connection and config files
    /metadata      # Table import scripts
    /patterns      # Reusable pattern templates
    /packages      # Package generation scripts
    /output        # Generated artifacts (gitignore these)

Naming conventions:

  • Use tier prefixes: 1-Environment.biml, 2-Metadata.biml
  • Descriptive names: Load_Staging_Sales.biml
  • Separate concerns: one file per pattern type

Build Pipeline Example

azure-pipelines.yml
trigger:
- main

pool:
  vmImage: 'windows-latest'

steps:
- task: PowerShell@2
  displayName: 'Build Biml'
  inputs:
    targetType: 'inline'
    script: |
      # Build using BimlFlex CLI or BimlStudio command line
      & "C:\Program Files\Varigence\BimlStudio\Biml.exe" `
        /build /project:MyProject.mst /output:$(Build.ArtifactStagingDirectory)

- task: PublishBuildArtifacts@1
  displayName: 'Publish Artifacts'
  inputs:
    PathtoPublish: '$(Build.ArtifactStagingDirectory)'
    ArtifactName: 'BimlOutput'

Key CI/CD Benefits

  1. Reproducible builds: Same metadata produces identical packages
  2. Automated testing: Validate package structure before deployment (see the sketch after this list)
  3. Environment promotion: Dev → Test → Prod with config changes only
  4. Audit trail: Git history tracks all metadata and pattern changes
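
One lightweight form of automated testing is an assertion inside BimlScript itself. This sketch, assuming the audit-column convention from the staging example above, aborts the build if any table is missing its audit column:

<#@ import namespace="System.Linq" #>
<#
// Fail fast: every staging table must carry the audit columns
// before any packages are generated
foreach (var table in RootNode.Tables) {
    if (!table.Columns.Any(c => c.Name == "LoadDateTime")) {
        throw new InvalidOperationException(
            string.Format("Table {0} is missing the LoadDateTime audit column", table.Name));
    }
}
#>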

Automation Benefits Summary

| Metric | Manual Approach | Automated Approach |
|---|---|---|
| 100-table staging layer | 2-3 weeks | 1-2 hours |
| Schema changes | Update each package | Regenerate all |
| New audit requirement | Edit 100+ packages | Update 1 template |
| Error handling | Copy/paste per package | Single pattern |
| Consistency | Variable quality | Guaranteed |
| Documentation | Manual maintenance | Generated from metadata |

Next Steps