Automation and Patterns
This guide shows you how to automate ETL pipeline generation using metadata-driven development. You'll learn to replace manual package creation with reusable patterns that scale from 10 to 1,000+ tables.
Why Automation Matters for Data Engineers
Data engineers spend significant time on repetitive tasks: creating staging tables, building load packages, implementing error handling. Each task follows predictable patterns, yet traditional tools require manual implementation for every object.
The core insight: If you can describe the pattern, Biml can generate the implementation.
Think of it this way: SQL Server and SSIS automate data movement, but you still manually create packages. Biml automates package creation itself—reading your source schema and generating hundreds of packages in seconds.
Manual vs Automated: The Real Difference
| Task | Manual Approach | Automated Approach |
|---|---|---|
| Create 100 staging tables | Write 100 CREATE TABLE scripts | Generate all from source metadata |
| Build 100 load packages | Create each package in SSDT | Loop through tables, generate packages |
| Add audit columns to all tables | Edit each table definition | Add once in template, regenerate |
| Handle source schema change | Find and update affected packages | Regenerate all packages |
| Implement error handling | Copy/paste into each package | Define pattern once, apply everywhere |
Time comparison:
- 100-table staging layer, manual: 2-3 weeks
- 100-table staging layer, automated: 1-2 hours (including testing)
Core Automation Concepts
Metadata-Driven Development
Metadata-driven development separates what you want to build from how it gets built:
- Metadata describes your data structures (tables, columns, connections)
- Templates define the patterns (staging loads, error handling, logging)
- Output is generated automatically (SSIS packages, SQL scripts, pipelines)
When source systems change, update the metadata and regenerate. The templates remain unchanged.
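For example, a piece of metadata can be as simple as a hand-authored Biml table definition. A minimal sketch (the Customer table and its columns are placeholders, and the Staging.stg schema reference assumes the environment definitions shown later in this guide):
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Tables>
    <!-- Metadata only: describes what the table looks like, not how it is loaded -->
    <Table Name="Customer" SchemaName="Staging.stg">
      <Columns>
        <Column Name="CustomerID" DataType="Int32" />
        <Column Name="CustomerName" DataType="String" Length="100" />
      </Columns>
    </Table>
  </Tables>
</Biml>
The templates in the rest of this guide never mention Customer by name; they loop over whatever tables the metadata contains.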
The Biml Relational Hierarchy
Biml organizes database metadata in a hierarchy that mirrors your actual data infrastructure:
Connection
└── Database
    └── Schema
        └── Table
            └── Columns
You can import this hierarchy directly from your source systems:
<#@ import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>
<#
// Connect to source and import table definitions
var sourceConnection = RootNode.OleDbConnections["Source"];
var tables = sourceConnection.GetDatabaseSchema(
new List<string>{"Sales", "Production"}, // Schemas to include
null, // All tables
ImportOptions.ExcludeForeignKey | ImportOptions.ExcludeViews
).TableNodes;
#>
This single call retrieves all table and column metadata, ready for use in your templates.
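Because the result is an ordinary .NET collection, you can also shape it before generating anything. A minimal sketch, assuming you want to skip work tables whose names start with tmp (the prefix is purely an illustrative naming convention, not something Biml enforces):
<#@ import namespace="System.Linq" #>
<#
// Filter the imported metadata before it drives any generation.
// The "tmp" prefix filter is an assumption for this sketch.
var selectedTables = tables.Where(t => !t.Name.StartsWith("tmp")).ToList();
#>
The import directive belongs at the top of the file, next to the SchemaManagement import shown above.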
The Pattern Approach
A pattern is a reusable solution to a recurring problem. In data engineering:
- Staging pattern: Truncate target, load from source, add audit columns
- Incremental pattern: Track a high-water mark, load only changed records (see the sketch below)
- Error handling pattern: Redirect bad rows, log errors, continue processing
Patterns become BimlScript templates. Write the pattern once, apply it to every table:
<# foreach (var table in RootNode.Tables) { #>
<Package Name="Load_<#= table.Name #>" ConstraintMode="Linear">
<!-- Pattern implementation here -->
</Package>
<# } #>
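The incremental pattern from the list above fits the same skeleton. A minimal sketch for a single table, assuming the source table has a ModifiedDate column and using a package variable as the high-water mark (the table, column, and variable names are illustrative):
<Package Name="Incremental_Load_Customer" ConstraintMode="Linear">
  <Variables>
    <!-- Defaults far in the past so the first run loads everything -->
    <Variable Name="LastLoadDate" DataType="DateTime">1900-01-01</Variable>
  </Variables>
  <Tasks>
    <!-- Read the current high-water mark from the target -->
    <ExecuteSQL Name="GetHighWaterMark" ConnectionName="Target" ResultSet="SingleRow">
      <DirectInput>SELECT ISNULL(MAX(ModifiedDate), '1900-01-01') AS LastLoadDate FROM [stg].[Customer]</DirectInput>
      <Results>
        <Result Name="0" VariableName="User.LastLoadDate" />
      </Results>
    </ExecuteSQL>
    <!-- Load only rows changed since the last load -->
    <Dataflow Name="LoadChangedRows">
      <Transformations>
        <OleDbSource Name="Source" ConnectionName="Source">
          <DirectInput>SELECT * FROM [Sales].[Customer] WHERE ModifiedDate &gt; ?</DirectInput>
          <Parameters>
            <Parameter Name="0" VariableName="User.LastLoadDate" />
          </Parameters>
        </OleDbSource>
        <OleDbDestination Name="Target" ConnectionName="Target">
          <ExternalTableOutput Table="[stg].[Customer]" />
        </OleDbDestination>
      </Transformations>
    </Dataflow>
  </Tasks>
</Package>
In practice you would wrap this in the same foreach loop and drive the table, schema, and change-tracking column from metadata rather than hard-coding them.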
Practical Example: Automated Staging Layer
This example creates a complete staging layer: target tables and load packages for all source tables.
File Organization
Use a numbered naming convention that mirrors the template tiers controlling build order:
1-Environment.biml # Connections, databases (tier 10)
2-ImportMetadata.biml # Read source schema (tier 20)
3-DeployTables.biml # Generate DDL package (tier 30)
4-LoadPackages.biml # Generate load packages (tier 40)
Tier numbering tip: Use increments of 10 (10, 20, 30) rather than 1, 2, 3. This leaves room to insert files between existing tiers without renumbering everything.
Step 1: Define Environment
<#@ template tier="10" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="Source"
ConnectionString="Provider=SQLNCLI11;Server=SourceServer;Initial Catalog=AdventureWorks;Integrated Security=SSPI;" />
<OleDbConnection Name="Target"
ConnectionString="Provider=SQLNCLI11;Server=TargetServer;Initial Catalog=Staging;Integrated Security=SSPI;" />
</Connections>
<Databases>
<Database Name="Staging" ConnectionName="Target" />
</Databases>
<Schemas>
<Schema Name="stg" DatabaseName="Staging" />
</Schemas>
</Biml>
Step 2: Import Source Metadata
<#@ template tier="20" #>
<#@ import namespace="Varigence.Biml.CoreLowerer.SchemaManagement" #>
<#
var sourceConnection = RootNode.OleDbConnections["Source"];
var includedSchemas = new List<string>{"Sales", "Production", "Person"};
var tables = sourceConnection.GetDatabaseSchema(
includedSchemas,
null,
ImportOptions.ExcludeForeignKey |
ImportOptions.ExcludeColumnDefault |
ImportOptions.ExcludeViews
).TableNodes;
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Tables>
<# foreach (var table in tables) { #>
<Table Name="<#= table.Name #>" SchemaName="Staging.stg">
<Columns>
<# foreach (var column in table.Columns) { #>
<#= column.GetBiml() #>
<# } #>
<!-- Audit columns added to every table -->
<Column Name="LoadDateTime" DataType="DateTime2" IsNullable="false" />
<Column Name="SourceSystem" DataType="String" Length="50" IsNullable="false" />
</Columns>
<Annotations>
<Annotation Tag="SourceSchema"><#= table.Schema #></Annotation>
<Annotation Tag="SourceTable"><#= table.Name #></Annotation>
</Annotations>
</Table>
<# } #>
</Tables>
</Biml>
Step 3: Generate Table Deployment Package
<#@ template tier="30" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<Package Name="Deploy_StagingTables" ConstraintMode="Parallel">
<Tasks>
<# foreach (var table in RootNode.Tables) { #>
<ExecuteSQL Name="Create_<#= table.Name #>" ConnectionName="Target">
<DirectInput><#= table.GetDropAndCreateDdl() #></DirectInput>
</ExecuteSQL>
<# } #>
</Tasks>
</Package>
</Packages>
</Biml>
Step 4: Generate Load Packages
<#@ template tier="40" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<# foreach (var table in RootNode.Tables) {
var sourceSchema = table.GetTag("SourceSchema");
var sourceTable = table.GetTag("SourceTable");
#>
<Package Name="Load_<#= table.Name #>" ConstraintMode="Linear">
<Tasks>
<!-- Step 1: Truncate target -->
<ExecuteSQL Name="Truncate" ConnectionName="Target">
<DirectInput>TRUNCATE TABLE [stg].[<#= table.Name #>]</DirectInput>
</ExecuteSQL>
<!-- Step 2: Load data with audit columns -->
<Dataflow Name="LoadData">
<Transformations>
<OleDbSource Name="Source" ConnectionName="Source">
<DirectInput>SELECT * FROM [<#= sourceSchema #>].[<#= sourceTable #>]</DirectInput>
</OleDbSource>
<DerivedColumns Name="AddAuditColumns">
<Columns>
<Column Name="LoadDateTime" DataType="DateTime2" Length="8">
(DT_DBTIMESTAMP2,7)GETDATE()
</Column>
<Column Name="SourceSystem" DataType="String" Length="50">
"AdventureWorks"
</Column>
</Columns>
</DerivedColumns>
<OleDbDestination Name="Target" ConnectionName="Target">
<ExternalTableOutput Table="[stg].[<#= table.Name #>]" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
<# } #>
<!-- Master workflow to execute all loads -->
<Package Name="Master_LoadStaging" ConstraintMode="Parallel">
<Tasks>
<# foreach (var table in RootNode.Tables) { #>
<ExecutePackage Name="Exec_<#= table.Name #>">
<Package PackageName="Load_<#= table.Name #>" />
</ExecutePackage>
<# } #>
</Tasks>
</Package>
</Packages>
</Biml>
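The load pattern above is deliberately minimal. The error handling pattern mentioned earlier can be layered onto the same template with an OnError event handler that logs failures while the master workflow continues with the remaining tables. A minimal sketch, assuming an audit.PackageErrorLog table exists on the target (the log table and its columns are illustrative, not generated by this guide):
<!-- Added inside each Package element generated in Step 4 -->
<Events>
  <Event EventType="OnError" Name="LogError" ConstraintMode="Linear">
    <Tasks>
      <ExecuteSQL Name="WriteErrorRow" ConnectionName="Target">
        <DirectInput>
          INSERT INTO [audit].[PackageErrorLog] (PackageName, ErrorDescription, LoggedAt)
          VALUES (?, ?, GETDATE())
        </DirectInput>
        <Parameters>
          <!-- SSIS system variables available in the OnError scope -->
          <Parameter Name="0" VariableName="System.PackageName" DataType="String" Length="200" />
          <Parameter Name="1" VariableName="System.ErrorDescription" DataType="String" Length="4000" />
        </Parameters>
      </ExecuteSQL>
    </Tasks>
  </Event>
</Events>
Because the event handler lives in the template, every generated package gets identical error logging; changing the logging logic means editing one file and regenerating.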
Step 5: Build and Deploy
- First build: Select files 1-3 and generate the Deploy_StagingTables package
- Run the DDL package: creates all staging tables in the target database
- Second build: Select files 1-4 and generate the load packages
- Deploy and execute: run Master_LoadStaging to load every table
Platform-Specific Automation
SSIS (SQL Server Integration Services)
The examples above generate SSIS packages directly. Key capabilities:
- Data Flow Tasks: OleDbSource, transformations, OleDbDestination
- Control Flow: ExecuteSQL, ExecutePackage, containers
- Event Handlers: OnError, OnPreExecute for logging
- Variables and Parameters: Dynamic configuration
See Biml Basics for SSIS for detailed SSIS patterns.
Azure Data Factory (ADF)
BimlFlex generates ADF pipelines using the same metadata-driven approach:
- Linked Services: Connection definitions for Azure, on-premises, SaaS
- Datasets: Source and target table definitions
- Pipelines: Copy activities, data flows, orchestration
- Integration Runtimes: Self-hosted for on-premises sources
BimlFlex maps your metadata to ADF JSON definitions, enabling consistent patterns across SSIS and ADF.
See BimlFlex ADF Documentation for implementation details.
Databricks
BimlFlex supports Databricks for modern data lakehouse architectures:
- Notebooks: Generated Python/SQL for data transformations
- Delta Lake: Managed tables with ACID transactions
- Unity Catalog: Metadata and governance integration
- Workflows: Orchestration of notebook jobs
See BimlFlex Databricks Documentation for patterns.
Snowflake
BimlFlex generates Snowflake-specific artifacts:
- Stages: External and internal staging areas
- Stored Procedures: Load logic in Snowflake SQL
- Tasks: Scheduled execution
- Streams: Change data capture
See BimlFlex Snowflake Documentation for implementation.
Microsoft Fabric
BimlFlex supports Microsoft Fabric for unified analytics:
- Lakehouses: OneLake storage with Delta tables
- Dataflows: Power Query-based transformations
- Notebooks: Spark-based processing
- Pipelines: Orchestration and scheduling
See BimlFlex Fabric Documentation for patterns.
CI/CD Integration
Version Control Best Practices
File organization:
/biml
  /environment   # Connection and config files
  /metadata      # Table import scripts
  /patterns      # Reusable pattern templates
  /packages      # Package generation scripts
  /output        # Generated artifacts (gitignore these)
Naming conventions:
- Use tier prefixes: 1-Environment.biml, 2-Metadata.biml
- Descriptive names: Load_Staging_Sales.biml
- Separate concerns: one file per pattern type
Build Pipeline Example
trigger:
- main

pool:
  vmImage: 'windows-latest'

steps:
- task: PowerShell@2
  displayName: 'Build Biml'
  inputs:
    targetType: 'inline'
    script: |
      # Build using BimlFlex CLI or BimlStudio command line
      & "C:\Program Files\Varigence\BimlStudio\Biml.exe" `
        /build /project:MyProject.mst /output:$(Build.ArtifactStagingDirectory)

- task: PublishBuildArtifacts@1
  displayName: 'Publish Artifacts'
  inputs:
    PathtoPublish: '$(Build.ArtifactStagingDirectory)'
    ArtifactName: 'BimlOutput'
Key CI/CD Benefits
- Reproducible builds: Same metadata produces identical packages
- Automated testing: Validate package structure before deployment
- Environment promotion: Dev → Test → Prod with config changes only (see the sketch below)
- Audit trail: Git history tracks all metadata and pattern changes
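The environment promotion benefit typically comes from parameterizing the environment file rather than editing generated packages. A minimal sketch of a tier-10 connection template, assuming the build agent sets a BIML_TARGET_SERVER environment variable (the variable name and fallback server are illustrative):
<#@ template tier="10" #>
<#
// Resolve the target server at build time; fall back to a default for local builds.
// BIML_TARGET_SERVER is an assumed build-agent setting, not a Biml built-in.
var targetServer = Environment.GetEnvironmentVariable("BIML_TARGET_SERVER") ?? "TargetServer";
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OleDbConnection Name="Target"
      ConnectionString="Provider=SQLNCLI11;Server=<#= targetServer #>;Initial Catalog=Staging;Integrated Security=SSPI;" />
  </Connections>
</Biml>
Each pipeline stage (Dev, Test, Prod) sets its own value, and the same templates produce environment-specific packages.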
Automation Benefits Summary
| Metric | Manual Approach | Automated Approach |
|---|---|---|
| 100-table staging layer | 2-3 weeks | 1-2 hours |
| Schema changes | Update each package | Regenerate all |
| New audit requirement | Edit 100+ packages | Update 1 template |
| Error handling | Copy/paste per package | Single pattern |
| Consistency | Variable quality | Guaranteed |
| Documentation | Manual maintenance | Generated from metadata |
Next Steps
- Common Patterns - Ready-to-use patterns for staging, incremental loads, error handling
- Build Automated Staging Layer - Complete end-to-end tutorial
- Introduction to BimlScript - Deep dive into code nuggets and directives
- Troubleshooting Guide - Debug compilation errors
- BimlFlex - Enterprise platform with UI-driven metadata management