Building a Common ETL Framework with Biml
Why This Pattern Matters
ETL teams that maintain dozens or hundreds of SSIS packages often discover that copy and paste templates drift over time. A package built last quarter looks different from one built today, framework upgrades have to be applied by hand, and new staff have to learn the conventions through inspection rather than configuration. A metadata-driven Biml project keeps every package aligned with one canonical template. When the framework changes, the packages regenerate.
The Two Layers in Biml
Biml is an XML dialect that describes objects to generate: packages, connections, tasks, containers, data flows, and so on. The compiler reads the XML and emits the corresponding .dtsx files.
BimlScript layers on top, embedding C# (or VB) inside the XML with code nuggets. The nuggets read external metadata, loop, evaluate expressions, and emit further Biml. The result is a template language that turns a metadata table into a complete SSIS project.
A Minimal Static Package
The simplest Biml file declares a single package and is converted to a single .dtsx file. The package element names the package, contains connection references, package parameters, and a control flow built from tasks and containers. The skeleton looks roughly like this:
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<Package Name="LoadCustomerStage" ConstraintMode="Linear">
<Tasks>
<Container Name="MainControlFlow" ConstraintMode="Linear">
<Tasks>
<!-- Control flow tasks go here -->
</Tasks>
</Container>
</Tasks>
</Package>
</Packages>
</Biml>
Running this through Biml emits a working .dtsx package. The structure is static: the package name and the control flow are fixed.
Replacing Static Values with Metadata
The next step replaces the literal values with expressions driven by a metadata table. A 'pkg_definition' table might hold:
| Column | Description |
|---|---|
| PackageKey | Identity primary key |
| PackageName | Name to use for the generated package |
| ConstraintMode | Linear or Parallel |
| ProtectionLevel | The SSIS ProtectionLevel value |
A BimlScript loop reads the table and emits one package per row. The package name and protection level come from the row values rather than being hard coded:
<#@ template language="C#" hostspecific="true" tier="0" #>
<#@ import namespace="System.Data" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<# var configConn = (AstDbConnectionNode)RootNode.Connections["FrameworkConfig"]; #>
<Packages>
<#
var rows = ExternalDataAccess.GetDataTable(configConn.ConnectionString,
"SELECT PackageName, ConstraintMode, ProtectionLevel FROM dbo.pkg_definition WHERE Enabled = 1");
foreach (DataRow row in rows.Rows) {
#>
<Package
Name="<#=row["PackageName"]#>"
ConstraintMode="<#=row["ConstraintMode"]#>"
ProtectionLevel="<#=row["ProtectionLevel"]#>">
<Tasks>
<Container Name="MainControlFlow" ConstraintMode="Linear">
<Tasks>
<Expression Name="Placeholder" Expression="1" />
</Tasks>
</Container>
</Tasks>
</Package>
<# } #>
</Packages>
</Biml>
Adding a new package now means adding a row to the table and regenerating. The framework template is defined in one place.
Why a Framework Beats One-off Templates
The benefit of running every package through one template is consistency. When a framework change lands, for example a new audit task, a logging convention, or a notification step, the change goes into the template once and every generated package adopts it on the next build. Without generation, the same change would have to be edited into each package by hand, with the risk that some packages are missed or modified inconsistently.
The model also separates the framework (the template) from the specifics (the metadata). The framework is owned by the team that maintains conventions; the metadata is owned by the developers building individual loads. The two evolve independently.
What Comes Next
The next pattern to add is the data flow itself, driven from column-level mapping metadata between source and target. With column mappings in the table, the same template can generate the OLE DB Source, the data flow transformations, and the OLE DB Destination, completing the per-package automation.
Tools to Use
- BimlExpress is the free Visual Studio extension that compiles Biml and provides syntax highlighting and IntelliSense.
- BimlStudio is the standalone IDE with additional features including the metadata model and a built in compiler.
- BimlOnline is the free online editor that converts existing .dtsx packages to Biml, useful for reverse engineering a project before adopting a generation workflow.