Content Driven Extraction with Biml: Part 1
Why Drive ETL from Database Content
Defining solution objects in Biml works, but it ties the metadata to the SSDT representation. An alternative is to keep the ETL definition in a relational model. Stored procedures or views return rows that describe connections, source-to-target mappings, and column lists. A Biml script consumes those rows and emits packages. The benefits are language-agnostic editing, multiple front-end interfaces, and consistent generation of large package counts.
This is not a data-driven modeling pattern. It assumes the target model already exists and is concerned with managing how data flows into it.
The Sample Mapping
The example moves data from a source database into a staging area. Source and target column names do not match, so default name-based mapping cannot be relied on. Source and target table names also do not match, so the script must look up the correct names. A load_dt column on the target is populated by the ETL itself.
Three Metadata Procedures
The metadata environment uses three procedures.
- staging_connections returns connection name and connection definition pairs.
- staging_extractions returns extraction header rows: source connection, source schema, source table, target connection, target schema, target table, and an extraction_id.
- staging_column_mappings returns the column-level source-to-target mappings for a given extraction_id.
Building Connections in Tier 1
The first generator file uses the staging_connections procedure. The template tier directive forces this file to compile before later tier files. The script fills a DataTable, then iterates over the rows and emits one Connection per row.
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ template language="C#" tier="1" #>
<#
string metaConnString = @"Data Source=.\sql2019;Persist Security Info=true;Integrated Security=SSPI;Initial Catalog=RetailMeta";
DataTable connRows = new DataTable();
SqlDataAdapter connAdapter = new SqlDataAdapter("exec [dbo].[staging_connections];", metaConnString);
connAdapter.Fill(connRows);
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<#
foreach (DataRow row in connRows.Rows) { #>
<Connection Name="<#= row["connection_name"].ToString() #>" ConnectionString="<#= row["connection_definition"] #>" />
<# } #>
</Connections>
</Biml>
After the script runs, the intermediate Biml looks like this:
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<Connection Name="staging_db" ConnectionString="Provider=SQLNCLI11;Data Source=.\sql2019;Persist Security Info=False;Integrated Security=SSPI;Initial Catalog=RetailStage" />
<Connection Name="retail_source" ConnectionString="Provider=SQLNCLI11;Data Source=.\sql2019;Persist Security Info=False;Integrated Security=SSPI;Initial Catalog=RetailSource" />
</Connections>
</Biml>
Code Block Scope
The first code block fills the DataTable. Variables declared between code-block tags persist through the file, so connRows is available everywhere below. The second block opens a foreach loop with curly braces, then the loop body switches back to static Biml until the closing brace. Variables declared inside the C# scope, such as the loop variable row, are valid in any code-nugget block written between those braces. That mix of script-and-static is what binds the row's column values into the emitted Biml.
Emit Style: Mixed Or Pure Code
There are two ways to emit Biml. The mixed style above interleaves script tags inside the Connection element. The pure-code style writes the whole Connection element as a single string from C#.
<#
foreach (DataRow row in connRows.Rows) {
WriteLine("<Connection Name=\"" + row["connection_name"] + "\" ConnectionString=\"" + row["connection_definition"] + "\" />");
}
#>
Both styles work. The mixed style is usually easier to read because the static Biml stays recognizable.
Up Next
Part 2 builds the extraction package itself, with one container per extraction row, a truncate task, a dataflow, and explicit column mappings sourced from the third metadata procedure.