Content Driven Extraction with Biml: Part 1

Why Drive ETL from Database Content

Defining solution objects in Biml works, but it ties the metadata to the SSDT representation. An alternative is to keep the ETL definition in a relational model. Stored procedures or views return rows that describe connections, source-to-target mappings, and column lists. A Biml script consumes those rows and emits packages. The benefits are language-agnostic editing, multiple front-end interfaces, and consistent generation of large package counts.

This is not a data-driven modeling pattern. It assumes the target model already exists and is concerned with managing how data flows into it.

The Sample Mapping

The example moves data from a source database into a staging area. Source and target column names do not match, so default name-based mapping cannot be relied on. Source and target table names also do not match, so the script must look up the correct names. A load_dt column on the target is populated by the ETL itself.

Three Metadata Procedures

The metadata environment uses three procedures.

staging_connections returns connection name and connection definition pairs.
staging_extractions returns extraction header rows: source connection, source schema, source table, target connection, target schema, target table, and an extraction_id.
staging_column_mappings returns the column-level source-to-target mappings for a given extraction_id.

Building Connections in Tier 1

The first generator file uses the staging_connections procedure. The template tier directive forces this file to compile before later tier files. The script fills a DataTable, then iterates over the rows and emits one Connection per row.

<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ template language="C#" tier="1" #>
<#
  string metaConnString = @"Data Source=.\sql2019;Persist Security Info=true;Integrated Security=SSPI;Initial Catalog=RetailMeta";
  DataTable connRows = new DataTable();
  SqlDataAdapter connAdapter = new SqlDataAdapter("exec [dbo].[staging_connections];", metaConnString);
  connAdapter.Fill(connRows);
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <#
      foreach (DataRow row in connRows.Rows) { #>
      <Connection Name="<#= row["connection_name"].ToString() #>" ConnectionString="<#= row["connection_definition"] #>" />
    <# } #>
  </Connections>
</Biml>

After the script runs, the intermediate Biml looks like this:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <Connection Name="staging_db" ConnectionString="Provider=SQLNCLI11;Data Source=.\sql2019;Persist Security Info=False;Integrated Security=SSPI;Initial Catalog=RetailStage" />
    <Connection Name="retail_source" ConnectionString="Provider=SQLNCLI11;Data Source=.\sql2019;Persist Security Info=False;Integrated Security=SSPI;Initial Catalog=RetailSource" />
  </Connections>
</Biml>

Code Block Scope

The first code block fills the DataTable. Variables declared between code-block tags persist through the file, so connRows is available everywhere below. The second block opens a foreach loop with curly braces, then the loop body switches back to static Biml until the closing brace. Variables declared inside the C# scope, such as the loop variable row, are valid in any code-nugget block written between those braces. That mix of script-and-static is what binds the row's column values into the emitted Biml.

Emit Style: Mixed Or Pure Code

There are two ways to emit Biml. The mixed style above interleaves script tags inside the Connection element. The pure-code style writes the whole Connection element as a single string from C#.

<#
  foreach (DataRow row in connRows.Rows) {
    WriteLine("<Connection Name=\"" + row["connection_name"] + "\" ConnectionString=\"" + row["connection_definition"] + "\" />");
  }
#>

Both styles work. The mixed style is usually easier to read because the static Biml stays recognizable.

Up Next

Part 2 builds the extraction package itself, with one container per extraction row, a truncate task, a dataflow, and explicit column mappings sourced from the third metadata procedure.

Why Drive ETL from Database Content​

The Sample Mapping​

Three Metadata Procedures​

Building Connections in Tier 1​

Code Block Scope​

Emit Style: Mixed Or Pure Code​

Up Next​