Automating Lookups with Biml
Why Generate Lookup Packages
Building a dimension table is largely a denormalization exercise. Start with a base list of products. Add subcategory attributes by joining on a key. Add category attributes by joining on the next key up. Each step is a Lookup transform with the same shape: a query, a join column, and a list of return columns. When the same denormalization runs across many entities, the per-package work is repetitive. A metadata-driven Biml script reads the steps out of two tables and generates the packages.
A Header and Detail Metadata Pattern
The metadata sits in two tables. The header table lists each transform job (one job per package). The detail table lists the lookups for a job, returned by a stored procedure that takes the job name as input. The first column of every lookup SELECT is the join key, and the remaining columns are the values to attach.
| Step | SQL |
|---|---|
| Base list of products | SELECT ProductAlternateKey, EnglishProductName, ProductSubcategoryKey FROM dbo.DimProduct |
| Subcategory lookup | SELECT ProductSubcategoryKey, EnglishProductSubcategoryName, ProductCategoryKey FROM DimProductSubcategory |
| Category lookup | SELECT ProductCategoryKey, EnglishProductCategoryName FROM DimProductCategory |
A header row holds the job name, the source query, the connection, the target schema, and the target table. A detail row holds the lookup name, its connection, and its SELECT statement.
The Generator Script
The script reads the header rows, opens an outer foreach over jobs, builds a Package per job, and inside each Package emits a Dataflow with an OleDbSource, the chained Lookups, and a final OleDbDestination. A SQLCols helper extracts the column list from each lookup SELECT so the script can emit Inputs and Outputs in the right order.
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#
string srcConnString = @"Data Source=.\sql2019;Persist Security Info=true;Integrated Security=SSPI;Initial Catalog=RetailSource";
string metaConnString = @"Data Source=.\sql2019;Persist Security Info=true;Integrated Security=SSPI;Initial Catalog=RetailMeta";
DataTable jobs = new DataTable();
SqlDataAdapter jobAdapter = new SqlDataAdapter("exec dbo.transforms_entities", metaConnString);
jobAdapter.Fill(jobs);
#>
<#+
public string[] SQLCols(string SelectSql)
{
SelectSql = SelectSql.Substring(SelectSql.ToUpper().IndexOf("SELECT") + 6);
SelectSql = SelectSql.Substring(0, SelectSql.ToUpper().IndexOf("FROM "));
while (SelectSql.IndexOf(" ") != -1)
SelectSql = SelectSql.Replace(" ", "");
return SelectSql.Split(',');
}
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="retail_source" ConnectionString="Provider=SQLNCLI11;Data Source=.\sql2019;Persist Security Info=False;Integrated Security=SSPI;Initial Catalog=RetailSource" />
<OleDbConnection Name="retail_stage" ConnectionString="Provider=SQLNCLI11;Data Source=.\sql2019;Persist Security Info=False;Integrated Security=SSPI;Initial Catalog=RetailStage" />
</Connections>
<Packages>
<# foreach (DataRow job in jobs.Rows) { #>
<Package Name="DIM_<#= job["name"] #>" ConstraintMode="Linear">
<Tasks>
<Dataflow Name="DFT_build_<#= job["name"] #>">
<Transformations>
<OleDbSource Name="SRC_<#= job["name"] #>" ConnectionName="<#= job["connection"] #>">
<DirectInput>
<#= job["src_command"] #>
</DirectInput>
</OleDbSource>
<#
DataTable lookups = new DataTable();
SqlDataAdapter lookupAdapter = new SqlDataAdapter("exec [dbo].[transforms_lookups] @transforms_entity='" + job["name"] + "'", metaConnString);
lookupAdapter.Fill(lookups);
foreach (DataRow lookup in lookups.Rows) { #>
<Lookup Name="LK_<#= lookup["name"] #>" OleDbConnectionName="<#= lookup["connection"] #>" NoMatchBehavior="IgnoreFailure">
<DirectInput><#= lookup["src_cmd"] #></DirectInput>
<# string[] cols = SQLCols(lookup["src_cmd"].ToString()); #>
<Inputs>
<Column SourceColumn="<#= cols[0] #>" />
</Inputs>
<Outputs>
<# for (int i = 1; i < cols.Length; i++) { #>
<Column SourceColumn="<#= cols[i] #>" />
<# } #>
</Outputs>
</Lookup>
<# } #>
<OleDbDestination Name="TGT_<#= job["tgt_connection"] #>" ConnectionName="<#= job["tgt_connection"] #>">
<ExternalTableOutput Table="<#= job["tgt_table"] #>" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
<# } #>
</Packages>
</Biml>
How the Lookup Configuration Is Derived
The minimum a Lookup needs is a connection, a SELECT, an Inputs node that lists the join columns, and an Outputs node that lists the columns to attach. SQLCols deconstructs the SELECT statement to pull the column names out in order. By convention the first name is the join column. Everything after it becomes a return column.
job_list <- list of packages to create
foreach (job in job_list)
{
create a package (name = DIM_job_name)
add a dataflow to the package (name = DFT_build_job_name)
add the source to the data flow (job.src_command)
lookups <- results of detail proc (job.name)
foreach (lookup in lookups)
{
add a lookup
(name = LK_lookup.name, source = lookup.src_cmd && lookup.connection)
cols <- columns from lookup.src_cmd
Add Input (source column) = cols[0]
foreach (col after cols[0]) Add Output column = col
}
}
When the Pattern Pays Off
For a single package, the metadata setup is more work than dragging the lookups by hand. The benefit appears as soon as more than a handful of packages share the same pattern, when the same template is reused across consulting engagements, or when consistency across packages matters. Adding a new job becomes a row insert, not a package edit.