Skip to main content

Metadata Based SSIS with Biml, Part 1

Why a Metadata Layer

A simple Biml staging template hard codes a list of source tables in the file itself. That works for one or two tables. As the project grows, the template becomes a maintenance burden: every new table is another edit, every removed table is another cleanup. A small metadata layer reverses the relationship. The list of tables to load lives as rows in a database table. The Biml template reads the rows and emits one staging table and one load step for each row. Adding a table becomes a single INSERT.

This walkthrough covers a first cut at that layer with a single source database, a single target database, and one metadata table that holds nothing more than the source table names.

The Metadata Table

In the target database, define a small lookup table and seed it with the source tables that should be loaded:

CREATE TABLE dbo.LoadDriver_Tables (
TableName nvarchar(50) NULL
);

INSERT INTO dbo.LoadDriver_Tables
SELECT name
FROM SourceDb.sys.objects
WHERE type = 'U'
AND name LIKE 'Sales%';

The Biml project reads this table to decide which tables to import from the source.

A Helper Class for Reusable Logic

Most of the BimlScript pattern fits in a single helper class shared between the Biml files. It loads a row set into a generic list and exposes a default 'ImportOptions' value that excludes the parts of the schema the staging layer does not need:

using Varigence.Biml.CoreLowerer.SchemaManagement;
using Varigence.Biml.Extensions;
using Varigence.Languages.Biml;
using Varigence.Languages.Biml.Connection;
using System.Data;
using System.Collections.Generic;

public class LoadHelpers
{
public static string TargetDb = "WarehouseDb";

public static List<string> GetNonEmptyList(AstDbConnectionNode conn, string sqlOrTable)
{
var sql = sqlOrTable.Contains(" ") ? sqlOrTable : "SELECT * FROM " + sqlOrTable;
var dt = ExternalDataAccess.GetDataTable(conn.ConnectionString, sql);
var list = new List<string>();
foreach (DataRow row in dt.Rows)
{
list.Add(row[0].ToString());
}
if (list.Count == 0) list.Add("NONEMPTYFILLER");
return list;
}

public static ImportOptions DefaultImportOptions()
{
return ImportOptions.ExcludeIdentity
| ImportOptions.ExcludePrimaryKey
| ImportOptions.ExcludeUniqueKey
| ImportOptions.ExcludeColumnDefault
| ImportOptions.ExcludeIndex
| ImportOptions.ExcludeCheckConstraint
| ImportOptions.ExcludeForeignKey;
}
}

The 'NONEMPTYFILLER' guard prevents 'GetDatabaseSchema' from interpreting an empty list as "import every table." If no rows come back, a sentinel name keeps the filter active.

The Environment File

The connections file is plain Biml that points at the source and target databases. The target database needs both a Connection and a Database/Schema definition because later files build new tables under it:

<#@ template language="C#" tier="1" #>
<#@ code file="../code/LoadHelpers.cs" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="WarehouseConn"
ConnectionString="Provider=SQLNCLI11;Server=.;Initial Catalog=<#=LoadHelpers.TargetDb#>;Integrated Security=SSPI;" />
<OleDbConnection Name="SourceConn"
ConnectionString="Provider=SQLNCLI11;Server=.;Initial Catalog=SourceDb;Integrated Security=SSPI;" />
</Connections>
<Databases>
<Database Name="<#=LoadHelpers.TargetDb#>" ConnectionName="WarehouseConn" />
</Databases>
<Schemas>
<Schema Name="dbo" DatabaseName="<#=LoadHelpers.TargetDb#>" />
</Schemas>
</Biml>

Building Tables From Metadata

The next file does the real work. It reads the source table list from 'LoadDriver_Tables', uses 'GetDatabaseSchema' on the source connection to pull the column shape for those tables, and emits a Biml 'Table' for each one. The SELECT statement that the load package will run is stored as an annotation tagged 'SqlSelect':

<#@ template language="C#" tier="2" #>
<#@ code file="../code/LoadHelpers.cs" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Tables>
<#
var srcConn = (AstDbConnectionNode)RootNode.Connections["SourceConn"];
var tgtConn = (AstDbConnectionNode)RootNode.Connections["WarehouseConn"];
var tableList = LoadHelpers.GetNonEmptyList(tgtConn, "LoadDriver_Tables");
var importResult = srcConn.GetDatabaseSchema(null, tableList, LoadHelpers.DefaultImportOptions());

foreach (var srcTable in importResult.TableNodes) {
#>
<Table Name="Stg_<#=srcTable.Schema.Name#>_<#=srcTable.Name#>"
SchemaName="<#=LoadHelpers.TargetDb#>.dbo">
<Columns>
<#=srcTable.Columns.GetBiml()#>
</Columns>
<Annotations>
<Annotation AnnotationType="Tag" Tag="SqlSelect">
SELECT <#=srcTable.GetColumnList()#> FROM <#=srcTable.SchemaQualifiedName#>
</Annotation>
</Annotations>
</Table>
<# } #>
</Tables>
</Biml>

The annotation lets the next file recover the SELECT without re-running 'GetDatabaseSchema'.

Creating the Staging Tables

The third file generates one Execute SQL task per Biml table. Each task issues the DROP and CREATE for that table:

<#@ template language="C#" tier="3" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<Package Name="01_CreateStaging" ConstraintMode="Parallel"
ProtectionLevel="DontSaveSensitive">
<Tasks>
<# foreach (var stgTable in RootNode.Tables) { #>
<ExecuteSQL Name="Create <#=stgTable.Name#>" ConnectionName="WarehouseConn">
<DirectInput><#=stgTable.GetDropAndCreateDdl()#></DirectInput>
</ExecuteSQL>
<# } #>
</Tasks>
</Package>
</Packages>
</Biml>

Run this package before the load package so every staging target exists.

Populating the Staging Tables

The fourth file builds the load package. It loops over the same Biml tables, wraps each one in a Sequence Container, truncates the target, and runs a dataflow. The OleDbSource pulls the SELECT from the table's 'SqlSelect' annotation:

<#@ template language="C#" tier="4" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<Package Name="02_PopulateStaging" ConstraintMode="Parallel"
ProtectionLevel="DontSaveSensitive">
<Tasks>
<# foreach (var stgTable in RootNode.Tables) { #>
<Container Name="Transfer <#=stgTable.Schema.Name#>_<#=stgTable.Name#>"
ConstraintMode="Linear">
<Tasks>
<ExecuteSQL Name="Truncate" ConnectionName="WarehouseConn">
<DirectInput>TRUNCATE TABLE <#=stgTable.ScopedName#></DirectInput>
</ExecuteSQL>
<Dataflow Name="Copy <#=stgTable.Name#>">
<Transformations>
<OleDbSource Name="GetRows" ConnectionName="SourceConn">
<DirectInput><#=stgTable.GetTag("SqlSelect")#></DirectInput>
</OleDbSource>
<OleDbDestination Name="SetRows" ConnectionName="WarehouseConn">
<TableOutput TableName="<#=stgTable.ScopedName#>" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Container>
<# } #>
</Tasks>
</Package>
</Packages>
</Biml>

A single execution of this package fills every staging table with rows from the source.

Where the Pattern Goes Next

This first cut treats every table the same way: full reload from a single source with no column filtering. Adding a row to 'LoadDriver_Tables' is the only change needed to bring a new table into the load. The next part extends the metadata so each entry can name its source connection and a column list, lifting both restrictions.