Metadata-Driven Data Flow Transformations With Biml
Why Drive Transformations From Metadata
Most ETL data flows do more than copy rows from a source to a destination. They derive new columns, look values up against dimension tables, exclude nulls, and aggregate. These shapes recur across many data flows. Storing the configuration of each transformation in a small set of metadata tables and generating the data flow from that metadata keeps the dozens of similar packages consistent and lets new transformations join the catalog without writing more Biml.
Tables That Describe a Transformation
A transformation catalog needs a few tables. A header table identifies each transformation by a short name and ties it back to a data flow. A detail table describes the columns inside the transformation. Lookup transformations need an extra row that captures the source query for the lookup table.
The script below targets a database named MetadataModel that contains tables and stored procedures for two transformation types: derived column and lookup.
USE MetadataModel;
GO
CREATE TABLE dbo.DerivedColumnHeader
(
DerivedHeaderId INT IDENTITY(1,1) PRIMARY KEY,
TransformationShortName NVARCHAR(50) NOT NULL,
TransformationComment NVARCHAR(200) NULL
);
CREATE TABLE dbo.DerivedColumnDetail
(
DerivedHeaderId INT NOT NULL,
DerivedDetailId INT NOT NULL PRIMARY KEY,
DerivedColumnName NVARCHAR(250) NOT NULL,
DerivedColumnType NVARCHAR(250) NOT NULL,
Expression NVARCHAR(200) NOT NULL,
DataType NVARCHAR(50) NOT NULL,
DataLength NVARCHAR(50) NOT NULL,
ReplaceExisting NVARCHAR(50) NULL,
CONSTRAINT FK_DerivedDetail_Header FOREIGN KEY (DerivedHeaderId)
REFERENCES dbo.DerivedColumnHeader(DerivedHeaderId)
);
CREATE TABLE dbo.LookupHeader
(
LookupHeaderId INT IDENTITY(1,1) PRIMARY KEY,
TransformationShortName NVARCHAR(250) NOT NULL,
SourceDataPackageId INT NOT NULL,
NoMatchConfiguration NVARCHAR(250) NOT NULL,
SourceQuery NVARCHAR(4000) NOT NULL
);
CREATE TABLE dbo.LookupDetail
(
LookupDetailId INT NOT NULL PRIMARY KEY,
LookupHeaderId INT NOT NULL,
ColumnName NVARCHAR(50) NOT NULL,
DataType NVARCHAR(50) NOT NULL,
DataLength NVARCHAR(50) NOT NULL,
ColumnDirection NCHAR(10) NOT NULL -- 'IN' or 'OUT'
);
Stored Procedures That Feed the Generator
Three stored procedures expose the metadata to the BimlScript driver. One returns the derived column transformation header and detail joined together. Two more return the lookup header information and the input and output column lists for each lookup.
CREATE PROCEDURE dbo.ListDerivedTransformations
AS
SELECT h.TransformationShortName,
d.DerivedColumnName,
d.Expression,
d.DataType,
d.DataLength,
d.ReplaceExisting
FROM dbo.DerivedColumnHeader h
JOIN dbo.DerivedColumnDetail d
ON h.DerivedHeaderId = d.DerivedHeaderId;
GO
CREATE PROCEDURE dbo.ListLookupHeaderForCustomer
AS
SELECT h.TransformationShortName,
LTRIM(RTRIM(p.Name)) AS ConnectionName,
h.SourceQuery
FROM dbo.LookupHeader h
JOIN dbo.DataPackage p
ON p.DataPackageId = h.SourceDataPackageId;
GO
CREATE PROCEDURE dbo.ListLookupColumns
@ColumnDirection NCHAR(10)
AS
SELECT h.TransformationShortName,
d.ColumnName
FROM dbo.LookupHeader h
JOIN dbo.LookupDetail d
ON h.LookupHeaderId = d.LookupHeaderId
WHERE d.ColumnDirection = @ColumnDirection;
GO
Sample Configuration Row
A single derived column row that uppercases a Name column looks like this:
INSERT INTO dbo.DerivedColumnHeader (TransformationShortName, TransformationComment)
VALUES ('DC_NameToUpper', 'Convert customer name to upper case');
INSERT INTO dbo.DerivedColumnDetail
(DerivedHeaderId, DerivedDetailId, DerivedColumnName, DerivedColumnType,
Expression, DataType, DataLength, ReplaceExisting)
VALUES
(1, 1, 'Name', 'DER', 'UPPER(Name)', 'String', '100', 'true');
A lookup that resolves a phone number for a customer name:
INSERT INTO dbo.LookupHeader
(LookupHeaderId, TransformationShortName, SourceDataPackageId, NoMatchConfiguration, SourceQuery)
VALUES
(1, 'LK_PhoneByName', 1, 'NA', 'SELECT Name, PhoneNumber FROM dbo.CustomerLookup');
INSERT INTO dbo.LookupDetail
(LookupDetailId, LookupHeaderId, ColumnName, DataType, DataLength, ColumnDirection)
VALUES
(1, 1, 'Name', 'String', '250', 'IN'),
(2, 1, 'PhoneNumber', 'String', '250', 'OUT');
Generating a Derived Column Inside a Data Flow
The driver opens a connection to MetadataModel, loads the derived column row into a DataTable, and emits a Data Flow with an OLE DB source, a DerivedColumns transformation, and an OLE DB destination. The values flow from the metadata row into the Biml attributes:
<#@ template language="C#" tier="1" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<#
string metaConnString = @"Data Source=localhost\BISERVER;Initial Catalog=MetadataModel;Integrated Security=SSPI;";
DataTable connRows = new DataTable();
using (var sda = new SqlDataAdapter("EXEC dbo.ListConnectionDetails", metaConnString))
sda.Fill(connRows);
DataRow connRow = connRows.Rows[0];
DataTable mappingRows = new DataTable();
using (var sda = new SqlDataAdapter("EXEC dbo.ListCustomerCopyMapping", metaConnString))
sda.Fill(mappingRows);
DataRow mappingRow = mappingRows.Rows[0];
DataTable transformRows = new DataTable();
using (var sda = new SqlDataAdapter("EXEC dbo.ListDerivedTransformations", metaConnString))
sda.Fill(transformRows);
DataRow transformRow = transformRows.Rows[0];
#>
<Connection Name="<#=connRow["ConnectionName"]#>"
ConnectionString="<#=connRow["ConnectionString"]#>" />
</Connections>
<Packages>
<Package Name="LoadCustomerCopy" ConstraintMode="Linear">
<Tasks>
<ExecuteSQL Name="Truncate Target" ConnectionName="<#=connRow["ConnectionName"]#>">
<DirectInput>TRUNCATE TABLE dbo.CustomerCopy</DirectInput>
</ExecuteSQL>
<Dataflow Name="DFT Copy Customer Data">
<Transformations>
<OleDbSource Name="Source" ConnectionName="<#=mappingRow["ConnectionName"]#>">
<DirectInput>SELECT * FROM <#=mappingRow["SourceTableName"]#></DirectInput>
</OleDbSource>
<DerivedColumns Name="<#=transformRow["TransformationShortName"]#>">
<Columns>
<Column ReplaceExisting="<#=transformRow["ReplaceExisting"]#>"
Name="<#=transformRow["DerivedColumnName"]#>"
DataType="<#=transformRow["DataType"]#>"
Length="<#=transformRow["DataLength"]#>">
<#=transformRow["Expression"]#>
</Column>
</Columns>
</DerivedColumns>
<OleDbDestination Name="Target" ConnectionName="<#=mappingRow["ConnectionName"]#>">
<ExternalTableOutput Table="<#=mappingRow["DestinationTableName"]#>" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
Generating a Lookup Inside the Same Data Flow
A Lookup component needs three pieces of metadata: the header row that carries the source query and connection, the input column list, and the output column list. The driver loads each into its own DataTable and emits a Lookup with Inputs and Outputs columns:
<#@ template language="C#" tier="1" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<#
string metaConnString = @"Data Source=localhost\BISERVER;Initial Catalog=MetadataModel;Integrated Security=SSPI;";
DataTable connRows = new DataTable();
using (var sda = new SqlDataAdapter("EXEC dbo.ListConnectionDetails", metaConnString))
sda.Fill(connRows);
DataRow connRow = connRows.Rows[0];
DataTable mappingRows = new DataTable();
using (var sda = new SqlDataAdapter("EXEC dbo.ListCustomerCopyMapping", metaConnString))
sda.Fill(mappingRows);
DataRow mappingRow = mappingRows.Rows[0];
DataTable lookupHeader = new DataTable();
using (var sda = new SqlDataAdapter("EXEC dbo.ListLookupHeaderForCustomer", metaConnString))
sda.Fill(lookupHeader);
DataRow lookupRow = lookupHeader.Rows[0];
DataTable lookupIn = new DataTable();
using (var sda = new SqlDataAdapter("EXEC dbo.ListLookupColumns 'IN'", metaConnString))
sda.Fill(lookupIn);
DataRow inputRow = lookupIn.Rows[0];
DataTable lookupOut = new DataTable();
using (var sda = new SqlDataAdapter("EXEC dbo.ListLookupColumns 'OUT'", metaConnString))
sda.Fill(lookupOut);
DataRow outputRow = lookupOut.Rows[0];
#>
<Connection Name="<#=connRow["ConnectionName"]#>"
ConnectionString="<#=connRow["ConnectionString"]#>" />
</Connections>
<Packages>
<Package Name="LoadCustomerWithLookup" ConstraintMode="Linear">
<Tasks>
<ExecuteSQL Name="Truncate Target" ConnectionName="<#=connRow["ConnectionName"]#>">
<DirectInput>TRUNCATE TABLE dbo.CustomerCopy</DirectInput>
</ExecuteSQL>
<Dataflow Name="DFT Copy Customer Data">
<Transformations>
<OleDbSource Name="Source" ConnectionName="<#=mappingRow["ConnectionName"]#>">
<DirectInput>SELECT * FROM <#=mappingRow["SourceTableName"]#></DirectInput>
</OleDbSource>
<Lookup Name="<#=lookupRow["TransformationShortName"]#>"
OleDbConnectionName="<#=lookupRow["ConnectionName"]#>">
<DirectInput><#=lookupRow["SourceQuery"]#></DirectInput>
<Inputs>
<Column SourceColumn="<#=inputRow["ColumnName"]#>" />
</Inputs>
<Outputs>
<Column SourceColumn="<#=outputRow["ColumnName"]#>" />
</Outputs>
</Lookup>
<OleDbDestination Name="Target" ConnectionName="<#=mappingRow["ConnectionName"]#>">
<ExternalTableOutput Table="<#=mappingRow["DestinationTableName"]#>" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
Extending to Multiple Transformations Per Data Flow
The single-row examples above are easier to read than the production version. A real data flow typically needs several transformations chained in order. A TransformationStep table that ties each transformation to a data flow and a sequence column is the bridge. The driver script reads all transformations for the current data flow ordered by sequence and switches on TransformationType to decide which template block to emit. The catalog grows naturally to cover Union All, Merge Join, Sort, Aggregate, and any other transformation that has a stable shape.
The benefit compounds. Once the metadata model knows how to describe each transformation type, every new data flow becomes an INSERT into the catalog rather than a new Biml file.