Biml for a Type 1 Slowly Changing Dimension
Why This Pattern Matters
A Type 1 slowly changing dimension overwrites every non key attribute with the latest source value. The hard part is not the SQL but keeping every dimension load consistent across an entire warehouse project. A BimlScript template captures the pattern once and emits a fully wired SSIS package per dimension, which keeps audit logging, change detection, and update behavior identical from one dimension to the next.
This walkthrough builds the template against a small example dimension that holds a list of sales reasons.
Required Database Objects
The pattern needs a staging table populated by an upstream copy step, the dimension table itself, and a small staging update table for changed rows.
CREATE TABLE [Staging].[SalesReason](
[SalesReasonID] int NOT NULL,
[Name] nvarchar(50) NOT NULL,
[ReasonType] nvarchar(50) NOT NULL,
[ModifiedDate] datetime NOT NULL
);
CREATE TABLE [dbo].[DimSalesReason](
[SalesReasonKey] int IDENTITY(1,1) NOT NULL PRIMARY KEY,
[SalesReasonID] int NOT NULL,
[SalesReasonName] nvarchar(50) NOT NULL,
[SalesReasonType] nvarchar(50) NOT NULL,
[HistoricalHashKey] varbinary(20) NOT NULL,
[ChangeHashKey] varbinary(20) NOT NULL,
[InsertDttm] datetime NOT NULL,
[UpdateDttm] datetime NULL
);
CREATE TABLE [Updt].[UpdtSalesReason](
[SalesReasonID] int NOT NULL,
[SalesReasonName] nvarchar(50) NOT NULL,
[SalesReasonType] nvarchar(50) NOT NULL,
[HistoricalHashKey] varbinary(20) NOT NULL,
[ChangeHashKey] varbinary(20) NOT NULL,
[InsertDttm] datetime NOT NULL
);
The 'HistoricalHashKey' represents the business key that defines uniqueness in the dimension. The 'ChangeHashKey' represents every other tracked column. Comparing two short hash columns is faster than comparing every attribute, and the same shape works for narrow and wide dimensions.
Source View
The source view does the joins, the type conversions, the unknown member injection, and the hash calculations. The data flow then only handles change detection and routing.
CREATE VIEW [Staging].[DimSalesReason] AS
WITH SalesReasonData AS (
SELECT
SalesReasonID,
Name AS SalesReasonName,
ReasonType AS SalesReasonType,
CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(SalesReasonID, ' '))) AS HistoricalHashKey,
CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(Name, ' ', ReasonType))) AS ChangeHashKey,
CURRENT_TIMESTAMP AS InsertDttm,
CURRENT_TIMESTAMP AS UpdtDttm
FROM [Staging].[SalesReason]
UNION
SELECT
-1, 'Unknown', 'Unknown',
CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(-1, ' '))),
CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT('Unknown', ' ', 'Unknown'))),
CURRENT_TIMESTAMP, CURRENT_TIMESTAMP
)
SELECT
SalesReasonID, SalesReasonName, SalesReasonType,
HistoricalHashKey, ChangeHashKey,
CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII,
CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII,
InsertDttm, UpdtDttm
FROM SalesReasonData;
The unknown member uses business key '-1' so the surrogate key can be an identity value. Fact loads look up the unknown row by business key, which means there is no need to disable identity insert during deployment.
The Reusable Template
A 'Dim1.biml' template carries the pattern. Code nuggets receive the package name, the destination table, the source query, and the update statement. Tier 2 ensures the project connections from a tier 1 file are loaded first.
<#@ template language="C#" tier="2" #>
<#@ property name="PackageName" type="String" #>
<#@ property name="DstSchemaName" type="String" #>
<#@ property name="DstTableName" type="String" #>
<#@ property name="DstConnection" type="String" #>
<#@ property name="DataFlowSourceName" type="String" #>
<#@ property name="SrcConnection" type="String" #>
<#@ property name="SourceQuery" type="String" #>
<#@ property name="UpdateSchemaName" type="String" #>
<#@ property name="UpdateTableName" type="String" #>
<#@ property name="UpdateConnection" type="String" #>
<#@ property name="UpdateSQLStatement" type="String" #>
<Package Name="<#=PackageName#>" Language="None">
<Variables>
<Variable DataType="Int32" Name="RowCountChanged">0</Variable>
<Variable DataType="Int32" Name="RowCountNew">0</Variable>
<Variable DataType="Int32" Name="RowCountSource">0</Variable>
<Variable DataType="Int32" Name="RowCountUnchanged">0</Variable>
</Variables>
<Tasks>
<ExecuteSQL Name="SQL Truncate <#=UpdateTableName#>"
ConnectionName="<#=UpdateConnection#>">
<DirectInput>TRUNCATE TABLE [<#=UpdateSchemaName#>].[<#=UpdateTableName#>]</DirectInput>
</ExecuteSQL>
<Dataflow Name="DFT Insert<#=DstTableName#>">
<Transformations>
<OleDbSource Name="<#=DataFlowSourceName#>" ConnectionName="<#=SrcConnection#>">
<DirectInput><#=SourceQuery#></DirectInput>
</OleDbSource>
<RowCount Name="CNT Source_Rows" VariableName="User.RowCountSource">
<InputPath OutputPathName="<#=DataFlowSourceName#>.Output" />
</RowCount>
<Lookup Name="LKP Historical Key" NoMatchBehavior="RedirectRowsToNoMatchOutput"
OleDbConnectionName="<#=DstConnection#>">
<DirectInput>
SELECT
CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII,
CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII
FROM <#=DstSchemaName#>.<#=DstTableName#>
</DirectInput>
<Inputs>
<Column SourceColumn="HistoricalHashKeyASCII" TargetColumn="HistoricalHashKeyASCII" />
</Inputs>
<Outputs>
<Column SourceColumn="ChangeHashKeyASCII" TargetColumn="lkp_ChangeHashKeyASCII" />
</Outputs>
<InputPath OutputPathName="CNT Source_Rows.Output" />
</Lookup>
<ConditionalSplit Name="CSPL Check For Changes">
<InputPath OutputPathName="LKP Historical Key.Match" />
<OutputPaths>
<OutputPath Name="ChangedRows">
<Expression>ChangeHashKeyASCII != lkp_ChangeHashKeyASCII</Expression>
</OutputPath>
</OutputPaths>
</ConditionalSplit>
<RowCount Name="CNT Changed_Rows" VariableName="User.RowCountChanged">
<InputPath OutputPathName="CSPL Check For Changes.ChangedRows" />
</RowCount>
<OleDbDestination Name="OLE_DST Update Table" ConnectionName="<#=DstConnection#>">
<InputPath OutputPathName="CNT Changed_Rows.Output" />
<ExternalTableOutput Table="[<#=UpdateSchemaName#>].[<#=UpdateTableName#>]" />
</OleDbDestination>
<RowCount Name="CNT New_Rows" VariableName="User.RowCountNew">
<InputPath OutputPathName="LKP Historical Key.NoMatch" />
</RowCount>
<OleDbDestination Name="OLE_DST New_Rows" ConnectionName="<#=DstConnection#>">
<InputPath OutputPathName="CNT New_Rows.Output" />
<ExternalTableOutput Table="<#=DstSchemaName#>.<#=DstTableName#>" />
</OleDbDestination>
<RowCount Name="CNT Unchanged_Rows" VariableName="User.RowCountUnchanged">
<InputPath OutputPathName="CSPL Check For Changes.Default" />
</RowCount>
</Transformations>
<PrecedenceConstraints>
<Inputs>
<Input OutputPathName="SQL Truncate <#=UpdateTableName#>.Output" />
</Inputs>
</PrecedenceConstraints>
</Dataflow>
<ExecuteSQL Name="SQL Update <#=DstTableName#>" ConnectionName="<#=DstConnection#>">
<DirectInput><#=UpdateSQLStatement#></DirectInput>
<PrecedenceConstraints>
<Inputs>
<Input OutputPathName="DFT Insert<#=DstTableName#>.Output" />
</Inputs>
</PrecedenceConstraints>
</ExecuteSQL>
</Tasks>
</Package>
The Driver Script
A second Biml file passes dimension specific values into the template. In a real project the driver loops over rows in a metadata table so a single execution emits every dimension package.
<#@ template language="C#" hostspecific="true" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#@ include file="ProjectConnection.biml" #>
<#
string PackageName = "LoadDimSalesReason";
string DstSchemaName = "dbo";
string DstTableName = "DimSalesReason";
string DstConnection = "WarehouseConn";
string DataFlowSourceName = "OLE_SRC StgDimSalesReason";
string SrcConnection = "WarehouseConn";
string SourceQuery = @"SELECT SalesReasonID, SalesReasonName, SalesReasonType,
HistoricalHashKey, ChangeHashKey,
HistoricalHashKeyASCII, ChangeHashKeyASCII,
InsertDttm, UpdtDttm
FROM [Staging].[DimSalesReason]";
string UpdateSchemaName = "Updt";
string UpdateTableName = "UpdtSalesReason";
string UpdateConnection = "WarehouseConn";
string UpdateSQLStatement = @"UPDATE d
SET d.SalesReasonName = u.SalesReasonName,
d.SalesReasonType = u.SalesReasonType,
d.ChangeHashKey = u.ChangeHashKey
FROM Updt.UpdtSalesReason u
INNER JOIN dbo.DimSalesReason d
ON u.HistoricalHashKey = d.HistoricalHashKey;";
#>
<Packages>
<#=CallBimlScript("Dim1.biml", PackageName, DstSchemaName, DstTableName, DstConnection,
DataFlowSourceName, SrcConnection, SourceQuery, UpdateSchemaName, UpdateTableName,
UpdateConnection, UpdateSQLStatement)#>
</Packages>
</Biml>
What the Generated Package Does
The control flow truncates the update table, runs the data flow, and then runs a single set based UPDATE that joins the update table back to the dimension on 'HistoricalHashKey'. Inside the data flow:
- Source rows arrive from the staging view.
- A lookup against the dimension on 'HistoricalHashKeyASCII' sends no match rows directly to an insert into the dimension table.
- Match rows pass through a conditional split that compares 'ChangeHashKeyASCII' with the lookup output. Equal rows are counted and discarded; unequal rows go to the update table.
The result is a Type 1 dimension load that grows nothing more than the metadata table as new dimensions are added.