Skip to main content

Type 2 Slowly Changing Dimension with Biml

Why This Pattern Matters

Most warehouses contain at least a few Type 2 slowly changing dimensions. The built in SSIS Slowly Changing Dimension Transformation often performs poorly because it issues row by row OLE DB updates. A staging table plus a single set based UPDATE removes that bottleneck and gives a consistent pattern that does not need to change as the dimension grows.

This walkthrough captures the design as a BimlScript template that any number of dimensions can reuse.

Control Flow

The package follows the same shape as a Type 1 load:

  1. Log the start of the package run.
  2. Truncate the staging update table.
  3. Run the data flow that inserts new rows directly and writes change rows to the update table.
  4. Run a single Execute SQL Task that expires existing rows and inserts new versions.
  5. Log the end of the package run.

Data Flow Logic

Business logic lives in a database view. SSIS handles the change detection mechanics. The data flow:

  1. Reads the source view.
  2. Counts source rows for audit logging.
  3. Looks up 'HistoricalHashKeyASCII' in the dimension table to bring back 'lkp_ChangeHashKeyASCII'.
  4. Sends no-match rows through a derived column that sets 'RowStartDate' to the warehouse epoch and inserts them as new dimension rows.
  5. Sends match rows through a conditional split that fires when the change hash differs, and writes those rows to a staging update table.
  6. Counts unchanged rows for audit purposes.

Source View with Hash Keys

The source view exposes the natural key as 'HistoricalHashKey' and the change tracked columns as 'ChangeHashKey'. The view also adds a synthetic unknown row and the row metadata columns the data flow needs:

CREATE VIEW [dbo].[StgItem]
AS
WITH ItemBase AS (
SELECT
ItemId, ItemName, ItemNumber, IsManufactured, IsFinishedGood,
ColorName, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice,
SizeText, SizeUom, WeightUom, WeightValue, DaysToManufacture,
ProductLine, ItemClass, ItemStyle, SellStartDate, SellEndDate,
DiscontinuedDate, 1 AS RowIsCurrent, 0 AS IsDeleted
FROM Staging.Item
UNION
SELECT -1, 'unknown', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, 1, 0
),
ItemData AS (
SELECT
ItemId, ItemName, ItemNumber, IsManufactured, IsFinishedGood,
ColorName, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice,
SizeText, SizeUom, WeightUom, WeightValue, DaysToManufacture,
ProductLine, ItemClass, ItemStyle, SellStartDate, SellEndDate,
DiscontinuedDate, RowIsCurrent, IsDeleted,
CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(ItemId, ' '))) AS HistoricalHashKey,
CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(ItemName, ItemNumber,
IsManufactured, IsFinishedGood, ColorName, SafetyStockLevel, ReorderPoint,
StandardCost, ListPrice, SizeText, SizeUom, WeightUom, WeightValue,
DaysToManufacture, ProductLine, ItemClass, ItemStyle, SellStartDate,
SellEndDate, DiscontinuedDate, RowIsCurrent, IsDeleted))) AS ChangeHashKey
FROM ItemBase
)
SELECT
ItemId, ItemName, ItemNumber, IsManufactured, IsFinishedGood,
ColorName, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice,
SizeText, SizeUom, WeightUom, WeightValue, DaysToManufacture,
ProductLine, ItemClass, ItemStyle, SellStartDate, SellEndDate,
DiscontinuedDate, RowIsCurrent, IsDeleted,
HistoricalHashKey, ChangeHashKey,
CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII,
CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII,
DATEADD(MINUTE, -1, CURRENT_TIMESTAMP) AS RowEndDate,
CURRENT_TIMESTAMP AS RowStartDate
FROM ItemData;

The Update Statements

After the data flow finishes, two statements drive the dimension changes from the staging update table:

UPDATE A
SET RowIsCurrent = 0,
A.RowEndDate = U.RowEndDate,
A.UpdtDtTm = CURRENT_TIMESTAMP
FROM Updt.UpdtItem AS U
JOIN dbo.DimItem AS A
ON U.HistoricalHashKey = A.HistoricalHashKey
WHERE A.RowIsCurrent = 1;

INSERT INTO dbo.DimItem
(ItemId, ItemName, ItemNumber, IsManufactured, IsFinishedGood,
ColorName, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice,
SizeText, SizeUom, WeightUom, WeightValue, DaysToManufacture,
ProductLine, ItemClass, ItemStyle, SellStartDate, SellEndDate,
DiscontinuedDate, RowIsCurrent, IsDeleted,
HistoricalHashKey, ChangeHashKey, RowEndDate, RowStartDate)
SELECT
ItemId, ItemName, ItemNumber, IsManufactured, IsFinishedGood,
ColorName, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice,
SizeText, SizeUom, WeightUom, WeightValue, DaysToManufacture,
ProductLine, ItemClass, ItemStyle, SellStartDate, SellEndDate,
DiscontinuedDate, RowIsCurrent, IsDeleted,
HistoricalHashKey, ChangeHashKey, NULL, RowStartDate
FROM Updt.UpdtItem;

The Reusable Template

A 'Dim2.biml' template carries the design pattern. Code nuggets accept package and table specific values from a driver script. The same template can produce loads for every Type 2 dimension in the project.

<#@ template language="C#" tier="2" #>
<#@ property name="PackageName" type="String" #>
<#@ property name="SourceConnection" type="String" #>
<#@ property name="DestinationConnection" type="String" #>
<#@ property name="DestinationSchemaName" type="String" #>
<#@ property name="DestinationTableName" type="String" #>
<#@ property name="DataFlowSourceName" type="String" #>
<#@ property name="DataFlowQuery" type="String" #>
<#@ property name="UpdateSchemaName" type="String" #>
<#@ property name="UpdateTableName" type="String" #>
<#@ property name="UpdateConnection" type="String" #>
<#@ property name="UpdateSQLStatement" type="String" #>
<Package Name="<#=PackageName#>" Language="None">
<Variables>
<Variable DataType="Int32" Name="RowCountChanged">0</Variable>
<Variable DataType="Int32" Name="RowCountNew">0</Variable>
<Variable DataType="Int32" Name="RowCountSource">0</Variable>
<Variable DataType="Int32" Name="RowCountUnchanged">0</Variable>
</Variables>
<Tasks>
<Dataflow Name="DFT_Insert<#=DestinationTableName#>">
<Transformations>
<OleDbSource Name="<#=DataFlowSourceName#>" ConnectionName="<#=SourceConnection#>">
<DirectInput><#=DataFlowQuery#></DirectInput>
</OleDbSource>
<RowCount Name="CNT_Source_Rows" VariableName="User.RowCountSource">
<InputPath OutputPathName="<#=DataFlowSourceName#>.Output" />
</RowCount>
<Lookup Name="LKP Historical Key" NoMatchBehavior="RedirectRowsToNoMatchOutput"
OleDbConnectionName="<#=DestinationConnection#>">
<DirectInput>
SELECT
CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII,
CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII
FROM <#=DestinationSchemaName#>.<#=DestinationTableName#>
WHERE RowIsCurrent = 1
</DirectInput>
<Inputs>
<Column SourceColumn="HistoricalHashKeyASCII" TargetColumn="HistoricalHashKeyASCII" />
</Inputs>
<Outputs>
<Column SourceColumn="ChangeHashKeyASCII" TargetColumn="lkp_ChangeHashKeyASCII" />
</Outputs>
<InputPath OutputPathName="CNT_Source_Rows.Output" />
</Lookup>
<ConditionalSplit Name="CSPL Check For Changes">
<InputPath OutputPathName="LKP Historical Key.Match" />
<OutputPaths>
<OutputPath Name="ChangedRows">
<Expression>ChangeHashKeyASCII != lkp_ChangeHashKeyASCII</Expression>
</OutputPath>
</OutputPaths>
</ConditionalSplit>
<RowCount Name="CNT_Changed_Rows" VariableName="User.RowCountChanged">
<InputPath OutputPathName="CSPL Check For Changes.ChangedRows" />
</RowCount>
<OleDbDestination Name="OLE_DST Update Table" ConnectionName="<#=DestinationConnection#>">
<InputPath OutputPathName="CNT_Changed_Rows.Output" />
<ExternalTableOutput Table="[<#=UpdateSchemaName#>].[<#=UpdateTableName#>]" />
</OleDbDestination>
<RowCount Name="CNT_New_Rows" VariableName="User.RowCountNew">
<InputPath OutputPathName="LKP Historical Key.NoMatch" />
</RowCount>
<DerivedColumns Name="NewItemRowStartDate">
<InputPath OutputPathName="CNT_New_Rows.Output" />
<Columns>
<Column ReplaceExisting="true" Name="RowStartDate" DataType="DateTime">
(DT_DBTIMESTAMP)"01/01/1900 00:00:00"
</Column>
</Columns>
</DerivedColumns>
<OleDbDestination Name="OLE_DST_New_Rows" ConnectionName="<#=DestinationConnection#>">
<InputPath OutputPathName="NewItemRowStartDate.Output" />
<ExternalTableOutput Table="<#=DestinationSchemaName#>.<#=DestinationTableName#>" />
<Columns>
<Column SourceColumn="RowEndDate" IsUsed="false" />
</Columns>
</OleDbDestination>
<RowCount Name="CNT_Unchanged_Rows" VariableName="User.RowCountUnchanged">
<InputPath OutputPathName="CSPL Check For Changes.Default" />
</RowCount>
</Transformations>
</Dataflow>
<ExecuteSQL Name="SQL Truncate <#=UpdateTableName#>" ConnectionName="<#=UpdateConnection#>">
<DirectInput>TRUNCATE TABLE [<#=UpdateSchemaName#>].[<#=UpdateTableName#>]</DirectInput>
</ExecuteSQL>
<ExecuteSQL Name="SQL Update <#=DestinationTableName#>" ConnectionName="<#=DestinationConnection#>">
<DirectInput><#=UpdateSQLStatement#></DirectInput>
<PrecedenceConstraints>
<Inputs>
<Input OutputPathName="DFT_Insert<#=DestinationTableName#>.Output" />
</Inputs>
</PrecedenceConstraints>
</ExecuteSQL>
</Tasks>
</Package>

The Driver Script

A second Biml file feeds dimension specific values into the template. In a real project the values come from a metadata table so a single execution emits every dimension package.

<#@ template language="C#" hostspecific="true" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#@ include file="ProjectConnection.biml" #>
<#
string PackageName = "LoadDimItem";
string SourceConnection = "WarehouseConn";
string DestinationConnection = "WarehouseConn";
string DestinationSchemaName = "dbo";
string DestinationTableName = "DimItem";
string DataFlowSourceName = "OLE_SRC_StgItem";
string UpdateSchemaName = "Updt";
string UpdateTableName = "UpdtItem";
string UpdateConnection = "WarehouseConn";
string DataFlowQuery = @"SELECT * FROM [dbo].[StgItem];";
string UpdateSQLStatement = @"-- the two statements from above";
#>
<Packages>
<#=CallBimlScript("Dim2.biml", PackageName, SourceConnection, DestinationConnection,
DestinationSchemaName, DestinationTableName, DataFlowSourceName, DataFlowQuery,
UpdateSchemaName, UpdateTableName, UpdateConnection, UpdateSQLStatement)#>
</Packages>
</Biml>

Once the source view, dimension table, and update table exist in the database, generating SSIS packages from the driver Biml produces a fully wired Type 2 dimension load.