Skip to main content

Hybrid Type 2 Slowly Changing Dimension with Biml

Why This Pattern Matters

A pure Type 2 slowly changing dimension creates a new row every time any tracked attribute changes. In practice, many warehouses need a hybrid that creates new rows for some columns and updates other columns in place. The Type 6 variant takes that idea further by carrying both the historical value and the current value of each attribute on the same row, which lets reports group facts by either perspective.

This walkthrough shows the SSIS pattern for that hybrid load and the BimlScript template that produces a package per dimension.

The Source View

The source view exposes three hashes that drive change detection:

  • 'HistoricalHashKey': the natural key that ties historical rows together.
  • 'ChangeHashKey': the columns whose changes require a new row.
  • 'UpdateHashKey': the columns that should be updated in place on existing rows.

In the example below, a depot together with a delivery route forms the natural key. A coordinator name, a description, and a category produce a new row when changed. A region attribute is updated in place across all historical rows.

CREATE VIEW [dbo].[StgDimDeliveryRoute] AS
WITH RouteBase AS (
SELECT
r.RouteCode,
r.RouteDescription,
r.Coordinator,
r.RouteCategory,
d.DepotName,
r.RegionName,
1 AS RowIsCurrent
FROM Stg.Routes r
LEFT JOIN dbo.StgDepot d ON d.DepotKey = r.DepotKey
UNION
SELECT -1, 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 1
),
RouteData AS (
SELECT
RouteCode, RouteDescription, Coordinator, RouteCategory,
DepotName, RegionName, RowIsCurrent,
CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(RouteCode, ' ', DepotName))) AS HistoricalHashKey,
CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(Coordinator, RouteDescription, RouteCategory))) AS ChangeHashKey,
CONVERT(VARBINARY(20), HASHBYTES('MD5', CONCAT(RegionName, ' '))) AS UpdateHashKey
FROM RouteBase
)
SELECT
RouteCode, RouteDescription, Coordinator, RouteCategory,
DepotName, RegionName, RowIsCurrent,
HistoricalHashKey, ChangeHashKey, UpdateHashKey,
CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII,
CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII,
CONVERT(VARCHAR(34), UpdateHashKey, 1) AS UpdateHashKeyASCII,
DATEADD(MINUTE, -1, CURRENT_TIMESTAMP) AS RowEndDate,
CURRENT_TIMESTAMP AS RowStartDate
FROM RouteData;

The 'RowEndDate' value is used to expire the current row when a column requires a new historical row.

Data Flow Logic

The data flow:

  1. Reads the source view.
  2. Counts rows for audit logging.
  3. Looks up 'HistoricalHashKeyASCII' against the dimension to bring back 'lkp_ChangeHashKeyASCII' and 'lkp_UpdateHashKeyASCII'.
  4. Routes no-match rows to a derived column that sets 'RowStartDate' to the warehouse epoch and inserts them as new dimension rows.
  5. Routes match rows to a conditional split that fires when either the change hash or the update hash differs.
  6. Adds a derived column that flags 'UpdateInPlace' and 'UpdateNewRow' on each changed row, then writes the row to a staging update table.

The Update Statements

After the data flow finishes, three statements drive the dimension changes from the update table:

UPDATE A
SET A.RegionName = U.RegionName,
A.UpdateHashKey = U.UpdateHashKey,
A.UpdtDtTm = CURRENT_TIMESTAMP
FROM Updt.UpdtDeliveryRoute AS U
JOIN dbo.DimDeliveryRoute AS A
ON U.HistoricalHashKey = A.HistoricalHashKey
WHERE U.UpdateInPlace = 1;

UPDATE A
SET RowIsCurrent = 0,
A.RowEndDate = U.RowEndDate,
A.UpdtDtTm = CURRENT_TIMESTAMP
FROM Updt.UpdtDeliveryRoute U
JOIN dbo.DimDeliveryRoute AS A
ON U.HistoricalHashKey = A.HistoricalHashKey
WHERE A.RowIsCurrent = 1
AND U.UpdateNewRow = 1;

INSERT INTO dbo.DimDeliveryRoute
(RouteCode, RouteDescription, Coordinator, RouteCategory,
DepotName, RegionName, RowIsCurrent, RowStartDate, RowEndDate,
HistoricalHashKey, ChangeHashKey, UpdateHashKey)
SELECT RouteCode, RouteDescription, Coordinator, RouteCategory,
DepotName, RegionName, 1, RowStartDate, NULL,
HistoricalHashKey, ChangeHashKey, UpdateHashKey
FROM Updt.UpdtDeliveryRoute
WHERE UpdateNewRow = 1;

The first statement applies in place updates. The second expires current rows that need a new historical row. The third inserts the new historical rows.

The Reusable Biml Template

The pattern is parameterized so a single template emits a load package for any hybrid dimension. The driver Biml file calls 'CallBimlScript' once per dimension, passing the package name, source query, destination table, update table, and update SQL.

<#@ template language="C#" tier="2" #>
<#@ property name="PackageName" type="String" #>
<#@ property name="DstSchemaName" type="String" #>
<#@ property name="DstTableName" type="String" #>
<#@ property name="DstConnection" type="String" #>
<#@ property name="DataFlowSourceName" type="String" #>
<#@ property name="SrcConnection" type="String" #>
<#@ property name="SourceQuery" type="String" #>
<#@ property name="UpdateSchemaName" type="String" #>
<#@ property name="UpdateTableName" type="String" #>
<#@ property name="UpdateConnection" type="String" #>
<#@ property name="UpdateSQLStatement" type="String" #>
<Package Name="<#=PackageName#>" Language="None">
<Variables>
<Variable DataType="Int32" Name="RowCountChanged">0</Variable>
<Variable DataType="Int32" Name="RowCountNew">0</Variable>
<Variable DataType="Int32" Name="RowCountSource">0</Variable>
<Variable DataType="Int32" Name="RowCountUnchanged">0</Variable>
<Variable DataType="String" Name="SchemaName"><#=DstSchemaName#></Variable>
<Variable DataType="String" Name="TableName"><#=DstTableName#></Variable>
</Variables>
<Tasks>
<Dataflow Name="DFT_Insert<#=DstTableName#>">
<Transformations>
<OleDbSource Name="<#=DataFlowSourceName#>" ConnectionName="<#=SrcConnection#>">
<DirectInput><#=SourceQuery#></DirectInput>
</OleDbSource>
<RowCount Name="CNT_Source_Rows" VariableName="User.RowCountSource">
<InputPath OutputPathName="<#=DataFlowSourceName#>.Output" />
</RowCount>
<Lookup Name="LKP Historical Key" NoMatchBehavior="RedirectRowsToNoMatchOutput"
OleDbConnectionName="<#=DstConnection#>">
<DirectInput>
SELECT
CONVERT(VARCHAR(34), ChangeHashKey, 1) AS ChangeHashKeyASCII,
CONVERT(VARCHAR(34), HistoricalHashKey, 1) AS HistoricalHashKeyASCII,
CONVERT(VARCHAR(34), UpdateHashKey, 1) AS UpdateHashKeyASCII
FROM <#=DstSchemaName#>.<#=DstTableName#>
WHERE RowIsCurrent = 1
</DirectInput>
<Inputs>
<Column SourceColumn="HistoricalHashKeyASCII" TargetColumn="HistoricalHashKeyASCII" />
</Inputs>
<Outputs>
<Column SourceColumn="ChangeHashKeyASCII" TargetColumn="lkp_ChangeHashKeyASCII" />
<Column SourceColumn="UpdateHashKeyASCII" TargetColumn="lkp_UpdateHashKeyASCII" />
</Outputs>
<InputPath OutputPathName="CNT_Source_Rows.Output" />
</Lookup>
<ConditionalSplit Name="CSPL Check For Changes">
<InputPath OutputPathName="LKP Historical Key.Match" />
<OutputPaths>
<OutputPath Name="ChangedRows">
<Expression>(ChangeHashKeyASCII != lkp_ChangeHashKeyASCII) || (UpdateHashKeyASCII != lkp_UpdateHashKeyASCII)</Expression>
</OutputPath>
</OutputPaths>
</ConditionalSplit>
<RowCount Name="CNT_Changed_Rows" VariableName="User.RowCountChanged">
<InputPath OutputPathName="CSPL Check For Changes.ChangedRows" />
</RowCount>
<DerivedColumns Name="Identify Change Types">
<InputPath OutputPathName="CNT_Changed_Rows.Output" />
<Columns>
<Column Name="UpdateInPlace" DataType="Boolean">UpdateHashKeyASCII != lkp_UpdateHashKeyASCII ? (DT_BOOL)1 : (DT_BOOL)0</Column>
<Column Name="UpdateNewRow" DataType="Boolean">ChangeHashKeyASCII != lkp_ChangeHashKeyASCII ? (DT_BOOL)1 : (DT_BOOL)0</Column>
</Columns>
</DerivedColumns>
<OleDbDestination Name="OLE_DST Update Table" ConnectionName="<#=DstConnection#>">
<InputPath OutputPathName="Identify Change Types.Output" />
<ExternalTableOutput Table="[<#=UpdateSchemaName#>].[<#=UpdateTableName#>]" />
</OleDbDestination>
<RowCount Name="CNT_New_Rows" VariableName="User.RowCountNew">
<InputPath OutputPathName="LKP Historical Key.NoMatch" />
</RowCount>
<DerivedColumns Name="NewItemRowStartDate">
<InputPath OutputPathName="CNT_New_Rows.Output" />
<Columns>
<Column ReplaceExisting="true" Name="RowStartDate" DataType="DateTime">
(DT_DBTIMESTAMP)"01/01/1900 00:00:00"
</Column>
</Columns>
</DerivedColumns>
<OleDbDestination Name="OLE_DST_New_Rows" ConnectionName="<#=DstConnection#>">
<InputPath OutputPathName="NewItemRowStartDate.Output" />
<ExternalTableOutput Table="<#=DstSchemaName#>.<#=DstTableName#>" />
<Columns>
<Column SourceColumn="RowEndDate" IsUsed="false" />
</Columns>
</OleDbDestination>
<RowCount Name="CNT_Unchanged_Rows" VariableName="User.RowCountUnchanged">
<InputPath OutputPathName="CSPL Check For Changes.Default" />
</RowCount>
</Transformations>
</Dataflow>
<ExecuteSQL Name="SQL Truncate <#=UpdateTableName#>" ConnectionName="<#=UpdateConnection#>">
<DirectInput>TRUNCATE TABLE [<#=UpdateSchemaName#>].[<#=UpdateTableName#>]</DirectInput>
</ExecuteSQL>
<ExecuteSQL Name="SQL Update <#=DstTableName#>" ConnectionName="<#=DstConnection#>">
<DirectInput><#=UpdateSQLStatement#></DirectInput>
<PrecedenceConstraints>
<Inputs>
<Input OutputPathName="DFT_Insert<#=DstTableName#>.Output" />
</Inputs>
</PrecedenceConstraints>
</ExecuteSQL>
</Tasks>
</Package>

The driver file passes the dimension specific values into the template:

<#@ template language="C#" hostspecific="true" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#@ include file="ProjectConnection.biml" #>
<#
string PackageName = "LoadDimDeliveryRoute";
string DstSchemaName = "dbo";
string DstTableName = "DimDeliveryRoute";
string DstConnection = "WarehouseConn";
string DataFlowSourceName = "OLE_SRC_StgDimDeliveryRoute";
string SrcConnection = "WarehouseConn";
string SourceQuery = @"SELECT RouteCode, RouteDescription, Coordinator,
RouteCategory, DepotName, RegionName, RowIsCurrent,
HistoricalHashKey, ChangeHashKey, UpdateHashKey,
HistoricalHashKeyASCII, ChangeHashKeyASCII, UpdateHashKeyASCII,
RowEndDate, RowStartDate
FROM [dbo].[StgDimDeliveryRoute];";
string UpdateSchemaName = "Updt";
string UpdateTableName = "UpdtDeliveryRoute";
string UpdateConnection = "WarehouseConn";
string UpdateSQLStatement = @"-- the three statements from above";
#>
<Packages>
<#=CallBimlScript("Dim2Hybrid.biml", PackageName, DstSchemaName, DstTableName,
DstConnection, DataFlowSourceName, SrcConnection, SourceQuery,
UpdateSchemaName, UpdateTableName, UpdateConnection, UpdateSQLStatement)#>
</Packages>
</Biml>

Once the source view, dimension table, and update table exist, generating SSIS packages from the driver Biml produces a fully wired hybrid Type 2 load.