Skip to main content

Loading Tables by Foreign Key Topology with Biml: Linear Loads

Why Order Matters

Most introductory Biml examples load tables in parallel and ignore foreign key constraints. Real warehouses often need the opposite: parent tables must be created and loaded before child tables, or referential integrity blows up at insert time. The pattern below extracts a fact table together with every dimension it references, then generates packages that drop, create, and populate the tables in dependency order.

The sample uses a sales fact as the seed and walks the foreign key graph to pick up every related dimension. The result is a slimmer copy of the source warehouse that contains only the tables connected to that fact.

A TopoSort Extension Method

Biml does not include a topology sort out of the box, so the first piece is a small VB module that adds a TopoSort extension method to any collection of AstTableNode. It walks each table's foreign key references, recurses into the referenced tables, and appends every visited table to an output list once its dependencies have been added:

Imports System.Collections.Generic
Imports System.Linq
Imports System.Linq.Expressions
Imports Varigence.Languages.Biml.Table
Imports System.Runtime.CompilerServices

Module TopologySort
Public OutputPath As String = "C:\SSIS\TopologySort"
Public LoadedTables As New List(Of String)
Public LoadableTables As New List(Of AstTableNode)
Public Level As Integer = 0

<Extension()>
Public Function TopoSort(tables As ICollection(Of AstTableNode)) As ICollection(Of AstTableNode)
Dim visitedList As New List(Of AstTableNode)
Dim outputList As New List(Of AstTableNode)
For Each tbl As AstTableNode In tables
TopoVisit(tbl, outputList, visitedList)
Next
Return outputList
End Function

Private Function TopoVisit(node As AstTableNode, outputList As List(Of AstTableNode), visitedList As List(Of AstTableNode))
If Not visitedList.Contains(node) Then
visitedList.Add(node)
For Each dependentTable As AstTableNode In node.Columns.OfType(Of AstTableColumnTableReferenceNode).Select(Function(c) c.ForeignTable)
TopoVisit(dependentTable, outputList, visitedList)
Next
For Each dependentTable As AstTableNode In node.Columns.OfType(Of AstMultipleColumnTableReferenceNode).Select(Function(c) c.ForeignTable)
TopoVisit(dependentTable, outputList, visitedList)
Next
outputList.Add(node)
End If
End Function
End Module

Source and Target Connections

A static Biml file declares the source warehouse and the slimmer target database, plus a target schema:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<Connection Name="SrcDw" ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=SourceWarehouse" />
<Connection Name="TgtDw" ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=TargetWarehouse" />
</Connections>
<Databases>
<Database Name="TgtDw" ConnectionName="TgtDw" />
</Databases>
<Schemas>
<Schema Name="dbo" DatabaseName="TgtDw" />
</Schemas>
</Biml>

Walking the Foreign Key Graph

The next file starts with one seed table name and grows the list by repeatedly visiting referenced tables until no new dependencies appear. Once the list is complete, GetDatabaseSchema is called with the final list as a filter and the returned tables are reparented to the target schema:

<#@ template language="VB" tier="2" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Tables>
<#
Dim tableList As New List(Of String)
Dim tableListChecked As New List(Of String)
tableList.Add("FactSales")
Dim importResult As ImportResults = RootNode.Connections("SrcDw").GetDatabaseSchema()
Do While importResult.TableNodes.Where(Function(e) tableList.Contains(e.Name) And Not tableListChecked.Contains(e.Name)).Where(Function(r) r.Columns.OfType(Of AstTableColumnTableReferenceNode)().Any).Count > 0
For Each t As AstTableNode In importResult.TableNodes.Where(Function(e) tableList.Contains(e.Name) And Not tableListChecked.Contains(e.Name)).Where(Function(r) r.Columns.OfType(Of AstTableColumnTableReferenceNode)().Any)
For Each r As AstTableColumnTableReferenceNode In t.Columns.OfType(Of AstTableColumnTableReferenceNode)()
tableList.Add(r.ForeignTable.Name)
Next
tableListChecked.Add(t.Name)
Next
Loop
importResult = RootNode.Connections("SrcDw").GetDatabaseSchema(Nothing, tableList, ImportOptions.ExcludeViews)
For Each t As AstTableNode In importResult.TableNodes
t.Schema = RootNode.Schemas(0)
#>
<#= t.GetBiml() #>
<# Next #>
</Tables>
</Biml>

Recreating Tables in Order

A package drops every target table in reverse topology order, then recreates them in forward topology order using GetDropAndCreateDdl. Reverse first avoids hitting an existing foreign key when dropping a parent table:

<#@ template tier="3" language="VB" #>
<#@ code file="../Code/TopologySort.vb" #>
<#@ import namespace="TopologySort" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<Package Name="01_Create" ConstraintMode="Linear" PackageSubpath="<#= OutputPath #>">
<Tasks>
<ExecuteSQL Name="Drop Tables" ConnectionName="TgtDw">
<DirectInput>
<# For Each table As AstTableNode In RootNode.Tables.TopoSort.Reverse #>
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[<#= table.Name #>]') AND type IN (N'U'))
DROP TABLE [dbo].[<#= table.Name #>]
GO
<# Next #>
</DirectInput>
</ExecuteSQL>
<ExecuteSQL Name="Create Tables" ConnectionName="TgtDw">
<DirectInput>
<# For Each table As AstTableNode In RootNode.Tables.TopoSort #>
<#= table.GetDropAndCreateDdl() #>
<# Next #>
</DirectInput>
</ExecuteSQL>
</Tasks>
</Package>
</Packages>
</Biml>

Loading Tables Sequentially

A separate package executes the create package, then runs one data flow per table in topology order:

<#@ template tier="4" language="VB" #>
<#@ code file="../Code/TopologySort.vb" #>
<#@ import namespace="TopologySort" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<Package Name="02_Load_Linear" ConstraintMode="Linear" PackageSubpath="<#= OutputPath #>">
<Tasks>
<ExecutePackage Name="Recreate Tables">
<ExternalFile ExternalFilePath="<#= OutputPath #>\01_Create.dtsx" />
</ExecutePackage>
<# For Each t As AstTableNode In RootNode.Tables.TopoSort #>
<#= CallBimlScript("Dataflow.biml", t, Nothing) #>
<# Next #>
</Tasks>
</Package>
</Packages>
</Biml>

The data flow itself is a one liner per table, kept in a callable Biml fragment:

<#@ template language="VB" designerbimlpath="Biml/Packages/Package/Tasks" #>
<#@ property name="tbl" type="AstTableNode" #>
<#@ property name="LoadedTables" type="List (of String)" #>
<# If Not LoadedTables Is Nothing Then LoadedTables.Add(tbl.Name) #>
<Dataflow Name="Load <#= tbl.Name #>">
<Transformations>
<OleDbSource Name="Get Rows" ConnectionName="SrcDw">
<DirectInput>SELECT <#= tbl.GetColumnList() #> FROM <#= tbl.SchemaQualifiedName #></DirectInput>
</OleDbSource>
<OleDbDestination Name="Set Rows" ConnectionName="TgtDw" KeepIdentity="true">
<TableOutput TableName="<#= tbl.ScopedName #>" />
</OleDbDestination>
</Transformations>
</Dataflow>

The result is a sequential load that respects foreign key order. The companion walkthrough on parallel loads uses the same metadata to run independent branches concurrently.