SSIS Parallel Processing with LINQ in Biml
Why Use LINQ Inside Biml
LINQ has been part of C# since version 3.0 and works the same on any IEnumerable, including the collections that Biml exposes. Inside BimlScript, LINQ replaces hand rolled counters and conditional emit logic with a compact, declarative expression. The pattern below uses three LINQ operators (OrderBy, Skip, Take) to slice the table list into fixed sized groups and produce one SSIS package per group, each containing one data flow with five Source-Destination pairs running in parallel.
This walkthrough assumes the project already has tables defined in RootNode.Tables.
The Full Template
<#
int groupSize = 5;
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<# for (int i = 0; i < RootNode.Tables.Count / groupSize; i++) { #>
<Package Name="Group_<#=i*groupSize#>_<#=(i+1)*groupSize#>">
<Tasks>
<Dataflow Name="DFT Group_<#=i*groupSize#>_<#=(i+1)*groupSize#>">
<Transformations>
<# foreach (var tableNode in RootNode.Tables.OrderBy(o => o.Name).Skip(i*groupSize).Take(groupSize)) { #>
<OleDbSource Name="OLEDBSRC_<#=tableNode.Name#>" ConnectionName="OpsSource">
</OleDbSource>
<OleDbDestination Name="OLEDBDST_<#=tableNode.Name#>" ConnectionName="OpsSource">
</OleDbDestination>
<# } #>
</Transformations>
</Dataflow>
</Tasks>
</Package>
<# } #>
</Packages>
</Biml>
Setting the Group Size
The whole pattern is parameterized by one integer. Setting groupSize to 5 means each generated package handles five tables. Lowering or raising the value changes how many tasks land in each data flow:
<#
int groupSize = 5;
#>
The Outer Loop
The outer for loop iterates from zero up to the number of tables divided by groupSize. Each pass produces one package. The package name uses the iteration counter to encode the row range that the package covers, which keeps generated package names predictable:
<# for (int i = 0; i < RootNode.Tables.Count / groupSize; i++) { #>
<Package Name="Group_<#=i*groupSize#>_<#=(i+1)*groupSize#>">
<Tasks>
<Dataflow Name="DFT Group_<#=i*groupSize#>_<#=(i+1)*groupSize#>">
<Transformations>
<# foreach (var tableNode in RootNode.Tables.OrderBy(o => o.Name).Skip(i*groupSize).Take(groupSize)) { #>
<OleDbSource Name="OLEDBSRC_<#=tableNode.Name#>" ConnectionName="OpsSource">
</OleDbSource>
<OleDbDestination Name="OLEDBDST_<#=tableNode.Name#>" ConnectionName="OpsSource">
</OleDbDestination>
<# } #>
</Transformations>
</Dataflow>
</Tasks>
</Package>
<# } #>
The LINQ Slice
The inner foreach is where LINQ does the work. Three operators chain together:
- OrderBy(o => o.Name) sorts the table collection by the table name. Using a stable order matters because Skip and Take only make sense against an ordered sequence. Other ordering options work too: an annotation on the table that records its size, the result of a row count query, or any other property exposed on AstTableNode.
- Skip(i * groupSize) drops the first 'i times groupSize' entries. On the first iteration this is zero, on the second iteration it is groupSize, and so on.
- Take(groupSize) returns the next groupSize entries from the result.
<# foreach (var tableNode in RootNode.Tables.OrderBy(o => o.Name).Skip(i*groupSize).Take(groupSize)) { #>
<OleDbSource Name="OLEDBSRC_<#=tableNode.Name#>" ConnectionName="OpsSource">
</OleDbSource>
<OleDbDestination Name="OLEDBDST_<#=tableNode.Name#>" ConnectionName="OpsSource">
</OleDbDestination>
<# } #>
The combination produces a sliding window across the sorted table list. With groupSize set to 5, package one handles tables in positions 0 through 4, package two handles tables in positions 5 through 9, and so on. Inside each data flow, the SSIS engine runs the source-destination pairs in parallel because they sit at the same level under Transformations.
Why This Pattern Works Well
The LINQ version stays compact even when the slicing rules become more involved. Swapping OrderBy(o => o.Name) for OrderBy(o => o.GetTag("ExpectedRows")) groups tables by load weight without restructuring the loops. Switching Take(groupSize) for TakeWhile lets the rule become "as many tables as fit under a row count budget". The control flow stays the same; only the LINQ expression changes.