Iterating Through Excel Worksheets with Biml
Why This Pattern Matters
A single Excel workbook often contains many worksheets that all share the same column shape, one per region or per period. Building one data flow per worksheet by hand is tedious. BimlScript can list the worksheets in a workbook, loop over the names, and emit one data flow per worksheet, each filtered to read from a different sheet.
The destination table can be the same for every data flow because automatic column name mapping in Biml takes care of the column to column wiring even when worksheets have minor structural differences.
What the Generated Package Does
The compiled package contains:
- An Excel connection manager pointing at the workbook.
- An OLE DB connection manager pointing at the destination database.
- One data flow per worksheet. Each data flow has an Excel source that reads from a single sheet and an OLE DB destination that loads the shared target table.
Reading the Worksheet List
The Excel ACE provider exposes worksheets as tables. A small helper queries the schema collection for the workbook and returns the worksheet names as a string array. The Excel 2007 driver has to be installed on the machine that runs the compile.
<#@ template language="C#" tier="2" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.OleDb" #>
<#
var connectionString =
"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\\inbound\\RegionalAccidents.xlsx;Extended Properties=\"Excel 12.0;HDR=YES\";";
var worksheetCollection = new List<string>();
using (var excelConnection = new OleDbConnection(connectionString))
{
excelConnection.Open();
var schemaTable = excelConnection.GetOleDbSchemaTable(
OleDbSchemaGuid.Tables, new object[] { null, null, null, "TABLE" });
foreach (DataRow row in schemaTable.Rows)
{
worksheetCollection.Add(row["TABLE_NAME"].ToString());
}
}
#>
After this block runs, 'worksheetCollection' contains every worksheet name in the workbook, with the trailing dollar sign included.
Emitting One Data Flow per Worksheet
A 'foreach' loop over the worksheet names produces a data flow per sheet. The 'Replace' call in the source name strips the trailing dollar sign so the data flow names look natural in the SSIS designer. The shared destination is named once and reused across every data flow.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<ExcelConnection Name="RegionalAccidentsBook"
ConnectionString="<#=connectionString#>" />
<OleDbConnection Name="WarehouseDb"
ConnectionString="Data Source=localhost;Initial Catalog=AnalyticsDb;Integrated Security=SSPI;Provider=SQLNCLI11.1;" />
</Connections>
<Packages>
<Package Name="LoadRegionalAccidents" ConstraintMode="Linear"
ProtectionLevel="DontSaveSensitive">
<Tasks>
<# foreach (var worksheet in worksheetCollection) { #>
<Dataflow Name="DFT_<#=worksheet.Replace("$", "")#>">
<Transformations>
<ExcelSource Name="SRC_<#=worksheet.Replace("$", "")#>"
ConnectionName="RegionalAccidentsBook">
<DirectInput>SELECT * FROM [<#=worksheet#>]</DirectInput>
</ExcelSource>
<OleDbDestination Name="DST_RegionalAccidents"
ConnectionName="WarehouseDb">
<ExternalTableOutput Table="dbo.RegionalAccidents" />
</OleDbDestination>
</Transformations>
</Dataflow>
<# } #>
</Tasks>
</Package>
</Packages>
</Biml>
The Excel source uses 'DirectInput' with a SELECT statement that references the worksheet by its bracketed name. The destination omits explicit column mappings; Biml falls back to automatic name based mapping, which lets the same destination accept rows from worksheets that have a slightly different column set as long as the column names match the destination.
Trade Offs
A single package per workbook with one data flow per worksheet keeps the SSIS catalog tidy and makes a single restart point for the whole load. A per worksheet package is sometimes preferable: error isolation is easier and individual worksheets can be rerun on demand. Both shapes are minor variations of this template; the choice is a matter of taste and operational requirements.
The pattern works for any workbook whose worksheets share a roughly compatible structure. Replace the workbook path and the destination table, regenerate, and the same template loads a different multi sheet workbook.