Skip to main content

Importing Data From Excel With Biml

Why Treat Excel Like a Database

GetDatabaseSchema is not limited to SQL Server connections. An ExcelConnection that points at an .xlsx file responds to the same call and returns one table node per worksheet. That makes it possible to drive a staging load from the file itself without hand authoring Column elements for every worksheet.

Step 1: Define the Excel Source and the Target Database

The first file declares an ExcelConnection over the source workbook plus a regular OLE DB connection to the destination database. A Database and a Schema element wrap the destination so the generated tables have somewhere to land:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<ExcelConnection Name="WorkbookSrc"
ConnectionString="Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Flatfiles\XLS\InputBook.xlsx;Extended Properties=&quot;Excel 12.0 XML;HDR=YES&quot;;" />
<OleDbConnection Name="StgTarget"
ConnectionString="Data Source=localhost;initial catalog=StagingDb;provider=SQLNCLI11;integrated security=SSPI" />
</Connections>
<Databases>
<Database Name="StagingDb" ConnectionName="StgTarget" />
</Databases>
<Schemas>
<Schema Name="dbo" DatabaseName="StagingDb" />
</Schemas>
</Biml>

Step 2: Project Worksheets Into Biml Tables

The next file calls GetDatabaseSchema on the Excel connection. Excel does not carry a schema name, so the loop assigns the destination schema to every imported table before emitting the Biml:

<#@ template language="VB" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Tables>
<# Dim XLS as AstDbConnectionNode = RootNode.Connections("WorkbookSrc")
Dim ImportResult as ImportResults = XLS.GetDatabaseSchema()
for each tbl as AstTableNode in ImportResult.TableNodes
tbl.Schema = RootNode.Schemas(0) #>
<#= tbl.GetBiml() #>
<# next #>
</Tables>
</Biml>

After this file is added to the project, RootNode.Tables contains one Biml table per worksheet in the workbook.

Step 3: Generate the Staging Tables

A small package emits a CREATE statement for each table by calling GetDropAndCreateDdl on the same nodes:

<#@ template language="VB" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<Package Name="BuildStaging">
<Tasks>
<# for each tbl as AstTableNode in RootNode.Tables #>
<ExecuteSQL Name="Create <#= tbl.Name #>" ConnectionName="StgTarget">
<DirectInput><#= tbl.GetDropAndCreateDdl() #></DirectInput>
</ExecuteSQL>
<# next #>
</Tasks>
</Package>
</Packages>
</Biml>

Run the generated package once, and the destination database has a staging table for every worksheet, ready for a follow up data flow that copies the rows in.

Caveats

The pattern relies on the same ACE OLE DB provider that SSIS uses, so any data type detection quirks that show up in a hand built Excel source still apply here. For workbooks where SSIS handles the import cleanly, generating the load with Biml turns a fiddly manual step into a few lines of script.