For Each Loops with Biml over Flat Files
Why a Foreach Pattern Matters for Files
Many extract jobs receive multiple files in the same format dropped into a folder over time. Hand building a package that loops the folder is straightforward in SSIS, but doing it in Biml means the same pattern can be parameterized and reused across data sources. The walkthrough below builds a package that loops every CSV file in a directory, loads it into one target table, and tags each row with its source file name.
This walkthrough builds on the flat file format pattern from the related piece on working with flat files; reuse those format definitions as the prerequisite metadata.
Sample Target Table
A small target table holds the imported rows together with the originating file name:
CREATE TABLE [dbo].[FF_Inventory] (
[Warehouse] [nvarchar](50) NULL,
[Item] [nvarchar](50) NULL,
[Inventory] [bigint] NULL,
[Filename] [nvarchar](35) NULL
) ON [PRIMARY]
Static Environment Definition
The flat file format, the matching flat file connection, and the SQL Server target connection live in a static Biml file. The flat file connection uses an empty FilePath; the actual path is supplied at runtime through an expression on the package:
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<FileFormats>
<FlatFileFormat Name="InventoryFile" CodePage="1252" ColumnNamesInFirstDataRow="true" IsUnicode="false">
<Columns>
<Column Name="Warehouse" DataType="String" Length="50" Delimiter="Semicolon" />
<Column Name="Item" DataType="String" Length="50" Delimiter="Semicolon" />
<Column Name="Inventory" DataType="Int64" Delimiter="CRLF" />
</Columns>
</FlatFileFormat>
</FileFormats>
<Connections>
<Connection Name="WarehouseDb" ConnectionString="Data Source=.;Initial Catalog=InventoryDb;Provider=SQLNCLI11.1;Integrated Security=SSPI;" />
<FlatFileConnection Name="InventoryFile" FilePath="empty" FileFormat="InventoryFile" />
</Connections>
</Biml>
What the Package Needs
To loop a folder of files in SSIS the package requires four things:
- A package variable that holds the current file name and feeds the connection string for the flat file connection.
- A ForEach loop that iterates over the files in the folder, writing the current path into the variable and running a data flow.
- A derived column transformation in the data flow that adds the file name as an extra column so it can be persisted to SQL.
- A truncate up front so reruns produce a clean target.
The Generated Package
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<Package ProtectionLevel="EncryptSensitiveWithUserKey" Name="ImportInventoryFiles" ConstraintMode="Linear">
<Variables>
<Variable DataType="String" Name="CurrentFile" />
</Variables>
<Connections>
<Connection ConnectionName="InventoryFile">
<Expressions>
<Expression ExternalProperty="ConnectionString">@[User::CurrentFile]</Expression>
</Expressions>
</Connection>
</Connections>
<Tasks>
<ExecuteSQL Name="Truncate" ConnectionName="WarehouseDb">
<DirectInput>truncate table FF_Inventory</DirectInput>
</ExecuteSQL>
<ForEachFileLoop Name="Loop CSV Files" Folder="C:\Imports\Inventory" FileSpecification="*.csv">
<VariableMappings>
<VariableMapping Name="0" VariableName="User.CurrentFile" />
</VariableMappings>
<Tasks>
<Dataflow Name="Load file">
<Transformations>
<FlatFileSource Name="SRC" ConnectionName="InventoryFile" />
<DerivedColumns Name="AddFileName">
<InputPath OutputPathName="SRC.Output" />
<Columns>
<Column Name="Filename" DataType="String" Length="35">@[User::CurrentFile]</Column>
</Columns>
</DerivedColumns>
<OleDbDestination Name="Target" ConnectionName="WarehouseDb">
<InputPath OutputPathName="AddFileName.Output" />
<ExternalTableOutput Table="[dbo].[FF_Inventory]" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</ForEachFileLoop>
</Tasks>
</Package>
</Packages>
</Biml>
The expression on the flat file connection rebinds its connection string to the variable so each loop iteration points to a different file. The derived column copies the same variable into a new column on every row.
Making the Pattern Reusable
Wrap the package definition in a tiered Biml file that accepts the format name, the target table, and the source folder as properties, then call it through CallBimlScript for every flat file format declared in the project. The result is one package per file shape, all built from the same template.