Skip to main content

Automating the Staged Delimited Flat File Pattern

Why This Pattern Matters

Hand building a stage delimited flat file package is fine for one file. A medium ETL project usually has fifty or a hundred files, all with the same shape: read a delimited file, convert text columns to typed columns, truncate the staging table, and load it. BimlScript can read file level metadata from a SQL table and emit one fully wired SSIS package per file in a single compile step.

This walkthrough shows how to express the pattern as plain Biml first, then split it into modular files and add code nuggets that loop over a metadata table.

A Vanilla Pattern in Plain Biml

The first pass writes the pattern as static Biml. The same five elements show up in every staging package: a flat file format, the source connection, the staging connection, the package itself, and a data flow that wires it all together.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<FileFormats>
<FlatFileFormat Name="FFF_RegionalSales" ColumnNamesInFirstDataRow="true"
FlatFileType="Delimited" IsUnicode="false" TextQualifer="None">
<Columns>
<Column Name="Territory" ColumnType="Delimited" DataType="AnsiString" Length="50" Delimiter="|" />
<Column Name="SalesYear" ColumnType="Delimited" DataType="AnsiString" Length="50" Delimiter="|" />
<Column Name="Amount" ColumnType="Delimited" DataType="AnsiString" Length="50" Delimiter="CRLF" />
</Columns>
</FlatFileFormat>
</FileFormats>
<Connections>
<FlatFileConnection Name="FF_RegionalSales" FileFormat="FFF_RegionalSales"
FilePath="C:\stage\inbound\RegionalSales.txt" />
<OleDbConnection Name="OLE_StageDb"
ConnectionString="Data Source=Localhost\DB12;Initial Catalog=StageDb;Provider=SQLNCLI11.1;Integrated Security=SSPI;" />
</Connections>
<Packages>
<Package Name="StageRegionalSales" ConstraintMode="Linear"
ProtectionLevel="EncryptSensitiveWithUserKey">
<Tasks>
<ExecuteSQL Name="EST_TruncateStgRegionalSales" ConnectionName="OLE_StageDb">
<DirectInput>TRUNCATE TABLE dbo.StgRegionalSales</DirectInput>
</ExecuteSQL>
<Dataflow Name="DFT_StageRegionalSales">
<Transformations>
<FlatFileSource Name="FFSRC_RegionalSales" ConnectionName="FF_RegionalSales" />
<DataConversion Name="DCNV_ConvertDataType">
<Columns>
<Column SourceColumn="Territory" TargetColumn="Territory" DataType="AnsiString" Length="50" />
<Column SourceColumn="SalesYear" TargetColumn="SalesYear" DataType="Int32" />
<Column SourceColumn="Amount" TargetColumn="Amount" DataType="Int32" />
</Columns>
</DataConversion>
<OleDbDestination Name="OLEDST_StgRegionalSales" ConnectionName="OLE_StageDb">
<InputPath OutputPathName="DCNV_ConvertDataType.Output" />
<ExternalTableOutput Table="dbo.StgRegionalSales" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>

Generating from this Biml produces a single 'StageRegionalSales.dtsx' that truncates the staging table and loads it from the named file. Once the static version compiles cleanly, automating it is mostly a matter of replacing literal values with code nuggets that read from metadata.

Splitting the Solution into Tiered Files

The automated version uses six Biml files, each at a different tier so the compiler processes them in order:

  1. '1.Environment.biml' (tier 0): connection strings, folder paths, schema names.
  2. '2.FileFormats.biml' (tier 1): file format definitions, generated from metadata.
  3. '3.Connections.biml' (tier 2): OLE DB and flat file connections.
  4. '4.ObjectDefinitions.biml' (tier 3): table definitions, generated from metadata.
  5. '5.CreateObjects.biml' (tier 4): an optional package that creates the staging tables.
  6. '6.Packages.biml' (tier 5): the staging packages themselves.

1. Environment

Tier 0 holds variables that point at the metadata database, the inbound folder, and the staging database. This is the only file a new project needs to edit.

<#@ template tier="0" #>
<# var vETLMetadata = "Data Source=Localhost\\DB12;Initial Catalog=ETLMetadata;Provider=SQLNCLI11.1;Integrated Security=SSPI;";
var vFolder = "C:\\stage\\inbound\\";
var vOleDbConnectionName = "OLE_StageDb";
var vOleDbConnectionString = "Data Source=Localhost\\DB12;Initial Catalog=StageDb;Provider=SQLNCLI11.1;Integrated Security=SSPI;";
var vOleDbDatabaseName = "StageDb";
var vDatabaseSchemaName = "dbo";
#>

2. File Formats

Tier 1 reads file level metadata, then nests a second query that returns the columns for each file. Annotations carry friendly tags forward so later tiers can read them with 'GetTag'.

<#@ include file="1.Environment.biml" #>
<#@ template tier="1" #>
<#@ import namespace="System.Data" #>
<# String sFileName = "";
String qGetFiles = "SELECT f.flat_file_name, f.first_row_header, f.flat_file_type, f.text_qualifer FROM dbo.get_delim_flat_files(NULL) AS f;";
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<FileFormats>
<# DataTable files = ExternalDataAccess.GetDataTable(vETLMetadata, qGetFiles);
foreach (DataRow file in files.Rows) {
sFileName = file[0].ToString();
#>
<FlatFileFormat Name="FFF_<#=file[0]#>" ColumnNamesInFirstDataRow="<#=file[1]#>"
IsUnicode="false" TextQualifer="<#=file[3]#>">
<Columns>
<# DataTable columns = ExternalDataAccess.GetDataTable(vETLMetadata,
"SELECT ff.column_name, ff.data_type, ff.column_length, ff.delimiter FROM dbo.get_delim_flat_file_format('" + sFileName + "') AS ff;");
foreach (DataRow column in columns.Rows) { #>
<Column Name="<#=column[0]#>" DataType="AnsiString" Length="<#=column[2]#>" Delimiter="<#=column[3]#>" />
<# } #>
</Columns>
<Annotations>
<Annotation Tag="FriendlyFileName"><#=file[0]#></Annotation>
<Annotation Tag="FriendlyFileFormatName">FFF_<#=file[0]#></Annotation>
<Annotation Tag="FileType"><#=file[2]#></Annotation>
</Annotations>
</FlatFileFormat>
<# } #>
</FileFormats>
</Biml>

3. Connections

Tier 2 emits a single OLE DB connection plus a flat file connection per file format already in the tree.

<#@ include file="1.Environment.biml" #>
<#@ template tier="2" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="<#=vOleDbConnectionName#>" ConnectionString="<#=vOleDbConnectionString#>" />
<# foreach (var file in RootNode.FileFormats) { #>
<FlatFileConnection Name="FF_<#=file.GetTag("FriendlyFileName")#>"
FilePath="<#=vFolder#><#=file.GetTag("FriendlyFileName")#>.<#=file.GetTag("FileType")#>"
FileFormat="<#=file.GetTag("FriendlyFileFormatName")#>" />
<# } #>
</Connections>
<Databases>
<Database Name="<#=vOleDbDatabaseName#>" ConnectionName="<#=vOleDbConnectionName#>" />
</Databases>
<Schemas>
<Schema Name="<#=vDatabaseSchemaName#>" DatabaseName="<#=vOleDbDatabaseName#>" />
</Schemas>
</Biml>

4. Table Definitions

Tier 3 mirrors the file formats but produces table objects. Each table carries a 'FriendlyTableName' annotation so the package generator can reference it later.

<#@ include file="1.Environment.biml" #>
<#@ template tier="3" #>
<#@ import namespace="System.Data" #>
<# String sFileName = ""; #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Tables>
<# DataTable files = ExternalDataAccess.GetDataTable(vETLMetadata,
"SELECT f.flat_file_name, f.convert_data_type FROM dbo.get_delim_flat_files(NULL) AS f;");
foreach (DataRow file in files.Rows) {
sFileName = file[0].ToString();
#>
<Table Name="Stg<#=file[0]#>" SchemaName="<#=vOleDbDatabaseName#>.<#=vDatabaseSchemaName#>">
<Columns>
<# DataTable columns = ExternalDataAccess.GetDataTable(vETLMetadata,
"SELECT ff.column_name, ff.data_type, ff.column_length FROM dbo.get_delim_flat_file_format('" + sFileName + "') AS ff;");
foreach (DataRow column in columns.Rows) { #>
<Column Name="<#=column[0]#>" DataType="<#=column[1]#>" Length="<#=column[2]#>" />
<# } #>
</Columns>
<Annotations>
<Annotation Tag="FriendlyTableName"><#=file[0]#></Annotation>
</Annotations>
</Table>
<# } #>
</Tables>
</Biml>

5. CreateObjects (Optional)

Tier 4 generates a one off package that issues 'CREATE TABLE' for each staging table by calling 'GetTableSql' on every node in 'RootNode.Tables'. Most teams run this once during development, then check the generated DDL into source control.

<#@ include file="1.Environment.biml" #>
<#@ template tier="4" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<Package Name="CreateObjects" ConstraintMode="Linear"
ProtectionLevel="EncryptSensitiveWithUserKey">
<Tasks>
<# foreach (var table in RootNode.Tables) { #>
<ExecuteSQL Name="EST_Create<#=table.Name#>" ConnectionName="<#=vOleDbConnectionName#>">
<DirectInput><#=table.GetTableSql()#></DirectInput>
</ExecuteSQL>
<# } #>
</Tasks>
</Package>
</Packages>
</Biml>

6. Packages

Tier 5 reuses the same loop. For each table in the tree it emits a staging package whose name, truncate statement, source connection, data conversion columns, and destination table all come from the metadata.

<#@ include file="1.Environment.biml" #>
<#@ template tier="5" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<# foreach (var table in RootNode.Tables) { #>
<Package Name="Stage<#=table.GetTag("FriendlyTableName")#>" ConstraintMode="Linear"
ProtectionLevel="EncryptSensitiveWithUserKey">
<Tasks>
<ExecuteSQL Name="EST_Trun_<#=table.GetTag("FriendlyTableName")#>" ConnectionName="<#=vOleDbConnectionName#>">
<DirectInput>TRUNCATE TABLE <#=table.SchemaQualifiedName#></DirectInput>
</ExecuteSQL>
<Dataflow Name="DFT_Stage_<#=table.GetTag("FriendlyTableName")#>">
<Transformations>
<FlatFileSource Name="FFSRC_<#=table.GetTag("FriendlyTableName")#>"
ConnectionName="FF_<#=table.GetTag("FriendlyTableName")#>" />
<DataConversion Name="DCNV_ConvertToTargetType">
<Columns>
<# foreach (var column in table.Columns) { #>
<Column SourceColumn="<#=column.Name#>" TargetColumn="<#=column.Name#>"
DataType="<#=column.DataType#>" Length="<#=column.Length#>" />
<# } #>
</Columns>
</DataConversion>
<OleDbDestination Name="OLEDST_<#=table.GetTag("FriendlyTableName")#>"
ConnectionName="<#=vOleDbConnectionName#>">
<ExternalTableOutput Table="<#=table.SchemaQualifiedName#>" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
<# } #>
</Packages>
</Biml>

Reusing the Pattern in a New Environment

Drop the six Biml files into a new SSIS project, edit '1.Environment.biml' to point at the new metadata database, staging database, and inbound folder, then add a row to the metadata table for each new file. Selecting the Biml files and choosing Generate SSIS Packages produces one staging package per file. No additional code changes are needed.