Export to Split Flat Files with Biml

Why Split Exports Across Multiple Files

A single CSV export per table is fine for small tables, but a table with millions of rows is awkward to ship as one file. Splitting the export into fixed sized chunks keeps each file manageable for downstream consumers and parallel uploads. The technique below uses OFFSET FETCH NEXT in T-SQL combined with a Biml package that calls another Biml script to drive the loop.

This walkthrough builds on the flat file format and annotation pattern from the companion piece on exporting whole databases to flat files. Reuse the file formats, connections, and annotations defined there as the prerequisite metadata.

Building the Row Range Query

The first piece is a Biml file that emits only T-SQL. It accepts the source table name and the maximum number of rows per file as properties, then returns one row per output file with the starting offset and a zero padded file id:

<#@ template language="VB" #>
<#@ property name="tableName" type="string" #>
<#@ property name="maxRows" type="string" #>
SELECT (RowId - 1) * <#= maxRows #> AS StartRow
  , RIGHT(LTRIM(STR(100000 + RowId)), 5) AS FileId
FROM
(
  SELECT CAST(COUNT(*) / <#= maxRows #> AS INT) + CASE
    WHEN COUNT(*) / <#= maxRows #>.00 = CAST(COUNT(*) / <#= maxRows #> AS INT)
      THEN 0
      ELSE 1
    END AS FileCount
  FROM <#= tableName #>
) AS FileCount
INNER JOIN
(
  SELECT ROW_NUMBER() OVER(ORDER BY id) AS RowId
  FROM sysobjects
) AS RowHelper
  ON RowHelper.RowId <= FileCount

The result is a small dataset: one row per output file, with the offset and the suffix that the package will use when constructing the file path.

The Per Format Package Builder

The second Biml file accepts a flat file format and the maximum row count. It builds a package that reads the row range query into an object variable, then loops with a ForEachAdoLoop, populating one file at a time using two expression based variables for the SELECT statement and the destination file name:

<#@ template language="VB" designerbimlpath="Biml/Packages" #>
<#@ property name="ff" type="AstFlatFileFormatNode" #>
<#@ property name="maxRows" type="string" #>
<Package Name="ExportSplit_<#= ff.Name #>" ConstraintMode="Linear">
  <Variables>
    <Variable DataType="Object" Name="DataSet" />
    <Variable DataType="String" Name="FileId">00001</Variable>
    <Variable EvaluateAsExpression="true" DataType="String" Name="FileName">"C:\\Exports\\Split\\<#= ff #>_" + @[User::FileId] + ".csv"</Variable>
    <Variable EvaluateAsExpression="true" DataType="String" Name="SelectStmt">"SELECT <#= ff.GetTag("SourceColumns") #> FROM <#= ff.GetTag("SourceTable") #> ORDER BY <#= ff.GetTag("KeyColumns") #> OFFSET " + @[User::StartRow] + " ROWS FETCH NEXT <#= maxRows #> ROWS ONLY"</Variable>
    <Variable DataType="String" Name="StartRow">0</Variable>
  </Variables>
  <Connections>
    <Connection ConnectionName="<#= ff #>">
      <Expressions>
        <Expression ExternalProperty="ConnectionString">@[User::FileName]</Expression>
      </Expressions>
    </Connection>
  </Connections>
  <Tasks>
    <ExecuteSQL Name="Read Range Set" ConnectionName="OpsSource" ResultSet="Full">
      <DirectInput><#= CallBimlScript("ExportSplit_Select.biml", ff.GetTag("SourceTable"), maxRows) #></DirectInput>
      <Results>
        <Result Name="0" VariableName="User.DataSet" />
      </Results>
    </ExecuteSQL>
    <ForEachAdoLoop Name="For Each Range" SourceVariableName="User.DataSet">
      <Tasks>
        <Dataflow Name="Export <#= ff #>">
          <Transformations>
            <OleDbSource Name="SRC <#= ff.Name #>" ConnectionName="OpsSource">
              <VariableInput VariableName="User.SelectStmt" />
            </OleDbSource>
            <FlatFileDestination ConnectionName="<#= ff #>" Overwrite="true" Name="DEST <#= ff #>" />
          </Transformations>
        </Dataflow>
      </Tasks>
      <VariableMappings>
        <VariableMapping Name="0" VariableName="User.StartRow" />
        <VariableMapping Name="1" VariableName="User.FileId" />
      </VariableMappings>
    </ForEachAdoLoop>
  </Tasks>
</Package>

Two expression based variables do the heavy lifting. The connection string for the flat file destination is rebuilt each iteration so every file gets a unique name, and the SELECT statement is rebuilt with the current StartRow value so each iteration reads the next slice of rows.

Calling the Builder for One Format

The top level Biml file invokes the per format builder for a specific flat file format and a chosen split size:

<#@ template language="VB" tier="3" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <#= CallBimlScript("ExportSplit_Packages.biml", RootNode.FileFormats("Crm_Customer"), 2500) #>
  </Packages>
</Biml>

With a customer table holding roughly twenty thousand rows and a split size of 2500, the generated package produces eight files.

Notes on the Pattern

The approach uses CallBimlScript twice in a chain: the package builder calls the SELECT builder, and the top level file calls the package builder. Each output happens to be different (text for one, Biml for the other) which shows that BimlScript output is just text. Wrap the top level invocation in a loop over multiple file formats to generate a split export package per source table.

Why Split Exports Across Multiple Files​

Building the Row Range Query​

The Per Format Package Builder​

Calling the Builder for One Format​

Notes on the Pattern​

Why Split Exports Across Multiple Files

Building the Row Range Query

The Per Format Package Builder

Calling the Builder for One Format

Notes on the Pattern