Extracting Database Tables to Raw Files
Why Raw Files Over Flat Files
Flat files lose type metadata: every column lands as text and downstream consumers have to reparse and recast. Raw files preserve the column definitions exactly, which makes them a better choice for SSIS-to-SSIS handoffs and for staging steps that read back into another package without having to reinterpret types.
Skeleton
The structure is the same as for flat file output, with two named sections that the package will reference:
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<!-- connections go here -->
</Connections>
<FileFormats>
<!-- raw file format goes here -->
</FileFormats>
<Packages>
<Package Name="CopyAccount" ConstraintMode="Linear">
<!-- dataflow goes here -->
</Package>
</Packages>
</Biml>
Connections
The source is an OLE DB connection. The output is a file connection that points at the raw file location:
<Connections>
<OleDbConnection Name="Source" ConnectionString="Provider=SQLOLEDB;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=OperationsDW" />
<FileConnection Name="Target" FilePath="F:\\Output.raw" />
</Connections>
File Format
A 'RawFileFormat' element under 'FileFormats' lists the columns to write. Order does not matter because the dataflow auto maps by name; only the format name matters when the destination references it:
<FileFormats>
<RawFileFormat Name="Output">
<Columns>
<Column Name="LedgerKey" />
<Column Name="LedgerCodeAlternateKey" />
<Column Name="LedgerDescription" />
<Column Name="LedgerType" />
<Column Name="Operator" />
</Columns>
</RawFileFormat>
</FileFormats>
Key Difference From Flat File Output
A flat file destination needs only the file connection because the connection itself carries the format. A raw file destination needs both the file connection and the file format, separately:
<Dataflow Name="Copy_Dim_Ledger">
<Transformations>
<OleDbSource Name="Src_Dim_Ledger" ConnectionName="Source">
<DirectInput>Select LedgerKey, LedgerCodeAlternateKey, LedgerDescription, LedgerType, Operator from DimLedger;</DirectInput>
</OleDbSource>
<RawFileDestination Name="Dest_Dim_Ledger" RawFileFormatName="Output" WriteOption="CreateAlways">
<FileOutput ConnectionName="Target" />
</RawFileDestination>
</Transformations>
</Dataflow>
The 'RawFileFormatName' attribute references the named 'RawFileFormat' and the 'FileOutput' element references the file connection.
Complete Example
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="Source" ConnectionString="Provider=SQLOLEDB;Data Source=localhost\SQL2019;Integrated Security=SSPI;Initial Catalog=OperationsDW" />
<FileConnection Name="Target" FilePath="F:\\Output.raw" />
</Connections>
<FileFormats>
<RawFileFormat Name="Output">
<Columns>
<Column Name="LedgerKey" />
<Column Name="LedgerCodeAlternateKey" />
<Column Name="LedgerDescription" />
<Column Name="LedgerType" />
<Column Name="Operator" />
</Columns>
</RawFileFormat>
</FileFormats>
<Packages>
<Package Name="CopyTableToRawFile" ConstraintMode="Parallel" ProtectionLevel="DontSaveSensitive">
<Tasks>
<Dataflow Name="Copy_Dim_Ledger">
<Transformations>
<OleDbSource Name="Src_Dim_Ledger" ConnectionName="Source">
<DirectInput>Select LedgerKey, LedgerCodeAlternateKey, LedgerDescription, LedgerType, Operator from DimLedger;</DirectInput>
</OleDbSource>
<RawFileDestination Name="Dest_Dim_Ledger" RawFileFormatName="Output" WriteOption="CreateAlways">
<FileOutput ConnectionName="Target" />
</RawFileDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
Where This Pattern Leads
The single-table version is the building block for extracting an entire database to raw files, which mimics the staging step of a data warehouse load. Once the per-table shape is in place, generating one of these dataflows for every base table in a database is a small extension.