Generating Data Profiling Tasks for Every Table with BimlScript
Why This Pattern Matters
The Data Profiling Task in SSIS is a fast way to inspect the shape of a table or view: null ratios, value distributions, length distributions, common patterns, and inclusion checks. Configuring it manually for one table is straightforward. Configuring it for every table in a wide source database is repetitive and error prone. A single BimlScript file can iterate over the catalog and emit the task once per table, producing a complete profiling package without hand authoring.
What the Data Profiling Task Provides
The component reads a single table or view and writes the results to a flat file in XML format, viewable with the SQL Server Data Profile Viewer. The most common profile requests are:
- Column Null Ratio Profile Request: count and percentage of NULLs per column
- Column Statistics Profile Request: min, max, mean, and standard deviation for numeric and date columns
- Column Length Distribution Request: min, max, and length distribution for text columns
- Column Value Distribution Request: most frequent values and their share of the column
- Column Pattern Profile Request: common regex patterns inferred from text values
- Functional Dependency Profile Request: dependency strength between two columns
- Value Inclusion Profile Request: foreign key candidate analysis between two columns
The Generation Script
The script connects to the source database, reads the user table list with a small T-SQL query, and emits one Data Profiling Task per table with five of the standard profile requests. Each task writes to its own file connection, so the output is one XML file per table.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#
string profilingConnectionString =
@"Data Source=.\sqlsrv;Initial Catalog=RetailMart;Integrated Security=True;";
var profiledTables = new List<string>();
using (SqlConnection conn = new SqlConnection(profilingConnectionString))
{
string tableQuery = @"
SELECT QUOTENAME(SCHEMA_NAME(o.schema_id)) AS SchemaName,
o.name AS TableName,
SUM(p.rows) AS [RowCount]
FROM sys.objects AS o
INNER JOIN sys.partitions AS p
ON o.object_id = p.object_id
WHERE o.type = 'U'
AND o.is_ms_shipped = 0x0
AND p.index_id < 2
AND o.name LIKE 'Dim%'
GROUP BY o.schema_id, o.name
ORDER BY 3 DESC;";
using (SqlCommand cmd = new SqlCommand(tableQuery, conn))
{
conn.Open();
using (SqlDataReader reader = cmd.ExecuteReader())
{
while (reader.Read())
{
profiledTables.Add(reader.GetString(1));
}
}
}
}
#>
<Connections>
<AdoNetConnection
Name="ProfilingSource"
Provider="System.Data.SqlClient.SqlConnection, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
ConnectionString="<#=profilingConnectionString#>" />
<# foreach (string table in profiledTables) { #>
<FileConnection
Name="ProfileOutput<#=table#>"
FileUsageType="CreateFile"
FilePath="C:\Profiling\<#=table#>.xml" />
<# } #>
</Connections>
<Packages>
<Package Name="ProfileDimensions" ConstraintMode="Linear">
<Tasks>
<Container Name="DimensionProfiles">
<Tasks>
<# foreach (string table in profiledTables) { #>
<DataProfiling Name="<#=table#>" OverwriteDestination="true">
<FileOutput ConnectionName="ProfileOutput<#=table#>" />
<ProfileRequests>
<ColumnNullRatioProfileRequest
Name="NullRatioReq" ConnectionName="ProfilingSource"
SchemaId="dbo" TableId="<#=table#>" />
<ColumnStatisticsProfileRequest
Name="StatisticsReq" ConnectionName="ProfilingSource"
SchemaId="dbo" TableId="<#=table#>" />
<ColumnLengthDistributionProfileRequest
Name="LengthDistReq" ConnectionName="ProfilingSource"
SchemaId="dbo" TableId="<#=table#>" />
<ColumnValueDistributionProfileRequest
Name="ValueDistReq" ConnectionName="ProfilingSource"
SchemaId="dbo" TableId="<#=table#>" />
<ColumnPatternProfileRequest
Name="PatternReq" ConnectionName="ProfilingSource"
SchemaId="dbo" TableId="<#=table#>" />
</ProfileRequests>
</DataProfiling>
<# } #>
</Tasks>
</Container>
</Tasks>
</Package>
</Packages>
</Biml>
<#@ template language="C#" hostspecific="true" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
The script first declares the source connection string, then queries 'sys.objects' for user tables that match a name filter (here, dimensions whose name begins with 'Dim'). The result list drives two loops: one in 'Connections' to define a per-table file connection, and one in 'Tasks' to emit the matching DataProfiling task. The 'SchemaId' is hard coded to 'dbo' to keep the example focused, but a real script would carry the schema name through the loop alongside the table name.
After right-clicking the Biml file and choosing Generate SSIS Packages, the output is a single package containing one Data Profiling Task per table, each writing to its own XML output file.
Practical Notes
- Pasting the Biml into Visual Studio can introduce stray formatting characters that break validation. Disable Tools, Options, Text Editor, XML, Formatting, Auto Reformat, On paste from clipboard.
- The package generation can fail with sensitive data warnings if the project ProtectionLevel does not match. Either align the project setting or set ProtectionLevel explicitly on the generated package.
- The example assumes every table lives in the 'dbo' schema. To handle multiple schemas, carry the schema name through the loop and use it in 'SchemaId'.
Why Generate Instead of Author
A single Biml file can produce profiling coverage for hundreds of tables in seconds. Adding a new table requires no manual SSIS work; rerunning the generator picks it up automatically. The same metadata-driven approach can drive any other repetitive SSIS work, including stage loads, audit checks, and archival packages.