Skip Existing Objects with Biml
Why Avoid Recreating Everything
The default behavior of a Biml DDL generator is to drop and recreate every table on every run. That works during development and on small staging databases. It becomes a problem when staging tables are large, when the create step is followed by a multi hour reload, or when an out of band index has been added to the staging table to speed up downstream queries. A blind drop and recreate throws away the data and the index for no good reason if the table definition has not actually changed.
A small change to the generation template fixes this. Before emitting the DROP and CREATE for a table, the template asks the live database for the same table and compares its column list against the Biml definition. If the two match, the table is left alone. If they differ, the table is recreated.
Reading the Live Schema
The 'ImportDB' helper on a connection node returns an in memory schema graph for the database. It is the same shape as the Biml model, which is what makes the comparison straightforward:
<#
var warehouseConn = SchemaManager.CreateConnectionNode(
"WarehouseConn",
"Data Source=DwServer;Initial Catalog=WarehouseDb;Integrated Security=SSPI;Provider=SQLNCLI11;");
var liveSchema = warehouseConn.ImportDB(
new List<string> { "staging" }, // schemas
null, // all tables
ImportOptions.ExcludeIndexes
| ImportOptions.ExcludeForeignKey
| ImportOptions.ExcludeCheckConstraint);
#>
The result is a table set that mirrors the staging schema as it actually exists in the database.
Comparing Each Table
For each Biml managed table, the template looks up the same name in the live schema. If no match is found, the table is new and needs to be created. If a match is found, the column lists are compared. The simplest comparison serializes both column collections to Biml and checks for equality:
<# foreach (var stagingTable in RootNode.Tables.Where(t => t.Schema.Name == "staging")) { #>
<#
var liveTable = liveSchema.TableNodes
.FirstOrDefault(t => t.Name == stagingTable.Name && t.Schema.Name == stagingTable.Schema.Name);
bool needsRecreate = liveTable == null
|| liveTable.Columns.GetBiml() != stagingTable.Columns.GetBiml();
#>
<# if (needsRecreate) { #>
<ExecuteSQL Name="Create_<#=stagingTable.Name#>" ConnectionName="WarehouseConn">
<DirectInput><#=stagingTable.GetDropAndCreateDdl()#></DirectInput>
</ExecuteSQL>
<# } #>
<# } #>
Tables that already match the Biml definition produce no Execute SQL task at all. Tables that are missing or that have a different column shape produce the usual DROP and CREATE.
What 'GetBiml' on the Column Collection Buys
Calling 'GetBiml' on the column collection serializes the columns to their Biml string representation, including data type, length, precision, scale, and nullability. A pure string compare on those serializations is a strict but conservative test. If anything material has changed, the strings differ and the table is recreated. A non material change (column reordering inside the database, an extended property, a check constraint that was excluded from the import) does not count.
For projects that need a finer grained comparison (for example, only recreate when a column is dropped, not when one is added), replace the string equality check with a column by column walk that distinguishes additive from destructive changes.
When This Pattern Pays Off
Three situations make this pattern worth the small extra complexity:
- Staging tables hold reference data that is expensive to reload.
- Out of band indexes have been added on top of the Biml managed schema.
- The same generation template runs on multiple environments and most environments do not change between runs.
In all three cases, the conditional create avoids unnecessary IO and keeps downstream consumers happy.