Skip to main content

Delayed Durability for High Throughput Biml Migrations

Why This Pattern Matters

Biml driven migration projects often run hundreds of small SSIS packages in parallel against the same staging database. On well provisioned hardware this is fine. On a constrained shared environment with throttled disk, the same workload can become log writer bound, with WRITELOG waits dominating the wait stats and total runtime ballooning. SQL Server has a feature, Delayed Durability, designed for exactly this case. It can be enabled at the database level without touching the Biml framework or the generated packages, and the throughput gain on the right workload is large.

This walkthrough describes the symptom, the fix, and the trade off. It is not a recommendation to enable Delayed Durability everywhere; it is a recommendation to consider it when the workload matches.

The Workload

A typical Biml migration project moves data through three layers: RawSource, Source, and Staging. A meta data driven Biml framework emits one child package per source table, and a master package orchestrates the children using sequence containers. Within each container, packages run in parallel up to the SSIS engine limit. Leaving 'MaxConcurrentExecutables' at -1 lets SSIS run up to the number of processors plus two packages at once.

The result is a high concurrency workload writing to one or two staging databases. On hardware with fast local storage this scales well. On a throttled SAN with shared disk, the log writer becomes the bottleneck.

The Symptom

A migration that ran in twenty five minutes on the development team's hardware took 141 minutes on the client environment. Wait statistics showed WRITELOG consistently in the top three waits. Transactions per second hovered around 7,000. The packages themselves were unchanged; only the underlying disk had changed.

WRITELOG indicates that sessions are waiting for the log writer to flush log records to disk. With many parallel SSIS packages all committing small transactions against the same database, the log writer cannot keep up with the demand and every commit pays a serialized write cost.

What Delayed Durability Does

Delayed Durability is a SQL Server feature that delays the log flush until a buffer fills, a timeout fires, or an explicit 'sp_flush_log' is called. Sessions still commit logically, but the durability guarantee is deferred. The trade off is data loss on a crash: any committed but unflushed transactions are lost. For a migration that is a full drop and reload, that trade off is acceptable because the process can be rerun from scratch.

Enabling Delayed Durability at the database level looks like this:

ALTER DATABASE StagingArea
SET DELAYED_DURABILITY = FORCED;

FORCED makes every transaction in the database delayed. ALLOWED leaves the choice to each transaction through the COMMIT WITH option. For a migration use case, FORCED is simpler.

The Result

Applying FORCED Delayed Durability to the source and staging databases reduced the test migration from 141 minutes to 59 minutes, a roughly 58 percent reduction. Transactions per second went from approximately 7,000 to approximately 12,500. WRITELOG dropped well down the wait stat list. No code or package changes were needed.

The framework, the metadata, and the Biml generator stayed identical. The change was a single ALTER DATABASE statement per database.

When This Is Appropriate

Delayed Durability is a good fit for:

  • Migration and reload workloads where the source data still exists and the process can be rerun
  • Staging databases that are truncated and reloaded as part of every batch
  • Heavy parallel insert workloads where WRITELOG dominates wait stats

It is not a good fit for:

  • Production OLTP databases where any committed transaction must survive a crash
  • Workloads where the log writer is not the bottleneck (check wait stats first)
  • Any database where data loss on the order of the flush interval is unacceptable

The general rule: if losing the most recent few seconds of committed transactions on a crash would be a problem, do not enable Delayed Durability on that database.

Diagnosing Whether This Applies

Before enabling Delayed Durability, confirm that WRITELOG is actually the bottleneck. The 'sys.dm_os_wait_stats' DMV shows cumulative waits since the server started. A snapshot of currently active sessions, taken from 'sys.dm_exec_requests' or a session level stored procedure, shows what sessions are waiting on right now. If WRITELOG appears prominently for the database and the workload pattern matches the parallel insert profile above, Delayed Durability is worth testing.

Conclusion

Delayed Durability is a precise tool for a precise problem. On the right workload, the change is a single statement and the throughput improvement is large. On the wrong workload, the data loss exposure is real. Verify the wait stats first, confirm the workload can be rerun, then test in a non production copy before applying to a real environment.