Skip to main content

Conditional Splits in the Data Flow

Why the Conditional Split Matters

Filtering rows in the source query is usually preferable to splitting them in the data flow because the database can do that work efficiently and only the needed rows enter the buffer. The Conditional Split transformation earns its place when the same buffer must take multiple downstream paths, which is exactly the case for hand built Slowly Changing Dimension logic.

A typical hand built SCD load works like this:

  1. The dimension table carries a hash column that summarizes its changing attributes.
  2. The ETL package builds a view of the source rows and computes the same hash.
  3. The package looks up the dimension hash and routes each row to one of three streams.
    • No matching hash means the record is new and must be inserted.
    • A matching key with a different hash means the changing attributes have changed, so the existing row is expired and a new row is inserted.
    • A matching hash means nothing has changed and the row is dropped.

A single Conditional Split fans the buffer into all three of those paths.

Defining a Split in Biml

The 'ConditionalSplit' element holds an 'OutputPaths' block with one 'OutputPath' per non default branch. The 'Expression' element on each path holds the SSIS expression that selects rows for that path. The default output is implicit; downstream consumers reference it by appending '.Default' to the transformation name.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="SalesSource" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2019;Integrated Security=SSPI;Initial Catalog=SalesAnalytics" />
<OleDbConnection Name="StagingDb" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2019;Integrated Security=SSPI;Initial Catalog=Staging" />
</Connections>
<Packages>
<Package Name="SplitByLastInitial" ConstraintMode="Linear">
<Tasks>
<Dataflow Name="RouteCustomers">
<Transformations>

<OleDbSource Name="ReadCustomers" ConnectionName="SalesSource">
<DirectInput>
select
CustomerCode as customer_code,
GivenName as first_name,
FamilyName as last_name
from sales.Customers
</DirectInput>
</OleDbSource>

<ConditionalSplit Name="SplitOnInitial">
<OutputPaths>
<OutputPath Name="StartsWithA" IsErrorOutput="false">
<Expression>LEFT(last_name,1)=="A"</Expression>
</OutputPath>
<OutputPath Name="StartsWithB" IsErrorOutput="false">
<Expression>LEFT(last_name,1)=="B"</Expression>
</OutputPath>
</OutputPaths>
</ConditionalSplit>

<OleDbDestination Name="WriteDefault" ConnectionName="StagingDb">
<InputPath OutputPathName="SplitOnInitial.Default" />
<ExternalTableOutput Table="stg.Customers" />
</OleDbDestination>

<OleDbDestination Name="WriteA" ConnectionName="StagingDb">
<InputPath OutputPathName="SplitOnInitial.StartsWithA" />
<ExternalTableOutput Table="stg.CustomersA" />
</OleDbDestination>

<OleDbDestination Name="WriteB" ConnectionName="StagingDb">
<InputPath OutputPathName="SplitOnInitial.StartsWithB" />
<ExternalTableOutput Table="stg.CustomersB" />
</OleDbDestination>

</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>

XML Escapes for Comparison Operators

SSIS expressions use C style operators, which means equality is the double equals symbol. That part causes no XML trouble. The greater than, less than, and ampersand characters do, because XML parses them as markup. Substitute the entity references when an expression needs them: '>' for greater than, '<' for less than, and '&' for ampersand.

The example below splits last names into A through L on one path and M onwards on another. The first path uses a compound expression that requires both an escaped less than and an escaped ampersand:

<ConditionalSplit Name="SplitByRange">
<OutputPaths>
<OutputPath Name="ThroughL" IsErrorOutput="false">
<Expression>LEFT(last_name,1)&gt;="A" &amp;&amp; LEFT(last_name,1)&lt;"M"</Expression>
</OutputPath>
<OutputPath Name="FromM" IsErrorOutput="false">
<Expression>LEFT(last_name,1)&gt;="M"</Expression>
</OutputPath>
</OutputPaths>
</ConditionalSplit>

The compiler converts the entity references back to the literal characters when it emits the .dtsx, so the saved expressions read normally inside the package designer.