Intro to XML

Since Biml is built on XML, understanding XML syntax is your first step toward automating ETL pipelines.

Why XML Matters for Data Engineers

XML isn't just academic knowledge for Biml developers:

  • SSIS packages are XML files (.dtsx)
  • Biml files are XML that generates those packages
  • ADF pipelines use JSON (XML's cousin) with similar structure
  • Database schemas can be represented in, and generated from, XML definitions

Understanding XML means understanding what your tools actually produce. When you read a generated SSIS package or debug a Biml file, you're reading XML.

XML Fundamentals

Elements and Attributes

XML documents consist of elements (written as tags in angle brackets) and attributes (name="value" pairs placed inside an element's opening tag).

<Table Name="Customer" Schema="Sales">
    <Column Name="CustomerID" DataType="Int32" />
</Table>

  • <Table> and <Column> are elements
  • Name="Customer" and DataType="Int32" are attributes
  • Elements can contain other elements (nesting)
  • Attributes store simple values (strings, numbers), always written in quotes
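
To see these terms from a parser's point of view, here is a minimal sketch that reads the fragment above. It assumes Python's standard-library xml.etree.ElementTree, which is not part of Biml and is used here only as a convenient generic XML parser.

```python
import xml.etree.ElementTree as ET

# The fragment from this section, held in a Python string.
xml_text = """<Table Name="Customer" Schema="Sales">
    <Column Name="CustomerID" DataType="Int32" />
</Table>"""

table = ET.fromstring(xml_text)

# An element exposes its tag name and a dictionary of its attributes.
print(table.tag)     # Table
print(table.attrib)  # {'Name': 'Customer', 'Schema': 'Sales'}

# Nested (child) elements are reached by iterating over the parent.
for column in table:
    # Attribute values always arrive as strings, even "Int32".
    print(column.tag, column.get("Name"), column.get("DataType"))
```

The parser hands back every attribute value as text; deciding that "Int32" names a data type is the job of the tool that consumes the XML.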

Self-Closing Elements

Elements without children can use self-closing syntax:

<!-- These are equivalent -->
<Column Name="CustomerID" DataType="Int32"></Column>
<Column Name="CustomerID" DataType="Int32" />

The self-closing form (/>) is cleaner and commonly used in Biml.
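
The equivalence of the two forms can be checked with a generic parser. This is a small sketch using Python's standard-library xml.etree.ElementTree (an assumption for illustration, not a Biml tool); the helper name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

explicit = ET.fromstring('<Column Name="CustomerID" DataType="Int32"></Column>')
self_closing = ET.fromstring('<Column Name="CustomerID" DataType="Int32" />')


def describe(element):
    """Summarize what the parser kept: tag, attributes, text, child count."""
    return (element.tag, dict(element.attrib), element.text or "", len(element))


# Both spellings produce the same element once parsed.
print(describe(explicit) == describe(self_closing))  # True
```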

Hierarchy and Nesting

XML naturally represents hierarchical data. This maps directly to how Biml organizes database objects:

<Connections>
    <Connection Name="SourceDB" ConnectionString="..." />
</Connections>
<Databases>
    <Database Name="Sales" ConnectionName="SourceDB">
        <Schemas>
            <Schema Name="dbo" />
        </Schemas>
    </Database>
</Databases>
<Tables>
    <Table Name="Customer" SchemaName="Sales.dbo">
        <Columns>
            <Column Name="CustomerID" DataType="Int32" IsNullable="false" />
            <Column Name="Name" DataType="String" Length="100" />
            <Column Name="Email" DataType="String" Length="255" />
        </Columns>
    </Table>
</Tables>

Notice the pattern: collections use plural names (<Tables>, <Columns>) containing singular items (<Table>, <Column>). This is a Biml convention you'll see throughout.
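
The plural-wrapper pattern also makes the hierarchy easy to walk programmatically. The sketch below is illustrative only: it assumes Python's standard-library xml.etree.ElementTree, and it wraps the tables in a bare <Biml> root without a namespace purely to keep the path expressions short.

```python
import xml.etree.ElementTree as ET

# Simplified document: one root element, namespace omitted for brevity.
biml_text = """<Biml>
    <Tables>
        <Table Name="Customer" SchemaName="Sales.dbo">
            <Columns>
                <Column Name="CustomerID" DataType="Int32" IsNullable="false" />
                <Column Name="Name" DataType="String" Length="100" />
                <Column Name="Email" DataType="String" Length="255" />
            </Columns>
        </Table>
    </Tables>
</Biml>"""

root = ET.fromstring(biml_text)

# Each path step is "collection/item": Tables/Table, then Columns/Column.
columns_by_table = {
    table.get("Name"): [col.get("Name") for col in table.findall("Columns/Column")]
    for table in root.findall("Tables/Table")
}

print(columns_by_table)  # {'Customer': ['CustomerID', 'Name', 'Email']}
```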

Text Content

Elements can contain text directly. In Biml, this is commonly used for SQL queries:

<ExecuteSQL Name="GetCustomerCount" ConnectionName="SourceDB">
    <DirectInput>SELECT COUNT(*) FROM Sales.Customer</DirectInput>
</ExecuteSQL>

Comments

Use comments to document your code or temporarily disable sections:

<Tables>
    <!-- Production tables -->
    <Table Name="Customer" SchemaName="Sales.dbo">
        <Columns>
            <Column Name="CustomerID" DataType="Int32" />
        </Columns>
    </Table>

    <!-- Staging table - uncomment when ready
    <Table Name="Customer_Staging" SchemaName="stg.dbo">
        <Columns>
            <Column Name="CustomerID" DataType="Int32" />
        </Columns>
    </Table>
    -->
</Tables>

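Comments begin with <!-- and end with -->. They cannot be nested, and the text inside a comment cannot contain a double hyphen (--), so take care when commenting out a section that includes SQL line comments.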
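
A parser skips commented-out markup entirely, which is why this works for temporarily disabling sections. The following sketch illustrates that with Python's standard-library xml.etree.ElementTree (an assumption for demonstration; Biml tooling uses its own parser), with the tables shortened to self-closing elements.

```python
import xml.etree.ElementTree as ET

tables_text = """<Tables>
    <!-- Production tables -->
    <Table Name="Customer" SchemaName="Sales.dbo" />

    <!-- Staging table - uncomment when ready
    <Table Name="Customer_Staging" SchemaName="stg.dbo" />
    -->
</Tables>"""

tables = ET.fromstring(tables_text)

# Only the table outside the comment is visible to the parser.
visible_tables = [t.get("Name") for t in tables.findall("Table")]
print(visible_tables)  # ['Customer']
```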

Special Characters and Escaping

XML reserves certain characters. When your SQL queries contain <, >, or &, you have two options:

Option 1: Escape Individual Characters

Character    Escape Sequence
<            &lt;
>            &gt;
&            &amp;
"            &quot;

<DirectInput>SELECT * FROM Orders WHERE Amount &lt; 1000</DirectInput>

Option 2: Wrap the Content in CDATA

For SQL queries with multiple special characters, wrap the content in CDATA:

<ExecuteSQL Name="GetRecentSmallOrders" ConnectionName="SourceDB">
    <DirectInput><![CDATA[
        SELECT OrderID, CustomerID, Amount
        FROM Sales.Orders
        WHERE OrderDate > '2024-01-01'
          AND Amount < 1000
          AND Status <> 'Cancelled'
    ]]></DirectInput>
</ExecuteSQL>

CDATA blocks (<![CDATA[ ... ]]>) tell the XML parser to treat everything inside as literal text. This is the preferred approach for SQL in Biml.
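
Both options deliver the same text to whatever reads the file; they differ only in how the source looks. The sketch below checks this with Python's standard library (xml.etree.ElementTree and xml.sax.saxutils.escape), used here as an assumed stand-in for a generic XML parser. The query string is a made-up example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

sql = "SELECT * FROM Orders WHERE Amount < 1000 AND Status <> 'Cancelled'"

# Option 1: escape each reserved character (escape() handles &, < and >).
escaped_form = f"<DirectInput>{escape(sql)}</DirectInput>"

# Option 2: leave the SQL untouched inside a CDATA block.
cdata_form = f"<DirectInput><![CDATA[{sql}]]></DirectInput>"

print(escaped_form)
# <DirectInput>SELECT * FROM Orders WHERE Amount &lt; 1000 AND Status &lt;&gt; 'Cancelled'</DirectInput>

# After parsing, both forms yield the original SQL text.
print(ET.fromstring(escaped_form).text == sql)  # True
print(ET.fromstring(cdata_form).text == sql)    # True
```

The escaped form is harder to read as the query grows, which is the practical argument for CDATA.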

XML Namespaces: The Biml Declaration

Every Biml file starts with a root <Biml> element that carries a namespace declaration:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <!-- Your Biml content here -->
</Biml>

The xmlns attribute tells tools that this XML follows the Biml schema. This enables:

  • Validation: Catch errors before building
  • Intellisense: Auto-complete in editors
  • Documentation: Tools understand the structure
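
The namespace is more than a label: a namespace-aware parser treats it as part of every element's name. Here is a minimal sketch of that effect, assuming Python's standard-library xml.etree.ElementTree; the prefix "b" is an arbitrary choice made for this example.

```python
import xml.etree.ElementTree as ET

BIML_NS = "http://schemas.varigence.com/biml.xsd"

biml_text = f"""<Biml xmlns="{BIML_NS}">
    <Packages>
        <Package Name="LoadCustomers" ConstraintMode="Linear" />
    </Packages>
</Biml>"""

root = ET.fromstring(biml_text)

# The default namespace becomes part of each element's qualified name.
print(root.tag)  # {http://schemas.varigence.com/biml.xsd}Biml

# A search that ignores the namespace therefore finds nothing.
print(root.findall("Packages/Package"))  # []

# Mapping a prefix to the namespace makes the elements reachable.
ns = {"b": BIML_NS}
package_names = [p.get("Name") for p in root.findall("b:Packages/b:Package", ns)]
print(package_names)  # ['LoadCustomers']
```

Attributes such as Name carry no prefix and are not placed in the default namespace, so they are read by their plain names.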

From XML to Biml: The Pattern

Notice how generic XML patterns map directly to Biml:

Generic XML:

<root>
    <collection>
        <item name="Example" property="Value" />
    </collection>
</root>

Same pattern in Biml:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <Package Name="LoadCustomers" ConstraintMode="Linear" />
    </Packages>
</Biml>

The structure is identical. Once you understand XML, you understand Biml's organization.

Quick Reference

XML Concept         Biml Example                                   Purpose
Element             <Package>, <Table>, <Column>                   Define objects
Attribute           Name="Customer", DataType="Int32"              Set properties
Child elements      <Columns> inside <Table>                       Nested objects
Collection wrapper  <Tables>, <Packages>, <Connections>            Group related items
Self-closing        <Column Name="ID" DataType="Int32" />          Elements without children
Comment             <!-- Load staging tables -->                   Documentation
CDATA               <![CDATA[SELECT * FROM...]]>                   SQL with special characters
Namespace           xmlns="http://schemas.varigence.com/biml.xsd"  Identifies file as Biml

Next Steps

Now that you understand XML structure, you're ready to write Biml: