Intro to XML
Since Biml is built on XML, understanding XML syntax is your first step toward automating ETL pipelines.
Why XML Matters for Data Engineers
XML isn't just academic knowledge for Biml developers:
- SSIS packages are XML files (.dtsx)
- Biml files are XML that generates those packages
- Azure Data Factory (ADF) pipelines are defined in JSON, which uses a similar nested structure of named objects and properties
- Database schemas can be described in XML definitions and generated from them
Understanding XML means understanding what your tools actually produce. When you read a generated SSIS package or debug a Biml file, you're reading XML.
XML Fundamentals
Elements and Attributes
XML documents consist of elements (delimited by tags in angle brackets) and attributes (name-value properties written inside an element's start tag).
```xml
<Table Name="Customer" Schema="Sales">
  <Column Name="CustomerID" DataType="Int32" />
</Table>
```
- `<Table>` and `<Column>` are elements
- `Name="Customer"` and `DataType="Int32"` are attributes
- Elements can contain other elements (nesting)
- Attributes store simple values such as names, types, and lengths, always written as quoted text
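Elements and attributes must also follow XML's well-formedness rules, or an XML parser (including the one Biml uses) will reject the file. The fragment below is a sketch of the main rules using the same element names as above; the malformed variants are placed inside a comment so the fragment itself stays well-formed.

```xml
<!-- Well-formed: properly nested, consistently cased, every attribute value quoted -->
<Table Name="Customer" Schema="Sales">
  <Column Name="CustomerID" DataType="Int32" />
</Table>

<!-- Malformed variants, shown for contrast only:
     Unquoted attribute value:  <Column Name=CustomerID />
     Mismatched case:           <Table Name="Customer"></table>
     Overlapping tags:          <Table><Column></Table></Column>
     Duplicate attribute:       <Column Name="A" Name="B" />
-->
```

XML names are case-sensitive, so Biml's PascalCase element and attribute names (`Table`, `DataType`) must be typed exactly as the schema defines them.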
Self-Closing Elements
Elements without children can use self-closing syntax:
```xml
<!-- These are equivalent -->
<Column Name="CustomerID" DataType="Int32"></Column>
<Column Name="CustomerID" DataType="Int32" />
```
The self-closing form (`/>`) is more concise and is the form commonly used in Biml.
Hierarchy and Nesting
XML naturally represents hierarchical data. This maps directly to how Biml organizes database objects:
```xml
<!-- These collections sit directly inside the <Biml> root element -->
<Connections>
  <Connection Name="SourceDB" ConnectionString="..." />
</Connections>
<Databases>
  <Database Name="Sales" ConnectionName="SourceDB" />
</Databases>
<Schemas>
  <!-- A schema refers to its database by name -->
  <Schema Name="dbo" DatabaseName="Sales" />
</Schemas>
<Tables>
  <!-- SchemaName uses the Database.Schema form -->
  <Table Name="Customer" SchemaName="Sales.dbo">
    <Columns>
      <Column Name="CustomerID" DataType="Int32" IsNullable="false" />
      <Column Name="Name" DataType="String" Length="100" />
      <Column Name="Email" DataType="String" Length="255" />
    </Columns>
  </Table>
</Tables>
```
Notice the pattern: collections use plural names (`<Tables>`, `<Columns>`) that contain singular items (`<Table>`, `<Column>`). This is a Biml convention you'll see throughout.
Text Content
Elements can contain text directly. In Biml, this is commonly used for SQL queries:
```xml
<ExecuteSQL Name="GetCustomerCount" ConnectionName="SourceDB">
  <DirectInput>SELECT COUNT(*) FROM Sales.Customer</DirectInput>
</ExecuteSQL>
```
Comments
Use comments to document your code or temporarily disable sections:
```xml
<Tables>
  <!-- Production tables -->
  <Table Name="Customer" SchemaName="Sales.dbo">
    <Columns>
      <Column Name="CustomerID" DataType="Int32" />
    </Columns>
  </Table>
  <!-- Staging table - uncomment when ready
  <Table Name="Customer_Staging" SchemaName="stg.dbo">
    <Columns>
      <Column Name="CustomerID" DataType="Int32" />
    </Columns>
  </Table>
  -->
</Tables>
```
Comments begin with `<!--` and end with `-->`. They cannot be nested, and the double hyphen `--` is not allowed anywhere inside a comment.
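Nesting fails because the parser does not track comment depth. The sketch below is deliberately malformed: it tries to disable a table that already contains a comment. The inner `<!--` contains the forbidden `--` sequence, and the first `-->` would end the outer comment early, leaving `</Columns>`, `</Table>`, and the final `-->` as stray markup.

```xml
<Tables>
  <!-- Staging table - uncomment when ready
  <Table Name="Customer_Staging" SchemaName="stg.dbo">
    <Columns>
      <!-- Business key -->
      <Column Name="CustomerID" DataType="Int32" />
    </Columns>
  </Table>
  -->
</Tables>
```

To disable a block like this, delete the inner comment (or move its text outside the block) before wrapping the block in a new comment.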
Special Characters and Escaping
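XML reserves certain characters in text content: `<` and `&` must always be escaped, while `>` rarely needs escaping but is always safe to escape. When your SQL queries contain `<`, `>`, or `&`, you have two options: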
Option 1: Escape Individual Characters
| Character | Escape Sequence |
|---|---|
| `<` | `&lt;` |
| `>` | `&gt;` |
| `&` | `&amp;` |
| `"` | `&quot;` |
| `'` | `&apos;` |

```xml
<DirectInput>SELECT * FROM Orders WHERE Amount &lt; 1000</DirectInput>
```
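Escaping works for a single comparison, but it gets noisy as conditions accumulate. For comparison, here is the query from the CDATA example below written with escape sequences only; note that the SQL not-equal operator `<>` becomes `&lt;&gt;`.

```xml
<ExecuteSQL Name="GetRecentSmallOrders" ConnectionName="SourceDB">
  <!-- Same query as the CDATA version, with < and > escaped by hand -->
  <DirectInput>
    SELECT OrderID, CustomerID, Amount
    FROM Sales.Orders
    WHERE OrderDate &gt; '2024-01-01'
      AND Amount &lt; 1000
      AND Status &lt;&gt; 'Cancelled'
  </DirectInput>
</ExecuteSQL>
```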
Option 2: CDATA Blocks (Recommended for SQL)
For SQL queries with multiple special characters, wrap the content in CDATA:
```xml
<ExecuteSQL Name="GetRecentSmallOrders" ConnectionName="SourceDB">
  <DirectInput><![CDATA[
    SELECT OrderID, CustomerID, Amount
    FROM Sales.Orders
    WHERE OrderDate > '2024-01-01'
      AND Amount < 1000
      AND Status <> 'Cancelled'
  ]]></DirectInput>
</ExecuteSQL>
```
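CDATA blocks (`<![CDATA[ ... ]]>`) tell the XML parser to treat everything inside as literal text, so `<`, `>`, and `&` need no escaping. The only sequence a CDATA block cannot contain is its own terminator, `]]>`. This is the preferred approach for SQL in Biml.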
XML Namespaces: The Biml Declaration
Every Biml file has a root `<Biml>` element that declares the Biml namespace:
```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <!-- Your Biml content here -->
</Biml>
```
The `xmlns` attribute tells tools that this XML follows the Biml schema. This enables:
- Validation: Catch errors before building
- IntelliSense: Auto-completion in editors
- Documentation: Tools understand the structure
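Schema validation goes a step beyond well-formedness. The sketch below is well-formed XML, but it misspells `DataType` as `DataTpye`. A plain XML parser would accept it; assuming your editor or build validates against the Biml schema, the unknown attribute would typically be reported as an error.

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Tables>
    <Table Name="Customer" SchemaName="Sales.dbo">
      <Columns>
        <!-- Well-formed, but DataTpye is not a Biml attribute (intentional typo) -->
        <Column Name="CustomerID" DataTpye="Int32" />
      </Columns>
    </Table>
  </Tables>
</Biml>
```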
From XML to Biml: The Pattern
Notice how generic XML patterns map directly to Biml:
Generic XML:
```xml
<root>
  <collection>
    <item name="Example" property="Value" />
  </collection>
</root>
```
Same pattern in Biml:
```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <Package Name="LoadCustomers" ConstraintMode="Linear" />
  </Packages>
</Biml>
```
The structure is identical. Once you understand XML, you understand Biml's organization.
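The same root, collection, item pattern repeats at every level. As a sketch, here is the `LoadCustomers` package from above expanded into a complete file; the `SourceDB` connection and `GetCustomerCount` task are reused from earlier examples on this page, and the connection string stays elided (`...`) as in the original, so you would supply your own.

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <!-- Collection of connections, each a singular <Connection> item -->
  <Connections>
    <Connection Name="SourceDB" ConnectionString="..." />
  </Connections>
  <!-- Collection of packages -->
  <Packages>
    <Package Name="LoadCustomers" ConstraintMode="Linear">
      <!-- Inside a package, tasks follow the same collection/item pattern -->
      <Tasks>
        <ExecuteSQL Name="GetCustomerCount" ConnectionName="SourceDB">
          <DirectInput>SELECT COUNT(*) FROM Sales.Customer</DirectInput>
        </ExecuteSQL>
      </Tasks>
    </Package>
  </Packages>
</Biml>
```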
Quick Reference
| XML Concept | Biml Example | Purpose |
|---|---|---|
| Element | `<Package>`, `<Table>`, `<Column>` | Define objects |
| Attribute | `Name="Customer"`, `DataType="Int32"` | Set properties |
| Child elements | `<Columns>` inside `<Table>` | Nested objects |
| Collection wrapper | `<Tables>`, `<Packages>`, `<Connections>` | Group related items |
| Self-closing | `<Column Name="ID" DataType="Int32" />` | Elements without children |
| Comment | `<!-- Load staging tables -->` | Documentation |
| CDATA | `<![CDATA[SELECT * FROM...]]>` | SQL with special characters |
| Namespace | `xmlns="http://schemas.varigence.com/biml.xsd"` | Identifies file as Biml |
Next Steps
Now that you understand XML structure, you're ready to write Biml:
- What is Biml - Overview of Biml capabilities
- Biml Basics - Your first Biml file and SSIS package
- Biml Basics for Relational DBs - Tables, columns, and schemas