LINQ for BimlScript

LINQ (Language Integrated Query) is a C# feature that lets you query and transform collections using readable, chainable methods. For BimlScript, LINQ is essential for filtering tables, selecting columns, grouping metadata, and building dynamic ETL pipelines.

This guide extends the C# Primer LINQ fundamentals with advanced methods, performance optimization, and real-world patterns. If you're new to LINQ, start with the primer first.

Required import:

<#@ import namespace="System.Linq" #>

Quick Reference

All commonly used LINQ methods with SQL equivalents and Biml examples:

Filtering and Selection

Method	Purpose	SQL Equivalent	Example
`.Where()`	Filter items	`WHERE`	`tables.Where(t => t.Schema.Name == "Sales")`
`.Select()`	Transform/project	`SELECT`	`tables.Select(t => t.Name)`
`.SelectMany()`	Flatten nested	Subquery/Cross join	`tables.SelectMany(t => t.Columns)`
`.Distinct()`	Remove duplicates	`DISTINCT`	`tables.Select(t => t.Schema.Name).Distinct()`
`.Take()`	Limit results	`TOP n`	`tables.Take(10)`
`.Skip()`	Skip results	`OFFSET`	`tables.Skip(5).Take(10)`

Finding Items

Method	Purpose	SQL Equivalent	Example
`.First()`	Get first (throws if empty)	`TOP 1`	`tables.First()`
`.FirstOrDefault()`	Get first or null	`TOP 1`	`tables.FirstOrDefault(t => t.Name == "Customer")`
`.Single()`	Get only one (throws if not exactly one)	Expects 1 row	`tables.Single(t => t.Name == "Config")`
`.SingleOrDefault()`	Get only one or null	Expects 0-1 rows	`tables.SingleOrDefault(t => t.Name == "Config")`
`.Last()`	Get last item	N/A	`tables.OrderBy(t => t.Name).Last()`
`.ElementAt()`	Get by index	N/A	`tables.ElementAt(5)`

Boolean Checks

Method	Purpose	SQL Equivalent	Example
`.Any()`	Check if any exist	`EXISTS`	`table.Columns.Any(c => c.IsIdentity)`
`.All()`	Check all match	`NOT EXISTS (inverse)`	`table.Columns.All(c => c.IsNullable)`
`.Contains()`	Check if item exists	`IN`	`schemaList.Contains(t.Schema.Name)`

Sorting

Method	Purpose	SQL Equivalent	Example
`.OrderBy()`	Sort ascending	`ORDER BY ASC`	`tables.OrderBy(t => t.Name)`
`.OrderByDescending()`	Sort descending	`ORDER BY DESC`	`tables.OrderByDescending(t => t.Columns.Count)`
`.ThenBy()`	Secondary sort	`ORDER BY x, y`	`tables.OrderBy(t => t.Schema.Name).ThenBy(t => t.Name)`
`.ThenByDescending()`	Secondary desc sort	`ORDER BY x, y DESC`	`tables.OrderBy(t => t.Schema.Name).ThenByDescending(t => t.Name)`
`.Reverse()`	Reverse order	N/A	`tables.OrderBy(t => t.Name).Reverse()`

Aggregation

Method	Purpose	SQL Equivalent	Example
`.Count()`	Count items	`COUNT(*)`	`table.Columns.Count()`
`.Sum()`	Sum values	`SUM()`	`tables.Sum(t => t.Columns.Count)`
`.Max()`	Find maximum	`MAX()`	`tables.Max(t => t.Columns.Count)`
`.Min()`	Find minimum	`MIN()`	`tables.Min(t => t.Columns.Count)`
`.Average()`	Calculate average	`AVG()`	`tables.Average(t => t.Columns.Count)`
`.Aggregate()`	Custom aggregation	N/A	`names.Aggregate((a, b) => a + ", " + b)`

Grouping

Method	Purpose	SQL Equivalent	Example
`.GroupBy()`	Group items	`GROUP BY`	`tables.GroupBy(t => t.Schema.Name)`
`.ToLookup()`	Create grouped dictionary	`GROUP BY` (cached)	`tables.ToLookup(t => t.Schema.Name)`

Joining and Combining

Method	Purpose	SQL Equivalent	Example
`.Join()`	Inner join	`INNER JOIN`	See Joining section
`.GroupJoin()`	Left outer join	`LEFT JOIN`	See Joining section
`.Concat()`	Combine lists	`UNION ALL`	`list1.Concat(list2)`
`.Union()`	Combine unique	`UNION`	`list1.Union(list2)`
`.Intersect()`	Common items	`INTERSECT`	`list1.Intersect(list2)`
`.Except()`	Difference	`EXCEPT`	`list1.Except(list2)`
`.Zip()`	Pair items	N/A	`list1.Zip(list2, (a, b) => a + b)`

Conversion

Method	Purpose	When to Use
`.ToList()`	Convert to List	Materialize for reuse
`.ToArray()`	Convert to array	Fixed-size collection
`.ToDictionary()`	Create key-value lookup	Fast lookup by key
`.ToLookup()`	Create grouped lookup	Multiple values per key
`.Cast<T>()`	Cast elements	Change element type
`.OfType<T>()`	Filter by type	Get only matching types

Filtering Patterns

Multiple Conditions

Combine conditions with && (AND) and || (OR):

// AND: Tables in Sales schema that aren't views
var salesTables = RootNode.Tables
    .Where(t => t.Schema.Name == "Sales" && !t.Name.StartsWith("v"));

// OR: Tables in Sales or Production schemas
var targetTables = RootNode.Tables
    .Where(t => t.Schema.Name == "Sales" || t.Schema.Name == "Production");

// Complex: Chained Where (equivalent to AND)
var filteredTables = RootNode.Tables
    .Where(t => t.Schema.Name == "Sales")
    .Where(t => t.Columns.Count > 5)
    .Where(t => !t.Name.Contains("Archive"));

Filtering with External Lists

Use .Contains() to filter against a list of values:

// Include specific schemas
var includedSchemas = new List<string> { "Sales", "Production", "Person" };
var targetTables = RootNode.Tables
    .Where(t => includedSchemas.Contains(t.Schema.Name));

// Exclude specific tables
var excludedTables = new List<string> { "sysdiagrams", "ErrorLog", "BuildVersion" };
var cleanTables = RootNode.Tables
    .Where(t => !excludedTables.Contains(t.Name));

String Filtering Methods

// Tables starting with "dim"
var dimensions = RootNode.Tables.Where(t => t.Name.StartsWith("dim"));

// Tables ending with "History"
var historyTables = RootNode.Tables.Where(t => t.Name.EndsWith("History"));

// Tables containing "Customer"
var customerTables = RootNode.Tables.Where(t => t.Name.Contains("Customer"));

// Case-insensitive comparison
var matches = RootNode.Tables
    .Where(t => t.Name.Equals("customer", StringComparison.OrdinalIgnoreCase));

Nested Collection Filtering

Filter parent objects based on child collection properties:

// Tables that have at least one identity column
var tablesWithIdentity = RootNode.Tables
    .Where(t => t.Columns.Any(c => c.IsIdentity));

// Tables where ALL columns are nullable
var fullyNullableTables = RootNode.Tables
    .Where(t => t.Columns.All(c => c.IsNullable));

// Tables with more than 20 columns
var wideTables = RootNode.Tables
    .Where(t => t.Columns.Count > 20);

// Tables containing a datetime column
var tablesWithDates = RootNode.Tables
    .Where(t => t.Columns.Any(c =>
        c.DataType == DbType.DateTime || c.DataType == DbType.DateTime2));

Null-Safe Filtering

Use the null-conditional operator ?. to avoid null reference exceptions:

// Safe schema access (schema might be null)
var salesTables = RootNode.Tables
    .Where(t => t.Schema?.Name == "Sales");

// Safe annotation access
var taggedTables = RootNode.Tables
    .Where(t => t.GetTag("LoadType")?.ToLower() == "incremental");

// Null coalescing with default value
var tables = RootNode.Tables
    .Where(t => (t.GetTag("Priority") ?? "Normal") == "High");

Grouping and Aggregation

Basic Grouping

.GroupBy() returns groups where each group has a Key and contains matching items:

// Group tables by schema
var tablesBySchema = RootNode.Tables.GroupBy(t => t.Schema.Name);

foreach (var group in tablesBySchema)
{
    // group.Key = "Sales", "Production", etc.
    // group = IEnumerable<AstTableNode> containing tables in that schema

    var schemaName = group.Key;
    var tableCount = group.Count();

    foreach (var table in group)
    {
        // Process each table in this schema
    }
}

Grouping with Projection

Transform groups into custom objects:

// Get schema statistics
var schemaStats = RootNode.Tables
    .GroupBy(t => t.Schema.Name)
    .Select(g => new {
        Schema = g.Key,
        TableCount = g.Count(),
        TotalColumns = g.Sum(t => t.Columns.Count),
        AvgColumns = g.Average(t => t.Columns.Count)
    });

foreach (var stat in schemaStats)
{
    // stat.Schema, stat.TableCount, stat.TotalColumns, stat.AvgColumns
}

Generate Packages by Group

Common pattern: create one master package per schema:

<# foreach (var schemaGroup in RootNode.Tables.GroupBy(t => t.Schema.Name)) { #>
<Package Name="Master_<#= schemaGroup.Key #>" ConstraintMode="Parallel">
  <Tasks>
    <# foreach (var table in schemaGroup) { #>
    <ExecutePackage Name="Load_<#= table.Name #>">
      <Package PackageName="Load_<#= table.Name #>" />
    </ExecutePackage>
    <# } #>
  </Tasks>
</Package>
<# } #>

Aggregation Examples

// Count all tables
var totalTables = RootNode.Tables.Count();

// Count tables matching condition
var salesTableCount = RootNode.Tables.Count(t => t.Schema.Name == "Sales");

// Sum of all columns across all tables
var totalColumns = RootNode.Tables.Sum(t => t.Columns.Count);

// Find table with most columns
var maxColumns = RootNode.Tables.Max(t => t.Columns.Count);
var widestTable = RootNode.Tables.First(t => t.Columns.Count == maxColumns);

// Average columns per table
var avgColumns = RootNode.Tables.Average(t => t.Columns.Count);

// Build comma-separated list
var tableNames = RootNode.Tables.Select(t => t.Name);
var nameList = string.Join(", ", tableNames);
// Or using Aggregate:
var nameList = tableNames.Aggregate((current, next) => current + ", " + next);

Working with Nested Collections

Flattening with SelectMany

.SelectMany() flattens nested collections into a single sequence:

// Get ALL columns from ALL tables (flattened)
var allColumns = RootNode.Tables.SelectMany(t => t.Columns);

// Now you can query across all columns
var identityColumns = allColumns.Where(c => c.IsIdentity);
var stringColumns = allColumns.Where(c => c.DataType == DbType.String);

SelectMany with Parent Reference

Include parent information when flattening:

// Get columns with their table name
var columnsWithTable = RootNode.Tables
    .SelectMany(t => t.Columns, (table, column) => new {
        TableName = table.Name,
        SchemaName = table.Schema.Name,
        ColumnName = column.Name,
        DataType = column.DataType
    });

// Find all columns named "ModifiedDate" and their tables
var modifiedDateColumns = RootNode.Tables
    .SelectMany(t => t.Columns, (table, column) => new { Table = table, Column = column })
    .Where(x => x.Column.Name == "ModifiedDate");

foreach (var item in modifiedDateColumns)
{
    // item.Table.Name, item.Column.DataType
}

Cross-Table Analysis

// Get column data type distribution across entire model
var columnsByType = RootNode.Tables
    .SelectMany(t => t.Columns)
    .GroupBy(c => c.DataType)
    .Select(g => new { DataType = g.Key, Count = g.Count() })
    .OrderByDescending(x => x.Count);

// Find duplicate column names across tables
var duplicateColumns = RootNode.Tables
    .SelectMany(t => t.Columns)
    .GroupBy(c => c.Name)
    .Where(g => g.Count() > 1)
    .Select(g => new { ColumnName = g.Key, OccurrenceCount = g.Count() });

Joining Collections

Join with External Configuration

Use .Join() to combine Biml metadata with external configuration:

// External configuration dictionary
var loadConfig = new Dictionary<string, string>
{
    { "Customer", "SCD2" },
    { "Product", "SCD1" },
    { "Order", "Append" },
    { "OrderDetail", "Append" }
};

// Join tables to configuration
var tablesWithConfig = RootNode.Tables
    .Join(
        loadConfig,                           // Inner collection
        table => table.Name,                  // Outer key selector
        config => config.Key,                 // Inner key selector
        (table, config) => new {              // Result selector
            Table = table,
            LoadType = config.Value
        }
    );

foreach (var item in tablesWithConfig)
{
    // item.Table = AstTableNode
    // item.LoadType = "SCD2", "SCD1", "Append"
}

Left Join with GroupJoin

Include all items from the left collection, even without matches:

// All tables, with config if available
var tablesWithOptionalConfig = RootNode.Tables
    .GroupJoin(
        loadConfig,
        table => table.Name,
        config => config.Key,
        (table, configs) => new {
            Table = table,
            LoadType = configs.FirstOrDefault().Value ?? "Default"
        }
    );

Join Two Biml Collections

// Join tables to connections via annotation
var tableConnections = RootNode.Tables
    .Join(
        RootNode.Connections,
        table => table.GetTag("ConnectionName"),
        conn => conn.Name,
        (table, conn) => new { Table = table, Connection = conn }
    );

Set Operations

Combining Collections

// UNION ALL: Combine with duplicates
var allTables = salesTables.Concat(productionTables);

// UNION: Combine without duplicates
var uniqueTables = salesTables.Union(productionTables);

// Combine from multiple schemas
var targetTables = RootNode.Tables.Where(t => t.Schema.Name == "Sales")
    .Union(RootNode.Tables.Where(t => t.Schema.Name == "Production"))
    .Union(RootNode.Tables.Where(t => t.Schema.Name == "Person"));

Finding Differences

// Tables in source but not in target (by name)
var sourceTableNames = sourceTables.Select(t => t.Name);
var targetTableNames = targetTables.Select(t => t.Name);

var newTables = sourceTableNames.Except(targetTableNames);        // In source only
var removedTables = targetTableNames.Except(sourceTableNames);    // In target only
var commonTables = sourceTableNames.Intersect(targetTableNames);  // In both

Practical Example: Schema Comparison

// Compare columns between source and staging
var sourceColumns = sourceTables
    .SelectMany(t => t.Columns)
    .Select(c => $"{c.Table.Name}.{c.Name}");

var stagingColumns = stagingTables
    .SelectMany(t => t.Columns)
    .Select(c => $"{c.Table.Name}.{c.Name}");

var missingInStaging = sourceColumns.Except(stagingColumns).ToList();
var extraInStaging = stagingColumns.Except(sourceColumns).ToList();

Deferred Execution and Performance

Understanding Deferred Execution

LINQ queries don't execute immediately. They execute when you enumerate the results:

// Query is DEFINED but NOT executed yet
var query = RootNode.Tables.Where(t => t.Schema.Name == "Sales");

// Query executes NOW (first enumeration)
var count = query.Count();

// Query executes AGAIN (second enumeration)
foreach (var table in query) { }

// Query executes AGAIN (third enumeration)
var list = query.ToList();

When to Materialize

Use .ToList() to execute once and cache results:

// BAD: Query executes multiple times
var tables = RootNode.Tables.Where(t => t.Schema.Name == "Sales");
Console.WriteLine($"Count: {tables.Count()}");        // Executes
Console.WriteLine($"First: {tables.First().Name}");   // Executes again
foreach (var t in tables) { }                          // Executes again

// GOOD: Execute once, reuse cached list
var tables = RootNode.Tables.Where(t => t.Schema.Name == "Sales").ToList();
Console.WriteLine($"Count: {tables.Count}");          // Uses cached list
Console.WriteLine($"First: {tables.First().Name}");   // Uses cached list
foreach (var t in tables) { }                          // Uses cached list

Performance Guidelines

Scenario	Recommendation
Use results once	Don't materialize
Use results multiple times	Call `.ToList()`
Need fast key lookup	Use `.ToDictionary()`
Large metadata set + complex filter	Materialize early, filter cached list
Chained operations	Let LINQ optimize (don't materialize between steps)

// GOOD: Single pipeline, materializes once at the end
var result = RootNode.Tables
    .Where(t => t.Schema.Name == "Sales")
    .Where(t => t.Columns.Count > 5)
    .OrderBy(t => t.Name)
    .ToList();

// BAD: Materializes after each step (wastes memory)
var step1 = RootNode.Tables.Where(t => t.Schema.Name == "Sales").ToList();
var step2 = step1.Where(t => t.Columns.Count > 5).ToList();
var step3 = step2.OrderBy(t => t.Name).ToList();

Query Syntax vs Method Syntax

LINQ supports two syntaxes. Both compile to the same code.

Method Syntax (Recommended for BimlScript)

var salesTables = RootNode.Tables
    .Where(t => t.Schema.Name == "Sales")
    .OrderBy(t => t.Name)
    .Select(t => t.Name);

Query Syntax (SQL-like)

var salesTables = from t in RootNode.Tables
                  where t.Schema.Name == "Sales"
                  orderby t.Name
                  select t.Name;

Why Method Syntax for BimlScript?

More concise in code nuggets (less vertical space)
Easier to chain multiple operations
All methods available (some like .Take(), .Skip() have no query syntax)
Consistent style with other C# code in your templates

When Query Syntax Helps

Complex joins are sometimes clearer in query syntax:

// Query syntax for complex join
var result = from t in RootNode.Tables
             join c in loadConfig on t.Name equals c.Key
             where t.Schema.Name == "Sales"
             orderby t.Name
             select new { Table = t, LoadType = c.Value };

Common Patterns for ETL Automation

Pattern 1: Filter Tables for Staging

var includedSchemas = new List<string> { "Sales", "Production", "Person" };
var excludedTables = new List<string> { "sysdiagrams", "ErrorLog", "AWBuildVersion" };

var stagingTables = RootNode.Tables
    .Where(t => includedSchemas.Contains(t.Schema.Name))
    .Where(t => !excludedTables.Contains(t.Name))
    .Where(t => !t.Name.StartsWith("tmp_"))
    .Where(t => !t.Name.EndsWith("_Archive"))
    .OrderBy(t => t.Schema.Name)
    .ThenBy(t => t.Name)
    .ToList();

Pattern 2: Group Packages by Priority

// Assumes packages have Priority annotation: High, Normal, Low
var packageGroups = RootNode.Packages
    .GroupBy(p => p.GetTag("Priority") ?? "Normal")
    .OrderBy(g => g.Key == "High" ? 0 : g.Key == "Normal" ? 1 : 2);

foreach (var group in packageGroups)
{
    // Execute High priority first, then Normal, then Low
    foreach (var package in group)
    {
        // Generate ExecutePackage task
    }
}

Pattern 3: Find Tables Missing Audit Columns

var requiredAuditColumns = new[] { "LoadDateTime", "LoadBatchId" };

var tablesMissingAudit = RootNode.Tables
    .Where(t => !requiredAuditColumns.All(
        auditCol => t.Columns.Any(c => c.Name == auditCol)))
    .Select(t => new {
        Table = t.Name,
        Missing = requiredAuditColumns.Where(
            auditCol => !t.Columns.Any(c => c.Name == auditCol))
    })
    .ToList();

// Report tables that need audit columns added
foreach (var item in tablesMissingAudit)
{
    // item.Table, item.Missing
}

Pattern 4: Create Table-to-Connection Mapping

var tableConnections = RootNode.Tables
    .ToDictionary(
        t => t.Name,
        t => t.GetTag("SourceConnection") ?? "DefaultSource"
    );

// Fast lookup
var connectionName = tableConnections["Customer"]; // Returns connection name

Pattern 5: Generate Column Mappings

// Build source-to-target column mapping (handles name differences)
var columnMappings = RootNode.Tables
    .SelectMany(t => t.Columns, (table, column) => new {
        TableName = table.Name,
        SourceColumn = column.Name,
        TargetColumn = column.GetTag("TargetName") ?? column.Name,
        DataType = column.DataType
    })
    .ToList();

Pattern 6: Incremental Load Detection

// Find tables with date columns suitable for incremental loading
var incrementalCandidates = RootNode.Tables
    .Where(t => t.Columns.Any(c =>
        c.Name.Contains("Modified") ||
        c.Name.Contains("Updated") ||
        c.Name.Contains("Changed")))
    .Select(t => new {
        Table = t,
        DateColumn = t.Columns.First(c =>
            c.Name.Contains("Modified") ||
            c.Name.Contains("Updated") ||
            c.Name.Contains("Changed"))
    })
    .ToList();

Troubleshooting LINQ Errors

Common Errors and Fixes

Error Message	Cause	Solution
`Object reference not set to an instance`	Null value in chain	Use `?.` operator or add null check
`Sequence contains no elements`	`.First()` on empty collection	Use `.FirstOrDefault()` and check for null
`Sequence contains more than one element`	`.Single()` found multiple matches	Use `.First()` or add more specific filter
`Cannot implicitly convert type`	Lambda returns wrong type	Check lambda expression return type
`InvalidOperationException`	`.Single()` on empty or multiple	Use `.SingleOrDefault()`

Debugging LINQ Queries

// Add .ToList() and inspect count at each step
var step1 = RootNode.Tables.ToList();
// Check: step1.Count

var step2 = step1.Where(t => t.Schema.Name == "Sales").ToList();
// Check: step2.Count

var step3 = step2.Where(t => t.Columns.Any(c => c.IsIdentity)).ToList();
// Check: step3.Count - if 0, the filter is too restrictive

Safe Patterns

// UNSAFE: Throws if no match
var table = RootNode.Tables.First(t => t.Name == "MissingTable");

// SAFE: Returns null if no match
var table = RootNode.Tables.FirstOrDefault(t => t.Name == "MissingTable");
if (table != null)
{
    // Use table
}

// UNSAFE: Throws if schema is null
var tables = RootNode.Tables.Where(t => t.Schema.Name == "Sales");

// SAFE: Handles null schema
var tables = RootNode.Tables.Where(t => t.Schema?.Name == "Sales");

Next Steps

C# Primer - Variables, loops, error handling fundamentals
Introduction to BimlScript - Code nuggets, directives, and tiers
Common Patterns - Production-ready staging and loading patterns
Biml Utility Methods - Code reuse with CallBimlScript and helpers

Quick Reference​

Filtering and Selection​

Finding Items​

Boolean Checks​

Sorting​

Aggregation​

Grouping​

Joining and Combining​

Conversion​

Filtering Patterns​

Multiple Conditions​

Filtering with External Lists​

String Filtering Methods​

Nested Collection Filtering​

Null-Safe Filtering​

Grouping and Aggregation​

Basic Grouping​

Grouping with Projection​

Generate Packages by Group​

Aggregation Examples​

Working with Nested Collections​

Flattening with SelectMany​

SelectMany with Parent Reference​

Cross-Table Analysis​

Joining Collections​

Join with External Configuration​

Left Join with GroupJoin​

Join Two Biml Collections​

Set Operations​

Combining Collections​

Finding Differences​

Practical Example: Schema Comparison​

Deferred Execution and Performance​

Understanding Deferred Execution​

When to Materialize​

Performance Guidelines​

Query Syntax vs Method Syntax​

Method Syntax (Recommended for BimlScript)​

Query Syntax (SQL-like)​

Why Method Syntax for BimlScript?​

When Query Syntax Helps​

Common Patterns for ETL Automation​

Pattern 1: Filter Tables for Staging​

Pattern 2: Group Packages by Priority​

Pattern 3: Find Tables Missing Audit Columns​

Pattern 4: Create Table-to-Connection Mapping​

Pattern 5: Generate Column Mappings​

Pattern 6: Incremental Load Detection​

Troubleshooting LINQ Errors​

Common Errors and Fixes​

Debugging LINQ Queries​

Safe Patterns​

Next Steps​