Skip to main content

LINQ for BimlScript

LINQ (Language Integrated Query) is a C# feature that lets you query and transform collections using readable, chainable methods. For BimlScript, LINQ is essential for filtering tables, selecting columns, grouping metadata, and building dynamic ETL pipelines.

This guide extends the C# Primer LINQ fundamentals with advanced methods, performance optimization, and real-world patterns. If you're new to LINQ, start with the primer first.

Required import:

<#@ import namespace="System.Linq" #>

Quick Reference

All commonly used LINQ methods with SQL equivalents and Biml examples:

Filtering and Selection

MethodPurposeSQL EquivalentExample
.Where()Filter itemsWHEREtables.Where(t => t.Schema.Name == "Sales")
.Select()Transform/projectSELECTtables.Select(t => t.Name)
.SelectMany()Flatten nestedSubquery/Cross jointables.SelectMany(t => t.Columns)
.Distinct()Remove duplicatesDISTINCTtables.Select(t => t.Schema.Name).Distinct()
.Take()Limit resultsTOP ntables.Take(10)
.Skip()Skip resultsOFFSETtables.Skip(5).Take(10)

Finding Items

MethodPurposeSQL EquivalentExample
.First()Get first (throws if empty)TOP 1tables.First()
.FirstOrDefault()Get first or nullTOP 1tables.FirstOrDefault(t => t.Name == "Customer")
.Single()Get only one (throws if not exactly one)Expects 1 rowtables.Single(t => t.Name == "Config")
.SingleOrDefault()Get only one or nullExpects 0-1 rowstables.SingleOrDefault(t => t.Name == "Config")
.Last()Get last itemN/Atables.OrderBy(t => t.Name).Last()
.ElementAt()Get by indexN/Atables.ElementAt(5)

Boolean Checks

MethodPurposeSQL EquivalentExample
.Any()Check if any existEXISTStable.Columns.Any(c => c.IsIdentity)
.All()Check all matchNOT EXISTS (inverse)table.Columns.All(c => c.IsNullable)
.Contains()Check if item existsINschemaList.Contains(t.Schema.Name)

Sorting

MethodPurposeSQL EquivalentExample
.OrderBy()Sort ascendingORDER BY ASCtables.OrderBy(t => t.Name)
.OrderByDescending()Sort descendingORDER BY DESCtables.OrderByDescending(t => t.Columns.Count)
.ThenBy()Secondary sortORDER BY x, ytables.OrderBy(t => t.Schema.Name).ThenBy(t => t.Name)
.ThenByDescending()Secondary desc sortORDER BY x, y DESCtables.OrderBy(t => t.Schema.Name).ThenByDescending(t => t.Name)
.Reverse()Reverse orderN/Atables.OrderBy(t => t.Name).Reverse()

Aggregation

MethodPurposeSQL EquivalentExample
.Count()Count itemsCOUNT(*)table.Columns.Count()
.Sum()Sum valuesSUM()tables.Sum(t => t.Columns.Count)
.Max()Find maximumMAX()tables.Max(t => t.Columns.Count)
.Min()Find minimumMIN()tables.Min(t => t.Columns.Count)
.Average()Calculate averageAVG()tables.Average(t => t.Columns.Count)
.Aggregate()Custom aggregationN/Anames.Aggregate((a, b) => a + ", " + b)

Grouping

MethodPurposeSQL EquivalentExample
.GroupBy()Group itemsGROUP BYtables.GroupBy(t => t.Schema.Name)
.ToLookup()Create grouped dictionaryGROUP BY (cached)tables.ToLookup(t => t.Schema.Name)

Joining and Combining

MethodPurposeSQL EquivalentExample
.Join()Inner joinINNER JOINSee Joining section
.GroupJoin()Left outer joinLEFT JOINSee Joining section
.Concat()Combine listsUNION ALLlist1.Concat(list2)
.Union()Combine uniqueUNIONlist1.Union(list2)
.Intersect()Common itemsINTERSECTlist1.Intersect(list2)
.Except()DifferenceEXCEPTlist1.Except(list2)
.Zip()Pair itemsN/Alist1.Zip(list2, (a, b) => a + b)

Conversion

MethodPurposeWhen to Use
.ToList()Convert to ListMaterialize for reuse
.ToArray()Convert to arrayFixed-size collection
.ToDictionary()Create key-value lookupFast lookup by key
.ToLookup()Create grouped lookupMultiple values per key
.Cast<T>()Cast elementsChange element type
.OfType<T>()Filter by typeGet only matching types

Filtering Patterns

Multiple Conditions

Combine conditions with && (AND) and || (OR):

// AND: Tables in Sales schema that aren't views
var salesTables = RootNode.Tables
.Where(t => t.Schema.Name == "Sales" && !t.Name.StartsWith("v"));

// OR: Tables in Sales or Production schemas
var targetTables = RootNode.Tables
.Where(t => t.Schema.Name == "Sales" || t.Schema.Name == "Production");

// Complex: Chained Where (equivalent to AND)
var filteredTables = RootNode.Tables
.Where(t => t.Schema.Name == "Sales")
.Where(t => t.Columns.Count > 5)
.Where(t => !t.Name.Contains("Archive"));

Filtering with External Lists

Use .Contains() to filter against a list of values:

// Include specific schemas
var includedSchemas = new List<string> { "Sales", "Production", "Person" };
var targetTables = RootNode.Tables
.Where(t => includedSchemas.Contains(t.Schema.Name));

// Exclude specific tables
var excludedTables = new List<string> { "sysdiagrams", "ErrorLog", "BuildVersion" };
var cleanTables = RootNode.Tables
.Where(t => !excludedTables.Contains(t.Name));

String Filtering Methods

// Tables starting with "dim"
var dimensions = RootNode.Tables.Where(t => t.Name.StartsWith("dim"));

// Tables ending with "History"
var historyTables = RootNode.Tables.Where(t => t.Name.EndsWith("History"));

// Tables containing "Customer"
var customerTables = RootNode.Tables.Where(t => t.Name.Contains("Customer"));

// Case-insensitive comparison
var matches = RootNode.Tables
.Where(t => t.Name.Equals("customer", StringComparison.OrdinalIgnoreCase));

Nested Collection Filtering

Filter parent objects based on child collection properties:

// Tables that have at least one identity column
var tablesWithIdentity = RootNode.Tables
.Where(t => t.Columns.Any(c => c.IsIdentity));

// Tables where ALL columns are nullable
var fullyNullableTables = RootNode.Tables
.Where(t => t.Columns.All(c => c.IsNullable));

// Tables with more than 20 columns
var wideTables = RootNode.Tables
.Where(t => t.Columns.Count > 20);

// Tables containing a datetime column
var tablesWithDates = RootNode.Tables
.Where(t => t.Columns.Any(c =>
c.DataType == DbType.DateTime || c.DataType == DbType.DateTime2));

Null-Safe Filtering

Use the null-conditional operator ?. to avoid null reference exceptions:

// Safe schema access (schema might be null)
var salesTables = RootNode.Tables
.Where(t => t.Schema?.Name == "Sales");

// Safe annotation access
var taggedTables = RootNode.Tables
.Where(t => t.GetTag("LoadType")?.ToLower() == "incremental");

// Null coalescing with default value
var tables = RootNode.Tables
.Where(t => (t.GetTag("Priority") ?? "Normal") == "High");

Grouping and Aggregation

Basic Grouping

.GroupBy() returns groups where each group has a Key and contains matching items:

// Group tables by schema
var tablesBySchema = RootNode.Tables.GroupBy(t => t.Schema.Name);

foreach (var group in tablesBySchema)
{
// group.Key = "Sales", "Production", etc.
// group = IEnumerable<AstTableNode> containing tables in that schema

var schemaName = group.Key;
var tableCount = group.Count();

foreach (var table in group)
{
// Process each table in this schema
}
}

Grouping with Projection

Transform groups into custom objects:

// Get schema statistics
var schemaStats = RootNode.Tables
.GroupBy(t => t.Schema.Name)
.Select(g => new {
Schema = g.Key,
TableCount = g.Count(),
TotalColumns = g.Sum(t => t.Columns.Count),
AvgColumns = g.Average(t => t.Columns.Count)
});

foreach (var stat in schemaStats)
{
// stat.Schema, stat.TableCount, stat.TotalColumns, stat.AvgColumns
}

Generate Packages by Group

Common pattern: create one master package per schema:

<# foreach (var schemaGroup in RootNode.Tables.GroupBy(t => t.Schema.Name)) { #>
<Package Name="Master_<#= schemaGroup.Key #>" ConstraintMode="Parallel">
<Tasks>
<# foreach (var table in schemaGroup) { #>
<ExecutePackage Name="Load_<#= table.Name #>">
<Package PackageName="Load_<#= table.Name #>" />
</ExecutePackage>
<# } #>
</Tasks>
</Package>
<# } #>

Aggregation Examples

// Count all tables
var totalTables = RootNode.Tables.Count();

// Count tables matching condition
var salesTableCount = RootNode.Tables.Count(t => t.Schema.Name == "Sales");

// Sum of all columns across all tables
var totalColumns = RootNode.Tables.Sum(t => t.Columns.Count);

// Find table with most columns
var maxColumns = RootNode.Tables.Max(t => t.Columns.Count);
var widestTable = RootNode.Tables.First(t => t.Columns.Count == maxColumns);

// Average columns per table
var avgColumns = RootNode.Tables.Average(t => t.Columns.Count);

// Build comma-separated list
var tableNames = RootNode.Tables.Select(t => t.Name);
var nameList = string.Join(", ", tableNames);
// Or using Aggregate:
var nameList = tableNames.Aggregate((current, next) => current + ", " + next);

Working with Nested Collections

Flattening with SelectMany

.SelectMany() flattens nested collections into a single sequence:

// Get ALL columns from ALL tables (flattened)
var allColumns = RootNode.Tables.SelectMany(t => t.Columns);

// Now you can query across all columns
var identityColumns = allColumns.Where(c => c.IsIdentity);
var stringColumns = allColumns.Where(c => c.DataType == DbType.String);

SelectMany with Parent Reference

Include parent information when flattening:

// Get columns with their table name
var columnsWithTable = RootNode.Tables
.SelectMany(t => t.Columns, (table, column) => new {
TableName = table.Name,
SchemaName = table.Schema.Name,
ColumnName = column.Name,
DataType = column.DataType
});

// Find all columns named "ModifiedDate" and their tables
var modifiedDateColumns = RootNode.Tables
.SelectMany(t => t.Columns, (table, column) => new { Table = table, Column = column })
.Where(x => x.Column.Name == "ModifiedDate");

foreach (var item in modifiedDateColumns)
{
// item.Table.Name, item.Column.DataType
}

Cross-Table Analysis

// Get column data type distribution across entire model
var columnsByType = RootNode.Tables
.SelectMany(t => t.Columns)
.GroupBy(c => c.DataType)
.Select(g => new { DataType = g.Key, Count = g.Count() })
.OrderByDescending(x => x.Count);

// Find duplicate column names across tables
var duplicateColumns = RootNode.Tables
.SelectMany(t => t.Columns)
.GroupBy(c => c.Name)
.Where(g => g.Count() > 1)
.Select(g => new { ColumnName = g.Key, OccurrenceCount = g.Count() });

Joining Collections

Join with External Configuration

Use .Join() to combine Biml metadata with external configuration:

// External configuration dictionary
var loadConfig = new Dictionary<string, string>
{
{ "Customer", "SCD2" },
{ "Product", "SCD1" },
{ "Order", "Append" },
{ "OrderDetail", "Append" }
};

// Join tables to configuration
var tablesWithConfig = RootNode.Tables
.Join(
loadConfig, // Inner collection
table => table.Name, // Outer key selector
config => config.Key, // Inner key selector
(table, config) => new { // Result selector
Table = table,
LoadType = config.Value
}
);

foreach (var item in tablesWithConfig)
{
// item.Table = AstTableNode
// item.LoadType = "SCD2", "SCD1", "Append"
}

Left Join with GroupJoin

Include all items from the left collection, even without matches:

// All tables, with config if available
var tablesWithOptionalConfig = RootNode.Tables
.GroupJoin(
loadConfig,
table => table.Name,
config => config.Key,
(table, configs) => new {
Table = table,
LoadType = configs.FirstOrDefault().Value ?? "Default"
}
);

Join Two Biml Collections

// Join tables to connections via annotation
var tableConnections = RootNode.Tables
.Join(
RootNode.Connections,
table => table.GetTag("ConnectionName"),
conn => conn.Name,
(table, conn) => new { Table = table, Connection = conn }
);

Set Operations

Combining Collections

// UNION ALL: Combine with duplicates
var allTables = salesTables.Concat(productionTables);

// UNION: Combine without duplicates
var uniqueTables = salesTables.Union(productionTables);

// Combine from multiple schemas
var targetTables = RootNode.Tables.Where(t => t.Schema.Name == "Sales")
.Union(RootNode.Tables.Where(t => t.Schema.Name == "Production"))
.Union(RootNode.Tables.Where(t => t.Schema.Name == "Person"));

Finding Differences

// Tables in source but not in target (by name)
var sourceTableNames = sourceTables.Select(t => t.Name);
var targetTableNames = targetTables.Select(t => t.Name);

var newTables = sourceTableNames.Except(targetTableNames); // In source only
var removedTables = targetTableNames.Except(sourceTableNames); // In target only
var commonTables = sourceTableNames.Intersect(targetTableNames); // In both

Practical Example: Schema Comparison

// Compare columns between source and staging
var sourceColumns = sourceTables
.SelectMany(t => t.Columns)
.Select(c => $"{c.Table.Name}.{c.Name}");

var stagingColumns = stagingTables
.SelectMany(t => t.Columns)
.Select(c => $"{c.Table.Name}.{c.Name}");

var missingInStaging = sourceColumns.Except(stagingColumns).ToList();
var extraInStaging = stagingColumns.Except(sourceColumns).ToList();

Deferred Execution and Performance

Understanding Deferred Execution

LINQ queries don't execute immediately. They execute when you enumerate the results:

// Query is DEFINED but NOT executed yet
var query = RootNode.Tables.Where(t => t.Schema.Name == "Sales");

// Query executes NOW (first enumeration)
var count = query.Count();

// Query executes AGAIN (second enumeration)
foreach (var table in query) { }

// Query executes AGAIN (third enumeration)
var list = query.ToList();

When to Materialize

Use .ToList() to execute once and cache results:

// BAD: Query executes multiple times
var tables = RootNode.Tables.Where(t => t.Schema.Name == "Sales");
Console.WriteLine($"Count: {tables.Count()}"); // Executes
Console.WriteLine($"First: {tables.First().Name}"); // Executes again
foreach (var t in tables) { } // Executes again

// GOOD: Execute once, reuse cached list
var tables = RootNode.Tables.Where(t => t.Schema.Name == "Sales").ToList();
Console.WriteLine($"Count: {tables.Count}"); // Uses cached list
Console.WriteLine($"First: {tables.First().Name}"); // Uses cached list
foreach (var t in tables) { } // Uses cached list

Performance Guidelines

ScenarioRecommendation
Use results onceDon't materialize
Use results multiple timesCall .ToList()
Need fast key lookupUse .ToDictionary()
Large metadata set + complex filterMaterialize early, filter cached list
Chained operationsLet LINQ optimize (don't materialize between steps)
// GOOD: Single pipeline, materializes once at the end
var result = RootNode.Tables
.Where(t => t.Schema.Name == "Sales")
.Where(t => t.Columns.Count > 5)
.OrderBy(t => t.Name)
.ToList();

// BAD: Materializes after each step (wastes memory)
var step1 = RootNode.Tables.Where(t => t.Schema.Name == "Sales").ToList();
var step2 = step1.Where(t => t.Columns.Count > 5).ToList();
var step3 = step2.OrderBy(t => t.Name).ToList();

Query Syntax vs Method Syntax

LINQ supports two syntaxes. Both compile to the same code.

var salesTables = RootNode.Tables
.Where(t => t.Schema.Name == "Sales")
.OrderBy(t => t.Name)
.Select(t => t.Name);

Query Syntax (SQL-like)

var salesTables = from t in RootNode.Tables
where t.Schema.Name == "Sales"
orderby t.Name
select t.Name;

Why Method Syntax for BimlScript?

  1. More concise in code nuggets (less vertical space)
  2. Easier to chain multiple operations
  3. All methods available (some like .Take(), .Skip() have no query syntax)
  4. Consistent style with other C# code in your templates

When Query Syntax Helps

Complex joins are sometimes clearer in query syntax:

// Query syntax for complex join
var result = from t in RootNode.Tables
join c in loadConfig on t.Name equals c.Key
where t.Schema.Name == "Sales"
orderby t.Name
select new { Table = t, LoadType = c.Value };

Common Patterns for ETL Automation

Pattern 1: Filter Tables for Staging

var includedSchemas = new List<string> { "Sales", "Production", "Person" };
var excludedTables = new List<string> { "sysdiagrams", "ErrorLog", "AWBuildVersion" };

var stagingTables = RootNode.Tables
.Where(t => includedSchemas.Contains(t.Schema.Name))
.Where(t => !excludedTables.Contains(t.Name))
.Where(t => !t.Name.StartsWith("tmp_"))
.Where(t => !t.Name.EndsWith("_Archive"))
.OrderBy(t => t.Schema.Name)
.ThenBy(t => t.Name)
.ToList();

Pattern 2: Group Packages by Priority

// Assumes packages have Priority annotation: High, Normal, Low
var packageGroups = RootNode.Packages
.GroupBy(p => p.GetTag("Priority") ?? "Normal")
.OrderBy(g => g.Key == "High" ? 0 : g.Key == "Normal" ? 1 : 2);

foreach (var group in packageGroups)
{
// Execute High priority first, then Normal, then Low
foreach (var package in group)
{
// Generate ExecutePackage task
}
}

Pattern 3: Find Tables Missing Audit Columns

var requiredAuditColumns = new[] { "LoadDateTime", "LoadBatchId" };

var tablesMissingAudit = RootNode.Tables
.Where(t => !requiredAuditColumns.All(
auditCol => t.Columns.Any(c => c.Name == auditCol)))
.Select(t => new {
Table = t.Name,
Missing = requiredAuditColumns.Where(
auditCol => !t.Columns.Any(c => c.Name == auditCol))
})
.ToList();

// Report tables that need audit columns added
foreach (var item in tablesMissingAudit)
{
// item.Table, item.Missing
}

Pattern 4: Create Table-to-Connection Mapping

var tableConnections = RootNode.Tables
.ToDictionary(
t => t.Name,
t => t.GetTag("SourceConnection") ?? "DefaultSource"
);

// Fast lookup
var connectionName = tableConnections["Customer"]; // Returns connection name

Pattern 5: Generate Column Mappings

// Build source-to-target column mapping (handles name differences)
var columnMappings = RootNode.Tables
.SelectMany(t => t.Columns, (table, column) => new {
TableName = table.Name,
SourceColumn = column.Name,
TargetColumn = column.GetTag("TargetName") ?? column.Name,
DataType = column.DataType
})
.ToList();

Pattern 6: Incremental Load Detection

// Find tables with date columns suitable for incremental loading
var incrementalCandidates = RootNode.Tables
.Where(t => t.Columns.Any(c =>
c.Name.Contains("Modified") ||
c.Name.Contains("Updated") ||
c.Name.Contains("Changed")))
.Select(t => new {
Table = t,
DateColumn = t.Columns.First(c =>
c.Name.Contains("Modified") ||
c.Name.Contains("Updated") ||
c.Name.Contains("Changed"))
})
.ToList();

Troubleshooting LINQ Errors

Common Errors and Fixes

Error MessageCauseSolution
Object reference not set to an instanceNull value in chainUse ?. operator or add null check
Sequence contains no elements.First() on empty collectionUse .FirstOrDefault() and check for null
Sequence contains more than one element.Single() found multiple matchesUse .First() or add more specific filter
Cannot implicitly convert typeLambda returns wrong typeCheck lambda expression return type
InvalidOperationException.Single() on empty or multipleUse .SingleOrDefault()

Debugging LINQ Queries

// Add .ToList() and inspect count at each step
var step1 = RootNode.Tables.ToList();
// Check: step1.Count

var step2 = step1.Where(t => t.Schema.Name == "Sales").ToList();
// Check: step2.Count

var step3 = step2.Where(t => t.Columns.Any(c => c.IsIdentity)).ToList();
// Check: step3.Count - if 0, the filter is too restrictive

Safe Patterns

// UNSAFE: Throws if no match
var table = RootNode.Tables.First(t => t.Name == "MissingTable");

// SAFE: Returns null if no match
var table = RootNode.Tables.FirstOrDefault(t => t.Name == "MissingTable");
if (table != null)
{
// Use table
}

// UNSAFE: Throws if schema is null
var tables = RootNode.Tables.Where(t => t.Schema.Name == "Sales");

// SAFE: Handles null schema
var tables = RootNode.Tables.Where(t => t.Schema?.Name == "Sales");

Next Steps