LINQ for BimlScript
LINQ (Language Integrated Query) is a C# feature that lets you query and transform collections using readable, chainable methods. For BimlScript, LINQ is essential for filtering tables, selecting columns, grouping metadata, and building dynamic ETL pipelines.
This guide extends the C# Primer LINQ fundamentals with advanced methods, performance optimization, and real-world patterns. If you're new to LINQ, start with the primer first.
Required import:
<#@ import namespace="System.Linq" #>
Quick Reference
All commonly used LINQ methods with SQL equivalents and Biml examples:
Filtering and Selection
| Method | Purpose | SQL Equivalent | Example |
|---|---|---|---|
.Where() | Filter items | WHERE | tables.Where(t => t.Schema.Name == "Sales") |
.Select() | Transform/project | SELECT | tables.Select(t => t.Name) |
.SelectMany() | Flatten nested | Subquery/Cross join | tables.SelectMany(t => t.Columns) |
.Distinct() | Remove duplicates | DISTINCT | tables.Select(t => t.Schema.Name).Distinct() |
.Take() | Limit results | TOP n | tables.Take(10) |
.Skip() | Skip results | OFFSET | tables.Skip(5).Take(10) |
Finding Items
| Method | Purpose | SQL Equivalent | Example |
|---|---|---|---|
.First() | Get first (throws if empty) | TOP 1 | tables.First() |
.FirstOrDefault() | Get first or null | TOP 1 | tables.FirstOrDefault(t => t.Name == "Customer") |
.Single() | Get only one (throws if not exactly one) | Expects 1 row | tables.Single(t => t.Name == "Config") |
.SingleOrDefault() | Get only one or null | Expects 0-1 rows | tables.SingleOrDefault(t => t.Name == "Config") |
.Last() | Get last item | N/A | tables.OrderBy(t => t.Name).Last() |
.ElementAt() | Get by index | N/A | tables.ElementAt(5) |
Boolean Checks
| Method | Purpose | SQL Equivalent | Example |
|---|---|---|---|
.Any() | Check if any exist | EXISTS | table.Columns.Any(c => c.IsIdentity) |
.All() | Check all match | NOT EXISTS (inverse) | table.Columns.All(c => c.IsNullable) |
.Contains() | Check if item exists | IN | schemaList.Contains(t.Schema.Name) |
Sorting
| Method | Purpose | SQL Equivalent | Example |
|---|---|---|---|
.OrderBy() | Sort ascending | ORDER BY ASC | tables.OrderBy(t => t.Name) |
.OrderByDescending() | Sort descending | ORDER BY DESC | tables.OrderByDescending(t => t.Columns.Count) |
.ThenBy() | Secondary sort | ORDER BY x, y | tables.OrderBy(t => t.Schema.Name).ThenBy(t => t.Name) |
.ThenByDescending() | Secondary desc sort | ORDER BY x, y DESC | tables.OrderBy(t => t.Schema.Name).ThenByDescending(t => t.Name) |
.Reverse() | Reverse order | N/A | tables.OrderBy(t => t.Name).Reverse() |
Aggregation
| Method | Purpose | SQL Equivalent | Example |
|---|---|---|---|
.Count() | Count items | COUNT(*) | table.Columns.Count() |
.Sum() | Sum values | SUM() | tables.Sum(t => t.Columns.Count) |
.Max() | Find maximum | MAX() | tables.Max(t => t.Columns.Count) |
.Min() | Find minimum | MIN() | tables.Min(t => t.Columns.Count) |
.Average() | Calculate average | AVG() | tables.Average(t => t.Columns.Count) |
.Aggregate() | Custom aggregation | N/A | names.Aggregate((a, b) => a + ", " + b) |
Grouping
| Method | Purpose | SQL Equivalent | Example |
|---|---|---|---|
.GroupBy() | Group items | GROUP BY | tables.GroupBy(t => t.Schema.Name) |
.ToLookup() | Create grouped dictionary | GROUP BY (cached) | tables.ToLookup(t => t.Schema.Name) |
Joining and Combining
| Method | Purpose | SQL Equivalent | Example |
|---|---|---|---|
.Join() | Inner join | INNER JOIN | See Joining section |
.GroupJoin() | Left outer join | LEFT JOIN | See Joining section |
.Concat() | Combine lists | UNION ALL | list1.Concat(list2) |
.Union() | Combine unique | UNION | list1.Union(list2) |
.Intersect() | Common items | INTERSECT | list1.Intersect(list2) |
.Except() | Difference | EXCEPT | list1.Except(list2) |
.Zip() | Pair items | N/A | list1.Zip(list2, (a, b) => a + b) |
Conversion
| Method | Purpose | When to Use |
|---|---|---|
.ToList() | Convert to List | Materialize for reuse |
.ToArray() | Convert to array | Fixed-size collection |
.ToDictionary() | Create key-value lookup | Fast lookup by key |
.ToLookup() | Create grouped lookup | Multiple values per key |
.Cast<T>() | Cast elements | Change element type |
.OfType<T>() | Filter by type | Get only matching types |
Filtering Patterns
Multiple Conditions
Combine conditions with && (AND) and || (OR):
// AND: Tables in Sales schema that aren't views
var salesTables = RootNode.Tables
.Where(t => t.Schema.Name == "Sales" && !t.Name.StartsWith("v"));
// OR: Tables in Sales or Production schemas
var targetTables = RootNode.Tables
.Where(t => t.Schema.Name == "Sales" || t.Schema.Name == "Production");
// Complex: Chained Where (equivalent to AND)
var filteredTables = RootNode.Tables
.Where(t => t.Schema.Name == "Sales")
.Where(t => t.Columns.Count > 5)
.Where(t => !t.Name.Contains("Archive"));
Filtering with External Lists
Use .Contains() to filter against a list of values:
// Include specific schemas
var includedSchemas = new List<string> { "Sales", "Production", "Person" };
var targetTables = RootNode.Tables
.Where(t => includedSchemas.Contains(t.Schema.Name));
// Exclude specific tables
var excludedTables = new List<string> { "sysdiagrams", "ErrorLog", "BuildVersion" };
var cleanTables = RootNode.Tables
.Where(t => !excludedTables.Contains(t.Name));
String Filtering Methods
// Tables starting with "dim"
var dimensions = RootNode.Tables.Where(t => t.Name.StartsWith("dim"));
// Tables ending with "History"
var historyTables = RootNode.Tables.Where(t => t.Name.EndsWith("History"));
// Tables containing "Customer"
var customerTables = RootNode.Tables.Where(t => t.Name.Contains("Customer"));
// Case-insensitive comparison
var matches = RootNode.Tables
.Where(t => t.Name.Equals("customer", StringComparison.OrdinalIgnoreCase));
Nested Collection Filtering
Filter parent objects based on child collection properties:
// Tables that have at least one identity column
var tablesWithIdentity = RootNode.Tables
.Where(t => t.Columns.Any(c => c.IsIdentity));
// Tables where ALL columns are nullable
var fullyNullableTables = RootNode.Tables
.Where(t => t.Columns.All(c => c.IsNullable));
// Tables with more than 20 columns
var wideTables = RootNode.Tables
.Where(t => t.Columns.Count > 20);
// Tables containing a datetime column
var tablesWithDates = RootNode.Tables
.Where(t => t.Columns.Any(c =>
c.DataType == DbType.DateTime || c.DataType == DbType.DateTime2));
Null-Safe Filtering
Use the null-conditional operator ?. to avoid null reference exceptions:
// Safe schema access (schema might be null)
var salesTables = RootNode.Tables
.Where(t => t.Schema?.Name == "Sales");
// Safe annotation access
var taggedTables = RootNode.Tables
.Where(t => t.GetTag("LoadType")?.ToLower() == "incremental");
// Null coalescing with default value
var tables = RootNode.Tables
.Where(t => (t.GetTag("Priority") ?? "Normal") == "High");
Grouping and Aggregation
Basic Grouping
.GroupBy() returns groups where each group has a Key and contains matching items:
// Group tables by schema
var tablesBySchema = RootNode.Tables.GroupBy(t => t.Schema.Name);
foreach (var group in tablesBySchema)
{
// group.Key = "Sales", "Production", etc.
// group = IEnumerable<AstTableNode> containing tables in that schema
var schemaName = group.Key;
var tableCount = group.Count();
foreach (var table in group)
{
// Process each table in this schema
}
}
Grouping with Projection
Transform groups into custom objects:
// Get schema statistics
var schemaStats = RootNode.Tables
.GroupBy(t => t.Schema.Name)
.Select(g => new {
Schema = g.Key,
TableCount = g.Count(),
TotalColumns = g.Sum(t => t.Columns.Count),
AvgColumns = g.Average(t => t.Columns.Count)
});
foreach (var stat in schemaStats)
{
// stat.Schema, stat.TableCount, stat.TotalColumns, stat.AvgColumns
}
Generate Packages by Group
Common pattern: create one master package per schema:
<# foreach (var schemaGroup in RootNode.Tables.GroupBy(t => t.Schema.Name)) { #>
<Package Name="Master_<#= schemaGroup.Key #>" ConstraintMode="Parallel">
<Tasks>
<# foreach (var table in schemaGroup) { #>
<ExecutePackage Name="Load_<#= table.Name #>">
<Package PackageName="Load_<#= table.Name #>" />
</ExecutePackage>
<# } #>
</Tasks>
</Package>
<# } #>
Aggregation Examples
// Count all tables
var totalTables = RootNode.Tables.Count();
// Count tables matching condition
var salesTableCount = RootNode.Tables.Count(t => t.Schema.Name == "Sales");
// Sum of all columns across all tables
var totalColumns = RootNode.Tables.Sum(t => t.Columns.Count);
// Find table with most columns
var maxColumns = RootNode.Tables.Max(t => t.Columns.Count);
var widestTable = RootNode.Tables.First(t => t.Columns.Count == maxColumns);
// Average columns per table
var avgColumns = RootNode.Tables.Average(t => t.Columns.Count);
// Build comma-separated list
var tableNames = RootNode.Tables.Select(t => t.Name);
var nameList = string.Join(", ", tableNames);
// Or using Aggregate:
var nameList = tableNames.Aggregate((current, next) => current + ", " + next);
Working with Nested Collections
Flattening with SelectMany
.SelectMany() flattens nested collections into a single sequence:
// Get ALL columns from ALL tables (flattened)
var allColumns = RootNode.Tables.SelectMany(t => t.Columns);
// Now you can query across all columns
var identityColumns = allColumns.Where(c => c.IsIdentity);
var stringColumns = allColumns.Where(c => c.DataType == DbType.String);
SelectMany with Parent Reference
Include parent information when flattening:
// Get columns with their table name
var columnsWithTable = RootNode.Tables
.SelectMany(t => t.Columns, (table, column) => new {
TableName = table.Name,
SchemaName = table.Schema.Name,
ColumnName = column.Name,
DataType = column.DataType
});
// Find all columns named "ModifiedDate" and their tables
var modifiedDateColumns = RootNode.Tables
.SelectMany(t => t.Columns, (table, column) => new { Table = table, Column = column })
.Where(x => x.Column.Name == "ModifiedDate");
foreach (var item in modifiedDateColumns)
{
// item.Table.Name, item.Column.DataType
}
Cross-Table Analysis
// Get column data type distribution across entire model
var columnsByType = RootNode.Tables
.SelectMany(t => t.Columns)
.GroupBy(c => c.DataType)
.Select(g => new { DataType = g.Key, Count = g.Count() })
.OrderByDescending(x => x.Count);
// Find duplicate column names across tables
var duplicateColumns = RootNode.Tables
.SelectMany(t => t.Columns)
.GroupBy(c => c.Name)
.Where(g => g.Count() > 1)
.Select(g => new { ColumnName = g.Key, OccurrenceCount = g.Count() });
Joining Collections
Join with External Configuration
Use .Join() to combine Biml metadata with external configuration:
// External configuration dictionary
var loadConfig = new Dictionary<string, string>
{
{ "Customer", "SCD2" },
{ "Product", "SCD1" },
{ "Order", "Append" },
{ "OrderDetail", "Append" }
};
// Join tables to configuration
var tablesWithConfig = RootNode.Tables
.Join(
loadConfig, // Inner collection
table => table.Name, // Outer key selector
config => config.Key, // Inner key selector
(table, config) => new { // Result selector
Table = table,
LoadType = config.Value
}
);
foreach (var item in tablesWithConfig)
{
// item.Table = AstTableNode
// item.LoadType = "SCD2", "SCD1", "Append"
}
Left Join with GroupJoin
Include all items from the left collection, even without matches:
// All tables, with config if available
var tablesWithOptionalConfig = RootNode.Tables
.GroupJoin(
loadConfig,
table => table.Name,
config => config.Key,
(table, configs) => new {
Table = table,
LoadType = configs.FirstOrDefault().Value ?? "Default"
}
);
Join Two Biml Collections
// Join tables to connections via annotation
var tableConnections = RootNode.Tables
.Join(
RootNode.Connections,
table => table.GetTag("ConnectionName"),
conn => conn.Name,
(table, conn) => new { Table = table, Connection = conn }
);
Set Operations
Combining Collections
// UNION ALL: Combine with duplicates
var allTables = salesTables.Concat(productionTables);
// UNION: Combine without duplicates
var uniqueTables = salesTables.Union(productionTables);
// Combine from multiple schemas
var targetTables = RootNode.Tables.Where(t => t.Schema.Name == "Sales")
.Union(RootNode.Tables.Where(t => t.Schema.Name == "Production"))
.Union(RootNode.Tables.Where(t => t.Schema.Name == "Person"));
Finding Differences
// Tables in source but not in target (by name)
var sourceTableNames = sourceTables.Select(t => t.Name);
var targetTableNames = targetTables.Select(t => t.Name);
var newTables = sourceTableNames.Except(targetTableNames); // In source only
var removedTables = targetTableNames.Except(sourceTableNames); // In target only
var commonTables = sourceTableNames.Intersect(targetTableNames); // In both
Practical Example: Schema Comparison
// Compare columns between source and staging
var sourceColumns = sourceTables
.SelectMany(t => t.Columns)
.Select(c => $"{c.Table.Name}.{c.Name}");
var stagingColumns = stagingTables
.SelectMany(t => t.Columns)
.Select(c => $"{c.Table.Name}.{c.Name}");
var missingInStaging = sourceColumns.Except(stagingColumns).ToList();
var extraInStaging = stagingColumns.Except(sourceColumns).ToList();
Deferred Execution and Performance
Understanding Deferred Execution
LINQ queries don't execute immediately. They execute when you enumerate the results:
// Query is DEFINED but NOT executed yet
var query = RootNode.Tables.Where(t => t.Schema.Name == "Sales");
// Query executes NOW (first enumeration)
var count = query.Count();
// Query executes AGAIN (second enumeration)
foreach (var table in query) { }
// Query executes AGAIN (third enumeration)
var list = query.ToList();
When to Materialize
Use .ToList() to execute once and cache results:
// BAD: Query executes multiple times
var tables = RootNode.Tables.Where(t => t.Schema.Name == "Sales");
Console.WriteLine($"Count: {tables.Count()}"); // Executes
Console.WriteLine($"First: {tables.First().Name}"); // Executes again
foreach (var t in tables) { } // Executes again
// GOOD: Execute once, reuse cached list
var tables = RootNode.Tables.Where(t => t.Schema.Name == "Sales").ToList();
Console.WriteLine($"Count: {tables.Count}"); // Uses cached list
Console.WriteLine($"First: {tables.First().Name}"); // Uses cached list
foreach (var t in tables) { } // Uses cached list
Performance Guidelines
| Scenario | Recommendation |
|---|---|
| Use results once | Don't materialize |
| Use results multiple times | Call .ToList() |
| Need fast key lookup | Use .ToDictionary() |
| Large metadata set + complex filter | Materialize early, filter cached list |
| Chained operations | Let LINQ optimize (don't materialize between steps) |
// GOOD: Single pipeline, materializes once at the end
var result = RootNode.Tables
.Where(t => t.Schema.Name == "Sales")
.Where(t => t.Columns.Count > 5)
.OrderBy(t => t.Name)
.ToList();
// BAD: Materializes after each step (wastes memory)
var step1 = RootNode.Tables.Where(t => t.Schema.Name == "Sales").ToList();
var step2 = step1.Where(t => t.Columns.Count > 5).ToList();
var step3 = step2.OrderBy(t => t.Name).ToList();
Query Syntax vs Method Syntax
LINQ supports two syntaxes. Both compile to the same code.
Method Syntax (Recommended for BimlScript)
var salesTables = RootNode.Tables
.Where(t => t.Schema.Name == "Sales")
.OrderBy(t => t.Name)
.Select(t => t.Name);
Query Syntax (SQL-like)
var salesTables = from t in RootNode.Tables
where t.Schema.Name == "Sales"
orderby t.Name
select t.Name;
Why Method Syntax for BimlScript?
- More concise in code nuggets (less vertical space)
- Easier to chain multiple operations
- All methods available (some like
.Take(),.Skip()have no query syntax) - Consistent style with other C# code in your templates
When Query Syntax Helps
Complex joins are sometimes clearer in query syntax:
// Query syntax for complex join
var result = from t in RootNode.Tables
join c in loadConfig on t.Name equals c.Key
where t.Schema.Name == "Sales"
orderby t.Name
select new { Table = t, LoadType = c.Value };
Common Patterns for ETL Automation
Pattern 1: Filter Tables for Staging
var includedSchemas = new List<string> { "Sales", "Production", "Person" };
var excludedTables = new List<string> { "sysdiagrams", "ErrorLog", "AWBuildVersion" };
var stagingTables = RootNode.Tables
.Where(t => includedSchemas.Contains(t.Schema.Name))
.Where(t => !excludedTables.Contains(t.Name))
.Where(t => !t.Name.StartsWith("tmp_"))
.Where(t => !t.Name.EndsWith("_Archive"))
.OrderBy(t => t.Schema.Name)
.ThenBy(t => t.Name)
.ToList();
Pattern 2: Group Packages by Priority
// Assumes packages have Priority annotation: High, Normal, Low
var packageGroups = RootNode.Packages
.GroupBy(p => p.GetTag("Priority") ?? "Normal")
.OrderBy(g => g.Key == "High" ? 0 : g.Key == "Normal" ? 1 : 2);
foreach (var group in packageGroups)
{
// Execute High priority first, then Normal, then Low
foreach (var package in group)
{
// Generate ExecutePackage task
}
}
Pattern 3: Find Tables Missing Audit Columns
var requiredAuditColumns = new[] { "LoadDateTime", "LoadBatchId" };
var tablesMissingAudit = RootNode.Tables
.Where(t => !requiredAuditColumns.All(
auditCol => t.Columns.Any(c => c.Name == auditCol)))
.Select(t => new {
Table = t.Name,
Missing = requiredAuditColumns.Where(
auditCol => !t.Columns.Any(c => c.Name == auditCol))
})
.ToList();
// Report tables that need audit columns added
foreach (var item in tablesMissingAudit)
{
// item.Table, item.Missing
}
Pattern 4: Create Table-to-Connection Mapping
var tableConnections = RootNode.Tables
.ToDictionary(
t => t.Name,
t => t.GetTag("SourceConnection") ?? "DefaultSource"
);
// Fast lookup
var connectionName = tableConnections["Customer"]; // Returns connection name
Pattern 5: Generate Column Mappings
// Build source-to-target column mapping (handles name differences)
var columnMappings = RootNode.Tables
.SelectMany(t => t.Columns, (table, column) => new {
TableName = table.Name,
SourceColumn = column.Name,
TargetColumn = column.GetTag("TargetName") ?? column.Name,
DataType = column.DataType
})
.ToList();
Pattern 6: Incremental Load Detection
// Find tables with date columns suitable for incremental loading
var incrementalCandidates = RootNode.Tables
.Where(t => t.Columns.Any(c =>
c.Name.Contains("Modified") ||
c.Name.Contains("Updated") ||
c.Name.Contains("Changed")))
.Select(t => new {
Table = t,
DateColumn = t.Columns.First(c =>
c.Name.Contains("Modified") ||
c.Name.Contains("Updated") ||
c.Name.Contains("Changed"))
})
.ToList();
Troubleshooting LINQ Errors
Common Errors and Fixes
| Error Message | Cause | Solution |
|---|---|---|
Object reference not set to an instance | Null value in chain | Use ?. operator or add null check |
Sequence contains no elements | .First() on empty collection | Use .FirstOrDefault() and check for null |
Sequence contains more than one element | .Single() found multiple matches | Use .First() or add more specific filter |
Cannot implicitly convert type | Lambda returns wrong type | Check lambda expression return type |
InvalidOperationException | .Single() on empty or multiple | Use .SingleOrDefault() |
Debugging LINQ Queries
// Add .ToList() and inspect count at each step
var step1 = RootNode.Tables.ToList();
// Check: step1.Count
var step2 = step1.Where(t => t.Schema.Name == "Sales").ToList();
// Check: step2.Count
var step3 = step2.Where(t => t.Columns.Any(c => c.IsIdentity)).ToList();
// Check: step3.Count - if 0, the filter is too restrictive
Safe Patterns
// UNSAFE: Throws if no match
var table = RootNode.Tables.First(t => t.Name == "MissingTable");
// SAFE: Returns null if no match
var table = RootNode.Tables.FirstOrDefault(t => t.Name == "MissingTable");
if (table != null)
{
// Use table
}
// UNSAFE: Throws if schema is null
var tables = RootNode.Tables.Where(t => t.Schema.Name == "Sales");
// SAFE: Handles null schema
var tables = RootNode.Tables.Where(t => t.Schema?.Name == "Sales");
Next Steps
- C# Primer - Variables, loops, error handling fundamentals
- Introduction to BimlScript - Code nuggets, directives, and tiers
- Common Patterns - Production-ready staging and loading patterns
- Biml Utility Methods - Code reuse with CallBimlScript and helpers