Schema Definitions from an ODBC Text Source
Why This Pattern Matters
Sometimes a project must consume hundreds of flat files when direct database access is not allowed and the file documentation is incomplete. Manually defining the schema for each file is tedious and error prone. An ODBC text driver can expose a directory of files as a virtual database, where each file becomes a table. With that in place, the schema collection on the connection returns the column names and types for every file, and that metadata can drive ETL generation.
Setting Up the Driver
A 64-bit text ODBC driver is not installed on Windows by default. The Microsoft Access database engine redistributable provides one. Once installed, open the ODBC Data Sources control panel and add a new System or User DSN that uses the text driver. Pointing the DSN at the directory containing the files is enough; default file extensions handle most cases. The DSN gets a name (for example, 'flat_file_inbox') that the connection string later references.
Confirming the Source Works
After the driver is configured, the same DSN can be tested by importing one of the files into Excel through Get External Data, From Other Sources, Microsoft Query. The DSN appears as a database, and each file in the directory appears as a table. Confirming the source works in a familiar tool first keeps later code from chasing driver issues.
Reading the Schema in Code
A managed ODBC connection exposes schema collections that describe the source. The 'GetSchema' method on the connection returns metadata such as the list of tables, and a second call constrained to a single table returns its columns. The snippet below opens the DSN, lists every table, and writes each column name, type, and size for that table:
OdbcConnection conn = new OdbcConnection("DSN=flat_file_inbox");
conn.Open();
DataTable schema = conn.GetSchema();
DataTable restrictions = conn.GetSchema("Restrictions");
DataTable tables = conn.GetSchema("Tables");
foreach (DataRow tableRow in tables.Rows)
{
Console.WriteLine("{0}", tableRow["TABLE_NAME"]);
string[] tableRestrictions = { null, null, tableRow["TABLE_NAME"].ToString() };
DataTable columns = conn.GetSchema("Columns", tableRestrictions);
foreach (DataRow colRow in columns.Rows)
{
Console.WriteLine("\t{0},{1},{2}", colRow["COLUMN_NAME"], colRow["TYPE_NAME"], colRow["COLUMN_SIZE"]);
}
}
conn.Close();
Calling 'GetSchema("Columns")' without restrictions returns every column for every table at once, which can also be useful when the result is being inserted into a metadata table for later reuse.
Why It Works
The Schema Information collections on a database connection are conceptually similar to 'INFORMATION_SCHEMA' views inside a relational database. When pointed at a folder of flat files through an ODBC text driver, those same collections describe the files themselves. That metadata becomes the input that drives package generation, table creation, and any other downstream ETL work.