Skip to main content

Automate SSIS From SAP to SQL With Biml and Theobald XtractIS

Why Drive SAP Loads From Metadata

A hand authored SSIS package that copies one SAP table runs fine, but maintaining the column data types by hand across dozens of SAP tables quickly breaks down. The trick is to pull the column metadata directly out of the SAP DD03L data dictionary, store the bits the project needs in a small metadata repository, and let BimlScript build the staging tables and the load packages from that metadata. The Theobald XtractIS source handles the SAP side; BimlStudio is required because the TheobaldXtractSapSource element ships in the BimlStudio extensions.

Metadata Repository

Three tables and one view are enough to drive the entire pipeline.

  • 'meta.SAP_Tables' lists the SAP tables to load and an optional custom function name to override the default extractor.
  • 'meta.SAP_Columns' optionally restricts which columns get pulled. A table with no entries here loads every column.
  • 'meta.SAP_DD03L' caches a subset of the SAP DD03L data dictionary. A package populates this table; the modeler does not maintain it by hand.
  • 'meta.SAP_UseColumns' is a view that joins the dictionary cache to the column filter and maps the SAP internal type to a Biml data type.
CREATE DATABASE SapStaging
GO
USE [SapStaging]
GO
CREATE SCHEMA meta
GO
CREATE TABLE [meta].[SAP_Tables](
[TABNAME] [NVARCHAR](30) NULL,
[CUSTOM_Function] [NVARCHAR](50) NULL
) ON [PRIMARY]
GO
ALTER TABLE [meta].[SAP_Tables] ADD CONSTRAINT [DF_SAP_Tables_CUSTOM_Function]
DEFAULT ('') FOR [CUSTOM_Function]
GO
CREATE TABLE [meta].[SAP_Columns](
[TABNAME] [NVARCHAR](50) NULL,
[FIELDNAME] [NVARCHAR](50) NULL
) ON [PRIMARY]
GO
CREATE TABLE [meta].[SAP_DD03L](
[TABNAME] [NVARCHAR](30) NULL,
[FIELDNAME] [NVARCHAR](30) NULL,
[POSITION] [NVARCHAR](4) NULL,
[INTTYPE] [NVARCHAR](1) NULL,
[LENG] [NVARCHAR](6) NULL,
[DECIMALS] [NVARCHAR](6) NULL
) ON [PRIMARY]
GO
CREATE VIEW [meta].[SAP_UseColumns] AS
SELECT *,
CASE WHEN inttype IN ('P') THEN 'Decimal'
WHEN inttype IN ('C', 'N', 'D', 'T', 'F', 'X') THEN 'String'
WHEN inttype IN ('I') THEN 'Int64'
ELSE 'UNKNOWN' END BimlType
FROM (
SELECT SAP.*
FROM [meta].[SAP_DD03L] SAP
INNER JOIN meta.SAP_Columns COL
ON SAP.tabname = COL.tabname AND SAP.FIELDNAME = COL.fieldname
UNION ALL
SELECT SAP.*
FROM [meta].[SAP_DD03L] SAP
LEFT JOIN meta.SAP_Columns COL ON SAP.tabname = COL.tabname
WHERE COL.tabname IS NULL
) a;

Seed the repository with the tables and columns to load. The example below pulls a selected column list from one table and every column from another. The second table also overrides the default extractor:

INSERT INTO [meta].[SAP_Tables] (TABNAME, Custom_Function) VALUES ('MARA', '')
INSERT INTO [meta].[SAP_Tables] (TABNAME, Custom_Function) VALUES ('T001', 'Z_XTRACT_IS_TABLE')

INSERT INTO [meta].[SAP_Columns] (TABNAME, FIELDNAME) VALUES ('MARA', 'MANDT')
INSERT INTO [meta].[SAP_Columns] (TABNAME, FIELDNAME) VALUES ('MARA', 'MATNR')
INSERT INTO [meta].[SAP_Columns] (TABNAME, FIELDNAME) VALUES ('MARA', 'ERSDA')
INSERT INTO [meta].[SAP_Columns] (TABNAME, FIELDNAME) VALUES ('MARA', 'MEINS')

Tier 1: Connections, Database, and Schema

The first Biml file declares the project environment. Two SQL connections share a connection string because the same database holds the metadata and the staged data; the Theobald connection points at the SAP system.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<Connection Name="StgWarehouse"
ConnectionString="Data Source=.;Initial Catalog=SapStaging;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
<Connection Name="MetaRepo"
ConnectionString="Data Source=.;Initial Catalog=SapStaging;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" />
<TheobaldXtractSapConnection Name="SapErp"
ConnectionString="USER=Your_SAP_User PASSWD=Your_SAP_Pwd LANG=EN CLIENT=800 ASHOST=Your_SAP_Host SYSNR=0" />
</Connections>
<Databases>
<Database Name="StgWarehouse" ConnectionName="StgWarehouse" />
</Databases>
<Schemas>
<Schema Name="dbo" DatabaseName="StgWarehouse" />
</Schemas>
</Biml>

Tier 10: Populate the DD03L Cache

The next file generates a package that truncates the dictionary cache and reloads only the entries for the tables in the repository. The Theobald source filters on the SAP side using the table list assembled from 'meta.SAP_Tables':

<#@ template language="VB" optionexplicit="False" tier="10" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<# dim tbls as new List(of String)
for each tbl in ExternalDataAccess.GetDataTable(
Rootnode.Connections("MetaRepo").RenderedConnectionString,
"SELECT DISTINCT TABNAME FROM meta.SAP_Tables").Rows
tbls.add ("'" + tbl(0) + "'")
next
#>
<Packages>
<Package Name="01 Read SAP Meta" ConstraintMode="LinearOnSuccess"
ProtectionLevel="EncryptSensitiveWithUserKey"
PackageSubpath="Prepare_SAP">
<Tasks>
<ExecuteSQL Name="TRUNCATE" ConnectionName="MetaRepo">
<DirectInput>TRUNCATE TABLE meta.SAP_DD03L</DirectInput>
</ExecuteSQL>
<Dataflow Name="Read DD03L">
<Transformations>
<TheobaldXtractSapSource Name="Xtract Table" ConnectionName="SapErp"
CustomFunction="Z_XTRACT_IS_TABLE_COMPRESSION"
Table="DD03L"
WhereClause="TABNAME IN (<#= string.Join(",", tbls) #>) and PRECFIELD = ''">
<Columns>
<Column SourceColumn="TABNAME" SsisDataType="DT_WSTR" Length="30" />
<Column SourceColumn="FIELDNAME" SsisDataType="DT_WSTR" Length="30" />
<Column SourceColumn="POSITION" SsisDataType="DT_WSTR" Length="4" AbapType="n" />
<Column SourceColumn="INTTYPE" SsisDataType="DT_WSTR" Length="1" />
<Column SourceColumn="LENG" SsisDataType="DT_WSTR" Length="6" AbapType="n" />
<Column SourceColumn="DECIMALS" SsisDataType="DT_WSTR" Length="6" AbapType="n" />
</Columns>
</TheobaldXtractSapSource>
<OleDbDestination Name="Dest" ConnectionName="MetaRepo">
<ExternalTableOutput Table="[meta].[SAP_DD03L]" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>

Build this package and run it once before continuing. Every later tier reads from the populated 'meta.SAP_DD03L' table.

Tier 20: Materialize Tables Into Biml With Annotations

With the cache populated, this file projects every requested table into a Biml Table element. Each column carries Annotations that capture the SAP internal type, the matching SSIS data type, and the resolved length. These annotations carry forward to later tiers so the load package builder does not have to reinterpret SAP types.

<#@ template language="VB" optionexplicit="False" tier="20" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Tables>
<# for each tbl in ExternalDataAccess.GetDataTable(
Rootnode.Connections("MetaRepo").RenderedConnectionString,
"SELECT TABNAME, max(CUSTOM_Function) CUSTOM_Function FROM meta.SAP_Tables " &
"where tabname in (select tabname from meta.SAP_DD03L) group by TABNAME").Rows #>
<Table Name="SAP_<#= tbl(0) #>" SchemaName="StgWarehouse.dbo">
<Columns>
<# for each col in ExternalDataAccess.GetDataTable(
Rootnode.Connections("MetaRepo").RenderedConnectionString,
"SELECT * FROM meta.SAP_UseColumns where tabname = '" & tbl(0) & "' order by position").Rows
tabname = col("TABNAME")
ssisType = "DT_WSTR"
length = int(col("leng")) #>
<Column Name="<#= col("FIELDNAME") #>"
<# if col("BimlType") = "String" then #>
DataType="<#= col("BimlType") #>" Length="<#= length #>"
<# elseif col("BimlType") = "Decimal" then
ssisType = "DT_NUMERIC"
length = int(col("leng")) + int(col("decimals")) + 1
if length > 38 then length = 38 #>
DataType="<#= col("BimlType") #>"
Precision="<#= length #>" Scale="<#= int(col("decimals")) #>"
<# elseif col("BimlType") = "Int64" then #>
DataType="<#= col("BimlType") #>"
<# else #>
DataType="UNKNOWN TYPE: <#= col("inttype") #>"
<# end if #>
IsNullable="true">
<Annotations>
<Annotation AnnotationType="Tag" Tag="SAP_IntType"><#= col("IntType") #></Annotation>
<Annotation AnnotationType="Tag" Tag="SsisDataType"><#= ssisType #></Annotation>
<Annotation AnnotationType="Tag" Tag="Length"><#= Length #></Annotation>
</Annotations>
</Column>
<# next #>
</Columns>
<Annotations>
<Annotation AnnotationType="Tag" Tag="SAP_Name"><#= tabname #></Annotation>
<Annotation AnnotationType="Tag" Tag="CUSTOM_Function"><#= tbl("CUSTOM_Function") #></Annotation>
</Annotations>
</Table>
<# next #>
</Tables>
</Biml>

The decimal branch combines length and decimals from SAP and clamps the result at 38, the upper bound for SQL Server NUMERIC.

Tier 30 Part A: Generate the Staging Tables

Once the tables exist as Biml nodes, GetDropAndCreateDdl returns the DDL needed to recreate each one. A single ExecuteSQL task per table is enough:

<#@ template language="VB" optionexplicit="False" tier="30" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<Package Name="02 Create Tables"
ProtectionLevel="EncryptSensitiveWithUserKey"
PackageSubpath="Prepare_SAP">
<Tasks>
<# for each tbl in RootNode.Tables #>
<ExecuteSQL Name="Create <#= tbl.Name #>" ConnectionName="StgWarehouse">
<DirectInput><#= tbl.GetDropAndCreateDdl #></DirectInput>
</ExecuteSQL>
<# next #>
</Tasks>
</Package>
</Packages>
</Biml>

Tier 30 Part B: One Load Package Per Table

The final file emits a TRUNCATE then LOAD package for every table. The CustomFunction defaults to the compression extractor and falls back to whatever is recorded in the table level annotation. Each Theobald source column reads its data type and length straight from the column level annotations seeded in Tier 20:

<#@ template language="VB" optionexplicit="False" tier="30" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
<# for each tbl in RootNode.Tables #>
<Package Name="02_Load_<#= tbl.Name.Replace("/", "_") #>"
ConstraintMode="Linear" PackageSubpath="Load_SAP"
ProtectionLevel="EncryptSensitiveWithUserKey">
<# custom_function = "Z_XTRACT_IS_TABLE_COMPRESSION"
if tbl.gettag("CUSTOM_Function") <> "" then custom_function = tbl.gettag("CUSTOM_Function") #>
<Tasks>
<ExecuteSQL Name="TRUNCATE <#= tbl.Name #>" ConnectionName="StgWarehouse">
<DirectInput>TRUNCATE TABLE [<#= tbl.Name #>]</DirectInput>
</ExecuteSQL>
<Dataflow Name="Read <#= tbl.GetTag("SAP_Name") #>">
<Transformations>
<TheobaldXtractSapSource Name="Xtract Table" ConnectionName="SapErp"
CustomFunction="<#= custom_function #>"
Table="<#= tbl.GetTag("SAP_Name") #>">
<Columns>
<# for each col in tbl.Columns #>
<Column SourceColumn="<#= col.Name #>"
SsisDataType="<#= col.GetTag("SsisDataType") #>"
Length="<#= col.GetTag("Length") #>"
<# if col.GetTag("SAP_IntType").ToLower <> "c" then #>
AbapType="<#= col.GetTag("SAP_IntType").ToLower #>"
<# end if #>
<# if col.Scale > 0 then #>
Decimals="<#= col.Scale #>"
<# end if #> />
<# next #>
</Columns>
</TheobaldXtractSapSource>
<OleDbDestination Name="Dest" ConnectionName="StgWarehouse">
<ExternalTableOutput Table="[<#= tbl.Name #>]" />
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
<Annotations>
<Annotation AnnotationType="Tag" Tag="IsLoad">True</Annotation>
</Annotations>
</Package>
<# next #>
</Packages>
</Biml>

Why the Tier Pattern Matters

The flow runs in three passes because each later file depends on what the previous one produced. Tier 1 declares connections and the database. Tier 10 populates the dictionary cache by running a generated package. Tier 20 reads that cache to project tables and annotations into RootNode. Tier 30 reads RootNode to emit DDL and load packages. The annotations decouple the SAP specific shape from the rest of the project so downstream files keep working in plain Biml without knowing anything about ABAP types.