Skip to main content

Scripting an initial load into PDW

Before the SqlServerPdwDestination tag was introduced in BimlStudio, using Biml to automate building SSIS packages that target PDW was very complex.

With the convenient SqlServerPdwDestination tag, Biml for PDW is very straightforward.

The above Biml script has been tested on BimlStudio targeting PDW v2 Appliance Update 3 (circa December 2014). It requires you have the PDW SSIS destination adapter that matches your version of SSIS and your appliance version (appliance update or AU). The AU3 installers can be found here.

Note that this script is not supported in BimlExpress and requires BimlStudio. Only sources and destinations that are installed with the SSIS installer itself are supported by Biml in BimlExpress. For example, the PDW destination must be installed after installing SSIS.

Note that sys.tables in this example is a straightforward way to get a list of SQL tables, but many times ImportDB or GetSchema may be helpful in that it retrieves much more rich information including columns and data types.

<#@ template language="C#" hostspecific="true"#>
<#@ import namespace="System.Data" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI11.1;Server=localhost;Initial Catalog=YourDB;Integrated Security=SSPI;" CreateInProject="false"/>
<OleDbConnection Name="PdwOledb" ConnectionString="Provider=SQLNCLI11.1;Server=PDW1-CTL01,17001;Initial Catalog=YourDestinationDB;Integrated Security=SSPI;" CreateInProject="false"/>
<SqlServerPdwConnection Name="SQLPDWConnection" CreateInProject="false" ConnectionString="Data Source=PDW1-CTL01,17001;Initial Catalog=YourDestinationDB;Integrated Security=True" StagingDatabase="" LoaderPort="8002" EncryptData="false" DelayValidation="false"/>
</Connections>


<Packages>
<#
string metadataConnectionString = "Provider=SQLNCLI11.1;Server=PDW1-CTL01,17001;Initial Catalog=YourDestinationDB;Integrated Security=SSPI;";
string query = @"SELECT '[' + s.name + '].[' + t.name + ']', s.name + ' ' + t.name
,'select d.* '
+'from [' + s.name + '].[' + t.name + '] d (nolock)'
FROM sys.tables t
INNER JOIN sys.schemas s on t.schema_id = s.schema_id
";
DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString, query);
foreach (DataRow row in tables.Rows)
{ #>
<Package Name="InitialLoad <#=row[1]#>" ConstraintMode="Linear" AutoCreateConfigurationsType="Xml" PackageSubpath="InitialLoad" Type="InitialLoadType">
<Tasks>
<ExecuteSQL ConnectionName="PdwOledb" Name="Truncate destination">
<DirectInput>truncate table <#=row[0]#></DirectInput>
</ExecuteSQL>
<Dataflow Name="Copy Data">
<Transformations>
<OleDbSource Name="Retrieve Data" ConnectionName="Source">
<DirectInput><#=row[2]#></DirectInput>
</OleDbSource>
<SqlServerPdwDestination Name="PDW dest" ConnectionName="SQLPDWConnection" LoadingMode="FastAppend" RollbackOnFailure="false" LocaleId="None" ValidateExternalMetadata="true">
<ExternalTableOutput Table="<#=row[0]#>">
</ExternalTableOutput>
</SqlServerPdwDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
<# } #>
<Package Name="InitialLoad Master" ConstraintMode="Parallel" AutoCreateConfigurationsType="None" PackageSubpath="InitialLoad" Type="InitialLoadType" MaxConcurrentExecutables="10">
<Tasks>
<#
foreach (DataRow row in tables.Rows)
{ #>
<ExecutePackage Name="InitialLoad <#=row[1]#>">
<Package PackageName="InitialLoad <#=row[1]#>"></Package>
</ExecutePackage>
<# } #>
</Tasks>
</Package>
</Packages>
</Biml>