BimlFlex Configurations and the Data Mart
BimlFlex Configurations are used to apply and drive default behavior across multiple data architecture concepts. At a high level, Configuration values control where/when they get applied and how they are calculated.
Tip
For additional details on managing Configuration, Attributes and Configuration Overrides refer to the below guides:
Common Configurations
The below tabs outline common Configurations used in the Data Mart, along with recommended field values. These are the recommendations only if choosing to enable the Configuration(s) and if no prior standard is in place. These can be adjusted to fit an organization's specific data standards.
Configuration | Description |
---|---|
RowStartDate | Defines the start of time definition for a row in the data warehouse. |
RowEndDate | Defines the end of time definition for a row in the data warehouse. |
RowIsCurrent | Flag to set if the row is current. |
RowIsInferred * |
Flag to set if the row is inferred. |
RowHashType1 ** |
Defines the expression used to derive a row hash for type 1 attributes. |
RowHashType2 *** |
Defines the expression used to derive a row hash for type 2 attributes. |
Note
*: Requires the Setting DmInferDim
to be enabled prior to use.
**: Only used if there is a Column with a CHANGE TYPE of Type 1
in the Object.
***: Only used if there is a Column with a CHANGE TYPE of Type 2
in the Object.
Setting a Configuration for the Data Mart
All Configurations are global and set in the Configuration Editor. These can be enabled and disabled for each data architecture concept.
Warning
Configurations are global, so be aware of prior Configurations before changing any of the mentioned values.
Some Configurations, such as RowIsCurrent
, may already be in use by the Staging or Data Vault Integration Stages.
Any changes to a field value will impact all Integration Stages where the Configuration is enabled.
If logic needs to be different between Integration Layers consider a project level Configuration Override.
Enabling A Configuration
In order for a Configuration to be applied the associated Concept
ATTRIBUTE will need to be configured.
For the Data Mart this is either the DIM ATTRIBUTE or FACT ATTRIBUTE fields for a Dimension or Fact respectively.
Important
The RowIsInferred
Configuration will not be used unless the DmInferDim
Setting is enabled.
When enabling the DmInferDim
Setting ensure that the RowIsInferred
Configuration is also enabled for DIM ATTRIBUTE.
Once enabled, the associated Configuration will be applied to every instance of the associated concept, unless there is an associated Setting preventing this. If required, a Configuration Override can be applied at either a Connection, Batch, Project or Object level.
Tip
For additional details on Configurations and their definitions refer to the below guide:
Overriding Configurations
There are often many factors that go into a Data Warehouse and it's environment. Not everything is always under the project team's control or specifications. It isn't uncommon to have to maintain support for an legacy system while migrating to a new tool or standard.
When a scenarios requires a differing or conflicting Configuration, a Configuration Override can be applied. A Configuration Override can be configured at the Connection, Batch, Project or Object level.
The Configuration itself is applied at an object level, so a column level override is not applicable or supported.
Important
Overriding a Configuration is currently only supported when using a SQL Based ELT
Connection.
A Configuration can be quickly added from Configuration Editor.
Clicking this button will open a Add Attribute Dialog that is populated with the required ATTRIBUTE value.
Next the level of the Configuration Override needs to be set by assigning ATTRIBUTE TYPE (and any supporting fields if required).
Enter the value to be used as the Configuration Override in the ATTRIBUTE VALUE field.
Don't forget to click the Save Button to record the changes.
Checking if an Override Exists
A Configuration Override is persisted as a unique type of Attribute. Any location in BimlFlex that would list out the associated Attributes will also list the Configuration Overrides along with the Standard Attributes.
It is recommended that when specifically looking for a Configuration Override that the the Configuration Editor is used. This will give a complete view of all Configuration Overrides for a specific Configuration.
Examples are given below that represents how a Configuration Override would be shown across the Attributes Tab. Overrides have been defined at each supported level to better communicate which Configuration Override will appear in each associated entity editor.
Note
A Configuration Override can be identified as an Attribute with a {ConfigurationKey}_{FieldName}
naming pattern.
Outside of the Configuration Editor this will have to be manually identified as they will be displayed alongside Standard Attributes.
The value assigned to the override is entered into the ATTRIBUTE VALUE field. This is different from a Standard Attribute which follows a different pattern.
EXAMPLES:
Configuration Override for a
RowStartDate
's SQL SOURCE EXPRESSION:RowStartDate_SqlSourceExpression
Configuration Override for a
RowIsCurrent
's DATA TYPE:RowIsCurrent_ConfigurationDataType
Note
The values used in the screenshots in the tabs above are show for example only. They do not represent valid code or recommended practice.
General Implementation Notes
All Data Mart Configuration Overrides should be placed on the Data Mart Entity itself.
When implementing an override for a single dimension, it would be on the Object with the Dimension
OBJECT TYPE.
When using a higher level override it would be one containing the Fact
or Dimension
OBJECT TYPE requiring the override.
If multiple identical Configuration Overrides are required and they all share the same naming patterns/rules consolidation may be possible through use of a Batch, Project, or Connection level override. Please note that all Objects contained within the scope of the effected entity will receive the applied Configuration Override. Creation and management of a separate BimlFlex Project to manage is useful if there is a large group of tables requiring alternate logic, but impractical if there is only a few exceptions or alternating logic.
Important
It is important to always keep standardization in mind as the primary goal.
The constant creation of Configuration Overrides can increase the complexity of the environment and introduce additional troubleshooting costs.
A Configuration Override should be used with care to handle exceptions.
The base Configuration should be used to handle standards.
Example Scenarios
To better communicate the use of Configuration Overrides in practice, the functional scenarios below have been provided. These cover a few real world examples that require the use of a Configuration Override along with the steps required to implement them.
Overriding Type 2 Logic
Scenario
The business uses another source handling the management of effectivity dates for a specific dimension.
The business rules for effectivity dates are identical to BimlFlex besides
RowStartDate
being derived from another process PRIOR to loading the Data Mart.These effectivity dates should be used in place of the ones BimlFlex would normally generate on load of the Data Mart.
This is a scenario that only applies to a single dimension and all other dimensions with a Type 2 Column should use the standard pattern of terminating effectivity on load of the dimension.
Scenario 1 Implementation
There are two basic ways to handle the scenario.
Method A is the most versatile and is useful when there is no ability to control the name of the incoming column. If the source feed for the Data Mart is a physical table this is a primary candidate for implementation.
Method B requires the incoming column to be named the same as the CONFIGURATION VALUE field for the Configuration. If the source feed is a view created/controlled by the developers this is a primary candidate for implementation.
Each method requires only a single Configuration Override (per column) and performs an identical function.
Note
These methods will exclude a Configuration Override for the RowEndDate
and RowIsCurrent
columns due to the business rules for effectivity being identical.
When supplied with only the RowStartDate
, BimlFlex will automatically calculate RowEndDate
and RowIsCurrent
identical to how the environment is currently configured.
If business rules for effectivity where different, a Configuration Override for RowEndDate
and RowIsCurrent
could also be applied.
Repeat the chosen method for each Configuration as needed.
The following method assumes the DIM ATTRIBUTE is already configured to Derived
for any of the applicable Configurations being overridden.
This is the recommended setting and is outlined Common Configurations - Attributes table above.
Note
If this DIM ATTRIBUTE is set to a value besides Derived
, an additional Configuration Override will be needed for set the DIM ATTRIBUTE to Derived
.
The steps from Method B
can be used simply by substituting and reference to Source
with Derived
.
When the DIM ATTRIBUTE is set to Derived
the SQL SOURCE EXPRESSION can be substituted with a column reference.
Create a Standard Override Configuration for SQL SOURCE EXPRESSION with the following values.
Field Name | Value | Example |
---|---|---|
Attribute Type | Object | Object |
Connection | {DimensionConnection} | BFX_DM |
Object | {DimensionName} | dim.Type2_History |
Attribute | RowStartDate_SqlSourceExpression | RowStartDate_SqlSourceExpression |
Attribute Value | {QualifiedColumnName} | [BeginDate] |
Note
{ ... }
indicate a variable value.
This is all that is required and your Dimension should now be ready to accepting and version incoming Type 2 Data.
Note
The column being referenced in the SQL SOURCE EXPRESSION is not required to have BimlFlex Column metadata. All that is required is for the column (or columns/values used in the expression) to be available on the Source.
One Time Historic Load
Scenario
The business is migrating creation and maintenance of an existing Type 2 Dimension to an identical one to be maintained in BimlFlex.
Historic data for the dimensional attributes should be preserved, though the specific Surrogate Key values do not need to be maintained and can be regenerated.
Once historic data is loaded the standard pattern of terminating effectivity on load should be applied.
Scenario 2 Implementation
To start with, two data feeds will be required, the Historical Data (or an Existing Dimension) and the Source Data/Query for continual dimensional processing. The general process will involved proceeding as if implementing Overriding Type 2 Logic and processing the job. After initial load of the historic data, the Configuration Override(s) are removed and the sourcing Object is then adjusted to point to new processing feed.
This method outlined below requires that both the Source and Target schema are identical. If the schemas are not aligned then modifications should be made to the either the initial historical load or the final processing to align the schemas. It is recommended that if adjustments need to happen that a view be created (or adjusted) to handle any logic adjustments as needed. This helps insolate Data Mart and improve resilience to unforeseen change and adjust without requiring the reconstruction of Data Mart processing routines.
Tip
It recommended that you persist the SQL used to create the view in the SOURCE CREATE SQL field for the Object. BimlFlex will use this SQL in the Create Table Scripts process to deploy it alongside any other Database artifacts required.
If the environment does not support the creation of a view for handle the initial load, BimlFlex features such as Data Type Mappings, Derived Columns, or a SQL SOURCE EXPRESSION/SSIS DATAFLOW EXPRESSION could be used.
Loading Historic Data
The Historical Data will need to include all the dimensional attributes along with any of the effectivity dates requiring override.
As with Scenario 1, the only values required are the RowStartDate
values.
BimlFlex will implement the RowEndDate
and RowIsCurrent
automatically after the first load.
Using the existing as a Source, Import the Metadata for the Dimension. If any metadata columns (effectivity dates and current flag) made it into the initial import of metadata, they should be removed now. The only columns that should be present are the attributes we want to populate the Dimension with.
Follow the normal process and ensure that the Dimension is created and mapped as normal.
Apply Configuration Overrides identical to as outlined above in Overriding Type 2 Logic.
Run the process and ensure all Dimension history is created correctly. Once the load has been verified the next step can be performed.
Adjust Source Object
This step involves adjusting of the currently map source Object for the Dimension to point to the new feed. Adjust the PROJECT, CONNECTION, SCHEMA, OBJECT NAME, and SOURCE CREATE SQL (or any other SQL OVERRIDES) as needed.
Important
If the Source Object has a change to the PROJECT, CONNECTION, SCHEMA, or OBJECT NAME will likely trigger the creation of a new and separate job (Package/Pipeline). At the same time this will result in the old job being left as an orphaned artifact in the effected orchestration environment.
BimlFlex does not remove tables, packages or pipelines from environment by default. Objects are only adjusted or added based on the generated artifacts name. Check your system for the following potential orphaned artifacts and address as needed.
- View (Used to load history)
- Staging Table
- Stored Procedure
- DTSX Package
- ADF Pipeline
Naming pattern for these artifacts differ between environments but generally involve the use of the PROJECT, CONNECTION, SCHEMA and OBJECT NAME. When checking for orphaned artifacts ensure you are looking for the prior values, not the current assigned values.
The above will only happen if change has been applied to a field used in currently configured naming patterns. If no changes were made to the fields used in naming of the artifacts then no orphaned artifacts should be created.
Remove any Configuration Override(s) used to load history. Ensure that any logic used to interpret effectivity dates is removed as the values will be derived on processing of the Type 2 Dimension.
Important
If only adjusting the SOURCE CREATE SQL ensure that changes are deployed prior to executing the adjusted process.
The new source view should be without metadata columns and verified that it will only return the most recent record for a given Integration Key (IK)
Warning
The grain of the load has shifted.
When BimlFlex performs standard Type 2 processing, only the current record should be returned. It is this record that is then compared against the current record held in the Dimension to see if a delta needs to be processed.
This is different from the loading of the history which may have had multiple records for a given IK across multiple effectivity dates.
The new process and schema are now ready to be built and deployed. Once deployed it should be available to maintain and record any future Type 2 Dimension changes.