5.24.1. Historized Slowly Changing Dimensions

<< Click to Display Table of Contents >>

Navigation:  5. Detailed description of the Actions > 5.24. TA - Data Engineering >

5.24.1. Historized Slowly Changing Dimensions

 

Icon: clip1182

 

Function: SlowlyChangingDimensions

 

Property window:

 

clip1183

 

Short description:

 

Create a historized view of a SCD table.

 

Long Description:

 

When engaging in predictive analytic activities, we must be able to “see” any table as it was at a specific point in time in the past (i.e. in technical terms, we must see a table at the “observation date”). Unfortunately, most operational systems do not keep an history of their tables and “going back in time” is almost never possible…  …unless you use the Anatella “Slowly Changing Dimensions” Action!

 

This Action creates an output table that keeps tracks of all the changes made to the specified (database) table. This process is typically referred as “Keeping a historical view of a Slowly Changing Dimension”: you’ll find more details on this process here:

 

https://en.wikipedia.org/wiki/Slowly_changing_dimension

 

Typically, you run this action every day (or week) and it will look for all the changes inside an input table and logs them all inside one output table.

 
According to wikipedia, there exists 6 different techniques to handle slowly changing dimensions:

 

SCD Type

Description

Type - 0

Retaining Original: you just get one dump

Type - 1

Overwriting: you overwrite the dimension with the last version

Type - 2

Adding new rows (you keep history as rows)

Type - 3

Adding columns (you keep history as extra columns)

Type - 4

Adding new records on new table (History Table)

Type - 6

Hybrid approach (Type 1 + Type 2 + Type 3) ß  used here

 

Here we will be using a ‘Type - 6" implementation including :

 

A surrogate (or technical) key as unique id: This is a composite key composed of the columns “SCD_Key” and “SCD_Seq”.

Active line indicator: “SCD_current_flag”

Start date and End date: [SCD_from_date, SCD_to_date[

Seq number of history: “SCD_Seq”

A very basic deactive flag  (only when the parameter P3 is “contains the full table”): “SCD_active”

 

The parameter P4 is the name of a unique key (e.g. customer key) inside the input table. This key must be unique in the input file.

 

Ideally, the input file should also contain a column that contains the date of the last update of a row (the date format is yyyyMMdd). The name of this “lastUpdateDate” column is given in the parameter P7. If you don’t have such a column, no worries: This is a very common situation: Just set the parameter P6 to “…does not contains a "lastUpdateTime" column (use the current date instead)”.

 

The output table contains these additional columns (in addition to all the columns from the input table):

 

SCD_Key

Logical key of the slowly changing dimension. Equal to the key (e.g. customer key) in the input file

SCD_seq

The sequence of statuses… first status = 1… next update = 2 …

SCD_from_date

Start moment of the validity for the record (YYYYMMDD)

SCD_to_Date

End date of the validity (YYYYMMDD)

SCD_current_flag

1 if the record is the current active record

0 if the record is a historic record

SCD_active

This is a technical active flag only relevant when the parameter P3 is “contains the full table”):

    A = active

    D = deleted

If the primary key is in the input file, then the status is set to A. If the primary key is missing in the input file the status will be ‘D’.