Tuesday 25 July 2017

Slowly Changing Dimension Type 2 in Informatica


We will see the mapping for populating SCD type 2 dimension using Informatica tool.
Slowly changing Dimension Type 2:
·         It contains full history.
·         It contains current and previous records in type 2 dimension.
·         If there are any new records in the source (records which are in source but, not in the target) – We have to insert into the target.
·         If there are any updated records (Records which exists in both source and target – and there are  any change in any of the critical columns) – we have to insert them into the target.

·         When we are inserting the updated records in the dimension, we may get the repetitions of the key. Database will through an exception – Unique Constraint violation.

·         In order to avoid unique constraint violation, we have to have surrogate key in the target. That can be a composite key (combination of key and effective date time or a new column with a sequence (numeric or alpha numeric).

In our current scenario, I am going to take a copy of EMP table which is in SCOTT schema (it is the default schema comes with oracle software) and load into Slowly changing Dimension type 2.

EMP_DIM_SRC going to be the source for the mapping. I am going to insert deptno 10 and 20 records into the source.

Here is the DDL for creating the table –
 CREATE TABLE EMP_DIM_SRC as select * from SCOTT.EMP where DEPTNO in (10,20);

Here is the definition –
SQL> Desc emp_dim_Src
 Name                                                              Null?    Type
 ----------------------------------------------------------------- -------- -------------
 EMPNO                                                                      NUMBER(4)
 ENAME                                                                      VARCHAR2(10)
 JOB                                                                        VARCHAR2(9)
 MGR                                                                        NUMBER(4)
 HIREDATE                                                                   DATE
 SAL                                                                        NUMBER(7,2)
 COMM                                                                       NUMBER(7,2)
 DEPTNO                                                                     NUMBER(2)


Let us create the target table – EMP_DIM2_TGT with SID(an extra column which is not in the source) and Inserted date time and effective date time.

Here is the DDL for the above –

CREATE table EMP_DIM2_TGT as select empno as SID, empno, ename, JOB, SAL, COMM, DEPTNO from emp_dim_src where 1=2;

Lets add 2 audit columns – Inserted_Date_Time and Effective_Date_Time.

ALTER TABLE EMP_DIM2_TGT add (Inserted_Date_Time timestamp(0), Effective_Date_Time Timestamp(0));

Here is the structure of Target – EMP_DIM2_TGT

SQL> DESC EMP_DIM2_TGT
 Name                                                              Null?    Type
 ----------------------------------------------------------------- -------- -------------
 SID                                                                              NUMBER(4)
 EMPNO                                                                      NUMBER(4)
 ENAME                                                                      VARCHAR2(10)
 JOB                                                                             VARCHAR2(9)
 SAL                                                                             NUMBER(7,2)
 COMM                                                                       NUMBER(7,2)
 DEPTNO                                                                     NUMBER(2)
 INSERTED_DATE_TIME                                           TIMESTAMP(0)
 EFFECTIVE_DATE_TIME                                          TIMESTAMP(0)

Let us import source and Target in Informatica.

Steps in creating SCD Type 2 mapping –
·         Import the source (EMP_DIM_SRC) from Oracle Database
·         Import the Target (EMP_DIM2_TGT) from Oracle Database
·         Create the Mapping with the name – M_Emp_Type2_Dimension_Demo
·         Drag and Drop the source from the folder.
·         Take Lookup Transformation and import the EMP_DIM2_TGT as lookup source which is in the targets of Informatica Folder.
·         Drag and Drop EMPNO (lookup source) from Source into Lookup transformation.
·         Edit Lookup Transformation à go to properties tab à add a condition.
·         Select Use Last Value as the Value in the lookup policy on multiple match property to get the last occurrence of Employee record from the lookup source.
·         Based on the output of lookup transformation, we are going to flag the records to identify New and Updated Records. If the map source record exists in the lookup source, it will return some value else it returns NULL value. This is going to be the key in populating the dimension. 
·         Take Expression Transformation into the mapping. Drag and Drop the columns which are needed in the target. And the lookup key and the critical columns from the lookup transformation into expression transformation.
·         Take Inserted_Date_Time from the lookup into Expression.
·         Create a port to hold sysdate to populate the inserted date time for the new records.  
·         Create a port to flag (o_New_Rec_Flag) the new records – Condition is – IIF(ISNULL(L_EMPNO),0,1)
·         Create a port to flag (o_Updated_Rec_Flag) updated records – Condition is – IIF(NOT ISNULL(L_EMPNO) AND (L_JOB <> JOB or L_SAL<>SAL or L_DEPTNO<>DEPTNO),0,1)
·         Uncheck unwanted output ports.
·         Take router transformation to get the new records and updated records.
·         Drag and drop columns into router along with flags.
·         Create 2 groups- one for filtering new records and the other one for updated records.
·         Take a target (EMP_DIM2_TGT) instance and map the new records group to it.
·         Take one more instance of the target to map the updated records as inserts.
·         Take third instance of the target to update the previous record’s effective date with the current record’s inserted date time.
·         Take SID from Lookup to expression, from expression to router, This is used to update the previous record’s effective date.
·         Take update strategy transformation into mapping and map SID and o_inserted_Date_Time to the 3rd target instance.
·         Give DD_UPDATE in the update strategy expression of update strategy transformation.
·         Take Sequence Generator transformation to the mapping and connect NEXTVAL from Sequence Generator to SID of first and the second target instance.
·         Validate the mapping
·         Save the mapping.
·         Create the session.
·         Edit the session and Set the source and target relational connections in the mapping tab of session.
·         Validate the session
·         Save the session
·         Create the workflow.
·         Connect the session to the workflow.
·         Validate the workflow
·         Save the work low
·         Start workflow.

·         Here is the screenshot for the mapping - 




Lets insert some more records in emp_dim_src table.

Insert into EMP_DIM_SRC ( select * from SCOTT.EMP where DEPTNO=30);
Commit;

Lets start the workflow again.

We have 14 records in the source and we will see 14 in the target.

If we start the workflow again, we should not get any records in the target in the current run.

Lets update a record.

Update emp_dim_src set deptno=10 where empno=7369;
Commit;

Lets start the workflow. We will see this updated record as a new record in the target and it has to update 7369’s first record’s effective date.

It has not updated the previous record since we have not created a primary key in the target.

Lets add the primary key to the target.

SQL> alter table emp_dim2_tgt add primary key (SID);
Table altered.


And then start the workflow with one more update.

That’s it. We are done with populating SCD Type 2.


Thanks…

No comments:

Post a Comment