We will see
the mapping for populating SCD type 2 dimension using Informatica tool.
Slowly
changing Dimension Type 2:
·
It
contains full history.
·
It
contains current and previous records in type 2 dimension.
·
If
there are any new records in the source (records which are in source but, not
in the target) – We have to insert into the target.
·
If
there are any updated records (Records which exists in both source and target –
and there are any change in any of the
critical columns) – we have to insert them into the target.
·
When
we are inserting the updated records in the dimension, we may get the
repetitions of the key. Database will through an exception – Unique Constraint
violation.
·
In
order to avoid unique constraint violation, we have to have surrogate key in
the target. That can be a composite key (combination of key and effective date
time or a new column with a sequence (numeric or alpha numeric).
In our
current scenario, I am going to take a copy of EMP table which is in SCOTT
schema (it is the default schema comes with oracle software) and load into
Slowly changing Dimension type 2.
EMP_DIM_SRC
going to be the source for the mapping. I am going to insert deptno 10 and 20
records into the source.
Here is the
DDL for creating the table –
CREATE TABLE EMP_DIM_SRC as select * from
SCOTT.EMP where DEPTNO in (10,20);
Here is the
definition –
SQL> Desc
emp_dim_Src
Name
Null? Type
-----------------------------------------------------------------
-------- -------------
EMPNO
NUMBER(4)
ENAME
VARCHAR2(10)
JOB
VARCHAR2(9)
MGR
NUMBER(4)
HIREDATE
DATE
SAL
NUMBER(7,2)
COMM
NUMBER(7,2)
DEPTNO
NUMBER(2)
Let us
create the target table – EMP_DIM2_TGT with SID(an extra column which is not in
the source) and Inserted date time and effective date time.
Here is the
DDL for the above –
CREATE table
EMP_DIM2_TGT as select empno as SID, empno, ename, JOB, SAL, COMM, DEPTNO from
emp_dim_src where 1=2;
Lets add 2
audit columns – Inserted_Date_Time and Effective_Date_Time.
ALTER TABLE
EMP_DIM2_TGT add (Inserted_Date_Time timestamp(0), Effective_Date_Time
Timestamp(0));
Here is the
structure of Target – EMP_DIM2_TGT
SQL> DESC
EMP_DIM2_TGT
Name
Null? Type
-----------------------------------------------------------------
-------- -------------
SID NUMBER(4)
EMPNO
NUMBER(4)
ENAME
VARCHAR2(10)
JOB VARCHAR2(9)
SAL
NUMBER(7,2)
COMM
NUMBER(7,2)
DEPTNO NUMBER(2)
INSERTED_DATE_TIME
TIMESTAMP(0)
EFFECTIVE_DATE_TIME
TIMESTAMP(0)
Let us
import source and Target in Informatica.
Steps in
creating SCD Type 2 mapping –
·
Import
the source (EMP_DIM_SRC) from Oracle Database
·
Import
the Target (EMP_DIM2_TGT) from Oracle Database
·
Create
the Mapping with the name – M_Emp_Type2_Dimension_Demo
·
Drag
and Drop the source from the folder.
·
Take
Lookup Transformation and import the EMP_DIM2_TGT as lookup source which is in
the targets of Informatica Folder.
·
Drag
and Drop EMPNO (lookup source) from Source into Lookup transformation.
·
Edit
Lookup Transformation à go to properties tab à add a condition.
·
Select
Use Last Value as the Value in the lookup policy on multiple match property to
get the last occurrence of Employee record from the lookup source.
·
Based
on the output of lookup transformation, we are going to flag the records to identify
New and Updated Records. If the map source record exists in the lookup source,
it will return some value else it returns NULL value. This is going to be the
key in populating the dimension.
·
Take
Expression Transformation into the mapping. Drag and Drop the columns which are
needed in the target. And the lookup key and the critical columns from the
lookup transformation into expression transformation.
·
Take
Inserted_Date_Time from the lookup into Expression.
·
Create
a port to hold sysdate to populate the inserted date time for the new records.
·
Create
a port to flag (o_New_Rec_Flag) the new records – Condition is –
IIF(ISNULL(L_EMPNO),0,1)
·
Create
a port to flag (o_Updated_Rec_Flag) updated records – Condition is – IIF(NOT
ISNULL(L_EMPNO) AND (L_JOB <> JOB or L_SAL<>SAL or
L_DEPTNO<>DEPTNO),0,1)
·
Uncheck
unwanted output ports.
·
Take
router transformation to get the new records and updated records.
·
Drag
and drop columns into router along with flags.
·
Create
2 groups- one for filtering new records and the other one for updated records.
·
Take
a target (EMP_DIM2_TGT) instance and map the new records group to it.
·
Take
one more instance of the target to map the updated records as inserts.
·
Take
third instance of the target to update the previous record’s effective date
with the current record’s inserted date time.
·
Take
SID from Lookup to expression, from expression to router, This is used to
update the previous record’s effective date.
·
Take
update strategy transformation into mapping and map SID and o_inserted_Date_Time
to the 3rd target instance.
·
Give
DD_UPDATE in the update strategy expression of update strategy transformation.
·
Take
Sequence Generator transformation to the mapping and connect NEXTVAL from
Sequence Generator to SID of first and the second target instance.
·
Validate
the mapping
·
Save
the mapping.
·
Create
the session.
·
Edit
the session and Set the source and target relational connections in the mapping
tab of session.
·
Validate
the session
·
Save
the session
·
Create
the workflow.
·
Connect
the session to the workflow.
·
Validate
the workflow
·
Save
the work low
·
Start
workflow.
·
Here is the screenshot for the mapping -
Lets insert
some more records in emp_dim_src table.
Insert into
EMP_DIM_SRC ( select * from SCOTT.EMP where DEPTNO=30);
Commit;
Lets start
the workflow again.
We have 14
records in the source and we will see 14 in the target.
If we start
the workflow again, we should not get any records in the target in the current
run.
Lets update
a record.
Update
emp_dim_src set deptno=10 where empno=7369;
Commit;
Lets start
the workflow. We will see this updated record as a new record in the target and
it has to update 7369’s first record’s effective date.
It has not
updated the previous record since we have not created a primary key in the
target.
Lets add the
primary key to the target.
SQL> alter table emp_dim2_tgt add
primary key (SID);
Table altered.
And then
start the workflow with one more update.
That’s it. We are done with
populating SCD Type 2.
Thanks…
No comments:
Post a Comment