DW-BIAnalytics: Incremental Aggregation

Incremental aggregation is the process of reading new and updated records from the source and adding the pre-aggregated values to the current aggregated values and update the target. After executing the ETL job, we will be getting the final aggregated values in the target.

Let us see an example which explains incremental aggregation. In this I am going to explain incremental aggregation with only new transactions.

Say we have transaction header as –

Txn_Hdr_Id	Cust_Id	Txn_Date_Time	Total_Sales
Txn_Hdr_1	C1	1-Feb-13 10:50:33	4590
Txn_Hdr_2	C2	1-Feb-13 11:29:05	3750
Txn_Hdr_3	C4	1-Feb-13 16:34:22	6790
Txn_Hdr_4	C5	1-Feb-13 17:20:45	90000

Transaction Details -

Txn_Det_Id	Txn_id	Prod_Id	Qty	Sales_Price	Sale_Amount
Txn_Det_1	Txn_Hdr_1	P2	10	250	2500
Txn_Det_2	Txn_Hdr_1	P1	11	190	2090
Txn_Det_3	Txn_Hdr_2	P2	15	250	3750
Txn_Det_4	Txn_Hdr_3	P3	5	1358	6790
Txn_Det_5	Txn_Hdr_4	P4	100	900	90000

Say you have to populate the target table which should hold the total quantities sold everyday by each product. After the first load, the data in the target will be -

Id	Prod_Id	Txn_date	Total_Sold_Quantities
1	P1	1-Feb-13	11
2	P2	1-Feb-13	25
3	P3	1-Feb-13	5
4	P4	1-Feb-13	100

Say we have two more transactions in the next run as follows –

Txn_Hdr_Id	Cust_Id	Txn_Date_Time	Total_Sales
Txn_Hdr_1	C1	1-Feb-13 10:50:33	4590
Txn_Hdr_2	C2	1-Feb-13 11:29:05	3750
Txn_Hdr_3	C4	1-Feb-13 16:34:22	6790
Txn_Hdr_4	C5	1-Feb-13 17:20:45	90000
Txn_Hdr_5	C3	1-Feb-13 18:30:37	9000
Txn_Hdr_6	C2	2-Feb-13 09:11:22	15000

Details for those transactions are -

Txn_Det_Id	Txn_id	Prod_Id	Qty	Sales_Price	Sale_Amount
Txn_Det_1	Txn_Hdr_1	P2	10	250	2500
Txn_Det_2	Txn_Hdr_1	P1	11	190	2090
Txn_Det_3	Txn_Hdr_2	P2	15	250	3750
Txn_Det_4	Txn_Hdr_3	P3	5	1358	6790
Txn_Det_5	Txn_Hdr_4	P4	100	900	90000
Txn_Det_6	Txn_Hdr_5	P2	25	250	6250
Txn_Det_7	Txn_Hdr_5	P5	5	550	2750
Txn_Det_8	Txn_Hdr_6	P4	100	150	15000

If we observe the data, we have new transactions in the transaction header table (which are highlighted in blue color and 4 old transactions of previous run (which are highlighted in red color). There are 25 more quantities sold for the product P2 after the first run on 1^st Feb2013 and there are new transactions for products P5 and P4.

The logic in performing incremental extraction when we are running the ETL job in the next run should be -

· We have to ignore old and unchanged records.

· Read new transactions.

· Perform aggregations on the extracted transactions.

· Get pre-aggregated values from the target.

· If summarized data is available for the product in the target, we have to add the current aggregated values to pre-aggregated values and update the target. Otherwise, we have to insert those records into the Summary table.

After the current execution of ETL, we have to add current run’s P2 quantities (25) with old value (25) and insert P4 and P5. Summary table will have the following data -

Id	Prod_Id	Txn_date	Total_Sold_Quantities
1	P1	1-Feb-13	11
2	P2	1-Feb-13	50
3	P3	1-Feb-13	5
4	P4	1-Feb-13	100
5	P5	1-Feb-13	5
6	P4	2-Feb-13	100

Please note that we have to perform incremental extraction to perform incremental aggregation.

I hope I have given an overall picture of incremental aggregation. Your suggestions and Comments are highly appreciated.

Happy Learning.

DW-BIAnalytics

Saturday, 20 June 2015

Incremental Aggregation

1 comment: