Monthly Archives: June 2011

Populating Data Warehouse Using SSIS

ETL tools are meant to be used for moving data from a source to a destination. They make life of a data warehouse or BI developer easy by providing easy to use user interfaces and many functionalities to cleanse data before loading into destination. But the situation is not always this easy, especially when moving data from production database (OLTP) to the data warehouse: you need to reference data in both of the different databases to be able to populate your DW with correct data, which generally refers to substituting dimension ID columns with the “Surrogate Keys” that are pre-loaded into data warehouse. Let’s have a look at different choices a developer has for referencing multiple databases in his SSIS package:

1- Using Linked Servers: The first solution that might occur to a developer could be referencing the tables in both databases at the query level, by doing a JOIN between them and stating the databases’ names before each table’s name to differentiate tables from different sources. This needs a linked server been set up between the 2 referenced databases. You can check this pageif you like to learn how to create a linked server.Let’s see an example of the queries that can be used when having a linked server. Suppose that the production database instance is named OLTP and the instance of database that is acting as data warehouse is named DW. Let’s assume there are 2 tables in our OLTP database, with following relationship:

When pulling data from this database into DW, the dimension tables’ IDs (in this case, SalesPersonID) must be replaced with surrogate keys:

If there is a linked server set up between OLTP and DW servers of the above example, then the query that populates SalesFact table in DW looks like this:

SELECT       ‘Amount’ = s.Amount
,’SalesPersonKey’ = dsp.SalesPersonKey
FROM         [OLTP].[dbo].Sales s
INNER JOIN [DW].[dbo].DimSalesPerson dsp ON s.SalesPersonID =              dsp.SalesPersonFullAlternateKey

2- SSIS Lookup Transformation: In cases when it is not possible to create a linked server between the two data sources for reasons like security concerns, SSIS’s Lookup transformation can be used. Lookup transformation is capable of doing many things for a BI developer, from adding new records to a destination table to replacing the columns from different tables. To replace a column from one table with a column from another one using Lookup transformation, all you need to do is to tick the column from referencing table in columns tab of your Lookup transformation:

    You can even replace the old column (SalesPersonID) with the new one (SalesPersonKey) by setting the Lookup Operation column:

This mechanism could be used to replace many IDs with the appropriate Surrogate Keys by using a waterfall of lookups, one for each dimension.