What is the first thing that comes to your mind as a Microsoft BI expert, when you think of a data migration tool? Undoubtedly SSIS!
As you all know, SSIS is a great tool for migrating data from legacy systems and databaseas into newer and more robust places and repositories to better serve knowledge and data workers.
Yesterday, I was assigned to a task which was about getting the list of custoemrs from SQL Server 2000 into SQL Server 2008, using SSIS 2008. It didn’t seem something extraordinary at first: all I needed to do was to get an OLE DB source into my Data Flow Task, have a lookup table to filter existing records in the table, and pass the new rows into the destination table. I then realized that not all the customers need to be migrated, and there is a stored procedure written in SQL Server 2000 to return the list of the customers that should be migrated. The SP works this way: gets the Id of a principal as input, and extracts the list of customers for the principal based on some pre-defined business logic.
What I needed to be able to do was to pass the principal ID as my package’s parameter to SP, and use the result returned by SP in my source query’s WHERE clause. Passing parameter to package and then use it in t-sql query is nothing difficult. It gets messy when you need to use that parameter in a nested query: using IN keyword in the main query and using SP in the internal query to build a temporary table for IN keyword.
I have only one word for doing this thing: Impossible!
I tried different types of calling a SP in another query’s WHERE clause. I even got the SP’s code and put it into my t-sql query. But as you may know, SSIS’s OLE DB source does not support parameters in nested queries.
So, I tried to design a strategy for doing this. A strategy to get rid of the parameterized nested query.
What I did is, I added a Execute SQL Task to my package, which gets executed before my Data Flow task. The role of this Execute SQL Task is to get the list of customer IDs from the source database by executing the SP, and store them in a temporary table in destination SQL Server 2008 using this piece of code:
IF Object_Id('TemporaryTablet', 'U') IS NOT NULL
DROP TABLE TemporaryTable
CREATE TABLE TemporaryTable(CustID int)
INSERT INTO TemporaryTable EXECUTE dbo.Sql2000SP ? --? is the parameter passed to SP
Then, I connect this Execute SQL Task to my Data Flow task. This way, the new Data Flow task is not paramtereized anymore because the parameter, which is the principal ID, is used in previous step to pull the list of customers into the temporary table using Execute SQL Task. This is the code that gets data in my OLE DB Source:
SELECT Column1, Column2, Column3
WHERE CC.CustId IN (SELECT * FROM TemporaryTable)
Bingo! It works without any problem: there is no parameter to be passed to SP or the nested query to replicate SP’s logic in OLE DB Source! 🙂