I spent almost the whole last week and the first 2 days of this week trying to improve my BI solutions’ performance. In my quest on learning the tricks to make my package faster, I came across SSIS Performance Design Patterns video by Matt Masson. A comprehensive discussion indeed, that I’m gonna list in vrief here:
1- Utilize parallelism: It is easy to utilize parallelism in SSIS. All you need to do is to recognize which Data Flow Tasks (DFTs) could be started at the same time and set the control flow constraints of your package in the way that they all can run simultaneously.
2- Synchronous vs. Asynchronous components: A synchronous transformation of SSIS takes a buffer, processes the buffer, and passes the result through without waiting for the next buffer to come in. On the other hand, an asynchronous transformation needs to process all its input data to be able to give out any output. This can case serious performance issues when the size of the input data to the async. transformation is too big to fit into memory and needs to be transferred to HDD at multiple stages.
3- Execution tree: An execution tree starts where a buffer starts and ends where the same buffer ends. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new buffer and may execute on a different thread. When a new buffer is created such as when a partially blocking or blocking transformation is added to the pipeline, additional memory is required to handle the data transformation; however, it is important to note that each new tree may also give you an additional worker thread.
4-OLE DB Command transformation: OLE DB Command is a row-by-row transformation, meaning that it runs the command in it on each one of its input rows. This make sit to be damn too slow when the number of the rows goes up. The solution for boosting performance is to stage data into a temporary table and use Execute SQL Task outside that DFT.
5- SQL Server Destination vs. OLE DB Destination: There are multiple reason why to use OLE DB Destination and not use SQL Server Destination:
- OLE DB Destination is mostly faster,
- OLE DB Destination is a lot clearer when it fails (The error message is more helpful),
- SQL Server Destination works only when SSIS is installed on the destination server.
6- Change Data Capture (CDC): Try to reduce the amount of data to be transferred to the maximum level you can, and do it as close to the source as you can. A Modified On column on the source table(s) helps a lot in this case.
7- Slowly Changing Dimension (SCD) transformation: There is only one advice about SSIS’s Slowly Changing Dimension transformation, and that is get rid of it! The reasons are:
- It doesn’t use any cached data, and goes to the data source every single time it is called,
- It uses many OLE DB Command transformations,
- Fast Data Load is off by default on its OLE DB Destination.
I recommend you to go and watch the whole video if you have enough time. All these topics are discussed more in details in the video.