The Merge Join Transformation and a SSIS Sorting Tip

My previous blog post was about the SSIS Lookup task and how it really works. Now that I have shown that the Lookup task shouldn’t be used for one-to-many or many-to-many joins, let’s take a look at the Merge Join transformation task. If you follow along with this blog, you will learn a little tip that will eliminate the requirement for you to add a SORT transformation task within your data flow task.

Previously, we isolated our results sets down to one employee in the AdventureWorks database and joined two tables together. I’m going to do the same thing here. This time, I am going to introduce a third table into the join logic. The three tables are listed below:

  • Employee Department History
  • Person
  • Department

Here is what it would look like via SSMS with T-SQL:

Image and video hosting by TinyPic

Let’s see how we can mimic this result set in SSIS without using T-SQL to join the three tables. First, I want to say it is not always going to be the best option not to use T-SQL instead of individual tasks in SSIS. I have learned over time that it is easier to write the join logic directly in you data flow source task sometimes. However, this is for demonstration purposes.

Let’s say you received a request to extract a result set, order the results set, and load it to another location. Here is what your package would look like in SSIS using the Merge Join transformation task:

Image and video hosting by TinyPic

Here are our results:

Image and video hosting by TinyPic

Notice, I used the SORT transformation task in the example above. I used this to depict what has to occur in a step by step approach:

  • Extracted data from the Person and Employee Department History tables
  • Sorted each result set
  • Merged the two results into one using inner join logic
  • Extracted data from the Departement table
  • Sorted the first Joined result set and the Department result set
  • Merge the Joined result set from Persons and Employee History with the Department table

Let’s talk about best practice for this example. This is where the Sort tip is introduced. Since we need an ordered result set per the request, we are using the merge transformation instead of the Union All task. Additionally, we used the Sort task. The Sort task can heavily impact the performance of an SSIS package, particularly when you have larger result sets than what we are going to extract from the AdventureWorks database.

Best practice is to bring in an ordered result set at the source and then merge your record sets. Well, how do you do that? Let’s walk through ordering your result set at the source and configuring your source to define the sorted column for merging your record sets.

First, we open the task and add our ORDER BY clause to our source.

Image and video hosting by TinyPic

Next, close the source task, right click on the same source task, and choose the Show Advanced Editor option.

Image and video hosting by TinyPic

There are two specifications in the Advanced Editor that need to be defined in order to make this work:

  • Click on the Input and Output Properties tab
  • Click on the OLE DB Source Outputs
  • Change the IsSorted parameter to “True”

Image and video hosting by TinyPic

  • Drill down into the OLE DB Source Output to Output columns.
  • Click on your columns that you used in your ORDER BY clause.
  • Change your SortKeyPosition parameter from “0” to “1”.

The desired results should look similar to those below:

Image and video hosting by TinyPic

Next, you can remove each sort task that directly follows your OLE DB Source task by repeating the steps above to reconfigure each source editor. Now, my data flow task looks like this:

Image and video hosting by TinyPic

We get back the same results:

Image and video hosting by TinyPic

In case you are wondering why I got rid of all of the Sort tasks except for the one that follows the first merge join, I’ll explain. There are two reasons for this. My second join is on DepartmentID and, most importantly, the merge transformation task is not considered a data flow source task and does not come with the functionality to define the sorted order.

To conclude my second blog post of this series, the Merge Join transformation task can be used to merge columns from two different tables using Join logic similar to the Joins that can be used in T-SQL. We have looked at a step by step break down of what has to occur to implement a Merge Join transformation task as well as discussing some tips and best practice in regards to using the Sort task in SSIS.

I hope this blog post has been informative and that you look forward to reading my third post soon.

Leave a Reply

Your email address will not be published. Required fields are marked *