The SQL Server Experts.

I heard about Adam Machanic’s #TSQL2sday initiative and I thought it was a fantastic idea.  For those of you not familiar with this initiative read up about TSQL2sday  Here’s a brief synopsis for this month’s topic.

Theme

Have you ever found yourself unable to figure out the intricacies of how some piece of code works? Ever been confused by the results you’ve gotten back from a query, only to find out that something totally unrelated was going on? Or have you ever been compelled to wile away your spare time working on a “challenge” posted by some blogger?

For this month’s T-SQL Tuesday, I’m asking participants to write a blog post on a “puzzling” topic, along the lines of some of the following ideas:

  • Describe a confusing situation you encountered, and explain how you debugged the problem and what the resolution was
  • Show a piece of code that doesn’t behave as most people might expect, and illustrate the reasoning behind the discrepancy
  • Create a challenge for your readers to solve

As always, even given the event’s name the posts are not limited to T-SQL! Any component of, or software product related to SQL Server, is fair game. MDX, SSIS, LINQ to SQL, Entity Data Model, NHibernate, and any other software product that deals with SQL Server data can be featured in your post. Be creative!

Rules

As before, any blogger can–and should–feel free to participate in this event. In order to make the event slightly more international, I’m changing the time range from PST-based to UTC-based. So the rules are:

  • Your post must go live between 00:00:00 UTC and 23:59:00 UTC on Tuesday, January 12, 2010
  • Your post must link back to this one, and it’s recommended that you clearly identify the post as a T-SQL Tuesday post
  • You are responsible for ensuring that a trackback or comment appears here so that I can find the posts

Follow the rules, and your post will be included in the roundup to be posted on January 13 or 14. Don’t follow the rules, and it won’t show up there. Simple as that!

My Submission

Not every SQL Server upgrade is equal.  Some are very straight forward and simple.  Yet others are just a complete bear, and have many things that have to be very carefully thought out and tested many times in order to insure things go smoothly.   The upgrade from SQL Server 2000 to SQL Server 2005 or 2008 is no exception and often times it will yield a few obstacles in your path that may slow you down some and at times it may get quite confusing in places.   To start out we may have to convert a lot of our DTS packages to SSIS packages.  This is often a chore and it is confusing in itself sometimes, but that’s not what I wanted to write about.  What I wanted to write about today are the obstacles that I often face during the conversion of the old TSQL join syntax in queries to the SQL-92 standard syntax.   With the old TSQL standard (pre-SQL Server 2005), left and right outer join conditions were able to be specified in the WHERE clause using the *= and =* operators.   In some cases,  this results in an ambiguous query that can be interpreted in more than one way. SQL-92 compliant outer joins on the other hand are specified in the FROM clause and therefore do not result in this same ambiguity.   In order to Upgrade to SQL Server 2005 or SQL Server 2008 we are forced to convert to these types of joins or we have to run our databases in a lower capability mode.  We can run the lower capability in some environments to ease the transition for a little while but eventually we will have to convert them all to the new standard.  So let’s now look at an example of how we can convert some queries to the SQL-92 standard from the old TSQL syntax.  Review the following lines of code:

select SubCategory=psc.Name, ProductName=p.Name, p.ListPrice, p.Color

from Production.Product p, Production.ProductSubcategory psc

where p.ProductSubcategoryID *= psc.ProductSubcategoryID

and Color is not null

--(256 row(s) affected)

 This query can be really simple to convert.  We just to remove the *= and move up the two fields referenced into the ON clause of the LEFT OUTER JOIN statement.  The result sets match and everything seems to be fine.

SELECT psc.Name AS SubCategory, p.Name AS ProductName, p.ListPrice, p.Color

FROM Production.Product AS p LEFT OUTER JOIN

Production.ProductSubcategory AS psc ON p.ProductSubcategoryID = psc.ProductSubcategoryID

Where (p.Color IS NOT NULL)

--(256 row(s) affected)

That wasn’t so bad, right?  Let’s now look at a little more complex example.   Concider the next few lines of code:

SELECT DISTINCT branch.branch_no,branch.branch_name,site.site_no
FROM site,
customer_site,
branch
WHERE site.site_no *= customer_site.site_no
and getdate() between customer_site.start_date and customer_site.end_date
and branch.branch_no = site.branch_no
and ( site.co_no = 1 )
ORDER BY branch.branch_no ASC, site.site_no ASC

If we did the same thing as before, everything would look to be just fine but when we actually run the query we will discover that the result sets would return different counts of records.  This is the ambiguity I was refering to earlier.  So what seems to be the problem?

SELECT DISTINCT branch.branch_no,branch.branch_name,site.site_no
FROM site
    left join customer_site
        on (site.site_no = customer_site.site_no
            and site.co_no = 1)
    inner join branch
        on (branch.branch_no = site.branch_no )
WHERE
    getdate() between customer_site.start_date and customer_site.end_date
ORDER BY
    branch.branch_no ASC,
    site.site_no ASC

The discrepancy in this example comes down to how the relations are really joined.  You see, there are two main things we always have to be mindful of when we are converting queries to the new form and they are. 

1) We need to first determine what the correct elements of the join itself are. 

2) We need to further identify the supplemental conditions have been applied to limit or filter the data returned after the initial join has taken place and decide if they are really part of the join or are applied afterwards.

What I mean by these two things is that there are certain criteria that is used to join the two relations and those are the things that need to be moved to the ON phrase of your join.  After the relations are joined there are often some additional filters applied to the result set as well that can widdle down the result set further but these may need to stay in the WHERE clause or be added to the join statement as well.  This is the confusing part.  So, for this conversion above the following would be a workable solution. 

SELECT DISTINCT branch.branch_no,branch.branch_name,site.site_no
FROM site
    left join customer_site
        on (site.site_no = customer_site.site_no
            and site.co_no = 1
            and getdate() between customer_site.start_date and customer_site.end_date)
    inner join branch
        on (branch.branch_no = site.branch_no )
ORDER BY
    branch.branch_no ASC,
    site.site_no ASC

As you can see, the SQL-92 standard really make a lot more sense when looking at the queries because if you are joining on something you do exactly that, we JOIN on it!  If you are doing additional filtering we keep that stuff in the where clause.  The real problem though is that depending on how the relations are joined in the first place you can get different result sets as in our second example above.   For inner joins this is not so much a problem but for outer joins it really comes into play.  In our example above the additional filters are actually needed in the join condition itself. Have you ever had any issues converting some of your joins to the new syntax like these?  Please let me know what they were and how you ended up solving them. 

TIP

One little trick that I have used in the past to speed up some of my conversions is to write the queries as a view in the view designer first.  By opening the view designer in SSMS it will automatically convert the queries for you.  It is not always 100% correct as the filters may need to be rearranged but it does get you pretty darn close and it makes it a bit easier when dealing with complex statements.  Try it out and let me know what you think.

1 Response to “#TSQL2sDay Article Submission – Join syntax”

  1. Adam Machanic : T-SQL Tuesday #002: The Roundup

    on February 8 2010

    [...] Dunleavy's post is on the topic of join syntax. Remember that deprecated *= outer join syntax? Apparently such abominations never go away, and John tells us all about how to get rid of the [...]

Comment RSS · TrackBack URI

Leave a comment

Name: (Required)

E-mail: (Required)

Website:

Comment:

Contact