Islands and Gaps

Or Winning Streak. This problem has many names but has the same solution. The simplest scenario is a sequence of numbers where some numbers are missing. The missing numbers are referred as gaps and the existing numbers are referred as islands.

See this sample data

It has a very simple setup with just a few numbers. The expected output is this

So how can we easily accomplish this? Before SQL Server 2012 we have to use a technique with ROW_NUMBER() windowed function.

It does produce the correct result, but how does it work? Let us see the result of the CTE (Common Table Expression), before we aggregate the set in the outer query.

As you can see, the Value column is intact and the Island column is calculated by substracting the position of the value from the value itself.
This generates an Island value that is unique for every continuous sequential numbers.
Then, in the outer query, we use the Island value as the grouping value and pick the minimum and maximum value for each island to get the starting value and ending value for each island.

However… using this technique is seemingly fast but it has a built-in performance bottle neck. If we study the generated execution plan, we can see that there is a Sort operator which steals 78% of our thunder, and only 22% is spent on reading the index!

How to get rid of this? Well, you can hint your query with OPTION (HASH GROUP), but then our power is transferred to a HASH MATCH and now only 15% of the query is spent on reading the index.

There seem to be no way around this issue. Wait. With SQL Server 2012, we do have an option to get rid of the Sort operator.
With SQL Server 2012, we got a new set of windowed functions, the framing functions. What if we build the query this way with SQL Server 2012.
What will then happen to the exeution plan? Will the Sort go away?

The query itself is a somewhat bigger , but look at the exeuction plan!

The execution plan is huge, but it has no Sort operator and now 96% of our thunder is spent on reading the index!

As as sidenote, the benefit of using the new framing window become obvious if we use the same sample data again, and we allow for 1 number missing and still be regarded as an island. This is the new expected result from the same sample data.

To accomplish this, all we have to do is to change the code to this (replace the hardwired value of 1 to a variable instead).

Leave a Reply