CLR Approach to SQL Server Natural Language Sorting

If you’ve ever dealt with user facing lists, then it’s likely at some point you’ll have wanted to sort them. You might have found that alphabetical sorting either in your code or SQL meets your needs. If, however you are dealing with item titles that have some implicit order defined by the user in their text, you might want a sorting implementation that better meets user expectation. Consider the list:

  • “1. Title One”
  • “2. Title Two”
  • “10. Title Ten”

Sorted alphabetically, you’d see “10. Title Ten” as the second entry in the list which would not meet the user’s intention of it being last.

It won’t take you long to find a natural language implementation in your language of choice and get it running in your code. In my case, this involved grabbing a Java implementation. I’ve had one floating around in my code for a long time but won’t post it in entirety here since I can’t fully trace its provenance. Using it in my code gives the desired result but as we’re going to see, sometimes this is not where you want sort.

When you’re dealing with long lists, you might encounter the need for pagination and in doing so, realise that you do not wish to load all data from the database before you apply the sorting. Depending on the framework you’re using, you’ll possibly have some nice support for pagination which boils down to running a TOP or OFFSET-FETCH SQL statement or equivalent. In such cases you’re likely going to need to do the sort on the database yet still meet user expectations.

Initially you may look at either:

  • Finding a function that can implement a version of your sort in native SQL.
  • Add a new column to your table and maintain it in code whenever new rows are updated or inserted. This column should be a number representing it’s order amongst others.

I discounted the first option due to performance combined with the fact that the sorting had been tested “in the wild” for some time and (after a bit of poking) the SQL I googled didn’t match up very well.

The second option worked for a period of time very nicely; the existing Java comparison code was used in combination with a binary search to locate its position amongst the other rows. The problem was that I had to ensure the sorting code was run after every insert/update which meant knowing when every insert/update was happening and ensuring that all rows were being manipulated through the application and not directly via something like Management Studio. Due to the complexity of the application, I couldn’t even be totally sure I had covered all programmatic access.

Eventually I started looking at MSSQL Server’s SQLCLR (SQL Common Language Runtime) functionality which would allow me to run .NET code within SQL Server. Not letting the fact that I’m not a .NET programmer get in the way, I installed Visual Studio and started converting my Java comparison code into C# which was pretty easy. After deciding that this was the way forward I decided that my aim was to:

  • Create means by which this could be called as a Stored Procedure in SQL.
  • Implement a trigger that used this Stored Procedure as to avoid unsorted data getting into the table, programmatically or manually.
  • Allow incoming parameters to the CLR sproc, including references to additional Stored Procedures which in turn provide the table and columns by which to sort.

I’ve uploaded my rough solution to github: https://github.com/Sheepalot/CLRNatural

SortingTable.sql contains my simple table definition.
NaturalSort.cs contains all of the code.

The following:

[Microsoft.SqlServer.Server.SqlProcedure]
public static void NaturalSort(SqlString midLookupSproc, SqlString wrapperLookupSproc, SqlString sortSproc, SqlInt32 id)

…marks the NaturalSort method as being able to be called as a stored procedure from within SQL Server

The parameters are as follows:

  • midLookupSproc – the sproc called in a binary search fashion, responsible for locating the correct sort value for the subject row. See getMidpointForSort.sql in my project.
  • wrapperLookupSproc – the sproc responsible for providing the row to sort. See selectSortableRow.sql in my project.
  • sortSproc – the sproc responsible for updating the sort value of a required row. See updateSort.sql in my project.
  • id – row id of the value being inserted/updated.

Finally, you can see this all being pulled together in the trigger I created on SortingTable for updates and inserts See tiu_sort.sql in my project.

While passing additional stored procedures into the CLR stored procedure might seem messy, the aim was to allow the code to remain agnostic of the table/columns that are being sorted. In my situation, code residing outside of my web container (on the database) needs to change as infrequently as possible. Doing it like this means simple changes and applying it to other tables and columns can be handled in SQL only.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Advertisements
%d bloggers like this: