Sorting Lists, Ignoring the Definite Article 2
Contributed by Dan Eastwell
You'd think that often people would need to sort lists ignoring the
definite article 'the'. Film titles, band names, Sherlock Holmes
cases. But a search of the web brings precious little, especially as
you're searching for 'the'. Which is practically everywhere.
How do you order a list, e.g. of bands, whilst ignoring 'The':
The Hives, The Strokes, etc, whilst keeping order with other bands such
as Pram and Blur. I searched high and low for it, but with special
thanks to James, we've come up with an answer.
A simple response to the problem would be to fetch the results and then
sort using SQL having stripped the starting 'the'. However, in the
project I'm working on currently, there's a need to allow for paging.
This means using SQL to strip the 'the' and using LIMIT to select only the current page of results.
You can order MySQL expressions according to other expressions,
so, instead of a PHP function, instead of a new column with the same
entry but with the 'the' removed (both of which I found as answers),
you make use of the expression clause in ORDER BY.
The expression I first attempted was using REGEXP,
which allows you to search or order, etc according to a regular
expression. To start, I tried to order based on whether or not an
artist contains, simply a letter. I had
SELECT 'Blur' REGEXP 'b';
That's true, hence => 1; is the results set (Note: REGEXP is not case sensitive for strings (i.e. non-binary). In order to do this, you need to cast the search parameter as binary, see the MySQL manual).
SELECT 'The Hives' REGEXP 'The ';
Is also true. When I put it in my ORDER BY expression
SELECT *
FROM artists
ORDER BY (
artist REGEXP 'The '
);
I got this result:
Blur
Pram
The Hives
The Strokes
The problem here is that the ORDER BY expresion orders first by things that don't match, then those that do match the regular expression. Since REGEXP only ever returns 0 or 1, you won't get the sorted list you're looking for. The master stroke comes in thinking that we just want to order the names of the bands, no matter if there's a 'The ' in there or not. This means, that for the purposes of the ORDER BY, we can trim the beginning 'The '.
This is where James' master-stroke comes in. We can use MySQL's string functions to trim only the parameter of the ORDER BY clause leaving the results intact! Happily, a function to do this exists in MySQL: TRIM.
The code:
SELECT artist
FROM artists
ORDER BY TRIM(
LEADING 'The ' FROM artist);
(Note: don't leave a space between TRIM and the open parentheses: you'll get a syntax error!)
What this code does is a simple SELECT, then orders the results by a function acting on 'artist'. TRIM will either slice off the LEADING or TRAILING or BOTH examples of a removal string: in this case 'The '. (For the record, if you don't specify LEADING or TRAILING or BOTH and the removal string, TRIM will trim the whitespace on either side. see the MySQL manual)
This gives us what we're looking for:
Blur
The Hives
Pram
The Strokes
And devotees of the bands are happy!
Dan Eastwell.