sorting and spaces in postgresql with en_US locale

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Chris Kratz

    sorting and spaces in postgresql with en_US locale

    We are having a weird problem that we ran into recently. If I use the
    following statements to create a test table and then run the select statement
    at the end, we get a very strange sort order. It appears that to do the
    sorting, all the spaces are removed from the strings. It would appear that in
    the example below 'ab e' should be before 'abd'.

    create table testing_sort(co l1 text);
    insert into testing_sort values('a');
    insert into testing_sort values('ab');
    insert into testing_sort values('ab c');
    insert into testing_sort values('abd');
    insert into testing_sort values('ab e');

    select * from testing_sort order by col1;
    col1
    ------
    a
    ab
    ab c
    abd
    ab e
    (5 rows)

    pg_controldata reports...
    LC_COLLATE: en_US
    LC_CTYPE: en_US

    on another box, which has both LC_COLLAGE and LC_CTYPE set to C, the sorting
    works as expected...

    select * from testing_sort order by col1;
    col1
    ------
    a
    ab
    ab c
    ab e
    abd

    Does anyone know if there is any other way to get the sorting to work as
    expected short of doing an dumping, doing an initdb, and reloading? Or is
    there some other setting that is causing the sort to do strange things.

    The only work around we have found is to create a sort column and replace all
    spaces with 0 and then sort on that column. Any other suggestions or
    workarounds?

    Issue was tested on both 7.4.1 and 7.3.4, both running on linux.

    Thanks,

    -Chris



    --
    Chris Kratz
    Systems Analyst/Programmer
    VistaShare LLC
    Simplify data and measure impact with Outcome Tracker, the leading CRM for nonprofits in the US and Canada. Manage client data effortlessly, engage clients, and analyze outcomes to showcase your impact. Serving sectors like Community Development, Asset Building, and Social Services. Ensure secure da


    ---------------------------(end of broadcast)---------------------------
    TIP 8: explain analyze is your friend

  • Oliver Elphick

    #2
    Re: sorting and spaces in postgresql with en_US locale

    On Wed, 2004-07-21 at 20:58, Chris Kratz wrote:[color=blue]
    > We are having a weird problem that we ran into recently. If I use the
    > following statements to create a test table and then run the select statement
    > at the end, we get a very strange sort order. It appears that to do the
    > sorting, all the spaces are removed from the strings. It would appear that in
    > the example below 'ab e' should be before 'abd'.[/color]
    [color=blue]
    > LC_COLLATE: en_US
    > LC_CTYPE: en_US[/color]

    That is how the en_US (and en_GB) locales are defined.

    Their sorting is dictionary style: spaces and capitalisation are
    ignored. This is a glibc issue, not a PostgreSQL issue.

    --
    Oliver Elphick olly@lfix.co.uk
    Isle of Wight http://www.lfix.co.uk/oliver
    GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
    =============== =============== ==========
    "Greater love hath no man than this, that a man lay
    down his life for his friends." John 15:13


    ---------------------------(end of broadcast)---------------------------
    TIP 4: Don't 'kill -9' the postmaster

    Comment

    • Edmund Bacon

      #3
      Re: sorting and spaces in postgresql with en_US locale

      Oliver Elphick wrote:
      [color=blue]
      > On Wed, 2004-07-21 at 20:58, Chris Kratz wrote:[color=green]
      >> We are having a weird problem that we ran into recently. If I use the
      >> following statements to create a test table and then run the select
      >> statement
      >> at the end, we get a very strange sort order. It appears that to do the
      >> sorting, all the spaces are removed from the strings. It would appear
      >> that in the example below 'ab e' should be before 'abd'.[/color]
      >[color=green]
      >> LC_COLLATE: en_US
      >> LC_CTYPE: en_US[/color]
      >
      > That is how the en_US (and en_GB) locales are defined.
      >
      > Their sorting is dictionary style: spaces and capitalisation are
      > ignored. This is a glibc issue, not a PostgreSQL issue.
      >[/color]

      Is there any way to specify the collation to use for a sort, other than the
      collation specified at initdb time?

      For example, suppose I have a database that is running in the Foo locale,
      and I have clients in the Bar locale. In the Foo locale 'Adam' sorts
      before 'adam', wheras in the Bar locale 'adam' sorts before 'Adam'. Other
      than doing the sort at the application level, is there any way to do this?

      e.g. something like:
      SELECT last_name, first_name
      FROM people
      ORDER BY last_name, first_name COLLATION = en_US;


      Comment

      Working...