Sorting weirdness?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • apattin

    Sorting weirdness?

    HI all,

    Can someone explain this sorting issue?

    we are using V8 on Windows, but database *might* have been created
    with V7 (I can find out if it really matters)

    I have a table with one column, data VARCHAR(255) . It contains three
    values : 1729_at, 1773_at and 177_at.

    If I issue this SQL:

    SELECT * FROM mytable ORDER BY data

    I get:

    1729_at
    177_at
    1773_at

    This seems to imply that in the active collating sequence, whatever it
    happens to be, the '_' character comes before the '3' character. Am I
    right? Why is that? In ASCII (and therefore also in UTF-8, I think)
    the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is
    this at all possible, that someone created the database using EBCDIC
    on Windows?

    If I do the same sort in Java (uses Unicode) I get the right result,
    ie

    1729_at
    1773_at
    177_at

    What am I missing?

    Thanks,

    Alejandrina

  • Lennart

    #2
    Re: Sorting weirdness?

    apattin wrote:
    HI all,
    >
    Can someone explain this sorting issue?
    >
    we are using V8 on Windows, but database *might* have been created
    with V7 (I can find out if it really matters)
    >
    I have a table with one column, data VARCHAR(255) . It contains three
    values : 1729_at, 1773_at and 177_at.
    >
    If I issue this SQL:
    >
    SELECT * FROM mytable ORDER BY data
    >
    I get:
    >
    1729_at
    177_at
    1773_at
    >
    This seems to imply that in the active collating sequence, whatever it
    happens to be, the '_' character comes before the '3' character. Am I
    right? Why is that? In ASCII (and therefore also in UTF-8, I think)
    the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is
    this at all possible, that someone created the database using EBCDIC
    on Windows?
    >
    If I do the same sort in Java (uses Unicode) I get the right result,
    ie
    >
    1729_at
    1773_at
    177_at
    >
    What am I missing?
    I'm also interested in why this is the case. I've tested 3 different
    databases (2 V8, 1 V9) with different codepage etc. All databases
    created in V8:

    LUW V8 fixpak 14:

    Database territory = SE
    Database code page = 1252
    Database code set = IBM-1252
    Database country/region code = 46
    Database collating sequence = UNIQUE
    Alternate collating sequence (ALT_COLLATE) =

    db2 "with T (data) as (values ('1729_at'),('1 77_at'),('1773_ at')) select
    data, ascii(substr(da ta,3,1)), ascii(substr(da ta,4,1)) from T order by data"

    DATA 2 3
    ------- ----------- -----------
    1729_at 50 57
    177_at 55 95
    1773_at 55 51


    LUV V8 fixpak 13:

    Database territory = SE
    Database code page = 819
    Database code set = ISO8859-1
    Database country/region code = 46
    Database collating sequence = UNIQUE
    Alternate collating sequence (ALT_COLLATE) =

    db2 "with T (data) as (values ('1729_at'),('1 77_at'),('1773_ at')) select
    data, ascii(substr(da ta,3,1)), ascii(substr(da ta,4,1)) from T order by data"

    DATA 2 3
    ------- ----------- -----------
    1729_at 50 57
    177_at 55 95
    1773_at 55 51


    LUW V9 fixpak 0:

    Database territory = C
    Database code page = 1208
    Database code set = UTF-8
    Database country/region code = 1
    Database collating sequence = UCA400_NO
    Alternate collating sequence (ALT_COLLATE) =

    db2 "with T (data) as (values ('1729_at'),('1 77_at'),('1773_ at')) select
    data, ascii(substr(da ta,3,1)), ascii(substr(da ta,4,1)) from T order by data"

    DATA 2 3
    ------- ----------- -----------
    1729_at 50 57
    177_at 55 95
    1773_at 55 51


    /Lennart

    Comment

    • Brian Tkatch

      #3
      Re: Sorting weirdness?

      On Wed, 06 Jun 2007 22:01:15 +0200, Lennart
      <erik.lennart.j onsson@gmail.co mwrote:
      >apattin wrote:
      >HI all,
      >>
      >Can someone explain this sorting issue?
      >>
      >we are using V8 on Windows, but database *might* have been created
      >with V7 (I can find out if it really matters)
      >>
      >I have a table with one column, data VARCHAR(255) . It contains three
      >values : 1729_at, 1773_at and 177_at.
      >>
      >If I issue this SQL:
      >>
      >SELECT * FROM mytable ORDER BY data
      >>
      >I get:
      >>
      >1729_at
      >177_at
      >1773_at
      >>
      >This seems to imply that in the active collating sequence, whatever it
      >happens to be, the '_' character comes before the '3' character. Am I
      >right? Why is that? In ASCII (and therefore also in UTF-8, I think)
      >the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is
      >this at all possible, that someone created the database using EBCDIC
      >on Windows?
      >>
      >If I do the same sort in Java (uses Unicode) I get the right result,
      >ie
      >>
      >1729_at
      >1773_at
      >177_at
      >>
      >What am I missing?
      >
      >I'm also interested in why this is the case. I've tested 3 different
      >databases (2 V8, 1 V9) with different codepage etc. All databases
      >created in V8:
      >
      >LUW V8 fixpak 14:
      >
      Database territory = SE
      Database code page = 1252
      Database code set = IBM-1252
      Database country/region code = 46
      Database collating sequence = UNIQUE
      Alternate collating sequence (ALT_COLLATE) =
      >
      >db2 "with T (data) as (values ('1729_at'),('1 77_at'),('1773_ at')) select
      >data, ascii(substr(da ta,3,1)), ascii(substr(da ta,4,1)) from T order by data"
      >
      >DATA 2 3
      >------- ----------- -----------
      >1729_at 50 57
      >177_at 55 95
      >1773_at 55 51
      >
      >
      >LUV V8 fixpak 13:
      >
      Database territory = SE
      Database code page = 819
      Database code set = ISO8859-1
      Database country/region code = 46
      Database collating sequence = UNIQUE
      Alternate collating sequence (ALT_COLLATE) =
      >
      >db2 "with T (data) as (values ('1729_at'),('1 77_at'),('1773_ at')) select
      >data, ascii(substr(da ta,3,1)), ascii(substr(da ta,4,1)) from T order by data"
      >
      >DATA 2 3
      >------- ----------- -----------
      >1729_at 50 57
      >177_at 55 95
      >1773_at 55 51
      >
      >
      >LUW V9 fixpak 0:
      >
      Database territory = C
      Database code page = 1208
      Database code set = UTF-8
      Database country/region code = 1
      Database collating sequence = UCA400_NO
      Alternate collating sequence (ALT_COLLATE) =
      >
      >db2 "with T (data) as (values ('1729_at'),('1 77_at'),('1773_ at')) select
      >data, ascii(substr(da ta,3,1)), ascii(substr(da ta,4,1)) from T order by data"
      >
      >DATA 2 3
      >------- ----------- -----------
      >1729_at 50 57
      >177_at 55 95
      >1773_at 55 51
      >
      >
      >/Lennart

      To see the order:

      DECLARE GLOBAL TEMPORARY TABLE Charmap (Num INT, Digit CHAR(1))

      BEGIN ATOMIC
      DECLARE A INT DEFAULT 0;
      WHILE A < 257 DO
      INSERT INTO
      SESSION.Charmap (Num, Digit)
      VALUES (A, CHR(A));
      SET A = A + 1;
      END WHILE;
      END


      SELECT Num, Digit FROM SESSION.Charmap ORDER BY Num
      SELECT Num, Digit FROM SESSION.Charmap ORDER BY Digit

      DROP TABLE SESSION.Charmap

      The ORDER BY Digit will show the order used.

      B.

      Comment

      • Brian Tkatch

        #4
        Re: Sorting weirdness?

        On Wed, 06 Jun 2007 10:38:34 -0700, apattin <apattin@gmail. comwrote:
        >HI all,
        >
        >Can someone explain this sorting issue?
        >
        >we are using V8 on Windows, but database *might* have been created
        >with V7 (I can find out if it really matters)
        >
        >I have a table with one column, data VARCHAR(255) . It contains three
        >values : 1729_at, 1773_at and 177_at.
        >
        >If I issue this SQL:
        >
        >SELECT * FROM mytable ORDER BY data
        >
        >I get:
        >
        >1729_at
        >177_at
        >1773_at
        >
        >This seems to imply that in the active collating sequence, whatever it
        >happens to be, the '_' character comes before the '3' character. Am I
        >right? Why is that? In ASCII (and therefore also in UTF-8, I think)
        >the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is
        >this at all possible, that someone created the database using EBCDIC
        >on Windows?
        >
        >If I do the same sort in Java (uses Unicode) I get the right result,
        >ie
        >
        >1729_at
        >1773_at
        >177_at
        >
        >What am I missing?
        >
        >Thanks,
        >
        >Alejandrina
        If you want to ORDER BY the ASCII values, use ORDER BY ASCII(data);

        Otherwise, a different order is used.

        B.

        Comment

        • Lennart

          #5
          Re: Sorting weirdness?

          Brian Tkatch wrote:
          On Wed, 06 Jun 2007 10:38:34 -0700, apattin <apattin@gmail. comwrote:
          >
          >>HI all,
          >>
          >>Can someone explain this sorting issue?
          >>
          >>we are using V8 on Windows, but database *might* have been created
          >>with V7 (I can find out if it really matters)
          >>
          >>I have a table with one column, data VARCHAR(255) . It contains three
          >>values : 1729_at, 1773_at and 177_at.
          >>
          >>If I issue this SQL:
          >>
          >>SELECT * FROM mytable ORDER BY data
          >>
          >>I get:
          >>
          >>1729_at
          >>177_at
          >>1773_at
          >>
          >>This seems to imply that in the active collating sequence, whatever it
          >>happens to be, the '_' character comes before the '3' character. Am I
          >>right? Why is that? In ASCII (and therefore also in UTF-8, I think)
          >>the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is
          >>this at all possible, that someone created the database using EBCDIC
          >>on Windows?
          >>
          >>If I do the same sort in Java (uses Unicode) I get the right result,
          >>ie
          >>
          >>1729_at
          >>1773_at
          >>177_at
          >>
          >>What am I missing?
          >>
          >>Thanks,
          >>
          >>Alejandrina
          >
          If you want to ORDER BY the ASCII values, use ORDER BY ASCII(data);
          >
          Otherwise, a different order is used.
          >
          I get the same sortorder using order by ascii(data)

          db2 "with T (data) as (values ('1729_at'),('1 77_at'),('1773_ at')) select
          * from T order by ascii(data)"

          DATA
          -------
          1729_at
          177_at
          1773_at

          Comment

          • Lennart

            #6
            Re: Sorting weirdness?

            Brian Tkatch wrote:
            [...]
            To see the order:
            >
            DECLARE GLOBAL TEMPORARY TABLE Charmap (Num INT, Digit CHAR(1))
            >
            BEGIN ATOMIC
            DECLARE A INT DEFAULT 0;
            WHILE A < 257 DO
            INSERT INTO
            SESSION.Charmap (Num, Digit)
            VALUES (A, CHR(A));
            SET A = A + 1;
            END WHILE;
            END
            >
            >
            SELECT Num, Digit FROM SESSION.Charmap ORDER BY Num
            SELECT Num, Digit FROM SESSION.Charmap ORDER BY Digit
            >
            DROP TABLE SESSION.Charmap
            >
            The ORDER BY Digit will show the order used.
            >
            B.
            Thanks, another variant without a session table:

            with charmap(num, digit) as (values (30,chr(30)) union all select num+1,
            chr(num+1) from charmap where num < 100) select * from charmap order by
            digit

            I used a smaller interval to avoid scrambling of the screen

            Comment

            • apattin

              #7
              Re: Sorting weirdness?

              On Jun 6, 5:55 pm, Brian Tkatch <N/Awrote:
              On Wed, 06 Jun 2007 22:01:15 +0200, Lennart
              >
              >
              >
              <erik.lennart.j ons...@gmail.co mwrote:
              apattin wrote:
              HI all,
              >
              Can someone explain this sorting issue?
              >
              we are using V8 on Windows, but database *might* have been created
              with V7 (I can find out if it really matters)
              >
              I have a table with one column, data VARCHAR(255) . It contains three
              values : 1729_at, 1773_at and 177_at.
              >
              If I issue this SQL:
              >
              SELECT * FROM mytable ORDER BY data
              >
              I get:
              >
              1729_at
              177_at
              1773_at
              >
              This seems to imply that in the active collating sequence, whatever it
              happens to be, the '_' character comes before the '3' character. Am I
              right? Why is that? In ASCII (and therefore also in UTF-8, I think)
              the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is
              this at all possible, that someone created the database using EBCDIC
              on Windows?
              >
              If I do the same sort in Java (uses Unicode) I get the right result,
              ie
              >
              1729_at
              1773_at
              177_at
              >
              What am I missing?
              >
              I'm also interested in why this is the case. I've tested 3 different
              databases (2 V8, 1 V9) with different codepage etc. All databases
              created in V8:
              >
              LUW V8 fixpak 14:
              >
              Database territory = SE
              Database code page = 1252
              Database code set = IBM-1252
              Database country/region code = 46
              Database collating sequence = UNIQUE
              Alternate collating sequence (ALT_COLLATE) =
              >
              db2 "with T (data) as (values ('1729_at'),('1 77_at'),('1773_ at')) select
              data, ascii(substr(da ta,3,1)), ascii(substr(da ta,4,1)) from T order by data"
              >
              DATA 2 3
              ------- ----------- -----------
              1729_at 50 57
              177_at 55 95
              1773_at 55 51
              >
              LUV V8 fixpak 13:
              >
              Database territory = SE
              Database code page = 819
              Database code set = ISO8859-1
              Database country/region code = 46
              Database collating sequence = UNIQUE
              Alternate collating sequence (ALT_COLLATE) =
              >
              db2 "with T (data) as (values ('1729_at'),('1 77_at'),('1773_ at')) select
              data, ascii(substr(da ta,3,1)), ascii(substr(da ta,4,1)) from T order by data"
              >
              DATA 2 3
              ------- ----------- -----------
              1729_at 50 57
              177_at 55 95
              1773_at 55 51
              >
              LUW V9 fixpak 0:
              >
              Database territory = C
              Database code page = 1208
              Database code set = UTF-8
              Database country/region code = 1
              Database collating sequence = UCA400_NO
              Alternate collating sequence (ALT_COLLATE) =
              >
              db2 "with T (data) as (values ('1729_at'),('1 77_at'),('1773_ at')) select
              data, ascii(substr(da ta,3,1)), ascii(substr(da ta,4,1)) from T order by data"
              >
              DATA 2 3
              ------- ----------- -----------
              1729_at 50 57
              177_at 55 95
              1773_at 55 51
              >
              /Lennart
              >
              To see the order:
              >
              DECLARE GLOBAL TEMPORARY TABLE Charmap (Num INT, Digit CHAR(1))
              >
              BEGIN ATOMIC
              DECLARE A INT DEFAULT 0;
              WHILE A < 257 DO
              INSERT INTO
              SESSION.Charmap (Num, Digit)
              VALUES (A, CHR(A));
              SET A = A + 1;
              END WHILE;
              END
              >
              SELECT Num, Digit FROM SESSION.Charmap ORDER BY Num
              SELECT Num, Digit FROM SESSION.Charmap ORDER BY Digit
              >
              DROP TABLE SESSION.Charmap
              >
              The ORDER BY Digit will show the order used.
              >
              B.

              Thanks, Brian. I guess I want to know is why is this specific
              collating sequence being used, as ASCII or Unicode appear to be
              different.

              Alejandrina

              Comment

              • Tonkuma

                #8
                Re: Sorting weirdness?

                Collating sequence is defined at the time of creating DATABASE and
                can't change later.
                CREATE DATABASE .....
                COLLATE USING SYSTEM | COMPATIBILITY | IDENTITY | IDENTITY_16BIT |
                UCA400_NO | UCA400_LTH | NLSCHAR

                SYSTEM (Default)
                Collating sequence based on the database territory. This option cannot
                be specified when creating a Unicode database.

                IDENTITY
                Identity collating sequence, in which strings are compared byte for
                byte.


                ------------------- Commands Entered --------------------
                SELECT data, HEX(data)
                FROM (VALUES '1729_at', '1773_at', '177_at' ) AS mytable(data)
                ORDER BY HEX(data);
                --------------------------------------------------------

                DATA 2
                ------- --------------
                1729_at 313732395F6174
                1773_at 313737335F6174
                177_at 3137375F6174

                3 record(s) selected.


                Comment

                • Brian Tkatch

                  #9
                  Re: Sorting weirdness?

                  On Thu, 07 Jun 2007 00:16:36 +0200, Lennart
                  <erik.lennart.j onsson@gmail.co mwrote:
                  >Brian Tkatch wrote:
                  >[...]
                  >To see the order:
                  >>
                  >DECLARE GLOBAL TEMPORARY TABLE Charmap (Num INT, Digit CHAR(1))
                  >>
                  >BEGIN ATOMIC
                  > DECLARE A INT DEFAULT 0;
                  > WHILE A < 257 DO
                  > INSERT INTO
                  > SESSION.Charmap (Num, Digit)
                  > VALUES (A, CHR(A));
                  > SET A = A + 1;
                  > END WHILE;
                  >END
                  >>
                  >>
                  >SELECT Num, Digit FROM SESSION.Charmap ORDER BY Num
                  >SELECT Num, Digit FROM SESSION.Charmap ORDER BY Digit
                  >>
                  >DROP TABLE SESSION.Charmap
                  >>
                  >The ORDER BY Digit will show the order used.
                  >>
                  >B.
                  >
                  >Thanks, another variant without a session table:
                  >
                  >with charmap(num, digit) as (values (30,chr(30)) union all select num+1,
                  >chr(num+1) from charmap where num < 100) select * from charmap order by
                  >digit
                  Silly me. I *still* haven't integrated recursive WITHs into my noggin.
                  Thanx for the lesson!
                  >
                  >I used a smaller interval to avoid scrambling of the screen
                  Considering the order changes a lot, showing every character may be
                  advantageous.

                  B.

                  B.

                  Comment

                  • Brian Tkatch

                    #10
                    Re: Sorting weirdness?

                    On Thu, 07 Jun 2007 00:08:09 +0200, Lennart
                    <erik.lennart.j onsson@gmail.co mwrote:
                    >Brian Tkatch wrote:
                    >On Wed, 06 Jun 2007 10:38:34 -0700, apattin <apattin@gmail. comwrote:
                    >>
                    >>>HI all,
                    >>>
                    >>>Can someone explain this sorting issue?
                    >>>
                    >>>we are using V8 on Windows, but database *might* have been created
                    >>>with V7 (I can find out if it really matters)
                    >>>
                    >>>I have a table with one column, data VARCHAR(255) . It contains three
                    >>>values : 1729_at, 1773_at and 177_at.
                    >>>
                    >>>If I issue this SQL:
                    >>>
                    >>>SELECT * FROM mytable ORDER BY data
                    >>>
                    >>>I get:
                    >>>
                    >>>1729_at
                    >>>177_at
                    >>>1773_at
                    >>>
                    >>>This seems to imply that in the active collating sequence, whatever it
                    >>>happens to be, the '_' character comes before the '3' character. Am I
                    >>>right? Why is that? In ASCII (and therefore also in UTF-8, I think)
                    >>>the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is
                    >>>this at all possible, that someone created the database using EBCDIC
                    >>>on Windows?
                    >>>
                    >>>If I do the same sort in Java (uses Unicode) I get the right result,
                    >>>ie
                    >>>
                    >>>1729_at
                    >>>1773_at
                    >>>177_at
                    >>>
                    >>>What am I missing?
                    >>>
                    >>>Thanks,
                    >>>
                    >>>Alejandrin a
                    >>
                    >If you want to ORDER BY the ASCII values, use ORDER BY ASCII(data);
                    >>
                    >Otherwise, a different order is used.
                    >>
                    >
                    >I get the same sortorder using order by ascii(data)
                    >
                    >db2 "with T (data) as (values ('1729_at'),('1 77_at'),('1773_ at')) select
                    >* from T order by ascii(data)"
                    >
                    >DATA
                    >-------
                    >1729_at
                    >177_at
                    >1773_at
                    My mistake. My example in the other thread used one character, and in
                    that case ASCII will obviously work. With more than one char, ASCII
                    returns only the first char's value, as the docs say "Returns the
                    ASCII code value of the leftmost character of the argument as an
                    integer." Which is pretty useless here.

                    Tonkuma's reply uses HEX(), which does an ASCII on all the chars,
                    which is what we want.

                    Thanx for the catch.

                    B.

                    Comment

                    Working...