true alphabetic sort...

  • Ian Richardson

    true alphabetic sort...

    At the moment I'm using a quicksort algorithm to sort a list of
    countries in alphabetic order. This worked wonderfully until someone
    came up with the Åland Islands... and this is at the end of the list.

    I'm not sure it's supposed to be.

    Now I could just alter my comparison so it ignores the top bit, but this
    would then put it at the top of the list, even before Albania...
    Alternatively, should I put Å after A?

    In short, is there a preferred way of ordering these?
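
For reference, the behaviour described above is what a plain comparison by UTF-16 code unit produces: "Å" is U+00C5 (197), which is greater than "Z" (90). A minimal sketch, assuming a made-up country list and the default `Array.prototype.sort` ordering rather than the poster's own quicksort:

```javascript
// Default sort compares strings by UTF-16 code unit values.
// The list below is illustrative, not the poster's data.
var countries = ['Zimbabwe', 'Åland Islands', 'Albania', 'Aruba'];
var sorted = countries.slice().sort();

// 'Å' (U+00C5) has a higher code unit than any unaccented Latin letter,
// so names starting with it land after every A-Z name.
var codeOfAring = 'Å'.charCodeAt(0);
var codeOfZ = 'Z'.charCodeAt(0);

console.log(sorted.join(', '));
console.log(codeOfAring, codeOfZ);
```

Any comparison based purely on character codes will behave this way, whatever the sorting algorithm.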

    Thanks,

    Ian
  • Knud Gert Ellentoft

    #2
    Re: true alphabetic sort...

    Ian Richardson <zathras@chaos.org.uk> wrote:

    >At the moment I'm using a quicksort algorithm to sort a list of
    >countries in alphabetic order. This worked wonderfully until someone
    >came up with the Åland Islands... and this is at the end of the list.

    Yes, and it's correct.

    In Swedish, Danish and Norwegian, "Å" is the last letter of the
    alphabet.
    --
    Knud


    • Evertjan.

      #3
      Re: true alphabetic sort...

      Knud Gert Ellentoft wrote on 24 Apr 2004 in comp.lang.javascript:

      > In Swedish, Danish and Norwegian, "Å" is the last letter of the
      > alphabet.

      Just curious:

      This will write "å" over here:

      document.write('Å'.toLowerCase())

      Does this work for all European alphabets?

      =============================

      When should I use:

      document.write('Å'.toLocaleLowerCase())

      ?

      --
      Evertjan.
      The Netherlands.
      (Please change the x'es to dots in my email address)


      • Lasse Reichstein Nielsen

        #4
        Re: true alphabetic sort...

        "Evertjan." <exjxw.hannivoort@interxnl.net> writes:

        > Just curious:
        >
        > This will write "å" over here:
        >
        > document.write('Å'.toLowerCase())
        >
        > Does this work for all European alphabets?

        It works for any Unicode letter, using the Unicode character database
        for the translation.

        > =============================
        >
        > When should I use:
        >
        > document.write('Å'.toLocaleLowerCase())

        Never, for the letter "Å".
        In ECMA 262, section 15.5.4.17, the reason given for using
        toLocaleLowerCase is for languages where the language rules conflict
        with the regular Unicode mapping. Turkish is given as an example.
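
The difference between the two methods can be sketched roughly as follows. Note the assumption: passing an explicit locale tag such as `'tr'` comes from the later ECMAScript Internationalization API, not from the 3rd edition discussed here, where the host's current locale is used. The result for Turkish also depends on the locale data the engine ships with, so it is described in a comment rather than stated as a guaranteed output.

```javascript
// Locale-independent mapping, based on the Unicode case tables.
var plain = 'Å'.toLowerCase();    // 'å'
var generic = 'I'.toLowerCase();  // 'i'

// Locale-sensitive mapping. With Turkish locale data available,
// capital I is expected to map to dotless 'ı' (U+0131) instead of 'i'.
// The 'tr' argument is an assumption about a modern engine.
var turkish = 'I'.toLocaleLowerCase('tr');

console.log(plain, generic, turkish);
```

For "Å" both methods give the same result, which is why the locale-aware variant adds nothing in that case.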

        /L
        --
        Lasse Reichstein Nielsen - lrn@hotpop.com
        DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
        'Faith without judgement merely degrades the spirit divine.'


        • Ivo

          #5
          Re: true alphabetic sort...

          "Knud Gert Ellentoft" wrote
          > Ian Richardson wrote:
          >
          > >At the moment I'm using a quicksort algorithm to sort a list of
          > >countries in alphabetic order. This worked wonderfully until someone
          > >came up with the Åland Islands... and this is at the end of the list.
          >
          > Yes, and it's correct.
          > In Swedish, Danish and Norwegian, "Å" is the last letter of the
          > alphabet.

          This is interesting. It may be that the Å follows Z in those languages, but
          this is new for me and probably the rest of the world. In a long
          alphabetical list, I and the OP would look for Å after A, and so I think in
          a web-environment it probably should be put there. Where do the French put
          the character ç in the French alphabet? Where do the Germans put the ß? I
          would look for it after the B.

          As for a JavaScript solution, the easiest would probably be replacing all
          occurrences of ÀÁÂÃÄÅ and perhaps Æ with an A prior to sorting the list. This
          would result in a mix of accented and normal A's, which is not perfect. Åland
          must come after Aruba but before Bermuda. We must write our own comparison.
          It involves

          var abc = 'AÀÁÂÃÄÅBßCÇDÐEÈÉÊËFGHIÌÍÎÏJ' +
          'KLMNÑOÒÓÔÕÖØPQRSTUÙÚÛÜVWXYÝZ';

          and abc.toLowerCase() and testing for indexOf, but I'm not quite sure how.
          The following covers first letters only:

          function compare(a, b) {
            // Rank each word by the position of its first letter in abc.
            // A letter missing from abc gets index -1 and therefore sorts first.
            var ia = abc.indexOf(a.charAt(0));
            var ib = abc.indexOf(b.charAt(0));
            if (ia < ib) {
              return -1;
            }
            if (ia > ib) {
              return 1;
            }
            return 0;
          }
          var islands = ['Curaçao', 'Bonaire', 'Åland', 'Aruba'];
          alert(islands.sort(compare));
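
One way the first-letter idea might be extended to whole words is to compare position by position. This is only a sketch: the `alphabet` string, `rank` and `compareWords` are hypothetical names, the table leaves out ß and Ð for simplicity, and ranking all upper-case letters before all lower-case ones is a simplification rather than a recommended collation.

```javascript
// Hypothetical ordering table: accented letters directly after their base letter.
var alphabet = 'AÀÁÂÃÄÅBCÇDEÈÉÊËFGHIÌÍÎÏJKLMNÑOÒÓÔÕÖØPQRSTUÙÚÛÜVWXYÝZ';
alphabet += alphabet.toLowerCase(); // lower case ranks after upper case here

function rank(ch) {
  var i = alphabet.indexOf(ch);
  // Characters outside the table sort after everything in it, by code unit.
  return i === -1 ? alphabet.length + ch.charCodeAt(0) : i;
}

function compareWords(a, b) {
  var n = Math.min(a.length, b.length);
  for (var k = 0; k < n; k++) {
    var diff = rank(a.charAt(k)) - rank(b.charAt(k));
    if (diff !== 0) {
      return diff;
    }
  }
  return a.length - b.length; // a prefix sorts before the longer word
}

var islandNames = ['Curaçao', 'Bonaire', 'Åland', 'Aruba', 'Albania'];
islandNames.sort(compareWords);
console.log(islandNames.join(', '));
```

With this table Åland ends up after Aruba and before Bonaire, which matches the ordering asked for above.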

          HTH
          Ìvð



          • Knud Gert Ellentoft

            #6
            Re: true alphabetic sort...

            "Ivo" <no@thank.you> wrote:

            >This is interesting. It may be that the Å follows Z in those languages, but
            >this is new for me and probably the rest of the world. In a long
            >alphabetical list, I and the OP would look for Å after A, and so I think in
            >a web-environment it probably should be put there. Where do the French put
            >the character ç in the French alphabet? Where do the Germans put the ß? I
            >would look for it after the B.

            I know only the Scandinavian languages, and a Scandinavian would
            look for "Å" (and æ, ø, ä and ö) at the end of the alphabet, so
            I would therefore leave it as the last letter.
            --
            Knud


            • Lasse Reichstein Nielsen

              #7
              Re: true alphabetic sort...

              "Ivo" <no@thank.you> writes:

              > This is interesting. It may be that the Å follows Z in those languages,

              That would be all languages that actually have "Å" as a letter.

              > but this is new for me and probably the rest of the world.

              Hard to say. Microsoft seems to know it. When they alphabetize Danish
              words, the double-A, the original form which was turned into the new
              letter "Å", comes last (with predictably incorrect results for the
              foreign word "aardvark").

              > In a long alphabetical list, I and the OP would look for Å after A,
              > and so I think in a web-environment it probably should be put
              > there.

              That entirely depends on the language. If you are sorting words from
              different languages, I can see the problem, but would probably prefer
              to have it last anyway. It is a letter in its own right, not just a
              letter with an accent.

              > Where do the French put the character ç in the French alphabet?

              It's a c-cedilla, that is, a "c" with an accent. It is not a separate
              letter.

              > Where do the Germans put the ß? I would look for it after the B.

              That would be a weird place to look for a sharp S. It is *not* a beta
              (it is an s-z-ligature).

              > As for a javascript solution, the easiest would probably be replacing all
              > occurrences of ÀÁÂÃÄÅ and perhaps Æ with an A prior to sorting the list.

              That's one choice. Since you cannot fix one language to work with, I
              don't think there is an official way to alphabetize.
              I would probably expand Æ (the a-e-ligature) to AE.

              > This would result in a mix of accented and normal A's which is not
              > perfect.

              Alas, perfect does not exist.
              The closest to perfect for my tastes is to alphabetize letters according
              to the language they come from, so Aalborg (Danish city using old spelling)
              would be after Zaire, but "aardvark" would be under "A".

              /L


              • Evertjan.

                #8
                Re: true alphabetic sort...

                Lasse Reichstein Nielsen wrote on 25 Apr 2004 in comp.lang.javascript:

                > In ECMA 262, section 15.5.4.17, the reason given for using
                > toLocaleLowerCase is for languages where the language rules conflict
                > with the regular Unicode mapping. Turkish is given as an example.

                Not in
                <http://developer.netscape.com/docs/javascript/e262-pdf.pdf>
                from 1997, which stops at 15.5.4.12

                There should be a 3rd edition, but I cannot find it on the web.

                Do you have an URL?




                • Lasse Reichstein Nielsen

                  #9
                  Re: true alphabetic sort...

                  "Evertjan." <exjxw.hannivoort@interxnl.net> writes:

                  > Lasse Reichstein Nielsen wrote on 25 Apr 2004 in comp.lang.javascript:
                  >
                  >> In ECMA 262, section 15.5.4.17, the reason given for using
                  >> toLocaleLowerCase is for languages where the language rules conflict
                  >> with the regular Unicode mapping. Turkish is given as an example.
                  >
                  > Not in
                  > <http://developer.netscape.com/docs/javascript/e262-pdf.pdf>
                  > from 1997, which stops at 15.5.4.12
                  >
                  > There should be a 3rd edition, but I cannot find it on the web.
                  >
                  > Do you have an URL?

                  I use this one:
                  <URL:http://www.mozilla.org/js/language/E262-3.pdf>
                  It seems to be more recent, and better formatted, than the official
                  version from ECMA itself. I fail to imagine an explanation for that :)
                  <URL:http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>

                  /L


                  • Evertjan.

                    #10
                    Re: true alphabetic sort...

                    Lasse Reichstein Nielsen wrote on 25 Apr 2004 in comp.lang.javascript:

                    > I use this one:
                    > <URL:http://www.mozilla.org/js/language/E262-3.pdf>

                    tnx,

                    Interesting reading.




                    • Dr John Stockton

                      #11
                      Re: true alphabetic sort...

                      JRS: In article <c6eega$ap7bp$1@ID-99375.news.uni-berlin.de>, seen in
                      news:comp.lang.javascript, Ian Richardson <zathras@chaos.org.uk> posted
                      at Sat, 24 Apr 2004 20:17:30 :
                      >At the moment I'm using a quicksort algorithm to sort a list of
                      >countries in alphabetic order. This worked wonderfully until someone
                      >came up with the Åland Islands... and this is at the end of the list.
                      >
                      >I'm not sure it's supposed to be.
                      >
                      >Now I could just alter my comparison so it ignores the top bit, but this
                      >would then put it at the top of the list, even before Albania...
                      >Alternatively, should I put Å after A?
                      >
                      >In short, is there a preferred way of ordering these?


                      I don't think those Islands *are* a country, but ICBW; are they not
                      loose bits of Finland - or are they a country in the same sense as Wales
                      & Scotland are? I have enough difficulty in determining which parts of
                      the globe are in the EU, or associated, or whatever, for
                      <URL:http://www.merlyn.demon.co.uk/european.htm>.


                      However, while Å may well sort to the end of the alphabet in all
                      languages that use it, that does not necessarily mean that all letters
                      of the extended Roman Alphabet sort to identical positions in all
                      countries that use them. It is possible that Potaniland sorts Æ
                      between A & B, while Erewhon puts it at the end.

                      I think all likely extended-roman letters can be mapped in an obvious
                      manner to one or two English letters; it is probably best to use that,
                      then sort. After all, even foreigners will probably not know the proper
                      sort order for languages other than their own; but they will be used to
                      what the Anglos do with their names. My fair-sized atlas indexes those
                      Islands as "Aland", in the middle of the "A" section.
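
The mapping suggested above could be sketched along these lines. The names `SPECIAL`, `foldToEnglish` and `compareFolded` are hypothetical, the table of special cases is illustrative rather than complete, and `String.prototype.normalize` is a later addition to the language than the engines discussed in this thread.

```javascript
// Letters that do not decompose into base letter + accent need an
// explicit mapping to one or two English letters. Illustrative only.
var SPECIAL = {
  'Æ': 'AE', 'æ': 'ae', 'Ø': 'O', 'ø': 'o', 'ß': 'ss',
  'Ð': 'D', 'ð': 'd', 'Þ': 'Th', 'þ': 'th'
};

function foldToEnglish(s) {
  return s
    .replace(/[ÆæØøßÐðÞþ]/g, function (ch) { return SPECIAL[ch]; })
    .normalize('NFD')                 // 'Å' becomes 'A' + combining ring
    .replace(/[\u0300-\u036f]/g, ''); // drop the combining marks
}

function compareFolded(a, b) {
  var fa = foldToEnglish(a);
  var fb = foldToEnglish(b);
  return fa < fb ? -1 : fa > fb ? 1 : 0;
}

var names = ['Bonaire', 'Åland', 'Aruba', 'Albania', 'Curaçao'];
names.sort(compareFolded);
console.log(names.join(', '));
```

Sorting on the folded key while keeping the original spelling for display puts Åland among the other names beginning with A, as in the atlas index mentioned above.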

                      Remember that the proper names of Asian and North African countries need
                      transliteration to be readable by the average Anglo - and may be quite
                      different too : one does not necessarily seek Bharat or Nippon among the
                      B or N sections.

                      <URL:http://www.merlyn.demon.co.uk/quotes.htm#Fred Hoyle> :-)

                      --
                      © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
                      <URL:http://jibbering.com/faq/> Jim Ley's FAQ for news:comp.lang.javascript
                      <URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
                      <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.


                      • Ian Richardson

                        #12
                        Re: true alphabetic sort...

                        Dr John Stockton wrote:

                        > JRS: In article <c6eega$ap7bp$1@ID-99375.news.uni-berlin.de>, seen in
                        > news:comp.lang.javascript, Ian Richardson <zathras@chaos.org.uk> posted
                        > at Sat, 24 Apr 2004 20:17:30 :
                        >
                        >>At the moment I'm using a quicksort algorithm to sort a list of
                        >>countries in alphabetic order. This worked wonderfully until someone
                        >>came up with the Åland Islands... and this is at the end of the list.
                        >>
                        >>I'm not sure it's supposed to be.

                        <snip>

                        > I don't think those Islands *are* a country, but ICBW

                        <snip>

                        According to ftp://ftp.ripe.net/iso3166-countrycodes.txt, it's a country.

                        <snip>

                        I guess what I'm looking for is a language-specific dictionary sort, if
                        such a thing exists, defaulting to a Unicode or some other default order
                        if not.

                        Ian


                        • optimistx

                          #13
                          Re: true alphabetic sort...

                          Ian Richardson wrote:

                          > According to ftp://ftp.ripe.net/iso3166-countrycodes.txt, it's a country.
                          >
                          > <snip>
                          >
                          > I guess what I'm looking for is a language-specific dictionary sort, if
                          > such a thing exists, defaulting to a Unicode or some other default order
                          > if not.
                          >
                          > Ian

                          Åland is part of Finland, and Finland is an independent country and a
                          member of the UN.






                          • Thomas 'PointedEars' Lahn

                            #14
                            Re: true alphabetic sort...

                            Lasse Reichstein Nielsen wrote:

                            > "Evertjan." <exjxw.hannivoort@interxnl.net> writes:
                            >> There should be a 3rd edition, but I cannot find it on the web.
                            >>
                            >> Do you have an URL?
                            >
                            > I use this one:
                            > <URL:http://www.mozilla.org/js/language/E262-3.pdf>
                            > It seems to be more recent, and better formatted, than the official
                            > version from ECMA itself. I fail to imagine an explanation for that :)
                            > <URL:http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>

                            Well, Netscape is (was?) developing the next version of JavaScript (v2.0)
                            which should (have?) become the next edition of ECMAScript (ed. 4). Since
                            AOLTW (apparently only temporarily) closed the Netscape browser division[1]
                            and consequently Netscape is (currently) no longer a member of ECMA and
                            AOLTW is neither, that might be a reason.


                            PointedEars
                            ___________
                            [1] <http://www.holgermetzger.de/Netscape_History.html>
