sorting slovak utf

  • Stano Paska

    sorting slovak utf

    Hi,

    I have a problem.
    The file aaa.txt contains Slovak letters encoded in UTF-8:

    zcaron
    scaron
    aacute
    ocircumflex
    tcaron
    yacute
    ccaron
    eacute
    lcaron
    iacute
    dcaron
    uacute
    adiaeresis
    oacute
    lacute
    ncaron
    racute

    With this script (output is redirected to the file bbb.txt):

    import fileinput
    riadky = []  # "riadky" = lines
    a = fileinput.input("aaa.txt")
    for i in a:
        riadky.append(i.strip())
    a.close()
    riadky.sort()  # plain sort, no locale involved
    for i in riadky:
        print i

    I get this result:

    aacute
    adiaeresis
    eacute
    iacute
    oacute
    ocircumflex
    uacute
    yacute
    ccaron
    dcaron
    lacute
    lcaron
    ncaron
    racute
    scaron
    tcaron
    zcaron

    but the correct result would be:

    aacute
    adiaeresis
    ccaron
    dcaron
    eacute
    iacute
    lacute
    lcaron
    ncaron
    oacute
    ocircumflex
    racute
    scaron
    tcaron
    uacute
    yacute
    zcaron

    I have set utf-8 as the default encoding in sitecustomize.py.

    I tried:
    import locale
    locale.setlocale(locale.LC_CTYPE, 'sk_SK.utf-8')
    and
    locale.setlocale(locale.LC_CTYPE, ('sk_SK', 'utf-8'))
    but I got an "unsupported locale" error.

    What must I do to get the correct sorting result?

    Stano.

    P.S. lower() and upper() work correctly.
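
The order in the result above is what a plain sort by code point would produce, which may explain it: the acute, diaeresis and circumflex letters sit in the Latin-1 range, while the caron letters and the acute consonants have higher code points. A minimal sketch in present-day Python 3 (not the Python 2.3 used in this thread), assuming each glyph name in the post stands for the corresponding precomposed letter:

```python
# Assumption: each glyph name in the post stands for this precomposed letter.
LETTERS = {
    "zcaron": "\u017e", "scaron": "\u0161", "aacute": "\u00e1",
    "ocircumflex": "\u00f4", "tcaron": "\u0165", "yacute": "\u00fd",
    "ccaron": "\u010d", "eacute": "\u00e9", "lcaron": "\u013e",
    "iacute": "\u00ed", "dcaron": "\u010f", "uacute": "\u00fa",
    "adiaeresis": "\u00e4", "oacute": "\u00f3", "lacute": "\u013a",
    "ncaron": "\u0148", "racute": "\u0155",
}

# Plain sort: compares code points, no language rules involved.
by_code_point = sorted(LETTERS, key=LETTERS.get)

# Sorting the UTF-8 bytes gives the same order, because UTF-8
# preserves code point order.
by_utf8_bytes = sorted(LETTERS, key=lambda name: LETTERS[name].encode("utf-8"))

for name in by_code_point:
    print(name, "U+%04X" % ord(LETTERS[name]))
```

Both orderings reproduce the list reported under "I have this result", so the script appears to be comparing raw values rather than applying Slovak collation rules.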





  • Radovan Garabik

    #2
    Re: sorting slovak utf

    Stano Paska <paska@kios.sk> wrote:
    > Hi,
    >
    > I have one problem.
    > In file aaa.txt I have slovak letters in utf-8.
    >

    ....

    >
    > I tried:
    > import locale
    > locale.setlocale(locale.LC_CTYPE, 'sk_SK.utf-8')
    > and
    > locale.setlocale(locale.LC_CTYPE, ('sk_SK', 'utf-8'))
    > but i got "unsupported locale" error
    >
    > What I must do to get correct sorting result?

    You probably do not have the sk_SK.UTF-8 locale generated.
    What OS and version are you using?
    What is the output of locale -a?
    In some Linux distributions, e.g. Debian, you have to
    generate the locale beforehand with locale-gen
    (according to the /etc/locale.gen file).
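
The same check can be made from Python instead of the shell. The sketch below is for present-day Python 3; the helper name `first_available_locale` and the candidate spellings are made up for illustration and are not from this thread. It probes LC_COLLATE because that is the category that governs collation.

```python
import locale


def first_available_locale(candidates, category=locale.LC_COLLATE):
    """Return the first name in candidates that setlocale accepts, or None."""
    saved = locale.setlocale(category)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
            except locale.Error:
                continue  # not installed or not generated on this system
            return name
        return None
    finally:
        locale.setlocale(category, saved)  # leave the process locale unchanged


# Hypothetical candidate spellings; which ones exist depends on the system.
print(first_available_locale(["sk_SK.UTF-8", "sk_SK.utf8", "sk_SK"]))
```

If this prints None, none of the listed names is available and the locale probably still has to be generated or installed.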


    --
    -----------------------------------------------------------
    | Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
    | __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
    -----------------------------------------------------------
    Antivirus alert: file .signature infected by signature virus.
    Hi! I'm a signature virus! Copy me into your signature file to help me spread!


    • Stano Paska

      #3
      Re: sorting slovak utf

      Another thing I forgot to write...

      I have Windows XP and Python 2.3.2.

      Stano.

      Radovan Garabik wrote:

      > Stano Paska <paska@kios.sk> wrote:
      >
      >>Hi,
      >>
      >>I have one problem.
      >>In file aaa.txt I have slovak letters in utf-8.
      >>
      >
      >
      > ...
      >
      >
      >>I tried:
      >>import locale
      >>locale.setlocale(locale.LC_CTYPE, 'sk_SK.utf-8')
      >>and
      >>locale.setlocale(locale.LC_CTYPE, ('sk_SK', 'utf-8'))
      >>but i got "unsupported locale" error
      >>
      >>What I must do to get correct sorting result?
      >
      >
      > you probably do not have sk_SK.UTF-8 locale generated
      > what OS, version are you using?
      > What is the output of locale -a ?
      > In some linux distributions, e.g. debian, you have to
      > generate the locale beforehand, with locale-gen
      > (according to /etc/locale.gen file)
      >




      • Martin v. Löwis

        #4
        Re: sorting slovak utf

        Stano Paska <paska@kios.sk> writes:

        > import locale
        > locale.setlocale(locale.LC_CTYPE, 'sk_SK.utf-8')
        > and
        > locale.setlocale(locale.LC_CTYPE, ('sk_SK', 'utf-8'))
        > but i got "unsupported locale" error
        >
        > What I must do to get correct sorting result?

        You don't need to operate in a UTF-8 locale. Instead, any Slovak
        locale will do, provided your system offers locale.strcoll for Unicode
        objects (try locale.strcoll(u"", u"")).

        In this case, you can convert all strings to Unicode, and then collate
        using locale.strcoll.

        Alternatively, you could set the locale to any Slovak locale, and use
        locale.getpreferredencoding() to find the locale's encoding. Then you
        could convert all input strings to that encoding, and use
        locale.strcoll to collate them as byte strings.
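
In present-day Python 3, where every str is already Unicode, the first approach might look like the sketch below. It is not the Python 2.3 code this thread is about; the function name `sort_with_locale` is made up for illustration, and the locale name passed in must be one the system actually provides.

```python
import locale


def sort_with_locale(lines, locale_name):
    """Sort strings using the collation rules of locale_name."""
    saved = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        # Raises locale.Error if the locale is not available.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm maps each string to a key that compares like strcoll.
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


# "C" always exists but applies no language-specific rules; a Slovak
# locale name would be passed here instead.
print(sort_with_locale(["b", "c", "a"], "C"))
```

Using locale.strxfrm as the sort key is equivalent in ordering to comparing with locale.strcoll, and it avoids the need for a comparison function.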

        Regards,
        Martin


        • Serge Orlov

          #5
          Re: sorting slovak utf

          "Stano Paska" <paska@kios.sk> wrote in message news:mailman.229.1070901518.16879.python-list@python.org...
          >
          > I have windows xp, python 2.3.2
          In this case you need to pass 'slovak' instead of 'sk' to
          locale.setlocale(). It's not written in the docs, but the locale name is
          system-dependent. I wonder whether that is a bug.
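
Since the accepted names differ per platform, one option is to keep a list of spellings for each platform and try them in turn with locale.setlocale(). A sketch in present-day Python 3; the function name is hypothetical, and every spelling other than 'slovak' and 'sk_SK.UTF-8' is an assumption that may not exist on a given system:

```python
import sys


def slovak_locale_candidates(platform=sys.platform):
    """Return locale names worth trying for Slovak on the given platform."""
    if platform.startswith("win"):
        # Windows uses language names rather than POSIX-style codes.
        return ["slovak", "Slovak_Slovakia.1250"]
    # POSIX-style names; availability depends on the generated locales.
    return ["sk_SK.UTF-8", "sk_SK.utf8", "sk_SK"]


print(slovak_locale_candidates())
```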

