How to read space separated file in python?

  • ganesh gajre

    How to read space separated file in python?

    Hi all,

    I want to read a mapping file, which is used to map characters from a TTF
    font to Unicode.

    For example, the map file contains data in the following way:

    0 ०
    1 १
    2 २
    3 ३
    4 ४
    5 ५
    6 ६
    7 ७
    8 ८
    9 ९

    Like this. Please use any Unicode editor to view the text if it is not
    shown properly.

    Now I want to read the two characters separately, like:

    str[0]=0 and str2[0]=०

    How can I do this?

    Please give me a solution.

    Regards,
    Ginovation
  • Steven D'Aprano

    #2
    Re: How to read space separated file in python?

    On Fri, 21 Nov 2008 14:16:13 +0530, ganesh gajre wrote:
    > [...]
    > please give me solution?

    Well, because you said please...

    I assume the encoding of the second column is utf-8. You need something
    like this:


    # Untested.
    column0 = []
    column1 = []
    for line in open('somefile', 'r'):
        a, b = line.split()
        column0.append(a)
        column1.append(b.decode('utf-8'))
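For readers on Python 3, here is a minimal sketch of the same two-column idea. It assumes the file is UTF-8 encoded and that every non-blank line holds exactly two whitespace-separated fields; the helper name `read_columns` and the sample data are hypothetical, made up for illustration.

```python
import os
import tempfile


def read_columns(path):
    """Return two parallel lists: the first and second field of each line."""
    column0, column1 = [], []
    # Python 3 decodes while reading, so no explicit .decode() call is needed.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            a, b = line.split()
            column0.append(a)
            column1.append(b)
    return column0, column1


# Build a small sample file (hypothetical data mirroring the question).
sample = "0 \u0966\n1 \u0967\n2 \u0968\n"
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)

ascii_digits, devanagari_digits = read_columns(tmp.name)
os.remove(tmp.name)
print(ascii_digits, devanagari_digits)
```

Passing `encoding="utf-8"` to `open()` plays the role that `b.decode('utf-8')` plays in the Python 2 snippet above.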





    --
    Steven

    • Peter Otten

      #3
      Re: How to read space separated file in python?

      ganesh gajre wrote:
      > [...]
      > Now i want to read both the character separately like:
      >
      > str[0]=0 and str2[0]=०
      >
      > How can i do this?

      Read the file:
      >>> import codecs
      >>> pairs = [line.split() for line in codecs.open("ganesh.txt",
      ... encoding="utf-8")]
      >>> pairs[0]
      [u'0', u'\u0966']

      Create the conversion dictionary:
      >>> trans = dict((ord(s), t) for s, t in pairs)

      Do the translation:
      >>> print u"01109876".translate(trans)
      ०११०९८७६

      You may have to use int(s) instead of ord(s) in your actual conversion code:
      >>> trans = dict((int(s), t) for s, t in pairs)
      >>> print u"\x00\x01\x09".translate(trans)
      ०१९
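The session above is Python 2. As a rough Python 3 sketch of the same translation-table approach: `str.translate` accepts a dict keyed by code point, so the table can be built with `ord`. The `pairs` list is generated here instead of being read from the file, and `build_translation` is a hypothetical helper name.

```python
def build_translation(pairs):
    """Map the code point of each source character to its replacement."""
    return {ord(src): dst for src, dst in pairs}


# Hypothetical pairs, shaped like the result of line.split() on each line:
# ASCII digit n maps to the Devanagari digit at U+0966 + n.
pairs = [(str(n), chr(0x0966 + n)) for n in range(10)]

table = build_translation(pairs)
converted = "01109876".translate(table)
print(converted)
```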

      Peter

      • Joe Strout

        #4
        Re: How to read space separated file in python?

        On Nov 21, 2008, at 2:08 AM, Steven D'Aprano wrote:
        > a, b = line.split()

        Note that in a case like this, you may want to consider using
        partition instead of split:

        a, sep, b = line.partition(' ')

        This way, if there happens to be more than one space (for example,
        because the Unicode character you're mapping to happens to be a
        space), it'll still work. It also better encodes the intention, which
        is to split only on the first space in the line, rather than on every
        space.

        (It so happens I ran into exactly this issue yesterday, though my
        delimiter was a colon.)
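A small sketch of the partition behaviour described above, assuming the rule is "everything after the first space is the value". The helper name `split_first_space` is made up for illustration.

```python
def split_first_space(line):
    """Split on the first space only and return (key, value)."""
    # Strip only the trailing newline so a value that is itself a space survives.
    key, sep, value = line.rstrip("\n").partition(" ")
    return key, value


print(split_first_space("1 x\n"))
print(split_first_space("2  \n"))           # delimiter followed by a space value
print(split_first_space("3 x and more\n"))  # later spaces stay in the value
```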

        Cheers,
        - Joe

        • Steve Holden

          #5
          Re: How to read space separated file in python?

          Joe Strout wrote:
          > On Nov 21, 2008, at 2:08 AM, Steven D'Aprano wrote:
          >
          >> a, b = line.split()
          >
          > Note that in a case like this, you may want to consider using partition
          > instead of split:
          >
          > a, sep, b = line.partition(' ')
          >
          > This way, if there happens to be more than one space (for example,
          > because the Unicode character you're mapping to happens to be a space),
          > it'll still work. It also better encodes the intention, which is to
          > split only on the first space in the line, rather than on every space.
          >
          > (It so happens I ran into exactly this issue yesterday, though my
          > delimiter was a colon.)

          Joe:

          In the special case of the None first argument (the default for the
          str.split() method) runs of whitespace *are* treated as single
          delimiters. So line.split() is not the same as line.split(' ').
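To make the difference concrete, a short sketch comparing the two calls on a line with several spaces between the fields (the sample line is invented for illustration):

```python
line = "1   x"  # three spaces between the two fields

by_whitespace = line.split()       # runs of whitespace act as one delimiter
by_single_space = line.split(" ")  # every single space is a delimiter

print(by_whitespace)
print(by_single_space)
```

With the explicit `' '` argument, the extra spaces show up as empty strings in the result, so unpacking into two names would fail.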

          regards
          Steve
          --
          Steve Holden +1 571 484 6266 +1 800 494 3119
          Holden Web LLC http://www.holdenweb.com/

          • Joe Strout

            #6
            Re: How to read space separated file in python?

            On Nov 21, 2008, at 9:00 AM, Steve Holden wrote:
            > In the special case of the None first argument (the default for the
            > str.split() method) runs of whitespace *are* treated as single
            > delimiters. So line.split() is not the same as line.split(' ').

            Right -- so using split() gives you the wrong answer for two different
            reasons. Try these:
            >>> line = "1 x"
            >>> a, b = line.split()  # b == "x", which is correct
            >>> line = "2  "
            >>> a, b = line.split()  # correct answer would be b == " "
            ValueError: need more than 1 value to unpack
            >>> line = "3 x and here is some extra stuff"
            >>> a, b = line.split()  # correct answer would be b == "x and here is some extra stuff"
            ValueError: too many values to unpack

            Partition handles these cases correctly (at least, within the OP's
            specification that the value of "b" should be whatever comes after the
            first space).

            Cheers,
            - Joe

            • Gabriel Genellina

              #7
              Re: How to read space separated file in python?

              On Fri, 21 Nov 2008 14:13:23 -0200, Joe Strout <joe@strout.net> wrote:
              > Right -- so using split() gives you the wrong answer for two different
              > reasons. Try these:
              >
              > >>> line = "1 x"
              > >>> a, b = line.split()  # b == "x", which is correct
              >
              > >>> line = "2  "
              > >>> a, b = line.split()  # correct answer would be b == " "
              > ValueError: need more than 1 value to unpack
              >
              > >>> line = "3 x and here is some extra stuff"
              > >>> a, b = line.split()  # correct answer would be b == "x and here is some extra stuff"
              > ValueError: too many values to unpack
              >
              > Partition handles these cases correctly (at least, within the OP's
              > specification that the value of "b" should be whatever comes after the
              > first space).

              split takes an additional argument too:

              py> line = "3 x and here is some extra stuff"
              py> a, b = line.split(None, 1)
              py> a
              '3'
              py> b
              'x and here is some extra stuff'

              But it still fails if the line contains no spaces: split() then returns
              a single item and the two-name unpacking raises ValueError. partition is
              more robust in those cases.
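A hedged sketch of that robustness point: `partition` always returns three items, and an empty separator signals that no space was found. The function name `parse_mapping_line` and the choice to return `None` for such lines are assumptions for illustration, not something from the thread.

```python
def parse_mapping_line(line):
    """Return (key, value), or None when the line has no space at all."""
    key, sep, value = line.partition(" ")
    if not sep:
        return None  # no delimiter found; the caller decides what to do
    return key, value


print(parse_mapping_line("3 x and here is some extra stuff"))
print(parse_mapping_line("nospace"))

# split(None, 1) yields a one-item list here, so two-name unpacking fails.
try:
    a, b = "nospace".split(None, 1)
except ValueError as exc:
    print("split failed:", exc)
```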

              --
              Gabriel Genellina

              • Steve Holden

                #8
                Re: How to read space separated file in python?

                Joe Strout wrote:
                > [...]
                > Partition handles these cases correctly (at least, within the OP's
                > specification that the value of "b" should be whatever comes after the
                > first space).

                I believe if you read the OP's post again you will see that he specified
                two non-space items per line.

                You really *love* being right, don't you? ;-) You say partition "...
                better encodes the intention, which is to split only on the first space
                in the line, rather than on every space". Your mind-reading abilities
                are clearly superior to mine.

                Anyway, sorry to have told you something you already knew. It's true
                that partition has its place, and is too often overlooked. Particularly
                by me.

                regards
                Steve
                --
                Steve Holden +1 571 484 6266 +1 800 494 3119
                Holden Web LLC http://www.holdenweb.com/
