searching through a string and pulling characters

  • Alexnb

    searching through a string and pulling characters


    This is similar to my last post, but a little different. Here is what I would
    like to do.

    Let's say I have a text file. The contents look like this, only there is A
    LOT of the same thing.

    () A registry mark given by underwriters (as at Lloyd's) to ships in
    first-class condition. Inferior grades are indicated by A 2 and A 3.
    () The first three letters of the alphabet, used for the whole alphabet.
    () In church or chapel style; -- said of compositions sung in the old church
    style, without instrumental accompaniment; as, a mass a capella, i. e., a
    mass purely vocal.
    () Astride; with a part on each side; -- used specif. in designating the
    position of an army with the wings separated by some line of demarcation, as
    a river or road.

    Now, I am talking 1000's of these. I need to do something like this. I will
    have a number, and what I want to do is go through this text file, just like
    the example. The trick is this: those "()"s are what I need to match, so if
    the number is 245, I need to find the 245th () and then get all the text
    after it until the next (). If you have an idea about the best way to
    do this, I would love your help. If you made it all the way through, thanks!
    ;)
    --
    View this message in context: http://www.nabble.com/searching-thro...p19039594.html
    Sent from the Python - python-list mailing list archive at Nabble.com.

  • Wojtek Walczak

    #2
    Re: searching through a string and pulling characters

    On Mon, 18 Aug 2008 13:40:13 -0700 (PDT), Alexnb wrote:
    >Now, I am talking 1000's of these. I need to do something like this. I will
    >have a number, and what I want to do is go through this text file, just like
    >the example. The trick is this, those "()'s" are what I need to match, so if
    >the number is 245 I need to find the 245th () and then get the all the text
    >from after it until the next (). If you have an idea about the best way to
    >do this I would love your help. If you made it all the way through thanks!
    >;)
    findall comes to mind:
    >>> a = """(string1)
    ... (string2)
    ... (string3)
    ... (string4)
    ... (string5)
    ... (string6)"""
    >>> import re
    >>> pat = re.compile(r"(\(.*?\))")
    and now let's say you want to get the fourth element:
    >>> pat.findall(a)[3]
    '(string4)'

    To save some memory use finditer (as long as you don't have to search
    for too many of these):
    >>> for i, match in enumerate(pat.finditer(a)):
    ...     if i == 2:
    ...         print(match.group())
    ...
    (string3)

    --
    Regards,
    Wojtek Walczak,
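
As a rough sketch of the finditer idea above: itertools.islice can skip straight to the n-th match without building the full list of matches. The helper name nth_match is made up for this illustration, and the pattern is the one from the post; the same approach should work for any compiled pattern.

```python
import re
from itertools import islice


def nth_match(pattern, text, n):
    """Return the n-th match (counting from 0) of pattern in text, or None."""
    matches = pattern.finditer(text)  # lazy iterator, no list is built
    match = next(islice(matches, n, None), None)  # skip the first n matches
    return match.group() if match else None


pat = re.compile(r"\(.*?\)")
a = "(string1)\n(string2)\n(string3)\n(string4)"
print(nth_match(pat, a, 2))  # (string3)
```

Returning None for an out-of-range n is a design choice of this sketch; raising IndexError, as list indexing does, would be just as reasonable.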


    • Wojtek Walczak

      #3
      Re: searching through a string and pulling characters

      On Mon, 18 Aug 2008 21:43:43 +0000 (UTC), Wojtek Walczak wrote:
      On Mon, 18 Aug 2008 13:40:13 -0700 (PDT), Alexnb wrote:
      >Now, I am talking 1000's of these. I need to do something like this. I will
      >have a number, and what I want to do is go through this text file, just like
      >the example. The trick is this, those "()'s" are what I need to match, so if
      >the number is 245 I need to find the 245th () and then get the all the text
      >from after it until the next (). If you have an idea about the best way to
      >do this I would love your help. If you made it all the way through thanks!
      >;)
      >
      >findall comes to mind:
      ...forget it, I misread your post :)

      --
      Regards,
      Wojtek Walczak,


      • John Machin

        #4
        Re: searching through a string and pulling characters

        On Aug 19, 6:40 am, Alexnb <alexnbr...@gmail.com> wrote:
        >This is similar to my last post,
        Oh, goodie goodie goodie, I love guessing games!
        >but a little different. Here is what I would
        >like to do.
        >
        >Lets say I have a text file. The contents look like this, only there is A
        >LOT of the same thing.
        >
        >() A registry mark given by underwriters (as at Lloyd's) to ships in
        >first-class condition. Inferior grades are indicated by A 2 and A 3.
        >() The first three letters of the alphabet, used for the whole alphabet.
        >() In church or chapel style; -- said of compositions sung in the old church
        >style, without instrumental accompaniment; as, a mass a capella, i. e., a
        >mass purely vocal.
        >() Astride; with a part on each side; -- used specif. in designating the
        >position of an army with the wings separated by some line of demarcation, as
        >a river or road.
        This looks like the "values" part of an abbreviation/acronym
        dictionary ... what has happened to the "keys" part (A1, ABC, AC, ?
        astride?, ...)

        Does "()" appear always at the start of a line (perhaps preceded by
        some whitespace), or can it appear in the middle of a line?

        Are you sure about "A 2" and "A 3"? I would have expected "A2" and
        "A3". In other words, is the above an exact copy of some input or have
        you re-typed it?

        "()" is a strange way of delimiting things ...

        OK, here's my guess: You have acquired a database with two tables.
        Table K maps e.g. "ABC" to 2. Table V maps 2 to "The first three
        letters of the alphabet, used for the whole alphabet." You have used
        some utility or done "select '() ' + column2 from V".
        >
        >Now, I am talking 1000's of these. I need to do something like this. I will
        >have a number, and what I want to do is go through this text file, just like
        >the example. The trick is this, those "()'s" are what I need to match, so if
        >the number is 245 I need to find the 245th () and then get the all the text
        >from after it until the next (). If you have an idea about the best way to
        >do this I would love your help.
        The best way to do this is to write a small simple Python script. I
        suggest that you try this, and if you have difficulties, post your
        attempt here together with a lucid description of the perceived
        problem.

        However searching through a large file (how many Mb?) looking for the
        nth occurrence of "()" doesn't sound like a good idea after about the
        10th time you do it. Perhaps it might be worth the extra effort to
        process the text file once and insert the results in a (say) SQLite
        data base so that later you can do "select column2 from V where
        column1 = 245".
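
One way the process-once idea might look, as a rough sketch using Python's standard sqlite3 module. The table and column names (V, column1, column2) come from the queries quoted in this post; the function names, the in-memory database, and the sample text are assumptions made for illustration only.

```python
import sqlite3


def build_db(text, path=":memory:", delimiter="()"):
    """Split text on the delimiter once and store the numbered definitions."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS V (column1 INTEGER PRIMARY KEY, column2 TEXT)"
    )
    blocks = text.split(delimiter)[1:]  # drop whatever precedes the first "()"
    # Number the definitions from 1 and collapse line breaks inside each one.
    rows = [(i, " ".join(b.split())) for i, b in enumerate(blocks, start=1)]
    conn.executemany("INSERT OR REPLACE INTO V VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, n):
    """Return definition number n, or None if there is no such row."""
    row = conn.execute("SELECT column2 FROM V WHERE column1 = ?", (n,)).fetchone()
    return row[0] if row else None


conn = build_db("() first def\n() second def\n() third def\n")
print(lookup(conn, 2))  # second def
```

Passing a file name instead of ":memory:" would keep the table on disk, so the text file only has to be parsed once.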

        A really silly question: You say "I will have a number" (e.g. 245);
        what is the source or provenance of this ordinal? A random number
        generator? Inscription on a ticket passed through a wicket? "select
        column2 from K where column1 = 'A1'"? IOW, perhaps you may need to
        consider the larger problem.

        Cheers,
        John


        • Alexnb

          #5
          Re: searching through a string and pulling characters


          Okay, well the point of this program is to steal from the OS X built-in
          dictionary. While most of the files are hidden this one is not.
          The "()" You saw actually looks like this: () only the []'s are <'s
          and >'s but the forum doesn't take kindly to html.

          What you saw was exactly how it will always be (by that I am talking about
          the A 2 A 3 thing)

          The number is based on the word(s) the user types into my program: it
          finds the position of that word in the list of words, then searches
          the definitions document and goes to the nth definition. It probably won't
          work, but that is the idea.
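
A minimal sketch of the lookup described above, assuming the word list and the definitions are already loaded as two parallel Python lists. The function names and the sample data are hypothetical; the thread never shows the real word list.

```python
def build_index(words):
    """Map each word to its 1-based position in the word list."""
    index = {}
    for position, word in enumerate(words, start=1):
        # Keep the first position if a word is listed more than once.
        index.setdefault(word.lower(), position)
    return index


def define(word, index, definitions):
    """Return the definition at the word's position, or None if unknown."""
    n = index.get(word.lower())
    if n is None or n > len(definitions):
        return None
    return definitions[n - 1]  # positions count from 1, lists from 0


# Hypothetical data standing in for the real files.
words = ["A 1", "ABC", "a cappella"]
definitions = [
    "A registry mark given by underwriters.",
    "The first three letters of the alphabet.",
    "In church or chapel style.",
]
index = build_index(words)
print(define("abc", index, definitions))  # The first three letters of the alphabet.
```

This only works if the two lists stay in the same order, which is the assumption the whole scheme rests on.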

          Also, on a side note, does anyone know a very simple dictionary site that
          isn't dictionary.com or yourdictionary.com? Or a free dictionary that I can
          download to have an offline reference?


          • John Machin

            #6
            Re: searching through a string and pulling characters

            On Aug 19, 8:34 am, Alexnb <alexnbr...@gmail.com> wrote:
            >The number is based on the word(s) they type into my program, and then it
            >fetches the number that word is in the list of words and then will search
            >the definitions document and go to the nth def. It probably won't work, but
            >that is the Idea.
            Consider (1) an existing (free) dictionary application (2) using a
            database, if you feel you must write your own application.
            >
            >Also, on a side-note, does anyone know a very simple dictionary site, that
            >isn't dictionary.com or yourdictionary.com. Or, a free dictionary that I can
            >download to have an offline reference?
            What happened when you did:


            • Steven D'Aprano

              #7
              Re: searching through a string and pulling characters

              On Mon, 18 Aug 2008 13:40:13 -0700, Alexnb wrote:
              >Lets say I have a text file. The contents look like this, only there is
              >A LOT of the same thing.
              >
              >() A registry mark given by underwriters (as at Lloyd's) to ships in
              >first-class condition. Inferior grades are indicated by A 2 and A 3. ()
              >The first three letters of the alphabet, used for the whole alphabet. ()
              >In church or chapel style; -- said of compositions sung in the old
              >church style, without instrumental accompaniment; as, a mass a capella,
              >i. e., a mass purely vocal.
              >() Astride; with a part on each side; -- used specif. in designating the
              >position of an army with the wings separated by some line of
              >demarcation, as a river or road.
              >
              >Now, I am talking 1000's of these. I need to do something like this. I
              >will have a number, and what I want to do is go through this text file,
              >just like the example. The trick is this, those "()'s" are what I need
              >to match, so if the number is 245 I need to find the 245th () and then
              >get the all the text from after it until the next (). If you have an
              >idea about the best way to do this I would love your help. If you made
              >it all the way through thanks! ;)

              If I take your description of the problem literally, then the solution is:

              text = "() A registry mark given ..."  # lots and lots of text
              blocks = text.split("()")  # use a literal "()" as a delimiter
              answer = blocks[n]  # whichever number you want, starting counting at 0
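
The split approach above could be wrapped up roughly as follows. This is only a sketch: the function name nth_definition and the sample text are invented for illustration, and it assumes the text starts with a "()" so that the first block is empty and definition number n (counting from 1) lands at index n.

```python
def nth_definition(text, n, delimiter="()"):
    """Return the text after the n-th delimiter, counting from 1."""
    blocks = text.split(delimiter)
    # blocks[0] is whatever comes before the first delimiter (often empty),
    # so the n-th definition sits at index n.
    if not 1 <= n < len(blocks):
        raise IndexError("no definition number %d" % n)
    # Collapse line breaks and runs of spaces into single spaces.
    return " ".join(blocks[n].split())


sample = (
    "() A registry mark given by underwriters.\n"
    "() The first three letters of the alphabet,\n"
    "used for the whole alphabet.\n"
    "() Astride; with a part on each side.\n"
)
print(nth_definition(sample, 2))
# The first three letters of the alphabet, used for the whole alphabet.
```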


              I suspect that the problem is more complicated than you are saying. I
              guess that in your actual data, the brackets () probably have something
              inside them. It looks like you are quoting definitions from a dictionary.

              Alex, a word of advice for you: we really don't like playing guessing
              games. If you get a reputation for describing your problem inaccurately,
              incompletely or cryptically, you will find fewer and fewer people willing
              to answer your questions. I recommend that you spend a few minutes now
              reading this page and save yourself a lot of grief later:



              Now, back to your problem. If my guess is right, and the brackets
              actually have text inside them, then my simple solution above will not
              work. You will need a more complicated solution using a regular
              expression or a parser. That solution will depend on whether or not you
              can get nested brackets "(ab (123 (fee fi fum) 456) cd ef)" or arbitrary
              single brackets without the matching pair.
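
For the simpler, non-nested case, a regular expression sketch might look like the following. It assumes that every entry begins a line with a bracketed marker (possibly empty) and that markers are never nested; the marker texts "(n.)" and "(a.)" and the function name are invented for the example, not taken from the real data.

```python
import re

# Assumption: a marker is "(...)" at the start of a line, with no nested
# brackets inside it. Brackets in the middle of a line are left alone.
MARKER = re.compile(r"^[ \t]*\([^()\n]*\)", re.MULTILINE)


def entries(text):
    """Return the text that follows each line-initial bracketed marker."""
    parts = MARKER.split(text)
    # parts[0] is whatever precedes the first marker, so it is skipped.
    return [" ".join(p.split()) for p in parts[1:]]


sample = """(n.) A registry mark given by underwriters (as at Lloyd's) to ships in
first-class condition.
(a.) Astride; with a part on each side.
"""
result = entries(sample)
print(result[1])  # Astride; with a part on each side.
```

A wrapped line that happens to start with a bracket would be mistaken for a marker, so this would not be safe on text where that can occur; nested or unbalanced brackets need a character-by-character approach instead.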

              Your question also sounds suspiciously like homework. I don't do people's
              homework, but here's something to get you started. It's not a solution,
              but it can be used as the first step towards a solution.

              text = "() A registry mark given ..."  # lots and lots of text
              level = 0
              blocks = []
              for c in text:  # process text one character at a time
                  if c == '(':
                      print("Found an opening bracket")
                      level += 1  # one deeper in brackets
                  elif c == ')':
                      level -= 1
                      if level < 0:
                          print("Found a close bracket without matching open bracket")
                      else:
                          print("Found a closing bracket")
                  else:  # any other character
                      # here's where you do the real work
                      if level == 0:
                          print("Not inside a bracket")
                          blocks.append(c)
                      else:
                          print("Inside a bracket")
              if level > 0:
                  print("Missing close bracket")
              text_minus_bracketed_words = ''.join(blocks)



              --
              Steven


              • John Machin

                #8
                Re: searching through a string and pulling characters

                On Aug 19, 8:34 am, Alexnb <alexnbr...@gmail.com> wrote:
                >
                >The number is based on the word(s) they type into my program, and then it
                >fetches the number that word is in the list of words and then will search
                >the definitions document and go to the nth def. It probably won't work, but
                >that is the Idea.
                Consider (1) an existing (free) dictionary application (2) using a
                database, if you feel you must write your own application.
                >
                >Also, on a side-note, does anyone know a very simple dictionary site, that
                >isn't dictionary.com or yourdictionary.com. Or, a free dictionary that I can
                >download to have an offline reference?
                There's this thing called google (http://www.google.com). It's an
                example of a "web search engine". If you type (for example) "free
                dictionary download" (without the quotes!) into the text box and then
                click on the "Google Search" button, it will come back with a list of
                web pages where those words appear (e.g. http://www.dicts.info/dictionaries.php)

                HTH,
                John


                • Alexnb

                  #9
                  Re: searching through a string and pulling characters


                  If by "What happened when you did:" you mean dictionary.com and
                  yourdictionary.com: nothing. They work, but screen scraping is mediocre at
                  best. They both work fine (yourdictionary is better for screen scraping),
                  but I want an offline solution. The whole reason for the program
                  is that I can type in 20 words at one time, get them defined and formatted,
                  and then save them all from my app. So far, all is good; I just need an
                  offline solution, or one from a database. You suggest a free dictionary
                  program, but how can I get definitions from another program without opening
                  it? Any ideas?



                  • Steven D'Aprano

                    #10
                    Re: searching through a string and pulling characters

                    On Mon, 18 Aug 2008 15:34:12 -0700, Alexnb wrote:
                    >Okay, well the point of this program is to steal from the OS X built-in
                    >dictionary.
                    Ah, not homework, but copyright infringement.
                    >Also, on a side-note, does anyone know a very simple dictionary site,
                    >that isn't dictionary.com or yourdictionary.com. Or, a free dictionary
                    >that I can download to have an offline reference?

                    Googling on "free dictionary OS X" comes up with 417,000 hits. I'm pretty
                    sure at least some of them will be relevant to what you want.



                    --
                    Steven


                    • Paul Boddie

                      #11
                      Re: searching through a string and pulling characters

                      On 19 Aug, 01:11, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
                      >On Mon, 18 Aug 2008 15:34:12 -0700, Alexnb wrote:
                      >>Okay, well the point of this program is to steal from the OS X built-in
                      >>dictionary.
                      >
                      >Ah, not homework, but copyright infringement.
                      It depends what the inquirer is doing and what they mean by "steal".
                      Given the propaganda around "unauthorised" usage of content that is
                      pervasive these days ("you don't own that DVD: you just have our
                      temporary and conditional permission to watch it, pirate!"), the
                      inquirer may have been led to believe that just reading from a file on
                      their own system rather than using the nominated application is
                      somehow to "steal" from that file, even though it is content which has
                      presumably been obtained legitimately, even paid for in the case of OS
                      X.

                      Even if the end-user licence agreement were to attempt to wash away
                      any "fair use" (or just common sense) rights to using the content in
                      the way described by the inquirer - recalling that OS X is an Apple
                      product, so such games wouldn't be beneath that particular vendor - I
                      can't see how it does much good to dignify such antics with
                      unqualified cries of "copyright infringement". Indeed, for those not
                      acquainted with copyright and licensing, it probably just serves to
                      reinforce the dishonest message that they have to pay over and over
                      for content they already have and not to question what it is they're
                      paying for.

                      Paul


                      • Wojtek Walczak

                        #12
                        Re: searching through a string and pulling characters

                        On Mon, 18 Aug 2008 15:34:12 -0700 (PDT), Alexnb wrote:
                        >Also, on a side-note, does anyone know a very simple dictionary site, that
                        >isn't dictionary.com or yourdictionary.com.
                        This one is my favourite: http://www.lingro.com/


                        --
                        Regards,
                        Wojtek Walczak,

                        • Sean DiZazzo

                          #13
                          Re: searching through a string and pulling characters

                          On Aug 19, 6:11 am, Wojtek Walczak <gmin...@bzt.bzt> wrote:
                          >On Mon, 18 Aug 2008 15:34:12 -0700 (PDT), Alexnb wrote:
                          >>Also, on a side-note, does anyone know a very simple dictionary site, that
                          >>isn't dictionary.com or yourdictionary.com.
                          >
                          >This one is my favourite: http://www.lingro.com/
                          >
                          >--
                          >Regards,
                          >Wojtek Walczak, http://tosh.pl/gminick/
                          That's hot!
