Double replace or single re.sub?

  • Iain King

    Double replace or single re.sub?

    I have some code that converts html into xhtml. For example, convert
    all <i> tags into <em>. Right now I need to do two string.replace calls
    for every tag:

    html = html.replace('<i>','<em>')
    html = html.replace('</i>','</em>')

    I can change this to a single call to re.sub:

    html = re.sub('<([/]*)i>', r'<\1em>', html)

    Would this be a quicker/better way of doing it?
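
For reference, a minimal sketch of the two approaches side by side, on a made-up sample string. The pattern uses `/?` (an optional slash) rather than the `[/]*` from the post, which is an assumption about the intent; both match the same tags on this input.

```python
import re

# Hypothetical sample input, not taken from the thread.
html = "<p><i>aap</i> and <i>noot</i></p>"

# Approach 1: two str.replace calls, one per tag form.
via_replace = html.replace("<i>", "<em>").replace("</i>", "</em>")

# Approach 2: one re.sub call; the group captures the optional slash
# so that it can be carried over into the replacement.
via_sub = re.sub(r"<(/?)i>", r"<\1em>", html)

print(via_replace)
print(via_sub == via_replace)
```

Both give the same result here, so the choice comes down to speed and readability.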

    Iain

  • Mike Meyer

    #2
    Re: Double replace or single re.sub?

    "Iain King" <iainking@gmail.com> writes:

    > I have some code that converts html into xhtml. For example, convert
    > all <i> tags into <em>. Right now I need to do two string.replace calls
    > for every tag:
    >
    > html = html.replace('<i>','<em>')
    > html = html.replace('</i>','</em>')
    >
    > I can change this to a single call to re.sub:
    >
    > html = re.sub('<([/]*)i>', r'<\1em>', html)
    >
    > Would this be a quicker/better way of doing it?

    Maybe. You could measure it and see. But neither will work in the face
    of attributes or whitespace in the tag.
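
A short sketch of that failure mode, using made-up inputs. The helper names `convert_replace` and `convert_sub` are hypothetical wrappers around the two approaches from the question.

```python
import re

def convert_replace(html):
    # The double-replace approach from the question.
    return html.replace("<i>", "<em>").replace("</i>", "</em>")

def convert_sub(html):
    # The single re.sub approach, with /? standing in for [/]*.
    return re.sub(r"<(/?)i>", r"<\1em>", html)

with_attribute = '<i class="x">aap</i>'
with_whitespace = "<i >aap</i >"

for sample in (with_attribute, with_whitespace):
    print(repr(sample), "->", repr(convert_replace(sample)), repr(convert_sub(sample)))
```

With an attribute present, only the closing tag is rewritten, which leaves mismatched tags; with whitespace inside the tags, nothing is rewritten at all.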

    If you're going to parse [X]HTML, you really should use tools that are
    designed for the job. If you have well-formed HTML, you can use the
    htmllib parser in the standard library. If you have the usual crap one
    finds on the web, I recommend BeautifulSoup.
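
As a rough illustration of the parser-based route, here is a sketch using `html.parser.HTMLParser` from the Python 3 standard library. This is a swapped-in tool, not the htmllib module named above, and `TagRenamer` and `rename_tags` are hypothetical names. The sketch drops comments and doctype declarations and normalises attribute quoting, so treat it as a starting point rather than a finished converter.

```python
from html import escape
from html.parser import HTMLParser


class TagRenamer(HTMLParser):
    """Hypothetical helper: re-emit markup, renaming tags via a mapping."""

    def __init__(self, mapping):
        # Keep entity and character references as written in the input.
        super().__init__(convert_charrefs=False)
        self.mapping = mapping
        self.out = []

    def _open(self, tag, attrs, closer=">"):
        # Render an opening tag, applying the rename and re-quoting attributes.
        name = self.mapping.get(tag, tag)
        rendered = "".join(
            f" {key}" if value is None
            else f' {key}="{escape(value, quote=True)}"'
            for key, value in attrs
        )
        return f"<{name}{rendered}{closer}"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._open(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._open(tag, attrs, closer=" />"))

    def handle_endtag(self, tag):
        self.out.append(f"</{self.mapping.get(tag, tag)}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def rename_tags(html, mapping):
    parser = TagRenamer(mapping)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(rename_tags('<i class="x">aap</i >', {"i": "em"}))
```

Because the parser tokenises the tags, attributes and stray whitespace no longer defeat the rename.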

    <mike
    --
    Mike Meyer <mwm@mired.org> http://www.mired.org/home/mwm/
    Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


    • Iain King

      #3
      Re: Double replace or single re.sub?


      Mike Meyer wrote:
      > "Iain King" <iainking@gmail.com> writes:
      >
      > > I have some code that converts html into xhtml. For example, convert
      > > all <i> tags into <em>. Right now I need to do two string.replace calls
      > > for every tag:
      > >
      > > html = html.replace('<i>','<em>')
      > > html = html.replace('</i>','</em>')
      > >
      > > I can change this to a single call to re.sub:
      > >
      > > html = re.sub('<([/]*)i>', r'<\1em>', html)
      > >
      > > Would this be a quicker/better way of doing it?
      >
      > Maybe. You could measure it and see. But neither will work in the face
      > of attributes or whitespace in the tag.
      >
      > If you're going to parse [X]HTML, you really should use tools that are
      > designed for the job. If you have well-formed HTML, you can use the
      > htmllib parser in the standard library. If you have the usual crap one
      > finds on the web, I recommend BeautifulSoup.
      >

      Thanks. My initial post overstates the program a bit - what I actually
      have is a cgi script which outputs my LiveJournal, which I then
      server-side include in my home page (so my home page also displays the
      latest X entries in my LiveJournal). The only html I need to convert
      is the stuff that LJ spews out, which, while bad, isn't terrible, and
      is fairly consistent. The stuff I need to convert is mostly stuff I
      write myself in journal entries, so it doesn't have to be so
      comprehensive that I'd need something like BeautifulSoup. I'm not
      trying to parse it, just clean it up a little.

      Iain


      • SPE - Stani's Python Editor

        #4
        Re: Double replace or single re.sub?

        Of course it is better to precompile the expression, but I guess
        replace will beat even a precompiled regular expression. You could see
        this posting:


        But performance should be measured, not guessed.
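
A minimal sketch of such a measurement with a precompiled pattern, using the `timeit` module. The sample input, the repeat count, and the names `TAG_I`, `with_replace` and `with_compiled` are assumptions for illustration; the timings will differ by machine and Python version, so none are quoted here.

```python
import re
import timeit

# Hypothetical input: a short fragment repeated to give the calls some work.
html = "<i>aap</i> " * 100

# Compile once, outside the timed code.
TAG_I = re.compile(r"<(/?)i>")

def with_replace():
    return html.replace("<i>", "<em>").replace("</i>", "</em>")

def with_compiled():
    return TAG_I.sub(r"<\1em>", html)

for fn in (with_replace, with_compiled):
    seconds = timeit.timeit(fn, number=2000)
    print(f"{fn.__name__}: {seconds:.4f} s for 2000 runs")
```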

        Stani
        --
        SPE - Stani's Python Editor
        Free python IDE for Windows,Mac & Linux with UML,PyChecker,Debugger,GUI design,Blender & more




        • Josef Meile

          #5
          Re: Double replace or single re.sub?

          Hi Iain,
          > Would this be a quicker/better way of doing it?
          I don't know if this is faster, but it is for sure more elegant:



          I really like it because of its simplicity and ease of use. (Thanks to
          Fredrik Lundh for the script). However, I once suggested it as a
          replacement for the approach you describe in a web application we have,
          but it was rejected because the person who benchmarked it said that it
          was OK for small strings, but that performance was an issue for larger
          ones. Anyway, for my own applications, performance isn't an issue, so I
          use it sometimes.

          By the way, the benchmarking, about which I don't have any details,
          was done in Python 2.1.3, so for sure you will get better performance
          with 2.4.

          Regards,
          Josef


          Iain King wrote:
          > I have some code that converts html into xhtml. For example, convert
          > all <i> tags into <em>. Right now I need to do two string.replace calls
          > for every tag:
          >
          > html = html.replace('<i>','<em>')
          > html = html.replace('</i>','</em>')
          >
          > I can change this to a single call to re.sub:
          >
          > html = re.sub('<([/]*)i>', r'<\1em>', html)
          >
          > Iain



          • EP

            #6
            Re: Double replace or single re.sub?

            How does Python execute something like the following

            oldPhrase="My dog has fleas on his knees"
            newPhrase=oldPhrase.replace("fleas",
            "wrinkles").replace("knees","face")

            Does it do two iterations of the replace method on the initial and then
            an intermediate string (my guess) -- or does it compile to something
            more efficient (I doubt it, unless it's Christmas in Pythonville... but
            I thought I'd query)


            • Bengt Richter

              #7
              Re: Double replace or single re.sub?

              On 27 Oct 2005 12:39:18 -0700, "EP" <eric.pederson@gmail.com> wrote:

              >How does Python execute something like the following
              >
              >oldPhrase="My dog has fleas on his knees"
              >newPhrase=oldPhrase.replace("fleas",
              >"wrinkles").replace("knees","face")
              >
              >Does it do two iterations of the replace method on the initial and then
              >an intermediate string (my guess) -- or does it compile to something
              >more efficient (I doubt it, unless it's Christmas in Pythonville... but
              >I thought I'd query)
              >
              Here's a way to get an answer in one form:
              >>> def foo(): # for easy disassembly
              ...     oldPhrase="My dog has fleas on his knees"
              ...     newPhrase=oldPhrase.replace("fleas",
              ...         "wrinkles").replace("knees","face")
              ...
              >>> import dis
              >>> dis.dis(foo)
                2           0 LOAD_CONST               1 ('My dog has fleas on his knees')
                            3 STORE_FAST               1 (oldPhrase)

                3           6 LOAD_FAST                1 (oldPhrase)
                            9 LOAD_ATTR                1 (replace)
                           12 LOAD_CONST               2 ('fleas')

                4          15 LOAD_CONST               3 ('wrinkles')
                           18 CALL_FUNCTION            2
                           21 LOAD_ATTR                1 (replace)
                           24 LOAD_CONST               4 ('knees')
                           27 LOAD_CONST               5 ('face')
                           30 CALL_FUNCTION            2
                           33 STORE_FAST               0 (newPhrase)
                           36 LOAD_CONST               0 (None)
                           39 RETURN_VALUE
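
The two CALL_FUNCTION entries mean that replace runs twice, and the second call operates on the string returned by the first. A small sketch that spells out the intermediate step (the snake_case variable names are illustrative; opcode names in a disassembly vary between Python versions, but the two-call structure is the point):

```python
old_phrase = "My dog has fleas on his knees"

# Step 1: the first replace scans old_phrase and builds a new string.
intermediate = old_phrase.replace("fleas", "wrinkles")

# Step 2: the second replace scans that intermediate string.
new_phrase = intermediate.replace("knees", "face")

# The chained form performs the same two steps without naming the middle value.
chained = old_phrase.replace("fleas", "wrinkles").replace("knees", "face")

print(intermediate)
print(new_phrase == chained)
```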

              Regards,
              Bengt Richter


              • Alex Martelli

                #8
                Re: Double replace or single re.sub?

                Iain King <iainking@gmail.com> wrote:

                > I have some code that converts html into xhtml. For example, convert
                > all <i> tags into <em>. Right now I need to do two string.replace calls
                > for every tag:
                >
                > html = html.replace('<i>','<em>')
                > html = html.replace('</i>','</em>')
                >
                > I can change this to a single call to re.sub:
                >
                > html = re.sub('<([/]*)i>', r'<\1em>', html)
                >
                > Would this be a quicker/better way of doing it?

                *MEASURE*!

                Helen:~/Desktop alex$ python -m timeit -s'import re; h="<i>aap</i>"' \
                > 'h.replace("<i>", "<em>").replace("</i>", "</em>")'
                100000 loops, best of 3: 4.41 usec per loop

                Helen:~/Desktop alex$ python -m timeit -s'import re; h="<i>aap</i>"' \
                > 're.sub("<([/]*)i>", r"<\1em>", h)'
                10000 loops, best of 3: 52.9 usec per loop
                Helen:~/Desktop alex$

                timeit.py is your friend, remember this...!
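
The same comparison can also be run from inside Python. This is a sketch that mirrors the two commands above with `timeit.repeat`; the repeat and loop counts are arbitrary choices, and the numbers it prints depend on the machine and interpreter, so none are claimed here.

```python
import timeit

# Setup code runs once per repeat and is not included in the timing.
setup = 'import re; h = "<i>aap</i>"'

# Raw strings keep the \1 backreference intact inside the timed source code.
statements = {
    "replace": r'h.replace("<i>", "<em>").replace("</i>", "</em>")',
    "re.sub": r're.sub("<([/]*)i>", r"<\1em>", h)',
}

loops = 10000
results = {}
for label, stmt in statements.items():
    # Take the best of three repeats to reduce noise from other processes.
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=loops))
    results[label] = best / loops * 1e6  # microseconds per loop
    print(f"{label}: {results[label]:.2f} usec per loop")
```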


                Alex
