how to remove <BR> using replace function?

This topic is closed.
  • localpricemaps@gmail.com

    how to remove <BR> using replace function?

    i have some html that looks like this


    <address style="color:#" >34 main,<br> Boston, MA</address>

    and i am trying to use the replace function to get rid of the <Br> that
    i scrape out using this code:

    for oText in incident.fetchText(oRE):
        strTitle += oText.strip()
    strTitle = string.replace(strTitle, '<br>', '')

    but it doesn't seem to remove the <br>

    any ideas?
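A minimal runnable sketch of the loop above, assuming the scraped pieces are ordinary strings. The list passed to the hypothetical `clean_title` helper stands in for `incident.fetchText(oRE)`. It uses the `str.replace` method, which works in both Python 2 and 3 (the `string.replace` module function no longer exists in Python 3).

```python
def clean_title(fragments):
    """Join stripped fragments, then drop literal '<br>' tags.

    `fragments` is a hypothetical stand-in for the strings that
    incident.fetchText(oRE) yields in the original post.
    """
    title = ""
    for text in fragments:
        title += text.strip()
    # str.replace returns a new string; it never modifies title in place
    return title.replace("<br>", "")


print(clean_title(["34 main,<br>", " Boston, MA"]))  # 34 main,Boston, MA
```

Note that stripping each fragment before concatenating also removes the space that separated "34 main," from "Boston".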

  • Dylan Moreland

    #2
    Re: how to remove <BR> using replace function?

    I think you want to use the replace method of the string instance.
    Something like this will work:

    # See http://docs.python.org/lib/string-methods.html#l2h-196
    txt = "an unfortunate <br> in the middle"
    txt = txt.replace("<br>", "")
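One thing worth ruling out, since the question mixes `<br>`, `<BR>` and `<Br>`: `str.replace` is case-sensitive. A small sketch (the sample string is made up) comparing it with a case-insensitive `re.sub`:

```python
import re

txt = "an unfortunate <BR> in the middle"

# str.replace is an exact, case-sensitive match, so "<br>" does not hit "<BR>"
unchanged = txt.replace("<br>", "")

# re.sub with re.IGNORECASE also catches "<BR>", "<Br>", ...
removed = re.sub(r"<br>", "", txt, flags=re.IGNORECASE)

print(unchanged)  # an unfortunate <BR> in the middle
print(removed)    # an unfortunate  in the middle (two spaces remain)
```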


    • localpricemaps@gmail.com

      #3
      Re: how to remove <BR> using replace function?

      tried that, didn't work for me


      • localpricemaps@gmail.com

        #4
        Re: how to remove <BR> using replace function?

        nope didn't work


        • Dylan Moreland

          #5
          Re: how to remove <BR> using replace function?


          localpricemaps@gmail.com wrote:
          > nope didn't work

          Could you be more specific about the error? Both my example and yours
          work perfectly on my box.


          • Rinzwind

            #6
            Re: how to remove <BR> using replace function?

            Works for me.
            >>> txt = "an unfortunate <br> in the middle"
            >>> print txt.replace("<br>", "")
            an unfortunate  in the middle
            >>>


            Though I don't like the 2 spaces it gives ;)
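If the double space is the only complaint, one plain-string option (an idiom swapped in here, not something from the thread) is to split on whitespace and re-join. A sketch:

```python
txt = "an unfortunate <br> in the middle"
no_br = txt.replace("<br>", "")

# str.split() with no argument splits on runs of whitespace and drops
# empty strings, so joining with one space collapses the double space.
# Caveat: it also trims leading/trailing whitespace and turns newlines into spaces.
tidy = " ".join(no_br.split())

print(tidy)  # an unfortunate in the middle
```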


            • Duncan Booth

              #7
              Re: how to remove <BR> using replace function?

              Rinzwind wrote:
              > Works for me.
              >
              >>>> txt = "an unfortunate <br> in the middle"
              >>>> print txt.replace("<br>", "")
              > an unfortunate  in the middle
              >>>>
              >
              > Though I don't like the 2 spaces it gives ;)
              >
              Although I generally advise against overuse of regular expressions, this is
              one situation where regular expressions might be useful: the situation is
              simple enough not to warrant a parser, but apart from the whitespace a <br>
              tag could have attributes or be written in xhtml style <br />. Also judging
              by the inconsistency between the OP's subject line and his original
              question he doesn't seem sure whether the tag is <br> or <BR> or even <Br>.
              >>> import re
              >>> nobr = re.compile('\W*<br.*?>\W*', re.I)
              >>> nobr.sub(' ', "an unfortunate <br /> in the middle")
              'an unfortunate in the middle'
              >>> nobr.sub(' ', "an unfortunate <BR> in the middle")
              'an unfortunate in the middle'
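The same pattern in Python 3, as a sketch: the main change is a raw string for the pattern. The third sample, with a `class` attribute, is made up to exercise the attribute case mentioned above.

```python
import re

# Raw string so the backslash in \W reaches the regex engine untouched;
# recent Python 3 releases warn about "\W" in an ordinary string literal.
nobr = re.compile(r"\W*<br.*?>\W*", re.IGNORECASE)

samples = [
    "an unfortunate <br /> in the middle",          # XHTML style
    "an unfortunate <BR> in the middle",            # upper case
    'an unfortunate <br class="x"> in the middle',  # made-up attribute
]
results = [nobr.sub(" ", s) for s in samples]
for r in results:
    print(r)  # an unfortunate in the middle
```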


              • Albert Leibbrandt

                #8
                Re: how to remove <BR> using replace function?



                Rinzwind wrote:
                >Works for me.
                >
                >>>>txt = "an unfortunate <br> in the middle"
                >>>>print txt.replace("<br>", "")
                >an unfortunate  in the middle
                >>>>
                >
                >Though I don't like the 2 spaces it gives ;)
                >
                so use regex and replace both the double spaces and the <br>
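A sketch of that two-step idea, assuming any run of two or more spaces should become one:

```python
import re

txt = "an unfortunate <br> in the middle"

# Step 1: drop the tag, ignoring case and any attributes or " /".
no_br = re.sub(r"<br.*?>", "", txt, flags=re.IGNORECASE)
# Step 2: squeeze each run of two or more spaces down to a single space.
tidy = re.sub(r" {2,}", " ", no_br)

print(tidy)  # an unfortunate in the middle
```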

                cheers
                albert


                • bruno at modulix

                  #9
                  Re: how to remove <BR> using replace function?

                  localpricemaps@gmail.com wrote:
                  > i have some html that looks like this
                  >
                  >
                  > <address style="color:#" >34 main,<br> Boston, MA</address>
                  >
                  > and i am trying to use the replace function to get rid of the <Br> that
                  > i scrape out using this code:
                  >
                  > for oText in incident.fetchText(oRE):
                  >     strTitle += oText.strip()

                  Why concatenate?
                  > strTitle = string.replace(strTitle, '<br>', '')

                  Use strTitle.replace('<br>', '') instead. And BTW, Hungarian notation is
                  evil, so:
                  for text in incident.fetchText(...):
                      title = text.strip().replace('<br>', '')
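As written, that loop reassigns `title` on every pass, so only the last fragment survives. If the whole title is wanted, one option is to clean each fragment and join once at the end. A sketch, with a hypothetical `clean_fragments` helper and a made-up list standing in for `incident.fetchText(...)`:

```python
def clean_fragments(fragments):
    """Strip each fragment and remove literal '<br>' tags.

    `fragments` is a hypothetical list standing in for whatever
    incident.fetchText(...) yields in the original post.
    """
    return [text.strip().replace("<br>", "") for text in fragments]


parts = clean_fragments(["34 main,<br>", " Boston, MA"])
# Join once at the end instead of growing a string with += in the loop;
# the separator also keeps the words from running together.
title = " ".join(parts)
print(title)  # 34 main, Boston, MA
```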


                  > but it doesn't seem to remove the <br>

                  It does:

                  Python 2.4.2 (#1, Feb 9 2006, 02:40:32)
                  [GCC 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
                  Type "help", "copyright", "credits" or "license" for more information.
                  >>> s = '<address style="color:#" >34 main,<br> Boston, MA</address>'
                  >>> s.replace('<br>', '')
                  '<address style="color:#" >34 main, Boston, MA</address>'
                  >>>

                  The problem is obviously not with str.replace(), as you could have
                  figured out by yourself very easily.
                  > any ideas?

                  Yes: post the minimal *running* code that exhibits the problem.

                  Your problem is probably elsewhere, and given some of your previous posts
                  here ('problems writing tuple to log file' and 'indentation messing up
                  my tuple?'), I'd say that a programming 101 course should be your first
                  move.


                  --
                  bruno desthuilliers
                  python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
                  p in 'onurb@xiludom.gro'.split('@')])"


                  • Sion Arrowsmith

                    #10
                    Re: how to remove <BR> using replace function?

                    Duncan Booth <duncan.booth@suttoncourtenay.org.uk> wrote:
                    >Although I generally advise against overuse of regular expressions, this is
                    >one situation where regular expressions might be useful: [ ... ]
                    >>>> nobr = re.compile('\W*<br.*?>\W*', re.I)

                    Agreed (on both counts), but r'\s*<br.*?>\s*' might be better
                    (consider what happens with "an unfortunate... <br> in the middle"
                    if you use \W rather than \s).
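A quick sketch of the difference, using the example sentence above:

```python
import re

text = "an unfortunate... <br> in the middle"

# \W matches any non-word character, so it also swallows the "..."
with_w = re.sub(r"\W*<br.*?>\W*", " ", text, flags=re.IGNORECASE)
# \s matches only whitespace, so the punctuation survives
with_s = re.sub(r"\s*<br.*?>\s*", " ", text, flags=re.IGNORECASE)

print(with_w)  # an unfortunate in the middle
print(with_s)  # an unfortunate... in the middle
```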

                    --
                    \S -- siona@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
                    ___ | "Frankly I have no feelings towards penguins one way or the other"
                    \X/ | -- Arthur C. Clarke
                    her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump


                    • Duncan Booth

                      #11
                      Re: how to remove <BR> using replace function?

                      Sion Arrowsmith wrote:
                      > Duncan Booth <duncan.booth@suttoncourtenay.org.uk> wrote:
                      >>Although I generally advise against overuse of regular expressions,
                      >>this is one situation where regular expressions might be useful: [ ... ]
                      >>>>> nobr = re.compile('\W*<br.*?>\W*', re.I)
                      >
                      > Agreed (on both counts), but r'\s*<br.*?>\s*' might be better
                      > (consider what happens with "an unfortunate... <br> in the middle"
                      > if you use \W rather than \s).
                      >

                      Yes, I don't really know why I wrote \W when I obviously meant \s. Thanks
                      for correcting that.

                      Even better might be r'(\s*<br.*?>)+\s*' to get multiple runs of <br> tags.
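A sketch of that pattern on a made-up string containing several consecutive tags in mixed styles; each run of tags, plus the whitespace around it, becomes a single space:

```python
import re

# One or more <br> tags, each with optional leading whitespace,
# followed by any whitespace after the last one.
nobr = re.compile(r"(\s*<br.*?>)+\s*", re.IGNORECASE)

html_text = "34 main,<br><br />\n<BR> Boston, MA"
result = nobr.sub(" ", html_text)
print(result)  # 34 main, Boston, MA
```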

