How to remove empty lines with re?

  • ted

    How to remove empty lines with re?

    I'm having trouble using the re module to remove empty lines in a file.

    Here's what I thought would work, but it doesn't:

    import re
    f = open("old_site/index.html")
    for line in f:
        line = re.sub(r'^\s+$|\n', '', line)
        print line

    Also, when I try to remove some HTML tags, I get even more empty lines:

    import re
    f = open("old_site/index.html")
    for line in f:
        line = re.sub('<.*?>', '', line)
        line = re.sub(r'^\s+$|\n', '', line)
        print line

    I don't know what I'm doing. Any help appreciated.

    TIA,
    Ted








  • Tim Haynes

    #2
    Re: How to remove empty lines with re?

    "ted" <tedNOSPAM94107@yahoo.com> writes:

    > f = open("old_site/index.html")
    > for line in f:
    >     line = re.sub(r'^\s+$|\n', '', line) # }
    >     print line # }


    If you set a variable to an empty string and then print it, you still
    get an empty line printed ;)

    ~Tim
    --
    Product Development Consultant
    OpenLink Software
    Tel: +44 (0) 20 8681 7701
    Web: <http://www.openlinksw.com>
    Universal Data Access & Data Integration Technology Providers
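
Editor's note: the point above can be checked with a small sketch. It is written for Python 3 (the thread uses Python 2), and the sample lines and the helper name `substituted` are made up for illustration. It applies the pattern from the question to a few lines and shows that blank lines turn into empty strings rather than disappearing, so printing each result would still emit a newline.

```python
import re

# The pattern from the original post.
PATTERN = r'^\s+$|\n'

def substituted(lines):
    """Apply the substitution to every line, as the loop in the question does."""
    return [re.sub(PATTERN, '', line) for line in lines]

# Hypothetical sample input: two real lines and two blank ones.
sample = ["<p>hello</p>\n", "   \n", "\n", "world\n"]
result = substituted(sample)

# Blank lines are now '' but they are still in the list, so a print()
# per element would still produce an empty output line for each of them.
print(result)
```

The substitution changes what each line contains, but nothing in the loop decides whether a line should be printed at all.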


    • Peter Otten

      #3
      Re: How to remove empty lines with re?

      ted wrote:

      > I'm having trouble using the re module to remove empty lines in a file.
      >
      > Here's what I thought would work, but it doesn't:
      >
      > import re
      > f = open("old_site/index.html")
      > for line in f:
      >     line = re.sub(r'^\s+$|\n', '', line)
      >     print line

      Try:

      import sys
      for line in f:
          if line.strip():
              sys.stdout.write(line)

      Background: lines read from the file keep their trailing "\n", and the
      print statement adds a second newline.
      The strip() method returns a copy of the string with all leading and
      trailing whitespace removed. Every string except the empty one evaluates
      to True in the if statement.

      Peter
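
Editor's note: a self-contained version of the approach above, as a minimal sketch for Python 3. The function name `write_nonblank` and the in-memory sample text are assumptions for illustration; a real run would pass an open file instead of the `io.StringIO` object.

```python
import io
import sys

def write_nonblank(lines, out=sys.stdout):
    """Write every line that contains something other than whitespace."""
    for line in lines:
        # strip() returns '' for blank or whitespace-only lines, and '' is falsy.
        if line.strip():
            # write() adds nothing, so the line's own '\n' is the only newline.
            out.write(line)

# Hypothetical input standing in for the file from the question.
source = io.StringIO("first\n\n   \t\nsecond\n")
buffer = io.StringIO()
write_nonblank(source, buffer)
sys.stdout.write(buffer.getvalue())
```

Because the test uses strip(), lines holding only spaces or tabs are dropped as well as truly empty ones.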


      • Bror Johansson

        #4
        Re: How to remove empty lines with re?


        "ted" <tedNOSPAM94107@yahoo.com> wrote in message
        news:vocoudjtp6vv25@corp.supernews.com...
        > I'm having trouble using the re module to remove empty lines in a file.
        >
        > Here's what I thought would work, but it doesn't:
        >
        > import re
        > f = open("old_site/index.html")
        > for line in f:
        >     line = re.sub(r'^\s+$|\n', '', line)
        >     print line
        >

        nonempty = [x for x in f if x.strip()]

        /BJ



        • Anand Pillai

          #5
          Re: How to remove empty lines with re?

          To do this, you need to modify your re to just
          this

          empty=re.compile('^$')

          This of course looks for a pattern where there is beginning just
          after end, ie the line is empty :-)

          Here is the complete code.

          import re

          empty=re.compile('^$')
          for line in open('test.txt').readlines():
              if empty.match(line):
                  continue
              else:
                  print line,

          The comma at the end of the print is to avoid printing another newline,
          since the 'readlines()' method gives you the line with a '\n' at the end.

          Also, don't forget to compile your regexps for efficiency's sake.

          HTH

          -Anand Pillai
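
Editor's note: a Python 3 sketch of the regex approach above, where `print(line, end='')` plays the role of the Python 2 `print line,`. The `blank` pattern, the helper `keep`, and the sample lines are additions for illustration and are not from the thread. The sketch also shows one limit of `'^$'`: it matches a line that is only "\n", but not a line that contains spaces or tabs before the newline.

```python
import re

empty = re.compile(r'^$')      # pattern from the post: nothing before the newline
blank = re.compile(r'^\s*$')   # assumed variant: also treats whitespace-only lines as blank

def keep(lines, pattern):
    """Return the lines that the given pattern does not match."""
    return [line for line in lines if not pattern.match(line)]

# Hypothetical sample: one truly empty line and one whitespace-only line.
lines = ["a\n", "\n", "  \n", "b\n"]

kept_by_empty = keep(lines, empty)
kept_by_blank = keep(lines, blank)

for line in kept_by_blank:
    # end='' stops print() from adding a second newline after the line's own.
    print(line, end='')
```

If whitespace-only lines should survive, `'^$'` is the pattern to use; if they should be removed too, something like the `blank` variant or the strip() test is needed.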




          • Anand Pillai

            #6
            Re: How to remove empty lines with re?

            Errata:

            I meant "there is end just after the beginning" of course.

            -Anand



            • Klaus Alexander Seistrup

              #7
              Re: How to remove empty lines with re?

              Anand Pillai wrote:

              > Here is the complete code.
              >
              > import re
              >
              > empty=re.compile('^$')
              > for line in open('test.txt').readlines():
              >     if empty.match(line):
              >         continue
              >     else:
              >         print line,

              The .readlines() method retains any line terminators, and using the
              builtin print will suffix an extra line terminator to every line,
              thus effectively producing an empty line for every non-empty line.
              You'd want to use e.g. sys.stdout.write() instead of print.


              // Klaus

              --
              ><> unselfish actions pay back better


              • ted

                #8
                Re: How to remove empty lines with re?

                Thanks Anand, works great.





                • Anand Pillai

                  #9
                  Re: How to remove empty lines with re?

                  You probably did not read my posting completely.

                  I have added a comma after the print statement and mentioned
                  a comment specifically on this.

                  The 'print line,' statement with a comma after it does not print
                  a newline (what you call a line terminator), whereas
                  'print' without a trailing comma does.

                  No wonder Python sometimes feels like high-level pseudocode ;-)
                  It has that ultra-intuitive feel for most of its tricks.

                  In this case, the comma is normally used when you have more than
                  one item to print, and Python puts a newline after all the items.
                  So it follows quite intuitively that a trailing comma alone will not
                  print a newline! That is better than telling the programmer to use
                  another print function to avoid newlines, as many other
                  'un-pythonic' languages do.

                  -Anand

                  Klaus Alexander Seistrup <spam@magnetic-ink.dk> wrote in message news:<3f86e96c-da7dc89b-addc-47d2-82cf-a60a482e6e07@news.szn.dk>...
                  > Anand Pillai wrote:
                  >
                  > > Here is the complete code.
                  > >
                  > > import re
                  > >
                  > > empty=re.compile('^$')
                  > > for line in open('test.txt').readlines():
                  > >     if empty.match(line):
                  > >         continue
                  > >     else:
                  > >         print line,
                  >
                  > The .readlines() method retains any line terminators, and using the
                  > builtin print will suffix an extra line terminator to every line,
                  > thus effectively producing an empty line for every non-empty line.
                  > You'd want to use e.g. sys.stdout.write() instead of print.
                  >
                  >
                  > // Klaus


                  • Klaus Alexander Seistrup

                    #10
                    Re: How to remove empty lines with re?

                    Anand Pillai wrote:

                    > You probably did not read my posting completely.
                    >
                    > I have added a comma after the print statement and mentioned
                    > a comment specifically on this.

                    You are completely right, I missed an important part of your posting.
                    I didn't know about the comma feature, so thanks for teaching me!

                    Cheers,

                    // Klaus

                    --
                    ><> unselfish actions pay back better
