find and replace with regular expressions

  • chrispoliquin@gmail.com

    find and replace with regular expressions

    I am using regular expressions to search a string (always full
    sentences, maybe more than one sentence) for common abbreviations and
    remove the periods. I need to break the string into different
    sentences but split('.') doesn't solve the whole problem because of
    possible periods in the middle of a sentence.
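For illustration, here is a minimal Python 3 sketch of what a plain split('.') does when an abbreviation sits mid-sentence. The sample sentence is made up and is not from the original post:

```python
text = "This is a test, i.e. a small one. Here is a second sentence."

# str.split('.') cuts at every period, including the two inside "i.e."
pieces = text.split('.')
print(pieces)
# ['This is a test, i', 'e', ' a small one', ' Here is a second sentence', '']
```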

    So I have...

    ----------------

    import re

    middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')

    # this will find abbreviations like e.g. or i.e. in the middle of a
    # sentence.
    # then I want to remove the periods.

    ----------------

    I want to keep the ie or eg but just take out the periods. Any
    ideas? Of course, newString = middle_abbr.sub('', txt), where txt is
    the string, takes out the entire abbreviation, alphanumeric
    characters included.
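A small sketch of the problem described above, using the same pattern (written here as a raw string) and a made-up sample sentence: replacing the whole match with '' throws away the letters together with the periods.

```python
import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
txt = 'A test, i.e., an example.'

# The entire match 'i.e.' is replaced, so the letters disappear too
newString = middle_abbr.sub('', txt)
print(newString)   # A test, , an example.
```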
  • Mensanator

    #2
    Re: find and replace with regular expressions

    On Jul 31, 3:07 pm, chrispoliq...@gmail.com wrote:
    >>> import re
    >>> middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
    >>> s = 'A test, i.e., an example.'
    >>> a = middle_abbr.search(s)       # find the abbreviation
    >>> b = re.compile(r'\.')           # period pattern
    >>> c = b.sub('', a.group(0))       # remove periods from abbreviation
    >>> d = middle_abbr.sub(c, s)       # substitute new abbr for old
    >>> d
    'A test, ie, an example.'
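One caveat with the session above: middle_abbr.sub(c, s) replaces every match with the text of the first abbreviation found. A quick sketch with a made-up sentence containing both i.e. and e.g. shows the effect, which is the case the loop in the following reply works around:

```python
import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
s = 'A test, i.e., an example, e.g., this one.'

a = middle_abbr.search(s)            # first abbreviation found: 'i.e.'
c = re.sub(r'\.', '', a.group(0))    # 'ie'
d = middle_abbr.sub(c, s)            # every match, including 'e.g.', becomes 'ie'
print(d)   # A test, ie, an example, ie, this one.
```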


    • Mensanator

      #3
      Re: find and replace with regular expressions

      On Jul 31, 3:56 pm, Mensanator <mensana...@aol.com> wrote:

      A more versatile version:

      import re

      middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
      s = 'A test, i.e., an example.'
      a = middle_abbr.search(s)      # find the abbreviation
      b = re.compile(r'\.')          # period pattern
      c = b.sub('', a.group(0))      # remove periods from abbreviation
      d = middle_abbr.sub(c, s)      # substitute new abbr for old

      print(d)
      print()
      print()

      s = """A test, i.e., an example.
      Yet another test, i.e., example with 2 abbr."""

      a = middle_abbr.search(s)      # find the abbreviation
      c = b.sub('', a.group(0))      # remove periods from abbreviation
      d = middle_abbr.sub(c, s)      # substitute new abbr for old

      print(d)
      print()
      print()

      s = """A test, i.e., an example.
      Yet another test, i.e., example with 2 abbr.
      A multi-test, e.g., one with different abbr."""

      done = False

      while not done:
          a = middle_abbr.search(s)               # find the next abbreviation
          if a:
              c = b.sub('', a.group(0))           # remove periods from abbreviation
              s = middle_abbr.sub(c, s, count=1)  # substitute new abbr for old ONCE
          else:                                   # repeat until all removed
              done = True

      print(s)

      ## A test, ie, an example.
      ##
      ##
      ## A test, ie, an example.
      ## Yet another test, ie, example with 2 abbr.
      ##
      ##
      ## A test, ie, an example.
      ## Yet another test, ie, example with 2 abbr.
      ## A multi-test, eg, one with different abbr.
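The loop above re-scans the string once per abbreviation. As an alternative sketch (not what was posted here), Pattern.sub also accepts a function as the replacement; it is called with each match object, so every abbreviation gets its own replacement in a single pass. The helper name strip_periods is hypothetical:

```python
import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

def strip_periods(match):
    # Hypothetical helper: called once per match, returns the
    # matched text with its periods removed
    return match.group(0).replace('.', '')

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""

result = middle_abbr.sub(strip_periods, s)
print(result)
```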


      • Paul McGuire

        #4
        Re: find and replace with regular expressions

        On Jul 31, 3:07 pm, chrispoliq...@gmail.com wrote:
        > middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')
        When defining re's with string literals, it is good practice to use
        the raw string literal format (precede with an 'r'):
        middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
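A quick Python 3 check of why the r prefix matters. For \. the raw and escaped forms give the same two characters, but an escape such as '\1' is turned into a control character by Python itself, which matters for a back-reference replacement like r'\1\2':

```python
import re

# r'\.' and '\\.' are the same two characters: a backslash and a period
print(r'\.' == '\\.')          # True

# '\1' is an octal string escape: Python turns it into chr(1)
# before the regex engine ever sees it
print('\1' == chr(1))          # True

# So in a replacement, only the raw form is a back-reference to group 1
with_raw = re.sub(r'(a)', r'\1', 'a')
without_raw = re.sub(r'(a)', '\1', 'a')
print(repr(with_raw))          # 'a'
print(repr(without_raw))       # '\x01'
```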

        What abbreviations have numeric digits in them?

        I hope your input string doesn't include something like this:
        For a good approximation of pi, use 3.1.
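The pi example is easy to reproduce. Below is a sketch of one possible tightening, under the assumption (not stated in the thread) that an abbreviation here means two single letters, each followed by a period, starting at a word boundary. The name letters_only is hypothetical:

```python
import re

original = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
# Hypothetical stricter pattern: letters only, anchored at a word boundary
letters_only = re.compile(r'\b[A-Za-z]\.[A-Za-z]\.')

pi_text = 'For a good approximation of pi, use 3.1.'
abbr_text = 'A test, i.e., an example.'

print(original.search(pi_text).group(0))       # 3.1.
print(letters_only.search(pi_text))            # None
print(letters_only.search(abbr_text).group(0)) # i.e.
```

Note that \b only works as a word boundary inside a raw string; in a plain string literal it is a backspace character.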

        -- Paul


        • MRAB

          #5
          Re: find and replace with regular expressions

          On Jul 31, 9:07 pm, chrispoliq...@gmail.com wrote:
          It's recommended that you use raw strings for regular
          expressions.

          Capture the letters using parentheses:

          middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')

          and replace what was found with what was captured:

          newString = middle_abbr.sub(r'\1\2', txt)
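The two lines above can be combined into a small runnable sketch (the sample sentence is made up). Because the replacement uses back-references, each match keeps its own letters, so mixed abbreviations come out right in a single call:

```python
import re

middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')
txt = 'A test, i.e., an example, e.g., this one.'

# \1 and \2 are the two captured characters; the periods are dropped
newString = middle_abbr.sub(r'\1\2', txt)
print(newString)   # A test, ie, an example, eg, this one.
```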

          HTH


          • dusans

            #6
            Re: find and replace with regular expressions

            On Jul 31, 10:07 pm, chrispoliq...@gmail.com wrote:
            It's impossible with regex. You could try it with statistical analysis,
            and even this would give you a good split.
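One case that illustrates the difficulty (the sentence is made up): an abbreviation at the end of a sentence also carries that sentence's full stop, so collapsing it erases the sentence boundary too. A sketch using the capture-group substitution from the previous reply plus a simple, assumed splitting rule (break on whitespace after ., ! or ?):

```python
import re

middle_abbr = re.compile(r'([A-Za-z0-9])\.([A-Za-z0-9])\.')
text = 'He works in the U.S. He likes it.'

collapsed = middle_abbr.sub(r'\1\2', text)
print(collapsed)     # He works in the US He likes it.

# Naive splitter (an assumption, not from the thread):
# break after ., ! or ? when followed by whitespace
sentences = re.split(r'(?<=[.!?])\s+', collapsed)
print(sentences)     # ['He works in the US He likes it.']
```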


            • dusans

              #7
              Re: find and replace with regular expressions

              On Aug 1, 12:53 pm, dusans <dusan.smit...@gmail.com> wrote:
              > It's impossible with regex. You could try it with statistical analysis,
              > and even this would give you a good split.
              "and even this won't* give you a good split." :P

