String Replacement

  • moconno5
    New Member
    • Jul 2007
    • 19

    Hello everyone, I've got a simple one today. I have a string, and I want to remove every newline ('\n') that sits between two characters from the set [ACGU] while keeping all the other newlines. For example:

    'Musmusculuslet-7gstem-loop\nCCAGGCUGAGGUAGUAGUUUGUACAGUUUGAGGGUCUAUGAUACCACCCGGUACAGGAGA\nUAACUGUACAGGCCACUGCCUUGCCAGG\n
    Musmusculuslet-7istem-loop\nCUGGCUGAGGUAGUAGUUUGUGCUGUUGGUCGGGUUGUGACAUUGCCCGCUGUGGAGAUA\nACUGCGCAAGCUACUGCCUUGCUAG\n'

    Edited string:

    'Musmusculuslet-7gstem-loop\nCCAGGCUGAGGUAGUAGUUUGUACAGUUUGAGGGUCUAUGAUACCACCCGGUACAGGAGAUAACUGUACAGGCCACUGCCUUGCCAGG\n
    Musmusculuslet-7istem-loop\nCUGGCUGAGGUAGUAGUUUGUGCUGUUGGUCGGGUUGUGACAUUGCCCGCUGUGGAGAUAACUGCGCAAGCUACUGCCUUGCUAG\n'

    I tried the following code but I got a syntax error:

    Code:
    for a in chunk3:
        if ([ACGU]'\n'[ACGU]):
            chunk3 = chunk3.replace('\n','')
    Thanks,

    Mark
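Python rejects `if ([ACGU]'\n'[ACGU]):` because `[ACGU]` is regular-expression character-class syntax, which only has that meaning inside a pattern string passed to the `re` module; as plain Python it is a list followed directly by a string literal, hence the syntax error. What the loop seems to be aiming for can also be done without regexes by checking the neighbours of each '\n'. A minimal sketch, assuming the input is one plain string; `join_base_newlines` is a hypothetical helper name and `record` is a made-up, shortened record in the same layout as the post:

```python
def join_base_newlines(text):
    """Return text with every newline removed that has an RNA base
    (A, C, G or U) immediately before it and immediately after it."""
    bases = set("ACGU")
    kept = []
    for i, ch in enumerate(text):
        # Only a newline with a base on both sides is part of a split sequence.
        if (ch == "\n" and 0 < i < len(text) - 1
                and text[i - 1] in bases and text[i + 1] in bases):
            continue
        kept.append(ch)
    return "".join(kept)


record = "Musmusculuslet-7gstem-loop\nCCAGGCUGA\nUAACUGU\n"  # made-up, shortened
print(repr(join_base_newlines(record)))
# 'Musmusculuslet-7gstem-loop\nCCAGGCUGAUAACUGU\n'
```

The neighbours are always read from the original string, so a run such as "A\nC\nG" loses both newlines.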
  • ilikepython
    Recognized Expert Contributor
    • Feb 2007
    • 844

    #2
    Try this:
    [code=python]
    import re

    s = "Musmusculuslet-7gstem-loop\nCCAGGCUGAGGUAGUAGUUUGUACAGUUUGAGGGUCUAUGAUACCACCCGGUACAGGAGA\nUAACUGUACAGGCCACUGCCUUGCCAGG\n"

    # an RNA base (A, C, G or U), a newline, then another base
    patt = re.compile(r"[ACGU]\n[ACGU]")
    matches = patt.findall(s)

    # put each match back without its middle character (the newline)
    for m in matches:
        s = s.replace(m, m[0] + m[2])
    [/code]
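The same edit can also be written as a single `re.sub` call. This sketch swaps in zero-width lookbehind `(?<=...)` and lookahead `(?=...)` assertions, which check for a base on each side of the newline without consuming it, so only the '\n' itself is replaced and no loop is needed. `BASE_NEWLINE` and `joined` are illustrative names; the data is the two-record string from the question:

```python
import re

# Zero-width lookbehind/lookahead: require a base (A, C, G or U) on each
# side of the newline without consuming it, so only the '\n' is replaced.
BASE_NEWLINE = re.compile(r"(?<=[ACGU])\n(?=[ACGU])")

s = ("Musmusculuslet-7gstem-loop\n"
     "CCAGGCUGAGGUAGUAGUUUGUACAGUUUGAGGGUCUAUGAUACCACCCGGUACAGGAGA\n"
     "UAACUGUACAGGCCACUGCCUUGCCAGG\n"
     "Musmusculuslet-7istem-loop\n"
     "CUGGCUGAGGUAGUAGUUUGUGCUGUUGGUCGGGUUGUGACAUUGCCCGCUGUGGAGAUA\n"
     "ACUGCGCAAGCUACUGCCUUGCUAG\n")

joined = BASE_NEWLINE.sub("", s)
print(joined)
```

Like the loops in this thread, this treats any newline between two uppercase A/C/G/U characters as a break inside a sequence, so it assumes the header lines never start or end with one of those letters (the headers here start with 'M' and end with 'p').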

    • bvdet
      Recognized Expert Specialist
      • Oct 2006
      • 2851

      #3
      It looks like you are trying to implement a regex solution without understanding how it works. This seems to do what you want:
      [code=Python]
      import re
      patt = re.compile(r'[ACGU]\n[ACGU]')
      s = 'Musmusculuslet-7gstem-loop\nCCAGGCUGAGGUAGUAGUUUGUACAGUUUGAGGGUCUAUGAUACCACCCGGUACAGGAGA\nUAACUGUACAGGCCACUGCCUUGCCAGG\nMusmusculuslet-7istem-loop\nCUGGCUGAGGUAGUAGUUUGUGCUGUUGGUCGGGUUGUGACAUUGCCCGCUGUGGAGAUA\nACUGCGCAAGCUACUGCCUUGCUAG\n'
      s1 = s
      # for each base-newline-base match, swap in the match without its newline
      for item in patt.findall(s):
          s1 = s1.replace(item, item.replace('\n', ''))
      print(s1)
      [/code]
      Output:
      Code:
      >>> Musmusculuslet-7gstem-loop
      CCAGGCUGAGGUAGUAGUUUGUACAGUUUGAGGGUCUAUGAUACCACCCGGUACAGGAGAUAACUGUACAGGCCACUGCCUUGCCAGG
      Musmusculuslet-7istem-loop
      CUGGCUGAGGUAGUAGUUUGUGCUGUUGGUCGGGUUGUGACAUUGCCCGCUGUGGAGAUAACUGCGCAAGCUACUGCCUUGCUAG
      
      >>>
      Step by step:
      1. Import the 're' module.
      2. Define and compile the pattern to match substrings in your string: a character in the set 'ACGU' followed by '\n' followed by a character in the set 'ACGU'.
      3. Use the compiled pattern object to find all non-overlapping matches in s:
      >>> patt.findall(s)
      ['A\nU', 'A\nA']
      >>>
      4. For each match, use the string method replace() to drop the '\n' from the match, then replace the match in s1 with that joined text.
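One caveat with steps 3 and 4: findall() returns non-overlapping matches, so when a single base sits between two newlines, that base is consumed by the first match and the second newline is never found. A minimal sketch on a made-up string, with one simple workaround (repeat until the pattern stops matching); `one_pass` and `repeated` are illustrative names:

```python
import re

patt = re.compile(r"[ACGU]\n[ACGU]")
s = "A\nC\nG"  # made-up input: a single base between two newlines

matches = patt.findall(s)
print(matches)  # ['A\nC'] : the 'C' is consumed, so 'C\nG' is never tried

one_pass = s
for item in matches:
    one_pass = one_pass.replace(item, item.replace("\n", ""))
print(repr(one_pass))  # 'AC\nG' : one newline is left behind

# Workaround: repeat the find-and-replace until the pattern no longer matches.
# Each pass removes at least one newline, so the loop always terminates.
repeated = s
while patt.search(repeated):
    for item in patt.findall(repeated):
        repeated = repeated.replace(item, item.replace("\n", ""))
print(repr(repeated))  # 'ACG'
```

The question's data has no single-base lines, so this does not change the output shown above.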

      HTH

      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        You beat me to it, ilikepython! :(

        • moconno5
          New Member
          • Jul 2007
          • 19

          #5
          Thanks for the guidance. My question is: why are you creating a second string (s1) and doing the replacements in that string instead of in s? I'm still working my way toward understanding regexes.

          Mark

          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #6
            No reason other than to keep the original string s around for further manipulation at the interactive prompt.
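Because Python strings are immutable, each replace() call returns a new string rather than changing the old one, so doing the work in s1 leaves the original s untouched for comparison afterwards. A minimal illustration using a shortened piece of the thread's data:

```python
s = "GAGAUA\nACUGCGC\n"  # shortened sample from the thread's data
s1 = s                   # s1 and s now name the same string object

# str.replace() never modifies a string in place; it returns a new string,
# and this assignment rebinds only the name s1 to that new string.
s1 = s1.replace("A\nA", "AA")

print(repr(s))   # 'GAGAUA\nACUGCGC\n'  (unchanged)
print(repr(s1))  # 'GAGAUAACUGCGC\n'
```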
