Complicated string substitution

This topic is closed.
  • Horacius ReX

    Complicated string substitution

    Hi,

    I have a file with a lot of the following occurrences:

    denmark.handa.1-10
    denmark.handa.1-12344
    denmark.handa.1-4
    denmark.handa.1-56

    ....

    distributed randomly in a file. I need to convert each of these
    occurrences to:

    denmark.handa.1-10_1
    denmark.handa.1-12344_1
    denmark.handa.1-4_1
    denmark.handa.1-56_1

    so basically I add "_1" at the end of each occurrence.

    I thought about using sed, but as each "root" is different I have no
    clue how to go about this.

    Any suggestions?

    Thanks in advance.
  • Tim Chase

    #2
    Re: Complicated string substitution

    > I have a file with a lot of the following occurrences:
    >
    > denmark.handa.1-10
    > denmark.handa.1-12344
    > denmark.handa.1-4
    > denmark.handa.1-56

    Each on its own line? Scattered throughout the text? With other
    content that needs to remain unchanged? With other stuff on the
    same line?

    > denmark.handa.1-10_1
    > denmark.handa.1-12344_1
    > denmark.handa.1-4_1
    > denmark.handa.1-56_1
    >
    > so basically I add "_1" at the end of each occurrence.
    >
    > I thought about using sed, but as each "root" is different I have no
    > clue how to go through this.

    How are the roots different? Do they all begin with
    "denmark.handa."? Or can they be found by a pattern of "stuff
    period stuff period number dash number"?

    A couple sed solutions, since you considered them first:

    sed '/denmark\.handa/s/$/_1/'
    sed -E 's/denmark\.handa\.[0-9]+-[0-9]+/&_1/g'
    sed -E 's/[a-z]+\.[a-z]+\.[0-9]+-[0-9]+/&_1/g'

    Or are you just looking for "number dash number" and want to
    suffix the "_1"?

    sed -E 's/[0-9]+-[0-9]+/&_1/g'

    Most of the sed versions translate pretty readily into Python
    regexps in the .sub() call.

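    import re

    # Match "word.word.number-number", e.g. denmark.handa.1-10
    r = re.compile(r'[a-z]+\.[a-z]+\.\d+-\d+')
    with open('in.txt') as src, open('out.txt', 'w') as out:
        for line in src:
            # \g<0> is the whole match; append "_1" to it
            out.write(r.sub(r'\g<0>_1', line))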

    Tweak the regexps accordingly.

    -tkc
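
    One possible tweak, sketched here under assumptions rather than taken
    from the thread: if the script might be run more than once over the
    same file, a negative lookahead can keep it from producing "_1_1".
    The names PATTERN and add_suffix are hypothetical, and the pattern
    assumes the "word.word.number-number" shape of the sample data.

```python
import re

# Assumed shape: "word.word.number-number", as in the sample data.
# The (?![\d_]) lookahead rejects a match that is followed by a digit
# or an underscore, so tokens that already end in "_1" are left alone.
PATTERN = re.compile(r'[a-z]+\.[a-z]+\.\d+-\d+(?![\d_])')

def add_suffix(text, suffix='_1'):
    """Append suffix to every occurrence of PATTERN in text."""
    # A function replacement avoids any escape handling in suffix.
    return PATTERN.sub(lambda m: m.group(0) + suffix, text)

sample = "see denmark.handa.1-10 and denmark.handa.1-12344 here"
print(add_suffix(sample))
```

    Running add_suffix on its own output should leave the text unchanged,
    which makes it safer to re-run over a partially converted file.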


