do a sed / awk filter with python tools (at least as fast)

  • Mathieu Prevot

    do a sed / awk filter with python tools (at least as fast)

    Hi,

    I use the following filter in a Bourne shell script:

    sed '/watch?v=/! d;s/.*v=//;s/\(.\{11\}\).*/\1/' \
    | sort | uniq | awk 'ORS=" "{print $1}'

    which gives me every 11-character string that follows the "watch?v="
    pattern. I would like to do this in Python on the stdout of a
    subprocess.Popen instance, using Python tools rather than sed, awk, etc.
    How can I do this? Can I expect it to be as fast?

    Thanks,
    Mathieu
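For readers comparing the two approaches, here is a rough sketch of what the sed / sort / uniq / awk pipeline above computes, written stage by stage in Python. The function name `sed_awk_filter` is hypothetical, and the sketch makes assumptions: it sorts by code point (the shell `sort` follows the locale), and it skips empty results rather than emitting an empty field as awk would.

```python
import re

# Greedy ".*v=" mirrors sed's s/.*v=//: everything up to the LAST "v=" on the
# line is dropped. The group then keeps at most 11 of the remaining characters.
_TAIL = re.compile(r".*v=(.{0,11})")


def sed_awk_filter(lines):
    """Approximate the shell pipeline; returns one space-separated string."""
    tails = set()                      # plays the role of sort | uniq
    for line in lines:
        line = line.rstrip("\n")
        if "watch?v=" in line:         # sed: /watch?v=/! d
            tails.add(_TAIL.match(line).group(1))
    # awk prints $1, the first whitespace-delimited field of each line.
    fields = (tail.split() for tail in sorted(tails))
    return " ".join(f[0] for f in fields if f)


if __name__ == "__main__":
    import sys
    print(sed_awk_filter(sys.stdin))
```

Saved under a file name of your choosing, it would be used as a plain stdin filter in place of the sed stage and everything after it.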
  • Peter Otten

    #2
    Re: do a sed / awk filter with python tools (at least as fast)

    Mathieu Prevot wrote:
    > I use the following filter in a Bourne shell script:
    >
    > sed '/watch?v=/! d;s/.*v=//;s/\(.\{11\}\).*/\1/' \
    > | sort | uniq | awk 'ORS=" "{print $1}'
    >
    > which gives me every 11-character string that follows the "watch?v="
    > pattern. I would like to do this in Python on the stdout of a
    > subprocess.Popen instance, using Python tools rather than sed, awk, etc.
    > How can I do this? Can I expect it to be as fast?

    You should either do it in Python, e.g.:

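    def process(lines):
        # Split each line around the first "/watch?v=" marker.
        candidates = (line.rstrip().partition("/watch?v=") for line in lines)
        # Keep the 11 characters that follow the marker, when present.
        matches = (c[:11] for a, b, c in candidates if len(c) >= 11)
        # Deduplicate, sort, and print the IDs separated by spaces.
        print(" ".join(sorted(set(matches))))

    if __name__ == "__main__":
        import sys
        process(sys.stdin)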

    or invoke your shell script via subprocess.Popen(). Invoking a Python script
    via subprocess doesn't make sense IMHO.

    Peter
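Since the question was about filtering the stdout of a subprocess.Popen instance, here is a minimal sketch of how the same partition-based filter might be applied directly to a child process's output. The names `unique_ids` and `ids_from_command` are hypothetical, and the command passed in is whatever program produces the lines; nothing here is taken from the original script.

```python
import subprocess
import sys


def unique_ids(lines, marker="/watch?v=", length=11):
    """Return the sorted, de-duplicated IDs that follow `marker`."""
    found = set()
    for line in lines:
        _, _, tail = line.rstrip().partition(marker)
        if len(tail) >= length:
            found.add(tail[:length])
    return sorted(found)


def ids_from_command(argv):
    """Run `argv` and filter its stdout line by line as it arrives."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            universal_newlines=True)
    try:
        # proc.stdout is an iterable of text lines, just like sys.stdin.
        return unique_ids(proc.stdout)
    finally:
        proc.stdout.close()
        proc.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    print(" ".join(ids_from_command(sys.argv[1:])))
```

Whether this is as fast as the sed pipeline depends on the input size and is worth measuring; for modest outputs the cost of starting the processes is likely to dominate either way.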


    • Mathieu Prevot

      #3
      Re: do a sed / awk filter with python tools (at least as fast)

      2008/7/7 Peter Otten <__peter__@web.de>:
      > Mathieu Prevot wrote:
      >
      >> I use the following filter in a Bourne shell script:
      >>
      >> sed '/watch?v=/! d;s/.*v=//;s/\(.\{11\}\).*/\1/' \
      >> | sort | uniq | awk 'ORS=" "{print $1}'
      >>
      >> which gives me every 11-character string that follows the "watch?v="
      >> pattern. I would like to do this in Python on the stdout of a
      >> subprocess.Popen instance, using Python tools rather than sed, awk, etc.
      >> How can I do this? Can I expect it to be as fast?
      >
      > You should either do it in Python, e.g.:
      >
      > def process(lines):
      >     candidates = (line.rstrip().partition("/watch?v=") for line in lines)
      >     matches = (c[:11] for a, b, c in candidates if len(c) >= 11)
      >     print(" ".join(sorted(set(matches))))
      >
      > if __name__ == "__main__":
      >     import sys
      >     process(sys.stdin)
      >
      > or invoke your shell script via subprocess.Popen(). Invoking a Python script
      > via subprocess doesn't make sense IMHO.

      :) Thanks.
      Mathieu
