How do I speedup this loop?

  • Steve

    How do I speedup this loop?

    Hi,

    I'm getting some output by running a command using os.popen. I need to
    parse the output and transform it in some sense so that it's 'DB
    compatible', (i.e I need to store the output in a database (postgres)
    after escaping some characters). Since I'm new to python, I wasn't sure
    if there was a better way of doing this so this is what I did:


    # Parse the output returned by popen and return the script
    out = os.popen('some command')
    all_lines = out.readlines()

    script = []
    for i in xrange(len(all_lines)):
        line = all_lines[i].replace("'", "\\'")[0:len(line)-1]  # replace ' with \'
        line_without_carriage = line[0:len(line)-1]  # remove carriage
        line_without_carriage = line_without_carriage.replace("\\n", "$___n")  # replace end of line with $___n
        line_without_carriage += "@___n"  # add a 'end of line' character to the end
        script.append(line_without_carriage)
    # end for

    script = ''.join(script)

    Please help because I'm pretty sure I'm wasting a lot of cpu time in
    this loop. Thanks

    Steve

  • Marco Aschwanden

    #2
    Re: How do I speedup this loop?

    On Tue, 13 Jul 2004 16:48:36 +1000, Unknown <unknown@unknown.invalid>
    wrote:

    > I'm getting some output by running a command using os.popen. I need to
    > parse the output and transform it in some sense so that it's 'DB
    > compatible', (i.e I need to store the output in a database (postgres)
    > after escaping some characters).

    If you are using Python's DB API 2.0, then this escaping is done by
    the API:

    >>> import odbc, dbi
    >>> con = odbc.odbc("DB_ID/USERNAME/PASSWORD")
    >>> cur = con.cursor()
    >>> sql = "INSERT INTO output (line) VALUES (?)"
    >>> dirty_line = 'Some text with forbidden characters\n\r...'
    >>> cur.execute(sql, dirty_line)

    So there is no need to parse (and afterwards unparse) the output. I don't think
    anyone can beat this speedup!

    Regards,
    Marco
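
The same idea as a self-contained sketch. It assumes Python 3 and uses the standard-library sqlite3 module as a stand-in for a PostgreSQL driver (both are assumptions made so the example runs anywhere; the table and column names are taken from Marco's SQL). The point is only that the driver, not the caller, handles quoting.

```python
import sqlite3

# sqlite3 stands in for the PostgreSQL driver (an assumption for the sake
# of a runnable example); like the odbc module above it uses "?" markers.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE output (line TEXT)")

dirty_line = "it's got 'quotes', a backslash-n (\\n) and a real newline\n"

# Parameters are passed as a sequence; the driver takes care of quoting.
cur.execute("INSERT INTO output (line) VALUES (?)", (dirty_line,))
con.commit()

# Read the value back to compare it with what was stored.
stored_line = cur.execute("SELECT line FROM output").fetchone()[0]
con.close()
```

Because the text is stored unmodified, nothing has to be unescaped when it is read back.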


    • David Fraser

      #3
      Re: How do I speedup this loop?

      Marco Aschwanden wrote:
      > On Tue, 13 Jul 2004 16:48:36 +1000, Unknown <unknown@unknown.invalid>
      > wrote:
      >
      >> I'm getting some output by running a command using os.popen. I need to
      >> parse the output and transform it in some sense so that it's 'DB
      >> compatible', (i.e I need to store the output in a database (postgres)
      >> after escaping some characters).
      >
      > If you are using Python's DB API 2.0, then this escaping is done by
      > the API:
      >
      > >>> import odbc, dbi
      > >>> con = odbc.odbc("DB_ID/USERNAME/PASSWORD")
      > >>> cur = con.cursor()
      > >>> sql = "INSERT INTO output (line) VALUES (?)"
      > >>> dirty_line = 'Some text with forbidden characters\n\r...'
      > >>> cur.execute(sql, dirty_line)
      >
      > So there is no need to parse (and afterwards unparse) the output. I don't think
      > anyone can beat this speedup!

      Except if you're aiming for database independence, as different database
      drivers support different means of escaping parameters...

      David
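
To make the portability concern concrete: DB-API 2.0 modules declare their placeholder convention in a module-level `paramstyle` attribute, and the five values the specification lists are handled below. The `placeholder` helper is hypothetical (it is not part of any driver), and sqlite3 is used only because it ships with Python 3.

```python
import sqlite3

# Hypothetical helper, not part of any driver: build one placeholder for
# the paramstyle that a DB-API 2.0 module declares.
def placeholder(paramstyle, position, name):
    styles = {
        "qmark": "?",                  # ... VALUES (?)
        "numeric": ":%d" % position,   # ... VALUES (:1)
        "named": ":%s" % name,         # ... VALUES (:line)
        "format": "%s",                # ... VALUES (%s)
        "pyformat": "%%(%s)s" % name,  # ... VALUES (%(line)s)
    }
    return styles[paramstyle]

# Ask the driver which style it expects instead of hard-coding it.
mark = placeholder(sqlite3.paramstyle, 1, "line")
sql = "INSERT INTO output (line) VALUES (%s)" % mark
```

Only the placeholder is generated this way; the values themselves are still passed to `execute` separately.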


      • Riccardo Attilio Galli

        #4
        Re: How do I speedup this loop?

        On Tue, 13 Jul 2004 16:48:36 +1000, Steve wrote:

        > Hi,
        >
        > I'm getting some output by running a command using os.popen. I need to
        > parse the output and transform it in some sense so that it's 'DB
        > compatible', (i.e I need to store the output in a database (postgres)
        > after escaping some characters). Since I'm new to python, I wasn't sure
        > if there was a better way of doing this so this is what I did:

        If you were replacing one character with another, the best and quickest way
        would be to use a translation table, but you're out of luck here.
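
A side note, assuming Python 3 (the thread itself uses Python 2): there, `str.maketrans` accepts a dict whose values can be strings of any length, so a translation table can cover the single-character cases. It still cannot match the two-character sequence backslash + n, because the table's keys are single characters.

```python
# Python 3 only: values in the mapping may be longer than one character.
table = str.maketrans({
    "'": "\\'",      # single quote -> backslash + quote
    "\n": "@___n",   # real line feed -> end-of-line marker
})

sample = "it's\n"
escaped = sample.translate(table)
```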
        > line = all_lines[i].replace("'", "\\'")[0:len(line)-1]
        > # replace ' with \'

        This can't work: you never defined "line", yet you're calling len() on it.
        I think you want to delete the last character of the string. If so, you
        can use negative indexes:

        line = all_lines[i].replace("'", "\\'")[:-1]

        The leading 0 is gone because it is the default start value.

        > line_without_carriage = line[0:len(line)-1] # remove carriage

        Similarly here:
        line_without_carriage = line[:-1]

        But you're just deleting the new last character of the string,
        so you could delete this line and change the slice in the first one,
        which would become
        line = all_lines[i].replace("'", "\\'")[:-2]

        Also, I don't think you're removing a carriage return ('\r') here.
        If your line ends with '\r\n', the first slice removes '\n', a line feed.
        This is important because in the next line...

        > line_without_carriage = line_without_carriage.replace("\\n", "$___n")
        > # replace end of line with $___n

        ...you try to replace '\\n'.
        Are you intending to remove the line feed, the end of line?
        If so, you should write '\n' (one character), not '\\n' (a
        string of length 2).
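
The difference between the two spellings, and between slicing and stripping, can be checked directly. This is a small sketch in Python 3 syntax; the sample line is made up for illustration.

```python
one_char = "\n"     # a real line feed
two_chars = "\\n"   # a backslash followed by the letter n

# A made-up source line that contains a literal backslash-n and ends in CR LF.
raw = "printf(\"Hey there\\n\");\r\n"

# Slicing removes a fixed number of characters from the end.
without_lf = raw[:-1]      # still ends with '\r'
without_crlf = raw[:-2]

# rstrip removes trailing CR and LF characters, however many there are,
# and never touches the backslash-n in the middle of the line.
body = raw.rstrip("\r\n")
```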
        > line_without_carriage += "@___n"
        > script.append(line_without_carriage)
        > # end for
        > script = ''.join(script)

        The best approach here is
        script.append(line_without_carriage)
        script.append('@___n')
        # end for
        script = ''.join(script)

        By appending '@___n' separately you don't waste memory destroying and
        creating a new string each time.

        > Please help because I'm pretty sure I'm wasting a lot of cpu time in
        > this loop. Thanks
        >
        > Steve

        Ciao,
        Riccardo

        --
        -=Riccardo Galli=-

        _,e.
        s~ ``
        ~@. ideralis Programs
        .. ol
        `**~ http://www.sideralis.net


        • Istvan Albert

          #5
          Re: How do I speedup this loop?

          David Fraser wrote:

          > Except if you're aiming for database independence, as different database
          > drivers support different means of escaping parameters...

          IMHO database independence is overrated, not to mention impossible.
          You can always try the 'greatest common factor' approach, but that
          causes more trouble (and work) than it saves.

          I agree with the previous poster that escaping should be done
          in the DB API, but it is better to use the 'typed' escaping:

          sql = 'SELECT * FROM users WHERE user_id=%d AND user_passwd=%s'
          par = [1, 'something']
          cursor.execute(sql, par)

          Istvan.



          • george young

            #6
            Re: How do I speedup this loop?

            On Tue, 13 Jul 2004 16:48:36 +1000
            Steve <nospam@nopes> threw this fish to the penguins:
            >
            > I'm getting some output by running a command using os.popen. I need to
            > parse the output and transform it in some sense so that it's 'DB
            > compatible', (i.e I need to store the output in a database (postgres)
            > after escaping some characters). Since I'm new to python, I wasn't sure
            > if there was a better way of doing this so this is what I did:
            >
            > # Parse the output returned by popen and return the script
            > out = os.popen('some command')
            > all_lines = out.readlines()
            >
            > script = []
            > for i in xrange(len(all_lines)):
            >     line = all_lines[i].replace("'", "\\'")[0:len(line)-1]  # replace ' with \'
            >     line_without_carriage = line[0:len(line)-1]  # remove carriage
            >     line_without_carriage = line_without_carriage.replace("\\n", "$___n")  # replace end of line with $___n
            >     line_without_carriage += "@___n"  # add a 'end of line' character to the end
            >     script.append(line_without_carriage)
            > # end for
            >
            > script = ''.join(script)

            How about:

            lines = []
            out = os.popen('some command')
            for l in out:
                lines.append(l.strip())
            script = ''.join(lines)
            out.close()

            The "strip" actually removes white space from front and back of the string;
            you could say l.strip('\n') if you only want the newlines removed (or '\r'
            if they're really carriage return characters.)

            Or if you want a clever (and most CPU efficient!) one-liner:

            script = [l.strip() for l in os.popen('some command')]

            I'm not advocating such a terse one-liner unless you are very comfortable
            with its meaning; will you easily know what it does when you see it
            six months from now in the heat of battle?

            Also, the one-liner does not allow you to explicitly close the file
            descriptor from popen. This could be a serious problem if it gets run
            hundreds of times in a loop.
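
One way to sidestep the close() question, sketched under the assumption of Python 3.7 or later: `subprocess.run` waits for the child process and cleans up its pipes before it returns. The child command below is a made-up stand-in for 'some command'.

```python
import subprocess
import sys

# Stand-in for 'some command': a child Python process printing two lines.
command = [sys.executable, "-c", "print('first'); print('second')"]

# run() returns only after the child has exited and its pipes are closed.
result = subprocess.run(command, capture_output=True, text=True, check=True)

# splitlines() drops the line terminators, much like the strip() loop above.
lines = result.stdout.splitlines()
script = "".join(lines)
```

With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError` instead of silently yielding partial output.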

            Have fun,
            -- George Young

            --
            "Are the gods not just?" "Oh no, child.
            What would become of us if they were?" (CSL)


            • Jean Brouwers

              #7
              Re: How do I speedup this loop?


              What about handling all output as one string?

              script = os.popen('some command').read()
              script = script.replace("'", "\\'") # replace ' with \'
              script = script.replace("\r", "") # remove cr
              script = script.replace("\\n", "$___n") # replace \n
              script = script.replace("\n", "@___n") # replace nl

              /Jean Brouwers


              In article <40f385dc$1@clarion.carno.net.au>, Steve <nospam@nopes>
              wrote:

              > Hi,
              >
              > I'm getting some output by running a command using os.popen. I need to
              > parse the output and transform it in some sense so that it's 'DB
              > compatible', (i.e I need to store the output in a database (postgres)
              > after escaping some characters). Since I'm new to python, I wasn't sure
              > if there was a better way of doing this so this is what I did:
              >
              >
              > # Parse the output returned by popen and return the script
              > out = os.popen('some command')
              > all_lines = out.readlines()
              >
              > script = []
              > for i in xrange(len(all_lines)):
              >     line = all_lines[i].replace("'", "\\'")[0:len(line)-1]  # replace ' with \'
              >     line_without_carriage = line[0:len(line)-1]  # remove carriage
              >     line_without_carriage = line_without_carriage.replace("\\n", "$___n")  # replace end of line with $___n
              >     line_without_carriage += "@___n"  # add a 'end of line' character to the end
              >     script.append(line_without_carriage)
              > # end for
              >
              > script = ''.join(script)
              >
              > Please help because I'm pretty sure I'm wasting a lot of cpu time in
              > this loop. Thanks
              >
              > Steve
              >


              • David Fraser

                #8
                Re: How do I speedup this loop?

                Istvan Albert wrote:
                > David Fraser wrote:
                >
                >> Except if you're aiming for database independence, as different
                >> database drivers support different means of escaping parameters...
                >
                >
                > IMHO database independence is overrated, not to mention impossible.
                > You can always try the 'greatest common factor' approach, but that
                > causes more trouble (and work) than it saves.

                Not overrated or impossible. It's part of our business model. It works.

                > I agree with the previous poster that escaping should be done
                > in the DB API, but it is better to use the 'typed' escaping:
                >
                > sql = 'SELECT * FROM users WHERE user_id=%d AND user_passwd=%s'
                > par = [1, 'something']
                > cursor.execute(sql, par)
                >

                Better if the database driver you are using supports it; otherwise it is
                useless.
                I think there is a need to drive towards some sort of standard approach
                to this in DB-API (maybe version 3?), as the current situation nullifies
                parameters for anyone using multiple database drivers.

                David


                • Lonnie Princehouse

                  #9
                  Re: How do I speedup this loop?

                  Welcome to Python =)
                  Somebody else already mentioned checking out the DB-API's way of
                  escaping data; this is a good idea. Besides that, here are some
                  general tips:

                  1. Consider using out.xreadlines() if you only need one line at a
                  time:

                  for line in out.xreadlines():
                      ...

                  If you need all of the data at once, try out.read()

                  2. You can use negative numbers to index relative to the end of a
                  sequence:

                  line[0:-1] is equivalent to line[0:len(line)-1]
                  (i.e. cut off the last character of a string)

                  You can also use line.strip() to remove leading and trailing
                  whitespace, including newlines.

                  3. If you omit the index on either side of a slice, Python will
                  default to the beginning or end of the sequence, respectively:

                  line[:] is equivalent to line[0:len(line)]

                  4. Check out the regular expression module. Here's how to read all of
                  your output and make multiple escape substitutions. Smashing this
                  into one regular expression means you only need one pass over the
                  data. It also avoids string concatenation.

                  import re, os

                  out = os.popen('some command')
                  data = out.read()

                  substitution_map = {
                      "'" : r"\'",
                      "\n": "$___n",
                  }

                  def sub_func(match_object, smap=substitution_map):
                      return smap[match_object.group(0)]

                  # re.escape guards against keys that contain regex metacharacters
                  escape_expr = re.compile('|'.join(map(re.escape, substitution_map.keys())))

                  escaped_data = escape_expr.sub(sub_func, data)

                  # et voila... now you've got a big escaped string without even
                  # writing a single for loop. Tastes great, less filling.

                  (caveat: I didn't run this code. It might have typos.)
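
A runnable variant of the same single-pass idea, written for Python 3 and extended to all three rules from Steve's original loop (quote, literal backslash-n, real line feed). The function name `escape_output` and the sample text are made up for illustration.

```python
import re

# Steve's three rules from the original post: literal text -> replacement.
substitutions = {
    "'": "\\'",        # single quote -> backslash + quote
    "\\n": "$___n",    # backslash followed by n (two characters)
    "\n": "@___n",     # real line feed
}

# re.escape keeps the backslash in "\\n" from being read as regex syntax.
pattern = re.compile("|".join(re.escape(key) for key in substitutions))

def escape_output(text):
    # One pass over the text; each match is looked up in the table.
    return pattern.sub(lambda match: substitutions[match.group(0)], text)

sample = "printf('Hey\\n');\n"
escaped = escape_output(sample)
```

Because the replacement comes from a function, its return value is inserted literally, so the backslash in `\'` needs no further escaping.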


                  Steve <nospam@nopes> wrote in message news:<40f385dc$1@clarion.carno.net.au>...
                  > Hi,
                  >
                  > I'm getting some output by running a command using os.popen. I need to
                  > parse the output and transform it in some sense so that it's 'DB
                  > compatible', (i.e I need to store the output in a database (postgres)
                  > after escaping some characters). Since I'm new to python, I wasn't sure
                  > if there was a better way of doing this so this is what I did:
                  >
                  >
                  > # Parse the output returned by popen and return the script
                  > out = os.popen('some command')
                  > all_lines = out.readlines()
                  >
                  > script = []
                  > for i in xrange(len(all_lines)):
                  >     line = all_lines[i].replace("'", "\\'")[0:len(line)-1]  # replace ' with \'
                  >     line_without_carriage = line[0:len(line)-1]  # remove carriage
                  >     line_without_carriage = line_without_carriage.replace("\\n", "$___n")  # replace end of line with $___n
                  >     line_without_carriage += "@___n"  # add a 'end of line' character to the end
                  >     script.append(line_without_carriage)
                  > # end for
                  >
                  > script = ''.join(script)
                  >
                  > Please help because I'm pretty sure I'm wasting a lot of cpu time in
                  > this loop. Thanks
                  >
                  > Steve


                  • Bart Nessux

                    #10
                    Re: How do I speedup this loop?

                    george young wrote:
                    > How about:
                    >
                    > lines = []
                    > out = os.popen('some command')
                    > for l in out:
                    >     lines.append(l.strip())
                    > script = ''.join(lines)
                    > out.close()
                    >
                    > The "strip" actually removes white space from front and back of the string;
                    > you could say l.strip('\n') if you only want the newlines removed (or '\r'
                    > if they're really carriage return characters.)

                    The above is a great solution... should make for a good speedup. It's
                    how I might approach it.

                    > Or if you want a clever (and most CPU efficient!) one-liner:
                    >
                    > script = [l.strip() for l in os.popen('some command')]

                    Clever programmers should be shot! I've had to work behind them... they
                    are too smart for their own good. They think everyone else in the world
                    is as clever as they are... this is where they are wrong ;)


                    • george young

                      #11
                      Re: How do I speedup this loop?

                      On Wed, 14 Jul 2004 15:40:15 -0400
                      Bart Nessux <bart_nessux@hotmail.com> threw this fish to the penguins:

                      > george young wrote:
                      > > How about:
                      > >
                      > > lines = []
                      > > out = os.popen('some command')
                      > > for l in out:
                      > >     lines.append(l.strip())
                      > > script = ''.join(lines)
                      > > out.close()
                      > >
                      > > The "strip" actually removes white space from front and back of the string;
                      > > you could say l.strip('\n') if you only want the newlines removed (or '\r'
                      > > if they're really carriage return characters.)
                      >
                      > The above is a great solution... should make for a good speedup. It's
                      > how I might approach it.
                      >
                      > > Or if you want a clever (and most CPU efficient!) one-liner:
                      > >
                      > > script = ''.join([l.strip() for l in os.popen('some command')])
                      [fixed up a bit...I forgot about the join]
                      >
                      > Clever programmers should be shot! I've had to work behind them... they
                      > are too smart for their own good. They think everyone else in the world
                      > is as clever as they are... this is where they are wrong ;)

                      Oh, all right. How about:

                      out = os.popen('some command')
                      temp_script = [l.strip() for l in out]
                      script = ''.join(temp_script)

                      That's clear enough, and still takes advantage of the efficiency of
                      the list comprehension! (I admit that, for reading the whole file,
                      the other postings of regexp substitution on the total string are
                      certainly faster, given enough RAM, but still not as clear and concise
                      and elegant ... blah blah blah as mine...)

                      -- George Young

                      --
                      "Are the gods not just?" "Oh no, child.
                      What would become of us if they were?" (CSL)


                      • Steve

                        #12
                        Re: How do I speedup this loop?


                        george young wrote:
                        > On Wed, 14 Jul 2004 15:40:15 -0400
                        > Bart Nessux <bart_nessux@hotmail.com> threw this fish to the penguins:
                        >
                        >> george young wrote:
                        >>
                        >>> How about:
                        >>>
                        >>> lines = []
                        >>> out = os.popen('some command')
                        >>> for l in out:
                        >>>     lines.append(l.strip())
                        >>> script = ''.join(lines)
                        >>> out.close()
                        >>>
                        >>> The "strip" actually removes white space from front and back of the string;
                        >>> you could say l.strip('\n') if you only want the newlines removed (or '\r'
                        >>> if they're really carriage return characters.)
                        >>
                        >> The above is a great solution... should make for a good speedup. It's
                        >> how I might approach it.
                        >>
                        >>> Or if you want a clever (and most CPU efficient!) one-liner:
                        >>>
                        >>> script = ''.join([l.strip() for l in os.popen('some command')])
                        >
                        > [fixed up a bit...I forgot about the join]
                        >
                        >> Clever programmers should be shot! I've had to work behind them... they
                        >> are too smart for their own good. They think everyone else in the world
                        >> is as clever as they are... this is where they are wrong ;)
                        >
                        >
                        > Oh, all right. How about:
                        >
                        > out = os.popen('some command')
                        > temp_script = [l.strip() for l in out]
                        > script = ''.join(temp_script)

                        That looks good, but the problem is that I don't want to 'strip' off the
                        'end of line' characters etc., because I need to reproduce/print the
                        output exactly as it was at a later stage. What's more, I need to
                        print it out on an HTML page, so if I know the difference between \n
                        (in code) and the implicit end of line character, I can interpret that
                        in HTML accordingly. For example, the output can contain something like:

                        printf("Hey there\n");

                        so there's a \n embedded inside the text as well as the end of line
                        character, which isn't visible. Although escaping characters using a
                        DB-API function might do the trick, this still won't help me much in
                        the end, where I need to print a "<br>" for each 'end of line character'.
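
A sketch of the display step, assuming Python 3 and assuming the text was stored unmodified (for example through a parameterized insert). In that case a real line feed and the two-character backslash-n are already different data, so no marker strings are needed. The sample text is made up.

```python
import html

# Made-up stored text: a literal backslash-n inside the printf call,
# plus two real line feeds.
stored = 'printf("Hey there\\n");\nreturn 0;\n'

# html.escape turns &, <, > and quotes into entities; the backslash-n
# sequence is ordinary text and passes through unchanged. Only real
# line feeds then get a <br> in front of them.
page_fragment = html.escape(stored).replace("\n", "<br>\n")
```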

                        --
                        Steve
                        >
                        > That's clear enough, and still takes advantage of the efficiency of
                        > the list comprehension! (I admit, that for reading the whole file,
                        > the other postings of regexp substitution on the total string are
                        > certainly faster, given enough RAM, but still not as clear and concise
                        > and elegant ... blah blah blah as mine...
                        >
                        > -- George Young
                        >


                        • Steve

                          #13
                          Re: How do I speedup this loop?


                          Jean Brouwers wrote:
                          > What about handling all output as one string?
                          >
                          > script = os.popen('some command').read()
                          > script = script.replace("'", "\\'") # replace ' with \'
                          > script = script.replace("\r", "") # remove cr
                          > script = script.replace("\\n", "$___n") # replace \n
                          > script = script.replace("\n", "@___n") # replace nl
                          >

                          This won't do any better than what I was already doing. I need the code
                          to be very fast, and this will only end up creating a new copy
                          every time the string is modified (strings are immutable). I
                          really like the idea of using regex for this (proposed by Lonnie), but I
                          still need to get the hang of it.
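
Rather than guessing which approach is faster, both can be timed. The sketch below assumes Python 3 and uses synthetic input, not Steve's real data, so the numbers it prints only indicate how to measure; note that in CPython `str.replace` runs in C, so a few whole-string passes are not necessarily slow.

```python
import re
import timeit

RULES = {"'": "\\'", "\\n": "$___n", "\n": "@___n"}
PATTERN = re.compile("|".join(re.escape(key) for key in RULES))

def with_replace(text):
    # Three passes, each building a new string.
    text = text.replace("'", "\\'")
    text = text.replace("\\n", "$___n")
    return text.replace("\n", "@___n")

def with_regex(text):
    # One pass, with a Python-level callback for every match.
    return PATTERN.sub(lambda match: RULES[match.group(0)], text)

# Synthetic input, made up for the measurement.
sample = "printf('Hey there\\n');\n" * 10000

replace_seconds = timeit.timeit(lambda: with_replace(sample), number=5)
regex_seconds = timeit.timeit(lambda: with_regex(sample), number=5)
print("replace: %.4fs  regex: %.4fs" % (replace_seconds, regex_seconds))
```

Both functions produce the same output for the same input, so the choice can be made on the measured time for the real data.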

                          Cheers,
                          Steve
