Problem of function calls from map()

  • Dasn

    Problem of function calls from map()


    Hi, there.

    'lines' is a large list of strings whose fields are separated by '\t':
    >>> lines = ['bla\tbla\tblah', 'bh\tb\tb', ... ]
    I want to split each string into a list. For speed, I'm using map() instead
    of a 'for' loop. 'map(str.split, lines)' works fine, but...
    when I tried:
    >>> l = map(str.split('\t'), lines)
    I got "TypeError: 'list' object is not callable".
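Here is a minimal Python 3 sketch of why that call fails. str.split('\t') is evaluated before map() ever runs, so map() receives its result (an empty list) rather than a function. The sample lines and the helper name show_the_error are hypothetical.

```python
# Sketch (Python 3) of why map(str.split('\t'), lines) raises TypeError.
lines = ['bla\tbla\tblah', 'bh\tb\tb']


def show_the_error(lines):
    # str.split('\t') is evaluated *before* map() is called: it is the
    # unbound method str.split applied to the string '\t' itself, with no
    # separator argument, so it splits '\t' on whitespace and returns [].
    not_a_function = str.split('\t')
    try:
        # map() needs a callable as its first argument. list() forces the
        # lazy Python 3 map object to be consumed so the error surfaces.
        list(map(not_a_function, lines))
    except TypeError as exc:
        return not_a_function, str(exc)
    return not_a_function, None


value, message = show_the_error(lines)
print(value)    # []
print(message)  # 'list' object is not callable
```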

    To avoid function call overhead, I'm not willing to use a lambda function
    either. So how do I pass the '\t' argument to split() in map()?

    Thanks.

  • Tim Lesher

    #2
    Re: Problem of function calls from map()

    Dasn wrote:
    > So how do I pass the '\t' argument to split() in map()?

    How much is the lambda costing you, according to your profiler?

    Anyway, what you really want is a list comprehension:

    l = [line.split('\t') for line in lines]
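Tim's question can be answered with numbers instead of guesses. A timing sketch along these lines compares the two approaches. It assumes Python 3, which is why map() is wrapped in list(): map() is lazy there. The sample data and function names are hypothetical. The figures it prints depend entirely on the machine and interpreter version, so none are quoted here.

```python
import timeit

# Hypothetical sample data: 10,000 tab-separated lines.
lines = ['field1\tfield2\tfield3'] * 10_000


def with_map_lambda():
    # One Python-level lambda call per line, plus the split call inside it.
    return list(map(lambda s: s.split('\t'), lines))


def with_list_comprehension():
    # split is called directly for each line; no lambda involved.
    return [s.split('\t') for s in lines]


# Check both approaches agree before comparing their speed.
assert with_map_lambda() == with_list_comprehension()

for func in (with_map_lambda, with_list_comprehension):
    seconds = timeit.timeit(func, number=20)
    print(f"{func.__name__}: {seconds:.3f} s for 20 runs")
```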


    • Diez B. Roggisch

      #3
      Re: Problem of function calls from map()

      Dasn wrote:
      > <snip>
      > To avoid function call overhead, I'm not willing to use a lambda function
      > either. So how do I pass the '\t' argument to split() in map()?

      You can't. Use a lambda or a list comprehension.


      map(lambda l: l.split("\t"), lines)

      [l.split("\t") for l in lines]


      Diez
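Diez's answer fits the Python of the time. On newer versions, two lambda-free ways to pass '\t' through map() are sketched below, assuming Python 3. The first uses operator.methodcaller (available since Python 2.6). The second uses map() over several iterables combined with itertools.repeat. The sample lines are hypothetical. Whether either approach beats a list comprehension should be measured, not assumed.

```python
import itertools
import operator

lines = ['bla\tbla\tblah', 'bh\tb\tb']

# operator.methodcaller('split', '\t') returns a callable f such that
# f(s) == s.split('\t'); no lambda or nested def is written.
by_methodcaller = list(map(operator.methodcaller('split', '\t'), lines))

# Python 3's map() accepts several iterables and stops at the shortest one,
# so an endless stream of '\t' supplies the separator for every line.
# (Python 2's map() padded shorter iterables with None instead, so this
# relies on Python 3 behavior.)
by_repeat = list(map(str.split, lines, itertools.repeat('\t')))

print(by_methodcaller)               # [['bla', 'bla', 'blah'], ['bh', 'b', 'b']]
print(by_repeat == by_methodcaller)  # True
```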


      • Paul McGuire

        #4
        Re: Problem of function calls from map()

        "Dasn" <dasn@bluebottle.com> wrote in message
        news:mailman.9606.1156169593.27775.python-list@python.org...
        > <snip>
        > I want to split each string into a list. For speed, I'm using map() instead
        > of a 'for' loop.

        Try this. Not sure how it stacks up for speed, though. (As others have
        suggested, if a 'for' loop is giving you speed heartburn, use a list
        comprehension.)

        In this case, splitUsing is called only once, to create the embedded
        function tmp. tmp is the function that split will call once per list item,
        using whatever characters were specified in the call to splitUsing.

        -- Paul



        data = [
            "sldjflsdfj\tls jdlj\tlkjsdlkfj ",
            "lsdjflsjd\tlsj dlfdj\tlskjdflk j",
            "lskdjfl\tlskdj flj\tlskdlfkjsd ",
        ]

        # Factory: called once, returns a function that remembers 'chars'
        def splitUsing(chars):
            def tmp(s):
                return s.split(chars)
            return tmp

        # tmp is applied to each item of data
        for d in map(splitUsing('\t'), data):
            print(d)



        • Paul McGuire

          #5
          Re: Problem of function calls from map()

          > tmp is the function that split will call once per list item

          should be

          tmp is the function that *map* will call once per list item
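The sketch below makes the correction concrete by counting calls. The counters dict is a hypothetical addition for illustration. The factory and inner function keep Paul's names.

```python
# Hypothetical counters showing how often each function runs.
calls = {'splitUsing': 0, 'tmp': 0}


def splitUsing(chars):
    calls['splitUsing'] += 1

    def tmp(s):
        calls['tmp'] += 1
        return s.split(chars)  # 'chars' is captured from the enclosing call
    return tmp


data = ['a\tb', 'c\td', 'e\tf']
# splitUsing('\t') runs once, before map() starts; map() then calls tmp per item.
result = list(map(splitUsing('\t'), data))

print(result)  # [['a', 'b'], ['c', 'd'], ['e', 'f']]
print(calls)   # {'splitUsing': 1, 'tmp': 3}
```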

          -- Paul



          • Georg Brandl

            #6
            Re: Problem of function calls from map()

            Paul McGuire wrote:
            > <snip>
            > def splitUsing(chars):
            >     def tmp(s):
            >         return s.split(chars)
            >     return tmp
            >
            > for d in map(splitUsing('\t'), data):
            >     print(d)

            And why is this better than

            map(lambda t: t.split('\t'), data)

            ?

            Georg


            • Paul McGuire

              #7
              Re: Problem of function calls from map()

              "Georg Brandl" <g.brandl-nospam@gmx.net> wrote in message
              news:ecemdl$qd5$1@news.albasani.net...
              > Paul McGuire wrote:
              >> <snip>
              >> def splitUsing(chars):
              >>     def tmp(s):
              >>         return s.split(chars)
              >>     return tmp
              >>
              >> for d in map(splitUsing('\t'), data):
              >>     print(d)
              >
              > And why is this better than
              >
              > map(lambda t: t.split('\t'), data)
              >
              > ?
              >
              > Georg

              Hmm, "better" is a funny word. My posting was definitely more verbose, but
              verbosity isn't always bad.

              In defense of brevity:
              - often (but not always) runs faster
              - usually easier to understand as a single gestalt (i.e., you don't have to
              jump around in the code, or grasp the intent of a dozen or more lines, when
              one or a few lines do all the work), but this can be overdone

              In defense of verbosity:
              - usually more explicit, as individual bits of logic are exposed as separate
              functions or statements, and anonymous functions can be given more
              descriptive names
              - usually easier to understand, especially for language newcomers
              - separate functions can be compiled by psyco

              Of course, such generalizations invite obvious extremes and counterexamples.
              Prime number algorithms compacted into one-liners are anything but quick to
              understand; conversely, I've seen a 40-line database function exploded into
              more than 100 classes (this was in Java, so each was also a separate file!)
              in pursuit of implementing a developer's favorite GoF pattern.

              This idiom (as used in the splitUsing case) of returning a callable from a
              function whose purpose is to be a factory for callables seems to be a common
              one in Python. I've seen it go by different names, currying and closures
              being the most common; decorators are another flavor of the same idea.
              Perhaps these idioms ("idia"?) emerged when "lambda" was on Guido's Py3K
              chopping block.
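The sketch below puts the idioms named above side by side, assuming Python 3. It shows three forms:

- a closure factory like splitUsing;
- partial application with functools.partial (often loosely called currying);
- a decorator, which is another function that returns a callable.

The names split_using, split_on_tab, strip_first and split_tabs are hypothetical.

```python
import functools


# Closure: a factory that returns a new function remembering 'chars'.
def split_using(chars):
    def splitter(s):
        return s.split(chars)
    return splitter


# Partial application: functools.partial pre-binds an argument instead of
# writing a nested def. Passing sep by keyword to str.split needs Python 3.
split_on_tab = functools.partial(str.split, sep='\t')


# Decorator: also a function that returns a callable, but it wraps an
# existing function. strip_first is a hypothetical example.
def strip_first(func):
    @functools.wraps(func)
    def wrapper(s):
        return func(s.strip())
    return wrapper


@strip_first
def split_tabs(s):
    return s.split('\t')


row = 'bla\tbla\tblah'
print(split_using('\t')(row))        # ['bla', 'bla', 'blah']
print(split_on_tab(row))             # ['bla', 'bla', 'blah']
print(split_tabs(' ' + row + '\n'))  # ['bla', 'bla', 'blah']
```

All three build the callable once and reuse it, but each call still goes through at least one extra function, which is the overhead discussed below.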

              So I wouldn't really hold these two routines up for "betterness" - the OP's
              performance test shows them to be about the same. To summarize his
              performance results (times in CPU secs):
              - explicit "for" loop - 20.510 (309130 total function calls; 154563 to
                split and 154563 to append)
              - list comprehension - 12.240 (154567 total function calls; 154563 to split)
              - map+lambda - 20.480 (309130 total function calls; 154563 to <lambda> and
                154563 to split)
              - map+splitUsing - 21.900 (309130 total function calls; 154563 to tmp and
                154563 to split)
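Call counts like the ones summarized above can be gathered with cProfile. The sketch below assumes Python 3 and uses hypothetical sample data and function names. By default cProfile also records calls to built-ins such as str.split. Exact counts and times will vary by interpreter version.

```python
import cProfile
import io
import pstats

# Hypothetical input; the OP's real data set is not shown in the thread.
lines = ['bla\tbla\tblah'] * 1000


def split_map_lambda(lines):
    return list(map(lambda s: s.split('\t'), lines))


def split_list_comprehension(lines):
    return [s.split('\t') for s in lines]


def profile_calls(func, lines):
    """Run func(lines) under cProfile and return the printed report."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(lines)
    profiler.disable()
    buf = io.StringIO()
    # Sort by call count and keep the five busiest entries.
    pstats.Stats(profiler, stream=buf).sort_stats('calls').print_stats(5)
    return buf.getvalue()


for func in (split_map_lambda, split_list_comprehension):
    print(func.__name__)
    print(profile_calls(func, lines))
```

The map+lambda report should list both the lambda and the split method, while the list comprehension report lists split without a per-line lambda. That matches the halving of calls described below.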

              The big winner here is the list comprehension, and it would seem it outdoes
              the others by halving the number of function calls. Unfortunately, most of
              our currying/closure/decorator idioms are implemented using some sort of
              "function-calls-an-embedded-function" form, and function calls are poison to
              performance in Python (and other languages, too, but perhaps easier to
              observe in Python). Even the anonymous lambda implementation has this same
              issue.

              So the interesting point here is to go back to the OP's OP, in which he
              states, "For speed, [I'm] using map() instead of 'for' loop." As it turns
              out, map() isn't much of a win in this case. The real, "best" solution is
              the list comprehension, not only for speed, but also for ease of readability
              and understanding. It's tough to beat this:

              return [s.split('\t') for s in lines]

              for clarity, explicitness, brevity, and as it happens, also for speed.

              -- Paul

