Another Python rookie trying to port to C/C++ question.

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Don Bruder

    Another Python rookie trying to port to C/C++ question.


    A week or two ago, I asked here about porting Python to C. Got some good
    answers (Note 1) but now I've got another question. Actually, more a
    request for clarification of a topic that both the Python tutorial and
    docs leave a touch murky to my understanding.

    Dictionaries/"dict" types...

    Am I understanding/interpreting correctly when I go with the idea that a
    "dict" variable can be looked at as (effectively) two parallel arrays?
    Something like this C construct:

    struct DictStruct
    {
    char *KeyArray[<somenumber>];
    ValType ValArray[<somenumber>];
    }

    Where "ValType" would be likely be a union, to allow
    assigning/retrieving the actual values correctly depending on what their
    individiual types were, and "<somenumbe r>" is just that: Some
    more-or-less arbitrarily selected value that provides enough space to
    handle all expected numbers of key/value pairs?

    Similarly, would something like:

    struct DictStruct
    {
    char Key[25];
    ValType Value;
    DictStruct *PrevDictEntry;
    DictStruct *NextDictEntry;
    }

    (again, ValType would probably be a union to cover each type of value
    that might be wanted in the dictionary - but in this case, Key directly
    contains the Key's name) then building a double-linked list of struct
    DictStructs end up behaving at least reasonably like a Python "dict"?

    I expect that either will be a reasonable facsimile of how Python
    handles dicts, with, of course, the corresponding overhead (which may be
    surprisingly little, I'm thinking... but I have yet to code that part,
    so I may find out it's going to be a nightmare) of "We want the value
    that goes with key, so we've got to walk the KeyArray[] until we find a
    match, then extract the value from the other array", or a similar
    procedure for the doubly-linked list.

    Or am I way out in left field with the way I'm RTFM?



    (Note 1)
    Along with quite a bit of entirely unwelcome "Python evangelism" --
    Which part of "I don't care how good, bad, or indifferent Python is
    compared to C/C++, or any other language. I want this Python code to be
    written in C/C++, end of discussion." wasn't clear enough for those of
    you who took it upon yourselves to preach (including the idiot who
    decided to spew a relatively impressive (as such things go) string of
    insults about how stupid I am, how far up my ass my head is, and how
    pathetic my family all the way back to Adam must be for producing such a
    throwback because I've got no interest in adopting "the perfect
    language...") at me on the merits of Python, both on group and
    in-mailbox?

    --
    Don Bruder - dakidd@sonic.ne t <--- Preferred Email - SpamAssassinate d.
    Hate SPAM? See <http://www.spamassassi n.org> for some seriously great info.
    I will choose a path that's clear: I will choose Free Will! - N. Peart
    Fly trap info pages: <http://www.sonic.net/~dakidd/Horses/FlyTrap/index.html>
  • Thomas Heller

    #2
    Re: Another Python rookie trying to port to C/C++ question.

    Don Bruder <dakidd@sonic.n et> writes:

    [rewriting a Python program in C/C++][color=blue]
    > Something like this C construct:
    >
    > struct DictStruct
    > {
    > char *KeyArray[<somenumber>];
    > ValType ValArray[<somenumber>];
    > }
    >[/color]
    [...][color=blue]
    > Similarly, would something like:
    >
    > struct DictStruct
    > {
    > char Key[25];
    > ValType Value;
    > DictStruct *PrevDictEntry;
    > DictStruct *NextDictEntry;
    > }
    >
    > [...] end up behaving at least reasonably like a Python "dict"?[/color]

    Yes.

    Except that it would be less flexible, *much* slower, and
    probably much more buggy.

    Thomas

    Comment

    • Peter Otten

      #3
      Re: Another Python rookie trying to port to C/C++ question.

      Don Bruder wrote:
      [color=blue]
      > A week or two ago, I asked here about porting Python to C. Got some good
      > answers (Note 1) but now I've got another question. Actually, more a
      > request for clarification of a topic that both the Python tutorial and
      > docs leave a touch murky to my understanding.
      >
      > Dictionaries/"dict" types...[/color]

      As you seem to feel at home with C, you could have a look into
      Objects/dictobject.c and Include/dictobject.h of the source distribution.
      [color=blue]
      > Which part of "I don't care how good, bad, or indifferent Python is
      > compared to C/C++, or any other language. I want this Python code to be
      > written in C/C++, end of discussion." wasn't clear enough for those of[/color]

      Smart people use efficient tools and build on existing libraries. If you are
      serious about the C++ part of your subject line, the Standard library aka
      STL has a map template that resembles Python's dict and might enhance both
      development and execution speed.

      Off topic (or not):
      A single person can start a discussion but not limit it, I think that's a
      feature of newsgroups rather than an annoyance.


      Peter

      Comment

      • Peter Otten

        #4
        Re: Another Python rookie trying to port to C/C++ question.

        Don Bruder wrote:
        [color=blue]
        > A week or two ago, I asked here about porting Python to C. Got some good
        > answers (Note 1) but now I've got another question. Actually, more a
        > request for clarification of a topic that both the Python tutorial and
        > docs leave a touch murky to my understanding.
        >
        > Dictionaries/"dict" types...[/color]

        As you seem to feel at home with C, you could have a look into
        Objects/dictobject.c and Include/dictobject.h of the source distribution.
        [color=blue]
        > Which part of "I don't care how good, bad, or indifferent Python is
        > compared to C/C++, or any other language. I want this Python code to be
        > written in C/C++, end of discussion." wasn't clear enough for those of[/color]

        Smart people use efficient tools and build on existing libraries. The
        Standard library aka STL has a <map> template that might enhance both
        development and execution speed.

        Off topic (or not):
        A single person can start a discussion but not end it, I think that's a
        feature of newsgroups rather than an annoyance.


        Peter

        Comment

        • Ulrich Petri

          #5
          Re: Another Python rookie trying to port to C/C++ question.

          "Don Bruder" <dakidd@sonic.n et> schrieb im Newsbeitrag
          news:GSGcb.2439 4$dk4.776543@ty phoon.sonic.net ...[color=blue]
          >[/color]
          [color=blue]
          > I expect that either will be a reasonable facsimile of how Python
          > handles dicts, with, of course, the corresponding overhead (which may be
          > surprisingly little, I'm thinking... but I have yet to code that part,
          > so I may find out it's going to be a nightmare) of "We want the value
          > that goes with key, so we've got to walk the KeyArray[] until we find a
          > match, then extract the value from the other array", or a similar
          > procedure for the doubly-linked list.[/color]

          Python hashes the keys for fast lookup.

          [color=blue]
          > (Note 1)
          > Along with quite a bit of entirely unwelcome "Python evangelism" --
          > Which part of "I don't care how good, bad, or indifferent Python is
          > compared to C/C++, or any other language. I want this Python code to be
          > written in C/C++, end of discussion." wasn't clear enough for those of
          > you who took it upon yourselves to preach (including the idiot who
          > decided to spew a relatively impressive (as such things go) string of
          > insults about how stupid I am, how far up my ass my head is, and how
          > pathetic my family all the way back to Adam must be for producing such a
          > throwback because I've got no interest in adopting "the perfect
          > language...") at me on the merits of Python, both on group and
          > in-mailbox?[/color]

          You are treated as you treat others.
          It might suprise you but this is a newsgroup not your personal information
          lookup place.


          Comment

          • Don Bruder

            #6
            Re: Another Python rookie trying to port to C/C++ question.

            In article <bkvklh$e2i$02$ 1@news.t-online.com>,
            Peter Otten <__peter__@web. de> wrote:
            [color=blue]
            > Don Bruder wrote:
            >[color=green]
            > > A week or two ago, I asked here about porting Python to C. Got some good
            > > answers (Note 1) but now I've got another question. Actually, more a
            > > request for clarification of a topic that both the Python tutorial and
            > > docs leave a touch murky to my understanding.
            > >
            > > Dictionaries/"dict" types...[/color]
            >
            > As you seem to feel at home with C, you could have a look into
            > Objects/dictobject.c and Include/dictobject.h of the source distribution.[/color]

            I may just go poking around there for more details.
            Someone (via email) suggests that dicts are "supposed to be" (my
            synopsis, not his words) a bit "murky", because all that's supposed to
            be relevant is stuffing things into them, and pulling things out. That's
            fine, completely admirable, and fully acceptable. *IF* You're working
            directly in Python. I'm not. I'm working in a combination of C and C++,
            attempting to duplicate the functionality of a program originally
            written in Python. As such, the details *MUST* get clear if I'm to
            create equivalent functionality. If only because I need to read data
            that was written (presumably...) by a Python program. Hence, it becomes
            neccesary to get down "in the muck" and actually know what's going on
            behind the nice, tidy "dict.whosit.ge t()" (or whatever other operators
            might be involved) interface that Python provides.[color=blue]
            >[color=green]
            > > Which part of "I don't care how good, bad, or indifferent Python is
            > > compared to C/C++, or any other language. I want this Python code to be
            > > written in C/C++, end of discussion." wasn't clear enough for those of[/color]
            >
            > Smart people use efficient tools and build on existing libraries.[/color]

            There's a statement that needed saying.... Not...

            [color=blue]
            > If you are
            > serious about the C++ part of your subject line, the Standard library aka
            > STL has a map template that resembles Python's dict and might enhance both
            > development and execution speed.[/color]

            I'm *REASONABLY* serious. This port started life as a "Do you really
            know as much about C++ as you think you do? Prove it! Take this Python
            program and make it C++!" sort of "mid-term" test in my self-taught C++
            course. It's rapidly degenerating (due to some technicalities of C++
            that I've never before encountered, and therefore had no expectation of
            trouble from) into "Let's just get the damn thing working in straight C,
            then worry about trying to turn it into C++ later."
            [color=blue]
            >
            > Off topic (or not):
            > A single person can start a discussion but not limit it, I think that's a
            > feature of newsgroups rather than an annoyance.[/color]

            Perhaps, and perhaps not. Guess it depends on your perspective. Having a
            stream of insult and invective dumped into your mailbox because you've
            made it plain that you don't consider somebody's "pet" language the
            be-all and end-all of computer programming is hardly what I'd call a
            "feature".. .

            Be that as it may, fortunately I have a thick enough hide to
            figuratively "flip off" the jerk that was involved, and go on with life
            knowing full well that in at least one respect, I'm so far superior to
            him that words can't describe the gulf between us. Doesn't mean I'm not
            going to mention that I think he's a hopeless putz, but I'm sure not
            going to let it get me down to the point of giving up.

            --
            Don Bruder - dakidd@sonic.ne t <--- Preferred Email - SpamAssassinate d.
            Hate SPAM? See <http://www.spamassassi n.org> for some seriously great info.
            I will choose a path that's clear: I will choose Free Will! - N. Peart
            Fly trap info pages: <http://www.sonic.net/~dakidd/Horses/FlyTrap/index.html>

            Comment

            • Don Bruder

              #7
              Re: Another Python rookie trying to port to C/C++ question.

              In article <8yoc1wwt.fsf@p ython.net>,
              Thomas Heller <theller@python .net> wrote:
              [color=blue]
              > Don Bruder <dakidd@sonic.n et> writes:
              >
              > [rewriting a Python program in C/C++][color=green]
              > > Something like this C construct:
              > >
              > > struct DictStruct
              > > {
              > > char *KeyArray[<somenumber>];
              > > ValType ValArray[<somenumber>];
              > > }
              > >[/color]
              > [...][color=green]
              > > Similarly, would something like:
              > >
              > > struct DictStruct
              > > {
              > > char Key[25];
              > > ValType Value;
              > > DictStruct *PrevDictEntry;
              > > DictStruct *NextDictEntry;
              > > }
              > >
              > > [...] end up behaving at least reasonably like a Python "dict"?[/color]
              >
              > Yes.
              >
              > Except that it would be less flexible, *much* slower, and
              > probably much more buggy.
              >
              > Thomas[/color]

              Well, since I'm not worried about "flexible", have my doubts about the
              "much slower" part, and have no reason to believe that choice of
              language automatically makes a routine buggy, I'll go with the tenative
              plan I had. Seems you're confirming that I'm understanding things
              properly when it comes to "This is what a dict is, and here's how it
              works." Now to put finishing touches on my understanding of exactly what
              goes on inside that "black box" that is the "dict"
              class/type/whatever-it-is-you-Python-types-like-to-call-such-things, and
              make some working code out of it...

              --
              Don Bruder - dakidd@sonic.ne t <--- Preferred Email - SpamAssassinate d.
              Hate SPAM? See <http://www.spamassassi n.org> for some seriously great info.
              I will choose a path that's clear: I will choose Free Will! - N. Peart
              Fly trap info pages: <http://www.sonic.net/~dakidd/Horses/FlyTrap/index.html>

              Comment

              • Stephen Horne

                #8
                Re: Another Python rookie trying to port to C/C++ question.

                On Thu, 25 Sep 2003 18:55:34 GMT, Don Bruder <dakidd@sonic.n et> wrote:
                [color=blue]
                >Dictionaries/"dict" types...
                >
                >Am I understanding/interpreting correctly when I go with the idea that a
                >"dict" variable can be looked at as (effectively) two parallel arrays?
                >Something like this C construct:
                >
                >struct DictStruct
                > {
                > char *KeyArray[<somenumber>];
                > ValType ValArray[<somenumber>];
                > }[/color]

                Sort of, in a sense, but not quite. The types of items within the
                dictionary (both key and value) are not constrained. This is no
                problem as all Python objects are held using a reference scheme. So
                (assuming that a pair-object scheme is used in the dictionary) the
                struct will be defined much like...

                struct DictPair
                {
                PyObject *Key;
                PyObject *Data;
                }

                The items will almost certainly be held in a single 'array' of pair
                objects.

                The Python dictionaries use hashing. I don't know the details of the
                hashing used. Anyway, there are many ways to handle dictionary-like
                data structures.

                C++ has the std::map template, which uses a form of binary tree called
                a red-black tree (the red-black refers to a partial balancing system).

                Tree-based mappings can be slightly more flexible than hashing (for
                instance, a tree can be 'walked' in sorted order - the hashing process
                does not preserve relative ordering) but hashing tends to be faster.

                An interesting approach to dictionary-like objects is the ternary
                tree. This is rather like cascading binary trees - each subtree
                identifies one character and cascades into the subtree that finds the
                next character. The third link in the 'ternary' is the 'I've
                identified one character and moving on to the next' link.

                A ternary tree can be faster than hashing - in particular at deciding
                that a particular key is not present (hashing needs to calculate a
                hash over the whole tree before looking up the result - a ternary tree
                search may fail at the first character, without looking at the whole
                key).

                Digital trees (or tries, IIRC) can be even faster still - but wastes a
                lot of memory. In a digital tree, each node has an array subscripted
                by character/digit code of pointers to the next subtree.

                Multiway trees (most often used on disk for database indexes - B trees
                and B+ trees being types of multiway trees) may well be appropriate in
                main store on modern desktop machines with caching and virtual memory.

                If these options are just too complicated, sorted arrays can be
                extremely effective as long as you don't have too many items. The C++
                library includes std::vector (a resizable array) and binary search
                algorithms which can ease the implementation.


                Odds are, if you are translating Python to C++, you should use an
                std::map to replace a dictionary. There are some different
                assumptions, though. Dictionaries assume that the key is hashable, but
                don't require relative comparisons ('<', '<=' etc). The std::map
                requires relative comparisons.

                This simply arises out of the different approaches - Python using
                hashing and C++ using binary trees. Going from Python to C++, 99 times
                out of 100 this is not a problem. From C++ to Python there is the
                minor issue that Python dictionaries can't be _directly_ iterated in
                sorted order, though there are certainly easy ways around that.


                --
                Steve Horne

                steve at ninereeds dot fsnet dot co dot uk

                Comment

                • Stephen Horne

                  #9
                  Re: Another Python rookie trying to port to C/C++ question.

                  On Thu, 25 Sep 2003 15:19:00 -0700, Brian Quinlan <brian@sweetapp .com>
                  wrote:
                  [color=blue][color=green]
                  >> Well, since I'm not worried about "flexible", have my doubts about the
                  >> "much slower" part,[/color]
                  >
                  >Your implementation is O(n) with the number of entries while the Python
                  >implementati on is O(log(n)). If n is small then your implementation
                  >might be acceptably fast. Can't you just use the STL hash_map template
                  >class?[/color]

                  Are you sure Python dictionaries are O(log n)? - Remember, they are
                  based on hashing.

                  Simple hashing is O(1). Handling of collisions and stuff will affect
                  that (and I'm no expert on hashing techniques) but I can't help
                  wondering if you're confusing Pythons hashing with tree-based
                  techniques.


                  --
                  Steve Horne

                  steve at ninereeds dot fsnet dot co dot uk

                  Comment

                  • Tom Anderson

                    #10
                    Re: Another Python rookie trying to port to C/C++ question.

                    On Thu, 25 Sep 2003, Don Bruder wrote:
                    [color=blue]
                    > [...] As such, the details *MUST* get clear if I'm to create equivalent
                    > functionality. If only because I need to read data that was written
                    > (presumably...) by a Python program. Hence, it becomes neccesary to get
                    > down "in the muck" and actually know what's going on behind the nice,
                    > tidy "dict.whosit.ge t()" (or whatever other operators might be involved)
                    > interface that Python provides.[/color]

                    is this data in 'pickle' format (as opposed to text, XML, or some specific
                    binary format)? if it is, reading it in C/C++ is going to be very hard
                    indeed; you might consider writing a small python program to read it and
                    convert it to something more tractable. if it isn't, you don't need to
                    'get down in the muck'; you just need to write a C program to do the job
                    that needs to be done.

                    if pickling is entirely new to you, then your data probably isn't pickled,
                    so you don't need to worry about it.
                    [color=blue][color=green]
                    > > Off topic (or not): A single person can start a discussion but not
                    > > limit it, I think that's a feature of newsgroups rather than an
                    > > annoyance.[/color]
                    >
                    > Perhaps, and perhaps not. Guess it depends on your perspective. Having a
                    > stream of insult and invective dumped into your mailbox because you've
                    > made it plain that you don't consider somebody's "pet" language the
                    > be-all and end-all of computer programming is hardly what I'd call a
                    > "feature".. .[/color]

                    i'm sorry that happened; i hope you won't think badly of python and
                    pythoneers because of one moron. discreetly let the relevant people know
                    who he is and we'll fetch the Comfy Chair ... 8)

                    tom

                    --
                    Things fall apart - it's scientific

                    Comment

                    • Tom Anderson

                      #11
                      Re: Another Python rookie trying to port to C/C++ question.

                      On Thu, 25 Sep 2003, Don Bruder wrote:
                      [color=blue]
                      > In article <8yoc1wwt.fsf@p ython.net>,
                      > Thomas Heller <theller@python .net> wrote:
                      >[color=green]
                      > > Don Bruder <dakidd@sonic.n et> writes:
                      > >
                      > > [rewriting a Python program in C/C++][color=darkred]
                      > > > Something like this C construct:
                      > > >
                      > > > struct DictStruct
                      > > > {
                      > > > char *KeyArray[<somenumber>];
                      > > > ValType ValArray[<somenumber>];
                      > > > }
                      > > >[/color]
                      > > [...][color=darkred]
                      > > > Similarly, would something like:
                      > > >
                      > > > struct DictStruct
                      > > > {
                      > > > char Key[25];
                      > > > ValType Value;
                      > > > DictStruct *PrevDictEntry;
                      > > > DictStruct *NextDictEntry;
                      > > > }
                      > > >
                      > > > [...] end up behaving at least reasonably like a Python "dict"?[/color]
                      > >
                      > > Yes.
                      > >
                      > > Except that it would be less flexible, *much* slower, and
                      > > probably much more buggy.[/color]
                      >
                      > Well, since I'm not worried about "flexible", have my doubts about the
                      > "much slower" part,[/color]

                      think again. python's dictionaries use an efficient implementation, are
                      written in C, and have been tuned over the course of several years. your
                      sketched design uses a very slow implementation, and is brand new.
                      [color=blue]
                      > and have no reason to believe that choice of language automatically
                      > makes a routine buggy,[/color]

                      it's maturity that matters in this case; you will make mistakes, just like
                      the python (and STL) implementers did, but they've already had years to
                      find and fix them.

                      if you don't fancy using STL (which will be the case if you're steering
                      clear of C++ - and might well be even if you weren't!), at least go and
                      read up on how to implement dictionaries; they're a very well-studied
                      abstract type. hashtables are simple and efficient, and binary trees are
                      good too. there are more complicated things like ternary trees, red-black
                      trees, Patricia trees (my personal favourite :) ) and skip lists, but
                      you'll probably be fine with hashtables. if you can lay your hands on a
                      good data structures and algorithms textbook, you'll be fine (Sedgewick's
                      'Algorithms in C' is where i learnt my craft).

                      tom

                      --
                      Things fall apart - it's scientific

                      Comment

                      • Stephen Horne

                        #12
                        Re: Another Python rookie trying to port to C/C++ question.

                        On Fri, 26 Sep 2003 00:33:46 +0100, Tom Anderson
                        <twic@urchin.ea rth.li> wrote:
                        [color=blue]
                        >Patricia trees (my personal favourite :)[/color]

                        Oh goody - Something new to look up!


                        --
                        Steve Horne

                        steve at ninereeds dot fsnet dot co dot uk

                        Comment

                        • Duncan Booth

                          #13
                          Re: Another Python rookie trying to port to C/C++ question.

                          Stephen Horne <$$$$$$$$$$$$$$ $$$@$$$$$$$$$$$ $$$$$$$$$.co.uk > wrote in
                          news:79t6nvceo8 0m6cumskpqdmipa 921da054o@4ax.c om:
                          [color=blue]
                          > On Thu, 25 Sep 2003 15:19:00 -0700, Brian Quinlan <brian@sweetapp .com>
                          > wrote:
                          >[color=green][color=darkred]
                          >>> Well, since I'm not worried about "flexible", have my doubts about the
                          >>> "much slower" part,[/color]
                          >>
                          >>Your implementation is O(n) with the number of entries while the Python
                          >>implementatio n is O(log(n)). If n is small then your implementation
                          >>might be acceptably fast. Can't you just use the STL hash_map template
                          >>class?[/color]
                          >
                          > Are you sure Python dictionaries are O(log n)? - Remember, they are
                          > based on hashing.
                          >
                          > Simple hashing is O(1). Handling of collisions and stuff will affect
                          > that (and I'm no expert on hashing techniques) but I can't help
                          > wondering if you're confusing Pythons hashing with tree-based
                          > techniques.
                          >[/color]

                          Python dictionaries are effectively O(1). Dictionaries are never more than
                          2/3rds full, so on average a dictionary lookup only needs a couple of
                          probes. Python's collision resolution is designed such that in most cases
                          lookups that collide on the first attempt will diverge for the next
                          attempt.

                          As some of the other posters have said dictionaries in Python have been
                          tuned over many years, and since the whole of Python is based on fast
                          dictionary lookups it is very unlikely that the OP will get close to the
                          same efficiency. That may not matter though as Python dictionaries are
                          tuned for general use, so a specific application might be able to do better
                          by losing some of the generality.

                          Some of the tuning extends beyond dictionaries, for example strings are
                          commonly used as dictionary keys, and Python avoids recalculating hash
                          values more than once for each string. It is unlikely that a C or C++
                          programmer would go to the effort of using a string type that cached hash
                          values throughout their application, so for many uses, string lookups in
                          dictionaries will almost certainly be slower (although there could be gains
                          elsewhere).

                          --
                          Duncan Booth duncan@rcp.co.u k
                          int month(char *p){return(1248 64/((p[0]+p[1]-p[2]&0x1f)+1)%12 )["\5\x8\3"
                          "\6\7\xb\1\x9\x a\2\0\4"];} // Who said my code was obscure?

                          Comment

                          Working...