Array of dict or lists or ....?

  • Pat

    Array of dict or lists or ....?


    I can't figure out how to set up a Python data structure to read in data
    that looks something like this (albeit somewhat simplified and contrived):


    States
    Counties
    Schools
    Classes
    Max Allowed Students
    Current enrolled Students

    Nebraska, Wabash, Newville, Math, 20, 0
    Nebraska, Wabash, Newville, Gym, 400, 0
    Nebraska, Tingo, Newfille, Gym, 400, 0
    Ohio, Dinger, OldSchool, English, 10, 0

    With each line I read in, I would create a hash entry and increment the
    number of enrolled students.

    I wrote a routine in Perl using arrays of hash tables (but the syntax
    was a bear) that allowed me to read in the data and with those arrays of
    hash tables to arrays of hash tables almost everything was dynamically
    assigned.

    I was able to fill in the hash tables and determine if any school class
    (e.g. Gym) had exceeded the number of max students or if no students had
    enrolled.

    No, this is not a classroom project. I really need this for my job.
    I'm converting my Perl program to Python and this portion has me stumped.

    The reason why I'm converting a perfectly working program is because no
    one else knows Perl or Python either (but I believe that someone new
    would learn Python quicker than Perl) and the Perl program has become
    huge and is continuously growing.
  • Tim Chase

    #2
    Re: Array of dict or lists or ....?

    > I can't figure out how to set up a Python data structure to read in data
    > that looks something like this (albeit somewhat simplified and contrived):
    >
    > States
    > Counties
    > Schools
    > Classes
    > Max Allowed Students
    > Current enrolled Students
    >
    > Nebraska, Wabash, Newville, Math, 20, 0
    > Nebraska, Wabash, Newville, Gym, 400, 0
    > Nebraska, Tingo, Newfille, Gym, 400, 0
    > Ohio, Dinger, OldSchool, English, 10, 0
    >
    > With each line I read in, I would create a hash entry and increment the
    > number of enrolled students.

    A Python version of what you describe:

    class TooManyAttendants(Exception): pass
    class Attendence(object):
        def __init__(self, max):
            self.max = int(max)
            self.total = 0
        def accrue(self, other):
            self.total += int(other)
            if self.total > self.max: raise TooManyAttendants
        def __str__(self):
            return "%s/%s" % (self.max, self.total)
        __repr__ = __str__

    data = {}
    for i, line in enumerate(file("input.txt")):
        print line,
        # split the six comma-separated fields and trim whitespace
        state, county, school, cls, max_students, enrolled = map(
            lambda s: s.strip(),
            line.rstrip("\r\n").split(",")
            )
        try:
            # one dict level each for state, county and school
            data.setdefault(
                state, {}).setdefault(
                county, {}).setdefault(
                school, {}).setdefault(
                cls, Attendence(max_students)).accrue(enrolled)
        except TooManyAttendants:
            print "Too many Attendants in line %i" % (i + 1)
    print repr(data)


    You can then access things like

    a = data["Nebraska"]["Wabash"]["Newville"]["Math"]
    print a.max, a.total

    If capitalization varies, you may have to do something like

    data.setdefault(
        state.upper(), {}).setdefault(
        county.upper(), {}).setdefault(
        school.upper(), {}).setdefault(
        cls.upper(), Attendence(max_students)).accrue(enrolled)

    to make sure they're normalized into the same groupings.

    -tkc







    • bearophileHUGS@lycos.com

      #3
      Re: Array of dict or lists or ....?

      Tim Chase:
      > __repr__ = __str__

      I don't know if that's a good practice.

      > try:
      >     data.setdefault(
      >         state, {}).setdefault(
      >         county, {}).setdefault(
      >         school, {}).setdefault(
      >         cls, Attendence(max_students)).accrue(enrolled)
      > except TooManyAttendants:

      I suggest decompressing that part a little, to make it a little more
      readable.
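
One possible way to decompress the chain, sketched in Python 3 rather than the Python 2 used in the thread. The `record` helper and the `Attendance` spelling are made up for this illustration, and it assumes a school level so that a four-key lookup (state, county, school, class) works:

```python
class TooManyAttendants(Exception):
    pass


class Attendance:
    """Hypothetical Python 3 stand-in for the Attendence class in the thread."""

    def __init__(self, max_students):
        self.max = int(max_students)
        self.total = 0

    def accrue(self, count):
        self.total += int(count)
        if self.total > self.max:
            raise TooManyAttendants


def record(data, state, county, school, cls, max_students, enrolled):
    # One named step per nesting level instead of a single chained expression.
    counties = data.setdefault(state, {})
    schools = counties.setdefault(county, {})
    classes = schools.setdefault(school, {})
    if cls not in classes:
        classes[cls] = Attendance(max_students)
    classes[cls].accrue(enrolled)


data = {}
record(data, "Nebraska", "Wabash", "Newville", "Math", 20, 5)
record(data, "Nebraska", "Wabash", "Newville", "Math", 20, 7)
math_class = data["Nebraska"]["Wabash"]["Newville"]["Math"]
print(math_class.max, math_class.total)
```

Unlike the chained `setdefault` call, this version only constructs an `Attendance` object when the class is seen for the first time.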

      Bye,
      bearophile


      • Tim Chase

        #4
        Re: Array of dict or lists or ....?

        >> __repr__ = __str__
        >
        > I don't know if that's a good practice.

        I've seen it in a couple places, and it's pretty explicit what
        it's doing.

        >> try:
        >>     data.setdefault(
        >>         state, {}).setdefault(
        >>         county, {}).setdefault(
        >>         school, {}).setdefault(
        >>         cls, Attendence(max_students)).accrue(enrolled)
        >> except TooManyAttendants:
        >
        > I suggest to decompress that part a little, to make it a little more
        > readable.

        I played around with the formatting and didn't really like any of
        the formatting I came up with. My other possible alternatives were:

        try:
            data \
                .setdefault(state, {}) \
                .setdefault(county, {}) \
                .setdefault(school, {}) \
                .setdefault(cls, Attendence(max_students)) \
                .accrue(enrolled)
        except TooManyAttendants:

        or

        try:
            (data
                .setdefault(state, {})
                .setdefault(county, {})
                .setdefault(school, {})
                .setdefault(cls, Attendence(max_students))
                ).accrue(enrolled)
        except TooManyAttendants:

        Both accentuate the setdefault() calls grouped with their
        parameters, which can be helpful. Which one is "better" is a
        matter of personal preference:

        * no extra characters but hard to read
        * backslashes, or
        * an extra pair of parens

        -tkc





        • Gabriel Genellina

          #5
          Re: Array of dict or lists or ....?

          On Mon, 06 Oct 2008 22:52:29 -0300, Tim Chase
          <python.list@tim.thechases.com> wrote:
          >>> __repr__ = __str__
          >> [bearophileHUGS@lycos.com wrote]
          >> I don't know if that's a good practice.
          > I've seen it in a couple places, and it's pretty explicit what it's
          > doing.
          __repr__ is used as a fallback for __str__, so just defining __repr__ (and
          leaving out __str__) is enough.
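
A minimal sketch of that fallback, assuming Python 3. The class names are made up for illustration:

```python
class OnlyRepr:
    """Hypothetical class that defines __repr__ and no __str__."""

    def __init__(self, maximum, total):
        self.maximum = maximum
        self.total = total

    def __repr__(self):
        return "%s/%s" % (self.maximum, self.total)


class OnlyStr:
    """Hypothetical class that defines __str__ and no __repr__."""

    def __str__(self):
        return "only str"


a = OnlyRepr(20, 3)
print(repr(a))  # uses __repr__
print(str(a))   # the default __str__ falls back to __repr__
print([a])      # containers display their items with repr()

b = OnlyStr()
print(str(b))   # uses __str__
print(repr(b))  # no fallback in this direction: the default <... object at 0x...> form
```

The fallback only runs one way, which is why defining `__repr__` alone covers both `str()` and `repr()`, while defining `__str__` alone does not.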

          --
          Gabriel Genellina


          • Hrvoje Niksic

            #6
            Re: Array of dict or lists or ....?

            Tim Chase <python.list@tim.thechases.com> writes:
            >> __repr__ = __str__
            >>
            >I don't know if that's a good practice.
            >
            I've seen it in a couple places, and it's pretty explicit what it's
            doing.
            But what's the point? Simply define __repr__, and both repr and str
            will pick it up.


            • Pat

              #7
              Re: Array of dict or lists or ....?

              Dennis Lee Bieber wrote:
              On Mon, 06 Oct 2008 19:45:07 -0400, Pat <Pat@junk.com> declaimed the
              following in comp.lang.python:
              >
              >I can't figure out how to set up a Python data structure to read in data
              >that looks something like this (albeit somewhat simplified and contrived):
              >>
              >>
              >States
              > Counties
              > Schools
              > Classes
              > Max Allowed Students
              > Current enrolled Students
              >>
              >Nebraska, Wabash, Newville, Math, 20, 0
              >Nebraska, Wabash, Newville, Gym, 400, 0
              >Nebraska, Tingo, Newfille, Gym, 400, 0
              >Ohio, Dinger, OldSchool, English, 10, 0
              >
              <snip>
              >
              The structure looks more suited to a database -- maybe SQLite since
              the interface is supplied with the newer versions of Python (and
              available for older versions).
              I don't understand why I need a database when it should just be a matter
              of defining the data structure. I used a fictional example to make it
              easier to (hopefully) convey how the data is laid out.

              One of the routines in the actual program checks a few thousand
              computers to verify that certain processes are running. I didn't want
              to complicate my original question by going through all of the gory
              details (multiple userids running many processes with some of the
              processes having the same name). To save time, I fork a process for
              each computer that I'm checking. It seems to me that banging away at a
              database would greatly slow down the program and make the program more
              complicated.

              The Perl routine works fine and I'd like to emulate that behavior but
              since I've just started learning Python I don't know the syntax for
              designing the data structure. I would really appreciate it if someone
              could point me in the right direction.


              • Barak, Ron

                #8
                RE: Array of dict or lists or ....?

                Would the following be a suitable data structure?
                ....
                struct = {}
                struct["Nebraska"] = "Wabash"
                struct["Nebraska"]["Wabash"] = "Newville"
                struct["Nebraska"]["Wabash"]["Newville"]["topics"] = "Math"
                struct["Nebraska"]["Wabash"]["Newville"]["Math"]["Max Allowed Students"] = 20
                struct["Nebraska"]["Wabash"]["Newville"]["Math"]["Current enrolled Students"] = 0
                ....

                Have an easy Yom Kippur,
                Ron.


                • Aaron \Castironpi\ Brady

                  #9
                  Re: Array of dict or lists or ....?

                  On Oct 7, 10:16 am, "Barak, Ron" <Ron.Ba...@lsi.com> wrote:
                  > Would the following be a suitable data structure?
                  > ...
                  > struct = {}
                  > struct["Nebraska"] = "Wabash"
                  > struct["Nebraska"]["Wabash"] = "Newville"
                  > struct["Nebraska"]["Wabash"]["Newville"]["topics"] = "Math"
                  > struct["Nebraska"]["Wabash"]["Newville"]["Math"]["Max Allowed Students"] = 20
                  > struct["Nebraska"]["Wabash"]["Newville"]["Math"]["Current enrolled Students"] = 0
                  > ...

                  That's not quite right as stated.

                  >>> struct = {}
                  >>> struct["Nebraska"] = "Wabash"
                  >>> struct["Nebraska"]["Wabash"] = "Newville"
                  Traceback (most recent call last):
                    File "<stdin>", line 1, in <module>
                  TypeError: 'str' object does not support item assignment
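
The error arises because `struct["Nebraska"]` holds a string, and a string cannot be indexed for assignment. A minimal sketch of one way around it, assuming Python 3: every intermediate level is itself a dict. The `"max"` and `"current"` key names are chosen here for brevity and are not from the post being quoted:

```python
struct = {}
struct["Nebraska"] = "Wabash"  # the value is a str, not a container
try:
    struct["Nebraska"]["Wabash"] = "Newville"
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# Making each intermediate level a dict lets the chained indexing work.
struct = {}
struct["Nebraska"] = {}
struct["Nebraska"]["Wabash"] = {}
struct["Nebraska"]["Wabash"]["Newville"] = {}
struct["Nebraska"]["Wabash"]["Newville"]["Math"] = {"max": 20, "current": 0}
print(struct)
```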


                  • Reedick, Andrew

                    #10
                    RE: Array of dict or lists or ....?

                    -----Original Message-----
                    From: python-list-bounces+jr9445=att.com@python.org
                    [mailto:python-list-bounces+jr9445=att.com@python.org] On Behalf Of Pat
                    Sent: Tuesday, October 07, 2008 10:16 PM
                    To: python-list@python.org
                    Subject: Re: Array of dict or lists or ....?

                    The Perl routine works fine and I'd like to emulate that behavior but
                    since I've just started learning Python I don't know the syntax for
                    designing the data structure. I would really appreciate it if someone
                    could point me in the right direction.



                    states = {}

                    if 'georgia' not in states:
                        states['georgia'] = {}

                    states['georgia']['fulton'] = {}
                    states['georgia']['fulton']['ps101'] = {}
                    states['georgia']['fulton']['ps101']['math'] = {}
                    states['georgia']['fulton']['ps101']['math']['max'] = 100
                    states['georgia']['fulton']['ps101']['math']['current'] = 33


                    states['georgia']['dekalb'] = {}
                    states['georgia']['dekalb']['ps202'] = {}
                    states['georgia']['dekalb']['ps202']['english'] = {}
                    states['georgia']['dekalb']['ps202']['english']['max'] = 500
                    states['georgia']['dekalb']['ps202']['english']['current'] = 44

                    print states
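
Once the nested dicts are filled in, the check described in the original question (classes over their maximum, or with nobody enrolled) becomes a walk over the four levels. A rough sketch in Python 3; the `find_problems` name and the sample numbers are invented for illustration:

```python
# Sample data in the same shape as above; the values are made up.
states = {
    "georgia": {
        "fulton": {"ps101": {"math": {"max": 100, "current": 33}}},
        "dekalb": {
            "ps202": {
                "english": {"max": 500, "current": 0},
                "gym": {"max": 40, "current": 41},
            }
        },
    }
}


def find_problems(states):
    """Return (over_full, empty) lists of (state, county, school, class) keys."""
    over_full, empty = [], []
    for state, counties in states.items():
        for county, schools in counties.items():
            for school, classes in schools.items():
                for cls, info in classes.items():
                    key = (state, county, school, cls)
                    if info["current"] > info["max"]:
                        over_full.append(key)
                    elif info["current"] == 0:
                        empty.append(key)
    return over_full, empty


over_full, empty = find_problems(states)
print("over max:", over_full)
print("no students:", empty)
```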



                    • George Sakkis

                      #11
                      Re: Array of dict or lists or ....?

                      On Oct 7, 10:15 pm, Pat <P...@junk.net> wrote:
                      Dennis Lee Bieber wrote:
                      On Mon, 06 Oct 2008 19:45:07 -0400, Pat <P...@junk.com> declaimed the
                      following in comp.lang.python:
                      >
                      I can't figure out how to set up a Python data structure to read in data
                      that looks something like this (albeit somewhat simplified and contrived):
                      >
                      States
                      Counties
                      Schools
                      Classes
                      Max Allowed Students
                      Current enrolled Students
                      >
                      Nebraska, Wabash, Newville, Math, 20, 0
                      Nebraska, Wabash, Newville, Gym, 400, 0
                      Nebraska, Tingo, Newfille, Gym, 400, 0
                      Ohio, Dinger, OldSchool, English, 10, 0
                      >
                      <snip>
                      >
                      The structure looks more suited to a database -- maybe SQLite since
                      the interface is supplied with the newer versions of Python (and
                      available for older versions).
                      Seconded.
                      I don't understand why I need a database when it should just be
                      a matter of defining the data structure.
                      Picking an appropriate data structure depends on the kind of
                      functionality you want to provide. So far you basically described just
                      one requirement: keep a tally of how many students are in each class
                      and compare it to the max allowed (and zero). If that's the only kind
                      of query you want to run against your data, there's no reason to index
                      separately each state, county, or school; all you care about are
                      classes. A simple data structure that satisfies perfectly the
                      requirement could then be:

                      # mapping of {class-info : (max,enrolled)}

                      data = {
                      ('Nebraska', 'Wabash', 'Newville', 'Math') : (20, 0),
                      ('Nebraska', 'Wabash', 'Newville', 'Gym') : (400, 0),
                      ('Nebraska', 'Tingo', 'Newville', 'Gym') : (400, 0),
                      ('Ohio', 'Dinger', 'OldSchool', 'English') : (10, 0),
                      }

                      Of course this data structure is pretty bad at answering a query like
                      "how many classes are there in Nebraska" or "what's the average number
                      of enrolled students in Newville". The more general information you
                      might want to get from the data, the more obvious it becomes that you
                      need a real database.
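
To make the trade-off concrete, here is a small sketch (Python 3) of working with the tuple-keyed mapping above. The `enroll` helper is hypothetical and the enrolment number is made up:

```python
# mapping of {class-info: (max, enrolled)}, copied from the post above
data = {
    ("Nebraska", "Wabash", "Newville", "Math"): (20, 0),
    ("Nebraska", "Wabash", "Newville", "Gym"): (400, 0),
    ("Nebraska", "Tingo", "Newville", "Gym"): (400, 0),
    ("Ohio", "Dinger", "OldSchool", "English"): (10, 0),
}


def enroll(data, key, count=1):
    """Add `count` students to one class; tuples are immutable, so rebuild the value."""
    max_allowed, enrolled = data[key]
    data[key] = (max_allowed, enrolled + count)


enroll(data, ("Nebraska", "Wabash", "Newville", "Math"), 21)

# The one stated requirement is a single pass over the values.
over_full = [key for key, (mx, cur) in data.items() if cur > mx]
empty = [key for key, (mx, cur) in data.items() if cur == 0]

# A per-state question has no index to lean on: it scans and filters every key.
nebraska_classes = sum(1 for key in data if key[0] == "Nebraska")

print(over_full)
print(len(empty), nebraska_classes)
```

Each new kind of question means another hand-written scan like the last one, which is the point about reaching for a database as the queries become more general.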

                      HTH,
                      George


                      • Ben Finney

                        #12
                        Re: Array of dict or lists or ....?

                        George Sakkis <george.sakkis@gmail.com> writes:
                        On Oct 7, 10:15 pm, Pat <P...@junk.net> wrote:
                        I don't understand why I need a database when it should just be a
                        matter of defining the data structure.
                        >
                        Picking an appropriate data structure depends on the kind of
                        functionality you want to provide.
                        […]
                        The more general information you might want to get from the data,
                        the more obvious it becomes that you need a real database.
                        Thanks very much for posting this answer; I tried to do something
                        similar but couldn't get at the essential points the way you did here.

                        Perhaps the original poster is confusing “you should use a database”
                        with “you should use a database stored in a fully-concurrent
                        dedicated database management system”.

                        Far from it: with Python 2.5 you have SQLite (in the ‘sqlite3’
                        module), which would be ideal for implementing a powerful relational
                        SQL database used directly by one program instance, without needing a
                        full-blown database management system in a separately-administrated
                        server application.
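
A rough sketch of what that could look like with the standard `sqlite3` module and an in-memory database, written for Python 3. The table and column names, and the enrolment figure, are assumptions made for illustration:

```python
import sqlite3

# The four sample rows from the original question.
rows = [
    ("Nebraska", "Wabash", "Newville", "Math", 20, 0),
    ("Nebraska", "Wabash", "Newville", "Gym", 400, 0),
    ("Nebraska", "Tingo", "Newfille", "Gym", 400, 0),
    ("Ohio", "Dinger", "OldSchool", "English", 10, 0),
]

conn = sqlite3.connect(":memory:")  # no server and no file on disk
conn.execute(
    """CREATE TABLE classes (
           state TEXT, county TEXT, school TEXT, class_name TEXT,
           max_students INTEGER, enrolled INTEGER,
           PRIMARY KEY (state, county, school, class_name))"""
)
conn.executemany("INSERT INTO classes VALUES (?, ?, ?, ?, ?, ?)", rows)

# Increment the tally for one class (25 is a made-up number).
conn.execute(
    "UPDATE classes SET enrolled = enrolled + ? "
    "WHERE state = ? AND county = ? AND school = ? AND class_name = ?",
    (25, "Nebraska", "Wabash", "Newville", "Math"),
)

over_full = conn.execute(
    "SELECT state, county, school, class_name FROM classes "
    "WHERE enrolled > max_students"
).fetchall()
empty_count = conn.execute(
    "SELECT COUNT(*) FROM classes WHERE enrolled = 0"
).fetchone()[0]
nebraska_count = conn.execute(
    "SELECT COUNT(*) FROM classes WHERE state = ?", ("Nebraska",)
).fetchone()[0]

print(over_full, empty_count, nebraska_count)
conn.close()
```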

                        --
                        \ “Patience, n. A minor form of despair, disguised as a virtue.” |
                        `\ —Ambrose Bierce, _The Devil's Dictionary_, 1906 |
                        _o__) |
                        Ben Finney


                        • Gabriel Genellina

                          #13
                          Re: Array of dict or lists or ....?

                          On Tue, 07 Oct 2008 23:15:54 -0300, Pat <Pat@junk.net> wrote:
                          Dennis Lee Bieber wrote:
                          >On Mon, 06 Oct 2008 19:45:07 -0400, Pat <Pat@junk.com> declaimed the
                          >following in comp.lang.python:
                          >>
                          >>I can't figure out how to set up a Python data structure to read in
                          >>data that looks something like this (albeit somewhat simplified and
                          >>contrived):
                          >>>
                          >>>
                          >>States
                          >> Counties
                          >> Schools
                          >> Classes
                          >> Max Allowed Students
                          >> Current enrolled Students
                          >>>
                          >>Nebraska, Wabash, Newville, Math, 20, 0
                          >>Nebraska, Wabash, Newville, Gym, 400, 0
                          >>Nebraska, Tingo, Newfille, Gym, 400, 0
                          >>Ohio, Dinger, OldSchool, English, 10, 0
                          > <snip>
                          >>
                          >
                          >The structure looks more suited to a database -- maybe SQLite since
                          >the interface is supplied with the newer versions of Python (and
                          >available for older versions).
                          >
                          I don't understand why I need a database when it should just be a matter
                          of defining the data structure. I used a fictional example to make it
                          easier to (hopefully) convey how the data is laid out.
                          You don't need a full-blown-multiuser-concurrent-petabyte-capable-server
                          database, just one that does the job. SQLite is very small and comes with
                          Python 2.5.
                          The Perl routine works fine and I'd like to emulate that behavior but
                          since I've just started learning Python I don't know the syntax for
                          designing the data structure. I would really appreciate it if someone
                          could point me in the right direction.
                          So none of the previously posted alternatives worked for you?

                          --
                          Gabriel Genellina


                          • Scott David Daniels

                            #14
                            Re: Array of dict or lists or ....?

                            Pat wrote:
                            > I can't figure out how to set up a Python data structure to read in data
                            > that looks something like this (albeit somewhat simplified and contrived):
                            >
                            > States
                            > Counties
                            > Schools
                            > Classes
                            > Max Allowed Students
                            > Current enrolled Students
                            >
                            > Nebraska, Wabash, Newville, Math, 20, 0
                            > Nebraska, Wabash, Newville, Gym, 400, 0
                            > Nebraska, Tingo, Newfille, Gym, 400, 0
                            > Ohio, Dinger, OldSchool, English, 10, 0
                            >
                            > With each line I read in, I would create a hash entry and increment the
                            > number of enrolled students.

                            You might want something like this:
                            >>> import collections, functools
                            >>> int_dict = functools.partial(collections.defaultdict, int)
                            >>> curr = functools.partial(collections.defaultdict, int)
                            >>> # builds a dict-maker where t = curr(); t['name'] += 1 "works"
                            >>> for depth in range(4):
                            ...     # add a layer with a default of the preceding "type"
                            ...     curr = functools.partial(collections.defaultdict, curr)
                            ...
                            >>> base = curr() # actually make one
                            >>> base['Nebraska']['Wabash']['Newville']['Math']['max'] = 20
                            >>> base['Nebraska']['Wabash']['Newville']['Math']['curr'] += 1
                            >>> base['Nebraska']['Wabash']['Newville']['Math']['curr']
                            1
                            >>> base['Nebraska']['Wabash']['Newville']['English']['curr']
                            0
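
As a sketch of how the sample lines from the question might be loaded into that structure (Python 3; the `maker` and `leaf` names are chosen here for illustration), with one caveat worth knowing about `defaultdict`: merely looking up a missing key creates it.

```python
import collections
import functools

# Same construction as in the session above: four dict layers over a defaultdict(int).
maker = functools.partial(collections.defaultdict, int)
for _ in range(4):
    maker = functools.partial(collections.defaultdict, maker)

base = maker()

lines = [
    "Nebraska, Wabash, Newville, Math, 20, 0",
    "Nebraska, Wabash, Newville, Gym, 400, 0",
    "Nebraska, Tingo, Newfille, Gym, 400, 0",
    "Ohio, Dinger, OldSchool, English, 10, 0",
]

for line in lines:
    state, county, school, cls, max_students, enrolled = [
        field.strip() for field in line.split(",")
    ]
    leaf = base[state][county][school][cls]  # missing levels are created on the way down
    leaf["max"] = int(max_students)
    leaf["curr"] += int(enrolled)

print(base["Ohio"]["Dinger"]["OldSchool"]["English"]["max"])

# Caveat: reading a key that does not exist inserts it at every level.
before = "Texas" in base
base["Texas"]["Nowhere"]["NoSchool"]["Art"]["curr"]
after = "Texas" in base
print(before, after)
```

If accidental inserts would be a problem, testing with `in` before indexing (or using `.get()`) avoids them.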


                            --Scott David Daniels
                            Scott.Daniels@Acm.Org
