* 'struct-like' list *

  • Ernesto

    * 'struct-like' list *

    I'm still fairly new to python, so I need some guidance here...

    I have a text file with lots of data. I only need some of the data. I
    want to put the useful data into an [array of] struct-like
    mechanism(s). The text file looks something like this:

    [BUNCH OF NOT-USEFUL DATA....]

    Name: David
    Age: 108 Birthday: 061095 SocialSecurity: 476892771999

    [MORE USELESS DATA....]

    Name........

    I would like to have an array of "structs." Each struct has

    struct Person{
        string Name;
        int Age;
        int Birthday;
        int SS;
    }

    I want to go through the file, filling up my list of structs.

    My problems are:

    1. How to search for the keywords "Name:", "Age:", etc. in the file...
    2. How to implement some organized "list of lists" for the data
    structure.

    Any help is much appreciated.

  • Rene Pijlman

    #2
    Re: * 'struct-like' list *

    Ernesto:
    >1. How to search for the keywords "Name:", "Age:", etc. in the file...

    You could use regular expression matching (the `re` module).
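A minimal Python 3 sketch of the regular-expression route, assuming the layout shown in the question (keyword, colon, optional spaces, value). The names `SAMPLE`, `FIELD_PATTERNS` and `find_fields` are hypothetical, and only the first occurrence of each keyword is returned:

```python
import re

# Hypothetical sample mirroring the layout in the original question.
SAMPLE = """[BUNCH OF NOT-USEFUL DATA....]

Name: David
Age: 108 Birthday: 061095 SocialSecurity: 476892771999

[MORE USELESS DATA....]
"""

# One small pattern per keyword: the keyword, optional spaces, then the value.
FIELD_PATTERNS = {
    "Name": re.compile(r"Name:\s*(.+)"),  # '.' stops at the end of the line
    "Age": re.compile(r"Age:\s*(\d+)"),
    "Birthday": re.compile(r"Birthday:\s*(\d+)"),
    "SocialSecurity": re.compile(r"SocialSecurity:\s*(\d+)"),
}


def find_fields(text):
    """Return {keyword: first value found} for each keyword present in text."""
    found = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[key] = match.group(1).strip()
    return found


print(find_fields(SAMPLE))
```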


    Or plain string searches, using string methods such as `find()` and `startswith()`.
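A minimal Python 3 sketch of the plain-string route, using only `str.find`, `str.startswith` and `str.split`. `value_after` is a hypothetical helper, and it assumes each value is a single token without spaces:

```python
def value_after(line, keyword):
    """Return the whitespace-delimited token that follows keyword, or None."""
    start = line.find(keyword)  # -1 when the keyword is absent
    if start == -1:
        return None
    rest = line[start + len(keyword):].split()
    return rest[0] if rest else None


lines = [
    "[BUNCH OF NOT-USEFUL DATA....]",
    "Name: David",
    "Age: 108 Birthday: 061095 SocialSecurity: 476892771999",
]

for line in lines:
    if line.startswith("Name:"):
        print("name:", line[len("Name:"):].strip())
    elif line.startswith("Age:"):
        print("age:", value_after(line, "Age:"),
              "birthday:", value_after(line, "Birthday:"),
              "ss:", value_after(line, "SocialSecurity:"))
```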

    >2. How to implement some organized "list of lists" for the data
    >structure.

    You could make it a list of "bunches" (the Bunch recipe: a tiny class whose instances simply hold named attributes).
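A minimal Python 3 sketch of such a Bunch class; this particular code is illustrative, not the recipe's exact text. On Python 3.3+ the standard `types.SimpleNamespace` provides the same attribute-bag behaviour:

```python
class Bunch:
    """Attribute bag: Bunch(name="David") gives an object with .name."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def __repr__(self):
        fields = ", ".join("%s=%r" % item for item in self.__dict__.items())
        return "Bunch(%s)" % fields


people = []
people.append(Bunch(name="David", age=108, birthday="061095", ss="476892771999"))
people.append(Bunch(name="Fred", age=101, birthday="061065", ss="587903882000"))

for person in people:
    print(person.name, person.age)
print(people[0])
```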


    Or a list of objects of your custom class.
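One way to write such a class, as a sketch assuming Python 3.7+ (`dataclasses`). The fields mirror the C struct in the question, except that the birthday is kept as text, since `int("061095")` would drop the leading zero. Python ints have no fixed size, so the 12-digit number fits in `ss`:

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    birthday: str  # text, so the leading zero in "061095" is kept
    ss: int


persons = [
    Person("David", 108, "061095", 476892771999),
    Person("Fred", 101, "061065", 587903882000),
]
print(persons[0])
# Person(name='David', age=108, birthday='061095', ss=476892771999)
```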

    --
    René Pijlman


    • Schüle Daniel

      #3
      Re: * 'struct-like' list *

      > I would like to have an array of "structs." Each struct has
      >
      > struct Person{
      > string Name;
      > int Age;
      > int Birthday;
      > int SS;
      > }


      The easiest way would be:

      class Person:
          pass

      john = Person()
      david = Person()

      john.name = "John Brown"
      john.age = 35
      # etc.

      Think of john as a namespace whose attributes (that is what we call
      them) are added at runtime.

      A better approach would be to make a real class with a constructor:

      class Person(object):
          def __init__(self, name, age):
              self.name = name
              self.age = age
          def __str__(self):
              return "person name = %s and age = %i" % (self.name, self.age)

      john = Person("john brown", 35)
      print john  # this calls __str__

      >
      > I want to go through the file, filling up my list of structs.
      >
      > My problems are:
      >
      > 1. How to search for the keywords "Name:", "Age:", etc. in the file...
      > 2. How to implement some organized "list of lists" for the data

      This depends on the structure of the file.
      Consider this format:

      New
      Name: John
      Age: 35
      Id: 23242
      New
      Name: xxx
      Age
      Id: 43324
      OtherInfo: foo
      New

      Here you could read it all as one string and split it on "New".

      Here is a small example:

      >>> txt = "fooXbarXfoobar"
      >>> txt.split("X")
      ['foo', 'bar', 'foobar']

      In a more complicated case I would use a regexp, but
      I doubt that is necessary in your case.
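The split-on-"New" idea, sketched in Python 3 for the example format above. `parse_records` is a hypothetical name; the sketch assumes one `Key: value` pair per line and simply skips lines without a colon, such as the bare `Age`:

```python
TEXT = """New
Name: John
Age: 35
Id: 23242
New
Name: xxx
Age
Id: 43324
OtherInfo: foo
New
"""


def parse_records(text, separator="New"):
    """Split text on separator and turn each chunk's 'Key: value' lines into a dict."""
    records = []
    for chunk in text.split(separator):
        record = {}
        for line in chunk.splitlines():
            key, colon, value = line.partition(":")
            if colon:  # lines without a colon (such as a bare "Age") are skipped
                record[key.strip()] = value.strip()
        if record:  # the text before the first separator may be empty
            records.append(record)
    return records


for record in parse_records(TEXT):
    print(record)
```

Splitting on the bare word also splits any value that happens to contain "New" (a name like "Newton", say), so on real data splitting on lines that consist only of "New" would be safer.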

      Regards, Daniel


      • Paul McGuire

        #4
        Re: * 'struct-like' list *

        "Ernesto" <erniedude@gmail.com> wrote in message
        news:1139245389.529742.317110@g43g2000cwa.googlegroups.com...
        > I'm still fairly new to python, so I need some guidance here...
        >
        > I have a text file with lots of data. I only need some of the data. I
        > want to put the useful data into an [array of] struct-like
        > mechanism(s). The text file looks something like this:
        >
        > [BUNCH OF NOT-USEFUL DATA....]
        >
        > Name: David
        > Age: 108 Birthday: 061095 SocialSecurity: 476892771999
        >
        > [MORE USELESS DATA....]
        >
        > Name........
        >
        > I would like to have an array of "structs." Each struct has
        >
        > struct Person{
        > string Name;
        > int Age;
        > int Birthday;
        > int SS;
        > }
        >
        > I want to go through the file, filling up my list of structs.
        >
        > My problems are:
        >
        > 1. How to search for the keywords "Name:", "Age:", etc. in the file...
        > 2. How to implement some organized "list of lists" for the data
        > structure.
        >
        > Any help is much appreciated.
        >
        Ernesto -

        Since you are searching for keywords and matching fields, and trying to
        populate data structures as you go, this sounds like a good fit for
        pyparsing. Pyparsing has built-in features for scanning through text and
        extracting data, with suitably named data fields for accessing later.

        Download pyparsing at http://pyparsing.sourceforge.net.

        -- Paul

        ------------------------------------------------
        from pyparsing import *

        inputData = """[BUNCH OF NOT-USEFUL DATA....]

        Name: David
        Age: 108 Birthday: 061095 SocialSecurity: 476892771999

        [MORE USELESS DATA....]

        Name: Fred
        Age: 101 Birthday: 061065 SocialSecurity: 587903882000

        [MORE USELESS DATA....]

        Name: Barney
        Age: 99 Birthday: 061265 SocialSecurity: 698014993111

        [MORE USELESS DATA....]

        """

        dob = Word(nums, exact=6)
        # this matches your sample data, but I think SSNs are only 9 digits long
        socsecnum = Word(nums, exact=12)

        # define the personalData pattern - use results names to associate
        # field names with matched tokens, can then access data as if they were
        # attributes on an object
        personalData = ( "Name:" + empty + restOfLine.setResultsName("Name") +
                         "Age:" + Word(nums).setResultsName("Age") +
                         "Birthday:" + dob.setResultsName("Birthday") +
                         "SocialSecurity:" + socsecnum.setResultsName("SS") )

        # use personalData.scanString to scan through the input, returning the
        # matching tokens, and their respective start/end locations in the string
        for person, s, e in personalData.scanString(inputData):
            print "Name:", person.Name
            print "Age:", person.Age
            print "DOB:", person.Birthday
            print "SSN:", person.SS
            print

        # or use a list comp to scan the whole file, and return your Person data,
        # giving you your requested array of "structs" - not really structs, but
        # ParseResults objects
        persons = [person for person, s, e in personalData.scanString(inputData)]

        # or convert to Python dicts, which some people prefer to pyparsing's
        # ParseResults
        persons = [dict(p) for p, s, e in personalData.scanString(inputData)]
        print persons[0]
        print

        # or create an array of Person objects, as suggested in previous postings
        class Person(object):
            def __init__(self, parseResults):
                self.__dict__.update(dict(parseResults))

            def __str__(self):
                return "Person(%s, %s, %s, %s)" % (
                    self.Name, self.Age, self.Birthday, self.SS)

        persons = [Person(p) for p, s, e in personalData.scanString(inputData)]
        for p in persons:
            print p.Name, "->", p

        --------------------------------------
        prints out:
        Name: David
        Age: 108
        DOB: 061095
        SSN: 476892771999

        Name: Fred
        Age: 101
        DOB: 061065
        SSN: 587903882000

        Name: Barney
        Age: 99
        DOB: 061265
        SSN: 698014993111

        {'SS': '476892771999', 'Age': '108', 'Birthday': '061095', 'Name': 'David'}

        David -> Person(David, 108, 061095, 476892771999)
        Fred -> Person(Fred, 101, 061065, 587903882000)
        Barney -> Person(Barney, 99, 061265, 698014993111)




        • Raymond Hettinger

          #5
          Re: * 'struct-like' list *

          [Ernesto]
          > I'm still fairly new to python, so I need some guidance here...
          >
          > I have a text file with lots of data. I only need some of the data. I
          > want to put the useful data into an [array of] struct-like
          > mechanism(s). The text file looks something like this:
          >
          > [BUNCH OF NOT-USEFUL DATA....]
          >
          > Name: David
          > Age: 108 Birthday: 061095 SocialSecurity: 476892771999
          >
          > [MORE USELESS DATA....]
          >
          > Name........
          >
          > I would like to have an array of "structs." Each struct has
          >
          > struct Person{
          > string Name;
          > int Age;
          > int Birthday;
          > int SS;
          > }
          >
          > I want to go through the file, filling up my list of structs.
          >
          > My problems are:
          >
          > 1. How to search for the keywords "Name:", "Age:", etc. in the file...
          > 2. How to implement some organized "list of lists" for the data
          > structure.

          Since you're just starting out in Python, this problem presents an
          excellent opportunity to learn Python's two basic approaches to text
          parsing.

          The first approach involves looping over the input lines, searching for
          key phrases, and extracting them using string slicing and using
          str.strip() to trim irregular length input fields. The start/stop
          logic is governed by the first and last key phrases and the results get
          accumulated in a list. This approach is easy to program, maintain, and
          explain to others:

          # Approach suitable for inputs with fixed input positions
          # (column offsets below assume the exact spacing of the sample lines)
          result = []
          for line in inputData.splitlines():
              if line.startswith('Name:'):
                  name = line[5:].strip()
              elif line.startswith('Age:'):
                  age = line[5:8].strip()
                  bd = line[19:25]
                  ssn = line[42:54]
                  result.append((name, age, bd, ssn))
          print result

          The second approach uses regular expressions. The pattern is to search
          for a key phrase, skip over whitespace, and grab the data field in a
          parenthesized group. Unlike slicing, this approach is tolerant of
          loosely formatted data where the target fields do not always appear in
          the same column position. The trade-off is having less flexibility in
          parsing logic (i.e. the target fields must arrive in a fixed order):

          # Approach for more loosely formatted inputs
          import re
          pattern = r'''(?x)
          Name:\s+(\w+)\s+
          Age:\s+(\d+)\s+
          Birthday:\s+(\d+)\s+
          SocialSecurity:\s+(\d+)
          '''
          print re.findall(pattern, inputData)
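For comparison, a Python 3 variation on the pattern above, offered as a sketch rather than a drop-in replacement. Named groups with `re.finditer` yield one dict per person, `\s*` tolerates a missing space after a colon, and the name may contain spaces (`\w+` stops at the first space). The fields must still arrive in this fixed order; `PERSON` and the sample `data` are made up for illustration:

```python
import re

# Named groups make the captured fields self-describing; \s* tolerates a
# missing space after a colon, and the name may contain spaces.
PERSON = re.compile(r"""
    Name:\s*(?P<name>.+?)\s+
    Age:\s*(?P<age>\d+)\s+
    Birthday:\s*(?P<birthday>\d+)\s+
    SocialSecurity:\s*(?P<ss>\d+)
""", re.VERBOSE)

data = """[BUNCH OF NOT-USEFUL DATA....]

Name: David Smith
Age: 108 Birthday: 061095 SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name: Fred
Age:101  Birthday: 061065 SocialSecurity:587903882000
"""

records = [match.groupdict() for match in PERSON.finditer(data)]
for record in records:
    print(record)
```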

          Other respondents have suggested the third-party PyParsing module, which
          provides a powerful general-purpose toolset for text parsing; however,
          it is always worth mastering Python basics before moving on to special
          purpose tools. The above code fragments are easy to construct and not
          hard to explain to others. Maintenance is a breeze.


          Raymond


          P.S. Once you've formed a list of tuples, it is trivial to create
          Person objects for your Pascal-like structure:

          class Person(object):
              # tuple-parameter unpacking in the signature works in Python 2 only
              def __init__(self, (name, age, bd, ssn)):
                  self.name=name; self.age=age; self.bd=bd; self.ssn=ssn

          personlist = map(Person, result)
          for p in personlist:
              print p.name, p.age, p.bd, p.ssn
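Tuple parameters such as `def __init__(self, (name, age, bd, ssn))` were removed in Python 3 (PEP 3113), and `map` returns an iterator there. A minimal Python 3 sketch of the same idea unpacks each tuple with `*` at the call site instead; converting the age to `int` and the sample `result` list are assumptions added for illustration:

```python
class Person:
    def __init__(self, name, age, bd, ssn):
        self.name = name
        self.age = int(age)  # the regexp captures digits as text
        self.bd = bd         # kept as text so "061095" keeps its leading zero
        self.ssn = ssn


# Hypothetical parser output: a list of (name, age, bd, ssn) tuples.
result = [
    ("David", "108", "061095", "476892771999"),
    ("Fred", "101", "061065", "587903882000"),
]

personlist = [Person(*fields) for fields in result]
for p in personlist:
    print(p.name, p.age, p.bd, p.ssn)
```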


          • Ernesto

            #6
            Re: * 'struct-like' list *

            Thanks for the approach. I decided to use regular expressions. I'm
            going by the code you posted (below). I replaced the re.findall
            line with one that reads from my file handle, like this:

            print re.findall(pattern, myFileHandle.read())

            This prints out only brackets []. Is a 're.compile' perhaps necessary?


            Raymond Hettinger wrote:
            > # Approach for more loosely formatted inputs
            > import re
            > pattern = '''(?x)
            > Name:\s+(\w+)\s+
            > Age:\s+(\d+)\s+
            > Birthday:\s+(\d+)\s+
            > SocialSecurity:\s+(\d+)
            > '''
            > print re.findall(pattern, inputData)


            • Schüle Daniel

              #7
              Re: * 'struct-like' list *

              Ernesto wrote:
              > Thanks for the approach. I decided to use regular expressions. I'm
              > going by the code you posted (below). I replaced the line re.findall
              > line with my file handle read() like this:
              >
              > print re.findall(pattern, myFileHandle.read())
              >
              > This prints out only brackets []. Is a 're.compile' perhaps necessary?

              If you see [], that means findall didn't find anything
              that matches your pattern. Compiling your pattern beforehand
              with re.compile would not make findall find the text;
              it's only there as an optimization.

              Consider

              lines = [line for line in file("foo.txt").readlines()
                       if re.match(r"\d+", line)]

              In this case it's better to precompile the regexp once and use it
              to match all lines:

              number = re.compile(r"\d+")
              lines = [line for line in file("foo.txt").readlines() if number.match(line)]
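A small Python 3 check of the point above; the sample strings are made up, and a list stands in for the file. Compiling does not change what is found (the `re` module also caches recently used patterns, so precompiling mainly saves that lookup). The sketch also shows that `match()` only matches at the start of a string while `search()` looks anywhere, a common source of surprises when filtering lines:

```python
import re

text = "Name: David\nAge: 108 Birthday: 061095 SocialSecurity: 476892771999\n"
pattern = r"Age:\s+(\d+)"

# Module-level findall and a precompiled pattern return the same matches.
module_result = re.findall(pattern, text)
compiled_result = re.compile(pattern).findall(text)
print(module_result, compiled_result)  # ['108'] ['108']

# One compiled pattern reused for every line. match() anchors at the
# start of the string; search() scans the whole string.
number = re.compile(r"\d+")
lines = ["123 abc", "abc 123", "42"]
matched = [line for line in lines if number.match(line)]
searched = [line for line in lines if number.search(line)]
print(matched)   # ['123 abc', '42']
print(searched)  # ['123 abc', 'abc 123', '42']
```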

              Fire up the interactive Python interpreter and play with re and
              your patterns. Speaking from my own experience, the odds are
              against getting a pattern right the first time.

              Regards, Daniel


              • Ernesto

                #8
                Re: * 'struct-like' list *

                Thanks!


                • Bengt Richter

                  #9
                  Re: * 'struct-like' list *

                  On 6 Feb 2006 09:03:09 -0800, "Ernesto" <erniedude@gmail.com> wrote:
                  >I'm still fairly new to python, so I need some guidance here...
                  >
                  >I have a text file with lots of data. I only need some of the data. I
                  >want to put the useful data into an [array of] struct-like
                  >mechanism(s). The text file looks something like this:
                  >
                  >[BUNCH OF NOT-USEFUL DATA....]
                  >
                  >Name: David
                  >Age: 108 Birthday: 061095 SocialSecurity: 476892771999
                  >
                  >[MORE USELESS DATA....]
                  >
                  >Name........

                  Does the useful data always come in fixed-format pairs of lines as in your example?
                  If so, you could just iterate through the lines of your text file as in the example at the end [1].
                  >
                  >I would like to have an array of "structs." Each struct has
                  >
                  >struct Person{
                  > string Name;
                  > int Age;
                  > int Birthday;
                  > int SS;
                  >}
                  You don't normally want to do real structs in Python. You probably want to define
                  a class to contain the data, e.g., class Person in the example at the end [1].
                  >
                  >I want to go through the file, filling up my list of structs.
                  >
                  >My problems are:
                  >
                  >1. How to search for the keywords "Name:", "Age:", etc. in the file...
                  >2. How to implement some organized "list of lists" for the data
                  >structure.
                  >
                  It may be very easy, if the format is fixed and space-separated and line-paired
                  as in your example data, but you will have to tell us more if not.

                  [1] example:

                  ----< ernesto.py >---------------------------------------------------------
                  class Person(object):
                      def __init__(self, name):
                          self.name = name
                      def __repr__(self): return 'Person(%r)'%self.name

                  def extract_info(lineseq):
                      lineiter = iter(lineseq) # normalize access to lines
                      personlist = []
                      for line in lineiter:
                          substrings = line.split()
                          if substrings and isinstance(substrings, list) and substrings[0] == 'Name:':
                              try:
                                  name = ' '.join(substrings[1:]) # allow for names with spaces
                                  line = lineiter.next()
                                  age_hdr, age, bd_hdr, bd, ss_hdr, ss = line.split()
                                  assert age_hdr=='Age:' and bd_hdr=='Birthday:' and ss_hdr=='SocialSecurity:', \
                                      'Bad second line after "Name: %s" line:\n %r'%(name, line)
                                  person = Person(name)
                                  person.age = int(age); person.bd = int(bd); person.ss=int(ss)
                                  personlist.append(person)
                              except Exception,e:
                                  print '%s: %s'%(e.__class__.__name__, e)
                      return personlist

                  def test():
                      lines = """\
                  [BUNCH OF NOT-USEFUL DATA....]

                  Name: David
                  Age: 108 Birthday: 061095 SocialSecurity: 476892771999

                  [MORE USELESS DATA....]

                  Name: Ernesto
                  Age: 25 Birthday: 040181 SocialSecurity: 123456789

                  Name: Ernesto
                  Age: 44 Brithdy: 040106 SocialSecurity: 123456789

                  Name........
                  """
                      persondata = extract_info(lines.splitlines())
                      print persondata
                      ssdict = {}
                      for person in persondata:
                          if person.ss in ssdict:
                              print 'Rejecting %r with duplicate ss %s'%(person, person.ss)
                          else:
                              ssdict[person.ss] = person
                      print 'ssdict keys: %s'%ssdict.keys()
                      for ss, pers in sorted(ssdict.items(), key=lambda item:item[1].name): #sorted by name
                          print 'Name: %s Age: %s SS: %s' % (pers.name, pers.age, pers.ss)

                  if __name__ == '__main__': test()
                  ---------------------------------------------------------------------------

                  this produces output:

                  [10:07] C:\pywk\clp>py24 ernesto.py
                  AssertionError: Bad second line after "Name: Ernesto" line:
                  'Age: 44 Brithdy: 040106 SocialSecurity: 123456789'
                  [Person('David'), Person('Ernesto')]
                  ssdict keys: [123456789, 476892771999L]
                  Name: David Age: 108 SS: 476892771999
                  Name: Ernesto Age: 25 SS: 123456789

                  if you want to try this on a file (we'll use the source itself here,
                  since it includes valid example data lines), do something like:

                  >>> import ernesto
                  >>> info = ernesto.extract_info(open('ernesto.py'))
                  AssertionError: Bad second line after "Name: Ernesto" line:
                  'Age: 44 Brithdy: 040106 SocialSecurity: 123456789\n'
                  >>> info
                  [Person('David'), Person('Ernesto')]

                  tweak to taste ;-)

                  Regards,
                  Bengt Richter


                  • Ernesto

                    #10
                    Re: * 'struct-like' list *

                    Thanks tons!


                    • Bengt Richter

                      #11
                      Re: * 'struct-like' list *

                      On Tue, 07 Feb 2006 18:10:05 GMT, bokr@oz.net (Bengt Richter) wrote:
                      [...]
                      >----< ernesto.py >---------------------------------------------------------
                      [...]
                      Just noticed:
                      > substrings = line.split()
                      > if substrings and isinstance(substrings, list) and substrings[0] == 'Name:':
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^--not needed

                      str.split always returns a list, even if its length is 1, so that test was harmless, but the line should be

                      if substrings and substrings[0] == 'Name:':

                      (the first term is needed because ''.split() => [], to avoid [][0])
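A quick Python 3 illustration of the point above: `str.split()` with no arguments returns an empty list for an empty or all-whitespace line, so the emptiness test must come before indexing. `starts_with_name` is a hypothetical helper name:

```python
print("Name: David".split())  # ['Name:', 'David']
print("".split())             # []
print("   ".split())          # []


def starts_with_name(line):
    substrings = line.split()
    # Check for an empty list first: [][0] would raise IndexError.
    return bool(substrings) and substrings[0] == "Name:"


print(starts_with_name("Name: David"), starts_with_name(""))  # True False
```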
                      Sorry.

                      Regards,
                      Bengt Richter
