Convert list to dictionary problem

  • GTXY20
    New Member
    • Oct 2007
    • 29

    Hi,

    I have the following in a text file:

    1,a
    1,b
    1,b
    2,a
    2,c
    2,a
    2,c
    etc....

    I have the following code to open the text file and create a list from the data inside. I am trying to create a dictionary like:

    {[1:a], [1:b], [1:b], [2:a], [2:c], [2:a], [2:c]}

    I am using the following:
    [CODE=python]
    infile = open('input.txt', 'r')
    records = infile.readlines()
    infile.close()
    records = [s.replace('\n', '') for s in records]
    finalrecords = map(string.split(), records)
    [/CODE]
    However, I keep getting the following error:

    "pythontest.py", line 5, in <module>
    finalrecords = map(string.split(), records)
    NameError: name 'string' is not defined

    Any advice? Also, moving forward, I would like to create from the dictionary a count of the unique key:value pairs for each key, so using the above data I would write to a file:

    KEY UNIQUE INSTANCES
    1 2 (sum for unique key value instance 1:a and 1:b)
    2 2 (sum for unique key value instance 2:a and 2:c)

    I can do this in SQL but would prefer to do it in Python for speed and flexibility with computations.

    Any advice is greatly appreciated.

    GTXY20
  • ghostdog74
    Recognized Expert Contributor
    • Apr 2006
    • 511

    #2
    string.split is deprecated (the NameError itself appears because the string module was never imported).
    Use <string>.split() instead, e.g.:
    Code:
    s = "test , test1"
    s.split()
    By the way, you can't create a dictionary with the same key twice: dictionary keys must be unique.
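For the comma-separated data in the question, splitting on whitespace is not enough: `"1,a".split()` returns `['1,a']`. A minimal Python 3 sketch, assuming each line holds exactly one `key,value` pair, strips the newline and splits on the comma instead. `parse_records` is a hypothetical helper name, not part of any library.

```python
def parse_records(lines):
    """Turn lines like '1,a\n' into (key, value) tuples (hypothetical helper)."""
    pairs = []
    for line in lines:
        line = line.strip()               # drop the trailing newline and stray spaces
        if not line:
            continue                      # skip blank lines, e.g. at the end of the file
        key, value = line.split(',', 1)   # split on the first comma only
        pairs.append((key, value))
    return pairs

# Usage with a list like the one readlines() returns:
records = ['1,a\n', '1,b\n', '1,b\n', '2,a\n']
print(parse_records(records))
# [('1', 'a'), ('1', 'b'), ('1', 'b'), ('2', 'a')]
```

With a real file, the open file object can be passed directly, e.g. `with open('input.txt') as infile: pairs = parse_records(infile)`, since iterating over a file yields its lines.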

    • GTXY20
      New Member
      • Oct 2007
      • 29

      #3
      Thanks. If keys need to be unique, how do I import the data so that each key is kept once but holds all of its associated values, so that I get:

      {[1:a,b], [2:a,c]}

      Thanks again.

      • bartonc
        Recognized Expert Expert
        • Sep 2006
        • 6478

        #4
        My friend ghostdog74 is correct. Given that data, you'd end up with a very small dictionary:[CODE=python]
        >>> records = '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c' # often missing the last newline
        >>> lines = records.split()
        >>> lines
        ['1,a', '1,b', '1,b', '2,a', '2,c', '2,a', '2,c']
        >>> dd = dict((key, value) for key, value in (line.split(',') for line in lines))
        >>> dd
        {'1': 'b', '2': 'c'}
        >>> [/CODE]
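The dictionary is this small because, when `dict()` sees the same key more than once, each later pair overwrites the earlier one, so only the last value read for each key survives. A short Python 3 check of that behaviour:

```python
pairs = [('1', 'a'), ('1', 'b'), ('1', 'b'), ('2', 'a'), ('2', 'c'), ('2', 'a'), ('2', 'c')]

dd = dict(pairs)            # later pairs with the same key replace earlier ones
print(dd)                   # {'1': 'b', '2': 'c'}

# Nothing is lost from the input, only from the dictionary:
print(len(pairs), len(dd))  # 7 2
```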

        • GTXY20
          New Member
          • Oct 2007
          • 29

          #5
          OK, if I just run the first part:

          infile = open('input.txt', 'r')
          records = infile.readlines()
          infile.close()
          records

          ['1,a\n', '1,b\n', '1,c\n', '1,a\n', '1,c\n', '1,a\n', '1,b\n', '2,a\n', '2,b\n', '2,c\n', '2,a\n', '3,c\n', '3,a\n', '3,b\n', '4,a\n', '4,a\n', '4,c\n', '4,c\n']

          so when I try:

          lines = records.split()

          I am thrown:

          Traceback (most recent call last):
          File "<interactive input>", line 1, in <module>
          AttributeError: 'list' object has no attribute 'split'

          I think it is because records is the list shown above and not a single string like:

          '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c'

          How can I open a text file and store the records as one string like that, without using readlines()?

          As for the dictionary, I would like to have it so that I get:

          {[1:a,b], [2:a,c]}

          Any ideas? Sorry, I'm new to Python and used to working only in SQL.

          G.

          • bartonc
            Recognized Expert Expert
            • Sep 2006
            • 6478

            #6
            Originally posted by GTXY20
            Thanks. If keys need to be unique, how do I import the data so that each key is kept once but holds all of its associated values, so that I get:

            {[1:a,b], [2:a,c]}

            Thanks again.
            [CODE=python]
            >>> records = '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c' # often missing the last newline
            >>> lines = records.split()
            >>> lines
            ['1,a', '1,b', '1,b', '2,a', '2,c', '2,a', '2,c']
            >>> dd = {}
            >>> for line in lines:
            ...     key, value = line.split(',')
            ...     if key in dd:
            ...         oldvalue = dd[key]
            ...         if value not in oldvalue:
            ...             dd[key] = '%s,%s' % (oldvalue, value)
            ...     else:
            ...         dd[key] = value
            ...
            >>> dd
            {'1': 'a,b', '2': 'a,c'}
            >>> [/CODE]
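The same grouping can be sketched in Python 3 with `collections.defaultdict`, which creates an empty list the first time a key is seen and so removes the `if key in dd` branch. This variant keeps a list per key and appends only values not already present, which preserves first-seen order and avoids the substring test that `value not in oldvalue` performs on a string (for example, `'a' in 'ab'` is `True`). The function name `group_unique` is illustrative only.

```python
from collections import defaultdict

def group_unique(lines):
    """Map each key to the list of its distinct values, in first-seen order."""
    dd = defaultdict(list)
    for line in lines:
        key, value = line.split(',')
        if value not in dd[key]:   # membership test on a list, not a substring test
            dd[key].append(value)
    return dict(dd)                # hand back a plain dict

lines = '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c'.split()
print(group_unique(lines))         # {'1': ['a', 'b'], '2': ['a', 'c']}
```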

            • bartonc
              Recognized Expert Expert
              • Sep 2006
              • 6478

              #7
              Originally posted by GTXY20
              OK - if i just run the first part:

              infile = open('input.txt', 'r')
              records = infile.readlines()
              infile.close()
              records
              Use
              Code:
              infile.read()
              instead.
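A minimal Python 3 sketch of that suggestion: `read()` returns the whole file as one string, which can then be split the same way as the string in the earlier examples, and a `with` block closes the file automatically. The file name `input.txt` comes from the question; the sketch writes a small sample file into a temporary directory first only so that it runs on its own.

```python
import os
import tempfile

# Write a small sample file so the sketch is self-contained (assumption: one key,value per line).
path = os.path.join(tempfile.mkdtemp(), 'input.txt')
with open(path, 'w') as f:
    f.write('1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c')   # no final newline, as often happens

with open(path) as infile:   # the file is closed when the block ends
    records = infile.read()  # one string, newlines included

lines = records.split()      # splits on any whitespace, so a missing last newline is harmless
print(lines)                 # ['1,a', '1,b', '1,b', '2,a', '2,c', '2,a', '2,c']
```

If values might contain spaces, `records.splitlines()` splits on line breaks only. For very large files, iterating over the file object (`for line in infile:`) avoids holding the whole text in memory at once.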

              • bartonc
                Recognized Expert Expert
                • Sep 2006
                • 6478

                #8
                Originally posted by bartonc
                Use
                Code:
                infile.read()
                instead.
                Even better: Use a tuple in the value:[CODE=python]
                >>> records = '1,a\n1,b\n1,b\n2,a\n2,c\n2,a\n2,c' # often missing the last newline
                >>> lines = records.split()
                >>> lines
                ['1,a', '1,b', '1,b', '2,a', '2,c', '2,a', '2,c']
                >>> dd = {}
                >>> for line in lines:
                ...     key, value = line.split(',')
                ...     if key in dd:
                ...         oldvalue = dd[key]
                ...         if value not in oldvalue:
                ...             dd[key] = oldvalue + (value,) # tuple addition
                ...     else:
                ...         dd[key] = (value,) # a tuple of one
                ...
                >>> dd
                {'1': ('a', 'b'), '2': ('a', 'c')}
                >>> [/CODE]This allows any kind of conversion to be applied to the text before it is stored.
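As one illustration of such a conversion, here is a Python 3 sketch that converts each key to `int` before storing it, so keys sort numerically (`2` before `10`) rather than as text (`'10'` before `'2'`). Converting keys with `int()` is an assumption about the data: it raises `ValueError` for any key that is not a whole number. The function name `group_as_tuples` is hypothetical.

```python
def group_as_tuples(lines):
    """Group distinct values per key as tuples, converting keys to int (assumed numeric)."""
    dd = {}
    for line in lines:
        key, value = line.split(',')
        key = int(key)                        # the conversion happens before storage
        if key in dd:
            if value not in dd[key]:
                dd[key] = dd[key] + (value,)  # tuples are immutable, so build a new one
        else:
            dd[key] = (value,)                # a tuple of one
    return dd

dd = group_as_tuples(['10,a', '2,c', '2,a', '10,b'])
print(sorted(dd))  # [2, 10]
print(dd[2])       # ('c', 'a')
```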

                • GTXY20
                  New Member
                  • Oct 2007
                  • 29

                  #9
                  This is perfect!!!

                  I assume you can also sort the values so that they always come out in order, like a,b,c or a,c or a,b depending on the values?

                  Finally I need to do two more things:

                  1. List the quantity of each unique value combination across the keys of a dictionary. For example, given the following dictionary:

                  {'1': 'a,b,c', '3': 'a,b,c', '2': 'a,b,c', '4': 'a,c'}

                  I would need:

                  QTY VALUE COMBINATION
                  3 a,b,c
                  1 a,c

                  2. Get the total number of values for a key:

                  {'1': 'a,b,c', '3': 'a,b,c', '2': 'a,b,c', '4': 'a,c'}

                  I would need:

                  KEY NUMBER OF VALUES
                  1 3
                  3 3
                  2 3
                  4 2

                  Thank you so much; this is very helpful and far more efficient than working it out with SQL and VB. Do you know if there are any size limits on a dictionary in Python? I am thinking I may eventually have 2 million keys with a variety of values (about 5 values per key on average).

                  G.

                  • bartonc
                    Recognized Expert Expert
                    • Sep 2006
                    • 6478

                    #10
                    Originally posted by GTXY20
                    I assume you can also sort the values so that they always come out in order, like a,b,c or a,c or a,b depending on the values?
                    In order to sort, you'll need a list in the value:[CODE=python]
                    >>> records = '1,b\n1,a\n1,b\n2,c\n2,a\n2,a\n2,c' # reordered elements
                    >>> lines = records.split()
                    >>> lines
                    ['1,b', '1,a', '1,b', '2,c', '2,a', '2,a', '2,c']
                    >>> dd = {}
                    >>> for line in lines:
                    ...     key, value = line.split(',')
                    ...     if key in dd:
                    ...         valueList = dd[key]
                    ...         if value not in valueList:
                    ...             valueList.append(value)
                    ...     else:
                    ...         dd[key] = [value] # a list of one
                    ...
                    >>> dd
                    {'1': ['b', 'a'], '2': ['c', 'a']}
                    >>> for key, valueList in dd.items():
                    ...     valueList.sort()
                    ...
                    >>> dd
                    {'1': ['a', 'b'], '2': ['a', 'c']}[/CODE]Since dictionaries are not ordered containers, you'll want to work with a sorted() list of its keys:[CODE=python]
                    >>> for key in sorted(dd.keys()):
                    ...     print key, len(dd[key])
                    ...
                    1 2
                    2 2
                    >>> [/CODE]Size limit, huh? With Python, memory is usually the limiting factor (as in (L)ong integers, which can contain a single value large enough to fill available memory - try it sometime!).
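Putting those pieces together for the "number of values per key" request in #9, a Python 3 sketch: sort each value list in place, then print each key with the `len()` of its list, visiting keys in sorted order. The dictionary is the example from #9 rewritten with lists as values, deliberately shuffled to show the sort; note that `print()` here is the Python 3 function, whereas the session above uses the Python 2 `print` statement.

```python
dd = {'1': ['c', 'a', 'b'], '3': ['a', 'b', 'c'], '2': ['b', 'c', 'a'], '4': ['c', 'a']}

for values in dd.values():
    values.sort()            # in-place sort, so a,b,c always comes out in order

counts = {key: len(values) for key, values in dd.items()}

print('KEY NUMBER OF VALUES')
for key in sorted(counts):   # string keys sort as text: '1', '2', '3', '4'
    print(key, counts[key])
```

Because the keys are strings, a key such as `'10'` would sort before `'2'`; converting keys to `int` when the data is loaded avoids that.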

                    • GTXY20
                      New Member
                      • Oct 2007
                      • 29

                      #11
                      I was able to sort by KEY with the following:

                      sorted(dd.items(), key=lambda (k,v): (v,k))

                      • bartonc
                        Recognized Expert Expert
                        • Sep 2006
                        • 6478

                        #12
                        Originally posted by GTXY20
                        I was able to sort by KEY with the following:

                        sorted(dd.items(), key=lambda (k,v): (v,k))
                        I thought that [CODE=python]sorted(dd.keys())[/CODE] would be sufficient.

                        Your way:[CODE=python]
                        >>> sorted(dd.items(), key=lambda (k,v): (v,k))
                        [('1', ['a', 'b']), ('2', ['a', 'c'])]
                        >>> [/CODE]actually creates a list of tuples, one tuple for each entry in the dictionary, ordered by value first and then by key.
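The `lambda (k,v): (v,k)` form relies on tuple-parameter unpacking, which exists only in Python 2 (it was removed in Python 3), and it orders entries by value first. A Python 3 sketch comparing sorting by key with sorting by value:

```python
dd = {'2': ['a', 'c'], '1': ['a', 'b'], '3': ['a', 'b']}

# Sort by key: items() yields (key, value) tuples, and tuples compare element by element.
by_key = sorted(dd.items())
print(by_key)    # [('1', ['a', 'b']), ('2', ['a', 'c']), ('3', ['a', 'b'])]

# Python 3 spelling of key=lambda (k, v): (v, k): index into the pair instead of unpacking it.
by_value = sorted(dd.items(), key=lambda kv: (kv[1], kv[0]))
print(by_value)  # [('1', ['a', 'b']), ('3', ['a', 'b']), ('2', ['a', 'c'])]
```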


                        PS: It's actually a rule on this site that you use the [CODE] tags around your code, as instructed on the right-hand side of the page when posting or replying.

                        • GTXY20
                          New Member
                          • Oct 2007
                          • 29

                          #13
                          Thanks again. Point taken about the code tags; I will use them from now on. I was too excited about this working out and got caught up in everything.

                          • GTXY20
                            New Member
                            • Oct 2007
                            • 29

                            #14
                            Hi there,

                            Sorry for all the questions; this is enlightening...

                            Is there any way to display the count of each value across the value lists? Here is my dictionary:

                            Code:
                            {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}
                            I would like to display the counts as follows, without knowing in advance which values appear in the lists:

                            Value QTY
                            a 4
                            b 3
                            c 4

                            Also, is there any way to display the count of each value-list combination? Here again is my dictionary:

                            Code:
                            {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}
                            And I would like to display it as follows:

                            QTY Value List Combination
                            3 a,b,c
                            1 a,c

                            Once again all help is much appreciated.

                            G.

                            • bartonc
                              Recognized Expert Expert
                              • Sep 2006
                              • 6478

                              #15
                              Here's a neat trick that will give you a place to start:[CODE=python]

                              >>> dd = {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}
                              >>> uniques = set(tuple(value) for key, value in dd.items())
                              >>> uniques
                              set([('a', 'b', 'c'), ('a', 'c')])
                              >>> [/CODE]Then, for the last part, use list.count() on a list of values:[CODE=python]
                              >>> combos = [tuple(value) for key, value in dd.items()]
                              >>> combos
                              [('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'c')]
                              >>> for item in uniques:
                              ...     print item, combos.count(item)
                              ...
                              ('a', 'b', 'c') 3
                              ('a', 'c') 1
                              >>> [/CODE]
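In Python 2.7 and Python 3, `collections.Counter` does the counting in one pass and covers both questions from #14: how often each individual value appears, and how often each value-list combination appears. A sketch, assuming the value lists are already sorted so that equal combinations compare equal:

```python
from collections import Counter

dd = {'1': ['a', 'b', 'c'], '3': ['a', 'b', 'c'], '2': ['a', 'b', 'c'], '4': ['a', 'c']}

# Count of each individual value across all value lists.
value_counts = Counter(value for values in dd.values() for value in values)
print('Value QTY')
for value in sorted(value_counts):
    print(value, value_counts[value])

# Count of each combination; lists are unhashable, so convert each one to a tuple first.
combo_counts = Counter(tuple(values) for values in dd.values())
print('QTY Value List Combination')
for combo, qty in combo_counts.most_common():   # most frequent first
    print(qty, ','.join(combo))
```

Unlike calling `list.count()` once per unique item, which rescans the whole list each time, `Counter` makes a single pass, which matters with millions of keys.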
