Rookie question about data types (hashtables)

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Steve D. Perkins

    Rookie question about data types (hashtables)

    Hello all -

    I'm a Java and C++ developer, looking to pick up some Python and
    wxPython skills for prototyping and quick projects. I've been developing
    for about 15 years, and am very familiar with the wxWindows framework (I
    wrote one of the early Java bindings for it), but I've only been exposed
    to Python for a few days.

    I like to learn by doing, so I decided to just dive in with a small
    project I have a need for... a Spanish-English dictionary that lives in
    the system tray and pops up on demand with a hotkey. I've borrowed the
    dictionary data from the Pythoñol project on SourceForge, and learned the
    Python methods for file I/O and parsing strings. I now have 80,000 or so
    cleanly-parsed Spanish-English word pairs, and need to find the
    appropriate data structure to store them in.

    What I would LIKE to have is a hashtable-like structure that lets me
    retrieve multiple values based on a partial key. I want to have a text
    field for the user to type in a Spanish word, and a box below containing
    a list of English words. With each character that the user types, I
    would like to narrow down the words listed in the box below. That is,
    when you type the letter "a" the box will contain all English words
    starting with the letter "a". When you then type "n", the box will
    narrow further and contain all words starting with "an", etc... until
    you've narrowed it down to appropriate word. I figure that this will be
    easier than having users enter entire words in one shot and then try to
    implement some intelligence for near-matches.

    Therefore, it would be great if there was some hashtable-like data
    structure that lets me say "Return all values where the key starts with
    'an'". Actually, it would be even better if I could provide multiple
    options and get a union in one pass... such as "Return all values where
    the key starts with 'an' or 'añ'".

    I haven't yet turned up EXACTLY what I'm looking for, but I thought
    I would ask in case everyone would say, "Duh! There's a native data type
    for that, silly!". Even if that's not the case, does anyone know of any
    examples out there I can look at where such a data structure has been
    implemented manually? Thanks!

    Steve
  • Rainer Deyke

    #2
    Re: Rookie question about data types (hashtables)

    Steve D. Perkins wrote:[color=blue]
    > What I would LIKE to have is a hashtable-like structure that lets
    > me retrieve multiple values based on a partial key.[/color]

    Sounds like a sorted list would work best. A binary search (see the bisect
    module) lets find all matches in O(log n) time.


    --
    Rainer Deyke - rainerd@eldwood .com - http://eldwood.com


    Comment

    • Sidharth Kuruvila

      #3
      Re: Rookie question about data types (hashtables)

      the bsddb module has a binary tree database but that might be overkill.
      Other than that I doubt there's anything in standard python that would help.

      If you don't plan to add any words to your dictionary you could order the
      list and use a binary search to find the entries.
      you could use python list for that :-)


      Comment

      • Anton Vredegoor

        #4
        Re: Rookie question about data types (hashtables)

        Rainer Deyke" <rainerd@eldwoo d.com> wrote:
        [color=blue][color=green]
        >> What I would LIKE to have is a hashtable-like structure that lets
        >> me retrieve multiple values based on a partial key.[/color][/color]
        [color=blue]
        >Sounds like a sorted list would work best. A binary search (see the bisect
        >module) lets find all matches in O(log n) time.[/color]

        Probably yes. I'm replying just in case someone would like to have
        their hash values sorted in alphabetical order too. The code below
        might be of interest to hobbyist programmers who don't care about
        practical issues :-) (also not tested beyond what you see)

        Anton

        from string import ascii_lowercase as alc

        def antonian_sorted _unrank(k,base, ndig):
        res = []
        while k > -1 :
        r,k = divmod(k,int('1 '*ndig,base))
        k,ndig = k-1,ndig-1
        res.append(r)
        return res

        def antonian_sorted _rank(seq, base,ndig):
        res = 0
        for x in seq:
        res = x*int('1'*ndig, base)+res+1
        ndig -= 1
        return res-1

        def word_rank(word, ndig):
        d = [alc.index(c) for c in word]
        return antonian_sorted _rank(d,len(alc ),ndig)

        def word_unrank(i,n dig):
        L = antonian_sorted _unrank(i,len(a lc),ndig)
        return "".join([alc[j] for j in L])

        def test():
        L = ["a","ab","aa"," b","bab","ba "]
        maxlen = max(map(len,L))
        print L
        def wu(i): return word_unrank(i,m axlen)
        def wr(w): return word_rank(w,max len)
        H = map(wr,L)
        print H
        H.sort()
        print H
        R = map(wu,H)
        print R
        L.sort()
        assert L == R

        if __name__=='__ma in__':
        test()

        Comment

        • Axel Mittendorf

          #5
          Re: Rookie question about data types (hashtables)


          "Steve D. Perkins" <dontwritesteve @hotmail.com> wrote:[color=blue]
          > Hello all -[/color]
          Hi
          [color=blue]
          > when you type the letter "a" the box will contain all English words
          > starting with the letter "a". When you then type "n", the box will
          > narrow further and contain all words starting with "an", etc... until[/color]

          Maybe you could use a normal python dictionary (you call it hashtable).
          Look at this script (You would need one dictionary for each direction.):

          #!/usr/bin/python

          # generate dictionaries (AKA hashtables).
          # direction: English --> German
          EN2DE = {}
          # direction: German --> English
          DE2EN = {}

          # inserts a word and its translation into the dict d
          def insertWord( d, key, val):
          # does the key exist
          if d.has_key( key):
          # yes
          d[ key].append( val)
          else:
          # no
          d[ key] = [val]

          # returns a list words that begin with s
          def getBeginWith( d, s):
          l = s.__len__()
          res = []
          keys = d.keys()
          for k in keys:
          if k.__len__()>=l and k[:l] == s:
          # use "k[:l].lower() == s.lower()" to ignore case
          res.append( k)
          return res

          # insert some entries
          insertWord( EN2DE, "someword", "meaning 1")
          insertWord( EN2DE, "someword", "meaning 2")
          insertWord( EN2DE, "my_codingstyle ", "evil")
          insertWord( EN2DE, "son", "Sohn")
          insertWord( EN2DE, "sunday", "Sonntag")
          insertWord( EN2DE, "wednesday" , "Mittwoch")
          insertWord( EN2DE, "Weapon of mass destruction", "George Walker Bush")
          insertWord( EN2DE, "sold", "verkauft")

          # show that it works
          print "all keys that begin with 'so': ", getBeginWith( EN2DE, "so")
          print "translatio ns for 'someword': ", EN2DE['someword']

          Instead of doing this you could maybe subclass UserList from the standard
          library.[color=blue]
          > Steve[/color]

          HTH, Axel


          Comment

          • Eric Brunel

            #6
            Re: Rookie question about data types (hashtables)

            Steve D. Perkins wrote:
            [snip][color=blue]
            > What I would LIKE to have is a hashtable-like structure that lets me
            > retrieve multiple values based on a partial key.[/color]

            AFAIK, if you can do queries on a "hashtable-like structure" with a partial key,
            it cannot be a real hashtable...
            [color=blue]
            > I want to have a text
            > field for the user to type in a Spanish word, and a box below containing
            > a list of English words. With each character that the user types, I
            > would like to narrow down the words listed in the box below. That is,
            > when you type the letter "a" the box will contain all English words
            > starting with the letter "a". When you then type "n", the box will
            > narrow further and contain all words starting with "an", etc... until
            > you've narrowed it down to appropriate word.[/color]

            I'd use a Python dictionary indexed not on words, but on letters in the word.
            So, for a dictionary translating english words into french, you'd have for example:

            dictionary['m']['o']['n']['d']['a']['y']['translation'] = 'lundi'

            (NB: having 'translation' as the last key is to solve the problem of words
            included in another; see 'sun' and 'sunday'. BTW, any key that is not a single
            character can be used...)

            Here is an example implementation:

            --------------------------------------------------------
            def addWord(myDict, originalWord, translatedWord) :
            d = myDict
            for c in originalWord:
            if not d.has_key(c): d[c] = {}
            d = d[c]
            d['translation'] = translatedWord

            def _allTranslation s(rootWord, d):
            res = []
            for c in d.keys():
            if c == 'translation':
            res.append((roo tWord, d[c]))
            else:
            res += _allTranslation s(rootWord + c, d[c])
            return res

            def getTranslations (myDict, word):
            d = myDict
            for c in word:
            if not d.has_key(c): return []
            d = d[c]
            return _allTranslation s(word, d)
            --------------------------------------------------------

            And then you'd do:

            frenchDict = {}
            ## Week days
            addWord(frenchD ict, 'monday', 'lundi')
            addWord(frenchD ict, 'tuesday', 'mardi')
            # ...
            addWord(frenchD ict, 'sunday', 'dimanche')
            ## Months
            addWord(frenchD ict, 'january', 'janvier')
            addWord(frenchD ict, 'february', 'fevrier')
            # ...
            addWord(frenchD ict, 'december', 'decembre')

            print getTranslations (frenchDict, 'wednesday')
            print getTranslations (frenchDict, 'ju')
            print getTranslations (frenchDict, 's')

            I don't know how it scales up for big dictionaries, not to mention huge ones...

            HTH
            --
            - Eric Brunel <eric dot brunel at pragmadev dot com> -
            PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com

            Comment

            • Steve D. Perkins

              #7
              Re: Rookie question about data types (hashtables)

              >> What I would LIKE to have is a hashtable-like structure that lets[color=blue][color=green]
              >> me retrieve multiple values based on a partial key.[/color]
              >
              > Sounds like a sorted list would work best. A binary search (see the
              > bisect module) lets find all matches in O(log n) time.[/color]


              Hmm... a double-linked-list with recursion, like back in Freshman
              Programming 101? I'm embarrassed to say that this didn't even cross my
              mind originally. I've been coding in Java for too long, when I encounter
              programs my instincts are to find an open-source plug-in that's already
              been written to solve the problem.

              Comment

              • Rainer Deyke

                #8
                Re: Rookie question about data types (hashtables)

                Steve D. Perkins wrote:[color=blue][color=green]
                >> Sounds like a sorted list would work best. A binary search (see the
                >> bisect module) lets find all matches in O(log n) time.[/color]
                >
                >
                > Hmm... a double-linked-list with recursion, like back in Freshman
                > Programming 101?[/color]

                I was thinking about a standard Python list, which is actually implemented
                as an array, not a linked list. Binary searches don't work on linked lists.


                --
                Rainer Deyke - rainerd@eldwood .com - http://eldwood.com


                Comment

                Working...