Build an English-French dictionary in Java

  • levay14
    New Member
    • Jan 2007
    • 2

    Build an English-French dictionary in Java

    Hi, I'm doing some research on how to develop an English-French dictionary using Java. Can anyone help me, please?
  • r035198x
    MVP
    • Sep 2006
    • 13225

    #2
    Depends on how far you are willing to go with it. A HashMap might be a starting point for a small program. You may also want to consider storing the words in a database.
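
    To make the HashMap idea concrete, here is a minimal sketch. The class name SimpleDictionary, its methods, and the sample words are made up for illustration; a real program would load its entries from a file or a database instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical class for illustration: maps an English word to its French translations.
class SimpleDictionary {
    private final Map<String, List<String>> entries = new HashMap<>();

    // Stores (or replaces) the translations for one English word.
    void add(String english, String... french) {
        entries.put(english.toLowerCase(Locale.ROOT), new ArrayList<>(Arrays.asList(french)));
    }

    // Returns the translations, or an empty list if the word is unknown.
    List<String> translate(String english) {
        List<String> found = entries.get(english.toLowerCase(Locale.ROOT));
        return found == null ? Collections.<String>emptyList() : found;
    }

    public static void main(String[] args) {
        SimpleDictionary dict = new SimpleDictionary();
        dict.add("yes", "oui", "si"); // sample data only
        dict.add("cat", "chat");
        System.out.println(dict.translate("Yes")); // [oui, si]
        System.out.println(dict.translate("dog")); // []
    }
}
```

    Lookups are case-insensitive here only because both add and translate lower-case the key; that is a design choice of this sketch, not a requirement.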


    • DeMan
      Top Contributor
      • Nov 2006
      • 1799

      #3
      Assuming that you want to give word translations (that is, if someone types in "yes" you give "oui" and any synonyms) and that you are not, for example, trying to automate the translation of the entire Dickens collection, I would suggest something along the following lines (and yes, this would be VERY time-consuming).

      We're going to use a tree structure (I think they're usually called tries, prefix trees or dictionary trees, depending on where in the world you are, though, as always, I expect to be corrected if they're something else) and assume you only want English to French (if you want to go the other way, you may have to do the same work again unless someone has a better idea).

      Build a tree where each node has 27 children (more if you are interested in hyphenated, accented or other special words). The root node is unique and called (to be original) 'root' (or something equally meaningful). The general idea is that each child represents a letter (with a leaf extension making the 27th node), so that, as an example, cat would follow the path root-c-a-t-leaf. The leaf node, rather than containing children, contains a container of the translations for the word we have reached. (I know my explanations aren't always clear, so ask questions when you (or I) get confused.)

      Obviously, not all of the tree has to be implemented, so long as you have good checks such as hasChild('x'); that way you never need to create paths like root-x-z-q-y-r-w-c-v-end that no real word uses.
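
      A rough sketch of such a node, assuming plain a-z words only. The class name TrieNode and the methods add, child and translations are invented for this example (only hasChild comes from the description above). The 26 letter slots start out empty and are created only when a word needs them, and the translations list stands in for the 27th "leaf" child.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical node class: one slot per letter a-z plus a list of translations.
class TrieNode {
    // A slot stays null until some word actually passes through it.
    private final TrieNode[] children = new TrieNode[26];
    // Stands in for the "27th" leaf child: translations of the word ending at this node.
    private final List<String> translations = new ArrayList<>();

    // True only if a path for this letter has already been created.
    boolean hasChild(char letter) {
        int i = letter - 'a';
        return i >= 0 && i < 26 && children[i] != null;
    }

    // Returns the child for this letter, or null if it does not exist yet.
    TrieNode child(char letter) {
        return hasChild(letter) ? children[letter - 'a'] : null;
    }

    List<String> translations() {
        return translations;
    }

    // Walks root-c-a-t for "cat", creating missing nodes, then stores the translation.
    void add(String english, String french) {
        TrieNode node = this;
        for (char c : english.toLowerCase(Locale.ROOT).toCharArray()) {
            int i = c - 'a';
            if (i < 0 || i >= 26) {
                throw new IllegalArgumentException("unsupported character: " + c);
            }
            if (node.children[i] == null) {
                node.children[i] = new TrieNode();
            }
            node = node.children[i];
        }
        node.translations.add(french);
    }

    public static void main(String[] args) {
        TrieNode root = new TrieNode();
        root.add("cat", "chat"); // sample data only
        System.out.println(root.hasChild('c')); // true
        System.out.println(root.hasChild('x')); // false
        System.out.println(root.child('c').child('a').child('t').translations()); // [chat]
    }
}
```

      Hyphens and accented letters are rejected by this version; supporting them would need a wider array or a map of children.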

      A further optimization (which is not quite as easy as it sounds) is to store the maximum unique suffix string, i.e. if the only word in the English dictionary beginning with aa were aardvark (and I'm not claiming it is), the path would be root-a-a-rdvark. That is, you store the remainder of the string once it is unique. Personally, I think this is unnecessary these days, when storage space is not a major issue (though I'm sure others disagree, particularly for projects on the scale of yours).

      The advantage of using such a tree structure is that it makes searching quite easy.

      If someone requests the word philanthropist, you know to look down through the p branch's h branch, and so on (which is why you need a good test for "node doesn't exist yet").
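
      The search could look roughly like this. It is only a sketch: TranslationTrie, add and lookup are names invented for the example, and a HashMap of children is swapped in for the fixed array of 27 slots so that accented or hyphenated words need no special handling. The "node doesn't exist yet" test is the null check inside lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trie keyed by character, storing French translations at word ends.
class TranslationTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final List<String> translations = new ArrayList<>();
    }

    private final Node root = new Node();

    // Creates the path for the word as needed and records one translation.
    void add(String english, String french) {
        Node node = root;
        for (char c : english.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.translations.add(french);
    }

    // Follows one branch per letter; stops as soon as a branch is missing.
    List<String> lookup(String english) {
        Node node = root;
        for (char c : english.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return Collections.emptyList(); // node doesn't exist (yet)
            }
        }
        return Collections.unmodifiableList(node.translations);
    }

    public static void main(String[] args) {
        TranslationTrie trie = new TranslationTrie();
        trie.add("philanthropist", "philanthrope"); // sample data only
        System.out.println(trie.lookup("philanthropist")); // [philanthrope]
        System.out.println(trie.lookup("phil"));           // [] (a prefix, not a stored word)
        System.out.println(trie.lookup("zebra"));          // [] (no z branch at all)
    }
}
```

      The work done by lookup grows with the length of the word, not with the number of words stored, which is the "searching is easy" advantage described above.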

      I'm starting to confuse myself, so I've probably confused everyone else, but if you need further explanation, post back with what I haven't explained well enough and I'll see if I can do a better job.


      • r035198x
        MVP
        • Sep 2006
        • 13225

        #4
        Disappearing from the Java forum and reappearing at will won't help too much either. Nice of you to pop in, though. You should do that more often...

        Interesting solution you have given there. I guess it depends on how far the OP is willing to take it or what his/her specs say. For example, a tree-based data structure is already available in the standard library (java.util.TreeMap). The OP may decide to use it or to create their own tree. Or the OP might already have the words in some file and just want to create an interface program that retrieves the words through some kind of mapping.
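
        For what it's worth, a small sketch of the TreeMap route. Note that java.util.TreeMap is a sorted map (a red-black tree keyed by whole words), not a letter-by-letter tree like the one described earlier in the thread, but the sorted order still makes prefix queries possible through subMap. The class name TreeMapDictionary, the helper withPrefix and the sample words are assumptions made for the example.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical demo class: a sorted English-to-French map with a prefix query.
class TreeMapDictionary {
    // Returns a view of all entries whose key starts with the given prefix.
    // Assumes keys never contain Character.MAX_VALUE themselves.
    static SortedMap<String, String> withPrefix(TreeMap<String, String> dict, String prefix) {
        return dict.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        TreeMap<String, String> dict = new TreeMap<>();
        dict.put("cat", "chat"); // sample data only
        dict.put("car", "voiture");
        dict.put("dog", "chien");

        System.out.println(dict.get("cat"));        // chat
        System.out.println(withPrefix(dict, "ca")); // {car=voiture, cat=chat}
    }
}
```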


        I trust your holidays were great.
