Question About Method

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Michael Lawson

    Question About Method

    I'm not quite sure if this the right forum for this question, so if
    I'm wrong in my assumption, if you could point me in the right
    direction, I'd appreciate it.

    I'm in the process of learning C++, coming from a brief spell with C.
    I'm completely self-taught, and the best way for me to learn a
    language is by coming up with an idea for a project, and working on it
    until I understand the methods used to finish it. Not the most ideal
    way to learn, I'm sure, but it's the only option I have at the moment.

    Currently, I'm working on a program that unscrambles words using a
    600k dictionary file. I've made one in C before, but unfortunately, it
    was horribly slow.

    To speed up the routine I've decided to make an integer array
    consisting of the line number where each letter starts in the
    alphabetized dictionary file. So in case the scrambled letters were
    abc (to make things simple), it would only scan through the a, b, and
    c sections of the file. Also, I've been pondering the idea of putting
    all of the data in the dictionary file into a linked list so as to make
    the data processing quite a bit faster. Unfortunately, this would use
    quite a bit of memory and I'm pretty sure it's not the most efficient
    way. The reasoning behind my madness is this: scanning through the same
    area of a file over and over again is redundant. Accessing the word
    directly from a variable is a lot faster than having to grab it from a
    file, and *then* allocating it to a variable. But like I said, memory is
    an issue. Or is it?

    My idea consists of scanning through the whole file when the program is
    originally opened, storing the values of the words in a linked list, and
    using an array to point to where the various letter groups starts. Is there
    an easier/more practical way of doing this? And if so, what? If my method
    *is* feasible, what sort of problems would I be likely to encounter?

    Thanks much,
    Mike


  • Sam Holden

    #2
    Re: Question About Method

    On Wed, 22 Oct 2003 02:04:46 GMT, Michael Lawson <jslawson@adelp hia.net> wrote:[color=blue]
    >
    > To speed up the routine I've decided to make an integer array
    > consisting of the line number where each letter starts in the
    > alphabetized dictionary file. So in case the scrambled letters were
    > abc (to make things simple), it would only scan through the a, b, and
    > c sections of the file. Also, I've been pondering the idea of putting
    > all of the data in the dictionary file into a linked list so as to make
    > the data processing quite a bit faster. Unfortunately, this would use
    > quite a bit of memory and I'm pretty sure it's not the most efficient
    > way. The reasoning behind my madness is this: scanning through the same
    > area of a file over and over again is redundant. Accessing the word
    > directly from a variable is a lot faster than having to grab it from a
    > file, and *then* allocating it to a variable. But like I said, memory is
    > an issue. Or is it?[/color]

    A better algorithm will speed things up.

    There's no need to search through the words, you can preprocess them
    once and then do a fast lookup.

    Convert each dictionary word into a word with the same letters but in
    sorted order. For example, "cat" -> "act" and "aardvark"=>"aa adkrrv".
    Store the mappings of "sorted words" to words in a multimap<string ,string>
    or map<string, vector<string> > or whatever.

    Finding the unscrambled word is then as simple as converting the
    scrambled word to a "sorted word" and then looking up the words it
    could be in the map.

    Memory will be gobbled up, but memory isn't usually a problem on
    PCs these days. Memory can be optimised by using a hashing function
    instead of "sorted words" as the index to an array of words. And file
    offsets for the words.

    But that's algorithms and not C++.

    --
    Sam Holden

    Comment

    • David White

      #3
      Re: Question About Method

      Sam Holden <sholden@flexal .cs.usyd.edu.au > wrote in message
      news:slrnbpc7rk .fa2.sholden@fl exal.cs.usyd.ed u.au...[color=blue]
      > On Wed, 22 Oct 2003 02:04:46 GMT, Michael Lawson <jslawson@adelp hia.net>[/color]
      wrote:[color=blue][color=green]
      > >
      > > To speed up the routine I've decided to make an integer array
      > > consisting of the line number where each letter starts in the
      > > alphabetized dictionary file. So in case the scrambled letters were
      > > abc (to make things simple), it would only scan through the a, b, and
      > > c sections of the file. Also, I've been pondering the idea of putting
      > > all of the data in the dictionary file into a linked list so as to make
      > > the data processing quite a bit faster. Unfortunately, this would use
      > > quite a bit of memory and I'm pretty sure it's not the most efficient
      > > way. The reasoning behind my madness is this: scanning through the same
      > > area of a file over and over again is redundant. Accessing the word
      > > directly from a variable is a lot faster than having to grab it from a
      > > file, and *then* allocating it to a variable. But like I said, memory is
      > > an issue. Or is it?[/color]
      >
      > A better algorithm will speed things up.
      >
      > There's no need to search through the words, you can preprocess them
      > once and then do a fast lookup.
      >
      > Convert each dictionary word into a word with the same letters but in
      > sorted order. For example, "cat" -> "act" and "aardvark"=>"aa adkrrv".
      > Store the mappings of "sorted words" to words in a multimap<string ,string>
      > or map<string, vector<string> > or whatever.[/color]

      Yes, I know of a Scrabble algorithm that found the highest-scoring possible
      word from a dictionary of 90,000 words in a fraction of a second on a 33MHz
      486. It stored words as sorted anagrams also, and then in a tree. Each
      letter of the sorted anagram was a node in the tree.

      DW



      Comment

      • David Rubin

        #4
        [OT] Re: Question About Method

        Sam Holden wrote:

        [snip - unscramble words][color=blue]
        > Convert each dictionary word into a word with the same letters but in
        > sorted order. For example, "cat" -> "act" and "aardvark"=>"aa adkrrv".
        > Store the mappings of "sorted words" to words in a multimap<string ,string>[/color]

        What do you do about anagrams?
        [color=blue]
        > or map<string, vector<string> > or whatever.[/color]

        Using a map makes the question moot, assuming you store (a hash of)
        sorted words. But consider "edra" which could unscramble to "read",
        "dear", or "dare". I'm not sure you can solve this problem without
        knowledge of the context. Perhaps you can use some sort of suffix
        chaining algorithm? Clearly, there is no 1-1 mapping.

        /david

        --
        Andre, a simple peasant, had only one thing on his mind as he crept
        along the East wall: 'Andre, creep... Andre, creep... Andre, creep.'
        -- unknown

        Comment

        • Sam Holden

          #5
          Re: [OT] Re: Question About Method

          On Wed, 22 Oct 2003 14:09:22 -0400,
          David Rubin <bogus_address@ nomail.com> wrote:[color=blue]
          > Sam Holden wrote:
          >
          > [snip - unscramble words][color=green]
          >> Convert each dictionary word into a word with the same letters but in
          >> sorted order. For example, "cat" -> "act" and "aardvark"=>"aa adkrrv".
          >> Store the mappings of "sorted words" to words in a multimap<string ,string>[/color]
          >
          > What do you do about anagrams?[/color]

          It's a multimap so they all get put in with the same key.
          [color=blue]
          >[color=green]
          >> or map<string, vector<string> > or whatever.[/color]
          >
          > Using a map makes the question moot, assuming you store (a hash of)
          > sorted words. But consider "edra" which could unscramble to "read",
          > "dear", or "dare". I'm not sure you can solve this problem without
          > knowledge of the context. Perhaps you can use some sort of suffix
          > chaining algorithm? Clearly, there is no 1-1 mapping.[/color]

          If a "scrambled" word just has a random arrangement of letters then clearly
          you can't. You get a number of possible answers. But that's in the OPs
          problem statement as well.

          --
          Sam Holden

          Comment

          • David Rubin

            #6
            Re: [OT] Re: Question About Method

            Sam Holden wrote:[color=blue]
            >
            > On Wed, 22 Oct 2003 14:09:22 -0400,
            > David Rubin <bogus_address@ nomail.com> wrote:[color=green]
            > > Sam Holden wrote:
            > >
            > > [snip - unscramble words][color=darkred]
            > >> Convert each dictionary word into a word with the same letters but in
            > >> sorted order. For example, "cat" -> "act" and "aardvark"=>"aa adkrrv".
            > >> Store the mappings of "sorted words" to words in a multimap<string ,string>[/color]
            > >
            > > What do you do about anagrams?[/color]
            >
            > It's a multimap so they all get put in with the same key.[/color]

            That's my point. There could be more than one unscrambled word per
            scrambled word.
            [color=blue]
            > [...anagrams...] Clearly, there is no 1-1 mapping.
            >
            > If a "scrambled" word just has a random arrangement of letters then clearly
            > you can't. You get a number of possible answers. But that's in the OPs
            > problem statement as well.[/color]

            This is all I saw:

            Currently, I'm working on a program that unscrambles words using a
            600k dictionary file.

            OP doesn't mention anagrams at all. But it seems like an obvious
            concern.

            /david

            --
            Andre, a simple peasant, had only one thing on his mind as he crept
            along the East wall: 'Andre, creep... Andre, creep... Andre, creep.'
            -- unknown

            Comment

            • Jerry Coffin

              #7
              Re: [OT] Re: Question About Method

              In article <3F999A08.E546A C01@nomail.com> , bogus_address@n omail.com
              says...

              [ ... ]
              [color=blue][color=green]
              > > It's a multimap so they all get put in with the same key.[/color]
              >
              > That's my point. There could be more than one unscrambled word per
              > scrambled word.[/color]

              I would use equal_range to find all the words corresponding to a set of
              letters. Except under rather unusual circumstances, the computer
              probably can't pick between them, at least without a LOT of extra work.

              --
              Later,
              Jerry.

              The universe is a figment of its own imagination.

              Comment

              Working...