How to lookup into dictionary and split sentences from file into words?

  • phuonghanu
    New Member
    • Nov 2010
    • 9

    How to lookup into dictionary and split sentences from file into words?

    hi,
    I'm working on a problem in which I have to scan a text file (containing one or more sentences) and print each word of a sentence separately.

    We give the Perl program a Sentence.txt file as input. It looks the words up in a dictionary file (.txt) and then prints the words of the sentence(s) that appear in the dictionary.

    For example: "today is Saturday". After the dictionary lookup, if the words match, Perl prints: today, is, saturday (one word per line).



    Hope that you can help me. [point in pic to see more in detail ^^]
  • numberwhun
    Recognized Expert Moderator Specialist
    • May 2007
    • 3467

    #2
    My first request would be that you add the following two lines to the beginning of the script:

    Code:
    use strict;
    use warnings;
    Those will expose the simple errors, mostly syntactical and such, so that you can clear them out of the way and specify the real error you are encountering.

    Second, you need to please specify what you are seeing as wrong. You told us what you are trying to do, but failed to either ask a question or specify the error or even what is going wrong.

    Regards,

    Jeff

    Comment

    • phuonghanu
      New Member
      • Nov 2010
      • 9

      #3
      ^^ Thanks, Jeff. I usually put strict and warnings in my code, but this time I forgot them.

      Before adding those lines I did not see any errors when compiling the code, but the code did not actually run.

      Now it seems that I have many problems with variable and function declarations.

      Comment

      • RonB
        Recognized Expert Contributor
        • Jun 2009
        • 589

        #4
        When posting code, it's best to use the code tags instead of posting a graphic image.

        What happens when you fix those errors, or do you not know what those errors mean or how to fix them?

        Comment

        • RonB
          Recognized Expert Contributor
          • Jun 2009
          • 589

          #5
          Since you never declared and assigned anything to the @FILE1 array, what do you expect this line to do?
          Code:
          foreach $word (@FILE1)

          Comment

          • phuonghanu
            New Member
            • Nov 2010
            • 9

            #6
            @RonB: I did not hesitate to post a question here because I'm a newbie and I really want to learn something about Perl.

            About this line of code: I treated the Dictionary file as an array of strings so that I could compare the words in the sentences with the words in the dictionary.

            Do you have any suggestions about the declaration and the line of code above?

            Comment

            • numberwhun
              Recognized Expert Moderator Specialist
              • May 2007
              • 3467

              #7
              From the last screenshot of errors you posted, it looks like you definitely put the pragmas in place. What you have to realize is that when you use them, there are certain things you MUST do. For instance, when declaring variables, you now have to put the "my" keyword in front of them:

              Code:
              my $line = "test";
              You will have to do that once for each variable, either where it is first used or in a section at the top of the script (after the pragmas) that simply declares each variable.
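A minimal sketch of what the pragmas expect, assuming nothing about the original script (the variable names here are only illustrative):

```perl
use strict;
use warnings;

# Under "use strict", each variable is declared once with "my",
# at or before its first use.
my $line  = "today is saturday";   # scalar
my @words = split ' ', $line;      # array, filled by splitting on whitespace
my %seen;                          # hash, declared empty

# Later uses of the same variables do not repeat "my".
$seen{$_}++ for @words;

print "$_\n" for @words;
```

Declaring a variable where it is first needed, rather than in one block at the top, keeps its scope small and is the more common style.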

              Regards,

              Jeff

              Comment

              • RonB
                Recognized Expert Contributor
                • Jun 2009
                • 589

                #8
                The mere act of opening a filehandle does not place the contents of that file into an array. You need to read/parse the file.

                Please provide more details.

                Does each line in your dictionary file consist of a single word, or a phrase that needs to be matched?

                If the dictionary lines consist of single words, does every word of a line in text.txt need to be matched for the line to be considered successful and printed?

                You should parse the dictionary file and put its contents into a hash, which makes the matching both simpler and more efficient.
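A rough sketch of the hash approach, under a few assumptions: one entry per line, case-insensitive matching, and an in-memory string standing in for the real dictionary file so the example runs on its own. The sub name load_dictionary is hypothetical.

```perl
use strict;
use warnings;

# Read one entry per line from an open filehandle into a hash, so that
# a lookup is a single exists() test instead of a loop over an array.
sub load_dictionary {
    my ($fh) = @_;
    my %dict;
    while (my $entry = <$fh>) {
        chomp $entry;
        next if $entry eq '';       # skip blank lines
        $dict{ lc $entry } = 1;     # lowercase keys for case-insensitive lookup
    }
    return \%dict;
}

# Demo data in place of the real file (an assumption for this sketch).
my $sample = "today\nis\ndo not\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $dict = load_dictionary($fh);
close $fh;

# Print only the words that appear in the dictionary.
for my $word (qw(Today is Saturday)) {
    print "$word\n" if exists $dict->{ lc $word };
}
```

With a real file, the open call would name the file instead of taking a reference to a string.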

                Comment

                • phuonghanu
                  New Member
                  • Nov 2010
                  • 9

                  #9
                  Actually, each line in my dictionary file holds one of two kinds of "words": a single word, or a "phrase" (e.g. do not, everyone's) that needs to be matched as a whole.

                  The program should prefer the longer entry over the shorter one. For example, given "Harvard University", it should choose "harvard university" instead of "harvard" or "university" alone, even though all three entries appear in the dictionary.
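The "prefer the longer entry" rule could be sketched as a greedy longest match over words. This is only one way to do it, and it assumes lowercase dictionary keys, phrases separated by single spaces, and a known maximum phrase length in words; longest_matches and its parameters are hypothetical names.

```perl
use strict;
use warnings;

# Greedy longest match: at each word position, try the longest phrase
# first and fall back to shorter ones.
sub longest_matches {
    my ($sentence, $dict, $max_words) = @_;
    my @words = split ' ', lc $sentence;
    my @found;
    my $i = 0;
    while ($i < @words) {
        my $matched = 0;
        my $limit   = $max_words;
        $limit = @words - $i if $limit > @words - $i;   # do not run past the end
        for (my $n = $limit; $n >= 1; $n--) {
            my $candidate = join ' ', @words[$i .. $i + $n - 1];
            if (exists $dict->{$candidate}) {
                push @found, $candidate;
                $i += $n;            # consume every word of the phrase
                $matched = 1;
                last;
            }
        }
        $i++ unless $matched;        # no entry starts here; skip this word
    }
    return @found;
}

my %dict = map { $_ => 1 }
    ('harvard', 'university', 'harvard university', 'is');

print "$_\n" for longest_matches('Harvard University is old', \%dict, 2);
```

Because the two-word phrase is tried before either single word, "harvard university" wins over "harvard" and "university" on their own.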

                  Comment

                  • RonB
                    Recognized Expert Contributor
                    • Jun 2009
                    • 589

                    #10
                    Ok, then the starting point from here is for you to take our suggestions and rework your script then post back with your script and its results and an updated question based on those results.

                    Comment

                    • phuonghanu
                      New Member
                      • Nov 2010
                      • 9

                      #11
                      OK ^^ I have almost got the result, but there is a big problem now. Here is my code (leaving use strict and use warnings aside for now):

                      Code:
                      #!/usr/bin/perl
                      
                      print "input sentence you want to split: \n";
                      
                      my $string = <STDIN>;
                      
                      chomp $string;
                      
                      # In-memory dictionary of single words and phrases
                      @dict = (
                              "if",
                              "error",
                              "does not",
                              "he",
                              "imperfect",
                               );
                      
                      $pos = 0;
                      
                      while ($pos < length ($string))
                      {
                             $myword = "";
                             
                             # Find the longest dictionary entry that starts exactly at $pos
                             foreach $word (@dict)
                             {
                                     $newpos = index ($string, $word, $pos);
                                     if ($newpos == $pos && length ($word) > length ($myword))
                                     {
                                              $myword = $word;
                                     }
                             }
                             if ($myword)
                             {
                                     print $myword. "\n";
                                     $pos += length ($myword);
                             }
                             else
                             {
                                     $pos ++;
                             }
                      }
                      If I type the sentence "if he does not imperfect", Perl successfully prints:
                      if
                      he
                      does not
                      imperfect

                      But the big trouble is that this dictionary is not a file; it is just an array. If I open a dictionary file, it does not work like that :( How can I open and read a dictionary file and make it work like the "dictionary" above?

                      Comment

                      • RonB
                        Recognized Expert Contributor
                        • Jun 2009
                        • 589

                        #12
                        Code:
                        open my $dictionary, '<', 'dictionary.txt'
                          or die "Failed to open 'dictionary.txt' $!";
                        
                        chomp(my @dictionary = <$dictionary>);

                        Comment

                        • phuonghanu
                          New Member
                          • Nov 2010
                          • 9

                          #13
                          ^^ First of all, thanks a lot, RonB.
                          I'm nearly there now.

                          For example, when i type: if error rate of ham is imperfect

                          perl will print out:

                          if
                          error
                          rate
                          of
                          ham
                          is

                          So there is a word ("imperfect") that does not appear in the result.

                          Another example: if everyone's imperfect so error rate of ham is high

                          Perl prints everything except "imperfect" and "so". At first I thought their position mattered, but when I change their positions Perl still does not catch them. I wonder whether the same thing happens with other words :(

                          Any suggestions?
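The thread does not establish why those two words are missed. One possible cause, offered only as a guess, is invisible characters in the dictionary file: chomp removes the trailing newline but leaves trailing spaces, and on some platforms a carriage return. A sketch of a check and cleanup follows; clean_entry and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Remove the line ending (Unix or Windows) and any surrounding blanks.
sub clean_entry {
    my ($entry) = @_;
    $entry =~ s/\r?\n\z//;        # strip "\n" or "\r\n" at the end
    $entry =~ s/^\s+|\s+$//g;     # trim leading and trailing whitespace
    return $entry;
}

# Sample lines as they might come out of a file (an assumption).
my @raw = ("if\n", "imperfect \n", "so\r\n", "\n");

# Clean every line and drop the ones that end up empty.
my @dictionary = grep { length $_ } map { clean_entry($_) } @raw;

# Brackets make leftover invisible characters easy to spot.
print "[$_]\n" for @dictionary;
```

Printing the raw entries inside brackets before cleaning them would show whether this is actually the problem with a given dictionary file.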

                          Comment

                          • numberwhun
                            Recognized Expert Moderator Specialist
                            • May 2007
                            • 3467

                            #14
                            Looking at your code, I am guessing you chose not to deal with the errors produced by the pragmas. Sorry, I will not assist you with this code unless the pragmas are in place and you deal with the errors it brings up.

                            There is no reason for us to have to deal with your syntactical errors.

                            Regards,

                            Jeff

                            Comment

                            • phuonghanu
                              New Member
                              • Nov 2010
                              • 9

                              #15
                              Thanks, numberwhun. I've fixed the syntactical errors that you pointed out earlier.

                              Now I can read from a file, look the text up in the dictionary, and split out both single words and phrases.

                              However, to improve the Perl code, I'm trying to print not only the words that appear in the dictionary but also the words that are not in it.

                              For example: Today I do not want to do the exercises

                              The dictionary contains: today, I, do not, to, do (for example)

                              What I want is for the Perl code to split and print not only "today", "I", "do not", "to" and "do" but also "want", "the" and "exercises".

                              Any suggestions for my code?

                              The code is below:
                              Code:
                              #!/usr/bin/perl
                              #this is file test.pl
                              
                              use strict;
                              use warnings;
                              
                              open my $dictionary, '<', 'Dictionary.txt' or die $!;
                              chomp(my @dictionary = <$dictionary>);
                              print ("\nCongratulation!This is result:\n");
                              
                              while (my $string =<>)
                              {
                                   # separator between sentences when the file contains more than one
                                   print "\n----------\n";
                                   print "$string\n";
                                   my $pos = 0;
                                   while ($pos <= length ($string))
                                   {
                                        my $myword = "";
                                        foreach my $word (@dictionary)
                                        {
                                            my $newpos = index ($string, $word, $pos);
                                            if ($newpos == $pos && length($word)>length ($myword))
                                            {
                                                $myword = $word;
                                            }
                                        }
                                        if ($myword)
                                        {
                                             print "\n$myword\n";
                                             $pos += length ($myword);
                                        } 
                                        else
                                        {
                                             $pos ++;
                                        }
                                   }       
                              }
                              Input: sentence.txt, Dictionary.txt
                              sentence.txt contains 2 sentences, for example:
                              - today I do not want to do the exercise
                              - he is very handsome
                              Run: perl test.pl sentence.txt
                              Perl prints:
                              ---------------
                              today I do not want to do the exercise
                              today
                              I
                              do not
                              to
                              do
                              ---------------
                              he
                              is
                              very
                              handsome

                              The words "want", "the" and "exercise" do not appear in the dictionary (in this example), so Perl does not print them. Please help me with this.
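One way to also print the words that are not in the dictionary, sketched under some assumptions: tokens are separated by whitespace, matching is case-insensitive, and a dictionary entry only counts if it ends at a word boundary. The last point differs from the script above, which would also accept "he" inside "the". The sub name split_sentence is hypothetical.

```perl
use strict;
use warnings;

# Split a sentence into tokens: the longest dictionary entry starting at
# the current position if there is one, otherwise the next plain word.
sub split_sentence {
    my ($string, @dictionary) = @_;
    my @tokens;
    my $pos = 0;
    my $len = length $string;
    while ($pos < $len) {
        # Skip whitespace between tokens.
        if (substr($string, $pos, 1) =~ /\s/) { $pos++; next; }

        my $best = '';
        for my $word (@dictionary) {
            next unless length($word) > length($best);
            # The entry must start exactly at $pos ...
            next unless lc(substr($string, $pos, length $word)) eq lc($word);
            # ... and end at whitespace or at the end of the string.
            my $end = $pos + length $word;
            next if $end < $len && substr($string, $end, 1) !~ /\s/;
            $best = substr($string, $pos, length $word);
        }

        # No dictionary entry here: fall back to the next run of non-blanks.
        if ($best eq '') {
            ($best) = substr($string, $pos) =~ /^(\S+)/;
        }

        push @tokens, $best;
        $pos += length $best;
    }
    return @tokens;
}

my @dictionary = ('today', 'I', 'do not', 'to', 'do');
print "$_\n"
    for split_sentence('today I do not want to do the exercise', @dictionary);
```

Punctuation is not handled here: a word followed directly by a comma or full stop would fail the boundary check and be printed with its punctuation attached.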

                              Comment
