remove certain words from a c++ string

  • prasanna.hariharan@gmail.com

    remove certain words from a c++ string

    Hi guys,

    I want to remove certain words from a C++ string. The list of words is
    in a file, with each word on a new line. I tried using
    std::transform, but it didn't work.

    Anybody got a clue as to how I should go about this?

    thanks a lot,
    Hp

  • Andrej Hristoliubov

    #2
    Re: remove certain words from a c++ string


    prasanna.hariharan@gmail.com wrote:
    > Hi guys,
    >
    > I want to remove certain words from a C++ string. The list of words is
    > in a file, with each word on a new line. I tried using
    > std::transform, but it didn't work.
    >
    > Anybody got a clue as to how I should go about this?
    >
    > thanks a lot,
    > Hp


    Try using string::find and string::erase (I added swap for
    optimization, you don't have to):

    example:

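    string str = "Hello world is the only assignment I can do",
           remword = "world";
    size_t pos;

    // find() returns string::npos when remword does not occur in str
    if ((pos = str.find(remword)) != string::npos)
    {
        // erase() edits str in place and returns a reference to it,
        // so the swap is a self-swap and can be dropped
        str.swap(str.erase(pos, remword.length()));
    }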





    ps. I rule!


    • Dave Rahardja

      #3
      Re: remove certain words from a c++ string

      On 22 Oct 2005 16:33:04 -0700, prasanna.hariharan@gmail.com wrote:

      >Hi guys,
      >
      >I want to remove certain words from a C++ string. The list of words is
      >in a file, with each word on a new line. I tried using
      >std::transform, but it didn't work.
      >
      >Anybody got a clue as to how I should go about this?

      transform() doesn't remove entries in a container, it only modifies them.

      Use string::find() to find the substring, then use string::erase() to remove
      the substring from the string.

      -dr


      • Hp

        #4
        Re: remove certain words from a c++ string

        Hi, thanks a lot for your replies. But I figured out that before the
        word removal from the string, I need to convert the C++ string from
        upper to lower case. I in fact used transform() to perform this
        operation, which didn't work.
        And also, after the upper-to-lower case conversion, I need to read the
        file containing all the stopwords, one on each line, to be removed from
        the transformed string.

        Thanks a lot in advance,
        Hp


        • Jonathan Mcdougall

          #5
          Re: remove certain words from a c++ string

          Hp wrote:
          > Hi, thanks a lot for your replies. But I figured out that before the
          > word removal from the string, I need to convert the C++ string from
          > upper to lower case. I in fact used transform() to perform this
          > operation, which didn't work.

          How did you do it? How didn't it work?

          > And also, after the upper-to-lower case conversion, I need to read the
          > file containing all the stopwords, one on each line, to be removed from
          > the transformed string.

          Show us some code! Reading a line from a file is a basic operation
          described in any textbook (good or bad).

          1) get the string
          2) convert it to lower case
          3) read the lines from the file
          4) search the string for the words you just read and remove each one


          Jonathan


          • puzzlecracker

            #6
            Re: remove certain words from a c++ string


            Hp wrote:
            > Hi, thanks a lot for your replies. But I figured out that before the
            > word removal from the string, I need to convert the C++ string from
            > upper to lower case. I in fact used transform() to perform this
            > operation, which didn't work.
            > And also, after the upper-to-lower case conversion, I need to read the
            > file containing all the stopwords, one on each line, to be removed from
            > the transformed string.
            >
            > Thanks a lot in advance,
            > Hp

            stopwords?? (like the, an, a) -- sounds like a problem from data mining.
            What course are you taking?

            I think I've dealt with it (long ago in my academic career!!!!)


            • Hp

              #7
              Re: remove certain words from a c++ string

              Hey Puzzlecracker, it's exactly a problem from data mining... yes, the
              stopwords are in a file, with each stop word on a line.

              Hi Jonathan, thanks for your replies. I used the following code to
              convert the string from upper to lower case:
              std::transform(file.begin(), file.end(), file.begin(), (int(*)(int))std::tolower);

              file: is the string from which stopwords need to be removed
              Thanks a lot,
              Hp


              • puzzlecracker

                #8
                Re: remove certain words from a c++ string


                Hp wrote:
                > Hey Puzzlecracker, it's exactly a problem from data mining... yes, the
                > stopwords are in a file, with each stop word on a line.
                >
                > Hi Jonathan, thanks for your replies. I used the following code to
                > convert the string from upper to lower case:
                > std::transform(file.begin(), file.end(), file.begin(), (int(*)(int))std::tolower);
                >
                > file: is the string from which stopwords need to be removed
                > Thanks a lot,
                > Hp

                explain "(int(*)(int))std::tolower" in the transform call? not quite sure
                what that casting is all about.

                thanks

                ps I assume you didn't just blindly copy the code.


                • Hp

                  #9
                  Re: remove certain words from a c++ string

                  Hi, I figured out how to do the case conversion; it was a casting
                  error which I took care of. Thanks for the hint, Puzzlecracker.
                  I tried using Andrej's piece of code to remove the stop words, but could
                  not get it to work. Any hint on stopword removal would be greatly
                  appreciated, as I'm a novice to C++.
                  Thanks, Hp


                  • puzzlecracker

                    #10
                    Re: remove certain words from a c++ string


                    Hp wrote:
                    > Hi, I figured out how to do the case conversion; it was a casting
                    > error which I took care of. Thanks for the hint, Puzzlecracker.
                    > I tried using Andrej's piece of code to remove the stop words, but could
                    > not get it to work. Any hint on stopword removal would be greatly
                    > appreciated, as I'm a novice to C++.
                    > Thanks, Hp


                    easy:

                    1. populate all stop words into a set
                    2. read all words from the file and, as you read, check
                    whether each word is a stop word (use lexicographical_compare to avoid case
                    issues). If it is, discard it; otherwise put it into a vector.

                    I will start:
                    #include <fstream>
                    #include <iostream>
                    #include <set>
                    #include <string>
                    #include <vector>

                    using namespace std;

                    // fills the set with the stop words (left for you to write)
                    void initialize(set<string>& stopWset);

                    int main(int argc, char *argv[])
                    {
                        set<string> stopWset;
                        vector<string> wordvec;
                        ifstream in("input.txt");

                        if (!in)
                        {
                            cerr << "cannot open input.txt\n"; // report error
                            return 1;
                        }

                        initialize(stopWset);

                        string word;
                        while (in >> word)
                            if (stopWset.find(word) == stopWset.end()) // not a stop word: keep it
                                wordvec.push_back(word);

                        return 0;
                    }



                    you get the idea. Or you suggest reading the entire file at once?


                    • Hp

                      #11
                      Re: remove certain words from a c++ string

                      Hi puzzlecracker, I got the idea, wherein we are putting all the
                      non-stopwords into a vector of strings.
                      Here, if I am not wrong, input.txt is the file that has the list of
                      stopwords. Which one is the string that has the contents with the
                      stopwords and non-stopwords?
                      And what does initialize do?
                      Thanks


                      • Karl Heinz Buchegger

                        #12
                        Re: remove certain words from a c++ string

                        Hp wrote:
                        >
                        > Hi puzzlecracker, I got the idea, wherein we are putting all the
                        > non-stopwords into a vector of strings.

                        No.
                        In puzzlecracker's code

                        stopWset stands for the 'set of stop words'
                        wordvec is the vector of words you read from your input and which are
                        (after the loop has finished) not stop words

                        > Here, if I am not wrong, input.txt is the file that has the list of
                        > stopwords.

                        That's why it is called 'input' :-)
                        input is the file you want to check against the stop words

                        > Which one is the string that has the contents with the
                        > stopwords and non-stopwords?
                        > And what does initialize do?

                        What do you think?
                        There are 2 file operations going on in the whole program
                        * one deals with your input
                        * the second one deals with the file of stop words

                        so if the loop handles your input file, what do you think
                        will be the job of initialize(stopWset)? Especially when one
                        takes into account that it gets passed 'stopWset'.


                        --
                        Karl Heinz Buchegger
                        kbuchegg@gascad.at


                        • Hp

                          #13
                          Re: remove certain words from a c++ string

                          Hi All,
                          Thanks a lot for all your replies.

                          My requirement is as follows:
                          I need to read a text file, eliminate certain special characters (like !
                          , - = +), and then convert it to lower case and then remove certain
                          stopwords (like and, a, an, by, the, etc.) which are listed in another text
                          file.
                          Then, I need to run it through a stemmer (a program which converts words
                          like "running" to "run", i.e., converts them to root words).
                          Then I need to create a term-by-document matrix, which would be a
                          matrix where M(i,j) gives the number of times the term j occurs
                          in the document i.

                          My situation as of now is as below:
                          I have read the file contents into a string variable, replaced
                          the special characters with spaces using the replace function, and
                          then converted the whole string to lower case using the transform
                          function.

                          I would really appreciate any help, thanks in advance.

                          Thanks,
                          Hp


                          • Greg

                            #14
                            Re: remove certain words from a c++ string

                            Hp wrote:
                            > Hi All,
                            > Thanks a lot for all your replies.
                            >
                            > My requirement is as follows:
                            > I need to read a text file, eliminate certain special characters (like !
                            > , - = +), and then convert it to lower case and then remove certain
                            > stopwords (like and, a, an, by, the, etc.) which are listed in another text
                            > file.
                            > Then, I need to run it through a stemmer (a program which converts words
                            > like "running" to "run", i.e., converts them to root words).
                            > Then I need to create a term-by-document matrix, which would be a
                            > matrix where M(i,j) gives the number of times the term j occurs
                            > in the document i.
                            >
                            > My situation as of now is as below:
                            > I have read the file contents into a string variable, replaced
                            > the special characters with spaces using the replace function, and
                            > then converted the whole string to lower case using the transform
                            > function.
                            >
                            > I would really appreciate any help, thanks in advance.
                            >
                            > Thanks,
                            > Hp

                            I know this may sound sacrilegious in a C++ newsgroup and all, but
                            does the text processing program have to be written in C++?

                            There are several dedicated text processing tools such as awk or sed,
                            or scripting languages (like Perl) that are specifically designed for
                            text stream editing. While certainly none of these alternatives is
                            particularly accessible, none has a steep learning curve either.

                            The power of regular expressions for manipulating text is difficult to
                            match in a C++ program without such support, at least in my experience.
                            And since I am not (too much of) a language snob, I recommend choosing
                            the best language for the job, even if it's not the best language. For
                            example, lowercasing a file's content with sed is a simple command

                            sed -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/' inputfile

                            Writing a C++ program to do the same would be more involved. The good news
                            is that tr1's regex brings regular expression support to C++. So if a
                            C++ solution is required, I would look at regex to see whether it can
                            help solve your problem.

                            And if you do write the program in a language other than C++, some here
                            will be able to forgive you. But just don't tell your friends what you
                            have done.

                            Greg


                            • Hp

                              #15
                              Re: remove certain words from a c++ string

                              Yeah Greg, I do need to have it coded in C++.
                              Thanks for your reply though. I still haven't found a solution to that.


