split a text

  • mickey0
    New Member
    • Jan 2008
    • 142


    Hello,
    I have to read a text, and I decided to split it with the Scanner class.
    Code:
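    	// Needs: import java.util.Scanner; import java.util.Vector;
    	private Vector<String> splitWords(Scanner s) {
    		Vector<String> v = new Vector<String>();
    		while ( s.hasNext() ) {
    			String token = s.next();
    			// Compare contents with equals(); == only compares object references
    			if ( token.equals(".") || token.equals(";") || token.equals(",") || token.equals(":") ) {
    				continue; // skip single punctuation tokens
    			}
    			v.add( token );
    		}
    		return v;
    	}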
    It returns a vector with all the tokens. However, as you can see, I'd like to keep symbols such as ; , . : ? ) ( etc. out of the vector. I can do it as above (with that ugly if), but I have a text like
    "heloworls..... ............... ..."
    How can I get rid of the "............... ............" in a good way? (The scanner doesn't split the ".............. ............... ..." into single symbols, so the check above doesn't catch them.)

    Thanks
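A side note on the filter in the first post: `==` on two `String`s compares object references, not text, so `token == "."` is generally false for a token read by `Scanner.next()`. Even a content-based check only matches the exact strings listed, so a run like `"..."` slips through. A minimal sketch of the content-based check (the class and method names are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

class PunctuationFilter {
    // The single-character tokens the loop in the first post wants to drop
    private static final List<String> SKIP = Arrays.asList(".", ";", ",", ":");

    // List.contains() compares with equals(), i.e. by content, not by reference
    static boolean isSkippedPunctuation(String token) {
        return SKIP.contains(token);
    }

    public static void main(String[] args) {
        // new String(...) always creates a distinct object, much like a token from Scanner.next()
        String token = new String(".");
        System.out.println(token == ".");                // false: two different objects
        System.out.println(isSkippedPunctuation(token)); // true: same characters
        System.out.println(isSkippedPunctuation("...")); // false: a run of dots is not in the list
    }
}
```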
  • Nepomuk
    Recognized Expert Specialist
    • Aug 2007
    • 3111

    #2
    Well, have a look at regular expressions. They should do the job. Especially make sure to read this.

    Greetings,
    Nepomuk
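A minimal sketch of the regular-expression idea, assuming a "word" means a run of `\w` characters: instead of describing what to throw away, match what to keep with `java.util.regex.Pattern` and `Matcher.find()`. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWords {
    // \w+ matches a run of word characters ([a-zA-Z_0-9] by default)
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Collects every run of word characters; dots, brackets and spaces are never part of a match
    static List<String> words(String text) {
        List<String> result = new ArrayList<String>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("heloworls..... ............... ...")); // [heloworls]
    }
}
```

Note that with the default `\w`, an apostrophe or an accented letter also counts as a separator, so "don't" comes back as "don" and "t".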


    • samido
      New Member
      • Oct 2007
      • 52

      #3
      String tokenization, my friend: use this class...!
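Assuming "this class" means `java.util.StringTokenizer` (the next reply suggests so), a sketch of that approach: every character in the delimiter string acts as a separator, delimiters are not returned, and runs of them produce no empty tokens. The delimiter set below is an assumption; any symbol not listed stays attached to the word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerWords {
    // Assumed delimiter set: whitespace plus some common punctuation
    private static final String DELIMS = " \t\n\r\f.,;:?!()";

    static List<String> words(String text) {
        List<String> result = new ArrayList<String>();
        // Each character of DELIMS separates tokens; the delimiters themselves are discarded
        StringTokenizer st = new StringTokenizer(text, DELIMS);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("heloworls..... ............... ...")); // [heloworls]
    }
}
```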


      • JosAH
        Recognized Expert MVP
        • Mar 2007
        • 11453

        #4
        Originally posted by samido
        String tokenization, my friend: use this class...!
        Tokenizers are the old-fashioned way of doing things; Scanners or a simple String.split() handle the job much better (by using regular expressions).

        kind regards,

        Jos
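A minimal sketch of the `String.split()` route, splitting on `\W+` so that any run of non-word characters counts as one separator. One detail worth handling: if the text starts with a separator, `split` returns an empty first element (trailing empty strings are dropped), so the sketch filters empties. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class SplitWords {
    static List<String> words(String text) {
        List<String> result = new ArrayList<String>();
        // \W+ treats any run of non-word characters (spaces, dots, brackets...) as one separator
        for (String part : text.split("\\W+")) {
            // A leading separator yields an empty first element, so skip empties
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("(heloworls..... ............... ...")); // [heloworls]
    }
}
```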


        • mickey0
          New Member
          • Jan 2008
          • 142

          #5
          Originally posted by Nepomuk
          Well, have a look at regular expressions. They should do the job. Especially make sure to read this.

          Greetings,
          Nepomuk
          Hi, I tried this to skip every non-word character, but it doesn't work at all.
          Code:
          		while ( s.hasNext() ) {
          			try {
          				s.skip("\\W");
          			}
          			catch (NoSuchElementException ex) {
          				
          			}
          			
          			String token = s.next();
          What's wrong, please?
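To see what that loop actually does, here is a small harness around it (the class and method names are invented for the demonstration). According to the `Scanner.skip` documentation, the pattern must match at the current position, ignoring delimiters; otherwise nothing is skipped and `NoSuchElementException` is thrown. Since `"\\W"` is a single character, at most one non-word character is skipped per call, and `next()` still splits on whitespace only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Scanner;

class SkipDemo {
    // Mirrors the loop from the question so its output can be inspected
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<String>();
        Scanner s = new Scanner(text);
        while (s.hasNext()) {
            try {
                // Skips exactly one non-word character, and only if it is right at the current position
                s.skip("\\W");
            } catch (NoSuchElementException ex) {
                // No match at the current position: nothing was skipped
            }
            // The default delimiter is whitespace, so punctuation stays glued to the token
            result.add(s.next());
        }
        s.close();
        return result;
    }

    public static void main(String[] args) {
        // The runs of dots still come back as (parts of) tokens
        System.out.println(tokens("heloworls..... ............... ..."));
    }
}
```

In practice the `skip` call only eats the single space between tokens, which `next()` would have skipped anyway.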


          • JosAH
            Recognized Expert MVP
            • Mar 2007
            • 11453

            #6
            Try this: s.useDelimiter("\\W+");

            kind regards,

            Jos
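A minimal sketch of the `useDelimiter` suggestion applied to a whole text (names are illustrative): with `\W+` as the delimiter, any run of spaces, dots, brackets and so on separates tokens, so `next()` only ever returns word characters. Unlike `String.split`, the Scanner skips leading delimiters, so no empty first token appears. As with any `\W`-based approach, apostrophes and accented letters also act as separators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterWords {
    static List<String> words(String text) {
        List<String> result = new ArrayList<String>();
        // Any run of non-word characters now counts as a single delimiter
        Scanner s = new Scanner(text).useDelimiter("\\W+");
        while (s.hasNext()) {
            result.add(s.next());
        }
        s.close();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world; (a) test...")); // [Hello, world, a, test]
    }
}
```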
