Java: select 100 random words from String array

  • alex1984
    New Member
    • Oct 2015
    • 8

    Hello, I have a text file dictionary.txt and I'm trying to get first 100 random words from each line of the text file and put them into a String array, but it's not working. I figured out how to separate the words from the text and put them into an array, but I can't figure out where to include the 100 to get them sorted.
    Thank you!!!
    Code:
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Random;
    import java.util.Scanner;
    
    public class New2 
    {
    		 public static void main(String[] args) throws FileNotFoundException
    		 {
    			 Scanner sc = new Scanner(new File("dictionary.txt"));
    	    	 while (sc.hasNext()) 	
    	    		 {
    	    		 String word = sc.next();
    	    		 sc.nextLine();	  
    	    		 String[] wordArray = word.split(" "); 
    	    		 //System.out.println(Arrays.toString(wordArray));
    	    		 int idx = new Random().nextInt(wordArray.length);
    	    		 String random = (wordArray[idx]);
    	    		 System.out.println(random);
    
    			 
    			}
    }
    }
  • chaarmann
    Recognized Expert Contributor
    • Nov 2007
    • 785

    #2
    Use
    Code:
    Collections.sort(arrayList);
    to sort them.
    In order to do so, you must convert your array into an arrayList.
    You can do that with
    Code:
    List<String> arrayList = Arrays.asList(wordArray);
    If you only want the first 100, you can make a sublist:
    Code:
    Collections.sort(arrayList.subList(0,100));

    • alex1984
      New Member
      • Oct 2015
      • 8

      #3
      I tried to do it your way and it threw me an error:
      Code:
      Exception in thread "main" java.lang.IndexOutOfBoundsException: toIndex = 100
      	at java.util.SubList.<init>(Unknown Source)
      	at java.util.RandomAccessSubList.<init>(Unknown Source)
      	at java.util.AbstractList.subList(Unknown Source)
      	at New2.main(New2.java:20)
      Here is the code that I've changed according to your suggestion:
      Code:
      import java.io.File;
      import java.io.FileNotFoundException;
      import java.util.Arrays;
      import java.util.Collections;
      import java.util.List;
      import java.util.Scanner;
      
      public class New2 
      {
      	public static void main(String[] args) throws FileNotFoundException
      		 {
      			 Scanner sc = new Scanner(new File("dictionary.txt"));
      	    	 while (sc.hasNext()) 	
      	    		 {
      	    		 String word = sc.next();
      	    		 sc.nextLine();	  
      	    		 String[] wordArray = word.split(" "); 
      	    		 //System.out.println(Arrays.toString(wordArray));
      	    		 List<String> arrayList = Arrays.asList(wordArray);	    			    		 
      	    		 Collections.sort(arrayList.subList(0, 100));
      
      
      // Here I tried to print out that arrayList.
      // It throws the same error even if I remove the code below.
      
      
      	    		 for (int i = 0; i < arrayList.size(); i++) 
      	    		 {
      	    			    String value = arrayList.get(i);
      	    			   System.out.println(value);
      	    		 }
      	    		    
      	    		 }
      		 }			 
      }
      Last edited by alex1984; Oct 28 '15, 12:52 AM. Reason: Forgot to insert Code

      • chaarmann
        Recognized Expert Contributor
        • Nov 2007
        • 785

        #4
        So that means one of your lines has fewer than 100 words! I did not assume that, because you wrote "get first 100 random words from each line".

        So in case the line has fewer than 100 words, you only get what's available. (Or do you want to print out an error message instead?)

        So instead of:
        Code:
        arrayList.subList(0, 100)
        use:
        Code:
        arrayList.subList(0, Math.min(100, arrayList.size()))
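
To see the clamped upper bound in isolation, here is a minimal sketch. The class name SubListClamp and the helper firstN are made up for this illustration; only subList and Math.min come from the suggestion above.

```java
import java.util.Arrays;
import java.util.List;

class SubListClamp {
    // Returns at most 'limit' leading elements and never throws
    // IndexOutOfBoundsException when the list is shorter than 'limit'.
    static <T> List<T> firstN(List<T> list, int limit) {
        return list.subList(0, Math.min(limit, list.size()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "cherry");
        System.out.println(firstN(words, 100)); // all three words, no exception
        System.out.println(firstN(words, 2));   // [apple, banana]
    }
}
```

Note that subList returns a view of the original list, so sorting or shuffling the sublist also reorders that part of the original.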

        Oh wait, does "get first 100 random words from each line" mean that there are 100 lines and you get a single word, at a random position, from each line?
        Or does that mean: get 100 words from each line and then shuffle them randomly?

        Because I described the second meaning. But there was still a small mistake on my side: use Collections.shuffle() instead of Collections.sort().

        • alex1984
          New Member
          • Oct 2015
          • 8

          #5
          So that dictionary file has almost 1000 lines in it. The first word of each line is the actual word and the rest of the line is its definition. I need to pull 100 random first words from those lines and put them into an array or ArrayList, and then I'll print them into a separate file, but that's a different story.
          I did try Collections.shuffle, but it doesn't shuffle them, it just prints the whole array with all the first words from each line. However, if I declare an array and add my words into it, Collections.shuffle works without a problem.

          • alex1984
            New Member
            • Oct 2015
            • 8

            #6
            Do you know if there is a way to get 100 random lines from the text file, separate the first words from those lines, and then load them into an array?

            • chaarmann
              Recognized Expert Contributor
              • Nov 2007
              • 785

              #7
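              Maybe the shuffle did not work because you shuffled a temporary list and then threw it away?
              Like:
              Code:
              Collections.shuffle(Arrays.asList(arr));
              Here, Arrays.asList() wraps the original String[] arr in a fixed-size list and returns it. That list is a view of the array, not a copy, so the shuffle does reorder arr, but the list itself is never stored. (If arr were a primitive array such as int[], asList would wrap the whole array as a single element and the shuffle would have no visible effect.) So if we refactor this statement, we get:
              Code:
              String[] arr = { "a", "ab", "b", "c" };
              List<String> al = Arrays.asList(arr);
              Collections.shuffle(al);
              // al is backed by arr, so both lines print the same shuffled order
              System.out.println("original array " + Arrays.asList(arr));
              System.out.println("shuffled array " + al);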
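
One detail worth checking for yourself: per the Java documentation, Arrays.asList on a String[] returns a fixed-size list backed by the array, so a shuffle through that list reorders the array, while a shuffle on a new ArrayList built from it does not. A minimal sketch of the difference follows; the class and method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class AsListViewDemo {
    // Shuffles through a list view: the array itself is reordered.
    static void shuffleInPlace(String[] arr, Random rng) {
        Collections.shuffle(Arrays.asList(arr), rng);
    }

    // Shuffles an independent copy: the array keeps its original order.
    static List<String> shuffledCopy(String[] arr, Random rng) {
        List<String> copy = new ArrayList<>(Arrays.asList(arr));
        Collections.shuffle(copy, rng);
        return copy;
    }

    public static void main(String[] args) {
        String[] arr = { "a", "ab", "b", "c" };

        List<String> copy = shuffledCopy(arr, new Random());
        // arr is untouched here, so this still prints [a, ab, b, c]
        System.out.println("array after shuffling a copy: " + Arrays.toString(arr));
        System.out.println("shuffled copy: " + copy);

        shuffleInPlace(arr, new Random());
        // arr now holds the same elements in whatever order the shuffle produced
        System.out.println("array after shuffling a view: " + Arrays.toString(arr));
    }
}
```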
              Experts usually work with Collections and not with plain arrays. Beginners usually only learn about arrays at university, but not about Collections (List, Map, etc.). Your task in particular would be simpler and need less code if you used only collections.

              Use the following algorithm:
              1.) define an empty ArrayList al
              2.) read all text lines and put only the first word of each line into your al with
              Code:
              al.add(firstWord)
              3.) at the end, shuffle your array list (see above)
              4.) print only first 100 entries with:
              Code:
              System.out.println("first 100=" + al.subList(0, Math.min(100, al.size())));
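
Put together, the four steps above could look roughly like the sketch below. It assumes the first word of a line is everything before the first whitespace and that blank lines should be skipped. The class name RandomWords, the helper methods, and the small fallback sample used when dictionary.txt is missing are all made up for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RandomWords {
    // Steps 1 and 2: collect the first word of every non-blank line.
    static List<String> firstWords(List<String> lines) {
        List<String> al = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines carry no word
            }
            al.add(trimmed.split("\\s+")[0]);
        }
        return al;
    }

    // Steps 3 and 4: shuffle a copy, then keep at most 'count' entries.
    static List<String> pickRandom(List<String> words, int count, Random rng) {
        List<String> shuffled = new ArrayList<>(words);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, Math.min(count, shuffled.size())));
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("dictionary.txt");
        // Fallback sample (hypothetical data) so the sketch also runs without the file.
        List<String> lines = Files.exists(path)
                ? Files.readAllLines(path)
                : Arrays.asList("apple a fruit", "bird an animal", "cat a pet");
        List<String> picked = pickRandom(firstWords(lines), 100, new Random());
        System.out.println("first 100=" + picked);
    }
}
```

If you need a String[] at the end, picked.toArray(new String[0]) converts the result.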
