CountWords Assn.

  • slapsh0t11
    New Member
    • Oct 2009
    • 16

    I would greatly appreciate it if any one of you kind souls could take some time to help me out with an interesting bug in my program. I have tried many times to find the source of the problem unsuccessfully and believe that a second set of eyes will do wonders. Thanks in advance for reading my post.

    *************** *************** ***

    So, after quite some work, I was able to get this program to run (I am a bit of a Java novice). However, the output is not what I am looking for. Here's the assignment for background info:

    LAB ASSIGNMENT A19.3

    CountWords

    Background:

    1. This lab assignment will count the occurrences of words in a text file. Here are some special cases that you must take into account:
    - Hyphenated-words w/out space = 1 word
    - Hyphenated - words w/ space = 2 words
    - Apostrophes in words = 1 word



    2. You are encouraged to use a combination of all the programming tools you have learned so far, such as:

    Data Structures:
    - Array classes
    - String class
    - ArrayList class

    Algorithms:
    - sorting
    - searches
    - text file processing

    Assignment:

    1. Your instructor will provide you with a data file (such as test.txt, Lincoln.txt, or dream.txt) to analyze. Parse the file and print out the following statistical results:

    – Total number of unique words used in the file.
    – Total number of words in a file.
    – The top 30 words which occur the most frequently, sorted in descending order by count.

    For example:

    1 103 the
    2 97 of
    3 59 to
    4 43 and
    5 36 a

    6 32 be
    7 32 we
    8 26 will
    9 24 that
    10 21 is

    ... rest of top 30 words ...

    Number of words used = 525
    Total # of words = 1577

    Now, time for my code:

    wordCounter.java:
    Code:
    import java.util.*;
    import java.io.*;
    
    public class wordCounter
    {
    	private String inFileName;
    	private int i;
    	private ArrayList <String> sortedWords = new ArrayList <String> ();
    	private ArrayList <String> uniqueWords = new ArrayList <String> ();
    	private ArrayList <Word> indivCount = new ArrayList <Word> ();
    	
    	public wordCounter(String fn)
    	{
    		inFileName = fn;
    	}
    	
    	public void readData(ArrayList <String> fileWords)
    	{
    		Scanner in;
    		try
    		{
    			in = new Scanner(new File(inFileName));
    			int i = 0;
    			while(in.hasNext())
    			{
    				fileWords.add(in.next().toLowerCase());
    				i++;
    			}
    		}
    		catch(IOException x)
    		{
    			System.out.println("Error: " + x.getMessage());
    		}
    	}
    	
    	public void sortList(ArrayList <String> a)
    	{
    		for(int position = 0; position < a.size(); position++)
    		{
    			String key = a.get(position);
    			
    			while(position > 0 && a.get(position - 1).compareTo(key) > 0)
    			{
    				a.set(position, a.get(position - 1));
    				position--;
    			}
    			
    			a.set(position, key);
    		}
    		sortedWords = a;
    	}
    	
    	public int findUnique(ArrayList <String> fileWords)
    	{
    		uniqueWords = fileWords;
    		
    		while(i < uniqueWords.size() - 1)
    		{
    			if(uniqueWords.get(i).compareTo(uniqueWords.get(i+1)) == 0)
    			{
    				uniqueWords.remove(i+1);
    			}
    			else
    			{
    				i++;
    			}
    		}
    		return uniqueWords.size();
    	}
    	
    	public int returnWordTotal(ArrayList <String> a)
    	{
    		return a.size();
    	}
    	
    	public void top30()
    	{
    		indivCount.add(new Word(sortedWords.get(0), 1));
    		
    		for(int x = 0; x < sortedWords.size() - 1; x++)
    		{
    			if(sortedWords.get(x).compareTo(sortedWords.get(x+1)) == 0)
    			{
    				int count = indivCount.get(x).getCount();
    				indivCount.get(x).setCount(count++);
    			}
    			else
    			{
    				indivCount.add(new Word((sortedWords.get(x)), 1));				
    			}
    			
    			//indivCount.add(new Word(sortedWords.get(x).getWord(), (sortedWords.get(x).getCount() + 1)));
    		}
    	}
    	
    	public void Sort()
    	{
    		mergeSort(indivCount, 0, indivCount.size() -	1);
    	}
    	
    	private void merge(ArrayList <Word> a, int first, int mid, int last)
    	{
    		  //same as in QuadSortComparableProject
    		  //use a temporary array and then put back into original
    		  int i = first;
    		  int j = 1 + mid;
    		  ArrayList <Word> temp = new ArrayList <Word> ();
    			
    		  while(i <= mid && j <= last)
    		  {
    			  if(a.get(i).compareTo(a.get(j)) < 0)
    			  {
    				  temp.add(a.get(i));
    				  i++;
    			  }
    			  else
    			  {
    				  temp.add(a.get(j));
    				  j++;
    			  }
    		  }
    		  
    		  if(i > mid)
    		  {
    			  for(int x = j; x <= last; x++)
    			  {
    				  temp.add(a.get(x));
    			  }
    		  }
    		  else if(j > last)
    		  {
    			  for(int y = i; y <= mid; y++)
    			  {
    				  temp.add(a.get(y));
    			  }
    		  }
    		  
    		  for(int q = 0; q < temp.size(); q++)
    		  {
    			  a.set(first + q, temp.get(q));
    		  }
    	}
    	
    	public void mergeSort(ArrayList <Word> a, int first, int last)
    	{
    		//same as in QuadSortComparableProject
    		if(first != last)
    		{
    			int mid = (first + last)/2;
    			mergeSort(a, first, mid);
    			mergeSort(a, mid + 1, last);
    			merge(a, first, mid, last);
    		}
    	}
    	
    	
    	public void displayWord()
    	{
    		System.out.printf("%8s", "Count");
    		System.out.printf("%15s", "Word");
    		System.out.println("");
    		for(int i = 0; i < 30; i++)  //30 used instead of: indivCount.size()
    		{
    			System.out.print(i+1);
    			System.out.printf("%8s", ((Word)indivCount.get(i)).getCount());
    			System.out.printf("%14s", ((Word)indivCount.get(i)).getWord());
    			System.out.println("");
    			if((i+1)%5 == 0)
    			{
    				System.out.println("");
    			}
    		}
    	}
    	
    }




    Now, Word.java:

    Code:
    public class Word implements Comparable <Word>
    {
    	private String myWord;
    	private int myCount; //word occurrences
    	
    	public Word(String word, int count)
    	{
    		myWord = word;
    		myCount = count;
    	}
    	
    	public int getCount()
    	{
    		return myCount;
    	}
    	
    	public void setCount(int count)
    	{
    		myCount = count;
    	}
    	
    	public String getWord()
    	{
    		return myWord;
    	}
    	
    	public void setWord(String word)
    	{
    		myWord = word;
    	}
    	
    	public int compareTo(Word other)
    	{
    		if(myCount > other.myCount)
    		{
    			return 1;
    		}
    		else if(myCount < other.myCount)
    		{
    			return -1;
    		}
    		else
    		{
    			return 0;
    		}
    	}
    	
    }


    And finally, my tester file, wordCounterTester.java:

    Code:
    import java.util.ArrayList;
    
    
    public class wordCounterTester
    {
    	private static ArrayList <String> fileWords = new ArrayList <String> ();
    	
    	public static void main(String[] args)
    	{
    		wordCounter myCounter = new wordCounter("dream.txt");
    		myCounter.readData(fileWords);
    		myCounter.sortList(fileWords);
    		System.out.println("Total # of words in file: " + myCounter.returnWordTotal(fileWords));
    		System.out.println("Total # of unique words in file: " + myCounter.findUnique(fileWords));
    		myCounter.top30();
    		myCounter.Sort();
    		myCounter.displayWord();		
    	}
    
    }




    Here is a sample output for MLK Jr's "I have a dream" speech (dream.txt):

    Total # of words in file: 1580
    Total # of unique words in file: 587
    Count Word
    1 1 you
    2 1 york.
    3 1 york
    4 1 years
    5 1 wrote

    6 1 wrongful
    7 1 would
    8 1 work
    9 1 words
    10 1 withering

    11 1 with
    12 1 winds
    13 1 will
    14 1 whose
    15 1 who

    16 1 white
    17 1 whirlwinds
    18 1 which
    19 1 where
    20 1 when

    21 1 were
    22 1 we
    23 1 waters
    24 1 was
    25 1 warm

    26 1 wallow
    27 1 walk,
    28 1 walk
    29 1 vote.
    30 1 vote




    Obviously, the program doesn't print the top 30 recurring words. It seems to print the last 30 unique words in alphabetical order. This is NOT right!! I have traced through my code many times and see no reason that the output should be wrong. I want it to look like the sample output in the assn. at the top of this post. Attached is the txt file I used.

    SO: if anyone can take the time to help me sort this out, I would much appreciate any guidance. All I need is another set of eyes to help me identify the problem.

    THANKS IN ADVANCE.
  • wizardry
    New Member
    • Jan 2009
    • 201

    #2
    We really should not help with homework assignments; please note this for future posts! However, look at what you're calling to print in wordCounterTester.java: you're calling the unique method (findUnique) before top30......
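
One possible reading of that hint, offered as a sketch rather than a diagnosis: in the posted code, sortList does `sortedWords = a;` and findUnique does `uniqueWords = fileWords;`. In Java, assigning an ArrayList to another variable does not copy it, so all of those names can end up referring to one list object. The class name AliasDemo and the helper removeAdjacentDuplicates below are hypothetical and only mimic the loop in findUnique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of the original program.
class AliasDemo {
    // Removes adjacent duplicates in place, like the loop in findUnique.
    static void removeAdjacentDuplicates(List<String> words) {
        int i = 0;
        while (i < words.size() - 1) {
            if (words.get(i).equals(words.get(i + 1))) {
                words.remove(i + 1);
            } else {
                i++;
            }
        }
    }

    public static void main(String[] args) {
        List<String> fileWords =
                new ArrayList<>(Arrays.asList("a", "a", "the", "the", "the"));

        List<String> alias = fileWords;                 // second name, same list
        List<String> copy = new ArrayList<>(fileWords); // independent copy

        removeAdjacentDuplicates(alias);

        System.out.println(fileWords.size()); // 2: the "original" shrank too
        System.out.println(copy.size());      // 5: the copy kept its duplicates
    }
}
```

If the same thing happens in the posted program, the list that top30 reads would already have had its duplicates removed, which would fit every count coming out as 1.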


    • slapsh0t11
      New Member
      • Oct 2009
      • 16

      #3
      NOTE: this is not a homework assignment for a grade. It is simply a problem my teacher has given my class as optional review. So, helping me figure out where I went wrong would allow me to further my currently limited knowledge of Java.

      Again, if anyone is willing to help me find the error in my program that leads to this incorrect output, that would be much appreciated! Thanks again.

      I have traced through my code numerous times, and it seems to me that the output should be correct. I believe a new set of eyes is all that is necessary to help me solve this OPTIONAL problem.


      • NeoPa
        Recognized Expert Moderator MVP
        • Oct 2006
        • 32633

        #4
        With homework assignments it is actually ok to help when the member has posted what they have already tried. We are always very keen to help anybody learn. What we want to avoid is posting something where the teacher (or professor or whatever) feels that we have done the assignment (or part of it) for them.

        Hints and general advice, for instance on debugging techniques that are usable in various circumstances, are not a problem. Care should be taken, of course, not simply to hand out answers.

        Please check your PMs though slapsh0t, as PMing experts directly is certainly not so acceptable.


        • Frinavale
          Recognized Expert Expert
          • Oct 2006
          • 9749

          #5
          I don't see how NeoPa's answer was the "best answer" considering that it has nothing to do with the original problem or question. So, I have reset the best answer.

          @slapsh0t11

          It's very hard to read through so much code to try and figure out where you're going wrong. It's a lot easier for experts and members to look at a specific line of code or function than to look through an entire application. In the future, try to reduce the question's size. Locate what you think the source of the problem is, and only post code that is relevant.

          That being said...
          Have you considered using a HashTable to solve your problem?

          I know that your assignment requirements list what you're allowed to use, but you could easily implement a quick HashTable class using classes listed in the assignment.

          You could create a new Key/Value pair for each word that you find and store it in the HashTable...if the key already exists for the word then add 1 to the Value.

          If you don't know what a HashTable is then you should look it up :)

          It's fairly similar to what you have really but you wouldn't be using a "Word Class".
          You'd just use a HashTable with the keys being the words in the file and the values being the count of the words in the file.

          It would look something like (pseudo-code):
          • If theHashTable.Keys collection contains the word then:
            • Get the Value
            • Add One to the value
            • Store the value back in the hash table at the key (the word)
          • If theHashTable.Keys collection does not contain the word then:
            • Add a new key/value pair to the HashTable:
              • the key being the word, the value being "1" (because there has only been one found so far).
          • Get the next word and Loop
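
The steps above could be sketched in Java roughly as follows. This is a minimal sketch, not a finished solution: it assumes java.util.HashMap as the hash table (java.util.Hashtable would be used the same way here), and the class name HashCountSketch and the methods countWords and byCountDescending are made up for illustration. The sorting helper is an extra, added only because the assignment asks for words in descending order by count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; names are illustrative only.
class HashCountSketch {
    // Key = word, value = number of times the word has been seen.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            if (counts.containsKey(word)) {
                // Get the value, add one, store it back under the same key.
                counts.put(word, counts.get(word) + 1);
            } else {
                // First sighting: new key/value pair with a count of 1.
                counts.put(word, 1);
            }
        }
        return counts;
    }

    // Entries ordered by count, highest first; ties broken alphabetically.
    static List<Map.Entry<String, Integer>> byCountDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "dream", "the", "of", "dream", "the");
        Map<String, Integer> counts = countWords(words);

        System.out.println("Distinct words: " + counts.size()); // 3
        System.out.println("Total words: " + words.size());     // 6
        for (Map.Entry<String, Integer> e : byCountDescending(counts)) {
            System.out.println(e.getValue() + " " + e.getKey());
        }
        // 3 the
        // 2 dream
        // 1 of
    }
}
```

With a map like this, the number of keys is the number of distinct words in the file, and no separate sorted or de-duplicated copy of the word list is needed.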


          Now when you want to find out which words are unique you'd just loop through the hash table keys and check the value for each key...if the value is "1" then you know that it's unique.

          If you don't want to use a HashTable then you should at least be using an ArrayList of Word objects (as opposed to an ArrayList of String objects). What good is an ArrayList of String objects to you anyway?

          You would populate this ArrayList in the readData method. The catch here is to only create a new Word object for each Unique word that you find.

          So you'd do something like (again pseudo-code):
          • Get the next word in the file
          • If ArrayList of Word Objects contains this word then:
            • Get the Word Object for the word from the ArrayList
            • Retrieve the current count for the word (using the getCount() method)
            • Add One to the current count value
            • Store the new value back in the Word (using the setCount() method)
          • If the ArrayList of Word Objects does not contain the word then:
            • Create a new Word Object for the word.
            • Store the Word Object in the ArrayList of Word Objects
          • Loop...
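
As a rough Java sketch of that loop, under some assumptions: the Word class below is a trimmed copy of the one posted earlier in this thread (only the members the loop needs), and WordListSketch, find, and tally are hypothetical names. The lookup is a plain linear search, which is slow for big files but uses only the tools listed in the assignment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Trimmed copy of the Word class from the original post.
class Word {
    private final String myWord;
    private int myCount; // word occurrences

    Word(String word, int count) {
        myWord = word;
        myCount = count;
    }

    String getWord() { return myWord; }
    int getCount() { return myCount; }
    void setCount(int count) { myCount = count; }
}

// Hypothetical sketch class; names are illustrative only.
class WordListSketch {
    // Linear search: returns the matching Word, or null if it is not in the list.
    static Word find(List<Word> list, String text) {
        for (Word w : list) {
            if (w.getWord().equals(text)) {
                return w;
            }
        }
        return null;
    }

    // Builds one Word object per distinct word, counting repeats as it goes.
    static List<Word> tally(List<String> words) {
        List<Word> result = new ArrayList<>();
        for (String text : words) {
            Word existing = find(result, text);
            if (existing != null) {
                // count + 1, not count++: post-increment would pass the old value.
                existing.setCount(existing.getCount() + 1);
            } else {
                result.add(new Word(text, 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Word> tallied = tally(Arrays.asList("we", "will", "we", "we"));
        for (Word w : tallied) {
            System.out.println(w.getCount() + " " + w.getWord());
        }
        // 3 we
        // 1 will
    }
}
```

The size of the resulting list is the number of distinct words, and the list can then be sorted by count using the compareTo already defined on Word.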


          While you're looping you should be checking for the special conditions that your assignment outlines (that word space hyphen space word requirement is a little weird) and removing any punctuation that may be attached to words (I would think the word "walk" and the word "walk," would be considered the same word...but then again that word<space>-<space>word thing is weird...so check your requirements)
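
One way the punctuation clean-up might look, as a sketch under stated assumptions: it treats only ASCII letters and digits as word characters, strips anything else from the two ends of a token, and leaves hyphens and apostrophes inside a word alone. TokenCleaner, clean, and cleanAll are hypothetical names, and whether this matches the assignment's exact rules is something to check against the requirements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; names are illustrative only.
class TokenCleaner {
    // Lower-cases the token and strips non-alphanumeric characters from
    // both ends. Interior hyphens and apostrophes are left in place.
    static String clean(String token) {
        return token.toLowerCase(Locale.ROOT)
                    .replaceAll("^[^a-z0-9]+|[^a-z0-9]+$", "");
    }

    // Cleans every token and drops the ones that end up empty, such as a
    // lone "-" that had a space on each side.
    static List<String> cleanAll(List<String> tokens) {
        List<String> cleaned = new ArrayList<>();
        for (String token : tokens) {
            String word = clean(token);
            if (!word.isEmpty()) {
                cleaned.add(word);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("Walk,", "walk", "well-known", "free", "-", "at", "don't");
        System.out.println(cleanAll(tokens));
        // [walk, walk, well-known, free, at, don't]
    }
}
```

Because Scanner's next() splits on whitespace, "word - word" arrives as three tokens and the lone hyphen is dropped, leaving two words, while "hyphenated-word" stays a single token.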



          -Frinny
