I would greatly appreciate it if any one of you kind souls could take some time to help me out with an interesting bug in my program. I have tried many times, without success, to find the source of the problem, and I believe that a second set of eyes will do wonders. Thanks in advance for reading my post.
*************** *************** ***
So, after quite some work, I was able to get this program to run (I am a bit of a Java novice). However, the output is not what I am looking for. Here's the assignment for background info:
LAB ASSIGNMENT A19.3
CountWords
Background:
1. This lab assignment will count the occurrences of words in a text file. Here are some special cases that you must take into account:
- Hyphenated-words w/out space = 1 word
- Hyphenated - words w/ space = 2 words
- Apostrophes in words = 1 word
2. You are encouraged to use a combination of all the programming tools you have learned so far, such as:
Data Structures: Array classes, String class, ArrayList class
Algorithms: sorting, searches, text file processing
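The special cases in item 1 can be sketched roughly as below. This is only an illustration under assumptions: the class `WordRules` and the methods `clean` and `tokenize` are made-up names, and stripping punctuation from the ends of each token is one possible reading of the rules, not something the assignment spells out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the assignment.
class WordRules {
    // Strip leading/trailing characters that are not letters or digits,
    // so "york." becomes "york" while "don't" and "well-known" stay intact.
    static String clean(String token) {
        int start = 0;
        int end = token.length();
        while (start < end && !Character.isLetterOrDigit(token.charAt(start))) {
            start++;
        }
        while (end > start && !Character.isLetterOrDigit(token.charAt(end - 1))) {
            end--;
        }
        return token.substring(start, end).toLowerCase();
    }

    // Split on whitespace, clean each token, and drop tokens that end up
    // empty (for example a lone "-" standing between two words).
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            String word = clean(token);
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [a, well-known, dream, don't, stop, new, york]
        System.out.println(tokenize("A well-known dream - don't stop, New York."));
    }
}
```

Because the split happens on whitespace only, "well-known" stays one word, "dream - don't" yields two words, and the apostrophe inside "don't" is kept.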
Assignment:
1. Your instructor will provide you with a data file (such as test.txt, Lincoln.txt, or dream.txt) to analyze. Parse the file and print out the following statistical results:
– Total number of unique words used in the file.
– Total number of words in a file.
– The top 30 words which occur the most frequently, sorted in descending order by count.
For example:
1 103 the
2 97 of
3 59 to
4 43 and
5 36 a
6 32 be
7 32 we
8 26 will
9 24 that
10 21 is
... rest of top 30 words ...
Number of words used = 525
Total # of words = 1577
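For reference, the counting and ranking described in the assignment can be sketched roughly as follows. This is an illustration under assumptions, not the expected solution: the class `TopWords`, the holder `WordCount`, and the methods `countWords` and `topN` are made-up names, and the six-word sample list is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not part of the assignment.
class TopWords {
    // Holds one word and its number of occurrences.
    static class WordCount {
        final String word;
        int count;
        WordCount(String word) { this.word = word; this.count = 1; }
    }

    // Sort a copy of the words, then count each run of equal neighbours.
    static List<WordCount> countWords(List<String> words) {
        List<String> sorted = new ArrayList<>(words); // copy, input is left untouched
        Collections.sort(sorted);
        List<WordCount> counts = new ArrayList<>();
        for (String w : sorted) {
            WordCount last = counts.isEmpty() ? null : counts.get(counts.size() - 1);
            if (last != null && last.word.equals(w)) {
                last.count++;            // same word as the previous one
            } else {
                counts.add(new WordCount(w)); // a new word starts here
            }
        }
        return counts;
    }

    // Highest count first; the sort is stable, so ties stay alphabetical.
    static List<WordCount> topN(List<WordCount> counts, int n) {
        List<WordCount> byCount = new ArrayList<>(counts);
        byCount.sort((a, b) -> Integer.compare(b.count, a.count));
        return byCount.subList(0, Math.min(n, byCount.size()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "of", "the", "to", "of", "the");
        List<WordCount> counts = countWords(words);
        List<WordCount> top = topN(counts, 30);
        for (int i = 0; i < top.size(); i++) {
            System.out.println((i + 1) + " " + top.get(i).count + " " + top.get(i).word);
        }
        System.out.println("Number of words used = " + counts.size());
        System.out.println("Total # of words = " + words.size());
    }
}
```

With the six sample words this prints `1 3 the`, `2 2 of`, `3 1 to`, then the two totals (3 and 6).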
Now, time for my code:
wordCounter.java:
Code:
import java.util.*;
import java.io.*;

public class wordCounter {
    private String inFileName;
    private int i;
    private ArrayList<String> sortedWords = new ArrayList<String>();
    private ArrayList<String> uniqueWords = new ArrayList<String>();
    private ArrayList<Word> indivCount = new ArrayList<Word>();

    public wordCounter(String fn) {
        inFileName = fn;
    }

    public void readData(ArrayList<String> fileWords) {
        Scanner in;
        try {
            in = new Scanner(new File(inFileName));
            int i = 0;
            while (in.hasNext()) {
                fileWords.add(in.next().toLowerCase());
                i++;
            }
        } catch (IOException x) {
            System.out.println("Error: " + x.getMessage());
        }
    }

    public void sortList(ArrayList<String> a) {
        for (int position = 0; position < a.size(); position++) {
            String key = a.get(position);
            while (position > 0 && a.get(position - 1).compareTo(key) > 0) {
                a.set(position, a.get(position - 1));
                position--;
            }
            a.set(position, key);
        }
        sortedWords = a;
    }

    public int findUnique(ArrayList<String> fileWords) {
        uniqueWords = fileWords;
        while (i < uniqueWords.size() - 1) {
            if (uniqueWords.get(i).compareTo(uniqueWords.get(i+1)) == 0) {
                uniqueWords.remove(i+1);
            } else {
                i++;
            }
        }
        return uniqueWords.size();
    }

    public int returnWordTotal(ArrayList<String> a) {
        return a.size();
    }

    public void top30() {
        indivCount.add(new Word(sortedWords.get(0), 1));
        for (int x = 0; x < sortedWords.size() - 1; x++) {
            if (sortedWords.get(x).compareTo(sortedWords.get(x+1)) == 0) {
                int count = indivCount.get(x).getCount();
                indivCount.get(x).setCount(count++);
            } else {
                indivCount.add(new Word((sortedWords.get(x)), 1));
            }
            //indivCount.add(new Word(sortedWords.get(x).getWord(), (sortedWords.get(x).getCount() + 1)));
        }
    }

    public void Sort() {
        mergeSort(indivCount, 0, indivCount.size() - 1);
    }

    private void merge(ArrayList<Word> a, int first, int mid, int last) {
        //same as in QuadSortComparableProject
        //use a temporary array and then put back into original
        int i = first;
        int j = 1 + mid;
        ArrayList<Word> temp = new ArrayList<Word>();
        while (i <= mid && j <= last) {
            if (a.get(i).compareTo(a.get(j)) < 0) {
                temp.add(a.get(i));
                i++;
            } else {
                temp.add(a.get(j));
                j++;
            }
        }
        if (i > mid) {
            for (int x = j; x <= last; x++) {
                temp.add(a.get(x));
            }
        } else if (j > last) {
            for (int y = i; y <= mid; y++) {
                temp.add(a.get(y));
            }
        }
        for (int q = 0; q < temp.size(); q++) {
            a.set(first + q, temp.get(q));
        }
    }

    public void mergeSort(ArrayList<Word> a, int first, int last) {
        //same as in QuadSortComparableProject
        if (first != last) {
            int mid = (first + last)/2;
            mergeSort(a, first, mid);
            mergeSort(a, mid + 1, last);
            merge(a, first, mid, last);
        }
    }

    public void displayWord() {
        System.out.printf("%8s", "Count");
        System.out.printf("%15s", "Word");
        System.out.println("");
        for (int i = 0; i < 30; i++) { //30 used instead of: indivCount.size()
            System.out.print(i+1);
            System.out.printf("%8s", ((Word)indivCount.get(i)).getCount());
            System.out.printf("%14s", ((Word)indivCount.get(i)).getWord());
            System.out.println("");
            if ((i+1)%5 == 0) {
                System.out.println("");
            }
        }
    }
}
Now, Word.java:
Code:
public class Word implements Comparable<Word> {
    private String myWord;
    private int myCount; //word occurrences

    public Word(String word, int count) {
        myWord = word;
        myCount = count;
    }

    public int getCount() {
        return myCount;
    }

    public void setCount(int count) {
        myCount = count;
    }

    public String getWord() {
        return myWord;
    }

    public void setWord(String word) {
        myWord = word;
    }

    public int compareTo(Word other) {
        if (myCount > other.myCount) {
            return 1;
        } else if (myCount < other.myCount) {
            return -1;
        } else {
            return 0;
        }
    }
}
And finally, my tester file, wordCounterTester.java:
Code:
import java.util.ArrayList;

public class wordCounterTester {
    private static ArrayList<String> fileWords = new ArrayList<String>();

    public static void main(String[] args) {
        wordCounter myCounter = new wordCounter("dream.txt");
        myCounter.readData(fileWords);
        myCounter.sortList(fileWords);
        System.out.println("Total # of words in file: " + myCounter.returnWordTotal(fileWords));
        System.out.println("Total # of unique words in file: " + myCounter.findUnique(fileWords));
        myCounter.top30();
        myCounter.Sort();
        myCounter.displayWord();
    }
}
Here is a sample output for MLK Jr's "I have a dream" speech (dream.txt):
Total # of words in file: 1580
Total # of unique words in file: 587
Count Word
1 1 you
2 1 york.
3 1 york
4 1 years
5 1 wrote
6 1 wrongful
7 1 would
8 1 work
9 1 words
10 1 withering
11 1 with
12 1 winds
13 1 will
14 1 whose
15 1 who
16 1 white
17 1 whirlwinds
18 1 which
19 1 where
20 1 when
21 1 were
22 1 we
23 1 waters
24 1 was
25 1 warm
26 1 wallow
27 1 walk,
28 1 walk
29 1 vote.
30 1 vote
Obviously, the program doesn't print the top 30 recurring words. It seems to print the last 30 unique words in reverse alphabetical order, each with a count of 1. This is NOT right!! I have traced through my code many times and see no reason that the output should be wrong. I want it to look like the sample output in the assignment at the top of this post. Attached is the txt file I used.
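Two things may be worth checking here, offered as a guess and not a confirmed diagnosis. In Java, an assignment such as `sortedWords = a` or `uniqueWords = fileWords` copies the reference, not the list, so removing duplicates through one name also shrinks the list seen through the other. Also, `setCount(count++)` passes the old value of `count`. The sketch below shows both effects in isolation; the class `AliasDemo` and its variable names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not taken from the program above.
class AliasDemo {
    // Removes adjacent duplicates in place, assuming the list is sorted.
    static void removeAdjacentDuplicates(List<String> list) {
        int i = 0;
        while (i < list.size() - 1) {
            if (list.get(i).equals(list.get(i + 1))) {
                list.remove(i + 1);
            } else {
                i++;
            }
        }
    }

    public static void main(String[] args) {
        List<String> sorted = new ArrayList<>(Arrays.asList("a", "a", "the", "the", "the"));

        List<String> alias = sorted;                  // same list object, second name
        List<String> copy = new ArrayList<>(sorted);  // independent copy

        removeAdjacentDuplicates(alias);

        System.out.println(sorted.size()); // 2: the duplicates are gone here too
        System.out.println(copy.size());   // 5: the copy still has every word

        int count = 1;
        int passed = count++;       // post-increment yields the old value
        System.out.println(passed); // 1
        System.out.println(count);  // 2
    }
}
```

If the same thing happens in the program, `top30()` would only ever see unique words, which would fit every count being 1 in the output.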
SO: if anyone can take the time to help me sort this out, I would much appreciate any guidance. All I need is another set of eyes to help me identify the problem.
THANKS IN ADVANCE.