I would greatly appreciate it if any one of you kind souls could take some time to help me out with an interesting bug in my program. I have tried many times, unsuccessfully, to find the source of the problem and believe that a second set of eyes will do wonders. Thanks in advance for reading my post.
*********************************
So, after quite some work, I was able to get this program to run (I am a bit of a Java novice). However, the output is not what I am looking for. Here's the assignment for background info:
LAB ASSIGNMENT A19.3
CountWords
Background:
1. This lab assignment will count the occurrences of words in a text file. Here are some special cases that you must take into account:
- Hyphenated-words w/out space = 1 word
- Hyphenated - words w/ space = 2 words
- Apostrophes in words = 1 word
2. You are encouraged to use a combination of all the programming tools you have learned so far, such as:
Data Structures      Algorithms
Array classes        sorting
String class         searches
ArrayList class      text file processing
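The special cases in item 1 mean each raw token needs some cleanup before it is counted. In the output further down, for instance, "york." and "york" are listed as separate words. Below is a minimal sketch, assuming that punctuation at either end of a token (periods, commas, quotes) should be stripped and that a token with no letters or digits left, such as a lone "-", is not a word. `WordTokenizer` and `tokenize` are hypothetical names, not part of the lab files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of the original lab files):
// turns one line of text into lower-case words following the lab's rules.
class WordTokenizer {

    // Splits on whitespace, then trims non-letter/non-digit characters from both
    // ends of each token. Internal hyphens ("well-known") and apostrophes ("don't")
    // are kept, so each such token counts as one word; a lone "-" trims to nothing.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String raw : line.trim().split("\\s+")) {
            int start = 0;
            int end = raw.length();
            while (start < end && !Character.isLetterOrDigit(raw.charAt(start))) {
                start++;
            }
            while (end > start && !Character.isLetterOrDigit(raw.charAt(end - 1))) {
                end--;
            }
            if (start < end) {
                words.add(raw.substring(start, end).toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hyphenated-words - \"don't\" York. walk,"));
        // prints [hyphenated-words, don't, york, walk]
    }
}
```

Internal hyphens and apostrophes survive because only the ends of each token are trimmed. Whether this reproduces the exact totals in the example depends on the text file, so treat it as a starting point.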
Assignment:
1. Your instructor will provide you with a data file (such as test.txt, Lincoln.txt, or dream.txt) to analyze. Parse the file and print out the following statistical results:
– Total number of unique words used in the file.
– Total number of words in a file.
– The top 30 words which occur the most frequently, sorted in descending order by count.
For example:
1 103 the
2 97 of
3 59 to
4 43 and
5 36 a
6 32 be
7 32 we
8 26 will
9 24 that
10 21 is
... rest of top 30 words ...
Number of words used = 525
Total # of words = 1577
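The three statistics above could be computed with one counting pass plus one sort. This is a minimal sketch that swaps in a `HashMap` for counting instead of the sort-then-scan approach the lab suggests; it assumes the words have already been cleaned and lower-cased. `WordStats`, `rank`, and `report` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: count words with a HashMap, then rank by count (descending).
class WordStats {

    // Returns (word, count) entries, highest count first, ties in alphabetical order.
    static List<Map.Entry<String, Integer>> rank(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // start at 1, or add 1 to the existing count
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return ranked;
    }

    // Prints up to `limit` lines in the "rank count word" layout of the example,
    // followed by the unique-word and total-word counts.
    static void report(List<String> words, int limit) {
        List<Map.Entry<String, Integer>> ranked = rank(words);
        for (int i = 0; i < Math.min(limit, ranked.size()); i++) {
            Map.Entry<String, Integer> e = ranked.get(i);
            System.out.println((i + 1) + " " + e.getValue() + " " + e.getKey());
        }
        System.out.println("Number of words used = " + ranked.size());
        System.out.println("Total # of words = " + words.size());
    }

    public static void main(String[] args) {
        report(List.of("the", "of", "the", "to", "the", "of"), 30);
    }
}
```

Ties are broken alphabetically, which is one reasonable reading of the example (both "be" and "we" occur 32 times); the assignment itself does not say how ties should be ordered.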
Now, time for my code:
wordCounter.java:
Code:
import java.util.*;
import java.io.*;

public class wordCounter
{
    private String inFileName;
    private int i;
    private ArrayList<String> sortedWords = new ArrayList<String>();
    private ArrayList<String> uniqueWords = new ArrayList<String>();
    private ArrayList<Word> indivCount = new ArrayList<Word>();

    public wordCounter(String fn)
    {
        inFileName = fn;
    }

    public void readData(ArrayList<String> fileWords)
    {
        Scanner in;
        try
        {
            in = new Scanner(new File(inFileName));
            int i = 0;
            while(in.hasNext())
            {
                fileWords.add(in.next().toLowerCase());
                i++;
            }
        }
        catch(IOException x)
        {
            System.out.println("Error: " + x.getMessage());
        }
    }

    public void sortList(ArrayList<String> a)
    {
        for(int position = 0; position < a.size(); position++)
        {
            String key = a.get(position);
            while(position > 0 && a.get(position - 1).compareTo(key) > 0)
            {
                a.set(position, a.get(position - 1));
                position--;
            }
            a.set(position, key);
        }
        sortedWords = a;
    }

    public int findUnique(ArrayList<String> fileWords)
    {
        uniqueWords = fileWords;
        while(i < uniqueWords.size() - 1)
        {
            if(uniqueWords.get(i).compareTo(uniqueWords.get(i+1)) == 0)
            {
                uniqueWords.remove(i+1);
            }
            else
            {
                i++;
            }
        }
        return uniqueWords.size();
    }

    public int returnWordTotal(ArrayList<String> a)
    {
        return a.size();
    }

    public void top30()
    {
        indivCount.add(new Word(sortedWords.get(0), 1));
        for(int x = 0; x < sortedWords.size() - 1; x++)
        {
            if(sortedWords.get(x).compareTo(sortedWords.get(x+1)) == 0)
            {
                int count = indivCount.get(x).getCount();
                indivCount.get(x).setCount(count++);
            }
            else
            {
                indivCount.add(new Word((sortedWords.get(x)), 1));
            }
            //indivCount.add(new Word(sortedWords.get(x).getWord(), (sortedWords.get(x).getCount() + 1)));
        }
    }

    public void Sort()
    {
        mergeSort(indivCount, 0, indivCount.size() - 1);
    }

    private void merge(ArrayList<Word> a, int first, int mid, int last)
    {
        //same as in QuadSortComparableProject
        //use a temporary array and then put back into original
        int i = first;
        int j = 1 + mid;
        ArrayList<Word> temp = new ArrayList<Word>();
        while(i <= mid && j <= last)
        {
            if(a.get(i).compareTo(a.get(j)) < 0)
            {
                temp.add(a.get(i));
                i++;
            }
            else
            {
                temp.add(a.get(j));
                j++;
            }
        }
        if(i > mid)
        {
            for(int x = j; x <= last; x++)
            {
                temp.add(a.get(x));
            }
        }
        else if(j > last)
        {
            for(int y = i; y <= mid; y++)
            {
                temp.add(a.get(y));
            }
        }
        for(int q = 0; q < temp.size(); q++)
        {
            a.set(first + q, temp.get(q));
        }
    }

    public void mergeSort(ArrayList<Word> a, int first, int last)
    {
        //same as in QuadSortComparableProject
        if(first != last)
        {
            int mid = (first + last)/2;
            mergeSort(a, first, mid);
            mergeSort(a, mid + 1, last);
            merge(a, first, mid, last);
        }
    }

    public void displayWord()
    {
        System.out.printf("%8s", "Count");
        System.out.printf("%15s", "Word");
        System.out.println("");
        for(int i = 0; i < 30; i++) //30 used instead of: indivCount.size()
        {
            System.out.print(i+1);
            System.out.printf("%8s", ((Word)indivCount.get(i)).getCount());
            System.out.printf("%14s", ((Word)indivCount.get(i)).getWord());
            System.out.println("");
            if((i+1)%5 == 0)
            {
                System.out.println("");
            }
        }
    }
}
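`top30()` above assumes `sortedWords` is sorted and still contains repeated words. Reading the loop, three details look likely to keep counts wrong even then: `indivCount.get(x)` uses a position in `sortedWords` as a position in `indivCount` (the two lists have different lengths once any word repeats, so it can point at the wrong `Word` or past the end); `setCount(count++)` passes the value of `count` before the increment, so the stored count never changes; and the `else` branch adds `sortedWords.get(x)`, although the word that just started is at `x + 1`. A minimal sketch of the loop with those three points changed, using a trimmed stand-in for `Word` (`RunCounter` and `countRuns` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of top30()'s counting loop with the index and increment fixed.
class RunCounter {

    // Minimal stand-in for Word.java (same getters/setters, no compareTo needed here).
    static class Word {
        private final String myWord;
        private int myCount;
        Word(String word, int count) { myWord = word; myCount = count; }
        String getWord() { return myWord; }
        int getCount() { return myCount; }
        void setCount(int count) { myCount = count; }
    }

    // sortedWords must be sorted AND still contain its duplicates.
    static List<Word> countRuns(List<String> sortedWords) {
        List<Word> indivCount = new ArrayList<>();
        if (sortedWords.isEmpty()) {
            return indivCount;
        }
        indivCount.add(new Word(sortedWords.get(0), 1));
        for (int x = 0; x < sortedWords.size() - 1; x++) {
            if (sortedWords.get(x).equals(sortedWords.get(x + 1))) {
                // Update the most recently added Word, not indivCount.get(x).
                Word last = indivCount.get(indivCount.size() - 1);
                last.setCount(last.getCount() + 1); // count + 1, not count++
            } else {
                // The new word starts at x + 1, not x.
                indivCount.add(new Word(sortedWords.get(x + 1), 1));
            }
        }
        return indivCount;
    }

    public static void main(String[] args) {
        for (Word w : countRuns(List.of("a", "a", "be", "the", "the", "the"))) {
            System.out.println(w.getCount() + " " + w.getWord());
        }
    }
}
```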
Now, Word.java:
Code:
public class Word implements Comparable<Word>
{
    private String myWord;
    private int myCount; //word occurrences

    public Word(String word, int count)
    {
        myWord = word;
        myCount = count;
    }

    public int getCount()
    {
        return myCount;
    }

    public void setCount(int count)
    {
        myCount = count;
    }

    public String getWord()
    {
        return myWord;
    }

    public void setWord(String word)
    {
        myWord = word;
    }

    public int compareTo(Word other)
    {
        if(myCount > other.myCount)
        {
            return 1;
        }
        else if(myCount < other.myCount)
        {
            return -1;
        }
        else
        {
            return 0;
        }
    }
}
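As written, `compareTo` orders `Word` objects by ascending count, so after `mergeSort` the smallest counts come first and `displayWord()` prints those first 30 entries. One way to get descending order is to sort with the reverse of the natural order. A minimal sketch, assuming Java 8 or later and using `List.sort` in place of the hand-written merge sort; `DescendingDemo` is a hypothetical name and its nested `Word` is a trimmed copy of the class above:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DescendingDemo {

    // Trimmed copy of Word: natural order is ascending by count, as in Word.java.
    static class Word implements Comparable<Word> {
        private final String myWord;
        private final int myCount;

        Word(String word, int count) {
            myWord = word;
            myCount = count;
        }

        String getWord() { return myWord; }
        int getCount() { return myCount; }

        @Override
        public int compareTo(Word other) {
            return Integer.compare(myCount, other.myCount);
        }
    }

    // Sorts highest count first; words with equal counts are ordered alphabetically.
    static void sortDescending(List<Word> words) {
        words.sort(Comparator.<Word>reverseOrder().thenComparing(Word::getWord));
    }

    public static void main(String[] args) {
        List<Word> words = new ArrayList<>(List.of(
                new Word("a", 36), new Word("the", 103), new Word("we", 32), new Word("be", 32)));
        sortDescending(words);
        for (Word w : words) {
            System.out.println(w.getCount() + " " + w.getWord());
        }
        // prints 103 the, 36 a, 32 be, 32 we (one per line)
    }
}
```

Keeping the existing `mergeSort` and instead swapping the `1` and `-1` in `compareTo` would also put the largest counts first.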
And finally, my tester file, wordCounterTester.java:
Code:
import java.util.ArrayList;

public class wordCounterTester
{
    private static ArrayList<String> fileWords = new ArrayList<String>();

    public static void main(String[] args)
    {
        wordCounter myCounter = new wordCounter("dream.txt");
        myCounter.readData(fileWords);
        myCounter.sortList(fileWords);
        System.out.println("Total # of words in file: " + myCounter.returnWordTotal(fileWords));
        System.out.println("Total # of unique words in file: " + myCounter.findUnique(fileWords));
        myCounter.top30();
        myCounter.Sort();
        myCounter.displayWord();
    }
}
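In the tester, `sortList(fileWords)` and `findUnique(fileWords)` both receive the same list. Inside `wordCounter`, `sortedWords = a;` and `uniqueWords = fileWords;` copy the reference, not the list, so `sortedWords`, `uniqueWords`, and `fileWords` all name one `ArrayList`. The duplicates that `findUnique` removes are therefore also gone from `sortedWords` by the time `top30()` runs, which would leave every count at 1. A small demonstration, where `AliasDemo` is a hypothetical name and `dedupInPlace` mirrors the loop in `findUnique`, with `new ArrayList<>(...)` making an independent copy:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: why `uniqueWords = fileWords;` also changes `sortedWords`.
class AliasDemo {

    // Removes adjacent duplicates from a sorted list IN PLACE (like findUnique does).
    static void dedupInPlace(List<String> list) {
        int i = 0;
        while (i < list.size() - 1) {
            if (list.get(i).equals(list.get(i + 1))) {
                list.remove(i + 1);
            } else {
                i++;
            }
        }
    }

    public static void main(String[] args) {
        List<String> fileWords = new ArrayList<>(List.of("a", "a", "b"));
        List<String> sortedWords = fileWords;          // same object, not a copy
        dedupInPlace(fileWords);
        System.out.println(sortedWords);               // prints [a, b]: duplicates lost

        List<String> words2 = new ArrayList<>(List.of("a", "a", "b"));
        List<String> unique = new ArrayList<>(words2); // independent copy
        dedupInPlace(unique);
        System.out.println(words2 + " " + unique);     // prints [a, a, b] [a, b]
    }
}
```

In the lab code, that would mean something like `uniqueWords = new ArrayList<String>(fileWords);` inside `findUnique`, so that `sortedWords` keeps its duplicates for `top30()`.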
Here is my program's output for MLK Jr.'s "I Have a Dream" speech (dream.txt):
Total # of words in file: 1580
Total # of unique words in file: 587
Count Word
1 1 you
2 1 york.
3 1 york
4 1 years
5 1 wrote
6 1 wrongful
7 1 would
8 1 work
9 1 words
10 1 withering
11 1 with
12 1 winds
13 1 will
14 1 whose
15 1 who
16 1 white
17 1 whirlwinds
18 1 which
19 1 where
20 1 when
21 1 were
22 1 we
23 1 waters
24 1 was
25 1 warm
26 1 wallow
27 1 walk,
28 1 walk
29 1 vote.
30 1 vote
Obviously, the program doesn't print the top 30 recurring words. Instead, it seems to print the last 30 unique words in reverse alphabetical order, each with a count of 1. This is NOT right! I have traced through my code many times and see no reason the output should be wrong. I want it to look like the sample output in the assignment at the top of this post. Attached is the .txt file I used.
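The reverse alphabetical order is consistent with how `merge` handles ties: when `a.get(i).compareTo(a.get(j))` is 0, the `else` branch copies the right-hand element first. When every count is 1, as in the output above, each merge places the right half before the left half, which reverses the alphabetically sorted list, so the first 30 entries printed are the last 30 words alphabetically. A minimal sketch of the same merge sort with the tie rule as a parameter (`MergeTieDemo` and `leftOnTie` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo of the tie rule in merge(): "< 0" versus "<= 0".
class MergeTieDemo {

    static <T> void mergeSort(List<T> a, int first, int last, Comparator<T> cmp, boolean leftOnTie) {
        if (first < last) { // also safe for an empty list (last = -1)
            int mid = (first + last) / 2;
            mergeSort(a, first, mid, cmp, leftOnTie);
            mergeSort(a, mid + 1, last, cmp, leftOnTie);
            merge(a, first, mid, last, cmp, leftOnTie);
        }
    }

    private static <T> void merge(List<T> a, int first, int mid, int last,
                                  Comparator<T> cmp, boolean leftOnTie) {
        List<T> temp = new ArrayList<>();
        int i = first;
        int j = mid + 1;
        while (i <= mid && j <= last) {
            int c = cmp.compare(a.get(i), a.get(j));
            // leftOnTie == false reproduces the original "< 0" test
            if (c < 0 || (leftOnTie && c == 0)) {
                temp.add(a.get(i++));
            } else {
                temp.add(a.get(j++));
            }
        }
        while (i <= mid) temp.add(a.get(i++));  // copy any leftovers
        while (j <= last) temp.add(a.get(j++));
        for (int q = 0; q < temp.size(); q++) {
            a.set(first + q, temp.get(q));
        }
    }

    public static void main(String[] args) {
        Comparator<String> allEqual = (x, y) -> 0; // like Words whose counts are all 1
        List<String> strictRule = new ArrayList<>(List.of("vote", "walk", "we", "you"));
        List<String> stableRule = new ArrayList<>(strictRule);

        mergeSort(strictRule, 0, strictRule.size() - 1, allEqual, false);
        mergeSort(stableRule, 0, stableRule.size() - 1, allEqual, true);
        System.out.println(strictRule); // prints [you, we, walk, vote]
        System.out.println(stableRule); // prints [vote, walk, we, you]
    }
}
```

With `<= 0` the merge is stable, so equal counts keep their alphabetical order. The sketch also uses `first < last` rather than `first != last`; with an empty list (`last` = -1) the original condition keeps recursing until the stack overflows.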
SO: if anyone can take the time to help me sort this out, I would much appreciate any guidance. All I need is another set of eyes to help me identify the problem.
THANKS IN ADVANCE.