Counting words from text file

**Glenton** · Jun 7 '12, 07:41 AM

Hi

Do you have a seed list of countries that you are going to look for? Are you going to look for "USA", "United States", "US of A", "United States of America", "America" as separate items? If so, you need also to be cognisant that "America" would also be found when "United States of America" is found etc.
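One possible way to deal with that overlap, as a rough sketch only: map every alias to a single canonical name and try the longer aliases first, so that "United States of America" is consumed as a whole before "America" gets a chance to match inside it. The alias table and the function name `count_countries` are made up for illustration and are not from the original poster's code.

```python
import re
from collections import Counter

# Hypothetical seed list: each alias maps to one canonical country name.
ALIASES = {
    "United States of America": "USA",
    "United States": "USA",
    "US of A": "USA",
    "USA": "USA",
    "America": "USA",
    "Germany": "Germany",
}

# Longest aliases first, so the regex prefers "United States of America"
# over the shorter "United States" when both start at the same position.
_pattern = re.compile(
    r"\b(?:"
    + "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
    + r")\b"
)


def count_countries(text):
    """Count canonical country names; each stretch of text is matched only once."""
    return Counter(ALIASES[m.group(0)] for m in _pattern.finditer(text))
```

With this sketch, `count_countries("The United States of America and Germany")` counts "USA" once and "Germany" once, rather than counting "America" a second time. The `\b` word boundaries also keep "Americans" from being counted as "America", which may or may not be what you want.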

There tend to be a lot of messy details with real-life data mining!!

Also, please use the code tags - it will make it much clearer to us - especially with Python where indents are important.

Anyway, you seem to have got some results so far - hopefully you can clarify your question a little more...

**eGrove Systems** · Dec 14 '12, 04:53 AM

Code:

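def count_words(file_name):
    """Return the total number of whitespace-separated words in the file."""
    num_lines = 0
    num_words = 0

    with open(file_name, 'r') as f:
        for line in f:
            words = line.split()  # split the line on whitespace

            num_lines += 1  # lines are tallied too, but only the word count is returned
            num_words += len(words)

    return num_words

words_count = count_words(file_name)  # file_name: file name with absolute path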

**kttr** · Dec 19 '12, 12:56 AM

I agree with Glenton. Real-life data analysis is complicated, especially when you are looking for specific words that all carry the same meaning. However, I will assume that you only want to count the number of times "Germany" is found and NOT "Prussia". Either way, you should make this distinction clear.

So far, your code is a little hard to read. Try adding more comments so that readers know what each part of your program is doing.

All criticism aside, here is what I would try, and why:

(Tell me if the code works or not!!!!!)

Before doing anything, just copy and paste the text from the online book into a Notepad document and save it as "ww1.txt" (quotes not included in name). That way, you can avoid any troubles that might arise from reading the file over the internet.

Once you've done that, here is what your code might look like (I have explained it using inline comments).

Code:

#start a counter variable for each country, before the 'for' loop.
#Every time the word is found, the corresponding counter will increase by 1.
germany_counter = 0
holland_counter = 0
#add more countries' counters here

#open the file read-only; the 'with' block closes it when the loop is done.
with open("ww1.txt", "r") as filehandler:
    for line in filehandler:
        #each line is already a string. count() returns how many times the word
        #occurs in the line (0 if it is not found AT ALL), so a word that appears
        #twice on one line is counted twice.
        germany_counter = germany_counter + line.count("Germany")
        holland_counter = holland_counter + line.count("Holland") #same principle as above applies here

#when it's done reading all of the lines, print out the country's name and the counter
#(print is written as a function here, as Python 3 requires)
print("Germany", germany_counter)
print("Holland", holland_counter)

#obviously, you must add more counters and count() lines to the code for other countries.
#PLEASE NOTE: you will have to change the search string depending on what you're looking for. Do you want Prussia or Germany? Edit the search string to see the difference.
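
Rather than keeping a separate counter variable for every country, the same idea could be written with one dictionary keyed by search string. This is just a sketch under that assumption; the function name `count_names` and its parameters are invented for the example.

```python
def count_names(path, names):
    """Return a dict mapping each search string in `names` to its number of occurrences in the file."""
    counts = {name: 0 for name in names}  # one counter per search string
    with open(path, "r") as handle:
        for line in handle:
            for name in names:
                # str.count() counts non-overlapping occurrences within the line
                counts[name] += line.count(name)
    return counts

# Example call (assumes ww1.txt exists in the current directory):
# totals = count_names("ww1.txt", ["Germany", "Holland"])
```

Adding another country then only means adding another string to the list. Note that this is still a plain substring search, so "Germany" is also counted inside "Germany's".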

**dwblas** · Dec 19 '12, 05:56 PM

The original code is from June and the OP has not responded. It was either solved months ago or forgotten.
