Word count from file help.

  • jester.dev

    Word count from file help.

    Hello,

    I'm learning Python from Python Bible, and having some
    problems with this code below. When I run it, I get nothing. It
    should open the file poem.txt (which exists in the current
    directory) and count number of times any given word appears
    in the text.

    #!/usr/bin/python


    # WordCount.py - Counts the words in a given text file (poem.txt)

    import string

    def CountWords(Text):
    "Count how many times each word appears in Text"
    # A string (above) after a def statement is a -
    # "docstring" - a comment intended for documentation.
    WordCount={}
    # We will build up (and return) a dictionary whose keys
    # are the words, and whose values are the corresponding
    # number of occurrences.

    CountWords=""
    # To make the job cleaner, add a period at the end of the
    # text; that way, we are guaranteed to be finished with
    # the current word when we run out of letters:
    Text=Text+"."

    # We assume that ' and - don't break words, but any other
    # nonalphabetic character does. This assumption isn't
    # entirely accurate, but it's close enough for us.
    # string.letters is a string of all alphabetic characters.
    PiecesOfWords=string.letters+"'-"

    # Iterate over each character in the text. The function
    # len () returns the length of a sequence.
    for CharacterIndex in range(0,len(Text)):
    CurrentCharacter=Text[CharacterIndex]

    # The find() method of a string finds the starting
    # index of the first occurrence of a substring within
    # a string, or returns -1 if it doesn't find the substring.
    # The next line of code tests to see whether CurrentCharacter
    # is part of a word:
    if(PiecesOfWords.find(CurrentCharacter)!=-1):
    # Append this letter to the current word.
    CurrentWord=CurrentWord+CurrentCharacter
    else:
    # This character is not a letter.
    if(CurrentWord!=""):
    # We just finished a word.
    # Convert to lowercase, so "The" and "the"
    # fall in the same bucket...
    CurrentWord=string.lower(CurrentWord)

    # Now increment this word's count.
    CurrentCount=WordCount.get(CurrentWord,0)
    WordCount[CurrentWord]=CurrentCount+1

    # Start a new word.
    CurrentWord=""
    return(WordCount)
    if (__name__=="__main__"):
    # Read the text from the file poem.txt.
    TextFile=open("poem.txt","r")
    Text=TextFile.read()
    TextFile.close()

    # Count the words in the text.
    WordCount=CountWords(Text)
    # Alphabetize the word list, and print them all out.
    SortedWords=WordCount.keys()
    SortedWords.sort()
    for Word in SortedWords:
    print Word.WordCount[Word]
  • Ben Finney

    #2
    Re: Word count from file help.

    On Thu, 12 Feb 2004 01:04:20 GMT, jester.dev wrote:
    > I'm learning Python from Python Bible

    Welcome, I hope you're enjoying learning the language.

    > problems with this code below. When I run it, I get nothing.

    More information required:

    How are you invoking it (what command do you type)? Does the program
    appear to do something, then exit?

    You've told us what you expect the program to do (thanks!):

    > It should open the file poem.txt (which exists in the current
    > directory) and count number of times any given word appears in the
    > text.

    Diagnostics:

    When you encounter unexpected behaviour in a complex piece of code, it's
    best to test some assumptions.

    What happens when the file "poem.txt" is not there? (Rename the file to
    a different name.) This will tell you whether the program is even
    attempting to read the file.

    What happens when you import this into the interactive Python prompt,
    then call CountWords on some text? This will tell you whether the
    function is performing as expected.

    And so on.
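
A minimal sketch of the first check, written for Python 3 (the code in this thread is Python 2) and using a hypothetical `read_text` helper in place of the script's own file-reading lines. The point it illustrates: if execution really reaches `open()`, a missing file raises an error instead of exiting silently.

```python
import os
import tempfile


def read_text(path):
    """Hypothetical stand-in for the script's open/read/close lines."""
    with open(path, "r") as text_file:
        return text_file.read()


# A freshly created temporary directory is empty, so this path cannot exist.
missing_path = os.path.join(tempfile.mkdtemp(), "poem.txt")

try:
    read_text(missing_path)
    outcome = "read succeeded"
except FileNotFoundError as error:
    # Reaching this branch proves that open() was actually called.
    outcome = "open() was attempted and failed: %s" % error.strerror

print(outcome)
```

A script that prints nothing at all when the file is missing has most likely never reached its `open()` call.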


    One possible problem that may be a mistake in the way you pasted the
    text into your newsgroup message:

    > #!/usr/bin/python
    > [...]
    > import string
    >
    > def CountWords(Text):
    > [...]
    > for CharacterIndex in range(0,len(Text)):
    > [...]
    > if(PiecesOfWords.find(CurrentCharacter)!=-1):
    > [...]
    > else:
    > if(CurrentWord!=""):
    > [...]
    > if (__name__=="__main__"):
    > [...]

    Indentation defines structural language blocks in Python. The "def",
    "for", "if" structures above will encompass *all* lines below them until
    the next line at their own indentation level or less.

    In other words, if the code looks the way you've pasted it here, the
    "def" encompasses everything below it; the "for" encompasses everything
    below it; and the "if(PiecesOfWords...):" encompasses everything below
    it. Including the "if( __name__ == "__main__" ):" line.

    Thus, as you've posted it here, the file imports the string module,
    defines a function -- then does nothing with it.

    Please be sure to paste the text literally in messages; or, if you've
    pasted the text exactly as it is in the program, learn how Python
    interprets indentation:

    <http://www.python.org/doc/current/ref/indentation.html>
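
The effect described above can be reproduced in a few lines. This is a sketch in Python 3 with made-up names (`nested_version`, `fixed_version`, `ran`), not the poster's actual file: a statement indented under the `def` and placed after the `return` never executes, while the same kind of statement at column 0 does.

```python
ran = []  # Records which blocks actually execute.


def nested_version(text):
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts
    # Still indented under the def and placed after the return,
    # so this line is part of the function body and is never reached.
    ran.append("nested block")


def fixed_version(text):
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


# Back at column 0: this is module-level code and runs when the file runs.
ran.append("top-level block")
result = fixed_version("the cat and the hat")
print(ran, result)
```

Calling `nested_version` still returns the counts, but its trailing block stays dead code however often the function is called.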

    --
    \ "You've got the brain of a four-year-old boy, and I'll bet he |
    `\ was glad to get rid of it." -- Groucho Marx |
    _o__) |
    Ben Finney <http://bignose.squidly.org/>


    • Paul McGuire

      #3
      Re: Word count from file help.

      "jester.dev" <jester.dev@comcast.net> wrote in message
      news:oiAWb.9926$jk2.28236@attbi_s53...
      > Hello,
      >
      > I'm learning Python from Python Bible, and having some
      > problems with this code below. When I run it, I get nothing. It
      > should open the file poem.txt (which exists in the current
      > directory) and count number of times any given word appears
      > in the text.
      >
      Try this:

      # wordCount.py
      #
      # invoke using: python wordCount.py <filename>
      #
      from pyparsing import Word, alphas
      import sys

      # modify this word definition as you wish - whitespace is implicit separator
      wordSpec = Word(alphas)

      if len(sys.argv) > 1:
          infile = sys.argv[1]

      wordDict = {}
      filetext = "\n".join( file(infile).readlines() )
      for wd,locstart,locend in wordSpec.scanString(filetext):
          #~ curWord = string.lower(wd[0])
          curWord = wd[0].lower()
          if wordDict.has_key( curWord ):
              wordDict[curWord] += 1
          else:
              wordDict[curWord] = 1

      print "%s has %d different words." % ( infile, len(wordDict.keys()) )
      keylist = wordDict.keys()
      keylist.sort( lambda a,b:
          ( wordDict[b] - wordDict[a] ) or
          ( ( ( a > b ) and 1 ) or ( ( a < b ) and -1 ) or 0 ) )
      for k in keylist:
          print k, ":", wordDict[k]
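
For readers without pyparsing, the same counting and ordering can be sketched with the standard library alone. This is a Python 3 substitute, not the code posted above: the regular expression `[A-Za-z]+` only approximates `Word(alphas)`, and the names `count_words` and `by_count_then_alpha` are invented for the example. The sort key reproduces the comparator's intent, highest count first and ties broken alphabetically.

```python
import re
from collections import Counter


def count_words(text):
    # Runs of ASCII letters, lowercased so "The" and "the" share a bucket.
    return Counter(word.lower() for word in re.findall(r"[A-Za-z]+", text))


def by_count_then_alpha(counts):
    # Negating the count sorts descending; the word itself breaks ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = "The cat and the hat and the bat"
ordered = by_count_then_alpha(count_words(sample))
print("sample has %d different words." % len(ordered))
for word, count in ordered:
    print(word, ":", count)
```

A key function is used here because Python 3's `list.sort` no longer accepts a two-argument comparison function directly.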



      • Paul McGuire

        #4
        Re: Word count from file help.

        Oops, sorry, forgot to mention that this requires downloading pyparsing at
        http://pyparsing.sourceforge.net.

        • jester.dev

          #5
          Re: Word count from file help.

          See inline.

          Ben Finney wrote:

          > On Thu, 12 Feb 2004 01:04:20 GMT, jester.dev wrote:
          >> I'm learning Python from Python Bible
          >
          > How are you invoking it (what command do you type)? Does the program
          > appear to do something, then exit?
          >

          I made it executable: chmod 755 word_count.py
          I also tried: python word_count.py

          > Diagnostics:
          >
          > When you encounter unexpected behaviour in a complex piece of code, it's
          > best to test some assumptions.
          >
          > What happens when the file "poem.txt" is not there? (Rename the file to
          > a different name.) This will tell you whether the program is even
          > attempting to read the file.

          It does nothing either way. First time I ran it the file was not there.

          > What happens when you import this into the interactive Python prompt,
          > then call CountWords on some text? This will tell you whether the
          > function is performing as expected.

          Nothing happens. :) So I guess what you said below is correct.

          > And so on.
          >
          >
          > One possible problem that may be a mistake in the way you pasted the
          > text into your newsgroup message:
          >
          >> #!/usr/bin/python
          >> [...]
          >> import string
          >>
          >> def CountWords(Text):
          >> [...]
          >> for CharacterIndex in range(0,len(Text)):
          >> [...]
          >> if(PiecesOfWords.find(CurrentCharacter)!=-1):
          >> [...]
          >> else:
          >> if(CurrentWord!=""):
          >> [...]
          >> if (__name__=="__main__"):
          >> [...]
          >
          > Indentation defines structural language blocks in Python. The "def",
          > "for", "if" structures above will encompass *all* lines below them until
          > the next line at their own indentation level or less.
          >
          > In other words, if the code looks the way you've pasted it here, the
          > "def" encompasses everything below it; the "for" encompasses everything
          > below it; and the "if(PiecesOfWords...):" encompasses everything below
          > it. Including the "if( __name__ == "__main__" ):" line.
          >
          > Thus, as you've posted it here, the file imports the string module,
          > defines a function -- then does nothing with it.
          >
          > Please be sure to paste the text literally in messages; or, if you've
          > pasted the text exactly as it is in the program, learn how Python
          > interprets indentation:
          >
          > <http://www.python.org/doc/current/ref/indentation.html>
          >

          Thanks for the link. I'm not really used to this whole indentation deal yet. I
          am, however, using WingIDE, which indents for me.

          JesterDev


          • Dave K

            #6
            Re: Word count from file help.

            On Thu, 12 Feb 2004 01:04:20 GMT in comp.lang.python, "jester.dev"
            <jester.dev@comcast.net> wrote:

            >Hello,
            >
            > I'm learning Python from Python Bible, and having some
            >problems with this code below. When I run it, I get nothing. It
            >should open the file poem.txt (which exists in the current
            >directory) and count number of times any given word appears
            >in the text.

            When I run it (after re-formatting - you can see below how it appears
            in my newsreader), and after fixing the two error messages, it prints
            the results just as you describe. Try this:

            1) Add the line 'CurrentWord = ""' just before the line
            'for CharacterIndex in range(0,len(Text)):'
            2) Change the very last line to 'print Word, WordCount[Word]'

            If that doesn't work for you then I suspect that the indenting in your
            program is wrong (rather than just being mangled by posting it), but
            I'm just guessing. It would be helpful if you posted the actual error
            message (traceback) that the Python interpreter prints; that makes it
            much easier to find the problem.

            Dave
            >
            >#!/usr/bin/python
            >
            >
            ># WordCount.py - Counts the words in a given text file (poem.txt)
            >
            >import string
            >
            >def CountWords(Text):
            > "Count how many times each word appears in Text"
            > # A string (above) after a def statement is a -
            > # "docstring" - a comment intended for documentation.
            > WordCount={}
            > # We will build up (and return) a dictionary whose keys
            > # are the words, and whose values are the corresponding
            > # number of occurrences.
            >
            > CountWords=""
            > # To make the job cleaner, add a period at the end of the
            > # text; that way, we are guaranteed to be finished with
            > # the current word when we run out of letters:
            > Text=Text+"."
            >
            > # We assume that ' and - don't break words, but any other
            > # nonalphabetic character does. This assumption isn't
            > # entirely accurate, but it's close enough for us.
            > # string.letters is a string of all alphabetic charactors.
            > PiecesOfWords=string.letters+"'-"
            >
            > # Iterate over each character in the text. The function
            > # len () returns the length of a sequence.
            > for CharacterIndex in range(0,len(Text)):
            > CurrentCharacter=Text[CharacterIndex]
            >
            > # The find() method of a string finds the starting
            > # index of the first occurrence of a substring within
            > # a string, or returns -1 of it doesn't find a substring.
            > # The next line of code tests to see wether CurrentCharacter
            > # is part of a word:
            > if(PiecesOfWords.find(CurrentCharacter)!=-1):
            > # Append this letter to the current word.
            > CurrentWord=CurrentWord+CurrentCharacter
            > else:
            > # This character is no a letter.
            > if(CurrentWord!=""):
            > # We just finished a word.
            > # Convert to lowercase, so "The" and
            >"the"
            > # fall in the same bucket...
            >
            >CurrentWord=string.lower(CurrentWord)
            >
            > # Now increment this word's count.
            >
            >CurrentCount=WordCount.get(CurrentWord,0)
            >
            >WordCount[CurrentWord]=CurrentCount+1
            >
            > # Start a new word.
            > CurrentWord=""
            > return(WordCount)
            > if (__name__=="__main__"):
            > # Read the text from the file
            >peom.txt.
            > TextFile=open("poem.txt","r")
            > Text=TextFile.read()
            > TextFile.close()
            >
            > # Count the words in the text.
            > WordCount=CountWords(Text)
            > # Alphabetize the word list, and
            >print them all out.
            > SortedWords=WordCount.keys()
            > SortedWords.sort()
            > for Word in SortedWords:
            > print Word.WordCount[Word]
