Using the split command in a list

  • texas22
    New Member
    • Jun 2007
    • 26

    Using the split command in a list

    If you have a list, how do you use the split command to split each line into word and frequency?
  • Smygis
    New Member
    • Jun 2007
    • 126

    #2
    Originally posted by texas22
    If you have a list, how do you use the split command to split each line into word and frequency?
    What?

    Like:
    [code=python]
    >>> ListOWords = ["Hello Wold", "I am a", "list of words"]
    >>> Stuff = [i.split() for i in ListOWords]
    >>> Stuff
    [['Hello', 'Wold'], ['I', 'am', 'a'], ['list', 'of', 'words']]
    >>>
    [/code]

    I guess it's totally off base, but I have no idea what you want.
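Another possible reading of the question is that each line already holds a word followed by its frequency. A minimal sketch under that assumption (the sample lines are made up, and the code uses Python 3 syntax):

```python
# Hypothetical input: each line holds a word and its count, separated by whitespace.
lines = ["apple 3", "pear 12", "fig 1"]

pairs = []
for line in lines:
    word, freq = line.split()        # split on whitespace into exactly two fields
    pairs.append((word, int(freq)))  # convert the count from text to an int

print(pairs)
```

If a line might hold more or fewer than two fields, the unpacking raises a ValueError, so real input may need a check first.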

    Comment

    • bvdet
      Recognized Expert Specialist
      • Oct 2006
      • 2851

      #3
      Originally posted by texas22
      If you have a list, how do you use the split command to split each line into word and frequency?
      Do you have a list, a line, or a list of lines? Maybe this will help:[code=Python]>>> sentences = 'This is a sentence that we are going to split. We will also determine the frequency of each word. This sentence is here just for the heck of it.'
      >>> wordList = [s.lower() for s in sentences.split()]
      >>> wordCnt = [wordList.count(w) for w in wordList]
      >>> dd = dict(zip(wordList, wordCnt))
      >>> for item in dd:
      ...     print "Word '%s' occurs %d times." % (item, dd[item])
      ...
      Word 'just' occurs 1 times.
      Word 'sentence' occurs 2 times.
      Word 'is' occurs 2 times.
      Word 'word.' occurs 1 times.
      Word 'frequency' occurs 1 times.
      Word 'are' occurs 1 times.
      Word 'determine' occurs 1 times.
      Word 'for' occurs 1 times.
      Word 'to' occurs 1 times.
      Word 'also' occurs 1 times.
      Word 'going' occurs 1 times.
      Word 'split.' occurs 1 times.
      Word 'it.' occurs 1 times.
      Word 'we' occurs 2 times.
      Word 'that' occurs 1 times.
      Word 'here' occurs 1 times.
      Word 'a' occurs 1 times.
      Word 'this' occurs 2 times.
      Word 'of' occurs 2 times.
      Word 'will' occurs 1 times.
      Word 'heck' occurs 1 times.
      Word 'each' occurs 1 times.
      Word 'the' occurs 2 times.
      >>>[/code]Here's a way to get a list of words from lines from a file without using split:[code=Python]import re

      lineList = open(r'X:/path/subdir/name_of_file').readlines()
      pat = r"\w+"  # raw string: one or more word characters
      wordList = []

      for line in lineList:
          wordList += [w.lower() for w in re.findall(pat, line)]

      wordCnt = [wordList.count(w) for w in wordList]

      dd = dict(zip(wordList, wordCnt))

      for item in dd:
          print "Word '%s' occurs %d times." % (item, dd[item])[/code]This way will exclude any punctuation. If you have a list of lines and you don't care about possible punctuation:[code=Python]>>> wordList = []
      >>> for line in lineList:
      ...     wordList += [s.lower() for s in line.strip().split()][/code]
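Calling count() once per word rescans the whole list each time, which gets slow on large files. As an alternative, here is a minimal sketch using collections.Counter from the standard library. It is written in Python 3 syntax, and the sample lines are made up to stand in for the contents of a file:

```python
import re
from collections import Counter

# Sample lines standing in for the contents of a file (made up for this sketch).
line_list = [
    "This is a sentence that we are going to split.",
    "We will also determine the frequency of each word.",
]

word_counts = Counter()
for line in line_list:
    # \w+ matches runs of letters, digits and underscores, so punctuation is dropped
    word_counts.update(w.lower() for w in re.findall(r"\w+", line))

# most_common(3) returns the three highest counts as (word, count) pairs
for word, count in word_counts.most_common(3):
    print("Word '%s' occurs %d times." % (word, count))
```

A Counter behaves like a dict, so word_counts["we"] looks up a single word, and a word that never appeared simply counts as 0.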

      Comment

      • bartonc
        Recognized Expert Expert
        • Sep 2006
        • 6478

        #4
        I always like to add this little touch of elegance:[CODE=python]
        >>> for item in dd:
        ...     i = dd[item]
        ...     print "Word '%s' occurs %d time%s." % (item, i, ('s', '')[int(i == 1)])[/CODE]
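The same pluralizing touch can also be written with a conditional expression instead of indexing a tuple. A small sketch in Python 3 syntax (the helper name describe is made up for illustration):

```python
def describe(word, count):
    # The conditional expression picks the suffix; no tuple indexing is needed.
    suffix = "" if count == 1 else "s"
    return "Word '%s' occurs %d time%s." % (word, count, suffix)

print(describe("fat", 2))
print(describe("cow", 1))
```

Which form reads better is a matter of taste; the tuple trick is shorter, while the conditional spells out the singular case.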

        Comment

        • bvdet
          Recognized Expert Specialist
          • Oct 2006
          • 2851

          #5
          Originally posted by bartonc
          I always like to add this little touch of elegance:[CODE=python]
          >>> for item in dd:
          ...     i = dd[item]
          ...     print "Word '%s' occurs %d time%s." % (item, i, ('s', '')[int(i == 1)])[/CODE]
          Barton,

          You have posted something similar to this before. I am beginning to catch on. Thanks! :)

          BV

          Comment

          • bartonc
            Recognized Expert Expert
            • Sep 2006
            • 6478

            #6
            Originally posted by bvdet
            Barton,

            You have posted something similar to this before. I am beginning to catch on. Thanks! :)

            BV
            Nope. I think that this is the first opportunity. It comes up often in GUI programming where (say) you have a RadioButton and you want the screen to reflect its state elsewhere, as in:[CODE=python]flag = aRadioButton.GetState() # actually an int, not bool
            stateStr = ("Off", "On")[flag] # tuples require int indexes so there is often a cast from bool to int[/CODE]
            I've always felt that software should be smart enough to know if it is relaying data about a thing or several things. To me it's a glaring omission on the part of the programmer when the user is told that he has 1 things.
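A side note on the cast: in Python, bool is a subclass of int, so a True/False value can index a tuple directly. A conversion only matters when the flag might be some other value, such as an arbitrary non-zero int. A short sketch in Python 3 syntax (state_label is a made-up helper, not part of any GUI toolkit):

```python
# bool is a subclass of int: False == 0 and True == 1.
assert issubclass(bool, int)

def state_label(flag):
    # bool(flag) normalises any truthy/falsy value to True/False before indexing
    return ("Off", "On")[bool(flag)]

print(state_label(True))
print(state_label(0))
print(state_label(5))
```

Without the bool() call, a flag of 5 would raise an IndexError on the two-element tuple.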

            Comment

            • bvdet
              Recognized Expert Specialist
              • Oct 2006
              • 2851

              #7
              Originally posted by bartonc
              Nope. I think that this is the first opportunity. It comes up often in GUI programming where (say) you have a RadioButton and you want the screen to reflect its state elsewhere, as in:[CODE=python]flag = aRadioButton.GetState() # actually an int, not bool
              stateStr = ("Off", "On")[flag] # tuples require int indexes so there is often a cast from bool to int[/CODE]
              I've always felt that software should be smart enough to know if it is relaying data about a thing or several things. To me it's a glaring omission on the part of the programmer when the user is told that he has 1 things.
              This is the snippet I was referring to. I had never thought of supplying an indexed tuple or list as an argument to a string format character.[code=Python]# test utility functions and rules
              for i in range(20):
                  RoleDice(dice)
                  PrintDice(dice)
                  print "All dice are%sequal" % [" not ", " "][AllEqual(dice)]
                  print[/code]

              Comment

              • ilikepython
                Recognized Expert Contributor
                • Feb 2007
                • 844

                #8
                Originally posted by bartonc
                I always like to add this little touch of elegance:[CODE=python]
                >>> for item in dd:
                ...     i = dd[item]
                ...     print "Word '%s' occurs %d time%s." % (item, i, ('s', '')[int(i == 1)])[/CODE]
                Hmm, that's a clever way of doing it, never would have thought of it.

                Comment

                • bartonc
                  Recognized Expert Expert
                  • Sep 2006
                  • 6478

                  #9
                  Originally posted by bvdet
                  This is the snippet I was referring to. I had never thought of supplying an indexed tuple or list as an argument to a string format character.[code=Python]# test utility functions and rules
                  for i in range(20):
                      RoleDice(dice)
                      PrintDice(dice)
                      print "All dice are%sequal" % [" not ", " "][AllEqual(dice)]
                      print[/code]
                  Yep. I figured that. As soon as I submitted, I thought "that's not on point", but what's done is done (sort of).

                  Comment

                  • texas22
                    New Member
                    • Jun 2007
                    • 26

                    #10
                    OK, this is kind of making sense. Once I have pulled out, say, the three longest, shortest, and middle words, what syntax do I use to print each of those words on its own line along with its frequency (the number of times it occurs in the list)?

                    Comment

                    • bartonc
                      Recognized Expert Expert
                      • Sep 2006
                      • 6478

                      #11
                      Originally posted by texas22
                      OK, this is kind of making sense. Once I have pulled out, say, the three longest, shortest, and middle words, what syntax do I use to print each of those words on its own line along with its frequency (the number of times it occurs in the list)?
                      I s'pose you could use some complicated method of keeping track, or you can just do this:[CODE=python]
                      >>> anStr = "The fat cat ran into a fat cow"
                      >>> anStr.count("fat")
                      2
                      >>> [/CODE]
                      Basically, in Python, any time you think that an object should/could have a certain functionality, just go to the interactive interpreter and try it. (This is exactly what I have done above). If it doesn't work, then turn to the Docs. If that fails, well, you know - ask somebody.
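One thing worth trying in the interpreter along those lines: count() on a string counts substring matches, while count() on a list compares whole elements. A short sketch in Python 3 syntax, with a made-up sentence that contains "fatter" to show the difference:

```python
text = "The fat cat ran into a fatter cow"

# str.count counts non-overlapping substring matches, so "fatter" is included
substring_hits = text.count("fat")

# list.count compares whole elements, so only the exact word "fat" matches
word_hits = text.split().count("fat")

print(substring_hits, word_hits)
```

For word frequencies, splitting first and counting on the list is usually the safer choice.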

                      Comment

                      • bartonc
                        Recognized Expert Expert
                        • Sep 2006
                        • 6478

                        #12
                        Originally posted by bartonc
                        I s'pose you could use some complicated method of keeping track, or you can just do this:[CODE=python]
                        >>> anStr = "The fat cat ran into a fat cow"
                        >>> anStr.count("fat")
                        2
                        >>> [/CODE]
                        Basically, in Python, any time you think that an object should/could have a certain functionality, just go to the interactive interpreter and try it. (This is exactly what I have done above). If it doesn't work, then turn to the Docs. If that fails, well, you know - ask somebody.
                        Sorry, I forgot that we were working on lists:[CODE=python]
                        >>> aList = anStr.split()
                        >>> aList
                        ['The', 'fat', 'cat', 'ran', 'into', 'a', 'fat', 'cow']
                        >>> aList.count('fat')
                        2
                        >>> [/CODE]

                        Comment
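For the earlier question about printing the longest and shortest words, each on its own line with its frequency, here is one possible sketch. It is written in Python 3 syntax, reuses the sentence from the posts above, and assumes that ties in length can be broken alphabetically:

```python
from collections import Counter

an_str = "The fat cat ran into a fat cow"   # sentence from the posts above
words = an_str.lower().split()
counts = Counter(words)

# Distinct words ordered by length, with ties broken alphabetically
by_length = sorted(counts, key=lambda w: (len(w), w))
shortest = by_length[:3]
longest = by_length[-3:]

# One line per selected word: the word, then how often it occurs
for word in shortest + longest:
    print("%-6s %d" % (word, counts[word]))
```

The middle words can be picked the same way by slicing around len(by_length) // 2.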

                        • texas22
                          New Member
                          • Jun 2007
                          • 26

                          #13
                          Is there a way I can use a three-dimensional array that will output the number of times a word appears in a list, with the length in the first, the word in the second, and the frequency in the third?

                          Comment

                          • Smygis
                            New Member
                            • Jun 2007
                            • 126

                            #14
                            Originally posted by texas22
                            Is there a way I can use a three-dimensional array that will output the number of times a word appears in a list, with the length in the first, the word in the second, and the frequency in the third?
                            something like:


                            [code=python]
                            thing = [[len(word), word, listOfWords.count(word)] for word in listOfWords]
                            [/code]
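Note that the comprehension above produces one row per occurrence, so a word that appears twice gets two identical rows. If one row per distinct word is wanted, a sketch along these lines may help. It is in Python 3 syntax, the sample word list is made up, and the result is a list of three-field rows rather than a true three-dimensional array:

```python
from collections import Counter

list_of_words = "the fat cat ran into a fat cow".split()
counts = Counter(list_of_words)

# One (length, word, frequency) row per distinct word, sorted by length then word
table = sorted((len(word), word, freq) for word, freq in counts.items())

for length, word, freq in table:
    print(length, word, freq)
```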

                            Comment

                            • texas22
                              New Member
                              • Jun 2007
                              • 26

                              #15
                              Thanks for the help. That helped make sense of it.

                              Comment
