If you have a list, how do you use the split command to split each line into a word and its frequency?
Using the split command in a list
-
Originally posted by texas22: If you have a list, how do you use the split command to split each line into a word and its frequency?
Like:
[code=python]
>>> ListOWords = ["Hello Wold", "I am a", "list of words"]
>>> Stuff = [i.split() for i in ListOWords]
>>> Stuff
[['Hello', 'Wold'], ['I', 'am', 'a'], ['list', 'of', 'words']]
>>>
[/code]
I guess it's totally off base, but I have no idea what you want.
-
Originally posted by texas22: If you have a list, how do you use the split command to split each line into a word and its frequency?
[code=python]
>>> wordList = [s.lower() for s in sentences.split()]
>>> wordCnt = [wordList.count(w) for w in wordList]
>>> dd = dict(zip(wordList, wordCnt))
>>> for item in dd:
...     print "Word '%s' occurs %d times." % (item, dd[item])
...
Word 'just' occurs 1 times.
Word 'sentence' occurs 2 times.
Word 'is' occurs 2 times.
Word 'word.' occurs 1 times.
Word 'frequency' occurs 1 times.
Word 'are' occurs 1 times.
Word 'determine' occurs 1 times.
Word 'for' occurs 1 times.
Word 'to' occurs 1 times.
Word 'also' occurs 1 times.
Word 'going' occurs 1 times.
Word 'split.' occurs 1 times.
Word 'it.' occurs 1 times.
Word 'we' occurs 2 times.
Word 'that' occurs 1 times.
Word 'here' occurs 1 times.
Word 'a' occurs 1 times.
Word 'this' occurs 2 times.
Word 'of' occurs 2 times.
Word 'will' occurs 1 times.
Word 'heck' occurs 1 times.
Word 'each' occurs 1 times.
Word 'the' occurs 2 times.
>>>
[/code]
Here's a way to get a list of words from the lines of a file without using split:
[code=Python]
import re
lineList = open(r'X:/path/subdir/name_of_file').readlines()
pat = r"\w+"  # raw string, so the backslash reaches the regex engine unchanged
wordList = []
for line in lineList:
    wordList += [w.lower() for w in re.findall(pat, line)]
wordCnt = [wordList.count(w) for w in wordList]
dd = dict(zip(wordList, wordCnt))
for item in dd:
    print "Word '%s' occurs %d times." % (item, dd[item])
[/code]
This way will exclude any punctuation. If you have a list of lines and you don't care about possible punctuation:
[code=Python]
>>> wordList = []
>>> for line in lineList:
...     wordList += [s.lower() for s in line.strip().split()]
[/code]
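The list.count approach above rescans the whole list once per word. Here is a minimal sketch of the same tally done in a single pass with collections.Counter from the standard library. The sample `lines` list and the helper name `word_counts` are made up for illustration, and the sketch uses Python 3's print() rather than the Python 2 print statement used elsewhere in this thread.

```python
import re
from collections import Counter

# Hypothetical sample data standing in for the lines read from a file.
lines = ["This is a sentence.", "This is also a sentence, is it not?"]

def word_counts(lines):
    """Tally lower-cased words across all lines in a single pass."""
    counts = Counter()
    for line in lines:
        # \w+ grabs runs of word characters, so punctuation is dropped.
        counts.update(w.lower() for w in re.findall(r"\w+", line))
    return counts

counts = word_counts(lines)
for word, n in sorted(counts.items()):
    print("Word '%s' occurs %d times." % (word, n))
```

Counter is a dict subclass, so `counts[word]` works the same way as `dd[item]` above.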
-
Originally posted by bartonc: I always like to add this little touch of elegance:
[CODE=python]
>>> for item in dd:
...     i = dd[item]
...     print "Word '%s' occurs %d time%s." % (item, i, ('s', '')[int(i == 1)])
[/CODE]
You have posted something similar to this before. I am beginning to catch on. Thanks! :)
BV
-
Originally posted by bvdet: Barton,
You have posted something similar to this before. I am beginning to catch on. Thanks! :)
BV
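Nope. I think that this is the first opportunity. It comes up often in GUI programming where (say) you have a RadioButton and you want the screen to reflect its state elsewhere, as in:
[CODE=python]
flag = aRadioButton.GetState()  # actually an int, not bool
stateStr = ("Off", "On")[flag]  # tuple indexes must be ints; a bool works too, since bool is a subclass of int
[/CODE]
I've always felt that software should be smart enough to know if it is relaying data about a thing or several things. To me, it's a glaring omission on the part of the programmer when the user is told that he has 1 things.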
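A small sketch of the "one thing or several things" idea as a reusable helper. The name `plural` and the sample counts are hypothetical. A comparison result can index the tuple directly, without an int() cast.

```python
def plural(n, singular="", suffix="s"):
    """Return the ending to append to a noun for a count of n."""
    # (n != 1) is a bool: False selects position 0, True selects position 1.
    return (singular, suffix)[n != 1]

for n in (0, 1, 2):
    print("Word 'cat' occurs %d time%s." % (n, plural(n)))
```

A conditional expression (`"" if n == 1 else "s"`) does the same job and may read more clearly to someone who has not seen the tuple-indexing idiom.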
-
Originally posted by bartonc: Nope. I think that this is the first opportunity. It comes up often in GUI programming where (say) you have a RadioButton and you want the screen to reflect its state elsewhere, as in:
[CODE=python]
flag = aRadioButton.GetState()  # actually an int, not bool
stateStr = ("Off", "On")[flag]  # tuple indexes must be ints; a bool works too, since bool is a subclass of int
[/CODE]
I've always felt that software should be smart enough to know if it is relaying data about a thing or several things. To me, it's a glaring omission on the part of the programmer when the user is told that he has 1 things.
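This is the snippet I was referring to. I had never thought of supplying an indexed tuple or list as an argument to a string format character.
[code=Python]
# test utility functions and rules
for i in range(20):
    RoleDice(dice)
    PrintDice(dice)
    print "All dice are%sequal" % [" not ", " "][AllEqual(dice)]
    print
[/code]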
-
Originally posted by bartonc: I always like to add this little touch of elegance:
[CODE=python]
>>> for item in dd:
...     i = dd[item]
...     print "Word '%s' occurs %d time%s." % (item, i, ('s', '')[int(i == 1)])
[/CODE]
-
Originally posted by bvdet: This is the snippet I was referring to. I had never thought of supplying an indexed tuple or list as an argument to a string format character.
[code=Python]
# test utility functions and rules
for i in range(20):
    RoleDice(dice)
    PrintDice(dice)
    print "All dice are%sequal" % [" not ", " "][AllEqual(dice)]
    print
[/code]
-
OK, this is kind of making sense. Once I have pulled out, say, the three longest, shortest, and middle words, what syntax do I use to put each of those words on its own line along with its frequency, that is, the number of times it occurs in the list?
-
Originally posted by texas22: OK, this is kind of making sense. Once I have pulled out, say, the three longest, shortest, and middle words, what syntax do I use to put each of those words on its own line along with its frequency, that is, the number of times it occurs in the list?
I s'pose you could use some complicated method of keeping track, or you can just do this:
[CODE=python]
>>> anStr = "The fat cat ran into a fat cow"
>>> anStr.count("fat")
2
>>>
[/CODE]
Basically, in Python, any time you think that an object should/could have a certain functionality, just go to the interactive interpreter and try it. (This is exactly what I have done above.) If it doesn't work, then turn to the Docs. If that fails, well, you know - ask somebody.
-
Originally posted by bartonc: I s'pose you could use some complicated method of keeping track, or you can just do this:
[CODE=python]
>>> anStr = "The fat cat ran into a fat cow"
>>> anStr.count("fat")
2
>>>
[/CODE]
Basically, in Python, any time you think that an object should/could have a certain functionality, just go to the interactive interpreter and try it. (This is exactly what I have done above.) If it doesn't work, then turn to the Docs. If that fails, well, you know - ask somebody.
[CODE=python]
>>> aList = anStr.split()
>>> aList
['The', 'fat', 'cat', 'ran', 'into', 'a', 'fat', 'cow']
>>> aList.count('fat')
2
>>>
[/CODE]
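The two count calls above agree on this sentence, but they are not interchangeable: str.count counts substring matches, while list.count counts whole elements. A minimal sketch with a made-up sentence where the two differ (Python 3 print() is used here):

```python
# Hypothetical sentence chosen so that 'fat' also appears inside a longer word.
text = "The fat cat ran into a fatter cow"

# str.count matches substrings, so 'fat' is also found inside 'fatter'.
substring_hits = text.count("fat")

# list.count compares whole elements, so only the exact word 'fat' matches.
word_hits = text.split().count("fat")

print(substring_hits, word_hits)
```

For word frequencies, splitting first and counting on the list is usually the safer of the two.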
-
Originally posted by texas22: Is there a way I can use a three-dimensional array that will output the number of times a word appears in a list, with the length in the first, the word in the second, and the frequency in the third?
[code=python]
thing = [[len(word), word, listOfWords.count(word)] for word in listOfWords]
[/code]
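The comprehension above builds a list of three-item rows, one per occurrence, so a repeated word appears more than once. A sketch, assuming one row per distinct word is what is wanted, that also sorts by length so the shortest and longest words are easy to slice off. The contents of `listOfWords` are made-up sample data, and the names `rows`, `shortest`, and `longest` are hypothetical.

```python
# Hypothetical sample data.
listOfWords = "the fat cat ran into a fat cow".split()

# One [length, word, frequency] row per distinct word,
# sorted by length first and alphabetically within each length.
rows = sorted([len(w), w, listOfWords.count(w)] for w in set(listOfWords))

for length, word, freq in rows:
    print("%d %s %d" % (length, word, freq))

shortest = rows[:3]   # the three shortest words
longest = rows[-3:]   # the three longest words
```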