Split function to split sentence into words

  • fellya
    New Member
    • Nov 2008
    • 11

    Split function to split sentence into words

    Hi,

    I don't have much experience writing code in Python, but I'm trying to get started.
    I've written a simple program that displays a sentence. My problem now is how to use the split function to split that sentence into words and then print each word separately. Let me give you an example:

    >>>sentence=" My question is to know how to write a code in Python"

    Then the output for this sentence should be:

    sentence[1]=My
    sentence[2]=question
    sentence[3]=is
    sentence[4]=to
    sentence[5]=know
    ......
    .......

    Can someone help me in this?
  • oler1s
    Recognized Expert Contributor
    • Aug 2007
    • 671

    #2
    Always check the documentation first. In this case, look at the available string methods ( http://www.python.org/doc/2.5.2/lib/string-methods.html ) and you will see a split method. Here's a quick example.
    Code:
    >>> sent = "Jack ate the apple."
    >>> splitsent = sent.split(' ')
    >>> splitsent
    ['Jack', 'ate', 'the', 'apple.']
    Simple as that.
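
A small aside on the example above, written as a sketch in Python 3 syntax (the thread itself uses Python 2): the sentence in the first post starts with a space, and `split(' ')` and `split()` treat that differently. The variable names `with_sep` and `words` are only illustrative.

```python
# The sentence from the first post; note the leading space.
sentence = " My question is to know how to write a code in Python"

# split(' ') cuts at every single space, so the leading space
# produces an empty string as the first item.
with_sep = sentence.split(' ')

# split() with no argument cuts on runs of whitespace and never
# returns empty strings.
words = sentence.split()

print(with_sep[0:2])
print(words[0:2])
```

For plain word splitting, calling `split()` with no argument is usually the safer choice, because extra or leading spaces do not create empty entries.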


    • fellya
      New Member
      • Nov 2008
      • 11

      #3
      Thank you for the help, but the question is not fully answered. This program splits the sentence, but if we have, say, a = jack ate the apple, I would like the output to be:
      a[0]=jack
      a[1]=ate
      a[2]=the
      a[3]=apple

      Can you please see if it's possible to get the above output?


      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        The answer is: string formatting!

        Example:
        [code=Python]>>> sentence = "The dog ate my homework"
        >>> for i, word in enumerate(sentence.split()):
        ...     print "Word #%d: %s" % (i, word)
        ...
        Word #0: The
        Word #1: dog
        Word #2: ate
        Word #3: my
        Word #4: homework
        >>> [/code]
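
The first post numbers the words from 1 rather than 0. A possible variation, sketched in Python 3 syntax: `enumerate` accepts a `start` argument (available from Python 2.6 on, so this assumes an interpreter newer than the 2.5 documentation linked earlier). The list name `numbered` is just for illustration.

```python
sentence = "The dog ate my homework"

# enumerate(..., start=1) numbers the words from 1 instead of 0.
numbered = ["sentence[%d]=%s" % (i, word)
            for i, word in enumerate(sentence.split(), start=1)]

for line in numbered:
    print(line)
```

This only changes the labels that are printed; the list returned by `split()` is still indexed from 0.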


        • fellya
          New Member
          • Nov 2008
          • 11

          #5
          Thank you for the help, but as you can see in the output below, when I type sentence[0] to show the first word, it shows "T". This is not what I want! I would like sentence[0] to display "The", and sentence[1] to display "dog".

          Can you please help?
          Code:
          >>> sentence="The dog ate my homework"
          >>> for i, word in enumerate(sentence.split()):
          ...             print " word #%d: %s" % (i,word)
          ...
           word #0: The
           word #1: dog
           word #2: ate
           word #3: my
           word #4: homework
          >>> sentence[0]
          'T'
          >>> sentence[1];
          'h'
          >>>
          Last edited by numberwhun; Dec 9 '08, 05:54 PM. Reason: Please use code tags


          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #6
            How about this?
            [code=Python]>>> split_sentence = sentence.split()
            >>> split_sentence[0]
            'The'
            >>> split_sentence[1]
            'dog'
            >>> [/code]


            • fellya
              New Member
              • Nov 2008
              • 11

              #7
              Ohhhh, thank you so much.
              That's what I wanted.
              May God bless you.
              Once again, thank you.


              • fellya
                New Member
                • Nov 2008
                • 11

                #8
                Hi, I have another question related to the above:
                I have created a file with more than 100 sentences in it and saved it with the extension .py. I open the file like this:
                Code:
                f = open("example.py")
                try:
                    for line in f:
                        print line
                finally:
                    f.close()
                With the above commands I'm able to open my file. I now know how to split one sentence into words, but how can I do it for a file containing more than 100 sentences? With one or two sentences I can type them in and split them, but what about a file with many sentences?

                Can someone help me?
                Last edited by numberwhun; Dec 9 '08, 05:53 PM. Reason: Please use code tags


                • bvdet
                  Recognized Expert Specialist
                  • Oct 2006
                  • 2851

                  #9
                  Please use code tags around code. It will make your code much easier to read.

                  [CODE]..code goes here..[/CODE]

                  In your code, you are iterating on each line in the file. Each iteration, the variable line represents a sentence. Do you want to save the sentence in a list? What do you want to do with 100 sentences?

                  The following will save a list of lists. You can access each word by list index.
                  [code=Python]
                  lineList = [line.strip().split() for line in open("your_file").readlines()]
                  # print the first word in the first line.
                  print lineList[0][0][/code]
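
The same list-of-lists idea can be sketched with a `with` block, which closes the file automatically. This is Python 3 syntax, and the helper name `read_words_per_line`, the temporary demo file, and its contents are all made up for illustration. Skipping blank lines is a choice made here, not something the code above does.

```python
import os
import tempfile

def read_words_per_line(path):
    """Return one list of words per non-empty line of the file at `path`."""
    with open(path) as handle:  # the file is closed when the block ends
        return [line.split() for line in handle if line.strip()]

# Demo on a throwaway file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Jack ate the apple\n\nThe dog ate my homework\n")
    demo_path = tmp.name

line_list = read_words_per_line(demo_path)
os.remove(demo_path)

print(line_list[0][0])  # first word of the first line
print(line_list[1][3])  # fourth word of the second non-empty line
```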


                  • fellya
                    New Member
                    • Nov 2008
                    • 11

                    #10
                    I don't need to save the sentences in a list. What I want is this: I take a document with any number of sentences and use Python to split it into words, where each word has a number, e.g. word1=the, word2=apple, etc. I will then feed this output to another program that helps me identify whether word1 is a noun or not, and so on. In brief, after getting all the words in a document, I will identify and extract only the nouns.


                    • bvdet
                      Recognized Expert Specialist
                      • Oct 2006
                      • 2851

                      #11
                      The previous code I posted above will work fine for your purpose. To get all the words in a single list:
                      [code=Python]wordList = reduce(lambda x,y: x+y, lineList, [])[/code]
                      Now you have a list of all the words. To iterate on the list of words:
                      [code=Python]>>> lineList = [['1','2','3'],['4','5','6']]
                      >>> reduce(lambda x,y: x+y, lineList, [])
                      ['1', '2', '3', '4', '5', '6']
                      >>> wordList = reduce(lambda x,y: x+y, lineList, [])
                      >>> for i, word in enumerate(wordList):
                      ...     print "Word[%d]: %s" % (i, word)
                      ...
                      Word[0]: 1
                      Word[1]: 2
                      Word[2]: 3
                      Word[3]: 4
                      Word[4]: 5
                      Word[5]: 6
                      >>> [/code]
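
As an alternative sketch of the same flattening step: in Python 3 `reduce` is no longer a builtin (it lives in `functools`), and `itertools.chain.from_iterable` does the job without building an intermediate list at every step. The sample `line_list` below is made up to stand in for the result of reading a file.

```python
from itertools import chain

# Stand-in for the list of lists built from a file.
line_list = [["Jack", "ate", "the", "apple"],
             ["The", "dog", "ate", "my", "homework"]]

# chain.from_iterable walks each inner list in turn, giving one flat sequence.
word_list = list(chain.from_iterable(line_list))

for i, word in enumerate(word_list):
    print("Word[%d]: %s" % (i, word))
```

The flattening works the same way for words as it does for the digit strings used in the session above.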


                      • fellya
                        New Member
                        • Nov 2008
                        • 11

                        #12
                        Hey, thanks for the help.
                        Your last solution works perfectly with numbers!
                        But the one I was looking for is the solution you gave in reply number 9:
                        Code:
                        lineList = [line.strip().split() for line in open("your_file").readlines()] 
                        # print the first word in the first line. 
                        print lineList[0][0]
                        This solution gives me one word at a time. Imagine I have a document of two pages; using the above code on it will take a long time. When I type print lineList[0][3] it gives me the fourth word of the first line, which is fine, but the problem is that I have to type print lineList[0][1] up to print lineList[0][n], with n the last word in my document! I want code like the above that does not require me to type print lineList[][] to get a single word.
                        Can you please help? I know the code you gave me works, but the problem is that I have to type print lineList[][] for each word.
                        Last edited by numberwhun; Dec 9 '08, 05:53 PM. Reason: Please use code tags!


                        • bvdet
                          Recognized Expert Specialist
                          • Oct 2006
                          • 2851

                          #13
                          If you have a list of lists:
                          [code=Python]>>> list_of_lists = [[1,2,3],[4,5,6],[7,8,9]]
                          >>> for i, item in enumerate(list_of_lists):
                          ...     for j, word in enumerate(item):
                          ...         print "List item #%d, Word #%d: %s" % (i, j, word)
                          ...
                          List item #0, Word #0: 1
                          List item #0, Word #1: 2
                          List item #0, Word #2: 3
                          List item #1, Word #0: 4
                          List item #1, Word #1: 5
                          List item #1, Word #2: 6
                          List item #2, Word #0: 7
                          List item #2, Word #1: 8
                          List item #2, Word #2: 9
                          >>> [/code]


                          • fellya
                            New Member
                            • Nov 2008
                            • 11

                            #14
                            Okay, thank you so much for the help, bvdet!
                            But I think you didn't get my question, so let me be clear and simple. Assume I have a file called ex1.py that contains more than one paragraph. I know the procedure to open a file. Now I would like to know if there is a way to open the file, read one sentence or paragraph, and then split that sentence so that if it was "jack is a hard worker" the output looks like:
                            word 1: jack
                            word 2: is
                            word 3: a
                            word 4: hard
                            word 5: worker.

                            Then, after reading and splitting that sentence, I go to the next sentence in the file and do the same thing.

                            Is there any way to do this in Python?
                            I need help, please!


                            • bvdet
                              Recognized Expert Specialist
                              • Oct 2006
                              • 2851

                              #15
                              You will need to establish rules for determining what is a sentence. If the file is not too big, you can read the entire file into a string and split on sentence-ending punctuation (periods, question marks, and exclamation points).
                              [code=Python]>>> import re
                              >>> s = 'This is a paragraph. How will we split it? We can use re module split()! We should get four sentences.'
                              >>> sList = [item.strip() for item in re.split('[!?.]', s) if item]
                              >>> sList
                              ['This is a paragraph', 'How will we split it', 'We can use re module split()', 'We should get four sentences']
                              >>> [/code]
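
Putting the pieces together, here is a rough sketch (Python 3 syntax) of what post #14 asks for: split the text into sentences, then print the numbered words of each sentence in turn. The function name `words_by_sentence` and the sample text are invented for illustration, and the punctuation rule is the same naive one as above, so abbreviations such as "Dr." or decimals such as "3.5" would be split incorrectly.

```python
import re

def words_by_sentence(text):
    """Split text on '.', '!' or '?', then split each sentence into words."""
    sentences = [part.strip() for part in re.split(r"[.!?]", text)]
    # Drop empty pieces, e.g. the one left after the final punctuation mark.
    return [sentence.split() for sentence in sentences if sentence]

text = "Jack is a hard worker. He starts early! Does he ever rest?"

result = words_by_sentence(text)
for number, words in enumerate(result, start=1):
    print("sentence %d" % number)
    for position, word in enumerate(words, start=1):
        print("word %d: %s" % (position, word))
```

For a real file, the `text` string could come from reading the whole file at once, as suggested above for files that are not too big.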
