User Profile

kshw
Last Activity: Aug 6 '10, 12:27 AM
Joined: Jun 26 '10

  • Thanks all..

    dwblas, Temp is not empty:

    Temp.append("".join(Original_File_Content))

  • kshw
    started a topic How to remove words from a text file using re

    How to remove words from a text file using re

    Hi,

    I'm trying to remove non-stop words from a text file using regular expressions, but it is not working. I used something like ('^[a-z]?or') in order to avoid removing (or) from the middle of words, e.g. morning.

    Code:
    Temp = [] 
    Original_File = open('out.txt', 'r') 
    Original_File_Content = Original_File.read()
    Original_File.close()
    Temp.append("".join(Original_File_Content))
    ...
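
One possible way to remove a word only where it stands alone is the `\b` word-boundary anchor from Python's `re` module, rather than `^`, which matches only at the start of the string. The sketch below is an illustration, not the code from the post: the helper name `remove_words` and the sample sentence are made up for the example.

```python
import re

def remove_words(text, words):
    """Remove each entry of `words` from `text`, but only as a whole word."""
    # \b matches a word boundary, so "or" matches on its own
    # while "morning" and "for" are left intact.
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    stripped = pattern.sub("", text)
    # Collapse the runs of spaces left behind by the removed words.
    return re.sub(r"[ \t]{2,}", " ", stripped).strip()

# Hypothetical sample text, not taken from out.txt.
sample = "morning or evening, for better or worse"
print(remove_words(sample, ["or"]))  # -> morning evening, for better worse
```

To apply this to a file, the string returned by `read()` can be passed in directly as `text`.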

  • kshw
    started a topic How to retrieve URLs and text from web pages

    How to retrieve URLs and text from web pages

    Hi,

    I’m new to programming. I’m currently learning Python to write a web crawler that extracts all text from a web page and also follows further URLs and collects the text there. The idea is to place all the extracted text in a .txt file with one word per line, so the text has to be tokenized. All punctuation marks, duplicate words and non-stop words have to be removed.

    The program should crawl...
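
As a rough starting point, the text and link extraction described above could be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the names `PageParser`, `unique_words` and `parse_page`, the sample HTML and the example.com URL are all hypothetical. It parses an HTML string that has already been downloaded, and it leaves out fetching, crawl limits and word-list filtering.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageParser(HTMLParser):
    """Collect visible text and link targets from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)

def unique_words(text):
    """Lowercase, drop punctuation, and keep each word once in first-seen order."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return list(dict.fromkeys(words))

def parse_page(html, base_url):
    """Return (unique words, absolute links) for one page of HTML."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return unique_words(" ".join(parser.text_parts)), parser.links

# Hypothetical page content standing in for a downloaded document.
html = ('<html><body><p>Hello, world! Hello again.</p>'
        '<a href="/next">Next page</a><script>var x = 1;</script></body></html>')
words, links = parse_page(html, "http://example.com/start")
print("\n".join(words))  # one word per line, ready to write to a .txt file
print(links)
```

The returned links can be fed back into the same function page by page, with a set of already visited URLs to keep the crawl from looping.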