User Profile

andrewwan1980
Last Activity: Sep 29 '08, 03:48 PM
Joined: Nov 14 '07

  • andrewwan1980
    replied to HELP: parsing unicode web sites
    in Perl
    Thanks to those who helped. Here's my working script:
    Code:
    #!/usr/bin/perl
    # tom365crawl2.pl
    # http://www.cs.utk.edu/cs594ipm/perl/crawltut.html
    # http://perldoc.perl.org/Encode.html
    # http://juerd.nl/site.plp/perluniadvice
    # http://www.perlmonks.org/?node_id=620068
    
    use warnings;
    use strict;
    
    use File::stat;
    use Tie::File;
    
    use LWP::Simple;
    ...

  • andrewwan1980
    started a topic HELP: parsing unicode web sites
    in Perl

    HELP: parsing unicode web sites

    I need help parsing Unicode web pages and downloading JPEG image files with Perl scripts.

    I read http://www.cs.utk.edu/cs594ipm/perl/crawltut.html about using the LWP and HTTP libraries and the get($url) function, but the content returned is always garbled. When I use get($url) on a non-Unicode web page, the content comes back as clean ASCII.

    But now I want to parse http://www.tom365.com/movie_2004/html/5507.html...

  • HELP: managing multiple virtual folders of cloned websites

    I am an ASP web developer developing a website.

    I've got ASP files in sub-folders many levels deep.

    I've always used:

    <!-- #INCLUDE VIRTUAL="/VIRTUALSITE/SUBFOLDER1/SUBFOLDER2/FILE.asp" -->

    The VIRTUALSITE name is really the virtual directory created under IIS.

    I clone my website multiple times, to cater for different people. So, each person gets a...

  • andrewwan1980
    replied to QUERY: comparing website contents
    I need a tool that gets the substring between delimiters, line-wraps the result at 79 characters, and then diffs it, for both oldsite/old1.htm and newsite/new1.htm.

    As for web crawling, the old site is local and the new site is online. But I'd rather hard-code the URLs in a big list (mapping).

    I think I'll use Perl (maybe Python), to:

    1. for each item in mapping list
    1.1...
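
The pipeline described in this reply (take the substring between delimiters, wrap it at 79 characters, then diff) could be sketched roughly as below. This is only an illustration in Python, not the poster's script: the function names, the regex-based tag stripping, and the example delimiters are all assumptions, and real pages would probably need a proper HTML parser.

```python
import difflib
import re
import textwrap


def extract_between(html, start, end):
    """Return the text between the first `start` marker and the next `end` marker."""
    begin = html.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    stop = html.find(end, begin)
    return html[begin:] if stop == -1 else html[begin:stop]


def normalize(fragment, width=79):
    """Strip tags, collapse whitespace, and wrap the text to `width` columns."""
    # Crude tag removal (an assumption for this sketch, not a real HTML parser).
    text = re.sub(r"<[^>]+>", " ", fragment)
    text = " ".join(text.split())
    return textwrap.wrap(text, width=width)


def compare_pages(old_html, new_html, start, end):
    """Diff the wrapped text found between the delimiters on both pages."""
    old_lines = normalize(extract_between(old_html, start, end))
    new_lines = normalize(extract_between(new_html, start, end))
    return list(difflib.unified_diff(old_lines, new_lines, "old", "new", lineterm=""))


if __name__ == "__main__":
    # Hypothetical page contents and delimiters, standing in for old1.htm / new1.htm.
    old = "<html><!-- BEGIN --><p>Hello   world</p><!-- END --></html>"
    new = "<div><!-- BEGIN --><span>Hello there world</span><!-- END --></div>"
    for line in compare_pages(old, new, "<!-- BEGIN -->", "<!-- END -->"):
        print(line)
```

Wrapping both sides to the same width before diffing keeps a small wording change from showing up as one giant changed line, which seems to be the point of the 79-character step.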

  • andrewwan1980
    started a topic QUERY: comparing website contents

    QUERY: comparing website contents

    I've got two websites, one original, the other based off the original.

    I'd like to compare the websites using automatic diff tools to see what text/information has changed. The problem is that the HTML code and layout have been changed drastically, so I can't do a straight text-file compare. What I am interested in is purely the raw content (paragraphs, sentences, etc.). The original site has no javascript, onmouseover hovers,...

  • andrewwan1980
    replied to HELP: IE7 pop-up blocker bypass
    The parent window contains the main form with all the hidden input elements for posting, plus the correct URL, etc. The child windows know nothing about the URL, nor do they have any forms of their own. The child window would call the parent window's form to submit... to OpenDoc.asp to open a new child window. I do not want the child window to open another child window within itself. Only the parent window should be allowed to open many child windows....

  • andrewwan1980
    started a topic HELP: IE7 pop-up blocker bypass

    HELP: IE7 pop-up blocker bypass

    I have a problem with a child window using the parent's FORM to do a submission/post to open a new window. This is being blocked by IE7. This is happening when accessing an external website that is not listed in the IE7 pop-up blocker allow list.

    If the current page does a FORM submission/post to target "_blank" then this is allowed and a new window pops up. However, if the new child window calls opener.top.document.someForm.submit();...