Parsing Web Sites

  • Colum

    Parsing Web Sites

    Hi, I need to parse particular web sites to extract particular information
    on a weekly basis. How is this done in PHP, and is PHP better at doing this
    than JSP?


  • Nikolai Chuvakhin

    #2
    Re: Parsing Web Sites

    "Colum" <columfoley@hotmail.com> wrote in message
    news:<Vi1Ab.3368$nm6.18313@news.indigo.ie>...
    >
    > I need to parse particular web sites to extract particular
    > information on a weekly basis. How is this done in PHP

    $remote = file_get_contents('http://www.somesite.com/');

    Now string $remote contains the entire index file for
    http://www.somesite.com/. You can parse it, extract anything
    you want from it, or do whatever you please with it.
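
    A minimal sketch of the parsing step, assuming a current PHP build with
    the DOM extension enabled. The helper name extract_links() is made up for
    illustration, and collecting link targets is only one example of
    "extract anything you want":

    ```php
    <?php
    // Hypothetical helper: collect the href of every <a> element in an HTML string.
    function extract_links(string $html): array
    {
        $doc = new DOMDocument();
        // Real-world pages are rarely valid HTML, so collect parser warnings
        // internally instead of printing them, then discard them.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $links = [];
        foreach ($doc->getElementsByTagName('a') as $anchor) {
            if ($anchor->hasAttribute('href')) {
                $links[] = $anchor->getAttribute('href');
            }
        }
        return $links;
    }

    // Usage, with the placeholder URL from above:
    // $remote = file_get_contents('http://www.somesite.com/');
    // if ($remote !== false) { print_r(extract_links($remote)); }
    ```

    A DOM parser tends to survive markup changes better than regular
    expressions, but either approach works for simple pages.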

    As for running it weekly: PHP itself has no scheduling tools.
    You will have to use OS-level scheduling, via cron on Unix
    or Task Scheduler on Windows.

    > and is PHP better at doing this than JSP?

    This is very basic functionality, so it's highly unlikely
    one scripting environment will be much better at it than
    another...

    Cheers,
    NC

    • Charlie-Boo

      #3
      Re: Parsing Web Sites

      "Colum" <columfoley@hotmail.com> wrote in message news:<Vi1Ab.3368$nm6.18313@news.indigo.ie>...
      > Hi I need to parse particular web sites to extract particular information on
      > a weekly basis. How is this done in PHP and is PHP better at doing this than
      > JSP?

      Get SNOOPY!

      Charlie

      • Fox

        #4
        Re: Parsing Web Sites



        Nikolai Chuvakhin wrote:
        >
        > "Colum" <columfoley@hotmail.com> wrote in message
        > news:<Vi1Ab.3368$nm6.18313@news.indigo.ie>...
        > >
        > > I need to parse particular web sites to extract particular
        > > information on a weekly basis. How is this done in PHP
        >
        > $remote = file_get_contents('http://www.somesite.com/');

        file_get_contents() is only available in PHP 4.3 and later; many hosts
        (CIHost, for example) still only support 4.2.x or earlier.

        On PHP 4.2 and earlier, use fsockopen() and fgets() instead.
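
        A rough sketch of that fallback, assuming plain HTTP on port 80. The
        helper names http_get() and split_http_response() are invented for
        this example, and redirects, HTTPS and chunked encoding are not
        handled:

        ```php
        <?php
        // Hypothetical helper: split a raw HTTP response into array(headers, body).
        function split_http_response($raw)
        {
            $pos = strpos($raw, "\r\n\r\n");
            if ($pos === false) {
                return array($raw, '');
            }
            return array(substr($raw, 0, $pos), substr($raw, $pos + 4));
        }

        // Hypothetical helper: fetch a page body with fsockopen() and fgets().
        function http_get($host, $path = '/', $timeout = 30)
        {
            $fp = fsockopen($host, 80, $errno, $errstr, $timeout);
            if (!$fp) {
                return false;
            }
            // HTTP/1.0 avoids chunked transfer encoding, which this sketch does not decode.
            fwrite($fp, "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
            $raw = '';
            while (!feof($fp)) {
                $line = fgets($fp, 4096);
                if ($line === false) {
                    break;
                }
                $raw .= $line;
            }
            fclose($fp);
            list($headers, $body) = split_http_response($raw);
            return $body;
        }

        // Usage, with the placeholder host from the quoted post:
        // $remote = http_get('www.somesite.com', '/');
        ```

        Unlike file_get_contents(), the socket returns the response headers
        too, which is why the body has to be split off by hand.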

        >
        > Now string $remote contains the entire index file for
        > http://www.somesite.com/. You can parse it, extract anything
        > you want from it, or do whatever you please with it.
        >
        > As to the weekly basis, PHP itself has no scheduling tools.
        > You will have to use OS-level scheduling via cron on Unix
        > or Scheduler on Windows.
        >
        > > and is PHP better at doing this than JSP?
        >
        > This is a very basic functionality, so it's highly unlikely
        > one scripting environment will be much better at it than
        > another...
        >
        > Cheers,
        > NC

        • Zurab Davitiani

          #5
          Re: Parsing Web Sites

          Fox wrote on Friday 05 December 2003 18:19:

          >> $remote = file_get_contents('http://www.somesite.com/');
          >
          > This is only available on php 4.3+ -- many hosts still only support
          > 4.2.x or less... (like CIHost)
          >
          > In case of php4.2-, use fsockopen and fgets

          If the host in question has the fopen URL wrappers enabled
          (allow_url_fopen), you only need file(), or fopen() and fread();
          socket functions would be overkill for such a simple task.
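
          Roughly like this, as a sketch only. The helper name read_all() is
          made up here; it accepts anything fopen() can open, and a URL only
          works when allow_url_fopen is on:

          ```php
          <?php
          // Hypothetical helper: read a whole stream (local path or URL)
          // with fopen() and fread().
          function read_all($source)
          {
              // Suppress the warning on failure and report it via the return value.
              $fp = @fopen($source, 'rb');
              if (!$fp) {
                  return false;
              }
              $data = '';
              while (!feof($fp)) {
                  // Network streams may return fewer bytes than requested, hence the loop.
                  $chunk = fread($fp, 8192);
                  if ($chunk === false) {
                      break;
                  }
                  $data .= $chunk;
              }
              fclose($fp);
              return $data;
          }

          // Usage, with the placeholder URL from the quoted post:
          // $remote = read_all('http://www.somesite.com/');
          // The file() route is shorter still:
          // $remote = implode('', file('http://www.somesite.com/'));
          ```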

          • Robert Downes

            #6
            Re: Parsing Web Sites

            Colum wrote:
            > Hi I need to parse particular web sites to extract particular information on
            > a weekly basis. How is this done in PHP and is PHP better at doing this than
            > JSP?

            Unless you're a search engine, you're not gonna make yourself too
            popular by harvesting information from other people's sites.
            --
            Bob
            London, UK
            echo Mail fefsensmrrjyahe eoceoq\! | tr "jefroq\!" "@obe.uk"
