extract from html

  • Lydia Shawn

    extract from html

    hi,
    how can i extract the number between text1 and text2 in input.html
    only the first time it occurs ignoring the rest?
    preferably input.html would be a URL that stops downloading once a
    match has occurred, that would save a lot of bandwidth..
    i guess HTML::Parser would provide an option to work with a file while
    it's downloading (?)

    example
    ----

    input.html:

    bla..
    text1 555 text2
    bla
    bla
    text1 6000 text2
    bla
    EOF


    output.txt
    555


    thanks for your help,
    peter
  • Michael Korte

    #2
    Re: extract from html


    "Lydia Shawn" <apfeloma@hotmail.com> wrote in message
    news:1240b4dc.0308051647.685dde59@posting.google.com...
    > hi,
    > how can i extract the number between text1 and text2 in input.html
    > only the first time it occurs ignoring the rest?

    I would solve this problem with a hash. Use each term as a unique key:
    if the same term is found again, its entry is simply overwritten, or you
    can first ask the hash whether the term already exists.

    my %myHash;    # terms seen so far

    # $term is taken from your text, in between text1 and text2
    if (exists $myHash{$term}) {
        # already seen: ignore
    }
    else {
        $myHash{$term} = $value;    # first occurrence: remember it
    }
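
    The snippet above does not show how $term is obtained. A minimal
    sketch of one way to capture it, assuming the markers are the literal
    strings text1 and text2 and the value is a run of digits; the helper
    name first_number_between is hypothetical and not from the thread:

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: returns the first number found between the two
    # marker strings, or undef if there is no match.
    sub first_number_between {
        my ($text, $open, $close) = @_;
        # \Q...\E quotes the markers so they are matched literally.
        # Without /g the match stops at the leftmost occurrence.
        return $text =~ /\Q$open\E\s*(\d+)\s*\Q$close\E/ ? $1 : undef;
    }

    my $sample = "bla..\ntext1 555 text2\nbla\nbla\ntext1 6000 text2\nbla\n";
    my $term   = first_number_between($sample, 'text1', 'text2');
    print "$term\n" if defined $term;    # prints 555
    ```

    Because a match without /g returns only the leftmost occurrence, later
    occurrences such as 6000 are never looked at in this sketch.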

    The rest of your question: I don't know ... sorry

    > thanks for your help,
    > peter

    No problem... but what is your real name?
    "Lydia Shawn" or Peter :-)

    HTH
    Greetings, Michael



    • Brian Helterline

      #3
      Re: extract from html


      "Lydia Shawn" <apfeloma@hotmail.com> wrote in message
      news:1240b4dc.0308051647.685dde59@posting.google.com...
      > hi,
      > how can i extract the number between text1 and text2 in input.html
      > only the first time it occurs ignoring the rest?
      > preferably input.html would be a URL that stops downloading once a
      > match has occurred, that would save a lot of bandwidth..
      > i guess HTML::Parser would provide an option to work with a file while
      > it's downloading (?)

      Take a look at the lwp-download script (in your Perl bin directory)
      as an example of a program that incrementally downloads a URL.
      You can then search the content as it arrives for your text1 and
      text2 and stop as soon as they are found.

      The script uses LWP::UserAgent to do the download.

      --
      brian
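
      A minimal sketch of the incremental approach described above, under
      some assumptions: it uses the :content_cb option of LWP::UserAgent's
      get method, which passes each downloaded chunk to a callback, and it
      relies on the documented behaviour that dying inside the callback
      aborts the transfer. The helper names make_scanner and
      fetch_first_match, the hard-coded pattern, and the example URL are
      illustrative and not taken from the thread or from lwp-download.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: builds a callback that buffers incoming chunks
      # and dies as soon as the pattern matches. The captured number is
      # stored through $result_ref.
      sub make_scanner {
          my ($result_ref) = @_;
          my $buffer = '';
          return sub {
              my ($chunk) = @_;
              $buffer .= $chunk;    # keep earlier data so a match may span chunks
              if ($buffer =~ /text1\s*(\d+)\s*text2/) {
                  $$result_ref = $1;
                  die "match found\n";    # dying in the callback aborts the transfer
              }
          };
      }

      # Hypothetical helper: downloads $url chunk by chunk and returns the
      # first number between text1 and text2, or undef if none was seen.
      sub fetch_first_match {
          my ($url) = @_;
          require LWP::UserAgent;
          my $found;
          my $ua = LWP::UserAgent->new;
          # :content_cb hands each chunk to the callback instead of storing the body
          $ua->get($url, ':content_cb' => make_scanner(\$found));
          return $found;
      }

      # The URL below is a placeholder, not from the thread.
      # my $number = fetch_first_match('http://www.example.com/input.html');
      ```

      The buffer in this sketch grows until a match is found, so for very
      large pages it may be worth trimming it to the last few hundred bytes
      after each chunk.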

