How do I parse this page?

  • nntp

    How do I parse this page?

    I am trying to parse
    http://www.ebay.com without success.

    I view the source, and I see a lot of ?/td>. This page is unsavable.

    It displays perfectly in IE, but once the source is saved/viewed, it no
    longer displays correctly in IE.

    When I use LYNX to view it, it is formatted perfectly.

    My question is: how does Ebay let any browser display the content correctly
    without allowing View Source or Save As?


  • Gregory Toomey

    #2
    Re: How do I parse this page?

    nntp wrote:
    > I am trying to parse
    > http://www.ebay.com without success.

    In Perl, try
    http://search.cpan.org/~gaas/HTML-Parser-3.35/Parser.pm

    > I view the source, and I see a lot of ?/td>. This page is unsavable.
    >
    > It displays perfectly in IE, but once the source is saved/viewed, it no
    > longer displays correctly in IE.

    Maybe it uses CSS, or needs images to provide formatting hints.

    > When I use LYNX to view it, it is formatted perfectly.
    >
    > My question is: how does Ebay let any browser display the content correctly
    > without allowing View Source or Save As?

    Please don't clutter Perl newsgroups with web server questions.

    gtoomey
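HTML::Parser, the module linked above, is event-driven: you supply handlers that fire on start tags, text and end tags as the parser scans the document. The sketch below shows that same callback style using Python's standard-library `html.parser` module as a stand-in; it is not HTML::Parser's API. The class `CellTextParser`, the helper `cell_texts`, and the choice to collect `<td>` text are illustrative assumptions, not something eBay's page is known to require.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of each <td> cell, in document order (illustrative)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive already decoded
        self.cells = []      # finished cell texts
        self._in_td = False  # currently inside a <td>?
        self._buf = []       # text fragments of the current cell

    def _flush(self):
        # Finish the open cell, if any.
        if self._in_td:
            self.cells.append("".join(self._buf).strip())
            self._in_td = False
            self._buf = []

    def handle_starttag(self, tag, attrs):
        # A new cell or row implicitly closes a cell whose </td> was omitted.
        if tag in ("td", "tr"):
            self._flush()
        if tag == "td":
            self._in_td = True

    def handle_data(self, data):
        # Text inside nested tags (<b>, <a>, ...) still belongs to the cell.
        if self._in_td:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._flush()


def cell_texts(html):
    """Return the text of every table cell in an HTML string."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells


if __name__ == "__main__":
    sample = "<table><tr><td>Item</td><td>Price &amp; shipping</td></tr></table>"
    print(cell_texts(sample))  # ['Item', 'Price & shipping']
```

Because `html.parser` does not infer omitted end tags, the sketch closes an open cell itself when the next `<td>`, `<tr>` or `</table>` arrives, which is common in hand-written table markup.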


    • nntp

      #3
      Re: How do I parse this page?

      > > I am trying to parse
      > > http://www.ebay.com without success.
      >
      > In Perl, try
      > http://search.cpan.org/~gaas/HTML-Parser-3.35/Parser.pm
      >
      > > I view the source, and I see a lot of ?/td>. This page is unsavable.
      >
      > > It displays perfectly in IE, but once the source is saved/viewed, it no
      > > longer displays correctly in IE.
      >
      > Maybe it uses CSS, or needs images to provide formatting hints.

      Have you looked at the source code of www.ebay.com?
      I don't know what you mean by "uses images to provide formatting hints".



      • Scott Bryce

        #4
        Re: How do I parse this page?

        nntp wrote:
        > I don't know what you mean by "uses images to provide formatting hints".

        Transparent GIFs, perhaps?


        • Toby Inkster

          #5
          Re: How do I parse this page?

          [F'ups set to a.w.w.]

          nntp wrote:
          > http://www.ebay.com
          > I view the source, and I see a lot of ?/td>. This page is unsavable.
          > It displays perfectly in IE, but once the source is saved/viewed, it no
          > longer displays correctly in IE. My question is: how does Ebay let any
          > browser display the content correctly without allowing View Source or
          > Save As?

          IE doesn't simply show you the source when you hit the "view source"
          button. Oh no. That would be too easy. It does all kinds of weird crap
          first and then shows you some modified source code. I'm guessing that some
          of that weird crap screws up some of the characters.

          Look at the source code in a different browser and it displays fine.

          Not that you should try to emulate any of that code. It's pants.

          --
          Toby A Inkster BSc (Hons) ARCS
          Contact Me ~ http://tobyinkster.co.uk/contact


          • George King

            #6
            Re: How do I parse this page?

            "nntp" <nntp@rogers.com> wrote in message
            news:_dydnarTGPNdFePcRVn-sQ@rogers.com...
            >I am trying to parse
            > http://www.ebay.com without success.
            >
            > I view the source, and I see a lot of ?/td>. This page is unsavable.
            >
            > It displays perfectly in IE, but once the source is saved/viewed, it no
            > longer displays correctly in IE.
            >
            > When I use LYNX to view it, it is formatted perfectly.
            >
            > My question is: how does Ebay let any browser display the content correctly
            > without allowing View Source or Save As?

            I don't have a copy of Lynx, so I can't duplicate your problem, but...
            Opera saves the file with images and IE displays it just fine from the saved
            files.

            Ebay.com (index.html) uses an external CSS stylesheet. It also uses a
            sizeable number of external javascript files and 68 images to make up the
            page I looked at.
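This is why a saved copy of the HTML on its own can look broken: the layout also depends on the external stylesheet, scripts and images the page references. A minimal sketch, again using Python's `html.parser` as a stand-in for a Perl HTML parser, that lists those references. `ResourceLister` and `external_resources` are hypothetical names, and the sketch only looks at `<link rel="stylesheet">`, `<script src>` and `<img src>`; images pulled in from CSS are not found.

```python
from html.parser import HTMLParser


class ResourceLister(HTMLParser):
    """Record external stylesheets, scripts and images a page refers to."""

    def __init__(self):
        super().__init__()
        self.stylesheets = []
        self.scripts = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased,
        # and value is None for attributes written without "=".
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and a.get("href"):
            self.stylesheets.append(a["href"])
        elif tag == "script" and a.get("src"):
            self.scripts.append(a["src"])
        elif tag == "img" and a.get("src"):
            self.images.append(a["src"])
        # Self-closing forms such as <img ... /> reach this method too, via
        # HTMLParser's default handle_startendtag.


def external_resources(html):
    """Map each resource kind to the URLs referenced in an HTML string."""
    lister = ResourceLister()
    lister.feed(html)
    lister.close()
    return {"css": lister.stylesheets, "js": lister.scripts, "img": lister.images}


if __name__ == "__main__":
    page = ('<link rel="stylesheet" href="s.css">'
            '<script src="a.js"></script>'
            '<img src="x.gif"><img src="y.gif" />')
    found = external_resources(page)
    print({kind: len(urls) for kind, urls in found.items()})  # {'css': 1, 'js': 1, 'img': 2}
```

Relative URLs in the result would still need resolving against the page's address before they could be fetched.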

            George




            • Ben Morrow

              #7
              Re: How do I parse this page?


              Quoth "nntp" <nntp@rogers.com>:
              > I am trying to parse
              > http://www.ebay.com without success.
              >
              > I view the source, and I see a lot of ?/td>. This page is unsavable.
              >
              > It displays perfectly in IE, but once the source is saved/viewed, it no
              > longer displays correctly in IE.
              >
              > When I use LYNX to view it, it is formatted perfectly.
              >
              > My question is: how does Ebay let any browser display the content correctly
              > without allowing View Source or Save As?

              They can't. You've probably got character-set issues. Use LWP to retrieve the
              page.
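The point here is that the page's bytes have to be decoded with the character set the server declares. The sketch below shows that step in Python (standard library only) rather than Perl's LWP: it reads the `charset` parameter of the `Content-Type` header and, as an assumption, falls back to ISO-8859-1, the RFC 2616 default for `text/*` bodies, when none is given. Decoding with the wrong charset is one plausible way to end up with stray `?`-like replacement characters; whether that is what happened in this case is not established. All function names are hypothetical.

```python
import urllib.request
from email.message import Message

# Assumption: ISO-8859-1 as the fallback, the RFC 2616 default for text/* bodies.
DEFAULT_CHARSET = "iso-8859-1"


def charset_from_content_type(content_type, default=DEFAULT_CHARSET):
    """Return the (lowercased) charset parameter of a Content-Type value, or default."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_body(raw, content_type, default=DEFAULT_CHARSET):
    """Decode raw response bytes using the declared charset.

    errors="replace" turns undecodable bytes into U+FFFD instead of raising,
    which makes a wrong guess visible rather than fatal.
    """
    charset = charset_from_content_type(content_type, default)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # the header named a charset Python does not know
        return raw.decode(default, errors="replace")


def fetch_text(url):
    """Fetch a page and decode it (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
        content_type = resp.headers.get("Content-Type")
    return decode_body(raw, content_type)


if __name__ == "__main__":
    raw = "Caf\u00e9 \u00a35".encode("utf-8")  # "Café £5" as UTF-8 bytes
    print(ascii(decode_body(raw, "text/html; charset=utf-8")))     # correct charset
    print(ascii(decode_body(raw, "text/html; charset=us-ascii")))  # non-ASCII bytes -> U+FFFD
    print(ascii(decode_body(raw, "text/html")))                    # no charset: Latin-1 mojibake
```

The same bytes come out as the intended text, as replacement characters, or as Latin-1 mojibake depending only on which charset is used to decode them.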

              Ben

              --
              I must not fear. Fear is the mind-killer. I will face my fear and
              I will let it pass through me. When the fear is gone there will be
              nothing. Only I will remain.
              ben@morrow.me.uk Frank Herbert, 'Dune'


              • A. Sinan Unur

                #8
                Re: How do I parse this page?

                "nntp" <nntp@rogers.com> wrote in
                news:_dydnarTGPNdFePcRVn-sQ@rogers.com:

                > I am trying to parse
                > http://www.ebay.com without success.
                >
                > I view the source, and I see a lot of ?/td>. This page is unsavable.

                That ain't true. If you have any questions on parsing HTML using
                HTML::Parser, please post them here. Otherwise, this is waaay off-topic.

                Sinan


                • Dr John Stockton

                  #9
                  Re: How do I parse this page?

                  JRS: In article <Xns958EB135B42A7asu1cornelledu@132.236.56.8>, dated
                  Tue, 26 Oct 2004 21:25:13, seen in news:comp.lang.javascript, A. Sinan
                  Unur <1usa@llenroc.ude.invalid> posted :
                  >"nntp" <nntp@rogers.com> wrote in
                  >news:_dydnarTGPNdFePcRVn-sQ@rogers.com:
                  >
                  >> I am trying to parse
                  >> http://www.ebay.com without success.
                  >>
                  >> I view the source, and I see a lot of ?/td>. This page is unsavable.
                  >
                  >That ain't true. If you have any questions on parsing HTML using
                  >HTML::Parser, please post them here. Otherwise, this is waaay off-topic.

                  Please take greater, or at least better, thought before using a word
                  such as "here".

                  --
                  © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
                  <URL:http://www.jibbering.com/faq/> JL/RC: FAQ of news:comp.lang.javascript
                  <URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
                  <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.


                  • Tad McClellan

                    #10
                    Re: How do I parse this page?

                    Dr John Stockton <spam@merlyn.demon.co.uk> wrote:
                    > JRS: In article <Xns958EB135B42A7asu1cornelledu@132.236.56.8>, dated
                    > Tue, 26 Oct 2004 21:25:13, seen in news:comp.lang.javascript, A. Sinan
                    > Unur <1usa@llenroc.ude.invalid> posted :
                    >>"nntp" <nntp@rogers.com> wrote in
                    >>news:_dydnarTGPNdFePcRVn-sQ@rogers.com:
                    >>
                    >>> I am trying to parse
                    >>> http://www.ebay.com without success.
                    >>>
                    >>> I view the source, and I see a lot of ?/td>. This page is unsavable.
                    >>
                    >>That ain't true. If you have any questions on parsing HTML using
                    >>HTML::Parser, please post them here. Otherwise, this is waaay off-topic.
                    >
                    > Please take greater, or at least better, thought before using a word
                    > such as "here".


                    Please take greater, or at least better, notice of the Newsgroups
                    header before determining which "where" is "here".

                    :-)


                    --
                    Tad McClellan SGML consulting
                    tadmc@augustmail.com Perl programming
                    Fort Worth, Texas

