XML Oddity

  • Mark Johnson

    XML Oddity

    >>DELURK<<

    Over the last few weeks, we've been working on building an online
    portfolio using XML to pass content to an HTML page via PHP. In the
    process, we've run across a rather inexplicable error which we've been
    unable to find any reference to elsewhere. Hopefully, someone who
    reads this will know what's going on and be able to provide some
    assistance.

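    Here is our XML:
    http://www.uky.edu/AuxServ/creativeg...tfolio_xml.txt

    Here is our HTML and PHP:
    http://www.uky.edu/AuxServ/creativeg...tfolio_php.txt

    And here is the page in action:
    http://www.uky.edu/AuxServ/creativeg.../portfolio.php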

    The problem is this: When a user clicks the third link under the
    "Digital" heading, as you can see from the XML, the following text
    ought to be displayed:

    ==begin==
    Such has been the patient sufferance of these Colonies; and such is now
    the necessity which constrains them to alter their former Systems of
    Government. The history of the present King of Great Britain [George
    III] is a history of repeated injuries and usurpations, all having in
    direct object the establishment of an absolute Tyranny over these
    States. To prove this, let Facts be submitted to a candid world. He
    has refused his Assent to Laws, the most wholesome and necessary for
    the public good. He has forbidden his Governors to pass Laws of
    immediate and pressing importance, unless suspended in their operation
    till his Assent should be obtained; and when so suspended, he has
    utterly neglected to attend to them.
    ==end==

    However, rather than that text being displayed in its entirety, the
    following is all that displays:
    ==begin==
    sing importance, unless suspended in their operation till his Assent
    should be obtained; and when so suspended, he has utterly neglected to
    attend to them.
    ==end==

    Somehow, everything prior to that point has been eaten.

    This is what we know: this error occurs in Windows XP, Mac OS X, and
    Red Hat Linux. It occurs regardless of whether IE or a Gecko-based
    browser is used. It occurs regardless of what type of server the files
    are uploaded to. If all elements are edited to contain the exact same
    number of characters, the error seems to disappear, but doing so
    renders the code useless for our purposes. No other errors have been
    noted. Changing the code so that no elements are undisplayed has no
    effect. The question is this: what is causing this error, and how can
    it be avoided? Any assistance would be greatly appreciated.

    Mark Johnson

  • Richard Light

    #2
    Re: XML Oddity

    In message <1112210810.044671.21640@o13g2000cwo.googlegroups.com>, Mark
    Johnson <markgj@gmail.com> writes

    Caveat: I know nothing about the PHP XML parser. However, I suspect
    that the problem is a failure to separate the physical reading of input
    blocks from the logical parsing of the data they contain. My reason for
    saying this is that the truncated phrase you quote "sing importance,
    unless suspended ..." is at the start of the second 4096-byte block in
    the file.

    I would guess that the parser handed you the first part of this data
    content, you placed it in your array variable, and then it handed you the
    second part ... Little suspecting this, you promptly overwrote the
    variable with this second chunk. You can easily test this hypothesis by
    changing the block size and seeing if the position of the error changes.

    If this is the case, you'll have to be a bit smarter about processing
    character data. Or get a better parser ...
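
The overwrite-versus-append hypothesis above can be sketched in a few lines. This example uses Python's `xml.parsers.expat` binding as a stand-in, on the assumption that it behaves like the handler-based PHP parser discussed in the thread. The function name `read_text`, the `<entry>` element, and the small block sizes are invented for illustration; the file in the thread would be read in 4096-byte blocks.

```python
import xml.parsers.expat


def read_text(xml_bytes, block_size, accumulate):
    """Parse xml_bytes in fixed-size blocks and return the character data
    that the handler is left holding once parsing finishes."""
    state = {"text": ""}

    def on_char_data(chunk):
        if accumulate:
            state["text"] += chunk  # append: keeps every chunk
        else:
            state["text"] = chunk   # overwrite: keeps only the last chunk

    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = on_char_data

    # Hand the document to the parser one block at a time, the way a
    # read loop over a file would.
    for start in range(0, len(xml_bytes), block_size):
        block = xml_bytes[start:start + block_size]
        is_final = start + block_size >= len(xml_bytes)
        parser.Parse(block, is_final)
    return state["text"]


# Hypothetical sample document: one element holding one run of text.
TEXT = "He has forbidden his Governors to pass Laws of immediate and pressing importance"
DOC = ("<entry>" + TEXT + "</entry>").encode("utf-8")

if __name__ == "__main__":
    print(repr(read_text(DOC, 64, accumulate=False)))  # only a tail of the text
    print(repr(read_text(DOC, 64, accumulate=True)))   # the whole text
    print(repr(read_text(DOC, 40, accumulate=False)))  # a different tail
```

If the overwriting handler is the culprit, changing the block size moves the point at which the text appears to start, which is the test suggested above. The appending handler returns the full text whatever the block size.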

    Richard Light
    >Over the last few weeks, we've been working on building an online
    >portfolio using XML to pass content to an HTML page via PHP. In the
    >process, we've run across a rather inexplicable error which we've been
    >unable to find any reference to elsewhere. Hopefully, someone who
    >reads this will know what's going on and be able to provide some
    >assistance.
    >
    >Here is our XML:
    >http://www.uky.edu/AuxServ/creativeg...tfolio_xml.txt
    >
    >Here is our HTML and PHP:
    >http://www.uky.edu/AuxServ/creativeg...tfolio_php.txt
    >
    >And here is the page in action:
    >http://www.uky.edu/AuxServ/creativeg.../portfolio.php
    >
    >The problem is this: When a user clicks the third link under the
    >"Digital" heading, as you can see from the XML, the following text
    >ought to be displayed:
    >
    >==begin==
    >Such has been the patient sufferance of these Colonies; and such is now
    >the necessity which constrains them to alter their former Systems of
    >Government. The history of the present King of Great Britain [George
    >III] is a history of repeated injuries and usurpations, all having in
    >direct object the establishment of an absolute Tyranny over these
    >States. To prove this, let Facts be submitted to a candid world. He
    >has refused his Assent to Laws, the most wholesome and necessary for
    >the public good. He has forbidden his Governors to pass Laws of
    >immediate and pressing importance, unless suspended in their operation
    >till his Assent should be obtained; and when so suspended, he has
    >utterly neglected to attend to them.
    >==end==
    >
    >However, rather than that text being displayed in its entirety, the
    >following is all that displays:
    >==begin==
    >sing importance, unless suspended in their operation till his Assent
    >should be obtained; and when so suspended, he has utterly neglected to
    >attend to them.
    >==end==
    >
    >Somehow, everything prior to that point has been eaten.
    >
    >This is what we know: this error occurs in Windows XP, Mac OS X, and
    >Red Hat Linux. It occurs regardless of whether IE or a Gecko-based
    >browser is used. It occurs regardless of what type of server the files
    >are uploaded to. If all elements are edited to contain the exact same
    >number of characters, the error seems to disappear, but doing so
    >renders the code useless for our purposes. No other errors have been
    >noted. Changing the code so that no elements are undisplayed has no
    >effect. The question is this: what is causing this error, and how can
    >it be avoided? Any assistance would be greatly appreciated.
    >
    >Mark Johnson

    --
    Richard Light
    SGML/XML and Museum Information Consultancy
    richard@light.demon.co.uk


    • Malcolm Dew-Jones

      #3
      Re: XML Oddity

      Richard Light (richard@light.demon.co.uk) wrote:
      : In message <1112210810.044671.21640@o13g2000cwo.googlegroups.com>, Mark
      : Johnson <markgj@gmail.com> writes

      : Caveat: I know nothing about the PHP XML parser. However, I suspect
      : that the problem is a failure to separate the physical reading of input
      : blocks from the logical parsing of the data they contain. My reason for
      : saying this is that the truncated phrase you quote "sing importance,
      : unless suspended ..." is at the start of the second 4096-byte block in
      : the file.

      : I would guess that the parser handed you the first part of this data
      : content, you placed it in your array variable, and then it handed you the
      : second part ... Little suspecting this, you promptly overwrote the
      : variable with this second chunk. You can easily test this hypothesis by
      : changing the block size and seeing if the position of the error changes.

      : If this is the case, you'll have to be a bit smarter about processing
      : character data. Or get a better parser ...
                        ^^^^^^^^^^^^^^^^^^^^^^

      sounds like a likely scenario

      however that doesn't mean there's anything wrong with the parser. a SAX
      parser has no requirement to feed all of some contiguous character data in
      a single call, and in fact a parser that did so could be considered a
      problem.

      Imagine if I had an XML document that had a gigabyte of contiguous
      character data. One of the points of the SAX parser is that it can feed
      that data to the handler in smaller, more memory-efficient chunks, and not
      have to load the entire string into memory.
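
A minimal sketch of that memory argument, using Python's `xml.sax` rather than PHP: the handler folds each chunk into a running length and SHA-256 digest and never holds the whole string. `DigestHandler`, `digest_streamed_text`, and the `<doc>` element are hypothetical names chosen for the example, and the pieces are assumed to be plain ASCII text with no markup characters.

```python
import hashlib
import xml.sax


class DigestHandler(xml.sax.ContentHandler):
    """Consume character data chunk by chunk without storing it."""

    def __init__(self):
        super().__init__()
        self.length = 0
        self.sha256 = hashlib.sha256()

    def characters(self, content):
        # May be called many times for one run of text; each chunk is
        # folded into the running totals and then dropped.
        self.length += len(content)
        self.sha256.update(content.encode("utf-8"))


def digest_streamed_text(pieces):
    """Feed <doc>...</doc> to an incremental SAX parser piece by piece and
    return (total characters, hex digest) of the text content."""
    handler = DigestHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.feed(b"<doc>")
    for piece in pieces:
        parser.feed(piece)
    parser.feed(b"</doc>")
    parser.close()
    return handler.length, handler.sha256.hexdigest()


if __name__ == "__main__":
    # About one megabyte of text, produced lazily 1024 bytes at a time.
    pieces = (b"x" * 1024 for _ in range(1000))
    print(digest_streamed_text(pieces))
```

Because the generator yields one piece at a time and the handler keeps only counters, the full text never has to exist as a single string on the handler side.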




      --

      This space not for rent.


      • Richard Light

        #4
        Re: XML Oddity

        In message <424c350d@news.victoria.tc.ca>, Malcolm Dew-Jones
        <yf110@vtn1.victoria.tc.ca> writes
        >however that doesn't mean there's anything wrong with the parser. a SAX
        >parser has no requirement to feed all of some contiguous character data in
        >a single call, and in fact a parser that did so could be considered a
        >problem.
        >
        >Imagine if I had an XML document that had a gigabyte of contiguous
        >character data. One of the points of the SAX parser is that it can feed
        >that data to the handler in smaller, more memory-efficient chunks, and not
        >have to load the entire string into memory.

        I would agree with that principle entirely. However, from a software
        engineering point of view, I would expect as the user of such a parser
        to be able to control the "text chunk" size, and not have character data
        cut into arbitrary chunks based on where the block boundaries in the
        input stream happen to fall.
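
For comparison, some bindings do expose a control of this kind. The sketch below uses the `buffer_text` attribute of Python's `xml.parsers.expat` parser objects, which asks the binding to coalesce adjacent character data before calling the handler (a companion `buffer_size` attribute sets the buffer's size). It is shown only as an example of the design; whether the PHP parser in this thread offers anything comparable is not established here. `collect_chunks` and the sample document are hypothetical.

```python
import xml.parsers.expat


def collect_chunks(xml_text, buffer_text):
    """Return the list of chunks passed to the character data handler
    when xml_text is parsed in a single Parse call."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = buffer_text  # True: coalesce adjacent text where possible
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return chunks


# Hypothetical sample: the entity reference and the newline both tend to
# split the text into separate callbacks when buffering is off.
DOC = "<entry>Laws &amp; Assent\nsuspended</entry>"

if __name__ == "__main__":
    print(collect_chunks(DOC, buffer_text=False))
    print(collect_chunks(DOC, buffer_text=True))
```

Even with buffering on, a handler that appends rather than overwrites stays correct in both modes, so the setting is a convenience and not a substitute for accumulating.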

        Richard
        --
        Richard Light
        SGML/XML and Museum Information Consultancy
        richard@light.demon.co.uk
