Questions about character entities in XML and PCI security compliance

  • tempest@ucla.edu

    Questions about character entities in XML and PCI security compliance

    Hi all.

    This is a rather long posting but I have some questions concerning the
    usage of character entities in XML documents and PCI security
    compliance.

    The company I work for uses a third-party ecommerce service to host
    its online store. A few months ago this third-party commerce site
    began using PGP file encryption on the XML files (e.g. web orders)
    it transfers to us, as part of its ongoing PCI security compliance.
    Basically, we only needed to add a PGP decryption step before parsing
    the incoming XML files, so there should not have been any technical
    issue.

    However, we noticed that the XML files they have created since PGP
    encryption was implemented contain some unusual character entities.

    For example, if an XML file has elements containing characters such as
    <, >, &, -, /, ' and so on, the file uses the following character
    entities to represent them:

    Character   Unusual character entity
    <           &amp;lt;
    >           &amp;gt;
    &           &amp;amp;
    -           &amp;#45;
    /           &amp;#47;
    '           &amp;#39;

    No matter how you look at them, they are NOT the proper character
    entities for the original characters shown.
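    For comparison, a standard XML library escapes element text only as
    far as the syntax requires: & and < must be escaped, > is commonly
    escaped as well, and -, / and ' need no escaping in element text at
    all. A minimal Python sketch using the standard library's
    xml.etree.ElementTree (Python stands in for the poster's .NET code;
    serialize_field and the element name ShipTo are hypothetical):

```python
import xml.etree.ElementTree as ET

def serialize_field(value: str) -> str:
    """Serialize one text value the way a standard XML library does."""
    elem = ET.Element("ShipTo")  # hypothetical element name
    elem.text = value
    return ET.tostring(elem, encoding="unicode")

for value in ["Christian & Cruz", "c/o R. Fenton, M.D.", "a < b > c", "O'Brien - Smith"]:
    print(serialize_field(value))
# <ShipTo>Christian &amp; Cruz</ShipTo>
# <ShipTo>c/o R. Fenton, M.D.</ShipTo>
# <ShipTo>a &lt; b &gt; c</ShipTo>
# <ShipTo>O'Brien - Smith</ShipTo>
```

    Each value round-trips: parsing the serialized element gives back the
    original string.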

    The problem with these bad character entities is that when we use .NET
    Framework components such as XmlReader to load the XML file, the
    entities are not expanded back to the original characters they
    represent.

    Instead, I get the following result:

    Unusual character entity   Expanded result
    &amp;lt;                   &lt;
    &amp;gt;                   &gt;
    &amp;#38;                  &#38;
    &amp;#45;                  &#45;
    &amp;#47;                  &#47;
    &amp;#39;                  &#39;

    If you look closely at the expanded results, you will see that they
    are the normal character entities you would expect to see.

    It seems to me that the XML export process used by the ecommerce site
    has applied character entity "encoding" twice.

    For example, a valid character reference for / is &#47;. However, if
    you treat &#47; as a data string rather than as a character reference
    and apply another round of "encoding", you get &amp;#47;.
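    The double encoding described above can be reproduced with a small
    Python sketch. The first-pass mapping is inferred from the examples in
    this post and is only a guess at the exporter's behavior, not its
    actual code (first_pass and second_pass are hypothetical names); the
    second pass is the standard library's xml.sax.saxutils.escape, which
    replaces &, < and >.

```python
from xml.sax.saxutils import escape

# HYPOTHETICAL first pass, inferred from the examples in the post:
# each listed character becomes a character reference.
FIRST_PASS = {"<": "&lt;", ">": "&gt;", "&": "&#38;",
              "-": "&#45;", "/": "&#47;", "'": "&#39;"}

def first_pass(text: str) -> str:
    # One scan over the input, so the "&" inside a replacement is never revisited.
    return "".join(FIRST_PASS.get(ch, ch) for ch in text)

def second_pass(text: str) -> str:
    # Generic XML text escaping that treats the pass-1 output as plain data.
    return escape(text)

for value in ["Christian & Cruz", "c/o R. Fenton, M.D."]:
    once = first_pass(value)
    twice = second_pass(once)
    print(f"{value!r} -> {once!r} -> {twice!r}")
# 'Christian & Cruz' -> 'Christian &#38; Cruz' -> 'Christian &amp;#38; Cruz'
# 'c/o R. Fenton, M.D.' -> 'c&#47;o R. Fenton, M.D.' -> 'c&amp;#47;o R. Fenton, M.D.'
```

    After one pass the text is already valid XML content; it is the second
    escape over that output that turns &#47; into &amp;#47;.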

    This means that whenever an online customer enters characters such as
    & or / in their name or shipping address, the XML file we parse does
    not give us the correct text.

    For example, if a customer entered "Christian & Cruz" in their shipping
    address, the XML file we downloaded shows it as "Christian &amp;#38;
    Cruz", and when the XML file is parsed, the resulting string is
    "Christian &#38; Cruz".
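    The parser is behaving correctly here: a conforming XML parser
    (XmlReader included) resolves each reference exactly once, so a value
    that was escaped twice comes out still escaped once. A small Python
    sketch with the standard library's ElementTree parser shows the same
    result (the element names are hypothetical):

```python
import xml.etree.ElementTree as ET

# A double-escaped order fragment as received after PGP decryption.
# Element names are hypothetical.
received = (
    "<Order>"
    "<ShipName>Christian &amp;#38; Cruz</ShipName>"
    "<Note>a &amp;lt; b</Note>"
    "</Order>"
)

root = ET.fromstring(received)
for field in root:
    # A conforming parser resolves exactly one level of references.
    print(field.tag, repr(field.text))
# ShipName 'Christian &#38; Cruz'
# Note 'a &lt; b'
```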

    Another example: if a customer entered "c/o R. Fenton, M.D." in their
    shipping address, the XML file shows this string as "c&amp;#47;o R.
    Fenton, M.D.", and the parsed result is "c&#47;o R. Fenton, M.D.".

    When we reported this problem to the ecommerce hosting company, their
    response was that these character entities were "encoded" per PCI
    security policy, and so they have no plan to "fix" them.

    Their reply sounds strange, because these weird character entities are
    NOT data encryption, nor do they provide any security benefit.

    Can anyone tell me whether PCI security compliance in fact requires
    some kind of special character entities in XML files?

    Or is our ecommerce hosting company wrong?

    Any information would be appreciated.
    Thank you.
  • Joe Fawcett

    #2
    Re: Questions about character entities in XML and PCI security compliance



    <tempest@ucla.edu> wrote in message
    news:55sm94p8ig2vpodjolu2cq5d961fu8c4q1@4ax.com...
    [snip]
    Well we have similar files and I've never seen that happen. As you say they
    seem to be escaping twice. In my opinion they're wrong but I'd need to know
    their process etc.
    Pragmatically you may need to un-escape once before treating the file as
    XML.
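    Joe's suggestion, sketched in Python: replace each "&amp;" in the raw
    text with "&" before parsing. This is only safe under the assumption
    that every "&amp;" in the file comes from the unwanted second pass; a
    value that was escaped just once becomes a bare ampersand and the file
    no longer parses. The function name unescape_once_raw is hypothetical.

```python
import xml.etree.ElementTree as ET

def unescape_once_raw(xml_text: str) -> str:
    """Undo one level of escaping on the raw file text, before parsing.

    ASSUMPTION: every "&amp;" in the file was produced by the extra
    escaping pass. A correctly escaped value such as "Smith &amp; Sons"
    would turn into a bare "&" and make the document malformed.
    """
    return xml_text.replace("&amp;", "&")

received = "<ShipName>Christian &amp;#38; Cruz</ShipName>"
fixed = unescape_once_raw(received)
print(fixed)                      # <ShipName>Christian &#38; Cruz</ShipName>
print(ET.fromstring(fixed).text)  # Christian & Cruz
```

    The same string replacement could be applied in .NET to the decrypted
    text before it is handed to XmlReader.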

    --

    Joe Fawcett (MVP - XML)




      • tempest@ucla.edu

        #4
        Re: Questions about character entities in XML and PCI security compliance

        On Fri, 8 Aug 2008 07:55:19 +0100, "Joe Fawcett"
        <joefawcett@newsgroup.nospam> wrote:
        >Well we have similar files and I've never seen that happen. As you say they
        >seem to be escaping twice. In my opinion they're wrong but I'd need to know
        >their process etc.
        >Pragmatically you may need to un-escape once before treating the file as
        >XML.
        I think I will just do what you suggested and write an extra process
        to convert ("un-escape") the bad character entities to proper
        entities before parsing the XML files.

        At least I am glad that someone agrees with me that the third party
        ecommerce site is not exporting proper character entities in their XML
        file. They refused to fix the problem and used PCI security policy as
        their excuse.

        I spent several hours on Google trying to find if there is any
        relevancy at all between the use of XML character entities and PCI
        security. And I found none.


        • Peter Flynn

          #5
          Re: Questions about character entities in XML and PCI security compliance

          Joe Fawcett wrote:
          [snip]
          >Well we have similar files and I've never seen that happen. As you say
          >they seem to be escaping twice. In my opinion they're wrong but I'd need
          >to know their process etc.
          I would suspect they are not used to dealing with XML, and have been
          told by some less-than-well-informed person that "you always have to do
          this with those funny characters in web pages". But as Joe says, without
          knowing their process it's hard to be sure.

          What *is* sure is that they are wrong to do this. The file when
          decrypted should be the file that was encrypted. They have corrupted it,
          and they must stop doing that.
          >Pragmatically you may need to un-escape once before treating the file as
          >XML.
          That may not be possible if parts of the document already use numeric
          character references or the &amp;amp; escapement for other reasons (eg
          in CDATA sections). But with luck you may just be able to reconvert it
          until your hosting bods fix the bug.
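          Peter's caveat suggests doing the extra un-escape after parsing
          rather than on the raw text, so the parser stays in charge of the
          markup and a single-escaped ampersand cannot make the whole file
          unparseable. A minimal Python sketch (unescape_once and
          parse_double_escaped are hypothetical names): it resolves only
          XML's five predefined entities and numeric character references,
          once, in text content. It still assumes every text value went
          through the extra pass, and because ElementTree merges CDATA
          sections into ordinary text, it cannot protect CDATA content.

```python
import re
import xml.etree.ElementTree as ET

# XML's five predefined entities plus decimal/hex character references.
# Deliberately narrower than html.unescape, which also knows HTML-only
# names (some of them valid without a trailing semicolon).
_REF = re.compile(r"&(?:#([0-9]+)|#x([0-9A-Fa-f]+)|(lt|gt|amp|apos|quot));")
_NAMED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

def unescape_once(text: str) -> str:
    """Resolve one level of XML references in an already-parsed string."""
    def repl(m: re.Match) -> str:
        dec, hexa, name = m.groups()
        if dec:
            return chr(int(dec))
        if hexa:
            return chr(int(hexa, 16))
        return _NAMED[name]
    # re.sub does not rescan replacement text, so "&amp;#38;" -> "&#38;".
    return _REF.sub(repl, text)

def parse_double_escaped(xml_text: str) -> ET.Element:
    """Parse normally, then undo the extra escaping pass in text content."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text:
            elem.text = unescape_once(elem.text)
        if elem.tail:
            elem.tail = unescape_once(elem.tail)
    return root

received = "<ShipName>Christian &amp;#38; Cruz</ShipName>"
print(parse_double_escaped(received).text)  # Christian & Cruz
```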

          ///Peter


          • Peter Flynn

            #6
            Re: Questions about character entities in XML and PCI security compliance

            tempest@ucla.edu wrote:
            >At least I am glad that someone agrees with me that the third party
            >ecommerce site is not exporting proper character entities in their XML
            >file. They refused to fix the problem and used PCI security policy as
            >their excuse.
            Then they are guilty of adding insolence to their ignorance.
            I'd get out of using them as quickly as possible.
            Can you please let us know who they are so that we can avoid them?
            >I spent several hours on Google trying to find if there is any
            >relevancy at all between the use of XML character entities and PCI
            >security. And I found none.
            There is none.

            ///Peter



            • tempest@ucla.edu

              #7
              Re: Questions about character entities in XML and PCI security compliance

              On Sat, 09 Aug 2008 16:51:49 +0100, Peter Flynn
              <peter.nosp@m.silmaril.ie> wrote:
              >Can you please let us know who they are so that we can avoid them?
              If you want to know, the ecommerce service provider is MarketLive.
              According to our management, they are one of the better ecommerce
              providers out there, which is why our company uses them.

              Since I have not been able to find similar problems on Google, I have
              a feeling it's just bad luck that MarketLive is exporting improper XML
              files to us (and probably only to us), perhaps because of mistakes by
              their programmers. And the technical support manager who handles our
              support issues insists that those character entities are part of
              their PCI security policy.


              • tempest@ucla.edu

                #8
                Re: Questions about character entities in XML and PCI security compliance

                On Fri, 08 Aug 2008 23:49:24 +0100, Peter Flynn
                <peter.nosp@m.silmaril.ie> wrote:
                >I would suspect they are not used to dealing with XML, and have been
                >told by some less-than-well-informed person that "you always have to do
                >this with those funny characters in web pages". But as Joe says, without
                >knowing their process it's hard to be sure.
                The ecommerce provider is actually very knowledgeable as far as XML is
                concerned. Compared to other providers we have dealt with in the
                past, they use a very large and complicated set of XML schemas which
                appear to be well thought out.
                >What *is* sure is that they are wrong to do this. The file when
                >decrypted should be the file that was encrypted. They have corrupted it,
                >and they must stop doing that.
                I agree but I am powerless to convince them that they are wrong.

