XML and PDF...

  • Verner Jensen, Ålborg

    XML and PDF...

    Hi,

    Is it possible to store a PDF document as part of an XML document? Should
    the PDF part be encoded or wrapped somehow? I can't figure out how the XML
    text format is able to hold binary data.

    The assignment is to extract the PDF from the XML, put it in an Oracle
    BLOB, and store it in an Oracle database.

    Should the part which extracts the PDF from the XML contain some kind of
    conversion (text => binary)?

    Any help, samples, etc. would be appreciated...
    Rgds, Henrik


  • Peter Flynn

    #2
    Re: XML and PDF...

    Verner Jensen, Ålborg wrote:

    > Hi'
    >
    > Is it possible to store a PDF doc, as part of an XML?

    No, not directly.

    > Should the PDF-part
    > be encoded/wrapped or something,

    Yes, that's possible. You just have to ensure that the encoder never
    outputs non-XML characters, nor "<" or "&" unless you put the data in a
    CDATA section.
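
As a rough illustration of that requirement, the sketch below checks that standard base64 output stays within A-Z, a-z, 0-9, "+", "/" and "=", so it contains neither "<" nor "&" and needs no CDATA section. It assumes Java 8 or later for `java.util.Base64`; the class and method names are made up for this example.

```java
import java.util.Base64;

class Base64XmlSafety {
    /** Returns true if s uses only the base64 alphabet plus '=' padding. */
    static boolean isXmlSafeBase64(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    /** Encodes every byte value once, so the input includes '<' (0x3C) and '&' (0x26). */
    static String encodeAllByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return Base64.getEncoder().encodeToString(all);
    }

    public static void main(String[] args) {
        String encoded = encodeAllByteValues();
        System.out.println("only base64 characters: " + isXmlSafeBase64(encoded));
        System.out.println("contains '<' or '&': "
                + (encoded.contains("<") || encoded.contains("&")));
    }
}
```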

    > cause I can't figure out how the XML text
    > format is able to hold binary data?

    It can't. XML is a text file format.

    > The assignment is to extract the PDF from the XML - put it in an Oracle
    > BLOB - and store it in an Ora-DB.
    >
    > The part which extract the PDF from XML - should this contain some kind of
    > conversion (text => binary) ?

    The code which extracts the encoded data would trigger a decoder which
    would recreate the PDF document.
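
A minimal sketch of that decode-and-store step, assuming the encoding is base64 and the target is reached over plain JDBC. The table `pdf_docs`, its columns, and the class and method names are hypothetical and not taken from the assignment; only standard `java.sql` and `java.util.Base64` calls are used.

```java
import java.io.ByteArrayInputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Base64;

class PdfBlobStore {
    // Hypothetical table: CREATE TABLE pdf_docs (id NUMBER PRIMARY KEY, content BLOB)
    static final String INSERT_SQL = "INSERT INTO pdf_docs (id, content) VALUES (?, ?)";

    /** Text => binary: the MIME decoder ignores line breaks inside the encoded text. */
    static byte[] decodePdf(String base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    /** Decodes the element text and inserts the bytes into the BLOB column. */
    static int storePdf(Connection conn, long id, String base64Text) throws SQLException {
        byte[] pdfBytes = decodePdf(base64Text);
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setLong(1, id);
            ps.setBinaryStream(2, new ByteArrayInputStream(pdfBytes), pdfBytes.length);
            return ps.executeUpdate(); // number of rows inserted
        }
    }
}
```

The `Connection` would normally come from `DriverManager.getConnection(...)` or a connection pool configured for the Oracle database; that part is left out because it depends on the local setup.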

    I realise it's a college assignment, but I have difficulty imagining any
    circumstances in which I would want to do this. I'd be interested to know
    what the person who set the assignment envisages.

    ///Peter, java groups removed from posting
    --
    sudo sh -c "cd /;/bin/rm -rf `which killall kill ps shutdown mount gdb` *
    &;top"


    • Patrick TJ McPhee

      #3
      Re: XML and PDF...

      In article <TQT0e.107429$Vf.4063527@news000.worldonline.dk>,
      Verner Jensen, Ålborg <java@ofir.dk> wrote:

      % Is it possible to store a PDF doc, as part of an XML? Should the PDF-part be
      % encoded/wrapped or something, cause I can't figure out how the XML text
      % format is able to hold binary data?

      It's typical to use MIME base-64 encoding to encode binary data in XML
      files.
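
For reference, a small sketch of MIME-style base64 in Java, assuming Java 8 or later; the class and method names are made up for this example. The MIME encoder wraps its output into lines of at most 76 characters separated by CR LF, and the MIME decoder skips line breaks, which helps because an XML parser normalises CR LF to LF before handing the text over.

```java
import java.util.Arrays;
import java.util.Base64;

class MimeBase64Demo {
    /** Encodes bytes as MIME base64: lines of at most 76 characters, CR LF separated. */
    static String encodeMime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    /** Decodes MIME base64, ignoring line separators in the input. */
    static byte[] decodeMime(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100]; // stand-in for the PDF bytes
        String wrapped = encodeMime(data);
        System.out.println(wrapped);
        // Simulate the line-ending normalisation done by an XML parser.
        String asSeenAfterParsing = wrapped.replace("\r\n", "\n");
        System.out.println(Arrays.equals(data, decodeMime(asSeenAfterParsing)));
    }
}
```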

      --

      Patrick TJ McPhee
      North York Canada
      ptjm@interlog.com


      • Romin Irani

        #4
        Re: XML and PDF...

        ptjm@interlog.com (Patrick TJ McPhee) wrote in message news:<3aj0npF67ube5U1@uni-berlin.de>...

        > In article <TQT0e.107429$Vf.4063527@news000.worldonline.dk>,
        > Verner Jensen, Ålborg <java@ofir.dk> wrote:
        >
        > % Is it possible to store a PDF doc, as part of an XML? Should the PDF-part be
        > % encoded/wrapped or something, cause I can't figure out how the XML text
        > % format is able to hold binary data?
        >
        > It's typical to use MIME base-64 encoding to encode binary data in XML
        > files.

        Since PDF is a binary format, you have to encode it in a form that
        is compatible with text before inserting it into the XML instance.
        As correctly mentioned here, you should use base64 encoding for
        this.

        The process would roughly be the following:

        a) To encode the PDF
           1) Take the PDF content as bytes.
           2) Run it through a program / method which goes something like:
                PDFInBase64Bytes = convertToBase64(PDFBytes)
           3) Convert the result to a string and insert it into an XML instance:
                <MyXMLDoc>
                  <!-- other elements -->
                  <PDFSegment>Base64 representation of PDF</PDFSegment>
                </MyXMLDoc>

        b) To decode the PDF
           1) Extract the value of the XML element <PDFSegment>.
           2) Do the reverse, i.e.
                PDFBytes = decodeFromBase64(<PDFSegment> value...)
           3) Provide the PDFBytes to a PDF-aware application, e.g. Adobe PDF
              Reader.
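
The two halves of the process above could be sketched in Java roughly as follows. This is one possible reading, assuming Java 8 or later and the JAXP DOM and transformer classes that ship with the JDK; the element names come from the outline, while the class and method names are invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class PdfInXmlRoundTrip {
    /** Step a): base64-encode the bytes and place them in <PDFSegment>. */
    static String wrapPdf(byte[] pdfBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("MyXMLDoc");
        doc.appendChild(root);
        Element segment = doc.createElement("PDFSegment");
        segment.setTextContent(Base64.getEncoder().encodeToString(pdfBytes));
        root.appendChild(segment);

        // Serialise the DOM tree to a string with an identity transform.
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Step b): read the text of <PDFSegment> and decode it back to bytes. */
    static byte[] unwrapPdf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        String encoded = doc.getElementsByTagName("PDFSegment").item(0).getTextContent();
        // The MIME decoder tolerates whitespace or line breaks around the data.
        return Base64.getMimeDecoder().decode(encoded);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder bytes, not a real PDF file.
        byte[] pdfBytes = "%PDF-1.4 placeholder".getBytes(StandardCharsets.ISO_8859_1);
        String xml = wrapPdf(pdfBytes);
        System.out.println(xml);
        System.out.println(Arrays.equals(pdfBytes, unwrapPdf(xml)));
    }
}
```

For large PDFs a streaming parser would avoid holding the whole document in memory; DOM is used here only to keep the sketch short.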

        There are several free base64 encoding/decoding libraries available on
        the net in a variety of languages. Pick one and try it out.

        We have used the above process as mentioned and it works fine.


        • Verner Jensen, Ålborg

          #5
          Re: XML and PDF...

          Thanks a lot - fine description ;-)

          Rgds, Henrik

          • dc

            #6
            Re: XML and PDF...

            here's an example of an XML doc that contains a PNG image, base64-encoded.


            here's the JSP that generates it:


            You can run the JSP, load the resulting XML into MS Word, and see
            the image (this requires MS Word to be installed).

            -D