xml and java euro signs disappear

  • flm

    xml and java euro signs disappear

    I've got an XML document that contains euro signs and looks like :

    <?xml version="1.0" encoding="utf-8"?>
    <merchant id="52">
    <product
    offerid="035430 68131"
    deliverycost="6 ,90 €"
    />
    ....

    I use this bit of Java (jdk 1.4.2) code to parse it :

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse( file_ );

    The problem is the euro signs are transformed into the character '?'
    (printing the value of a getAttribute( "deliverycost" ) gives ? on a
    utf-8 terminal)

    Thanks for any help,
    FL

  • David Carlisle

    #2
    Re: xml and java euro signs disappear


    You have declared that your XML file is UTF-8 encoded but have used (as
    far as I can tell) a byte with value 128 to represent the euro, which isn't
    the UTF-8 encoding of character 8364 (the euro sign).
    You either need to declare the encoding that you are actually using or
    express the character in an encoding-neutral form such as
    "& # 8364 ;"
    (without the spaces).
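
To make the byte-level difference concrete, here is a minimal sketch (not from the original post) that prints how the euro sign is encoded in UTF-8 and in windows-1252. The class name `EuroBytes` and the helper `hex` are made up for illustration, and it assumes a modern JDK that ships the windows-1252 charset, which most do although the platform does not require it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EuroBytes {
    // Render a byte array as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+20AC (decimal 8364), written as an escape so the encoding
        // of this source file does not matter.
        String euro = "\u20AC";

        // UTF-8 needs three bytes for the euro sign: E2 82 AC
        System.out.println("UTF-8:        " + hex(euro.getBytes(StandardCharsets.UTF_8)));

        // Assumption: windows-1252 is available in this JDK.
        // There the euro sign is the single byte 80 (decimal 128).
        System.out.println("windows-1252: " + hex(euro.getBytes(Charset.forName("windows-1252"))));
    }
}
```

A file that contains the single byte 0x80 while declaring encoding="utf-8" is therefore not valid UTF-8 at that position.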

    David


    • Francois-Louis Mommens

      #3
      Re: xml and java euro signs disappear

      Thanks for your reply, David.
      If I use & # 8364; or even & # x20ac; as you recommend, I get the same
      result.
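
One way to check whether the parser itself loses the character is to look at the code points of the attribute value instead of printing it. The following is only a sketch under assumptions: the class name `EuroAttributeCheck`, the in-memory document and the helper names are invented for illustration, and it uses the numeric character reference without the spaces.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class EuroAttributeCheck {
    // Parse an in-memory document and return the deliverycost
    // attribute of the first <product> element.
    static String deliveryCost(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element product = (Element) document.getElementsByTagName("product").item(0);
            return product.getAttribute("deliverycost");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse the sample document", e);
        }
    }

    // List the code points of a string as U+XXXX, independent of any terminal.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", s.codePointAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample modelled on the document in the question.
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<merchant id=\"52\"><product deliverycost=\"6,90 &#8364;\"/></merchant>";
        // Prints ASCII only, so the result does not depend on the console encoding.
        System.out.println(codePoints(deliveryCost(xml)));
    }
}
```

If the last code point printed is U+20AC, the parser has delivered the euro sign correctly and the '?' is introduced later, when the string is written out.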

      FLM



      • Alain Ketterlin

        #4
        Re: xml and java euro signs disappear

        "flm" <flmommens@hotmail.com> writes:

        > The problem is the euro signs are transformed into the character '?'
        > (printing the value of a getAttribute( "deliverycost" ) gives ? on a
        > utf-8 terminal)

        The problem is in "printing", probably because your Writer object has
        an improper encoding and/or a mismatching locale. Or because you use
        System.out, which uses the locale-specified encoding, which may not be
        utf-8. It's probably best to give an explicit encoding/charset.
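
A minimal sketch of giving the output stream an explicit encoding, assuming the terminal really does expect UTF-8. The class name `Utf8Out` and the helper `utf8` are made up for illustration; the three-argument `PrintStream` constructor used here has been part of the JDK since 1.4.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Out {
    // Wrap any OutputStream in a PrintStream that always encodes text as
    // UTF-8, regardless of the locale-derived default encoding.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the charsets every Java platform must support.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Writes the bytes E2 82 AC for the euro sign, whatever the default charset is.
        out.println("6,90 \u20AC");
    }
}
```

Whether the result looks right still depends on the terminal decoding those bytes as UTF-8 and having a font with the glyph.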

        -- Alain.


        • Martin Honnen

          #5
          Re: xml and java euro signs disappear

          Francois-Louis Mommens wrote:

          > If I use & # 8364; or even & # x20ac; as you recommend, I get the same
          > result.

          Are you sure that output terminal is able to render a Euro symbol
          properly? What happens if you do not use XML at all but try to output a
          Euro symbol '€' from a normal string?
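
The test with a plain string can also be done without looking at the terminal at all. This sketch (class and method names are invented for illustration) encodes a string with a given charset and decodes it again; with a charset that has no euro sign, Java substitutes '?', which is the symptom described in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encode the text with the given charset and decode it again.
    // Characters the charset cannot represent are replaced on encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String price = "6,90 \u20AC";

        // ISO-8859-1 has no euro sign, so the round trip yields "6,90 ?".
        System.out.println(roundTrip(price, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent every character, so the string survives unchanged.
        System.out.println(roundTrip(price, StandardCharsets.UTF_8).equals(price));

        // The charset the JVM picked up from its environment; System.out
        // may be using this one.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

If the default charset reported here is not UTF-8, a literal '?' may be what actually reaches the terminal, no matter how capable the terminal is.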

          --

          Martin Honnen


          • Rob van der Putten

            #6
            Re: xml and java euro signs disappear

            Hi there


            flm wrote:

            > I've got an XML document that contains euro signs and looks like :
            >
            > <?xml version="1.0" encoding="utf-8"?>
            > <merchant id="52">
            > <product
            > offerid="035430 68131"
            > deliverycost="6 ,90 ?"
            > />
            > ...
            >
            > I use this bit of Java (jdk 1.4.2) code to parse it :
            >
            > DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            > DocumentBuilder builder = factory.newDocumentBuilder();
            > Document document = builder.parse( file_ );
            >
            > The problem is the euro signs are transformed into the character '?'
            > (printing the value of a getAttribute( "deliverycost" ) gives ? on a
            > utf-8 terminal)

            If you want to post a UTF-8 file, use UTF-8 as the charset; set the default
            charset in your browser / newsreader to UTF-8.

            Set your locale to UTF-8, e.g. en_GB.UTF-8 or en_US.UTF-8.
            Set the default character set of your editor to UTF-8.
            Use a UTF-8 enabled terminal such as uxterm.
            Install Unicode fonts such as Cyberbit.ttf, Arial Unicode or GNU Unifont
            and make a Unicode font your default font.


            Regards,
            Rob
            --
            +----------------------------------------------------------------------+
            | The EU constitution will turn the EU into an USA colony |
            | Vote against the EU constitution in the referendum |
            +----------------------------------------------------------------------+


            • Rob vd Putten

              #7
              Re: xml and java euro signs disappear

              Hi there


              Rob van der Putten wrote:

              > If you want to post a UTF-8 file, use UTF-8 as the charset; set the default
              > charset in your browser / newsreader to UTF-8.
              >
              > Set your locale to UTF-8, e.g. en_GB.UTF-8 or en_US.UTF-8.
              > Set the default character set of your editor to UTF-8.
              > Use a UTF-8 enabled terminal such as uxterm.
              > Install Unicode fonts such as Cyberbit.ttf, Arial Unicode or GNU Unifont
              > and make a Unicode font your default font.

              If all goes well, this should be UTF-8;

              Nicer typography in plain text files:

              ╔══════════════════════════════════════════╗
              ║                                          ║
              ║   • ‘single’ and “double” quotes         ║
              ║                                          ║
              ║   • Curly apostrophes: “We’ve been here” ║
              ║                                          ║
              ║   • Latin-1 apostrophe and accents: '´`  ║
              ║                                          ║
              ║   • ‚deutsche‘ „Anführungszeichen“       ║
              ║                                          ║
              ║   • †, ‡, ‰, •, 3–4, —, −5/+5, ™, …      ║
              ║                                          ║
              ║   • ASCII safety test: 1lI|, 0OD, 8B     ║
              ║                      ╭─────────╮         ║
              ║   • the euro symbol: │ 14.95 € │         ║
              ║                      ╰─────────╯         ║
              ╚══════════════════════════════════════════╝

              Russian:

              From a Unicode conference invitation:

              Зарегистрируйтесь сейчас на Десятую Международную Конференцию по
              Unicode, которая состоится 10-12 марта 1997 года в Майнце в Германии.
              Конференция соберет широкий круг экспертов по вопросам глобального
              Интернета и Unicode, локализации и интернационализации, воплощению и
              применению Unicode в различных операционных системах и программных
              приложениях, шрифтах, верстке и многоязычных компьютерных системах.

              Greek:

              From a speech of Demosthenes in the 4th century BC:

              Οὐχὶ ταὐτὰ παρίσταταί μοι γιγνώσκειν, ὦ ἄνδρες ᾿Αθηναῖοι,
              ὅταν τ᾿ εἰς τὰ πράγματα ἀποβλέψω καὶ ὅταν πρὸς τοὺς
              λόγους οὓς ἀκούω· τοὺς μὲν γὰρ λόγους περὶ τοῦ
              τιμωρήσασθαι Φίλιππον ὁρῶ γιγνομένους, τὰ δὲ πράγματ᾿
              εἰς τοῦτο προήκοντα, ὥσθ᾿ ὅπως μὴ πεισόμεθ᾿ αὐτοὶ
              πρότερον κακῶς σκέψασθαι δέον. οὐδέν οὖν ἄλλο μοι δοκοῦσιν
              οἱ τὰ τοιαῦτα λέγοντες ἢ τὴν ὑπόθεσιν, περὶ ἧς βουλεύεσθαι,
              οὐχὶ τὴν οὖσαν παριστάντες ὑμῖν ἁμαρτάνειν. ἐγὼ δέ, ὅτι μέν
              ποτ᾿ ἐξῆν τῇ πόλει καὶ τὰ αὑτῆς ἔχειν ἀσφαλῶς καὶ Φίλιππον
              τιμωρήσασθαι, καὶ μάλ᾿ ἀκριβῶς οἶδα· ἐπ᾿ ἐμοῦ γάρ, οὐ πάλαι
              γέγονεν ταῦτ᾿ ἀμφότερα· νῦν μέντοι πέπεισμαι τοῦθ᾿ ἱκανὸν
              προλαβεῖν ἡμῖν εἶναι τὴν πρώτην, ὅπως τοὺς συμμάχους
              σώσομεν. ἐὰν γὰρ τοῦτο βεβαίως ὑπάρξῃ, τότε καὶ περὶ τοῦ
              τίνα τιμωρήσεταί τις καὶ ὃν τρόπον ἐξέσται σκοπεῖν· πρὶν δὲ
              τὴν ἀρχὴν ὀρθῶς ὑποθέσθαι, μάταιον ἡγοῦμαι περὶ τῆς
              τελευτῆς ὁντινοῦν ποιεῖσθαι λόγον.

              Δημοσθένους, Γ´ ᾿Ολυνθιακὸς

              All the display, editing and conversion software you use should also be
              capable of handling UTF-8.


              Regards,
              Rob


              • Rob van der Putten

                #8
                Re: xml and java euro signs disappear

                Hi there


                Martin Honnen wrote:

                > Are you sure that output terminal is able to render a Euro symbol
                > properly? What happens if you do not use XML at all but try to output a
                > Euro symbol '?' from a normal string?

                Most UTF-8 environments display dec 128 / hex 0x80 as a glyph looking
                something like:

                +----+
                | 00 |
                | 80 |
                +----+

                The same applies to other glyphs in the 128 / 0x80 ... 159 / 0x9F range;

                +----+
                | 00 |
                | 9F |
                +----+

                Maybe UTF-8 is somehow converted to CP-1252.
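
To see what the single byte 0x80 means under different encodings, here is a small sketch. The class name `Byte80Demo` and its helpers are made up for illustration, and it assumes the JDK in use provides the windows-1252 (CP-1252) charset, which is common but not guaranteed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Byte80Demo {
    // Decode raw bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // List the code points of a string as U+XXXX.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", s.codePointAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0x80 };

        // Assumption: windows-1252 is available. There 0x80 is the euro sign, U+20AC.
        System.out.println(codePoints(decode(raw, Charset.forName("windows-1252"))));

        // In ISO-8859-1, 0x80 is the control character U+0080,
        // which is what the "00 80" box glyph stands for.
        System.out.println(codePoints(decode(raw, StandardCharsets.ISO_8859_1)));

        // As UTF-8, a lone 0x80 is malformed and decodes to U+FFFD,
        // the replacement character.
        System.out.println(codePoints(decode(raw, StandardCharsets.UTF_8)));
    }
}
```

Only the CP-1252 reading yields a euro sign, so a byte 0x80 in the file suggests it was saved as CP-1252 rather than UTF-8.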

                Try yudit, http://www.yudit.org/ to view and edit your files.


                Regards,
                Rob
