Need help in parsing the special characters using XML::Parser

  • rellaboyina
    New Member
    • Jan 2007
    • 55

    Need help in parsing the special characters using XML::Parser

    Dear All,

    I have some data stored in XML format that needs to be parsed with the XML::Parser and XML::Parser::Expat modules. The data contains special characters such as "ø, á, í, é, È, ž, ù, ý".
    But when I try to parse a record containing these characters with the parse() method, I get the error "not well-formed (invalid token)".

    Could anyone please help me out in solving this one.

    Thanks a lot.
  • numberwhun
    Recognized Expert Moderator Specialist
    • May 2007
    • 3467

    #2
    Without seeing your code or the sample data, we have no way of knowing what you are doing. Please post your code (in the appropriate code tags) and a sample of the data you are parsing, and we will have a look.

    Regards,

    Jeff


    • rellaboyina
      New Member
      • Jan 2007
      • 55

      #3
      Here is my code:

      [CODE=perl]
      sub parse {
          my $self = shift;
          my $arg  = shift;
          my @expat_options = ();
          my ($key, $val);
          while (($key, $val) = each %{$self}) {
              push(@expat_options, $key, $val)
                  unless exists $self->{Non_Expat_Options}->{$key};
          }

          my $expat    = new XML::Parser::Expat(@expat_options, @_);
          my %handlers = %{$self->{Handlers}};
          my $init     = delete $handlers{Init};
          my $final    = delete $handlers{Final};

          $expat->setHandlers(%handlers);

          if ($self->{Base}) {
              $expat->base($self->{Base});
          }

          &$init($expat)
              if defined($init);

          my @result = ();
          my $result;
          eval {
              $result = $expat->parse($arg);
          };
          my $err = $@;
          if ($err) {
              $expat->release;
              die $err;
          }

          if ($result and defined($final)) {
              if (wantarray) {
                  @result = &$final($expat);
              }
              else {
                  $result = &$final($expat);
              }
          }

          $expat->release;

          return unless defined wantarray;
          return wantarray ? @result : $result;
      }
      [/CODE]

      Here $arg contains the XML data to be parsed, including the special characters.

      The XML data looks like this:

      [CODE=xml]<record>
        <source-app>ABC</source-app>
        <ref-type>6</ref-type>
        <contributors>
          <authors>
            <author>
              <style face="normal" font="default" size="100%">Dvoøák, Petr</style>
            </author>
          </authors>
        </contributors>
        <titles>
          <title>
            <style face="normal" font="default" size="100%">Systematická teologie I : øÃ*mskokatolická perspektiva</style>
          </title>
        </titles>
        <pages>
          <style>285 s.</style>
        </pages>
        <edition>
          <style>1. vyd.</style>
        </edition>
        <keywords>
          <keyword>
            <style>uèenà * katolické cÃ*rkve</style>
          </keyword>
        </keywords>
        <dates>
          <year>
            <style>1996</style>
          </year>
        </dates>
        <pub-location>
          <style>Brno&#xD;Praha</style>
        </pub-location>
        <publisher>
          <style>Centrum pro studium demokracie a kultury ;&#xD;Èeská køesanská akademie</style>
        </publisher>
        <notes>
          <style>uspoøádali Francis S. Fiorenza a John P. Galvin ; [z angliètiny pøeložili Petr Dvoøák ... et al.]&#xD;20 cm&#xD;Pozn.&#xD;Pozn. o autorech traktátù&#xD;Zkratky&#xD;Bibliogr.&#xD;Odkazy na lit.&#xD;Jmennà ½ a vìcný rejstøÃ*k</style>
        </notes>
      </record>[/CODE]

      Please have a look at it and help me.
      Last edited by eWish; Oct 25 '07, 11:34 AM. Reason: Added XML Code Tag


      • rellaboyina
        New Member
        • Jan 2007
        • 55

        #4
        Can anybody help me out with this?


        • eWish
          Recognized Expert Contributor
          • Jul 2007
          • 973

          #5
          What language uses the special characters you are encountering? The problem is the encoding. You more than likely need to use the UTF-8 encoding in your XML document. This module does support UTF-8, as well as other encodings, except for Japanese, I believe.
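
A minimal sketch of the encoding point above, under stated assumptions: the sample string and its raw bytes are hypothetical, and ISO-8859-1 is used only because it is one of the encodings XML::Parser handles without an extra encoding map. Czech data is more likely ISO-8859-2 or Windows-1250 (an assumption, the thread does not say), which XML::Parser can only read if a matching encoding map is installed. The sketch parses the same bytes twice, once without and once with an XML declaration naming the encoding.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample: \xF8 and \xE1 are raw single-byte characters, not UTF-8.
my $body = "<record><author>Dvo\xF8\xE1k, Petr</author></record>";

my $text = '';
my $parser = XML::Parser->new(
    Handlers => {
        # Collect character data; XML::Parser hands it over as decoded text.
        Char => sub { my ($expat, $chars) = @_; $text .= $chars },
    },
);

# Without a declaration the parser assumes UTF-8 and rejects the raw bytes.
my $ok = eval { $parser->parse($body); 1 };
print "no declaration:   ", ($ok ? "parsed" : "failed: $@"), "\n";

# With a declaration that matches the actual bytes, the same data parses.
$text = '';
my $declared = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n} . $body;
$ok = eval { $parser->parse($declared); 1 };
print "with declaration: ", ($ok ? "parsed" : "failed: $@"), "\n";
```

If the XML itself cannot be changed, the ProtocolEncoding option of XML::Parser->new is documented as another way to tell the parser which encoding to expect.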


          • rellaboyina
            New Member
            • Jan 2007
            • 55

            #6
            The data in the above XML is Czech. My problem is that Parser.pm is not able to parse these characters.

            Can you please give me a script that can parse these special characters, i.e. a way to handle them using the XML::Parser module?


            • rellaboyina
              New Member
              • Jan 2007
              • 55

              #7
              Could anybody help me out on this issue please?
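
One possible shape for such a script, offered as a sketch rather than a confirmed fix: it ASSUMES the records are stored as ISO-8859-2 (Latin-2) bytes, which would explain why Czech "ř" shows up as "ø", but the thread never confirms the real encoding. The sample record is a hypothetical cut-down version of the one posted. The idea is to convert the bytes to UTF-8 with the core Encode module before handing them to XML::Parser, so no encoding declaration or encoding map is needed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use XML::Parser;

# ASSUMPTION: the input is ISO-8859-2; adjust the name if the data differs.
my $source_encoding = 'iso-8859-2';

# Hypothetical record holding raw Latin-2 bytes (\xF8 and \xE1).
my $raw = "<record><author><style>Dvo\xF8\xE1k, Petr</style></author></record>";

# Decode the bytes to Perl characters, then re-encode them as UTF-8.
my $utf8_xml = encode('UTF-8', decode($source_encoding, $raw));

my @values;
my $parser = XML::Parser->new(
    Handlers => {
        # Keep only the text found directly inside <style> elements.
        Char => sub {
            my ($expat, $chars) = @_;
            push @values, $chars if $expat->current_element eq 'style';
        },
    },
);
$parser->parse($utf8_xml);

# Emit the collected text as UTF-8 so the accented letters survive printing.
binmode STDOUT, ':encoding(UTF-8)';
print join('', @values), "\n";
```

If the Latin-2 assumption holds, the author name should come out with its proper Czech letters. If the output still looks wrong, the data is probably in a different encoding (Windows-1250 is another common one for Czech), and only $source_encoding needs to change.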
