How to check the encoding format of an XML

  • rellaboyina
    New Member
    • Jan 2007
    • 55

    How to check the encoding format of an XML

    Dear All,
    I have an XML file without an XML declaration line such as "<?xml version="1.0" encoding="UTF-8"?>" at the start. I am parsing it with the XML::Parser module and it parses without any errors.

    How can I check the encoding of this XML? Is there a way in Perl to determine the encoding of an XML file that has no XML declaration at the start?

    The XML looks like this:
    [CODE=xml]<record>
    <name>a</name>
    <place>b</place>
    </record>
    <record>
    <name>c</name>
    <place>d</place>
    </record>[/CODE]

    Can anyone help me out please?
  • eWish
    Recognized Expert Contributor
    • Jul 2007
    • 973

    #2
    Have you looked at CPAN? There is a module called XML::ParseDTD that might be what you need.

    --Kevin


    • rellaboyina
      New Member
      • Jan 2007
      • 55

      #3
      Originally posted by eWish
      Have you looked at CPAN? There is a module called XML::ParseDTD that might be what you need.

      --Kevin
      How can we check the encoding of an XML file when there is no DTD to validate against? I have tried the Encode::Guess module but was not able to get the encoding. Maybe something is wrong in the code:

      Code:
      use strict;
      use Encode::Guess;
      
      my $filename = 'records.xml';
      # Three-argument open in raw mode, so the bytes reach the guesser unchanged.
      open (my $fh, '<:raw', $filename) or die $!;
      my $data = "";
      while (my $line = <$fh>) {
        $data .= $line;
      }
      close ($fh);
      my $decoder = guess_encoding($data);
      # guess_encoding returns an error string instead of an object on failure.
      die $decoder unless ref($decoder);
      print "\nref = ", ref($decoder), "\n";
      my $utf8 = $decoder->decode($data);
      Here records.xml has the format I posted above.

      Can anyone please help me out?


      • rellaboyina
        New Member
        • Jan 2007
        • 55

        #4
        Can anyone help me out?


        • eWish
          Recognized Expert Contributor
          • Jul 2007
          • 973

          #5
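          [CODE=perl]
          use strict;
          use warnings;
          use Encode::Guess;

          my $file = '/path/to/file/my_xml.xml';

          # Read the raw bytes of the whole file and guess once. Guessing line by
          # line breaks on empty lines, which give the guesser nothing to work with.
          open (my $FH, '<:raw', $file) || die "Can't open file: $!";
          my $data = do { local $/; <$FH> };
          close ($FH);

          my $decoder = Encode::Guess->guess($data);
          die $decoder unless ref($decoder);   # a plain string means the guess failed

          print "Encoding: ", $decoder->name, "\n";
          my $utf8 = $decoder->decode($data);  # decoded text as Perl characters
          [/CODE]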
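
As a complement to Encode::Guess, here is a minimal sketch of checking the bytes directly. It relies on the XML 1.0 rule that a document with neither a byte order mark nor an encoding declaration is to be read as UTF-8. The helper name sniff_xml_encoding is hypothetical (it is not part of any module), and the sample byte strings are made up for illustration. It only inspects a BOM and then tests for strict UTF-8, so it cannot tell legacy single-byte encodings apart.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: name the encoding implied by a BOM, or fall back
# to a strict UTF-8 check when the data carries no BOM.
sub sniff_xml_encoding {
    my ($bytes) = @_;

    # Test the 4-byte UTF-32 marks before the 2-byte UTF-16 marks,
    # because the UTF-32LE BOM begins with the UTF-16LE BOM.
    return 'UTF-32BE' if $bytes =~ /\A\x00\x00\xFE\xFF/;
    return 'UTF-32LE' if $bytes =~ /\A\xFF\xFE\x00\x00/;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;

    # No BOM: an XML parser will assume UTF-8, so check whether the bytes
    # decode as strict UTF-8 (plain ASCII passes this check as well).
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 'UTF-8' : 'unknown';
}

# Made-up sample inputs: plain ASCII, a UTF-16LE BOM, and a stray Latin-1 byte.
print sniff_xml_encoding("<record><name>a</name></record>"), "\n";
print sniff_xml_encoding("\xFF\xFE<\x00r\x00"), "\n";
print sniff_xml_encoding("<name>\xE9</name>"), "\n";
```

For a real file, open it with the '<:raw' layer and pass the slurped bytes to the helper. An 'unknown' result means the data is neither BOM-marked nor valid UTF-8, in which case Encode::Guess with a list of suspect encodings is probably the next thing to try.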
