problem with parsing xml

  • donny
    New Member
    • Oct 2006
    • 1


    Hello,

    I've found a script (at this URL: http://www.thescripts.com/forum/thread84554.html) that is similar to what I want to do.
    I want to parse my XML file and modify the value of an attribute,
    for example change this:
    <nom name="pivot">
    <information valeur="Niveau" type="Bon"/>
    </nom>
    into this:
    <nom name="pivot">
    <information valeur="Niveau" type="Mauvais"/>
    </nom>

    So, I've modified the script:

    Code:
    #!/usr/bin/perl -w
    
    use strict;
    use XML::XPath;
    use XML::XPath::XMLParser;
    use XML::Twig;
    
    # create an object to parse the file and field XPath queries
    # my $xpath = XML::XPath->new( filename => shift @ARGV );
    my $xpath = XML::XPath->new( filename => "client.xml" );
    
    # apply the path from the command line and get back a list matches
    my $field;
    my @field = 'string';
    
    
    my $old_value = $xpath->find("//nom[\@name='pivot']/information/\@type" );
    #find("//nom[\@name='pivot']/information[\@type]/text()" );
    
    print $old_value."\n";
    
    
    #qq{$field\[string() = "$old_value"]}
    my $new_value = 'Tres BOB';
    my $t = new XML::Twig( TwigRoots =>
    qq{$field\[string() = "$old_value"] => \&update} ,
    TwigPrintOutsideRoots => 1,);
    $t->parsefile( 'client2.xml' );
    $t->flush;
    
    sub update
    {
    my( $t, $field_elt)= @_;
    $field_elt->set_text( $new_value);
    $field_elt->print;
    }



    My XML file:


    <?xml version="1.0" encoding="windows-1250"?>
    <root value="x">
    <entreprise>some text</entreprise>
    <info></info>
    <client>
    <nom name="pivot">
    <information valeur="Niveau" type="Bon"/>
    </nom>

    <nom name="paul">
    <information valeur="Niveau" type="Bon">xxx</information>
    <information valeur="Solvable" type="Mauvais">zoooooooo</information>
    </nom>
    </client>
    <client>
    <nom name="albine">
    <information valeur="Solvable" type="Bon">azer</information>
    </nom>
    </client>
    <client>
    <nom name="Terence">
    <information valeur="Niveau" type="Tres bon"/>
    <information valeur="Solvable" type="Bon"/>
    <information valeur="Ancien" type="Oui"/>
    </nom>
    </client>
    </root>



    I get these errors and I don't understand why.
    Normally, the value of the type attribute should be changed.



    Bon
    Use of uninitialized value in concatenation (.) or string at C:\Documents and Settings\donny\Bureau\bigs\parsr.pl line 25.
    Can't use string ("[string() = "Bon"] => &update") as a HASH ref while "strict refs" in use at C:/Perl/site/lib/XML/Twig.pm line 1303.


    thanks
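    Both messages point at the XML::Twig constructor on line 25 of the script. `$field` is declared but never assigned, so interpolating it into `qq{...}` triggers the "uninitialized value" warning. The `TwigRoots` option expects a hash reference of `condition => handler` pairs, but here the whole `qq{...}` string (including `=> \&update`) is passed as one plain string, which is what "Can't use string ... as a HASH ref" reports. Separately, `set_text` replaces an element's text content; to change the `type` attribute, `set_att` is the relevant method. Below is a minimal sketch of the same streaming idea with those points addressed, written with the lowercase `twig_roots` spelling used in the XML::Twig documentation. The helper name `retype_pivot` and the in-memory output buffer (used so the result can be returned as a string) are illustrative assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: stream $xml, set type="$new_type" on every <information>
# inside <nom name="$name">, and copy everything else through unchanged.
sub retype_pivot {
    my ($xml, $name, $new_type) = @_;

    my $t = XML::Twig->new(
        # twig_roots must be a HASH ref of { condition => handler } pairs.
        twig_roots => {
            qq{nom[\@name="$name"]} => sub {
                my ($twig, $nom) = @_;
                # set_att changes an attribute; set_text would replace the content.
                $_->set_att(type => $new_type) for $nom->children('information');
                $nom->print;     # print the modified element...
                $twig->purge;    # ...then release the memory it used
            },
        },
        # Everything outside the matched elements is printed as it was read.
        twig_print_outside_roots => 1,
    );

    # Twig prints to the currently selected filehandle; point it at a string.
    open(my $out_fh, '>', \my $buffer) or die "cannot open in-memory handle: $!";
    my $previous = select($out_fh);
    my $ok = eval { $t->parse($xml); 1 };
    select($previous);
    close($out_fh);
    die $@ unless $ok;
    return $buffer;
}
```

    To stream straight from a file and print to STDOUT, as the original script does, `$t->parsefile('client.xml')` can take the place of the in-memory parse.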
  • miller
    Recognized Expert Top Contributor
    • Oct 2006
    • 1086

    #2
    Greetings,

    I used your post as an excuse to learn a little more about XML. I have three solutions to your problem using different CPAN modules. I do not claim that any of my implementations are particularly efficient, nor that they take advantage of all the features these modules have to offer. Nevertheless, I dug through the limited manuals, source code, and outside references, and came up with workable code using the following:

    1) XML::Simple
    2) XML::XPath
    3) XML::Twig

    I will now include my code. I've left in any debugging information or intermediate attempts in comments '##'. You'll notice that the code is separated into three sections, one for each CPAN module utilized as a solution.

    [CODE=perl]
    #!/usr/bin/perl

    # Goal:
    # From:
    # <nom name="pivot">
    # <information valeur="Niveau" type="Bon"/>
    # </nom>
    # To:
    # <nom name="pivot">
    # <information valeur="Niveau" type="Mauvais"/>
    # </nom>

    use strict;
    use warnings;

    my $file = 'client.xml';


    ###
    # Use XML::Simple

    my $fileSimple = 'clientSimple.xml';
    print "XML::Simple to $fileSimple\n\n";

    use XML::Simple;
    use File::Slurp qw(write_file);

    my $ref = XMLin($file);
    ##$ref->{client}[0]{nom}{pivot}{information}{type} = 'Mauvais';
    foreach my $client (@{$ref->{client}}) {
        if (exists $client->{nom}{pivot}) {
            $client->{nom}{pivot}{information}{type} = 'Mauvais';
            last;
        }
    }
    XMLout($ref,
        OutputFile => $fileSimple,
    );


    ###
    # Use XML::XPath

    my $fileXPath = 'clientXPath.xml';
    print "XML::XPath to $fileXPath\n\n" ;

    use XML::XPath;
    use XML::XPath::XMLParser;
    use File::Slurp qw(write_file);

    # Create an object to parse the file and field XPath queries
    my $xp = XML::XPath->new( filename => $file );

    # Pull Nodes: q{ type="Bon"} of q{<information valeur="Niveau" type="Bon" />}
    my $nodeset = $xp->find("//nom[\@name='pivot']/information[\@type='Bon']/\@type");

    foreach my $node ($nodeset->get_nodelist) {
        ## print "FOUND\n",
        ##     "\n",
        ##     XML::XPath::XMLParser::as_string($node), "\n",
        ##     ref($node), "\n",
        ##     "\n";

        $node->setNodeValue("Mauvais");
    }

    ##my @nodes = $xp->findnodes("//nom[\@name='pivot']/information");
    ##print XML::XPath::XMLParser::as_string($nodes[0]), "\n\n";

    # Output Results
    my ($root) = $xp->findnodes('/');
    write_file($fileXPath,
        q{<?xml version="1.0" encoding="windows-1250"?>}, "\n", # For some reason, this line doesn't carry over.
        XML::XPath::XMLParser::as_string($root)
    );


    ###
    # Use XML::Twig

    my $fileTwig = 'clientTwig.xml';
    print "XML::Twig to $fileTwig\n\n";

    use XML::Twig;

    open(my $twig_fh, '>', $fileTwig) or die "open >$fileTwig: $!";

    my $t = XML::Twig->new(
        twig_handlers => {
            ## qq{nom[\@name="pivot"]} => \&update, # process
            qq{nom/information} => \&update,        # process
            __default__ => sub { $_[0]->flush; },   # flush anything else
        },
        pretty_print => 'nice',
    );
    $t->parsefile($file);
    $t->print($twig_fh);
    ##$t->flush;

    close($twig_fh);

    ##my $i = 0;
    sub update
    {
        my ($t, $field_information) = @_;

        ##print "Found " . ++$i . "\n";

        my $field_nom = $field_information->parent;
        return unless $field_nom->att("name") eq "pivot";

        return unless $field_information->att("valeur") eq "Niveau";

        ##print "Before:\n";
        ##$field_nom->print;
        ##print "\n\n";

        $field_information->set_att("type" => "Mauvais");

        ##print "After:\n";
        ##$field_nom->print;
        ##print "\n\n";
    }


    1;

    __END__
    [/CODE]

    Now, I'll discuss each of the results.

    The XML::Simple module is the easiest one to use, in my opinion. It doesn't take any real knowledge of XML terminology or syntax; it simply parses the XML file into a Perl data structure which you can navigate on your own. This is the method I personally use in my projects. But honestly, I don't do much with XML, hence this experiment.

    You'll notice that when the modified data is written out, it is very different from the original XML. It might be possible to add options to XMLout to format the document a little closer to the input, but I leave such an endeavor up to you:


    clientSimple.xml
    Code:
    <opt entreprise="some text" value="x">
      <client name="nom">
        <paul>
          <information type="Bon" valeur="Niveau">xxx</information>
          <information type="Mauvais" valeur="Solvable">zoooooooo</information>
        </paul>
        <pivot name="information" type="Mauvais" valeur="Niveau" />
      </client>
      <client name="albine">
        <information type="Bon" valeur="Solvable">azer</information>
      </client>
      <client name="Terence">
        <information type="Tres bon" valeur="Niveau" />
        <information type="Bon" valeur="Solvable" />
        <information type="Oui" valeur="Ancien" />
      </client>
      <info></info>
    </opt>
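
    On the XMLout formatting point: a minimal sketch, assuming XML::Simple's documented `ForceArray`, `KeyAttr` and `RootName` options, that keeps `<client>`, `<nom>` and `<information>` as nested elements instead of folding them on their `name` attribute, and writes `<root>` instead of `<opt>`. Because XML::Simple keeps child elements in a hash, the original element order is still not guaranteed. The helper name `set_pivot_type_simple` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical helper: set type="$new_type" on every <information> inside
# <nom name="pivot"> and return the document as an XML string.
sub set_pivot_type_simple {
    my ($xml, $new_type) = @_;

    # ForceArray => 1 : every child element becomes an array ref, even when it
    #                   occurs once, so each <client>/<nom> has the same shape.
    # KeyAttr    => [] : do not fold <nom> elements into a hash keyed on "name".
    my $ref = XMLin($xml, ForceArray => 1, KeyAttr => []);

    foreach my $client (@{ $ref->{client} || [] }) {
        foreach my $nom (@{ $client->{nom} || [] }) {
            next unless defined $nom->{name} && $nom->{name} eq 'pivot';
            $_->{type} = $new_type for @{ $nom->{information} || [] };
        }
    }

    # RootName restores <root> instead of XMLout's default <opt> wrapper.
    return XMLout($ref, RootName => 'root', KeyAttr => []);
}
```

    To write a file as the original script does, `OutputFile => 'clientSimple.xml'` can be added to the XMLout call.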

    The next CPAN module was XML::XPath. I spent a long time researching this one to figure out how to get it to work. Better documentation would definitely help, but as I learned, most of the docs are in the form of the XPath specification, which is long and arduous. The best help came from the .t test scripts included with the module. Unfortunately, none of these scripts describe the best way of outputting results, so the way I write the document back out (XML::XPath::XMLParser::as_string on the root node) still feels a little like a hack.

    Nevertheless, the above method does work, and if you understand XPath, this appears to be a very powerful way of accessing XML documents. This method also came closest to reproducing the exact format of the original file.

    clientXPath.xml
    Code:
    <?xml version="1.0" encoding="windows-1250"?>
    <root value="x">
    <entreprise>some text</entreprise>
    <info />
    <client>
    <nom name="pivot">
    <information valeur="Niveau" type="Mauvais" />
    </nom>
    <nom name="paul">
    <information valeur="Niveau" type="Bon">xxx</information>
    <information valeur="Solvable" type="Mauvais">zoooooooo</information>
    </nom>
    </client>
    <client>
    <nom name="albine">
    <information valeur="Solvable" type="Bon">azer</information>
    </nom>
    </client>
    <client>
    <nom name="Terence">
    <information valeur="Niveau" type="Tres bon" />
    <information valeur="Solvable" type="Bon" />
    <information valeur="Ancien" type="Oui" />
    </nom>
    </client>
    </root>
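
    Because the XPath predicates do all of the selection, this approach generalises easily. A minimal sketch, with a hypothetical helper `set_type_xpath` that takes the `nom` name and the `information` valeur as parameters and returns the serialized document instead of writing a file. It assumes the values contain no double quotes, since they are interpolated straight into the XPath expression. As in the script above, the XML declaration is not part of the serialized output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;

# Hypothetical helper: set the "type" attribute of <information valeur="$valeur">
# elements under <nom name="$name">, and return the serialized document.
# Assumes $name and $valeur contain no double quotes (they are interpolated
# directly into the XPath expression).
sub set_type_xpath {
    my ($xml, $name, $valeur, $new_type) = @_;

    my $xp = XML::XPath->new(xml => $xml);

    # The path ends in /@type, so findnodes returns the attribute nodes
    # themselves; the two predicates do all of the filtering.
    my $path = qq{//nom[\@name="$name"]/information[\@valeur="$valeur"]/\@type};
    $_->setNodeValue($new_type) for $xp->findnodes($path);

    my ($root) = $xp->findnodes('/');
    return XML::XPath::XMLParser::as_string($root);
}
```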

    Finally we come to XML::Twig. This is the module I am both most hopeful for and most frustrated by. It appears that twig_handlers do not support the full XPath syntax. I was able to use expressions like q{nom[@name="pivot"]} and q{nom/information}, but I was not able to combine the two. Instead, I was forced to pull all nom/information elements, check the parent's name attribute, and only then make an update. This felt awkward after the very direct XPath query, but it works.

    It is very possible that I just haven't read enough about this module yet, but I leave any further research to you.

    clientTwig.xml
    Code:
    <?xml version="1.0" encoding="windows-1250"?>
    <root value="x">
    <entreprise>some text</entreprise>
    <info></info>
    <client>
    <nom name="pivot">
    <information type="Mauvais" valeur="Niveau"/>
    </nom>
    <nom name="paul">
    <information type="Bon" valeur="Niveau">xxx</information>
    <information type="Mauvais" valeur="Solvable">zoooooooo</information>
    </nom>
    </client>
    <client>
    <nom name="albine">
    <information type="Bon" valeur="Solvable">azer</information>
    </nom>
    </client>
    <client>
    <nom name="Terence">
    <information type="Tres bon" valeur="Niveau"/>
    <information type="Bon" valeur="Solvable"/>
    <information type="Oui" valeur="Ancien"/>
    </nom>
    </client>
    </root>
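
    Regarding combining q{nom[@name="pivot"]} with q{nom/information}: one way to avoid the parent check, sketched here on the assumption that the installed XML::Twig accepts `tag[@att="value"]` conditions both as handler triggers and in navigation methods such as `children`, is to trigger on the `<nom>` element and filter its children with a second condition. The helper name `retype_information` is hypothetical; whether newer XML::Twig releases accept the combined path directly is not covered here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: set type="$new_type" on <information valeur="$valeur">
# children of <nom name="$name">, and return the whole document as a string.
sub retype_information {
    my ($xml, $name, $valeur, $new_type) = @_;

    my $t = XML::Twig->new(
        twig_handlers => {
            # Outer condition: fires once per <nom> with the wanted name.
            qq{nom[\@name="$name"]} => sub {
                my ($twig, $nom) = @_;
                # Inner condition: only the <information> children with that valeur.
                $_->set_att(type => $new_type)
                    for $nom->children(qq{information[\@valeur="$valeur"]});
            },
        },
    );
    $t->parse($xml);
    return $t->sprint;
}
```

    This keeps the whole document in memory (no flush or purge), which is fine for a file the size of client.xml.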

    That's all the XML play that I intend to do for now. Hopefully one of the solutions will strike your fancy. Enjoy!
