Remove data outside a pair of xml tags.

  • lazypig06@gmail.com

    Remove data outside a pair of xml tags.

    Hi !

    I am a PHP beginner.
    I hope somebody can help me with a problem I've been having.
    I've been trying to clean up junk data at the beginning and
    end of an XML file. Let's say I have an XML file with some junk data,
    like the one below:

    --------------------------------------------------------------
    Junk dataflasjfasj
    <firsttag>
    <secondtag>data </secondtag>
    <thirdtag>whatever</thirdtag>
    </firsttag>

    junk dataga.
    ---------------------------------------------------------------

    Does anyone know how to remove the junk in the file and return just
    the actual XML (including the <firsttag> tags)?

    I tried to use strpos and substr, but somehow the tag is missing
    from the string that comes back.

    Any help would be appreciated !

    Thanks,
    Lazy Pig

  • Alvaro G. Vicario

    #2
    Re: Remove data outside a pair of xml tags.

    *** lazypig06@gmail.com wrote (21 Jun 2006 11:08:10 -0700):
    > --------------------------------------------------------------
    > Junk dataflasjfasj
    > <firsttag>
    > <secondtag>data </secondtag>
    > <thirdtag>whatever</thirdtag>
    > </firsttag>
    >
    > junk dataga.
    > ---------------------------------------------------------------
    >
    > Does someone know how to remove the junk in the file and just return
    > the actual xml stuff( including the <firsttag> tag ?

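    // Drop everything before the first <firsttag>, keeping the tag itself.
    $foo = preg_replace('/^.*?(?=<firsttag>)/s', '', $foo);
    // Drop everything after the closing </firsttag>, keeping the tag itself.
    $foo = preg_replace('/(?<=<\/firsttag>).*$/s', '', $foo);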


    This is just an idea, make sure it works as expected in all special cases.
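A non-regex alternative, along the lines of the strpos and substr attempt mentioned in the question, can be sketched as follows. This is only a sketch: the function name `extract_root_element` is made up for illustration, and it assumes the root tag has no attributes and appears exactly as `<firsttag>` in the text. Note that strpos can legitimately return 0, so the result has to be compared with `=== false`.

```php
<?php
// Hypothetical helper (not part of PHP): returns the text from the first
// <$tag> through the last </$tag>, or null when either tag is missing.
// Assumes the opening tag carries no attributes.
function extract_root_element(string $text, string $tag): ?string
{
    $open  = '<' . $tag . '>';
    $close = '</' . $tag . '>';

    // strpos() may return 0 for a match at the very start, so test with ===.
    $start = strpos($text, $open);
    if ($start === false) {
        return null;
    }

    // Use the last closing tag so nothing inside the element is cut off.
    $end = strrpos($text, $close);
    if ($end === false || $end < $start) {
        return null;
    }

    return substr($text, $start, $end + strlen($close) - $start);
}

$raw = "Junk dataflasjfasj\n<firsttag>\n<secondtag>data </secondtag>\n</firsttag>\n\njunk dataga.\n";

// htmlspecialchars() makes the tags visible when the output is viewed in a browser.
echo htmlspecialchars((string) extract_root_element($raw, 'firsttag'));
```

One possible reason the tag seemed to be missing from the result: a browser does not display unknown tags as text, so the output may have been correct all along. Viewing the page source, or escaping the output as above, would show it.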


    --
    -+ Álvaro G. Vicario - Burgos, Spain
    ++ http://bits.demogracia.com is my site for web programmers
    +- http://www.demogracia.com is my chlorine-free humor site
    --


    • Tommy Gildseth

      #3
      Re: Remove data outside a pair of xml tags.

      lazypig06@gmail.com wrote:
      >
      > --------------------------------------------------------------
      > Junk dataflasjfasj
      > <firsttag>
      > <secondtag>data </secondtag>
      > <thirdtag>whatever</thirdtag>
      > </firsttag>
      >
      > junk dataga.
      > ---------------------------------------------------------------
      >
      > Does someone know how to remove the junk in the file and just return
      > the actual xml stuff( including the <firsttag> tag ?

      I believe this should work:

      <?php
      $str = 'Junk dataflasjfasj
      <firsttag>
      <secondtag>data </secondtag>
      <thirdtag>whatever</thirdtag>
      </firsttag>

      junk dataga.
      ';

      echo preg_replace('/.*?<(.*)>.*/s', '<$1>', $str);
      ?>


      --
      Tommy Gildseth


      • Tommy Gildseth

        #4
        Re: Remove data outside a pair of xml tags.

        Tommy Gildseth wrote:
        > lazypig06@gmail.com wrote:
        >
        >>
        >> --------------------------------------------------------------
        >> Junk dataflasjfasj
        >> <firsttag>
        >> <secondtag>data </secondtag>
        >> <thirdtag>whatever</thirdtag>
        >> </firsttag>
        >>
        >> junk dataga.
        >> ---------------------------------------------------------------
        >>
        >> Does someone know how to remove the junk in the file and just return
        >> the actual xml stuff( including the <firsttag> tag ?
        >
        >
        > I believe this should work:
        >
        > ....snip snip php code

        Well.... not quite, if the junk data contains < or >

        This might be better:

        <?php
        $str = 'Junk> dat<aflasjfasj
        <firsttag>
        <secondtag>data </secondtag>
        <thirdtag>whatever</thirdtag>
        </firsttag>

        junk data>ga. < sadfsda fsd
        ';

        echo preg_replace('/.*?(<[^<]+>.*<.*?>).*/s', '$1', $str);
        ?>

        --
        Tommy Gildseth


        • lazypig06@gmail.com

          #5
          Re: Remove data outside a pair of xml tags.

          Thank you all for your responses !

          It turned out that my XML file also contains junk inside its tags, as
          shown below.

          ----------------------------------------------
          1ffc
          <firsttag>
          <secondtag>some data</secondtag>
          <third
          1ffc
          tag>Data for third tag</thirdtag>
          </firsttag>
          0
          --------------------------------------------------
          I did some research on the web, and it turned out that the junk I have
          in the XML file is Greek characters. The junk strings are "1ffc", "fa1",
          and some numbers, including a 0 at the end of the XML file. This junk
          prevents the XML from being parsed correctly.

          Does anybody have an idea how to get rid of this junk, or how to convert
          these characters to empty strings?

          Thanks for your help,

          Lazy Pig






          • Alvaro G. Vicario

            #6
            Re: Remove data outside a pair of xml tags.

            *** lazypig06@gmail.com wrote (21 Jun 2006 15:40:00 -0700):
            > I did some research on the web and it turned out the junks that I have
            > in the xml file are Greek's characters (The junks are "1ffc", "fa1",
            > and some numbers including number 0 at the end of xml file. These junks
            > prevent the xml to be parsed correctly.
            >
            > Does anybody have any idea to to get rid of these junk/ convert these
            > characters to empty string/characters ?

            This reminds me of raw responses when using the "chunked" transfer
            encoding. Check the user notes on the fsockopen() and fpassthru() manual pages.

            Also, if you're downloading the file from your script, I'd suggest you try
            the cURL functions and see if the garbage goes away.
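If the stray values really are chunk sizes, they would be hexadecimal lengths ("1ffc" is 8188 in decimal), and the lone "0" at the end would be the terminating chunk. Under that assumption, a raw body read with fsockopen() could be decoded roughly like this. The function name `decode_chunked` is made up for illustration, and the sketch ignores trailer headers that may follow the last chunk.

```php
<?php
// Hypothetical helper (not part of PHP): decodes an HTTP body that is still
// in "chunked" transfer encoding. Returns null if the input is not valid
// chunked data. Trailer headers after the final chunk are ignored.
function decode_chunked(string $body): ?string
{
    $decoded = '';
    $offset  = 0;
    $length  = strlen($body);

    while ($offset < $length) {
        // Each chunk starts with its size in hexadecimal, ended by CRLF.
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            return null;
        }

        // Drop optional chunk extensions that may follow a semicolon.
        $sizeLine = substr($body, $offset, $lineEnd - $offset);
        $hex      = trim(explode(';', $sizeLine)[0]);
        if ($hex === '' || !ctype_xdigit($hex)) {
            return null;
        }

        $size = (int) hexdec($hex);
        if ($size === 0) {
            return $decoded; // A zero-size chunk marks the end of the body.
        }

        $dataStart = $lineEnd + 2;
        if ($dataStart + $size > $length) {
            return null; // The chunk claims more data than is available.
        }

        $decoded .= substr($body, $dataStart, $size);
        $offset   = $dataStart + $size + 2; // Skip the CRLF after the data.
    }

    return null; // Input ended before the terminating zero-size chunk.
}

// Two chunks of 5 and 7 bytes, followed by the terminating chunk.
$chunked = "5\r\n<a>he\r\n7\r\nllo</a>\r\n0\r\n\r\n";
echo htmlspecialchars((string) decode_chunked($chunked));
```

Two ways to avoid manual decoding altogether, both worth verifying against your own server: the cURL functions decode chunked responses for you, and a request sent as HTTP/1.0 over fsockopen() should not receive a chunked response in the first place.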



            --
            -+ Álvaro G. Vicario - Burgos, Spain
            ++ http://bits.demogracia.com is my site for web programmers
            +- http://www.demogracia.com is my chlorine-free humor site
            --
