String parsing question...

  • timslavin@gmail.com

    String parsing question...

    Hi,

    I'm trying to do something with PHP and I'm not as familiar with PHP
    as I am with VBScript. So if you could bear with me on what is likely a
    stupid question, I'd appreciate it!

    I have a chunk of text with a variety of tags inside the text. I want
    to perform the following process to this chunk of text:

    First, I want to grab data between two markers that I define (e.g.
    <start> ... data here ... </start>) and strip off the text before the
    first marker (<start> in this example) and after the second marker
    (</start> in this example). That would leave me with the "... data here
    ..." chunk with my markers either included (worst case) or removed
    (best case, saving me the third step below).

    Is this one PHP function or two functions? I see that strstr will get
    me everything to the right of <start> but I cannot figure out how to
    remove everything to the right of </start> so that I only have the data
    chunk I want (what's between these two markers).

    Second, I want to substitute values for values found in the data chunk.
    I know str_replace does that just fine.

    Third, I then want to strip out the markers from my data chunk. The
    <start> marker has elements to it (e.g. limit=) so I'd need something
    that would grab everything from <start to the close of the bracket
    (e.g. remove <start limit=1>) to remove it from my data chunk. And
    finally I would then want to remove the </start> marker from my data
    chunk.

    Is this do-able in PHP with a couple functions? Or does it require lots
    of string manipulation and lots of functions? Or is it impossible?

    Thanks in advance for any insight or pointers to PHP string functions
    that'll help!

    Tim

  • Benjamin Esham

    #2
    Re: String parsing question...

    timslavin wrote:
    > First, I want to grab data between two markers that I define (e.g.
    > <start> ... data here ... </start>) and strip off the text before the
    > first marker (<start> in this example) and after the second marker
    > (</start> in this example). That would leave me with the "... data here
    > ..." chunk with my markers either included (worst case) or removed
    > (best case, saving me the third step below).

    $pieces = preg_split('/\<(\/)?start\>/', $input);
    $chunk = $pieces[1];

    Assuming that $input is your input data, $chunk will contain your "data
    here" segment. This splits the data into an array; the regular
    expression passed to preg_split() matches both the <start> tag and the
    </start> tag, so (with one pair of tags in the input) the array has three
    elements. The 0th element contains everything before <start>, the 1st
    contains everything between the tags, and the 2nd contains everything
    afterwards. (Note that this is untested; my regular expression might be
    wrong. Looking at [1] should help you with the regex syntax expected by
    preg_split().)

    [1] http://perldoc.perl.org/perlre.html
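
A minimal sketch to check what the preg_split() call above returns, assuming a made-up one-line input (the sample text is hypothetical). In PCRE, a backslash in front of a non-alphanumeric character just makes it literal, so \< and \> match plain angle brackets here. The whitespace around "data here" survives the split, so trim() may be wanted afterwards.

```php
<?php
// Hypothetical sample input; the real text would come from your own data.
$input = 'before <start> data here </start> after';

// Split on either <start> or </start>. In PCRE, \< and \> are escaped
// literal angle brackets, not word boundaries.
$pieces = preg_split('/\<(\/)?start\>/', $input);

// With exactly one pair of markers the array has three elements:
// [0] text before <start>, [1] the chunk between, [2] text after </start>.
$chunk = $pieces[1];

var_dump(count($pieces)); // int(3)
echo "[$chunk]\n";        // [ data here ]
```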

    > Is this one PHP function or two functions? I see that strstr will get me
    > everything to the right of <start> but I cannot figure out how to remove
    > everything to the right of </start> so that I only have the data chunk I
    > want (what's between these two markers).

    You could probably call strstr() twice and then substr(), but IMO using
    preg_split() is way easier.
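
For comparison, the non-regex route can be sketched with strpos() and substr(); this swaps strstr() for strpos() because strpos() returns offsets directly. The helper name between() is hypothetical, and the sketch assumes each marker appears once, in order.

```php
<?php
// Hypothetical helper: return the text between $open and $close,
// or null if either marker is missing.
function between(string $text, string $open, string $close): ?string
{
    $start = strpos($text, $open);
    if ($start === false) {
        return null;
    }
    $start += strlen($open);              // skip past the opening marker
    $end = strpos($text, $close, $start); // search only after it
    if ($end === false) {
        return null;
    }
    return substr($text, $start, $end - $start);
}

echo between('before <start>data here</start> after', '<start>', '</start>'), "\n";
// data here
```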

    > Second, I want to substitute values for values found in the data chunk. I
    > know str_replace does that just fine.

    Yep. If you need even more advanced replacing functionality look at
    ereg_replace() and preg_replace().
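
A short sketch of both replacement styles, with hypothetical placeholder values. str_replace() accepts arrays of search and replacement strings, so several placeholders can be filled in one call. Of the two regex functions mentioned, ereg_replace() was deprecated in PHP 5.3 and removed in PHP 7.0, so preg_replace() is the one to use in current PHP.

```php
<?php
// Hypothetical chunk and replacement values.
$chunk = '<h2><$Title$></h2> <$Content$>';

// str_replace() accepts parallel arrays of search and replace strings.
$filled = str_replace(
    ['<$Title$>', '<$Content$>'],
    ['Hello', 'Some body text'],
    $chunk
);
echo $filled, "\n"; // <h2>Hello</h2> Some body text

// preg_replace() works on patterns instead of fixed strings; here it
// removes any <$Name$> placeholder that was not filled.
$cleaned = preg_replace('/<\$\w+\$>/', '', '<p><$Unknown$></p>');
echo $cleaned, "\n"; // <p></p>
```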

    > Third, I then want to strip out the markers from my data chunk.

    This will be done as a side effect of preg_split().
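
One caveat: the pattern above matches only a bare <start>, so a marker with attributes such as <start limit=1> would not be split on. If the markers do remain in the chunk, a single preg_replace() can strip both, attributes included. A sketch, assuming a made-up chunk:

```php
<?php
// Hypothetical chunk that still contains its markers.
$chunk = '<start limit=1> data here </start>';

// <\/?start matches "<start" or "</start"; \b stops it matching longer
// names such as <starter>; [^>]* swallows any attributes up to the >.
$stripped = preg_replace('/<\/?start\b[^>]*>/', '', $chunk);

echo "[$stripped]\n"; // [ data here ]
```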

    HTH,
    --
    Benjamin D. Esham
    bdesham@gmail.com | AIM: bdesham128 | Jabber: same as e-mail
    "...more and more of our imports are coming from overseas."
    — George W. Bush


    • PTM

      #3
      Re: String parsing question...

      "Benjamin Esham" <bdesham@gmail.com> wrote in message
      news:pan.2006.07.27.03.58.44.371018@gmail.com...
      Assuming that you are only using <start> and </start> tags, and no other
      <tag></tag> pairs, in the line you're checking, you could use the
      strip_tags() function, e.g.:

      $variable_name = strip_tags($line_read_from_file[$optional_line_counter]);

      to do it.
      Your tags don't actually have to be called <start> and </start> for this
      to work; ANY tag will be stripped.
      Tags you want kept will have to be listed as allowable tags in the second
      argument (a string such as '<b><i>'), e.g.:

      $variable_name = strip_tags($line_read_from_file[$optional_line_counter],
          $allowable_tags);

      XHTML tags won't be allowed unless you use the HTML tag name,
      e.g. <br /> should be listed as <br>.
      You also need to be sure your tags are properly formatted with both < and >
      characters or you could get some strange results.

      I use strip_tags() in an XML parser and it reduced my code considerably.
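
A minimal sketch of the strip_tags() approach, using a hypothetical input line. Note that strip_tags() removes the tags but not the text around them, so anything outside the <start>...</start> pair is kept; it covers the marker-removal step rather than isolating the chunk.

```php
<?php
// Hypothetical line read from a file.
$line = '<start><b>Bold</b> and <i>italic</i></start>';

// With no second argument every tag is removed.
echo strip_tags($line), "\n"; // Bold and italic

// Tags listed in the second argument are kept (HTML-style names, e.g. '<b>').
echo strip_tags($line, '<b>'), "\n"; // <b>Bold</b> and italic
```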


      Phil



      • Fred!head

        #4
        Re: String parsing question...

        Thanks, Benjamin, and for the Bush quote: very obvious and funny.

        Probably it's the fact my mind goes blank when reading about regular
        expressions but I'm not able to make the preg_split work. If you have
        time/interest, I'd appreciate any additional thoughts.

        Basically I'm pulling a template from a database field then performing
        operations on that data. Within the template I have this data:

        .... stuff here ...

        <@content limit="" ... more elements ... >
        <h2><$Title$> </h2>
        <$Content$>
        </content@>

        .... more stuff here ...

        So I'm trying to grab everything between the end of the <@content ...>
        tag and </content@> as a single data chunk that I can then perform
        operations on (like replacing <$Title$> and <$Content$> with result set
        data from another query).

        What modifications to the preg_split do I need to make this work? Is
        there a cleaner way to set up the <content> tags, like </content>
        instead of </content@>, that would make the regular expression more
        efficient? I like using the @ as a flag to find the start marker, on
        the premise that it makes false results less likely, but maybe I'm
        deluded.

        I appreciate your help so far! Thank you.

        Tim


        • Benjamin Esham

          #5
          Re: String parsing question...

          Fred!head wrote:
          > Benjamin Esham wrote:
          >> timslavin wrote:
          >>> First, I want to grab data between two markers that I define (e.g.
          >>> <start> ... data here ... </start>) and strip off the text before the
          >>> first marker (<start> in this example) and after the second marker
          >>> (</start> in this example). That would leave me with the "... data
          >>> here ..." chunk with my markers either included (worst case) or
          >>> removed (best case, saving me the third step below).
          >>
          >> $pieces = preg_split('/\<(\/)?start\>/', $input);
          >
          > Probably it's the fact my mind goes blank when reading about regular
          > expressions but I'm not able to make the preg_split work. If you have
          > time/interest, I'd appreciate any additional thoughts.

          Whoops, I completely forgot that your opening tag has attributes! Sorry
          about that. Try this:

          $pieces = preg_split('/\<(@content[^>]*|\/content@)\>/', $input);
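
Putting the pieces together on a template shaped like Tim's, as a sketch (the template text, the limit value, and the replacement strings are all made up). The split discards the tag attributes, so the limit value is read separately with preg_match(); that pattern assumes the attribute is written as limit="..." with double quotes.

```php
<?php
// Hypothetical template mirroring the one posted above.
$input = <<<'TPL'
stuff here
<@content limit="5">
<h2><$Title$></h2>
<$Content$>
</content@>
more stuff here
TPL;

// The opening tag may carry attributes ([^>]*); the closing tag is the
// literal </content@>.
$pieces = preg_split('/\<(@content[^>]*|\/content@)\>/', $input);
$chunk  = $pieces[1];

// Fill the placeholders; the values here are made up.
$html = str_replace(
    ['<$Title$>', '<$Content$>'],
    ['My title', 'My content'],
    $chunk
);
echo trim($html), "\n";

// The split throws the attributes away; if the limit value is needed,
// preg_match() can capture it separately.
preg_match('/<@content[^>]*\blimit="([^"]*)"/', $input, $m);
$limit = $m[1] ?? null;
```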

          > What modifications to the preg_split do I need to make this work? Is there
          > a cleaner way to set up the <content> tags, like </content> instead of
          > </content@>, that would make the regular expression more efficient?

          Actually, if you used, for example, <@content> for both the start and the
          end, you could simply do

          $pieces = explode('<@content>', $input);

          and bypass regular expressions altogether. The resulting array will be set
          up the same as before. If you are able to modify the input to make both
          tags the same, this would probably be the best solution.
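
A sketch of the explode() variant, assuming both markers are the identical literal string <@content>. Because explode() matches the exact text only, an opening tag that still carries attributes (such as limit="") would not be found; this works only if those attributes can be dropped or stored elsewhere.

```php
<?php
// Hypothetical input where both markers are exactly <@content>.
$input = 'before<@content>data here<@content>after';

// explode() splits on the literal string, so no regex is involved.
$pieces = explode('<@content>', $input);

echo $pieces[1], "\n"; // data here
```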

          HTH,
          --
          Benjamin D. Esham
          bdesham@gmail.com | AIM: bdesham128 | Jabber: same as e-mail
          ....and that's why I'm not wearing any pants.


          • Fred!head

            #6
            Re: String parsing question...

            Thanks!

            Tim

