Get text between A and B?

  • Philipp Lenssen

    Get text between A and B?

    I want to read out several strings from an HTML (text) file, like
    everything between "<h2>" and "</h2>" to create a table-of-contents
    (also, things other than tags).
    I do have a function but it's slow, and sometimes doesn't finish for
    larger files (600K, not much really!).
    Now what would be a nice function to do this job? I suppose some regex
    with preg_match_all?

    It should have a parameter telling which occurrence of the string
    should be used, e.g. the second, third and so on.

    ------------

    Like:

    function getTextBetween( $allText, $textBefore, $textAfter, $offset = 0)
    {
    // ?
    }

    Then I could say:

    $s = getTextBetween( "<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
    "<h2>", "</h2>", 1);
    echo $s; // ... would be "bar"

    ------------

    Any help greatly appreciated!

  • Jedi121

    #2
    Re: Get text between A and B?

    "Philipp Lenssen" wrote on 17/11/2003:
    > Then I could say:
    >
    > $s = getTextBetween( "<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
    > "<h2>", "</h2>", 1);
    > echo $s; // ... would be "bar"
    >
    > ------------
    >
    > Any help greatly appreciated!

    I would use a combination of explode() calls:
    explode( "<h2>", "<h2>foo</h2><p>Hello World</p><h2>bar</h2>")
    and then, for each resulting element, explode( "</h2>", element)
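
A minimal sketch of the explode() idea above, assuming both delimiters are non-empty strings. The helper name getAllBetweenByExplode is hypothetical and does not come from the thread; it returns every match, so the wanted occurrence is picked by array index.

```php
<?php
// Hypothetical helper (the name is not from the thread): collects the text
// between each $before ... $after pair using only explode().
function getAllBetweenByExplode($allText, $before, $after)
{
    $results = array();
    // Every chunk after the first one starts right behind an opening delimiter.
    $chunks = explode($before, $allText);
    array_shift($chunks); // drop the text preceding the first opener
    foreach ($chunks as $chunk) {
        // Split the chunk once, at the first closing delimiter.
        $parts = explode($after, $chunk, 2);
        if (count($parts) === 2) { // skip openers that have no closer
            $results[] = $parts[0];
        }
    }
    return $results;
}

$found = getAllBetweenByExplode(
    "<h2>foo</h2><p>Hello World</p><h2>bar</h2>", "<h2>", "</h2>");
echo $found[1]; // prints "bar"
```

Because no regex engine is involved, the delimiters need no escaping, but matching is always case-sensitive.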



    • Justin Koivisto

      #3
      Re: Get text between A and B?

      Philipp Lenssen wrote:
      > I want to read out several strings from an HTML (text) file, like
      > everything between "<h2>" and "</h2>" to create a table-of-contents
      > (also, things other than tags).
      > I do have a function but it's slow, and sometimes doesn't finish for
      > larger files (600K, not much really!).
      > Now what would be a nice function to do this job? I suppose some regex
      > with preg_match_all?
      >
      > It should have a parameter telling which occurrence of the string
      > should be used, e.g. the second, third and so on.
      >
      > ------------
      >
      > Like:
      >
      > function getTextBetween( $allText, $textBefore, $textAfter, $offset = 0)
      > {
      > // ?
      > }
      >
      > Then I could say:
      >
      > $s = getTextBetween( "<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
      > "<h2>", "</h2>", 1);
      > echo $s; // ... would be "bar"
      >
      > ------------
      >
      > Any help greatly appreciated!

      Kinda like this then...

      function getTextBetween($allText,$textBefore,$textAfter,$offset=0){
      $pattern='#'.$textBefore.'(.*)'.$textAfter.'#iU';
      preg_match_all($pattern,$allText,$matches);
      return $matches[1][$offset];
      }


      --
      Justin Koivisto - spam@koivi.com
      PHP POSTERS: Please use comp.lang.php for PHP related questions,
      alt.php* groups are not recommended.
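
A hedged variation on the regex function above, not the poster's own code. It adds three assumptions: the delimiters are meant as literal text (so they go through preg_quote()), a match may span several lines (the s modifier), and a missing occurrence should yield null instead of an undefined-index notice.

```php
<?php
// Sketch only: same signature as in the thread, with the assumptions
// described above (literal delimiters, multi-line matches, null on a miss).
function getTextBetween($allText, $textBefore, $textAfter, $offset = 0)
{
    // preg_quote() escapes regex metacharacters and the '#' pattern delimiter.
    $pattern = '#' . preg_quote($textBefore, '#')
             . '(.*?)'                      // lazy: stop at the nearest closer
             . preg_quote($textAfter, '#') . '#is';
    if (!preg_match_all($pattern, $allText, $matches)) {
        return null;                        // no match at all, or a regex error
    }
    return isset($matches[1][$offset]) ? $matches[1][$offset] : null;
}

$s = getTextBetween("<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
    "<h2>", "</h2>", 1);
echo $s; // prints "bar"
```

With the quoting in place, delimiters such as "(" and ")" can be passed without building an invalid pattern.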


      • Matty

        #4
        Re: Get text between A and B?

        Philipp Lenssen wrote:
        > It should have a parameter telling which occurrence of the string
        > should be used, e.g. the second, third and so on.
        >
        > ------------
        >
        > Like:
        >
        > function getTextBetween( $allText, $textBefore, $textAfter, $offset = 0)
        > {
        > // ?
        > }
        >
        > Then I could say:
        >
        > $s = getTextBetween( "<h2>foo</h2><p>Hello World</p><h2>bar</h2>",
        > "<h2>", "</h2>", 1);
        > echo $s; // ... would be "bar"

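        function getTextDelims($alltext, $opener, $closer)
        {
            // Quote both delimiters, including the '/' pattern delimiter, so "</h2>" matches literally.
            $pattern = '/'.preg_quote($opener, '/').'(.+)'.preg_quote($closer, '/').'/mU';
            preg_match_all($pattern, $alltext, $allmatches);
            // $allmatches[1] holds the captured text of every match.
            if ((count($allmatches) > 0) and (array_key_exists(1, $allmatches)))
            { return $allmatches[1]; }
            else { return array(); }
        }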

        Then doing
        $myanswers = getTextDelims(.....

        $myanswers[1] contains offset 1, etc

        If you want to make the matches case-insensitive, change '/mU' to '/mUi'

        HTH

        Matt
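
A hedged usage sketch of the array-returning approach above, building a small table of contents. The function is re-declared here so the example runs on its own; the $ignoreCase parameter is an illustrative addition that switches between the '/mU' and '/mUi' modifiers, and the sample $html string is made up.

```php
<?php
// Variant of getTextDelims(); the $ignoreCase flag is an illustrative
// addition and was not part of the posted function.
function getTextDelims($alltext, $opener, $closer, $ignoreCase = false)
{
    $flags = $ignoreCase ? 'mUi' : 'mU';
    // Pass '/' to preg_quote() so the slash in "</h2>" is escaped as well.
    $pattern = '/' . preg_quote($opener, '/') . '(.+)'
             . preg_quote($closer, '/') . '/' . $flags;
    preg_match_all($pattern, $alltext, $allmatches);
    return isset($allmatches[1]) ? $allmatches[1] : array();
}

// Made-up sample input with mixed-case tags.
$html = "<H2>Intro</H2><p>Hello World</p><h2>Details</h2>";
$toc = getTextDelims($html, "<h2>", "</h2>", true);
foreach ($toc as $i => $heading) {
    echo ($i + 1) . ". " . $heading . "\n";
}
```

Without the flag, only the lowercase heading in this sample would be found, which is why the case-insensitive switch matters for hand-written HTML.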


        • Philipp Lenssen

          #5
          Re: Get text between A and B?

          Justin Koivisto wrote:
          > Kinda like this then...
          >
          > function getTextBetween($allText,$textBefore,$textAfter,$offset=0){
          > $pattern='#'.$textBefore.'(.*)'.$textAfter.'#iU';
          > preg_match_all($pattern,$allText,$matches);
          > return $matches[1][$offset];
          > }

          Thanks for that solution, and for the other ones as well (I merged two
          of them together). Works very nicely.
