Pulling a synopsis from text

  • crucialmoment

    Pulling a synopsis from text

    Greetings,
    I am trying to automatically pull the opening section from submitted
    text and return it with a "more..." link. The submitted text is HTML
    created by FCKeditor (http://www.fckeditor.net/).
    The trouble I am running into is that the cutoff point often falls inside
    an element, i.e. after an opening <div> but before its closing </div>.
    The only idea I have come up with is to build an array of all possible
    HTML tags and search for a close for each, but I am hoping there is a
    cleaner method. Has anyone attempted this before?

    function getSynop($input = "", $more_link = "", $synop_size = '750') {
        $tmp_str = substr($input, 0, $synop_size);
        $end_val = strrpos($tmp_str, ">") + 1;
        if ($end_val < $synop_size) {
            $end_val = strrpos($tmp_str, ".") + 1;
        }
        if ($end_val < $synop_size) {
            $end_val = strrpos($tmp_str, ">") + 1;
        }
        return substr($input, 0, $end_val) . " <a href='$more_link'>more...</a>";
    }

  • Chung Leong

    #2
    Re: Pulling a synopsis from text



    The trick here is to ignore the tags and operate only on the text between
    them. Say we have the following:

    This is <div>a test</div> and this is only <div>a test.</div>

    and we want 10 characters. We look at "This is " and grab all 8
    characters. Then we look at "a test" and retain only the first 2. As
    we now have what we need, we retain 0 characters from " and this is
    only " and "a test.". The end result will be:

    This is <div>a </div><div></div>

    Once the empty tags are discarded we end up with

    This is <div>a </div>

    which is what we want.

    Here's an implementation of the technique:

    <?php

    $s = 'This is some <strong>sample text</strong>. You are using '
       . '<a href="http://www.fckeditor.net/">FCKeditor</a>.';

    function synop_callback($m) {
        global $synop_char_to_fetch;
        // the tag group is absent when a text run is not followed by a tag
        $tag = isset($m[2]) ? $m[2] : '';

        // got enough characters already, return just the tag
        if ($synop_char_to_fetch <= 0) {
            return $tag;
        }

        // decode HTML entities so each one counts as a single character
        $inner_html = $m[1];
        $inner_text = html_entity_decode($inner_html);

        if (strlen($inner_text) > $synop_char_to_fetch) {
            // retain up to $synop_char_to_fetch characters, ending
            // at a word boundary (the s flag lets . match newlines)
            $r = preg_replace("/^(.{0,$synop_char_to_fetch}\b)?.*/s", '\1',
                              $inner_text);
            $inner_html = htmlspecialchars(rtrim($r));
        }

        // subtract the length of this text run; the budget drops to zero
        // or below once the limit has been reached
        $synop_char_to_fetch -= strlen($inner_text);
        return "$inner_html$tag";
    }

    function synop_chop($s, $num) {
        // chop off extra text beyond $num characters
        global $synop_char_to_fetch;
        $synop_char_to_fetch = $num;
        $s = preg_replace_callback('/([^<]*)(<.*?>)?/s', 'synop_callback', $s);

        // collapse empty tags, repeating until nothing changes
        do {
            $r = $s;
            $s = preg_replace('/<(\S*?)[^>]*?>\s*<\/\1>/i', '', $r);
        } while ($r != $s);

        // add ellipsis
        $s = preg_replace('/\.?$/', '...', trim($s), 1);
        return $s;
    }

    echo synop_chop($s, 20);

    ?>
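For comparison, the <div> walk-through earlier in this reply can be reproduced with a stripped-down sketch. This is not the poster's code: the function name truncate_between_tags is hypothetical, it assumes reasonably well-formed HTML with no ">" inside attribute values, it counts bytes rather than multibyte characters, and it skips the word-boundary and entity handling of the implementation above.

```php
<?php
// Hypothetical helper (not from the thread): keeps every tag, keeps only the
// first $limit bytes of the text between tags, then drops empty element pairs.
function truncate_between_tags(string $html, int $limit): string
{
    // Split into text and tag tokens; the tags survive the split because
    // the delimiter pattern is wrapped in a capturing group.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $out = '';
    foreach ($tokens as $token) {
        if ($token[0] === '<') {
            $out .= $token;               // tags never count against the budget
            continue;
        }
        $keep = substr($token, 0, max($limit, 0));
        $limit -= strlen($keep);          // spend the budget on retained text only
        $out .= $keep;
    }

    // Repeatedly remove element pairs that ended up with no content.
    do {
        $before = $out;
        $out = preg_replace('/<(\w+)[^>]*>\s*<\/\1>/', '', $before);
    } while ($out !== $before);

    return $out;
}

echo truncate_between_tags(
    'This is <div>a test</div> and this is only <div>a test.</div>', 10);
// prints: This is <div>a </div>
```

Splitting on the tags first keeps the counting logic free of regular-expression callbacks and global state, at the cost of the nicer word-boundary cut.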


    • d

      #3
      Re: Pulling a synopsis from text

      1. Get the text.
      2. Remove the tags.
      3. Take the first <n> characters.

      $text = "This is some <div>text</div> isn't it interesting. "
            . "<b>send money.</b> <i>and beer</i>";

      function getSynop($input = "", $more_link = "", $synop_size = 750) {
          // strip the markup first, then cut the plain text
          $syn = substr(strip_tags($input), 0, $synop_size);
          return $syn . " <a href='" . $more_link . "'>more...</a>";
      }
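The three steps above can also be sketched so that the cut does not land in the middle of a word. This variant is not from the thread: the name plain_synopsis is hypothetical, UTF-8 input is assumed, and lengths are counted in bytes, so treat it as a starting point rather than a finished routine.

```php
<?php
// Hypothetical variant (not from the thread) of the strip-then-cut idea.
function plain_synopsis(string $html, string $more_link, int $size = 750): string
{
    // Remove markup, decode entities so "&amp;" counts as one character,
    // and collapse the runs of whitespace that removed tags leave behind.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));

    if (strlen($text) > $size) {
        $cut = substr($text, 0, $size);
        // If the cut landed inside a word, back up to the previous space.
        $space = strrpos($cut, ' ');
        if ($text[$size] !== ' ' && $space !== false) {
            $cut = substr($cut, 0, $space);
        }
        $text = rtrim($cut);
    }

    // Re-escape the text and the URL before embedding them in HTML again.
    $flags = ENT_QUOTES | ENT_SUBSTITUTE;
    return htmlspecialchars($text, $flags, 'UTF-8')
        . " <a href='" . htmlspecialchars($more_link, $flags, 'UTF-8')
        . "'>more...</a>";
}

echo plain_synopsis(
    "This is some <div>text</div> isn't it interesting.", 'more.php', 20);
// prints: This is some text <a href='more.php'>more...</a>
```

Because all markup is discarded, the synopsis loses any formatting from the editor; that is the trade-off against the tag-preserving approach in the previous reply.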

      hope that helps!

      dave

