Searching text

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • SK

    Searching text

    Is there a way to store HTML into a MySQL TEXT column,
    yet be able to search over textual content only?

    For example, if I store the following HTML snippet:

    <p>A very <em>small</em> paragraph.</p>

    .... into an TEXT column, I would like to be able to search
    in the text "A very small paragraph." only, meaning the search
    would find "small" (even if it's enclosed in <em> tags and
    thus creating string "<em>small</em>"), and ignore search
    strings "<p>" and "<em>".



  • kingofkolt

    #2
    Re: Searching text

    "SK" <sk@foo.com> wrote in message news:2uci4hF27h bs0U1@uni-berlin.de...[color=blue]
    > Is there a way to store HTML into a MySQL TEXT column,
    > yet be able to search over textual content only?
    >
    > For example, if I store the following HTML snippet:
    >
    > <p>A very <em>small</em> paragraph.</p>
    >
    > ... into an TEXT column, I would like to be able to search
    > in the text "A very small paragraph." only, meaning the search
    > would find "small" (even if it's enclosed in <em> tags and
    > thus creating string "<em>small</em>"), and ignore search
    > strings "<p>" and "<em>".
    >
    >[/color]

    Try strip_tags():



    Comment

    • Chung Leong

      #3
      Re: Searching text

      "SK" <sk@foo.com> wrote in message news:2uci4hF27h bs0U1@uni-berlin.de...[color=blue]
      > Is there a way to store HTML into a MySQL TEXT column,
      > yet be able to search over textual content only?
      >
      > For example, if I store the following HTML snippet:
      >
      > <p>A very <em>small</em> paragraph.</p>
      >
      > ... into an TEXT column, I would like to be able to search
      > in the text "A very small paragraph." only, meaning the search
      > would find "small" (even if it's enclosed in <em> tags and
      > thus creating string "<em>small</em>"), and ignore search
      > strings "<p>" and "<em>".
      >
      >[/color]

      One possible way to do this would be to strip out the tags before you save
      the text and store the formatting info separately. Restore the tags only
      when you need to output in HTML.

      Example:

      <?

      $example = <<<EXAMPLE
      <p>A very <em>small</em> paragraph.</p>

      .... into an TEXT column, I would like to be able to search
      in the text "A very small paragraph." only, meaning the search
      would find "small" (even if it's enclosed in <em> tags and
      thus creating string "<em>small</em>"), and ignore search
      strings "<p>" and "<em>".
      EXAMPLE;

      function ExtractFormatti ng($html) {
      preg_match_all( '/<.*?>/', $html, $matches,
      PREG_OFFSET_CAP TURE |PREG_PATTERN_O RDER);
      $tags = $matches[0];
      $formatting = array();
      foreach($tags as $tag) {
      list($tag_html, $tag_offset) = $tag;
      $formatting[$tag_offset] = $tag_html;
      }
      $text = preg_replace('/<.*?>/', '', $html);
      return array($text, $formatting);
      }

      function ApplyFormatting ($text, $formatting) {
      $html = $text;
      foreach($format ting as $tag_offset => $tag_html) {
      $html = substr($html, 0, $tag_offset)
      . $tag_html
      . substr($html, $tag_offset);
      }
      return $html;
      }

      list($text, $formatting) = ExtractFormatti ng($example);

      echo "<pre style='border:1 px solid black'>$text</pre>";

      $html = ApplyFormatting ($text, $formatting);

      echo "<div style='border:1 px solid black'>$html</div>";

      ?>

      To store the formatting info, just serialize the array and dump it into
      another text column.


      Comment

      Working...