KEYWORDS from a string

This topic is closed.
  • Nel

    KEYWORDS from a string

    Hi all,

    Before I re-invent the wheel here, is anyone willing to share a basic
    script to extract META keywords from a string? I have a string, let's say
    $pageText, that contains the dynamic contents of the page.

    Ideally, I don't just want to explode the string and remove "and", "or" and
    "the" etc., because some of the repeated keywords may be more than one word
    long.

    Also, it would be good to be able to rank the keywords according to the
    frequency.

    I have searched google and hotscripts etc. Can only find web sites to
    create METAs to copy & paste.

    Thanx in advance.

    Nel


  • Chung Leong

    #2
    Re: KEYWORDS from a string

    "Nel" <nelly@ne14.co.NOSPAMuk> wrote in message
    news:41001aae$0$44474$ed2619ec@ptn-nntp-reader02.plus.net...
    > Hi all,
    >
    > Before I re-invent the wheel here, is anyone willing to share a basic
    > script to extract META keywords from a string? I have a string, let's say
    > $pageText, that contains the dynamic contents of the page.
    >
    > Ideally, I don't just want to explode the string and remove "and", "or" and
    > "the" etc., because some of the repeated keywords may be more than one word
    > long.
    >
    > Also, it would be good to be able to rank the keywords according to the
    > frequency.
    >
    > I have searched google and hotscripts etc. Can only find web sites to
    > create METAs to copy & paste.
    >
    > Thanx in advance.
    >
    > Nel

    See the documentation for get_meta_tags().
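
For reference, a minimal sketch of that call. Note that get_meta_tags() reads from a file or URL rather than from a string, so this example (my own assumption about how you might try it out, not anything from the original question) writes a small sample page to a temporary file first. The sample HTML and variable names are made up for illustration.

```php
<?php
// Sample page, invented for this sketch.
$html = '<html><head>'
      . '<meta name="keywords" content="php, keywords, meta">'
      . '<meta name="description" content="Example page">'
      . '</head><body>Hello</body></html>';

// get_meta_tags() expects a filename or URL, so store the page in a temp file.
$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $html);

// Returns an array keyed by the lower-cased name attribute of each <meta> tag.
$tags = get_meta_tags($path);
unlink($path);

echo $tags['keywords'], "\n";
echo $tags['description'], "\n";
```

This only returns META tags that already exist in a page's head section; it does not derive keywords from body text.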



    • steve

      #3
      Re: Re: KEYWORDS from a string

      "Chung Leong" wrote:
      > "Nel" <nelly@ne14.co.NOSPAMuk> wrote in message
      > news:41001aae$0$44474$ed2619ec@ptn-nntp-reader02.plus.net...
      > > Hi all,
      > >
      > > Before I re-invent the wheel here, is anyone willing to share a basic
      > > script to extract META keywords from a string? I have a string, let's
      > > say $pageText, that contains the dynamic contents of the page.
      > >
      > > Ideally, I don't just want to explode the string and remove "and", "or"
      > > and "the" etc., because some of the repeated keywords may be more than
      > > one word long.
      > >
      > > Also, it would be good to be able to rank the keywords according to the
      > > frequency.
      > >
      > > I have searched google and hotscripts etc. Can only find web sites to
      > > create METAs to copy & paste.
      > >
      > > Thanx in advance.
      > >
      > > Nel
      >
      > See documentation for get_meta_tags().

      The reply above answers your question if you are looking for the strict
      definition of meta tags in HTML.

      If by "meta" you mean keywords that are important and somewhat unique in
      the body of the text, then I suggest that you need a definition of
      common words, and then remove them to arrive at the "meta" keywords.
      The way I do it is to start with the MySQL stop words (search on the
      web). Then add words that are common in your domain (e.g. "html" may
      be a common word on the web). Now remove all of these words from the
      string using regular expressions, and what remains is pretty much the
      unique words.
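
A rough sketch of the approach described above, with frequency ranking added since that was part of the original question. The function name rank_keywords(), the sample text and the tiny stop word list are all hypothetical; in practice the list would be the MySQL stop words plus your own domain words.

```php
<?php
// Hypothetical helper: strip stop words with a regex, then rank what is left
// by how often each word occurs.
function rank_keywords(string $text, array $stopwords, int $limit = 10): array
{
    // Lower-case, then turn everything except letters, digits,
    // apostrophes and hyphens into spaces.
    $text = preg_replace("/[^a-z0-9'-]+/", ' ', strtolower($text));

    if ($stopwords) {
        // Escape each stop word and match it only as a whole token.
        $quoted = array_map(function ($w) {
            return preg_quote(strtolower($w), '/');
        }, $stopwords);
        $pattern = "/(?<![a-z0-9'-])(?:" . implode('|', $quoted) . ")(?![a-z0-9'-])/";
        $text = preg_replace($pattern, ' ', $text);
    }

    // Split into words and count occurrences, most frequent first.
    $words  = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $counts = array_count_values($words);
    arsort($counts);

    return array_slice($counts, 0, $limit, true);
}

$ranked = rank_keywords(
    "The cat sat on the mat. The cat ran.",
    ['the', 'on']
);
print_r($ranked);
```

The result maps each remaining word to its count, so array_keys($ranked) gives a keyword list ordered by frequency. Two-word phrases could probably be counted in a similar way by pairing adjacent words before counting, but that is not shown here.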

      --
      http://www.dbForumz.com/ This article was posted by author's request
      Articles individually checked for conformance to usenet standards
      Topic URL: http://www.dbForumz.com/PHP-KEYWORDS...ict132415.html
      Visit Topic URL to contact author (reg. req'd). Report abuse: http://www.dbForumz.com/eform.php?p=442256


        • Nel

          #5
          Re: Re: KEYWORDS from a string

          Here is the final script I put together thanks to your help and suggestions.
          It will automatically work through a string and remove duplicates, new lines
          and punctuation before listing the keywords within a meta tag.

          If anyone can offer any improvements I am open to suggestions.

          Nel.
          _______________ _______________ _______________

          <?php // metatags.inc.php
          // Create keyword META tags from dynamic page content

          // test string from BBC News
          echo metatags("Tony Blair has nominated long-time ally Peter Mandelson as
          Britain's next European commissioner.
          The announcement was made after Mr Blair spoke to new European Commission
          President Jose Manuel Durao Barroso on Friday morning.

          The appointment represents a remarkable comeback for Mr Mandelson, who has
          twice resigned from the Cabinet in controversial circumstances.

          It will also trigger a Westminster by-election in his Hartlepool seat.

          'Positive response'

          In a statement, Mr Mandelson said he was \"delighted\" to have been
          nominated for the post by the prime minister, but confirmed that
          he had \"agonised\" over whether the job was right for him.");

          function metatags($pagetext)
          {
              // Define variables for this web site
              $websitename = "Example's Web Site";
              $metadescription = "This web site's description";
              $metakeywords = cleankeywords($pagetext);

              // Build up META TAGS
              $metatags = " <meta name=\"Name\" content=\"$websitename\">\n";
              $metatags .= " <meta name=\"Rating\" content=\"General\">\n";
              $metatags .= " <meta name=\"Robots\" content=\"Index\">\n";
              $metatags .= " <meta name=\"Revisit-After\" content=\"14 days\">\n";
              $metatags .= " <meta name=\"DESCRIPTION\" content=\"$metadescription\">\n";
              $metatags .= " <meta name=\"KEYWORDS\" content=\"$websitename,$metakeywords\">\n";

              return $metatags;
          }


          function cleankeywords($term)
          {
              //Specify text file containing stop words (one on each line)
              $stopwords_file = "stopwords.txt";

              //Turn \n and \r into spaces so words on adjacent lines stay separate
              $term = preg_replace("/[\r\n]+/", " ", $term);
              //Remove punctuation
              $term = preg_replace("/[.,\"']/", "", $term);

              //load list of common words, trimmed and lower-cased
              $common = array_map('strtolower', array_map('trim', file($stopwords_file)));

              //make array of search terms
              $_terms = explode(" ", $term);

              $cleanterm = array();
              foreach ($_terms as $line)
              {
                  $line = trim($line);
                  //skip empty strings and stop words; the array key removes duplicates
                  if ($line != "" && !in_array(strtolower($line), $common))
                  {
                      $cleanterm[strtolower($line)] = $line;
                  }
              }
              $cleanwords = implode(", ", $cleanterm);
              return $cleanwords;
          }
          ?>

