HttpBrowserCapabilities - recognized Crawlers

  • Zolt

    HttpBrowserCapabilities - recognized Crawlers

    Hi,
    Would someone know where I could get a list of the crawlers recognized by
    HttpBrowserCapabilities?
    Is there a way to add new ones or modify the list?

    I have a web site for which I want to show different content to search
    engine bots. I was planning on relying on HttpBrowserCapabilities.Crawler,
    but what if the bot signature changes, or another one is added?

    Thanks,
    Zolt
  • Andrew Morton

    #2
    Re: HttpBrowserCapabilities - recognized Crawlers

    Zolt wrote:
    I have a web site for which I want to show different content for
    search engine bots.
    Rather than try to get the site blacklisted by search engines, why not just
    use a robots.txt file to exclude them?
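    For reference, a robots.txt along these lines (placed at the site root)
    would do what's described here; the rule below is only an illustrative
    blanket exclusion, not something from this thread:

    ```
    # Ask all crawlers to stay out of the whole site
    User-agent: *
    Disallow: /
    ```

    Note that robots.txt is advisory only: well-behaved crawlers honour it,
    but a bot that ignores it will still fetch the pages.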

    Andrew



    • Zolt

      #3
      Re: HttpBrowserCapabilities - recognized Crawlers

      Andrew,

      What I want to do is show the search engines different content, not
      prevent them from coming to my site.
      The problem is that I have pages containing text in two languages, which
      is shown depending on the browser's preferred language and/or a selected
      language saved in a cookie.
      Doing it this way, I don't have to show URLs with ugly query strings like

      The problem with search engines is that they only see the default
      language and can't switch languages to reindex the content in the other
      language.
      My goal is to detect if the requester is a web crawler, and if it is, show
      both languages. If not, continue the normal way.

      I have found an interesting post, which I believe I will be able to use
      (http://forums.asp.net/p/908519/1012090.aspx#1012090).

      I should be able to modify it to monitor the major search engines - I am
      only interested in those major ones.

      Thanks for the suggestion anyway,
      Zolt

      "Andrew Morton" wrote:
      Zolt wrote:
      I have a web site for which I want to show a different content for
      search engine bots.

      Rather than try to get the site blacklisted by search engines, why not
      just use a robots.txt file to exclude them?

      Andrew


      • Andrew Morton

        #4
        Re: HttpBrowserCapabilities - recognized Crawlers

        Zolt wrote:
        Andrew,

        What I want to do is show the search engines a different content, not
        prevent them from coming to my site.
        The problem is that I have pages that contain text in 2 languages
        which is shown depending on the browser's preferred language and/or
        selected language saved in a cookie.

        Ahh - it sounded like you might want to do something referred to as
        web site cloaking.
        ...
        My goal is to detect if the requester is a web crawler, and if it is,
        show both languages. If not, continue the normal way.
        A regular expression which catches the crawlers which visit our sites is

        Dim re As New Regex( _
            "bot|spider|slurp|crawler|teoma|DMOZ|;1813|findlinks|tellbaby|ia_archiver|nutch|voyager|wwwster|3dir|scooter|appie|exactseek|feedfetcher|freedir|holmes|panscient|yandex|alef|cfnetwork|kalooga", _
            RegexOptions.Compiled Or RegexOptions.IgnoreCase)

        applied to the user-agent string, of course. You could use the Sub
        Session_Start in Global.asax.vb as the location to check it.

        Then if you can find a URL in the UA string, you can check its TLD for
        .com, .fr, .whatever.

        (You might want to take out the ";1813" - I put that in to filter out
        the AVG link-checker thing which happened to distort the actual user
        stats on our sites.)
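
        Wired into Session_Start, it could look something like this sketch
        (the "IsCrawler" session key is just an example name, and the pattern
        is shortened from the full one above):

        ```vb
        ' Global.asax.vb - a sketch only; adapt the pattern to taste.
        Imports System.Text.RegularExpressions

        Public Class Global_asax
            Inherits System.Web.HttpApplication

            ' Compiled once and shared across requests.
            Private Shared ReadOnly CrawlerRe As New Regex( _
                "bot|spider|slurp|crawler|teoma|ia_archiver|yandex", _
                RegexOptions.Compiled Or RegexOptions.IgnoreCase)

            Sub Session_Start(ByVal sender As Object, ByVal e As EventArgs)
                Dim ua As String = Request.UserAgent
                ' Pages can then check Session("IsCrawler") and render
                ' both languages for crawlers.
                Session("IsCrawler") = _
                    (ua IsNot Nothing AndAlso CrawlerRe.IsMatch(ua))
            End Sub
        End Class
        ```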

        HTH

        Andrew



        • Zolt

          #5
          Re: HttpBrowserCapabilities - recognized Crawlers

          Thanks a lot Andrew!
          Your solution seems to give more choices than the one I found.
          I will probably go that route.

          Really appreciated,
          Zolt
