An idea for a new HTML attribute... opinions needed...

  • Richard Clarke
    New Member
    • Oct 2010
    • 4


    Hey,

    I was browsing on Google earlier and found a site whose content had changed since it was indexed - the page was no longer relevant to the search term. Some sites, such as eBay, have sections of certain pages that change on every page load - for example, a catalogue with a New Items or Featured Items section in which 5 random items are selected from a catalogue of 500 on each load. I come across this kind of thing quite regularly, which is what gave me this idea.

    My idea is to have a "changeFreq" attribute for which the developer can specify any of the following:
    • PageLoad[:URL]
    • Daily[:hh:mm][:URL]
    • Weekly[:Mon|Tue|Wed|Thu|Fri|Sat|Sun][:hh:mm][:URL]
    • Monthly[:dd][:hh:mm][:URL]
    • Yearly[:mm[/dd]][:URL]


    So what does all this mean?
    • PageLoad: search engines simply will not index that element.
    • [:hh:mm]: the time is optional. If no time is supplied, it defaults to midnight.
    • [:Mon|Tue|Wed|Thu|Fri|Sat|Sun]: the day is optional. If no day is supplied, it defaults to Monday.
    • [:dd]: (yes, you guessed it!) the day of the month. If it is more than the number of days in the month, it falls back to the last day; if no day is specified, it defaults to the 1st.
    • [:mm[/dd]]: the month and day. The same rules apply for the day, and if the month is specified on its own, the day defaults to the 1st.
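
    To make the grammar concrete, here are a few hypothetical examples of the proposed syntax (the content and URLs are purely illustrative):

    Code:
    <div changeFreq="PageLoad">Randomised on every load - never indexed</div>
    <div changeFreq="Daily">Changes daily at midnight (the default time)</div>
    <div changeFreq="Weekly:Fri:09:00">Changes every Friday at 09:00</div>
    <div changeFreq="Monthly:31">Falls back to the last day of short months</div>
    <div changeFreq="Yearly:12/25:/Parts/Xmas.aspx">Re-fetched from the URL every 25th of December</div>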

    Now for the clever bit - the [:URL]. This specifies the URL to load into the element when it is out of date, and it serves two purposes: more relevant searches and better caching. I can hear you asking what difference this would make... it would basically allow search engines to re-index just a part of the page, and browsers to load most of a page from the cache while reloading only the outdated parts. It will be optional and, if not specified on an outdated element, the whole page will be re-indexed or reloaded.

    Oh, and where a URL is specified, this attribute could also be used by jQuery and similar frameworks to load any outdated parts of the page... other options would have to be available for this, such as Hourly, Minutely, Secondly and so on.
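
    As a very rough sketch of that, assuming the URL is always the final, slash-prefixed field of the attribute (my reading of the syntax above, not a settled rule) and ignoring the schedule logic for brevity, a framework could reload every URL-bearing part like this:

    Code:
    // Minimal sketch: reload any element whose changeFreq carries a URL.
    // A real implementation would first check whether the element is stale.
    function refreshOutdatedParts() {
        $('[changeFreq]').each(function () {
            var parts = $(this).attr('changeFreq').split(':');
            var url = parts[parts.length - 1];
            if (url.charAt(0) === '/') {
                $(this).load(url); // fetch the URL and replace this element's contents
            }
        });
    }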

    Inheritance will apply to all elements where no changeFreq is specified. Any elements within an element set to PageLoad will not be indexed (even if they have a different changeFreq specified). Consider the following code:

    Code:
    <div changeFreq="Weekly:Mon:13:00">
        <div id="TodayOnlyOffers" changeFreq="Daily:13:30:/Parts/TodayOnly.aspx">
            Content will be updated with the URL /Parts/TodayOnly.aspx at 13:30 every day, but this tag will be replaced weekly every Monday at 13:00.
        </div>
        <div changeFreq="PageLoad">
            Content will not be indexed, and this tag will also be replaced weekly every Monday at 13:00.
            <div changeFreq="Daily:15:00:/Parts/Something-Else.aspx">
                This tag will also not be indexed because it is inside a PageLoad tag. No matter what its changeFreq is set to, the search engine will still treat it as changing on every page load. A solution would be to remove the changeFreq from the parent tag and move the page-load-only content into its own sibling tag set to PageLoad, so that this tag is no longer nested inside one.
            </div>
        </div>
    </div>
    So what do people think of this idea? Are there any major flaws that I haven't thought of? I am not really up on how caching and indexing work - I know the basics, but I don't know enough to tell whether this is actually a really good idea... Any opinions would be greatly appreciated before I even think about approaching the W3C or anyone like that. And on that note, does anyone know exactly who (either firm or person) I would need to contact?

    Thanks in advance.

    Regards,

    Richard
  • drhowarddrfine
    Recognized Expert Expert
    • Sep 2006
    • 7434

    #2
    What you are looking at doing is duplicating HTTP headers that already do most of that, such as the Expires header. Other parts are done using .htaccess, robots.txt, and sitemap.xml files. Also, you can't have parts of the HTML standard relying on outside functionality, particularly things that are not standards, such as other frameworks.
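
    For example, the Expires (or Cache-Control) header already tells browsers and proxies when a whole page goes stale, and the sitemap.xml protocol even has a changefreq element (with the values always, hourly, daily, weekly, monthly, yearly and never) for search engines - both at page level (the URL below is illustrative):

    Code:
    HTTP/1.1 200 OK
    Expires: Mon, 25 Oct 2010 13:00:00 GMT
    Cache-Control: max-age=86400

    <!-- sitemap.xml entry -->
    <url>
      <loc>http://www.example.com/catalogue.aspx</loc>
      <changefreq>daily</changefreq>
    </url>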

    The HTML and HTTP standards expose this information so that other systems can access it and deal with it in whatever way they want.


    • Richard Clarke
      New Member
      • Oct 2010
      • 4

      #3
      Well, since the original example (in my experience) occurs so frequently, something needs to point out to search engines what, if anything, changes on page load and the like. This is basically tag-level caching - so I know the amount of data would be vastly higher, but it could also take large chunks of data out of storage, since parts that change on every page load would never need to be cached at all.

      I know techniques exist - but they are ONLY for page-level caching / indexing. Nothing can signal to the bot that "this tag should not be indexed".
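
      For instance, the standard robots meta tag - like robots.txt - can only speak for the whole document; there is no element-level equivalent:

      Code:
      <!-- Applies to the entire page; no standard way to scope this to one tag -->
      <meta name="robots" content="noindex, follow">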

      Richard


      • drhowarddrfine
        Recognized Expert Expert
        • Sep 2006
        • 7434

        #4
        No standards committee will ever create something that reacts to a program. Tag-level caching, as you describe, can be handled with RDF, which is already a standard and used quite a bit.


        • Richard Clarke
          New Member
          • Oct 2010
          • 4

          #5
          What do you mean by "something that reacts to a program"? Surely it's the programs that are reacting to it - and that's how it works currently (i.e. normal caching with no-cache and the like is handled appropriately by browsers, and any reputable robot respects robots.txt).


          • drhowarddrfine
            Recognized Expert Expert
            • Sep 2006
            • 7434

            #6
            What I meant was a standard developed so that one particular implementation can use it. Search engines are just special-purpose programs.


            • Richard Clarke
              New Member
              • Oct 2010
              • 4

              #7
              Ah, now I understand what you mean. As it happens, I have posted this idea on the Google Webmaster forums to see how well it would be received - using it for search engines has generally been turned down, because too many people don't keep to web standards anyway - so this probably wouldn't be any different, and it would be abused.

              However, I still think it could be used by web browsers to speed up page load times - and it wouldn't stop there, as it could also be used by jQuery and other frameworks for smaller timespans, such as Minutely:30:/parts/part.aspx (every minute, at the 30-second mark, load /parts/part.aspx into this tag). It would be easy enough to do, and it could be built into jQuery to automatically detect these timespans and react accordingly.
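
              As a minimal sketch, assuming the Minutely:ss:URL format from that example (the one-second polling approach is my own choice, not anything jQuery provides out of the box):

              Code:
              // Minimal sketch: once a second, check every Minutely element
              // and, at its seconds mark, load its URL into the element.
              setInterval(function () {
                  var seconds = new Date().getSeconds();
                  $('[changeFreq^="Minutely:"]').each(function () {
                      var parts = $(this).attr('changeFreq').split(':');
                      if (seconds === parseInt(parts[1], 10)) {
                          $(this).load(parts.slice(2).join(':')); // URL is everything after the seconds mark
                      }
                  });
              }, 1000);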

