Re: identify the language of a web page

  • VK

    Re: identify the language of a web page

    On Apr 12, 2:35 am, Dr J R Stockton <j...@merlyn.demon.co.uk> wrote:
    > In comp.lang.javascript message <cda0f617-c9a0-4389-b79e-a02ad24852a6@k10g2000prm.googlegroups.com>,
    > Thu, 10 Apr 2008 22:29:01, "us...@yahoo.com" <us...@yahoo.com> posted:
    >
    > > Suppose I need to classify 10000 web pages based on their languages.
    > > What should I look for to determine the language of each web page? Any
    > > advice is welcome.
    >
    > Consider <URL:http://www.merlyn.demon.co.uk/zel-82px.htm> and siblings.

    <OT>
    This is more for ciwah, so OT, but still:
    is there a language code for a multi-language document, like lang="multi"
    or something?
    </OT>
  • Joost Diepenmaat

    #2
    Re: identify the language of a web page

    VK <schools_ring@yahoo.com> writes:
    > <OT>
    > This is more for ciwah, so OT, but still:
    > is there a language code for a multi-language document, like lang="multi"
    > or something?
    > </OT>

    Nearly all tags (I'm not sure it's literally all of them) can have a lang
    attribute, so the "multi-languageness" is built in already: each part of
    the page can declare its own language.
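
    For the original classification question, here is a minimal sketch of
    how those lang attributes could be collected from a page. It is only an
    illustration in Python using the standard html.parser module; the names
    LangCollector and declared_languages are made up for this example, and
    it assumes the pages actually declare their languages, which many do
    not.

```python
from collections import Counter
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Counts the values of lang / xml:lang attributes seen on start tags.

    Hypothetical helper for illustration, not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.langs = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        declared = dict(attrs)
        value = declared.get("lang") or declared.get("xml:lang")
        if value:
            # Normalise " EN " or "FR" so equal codes compare equal.
            self.langs[value.strip().lower()] += 1


def declared_languages(html_text):
    """Return a Counter of language codes declared anywhere in the markup."""
    collector = LangCollector()
    collector.feed(html_text)
    collector.close()
    return collector.langs


sample = """
<html lang="en">
  <body>
    <p>Hello, world.</p>
    <p lang="fr">Bonjour le monde.</p>
    <p lang="FR">Encore une phrase.</p>
  </body>
</html>
"""

print(declared_languages(sample))
```

    On the sample above this counts "fr" twice and "en" once. A page with
    no lang attributes at all gives an empty result, so this only reports
    what the author declared; it does not look at the text itself.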





    --
    Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/
