How to extract data from an HTML file

  • veer
    New Member
    • Jul 2007
    • 198

    Hi,
    I am writing a program in which I want to extract data from an HTML file.
    There are two dates in the HTML file that I want to extract, but the main problem is that these dates are different in each file. The word "AKTIVA" always comes before these dates.
    I got as far as searching for the word "AKTIVA", but after that I have no idea how to get at the dates.

    I also tried another method: transferring the whole contents of the HTML file into an Excel file. But when I open the newly created file using a Connection object, it shows the error "External table is not in the expected format". It seems the values in the cells are not in a correct format.

    Can anybody help me with one of the methods above, or give me another idea?

    I am eagerly waiting for an answer.
    varinder
  • !NoItAll
    Contributor
    • May 2006
    • 297

    #2
    HTML is not a data structure and therefore cannot be accessed like one.
    XML is a data structure, and there are rules for XML that don't exist for HTML. What you are doing is called "screen scraping" and is a dubious activity at best.
    Here's an example:
    In XML, tags can identify a string as a piece of data, such as a <firstname>, a <lastname>, or a <date>. HTML can only identify how the string is to be displayed (big, bold, and blue). HTML is a display structure.
    You can only hope that whoever creates the HTML will never change how they do it, but they will. HTML is also typically not well formed (unless it is XHTML) and therefore cannot be parsed using standard XML tools.
    In order to parse HTML you will pretty much be forced to do it all manually with InStr, Left, Right, Mid, Replace, etc. It gets pretty ugly, and the code has to change every time the page author changes their mind.

    '************** *************** *************** *************** ****
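    Public Function GetDatefromHTML(ByVal sHTML As String) As Date
        Dim sTemp As String
        Dim dDate As Date

        ' Trap errors from here on: if "aktiva" is missing, InStr returns 0 and Mid$ raises an error
        On Error GoTo BadDate
        ' Keep everything from the first "aktiva" (case-insensitive) onwards
        sTemp = Mid$(sHTML, InStr(1, sHTML, "aktiva", vbTextCompare))
        dDate = CDate(Mid$(sTemp, [number of chars into sTemp the date begins], [length of the date string]))
        On Error GoTo 0
        GetDatefromHTML = dDate
        Exit Function

    BadDate:
        ' Marker not found, or the text is not a valid date: return the fallback date
        GetDatefromHTML = DateSerial(1970, 1, 1)
    End Function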
    '************** *************** *************** *************** **************
    There are lots of other ways to do this; this one is pretty quick and dirty. The CDate function (built into VB) converts most any valid date string into a Date type, and the result is interpreted and displayed according to your computer's locale settings.
    If the HTML changes, though, or the date is bad, the function returns 1/1/1970 (you can pick any date, but it has to return a date).
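
    Since the question asks for two dates and the exact offsets are not known, here is a rough sketch of one alternative that avoids fixed character positions: strip the tags that follow the marker, split the remaining text on spaces, and keep the first two tokens that IsDate accepts. GetDatesAfterMarker and StripTags are hypothetical names made up for this illustration, and the sketch assumes the dates appear as single space-free tokens (such as 31.12.2006 or 2006-12-31). IsDate is permissive and locale-dependent, so treat this as a starting point rather than a finished parser.

```vb
' Hypothetical helper (not from the original post): returns a Collection
' holding up to two dates found after the first occurrence of "AKTIVA".
Public Function GetDatesAfterMarker(ByVal sHTML As String) As Collection
    Dim colDates As New Collection
    Dim lPos As Long
    Dim sText As String
    Dim vTokens As Variant
    Dim i As Long

    Set GetDatesAfterMarker = colDates

    lPos = InStr(1, sHTML, "AKTIVA", vbTextCompare)
    If lPos = 0 Then Exit Function      ' marker missing: return an empty collection

    ' Drop the markup after the marker, then turn other whitespace into spaces
    sText = StripTags(Mid$(sHTML, lPos + Len("AKTIVA")))
    sText = Replace(sText, "&nbsp;", " ")
    sText = Replace(sText, vbCr, " ")
    sText = Replace(sText, vbLf, " ")
    sText = Replace(sText, vbTab, " ")

    vTokens = Split(sText, " ")
    For i = LBound(vTokens) To UBound(vTokens)
        ' The length check is an assumption: it skips short numbers
        ' that IsDate might otherwise accept as dates or times
        If Len(vTokens(i)) >= 6 And IsDate(vTokens(i)) Then
            colDates.Add CDate(vTokens(i))
            If colDates.Count = 2 Then Exit For
        End If
    Next i
End Function

' Hypothetical helper: replaces every <...> tag with a single space.
Private Function StripTags(ByVal s As String) As String
    Dim lOpen As Long
    Dim lClose As Long

    lOpen = InStr(1, s, "<")
    Do While lOpen > 0
        lClose = InStr(lOpen, s, ">")
        If lClose = 0 Then Exit Do      ' unterminated tag: leave the rest alone
        s = Left$(s, lOpen - 1) & " " & Mid$(s, lClose + 1)
        lOpen = InStr(lOpen, s, "<")
    Loop
    StripTags = s
End Function
```

    The caller can check the Count property of the returned Collection to see whether both dates were found. This still breaks if the page author changes the layout, just like the fixed-offset version.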
