Reading Content Of Web Pages Using Vb

  • Nilya
    New Member
    • Apr 2007
    • 6

    Reading Content Of Web Pages Using Vb

    Hi all,

    I need some code that can read the content of web pages using VB, that is, the text without the tags.

    Or else code that can remove all the tags from the page source after getting the source into a text file or a variable.

    Please let me know if this is possible.

    Mail if you can to (<Removed by Moderator>)

    Thanks in advance.
    Last edited by Killer42; Apr 25 '07, 09:53 AM. Reason: Removed e-mail address
  • ansumansahu
    New Member
    • Mar 2007
    • 149

    #2
    You can use the VB 6.0 Inet control to read the contents of the web page and then process them. Just do a Google search for "vb 6.0 Inet Control" and you will find information on this.

    -ansuman sahu
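
    For reference, a minimal sketch of that approach. It assumes a form with a Microsoft Internet Transfer Control named Inet1, a multi-line TextBox named Text1 and a button named Command1; these names and the URL are placeholders, not something from this thread.

    Code:
    Private Sub Command1_Click()
        Dim PageSource As String

        'OpenURL waits for the download to finish and returns the
        'document; icString asks for it as text rather than a byte array
        PageSource = Inet1.OpenURL("http://www.example.com/", icString)

        'PageSource now holds the raw HTML, tags included
        Text1.Text = PageSource
    End Sub
    What comes back is the page source, so the tags still have to be removed in a separate step.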

    • Nilya
      New Member
      • Apr 2007
      • 6

      #3
      I am using the same control, but the problem is that it returns the HTML source, and I am having difficulty retrieving the main content from it.

      E.g.:

      <td width="99%" class="Small"> Also include <span class="SmallBold"> Resume Summary </span></td>

      I want "Resume Summary".
      This is just an example.

      Thanks for the reply.

      • Robbie
        New Member
        • Mar 2007
        • 180

        #4
        If you simply want to remove all text between '<' and '>', I'll make a function for that, it shouldn't be very hard. ;)
        After making function: Okay, it was a little harder than I expected. ~_~;

        Code:
        Public Function StripHTMLTags(ByVal OriginalHTMLCode As String, Optional ByVal TagReplaceText As String = "") As String
        '
        'OriginalHTMLCode - HTML code to strip tags from
        'TagReplaceText - What this function will put in place of each
        'tag (by default, nothing - an empty string)
        '
        'Gives back the HTML code with tags replaced by TagReplaceText
        '
        'Both arguments are ByVal so the caller's strings are never modified
        '
            Dim StartTagPos As Long
            Dim EndTagPos As Long
            Dim TempTagPos As Long
            Dim SearchPos As Long

            Dim StartTagNum As Long
            Dim EndTagNum As Long

            Dim TempChar As String

            SearchPos = 1
            StartTagPos = InStr(SearchPos, OriginalHTMLCode, "<")

            While StartTagPos > 0
                'An open tag has been found
                StartTagNum = 1
                EndTagNum = 0

                'Keep searching until the same number of '<' and '>'
                'have been found (i.e. until nested tags finish >_<)
                TempTagPos = StartTagPos + 1

                While (EndTagNum < StartTagNum And TempTagPos <= Len(OriginalHTMLCode))

                    TempChar = Mid$(OriginalHTMLCode, TempTagPos, 1)
                    If TempChar = "<" Then StartTagNum = StartTagNum + 1
                    If TempChar = ">" Then EndTagNum = EndTagNum + 1

                    TempTagPos = TempTagPos + 1
                Wend

                'Position of the closing '>' (or end of string if the tag never closes)
                EndTagPos = TempTagPos - 1

                'Rebuild the string: text before the tag + replacement + text after the tag
                OriginalHTMLCode = Left$(OriginalHTMLCode, StartTagPos - 1) & TagReplaceText & Mid$(OriginalHTMLCode, EndTagPos + 1)

                'Carry on after the replacement, so a '<' inside
                'TagReplaceText is not treated as a new tag
                SearchPos = StartTagPos + Len(TagReplaceText)
                StartTagPos = InStr(SearchPos, OriginalHTMLCode, "<")
            Wend

            'Assigning here also covers input that contains no tags at all
            StripHTMLTags = OriginalHTMLCode

        End Function
        Here's an example of how to use it and what it does.
        Text1.Text is:
        Code:
        <html>
        <b>Hi!!</b>
        Here's <i>more</i>.
        </html>
        Yep.
        Execute this:
        Text2.Text = StripHTMLTags(Text1.Text)

        Text2.Text is now:
        Code:
        Hi!! 
        Here's more.
        
        Yep.
        Hope it's what you needed. :)
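
        Applied to the sample line from post #3, a call might look roughly like this. It is only a sketch: Demo, Sample and Cleaned are illustrative names, and it assumes StripHTMLTags is in the same project.

        Code:
        Public Sub Demo()
            Dim Sample As String
            Dim Cleaned As String

            'Double quotes inside a VB string literal are written twice
            Sample = "<td width=""99%"" class=""Small""> Also include <span class=""SmallBold""> Resume Summary </span></td>"

            'Strip the tags, then trim the spaces left at both ends
            Cleaned = Trim$(StripHTMLTags(Sample))

            'Show the result in the Immediate window
            Debug.Print Cleaned
        End Sub
        Note that this keeps all the text outside the tags, so "Also include" stays in the result too. Picking out only the text inside the span would need an extra InStr search for that tag first.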

        • Killer42
          Recognized Expert Expert
          • Oct 2006
          • 8429

          #5
          Originally posted by Nilya
          ...
          Mail if can to (<Removed by Moderator>)
          Hi.

          Just a note to let you know I've removed your e-mail address from the post. See the posting guidelines.

          • Robbie
            New Member
            • Mar 2007
            • 180

            #6
            Originally posted by Killer42
            Hi.

            Just a note to let you know I've removed your e-mail address from the post. See the posting guidelines.
            Err, Killer, it's still in the second post by ansumansahu. ;)

            • Killer42
              Recognized Expert Expert
              • Oct 2006
              • 8429

              #7
              Originally posted by Robbie
              Err, Killer, it's still in the second post by ansumansahu.
              No it isn't. :p

              • Nilya
                New Member
                • Apr 2007
                • 6

                #8
                OK, I have solved the problem of removing the tags. I did it by saving the source to a text file and then removing the tags.
                But what I need now is to store the source code in a String variable using the Inet or WebBrowser control. That is possible, but I think the variable has some limit on the number of characters.

                So is there any other way to store the source code in a variable?

                Thanks,
                Nilesh Patil
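
                On the character limit: as far as I know, a variable-length String in VB6 can hold roughly 2 billion characters, so the variable itself is unlikely to be the bottleneck. If the source looks cut off, one possible cause is that only part of the response was read. A minimal sketch of reading the whole response in chunks follows. It assumes an Internet Transfer Control named Inet1 and a button named Command1; PageSource and the URL are illustrative.

                Code:
                'Module-level variable so the page is still there after the event ends
                Private PageSource As String

                Private Sub Command1_Click()
                    PageSource = ""
                    'Execute returns at once; the data arrives via StateChanged
                    Inet1.Execute "http://www.example.com/", "GET"
                End Sub

                Private Sub Inet1_StateChanged(ByVal State As Integer)
                    Dim Chunk As String

                    If State = icResponseCompleted Then
                        'Read 1024 characters at a time until nothing is left
                        Do
                            Chunk = Inet1.GetChunk(1024, icString)
                            PageSource = PageSource & Chunk
                        Loop While Len(Chunk) > 0
                    End If
                End Sub
                Once the loop finishes, PageSource holds the complete source and can be passed straight to a tag-stripping routine without going through a text file.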
