Trying to find substring efficiently.

  • Rob S
    New Member
    • Jan 2011
    • 14


    Hi,
    I am not sure where I am going wrong with this code.
    It seems to work fine on small text files, but on files larger than 100MB it does not give me an accurate count.
    The program is pretty simple. It should return the number of occurrences of a substring within a text file. I look for a match on the first character of the substring and, if I find one, my code should test whether the whole substring is there.
    I have run several tests but for some reason I am not getting an accurate result. I used the windowsupdate.log file: my code returned 34596 hits, but when I used MS Word to count the occurrences it returned 50096. Obviously there is something wrong with my code but I can't seem to figure it out.
    Any help would be highly appreciated. Thanks a lot.
    Rob

    Here is the code:
    Imports System.IO
    Imports System.Diagnostics
    Imports System.Threading.Tasks
    Imports System.Text
    
    
    
    Module Module1
    
        Sub Main()
            Dim sw As New Stopwatch
            sw.Start()
            Dim fs As New FileStream("C:\mpi\windowsupdate.log", FileMode.Open, FileAccess.Read)
            Dim br As New StreamReader(fs)
            Dim searchterm As String = "WINDOWS"
            Dim bytesTerm As Byte() = Encoding.ASCII.GetBytes(searchterm)
            Dim WordLen As Integer = searchterm.Length - 1
            Dim matchcount As Integer = 0
            Dim matches As Boolean = False
            Dim c As Byte
            Console.WriteLine("Processing...")
            While br.Peek <> -1
                c = br.Read
                If c = bytesTerm(0) Then
                    matches = True
                    For i = 1 To WordLen
                        c = br.Read
                        If c = bytesTerm(i) Then
                            matches = True
                        Else
                            matches = False
                        End If
                    Next
                    If matches = True Then
                        matchcount += 1
                        matches = False
                    End If
                End If
            End While
            sw.Stop()
            Console.WriteLine("Total matches:" & matchcount)
            Console.WriteLine("Total time:" & sw.Elapsed.Seconds)
            Console.ReadLine()
            br.Close()
            fs.Close()
    
        End Sub
    
    End Module
  • Rabbit
    Recognized Expert MVP
    • Jan 2007
    • 12517

    #2
    A couple of things. First, you need to exit the For loop as soon as a character fails to match. Otherwise matches is reassigned on every pass, so only the last character decides the result: if the last letter matches, it counts as a hit. Second, you're only matching capital letters; what about lowercase?
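A minimal sketch of the scan with Rabbit's first fix applied, in the thread's own VB.NET. It assumes the search term never contains a line break (true for "WINDOWS"), so the file can be read one line at a time; the module and function names (`SubstringCount`, `CountInLine`, `CountInFile`) are hypothetical. Besides leaving the inner loop on the first mismatch, it starts a fresh comparison at every position instead of continuing from wherever the failed comparison stopped, so characters consumed by a failed comparison (such as the second W in "WWINDOWS") are not skipped the way they are in the original loop. It also compares `Char` values directly instead of storing `br.Read` results in a `Byte`.

```vbnet
Imports System
Imports System.IO

Module SubstringCount

    ' Counts case-sensitive occurrences of term in line (hypothetical helper).
    ' Checks the first character, then compares the rest, leaving the inner
    ' loop as soon as one character differs.
    Function CountInLine(line As String, term As String) As Integer
        Dim count As Integer = 0
        Dim last As Integer = line.Length - term.Length ' last valid start index
        For start As Integer = 0 To last
            If line(start) <> term(0) Then Continue For
            Dim matches As Boolean = True
            For i As Integer = 1 To term.Length - 1
                If line(start + i) <> term(i) Then
                    matches = False
                    Exit For ' stop at the first mismatching character
                End If
            Next
            If matches Then count += 1
            ' The outer loop moves on to start + 1, so no character is skipped.
        Next
        Return count
    End Function

    ' Streams the file one line at a time (hypothetical helper).
    ' Assumes term contains no line breaks, so no match can span two lines.
    Function CountInFile(path As String, term As String) As Integer
        Dim total As Integer = 0
        Using reader As New StreamReader(path)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                total += CountInLine(line, term)
                line = reader.ReadLine()
            End While
        End Using
        Return total
    End Function

End Module
```

To time it as in the original program, call `CountInFile("C:\mpi\windowsupdate.log", "WINDOWS")` between `sw.Start()` and `sw.Stop()` and print `sw.Elapsed.TotalSeconds`. `Elapsed.Seconds` is only the seconds component of the `TimeSpan`, so a run longer than a minute would be reported incorrectly.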


    • Rob S
      New Member
      • Jan 2011
      • 14

      #3
      Hey Rabbit,
      Thanks a lot for your quick response. I made the changes you suggested and it works fine now. As you noticed, the search is case sensitive and will only return an exact hit, which is what I was looking for anyway. I might modify the algorithm at some point to work with regular expressions as well.
      By the way, have you ever worked with MPI or any kind of distributed applications?
      Take care.
      Rob
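If the search later needs to ignore case or accept regular expressions, one option is `System.Text.RegularExpressions.Regex`. This sketch is an assumption rather than anything from the thread: the names `RegexCount`, `CountMatches`, and `ignoreCase` are hypothetical, and it again assumes a match never spans a line break. MS Word's Find ignores case unless "Match case" is checked, which may account for part of the gap between the two counts in the original post.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module RegexCount

    ' Counts non-overlapping regex matches in a file, one line at a time
    ' (hypothetical helper; assumes no match spans a line break).
    Function CountMatches(path As String, pattern As String, ignoreCase As Boolean) As Integer
        Dim options As RegexOptions = RegexOptions.CultureInvariant
        If ignoreCase Then options = options Or RegexOptions.IgnoreCase
        Dim rx As New Regex(pattern, options) ' compiled once, reused per line
        Dim total As Integer = 0
        For Each line As String In File.ReadLines(path) ' streams lazily
            total += rx.Matches(line).Count
        Next
        Return total
    End Function

End Module
```

`Regex.Escape` turns a plain term into a literal pattern, so `CountMatches("C:\mpi\windowsupdate.log", Regex.Escape("WINDOWS"), True)` counts "WINDOWS", "Windows", "windows" and so on, while an unescaped pattern uses full regular-expression syntax. `Regex.Matches` returns non-overlapping matches, which makes no difference for a term like "WINDOWS" that cannot overlap itself.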


      • Rabbit
        Recognized Expert MVP
        • Jan 2007
        • 12517

        #4
        Sorry, can't say that I have.
