Array Problem When Index Value is Nothing

  • Robert Bevington

    Array Problem When Index Value is Nothing

    Hi all,

    I ran into memory problems while trying to search and replace in a very large
    text file. To solve this, I break the file up into chunks and run the search
    and replace on each chunk. This works fine and has solved the OutOfMemory
    problem.

    However, on the last loop, when the array c is written to CleanTMX, a number
    of 0x00 characters are written at the end of the file. This causes problems
    in a later XML transformation, as this character is not allowed in XML. I
    looked at the values in the array. The problem seems to be caused by the
    elements at the end of the array being set to Nothing.

    Question: How can I get rid of these characters? Or how can I reduce the
    array to only contain index values that are not Nothing?

    Here's the code that writes the CleanTMX file:

    ' ReadChunkSize is a user-defined setting, normally set to 10000
    Dim c(My.Settings.ReadChunkSize) As Char

    Using sr As StreamReader = New StreamReader(OriginalTMX, System.Text.Encoding.UTF8, True)
        Do While sr.Peek() >= 0
            sr.Read(c, 0, c.Length)
            Dim i As Integer
            For i = 0 To arrFind.Length - 1
                c = Regex.Replace(c, arrFind(i), arrReplace(i))
            Next
            Try
                Using sw As StreamWriter = New StreamWriter(CleanTMX, True, System.Text.Encoding.UTF8)
                    sw.Write(c)
                End Using

            Catch ex As Exception
            End Try
        Loop

    Would really appreciate any help on this one.

    Thanx

    Rob


  • Armin Zingler

    #2
    Re: Array Problem When Index Value is Nothing

    I'm not sure if it's correct in this context, but I think sr.Read
    returns the number of characters read. Hence, you have to write only as
    many characters as have been read.

    Dim charCount As Integer

    charCount = sr.Read(c, 0, c.Length)
    ...
    sw.Write(c, 0, charCount)

    I think this explains the additional characters.
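
A minimal sketch of that idea as a complete loop, under some assumptions: the helper name CopyWithReplace is hypothetical, and a StringReader/StringWriter pair stands in for the real files so the example is self-contained. It keeps the count returned by Read and builds the string from only those characters, so the unused tail of the array never reaches the output. It does not address search strings that are split across chunk boundaries.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module ReadCountSketch

    ' Hypothetical helper: copies reader to writer chunk by chunk,
    ' applying each find/replace pattern to every chunk.
    Sub CopyWithReplace(reader As TextReader, writer As TextWriter,
                        arrFind As String(), arrReplace As String(),
                        chunkSize As Integer)
        Dim c(chunkSize - 1) As Char   ' exactly chunkSize elements

        Do
            ' Read returns how many characters were actually stored in c.
            Dim charCount As Integer = reader.Read(c, 0, c.Length)
            If charCount = 0 Then Exit Do   ' end of input

            ' Use only the characters that were read, not the whole array.
            Dim chunkText As New String(c, 0, charCount)

            For i As Integer = 0 To arrFind.Length - 1
                chunkText = Regex.Replace(chunkText, arrFind(i), arrReplace(i))
            Next

            writer.Write(chunkText)
        Loop
    End Sub

    Sub Main()
        Dim output As New StringWriter()
        ' 8 characters read 3 at a time: the last chunk holds only 2.
        CopyWithReplace(New StringReader("abcabcab"), output,
                        New String() {"b"}, New String() {"X"}, 3)
        Console.WriteLine(output.ToString())   ' prints aXcaXcaX
    End Sub

End Module
```

With real files, the reader and writer would be the StreamReader and StreamWriter from the original post; opening the writer once outside the loop avoids reopening the file for every chunk.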

    However, you should reposition the file pointer after reading a chunk.
    I'm not sure if that's possible using the StreamReader because of the
    internal buffer, so you'd have to use a BinaryReader and do the UTF8
    decoding on your own, while being able to set the file pointer
    backwards. Otherwise, you will not recognize search strings that are
    split across chunk boundaries. For example,

    chunk #1: "Robert B"
    chunk #2: "evington"

    You don't find "Bev" in any of the chunks.


    Armin


    • SurturZ

      #3
      RE: Array Problem When Index Value is Nothing

      Can't you just REDIM PRESERVE to reduce the array size to get rid of the 0x00
      entries?
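
A small sketch of the ReDim Preserve idea, assuming the count returned by Read is kept in a variable (here called charCount) and using a StringReader in place of the real file. The new upper bound is charCount - 1, because a VB array bound is the highest index, not the element count.

```vbnet
Imports System
Imports System.IO

Module RedimSketch

    Sub Main()
        Dim c(9) As Char                        ' 10 elements, all initially ChrW(0)
        Dim reader As New StringReader("abc")   ' stand-in for the last, short chunk

        Dim charCount As Integer = reader.Read(c, 0, c.Length)

        ' Shrink the array only when the chunk was short; the guard also
        ' avoids resizing when nothing was read at all.
        If charCount > 0 AndAlso charCount < c.Length Then
            ReDim Preserve c(charCount - 1)
        End If

        Console.WriteLine(c.Length)             ' prints 3
    End Sub

End Module
```

Note that after shrinking, c stays small, so this only suits the final chunk; if Read can return a short count earlier in the file, the array would need to be restored to full size before the next read.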

      Armin is correct that you'll miss entries on chunk boundaries, BTW. One
      solution is to use the 'c' array as a buffer, appending newly read characters
      to the end, taking off characters to the output stream from the beginning,
      and always leaving at least n characters in 'c', where n=length of the
      biggest string you are looking for (minus one).
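
The buffering approach described above could look roughly like the sketch below. It rests on several assumptions: a single literal search string instead of the regex patterns in the original post, a hypothetical helper name ReplaceInChunks, and StringReader/StringWriter in place of the files. After each read it replaces every complete match, writes everything that can no longer be part of a match, and holds back the last find.Length - 1 characters for the next round.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ChunkedReplaceSketch

    ' Hypothetical helper: literal (non-regex) replace that also finds
    ' matches split across chunk boundaries.
    Sub ReplaceInChunks(reader As TextReader, writer As TextWriter,
                        find As String, replacement As String,
                        chunkSize As Integer)
        If String.IsNullOrEmpty(find) Then
            Throw New ArgumentException("find must not be empty")
        End If

        Dim chunk(chunkSize - 1) As Char
        Dim pending As New StringBuilder()      ' text carried over between reads
        Dim keep As Integer = find.Length - 1   ' longest possible partial match

        Do
            Dim charCount As Integer = reader.Read(chunk, 0, chunk.Length)
            If charCount = 0 Then Exit Do       ' end of input
            pending.Append(chunk, 0, charCount)

            Dim buffered As String = pending.ToString()
            Dim pos As Integer = 0

            ' Replace every complete match currently in the buffer.
            Dim idx As Integer = buffered.IndexOf(find, pos, StringComparison.Ordinal)
            Do While idx >= 0
                writer.Write(buffered.Substring(pos, idx - pos))
                writer.Write(replacement)
                pos = idx + find.Length
                idx = buffered.IndexOf(find, pos, StringComparison.Ordinal)
            Loop

            ' Write what is safe and hold back the last "keep" characters,
            ' which might be the start of a match completed by the next chunk.
            Dim safeEnd As Integer = Math.Max(pos, buffered.Length - keep)
            writer.Write(buffered.Substring(pos, safeEnd - pos))
            pending.Clear()
            pending.Append(buffered, safeEnd, buffered.Length - safeEnd)
        Loop

        ' Whatever is left cannot contain a complete match.
        writer.Write(pending.ToString())
    End Sub

    Sub Main()
        Dim output As New StringWriter()
        ' Chunk size 8 splits the input into "Robert B" and "evington".
        ReplaceInChunks(New StringReader("Robert Bevington"), output, "Bev", "Dev", 8)
        Console.WriteLine(output.ToString())    ' prints Robert Devington
    End Sub

End Module
```

Only original text is ever carried over, so replacement text is never searched again. With regex patterns the maximum match length is not generally known in advance, so the amount to hold back would have to be chosen per pattern.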

      --
      David Streeter
      Synchrotech Software
      Sydney Australia




      • Robert Bevington

        #4
        Re: Array Problem When Index Value is Nothing

        Hi Armin and Surtur,

        Thanx guys for your replies. Reading that my "great" solution to my
        problem didn't really work was a real downer for me :-) I was a broken man
        last night and went straight to bed :-) But that's what happens when
        beginners start programming, I suppose.

        I tried the ReDim Preserve. That might solve the one problem. I just need to
        find the correct value for the ReDim.

        Surtur's solution sounds interesting too. I'll look into both.

        Again thanx

        Rob


