Writing out text with nulls

  • tshad

    Writing out text with nulls

    I have a program in Visual Studio 2005 that reads a text file, removes some
    text, and then writes it back out again. It removes lines that start with PRINT.

    This program has worked fine for months. Now, all of a sudden, it is reading
    a straight text file and adding a null after each character it reads in.
    Why is that?

    The original file doesn't have nulls in it. The code is:
    ********************************************
    using System;
    using System.IO;
    using System.Collections.Generic;
    using System.Text;

    namespace DeletePrintStatements
    {
        class Program
        {
            static void Main(string[] args)
            {
                string lineDisplay;
                string oldLineDisplay;
                FileStream fs = null;

                StreamReader sr = null;
                fs = new FileStream(@"D:\Database Scripts\Current Schema101408.sql", FileMode.Open, System.IO.FileAccess.Read);
                sr = new StreamReader(fs);

                StreamWriter sw = null;
                sw = File.CreateText(@"D:\Database Scripts\Current SchemaNoPrint101408.sql");

                string stemp = null;
                sw.WriteLine("set nocount on");

                while (sr.Peek() >= 0)
                {
                    lineDisplay = sr.ReadLine();

                    if (lineDisplay.Length >= 4) stemp = lineDisplay.Substring(0, 4);

                    if ((lineDisplay.Length < 5) || (lineDisplay.Substring(0, 5) != "PRINT"))
                        sw.WriteLine(lineDisplay);
                    else
                    {
                        // Since last line was not a Print statement make sure
                        // next line is = "GO" and if so ignore it

                        if (sr.Peek() >= 0)
                        {
                            oldLineDisplay = lineDisplay;
                            lineDisplay = sr.ReadLine();
                            if ((lineDisplay.Length < 2) || (lineDisplay.Substring(0, 2) != "GO"))
                            {
                                sw.WriteLine(oldLineDisplay); // Should only be the "Update Succeeded" line
                                                              // or a print statement inside of a SP
                                sw.WriteLine(lineDisplay);
                            }
                        }
                    }
                    Console.WriteLine(lineDisplay);
                }
                fs.Close();
                sr.Close();
                sw.Close();
                Console.ReadLine();
            }
        }
    }
    ********************************************

    I have tried closing and reopening the program, but it keeps doing the same
    thing.

    Thanks,

    Tom


  • jimbrown

    #2
    Re: Writing out text with nulls

    The output you describe is what Unicode characters would look like.
    Maybe your project changed from multi-byte to Unicode.


    • cfps.Christian

      #3
      Re: Writing out text with nulls

      Not sure where your text file is coming from, but I've had similar
      problems. One of the problems I ran into is that there are characters
      which Visual Studio (in general), as well as many text editors, cannot
      render. The way I ended up finding out that my text file was bad was
      to put it through Programmer's Notepad or the command prompt edit
      window.

      Now, to edit these characters out I had to do a string.Replace for each
      of these characters using its integer value. It's something like
      character values 1 - 26 that cannot be rendered by normal text editors.

      This may or may not be your problem, but I figured I'd offer at least
      an idea.
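
A minimal sketch of the "replace by integer value" idea above, written as a single pass instead of one string.Replace call per character. The class and method names (`ControlCharSketch`, `StripControlChars`) and the sample string are made up for illustration. Note that this only hides the symptom when the real cause is an encoding mismatch: stripping nulls happens to recover plain ASCII text, but it would mangle any character outside the ASCII range.

```csharp
using System;
using System.Text;

static class ControlCharSketch
{
    // Returns the text with control characters removed,
    // keeping tab, carriage return and line feed.
    public static string StripControlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (!char.IsControl(c) || c == '\t' || c == '\r' || c == '\n')
                sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Hypothetical sample: SOH, SUB and NUL mixed into a script line.
        string dirty = "GO\u0001\u001A\tSELECT 1\0";
        Console.WriteLine(StripControlChars(dirty).Length); // prints 11
    }
}
```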


      • tshad

        #4
        Re: Writing out text with nulls

        Here is the file I am reading:

        SET NUMERIC_ROUNDABORT OFF
        GO
        SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON
        GO

        Here is what it comes up with:

        0: 73 65 74 20 6E 6F 63 6F 75 6E 74 20 6F 6E 0D 0A  set nocount on..
        10: 53 00 45 00 54 00 20 00 4E 00 55 00 4D 00 45 00  S.E.T. .N.U.M.E.
        20: 52 00 49 00 43 00 5F 00 52 00 4F 00 55 00 4E 00  R.I.C._.R.O.U.N.
        30: 44 00 41 00 42 00 4F 00 52 00 54 00 20 00 4F 00  D.A.B.O.R.T. .O.
        40: 46 00 46 00 0D 0A 00 0D 0A 00 47 00 4F 00 0D 0A  F.F.......G.O...
        50: 00 0D 0A 00 53 00 45 00 54 00 20 00 41 00 4E 00  ....S.E.T. .A.N.
        60: 53 00 49 00 5F 00 50 00 41 00 44 00 44 00 49 00  S.I._.P.A.D.D.I.
        70: 4E 00 47 00 2C 00 20 00 41 00 4E 00 53 00 49 00  N.G.,. .A.N.S.I.
        80: 5F 00 57 00 41 00 52 00 4E 00 49 00 4E 00 47 00  _.W.A.R.N.I.N.G.
        90: 53 00 2C 00 20 00 43 00 4F 00 4E 00 43 00 41 00  S.,. .C.O.N.C.A.
        A0: 54 00 5F 00 4E 00 55 00 4C 00 4C 00 5F 00 59 00  T._.N.U.L.L._.Y.
        B0: 49 00 45 00 4C 00 44 00 53 00 5F 00 4E 00 55 00  I.E.L.D.S._.N.U.
        C0: 4C 00 4C 00 2C 00 20 00 41 00 52 00 49 00 54 00  L.L.,. .A.R.I.T.
        D0: 48 00 41 00 42 00 4F 00 52 00 54 00 2C 00 20 00  H.A.B.O.R.T.,. .
        E0: 51 00 55 00 4F 00 54 00 45 00 44 00 5F 00 49 00  Q.U.O.T.E.D._.I.
        F0: 44 00 45 00 4E 00 54 00 49 00 46 00 49 00 45 00  D.E.N.T.I.F.I.E.
        100: 52 00 2C 00 20 00 41 00 4E 00 53 00 49 00 5F 00  R.,. .A.N.S.I._.
        110: 4E 00 55 00 4C 00 4C 00 53 00 20 00 4F 00 4E 00  N.U.L.L.S. .O.N.
        120: 0D 0A 00 0D 0A 00 47 00 4F 00 0D 0A 00 0D 0A 00  ......G.O.......
        130: 0D 0A  ..


        As you can see, the line that was added (set nocount on) doesn't have nulls, but the lines it read in do.

        What would cause this?

        Thanks,

        Tom

        • Peter Duniho

          #5
          Re: Writing out text with nulls

          Please do not post HTML. Use plain text. As for the question...

          On Tue, 14 Oct 2008 13:14:59 -0700, tshad <tfs@dslextreme.com> wrote:
          > Here is the file I am reading: [...]

          Where did that file come from? As Jim suggested, the text with the 0
          bytes does in fact look like Unicode characters (UTF-16, to be specific).
          The bytes you posted mix UTF-8 and UTF-16 (UTF-8 is the default for
          StreamWriter, and as long as the characters are all in the 0-127 range
          it is indistinguishable from ASCII), because you're reading UTF-16 data
          from the original file and emitting that data as if it were UTF-8 (along
          with the other UTF-8 stuff you've added, such as the first line and the
          line breaks).

          Whatever the problem is, it's related to whatever outputs the file you're
          reading. Somewhere along the line, it apparently got changed to output
          UTF-16. You can either fix your program to read the input as UTF-16
          instead, or you can go smack upside the head whatever person it was that
          changed the output format without consulting the people it would affect
          (such as yourself), and then get them to change it back so that they are
          writing UTF-8 or ASCII again (whatever it was that was being written in
          the first place).

          Pete
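
A small sketch that reproduces the byte pattern from the hex dump, under the assumption that the input is UTF-16LE without a byte order mark and is decoded as UTF-8. The names `EncodingMismatchSketch` and `RoundTrip` are made up for illustration, and the sample text is a shortened stand-in for the real script. Decoded as UTF-8, every 00 byte becomes a U+0000 character, and ReadLine splits at both the 0D and the 0A because a null now sits between them, which is where the stray `00 0D 0A` groups come from.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingMismatchSketch
{
    // Reads lines from inputBytes using readAs, then writes them back out
    // through a default StreamWriter (UTF-8, no byte order mark).
    public static byte[] RoundTrip(byte[] inputBytes, Encoding readAs)
    {
        var output = new MemoryStream();
        using (var reader = new StreamReader(new MemoryStream(inputBytes), readAs, false))
        using (var writer = new StreamWriter(output))
        {
            writer.NewLine = "\r\n"; // fixed so the result is the same on every platform
            string line;
            while ((line = reader.ReadLine()) != null)
                writer.WriteLine(line);
        }
        return output.ToArray(); // ToArray still works after the stream is closed
    }

    static void Main()
    {
        // Encoding.Unicode is UTF-16LE; GetBytes does not add a byte order mark.
        byte[] utf16 = Encoding.Unicode.GetBytes("OFF\r\nGO\r\n");

        // Wrong: decode the UTF-16 bytes as UTF-8.
        Console.WriteLine(BitConverter.ToString(RoundTrip(utf16, Encoding.UTF8)));
        // 4F-00-46-00-46-00-0D-0A-00-0D-0A-00-47-00-4F-00-0D-0A-00-0D-0A-00-0D-0A

        // Right: decode them as UTF-16.
        Console.WriteLine(BitConverter.ToString(RoundTrip(utf16, Encoding.Unicode)));
        // 4F-46-46-0D-0A-47-4F-0D-0A
    }
}
```

The first output line matches the `46 00 46 00 0D 0A 00 0D 0A 00 47 00 4F 00 0D 0A` run at offset 40 in the posted dump.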


          • tshad

            #6
            Re: Writing out text with nulls

            Found out what was going on. Just not sure why.

            It seems to be written out in Unicode (the hex shows it that way), but the
            program sees it as ANSI (UTF-8, I assume). And the program handles it fine.

            But if I make any change (in TextPad or Notepad), it now shows each
            character as having a blank character after it when it writes it out.
            Then when you look at it in TextPad it shows a black box between each
            character, and Notepad shows a blank between each character.

            Not sure why they are different. In both cases, there were nulls between
            each character. But the editors treated them differently.

            Tom


            • Michael B. Trausch

              #7
              Re: Writing out text with nulls

              On Tue, 14 Oct 2008 14:55:14 -0700
              "tshad" <tfs@dslextreme.com> wrote:
              > But if I make any change (in TextPad or Notepad), it now shows each
              > character as having a blank character after it when it writes it
              > out. Then when you look at it in TextPad it shows a black box between
              > each character, and Notepad shows a blank between each character.
              >
              > Not sure why they are different. In both cases, there were nulls
              > between each character. But the editors treated them differently.

              The text editor is probably set up to use UTF-16 encoding for
              characters. Per MSDN, UTF-16 is the internal encoding used in Windows
              and .NET.[1] Java uses it as well, IIRC. The editor could be saving the
              file that way if the system configuration has somehow changed to do
              that, but I don't know what would be involved in such a thing.

              In any case, if you can manage to do it, you should probably try to
              detect the character set of the file before processing it, so that your
              program can handle it appropriately. UTF-16 is pretty easy to detect
              for documents that contain characters which mostly or completely fit in
              the ASCII character set, and most ASCII-compatible ones are detectable
              if you know their rules; ASCII-compatible charsets use 0-127
              identically to ASCII. You could, in theory, detect UTF-16 and
              compensate for that, and otherwise just read bytes in the range of
              33-127, as a (very simple, but not terribly robust) way of dealing
              with files that may have an arbitrary charset.

              --- Mike

              --
              My sigfile ran away and is on hiatus.
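
One way the detection idea above could be sketched: check for a byte order mark first, and if there is none, look for the "every other byte is zero" pattern that mostly-ASCII UTF-16 text produces. The names `EncodingGuessSketch` and `GuessEncoding`, the "more than half of the byte pairs" threshold, and the UTF-8 fallback are all assumptions chosen for illustration, not an established algorithm. UTF-32 and legacy code pages are deliberately ignored.

```csharp
using System;
using System.Text;

static class EncodingGuessSketch
{
    // Guesses the encoding of a byte sample taken from the start of a file.
    public static Encoding GuessEncoding(byte[] sample)
    {
        // 1. A byte order mark, when present, is the reliable signal.
        if (sample.Length >= 3 && sample[0] == 0xEF && sample[1] == 0xBB && sample[2] == 0xBF)
            return Encoding.UTF8;
        if (sample.Length >= 2 && sample[0] == 0xFF && sample[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 little endian
        if (sample.Length >= 2 && sample[0] == 0xFE && sample[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 big endian

        // 2. No BOM: count zero bytes at even and odd offsets.
        int evenZeros = 0, oddZeros = 0;
        for (int i = 0; i < sample.Length; i++)
        {
            if (sample[i] != 0) continue;
            if (i % 2 == 0) evenZeros++; else oddZeros++;
        }

        // ASCII text in UTF-16LE has its zero bytes at odd offsets only,
        // in UTF-16BE at even offsets only.
        int pairs = sample.Length / 2;
        if (pairs > 0 && oddZeros > pairs / 2 && evenZeros == 0)
            return Encoding.Unicode;
        if (pairs > 0 && evenZeros > pairs / 2 && oddZeros == 0)
            return Encoding.BigEndianUnicode;

        // 3. Fallback (an assumption): treat everything else as UTF-8.
        return Encoding.UTF8;
    }

    static void Main()
    {
        byte[] utf16 = Encoding.Unicode.GetBytes("GO\r\n");            // no BOM
        byte[] ascii = Encoding.ASCII.GetBytes("set nocount on\r\n");

        Console.WriteLine(GuessEncoding(utf16).CodePage); // prints 1200 (UTF-16LE)
        Console.WriteLine(GuessEncoding(ascii).CodePage); // prints 65001 (UTF-8)
    }
}
```

The returned encoding can then be passed to the StreamReader constructor that takes an Encoding, so the rest of the program works with correctly decoded strings.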


              • Franck

                #8
                Re: Writing out text with nulls

                Your problem seems to be the file format.

                Try
                sw = new StreamWriter(fs, System.Text.Encoding.UTF8);
                With the reader you can do the same. Always try to specify the format you
                are reading when it's a non-binary file.
                Obviously System.Text.Encoding contains other formats, like ASCII, UTF-16
                (Encoding.Unicode), and more. Choose one and stick with it.

                But those are SQL queries, so they shouldn't be using anything other than
                ASCII or UTF-8. And right now the file your code reads seems to be UTF-16.
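
A sketch of what specifying the encoding on both sides could look like for a filter like the one in this thread. It assumes the input is known to be UTF-16LE and that UTF-8 without a byte order mark is wanted on output. The names `ExplicitEncodingSketch` and `CopyWithoutPrintLines` are hypothetical, the paths come from the command line instead of being hard-coded, and the filter is simplified to dropping lines that start with PRINT (the original program's handling of the following GO line is left out).

```csharp
using System;
using System.IO;
using System.Text;

static class ExplicitEncodingSketch
{
    // Copies inputPath to outputPath, skipping lines that start with "PRINT".
    // Both encodings are passed in explicitly instead of relying on defaults.
    public static void CopyWithoutPrintLines(string inputPath, Encoding inputEncoding,
                                             string outputPath, Encoding outputEncoding)
    {
        using (var reader = new StreamReader(inputPath, inputEncoding))
        using (var writer = new StreamWriter(outputPath, false, outputEncoding))
        {
            writer.WriteLine("set nocount on");

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (!line.StartsWith("PRINT", StringComparison.Ordinal))
                    writer.WriteLine(line);
            }
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: <input.sql> <output.sql>");
            return;
        }

        // Assumption: the input is UTF-16LE; the output is UTF-8 without a BOM.
        CopyWithoutPrintLines(args[0], Encoding.Unicode, args[1], new UTF8Encoding(false));
    }
}
```

Because the reader decodes UTF-16 into ordinary strings before any comparison happens, the `StartsWith("PRINT")` test sees real characters rather than characters interleaved with nulls, and the writer emits one consistent encoding for both the added first line and the copied lines.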
