An input UTF-8 encoded file is output as an ANSI encoded file. Why?

  • Joe Campanini
    New Member
    • May 2011
    • 58

    I am reading a UTF-8 encoded file into memory using the following code:

    Code:
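                // Assumes "using System;" and "using System.IO;" at the top of the file;
                // line, compareData and pathNames are declared elsewhere.
                try
                {
                    // The using block disposes the reader even if an exception is thrown.
                    using (StreamReader readFile = new StreamReader(pathNames[0]))
                    {
                        // Assign and test in one step so the null returned at end of file is never added.
                        while ((line = readFile.ReadLine()) != null)
                        {
                            compareData.Add(line);
                        }
                    }
                }
                catch (Exception f)
                {
                    Console.WriteLine(f.Message);
                    Console.ReadLine();
                }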
    The input data includes some special characters such as "ü" and "ß". When I check the input "line" in debug mode, and in the subsequent output file, those characters look like this: "�". Why?

    Joe
  • RhysW
    New Member
    • Mar 2012
    • 70

    #2
    Because the software you are storing it in doesn't know what those characters are. Those characters aren't part of the font that's being used, so there just isn't an equivalent graphical representation of the code value of those characters.

    • Joe Campanini
      New Member
      • May 2011
      • 58

      #3
      Reply to RhysW

      Originally posted by RhysW
      Because the software you are storing it in doesn't know what those characters are. Those characters aren't part of the font that's being used, so there just isn't an equivalent graphical representation of the code value of those characters.
      The software is Visual C# 2010, and it does recognize the character, because I can key in Alt+129 in a string and the character ü appears. No, this has something to do with the way that the file is being read; I just don't know what! I have just seen that if I open the .txt file using Excel the same thing happens, but if I open the file with Notepad the characters are OK. But hey, thank you for at least making a sensible suggestion, I appreciate it. Joe
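
One way to see how a "�" can come from the way the file is read: the sketch below decodes the same two bytes twice. It assumes, purely for illustration, that the file really holds single-byte ANSI text, with ISO-8859-1 standing in for the ANSI code page ("ü" is 0xFC and "ß" is 0xDF there). `DecodeDemo` and its methods are hypothetical names, not part of the original program.

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    // Interpret the bytes as UTF-8; bytes that are not valid UTF-8 become U+FFFD.
    public static string DecodeAsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Interpret the bytes as ISO-8859-1 (assumed stand-in for the ANSI code page).
    public static string DecodeAsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    // Render each char as a code point so the result does not depend on the console font.
    public static string CodePoints(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            sb.AppendFormat("U+{0:X4} ", (int)c);
        }
        return sb.ToString().Trim();
    }

    static void Main()
    {
        byte[] ansiBytes = { 0xFC, 0xDF }; // "üß" saved as single-byte text

        // Read as UTF-8: the bytes are invalid, so replacement characters (U+FFFD) appear.
        Console.WriteLine(CodePoints(DecodeAsUtf8(ansiBytes)));

        // Read with the matching single-byte encoding: U+00FC and U+00DF come back.
        Console.WriteLine(CodePoints(DecodeAsLatin1(ansiBytes)));
    }
}
```

If this is what is happening, the characters are already lost by the time `ReadLine` returns, which would match what the debugger shows.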

      • RhysW
        New Member
        • Mar 2012
        • 70

        #4
        No, I mean the file, not Visual Studio or its equivalent. I mean the literal file that the text is stored in: if you are reading a file created in Notepad, I think that opening it in Notepad would show that question mark instead of the character. If you open some files in Notepad and it doesn't know a symbol, it displays that question mark in its place. This might be the problem, though I haven't checked.

        Edit: checking in Notepad, it does support those characters, so I'm not sure. Which program was the file actually created with?

        • Joe Campanini
          New Member
          • May 2011
          • 58

          #5
          Thanks for your input, and sorry I did not reply sooner, but I managed to get round the problem, I think. I opened the .txt file with Notepad, copied and pasted the whole file into a new Notepad window, and saved it as a UTF-8 file. The C# program seems happy with this, but Excel still doesn't like it. I have seen some funny things in my life in the IT world, but this has got to be one of the strange ones. I bet the answer is really simple, but I don't have time to investigate. Once again, thank you for your input. Joe
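
The Notepad round trip above can probably be reproduced in code. This is a minimal sketch, assuming the original file was saved in a single-byte encoding that the caller knows; ISO-8859-1 is used in the demo only as an example. `ReencodeDemo` and `ConvertToUtf8` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

static class ReencodeDemo
{
    // Rewrite a text file as UTF-8 with a byte order mark.
    // sourceEncoding must be the encoding the file was really saved in;
    // that is an assumption the caller has to supply.
    public static void ConvertToUtf8(string inputPath, string outputPath, Encoding sourceEncoding)
    {
        string text = File.ReadAllText(inputPath, sourceEncoding);
        File.WriteAllText(outputPath, text, new UTF8Encoding(true));
    }

    static void Main()
    {
        string input = Path.GetTempFileName();
        string output = Path.GetTempFileName();
        try
        {
            // Create a small single-byte file to stand in for the original .txt file.
            Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
            File.WriteAllText(input, "Gr\u00FC\u00DFe", latin1);

            ConvertToUtf8(input, output, latin1);

            // With no encoding argument, ReadAllText detects the BOM and decodes as UTF-8.
            string roundTripped = File.ReadAllText(output);
            Console.WriteLine(roundTripped == "Gr\u00FC\u00DFe");
        }
        finally
        {
            File.Delete(input);
            File.Delete(output);
        }
    }
}
```

Writing a separate output file keeps the original untouched in case the guessed source encoding turns out to be wrong.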

          • Plater
            Recognized Expert Expert
            • Apr 2007
            • 7872

            #6
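            StreamReader and StreamWriter do not default to ASCII or ANSI: when no encoding is passed, both assume UTF-8, and StreamWriter writes no byte order mark.
            Did you set the encoding type of the output stream? For instance, StreamWriter takes an encoding parameter (which could be UTF-8).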
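
As a rough illustration of setting the encoding on the output stream, the sketch below passes an `Encoding` to `StreamWriter` and then checks whether the file starts with the UTF-8 byte order mark (EF BB BF). Some programs may treat a UTF-8 file without that mark as ANSI, which is one possible reading of the thread title. `ExplicitEncodingDemo`, `WriteLines` and `HasUtf8Bom` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ExplicitEncodingDemo
{
    // Write lines with an explicitly chosen encoding instead of relying on the default.
    public static void WriteLines(string path, IEnumerable<string> lines, Encoding encoding)
    {
        // append: false overwrites any existing file.
        using (var writer = new StreamWriter(path, false, encoding))
        {
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
        }
    }

    // True if the file starts with the UTF-8 byte order mark EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Encoding.UTF8 emits a byte order mark at the start of a new file.
            WriteLines(path, new[] { "Gr\u00FC\u00DFe" }, Encoding.UTF8);
            Console.WriteLine(HasUtf8Bom(path)); // True

            // UTF8Encoding(false) writes the same characters without the mark.
            WriteLines(path, new[] { "Gr\u00FC\u00DFe" }, new UTF8Encoding(false));
            Console.WriteLine(HasUtf8Bom(path)); // False
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Both files contain valid UTF-8; the only difference is the three-byte mark that lets a reader identify the encoding without guessing.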
