how find if a file is unicode or not

This topic is closed.
  • codefragment@googlemail.com

    how find if a file is unicode or not

    Hi
    As the subject says, how do you know if a file is unicode, ascii or
    whatever

    ta
  • Jon Skeet [C# MVP]

    #2
    Re: how find if a file is unicode or not

    On Jun 25, 5:15 pm, codefragm...@googlemail.com wrote:
    > As the subject says, how do you know if a file is unicode, ascii or
    > whatever
    You can't find out for sure. You can read some portion of it and check
    for common patterns, but it's still only going to be a guess.

    Is there no way that you can require files to be in a certain encoding
    in your situation?

    Jon
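The "read some portion and check for common patterns" idea can be sketched roughly as follows. This is a minimal illustration, not a reliable detector: the class name `EncodingGuesser`, the method `Guess`, the returned labels, and the "a quarter of the bytes are zero" threshold are all assumptions made for this example. The only library behaviour relied on is that `UTF8Encoding(false, true)` throws `DecoderFallbackException` on invalid byte sequences.

```csharp
using System;
using System.Text;

public static class EncodingGuesser
{
    // Hypothetical helper: inspects a sample of bytes and returns a guess.
    // The result is only ever a guess; the same bytes can be valid in several encodings.
    public static string Guess(byte[] sample)
    {
        if (sample.Length == 0) return "empty";

        int zeroBytes = 0;
        bool allBelow0x80 = true;
        foreach (byte b in sample)
        {
            if (b == 0) zeroBytes++;
            if (b >= 0x80) allBelow0x80 = false;
        }

        // Assumed threshold: ordinary 8-bit text rarely contains NUL bytes, whereas
        // UTF-16 text in Latin scripts has a zero in roughly every other byte.
        if (zeroBytes * 4 >= sample.Length) return "possibly UTF-16 or UTF-32";

        if (allBelow0x80) return "consistent with ASCII";

        try
        {
            // Strict decoder: throws instead of silently substituting U+FFFD.
            // Caveat: a sample cut off in the middle of a multi-byte character also throws.
            new UTF8Encoding(false, true).GetString(sample);
            return "consistent with UTF-8";
        }
        catch (DecoderFallbackException)
        {
            return "unknown (perhaps a legacy 8-bit code page)";
        }
    }
}
```

A label such as "consistent with ASCII" means only that nothing in the sample contradicts that reading, which is why requiring a known encoding up front is the safer option where it is possible.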


    • Sin Jeong-hun

      #3
      Re: how find if a file is unicode or not

      On Jun 26, 1:15 am, codefragm...@googlemail.com wrote:
      > Hi
      > As the subject says, how do you know if a file is unicode, ascii or
      > whatever
      >
      > ta
      I think you could check for a BOM (byte order mark) at the start of a
      Unicode text file, but there's no certain, universal way to determine
      a text encoding. I haven't seen any text editor that does this.
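For reference, a hand-rolled BOM check might look something like the sketch below. The byte signatures are the standard ones for UTF-8, UTF-16 and UTF-32; the class name `BomSniffer`, its method names, and the returned labels are made up for this example. A missing BOM proves nothing, since plenty of UTF-8 and UTF-16 files are written without one.

```csharp
using System;
using System.IO;

public static class BomSniffer
{
    // Hypothetical helper: returns a label for the BOM at the start of the data,
    // or "no BOM" if none of the known signatures match.
    public static string DetectBom(byte[] head)
    {
        // Longer signatures are tested first because the UTF-32 LE BOM (FF FE 00 00)
        // begins with the UTF-16 LE BOM (FF FE). A UTF-16 LE file whose first
        // character is U+0000 would be misreported here; that case is assumed rare.
        if (StartsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 LE";
        if (StartsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32 BE";
        if (StartsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (StartsWith(head, 0xFF, 0xFE)) return "UTF-16 LE";
        if (StartsWith(head, 0xFE, 0xFF)) return "UTF-16 BE";
        return "no BOM";
    }

    // Reads up to the first four bytes of a file and checks them.
    public static string DetectBomInFile(string path)
    {
        byte[] head = new byte[4];
        int read;
        using (FileStream stream = File.OpenRead(path))
        {
            read = stream.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read); // the file may be shorter than four bytes
        return DetectBom(head);
    }

    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```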


      • codefragment@googlemail.com

        #4
        Re: how find if a file is unicode or not

        On 25 Jun, 17:20, "Jon Skeet [C# MVP]" <sk...@pobox.com> wrote:
        > On Jun 25, 5:15 pm, codefragm...@googlemail.com wrote:
        >> As the subject says, how do you know if a file is unicode, ascii or
        >> whatever
        >
        > You can't find out for sure. You can read some portion of it and check
        > for common patterns, but it's still only going to be a guess.
        >
        > Is there no way that you can require files to be in a certain encoding
        > in your situation?
        >
        > Jon
        Hi
        Thanks for the reply, I'm new to Unicode in general.
        - Can you have a file that's part Unicode and part ASCII, or are they
        one or the other?
        - Once the file is read into C#, is there any way of checking the loaded
        strings to see if they're Unicode?
        - Anyone got some example code for checking the BOM?

        I want to write a noddy program to read in a file that may be ASCII,
        may be Unicode. If it's Unicode it will rewrite it
        as ASCII (fine so far) and tell you that it did so. It could check
        the file size, which I guess should be halved,
        but I'm surprised there's no easier way of doing this?
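On the file-size guess: the size halves only when the source is UTF-16 (what Windows tools often label "Unicode") and the text is pure ASCII. A small sketch that compares encoded sizes makes this visible; the class name `ByteCounts` and its method names are invented for the example, while `Encoding.GetByteCount` is the standard .NET call.

```csharp
using System.Text;

public static class ByteCounts
{
    // Number of bytes the text occupies in each encoding, excluding any BOM.
    public static int AsUtf16(string text) { return Encoding.Unicode.GetByteCount(text); }
    public static int AsUtf8(string text)  { return Encoding.UTF8.GetByteCount(text); }

    // Characters outside ASCII are replaced with '?' by the default ASCII encoder,
    // so this is lossy for anything beyond U+007F.
    public static int AsAscii(string text) { return Encoding.ASCII.GetByteCount(text); }
}
```

For "hello" these give 10, 5 and 5 bytes respectively, so a UTF-8 file containing only ASCII characters does not shrink at all when rewritten as ASCII (apart from losing a 3-byte BOM if it had one). File size alone is therefore a weak signal.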


        • Jon Skeet [C# MVP]

          #5
          Re: how find if a file is unicode or not

          <codefragment@googlemail.com> wrote:
          > Thanks for the reply, I'm new to Unicode in general.
          > - Can you have a file that's part Unicode and part ASCII, or are they
          > one or the other?
          A file is really just a sequence of bytes. How those bytes are
          interpreted is up to the programs using the file. You could certainly
          have a file which changed encoding half way through - it would just be
          a pain to work with.
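To illustrate the "sequence of bytes" point, the sketch below decodes one byte array in two different ways. The class and method names are hypothetical; `Encoding.Unicode` is .NET's name for little-endian UTF-16.

```csharp
using System.Text;

public static class SameBytesDemo
{
    // The same four bytes are handed to both decoders, e.g. { 0x48, 0x00, 0x69, 0x00 }.
    public static string ReadAsUtf16(byte[] bytes) { return Encoding.Unicode.GetString(bytes); }
    public static string ReadAsAscii(byte[] bytes) { return Encoding.ASCII.GetString(bytes); }
}
```

With the bytes 48 00 69 00, the UTF-16 reading is the two-character string "Hi", while the ASCII reading is four characters: 'H', a NUL, 'i', and another NUL. Neither decoder reports an error, which is why the bytes alone cannot say which reading was intended.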
          > - Once the file is read into C#, is there any way of checking the loaded
          > strings to see if they're Unicode?
          No, it doesn't work that way. All strings in .NET are stored as Unicode
          internally. You could see whether all of the characters in the string
          are part of the ASCII character set though.
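A check along those lines could be as small as the following sketch (the class and method names are made up for this example). ASCII covers code points U+0000 to U+007F, so any `char` above 0x7F falls outside it.

```csharp
public static class AsciiCheck
{
    // Returns true when every character in the string is in the ASCII range.
    // An empty string trivially passes.
    public static bool IsAllAscii(string text)
    {
        foreach (char c in text)
        {
            if (c > 0x7F) return false;
        }
        return true;
    }
}
```

This says whether the text could be written out as ASCII without loss. It says nothing about how the file it came from was encoded.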
          > - Anyone got some example code for checking the BOM?
          Not offhand, although I believe StreamReader has an overload to
          auto-detect the BOM. Have a look at the docs to check.
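A rough sketch of the StreamReader route is shown below. It uses the `StreamReader(string, Encoding, bool)` constructor, whose last argument turns on detection from byte order marks, and the `CurrentEncoding` property. The wrapper class and method names are hypothetical, and the UTF-8 fallback for files without a BOM is an assumption chosen for this example.

```csharp
using System.IO;
using System.Text;

public static class ReaderDetection
{
    // Returns the encoding StreamReader settles on for the given file.
    public static Encoding DetectViaStreamReader(string path)
    {
        // Encoding.UTF8 is only the fallback used when no BOM is found (assumed choice).
        using (StreamReader reader = new StreamReader(path, Encoding.UTF8, true))
        {
            // Detection happens on the first read, so CurrentEncoding is
            // not meaningful until something has been read.
            reader.Read();
            return reader.CurrentEncoding;
        }
    }
}
```

If the file has no BOM, the result is simply the fallback that was passed in, so this cannot tell an ASCII file apart from a BOM-less UTF-8 one.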

          See http://pobox.com/~skeet/csharp/unicode.html for an introduction to
          the topic.


          --
          Jon Skeet - <skeet@pobox.com>
          Web site: http://www.pobox.com/~skeet
          Blog: http://www.msmvps.com/jon_skeet
          C# in Depth: http://csharpindepth.com
