Unicode conversion

This topic is closed.
  • ankan.banerjee@gmail.com

    Unicode conversion

    Hi,

    I am currently trying to get an application to support Turkish
    language...
    The exact scenario is that we are trying to execute a BULK INSERT query
    in our MS SQL database based on a data file we have. The datafile
    itself is stored in ANSI format but has Turkish characters like 'Ş'
    which is represented in the hex code 0xDE. If I import this file into
    the DB I get the character 'Þ' instead which is U+00DE instead of
    getting U+015E.
    I tried to use mbstowcs() and other conversion functions but none of
    them help me to get 'Ş'. Any ideas on proper conversion?

    Thanks and regards,
    Ankan

  • Jordan Abel

    #2
    Re: Unicode conversion

    2006-10-31 <1162297522.186663.33230@m7g2000cwm.googlegroups.com>,
    ankan.banerjee@gmail.com wrote:
    > Hi,
    >
    > I am currently trying to get an application to support Turkish
    > language...

    <offtopic>

    > The exact scenario is that we are trying to execute a BULK INSERT query
    > in our MS SQL database based on a data file we have. The datafile
    > itself is stored in ANSI format

    I assume by "ANSI" you mean Windows codepage 1254. Windows has an
    unfortunate habit of using "ANSI" to mean "Non-Unicode" when very few
    ANSI standards are used. (I suppose they're all supersets of ANSI X3.4, but,
    then, isn't unicode also?)

    > but has Turkish characters like 'Ş'
    > which is represented in the hex code 0xDE. If I import this file into
    > the DB I get the character 'Þ' instead which is U+00DE instead of
    > getting U+015E.

    </offtopic>

    > I tried to use mbstowcs() and other conversion functions but none of
    > them help me to get 'Ş'. Any ideas on proper conversion?

    Did you call setlocale()?
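
The mismatch described in the question can be reproduced outside the database. The following is a minimal sketch in Python rather than the C functions mentioned in the thread, and it assumes the file really is Windows code page 1254, as guessed above. The helper name `transcode_cp1254_to_utf16` is made up for illustration.

```python
# The single byte the original poster reports seeing in the data file.
raw = b"\xde"

# Interpreted with the Turkish Windows code page, 0xDE is S with cedilla.
turkish = raw.decode("cp1254")

# Interpreted with the Western European code page, 0xDE is Thorn.
western = raw.decode("cp1252")

print(f"cp1254: U+{ord(turkish):04X}")
print(f"cp1252: U+{ord(western):04X}")


def transcode_cp1254_to_utf16(data: bytes) -> bytes:
    """Hypothetical helper: re-encode cp1254 bytes as UTF-16 with a BOM."""
    return data.decode("cp1254").encode("utf-16")
```

If the assumption about the code page holds, the bytes themselves are fine and only the interpretation applied during the import differs.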


    • Richard Tobin

      #3
      Re: Unicode conversion

      In article <slrnekf885.28j.random@rlaptop.random.yi.org>,
      Jordan Abel <random832@gmail.com> wrote:
      >(I suppose they're all supersets of ANSI X3.4, but,
      >then, isn't unicode also?)
      To be pedantic, Unicode and X3.4 are rather different things. Unicode
      defines a mapping between characters and numbers ("code points"),
      while X3.4 defines a mapping between characters and computer
      representations. To get something comparable to X3.4, you have to
      take Unicode itself plus one of the various formats for representing
      Unicode, such as UTF-8 or UTF-16.

      -- Richard
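
The distinction drawn here between a code point and its representation can be illustrated with a short Python sketch. The choice of character (the U+015E from the original question) and of encodings is only an example.

```python
# One abstract character and its Unicode code point (a number, not bytes).
ch = "\u015e"  # LATIN CAPITAL LETTER S WITH CEDILLA
code_point = ord(ch)

# The same code point serialized under several encodings.
encodings = ["utf-8", "utf-16-le", "utf-16-be", "cp1254"]
encoded = {name: ch.encode(name) for name in encodings}

print(f"code point: U+{code_point:04X}")
for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")
```

The code point stays the same in every row; only the byte sequence chosen to represent it changes.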


      • Jordan Abel

        #4
        Re: Unicode conversion

        2006-10-31 <ei8a4p$1p7g$4@pc-news.cogsci.ed.ac.uk>,
        Richard Tobin wrote:
        > In article <slrnekf885.28j.random@rlaptop.random.yi.org>,
        > Jordan Abel <random832@gmail.com> wrote:
        >
        >>(I suppose they're all supersets of ANSI X3.4, but,
        >>then, isn't unicode also?)
        >
        > To be pedantic, Unicode and X3.4 are rather different things. Unicode
        > defines a mapping between characters and numbers ("code points"),
        > while X3.4 defines a mapping between characters and computer
        > representations. To get something comparable to X3.4, you have to
        > take Unicode itself plus one of the various formats for representing
        > Unicode, such as UTF-8 or UTF-16.

        I'd assumed that X3.4 had code points, since it works for both 7-bit and
        8-bit bytes. (doesn't it use that awful "column/row" decimal
        representation, too?)
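
For readers unfamiliar with the "column/row" notation mentioned here, a rough Python sketch follows. It assumes the notation meant is the one that names a position in a code table of 16 rows by its column and row, both written in decimal. The function name `column_row` is made up for illustration.

```python
def column_row(code: int) -> str:
    """Hypothetical helper: format a 7-bit code as column/row in decimal."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit code")
    column = code >> 4   # high 3 bits select the column (0 to 7)
    row = code & 0x0F    # low 4 bits select the row (0 to 15)
    return f"{column}/{row}"


print(column_row(ord("A")))
```

Under that assumption the position is just the code value split into two decimal fields, independent of how many bits the byte carrying it has.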


        • Richard Tobin

          #5
          Re: Unicode conversion

          In article <slrnekfc41.28j.random@rlaptop.random.yi.org>,
          Jordan Abel <random832@gmail.com> wrote:
          >I'd assumed that X3.4 had code points, since it works for both 7-bit and
          >8-bit bytes. (doesn't it use that awful "column/row" decimal
          >representation, too?)
          You can always interpret a binary representation as code points, and
          you could also give odd- and even-parity as different representations.
          But I think Unicode makes the separation much more explicit than
          most previous character sets.

          -- Richard
          --
          "Consideration shall be given to the need for as many as 32 characters
          in some alphabets" - X3.4, 1963.
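
The remark about odd and even parity giving different representations of the same code can be sketched in Python. This assumes the common convention of carrying the parity bit in the eighth (top) bit of the byte; the function name `with_parity` is made up for illustration.

```python
def with_parity(code: int, odd: bool) -> int:
    """Hypothetical helper: 7-bit code plus a parity bit in bit 7."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit code")
    ones = bin(code).count("1")
    # Even parity makes the total count of 1 bits even; odd parity makes it odd.
    parity_bit = (ones % 2) ^ (1 if odd else 0)
    return code | (parity_bit << 7)


code = ord("A")  # the same code, 0x41, in both cases
print(f"even parity: {with_parity(code, odd=False):#04x}")
print(f"odd parity:  {with_parity(code, odd=True):#04x}")
```

Both bytes stand for the same code; stripping the top bit recovers it in either case.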
