Unicode conversion

**Jordan Abel** · Oct 31 '06, 06:55 PM

Re: Unicode conversion

2006-10-31 <1162297522.186663.33230@m7g2000cwm.googlegroups.com>,
ankan.banerjee@gmail.com wrote:

> Hi,
>
> I am currently trying to get an application to support Turkish
> language...
>
> The exact scenario is that we are trying to execute a BULK INSERT query
> in our MS SQL database based on a data file we have. The datafile
> itself is stored in ANSI format

I assume by "ANSI" you mean Windows codepage 1254. Windows has an
unfortunate habit of using "ANSI" to mean "non-Unicode", even though very
few of those codepages are ANSI standards. (I suppose they're all
supersets of ANSI X3.4, but, then, isn't Unicode also?)

> but has Turkish characters like 'Ş'
> which is represented in the hex code 0xDE. If I import this file into
> the DB I get the character 'Þ' instead which is U+00DE instead of
> getting U+015E.
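The symptom described above is what you get when byte 0xDE is decoded as Latin-1 (or codepage 1252) rather than codepage 1254. A minimal sketch of a table-free decoder for the part of codepage 1254 that matters here follows. The function name `cp1254_to_unicode` is hypothetical, and the sketch deliberately does not cover bytes 0x80..0x9F (it returns U+FFFD for them); it relies on codepage 1254 matching Latin-1 in 0xA0..0xFF except at six positions.

```c
#include <stdint.h>

/* Hypothetical helper: decode one Windows-1254 byte to a Unicode code point.
 * Bytes below 0x80 are ASCII. In 0xA0..0xFF, codepage 1254 equals Latin-1
 * except at the six Turkish letters listed below. Bytes 0x80..0x9F are not
 * handled by this sketch and yield U+FFFD (REPLACEMENT CHARACTER). */
uint32_t cp1254_to_unicode(unsigned char b)
{
    if (b >= 0x80 && b <= 0x9F)
        return 0xFFFD;            /* not covered by this sketch */

    switch (b) {
    case 0xD0: return 0x011E;     /* Ğ */
    case 0xDD: return 0x0130;     /* İ */
    case 0xDE: return 0x015E;     /* Ş */
    case 0xF0: return 0x011F;     /* ğ */
    case 0xFD: return 0x0131;     /* ı */
    case 0xFE: return 0x015F;     /* ş */
    default:   return b;          /* same value as ASCII / Latin-1 */
    }
}
```

With this mapping, `cp1254_to_unicode(0xDE)` gives 0x015E, whereas treating the byte as Latin-1 would give 0x00DE, which is the 'Þ' seen in the database.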

</offtopic>

> I tried to use mbstowcs() and other conversion functions but none of
> them help me to get 'Ş'. Any ideas on proper conversion?

Did you call setlocale()?
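For reference, mbstowcs() converts according to the LC_CTYPE category of the current locale, and a program starts in the "C" locale, so without a setlocale() call it has no way of knowing the bytes are Turkish. A rough sketch, assuming a suitable locale is installed: the wrapper name `convert_in_locale` is hypothetical, and the locale names mentioned afterwards are assumptions that vary by platform.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wrapper: switch LC_CTYPE to the named locale, then convert
 * the NUL-terminated multibyte string src into dst (at most dst_len wide
 * characters). Returns the number of wide characters written, or
 * (size_t)-1 if the locale is unavailable or src contains a byte sequence
 * that is invalid in that locale. Note that LC_CTYPE is left set to the
 * new locale on return. */
size_t convert_in_locale(const char *locale_name, const char *src,
                         wchar_t *dst, size_t dst_len)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;        /* locale not installed or name unknown */

    return mbstowcs(dst, src, dst_len);
}
```

On Windows the name might be something like "Turkish_Turkey.1254"; on a glibc system "tr_TR.ISO-8859-9" is a plausible stand-in, since ISO-8859-9 agrees with codepage 1254 for 0xDE. If the call succeeds on such a system, the byte string "\xDE" should come out as the single wide character 0x015E. If setlocale() returns NULL, the locale is simply not available and a lookup table is the fallback.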

**Richard Tobin** · Oct 31 '06, 07:45 PM

Re: Unicode conversion

In article <slrnekf885.28j.random@rlaptop.random.yi.org>,
Jordan Abel <random832@gmail.com> wrote:

>(I suppose they're all supersets of ANSI X3.4, but,
>then, isn't Unicode also?)

To be pedantic, Unicode and X3.4 are rather different things. Unicode
defines a mapping between characters and numbers ("code points"),
while X3.4 defines a mapping between characters and computer
representations. To get something comparable to X3.4, you have to
take Unicode itself plus one of the various formats for representing
Unicode, such as UTF-8 or UTF-16.
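To make the distinction concrete, here is a small sketch that takes one code point and produces its representation in two encoding forms. The function names `encode_utf8` and `encode_utf16` are hypothetical; the bit layouts are the standard UTF-8 and UTF-16 ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp as UTF-8 into out.
 * Returns the number of bytes written (1..4), or 0 if cp is not a
 * valid Unicode scalar value (a surrogate or above U+10FFFF). */
size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                 /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encode code point cp as UTF-16 code units into out.
 * Returns the number of 16-bit units written (1 or 2), or 0 if invalid. */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {
        cp -= 0x10000;            /* 20 bits split across a surrogate pair */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        return 2;
    }
    return 0;
}
```

The single code point U+015E becomes the two bytes C5 9E in UTF-8 and the single 16-bit unit 015E in UTF-16: same character number, different computer representations.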

-- Richard

**Jordan Abel** · Oct 31 '06, 08:05 PM

Re: Unicode conversion

2006-10-31 <ei8a4p$1p7g$4@pc-news.cogsci.ed.ac.uk>,
Richard Tobin wrote:

> In article <slrnekf885.28j.random@rlaptop.random.yi.org>,
> Jordan Abel <random832@gmail.com> wrote:
>
>>(I suppose they're all supersets of ANSI X3.4, but,
>>then, isn't Unicode also?)
>
> To be pedantic, Unicode and X3.4 are rather different things. Unicode
> defines a mapping between characters and numbers ("code points"),
> while X3.4 defines a mapping between characters and computer
> representations. To get something comparable to X3.4, you have to
> take Unicode itself plus one of the various formats for representing
> Unicode, such as UTF-8 or UTF-16.

I'd assumed that X3.4 had code points, since it works for both 7-bit and
8-bit bytes. (doesn't it use that awful "column/row" decimal
representation, too?)

**Richard Tobin** · Oct 31 '06, 11:15 PM

Re: Unicode conversion

In article <slrnekfc41.28j.random@rlaptop.random.yi.org>,
Jordan Abel <random832@gmail.com> wrote:

>I'd assumed that X3.4 had code points, since it works for both 7-bit and
>8-bit bytes. (doesn't it use that awful "column/row" decimal
>representation, too?)

You can always interpret a binary representation as code points, and
you could also give odd- and even-parity as different representations.
But I think Unicode makes the separation much more explicit than
most previous character sets.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
