Wide character input/output

  • Ioannis Vranos

    Wide character input/output

    [The current message encoding is set to Unicode (UTF-8) because it
    contains Greek]


    The following code does not work as expected:


    #include <wchar.h>
    #include <locale.h>
    #include <stdio.h>
    #include <stddef.h>

    int main()
    {
    char *p= setlocale( LC_ALL, "Greek" );

    wchar_t input[50];

    if (!p)
    printf("NULL returned!\n");

    fgetws(input, 50, stdin);

    wprintf(L"%s\n" , input);

    return 0;
    }


    Under Linux:


    [john@localhost src]$ ./foobar-cpp
    Test
    T
    [john@localhost src]$


    [john@localhost src]$ ./foobar-cpp
    Δοκιμαστικό
    �
    [john@localhost src]$




    Under MS Visual C++ 2008 Express:

    Test
    Test

    Press any key to continue . . .


    Δοκιμαστικό
    ??????ε????

    Press any key to continue . . .


    Am I missing something?
  • Ben Bacarisse

    #2
    Re: Wide character input/output

    Ioannis Vranos <ivranos@nospam.no.spamfreemail.gr> writes:
    [The current message encoding is set to Unicode (UTF-8) because it
    contains Greek]
    >
    >
    The following code does not work as expected:
    >
    >
    #include <wchar.h>
    #include <locale.h>
    #include <stdio.h>
    #include <stddef.h>
    >
    int main()
    {
    char *p= setlocale( LC_ALL, "Greek" );
    >
    wchar_t input[50];
    >
    if (!p)
    printf("NULL returned!\n");
    >
    fgetws(input, 50, stdin);
    >
    wprintf(L"%s\n" , input);
    You need "%ls". This is very important with wprintf since without the
    l, %s denotes a multi-byte character sequence. printf("%ls\n", input)
    should also work. You need the w version if you want the multi-byte
    conversion of %s or if the format has to be a wchar_t pointer.
    >
    return 0;
    }
    >
    >
    Under Linux:
    >
    >
    [john@localhost src]$ ./foobar-cpp
    Test
    T
    [john@localhost src]$
    >
    >
    [john@localhost src]$ ./foobar-cpp
    Δοκιμαστικό
    �
    [john@localhost src]$
    The above may not be the only problem. In cases like this, you need to
    say what encoding your terminal is using.

    <snip>

    --
    Ben.


    • Ioannis Vranos

      #3
      Re: Wide character input/output

      Ben Bacarisse wrote:
      >
      You need "%ls". This is very important with wprintf since without it
      %s denotes a multi-byte character sequence. printf("%ls\n", input)
      should also work. You need the w version if you want the multi-byte
      conversion of %s or if the format has to be a wchar_t pointer.

      Perhaps you may help me understand better. We have the usual char
      encoding which is implementation defined (usually ASCII).

      wchar_t is wide character encoding, which is the "largest character set
      supported by the system", so I suppose Unicode under Linux and Windows.

      What exactly is a multi-byte character?

      I have to say that I am talking about C95 here, not C99.

      >
      > return 0;
      >}
      >>
      >>
      >Under Linux:
      >>
      >>
      >[john@localhost src]$ ./foobar-cpp
      >Test
      >T
      >[john@localhost src]$
      >>
      >>
      >[john@localhost src]$ ./foobar-cpp
      >Δοκιμαστικό
      >�
      >[john@localhost src]$
      >
      The above may not be the only problem. In cases like this, you need to
      say what encoding your terminal is using.

      You are somehow correct on this. My terminal encoding was UTF-8 and I
      added Greek(ISO-8859-7). Under the last, the following code works OK:


      #include <wchar.h>
      #include <locale.h>
      #include <stdio.h>
      #include <stddef.h>

      int main()
      {
      char *p= setlocale( LC_ALL, "Greek" );

      wprintf(L"Δοκιμαστικό\n");

      return 0;
      }

      [john@localhost src]$ ./foobar-cpp
      Δοκιμαστικό
      [john@localhost src]$


      Also the original, fixed according to your suggestion:


      #include <wchar.h>
      #include <locale.h>
      #include <stdio.h>
      #include <stddef.h>

      int main()
      {
      char *p= setlocale( LC_ALL, "Greek" );

      wchar_t input[50];

      if (!p)
      printf("NULL returned!\n");

      fgetws(input, 50, stdin);

      wprintf(L"%ls", input);

      return 0;
      }

      works OK too:

      [john@localhost src]$ ./foobar-cpp
      Δοκιμαστικό
      Δοκιμαστικό
      [john@localhost src]$


      It works OK under Terminal UTF-8 default encoding too. So "%ls" is what
      was really needed.


      BTW, how can we define UTF-8 as the locale?


      Thanks a lot.


      • Ioannis Vranos

        #4
        Re: Wide character input/output

        Ioannis Vranos wrote:
        >
        It works OK under Terminal UTF-8 default encoding too. So "%ls" is what
        was really needed.

        Actually the code:

        #include <wchar.h>
        #include <locale.h>
        #include <stdio.h>
        #include <stddef.h>

        int main()
        {
        char *p= setlocale( LC_ALL, "Greek" );

        wprintf(L"Δοκιμαστικό\n");

        return 0;
        }

        works only when I set the Terminal encoding to Greek (ISO-8859-7).


        >
        >
        BTW, how can we define UTF-8 as the locale?
        >
        >
        Thanks a lot.


        • Ben Bacarisse

          #5
          Re: Wide character input/output

          Ioannis Vranos <ivranos@nospam.no.spamfreemail.gr> writes:
          Ben Bacarisse wrote:
          >>
          >You need "%ls". This is very important with wprintf since without it
          >%s denotes a multi-byte character sequence. printf("%ls\n", input)
          >should also work. You need the w version if you want the multi-byte
          >conversion of %s or if the format has to be a wchar_t pointer.
          >
          >
          Perhaps you may help me understand better. We have the usual char
          encoding which is implementation defined (usually ASCII).
          >
          wchar_t is wide character encoding, which is the "largest character
          set supported by the system", so I suppose Unicode under Linux and
          Windows.
          >
          What exactly is a multi-byte character?
          It is a confusing term. It means an encoding that uses sequences of
          ordinary bytes (in the C sense -- chars) to encode a large character
          set. The most common example is UTF-8.
          I have to say that I am talking about C95 here, not C99.
          >
          >
          >>
          >> return 0;
          >>}
          >>>
          >>>
          >>Under Linux:
          >>>
          >>>
          >>[john@localhost src]$ ./foobar-cpp
          >>Test
          >>T
          >>[john@localhost src]$
          >>>
          >>>
          >>[john@localhost src]$ ./foobar-cpp
          >>Δοκιμαστικό
          >>�
          >>[john@localhost src]$
          >>
          >The above may not be the only problem. In cases like this, you need to
          >say what encoding your terminal is using.
          >
          >
          You are somehow correct on this.
          Strange, I know!
          My terminal encoding was UTF-8 and I
          added Greek(ISO-8859-7). Under the last, the following code works OK:
          >
          >
          #include <wchar.h>
          #include <locale.h>
          #include <stdio.h>
          #include <stddef.h>
          >
          int main()
          {
          char *p= setlocale( LC_ALL, "Greek" );
          >
          wprintf(L"Δοκιμαστικό\n");
          >
          return 0;
          }
          >
          [john@localhost src]$ ./foobar-cpp
          Δοκιμαστικό
          [john@localhost src]$
          >
          >
          Also the original, fixed according to your suggestion:
          >
          >
          #include <wchar.h>
          #include <locale.h>
          #include <stdio.h>
          #include <stddef.h>
          >
          int main()
          {
          char *p= setlocale( LC_ALL, "Greek" );
          >
          wchar_t input[50];
          >
          if (!p)
          printf("NULL returned!\n");
          >
          fgetws(input, 50, stdin);
          >
          wprintf(L"%ls", input);
          >
          return 0;
          }
          >
          works OK too:
          >
          [john@localhost src]$ ./foobar-cpp
          Δοκιμαστικό
          Δοκιμαστικό
          [john@localhost src]$
          >
          >
          It works OK under Terminal UTF-8 default encoding too. So "%ls" is
          what was really needed.
          >
          >
          BTW, how can we define UTF-8 as the locale?
          I *think* this is now off-topic. I don't think C says anything about
          what the locale string means...

          The character encoding is usually specified after a '.'. I use, for
          example, "en-GB.UTF-8". I suspect that if you only specify a part of
          the locale (or one that does not make sense) your C library picks up
          what to do from the execution environment. To me "Greek" looks like
          an odd locale string. I would expect "el-GR.UTF-8" or
          "el-GR.ISO8859-7".

          --
          Ben.


          • Ben Bacarisse

            #6
            Re: Wide character input/output

            Ioannis Vranos <ivranos@nospam.no.spamfreemail.gr> writes:
            Ioannis Vranos wrote:
            >>
            >It works OK under Terminal UTF-8 default encoding too. So "%ls" is
            >what was really needed.
            >
            >
            Actually the code:
            >
            #include <wchar.h>
            #include <locale.h>
            #include <stdio.h>
            #include <stddef.h>
            >
            int main()
            {
            char *p= setlocale( LC_ALL, "Greek" );
            >
            wprintf(L"Δοκιμαστικό\n");
            >
            return 0;
            }
            >
            works only when I set the Terminal encoding to Greek (ISO-8859-7).
            This sort of thing is almost impossible to investigate over Usenet.
            Your news software will take your code and may or may not encode the
            characters of the L"..." string in the encoding of your post (UTF-8).
            It makes it very hard to know what the program text actually is.

            Another complication is that the locale setting affects the run-time
            behaviour, but your program also depends on what character encoding is
            expected by the compiler that builds the string.

            --
            Ben.


            • Ioannis Vranos

              #7
              Re: Wide character input/output

              Ben Bacarisse wrote:
              >BTW, how can we define UTF-8 as the locale?
              >
              I *think* this is now off-topic. I don't think C says anything about
              what the locale string means...
              >
              The character encoding is usually specified after a '.'. I use, for
              example, "en-GB.UTF-8". I suspect that if you only specify a part of
              the locale (or one that does not make sense) your C library picks up
              what to do from the execution environment. To me "Greek" looks like
              an odd locale string. I would expect "el-GR.UTF-8" or
              "el-GR.ISO8859-7".

              I got the idea from:

              http://msdn2.microsoft.com/en-us/lib...1d(VS.80).aspx



              • Ioannis Vranos

                #8
                Re: Wide character input/output

                Ben Bacarisse wrote:
                >BTW, how can we define UTF-8 as the locale?
                >
                I *think* this is now off-topic. I don't think C says anything about
                what the locale string means...
                >
                The character encoding is usually specified after a '.'. I use, for
                example, "en-GB.UTF-8". I suspect that if you only specify a part of
                the locale (or one that does not make sense) your C library picks up
                what to do from the execution environment. To me "Greek" looks like
                an odd locale string. I would expect "el-GR.UTF-8" or
                "el-GR.ISO8859-7".

                This code works with gcc:

                #include <wchar.h>
                #include <locale.h>
                #include <stdio.h>
                #include <stddef.h>

                int main()
                {
                char *p= setlocale( LC_ALL, "greek" );

                wchar_t input[50];

                if (!p)
                printf("NULL returned!\n");

                fgetws(input, 50, stdin);

                wprintf(L"%ls", input);

                return 0;
                }


                [john@localhost src]$ ./foobar-cpp
                Δοκιμαστικό
                Δοκιμαστικό
                [john@localhost src]$


                When I place el-GR.UTF-8 or el-GR.ISO8859-7 I get:


                [john@localhost src]$ ./foobar-cpp
                NULL returned!

                [john@localhost src]$


                • Ben Bacarisse

                  #9
                  Re: Wide character input/output

                  Ioannis Vranos <ivranos@nospam.no.spamfreemail.gr> writes:
                  Ben Bacarisse wrote:
                  >>BTW, how can we define UTF-8 as the locale?
                  >>
                  >I *think* this is now off-topic. I don't think C says anything about
                  >what the locale string means...
                  >>
                  >The character encoding is usually specified after a '.'. I use, for
                  >example, "en-GB.UTF-8". I suspect that if you only specify a part of
                  >the locale (or one that does not make sense) your C library picks up
                  >what to do from the execution environment. To me "Greek" looks like
                  >an odd locale string. I would expect "el-GR.UTF-8" or
                  >"el-GR.ISO8859-7".
                  >
                  I got the idea from:
                  >
                  http://msdn2.microsoft.com/en-us/lib...1d(VS.80).aspx
                  Ah, OK. Anyway, we are off-topic now. I think you'd have to post in
                  a Windows group to find out what locale strings mean there.

                  --
                  Ben.


                  • Ioannis Vranos

                    #10
                    Re: Wide character input/output

                    Ben Bacarisse wrote:
                    Ioannis Vranos <ivranos@nospam.no.spamfreemail.gr> writes:
                    >
                    >Ben Bacarisse wrote:
                    >>>BTW, how can we define UTF-8 as the locale?
                    >>I *think* this is now off-topic. I don't think C says anything about
                    >>what the locale string means...
                    >>>
                    >>The character encoding is usually specified after a '.'. I use, for
                    >>example, "en-GB.UTF-8". I suspect that if you only specify a part of
                    >>the locale (or one that does not make sense) your C library picks up
                    >>what to do from the execution environment. To me "Greek" looks like
                    >>an odd locale string. I would expect "el-GR.UTF-8" or
                    >>"el-GR.ISO8859-7".
                    >I got the idea from:
                    >>
                    >http://msdn2.microsoft.com/en-us/lib...1d(VS.80).aspx
                    >
                    Ah, OK. Anyway, we are off-topic now. I think you'd have to post in
                    a Windows group to find out what locale strings mean there.

                    I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you suggested
                    make setlocale() return NULL. The "greek" and "Greek" suggested by
                    MSDN work. So I supposed there is a portable way for this. Aren't there
                    any portable locale encoding strings?


                    • Ioannis Vranos

                      #11
                      Re: Wide character input/output

                      Clarified:

                      I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you suggested
                      make setlocale() return NULL
                      ==> under Linux.
                      The "greek" and "Greek" suggested by MSDN
                      work
                      ==> under Linux.
                      So I supposed there is a portable way for this. Aren't there any
                      portable locale encoding strings?


                      • Ioannis Vranos

                        #12
                        Re: Wide character input/output

                        Ioannis Vranos wrote:
                        Clarified:
                        >
                        >
                        >I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you
                        >suggested make setlocale() return NULL
                        >
                        ==> under Linux.
                        >
                        >The "greek" and "Greek" suggested by MSDN works
                        >
                        ==> under Linux.
                        >
                        >So I supposed there is a portable way for this. Aren't any portable
                        >locale encoding strings?

                        Also based on
                        http://gcc.gnu.org/onlinedocs/libstd...le/locale.html where it
                        mentions "locale -a" and provides a list of locales, in my system it
                        outputs among other things:


                        galego
                        galician
                        gd_GB
                        gd_GB.iso885915
                        gd_GB.utf8
                        german
                        gez_ER
                        gez_ER@abegede
                        gez_ER.utf8
                        gez_ER.utf8@abegede
                        gez_ET
                        gez_ET@abegede
                        gez_ET.utf8
                        gez_ET.utf8@abegede
                        gl_ES
                        gl_ES@euro
                        gl_ES.iso88591
                        gl_ES.iso885915@euro
                        gl_ES.utf8
                        ==> greek
                        gu_IN
                        gu_IN.utf8
                        gv_GB
                        gv_GB.iso88591
                        gv_GB.utf8
                        hebrew
                        he_IL
                        he_IL.iso88598
                        he_IL.utf8
                        hi_IN
                        hi_IN.utf8
                        hr_HR
                        hr_HR.iso88592
                        hr_HR.utf8
                        hrvatski
                        hsb_DE
                        hsb_DE.iso88592
                        hsb_DE.utf8
                        hu_HU
                        hu_HU.iso88592
                        hu_HU.utf8
                        hungarian


                        So "greek" is a valid locale for linux too.


                        • Ben Bacarisse

                          #13
                          Re: Wide character input/output

                          Ioannis Vranos <ivranos@nospam.no.spamfreemail.gr> writes:
                          Ioannis Vranos wrote:
                          >Clarified:
                          >>
                          >>
                          >>I am a Linux user. The "el-GR.UTF-8" and "el-GR.ISO8859-7" you
                          >>suggested make setlocale() return NULL
                          >>
                          >==> under Linux.
                          >>
                          >>The "greek" and "Greek" suggested by MSDN works
                          >>
                          >==> under Linux.
                          >>
                          >>So I supposed there is a portable way for this. Aren't any portable
                          >>locale encoding strings?
                          >
                          Also based on
                          http://gcc.gnu.org/onlinedocs/libstd...le/locale.html where it
                          mentions "locale -a" and provides a list of locales, in my system it
                          outputs among other things:
                          >
                          galego
                          galician
                          gd_GB
                          ....
                          gl_ES.iso885915@euro
                          gl_ES.utf8
                          ==> greek
                          Post in comp.unix.programmer. I think you can define anything you
                          like under Linux, but what is and is not valid is not specified by C.
                          Other standards (like POSIX) probably specify much more.
                          So "greek" is a valid locale for linux too.
                          --
                          Ben.


                          • CBFalconer

                            #14
                            Re: Wide character input/output

                            Ioannis Vranos wrote:
                            >
                            .... snip ...
                            >
                            I have attached a screenshot.
                            According to which, I believe, you are using a c++ compiler.

                            --
                            [mail]: Chuck F (cbfalconer at maineline dot net)
                            [page]: <http://cbfalconer.home.att.net>
                            Try the download section.



                            --
                            Posted via a free Usenet account from http://www.teranews.com


                            • CBFalconer

                              #15
                              Re: Wide character input/output

                              Ioannis Vranos wrote:
                              >
                              [The current message encoding is set to Unicode (UTF-8) because
                              it contains Greek]
                              >
                              The following code does not work as expected:
                              >
                              #include <wchar.h>
                              #include <locale.h>
                              #include <stdio.h>
                              #include <stddef.h>
                              >
                              int main() {
                              char *p= setlocale( LC_ALL, "Greek" );
                              wchar_t input[50];
                              >
                              if (!p)
                              printf("NULL returned!\n");
                              fgetws(input, 50, stdin);
                              wprintf(L"%s\n" , input);
                              return 0;
                              }
                              >
                              .... snip ...
                              >
                              Am I missing something?
                              Yes. If setlocale fails, it returns NULL, which you detect, but do
                              not immediately exit the program. You also forgot to check for
                              errors in executing fgetws or wprintf.

