ispunct()

  • Ioannis Vranos

    ispunct()

    ispunct() returns true for all symbols? (like <>/@^&#@ etc).







    Ioannis Vranos
  • Lew Pitcher

    #2
    Re: ispunct()


    Ioannis Vranos wrote:
    | ispunct() returns true for all symbols? (like <>/@^&#@ etc).

    Caveat: You cross-posted this question to newsgroups that cover two different
    computer languages. You may get two different answers, depending on which
    language is described.

    The ISO/IEC 9899:1999 draft for the ISO C'99 standard says of ispunct():
    "The ispunct function tests for any printing character that is one of a
    locale-specific set of punctuation characters for which neither isspace nor
    isalnum is true. In the "C" locale, ispunct returns true for every printing
    character for which neither isspace nor isalnum is true."

    So, to answer your question: for ISO C'99, in the "C" locale, all symbols will
    return true from ispunct, as they
    a) are printing characters,
    b) do not return true from isspace, and
    c) do not return true from isalnum.

    Other locales may result in different values from C'99 ispunct for those characters.

    Other levels of C standards compliance (e.g. C'90, K&R C, etc.) may result in
    different values from ispunct for those characters.

    Other languages may result in different values from ispunct for those characters.



    --
    Lew Pitcher

    Master Codewright & JOAT-in-training | GPG public key available on request
    Registered Linux User #112576 (http://counter.li.org/)
    Slackware - Because I know what I'm doing.


    • Régis Troadec

      #3
      Re: ispunct()


      "Ioannis Vranos" <ivr@guesswh.at.emails.ru> wrote in message
      news:c709jc$2oop$1@ulysses.noc.ntua.gr...
      > ispunct() returns true for all symbols? (like <>/@^&#@ etc).

      I would say yes.
      It returns true for every printing character for which neither isspace()
      nor isalnum() returns true; that is what the standard says.
      I think of punctuators when I see ispunct(), but I don't know whether its
      name is semantically related to them. The short program below shows which
      printable characters make ispunct() return true and which make it return
      false:

      #include <stdio.h>
      #include <ctype.h>

      int main(void)
      {
          /* Walk through the range of printable characters
             from 0x20 ' ' to 0x7E '~' in the 7-bit ASCII table
             (this assumes an ASCII-based execution character set) */
          int c;

          for (c = 0x20; c <= 0x7E; c++)
          {
              printf("Is %c a printable char different from space or alphanum?"
                     " %s\n", c, ispunct(c) ? "YES" : "NO");
          }
          return 0;
      }

      Regis



      • Ioannis Vranos

        #4
        Re: ispunct()

        "Lew Pitcher" <lpitcher@sympatico.ca> wrote in message
        news:TwOkc.56839$OU.1339048@news20.bellglobal.com...
        > -----BEGIN PGP SIGNED MESSAGE-----
        >
        > Caveat: You cross-posted this question to newsgroups that cover two
        > different computer languages. You may get two different answers,
        > depending on which language is described.


        Yes, I know; however, I guessed that C99 ispunct() behaviour does not differ
        from C++98 (and C90).






        Ioannis Vranos


        • August Derleth

          #5
          Re: ispunct()

          On Sat, 01 May 2004 16:44:13 +0300, Ioannis Vranos wrote:

          > ispunct() returns true for all symbols? (like <>/@^&#@ etc).

          From my manpage that shipped with gcc, ispunct() returns true for any
          nonblank character that isn't a letter or a number. gcc says this
          subroutine is conformant with ANSI-C.

          What, exactly, is considered a letter can vary by locale, but in the C
          locale any member of [A-Za-z] is considered alphabetic.

          --
          yvoregnevna gjragl-guerr gjb-gubhfnaq guerr ng lnubb qbg pbz
          To email me, rot13 and convert spelled-out numbers to numeric form.
          "Makes hackers smile" makes hackers smile.


          • Barry Schwarz

            #6
            Re: ispunct()

            On Sat, 01 May 2004 12:23:03 -0600, August Derleth <see@sig.now>
            wrote:

            >On Sat, 01 May 2004 16:44:13 +0300, Ioannis Vranos wrote:
            >
            >> ispunct() returns true for all symbols? (like <>/@^&#@ etc).
            >
            >From my manpage that shipped with gcc, ispunct() returns true for any
            >nonblank character that isn't a letter or a number. gcc says this
            >subroutine is conformant with ANSI-C.

            There are a minimum of 256 possible values for a char. Blank is only 1.
            If we stick to the English alphabet, there are 52 letters and ten digits,
            leaving at least 193 values for which your man page says ispunct returns
            true. Unfortunately, the C99 standard says it must be a printing
            character, which eliminates a significant number of these 193.
            I see three possibilities:

            You misquoted the man page.

            The man page is less specific than it should be and therefore
            misleading.

            The man page is incorrect regarding compliance and therefore
            misleading.
            >
            >What, exactly, is considered a letter can vary by locale, but in the C
            >locale any member of [A-Za-z] is considered alphabetic.

            In any locale, a letter is any character for which isalpha returns
            true. While your regular expression is correct (because it does not
            depend on representation), it may lead someone to believe that if
            'A' <= mychar <= 'Z' then mychar is a letter. On my system, there are
            characters between 'I' and 'J' and between 'R' and 'S' that are not
            letters.



            <<Remove the del for email>>


            • Ioannis Vranos

              #7
              Re: ispunct()

              "Barry Schwarz" <schwarzb@deloz.net> wrote in message
              news:c71c1f$7oj$2@216.39.134.69...
              >
              > There are a minimum of 256 possible values for a char.


              We must note here that (plain) char may be either of type signed char or
              unsigned char, and if it is signed char the negative values are useless
              here.


              > Blank is only
              > 1. If we stick to the English alphabet, there are 52 letters and ten
              > digits leaving at least 193 values for which you man page says ispunct
              > returns true. Unfortunately, the C99 standard says it must be a
              > printing character which eliminates a significant number of these 193.


              But that is fine with me, since I want to use the (printable) keyboard
              symbols of the ASCII table and filter out the letters and digits.






              Ioannis Vranos


              • Keith Thompson

                #8
                Re: ispunct()

                "Ioannis Vranos" <ivr@guesswh.at.emails.ru> writes:
                > "Barry Schwarz" <schwarzb@deloz.net> wrote in message
                > news:c71c1f$7oj$2@216.39.134.69...
                > >
                > > There are a minimum of 256 possible values for a char.
                >
                > We must note here that (plain) char may be either of type signed char or
                > unsigned char, and if it is signed char the negative values are useless
                > here.

                A quibble: (plain) char has the same characteristics as either signed
                char or unsigned char, but it's a distinct type.

                --
                Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
                San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
                Schroedinger does Shakespeare: "To be *and* not to be"


                • Richard Bos

                  #9
                  Re: ispunct()

                  "Ioannis Vranos" <ivr@guesswh.at.emails.ru> wrote:

                  > "Lew Pitcher" <lpitcher@sympatico.ca> wrote in message
                  > news:TwOkc.56839$OU.1339048@news20.bellglobal.com...
                  > > Caveat: You cross-posted this question to newsgroups that cover two different
                  > > computer languages. You may get two different answers, depending on which
                  > > language is described.
                  >
                  > Yes i know, however i guessed that C99 ispunct() behaviour does not differ
                  > from C++98 (and C90).

                  Then why cross-post in the first place?

                  Richard


                  • Richard Bos

                    #10
                    Re: ispunct()

                    "Ioannis Vranos" <ivr@guesswh.at.emails.ru> wrote:

                    > "Barry Schwarz" <schwarzb@deloz.net> wrote in message
                    > news:c71c1f$7oj$2@216.39.134.69...
                    > >
                    > > There are a minimum of 256 possible values for a char.
                    >
                    > We must note here that (plain) char may be either of type signed char or
                    > unsigned char, and if it is signed char the negative values are useless
                    > here.

                    True as such, but all the is*() functions take an int having the value
                    of an unsigned char (or EOF), not of a signed or plain char.

                    Richard


                    • Michiel Salters

                      #11
                      Re: ispunct()

                      Barry Schwarz <schwarzb@deloz.net> wrote in message news:<c71c1f$7oj$2@216.39.134.69>...
                      > On Sat, 01 May 2004 12:23:03 -0600, August Derleth <see@sig.now>
                      > wrote:
                      >
                      > >On Sat, 01 May 2004 16:44:13 +0300, Ioannis Vranos wrote:
                      > >
                      > >> ispunct() returns true for all symbols? (like <>/@^&#@ etc).
                      > >
                      > >From my manpage that shipped with gcc, ispunct() returns true for any
                      > >nonblank character that isn't a letter or a number. gcc says this
                      > >subroutine is conformant with ANSI-C.
                      >
                      > There are a minimum of 256 possible values for a char. Blank is only
                      > 1.

                      Isn't \t blank?

                      > If we stick to the English alphabet, there are 52 letters and ten
                      > digits leaving at least 193 values for which you man page says ispunct
                      > returns true. Unfortunately, the C99 standard says it must be a
                      > printing character which eliminates a significant number of these 193.

                      At least \0 must be eliminated, obviously. That can never be a printing
                      character. I don't understand the "Unfortunately" - do you want to
                      imply that ispunct('\0') should be true?

                      > I see three possibilities:
                      >
                      > You misquoted the man page.
                      >
                      > The man page is less specific than it should be and therefore
                      > misleading.
                      >
                      > The man page is incorrect regarding compliance and therefore
                      > misleading.

                      I think it's the second, but it's really nitpicking. The only word
                      missing is non-printing, and that may even have been dropped in the quote.

                      > >What, exactly, is considered a letter can vary by locale, but in the C
                      > >locale any member of [A-Za-z] is considered alphabetic.
                      >
                      > In any locale, a letter is any character for which isalpha returns
                      > true. While your regular expression is correct (because it does not
                      > depend on representation), it may lead someone to believe that if 'A'
                      > <= mychar <= 'Z' then mychar is a letter. On my system, there are
                      > characters between 'I' and "J' and between 'R' and 'S' that are not
                      > letters.

                      What someone believes, based on a misinterpretation of a regex, can't be
                      helped. The regex is well defined and doesn't include those other
                      characters you refer to. Anyway, regexes aren't part of C (nor yet of
                      C++) and were used only as a shorthand.


                      • Keith Thompson

                        #12
                        Re: ispunct()

                        Michiel.Salters@logicacmg.com (Michiel Salters) writes:
                        > Barry Schwarz <schwarzb@deloz.net> wrote in message
                        > news:<c71c1f$7oj$2@216.39.134.69>...
                        > > On Sat, 01 May 2004 12:23:03 -0600, August Derleth <see@sig.now>
                        > > wrote:
                        [...]
                        > > >What, exactly, is considered a letter can vary by locale, but in the C
                        > > >locale any member of [A-Za-z] is considered alphabetic.
                        > >
                        > > In any locale, a letter is any character for which isalpha returns
                        > > true. While your regular expression is correct (because it does not
                        > > depend on representation), it may lead someone to believe that if 'A'
                        > > <= mychar <= 'Z' then mychar is a letter. On my system, there are
                        > > characters between 'I' and "J' and between 'R' and 'S' that are not
                        > > letters.
                        >
                        > What someone believes, based on a misinterpretaion of a regex can't be
                        > helped. The regex is well defined and doesn't include those other
                        > characters you refer to. Anyway, regex'es aren't C, not yet C++, and
                        > were used only as a shorthand.

                        <OT>
                        I understand the intent of the shorthand, but according to my
                        (limited) understanding of how regular expressions are defined, the
                        regexp [A-Za-z] covers all characters from 'A' to 'Z' and from 'a' to
                        'z' inclusive in the current (locale-dependent) collating sequence.
                        If that collating sequence happens to put non-letters between letters
                        (as it might on an EBCDIC system), the regexp could match non-letters.
                        That's why things like [:alpha:], [:lower:], and [:upper:] were
                        introduced.
                        </OT>

                        --
                        Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
                        San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
                        Schroedinger does Shakespeare: "To be *and* not to be"

