Ioannis Vranos wrote:
| ispunct() returns true for all symbols? (like <>/@^&#@ etc).
Caveat: You cross-posted this question to newsgroups that cover two different
computer languages. You may get two different answers, depending on which
language is described.
The ISO/IEC 9899:1999 draft for the ISO C'99 standard says of ispunct():
"The ispunct function tests for any printing character that is one of a
locale-specific set of punctuation characters for which neither isspace nor
isalnum is true. In the "C" locale, ispunct returns true for every printing
character for which neither isspace nor isalnum is true."
So, to answer your question, for ISO C'99, in the "C" locale, all symbols will
return true from ispunct, as they
a) are printing characters,
b) do not return true from isspace, and
c) do not return true from isalnum
Other locales may result in different values from C'99 ispunct for those characters.
Other levels of C standards compliance (e.g. C'90, K&R C) may result in
different values from ispunct for those characters.
Other languages may result in different values from ispunct for those characters.
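The "C" locale wording quoted above can be checked mechanically. What follows is only a minimal sketch: the helper name count_c_locale_mismatches is made up for illustration, and it assumes the program is still in the default "C" locale (no setlocale() call has been made).

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: counts the values in 0..UCHAR_MAX for which
   ispunct() disagrees with "printing, and neither isspace nor isalnum".
   Assumes the default "C" locale, where the C99 text quoted above
   implies that the two always agree. */
int count_c_locale_mismatches(void)
{
    int c;
    int mismatches = 0;

    for (c = 0; c <= UCHAR_MAX; c++)
    {
        int expected = isprint(c) && !isspace(c) && !isalnum(c);
        int actual = ispunct(c) != 0;

        if (expected != actual)
            mismatches++;
    }
    return mismatches;
}
```

A conforming C99 library should return 0 here in the "C" locale. In other locales the count may be non-zero, since ispunct is then only required to select a locale-specific subset.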
- --
Lew Pitcher
Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
"Ioannis Vranos" <ivr@guesswh.at .emails.ru> wrote in message
news:c709jc$2oo p$1@ulysses.noc .ntua.gr...
> ispunct() returns true for all symbols? (like <>/@^&#@ etc).
I would say yes.
It returns true for every printable character for which neither isspace()
nor isalnum() returns true. That is what the standard says.
I think of punctuators when I see ispunct(), but I don't know whether its name
is semantically related to them. The short program below shows which
printable characters make ispunct() return true and which make it
return false:
#include <stdio.h>
#include <ctype.h>

int main(void)
{
    /* Walk through the range of printable characters
       from 0x20 ' ' to 0x7E '~' in the 7-bit ASCII table.
       An int counter is used because the is*() functions take an int. */
    int c;

    for (c = 0x20; c <= 0x7E; c++)
    {
        printf("Is %c a printable char different from space or alphanum?"
               " %s\n", c, ispunct(c) ? "YES" : "NO");
    }
    return 0;
}
"Lew Pitcher" <lpitcher@sympa tico.ca> wrote in message
news:TwOkc.5683 9$OU.1339048@ne ws20.bellglobal .com...
> Caveat: You cross-posted this question to newsgroups that cover two different
> computer languages. You may get two different answers, depending on which
> language is described.
Yes, I know; however, I guessed that C99 ispunct() behaviour does not differ
from C++98 (and C90).
On Sat, 01 May 2004 16:44:13 +0300, Ioannis Vranos wrote:
> ispunct() returns true for all symbols? (like <>/@^&#@ etc).
From my manpage that shipped with gcc, ispunct() returns true for any
nonblank character that isn't a letter or a number. gcc says this
subroutine is conformant with ANSI-C.
What, exactly, is considered a letter can vary by locale, but in the C
locale any member of [A-Za-z] is considered alphabetic.
--
yvoregnevna gjragl-guerr gjb-gubhfnaq guerr ng lnubb qbg pbz
To email me, rot13 and convert spelled-out numbers to numeric form.
"Makes hackers smile" makes hackers smile.
On Sat, 01 May 2004 12:23:03 -0600, August Derleth <see@sig.now>
wrote:
>On Sat, 01 May 2004 16:44:13 +0300, Ioannis Vranos wrote:
>
>> ispunct() returns true for all symbols? (like <>/@^&#@ etc).
>
>From my manpage that shipped with gcc, ispunct() returns true for any
>nonblank character that isn't a letter or a number. gcc says this
>subroutine is conformant with ANSI-C.
There are a minimum of 256 possible values for a char. Blank is only
one. If we stick to the English alphabet, there are 52 letters and ten
digits, leaving at least 193 values for which your man page says ispunct
returns true. Unfortunately, the C99 standard says it must be a
printing character, which eliminates a significant number of these 193.
I see three possibilities:
You misquoted the man page.
The man page is less specific than it should be and therefore
misleading.
The man page is incorrect regarding compliance and therefore
misleading.
>
>What, exactly, is considered a letter can vary by locale, but in the C
>locale any member of [A-Za-z] is considered alphabetic.
In any locale, a letter is any character for which isalpha returns
true. While your regular expression is correct (because it does not
depend on representation), it may lead someone to believe that if 'A'
<= mychar <= 'Z' then mychar is a letter. On my system, there are
characters between 'I' and 'J' and between 'R' and 'S' that are not
letters.
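To make the range-test pitfall concrete, here is a small sketch. The helper name count_non_letters_in_A_to_Z is hypothetical and not taken from any post in this thread.

```c
#include <ctype.h>

/* Hypothetical helper: counts the character values lying between
   'A' and 'Z' in the execution character set that isalpha() rejects.
   The members of the basic character set are guaranteed non-negative,
   so every value in this range is a valid argument to isalpha(). */
int count_non_letters_in_A_to_Z(void)
{
    int c;
    int count = 0;

    for (c = 'A'; c <= 'Z'; c++)
    {
        if (!isalpha(c))
            count++;
    }
    return count;
}
```

On an ASCII system in the "C" locale this should return 0, because the upper-case letters are contiguous there. On a system with gaps in the alphabet, such as the one described above, it would be non-zero, which is why isalpha() is the portable test.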
"Barry Schwarz" <schwarzb@deloz .net> wrote in message
news:c71c1f$7oj $2@216.39.134.6 9...
>
> There are a minimum of 256 possible values for a char.
We must note here that (plain) char may be either of type signed char or
unsigned char, and if it is signed char the negative values are useless
here.
> Blank is only
> 1. If we stick to the English alphabet, there are 52 letters and ten
> digits leaving at least 193 values for which your man page says ispunct
> returns true. Unfortunately, the C99 standard says it must be a
> printing character which eliminates a significant number of these 193.
But that is OK with me, since I want to use the (printable) keyboard symbols of
the ASCII table and filter out the rest, i.e. the letters and digits.
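A filter of that kind might look like the sketch below. The helper name collect_ascii_punct is invented for illustration, and the code assumes an ASCII execution character set and the "C" locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies every character in 0x20..0x7E for which
   ispunct() is true into out, appends a terminating '\0', and returns
   the number of characters stored. out must have room for at least
   96 bytes. Assumes an ASCII execution character set. */
size_t collect_ascii_punct(char *out)
{
    size_t n = 0;
    int c;

    for (c = 0x20; c <= 0x7E; c++)
    {
        if (ispunct(c))
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

With ASCII and the "C" locale, the 95 printable characters minus the space and the 62 letters and digits leave 32 symbols, from '!' to '~'.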
"Ioannis Vranos" <ivr@guesswh.at .emails.ru> writes:
> "Barry Schwarz" <schwarzb@deloz .net> wrote in message
> news:c71c1f$7oj $2@216.39.134.6 9...
> >
> > There are a minimum of 256 possible values for a char.
>
> We must note here that (plain) char may be either of type signed char or
> unsigned char, and if it is signed char the negative values are useless
> here.
A quibble: (plain) char has the same characteristics as either signed
char or unsigned char, but it's a distinct type.
--
Keith Thompson (The_Other_Keit h) kst-u@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
"Ioannis Vranos" <ivr@guesswh.at .emails.ru> wrote:
> "Barry Schwarz" <schwarzb@deloz .net> wrote in message
> news:c71c1f$7oj $2@216.39.134.6 9...
> >
> > There are a minimum of 256 possible values for a char.
>
> We must note here that (plain) char may be either of type signed char or
> unsigned char, and if it is signed char the negative values are useless
> here.
True as such, but all is*()s take an int having the value of an unsigned
char (or EOF), not a signed or plain char.
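In practice that means converting a plain char to unsigned char before the call. A minimal sketch, with a hypothetical wrapper name is_punct_char:

```c
#include <ctype.h>

/* Hypothetical wrapper: converts a plain char to unsigned char before
   calling ispunct(), so that a negative char value is never passed in.
   Returns 1 for punctuation and 0 otherwise. */
int is_punct_char(char ch)
{
    return ispunct((unsigned char)ch) != 0;
}
```

The cast matters only where plain char is signed and the value is negative. For the characters of the basic character set it changes nothing.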
Barry Schwarz <schwarzb@deloz .net> wrote in message news:<c71c1f$7o j$2@216.39.134. 69>...
> On Sat, 01 May 2004 12:23:03 -0600, August Derleth <see@sig.now>
> wrote:
>
> >On Sat, 01 May 2004 16:44:13 +0300, Ioannis Vranos wrote:
> >
> >> ispunct() returns true for all symbols? (like <>/@^&#@ etc).
> >
> >From my manpage that shipped with gcc, ispunct() returns true for any
> >nonblank character that isn't a letter or a number. gcc says this
> >subroutine is conformant with ANSI-C.
>
> There are a minimum of 256 possible values for a char. Blank is only
> 1.
\t isn't blank?
> If we stick to the English alphabet, there are 52 letters and ten
> digits leaving at least 193 values for which your man page says ispunct
> returns true. Unfortunately, the C99 standard says it must be a
> printing character which eliminates a significant number of these 193.
At least \0 must be eliminated, obviously. That can never be a printing
character. I don't understand the "Unfortunately": do you mean to
imply that ispunct('\0') should be true?
> I see three possibilities:
>
> You misquoted the man page.
>
> The man page is less specific than it should be and therefore
> misleading.
>
> The man page is incorrect regarding compliance and therefore
> misleading.
I think it's the second, but it's really nit picking. The only word
missing is non-printing, and that may even be dropped in the quote.
> >What, exactly, is considered a letter can vary by locale, but in the C
> >locale any member of [A-Za-z] is considered alphabetic.
>
> In any locale, a letter is any character for which isalpha returns
> true. While your regular expression is correct (because it does not
> depend on representation), it may lead someone to believe that if 'A'
> <= mychar <= 'Z' then mychar is a letter. On my system, there are
> characters between 'I' and 'J' and between 'R' and 'S' that are not
> letters.
What someone believes, based on a misinterpretation of a regex, can't be
helped. The regex is well defined and doesn't include those other
characters you refer to. Anyway, regexes aren't C, not yet C++, and
were used only as a shorthand.
Michiel.Salters @logicacmg.com (Michiel Salters) writes:
> Barry Schwarz <schwarzb@deloz .net> wrote in message
> news:<c71c1f$7o j$2@216.39.134. 69>...
> > On Sat, 01 May 2004 12:23:03 -0600, August Derleth <see@sig.now>
> > wrote:
[...]
> > >What, exactly, is considered a letter can vary by locale, but in the C
> > >locale any member of [A-Za-z] is considered alphabetic.
> >
> > In any locale, a letter is any character for which isalpha returns
> > true. While your regular expression is correct (because it does not
> > depend on representation), it may lead someone to believe that if 'A'
> > <= mychar <= 'Z' then mychar is a letter. On my system, there are
> > characters between 'I' and 'J' and between 'R' and 'S' that are not
> > letters.
>
> What someone believes, based on a misinterpretation of a regex, can't be
> helped. The regex is well defined and doesn't include those other
> characters you refer to. Anyway, regexes aren't C, not yet C++, and
> were used only as a shorthand.
<OT>
I understand the intent of the shorthand, but according to my
(limited) understanding of how regular expressions are defined, the
regexp [A-Za-z] covers all characters from 'A' to 'Z' and from 'a' to
'z' inclusive in the current (locale-dependent) collating sequence.
If that collating sequence happens to put non-letters between letters
(as it might on an EBCDIC system), the regexp could match non-letters.
That's why things like [:alpha:], [:lower:], and [:upper:] were
introduced.
</OT>