syntax/notation used in describing c's grammar

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • ben

    syntax/notation used in describing c's grammar

    getting a bit confused with the details of how c's grammar is
    specified, especially when you get self-reference like in this:

    postfix-expression:
    primary-expression
    postfix-expression [ expression ]
    postfix-expression ( argument-expression-listopt )
    postfix-expression . identifier
    postfix-expression -> identifier
    postfix-expression ++
    postfix-expression --
    ( type-name ) { initializer-list }
    ( type-name ) { initializer-list , }

    (copied and pasted from a ISO/IEC 9899 (that's "C99" i think)
    specification pdf)

    when a module (i'm thinking of the above whole thing as a module) is
    referred to from within itself like in the above one, at the start of
    one or several of its possibilities, that means that one of the other
    variations/possibilities of the postfix-expression module must occur
    immediately before one of the possibilities that start with
    'postfix-expression' can occur right? so you can have a '++' only after
    one of the three lines which do not begin with 'postfix-expression'
    have occured? or another way of getting at the same point: if you had a
    module, say called 'xyz', whose possibilities all started with 'xyz',
    then that module would be a waste of time and impossible and useless
    because none of it could ever occur? at least one of the possibilities
    should *not* start with 'xyz' -- then it would be possible for it to
    occur? is that right?

    also, does that give potentially endless looping like behaviour -- not
    just one possibility that starts with 'postfix-expression' but
    several/many even possibly? eg, a primary-expression occurs, so then
    that primary-expression stands for a postfix-expression. then say [
    expression ] occurs after the primary expression. (so what's occured so
    far is: primary-expression [ expression ] ). what's occured so far is a
    postfix-expression, so can another [ expression ] follow on, or even
    anything else that has 'postfix-expression' before it in the above
    list? and then again possibly, and again etc?

    so, just to clarify, you could have these occurances, for example,
    within the code and it'd be fine? (at least according to the grammar
    that is, it'd be fine?) (one line per part):

    ( type-name ) { initializer-list }
    ++
    ( argument-expression-listopt )
    -> identifier
    -> identifier

    (i guess that couldn't actually happen in code (according to further
    checks by compiler/parser or whatever), but so far as what the above
    postfix-expression grammar specification goes, that's ok?)

    so the logic of implementing that in code, you'd need to look back to
    the previous occurance when a module name is encountered? -- but that's
    only applicable to modules self-referencing themselves isn't it? that's
    what i can't quite understand. when you come accross a module name that
    isn't self referencing, there's no need to look back -- it's just a
    straight, 'does it occur here' question, but when it's a self reference
    it seems to be a 'did it occur last time' question. i feel there
    shouldn't be any logical difference between a module using it's own
    module name, and a module using another module's name, but it does seem
    a different logic -- am i just getting confused or is a slightly
    different logic required for modules references to themselves?

    btw, i'm attempting to do a c, also objective-c, (semi) parser --
    that's why i'm trying to get on top of the grammar syntax.

    one last quickie: this line:
    ( type-name ) { initializer-list , }
    it looks like a ',' can occur followd by a '}' ? surely i'm reading
    that wrong as that isn't possible? feels like there should be something
    inbetween the , and }?

    thanks-a-lot, ben
  • Ben Pfaff

    #2
    Re: syntax/notation used in describing c's grammar

    ben <x@x.com> writes:
    [color=blue]
    > postfix-expression:
    > primary-expression
    > postfix-expression [ expression ]
    > postfix-expression ( argument-expression-listopt )
    > postfix-expression . identifier
    > postfix-expression -> identifier
    > postfix-expression ++
    > postfix-expression --
    > ( type-name ) { initializer-list }
    > ( type-name ) { initializer-list , }
    >
    > (copied and pasted from a ISO/IEC 9899 (that's "C99" i think)
    > specification pdf)
    >
    > when a module (i'm thinking of the above whole thing as a module)[/color]

    The proper term is "nontermina l".
    [color=blue]
    > is referred to from within itself like in the above one, at the
    > start of one or several of its possibilities, that means that
    > one of the other variations/possibilities of the
    > postfix-expression module must occur immediately before one of
    > the possibilities that start with 'postfix-expression' can
    > occur right?[/color]

    No. There is no such requirement.
    [color=blue]
    > so you can have a '++' only after one of the three lines which
    > do not begin with 'postfix-expression' have occured?[/color]

    Nope. something like a->b++ is potentially valid syntax.
    [color=blue]
    > or another way of getting at the same point: if you had a
    > module, say called 'xyz', whose possibilities all started with
    > 'xyz', then that module would be a waste of time and impossible
    > and useless because none of it could ever occur? at least one
    > of the possibilities should *not* start with 'xyz' -- then it
    > would be possible for it to occur? is that right?[/color]

    Yes: to be a sensible grammar, at least one of the productions
    for a nonterminal mentioned in a grammar must not contain itself
    on the right-hand side.
    [color=blue]
    > also, does that give potentially endless looping like behaviour -- not
    > just one possibility that starts with 'postfix-expression' but
    > several/many even possibly? eg, a primary-expression occurs, so then
    > that primary-expression stands for a postfix-expression. then say [
    > expression ] occurs after the primary expression. (so what's occured so
    > far is: primary-expression [ expression ] ). what's occured so far is a
    > postfix-expression, so can another [ expression ] follow on, or even
    > anything else that has 'postfix-expression' before it in the above
    > list? and then again possibly, and again etc?[/color]

    Yes, you can loop for a long time. But a real program is not
    infinite in length, and every real C implementation has limits on
    how big or complex a program it can accept.
    [color=blue]
    > so, just to clarify, you could have these occurances, for example,
    > within the code and it'd be fine? (at least according to the grammar
    > that is, it'd be fine?) (one line per part):
    >
    > ( type-name ) { initializer-list }
    > ++
    > ( argument-expression-listopt )
    > -> identifier
    > -> identifier
    >
    > (i guess that couldn't actually happen in code (according to further
    > checks by compiler/parser or whatever), but so far as what the above
    > postfix-expression grammar specification goes, that's ok?)[/color]

    Right. The above is syntactically valid but semantically
    unlikely to be valid.
    [color=blue]
    > so the logic of implementing that in code, you'd need to look back to
    > the previous occurance when a module name is encountered? -- but that's
    > only applicable to modules self-referencing themselves isn't it? that's
    > what i can't quite understand. when you come accross a module name that
    > isn't self referencing, there's no need to look back -- it's just a
    > straight, 'does it occur here' question, but when it's a self reference
    > it seems to be a 'did it occur last time' question. i feel there
    > shouldn't be any logical difference between a module using it's own
    > module name, and a module using another module's name, but it does seem
    > a different logic -- am i just getting confused or is a slightly
    > different logic required for modules references to themselves?[/color]

    Parsers are somewhat complex beasts. It's not always easy to get
    them to be correct. It's far more difficult to get them correct
    if you don't have the right knowledge to write them. The best
    place to get that knowledge is probably a compilers textbook.
    Have you tried reading one?

    Alternatively, you could use a "parser generator" such as Bison
    or yacc.
    [color=blue]
    > btw, i'm attempting to do a c, also objective-c, (semi) parser --
    > that's why i'm trying to get on top of the grammar syntax.
    >
    > one last quickie: this line:
    > ( type-name ) { initializer-list , }
    > it looks like a ',' can occur followd by a '}' ? surely i'm reading
    > that wrong as that isn't possible? feels like there should be something
    > inbetween the , and }?[/color]

    No, in many places in C a trailing comma is allowed in a list of
    comma-separated items. It makes it easier to rearrange things in
    the list (when they're one per line) if you don't have to deal
    with an exceptional comma after the last item. It can also
    simplify programs that generate C.
    --
    "For those who want to translate C to Pascal, it may be that a lobotomy
    serves your needs better." --M. Ambuhl

    "Here are the steps to create a C-to-Turbo-Pascal translator..." --H. Schildt

    Comment

    • ben

      #3
      Re: syntax/notation used in describing c's grammar

      hiyer Ben,

      thanks very much for the reply.

      In article <873c2nvwgi.fsf @benpfaff.org>, Ben Pfaff
      <blp@cs.stanfor d.edu> wrote:
      [color=blue]
      > ben <x@x.com> writes:
      >[color=green]
      > > postfix-expression:
      > > primary-expression
      > > postfix-expression [ expression ]
      > > postfix-expression ( argument-expression-listopt )
      > > postfix-expression . identifier
      > > postfix-expression -> identifier
      > > postfix-expression ++
      > > postfix-expression --
      > > ( type-name ) { initializer-list }
      > > ( type-name ) { initializer-list , }[/color][/color]
      [color=blue][color=green]
      > > when a[/color]
      > "nontermina l"[color=green]
      > > is referred to from within itself like in the above one, at the
      > > start of one or several of its possibilities, that means that
      > > one of the other variations/possibilities of the
      > > postfix-expression module must occur immediately before one of
      > > the possibilities that start with 'postfix-expression' can
      > > occur right?[/color]
      >
      > No. There is no such requirement.[/color]

      are you sure? i think i might have just worded that wrongly because
      what's said later, and you agree to, contradics your above answer -- i
      think i might have just worded the above part badly/misleadingly.[color=blue][color=green]
      > > so you can have a '++' only after one of the three lines which
      > > do not begin with 'postfix-expression' have occured?[/color]
      > Nope. something like a->b++ is potentially valid syntax.[/color]

      basically what i'm saying (checking on that i've understood the
      "postfix-expression:" grammer specification) is this:

      on the first reading/occurance of a postfix-expression part, only one
      of these three possibilities are possible:
      primary-expression
      ( type-name ) { initializer-list }
      ( type-name ) { initializer-list , }

      then, once one of those three has occured, one of these things can
      occur and follow on possibly :
      [ expression ]
      ( argument-expression-listopt )
      . identifier
      -> identifier
      ++
      --
      (and none of those would be possible if one of the three things in the
      first list hadn't happend).
      and then that second list above can be repeated over and over possibly,
      until something else happens.

      [color=blue][color=green]
      > > or another way of getting at the same point: if you had a
      > > module, say called 'xyz', whose possibilities all started with
      > > 'xyz', then that module would be a waste of time and impossible
      > > and useless because none of it could ever occur? at least one
      > > of the possibilities should *not* start with 'xyz' -- then it
      > > would be possible for it to occur? is that right?[/color]
      >
      > Yes: to be a sensible grammar, at least one of the productions
      > for a nonterminal mentioned in a grammar must not contain itself
      > on the right-hand side.[/color]

      shouldn't that be left-hand side?
      [color=blue]
      > Parsers are somewhat complex beasts. It's not always easy to get
      > them to be correct. It's far more difficult to get them correct
      > if you don't have the right knowledge to write them. The best
      > place to get that knowledge is probably a compilers textbook.
      > Have you tried reading one?
      >
      > Alternatively, you could use a "parser generator" such as Bison
      > or yacc.[/color]

      generally, so far my parser is going ok. i think i'm ok doing it the
      way i am (so far that is). i think learning yacc or bison would take
      longer and be harder than the way i'm going. just needed to check on
      the grammar of the grammar as it were. i don't think i need a fully
      fledged parser, although saying that, i'm not even sure if a
      semi-parser is possible or easier -- it may turn out that you either
      parse, or you don't, or that letting somethings go, being flexible
      won't be easier. i'm not sure at the moment. anyway i'm rambling. if it
      is possible and easier to do a semi-parser than a full parser, that's
      what i'm going to do.

      and it does look like the logic of a nonterminal reference to the
      nonterminal that that reference is in, is a slightly different logic to
      a nonterminal reference to another nonterminal -- in that the self
      reference needs to look back at the last thing that occured rather than
      the thing that's occuring now. not completely sure on that, but
      anyway...
      [color=blue][color=green]
      > > one last quickie: this line:
      > > ( type-name ) { initializer-list , }
      > > it looks like a ',' can occur followd by a '}' ? surely i'm reading
      > > that wrong as that isn't possible? feels like there should be something
      > > inbetween the , and }?[/color]
      >
      > No, in many places in C a trailing comma is allowed in a list of
      > comma-separated items. It makes it easier to rearrange things in
      > the list (when they're one per line) if you don't have to deal
      > with an exceptional comma after the last item. It can also
      > simplify programs that generate C.[/color]

      great. well i never knew that.

      thans very much for the info -- most useful.

      ben.

      Comment

      • Ben Pfaff

        #4
        Re: syntax/notation used in describing c's grammar

        ben <x@x.com> writes:
        [color=blue]
        > In article <873c2nvwgi.fsf @benpfaff.org>, Ben Pfaff
        > <blp@cs.stanfor d.edu> wrote:
        >[color=green]
        >> ben <x@x.com> writes:
        >>[color=darkred]
        >> > postfix-expression:
        >> > primary-expression
        >> > postfix-expression [ expression ]
        >> > postfix-expression ( argument-expression-listopt )
        >> > postfix-expression . identifier
        >> > postfix-expression -> identifier
        >> > postfix-expression ++
        >> > postfix-expression --
        >> > ( type-name ) { initializer-list }
        >> > ( type-name ) { initializer-list , }[/color][/color]
        >[color=green][color=darkred]
        >> > when a[/color]
        >> "nontermina l"[color=darkred]
        >> > is referred to from within itself like in the above one, at the
        >> > start of one or several of its possibilities, that means that
        >> > one of the other variations/possibilities of the
        >> > postfix-expression module must occur immediately before one of
        >> > the possibilities that start with 'postfix-expression' can
        >> > occur right?[/color]
        >>
        >> No. There is no such requirement.[/color]
        >
        > are you sure? i think i might have just worded that wrongly because
        > what's said later, and you agree to, contradics your above answer -- i
        > think i might have just worded the above part badly/misleadingly.[/color]

        Well, let's see.
        [color=blue][color=green][color=darkred]
        >> > so you can have a '++' only after one of the three lines which
        >> > do not begin with 'postfix-expression' have occured?[/color]
        >> Nope. something like a->b++ is potentially valid syntax.[/color]
        >
        > basically what i'm saying (checking on that i've understood the
        > "postfix-expression:" grammer specification) is this:
        >
        > on the first reading/occurance of a postfix-expression part, only one
        > of these three possibilities are possible:
        > primary-expression
        > ( type-name ) { initializer-list }
        > ( type-name ) { initializer-list , }[/color]

        The first token (terminal) in a postfix-expression has to come
        from one of those productions, yes.
        [color=blue]
        > then, once one of those three has occured, one of these things can
        > occur and follow on possibly :
        > [ expression ]
        > ( argument-expression-listopt )
        > . identifier
        > -> identifier
        > ++
        > --
        > (and none of those would be possible if one of the three things in the
        > first list hadn't happend).
        > and then that second list above can be repeated over and over possibly,
        > until something else happens.[/color]

        Yes, that's one way to look at it.
        [color=blue][color=green][color=darkred]
        >> > or another way of getting at the same point: if you had a
        >> > module, say called 'xyz', whose possibilities all started with
        >> > 'xyz', then that module would be a waste of time and impossible
        >> > and useless because none of it could ever occur? at least one
        >> > of the possibilities should *not* start with 'xyz' -- then it
        >> > would be possible for it to occur? is that right?[/color]
        >>
        >> Yes: to be a sensible grammar, at least one of the productions
        >> for a nonterminal mentioned in a grammar must not contain itself
        >> on the right-hand side.[/color]
        >
        > shouldn't that be left-hand side?[/color]

        Each production has the form L -> R, where L is a single
        nonterminal and R is a string of terminals and nonterminals.
        Here everything in R is on the right-hand side.
        [color=blue]
        > generally, so far my parser is going ok. i think i'm ok doing it the
        > way i am (so far that is). i think learning yacc or bison would take
        > longer and be harder than the way i'm going. just needed to check on
        > the grammar of the grammar as it were. i don't think i need a fully
        > fledged parser, although saying that, i'm not even sure if a
        > semi-parser is possible or easier -- it may turn out that you either
        > parse, or you don't, or that letting somethings go, being flexible
        > won't be easier. i'm not sure at the moment. anyway i'm rambling. if it
        > is possible and easier to do a semi-parser than a full parser, that's
        > what i'm going to do.[/color]

        Well, good luck. I still think you'd be better off getting a
        good book on grammars and parsers, whether you decide to write
        your own parser or use someone else's parser generator.
        --
        "...Almost makes you wonder why Heisenberg didn't include postinc/dec operators
        in the uncertainty principle. Which of course makes the above equivalent to
        Schrodinger's pointer..."
        --Anthony McDonald

        Comment

        • ben

          #5
          Re: syntax/notation used in describing c's grammar

          In article <874qn1vl2a.fsf @benpfaff.org>, Ben Pfaff
          <blp@cs.stanfor d.edu> wrote:

          [color=blue]
          > Yes, that's one way to look at it.[/color]


          cool. thanks for the info.

          ben.

          Comment

          Working...