Design for parsing and translating into a new syntax

  • koren99
    New Member
    • Aug 2007
    • 9

    Design for parsing and translating into a new syntax

    Hi,

    I am having problems creating a good model for this,
    and I'd appreciate any help.

    Requirements:
    1. Parse a predefined language such as arithmetic (4+5-7*9...), with the ability to extend it or add more implementations in the future.
    2. Be able to translate the object tree created during the parse into a string or object in a new language. For example, if I parse an arithmetic string and have two calculator apps that receive input in two different ways, I want to be able to send the command to both.

    What I did:
    I created a set of objects such as NumericValueToken, OperatorToken, BeginBracketToken and EndBracketToken. Each has a parse method that, when given input, checks its validity and keeps the value in some way.

    I created an ArethmaticParser object that will contain a valid tree of the objects above after a successful translation.

    This is where the problems started for me:

    I tried using the Visitor pattern for this.
    In order to translate the tree into a different language each time, I created a class hierarchy called LanguageContext, which is the visitor. It contains a method called Translate that takes a Token object and translates it.
    Each Token has a Translate method that receives a LanguageContext.

    The parser calls Translate on the first token, and the string is returned through the tree.

    The problem:
    Each Token requires a different translation, so for each context I wanted to create a set of Translate methods: one for OperatorToken, one for ValueToken, etc.,
    with each token going to the appropriate method according to its type.
    That does not work, because they all go to the base class's method,
    since they are kept in a generic baseToken tree.

    Another problem is that the context is also passed as the abstract or base type, so the methods used are from the base and not from the real object.

    I know these things because I tried a little test.
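
    A minimal sketch of what such a test might look like, written in Java as an assumption (the original code was not posted, and every class and method name below is illustrative). It shows why the type-specific overloads are never reached: the overload is chosen at compile time from the declared type of the argument, which here is the base Token.

```java
import java.util.List;

// Hypothetical token hierarchy; names are illustrative only.
abstract class Token {}
class NumericValueToken extends Token {}
class OperatorToken extends Token {}

class LanguageContext {
    // Overloads are resolved at compile time from the declared argument type.
    String translate(Token t)             { return "base"; }
    String translate(NumericValueToken t) { return "number"; }
    String translate(OperatorToken t)     { return "operator"; }
}

class OverloadDemo {
    static String run() {
        LanguageContext ctx = new LanguageContext();
        // The tokens are stored as base Token references, as in the parser's tree.
        List<Token> tokens = List.of(new NumericValueToken(), new OperatorToken());
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) {
            // t is declared as Token, so translate(Token) is selected every time.
            out.append(ctx.translate(t)).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "base base"
    }
}
```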

    Can anyone recommend a different approach, or a fix for the problem?

    Thanks,
    Koren Shoval
  • JosAH
    Recognized Expert MVP
    • Mar 2007
    • 11453

    #2
    So basically your visitor pattern is broken because each token should simply report
    back to the visitor which token it actually is and the visitor should handle it from there.
    A Java example:

    [code=java]
    // The visitor declares one callback per concrete token type.
    interface Visitor {
        void leftBracketToken(LeftBracketToken visitee);
        void rightBracketToken(RightBracketToken visitee);
        void plusToken(PlusToken visitee);
        void minusToken(MinusToken visitee);
        void numberToken(NumberToken visitee);
        // etc. etc.
    }

    // Every token implements this and calls back the matching Visitor method.
    interface Visitee {
        void visit(Visitor visitor);
    }
    [/code]

    An implementation of the Visitor visits a Visitee, which reveals what it actually is. All
    the different tokens should implement this Visitee interface. This pattern is also
    called the 'double dispatch' pattern because two 'hops' are needed to get to the
    point where something appropriate can be done: 1) the visitor calls the visitee, 2) the
    visitee calls back on the visitor.
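
    A small sketch of how the two hops might look in practice, reduced to three token types. The two translator classes and the walk helper are hypothetical, made up only to show that one token list can be rendered in two target formats without any type checks in the loop.

```java
import java.util.List;

interface Visitor {
    void plusToken(PlusToken visitee);
    void minusToken(MinusToken visitee);
    void numberToken(NumberToken visitee);
}

interface Visitee {
    void visit(Visitor visitor);
}

// Each token knows its own type and reports it back to the visitor (hop 2).
class PlusToken implements Visitee {
    public void visit(Visitor visitor) { visitor.plusToken(this); }
}
class MinusToken implements Visitee {
    public void visit(Visitor visitor) { visitor.minusToken(this); }
}
class NumberToken implements Visitee {
    final int value;
    NumberToken(int value) { this.value = value; }
    public void visit(Visitor visitor) { visitor.numberToken(this); }
}

// Hypothetical target format 1: compact infix text.
class InfixTranslator implements Visitor {
    final StringBuilder out = new StringBuilder();
    public void plusToken(PlusToken t)     { out.append('+'); }
    public void minusToken(MinusToken t)   { out.append('-'); }
    public void numberToken(NumberToken t) { out.append(t.value); }
}

// Hypothetical target format 2: operators spelled out as words.
class WordTranslator implements Visitor {
    final StringBuilder out = new StringBuilder();
    public void plusToken(PlusToken t)     { out.append(" plus "); }
    public void minusToken(MinusToken t)   { out.append(" minus "); }
    public void numberToken(NumberToken t) { out.append(t.value); }
}

class DoubleDispatchDemo {
    // Hop 1: hand the visitor to each token; no instanceof or casts needed.
    static void walk(List<? extends Visitee> tokens, Visitor visitor) {
        for (Visitee token : tokens) {
            token.visit(visitor);
        }
    }
}
```

    Walking the tokens for 4+5-7 with an InfixTranslator leaves "4+5-7" in its buffer, and the same list with a WordTranslator gives "4 plus 5 minus 7".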

    kind regards,

    Jos


    • koren99
      New Member
      • Aug 2007
      • 9

      #3
      Thanks for the input. It's probably what I'll do.

      But can you or someone else advise whether there is a better approach / design that would fit?
      I mean a design that would not force me to detect which token is being parsed.
      This is a problem because the parser contains a tree of tokens created according to the user's input, and I need to run the visitor on all of them without adding logic to detect what type of token I send to the visitor.
      (They are stored as base tokens in a list.)


      • JosAH
        Recognized Expert MVP
        • Mar 2007
        • 11453

        #4
        If your token set is fixed, a visitor pattern is sort of ideal: let the visitor do with
        those tokens whatever it wants. You can write a new visitor whenever it's needed.

        OTOH if the token set isn't fixed (built-in functions and identifiers come to mind,
        as in 'sin(x+y)'), a visitor pattern is hopeless because you have to rewrite
        (add to) every visitor whenever a new token type sees the light.

        Scanning and parsing each token up front remains ideal because you don't want
        to determine the type of a token over and over again. Building a syntax tree out
        of the token stream is fine too because you don't want to determine the syntax
        of a stream of tokens over and over again.
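
        As a rough illustration of scanning up front, here is a tiny scanner sketch that classifies each token of an arithmetic string exactly once. The names Kind, Tok and ArithmeticScanner are hypothetical, and the token set (non-negative integers, + - *, brackets) is an assumption based on the examples in this thread.

```java
import java.util.ArrayList;
import java.util.List;

enum Kind { NUMBER, PLUS, MINUS, TIMES, LEFT_BRACKET, RIGHT_BRACKET }

// A token carries its kind, so later stages never have to re-inspect the text.
final class Tok {
    final Kind kind;
    final String text;
    Tok(Kind kind, String text) {
        this.kind = kind;
        this.text = text;
    }
}

class ArithmeticScanner {
    static List<Tok> scan(String input) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (Character.isDigit(c)) {
                // Consume the whole run of digits as one NUMBER token.
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Tok(Kind.NUMBER, input.substring(start, i)));
                continue;
            }
            Kind kind;
            switch (c) {
                case '+': kind = Kind.PLUS; break;
                case '-': kind = Kind.MINUS; break;
                case '*': kind = Kind.TIMES; break;
                case '(': kind = Kind.LEFT_BRACKET; break;
                case ')': kind = Kind.RIGHT_BRACKET; break;
                default:
                    throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at position " + i);
            }
            tokens.add(new Tok(kind, String.valueOf(c)));
            i++;
        }
        return tokens;
    }
}
```

        Scanning "4+5-7*9" this way yields seven tokens, and the parser can then build its tree from them without looking at raw characters again.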

        It's the translation of one AST (Abstract Syntax Tree) to another representation
        that's bothering you, but an AST is an ideal 'blueprint' representation of the original
        input stream. It's just the set of acceptable tokens you have to deal with, but you
        already stated that you're dealing with a fixed input language. Simply define that
        one as general as possible and fix it.
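
        To make the AST translation concrete, here is a sketch with a fixed set of two node types and two visitors that each produce a different target representation. All names (Node, NodeVisitor, Num, Binary, ToInfix, ToPostfix, AstDemo) are hypothetical, and the tree is built by hand here rather than by a parser.

```java
// A visitor that returns a result of type R for each node type.
interface NodeVisitor<R> {
    R number(Num n);
    R binary(Binary b);
}

interface Node {
    <R> R accept(NodeVisitor<R> visitor);
}

final class Num implements Node {
    final int value;
    Num(int value) { this.value = value; }
    public <R> R accept(NodeVisitor<R> visitor) { return visitor.number(this); }
}

final class Binary implements Node {
    final char op;
    final Node left;
    final Node right;
    Binary(char op, Node left, Node right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }
    public <R> R accept(NodeVisitor<R> visitor) { return visitor.binary(this); }
}

// Target 1: fully bracketed infix text.
class ToInfix implements NodeVisitor<String> {
    public String number(Num n) { return Integer.toString(n.value); }
    public String binary(Binary b) {
        return "(" + b.left.accept(this) + " " + b.op + " " + b.right.accept(this) + ")";
    }
}

// Target 2: postfix (reverse Polish) text.
class ToPostfix implements NodeVisitor<String> {
    public String number(Num n) { return Integer.toString(n.value); }
    public String binary(Binary b) {
        return b.left.accept(this) + " " + b.right.accept(this) + " " + b.op;
    }
}

class AstDemo {
    // Hand-built tree for 4+5-7*9, with * binding tighter than + and -.
    static Node sample() {
        return new Binary('-',
            new Binary('+', new Num(4), new Num(5)),
            new Binary('*', new Num(7), new Num(9)));
    }

    public static void main(String[] args) {
        Node ast = sample();
        System.out.println(ast.accept(new ToInfix()));   // ((4 + 5) - (7 * 9))
        System.out.println(ast.accept(new ToPostfix())); // 4 5 + 7 9 * -
    }
}
```

        Supporting another calculator format would then mean writing one more NodeVisitor, with no change to the node classes.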

        kind regards,

        Jos


        • koren99
          New Member
          • Aug 2007
          • 9

          #5
          Thanks, your posts were very helpful.
          I'll do just that.
