how to parse structured text file?

  • Petr Jakes

    how to parse structured text file?

    I have a file which contains data in the format shown in the sample
    below.
    How can I parse it to get the following:
    (14,trigger,guard,do_action,15)

    Thanks a lot for your postings
    Petr Jakes

    type:
    4
    bgrColor:
    255 255 255
    fgrColor:
    0 0 0
    objId:
    16
    Num.Pts:
    2
    177 104
    350 134
    objStartId:
    14
    objEndId:
    15
    eventName:
    trigger
    eventCond:
    guard
    eventAction:
    do_action

  • Paul McGuire

    #2
    Re: how to parse structured text file?

    A problem fit for pyparsing! Download pyparsing at
    http://pyparsing.sourceforge.net.

    Assuming you always have these fields, in this order, this program will
    figure them out. If not, you'll need to tweak the pyparsing
    definitions as needed.

    -- Paul


    data = """type:
    4
    bgrColor:
    255 255 255
    fgrColor:
    0 0 0
    objId:
    16
    Num.Pts:
    2
    177 104
    350 134
    objStartId:
    14
    objEndId:
    15
    eventName:
    trigger
    eventCond:
    guard
    eventAction:
    do_action
    """

    from pyparsing import *

    # define literals for field labels
    type_ = Literal("type")
    bgrColor = Literal("bgrColor")
    fgrColor = Literal("fgrColor")
    objId = Literal("objId")
    numPts = Literal("Num.Pts")
    objStartId = Literal("objStartId")
    objEndId = Literal("objEndId")
    eventName = Literal("eventName")
    eventCond = Literal("eventCond")
    eventAction = Literal("eventAction")

    # define an integer, and tell parser to convert them to ints
    intvalue = Word(nums).setParseAction( lambda s,l,toks: int(toks[0]) )

    # define an alphabetic identifier
    alphavalue = Word(alphas,alphanums+"_")

    # define a 2D coordinate, with results names for fields
    coordvalue = Group( intvalue.setResultsName("X") +
                        intvalue.setResultsName("Y") )

    # define an RGB color value, with results names for fields
    colorvalue = Group( intvalue.setResultsName("R") +
                        intvalue.setResultsName("G") +
                        intvalue.setResultsName("B") )

    # compose an entry definition, using above-defined expressions, with
    # results names for fields
    entry = ( type_ + ":" + "4" +
              bgrColor + ":" + colorvalue.setResultsName("bgrColor") +
              fgrColor + ":" + colorvalue.setResultsName("fgrColor") +
              objId + ":" + intvalue.setResultsName("objId") +
              numPts + ":" + Group( intvalue.setResultsName("numpts") +
                                    OneOrMore( coordvalue ).setResultsName("coords")
                                  ).setResultsName("pts") +
              objStartId + ":" + intvalue.setResultsName("objStartId") +
              objEndId + ":" + intvalue.setResultsName("objEndId") +
              eventName + ":" + alphavalue.setResultsName("eventName") +
              eventCond + ":" + alphavalue.setResultsName("eventCond") +
              eventAction + ":" + alphavalue.setResultsName("eventAction")
            )

    # scan through input data, and retrieve data fields as desired
    for entryData,start,end in entry.scanString(data):
        # single-argument print(...) works under both Python 2 and Python 3
        print("(%(objStartId)d,%(eventName)s,%(eventCond)s,%(eventAction)s,%(objEndId)d)"
              % entryData)
        print(entryData.objId)
        print(entryData.bgrColor)
        print(entryData.fgrColor)
        print([ (pt.X,pt.Y) for pt in entryData.pts.coords ])
        print([ tuple(pt) for pt in entryData.pts.coords ])


    Prints:
    (14,trigger,guard,do_action,15)
    16
    [255, 255, 255]
    [0, 0, 0]
    [(177, 104), (350, 134)]
    [(177, 104), (350, 134)]
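
    If the fields are not always present, or not always in this order, a
    plain-Python alternative may be enough. The sketch below is not part of
    the pyparsing solution above; it assumes every label sits alone on a
    line ending in ":" and that the values follow on the next line(s). The
    names parse_record and to_tuple are made up for illustration.

```python
def parse_record(text):
    """Collect 'label:' / value lines into a dict of label -> list of value lines."""
    record = {}
    label = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.endswith(":"):
            label = line[:-1]             # a new field starts here
            record[label] = []
        elif label is not None:
            record[label].append(line)    # value line for the current field
    return record


def to_tuple(record):
    """Build (objStartId, eventName, eventCond, eventAction, objEndId)."""
    return (
        int(record["objStartId"][0]),
        record["eventName"][0],
        record["eventCond"][0],
        record["eventAction"][0],
        int(record["objEndId"][0]),
    )


sample = """type:
4
bgrColor:
255 255 255
fgrColor:
0 0 0
objId:
16
Num.Pts:
2
177 104
350 134
objStartId:
14
objEndId:
15
eventName:
trigger
eventCond:
guard
eventAction:
do_action
"""

record = parse_record(sample)
print(to_tuple(record))

# The first value under Num.Pts is the point count; the rest are "x y" pairs.
points = [tuple(int(n) for n in p.split()) for p in record["Num.Pts"][1:]]
print(points)
```

    Because the values are looked up by label, the order of the fields in
    the file does not matter here. The trade-off is that nothing is
    validated: a missing label raises KeyError, and a file holding several
    records would need to be split into records first.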
