regular expression extracting groups

  • clawsicus@gmail.com

    regular expression extracting groups

    Hi list,

    I'm trying to use regular expressions to help me quickly extract the
    contents of messages that my application will receive. I have worked
    out most of the regex but the last section of the message has me
    stumped. This is mostly because I want to pull the content out into
    regex groups that I can easily access later. I have a regex to extract
    the key/value pairs but it ends up with only the contents of the last
    key/value pair encountered.

    An example of the section of the message that is troubling me appears
    like this:

    {
    option=value
    foo=bar
    another=42
    option=7
    }

    So it's basically a bunch of lines. Every line is terminated with a
    '\n' character. The number of key/value fields changes depending on
    the particular message. Also notice that there are two 'option' keys.
    This is allowable and I need to cater for it.


    A couple of example messages are:
    xpl-stat\n{\nhop=1\nsource=vendor-device.instance\ntarget=*\n}\nhbeat.basic\n{\ninterval=10\n}\n

    xpl-stat\n{\nhop=1\nsource=vendor-device.instance\ntarget=vendor-device.instance\n}\nconfig.list\n{\nreconf=newconf\noption=interval\noption=group[16]\noption=filter[16]\n}\n


    As all messages follow the same pattern, I'm hoping to develop one
    generic regex that can pull a message from a received packet, instead
    of a separate regex for each message kind (there are many kinds).



    The regex I came up with looks like this:
    # This should match any xPL message

    import re

    GROUP_MESSAGE_TYPE = 'message_type'
    GROUP_HOP = 'hop'
    GROUP_SOURCE = 'source'
    GROUP_TARGET = 'target'
    GROUP_SRC_VENDOR_ID = 'source_vendor_id'
    GROUP_SRC_DEVICE_ID = 'source_device_id'
    GROUP_SRC_INSTANCE_ID = 'source_instance_id'
    GROUP_TGT_VENDOR_ID = 'target_vendor_id'
    GROUP_TGT_DEVICE_ID = 'target_device_id'
    GROUP_TGT_INSTANCE_ID = 'target_instance_id'
    GROUP_IDENTIFIER_TYPE = 'identifier_type'
    GROUP_SCHEMA = 'schema'
    GROUP_SCHEMA_CLASS = 'schema_class'
    GROUP_SCHEMA_TYPE = 'schema_type'
    GROUP_OPTION_KEY = 'key'
    GROUP_OPTION_VALUE = 'value'


    XplMessageGroupsRe = r'''(?P<%s>xpl-(cmnd|stat|trig))\n    # message type
    \{\n
    hop=(?P<%s>[1-9]{1})\n    # hop count
    source=(?P<%s>(?P<%s>[a-z0-9]{1,8})-(?P<%s>[a-z0-9]{1,8})\.(?P<%s>[a-z0-9]{1,16}))\n    # source identifier
    target=(?P<%s>(\*|(?P<%s>[a-z0-9]{1,8})-(?P<%s>[a-z0-9]{1,8})\.(?P<%s>[a-z0-9]{1,16})))\n    # target identifier
    \}\n
    (?P<%s>(?P<%s>[a-z0-9]{1,8})\.(?P<%s>[a-z0-9]{1,8}))\n    # schema
    \{\n
    (?:(?P<%s>[a-z0-9\-]{1,16})=(?P<%s>[\x20-\x7E]{0,128})\n){1,64}    # key/value pairs
    \}\n''' % (GROUP_MESSAGE_TYPE,
               GROUP_HOP,
               GROUP_SOURCE,
               GROUP_SRC_VENDOR_ID,
               GROUP_SRC_DEVICE_ID,
               GROUP_SRC_INSTANCE_ID,
               GROUP_TARGET,
               GROUP_TGT_VENDOR_ID,
               GROUP_TGT_DEVICE_ID,
               GROUP_TGT_INSTANCE_ID,
               GROUP_SCHEMA,
               GROUP_SCHEMA_CLASS,
               GROUP_SCHEMA_TYPE,
               GROUP_OPTION_KEY,
               GROUP_OPTION_VALUE)

    XplMessageGroups = re.compile(XplMessageGroupsRe, re.VERBOSE | re.DOTALL)


    If I pass the second example message through this regex, the 'key'
    group ends up containing 'option' and the 'value' group ends up
    containing 'filter[16]', which is the last key/value pair in that
    message.

    So the problem lies in the key/value extraction section of the regex.
    It matches multiple occurrences of the pattern but writes each one
    into the same pair of key/value groups, so only the last occurrence
    survives and I can't access all the fields.
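
The behaviour described above can be reproduced with a cut-down pattern. This is only a sketch: `PAIRS` is a hypothetical name, and the pattern is just the key/value section of the regex above, applied to the key/value lines of the second example message.

```python
import re

# Hypothetical cut-down pattern: only the key/value section of the full regex.
PAIRS = re.compile(
    r'(?:(?P<key>[a-z0-9\-]{1,16})=(?P<value>[\x20-\x7E]{0,128})\n){1,64}'
)

body = 'reconf=newconf\noption=interval\noption=group[16]\noption=filter[16]\n'
match = PAIRS.match(body)

# The repetition consumed all four lines ...
whole = match.group(0)
# ... but each named group keeps only the text of its final iteration.
last_key = match.group('key')
last_value = match.group('value')

print(repr(whole))
print(last_key, last_value)
```

Python's `re` match objects store a single span per group, so earlier iterations of a repeated group are overwritten rather than accumulated.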

    Is there some other way to do this which allows me to store all the
    key/value pairs into the regex match object for later retrieval?
    Perhaps using the standard unnamed number groups?

    Thanks,
    Chris
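
One possible way to keep every pair while staying with `re` is a two-pass approach: capture the whole key/value section as a single group, then run a second, smaller pattern over that captured text. The sketch below assumes the message layout from the examples in the post; `BLOCK_RE`, `PAIR_RE` and `parse_body_pairs` are hypothetical names, not part of any library.

```python
import re

# Pass 1 (hypothetical pattern): find the schema line and capture the raw
# text between the braces that follow it as one group named "body".
BLOCK_RE = re.compile(
    r'(?P<schema>[a-z0-9]{1,8}\.[a-z0-9]{1,8})\n'
    r'\{\n'
    r'(?P<body>(?:[a-z0-9\-]{1,16}=[\x20-\x7E]{0,128}\n){1,64})'
    r'\}\n'
)

# Pass 2 (hypothetical pattern): one key=value line at a time.
PAIR_RE = re.compile(r'([a-z0-9\-]{1,16})=([\x20-\x7E]{0,128})\n')


def parse_body_pairs(message):
    """Return (schema, [(key, value), ...]), or None if no schema block matches."""
    block = BLOCK_RE.search(message)
    if block is None:
        return None
    # findall returns every (key, value) tuple in order, duplicates included.
    pairs = PAIR_RE.findall(block.group('body'))
    return block.group('schema'), pairs


example = ('xpl-stat\n{\nhop=1\nsource=vendor-device.instance\n'
           'target=vendor-device.instance\n}\nconfig.list\n{\n'
           'reconf=newconf\noption=interval\noption=group[16]\n'
           'option=filter[16]\n}\n')
print(parse_body_pairs(example))
```

A list of tuples is used instead of a dict so that repeated keys such as 'option' are all preserved.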
  • Paul Hankin

    #2
    Re: regular expression extracting groups

    On Aug 10, 2:30 pm, clawsi...@gmail.com wrote:
    I'm trying to use regular expressions to help me quickly extract the
    contents of messages that my application will receive.

    Don't use regexps for parsing complex data; they're limited,
    completely unreadable, and hugely difficult to debug. Your code is
    well written, yet you've already reached the limits of what regexps
    can do, and it's still difficult to read.

    Have a look at pyparsing for a simple solution to your problem.
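
To illustrate the general point about parsing without one large regex, here is a minimal sketch that uses only plain string methods. It is not pyparsing and does not show pyparsing's API; `read_block` and `parse_message` are hypothetical helpers, and the layout (a name line, '{', key=value lines, '}', twice) is assumed from the example messages in the original post. It does not enforce the length and character limits that the regex checks.

```python
def read_block(lines, start):
    """Read a name line, '{', key=value lines and '}' beginning at lines[start].

    Returns (name, pairs, index_after_block); raises ValueError on bad input.
    """
    if start + 1 >= len(lines) or lines[start + 1] != '{':
        raise ValueError('expected "{" after the block name')
    name = lines[start]
    pairs = []
    i = start + 2
    while i < len(lines) and lines[i] != '}':
        # partition splits on the first '=' only, so values may contain '='.
        key, sep, value = lines[i].partition('=')
        if not sep:
            raise ValueError('expected key=value, got %r' % lines[i])
        pairs.append((key, value))
        i += 1
    if i >= len(lines):
        raise ValueError('missing closing "}"')
    return name, pairs, i + 1


def parse_message(message):
    """Return (message_type, header_dict, schema, body_pairs)."""
    lines = message.split('\n')
    message_type, header, next_index = read_block(lines, 0)
    schema, body, _ = read_block(lines, next_index)
    # Header keys are unique in the examples; body keys may repeat, so keep a list.
    return message_type, dict(header), schema, body
```

Each step here is an ordinary function that can be tested and debugged separately, which is the readability argument being made against a single large pattern.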


    --
    Paul Hankin


    • clawsicus@gmail.com

      #3
      Re: regular expression extracting groups

      Thanks all for your responses, especially Paul McGuire for the
      excellent example usage of pyparsing.
      I'm off to check out pyparsing.

      Thanks,
      Chris

