Hi list,
I'm trying to use regular expressions to help me quickly extract the
contents of messages that my application will receive. I have worked
out most of the regex but the last section of the message has me
stumped. This is mostly because I want to pull the content out into
regex groups that I can easily access later. I have a regex to extract
the key/value pairs but it ends up with only the contents of the last
key/value pair encountered.
An example of the section of the message that is troubling me appears
like this:
{
option=value
foo=bar
another=42
option=7
}
So it's basically a bunch of lines. Every line is terminated with a
'\n' character. The number of key/value fields changes depending on
the particular message. Also notice that there are two 'option' keys.
This is allowable and I need to cater for it.
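To make the duplicate-key requirement concrete, here is a minimal sketch (not part of the original code) of how such a block could be held once it has been pulled apart. It assumes plain string splitting rather than a regex, and the names `body`, `pairs` and `grouped` are illustrative only. A plain dict would silently drop the first 'option', so the pairs are kept as an ordered list of tuples.

```python
from collections import defaultdict

# The troublesome section, with every line terminated by '\n'.
body = "{\noption=value\nfoo=bar\nanother=42\noption=7\n}\n"

# Keep every pair in message order; the '{' and '}' lines have no '='.
pairs = [
    tuple(line.split("=", 1))
    for line in body.splitlines()
    if "=" in line
]
# pairs == [('option', 'value'), ('foo', 'bar'), ('another', '42'), ('option', '7')]

# Optionally group repeated keys so each key maps to all of its values.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)
# grouped['option'] == ['value', '7']
```

Whether a list of tuples or a dict of lists is more convenient depends on whether the order of the pairs matters to the application.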
A couple of example messages are:
xpl-stat\n{\nhop=1\nsource=vendor-device.instance\ntarget=*\n}\nhbeat.basic\n{\ninterval=10\n}\n
xpl-stat\n{\nhop=1\nsource=vendor-device.instance\ntarget=vendor-device.instance\n}\nconfig.list\n{\nreconf=newconf\noption=interval\noption=group[16]\noption=filter[16]\n}\n
As all messages follow the same pattern, I'm hoping to develop one
generic regex that can pull a message from a received packet, rather
than writing one regex per message kind (there are many).
The regex I came up with looks like this:
import re

# This should match any xPL message
GROUP_MESSAGE_TYPE = 'message_type'
GROUP_HOP = 'hop'
GROUP_SOURCE = 'source'
GROUP_TARGET = 'target'
GROUP_SRC_VENDOR_ID = 'source_vendor_id'
GROUP_SRC_DEVICE_ID = 'source_device_id'
GROUP_SRC_INSTANCE_ID = 'source_instance_id'
GROUP_TGT_VENDOR_ID = 'target_vendor_id'
GROUP_TGT_DEVICE_ID = 'target_device_id'
GROUP_TGT_INSTANCE_ID = 'target_instance_id'
GROUP_IDENTIFIER_TYPE = 'identifier_type'
GROUP_SCHEMA = 'schema'
GROUP_SCHEMA_CLASS = 'schema_class'
GROUP_SCHEMA_TYPE = 'schema_type'
GROUP_OPTION_KEY = 'key'
GROUP_OPTION_VALUE = 'value'

# Verbose pattern: whitespace is ignored and '#' starts a comment.
XplMessageGroupsRe = r'''
    (?P<%s>xpl-(cmnd|stat|trig))\n                       # message type
    \{\n
    hop=(?P<%s>[1-9]{1})\n                               # hop count
    source=(?P<%s>(?P<%s>[a-z0-9]{1,8})-(?P<%s>[a-z0-9]{1,8})\.(?P<%s>[a-z0-9]{1,16}))\n         # source identifier
    target=(?P<%s>(\*|(?P<%s>[a-z0-9]{1,8})-(?P<%s>[a-z0-9]{1,8})\.(?P<%s>[a-z0-9]{1,16})))\n    # target identifier
    \}\n
    (?P<%s>(?P<%s>[a-z0-9]{1,8})\.(?P<%s>[a-z0-9]{1,8}))\n   # schema
    \{\n
    (?:(?P<%s>[a-z0-9\-]{1,16})=(?P<%s>[\x20-\x7E]{0,128})\n){1,64}   # key/value pairs
    \}\n
''' % (GROUP_MESSAGE_TYPE,
       GROUP_HOP,
       GROUP_SOURCE,
       GROUP_SRC_VENDOR_ID,
       GROUP_SRC_DEVICE_ID,
       GROUP_SRC_INSTANCE_ID,
       GROUP_TARGET,
       GROUP_TGT_VENDOR_ID,
       GROUP_TGT_DEVICE_ID,
       GROUP_TGT_INSTANCE_ID,
       GROUP_SCHEMA,
       GROUP_SCHEMA_CLASS,
       GROUP_SCHEMA_TYPE,
       GROUP_OPTION_KEY,
       GROUP_OPTION_VALUE)
XplMessageGroups = re.compile(XplMessageGroupsRe, re.VERBOSE | re.DOTALL)
If I pass the second example message through this regex, the 'key'
group ends up containing 'option' and the 'value' group ends up
containing 'filter[16]', which is the last key/value pair in that
message.
So the problem lies in the key/value extraction section of the regex.
It matches multiple occurrences of the pattern but writes each one
into the same pair of key/value groups, so only the last pair survives
and I can't extract and access all the fields.
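The behaviour can be reproduced with a cut-down pattern. This is a minimal sketch that keeps only the repeated key/value part of the regex above; the names `pattern`, `body` and `match` are illustrative. With Python's `re` module, a group inside a repeated section reports only its last match.

```python
import re

# Only the repeated key/value section, with the same character
# classes and limits as the full pattern.
pattern = re.compile(
    r'(?:(?P<key>[a-z0-9\-]{1,16})=(?P<value>[\x20-\x7E]{0,128})\n){1,64}'
)

# The key/value lines of the second example message.
body = "reconf=newconf\noption=interval\noption=group[16]\noption=filter[16]\n"

match = pattern.match(body)

# All four lines are consumed by the match...
print(match.group(0) == body)   # True
# ...but each named group holds only the final repetition.
print(match.group('key'))       # option
print(match.group('value'))     # filter[16]
```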
Is there some other way to do this which allows me to store all the
key/value pairs into the regex match object for later retrieval?
Perhaps using the standard unnamed number groups?
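One possible direction, offered as a sketch rather than a definitive answer: numbered groups behave the same way as named ones under repetition, so a two-pass approach may be simpler. The first pass captures the whole key/value section as a single group, and a second pass walks that text with `finditer`. The names `BODY_RE`, `PAIR_RE` and `extract_pairs` are hypothetical, and only the final block is handled here to keep the example short.

```python
import re

# Pass 1: validate the block and capture all key/value lines as one group.
BODY_RE = re.compile(
    r'\{\n(?P<body>(?:[a-z0-9\-]{1,16}=[\x20-\x7E]{0,128}\n){1,64})\}\n'
)
# Pass 2: match one key/value line at a time.
PAIR_RE = re.compile(
    r'(?P<key>[a-z0-9\-]{1,16})=(?P<value>[\x20-\x7E]{0,128})\n'
)

def extract_pairs(block):
    """Return every (key, value) pair in order, or None if block is malformed."""
    match = BODY_RE.match(block)
    if match is None:
        return None
    return [
        (m.group('key'), m.group('value'))
        for m in PAIR_RE.finditer(match.group('body'))
    ]

block = "{\nreconf=newconf\noption=interval\noption=group[16]\noption=filter[16]\n}\n"
pairs = extract_pairs(block)
# pairs == [('reconf', 'newconf'), ('option', 'interval'),
#           ('option', 'group[16]'), ('option', 'filter[16]')]
```

In the full pattern, the same idea would mean replacing the repeated key/value groups with a single named group around the whole repetition and running the second pass on that group's text.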
Thanks,
Chris