Finding formatting items in a string

  • Jack

    Finding formatting items in a string

    Hi there,

    Given a standard .NET string, does anyone know what the regular expression
    would be to locate each (optional) formatting item in the string (or more
    likely does anyone have a link that will show me this). For instance, given
    the following simple string:

    "My phone number is {0} and my SSN is {1}"

    I want to enumerate (or create a collection of) all formatting items in the
    string which would be "{0}" and "{1}" in this (trivial) example. The regular
    expression itself should handle all legal cases of course (as described
    under "composite formatting" in MSDN - see here:
    http://msdn2.microsoft.com/en-us/library/txafckwd.aspx). Any help would be
    appreciated. Thanks.


  • Jesse Houwing

    #2
    Re: Finding formatting items in a string

    Hello Jack,
    > Hi there,
    >
    > Given a standard .NET string, does anyone know what the regular
    > expression would be to locate each (optional) formatting item in the
    > string (or more likely does anyone have a link that will show me
    > this). For instance, given the following simple string:
    >
    > "My phone number is {0} and my SSN is {1}"
    >
    > I want to enumerate (or create a collection of) all formatting items
    > in the string which would be "{0}" and "{1}" in this (trivial)
    > example. The regular expression itself should handle all legal cases
    > of course (as described under "composite formatting" in MSDN - see
    > here: http://msdn2.microsoft.com/en-us/library/txafckwd.aspx). Any
    > help would be appreciated. Thanks.

    The following expression will take care of most of what you want:

    (?<!([^\{]|^)\{(\{{2})*)\{[0-9]+(,[-]?[0-9]+)?(:[^\}]+)?\}(?!\}(\{{2})*([^\}]|$))

    I'll try to explain what it does:

    (?<!([^\{]|^)\{(\{{2})*)
    This part checks whether we're dealing with an even number of opening {'s. In
    that case all of them are escaped and should therefore be ignored.
    Since there is no easy way to check for odd or even numbers,
    I've done it as follows:
    - first make sure we're either at the beginning of a line or that we match
    a character that is not a {. That way we're sure where we're starting to count.
    - Now chop off the first {, followed by any number of groups of 2 extra {'s.

    \{
    - If that still leaves us with one {, then we're in business.

    [0-9]+
    - Now accept the numbered part. I've made it pretty simple here, any number
    will do.

    (,[-]?[0-9]+)?
    - Now accept the optional alignment. I think you could write the [-] as [+-],
    but I'm not sure off the top of my head that a plus is allowed for the alignment.
    I guess it is though.

    (:[^\}]+)?
    - Accept almost anything as the optional formatting mask. As you can specify
    the formatting mask for each and every type differently based on the TypeFormatter,
    I guess there's no use in limiting the possible formats in any way.
    - So chop off everything that's not a closing }

    \}
    - Pick off the closing }

    (?!\}(\{{2})*([^\}]|$))
    - But only if it's followed by no further closing }'s or by an even number of
    them. This uses the same logic as above.
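The odd/even counting described above can also be written out by hand. What follows is only a rough sketch in Python, not the .NET expression itself; the function names `lookbehind_allows` and `starts_item` are made up for illustration. It counts the run of { characters immediately before a candidate { and accepts the candidate only when that run has even length and a digit follows.

```python
def lookbehind_allows(text: str, i: int) -> bool:
    """True if text[i] is '{' and an even number of '{' characters precede it."""
    if text[i] != "{":
        return False
    run = 0
    j = i - 1
    # Walk backwards over the unbroken run of '{' just before position i.
    while j >= 0 and text[j] == "{":
        run += 1
        j -= 1
    return run % 2 == 0


def starts_item(text: str, i: int) -> bool:
    """True if position i passes the parity check and a digit follows the '{'."""
    return (
        lookbehind_allows(text, i)
        and i + 1 < len(text)
        and text[i + 1] in "0123456789"
    )


sample = "{{0}} {1} {{{2}}}"
starts = [i for i in range(len(sample)) if starts_item(sample, i)]
print(starts)  # [6, 12]
```

In the sample, the braces around 0 are an escaped pair, so only the { at index 6 and the third { of the last group (index 12) open a formatting item.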

    You could make the regex more specific, but I guess this should get you started.

    Also note that I haven't taken any whitespace into account, as I haven't
    had time to experiment where you would be allowed to add whitespace and where
    not.
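To get from an expression to the collection of items asked for in the first post, here is a minimal sketch in Python. Python's `re` module only accepts fixed-width lookbehind, so the expression above cannot be used verbatim there; this sketch swaps in a different technique: escaped pairs `{{` and `}}` are consumed first in an alternation, and only what is left can match as an item. The names `TOKEN` and `format_items` are assumptions of this example, and because it scans left to right it does not reproduce the trailing lookahead exactly.

```python
import re

# Escaped pairs are listed first so they are consumed before an item is tried.
# Group 1 is only set when a real formatting item matched.
TOKEN = re.compile(r"\{\{|\}\}|(\{[0-9]+(?:,-?[0-9]+)?(?::[^}]+)?\})")


def format_items(text):
    """Return the formatting items found in text, in order of appearance."""
    return [m.group(1) for m in TOKEN.finditer(text) if m.group(1) is not None]


print(format_items("My phone number is {0} and my SSN is {1}"))
# ['{0}', '{1}']
print(format_items("{{0}} is literal, {0,-10:N2} is not"))
# ['{0,-10:N2}']
```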

    If you still have any questions on how to improve or further limit the expression,
    feel free to ask.

    --
    Jesse Houwing
    jesse.houwing at sogeti.nl



    • Jesse Houwing

      #3
      Re: Finding formatting items in a string

      Hello Jack,

      I even found a FAQ on this... I'm going to write a blog post about this pattern
      at some point, I guess. It has a lot of interesting regex things in it.

      The FAQ:


      And a further completed regex, including the fact that you can use { and
      } within the custom pattern if you want to... Just escape them again. Which
      makes the whole escaping { harder and harder to understand...

      This is the expression I've got so far:
      (?<!([^\{]|^){({{)*){(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:([^{}]|{{|}})+)?}(?!}({{)*([^}]|$))

      Note also that I removed the \ before most, if not all, of the { and } in
      the expression. It seems that the .NET regex parser is quite content with
      this. Only where the braces would form a quantifier, i.e. the {n(,(m)?)?}
      format, do you need to escape the { and the }. I found that by accident.
      Not that it makes the expression any easier to read... *sigh*...
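The named groups make it easy to pull each item apart. Below is a hedged Python sketch of that idea; it is not the .NET expression itself, since Python spells named groups `(?P<name>...)` and its `re` module rejects variable-width lookbehind. Escaped pairs are therefore consumed by an alternation instead, and `ITEM` and `parse_items` are hypothetical names. As in the expression above, the alignment group keeps its leading comma and the format group its leading colon.

```python
import re

ITEM = re.compile(
    r"\{\{|\}\}"                             # escaped braces: consumed, never reported
    r"|\{(?P<item>[0-9]+)"                   # index of the argument
    r"(?P<alignment>,[-+]?[0-9]+)?"          # optional alignment, comma included
    r"(?P<format>:(?:[^{}]|\{\{|\}\})+)?\}"  # optional format, colon included
)


def parse_items(text):
    """Return (index, alignment, format) tuples for each formatting item."""
    items = []
    for m in ITEM.finditer(text):
        if m.group("item") is None:
            continue  # this match was an escaped {{ or }}
        items.append((int(m.group("item")), m.group("alignment"), m.group("format")))
    return items


parsed = parse_items("{0} {1,-8} {2:yyyy-MM-dd} {3,10:{{0.00}}}")
for entry in parsed:
    print(entry)
# (0, None, None)
# (1, ',-8', None)
# (2, None, ':yyyy-MM-dd')
# (3, ',10', ':{{0.00}}')
```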

      --
      Jesse Houwing
      jesse.houwing at sogeti.nl



      • Jack

        #4
        Re: Finding formatting items in a string

        BTW, I came across this gem during my own research. You may be interested in
        checking it out (it even breaks down your regex expression into plain
        English)





        • Jack

          #5
          Re: Finding formatting items in a string

          Just an update that it looks very good so far (thanks). I haven't unravelled
          the opening and closing brace (stuff) yet but I think there may be a (rare)
          problem with the handling of the ":[formatString]". Any pairs of "{{" or
          "}}" are valid in "formatString" if I understand the docs correctly so they
          should be ignored. I'm still reviewing the situation however (and your code)
          so this is just a heads-up before you start tackling your article :)



          • Jack

            #6
            Re: Finding formatting items in a string

            "Jack" <no_spam@_nospam.com> wrote in message
            news:evV9q2jcIHA.4332@TK2MSFTNGP06.phx.gbl...
            > Just an update that it looks very good so far (thanks). I haven't
            > unravelled the opening and closing brace (stuff) yet but I think there may
            > be a (rare) problem with the handling of the ":[formatString]". Any pairs
            > of "{{" or "}}" are valid in "formatString" if I understand the docs
            > correctly so they should be ignored. I'm still reviewing the situation
            > however (and your code) so this is just a heads-up before you start
            > tackling your article :)

            Ok. It appears that your expression:

            (?<!([^\{]|^)\{(\{{2})*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:[^\}]+)?\}(?!\}(\{{2})*([^\}]|$))

            may need to be modified slightly:

            (?<!([^\{]|^)\{(\{{2})*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:([^\}]|\}{2})+)?\}(?!\}(\{{2})*([^\}]|$))

            I've simply changed the "format" so that, in addition to allowing one or
            more of any character except a "}" (as per your original expression), it
            now also allows one or more pairs of "}}" (before the final "}" that
            terminates it). I'm still digging through it all though as I rarely ever
            work with regular expressions.
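The effect of that change can be seen by running both versions of the format part over the same input. This is only an illustrative Python sketch under the reading of the docs given above (that "}}" is a legal pair inside the format string); it uses an alternation to skip escaped pairs rather than the lookbehind and lookahead of the .NET expression, and every name in it (`ESCAPED`, `HEAD`, `STOP_AT_FIRST_BRACE`, `ALLOW_BRACE_PAIRS`, `items`) is made up for the example.

```python
import re

ESCAPED = r"\{\{|\}\}"               # escaped braces outside any item
HEAD = r"\{[0-9]+(?:,[-+]?[0-9]+)?"  # "{", index, optional alignment

# Original format part: stops at the first "}".
STOP_AT_FIRST_BRACE = re.compile(ESCAPED + "|(" + HEAD + r"(?::[^}]+)?\})")
# Modified format part: a "}}" pair may appear inside the format string.
ALLOW_BRACE_PAIRS = re.compile(ESCAPED + "|(" + HEAD + r"(?::(?:[^}]|\}\})+)?\})")


def items(pattern, text):
    """Return the formatting items that the given pattern finds in text."""
    return [m.group(1) for m in pattern.finditer(text) if m.group(1)]


sample = "{0:a}}b}"
print(items(STOP_AT_FIRST_BRACE, sample))  # ['{0:a}']
print(items(ALLOW_BRACE_PAIRS, sample))    # ['{0:a}}b}']
```

With the original format part the item is cut short at the first "}", while the modified one carries on past the "}}" pair to the brace that actually terminates the item.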



            • Jesse Houwing

              #7
              Re: Finding formatting items in a string

              Hello Jack,

              "Jack" <no_spam@_nospam.com> wrote in message
              news:evV9q2jcIHA.4332@TK2MSFTNGP06.phx.gbl...
              >> Just an update that it looks very good so far (thanks). I haven't
              >> unravelled the opening and closing brace (stuff) yet but I think
              >> there may be a (rare) problem with the handling of the
              >> ":[formatString]". Any pairs of "{{" or "}}" are valid in
              >> "formatString" if I understand the docs correctly so they should be
              >> ignored. I'm still reviewing the situation however (and your code) so
              >> this is just a heads-up before you start tackling your article :)
              >
              > Ok. It appears that your expression:
              >
              > (?<!([^\{]|^)\{(\{{2})*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:[^\}]+)?\}(?!\}(\{{2})*([^\}]|$))
              >
              > may need to be modified slightly:
              >
              > (?<!([^\{]|^)\{(\{{2})*)\{(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:([^\}]|\}{2})+)?\}(?!\}(\{{2})*([^\}]|$))
              >
              > I've simply changed the "format" so that it in addition to allowing
              > one or more of any character except a "}" (as per your original
              > expression), it also now allows one or more pairs of "}}" (before the
              > final "}" that terminates it). I'm still digging through it all though
              > as I rarely ever work with regular expressions.

              That would indeed solve the issue. I've been experimenting a bit more and
              came to the same conclusion...

              To make it even less readable, but shorter, you can remove the escapes from
              the \{ and \} to make it the following expression:

              (?<!([^{]|^){({{)*){(?<item>[0-9]+)(?<alignment>,[-+]?[0-9]+)?(?<format>:([^{}]|}}|{{)+)?}(?!}({{)*([^}]|$))

              Also as far as I can tell the opening { must be escaped in the format pattern
              as well. I adjusted the above expression for that.


              --
              Jesse Houwing
              jesse.houwing at sogeti.nl

