Java RegEx pattern

  • m6s
    New Member
    • Aug 2007
    • 55


    Let's suppose we have this in an XML file:
    <cs:CUSTOMER_ID><cs:CUSTOMER_ID/>

    and a pattern to match: ":CUSTOMER_ID>([A-Za-z][A-Za-z0-9]*|)"

    I understand that the second [] matches alphanumeric characters (letters and digits).
    But I don't understand the first [], or what comes after the OR operator '|'.
    Does anyone have the experience to decipher it?

    Thank you in advance
  • JosAH
    Recognized Expert MVP
    • Mar 2007
    • 11453

    #2
    This part: [A-Za-z][A-Za-z0-9]* reads: one letter [A-Za-z] followed by zero or more
    letters or digits [A-Za-z0-9]*.

    The vertical bar | at the end is easy to misread, but it is not a syntax error. The | operator
    is a choice A|B meaning either A or B for whatever A or B; here B is empty, so the group
    matches either such a word or the empty string.
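Java's java.util.regex accepts an empty alternative, so the pattern compiles as written. A minimal sketch (the class and method names and the sample XML strings are made up for illustration) that runs the question's pattern with Matcher.find() and prints what group 1 captures in three cases:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyAlternativeDemo {

    // The pattern from the question, written as a Java string literal.
    static final Pattern CUSTOMER_ID =
            Pattern.compile(":CUSTOMER_ID>([A-Za-z][A-Za-z0-9]*|)");

    // Returns what group 1 captured at the first match,
    // or null if ":CUSTOMER_ID>" does not occur at all.
    static String captureId(String xml) {
        Matcher m = CUSTOMER_ID.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Populated element: the first alternative matches "C42".
        System.out.println("[" + captureId("<cs:CUSTOMER_ID>C42</cs:CUSTOMER_ID>") + "]");
        // Empty element from the question: '<' is not a letter, so the
        // empty alternative matches and group 1 is "" (not null).
        System.out.println("[" + captureId("<cs:CUSTOMER_ID><cs:CUSTOMER_ID/>") + "]");
        // Value starting with a digit: the first alternative fails at '4',
        // the empty alternative succeeds, so group 1 is again "".
        System.out.println("[" + captureId("<cs:CUSTOMER_ID>42</cs:CUSTOMER_ID>") + "]");
    }
}
```

Writing the group as ([A-Za-z][A-Za-z0-9]*)? would accept the same inputs, but group(1) would then return null instead of "" when nothing suitable follows the '>'.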

    kind regards,

    Jos


    • m6s
      New Member
      • Aug 2007
      • 55

      #3
      Couldn't you just write [A-Za-z0-9]? then, using the question mark, as in English, to say "one or more"?
      I bet the | is an OR with nothing, so it says "give me one or more, or nothing"... I think.
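For reference, in Java regex syntax ? means zero or one, * means zero or more, and + means one or more, so [A-Za-z0-9]? would accept at most one character. A small sketch (hypothetical class name) comparing the three quantifiers with Pattern.matches, which requires the whole input to match. The last pattern, ([A-Za-z0-9]+|), spells out "one or more, or nothing" and accepts the same strings as [A-Za-z0-9]*:

```java
import java.util.regex.Pattern;

public class QuantifierDemo {

    // True only if the regex matches the entire input string.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] regexes = {"[A-Za-z0-9]?", "[A-Za-z0-9]*", "[A-Za-z0-9]+", "([A-Za-z0-9]+|)"};
        String[] inputs = {"", "a", "ab1"};
        for (String regex : regexes) {
            for (String input : inputs) {
                // Print the regex, the quoted input, and whether it matched.
                System.out.printf("%-16s %-7s %b%n", regex, "\"" + input + "\"", whole(regex, input));
            }
        }
    }
}
```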


      • Ganon11
        Recognized Expert Specialist
        • Oct 2006
        • 3651

        #4
        [a-zA-Z0-9]* is very different from [a-zA-Z][a-zA-Z0-9]*. The first will match 123cat, and the second will not. Both, however, will match cat123.

        The first, [a-zA-Z0-9]*, looks for any sequence of digits and letters. They can be in any order and are under no constraints.

        The second, [a-zA-Z][a-zA-Z0-9]*, MUST match a letter first, and only then will match any sequence of letters or digits. So it can't match something starting with a number, which the first regex can.

        Some word starting with a letter, but made of digits and letters... sound like anything to you?
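A minimal sketch (hypothetical class and helper names) that checks Ganon11's two examples with Matcher.matches(), which tests the whole string. It also calls find(), which searches for a match anywhere in the input: there the letter-first pattern still finds "cat" inside "123cat", so "can't match something starting with a number" holds for whole-string matching. In the question's pattern the group sits right after ":CUSTOMER_ID>", so it cannot skip leading digits that way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LetterFirstDemo {

    // Letters and digits in any order (also matches the empty string).
    static final Pattern ANY_ORDER = Pattern.compile("[a-zA-Z0-9]*");
    // One letter first, then any letters or digits.
    static final Pattern LETTER_FIRST = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    // Returns the first substring LETTER_FIRST finds anywhere in s, or null.
    static String firstFind(String s) {
        Matcher m = LETTER_FIRST.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123cat", "cat123"}) {
            // matches() succeeds only if the pattern covers the whole string.
            System.out.println(s
                    + ": any order = " + ANY_ORDER.matcher(s).matches()
                    + ", letter first = " + LETTER_FIRST.matcher(s).matches()
                    + ", letter first via find() = " + firstFind(s));
        }
    }
}
```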


        • m6s
          New Member
          • Aug 2007
          • 55

          #5
          It seems rather clear now. However, if we need the first word after the '>', whether a name or a number, this pattern is too restrictive, I think. We could have loosened it a bit.
          I tried some regexp utilities and grep itself, but couldn't work out what the writer wanted to say.
          In any case, thank you very much. It was a very detailed description.
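One way to loosen the pattern, sketched under the assumption that the value is a plain run of letters and digits: drop the letter-first requirement. The LOOSE pattern, the class name, and the sample strings are hypothetical. Because * already allows zero characters, the trailing | is no longer needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LooserPatternDemo {

    // The pattern from the question: a letter first, or nothing.
    static final Pattern STRICT = Pattern.compile(":CUSTOMER_ID>([A-Za-z][A-Za-z0-9]*|)");
    // Hypothetical loosened variant: any run of letters and digits, possibly empty.
    static final Pattern LOOSE = Pattern.compile(":CUSTOMER_ID>([A-Za-z0-9]*)");

    // Returns group 1 of the first match, or null if there is no match.
    static String capture(Pattern p, String xml) {
        Matcher m = p.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "<cs:CUSTOMER_ID>C42</cs:CUSTOMER_ID>",
            "<cs:CUSTOMER_ID>12345</cs:CUSTOMER_ID>",
            "<cs:CUSTOMER_ID></cs:CUSTOMER_ID>"
        };
        for (String xml : samples) {
            System.out.println(xml + " -> strict [" + capture(STRICT, xml)
                    + "], loose [" + capture(LOOSE, xml) + "]");
        }
    }
}
```

With a numeric value such as 12345, the strict pattern falls back to its empty alternative and captures "", while the loosened one captures the number.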
