preg_match_all optional subpattern

  • Han

    preg_match_all optional subpattern

    Using preg_match_all, I need to capture a list of first and last names plus
    an optional country code preceding them.

    For example:

    <tr><td>AU</td><td>Jane Smith</td></tr>
    <tr><td></td><td>Bill Johnson</td></tr>
    <tr><td>GB</td><td>Larry Brown</td></tr>
    <tr><td>US</td><td>Mary Jordon</td></tr>
    <tr><td></td><td>Peter Jones</td></tr>

    The country code may or may not be present.

    I would like the array contents to look like this:

    AU Jane Smith
    Bill Johnson
    GB Larry Brown
    US Mary Jordon
    Peter Jones

    I know a subpattern is needed of all the possible country codes:

    AU|GB|US

    but how do you include this as an optional subpattern?

    Thanks in advance.






  • John Dunlop

    #2
    Re: preg_match_all optional subpattern

    Han wrote:
    > Using preg_match_all, I need to capture a list of first and last names plus
    > an optional country code preceding them.
    >
    > For example:
    >
    > <tr><td>AU</td><td>Jane Smith</td></tr>
    > <tr><td></td><td>Bill Johnson</td></tr>
    > <tr><td>GB</td><td>Larry Brown</td></tr>
    > <tr><td>US</td><td>Mary Jordon</td></tr>
    > <tr><td></td><td>Peter Jones</td></tr>
    >
    > [...] I know a subpattern is needed of all the possible country codes:
    >
    > AU|GB|US
    >
    > but how do you include this as an optional subpattern?

    The ? quantifier means zero or one of whatever came before it,
    equivalent to {0,1}. Putting a question mark after a subpattern
    makes that subpattern optional.

    So, to match optional two-letter country codes within a table cell
    (doesn't properly cater for attributes, but that's rectifiable):

    `<td.*>([a-z]{2})?</td.*>`Usi

    If you wish to list the possible values, precluding others:

    `<td.*>(au|gb|us)?</td.*>`Usi

    --
    Jock
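
As a minimal sketch of the optional subpattern in use, the snippet below runs a simplified variant of the second pattern over the sample rows from the question. The name-cell group `([^<]+)` and the variable names are assumptions added for illustration; they do not come from the thread.

```php
<?php
// Sample rows copied from the question.
$html = implode("\n", [
    '<tr><td>AU</td><td>Jane Smith</td></tr>',
    '<tr><td></td><td>Bill Johnson</td></tr>',
    '<tr><td>GB</td><td>Larry Brown</td></tr>',
    '<tr><td>US</td><td>Mary Jordon</td></tr>',
    '<tr><td></td><td>Peter Jones</td></tr>',
]);

// (au|gb|us)? is a capturing group made optional by the trailing "?".
// ([^<]+) is an assumed way to grab the name cell; it is not from the thread.
// Backticks delimit the pattern; "i" makes it case-insensitive.
$pattern = '`<td>(au|gb|us)?</td><td>([^<]+)</td>`i';

// PREG_SET_ORDER yields one array per matched row.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$names = [];
foreach ($matches as $m) {
    // $m[1] is an empty string when the country cell is empty.
    $names[] = trim($m[1] . ' ' . $m[2]);
}

print_r($names);
```

With PREG_SET_ORDER each element of `$matches` holds one row, which makes it easy to join the optional code and the name into a single string per row.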


    • s van gemmert

      #3
      Re: preg_match_all optional subpattern

      "Han" <nobody@nowhere.com> wrote in message news:<TT4gb.225533$mp.141550@rwcrnsc51.ops.asp.att.net>...
      > Using preg_match_all, I need to capture a list of first and last names plus
      > an optional country code preceding them.
      >
      > For example:
      >
      > <tr><td>AU</td><td>Jane Smith</td></tr>
      > <tr><td></td><td>Bill Johnson</td></tr>
      > <tr><td>GB</td><td>Larry Brown</td></tr>
      > <tr><td>US</td><td>Mary Jordon</td></tr>
      > <tr><td></td><td>Peter Jones</td></tr>
      >
      > The country code might exist, it might not.
      >
      > I would like the array contents to look like this:
      >
      > AU Jane Smith
      > Bill Johnson
      > GB Larry Brown
      > US Mary Jordon
      > Peter Jones
      >
      > I know a subpattern is needed of all the possible country codes:
      >
      > AU|GB|US

      This pattern should do the job:

      "{<tr><td>\s*([A-Z]{2})?\s*</td><td>\s*(\w+)?\s*(\w+)\s*</td></tr>}im"

      If this pattern is used in preg_match_all, it should produce the
      desired result.
      It will extract the country code if available, the first name if
      available, and the last name.
      They will be put in a two-dimensional array. If no country code or
      first name is given, the array element will be left empty.

      hope this helps,

      sascha
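
A short sketch of how the two-dimensional result could be read back, assuming the whole-row pattern with `\s*` and `(\w+)?` as its building blocks and the default PREG_PATTERN_ORDER flag. The variable names are illustrative only.

```php
<?php
// Two of the sample rows from the question.
$html = implode("\n", [
    '<tr><td>AU</td><td>Jane Smith</td></tr>',
    '<tr><td></td><td>Bill Johnson</td></tr>',
]);

// Group 1: optional country code, group 2: first name, group 3: last name.
// Curly braces act as the pattern delimiters here.
$pattern = '{<tr><td>\s*([A-Z]{2})?\s*</td><td>\s*(\w+)?\s*(\w+)\s*</td></tr>}im';

// Default flag PREG_PATTERN_ORDER: $matches[n] lists group n for every row.
$count = preg_match_all($pattern, $html, $matches);

for ($i = 0; $i < $count; $i++) {
    // An unmatched country code shows up as an empty string.
    echo trim($matches[1][$i] . ' ' . $matches[2][$i] . ' ' . $matches[3][$i]), "\n";
}
```

One caveat with the pattern as used in this sketch: because `(\w+)?` is greedy, a single-word name such as "Madonna" would be split into "Madonn" and "a" rather than leaving the first-name element empty. Something like `(?:(\w+)\s+)?(\w+)` may be a safer way to make the first name optional.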


      >
      > but how do you include this as an optional subpattern?
      >
      > Thanks in advance.


      • Han

        #4
        Re: preg_match_all optional subpattern

        John,

        Thank you for another detailed reply.

        The cryptic syntax is slowly beginning to sink in, but there are still a few
        nagging issues.

        In my price list, the amount may or may not be preceded by a $ sign.

        For instance, the list might look like this:

        $2.99
        1.99
        $3.00
        $4.00

        I modified my price pattern to accommodate this:

        ((\\$|\s*)?\d{1,3}\.\d{2})

        which works great. The problem is, it also creates another array dimension
        that contains only $ or space:

        $

        $
        $

        I can simply ignore this dimension, but is there a way to prevent it?

        Thanks (again) in advance.

        • John Dunlop

          #5
          Re: preg_match_all optional subpattern

          Han wrote:
          > ((\\$|\s*)?\d{1,3}\.\d{2})
          >
          > which works great. The problem is, it also creates another array
          > dimension that contains only $ or space:
          >
          > [...] I can simply ignore this dimension, but is there a way to
          > prevent it?

          Subpatterns that begin with the two-character sequence "?:" group
          without capturing. You could then write your pattern as:

          `(?:\\$|\s*)?\d{1,3}\.\d{2}`

          --
          Jock
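
To make the difference concrete, here is a small sketch comparing the capturing and non-capturing forms on the price list from the question. The variable names and the final trim step are assumptions added for illustration.

```php
<?php
// Price list from the question, one amount per line.
$text = implode("\n", ['$2.99', '1.99', '$3.00', '$4.00']);

// Capturing version from the question: the inner group adds $withGroup[2].
preg_match_all('/((\$|\s*)?\d{1,3}\.\d{2})/', $text, $withGroup);

// Non-capturing version: (?: ... ) groups without recording a match.
preg_match_all('/(?:\$|\s*)?\d{1,3}\.\d{2}/', $text, $noGroup);

// The optional \s* can swallow the preceding newline, so trim each match.
$prices = array_map('trim', $noGroup[0]);

// Number of sub-arrays in each result: 3 with capturing groups, 1 without.
echo count($withGroup), " vs ", count($noGroup), "\n";
print_r($prices);
```

In the non-capturing result only `$noGroup[0]` exists, so there is no extra dimension holding the `$` or whitespace.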


          • Han

            #6
            Re: preg_match_all optional subpattern

            Jock,

            That's it--thanks.

            I've been spending some time re-reading the pattern documentation on php.net
            and it's beginning to sink in.

            Again, much appreciated!