Problem Creating Regex Expression

This topic is closed.
  • Sean

    Problem Creating Regex Expression

    I am finally taking the time to get to know regex, but it seems I have
    taken a bit of a tumble.

    I have the following (dummy) data:

    <td>Name:</td> <td>Kherie Kali</td>

    If I use this expression: "<td>Name:</td>\s*<td>Kherie Kali</td>"

    I indeed get a match.

    The next step I took was to get this without knowing the name in
    advance. I then used the following expression:
    <td>Name:</td>\s*<td>([a-zA-Z_$][a-zA-Z0-9_$]*)</td>

    and I didn't get a thing.

    Shouldn't the "([a-zA-Z_$][a-zA-Z0-9_$]*)" part match any word or
    symbol, and then the * make it multiple words or symbols? Any light
    that could be shed on this situation is appreciated. I don't
    necessarily want the answer to my quandary, but insight as to what I am
    doing wrong.

    Thank you,

    -Sean

  • Sergey Zyuzin

    #2
    Re: Problem Creating Regex Expression

    Hi, Sean

    "([a-zA-Z_$][a-zA-Z0-9_$]*)" will match any letter, underscore or '$'
    character followed by zero or more letters, digits, underscores or '$'
    characters. The * repeats only the second character class, not whole
    words, and neither class contains a space.

    It seems you haven't taken into account the space in the middle of
    "Kherie Kali".
    If you write more specific requirements, I could write a regex.
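
To make the point about the space concrete, here is a minimal sketch in Python. Python's re module is an assumption on my part (the thread appears to use .NET regex syntax), and the variable names are made up for illustration. It shows that the character-class pair matches only one space-free token, so the full pattern fails on the two-word name.

```python
import re

# Sample row from the thread, with one space assumed between the cells.
data = "<td>Name:</td> <td>Kherie Kali</td>"

# One start character, then zero or more "word-like" characters.
# Neither class contains a space, so a match stops at the first space.
ident = r"[a-zA-Z_$][a-zA-Z0-9_$]*"

# On its own, the fragment matches only the first token of the name.
first_token = re.match(ident, "Kherie Kali").group()
print(first_token)  # Kherie

# Inside the full pattern, "</td>" must follow the token immediately,
# but the text continues with " Kali", so there is no match at all.
pattern = r"<td>Name:</td>\s*<td>(" + ident + r")</td>"
print(re.search(pattern, data))  # None

# The same pattern does match when the cell holds a single word.
single = re.search(pattern, "<td>Name:</td> <td>Kherie</td>")
print(single.group(1))  # Kherie
```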

    Thanks,
    Sergey


    • Kevin Spencer

      #3
      Re: Problem Creating Regex Expression

      The following will capture all content inside <td></td> tags:

      (?<=td>)(.*?)(?=<)

      The first part is a positive look-behind, indicating that a match must be
      preceded by the character sequence "td>" (non-capturing). The second part
      matches any character 0 or more times with a lazy quantifier, meaning that
      it will match as few characters as possible. The third part is a positive
      look-ahead, indicating that the match must be followed by a "<" character.
      Since there are no "<" characters in the actual tag's content, this stops
      the match at the end of the tag's content.
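
A short Python sketch of the look-around pattern described above may help. Python's re module is assumed here rather than the .NET engine the thread seems to target, though this particular pattern uses syntax both accept. The sample row is the one from the thread with a single space assumed between the cells, and the variable names are illustrative only.

```python
import re

data = "<td>Name:</td> <td>Kherie Kali</td>"

# Look-behind for "td>", lazy capture, look-ahead for "<".
cell = re.compile(r"(?<=td>)(.*?)(?=<)")

# A single search returns the first cell, which is the label, not the name.
first = cell.search(data).group(1)
print(first)  # Name:

# Collecting every match also picks up the whitespace that sits between
# "</td>" and the next "<td>", because it too is preceded by "td>".
all_cells = cell.findall(data)
print(all_cells)  # ['Name:', ' ', 'Kherie Kali']
```

Because the pattern is not anchored to the "Name:" label, the caller has to pick the right match out of the results.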

      --
      HTH,

      Kevin Spencer
      Chicken Salad Surgeon
      Microsoft MVP

      • Sean

        #4
        Re: Problem Creating Regex Expression

        Kevin and Sergey,

        Kevin: Thank you for the explanation! This finds the Name: tag, but
        won't find the dummy person's actual name. I think I have to add in a
        place for spaces like Sergey said.

        Sergey: I have been fiddling with the last sequence by attempting to
        add spaces, but I still can't get it to work for some reason. There
        really are no specific requirements, I'm just trying to pull that name
        out.

        Thank you both for the explanations, it was very helpful, I'm just
        still having problems understanding why it won't work. The latest one
        I used was "([a-zA-Z_$][a-zA-Z0-9_$]\s*)"

        -Sean


        • Sergey Zyuzin

          #5
          Re: Problem Creating Regex Expression

          You should put \s inside the brackets: "([a-zA-Z_$][a-zA-Z0-9_$\s]*)"
          If you don't have specific requirements, then you could probably use
          an expression similar to what Kevin suggests, or something like
          "<td>Name:</td>\s*<td>(.*?)</td>"
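
The difference between the three variants can be checked with a small Python sketch. Python's re module and the variable names are assumptions for illustration; the patterns themselves are the ones quoted in the thread, applied to the sample row with one space assumed between the cells.

```python
import re

data = "<td>Name:</td> <td>Kherie Kali</td>"
prefix = r"<td>Name:</td>\s*<td>"

# \s* outside the class: exactly two name characters, then optional
# whitespace, then "</td>". Only very short names could ever match.
attempt = prefix + r"([a-zA-Z_$][a-zA-Z0-9_$]\s*)</td>"

# \s inside the class: whitespace becomes one of the repeatable characters.
fixed = prefix + r"([a-zA-Z_$][a-zA-Z0-9_$\s]*)</td>"

# Lazy wildcard: anything up to the first "</td>".
lazy = prefix + r"(.*?)</td>"

attempt_match = re.search(attempt, data)
fixed_name = re.search(fixed, data).group(1)
lazy_name = re.search(lazy, data).group(1)

print(attempt_match)  # None
print(fixed_name)     # Kherie Kali
print(lazy_name)      # Kherie Kali
```

The character-class version rejects names containing characters outside the class (hyphens or apostrophes, for example), while the lazy wildcard accepts any cell content, so which one fits depends on how strict the match needs to be.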

          Thanks,
          Sergey


          • Sean

            #6
            Re: Problem Creating Regex Expression

            Perfect, thank you both!

            I used your last suggestion, Sergey, and did the following:
            "<td>Name:</td>\s*<td>(?<a0>(.*?))</td>", which correctly matched
            "Kherie Kali" and put it into the a0 group.
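
For readers working outside .NET, here is a rough Python equivalent of the named-group version. This is a sketch under assumptions: Python's re module spells named groups as (?P<name>...) rather than the (?<name>...) form used in the thread, and the inner parentheses from the original are dropped here because the named group already captures the text.

```python
import re

data = "<td>Name:</td> <td>Kherie Kali</td>"

# Named group "a0" holds whatever sits inside the second cell.
named = re.compile(r"<td>Name:</td>\s*<td>(?P<a0>.*?)</td>")

m = named.search(data)
# Guard against a missing match before reading the group.
name = m.group("a0") if m else None
print(name)  # Kherie Kali
```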

            I appreciate both of your help!

            -Sean
