Regex.Matches Problem

  • JAul

    Regex.Matches Problem

    I am currently working on a project and need the matching call to return,
    even if that return is a failure. I must also add that I have no control
    over either the regular expression that will be used or the text file that
    will be parsed, and while my text example is XML, there is no guarantee
    that the text file will be XML.

    This is a portion of the file I am testing with:
    <Stock ticker="ACEGX" type="EQUITY" title="VAN KAMPEN STRATEGIC GROWTH
    FUND CLASS A" price="44.47" net="-0.01" volume="0" />
    <Stock ticker="ACSRX" type="EQUITY" title="VAN KAMPEN COMSTOCK FUND CL R"
    price="19.85" net="0.02" volume="0" />
    <Stock ticker="ADP" type="EQUITY" title="Automatic Data Processing Inc."
    price="50.58" net="-0.11" volume="422600" />
    <Stock ticker="AEE" type="EQUITY" title="Ameren Corp." price="53.26"
    net="-0.41" volume="129800" />
    <Stock ticker="AEP" type="EQUITY" title="American Electric Power Company
    Inc." price="45.56" net="-0.39" volume="753100" />

    Everything works as it should if I use the Regular Expression:
    Stock ticker="(?<ticker>.*?)" type="(?<type>.*?)" title="(?<title>.*?)"
    price="(?<price>.*?)" net="(?<net>.*?)" volume="(?<volume>.*?)"

    But if I leave out one space (“net=” instead of “ net=”), the call never
    returns. It locks up (I let it run all night, so it’s not just slow). The
    bad expression:
    Stock ticker="(?<ticker>.*?)" type="(?<type>.*?)" title="(?<title>.*?)"
    price="(?<price>.*?)"net="(?<net>.*?)" volume="(?<volume>.*?)"

    I need a way to set up the Regex that will not lock up…

    Any ideas?

    (If this is not the right place to post this question, please let me know
    which forum would be better)
  • Oliver Sturm

    #2
    Re: Regex.Matches Problem

    Hello JAul,

    I'm wondering what exactly your question is. As I understand you, you have
    one expression that works and one that doesn't. The one that doesn't work,
    also locks up the system for an unknown reason - why do you care, as it
    doesn't work anyway?
    >But if I miss 1 space (“net=” instead of “ net=”) there is no return, it
    >locks up (I let it run all night, it’s not just slow)… (The bad expression):
    >Stock ticker="(?<ticker>.*?)" type="(?<type>.*?)" title="(?<title>.*?)"
    >price="(?<price>.*?)"net="(?<net>.*?)" volume="(?<volume>.*?)"
    >
    >I need a way to set up the Regex that will not lock up…
    Seems to me as if the answer was "use the one with the right spacing (as
    you're likely to do anyway, as the other doesn't even work) and you'll be
    fine".

    I'm sure I'm somehow missing the point.


    Oliver Sturm
    --


    • JAul

      #3
      Re: Regex.Matches Problem

      Thank you for the reply, and sorry I was not clear. I have no control over
      the expression or the document: a user enters the parsing expression and
      points at the document, and my application needs to report the matches.

      If the expression is way off it works, it reports no matches, which is
      correct. The problem is if the expression is close but not quite right.

      My question is about setting up the regex call...

      Regex m_RegEx = new Regex(m_strExpression);
      MatchCollection m_Matches = m_RegEx.Matches(m_strTextToParse);
      return m_Matches.Count;

      The problem is the m_Matches = m_RegEx.Matches(m_strTextToParse); never
      returns if the expression (m_strExpression) is that tad bit off.

      What I need is advice on how to set up the call (C# code) so it will not
      lock up, that it will return (even if the return is just a failure).



      • Oliver Sturm

        #4
        Re: Regex.Matches Problem

        Hello JAul,
        >Thank you for the reply and sorry I was not clear, but I have no control
        >over the expression or the document, a user will enter the parsing
        >expression and point at the document, my application needs to report the
        >matches.
        Right, I understand that.
        >If the expression is way off it works, it reports no matches, which is
        >correct. The problem is if the expression is close but not quite right.
        Okay - but you say you're not the one who writes that expression, right?
        >My question is about setting up the regex call...
        <snip>

        You seem to be under the impression that there's something you can do
        about the call ("setting it up" - what's that supposed to mean?) that
        would influence whether or not the expression works. I don't understand
        what you imagine you could do. If the expression is wrong, it's wrong - it
        won't work under any circumstances.

        Actually I would say that if you have an expression that makes the call to
        the Matches() method never return, you've found a bug in the regex
        implementation. It would probably be good if you'd report that to
        Microsoft. But that won't help you now - there's no trick you can use from
        the outside to make that method return if it hangs due to that bug.


        Oliver Sturm
        --


        • JAul

          #5
          Re: Regex.Matches Problem

          Thank you Oliver, that is what I was afraid of... I will report the bug and
          see what I get back from Microsoft.

          I was hoping that there was something that could be done with the m_Matches
          = Regex.Matches(m_strExp) call that would fix the lack of a return.


          • Andrew Morton

            #6
            Re: Regex.Matches Problem

            JAul wrote:
            >"Oliver Sturm" wrote:
            >>Actually I would say that if you have an expression that makes the
            >>call to the Matches() method never return, you've found a bug in the
            >>regex implementation. It would probably be good if you'd report that
            >>to Microsoft. But that won't help you now - there's no trick you can
            >>use from the outside to make that method return if it hangs due to
            >>that bug.
            >
            >Thank you Oliver, that is what I was afraid of... I will report the
            >bug and see what I get back from Microsoft.
            >
            >I was hoping that there was something that could be done with the
            >m_Matches = Regex.Matches(m_strExp) call that would fix the lack of a
            >return.

            It may not be a bug - some regexes can take a very long time to process even
            though they look simple.
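
To illustrate how a simple-looking expression can take so long: each lazy `.*?` group in the mistyped expression is free to run past a closing quote and try the next `" type="`, `" title="` and so on further along the line. Because `"net="` never occurs in the text, the engine has to exhaust all of those combinations before it can give up, and their number grows very quickly with the number of Stock elements. The sketch below assumes the whole input is one long line (by default `.` does not cross line breaks); the class and method names are hypothetical. It contrasts the mistyped expression with a variant that uses `[^"]*`, which cannot run past a closing quote and so fails quickly.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class BacktrackingDemo
{
    // Hypothetical helper: builds n <Stock .../> elements on a single line.
    public static string BuildSingleLineInput(int n)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++)
        {
            sb.Append("<Stock ticker=\"T").Append(i)
              .Append("\" type=\"EQUITY\" title=\"X\" price=\"1.00\" net=\"0.01\" volume=\"0\" />");
        }
        return sb.ToString();
    }

    // The mistyped expression from the thread (no space before net=).
    // Each lazy .*? may extend across quotes, which allows heavy backtracking.
    public const string LazyPattern =
        @"Stock ticker=""(?<ticker>.*?)"" type=""(?<type>.*?)"" title=""(?<title>.*?)"" " +
        @"price=""(?<price>.*?)""net=""(?<net>.*?)"" volume=""(?<volume>.*?)""";

    // Same typo, but each value is [^"]*, which stops at the closing quote.
    public const string BoundedPattern =
        @"Stock ticker=""(?<ticker>[^""]*)"" type=""(?<type>[^""]*)"" title=""(?<title>[^""]*)"" " +
        @"price=""(?<price>[^""]*)""net=""(?<net>[^""]*)"" volume=""(?<volume>[^""]*)""";

    // Counts matches; reading Count forces the full scan of the text.
    public static int Count(string pattern, string text)
    {
        return new Regex(pattern).Matches(text).Count;
    }
}
```

Both expressions report no matches on this input; the difference is how much work the engine does before giving up. This does not help when the expression comes from a user, as in this thread, but it suggests the behaviour may be ordinary backtracking rather than a bug.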

            Perhaps you could do a test when the user has entered the regex: start the
            regex on some test data in a separate thread and if that thread takes too
            long, kill the thread and tell the user to try again.

            I have no knowledge of creating/killing threads.

            Andrew
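
A minimal sketch of the "give up if it takes too long" idea, assuming a .NET version in which the Regex constructor accepts a match timeout (.NET Framework 4.5 or later); on older versions a separate worker thread, as suggested above, would be needed instead. The class name `SafeRegex`, the method name `CountMatches` and the -1 return value are hypothetical choices for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeRegex
{
    // Returns the number of matches, or -1 if a match attempt exceeded the timeout.
    // An invalid user-supplied expression still throws ArgumentException.
    public static int CountMatches(string expression, string textToParse, TimeSpan timeout)
    {
        try
        {
            // The third constructor argument bounds the time spent on a
            // single match operation.
            Regex regex = new Regex(expression, RegexOptions.None, timeout);

            // Matches() is evaluated lazily; reading Count forces the scan,
            // so it has to stay inside the try block.
            return regex.Matches(textToParse).Count;
        }
        catch (RegexMatchTimeoutException)
        {
            // The expression was taking too long on this text; report failure.
            return -1;
        }
    }
}
```

With the variable names from the thread, the call would look like `SafeRegex.CountMatches(m_strExpression, m_strTextToParse, TimeSpan.FromSeconds(5))`, where the five-second limit is an arbitrary value. A result of -1 can then be reported to the user as "expression took too long, please check it".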

