Parsing in between strings using Regex

This topic is closed.
  • CJ

    Parsing in between strings using Regex

    Is this the format to parse a string and return the value between the item?

    Regex pRE = new Regex("<File_Name>.*>(?<insideText>.*)</File_Name>");

    I am trying to parse this string.

    <File_Name>Services</File_Name>


    Thanks


  • Arne Vajhøj

    #2
    Re: Parsing in between strings using Regex

    CJ wrote:
    > Is this the format to parse a string and return the value between the item?
    >
    > Regex pRE = new Regex("<File_Name>.*>(?<insideText>.*)</File_Name>");
    >
    > I am trying to parse this string.
    >
    > <File_Name>Services</File_Name>

    Regex re = new Regex("<File_Name>(?<insideText>.*)</File_Name>");
    string fn = re.Match(s).Groups["insideText"].Value;

    seems to work.

    Arne
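
A minimal sketch of why the original pattern finds nothing while the shortened one works, assuming the input is exactly the single line from the question. The names `FileNameDemo` and `Extract` are hypothetical and used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameDemo
{
    // Returns the text captured by the named group "insideText",
    // or null when the pattern does not match at all.
    public static string Extract(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups["insideText"].Value : null;
    }

    public static void Main()
    {
        string s = "<File_Name>Services</File_Name>";

        // Original pattern: the extra ".*>" demands one more '>' after the
        // opening tag. The only '>' left is the one that ends </File_Name>,
        // so nothing remains for the closing tag and the match fails.
        string original = "<File_Name>.*>(?<insideText>.*)</File_Name>";

        // Shortened pattern: the group starts right after the opening tag.
        string corrected = "<File_Name>(?<insideText>.*)</File_Name>";

        Console.WriteLine(Extract(original, s) ?? "(no match)"); // (no match)
        Console.WriteLine(Extract(corrected, s));                // Services
    }
}
```

Note that `re.Match(s).Groups["insideText"].Value` on its own returns an empty string when the match fails, so checking `Match.Success` is the safer way to tell "no match" apart from "empty value".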


    • CJ

      #3
      Re: Parsing in between strings using Regex

      Thanks Arne,

      Seems like the ".*" was messing me up.

      This regular expression is so hard at times, I don't know how
      you guys have this thing figured out.

      CJ


      • Jesse Houwing

        #4
        Re: Parsing in between strings using Regex

        Hello CJ,

        > Thanks Arne,
        >
        > Seems like the ".*" was messing me up.
        >
        > This regular expression is so hard at times, I don't know how you guys
        > have this thing figured out.

        This looks a lot like XML data. If it is, you really should avoid
        regex and use XPath to fetch the data you need.
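
As a rough sketch of the XPath route, assuming the data really is well-formed XML: the `<Files>` wrapper element and the names `XPathDemo` and `FileNames` are assumptions made for the example, not something from the original posts.

```csharp
using System;
using System.Xml;

public static class XPathDemo
{
    // Returns the text of every File_Name element found anywhere in the document.
    public static string[] FileNames(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the input is not well-formed

        // "//File_Name" selects File_Name elements at any depth.
        XmlNodeList nodes = doc.SelectNodes("//File_Name");
        var names = new string[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
        {
            names[i] = nodes[i].InnerText;
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical input: XML needs a single root element, hence <Files>.
        string xml = "<Files><File_Name>Services</File_Name><File_Name>Reports</File_Name></Files>";
        foreach (string name in FileNames(xml))
        {
            Console.WriteLine(name);
        }
    }
}
```

Unlike the regex, this does not care about whitespace between tags, attributes on the element, or several elements on one line.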

        If it isn't well-formed XML, regex can help, but the regex you have still
        has a few issues.

        For one, .* is greedy and runs to the last closing tag. If your input contained
        "<File_Name>bbbbbbbbb</File_Name><File_Name>aaaaaaaaaaaa</File_Name>"
        you would get this as your whole value:
        "bbbbbbbbb</File_Name><File_Name>aaaaaaaaaaaa". Obviously not what's required.

        You can adjust your regex to prevent this from happening in two ways:

        1) Use reluctant (lazy) matching
        Regex re = new Regex("<File_Name>(?<insideText>.*?)</File_Name>");
        string fn = re.Match(s).Groups["insideText"].Value;

        2) Use a negative lookahead
        Regex re = new Regex("<File_Name>(?<insideText>((?!</File_Name>).)*)</File_Name>");
        string fn = re.Match(s).Groups["insideText"].Value;
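
The difference between the three variants can be checked with a short sketch like the one below. The class and member names (`GreedyDemo`, `ExtractAll`, and the three constants) are hypothetical; the patterns are the ones from the posts.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    public const string Greedy   = "<File_Name>(?<insideText>.*)</File_Name>";
    public const string Lazy     = "<File_Name>(?<insideText>.*?)</File_Name>";
    public const string Tempered = "<File_Name>(?<insideText>((?!</File_Name>).)*)</File_Name>";

    // Collects the "insideText" capture of every match, in order of appearance.
    public static string[] ExtractAll(string pattern, string input)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Groups["insideText"].Value;
        }
        return result;
    }

    public static void Main()
    {
        string s = "<File_Name>bbbbbbbbb</File_Name><File_Name>aaaaaaaaaaaa</File_Name>";

        // Greedy: one match that swallows the middle tags.
        Console.WriteLine(string.Join(" | ", ExtractAll(Greedy, s)));
        // Lazy and lookahead variants: two separate matches.
        Console.WriteLine(string.Join(" | ", ExtractAll(Lazy, s)));
        Console.WriteLine(string.Join(" | ", ExtractAll(Tempered, s)));
    }
}
```

For input like this the lazy and lookahead versions give the same captures; the lazy form is shorter, while the lookahead form states explicitly that the value may never contain the closing tag.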

        One thing that might also catch up with you is a file that is formatted like
        this (let's hope the newsreader will leave this intact):
        <File_Name>
        bbbbbbbbb
        </File_Name>

        This is probably syntactically correct, but as . normally doesn't match
        a newline, you will need to pass an extra option to the Regex constructor
        (for either pattern above) which allows . to match newlines. The captured
        value will then include the surrounding newlines.
        Regex re = new Regex("your regex here", RegexOptions.Singleline);
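
A small sketch of the effect of `RegexOptions.Singleline` on the multi-line layout, using the lazy pattern. `SinglelineDemo` and `Extract` are hypothetical names, and the sample input assumes plain `\n` line endings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    private const string Pattern = "<File_Name>(?<insideText>.*?)</File_Name>";

    // Returns the raw capture (newlines included), or null when nothing matches.
    public static string Extract(string input, RegexOptions options)
    {
        Match m = new Regex(Pattern, options).Match(input);
        return m.Success ? m.Groups["insideText"].Value : null;
    }

    public static void Main()
    {
        string s = "<File_Name>\nbbbbbbbbb\n</File_Name>";

        // Without the option, '.' stops at the first newline and the match fails.
        Console.WriteLine(Extract(s, RegexOptions.None) ?? "(no match)");

        // With Singleline, '.' also matches '\n'; the capture still carries the
        // newlines, so trim it before use.
        string value = Extract(s, RegexOptions.Singleline);
        Console.WriteLine(value == null ? "(no match)" : value.Trim());
    }
}
```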

        Alternatively you could 'eat up' all whitespace around the file name, but
        only if you're very sure the file name itself will never contain a newline
        or have whitespace at its start or end. Use @ verbatim strings here so that
        \s reaches the regex engine instead of being rejected as a C# escape.

        1)
        Regex re = new Regex(@"<File_Name>\s*(?<insideText>.*?)\s*</File_Name>");
        2)
        Regex re = new Regex(@"<File_Name>\s*(?<insideText>((?!</File_Name>).)*?)\s*</File_Name>");
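
The whitespace-eating variant can be tried out along these lines. `TrimDemo` and `Extract` are hypothetical names, and the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimDemo
{
    // Verbatim string: \s is passed through to the regex engine unchanged.
    private static readonly Regex Re =
        new Regex(@"<File_Name>\s*(?<insideText>.*?)\s*</File_Name>");

    // Returns the captured name without surrounding whitespace, or null if no match.
    public static string Extract(string input)
    {
        Match m = Re.Match(input);
        return m.Success ? m.Groups["insideText"].Value : null;
    }

    public static void Main()
    {
        // \s* absorbs the newlines and indentation around the name,
        // so Singleline is not needed for this layout.
        Console.WriteLine(Extract("<File_Name>\n  bbbbbbbbb  \n</File_Name>")); // bbbbbbbbb

        // Whitespace inside the name is preserved.
        Console.WriteLine(Extract("<File_Name> my file.txt </File_Name>"));     // my file.txt
    }
}
```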

        Kind Regards,

        Jesse Houwing
        --
        Jesse Houwing
        jesse.houwing at sogeti.nl



        • Ignacio Machin ( .NET/ C# MVP )

          #5
          Re: Parsing in between strings using Regex

          Hi,

          "CJ" <cjrivers@noemail.com> wrote in message
          news:e6vBobNfIHA.536@TK2MSFTNGP06.phx.gbl...
          > Thanks Arne,
          >
          > Seems like the ".*" was messing me up.
          >
          > This regular expression is so hard at times, I don't know how
          > you guys have this thing figured out.

          Practice. Try it a couple of times until you find the correct way.

          Also a book would help you ;)
