Regex to retain only the HTML body

This topic is closed.
  • Karch

    Regex to retain only the HTML body

    If you run this:

    string result = "<html><head></head><body>The body</body></html>";
    result = retainBody.Replace(result, "$1");


    With the following Regex:

    private static readonly Regex retainBody = new
    Regex(@"<\s*body[^>]*>(.*)<[\s/]*body[^>]*>", RegexOptions.Compiled |
    RegexOptions.IgnoreCase | RegexOptions.Singleline);


    You get this as the return:

    <html><head></head>The body</html>

    I want this instead:

    The body


  • Nikola Stjelja

    #2
    Re: Regex to retain only the HTML body

    Karch wrote:
    If you run this:
    >
    string result = "<html><head></head><body>The body</body></html>";
    result = retainBody.Replace(result, "$1");
    >
    >
    With the following Regex:
    >
    private static readonly Regex retainBody = new
    Regex(@"<\s*body[^>]*>(.*)<[\s/]*body[^>]*>", RegexOptions.Compiled |
    RegexOptions.IgnoreCase | RegexOptions.Singleline);
    >
    >
    You get this as the return:
    >
    <html><head></head>The body</html>
    >
    I want this instead:
    >
    The body
    >
    >
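    Try this:

    string result = "<html><head></head><body>The body</body></html>";
    // Capture the body content in a named group instead of using Replace.
    // The options are carried over from the pattern in the question.
    Regex reg = new
    Regex(@"<\s*body[^>]*>(?<body>(.*))<[\s/]*body[^>]*>",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);
    Match body = reg.Match(result);
    Console.WriteLine(body.Groups["body"].Value);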
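
    The reason Replace did not give the expected result is that it only substitutes the part of the input the pattern actually matched. Everything before <body> and after </body> lies outside the match, so it is copied through unchanged. Below is a minimal sketch of the Match approach wrapped in a helper. The names BodyExtractor and RetainBody are made up for illustration, and returning the input unchanged when no body is found is an assumption about the desired behaviour.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BodyExtractor
{
    // Pattern from the question, with the capture turned into a named group.
    private static readonly Regex BodyPattern = new Regex(
        @"<\s*body[^>]*>(?<body>.*)<[\s/]*body[^>]*>",
        RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns the text between the body tags,
    // or the input unchanged when no body element is found (an assumption).
    public static string RetainBody(string html)
    {
        Match match = BodyPattern.Match(html);
        return match.Success ? match.Groups["body"].Value : html;
    }

    public static void Main()
    {
        string result = "<html><head></head><body>The body</body></html>";
        Console.WriteLine(RetainBody(result)); // prints: The body
    }
}
```

    If you would rather keep Replace, one option is to add .* on both sides of the pattern so that the match covers the whole string and nothing is left over. With RegexOptions.Singleline this also works across line breaks.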
