Regular Expressions Difficulty

**Martin Honnen** · Jul 23 '05, 03:55 PM

Re: Regular Expressions Difficulty

Befuddled wrote:
> I am writing a function to have its argument, an HTML-containing string,
> return a DOM 1 Document Fragment, and so it seems the use of regular
> expressions (REs) is a natural.

HTML browsers have HTML parsing built in, so why do you need regular
expressions to parse HTML? Why don't you simply create an element, set
its innerHTML to the HTML snippet and then read out the child nodes as
needed:
var div = document.createElement('div');
div.innerHTML = htmlString;
Now build a document fragment if needed and simply move the child nodes
of the div to the fragment.
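
A minimal sketch of the innerHTML approach described above, assuming a browser that supports both innerHTML and document.createDocumentFragment. The helper name htmlToFragment and its optional doc parameter are hypothetical, added only for illustration:

```javascript
// Hypothetical helper: let the browser's own HTML parser (via innerHTML)
// build the nodes, then move them into a DocumentFragment.
function htmlToFragment(htmlString, doc) {
  doc = doc || document;               // default to the current page's document
  var div = doc.createElement('div');  // scratch container, never inserted
  div.innerHTML = htmlString;          // the browser parses the markup here
  var frag = doc.createDocumentFragment();
  // appendChild moves a node out of its old parent, so div.firstChild
  // advances to the next child on every iteration until the div is empty.
  while (div.firstChild) {
    frag.appendChild(div.firstChild);
  }
  return frag;
}
```

For example, document.body.appendChild(htmlToFragment('<p>Hello <b>world</b></p>')); appends the parsed paragraph to the page, after which the fragment itself is empty.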

--

Martin Honnen

http://JavaScript.FAQTs.com/

**Befuddled** · Jul 23 '05, 03:55 PM

Martin Honnen <mahotrash@yahoo.de> wrote in
news:41baedaf$0$16044$9b4e6d93@newsread4.arcor-online.net:
>
> Befuddled wrote:
>
>> I am writing a function to have its argument, an HTML-containing string,
>> return a DOM 1 Document Fragment, and so it seems the use of regular
>> expressions (REs) is a natural.
>
> HTML browsers have HTML parsing built in, so why do you need regular
> expressions to parse HTML? Why don't you simply create an element, set
> its innerHTML to the HTML snippet and then read out the child nodes as
> needed:
> var div = document.createElement('div');
> div.innerHTML = htmlString;

I was avoiding the property 'innerHTML' because I did not know whether it
was standardized in any level of the DOM. I am ABSOLUTELY avoiding the
use of extensions beyond the standard (or, as it is more modestly put
forth, the "recommendation"), no matter how many browsers can interpret
them, even if that is 99.999% of all browsers used on the planet.

If 'innerHTML' is now standardized, that saves a lot of
work/coding/function writing. Searching the DOM (and, for that matter,
JavaScript) specifications I have for the property 'innerHTML' produces
ZERO results. Please provide a URL to the DOM and/or JavaScript
specification I am missing so that I can make use of that information.
Thanks.

> Now build a document fragment if needed and simply move the child nodes
> of the div to the fragment.

**Martin Honnen** · Jul 23 '05, 03:55 PM

Befuddled wrote:
> Martin Honnen <mahotrash@yahoo.de> wrote
>
>> HTML browsers have HTML parsing built in, so why do you need regular
>> expressions to parse HTML? Why don't you simply create an element, set
>> its innerHTML to the HTML snippet and then read out the child nodes as
>> needed:
>> var div = document.createElement('div');
>> div.innerHTML = htmlString;
>
> I was avoiding the property 'innerHTML' because I did not know whether it
> was standardized in any level of the DOM. I am ABSOLUTELY avoiding the
> use of extensions beyond the standard (or, as it is more modestly put
> forth, the "recommendation")

So you would prefer, say, createDocumentFragment to innerHTML because
createDocumentFragment is in the W3C recommendation but innerHTML is
not? For instance, IE 5.5 doesn't support createDocumentFragment, so
your code will not work there. innerHTML certainly has far greater
support than createDocumentFragment.

But anyway, as for your regular expression problem: matching is greedy
by default, meaning as much as possible is matched, so your expression
correctly consumes characters up to the last > it can find.
If you want non-greedy matching, you can put ? after the quantifier,
e.g.
.+?
but that is only supported by ECMAScript Edition 3 compatible
implementations; in older browsers such a construct is unlikely to give
the desired result.
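
To make the greedy/non-greedy difference concrete, here is a small comparison using String.prototype.match; the sample string is made up for illustration and is not the original poster's input:

```javascript
var html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ first runs to the end of the string, then backs off only
// until a '>' can match, i.e. to the LAST '>' in the string.
var greedy = html.match(/<(.+)>/);
console.log(greedy[0]); // prints: <b>bold</b> and <i>italic</i>

// Non-greedy (ECMAScript Edition 3 and later): .+? takes one character
// at a time and stops as soon as a '>' can match, i.e. at the FIRST '>'.
var lazy = html.match(/<(.+?)>/);
console.log(lazy[0]); // prints: <b>
```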

There are workarounds, such as a negated character class ([^>] matches
any character except >):
/<([^>]+)>/
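
A minimal sketch of the negated-class workaround applied to every tag in a string, using a global regular expression and exec in a loop, so it does not depend on the non-greedy quantifier. The helper name listTags is hypothetical. As a caveat, [^>] still stops at a > inside a quoted attribute value such as title="a>b", so this is not a full HTML tokenizer:

```javascript
// Hypothetical helper: collect the text between each '<' and the next '>'.
// [^>]+ can never match a '>', so each match ends at the nearest '>'.
// Caveat: a '>' inside a quoted attribute value will end the match early.
function listTags(html) {
  var re = /<([^>]+)>/g;  // the 'g' flag makes exec resume at re.lastIndex
  var tags = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    tags.push(m[1]);      // captured group: the tag's inner text
  }
  return tags;
}

var tags = listTags('<b>bold</b> and <i>italic</i>');
// tags is ["b", "/b", "i", "/i"]
```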

**Befuddled** · Jul 23 '05, 03:55 PM

Martin Honnen <mahotrash@yahoo.de> wrote in
news:41bb0730$0$16044$9b4e6d93@newsread4.arcor-online.net:
>
> Befuddled wrote:
>
>> Martin Honnen <mahotrash@yahoo.de> wrote
>>
>>> HTML browsers have HTML parsing built in, so why do you need regular
>>> expressions to parse HTML? Why don't you simply create an element,
>>> set its innerHTML to the HTML snippet and then read out the child
>>> nodes as needed:
>>> var div = document.createElement('div');
>>> div.innerHTML = htmlString;
>>
>> I was avoiding the property 'innerHTML' because I did not know whether
>> it was standardized in any level of the DOM. I am ABSOLUTELY avoiding
>> the use of extensions beyond the standard (or, as it is more modestly
>> put forth, the "recommendation")
>
> So you would prefer, say, createDocumentFragment to innerHTML because
> createDocumentFragment is in the W3C recommendation but innerHTML is
> not? For instance, IE 5.5 doesn't support createDocumentFragment, so
> your code will not work there. innerHTML certainly has far greater
> support than createDocumentFragment.

You're right. I was hasty in my explanation of adhering to the standard.
I should have said that my first duty is to the standard: I get
standards-based code in place first, and then, where possible, I add
browser-dependent code to accommodate browsers that don't happen to
understand the standard. Sorry for being misleading, sounding
impractical, and being too adamant.
> But anyway, as for your regular expression problem: matching is greedy
> by default, meaning as much as possible is matched, so your expression
> correctly consumes characters up to the last > it can find.

I suppose there was a good reason why the original developers of regular
expressions wanted them to consume as much text as possible when
matching, rather than grabbing the minimum (working from left to right,
rather than right to left). I would love to know their reasoning.
> If you want non-greedy matching, you can put ? after the quantifier,
> e.g.
> .+?
> but that is only supported by ECMAScript Edition 3 compatible
> implementations; in older browsers such a construct is unlikely to
> give the desired result.
> There are workarounds, such as a negated character class ([^>]
> matches any character except >):
> /<([^>]+)>/

Your solution appears to be working nicely. Thanks for all your good
information.

--

http://hume.realisticpolitics.com/
