Re: Search for multiple things in a string
Oliver Sturm <oliver@sturmnet.org> wrote:[color=blue][color=green]
> >I know I couldn't off the top of my head list all the characters which
> >need escaping for regular expressions - could you and every member of
> >your team?[/color]
>
> I think I might, they are not really as many as you think. But that's not
> the point; I use a testing tool when I create a larger expression and I
> most probably use it again when I make changes. I have comments on my
> regular expressions telling me what they do, what sample input and output
> is. The first thing that's important is just that someone has to recognize
> a regular expression when he encounters it, you're right about that.[/color]
Absolutely - especially when your tests may well not catch the problem.
For instance, if you have a search for "jon.skeet", are you going to
write a test to make sure that "jonxskeet" doesn't match? Unless you
actually know what to avoid (in which case you're likely to have
written it correctly in the first place) the test may well not pick up
on a missed character which needs escaping.
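To make the "jon.skeet" case concrete, here is a minimal sketch. It is in Python rather than .NET purely for illustration: `re.search` and `re.escape` stand in for the regex calls, and `str.find` stands in for IndexOf. The sample input strings are made up.

```python
import re

needle = "jon.skeet"

# Unescaped: "." matches any character, so "jonxskeet" is accepted too.
unescaped_hit = re.search(needle, "jonxskeet") is not None

# Escaped: re.escape turns "." into a literal dot, so only the real text matches.
escaped_hit = re.search(re.escape(needle), "jonxskeet") is not None
literal_hit = re.search(re.escape(needle), "mail jon.skeet now") is not None

# A plain substring search has no metacharacters, so nothing needs escaping.
find_hit = "jonxskeet".find(needle) != -1
```

A test that only checks the positive case ("jon.skeet" is found) passes for both the escaped and the unescaped pattern, which is why the missed escape tends to go unnoticed.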
[color=blue][color=green][color=darkred]
> >>>Whereas three calls to IndexOf is definitely more readable than a
> >>>regular expression which, depending on the strings involved may well
> >>>need to involve escaping.
> >>
> >>In this case, as far as it's described by the sample we've seen, I
> >>wouldn't favor the usage of regular expressions.[/color]
> >
> >Even though it's more than one call to a simple string function?[/color]
>
> Probably... the number of calls is not really what counts, is it?[/color]
I was only going by what you'd said previously:
<quote>
I'd even go so far as to say that as soon as more than one call to a
simple string function is needed for a given problem, most probably
I'll find the regular expression solution more readable.
</quote>
[color=blue]
> Sometimes, string parsing algorithms that don't make use of regular
> expressions involve several nested loops, several temporary variables and
> just a single call to a simple string function. Yet these beasts can be
> horrible because it takes only a short while until even the author can't
> reliably remember what the algorithm does.[/color]
Absolutely.
[color=blue]
> I won't contest the fact that three lines of code, calling IndexOf three
> times, are probably a better alternative to a regular expression.[/color]
Goodo :)
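For comparison, a rough sketch of the two approaches side by side. The original sample isn't quoted here, so this assumes the task is "does the text contain any of three terms"; the function names and the terms are hypothetical, and Python's `str.find` plays the role of IndexOf.

```python
import re

def contains_any_find(text, terms):
    """One plain substring search per term (the IndexOf-style approach)."""
    return any(text.find(term) != -1 for term in terms)

def contains_any_regex(text, terms):
    """A single regex alternation; every term must be escaped first."""
    if not terms:
        return False  # an empty pattern would otherwise match everything
    pattern = "|".join(re.escape(term) for term in terms)
    return re.search(pattern, text) is not None

# Hypothetical search terms, chosen because each contains a metacharacter.
terms = ["jon.skeet", "a+b", "(c)"]

# Forgetting re.escape silently changes the meaning of all three terms.
naive_hit = re.search("|".join(terms), "jonxskeet") is not None
```

Both functions give the same answers, but only the regex version has an escaping step that someone can forget when a term changes later.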
[color=blue][color=green]
> >They have a readability problem compared with simple operations - they
> >require more care than simple literals. To me, "more care required"
> >means "lower readability and maintainability", which is a problem.[/color]
>
> Well, let's agree to disagree. I'm still trying to make the point that the
> comparison with simple string literals is a bad one, because the two won't
> ever be equal alternatives in any real world problem situation.[/color]
I don't see how you can say that when using regular expressions was one
suggested solution, and using IndexOf was another suggested solution.
[color=blue]
> Use the simple operations as long as it makes sense, but don't
> hesitate to look at other solutions because you think someone else on
> the team might make a mistake changing a string literal later on.[/color]
If the other solution is likely to be fundamentally simpler, I'm all
for that. It was this particular situation that I was commenting on,
and the general comment that regular expressions are often used as a
sledgehammer to crack a pretty flimsy nut.
[color=blue][color=green]
> >I'm not saying they're hideously unreadable - just less readable.
> >That's enough for me.[/color]
>
> Jon, I'm with you most of the way. But there's a limit to the demand for
> readability, as I see it. I'm not likely to turn down a useful technology
> in cases where it is practically without alternatives because the solution
> doesn't please me aesthetically.[/color]
Me either - but where there *is* a practical alternative which is more
readable, I'll go for that. If you only have one solution, you *can't*
turn it down really, can you? (Unless you can forgo the feature which
requires it, of course, which is unlikely.)
--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too