Search for multiple things in a string

This topic is closed.
  • Jon Skeet [C# MVP]

    #16
    Re: Search for multiple things in a string

    Oliver Sturm <oliver@sturmnet.org> wrote:
    > >I know I couldn't off the top of my head list all the characters which
    > >need escaping for regular expressions - could you and every member of
    > >your team?
    >
    > I think I might, they are not really as many as you think. But that's not
    > the point; I use a testing tool when I create a larger expression and I
    > most probably use it again when I make changes. I have comments on my
    > regular expressions telling me what they do, what sample input and output
    > is. The first thing that's important is just that someone has to recognize
    > a regular expression when he encounters it, you're right about that.

    Absolutely - especially when your tests may well not catch the problem.
    For instance, if you have a search for "jon.skeet" , are you going to
    write a test to make sure that "jonxskeet" doesn't match? Unless you
    actually know what to avoid (in which case you're likely to have
    written it correctly in the first place) the test may well not pick up
    on a missed character which needs escaping.
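
To make the missed-escape case concrete, here is a minimal C# sketch. The sample inputs are made up for illustration; `Regex.Escape` is the framework call that escapes a literal so it matches only itself.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample inputs, chosen only to illustrate the point.
string wanted = "mail jon.skeet today";
string lookalike = "mail jonxskeet today";

// Unescaped: "." matches any character, so the lookalike matches too.
bool naiveHit = Regex.IsMatch(lookalike, "jon.skeet");

// Regex.Escape turns the literal into a pattern that matches only itself.
string escaped = Regex.Escape("jon.skeet");
bool escapedHit = Regex.IsMatch(lookalike, escaped);
bool escapedWanted = Regex.IsMatch(wanted, escaped);

// IndexOf has no special characters to worry about.
bool indexOfHit = lookalike.IndexOf("jon.skeet", StringComparison.Ordinal) >= 0;

Console.WriteLine($"{naiveHit} {escapedHit} {escapedWanted} {indexOfHit}");
// prints: True False True False
```

A test that only checks that "jon.skeet" is found would pass for both the escaped and the unescaped pattern, which is the gap described above.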

    > >>>Whereas three calls to IndexOf is definitely more readable than a
    > >>>regular expression which, depending on the strings involved may well
    > >>>need to involve escaping.
    > >>
    > >>In this case, as far as it's described by the sample we've seen, I
    > >>wouldn't favor the usage of regular expressions.
    > >
    > >Even though it's more than one call to a simple string function?
    >
    > Probably... the number of calls is not really what counts, is it?

    I was only going by what you'd said previously:

    <quote>
    I'd even go so far as to say that as soon as more than one call to a
    simple string function is needed for a given problem, most probably
    I'll find the regular expression solution more readable.
    </quote>

    > Sometimes, string parsing algorithms that don't make use of regular
    > expressions involve several nested loops, several temporary variables and
    > just a single call to a simple string function. Yet these beasts can be
    > horrible because it takes only a short while until even the author can't
    > reliably remember what the algorithm does.

    Absolutely.

    > I won't contest the fact that three lines of code, calling IndexOf three
    > times, are probably a better alternative to a regular expression.

    Goodo :)
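
For readers who want to see the two options side by side, here is a rough C# sketch. The input and the three search terms are hypothetical, since the original poster's strings are not shown in this part of the thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input and search terms, for illustration only.
string input = "please contact jon.skeet or admin";

// Option 1: three plain IndexOf calls, no characters with special meaning.
bool viaIndexOf =
    input.IndexOf("jon.skeet", StringComparison.Ordinal) >= 0 ||
    input.IndexOf("webmaster", StringComparison.Ordinal) >= 0 ||
    input.IndexOf("admin", StringComparison.Ordinal) >= 0;

// Option 2: one regular expression; each literal is escaped, then joined with "|".
string pattern = string.Join("|",
    Regex.Escape("jon.skeet"), Regex.Escape("webmaster"), Regex.Escape("admin"));
bool viaRegex = Regex.IsMatch(input, pattern);

Console.WriteLine($"{viaIndexOf} {viaRegex}");
```

Both give the same answer here; the regex version only stays correct as long as every literal keeps going through `Regex.Escape` when someone edits it later.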

    > >They have a readability problem compared with simple operations - they
    > >require more care than simple literals. To me, "more care required"
    > >means "lower readability and maintainability", which is a problem.
    >
    > Well, let's agree to disagree. I'm still trying to make the point that the
    > comparison with simple string literals is a bad one, because the two won't
    > ever be equal alternatives in any real world problem situation.

    I don't see how you can say that when using regular expressions was one
    suggested solution, and using IndexOf was another suggested solution.

    > Use the simple operations as long as it makes sense, but don't
    > hesitate to look at other solutions because you think someone else on
    > the team might make a mistake changing a string literal later on.

    If the other solution is likely to be fundamentally simpler, I'm all
    for that. It was this particular situation that I was commenting on,
    and the general comment that regular expressions are often used as a
    sledgehammer to crack a pretty flimsy nut.

    > >I'm not saying they're hideously unreadable - just less readable.
    > >That's enough for me.
    >
    > Jon, I'm with you most of the way. But there's a limit to the demand for
    > readability, as I see it. I'm not likely to turn down a useful technology
    > in cases where it is practically without alternatives because the solution
    > doesn't please me aesthetically.

    Me either - but where there *is* a practical alternative which is more
    readable, I'll go for that. If you only have one solution, you *can't*
    turn it down really, can you? (Unless you can forego the feature which
    requires it, of course, which is unlikely.)

    --
    Jon Skeet - <skeet@pobox.com>
    http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
    If replying to the group, please do not mail me too


    • Oliver Sturm

      #17
      Re: Search for multiple things in a string

      Jon Skeet [C# MVP] wrote:

      >>>Even though it's more than one call to a simple string function?
      >>
      >>Probably... the number of calls is not really what counts, is it?
      >
      >I was only going by what you'd said previously:
      >
      ><quote>
      >I'd even go so far as to say that as soon as more than one call to a
      >simple string function is needed for a given problem, most probably
      >I'll find the regular expression solution more readable.
      ></quote>

      I know I said that and I know you were referring to it. But I meant one
      call as in "one call at runtime", as opposed to "one line of code that
      makes the call".

      >>Well, let's agree to disagree. I'm still trying to make the point that the
      >>comparison with simple string literals is a bad one, because the two won't
      >>ever be equal alternatives in any real world problem situation.
      >
      >I don't see how you can say that when using regular expressions was one
      >suggested solution, and using IndexOf was another suggested solution.

      Sorry, I meant "simple string operations". And I meant that I wouldn't
      consider using a regular expression if an IndexOf could do the job just as
      well - the two are no equal alternatives because I wouldn't seriously
      consider one of them.

      >>Use the simple operations as long as it makes sense, but don't
      >>hesitate to look at other solutions because you think someone else on
      >>the team might make a mistake changing a string literal later on.
      >
      >If the other solution is likely to be fundamentally simpler, I'm all
      >for that. It was this particular situation that I was commenting on,
      >and the general comment that regular expressions are often used as a
      >sledgehammer to crack a pretty flimsy nut.

      You're right about that. Complex technologies tend to be misused more
      often than simple ones, don't they?

      >>Jon, I'm with you most of the way. But there's a limit to the demand for
      >>readability, as I see it. I'm not likely to turn down a useful technology
      >>in cases where it is practically without alternatives because the solution
      >>doesn't please me aesthetically.
      >
      >Me either - but where there is a practical alternative which is more
      >readable, I'll go for that. If you only have one solution, you can't
      >turn it down really, can you? (Unless you can forego the feature which
      >requires it, of course, which is unlikely.)

      Well, usually someone will come forward with other solutions, however
      far-fetched. One that can actually be quite a good alternative to more
      complex regular expression scenarios is writing a parser - or rather,
      using a compiler compiler to create one. But in my experience there's a
      lot of room for nicely written regular expressions, somewhere between a
      few IndexOf calls and a complete lex/yacc/SLK/Coco/R implementation. :-)


      Oliver Sturm
      --
      Expert programming and consulting services available
      See http://www.sturmnet.org (try /blog as well)


      • Jon Skeet [C# MVP]

        #18
        Re: Search for multiple things in a string

        Oliver Sturm <oliver@sturmnet.org> wrote:
        > ><quote>
        > >I'd even go so far as to say that as soon as more than one call to a
        > >simple string function is needed for a given problem, most probably
        > >I'll find the regular expression solution more readable.
        > ></quote>
        >
        > I know I said that and I know you were referring to it. But I meant one
        > call as in "one call at runtime", as opposed to "one line of code that
        > makes the call".

        Not quite with you there - in this case, there would be three calls at
        runtime, and three lines of code.

        > >>Well, let's agree to disagree. I'm still trying to make the point that the
        > >>comparison with simple string literals is a bad one, because the two won't
        > >>ever be equal alternatives in any real world problem situation.
        > >
        > >I don't see how you can say that when using regular expressions was one
        > >suggested solution, and using IndexOf was another suggested solution.
        >
        > Sorry, I meant "simple string operations". And I meant that I wouldn't
        > consider using a regular expression if an IndexOf could do the job just as
        > well - the two are no equal alternatives because I wouldn't seriously
        > consider one of them.

        Right - but unfortunately (IMO) other people do.

        > >If the other solution is likely to be fundamentally simpler, I'm all
        > >for that. It was this particular situation that I was commenting on,
        > >and the general comment that regular expressions are often used as a
        > >sledgehammer to crack a pretty flimsy nut.
        >
        > You're right about that. Complex technologies tend to be misused more
        > often than simple ones, don't they?

        Absolutely...

        > >Me either - but where there is a practical alternative which is more
        > >readable, I'll go for that. If you only have one solution, you can't
        > >turn it down really, can you? (Unless you can forego the feature which
        > >requires it, of course, which is unlikely.)
        >
        > Well, usually someone will come forward with other solutions, however
        > far-fetched. One that can actually be quite a good alternative to more
        > complex regular expression scenarios is writing a parser - or rather,
        > using a compiler compiler to create one. But in my experience there's a
        > lot of room for nicely written regular expressions, somewhere between a
        > few IndexOf calls and a complete lex/yacc/SLK/Coco/R implementation. :-)

        Oh certainly. I'm really *not* trying to suggest that regular
        expressions should never be used - just that they shouldn't be the
        first port of call as soon as you need to do anything with a string :)

        --
        Jon Skeet - <skeet@pobox.com>
        http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
        If replying to the group, please do not mail me too


        • Oliver Sturm

          #19
          Re: Search for multiple things in a string

          Jon Skeet [C# MVP] wrote:

          >>><quote>
          >>>I'd even go so far as to say that as soon as more than one call to a
          >>>simple string function is needed for a given problem, most probably
          >>>I'll find the regular expression solution more readable.
          >>></quote>
          >>
          >>I know I said that and I know you were referring to it. But I meant one
          >>call as in "one call at runtime", as opposed to "one line of code that
          >>makes the call".
          >
          >Not quite with you there - in this case, there would be three calls at
          >runtime, and three lines of code.

          And in this case I would be prepared to see things differently - I said
          already that I don't believe in call counting. But the sentence you quoted
          was meant more in the context of the problem I was describing, where
          simple string functions are used as a part of a, possibly hugely
          complicated, larger algorithm.

          As soon as there are loops involved, which may or may not result in a
          single line with such a call being executed multiple times, things start
          getting complex very quickly in my experience. How often have you been
          sitting there with the debugger running, counting characters in a string
          to find that one-off problem somebody introduced? I'll take an enormously
          unreadable regular expression over that task any day :-)



          Oliver Sturm
          --
          Expert programming and consulting services available
          See http://www.sturmnet.org (try /blog as well)


          • tshad

            #20
            Re: Search for multiple things in a string

            "Oliver Sturm" <oliver@sturmnet.org> wrote in message
            news:xn0e7a65k64n7ok00f@msnews.microsoft.com...
            > Jon Skeet [C# MVP] wrote:
            >
            >>>In a hurry, all kinds of things can happen when making changes to source
            >>>code.
            >>
            >>Indeed - but why make it even easier to introduce bugs? Changing a
            >>search from "somewhere" to "somewhere.com" shouldn't be something
            >>which requires significant thought, in my view - but it does as soon as
            >>you're using regular expressions.
            >
            > But in any proper real-world use case of regular expressions, there won't
            > be an expression saying "somewhere" to start with. If the pattern string
            > doesn't show any trace of wildcards or other recognizable regular
            > expression features, it should be safe to assume that regular expressions
            > aren't being used. If a string in some source code I don't know shows
            > signs of being a match pattern and there's nothing else that tells me
            > whether it's a regular expression or not, I'll have to look and find it
            > out, there's no way around that. To be safe in assuming that no string
            > could ever be a regular expression, regardless of whether it looks like
            > it, you would have to forbid them completely in your team at least.
            >
            >>>As I'm trying to say all the time, as soon as an implementation reaches a
            >>>complexity that makes it worth thinking about regular expressions, I'm
            >>>sure an alternative solution based on simple string functions won't be
            >>>more readable any longer.
            >>
            >>Well, Nicholas certainly thought it worth thinking about regular
            >>expressions in this case - do you? (The earlier part of your reply
            >>suggests not, but the bit below suggests you do.)
            >>
            >>>I'd even go so far as to say that as soon as
            >>>more than one call to a simple string function is needed for a given
            >>>problem, most probably I'll find the regular expression solution more
            >>>readable. This is, after all, a subjective decision to make.
            >>
            >>Whereas three calls to IndexOf is definitely more readable than a
            >>regular expression which, depending on the strings involved may well
            >>need to involve escaping.
            >
            > In this case, as far as it's described by the sample we've seen, I
            > wouldn't favor the usage of regular expressions. I don't know whether the
            > actual code that the OP is writing might justify regexes better. Anyway, I
            > was merely using the case to demonstrate the fact that regular expressions
            > don't have a readability problem, IMHO, or at least they don't need to
            > have one if used properly.

            I also feel that Regular Expressions, being an object in asp.net (not
            necessarily C#) makes it just as valid as C#.

            As far as readability, it has nothing to do with Regular Expressions whether
            it is readable or not, as Oliver mentions, but how you write it.

            You can also make some pretty unreadable C# code as well. Readability is a
            function of the programmer not the language (in most cases). As was also
            mentioned you also need to know the language. For someone not used to
            objects, abstract objects and interfaces are also hard to read.

            I like seeing different options and make a choice. Sometimes I may use
            something like Regex just so I am used to using it, as long as the problem
            warrants it.

            You don't use it - you lose it.

            Tom
            >
            >
            > Oliver Sturm
            > --
            > Expert programming and consulting services available
            > See http://www.sturmnet.org (try /blog as well)



            • Jon Skeet [C# MVP]

              #21
              Re: Search for multiple things in a string

              tshad <tscheiderich@ftsolutions.com> wrote:
              > Escaping?
              >
              > You've mentioned that as being a problem a couple of times.
              >
              > What do you mean by this?
              >
              > Are you talking about stopping if you find the first one matching?

              No - I'm talking about finding things like "jon.skeet" in a string.
              Using IndexOf, that's no problem - no characters are interpreted in a
              "special" way by IndexOf.

              Regular expressions, however, treat "." as "any character", so to find
              an actual dot, you need to escape it with a backslash - and from a C#
              point of view that means either doubling the backslash or using a
              verbatim string literal, i.e.
              "jon\\.skeet"
              or
              @"jon\.skeet"
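
As a quick check of the two literal forms, the following C# sketch (variable names are illustrative only) shows that both spell the same ten-character pattern and that the pattern matches a real dot but not an arbitrary character.

```csharp
using System;
using System.Text.RegularExpressions;

string regular = "jon\\.skeet";   // regular literal: the backslash is doubled
string verbatim = @"jon\.skeet";  // verbatim literal: the backslash is written once

// Both literals produce the same pattern text: jon\.skeet
bool samePattern = regular == verbatim;
int patternLength = verbatim.Length;

// The escaped dot matches only a literal dot.
bool matchesDot = Regex.IsMatch("jon.skeet", verbatim);
bool matchesOther = Regex.IsMatch("jonxskeet", verbatim);

Console.WriteLine($"{samePattern} {patternLength} {matchesDot} {matchesOther}");
// prints: True 10 True False
```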

              --
              Jon Skeet - <skeet@pobox.com>
              http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
              If replying to the group, please do not mail me too


              • tshad

                #22
                Re: Search for multiple things in a string

                "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
                news:MPG.1d9919a0e0d6ec1e98c76e@msnews.microsoft.com...
                > tshad <tscheiderich@ftsolutions.com> wrote:
                >> Escaping?
                >>
                >> You've mentioned that as being a problem a couple of times.
                >>
                >> What do you mean by this?
                >>
                >> Are you talking about stopping if you find the first one matching?
                >
                > No - I'm talking about finding things like "jon.skeet" in a string.
                > Using IndexOf, that's no problem - no characters are interpreted in a
                > "special" way by IndexOf.
                >
                > Regular expressions, however, treat "." as "any character", so to find
                > an actual dot, you need to escape it with a backslash - and from a C#
                > point of view that means either doubling the backslash or using a
                > verbatim string literal, i.e.
                > "jon\\.skeet"
                > or
                > @"jon\.skeet"

                Got ya.

                I thought you were talking about escaping the function/call as you might in
                a loop when you find what you are looking for.

                Thanks,

                Tom
                >
                > --
                > Jon Skeet - <skeet@pobox.com>
                > http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                > If replying to the group, please do not mail me too



                • Jon Skeet [C# MVP]

                  #23
                  Re: Search for multiple things in a string

                  tshad <tscheiderich@ftsolutions.com> wrote:
                  > I also feel that Regular Expressions, being an object in asp.net (not
                  > necessarily C#) makes it just as valid as C#.

                  Regular expressions have nothing to do with ASP.NET - they're a part of
                  "normal" .NET.

                  > As far as readability, it has nothing to do with Regular Expressions whether
                  > it is readable or not, as Oliver mentions, but how you write it.

                  No - I believe that searching for "jon.skeet" with IndexOf is clearer
                  than searching for "jon\\.skeet" or @"jon\.skeet". Which of them
                  contains just the information which is actually of concern, and which
                  contains information which is only present due to the technology used
                  to do the searching?

                  > You can also make some pretty unreadable C# code as well.

                  Sure, but that's no reason to use regular expressions just to make
                  things worse.

                  > Readability is a function of the programmer not the language (in most
                  > cases).

                  Yes, but it's the programmer's decision how to approach things -
                  whether you do things the simple way or the complex way. You *could*
                  implement the string search by manually iterating over all the
                  characters in the string, perhaps even writing your own state machine
                  to do it. The code could be pretty readable considering what it's doing
                  - but it's *bound* to be more complex than using IndexOf.
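
As an illustration of that comparison, here is a naive hand-written search next to the built-in call. `ManualIndexOf` is a hypothetical helper written for this sketch, not something from the thread or the framework, and the sample text is made up.

```csharp
using System;

// Naive hand-written substring search: returns the first index of needle
// in haystack, or -1 if it does not occur.
static int ManualIndexOf(string haystack, string needle)
{
    for (int start = 0; start + needle.Length <= haystack.Length; start++)
    {
        int offset = 0;
        while (offset < needle.Length && haystack[start + offset] == needle[offset])
        {
            offset++;
        }
        if (offset == needle.Length)
        {
            return start;
        }
    }
    return -1;
}

string text = "mail jon.skeet today";  // hypothetical sample input
int manual = ManualIndexOf(text, "jon.skeet");
int builtIn = text.IndexOf("jon.skeet", StringComparison.Ordinal);

Console.WriteLine($"{manual} {builtIn}");
// prints: 5 5
```

The loop is readable enough for what it does, but it has two nested loops and an index bound to get right, where the IndexOf line has none.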

                  > As was also mentioned you also need to know the language. For someone
                  > not used to objects, abstract objects and interfaces are also hard to
                  > read.

                  Sure - but why introduce unnecessary complexity? You're already
                  writing C#, so you'd better know C# - but why add regular expressions
                  into the mix when they're unnecessary?

                  > I like seeing different options and make a choice. Sometimes I may use
                  > something like Regex just so I am used to using it, as long as the problem
                  > warrants it.

                  And that's the point - I don't think this problem *does* warrant it.

                  > You don't use it - you lose it.

                  So do you add a database when you just need to do a hashtable lookup,
                  just in case you forget SQL? Do you use reflection to get at the value
                  of a property, just in case you forget how to use that? I hope not.

                  It's very important to use appropriate technology, rather than using it
                  for the sake of it. (It's one thing to experiment with technology for
                  the sake of it as a learning tool, but I wouldn't do it in production
                  code.)

                  --
                  Jon Skeet - <skeet@pobox.com>
                  http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                  If replying to the group, please do not mail me too


                  • tshad

                    #24
                    Re: Search for multiple things in a string

                    "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
                    news:MPG.1d99259a591cb91698c770@msnews.microsoft.com...
                    > tshad <tscheiderich@ftsolutions.com> wrote:
                    >> I also feel that Regular Expressions, being an object in asp.net (not
                    >> necessarily C#) makes it just as valid as C#.
                    >
                    > Regular expressions have nothing to do with ASP.NET - they're a part of
                    > "normal" .NET.

                    Actually, you're right.

                    But that was my point.

                    Regex is part of .net as is C# (although it doesn't have to be) or VB.Net.
                    So using Regex is not really like using another language (as C# is different
                    from VB.Net).

                    But the discussion was valid in that you use the best tool for the situation.

                    >
                    >> As far as readability, it has nothing to do with Regular Expressions
                    >> whether
                    >> it is readable or not, as Oliver mentions, but how you write it.
                    >
                    > No - I believe that searching for "jon.skeet" with IndexOf is clearer
                    > than searching for "jon\\.skeet" or @"jon\.skeet".

                    That's maybe true. But it would be clear to someone used to using both C#
                    and Regex.

                    Also, you have the same problem when dealing with web pages or getting a
                    file from the disk. You still use the escape character there (and as you
                    say, is a little confusing) - but you still do it.

                    >Which of them
                    > contains just the information which is actually of concern, and which
                    > contains information which is only present due to the technology used
                    > to do the searching?
                    >
                    >> You can also make some pretty unreadable C# code as well.
                    >
                    > Sure, but that's no reason to use regular expressions just to make
                    > things worse.

                    I agree with you that readability is important.

                    It used to be that people didn't like C and C++ for exactly the same reason
                    you point out. The code was not as clear as COBOL or Basic and that was the
                    complaint back then. I happened to be a Fortran programmer at that time and
                    was not interested in moving to C for that reason (not that Fortran was
                    better - readability wise).

                    The problem with C back then was that much of the code was
                    really cryptic. But it didn't have to be; that was just how people coded
                    back then. Mainly, it was important to make the most efficient code
                    possible because of the limited computing power and efficient rarely equates
                    to readable. And I am not even talking about compiling and linking and all
                    the options and cryptic command lines.

                    >
                    >> Readability is a function of the programmer not the language (in most
                    >> cases).
                    >
                    > Yes, but it's the programmer's decision how to approach things -
                    > whether you do things the simple way or the complex way. You *could*
                    > implement the string search by manually iterating over all the
                    > characters in the string, perhaps even writing your own state machine
                    > to do it. The code could be pretty readable considering what it's doing
                    > - but it's *bound* to be more complex than using IndexOf.

                    I agree.

                    Just because you can - doesn't mean you should.

                    >
                    >> As was also mentioned you also need to know the language. For someone
                    >> not used to objects, abstract objects and interfaces are also hard to
                    >> read.
                    >
                    > Sure - but why introduce unnecessary complexity? You're already
                    > writing C#, so you'd better know C# - but why add regular expressions
                    > into the mix when they're unnecessary?

                    But if you know both and as I (and you) mentioned regex is part of .net as
                    is C# - so it is already in the mix. But you're right, don't introduce any
                    more complexity than necessary. But if it's 6 of one ... it's really up to
                    the programmer. In the original case, that was what it was. You can't tell
                    me that you feel that the solution suggested for this case was even close to
                    being unreadable (if you are even a stone's throw from understanding Regular
                    Expressions).

                    I personally feel that both solutions are equally usable and readable (in
                    this situation).

                    I have also seen times when I just couldn't find an easy solution in C# or
                    VB and it was fairly easy in Regex.

                    I myself would usually opt for the C# or VB solutions first, but would have
                    no problem using Regex. As a matter of fact, I use Regex to strip commas
                    and $ from my textbox fields before writing it to SQL as it was the best
                    solution I could find. Such as:

                    SalaryMax.Text = String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
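
For comparison, here is a small C# sketch of just the stripping step, done once with the same pattern and once with plain string calls. The sample value is hypothetical, and the poster's own CalculateYearly function and textbox controls are not reproduced here.

```csharp
using System;
using System.Text.RegularExpressions;

string raw = "$1,250,000";  // hypothetical textbox content

// Regex version; in a C# verbatim string the same pattern is written @"\$|\,".
string viaRegex = Regex.Replace(raw, @"\$|\,", "");

// Plain string version: two chained Replace calls, nothing to escape.
string viaReplace = raw.Replace("$", "").Replace(",", "");

Console.WriteLine($"{viaRegex} {viaReplace}");
// prints: 1250000 1250000
```

String.Replace is also available from VB.NET, so the chained form is one possible non-regex alternative; whether it reads better than the pattern is the judgment call this thread is arguing about.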

                    At the time, I couldn't seem to find as simple a solution as this in VB.Net
                    so I use this (not saying there isn't one).
                    >
                    >> I like seeing different options and make a choice. Sometimes I may use
                    >> something like Regex just so I am used to using it, as long as the
                    >> problem
                    >> warrants it.
                    >
                    > And that's the point - I don't think this problem *does* warrant it.

                    I agree that it isn't necessary here, but I don't think it is warranted or
                    unwarranted here. I think it's just as readable either way.
                    >
                    >> You don't use it - you lose it.
                    >
                    > So do you add a database when you just need to do a hashtable lookup,
                    > just in case you forget SQL? Do you use reflection to get at the value
                    > of a property, just in case you forget how to use that? I hope not.

                    Of course not. But as was mentioned there are times where Regex may be a
                    good solution and if you can do it either way, why not?

                    >
                    > It's very important to use appropriate technology, rather than using it
                    > for the sake of it. (It's one thing to experiment with technology for
                    > the sake of it as a learning tool, but I wouldn't do it in production
                    > code.)

                    Right. But Regex is not inappropriate technology. As you said, trying to
                    loop through each character when there is an easier way is a bit much.

                    But Regex is valid and is an appropriate method for handling strings and if
                    you are as comfortable with one as the other then it isn't inappropriate.
                    It's all in how you use it. And I was not saying experiment with it. I was
                    saying use it for the sake of staying familiar with it. I don't want to
                    need to use it and have to figure it out when I need to use it.

                    As you said. Use the appropriate tool. If the appropriate tool is Regex,
                    it is going to be d... inconvenient to need it and not know how to use it.

                    Now I am not saying go out and learn every tool out there. But if it is a
                    valid tool in your particular environment, and it is available - why would
                    you not avail yourself of it?

                    Tom
                    > --
                    > Jon Skeet - <skeet@pobox.com>
                    > http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                    > If replying to the group, please do not mail me too



                    • Jon Skeet [C# MVP]

                      #25
                      Re: Search for multiple things in a string

                      tshad <tscheiderich@ftsolutions.com> wrote:
                      > > Regular expressions have nothing to do with ASP.NET - they're a part of
                      > > "normal" .NET.
                      >
                      > Actually, you're right.
                      >
                      > But that was my point.
                      >
                      > Regex is part of .net as is C# (although it doesn't have to be) or VB.Net.
                      > So using Regex is not really like using another language (as C# is different
                      > from VB.Net).

                      It is - the regular expression *language* is a different language to
                      C#, in the same way that XPath is. That's why under "regular
                      expressions" in MSDN, there's a "language elements" section.

                      > But the discussion was valid in that you use the best tool for the situation.

                      Indeed.

                      > >> As far as readability, it has nothing to do with Regular Expressions
                      > >> whether
                      > >> it is readable or not, as Oliver mentions, but how you write it.
                      > >
                      > > No - I believe that searching for "jon.skeet" with IndexOf is clearer
                      > > than searching for "jon\\.skeet" or @"jon\.skeet".
                      >
                      > That's maybe true. But it would be clear to someone used to using both C#
                      > and Regex.

                      But not as instantly clear, I believe. Can you really say that you find
                      the regex version doesn't take you *any* longer to understand than the
                      non-regex version?
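
To make the escaping point concrete, here is a minimal C# sketch. It is not from the original posts: the class and method names are made up for illustration, and it assumes the "jon.skeet" / "jonxskeet" case raised earlier in the thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // Literal substring search: '.' only matches '.'.
    public static bool FoundByIndexOf(string input, string term)
    {
        return input.IndexOf(term, StringComparison.Ordinal) >= 0;
    }

    // Term used directly as a pattern: '.' matches any character except newline.
    public static bool FoundByRawPattern(string input, string term)
    {
        return Regex.IsMatch(input, term);
    }

    // Term passed through Regex.Escape first, so metacharacters are treated literally.
    public static bool FoundByEscapedPattern(string input, string term)
    {
        return Regex.IsMatch(input, Regex.Escape(term));
    }

    static void Main()
    {
        Console.WriteLine(FoundByIndexOf("jonxskeet", "jon.skeet"));        // False
        Console.WriteLine(FoundByRawPattern("jonxskeet", "jon.skeet"));     // True
        Console.WriteLine(FoundByEscapedPattern("jonxskeet", "jon.skeet")); // False
    }
}
```

Only the unescaped pattern reports a match, which is the kind of slip a test would have to be written specifically to catch.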

                      > Also, you have the same problem when dealing with web pages or getting a
                      > file from the disk. You still use the escape character there (and, as you
                      > say, it is a little confusing) - but you still do it.

                      You have to know the C# escaping, but not the regular expression
                      escaping.

                      > >> You can also make some pretty unreadable C# code as well.
                      > >
                      > > Sure, but that's no reason to use regular expressions just to make
                      > > things worse.
                      >
                      > I agree with you that readability is important.
                      >
                      > It used to be that people didn't like C and C++ for exactly the same reason
                      > you point out. The code was not as clear as COBOL or Basic and that was the
                      > complaint back then. I happened to be a Fortran programmer at that time and
                      > was not interested in moving to C for that reason (not that Fortran was
                      > better - readability wise).
                      >
                      > The problem with C back then was that much of the code was
                      > really cryptic. But it didn't have to be, that was just how people coded
                      > back then. Mainly, it was important to make the most efficient code
                      > possible because of the limited computing power and efficient rarely equates
                      > to readable. And I am not even talking about compiling and linking and all
                      > the options and cryptic command lines.

                      To me, a lot of readability comes from decent naming and commenting,
                      which fortunately are available in pretty much any language. I'd
                      certainly agree that object orientation (and exceptions, automatic
                      memory management etc) makes it a lot easier to write readable code
                      though.

                      > > Yes, but it's the programmer's decision how to approach things -
                      > > whether you do things the simple way or the complex way. You *could*
                      > > implement the string search by manually iterating over all the
                      > > characters in the string, perhaps even writing your own state machine
                      > > to do it. The code could be pretty readable considering what it's doing
                      > > - but it's *bound* to be more complex than using IndexOf.
                      >
                      > I agree.
                      >
                      > Just because you can - doesn't mean you should.

                      Exactly.

                      > > Sure - but why introduce unnecessary complexity? You're already
                      > > writing C#, so you'd better know C# - but why add regular expressions
                      > > into the mix when they're unnecessary?
                      >
                      > But if you know both and as I (and you) mentioned regex is part of .net as
                      > is C# - so it is already in the mix.

                      No, it's not. It's not already used in every single C# program, any
                      more than SQL is.

                      > But you're right, don't introduce any
                      > more complexity than necessary. But if it's 6 of one ... it's really up to
                      > the programmer.

                      In what way is it 6 of one or half a dozen of the other when one
                      solution requires knowing more than the other? I would expect *any* C#
                      programmer to know what String.IndexOf does. I wouldn't expect all C#
                      programmers to know by heart which regex language elements require
                      escaping - and if you don't know that off the top of your head, then
                      changing the code to search for a different string involves an extra
                      bit of brainpower.

                      > In the original case, that was what it was. You can't tell
                      > me that you feel that the solution suggested for this case was even close to
                      > being unreadable (if you are even a stone's throw from understanding Regular
                      > Expressions).

                      It was *less* readable though - and would have been *significantly*
                      less readable if the string being searched for had included dots,
                      brackets etc.

                      > I personally feel that both solutions are equally usable and readable (in
                      > this situation).

                      I suspect not all programmers would though. Don't forget that the
                      person who writes the code is very often not the one to maintain it.
                      Can you guarantee that *everyone* who touches the code will find
                      regexes as readable as String.IndexOf?

                      > I have also seen times when I just couldn't find an easy solution in C# or
                      > VB and it was fairly easy in Regex.

                      Which is why I've said repeatedly that I'm not trying to suggest that
                      regexes are bad, or should never be used. I'm just saying that in this
                      case it's using a sledgehammer to crack a nut.

                      > I myself would usually opt for the C# or VB solutions first, but would have
                      > no problem using Regex. As a matter of fact, I use Regex to strip commas
                      > and $ from my textbox fields before writing it to SQL as it was the best
                      > solution I could find. Such as:
                      >
                      > SalaryMax.Text =
                      > String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
                      >
                      > At the time, I couldn't seem to find as simple a solution as this in VB.Net
                      > so I use this (not saying there isn't one).

                      And of course there is:
                      SalaryMax.Text =
                      String.Format ("{0:c}",CalculateYearly(WagesMax.Text.Replace("$", "")
                      .Replace(",", "")));

                      I know which version I'd rather read...
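
For anyone who wants to check that the two approaches really do the same job, here is a small self-contained sketch. It is an illustration only: the `CurrencyCleanup` class, its method names and the sample value are assumptions, and `CalculateYearly` from the quoted snippet is left out because its definition isn't shown in the thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class CurrencyCleanup
{
    // Regex version, using the same pattern as the quoted snippet:
    // remove every '$' and ','.
    public static string StripWithRegex(string text)
    {
        return Regex.Replace(text, @"\$|\,", "");
    }

    // Plain string version: two chained String.Replace calls.
    public static string StripWithReplace(string text)
    {
        return text.Replace("$", "").Replace(",", "");
    }

    static void Main()
    {
        string wages = "$52,000.00";
        Console.WriteLine(StripWithRegex(wages));   // 52000.00
        Console.WriteLine(StripWithReplace(wages)); // 52000.00
    }
}
```

Both calls return the same string; the difference is only in what the reader has to know to follow them.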

                      > > And that's the point - I don't think this problem *does* warrant it.
                      >
                      > I agree that it isn't necessary here, but I don't think it is warranted or
                      > unwarranted here. I think it's just as readable either way.

                      But I suspect you're more used to regular expressions than many other
                      programmers - and making the code less readable for other programmers
                      for no benefit is what makes it unwarranted here, even in the simple
                      case where there's nothing to escape.

                      > > So do you add a database when you just need to do a hashtable lookup,
                      > > just in case you forget SQL? Do you use reflection to get at the value
                      > > of a property, just in case you forget how to use that? I hope not.
                      >
                      > Of course not. But as was mentioned there are times where Regex may be a
                      > good solution and if you can do it either way, why not?

                      Because it's more complicated! You can't deny that there's more to
                      consider due to the escaping. There's more to know, more to consider,
                      and it doesn't get the job done any more cleanly.

                      > > It's very important to use appropriate technology, rather than using it
                      > > for the sake of it. (It's one thing to experiment with technology for
                      > > the sake of it as a learning tool, but I wouldn't do it in production
                      > > code.)
                      >
                      > Right. But Regex is not inappropriate technology. As you said, trying to
                      > loop through each character when there is an easier way is a bit much.

                      As is using the power of regular expressions when there is an easier
                      way - using IndexOf, which is *precisely* there to find one string
                      within another.

                      > But Regex is valid and is an appropriate method for handling strings and if
                      > you are as comfortable with one as the other then it isn't inappropriate.
                      > It's all in how you use it. And I was not saying experiment with it. I was
                      > saying use it for the sake of staying familiar with it. I don't want to
                      > need to use it and have to figure it out when I need to use it.

                      Do you really think it would take you that long to refamiliarise
                      yourself with it? I don't see why it's a good idea to make some poor
                      maintenance engineer who hasn't used regular expressions before try to
                      figure out that *actually* you were just trying to find strings within
                      each other just so you can keep your skill set current.

                      > As you said. Use the appropriate tool. If the appropriate tool is Regex,
                      > it is going to be d... inconvenient to need it and not know how to use it.

                      I've never had a problem with reading the documentation when I've
                      needed to use regular expressions, without putting it in projects in
                      places where I *don't* need it.

                      > Now I am not saying go out and learn every tool out there. But if it is a
                      > valid tool in your particular environment, and it is available - why would
                      > you not avail yourself of it?

                      Because it makes things more complicated for no benefit. The reflection
                      example was a good one - that allows you to get a property value, so do
                      you think it's a good idea to write:

                      string x = (string) something.GetType()
                          .GetProperty("Name")
                          .GetValue(something, null);
                      or

                      string x = something.Name;

                      ?

                      Maybe I should use the former. After all, I wouldn't want to forget how
                      to use reflection, would I?

                      --
                      Jon Skeet - <skeet@pobox.com>
                      http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                      If replying to the group, please do not mail me too


                      • tshad

                        #26
                        Re: Search for multiple things in a string

                        "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
                        news:MPG.1d99bc169a63740998c77a@msnews.microsoft.com...
                        > tshad <tscheiderich@ftsolutions.com> wrote:
                        >> > Regular expressions have nothing to do with ASP.NET - they're a part of
                        >> > "normal" .NET.
                        >>
                        >> Actually, you're right.
                        >>
                        >> But that was my point.
                        >>
                        >> Regex is part of .net as is C# (although it doesn't have to be) or
                        >> VB.Net.
                        >> So using Regex is not really like using another language (as C# is
                        >> different
                        >> from VB.Net).
                        >
                        > It is - the regular expression *language* is a different language to
                        > C#, in the same way that XPath is. That's why under "regular
                        > expressions" in MSDN, there's a "language elements" section.

                        I think calling it a language is a stretch, although I know it is called a
                        language in places (it's all in what you define as a language). It really is
                        a text/string processor, as are IndexOf, Substring, Right, Replace etc. used
                        by various languages.

                        You don't build pages with it. It isn't procedural. It is a tool used by
                        the other languages. You don't use VB.Net in C# or vice versa but both use
                        Regular expressions (as they both use Substring, Replace etc.).

                        >
                        >> But the discussion was valid in that you use the best tool for the situation.
                        >
                        > Indeed.
                        >
                        >> >> As far as readability, it has nothing to do with Regular Expressions
                        >> >> whether
                        >> >> it is readable or not, as Oliver mentions, but how you write it.
                        >> >
                        >> > No - I believe that searching for "jon.skeet" with IndexOf is clearer
                        >> > than searching for "jon\\.skeet" or @"jon\.skeet".
                        >>
                        >> That's maybe true. But it would be clear to someone used to using both
                        >> C#
                        >> and Regex.
                        >
                        > But not as instantly clear, I believe. Can you really say that you find
                        > the regex version doesn't take you *any* longer to understand than the
                        > non-regex version?

                        Depends on the C# code as well as the Regex code.

                        Again, are we talking about the best tool for the job or the most
                        readable one? As was mentioned before, you may have to set up loops and
                        temporary variables to do what you can do in a simple Regular Expression.

                        Again, I am not pushing Regular Expressions here, just that they are just as
                        valid as C# (or VB.Net) string handlers.

                        I do use them when convenient.

                        For example, I was creating a simple text search engine and wanted to modify
                        what the user put in and found it simpler to do the following than in VB or
                        C:

                        ' The following replaces all multiple blanks with " ". It then takes
                        ' out the anomalies, such as "and not and" and replaces them with "and"

                        keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
                        keywords = Regex.Replace(keywords, "( )", " or ")
                        keywords = Regex.Replace(keywords," or or "," ")
                        keywords = Regex.Replace(keywords,"or and or","and")
                        keywords = Regex.Replace(keywords,"or near or","near")
                        keywords = Regex.Replace(keywords,"and not or","and not")

                        Fairly straightforward and easy to follow.

                        >
                        >> Also, you have the same problem when dealing with web pages or getting a
                        >> file from the disk. You still use the escape character there (and as you
                        >> say, is a little confusing) - but you still do it.
                        >
                        > You have to know the C# escaping, but not the regular expression
                        > escaping.

                        But you do NEED to know the C# escaping (readability not high - unless you
                        understand it).

                        >
                        >> >> You can also make some pretty unreadable C# code as well.
                        >> >
                        >> > Sure, but that's no reason to use regular expressions just to make
                        >> > things worse.
                        >>
                        >> I agree with you that readability is important.
                        >>
                        >> It used to be that people didn't like C and C++ for exactly the same
                        >> reason
                        >> you point out. The code was not as clear as COBOL or Basic and that was
                        >> the
                        >> complaint back then. I happened to be a Fortran programmer at that time
                        >> and
                        >> was not interested in moving to C for that reason (not that Fortran was
                        >> better - readability wise).
                        >>
                        >> The problem with C back then was that much of the code was
                        >> really cryptic. But it didn't have to be, that was just how people coded
                        >> back then. Mainly, it was important to make the most efficient code
                        >> possible because of the limited computing power and efficient rarely
                        >> equates
                        >> to readable. And I am not even talking about compiling and linking and
                        >> all
                        >> the options and cryptic command lines.
                        >
                        > To me, a lot of readability comes from decent naming and commenting,
                        > which fortunately are available in pretty much any language. I'd
                        > certainly agree that object orientation (and exceptions, automatic
                        > memory management etc) makes it a lot easier to write readable code
                        > though.

                        But writing objects and the objects themselves are not easily readable. But
                        you wouldn't advocate not writing them, would you?

                        >
                        >> > Yes, but it's the programmer's decision how to approach things -
                        >> > whether you do things the simple way or the complex way. You *could*
                        >> > implement the string search by manually iterating over all the
                        >> > characters in the string, perhaps even writing your own state machine
                        >> > to do it. The code could be pretty readable considering what it's doing
                        >> > - but it's *bound* to be more complex than using IndexOf.
                        >>
                        >> I agree.
                        >>
                        >> Just because you can - doesn't mean you should.
                        >
                        > Exactly.
                        >
                        >> > Sure - but why introduce unnecessary complexity? You're already
                        >> > writing C#, so you'd better know C# - but why add regular expressions
                        >> > into the mix when they're unnecessary?
                        >>
                        >> But if you know both and as I (and you) mentioned regex is part of .net
                        >> as
                        >> is C# - so it is already in the mix.
                        >
                        > No, it's not. It's not already used in every single C# program, any
                        > more than SQL is.

                        Nor are all the objects you use.

                        But if you are using .Net, it is part of the mix.

                        >
                        >> But you're right, don't introduce any
                        >> more complexity than necessary. But if it's 6 of one ... it's really up
                        >> to
                        >> the programmer.
                        >
                        > In what way is it 6 of one or half a dozen of the other when one
                        > solution requires knowing more than the other? I would expect *any* C#
                        > programmer to know what String.IndexOf does. I wouldn't expect all C#
                        > programmers to know by heart which regex language elements require
                        > escaping - and if you don't know that off the top of your head, then
                        > changing the code to search for a different string involves an extra
                        > bit of brainpower.

                        Why? Ever heard of references or cheat sheets? And what is wrong with a
                        little extra brainpower - if you don't use it, you lose it :)

                        I don't know all of the possible combinations of calls to every Object, but
                        that doesn't preclude me from using them.

                        My position has always been, don't memorize. You will remember what you
                        use. But if you know how to get it (where to look), then you have
                        everything you need.

                        I happen to use .Net. Regex is part of .Net. I would be limiting myself if
                        I didn't use Regex in places where it is appropriate. If I happen to know a
                        good way in Regex to solve a problem, I am not going to use *extra brainpower*
                        to try to solve the problem in C#.

                        >
                        >> In the original case, that was what it was. You can't tell
                        >> me that you feel that the solution suggested for this case was even close
                        >> to
                        >> being unreadable (if you are even a stone's throw from understanding
                        >> Regular
                        >> Expressions).
                        >
                        > It was *less* readable though - and would have been *significantly*
                        > less readable if the string being searched for had included dots,
                        > brackets etc.

                        But it didn't. But if it did, it is no different than having to deal with
                        escapes in C (less readable).

                        If you are talking about

                        if ((someString.IndexOf("something1",0) >= 0) ||
                            (someString.IndexOf("something2",0) >= 0) ||
                            (someString.IndexOf("something3",0) >= 0))
                        {
                            // Do something
                        }

                        vs

                        if (Regex.IsMatch(myString, @"something1|something2|something3"))

                        If you know absolutely nothing about Regular expressions, I would agree that
                        this is less readable.

                        But I would also contend that IndexOf could be just as confusing. What is
                        the first 0 for? What about the 2nd? It is readable because you know C.

                        I would maintain that even if you knew nothing about Regex, you would
                        assume that you are doing a Match (can't tell that from the word "IndexOf")
                        and it probably has something to do with the words "something1",
                        "something2" and "something3". Now if you know C then I would assume you
                        would pick up that "|" is "or" (not so clear to a VB programmer). And that
                        would be to someone not familiar with regular expressions doing a quick
                        perusal.

                        So I am at a loss as to how this regular expression is more unreadable than
                        the C# counterpart. That is not to say that you couldn't make it more
                        unreadable - but you could do the same with C# if you wanted to.
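
As a side-by-side for readers, here is a hedged sketch of both styles wrapped in helper methods. It is not from either poster: the `MultiSearch` class and its method names are invented for illustration, and the regex variant assumes at least one search term is supplied (an empty pattern would match any input).

```csharp
using System;
using System.Text.RegularExpressions;

static class MultiSearch
{
    // IndexOf version: true if any term occurs in the text.
    public static bool ContainsAnyIndexOf(string text, params string[] terms)
    {
        foreach (string term in terms)
        {
            if (text.IndexOf(term, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }

    // Regex version: escape each term, then join the results with '|'.
    // Assumes terms is non-empty.
    public static bool ContainsAnyRegex(string text, params string[] terms)
    {
        string[] escaped = new string[terms.Length];
        for (int i = 0; i < terms.Length; i++)
        {
            escaped[i] = Regex.Escape(terms[i]);
        }
        return Regex.IsMatch(text, string.Join("|", escaped));
    }

    static void Main()
    {
        string text = "contains something2 here";
        Console.WriteLine(ContainsAnyIndexOf(text, "something1", "something2", "something3")); // True
        Console.WriteLine(ContainsAnyRegex(text, "something1", "something2", "something3"));   // True
    }
}
```

With plain words the two give the same answer; the `Regex.Escape` call only starts to matter once a term contains characters such as `.` or `(`.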

                        >
                        >> I personally feel that both solutions are equally usable and readable (in
                        >> this situation).
                        >
                        > I suspect not all programmers would though. Don't forget that the
                        > person who writes the code is very often not the one to maintain it.
                        > Can you guarantee that *everyone* who touches the code will find
                        > regexes as readable as String.IndexOf?

                        As was said, you can make readable and unreadable C or Regex code. Are you
                        going to tell your programmers they "cannot" use Regex for the same reason?

                        Are you going to leave out some objects that programmers may not be familiar
                        with?

                        >
                        >> I have also seen times when I just couldn't find an easy solution in C#
                        >> or
                        >> VB and it was fairly easy in Regex.
                        >
                        > Which is why I've said repeatedly that I'm not trying to suggest that
                        > regexes are bad, or should never be used. I'm just saying that in this
                        > case it's using a sledgehammer to crack a nut.

                        And I don't in this case, as I think I've shown. Less typing, easy to read,
                        straightforward - in this case.[color=blue]
                        >[color=green]
                        >> I myself would usually opt for the C# or VB solutions first, but would
                        >> have
                        >> no problem using Regex. As a matter of fact, I use Regex to strip commas
                        >> and $ from my textbox fields before writing it to SQL as it was the best
                        >> solution I could find. Such as:
                        >>
                        >> SalaryMax.Text =
                        >> String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
                        >>
                        >> At the time, I couldn't seem to find as simple a solution as this in
                        >> VB.Net
                        >> so I use this (not saying there isn't one).[/color]
                        >
                        > And of course there is:
                        > SalaryMax.Text =
                        > String.Format ("{0:c}",CalculateYearly(WagesMax.Text.Replace("$", "")
                        > .Replace(",", "")));
                        >
                        > I know which version I'd rather read...[/color]

                        I can read either (although, I didn't know you could string multiple
                        "Replace"s together).
                        [color=blue]
                        >[color=green][color=darkred]
                        >> > And that's the point - I don't think this problem *does* warrant it.[/color]
                        >>
                        >> I agree that it isn't necessary here, but I don't think it is warranted
                        >> or
                        >> unwarranted here. I think it's just as readable either way.[/color]
                        >
                        > But I suspect you're more used to regular expressions than many other
                        > programmers - and making the code less readable for other programmers
                        > for no benefit is what makes it unwarranted here, even in the simple
                        > case where there's nothing to escape.[/color]

                        First of all, I am not. I don't use it much at all, but I find it easy to
                        figure out and straightforward (but you can make it really complex). I use
                        it to validate phone numbers, credit card numbers, zip codes etc., which are
                        very well documented. When there are a myriad of ways a user can input
                        these types of data, I prefer to use regular expressions, which are all over
                        the place (easy to find), rather than try to come up with some complex set of
                        loops and temporary variables which make it far easier to make a mistake and
                        are much more unreadable than the Regex equivalent.[color=blue]
                        >[color=green][color=darkred]
                        >> > So do you add a database when you just need to do a hashtable lookup,
                        >> > just in case you forget SQL? Do you use reflection to get at the value
                        >> > of a property, just in case you forget how to use that? I hope not.[/color]
                        >>
                        >> Of course not. But as was mentioned there are times where Regex may be a
                        >> good solution and if you can do it either way, why not.[/color]
                        >
                        > Because it's more complicated! You can't deny that there's more to
                        > consider due to the escaping. There's more to know, more to consider,
                        > and it doesn't get the job done any more cleanly.[/color]

                        Escaping seems to be your main complaint with it.

                        I have the same problem with C or VB when trying to remember when to use "\"
                        vs "/" in paths or do I need to add "\" in front of my slash or quote.
                        These are inherent problems with pretty much all of them.
                        [color=blue]
                        >[color=green][color=darkred]
                        >> > It's very important to use appropriate technology, rather than using it
                        >> > for the sake of it. (It's one thing to experiment with technology for
                        >> > the sake of it as a learning tool, but I wouldn't do it in production
                        >> > code.)[/color]
                        >>
                        >> Right. But Regex is not inappropriate technology. As you said, trying
                        >> to
                        >> loop through each character when there is an easier way is a bit much.[/color]
                        >
                        > As is using the power of regular expressions when there is an easier
                        > way - using IndexOf, which is *precisely* there to find one string
                        > within another.[/color]

                        I am not discounting IndexOf, I am just saying that both work fine and are
                        just as readable (in this case). In other cases, that may not be the case
                        (with either C or Regex).
                        [color=blue]
                        >[color=green]
                        >> But Regex is valid and is an appropriate method for handling strings and
                        >> if
                        >> you are as comfortable with one as the other then it isn't inappropriate.
                        >> It's all in how you use it. And I was not saying experiment with it. I
                        >> was
                        >> saying using it for the sake of staying familiar with it. I don't want
                        >> to
                        >> need to use it and have to figure it out when I need to use it.[/color]
                        >
                        > Do you really think it would take you that long to refamiliarise
                        > yourself with it? I don't see why it's a good idea to make some poor
                        > maintenance engineer who hasn't used regular expressions before try to
                        > figure out that *actually* you were just trying to find strings within
                        > each other just so you can keep your skill set current.[/color]

                        So you would prefer to code to the lowest common denominator.

                        I am not going to code to the level of a junior programmer. I prefer that
                        he learn to code to a higher level.

                        I am not saying that you shouldn't still write decent, readable, commented
                        code. But I am not going to limit myself because another programmer may not
                        be able to read well written code. If that were the case, I would not be
                        writing objects (abstract classes, interfaces, etc).
                        [color=blue]
                        >[color=green]
                        >> As you said. Use the appropriate tool. If the appropriate tool is
                        >> Regex,
                        >> it is going to be d... inconvenient to need it and not know how to use
                        >> it.[/color]
                        >
                        > I've never had a problem with reading the documentation when I've
                        > needed to use regular expressions, without putting it in projects in
                        > places where I *don't* need it.
                        >[/color]

                        "Need" is a personal question. I don't think it applies here. You prefer
                        IndexOf and I might prefer IsMatch.
                        [color=blue][color=green]
                        >> Now I am not saying go out and learn every tool out there. But if it is
                        >> a
                        >> valid tool in your particular environment, and it is available - why
                        >> would
                        >> you not avail yourself of it?[/color]
                        >
                        > Because it makes things more complicated for no benefit. The reflection
                        > example was a good one - that allows you to get a property value, so do
                        > you think it's a good idea to write:
                        >
                        > string x = (string) something.GetType()
                        > .GetProperty("Name")
                        > .GetValue(something, null);
                        > or
                        >
                        > string x = something.Name;
                        >
                        > ?
                        >
                        > Maybe I should use the latter. After all, I wouldn't want to forget how
                        > to use reflection, would I?[/color]

                        Lost me on that one.

                        Tom[color=blue]
                        >
                        > --
                        > Jon Skeet - <skeet@pobox.com>
                        > http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                        > If replying to the group, please do not mail me too[/color]



                        • Jon Skeet [C# MVP]

                          #27
                          Re: Search for multiple things in a string

                          tshad <tscheiderich@ftsolutions.com> wrote:[color=blue][color=green]
                          > > It is - the regular expression *language* is a different language to
                          > > C#, in the same way that XPath is. That's why under "regular
                          > > expressions" in MSDN, there's a "language elements" section.[/color]
                          >
                          > I think calling it a language is a stretch, although I know it is called a
                          > language in places (it's all in what you define as a language).[/color]

                          In plenty of places. It has a language with a defined syntax etc.
                          [color=blue]
                          > It really is
                          > a text/string processor, as is: IndexOf, Substring, Right, Replace etc used
                          > by various languages.
                          >
                          > You don't build pages with it. It isn't procedural.[/color]

                          Neither of those are required for it to be a language.
                          [color=blue]
                          > It is a tool used by the other languages.[/color]

                          Sure - so is XPath, but that's a language too.
                          (See http://www.w3.org/TR/xpath)
                          [color=blue]
                          > You don't use VB.Net in C# or Vice versa but both use
                          > Regular expressions (as they both use Substring, Replace etc).[/color]

                          None of those state that regular expressions aren't a language.
                          [color=blue][color=green]
                          > > But not as instantly clear, I believe. Can you really say that you find
                          > > the regex version doesn't take you *any* longer to understand than the
                          > > non-regex version?[/color]
                          >
                          > Depends on the C# code as well as the Regex code.[/color]

                          The C# code in question would be:

                          if (someVariable.IndexOf ("firstliteral") != -1 ||
                              someVariable.IndexOf ("secondliteral") != -1 ||
                              someVariable.IndexOf ("thirdliteral") != -1)

                          If I did it regularly, I'd write a short method which took a params
                          string array.
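
A rough sketch of the kind of short params-based method mentioned above might look like the following. The class and method names (StringSearch, ContainsAny) are assumptions made for illustration; nothing in the thread names them.

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper: the name ContainsAny is an assumption, not from the thread.
    // Returns true if 'text' contains at least one of the given literal strings.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (candidates == null) throw new ArgumentNullException(nameof(candidates));

        foreach (string candidate in candidates)
        {
            // Ordinal search: each candidate is matched character for character,
            // so nothing inside it ever needs escaping.
            if (text.IndexOf(candidate, StringComparison.Ordinal) != -1)
            {
                return true;
            }
        }
        return false;
    }
}
```

With something like this in place, the three-way test would read `if (StringSearch.ContainsAny(someVariable, "firstliteral", "secondliteral", "thirdliteral"))`.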
                          [color=blue]
                          > Again, are we talking about the best tool for the job or the most
                          > readability.[/color]

                          Unless there's another compelling argument in favour of one tool or
                          another, readability is a very important part of choosing the best
                          tool.
                          [color=blue]
                          > As was mentioned before, you set up loops and temporary
                          > variables to do what you can do in a simple Regular Expression.
                          >
                          > Again, I am not pushing Regular Expressions here, just that they are just as
                          > valid as C# (or VB.Net) string handlers.[/color]

                          But you're effectively pushing them in the situation described by the
                          OP when you say that the solution using regular expressions is as
                          readable as the solution without.
                          [color=blue]
                          > I do use them when convenient.
                          >
                          > For example, I was creating a simple text search engine and wanted to modify
                          > what the user put in and found it simpler to do the following than in VB or
                          > C:
                          >
                          > ' The following replaces all multiple blanks with " ". It then takes
                          > ' out the anomalies, such as "and not and" and replaces them with "and"
                          >
                          > keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
                          > keywords = Regex.Replace(keywords, "( )", " or ")
                          > keywords = Regex.Replace(keywords," or or "," ")
                          > keywords = Regex.Replace(keywords,"or and or","and")
                          > keywords = Regex.Replace(keywords,"or near or","near")
                          > keywords = Regex.Replace(keywords,"and not or","and not")
                          >
                          > Fairly straight forward and easy to follow.[/color]

                          Reasonably, although apart from the first regex, I'd suggest doing the
                          rest with straight calls to String.Replace. As an example of why I
                          think that would be more readable, what exactly does the second line do?
                          In some flavours of regular expressions, brackets form capturing
                          groups. Do they in .NET? I'd have to look it up. If it's really just
                          trying to replace the string "( )" with " or ", a call to
                          String.Replace would mean I didn't need to look anything up.
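
For what it's worth, parentheses do form capturing groups in .NET regular expressions by default, so the pattern "( )" matches a single space rather than the three-character string "( )". A small sketch of the difference, with class and method names made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketDemo
{
    // Regex version: "( )" is a capturing group around one space,
    // so every single space in the input is replaced with " or ".
    public static string WithRegex(string keywords)
    {
        return Regex.Replace(keywords, "( )", " or ");
    }

    // String.Replace version: looks for the literal three characters "( )"
    // and leaves ordinary spaces alone.
    public static string WithStringReplace(string keywords)
    {
        return keywords.Replace("( )", " or ");
    }
}
```

Given "cats dogs", the regex version produces "cats or dogs" while the String.Replace version returns the input unchanged. Which of the two the original line intended cannot be settled from the thread alone.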
                          [color=blue][color=green][color=darkred]
                          > >> Also, you have the same problem when dealing with web pages or getting a
                          > >> file from the disk. You still use the escape character there (and as you
                          > >> say, is a little confusing) - but you still do it.[/color]
                          > >
                          > > You have to know the C# escaping, but not the regular expression
                          > > escaping.[/color]
                          >
                          > But you do NEED to know the C# escaping (readability not high - unless you
                          > understand it).[/color]

                          Yes, but I *already* need to know that in order to write C#. Choosing
                          to use String.IndexOf doesn't add to what I need to remember - choosing
                          regular expressions does. In addition, there aren't many things which
                          need escaping compared with those which need escaping in regular
                          expressions. In addition to *that*, whenever you need to escape in
                          regular expressions, you also need to escape in C# (or remember to use
                          verbatim string literals) - yet another piece of headache.
                          [color=blue][color=green]
                          > > To me, a lot of readability comes from decent naming and commenting,
                          > > which fortunately are available in pretty much any language. I'd
                          > > certainly agree that object orientation (and exceptions, automatic
                          > > memory management etc) makes it a lot easier to write readable code
                          > > though.[/color]
                          >
                          > But writing objects and the objects themselves are not easily readable. But
                          > you wouldn't advocate not writing them, would you?[/color]

                          No, but I don't see how that's relevant.
                          [color=blue][color=green][color=darkred]
                          > >> But if you know both and as I (and you) mentioned regex is part of .net
                          > >> as is C# - so it is already in the mix.[/color]
                          > >
                          > > No, it's not. It's not already used in every single C# program, any
                          > > more than SQL is.[/color]
                          >
                          > Nor are all the objects you use.
                          >
                          > But if you are using .Net, it is part of the mix.[/color]

                          It's not necessarily part of the mix I have to use. I suspect *very*
                          few programs don't do any string manipulation - knowing the string
                          methods well is *far* more fundamental to .NET programming than knowing
                          regular expressions.
                          [color=blue][color=green]
                          > > In what way is it 6 of one or half a dozen of the other when one
                          > > solution requires knowing more than the other? I would expect *any* C#
                          > > programmer to know what String.IndexOf does. I wouldn't expect all C#
                          > > programmers to know by heart which regex language elements require
                          > > escaping - and if you don't know that off the top of your head, then
                          > > changing the code to search for a different string involves an extra
                          > > bit of brainpower.[/color]
                          >
                          > Why? Ever heard of references or cheat sheets? And what is wrong with a
                          > little extra brainpower - if you don't use it, you lose it :)[/color]

                          If you truly think that given two solutions which are otherwise equal,
                          the solution which is easiest to write, read and maintain doesn't win
                          hands down, we'll definitely never agree.

                          If you want to keep your hand in with respect to regular expressions,
                          do it in a test project, or with a regular expressions workbench. Keep
                          it out of code which needs to be read and maintained, probably by other
                          people who don't want to waste time because you wanted to keep your
                          skill set up to date.
                          [color=blue]
                          > I don't know all of the possible combinations of calls to every Object, but
                          > that doesn't preclude me from using them.[/color]

                          Exactly - and you wouldn't go out of your way to use methods you don't
                          need, just to get into the habit of using them, would you?
                          [color=blue]
                          > My position has always been, don't memorize. You will remember what you
                          > use. But if you know how to get it (where to look), then you have
                          > everything you need.[/color]

                          Absolutely - so why are you so keen on making people either memorise or
                          look up the characters which need escaping for regular expressions
                          every time they read or modify your code?
                          [color=blue]
                          > I happen to use .Net. Regex is part of .Net. I would be limiting myself if
                          > I didn't use Regex in places where it is appropriate.[/color]

                          I seem to be having difficulty making myself clear on this point: I
                          have never stated and will never state that you shouldn't use regular
                          expressions where they're appropriate. But they are *not* appropriate
                          in this case, as they are a more complex and less readable way of
                          solving the problem.

                          Show me a problem where the regex way of solving it is simpler than
                          using simple string operations (and there are plenty of problems like
                          that) and I'll plump for the regex in a heartbeat.
                          [color=blue]
                          > If I happen to know a good way in Regex to solve a problem, I am not
                          > going use *extra brainpower* to try to solve the problem in C#.[/color]

                          In what way is using the method which is designed for *precisely* the
                          task in hand (finding something in a string) using extra brainpower? If
                          you're not familiar with String.IndexOf, you've got *much* bigger
                          things to worry about than whether or not your regular expression
                          skills are getting rusty.
                          [color=blue][color=green]
                          > > It was *less* readable though - and would have been *significantly*
                          > > less readable if the string being searched for had included dots,
                          > > brackets etc.[/color]
                          >
                          > But it didn't. But if it did, it is no different than having to deal with
                          > escapes in C (less readable)
                          >
                          > If you are talking about
                          >
                          > if ((someString.IndexOf("something1",0) >= 0) ||
                          > (someString.IndexOf("something2",0) >= 0) ||
                          > (someString.IndexOf("something3",0) >= 0))
                          > {
                          > Do something
                          > }
                          >
                          > vs
                          >
                          > if (Regex.IsMatch(myString, @"something1|something2|something3"))
                          >
                          > If you know absolutely nothing about Regular expressions, I would agree that
                          > this is less readable.
                          >
                          > But I would also contend that IndexOf could be just as confusing. What is
                          > the first 0 for? What about the 2nd? It is readable because you know C.[/color]

                          Well, for a start the 0s aren't necessary, and I wouldn't include them.
                          [color=blue]
                          > I would maintain that if even if you knew nothing about Regex, you would
                          > assume that you are doing a Match (can't tell that from the word "IndexOf")
                          > and it probably has something to do with the words "something1",
                          > "something2" and "something3". Now if you know C then I would assume you
                          > would pick up that "|" is "or" (not so clear to a VB programmer). And that
                          > would be to someone not familiar with regular expressions doing a quick
                          > perusal[/color]

                          Okay - now suppose I need to change it from searching for "something1"
                          to "something.1" or "something[1]". How long does it take to change in
                          each version? How easy is it to read afterwards?
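
To make that change concrete, here is a minimal sketch comparing the two approaches when the new literal is dropped in without any escaping. The class and method names are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralChangeDemo
{
    // IndexOf treats the search string as plain text.
    public static bool IndexOfFinds(string text, string literal)
    {
        return text.IndexOf(literal, StringComparison.Ordinal) != -1;
    }

    // The literal is used directly as a regex pattern, with no escaping,
    // which is the mistake being discussed.
    public static bool NaiveRegexFinds(string text, string literal)
    {
        return Regex.IsMatch(text, literal);
    }
}
```

With the literal "something.1", the naive regex also matches "somethingX1", because "." matches any character. With "something[1]", it matches "something1" but not the text "something[1]" itself, because "[1]" is read as a character class. The IndexOf version finds exactly the text it is given in both cases.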
                          [color=blue]
                          > So I am at a loss as to how this regular expression is more unreadable than
                          > the C# counterpart. That is not to say that you couldn't make it more
                          > unreadable - but you could do the same with C# if you wanted to.[/color]

                          You could start by making the C# more readable, as I've shown...

                          However, the regex is already less readable:
                          1) It's got "|" as a "magic character" in there.
                          2) It's got all the strings concatenated, so it's harder to spot each
                          of them separately.

                          And that's before you need to actually *maintain* the code.

                          Furthermore, suppose you didn't just want to search for literals -
                          suppose one of the strings you wanted to search for was contained in a
                          variable. How sure are you that *no-one* on your team would use:

                          x+"|something2|something3"

                          as the regular expression?
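
One way to guard against that, assuming the team remembers to do it, is to pass the variable part through Regex.Escape before building the pattern. The sketch below uses made-up names (PatternBuilder, Unsafe, Escaped) purely for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    // What the post warns about: 'x' is spliced into the pattern unescaped,
    // so any regex metacharacters in it change the meaning of the search.
    public static string Unsafe(string x)
    {
        return x + "|something2|something3";
    }

    // A possible safer variant: escape the variable part first so that it is
    // matched as literal text.
    public static string Escaped(string x)
    {
        return Regex.Escape(x) + "|something2|something3";
    }
}
```

For example, with x set to "1.5" the unsafe pattern matches the text "version 125", while the escaped pattern only matches text that really contains "1.5". This is extra work that the IndexOf approach never needs.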
                          [color=blue][color=green]
                          > > I suspect not all programmers would though. Don't forget that the
                          > > person who writes the code is very often not the one to maintain it.
                          > > Can you guarantee that *everyone* who touches the code will find
                          > > regexes as readable as String.IndexOf?[/color]
                          >
                          > As was said, you can make readable and unreadable C or Regex code. Are you
                          > going to tell your programmers they "cannot" use Regex for the same reason?[/color]

                          I would tell programmers on my team not to use regular expressions
                          where the alternative is simpler and more readable, yes.
                          [color=blue]
                          > Are you going to leave out some objects that programmers may not be familiar
                          > with?[/color]

                          Absolutely, where there are simpler and more familiar ways of solving
                          the same problem.
                          [color=blue][color=green]
                          > > Which is why I've said repeatedly that I'm not trying to suggest that
                          > > regexes are bad, or should never be used. I'm just saying that in this
                          > > case it's using a sledgehammer to crack a nut.[/color]
                          >
                          > And I don't in this case, as I think I've shown. Less typing, easy to read,
                          > straight forward - in this case.[/color]

                          You've shown nothing of the kind - whereas I think I've given plenty of
                          examples of how using regular expressions makes the code less easily
                          maintainable, even if you consider it equally readable to start with
                          (which I don't).
                          [color=blue][color=green][color=darkred]
                          > >> SalaryMax.Text =
                          > >> String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
                          > >>
                          > >> At the time, I couldn't seem to find as simple a solution as this in
                          > >> VB.Net
                          > >> so I use this (not saying there isn't one).[/color]
                          > >
                          > > And of course there is:
                          > > SalaryMax.Text =
                          > > String.Format ("{0:c}",CalculateYearly(WagesMax.Text.Replace("$", "")
                          > > .Replace(",", "")));
                          > >
                          > > I know which version I'd rather read...[/color]
                          >
                          > I can read either (although, I didn't know you could string multiple
                          > "Replace"s together).[/color]

                          Yes, I can read either too. The point is that in reading my version, I
                          didn't need to wade through various special characters, working out
                          exactly what each was there for. Of course, your version wasn't even valid
                          C#, as it didn't escape the backslashes and you didn't specify a
                          verbatim literal. I assume it was originally VB.NET. I wonder which
                          version would be easier to convert to valid C#? Mine, perhaps?
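
As a sketch of what each version might look like once converted to C#: the regex pattern needs a verbatim literal (or doubled backslashes), since "\$" in an ordinary C# string literal is an unrecognised escape sequence and does not compile. The class and method names are assumptions for illustration, and the CalculateYearly and String.Format calls are left out to keep the example self-contained.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencyCleanup
{
    // Regex version: the @ prefix makes this a verbatim literal, so the
    // backslash reaches the regex engine and "$" is matched literally.
    public static string WithRegex(string input)
    {
        return Regex.Replace(input, @"\$|,", "");
    }

    // String.Replace version: no pattern language involved, so there is
    // nothing to escape.
    public static string WithStringReplace(string input)
    {
        return input.Replace("$", "").Replace(",", "");
    }
}
```

Both methods strip dollar signs and commas, so an input such as "$1,234.50" comes back as "1234.50" either way.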
                          [color=blue][color=green]
                          > > But I suspect you're more used to regular expressions than many other
                          > > programmers - and making the code less readable for other programmers
                          > > for no benefit is what makes it unwarranted here, even in the simple
                          > > case where there's nothing to escape.[/color]
                          >
                          > First of all, I am not. I don't use it much at all, but I find it easy to
                          > figure out and straightforward (but you can make it really complex). I use
                          > it to validate phone numbers, credit card numbers, zip codes etc.[/color]

                          And in all of those cases, regular expressions are really useful.
                          [color=blue]
                          > Which are very well documented and when there are a myriad of ways a
                          > user can input these types of data, I prefer to use Regular
                          > expressions which are all over the place (easy to find) than try to
                          > come up with some complex set of loops and temporary variables which
                          > make it far easier to make a mistake and much more unreadable than the
                          > Regex equivalent.[/color]

                          Where exactly are the complex loops and temporary variables in this
                          specific case? After all, you have been arguing for using regular
                          expressions in *this specific case*, haven't you?
                          [color=blue][color=green]
                          > > Because it's more complicated! You can't deny that there's more to
                          > > consider due to the escaping. There's more to know, more to consider,
                          > > and it doesn't get the job done any more cleanly.[/color]
                          >
                          > Escaping seems to be your main complaint with it.[/color]

                          It's the main potential source of problems, yes. It's a potential
                          source of problems which simply doesn't exist when you use
                          String.IndexOf.
                          [color=blue]
                          > I have the same problem with C or VB when trying to remember when to use "\"
                          > vs "/" in paths or do I need to add "\" in front of my slash or quote.
                          > These are inherent problems with pretty much all of them.[/color]

                          You already need to know that when writing C# though - my use of
                          String.IndexOf doesn't add to the volume of knowledge required.
                          [color=blue][color=green]
                          > > As is using the power of regular expressions when there is an easier
                          > > way - using IndexOf, which is *precisely* there to find one string
                          > > within another.[/color]
                          >
                          > I am not discounting IndexOf, I am just saying that both work fine and are
                          > just as readable (in this case). In other cases, that may not be the case
                          > (with either C or Regex).[/color]

                          Just because they're as readable *to you* doesn't mean they're as
                          readable to everyone. How sure are you that the next engineer to read
                          this code will be familiar with regular expressions? How sure are you
                          that when you need to change it to look for a different string, you'll
                          check whether any of the characters need to be escaped? Why would you
                          even want to force that check on yourself?
                          [color=blue][color=green]
                          > > Do you really think it would take you that long to refamiliarise
                          > > yourself with it? I don't see why it's a good idea to make some poor
                          > > maintenance engineer who hasn't used regular expressions before try to
                          > > figure out that *actually* you were just trying to find strings within
                          > > each other just so you can keep your skill set current.[/color]
                          >
                          > So you would prefer to code to the lowest common denominator.[/color]

                          When there's no good reason not to, absolutely.
                          [color=blue]
                          > I am not going to code to the level of a junior programmer. I prefer that
                          > he learn to code to a higher level.[/color]

                          Learning to solve problems as simply as possible *is* learning to code
                          to a higher level.
                          [color=blue]
                          > I am not saying that you shouldn't still write decent, readable, commented
                          > code. But I am not going to limit myself because another programmer may not
                          > be able to read well written code. If that were the case, I would not be
                          > writing objects (abstract classes, interfaces, etc).[/color]

                          If it's not the simplest code for the situation, it's not well written
                          IMO. If it introduces risk for no reward (the risk of maintenance
                          failing to notice that they might need to escape something, versus no
                          reward) then it's not well written.
                          [color=blue][color=green]
                          > > I've never had a problem with reading the documentation when I've
                          > > needed to use regular expressions, without putting it in projects in
                          > > places where I *don't* need it.[/color]
                          >
                          > "Need" is a personal question. I don't think it applies here. You prefer
                          > IndexOf and I might prefer IsMatch.[/color]

                          I bet if I showed my code to a random sample of a hundred C# developers
                          and asked them to change it to search for "hello[there]", virtually all
                          of them would get it right. I also bet that if I showed your code to
                          them and asked them for the same change, some would fail to escape it
                          appropriately. Do you disagree?
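
To put that bet in concrete terms, here is a minimal sketch of the "hello[there]" change done three ways. The sample input and the class name are made up for illustration; the calls themselves are the standard String.IndexOf, Regex.IsMatch and Regex.Escape.

```csharp
using System;
using System.Text.RegularExpressions;

class EscapingDemo
{
    static void Main()
    {
        string input = "say hello[there] to everyone";   // made-up sample text
        string literal = "hello[there]";

        // IndexOf treats the search text as plain characters.
        bool viaIndexOf = input.IndexOf(literal, StringComparison.Ordinal) != -1;

        // Used directly as a pattern, [there] is a character class, so this
        // looks for "hello" followed by one of t, h, e or r.
        bool viaRawRegex = Regex.IsMatch(input, literal);

        // Regex.Escape turns the literal into a pattern that matches itself.
        bool viaEscapedRegex = Regex.IsMatch(input, Regex.Escape(literal));

        Console.WriteLine(viaIndexOf);       // True
        Console.WriteLine(viaRawRegex);      // False
        Console.WriteLine(viaEscapedRegex);  // True
    }
}
```

The unescaped version compiles and runs without complaint, which is exactly why a maintainer can miss it.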
                          [color=blue][color=green]
                          > > Because it makes things more complicated for no benefit. The reflection
                          > > example was a good one - that allows you to get a property value, so do
                          > > you think it's a good idea to write:
                          > >
                          > > string x = (string) something.GetType()
                          > >                              .GetProperty("Name")
                          > >                              .GetValue(something, null);
                          > > or
                          > >
                          > > string x = something.Name;
                          > >
                          > > ?
                          > >
                          > > Maybe I should use the former. After all, I wouldn't want to forget how
                          > > to use reflection, would I?[/color]
                          >
                          > Lost me on that one.[/color]

                          Both are ways of finding the value of a property. The first is harder
                          to maintain and harder to read, just like your use of regular
                          expressions in this instance. Now, which of the above snippets of code
                          would you use, and why?
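
For anyone who wants to try the two side by side, a minimal runnable sketch follows. The Person class and its Name value are assumptions made up for illustration; only GetType, GetProperty and GetValue come from the snippet quoted above.

```csharp
using System;

// Hypothetical type, standing in for whatever "something" is.
class Person
{
    public string Name { get; set; }
}

class ReflectionDemo
{
    static void Main()
    {
        Person something = new Person { Name = "Jon" };

        // Via reflection: the property is looked up by name at run time,
        // so a typo in "Name" is only discovered when this line executes.
        string viaReflection = (string) something.GetType()
                                                 .GetProperty("Name")
                                                 .GetValue(something, null);

        // Directly: checked by the compiler, and readable at a glance.
        string direct = something.Name;

        Console.WriteLine(viaReflection == direct);
    }
}
```

Both lines produce the same string; the difference is purely in how much the reader has to know to see that.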

                          --
                          Jon Skeet - <skeet@pobox.com>
                          http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                          If replying to the group, please do not mail me too


                          • tshad

                            #28
                            Re: Search for multiple things in a string

                            "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
                            news:MPG.1d9a634f148e765998c781@msnews.microsoft.com...[color=blue]
                            > tshad <tscheiderich@ftsolutions.com> wrote:[color=green][color=darkred]
                            >> > It is - the regular expression *language* is a different language to
                            >> > C#, in the same way that XPath is. That's why under "regular
                            >> > expressions" in MSDN, there's a "language elements" section.[/color]
                            >>
                            >> I think calling it a language is a stretch, although I know it is called a
                            >> language in places (it's all in what you define as a language).[/color]
                            >
                            > In plenty of places. It has a language with a defined syntax etc.[/color]

                            Yes, but so are dolphin sounds.

                            When I talk about a Programming Language - I am talking about a Procedural
                            Language (C, Fortran, VB, Pascal, etc.).[color=blue]
                            >[color=green]
                            >> It really is
                            >> a text/string processor, as is: IndexOf, Substring, Right, Replace etc
                            >> used
                            >> by various languages.
                            >>
                            >> You don't build pages with it. It isn't procedural.[/color]
                            >
                            > Neither of those are required for it to be a language.
                            >[color=green]
                            >> It is a tool used by the other languages.[/color]
                            >
                            > Sure - so is XPath, but that's a language too.
                            > (See http://www.w3.org/TR/xpath)
                            >[color=green]
                            >> You don't use VB.Net in C# or Vice versa but both use
                            >> Regular expressions (as they both use Substring, Replace etc).[/color]
                            >
                            > None of those state that regular expressions aren't a language.
                            >[color=green][color=darkred]
                            >> > But not as instantly clear, I believe. Can you really say that you find
                            >> > the regex version doesn't take you *any* longer to understand than the
                            >> > non-regex version?[/color]
                            >>
                            >> Depends on the C# code as well as the Regex code.[/color]
                            >
                            > The C# code in question would be:
                            >
                            > if (someVariable.IndexOf ("firstliteral") != -1 ||
                            >     someVariable.IndexOf ("secondliteral") != -1 ||
                            >     someVariable.IndexOf ("thirdliteral") != -1)
                            >[/color]

                            And the Regex version:

                            if (Regex.IsMatch(myString, @"something1|something2|something3"))
                            [color=blue]
                            > If I did it regularly, I'd write a short method which took a params
                            > string array.
                            >[color=green]
                            >> Again, are we talking about the best tool for the job or the most
                            >> readability.[/color]
                            >
                            > Unless there's another compelling argument in favour of one tool or
                            > another, readability is a very important part of choosing the best
                            > tool.[/color]

                            Again, why do I need a compelling reason? If I have the solution and it
                            happens to be Regex, I would use it, I wouldn't necessarily say to myself -
                            "Is there perhaps a more readable way to write this? I wonder if Jim will
                            be able to read this or not."
                            [color=blue]
                            >[color=green]
                            >> As was mentioned before, you set up loops and temporary
                            >> variables to do what you can do in a simple Regular Expression.
                            >>
                            >> Again, I am not pushing Regular Expressions here, just that they are just as
                            >> valid as C# (or VB.Net) string handlers.[/color]
                            >
                            > But you're effectively pushing them in the situation described by the
                            > OP when you say that the solution using regular expressions is as
                            > readable as the solution without.[/color]

                            No.

                            No pushing. No more than you're pushing not using it.
                            [color=blue]
                            >[color=green]
                            >> I do use them when convenient.
                            >>
                            >> For example, I was creating a simple text search engine and wanted to
                            >> modify
                            >> what the user put in and found it simpler to do the following than in VB
                            >> or
                            >> C:
                            >>
                            >> ' The following replaces all multiple blanks with " ". It then takes
                            >> ' out the anomalies, such as "and not and" and replaces them with "and"
                            >>
                            >> keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
                            >> keywords = Regex.Replace(keywords, "( )", " or ")
                            >> keywords = Regex.Replace(keywords," or or "," ")
                            >> keywords = Regex.Replace(keywords,"or and or","and")
                            >> keywords = Regex.Replace(keywords,"or near or","near")
                            >> keywords = Regex.Replace(keywords,"and not or","and not")
                            >>
                            >> Fairly straight forward and easy to follow.[/color]
                            >
                            > Reasonably, although apart from the first regex, I'd suggest doing the
                            > rest with straight calls to String.Replace. As an example of why I
                            > think that would be more readable, what exactly does the second line do?[/color]

                            Actually, nothing. It is grouping a " ", which isn't necessary. I think I
                            used to have something else there and took it out and didn't realize I
                            didn't need the ().
                            [color=blue]
                            > In some flavours of regular expressions, brackets form capturing
                            > groups. Do they in .NET? I'd have to look it up. If it's really just
                            > trying to replace the string "( )" with " or ", a call to
                            > String.Replace would mean I didn't need to look anything up.[/color]

                            Obviously, you didn't need to look this one up either - as you were correct.
                            It is just grouping a blank.[color=blue]
                            >[color=green][color=darkred]
                            >> >> Also, you have the same problem when dealing with web pages or getting
                            >> >> a
                            >> >> file from the disk. You still use the escape character there (and as
                            >> >> you
                            >> >> say, is a little confusing) - but you still do it.
                            >> >
                            >> > You have to know the C# escaping, but not the regular expression
                            >> > escaping.[/color]
                            >>
                            >> But you do NEED to know the C# escaping (readability not high - unless
                            >> you
                            >> understand it).[/color]
                            >
                            > Yes, but I *already* need to know that in order to write C#. Choosing
                            > to use String.IndexOf doesn't add to what I need to remember - choosing
                            > regular expressions does. In addition, there aren't many things which
                            > need escaping compared with those which need escaping in regular
                            > expressions. In addition to *that*, whenever you need to escape in
                            > regular expressions, you also need to escape in C# (or remember to use
                            > verbatim string literals) - yet another piece of headache.
                            >[color=green][color=darkred]
                            >> > To me, a lot of readability comes from decent naming and commenting,
                            >> > which fortunately are available in pretty much any language. I'd
                            >> > certainly agree that object orientation (and exceptions, automatic
                            >> > memory management etc) makes it a lot easier to write readable code
                            >> > though.[/color]
                            >>
                            >> But writing objects and the objects themselves are not easily readable.
                            >> But
                            >> you wouldn't advocate not writing them, would you?[/color]
                            >
                            > No, but I don't see how that's relevant.[/color]

                            Just that you don't want to use Regex as it is not easily readable. Neither are
                            objects.

                            But the fact that a junior programmer might not understand objects as you do
                            would not prevent you from writing them, would it?
                            [color=blue]
                            >[color=green][color=darkred]
                            >> >> But if you know both and as I (and you) mentioned regex is part of
                            >> >> .net
                            >> >> as is C# - so it is already in the mix.
                            >> >
                            >> > No, it's not. It's not already used in every single C# program, any
                            >> > more than SQL is.[/color]
                            >>
                            >> Nor are all the objects you use.
                            >>
                            >> But if you are using .Net, it is part of the mix.[/color]
                            >
                            > It's not necessarily part of the mix I have to use.[/color]

                            You don't have to use lots of things. That doesn't make them invalid.
                            Neither does the fact that you use Foreach vs For {}. They are there and are
                            part of the mix, as is Regex. I might agree with you more if Regex were some
                            component that you picked up and added. Or if Regex were some obscure
                            technique that few knew about. Regular expressions have been around for
                            quite a long time and are just another gun in your arsenal. If I thought
                            that MS were deprecating it, I would also think twice about using it. But
                            it is part of .Net that all the languages can make use of and I would never
                            tell a programmer, who may be really comfortable with it and uses it
                            responsibly (not obscure cryptic non-commented code), that he should be
                            using IndexOf instead.
                            [color=blue]
                            >I suspect *very*
                            > few programs don't do any string manipulation - knowing the string
                            > methods well is *far* more fundamental to .NET programming than knowing
                            > regular expressions.[/color]

                            I agree with part of that, but I think that regular expressions are just as
                            important to know. As we have been saying, it is here and many people use
                            it, so to not understand it is to limit yourself. You don't have to use it,
                            but you should at least understand the basics of how it works. What are you
                            going to do when someone uses a RegularExpressionValidator and you don't
                            understand what the expression is? The fact that it is not C# (neither is a
                            textbox, datagrid, etc.) doesn't mean you shouldn't understand them. Whether
                            you use them is up to you.

                            As you point out, you are not the only programmer and many programmers like
                            to use Regex and that doesn't make them any lesser programmers. What are
                            you going to do when you run into their code?

                            I see code all the time (much of the time it is mine) and wonder why the
                            programmer didn't do it another way. There are many ways to skin a cat.
                            Sometimes it is just style, sometimes it is all they know. But if they
                            follow whatever standards are set up (and in your case maybe you forbid
                            Regex) then as long as the code is well written and clean - I have no
                            problem with it.[color=blue]
                            >[color=green][color=darkred]
                            >> > In what way is it 6 of one or half a dozen of the other when one
                            >> > solution requires knowing more than the other? I would expect *any* C#
                            >> > programmer to know what String.IndexOf does. I wouldn't expect all C#
                            >> > programmers to know by heart which regex language elements require
                            >> > escaping - and if you don't know that off the top of your head, then
                            >> > changing the code to search for a different string involves an extra
                            >> > bit of brainpower.[/color]
                            >>
                            >> Why? Ever heard of references or cheat sheets? And what is wrong with a
                            >> little extra brainpower - if you don't use it, you lose it :)[/color]
                            >
                            > If you truly think that given two solutions which are otherwise equal,
                            > the solution which is easiest to write, read and maintain doesn't win
                            > hands down, we'll definitely never agree.
                            >[/color]

                            I agree there.

                            Which is easier to write is obviously a matter of perception. I found my
                            example as easy as yours to write and just as readable.
                            [color=blue]
                            > If you want to keep your hand in with respect to regular expressions,
                            > do it in a test project, or with a regular expressions workbench. Keep
                            > it out of code which needs to be read and maintained, probably by other
                            > people who don't want to waste time because you wanted to keep your
                            > skill set up to date.
                            >[/color]

                            Keep regular expressions out of my code?????

                            So now you are saying there is no use for it?
                            [color=blue][color=green]
                            >> I don't know all of the possible combinations of calls to every Object,
                            >> but
                            >> that doesn't preclude me from using them.[/color]
                            >
                            > Exactly - and you wouldn't go out of your way to use methods you don't
                            > need, just to get into the habit of using them, would you?[/color]

                            Sure.

                            If it is valid. As I said there are many ways to skin ..., depending on the
                            situation I may do it one way and the next time another way. Gives me many
                            options. I don't do it willy nilly, as you seem to suggest, as a test
                            bench.[color=blue]
                            >[color=green]
                            >> My position has always been, don't memorize. You will remember what you
                            >> use. But if you know how to get it (where to look), then you have
                            >> everything you need.[/color]
                            >
                            > Absolutely - so why are you so keen on making people either memorise or
                            > look up the characters which need escaping for regular expressions
                            > every time they read or modify your code?
                            >[/color]
                            I am not. I don't memorize. But I still use it.
                            [color=blue][color=green]
                            >> I happen to use .Net. Regex is part of .Net. I would be limiting myself
                            >> if
                            >> I didn't use Regex in places where it is appropriate.[/color]
                            >
                            > I seem to be having difficulty making myself clear on this point: I
                            > have never stated and will never state that you shouldn't use regular
                            > expressions where they're appropriate. But they are *not* appropriate
                            > in this case, as they are a more complex and less readable way of
                            > solving the problem.[/color]

                            No, you are very clear. If you are so concerned with others being able to
                            read your code and problems with escape characters - why would you EVER want
                            them to use them? You can't have it both ways.

                            If they would have a hard time with a nothing expression like "if
                            (Regex.IsMatch(myString, @"something1|something2|something3"))" - they are
                            never going to get some of the other standard Regex solutions I
                            mentioned before.

                            As you said, the two solutions are equal. Your solution is that you MUST go
                            with IndexOf. Mine is you can use either.
                            [color=blue]
                            >
                            > Show me a problem where the regex way of solving it is simpler than
                            > using simple string operations (and there are plenty of problems like
                            > that) and I'll plump for the regex in a heartbeat.
                            >[color=green]
                            >> If I happen to know a good way in Regex to solve a problem, I am not
                            >> going to use *extra brainpower* to try to solve the problem in C#.[/color]
                            >
                            > In what way is using the method which is designed for *precisely* the
                            > task in hand (finding something in a string) using extra brainpower?[/color]

                            I wasn't referring to this particular issue when I said this.
                            [color=blue]
                            >If
                            > you're not familiar with String.IndexOf, you've got *much* bigger
                            > things to worry about than whether or not your regular expression
                            > skills are getting rusty.[/color]

                            I never said I was not familiar with IndexOf.

                            As a matter of fact, the original question was whether you could "do a
                            search for more than one string in another string".

                            ***************************************************************
                            Can you do a search for more than one string in another string?

                            Something like:

                            someString.IndexOf("something1","something2","something3",0)

                            or would you have to do something like:

                            if ((someString.IndexOf("something1",0) >= 0) ||
                                (someString.IndexOf("something2",0) >= 0) ||
                                (someString.IndexOf("something3",0) >= 0))
                            {
                                Do something
                            }
                            ***************************************************************
                            IndexOf doesn't do it. This was the original question. You have to do
                            multiple calls as is said in the original question. Nicholas was correct in
                            his assessment. One Regex call would work.[color=blue]
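
For reference, a minimal sketch of both routes to the original question. ContainsAny is a hypothetical helper name (it is not part of String); it follows the "params string array" idea quoted earlier in this post, and the sample text is made up.

```csharp
using System;
using System.Text.RegularExpressions;

static class StringSearch
{
    // Hypothetical helper: true if text contains at least one candidate.
    // It is still one IndexOf call per candidate, just wrapped up once.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            if (text.IndexOf(candidate, StringComparison.Ordinal) != -1)
            {
                return true;
            }
        }
        return false;
    }

    static void Main()
    {
        string someString = "here is something2 in the middle";   // made-up input

        // Route 1: several IndexOf calls behind a single method call.
        bool viaHelper = ContainsAny(someString, "something1", "something2", "something3");

        // Route 2: one Regex call using alternation. These literals contain
        // nothing that needs escaping; other literals might.
        bool viaRegex = Regex.IsMatch(someString, "something1|something2|something3");

        Console.WriteLine(viaHelper);
        Console.WriteLine(viaRegex);
    }
}
```

Either way the call site ends up as a single line, so the choice comes down to the points argued above.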
                            >[color=green][color=darkred]
                            >> > It was *less* readable though - and would have been *significantly*
                            >> > less readable if the string being searched for had included dots,
                            >> > brackets etc.[/color]
                            >>
                            >> But it didn't. But if it did, it is no different than having to deal
                            >> with
                            >> escapes in C (less readable)
                            >>
                            >> If you are talking about
                            >>
                            >> if ((someString.IndexOf("something1",0) >= 0) ||
                            >>     (someString.IndexOf("something2",0) >= 0) ||
                            >>     (someString.IndexOf("something3",0) >= 0))
                            >> {
                            >> Do something
                            >> }
                            >>
                            >> vs
                            >>
                            >> if (Regex.IsMatch(myString, @"something1|something2|something3"))
                            >>
                            >> If you know absolutely nothing about Regular expressions, I would agree
                            >> that
                            >> this is less readable.
                            >>
                            >> But I would also contend that IndexOf could be just as confusing. What
                            >> is
                            >> the first 0 for? What about the 2nd? It is readable because you know C.[/color]
                            >
                            > Well, for a start the 0s aren't necessary, and I wouldn't include them.[/color]

                            You're right.
                            [color=blue]
                            >[color=green]
                            >> I would maintain that even if you knew nothing about Regex, you would
                            >> assume that you are doing a Match (can't tell that from the word
                            >> "IndexOf")
                            >> and it probably has something to do with the words "something1",
                            >> "something2" and "something3". Now if you know C then I would assume you
                            >> would pick up that "|" is "or" (not so clear to a VB programmer). And
                            >> that
                            >> would be to someone not familiar with regular expressions doing a quick
                            >> perusal[/color]
                            >
                            > Okay - now suppose I need to change it from searching for "something1"
                            > to "something.1" or "something[1]". How long does it take to change in
                            > each version? How easy is it to read afterwards?[/color]

                            That wasn't the question.

                            What if you wanted to change "something1" to "something\". Same problem.
                            And if escapes were a problem (if it were me) I would have a little sheet
                            that showed them at my desk within easy reach.
                            >
                            >> So I am at a loss as to how this regular expression is more unreadable
                            >> than
                            >> the C# counterpart. That is not to say that you couldn't make it more
                            >> unreadable - but you could do the same with C# if you wanted to.
                            >
                            > You could start by making the C# more readable, as I've shown...

                            As you can with Regular Expressions.

                            >
                            > However, the regex is already less readable:
                            > 1) It's got "|" as a "magic character" in there.

                            | = or (same as C)

                            > 2) It's got all the strings concatenated, so it's harder to spot each
                            > of them separately.

                            You are kidding, right?

                            >
                            > And that's before you need to actually *maintain* the code.
                            >
                            > Furthermore, suppose you didn't just want to search for literals -
                            > suppose one of the strings you wanted to search for was contained in a
                            > variable. How sure are you that *no-one* on your team would use:
                            >
                            > x+"|something2|something3"
                            >
                            > as the regular expression?
                            >


                            You are now leaving the original question. I never said that Regular
                            Expressions was the better (or not better) in all cases.

                            >> > I suspect not all programmers would though. Don't forget that the
                            >> > person who writes the code is very often not the one to maintain it.
                            >> > Can you guarantee that *everyone* who touches the code will find
                            >> > regexes as readable as String.IndexOf?
                            >>
                            >> As was said, you can make readable and unreadable C or Regex code. Are
                            >> you
                            >> going to tell your programmers they "cannot" use Regex for the same
                            >> reason?
                            >
                            > I would tell programmers on my team not to use regular expressions
                            > where the alternative is simpler and more readable, yes.

                            Why use them at all? It isn't readable.

                            And if your programmers can't maintain the simple Regexs, they definitely
                            won't be able to handle the more complicated ones.
                            >
                            >> Are you going to leave out some objects that programmers may not be
                            >> familiar
                            >> with?
                            >
                            > Absolutely, where there are simpler and more familiar ways of solving
                            > the same problem.
                            >
                            >> > Which is why I've said repeatedly that I'm not trying to suggest that
                            >> > regexes are bad, or should never be used. I'm just saying that in this
                            >> > case it's using a sledgehammer to crack a nut.
                            >>
                            >> And I don't in this case, as I think I've shown. Less typing, easy to
                            >> read,
                            >> straightforward - in this case.
                            >
                            > You've shown nothing of the kind - whereas I think I've given plenty of
                            > examples of how using regular expressions makes the code less easily
                            > maintainable, even if you consider it equally readable to start with
                            > (which I don't).

                            Not in this specific case. I was never maintaining or pushing Regex for all
                            or any situations.

                            But I am not going to force my programmers to come to me to find out whether
                            or not Regex is the easiest way. That is up to the programmer. If
                            there is a problem with their code and I feel the programmer is way off base
                            in his coding, we would talk about it (that would be the case with his C#, VB or
                            Regex code).

                            >
                            >> >> SalaryMax.Text =
                            >> >> String.Format("{0:c}",CalculateYearly(Regex.Replace(WagesMax.Text,"\$|\,","")))
                            >> >>
                            >> >> At the time, I couldn't seem to find as simple a solution as this in
                            >> >> VB.Net
                            >> >> so I use this (not saying there isn't one).
                            >> >
                            >> > And of course there is:
                            >> > SalaryMax.Text =
                            >> > String.Format ("{0:c}",CalculateYearly(WagesMax.Text.Replace("$", "")
                            >> > .Replace(",", "")));
                            >> >
                            >> > I know which version I'd rather read...
                            >>
                            >> I can read either (although, I didn't know you could string multiple
                            >> "Replace"s together).
                            >
                            > Yes, I can read either too. The point is that in reading my version, I
                            > didn't need to wade through various special characters, understanding
                            > exactly what each was there for.

                            If you knew enough to know about Regex at all (which you said you would have
                            no problem with in some situations - so the programmers better be able to
                            read it), there should not be a problem with the 2 special characters, which
                            are the same as in C#. There is nothing obscure in this example - that I can
                            see.

                            >Of course, your version wasn't even valid
                            > C#, as it didn't escape the backslashes and you didn't specify a
                            > verbatim literal. I assume it was originally VB.NET. I wonder which
                            > version would be easier to convert to valid C#? Mine, perhaps?

                            Actually, it was VB.Net.

                            >
                            >> > But I suspect you're more used to regular expressions than many other
                            >> > programmers - and making the code less readable for other programmers
                            >> > for no benefit is what makes it unwarranted here, even in the simple
                            >> > case where there's nothing to escape.
                            >>
                            >> First of all, I am not. I don't use it much at all, but I find it easy
                            >> to
                            >> figure out and straightforward (but you can make it really complex). I
                            >> use
                            >> it to validate phone numbers, credit card numbers, zip codes etc.
                            >
                            > And in all of those cases, regular expressions are really useful.

                            But according to you, you shouldn't use them as some of the programmers may
                            not be able to maintain them. Definitely not if they would have a problem with
                            our example.

                            Can't have it both ways. If you allow Regular Expressions, you shouldn't
                            have a problem if a programmer used the Regex or IndexOf in our example.
                            Anyone maintaining the "USEFUL" ones would have zero problems with this one.

                            >
                            >> Which are very well documented and when there are a myriad of ways a
                            >> user can input these types of data, I prefer to use Regular
                            >> expressions which are all over the place (easy to find) than try to
                            >> come up with some complex set of loops and temporary variables which
                            >> make it far easier to make a mistake and much more unreadable than the
                            >> Regex equivalent.
                            >
                            > Where exactly are the complex loops and temporary variables in this
                            > specific case? After all, you have been arguing for using regular
                            > expressions in *this specific case*, haven't you?
                            >

                            I was obviously talking about Regular Expressions in general here as I was
                            referring to the standard ones you can get anywhere dealing with (Phone
                            numbers, credit card etc). There would be none in this case, obviously.
                            But there may be in more complicated cases.

                            >> > Because it's more complicated! You can't deny that there's more to
                            >> > consider due to the escaping. There's more to know, more to consider,
                            >> > and it doesn't get the job done any more cleanly.
                            >>
                            >> Escaping seems to be your main complaint with it.
                            >
                            > It's the main potential source of problems, yes. It's a potential
                            > source of problems which simply doesn't exist when you use
                            > String.IndexOf.
                            >
                            >> I have the same problem with C or VB when trying to remember when to use
                            >> "\"
                            >> vs "/" in paths or do I need to add "\" in front of my slash or quote.
                            >> These are inherent problems with pretty much all of them.
                            >
                            > You already need to know that when writing C# though - my use of
                            > String.IndexOf doesn't add to the volume of knowledge required.
                            >
                            It is still an issue. Just as the Regular expressions are. And again, if
                            you are going to allow Regex at all, you would still need to know about the
                            escapes.
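
For reference, the escaping concern debated above can be sketched roughly as follows. This is only an illustration, not code from either poster: the class and method names (EscapeDemo, ContainsLiteral, MatchesUnescaped, MatchesEscaped) and the sample input "somethingX1" are made up here, while Regex.IsMatch, Regex.Escape and String.IndexOf are the standard .NET calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this illustration.
static class EscapeDemo
{
    // Plain substring search: no character in the literal has special meaning.
    public static bool ContainsLiteral(string text, string literal)
    {
        return text.IndexOf(literal, StringComparison.Ordinal) != -1;
    }

    // Regex search with the literal used directly as the pattern.
    public static bool MatchesUnescaped(string text, string literal)
    {
        return Regex.IsMatch(text, literal);
    }

    // Regex search with the literal passed through Regex.Escape first.
    public static bool MatchesEscaped(string text, string literal)
    {
        return Regex.IsMatch(text, Regex.Escape(literal));
    }

    static void Main()
    {
        // "." matches any character, so the unescaped pattern is too permissive.
        Console.WriteLine(ContainsLiteral("somethingX1", "something.1"));   // False
        Console.WriteLine(MatchesUnescaped("somethingX1", "something.1"));  // True
        Console.WriteLine(MatchesEscaped("somethingX1", "something.1"));    // False

        // "[there]" is read as a character class, so the unescaped pattern
        // fails to find the very text it was copied from.
        Console.WriteLine(MatchesUnescaped("hello[there]", "hello[there]")); // False
        Console.WriteLine(MatchesEscaped("hello[there]", "hello[there]"));   // True
    }
}
```

The same Regex.Escape call would apply to a variable concatenated into a larger pattern, such as the x+"|something2|something3" case quoted earlier.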

                            >> > As is using the power of regular expressions when there is an easier
                            >> > way - using IndexOf, which is *precisely* there to find one string
                            >> > within another.
                            >>
                            >> I am not discounting IndexOf, I am just saying that both work fine and
                            >> are
                            >> just as readable (in this case). In other cases, that may not be the
                            >> case
                            >> (with either C or Regex).
                            >
                            > Just because they're as readable *to you* doesn't mean they're as
                            > readable to everyone. How sure are you that the next engineer to read
                            > this code will be familiar with regular expressions? How sure are you
                            > that when you need to change it to look for a different string, you'll
                            > check whether any of the characters need to be escaped? Why would you
                            > even want to force that check on yourself?

                            Again - then don't allow them at all.
                            >
                            >> > Do you really think it would take you that long to refamiliarise
                            >> > yourself with it? I don't see why it's a good idea to make some poor
                            >> > maintenance engineer who hasn't used regular expressions before try to
                            >> > figure out that *actually* you were just trying to find strings within
                            >> > each other just so you can keep your skill set current.
                            >>
                            >> So you would prefer to code to the lowest common denominator.
                            >
                            > When there's no good reason not to, absolutely.

                            I guess that is where we disagree.
                            >
                            >> I am not going to code to the level of a junior programmer. I prefer
                            >> that
                            >> he learn to code to a higher level.
                            >
                            > Learning to solve problems as simply as possible *is* learning to code
                            > to a higher level.

                            No argument there.
                            >
                            >> I am not saying that you shouldn't still write decent, readable,
                            >> commented
                            >> code. But I am not going to limit myself because another programmer may
                            >> not
                            >> be able to read well written code. If that were the case, I would not be
                            >> writing objects (abstract classes, interfaces, etc).
                            >
                            > If it's not the simplest code for the situation, it's not well written
                            > IMO. If it introduces risk for no reward (the risk of maintenance
                            > failing to notice that they might need to escape something, versus no
                            > reward) then it's not well written.
                            >
                            I see no risk in the example we are talking about. At least, no more than
                            in the IndexOf solution (in this situation).

                            >> > I've never had a problem with reading the documentation when I've
                            >> > needed to use regular expressions, without putting it in projects in
                            >> > places where I *don't* need it.
                            >>
                            >> "Need" is a personal question. I don't think it applies here. You
                            >> prefer
                            >> IndexOf and I might prefer IsMatch.
                            >
                            > I bet if I showed my code to a random sample of a hundred C# developers
                            > and asked them to change it to search for "hello[there]", virtually all
                            > of them would get it right. I also bet that if I showed your code to
                            > them and asked them for the same change, some would fail to escape it
                            > appropriately. Do you disagree?

                            No. But then the same developers would have a problem with the more
                            complicated expressions you claim are useful.
                            >
                            >> > Because it makes things more complicated for no benefit. The reflection
                            >> > example was a good one - that allows you to get a property value, so do
                            >> > you think it's a good idea to write:
                            >> >
                            >> > string x = (string) something.GetType()
                            >> > .GetProperty("Name")
                            >> > .GetValue(something, null);
                            >> > or
                            >> >
                            >> > string x = something.Name;
                            >> >
                            >> > ?
                            >> >
                            >> > Maybe I should use the former. After all, I wouldn't want to forget how
                            >> > to use reflection, would I?
                            >>
                            >> Lost me on that one.
                            >
                            > Both are ways of finding the value of a property. The first is harder
                            > to maintain and harder to read, just like your use of regular
                            > expressions in this instance. Now, which of the above snippets of code
                            > would you use, and why?

                            Since I am not sure why you would use the first, I would do the 2nd.

                            But in our case, I would still use either - as I see the Regex version as
                            easy as the IndexOf.

                            Tom



                            • Jon Skeet [C# MVP]

                              #29
                              Re: Search for multiple things in a string

                              tshad <tscheiderich@ftsolutions.com> wrote:
                              > > In plenty of places. It has a language with a defined syntax etc.
                              >
                              > Yes, but so are dolphin sounds.
                              >
                              > When I talk about a Programming Language - I am talking about a Procedural
                              > Language (C, Fortran, VB, Pascal, etc.).

                              So you wouldn't regard LISP as a programming language, just because
                              it's functional rather than procedural?

                              Of course, you didn't even specify "programming language" before.

                              Regular expressions form a language in computing, and that language
                              needs to be learned before being used, just as any other language does,
                              whether it's C#, HTML, XPath or VB.NET.

                              > > The C# code in question would be:
                              > >
                              > > if (someVariable.IndexOf ("firstliteral") != -1 ||
                              > > someVariable.IndexOf ("secondliteral") != -1 ||
                              > > someVariable.IndexOf ("thirdliteral") != -1)
                              > >
                              >
                              > And the Regex version:
                              >
                              > if (Regex.IsMatch(myString, @"something1|something2|something3"))

                              Right. Immediately the IndexOf version is more readable, by more clearly
                              separating the three separate strings which are being searched on.
                              (Oliver Sturm's version is more readable than that

                              > > Unless there's another compelling argument in favour of one tool or
                              > > another, readability is a very important part of choosing the best
                              > > tool.
                              >
                              > Again, why do I need a compelling reason. If I have the solution and it
                              > happens to be Regex, I would use it, I wouldn't necessarily say to myself -
                              > "Is there perhaps a more readable way to write this? I wonder if Jim will
                              > be able to read this or not."

                              Then I'm afraid that's your problem. It sounds like you're basically
                              admitting that you're not that interested in readability. Personally, I
                              like writing code which is elegant but easy to maintain. Having *a*
                              solution which happens to work isn't enough when there are obviously
                              others available which could well be simpler.

                              Far more time is spent maintaining code than writing it in the first
                              place. Taking the attitude you take above just isn't cost-effective in
                              the long run.

                              > > But you're effectively pushing them in the situation described by the
                              > > OP when you say that the solution using regular expressions is as
                              > > readable as the solution without.
                              >
                              > No.
                              >
                              > No pushing. No more than your pushing not using it.

                              But I'll readily admit to pushing the (IMO simpler) solution, for this
                              particular situation. So are you actually admitting that you *are*
                              pushing the use of regular expressions here?

                              > >> ' The following replaces all multiple blanks with " ". It then takes
                              > >> ' out the anomalies, such as "and not and" and replaces them with "and"
                              > >>
                              > >> keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
                              > >> keywords = Regex.Replace(keywords, "( )", " or ")
                              > >> keywords = Regex.Replace(keywords," or or "," ")
                              > >> keywords = Regex.Replace(keywords,"or and or","and")
                              > >> keywords = Regex.Replace(keywords,"or near or","near")
                              > >> keywords = Regex.Replace(keywords,"and not or","and not")
                              > >>
                              > >> Fairly straightforward and easy to follow.
                              > >
                              > > Reasonably, although apart from the first regex, I'd suggest doing the
                              > > rest with straight calls to String.Replace. As an example of why I
                              > > think that would be more readable, what exactly does the second line do?
                              >
                              > Actually, nothing. It is grouping a " ", which isn't necessary. I think I
                              > used to have something else there and took it out and didn't realize I
                              > didn't need the ().

                              So again, the code could be made more readable even by just modifying
                              the existing regex replacement, let alone by replacing the regular
                              expressions with simple String.Replace calls. Had they been
                              String.Replace calls, the meaning of the second line would have been
                              unambiguous - you'd have had to write it the simple way to start with.

                              Note that your first replacement will replace two tabs with a single
                              space, but leave one tab alone, by the way. It would be better to
                              replace "\s+" with the space, IMO.
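
The difference between the two quantifiers can be checked with a short sketch like the one below. It is illustrative only: the class and method names (WhitespaceDemo, CollapseRuns, CollapseAll) and the sample strings are assumptions made for this example, not part of the code being discussed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this illustration.
static class WhitespaceDemo
{
    // Replaces only runs of two or more whitespace characters.
    public static string CollapseRuns(string s)
    {
        return Regex.Replace(s, @"\s{2,}", " ");
    }

    // Replaces every run of one or more whitespace characters.
    public static string CollapseAll(string s)
    {
        return Regex.Replace(s, @"\s+", " ");
    }

    static void Main()
    {
        // Two tabs are collapsed to a single space by both patterns.
        Console.WriteLine(CollapseRuns("and\t\tnot") == "and not"); // True
        // A lone tab is left untouched by \s{2,} ...
        Console.WriteLine(CollapseRuns("and\tnot") == "and not");   // False
        // ... but is normalised to a space by \s+.
        Console.WriteLine(CollapseAll("and\tnot") == "and not");    // True
    }
}
```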

                              > > In some flavours of regular expressions, brackets form capturing
                              > > groups. Do they in .NET? I'd have to look it up. If it's really just
                              > > trying to replace the string "( )" with " or ", a call to
                              > > String.Replace would mean I didn't need to look anything up.
                              >
                              > Obviously, you didn't need to look this one up either - as you were correct.
                              > It is just grouping a blank.

                              I would have had to look it up if you hadn't been answering the question
                              though. Why make the code harder to understand in the first place? If
                              you want to replace a space with " or ", just use
                              keywords = keywords.Replace (" ", " or ");
                              Much more straightforward.
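
As a rough check of the point about the grouped blank, the sketch below compares the two calls. The names GroupDemo, WithRegex and WithStringReplace and the sample inputs are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this illustration.
static class GroupDemo
{
    // In .NET the parentheses form a capturing group, so this pattern
    // matches a single space, not the three characters "( )".
    public static string WithRegex(string s)
    {
        return Regex.Replace(s, "( )", " or ");
    }

    // The plain string call says exactly what is being replaced.
    public static string WithStringReplace(string s)
    {
        return s.Replace(" ", " or ");
    }

    static void Main()
    {
        Console.WriteLine(WithRegex("red green blue"));         // red or green or blue
        Console.WriteLine(WithStringReplace("red green blue")); // red or green or blue

        // On text that really contains "( )", the two readings diverge.
        Console.WriteLine(WithRegex("a( )b"));                  // a( or )b
        Console.WriteLine("a( )b".Replace("( )", " or "));      // a or b
    }
}
```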

                              > >> But writing objects and the objects themselves are not easily readable.
                              > >> But
                              > >> you would advocate not writing them, would you?
                              > >
                              > > No, but I don't see how that's relevant.
                              >
                              > Just that you don't want to Regex as it is not easily readable. Neither are
                              > Regex.

                              Eh?

                              > But the fact a junior programmer might not understand Objects as you do
                              > would not prevent you from writing them, would it?

                              When using C#, one has to use objects. I will almost always try to
                              implement the simplest solution to a problem, unless there is a
                              compelling reason to use a more complex solution. That way, anyone
                              reading the code has to learn relatively little "extra" stuff beyond
                              the language itself.

                              > >> But if you are using .Net, it is part of the mix.
                              > >
                              > > It's not necessarily part of the mix I have to use.
                              >
                              > You don't have to use lots of things. That doesn't make them invalid.
                              > Neither is the fact that you use Foreach vs For {}. They are there and are
                              > part of the mix as is Regex.

                              No, they really aren't. for and foreach are well-defined in the C#
                              language specification. If the program is in C# to start with, it is
                              reasonable to assume competency in C# on the part of the reader of the
                              code. It is *not* reasonable to assume competency in regular
                              expressions, and while that wouldn't prevent me from using regular
                              expressions where they provide value, they just *don't* here.

                              > I might agree with you more if Regex were some
                              > component that you picked up and added. Or if Regex were some obscure
                              > technique that few knew about. They have been around for quite a long time
                              > and are just another gun in your arsenal. If I thought that MS were
                              > deprecating it, I would also think twice about using it. But it is part of
                              > .Net that all the languages can make use of and I would never tell a
                              > programmer, who may be really comfortable with it and uses it responsibly
                              > (not obscure cryptic non-commented code), that he should be using IndexOf
                              > instead.

                              Clearly not, as you seem to be keen on using them instead of simple
                              string manipulations all over the place - if I saw anyone using regular
                              expressions rather than String.Replace in the way you've shown in other
                              code posts, that code would never get through code review.

                              > >I suspect *very*
                              > > few programs don't do any string manipulation - knowing the string
                              > > methods well is *far* more fundamental to .NET programming than knowing
                              > > regular expressions.
                              >
                              > I agree with part of that and think that regular expressions are just as
                              > important to know.

                              Why? I'm working on a fairly large project which hasn't needed to use
                              regular expressions and wouldn't have benefitted from them once. I
                              suspect many people could say the same thing. I suspect very few if any
                              of them could say the same thing about the basic string manipulation
                              methods - and yet you were surprised to see that one could call Replace
                              on the result of another Replace method call, which I'd consider a far
                              more "basic" level of understanding than knowledge of regular
                              expressions.
                              [color=blue]
                              > As we have been saying, it is here and many people use it, so to not
                              > understand it is to limit yourself.[/color]

                              It's one thing to understand the general power of regular expressions,
                              so you would know when they may be applicable - it's another thing to
                              use them when they serve no purpose beyond what can be more simply
                              achieved with the simple String methods.
                              [color=blue]
                              > You don't have to use it, but you should at least understand the
                              > basics of how it works. What are you going to do when someone uses a
                              > RegularExpressionValidator and you don't understand what the
                              > expression is?[/color]

                              At that point, if I didn't understand the regular expression, I'd look
                              it up in the documentation. Do you know every part of regular
                              expression syntax off by heart?
                              [color=blue]
                              > The fact that it is not C# (neither is a textbox, datagrid, etc),
                              > doesn't mean you shouldn't understand them. Whether you use them is up
                              > to you.
                              >
                              > As you point out, you are not the only programmer and many programmers like
                              > to use Regex and that doesn't make them any lesser programmers. What are
                              > you going to do when you run into their code?[/color]

                              If they're on my team, I'll tell them to refactor their code to only
                              use them when they're appropriate, frankly.
                              [color=blue]
                              > I see code all the time (much of the time it is mine) and wonder why the
                              > programmer didn't do it another way. There are many ways to skin a cat.
                              > Sometimes it is just style, sometimes it is all they know. But if they
                              > follow whatever standards are setup (and in your case maybe you forbid
                              > Regex) then as long as the code is well written and clean - I have no
                              > problem with it.[/color]

                              If code uses regular expressions when they serve no purpose, it is
                              *not* well written and clean though - it is less maintainable than it
                              might be.
                              [color=blue][color=green]
                              > > If you truly think that given two solutions which are otherwise equal,
                              > > the solution which is easiest to write, read and maintain doesn't win
                              > > hands down, we'll definitely never agree.[/color]
                              >
                              > I agree there.
                              >
                              > Which is easier to write is obviously your perception. I found my example,
                              > as easy as yours to write and just as readable.[/color]

                              And you believe that everyone else does? Again, bear in mind that
                              you're unlikely to be the only person ever to read your code.
                              [color=blue][color=green]
                              > > If you want to keep your hand in with respect to regular expressions,
                              > > do it in a test project, or with a regular expressions workbench. Keep
                              > > it out of code which needs to be read and maintained, probably by other
                              > > people who don't want to waste time because you wanted to keep your
                              > > skill set up to date.[/color]
                              >
                              > Keep regular expressions out of my code?????
                              >
                              > So now you are saying there is no use for it?[/color]

                              Not at all - I'm saying that you shouldn't put regular expressions in
                              your code just for the sake of keeping your hand in. Use them where
                              they're applicable, and only there.
                              [color=blue][color=green][color=darkred]
                              > >> I don't know all of the possible combinations of calls to every Object,
                              > >> but that doesn't preclude me from using them.[/color]
                              > >
                              > > Exactly - and you wouldn't go out of your way to use methods you don't
                              > > need, just to get into the habit of using them, would you?[/color]
                              >
                              > Sure.
                              >
                              > If it is valid. As I said there are many ways to skin ..., depending on the
                              > situation I may do it one way and the next time another way. Gives me many
                              > options. I don't do it willy nilly, as you seem to suggest, as a test
                              > bench.[/color]

                              But that's *exactly* what you've suggested you should do with regular
                              expressions - use them even when there's no real purpose in doing so,
                              just so that you remember what they look like.
                              [color=blue][color=green]
                              > > Absolutely - so why are you so keen on making people either memorise or
                              > > look up the characters which need escaping for regular expressions
                              > > every time they read or modify your code?[/color]
                              >
                              > I am not. I don't memorize. But I still use it.[/color]

                              Okay, so you don't memorise it, which means you *do* have to look up
                              which characters require escaping. I think you've just admitted that
                              your code is less maintainable than mine.
                              [color=blue][color=green]
                              > > I seem to be having difficulty making myself clear on this point: I
                              > > have never stated and will never state that you shouldn't use regular
                              > > expressions where they're appropriate. But they are *not* appropriate
                              > > in this case, as they are a more complex and less readable way of
                              > > solving the problem.[/color]
                              >
                              > No you are very clear. If you are so concerned with others being able to
                              > read your code and problems with escape characters - why would you EVER want
                              > them to use them. You can't have it both ways.[/color]

                              I would use them when the solution which uses regular expressions is
                              clearer than the solution which doesn't use them. It seems a pretty
                              simple policy to me.
                              [color=blue]
                              > If they would have a hard time with a nothing expression like "if
                              > (Regex.IsMatch(myString, @"something1|something2|something3"))" - they are
                              > never going to get some of the other standard Regex solutions I
                              > mentioned before.[/color]

                              Those maintaining the code could no doubt understand it after looking
                              at it for a little while, just like they could work out your other
                              regular expressions after looking at them and consulting the
                              documentation - but why are you trying to make their jobs harder? Why
                              are you not concerned that the code you're writing is costing your
                              company money by making it harder to maintain than it needs to be?
                              [color=blue]
                              > As you said, the two solutions are equal. Your solution is that you MUST go
                              > with IndexOf. Mine is you can use either.[/color]

                              Well, they're equal in terms of their semantics. They're definitely not
                              equal in terms of maintainability, and as that's important to me, I
                              don't see what's wrong with saying that I'm very strongly in favour of
                              avoiding the less readable/maintainable code.
                              [color=blue][color=green]
                              > > Show me a problem where the regex way of solving it is simpler than
                              > > using simple string operations (and there are plenty of problems like
                              > > that) and I'll plump for the regex in a heartbeat.
                              > >[color=darkred]
                              > >> If I happen to know a good way in Regex to solve a problem, I am not
                              > >> going use *extra brainpower* to try to solve the problem in C#.[/color]
                              > >
                              > > In what way is using the method which is designed for *precisely* the
                              > > task in hand (finding something in a string) using extra brainpower?[/color]
                              >
                              > I wasn't referring to this particular issue when I said this.[/color]

                              It would have been nice if you'd indicated that. Do you agree then that
                              it doesn't actually take any more brainpower to come up with
                              String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
                              brainpower when it comes to maintaining the IndexOf solution?
                              [color=blue][color=green]
                              > >If
                              > > you're not familiar with String.IndexOf, you've got *much* bigger
                              > > things to worry about than whether or not your regular expression
                              > > skills are getting rusty.[/color]
                              >
                              > I never said I was not familiar with IndexOf.
                              >
                              > As a matter of fact, the original question was whether you could "do a
                              > search for more than one string in another string".[/color]

                              And of course the answer is "yes, by calling IndexOf multiple times".
                              [color=blue]
                              > *************** *************** *************** *************** ****
                              > Can you do a search for more than one string in another string?
                              >
                              > Something like:
                              >
                              > someString.IndexOf("something1","something2","something3",0)
                              >
                              > or would you have to do something like:
                              >
                              > if ((someString.IndexOf("something1",0) >= 0) ||
                              > (someString.IndexOf("something2",0) >= 0) ||
                              > (someString.IndexOf("something3",0) >= 0))
                              > {
                              > // Do something
                              > }
                              > *************** *************** *************** *************** ***************
                              > IndexOf doesn't do it. This was the original question. You have to do
                              > multiple calls as is said in the original question. Nicholas was correct in
                              > his assessment. One Regex call would work.[/color]

                              Yes, as would a single call to a method which called IndexOf on the
                              string multiple times. I disagree with you - Nicholas wasn't correct in
                              his assessment, as he claimed that the "best bet" would be to use a
                              regular expression. Using regular expressions is just *not* the best
                              bet here - it requires more effort, as I've described repeatedly.
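A minimal sketch of the "single call to a method which called IndexOf on the string multiple times" idea follows. The class name StringSearch and the method name ContainsAny are hypothetical (nothing in the thread names them), and ordinal comparison is an assumption about the kind of matching wanted.

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper: returns true if 'text' contains at least one
    // of the candidate strings. Every candidate is treated as a plain
    // literal, so no character in it ever needs escaping.
    public static bool ContainsAny(string text, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // Ordinal comparison is an assumption; pick whichever
            // StringComparison suits the real requirement.
            if (text.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

With this in place the call site reads as a single call: StringSearch.ContainsAny(someString, "something1", "something2", "something3").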
                              [color=blue][color=green]
                              > > Okay - now suppose I need to change it from searching for "something1"
                              > > to "something.1" or "something[1]". How long does it take to change in
                              > > each version? How easy is it to read afterwards?[/color]
                              >
                              > That wasn't the question.[/color]

                              Are you suggesting that maintainability isn't something that should be
                              considered? Do you *really* want to look for "something1",
                              "something2" and "something3" or were they (as I suspect) just
                              examples, and the real values could easily have dots, brackets etc in?
                              [color=blue]
                              > What if you wanted to change "something1" to "something\". Same problem.[/color]

                              Well, it's half the problem with IndexOf that it is with regular
                              expressions. With regular expressions, you'd need to know that not only
                              does backslash need escaping in C#, it also needs escaping in regular
                              expressions.

                              IndexOf: "something\\" or @"something\"
                              Regex: "something\\\\" or @"something\\"

                              Once again, the IndexOf version is easier to understand - there's less
                              to mentally unescape to work out what's actually being asked for.
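To make the two levels of escaping concrete, here is a small sketch that searches for the literal text something\ both ways. The class and method names are hypothetical, and verbatim literals are used so that only the regex-level escaping remains visible.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashDemo
{
    // Plain search: in a verbatim literal the backslash needs no
    // escaping at all, so what you read is what is searched for.
    public static bool FindWithIndexOf(string text)
    {
        return text.IndexOf(@"something\", StringComparison.Ordinal) >= 0;
    }

    // Regex search: even in a verbatim literal the backslash must be
    // doubled, because the regex engine treats it as an escape character.
    public static bool FindWithRegex(string text)
    {
        return Regex.IsMatch(text, @"something\\");
    }
}
```

Both methods find the same text; the difference is only in how much the reader has to unescape mentally.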
                              [color=blue]
                              > And if escapes were a problem (if it were me) I would have a little sheet
                              > that showed them at my desk within easy reach.[/color]

                              Whereas by needing to know less (just the C# escapes) it's really easy
                              to memorise everything I need to know to solve this situation.
                              [color=blue][color=green][color=darkred]
                              > >> So I am at a loss as to how this regular expression is more unreadable
                              > >> than
                              > >> the C# counterpart. That is not to say that you couldn't make it more
                              > >> unreadable - but you could do the same with C# if you wanted to.[/color]
                              > >
                              > > You could start by making the C# more readable, as I've shown...[/color]
                              >
                              > As you can with Regular Expressions.[/color]

                              Well, Oliver Sturm has shown a more readable version, but you seem to
                              be keen on the "put them all in the same line" version.

                              Neither is as readable as the String.IndexOf version, however.
                              [color=blue][color=green]
                              > > However, the regex is already less readable:
                              > > 1) It's got "|" as a "magic character" in there.[/color]
                              >
                              > | = or (same as C)[/color]

                              Yup, but it's something that isn't used in string literals other than
                              for regular expressions. It's an extra thing to bear in mind
                              unnecessarily.
                              [color=blue][color=green]
                              > > 2) It's got all the strings concatenated, so it's harder to spot each
                              > > of them separately.[/color]
                              >
                              > You are kidding, right?[/color]

                              Absolutely not! It's significantly easier to spot the three separate
                              values when they're three separate strings than when they're all mashed
                              together.
                              [color=blue][color=green]
                              > > Furthermore, suppose you didn't just want to search for literals -
                              > > suppose one of the strings you wanted to search for was contained in a
                              > > variable. How sure are you that *no-one* on your team would use:
                              > >
                              > > x+"|something2|something3"
                              > >
                              > > as the regular expression?[/color]
                              >
                              > You are now leaving the original question. I never said that Regular
                              > Expressions was the better (or not better) in all cases.[/color]

                              While I'm leaving the exact original question, it's far from out of the
                              question that the original code would need to be changed at some point
                              to search for a variable. At that point, can you guarantee
                              that your team would get it right? They'd need to be on their guard
                              when using regular expressions - they wouldn't need to be on their
                              guard using IndexOf.
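The trap with a variable can be sketched as follows. The names are hypothetical; Regex.Escape is the framework method that makes a string match literally, and the example input "a.b" is just an illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VariableSearchDemo
{
    // Risky: any regex metacharacter inside 'x' (., [, |, * and so on)
    // silently changes what the pattern matches.
    public static bool NaiveRegex(string text, string x)
    {
        return Regex.IsMatch(text, x + "|something2|something3");
    }

    // Correct regex form: Regex.Escape makes 'x' match literally, but
    // the author has to remember to call it.
    public static bool EscapedRegex(string text, string x)
    {
        return Regex.IsMatch(text, Regex.Escape(x) + "|something2|something3");
    }

    // IndexOf form: there is nothing to remember, 'x' is always literal.
    public static bool WithIndexOf(string text, string x)
    {
        return text.IndexOf(x, StringComparison.Ordinal) >= 0
            || text.IndexOf("something2", StringComparison.Ordinal) >= 0
            || text.IndexOf("something3", StringComparison.Ordinal) >= 0;
    }
}
```

For x = "a.b" and the text "axb", NaiveRegex reports a match because the dot matches any character, while EscapedRegex and WithIndexOf do not.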
                              [color=blue][color=green]
                              > > I would tell programmers on my team not to use regular expressions
                              > > where the alternative is simpler and more readable, yes.[/color]
                              >
                              > Why use them at all? It isn't readable.[/color]

                              They aren't as readable *in this case*. In other, more complicated
                              situations, the version which only used IndexOf would be harder to read
                              than the regular expression version.

                              Using a regular expression is like getting a car compared with walking
                              somewhere - it's absolutely the right thing to do when you're going on
                              a long journey, but in this case you're advocating getting in a car
                              just to travel to the next room. It's simpler to walk.
                              [color=blue]
                              > And if your programmers can't maintain the simple Regexs, they definitely
                              > won't be able to handle the more complicated ones.[/color]

                              You seem to fail to grasp the "make it as simple as possible" concept.
                              It's not a case of maintenance engineers being idiots - it's about
                              presenting them with fewer possible risks. Why leave them a trap to
                              fall into when you can write simpler code which is easier to change
                              later on?
                              [color=blue][color=green]
                              > > You've shown nothing of the kind - whereas I think I've given plenty of
                              > > examples of how using regular expressions make the code less easily
                              > > maintainable, even if you consider it equally readable to start with
                              > > (which I don't).[/color]
                              >
                              > Not in this specific case. I was never maintaining or pushing Regex for all
                              > or any situations.[/color]

                              But you're pushing for regular expressions in *this* situation, or at
                              least saying it's just as good as using IndexOf. You've also shown in
                              your other code that you use regular expressions unnecessarily for
                              replacement, making a simple two-step replacement into a complicated
                              single-step replacement where the number of characters which *aren't*
                              just plain text is greater than the number of characters which are.
                              [color=blue]
                              > But I am not going to force my programmers to come to me to find out whether
                              > or not Regex is the easiest way or not. That is up to the programmer. If
                              > there is a problem with their code and I feel the programmer is way off base
                              > in his coding we would talk about it (that would be the case with his C#, VB or
                              > Regex code).[/color]

                              Using regular expressions in this case *is* a problem with their code,
                              IMO. It's just asking for trouble later on.
                              [color=blue][color=green]
                              > > Yes, I can read either too. The point is that in reading my version, I
                              > > didn't need to wade through various special characters, understanding
                              > > exactly what each was there for.[/color]
                              >
                              > If you knew enough to know about Regex at all (which you said you would have
                              > no problem with in some situations - so the programmers better be able to
                              > read it), there should not be a problem with the 2 special characters which
                              > is the same as C#. There is nothing obscure in this example - that I can
                              > see.[/color]

                              Of course there is - to work out what's going on, you've got to
                              mentally unescape the dollar and the comma, but *not* mentally unescape
                              the |. All that rather than just "replace dollar with space, replace
                              comma with space" in a simple form with no hidden meanings to anything.
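For comparison, the two styles of replacement look roughly like this. The regex pattern here is an assumed reconstruction for illustration only, since the exact pattern from the earlier post is not reproduced in this message; the class and method names are hypothetical too.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Two simple steps: replace dollar with space, replace comma with
    // space. Nothing in either call has a hidden meaning.
    public static string WithStringReplace(string input)
    {
        return input.Replace("$", " ").Replace(",", " ");
    }

    // One regex step. Assumed pattern: \$ and \, are escaped literals,
    // while the unescaped | between them means "or".
    public static string WithRegexReplace(string input)
    {
        return Regex.Replace(input, @"\$|\,", " ");
    }
}
```

Both produce the same output for the same input; the regex form simply asks the reader to separate the escaped characters from the one character that is deliberately not escaped.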
                              [color=blue][color=green]
                              > >Of course, your version wasn't even valid
                              > > C#, as it didn't escape the backslashes and you didn't specify a
                              > > verbatim literal. I assume it was originally VB.NET. I wonder which
                              > > version would be easier to convert to valid C#? Mine, perhaps?[/color]
                              >
                              > Actually, it was VB.Net.[/color]

                              Right. So in the C#, you'd either have to have more escapes, or make
                              them verbatim literals. More stuff to get right. Note how no escaping
                              at all is required in my version.
                              [color=blue][color=green]
                              > > And in all of those cases, regular expressions are really useful.[/color]
                              >
                              > But according to you, you shouldn't use them as some of the programmers may
                              > not be able to maintain it.[/color]

                              <sigh> If you actually believe that, you haven't been reading what I've
                              been writing.
                              [color=blue]
                              > Definitely if they would have a problem with our example.
                              >
                              > Can't have it both ways. If you allow Regular Expressions, you shouldn't
                              > have a problem if a programmer used the Regex or IndexOf in our example.
                              > Anyone maintaining the "USEFUL" ones would have zero problems with this one.[/color]

                              How very black and white of you. Do you really have no concept of
                              someone being able to understand something, but having a harder time
                              understanding it one way than the other?
                              [color=blue][color=green][color=darkred]
                              > >> Which are very well documented and when there are a myriad of ways a
                              > >> user can input these types of data, I prefer to use Regular
                              > >> expressions which are all over the place (easy to find) than try to
                              > >> come up with some complex set of loops and temporary variables which
                              > >> make it far easier to make a mistake and much more unreadable than the
                              > >> Regex equivalent.[/color]
                              > >
                              > > Where exactly are the complex loops and temporary variables in this
                              > > specific case? After all, you have been arguing for using regular
                              > > expressions in *this specific case*, haven't you?[/color]
                              >
                              > I was obviously talking about Regular Expressions in general here as I was
                              > referring to the standard ones you can get anywhere dealing with (Phone
                              > numbers, credit card etc). There would be none in this case, obviously.
                              > But there may be in more complicated cases.[/color]

                              Yes - the complicated cases where I've already said that regular
                              expressions are useful!
                              [color=blue][color=green]
                              > > You already need to know that when writing C# though - my use of
                              > > String.IndexOf doesn't add to the volume of knowledge required.[/color]
                              >
                              > It is still an issue.[/color]

                              Yes, it's still going to be harder to search for "some\thing" than
                              "something". However, it's *not* going to be harder to search for
                              "some.thing", or "(something)", or "[something]", or "some,thing", or
                              "some*thing" or "some+thing" etc. Furthermore, there's still going to
                              be less to remember when you *are* faced with searching for
                              "some\thing" than there would be using regular expressions.
                              [color=blue]
                              > Just as the Regular expressions are. And again, if
                              > you are going to allow Regex at all, you would still need to know about the
                              > escapes.[/color]

                              You'd need to know about the escapes where regular expressions are
                              used. The fewer places they're used, the fewer times someone will need
                              to look them up in the documentation.
                              [color=blue][color=green]
                              > > Just because they're as readable *to you* doesn't mean they're as
                              > > readable to everyone. How sure are you that the next engineer to read
                              > > this code will be familiar with regular expressions? How sure are you
                              > > that when you need to change it to look for a different string, you'll
                              > > check whether any of the characters need to be escaped? Why would you
                              > > even want to force that check on yourself?[/color]
                              >
                              > Again - then don't allow them at all.[/color]

                              No, just allow them where they make sense. Note that if you only use
                              them where they're going to be doing something fairly involved, it's
                              much less likely that an engineer will forget that he's actually
                              dealing with a regular expression than with a simple string.
                              [color=blue][color=green]
                              > > When there's no good reason not to, absolutely.[/color]
                              >
                              > I guess that is where we disagree.[/color]

                              It certainly sounds like it.
                              [color=blue][color=green][color=darkred]
                              > >> I am not going to code to the level of a junior programmer. I prefer
                              > >> that
                              > >> he learn to code to a higher level.[/color]
                              > >
                              > > Learning to solve problems as simply as possible *is* learning to code
                              > > to a higher level.[/color]
                              >
                              > No argument there.[/color]

                              But regular expressions are by their very nature more complicated than
                              a simple String.IndexOf call. If they weren't they wouldn't be as
                              powerful as they are.
                              [color=blue][color=green]
                              > > If it's not the simplest code for the situation, it's not well written
                              > > IMO. If it introduces risk for no reward (the risk of maintenance
                              > > failing to notice that they might need to escape something, versus no
                              > > reward) then it's not well written.[/color]
                              >
                              > I see no risk in the example we are talking about. At least, no more than
                              > in the IndexOf solution (in this situation).[/color]

                              You don't think there's any risk that someone will forget one of the
                              regular expression characters which needs escaping? There is no string
                              you could need to search for which needs *less* escaping in regular
                              expressions than with String.IndexOf, but there are *lots* of strings
                              which need more escaping - thus there's more overall risk.
                              [color=blue][color=green]
                              > > I bet if I showed my code to a random sample of a hundred C# developers
                              > > and asked them to change it to search for "hello[there]", virtually all
                              > > of them would get it right. I also bet that if I showed your code to
                              > > them and asked them for the same change, some would fail to escape it
                              > > appropriately. Do you disagree?[/color]
                              >
                              > No. But then the same developers would have a problem with the more
                              > complicated expressions you claim are useful.[/color]

                              Actually, the fact that they were presented with a complicated
                              expression would immediately make them wary, I suspect. Problems tend
                              to creep in when something *looks* simpler than it actually is - as is
                              the case here.
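
To make the "hello[there]" point concrete, here is a minimal sketch. The input string and the class name are made up for illustration; the only framework calls used are String.IndexOf, Regex.IsMatch and Regex.Escape. The StringComparison.Ordinal argument is an addition that does not appear in the thread's snippets.

```csharp
using System;
using System.Text.RegularExpressions;

class EscapingPitfall
{
    static void Main()
    {
        // Hypothetical input containing the literal we want to find.
        string input = "say hello[there] to everyone";
        string literal = "hello[there]";

        // Plain substring search: the literal is used exactly as written.
        bool viaIndexOf = input.IndexOf(literal, StringComparison.Ordinal) != -1;

        // Unescaped pattern: [there] is a character class, so this looks for
        // "hello" followed by ONE of the characters t, h, e or r.
        bool viaRawRegex = Regex.IsMatch(input, literal);

        // Escaped pattern: Regex.Escape makes the pattern match the literal text.
        bool viaEscapedRegex = Regex.IsMatch(input, Regex.Escape(literal));

        Console.WriteLine(viaIndexOf);       // True
        Console.WriteLine(viaRawRegex);      // False
        Console.WriteLine(viaEscapedRegex);  // True
    }
}
```

The unescaped version compiles and runs without complaint, which is why a missed escape can go unnoticed until the search quietly fails.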
                              [color=blue][color=green]
                              > > Both are ways of finding the value of a property. The first is harder
                              > > to maintain and harder to read, just like your use of regular
                              > > expressions in this instance. Now, which of the above snippets of code
                              > > would you use, and why?[/color]
                              >
                              > Since I am not sure why you would use the first, I would do the 2nd.[/color]

                              You'd use the first to keep up your knowledge of reflection, of course.
                              After all, if you don't use it, you lose it, right? That's your
                              argument for using regular expressions where they're completely
                              unnecessary and provide no benefit, after all.
                              [color=blue]
                              > But in our case, I would still use either - as I see the Regex version as
                              > easy as the IndexOf.[/color]

                              I think we'll have to agree to disagree. You seem to be unable to grasp
                              the idea that there are more potential pitfalls and more knowledge
                              required for the regular expression version than for the IndexOf
                              version.

                              --
                              Jon Skeet - <skeet@pobox.com>
                              http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                              If replying to the group, please do not mail me too


                              • tshad

                                #30
                                Re: Search for multiple things in a string

                                I'm back.

                                Was a little busy and didn't have time to respond.

                                "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
                                news:MPG.1d9ab49a4d273e4298c78c@msnews.microsoft.com...[color=blue]
                                > tshad <tscheiderich@ftsolutions.com> wrote:[color=green][color=darkred]
                                >> > In plenty of places. It has a language with a defined syntax etc.[/color]
                                >>
                                >> Yes, but so are dolphin sounds.
                                >>
                                >> When I talk about a Programming Language - I am talking about a
                                >> Procedural
                                >> Language (C, Fortran, VB, Pascal, etc.).[/color]
                                >
                                > So you wouldn't regard LISP as a programming language, just because
                                > it's functional rather than procedural?
                                >[/color]
                                I don't know much about LISP, but Mathematics is also a language, but not
                                the same way as English and German are.
                                [color=blue]
                                > Of course, you didn't even specify "programming language" before.[/color]

                                True.

                                But I did specify that it depends on how you define it.
                                [color=blue]
                                >
                                > Regular expressions form a language in computing, and that language
                                > needs to be learned before being used, just as any other language does,
                                > whether it's C#, HTML, XPath or VB.NET.
                                >[/color]

                                OK
                                [color=blue][color=green][color=darkred]
                                >> > The C# code in question would be:
                                >> >
                                >> > if (someVariable.IndexOf ("firstliteral") != -1 ||
                                >> > someVariable.IndexOf ("secondliteral") != -1 ||
                                >> > someVariable.IndexOf ("thirdliteral") != -1)
                                >> >[/color]
                                >>
                                >> And the Regex version:
                                >>
                                >> if (Regex.IsMatch(myString, @"something1|something2|something3"))[/color]
                                >
                                > Right. Immediately the IndexOf value is more readable, by more clearly
                                > separating the three separate strings which are being searched on.
                                > (Oliver Sturm's version is more readable than that[/color]


                                I assume you mean "if (Regex.IsMatch(myString, @"something[123]"))".

                                But actually they are both Oliver's.

                                I don't agree there. I think the Regex is just as readable, as long as you
                                have a bit of Regular Expression understanding, obviously. I also think
                                that if you understood C and didn't understand Regex, you would still get what
                                it is saying (IsMatch is pretty much a giveaway), much more so than if you
                                didn't understand C and saw the IndexOf, which doesn't really tell you what
                                it is doing. IsMatch is a much more understandable term than IndexOf.
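
For reference, a minimal sketch of the two versions side by side, using the "something1/2/3" literals from the quoted Regex line. The class name, method names and sample strings are hypothetical, and StringComparison.Ordinal is an addition that is not in the snippets quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

class TwoWaysToSearch
{
    // Three plain substring searches, one per literal.
    static bool ContainsAnyWithIndexOf(string text)
    {
        return text.IndexOf("something1", StringComparison.Ordinal) != -1 ||
               text.IndexOf("something2", StringComparison.Ordinal) != -1 ||
               text.IndexOf("something3", StringComparison.Ordinal) != -1;
    }

    // One alternation pattern. This is equivalent here only because none of
    // the literals contains a regex metacharacter.
    static bool ContainsAnyWithRegex(string text)
    {
        return Regex.IsMatch(text, @"something1|something2|something3");
    }

    static void Main()
    {
        string[] samples = { "has something2 inside", "has nothing relevant", "something4" };
        foreach (string sample in samples)
        {
            // Both columns agree for every sample: True for the first, False for the others.
            Console.WriteLine("{0}: IndexOf={1}, Regex={2}",
                sample, ContainsAnyWithIndexOf(sample), ContainsAnyWithRegex(sample));
        }
    }
}
```

For these literals the two give the same answers; they would only diverge once a literal contains a character such as "." or "[".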
                                [color=blue]
                                >[color=green][color=darkred]
                                >> > Unless there's another compelling argument in favour of one tool or
                                >> > another, readability is a very important part of choosing the best
                                >> > tool.[/color]
                                >>
                                >> Again, why do I need a compelling reason. If I have the solution and it
                                >> happens to be Regex, I would use it, I wouldn't necessarily say to
                                >> myself -
                                >> "Is there perhaps a more readable way to write this? I wonder if Jim
                                >> will
                                >> be able to read this or not."[/color]
                                >
                                > Then I'm afraid that's your problem. It sounds like you're basically
                                > admitting that you're not that interested in readability. Personally, I
                                > like writing code which is elegant but easy to maintain. Having *a*
                                > solution which happens to work isn't enough when there are obviously
                                > others available which could well be simpler.
                                >[/color]
                                I never said that.

                                I never said readability is not an issue, but I am not going to write "Cat
                                in the Hat" instead of a novel so that the programmers with the simplest of
                                experience can read it. But I am not going to write cryptic code either so
                                they can't read it.

                                I assume there are company standards to program by and I would follow them.
                                [color=blue]
                                >Having *a*
                                > solution which happens to work isn't enough when there are obviously
                                > others available which could well be simpler.[/color]

                                I am not writing simple code, I am writing code to handle a problem. I
                                prefer to write good code not simple code. Sometimes they are synonymous,
                                sometimes they aren't.

                                But in our case, I still see them as equally readable.
                                [color=blue]
                                > Far more time is spent maintaining code than writing it in the first
                                > place. Taking the attitude you take above just isn't cost-effective in
                                > the long run.[/color]

                                Don't agree there.[color=blue]
                                >[color=green][color=darkred]
                                >> > But you're effectively pushing them in the situation described by the
                                >> > OP when you say that the solution using regular expressions is as
                                >> > readable as the solution without.[/color]
                                >>
                                >> No.
                                >>
                                >> No pushing. No more than your pushing not using it.[/color]
                                >
                                > But I'll readily admit to pushing the (IMO simpler) solution, for this
                                > particular situation. So are you actually admitting that you *are*
                                > pushing the use of regular expressions here?
                                >[/color]
                                In your opinion (as you say).

                                And you obviously are not listening. I am not pushing either side. I have
                                been saying over and over that in this situation, they are the same (IMO).
                                I am not pushing Regex nor am I ruling them out. You, however, can't make up
                                your mind. One minute you say that something as simple as the example we
                                are using is too complex for a programmer, and then you proceed to say that
                                you would use Regex in other situations (which would have to be more
                                complicated). That makes no sense.
                                [color=blue][color=green][color=darkred]
                                >> >> ' The following replaces all multiple blanks with " ". It then takes
                                >> >> ' out the anomalies, such as "and not and" and replaces them with
                                >> >> "and"
                                >> >>
                                >> >> keywords = trim(Regex.Replace(keywords, "\s{2,}", " "))
                                >> >> keywords = Regex.Replace(keywords, "( )", " or ")
                                >> >> keywords = Regex.Replace(keywords," or or "," ")
                                >> >> keywords = Regex.Replace(keywords,"or and or","and")
                                >> >> keywords = Regex.Replace(keywords,"or near or","near")
                                >> >> keywords = Regex.Replace(keywords,"and not or","and not")
                                >> >>
                                >> >> Fairly straight forward and easy to follow.
                                >> >
                                >> > Reasonably, although apart from the first regex, I'd suggest doing the
                                >> > rest with straight calls to String.Replace. As an example of why I
                                >> > think that would be more readable, what exactly does the second line do?[/color]
                                >>
                                >> Actually, nothing. It is grouping a " ", which isn't necessary. I think
                                >> I
                                >> used to have something else there and took it out and didn't realize I
                                >> didn't need the ().[/color]
                                >
                                > So again, the code could be made more readable even by just modifying
                                > the existing regex replacement, let alone by replacing the regular
                                > expressions with simple String.Replace calls. Had they been
                                > String.Replace calls, the meaning of the second line would have been
                                > unambiguous - you'd have had to write it the simple way to start with.
                                >[/color]
                                I am not saying there may not be other ways to write the code. As I said, I
                                often rewrite my own code later as I see a way I like better that I may not
                                have thought of at the time I wrote it. Many times it isn't better code,
                                just different.
                                [color=blue]
                                > Note that your first replacement will replace two tabs with a single
                                > space, but leave one tab alone, by the way. It would be better to
                                > replace "\s+" with the space, IMO.[/color]

                                Probably true. I am not a Regex expert. That was what I came up with at
                                the time.[color=blue]
                                >[color=green][color=darkred]
                                >> > In some flavours of regular expressions, brackets form capturing
                                >> > groups. Do they in .NET? I'd have to look it up. If it's really just
                                >> > trying to replace the string "( )" with " or ", a call to
                                >> > String.Replace would mean I didn't need to look anything up.[/color]
                                >>
                                >> Obviously, you didn't need to look this one up either - as you were
                                >> correct.
                                >> It is just grouping a blank.[/color]
                                >
                                > I would have had to look it up if you hadn't been answering the question
                                > though. Why make the code harder to understand in the first place? If
                                > you want to replace a space with " or ", just use
                                > keywords = keywords.Replace (" ", " or ");
                                > Much more straightforward.
                                >[/color]
                                Even in C, which I have used for years, I have to look up parameters to make
                                sure I have the right parameters and have them in the right order.

                                As I said, the parens were probably a mistake; I may have made some changes
                                to the line and left the parens in. I agree yours is the correct one.
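
The two points above (the "\s{2,}" versus "\s+" behaviour and the redundant parens) can be checked with a short sketch. It is written in C# rather than the VB.NET of the quoted snippet, and the sample keyword string and class name are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class KeywordCleanup
{
    static void Main()
    {
        // Hypothetical input: a double tab, a single tab and a double space.
        string keywords = "cat\t\tdog\tbird  fish";

        // \s{2,} only collapses runs of two or more whitespace characters,
        // so the single tab between "dog" and "bird" survives.
        string twoOrMore = Regex.Replace(keywords, @"\s{2,}", " ");
        Console.WriteLine(twoOrMore == "cat dog\tbird fish");   // True

        // \s+ also turns a lone tab into a space.
        string oneOrMore = Regex.Replace(keywords, @"\s+", " ");
        Console.WriteLine(oneOrMore == "cat dog bird fish");    // True

        // "( )" is a capturing group around one space, so the regex call and
        // the plain String.Replace call produce the same result.
        string viaRegex = Regex.Replace(oneOrMore, "( )", " or ");
        string viaString = oneOrMore.Replace(" ", " or ");
        Console.WriteLine(viaRegex == viaString);               // True
        Console.WriteLine(viaString);                           // cat or dog or bird or fish
    }
}
```

Only the whitespace-collapsing step needs a regular expression here; the literal replacements behave identically with String.Replace.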
                                [color=blue][color=green][color=darkred]
                                >> >> But writing objects and the objects themselves are not easily
                                >> >> readable.
                                >> >> But
                                >> >> you would advocate not writing them, would you?
                                >> >
                                >> > No, but I don't see how that's relevant.[/color]
                                >>
                                >> Just that you don't want to Regex as it is not easily readable. Neither
                                >> are
                                >> Regex.[/color]
                                >
                                > Eh?
                                >[/color]
                                Must have had a little brain fade there. Not sure what I was saying.
                                [color=blue][color=green]
                                >> But the fact a junior programmer might not understand Objects as you do
                                >> would not prevent you from writing them, would you?[/color]
                                >
                                > When using C#, one has to use objects. I will almost always try to
                                > implement the simplest solution to a problem, unless there is a
                                > compelling reason to use a more complex solution. That way, anyone
                                > reading the code has to learn relatively little "extra" stuff beyond
                                > the language itself.[/color]

                                That isn't the point.

                                We are talking readability here. So don't write any objects. You can use
                                the ones you need to, but if you write objects and someone has to maintain
                                it, it could be a problem if he doesn't understand objects.

                                You can write the same code in straight C to do what objects do. We got
                                along fine before there were objects. So I think, based on your statements,
                                you should write the easier code that some very junior programmer might have
                                to read.
                                [color=blue]
                                >[color=green][color=darkred]
                                >> >> But if you are using .Net, it is part of the mix.
                                >> >
                                >> > It's not necessarily part of the mix I have to use.[/color]
                                >>
                                >> You don't have to use lots of things. That doesn't make them invalid.
                                >> Neither is the fact that you use Foreach vs For {}. They are there and
                                >> are
                                >> part of the mix as is Regex.[/color]
                                >
                                > No, they really aren't. for and foreach are well-defined in the C#
                                > language specification. If the program is in C# to start with, it is
                                > reasonable to assume competency in C# on the part of the reader of the
                                > code. It is *not* reasonable to assume competency in regular
                                > expressions, and while that wouldn't prevent me from using regular
                                > expressions where they provide value, they just *don't* here.[/color]

                                But I am not writing in C# only. I am writing in .Net.[color=blue]
                                >[color=green]
                                >> I might agree with you more if Regex were some
                                >> component that you picked up and added. Or if Regex were some obscure
                                >> technique that few knew about. They have been around for quite a long
                                >> time
                                >> and is just another gun in your arsenal. If I thought that MS were
                                >> deprecating it, I would also think twice about using it. But it is part
                                >> of
                                >> .Net that all the languages can make use of and I would never tell a
                                >> programmer, who may be really comfortable with it and uses it responsibly
                                >> (not obscure cryptic non-commented code), that he should be using IndexOf
                                >> instead.[/color]
                                >
                                > Clearly not, as you seem to be keen on using them instead of simple
                                > string manipulations all over the place - if I saw anyone using regular
                                > expressions rather than String.Replace in the way you've shown in other
                                > code posts, that code would never get through code review.
                                >[/color]
                                Obviously, you micromanage more than I do.

                                If you would have a problem with our examples, I don't think I would like to
                                work in your team.

                                In my area, if your code is reasonable and well written and it follows our
                                standards, it's fine.
                                [color=blue][color=green][color=darkred]
                                >> >I suspect *very*
                                >> > few programs don't do any string manipulation - knowing the string
                                >> > methods well is *far* more fundamental to .NET programming than knowing
                                >> > regular expressions.[/color]
                                >>
                                >> I agree with part of that and think that regular expressions are just as
                                >> important to know.[/color]
                                >
                                > Why?[/color]

                                Because they are perfectly valid and as you said before there are some that
                                are useful (therefore, you should know them as someone might use them and
                                you may have to maintain it).
                                [color=blue]
                                >I'm working on a fairly large project which hasn't needed to use
                                > regular expressions and wouldn't have benefitted from them once.[/color]

                                That's your style and position, but may not be someone else's.
                                [color=blue]
                                >I suspect many people could say the same thing. I suspect very few if any
                                > of them could say the same thing about the basic string manipulation
                                > methods - and yet you were surprised to see that one could call Replace
                                > on the result of another Replace method call, which I'd consider a far
                                > more "basic" level of understanding than knowledge of regular
                                > expressions.
                                >[color=green]
                                >> As we have been saying, it is here and many people use it, so to not
                                >> understand it is to limit yourself.[/color]
                                >
                                > It's one thing to understand the general power of regular expressions,
                                > so you would know when they may be applicable - it's another thing to
                                > use them when they serve no purpose beyond what can be more simply
                                > achieved with the simple String methods.
                                >[color=green]
                                >> You don't have to use it, but you should at least understand the
                                >> basics of how it works. What are you going to do when someone uses a
                                >> RegularExpressionValidator and you don't understand what the
                                >> expression is?[/color]
                                >
                                > At that point, if I didn't understand the regular expression, I'd look
                                > it up in the documentation. Do you know every part of regular
                                > expression syntax off by heart?[/color]

                                According to your position, you should ban them altogether for ANY use,
                                since you can do anything in C# you can do in Regex.[color=blue]
                                >[color=green]
                                >> The fact that it is not C# (neither is a textbox, datagrid, etc),
                                >> doesn't mean you shouldn't understand them. Whether you use them is up
                                >> to you.
                                >>
                                >> As you point out, you are not the only programmer and many programmers
                                >> like
                                >> to use Regex and that doesn't make them any lesser programmers. What are
                                >> you going to do when you run into their code?[/color]
                                >
                                > If they're on my team, I'll tell them to refactor their code to only
                                > use them when they're appropriate, frankly.[/color]

                                Appropriate as defined by you. Why allow them at all?[color=blue]
                                >[color=green]
                                >> I see code all the time (much of the time it is mine) and wonder why the
                                >> programmer didn't do it another way. There are many ways to skin a cat.
                                >> Sometimes it is just style, sometimes it is all they know. But if they
                                >> follow whatever standards are set up (and in your case maybe you forbid
                                >> Regex) then as long as the code is well written and clean - I have no
                                >> problem with it.[/color]
                                >
                                > If code uses regular expressions when they serve no purpose, it is
                                > *not* well written and clean though - it is less maintainable than it
                                > might be.
                                >[/color]
                                They serve a purpose. They do the same as your string routines, so there is
                                a purpose. Both are string handling routines.
                                [color=blue][color=green][color=darkred]
                                >> > If you truly think that given two solutions which are otherwise equal,
                                >> > the solution which is easiest to write, read and maintain doesn't win
                                >> > hands down, we'll definitely never agree.[/color]
                                >>
                                >> I agree there.
                                >>
                                >> Which is easier to write is obviously your perception. I found my
                                >> example,
                                >> as easy as yours to write and just as readable.[/color]
                                >
                                > And you believe that everyone else does? Again, bear in mind that
                                > you're unlikely to be the only person ever to read your code.[/color]

                                So you should never EVER use Regex. Someone else might read your code.

                                This is going in circles.

                                As I said, I would have a problem with someone who couldn't figure out what
                                the example we were using was doing.[color=blue]
                                >[color=green][color=darkred]
                                >> > If you want to keep your hand in with respect to regular expressions,
                                >> > do it in a test project, or with a regular expressions workbench. Keep
                                >> > it out of code which needs to be read and maintained, probably by other
                                >> > people who don't want to waste time because you wanted to keep your
                                >> > skill set up to date.[/color]
                                >>
                                >> Keep regular expressions out of my code?????
                                >>
                                >> So now you are saying there is no use for it?[/color]
                                >
                                > Not at all - I'm saying that you shouldn't put regular expressions in
                                > your code just for the sake of keeping your hand in. Use them where
                                > they're applicable, and only there.[/color]

                                There either is a use or not. You can't say there is a use for it and then
                                browbeat a programmer because he happens to like to use it. Does a
                                programmer have to come to you each time he wants to use it to get your
                                permission?

                                I can see it if he writes some obscure cryptic Regular Expression - but come
                                on.
                                [color=blue]
                                >[color=green][color=darkred]
                                >> >> I don't know all of the possible combinations of calls to every
                                >> >> Object,
                                >> >> but that doesn't preclude me from using them.
                                >> >
                                >> > Exactly - and you wouldn't go out of your way to use methods you don't
                                >> > need, just to get into the habit of using them, would you?[/color]
                                >>
                                >> Sure.
                                >>
                                >> If it is valid. As I said there are many ways to skin ..., depending on
                                >> the
                                >> situation I may do it one way and the next time another way. Gives me
                                >> many
                                >> options. I don't do it willy nilly, as you seem to suggest, as a test
                                >> bench.[/color]
                                >
                                > But that's *exactly* what you've suggested you should do with regular
                                > expressions - use them even when there's no real purpose in doing so,
                                > just so that you remember what they look like.[/color]

                                Sure.

                                If they are both perfectly valid, I might. Depends on my mood (you should
                                really have a problem with that). :)[color=blue]
                                >[color=green][color=darkred]
                                >> > Absolutely - so why are you so keen on making people either memorise or
                                >> > look up the characters which need escaping for regular expressions
                                >> > every time they read or modify your code?[/color]
                                >>
                                >> I am not. I don't memorize. But I still use it.[/color]
                                >
                                > Okay, so you don't memorise it, which means you *do* have to look up
                                > which characters require escaping. I think you've just admitted that
                                > your code is less maintainable than mine.[/color]

                                No.

                                I can maintain my car, but I might still have to look up specs on it.[color=blue]
                                >[color=green][color=darkred]
                                >> > I seem to be having difficulty making myself clear on this point: I
                                >> > have never stated and will never state that you shouldn't use regular
                                >> > expressions where they're appropriate. But they are *not* appropriate
                                >> > in this case, as they are a more complex and less readable way of
                                >> > solving the problem.[/color]
                                >>
                                >> No you are very clear. If you are so concerned with others being able to
                                >> read your code and problems with escape characters - why would you EVER
                                >> want
                                >> them to use them. You can't have it both ways.[/color]
                                >
                                > I would use them when the solution which uses regular expressions is
                                > clearer than the solution which doesn't use them. It seems a pretty
                                > simple policy to me.[/color]

                                If they are not readable, you shouldn't use them at all. I personally think
                                they are both readable, in this case.[color=blue]
                                >[color=green]
                                >> If they would have a hard time with a nothing expression like "if
                                >> (Regex.IsMatch(myString, @"something1|something2|something3"))" - they
                                >> are
                                >> never going to get some of the other standard Regex solutions I
                                >> mentioned before.[/color]
                                >
                                > Those maintaining the code could no doubt understand it after looking
                                > at it for a little while, just like they could work out your other
                                > regular expressions after looking at them and consulting the
                                > documentation - but why are you trying to make their jobs harder? Why
                                > are you not concerned that the code you're writing is costing your
                                > company money by making it harder to maintain than it needs to be?[/color]

                                Again, then you feel there is no place for Regex as you can do anything with
                                C# that you can do with Regex. As you say, it will always be harder to
                                read.
                                [color=blue]
                                >[color=green]
                                >> As you said, the two solutions are equal. Your solution is that you MUST
                                >> go
                                >> with IndexOf. Mine is you can use either.[/color]
                                >
                                > Well, they're equal in terms of their semantics. They're definitely not
                                > equal in terms of maintainability, and as that's important to me, I
                                > don't see what's wrong with saying that I'm very strongly in favour of
                                > avoiding the less readable/maintainable code.
                                >[/color]
                                I didn't say that.
                                [color=blue][color=green][color=darkred]
                                >> > Show me a problem where the regex way of solving it is simpler than
                                >> > using simple string operations (and there are plenty of problems like
                                >> > that) and I'll plump for the regex in a heartbeat.
                                >> >
                                >> >> If I happen to know a good way in Regex to solve a problem, I am not
                                >> >> going use *extra brainpower* to try to solve the problem in C#.
                                >> >
                                >> > In what way is using the method which is designed for *precisely* the
                                >> > task in hand (finding something in a string) using extra brainpower?[/color]
                                >>
                                >> I wasn't referring to this particular issue when I said this.[/color]
                                >
                                > It would have been nice if you'd indicated that. Do you agree then that
                                > it doesn't actually take any more brainpower to come up with
                                > String.IndexOf instead of Regex.IsMatch, but in fact it takes *less*
                                > brainpower when it comes to maintaining the IndexOf solution?
                                >[/color]
                                In this case, no. In other cases, could be. Would have to look at it. I
                                never said that Regex is the best thing out there. I was just saying that
                                it is valid and can be readable - can also be cryptic (as can C#).
                                [color=blue][color=green][color=darkred]
                                >> >If
                                >> > you're not familiar with String.IndexOf, you've got *much* bigger
                                >> > things to worry about than whether or not your regular expression
                                >> > skills are getting rusty.[/color]
                                >>
                                >> I never said I was not familiar with IndexOf.
                                >>
                                >> As a matter of fact, the original question was whether you could
                                >> "do a
                                >> search for more than one string in another string".[/color]
                                >
                                > And of course the answer is "yes, by calling IndexOf multiple times".[/color]

                                That wasn't the question asked. That was the example that was given and the
                                question was can you do it in one statement.

                                So the answer is no, using IndexOf.[color=blue]
                                >[color=green]
                                >> ****************************************************************
                                >> Can you do a search for more than one string in another string?
                                >>
                                >> Something like:
                                >>
                                >> someString.IndexOf("something1","something2","something3",0)
                                >>
                                >> or would you have to do something like:
                                >>
                                >> if ((someString.IndexOf("something1",0) >= 0) ||
                                >>     (someString.IndexOf("something2",0) >= 0) ||
                                >>     (someString.IndexOf("something3",0) >= 0))
                                >> {
                                >>     // Do something
                                >> }
                                >> ****************************************************************
                                >> IndexOf doesn't do it. This was the original question. You have to do
                                >> multiple calls as is said in the original question. Nicholas was correct
                                >> in
                                >> his assessment. One Regex call would work.[/color]
                                >
                                > Yes, as would a single call to a method which called IndexOf on the
                                > string multiple times. I disagree with you - Nicholas wasn't correct in
                                > his assessment, as he claimed that the "best bet" would be to use a
                                > regular expression. Using regular expressions is just *not* the best
                                > bet here - it requires more effort, as I've described repeatedly.
                                >[/color]

                                No, he was correct in his answer to the question. The question was never
                                "Which is better", but "can you do it". And you can write a method which calls
                                IndexOf multiple times. But then it isn't one line, is it?
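For reference, a minimal sketch of the "single call to a method which calls IndexOf multiple times" idea discussed above. The class name `StringSearch` and the method name `ContainsAny` are hypothetical and do not come from the thread, and ordinal comparison is an assumption (the thread does not say which comparison is wanted).

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper (not from the thread): returns true if 'source'
    // contains at least one of 'candidates', using ordinal comparison.
    public static bool ContainsAny(string source, params string[] candidates)
    {
        foreach (string candidate in candidates)
        {
            // IndexOf returns -1 when the candidate is not found.
            if (source.IndexOf(candidate, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

With such a helper the call site becomes a single statement, e.g. `if (StringSearch.ContainsAny(someString, "something1", "something2", "something3")) { ... }`, and none of the search strings needs regex escaping.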
                                [color=blue][color=green][color=darkred]
                                >> > Okay - now suppose I need to change it from searching for "something1"
                                >> > to "something.1" or "something[1]". How long does it take to change in
                                >> > each version? How easy is it to read afterwards?[/color]
                                >>
                                >> That wasn't the question.[/color]
                                >
                                > Are you suggesting that maintainability isn't something that should be
                                > considered? Do you *really* want to look for "something1",
                                > "something2" and "something3" or were they (as I suspect) just
                                > examples, and the real values could easily have dots, brackets etc in?[/color]

                                I don't really remember what the context was originally. But I know they
                                didn't have dots and brackets in it.[color=blue]
                                >[color=green]
                                >> What if you wanted to change "something1" to "something\". Same problem.[/color]
                                >
                                > Well, it's half the problem with IndexOf that it is with regular
                                > expressions. With regular expressions, you'd need to know that not only
                                > does backslash need escaping in C#, it also needs escaping in regular
                                > expressions.
                                >
                                > IndexOf: "something\\" or @"something\"
                                > Regex: "something\\\\" or @"something\\"
                                >
                                > Once again, the IndexOf version is easier to understand - there's less
                                > to mentally unescape to work out what's actually being asked for.
                                >[/color]
                                Splitting hairs, now. Both are the same, as far as I can see (here).
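To make the two levels of escaping concrete, here is a small sketch that searches for the text `something\` both ways. The class and method names are made up for illustration, and ordinal comparison is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashSearch
{
    // Plain search: only the C# string literal needs escaping,
    // so "something\\" is the nine letters followed by one backslash.
    public static bool WithIndexOf(string source)
    {
        return source.IndexOf("something\\", StringComparison.Ordinal) >= 0;
    }

    // Regex search: the backslash must also be escaped for the regex engine.
    // The verbatim literal @"something\\" is the pattern something\\,
    // which matches the letters followed by one literal backslash.
    public static bool WithRegex(string source)
    {
        return Regex.IsMatch(source, @"something\\");
    }
}
```

Both methods agree on every input; the difference is only in how many layers of escaping the reader has to undo.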
                                [color=blue][color=green]
                                >> And if escapes were a problem (if it were me) I would have a little sheet
                                >> that showed them at my desk within easy reach.[/color]
                                >
                                > Whereas by needing to know less (just the C# escapes) it's really easy
                                > to memorise everything I need to know to solve this situation.
                                >[/color]
                                That's true, but then you would only know C#. And if that is your aim.
                                That's fine.
                                [color=blue][color=green][color=darkred]
                                >> >> So I am at a loss as to how this regular expression is more unreadable
                                >> >> than
                                >> >> the C# counterpart. That is not to say that you couldn't make it more
                                >> >> unreadable - but you could do the same with C# if you wanted to.
                                >> >
                                >> > You could start by making the C# more readable, as I've shown...[/color]
                                >>
                                >> As you can with Regular Expressions.[/color]
                                >
                                > Well, Oliver Sturm has shown a more readable version, but you seem to
                                > be keen on the "put them all in the same line" version.
                                >
                                > Neither is as readable as the String.IndexOf version, however.
                                >[color=green][color=darkred]
                                >> > However, the regex is already less readable:
                                >> > 1) It's got "|" as a "magic character" in there.[/color]
                                >>
                                >> | = or (same as C)[/color]
                                >
                                > Yup, but it's something that isn't used in string literals other than
                                > for regular expressions. It's an extra thing to bear in mind
                                > unnecessarily.
                                >[/color]
                                No room for it, huh?
                                [color=blue][color=green][color=darkred]
                                >> > 2) It's got all the strings concatenated, so it's harder to spot each
                                >> > of them separately.[/color]
                                >>
                                >> You are kidding, right?[/color]
                                >
                                > Absolutely not! It's significantly easier to spot the three separate
                                > values when they're three separate strings than when they're all mashed
                                > together.
                                >[color=green][color=darkred]
                                >> > Furthermore, suppose you didn't just want to search for literals -
                                >> > suppose one of the strings you wanted to search for was contained in a
                                >> > variable. How sure are you that *no-one* on your team would use:
                                >> >
                                >> > x+"|something2|something3"
                                >> >
                                >> > as the regular expression?[/color]
                                >>
                                >> You are now leaving the original question. I never said that Regular
                                >> Expressions was the better (or not better) in all cases.[/color]
                                >
                                > While I'm leaving the exact original question, it's far from out of the
                                > question that the original code would need to be changed at some point to
                                > search for a value held in a variable. At that point, can you guarantee
                                > that your team would get it right? They'd need to be on their guard
                                > when using regular expressions - they wouldn't need to be on their
                                > guard using IndexOf.
                                >[/color]
                                Right. No one makes mistakes with IndexOf.
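A short sketch of the variable case raised in the quoted text: building the pattern by plain concatenation treats any metacharacters in `x` as regex syntax, while `Regex.Escape` makes `x` match literally. The class and method names are hypothetical; the value "jon.skeet" is the example used earlier in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VariablePattern
{
    // Naive: metacharacters in 'x' (such as '.') are interpreted as
    // pattern syntax, so "jon.skeet" also matches "jonxskeet".
    public static bool NaiveMatch(string input, string x)
    {
        return Regex.IsMatch(input, x + "|something2|something3");
    }

    // Safer: Regex.Escape turns 'x' into a pattern that matches it literally.
    public static bool EscapedMatch(string input, string x)
    {
        return Regex.IsMatch(input, Regex.Escape(x) + "|something2|something3");
    }
}
```

Called with `x = "jon.skeet"`, the naive version accepts the input "jonxskeet" while the escaped version does not. An IndexOf-based search never needs this step.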
                                [color=blue][color=green][color=darkred]
                                >> > I would tell programmers on my team not to use regular expressions
                                >> > where the alternative is simpler and more readable, yes.[/color]
                                >>
                                >> Why use them at all? It isn't readable.[/color]
                                >
                                > They aren't as readable *in this case*. In other, more complicated
                                > situations, the version which only used IndexOf would be harder to read
                                > than the regular expression version.[/color]

                                But your problem was that it would be hard for other programmers to read.
                                If they can read your more complicated version, this one should be easy.[color=blue]
                                >
                                > Using a regular expression is like getting a car compared with walking
                                > somewhere - it's absolutely the right thing to do when you're going on
                                > a long journey, but in this case you're advocating getting in a car
                                > just to travel to the next room. It's simpler to walk.
                                >[color=green]
                                >> And if your programmers can't maintain the simple Regexes, they definitely
                                >> won't be able to handle the more complicated ones.[/color]
                                >
                                > You seem to fail to grasp the "make it as simple as possible" concept.
                                > It's not a case of maintenance engineers being idiots - it's about
                                > presenting them with fewer possible risks. Why leave them a trap to
                                > fall into when you can write simpler code which is easier to change
                                > later on?
                                >[/color]
                                No.

                                I just find it as simple, in this case and you don't.
                                [color=blue][color=green][color=darkred]
                                >> > You've shown nothing of the kind - whereas I think I've given plenty of
                                >> > examples of how using regular expressions make the code less easily
                                >> > maintainable, even if you consider it equally readable to start with
                                >> > (which I don't).[/color]
                                >>
                                >> Not in this specific case. I was never maintaining or pushing Regex for
                                >> all
                                >> or any situations.[/color]
                                >
                                > But you're pushing for regular expressions in *this* situation, or at
                                > least saying it's just as good as using IndexOf. You've also shown in
                                > your other code that you use regular expressions unnecessarily for
                                > replacement, making a simple two-step replacement into a complicated
                                > single-step replacement where the number of characters which *aren't*
                                > just plain text is greater than the number of characters which are.[/color]

                                No. Not pushing. But I think they are equivalent in this case. As you said
                                earlier, I am sure others would disagree. But I don't think that the
                                difference is significant enough, in this case, even if I were to agree on
                                which is easier, to preclude it.[color=blue]
                                >[color=green]
                                >> But I am not going to force my programmers to come to me to find out
                                >> whether
                                >> or not Regex is the easiest way. That is up to the programmer.
                                >> If
                                >> there is a problem with their code and I feel the programmer is way off
                                >> base
                                >> in his coding we would talk about it (that would be the case with his C#, VB
                                >> or
                                >> Regex code).[/color]
                                >
                                > Using regular expressions in this case *is* a problem with their code,
                                > IMO. It's just asking for trouble later on.
                                >[color=green][color=darkred]
                                >> > Yes, I can read either too. The point is that in reading my version, I
                                >> > didn't need to wade through various special characters, understanding
                                >> > exactly what each was there for.[/color]
                                >>
                                >> If you knew enough to know about Regex at all (which you said you would
                                >> have
                                >> no problem with in some situations - so the programmers better be able to
                                >> read it), there should not be a problem with the 2 special characters
                                >> which
                                >> is the same as C#. There is nothing obscure in this example - that I can
                                >> see.[/color]
                                >
                                > Of course there is - to work out what's going on, you've got to
                                > mentally unescape the dollar and the comma, but *not* mentally unescape
                                > the |. All that rather than just "replace dollar with space, replace
                                > comma with space" in a simple form with no hidden meanings to anything.
                                >[color=green][color=darkred]
                                >> >Of course, your version wasn't even valid
                                >> > C#, as it didn't escape the backslashes and you didn't specify a
                                >> > verbatim literal. I assume it was originally VB.NET. I wonder which
                                >> > version would be easier to convert to valid C#? Mine, perhaps?[/color]
                                >>
                                >> Actually, it was VB.Net.[/color]
                                >
                                > Right. So in the C#, you'd either have to have more escapes, or make
                                > them verbatim literals. More stuff to get right. Note how no escaping
                                > at all is required in my version.
                                >[color=green][color=darkred]
                                >> > And in all of those cases, regular expressions are really useful.[/color]
                                >>
                                >> But according to you, you shouldn't use them as some of the programmers
                                >> may
                                >> not be able to maintain it.[/color]
                                >
                                > <sigh> If you actually believe that, you haven't been reading what I've
                                > been writing.
                                >[color=green]
                                >> Definitely if they would have a problem with our example.
                                >>
                                >> Can't have it both ways. If you allow Regular Expressions, you shouldn't
                                >> have a problem if a programmer used the Regex or IndexOf in our example.
                                >> Anyone maintaining the "USEFUL" ones would have zero problems with this
                                >> one.[/color]
                                >
                                > How very black and white of you. Do you really have no concept of
                                > someone being able to understand something, but having a harder time
                                > understanding it one way than the other?
                                >[/color]
                                Who?

                                The person who can understand Regex if complicated, but would be trashed
                                trying to figure out our little example.

                                Bit of a stretch there.[color=blue][color=green][color=darkred]
                                >> >> Which are very well documented and when there are a myriad of ways a
                                >> >> user can input these types of data, I prefer to use Regular
                                >> >> expressions which are all over the place (easy to find) than try to
                                >> >> come up with some complex set of loops and temporary variables which
                                >> >> make it far easier to make a mistake and much more unreadable than the
                                >> >> Regex equivalent.
                                >> >
                                >> > Where exactly are the complex loops and temporary variables in this
                                >> > specific case? After all, you have been arguing for using regular
                                >> > expressions in *this specific case*, haven't you?[/color]
                                >>
                                >> I was obviously talking about Regular Expressions in general here as I
                                >> was
                                >> referring to the standard ones you can get anywhere dealing with (Phone
                                >> numbers, credit card etc). There would be none in this case, obviously.
                                >> But there may be in more complicated cases.[/color]
                                >
                                > Yes - the complicated cases where I've already said that regular
                                > expressions are useful![/color]

                                Just make sure the programmer that can't handle the easy Regex doesn't see
                                that one.[color=blue]
                                >[color=green][color=darkred]
                                >> > You already need to know that when writing C# though - my use of
                                >> > String.IndexOf doesn't add to the volume of knowledge required.[/color]
                                >>[/color][/color]
                                Can't have that !!!!
                                [color=blue][color=green]
                                >> It is still an issue.[/color]
                                >
                                > Yes, it's still going to be harder to search for "some\thing" than
                                > "something". However, it's *not* going to be harder to search for
                                > "some.thing", or "(something)", or "[something]", or "some,thing", or
                                > "some*thing" or "some+thing" etc. Furthermore, there's still going to
                                > be less to remember when you *are* faced with searching for
                                > "some\thing" than there would be using regular expressions.
                                >[color=green]
                                >> Just as the Regular expressions are. And again, if
                                >> you are going to allow Regex at all, you would still need to know about
                                >> the
                                >> escapes.[/color]
                                >
                                > You'd need to know about the escapes where regular expressions are
                                > used. The fewer places they're used, the fewer times someone will need
                                > to look them up in the documentation.
                                >[color=green][color=darkred]
                                >> > Just because they're as readable *to you* doesn't mean they're as
                                >> > readable to everyone. How sure are you that the next engineer to read
                                >> > this code will be familiar with regular expressions? How sure are you
                                >> > that when you need to change it to look for a different string, you'll
                                >> > check whether any of the characters need to be escaped? Why would you
                                >> > even want to force that check on yourself?[/color]
                                >>
                                >> Again - then don't allow them at all.[/color]
                                >
                                > No, just allow them where they make sense. Note that if you only use
                                > them where they're going to be doing something fairly involved, it's
                                > much less likely that an engineer will forget that he's actually
                                > dealing with a regular expression than with a simple string.[/color]

                                Already dealt with.[color=blue]
                                >[color=green][color=darkred]
                                >> > When there's no good reason not to, absolutely.[/color]
                                >>
                                >> I guess that is where we disagree.[/color]
                                >
                                > It certainly sounds like it.
                                >[color=green][color=darkred]
                                >> >> I am not going to code to the level of a junior programmer. I prefer
                                >> >> that he learn to code to a higher level.
                                >> >
                                >> > Learning to solve problems as simply as possible *is* learning to code
                                >> > to a higher level.[/color]
                                >>
                                >> No argument there.[/color]
                                >
                                > But regular expressions are by their very nature more complicated than
                                > a simple String.IndexOf call. If they weren't they wouldn't be as
                                > powerful as they are.
                                >[/color]
                                Writing vanilla C# is less complicated than writing objects, but we still
                                write them.
                                [color=blue][color=green][color=darkred]
                                >> > If it's not the simplest code for the situation, it's not well written
                                >> > IMO. If it introduces risk for no reward (the risk of maintenance
                                >> > failing to notice that they might need to escape something, versus no
                                >> > reward) then it's not well written.[/color]
                                >>
                                >> I see no risk in the example we are talking about. At least, no more
                                >> than in the IndexOf solution (in this situation).[/color]
                                >
                                > You don't think there's any risk that someone will forget one of the
                                > regular expression characters which needs escaping? There is no string
                                > you could need to search for which needs *less* escaping in regular
                                > expressions than with String.IndexOf, but there are *lots* of strings
                                > which need more escaping - thus there's more overall risk.
                                >[color=green][color=darkred]
                                >> > I bet if I showed my code to a random sample of a hundred C# developers
                                >> > and asked them to change it to search for "hello[there]", virtually all
                                >> > of them would get it right. I also bet that if I showed your code to
                                >> > them and asked them for the same change, some would fail to escape it
                                >> > appropriately. Do you disagree?[/color]
                                >>
                                >> No. But then the same developers would have a problem with the more
                                >> complicated expressions you claim are useful.[/color]
                                >
                                > Actually, the fact that they were presented with a complicated
                                > expression would immediately make them wary, I suspect. Problems tend
                                > to creep in when something *looks* simpler than it actually is - as is
                                > the case here.
                                >[color=green][color=darkred]
                                >> > Both are ways of finding the value of a property. The first is harder
                                >> > to maintain and harder to read, just like your use of regular
                                >> > expressions in this instance. Now, which of the above snippets of code
                                >> > would you use, and why?[/color]
                                >>
                                >> Since I am not sure why you would use the first, I would do the 2nd.[/color]
                                >
                                > You'd use the first to keep up your knowledge of reflection, of course.
                                > After all, if you don't use it, you lose it, right? That's your
                                > argument for using regular expressions where they're completely
                                > unnecessary and provide no benefit, after all.
                                >[color=green]
                                >> But in our case, I would still use either, as I see the Regex version as
                                >> being as easy as the IndexOf one.[/color]
                                >
                                > I think we'll have to agree to disagree. You seem to be unable to grasp
                                > the idea that there are more potential pitfalls and more knowledge
                                > required for the regular expression version than for the IndexOf
                                > version.[/color]

                                Agreed.

                                Tom
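
For anyone reading this thread later, the "hello[there]" bet quoted above can be tried out with a small sketch. It is not code from either poster: the class name, method names, and sample strings are made up for illustration, and it only assumes the standard String.IndexOf, Regex.IsMatch, and Regex.Escape calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names here are illustrative, not from the thread.
public static class EscapingDemo
{
    // Literal search: every character in needle is taken as-is.
    public static bool ContainsLiteral(string text, string needle)
    {
        return text.IndexOf(needle, StringComparison.Ordinal) >= 0;
    }

    // The needle is passed straight to the regex engine with no escaping,
    // which is the maintenance mistake being discussed.
    public static bool MatchesUnescaped(string text, string needle)
    {
        return Regex.IsMatch(text, needle);
    }

    // Regex.Escape turns the needle back into a literal pattern.
    public static bool MatchesEscaped(string text, string needle)
    {
        return Regex.IsMatch(text, Regex.Escape(needle));
    }

    public static void Main()
    {
        string needle = "hello[there]";
        string exact = "say hello[there] now";  // contains the literal text
        string lookalike = "say hellot now";    // does not contain it

        // IndexOf behaves the same whatever characters the needle holds.
        Console.WriteLine(ContainsLiteral(exact, needle));      // True
        Console.WriteLine(ContainsLiteral(lookalike, needle));  // False

        // Unescaped, [there] is a character class: "hello" plus one of t, h, e, r.
        Console.WriteLine(MatchesUnescaped(exact, needle));     // False
        Console.WriteLine(MatchesUnescaped(lookalike, needle)); // True

        // With Regex.Escape the results agree with IndexOf again.
        Console.WriteLine(MatchesEscaped(exact, needle));       // True
        Console.WriteLine(MatchesEscaped(lookalike, needle));   // False
    }
}
```

The unescaped version fails in both directions: it misses the text it was meant to find and matches text it should not. That is the kind of error a test written without the escaping rules in mind may well not catch.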

