string routines and code libraries

  • Michael S

    #16
    Re: string routines and code libraries

    "Kevin Spencer" <kevin@DIESPAMMERSDIEtakempis.com> wrote in message
    news:uW6KDlM2FHA.620@TK2MSFTNGP10.phx.gbl...
    >> Does anybody know why we don't get such a class? If I think for 2 seconds I'd
    >> imagine it would screw up the GC as every such string must be pinned to a
    >> memory location. If anyone else could think for like 4 seconds or even a
    >> minute, I would appreciate your input on why and why not.
    >
    > Ask Anders Hejlsberg. He led the team that created Delphi AND the
    > Microsoft .Net platform.

    No shit!
    I've been a follower of Hejlsberg since Turbo Pascal..

    <joke>But since I showed up in tiger-tanga-lingerie, he sorta stopped
    calling me</joke>

    But this is not the question. My question is why we don't have a
    variant-copy-on-write-string in the .NET-framework..

    Happy Coding
    - Michael S






    • kevin  cline

      #17
      Re: string routines and code libraries

      Are you imagining some sort of reference-counting scheme where a string
      will only be copied if there is more than one reference to the string?
      That doesn't play well with threading.


      • kevin  cline

        #18
        Re: string routines and code libraries

        In a word, no, it's not desirable. You end up with a huge number of
        simple functions that are relatively useless, because it's rare that
        one of them will do the entire job. And once you need more than one
        function call, you might as well use a regular expression, which often
        can do the whole job.


        • Michael S

          #19
          Re: string routines and code libraries


          "kevin cline" <kevin.cline@gmail.com> wrote in message
          news:1130182179.358897.185080@f14g2000cwb.googlegroups.com...
          > Are you imagining some sort of reference-counting scheme where a string
          > will only be copied if there is more than one reference to the string?
          > That doesn't play well with threading.

          No, it doesn't play well with threading at all.
          But I'm not dreaming. It is all there. And I don't take credit for it as it
          has been in Delphi since 2.0. =)

          Have a look how strings are done in Delphi and you'll see something neat.
          Or don't. I'll do it for you...

          I'm not saying that System.String should be replaced, but that a sorta
          System.StringBuffer would be desirable.
          I just picked the name from Java, just to make sure Javaites would get
          really really confused...

          // o1 points to virtual memory at address 1000 and has a refcount of 1.
          StringBuffer o1 = "Hello World!";
          // o2 now also points to address 1000, which keeps a refcount of 2. No chars hurt!
          StringBuffer o2 = o1;
          // Now a new string gets copied to the heap at address 2000, and o2 points
          // to "Hallo World!".
          o2.CharAt[1] = 'a';
          // o3 is simply a reference to address 2000. No chars were copied.
          StringBuffer o3 = o2;

          But there is more to strings in Delphi. A string in Delphi also keeps its
          length.

          // o1 still points to the memory at address 1000, which now contains
          // "OKllo World!"
          o1 = "OK";
          // The allocated space of o1 cannot hold the string, so it is copied to
          // address 3000.
          o1 = "Now this is really cool";

          There is (somewhat) no magic. This is how the structure works.

          [32-bit refcount][32-bit allocated][32-bit length ([0] deprecated)][1][2][3][4]...[N]
          ASCII characters.

          // o1 does not reallocate. It stays at 3000 and contains
          // "Get it?s is really cool"
          o1 = "Get it?";

          Hence the reference of o1 would point to address of 3000:
          1, 23, 7 [points here]Get it?s is really cool

          That is also why Length(o1) in Delphi is actually nothing more than a single
          fetch at a fixed negative offset from that address (in the layout above, the
          length field sits just before the first character). No strlen needed at all.
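
Editor's sketch: the walk-through above can be modelled in a few lines of Python. This is only an illustration of the refcount / allocated / length idea as described in this post, not Delphi's actual implementation. The names `CowString`, `share`, `set_char`, `assign` and `raw` are hypothetical, and it is single-threaded on purpose.

```python
class _Block:
    """Shared storage: a reference count, an allocated buffer and a logical length."""

    def __init__(self, text):
        self.refcount = 1
        self.chars = list(text)   # len(self.chars) plays the role of "allocated"
        self.length = len(text)   # logical length, read without scanning the chars


class CowString:
    """Hypothetical copy-on-write string, modelled on the description above."""

    def __init__(self, text=""):
        self._block = _Block(text)

    def share(self):
        """Model `o2 = o1`: a second reference to the same block, no chars copied."""
        other = CowString.__new__(CowString)
        other._block = self._block
        self._block.refcount += 1
        return other

    def set_char(self, index, ch):
        """Model `o2.CharAt[1] = 'a'`: copy first if the block is shared."""
        if not 0 <= index < self._block.length:
            raise IndexError("index out of range")
        if self._block.refcount > 1:
            current = str(self)
            self._block.refcount -= 1
            self._block = _Block(current)
        self._block.chars[index] = ch

    def assign(self, text):
        """Model `o1 = "..."`: reuse the buffer if we own it alone and the text fits."""
        block = self._block
        if block.refcount == 1 and len(text) <= len(block.chars):
            block.chars[:len(text)] = text   # overwrite in place, old tail stays behind
            block.length = len(text)
        else:
            block.refcount -= 1
            self._block = _Block(text)

    def raw(self):
        """Everything in the allocated buffer, including stale characters."""
        return "".join(self._block.chars)

    def __len__(self):
        return self._block.length

    def __str__(self):
        return "".join(self._block.chars[:self._block.length])
```

Running the same steps as above (`share`, `set_char(1, 'a')`, then the three assignments) leaves `raw()` holding the stale tail while `len()` and `str()` only see the logical length. The unguarded `refcount += 1` is exactly the part that would need care with threads.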

          Happy Strings
          - Michael S



















          • Jon Skeet [C# MVP]

            #20
            Re: string routines and code libraries

            kevin cline <kevin.cline@gmail.com> wrote:
            > In a word, no, it's not desirable. You end up with a huge number of
            > simple functions that are relatively useless, because it's rare that
            > one of them will do the entire job. And once you need more than one
            > function call, you might as well use a regular expression, which often
            > can do the whole job.

            Do you use regular expressions every time you need to do more than one
            operation on a string then? I certainly don't. I'd rather see a few
            simple operations than one regular expression which could take a while
            to understand or even to write properly in the first place.

            Regular expressions are great when they take the place of *complicated*
            string processing, but when you've just got a few operations to
            perform, I'll take the simplicity of straight string operations any
            day.
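
Editor's sketch: to make the trade-off concrete, here is one small task done both ways, in Python rather than C#. The `name = value` input format and both function names are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical input format for illustration: "name = value"
_PAIR = re.compile(r"^\s*([^=\s]+)\s*=\s*(.*?)\s*$")


def parse_with_string_ops(line):
    """A few plain string calls: split on the first '=' and trim both sides."""
    name, sep, value = line.partition("=")
    if not sep or not name.strip():
        raise ValueError("expected 'name = value'")
    return name.strip(), value.strip()


def parse_with_regex(line):
    """One regular expression doing the same job."""
    match = _PAIR.match(line)
    if match is None:
        raise ValueError("expected 'name = value'")
    return match.group(1), match.group(2)
```

The two agree on ordinary input but not on every edge case: the pattern as written rejects a name containing inner whitespace, while the string version accepts it. Which of the two a reader verifies faster is the point being argued in this thread.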

            --
            Jon Skeet - <skeet@pobox.com>
            http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
            If replying to the group, please do not mail me too


            • The Crow

              #21
              Re: string routines and code libraries

              What do you think about the performance comparison between regular string
              operations and regular expressions? Can we use regular expressions without
              worrying about performance (is the slowdown very small)?



              • Jon Skeet [C# MVP]

                #22
                Re: string routines and code libraries

                <"The Crow" <q>> wrote:
                > What do you think about the performance comparison between regular string
                > operations and regular expressions? Can we use regular expressions without
                > worrying about performance (is the slowdown very small)?

                It entirely depends what you're doing. In some situations, compiled
                regular expressions will be faster than the same kind of operations
                done just with String methods - at least without significant work.

                In most cases, however, regular expressions are slower, sometimes quite
                significantly.
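
Editor's sketch: the only reliable answer is to measure the specific operation. Below is a minimal timing harness in Python, using the standard `timeit` module. The digit-check task and all function names are hypothetical, and Python's `re.compile` is not the same thing as .NET's compiled regular expressions, so the numbers say nothing about C# directly.

```python
import re
import timeit

_DIGITS = re.compile(r"[0-9]+")


def all_digits_str(text):
    """Plain character checks, no regular expression involved."""
    return text != "" and all("0" <= ch <= "9" for ch in text)


def all_digits_regex(text):
    """Same check with a precompiled pattern; fullmatch must consume the whole string."""
    return _DIGITS.fullmatch(text) is not None


def time_both(sample="1234567890" * 10, number=10000):
    """Return (seconds for string version, seconds for regex version)."""
    t_str = timeit.timeit(lambda: all_digits_str(sample), number=number)
    t_re = timeit.timeit(lambda: all_digits_regex(sample), number=number)
    return t_str, t_re
```

Which side wins depends on the runtime, the input and the operation, so no expected figures are given here.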

                --
                Jon Skeet - <skeet@pobox.com>
                http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                If replying to the group, please do not mail me too


                • kevin  cline

                  #23
                  Re: string routines and code libraries

                  It's a well-known technique, and doesn't play well with threading
                  because the reference count has to be updated atomically.
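
Editor's sketch: the cost being referred to is that every copy and every release of a shared string has to touch the count under some form of synchronisation. Here is a minimal Python illustration using a lock. `SharedCount`, `hammer` and `run_demo` are hypothetical names, and a native implementation would more likely use an atomic increment instruction than a lock.

```python
import threading


class SharedCount:
    """A reference count whose updates are serialised by a lock."""

    def __init__(self):
        self._value = 1
        self._lock = threading.Lock()

    def acquire_ref(self):
        with self._lock:          # read-modify-write must not interleave
            self._value += 1
            return self._value

    def release_ref(self):
        with self._lock:
            self._value -= 1
            return self._value

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(count, rounds):
    """Take and drop a reference repeatedly, as string copies would."""
    for _ in range(rounds):
        count.acquire_ref()
        count.release_ref()


def run_demo(threads=8, rounds=10000):
    """Run several threads against one count and return the final value."""
    count = SharedCount()
    workers = [threading.Thread(target=hammer, args=(count, rounds))
               for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return count.value
```

With the lock in place the count returns to its starting value once all threads finish. The price is that every assignment of a shared string now pays for synchronisation, even in code that never shares strings across threads.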


                  • kevin  cline

                    #24
                    Re: string routines and code libraries

                    Jon wrote:
                    > kevin cline <kevin.cline@gmail.com> wrote:
                    > > In a word, no, it's not desirable. You end up with a huge number of
                    > > simple functions that are relatively useless, because it's rare that
                    > > one of them will do the entire job. And once you need more than one
                    > > function call, you might as well use a regular expression, which often
                    > > can do the whole job.
                    >
                    > Do you use regular expressions every time you need to do more than one
                    > operation on a string then?

                    Mostly, yes, I do. I've been using regular expressions for a long time
                    and it's easier for me to read and verify one regular expression than
                    to understand multiple calls to index and substring. Also, that sort
                    of string manipulation is very easy to get wrong.

                    > I certainly don't. I'd rather see a few
                    > simple operations than one regular expression which could take a while
                    > to understand or even to write properly in the first place.

                    With practice, you'll find that regular expressions are easy to
                    understand.

                    > Regular expressions are great when they take the place of *complicated*
                    > string processing, but when you've just got a few operations to
                    > perform, I'll take the simplicity of straight string operations any
                    > day.

                    As soon as you get to 'a few' operations, it's no longer simple. Such
                    code is quite prone to off-by-one errors, index out of range
                    exceptions, invalid argument exceptions, etc. It also tends to be
                    slower than a single regular expression match.


                    • Jon Skeet [C# MVP]

                      #25
                      Re: string routines and code libraries

                      kevin cline <kevin.cline@gmail.com> wrote:
                      > > Do you use regular expressions every time you need to do more than one
                      > > operation on a string then?
                      >
                      > Mostly, yes, I do. I've been using regular expressions for a long time
                      > and it's easier for me to read and verify one regular expression than
                      > to understand multiple calls to index and substring.

                      Have all the other engineers who might read your code also been using
                      regular expressions for that long?

                      > Also, that sort of string manipulation is very easy to get wrong.

                      Whereas no-one ever gets regular expressions wrong, I suppose? ;)

                      > > I certainly don't. I'd rather see a few
                      > > simple operations than one regular expression which could take a while
                      > > to understand or even to write properly in the first place.
                      >
                      > With practice, you'll find that regular expressions are easy to
                      > understand.

                      Without practice, simple string calls are easy to understand, IME. Why
                      should anyone who has to read my code also have to have years of
                      experience with regular expressions?

                      > > Regular expressions are great when they take the place of *complicated*
                      > > string processing, but when you've just got a few operations to
                      > > perform, I'll take the simplicity of straight string operations any
                      > > day.
                      >
                      > As soon as you get to 'a few' operations, it's no longer simple.

                      If it genuinely is "a few" (as opposed to several including a couple of
                      loops), it can still be very simple IMO.

                      > Such code is quite prone to off-by-one errors, index out of range
                      > exceptions, invalid argument exceptions, etc.

                      Likewise regular expressions are prone to forgetting to escape certain
                      characters, forgetting just which bits need matching, etc. They're also
                      prone to assumptions in terms of portability - not all regular
                      expression environments are the same, so you either have to limit
                      yourself to a basic core, or learn the extensions in each and remember
                      which platform you're dealing with. Of course, not all string-handling
                      libraries are the same either - but I've got the compiler and
                      intellisense to help me there.
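
Editor's sketch: the escaping pitfall is easy to reproduce. The Python example below searches for the literal text `1.5`. The function names are hypothetical; `re.escape` and the `in` operator are standard.

```python
import re


def contains_literal_naive(haystack, needle):
    """Buggy on purpose: the needle is treated as a pattern, so '.' matches any char."""
    return re.search(needle, haystack) is not None


def contains_literal_escaped(haystack, needle):
    """Escape the needle first so every character is matched literally."""
    return re.search(re.escape(needle), haystack) is not None


def contains_literal_plain(haystack, needle):
    """No regular expression at all: a plain substring test."""
    return needle in haystack
```

Searching `"version 125"` for `"1.5"` succeeds with the naive version because the unescaped dot matches the `2`. The escaped and plain versions both reject it.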

                      > It also tends to be slower than a single regular expression match.

                      That's not my experience in the benchmarks I've done on various
                      operations over the years (in response to newsgroup questions). It
                      depends what exactly is being done, but often "hard-coded" string
                      operations are significantly faster. That makes sense, as they're
                      (each) less generalised.

                      --
                      Jon Skeet - <skeet@pobox.com>
                      http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                      If replying to the group, please do not mail me too


                      • The Crow

                        #26
                        Re: string routines and code libraries


                        "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
                        news:MPG.1dca02f2127a389298c9bf@msnews.microsoft.com...
                        > kevin cline <kevin.cline@gmail.com> wrote:
                        >> > Do you use regular expressions every time you need to do more than one
                        >> > operation on a string then?
                        >>
                        >> Mostly, yes, I do. I've been using regular expressions for a long time
                        >> and it's easier for me to read and verify one regular expression than
                        >> to understand multiple calls to index and substring.
                        >
                        > Have all the other engineers who might read your code also been using
                        > regular expressions for that long?
                        >
                        >> Also, that sort of string manipulation is very easy to get wrong.
                        >
                        > Whereas no-one ever gets regular expressions wrong, I suppose? ;)
                        >
                        >> > I certainly don't. I'd rather see a few
                        >> > simple operations than one regular expression which could take a while
                        >> > to understand or even to write properly in the first place.
                        >>
                        >> With practice, you'll find that regular expressions are easy to
                        >> understand.
                        >
                        > Without practice, simple string calls are easy to understand, IME. Why
                        > should anyone who has to read my code also have to have years of
                        > experience with regular expressions?
                        >
                        >> > Regular expressions are great when they take the place of *complicated*
                        >> > string processing, but when you've just got a few operations to
                        >> > perform, I'll take the simplicity of straight string operations any
                        >> > day.
                        >>
                        >> As soon as you get to 'a few' operations, it's no longer simple.
                        >
                        > If it genuinely is "a few" (as opposed to several including a couple of
                        > loops), it can still be very simple IMO.
                        >
                        >> Such code is quite prone to off-by-one errors, index out of range
                        >> exceptions, invalid argument exceptions, etc.
                        >
                        > Likewise regular expressions are prone to forgetting to escape certain
                        > characters, forgetting just which bits need matching, etc. They're also
                        > prone to assumptions in terms of portability - not all regular
                        > expression environments are the same, so you either have to limit
                        > yourself to a basic core, or learn the extensions in each and remember
                        > which platform you're dealing with. Of course, not all string-handling
                        > libraries are the same either - but I've got the compiler and
                        > intellisense to help me there.
                        >


                        "intellisense" is available only on the .NET platform.


                        >> It also tends to be slower than a single regular expression match.
                        >
                        > That's not my experience in the benchmarks I've done on various
                        > operations over the years (in response to newsgroup questions). It
                        > depends what exactly is being done, but often "hard-coded" string
                        > operations are significantly faster. That makes sense, as they're
                        > (each) less generalised.



                        In my opinion, someone who has a little knowledge of regular expressions and
                        software engineering can sense where to use regular string operations and
                        where to use regular expressions... If you ask me, I'll choose expressing
                        rather than doing the work. Doing the work is always more error prone.



                        • Jon Skeet [C# MVP]

                          #27
                          Re: string routines and code libraries

                          <"The Crow" <q>> wrote:
                          > > Likewise regular expressions are prone to forgetting to escape certain
                          > > characters, forgetting just which bits need matching, etc. They're also
                          > > prone to assumptions in terms of portability - not all regular
                          > > expression environments are the same, so you either have to limit
                          > > yourself to a basic core, or learn the extensions in each and remember
                          > > which platform you're dealing with. Of course, not all string-handling
                          > > libraries are the same either - but I've got the compiler and
                          > > intellisense to help me there.
                          >
                          > "intellisense" is available only on the .NET platform.

                          Call it what you like, many IDEs have the same sort of auto-completion
                          and prompting with documentation that VS.NET has. Eclipse's version is
                          actually rather better than VS.NET 2003's, in fact.

                          I believe that most developers on most platforms use an IDE which can
                          help them with basic string handling.

                          I believe that very few developers use an IDE which can help them
                          (without having to go to a different view/window/whatever) get regular
                          expressions right first time.

                          > >> It also tends to be slower than a single regular expression match.
                          > >
                          > > That's not my experience in the benchmarks I've done on various
                          > > operations over the years (in response to newsgroup questions). It
                          > > depends what exactly is being done, but often "hard-coded" string
                          > > operations are significantly faster. That makes sense, as they're
                          > > (each) less generalised.
                          >
                          > In my opinion, someone who has a little knowledge of regular expressions and
                          > software engineering can sense where to use regular string operations and
                          > where to use regular expressions... If you ask me, I'll choose expressing
                          > rather than doing the work. Doing the work is always more error prone.

                          Of course, everyone in this thread probably thinks they can sense where
                          to use regular string operations and where to use regular expressions -
                          but come out with completely different answers.

                          And if you think that using a regular expression means you aren't doing
                          work, you're kidding yourself. There's a reason I see more questions
                          about regular expressions on the newsgroups than string operations -
                          and that reason is that regular expressions are relatively complex to
                          both read and write.

                          --
                          Jon Skeet - <skeet@pobox.com>
                          http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                          If replying to the group, please do not mail me too


                          • kevin  cline

                            #28
                            Re: string routines and code libraries


                            Jon wrote:
                            > kevin cline <kevin.cline@gmail.com> wrote:
                            > > > Do you use regular expressions every time you need to do more than one
                            > > > operation on a string then?
                            > >
                            > > Mostly, yes, I do. I've been using regular expressions for a long time
                            > > and it's easier for me to read and verify one regular expression than
                            > > to understand multiple calls to index and substring.
                            >
                            > Have all the other engineers who might read your code also been using
                            > regular expressions for that long?

                            >
                            > > Also, that sort of string manipulation is very easy to get wrong.
                            >
                            > Whereas no-one ever gets regular expressions wrong, I suppose? ;)

                            It's easier to get regular expressions right because they are usually
                            closer to the requirement. All I know is that I've seen a lot of buggy
                            string manipulation functions that could be easily performed with a
                            single regular expression.

                            > > With practice, you'll find that regular expressions are easy to
                            > > understand.
                            >
                            > Without practice, simple string calls are easy to understand, IME.

                            Individually, they are trivial to understand. But it's not so easy to
                            understand the purpose of five or six of them in a row, and usually not
                            at all easy to verify that the code is doing what it is supposed to do.

                            > Why
                            > should anyone who has to read my code also have to have years of
                            > experience with regular expressions?

                            I generally assume the other programmers on my team are competent
                            enough to read the documentation of library functions. It's not rocket
                            science, just basic computer science. An hour of study will save you
                            hundreds of hours of programming and debugging in the future.


                            • Jon Skeet [C# MVP]

                              #29
                              Re: string routines and code libraries

                              kevin cline <kevin.cline@gmail.com> wrote:
                              > > > Also, that sort of string manipulation is very easy to get wrong.
                              > >
                              > > Whereas no-one ever gets regular expressions wrong, I suppose? ;)
                              >
                              > It's easier to get regular expressions right because they are usually
                              > closer to the requirement. All I know is that I've seen a lot of buggy
                              > string manipulation functions that could be easily performed with a
                              > single regular expression.

                              And I've seen people going out of their way to use regular expressions
                              (often needing to ask for help because they can't get it right on their
                              own) when the code can be significantly simpler with just a few string
                              operations.

                              > > > With practice, you'll find that regular expressions are easy to
                              > > > understand.
                              > >
                              > > Without practice, simple string calls are easy to understand, IME.
                              >
                              > Individually, they are trivial to understand. But it's not so easy to
                              > understand the purpose of five or six of them in a row, and usually not
                              > at all easy to verify that the code is doing what it is supposed to do.

                              I see it's gone up from "more than one" to "five or six"...

                              Verification is necessary with either technique, and should involve
                              enough test cases to give confidence. I'd be a lot happier

                              > > Why
                              > > should anyone who has to read my code also have to have years of
                              > > experience with regular expressions?
                              >
                              > I generally assume the other programmers on my team are competent
                              > enough to read the documentation of library functions.

                              I think it's far more likely that people will know the *basic* library
                              functions (including string manipulations) than that they'll know the
                              details of the regular expression dialect used on every platform they
                              happen to come across.

                              Even when you know regular expressions, when they become even slightly
                              non-trivial they take a while to understand, IMO.

                              > It's not rocket science, just basic computer science. An hour of
                              > study will save you hundreds of hours of programming and debugging in
                              > the future.

                              I think we'll have to agree to disagree. Regular expressions certainly
                              have their place, but for me the bar for their use is much higher than
                              it is for you. I believe it's much easier to make a mistake -
                              particularly when changing the behaviour of a working regular
                              expression in a way which appears trivial at first sight, but where you
                              need to be careful about escaping, grouping etc.

                              --
                              Jon Skeet - <skeet@pobox.com>
                              http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                              If replying to the group, please do not mail me too


                              • Kevin Spencer

                                #30
                                Re: string routines and code libraries

                                This debate is all rather silly. The usefulness of Regular Expressions lies
                                in the purpose for which they were created. That is, pattern-matching. There
                                is quite a bit of difference between a string and a pattern. A string (in
                                the purely non-oop sense of the word) is a literal array of char. Each
                                character in it is a specific character, having a specific value. A pattern,
                                on the other hand, is a non-specific set of rules for determining whether or
                                not a given string (or substring of a string) satisfies the rules laid out
                                by the pattern.

                                When parsing a string for a string (as in the problem which sparked this
                                discussion), obviously you are not looking for a pattern. You are looking
                                for a string. A Regular Expression carries overhead with it which makes it
                                less optimal for this sort of use. Why use a sledge hammer to hammer a nail?

                                On the other hand, when parsing a string for one or more patterns, the
                                Regular Expression is the optimal tool for this sort of use. Regular
                                Expressions are designed using the most efficient algorithm for
                                pattern-matching. While one could certainly write the same algorithm in C#,
                                why build a sledgehammer when you already have one in your toolbox?
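
Editor's sketch: the string-versus-pattern distinction can be shown in Python. Looking for a specific string needs only a substring search, while looking for "anything shaped like a date" is a pattern. The date format, the sample text and both function names are hypothetical.

```python
import re

# A pattern: any run shaped like YYYY-MM-DD, not one specific string.
_ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_literal(text, needle):
    """Looking for a string: index of the first occurrence, or -1 if absent."""
    return text.find(needle)


def find_dates(text):
    """Looking for a pattern: every substring that satisfies the date rule."""
    return _ISO_DATE.findall(text)
```

For example, `find_literal("error: disk full", "disk")` needs no pattern engine at all, whereas `find_dates("built 2005-10-24, shipped 2005-11-07")` would take a hand-written scanning loop to do without one.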

                                So, how about we shake hands and make up here, and move on to more important
                                matters? :-D

                                --
                                HTH,

                                Kevin Spencer
                                Microsoft MVP
                                .Net Developer
                                A watched clock never boils.

                                "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
                                news:MPG.1dca819a78aa7e6e98c9c7@msnews.microsoft.com...
                                > kevin cline <kevin.cline@gmail.com> wrote:
                                >> > > Also, that sort of string manipulation is very easy to get wrong.
                                >> >
                                >> > Whereas no-one ever gets regular expressions wrong, I suppose? ;)
                                >>
                                >> It's easier to get regular expressions right because they are usually
                                >> closer to the requirement. All I know is that I've seen a lot of buggy
                                >> string manipulation functions that could be easily performed with a
                                >> single regular expression.
                                >
                                > And I've seen people going out of their way to use regular expressions
                                > (often needing to ask for help because they can't get it right on their
                                > own) when the code can be significantly simpler with just a few string
                                > operations.
                                >
                                >> > > With practice, you'll find that regular expressions are easy to
                                >> > > understand.
                                >> >
                                >> > Without practice, simple string calls are easy to understand, IME.
                                >>
                                >> Individually, they are trivial to understand. But it's not so easy to
                                >> understand the purpose of five or six of them in a row, and usually not
                                >> at all easy to verify that the code is doing what it is supposed to do.
                                >
                                > I see it's gone up from "more than one" to "five or six"...
                                >
                                > Verification is necessary with either technique, and should involve
                                > enough test cases to give confidence. I'd be a lot happier
                                >
                                >> > which
                                >> > should anyone who has to read my code also have to have years of
                                >> > experience with regular expressions?
                                >>
                                >> I generally assume the other programmers on my team are competent
                                >> enough to read the documentation of library functions.
                                >
                                > I think it's far more likely that people will know the *basic* library
                                > functions (including string manipulations) than that they'll know the
                                > details of the regular expression dialect used on every platform they
                                > happen to come across.
                                >
                                > Even when you know regular expressions, when they become even slightly
                                > non-trivial they take a while to understand, IMO.
                                >
                                >> It's not rocket science, just basic computer science. An hour of
                                >> study will save you hundreds of hours of programming and debugging in
                                >> the future.
                                >
                                > I think we'll have to agree to disagree. Regular expressions certainly
                                > have their place, but for me the bar for their use is much higher than
                                > it is for you. I believe it's much easier to make a mistake -
                                > particularly when changing the behaviour of a working regular
                                > expression in a way which appears trivial at first sight, but where you
                                > need to be careful about escaping, grouping etc.
                                >
                                > --
                                > Jon Skeet - <skeet@pobox.com>
                                > http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                                > If replying to the group, please do not mail me too

