Serious Bug System.Collections Sort

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • william.hooper@gmail.com

    Serious Bug System.Collections Sort

    There is a longer article about this subject here:

    See the main article and the reply thread started by Robert Rohde.

    Alternatively look at this code:

    ArrayList a=new ArrayList();

    string s1 = "-0.67:-0.33:0.33";
    string s2 = "0.67:-0.33:0.33";
    string s3 = "-0.67:0.33:-0.33";

    a.Add(s1);
    a.Add(s2);
    a.Add(s3);

    a.Sort();
    for (int i=0; i<3; i++) Console.WriteLi ne( a[i] );

    Console.WriteLi ne();

    a.Clear();
    a.Add(s1);
    a.Add(s3);
    a.Add(s2);

    a.Sort();
    for (int i=0; i<3; i++) Console.WriteLi ne( a[i] );

    This code produces the following six lines of output:

    -0.67:0.33:-0.33
    0.67:-0.33:0.33
    -0.67:-0.33:0.33

    -0.67:-0.33:0.33
    -0.67:0.33:-0.33
    0.67:-0.33:0.33

    Note that the .Sort produces different outputs depending on the order
    the strings are added.

    It looks like the Sort algorithm is ignoring the "-" mark.

    This is a very serious Bug impacting the System.Collecti ons Array,
    SortedList etc.

  • Marc Gravell

    #2
    Re: Serious Bug System.Collecti ons Sort

    This appears to relate to the culture-specific comparer, which is presumably
    ignoring symbols (one of the CompareOptions flags); perhaps switch to
    ordinal comparison, which resolves this. You may be able to use
    StringComparer. Ordinal; can't remember if that exists in 1.1, but if not
    something like this should do (and use it in the Sort() calls).

    public class OrdinalStringCo mparer : IComparer
    {
    public readonly static OrdinalStringCo mparer Singleton = new
    OrdinalStringCo mparer();
    private OrdinalStringCo mparer() { }
    public int Compare(object x, object y)
    {
    return string.CompareO rdinal((string) x, (string) y);
    }
    }

    Marc


    Comment

    • william.hooper@gmail.com

      #3
      Re: Serious Bug System.Collecti ons Sort

      I don't agree....

      string s2 = "0.67:-0.33:0.33";
      string s3 = "-0.67:0.33:-0.33";

      Console.WriteLi ne( String.Compare( s2,s3));
      Console.WriteLi ne( String.Compare( s2,s3));

      returns -1 and 1 showing that the Sting.Compare function is working as
      expected.

      I don't think this is a culture issue and writing an ICompared for
      evxery System.Collecti ons string comparison is a nighmare.

      Comment

      • Marc Gravell

        #4
        Re: Serious Bug System.Collecti ons Sort

        Well... it would appear that the problem is a dodgy comparer; try this
        (using your previous values for s1, s2, s3):

        Console.WriteLi ne(Comparer.Def ault.Compare(s1 , s2));
        Console.WriteLi ne(Comparer.Def ault.Compare(s2 , s3));
        Console.WriteLi ne(Comparer.Def ault.Compare(s3 , s1));

        Yields 1, 1, 1 meaning there is a comparer loop. Oops! Anybody [MS?] want to
        comment on whether this is intentional or a fubar? It would appear to
        violate the "transitive " rule of comparers, which should be enforced for
        non-zero results (0 being transitive as long as it agrees in each
        direction).

        Actually, it's quite lucky that this returns at all! Perhaps there is a
        panic "oops, shouldn't possibly have taken more than n^2 iterations...".

        Anyway, the point of my post was that this can be avoided using an ordinal
        comparer. And you can re-use the same one each time... no need to write
        anything more.

        Marc


        Comment

        • Chris Dunaway

          #5
          Re: Serious Bug System.Collecti ons Sort

          william.hooper@ gmail.com wrote:
          I don't agree....
          >
          string s2 = "0.67:-0.33:0.33";
          string s3 = "-0.67:0.33:-0.33";
          >
          Console.WriteLi ne( String.Compare( s2,s3));
          Console.WriteLi ne( String.Compare( s2,s3));
          >
          returns -1 and 1 showing that the Sting.Compare function is working as
          expected.
          >
          I don't think this is a culture issue and writing an ICompared for
          evxery System.Collecti ons string comparison is a nighmare.
          >From the docs, note how it mentions that the hyphen might be given a
          low weight so that similar words will sort together:

          "The .NET Framework supports word, string, and ordinal sort rules. A
          word sort performs a culture-sensitive comparison of strings in which
          certain nonalphanumeric Unicode characters might have special weights
          assigned to them. For example, the hyphen ("-") might have a very small
          weight assigned to it so that "coop" and "co-op" appear next to each
          other in a sorted list. A string sort is similar to a word sort, except
          that there are no special cases and all nonalphanumeric symbols come
          before all alphanumeric Unicode characters. An ordinal sort compares
          strings based on the numeric value of each Char in the string. For more
          information about word, string, and ordinal sort rules, see the
          System.Globaliz ation.CompareOp tions topic.

          Comparison and search procedures are case-sensitive by default and use
          the culture associated with the current thread unless specified
          otherwise. By definition, any string, including the empty string (""),
          compares greater than a null reference, and two null references compare
          equal to each other.

          If your application makes security decisions based on the result of a
          comparison or case change operation, then the operation should use the
          invariant culture to ensure the result is not affected by the value of
          the current culture. For more information, see the
          CultureInfo.Inv ariantCulture topic."

          I don't know if this is what is causing the behavior you reported, but
          at least it's worth looking into.

          Chris

          Comment

          • william.hooper@gmail.com

            #6
            Re: Serious Bug System.Collecti ons Sort

            Yes, Marc, sorry you are right.

            The code

            string s1 = "-0.67:-0.33:0.33";
            string s2 = "0.67:-0.33:0.33";
            string s3 = "-0.67:0.33:-0.33";

            Console.WriteLi ne( String.Compare( s1,s2));
            Console.WriteLi ne( String.Compare( s2,s3));
            Console.WriteLi ne( String.Compare( s3,s1));

            Console.WriteLi ne();

            Console.WriteLi ne( String.CompareO rdinal(s1, s2));
            Console.WriteLi ne( String.CompareO rdinal(s2, s3));
            Console.WriteLi ne( String.CompareO rdinal(s3, s1));

            returns

            1
            1
            1

            -3
            3
            3

            Ie String.Compare can not handle the "-" marks but
            String.CompareO rdinal can.

            Is there not some regional setting you can put once in the code so
            String.Compare and ArrayList.Sort all work that way from then on? Would
            much prefer this...

            ----

            Note it annoys me that the documentation writes:

            "The .NET Framework supports word, string, and ordinal sort
            rules....For example, the hyphen ("-") might have a very small weight
            assigned to it so that "coop" and "co-op" appear next to each other in
            a sorted list...."

            But in fact the hypen is being completly ignored rather than given a
            low weight.

            I think this behaviour by default is very undesirable.

            Comment

            • Chris Dunaway

              #7
              Re: Serious Bug System.Collecti ons Sort

              And FWIW, I didn't see this issue officially reported on the feedback
              site. You may wish to post it there.

              Comment

              • Marc Gravell

                #8
                Re: Serious Bug System.Collecti ons Sort

                Well, it isn't being ignored. If it was being ignored I would expect the
                result to be 0,0,0.
                I think this behaviour by default is very undesirable.
                I think it is buggy, but it isn't in System.Collecti ons - it is in
                String.CompareT o. I can't think of a valid occasion in a well-ordered system
                when a < b < c < a or a b c a. It just makes no logical sense unless
                you are Maurits Escher.

                Now, a = b = c = a = 0 I could live with (i.e. hyhpens completely ignored).

                Marc


                Comment

                • w.hooper@hotmail.com

                  #9
                  Re: Serious Bug System.Collecti ons Sort

                  If you play around with examples like this:

                  s1 = "0.67:0.33:-0.33";
                  s2 = "0.67:-0.33:0.33";
                  s3 = "-0.67:0.33:0.33" ;

                  Console.WriteLi ne( String.Compare( s1,s2));
                  Console.WriteLi ne( String.Compare( s2,s3));
                  Console.WriteLi ne( String.Compare( s3,s1));

                  It all looks OK. But when you put two hyphens into the lines the
                  String.Compare gets confused and gives silly results.

                  OK the String.CompareO rdinal function fixes the problem but this is not
                  behaviour by design surely?

                  We need a comment from our lord and master MS....

                  Comment

                  • Marc Gravell

                    #10
                    Re: Serious Bug System.Collecti ons Sort

                    Logged:


                    Marc


                    Comment

                    • w.hooper@hotmail.com

                      #11
                      Re: Serious Bug System.Collecti ons Sort

                      >Chris Dunaway wrote:
                      >
                      And FWIW, I didn't see this issue officially reported on the feedback
                      site. You may wish to post it there.
                      Chris, what is the feedback site? I would liek to post it there. It's
                      really annoying and you never know, they might take an interest.

                      Comment

                      • w.hooper@hotmail.com

                        #12
                        Re: Serious Bug System.Collecti ons Sort

                        Thanks very much that's great. I will keep an eye on it.

                        Comment

                        • Marc Gravell

                          #13
                          Re: Serious Bug System.Collecti ons Sort

                          See my previous post; since you didn't seem familiar with "connect" I posted
                          it as a bug. Feel free to go in and click "validate". .. and vote!


                          And note: I wasn't trying to steal your thunder... just to get it logged
                          with the least fuss...

                          Marc


                          Comment

                          • Marc Gravell

                            #14
                            Re: Serious Bug System.Collecti ons Sort

                            Still no MS viewpoint?

                            For ref, I think this is actually quite important, as it could (as
                            illustrated on the now-deleted CodeProject link) cause a whole range of
                            sort-critical operations to fail... SortedList etc, or any custom
                            collections that assume that .Sort() might actually work, and then
                            trust the results.

                            Any thoughts?

                            Marc

                            Comment

                            • william.hooper@gmail.com

                              #15
                              Re: Serious Bug System.Collecti ons Sort

                              Yes absolutely, it cost me lots of time in the SortedList failing. I
                              think it's to do with the hyphen algorithm which is designed to sort
                              words with hyphens in a smart way. Obviously with two hyphens in the
                              word it fails. Anyone who puts two hyphens in strings is taking a huge
                              risk - yet using hyphens in strings is common enough. A really
                              dangerous bug I would say, but I guess Microsoft haven't seen that
                              yet. You could mention on the article that it causes problems with
                              SortedList and you think it's a very serious problem. I think we have
                              discovered a real corker and it deserves to be given a lot of attention.

                              Comment

                              Working...