Search a list/array of objects for specified criteria?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Rick

    Search a list/array of objects for specified criteria?

    I have a large list of objects where each object has a unique
    (non-overlapping) date range. The list is sorted chronologically . What is
    the most efficient way to search this list for a single object that spans a
    specified date?


  • Oliver Sturm

    #2
    Re: Search a list/array of objects for specified criteria?

    Hello Rick,
    >I have a large list of objects where each object has a unique
    >(non-overlapping) date range. The list is sorted chronologically . What is
    >the most efficient way to search this list for a single object that spans
    >a specified date?
    My suggestion below... you mention that the list is sorted
    chronologically . I don't think that information is of any use here, as
    long as you don't have any good information about the distribution of the
    elements in the list, or the lengths of the spans covered by a typical
    element. Of course you could implement an algorithm that looks at a first
    element and tries to heuristically guess where the element you're looking
    for may be in the list - but without knowing the total range of elements,
    the number, the distribution and the lengths of the spans, I find it very
    hard to say whether the algorithm would be any more efficient than the
    basic one below.

    If you want efficiency without having good estimates of these details,
    maybe you should organize your data in a different way - using
    dictionaries for certain ranges of data, for instance. Once more it's hard
    to say what that structure should best look like, without knowing details
    about the data you're expecting.

    And finally, don't forget: it's usually better to write a reasonably good
    algorithm and to optimize it only when it proves to be a problem.



    [TestFixture]
    public class Tests {
    [Test]
    public void Test( ) {
    List<ClassValid WithinRangelist = new List<ClassValid WithinRange>(ne w
    ClassValidWithi nRange[] {
    new ClassValidWithi nRange(new DateTime(2007, 1, 1), new DateTime(2007,
    1,15)),
    new ClassValidWithi nRange(new DateTime(2007, 1, 16), new
    DateTime(2007, 1,31)),
    new ClassValidWithi nRange(new DateTime(2007, 2, 1), new DateTime(2007,
    2,28))
    });
    ClassValidWithi nRange checkVal = new ClassValidWithi nRange(new
    DateTime(2007, 3, 1), new DateTime(2007, 3, 31));
    list.Add(checkV al);

    ClassValidWithi nRange foundVal =
    list.Find(deleg ate(ClassValidW ithinRange entry) {
    return entry.IsValidOn (new DateTime(2007, 3, 15));
    });

    Assert.AreEqual (checkVal, foundVal);
    }
    }

    public class ClassValidWithi nRange {
    public ClassValidWithi nRange(DateTime from, DateTime to) {
    this.from = from;
    this.to = to;
    }
    public bool IsValidOn(DateT ime date) {
    // Implement with your choice of <, <=, and >=
    return from < date && to date;
    }
    private DateTime from;
    public DateTime From {
    get {
    return from;
    }
    set {
    from = value;
    }
    }
    private DateTime to;
    public DateTime To {
    get {
    return to;
    }
    set {
    to = value;
    }
    }
    }


    Oliver Sturm
    --

    Comment

    • Rick

      #3
      Re: Search a list/array of objects for specified criteria?

      Thanks Oliver.

      In short, you recommend using the Find method of the List<Tclass. This
      performs a linear search, which is basically what I'm doing now. It's
      acceptable but hardly optimal. I believe that a binary search could be more
      efficient, even where the distribution is uneven. The problem is that the
      BinarySearch methods provided by .NET don't seem to support searching using
      an IComparer alone.


      "Oliver Sturm" <oliver@sturmne t.orgwrote in message
      news:xn0f3jqdc1 akgb3008@msnews .microsoft.com. ..
      Hello Rick,
      >
      >>I have a large list of objects where each object has a unique
      >>(non-overlapping) date range. The list is sorted chronologically . What is
      >>the most efficient way to search this list for a single object that spans
      >>a specified date?
      >
      My suggestion below... you mention that the list is sorted
      chronologically . I don't think that information is of any use here, as
      long as you don't have any good information about the distribution of the
      elements in the list, or the lengths of the spans covered by a typical
      element. Of course you could implement an algorithm that looks at a first
      element and tries to heuristically guess where the element you're looking
      for may be in the list - but without knowing the total range of elements,
      the number, the distribution and the lengths of the spans, I find it very
      hard to say whether the algorithm would be any more efficient than the
      basic one below.
      >
      If you want efficiency without having good estimates of these details,
      maybe you should organize your data in a different way - using
      dictionaries for certain ranges of data, for instance. Once more it's hard
      to say what that structure should best look like, without knowing details
      about the data you're expecting.
      >
      And finally, don't forget: it's usually better to write a reasonably good
      algorithm and to optimize it only when it proves to be a problem.
      >
      >
      >
      [TestFixture]
      public class Tests {
      [Test]
      public void Test( ) {
      List<ClassValid WithinRangelist = new List<ClassValid WithinRange>(ne w
      ClassValidWithi nRange[] {
      new ClassValidWithi nRange(new DateTime(2007, 1, 1), new DateTime(2007,
      1,15)),
      new ClassValidWithi nRange(new DateTime(2007, 1, 16), new DateTime(2007,
      1,31)),
      new ClassValidWithi nRange(new DateTime(2007, 2, 1), new DateTime(2007,
      2,28))
      });
      ClassValidWithi nRange checkVal = new ClassValidWithi nRange(new
      DateTime(2007, 3, 1), new DateTime(2007, 3, 31));
      list.Add(checkV al);
      >
      ClassValidWithi nRange foundVal = list.Find(deleg ate(ClassValidW ithinRange
      entry) {
      return entry.IsValidOn (new DateTime(2007, 3, 15));
      });
      >
      Assert.AreEqual (checkVal, foundVal);
      }
      }
      >
      public class ClassValidWithi nRange {
      public ClassValidWithi nRange(DateTime from, DateTime to) {
      this.from = from;
      this.to = to;
      }
      public bool IsValidOn(DateT ime date) {
      // Implement with your choice of <, <=, and >=
      return from < date && to date;
      }
      private DateTime from;
      public DateTime From {
      get {
      return from;
      }
      set {
      from = value;
      }
      }
      private DateTime to;
      public DateTime To {
      get {
      return to;
      }
      set {
      to = value;
      }
      }
      }
      >
      >
      Oliver Sturm
      --
      http://www.sturmnet.org/blog

      Comment

      • Rick

        #4
        Re: Search a list/array of objects for specified criteria?

        I think I found an answer. I could create an IComparer that compares the
        upperbound of each date range. I can then use a BinarySearch to look for an
        object with a specific upperbound (equal to my desired date). It will return
        either the index of an exact match, or the bitwise complement of the index
        next highest object, or the bitwise complement of the list count.

        "Rick" <rick@nospam.co mwrote in message
        news:u6v$E2zYHH A.4396@TK2MSFTN GP06.phx.gbl...
        Thanks Oliver.
        >
        In short, you recommend using the Find method of the List<Tclass. This
        performs a linear search, which is basically what I'm doing now. It's
        acceptable but hardly optimal. I believe that a binary search could be
        more efficient, even where the distribution is uneven. The problem is that
        the BinarySearch methods provided by .NET don't seem to support searching
        using an IComparer alone.
        >
        >
        "Oliver Sturm" <oliver@sturmne t.orgwrote in message
        news:xn0f3jqdc1 akgb3008@msnews .microsoft.com. ..
        >Hello Rick,
        >>
        >>>I have a large list of objects where each object has a unique
        >>>(non-overlapping) date range. The list is sorted chronologically . What is
        >>>the most efficient way to search this list for a single object that spans
        >>>a specified date?
        >>
        >My suggestion below... you mention that the list is sorted
        >chronologicall y. I don't think that information is of any use here, as
        >long as you don't have any good information about the distribution of the
        >elements in the list, or the lengths of the spans covered by a typical
        >element. Of course you could implement an algorithm that looks at a first
        >element and tries to heuristically guess where the element you're looking
        >for may be in the list - but without knowing the total range of elements,
        >the number, the distribution and the lengths of the spans, I find it very
        >hard to say whether the algorithm would be any more efficient than the
        >basic one below.
        >>
        >If you want efficiency without having good estimates of these details,
        >maybe you should organize your data in a different way - using
        >dictionaries for certain ranges of data, for instance. Once more it's
        >hard to say what that structure should best look like, without knowing
        >details about the data you're expecting.
        >>
        >And finally, don't forget: it's usually better to write a reasonably good
        >algorithm and to optimize it only when it proves to be a problem.
        >>
        >>
        >>
        >[TestFixture]
        >public class Tests {
        >[Test]
        >public void Test( ) {
        >List<ClassVali dWithinRangelis t = new List<ClassValid WithinRange>(ne w
        >ClassValidWith inRange[] {
        >new ClassValidWithi nRange(new DateTime(2007, 1, 1), new DateTime(2007,
        >1,15)),
        >new ClassValidWithi nRange(new DateTime(2007, 1, 16), new DateTime(2007,
        >1,31)),
        >new ClassValidWithi nRange(new DateTime(2007, 2, 1), new DateTime(2007,
        >2,28))
        >});
        >ClassValidWith inRange checkVal = new ClassValidWithi nRange(new
        >DateTime(200 7, 3, 1), new DateTime(2007, 3, 31));
        >list.Add(check Val);
        >>
        >ClassValidWith inRange foundVal = list.Find(deleg ate(ClassValidW ithinRange
        >entry) {
        >return entry.IsValidOn (new DateTime(2007, 3, 15));
        >});
        >>
        >Assert.AreEqua l(checkVal, foundVal);
        >}
        >}
        >>
        >public class ClassValidWithi nRange {
        >public ClassValidWithi nRange(DateTime from, DateTime to) {
        >this.from = from;
        >this.to = to;
        >}
        >public bool IsValidOn(DateT ime date) {
        >// Implement with your choice of <, <=, and >=
        >return from < date && to date;
        >}
        >private DateTime from;
        >public DateTime From {
        >get {
        >return from;
        >}
        >set {
        >from = value;
        >}
        >}
        >private DateTime to;
        >public DateTime To {
        >get {
        >return to;
        >}
        >set {
        >to = value;
        >}
        >}
        >}
        >>
        >>
        > Oliver Sturm
        >--
        >http://www.sturmnet.org/blog
        >
        >

        Comment

        • Oliver Sturm

          #5
          Re: Search a list/array of objects for specified criteria?

          Hello Rick,
          >In short, you recommend using the Find method of the List<Tclass. This
          >performs a linear search, which is basically what I'm doing now. It's
          >acceptable but hardly optimal. I believe that a binary search could be
          >more efficient, even where the distribution is uneven.
          Hm... I'm not sure really - it's a mathematical exercise to prove whether
          it would be more efficient under the circumstances or not. There are two
          problems:

          * the distribution may be seriously uneven, we (I at least) don't have any information on that
          * the decision of whether or not a date falls in a given range is not only dependent on one distributed value, but on two different ones.

          The second problem brings the length of ranges into the equation - if
          there is some regularity to this, the task of estimating efficiency may
          become easier.

          In the end I think that theoretically a binary search could improve
          efficiency, if done right. But as we seem to know practically nothing
          about the content of the data list, it would be rather hard to implement
          the binary search in a way that is more efficient on average than a linear
          search. And of course there's always the storage to think about... linear
          search could be made more efficient very easily by splitting the list
          based on years, or something similar.

          Other things I would think about... I think a linear search algorithm
          should probably be efficient enough for a large majority of use cases. In
          those cases where the number of objects would be too large to use linear
          searching efficiently, it seems like a good question to ask why these huge
          numbers of objects are being handled in-memory in the first place - a
          relational database could make the search operation much easier and a lot
          faster in these particular cases.


          Oliver Sturm
          --

          Comment

          • Oliver Sturm

            #6
            Re: Search a list/array of objects for specified criteria?

            Hello Rick,
            >I think I found an answer. I could create an IComparer that compares the
            >upperbound of each date range. I can then use a BinarySearch to look for
            >an object with a specific upperbound (equal to my desired date). It will
            >return either the index of an exact match, or the bitwise complement of
            >the index next highest object, or the bitwise complement of the list count.
            Have fun :-)


            Oliver Sturm
            --

            Comment

            • Bill Butler

              #7
              Re: Search a list/array of objects for specified criteria?

              "Oliver Sturm" <oliver@sturmne t.orgwrote in message
              news:xn0f3jru11 cjl6j009@msnews .microsoft.com. ..
              Hello Rick,
              >
              >>In short, you recommend using the Find method of the List<Tclass.
              >>This performs a linear search, which is basically what I'm doing now.
              >>It's acceptable but hardly optimal. I believe that a binary search
              >>could be more efficient, even where the distribution is uneven.
              >
              Hm... I'm not sure really - it's a mathematical exercise to prove
              whether it would be more efficient under the circumstances or not.
              There are two problems:
              >
              * the distribution may be seriously uneven, we (I at least) don't
              have any information on that
              * the decision of whether or not a date falls in a given range is not
              only dependent on one distributed value, but on two different ones.
              Neither of these points will effect the speed of the search in the
              slightest.
              Remember, we are not talking about a (possibly imbalanced) binary tree.
              We are talking about using a binary search algorithm over a sorted list.
              That algorithm will give you the same performance as a perfectly
              balanced tree. Uneven distributions of the data will not impact the
              search in the slightest.

              Also remember that Rick said:
              <quote>
              I have a large list of objects where each object has a unique
              (non-overlapping) date range. The list is sorted chronologically
              </quote>

              Since the date ranges do NOT overlap there is a clear sequence of data
              points
              obj1.Start
              obj1.End
              obj2.Start
              obj2.End
              ....

              So, You perform a Binary search using either Start or end (the choice is
              irrelevant). This seach will go as O(logN). If you were lucky enough to
              hit the date on the money, then you are done. If you did not find the
              exact date (far more likely), the algorithm will return a negative
              number, which is the bitwise complement of the index of the next
              element. In other words, the algorithm has in effect told you which item
              in the Array you need to examine (off by one is more like it). Then it
              is a simple matter of seeing if your date falls within the Start and End
              dates for the object in question.

              Of course, unless the number of data points is substantial, the total
              time would be negligable in either case.

              Hope this helps
              Bill




              <snip>



              Comment

              • Oliver Sturm

                #8
                Re: Search a list/array of objects for specified criteria?

                Hello Bill,

                I understand I was probably mixing up some things here - my ideas were
                targeted at a more efficient algorithm that would take the structure into
                account, based on the thought that the amount of data in question is so
                vast that we start running into problems with the linear search to begin
                with.


                Oliver Sturm
                --

                Comment

                • Bill Butler

                  #9
                  Re: Search a list/array of objects for specified criteria?


                  "Oliver Sturm" <oliver@sturmne t.orgwrote in message
                  news:xn0f3jwmv1 j1cxo00f@msnews .microsoft.com. ..
                  Hello Bill,
                  >
                  I understand I was probably mixing up some things here - my ideas were
                  targeted at a more efficient algorithm that would take the structure
                  into account, based on the thought that the amount of data in question
                  is so vast that we start running into problems with the linear search
                  to begin with.
                  >

                  You do bring up a good point.
                  Knowledge of your data, and how it is likely to be searched can go a
                  LONG way towards picking the best approach.

                  If most of the time you are accessing data near the ends of the range
                  the linear search may in fact outperform the binary search which does
                  better for random access.

                  If the number of data points is not large, it doesn't matter how you
                  search, since the time is negligible in either case. Unless you are
                  sitting in a tight loop processing requests continuously that is.

                  Bill


                  Comment

                  • Oliver Sturm

                    #10
                    Re: Search a list/array of objects for specified criteria?

                    Hello Bill,
                    >Knowledge of your data, and how it is likely to be searched can go a LONG
                    >way towards picking the best approach.
                    That's what I had in mind - maybe too much so, granted.
                    >If the number of data points is not large, it doesn't matter how you
                    >search, since the time is negligible in either case. Unless you are
                    >sitting in a tight loop processing requests continuously that is.
                    Oh, of course. I didn't mean my comments somewhere on the thread to say
                    that it's meaningless to search for efficiency in this regard. But I find
                    it significant to think about the best way to achieve efficiency - for
                    instance by changing the storage structure instead of optimizing searches
                    -, and this depends on the use case a lot.


                    Oliver Sturm
                    --

                    Comment

                    Working...