matching objects by a tuple field criterion

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • bullockbefriending bard

    matching objects by a tuple field criterion

    i have a large collection of python objects, each of which contains an
    integer 6-tuple as part of its data payload. what i need to be able to
    do is select only those objects which meet a simple tuple element
    wildcard matching criterion. e.g. given the following python objects:

    object A includes tuple (1,2,3,4,5,6)
    object B includes tuple (1,4,4,4,11,1)
    object C includes tuple (1,3,9,1,1,1)

    all tuples are unique. for what it's worth, the values in each field
    are independent of the other fields and range from 1 to 14. although
    'large', my collection is sparse cf. the 14^6 possible tuples.

    i want to search on *one only* tuple field/value. if my search
    criterion is (*,*,*,4,*,*), then i want to find object A and object B.
    if (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only
    ever specify an integer match for one tuple field.

    i can think of some naive approaches, but is there an elegant way to
    do this?

  • Steven D'Aprano

    #2
    Re: matching objects by a tuple field criterion

    On Sun, 10 Jun 2007 03:58:44 -0700, bullockbefriend ing bard wrote:
    i have a large collection of python objects, each of which contains an
    integer 6-tuple as part of its data payload. what i need to be able to
    do is select only those objects which meet a simple tuple element
    wildcard matching criterion. e.g. given the following python objects:
    >
    object A includes tuple (1,2,3,4,5,6)
    object B includes tuple (1,4,4,4,11,1)
    object C includes tuple (1,3,9,1,1,1)
    >
    all tuples are unique. for what it's worth, the values in each field
    are independent of the other fields and range from 1 to 14. although
    'large', my collection is sparse cf. the 14^6 possible tuples.
    >
    i want to search on *one only* tuple field/value. if my search
    criterion is (*,*,*,4,*,*), then i want to find object A and object B.
    if (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only
    ever specify an integer match for one tuple field.
    >
    i can think of some naive approaches, but is there an elegant way to
    do this?
    Instead of passing a wild-card tuple like (*,*,*,4,*,*) simply pass the
    integer you want to match and the position you want to match it in.

    As a generator:


    def matcher(list_of _objects, what, where):
    for obj in list_of_objects :
    if obj.data[where] == what:
    yield obj


    As a generator expression:

    (obj for obj in list_of_objects if obj.data[what] == where)



    --
    Steven.

    Comment

    • Diez B. Roggisch

      #3
      Re: matching objects by a tuple field criterion

      bullockbefriend ing bard schrieb:
      i have a large collection of python objects, each of which contains an
      integer 6-tuple as part of its data payload. what i need to be able to
      do is select only those objects which meet a simple tuple element
      wildcard matching criterion. e.g. given the following python objects:
      >
      object A includes tuple (1,2,3,4,5,6)
      object B includes tuple (1,4,4,4,11,1)
      object C includes tuple (1,3,9,1,1,1)
      >
      all tuples are unique. for what it's worth, the values in each field
      are independent of the other fields and range from 1 to 14. although
      'large', my collection is sparse cf. the 14^6 possible tuples.
      >
      i want to search on *one only* tuple field/value. if my search
      criterion is (*,*,*,4,*,*), then i want to find object A and object B.
      if (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only
      ever specify an integer match for one tuple field.
      >
      i can think of some naive approaches, but is there an elegant way to
      do this?
      Depends on what you find elegant. Are the criteria runtime-specified,
      and is anything other than the * allowerd, e.g. [1,3,8]? IMHO the best
      thing is to create a filter-function that you then use to... filter :)

      Like this:

      def create_filter_p redicate(criter ia):
      """
      criteria is an iterable containing either
      an '*' or a list of comma-separated
      integers
      """
      sub_preds = []
      for i, sub_crit in enumerate(crite ria):
      if sub_crit == "*":
      continue
      matching_set = set(int(n) for n in sub_crit.split( ","))
      sub_preds.appen d((i, matching_set))
      def predicate(o):
      t = o.my_tuple
      for i, ms in sub_preds:
      if not t[i] in ms:
      return False
      return True
      return predicate

      Diez

      Comment

      • Diez B. Roggisch

        #4
        Re: matching objects by a tuple field criterion

        Diez B. Roggisch schrieb:
        bullockbefriend ing bard schrieb:
        >i have a large collection of python objects, each of which contains an
        >integer 6-tuple as part of its data payload. what i need to be able to
        >do is select only those objects which meet a simple tuple element
        >wildcard matching criterion. e.g. given the following python objects:
        >>
        > object A includes tuple (1,2,3,4,5,6)
        > object B includes tuple (1,4,4,4,11,1)
        > object C includes tuple (1,3,9,1,1,1)
        >>
        >all tuples are unique. for what it's worth, the values in each field
        >are independent of the other fields and range from 1 to 14. although
        >'large', my collection is sparse cf. the 14^6 possible tuples.
        >>
        >i want to search on *one only* tuple field/value. if my search
        >criterion is (*,*,*,4,*,*), then i want to find object A and object B.
        >if (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only
        >ever specify an integer match for one tuple field.
        >>
        >i can think of some naive approaches, but is there an elegant way to
        >do this?
        >
        Depends on what you find elegant. Are the criteria runtime-specified,
        and is anything other than the * allowerd, e.g. [1,3,8]? IMHO the best
        thing is to create a filter-function that you then use to... filter :)
        >
        Like this:
        >
        def create_filter_p redicate(criter ia):
        """
        criteria is an iterable containing either
        an '*' or a list of comma-separated
        integers
        """
        sub_preds = []
        for i, sub_crit in enumerate(crite ria):
        if sub_crit == "*":
        continue
        matching_set = set(int(n) for n in sub_crit.split( ","))
        sub_preds.appen d((i, matching_set))
        def predicate(o):
        t = o.my_tuple
        for i, ms in sub_preds:
        if not t[i] in ms:
        return False
        return True
        return predicate
        Obviously the docs are faulty - the criteria looks like this:

        ['*', '1,2,3']

        But I think you can get the gist of it - regardless on how the actual
        criteria are entered.

        Diez

        Comment

        • bullockbefriending bard

          #5
          Re: matching objects by a tuple field criterion

          Instead of passing a wild-card tuple like (*,*,*,4,*,*) simply pass the
          integer you want to match and the position you want to match it in.
          for sure. that was more for expository purpose rather than how i was
          planning to go about it.

          As a generator expression:
          >
          (obj for obj in list_of_objects if obj.data[what] == where)
          above or equivalent list comprehension was what i had in mind as far
          as linear search goes. and scanning the list like this will most
          likely be 'good enough' performance-wise. however, partly just out of
          curiosity, i was wondering if there is some kind of data structure
          which might let me find all the matches a bit faster.

          Comment

          • Diez B. Roggisch

            #6
            Re: matching objects by a tuple field criterion

            bullockbefriend ing bard schrieb:
            >Instead of passing a wild-card tuple like (*,*,*,4,*,*) simply pass the
            >integer you want to match and the position you want to match it in.
            >
            for sure. that was more for expository purpose rather than how i was
            planning to go about it.
            >
            >
            >As a generator expression:
            >>
            >(obj for obj in list_of_objects if obj.data[what] == where)
            >
            above or equivalent list comprehension was what i had in mind as far
            as linear search goes. and scanning the list like this will most
            likely be 'good enough' performance-wise. however, partly just out of
            curiosity, i was wondering if there is some kind of data structure
            which might let me find all the matches a bit faster.
            You can of course create a tree from the tuples, where the first level
            of nodes corresponds to the first attribute of the tuple and so forth.

            There are certainly cases where the speedup is tremendous - think of a
            single integer in the first criteria - but then the overall performance
            depends on the real-live queries. If lot's of wildcards are used, you
            might end up slower if the tree-walk takes more time than the
            C-implemented list-iteration will cost you.

            Diez

            Comment

            • John Machin

              #7
              Re: matching objects by a tuple field criterion

              On Jun 10, 8:58 pm, bullockbefriend ing bard <kinch1...@gmai l.com>
              wrote:
              i have a large collection of python objects, each of which contains an
              integer 6-tuple as part of its data payload. what i need to be able to
              do is select only those objects which meet a simple tuple element
              wildcard matching criterion. e.g. given the following python objects:
              >
              object A includes tuple (1,2,3,4,5,6)
              object B includes tuple (1,4,4,4,11,1)
              object C includes tuple (1,3,9,1,1,1)
              >
              all tuples are unique. for what it's worth, the values in each field
              are independent of the other fields and range from 1 to 14. although
              'large', my collection is sparse cf. the 14^6 possible tuples.
              >
              i want to search on *one only* tuple field/value. if my search
              criterion is (*,*,*,4,*,*), then i want to find object A and object B.
              if (1,*,*,*,*,*), i want to find objects A, B, and C, etc. i will only
              ever specify an integer match for one tuple field.
              >
              i can think of some naive approaches, but is there an elegant way to
              do this?
              You are going to have to tell us how many is in a "large" collection,
              how often you will do this query, how much memory you have to spare,
              how fast you want the answer back, whether you want the answer as a
              generator, or in a list/tuple/set/whatever -- or you are going to have
              to do a fair bit of prototyping and benchmarking.

              Steven has given you one end of the spectrum: a naive serial scan. Now
              I'll give you another end: a naive pre-build all possible answers
              method:

              [quite untested]
              # build 14x6 array of empty sets
              answer = [[set() for what in range(14)] for slot in range(6)]
              # populate the sucker
              for obj in obj_list:
              for slot in range(6):
              answer[slot][obj.data[slot]-1].add(obj)

              Later, the answer to 'Which objects have the value 11 in slot 3?' is
              answer[3][11-1]

              HTH,
              John

              Comment

              • bullockbefriending bard

                #8
                Re: matching objects by a tuple field criterion

                quite so, i rephrased docstring to be:

                """criteria is an iterable containing either '*' instances or strings
                of comma-separated integers. e.g. ['*','1,2,3', '11,12']"""

                thanks very much for the idea! upon further reflection, this seems to
                be a more elegant solution for my case than the ad-hoc generator or
                list comprehension approach. this is because i *do* have to filter
                data based on multiple single field criteria and i know all of these
                criteria at the same time - so it makes a lot of sense to do as you
                have done and then only do one filter operation to pull out all the
                objects i am interested in.

                Comment

                • bullockbefriending bard

                  #9
                  Re: matching objects by a tuple field criterion

                  There are certainly cases where the speedup is tremendous - think of a
                  single integer in the first criteria - but then the overall performance
                  depends on the real-live queries. If lot's of wildcards are used, you
                  might end up slower if the tree-walk takes more time than the
                  C-implemented list-iteration will cost you.
                  hopefully, won't need to do this then. i'll test the filter approach
                  tonight. generally speaking, going to be mostly wildcards most of the
                  time.

                  Comment

                  • John Machin

                    #10
                    Re: matching objects by a tuple field criterion

                    On Jun 10, 10:32 pm, bullockbefriend ing bard <kinch1...@gmai l.com>
                    wrote:
                    quite so, i rephrased docstring to be:
                    >
                    """criteria is an iterable containing either '*' instances or strings
                    of comma-separated integers. e.g. ['*','1,2,3', '11,12']"""
                    >
                    thanks very much for the idea! upon further reflection, this seems to
                    be a more elegant solution for my case than the ad-hoc generator or
                    list comprehension approach. this is because i *do* have to filter
                    data based on multiple single field criteria and i know all of these
                    criteria at the same time - so it makes a lot of sense to do as you
                    have done and then only do one filter operation to pull out all the
                    objects i am interested in.
                    """i want to search on *one only* tuple field/value."""
                    """i will only ever specify an integer match for one tuple field."""
                    """the cheque's in the mail"""
                    """of course i'll still love you in the morning"""

                    So augment my pre-built approach by intersection (or union? your
                    wording is somewhat vague) of the selected answer sets.


                    Comment

                    Working...