Efficient Uniqueness Check

  • Alan Little

    Efficient Uniqueness Check

    I have affiliates submitting batches of anywhere from 10 to several
    hundred orders. Each order in the batch must include an order ID,
    originated by the affiliate, which must be unique across all orders in
    all batches ever submitted by that affiliate. I'm trying to figure out
    the most efficient way to check the uniqueness of the order ID.

    Order data is being submitted to Zen Cart, and also stored in custom
    tables. I have created a unique key in the custom tables, consisting of
    affiliate ID and order ID, but that's not workable because the custom
    tables need data from ZC, and I don't really want to submit the data to
    ZC until I've checked the order IDs, plus part of the batch might already
    be stored before I hit a dup, and would then have to be deleted.

    There are currently about a half dozen affiliates; I have no idea how
    many there will be ultimately. The client I'm developing this for thinks
    1-200. Each affiliate can potentially have several thousand order IDs
    that need to be checked.

    The two ways I'm considering are: 1) a single query to load all the
    affiliate's order IDs into an array for checking as I process the batch;
    2) creating a temporary table with just the affiliate's IDs, querying it
    for each order in the batch.

    Any ideas which would be more efficient? Would there be any significant
    difference either way?
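    Option 1 can be sketched roughly like this (function name and sample data are made up for illustration; in real code the existing IDs would come from one SELECT on the custom table, filtered by affiliate ID):

    ```php
    <?php
    // Sketch of option 1: pull the affiliate's existing order IDs once,
    // index them as array keys (hash lookups), then scan the batch.
    // In real code $existingIds would come from something like:
    //   SELECT order_id FROM custom_orders WHERE affiliate_id = ?
    function findDuplicates(array $existingIds, array $batchIds): array
    {
        // Flip the existing IDs into keys for O(1) isset() checks.
        $seen = array_fill_keys($existingIds, true);
        $dups = [];
        foreach ($batchIds as $id) {
            if (isset($seen[$id])) {
                $dups[] = $id;     // already used by this affiliate
            }
            $seen[$id] = true;     // also catches dups within the batch itself
        }
        return $dups;
    }

    $existing = ['0001', '0002', '0003'];
    $batch    = ['0004', '0002', '0005', '0004'];
    print_r(findDuplicates($existing, $batch)); // '0002', plus the repeated '0004'
    ```

    Checking against array keys rather than with in_array() matters once the history reaches tens of thousands of IDs.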

    --
    Alan Little
    Phorm PHP Form Processor

  • petersprc@gmail.com

    #2
    Re: Efficient Uniqueness Check

    Perhaps this could work: If you keep the submissions in a set of
    staging tables, then your unique key would work. In this scenario, you
    wouldn't delete the submissions after they've been processed. Keep them
    around to check uniqueness of future submissions. You might be able to
    add a bit to the staging table (submission.committed) which would let
    you select the new items only and pass them along to ZC.
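    The staging-table idea could look something like this (table and column names are assumptions, and SQLite in memory stands in for MySQL):

    ```php
    <?php
    // Staging-table sketch: rows stay in the table after processing, so the
    // unique key keeps checking future submissions; a `committed` flag marks
    // what has already been passed along to ZC.
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $db->exec('CREATE TABLE staging (
        affiliate_id TEXT NOT NULL,
        order_id     TEXT NOT NULL,
        committed    INTEGER NOT NULL DEFAULT 0,
        UNIQUE (affiliate_id, order_id)
    )');

    // A new submission lands uncommitted.
    $ins = $db->prepare('INSERT INTO staging (affiliate_id, order_id) VALUES (?, ?)');
    $ins->execute(['jack', '0001']);
    $ins->execute(['jack', '0002']);

    // Only the uncommitted rows get passed along to ZC...
    $new = $db->query('SELECT order_id FROM staging WHERE committed = 0')
              ->fetchAll(PDO::FETCH_COLUMN);

    // ...then they are flagged, so later batches still see them for
    // uniqueness but they are never forwarded twice.
    $db->exec('UPDATE staging SET committed = 1 WHERE committed = 0');
    print_r($new);
    ```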

    Of the things you're considering, a single query would be more
    scalable, but it's likely you won't notice much difference at all until
    the volume goes up.

    Alan Little wrote:
    > I have affiliates submitting batches of anywhere from 10 to several
    > hundred orders. Each order in the batch must include an order ID,
    > originated by the affiliate, which must be unique across all orders in
    > all batches ever submitted by that affiliate. I'm trying to figure out
    > the most efficient way to check the uniqueness of the order ID.
    >
    > Order data is being submitted to Zen Cart, and also stored in custom
    > tables. I have created a unique key in the custom tables, consisting of
    > affiliate ID and order ID, but that's not workable because the custom
    > tables need data from ZC, and I don't really want to submit the data to
    > ZC until I've checked the order IDs, plus part of the batch might already
    > be stored before I hit a dup, and would then have to be deleted.
    >
    > There are currently about a half dozen affiliates; I have no idea how
    > many there will be ultimately. The client I'm developing this for thinks
    > 1-200. Each affiliate can potentially have several thousand order IDs
    > that need to be checked.
    >
    > The two ways I'm considering are: 1) a single query to load all the
    > affiliate's order IDs into an array for checking as I process the batch;
    > 2) creating a temporary table with just the affiliate's IDs, querying it
    > for each order in the batch.
    >
    > Any ideas which would be more efficient? Would there be any significant
    > difference either way?
    >
    > --
    > Alan Little
    > Phorm PHP Form Processor
    > http://www.phorm.com/


    • Alan Little

      #3
      Re: Efficient Uniqueness Check

      Carved in mystic runes upon the very living rock, the last words of
      <petersprc@gmail.com> of comp.lang.php make plain:
      > Alan Little wrote:
      >> I have affiliates submitting batches of anywhere from 10 to several
      >> hundred orders. Each order in the batch must include an order ID,
      >> originated by the affiliate, which must be unique across all orders
      >> in all batches ever submitted by that affiliate. I'm trying to figure
      >> out the most efficient way to check the uniqueness of the order ID.
      >>
      >> The two ways I'm considering are: 1) a single query to load all the
      >> affiliate's order IDs into an array for checking as I process the
      >> batch; 2) creating a temporary table with just the affiliate's IDs,
      >> querying it for each order in the batch.
      >>
      >> Any ideas which would be more efficient? Would there be any
      >> significant difference either way?
      >
      > Perhaps this could work: If you keep the submissions in a set of
      > staging tables, then your unique key would work.

      Thanks, but that isn't really practical; all orders are maintained in the
      orders table, so inserting them into an intermediate table would just be
      another step. Using a single table, I would have to flag the records as
      temporary as I added them, then remove the flag upon successful
      completion, or delete the flagged records upon failure. With an
      intermediate table, I wouldn't have the deletion problem, but I would
      still then have to copy them to the permanent table upon success, which
      (one would hope) would be the majority of cases.

      > Of the things you're considering, a single query would be more
      > scalable, but it's likely you won't notice much difference at all
      > until the volume goes up.

      That's kind of what I figured: even with on the order of tens of
      thousands of past records, it wouldn't make much difference either way.
      I just wondered about the efficiency of having to loop over all those
      records from the single query and load them into an array, but I think
      that would be the most efficient.

      --
      Alan Little
      Phorm PHP Form Processor


      • Anonymous

        #4
        Re: Efficient Uniqueness Check

        Alan Little wrote:
        > Carved in mystic runes upon the very living rock, the last words of
        > <petersprc@gmail.com> of comp.lang.php make plain:
        >> Alan Little wrote:
        >>> I have affiliates submitting batches of anywhere from 10 to several
        >>> hundred orders. Each order in the batch must include an order ID,
        >>> originated by the affiliate, which must be unique across all orders
        >>> in all batches ever submitted by that affiliate. I'm trying to figure
        >>> out the most efficient way to check the uniqueness of the order ID.
        >>>
        >>> The two ways I'm considering are: 1) a single query to load all the
        >>> affiliate's order IDs into an array for checking as I process the
        >>> batch; 2) creating a temporary table with just the affiliate's IDs,
        >>> querying it for each order in the batch.
        >>>
        >>> Any ideas which would be more efficient? Would there be any
        >>> significant difference either way?
        >> Perhaps this could work: If you keep the submissions in a set of
        >> staging tables, then your unique key would work.
        >
        > Thanks, but that isn't really practical; all orders are maintained in
        > the orders table, so inserting them into an intermediate table would
        > just be another step. Using a single table, I would have to flag the
        > records as temporary as I added them, then remove the flag upon
        > successful completion, or delete the flagged records upon failure.
        > With an intermediate table, I wouldn't have the deletion problem, but
        > I would still then have to copy them to the permanent table upon
        > success, which (one would hope) would be the majority of cases.
        >
        >> Of the things you're considering, a single query would be more
        >> scalable, but it's likely you won't notice much difference at all
        >> until the volume goes up.
        >
        > That's kind of what I figured: even with on the order of tens of
        > thousands of past records, it wouldn't make much difference either
        > way. I just wondered about the efficiency of having to loop over all
        > those records from the single query and load them into an array, but
        > I think that would be the most efficient.
        >
        > --
        > Alan Little
        > Phorm PHP Form Processor
        > http://www.phorm.com/

        Nope, the most efficient way is to let the database do the checking.

        I would solve your problem like this: Add a new column to the orders
        table, of type character, as long as you need (which is the max.
        length of affiliate ID + max. length of order ID + 1); this will
        become your new primary key.

        You create your new primary keys by concatenating the affiliate ID with
        the order ID, separated by a dash for better readability.

        Example: You have three affiliates, "Jack", "Johnny" and "Jim", each
        having three orders in the table. Let's assume that while Jack and Jim
        use simple numbering for their IDs, Johnny uses some obscure letter code
        for order IDs. Then you would have nine entries in the order table and
        their primary keys would look similar to this:

        "Jack-0001"
        "Jack-0002"
        "Jack-0003"
        "Johnny-qwertz"
        "Johnny-asdfgh"
        "Johnny-yxcvbn"
        "Jim-0001"
        "Jim-0002"
        "Jim-0003"

        As you can see, every affiliate has his own 'namespace' within your new
        primary key, because every affiliate has his own unique ID, so the
        primary keys generated this way will never overlap between different
        affiliates. However, since the part up to the dash is always constant
        for any affiliate it would lead to a primary key collision if any
        affiliate screws up his order ID scheme and submits an ID he has
        submitted before.

        So you don't need to do any checking beforehand at all. Just insert all
        the orders you receive into the orders table creating your new key like
        $ID = $AffiliateID . "-" . $OrderID; and insert it along with the data
        into your table. Only when the insert fails will you have to call
        mysql_error() or mysql_errno() to find out whether the INSERT query
        failed because of a duplicate key. But you are doing some proper error
        checking anyway, right? At least I hope so.

        You don't need any arrays, any temporary tables or other things and no
        duplicate checking code. The database does it all for you. And it needs
        to check for duplicate keys anyway, so we are not even wasting one
        single CPU cycle here. :-) About 2-3 lines of code added to your error
        checking to check if the reason for an insert failure was a duplicate
        index would be sufficient plus one new database column.
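        As a sketch of that error-checking (using PDO with in-memory SQLite as a stand-in for the old mysql_* calls; with mysql_errno() you would test for MySQL error 1062 instead, and table/column names here are made up):

        ```php
        <?php
        // Let the primary key do the uniqueness check: just attempt the
        // insert and treat only a duplicate-key failure as "already seen".
        $db = new PDO('sqlite::memory:');
        $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $db->exec('CREATE TABLE orders (id TEXT PRIMARY KEY, payload TEXT)');

        function insertOrder(PDO $db, string $affiliateId, string $orderId): bool
        {
            $stmt = $db->prepare('INSERT INTO orders (id, payload) VALUES (?, ?)');
            try {
                // The combined key gives each affiliate its own namespace.
                $stmt->execute([$affiliateId . '-' . $orderId, 'data...']);
                return true;
            } catch (PDOException $e) {
                // SQLSTATE 23000 = integrity constraint violation (dup key).
                if ($e->getCode() == 23000) {
                    return false;
                }
                throw $e; // anything else is a real error
            }
        }

        var_dump(insertOrder($db, 'jack', '0001')); // true
        var_dump(insertOrder($db, 'jack', '0001')); // false: duplicate
        var_dump(insertOrder($db, 'jim',  '0001')); // true: different namespace
        ```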


        Further optimizing the above solution:

        *) You don't even need to create a new variable for $ID, you can just
        concatenate the strings when you put your query together. But you
        probably thought of that yourself already.

        *) After creating your new column, populating it with
        "AffiliateID-OrderID" for all database entries that already exist and
        declaring it the primary key for the table you can even delete these two
        columns from your table. They are redundant now because their combined
        information is already contained within your new primary key. And it can
        be easily extracted again if needed and also filtered on. If you want to
        see for example only orders from Jim, you just filter for all IDs which
        start with "Jim-". I don't see a reason why we would need to keep these
        around. Except perhaps for comfort, because we wouldn't have to extract
        the information from the combined string when we need it, but that's not
        really difficult at all.
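        Building and splitting the combined key is a couple of lines of PHP, assuming the affiliate ID itself never contains a dash (otherwise pick a separator that can't appear in it, or keep the two original columns for comfort):

        ```php
        <?php
        // Hypothetical helpers for the combined "AffiliateID-OrderID" key.
        function buildKey(string $affiliateId, string $orderId): string
        {
            return $affiliateId . '-' . $orderId;
        }

        function splitKey(string $key): array
        {
            // Limit 2 so dashes inside the order ID survive intact.
            return explode('-', $key, 2);
        }

        $key = buildKey('Johnny', 'qwertz');
        [$aid, $oid] = splitKey($key);
        echo "$aid / $oid\n"; // Johnny / qwertz

        // Filtering one affiliate's orders becomes a prefix match, e.g.:
        //   SELECT * FROM orders WHERE id LIKE 'Jim-%'
        ```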


        If I understood your problem correctly this should be the optimal way to
        do it. Or did I get your problem wrong?

        Bye!


        • Alan Little

          #5
          Re: Efficient Uniqueness Check

          Carved in mystic runes upon the very living rock, the last words of
          Anonymous of comp.lang.php make plain:
          > Alan Little wrote:
          >> Carved in mystic runes upon the very living rock, the last words
          >> of <petersprc@gmail.com> of comp.lang.php make plain:
          >>> Alan Little wrote:
          >>>> I have affiliates submitting batches of anywhere from 10 to
          >>>> several hundred orders. Each order in the batch must include an
          >>>> order ID, originated by the affiliate, which must be unique
          >>>> across all orders in all batches ever submitted by that
          >>>> affiliate. I'm trying to figure out the most efficient way to
          >>>> check the uniqueness of the order ID.
          >>>>
          >>>> The two ways I'm considering are: 1) a single query to load all
          >>>> the affiliate's order IDs into an array for checking as I
          >>>> process the batch; 2) creating a temporary table with just the
          >>>> affiliate's IDs, querying it for each order in the batch.
          >>>>
          >>>> Any ideas which would be more efficient? Would there be any
          >>>> significant difference either way?
          >>> Perhaps this could work: If you keep the submissions in a set of
          >>> staging tables, then your unique key would work.
          >>
          >> Thanks, but that isn't really practical; all orders are
          >> maintained in the orders table, so inserting them into an
          >> intermediate table would just be another step. Using a single
          >> table, I would have to flag the records as temporary as I added
          >> them, then remove the flag upon successful completion, or delete
          >> the flagged records upon failure. With an intermediate table, I
          >> wouldn't have the deletion problem, but I would still then have
          >> to copy them to the permanent table upon success, which (one
          >> would hope) would be the majority of cases.
          >>
          >>> Of the things you're considering, a single query would be more
          >>> scalable, but it's likely you won't notice much difference at
          >>> all until the volume goes up.
          >>
          >> That's kind of what I figured: even with on the order of tens of
          >> thousands of past records, it wouldn't make much difference
          >> either way. I just wondered about the efficiency of having to
          >> loop over all those records from the single query and load them
          >> into an array, but I think that would be the most efficient.
          >
          > Nope, the most efficient way is to let the database do the checking.
          >
          > [snip]

          Thanks for your detailed response. However, it doesn't work, because
          I don't want to wait until I've already attempted the insert to know
          whether it fails. The issue of two different affiliates having the
          same order ID isn't a problem, as there is an affiliate ID as well,
          so what needs to be unique is AID + OID, as you've described.

          The affiliates are providing their own order IDs, since they're going to
          be using whatever systems they use to collect the orders. When they
          submit the orders for fulfillment, here's the sequence:

          1) Affiliate submits a batch of (say) 100 orders
          2) Check all the order data
          3) Submit the batch as a single order to ZC
          4) Store the sub-orders in custom tables

          The problem with doing the checking at insert time is that the first 90
          orders may be OK, with a problem on order #91. The whole batch has to be
          rejected and re-submitted, so that means I have to delete the 90 sub-
          orders already entered, *and* the order submitted to ZC. Thus I want to
          do the ID check at step 2.

          Hmmm..... I just had a thought, though. If I do a bulk insert, the whole
          thing would fail if there was a dup, and then I wouldn't have to delete
          the already-processed sub-orders, but I'd still have to delete the ZC
          order. I'll have to think about it some more.
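          The bulk-insert idea can be tried out directly (again with in-memory SQLite standing in for MySQL, and made-up table names): when one row of a multi-row INSERT violates the key, the whole statement is backed out, so the sub-orders table is left clean.

          ```php
          <?php
          // A multi-row INSERT that hits a duplicate key fails as a whole:
          // none of its rows stick, so there is nothing to delete in this
          // table (the ZC order still needs separate cleanup).
          $db = new PDO('sqlite::memory:');
          $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
          $db->exec('CREATE TABLE orders (id TEXT PRIMARY KEY)');
          $db->exec("INSERT INTO orders VALUES ('jack-0002')"); // pre-existing order

          try {
              // 'jack-0002' is the dup; 'jack-0001' is a fresh row that would
              // otherwise have gone in first.
              $db->exec("INSERT INTO orders VALUES ('jack-0001'), ('jack-0002')");
          } catch (PDOException $e) {
              echo "batch rejected\n";
          }

          // Only the pre-existing row remains; 'jack-0001' was not left behind.
          echo $db->query('SELECT COUNT(*) FROM orders')->fetchColumn(), "\n"; // 1
          ```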

          --
          Alan Little
          Phorm PHP Form Processor


          • Anonymous

            #6
            Re: Efficient Uniqueness Check

            Alan Little wrote:
            > Thanks for your detailed response. However, it doesn't work,
            > because I don't want to wait until I've already attempted the
            > insert to know whether it fails. The issue of two different
            > affiliates having the same order ID isn't a problem, as there is
            > an affiliate ID as well, so what needs to be unique is AID +
            > OID, as you've described.
            >
            > The affiliates are providing their own order IDs, since they're
            > going to be using whatever systems they use to collect the
            > orders. When they submit the orders for fulfillment, here's the
            > sequence:
            >
            > 1) Affiliate submits a batch of (say) 100 orders
            > 2) Check all the order data
            > 3) Submit the batch as a single order to ZC
            > 4) Store the sub-orders in custom tables
            >
            > The problem with doing the checking at insert time is that the
            > first 90 orders may be OK, with a problem on order #91. The
            > whole batch has to be rejected and re-submitted, so that means I
            > have to delete the 90 sub-orders already entered, *and* the
            > order submitted to ZC. Thus I want to do the ID check at step 2.
            >
            > Hmmm..... I just had a thought, though. If I do a bulk insert,
            > the whole thing would fail if there was a dup, and then I
            > wouldn't have to delete the already-processed sub-orders, but
            > I'd still have to delete the ZC order. I'll have to think about
            > it some more.

            If all inserts have to fail when one fails, use a transaction.
            That's what they're there for. :-)

            If you have never worked with transactions before, read up on
            them in the MySQL manual.
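            A minimal transaction sketch with PDO (in-memory SQLite here; with MySQL the tables must be InnoDB, since MyISAM silently ignores transactions, and the table name is made up):

            ```php
            <?php
            // Wrap the whole batch in one transaction: either every row
            // commits or, on the first duplicate, rollBack() undoes the rows
            // inserted so far, leaving nothing to clean up.
            $db = new PDO('sqlite::memory:');
            $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
            $db->exec('CREATE TABLE orders (id TEXT PRIMARY KEY)');
            $db->exec("INSERT INTO orders VALUES ('jack-0002')");

            $batch = ['jack-0001', 'jack-0002', 'jack-0003']; // 0002 is a dup

            $db->beginTransaction();
            try {
                $stmt = $db->prepare('INSERT INTO orders (id) VALUES (?)');
                foreach ($batch as $id) {
                    $stmt->execute([$id]);
                }
                $db->commit();       // all rows, or none
            } catch (PDOException $e) {
                $db->rollBack();     // undoes 'jack-0001' as well
                echo "batch rejected, nothing to delete\n";
            }

            echo $db->query('SELECT COUNT(*) FROM orders')->fetchColumn(), "\n"; // 1
            ```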

            Bye!
