Best way to do so.

This topic is closed.
  • Bob Bedford

    Best way to do so.

    Sorry if I post here, but I don't have access to any decent MySQL
    newsgroup for this question, and since 99% of PHP programmers deal with
    MySQL, I'm sure I'll get an answer here.

    Look at the following table:
    CREATE TABLE searchlog (
    IDSearchLog MEDIUMINT UNSIGNED NOT NULL auto_increment,
    F1 MEDIUMINT UNSIGNED default NULL,
    F2 VARCHAR(40) default NULL,
    F3a SMALLINT UNSIGNED default NULL,
    F3b SMALLINT UNSIGNED default NULL,
    F4a FLOAT(11,2) UNSIGNED default NULL,
    F4b FLOAT(11,2) UNSIGNED default NULL,
    F5a MEDIUMINT UNSIGNED default NULL,
    F5b MEDIUMINT UNSIGNED default NULL,
    F6 char(8) default NULL,
    F7 char(8) default NULL,
    F8 char(8) default NULL,
    F9a TINYINT UNSIGNED default NULL,
    F9b TINYINT UNSIGNED default NULL,
    F10a FLOAT(6,2) UNSIGNED default NULL,
    F10b FLOAT(6,2) UNSIGNED default NULL,
    F11a FLOAT(6,2) UNSIGNED default NULL,
    F11b FLOAT(6,2) UNSIGNED default NULL,
    F12 datetime NOT NULL,
    PRIMARY KEY (IDSearchLog),
    UNIQUE KEY IDSearchLog (IDSearchLog),
    INDEX AI_IDSearchLog (IDSearchLog)
    );

    Now this will be used for managing the search log. Any time a search is
    done on some fields, I must log them.
    In the long term, I think there may be about 500-600 different search
    types. Many searches would be quite similar.
    Now, for performance reasons, I have 3 possibilities:
    1) I may check if the same search has already been done, and if so log
    only the datetime and the IDSearchLog. If the same search doesn't exist,
    then I create it.
    2) I may create a new entry on every search: I won't have to check if the
    search exists, but I will have a lot of similar records. This will raise
    the amount of data, but isn't it quicker? I avoid a check at every search.
    3) I may create many tables. If you have a close look at the field names
    (explicitly changed for better understanding), any field with an a or b
    next to its number means that you may define a min and a max. So fields
    F3a and F3b are a min and a max value. The limit combinations may be
    quite common between searches, so in my searchlog table I could keep
    only pointers to the different combinations. This seems very heavy in
    query time, but it lets me gain some space. Maybe I'm missing a quick
    way to create a new combination every time, and return the existing ID
    if available instead of creating it.

    After thinking about what I have, the first choice would be my
    preference. But I'm scared that the lookup for an existing record with
    the same fields would slow things down a lot, and that it isn't worth
    the space gained compared to choice 2. Maybe there is a simple way to
    add some fields in a query and, if the field combination already exists,
    get back the ID.
    Some search example:
    Adidas, shoes, from 150$, to 250$, blue
    Nike, socks, to 50$
    Adidas, socks, to 60$
    Nike, socks, to 60$
    Adidas shoes, to 250$
    Puma shoes, red
    Puma shoes, yellow
    .....
    I have to log such fields in a way that is as fast as possible, but also
    saves space.

    Please help.

    Bob


  • News Me

    #2
    Re: Best way to do so.

    Bob Bedford wrote:[color=blue]
    > Sorry if I post here, but I don't have access to any valuable mysql ng
    > for this question, ans since 99% or php programmers deal with mysql, I'm
    > sure I'll get an answer here.
    >
    > Look at the folloming table:
    > CREATE TABLE searchlog (
    > IDSearchLog MEDIUMINT UNSIGNED NOT NULL auto_increment,
    > F1 MEDIUMINT UNSIGNED default NULL,
    > F2 VARCHAR(40) default NULL,
    > F3a SMALLINT UNSIGNED default NULL,
    > F3b SMALLINT UNSIGNED default NULL,
    > F4a FLOAT(11,2) UNSIGNED default NULL,
    > F4b FLOAT(11,2) UNSIGNED default NULL,
    > F5a MEDIUMINT UNSIGNED default NULL,
    > F5b MEDIUMINT UNSIGNED default NULL,
    > F6 char(8) default NULL,
    > F7 char(8) default NULL,
    > F8 char(8) default NULL,
    > F9a TINYINT UNSIGNED default NULL,
    > F9b TINYINT UNSIGNED default NULL,
    > F10a FLOAT(6,2) UNSIGNED default NULL,
    > F10b FLOAT(6,2) UNSIGNED default NULL,
    > F11a FLOAT(6,2) UNSIGNED default NULL,
    > F11b FLOAT(6,2) UNSIGNED default NULL,
    > F12 datetime NOT NULL,
    > PRIMARY KEY (IDSearchLog),
    > UNIQUE KEY IDSearchLog (IDSearchLog),
    > INDEX AI_IDSearchLog (IDSearchLog)
    > );[/color]

    You don't need the UNIQUE KEY clause because the manual says:

    14.2.6 CREATE TABLE Syntax

    A PRIMARY KEY is a unique KEY where all key columns must be defined as
    NOT NULL.

    <snip>
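
    In other words, the three key clauses all index the same column, and the
    PRIMARY KEY alone is enough. A trimmed sketch (only a few of Bob's
    columns shown, for illustration):

    ```sql
    -- PRIMARY KEY already implies a unique index on IDSearchLog;
    -- the extra UNIQUE KEY and INDEX clauses just duplicate it.
    CREATE TABLE searchlog (
      IDSearchLog MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
      F1 MEDIUMINT UNSIGNED DEFAULT NULL,
      F2 VARCHAR(40) DEFAULT NULL,
      F12 DATETIME NOT NULL,
      PRIMARY KEY (IDSearchLog)
    );
    ```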

    As for everything else, I have no idea what you are asking.

    NM

    --
    convert uppercase WORDS to single keystrokes to reply


    • Dan Stumpus

      #3
      Re: Best way to do so.


      "Bob Bedford" <bedford1@YouKnowWhatToDoHerehotmail.com> wrote
      [color=blue]
      > Now this will be used for managing searchlog. Any time a search is done on
      > some fields, then I must log them.
      > For the long time, I think that they may be about 5-600 different search
      > type. Many search would be quite similar.[/color]

      For a log, just record the raw event. Don't try and get fancy. Doing so
      may get you in trouble later when you want to mine the data in some way you
      haven't even considered.

      Try and be relational: use the part id or style/color/size id's from your
      item master tables.

      I'd also add timestamp, source, and session_id fields to each row.

      Don't design to minimize physical storage. Design for maximum flexibility.

      How many rows will your app be inserting a day, anyway? You could purge the
      old records to a history table every month if it got really nuts.

      I designed a large inventory application which accumulated about a million
      transactions a month, and we kept a year's worth on-line without any
      particular problems (this was Oracle on an HP box). It's hard to imagine
      you'll get anything like that volume of transactions.

      Just my 2 cents off the top of my head, not fully understanding your
      application...

      -- Dan



      • Bob Bedford

        #4
        Re: Best way to do so.


        First, thanks for replying.

        I'll explain differently what I'm trying to do.
        I have to store all possible data to inform the companies that signed
        up with our website what people's choices are.
        Actually I took the shoes as an example, to simplify. The real items
        on the site are used cars.
        If 80% of the customers are looking for Chevrolet, then the car
        dealer may buy Chevrolets instead of BMWs for their customers. If
        they know that 90% are looking for a car between $5000 and $8000,
        they will try to provide this kind of car. So, every time somebody
        makes a search, I may log what they were looking for, then build
        statistics for the car dealers. Since not all fields are compulsory,
        one customer may choose the price, another the color, another just
        the make. I want to optimize it as much as possible while keeping it
        flexible for doing any kind of statistics.
        [color=blue]
        > For a log, just record the raw event. Don't try and get fancy. Doing so
        > may get you in trouble later when you want to mine the data in some way
        > you haven't even considered.[/color]
        When you say I have to log the raw event, I don't know what you mean.
        My form has many fields, and I must save only the used ones, not the
        ones left empty. I'll use the part ID, but how do I store a price if
        it's filled in by the customer himself? I don't provide any preset
        values for price.
        [color=blue]
        > Try and be relational: use the part id or style/color/size id's from your
        > item master tables.[/color]
        Obviously I'll do it this way. I'll link to other tables.
        [color=blue]
        >
        > I'd also add timestamp, source, and session_id fields to each row.[/color]
        What do you mean by "source" for each row?
        Why should I store session_id? It will let me know if the same person
        looked at different articles, but since there are so many searches,
        it would be hard to get an interesting result. Maybe I could say:
        people looking for a BMW are also looking for an Audi.
        [color=blue]
        > Don't design to minimize physical storage. Design for maximum
        > flexibility.[/color]
        My problem is saving "time". I mean, if my search takes 5 seconds
        because of bad DB design or something else, it's only good for the
        trash bin. I want maximum speed and flexibility.
        [color=blue]
        > How many rows will your app be inserting a day, anyway? You could purge
        > the old records to a history table every month if it got really nuts.[/color]
        I won't purge records. Maybe records 3 years old won't be
        interesting, but they may be kept.
        [color=blue]
        > I designed a large inventory application which accumulated about a million
        > transactions a month, and we kept a years worth on-line without any
        > particular problems (this was Oracle on an HP box). It's hard to imagine
        > you'll get anything like that volume of transactions.[/color]
        The competition gets 7 million/month. I won't achieve that for a long
        time, but from what I see, I get about 50,000 searches/month, and I
        don't store them yet.

        I have 2 choices:
        1) store a table with every different search, and another where I
        store the searchID, the timestamp and the session_ID. But I have to
        check whether the record already exists at every new search.
        2) store everything in the same table, but a lot of identical records
        will exist in the table.

        In example 1, I may have this kind of result.
        table a:
        (ID, Make, pricefrom, priceto, kmfrom, kmto, ...)
        1, BMW, , 8000, , , ...
        2, Bugatti, 50000, 80000, , 100000, ...
        3, BMW, , , , 150000, ...

        table b:
        (aID, timestamp, session_id)
        1, 10:10:42 2004-10-10, dfasdfsdfsdf
        2, 10:10:48 2004-10-10, fdsadsfdsdsv
        3, 10:10:55 2004-10-10, fsdvgrtwtrwt
        2, 10:11:02 2004-10-10, ffdsfvvvddvd
        2, 10:11:20 2004-10-10, ffshhzjuzturz

        At the beginning, table a would fill up very quickly, but later I'll
        have less new data there.

        Flexibility wouldn't be affected by these choices, only speed and
        disk space. So what's the best choice?
        Is there a way to store a record in a table and, if it already
        exists, retrieve the ID? That would be the fastest way, I think.
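
        For what it's worth, MySQL (4.1 and later) has an idiom that does
        roughly this: INSERT ... ON DUPLICATE KEY UPDATE combined with the
        LAST_INSERT_ID(expr) trick. A sketch against the example tables
        above; it assumes a UNIQUE index over the search columns, and note
        that columns allowing NULL don't collide in a unique index, so
        "duplicate" searches containing NULLs would still create new rows:

        ```sql
        -- assumed: ALTER TABLE a
        --   ADD UNIQUE KEY uq_search (Make, pricefrom, priceto, kmfrom, kmto);
        INSERT INTO a (Make, priceto)
        VALUES ('BMW', 8000)
        ON DUPLICATE KEY UPDATE ID = LAST_INSERT_ID(ID);

        -- LAST_INSERT_ID() now returns the new *or* the pre-existing ID
        INSERT INTO b (aID, ts, session_id)
        VALUES (LAST_INSERT_ID(), NOW(), 'dfasdfsdfsdf');
        ```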



        • R. Rajesh Jeba Anbiah

          #5
          Re: Best way to do so.

          Bob Bedford wrote:[color=blue]
          > I'll explain differently what I'm trying to do.
          > I've to store all possible data for informing the companies that
          > signed with our website what the people choice are.
          > Actually I took example of the shoes, for simplify. The real
          > items in the site are used cars.
          > If they are 80% of the customers that are looking for Chevrolet,
          > then the car dealer may buy Chevrolet instead of BMW for their
          > customers. If they know that 90% are looking for a car between
          > 5000 and 8000$, they would try to provide this kind of car. So,
          > everytime somebody make a search, I may log what he was looking
          > for, then build statistics for the car dealers. Since not all
          > fields are compulsory, a customer may chose the price, an other
          > the color, the other one just the make. I want to optimize it as
          > much as possible but keeping it flexible for doing any kind of
          > statistics.[/color]
          <snip>

          Just another thought... Don't log anything with the DB. Just use
          the access log for data mining. You'd probably use a cron job once
          a week that does the data mining and gives you enough stats, or
          trims/groups the data available in the access log.
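
          As a rough illustration of the cron-job idea (the log format and
          URL layout here are made up, not taken from Bob's site), even a
          one-liner over the access log can tally the popular searches:

          ```shell
          # stand-in access-log entries; a real cron job would read the live log
          cat > /tmp/access.log <<'EOF'
          1.2.3.4 - - [10/Oct/2004:10:10:42] "GET /search.php?make=BMW&priceto=8000 HTTP/1.1" 200 512
          1.2.3.4 - - [10/Oct/2004:10:10:55] "GET /search.php?make=BMW HTTP/1.1" 200 512
          5.6.7.8 - - [10/Oct/2004:10:11:02] "GET /search.php?make=Audi HTTP/1.1" 200 512
          EOF

          # extract the "make" parameter from each search request and count occurrences
          sed -n 's/.*[?&]make=\([^& ]*\).*/\1/p' /tmp/access.log | sort | uniq -c | sort -rn
          ```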

          --
          <?php echo 'Just another PHP saint'; ?>
          Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/


          • Dan Stumpus

            #6
            Re: Best way to do so.


            "Bob Bedford" <bedford1@YouKnowWhatToDoHerehotmail.com> wrote[color=blue]
            >
            > First, thanks for replying.[/color]

            Too much time on my hands...
            [color=blue]
            > I'll explain differently what I'm trying to do.
            > I've to store all possible data for informing the companies that signed
            > with our website what the people choice are.
            > Actually I took example of the shoes, for simplify. The real items in the
            > site are used cars.
            > If they are 80% of the customers that are looking for Chevrolet, then the
            > car dealer may buy Chevrolet instead of BMW for their customers. If they
            > know that 90% are looking for a car between 5000 and 8000$, they would try
            > to provide this kind of car. So, everytime somebody make a search, I may
            > log what he was looking for, then build statistics for the car dealers.
            > Since not all fields are compulsory, a customer may chose the price, an
            > other the color, the other one just the make. I want to optimize it as
            > much as possible but keeping it flexible for doing any kind of statistics.[/color]

            It's a design mistake to try and optimize storage. It limits your choices
            down the line. When I was younger, I did this several times, and it was
            always a mistake in the long run.

            Log the entire choice row. Then you can cook it down and digest it if you
            need to for further analysis. 50,000 rows a month isn't much of a problem
            on a fast box with fast disks. Keeping *all* fields in the log will give
            you maximum flexibility and ease in conducting ad-hoc querying etc.
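
            A one-row-per-search-event table along those lines might look
            like this (column names are illustrative, not Bob's real schema):

            ```sql
            CREATE TABLE search_event (
              id INT UNSIGNED NOT NULL AUTO_INCREMENT,
              make VARCHAR(40) DEFAULT NULL,     -- NULL simply means "not searched on"
              price_from DECIMAL(11,2) DEFAULT NULL,
              price_to DECIMAL(11,2) DEFAULT NULL,
              source VARCHAR(64) DEFAULT NULL,   -- originating IP or URL
              session_id CHAR(32) DEFAULT NULL,
              searched_at DATETIME NOT NULL,
              PRIMARY KEY (id)
            );

            -- one plain insert per search, no existence check needed
            INSERT INTO search_event (make, price_to, source, session_id, searched_at)
            VALUES ('BMW', 8000.00, '1.2.3.4', 'dfasdfsdfsdf', NOW());
            ```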
            [color=blue][color=green]
            >> For a log, just record the raw event.[/color][/color]
            [color=blue]
            > When you say I've to log the row event, I don't know what you mean. My
            > form has many fields, and I must save only the used ones, not the ones
            > left empty.[/color]

            Is someone looking over your shoulder saying "you cannot save a field in
            a row if it may be blank x% of the time"? If you want to design a
            hierarchical schema with "search" at the top and many other types of
            zero-to-one entities beneath it (e.g., mfg, color, miles, cost,
            upholstery, etc.), have at it. This will save storage but slow down
            querying (because you'll need lots of joins), and make every query a
            multiple-table join (a pain, if you do a lot of them). De-normalization
            (e.g., one row per search event) is used all the time where data mining
            needs to be quick and convenient.
            [color=blue]
            > I'll use the part ID, but how to store any price if they are filled by the
            > customer itself ? I don't provide any preset value for price.[/color]

            Leave them null, or default to zero, your pick. Nulls are sometimes a bit
            of a pain to deal with syntactically, as function arguments, etc.
            [color=blue][color=green]
            >> I'd also add timestamp, source, and session_id fields to each row.[/color]
            > What you mean for "source" to each row ?[/color]

            Originating url or IP address, just in case someone is messing with you...
            [color=blue]
            > Why should I store session_id ? this will let me know if the same person
            > looked at different articles, but since they are too many search, it would
            > be hard to get an interesting result.[/color]

            If you're in the design mentality, you don't prejudge what may or may not be
            interesting.
            [color=blue]
            > Maybe I may say: people looking for a BMW are also looking for an AUDI.[/color]

            Shopping behavior is very interesting to marketing types. You may get clues
            for suggestive selling from analyzing search habits.
            [color=blue][color=green]
            >> Don't design to minimize physical storage. Design for maximum
            >> flexibility.[/color]
            > My problem is saving "time". I mean if my search makes 5 seconds because
            > of bad DB design or other, it may be good for trashbin. I want maximum
            > speed and flexibility.[/color]

            Waiting a few seconds for a research query is perfectly ok. In my
            experience (designing fairly large systems for 25 yrs) sub-second response
            times are important for high volume things like customer activity, order
            entry, and the like. Research (which is what your search history table is
            for) involves people thinking about what they want, looking at the results,
            and thinking some more. Educate your customer that processing historical
            data takes a few seconds.
            [color=blue]
            > The concurrent get 7 millions/month. I won't achieve this for a long time,
            > but as what I see, I've about 50'000 search/month, and I don't store them
            > yet.[/color]

            A few hundred thousand rows in a table or database is not much of a
            challenge for a modern database. If you're worried, create a table with a
            couple hundred k rows and run some queries against it.
            [color=blue]
            > 1) store a table with every different search and an other where I store
            > the searchID, the timestamp and the session_ID.
            > But I've to check if the record already exist at every new search.
            > 2) store everything in the same table, but a lot of same records will
            > exist in the table.[/color]

            IMO, you're over-thinking this one. Just store a row per search and be done
            with it.

            [color=blue]
            > Flexibility wouldn't be affected by such choices, only speed and
            > diskspace. So what's the best choice ?[/color]

            Simplicity. One row per search event.
            [color=blue]
            > Does it exist a way to store a record in a table, and if it already exist,
            > retrieve the ID ? That would be the faster way I think[/color]

            Every insertion will be slower that way. If you want user entry speed
            maximized, this isn't the way to go.

            Just the thoughts of an old designer....

            -- Dan



            • Bob Bedford

              #7
              Re: Best way to do so.

              [color=blue]
              > Just another thought... Don't log anything with DB. Just use access
              > log for data mining. Probably, you have to use a cron job once in a
              > week that does the data mining and provides you enough stats or
              > trims/groups the data that available in access log.[/color]

              Since I use session variables across pages, and also POST data, the
              access log won't work, as that data isn't shown in the access log.

              I'm thinking of logging the data in a table, then using a cron job to
              pack it in a way that takes less space and better fits how the data
              should be stored. The fast path is kept, and space is also saved,
              letting me keep data for a long time without worrying about wasted
              space.

              I don't know if it's the best way, but it's the compromise that best
              fits my needs.
              Bob



              • Bob Bedford

                #8
                Re: Best way to do so.

                > Just the thoughts of an old designer....

                ....with probably far more experience than me. I'll take your advice,
                and I can always change things later. This way will also be faster to
                implement.

                Thanks for help.

                Bob



                • R. Rajesh Jeba Anbiah

                  #9
                  Re: Best way to do so.

                  Bob Bedford wrote:[color=blue][color=green]
                  > > Just another thought... Don't log anything with DB. Just use access
                  > > log for data mining. Probably, you have to use a cron job once in a
                  > > week that does the data mining and provides you enough stats or
                  > > trims/groups the data that available in access log.[/color]
                  >
                  > Since I use session variables trough pages, and also POST datas,
                  > access log won't be possible, as those datas aren't shown in access
                  > log.[/color]
                  <snip>

                  Since it is a search page, as an end user I would expect it to be a
                  GET. Can't you just transform it to GET? Also, you *can* actually get
                  the session or any PHP variables into the access log.

                  --
                  <?php echo 'Just another PHP saint'; ?>
                  Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/


                  • Bob Bedford

                    #10
                    Re: Best way to do so.


                    "R. Rajesh Jeba Anbiah" <ng4rrjanbiah@rediffmail.com> wrote in
                    message news:1105986507.622782.195630@z14g200...legroups.com...[color=blue]
                    > Bob Bedford wrote:[color=green][color=darkred]
                    >> > Just another thought... Don't log anything with DB. Just use access
                    >> > log for data mining. Probably, you have to use a cron job once in a
                    >> > week that does the data mining and provides you enough stats or
                    >> > trims/groups the data that available in access log.[/color]
                    >>
                    >> Since I use session variables trough pages, and also POST datas,
                    >> access log won't be possible, as those datas aren't shown in access
                    >> log.[/color]
                    > <snip>
                    >
                    > Since, it is a search page and if I'm an enduser, I would expect it
                    > to be a GET. Can't you just transform it to GET? Also, you *can*
                    > actually get the session or any PHP variables in access log.[/color]

                    I'm actually transforming it to GET. What a mistake to use POST for
                    search! I didn't know. It doesn't matter; I'm rewriting part of my
                    site and improving it.

                    The session ID is available via the SID variable. I don't know its
                    size, so how do I store it in a table? What type of field should I
                    use for it?
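
                    (For reference: with PHP 4's default settings the session ID is a
                    32-character MD5 hex string, so a fixed-width column is enough;
                    verify the length against your own session configuration. A
                    hypothetical sketch:)

                    ```sql
                    -- only the session_id type matters here
                    CREATE TABLE searchlog_session (
                      IDSearchLog MEDIUMINT UNSIGNED NOT NULL,
                      session_id CHAR(32) NOT NULL  -- PHP's default MD5-based SID is 32 hex chars
                    );
                    ```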

                    Bob



                    • R. Rajesh Jeba Anbiah

                      #11
                      Re: Best way to do so.

                      Bob Bedford wrote:
                      <snip>[color=blue][color=green]
                      > > Since, it is a search page and if I'm an enduser, I would expect it
                      > > to be a GET. Can't you just transform it to GET? Also, you *can*
                      > > actually get the session or any PHP variables in access log.[/color]
                      >
                      > I'm actually transforming to GET. what a mistake to use POST for
                      > search ! I didn't know. Doesn't matter, I rewrite part of my site
                      > and improve it.[/color]

                      Just think about Google, which uses GET; with POST, as an end user
                      you lose the ability/flexibility to bookmark pages, access the
                      history, hit back & forward, etc. The technical benefits are
                      usually covered even in basic books.
                      [color=blue]
                      > The session ID is available by using SID variable. Don't know the
                      > size of it, so how to store it in a table ? wath type of field
                      > should I set for it ?[/color]

                      I'm not sure whether you misunderstood my suggestion or not. My
                      suggestion is to avoid logging search queries via the DB, and to
                      use the access log as a source for data mining--and possibly to
                      extract/compact the data in the access log into a DB through a
                      cron job or so. In that case, if you need to log any PHP
                      variables or session variables (in the access log), it can easily
                      be achieved. See
                      <http://www.webmasterworld.com/forum88/2489.htm>
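
                      One way this is commonly done under mod_php (a sketch; the note
                      name "sessid" and the log format are our own choices, not from
                      the linked thread):

                      ```php
                      <?php
                      // expose the session ID to Apache's logger as a request "note"
                      session_start();
                      apache_note('sessid', session_id());
                      ```

                      Then reference the note as %{sessid}n in a LogFormat line in
                      httpd.conf, e.g.
                      LogFormat "%h %t \"%r\" %>s %{sessid}n" searchlog
                      and the session ID appears in each access-log entry.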

                      --
                      <?php echo 'Just another PHP saint'; ?>
                      Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

                      Comment
