GROUP BY with ORDER BY, or INSERT INTO (SELECT * GROUP BY)???

  • pedalpete
    New Member
    • Oct 2007
    • 109

    GROUP BY with ORDER BY, or INSERT INTO (SELECT * GROUP BY)???

    I seem to have a few duplicate entries in a very large database.

    My database has two identical tables: one is written to regularly (every few seconds), and the other is used for queries and is updated from the write table once a day.

    I'm trying to get rid of duplicate entries for the queries, and I thought the most effective way to do this would be to run a query like this:
    Code:
     
    INSERT INTO readtable (SELECT * FROM writetable GROUP BY column1, column2);
    I have also tried the same query, but listing each column instead of using *.
    The problem is that this seems to be crashing my dev MySQL server, though MySQL is not giving me an error.
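
    The column-listed variant was shaped roughly like this (a sketch; the column names are placeholders, not my real schema):
    Code:
    -- same idea with an explicit column list instead of *;
    -- non-grouped columns (column3) come from an arbitrary row of each group
    INSERT INTO readtable (column1, column2, column3)
    SELECT column1, column2, column3
    FROM writetable
    GROUP BY column1, column2;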

    My other option is to use a GROUP BY on the read table when a read request is made, but I thought building the read table without duplicates would be more efficient. I also have an ORDER BY in the read requests, and I have read that mixing GROUP BY with ORDER BY is a no-no...

    I have run GROUP BY column1, column2 ORDER BY column1, column6, and it seems to work OK.
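
    For reference, that read query is shaped roughly like this (again with placeholder column names):
    Code:
    SELECT *
    FROM readtable
    GROUP BY column1, column2
    ORDER BY column1, column6;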

    Any thoughts on this?
  • coolsti
    Contributor
    • Mar 2008
    • 310

    #2
    It might help if you explained the logic of your design to us.

    Why are there duplicates in your write table? Are they needed (for logging purposes, say), or are they there because you do not know of a good way to avoid duplicates when writing to the table?

    What are your table structures? (Please show the SHOW CREATE TABLE output.)

    It may be possible, and sufficiently efficient, to populate your "read" table as data is added to your "write" table (assuming both tables are really necessary, per my first question above), instead of doing this as a daily batch.
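
    For example, something along these lines might be enough (just a sketch; table and column names are placeholders, since I don't know your schema yet):
    Code:
    -- mirror each write into the read table right away, letting a
    -- UNIQUE KEY on the read table silently reject duplicate rows
    INSERT INTO writetable (col1, col2) VALUES ('a', 'b');
    INSERT IGNORE INTO readtable (col1, col2) VALUES ('a', 'b');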


    • pedalpete
      New Member
      • Oct 2007
      • 109

      #3
      Hey Coolsti -
      I was trying to keep things simple; that's why I didn't overload the previous message with info.

      The data in the write table is added by a web indexer, and though it is pretty good at not duplicating data, for some reason a small percentage of duplicate data is getting through.

      I tried creating a unique field from the duplicate-defining data, CONCAT(bid, date, time), but for some reason that didn't work, meaning it still seemed to return the same number of entries in the table.

      The indexer is constantly running, and therefore there are many reads and writes to that table (reads to check for duplicate data, and writes only if no duplicates are found).
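
      The duplicate check is roughly of this shape (simplified; the real values come from the indexer at run time):
      Code:
      -- look for an existing row with the same bid/date/time;
      -- the indexer only inserts when this returns no rows
      SELECT sid FROM `update`
      WHERE bid = 123 AND `date` = '2008-03-01' AND `time` = '20:00:00';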

      I originally tried running queries against a single table, but the visitors' queries were taking too long.

      When I built the second (read) table for visitor queries, there was a huge performance jump.
      My CREATE TABLE looks like this:
      Code:
       CREATE TABLE `update` (
        `sid` int(40) NOT NULL auto_increment,
        `bid` int(11) NOT NULL,
        `ven` varchar(100) collate utf8_unicode_ci NOT NULL,
        `address` varchar(100) collate utf8_unicode_ci NOT NULL,
        `city` varchar(100) collate utf8_unicode_ci NOT NULL,
        `state` varchar(100) collate utf8_unicode_ci NOT NULL,
        `zip` varchar(50) collate utf8_unicode_ci NOT NULL,
        `lat` float NOT NULL,
        `long` float NOT NULL,
        `date` date NOT NULL,
        `time` time NOT NULL,
        `accuracy` tinyint(2) NOT NULL default '0',
        `geo` tinyint(1) NOT NULL default '0',
        PRIMARY KEY  (`sid`),
        KEY `getshows_idx` (`lat`,`long`,`date`),
        KEY `location_idx` (`address`,`city`,`state`)
      ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
      The read table is identical, except for the table name.
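
      For what it's worth, I suspect the uniqueness rule could be written directly as a composite key rather than a concatenated field (a sketch I haven't verified; the key name is just illustrative):
      Code:
      -- one unique key over the columns that define a duplicate, so that
      -- INSERT IGNORE or ON DUPLICATE KEY UPDATE can reject repeats
      ALTER TABLE `update` ADD UNIQUE KEY dedup_idx (`bid`, `date`, `time`);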

