GROUP BY with ORDER BY, or INSERT INTO (SELECT * GROUP BY)???

  • pedalpete
    New Member
    • Oct 2007
    • 109

    GROUP BY with ORDER BY, or INSERT INTO (SELECT * GROUP BY)???

    I seem to have a few duplicate entries in a very large database.

    My database has two identical tables: one is written to regularly (every few seconds), and the other is used for queries and is updated from the write table once a day.

    I'm trying to get rid of duplicate entries for the queries, and I thought the most effective way to do this would be to run a query like this:
    Code:
     
    INSERT INTO readtable (SELECT * FROM writetable GROUP BY column1, column2);
    I have also tried the same query, but listing each column instead of using *.
    The problem is that this seems to be crashing my dev MySQL server, though MySQL is not giving me an error.
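
    The column-listed variant was shaped roughly like this (a sketch; the column names are placeholders, not my real schema):
    Code:
    -- same idea with an explicit column list instead of *;
    -- non-grouped columns (column3) come from an arbitrary row of each group
    INSERT INTO readtable (column1, column2, column3)
    SELECT column1, column2, column3
    FROM writetable
    GROUP BY column1, column2;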

    My other option is to use a GROUP BY on the read table when a read request is made, but I thought building the read table without duplicates would be more efficient. I also have an ORDER BY in the read requests, and I have read that mixing GROUP BY with ORDER BY is a no-no...

    I have run GROUP BY column1, column2 ORDER BY column1, column6, and it seems to work OK.
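
    For reference, that read query is shaped roughly like this (again with placeholder column names):
    Code:
    SELECT *
    FROM readtable
    GROUP BY column1, column2
    ORDER BY column1, column6;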

    Any thoughts on this?
  • coolsti
    Contributor
    • Mar 2008
    • 310

    #2
    It might help if you explained the logic of your design to us.

    Why are there duplicates in your write table? Are they needed (for logging purposes, say), or are they there because you do not know of a good way to avoid duplicates when writing to the table?

    What are your table structures? (Please show the SHOW CREATE TABLE output.)

    It may be possible, and sufficiently efficient, to populate your "read" table as data is added to your "write" table (assuming both tables are really necessary, per my first question above), instead of doing this as a daily batch.
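
    For example, something along these lines might be enough (just a sketch; table and column names are placeholders, since I don't know your schema yet):
    Code:
    -- mirror each write into the read table right away, letting a
    -- UNIQUE KEY on the read table silently reject duplicate rows
    INSERT INTO writetable (col1, col2) VALUES ('a', 'b');
    INSERT IGNORE INTO readtable (col1, col2) VALUES ('a', 'b');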


    • pedalpete
      New Member
      • Oct 2007
      • 109

      #3
      Hey Coolsti -
      I was trying to keep things simple; that's why I didn't overload the previous message with info.

      The data in the write table is added by a web indexer, and though it is pretty good at not duplicating data, for some reason a small percentage of duplicate data is getting through.

      I tried creating a unique field from the duplicate-defining data, CONCAT(bid, date, time), but for some reason that didn't work, meaning it still seemed to return the same number of entries in the table.

      The indexer is constantly running, and therefore there are many reads and writes to that table (reads to check for duplicate data, and writes only if no duplicates are found).
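
      The duplicate check is roughly of this shape (simplified; the real values come from the indexer at run time):
      Code:
      -- look for an existing row with the same bid/date/time;
      -- the indexer only inserts when this returns no rows
      SELECT sid FROM `update`
      WHERE bid = 123 AND `date` = '2008-03-01' AND `time` = '20:00:00';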

      I originally tried running queries against a single table, but the visitors' queries were taking too long.

      When I built the second (read) table for visitor queries, there was a huge performance jump.
      My CREATE TABLE looks like this:
      Code:
       CREATE TABLE `update` (
        `sid` int(40) NOT NULL auto_increment,
        `bid` int(11) NOT NULL,
        `ven` varchar(100) collate utf8_unicode_ci NOT NULL,
        `address` varchar(100) collate utf8_unicode_ci NOT NULL,
        `city` varchar(100) collate utf8_unicode_ci NOT NULL,
        `state` varchar(100) collate utf8_unicode_ci NOT NULL,
        `zip` varchar(50) collate utf8_unicode_ci NOT NULL,
        `lat` float NOT NULL,
        `long` float NOT NULL,
        `date` date NOT NULL,
        `time` time NOT NULL,
        `accuracy` tinyint(2) NOT NULL default '0',
        `geo` tinyint(1) NOT NULL default '0',
        PRIMARY KEY  (`sid`),
        KEY `getshows_idx` (`lat`,`long`,`date`),
        KEY `location_idx` (`address`,`city`,`state`)
      ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
      The read table is identical, except for the table name.
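
      For what it's worth, I suspect the uniqueness rule could be written directly as a composite key rather than a concatenated field (a sketch I haven't verified; the key name is just illustrative):
      Code:
      -- one unique key over the columns that define a duplicate, so that
      -- INSERT IGNORE or ON DUPLICATE KEY UPDATE can reject repeats
      ALTER TABLE `update` ADD UNIQUE KEY dedup_idx (`bid`, `date`, `time`);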

