Is there a more efficient way to get unique results than by using DISTINCT?

  • Jesse Jones
    New Member
    • Jan 2011
    • 51

    Is there a more efficient way to get unique results than by using DISTINCT?

    I am successfully calling around half a million records, but I have a problem with duplicate records. Without using the DISTINCT function I can call these records in half a second, but when using the DISTINCT function it takes about five minutes for Access to weed out the duplicate records. Is there a more efficient way to get rid of these duplicate records?

    (I am most familiar with SQL and only moderately familiar with Access, so I am working in the SQL View. If your solution is better implemented in the Design View I can work with that fine also.)

    Let me know if you need more information.
    Thank you!
  • code green
    Recognized Expert Top Contributor
    • Mar 2007
    • 1726

    #2
    I have found GROUP BY to be quicker than DISTINCT.
    Not much, but I have never tried with that many records.

    Post the SQL and maybe somebody can suggest a more efficient query.
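    For illustration (made-up table and field names, since I haven't seen your SQL), the two forms look like this:

    Code:
    -- Using DISTINCT:
    SELECT DISTINCT [CustomerID], [City]
    FROM [tblOrders];

    -- The equivalent GROUP BY, which I have found marginally quicker:
    SELECT [CustomerID], [City]
    FROM [tblOrders]
    GROUP BY [CustomerID], [City];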


    • TheSmileyCoder
      Recognized Expert Moderator Top Contributor
      • Dec 2009
      • 2322

      #3
      Is there any possibility of removing the duplicate records? Another possibility is to use a smaller query (i.e. one on fewer fields) to determine uniqueness, and then join that query to your main query, along the lines of the sketch below.
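      Something like this, with purely hypothetical table and field names:

      Code:
      -- A narrow query, saved as qryUniqueKeys, that does the DISTINCT work
      -- on as few (ideally indexed) fields as possible:
      SELECT DISTINCT [CustomerID]
      FROM [tblOrders];

      -- The main query then joins to that to pick up the wide field list,
      -- so DISTINCT never has to compare the whole record:
      SELECT C.*
      FROM [tblCustomers] AS C INNER JOIN qryUniqueKeys AS Q
        ON C.[CustomerID] = Q.[CustomerID];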


      • NeoPa
        Recognized Expert Moderator MVP
        • Oct 2006
        • 32661

        #4
        Interesting question, Jesse. Personally, I would have expected a DISTINCT predicate to be implemented in such a way as to be more efficient than a GROUP BY, but I guess there may be complications with optimisation of the basic concept when many fields are involved, and particularly when non-indexed fields are involved. That isn't from testing, just the logic that DISTINCT has less work to do and less checking involved. Each implementation has its own way of doing things though, hence I expect you're finding DISTINCT slower than GROUP BY in your case.

        I'd be interested to know what determines, in your case, that two records are duplicates. Typically this is done by comparing the PKs, but it may be that you are comparing every field (essentially the full record), which of course is orders of magnitude less efficient.

        Smiley's idea is a good one going forward, but I'd make sure, if at all possible, that the compared field(s) are all included in a single index. How practical that is, only you, knowing your use of the data, can determine. This is one of the areas where db design experience comes in.
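        For example, if duplicates are determined by comparing [Field1] and [Field2] in a table [MyTable] (names purely illustrative), a single multi-field index can be built with a DDL query:

        Code:
        CREATE INDEX idxDupeCheck
        ON [MyTable] ([Field1], [Field2]);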


        • Jesse Jones
          New Member
          • Jan 2011
          • 51

          #5
          Here is my code:

          Code:
          SELECT [Call Log].[Client ID],
                 DCount("[Call ID]","[Call Log]",
                        "[Call Code]<>'NOM' AND [Client ID]=" & [Clients].[Client ID]) AS CountactsCount,
                 [Call Log].[Call Code],
                 Clients.[First Name], Clients.[Last Name], Clients.Sfx,
                 Clients.[Mailing Address], Clients.City, Clients.ST, Clients.Zip,
                 Clients.Phone, Clients.Type, Clients.[Time Zone], Clients.Gender,
                 DCount("Phone","Clients") AS countTotal
          FROM Clients LEFT JOIN [Call Log]
            ON Clients.[Client ID] = [Call Log].[Client ID]
          WHERE DCount("[Call ID]","[Call Log]",
                       "[Call Code]<>'NOM' AND [Client ID]=" & [Clients].[Client ID]) = 0;

          Here's the background: I have a table of clients and a table that serves as a call log. I want to find all of the people who have not given a reply to our inquiry. We do this by using the DCount shown above to count how many times in the call log a client has given us a response other than the no-response code. (In other words, how many times they have actually responded. If that count is >0, we don't need to call them again.)

          This works. The problem is that we get a record back for every call to that client. I only want the client to show once. DISTINCT works, but it takes quite a long time. Any thoughts would help.

          Thanks.


          • NeoPa
            Recognized Expert Moderator MVP
            • Oct 2006
            • 32661

            #6
            Jesse, if you're going to use a Domain Aggregate function within a query then you can't really expect anything in the way of performance. I'm only surprised it waited for you to add the DISTINCT predicate before slowing down. If you have a record source that needs to be included in the logic of your query then a JOIN of some form is a much more logical approach to use.
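            A sketch of that approach (the saved query name qryResponders is my own invention, and this is untested against your data):

            Code:
            -- qryResponders: every client who has given at least one real response.
            SELECT DISTINCT [Client ID]
            FROM [Call Log]
            WHERE [Call Code] <> 'NOM';

            -- Main query: a frustrated outer join keeps only the clients with no
            -- match in qryResponders, replacing both the row-by-row DCount() and
            -- the need for DISTINCT (each client can appear at most once).
            SELECT Clients.*
            FROM Clients LEFT JOIN qryResponders
              ON Clients.[Client ID] = qryResponders.[Client ID]
            WHERE qryResponders.[Client ID] Is Null;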

            Unfortunately, most RDBMSs run so fast that even SQL like this example performs acceptably until large numbers of records are involved. People often don't even consider their design structure until they reach that stage, by which time it is really too late. I'm not saying it's beyond repair, but that sort of approach does tend to waste a great deal of your time in the long run if you want to design a database that can handle the sort of numbers you're talking about.


            • Jesse Jones
              New Member
              • Jan 2011
              • 51

              #7
              Okay, thanks for the thought. I'm sorry, but I'm pretty unschooled in this. This database was my first and I am learning as I go. Unfortunately (from a technical standpoint), the demand for this information has grown to a degree that was incomprehensible a few months ago. Am I reading that right to understand that I'm pretty much in big trouble? The size of this is only going to grow. I am in the process of converting to SQL Server, but we won't get there for a few weeks at least, and I need this filter running in the next day or two.

              Any suggestions?


              • NeoPa
                Recognized Expert Moderator MVP
                • Oct 2006
                • 32661

                #8
                Originally posted by Jesse Jones:
                Am I reading that right to understand that I'm pretty much in big trouble?
                I would say yes.

                SQL Server certainly gives you more options, but I would stick with my earlier advice of designing a query to process all the inputs required. Domain Aggregate functions simply do not work efficiently within SQL.


                • Rabbit
                  Recognized Expert MVP
                  • Jan 2007
                  • 12517

                  #9
                  For the WHERE clause I would try a subquery, and for the count in the SELECT clause I would try a cross join if it's feasible; if not, I would use a subquery there as well. I would also make sure all the appropriate indexes have been created. Something along the lines of the sketch below.
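                  Using the table and field names from post #5 (untested, and trimmed to a few fields for clarity):

                  Code:
                  SELECT Clients.[Client ID], Clients.[First Name],
                         Clients.[Last Name], Clients.Phone, Totals.countTotal
                  FROM Clients,
                       (SELECT Count(Phone) AS countTotal
                        FROM Clients) AS Totals
                  WHERE Clients.[Client ID] NOT IN
                        (SELECT [Client ID]
                         FROM [Call Log]
                         WHERE [Call Code] <> 'NOM');

                  The comma in the FROM clause is the cross join: it attaches the single-row total to every client without a per-row DCount. Note that NOT IN can itself be slow on large tables, so NeoPa's outer-join version may still be the faster option for the filter itself.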

