How efficient are queries that use DISTINCT (vs filtering data yourself)

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Brosert
    New Member
    • Jul 2008
    • 57

    How efficient are queries that use DISTINCT (vs filtering data yourself)

    I have a query on a database that returns a large number of rows from a database (somewhere near a million) to an application that then does some processing with them. Some of these returns may be duplicates, and I considered using DISTINCT to make sure these duplicates do not occur, however I suspect I would be better off filtering the duplicates within the application, rather than within the database.

    My justification would be as follows:
    - Using DISTINCT means the entire query needs to complete before any results are returned
    - Filtering myself means the results will start to be returned almost immediately, so I can be processing the results as the query is still executing.

    Just wondering whether people more familiar with how databases (in particular Oracle) work.

    (This application is written in C#, however I suppose the bigger point is questioning whether there's any point using 'distinct' in a cursor (other than in a sub-query) in any language....)
  • rski
    Recognized Expert Contributor
    • Dec 2006
    • 700

    #2
    Did you try to optimize the query in any way?
    Is it very complex query? Did you check explain plan for it?
    I think that processing 1mln rows from the application will also be very time consuming. How do you want to code that distinct in application(usi ng some hash table?)?

    Comment

    • Brosert
      New Member
      • Jul 2008
      • 57

      #3
      1) Not sure what you mean by optimisation. I am making sure that where possible, indexes are being used for joins (the data was designed to be queried roughly how I query it)
      2) The query joins on 3 tables, but I wouldn't consider it overly complex
      3) No I did not have a look at the explain plan, because I didn't really think it was overly relevant to the question - I hoped someone might discuss whether on the whole they think it more efficient to use distinct or filter in the app.
      4) 1 million rows may be time consuming either way, but the specific question I was interested in is whether using DISTINCT is likely to be more or less efficient than filtering Application side.

      For the record, I have found that in this case at least, it seems a lot more efficient to filter in the application rather than on the database - and is far more flexible

      Comment

      • rski
        Recognized Expert Contributor
        • Dec 2006
        • 700

        #4
        4) yes, but if you do filtering on the application side you will have to send 1mln rows to the application and it is wrong from the performance point of view. If you can do something with the database (like get distinct columns) you should do that.

        Comment

        Working...