Finding duplicates rows in a table

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Gayatree
    New Member
    • Nov 2008
    • 9

    Finding duplicates rows in a table

    I have to concatenate 2 colimns in a table and find duplicates in them.
    I already used the method
    select * from tableA a
    where (select count(*) from TableA b
    where acol1+ +col2 = b.col1+ +col2)>1
    But the performance is very bad.
    Data is also huge
    Can you help me
    Thanks in advance
  • ck9663
    Recognized Expert Specialist
    • Jun 2007
    • 2878

    #2
    Originally posted by Gayatree
    I have to concatenate 2 colimns in a table and find duplicates in them.
    I already used the method
    select * from tableA a
    where (select count(*) from TableA b
    where acol1+ +col2 = b.col1+ +col2)>1
    But the performance is very bad.
    Data is also huge
    Can you help me
    Thanks in advance
    try this to show the duplicates.

    Code:
    select a.*
    from tableA a
    inner join (select col1, col2 from tableA b group by col1, col2 having count(*) > 1) s on a.col1 = s.col1 and s.col2 = a.col2

    Happy coding!

    -- CK

    Comment

    • Gayatree
      New Member
      • Nov 2008
      • 9

      #3
      Hi ck9663,
      Thank u for u r immediate reply.
      But when I run the below query

      select Col1 + + Col2
      from TableA
      group by Col1 + + Col2
      having(count(*) )>1

      I should get the duplicates .
      But the the duplicate which I got from the query which u suggested is
      different from the above qurey.
      Is the above query wrong?
      can u please help me
      Thanks in advance

      Comment

      • ck9663
        Recognized Expert Specialist
        • Jun 2007
        • 2878

        #4
        try:

        Code:
        select Col1, Col2
        from TableA
        group by Col, Col2
        having count(*) >1
        By the way, use the tags when posting.

        -- CK

        Comment

        • Gayatree
          New Member
          • Nov 2008
          • 9

          #5
          Hi CK
          Thank u for u r reply.
          The 2 query's which u have suggested r working well.
          But The count which both of them return should be the same.
          But I am getting 1000 more records with the first one.

          Next time when i post a query I will use tags

          Can u please help me
          If I am not clear with the question me please let me know
          Thank u in advance

          Comment

          • ck9663
            Recognized Expert Specialist
            • Jun 2007
            • 2878

            #6
            This query will give the duplicate:

            Code:
            select Col1, Col2, count(*) as cnt
            from TableA
            group by Col, Col2
            having count(*) >1
            Look at cnt column. If it says "2" it means you have two records on your table with the same col1 and col2. This means it will return the unique combination of the two columns and the number of rows that are duplicate. Not the actual duplicate rows. Make sense?

            -- CK

            Comment

            • Gayatree
              New Member
              • Nov 2008
              • 9

              #7
              YUP ,GOT U

              Thank u So much.

              Comment

              • Gayatree
                New Member
                • Nov 2008
                • 9

                #8
                Sorry for troubling you again.
                I am loading the data using SSIS.
                If there r 2 duplicates only one should be loaded into the
                target table ,the remaining should go into another table.
                Is this possible in SSIS.

                Thank you for helping me and giving me all the above query's
                Thank you

                Comment

                • ck9663
                  Recognized Expert Specialist
                  • Jun 2007
                  • 2878

                  #9
                  Am not sure if it's possible in SSIS.

                  What you could do is to create a trigger on your target table. If the value you're trying to insert on your table is already existing, insert it to another table and rollback the transaction.

                  On T-SQL, it would've been different. Depending on your case, a DISTINCT might be sufficient enough. If not, you might need to further analyze your table and establish an algorithm on which record to keep and which record to throw.

                  -- CK

                  Comment

                  • Gayatree
                    New Member
                    • Nov 2008
                    • 9

                    #10
                    Thank you for u r immedaite answer.
                    I will try creating trigger.

                    In SSMS,when we right click on the table
                    we will get options as script table as create to ...ect
                    But for me alter to is disabled,what should I do to enable that.
                    why will that be disabled when I have permissions to devlop in that box
                    and even execute is disabled.

                    Can u please help me

                    Thank you in advance

                    Comment

                    • ck9663
                      Recognized Expert Specialist
                      • Jun 2007
                      • 2878

                      #11
                      Permission to see the table and do a SELECT on it, yes. Other rights and permissions, no. Call your admin to raise your permission rights.

                      -- CK

                      Comment

                      • Gayatree
                        New Member
                        • Nov 2008
                        • 9

                        #12
                        Hi Ck,
                        Sorry for asking u the same question again.
                        the query above which u suggested will return all the duplicates in a table
                        But I want a select statement that will return only one duplicate.
                        Can u please help me
                        Thanks in advance

                        Comment

                        • ck9663
                          Recognized Expert Specialist
                          • Jun 2007
                          • 2878

                          #13
                          Which query?

                          This:

                          Code:
                          select Col1, Col2
                          from TableA
                          group by Col, Col2
                          having count(*) >1
                          will only return one row for each pair.

                          -- CK

                          Comment

                          • Gayatree
                            New Member
                            • Nov 2008
                            • 9

                            #14
                            Thank you for u r immediate reply.

                            But i need to select all the fileds in the table.
                            For that I got the query like

                            Code:
                            CREATE TABLE [dbo].[Employee] (
                                        [id] [int] NULL ,
                                        [name] [Varchar] (50) COLLATE Latin1_General_CI_AS NULL ,
                                        [salary] [Numeric](18, 2) NULL 
                             ) ON [PRIMARY]
                            
                            Insert into employee values (1,'Ram', 1000.00)
                            Insert into employee values (1,'Ram', 1000.00)
                            Insert into employee values (2,'Joe', 2000.00)
                            Insert into employee values (2,'Joe', 1000.00)
                            Insert into employee values (3,'Mary', 1000.00)
                            Insert into employee values (4,'Julie', 5000.00)
                            Insert into employee values (2,'Joe', 1000.00)
                            Insert into employee values (1,'Ram', 1000.00)
                            
                            --The above is the table and I am trying to find out duplicates
                            
                            --My query is
                            SELECT *  FROM employee
                            WHERE ID IN  (SELECT ID  FROM employee   
                            WHERE EXIST(SELECT NULL FROM employee AS tmptable     
                             WHERE employee.id = tmptable.id
                             group by id          
                            HAVING (employee.id > MIN(tmptable.id))))
                            But it throwing me errors
                            Sorry for troubling u again

                            Thanks in advance

                            Comment

                            • ck9663
                              Recognized Expert Specialist
                              • Jun 2007
                              • 2878

                              #15
                              Joe (ID = 2) has three records with 2 salary values (1000.00 and 2000.00). Which one do you want to return? Is there a pattern?

                              -- CK

                              Comment

                              Working...