Deleting duplicates of rows

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • chazzy69
    New Member
    • Sep 2007
    • 196

    Deleting duplicates of rows

    Hi im new to msql, anyway i have made up a table which contains a bunch of results from a search and what i want to do is to delete any duplicate rows in my database.

    I not actually sure how to go about achieve this, i thought perphaps performing a search combined with some check method and a delete statement.

    Is there perphaps a msql function that can achieve this??

    Also note that when i say duplicate copies, it does not necesarilly mean that all the colums two rows while be the same but will have a enough in common to be able to tell if duplicate.

    Any help with this or a reference would be great, thanks
  • coolsti
    Contributor
    • Mar 2008
    • 310

    #2
    Basically it sounds like you have the wrong table structure if you have duplicate rows that you wish to remove. Either your program that puts data into the database, or the database itself, through its structure, should filter out the rows that you consider to be duplicate before they even get put into the database tables.

    Take a look at the mysql documentation (e.g. online) about the "INSERT INTO ..... ON DUPLICATE KEY UPDATE" syntax. This will do what you want.

    For example, this query statement:

    INSERT INTO mytable (id,xpos,ypos,s tring1,string2, string3)
    values (25, 2, 5, 'hello','world' ,'from me')
    ON DUPLICATE KEY UPDATE string1 = VALUES(string1) ;

    This is only an example, and surely does not do what you need. But take a look what it does. In this case, let us say that I have set up my table "mytable" to have a unique key composed of the three columns id, xpos and ypos.

    Let us say I try to insert into the database a set of values for which the combination of id, xpos and ypos are not yet in the database. In this case, the insert part of the query will be carried out, and the row will be inserted into the database.

    Now let us say I try to insert a set of values for which the combination of id, xpos and ypos already are present in the table. Because of the "ON DUPLICATE KEY" clause, mysql will not return an error saying I am trying to insert a duplicate key, but instead will carry out the statement after the ON DUPLICATE KEY keywords. In my case above, I ask to update the value stored in the table for string1 with the value in my insert statement, but to leave the current values of string2 and string3 alone. You can update any or none of the other values, as you need.

    What I am trying to say here is, instead of trying to find a query to remove the rows you consider to be duplicate, you should instead change your queries and your database structure (by adding unique keys) so that these duplicate rows do not even get into the database.

    Comment

    • chazzy69
      New Member
      • Sep 2007
      • 196

      #3
      First thanks for replying, secondly this sounds like what i probably need to do, also i do have a few unique colums approximatley about five of them if needed to differenatate between rows and another six or so rows that contain arbitry data that only relates to the other colums but can vary in the case of a duplicate of of a row.

      i.e. The first five colums can be used to find a duplicate but the rest does vary.
      meaning that i have check for duplicate in the first five colums, but i need to enter data also in the rest that may not be the same. You may ask why are there differences between duplicates? its because im retrieve the same information from different sources so that the sources are identical to each other.

      Next i kinda understand the code example your using, correct me if i wrong but it seems that a specific key/colum name is been update when there is a duplicate rather than actually not adding the row at all.

      What im asking is it possible to not update a certain key/colum but rather just skip over the row.

      Note im using a php script to input data into the database.

      Thanks for help,

      Comment

      • chazzy69
        New Member
        • Sep 2007
        • 196

        #4
        I think i just realised that the duplicate key means that it will skip putting in the next entry and just update a certain colum or not.

        This will work fine but can it determine a duplicate key with only five of 10 colums been identical?

        Thanks for help

        Comment

        • coolsti
          Contributor
          • Mar 2008
          • 310

          #5
          You need to determine which columns in your table need to be identical in order for it to be considered by you as being a duplicate row.

          If your table has 10 columns, and you say that 5 of them need to be unique, then you need to create a unique key for that table which consists of these 5 columns, and not the others.

          Here an example:

          CREATE TABLE `mytable` (
          `id` smallint(5) unsigned NOT NULL auto_increment,
          `firstname` varchar(12) NOT NULL,
          `lastname` varchar(12) NOT NULL,
          `zipcode` smallint(3) unsigned NOT NULL default '0',
          `age` smallint(3) unsigned default NULL,
          `haircolor` varchar(20) NOT NULL default '',
          PRIMARY KEY (`id`),
          UNIQUE KEY (`firstname`,`l astname`,`zipco de`) )

          In the above case I declare that the columns firstname, lastname and zipcode have to be unique. But not the age or haircolor. So if I tried to add a row which had the same combination of firstname, lastname and zipcode of another row, I would not be allowed to. But the database does not care about what is in the other two columns, age and haircolor. Only the other three columns must be unique.

          And yes, if you use the "insert ... on duplicate key .... " command, it will insert the row if a duplicate row would not occur, but instead would perform what you ask it to do after the "on duplicate key" clause otherwise. So you can decide yourself which of the columns will be updated or left alone in the event that an attempt is made to create a duplicate key.

          Comment

          • chazzy69
            New Member
            • Sep 2007
            • 196

            #6
            Thanks this helps heaps, so i only need to specify something as a unique key just once and then the duplicate will only look at the colums specified.


            Thanks agains, this really helps

            Comment

            • chazzy69
              New Member
              • Sep 2007
              • 196

              #7
              This seems to work quite well except when one of the unique keys is different it still recognises that it is a duplicate.

              Is it possible to it to recognise a duplicate only when say all five unique keys are idenitical.


              Thanks agian for all the help

              Comment

              • coolsti
                Contributor
                • Mar 2008
                • 310

                #8
                This should not happen if you have it configured correctly.

                Try to post your table structure here, and the query you are using to insert.

                If you have access to it, try to do a "SHOW CREATE TABLE tablename"
                and post that. It will tell the indices, keys, etc. that you have set up.

                You may have some other key as well, because if you have created a unique key composed of 5 columns, all 5 columns would need to be the same for the insert to be disallowed. So if only a subset of those 5 columns is the same, then you may have some other key.

                You also can try to do the query without the "on duplicate key" part, and have the mysql error printed out and look at that. If it is not inserting a row because it thinks the row is a duplicate, the error should give you an indication of the criteria that mysql used for this decision.

                Comment

                • chazzy69
                  New Member
                  • Sep 2007
                  • 196

                  #9
                  Heres the query im using in php

                  Code:
                  $con = mysql_connect("127.0.0.1","a_table","password");
                  if (!$con)
                    		{
                    			die('Could not connect: ' . mysql_error());
                    		}
                   
                  		else {
                  
                  			mysql_select_db("realestate_aus", $con);
                  
                  
                  			$sql = "INSERT INTO search_results (prices, beds, baths, car, logo, houseimage, propadd, blur, 					moreinfo, state, pcode, suburb) 
                  			VALUES 
                  			('12000', 
                  			'2', 
                  			'2', 
                  			'4', 
                  			'logos', 
                  			'image', 
                  			'address', 
                  			'blur4', 
                  			'info', 
                  			'nsw', 
                  			'1100', 
                  			'liverpool' )
                  			ON DUPLICATE KEY UPDATE blur = VALUES(blur)";
                  
                  		if (!mysql_query($sql,$con))
                    		{
                    			die('Error: ' . mysql_error());
                    		}
                  		echo "Your Form Successfully submitted" . '<br />';
                  
                  		mysql_close($con);
                  
                  
                  }
                  -------------------

                  and table is like this but can't draw a actual table sorry:

                  Prices| Beds| Baths| Car| Logo| Houseimage| Propadd| Blur| Moreinfo| Index| State| Pcode| Suburb|

                  thats the best way i can desiplay;

                  The colums that are currently unique are prices, beds, baths, car, state, pcode, suburb.

                  [EDIT]

                  I got that show table thing to work heres the results, looks a mess though sorry,

                  CREATE TABLE `search_results ` (\n `prices` varchar(30) NOT NULL,\n `beds` int(10) NOT NULL,\n `baths` int(10) NOT NULL,\n `car` int(10) NOT NULL,\n `logo` varchar(200) NOT NULL,\n `houseimage` varchar(200) NOT NULL,\n `propadd` varchar(50) NOT NULL,\n `blur` varchar(400) NOT NULL,\n `moreinfo` varchar(50) NOT NULL,\n `Index` int(11) NOT NULL auto_increment, \n `state` varchar(10) NOT NULL,\n `pcode` int(15) NOT NULL,\n `suburb` varchar(30) NOT NULL,\n UNIQUE KEY `prices` (`prices`),\n UNIQUE KEY `beds` (`beds`),\n UNIQUE KEY `baths` (`baths`),\n UNIQUE KEY `car` (`car`),\n UNIQUE KEY `state` (`state`),\n UNIQUE KEY `pcode` (`pcode`),\n UNIQUE KEY `suburb` (`suburb`),\n KEY `Index` (`Index`)\n) ENGINE=MyISAM AUTO_INCREMENT= 482 DEFAULT CHARSET=utf8

                  Comment

                  • amitpatel66
                    Recognized Expert Top Contributor
                    • Mar 2007
                    • 2358

                    #10
                    Originally posted by chazzy69
                    Heres the query im using in php

                    Code:
                    $con = mysql_connect("127.0.0.1","a_table","password");
                    if (!$con)
                      		{
                      			die('Could not connect: ' . mysql_error());
                      		}
                     
                    		else {
                    
                    			mysql_select_db("realestate_aus", $con);
                    
                    
                    			$sql = "INSERT INTO search_results (prices, beds, baths, car, logo, houseimage, propadd, blur, 					moreinfo, state, pcode, suburb) 
                    			VALUES 
                    			('12000', 
                    			'2', 
                    			'2', 
                    			'4', 
                    			'logos', 
                    			'image', 
                    			'address', 
                    			'blur4', 
                    			'info', 
                    			'nsw', 
                    			'1100', 
                    			'liverpool' )
                    			ON DUPLICATE KEY UPDATE blur = VALUES(blur)";
                    
                    		if (!mysql_query($sql,$con))
                      		{
                      			die('Error: ' . mysql_error());
                      		}
                    		echo "Your Form Successfully submitted" . '<br />';
                    
                    		mysql_close($con);
                    
                    
                    }
                    -------------------

                    and table is like this but can't draw a actual table sorry:

                    Prices| Beds| Baths| Car| Logo| Houseimage| Propadd| Blur| Moreinfo| Index| State| Pcode| Suburb|

                    thats the best way i can desiplay;

                    The colums that are currently unique are prices, beds, baths, car, state, pcode, suburb.

                    [EDIT]

                    I got that show table thing to work heres the results, looks a mess though sorry,

                    CREATE TABLE `search_results ` (\n `prices` varchar(30) NOT NULL,\n `beds` int(10) NOT NULL,\n `baths` int(10) NOT NULL,\n `car` int(10) NOT NULL,\n `logo` varchar(200) NOT NULL,\n `houseimage` varchar(200) NOT NULL,\n `propadd` varchar(50) NOT NULL,\n `blur` varchar(400) NOT NULL,\n `moreinfo` varchar(50) NOT NULL,\n `Index` int(11) NOT NULL auto_increment, \n `state` varchar(10) NOT NULL,\n `pcode` int(15) NOT NULL,\n `suburb` varchar(30) NOT NULL,\n UNIQUE KEY `prices` (`prices`),\n UNIQUE KEY `beds` (`beds`),\n UNIQUE KEY `baths` (`baths`),\n UNIQUE KEY `car` (`car`),\n UNIQUE KEY `state` (`state`),\n UNIQUE KEY `pcode` (`pcode`),\n UNIQUE KEY `suburb` (`suburb`),\n KEY `Index` (`Index`)\n) ENGINE=MyISAM AUTO_INCREMENT= 482 DEFAULT CHARSET=utf8
                    Please use [CODE] tags for any source code that you post in this forum. That would make toe source code look much clear and readable for other users.

                    Comment

                    • coolsti
                      Contributor
                      • Mar 2008
                      • 310

                      #11
                      Your dump of the "show create table" shows your error clearly.

                      Examine again the way you need to define a unique key composed of more than one column. You have done this incorrectly.

                      You have defined many unique keys each composed of one table column. Instead, you want to have one unique key composed of all columns:
                      ...
                      UNIQUE KEY `prices` (`prices`,`beds `,`baths`,`car` ,`state`,`pcode `,`suburb`),
                      ...

                      Look to the documentation (e.g. online) on how to add keys to your database tables.

                      Hint: in the above, the unique key composed of 7 columns has been given the name "prices" but this is not important what it is called.

                      Comment

                      • chazzy69
                        New Member
                        • Sep 2007
                        • 196

                        #12
                        Ok i check the manual and i found a reference for making a single key for more than one colum e.g. -

                        Code:
                        UNIQUE KEY (col1, col2)
                        So then i took your example from above removing all the quotation marks and im getting sql syntax, i also tried it with the quotations; i tried this directly in mysql database rather than a php file just to make sure it was right.

                        So im little confused as to what i supposed to do.

                        Note i now understand that the unique key can be for more than one colum.

                        Thanks for the help

                        Comment

                        • chazzy69
                          New Member
                          • Sep 2007
                          • 196

                          #13
                          I did a little more research and is it possible that you have to make the unique key on the creation of the table or is possible to edit the table.

                          Thanks again

                          Comment

                          • chazzy69
                            New Member
                            • Sep 2007
                            • 196

                            #14
                            First sorry for the triple posting;

                            I think i finally figured out how to implement the unique key - used this statement does it look right???


                            Code:
                            ALTER TABLE search_results ADD UNIQUE KEY mykey(prices,beds,baths,car,state,pcode,suburb)
                            [EDIT]

                            OK i tested ok the my inserting code and it seems to be working,

                            thank you very much for you help

                            Comment

                            • coolsti
                              Contributor
                              • Mar 2008
                              • 310

                              #15
                              Exactly, you found the right syntax for adding a unique key consisting of multiple columns.

                              Good luck! Glad I could help. I also have learned all of this rather recently, mostly by surfing through the online mysql documentation.

                              By the way, those were not quotation marks in the create table statements, they were these funny little slanted accent marks. They are not necessary if your column name does not have any spaces in it. Single quotation marks will not work here, as Mysql will interpret this then as a string variable and not a column name.

                              Comment

                              Working...