It's Me again!

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • TVining
    New Member
    • Dec 2007
    • 26

    It's Me again!

    Okay,
    I have a table with a LONGTEXT field. Some records have 1000+ characters in the suspect field. Since it's a LONGTEXT field, i can't figure out how to set a no-dupe index on it.

    I get duplicate records where the first 100 characters are the same. If the first 100 are the same, the entire record is the same. Guaranteed. I have a field that includes the first 50 chars, but sometimes they are the same. (fld 1 "James d. Smith" fld 2= "James d. Smith has deployed" or "James d. Smith has been promoted"

    I would like to eliminate existing duplicates. My thought is to create a TEXT field with 100 length, use a query to copy the first 100 chars over to it for each record. THEN, copy the structure, make that field UNIQUE then copy the records over, thus eliminating the duplicates.

    Anyone have any better ideas? anyone have any code that does this?
  • iburyak
    Recognized Expert Top Contributor
    • Nov 2006
    • 1016

    #2
    Here is my little test to simulate what you have and remove duplicates. I worked with first 14 characters just to make it small.


    Code:
    1. Create a table
    create table #a (name varchar(100))
    2. Insert some records
    insert into #a values ('James d. Smith has deployed')
    insert into #a values ('James d. Smith has deployed')
    insert into #a values ('b James d. Smith')
    insert into #a values ('a James d. Smith has deployed')
    insert into #a values ('James d. Smith has deployed')
    insert into #a values ('James d. Smith has deployed')
    insert into #a values ('b James d. Smith')
    insert into #a values ('a James d. Smith has deployed')
    
    3. Run code: 
    set rowcount 1
    
    select 1
    While @@Rowcount > 0
    BEGIN
    delete from #a where substring(name,1,14) = (select top 1 substring(name,1,14)
    					     from #a
    					     group by substring(name,1,14)
    					     having count(*) > 1)
    END
    set rowcount 0 
    
    4. Check result
    select * from #a order by 1
    Your result should be:

    a James d. Smith has deployed
    b James d. Smith
    James d. Smith has deployed

    Then you can add a column to the table with first 100 characters to be able to have unique records.

    Good Luck.

    Comment

    Working...