How to find max value to find partially duplicate rows

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • chintan85
    New Member
    • Jun 2010
    • 33

    How to find max value to find partially duplicate rows

    I have a table1

    id col1 col2
    1 NC_001A>T 5
    2 NC_001C>G 4
    3 NC_001G>C 3
    4 NC_001_98_G>C 1

    Now, I want to have max col2 value for partially duplicate col1 value

    for example if I take partial value for col 1 values NC_001A>T, NC_001C>G and NC_OO1G>C, I can write following sql query

    Code:
     select col1, MAX (col2) from table1 where col1 = ('NC_001')group by col1
    Result:
    id col1 col2
    1 NC_001A>T 5



    But I am not sure how to find max value for all similar partically duplicate rows i.e NC_002, NC_003 and so on in the table

    Thanking in advance.
  • Rabbit
    Recognized Expert MVP
    • Jan 2007
    • 12517

    #2
    First off, you have to define "partially duplicate" since that can mean a hundred different things.

    Comment

    • chintan85
      New Member
      • Jun 2010
      • 33

      #3
      Partially duplicate values are found in similar pattern in col1.

      for example consider
      NC_001A>T, NC_001C>G are one pair partially similar
      NC_002A>T, NC_002C_G are another pair partially similar

      they all differ by having expression such as C>G or G>T

      Comment

      • zmbd
        Recognized Expert Moderator Expert
        • Mar 2012
        • 5501

        #4
        ChinTan
        I think that you're going to have to start by defining what is the same about the data and what makes it the largest.

        -z

        Comment

        • Rabbit
          Recognized Expert MVP
          • Jan 2007
          • 12517

          #5
          That still doesn't explain exactly what you mean by "partially duplicate". In your first post, is that fourth item (NC_001_98_G>C) a partial duplicate? Are you defining partial duplicate as the first 6 characters? Which would mean the fourth item is a duplicate. Or do you mean that if you strip off the last 3 characters, if the remainder matches, then it's a duplicate? In which case the fourth item is not a duplicate? Or do you have some other definition of duplicate?

          Comment

          • chintan85
            New Member
            • Jun 2010
            • 33

            #6
            Sorry I was not clear but please exclude row 4 NC_001_98_G>C 1.
            Let just concentrate on values which are same when stripping of last three character (i.e A>T or it could be c>G or G>T etc..)

            "Or do you mean that if you strip off the last 3 characters, if the remainder matches, then it's a duplicate?"

            Yes, I want to write a sql query that matches these values and find max 'col2' value for them.

            Is that clear?

            Comment

            • Rabbit
              Recognized Expert MVP
              • Jan 2007
              • 12517

              #7
              What you can do then is use a combination of the left function with the len function to strip off the last 3 characters. Use that in place of your col1 in your previous query and that will give you what you want.

              Comment

              • chintan85
                New Member
                • Jun 2010
                • 33

                #8
                Code:
                select left(col1,LEN(col1) - 3), MAX (col2) from table1 group by col1
                The above the query results in:
                NC_001 5

                which is max from
                1 NC_001A>T 5
                2 NC_001C>G 4
                3 NC_001G>C 3


                now how can I modify query to produce result
                NC_001A>T 5

                which is max from

                Comment

                • chintan85
                  New Member
                  • Jun 2010
                  • 33

                  #9
                  Oh I am sorry the query failed. Can you correct me what I am doing wrong

                  Code:
                  select left(col1,LEN(col1) - 3), MAX (col2) from table1 group by col1
                  I have tried to use where clause but it failed
                  Last edited by chintan85; Sep 13 '12, 04:46 PM. Reason: adding

                  Comment

                  • Rabbit
                    Recognized Expert MVP
                    • Jan 2007
                    • 12517

                    #10
                    There's no where clause in your query.

                    You can get the original value by joining it back to the table on the partial key and value.

                    Comment

                    • chintan85
                      New Member
                      • Jun 2010
                      • 33

                      #11
                      can you modify query to give correct result
                      Code:
                      select left(col1,LEN(col1) - 3) as , MAX (col2) from table1 group by col1

                      Comment

                      • Rabbit
                        Recognized Expert MVP
                        • Jan 2007
                        • 12517

                        #12
                        You need to group by your new calculated field. You also need to give your calculated field an alias.

                        Comment

                        • chintan85
                          New Member
                          • Jun 2010
                          • 33

                          #13
                          I found a smarter way to do this. The below query fetches the row from col1 value where no other row exists with the same col1 value and a greater col2 values for that.


                          Code:
                          SELECT *
                          FROM table1 as t1
                          LEFT OUTER JOIN table1 AS t2
                          ON (left(t1.col1,LEN(t1.col1) - 3)= left(t2.col1,LEN(t2.col1) - 3) AND t1.col2 < t2.col2)
                          WHERE t2.col1 IS Null

                          Comment

                          • ck9663
                            Recognized Expert Specialist
                            • Jun 2007
                            • 2878

                            #14
                            Make sure t2.col1 does not allow NULL. Also, if this will be executed against a large table, use NOT EXISTS instead.

                            Good Luck!!!


                            ~~ CK

                            Comment

                            Working...