Find Credit Card numbers in text field

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • code green
    Recognized Expert Top Contributor
    • Mar 2007
    • 1726

    Find Credit Card numbers in text field

    Data Protection etc.... shouldn't be there.
    How do I find credit card numbers buried amongst any length of text.

    Googling finds credit card validation claims, which are too complicated.

    I need to find groups of numbers that resemble credit card numbers.
    I think the criteria is 13 - 24 digits in groups of 4 or 5
    seperated by spaces or dashes or bunched together.

    It looks like a regex using PATINDEX is the answer but regex is black magic to me
    Have I got this right and could I please have some pattern suggestions?
    Code:
    SELECT notes FROM cust_order WHERE PATINDEX('pattern',notes) > 0
  • Uncle Dickie
    New Member
    • Nov 2008
    • 67

    #2
    Using a similar function to the one I posted last time you could get something that might help you on your way (I'm not sure what CK had in mind for an alternative approach with String Parsing so I'm bumbling on as before !):

    Code:
    CREATE FUNCTION [dbo].[R_Test_func2](@sub nvarchar(1000))
    RETURNS int
    AS
    BEGIN
    
    DECLARE @Output int
    DECLARE @myOutput int
    SET @Output = 0
    SET @myOutput = 0
    
    WHILE patindex('%[0123456789]%', @sub) <> 0
     BEGIN
      SET @Output = 
       len(substring
       (
        substring(@sub,patindex('%[0123456789]%',@sub),len(@sub))
        ,1
        ,CASE
         WHEN patindex('%[^0123456789 -]%',substring(@sub,patindex('%[0123456789]%',@sub),len(@sub))) = 0
         THEN len(@sub)
         ELSE patindex('%[^0123456789 -]%',substring(@sub,patindex('%[0123456789]%',@sub),len(@sub)))-1
        END
       ))
    
      SET @sub = 
       CASE
        WHEN patindex('%[^0123456789 -]%',substring(@sub,patindex('%[0123456789]%',@sub),len(@sub))) = 0
        THEN 'a'
        ELSE substring
        (
         substring(@sub,patindex('%[0123456789]%',@sub),len(@sub))
         ,patindex('%[^0123456789 -]%',substring(@sub,patindex('%[0123456789]%',@sub),len(@sub)))
         ,len(@sub)
        )
       END
    
      SET @myOutput = 
       CASE
        WHEN @Output > @myOutput THEN @Output ELSE @myOutput
        END
      END
     RETURN  @myOutput
    
    END
    The general principle is the same as before

    It looks for the 1st numeric character:
    Code:
    patindex('%[0123456789]%',@sub)
    Then for the 1st non numeric character thereafter (this time also allowing for a space or -):
    Code:
    patindex('%[^0123456789 -]%',substring(@sub,patindex('%[0123456789]%',@sub),len(@sub)))
    The length of this string is stored and then the rest of the string is searched.
    The resulting output will be the length of the longest string of consecutive numbers (including spaces and -)

    So, you could use a WHERE statement such as:

    Code:
    WHERE dbo.R_Test_Func2(notes) > 12
    You could get rid of one of the two variables @Output and @myOutput by doing the SET as a CASE statement in one go but I felt this may complicate the (already less than easy to read) code too much!

    As a note, another fix from the original function I posted is that it will now work if the last character is non-numeric, hence the addition of a couple of CASE statements buried in there. In the previous version if the last character was not a number then it would try and return a string of either length 0 or starting from the 0th character which I discovered it did not like very much!


    Hope this helps

    Comment

    • code green
      Recognized Expert Top Contributor
      • Mar 2007
      • 1726

      #3
      Looks like it could work Uncle Dickie.
      I am under a bit of pressure with this one now, so will give it a try

      Comment

      • ck9663
        Recognized Expert Specialist
        • Jun 2007
        • 2878

        #4
        It meant the same thing :) You're basically parsing the TEXT which is by common definition is a string. As string parsing is reading the string one character at a time or finding patterns within it.

        As I always said, if that algorithm gave you the performance you need, that should be fine, until someone suggested a better one :)

        Thanks Uncle Dickie and Happy Coding CG :)


        --- CK

        Comment

        • Uncle Dickie
          New Member
          • Nov 2008
          • 67

          #5
          Having looked a bit more at the patindex function, a much easier option than my previous function would simply be:

          Code:
          SELECT notes
          FROM   cust_order
          WHERE  patindex('%[0-9][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -]%',TranID) > 0
          This is just looking for a string of 13 consecutive numbers, spaces or '-' (it must start with a number).

          Comment

          Working...