I would like some advice on a data and query problem I face. I have a
data table with a "raw key" value which is not guaranteed to be valid
at its source. Normally, this value will be 9 numeric digits and map to
a "names" table where the entity is given assigned an "official name".
My problem is that I'd like to be able to identify data values that are
"close" to being "correct". For example, in the case of a
nine digit number such as 077467881, I'd like to be able to identify
rows with values close to this raw string. That is, if
there were a row with a value for this column that was "off" by say, a
transposed single digit (such as 077647881 in this example)
I would like to find a query to locate the "close candidates" in a
result set. If I can find rows having a raw key
value that is close to a "good key" then I can allow my user to use
other criteria to possibly assign the "close key" as
an alternate or alias of the official key. Here is part of my schema:
CREATE TABLE MYData (
StateCD char (2) NOT NULL ,
CountyCD char (3) NOT NULL ,
MYID int NULL ,
RawNumString varchar(9) NULL ,
SaleMnYear datetime NOT NULL ,
NumberWidgets int NOT NULL ,
)
CREATE TABLE MYNames (
MYID int IDENTITY (1, 1) NOT NULL ,
OfficialName varchar (70) NOT NULL ,
CONSTRAINT PK_MYNames PRIMARY KEY CLUSTERED
(
MYID
)
)
CREATE TABLE MYAltID (
RawNumString varchar (9) NOT NULL ,
MYID int NOT NULL ,
CONSTRAINT PK_MYALTID PRIMARY KEY CLUSTERED
(
RawNumString
) ,
CONSTRAINT FK_HasName FOREIGN KEY
(
MYID
) REFERENCES MYNames (
MYID
)
)
So, how to generalize something like:
SELECT * FROM MYData WHERE RawNumString = '077467881'
OR RawNumString = '077647881'
data table with a "raw key" value which is not guaranteed to be valid
at its source. Normally, this value will be 9 numeric digits and map to
a "names" table where the entity is given assigned an "official name".
My problem is that I'd like to be able to identify data values that are
"close" to being "correct". For example, in the case of a
nine digit number such as 077467881, I'd like to be able to identify
rows with values close to this raw string. That is, if
there were a row with a value for this column that was "off" by say, a
transposed single digit (such as 077647881 in this example)
I would like to find a query to locate the "close candidates" in a
result set. If I can find rows having a raw key
value that is close to a "good key" then I can allow my user to use
other criteria to possibly assign the "close key" as
an alternate or alias of the official key. Here is part of my schema:
CREATE TABLE MYData (
StateCD char (2) NOT NULL ,
CountyCD char (3) NOT NULL ,
MYID int NULL ,
RawNumString varchar(9) NULL ,
SaleMnYear datetime NOT NULL ,
NumberWidgets int NOT NULL ,
)
CREATE TABLE MYNames (
MYID int IDENTITY (1, 1) NOT NULL ,
OfficialName varchar (70) NOT NULL ,
CONSTRAINT PK_MYNames PRIMARY KEY CLUSTERED
(
MYID
)
)
CREATE TABLE MYAltID (
RawNumString varchar (9) NOT NULL ,
MYID int NOT NULL ,
CONSTRAINT PK_MYALTID PRIMARY KEY CLUSTERED
(
RawNumString
) ,
CONSTRAINT FK_HasName FOREIGN KEY
(
MYID
) REFERENCES MYNames (
MYID
)
)
So, how to generalize something like:
SELECT * FROM MYData WHERE RawNumString = '077467881'
OR RawNumString = '077647881'
Comment