performing a whitespace-insensitive query

**Simon Hayes** · Jul 20 '05, 02:44 AM

Re: performing a whitespace-insensitive query

"Andy Fish" <ajfish@blueyon der.co.uk> wrote in message
news:QTaUb.6399 $t05.58794173@n ews-text.cableinet. net...[color=blue]
> Hi,
>
> I am trying to concoct a query that will join rows on the basis of a
> whitespace insensitive comparison. For instance if one row has the value
> 'a<space>b' and another has the value 'a<space><space >b' I want them to be
> considered equal (but not the same as 'ab')
>
> I am happy to do some T-SQL if that helps. Also I have a full-text index[/color]
on[color=blue]
> the column in question, but note that I am comparing rows against each
> other, not to a fixed string.
>
> This is for a one-off job, so if there is no obvious way to do it on the
> database, I will just bcp out the data and knock up some perl or something
> to do it. All other things being equal I would rather do it on the[/color]
database[color=blue]
> though.
>
> Many thanks
>
> Andy
>
>[/color]

If you want to it in the database, and assuming that you have SQL2000, then
one possibility is to write a user-defined function to iterate over a string
and reduce all whitespace characters to a single space, using the standard
string functions. If you only have spaces (ie no tabs or newlines), then
even some nested REPLACE() functions might work.

Another way would be to write a function which instantiates the Regex COM
object using the sp_OA% procedures, and does the same thing with a regular
expression. But both these approaches would be very slow (unless perhaps you
used your UDF as a computed column and indexed it), and the second one
requires sysadmin permissions.

I suspect, though, that your BCP and Perl approach would probably be the
quickest and easiest solution.

Simon

**Gert-Jan Strik** · Jul 20 '05, 02:44 AM

Re: performing a whitespace-insensitive query

SELECT *
FROM A
INNER JOIN B
ON Replace(
Replace(
Replace(
Replace(
Replace(
Replace(A.col, replicate('X',1 32),'X')
, replicate('X', 21),'X')
, replicate('X', 6),'X')
, replicate('X', 3),'X')
, replicate('X', 2),'X')
, replicate('X', 2),'X')
= Replace(
Replace(
Replace(
Replace(
Replace(
Replace(B.col, replicate('X',1 32),'X')
, replicate('X', 21),'X')
, replicate('X', 6),'X')
, replicate('X', 3),'X')
, replicate('X', 2),'X')
, replicate('X', 2),'X')

Hope this helps,
Gert-Jan

Andy Fish wrote:[color=blue]
>
> Hi,
>
> I am trying to concoct a query that will join rows on the basis of a
> whitespace insensitive comparison. For instance if one row has the value
> 'a<space>b' and another has the value 'a<space><space >b' I want them to be
> considered equal (but not the same as 'ab')
>
> I am happy to do some T-SQL if that helps. Also I have a full-text index on
> the column in question, but note that I am comparing rows against each
> other, not to a fixed string.
>
> This is for a one-off job, so if there is no obvious way to do it on the
> database, I will just bcp out the data and knock up some perl or something
> to do it. All other things being equal I would rather do it on the database
> though.
>
> Many thanks
>
> Andy[/color]

**Gert-Jan Strik** · Jul 20 '05, 02:44 AM

Re: performing a whitespace-insensitive query

Hmmm, again I posted to fast. Replace all X with space in the solution
below and it should work...

Gert-Jan

Gert-Jan Strik wrote:[color=blue]
>
> SELECT *
> FROM A
> INNER JOIN B
> ON Replace(
> Replace(
> Replace(
> Replace(
> Replace(
> Replace(A.col, replicate('X',1 32),'X')
> , replicate('X', 21),'X')
> , replicate('X', 6),'X')
> , replicate('X', 3),'X')
> , replicate('X', 2),'X')
> , replicate('X', 2),'X')
> = Replace(
> Replace(
> Replace(
> Replace(
> Replace(
> Replace(B.col, replicate('X',1 32),'X')
> , replicate('X', 21),'X')
> , replicate('X', 6),'X')
> , replicate('X', 3),'X')
> , replicate('X', 2),'X')
> , replicate('X', 2),'X')
>
> Hope this helps,
> Gert-Jan
>
> Andy Fish wrote:[color=green]
> >
> > Hi,
> >
> > I am trying to concoct a query that will join rows on the basis of a
> > whitespace insensitive comparison. For instance if one row has the value
> > 'a<space>b' and another has the value 'a<space><space >b' I want them to be
> > considered equal (but not the same as 'ab')
> >
> > I am happy to do some T-SQL if that helps. Also I have a full-text index on
> > the column in question, but note that I am comparing rows against each
> > other, not to a fixed string.
> >
> > This is for a one-off job, so if there is no obvious way to do it on the
> > database, I will just bcp out the data and knock up some perl or something
> > to do it. All other things being equal I would rather do it on the database
> > though.
> >
> > Many thanks
> >
> > Andy[/color][/color]

**Andy Fish** · Jul 20 '05, 02:44 AM

Re: performing a whitespace-insensitive query

Thanks Gert-Jan, this definitely looks like the right way to do it.

unfortunately I neglected to mention that I have 3/4 million rows in the
table. I left the query running overnight for 9 hours (it used 100%cpu on my
athlon 1.5 ghz) but I had to cancel it. However, it did return 22,500
correct results (I have no idea what proportion this represents).

I think it must have thrown the query optimization out because I can't
beleive it really takes that long. I think I will create temporary columns
with the whitespace stripped (using your function) and then join on those
instead.

after seeing your post I was fascinated by the magic numbers used. I found a
previous thread from 2002 where there was a lot of discussion and you said
that they were an educated guess but verified empirically. The problem of
finding the optimal numbers has lodged itself into my brain now and won't
let go, so I will let you know if I get anywhere with it.

Andy

"Gert-Jan Strik" <sorry@toomuchs pamalready.nl> wrote in message
news:402153E9.B F779772@toomuch spamalready.nl. ..[color=blue]
> Hmmm, again I posted to fast. Replace all X with space in the solution
> below and it should work...
>
> Gert-Jan
>
>
> Gert-Jan Strik wrote:[color=green]
> >
> > SELECT *
> > FROM A
> > INNER JOIN B
> > ON Replace(
> > Replace(
> > Replace(
> > Replace(
> > Replace(
> > Replace(A.col, replicate('X',1 32),'X')
> > , replicate('X', 21),'X')
> > , replicate('X', 6),'X')
> > , replicate('X', 3),'X')
> > , replicate('X', 2),'X')
> > , replicate('X', 2),'X')
> > = Replace(
> > Replace(
> > Replace(
> > Replace(
> > Replace(
> > Replace(B.col, replicate('X',1 32),'X')
> > , replicate('X', 21),'X')
> > , replicate('X', 6),'X')
> > , replicate('X', 3),'X')
> > , replicate('X', 2),'X')
> > , replicate('X', 2),'X')
> >
> > Hope this helps,
> > Gert-Jan
> >
> > Andy Fish wrote:[color=darkred]
> > >
> > > Hi,
> > >
> > > I am trying to concoct a query that will join rows on the basis of a
> > > whitespace insensitive comparison. For instance if one row has the[/color][/color][/color]
value[color=blue][color=green][color=darkred]
> > > 'a<space>b' and another has the value 'a<space><space >b' I want them[/color][/color][/color]
to be[color=blue][color=green][color=darkred]
> > > considered equal (but not the same as 'ab')
> > >
> > > I am happy to do some T-SQL if that helps. Also I have a full-text[/color][/color][/color]
index on[color=blue][color=green][color=darkred]
> > > the column in question, but note that I am comparing rows against each
> > > other, not to a fixed string.
> > >
> > > This is for a one-off job, so if there is no obvious way to do it on[/color][/color][/color]
the[color=blue][color=green][color=darkred]
> > > database, I will just bcp out the data and knock up some perl or[/color][/color][/color]
something[color=blue][color=green][color=darkred]
> > > to do it. All other things being equal I would rather do it on the[/color][/color][/color]
database[color=blue][color=green][color=darkred]
> > > though.
> > >
> > > Many thanks
> > >
> > > Andy[/color][/color][/color]

**Andy Fish** · Jul 20 '05, 02:44 AM

Re: performing a whitespace-insensitive query

OK, I think I have figured out the algorithm to produce the magic numbers in
the formula.

say 10 is a magic number (i.e. it's one of the ones in the list of replicate
functions), then doing a replace of 10 spaces with 1 will reduce anything up
to 108 spaces down to 18 or less. obviously it will also reduce anything up
to 109 spaces with 19 or less, but it just naturally seems that the 108->18
is giving us the best "leverage" from the number 10.

so in general, putting 'n' in the formula reduces n(n+1)-2 spaces to 2(n-1)

In the above example the next term above 10 would need to reduce to X spaces
to 108 spaces or less

solving 2(n-1)=108 gives us n=55, so 55 would reduce anything up to 3078
spaces down to 108 or less

in the general case, say the term after n is m, then

n(n+1)-2 = 2(m-1)

or

m = 1/2 (n)(n+1)

this generates the sequence 2, 3, 6, 21, 231, 26796, 359026206,
644499084768903 00, 207689535133977 000000000000000 0000. I'm sure there is
some rounding in the last couple of terms (I only used excel).

Anyone with high school math will notice that the formula for m is the sum
of all numbers adding up to m, which does make sense although I'm not sure I
could prove why. one analagy is that if you took a triangle of billiard
balls (start with 3) then spread them out so that these form the bottom row
of a bigger next triangle, then spread all those out to form the bottom row
of the next triangle and so on.

Anyway I hope that's of interest to someone. Oh who am I kidding, it was
such a buzz figuring it out I don't care if anyone ever reads this.

Andy

"Gert-Jan Strik" <sorry@toomuchs pamalready.nl> wrote in message
news:402153E9.B F779772@toomuch spamalready.nl. ..[color=blue]
> Hmmm, again I posted to fast. Replace all X with space in the solution
> below and it should work...
>
> Gert-Jan
>
>
> Gert-Jan Strik wrote:[color=green]
> >
> > SELECT *
> > FROM A
> > INNER JOIN B
> > ON Replace(
> > Replace(
> > Replace(
> > Replace(
> > Replace(
> > Replace(A.col, replicate('X',1 32),'X')
> > , replicate('X', 21),'X')
> > , replicate('X', 6),'X')
> > , replicate('X', 3),'X')
> > , replicate('X', 2),'X')
> > , replicate('X', 2),'X')
> > = Replace(
> > Replace(
> > Replace(
> > Replace(
> > Replace(
> > Replace(B.col, replicate('X',1 32),'X')
> > , replicate('X', 21),'X')
> > , replicate('X', 6),'X')
> > , replicate('X', 3),'X')
> > , replicate('X', 2),'X')
> > , replicate('X', 2),'X')
> >
> > Hope this helps,
> > Gert-Jan
> >
> > Andy Fish wrote:[color=darkred]
> > >
> > > Hi,
> > >
> > > I am trying to concoct a query that will join rows on the basis of a
> > > whitespace insensitive comparison. For instance if one row has the[/color][/color][/color]
value[color=blue][color=green][color=darkred]
> > > 'a<space>b' and another has the value 'a<space><space >b' I want them[/color][/color][/color]
to be[color=blue][color=green][color=darkred]
> > > considered equal (but not the same as 'ab')
> > >
> > > I am happy to do some T-SQL if that helps. Also I have a full-text[/color][/color][/color]
index on[color=blue][color=green][color=darkred]
> > > the column in question, but note that I am comparing rows against each
> > > other, not to a fixed string.
> > >
> > > This is for a one-off job, so if there is no obvious way to do it on[/color][/color][/color]
the[color=blue][color=green][color=darkred]
> > > database, I will just bcp out the data and knock up some perl or[/color][/color][/color]
something[color=blue][color=green][color=darkred]
> > > to do it. All other things being equal I would rather do it on the[/color][/color][/color]
database[color=blue][color=green][color=darkred]
> > > though.
> > >
> > > Many thanks
> > >
> > > Andy[/color][/color][/color]

**Delbert Glass** · Jul 20 '05, 02:45 AM

Re: performing a whitespace-insensitive query

Consider: literals.

Here are some solutions
with maximium literal length of 19:
[19, 18, [6,7], 3, 2, 2]
[19, 17, [6,7], 4, 3, 2]
[19, 18, [6,7], 4, 3, 2]

Bye,
Delbert Glass

**Gert-Jan Strik** · Jul 20 '05, 02:45 AM

Re: performing a whitespace-insensitive query

In case you are interested, I just gave sort of an explanation a few
days ago. See

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=utf-8&safe=off&selm=401EB002.E212C85E%40toomuchspamalready.nl

Gert-Jan

Andy Fish wrote:[color=blue]
>
> OK, I think I have figured out the algorithm to produce the magic numbers in
> the formula.
>
> say 10 is a magic number (i.e. it's one of the ones in the list of replicate
> functions), then doing a replace of 10 spaces with 1 will reduce anything up
> to 108 spaces down to 18 or less. obviously it will also reduce anything up
> to 109 spaces with 19 or less, but it just naturally seems that the 108->18
> is giving us the best "leverage" from the number 10.
>
> so in general, putting 'n' in the formula reduces n(n+1)-2 spaces to 2(n-1)
>
> In the above example the next term above 10 would need to reduce to X spaces
> to 108 spaces or less
>
> solving 2(n-1)=108 gives us n=55, so 55 would reduce anything up to 3078
> spaces down to 108 or less
>
> in the general case, say the term after n is m, then
>
> n(n+1)-2 = 2(m-1)
>
> or
>
> m = 1/2 (n)(n+1)
>
> this generates the sequence 2, 3, 6, 21, 231, 26796, 359026206,
> 644499084768903 00, 207689535133977 000000000000000 0000. I'm sure there is
> some rounding in the last couple of terms (I only used excel).
>
> Anyone with high school math will notice that the formula for m is the sum
> of all numbers adding up to m, which does make sense although I'm not sure I
> could prove why. one analagy is that if you took a triangle of billiard
> balls (start with 3) then spread them out so that these form the bottom row
> of a bigger next triangle, then spread all those out to form the bottom row
> of the next triangle and so on.
>
> Anyway I hope that's of interest to someone. Oh who am I kidding, it was
> such a buzz figuring it out I don't care if anyone ever reads this.
>
> Andy
>
> "Gert-Jan Strik" <sorry@toomuchs pamalready.nl> wrote in message
> news:402153E9.B F779772@toomuch spamalready.nl. ..[color=green]
> > Hmmm, again I posted to fast. Replace all X with space in the solution
> > below and it should work...
> >
> > Gert-Jan
> >
> >
> > Gert-Jan Strik wrote:[color=darkred]
> > >
> > > SELECT *
> > > FROM A
> > > INNER JOIN B
> > > ON Replace(
> > > Replace(
> > > Replace(
> > > Replace(
> > > Replace(
> > > Replace(A.col, replicate('X',1 32),'X')
> > > , replicate('X', 21),'X')
> > > , replicate('X', 6),'X')
> > > , replicate('X', 3),'X')
> > > , replicate('X', 2),'X')
> > > , replicate('X', 2),'X')
> > > = Replace(
> > > Replace(
> > > Replace(
> > > Replace(
> > > Replace(
> > > Replace(B.col, replicate('X',1 32),'X')
> > > , replicate('X', 21),'X')
> > > , replicate('X', 6),'X')
> > > , replicate('X', 3),'X')
> > > , replicate('X', 2),'X')
> > > , replicate('X', 2),'X')
> > >
> > > Hope this helps,
> > > Gert-Jan
> > >
> > > Andy Fish wrote:
> > > >
> > > > Hi,
> > > >
> > > > I am trying to concoct a query that will join rows on the basis of a
> > > > whitespace insensitive comparison. For instance if one row has the[/color][/color]
> value[color=green][color=darkred]
> > > > 'a<space>b' and another has the value 'a<space><space >b' I want them[/color][/color]
> to be[color=green][color=darkred]
> > > > considered equal (but not the same as 'ab')
> > > >
> > > > I am happy to do some T-SQL if that helps. Also I have a full-text[/color][/color]
> index on[color=green][color=darkred]
> > > > the column in question, but note that I am comparing rows against each
> > > > other, not to a fixed string.
> > > >
> > > > This is for a one-off job, so if there is no obvious way to do it on[/color][/color]
> the[color=green][color=darkred]
> > > > database, I will just bcp out the data and knock up some perl or[/color][/color]
> something[color=green][color=darkred]
> > > > to do it. All other things being equal I would rather do it on the[/color][/color]
> database[color=green][color=darkred]
> > > > though.
> > > >
> > > > Many thanks
> > > >
> > > > Andy[/color][/color][/color]

**Delbert Glass** · Jul 20 '05, 02:45 AM

Re: performing a whitespace-insensitive query

Here are (the?) 24 solutions
with fewer then 7 stages
and lengths less than 20.

Solution TotalLength
-------------------- -----------
[19, 17, 6, 3, 2, 2] 49
[19, 17, 6, 3, 3, 2] 50
[19, 17, 6, 4, 2, 2] 50
[19, 17, 6, 4, 3, 2] 51
[19, 17, 7, 3, 2, 2] 50
[19, 17, 7, 3, 3, 2] 51
[19, 17, 7, 4, 2, 2] 51
[19, 17, 7, 4, 3, 2] 52
[19, 18, 6, 3, 2, 2] 50
[19, 18, 6, 3, 3, 2] 51
[19, 18, 6, 4, 2, 2] 51
[19, 18, 6, 4, 3, 2] 52
[19, 18, 7, 3, 2, 2] 51
[19, 18, 7, 3, 3, 2] 52
[19, 18, 7, 4, 2, 2] 52
[19, 18, 7, 4, 3, 2] 53
[19, 19, 6, 3, 2, 2] 51
[19, 19, 6, 3, 3, 2] 52
[19, 19, 6, 4, 2, 2] 52
[19, 19, 6, 4, 3, 2] 53
[19, 19, 7, 3, 2, 2] 52
[19, 19, 7, 3, 3, 2] 53
[19, 19, 7, 4, 2, 2] 53
[19, 19, 7, 4, 3, 2] 54

Bye,
Delbert Glass

**Gert-Jan Strik** · Jul 20 '05, 02:45 AM

Re: performing a whitespace-insensitive query

I have run just a few tests with solution (1) [19, 17, 6, 3, 2, 2] and
(2) [132, 21, 6, 3, 2, 2] on SQL7.0, and the second solution uses
significantly less CPU.

On a testset with values 'a'+space(2)+'a ' .. 'a'+space(7998) +'a', the
results were:
solution 1: approx. 3780 ms
solution 2: approx. 2750 ms

On a testset with more small values and fewer large values, the results
show similar differences: 4280 vs 3110 ms and 953 vs 719 ms

Gert-Jan

Delbert Glass wrote:[color=blue]
>
> Here are (the?) 24 solutions
> with fewer then 7 stages
> and lengths less than 20.
>
> Solution TotalLength
> -------------------- -----------
> [19, 17, 6, 3, 2, 2] 49
> [19, 17, 6, 3, 3, 2] 50
> [19, 17, 6, 4, 2, 2] 50
> [19, 17, 6, 4, 3, 2] 51
> [19, 17, 7, 3, 2, 2] 50
> [19, 17, 7, 3, 3, 2] 51
> [19, 17, 7, 4, 2, 2] 51
> [19, 17, 7, 4, 3, 2] 52
> [19, 18, 6, 3, 2, 2] 50
> [19, 18, 6, 3, 3, 2] 51
> [19, 18, 6, 4, 2, 2] 51
> [19, 18, 6, 4, 3, 2] 52
> [19, 18, 7, 3, 2, 2] 51
> [19, 18, 7, 3, 3, 2] 52
> [19, 18, 7, 4, 2, 2] 52
> [19, 18, 7, 4, 3, 2] 53
> [19, 19, 6, 3, 2, 2] 51
> [19, 19, 6, 3, 3, 2] 52
> [19, 19, 6, 4, 2, 2] 52
> [19, 19, 6, 4, 3, 2] 53
> [19, 19, 7, 3, 2, 2] 52
> [19, 19, 7, 3, 3, 2] 53
> [19, 19, 7, 4, 2, 2] 53
> [19, 19, 7, 4, 3, 2] 54
>
> Bye,
> Delbert Glass[/color]

**Delbert Glass** · Jul 20 '05, 02:46 AM

Re: performing a whitespace-insensitive query

I was looking for the earliest solution (with values in decreasing sequence)
not the fastest (earliest finishing) solution.
I think what's fastest is going to depend on what the
distribution of the data is and how REPLACE is implemented.

Here are some solutions:
[[77..104], 9, 4, 2, 2, 2]
with a bunch of two-s.
I'm thinking [104, 9, 4, 2, 2, 2]
might be good to use on data that have lots
of occurances of two space runs.
(I suspect how REPLACE is implemented
will have a significant impact how efficiently
these solutions are executed.)

Here is a solution:
[445, 22, 7, 4, 3, 2]
in which I upsized the numbers
but made no attempt to find the absolute largest,
most largest, or etc.

As you surely already know,
larger values help you skip along;
while, smaller values help you shorten
the string so there is not so much to
dig through (and copy, etc) in later passes.
I think, in general, the last two solutions
mentioned above [104, 9, 4, 2, 2, 2]
and [445, 22, 7, 4, 3, 2]
will run much closer to the speed of
your solution [132, 21, 6, 3, 2, 2]
then the solution [ 19, 17, 6, 3, 2, 2] does
even though they have fewer values in common.

My guess is the way you picked your values
makes a reasonable trade off between
wanting the values to be large and
wanting the values to be small
and thus makes it a reasonable general solution.

---

In some solutions, the values are not in decreasing order.
Such solutions should not be overlooked.

---

Don't forget there might be 8000 "spaces"
that need to get reduce to one "space".

Bye,
Delbert Glass

"Gert-Jan Strik" <sorry@toomuchs pamalready.nl> wrote in message
news:4023EEDE.D 13DFD21@toomuch spamalready.nl. ..[color=blue]
> I have run just a few tests with solution (1) [19, 17, 6, 3, 2, 2] and
> (2) [132, 21, 6, 3, 2, 2] on SQL7.0, and the second solution uses
> significantly less CPU.
>
> On a testset with values 'a'+space(2)+'a ' .. 'a'+space(7998) +'a', the
> results were:
> solution 1: approx. 3780 ms
> solution 2: approx. 2750 ms
>
> On a testset with more small values and fewer large values, the results
> show similar differences: 4280 vs 3110 ms and 953 vs 719 ms
>
> Gert-Jan
>
>
> Delbert Glass wrote:[color=green]
> >
> > Here are (the?) 24 solutions
> > with fewer then 7 stages
> > and lengths less than 20.
> >
> > Solution TotalLength
> > -------------------- -----------
> > [19, 17, 6, 3, 2, 2] 49
> > [19, 17, 6, 3, 3, 2] 50
> > [19, 17, 6, 4, 2, 2] 50
> > [19, 17, 6, 4, 3, 2] 51
> > [19, 17, 7, 3, 2, 2] 50
> > [19, 17, 7, 3, 3, 2] 51
> > [19, 17, 7, 4, 2, 2] 51
> > [19, 17, 7, 4, 3, 2] 52
> > [19, 18, 6, 3, 2, 2] 50
> > [19, 18, 6, 3, 3, 2] 51
> > [19, 18, 6, 4, 2, 2] 51
> > [19, 18, 6, 4, 3, 2] 52
> > [19, 18, 7, 3, 2, 2] 51
> > [19, 18, 7, 3, 3, 2] 52
> > [19, 18, 7, 4, 2, 2] 52
> > [19, 18, 7, 4, 3, 2] 53
> > [19, 19, 6, 3, 2, 2] 51
> > [19, 19, 6, 3, 3, 2] 52
> > [19, 19, 6, 4, 2, 2] 52
> > [19, 19, 6, 4, 3, 2] 53
> > [19, 19, 7, 3, 2, 2] 52
> > [19, 19, 7, 3, 3, 2] 53
> > [19, 19, 7, 4, 2, 2] 53
> > [19, 19, 7, 4, 3, 2] 54
> >
> > Bye,
> > Delbert Glass[/color][/color]

**Andy Fish** · Jul 20 '05, 02:47 AM

Re: performing a whitespace-insensitive query

sorry for a late reply but I have been away

thanks for pointing me at this post - I guess it hadn't reached google's
server when I did the search.

I especially liked Robert Carnegie's approach. Unfortunately I don't have
the time to benchmark the possible options but using robert's approach I did
this

1. create a second column on the table with the duplicate spaces removed (30
secs)
2. created an index on this new column (30 secs)
3. do a join to find all rows where the original value is different but
space-insensitive value is the same (15 secs)

as for the proper solution with the stacked replaces, I think my numbers
give the greatest amount of squashing power for the least number of replace
functions, even though they might use more CPU.

Andy

"Gert-Jan Strik" <sorry@toomuchs pamalready.nl> wrote in message
news:4022C3CE.5 94D1E89@toomuch spamalready.nl. ..[color=blue]
> In case you are interested, I just gave sort of an explanation a few
> days ago. See
>[/color]
http://groups.google.com/groups?hl=e...spamalready.nl[color=blue]
>
> Gert-Jan
>
>
> Andy Fish wrote:[color=green]
> >
> > OK, I think I have figured out the algorithm to produce the magic[/color][/color]
numbers in[color=blue][color=green]
> > the formula.
> >
> > say 10 is a magic number (i.e. it's one of the ones in the list of[/color][/color]
replicate[color=blue][color=green]
> > functions), then doing a replace of 10 spaces with 1 will reduce[/color][/color]
anything up[color=blue][color=green]
> > to 108 spaces down to 18 or less. obviously it will also reduce anything[/color][/color]
up[color=blue][color=green]
> > to 109 spaces with 19 or less, but it just naturally seems that the[/color][/color]
108->18[color=blue][color=green]
> > is giving us the best "leverage" from the number 10.
> >
> > so in general, putting 'n' in the formula reduces n(n+1)-2 spaces to[/color][/color]
2(n-1)[color=blue][color=green]
> >
> > In the above example the next term above 10 would need to reduce to X[/color][/color]
spaces[color=blue][color=green]
> > to 108 spaces or less
> >
> > solving 2(n-1)=108 gives us n=55, so 55 would reduce anything up to 3078
> > spaces down to 108 or less
> >
> > in the general case, say the term after n is m, then
> >
> > n(n+1)-2 = 2(m-1)
> >
> > or
> >
> > m = 1/2 (n)(n+1)
> >
> > this generates the sequence 2, 3, 6, 21, 231, 26796, 359026206,
> > 644499084768903 00, 207689535133977 000000000000000 0000. I'm sure there is
> > some rounding in the last couple of terms (I only used excel).
> >
> > Anyone with high school math will notice that the formula for m is the[/color][/color]
sum[color=blue][color=green]
> > of all numbers adding up to m, which does make sense although I'm not[/color][/color]
sure I[color=blue][color=green]
> > could prove why. one analagy is that if you took a triangle of billiard
> > balls (start with 3) then spread them out so that these form the bottom[/color][/color]
row[color=blue][color=green]
> > of a bigger next triangle, then spread all those out to form the bottom[/color][/color]
row[color=blue][color=green]
> > of the next triangle and so on.
> >
> > Anyway I hope that's of interest to someone. Oh who am I kidding, it was
> > such a buzz figuring it out I don't care if anyone ever reads this.
> >
> > Andy
> >
> > "Gert-Jan Strik" <sorry@toomuchs pamalready.nl> wrote in message
> > news:402153E9.B F779772@toomuch spamalready.nl. ..[color=darkred]
> > > Hmmm, again I posted to fast. Replace all X with space in the solution
> > > below and it should work...
> > >
> > > Gert-Jan
> > >
> > >
> > > Gert-Jan Strik wrote:
> > > >
> > > > SELECT *
> > > > FROM A
> > > > INNER JOIN B
> > > > ON Replace(
> > > > Replace(
> > > > Replace(
> > > > Replace(
> > > > Replace(
> > > > Replace(A.col, replicate('X',1 32),'X')
> > > > , replicate('X', 21),'X')
> > > > , replicate('X', 6),'X')
> > > > , replicate('X', 3),'X')
> > > > , replicate('X', 2),'X')
> > > > , replicate('X', 2),'X')
> > > > = Replace(
> > > > Replace(
> > > > Replace(
> > > > Replace(
> > > > Replace(
> > > > Replace(B.col, replicate('X',1 32),'X')
> > > > , replicate('X', 21),'X')
> > > > , replicate('X', 6),'X')
> > > > , replicate('X', 3),'X')
> > > > , replicate('X', 2),'X')
> > > > , replicate('X', 2),'X')
> > > >
> > > > Hope this helps,
> > > > Gert-Jan
> > > >
> > > > Andy Fish wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I am trying to concoct a query that will join rows on the basis of[/color][/color][/color]
a[color=blue][color=green][color=darkred]
> > > > > whitespace insensitive comparison. For instance if one row has the[/color]
> > value[color=darkred]
> > > > > 'a<space>b' and another has the value 'a<space><space >b' I want[/color][/color][/color]
them[color=blue][color=green]
> > to be[color=darkred]
> > > > > considered equal (but not the same as 'ab')
> > > > >
> > > > > I am happy to do some T-SQL if that helps. Also I have a full-text[/color]
> > index on[color=darkred]
> > > > > the column in question, but note that I am comparing rows against[/color][/color][/color]
each[color=blue][color=green][color=darkred]
> > > > > other, not to a fixed string.
> > > > >
> > > > > This is for a one-off job, so if there is no obvious way to do it[/color][/color][/color]
on[color=blue][color=green]
> > the[color=darkred]
> > > > > database, I will just bcp out the data and knock up some perl or[/color]
> > something[color=darkred]
> > > > > to do it. All other things being equal I would rather do it on the[/color]
> > database[color=darkred]
> > > > > though.
> > > > >
> > > > > Many thanks
> > > > >
> > > > > Andy[/color][/color][/color]

performing a whitespace-insensitive query

performing a whitespace-insensitive query

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment