Re: website doc search is extremely SLOW
I think that Oleg's new search offering looks really good and fast. (I
can't wait till I have some task that needs tsearch!)
I agree with Dave that searching the docs is more important for me than
the sites - but it would be really nice to have both, in one tool.
I built something similar for the Tate Gallery in the UK - here you can
select the type of content that you want returned, either static pages or
dynamic. You can see the idea at
This is custom built (using Java/Oracle) and supports stemming, boolean
operators, exact phrase matching, relevance ranking and matched-term
highlighting. You can switch off the types of documents that you are not
interested in. By analogy, a search facility that could offer you results
from i) the docs and/or ii) the PostgreSQL sites' static pages would be
very useful.
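
To make the on/off idea concrete, here is a minimal sketch in Python. It is not how the Tate search or any of the tools discussed in this thread works; the `Page` record, the `kind` labels ("docs" and "site"), the `search` function and the sample URLs are all made up for illustration. It only shows a boolean AND over query terms combined with a switchable document-type filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str    # hypothetical sample URL
    kind: str   # hypothetical label: "docs" or "site"
    text: str


def search(pages, query, kinds=("docs", "site")):
    """Return URLs of pages whose kind is switched on and whose
    text contains every query term (boolean AND, case-insensitive)."""
    terms = query.lower().split()
    hits = []
    for page in pages:
        if page.kind not in kinds:
            continue  # this document type is switched off
        words = set(page.text.lower().split())
        if all(term in words for term in terms):
            hits.append(page.url)
    return hits


# Made-up sample data, for illustration only.
PAGES = [
    Page("docs/sql-createtrigger.html", "docs",
         "CREATE TRIGGER defines a new trigger"),
    Page("docs/sql-createtable.html", "docs",
         "CREATE TABLE defines a new table"),
    Page("www/about.html", "site",
         "About the project and how to create an account"),
]

docs_only = search(PAGES, "create trigger", kinds=("docs",))
everything = search(PAGES, "create")
```

Passing `kinds=("docs",)` restricts the search to the documentation, while the default searches both kinds of page, which is the "both, in one tool" behaviour described above. Stemming, phrase matching and ranking are deliberately left out of the sketch.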
John Sidney-Woollett
Dave Cramer said:
> Marc,
>
> No, it doesn't spider; it is a specialized tool for searching documents.
>
> I'm curious, what value is there to being able to count the number of
> URLs?
>
> It does do things like query all documents where CREATE AND TABLE are n
> words apart, just as fast. I would think these are more valuable to
> document searching?
>
> I think the challenge here is what do we want to search. I am betting
> that folks use this page as they would man, i.e. what is the command for
> create trigger?
>
> As I said, my offer stands to help out, but I think if the goal is to
> search the entire website, then this particular tool is not useful.
>
> At this point I am working on indexing the SGML directly as it has less
> cruft in it. For instance, all the links that appear in every summary are
> just noise.
>
>
> Dave
>
> On Wed, 2003-12-31 at 00:44, Marc G. Fournier wrote:
>> On Wed, 31 Dec 2003, Dave Cramer wrote:
>>
>> > I can modify mine to be client server if you want?
>> >
>> > It is a Java app, so we need to be able to run jdk1.3 at least?
>>
>> jdk1.4 is available on the VMs ... does yours spider? For instance, you
>> mention that you have the docs indexed right now, but we are currently
>> indexing:
>>
>> Server http://archives.postgresql.org/
>> Server http://advocacy.postgresql.org/
>> Server http://developer.postgresql.org/
>> Server http://gborg.postgresql.org/
>> Server http://pgadmin.postgresql.org/
>> Server http://techdocs.postgresql.org/
>> Server http://www.postgresql.org/
>>
>> will it be able to handle:
>>
>> 186_archives=# select count(*) from url;
>>  count
>> --------
>>  393551
>> (1 row)
>>
>> as fast as you are finding with just the docs?
>>
>> ----
>> Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
>> Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
>>
> --
> Dave Cramer
> 519 939 0336
> ICQ # 1467551
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
>
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)