making tsearch2 dictionaries

**Ben** · Nov 22 '05, 09:00 AM

Re: making tsearch2 dictionaries

Okay, so I was actually able to answer this question on my own, in a
manner of speaking. It seems the way to do this is to merely return a
larger char** array, with one element for each word. But I was having
trouble with postgres crashing, because (I think) it tries to free each
element independently before using all of them. I had set each element
to a different null-terminated chunk of the same palloc'd memory
segment. Having never written C stored procs before, I take it that's
bad practice?

Anyway, now that this is working, my next question is: can I take the
lexemes from one dictionary lookup and pipe them into another
dictionary? I see that I can have redundant dictionaries, such that if
lexemes aren't found in one it'll try another, but that's not quite the
same.

For instance, the en_stem dictionary converts "hundred" into "hundr".
Right now, my dictionary converts "100" into "one" and "hundred", but
I'd like it to filter both one and hundred through the en_stem
dictionary to arrive at "one" and "hundr".
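
Nothing in this thread shows tsearch2 doing this for you, so one option is to do the chaining inside the custom dictionary itself. The following is only a sketch under stated assumptions: `normalize_fn`, `toy_stem`, and `pipe_lexemes` are made-up names, `toy_stem` is a stand-in that knows only the "hundred" example (it is not en_stem), and `malloc` stands in for `palloc` so the code compiles without PostgreSQL headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature: returns a malloc'ed copy of the normalized word. */
typedef char *(*normalize_fn)(const char *word);

static char *
copy_string(const char *s)
{
    size_t len = strlen(s) + 1;     /* include the terminating '\0' */
    char  *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Toy stand-in for a stemmer: it only knows the example from this thread. */
char *
toy_stem(const char *word)
{
    return copy_string(strcmp(word, "hundred") == 0 ? "hundr" : word);
}

/*
 * Feed every lexeme of a NULL-terminated array through `second` and
 * return a new NULL-terminated array holding the results.
 */
char **
pipe_lexemes(char **lexemes, normalize_fn second)
{
    size_t n = 0;
    char **out;

    assert(lexemes != NULL && second != NULL);
    while (lexemes[n] != NULL)
        n++;

    out = malloc((n + 1) * sizeof(char *));
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
    {
        out[i] = second(lexemes[i]);
        if (out[i] == NULL)         /* allocation failed: undo and give up */
        {
            while (i > 0)
                free(out[--i]);
            free(out);
            return NULL;
        }
    }
    out[n] = NULL;
    return out;
}
```

Calling `pipe_lexemes` on the array {"one", "hundred", NULL} with `toy_stem` gives a new array holding "one" and "hundr".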

It also occurs to me I could pipe things through an ispell dictionary
and be able to handle misspellings....

On Sun, 2004-02-15 at 15:35, Ben wrote:
> I'm trying to make myself a dictionary for tsearch2 that converts
> numbers to their English word equivalents. This seems to be working
> great, except that I can't figure out how to make my lexize function
> return multiple lexemes. For instance, I'd like "100" to get converted
> to {one,hundred}, not {"one hundred"} as is currently happening.
>
> How do I specify the output of the lexize function so that this will
> happen?


**Teodor Sigaev** · Nov 22 '05, 09:00 AM

Re: making tsearch2 dictionaries

From http://www.sai.msu.su/~megera/oddmus...ch_V2_in_Brief

Table for storing dictionaries. The dict_init field stores the Oid of the
function that initializes the dictionary. Dict_init takes one option, the
text value from dict_initoption, and should return the internal
representation (structure) of the dictionary. The structure must be
malloc'ed or palloc'ed in TopMemoryContext. Dict_init is called only once
per process. The dict_lexize field stores the Oid of the function that
lemmatizes a lexeme. Input values: the dictionary structure, a pointer to
the string, and its length. Output: a pointer to an array of pointers to
C-strings. The last pointer in the array must be NULL. Returning NULL means
that the dictionary can't resolve the word, while returning an empty array
means that the dictionary knows the input word but considers it a stop word.
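
The three possible results described above can be told apart as in the sketch below. It is plain C with no PostgreSQL headers, and the enum and function names are invented for illustration; they are not part of tsearch2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three outcomes described above. */
typedef enum
{
    LEX_UNKNOWN,    /* NULL result: the dictionary can't resolve the word */
    LEX_STOPWORD,   /* empty array: the word is known but is a stop word */
    LEX_FOUND       /* one or more lexemes followed by a NULL terminator */
} LexizeOutcome;

/* Count the entries of a NULL-terminated array of C-strings. */
size_t
count_lexemes(char **res)
{
    size_t n = 0;

    assert(res != NULL);
    while (res[n] != NULL)
        n++;
    return n;
}

/* Classify a lexize-style result; *nlexemes receives the lexeme count. */
LexizeOutcome
classify_lexize_result(char **res, size_t *nlexemes)
{
    assert(nlexemes != NULL);
    *nlexemes = 0;
    if (res == NULL)
        return LEX_UNKNOWN;
    *nlexemes = count_lexemes(res);
    return (*nlexemes == 0) ? LEX_STOPWORD : LEX_FOUND;
}
```

For the "100" example, an array holding "one", "hundred", and a final NULL would be classified as `LEX_FOUND` with a count of 2.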


--
Teodor Sigaev E-mail: teodor@sigaev.ru


**Tom Lane** · Nov 22 '05, 09:00 AM

Re: making tsearch2 dictionaries


Given Teodor's response, I think the issue is probably that you were
palloc'ing in too short-lived a context. But whatever the problem is,
you'll narrow it down a lot faster if you build with --enable-cassert.
I wouldn't ever recommend trying to debug C functions without that.

regards, tom lane


**Teodor Sigaev** · Nov 22 '05, 09:00 AM

Re: making tsearch2 dictionaries

Excuse me, I was too brief.
I mean that your dictionary's lexize method should return a pointer to an
array with 3 elements:
the first should point to the C-string "one", the second to the C-string
"hundred", and the third is NULL.
The array and the C-strings should be palloc'ed in a short-lived context,
because they live only while the text is being parsed.
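
A minimal sketch of building such an array follows. To keep it compilable without PostgreSQL headers it uses `malloc` as a stand-in for `palloc` (an assumption of this example), and `make_lexemes` is a made-up helper name. Each string gets its own allocation, so nothing breaks if the caller releases the elements one by one, as Ben suspected.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy `n` words into a freshly allocated, NULL-terminated array.
 * malloc stands in for palloc here. Unlike palloc, malloc can return
 * NULL, so this sketch returns NULL on allocation failure; a real
 * lexize function reserves NULL for "word not recognized".
 */
char **
make_lexemes(const char *const *words, size_t n)
{
    char **res;

    assert(words != NULL || n == 0);

    /* n pointers plus one slot for the terminating NULL */
    res = malloc((n + 1) * sizeof(char *));
    if (res == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
    {
        size_t len = strlen(words[i]) + 1;  /* include the '\0' */

        res[i] = malloc(len);
        if (res[i] == NULL)
        {
            while (i > 0)
                free(res[--i]);
            free(res);
            return NULL;
        }
        memcpy(res[i], words[i], len);
    }
    res[n] = NULL;
    return res;
}
```

With the words "one" and "hundred" this yields exactly the three-element array described above.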


**Ben** · Nov 22 '05, 09:00 AM

Re: making tsearch2 dictionaries

Thanks for the replies. Just to clarify what I was doing, my quasicode
looked something like:

phrase = palloc(8);
phrase = "foo\0bar\0 ";
res = palloc(3);
res[0] = phrase[0];
res[1] = phrase[5];
res[2] = 0;

That crashed. Once I changed it to:

res = palloc(3);
res[0] = palloc(4);
res[0] = "foo\0";
res[1] = palloc(4);
res[2] = "bar\0";
res[3] = 0;

it worked.

Anyway, I'm happy to forget my pain with this if only I could figure out
how to pipe the lexemes from one dictionary into another dictionary. :)


**Ben** · Nov 22 '05, 09:01 AM

Re: making tsearch2 dictionaries

Like I said, quasicode. :)

And in fact I see I even put an off-by-one error in this last email that
wasn't in my function. (Honest!) Should have been "res[1] = phrase[4]"
in the first section.

Are there docs for making parsers? Or anything like gendict?

On Mon, 2004-02-16 at 09:25, Teodor Sigaev wrote:
> :)
> I hope you mean:
> res = palloc(3 * sizeof(char *)); /* two pointers plus the NULL terminator */
> res[0] = palloc(4);               /* "foo" plus its terminating '\0' */
> memcpy(res[0], "foo", 4);
> res[1] = palloc(4);
> memcpy(res[1], "bar", 4);
> res[2] = NULL;
>
> Look at the indexes of res.


**Teodor Sigaev** · Nov 22 '05, 09:01 AM

Re: making tsearch2 dictionaries

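:)
I hope you mean:
res = palloc(3 * sizeof(char *)); /* two pointers plus the NULL terminator */
res[0] = palloc(4);               /* "foo" plus its terminating '\0' */
memcpy(res[0], "foo", 4);
res[1] = palloc(4);
memcpy(res[1], "bar", 4);
res[2] = NULL;

Look at the indexes of res.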


**Teodor Sigaev** · Nov 22 '05, 09:01 AM

Re: making tsearch2 dictionaries

Brief docs are available at

http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_in_Brief

and in the current implementation at contrib/tsearch2/wparser_def.c. The
largest part of that code deals with headlines.


**Oleg Bartunov** · Nov 22 '05, 09:01 AM

Re: making tsearch2 dictionaries

By the way, Ben, if you get your dictionary working, could you describe the
development process so that other people can appreciate your work? This part
of the tsearch2 documentation is very weak.

Oleg


Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


**Ben** · Nov 22 '05, 09:01 AM

Re: making tsearch2 dictionaries

So I noticed. ;) The dictionary's working, and I'd be happy to expand
upon the documentation. Just point me at something to work on.

But, like I said, I really want to figure out a way to pipe the output
of my dictionary through another dictionary. If I can't do that, it
doesn't seem as useful, because "100" (handled by my dictionary) and
"one hundred" (handled by en_stem) currently don't generate the same
tsvector.

Once I figure out how to tweak the parser to parse things the way I
want, I can expand upon those docs too. Looks like I'm going to need to
reach waaaay back into my brain and dust off my flex knowledge for that,
though....


**Oleg Bartunov** · Nov 22 '05, 09:01 AM

Re: making tsearch2 dictionaries

On Mon, 16 Feb 2004, Ben wrote:
> So I noticed. ;) The dictionary's working, and I'd be happy to expand
> upon the documentation. Just point me at something to work on.

I think you could simply write a paper, "How I built a custom dictionary for
tsearch2". From what I've read, your dictionary could be interesting to
people, especially if you describe the motivation and usage.
Do you want '100' and 'hundred' to be fully equivalent? So if you search
for '100' you will find documents with 'hundred'. Interestingly, you will
also find '123', because '123' becomes 'one hundred twenty three'.

> But, like I said, I really want to figure out a way to pipe the output
> of my dictionary through another dictionary. If I can't do that, it
> doesn't seem as useful, because "100" (handled by my dictionary) and
> "one hundred" (handled by en_stem) currently don't generate the same
> tsvector.

What's the problem? You can configure which dictionaries are used, and in
what order, for a given type of token (pg_ts_cfgmap table).
Aha, I see your problem:

www=# select * from ts_debug('one hundred');
     ts_name     | tok_type | description |  token  | dict_name | tsvector
-----------------+----------+-------------+---------+-----------+----------
 default_russian | lword    | Latin word  | one     | {en_stem} | 'one'
 default_russian | lword    | Latin word  | hundred | {en_stem} | 'hundr'

'hundred' becomes 'hundr'. You could use a synonym dictionary, which is
rather simple
(see http://www.sai.msu.su/~megera/oddmus...earch_V2_Notes for details).
Once a word is recognized by the synonym dictionary, it is not passed to
the next dictionary! This is how tsearch2 works with any dictionary.
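
The first-match rule just described can be pictured with the sketch below. It is a toy model, not tsearch2 code: `lexize_fn`, `toy_synonym`, `toy_stemmer`, and `lexize_with_chain` are hypothetical names, and the two toy dictionaries know only the "hundred" example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary callback: NULL means "word not recognized". */
typedef char **(*lexize_fn)(const char *word);

/* Toy synonym dictionary: recognizes only "hundred" and keeps it unstemmed. */
char **
toy_synonym(const char *word)
{
    static char *hundred[] = {"hundred", NULL};

    return strcmp(word, "hundred") == 0 ? hundred : NULL;
}

/*
 * Toy stemmer: recognizes every word but only knows how to stem
 * "hundred". It uses a static buffer, so it is not reentrant.
 */
char **
toy_stemmer(const char *word)
{
    static char  buf[64];
    static char *res[] = {buf, NULL};

    if (strcmp(word, "hundred") == 0)
        word = "hundr";
    strncpy(buf, word, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    return res;
}

/*
 * Try each dictionary in order and stop at the first one that
 * recognizes the word. Returns NULL if none does.
 */
char **
lexize_with_chain(const lexize_fn *dicts, size_t ndicts, const char *word)
{
    assert(dicts != NULL && word != NULL);
    for (size_t i = 0; i < ndicts; i++)
    {
        char **res = dicts[i](word);

        if (res != NULL)
            return res;     /* later dictionaries never see this word */
    }
    return NULL;
}
```

With the chain {toy_synonym, toy_stemmer}, "hundred" is answered by the synonym dictionary and never reaches the stemmer, while "one" falls through to the stemmer.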

> Once I figure out how to tweak the parser to parse things the way I
> want, I can expand upon those docs too. Looks like I'm going to need to
> reach waaaay back into my brain and dust off my flex knowledge for that,
> though....

What do you want from the parser?

**Ben** · Nov 22 '05, 09:01 AM

Re: making tsearch2 dictionaries

On Tue, 2004-02-17 at 03:15, Oleg Bartunov wrote:
> Do you want '100' and 'hundred' to be fully equivalent? So if you search
> for '100' you will find documents with 'hundred'. Interestingly, you will
> also find '123', because '123' becomes 'one hundred twenty three'.

Yeah, for a general case of documents I'm not sure how accurate it would
make things, but I'm trying to index music artist names and song titles,
where I'd get things like "3 Dog Night".... or is that "Three Dog
Night"? :)
> What's the problem? You can configure which dictionaries are used, and in
> what order, for a given type of token (pg_ts_cfgmap table).
> Aha, I see your problem:

> Once a word is recognized by the synonym dictionary, it is not passed to
> the next dictionary! This is how tsearch2 works with any dictionary.

Yep, that's my problem. :) And it seems that if I could pass the normal
words into an ispell dictionary before passing them on to the en_stem
dictionary, I'd get spell checking for free. Unless there's a better way
to give "did you mean: <your search spelled correctly>?" results....?

I know doing this would increase the size of the generated tsvector,
but for my case, where what I'm indexing is generally only a few words
anyway, that's not an issue. As it is, I'm already going to get rid of
the stop words file, so that I can actually find things like "The Who."

How hard do you think it would be to change up the behavior to make this
happen? I
> What do you want from the parser?

I want to be able to recognize symbols, such as the degree (°) and
vulgar half (½) symbols.


**Oleg Bartunov** · Nov 22 '05, 09:01 AM

Re: making tsearch2 dictionaries

On Tue, 17 Feb 2004, Ben wrote:
> Yep, that's my problem. :) And it seems that if I could pass the normal
> words into an ispell dictionary before passing them on to the en_stem
> dictionary, I'd get spell checking for free. Unless there's a better way
> to give "did you mean: <your search spelled correctly>?" results....?

If the ispell dictionary recognizes a word, that word will not be passed on
to en_stem. We know how to add a "query spelling" feature to tsearch2; we
are just waiting for sponsorship :) Meanwhile, you could use our trgm
module, which implements trigram-based spelling correction. You need to
maintain a separate table with all words of interest (say, from tsvectors)
and search for query words in that table using the bestmatch function.
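
To give a rough idea of what trigram-based matching means, here is a simplified sketch. It is not the trgm module's code or its exact scoring: the padding scheme (two leading spaces, one trailing), the shared-over-union score, and the names `trgm_similarity` and `best_match` are all assumptions made for this illustration, and a plain array of words stands in for the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64                 /* longest word (with '\0') handled here */
#define MAX_TRGM (MAX_WORD + 1)     /* n letters give at most n + 1 trigrams */

/* Collect the distinct trigrams of `word` padded as "  word ". */
static size_t
trigrams(const char *word, char out[][4])
{
    char   padded[MAX_WORD + 3];
    size_t len = strlen(word);
    size_t n = 0;

    if (len == 0 || len >= MAX_WORD)
        return 0;                   /* sketch: skip empty or oversized words */

    padded[0] = padded[1] = ' ';
    memcpy(padded + 2, word, len);
    padded[len + 2] = ' ';

    for (size_t i = 0; i <= len; i++)
    {
        size_t j = 0;

        while (j < n && memcmp(out[j], padded + i, 3) != 0)
            j++;
        if (j == n)                 /* not seen before: store it */
        {
            memcpy(out[n], padded + i, 3);
            out[n][3] = '\0';
            n++;
        }
    }
    return n;
}

/* Shared trigrams divided by all distinct trigrams of both words. */
double
trgm_similarity(const char *a, const char *b)
{
    char   ta[MAX_TRGM][4], tb[MAX_TRGM][4];
    size_t na, nb, shared = 0;

    assert(a != NULL && b != NULL);
    na = trigrams(a, ta);
    nb = trigrams(b, tb);
    if (na == 0 || nb == 0)
        return 0.0;
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (memcmp(ta[i], tb[j], 3) == 0)
            {
                shared++;
                break;
            }
    return (double) shared / (double) (na + nb - shared);
}

/* Return the most similar word, or NULL if nothing shares a trigram. */
const char *
best_match(const char *query, const char *const *words, size_t nwords)
{
    const char *best = NULL;
    double      best_score = 0.0;

    for (size_t i = 0; i < nwords; i++)
    {
        double score = trgm_similarity(query, words[i]);

        if (score > best_score)
        {
            best_score = score;
            best = words[i];
        }
    }
    return best;
}
```

Given the word list {"one", "hundred", "thousand"}, the misspelled query "hundrd" shares trigrams only with "hundred", so that is the suggestion returned.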
> > What do you want from the parser?
>
> I want to be able to recognize symbols, such as the degree (ôá) and
> vulgar half (ôî) symbols.

You mean '(TA)', '(TH)'? I think it's not very difficult. What would
the token type be (parenthesis_word :?)

**Ben** · Nov 22 '05, 09:01 AM

Re: making tsearch2 dictionaries

On Tue, 17 Feb 2004, Oleg Bartunov wrote:
> If the ispell dictionary recognizes a word, that word will not be passed on
> to en_stem. We know how to add a "query spelling" feature to tsearch2; we
> are just waiting for sponsorship :) Meanwhile, you could use our trgm
> module, which implements trigram-based spelling correction. You need to
> maintain a separate table with all words of interest (say, from tsvectors)
> and search for query words in that table using the bestmatch function.

Hm, I'll take a look at this approach. I take it you think piping
dictionary output to more dictionaries in the chain is a bad idea? :)
> > > What do you want from the parser?
> >
> > I want to be able to recognize symbols, such as the degree (ôá) and
> > vulgar half (ôî) symbols.
>
> You mean '(TA)', '(TH)'? I think it's not very difficult. What would
> the token type be (parenthesis_word :?)

Uh, not sure how you got (TA) and (TH)... if you look at the original
message with utf-8 unicode encoding, the symbols come out fine. Or maybe
you'd just have better luck pointing a browser at a page like
http://homepages.comnet.co.nz/~r-mah...text/utf8.html. I want to be
able to recognize a subset of these symbols, and I'd write another
dictionary to handle the symbol token and return both the symbol and its
common name as lexemes, in case people spell out the symbol instead of
entering it.
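
Such a symbol dictionary could boil down to a lookup table, as in the sketch below. Everything here is an assumption for illustration: the `SymbolEntry` type, the `symbol_lexize` name and signature, and the common names chosen for the two symbols. A real tsearch2 dictionary would return a palloc'ed array from its lexize function instead of filling a caller-supplied one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table entry: the symbol as UTF-8 bytes and its common name. */
typedef struct
{
    const char *symbol;
    const char *name;
} SymbolEntry;

/* Illustrative table; the names are assumptions made for this sketch. */
static const SymbolEntry symbol_table[] = {
    {"\xC2\xB0", "degree"},     /* U+00B0 DEGREE SIGN */
    {"\xC2\xBD", "half"},       /* U+00BD VULGAR FRACTION ONE HALF */
};

/*
 * Fill `out` with {symbol, name, NULL} for a known symbol and return 1,
 * or return 0 when the token is not in the table.
 */
int
symbol_lexize(const char *token, const char *out[3])
{
    size_t nentries = sizeof(symbol_table) / sizeof(symbol_table[0]);

    assert(token != NULL && out != NULL);
    for (size_t i = 0; i < nentries; i++)
    {
        if (strcmp(token, symbol_table[i].symbol) == 0)
        {
            out[0] = symbol_table[i].symbol;
            out[1] = symbol_table[i].name;
            out[2] = NULL;
            return 1;
        }
    }
    return 0;
}
```

Emitting both lexemes means a search for the spelled-out name and a search for the symbol itself would hit the same indexed entry.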
