Pointer Artithmetic on DB_File values: i need it (or a very efficient solution for this problem)

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Antonio

    Pointer Artithmetic on DB_File values: i need it (or a very efficient solution for this problem)

    Hope that some perl guru will help me:

    I've implemented an inverted index using PERL + DB_File (using hash,
    and b-tree). Therefore, i have my indexer and my searcher (a little
    search engine, indead).

    Given a word in the lexicon i stored in B-tree the corrispondent
    inverted list,
    using gap coding. For istance i have:

    word_r -> id_1r:freq_1r,i d_2r:freq_2r,id _3r:freq_3r,... .....,
    id_nr:freq_nr
    word_s -> ...

    where freqij is just the freq of word_j in document doc_i.

    gap encoding means that id_ij is expressed as difference with
    id_(i-1)j

    Now, i need to write some Cursor obj with state (call it
    postingCursor).
    The posting in fetched in a scalar and (a reference to it!) is saved
    in the Cursor.

    I need to write efficiently a function:

    $postingCursorO BJ->getNext($minID )

    which should return the next available (if any) id_lr >= minID.

    1) I need some way to fetch from the disk, just the portion of
    inverted list involved in current operations. Do tie or DBI support
    this?

    2) I need a way to unroll incrementally the inverted list ? Is this a
    case
    where pointer arithmethic could help ? or there is any efficent
    solution
    which i missing. I think that splitting it is the wrong way to do it.

    PS: the original problem is to find the common IDs, given two inverted
    lists
    L1 and L2 which store IDs sorted in ascending way. This should be done
    in
    O(min {n, m)) instead of O(m+n).
  • Antonio

    #2
    Re: Pointer Artithmetic on DB_File values: i need it (or a very efficient solution for this problem)

    gulli@di.unipi. it (Antonio) wrote in message news:<6c6b5814. 0307080048.2bf8 6f20@posting.go ogle.com>...[color=blue]
    >
    > PS: the original problem is to find the common IDs, given two inverted
    > lists
    > L1 and L2 which store IDs sorted in ascending way. This should be done
    > in
    > O(min {n, m)) instead of O(m+n).[/color]

    It seems that i found a solution using index and substr (it's my fault
    to forget about the importance of index !).

    A question: given a scalar, do index/substr access to the specified offset
    in O(1)?

    Comment

    Working...