Changes between Version 3 and Version 4 of FlintPositionListTable

Show
Ignore:
Timestamp:
2008-08-21 07:46:25 (2 years ago)
Author:
olly
Comment:

Update for 1.0.7

Legend:

Unmodified
Added
Removed
Modified
  • FlintPositionListTable

    v3 v4  
    22 
    33This page describes the format of the Position List table in the FlintBackend. 
    4 This table stores the list of positions in a given document at which a term appears. Term positions are required for phrase queries. 
     4This table stores the list of positions in a given document at which a term appears. 
     5Term positions are required for phrase queries. 
    56 
    67== Key Format == 
    78 
    8 Quartz stores the key as: pack_uint(did) + tname 
     9{{{ 
     10pack_uint_preserving_sort(docid) + tname 
     11}}} 
    912 
    10 Flint currently uses pack_uint_preserving_sort(docid) + tname - this sometimes takes one extra byte 
    11 compared to 
    12 Quartz, but means that when appending documents to a database, the insert is always 
    13 in the same place (at the "end" of the table).  This is faster, and produces a more compact 
    14 database without a separate compaction step. 
     13This sometimes takes one extra byte compared to Quartz, which used 
     14{{{pack_uint(did) + tname}}}, but it does mean that when appending documents 
     15to a database, the insert is always in the same place (at the "end" of the table). 
     16This is faster, and produces a more compact database without a separate compaction 
     17step. 
    1518 
    16 (The eventual plan is to subclass the compare routine so we can store the key 
    17 as compactly as Quartz does but keep the improved sort order.) 
     19The eventual plan is to subclass the compare routine so we can store the key 
     20as compactly as Quartz did but keep the improved sort order. 
    1821 
    1922== Tag Format == 
    2023 
    21 Quartz store the differences between each position and the previous one, using the 7 bit encoding 
    22 scheme implemented by pack_uint() and unpack_uint(). 
    23  
    24 Flint uses an interpolative coding to store positions (pretty much as described in Managing Gigabytes). 
     24Flint uses an interpolative coding to store term positions (pretty much as described in 
     25Managing Gigabytes).  This is particularly compact when there are many occurrences of 
     26a term in a document, which helps speed up positional searches involving common terms.