Changes between Version 3 and Version 4 of FlintPositionListTable
- Timestamp: 2008-08-21 07:46:25
FlintPositionListTable
--- v3
+++ v4

 This page describes the format of the Position List table in the FlintBackend.
-This table stores the list of positions in a given document at which a term appears. Term positions are required for phrase queries.
+This table stores the list of positions in a given document at which a term appears.
+Term positions are required for phrase queries.

 == Key Format ==

-Quartz stores the key as: pack_uint(did) + tname
+{{{
+pack_uint_preserving_sort(docid) + tname
+}}}

-Flint currently uses pack_uint_preserving_sort(docid) + tname - this sometimes takes one extra byte
-compared to
-Quartz, but means that when appending documents to a database, the insert is always
-in the same place (at the "end" of the table). This is faster, and produces a more compact
-database without a separate compaction step.
+This sometimes takes one extra byte compared to Quartz, which used
+{{{pack_uint(did) + tname}}}, but it does mean that when appending documents
+to a database, the insert is always in the same place (at the "end" of the table).
+This is faster, and produces a more compact database without a separate compaction
+step.

-(The eventual plan is to subclass the compare routine so we can store the key
-as compactly as Quartz does but keep the improved sort order.)
+The eventual plan is to subclass the compare routine so we can store the key
+as compactly as Quartz did but keep the improved sort order.

 == Tag Format ==

-Quartz store the differences between each position and the previous one, using the 7 bit encoding
-scheme implemented by pack_uint() and unpack_uint().
-
-Flint uses an interpolative coding to store positions (pretty much as described in Managing Gigabytes).
+Flint uses an interpolative coding to store term positions (pretty much as described in
+Managing Gigabytes). This is particularly compact when there are many occurrences of
+a term in a document, which helps speed up positional searches involving common terms.
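The trade-off described under Key Format can be illustrated with a short sketch. This is not the Flint source: it assumes that the 7 bit scheme stores the low-order bits first with a continuation flag, and that the sort-preserving scheme is a length byte followed by big-endian bytes. The `make_key` helper is a hypothetical name used only for illustration.

```python
def pack_uint(value: int) -> bytes:
    """7 bits per byte, low-order group first, high bit set on all but the last byte.
    (Assumed layout for the Quartz-style encoding.)"""
    out = bytearray()
    while True:
        part = value & 0x7F
        value >>= 7
        if value:
            out.append(part | 0x80)  # more bytes follow
        else:
            out.append(part)
            return bytes(out)


def pack_uint_preserving_sort(value: int) -> bytes:
    """Length byte, then the value as big-endian bytes (assumed layout).
    Byte-wise comparison of the results matches numeric comparison."""
    body = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return bytes([len(body)]) + body


def make_key(docid: int, tname: bytes) -> bytes:
    """Hypothetical helper: build a position list key for (docid, term)."""
    return pack_uint_preserving_sort(docid) + tname


if __name__ == "__main__":
    for docid in (1, 255, 256, 300):
        print(docid, pack_uint(docid).hex(), pack_uint_preserving_sort(docid).hex())
```

Under these assumptions, docid 255 packs to a byte string that sorts after the one for docid 256 with `pack_uint`, so new keys would not always land at the end of the table. The length-prefixed form keeps keys in docid order at the cost of the extra length byte.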
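The interpolative coding mentioned under Tag Format can be sketched as follows. This is a simplified illustration and not Flint's actual on-disk format: it writes each value in plain fixed-width binary rather than the tighter codes described in Managing Gigabytes, returns the bits as a string, and assumes the position count and the bounds `lo` and `hi` are supplied separately.

```python
def encode(positions, lo, hi):
    """Encode a strictly increasing list of positions known to lie in [lo, hi]."""
    bits = []

    def rec(j, k, lo, hi):
        if j > k:
            return
        mid = (j + k) // 2
        # (mid - j) smaller and (k - mid) larger entries narrow the possible range.
        low = lo + (mid - j)
        high = hi - (k - mid)
        width = (high - low).bit_length()
        if width:  # width 0 means the value is fully determined; emit nothing
            bits.append(format(positions[mid] - low, "0{}b".format(width)))
        rec(j, mid - 1, lo, positions[mid] - 1)
        rec(mid + 1, k, positions[mid] + 1, hi)

    rec(0, len(positions) - 1, lo, hi)
    return "".join(bits)


def decode(bits, count, lo, hi):
    """Inverse of encode(); needs the same count and bounds."""
    out = [0] * count
    pos = 0

    def rec(j, k, lo, hi):
        nonlocal pos
        if j > k:
            return
        mid = (j + k) // 2
        low = lo + (mid - j)
        high = hi - (k - mid)
        width = (high - low).bit_length()
        value = low
        if width:
            value += int(bits[pos:pos + width], 2)
            pos += width
        out[mid] = value
        rec(j, mid - 1, lo, value - 1)
        rec(mid + 1, k, value + 1, hi)

    rec(0, count - 1, lo, hi)
    return out


if __name__ == "__main__":
    positions = [3, 8, 9, 11, 12, 13, 17]
    coded = encode(positions, 1, 20)
    print(len(coded), "bits:", coded)
    print(decode(coded, len(positions), 1, 20))
```

The more occurrences fall within a given range, the narrower each recursive interval becomes, which is why the scheme is compact for frequent terms. In the extreme case of a fully dense run, every value is implied by its bounds and no bits are written.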
