The Flint Position List Table

This page describes the format of the Position List table in the FlintBackend. This table stores the list of positions in a given document at which a term appears. Term positions are required for phrase queries.

Key Format

pack_uint_preserving_sort(docid) + tname

This sometimes takes one extra byte compared to Quartz, which used pack_uint(did) + tname, but it does mean that when appending documents to a database, the insert is always in the same place (at the "end" of the table). This is faster, and produces a more compact database without a separate compaction step.

The eventual plan is to subclass the compare routine so we can store the key as compactly as Quartz did but keep the improved sort order.

Tag Format

Flint uses an interpolative coding to store term positions (pretty much as described in Managing Gigabytes). This is particularly compact when there are many occurrences of a term in a document, which helps speed up positional searches involving common terms.

Last modified 12 years ago Last modified on 08/21/08 07:46:25
Note: See TracWiki for help on using the wiki.