Roadmap

Here are some of the changes I'm planning to work on soon, probably in roughly this order:

  • Alter the "7 bit" coding to eliminate the multiple encodings of values (which will reduce the number of bytes needed for some values). Note: experiments seem to show that the size reduction is minimal. Since the altered encoding is more complicated to work with, perhaps this change isn't worthwhile. We could instead redefine the redundant encodings for other purposes (e.g. to encode ranges in postlists).
  • Write a replacement Btree manager which compresses keys, has a specialised format for branch blocks, and is structured in a more helpful way for future changes I want to make. To reduce the amount of work required to get back to a testable system, the initial version is likely to be missing the following features (which will get added once the initial version is working well):
    • reading while updating
    • deleting tags
    • modifying tags (except for a special case to cope with the magic document length tag)
  • Subclass the key comparison method so we don't need to jump through hoops to make keys sort in byte order. I think we'll also need a "shortest dividing key" method.
  • Store Btree tags which are larger than a block by filling whole blocks with tag data and storing only the remaining partial block in the Btree itself.
  • Allow update in "dangerous" mode, similar to the quartz DANGEROUS patch, but more cleanly integrated into the backend. This will write modified Btree blocks back in place (rather than copying the changed block and all parent blocks up to the root). This is somewhat faster (especially once we're I/O bound) and gives a more compact database due to the lack of churn of blocks. However, the database can't be safely searched during indexing, and if indexing is terminated uncleanly, the database may be corrupt. Despite the risks, it's useful for speeding up a full rebuild, especially if you are running on a UPS-protected server. Flint should lock a database in dangerous update mode in a way which (a) will fail if there are existing readers and (b) will prevent readers from opening the database while dangerous update is in progress.
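
To make the "7 bit" coding item concrete, here is a rough Python sketch of one way redundant encodings could be eliminated. It assumes the current coding is a conventional base-128 scheme (low 7 bits first, top bit set when more bytes follow); the actual Flint byte layout may differ. The names encode_plain, encode_unique and decode_unique are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def encode_plain(n: int) -> bytes:
    """Conventional base-128 coding (assumed layout): low 7 bits first,
    top bit set on every byte except the last."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_unique(n: int) -> bytes:
    """Variant with exactly one encoding per value.

    After emitting a continuation byte we subtract 1 from what is left,
    so a two-byte sequence starts at 128 instead of overlapping 0..127.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n = (n >> 7) - 1
    out.append(n)
    return bytes(out)


def decode_unique(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one value starting at pos; return (value, next position)."""
    n = 0
    shift = 0
    while pos < len(data):
        b = data[pos]
        pos += 1
        n += (b & 0x7F) << shift
        if b < 0x80:
            return n, pos
        shift += 7
        # Undo the "- 1" applied by the encoder for this continuation byte.
        n += 1 << shift
    raise ValueError("truncated value")
```

Under these assumptions, two bytes cover 128 to 16511 rather than 0 to 16383, so only values such as 16384 to 16511 get shorter. That is consistent with the observation above that the saving is small.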
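
The "shortest dividing key" idea mentioned above can be sketched as follows. This is only an illustration under stated assumptions: keys are compared in plain byte order, and the dividing key s for adjacent keys left < right must satisfy left < s <= right (so s can act as the lower bound of the right-hand block). The function name is hypothetical; the real method may use a different convention.

```python
def shortest_dividing_key(left: bytes, right: bytes) -> bytes:
    """Return the shortest key s with left < s <= right (byte order).

    The answer is the shared prefix of the two keys plus one more byte
    taken from right.
    """
    if not left < right:
        raise ValueError("left must sort strictly before right")
    i = 0
    limit = min(len(left), len(right))
    # Skip over the bytes the two keys have in common.
    while i < limit and left[i] == right[i]:
        i += 1
    # Since left < right, right always has a byte at index i.
    return right[: i + 1]
```

For example, the keys b"apple" and b"apricot" can be divided by b"apr", so a branch block would only need to hold three bytes instead of the full key.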
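
The large-tag item above amounts to splitting a tag into as many whole blocks as it fills, plus a remainder that stays in the Btree. A minimal sketch, assuming no per-block header overhead and using a hypothetical split_tag helper (the real on-disk format would need to account for headers and block pointers):

```python
from typing import List, Tuple


def split_tag(tag: bytes, block_size: int) -> Tuple[List[bytes], bytes]:
    """Split tag into whole blocks of block_size bytes plus a partial tail.

    The whole blocks would be written outside the Btree; only the tail
    (shorter than one block, possibly empty) would be stored in the tree.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_whole = len(tag) // block_size
    cut = n_whole * block_size
    whole_blocks = [tag[i:i + block_size] for i in range(0, cut, block_size)]
    tail = tag[cut:]
    return whole_blocks, tail
```

With a 1024-byte block, a 2560-byte tag would occupy two whole blocks and leave a 512-byte partial block to store in the Btree.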