Opened 8 years ago

Last modified 3 years ago

#444 new enhancement

xapian-compact --multipass should use flat intermediate files

Reported by: olly
Owned by: olly
Priority: normal
Milestone: 1.4.x
Component: Backend-Chert
Version: SVN trunk
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All


Currently --multipass creates temporary intermediate B-trees, but we write these in sorted key order and then reread them in the same order, so we could just use a flat file with a format something like:

<length of key><key><length of tag><tag>
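As a rough illustration of the record layout above, here is a minimal Python sketch of a sequential writer and reader. The fixed 4-byte little-endian length fields and the names write_records and read_records are assumptions made for the example; the ticket does not specify how lengths would be encoded, and this is not Xapian code.

```python
import io
import struct

# Assumption: each length is a fixed 4-byte little-endian unsigned integer.
_LEN = struct.Struct("<I")


def write_records(out, items):
    """Write (key, tag) byte-string pairs, assumed already sorted by key."""
    for key, tag in items:
        out.write(_LEN.pack(len(key)))
        out.write(key)
        out.write(_LEN.pack(len(tag)))
        out.write(tag)


def _read_exact(inp, n):
    """Read exactly n bytes or raise EOFError."""
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated record")
    return data


def read_records(inp):
    """Yield (key, tag) pairs in the order they were written."""
    while True:
        header = inp.read(_LEN.size)
        if not header:
            return  # clean end of file
        if len(header) != _LEN.size:
            raise EOFError("truncated record")
        (key_len,) = _LEN.unpack(header)
        key = _read_exact(inp, key_len)
        (tag_len,) = _LEN.unpack(_read_exact(inp, _LEN.size))
        tag = _read_exact(inp, tag_len)
        yield key, tag
```

Both functions only ever move forward through the stream, which is the property that lets a flat file stand in for the intermediate B-tree here.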

Using a flat file means less I/O (and we'll be I/O bound here), the I/O will be linear (which is easier for the OS, FS, and hardware to handle efficiently), and also less CPU.

Probably worth prefix-compressing the keys, since we're I/O bound here, and using less intermediate disk space is also a bonus:

<length of previous key to reuse><length of key tail><key tail><length of tag><tag>
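The prefix-compressed layout above could be sketched along these lines in Python. The variable-length integer encoding (7 bits per byte, low bits first) and all function names are assumptions for illustration only; the ticket does not say how the lengths would be stored.

```python
import io


def write_varint(out, n):
    """Write a non-negative integer, 7 bits per byte, low bits first."""
    while n >= 0x80:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_varint(inp, eof_ok=False):
    """Read one varint; return None at a clean end of file if eof_ok."""
    result = shift = 0
    while True:
        b = inp.read(1)
        if not b:
            if eof_ok and shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if b[0] < 0x80:
            return result
        shift += 7


def read_exact(inp, n):
    """Read exactly n bytes or raise EOFError."""
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated record")
    return data


def common_prefix_len(a, b):
    """Number of leading bytes shared by a and b."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i


def write_compressed(out, items):
    """Write sorted (key, tag) pairs, storing only each key's new tail."""
    prev = b""
    for key, tag in items:
        reuse = common_prefix_len(prev, key)
        write_varint(out, reuse)
        write_varint(out, len(key) - reuse)
        out.write(key[reuse:])
        write_varint(out, len(tag))
        out.write(tag)
        prev = key


def read_compressed(inp):
    """Yield (key, tag) pairs, rebuilding each key from the previous one."""
    prev = b""
    while True:
        reuse = read_varint(inp, eof_ok=True)
        if reuse is None:
            return
        tail = read_exact(inp, read_varint(inp))
        tag = read_exact(inp, read_varint(inp))
        prev = prev[:reuse] + tail
        yield prev, tag
```

Because the keys arrive in sorted order, neighbouring keys tend to share a prefix, and the reader only needs to remember the previous key to rebuild the current one.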

A quick estimate suggests that the dump file will probably be 8-9% smaller than the equivalent intermediate table, so assuming I/O is the only factor, we'd save about that proportion of the time on the intermediate compacting stages for the postlist table. In fact there's CPU time too, and we'll save on that as well (so treating I/O as the only factor is probably a safe estimate). We'll also save a bit from doing purely linear I/O. Reading the source databases and writing the final databases wouldn't be sped up, and neither would the other tables (which we can just copy over in turn).

No ABI or API changes required, so marking for 1.2.x.

Change History (2)

comment:1 Changed 5 years ago by olly

  • Milestone changed from 1.2.x to 1.3.x

1.3.x material now.

comment:2 Changed 3 years ago by olly

  • Milestone changed from 1.3.x to 1.4.x

Not a blocker for 1.4.0.
