Opened 15 years ago

Last modified 22 months ago

#444 new enhancement

xapian-compact --multipass should use flat intermediate files

Reported by: Olly Betts
Owned by: Olly Betts
Priority: normal
Milestone: 2.0.0
Component: Backend-Glass
Version: git master
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

Currently --multipass creates temporary intermediate B-trees, but we write these in sorted key order and then reread them in the same order, so we could just use a flat file with a format something like:

<length of key><key><length of tag><tag>
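
As a rough illustration of the record layout above, here is a minimal sketch of writing and rereading such a flat file. It is not Xapian code: the function names are hypothetical, and the fixed 4-byte big-endian length fields are an assumption made for simplicity, since the ticket does not say how lengths would be encoded.

```python
import io
import struct

# Assumption: each length field is a 4-byte big-endian unsigned integer.
# The ticket does not specify an encoding for the length fields.
_LEN = struct.Struct(">I")


def write_records(out, items):
    """Write (key, tag) byte-string pairs, which must already be in key order."""
    for key, tag in items:
        out.write(_LEN.pack(len(key)))
        out.write(key)
        out.write(_LEN.pack(len(tag)))
        out.write(tag)


def _read_exact(inp, n):
    """Read exactly n bytes or raise if the file ends early."""
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated record")
    return data


def read_records(inp):
    """Yield (key, tag) pairs in the order they were written."""
    while True:
        header = inp.read(_LEN.size)
        if not header:
            return  # clean end of file
        if len(header) != _LEN.size:
            raise EOFError("truncated record")
        (key_len,) = _LEN.unpack(header)
        key = _read_exact(inp, key_len)
        (tag_len,) = _LEN.unpack(_read_exact(inp, _LEN.size))
        tag = _read_exact(inp, tag_len)
        yield key, tag
```

Both functions only ever move forwards through the file, which is the purely linear access pattern the ticket is after. A real implementation would presumably use a more compact variable-length integer for the lengths.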

Using a flat file means less I/O (and we'll be I/O bound here), the I/O will be linear (which is easier for the OS, FS, and hardware to handle efficiently), and it should also use less CPU.

Probably worth prefix-compressing the keys, since we're I/O bound here, and using less intermediate disk space is also a bonus:

<length of previous key to reuse><length of key tail><key tail><length of tag><tag>
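
The prefix-compressed layout above could be sketched as follows. Again this is only an illustration under stated assumptions: the function names are hypothetical, and the three length fields are stored as fixed 4-byte big-endian integers, which the ticket does not specify. Handling of truncated files is omitted.

```python
import io
import struct

# Assumption: each of the three length fields is a 4-byte big-endian
# unsigned integer; the ticket leaves the encoding unspecified.
_LEN = struct.Struct(">I")


def common_prefix_len(a, b):
    """Return the number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def write_compressed(out, items):
    """Write sorted (key, tag) pairs, storing only each key's new tail."""
    prev = b""
    for key, tag in items:
        reuse = common_prefix_len(prev, key)
        tail = key[reuse:]
        out.write(_LEN.pack(reuse))
        out.write(_LEN.pack(len(tail)))
        out.write(tail)
        out.write(_LEN.pack(len(tag)))
        out.write(tag)
        prev = key


def read_compressed(inp):
    """Yield (key, tag) pairs, rebuilding each key from the previous one."""
    prev = b""
    while True:
        header = inp.read(_LEN.size)
        if not header:
            return  # clean end of file
        (reuse,) = _LEN.unpack(header)
        (tail_len,) = _LEN.unpack(inp.read(_LEN.size))
        key = prev[:reuse] + inp.read(tail_len)
        (tag_len,) = _LEN.unpack(inp.read(_LEN.size))
        tag = inp.read(tag_len)
        yield key, tag
        prev = key
```

Because the reader only needs the previous key to rebuild the current one, this still works with a single forward pass. With fixed 4-byte lengths the extra field only pays off when adjacent keys share more than four bytes, so a compact variable-length integer encoding would matter in practice.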

A quick estimate suggests that the dump file will probably be 8-9% smaller than the equivalent intermediate table, so if I/O were the only factor, we'd save about that much time on the intermediate compaction stages for the postlist table. In fact CPU time is a factor too, and we'll save on that as well, so assuming I/O is the only factor is probably OK as an estimate. We'll also save a bit by doing purely linear I/O. Reading the source databases and writing the final databases wouldn't be sped up, and neither would the other tables (which we can just copy over in turn).

No ABI or API changes required, so marking for 1.2.x.

Change History (4)

comment:1 by Olly Betts, 12 years ago

Milestone: 1.2.x → 1.3.x

1.3.x material now.

comment:2 by Olly Betts, 10 years ago

Milestone: 1.3.x → 1.4.x

Not a blocker for 1.4.0.

comment:3 by Olly Betts, 5 years ago

Component: Backend-Chert → Backend-Glass
Version: SVN trunk → git master

comment:4 by Olly Betts, 22 months ago

Milestone: 1.4.x → 2.0.0