Opened 15 years ago
Last modified 22 months ago
#444 new enhancement
xapian-compact --multipass should use flat intermediate files
Reported by: | Olly Betts | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 2.0.0 |
Component: | Backend-Glass | Version: | git master |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Currently --multipass creates temporary intermediate B-trees, but we write these in sorted key order and then reread them in the same order, so we could just use a flat file with a format something like:
<length of key><key><length of tag><tag>
Using a flat file means less I/O (and we'll be I/O bound here), the I/O will be linear (which is easier for the OS, FS, and hardware to handle efficiently), and also less CPU.
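As a rough illustration of the flat layout described above, here is a minimal Python sketch (not Xapian code). It assumes lengths are stored as 7-bit variable-length integers, since the ticket doesn't specify a length encoding, and the helper names `write_flat` and `read_flat` are hypothetical.

```python
import io


def write_varint(out, n):
    """Write a non-negative integer 7 bits at a time, low bits first.

    Assumed encoding for this sketch; the high bit marks "more bytes follow".
    """
    while n >= 0x80:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_varint(inp):
    """Read one varint; return None if the stream is already at end of file."""
    result = 0
    shift = 0
    while True:
        b = inp.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if b[0] < 0x80:
            return result
        shift += 7


def read_exact(inp, n):
    """Read exactly n bytes or raise EOFError."""
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated record")
    return data


def write_flat(out, items):
    """Write (key, tag) pairs as <len key><key><len tag><tag> records.

    The caller is expected to supply items already in sorted key order.
    """
    for key, tag in items:
        write_varint(out, len(key))
        out.write(key)
        write_varint(out, len(tag))
        out.write(tag)


def read_flat(inp):
    """Yield (key, tag) pairs back in the order they were written."""
    while True:
        key_len = read_varint(inp)
        if key_len is None:
            return  # clean end of file between records
        key = read_exact(inp, key_len)
        tag_len = read_varint(inp)
        if tag_len is None:
            raise EOFError("truncated record")
        yield key, read_exact(inp, tag_len)
```

Both the writer and the reader only ever move forwards through the file, which is the linear I/O pattern this ticket is after.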
Probably worth prefix-compressing the keys, since we're I/O bound here, and using less intermediate disk space is also a bonus:
<length of previous key to reuse><length of key tail><key tail><length of tag><tag>
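The prefix-compressed layout above could be sketched like this in Python. This is an illustration under assumptions rather than a proposed implementation: lengths are again encoded as 7-bit variable-length integers (the ticket doesn't say how), and `write_compressed` and `read_compressed` are hypothetical names.

```python
import io


def write_varint(out, n):
    """Write a non-negative integer 7 bits at a time (assumed encoding)."""
    while n >= 0x80:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def read_varint(inp, allow_eof=False):
    """Read one varint; return None at end of file only if allow_eof is set."""
    result = 0
    shift = 0
    while True:
        b = inp.read(1)
        if not b:
            if allow_eof and shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if b[0] < 0x80:
            return result
        shift += 7


def read_exact(inp, n):
    """Read exactly n bytes or raise EOFError."""
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated record")
    return data


def common_prefix_len(a, b):
    """Return the number of leading bytes that a and b share."""
    limit = min(len(a), len(b))
    i = 0
    while i < limit and a[i] == b[i]:
        i += 1
    return i


def write_compressed(out, items):
    """Write records as <reuse><len tail><tail><len tag><tag>.

    Keys must arrive in sorted order, so neighbouring keys tend to share
    a prefix and only the differing tail needs to be stored.
    """
    prev = b""
    for key, tag in items:
        reuse = common_prefix_len(prev, key)
        tail = key[reuse:]
        write_varint(out, reuse)
        write_varint(out, len(tail))
        out.write(tail)
        write_varint(out, len(tag))
        out.write(tag)
        prev = key


def read_compressed(inp):
    """Yield (key, tag) pairs, rebuilding each key from the previous one."""
    prev = b""
    while True:
        reuse = read_varint(inp, allow_eof=True)
        if reuse is None:
            return  # clean end of file between records
        if reuse > len(prev):
            raise ValueError("reuse length exceeds previous key")
        tail = read_exact(inp, read_varint(inp))
        tag = read_exact(inp, read_varint(inp))
        key = prev[:reuse] + tail
        yield key, tag
        prev = key
```

The reader needs only the previous key to rebuild the next one, so this still works with a single forward pass over the file. The first record always has a reuse length of zero.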
A quick estimate suggests that the dump file will probably be 8-9% smaller than the equivalent intermediate table, so assuming I/O is the only factor, we'd save about that much time on the intermediate compacting stages for the postlist table. In fact there's CPU time too, and we'll save on that (so assuming I/O is the only factor is probably OK). We'll also save a bit by doing purely linear I/O. Reading the source databases and writing the final databases wouldn't be sped up, and neither would the other tables (which we can just copy over in turn).
No ABI or API changes required, so marking for 1.2.x.
Change History (4)
comment:1 by , 12 years ago
Milestone: | 1.2.x → 1.3.x |
---|
comment:3 by , 5 years ago
Component: | Backend-Chert → Backend-Glass |
---|---|
Version: | SVN trunk → git master |
comment:4 by , 22 months ago
Milestone: | 1.4.x → 2.0.0 |
---|
1.3.x material now.