Opened 12 years ago
Closed 12 years ago
#615 closed defect (fixed)
xapian-compact -m (multipass) trashes value 0 (chert 1.2.13)
Reported by: | mjy | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.2.14 |
Component: | Backend-Chert | Version: | 1.2.13 |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | Linux |
Description
When compacting multiple databases into one with xapian-compact, the -m option corrupts the data in value slot 0 of the documents (possibly only sometimes). xapian-check reports this on the created database:
[...]
Value slot 0 has value above upper bound: ' ' > ''
Value slot 0 has value above upper bound: ' ' > ''
Value slot 0 has value above upper bound: ' ' > ''
Value slot 0 has value above upper bound: '¨' > ''
Value slot 0 has value above upper bound: ' ' > ''
The resulting database will be fine without -m though (with and without --no-renumber).
This can be reproduced with a set of databases created with 1.2.13, Linux x86-64 (total size 643MB and confidential-ish, so not attaching).
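To make the xapian-check messages quoted above easier to read, here is a minimal Python sketch of the kind of comparison they imply. It is a toy model, not xapian-check's actual code: the function name `values_above_upper_bound` and the sample byte strings are hypothetical. It only shows that once the recorded upper bound for a slot is the empty string, every non-empty value compares above it.

```python
def values_above_upper_bound(values, upper_bound):
    """Return the values that compare greater than the recorded upper bound.

    Values and bound are compared as raw byte strings (toy model only).
    """
    return [value for value in values if value > upper_bound]


# Hypothetical serialised values for slot 0; not taken from the real database.
slot0_values = [b"\x80", b"\xa8", b"\xc0"]

# With an empty recorded upper bound, every non-empty value is reported.
reported_with_empty_bound = values_above_upper_bound(slot0_values, b"")

# With a bound that really is the largest value, nothing is reported.
reported_with_correct_bound = values_above_upper_bound(
    slot0_values, max(slot0_values)
)
```

In this model an empty upper bound suggests that the bounds for the slot were lost or never written, rather than that the values themselves changed.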
Attachments (1)
Change History (7)
comment:1 by , 12 years ago
Description: | modified (diff) |
---|
comment:2 by , 12 years ago
Value 0 is a small floating point number (often 1), stored with Search::Xapian::sortable_serialise() (Perl API). It is always set in my case:
$doc->add_value(0, Search::Xapian::sortable_serialise($x->{weight}));
It happened while merging 20 databases, but by deleting some while testing, I got the test case down to 6 (I didn't try further, it does take a while...).
(I'll email you a link for the database ...)
comment:3 by , 12 years ago
I had a quick look at the code, and it basically does a multi-way merge by doing repeated calls to the same merging code. The only real difference is that the metainfo entry isn't created in the intervening copies, and this means we end up skipping the first item in the table, which the attached patch fixes. I've not had a chance to try it on your testcase yet, but I'm fairly sure the current code is wrong, and it looks like it could cause the issues you report, so I'm attaching the patch here so you can try it out if you want.
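The failure mode described in this comment can be sketched with a toy model in Python. This is not the chert code or the attached patch: the table layout, `METAINFO_KEY`, `merge_tables` and `multipass_compact` are all hypothetical names chosen for illustration. The sketch assumes a table is a sorted list of (key, value) pairs whose first entry is normally a metainfo entry, and that intermediate passes do not write one.

```python
# Toy model only: METAINFO_KEY stands in for the metainfo entry.
METAINFO_KEY = ""  # sorts before every other key in this model


def merge_tables(tables, write_metainfo, skip_first_blindly):
    """Merge sorted tables of (key, value) pairs into one sorted table."""
    merged = []
    for table in tables:
        if skip_first_blindly:
            # Buggy behaviour: assume entry 0 is always the metainfo entry.
            entries = table[1:]
        else:
            # Fixed behaviour: skip entry 0 only if it really is metainfo.
            has_meta = bool(table) and table[0][0] == METAINFO_KEY
            entries = table[1:] if has_meta else table
        merged.extend(entries)
    merged.sort()
    if write_metainfo:
        merged.insert(0, (METAINFO_KEY, "meta"))
    return merged


def multipass_compact(sources, skip_first_blindly):
    """Merge pairs into intermediate tables, then merge those into the result."""
    intermediates = [
        merge_tables(sources[i:i + 2], write_metainfo=False,
                     skip_first_blindly=skip_first_blindly)
        for i in range(0, len(sources), 2)
    ]
    return merge_tables(intermediates, write_metainfo=True,
                        skip_first_blindly=skip_first_blindly)


sources = [
    [(METAINFO_KEY, "meta"), ("a", 1), ("b", 2)],
    [(METAINFO_KEY, "meta"), ("c", 3)],
    [(METAINFO_KEY, "meta"), ("d", 4)],
    [(METAINFO_KEY, "meta"), ("e", 5)],
]

buggy = multipass_compact(sources, skip_first_blindly=True)
fixed = multipass_compact(sources, skip_first_blindly=False)
```

In this model the buggy variant loses the first real entry of each intermediate table (keys "a" and "d"), while the fixed variant keeps all five entries. A single-pass merge never reads a table without a metainfo entry, which is consistent with the report that compacting without -m works.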
comment:4 by , 12 years ago
The patch (applied against 1.2.13 release) fixes the problem for my test case, xapian-check is quiet too (both with --no-renumber and without). Thanks!
comment:5 by , 12 years ago
Milestone: | → 1.2.14 |
---|---|
Status: | new → assigned |
Committed to trunk in r17100. In writing a regression test for this, I found a further issue: the database doclength upper and lower bounds aren't updated correctly in this case. I've fixed that too in the committed patch, and filed #617 to remind us to enhance xapian-check to perform this additional check.
This needs backporting to the 1.2 branch.
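As a rough illustration of the doclength bounds issue mentioned above, the following Python sketch shows the invariant a merge would be expected to keep: the merged lower bound is no higher than any source's lower bound, and the merged upper bound is no lower than any source's upper bound. The `DocLengthBounds` type and `merge_bounds` function are hypothetical and are not Xapian's API or the committed fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocLengthBounds:
    """Hypothetical container for a database's document length bounds."""
    lower: int
    upper: int


def merge_bounds(bounds):
    """Combine bounds from several databases into bounds that cover them all."""
    bounds = list(bounds)
    if not bounds:
        raise ValueError("need at least one set of bounds")
    return DocLengthBounds(
        lower=min(b.lower for b in bounds),
        upper=max(b.upper for b in bounds),
    )


a = DocLengthBounds(lower=3, upper=10)
b = DocLengthBounds(lower=1, upper=7)
c = DocLengthBounds(lower=5, upper=42)

# Merging in one pass and merging via an intermediate result should agree.
one_pass = merge_bounds([a, b, c])
two_pass = merge_bounds([merge_bounds([a, b]), c])
```

Under this assumption a multipass merge gives the same bounds as a single pass only if each intermediate pass carries its bounds forward.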
Are you putting anything in value slot 0? If not, are you using value slots at all?
I'm not clear if the bounds are getting updated incorrectly, or if we're somehow accidentally creating entries in slot 0 which shouldn't be there.
Also, how many databases are you merging?