Opened 15 years ago
Closed 15 years ago
#427 closed defect (fixed)
xapian-compact results in corrupt postlist table (test data included)
Reported by: | Henry | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.1.4 |
Component: | Backend-Chert | Version: | SVN trunk |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | Linux |
Description
Either xapian-compact is corrupting the data, or the data is corrupt to begin with (even though xapian-check reports source indexes are ok).
How to reproduce:
Extract tgz file (creates two folders: index1 & temp.index1).
Check sources:
# xapian-check-1.1 index1/
# xapian-check-1.1 temp.index1/
(both should test OK).
# mkdir dst
Test1 - compacts OK:
# xapian-compact-1.1 temp.index1/ dst
# xapian-check-1.1 dst
(should check out OK).
Test2 - compact reports OK, but check fails:
# rm dst/*
# xapian-compact-1.1 index1/ dst
# xapian-check-1.1 dst
(reports errors in postlist).
Likewise, compacting both index1 and temp.index1 together into dst reports success, but the check on dst then fails.
Attachments (3)
Change History (11)
by , 15 years ago
Attachment: | testidxfolders.tgz added |
---|
comment:1 by , 15 years ago
comment:2 by , 15 years ago
Severity: | normal → major |
---|
Further tests confirm the following:
# merges ok, xapian-check-1.1 ok; composite_index can be searched:
xapian-compact-1.1 src1 src2 src3 srcN... composite_index
# merges ok, xapian-check-1.1 fails; big_composite_index cannot be searched:
xapian-compact-1.1 composite_index1 composite_indexN... big_composite_index
help!
comment:3 by , 15 years ago
Component: | Other → Backend-Chert |
---|---|
Milestone: | → 1.1.4 |
Severity: | major → normal |
Status: | new → assigned |
I found a more minimal example - the spelling and position tables aren't relevant, and neither is temp.index1. This reproduces the issue:
$ rm -rf index1/spelling.*
$ rm -rf index1/position.*
$ (rm -rf dst;../bin/xapian-compact index1 dst && ../bin/xapian-check dst) 2>&1|more
Running delve on index1 reports 611 distinct terms, while for dst it reports 710, so it appears there's a bug in xapian-compact here as that statistic shouldn't be changed by compaction. The alternative seems to be that index1 is malformed in a way which xapian-check doesn't detect.
This appears to be trunk-only (the chert format has changed incompatibly since 1.1.3) so lowering the priority, and marking for 1.1.4.
comment:4 by , 15 years ago
Thanks for the details - I've reproduced the error following the steps in your last comment. xapian-check passes on index1, but fails on the output of xapian-compact, so I'm pretty sure the error is in the trunk version of xapian-compact.
comment:5 by , 15 years ago
The patch I've just applied partially fixes this: the databases produced with this fix appear to be nearly valid, but return a value of 0 for get_lastdocid() - I think there's a secondary problem causing this which will need a different fix.
My patch addresses the following issue: there was an off-by-one error in the truncation of the key of follow-on chunks of postlists in PostlistCursor which was meant to make the key into the equivalent key for an initial chunk. My fix needs a testcase (which will in turn need a database in which the postlist for a term is split into more than one chunk), but seems to help, and makes sense to me. The problem is hidden in flint because all keys have a trailing '\0' byte, so the key for the first chunk in a postlist had the '\0' byte trailing when returned from PostlistCursor, but still matched the key for the next chunk. For Chert and Brass, the first chunk's key doesn't have a trailing '\0', so didn't match the following keys after they had been (insufficiently) truncated.
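To make the mismatch described above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not the real Xapian key encoding or the actual PostlistCursor code: the separator byte, the fixed-width docid suffix, and every function name below are hypothetical, and terms are assumed to contain no NUL bytes. It shows how truncating a follow-on chunk's key one byte too late still matches a first-chunk key that ends in '\0' (flint-like), but not one without the trailing byte (chert-like), so a single term ends up looking like two.

```python
from typing import Callable, Iterable

SEP = b"\x00"  # hypothetical separator between the term and the docid suffix


def initial_key(term: bytes, trailing_nul: bool) -> bytes:
    """Key of the first chunk of a term's postlist in this toy model."""
    return term + SEP if trailing_nul else term


def followon_key(term: bytes, first_docid: int) -> bytes:
    """Key of a later chunk: term, separator, then a made-up 4-byte docid."""
    return term + SEP + first_docid.to_bytes(4, "big")


def to_initial_buggy(key: bytes) -> bytes:
    """Always keep the separator: one byte too many for chert-like keys."""
    return key[: key.index(SEP) + 1]


def to_initial_fixed(key: bytes, trailing_nul: bool) -> bytes:
    """Truncate so the result equals the format's own first-chunk key."""
    end = key.index(SEP)
    return key[: end + 1] if trailing_nul else key[:end]


def distinct_terms(first: bytes, later: Iterable[bytes],
                   normalise: Callable[[bytes], bytes]) -> int:
    """Count distinct keys once follow-on keys are mapped to first-chunk form."""
    return len({first} | {normalise(k) for k in later})


term = b"apple"
later = [followon_key(term, 1000), followon_key(term, 2000)]

for name, nul in (("flint-like", True), ("chert-like", False)):
    first = initial_key(term, nul)
    buggy = distinct_terms(first, later, to_initial_buggy)
    fixed = distinct_terms(first, later, lambda k: to_initial_fixed(k, nul))
    print(name, "buggy:", buggy, "fixed:", fixed)
```

In this toy model the buggy truncation yields one distinct term for the flint-like layout but two for the chert-like layout. That would be consistent with the inflated distinct-term count delve reported for dst earlier in this ticket, although the sketch does not establish that link.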
comment:6 by , 15 years ago
Just written a testcase which exhibits this problem. It currently fails for chert and brass, but passes for flint.
comment:7 by , 15 years ago
comment:8 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I've just gone back and rechecked with the original test files supplied, and these changes now produce a database which passes xapian-check happily. The problem only existed in chert and brass which are not in the 1.0 branch, and only in trunk (not in the 1.1.3 release), so marking this as closed: no need to backport.
Let me know if you need the individual source indexes (100 of them, split into two). The ones I used for the two test indexes were individually checked with xapian-check as well.