Opened 15 years ago
Last modified 20 months ago
#388 assigned defect
Allow lazy document deletion
Reported by: | Richard Boulton | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 2.0.0 |
Component: | Backend-Honey | Version: | git master |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Currently, document deletion works by reading the termlist for the document and updating the posting list for each term immediately. Instead, we could keep a list of deleted document ids and merge this list with the posting lists when iterating through them. This would allow deletion to be performed even when the termlist table is not present. The downside is that the frequency statistics for the terms in the deleted documents would be incorrect, since they would still count the deleted documents until the deletions were applied.
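As a rough illustration of merging a deleted-docid list into posting list iteration, here is a minimal C++ sketch. `FilteredPostList`, `visible_docids` and the `docid` alias are hypothetical names for this example, not Xapian's internal classes, and both the posting list and the deleted list are assumed to be sorted in-memory vectors of docids:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical alias for illustration; not Xapian's internal type.
using docid = std::uint32_t;

// Walks a sorted posting list (docids only) and hides any docid found in a
// sorted list of lazily deleted docids, advancing both cursors like a merge.
class FilteredPostList {
  public:
    FilteredPostList(const std::vector<docid>& postings,
                     const std::vector<docid>& deleted)
        : postings_(postings), deleted_(deleted) { skip_deleted(); }

    bool at_end() const { return pos_ == postings_.size(); }
    docid get_docid() const { return postings_[pos_]; }
    void next() { ++pos_; skip_deleted(); }

  private:
    void skip_deleted() {
        while (pos_ < postings_.size()) {
            docid d = postings_[pos_];
            // Move the deletion cursor past docids smaller than d.
            while (del_ < deleted_.size() && deleted_[del_] < d) ++del_;
            if (del_ < deleted_.size() && deleted_[del_] == d) {
                ++pos_;  // d was deleted lazily, so skip it.
            } else {
                return;  // d is still live.
            }
        }
    }

    const std::vector<docid>& postings_;
    const std::vector<docid>& deleted_;
    std::size_t pos_ = 0;
    std::size_t del_ = 0;
};

// Collects the docids a search would see after applying the deleted list.
std::vector<docid> visible_docids(const std::vector<docid>& postings,
                                  const std::vector<docid>& deleted) {
    std::vector<docid> out;
    for (FilteredPostList pl(postings, deleted); !pl.at_end(); pl.next())
        out.push_back(pl.get_docid());
    return out;
}
```

For example, `visible_docids({1, 3, 5, 7}, {3, 7})` yields `{1, 5}`, while the term frequency stored for that term would still be 4; that stale count is the statistics problem mentioned in the description.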
xapian-compact should be updated to merge the list into the database (i.e. to apply the deletions); this could probably be done with little extra overhead, since xapian-compact already needs to read through the posting lists.
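A sketch of applying the deletions during a compaction pass, under the same assumptions as before (sorted in-memory lists; `Posting`, `CompactedList` and `apply_deletions` are hypothetical names, not part of Xapian). Because both lists are sorted, the deletions can be applied in the same single pass that reads each posting list, and the term statistics can be recomputed from the surviving entries:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using docid = std::uint32_t;

// Hypothetical in-memory posting: a docid and its within-document frequency.
struct Posting {
    docid did;
    std::uint32_t wdf;
};

// Result of rewriting one term's posting list during compaction.
struct CompactedList {
    std::vector<Posting> postings;
    std::uint64_t termfreq = 0;  // number of surviving documents indexed by the term
    std::uint64_t collfreq = 0;  // sum of wdf over those documents
};

// Copies a sorted posting list, dropping entries whose docid appears in the
// sorted deleted-docid list, and recomputes the statistics from what is kept.
CompactedList apply_deletions(const std::vector<Posting>& in,
                              const std::vector<docid>& deleted) {
    CompactedList out;
    std::size_t d = 0;
    for (const Posting& p : in) {
        while (d < deleted.size() && deleted[d] < p.did) ++d;
        if (d < deleted.size() && deleted[d] == p.did) continue;  // apply deletion
        out.postings.push_back(p);
        ++out.termfreq;
        out.collfreq += p.wdf;
    }
    return out;
}
```

Once every posting list has been rewritten like this, the output no longer needs the deleted-docid list, and its statistics are correct again.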
Change History (5)
comment:1 by , 15 years ago
comment:2 by , 15 years ago
Component: | Backend-Chert → Backend-Brass |
---|---|
Marking for brass rather than chert.
comment:3 by , 10 years ago
Component: | Backend-Brass → Backend-Glass |
---|---|
comment:4 by , 5 years ago
Component: | Backend-Glass → Backend-Honey |
---|---|
Milestone: | → 1.5.0 |
Status: | new → assigned |
Version: | SVN trunk → git master |
Let's try to do this for honey.
comment:5 by , 20 months ago
Milestone: | 1.5.0 → 2.0.0 |
---|---|
I think it's probably optimistic to expect the overhead of fixing this up in xapian-compact to be "little": currently it copies chunks unmodified (except for a tweak to the start of chunks in some cases when renumbering documents). Having to unpack and repack them without the deleted entries is going to add significant CPU use, though it shouldn't change I/O much.
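To illustrate the cost difference, here is a sketch of the two paths. It assumes, hypothetically, that a chunk records its first and last docid and stores delta-encoded docids as varints (Xapian's real chunk format differs), and that a chunk with no deleted docid in its range can still be copied unchanged, so only the affected chunks pay for decoding and re-encoding. `Chunk`, `encode`, `decode` and `compact_chunk` are illustrative names only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using docid = std::uint32_t;
using Bytes = std::vector<std::uint8_t>;

// Hypothetical chunk: a docid range plus delta-encoded docids as varints.
struct Chunk {
    docid first = 0, last = 0;
    Bytes data;
};

static void put_varint(Bytes& b, std::uint32_t v) {
    while (v >= 0x80) { b.push_back(static_cast<std::uint8_t>(v | 0x80)); v >>= 7; }
    b.push_back(static_cast<std::uint8_t>(v));
}

// Packs a sorted, non-empty list of docids into a chunk.
Chunk encode(const std::vector<docid>& dids) {
    Chunk c;
    c.first = dids.front();
    c.last = dids.back();
    docid prev = 0;
    for (docid d : dids) { put_varint(c.data, d - prev); prev = d; }
    return c;
}

// Unpacks every docid in a chunk (the CPU-heavy step).
std::vector<docid> decode(const Chunk& c) {
    std::vector<docid> dids;
    docid prev = 0;
    std::uint32_t v = 0;
    int shift = 0;
    for (std::uint8_t byte : c.data) {
        v |= std::uint32_t(byte & 0x7f) << shift;
        if (byte & 0x80) { shift += 7; continue; }  // more bytes follow
        prev += v;
        dids.push_back(prev);
        v = 0;
        shift = 0;
    }
    return dids;
}

// Copies the chunk as-is when no deleted docid falls in [first, last];
// otherwise unpacks, filters and repacks it. `repacked` reports which path
// was taken. Returns an empty Chunk if every entry was deleted.
Chunk compact_chunk(const Chunk& c, const std::vector<docid>& deleted, bool& repacked) {
    auto it = std::lower_bound(deleted.begin(), deleted.end(), c.first);
    repacked = (it != deleted.end() && *it <= c.last);
    if (!repacked) return c;
    std::vector<docid> kept;
    for (docid d : decode(c))
        if (!std::binary_search(deleted.begin(), deleted.end(), d)) kept.push_back(d);
    return kept.empty() ? Chunk{} : encode(kept);
}
```

The copy path only duplicates bytes, whereas the repack path decodes and re-encodes every entry in the chunk; both read the same input, which is consistent with CPU use rising while I/O stays roughly the same.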