Opened 15 years ago

Last modified 20 months ago

#388 assigned defect

Allow lazy document deletion

Reported by: Richard Boulton
Owned by: Olly Betts
Priority: normal
Milestone: 2.0.0
Component: Backend-Honey
Version: git master
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

Currently, document deletion works by reading the termlist for the document and immediately updating the posting list for each term. Instead, we could keep a list of deleted document ids and merge this list with the posting lists when iterating through them. This would allow deletion to be performed even when the termlist table is not present. The downside is that the frequency statistics for terms which occur in the deleted documents would be incorrect.
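
To make the idea concrete, here is a minimal sketch in Python of filtering a posting list against a list of deleted document ids while iterating. It is not Xapian code: the function name `live_postings`, the `(docid, wdf)` tuple layout and the in-memory lists are all assumptions made for illustration, and both inputs are assumed to be sorted by ascending docid.

```python
def live_postings(postings, deleted):
    """Yield (docid, wdf) pairs from `postings`, skipping deleted docids.

    Hypothetical helper: `postings` is a list of (docid, wdf) tuples and
    `deleted` is a list of docids, both sorted in ascending docid order.
    """
    i = 0
    n = len(deleted)
    for docid, wdf in postings:
        # Advance the cursor in the deleted list up to the current docid.
        while i < n and deleted[i] < docid:
            i += 1
        if i < n and deleted[i] == docid:
            continue  # Lazily deleted: hide it from the caller.
        yield docid, wdf


postings = [(1, 2), (3, 1), (4, 5), (7, 1)]
deleted = [3, 5, 7]
visible = list(live_postings(postings, deleted))
stored_termfreq = len(postings)  # Still counts the deleted documents.
actual_termfreq = len(visible)
```

Because both lists are sorted, the merge is a single linear pass. The last two lines show the downside noted above: a term frequency stored alongside the posting list would keep counting the deleted documents until the deletions are actually applied.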

xapian-compact should be updated to merge the list with the database (i.e., to apply the deletions). This could probably be done with little extra overhead, since xapian-compact already needs to read through the posting lists.

Change History (5)

comment:1 by Olly Betts, 15 years ago

I think it's probably optimistic to think the overhead to fix this up in xapian-compact would be "little" - currently it copies chunks unmodified (except for a tweak to the start of chunks in some cases when renumbering documents). Having to unpack and repack them without some entries is going to add significant CPU use, though shouldn't change I/O much.
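
As a rough illustration of that trade-off (a sketch only, not how xapian-compact is implemented): the Python below models each chunk as a non-empty sorted list of `(docid, wdf)` tuples, which is an assumption and not the real on-disk format. A chunk whose docid range contains no deleted ids is passed through untouched, and only the others are rebuilt. The name `compact_chunks` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def compact_chunks(chunks, deleted):
    """Apply deletions to posting list chunks, copying where possible.

    Hypothetical model: each chunk is a non-empty list of (docid, wdf)
    tuples sorted by docid; `deleted` is a sorted list of docids.
    """
    out = []
    for chunk in chunks:
        first, last = chunk[0][0], chunk[-1][0]
        lo = bisect_left(deleted, first)
        if lo == len(deleted) or deleted[lo] > last:
            # No deleted docid falls in this chunk's range: copy as-is.
            out.append(chunk)
            continue
        hi = bisect_right(deleted, last)
        dead = set(deleted[lo:hi])
        # Slow path: unpack the chunk and repack it without dead entries.
        kept = [(docid, wdf) for docid, wdf in chunk if docid not in dead]
        if kept:
            out.append(kept)
    return out


chunks = [[(1, 1), (2, 1)], [(5, 2), (6, 1)], [(9, 1)]]
compacted = compact_chunks(chunks, [6, 9])
```

In this model the extra CPU cost is confined to chunks overlapping the deleted range, so the overhead depends on how the deletions are spread across the docid space.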

comment:2 by Olly Betts, 15 years ago

Component: Backend-Chert → Backend-Brass

Marking for brass rather than chert.

comment:3 by Olly Betts, 10 years ago

Component: Backend-Brass → Backend-Glass

comment:4 by Olly Betts, 5 years ago

Component: Backend-Glass → Backend-Honey
Milestone: 1.5.0
Status: new → assigned
Version: SVN trunk → git master

Let's try to do this for honey.

comment:5 by Olly Betts, 20 months ago

Milestone: 1.5.0 → 2.0.0