Xapian should provide a way to securely remove a document from the database
Reported by: Daniel Kahn Gillmor | Owned by: Olly Betts
Currently, if I remove a document from a Xapian index, the indexed terms remain in the database: the blocks that held them are marked as part of the freelist, but their contents are not overwritten.
This means that removing a document is "insecure": if someone gained access to the index after the document was deleted, they could recover information about it by inspecting the contents of the freelist.
There may be other traces of a document that are retained in the index as well: for example, on IRC, olly mentioned:
oh, there's one awkward thing in the backend stuff -- dividing keys get created in the branch levels based on the leaf level keys around where the block is split
Some of these fixes may be easier to do than others.
For example, it might be pretty easy to zero blocks when they're returned to the freelist, but it might be harder to deal with the dividing keys. It's still worth fixing the easy parts, even if some harder challenges remain.
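Here is a minimal sketch of the "easy part": zeroing blocks as they are returned to the freelist. It uses a hypothetical in-memory `BlockStore`, not Xapian's B-tree code, and every name in it is invented for illustration. `release()` puts a block number on the freelist and, when `zero_on_free` is set, overwrites the block first, so a raw read of that block no longer shows the deleted data.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical fixed-size block store, NOT Xapian's backend: it only
// illustrates why freed blocks leak data unless they are overwritten.
constexpr std::size_t kBlockSize = 16;
using Block = std::array<unsigned char, kBlockSize>;

class BlockStore {
public:
    explicit BlockStore(bool zero_on_free) : zero_on_free_(zero_on_free) {}

    // Store data in a block, reusing a freed block if one is available.
    // Data longer than kBlockSize is truncated (this is only a toy).
    std::size_t write(const std::string& data) {
        std::size_t n;
        if (!freelist_.empty()) {
            n = freelist_.back();
            freelist_.pop_back();
        } else {
            n = blocks_.size();
            blocks_.push_back(Block{});
        }
        Block& b = blocks_[n];
        b.fill(0);
        std::memcpy(b.data(), data.data(), std::min(data.size(), kBlockSize));
        return n;
    }

    // Return block n to the freelist, scrubbing it first if configured to.
    void release(std::size_t n) {
        if (zero_on_free_) blocks_[n].fill(0);
        freelist_.push_back(n);
    }

    // The raw bytes someone reading the underlying file would see for block n.
    std::string raw(std::size_t n) const {
        const Block& b = blocks_[n];
        return std::string(b.begin(), b.end());
    }

private:
    bool zero_on_free_;
    std::vector<Block> blocks_;
    std::vector<std::size_t> freelist_;
};
```

With `zero_on_free` off, a released block still contains the old term; with it on, the block reads back as all zeros. A real on-disk backend would also have to write the zeroed block back to the file, and this does nothing about copies of the data that survive elsewhere, such as the dividing keys in the branch levels.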
Another way to think about the problem is in terms of "index reproducibility": if one index contains exactly the same set of documents as another, the ideal is that the two data stores are byte-for-byte identical on disk. Any divergence from that ideal leaks some information about documents that were added to the database in the past and subsequently removed.
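One way a test suite could check the reproducibility ideal is to build two indexes with the same final set of documents (one of which also had extra documents added and then removed) and compare their files byte for byte. The helper below is a hypothetical sketch that compares a single pair of files. An on-disk Xapian database is usually a directory of several table files, so a real check would compare each corresponding pair.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical test helper: true only if both files can be opened and
// contain exactly the same bytes (same length, same content).
bool files_identical(const std::string& path_a, const std::string& path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) return false;  // treat an unreadable file as "not identical"
    // The four-iterator std::equal (C++14) also returns false on a length mismatch.
    return std::equal(std::istreambuf_iterator<char>(a), std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(b), std::istreambuf_iterator<char>());
}
```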
It's possible that any of these fixes may incur a cost that some people are reluctant to pay (e.g. they're not concerned about the confidentiality of any of their indexed documents, or they're confident in the long-term confidentiality of the index itself for other reasons). So it seems likely that the feature needs to be optional. Whether it should be opt-in or opt-out, and whether the choice should be made per deletion, per database, or per Xapian session, I don't know.
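To make the granularity question concrete, here is a hypothetical sketch. Xapian has no such option, and every name below is invented. It combines the three levels so that the most specific setting wins: a per-deletion choice overrides a per-database setting, which in turn overrides a per-session default. That precedence order is itself an assumption, not something decided here.

```cpp
#include <optional>

// Hypothetical option, not part of Xapian's API.
enum class SecureDelete { Off, On };

struct SecureDeleteSettings {
    SecureDelete session_default = SecureDelete::Off;  // per-session default (Off = opt-in)
    std::optional<SecureDelete> database;              // optional per-database override
};

// Resolve the policy for one deletion: the per-call choice, if given, wins;
// otherwise the database setting; otherwise the session default.
SecureDelete effective_policy(const SecureDeleteSettings& s,
                              std::optional<SecureDelete> per_call = std::nullopt) {
    if (per_call) return *per_call;
    if (s.database) return *s.database;
    return s.session_default;
}
```

Defaulting `session_default` to `Off` makes the feature opt-in. Changing that one default to `On` would make it opt-out without affecting the override logic. This needs C++17 for `std::optional`.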
I'm happy to review API proposals if that'd be useful.