Opened 11 years ago

Last modified 11 months ago

#639 assigned enhancement

omega: should reindex a file when its write permissions change

Reported by: egarette
Owned by: Olly Betts
Priority: normal
Milestone: 2.0.0
Component: Omega
Version: git master
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

If the write access for a file is changed, Omega doesn't reindex the file, so a user who has lost access rights could still see it in query results.

In ticket #632 I suggested using ctime instead of mtime, but that proposal is not suitable.

Here is Olly's reply:

The change from mtime to ctime will mean that the "last modified" time reported in the Omega UI will now in general not actually be the last time the contents of the file were changed.

I'm also slightly concerned that the mtime -> ctime change will result in reindexing files in many more cases - e.g. if I tar up a file tree and Xapian database and untar it on another machine (as a non-privileged user), the mtimes are preserved but the ctimes change. So this change would mean that omindex would have to reindex every document in this case (and without root access, I don't think one can avoid that).

I think we probably need to store the ctime separately (so lastmod still works as before) and make whether ctime or mtime is used for reindexing an option, or else find a better way to know when ACLs have changed - perhaps only checking the ACL for changes if the ctime has changed but the mtime hasn't.
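To make the first suggestion concrete, here is a minimal sketch (not Omega's code) of keeping both timestamps and letting an option pick which one drives the reindex decision. The `Stamps` record, the `needs_reindex` function and the `track_ctime` parameter are hypothetical names chosen for illustration, and the timestamps are plain integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamps:
    """Timestamps kept per file (hypothetical record, not Omega's schema)."""
    mtime: int  # last content modification; still what "last modified" shows
    ctime: int  # last inode change (permissions, ownership, ...)


def needs_reindex(stored: Stamps, current: Stamps, track_ctime: bool) -> bool:
    """Return True if the file should be reindexed.

    With track_ctime set, a newer ctime triggers a reindex; otherwise only
    a newer mtime does. mtime is stored either way, so the displayed
    "last modified" time is unaffected by the option.
    """
    if track_ctime:
        return current.ctime > stored.ctime
    return current.mtime > stored.mtime


# A permission change (or untarring as a non-privileged user) bumps ctime
# but leaves mtime as it was.
indexed = Stamps(mtime=1000, ctime=1000)
after_chmod = Stamps(mtime=1000, ctime=2000)
```

With these values, `needs_reindex(indexed, after_chmod, track_ctime=True)` reports that a reindex is needed, while the same call with `track_ctime=False` does not, which is the trade-off described above for the tar/untar case.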

Change History (7)

comment:1 by Olly Betts, 10 years ago

Component: Other → Omega
Milestone: 1.3.3

Marking to consider for 1.3.3.

comment:2 by Olly Betts, 10 years ago

Status: new → assigned

[2853cdace3ab8ba4d23a1dd568f207b1bbbbb4b5] adds a --track-ctime option which stores ctime and uses it instead of mtime to decide if we reindex. But mtime is still used in the UI.

I realised we can actually easily check for the case when the file contents are the same and only the inode metadata has changed (newer ctime but same mtime), and in this case just update the terms and values for that metadata in the existing document - that's a lot less work, especially if a slow filter is involved (e.g. if we are doing OCR to get document text). I've not implemented this optimisation yet though.
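The check described here could look roughly like the following sketch. It is only an illustration of the idea, since the optimisation is not implemented; the `Action` enum and `choose_action` function are hypothetical names, and timestamps are passed in as plain numbers.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"                        # nothing changed since indexing
    UPDATE_METADATA = "update_metadata"  # only inode metadata changed
    REINDEX = "reindex"                  # contents may have changed


def choose_action(stored_mtime, stored_ctime, cur_mtime, cur_ctime):
    """Pick the cheapest safe action for a file that is already indexed."""
    if cur_mtime != stored_mtime:
        # Contents may differ, so the (possibly slow) filter must run again.
        return Action.REINDEX
    if cur_ctime > stored_ctime:
        # Same mtime but newer ctime: refresh the metadata terms and values
        # in the existing document without re-extracting the text.
        return Action.UPDATE_METADATA
    return Action.SKIP
```

On a Unix-like system the current values could be taken from `os.stat(path).st_mtime` and `os.stat(path).st_ctime`. For example, `choose_action(1000, 1000, 1000, 2000)` selects `Action.UPDATE_METADATA`, so an OCR pass would be avoided after a simple permission change.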

comment:3 by Olly Betts, 10 years ago

Blocking: 632 added

(In #632) Need to get #639 sorted first, so bumping the milestone on this.

comment:4 by Olly Betts, 9 years ago

Milestone: 1.3.3 → 1.3.4

I don't want to delay 1.3.3 any longer, so bumping the rest of this to 1.3.4.

comment:5 by Olly Betts, 9 years ago

Milestone: 1.3.4 → 1.3.5

Grouping the omega 1.3.* tickets on 1.3.5.

comment:6 by Olly Betts, 9 years ago

Blocking: 632 removed
Milestone: 1.3.5 → 1.4.x
Version: git master

The thing left to do here is the optimisation for the case of "ctime changed, mtime unchanged". I don't think it makes much sense to block 1.4.0 by that - it's not a correctness issue. This also no longer blocks 632.

comment:7 by Olly Betts, 11 months ago

Milestone: 1.4.x → 2.0.0

It'd be good to finish off this work, but it's an optimisation rather than a correctness thing, and could be added in a stable release so postponing in the interests of actually getting a new stable release series started.
