#278 closed defect (fixed)
When changesets are being generated, old changesets aren't cleaned up
Reported by: | Richard Boulton | Owned by: | Dan |
---|---|---|---|
Priority: | low | Milestone: | 1.2.5 |
Component: | Backend-Chert | Version: | SVN trunk |
Severity: | normal | Keywords: | |
Cc: | dcolish@… | Blocked By: | |
Blocking: | Operating System: | All |
Description (last modified by )
Currently, changesets are generated when the "XAPIAN_MAX_CHANGESETS" environment variable is set to a non-empty value. However, they are never removed. Whenever a changeset is generated, the number of changesets around should be checked, and old changesets should be removed if too many old changesets exist.
Alternatively (or as well), a different criteria might be useful for the changesets: it might be useful to be able to set an absolute limit on the total size of the changesets, or perhaps, a limit on the total size of the changesets as a proportion of the total database size.
Attachments (3)
Change History (22)
comment:1 by , 16 years ago
Status: | new → assigned |
---|
comment:2 by , 16 years ago
Component: | Other → Backend-Chert |
---|
comment:3 by , 16 years ago
Description: | modified (diff) |
---|
Fixed to not generate a changeset for the first revision in trunk r11069 (description updated).
In order to efficiently implement keeping a maximum number of changesets and/or maximum size of changesets, I think we want to store the lowest changeset number somewhere. Then we can calculate current_revision - lowest_changeset_number and compare with maximum_number to implement that. And if we also store the total size (as number of blocks would make most sense probably), we can then delete the lowest numbered changeset files and decrease this total appropriately until it is within the specified limit. We can also age by date by checking the timestamps of the lowest numbered changeset files.
(We could establish the lowest changeset number in O(log(current_revision)) by binary chop, but that seems unnecessary work...)
comment:4 by , 16 years ago
Blocking: | 227 removed |
---|
(In #227) This is done except for the two blockers, and a bug just to track two others is just clutter so unmarking the blockage and closing.
comment:5 by , 16 years ago
Milestone: | 1.1.0 → 1.1.1 |
---|
This doesn't require API (or even ABI) changes to fix AFAICS - just a file or something noting the lowest changeset number which we could easily handle the absence of by scanning the directory contents or a binary chop. So -> milestone:1.1.1
comment:7 by , 15 years ago
Milestone: | 1.1.4 → 1.1.3 |
---|
comment:8 by , 15 years ago
Priority: | normal → low |
---|
This is a bit of a usability obstacle, but the user could add a cronjob to clean up old ones or something, so it's not a blocker, and it doesn't break API or ABI. So priority low.
by , 14 years ago
Attachment: | remove_max_changesets.diff added |
---|
comment:10 by , 14 years ago
Updated diff uses dbstats for brass backend. For chert and flint it will use the more naive cleanup method. Also I move where the clean up happens to make sure we dont remove files if there is an exception when creating the changeset.
comment:11 by , 14 years ago
I applied a similar change to factor out MAX_OPEN_RETRIES in r15377.
Otherwise this mostly looks good, though I didn't try it yet. A few comments:
- If the changeset file is successfully unlinked, then oldest_changeset isn't updated so I think this will loop forever after successfully removing an old changeset. I think you just want to move the line which increments it down a line so it is outside the if statement.
- I would keep the total document length last (if only because otherwise the comment about why it is last is confusing).
- There's no need for the str() here (or similar cases) - this is iostream style so numeric types can just be written directly:
LOGLINE(DB, "Removed changeset " << str(oldest_changeset));
- Using '{ }' rather than ';' for an empty loop body is a bit more visible, so we prefer that (I've actually documented this convention in r15379).
- There's whitespace added to a blank line in xapian-core/backends/brass/brass_database.h - we try to avoid adding trailing whitespace and other whitespace irregularities (some existing code has it, which we've resisted wholesale fixing as svn blame regards it as a significant change).
- The indent of the conditional #include in brass_version.cc was intentional (for more complex cases, it's easier to read with the indent).
comment:12 by , 14 years ago
Cc: | added |
---|
comment:13 by , 14 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:14 by , 14 years ago
Status: | new → assigned |
---|
comment:15 by , 14 years ago
Looks good to me, but a few nits:
brass:
Comment typo - "being" -> "begin".
Another case where we might delete multiple changeset files is if the value of max_changesets is reduced by the user (for the comment).
I think this will try to delete changeset0 the first time, which is fairly harmless but easy to avoid.
stop_changeset should be brass_revision_number_t not unsigned.
I'd just call the methods on the stats member directly rather than adding another two private methods to BrassDatabase which only get called from one place.
chert and flint:
"changset" -> "changeset" in comment.
testcase:
A simpler (and clearer) check for the changeset not existing is:
TEST(!file_exists(masterpath + "/changes1");
Also might be good to test that if you reduce max_changesets then the intervening ones also get removed (e.g. set max to 4, commit 5 changes, check that the first one got removed, reduce max to 2, commit a change, and check that the last 2 changesets exist but the 3 before that don't).
comment:16 by , 14 years ago
I've just gone through the patch myself, and would echo Olly's comments. Given that the algorithm in brass is new (and a little bit complex) it would also be reassuring to have a test for the situation where max_changesets is increased again. ie:
- set max to 2
- commit 3 changes
- check that only the last 2 changesets exist
- set max to 3
- commit another 2 changesets
- check that only the last 3 changesets exist
This is looking good, though.
by , 14 years ago
Attachment: | max_changesets_post_review.diff added |
---|
comment:17 by , 14 years ago
Ok, I think I got all the changes suggested. One thing that I didn't see an issue with was the 0th changeset, so I left that out.
comment:18 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Landed patch at revision 15403.
comment:19 by , 14 years ago
Milestone: | 1.2.x → 1.2.5 |
---|
(Setting component to Backend-Chert, though it also affects flint.)