Opened 17 years ago

Closed 6 years ago

Last modified 6 years ago

#156 closed defect (fixed)

Testsuite should allow tests to mark themselves as known failures

Reported by: Richard Boulton Owned by: Olly Betts
Priority: normal Milestone: 1.4.6
Component: Test Suite Version: SVN trunk
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description (last modified by Olly Betts)

The idea of this is that if you find a bug, and have produced a test case to exhibit the bug, it is helpful to check that test case into SVN so that others can see the problem and work on it, but it is not desirable to break SVN HEAD. Therefore, it is helpful to be able to mark the test as a known failure, and commit it.

The testsuite will report the known failures (in yellow, if colour output is supported), but still report a success status. If a test which is marked as a known failure actually passes, the testsuite will report this as an error; this should ensure that the known failure marker is removed once the problem being tested is fixed.

Attachments (2)

HACKING (45.5 KB ) - added by Richard Boulton 17 years ago.
Implementation of this suggestion
patch (10.6 KB ) - added by Richard Boulton 17 years ago.
Implementation of this suggestion

Download all attachments as: .zip

Change History (10)

by Richard Boulton, 17 years ago

Attachment: patch added

Implementation of this suggestion

comment:1 by Richard Boulton, 17 years ago

attachments.isobsolete: 01

comment:2 by Olly Betts, 17 years ago

I can see that "expected fail" is useful for a project like (say) GCC or libtool, where there are a significant number of bugs which simply aren't going to get fixed anytime soon because they're platform dependent and developers don't have access to the necessary platforms, so it is useful to include knowledge of these in the test suite. Then users who run checks on platforms the developers don't have access to can discover that bugs may have been fixed by other changes.

But we just don't have bugs like that generally.

So to me, this change just seems mean that we'll have more places to look for things - it would be better to keep the bug report and testcases and patches together in the bug tracker rather than encourage them to get separated. It's already annoying when some information about a bug is in a mailing list thread, or users send mail in response to bug report discussion (which probably isn't helped by bugzilla's clunkiness).

comment:3 by Richard Boulton, 17 years ago

Operating System: All
Resolution: wontfix
Status: newclosed

My view is that I don't see that having a test checked into the testsuite makes it hard to keep track of from the bug report. My main reason for wanting this is that when I find a problem, I tend to want to build a version of xapian with the test showing the problem. And then, build a version which fixes it. If I could first commit the failing test, it would be easier to satisfy myself that I've really fixed the problem. Because my second commit wouldn't change the testsuite, but would make it pass. It's not that big a deal, really, but would just let me work in the way which seems natural.

Olly points out that you don't need to commit the testsuite change to try it without the patch - just svn diff the non-testsuite changes, patch -R, make check, patch, make check. And that the testsuite change sometimes turns out not to be quite right anyway in my experience...

However, it just seems odd from my way of looking at things not to use the VCS to do the work of looking after that for you.

Anyway. I don't really feel strongly enough about this to worry about it, so I'm marking this bug as WONTFIX. If we ever get a huge development team and lots of platform specific bugs, maybe it'll be reopened.

comment:4 by Olly Betts, 6 years ago

Description: modified (diff)
Resolution: wontfix
Status: closedreopened

I'm wondering if we should revisit this.

Since the switch to git, the style of development is now more branch-based, and there's a lot to be said for being able to add a testcase on a branch as XFAIL in one commit, then in a separate commit (or commits) fix the bug and drop the XFAIL. With CI, a reviewer can then see that the testcase was actually failing before the fix was applied (avoiding the easy pitfall of adding a regression test which actually passed before the fix too).

And that the testsuite change sometimes turns out not to be quite right anyway in my experience...

With git you can rewrite history prior to submitting the changes for review, and fix-up the commit with the testsuite change.

The patch looks plausible (though I haven't tried to apply it to current sources and I suspect it'll need a little adjustment after this much time - at least the numeric range fix needs dropping, as we fixed that years ago).

I think it'd be nice to allow an informational message in KNOWN_FAILURE (e.g. something like KNOWN_FAILURE("NumberVRP assumes numbers sort as their string values"); for the example bug in the patch). The current bool flag could just be a const char* which defaulted to NULL, assuming a static message is sufficient.

comment:5 by James Aylett, 6 years ago

+1 on having this functionality; I wonder if it'd be good to be able to run the test suite requiring known failures to fail, and then running that in CI for master, to reduce the risk of bad merges dropping regression fixes.

comment:6 by Olly Betts, 6 years ago

If I read the patch right, an testcase marked "XFAIL()" which passes results in the test run failing overall (which I think makes sense and seems to align with at least some other test harnesses which support "XFAIL"). Other harnesses seem to use "XPASS" for "test marked XFAIL() which passed", which seems like a convention we probably should follow.

Failing on "XPASS" means "XFAIL()" isn't suitable for flagging a flaky test, but if we really want something for that, then a separate "FLAKY()" annotation of some sort would make more sense. I'm not sure we want to go there (making flaky tests not flaky would be better), and if we do we should look at what other harnesses do before we just invent something different. Somewhat related is that we already skip tests which use timing to determine O() behaviour if $AUTOMATED_TESTING is set - these can be a little bit flaky in the face of variable load (and that's pretty much inherent in what they do), but sufficiently rarely that it really only causes a problem for automated builds. This skipping is automated by the timing machinery in the test harness.

comment:7 by Olly Betts, 6 years ago

Keywords: GoodFirstBug added

comment:8 by Olly Betts, 6 years ago

Keywords: GoodFirstBug removed
Resolution: fixed
Status: reopenedclosed

Merged in dbebdc17000c807ac86346dd1bc2e89bb62cc07c for master.

Aside from a lot of adjustments for changes in the underlying code, the main change I've made is to use the XFAIL/XPASS nomenclature that some other harnesses use.

I'm thinking that we backport this to 1.4 only if there's a patch to backport which needs it.

comment:9 by Olly Betts, 6 years ago

Milestone: 1.4.6

The patch backported with only a trivial conflict so I've just backported it so it's ready for backporting anything which makes use of it: 6dbca48739aea8c1aa0006082e2acdbd4eef82ea

Note: See TracTickets for help on using tickets.