#141 closed enhancement (released)
Writable xapian-tcpsrv should allow queueing of connections
Reported by: | Mark Hammond | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Backend-Remote | Version: | SVN trunk |
Severity: | minor | Keywords: | |
Cc: | Richard Boulton, Sidnei da Silva | Blocked By: | |
Blocking: | #120 | Operating System: | All |
Description
From xapian-devel:
There may be 3 stages of this bug:
- In the first instance a writable server enforces a simple "one connection at a
time" lock.
- In the second instance, a remote connection includes a 'writable needed' flag
when connecting and server enforces only one writable connection.
- Finally, we implement a queue, where we respond to the client that a writable
connection exists and we will block until it finishes. I'll avoid too much implementation speculation in this initial comment :)
Attachments (3)
Change History (15)
by , 18 years ago
Attachment: | test_multi_writers.py added |
---|
comment:1 by , 18 years ago
Status: | new → assigned |
---|
Looks pretty good - a few comments:
There's no point using add_database() in the check that we can open all the read-only databases - just create and ignore each one like you do for the writable case.
You're missing spaces in the dissected "Starting server" message (and "port" might as well be factored out too!)
xapian-progsrv inherently only has a single connection which is established right away, so checking that the database can be opened in xapian-progsrv.cc is just adding start-up overhead for no benefit.
Replacing db_->get_description() with dbpaths[0] makes the error context less useful - a little helper function with joins all the elements of dbpaths[0] with spaces between would be better.
initialise() is now only used once, so might as well be merged into the ctor.
"its" should be "it's" (comment in ~RemoteServer)!
(And unconnected with your changes, RemoteServer's default timeout values are probably unused, in which case they should be removed).
If you have time to address those that would be great. Otherwise I can probably sort them out, but not right now as I need to stop being distracted from rejigging index_text.
comment:2 by , 18 years ago
Blocking: | 118 added |
---|---|
Summary: | tcp server doesn't enforce one-writer constraint → xapian-tcpsrv doesn't enforce one-writer constraint |
I believe we should try to make sure that at least the first stage as listed in the original report is implemented for 1.0.0, since it's a rather nasty hole in our database locking scheme, and we have a patch to fix it which is "mostly there".
I've now addressed the parenthetical point about the unused timeouts in RemoteServer - they're history.
I'll have a think about the test harness. It's great for running arbitrary API calls against anonymous databases, but it's hard work to do this sort of stuff as it stands.
comment:3 by , 18 years ago
Blocking: | 142 added |
---|
comment:4 by , 18 years ago
attachments.isobsolete: | 0 → 1 |
---|
comment:5 by , 18 years ago
I'm working on applying this, but for some reason it causes problems with running tests under valgrind. I'm investigating...
comment:6 by , 18 years ago
Blocking: | 120 added; 118 removed |
---|---|
Severity: | normal → enhancement |
Summary: | xapian-tcpsrv doesn't enforce one-writer constraint → Writable xapian-tcpsrv should allow queueing of connections |
Turns out I get the same valgrind issues without the patch too! So committed a revised version (the version here doesn't compile on UNIX, xapian-progsrv doesn't work, and GCC warns about the order of class member initialisation).
So this bug is no longer a 1.0.0 blocker - it would be nice to support queued connections in the 1.0 release series I though, so moving to block that metabug instead and updating the summary.
comment:7 by , 18 years ago
Blocking: | 142 removed |
---|---|
Cc: | added |
Missed this discussion because I wasn't subscribed to the bug. We really need to work out how to make bugzilla email a list with all bug changes, or move to a new bugtracker (after 1.0, of course).
This bug clearly isn't a blocker for 142, since 142 is now fixed and this one isn't, so I'm removing the block on 142. Incidentally, I've just noticed a load of valgrind failures in the testsuite (exclusively for the remote-tcp backend).
I was about to investigate, but it sounds like Olly is already doing so, so
I'll get on with something else.
comment:8 by , 18 years ago
I'd marked 142 as depending on this just to avoid anyone trying to fix that independently since there was a patch here. I realise the relationship didn't really fit "A blocks B". I would have merged the two bugs, but our bugzilla doesn't seem to have a way to do that.
I've pretty much run out of ideas on the valgrind issues, and then noticed that the problem wasn't made worse by this patch, so gave up and applied it. I'm not actively looking at them now, so feel free to.
Oddly I didn't notice them before though, and I do run "make check" pretty often! I did upgrade to feisty fairly recently (while working on the zlib patch
- valgrind reported errors with that which turned out to be known problems which
there's normally a suppression for but it wasn't in the standard list for the latest zlib library version - I couldn't recompile valgrind from source for some reason, so I gave up and upgraded, before I discovered it was a missing suppression) so perhaps that's related.
I see two different problems:
(A) This fails because with "EOF" errors (all tests fail, but restricting to zerodocid1 makes it easier to track down):
./runtest ./apitest -v -b remote zerodocid1
If you manually hack runsrv to add "2> /dev/null" to the last line (the exec) then the test passes. But redirecting to a file shows that there's no output on stderr. Also, I believe Xapian actually closes fd 2 anyway. So I'm a bit puzzled there.
(B) This reports memory leaks:
./runtest ./apitest -v -b remotetcp zerodocid1
They appear to be memory that's still accessible but that the allocations grow when you rerun the test. The valgrind output appears to just list a load of allocations which either happened before main, or as the test harness initialises. I poked at it for a bit, but this seems to be a bogus report of the type I've seen before, and even if it's real I don't view a "leak" of memory which we still have a pointer to as a showstopper for releasing 1.0.0.
comment:9 by , 18 years ago
After investigating the valgrind issues with remotetcp, I quite agree that it looks like a false positive, so doesn't need to block 1.0.0 (though we should try and fix it to clean up the output at some point).
I can't make sense of the problem with the "remote" backend, but I see the same behaviour as you.
comment:10 by , 18 years ago
attachments.isobsolete: | 0 → 1 |
---|
comment:11 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Closing in favour of bug#145 which covers the remaining issues in this bug.
comment:12 by , 18 years ago
Operating System: | → All |
---|---|
Resolution: | fixed → released |
Fixed in 1.0.0 release.
simple test using python