Opened 18 years ago

Closed 18 years ago

Last modified 18 years ago

#141 closed enhancement (released)

Writable xapian-tcpsrv should allow queueing of connections

Reported by: Mark Hammond Owned by: Olly Betts
Priority: normal Milestone:
Component: Backend-Remote Version: SVN trunk
Severity: minor Keywords:
Cc: Richard Boulton, Sidnei da Silva Blocked By:
Blocking: #120 Operating System: All

Description

From xapian-devel:

There may be 3 stages of this bug:

  • In the first instance a writable server enforces a simple "one connection at a

time" lock.

  • In the second instance, a remote connection includes a 'writable needed' flag

when connecting and server enforces only one writable connection.

  • Finally, we implement a queue, where we respond to the client that a writable

connection exists and we will block until it finishes. I'll avoid too much implementation speculation in this initial comment :)

Attachments (3)

test_multi_writers.py (518 bytes ) - added by Mark Hammond 18 years ago.
simple test using python
multi_writer_lock.2.patch (15.8 KB ) - added by Mark Hammond 18 years ago.
updated patch
multi_writer_lock.patch (13.3 KB ) - added by Mark Hammond 18 years ago.
patch as described

Download all attachments as: .zip

Change History (15)

by Mark Hammond, 18 years ago

Attachment: test_multi_writers.py added

simple test using python

comment:1 by Olly Betts, 18 years ago

Status: newassigned

Looks pretty good - a few comments:

There's no point using add_database() in the check that we can open all the read-only databases - just create and ignore each one like you do for the writable case.

You're missing spaces in the dissected "Starting server" message (and "port" might as well be factored out too!)

xapian-progsrv inherently only has a single connection which is established right away, so checking that the database can be opened in xapian-progsrv.cc is just adding start-up overhead for no benefit.

Replacing db_->get_description() with dbpaths[0] makes the error context less useful - a little helper function with joins all the elements of dbpaths[0] with spaces between would be better.

initialise() is now only used once, so might as well be merged into the ctor.

"its" should be "it's" (comment in ~RemoteServer)!

(And unconnected with your changes, RemoteServer's default timeout values are probably unused, in which case they should be removed).

If you have time to address those that would be great. Otherwise I can probably sort them out, but not right now as I need to stop being distracted from rejigging index_text.

comment:2 by Olly Betts, 18 years ago

Blocking: 118 added
Summary: tcp server doesn't enforce one-writer constraintxapian-tcpsrv doesn't enforce one-writer constraint

I believe we should try to make sure that at least the first stage as listed in the original report is implemented for 1.0.0, since it's a rather nasty hole in our database locking scheme, and we have a patch to fix it which is "mostly there".

I've now addressed the parenthetical point about the unused timeouts in RemoteServer - they're history.

I'll have a think about the test harness. It's great for running arbitrary API calls against anonymous databases, but it's hard work to do this sort of stuff as it stands.

comment:3 by Olly Betts, 18 years ago

Blocking: 142 added

by Mark Hammond, 18 years ago

Attachment: multi_writer_lock.2.patch added

updated patch

by Mark Hammond, 18 years ago

Attachment: multi_writer_lock.patch added

patch as described

comment:4 by Mark Hammond, 18 years ago

attachments.isobsolete: 01

comment:5 by Olly Betts, 18 years ago

I'm working on applying this, but for some reason it causes problems with running tests under valgrind. I'm investigating...

comment:6 by Olly Betts, 18 years ago

Blocking: 120 added; 118 removed
Severity: normalenhancement
Summary: xapian-tcpsrv doesn't enforce one-writer constraintWritable xapian-tcpsrv should allow queueing of connections

Turns out I get the same valgrind issues without the patch too! So committed a revised version (the version here doesn't compile on UNIX, xapian-progsrv doesn't work, and GCC warns about the order of class member initialisation).

So this bug is no longer a 1.0.0 blocker - it would be nice to support queued connections in the 1.0 release series I though, so moving to block that metabug instead and updating the summary.

comment:7 by Richard Boulton, 18 years ago

Blocking: 142 removed
Cc: richard@… added

Missed this discussion because I wasn't subscribed to the bug. We really need to work out how to make bugzilla email a list with all bug changes, or move to a new bugtracker (after 1.0, of course).

This bug clearly isn't a blocker for 142, since 142 is now fixed and this one isn't, so I'm removing the block on 142. Incidentally, I've just noticed a load of valgrind failures in the testsuite (exclusively for the remote-tcp backend).

I was about to investigate, but it sounds like Olly is already doing so, so

I'll get on with something else.

comment:8 by Olly Betts, 18 years ago

I'd marked 142 as depending on this just to avoid anyone trying to fix that independently since there was a patch here. I realise the relationship didn't really fit "A blocks B". I would have merged the two bugs, but our bugzilla doesn't seem to have a way to do that.

I've pretty much run out of ideas on the valgrind issues, and then noticed that the problem wasn't made worse by this patch, so gave up and applied it. I'm not actively looking at them now, so feel free to.

Oddly I didn't notice them before though, and I do run "make check" pretty often! I did upgrade to feisty fairly recently (while working on the zlib patch

  • valgrind reported errors with that which turned out to be known problems which

there's normally a suppression for but it wasn't in the standard list for the latest zlib library version - I couldn't recompile valgrind from source for some reason, so I gave up and upgraded, before I discovered it was a missing suppression) so perhaps that's related.

I see two different problems:

(A) This fails because with "EOF" errors (all tests fail, but restricting to zerodocid1 makes it easier to track down):

./runtest ./apitest -v -b remote zerodocid1

If you manually hack runsrv to add "2> /dev/null" to the last line (the exec) then the test passes. But redirecting to a file shows that there's no output on stderr. Also, I believe Xapian actually closes fd 2 anyway. So I'm a bit puzzled there.

(B) This reports memory leaks:

./runtest ./apitest -v -b remotetcp zerodocid1

They appear to be memory that's still accessible but that the allocations grow when you rerun the test. The valgrind output appears to just list a load of allocations which either happened before main, or as the test harness initialises. I poked at it for a bit, but this seems to be a bogus report of the type I've seen before, and even if it's real I don't view a "leak" of memory which we still have a pointer to as a showstopper for releasing 1.0.0.

comment:9 by Richard Boulton, 18 years ago

After investigating the valgrind issues with remotetcp, I quite agree that it looks like a false positive, so doesn't need to block 1.0.0 (though we should try and fix it to clean up the output at some point).

I can't make sense of the problem with the "remote" backend, but I see the same behaviour as you.

comment:10 by Olly Betts, 18 years ago

attachments.isobsolete: 01

comment:11 by Olly Betts, 18 years ago

Resolution: fixed
Status: assignedclosed

Closing in favour of bug#145 which covers the remaining issues in this bug.

comment:12 by Olly Betts, 18 years ago

Operating System: All
Resolution: fixedreleased

Fixed in 1.0.0 release.

Note: See TracTickets for help on using tickets.