Opened 18 years ago

Last modified 14 years ago

#185 closed defect

Xapian operations hang with mod-python under apache — at Version 20

Reported by: Richard Boulton
Owned by: Richard Boulton
Priority: normal
Milestone: 1.0.13
Component: Xapian-bindings
Version: SVN trunk
Severity: normal
Keywords:
Cc: Olly Betts, Mark Hammond, Bas van Oostveen, Deron Meranda, herbert.poul@…, daevaorn@…, dcolish@…
Blocked By:
Blocking:
Operating System: All

Description (last modified by Olly Betts)

Xapian appears to hang as soon as any Xapian methods are called from the Python bindings, when running using mod-python. Reported at http://thread.gmane.org/gmane.comp.search.xapian.general/4486

Change History (21)

comment:1 by Bas van Oostveen, 18 years ago

Cc: v.oostveen@… added

comment:2 by Richard Boulton, 18 years ago

http://www.modpython.org/pipermail/mod_python/2007-April/023445.html has a good explanation of what is going on here.

Basically, mod-python creates a separate python interpreter for each vhost. However, the SWIG generated xapian python bindings use the simplified python handling routines, which assume that the thread lock to use is that held in the python interpreter called "main_interpreter". This results in the wrong thread lock being requested, and this always seems to result in a deadlock.

Short term workaround: Set the option "PythonInterpreter main_interpreter" in the apache configuration section for all mod_python scripts which use xapian.

Longer term fix: change SWIG to generate code which uses the full threading api.
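One way to check that the short term workaround has taken effect is to have a handler report which interpreter it is running in. The sketch below assumes mod_python 3.x, where the request object exposes `interpreter`, `content_type` and `write()`. The handler name is illustrative, and the `OK` fallback exists only so that the function can be exercised outside Apache.

```python
try:
    from mod_python import apache  # importable only inside Apache/mod_python
    OK = apache.OK
except ImportError:
    OK = 0  # stand-in value so the sketch can run outside Apache


def handler(req):
    """Report the name of the Python interpreter serving this request."""
    req.content_type = "text/plain"
    # With "PythonInterpreter main_interpreter" set, this should print
    # "main_interpreter"; without it, mod_python chooses the interpreter
    # name itself (one per vhost, as described above).
    req.write("interpreter: %s\n" % req.interpreter)
    return OK
```

If the reported name is not `main_interpreter`, the directive is probably not being applied to the section of the configuration that serves the script.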

by Richard Boulton, 18 years ago

Attachment: mptest.py added

mod-python script testing xapian

by Richard Boulton, 18 years ago

Attachment: 000-default added

Site configuration which works.

comment:3 by Mark Hammond, 18 years ago

Cc: mhammond@… added

FYI: "Longer term fix: change SWIG to generate code which uses the full threading api." isn't as easy as it sounds (well - uglier than it sounds).

The full threading API *forces* you to specify an explicit PyInterpreter object when using the GIL. For libraries which may spawn new threads which attempt to call back into Python, this is very tricky - that thread has no way to determine exactly what context it should use, so that library would be forced to remember an interpreter object in a global variable and use that. The simplified API just formalizes this process - an interpreter object is nominated the "main" one and the simplified API uses that.

In xapian's case, the threading consideration doesn't really apply - but it still would force xapian to track which interpreter is currently in use.

Eg:

1. python calls xapian
2. xapian bindings release the GIL
3. xapian winds up in code that needs to call back into Python
4. xapian bindings restore the GIL

The problem is that in the last step, the "restore the GIL" must refer to the same PyInterpreter object that was used in step 2, when the GIL was released. The problem is how to "remember" what should be used. It is likely to require thread-local-storage, for example. (Note that in the general case though, when new calls may come in on a thread never previously seen by Python, even this falls down - eg, imagine if xapian spawned new threads that wanted to call back into Python). The simplified API just papers over this - there is a single, nominated "main" interpreter, and that interpreter is *always* used, regardless of the thread making the call.

I'm surprised apache went this way, as multiple interpreter support in Python is patchy at best. For example, extension modules don't play this game - they often store Python objects in C global variables. This means that extension modules are *not* isolated in such an environment. So although each interpreter is somewhat isolated, it's not complete isolation.

http://www.python.org/dev/peps/pep-0311/ has more info on this (I wrote it :)

comment:4 by Richard Boulton, 18 years ago

Status: new → assigned

comment:5 by Olly Betts, 18 years ago

Cc: olly@… added

What was the fate of that PEP? Has the patch been included in a Python release?

Xapian doesn't thread internally, so the TLS approach should work for us. I imagine that it would be suitable for many C libraries wrapped by SWIG, so patching SWIG is worth considering.

comment:6 by Mark Hammond, 18 years ago

The PEP was accepted (although that fact was not recorded in the PEP), and those GIL state APIs referenced in the PEP are what SWIG uses. While it may be possible to patch SWIG such that it works with many C++ extensions, the GIL state APIs are more general purpose and will work with far more (ie, those that do thread internally as well as those that do not), and also ensure that multiple threaded extensions can cooperate (which is not possible using Apache's approach). It's also not clear that SWIG would always have a TLS implementation available to it. Apache using the explicit APIs is still widely considered a mistake (eg, http://mail.python.org/pipermail/web-sig/2007-August/002757.html is the most recent message to Python's web-sig, and it makes that point - google will offer many more) and almost all modern code uses the saner API provided by that PEP.

comment:7 by Olly Betts, 18 years ago

OK, have reread everything twice, I think I follow.

Richard has added a note about "PythonInterpreter main_interpreter" to the documentation. Since there's a workaround, and we're not the only Python extension to have this issue, I think we have an acceptable resolution to this for the time being.

comment:8 by Deron Meranda, 18 years ago

Cc: deron.meranda@… added
Operating System: All

comment:9 by Deron Meranda, 18 years ago

The workaround is of course only acceptable for those mod_python users who do not make use of the multiple-interpreter feature it exposes. If you do use multiple interpreters then there is still no offered workaround.

I do use other extension modules which have no problems in a multiple interpreter environment, such as MySQLdb. However this is the first SWIG-generated module I've tried using in this way. Is this really a SWIG problem, and can it be worked around in the Xapian case?

I do have a couple of questions. First, does Xapian ever call back into Python as PEP-311 describes? If not, then is there any reason to use the simplified GIL API to begin with? And secondly, assuming one only uses a non-threaded Apache MPM, is there any reason to unlock/relock the GIL inside C code anyway? And if not, is there a way to get the module to compile in a manner which would work in a multiple-interpreter environment as long as the process was always single-threaded?

Also, I don't really agree that the multiple-interpreter support that mod_python provides is "widely" considered a mistake. The only misgivings really are that some extension modules are buggy, and that the simplified GIL API is now promoting "broken" modules (although this was a reasonable trade off between reducing bugginess for everybody and completely breaking one feature of Python that only a few users would notice). But that's all a matter of opinion.

comment:10 by Mark Hammond, 18 years ago

xapian does have a need to acquire the GIL, but only from a thread that previously released it. xapian does not currently create new threads that call back into Python; without the simplified API, this is impossible to do in a cross-platform way. Each app managing its own interpreter also makes it impossible for multiple libraries to share python thread-states etc, making certain interactions between such libraries impossible.

Sadly, it really is Python that is "buggy" here - the multi-interpreter support is only 1/2 baked and the shortcomings of the original API are acknowledged - see the python-dev discussions leading up to that PEP. I'm glad it works for Apache and its extensions and I didn't mean to suggest anything other than SWIG continue supporting the simplified API by default. I'm sure contributions of alternative strategies that are optimized for Apache would be most welcome.

comment:11 by Deron Meranda, 18 years ago

Yes, this seems to be the result of an improperly designed Python architecture where two internal parts (GIL/threading and multiple-interpreter) can be mutually-exclusive (or mutually-deadlocking). Too bad.

However, even if one is able to use the "PythonInterpreter main_interpreter" workaround previously mentioned, it still doesn't prevent deadlock in all cases.

This appears to be a problem only when using Python 2.3.x (probably an issue with the simplified GIL code in that version). More detail can be found in mod_python issue 217 notes:

http://issues.apache.org/jira/browse/MODPYTHON-217

Other useful links to related information for the curious:

http://docs.python.org/api/threads.html
http://modpython.org/live/current/doc-html/pyapi-interps.html

I can reproduce this deadlock with Python 2.3.4 while calling nothing other than

xapian.xapian_version_string()

while in the main_interpreter interpreter. You don't need any of the complexities of all the virtual host stuff in the attached config to reproduce this deadlock, just a simple 4-line .htaccess is enough.

So there's still no reasonable workaround. I'm beginning to think that the best approach is to get the SWIG stuff completely out of the mod_python process: such as by writing a pure-python xapian proxy module which communicates with a fork-exec'd child python process which can then import the SWIG-generated xapian without conflicting GIL/thread issues. But I suspect there are challenging issues there too.
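A minimal sketch of the child-process idea, assuming Python 3.7 or later and a JSON reply on the child's stdout (both are choices made here, not part of the proposal). The function name is hypothetical, and `xapian.version_string()` is assumed to be available in the installed bindings; the child reports an error instead of failing if it is not.

```python
import json
import subprocess
import sys

# Code run in the child process. Only the child imports xapian, so the
# SWIG-generated module never loads inside the calling process.
CHILD_CODE = r"""
import json, sys
try:
    import xapian
    result = {"ok": True, "version": xapian.version_string()}
except Exception as exc:  # bindings missing, or the call is unavailable
    result = {"ok": False, "error": str(exc)}
json.dump(result, sys.stdout)
"""


def xapian_version_via_child():
    """Ask a freshly started Python process for the Xapian version."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```

Starting a new process per call is expensive. A longer-lived child that reads requests from a pipe would amortise that cost, but it brings the IPC complexity with it.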

comment:12 by Olly Betts, 18 years ago

Others understand most of this much better than I do, but it seems it would be a real headache to fork/exec and then (I'm assuming) have to do everything via IPC or similar. I'm not sure we really want to go there.

Are you saying this works correctly with Python 2.4 and later, or is it just that the problems can't be demonstrated with such a trivial example? If it's only Python 2.3, what's different there, and can we work around that difference?

If SWIG could implement the locking better, I'm happy to apply suitable patches there (unless the SWIG Python people disagree with the new approach).

comment:13 by Richard Boulton, 18 years ago

My understanding is that this is only a problem with Python 2.3. Therefore, I've updated the documentation of the workaround to suggest using Python 2.4 or later (I've tested that the main_interpreter workaround works with some version of python (can't remember whether it was 2.4 or 2.5), but haven't tested that it fails with 2.3 - I'm willing to believe the claim made earlier in this bug, though).

Another approach might be to add a configure option to remove the "-threads" option from the swig invocation when compiling for use with mod python. IIUC, without this option swig doesn't make any attempts to release the GIL, which is bad news for any normal multi-threaded program using xapian, but might be a worthwhile tradeoff for mod-python programs in setups which insist on using multiple interpreters. On the other hand, this would only work in maintainer mode, and would therefore require anyone building for such a setup to have all the necessary build tools installed (in particular, having the right version of swig), so this could be more trouble for the Xapian team than it's worth. Plus, it's liable to kill throughput on multi-core systems, which we don't want to get a reputation for. So, I don't propose trying this.

comment:14 by Michael Barton, 18 years ago

We've run into this problem with xapian 1.0.2 on python 2.4. Switching to main_interpreter hasn't helped.

If rebuilding the bindings without swig's threading support will work, I think we'd trade xapian performance for an end to the deadlocks. Especially since our current solution is to popen() a second script that makes all the xapian calls.

comment:15 by Olly Betts, 18 years ago

I've just fixed a bug in the mutex code swig uses for python.

The fix was simply to change `PyThread_free_lock(mutex_);' to `PyThread_release_lock(mutex_);'. This occurs just once in python/modern/xapian_wrap.cc.

Can those of you who have experienced these problems try that and report back?

I suspect this will fix problems for some people but not everyone but it will be interesting to see.
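The distinction matters because a scoped guard has to unlock its mutex when it goes out of scope, whereas PyThread_free_lock deallocates the lock. A loose Python analogue is sketched below. It is not the SWIG-generated C++, and the `Guard` class and helper function are illustrative only.

```python
import threading


class Guard:
    """Hold `lock` for the lifetime of a `with` block."""

    def __init__(self, lock):
        self._lock = lock

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Release the lock so that it can be taken again. Merely discarding
        # the lock here (the analogue of freeing it) would leave it held,
        # and any other thread waiting on it would block forever.
        self._lock.release()
        return False


def lock_is_reusable_after_guard(lock):
    """Return True if `lock` can be re-acquired once a Guard has exited."""
    with Guard(lock):
        pass
    acquired = lock.acquire(blocking=False)
    if acquired:
        lock.release()
    return acquired
```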

comment:16 by Richard Boulton, 18 years ago

As far as I can see, in modern/xapian_wrap.cc, the only call to PyThread_release_lock is in the destructor of "Guard", which in turn is only used by the SWIG_GUARD() macro, which in turn is used in 4 places in the code. However, these 4 places are in the Director base class, and none of them seem to be called by the Xapian code. So, unless I'm missing something, this won't change anything.

However, I've just tested the current Xapian HEAD with mod-python, using python 2.5 (on Ubuntu Dapper), and it works fine there. (I do need to use the main_interpreter workaround described earlier in this bug report, however.) So, we need more information from someone who is having problems to make any progress with this bug.

comment:18 by Herbert Poul, 17 years ago

Cc: herbert.poul@… added

i am having the same problem - on debian with apache 2 + prefork, mod_python-3.3.1, python 2.5.1 and xapian+bindings 1.0.6

i have tried using: PythonInterpreter main_interpreter in my apache config .. but it doesn't change anything .. - i have verified that the setting is working by outputting req.interpreter: main_interpreter ... so no idea what i missed ?

do i have to compile xapian-bindings with a special option or something ? any idea what else i could try ? (other than not using mod_python or xapian)

comment:19 by Michael Barton, 17 years ago

I'm still having this problem on ubuntu dapper (so I'm on 1.0.5) with python 2.4 and mod_python 3.1.4. It locks up every time, and running xapian code in the main interpreter has never helped any that I could tell. We have a semi-complex apache configuration with some legacy stuff that requires explicit separate interpreters, and that's probably a factor.

I've been using a build of the bindings minus swig -threads and that works, but maintainer mode is sort of harrowing.

in reply to:  19 comment:20 by Olly Betts, 17 years ago

Description: modified (diff)

Replying to michael.barton:

I've been using a build of the bindings minus swig -threads and that works, but maintainer mode is sort of harrowing.

Does it work with -threads if you use SWIG SVN HEAD instead of the SWIG in the Xapian tree (or use Xapian SVN HEAD, which includes a newer SWIG snapshot)? There's a change in SWIG which might be relevant, but neither Richard nor I can reproduce this issue currently.

Specifying "main_interpreter" will still be necessary.
