Opened 11 years ago
Closed 9 years ago
#640 closed defect (incomplete)
Core dump when using python bindings for Xapian
Reported by: | Jesse | Owned by: | Richard Boulton |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Xapian-bindings (Python) | Version: | 1.2.15 |
Severity: | major | Keywords: | core dump python bindings |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | Linux |
Description (last modified by )
Occasionally, when running some Python code that uses Xapian databases, I encounter a core dump in the Xapian Python bindings. Please advise on what debugging information would be useful, or how I might avoid this issue programmatically.
I'm running the following packages on RHEL 6: xapian-bindings.x86_64 0:1.2.15-1, xapian-bindings-python.x86_64 0:1.2.15-1, xapian-core.x86_64 0:1.2.15-1, xapian-core-libs.x86_64 0:1.2.15-1
```
# python --version
Python 2.6.6
```
Error starts with:
```
File "/usr/lib/python2.6/site-packages/netflowindexer-0.1.38-py2.6.egg/netflowindexer/base/indexer.py", line 111, in real_index_files
    database.replace_document(key, doc)
xapian.DatabaseError: Error reading block 933718066: got end of file
*** glibc detected *** /usr/bin/python: free(): invalid next size (normal): 0x0000000002a2b0e0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3dd1076166]
/lib64/libc.so.6[0x3dd1078c93]
/usr/lib64/libxapian.so.22(_ZN10ChertTable5closeEb+0x4a)[0x7f35ef23c5ba]
/usr/lib64/libxapian.so.22(_ZN10ChertTableD2Ev+0xf)[0x7f35ef23d13f]
/usr/lib64/libxapian.so.22(+0xc88a5)[0x7f35ef2248a5]
/usr/lib64/libxapian.so.22(+0xc8bc9)[0x7f35ef224bc9]
/usr/lib64/libxapian.so.22(_ZN6Xapian8DatabaseD1Ev+0x4a)[0x7f35ef1ad9ba]
/usr/lib64/libxapian.so.22(_ZN6Xapian16WritableDatabaseD0Ev+0x9)[0x7f35ef1ada29]
/usr/lib64/python2.6/site-packages/xapian/_xapian.so(+0x29263)[0x7f35ef574263]
/usr/lib64/python2.6/site-packages/xapian/_xapian.so(+0x1d586)[0x7f35ef568586]
/usr/lib64/libpython2.6.so.1.0[0x33b9a79e4b]
/usr/lib64/libpython2.6.so.1.0[0x33b9a9a75c]
/usr/lib64/libpython2.6.so.1.0[0x33b9a79e4b]
/usr/lib64/libpython2.6.so.1.0[0x33b9a5538b]
/usr/lib64/libpython2.6.so.1.0[0x33b9a69382]
/usr/lib64/libpython2.6.so.1.0[0x33b9afa7cb]
/usr/lib64/libpython2.6.so.1.0[0x33b9afa7db]
/usr/lib64/libpython2.6.so.1.0[0x33b9afa7db]
/usr/lib64/libpython2.6.so.1.0[0x33b9a78287]
/usr/lib64/libpython2.6.so.1.0(PyDict_SetItem+0xa7)[0x33b9a7acf7]
/usr/lib64/libpython2.6.so.1.0(PyDict_SetItemString+0x40)[0x33b9a7aed0]
/usr/lib64/libpython2.6.so.1.0(PyImport_Cleanup+0x11b)[0x33b9ae980b]
/usr/lib64/libpython2.6.so.1.0(Py_Finalize+0x11b)[0x33b9af28ab]
/usr/lib64/libpython2.6.so.1.0(Py_Main+0x596)[0x33b9aff2d6]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3dd101ed1d]
/usr/bin/python[0x400649]
```
Attachments (1)
Change History (10)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Description: | modified |
---|---|
Priority: | high → normal |
The DatabaseError suggests that your database is corrupted - have you tried running xapian-check on it? But even a corrupt database shouldn't result in a crash like this, so there's still a bug here.
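With several databases, running xapian-check over all of them can be scripted. A minimal sketch, assuming `xapian-check` exits with a non-zero status when it finds a problem (worth confirming for your installed version); `find_bad_databases` is a hypothetical helper name, and the checker command is a parameter only so the helper can be exercised without Xapian installed.

```python
import subprocess
import sys


def find_bad_databases(paths, check_cmd=("xapian-check",)):
    """Run a checker over each database directory and return the failing paths.

    Assumption: the checker (xapian-check by default) exits non-zero when it
    detects a problem. check_cmd is configurable so the helper can be tested
    without Xapian being installed.
    """
    bad = []
    for path in paths:
        # Each database is checked in its own child process; the checker's
        # output goes straight to this process's stdout/stderr.
        status = subprocess.call(list(check_cmd) + [path])
        if status != 0:
            bad.append(path)
    return bad
```

For example, `find_bad_databases(sys.argv[1:])` in a small wrapper script checks every directory named on the command line and returns those the checker complained about.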
It sounds like the heap's already corrupted at this point, so this probably isn't where the problem lies. Running under valgrind's memcheck tool might identify where things actually start to go wrong, though it will slow things down significantly, so you probably don't want to run your production server like that.
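A sketch of running the indexer under memcheck from Python, assuming `valgrind` is on the PATH; `valgrind_argv` and `run_under_valgrind` are hypothetical helpers, and `indexer.py` below is a placeholder script name. `--tool=memcheck`, `--track-origins=yes` and `--log-file=` are standard valgrind options. CPython's own small-object allocator tends to generate spurious memcheck reports unless a suppressions file is supplied, so the log needs some filtering by eye.

```python
import subprocess
import sys


def valgrind_argv(script_args, log_file="memcheck.log"):
    """Build a command line that runs a Python script under valgrind memcheck.

    --track-origins=yes reports where uninitialised values came from, at the
    cost of extra slowdown on top of memcheck's usual overhead.
    """
    return [
        "valgrind",
        "--tool=memcheck",
        "--track-origins=yes",
        "--log-file=" + log_file,
        sys.executable,
    ] + list(script_args)


def run_under_valgrind(script_args, log_file="memcheck.log"):
    # Blocks until the traced process exits and returns its exit status.
    return subprocess.call(valgrind_argv(script_args, log_file))
```

Usage would look like `run_under_valgrind(["indexer.py"])`, after which `memcheck.log` holds the report.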
A lighter-weight alternative is to tell glibc to do validity checks on the heap, which might catch the corruption sooner (though probably not at the exact point where it happens, as valgrind can):
```
export MALLOC_CHECK_=2
```
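If the indexer is launched from another Python process rather than a shell, the variable has to go into the child's environment: glibc reads `MALLOC_CHECK_` when a process initialises its allocator, so setting it in `os.environ` inside an interpreter that is already running does not affect that interpreter. A minimal sketch; `run_with_malloc_check` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def run_with_malloc_check(argv, level="2"):
    """Run argv in a child process with glibc's MALLOC_CHECK_ heap checks on.

    The variable is placed in a copy of the current environment and passed
    only to the child, because glibc reads it at process start-up.
    With level 2, glibc calls abort() when it detects heap corruption.
    """
    env = dict(os.environ)
    env["MALLOC_CHECK_"] = str(level)
    return subprocess.call(argv, env=env)
```

An abort triggered by the check shows up as a negative return value (the signal number) from `subprocess.call`, and produces a core file if core dumps are enabled.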
comment:3 by , 11 years ago
Hello,
Thank you for the advice. I've attached a run of my program under valgrind which exhibits the failure. Please let me know what you see in this output, and if this is helpful. I've also turned on core dumps for this machine, and have several core files from previous failures that are about 22 MB apiece. Please let me know if one of these would be helpful, and where I should send them.
Cheers,
Jesse
comment:4 by , 11 years ago
It looks to me like the corrupt database is causing a write to just outside the block. We ought to catch that, but the more fundamental problem you appear to have is a corrupted database. Did you try running xapian-check on it as I suggested?
comment:5 by , 11 years ago
Yes, I forgot to mention that. Running xapian-check on all the databases resulted in no error reports. :(
That said, I can remove databases associated with the errors and things will proceed well enough for a while. However, it seems that there is always a crash after a day or so. I have even tried removing all databases and starting fresh, but something seems to corrupt the databases after a while.
I understand that this corruption is likely my own program's fault, but I would also like to get this obvious bug resolved. :)
Cheers,
Jesse
comment:6 by , 10 years ago
I'm wondering if the recently reported #645 is related - that's also a case where the database appears to have gone bad, but it seems the problem is with the data in-memory rather than on disk.
comment:7 by , 10 years ago
If your code calls reopen() on Database objects, this may be fixed by [826d1a19cc356e7bf66c1681626e70af32967447]. The bug fixed there leads to junk data being used in cursors, which could cause the symptoms you reported.
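To check whether this applies, look for `reopen()` calls in the code. A common reason for them is recovering from `xapian.DatabaseModifiedError`, which a reader can hit when a writer has moved the database on; the usual response is to reopen and retry. The sketch below shows that pattern with the exception class passed in so it runs without Xapian; `with_reopen_retry` is a hypothetical helper, not part of the bindings. On 1.2.15 the reopen path is where the commit mentioned above applies, so upgrading matters more than how the retry is written.

```python
def with_reopen_retry(db, operation, modified_error, retries=3):
    """Run operation(db), calling db.reopen() and retrying on modified_error.

    With the Xapian bindings, modified_error would be
    xapian.DatabaseModifiedError. After `retries` failed attempts, one final
    attempt is made without catching, so the error reaches the caller.
    """
    for _ in range(retries):
        try:
            return operation(db)
        except modified_error:
            # The revision being read was replaced by a writer; move to the
            # latest revision and try again.
            db.reopen()
    return operation(db)
```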
comment:8 by , 9 years ago
No submitter response in 7 months, so closing this ticket as "incomplete".
The reported symptoms are consistent with the bug fixed by the commit mentioned in comment:7, but if you can reproduce this with the latest 1.2.x release (currently 1.2.21) please let us know and we can reopen and investigate.
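Before re-testing, it may help to confirm which Xapian version the Python process actually loads. The bindings expose `xapian.version_string()`; the helpers `parse_version` and `at_least` below are hypothetical names, written so the comparison can run without Xapian installed.

```python
def parse_version(version_string):
    """Turn a dotted version such as '1.2.15' into a comparable tuple (1, 2, 15)."""
    parts = []
    for piece in version_string.split("."):
        # Keep only the leading digits of each component, ignoring any
        # non-numeric suffix.
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def at_least(version_string, minimum):
    """Return True if version_string is equal to or newer than the minimum tuple."""
    return parse_version(version_string) >= tuple(minimum)
```

With the bindings installed, `at_least(xapian.version_string(), (1, 2, 21))` reports whether the running library is at least 1.2.21.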
comment:9 by , 9 years ago
Resolution: | → incomplete |
---|---|
Status: | new → closed |
Oops, actually closing as "incomplete" this time. Again, more details welcome.
Hi Richard,
Anything I can add to this to help debug? Debug statements? Core dump? Code snippet?
Cheers,
Jesse