Opened 10 years ago

Closed 8 years ago

#640 closed defect (incomplete)

Core dump when using python bindings for Xapian

Reported by: Jesse
Owned by: Richard Boulton
Priority: normal
Milestone:
Component: Xapian-bindings (Python)
Version: 1.2.15
Severity: major
Keywords: core dump python bindings
Cc:
Blocked By:
Blocking:
Operating System: Linux

Description (last modified by Olly Betts)

Occasionally, when running Python code that uses Xapian databases, I encounter a core dump in the Xapian Python bindings. Please advise on what debug information would be useful, or how I might avoid this issue programmatically.

Running the following packages on RHEL 6:

xapian-bindings.x86_64 0:1.2.15-1
xapian-bindings-python.x86_64 0:1.2.15-1
xapian-core.x86_64 0:1.2.15-1
xapian-core-libs.x86_64 0:1.2.15-1

# python --version
Python 2.6.6

Error starts with:

  File "/usr/lib/python2.6/site-packages/netflowindexer-0.1.38-py2.6.egg/netflowindexer/base/indexer.py", line 111, in real_index_files
    database.replace_document(key, doc)
xapian.DatabaseError: Error reading block 933718066: got end of file
*** glibc detected *** /usr/bin/python: free(): invalid next size (normal): 0x0000000002a2b0e0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3dd1076166]
/lib64/libc.so.6[0x3dd1078c93]
/usr/lib64/libxapian.so.22(_ZN10ChertTable5closeEb+0x4a)[0x7f35ef23c5ba]
/usr/lib64/libxapian.so.22(_ZN10ChertTableD2Ev+0xf)[0x7f35ef23d13f]
/usr/lib64/libxapian.so.22(+0xc88a5)[0x7f35ef2248a5]
/usr/lib64/libxapian.so.22(+0xc8bc9)[0x7f35ef224bc9]
/usr/lib64/libxapian.so.22(_ZN6Xapian8DatabaseD1Ev+0x4a)[0x7f35ef1ad9ba]
/usr/lib64/libxapian.so.22(_ZN6Xapian16WritableDatabaseD0Ev+0x9)[0x7f35ef1ada29]
/usr/lib64/python2.6/site-packages/xapian/_xapian.so(+0x29263)[0x7f35ef574263]
/usr/lib64/python2.6/site-packages/xapian/_xapian.so(+0x1d586)[0x7f35ef568586]
/usr/lib64/libpython2.6.so.1.0[0x33b9a79e4b]
/usr/lib64/libpython2.6.so.1.0[0x33b9a9a75c]
/usr/lib64/libpython2.6.so.1.0[0x33b9a79e4b]
/usr/lib64/libpython2.6.so.1.0[0x33b9a5538b]
/usr/lib64/libpython2.6.so.1.0[0x33b9a69382]
/usr/lib64/libpython2.6.so.1.0[0x33b9afa7cb]
/usr/lib64/libpython2.6.so.1.0[0x33b9afa7db]
/usr/lib64/libpython2.6.so.1.0[0x33b9afa7db]
/usr/lib64/libpython2.6.so.1.0[0x33b9a78287]
/usr/lib64/libpython2.6.so.1.0(PyDict_SetItem+0xa7)[0x33b9a7acf7]
/usr/lib64/libpython2.6.so.1.0(PyDict_SetItemString+0x40)[0x33b9a7aed0]
/usr/lib64/libpython2.6.so.1.0(PyImport_Cleanup+0x11b)[0x33b9ae980b]
/usr/lib64/libpython2.6.so.1.0(Py_Finalize+0x11b)[0x33b9af28ab]
/usr/lib64/libpython2.6.so.1.0(Py_Main+0x596)[0x33b9aff2d6]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3dd101ed1d]
/usr/bin/python[0x400649]

Attachments (1)

xapian_valgrind_results.txt (13.1 KB) - added by Jesse 10 years ago.
Results from valgrind run #1

Change History (10)

comment:1 by Jesse, 10 years ago

Hi Richard,

Anything I can add to this to help debug? Debug statements? Core dump? Code snippet?

Cheers,

Jesse

comment:2 by Olly Betts, 10 years ago

Description: modified
Priority: high → normal

The DatabaseError suggests that your database is corrupted - have you tried running xapian-check on it? But even a corrupt database shouldn't result in a crash like this, so there's still a bug here.

It sounds like the heap's already corrupted at this point, so this probably isn't where the problem lies. Running under valgrind's memcheck tool might identify where things actually start to go wrong, though it will slow things down significantly, so you probably don't want to run your production server like that.

A lighter-weight alternative is to tell glibc to do validity checks of the heap, which might catch the corruption sooner (though probably not at the moment it actually happens, as valgrind can):

export MALLOC_CHECK_=2
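
As a minimal sketch, the same setting can be applied to a single run instead of the whole shell session by launching the program from a small Python wrapper. The helper names `heap_check_env` and `run_with_heap_checks` are hypothetical and not part of Xapian or its bindings; the only thing taken from the advice above is the `MALLOC_CHECK_=2` variable itself.

```python
import os
import subprocess


def heap_check_env(base_env=None, level="2"):
    """Return a copy of the environment with glibc heap checking enabled.

    The original mapping is left untouched so the setting only affects
    the child process it is passed to.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MALLOC_CHECK_"] = level
    return env


def run_with_heap_checks(command):
    """Run `command` (a list of arguments) with MALLOC_CHECK_ set for that process only.

    Returns the child's exit status.
    """
    return subprocess.call(command, env=heap_check_env())
```

For example, `run_with_heap_checks(["python", "my_indexer.py"])` would start a (placeholder) indexer script with the check enabled while leaving other processes unaffected.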

by Jesse, 10 years ago

Attachment: xapian_valgrind_results.txt added

Results from valgrind run #1

comment:3 by Jesse, 10 years ago

Hello,

Thank you for the advice. I've attached a run of my program under valgrind which exhibits the failure. Please let me know what you see in this output, and if this is helpful. I've also turned on core dumps for this machine, and have several core files from previous failures that are about 22 MB apiece. Please let me know if one of these would be helpful, and where I should send them.

Cheers,

Jesse

comment:4 by Olly Betts, 10 years ago

It looks to me like the corrupt database is causing a write to just outside the block. We ought to catch that, but the more fundamental problem you appear to have is a corrupted database. Did you try running xapian-check on it as I suggested?

comment:5 by Jesse, 10 years ago

Yes, I forgot to mention that. Running xapian-check on all the databases resulted in no error reports. :(
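
For anyone repeating this check across many databases, a rough sketch of a loop around `xapian-check` follows. It assumes each database is a direct subdirectory of one root directory and that `xapian-check` exits with a non-zero status when it finds errors; both are assumptions to verify against your own layout and Xapian version. The function names are hypothetical.

```python
import os
import subprocess


def xapian_check_command(db_path):
    """Build the argument list for checking one database directory."""
    return ["xapian-check", db_path]


def check_databases(root, runner=subprocess.call):
    """Run xapian-check on each subdirectory of `root`.

    Returns the list of database paths for which the check command
    returned a non-zero exit status. `runner` is injectable so the
    loop can be exercised without xapian-check installed.
    """
    failed = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue  # skip stray files next to the databases
        if runner(xapian_check_command(path)) != 0:
            failed.append(path)
    return failed
```

A clean result here only says the on-disk tables passed the check at that moment; it does not rule out a problem with data held in memory.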

That said, I can remove databases associated with the errors and things will proceed well enough for a while. However, it seems that there is always a crash after a day or so. I have even tried removing all databases and starting fresh, but something seems to corrupt the databases after a while.

I understand that this corruption is likely my own program's fault, but I would also like to get this obvious bug resolved. :)

Cheers,

Jesse

comment:6 by Olly Betts, 10 years ago

I'm wondering if the recently reported #645 is related - that's also a case where the database appears to have gone bad, but it seems the problem is with the data in-memory rather than on disk.

comment:7 by Olly Betts, 9 years ago

If your code calls reopen() on Database objects, this may be fixed by [826d1a19cc356e7bf66c1681626e70af32967447]. The bug fixed there leads to junk data being used in cursors, which could cause the symptoms you reported.

Last edited 9 years ago by Olly Betts

comment:8 by Olly Betts, 8 years ago

No submitter response in 7 months, so closing this ticket as "incomplete".

The reported symptoms are consistent with the bug fixed by the commit mentioned in comment:7, but if you can reproduce this with the latest 1.2.x release (currently 1.2.21) please let us know and we can reopen and investigate.
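
Before retesting, it may help to confirm which Xapian version the Python bindings are actually linked against. The sketch below compares a dotted version string with 1.2.21, the release named above; `parse_version`, `is_older_than` and `installed_xapian_version` are hypothetical helper names, and the parsing assumes a plain numeric `major.minor.revision` string. `xapian.version_string()` is the bindings' call for reporting the library version.

```python
def parse_version(text):
    """Turn a dotted version string such as "1.2.15" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_older_than(installed, reference="1.2.21"):
    """Return True if `installed` is numerically older than `reference`."""
    return parse_version(installed) < parse_version(reference)


def installed_xapian_version():
    """Return the Xapian library version string, or None if the bindings are missing."""
    try:
        import xapian  # only importable where the Python bindings are installed
    except ImportError:
        return None
    return xapian.version_string()
```

With the packages listed in the description, `is_older_than("1.2.15")` is true, so that setup predates the release suggested for retesting.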

comment:9 by Olly Betts, 8 years ago

Resolution: incomplete
Status: new → closed

Oops, actually closing as "incomplete" this time. Again, more details welcome.