Opened 9 years ago
Closed 9 years ago
#696 closed defect (fixed)
Use after free with WILDCARD_LIMIT_MOST_FREQUENT
Reported by: | Alex Coventry | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.3.4 |
Component: | Library API | Version: | 1.3.3 |
Severity: | major | Keywords: | 1.3.3 segfault git master 9c9213d |
Cc: | Blocked By: | ||
Blocking: | Operating System: | Linux |
Description (last modified by )
I don't have time to reduce this to a minimal test case, but the process for reproducing the failure is automated, so even though it takes a long time to run, it's not too much human effort.
Start up a $10 Ubuntu 14.04 digital ocean box, and run the attached bash script. Takes a couple of hours to complete. The final python command either segfaults or gets killed for excessive memory usage. Usually segfaults, in my experience. With a bit of print debugging, I've determined it's segfaulting on "enquire.get_mset(0, numresults)", unsurprisingly.
Here are the versions of all the git checkouts during the build process.
root@testscript:~/voter_lookup# for p in `find /root -name .git -type d` ; do ( cd $p ; git log -1 --pretty=format:"%h $p" ) ; done 94ca849 /root/voter_lookup/.git 1a99212 /root/alien/xapian/swig/.git 3c0adf0 /root/alien/xapian/xapian-applications/omega/.common.git/.git 3c0adf0 /root/alien/xapian/xapian-letor/.common.git/.git 9c9213d /root/alien/xapian/.git
Attachments (1)
Change History (11)
by , 9 years ago
comment:1 by , 9 years ago
Description: | modified (diff) |
---|
comment:2 by , 9 years ago
Description: | modified (diff) |
---|
comment:3 by , 9 years ago
Description: | modified (diff) |
---|
My guess is it's simply exceeding the allocated stack depth, which typically leads to a segfault - I don't have an Ubuntu machine within reach, but on Debian x86_64:
$ cat > stack.cc int f(int c) { return f(c + 1) + f(c + 2); } int main() { return f(0); } $ g++ stack.cc $ ./a.out Segmentation fault
You say it sometimes fails with OOM, which fits that conclusion.
What does free
report for this VM? How big is the database?
I'm afraid a reproducer which involves having to buy and set up a special VM and then wait several hours per iteration doesn't exactly fill me with the urge to dig in...
comment:4 by , 9 years ago
Once the script has run, only its last command is necessary to induce the segfault, and its runtime is less than 3s. If you give me an ssh key, I can give you access to the box where I've run the script already.
The box has 1G of memory. The database has 11G. There are 22 frames in the stack trace, which is included below.
root@testscript:~# free -h total used free shared buffers cached Mem: 994M 117M 876M 340K 20M 49M -/+ buffers/cache: 47M 946M Swap: 0B 0B 0B r
root@testscript:~# ls -lh voter_lookup/data/voters-xapian/ total 11G -rw-r--r-- 1 root root 0 Nov 18 07:03 flintlock -rw-r--r-- 1 root root 28 Nov 18 06:13 iamchert -rw-r--r-- 1 root root 38K Nov 18 07:58 position.baseA -rw-r--r-- 1 root root 38K Nov 18 07:58 position.baseB -rw-r--r-- 1 root root 2.4G Nov 18 07:58 position.DB -rw-r--r-- 1 root root 18K Nov 18 07:58 postlist.baseA -rw-r--r-- 1 root root 18K Nov 18 07:58 postlist.baseB -rw-r--r-- 1 root root 1.1G Nov 18 07:58 postlist.DB -rw-r--r-- 1 root root 95K Nov 18 07:58 record.baseA -rw-r--r-- 1 root root 95K Nov 18 07:58 record.baseB -rw-r--r-- 1 root root 6.0G Nov 18 07:58 record.DB -rw-r--r-- 1 root root 14K Nov 18 07:58 termlist.baseA -rw-r--r-- 1 root root 14K Nov 18 07:58 termlist.baseB -rw-r--r-- 1 root root 881M Nov 18 07:58 termlist.DB
root@testscript:~# PYTHONPATH=/root gdb python GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1 Copyright (C) 2014 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from python...(no debugging symbols found)...done. (gdb) r Starting program: /usr/bin/python [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Python 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from voter_lookup import query; print query.search({'RESIDENTIAL_ZIP':'43', 'FIRST_NAME': 'john', 'LAST_NAME': 's'}, active_element='LAST_NAME') Traceback (most recent call last): File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", line 63, in <module> from libstdcxx.v6.printers import register_libstdcxx_printers ImportError: No module named 'libstdcxx' get to 1 Program received signal SIGSEGV, Segmentation fault. 0x00007ffff5d4aa50 in LocalSubMatch::open_post_list (this=0xa2f9c0, term=..., wqf=1, factor=1, need_positions=<optimized out>, hint=0x7fffffffd830, lazy_weight=<optimized out>) at matcher/localsubmatch.cc:278 278 pl = (*hint)->open_nearby_postlist(term); (gdb) where #0 0x00007ffff5d4aa50 in LocalSubMatch::open_post_list (this=0xa2f9c0, term=..., wqf=1, factor=1, need_positions=<optimized out>, hint=0x7fffffffd830, lazy_weight=<optimized out>) at matcher/localsubmatch.cc:278 #1 0x00007ffff5c63561 in Xapian::Query::Internal::postlist_sub_or_like (this=<optimized out>, ctx=..., qopt=<optimized out>, factor=<optimized out>) at api/queryinternal.cc:588 #2 0x00007ffff5c64873 in Xapian::Internal::QueryBranch::do_or_like (this=this@entry=0xa2e5f0, ctx=..., qopt=qopt@entry=0x7fffffffd820, factor=factor@entry=1, elite_set_size=elite_set_size@entry=0, first=first@entry=0) at api/queryinternal.cc:1116 #3 0x00007ffff5c64b70 in Xapian::Internal::QueryOr::postlist (this=0xa2e5f0, qopt=0x7fffffffd820, factor=1) at api/queryinternal.cc:1430 #4 0x00007ffff5c634a1 in Xapian::Query::Internal::postlist_sub_and_like (this=<optimized out>, ctx=..., qopt=<optimized out>, factor=<optimized out>) at api/queryinternal.cc:580 #5 0x00007ffff5c6039b in Xapian::Internal::QueryAndLike::postlist_sub_and_like (this=0xa2f900, ctx=..., qopt=0x7fffffffd820, factor=1) at api/queryinternal.cc:1353 #6 0x00007ffff5c63a6d in Xapian::Internal::QueryAndLike::postlist (this=0xa2f900, qopt=0x7fffffffd820, factor=1) at api/queryinternal.cc:1342 #7 0x00007ffff5d49d85 in LocalSubMatch::get_postlist (this=0xa2f9c0, matcher=0x7fffffffde10, total_subqs_ptr=0x7fffffffd97c) at matcher/localsubmatch.cc:202 #8 0x00007ffff5d4ed71 in MultiMatch::get_mset (this=this@entry=0x7fffffffde10, first=first@entry=0, maxitems=10, check_at_least=check_at_least@entry=10, mset=..., stats=..., mdecider=mdecider@entry=0x0, sorter=0x0) at matcher/multimatch.cc:438 #9 0x00007ffff5c549b9 in Xapian::Enquire::Internal::get_mset (this=0xa13f70, first=<optimized out>, first@entry=0, maxitems=<optimized out>, maxitems@entry=10, check_at_least=<optimized out>, check_at_least@entry=0, rset=rset@entry=0x0, mdecider=mdecider@entry=0x0) at api/omenquire.cc:707 #10 0x00007ffff5c54cfd in Xapian::Enquire::get_mset (this=this@entry=0x9e69e0, first=first@entry=0, maxitems=maxitems@entry=10, check_at_least=check_at_least@entry=0, rset=rset@entry=0x0, mdecider=mdecider@entry=0x0) at api/omenquire.cc:1071 #11 0x00007ffff604c5e2 in _wrap_Enquire_get_mset__SWIG_3 (swig_obj=0x7fffffffdfd0, nobjs=3) at xapian_wrap.cc:25987 #12 _wrap_Enquire_get_mset (self=<optimized out>, args=<optimized out>) at xapian_wrap.cc:26129 #13 0x000000000049ec76 in PyEval_EvalFrameEx () #14 0x00000000004a090c in PyEval_EvalCodeEx () #15 0x0000000000499a52 in PyEval_EvalFrameEx () #16 0x00000000004a1634 in ?? () #17 0x000000000044e87c in PyRun_InteractiveOneFlags () #18 0x000000000044e998 in PyRun_InteractiveLoopFlags () #19 0x000000000044ed70 in PyRun_AnyFileExFlags () #20 0x000000000044f904 in Py_Main () #21 0x00007ffff7818ec5 in __libc_start_main (main=0x44f9c2 <main>, argc=1, argv=0x7fffffffe6b8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe6a8) at libc-start.c:287 #22 0x0000000000578c4e in _start () (gdb)
comment:5 by , 9 years ago
Interesting - *hint
shouldn't be NULL at this point, as we check for that on the line before; hint
shouldn't be NULL either, as it's just a pointer to a member of the object we call this from (and the stack shows it isn't).
My guess is that the postlist pointed to by *hint
ends up deleted at this point, though I've not yet managed to reproduce that.
Can you print the variables involved, in gdb:
p hint p *hint p (*hint)->get_description()->c_str() p term->c_str() p *this
comment:6 by , 9 years ago
Component: | Xapian-bindings (Python) → Library API |
---|---|
Milestone: | 1.3.x → 1.3.4 |
Owner: | changed from | to
Status: | new → assigned |
Summary: | Segfault/OOM in git master → Use after free with WILDCARD_LIMIT_MOST_FREQUENT |
Version: | other → 1.3.3 |
OK, have reproduced this (and directly from C++), so no need for the above.
The issue is with Xapian::Query::WILDCARD_LIMIT_MOST_FREQUENT
, which QueryParser
defaults to using for FLAG_PARTIAL - this can delete the PostList
object which *hint
points to. That wildcard limit option was new in 1.3.3, so this doesn't affect anything earlier (or 1.2.x).
comment:8 by , 9 years ago
Should be fixed by [2299e1d21e39f1295c81833ccd5037f746f4744a]. Can you confirm that deals with the segfaults?
That bug doesn't explain the OOM errors - I think they are likely mostly down to the memory used by the postlists for all the terms the wildcard expands to.
I'd expect you can avoid those by using the "glass" backend (not the default yet) - glass reference counts cursor blocks, which makes a huge difference to wildcards (because all the terms are together and so most of the cursor blocks can be reused several times over).
The downside is that the on-disk format for glass isn't completely fixed yet, so you potentially need to reindex after an upgrade. The simplest way to select it is probably just to export XAPIAN_PREFER_GLASS=1
before running the indexer - that changes the default when creating a new database.
Script which leads to the segfault.