Opened 8 years ago

Closed 8 years ago

#696 closed defect (fixed)

Use after free with WILDCARD_LIMIT_MOST_FREQUENT

Reported by: Alex Coventry Owned by: Olly Betts
Priority: normal Milestone: 1.3.4
Component: Library API Version: 1.3.3
Severity: major Keywords: 1.3.3 segfault git master 9c9213d
Cc: Blocked By:
Blocking: Operating System: Linux

Description (last modified by Olly Betts)

I don't have time to reduce this to a minimal test case, but the process for reproducing the failure is automated, so even though it takes a long time to run, it's not too much human effort.

Start up a $10 Ubuntu 14.04 digital ocean box, and run the attached bash script. Takes a couple of hours to complete. The final python command either segfaults or gets killed for excessive memory usage. Usually segfaults, in my experience. With a bit of print debugging, I've determined it's segfaulting on "enquire.get_mset(0, numresults)", unsurprisingly.

Here are the versions of all the git checkouts during the build process.

root@testscript:~/voter_lookup# for p in `find /root -name .git -type d` ; do ( cd $p ; git log -1 --pretty=format:"%h $p" ) ; done
94ca849 /root/voter_lookup/.git
1a99212 /root/alien/xapian/swig/.git
3c0adf0 /root/alien/xapian/xapian-applications/omega/.common.git/.git
3c0adf0 /root/alien/xapian/xapian-letor/.common.git/.git
9c9213d /root/alien/xapian/.git

Attachments (1)

setup.sh (957 bytes ) - added by Alex Coventry 8 years ago.
Script which leads to the segfault.

Download all attachments as: .zip

Change History (11)

by Alex Coventry, 8 years ago

Attachment: setup.sh added

Script which leads to the segfault.

comment:1 by Alex Coventry, 8 years ago

Description: modified (diff)

comment:2 by Alex Coventry, 8 years ago

Description: modified (diff)

comment:3 by Olly Betts, 8 years ago

Description: modified (diff)

My guess is it's simply exceeding the allocated stack depth, which typically leads to a segfault - I don't have an Ubuntu machine within reach, but on Debian x86_64:

$ cat > stack.cc
int f(int c) {
   return f(c + 1) + f(c + 2);
}

int main() { return f(0); }
$ g++ stack.cc
$ ./a.out 
Segmentation fault

You say it sometimes fails with OOM, which fits that conclusion.

What does free report for this VM? How big is the database?

I'm afraid a reproducer which involves having to buy and set up a special VM and then wait several hours per iteration doesn't exactly fill me with the urge to dig in...

comment:4 by Alex Coventry, 8 years ago

Once the script has run, only its last command is necessary to induce the segfault, and its runtime is less than 3s. If you give me an ssh key, I can give you access to the box where I've run the script already.

The box has 1G of memory. The database has 11G. There are 22 frames in the stack trace, which is included below.

root@testscript:~# free -h
             total       used       free     shared    buffers     cached
Mem:          994M       117M       876M       340K        20M        49M
-/+ buffers/cache:        47M       946M
Swap:           0B         0B         0B
r
root@testscript:~# ls -lh voter_lookup/data/voters-xapian/
total 11G
-rw-r--r-- 1 root root    0 Nov 18 07:03 flintlock
-rw-r--r-- 1 root root   28 Nov 18 06:13 iamchert
-rw-r--r-- 1 root root  38K Nov 18 07:58 position.baseA
-rw-r--r-- 1 root root  38K Nov 18 07:58 position.baseB
-rw-r--r-- 1 root root 2.4G Nov 18 07:58 position.DB
-rw-r--r-- 1 root root  18K Nov 18 07:58 postlist.baseA
-rw-r--r-- 1 root root  18K Nov 18 07:58 postlist.baseB
-rw-r--r-- 1 root root 1.1G Nov 18 07:58 postlist.DB
-rw-r--r-- 1 root root  95K Nov 18 07:58 record.baseA
-rw-r--r-- 1 root root  95K Nov 18 07:58 record.baseB
-rw-r--r-- 1 root root 6.0G Nov 18 07:58 record.DB
-rw-r--r-- 1 root root  14K Nov 18 07:58 termlist.baseA
-rw-r--r-- 1 root root  14K Nov 18 07:58 termlist.baseB
-rw-r--r-- 1 root root 881M Nov 18 07:58 termlist.DB
root@testscript:~# PYTHONPATH=/root gdb python
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...(no debugging symbols found)...done.
(gdb) r
Starting program: /usr/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from voter_lookup import query; print query.search({'RESIDENTIAL_ZIP':'43', 'FIRST_NAME': 'john', 'LAST_NAME': 's'}, active_element='LAST_NAME')
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", line 63, in <module>
    from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named 'libstdcxx'
get to 1

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5d4aa50 in LocalSubMatch::open_post_list (this=0xa2f9c0, term=..., wqf=1, factor=1, need_positions=<optimized out>, hint=0x7fffffffd830, lazy_weight=<optimized out>) at matcher/localsubmatch.cc:278
278		    pl = (*hint)->open_nearby_postlist(term);
(gdb) where
#0  0x00007ffff5d4aa50 in LocalSubMatch::open_post_list (this=0xa2f9c0, term=..., wqf=1, factor=1, need_positions=<optimized out>, hint=0x7fffffffd830, lazy_weight=<optimized out>) at matcher/localsubmatch.cc:278
#1  0x00007ffff5c63561 in Xapian::Query::Internal::postlist_sub_or_like (this=<optimized out>, ctx=..., qopt=<optimized out>, factor=<optimized out>) at api/queryinternal.cc:588
#2  0x00007ffff5c64873 in Xapian::Internal::QueryBranch::do_or_like (this=this@entry=0xa2e5f0, ctx=..., qopt=qopt@entry=0x7fffffffd820, factor=factor@entry=1, elite_set_size=elite_set_size@entry=0, first=first@entry=0)
    at api/queryinternal.cc:1116
#3  0x00007ffff5c64b70 in Xapian::Internal::QueryOr::postlist (this=0xa2e5f0, qopt=0x7fffffffd820, factor=1) at api/queryinternal.cc:1430
#4  0x00007ffff5c634a1 in Xapian::Query::Internal::postlist_sub_and_like (this=<optimized out>, ctx=..., qopt=<optimized out>, factor=<optimized out>) at api/queryinternal.cc:580
#5  0x00007ffff5c6039b in Xapian::Internal::QueryAndLike::postlist_sub_and_like (this=0xa2f900, ctx=..., qopt=0x7fffffffd820, factor=1) at api/queryinternal.cc:1353
#6  0x00007ffff5c63a6d in Xapian::Internal::QueryAndLike::postlist (this=0xa2f900, qopt=0x7fffffffd820, factor=1) at api/queryinternal.cc:1342
#7  0x00007ffff5d49d85 in LocalSubMatch::get_postlist (this=0xa2f9c0, matcher=0x7fffffffde10, total_subqs_ptr=0x7fffffffd97c) at matcher/localsubmatch.cc:202
#8  0x00007ffff5d4ed71 in MultiMatch::get_mset (this=this@entry=0x7fffffffde10, first=first@entry=0, maxitems=10, check_at_least=check_at_least@entry=10, mset=..., stats=..., mdecider=mdecider@entry=0x0, sorter=0x0)
    at matcher/multimatch.cc:438
#9  0x00007ffff5c549b9 in Xapian::Enquire::Internal::get_mset (this=0xa13f70, first=<optimized out>, first@entry=0, maxitems=<optimized out>, maxitems@entry=10, check_at_least=<optimized out>, check_at_least@entry=0, 
    rset=rset@entry=0x0, mdecider=mdecider@entry=0x0) at api/omenquire.cc:707
#10 0x00007ffff5c54cfd in Xapian::Enquire::get_mset (this=this@entry=0x9e69e0, first=first@entry=0, maxitems=maxitems@entry=10, check_at_least=check_at_least@entry=0, rset=rset@entry=0x0, mdecider=mdecider@entry=0x0)
    at api/omenquire.cc:1071
#11 0x00007ffff604c5e2 in _wrap_Enquire_get_mset__SWIG_3 (swig_obj=0x7fffffffdfd0, nobjs=3) at xapian_wrap.cc:25987
#12 _wrap_Enquire_get_mset (self=<optimized out>, args=<optimized out>) at xapian_wrap.cc:26129
#13 0x000000000049ec76 in PyEval_EvalFrameEx ()
#14 0x00000000004a090c in PyEval_EvalCodeEx ()
#15 0x0000000000499a52 in PyEval_EvalFrameEx ()
#16 0x00000000004a1634 in ?? ()
#17 0x000000000044e87c in PyRun_InteractiveOneFlags ()
#18 0x000000000044e998 in PyRun_InteractiveLoopFlags ()
#19 0x000000000044ed70 in PyRun_AnyFileExFlags ()
#20 0x000000000044f904 in Py_Main ()
#21 0x00007ffff7818ec5 in __libc_start_main (main=0x44f9c2 <main>, argc=1, argv=0x7fffffffe6b8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe6a8) at libc-start.c:287
#22 0x0000000000578c4e in _start ()
(gdb) 

comment:5 by Olly Betts, 8 years ago

Interesting - *hint shouldn't be NULL at this point, as we check for that on the line before; hint shouldn't be NULL either, as it's just a pointer to a member of the object we call this from (and the stack shows it isn't).

My guess is that the postlist pointed to by *hint ends up deleted at this point, though I've not yet managed to reproduce that.

Can you print the variables involved, in gdb:

p hint
p *hint
p (*hint)->get_description()->c_str()
p term->c_str()
p *this

comment:6 by Olly Betts, 8 years ago

Component: Xapian-bindings (Python)Library API
Milestone: 1.3.x1.3.4
Owner: changed from Richard Boulton to Olly Betts
Status: newassigned
Summary: Segfault/OOM in git masterUse after free with WILDCARD_LIMIT_MOST_FREQUENT
Version: other1.3.3

OK, have reproduced this (and directly from C++), so no need for the above.

The issue is with Xapian::Query::WILDCARD_LIMIT_MOST_FREQUENT, which QueryParser defaults to using for FLAG_PARTIAL - this can delete the PostList object which *hint points to. That wildcard limit option was new in 1.3.3, so this doesn't affect anything earlier (or 1.2.x).

comment:7 by Alex Coventry, 8 years ago

Thanks for investigating it.

comment:8 by Olly Betts, 8 years ago

Should be fixed by [2299e1d21e39f1295c81833ccd5037f746f4744a]. Can you confirm that deals with the segfaults?

That bug doesn't explain the OOM errors - I think they are likely mostly down to the memory used by the postlists for all the terms the wildcard expands to.

I'd expect you can avoid those by using the "glass" backend (not the default yet) - glass reference counts cursor blocks, which makes a huge difference to wildcards (because all the terms are together and so most of the cursor blocks can be reused several times over).

The downside is that the on-disk format for glass isn't completely fixed yet, so you potentially need to reindex after an upgrade. The simplest way to select it is probably just to export XAPIAN_PREFER_GLASS=1 before running the indexer - that changes the default when creating a new database.

comment:9 by Alex Coventry, 8 years ago

Yep, that seems to fix it. Thanks!

comment:10 by Olly Betts, 8 years ago

Resolution: fixed
Status: assignedclosed

Thanks for confirming.

Note: See TracTickets for help on using tickets.