Opened 3 weeks ago
Closed 3 weeks ago
#852 closed defect (fixed)
Segfault in enquire.get_mset(0,0,x) if collapse key is set.
| Reported by: | Antonio Rojas | Owned by: | Olly Betts |
|---|---|---|---|
| Priority: | normal | Milestone: | 2.0.1 |
| Component: | Matcher | Version: | 2.0.0 |
| Severity: | normal | Keywords: | |
| Cc: | Blocked By: | ||
| Blocking: | Operating System: | Linux |
Description
The following test code segfaults in 2.0, it didn't in 1.4:
#include <xapian.h>
int main() {
char dir[] = "/tmp/xapian_test_XXXXXX";
mkdtemp(dir);
std::string path(dir);
Xapian::WritableDatabase db(path, Xapian::DB_CREATE_OR_OPEN);
for (int i = 1; i <= 2; i++) {
Xapian::Document doc;
doc.set_data("foo");
doc.add_term("hello");
doc.add_value(0, "0");
db.add_document(doc);
}
Xapian::Enquire enquire(db);
enquire.set_query(Xapian::Query("hello"));
enquire.set_collapse_key(0);
Xapian::MSet mset = enquire.get_mset(0, 0, 1);
return 0;
}
Backtrace:
Program received signal SIGSEGV, Segmentation fault.
CollapseData::check_item (this=<optimized out>, results=std::vector of length 0, capacity 0, result=..., collapse_max=<optimized out>,
mcmp=0x7ffff7d574a0 <msetcmp_by_relevance<true>(Result const&, Result const&)>, old_item=<optimized out>) at ./api/result.h:68
⚠ warning: Source file is more recent than executable.
68 Xapian::docid get_docid() const { return did; }
(gdb) bt
#0 CollapseData::check_item (this=<optimized out>, results=std::vector of length 0, capacity 0, result=..., collapse_max=<optimized out>,
mcmp=0x7ffff7d574a0 <msetcmp_by_relevance<true>(Result const&, Result const&)>, old_item=<optimized out>) at ./api/result.h:68
#1 Collapser::check (this=<optimized out>, result=..., vsdoc=...) at matcher/collapser.cc:153
#2 0x00007ffff7d55bd1 in ProtoMSet::process (this=0x7fffffffdd60, new_item=..., vsdoc=...) at matcher/protomset.h:309
#3 Matcher::get_local_mset (this=<optimized out>, first=<optimized out>, maxitems=<optimized out>, check_at_least=<optimized out>, wtscheme=...,
mdecider=<optimized out>, sorter=<optimized out>, collapse_key=<optimized out>, collapse_max=<optimized out>, percent_threshold=<optimized out>,
percent_threshold_factor=<optimized out>, weight_threshold=<optimized out>, order=<optimized out>, sort_key=<optimized out>,
sort_by=<optimized out>, sort_val_reverse=<optimized out>, time_limit=<optimized out>, matchspies=...) at matcher/matcher.cc:563
#4 Matcher::get_mset (this=<optimized out>, first=<optimized out>, maxitems=<optimized out>, check_at_least=<optimized out>, stats=..., wtscheme=...,
mdecider=<optimized out>, sorter=<optimized out>, collapse_key=<optimized out>, collapse_max=<optimized out>, percent_threshold=<optimized out>,
weight_threshold=<optimized out>, order=<optimized out>, sort_key=<optimized out>, sort_by=<optimized out>, sort_val_reverse=<optimized out>,
time_limit=<optimized out>, matchspies=...) at matcher/matcher.cc:660
#5 0x00007ffff7c59642 in Xapian::Enquire::Internal::get_mset (this=0x555555586d10, first=0, maxitems=<optimized out>, checkatleast=<optimized out>,
rset=<optimized out>, mdecider=0x0) at api/enquire.cc:328
#6 0x00007ffff7c599b9 in Xapian::Enquire::get_mset (this=<optimized out>, first=<optimized out>, maxitems=<optimized out>,
checkatleast=<optimized out>, rset=<optimized out>, mdecider=0x0) at ./include/xapian/intrusive_ptr.h:301
#7 0x00005555555564d0 in main () at example.cpp:20
Change History (2)
comment:1 by , 3 weeks ago
| Component: | Other → Matcher |
|---|---|
| Milestone: | → 2.0.1 |
| Status: | new → assigned |
comment:2 by , 3 weeks ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
Fixed by 832e642db73b8e91d2c47088cf4f149e7e18e81f. Doesn't affect 1.4.x so nothing to backport.
Note:
See TracTickets
for help on using tickets.

Thanks, your reproducer triggers the problem for me too.
The cause is that we literally keep track of zero results when
maxitemsis 0, but the collapsing code tries to look at the lowest ranked result we've found so far. We could special case this situation, or internally forcemaxitemsto be at least 1 ifcheck_at_leastis non-zero. I'm leaning towards the latter as it shouldn't add overhead (but I should profile to check that's really the case) but potentially protects us from other situations similar to this. It may also give slightly better bounds and estimate of the number of matches.A simple work-around in user code would be to emulate the second approach to fixing and use
enquire.get_mset(0, 1, 1)instead when collapsing (maxitemsshould only need to be 1, even ifcheckatleastis more, so e.g.enquire.get_mset(0, 1, 10)).(P.S. Congratulations on the first 2.0.0 bug report!)