Opened 3 weeks ago

Closed 3 weeks ago

#852 closed defect (fixed)

Segfault in enquire.get_mset(0,0,x) if collapse key is set.

Reported by: Antonio Rojas Owned by: Olly Betts
Priority: normal Milestone: 2.0.1
Component: Matcher Version: 2.0.0
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: Linux

Description

The following test code segfaults in 2.0, it didn't in 1.4:

#include <xapian.h>

int main() {
    char dir[] = "/tmp/xapian_test_XXXXXX";
    mkdtemp(dir);
    std::string path(dir);

    Xapian::WritableDatabase db(path, Xapian::DB_CREATE_OR_OPEN);
    for (int i = 1; i <= 2; i++) {
        Xapian::Document doc;
        doc.set_data("foo");
        doc.add_term("hello");
        doc.add_value(0, "0");
        db.add_document(doc);
    }

    Xapian::Enquire enquire(db);
    enquire.set_query(Xapian::Query("hello"));
    enquire.set_collapse_key(0);
    Xapian::MSet mset = enquire.get_mset(0, 0, 1);

    return 0;
}

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
CollapseData::check_item (this=<optimized out>, results=std::vector of length 0, capacity 0, result=..., collapse_max=<optimized out>, 
    mcmp=0x7ffff7d574a0 <msetcmp_by_relevance<true>(Result const&, Result const&)>, old_item=<optimized out>) at ./api/result.h:68
⚠ warning: Source file is more recent than executable.
68          Xapian::docid get_docid() const { return did; }
(gdb) bt
#0  CollapseData::check_item (this=<optimized out>, results=std::vector of length 0, capacity 0, result=..., collapse_max=<optimized out>, 
    mcmp=0x7ffff7d574a0 <msetcmp_by_relevance<true>(Result const&, Result const&)>, old_item=<optimized out>) at ./api/result.h:68
#1  Collapser::check (this=<optimized out>, result=..., vsdoc=...) at matcher/collapser.cc:153
#2  0x00007ffff7d55bd1 in ProtoMSet::process (this=0x7fffffffdd60, new_item=..., vsdoc=...) at matcher/protomset.h:309
#3  Matcher::get_local_mset (this=<optimized out>, first=<optimized out>, maxitems=<optimized out>, check_at_least=<optimized out>, wtscheme=..., 
    mdecider=<optimized out>, sorter=<optimized out>, collapse_key=<optimized out>, collapse_max=<optimized out>, percent_threshold=<optimized out>, 
    percent_threshold_factor=<optimized out>, weight_threshold=<optimized out>, order=<optimized out>, sort_key=<optimized out>, 
    sort_by=<optimized out>, sort_val_reverse=<optimized out>, time_limit=<optimized out>, matchspies=...) at matcher/matcher.cc:563
#4  Matcher::get_mset (this=<optimized out>, first=<optimized out>, maxitems=<optimized out>, check_at_least=<optimized out>, stats=..., wtscheme=..., 
    mdecider=<optimized out>, sorter=<optimized out>, collapse_key=<optimized out>, collapse_max=<optimized out>, percent_threshold=<optimized out>, 
    weight_threshold=<optimized out>, order=<optimized out>, sort_key=<optimized out>, sort_by=<optimized out>, sort_val_reverse=<optimized out>, 
    time_limit=<optimized out>, matchspies=...) at matcher/matcher.cc:660
#5  0x00007ffff7c59642 in Xapian::Enquire::Internal::get_mset (this=0x555555586d10, first=0, maxitems=<optimized out>, checkatleast=<optimized out>, 
    rset=<optimized out>, mdecider=0x0) at api/enquire.cc:328
#6  0x00007ffff7c599b9 in Xapian::Enquire::get_mset (this=<optimized out>, first=<optimized out>, maxitems=<optimized out>, 
    checkatleast=<optimized out>, rset=<optimized out>, mdecider=0x0) at ./include/xapian/intrusive_ptr.h:301
#7  0x00005555555564d0 in main () at example.cpp:20

Change History (2)

comment:1 by Olly Betts, 3 weeks ago

Component: OtherMatcher
Milestone: 2.0.1
Status: newassigned

Thanks, your reproducer triggers the problem for me too.

The cause is that we literally keep track of zero results when maxitems is 0, but the collapsing code tries to look at the lowest ranked result we've found so far. We could special case this situation, or internally force maxitems to be at least 1 if check_at_least is non-zero. I'm leaning towards the latter as it shouldn't add overhead (but I should profile to check that's really the case) but potentially protects us from other situations similar to this. It may also give slightly better bounds and estimate of the number of matches.

A simple work-around in user code would be to emulate the second approach to fixing and use enquire.get_mset(0, 1, 1) instead when collapsing (maxitems should only need to be 1, even if checkatleast is more, so e.g. enquire.get_mset(0, 1, 10)).

(P.S. Congratulations on the first 2.0.0 bug report!)

comment:2 by Olly Betts, 3 weeks ago

Resolution: fixed
Status: assignedclosed

Fixed by 832e642db73b8e91d2c47088cf4f149e7e18e81f. Doesn't affect 1.4.x so nothing to backport.

Note: See TracTickets for help on using tickets.