Opened 5 hours ago
#838 new defect
Invalid write in EstimateOp::report_ratio
Reported by: | Robert Stepanek | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Matcher | Version: | |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
I am debugging a memory violation with the latest 1.5 master when running Xapian::Enquire::get_mset for a specific query and database. I managed to reduce the query and database to the minimal set required to reproduce the memory violation, but now I'm stuck on how to debug it further and fix it.
The query looks like this (I replaced the actual terms with placeholders):
Query((term1@1 AND ((term2@1 PHRASE 2 term2@2) OR (term2@1 PHRASE 2 term2@2)) AND (<alldocuments> AND_NOT (<alldocuments> FILTER XEP))))
I cannot share the database, but it's a glass database with the following characteristics:

    $ xapian-delve-1.5 crasherdb
    UUID = fcdb211b-3bad-4583-8863-a99ef02a40fe
    number of documents = 2
    average document length = 167.5
    document length lower bound = 102
    document length upper bound = 233
    highest document id ever used = 3
    has positional information = true
    revision = 1
    currently open for writing = false
For this query and this particular database, valgrind reports an invalid write (the full valgrind log is attached):

    ==29738== Invalid write of size 4
    ==29738==    at 0x4A417CC: report_ratio (estimateop.h:124)
    ==29738==    by 0x4A417CC: SelectPostList::~SelectPostList() (selectpostlist.cc:61)
    ==29738==    by 0x4A2ED93: ExactPhrasePostList::~ExactPhrasePostList() (exactphrasepostlist.cc:68)
    ==29738==    by 0x4A4022F: ~OrPostList (orpostlist.h:71)
    ==29738==    by 0x4A4022F: OrPostList::~OrPostList() (orpostlist.h:73)
    ==29738==    by 0x4A2B67F: next_helper (andpostlist.h:75)
    ==29738==    by 0x4A2B67F: AndPostList::next(double) (andpostlist.cc:124)
    ==29738==    by 0x4A35DB3: PostListTree::next(double) (postlisttree.h:151)
    ==29738==    by 0x4A32DC3: Matcher::get_local_mset(unsigned int, unsigned int, unsigned int, Xapian::Weight const&, Xapian::MatchDecider const*, Xapian::KeyMaker const*, unsigned int, unsigned int, int, double, double, Xapian::Enquire::docid_order, unsigned int, Xapian::Enquire::Internal::sort_setting, bool, double, std::vector<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy>, std::allocator<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy> > > const&) (matcher.cc:499)
The write is invalid because the memory being written to had already been freed:

    ==29738== Address 0x52a248c is 12 bytes inside a block of size 40 free'd
    ==29738==    at 0x4888360: operator delete(void*, unsigned long) (vg_replace_malloc.c:935)
    ==29738==    by 0x493046B: pop_op (localsubmatch.h:112)
    ==29738==    by 0x493046B: destroy_postlist (queryoptimiser.h:176)
    ==29738==    by 0x493046B: shrink (queryinternal.cc:190)
    ==29738==    by 0x493046B: Xapian::Internal::Context::~Context() (queryinternal.cc:153)
    ==29738==    by 0x492B637: ~OrContext (queryinternal.cc:370)
    ==29738==    by 0x492B637: operator() (unique_ptr.h:95)
    ==29738==    by 0x492B637: operator() (unique_ptr.h:89)
    ==29738==    by 0x492B637: reset (unique_ptr.h:203)
    ==29738==    by 0x492B637: reset (unique_ptr.h:501)
    ==29738==    by 0x492B637: Xapian::Internal::AndContext::postlist(Xapian::Internal::TermFreqs*) (queryinternal.cc:833)
    ==29738==    by 0x492B8AF: Xapian::Internal::QueryAndLike::postlist(Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:2502)
    ==29738==    by 0x4A30A1B: LocalSubMatch::get_postlist(PostListTree*, unsigned int*) (localsubmatch.cc:188)
    ==29738==    by 0x4A32A8B: Matcher::get_local_mset(unsigned int, unsigned int, unsigned int, Xapian::Weight const&, Xapian::MatchDecider const*, Xapian::KeyMaker const*, unsigned int, unsigned int, int, double, double, Xapian::Enquire::docid_order, unsigned int, Xapian::Enquire::Internal::sort_setting, bool, double, std::vector<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy>, std::allocator<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy> > > const&) (matcher.cc:381)
That block had originally been allocated at:

    ==29738== Block was alloc'd at
    ==29738==    at 0x4885828: operator new(unsigned long) (vg_replace_malloc.c:422)
    ==29738==    by 0x492B4B7: add_op<EstimateOp::op_type> (localsubmatch.h:101)
    ==29738==    by 0x492B4B7: add_op<EstimateOp::op_type> (queryoptimiser.h:84)
    ==29738==    by 0x492B4B7: postlist (queryinternal.cc:629)
    ==29738==    by 0x492B4B7: Xapian::Internal::AndContext::postlist(Xapian::Internal::TermFreqs*) (queryinternal.cc:842)
    ==29738==    by 0x492B8AF: Xapian::Internal::QueryAndLike::postlist(Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:2502)
    ==29738==    by 0x492873F: Xapian::Query::Internal::postlist_sub_or_like(Xapian::Internal::OrContext&, Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*, bool) const (queryinternal.cc:1174)
    ==29738==    by 0x492C307: Xapian::Internal::QueryBranch::do_or_like(Xapian::Internal::OrContext&, Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*, unsigned int, unsigned long, bool) const (queryinternal.cc:2256)
    ==29738==    by 0x492C9AF: Xapian::Internal::QueryOr::postlist(Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:2619)
    ==29738==    by 0x492A977: Xapian::Query::Internal::postlist_sub_and_like(Xapian::Internal::AndContext&, Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:1163)
    ==29738==    by 0x492524F: Xapian::Internal::QueryAndLike::postlist_sub_and_like(Xapian::Internal::AndContext&, Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:2515)
    ==29738==    by 0x492B89B: Xapian::Internal::QueryAndLike::postlist(Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:2499)
    ==29738==    by 0x4A30A1B: LocalSubMatch::get_postlist(PostListTree*, unsigned int*) (localsubmatch.cc:188)
    ==29738==    by 0x4A32A8B: Matcher::get_local_mset(unsigned int, unsigned int, unsigned int, Xapian::Weight const&, Xapian::MatchDecider const*, Xapian::KeyMaker const*, unsigned int, unsigned int, int, double, double, Xapian::Enquire::docid_order, unsigned int, Xapian::Enquire::Internal::sort_setting, bool, double, std::vector<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy>, std::allocator<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy> > > const&) (matcher.cc:381)
The valgrind log suggests to me that the various rewrites of the EstimateOp stack do not propagate to all the places where pointers to those EstimateOps are held. A shared_ptr might prevent the invalid write, but I fear that would just hide the logical bug that's occurring here. I might try that anyway, to at least avoid the crash until I can fix the actual root cause.
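To make the suspected pattern concrete, here is a minimal standalone sketch. It is not Xapian's code: `Op`, `OpStack`, `Reporter`, `Outcome` and `report_after_pop` are hypothetical stand-ins for EstimateOp, the stack behind add_op/pop_op, and a postlist such as SelectPostList that calls report_ratio from its destructor. It assumes the ops are held via std::shared_ptr, which is the mitigation considered above, and shows that the late write then lands in live memory even though the op has already left the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for EstimateOp: stores the reported counts.
struct Op {
    unsigned accepted = 0;
    unsigned rejected = 0;
    void report_ratio(unsigned a, unsigned r) { accepted = a; rejected = r; }
};

// Hypothetical stand-in for the op stack. Ops are shared, so popping one
// only frees it if nothing else still holds a reference.
class OpStack {
    std::vector<std::shared_ptr<Op>> ops;
  public:
    std::shared_ptr<Op> add_op() {
        ops.push_back(std::make_shared<Op>());
        return ops.back();
    }
    void pop_op() { ops.pop_back(); }
    std::size_t size() const { return ops.size(); }
};

// Hypothetical stand-in for a postlist that reports to its op on destruction.
class Reporter {
    std::shared_ptr<Op> op;
  public:
    explicit Reporter(std::shared_ptr<Op> op_) : op(std::move(op_)) {}
    ~Reporter() { op->report_ratio(1, 2); }  // safe: this object co-owns the op
};

struct Outcome {
    bool stack_empty_after_pop = false;
    bool alive_after_pop = false;
    bool freed_after_reporter = false;
};

// Pops the op while a Reporter still refers to it, then destroys the Reporter.
Outcome report_after_pop() {
    OpStack stack;
    Outcome out;
    std::shared_ptr<Op> op = stack.add_op();
    std::weak_ptr<Op> watch = op;  // observes lifetime without extending it
    {
        Reporter reporter(std::move(op));
        stack.pop_op();  // the stack drops its reference first
        out.stack_empty_after_pop = (stack.size() == 0);
        out.alive_after_pop = !watch.expired();
    }  // ~Reporter writes to the op here, then releases the last reference
    out.freed_after_reporter = watch.expired();
    return out;
}
```

With a raw pointer in `Reporter` and sole ownership in `OpStack`, the same sequence would write to freed memory, as in the trace above. The sketch also shows the limitation: the report still goes to an op that is no longer on the stack, so the value is silently lost rather than the ordering being fixed.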
Since I can't share the database as-is, is there some way I can rewrite the terms in that database to obfuscate its contents while still reproducing the crash, so that I can share it? I tried trimming the database down to the minimal set using the WritableDatabase API, but it looks as if I can't modify the postlists or term lists directly that way.
Valgrind log