Opened 5 hours ago

#838 new defect

Invalid write in EstimateOp::report_ratio

Reported by: Robert Stepanek
Owned by: Olly Betts
Priority: normal
Milestone:
Component: Matcher
Version:
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

I am debugging a memory violation with the latest 1.5 master when running Xapian::Enquire::get_mset for a specific query and database. I managed to reduce the query and database to the minimal set required to reproduce the memory violation, but now I'm stuck on how to debug and fix this further.

The query looks like this (I replaced the actual terms with placeholders):

Query((term1@1 AND ((term2@1 PHRASE 2 term2@2) OR (term2@1 PHRASE 2 term2@2)) AND (<alldocuments> AND_NOT (<alldocuments> FILTER XEP))))

I cannot share the database, but it's a glass database with the following characteristics:

$ xapian-delve-1.5 crasherdb
UUID = fcdb211b-3bad-4583-8863-a99ef02a40fe
number of documents = 2
average document length = 167.5
document length lower bound = 102
document length upper bound = 233
highest document id ever used = 3
has positional information = true
revision = 1
currently open for writing = false

For this query and this particular database, valgrind reports an invalid write (the full valgrind log is attached):

==29738== Invalid write of size 4
==29738==    at 0x4A417CC: report_ratio (estimateop.h:124)
==29738==    by 0x4A417CC: SelectPostList::~SelectPostList() (selectpostlist.cc:61)
==29738==    by 0x4A2ED93: ExactPhrasePostList::~ExactPhrasePostList() (exactphrasepostlist.cc:68)
==29738==    by 0x4A4022F: ~OrPostList (orpostlist.h:71)
==29738==    by 0x4A4022F: OrPostList::~OrPostList() (orpostlist.h:73)
==29738==    by 0x4A2B67F: next_helper (andpostlist.h:75)
==29738==    by 0x4A2B67F: AndPostList::next(double) (andpostlist.cc:124)
==29738==    by 0x4A35DB3: PostListTree::next(double) (postlisttree.h:151)
==29738==    by 0x4A32DC3: Matcher::get_local_mset(unsigned int, unsigned int, unsigned int, Xapian::Weight const&, Xapian::MatchDecider const*, Xapian::KeyMaker const*, unsigned int, unsigned int, int, double, double, Xapian::Enquire::docid_order, unsigned int, Xapian::Enquire::Internal::sort_setting, bool, double, std::vector<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy>, std::allocator<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy> > > const&) (matcher.cc:499)

This happens because the memory being written to had already been freed:

==29738==  Address 0x52a248c is 12 bytes inside a block of size 40 free'd
==29738==    at 0x4888360: operator delete(void*, unsigned long) (vg_replace_malloc.c:935)
==29738==    by 0x493046B: pop_op (localsubmatch.h:112)
==29738==    by 0x493046B: destroy_postlist (queryoptimiser.h:176)
==29738==    by 0x493046B: shrink (queryinternal.cc:190)
==29738==    by 0x493046B: Xapian::Internal::Context::~Context() (queryinternal.cc:153)
==29738==    by 0x492B637: ~OrContext (queryinternal.cc:370)
==29738==    by 0x492B637: operator() (unique_ptr.h:95)
==29738==    by 0x492B637: operator() (unique_ptr.h:89)
==29738==    by 0x492B637: reset (unique_ptr.h:203)
==29738==    by 0x492B637: reset (unique_ptr.h:501)
==29738==    by 0x492B637: Xapian::Internal::AndContext::postlist(Xapian::Internal::TermFreqs*) (queryinternal.cc:833)
==29738==    by 0x492B8AF: Xapian::Internal::QueryAndLike::postlist(Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:2502)
==29738==    by 0x4A30A1B: LocalSubMatch::get_postlist(PostListTree*, unsigned int*) (localsubmatch.cc:188)
==29738==    by 0x4A32A8B: Matcher::get_local_mset(unsigned int, unsigned int, unsigned int, Xapian::Weight const&, Xapian::MatchDecider const*, Xapian::KeyMaker const*, unsigned int, unsigned int, int, double, double, Xapian::Enquire::docid_order, unsigned int, Xapian::Enquire::Internal::sort_setting, bool, double, std::vector<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy>, std::allocator<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy> > > const&) (matcher.cc:381)

That block had originally been allocated at:

==29738==  Block was alloc'd at
==29738==    at 0x4885828: operator new(unsigned long) (vg_replace_malloc.c:422)
==29738==    by 0x492B4B7: add_op<EstimateOp::op_type> (localsubmatch.h:101)
==29738==    by 0x492B4B7: add_op<EstimateOp::op_type> (queryoptimiser.h:84)
==29738==    by 0x492B4B7: postlist (queryinternal.cc:629)
==29738==    by 0x492B4B7: Xapian::Internal::AndContext::postlist(Xapian::Internal::TermFreqs*) (queryinternal.cc:842)
==29738==    by 0x492B8AF: Xapian::Internal::QueryAndLike::postlist(Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:2502)
==29738==    by 0x492873F: Xapian::Query::Internal::postlist_sub_or_like(Xapian::Internal::OrContext&, Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*, bool) const (queryinternal.cc:1174)
==29738==    by 0x492C307: Xapian::Internal::QueryBranch::do_or_like(Xapian::Internal::OrContext&, Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*, unsigned int, unsigned long, bool) const (queryinternal.cc:2256)
==29738==    by 0x492C9AF: Xapian::Internal::QueryOr::postlist(Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:2619)
==29738==    by 0x492A977: Xapian::Query::Internal::postlist_sub_and_like(Xapian::Internal::AndContext&, Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:1163)
==29738==    by 0x492524F: Xapian::Internal::QueryAndLike::postlist_sub_and_like(Xapian::Internal::AndContext&, Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:2515)
==29738==    by 0x492B89B: Xapian::Internal::QueryAndLike::postlist(Xapian::Internal::QueryOptimiser*, double, Xapian::Internal::TermFreqs*) const (queryinternal.cc:2499)
==29738==    by 0x4A30A1B: LocalSubMatch::get_postlist(PostListTree*, unsigned int*) (localsubmatch.cc:188)
==29738==    by 0x4A32A8B: Matcher::get_local_mset(unsigned int, unsigned int, unsigned int, Xapian::Weight const&, Xapian::MatchDecider const*, Xapian::KeyMaker const*, unsigned int, unsigned int, int, double, double, Xapian::Enquire::docid_order, unsigned int, Xapian::Enquire::Internal::sort_setting, bool, double, std::vector<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy>, std::allocator<Xapian::Internal::opt_intrusive_ptr<Xapian::MatchSpy> > > const&) (matcher.cc:381)
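One way to narrow this down further, without valgrind in the loop, is to quarantine popped ops during debugging: instead of deleting them, keep them alive but marked as dead, so the late write in `report_ratio` becomes a detectable event with its own backtrace. The sketch below is a self-contained, hypothetical model of that idea; `EstimateOp`, `OpStack`, `pop_op` and `report_ratio` are only named after the frames in the trace (estimateop.h:124, localsubmatch.h:112) and do not reflect the real Xapian classes or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the node that SelectPostList reports into.
struct EstimateOp {
    double ratio = -1.0;  // last value written by report_ratio()
    bool popped = false;  // set once the stack has popped this op
};

// Hypothetical op stack that quarantines popped ops instead of deleting them,
// so a late access hits live memory that is explicitly marked as dead.
class OpStack {
  public:
    EstimateOp* add_op() {
        live_.push_back(std::make_unique<EstimateOp>());
        return live_.back().get();
    }

    void pop_op() {
        assert(!live_.empty());
        live_.back()->popped = true;
        quarantine_.push_back(std::move(live_.back()));
        live_.pop_back();
    }

    std::size_t size() const { return live_.size(); }

  private:
    std::vector<std::unique_ptr<EstimateOp>> live_;
    std::vector<std::unique_ptr<EstimateOp>> quarantine_;  // freed with the stack
};

// Writes the ratio only if the op is still on the stack. A popped op means the
// caller kept a pointer past pop_op(); in a debug build you might call abort()
// here instead, to get a backtrace at the offending call.
bool report_ratio(EstimateOp* op, double ratio) {
    if (op->popped) {
        std::fprintf(stderr, "report_ratio() on popped EstimateOp %p\n",
                     static_cast<void*>(op));
        return false;
    }
    op->ratio = ratio;
    return true;
}
```

In the trace above, the pop happens while the postlist tree is being built (matcher.cc:381) and the write happens later during iteration (matcher.cc:499), so the interesting question is which caller still holds the pointer at that point.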

The valgrind log suggests to me that the various rewrites of the EstimateOp stack do not propagate to all the places where pointers to those EstimateOps are held. A shared_ptr might mitigate the invalid write, but I fear that would just hide the logical bug occurring here. I might still try it, to at least avoid the crash until I can fix the actual root cause.
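The concern about shared_ptr can be illustrated with a small hypothetical model (again not Xapian's code): if the postlist shares ownership of its op, the late write lands in memory that is still valid, so valgrind goes quiet, but the op has already been removed from the stack, so nothing that reads the stack ever sees the reported ratio. `SelectPostList`, `Outcome` and `simulate_late_report` here are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical estimate node; ratio stays -1.0 until something reports into it.
struct EstimateOp {
    double ratio = -1.0;
};

// Hypothetical postlist that reports from its destructor, like the
// report_ratio() call in ~SelectPostList in the trace.
class SelectPostList {
  public:
    SelectPostList(std::shared_ptr<EstimateOp> op, double ratio)
        : op_(std::move(op)), ratio_(ratio) {}
    ~SelectPostList() { op_->ratio = ratio_; }

  private:
    std::shared_ptr<EstimateOp> op_;  // shared ownership keeps the op alive
    double ratio_;
};

struct Outcome {
    bool op_alive_after_pop;   // did the op outlive its removal from the stack?
    double orphan_ratio;       // value written into the popped op
    std::size_t ops_on_stack;  // ops left for the estimate code to read
};

// Pops the op while a postlist still refers to it (the order seen in the
// trace), then destroys the postlist.
Outcome simulate_late_report(double ratio) {
    std::vector<std::shared_ptr<EstimateOp>> stack;
    stack.push_back(std::make_shared<EstimateOp>());
    std::weak_ptr<EstimateOp> watch = stack.back();
    std::shared_ptr<EstimateOp> observer;

    Outcome out{};
    {
        SelectPostList pl(stack.back(), ratio);
        stack.pop_back();                      // stack drops its reference first
        out.op_alive_after_pop = !watch.expired();
        observer = watch.lock();               // keep the op readable afterwards
    }                                          // ~SelectPostList writes here
    out.orphan_ratio = observer->ratio;
    out.ops_on_stack = stack.size();
    return out;
}
```

The write becomes harmless, but the ratio ends up in an op that nothing reads any more, which is why this would mask rather than fix whatever pops the op while a postlist still refers to it.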

Since I can't share the database as-is, is there some way to rewrite the terms in that database to obfuscate its contents while still reproducing the crash, so that I can share it? I tried trimming the database down to the minimal set using the WritableDatabase API, but it looks as if I can't modify the postlists or term lists that way?
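One possible approach (an assumption, not an existing Xapian tool) is to copy each document into a fresh database, renaming every term through a consistent, injective mapping while keeping each term's wdf and positions and reusing the original document IDs via `WritableDatabase::replace_document(did, doc)`. Reading could use `Database::termlist_begin(did)` with `TermIterator::get_wdf()` and `positionlist_begin()`, and writing `Document::add_posting()` / `Document::add_term()`. The renaming step might look like the sketch below; `TermObfuscator`, and the heuristic of keeping a leading run of ASCII uppercase letters as the term prefix (so a term like `XEP` survives unchanged), are hypothetical choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: builds an injective term renaming that keeps any
// leading ASCII-uppercase prefix and replaces the rest of the term with a
// zero-padded counter, allocated separately per prefix.
class TermObfuscator {
  public:
    // `sorted_terms` should be sorted (e.g. in all-terms iteration order), so
    // counters increase in the original order within each prefix.
    explicit TermObfuscator(const std::vector<std::string>& sorted_terms) {
        for (const std::string& term : sorted_terms) {
            if (mapping_.count(term)) continue;  // ignore duplicates
            std::size_t n = 0;
            while (n < term.size() && term[n] >= 'A' && term[n] <= 'Z') ++n;
            if (n == term.size()) {
                // Prefix-only term (e.g. a boolean filter term): keep as-is.
                mapping_[term] = term;
                continue;
            }
            std::string prefix = term.substr(0, n);
            char buf[16];
            std::snprintf(buf, sizeof(buf), "t%08u", next_id_[prefix]++);
            mapping_[term] = prefix + buf;
        }
    }

    // Returns the obfuscated form of `term`, or an empty string if unknown.
    std::string map(const std::string& term) const {
        auto it = mapping_.find(term);
        return it == mapping_.end() ? std::string() : it->second;
    }

  private:
    std::map<std::string, std::string> mapping_;
    std::map<std::string, unsigned> next_id_;
};
```

The same `map()` would also have to be applied to the terms in the query so that it still matches the renamed database. Zero-padding keeps renamed terms in their original relative order within a prefix, which may or may not matter for reproducing the crash.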

Attachments (1)

valgrind.txt (29.3 KB) - added by Robert Stepanek 5 hours ago.
Valgrind log

Change History (1)

by Robert Stepanek, 5 hours ago

Attachment: valgrind.txt added

Valgrind log