Opened 3 months ago
Last modified 8 weeks ago
#833 new defect
Xapian 1.4.25 causes the program to crash when performing concurrent multi-threaded queries.
| Reported by: | myx | Owned by: | Olly Betts |
|---|---|---|---|
| Priority: | highest | Milestone: | |
| Component: | Backend-Glass | Version: | 1.4.25 |
| Severity: | blocker | Keywords: | |
| Cc: | | Blocked By: | |
| Blocking: | | Operating System: | All |
Description
Hello!
I compiled the Xapian 1.4.25 Windows version library using msys2 and MSVC 2017, and I wrote a test program to test it. During testing, I found that when using multithreaded queries in Xapian (with read-write locks), the program crashes easily.
In the test code, I used the TestRead class for multithreaded querying. When I pre-construct the Xapian::Query object based on the query conditions and pass it to the member variable during the construction of TestRead, the program crashes when it is used in the worker thread. However, if I create a temporary Xapian::Query object in the worker thread based on the same query conditions, the program runs normally (these two testing methods can be toggled using the macro USE_GLOBAL_XAPIAN_QUERY_OBJECT). I suspect there might be an issue with intrusive_ptr regarding reference counting management.
This same test did not reveal any such exceptions in the Xapian 1.2.25 version.
Initially, I doubted whether there was a problem with the compiled Windows Xapian library, so I tested it again on a Linux (Suse 12.5) system, where the 1.2.25 version worked fine, but the 1.4.25 version still crashed, and the crash behavior was consistent.
You can find the specific code in the attached files, which include a test project for MSVC 2017, the Xapian database files, the compiled library files, and some screenshots of the stack trace at the time of the crash.
Please take some time out of your busy schedule to look into what might be causing this issue and if there are any solutions. Thank you very much!
If you have any questions, please feel free to contact me. I look forward to your reply. Thank you once again!
Attachments (3)
Change History (7)
Attachment testxapian.zip added, 3 months ago
Attachment crash.docx added, 3 months ago
comment:1, 3 months ago
There is a size limit for attachments. May I send it via email? (I will need an email address.)
comment:2, 3 months ago
The internal implementation of Xapian::Query changed a lot between 1.2.x and 1.4.x, and notably it doesn't use reference counted internals in 1.2.x but does in 1.4.x.
(We also switched from a home-grown reference counted pointer to one based on Boost's intrusive_ptr, but I'd expect Boost's implementation to be less buggy than our old one as it will have seen a lot more use.)
I tried looking at your code, but it's too complex for me to easily follow what's going on.
Are you sure you are locking between threads sufficiently? Not doing so would explain problems. For example, you can't have two different threads modifying the reference count on a Xapian object concurrently. In particular, care is needed if you want to pass ownership of an object between threads.
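To make the locking requirement concrete, here is a minimal sketch using only the standard library. `Handle` and `Rep` are hypothetical stand-ins for an object whose copies share a non-atomic reference count; they are not Xapian's actual classes, and `run_guarded_workers` is an illustrative helper. The point is that both the copy (increment) and the destruction (decrement) must happen while the same mutex is held.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a shared internal representation with a
// NON-atomic reference count (an assumption, not Xapian's real code).
struct Rep {
    int refs;
    std::string text;
    explicit Rep(std::string t) : refs(0), text(std::move(t)) {}
};

// Hypothetical handle: copying bumps the shared count, destruction drops it.
class Handle {
    Rep* rep_;
public:
    explicit Handle(std::string text) : rep_(new Rep(std::move(text))) { ++rep_->refs; }
    Handle(const Handle& other) : rep_(other.rep_) { ++rep_->refs; }
    Handle& operator=(const Handle&) = delete;
    ~Handle() { if (--rep_->refs == 0) delete rep_; }
    int use_count() const { return rep_->refs; }
    const std::string& text() const { return rep_->text; }
};

// One mutex guards every reference count change on the shared handle.
std::mutex g_handle_mutex;

// Copies `shared` repeatedly from several threads and returns the reference
// count observed after all threads have been joined.
int run_guarded_workers(const Handle& shared, int n_threads, int iterations) {
    assert(n_threads > 0 && iterations > 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&shared, iterations] {
            for (int i = 0; i < iterations; ++i) {
                // `local` is declared after `lock`, so it is destroyed first:
                // the decrement also happens while the mutex is held.
                std::lock_guard<std::mutex> lock(g_handle_mutex);
                Handle local(shared);
                (void)local.text().size();  // use the copy while protected
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared.use_count();
}
```

Calling `run_guarded_workers(shared, 4, 1000)` on a freshly constructed `Handle` should leave only the original reference once the threads have joined. Holding the lock only around part of the copy's lifetime would not be enough, because the destructor's decrement would then run unprotected.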
Otherwise I'd suggest reducing it to a minimal testcase that shows the problem. That may make it obvious what's going on, and if not it'll be more feasible for me to investigate.
Not sure what the file you couldn't attach is, but if it's too large to attach it's unlikely to be useful to me - I really need a small, self-contained reproducer to look at.
comment:3, 3 months ago
Thank you very much for your reply!
The crash problem I described occurs in situations with multithreaded reads and does not require concurrent read/write scenarios. Additionally, I have added locks during multithreaded reads.
Following your suggestion, I have adjusted the test code structure and simplified it to make the problem clearer.
In the test code, when the macro USE_GLOBAL_XAPIAN_QUERY_OBJECT is defined, the problem can be reproduced. You can take a look at the relevant code, and if you want to debug, you can compile and run the test program to try to reproduce the problem.
Please let me know if there is any new progress, thanks!
Attachment testxapian_minimal.zip added, 3 months ago
comment:4, 8 weeks ago
Sorry for taking a while to look into this.
It looks to me like the problem is in TestRead::Run():

    Xapian::Query query;
    if (mQueryObjConstructed) {
        query = mQuery;
    } else {
        ConstructQuery(mQueryCond, query);
    }

When USE_GLOBAL_XAPIAN_QUERY_OBJECT is defined, we construct using new TestRead(queryCond, query) so mQueryObjConstructed is set and we take the first branch.
The assignment query = mQuery; increments the reference count of this Xapian::Query object's internal representation (it will also get decremented again when this method returns and query goes out of scope). mQuery for each TestRead shares the same internal representation so these reference count changes are happening on the same object across every thread and there's no locking to prevent them happening concurrently.
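The reporter's observation that building the query inside the worker thread runs normally fits this explanation: each thread then owns a separate internal representation, so no reference count is shared. A minimal standard-library sketch of that pattern follows. `ToyQuery` is a hypothetical stand-in with a non-atomic count, not Xapian::Query itself, and `run_private_queries` is an illustrative helper.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a query whose copies share one internal
// representation with a NON-atomic reference count (an assumption made for
// illustration, not Xapian's real implementation).
class ToyQuery {
    struct Rep {
        int refs;
        std::string cond;
    };
    Rep* rep_;
public:
    explicit ToyQuery(std::string cond) : rep_(new Rep{1, std::move(cond)}) {}
    ToyQuery(const ToyQuery& o) : rep_(o.rep_) { ++rep_->refs; }
    ToyQuery& operator=(const ToyQuery&) = delete;
    ~ToyQuery() { if (--rep_->refs == 0) delete rep_; }
    int use_count() const { return rep_->refs; }
    const std::string& cond() const { return rep_->cond; }
};

// Each worker receives the plain query condition by value and builds its own
// ToyQuery from it, so every reference count change stays inside one thread.
// Returns the reference count each worker saw while holding one extra copy.
std::vector<int> run_private_queries(const std::string& cond, int n_threads) {
    assert(n_threads > 0);
    std::vector<int> counts(n_threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([cond, t, &counts] {
            ToyQuery query(cond);            // constructed inside the worker
            ToyQuery copy(query);            // touches only this thread's Rep
            counts[t] = query.use_count();   // each thread writes its own slot
        });
    }
    for (auto& w : workers) w.join();
    return counts;
}
```

With this arrangement the only data passed between threads is the condition string, which is copied before each thread starts. The trade-off is that the query is rebuilt once per worker instead of once overall.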