#784 closed enhancement (wontfix)
Have a public API for merging MSets
Reported by: | German M. Bravo | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Library API | Version: | |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Enquire::get_mset returns a merged MSet for a query (from every shard in a database). For some use cases it is desirable to be able to do the merging separately, from a set of sub-MSets. An example of such a use case would be parallelizing matching for a query across multiple databases and then using the proposed API to get a final (merged) MSet.
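As a rough illustration of what "merging from a set of sub-MSets" involves, here is a minimal sketch of a heap-based k-way merge. It assumes each shard's results are already sorted by descending weight and that weights are comparable across shards (i.e. computed from the same collection statistics). Hit and merge_top_k are hypothetical names for illustration only; this is not Xapian API and not how Xapian merges internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <tuple>
#include <vector>

// Hypothetical stand-in for one entry of a per-shard result list.
// Not Xapian::MSet: just a (docid, weight) pair tagged with its shard.
struct Hit {
    unsigned docid;
    double weight;
    std::size_t shard;
};

// Merge per-shard lists (each sorted by descending weight) into the best
// `k` hits overall, using a max-heap over the head of each list.
std::vector<Hit> merge_top_k(const std::vector<std::vector<Hit>>& shards,
                             std::size_t k) {
    auto heavier = [](const Hit& a, const Hit& b) { return a.weight > b.weight; };
    // Heap entry: (weight, shard index, position within that shard).
    using Entry = std::tuple<double, std::size_t, std::size_t>;
    std::priority_queue<Entry> heap;  // max-heap ordered by weight first
    for (std::size_t s = 0; s < shards.size(); ++s) {
        assert(std::is_sorted(shards[s].begin(), shards[s].end(), heavier));
        if (!shards[s].empty())
            heap.emplace(shards[s][0].weight, s, 0);
    }

    std::vector<Hit> out;
    while (!heap.empty() && out.size() < k) {
        auto [w, s, i] = heap.top();  // best remaining head across shards
        heap.pop();
        out.push_back(shards[s][i]);
        if (i + 1 < shards[s].size())  // advance within that shard's list
            heap.emplace(shards[s][i + 1].weight, s, i + 1);
    }
    return out;
}
```

The heap only ever holds one entry per shard, so producing k results costs O(k log S) for S shards once the sub-results exist.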
Change History (5)
comment:2 by , 5 years ago
I think parallel query matching is better handled automatically inside the matcher rather than by adding new API features which users then have to connect up for themselves.
comment:3 by , 5 years ago
Component: | Other → Library API |
---|
comment:4 by , 3 years ago
template<typename ITOR> Xapian::MSet merge(ITOR begin, ITOR end)
To clarify, you mean that to be a static method of MSet? Of what type would that ITOR be? And where/how would it get the stats?
Looks like I failed to answer this part.
Probably as a static method of MSet, yes.
ITOR would be any iterator type that dereferences to a Xapian::MSet (or a reference to one), so you could do something like:
auto merged = Xapian::MSet::merge(mset_container.begin(), mset_container.end());
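To make the proposed shape concrete, here is a sketch of an iterator-generic merge function template, assuming a hypothetical ResultSet type (a vector of docid/weight pairs sorted by descending weight) in place of Xapian::MSet. The name merge_result_sets and the maxitems parameter are illustrative assumptions; Xapian has no such function. It accepts any iterator whose dereferenced value is a ResultSet, whether returned by value or by reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a sub-result set; NOT Xapian::MSet.
struct ResultItem {
    unsigned docid;
    double weight;
};
using ResultSet = std::vector<ResultItem>;

inline bool heavier(const ResultItem& a, const ResultItem& b) {
    return a.weight > b.weight;
}

// ITOR: any iterator dereferencing to a ResultSet (value or reference).
// Each ResultSet must be sorted by descending weight; the best `maxitems`
// items across all of them are returned, best first.
template <typename ITOR>
ResultSet merge_result_sets(ITOR begin, ITOR end, std::size_t maxitems) {
    static_assert(std::is_same_v<std::decay_t<decltype(*begin)>, ResultSet>,
                  "ITOR must dereference to a ResultSet");
    ResultSet merged;
    for (ITOR it = begin; it != end; ++it) {
        const ResultSet& sub = *it;  // binds to either a value or a reference
        assert(std::is_sorted(sub.begin(), sub.end(), heavier));
        ResultSet next;
        next.reserve(merged.size() + sub.size());
        // std::merge is stable: on ties, items already merged come first.
        std::merge(merged.begin(), merged.end(), sub.begin(), sub.end(),
                   std::back_inserter(next), heavier);
        // An item outside the top `maxitems` so far can never re-enter.
        if (next.size() > maxitems)
            next.resize(maxitems);
        merged.swap(next);
    }
    return merged;
}
```

A pairwise fold like this is simple and keeps memory bounded by maxitems; with many sub-results, a heap-based k-way merge avoids repeatedly copying the running result.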
The stats method would be to get the stats for a query. I thought merging was a two-step process:
- Get the stats for a query from each involved database and merge them into "total" stats; and
- Run the query against each database using the total stats, then merge the results into the final MSet.
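A small worked example of why the first step matters, assuming a simplified IDF (the log of doccount over termfreq) rather than any of Xapian's actual weighting schemes; TermStats, merge_stats and idf are hypothetical names. Scored with local stats, the same term gets very different weights on each shard, so the per-shard results would not be comparable; summing the counts gives one global value that every shard can use in the second step.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical, simplified per-shard statistics for one query term.
// Real weighting schemes need more than this (lengths, wdf, etc.).
struct TermStats {
    unsigned long doccount;  // documents in the shard
    unsigned long termfreq;  // documents in the shard containing the term
};

// Step 1: combine per-shard stats into "total" stats by summing counts.
TermStats merge_stats(const std::vector<TermStats>& shards) {
    TermStats total{0, 0};
    for (const TermStats& s : shards) {
        total.doccount += s.doccount;
        total.termfreq += s.termfreq;
    }
    return total;
}

// Simplified inverse document frequency (not Xapian's formula).
double idf(const TermStats& s) {
    assert(s.termfreq > 0 && s.termfreq <= s.doccount);
    return std::log(static_cast<double>(s.doccount) / s.termfreq);
}
```

With 10 of 1000 documents containing the term on one shard and 500 of 1000 on the other, the local IDFs are log(100) ≈ 4.6 and log(2) ≈ 0.69, while the merged stats give log(2000/510) ≈ 1.37 for both shards.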
That's approximately how the match works internally, but then what you're describing is a public API for reimplementing something equivalent to Xapian's matcher outside of Xapian, rather than just an API for merging MSets.
I think parallel query matching is better handled automatically inside the matcher rather than by adding new API features which users then have to connect up for themselves.
I'm guessing your actual motivation here is for use by xapiand, right?
What does xapiand actually need? Instead of just trying to expose a lot of internal details of Xapian, can we make some smaller tweaks which would allow xapiand to effectively do what it needs to via the existing API?
For example, parallel matching could clearly be done within the existing matcher. There's a risk it might be slower if you're talking about matching multiple local shards in parallel: I/O is the limiting factor, and parallel matching would likely result in a more scattered I/O access pattern. Also, currently each shard processed locally can benefit from a minimum weight established by the shard(s) processed before it, so there's more total work to do if shards are processed in parallel. It would likely be useful for cases where the time for a single search matters more than total throughput and the database is mostly cached. And by doing it in Xapian, other users of Xapian can benefit.
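The point about the minimum weight can be illustrated with a toy model, assuming each candidate document has a cheap upper bound on its weight and an "expensive" exact weight, and that a candidate is only scored if its bound can beat the current k-th best weight. Candidate, TopK, match_shard and the two drivers are hypothetical names; this is a sketch of the trade-off, not how Xapian's matcher is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <queue>
#include <vector>

// Hypothetical candidate: cheap upper bound plus exact weight.
struct Candidate {
    double max_weight;
    double weight;
};
using Shard = std::vector<Candidate>;

// Keeps the best k exact weights; min_weight() is the threshold a
// candidate's upper bound must exceed to be worth scoring.
class TopK {
public:
    explicit TopK(std::size_t k) : k_(k) { assert(k > 0); }
    double min_weight() const { return heap_.size() < k_ ? 0.0 : heap_.top(); }
    void add(double w) {
        heap_.push(w);
        if (heap_.size() > k_) heap_.pop();  // drop the smallest
    }
private:
    std::size_t k_;
    std::priority_queue<double, std::vector<double>, std::greater<double>> heap_;
};

// Scores one shard, skipping candidates whose bound cannot beat the
// threshold. Returns how many exact scorings were performed.
std::size_t match_shard(const Shard& shard, TopK& top) {
    std::size_t scored = 0;
    for (const Candidate& c : shard) {
        assert(c.weight <= c.max_weight);
        if (c.max_weight <= top.min_weight()) continue;  // pruned
        ++scored;
        top.add(c.weight);
    }
    return scored;
}

// Sequential: later shards inherit the threshold set by earlier ones.
std::size_t sequential_work(const std::vector<Shard>& shards, std::size_t k) {
    TopK top(k);
    std::size_t total = 0;
    for (const Shard& s : shards) total += match_shard(s, top);
    return total;
}

// Parallel: each shard starts from a zero threshold, so less is pruned
// (the per-shard results would still need merging afterwards).
std::size_t parallel_work(const std::vector<Shard>& shards, std::size_t k) {
    std::vector<std::future<std::size_t>> jobs;
    for (const Shard& s : shards)
        jobs.push_back(std::async(std::launch::async, [&s, k] {
            TopK top(k);
            return match_shard(s, top);
        }));
    std::size_t total = 0;
    for (auto& j : jobs) total += j.get();
    return total;
}
```

When the first shard already fills the top k with strong matches, the sequential run can skip every weaker candidate in later shards, while the parallel run scores them anyway; the parallel version only wins on wall-clock time when there are spare cores and the data is cached.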
comment:5 by , 2 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
No response in ~4 months.
Looking at the xapiand repo, I can't help conclude that the project is no longer at all active. https://github.com/Kronuz/Xapiand/issues/39 explicitly asked "Is this project dead?" and sadly nobody has replied to it in over 9 months.
Without xapiand, there doesn't seem to be any motivation for this change. Even with xapiand, I think it would make more sense to implement parallel matching inside Xapian rather than outside.
Hence closing.
In the IRC chat, olly asked/suggested:
To clarify, you mean that to be a static method of MSet? Of what type would that ITOR be? And where/how would it get the stats?
The stats method would be to get the stats for a query. I thought merging was a two-step process: