Opened 6 years ago

Closed 2 years ago

Last modified 17 months ago

#784 closed enhancement (wontfix)

Have a public API for merging MSets

Reported by: German M. Bravo
Owned by: Olly Betts
Priority: normal
Milestone:
Component: Library API
Version:
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

Enquire::get_mset returns a merged MSet for a query (from every shard in a database). For some use cases it is desirable to be able to do the merging separately, from a set of sub-MSets. One such use case is parallelizing matching for a query across multiple databases, then using the proposed API to produce the final (merged) MSet.

Change History (5)

comment:1 by German M. Bravo, 6 years ago

In the IRC chat, olly asked/suggested:

Not sure why a method for stats is needed; wouldn't the public API be a static method, something like:
template<typename ITOR> Xapian::MSet merge(ITOR begin, ITOR end)?
which should take care of any merging of stats etc. that's needed


To clarify, you mean that to be a static method of MSet? of what type would that ITOR be? and where/how would it get the stats?

The stats method would be to get the stats for a query. I thought merging was a two-step process:

  1. Get stats for a query from each involved database and merge as "total" stats; and
  2. Get query results (queries using the total stats) and merge results into the merged MSet
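A minimal C++ sketch of those two steps for a single-term query, assuming a plain log(N/n) IDF weight rather than Xapian's actual default scheme (BM25). `ShardStats`, `merge_stats` and `idf` are hypothetical names, not Xapian API. The point it illustrates is that weights derived from the merged totals are comparable across shards, which is what makes merging the per-shard results in step 2 meaningful.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-shard statistics for a single-term query (not a Xapian
// class): how many documents the shard holds, and how many contain the term.
struct ShardStats {
    std::uint64_t doc_count;
    std::uint64_t term_freq;
};

// Step 1: sum the per-shard statistics into collection-wide "total" stats.
ShardStats merge_stats(const std::vector<ShardStats>& shards) {
    ShardStats total{0, 0};
    for (const ShardStats& s : shards) {
        total.doc_count += s.doc_count;
        total.term_freq += s.term_freq;
    }
    return total;
}

// Step 2 uses weights derived from the *total* stats, so a given document
// scores the same whichever shard it is stored in. Toy IDF: log(N / n).
double idf(const ShardStats& total) {
    assert(total.term_freq > 0 && total.term_freq <= total.doc_count);
    return std::log(static_cast<double>(total.doc_count) /
                    static_cast<double>(total.term_freq));
}
```

For instance, shards with stats {100, 50} and {300, 10} would compute local IDFs of log 2 and log 30 on their own, whereas the merged stats {400, 60} give every shard the same log(400/60).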
Last edited 6 years ago by German M. Bravo

comment:2 by Olly Betts, 5 years ago

I think parallel query matching is better handled automatically inside the matcher rather than by adding new API features which users then have to connect up for themselves.

comment:3 by Olly Betts, 5 years ago

Component: Other → Library API

comment:4 by Olly Betts, 2 years ago

template<typename ITOR> Xapian::MSet merge(ITOR begin, ITOR end)

To clarify, you mean that to be a static method of MSet? of what type would that ITOR be? and where/how would it get the stats?

Looks like I failed to answer this part.

Probably as a static method of MSet, yes.

ITOR would be any iterator type which dereferences to a Xapian::MSet (or a reference to one), so you could do stuff like:

auto merged = Xapian::MSet::merge(mset_container.begin(), mset_container.end());
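A toy sketch of what such a static merge could look like, assuming hypothetical `Hit`/`ToyMSet` types in place of Xapian's, docids that are already unique across shards, weights already computed from merged stats, and an extra `maxitems` argument that the proposed signature doesn't have. Since each input is already ranked, it merges them pairwise with `std::merge` and trims to the best `maxitems` as it goes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-in for one result item (NOT Xapian's MSet item type).
struct Hit {
    unsigned docid;  // assumed already unique across all shards
    double weight;   // assumed computed from merged (total) stats
};

// Ranking order: higher weight first, ties broken by lower docid.
inline bool ranks_before(const Hit& a, const Hit& b) {
    if (a.weight != b.weight) return a.weight > b.weight;
    return a.docid < b.docid;
}

// Hypothetical stand-in for a per-shard MSet (NOT Xapian::MSet).
struct ToyMSet {
    std::vector<Hit> hits;  // kept in ranks_before order
    std::size_t matches_estimated = 0;

    // Sketch of the proposed static merge(): ITOR is any iterator whose
    // value type is a ToyMSet (or a reference to one).
    template <typename ITOR>
    static ToyMSet merge(ITOR begin, ITOR end, std::size_t maxitems) {
        ToyMSet merged;
        for (ITOR it = begin; it != end; ++it) {
            const ToyMSet& sub = *it;
            // Each sub-MSet must already be ranked, as a real MSet would be.
            assert(std::is_sorted(sub.hits.begin(), sub.hits.end(), ranks_before));
            std::vector<Hit> out;
            out.reserve(merged.hits.size() + sub.hits.size());
            std::merge(merged.hits.begin(), merged.hits.end(),
                       sub.hits.begin(), sub.hits.end(),
                       std::back_inserter(out), ranks_before);
            // Anything outside the best maxitems so far can never make the
            // final top maxitems, so trim as we go.
            if (out.size() > maxitems) out.resize(maxitems);
            merged.hits = std::move(out);
            merged.matches_estimated += sub.matches_estimated;
        }
        return merged;
    }
};
```

With a `std::vector<ToyMSet> shards`, the call mirrors the one above: `auto merged = ToyMSet::merge(shards.begin(), shards.end(), 10);`. Summing `matches_estimated` is only reasonable for disjoint shards, and it ignores the lower and upper bounds that a real MSet also reports.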

The stats method would be to get the stats for a query. I thought merging was a two-step process:

  1. Get stats for a query from each involved database and merge as "total" stats; and
  2. Get query results (queries using the total stats) and merge results into the merged MSet

That's approximately how the match works internally, but that would be a public API for reimplementing something equivalent to Xapian's matcher outside of Xapian, rather than one just for merging MSets.

I think parallel query matching is better handled automatically inside the matcher rather than by adding new API features which users then have to connect up for themselves.

I'm guessing your actual motivation here is for use by xapiand, right?

What does xapiand actually need? Instead of just trying to expose a lot of Xapian's internal details, can we make some smaller tweaks which would allow xapiand to do what it needs effectively via the existing API?

For example, parallel matching could clearly be done within the existing matcher. There's a risk it might be slower when matching multiple local shards in parallel: I/O is the limiting factor there, and parallel matching would likely result in a more scattered I/O access pattern. Also, currently each shard processed locally can benefit from a minimum weight established by the shard(s) processed before it, so there's more total work to do if shards are processed in parallel. It would likely be useful for cases where the time for a single search matters more than total throughput and the database is mostly cached. And by doing it in Xapian, other users of Xapian can benefit too.
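A toy model of the minimum-weight point, assuming each document has a cheap upper bound on its weight that lets the matcher skip it when it can't beat the current k-th best. Xapian's matcher does use maximum-weight bounds in this spirit, but `Candidate`, `weights_computed` and the other names here are hypothetical and greatly simplified. The `share_threshold` flag switches between sequential matching and shards matched independently, as parallel workers would be.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical toy model (not Xapian's matcher): each candidate document has
// a cheap upper bound on its weight and an exact weight assumed costly to compute.
struct Candidate {
    double max_weight;  // cheap bound, assumed >= weight
    double weight;      // "expensive" exact weight
};
using Shard = std::vector<Candidate>;

// Min-heap of the best k weights seen so far; top() is the current minimum
// weight a new document must beat to enter the top k.
using TopK = std::priority_queue<double, std::vector<double>, std::greater<double>>;

// Collects top-k results shard by shard and returns how many exact weights
// were computed. With share_threshold, later shards inherit the minimum weight
// established by earlier ones (sequential matching); without it, each shard
// starts from scratch, as independent parallel workers would.
std::size_t weights_computed(const std::vector<Shard>& shards, std::size_t k,
                             bool share_threshold) {
    assert(k > 0);
    std::size_t computed = 0;
    TopK shared;
    for (const Shard& shard : shards) {
        TopK own;
        TopK& heap = share_threshold ? shared : own;
        for (const Candidate& c : shard) {
            // Prune: this document can't beat the current k-th best weight.
            if (heap.size() == k && c.max_weight <= heap.top()) continue;
            ++computed;
            if (heap.size() < k) {
                heap.push(c.weight);
            } else if (c.weight > heap.top()) {
                heap.pop();
                heap.push(c.weight);
            }
        }
    }
    return computed;
}
```

On two shards holding weights {5, 4} and {3, 2, 1} with k = 2, the sequential pass computes 2 exact weights while the independent one computes 4, because the second shard never learns that 4 is already the bar to beat.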

Last edited 17 months ago by Olly Betts

comment:5 by Olly Betts, 2 years ago

Resolution: wontfix
Status: new → closed

No response in ~4 months.

Looking at the xapiand repo, I can't help conclude that the project is no longer at all active. https://github.com/Kronuz/Xapiand/issues/39 explicitly asked "Is this project dead?" and sadly nobody has replied to it in over 9 months.

Without xapiand, there doesn't seem to be any motivation for this change. Even with xapiand, I think it would make more sense to implement parallel matching inside Xapian rather than outside.

Hence closing.
