Opened 9 years ago
Last modified 14 months ago
#712 new defect
Geospatial API thoughts
Reported by: | Olly Betts | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 2.0.0 |
Component: | Library API | Version: | |
Severity: | normal | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | Operating System: | All |
Description
I've been looking at making the geospatial API not force you to specify a metric (the only user feedback I've so far heard about it was that it's annoying to have to do so when there's a sane default), and it feels like it provides unnecessary flexibility which close to nobody is actually going to want, but in doing so it forces inefficiencies upon all users.
The LatLongMetric
class is essentially a pluggable algorithm for "distance between two coordinates".
We don't actually seem to document any requirements for it, though presumably it ought to fulfil the requirements of a "metric" in the mathematical sense (https://en.wikipedia.org/wiki/Metric_%28mathematics%29):
- d(x,y) >= 0
- d(x,y) == 0 <=> x == y
- d(x,y) = d(y,x)
- d(x,z) <= d(x,y) + d(y,z)
If we documented these as requirements, we might be able to take advantage them to optimise - e.g. when handling distance from a polygon.
The documentation only seems to suggest you might want to use a different metric which calculates great circle distances more exactly. The provided metric assumes that the planet is a sphere, and uses the haversine function, which suffers from numeric errors for points close to opposite each other - I struggle to believe that for purposes of ranking or limiting searches that either of these is a problem in reality.
Providing a metric which calculates "distance by road" or similar seems a more plausible reason an API user might to want to provide their own, but the documentation doesn't even hint that this as something you could do. You would need to take care that the metric wasn't too expensive to calculate of course - running a full routing algorithm for every pair of points being considered is unlikely to scale well.
The forced inefficiency issue I noted is that metric has to calculate "pointwise_distance" in metres. At least for sorting and filtering, we really just need a value which is strictly monotonic in distance (and for filtering to reverse map the threshold into the same domain). For haversine, we could sort by h, and avoid having to calculate an arcsin and square root each time. Not exposing metrics would easily allow that, but redefining the interface carefully could also allow it.
None of this feasible before 1.4.0 (except perhaps dropping metrics from the public API entirely), but I wanted to write down my thoughts while they were still fresh.
I think for 1.4.0 we default the metric if not specified, and either just leave the metric classes marked as experimental, or improve the documentation to actually say what the requirements are and give a better example of a user metric.
Change History (3)
comment:1 by , 2 years ago
comment:2 by , 21 months ago
Milestone: | → 1.5.0 |
---|
1.5.0 is an opportunity to make API changes, so setting milestone so we consider this.
comment:3 by , 14 months ago
Milestone: | 1.5.0 → 2.0.0 |
---|
From #186: