Opened 8 years ago
Last modified 5 months ago
#737 assigned enhancement
Fix/improve $filters
Reported by: | Olly Betts | Owned by: | Olly Betts |
---|---|---|---|
Priority: | highest | Milestone: | 1.5.0 |
Component: | Omega | Version: | |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
The current encoding of $filters has at least one bug (which was also present in the older encoding used in 1.2.x):
`DOCIDORDER=A` is the default, but produces an `X` in `$filters`; conversely, `DOCIDORDER=X` is non-default but produces nothing in `$filters`. However, `A` and `X` currently behave identically, as `DONT_CARE` always results in `ASCENDING` order, so this doesn't seem worth changing anything for. But if/when we change the encoding, we should address this.
And it could be more compact:
- Every `N` term is prefixed by `!`, but only the first needs to be.
- Every encoded string has at least `~~` after the character for `DEFAULTOP`, which isn't necessary.
- The `DEFAULTOP` character could be omitted when using the default `DEFAULTOP`.
- We could combine some/all of `DEFAULTOP`, `DOCIDORDER` and the existing `SORTREVERSE`/`SORTAFTER` characters - there are currently 2, 3 and 2*2 states, though more `DEFAULTOP` values are possible, and about 10+26*2+19=81 characters which don't need URL encoding, so we could support up to 6 `DEFAULTOP` values and encode all of these into one character which shouldn't need URL encoding.
- We could encode value slot numbers using something like base64 and save bytes when slots > 9 are used (or perhaps encode all the slot numbers together such that they'd usually all fit in one byte).
- Lists of `B` and `N` are sorted, so could easily be prefix-compressed - reducing the size when there are a lot of either, which is a case where keeping the size down matters most.
The compactness matters as the length of a URL is limited, and using `GET` is common for search systems. A longer URL can also look uglier when pasted, etc.
Change History (4)
comment:1 by , 10 months ago
Status: | new → assigned |
---|
comment:2 by , 10 months ago
As a first step, dropped compatibility handling for Xapian 1.2.x `xFILTERS` encoding in d66c9e9b4d9f8456e6245d0fc1ee59f9e9c5a7d9.
comment:3 by , 10 months ago
Working on this. My WIP so far addresses the first 3 points (any `START`/`END`/`SPAN` filter is now encoded in the same way as date range filters from `START.n`, etc are) which gets rid of the `~~` when these aren't used. Additionally I've shortened the encoding of date range filters by a character or two in cases where `SPAN`/`SPAN.n` isn't used.
> The DEFAULTOP character could be omitted when using the default DEFAULTOP.

We probably could, but it's a single character and omitting it entirely seems to complicate things.
> We could combine some/all of DEFAULTOP, DOCIDORDER and the existing SORTREVERSE/SORTAFTER characters - there are currently 2, 3 and 2*2 states, though more DEFAULTOP values are possible, and about 10+26*2+19=81 characters which don't need URL encoding, so we could support up to 6 DEFAULTOP values and encode all of these into one character which shouldn't need URL encoding.

This seems a better approach and potentially saves more.
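A rough sketch of how such a combined character might work, treating the three settings as a mixed-radix number and using it as an index into a URL-safe alphabet. Everything here is an assumption for illustration, not what Omega does: the names `ALPHABET`, `pack_state` and `unpack_state`, the integer numbering of the states, and the use of only the 62 alphanumerics (a subset of the roughly 81 characters counted above). With 62 characters there would be room for up to 5 DEFAULTOP values (5*3*4=60) rather than 6.

```python
import string

# Hypothetical alphabet: 62 alphanumerics, none of which need URL encoding.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

N_DEFAULTOP = 2   # current number of DEFAULTOP states, per the ticket
N_DOCIDORDER = 3  # current number of DOCIDORDER states, per the ticket
N_SORT = 4        # SORTREVERSE x SORTAFTER, 2*2 states


def pack_state(defaultop, docidorder, sortreverse, sortafter):
    """Pack the settings into one character (state numbering is assumed)."""
    if not 0 <= defaultop < N_DEFAULTOP:
        raise ValueError("defaultop out of range")
    if not 0 <= docidorder < N_DOCIDORDER:
        raise ValueError("docidorder out of range")
    sort = (2 if sortreverse else 0) + (1 if sortafter else 0)
    index = (defaultop * N_DOCIDORDER + docidorder) * N_SORT + sort
    return ALPHABET[index]


def unpack_state(ch):
    """Recover (defaultop, docidorder, sortreverse, sortafter) from one character."""
    index = ALPHABET.index(ch)  # raises ValueError for characters outside the alphabet
    if index >= N_DEFAULTOP * N_DOCIDORDER * N_SORT:
        raise ValueError("character does not encode a valid state")
    index, sort = divmod(index, N_SORT)
    defaultop, docidorder = divmod(index, N_DOCIDORDER)
    return defaultop, docidorder, bool(sort & 2), bool(sort & 1)
```

If the all-defaults combination is numbered 0 for each setting, it always maps to the first character of the alphabet, which would also make the DOCIDORDER default explicit rather than implied by absence.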
> We could encode value slot numbers using something like base64 and save bytes when slots > 9 are used (or perhaps encode all the slot numbers together such that they'd usually all fit in one byte).

Not looked into this.
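For reference, a base64-style encoding of a single slot number could look something like the sketch below. It assumes the URL-safe base64 alphabet from RFC 4648 (`A-Z`, `a-z`, `0-9`, `-`, `_`); the function names are hypothetical, and how a variable-length slot number would be delimited within the filter string is not addressed.

```python
import string

# URL-safe base64 alphabet (RFC 4648); using it for slot numbers is an assumption.
SLOT_DIGITS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def encode_slot(slot):
    """Encode a non-negative slot number in base 64, most significant digit first."""
    if slot < 0:
        raise ValueError("slot numbers are non-negative")
    digits = []
    while True:
        slot, rem = divmod(slot, 64)
        digits.append(SLOT_DIGITS[rem])
        if slot == 0:
            break
    return "".join(reversed(digits))


def decode_slot(text):
    """Decode a string produced by encode_slot()."""
    value = 0
    for ch in text:
        value = value * 64 + SLOT_DIGITS.index(ch)
    return value
```

With this scheme slots 0 to 63 take one character each, so slots 10 to 63 save a byte compared with decimal, and slots 100 to 4095 take two characters where decimal needs three or four.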
> Lists of B and N are sorted, so could easily be prefix-compressed - reducing the size when there are a lot of either, which is a case where keeping the size down matters most.

Or this.
As well as building the `filters` string, we also build an `old_filters` string which is the value of `FILTERS` from Omega < 1.3.4. This means the first stable release with the newer encoding was 1.4.0, released 2016-06-24, so we can reasonably drop support for this and instead have `old_filters` supporting what 1.4.x generates in `FILTERS`.