Opened 8 years ago

Last modified 5 months ago

#737 assigned enhancement

Fix/improve $filters

Reported by: Olly Betts
Owned by: Olly Betts
Priority: highest
Milestone: 1.5.0
Component: Omega
Version:
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

The current encoding of $filters has at least one bug (which was also present in the older encoding used in 1.2.x):

  • DOCIDORDER=A is the default but produces an X in $filters, while DOCIDORDER=X is non-default but produces nothing in $filters. Currently, however, A and X behave identically, because DONT_CARE always results in ASCENDING order, so this doesn't seem worth changing on its own. But if/when we change the encoding, we should address this.

And it could be more compact:

  • Every N term is prefixed by !, but only the first needs to be.
  • Every encoded string has at least ~~ after the character for DEFAULTOP, which isn't necessary.
  • The DEFAULTOP character could be omitted when using the default DEFAULTOP.
  • We could combine some/all of DEFAULTOP, DOCIDORDER and the existing SORTREVERSE/SORTAFTER characters - there are currently 2, 3 and 2*2 states, though more DEFAULTOP values are possible, and about 10+26*2+19=81 characters which don't need URL encoding, so we could support up to 6 DEFAULTOP values and encode all of these into one character which shouldn't need URL encoding.
  • We could encode value slot numbers using something like base64 and save bytes when slots > 9 are used (or perhaps encode all the slot numbers together such that they'd usually all fit in one byte).
  • Lists of B and N are sorted, so could easily be prefix-compressed - reducing the size when there are a lot of either, which is a case where keeping the size down matters most.

The compactness matters as the length of a URL is limited, and using GET is common for search systems. A longer URL can also look uglier when pasted, etc.

Change History (4)

comment:1 by Olly Betts, 10 months ago

Status: new → assigned

As well as building the filters string, we also build an old_filters string, which is the value FILTERS would have had in Omega < 1.3.4. The first stable release with the current encoding was therefore 1.4.0, released 2016-06-24, so we can reasonably drop support for the pre-1.3.4 encoding and instead have old_filters hold what 1.4.x generates in FILTERS.

comment:2 by Olly Betts, 10 months ago

As a first step, dropped compatibility handling for Xapian 1.2.x xFILTERS encoding in d66c9e9b4d9f8456e6245d0fc1ee59f9e9c5a7d9.

comment:3 by Olly Betts, 10 months ago

Working on this. My WIP so far addresses the first 3 points (any START/END/SPAN filter is now encoded in the same way as date range filters from START.n, etc. are), which gets rid of the ~~ when these aren't used. Additionally, I've shortened the encoding of date range filters by a character or two in cases where SPAN/SPAN.n isn't used.

> The DEFAULTOP character could be omitted when using the default DEFAULTOP.

We probably could, but it's a single character and omitting it entirely seems to complicate things.

> We could combine some/all of DEFAULTOP, DOCIDORDER and the existing SORTREVERSE/SORTAFTER characters - there are currently 2, 3 and 2*2 states, though more DEFAULTOP values are possible, and about 10+26*2+19=81 characters which don't need URL encoding, so we could support up to 6 DEFAULTOP values and encode all of these into one character which shouldn't need URL encoding.

This seems a better approach and potentially saves more.
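
To make the arithmetic in that suggestion concrete, here is a minimal sketch of packing the four settings into one character using mixed-radix encoding. It is not Omega's implementation: the names `ALPHABET`, `RADICES`, `pack` and `unpack` are made up for this example, and the alphabet is restricted to the 62 alphanumerics because the ticket doesn't list which 19 punctuation characters it has in mind. With 62 characters up to 5 DEFAULTOP values fit (5*3*2*2=60); the 81-character set mentioned above is what would allow 6 (6*3*2*2=72).

```python
import string

# Hypothetical alphabet: the 62 alphanumerics, which never need URL encoding.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Number of states of each setting, per the ticket:
# DEFAULTOP (2 today), DOCIDORDER (3), SORTREVERSE (2), SORTAFTER (2).
RADICES = (2, 3, 2, 2)


def check_fits(radices):
    """Raise ValueError if the combined state count exceeds the alphabet."""
    total = 1
    for radix in radices:
        total *= radix
    if total > len(ALPHABET):
        raise ValueError("too many states for one character")


def pack(values, radices=RADICES):
    """Pack one small integer per setting into a single character."""
    check_fits(radices)
    if len(values) != len(radices):
        raise ValueError("need exactly one value per setting")
    code = 0
    for value, radix in zip(values, radices):
        if not 0 <= value < radix:
            raise ValueError("value out of range for its setting")
        code = code * radix + value
    return ALPHABET[code]


def unpack(char, radices=RADICES):
    """Recover the per-setting integers from a packed character."""
    check_fits(radices)
    code = ALPHABET.index(char)
    values = []
    # Peel the settings off in reverse order of how pack() added them.
    for radix in reversed(radices):
        code, value = divmod(code, radix)
        values.append(value)
    return tuple(reversed(values))
```

In this sketch, numbering each setting's default as 0 means the all-defaults case always packs to the first character of the alphabet, which a decoder could treat as "nothing non-default here".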

> We could encode value slot numbers using something like base64 and save bytes when slots > 9 are used (or perhaps encode all the slot numbers together such that they'd usually all fit in one byte).

Not looked into this.
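
Purely as an illustration of the idea, a slot number could be written in base 64 using the URL-safe alphabet from RFC 4648, so that slots 0 to 63 take one character and slots up to 4095 take two. The function names below are hypothetical, and the sketch ignores how the surrounding $filters format would tell where a slot number ends, which a real encoding would have to settle.

```python
import string

# URL-safe base64 alphabet from RFC 4648 ("base64url"), used here as digits.
DIGITS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def encode_slot(slot):
    """Write a non-negative slot number in base 64, most significant first."""
    if slot < 0:
        raise ValueError("slot numbers are non-negative")
    out = DIGITS[slot % 64]
    slot //= 64
    while slot:
        out = DIGITS[slot % 64] + out
        slot //= 64
    return out


def decode_slot(text):
    """Inverse of encode_slot()."""
    slot = 0
    for ch in text:
        slot = slot * 64 + DIGITS.index(ch)
    return slot
```

Compared with decimal, this saves one character for slots 10 to 63 and for slots 100 to 4095, and costs nothing for slots 0 to 9.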

> Lists of B and N are sorted, so could easily be prefix-compressed - reducing the size when there are a lot of either, which is a case where keeping the size down matters most.

Or this.
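
For reference, prefix compression of a sorted list might look something like the sketch below. The framing is an assumption made for this example rather than anything Omega does: each term is stored as one character giving the length of the prefix shared with the previous term, one character giving the length of the remaining suffix, and then the suffix itself. `ALPHABET`, `compress` and `decompress` are hypothetical names.

```python
import string

# Hypothetical 62-character alphabet used to write small lengths.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def compress(terms):
    """Prefix-compress an already sorted list of terms into one string."""
    out = []
    prev = ""
    for term in terms:
        # Count leading characters shared with the previous term,
        # capped so the count fits in one alphabet character.
        limit = min(len(prev), len(term), len(ALPHABET) - 1)
        common = 0
        while common < limit and prev[common] == term[common]:
            common += 1
        suffix = term[common:]
        if len(suffix) >= len(ALPHABET):
            raise ValueError("suffix too long for this toy framing")
        out.append(ALPHABET[common] + ALPHABET[len(suffix)] + suffix)
        prev = term
    return "".join(out)


def decompress(data):
    """Inverse of compress()."""
    terms = []
    prev = ""
    i = 0
    while i < len(data):
        common = ALPHABET.index(data[i])
        length = ALPHABET.index(data[i + 1])
        term = prev[:common] + data[i + 2:i + 2 + length]
        terms.append(term)
        prev = term
        i += 2 + length
    return terms
```

With this framing each term costs two length characters, so it only wins when neighbouring terms share more than a character or two, as a long run of terms with the same prefix would. Characters inside the terms that need URL encoding still need it.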

comment:4 by Olly Betts, 5 months ago

Priority: normal → highest

We really should do this for 1.5.0.
