| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"> |
|---|
| 2 | <html> |
|---|
| 3 | <head> |
|---|
| 4 | <title>Xapian: Remote Backend Protocol</title> |
|---|
| 5 | </head> |
|---|
| 6 | <body bgcolor="white" text="black"> |
|---|
| 7 | <h1>Remote Backend Protocol</h1> |
|---|
| 8 | |
|---|
| 9 | <p> |
|---|
| 10 | This document describes <em>version 30.5</em> of the protocol used by Xapian's |
|---|
| 11 | remote backend. Clients and servers must support matching major protocol versions, |
|---|
| 12 | and the client's minor protocol version must be the same as or lower than the server's. This means that |
|---|
| 13 | for a minor protocol version change, you can upgrade the servers first and then the |
|---|
| 14 | clients, and everything should work during the upgrades. |
|---|
| 15 | </p> |
|---|
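<p>
The compatibility rule above can be sketched as a small check. This is only an
illustration: the function name <code>versions_compatible</code> and the
<code>(major, minor)</code> tuple representation are assumptions made for this
example, not part of Xapian's API.
</p>

```python
def versions_compatible(client, server):
    """Return True if a client can talk to a server.

    Both arguments are (major, minor) tuples (a representation assumed
    for this sketch).  The major versions must match, and the client's
    minor version must not be newer than the server's.
    """
    client_major, client_minor = client
    server_major, server_minor = server
    return client_major == server_major and client_minor <= server_minor


if __name__ == "__main__":
    # An older client against an upgraded server is fine...
    print(versions_compatible((30, 3), (30, 5)))
    # ...but a newer client against an older server is not.
    print(versions_compatible((30, 5), (30, 3)))
```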
| 16 | |
|---|
| 17 | <p> |
|---|
| 18 | The protocol assumes a reliable two-way connection across which |
|---|
| 19 | arbitrary data can be sent - this could be provided by a TCP socket for example |
|---|
| 20 | (as it is with xapian-tcpsrv), but any such connection could be used. For |
|---|
| 21 | example, you could use xapian-progsrv across an ssh connection, or even |
|---|
| 22 | a custom server across a suitable serial connection. |
|---|
| 23 | </p> |
|---|
| 24 | |
|---|
| 25 | <p> |
|---|
| 26 | All messages start with a single-byte identifying code. A message from client |
|---|
| 27 | to server has a <code>MSG_XXX</code> identifying code, while a message from |
|---|
| 28 | server to client has a <code>REPLY_XXX</code> identifying code (but note that a |
|---|
| 29 | reply might not actually be in response to a message - REPLY_GREETING isn't - |
|---|
| 30 | and some messages result in multiple replies). |
|---|
| 31 | </p> |
|---|
| 32 | |
|---|
| 33 | <p> |
|---|
| 34 | The identifying code is followed by the encoded length of the contents followed |
|---|
| 35 | by the contents themselves. |
|---|
| 36 | </p> |
|---|
| 37 | |
|---|
| 38 | <p> |
|---|
| 39 | Inside the contents, strings are generally passed as an encoded |
|---|
| 40 | length followed by the string data (this is indicated below by |
|---|
| 41 | <code>L<...></code>). The exception is a string that is the last or only |
|---|
| 42 | thing in the contents: its length is implied by the length of the |
|---|
| 43 | contents, so it doesn't need to be specified |
|---|
| 44 | explicitly. |
|---|
| 45 | </p> |
|---|
| 46 | |
|---|
| 47 | <p> |
|---|
| 48 | Integers are encoded using the same encoding used for string lengths |
|---|
| 49 | (indicated by <code>I<...></code> below). |
|---|
| 50 | </p> |
|---|
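<p>
The framing and string conventions described above can be sketched as follows.
This document does not spell out the integer encoding, so the variable-length
scheme in <code>encode_length</code> is an assumption made for illustration
(values below 255 as one byte, larger values as <code>0xff</code> followed by
7-bit groups, with the high bit marking the final group); check it against the
Xapian sources before relying on it. The helper names and the message code
<code>0x05</code> are likewise hypothetical.
</p>

```python
def encode_length(n):
    """Encode a non-negative integer (ASSUMED scheme, see the note above)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 255:
        return bytes([n])
    out = bytearray([0xFF])
    n -= 255
    while True:
        group = n & 0x7F
        n >>= 7
        if n == 0:
            out.append(group | 0x80)  # high bit marks the last group
            break
        out.append(group)
    return bytes(out)


def decode_length(buf, pos=0):
    """Inverse of encode_length; returns (value, next position)."""
    first = buf[pos]
    pos += 1
    if first != 0xFF:
        return first, pos
    value = 0
    shift = 0
    while True:
        group = buf[pos]
        pos += 1
        value |= (group & 0x7F) << shift
        shift += 7
        if group & 0x80:
            break
    return value + 255, pos


def pack_strings(strings):
    """Length-prefix every string except the last one, whose length is
    implied by the overall length of the contents."""
    if not strings:
        return b""
    parts = [encode_length(len(s)) + s for s in strings[:-1]]
    parts.append(strings[-1])
    return b"".join(parts)


def frame_message(code, contents):
    """Identifying code byte, then encoded contents length, then contents."""
    return bytes([code]) + encode_length(len(contents)) + contents


if __name__ == "__main__":
    HYPOTHETICAL_CODE = 0x05  # made-up value, not a real MSG_XXX code
    print(frame_message(HYPOTHETICAL_CODE, pack_strings([b"ab", b"cde"])))
```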
| 51 | |
|---|
| 52 | <p> |
|---|
| 53 | Floating point values are passed using a bit-packed encoding of the sign |
|---|
| 54 | and exponent and a base-256 encoding of the mantissa which avoids any rounding |
|---|
| 55 | issues (assuming that both machines have FLT_RADIX equal to some power of 2). |
|---|
| 56 | This is indicated by <code>F<...></code> below. |
|---|
| 57 | </p> |
|---|
| 58 | |
|---|
| 59 | <p> |
|---|
| 60 | Boolean values are passed as a single byte which is the ASCII character |
|---|
| 61 | value for <code>0</code> or <code>1</code>. This is indicated by |
|---|
| 62 | <code>B<...></code> below. |
|---|
| 63 | </p> |
|---|
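<p>
As a minimal sketch of the Boolean convention just described, the helpers below
map a Python <code>bool</code> to the ASCII byte <code>0</code> or
<code>1</code> and back. The function names are invented for this example.
</p>

```python
def encode_bool(value):
    """Encode a Boolean as the ASCII character '1' or '0' (one byte)."""
    return b"1" if value else b"0"


def decode_bool(byte):
    """Decode a single-byte Boolean; anything but b'0' or b'1' is an error."""
    if byte == b"1":
        return True
    if byte == b"0":
        return False
    raise ValueError("expected b'0' or b'1', got %r" % (byte,))


if __name__ == "__main__":
    print(encode_bool(True), decode_bool(b"0"))
```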
| 64 | |
|---|
| 65 | <h2>Server greeting and statistics</h2> |
|---|
| 66 | |
|---|
| 67 | <ul> |
|---|
| 68 | <li> <code>REPLY_GREETING <protocol major version> <protocol minor version> I<db doc count> I<last docid> B<has positions?> F<db average length></code> |
|---|
| 69 | </ul> |
|---|
| 70 | |
|---|
| 71 | <p> |
|---|
| 72 | The protocol major and minor versions are passed as a single byte each (e.g. |
|---|
| 73 | <code>'\x1e\x01'</code> for version 30.1). The server and client must |
|---|
| 74 | understand the same protocol major version, and the server protocol minor |
|---|
| 75 | version must be greater than or equal to that of the client (this means that |
|---|
| 76 | the server understands newer MSG_<i>XXX</i>, but will only send newer |
|---|
| 77 | REPLY_<i>YYY</i> in response to an appropriate client message). |
|---|
| 78 | </p> |
|---|
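<p>
Reading the two version bytes from the start of the greeting contents might
look like the sketch below. It assumes the identifying code and the encoded
contents length have already been stripped, and the function name
<code>parse_greeting_version</code> is hypothetical.
</p>

```python
def parse_greeting_version(contents):
    """Return (major, minor) from the first two bytes of the greeting
    contents.  The remaining bytes (the database statistics) are not
    decoded in this sketch."""
    if len(contents) < 2:
        raise ValueError("greeting too short to hold a protocol version")
    return contents[0], contents[1]


if __name__ == "__main__":
    # b'\x1e\x01' is the example given above for version 30.1.
    print(parse_greeting_version(b"\x1e\x01"))
```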
| 79 | |
|---|
| 80 | <h2>Exception</h2> |
|---|
| 81 | |
|---|
| 82 | <ul> |
|---|
| 83 | <li> <code>REPLY_EXCEPTION <serialised Xapian::Error object></code> |
|---|
| 84 | </ul> |
|---|
| 85 | |
|---|
| 86 | <p> |
|---|
| 87 | If an unknown exception is caught by the server, this message is sent but |
|---|
| 88 | with empty contents. |
|---|
| 89 | </p> |
|---|
| 90 | |
|---|
| 91 | <p> |
|---|
| 92 | This message can be sent at any point - the serialised exception is |
|---|
| 93 | unserialised by the client and thrown. The server and client both |
|---|
| 94 | abort any current sequence of messages. |
|---|
| 95 | </p> |
|---|
| 96 | |
|---|
| 97 | <h2>All Terms</h2> |
|---|
| 98 | |
|---|
| 99 | <ul> |
|---|
| 100 | <li> <code>MSG_ALLTERMS</code> |
|---|
| 101 | <li> <code>REPLY_ALLTERMS I<term freq> L<term name></code> |
|---|
| 102 | <li> <code>...</code> |
|---|
| 103 | <li> <code>REPLY_DONE</code> |
|---|
| 104 | </ul> |
|---|
| 105 | |
|---|
| 106 | <h2>Term Exists</h2> |
|---|
| 107 | |
|---|
| 108 | <ul> |
|---|
| 109 | <li> <code>MSG_TERMEXISTS <term name></code> |
|---|
| 110 | <li> <code>REPLY_TERMEXISTS</code> or <code>REPLY_TERMDOESNTEXIST</code> |
|---|
| 111 | </ul> |
|---|
| 112 | |
|---|
| 113 | <h2>Term Frequency</h2> |
|---|
| 114 | |
|---|
| 115 | <ul> |
|---|
| 116 | <li> <code>MSG_TERMFREQ <term name></code> |
|---|
| 117 | <li> <code>REPLY_TERMFREQ I<term freq></code> |
|---|
| 118 | </ul> |
|---|
| 119 | |
|---|
| 120 | <h2>Collection Frequency</h2> |
|---|
| 121 | |
|---|
| 122 | <ul> |
|---|
| 123 | <li> <code>MSG_COLLFREQ <term name></code> |
|---|
| 124 | <li> <code>REPLY_COLLFREQ I<collection freq></code> |
|---|
| 125 | </ul> |
|---|
| 126 | |
|---|
| 127 | <h2>Document</h2> |
|---|
| 128 | |
|---|
| 129 | <ul> |
|---|
| 130 | <li> <code>MSG_DOCUMENT I<document id></code> |
|---|
| 131 | <li> <code>REPLY_DOCDATA L<document data></code> |
|---|
| 132 | <li> <code>REPLY_VALUE I<value no> <value></code> |
|---|
| 133 | <li> <code>...</code> |
|---|
| 134 | <li> <code>REPLY_DONE</code> |
|---|
| 135 | </ul> |
|---|
| 136 | |
|---|
| 137 | <h2>Document Length</h2> |
|---|
| 138 | |
|---|
| 139 | <ul> |
|---|
| 140 | <li> <code>MSG_DOCLENGTH I<document id></code> |
|---|
| 141 | <li> <code>REPLY_DOCLENGTH F<document length></code> |
|---|
| 142 | </ul> |
|---|
| 143 | |
|---|
| 144 | <h2>Keep Alive</h2> |
|---|
| 145 | |
|---|
| 146 | <ul> |
|---|
| 147 | <li> <code>MSG_KEEPALIVE</code> |
|---|
| 148 | <li> <code>REPLY_DONE</code> |
|---|
| 149 | </ul> |
|---|
| 150 | |
|---|
| 151 | <h2>Reopen</h2> |
|---|
| 152 | |
|---|
| 153 | <ul> |
|---|
| 154 | <li> <code>MSG_REOPEN</code> |
|---|
| 155 | <li> <code>REPLY_UPDATE I<db doc count> I<last docid> B<has positions?> F<db average length></code> |
|---|
| 156 | </ul> |
|---|
| 157 | |
|---|
| 158 | <p>The reply is the same as for <code>MSG_UPDATE</code>.</p> |
|---|
| 159 | |
|---|
| 160 | <h2>Query</h2> |
|---|
| 161 | |
|---|
| 162 | <ul> |
|---|
| 163 | <li> <code>MSG_QUERY L<serialised Xapian::Query object> |
|---|
| 164 | I<query length> I<collapse key number> <docid order> |
|---|
| 165 | I<sort key number> <sort by> B<sort value forward> |
|---|
| 166 | <percent cutoff> F<weight cutoff> <serialised Xapian::Weight object> <serialised Xapian::RSet object></code> |
|---|
| 167 | <li> <code>REPLY_STATS <serialised Stats object></code> |
|---|
| 168 | <li> <code>MSG_GETMSET I<first> I<max items> I<check at least> |
|---|
| 169 | <serialised global Stats object></code> |
|---|
| 170 | <li> <code>REPLY_RESULTS <serialised Xapian::MSet object> <percent_factor></code> |
|---|
| 171 | </ul> |
|---|
| 172 | |
|---|
| 173 | <p> |
|---|
| 174 | Instead of MSG_GETMSET and REPLY_RESULTS, clients running protocol 30.3 or |
|---|
| 175 | 30.4 use: |
|---|
| 176 | </p> |
|---|
| 177 | |
|---|
| 178 | <ul> |
|---|
| 179 | <li> <code>MSG_GETMSET_PRE_30_5 I<first> I<max items> I<check at least> |
|---|
| 180 | <serialised global Stats object></code> |
|---|
| 181 | <li> <code>REPLY_RESULTS_PRE_30_5 <serialised Xapian::MSet object></code> |
|---|
| 182 | </li> |
|---|
| 183 | </ul> |
|---|
| 184 | |
|---|
| 185 | <p> |
|---|
| 186 | Clients running protocol 30.2 or earlier use: |
|---|
| 187 | </p> |
|---|
| 188 | |
|---|
| 189 | <ul> |
|---|
| 190 | <li> <code>MSG_GETMSET_PRE_30_3 I<first> I<max items> |
|---|
| 191 | <serialised global Stats object></code> |
|---|
| 192 | <li> <code>REPLY_RESULTS_PRE_30_5 <serialised Xapian::MSet object></code> |
|---|
| 193 | </li> |
|---|
| 194 | </ul> |
|---|
| 195 | |
|---|
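<p>
The choice of MSet request and reply by client minor version can be summarised
in a small lookup. This is a sketch only: the function name is hypothetical,
and the message names are returned as strings because this document does not
give their numeric codes.
</p>

```python
def getmset_messages(client_minor):
    """Return the (request, reply) message names used for fetching an MSet
    by a client speaking protocol 30.<client_minor>."""
    if client_minor >= 5:
        return ("MSG_GETMSET", "REPLY_RESULTS")
    if client_minor >= 3:
        # 30.3 and 30.4: request includes <check at least>, reply has no
        # <percent_factor>.
        return ("MSG_GETMSET_PRE_30_5", "REPLY_RESULTS_PRE_30_5")
    # 30.2 and earlier: request has no <check at least> either.
    return ("MSG_GETMSET_PRE_30_3", "REPLY_RESULTS_PRE_30_5")


if __name__ == "__main__":
    for minor in (2, 4, 5):
        print(minor, getmset_messages(minor))
```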
| 197 | |
|---|
| 198 | <p>docid order is <code>'0'</code>, <code>'1'</code> or <code>'2'</code>.</p> |
|---|
| 199 | |
|---|
| 200 | <p>sort by is <code>'0'</code>, <code>'1'</code>, <code>'2'</code> or <code>'3'</code>.</p> |
|---|
| 201 | |
|---|
| 202 | <h2>Termlist</h2> |
|---|
| 203 | |
|---|
| 204 | <ul> |
|---|
| 205 | <li> <code>MSG_TERMLIST I<document id></code> |
|---|
| 206 | <li> <code>REPLY_DOCLENGTH F<document length></code> |
|---|
| 207 | <li> <code>REPLY_TERMLIST I<wdf> I<term freq> L<term name></code> |
|---|
| 208 | <li> <code>...</code> |
|---|
| 209 | <li> <code>REPLY_DONE</code> |
|---|
| 210 | </ul> |
|---|
| 211 | |
|---|
| 212 | <h2>Positionlist</h2> |
|---|
| 213 | |
|---|
| 214 | <ul> |
|---|
| 215 | <li> <code>MSG_POSITIONLIST I<document id> <term name></code> |
|---|
| 216 | <li> <code>REPLY_POSITIONLIST I<termpos delta - 1></code> |
|---|
| 217 | <li> <code>...</code> |
|---|
| 218 | <li> <code>REPLY_DONE</code> |
|---|
| 219 | </ul> |
|---|
| 220 | |
|---|
| 221 | <p> |
|---|
| 222 | Since positions must be strictly monotonically increasing, we encode |
|---|
| 223 | <tt>(pos - lastpos - 1)</tt> so that small differences |
|---|
| 224 | between large position values can still be encoded compactly. The first |
|---|
| 225 | position is encoded as its true value. |
|---|
| 226 | </p> |
|---|
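<p>
The position delta scheme can be illustrated as below. The sketch works on
plain integer lists (one value per <code>REPLY_POSITIONLIST</code>) and leaves
out the wire encoding of each integer; the function names are invented for
this example.
</p>

```python
def encode_positions(positions):
    """Turn strictly increasing positions into the values sent on the wire:
    the first as its true value, the rest as (pos - lastpos - 1)."""
    values = []
    last = None
    for pos in positions:
        if last is None:
            values.append(pos)
        else:
            if pos <= last:
                raise ValueError("positions must be strictly increasing")
            values.append(pos - last - 1)
        last = pos
    return values


def decode_positions(values):
    """Inverse of encode_positions."""
    positions = []
    last = None
    for value in values:
        pos = value if last is None else last + value + 1
        positions.append(pos)
        last = pos
    return positions


if __name__ == "__main__":
    print(encode_positions([1000, 1001, 1005]))
```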
| 227 | |
|---|
| 228 | <h2>Postlist</h2> |
|---|
| 229 | |
|---|
| 230 | <ul> |
|---|
| 231 | <li> <code>MSG_POSTLIST <term name></code> |
|---|
| 232 | <li> <code>REPLY_POSTLISTSTART I<termfreq> I<collfreq></code> |
|---|
| 233 | <li> <code>REPLY_POSTLISTITEM I<docid delta - 1> I<wdf> F<document length></code> |
|---|
| 234 | <li> <code>...</code> |
|---|
| 235 | <li> <code>REPLY_DONE</code> |
|---|
| 236 | </ul> |
|---|
| 237 | |
|---|
| 238 | <p> |
|---|
| 239 | Since document IDs in postlists must be strictly monotonically increasing, we |
|---|
| 240 | encode <tt>(docid - lastdocid - 1)</tt> so that small |
|---|
| 241 | differences between large document IDs can still be encoded compactly. The |
|---|
| 242 | first document ID is encoded as its true value - 1 (since document IDs are always > 0). |
|---|
| 243 | </p> |
|---|
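<p>
The document ID delta scheme differs from the position scheme only in how the
first item is handled: encoding the first ID as its true value minus 1 is the
same as starting with a "last docid" of 0. The sketch below shows this on plain
integer lists (one value per <code>REPLY_POSTLISTITEM</code>); the function
names are invented for this example.
</p>

```python
def encode_docids(docids):
    """Turn strictly increasing document IDs (all > 0) into wire values.
    Starting from last = 0 makes the first value (docid - 1)."""
    values = []
    last = 0
    for docid in docids:
        if docid <= last:
            raise ValueError("document IDs must be > 0 and strictly increasing")
        values.append(docid - last - 1)
        last = docid
    return values


def decode_docids(values):
    """Inverse of encode_docids."""
    docids = []
    last = 0
    for value in values:
        last = last + value + 1
        docids.append(last)
    return docids


if __name__ == "__main__":
    print(encode_docids([7, 8, 20]))
```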
| 244 | |
|---|
| 245 | <h2>Shut Down</h2> |
|---|
| 246 | |
|---|
| 247 | <ul> |
|---|
| 248 | <li> <code>MSG_SHUTDOWN</code> |
|---|
| 249 | </ul> |
|---|
| 250 | |
|---|
| 251 | <p> |
|---|
| 252 | No reply is sent - this message signals that the client has ended the session. |
|---|
| 253 | </p> |
|---|
| 254 | |
|---|
| 255 | <h2>Update</h2> |
|---|
| 256 | |
|---|
| 257 | <ul> |
|---|
| 258 | <li> <code>MSG_UPDATE</code> |
|---|
| 259 | <li> <code>REPLY_UPDATE I<db doc count> I<last docid> B<has positions?> F<db average length></code> |
|---|
| 260 | </ul> |
|---|
| 261 | |
|---|
| 262 | <p> |
|---|
| 263 | This is only useful for a <code>WritableDatabase</code> (since the same statistics |
|---|
| 264 | are sent when the connection is initiated in the <code>REPLY_GREETING</code> |
|---|
| 265 | and they don't change if the database can't change). |
|---|
| 266 | </p> |
|---|
| 267 | |
|---|
| 268 | <h2>Add document</h2> |
|---|
| 269 | |
|---|
| 270 | <ul> |
|---|
| 271 | <li> <code>MSG_ADDDOCUMENT <serialised Xapian::Document object></code> |
|---|
| 272 | <li> <code>REPLY_ADDDOCUMENT I<document id></code> |
|---|
| 273 | </ul> |
|---|
| 274 | |
|---|
| 275 | <h2>Delete document</h2> |
|---|
| 276 | |
|---|
| 277 | <ul> |
|---|
| 278 | <li> <code>MSG_DELETEDOCUMENT I<document id></code> |
|---|
| 279 | <li> <code>REPLY_DONE</code> |
|---|
| 280 | </ul> |
|---|
| 281 | |
|---|
| 282 | <h2>Delete document (for compatibility with clients using protocols < 30.2)</h2> |
|---|
| 283 | |
|---|
| 284 | <ul> |
|---|
| 285 | <li> <code>MSG_DELETEDOCUMENT_PRE_30_2 I<document id></code> |
|---|
| 286 | </ul> |
|---|
| 287 | |
|---|
| 288 | <h2>Delete document by term</h2> |
|---|
| 289 | |
|---|
| 290 | <ul> |
|---|
| 291 | <li> <code>MSG_DELETEDOCUMENTTERM <term name></code> |
|---|
| 292 | </ul> |
|---|
| 293 | |
|---|
| 294 | <h2>Replace document</h2> |
|---|
| 295 | |
|---|
| 296 | <ul> |
|---|
| 297 | <li> <code>MSG_REPLACEDOCUMENT I<document id> <serialised Xapian::Document object></code> |
|---|
| 298 | </ul> |
|---|
| 299 | |
|---|
| 300 | <h2>Replace document by term</h2> |
|---|
| 301 | |
|---|
| 302 | <ul> |
|---|
| 303 | <li> <code>MSG_REPLACEDOCUMENTTERM L<term name> <serialised Xapian::Document object></code> |
|---|
| 304 | </ul> |
|---|
| 305 | |
|---|
| 306 | <h2>Cancel</h2> |
|---|
| 307 | |
|---|
| 308 | <ul> |
|---|
| 309 | <li> <code>MSG_CANCEL</code> |
|---|
| 310 | </ul> |
|---|
| 311 | |
|---|
| 312 | <h2>Flush</h2> |
|---|
| 313 | |
|---|
| 314 | <ul> |
|---|
| 315 | <li> <code>MSG_FLUSH</code> |
|---|
| 316 | <li> <code>REPLY_DONE</code> |
|---|
| 317 | </ul> |
|---|
| 318 | |
|---|
| 319 | </body> |
|---|
| 320 | </html> |
|---|