Phrasebook

Introduction

Newcomers to Xapian have occasionally (okay, quite often) reported being confused by some of its terminology. The Glossary in the Xapian documentation is essential reading for such users, but it doesn't provide "translations" from the terminology of other related technologies to the Xapian equivalents. This Phrasebook provides some such translations: for example, the Xapian equivalents of some relational database terms.

Note that the "translations" described in the following document are not generally exact - for example, Xapian is not actually a relational database, so the concepts do not match precisely.

Relational Databases

The following list contains terminology used in relational databases, and descriptions of related Xapian terminology.

Database

A relational database is a complex, customisable, structured entity, usually consisting of many tables and other items (such as indexes). Xapian has no direct equivalent: a Xapian Database is a simpler entity with a fixed structure. However, like a relational database, a Xapian Database is the main store that information is placed in.

Table

A table in a relational database is probably most similar to a Xapian Database: a table holds a set of rows, and a Xapian Database holds a set of documents.

Row

A row holds a single value for each column in a table. Similarly, a Xapian Document is an object, stored in a Database, that holds a set of data.

However, unlike a row, a Xapian Document contains three separate types of data:

  • Document Data (which is just a binary blob, set to an arbitrary value by the user)
  • Terms (which are used for performing searches, and roughly correspond to words in the text)
  • Values (which are used for performing special operations, such as sorting)
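
The three kinds of data can be pictured with a small model. This is only an illustrative sketch in plain C++ with no Xapian dependency: DocumentModel, matches() and make_example_document() are hypothetical names, and the sample contents are invented. In Xapian itself the corresponding data is set on a Xapian::Document with set_data(), add_term() and add_value().

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a Xapian document; NOT the real Xapian::Document.
struct DocumentModel {
    std::string data;                        // opaque blob, never interpreted by the engine
    std::set<std::string> terms;             // what searches are matched against
    std::map<unsigned, std::string> values;  // numbered slots, e.g. a key to sort by

    // A document can be found by a term only if it was indexed with that term.
    bool matches(const std::string& term) const {
        return terms.count(term) > 0;
    }
};

// Build one example document (all contents are made up for illustration).
DocumentModel make_example_document() {
    DocumentModel doc;
    doc.data = "{\"title\": \"Xapian phrasebook\"}";  // anything the application wants back
    doc.terms.insert("xapian");
    doc.terms.insert("phrasebook");
    doc.values[0] = "2009-01-15";                     // slot 0 holds a date to sort by
    return doc;
}
```

Note that, unlike the columns of a row, the document data plays no part in searching in this model: only the terms decide whether the document matches.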

Column

Unique

In relational databases, Unique has fairly complicated semantics, particularly where joins are involved. We don't need to go into these semantics here, because Xapian does not have the concept of a "join": each result returned corresponds to a single document, and that document will only ever be returned once in a result set.

However, sometimes multiple Xapian documents will be produced for a set of related resources (e.g., pages for a particular website, or pages from a single PDF document), and in this situation it can be desirable to return only one of the documents for each resource. In this case, the Xapian "Collapse" feature provides the ability to return only one document for each resource.
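
The effect of collapsing can be sketched in plain C++, without using Xapian. This is a simplified model, not Xapian's implementation: Hit and collapse() are hypothetical names, and it assumes the results are already ranked best first and that each one carries a key identifying the resource it belongs to. In Xapian itself, collapsing is requested with Enquire::set_collapse_key(), with the key stored in a document value.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical result record; not a Xapian type.
struct Hit {
    std::string url;       // identifies the individual document
    std::string resource;  // collapse key, e.g. the website or PDF the page came from
};

// Keep only the first (best-ranked) hit for each collapse key.
// Assumes `ranked` is already ordered with the best match first.
std::vector<Hit> collapse(const std::vector<Hit>& ranked) {
    std::set<std::string> seen;
    std::vector<Hit> kept;
    for (const Hit& hit : ranked) {
        // insert() reports whether this key was newly added.
        if (seen.insert(hit.resource).second) {
            kept.push_back(hit);
        }
    }
    return kept;
}
```

Given two pages from one site followed by a page from another, this returns one hit per site, preserving the original ranking of the hits that remain.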

Group By

Sort

Normally, Xapian sorts its result set by relevance, with the most relevant documents appearing first. However, it is also possible to sort in other orders; this can be done by storing a value associated with each document which you want to sort by, and using Enquire::set_sort_by_value_then_relevance() to sort by the value. The result set will then be sorted by the value; any documents in the result set which have the same value will be sorted by relevance.
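
The resulting order can be illustrated with a small model in plain C++ that does not use Xapian. Match and sort_by_value_then_relevance() here are hypothetical names, the relevance scores are invented, and the sketch assumes ascending order on the value, compared as a string, with higher relevance first among equal values.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical match record; not a Xapian type.
struct Match {
    unsigned docid;
    std::string sort_value;  // the stored value, compared as a string
    double relevance;        // higher means more relevant
};

// Order by value first; documents with equal values are ordered by relevance.
std::vector<Match> sort_by_value_then_relevance(std::vector<Match> matches) {
    std::stable_sort(matches.begin(), matches.end(),
                     [](const Match& a, const Match& b) {
                         if (a.sort_value != b.sort_value) {
                             return a.sort_value < b.sort_value;  // primary key: the value
                         }
                         return a.relevance > b.relevance;        // tie-break: relevance
                     });
    return matches;
}
```

Because the comparison in this model is on strings, a numeric value would need an encoding whose string order matches its numeric order (zero-padding, for instance) to sort as expected.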

Currently you can only sort by one value at a time. This roughly corresponds to being able to do "ORDER BY foo" but not "ORDER BY foo, bar". However, this restriction is expected to be removed in future.