==============
Omega overview
==============

If you just want a very quick overview, you might prefer to read the
`quick-start guide <quickstart.html>`_.

Omega operates on a set of databases. Each database is created and updated
separately using either omindex or `scriptindex <scriptindex.html>`_. You can
search these databases (or any other Xapian database with suitable contents)
via a web front-end provided by omega, a CGI application. A search can also be
done over more than one database at once.

There are separate documents covering `CGI parameters <cgiparams.html>`_, the
`Term Prefixes <termprefixes.html>`_ which are conventionally used, and
`OmegaScript <omegascript.html>`_, the language used to define omega's web
interface. Omega ships with several OmegaScript templates and you can
use these, modify them, or just write your own. See the "Supplied Templates"
section below for details of the supplied templates.

Omega parses queries using the ``Xapian::QueryParser`` class. For the supported
syntax, see queryparser.html in the xapian-core documentation, which is also
available online at: http://www.xapian.org/docs/queryparser.html

Term construction
=================

Documents within an omega database are stored with two types of terms:
those used for probabilistic searching (the CGI parameter 'P'), and
those used for boolean filtering (the CGI parameter 'B'). Boolean
terms start with an initial capital letter denoting the 'group' of the
term (e.g. 'T' for MIME type), while probabilistic terms are all
lower-case, and are also stemmed before being added to the
database.

The "english" stemmer is used by default. You can configure this for omindex
and scriptindex with "--stemmer LANGUAGE" (use 'none' to disable stemming; see
omindex --help for the list of accepted language names). At search time you
can configure the stemmer by adding $set{stemmer,LANGUAGE} to the top of your
OmegaScript template.

The two term types are used as follows when building the query:
B(oolean) terms with the same prefix are ORed together, with all the
different prefix groups being ANDed together. This is then FILTERed
against the P(robabilistic) terms. This will look something like::

                  [ FILTER ]
                   /      \
                  /        \
             P-terms   [     AND     ]
                        /    | ...   \
                       /
               [     OR     ]
                /    | ...   \
           B(F,1)  B(F,2) ... B(F,n)

Where B(F,1) is the first boolean term with prefix F, and so on.

The intent here is to allow filtering on arbitrary (and, typically,
orthogonal) characteristics of the document. For instance, by adding
boolean terms "Ttext/html", "Ttext/plain" and "P/press" you would be
filtering the probabilistic search for only documents that are both in
the "/press" site *and* which are either of MIME type text/html or
text/plain. (See below for more information about sites.)

If there is no probabilistic query, the boolean filter is promoted to
be the query, and the weighting scheme is set to boolean. This has
the effect of applying the boolean filter to the whole database.
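
The query shape described above can be sketched in Python as nested tuples.
This is an illustration rather than Omega's code: it assumes the group of a
boolean term is simply its first character (as described above), reuses the
operator names from the diagram, and returns a label for the weighting scheme.
``build_query`` and ``boolean_filter`` are hypothetical names::

    from itertools import groupby

    def boolean_filter(b_terms):
        """OR together terms sharing a prefix, then AND the groups together."""
        groups = []
        # Sorting makes terms with the same first character adjacent for groupby.
        for _prefix, members in groupby(sorted(set(b_terms)), key=lambda t: t[0]):
            members = list(members)
            groups.append(members[0] if len(members) == 1 else ("OR", *members))
        if not groups:
            return None
        return groups[0] if len(groups) == 1 else ("AND", *groups)

    def build_query(p_query, b_terms):
        """Return (query, weighting) for a probabilistic query and boolean terms."""
        bfilter = boolean_filter(b_terms)
        if p_query is None:
            # No probabilistic query: the filter is promoted to be the query.
            return bfilter, "boolean"
        if bfilter is None:
            return p_query, "probabilistic"
        return ("FILTER", p_query, bfilter), "probabilistic"

    query, weighting = build_query("big release", ["Ttext/html", "Ttext/plain", "P/press"])
    # query == ("FILTER", "big release",
    #           ("AND", "P/press", ("OR", "Ttext/html", "Ttext/plain")))

With the Xapian API itself, the same structure would be built from
``Xapian::Query`` objects combined with ``OP_OR``, ``OP_AND`` and ``OP_FILTER``.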

In order to add more boolean prefixes, you will need to alter the
``index_file()`` function in omindex.cc. Currently omindex adds several
useful ones, detailed below.

Probabilistic terms are constructed from the title, body and keywords
of a document. (Not all document types support all three areas of
text.) Title terms are stored with position data starting at 0, body
terms starting 100 beyond title terms, and keyword terms starting 100
beyond body terms. This allows queries using positional data without
causing false matches across the different types of term.
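
To make the position gap concrete, here is a minimal Python sketch (not
omindex's actual code). It assumes each section continues from the previous
section's next free position plus a gap of 100, splits text on whitespace, and
does no stemming; ``assign_positions`` is a hypothetical name::

    GAP = 100  # gap left between title, body and keywords (per the text above)

    def assign_positions(title, body, keywords):
        """Return (word, position, section) tuples for a document's three text areas."""
        postings = []
        pos = 0
        for section, text in (("title", title), ("body", body), ("keywords", keywords)):
            for word in text.lower().split():  # no stemming in this sketch
                postings.append((word, pos, section))
                pos += 1
            pos += GAP  # jump ahead so phrases cannot span two sections

        return postings

    postings = assign_positions("Big Release", "Press news", "launch")

With these inputs "release" (the last title word) lands at position 1 and
"press" (the first body word) at position 102, so a phrase query for
"release press" cannot match across the boundary.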

Sites
=====

Within a database, Omega supports multiple sites. These are recorded
using boolean terms (see 'Term construction', above) to allow
filtering on them.

Sites work by having all documents within them share a common base
URL. For instance, you might have two sites, one for your press area
and one for your product descriptions:

- \http://example.com/press/index.html
- \http://example.com/press/bigrelease.html
- \http://example.com/products/bigproduct.html
- \http://example.com/products/littleproduct.html

You could index all documents within \http://example.com/press/ using a
site of '/press', and all within \http://example.com/products/ using
'/products'.

Sites are also useful because omindex indexes documents through the
file system, not by fetching from the web server. If you don't have a
URL to file system mapping which puts all documents under one
hierarchy, you'll need to index each separate section as a site.

An obvious example of this is the way that many web servers map URLs
of the form \http://example.com/~<username>/ to a directory within
that user's home directory (such as ~<username>/pub on a Unix
system). In this case, you can index each user's home page separately,
as a site of the form '/~<username>'. You can then use boolean
filters to allow people to search only a specific home page (or a
group of them), or omit such terms to search everyone's pages.

Note that the site specified when you index is used to build the
complete URL that the results page links to. Thus while sites will
typically be relative to the hostname part of the URL (e.g.
'/site' rather than '\http://example.com/site'), you can use them
to have a single search across several different hostnames. This will
still work if you actually store each distinct hostname in a different
database.

omindex operation
=================

omindex is fairly simple to use, for example::

    omindex --db default --url http://example.com/ /var/www/example.com

For a full list of command line options supported, see ``man omindex``
or ``omindex --help``.

You *must* specify the database to index into (it's created if it doesn't
exist, but parent directories must exist). You will often also want to specify
the base URL, which is used as the site and can be either relative to the
hostname (starting with '/') or absolute (starting with a scheme, e.g.
'\http://example.com/products/'). If not specified, the base URL defaults to
``/``.

You also need to tell omindex which directory to index. This should be
either a single directory (in which case it is taken to be the
directory base of the entire site being indexed), or two arguments,
the first being the directory base of the site being indexed, and the
second being a relative directory within that to index.

For instance, in the example above, if you separate your products by
size, you might end up with:

- \http://example.com/press/index.html
- \http://example.com/press/bigrelease.html
- \http://example.com/products/large/bigproduct.html
- \http://example.com/products/small/littleproduct.html

If the entire website is stored in the file system under the directory
/www/example, then you would probably index the site in two
passes, one for the '/press' site and one for the '/products' site. You
might use the following commands::

    $ omindex -p --db /var/lib/omega/data/default --url /press /www/example/press
    $ omindex -p --db /var/lib/omega/data/default --url /products /www/example/products

If you add new large products, but don't want to reindex the whole of
the products section, you could do::

    $ omindex -p --db /var/lib/omega/data/default --url /products /www/example/products large

and just the large products will be reindexed. You need to do it like that, and
not as::

    $ omindex -p --db /var/lib/omega/data/default --url /products/large /www/example/products/large

because that would make the large products part of a new site,
'/products/large', which is unlikely to be what you want, as large
products would no longer come up in a search of the products
site. (Note that the ``--depth-limit`` option may come in handy if you have
sites '/products' and '/products/large', or similar.)
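
The difference between the last two commands can be illustrated with a small
Python sketch. It assumes, as the text above implies, that a document's URL is
the site followed by the file's path relative to the directory base, and that a
site given relative to the hostname is recorded as a 'P' boolean term.
``site_and_url`` is a hypothetical helper, not part of omindex::

    import posixpath

    def site_and_url(site, base_dir, file_path):
        """Return the (site term, document URL) pair for file_path under base_dir."""
        rel = posixpath.relpath(file_path, base_dir)  # path below the directory base
        return "P" + site, site.rstrip("/") + "/" + rel

    page = "/www/example/products/large/bigproduct.html"
    # Two-argument form: directory base /www/example/products, site /products.
    print(site_and_url("/products", "/www/example/products", page))
    # Indexing the subdirectory as a site of its own.
    print(site_and_url("/products/large", "/www/example/products/large", page))

Both calls produce the URL /products/large/bigproduct.html, but only the first
keeps the page in the '/products' site, so a filter on that site still finds it.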

omindex has built-in support for indexing HTML, PHP, text files, and AbiWord
documents. It can also index a number of other formats using external
programs. Filter programs are run with CPU and memory limits to prevent a
runaway filter from blocking indexing of other files.

The following formats are currently supported (if you know of a reliable
filter which can extract text from another useful file format, please let us
know):

* HTML (.html, .htm, .shtml)
* PHP (.php) - our HTML parser knows to ignore PHP code
* text files (.txt, .text)
* PDF (.pdf) if pdftotext is available (comes with xpdf)
* PostScript (.ps, .eps, .ai) if ps2pdf (from ghostscript) and pdftotext (comes
  with xpdf) are available
* OpenOffice/StarOffice documents (.sxc, .stc, .sxd, .std, .sxi, .sti, .sxm,
  .sxw, .sxg, .stw) if unzip is available
* OpenDocument format documents (.odt, .ods, .odp, .odg, .odc, .odf, .odb,
  .odi, .odm, .ott, .ots, .otp, .otg, .otc, .otf, .oti, .oth) if unzip is
  available
* MS Word documents (.docx) and (.doc, .dot) if antiword is available
* MS Excel documents (.xlsx) and (.xls, .xlb, .xlt) if xls2csv is available (comes with catdoc)
* MS Powerpoint documents (.pptx) and (.ppt, .pps) if catppt is available (comes with catdoc)
* Wordperfect documents (.wpd) if wpd2text is available (comes with libwpd)
* MS Works documents (.wps, .wpt) if wps2text is available (comes with libwps)
* AbiWord documents (.abw)
* Compressed AbiWord documents (.zabw) if gzip is available
* Rich Text Format documents (.rtf) if unrtf is available
* Perl POD documentation (.pl, .pm, .pod) if pod2text is available
* TeX DVI files (.dvi) if catdvi is available
* DjVu files (.djv, .djvu) if djvutxt is available

If you have additional extensions that represent one of these types, you need
to add an additional MIME mapping using the ``--mime-type`` option. For instance::

    $ omindex --db /var/lib/omega/data/default --url /press /www/example/press --mime-type doc:application/postscript

The syntax of ``--mime-type`` is 'ext:type', where ext is the extension of
a file of that type (everything after the last '.'), and type is one
of:

- text/html
- text/plain
- text/rtf
- text/x-perl
- application/msword
- application/pdf
- application/postscript
- application/vnd.ms-excel
- application/vnd.ms-powerpoint
- application/vnd.ms-works
- application/vnd.oasis.opendocument.text
- application/vnd.oasis.opendocument.spreadsheet
- application/vnd.oasis.opendocument.presentation
- application/vnd.oasis.opendocument.graphics
- application/vnd.oasis.opendocument.chart
- application/vnd.oasis.opendocument.formula
- application/vnd.oasis.opendocument.database
- application/vnd.oasis.opendocument.image
- application/vnd.oasis.opendocument.text-master
- application/vnd.oasis.opendocument.text-template
- application/vnd.oasis.opendocument.spreadsheet-template
- application/vnd.oasis.opendocument.presentation-template
- application/vnd.oasis.opendocument.graphics-template
- application/vnd.oasis.opendocument.chart-template
- application/vnd.oasis.opendocument.formula-template
- application/vnd.oasis.opendocument.image-template
- application/vnd.oasis.opendocument.text-web
- application/vnd.sun.xml.calc
- application/vnd.sun.xml.calc.template
- application/vnd.sun.xml.draw
- application/vnd.sun.xml.draw.template
- application/vnd.sun.xml.impress
- application/vnd.sun.xml.impress.template
- application/vnd.sun.xml.math
- application/vnd.sun.xml.writer
- application/vnd.sun.xml.writer.global
- application/vnd.sun.xml.writer.template
- application/vnd.wordperfect
- application/x-abiword
- application/x-abiword-compressed
- application/x-dvi
- image/vnd.djvu

If you wish to remove a MIME mapping, you can do this by omitting the type.
For example, to not index .doc files, use: ``--mime-type doc:``

The lookup of extensions in the MIME mappings is case sensitive, but if an
extension isn't found and includes upper case ASCII letters, they're converted
to lower case and the lookup is repeated, so you effectively get case
insensitive lookup for mappings specified with a lower-case extension, but
you can set different handling for differently cased variants if you need
to.
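
A minimal Python sketch of the lookup rules just described, including the
'ext:type' option syntax and removal by an empty type. It is not omindex's
implementation; the function names and the small starting table are
hypothetical::

    def apply_mime_option(mappings, option):
        """Apply one --mime-type 'ext:type' argument; an empty type removes the mapping."""
        ext, sep, mime_type = option.partition(":")
        if not sep:
            raise ValueError("expected 'ext:type', got %r" % option)
        if mime_type:
            mappings[ext] = mime_type
        else:
            mappings.pop(ext, None)  # e.g. 'doc:' stops .doc files being indexed

    def ascii_lower(s):
        """Lower-case only ASCII A-Z, leaving other characters untouched."""
        return "".join(c.lower() if "A" <= c <= "Z" else c for c in s)

    def lookup_mime_type(mappings, filename):
        """Return the MIME type for filename, or None if it won't be indexed."""
        if "." not in filename:
            return None
        ext = filename.rsplit(".", 1)[1]  # everything after the last '.'
        if ext in mappings:  # case-sensitive lookup first
            return mappings[ext]
        lowered = ascii_lower(ext)
        if lowered != ext:  # retry only if there were upper case ASCII letters
            return mappings.get(lowered)
        return None

    mappings = {"html": "text/html", "pdf": "application/pdf"}
    apply_mime_option(mappings, "DOC:text/plain")
    apply_mime_option(mappings, "doc:application/msword")
    print(lookup_mime_type(mappings, "report.PDF"))  # application/pdf (via 'pdf')
    print(lookup_mime_type(mappings, "notes.DOC"))   # text/plain (exact 'DOC')
    print(lookup_mime_type(mappings, "letter.Doc"))  # application/msword (via 'doc')

The exact-case entry for 'DOC' wins over the lower-case fallback, which is how
differently cased variants can be given different handling.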

``--duplicates`` configures how omindex handles duplicates (detected on
URL). 'ignore' means to ignore a document if it already appears to be
in the database; 'replace' means to replace the document in the
database with a new one by indexing this file, and 'duplicate' means
to index this file as a new document, leaving the previous one in the
database as well. The last strategy is very fast, but is liable to do
strange things to your result set. In general, 'ignore' is useful for
completely static documents (e.g. archive sites), while 'replace' is
the most generally useful.

With 'replace', omindex will remove any document it finds in the
database that it did not update - in other words, it will clear out
everything that doesn't exist any more. However, if you are building up
an omega database with several runs of omindex, this is not
appropriate (as each run would delete the data from the previous run),
so you should use the ``--preserve-nonduplicates`` option. Note that if you
choose to work like this, it is impossible to prune old documents from
the database using omindex. If this is a problem for you, an
alternative is to index each subsite into a different database, and
merge all the databases together when searching.
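
The three ``--duplicates`` strategies, and the pruning that 'replace' does
unless ``--preserve-nonduplicates`` is given, can be modelled in a few lines of
Python. This is a simplified sketch, not omindex's code: it treats the database
as a list of (url, text) pairs, and ``index_run`` is a hypothetical name::

    def index_run(database, files, duplicates="replace", preserve_nonduplicates=False):
        """Simulate one omindex run.

        database: list of (url, text) documents already indexed.
        files: dict mapping url -> text for every file found in this run.
        """
        if duplicates not in ("ignore", "replace", "duplicate"):
            raise ValueError("unknown --duplicates value: %r" % duplicates)
        docs = list(database)
        seen = set()
        for url, text in files.items():
            seen.add(url)
            existing = [i for i, (u, _) in enumerate(docs) if u == url]
            if existing and duplicates == "ignore":
                continue  # keep the old document untouched
            if existing and duplicates == "replace":
                docs[existing[0]] = (url, text)
                continue
            docs.append((url, text))  # new URL, or duplicates == "duplicate"
        if duplicates == "replace" and not preserve_nonduplicates:
            # Remove documents this run did not update: they no longer exist.
            docs = [(u, t) for (u, t) in docs if u in seen]
        return docs

    old = [("/press/a.html", "old a"), ("/products/b.html", "old b")]
    found = {"/press/a.html": "new a"}
    print(index_run(old, found))                               # /products/b.html pruned
    print(index_run(old, found, preserve_nonduplicates=True))  # /products/b.html kept

The second call shows why ``--preserve-nonduplicates`` is needed when a
database is built from several runs, each covering a different site.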

``--depth-limit`` allows you to prevent omindex from descending more than
a certain number of directories. If you wish to replicate the old
``--no-recurse`` option, use ``--depth-limit=1``.

HTML Parsing
============

The document ``<title>`` tag is used as the document title, the 'description'
META tag (if present) is used for the document snippet, and the 'keywords'
META tag (if present) is indexed as extra document text.

The HTML parser will look for the 'robots' META tag, and won't index pages
which are marked as ``noindex`` or ``none``, for example any of the following::

    <meta name="robots" content="noindex,nofollow">
    <meta name="robots" content="noindex">
    <meta name="robots" content="none">
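
As an illustration of this check, here is a minimal sketch using Python's
standard ``html.parser`` module. It is not Omega's own HTML parser; it simply
treats the 'content' attribute as a comma-separated list and skips the page if
``noindex`` or ``none`` appears in it::

    from html.parser import HTMLParser

    class RobotsMetaCheck(HTMLParser):
        """Record whether a robots META tag asks for the page not to be indexed."""

        def __init__(self):
            super().__init__()
            self.noindex = False

        def handle_starttag(self, tag, attrs):
            # HTMLParser already lower-cases tag and attribute names.
            if tag != "meta":
                return
            attributes = {name: (value or "") for name, value in attrs}
            if attributes.get("name", "").lower() != "robots":
                return
            directives = {d.strip().lower() for d in attributes.get("content", "").split(",")}
            if directives & {"noindex", "none"}:
                self.noindex = True

    def should_index(html):
        checker = RobotsMetaCheck()
        checker.feed(html)
        checker.close()
        return not checker.noindex

    print(should_index('<meta name="robots" content="noindex,nofollow">'))  # False
    print(should_index('<meta name="robots" content="index,follow">'))      # True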

The parser also understands ht://dig comments to mark sections of the document
not to index (for example, you can use this to avoid indexing navigation links
or standard headers/footers), for example::

    Index this bit <!--htdig_noindex-->but <b>not</b> this<!--/htdig_noindex-->
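
The effect on the indexed text can be approximated with a regular expression.
This Python sketch is only an illustration (Omega handles these comments inside
its own HTML parser): it removes each marked section, matches the markers
case-sensitively, and leaves any unterminated section in place::

    import re

    # Non-greedy match so that each marked section is removed separately.
    HTDIG_NOINDEX = re.compile(r"<!--htdig_noindex-->.*?<!--/htdig_noindex-->", re.DOTALL)

    def strip_noindex_sections(html):
        return HTDIG_NOINDEX.sub(" ", html)

    text = "Index this bit <!--htdig_noindex-->but <b>not</b> this<!--/htdig_noindex-->"
    print(strip_noindex_sections(text).split())  # ['Index', 'this', 'bit']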

Boolean terms
=============

omindex will create the following boolean terms when it indexes a
document:

T
    MIME type
H
    hostname of site (if supplied - this term won't exist if you index a
    site with base URL '/press', for instance)
P
    path of site (i.e. the rest of the site base URL)
U
    full URL of indexed document - if the resulting term would be > 240
    characters, a hashing scheme is used to prevent omindex overflowing
    the Xapian term length limit.
D
    date (numeric format: YYYYMMDD)

    The date can also have the magical form "latest" - a document indexed
    by the term Dlatest matches any date-range without an end date.
    You can index dynamic documents which are always up to date
    with Dlatest and they'll match as expected. (If you sort by date,
    you'll probably also want to set the value containing the timestamp to
    a "max" value so dynamic documents match a date in the far future.)
M
    month (numeric format: YYYYMM)
Y
    year (four digits)
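
As a rough illustration of the list above, the following Python sketch derives
these terms for one file. It is not omindex's code: it assumes the date is the
file's modification date, takes the hostname from the base URL's network
location (which would include any port), builds the URL by joining the base URL
and the file's relative path, and does not reproduce the hashing used for long
'U' terms or the special ``Dlatest`` form. ``boolean_terms_for`` is a
hypothetical name::

    from datetime import date
    from urllib.parse import urlsplit

    def boolean_terms_for(base_url, rel_path, mime_type, modified):
        """Return omindex-style boolean terms for one document (simplified)."""
        parts = urlsplit(base_url)
        url = base_url.rstrip("/") + "/" + rel_path.lstrip("/")
        terms = ["T" + mime_type]
        if parts.netloc:
            # Only an absolute base URL has a hostname part.
            terms.append("H" + parts.netloc)
        terms.append("P" + parts.path)
        terms.append("U" + url)  # long URLs are hashed by omindex; not modelled here
        terms.append("D" + modified.strftime("%Y%m%d"))
        terms.append("M" + modified.strftime("%Y%m"))
        terms.append("Y" + modified.strftime("%Y"))
        return terms

    print(boolean_terms_for("/press", "index.html", "text/html", date(2008, 3, 14)))
    # ['Ttext/html', 'P/press', 'U/press/index.html', 'D20080314', 'M200803', 'Y2008']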

omega configuration
===================

Most of the omega CGI configuration is dynamic, set by CGI
parameters. However, some things must be configured using a
configuration file. The configuration file is searched for in
various locations:

- Firstly, if the "OMEGA_CONFIG_FILE" environment variable is
  set, its value is used as the full path to a configuration file
  to read.
- Next (if the environment variable is not set, or the file pointed
  to is not present), the file "omega.conf" in the same directory as
  the Omega CGI is used.
- Next (if neither of the previous steps found a file), the file
  "${sysconfdir}/omega.conf" (e.g. /etc/omega.conf on Linux systems)
  is used.
- Finally, if no configuration file is found, default values are used.

The format of the file is very simple: a line per option, with the
option name followed by its value, separated by whitespace. Blank
lines are ignored. If the first non-whitespace character on a line
is a '#', omega treats the line as a comment and ignores it.

The current options are 'database_dir' (the directory containing all the
Omega databases), 'template_dir' (the directory containing the OmegaScript
templates), and 'log_dir' (the directory which the OmegaScript $log command
writes log files to).

The default values (used if no configuration file is found) are::

    database_dir /var/lib/omega/data
    template_dir /var/lib/omega/templates
    log_dir /var/log/omega
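
The search order and file format can be summarised in a short Python sketch.
It only mirrors the rules stated above and is not Omega's code; the function
names and the ``sysconfdir`` default of /etc are assumptions, as is the choice
that options missing from a configuration file keep their default values::

    import os

    DEFAULTS = {
        "database_dir": "/var/lib/omega/data",
        "template_dir": "/var/lib/omega/templates",
        "log_dir": "/var/log/omega",
    }

    def find_config_file(cgi_dir, sysconfdir="/etc", environ=os.environ):
        """Return the first configuration file found, or None to use the defaults."""
        candidates = []
        if environ.get("OMEGA_CONFIG_FILE"):
            candidates.append(environ["OMEGA_CONFIG_FILE"])
        candidates.append(os.path.join(cgi_dir, "omega.conf"))
        candidates.append(os.path.join(sysconfdir, "omega.conf"))
        for path in candidates:
            if os.path.isfile(path):
                return path
        return None

    def parse_config(text):
        """Parse 'name value' lines, skipping blank lines and '#' comments."""
        settings = dict(DEFAULTS)
        for line in text.splitlines():
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue
            fields = stripped.split(None, 1)  # name, then the rest as the value
            settings[fields[0]] = fields[1] if len(fields) > 1 else ""
        return settings

    conf = parse_config("# local settings\n\ndatabase_dir /srv/omega/data\n")
    print(conf["database_dir"], conf["log_dir"])  # /srv/omega/data /var/log/omega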

Note that, with Apache, environment variables may be set using mod_env, and
with Apache 1.3.7 or later this may be used inside a .htaccess file. This
makes it reasonably easy to share a single system-installed copy of Omega
between multiple users.

Supplied Templates
==================

The OmegaScript templates supplied with Omega are:

* query - This is the default template, providing a typical Web search
  interface.
* topterms - This is just like query, but provides a "top terms" feature
  which suggests terms the user might want to add to their query to
  obtain better results.
* godmode - Allows you to inspect a database showing which terms index
  each document, and which documents are indexed by each term.
* opensearch - Provides results in OpenSearch format (for more details
  see http://www.opensearch.org/).
* xml - Provides results in a custom XML format.

There are also "helper fragments" used by the templates above:

* inc/anyalldropbox - Provides a choice of matching "any" or "all" terms
  by default as a drop down box.
* inc/anyallradio - Provides a choice of matching "any" or "all" terms
  by default as radio buttons.
* toptermsjs - Provides some JavaScript used by the topterms template.

Document data construction
==========================

This is only useful if you need to inject your own documents into the
database independently of omindex, such as if you are indexing
dynamically-generated documents that are served using a server-side
system such as PHP or ASP, but which you can determine the contents of
in some way, such as documents generated from reasonably static
database contents.

The document data field stores some summary information about the
document, in the following (sample) format::

    url=<baseurl>
    sample=<sample>
    caption=<title>
    type=<mimetype>

Further fields may be added (although omindex doesn't currently add any
others), and may be looked up from OmegaScript using the $field{}
command.
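
If you are injecting documents yourself, the data string is just
newline-separated ``name=value`` lines. A minimal Python sketch of building such
a string, and of reading a field back by name roughly as $field{} does, might
look like this. The helper names are hypothetical, and replacing newlines inside
values with spaces is an assumption made so that each field stays on one line::

    def make_document_data(fields):
        """Serialise (name, value) pairs as newline-separated name=value lines."""
        lines = []
        for name, value in fields:
            # Keep each field on a single line.
            lines.append("%s=%s" % (name, value.replace("\n", " ")))
        return "\n".join(lines)

    def get_field(data, name):
        """Return the value of the named field, or '' if it is absent."""
        for line in data.split("\n"):
            key, sep, value = line.partition("=")  # split on the first '=' only
            if sep and key == name:
                return value
        return ""

    data = make_document_data([
        ("url", "/products/large/bigproduct.html"),
        ("sample", "Our biggest product yet..."),
        ("caption", "Big Product"),
        ("type", "text/html"),
    ])
    print(get_field(data, "caption"))  # Big Product

With the Xapian API, this string is what you would store on the document using
``Xapian::Document::set_data()``.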

As of Omega 0.9.3, you can alternatively add something like this near the
start of your OmegaScript template::

    $set{fieldnames,$split{caption sample url}}

Then you need only give the field values in the document data, which can
save a lot of space in a large database. With the setting of fieldnames
above, the first line of document data can be accessed with $field{caption},
the second with $field{sample}, and the third with $field{url}.
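
The positional form drops the ``name=`` parts, so the data is just the values,
one per line, in the order given by fieldnames. A Python sketch of building such
data and resolving a name to its line, using hypothetical helper names and
assuming the same field order as the $set{fieldnames,...} example above::

    FIELDNAMES = ["caption", "sample", "url"]  # same order as the fieldnames setting

    def make_positional_data(values):
        """Store only the values, one per line, in FIELDNAMES order."""
        return "\n".join(values[name].replace("\n", " ") for name in FIELDNAMES)

    def get_positional_field(data, name):
        """Map a field name to its line number, roughly as fieldnames does."""
        lines = data.split("\n")
        index = FIELDNAMES.index(name)  # raises ValueError for an unknown name
        return lines[index] if index < len(lines) else ""

    data = make_positional_data({
        "caption": "Big Product",
        "sample": "Our biggest product yet...",
        "url": "/products/large/bigproduct.html",
    })
    print(get_positional_field(data, "url"))  # /products/large/bigproduct.html

Compared with the ``name=value`` form, each document saves the field names and
'=' signs, at the cost of every document having to use the same field order.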