wiki:GSoC2011/SupportLua/ProjectPlan

Project

This project aims to add Lua bindings to Xapian. Lua is a powerful, fast, lightweight, embeddable scripting language that is used in many industrial applications and games, so Lua support would allow Xapian to reach a wider range of users. Lua projects would benefit as well, since they could use Xapian as a highly adaptable search engine library.

Motivation and Benefits

Lua[4], written in C, is an embeddable scripting language designed to support general procedural programming with data description facilities. It also supports object-oriented programming, functional programming, and data-driven programming. It is powerful, fast, and lightweight. These features have made Lua widely used in industrial applications and the leading scripting language in games, so supporting Lua could allow Xapian to be used more widely in those areas.

Xapian[5], written in C++, is an Open Source Search Engine Library. It is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and a rich set of boolean query operators. These features make Xapian easy to extend and widely used, so Lua projects that need a highly adaptable search engine could benefit from a binding to Xapian. Since both communities could benefit from such a project, I would like to implement it.

Another reason for choosing Lua is that it is well supported by SWIG[6]. Most of Xapian's bindings to other languages are built with SWIG, a tool which connects programs written in C and C++ with a variety of high-level programming languages. With SWIG we do not need to write large numbers of trivial interface functions by hand. We only need to write some small SWIG interface files (with the .i suffix). SWIG takes these *.i files and the C/C++ source files as input and generates the wrapper files we need. Link [2] is the SWIG Lua documentation, and link [3] has some Xapian binding examples.

Project Detail

Xapian already has bindings for several languages, such as Python, Java and C#. These modules are good examples to learn from. Over the past few days I have checked out the source code with svn and installed the xapian-core and xapian-bindings libraries. I have run the binding examples and read the source to understand how Xapian works and how it interfaces to other languages. The following technologies are needed to do the project well.

1. Lua language

(a) Basic knowledge.

There are eight basic types in Lua: nil, boolean, number, string, function, userdata, thread, and table. In our project, the most important types are table and userdata.

A table is similar to an array in C or a dictionary in Python, with some differences. Every element in it is a key-value pair. Tables can contain values of all types (except nil). A field name can also be used as an index. For example, after x = {"hello", 4, f(y), z = 5}, we have x[1] = "hello", x[2] = 4, x[3] = the result of calling f(y), and x.z = 5 (which can also be written x["z"] = 5). This flexibility lets tables represent ordinary arrays, symbol tables, sets, records, graphs, trees, etc.

A metatable is an ordinary table that defines the behavior of a value under certain special operations. Every value in Lua can have a metatable. To change how an operation behaves on a value, we set specific fields in its metatable. For example, given two strings x = "hello" and y = "world", we can make x + y evaluate to "hello world" by setting the "__add" field of the string metatable to a function that joins its two operands. Metatables make it easy to expose C++ classes to Lua: to use a C++ class from Lua, we can put its members and methods into a metatable.

Userdata is another important type in Lua. It allows arbitrary C data to be stored in Lua variables. A userdata value corresponds to a block of raw memory and has no pre-defined operations in Lua. C/C++ pointers, structures and classes are all wrapped as Lua userdata, and we can define operations for userdata values by using metatables.

Lua raises an error with the error function and catches it with the pcall function. When a Lua function terminates with an error, a single error value is passed back to the caller. SWIG automatically maps a thrown value of any basic type to a Lua error, including numeric types, enums, chars, char*, std::string and std::exception. These exceptions are converted into an error string in Lua. If we want to throw an object as an exception to the Lua interpreter, we need to write the typemap ourselves. Since the error function accepts a value of any type, including userdata, a Xapian exception object can be typemapped into a userdata and then raised in Lua through the error function.
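The translation of thrown C++ values into an error string can be sketched in plain C++, without SWIG or Lua headers. This is only a model under stated assumptions: FakeXapianError, protected_call and open_missing_database are hypothetical names invented for the illustration, and protected_call merely imitates the (ok, message) result shape of pcall. It is not what the generated wrapper code looks like.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a Xapian exception class (not the real Xapian API).
class FakeXapianError : public std::runtime_error {
public:
    explicit FakeXapianError(const std::string& msg) : std::runtime_error(msg) {}
};

// Modelled on pcall: run `body` and report (ok, message) instead of letting
// a C++ exception escape into the caller.
std::pair<bool, std::string> protected_call(const std::function<void()>& body) {
    try {
        body();
        return std::make_pair(true, std::string());
    } catch (const std::exception& e) {
        // std::exception and its subclasses become an error string.
        return std::make_pair(false, std::string(e.what()));
    } catch (const std::string& s) {
        // A thrown std::string is used as the message directly.
        return std::make_pair(false, s);
    } catch (const char* s) {
        return std::make_pair(false, std::string(s));
    } catch (int n) {
        // Numeric values are converted to their textual form.
        return std::make_pair(false, std::to_string(n));
    }
}

// Hypothetical operation that fails the way a wrapped Xapian call might.
inline void open_missing_database() {
    throw FakeXapianError("DatabaseOpeningError: no such database");
}

// Hypothetical operation that succeeds.
inline void do_nothing() {}
```

Calling protected_call(open_missing_database) yields a false flag together with the exception's message, which is the information a Lua caller would receive from pcall when the exception is reported as a string. Passing the exception object itself through would need the custom typemap described above.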

(b) Lua C API. This API[7] describes the communication between Lua and C. Lua uses a virtual stack to pass values to and from C. Each element in this stack represents a Lua value (nil, number, string, etc.). Whenever Lua calls C, the called function gets a new stack. This stack initially contains the arguments to the C function, and it is where the C function pushes its results to be returned to the caller. It is more flexible than a conventional stack because any element can be accessed by its index. In our project, the following API functions are used most:

lua_istable(lua_State *L, int index): checks whether the value at the given index is a table.

lua_isstring(lua_State *L, int index): checks whether the value at the given index is a string (or a number, which is always convertible to a string).

lua_isuserdata(lua_State *L, int index): checks whether the value at the given index is a userdata.

lua_pushstring(lua_State *L, const char *s): pushes the zero-terminated string pointed to by s onto the stack.

lua_tolstring(lua_State *L, int index, size_t *len): converts the Lua value at the given index to a C string.

lua_error(lua_State *L): generates a Lua error, using the value on the top of the stack as the error message.
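The stack discipline described above can be illustrated with a toy model in plain C++. This is a sketch under stated assumptions and not the real Lua API: Value, VirtualStack and join_two_strings are hypothetical names, and only numbers and strings are modelled. It shows index-based access (positive indices count from the bottom, negative indices from the top) and the convention that a C function reads its arguments from the stack, pushes its results, and returns how many it pushed.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a Lua value; only the kinds needed here.
struct Value {
    enum Kind { Nil, Number, String };
    Kind kind;
    double number;
    std::string text;
    Value() : kind(Nil), number(0) {}
};

// Toy model of the virtual stack (an assumption, not the real implementation).
class VirtualStack {
public:
    void push_number(double n) {
        Value v;
        v.kind = Value::Number;
        v.number = n;
        slots_.push_back(v);
    }
    void push_string(const std::string& s) {
        Value v;
        v.kind = Value::String;
        v.text = s;
        slots_.push_back(v);
    }
    // Number of values currently on the stack.
    int top() const { return static_cast<int>(slots_.size()); }

    // Positive indices count from the bottom (1 is the first argument);
    // negative indices count from the top (-1 is the last value pushed).
    const Value& at(int index) const {
        int n = top();
        int pos = index > 0 ? index : n + index + 1;
        if (pos < 1 || pos > n) throw std::out_of_range("invalid stack index");
        return slots_[static_cast<std::size_t>(pos - 1)];
    }

    // Mirrors the lua_isstring rule: numbers count because they convert to strings.
    bool is_string(int index) const {
        Value::Kind k = at(index).kind;
        return k == Value::String || k == Value::Number;
    }

private:
    std::vector<Value> slots_;
};

// Shaped like a C function called from Lua: read the arguments from the
// stack, push the result, and return the number of results pushed.
int join_two_strings(VirtualStack& stack) {
    if (stack.top() != 2 || stack.at(1).kind != Value::String ||
        stack.at(2).kind != Value::String) {
        stack.push_string("expected two string arguments");
        return -1;  // the toy model returns -1 where real code would call lua_error
    }
    stack.push_string(stack.at(1).text + " " + stack.at(2).text);
    return 1;
}
```

With "hello" and "world" pushed as arguments, join_two_strings leaves the joined string at index -1 while the arguments remain addressable at indices 1 and 2.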

(c) Idioms. The intention of the bindings is to be as idiomatic in the target language as possible. There are many binding examples on the wiki[1]. SWIG supports Lua very well: it handles most C++ idioms, such as C++ inheritance, C++ overloaded functions, C++ operators and so on.

2. SWIG

The current SWIG implementation supports Lua 5.0.x and Lua 5.1.x, and it should work with later versions of Lua. It supports Lua well, since most C++ features can be wrapped as described above. The Xapian project uses SWIG to generate the wrapper file from the *.i files. There are two parts in Xapian's bindings: the generic code shared across all the bindings (the xapian.i and xapian-head.i files in the xapian-bindings root), and language-specific files in a subdirectory. The language-specific files include two main files, util.i and extra.i. The file util.i is for SWIG typemaps, which convert arguments between the target language and C++, while extra.i is currently mostly used by the Python bindings to inject additional Python code, mostly to provide more idiomatic iterators. So to support Lua, one main task is to write a Lua-specific util.i file, which defines custom Lua typemaps for xapian-bindings. The SWIG features we use most are the following:

%rename: this feature renames the declaration or the function to another name, in order not to generate a conflict with a keyword or already existing function in the target language.

%extend: this feature can extend structures and classes with new methods. For example, the xapian.i file uses %extend to construct a query from a vector of subqueries merged with the specified operator. The subqueries can be Query objects, or strings if XAPIAN_MIXED_VECTOR_QUERY_INPUT_TYPEMAP is defined.

%typemap(in): The feature is used to convert function arguments from the target language to C/C++. One example is in the file util.i:

%typemap(in) const vector<Xapian::Query> & (vector<Xapian::Query> v) {
    /* conversion statements specific to each target language ... */
}

This "in" typemap is used to construct a Query object from a C++ vector. In Lua, it converts a table to the C++ vector. Each table element can be a string or a userdata that represents a Query object.

SWIG also provides some Lua-C API functions we could use. The most important one is SWIG_ConvertPtr(lua_State* L, int index, void** ptr, swig_type_info *type, int flags), which converts a Lua userdata to a void*. It takes the value at the given index in the Lua state and converts it to a userdata. It then performs the necessary type checks, confirming that the pointer is compatible with the type given in 'type'.
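The table-to-vector conversion that such a typemap has to perform can be modelled in plain C++, without SWIG, Lua or Xapian headers. This is only a sketch under stated assumptions: Query here is a hypothetical stand-in for Xapian::Query, TableItem is a hypothetical stand-in for one Lua table element, and table_to_queries and the three helper constructors are names invented for the illustration. A real typemap would inspect each element with the Lua C API and SWIG_ConvertPtr instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for Xapian::Query; the real class is much richer.
struct Query {
    std::string description;
    explicit Query(const std::string& d) : description(d) {}
};

// Hypothetical stand-in for one element of the Lua table.
struct TableItem {
    enum Kind { String, QueryObject, Other };
    Kind kind;
    std::string text;    // used when kind == String
    const Query* query;  // used when kind == QueryObject
};

TableItem string_item(const std::string& s) {
    TableItem t;
    t.kind = TableItem::String;
    t.text = s;
    t.query = 0;
    return t;
}

TableItem query_item(const Query& q) {
    TableItem t;
    t.kind = TableItem::QueryObject;
    t.query = &q;
    return t;
}

TableItem other_item() {
    TableItem t;
    t.kind = TableItem::Other;
    t.query = 0;
    return t;
}

// Convert every element to a Query; stop with an error message at the
// first element that is neither a string nor a Query object.
bool table_to_queries(const std::vector<TableItem>& table,
                      std::vector<Query>& out, std::string& error) {
    out.clear();
    for (std::size_t i = 0; i < table.size(); ++i) {
        const TableItem& item = table[i];
        if (item.kind == TableItem::String) {
            out.push_back(Query(item.text));  // a string becomes a term query
        } else if (item.kind == TableItem::QueryObject && item.query != 0) {
            out.push_back(*item.query);       // a Query object is copied as-is
        } else {
            // Lua tables are 1-based, so report the position the same way.
            error = "table element " + std::to_string(i + 1) +
                    " must be a string or a Query";
            return false;
        }
    }
    return true;
}
```

The type check comes before any conversion so that a bad element produces a clear message instead of an invalid pointer, which is the role SWIG_ConvertPtr's type check plays for userdata.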

3. Hacking Xapian

In order to understand the Xapian bindings better, I did some simple hacking to test a minimal Lua binding. I have pasted the patch on pastebin; please see link [8]. The following is what I did:

(A) Change the Makefile.am and configure.ac files in the xapian-bindings directory.

The Makefile.am file is used by automake to produce the Makefile.in file. It is very small, so it is easy to add the Lua support statements in the corresponding places.

The configure.ac file is used by autoconf to produce a configure script. In this file we need to set up the Lua configuration, including checks for the Lua version, the library path and the header file path. For my simple hacking, I just hard-coded a configuration that suits my machine.

(B) Add a Makefile.am in the Lua directory.

This file sets up the Lua-specific environment for generating the wrapper file. For example, it defines the SWIG input file "util.i", the test file "smoketest.lua", docdatadir, etc. Its main job is to run the "swig -c++ -lua -o xapian_wrap.cc" command to generate the wrapper files.

(C) Write a small util.i file. This file mainly contains an "in" typemap. It does some simple type checking and constructs the Query object from any mix of strings and Query objects.

(D) Write a small test file "smoketest.lua". This file tests whether my util.i file works. In Lua we usually use ":" to call a method of an object, and "." to get or set an attribute of the object.

Project Plan

For the coding period, I have divided the project into small weekly tasks. As I am not familiar with the autotools, I will spend the first two weeks setting up the build environment, as follows:

May 23 ~ May 29 Change "configure.ac" and "Makefile.am" in the xapian-bindings dir.

May 30 ~ June 5 Create the Makefile.am in the Lua dir.

Then I will begin the main work of interfacing Lua to Xapian.

 

June 6 ~ June 12 Query support. In Lua, we can pass Xapian a table containing a mix of strings and userdata to construct a Query object.

 

June 13 ~ June 19 MatchAll and MatchNothing support.

 

June 20 ~ June 26 MatchDecider support. In Lua we can define custom MatchDeciders.

 

June 27 ~ July 3 Iterators support. Iterators are used throughout Xapian. With this support, we can use the C++ iterators naturally in Lua.

 

July 4 ~ July 10 Exception support. Xapian exceptions are translated into Lua errors and handled properly by the Lua interpreter.

  

July 11 ~ July 17 Non-class functions support. Xapian also has some non-class functions. With this support, we can use these functions in Lua.

 

July 18 ~ July 24 Improve iterators support and exception support.

 

July 25 ~ July 31 ExpandDecider, Stopper, StemImplementation Support

August 1 ~ August 7 KeyMaker, MatchSpy, ValueRangeProcessor Support

August 8 ~ August 15 XapianSWIGQueryItor Support

References:

[1] Lua Binding http://lua-users.org/wiki/BindingCodeToLua

[2] SWIG and Lua http://www.swig.org/Doc2.0/SWIGDocumentation.html#Lua

[3] Xapian Binding https://xapian.org/docs/bindings/

[4] Lua http://www.lua.org/

[5] Xapian https://xapian.org/

[6] SWIG http://swig.org/

[7] Lua C API http://www.lua.org/manual/5.1/manual.html

[8] Lua-binding patch http://pastebin.com/Fu9Y790D

Last modified on 01/26/16 10:10:43