GSoC Guide
If you are interested in doing a project with us, here's a quick guide to how best to go about it. We expect potential students to have read through this guide; you should also be able to start the first few steps, although if you have any questions, please don't be afraid to ask!
Previous experience with Xapian or other search software certainly isn't required for most of these projects, and neither is having studied Information Retrieval theory at university, but please do tell us about any relevant experience in your application.
Asking for help
It's okay to ask for help at any time — that's part of the point of GSoC! Rather than hitting your head against something you don't understand or can't figure out how to work through, please do ask questions (either on the mailing list or in our IRC channel).
It's helpful when asking questions to make it clear what you've already tried. Our documentation isn't perfect, so if you think you've followed it fully and things didn't do what you expected, please tell us that, including as precise a reference to the bit of documentation you followed as you can manage.
If you've tried things beyond that (maybe you've modified some code to try to fix a compile error), please make that clear as well, in case it's the first thing we would think to suggest. A good way to share code changes you've made is to fork our repository on GitHub and push your changes as one or more commits. You can then share them by URL.
If you need to share some command output, such as a configure run that gave errors or a compile step that didn't complete, please include the full command you ran and its output; gist.github.com and pastebin.com are helpful for this. Again, you should also mention anything you did beforehand that differs from what someone else would do when checking out the latest code and running exactly the same command.
Finally, it's important to ask as specific a question as possible. The top question asked on IRC by completely new potential students is "how can I get started?" (or some variant of this). The answer will always be to read this guide and follow it through — so if you have already done so, please let us know! Otherwise it will take longer for us to help you.
Mentoring
We intend to use a "group mentoring" approach similar to the one we used in previous years, so you'll be expected to discuss issues and ask questions in the IRC channel or on the mailing list. This is better for you, as you don't have to wait for your mentor to be available to respond, and better for us, as it makes it easier to keep track of how everyone's doing. Public communication is also the way open source projects usually work, and GSoC is meant to give you experience of working in an open source project.
You'll have an official mentor. Partly this is because someone needs to be registered in Google's system as your mentor, but they are also responsible for making sure you actually get responses to your questions, and things like that. We won't decide who takes that role for each project until we have the final list of selected projects.
So there's no point asking us "who will mentor me?" - the answer is either (or perhaps both) "everyone" or "we don't know yet".
Checking out and building the code
A good way to start learning about Xapian is to check out the code, and get it to build. It's better to use the latest code from the repository rather than a release, as that's what we want the projects to be based on.
We recommend you use Linux or another UNIX-like system for development work, as we're better set up for development on such platforms. In particular, we use them ourselves, so we can more easily help with any setup issues you may encounter. If you want to run Linux (perhaps virtualised) for development and have no existing preference, we suggest Debian or Ubuntu, as our developer documentation covers these well.
It can take a while to get the code if your network connection is slow, and it may take a while to build and run the testsuite if your computer is slow - while you are waiting, you might want to make a start on the next section.
Learn about Xapian's API
The best resource for this is the newer "Getting Started" guide, which we intend to replace much of the user documentation currently on the main website. It is a work in progress, but it's already in good shape. Currently the examples shown in the text are in Python, but there are versions of most of the example code in C++ and PHP.
Choosing an idea to work on
If you haven't already, the first thing to do is to take a look at our list of project ideas and see if there's anything that seems interesting to you and fits the skills you have.
You are also welcome to propose other ideas for projects, but it's probably wise to discuss them with us a bit before you write a full proposal.
Note that we will not accept projects that involve a significant amount of new research. GSoC projects for Xapian need to have a reasonably tight project plan before the beginning of the summer, so GSoC is better suited to implementing existing research in Xapian to bring it to a wider audience.
Reading the resources for your chosen project idea
Most of the ideas on the project ideas list link to several resources which provide more information; have a read through those for the idea you've chosen. If you have any questions, come and talk to us on IRC or the mailing list. By the application period, we want you to be ready to break your idea down into a series of steps towards accomplishing it.
Particularly if you are working on something with a more theoretical basis (such as implementing a new weighting scheme, or extending our learning to rank system), you may find other resources which are helpful. Please remember that you should critically assess any resource you find -- including academic papers.
Get familiar with the code
Once you have managed to build the code, try to find the area of the code where your project would fit in. See if you can think of a simple first step towards implementing the project which you could do.
If the project idea isn't really amenable to that, you could see if there's a "bite size" project idea or a bug report tagged "GoodFirstBug" in a similar area which you could look at.
Doing this will give you a better feel for what the project involves, which should help you write a better proposal. It also lets us see what you can do, giving you a way to convince us that you are likely to be able to tackle the project successfully. While we will consider all applications, those from people who've already taken a small change through our submission and review process are easier for us to assess. If you have time, we strongly recommend you find a small change you can contribute before we start reviewing proposals. Even a tiny change of only a line or two is worth doing!
You should look at the Xapian developer guide for guidance on getting started with developing Xapian, and for useful information about how to prepare your changes for inclusion. If you need a hand getting to grips with the code, don't be afraid to ask.
Writing your proposal
You should use the application template which we provide. This helps to ensure that you provide all the information we're after; if you don't use it, our initial feedback will be to ask you to reformat it into our template. Your application needs to be submitted via the Google Summer of Code website, and is then reviewed by us.
While it is important, the proposal doesn't stand alone - we'll look at discussions on IRC and/or the mailing lists, and we may hold IRC interviews with some or all candidates.
When planning out your project, try to bear in mind the following.
Projects should include automated tests and documentation
Although it is called the Summer of Code, tests and documentation are an important part of writing good code, and communication is an important part of writing good code in an Open Source project. The tests really need to be automated tests, not just "I've run it and it seems to work". A new feature without adequate test coverage probably doesn't work in at least some of the untested cases. Even if it does work now, it is likely to stop working as other changes are made, whereas with an automated test we'll see as soon as it breaks. Code without documentation is hard for anyone to use. And communication matters too: we'd rather you achieved less in code terms and communicated well with your mentors and the wider community.
Good projects are built out of many small changes and improvements
With any project, it's essential to be able to break the work into small, achievable pieces, each of which can be reviewed and merged before moving on to the next. When constructing a timeline, make sure there is sufficient time for each milestone to be reviewed by the rest of the Xapian team, including any time you may need to incorporate suggestions, so that sub-projects can be merged to Xapian's master branch. (This also makes later merges easier, because you won't need to spend significant time rebasing subsequent work back onto master.) For this reason, it is important that you take advantage of the opportunity before GSoC proper starts to get familiar with contributing to Xapian by getting a small change merged to master. This will give you a chance to get used to the tools needed (git and so forth), ensure your development environment is set up, and get you into the swing of Xapian's coding and development style.
You may wish to plan to use the time while waiting for feedback on a sub-project to finish documentation for that piece of work, and perhaps to start writing example code for the next sub-project. (The documentation is in a different repository, so can be reviewed in parallel to planning and coding a subsequent sub-project without causing rebase problems. Example code can initially be created and reviewed in a separate repository, and copied over once it's ready.)
This is the worst approach to the timeline:
- 4 weeks: write code
- first evaluation
- 4 weeks: write more code
- second evaluation
- 2 weeks: write more code
- 1 week: test
- 1 week: document
Most importantly, the granularity is too coarse - it should be broken down into identified tasks of a week or so at most.
There are some other things that stand out, however. For a start, the split between coding, testing and documentation is too uneven; writing good tests and documentation takes significant time. It's going to be hard to judge how you're doing until two weeks from the end, and if you overrun by a week there's no documentation; if you overrun by two there are no tests either! Also, most people find testing and documentation more enjoyable when they are interleaved with coding rather than done as a solid chunk of time spent on nothing else.
Cite academic references and other support
It's important to cite suitable references both to support any position you take (such as 'algorithm X is considered to perform better than algorithm Y') and to show which ideas underpin your project, and how you've had to develop them further to make them practical for Xapian.