Bi-gram Language Modeling

Name: Gaurav Arora
IRC nick: samuelharden
Timezone: UTC+5:30
Work hours: 5:00-14:00 UTC
Official mentor: James Aylett
Code repository: https://github.com/samuelharden/xapian-gaurav-gsoc
Current working branch: https://github.com/samuelharden/xapian-gaurav-gsoc/tree/bigram
Evaluation code repository: https://github.com/samuelharden/xapian-evaluation
Documentation repository: https://github.com/samuelharden/xapian-docsprint
Melange: http://www.google-melange.com/gsoc/project/google/gsoc2012/samuelharden/21001
GSoC blog: http://gsocxapian.blogspot.com

The bi-gram language modeling approach to information retrieval has been shown to outperform the three traditional IR approaches. Beyond better retrieval performance, a bi-gram language model yields a rich resource, the bi-grams extracted from the collection, which can be used for phrase searching, diversifying search results, and suggesting query reformulations to the user. A bi-gram language model would make Xapian a more powerful library for research in information retrieval.
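
To make the idea concrete, here is a minimal Python sketch of bi-gram extraction and query-likelihood scoring. It is only an illustration and does not reflect the code in the repositories above or Xapian's API: the function names `bigrams` and `bigram_query_likelihood`, the linear interpolation with a unigram model, and the weight `lam=0.7` are all assumptions made for this example.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs, e.g. ['a', 'b', 'c'] -> [('a', 'b'), ('b', 'c')]."""
    return list(zip(tokens, tokens[1:]))


def bigram_query_likelihood(query, doc, lam=0.7):
    """Score a tokenized query against one tokenized document.

    Each query term after the first is scored with an interpolated estimate:
        lam * count(prev, cur) / count(prev) + (1 - lam) * count(cur) / len(doc)
    `lam` is an illustrative weight (an assumption), not a tuned value.
    """
    n = len(doc)
    if n == 0 or not query:
        return 0.0

    unigram_counts = Counter(doc)
    bigram_counts = Counter(bigrams(doc))

    # The first query term has no predecessor, so use the unigram estimate.
    score = unigram_counts[query[0]] / n
    for prev, cur in bigrams(query):
        p_uni = unigram_counts[cur] / n
        if unigram_counts[prev]:
            p_bi = bigram_counts[(prev, cur)] / unigram_counts[prev]
        else:
            p_bi = 0.0
        score *= lam * p_bi + (1 - lam) * p_uni
    return score


# Two documents with the same words; only the first keeps the query's word order.
doc_in_order = "language model for information retrieval".split()
doc_shuffled = "retrieval information for model language".split()
query = ["information", "retrieval"]

score_in_order = bigram_query_likelihood(query, doc_in_order)
score_shuffled = bigram_query_likelihood(query, doc_shuffled)
```

Both documents contain the same terms, so a unigram model would score them identically; the bi-gram term is what ranks the document containing the phrase "information retrieval" higher. The same bi-gram counts are the kind of resource that could back phrase searching or query suggestions.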

Last modified on 06/03/17 23:36:25