Clustering of search results

Name: Richhiey Thomas
IRC nick: richhiey1996
Timezone: UTC +0530
Work hours: Temporarily 22:00 - 04:00
Code repository: https://github.com/richhiey1996/xapian/tree/kmeans-clusterer

PROJECT DESCRIPTION

During GSoC 2016, I worked on a KMeans clustering API. The aim of this year's project is to merge that work and extend it with significant additions that improve functionality and performance.

The main problems to be tackled this year are:

  1. Merge the previous PR that was opened for KMeans
  2. Use the triangle inequality to improve performance
  3. Apply dimensionality reduction to improve feature extraction
  4. Implement a cluster evaluation class to evaluate clustering results
  5. Implement an agglomerative clusterer for search result clustering
  6. Optimize the agglomerative clusterer
  7. Document all the work done
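
As an illustration of item 2, the following is a minimal sketch of how the triangle inequality can cut down distance computations in the KMeans assignment step. It is not the code from the PR: the function names `euclidean` and `assign_with_pruning` are hypothetical, and Euclidean distance over plain tuples is assumed only to keep the example short. The idea is that if the distance between the current best centroid c and another centroid c' is at least twice the distance from the point to c, then c' cannot be closer, so its distance need not be computed.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centroids):
    """Assign each point to its nearest centroid.

    Returns (labels, skipped), where skipped counts the
    point-to-centroid distances that were never computed.
    """
    k = len(centroids)
    # Centroid-to-centroid distances, computed once per iteration.
    cc = [[euclidean(centroids[i], centroids[j]) for j in range(k)]
          for i in range(k)]
    labels = []
    skipped = 0
    for p in points:
        best = 0
        best_dist = euclidean(p, centroids[0])
        for j in range(1, k):
            # Triangle inequality: d(p, c_j) >= d(c_best, c_j) - d(p, c_best).
            # If d(c_best, c_j) >= 2 * d(p, c_best), then
            # d(p, c_j) >= d(p, c_best), so c_j cannot win.
            if cc[best][j] >= 2 * best_dist:
                skipped += 1
                continue
            d = euclidean(p, centroids[j])
            if d < best_dist:
                best, best_dist = j, d
        labels.append(best)
    return labels, skipped


points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.2, 9.9)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels, skipped = assign_with_pruning(points, centroids)
print(labels, skipped)
```

The saving grows with the number of clusters, since each point close to its centroid can rule out many distant centroids with a single comparison against the precomputed table.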

A stretch goal for this year is to implement LSA (latent semantic analysis) to find the most important features in document vectors.

Last modified on 08/22/17 19:36:57