Clustering of search results
Work hours: temporarily 22:00 - 04:00
In GSoC 2016, I had the chance to work on a KMeans clustering API. This year, the project aims to merge all of last year's work and to make significant additions that improve both functionality and performance.
Some of the main problems to tackle this year are:
- Merge the previously opened pull request for KMeans
- Use the triangle inequality to skip unnecessary distance computations and improve KMeans performance
- Apply dimensionality reduction to improve feature extraction
- Implement a cluster evaluation class to assess clustering results
- Implement an agglomerative clusterer for search result clustering
- Optimize the agglomerative clusterer
- Document all the work done
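The triangle-inequality speedup for KMeans can be sketched as follows. This is a minimal Python illustration of the pruning idea under stated assumptions, not the project's actual implementation. The function names `euclidean` and `assign_with_pruning` and the toy data are hypothetical. The sketch relies on one fact. If d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best), so centroid c_j cannot be closer to point x and its distance never needs computing.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points: List[Vector],
                        centroids: List[Vector]) -> Tuple[List[int], int]:
    """One KMeans assignment step (hypothetical helper) that skips
    provably useless distance computations.

    Returns the nearest-centroid index for each point and the number of
    point-to-centroid distances that were actually computed.
    """
    k = len(centroids)
    # Centroid-to-centroid distances: computed once per iteration.
    cc = [[euclidean(centroids[i], centroids[j]) for j in range(k)]
          for i in range(k)]
    labels: List[int] = []
    computed = 0
    for x in points:
        best, best_d = 0, euclidean(x, centroids[0])
        computed += 1
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(x, c_best), the triangle inequality
            # gives d(x, c_j) >= d(x, c_best), so c_j cannot win.
            if cc[best][j] >= 2 * best_d:
                continue
            d = euclidean(x, centroids[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed


# Hypothetical toy data: two tight groups and one far-away centroid.
points = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (10.2, 9.8), (0.1, 0.3)]
centroids = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
labels, computed = assign_with_pruning(points, centroids)
print(labels, computed)  # [0, 0, 1, 1, 0] 7  (brute force computes 15)
```

The labels match a brute-force assignment. The saving costs k² centroid distances per iteration, which pays off when there are many more points than centroids. A full method such as Elkan's also keeps per-point distance bounds across iterations. This sketch shows only the centroid-distance check.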
A stretch goal for this year is to implement Latent Semantic Analysis (LSA) to identify the most important features in document vectors.
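The idea behind LSA can be illustrated with a small sketch. LSA factors a term-document matrix with a truncated SVD, and each left singular vector assigns a weight to every term in one latent concept. To stay dependency-free, this Python sketch makes several assumptions. It computes only the first concept. It uses power iteration on A Aᵀ in place of a full SVD. It uses raw term counts rather than tf-idf weights. The function names `leading_concept` and `top_terms` and the toy data are hypothetical.

```python
import math
from typing import List


def leading_concept(matrix: List[List[float]],
                    iterations: int = 100) -> List[float]:
    """Approximate the leading left singular vector of a term-document
    matrix A (rows = terms, columns = documents) by power iteration on
    A A^T. Each entry is a term's weight in the strongest latent concept."""
    n_terms, n_docs = len(matrix), len(matrix[0])
    u = [1.0] * n_terms  # non-zero starting vector
    for _ in range(iterations):
        # v = A^T u: map term weights onto documents.
        v = [sum(matrix[t][d] * u[t] for t in range(n_terms))
             for d in range(n_docs)]
        # u = A v: map document weights back onto terms.
        u = [sum(matrix[t][d] * v[d] for d in range(n_docs))
             for t in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return u  # all-zero matrix: no concept to find
        u = [x / norm for x in u]
    return u


def top_terms(terms: List[str], matrix: List[List[float]], k: int) -> List[str]:
    """Return the k terms with the largest absolute weight in the concept."""
    u = leading_concept(matrix)
    ranked = sorted(range(len(terms)), key=lambda i: abs(u[i]), reverse=True)
    return [terms[i] for i in ranked[:k]]


terms = ["cluster", "kmeans", "search", "query"]
# Raw term counts in four toy documents (rows = terms, columns = documents).
counts = [
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
print(top_terms(terms, counts, 2))  # ['cluster', 'kmeans']
```

A real implementation would keep several concepts from a truncated SVD rather than only the first.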