Clustering of search results

Name	Richhiey Thomas
IRC nick	richhiey1996
Timezone	UTC +0530
Work hours	Temporarily 22:00 - 04:00
Code repository	https://github.com/richhiey1996/xapian/tree/kmeans-clusterer

PROJECT DESCRIPTION

During GSoC 2016, I had the chance to work on a KMeans clustering API. The aim of this year's project is to merge all of that work and extend it with significant additions that improve functionality and performance.

The main problems to be tackled this year are:

Merge the pull request previously opened for KMeans
Use the triangle inequality to improve performance
Apply dimensionality reduction to improve feature extraction
Implement a cluster evaluation class to evaluate clustering results
Implement an agglomerative clusterer for search result clustering
Optimize the agglomerative clusterer
Document all the work done
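
The list above does not say how the triangle inequality would be applied, so the following is only a minimal sketch of one common use, assuming it targets the KMeans assignment step (the pruning rule usually attributed to Elkan). If the distance between the current best centroid and a candidate centroid is at least twice the distance from the point to the current best, the candidate cannot be closer and its distance need not be computed. The function names are hypothetical and this is plain Python, not the Xapian API.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centroids):
    """Assign each point to its nearest centroid.

    Returns (labels, skipped), where skipped counts the point-to-centroid
    distance computations avoided by the triangle inequality test.
    """
    k = len(centroids)
    # Centroid-to-centroid distances, computed once per assignment pass.
    cc = [[euclidean(centroids[i], centroids[j]) for j in range(k)]
          for i in range(k)]
    labels = []
    skipped = 0
    for p in points:
        best = 0
        best_dist = euclidean(p, centroids[0])
        for j in range(1, k):
            # d(p, c_j) >= d(c_best, c_j) - d(p, c_best), so when
            # d(c_best, c_j) >= 2 * d(p, c_best) we get
            # d(p, c_j) >= d(p, c_best) and c_j can be skipped.
            if cc[best][j] >= 2.0 * best_dist:
                skipped += 1
                continue
            d = euclidean(p, centroids[j])
            if d < best_dist:
                best, best_dist = j, d
        labels.append(best)
    return labels, skipped


def assign_brute_force(points, centroids):
    """Reference assignment that computes every distance."""
    return [min(range(len(centroids)),
                key=lambda j: euclidean(p, centroids[j]))
            for p in points]
```

The pruned version should return the same labels as the brute-force one. The saving grows when centroids are well separated relative to the spread of each cluster, at the cost of computing the k × k centroid distance table on every pass.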

A stretch goal for this year is to implement LSA (latent semantic analysis) to find the most important features in document vectors.

Last modified on 22/08/17 19:36:57
