Opened 3 years ago
Last modified 4 months ago
#812 new defect
Stemming of proper nouns
Reported by: | Olly Betts | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.5.0 |
Component: | QueryParser | Version: | git master |
Severity: | normal | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | Operating System: | All |
Description
Currently the QueryParser suppresses stemming for words with an initial capital, with the assumption that these are proper nouns where stemming can be unhelpful.
However, that's a bit English-centric - e.g. nouns in German always have an initial capital, and names are inflected in some languages (Russian and Czech are two I'm aware of but there are likely others).
We could scrap this special handling completely, but it seems useful for some languages. Perhaps each stemmer should be able to report whether it's desirable to do this or not?
This issue is present in 1.4.x but so far nobody has actually complained about it. Therefore I think we should decide how to address it best without the restriction of backportability, then we can look at that once we have addressed it.
Turkish too, if apostrophe is treated as a word character (like we currently do) - e.g.
Türkiye'dir ("it is Turkey")
.