Message4058
Bernhard,
From what I understand, Roundup uses the Porter2 stemming algorithm exposed by Xapian.
http://snowball.tartarus.org/algorithms/english/stemmer.html
The original Porter algorithm requires lowercase input. Take a look at some of the reference
implementations here:
http://tartarus.org/~martin/PorterStemmer/
The only Xapian reference I found was on their intro page, and it is hardly prescriptive
(http://xapian.org/docs/intro_ir.html). Of the words being indexed, it says only:
"Usually they are converted to lower case, and often a stemming algorithm is applied"
The problem is that stemming produces the wrong stems: "Silently" should stem to "silent", not
"SILENTLi". A search for "silently" should return pages that contain the word "silent", and vice
versa.
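The "SILENTLi" output is consistent with uppercase text reaching the stemmer. Going by the Porter2 description linked above, Step 1c replaces a final "y" or "Y" with "i" when it follows a non-vowel that is not the first letter of the word, and only the lowercase letters "aeiouy" count as vowels ("Y" is the algorithm's own marker for a consonant y). The sketch below reimplements just that one step, based on a reading of the published rule rather than on Xapian's code, to show how uppercase input triggers it. With lowercase input the later steps go on to reduce "silentli" to "silent"; with uppercase input none of their lowercase suffixes match, so "SILENTLi" is left as is.

```python
# Porter2 (Snowball English) treats only these lowercase letters as vowels.
VOWELS = frozenset("aeiouy")


def step_1c(word: str) -> str:
    """Step 1c of Porter2 as described on the Snowball page: replace a final
    'y' or 'Y' with 'i' if it follows a non-vowel that is not the first letter."""
    if len(word) > 2 and word[-1] in "yY" and word[-2] not in VOWELS:
        return word[:-1] + "i"
    return word


if __name__ == "__main__":
    print(step_1c("silently"))          # silentli (later steps reduce this to "silent")
    print(step_1c("SILENTLY"))          # SILENTLi ('L' is not in VOWELS, so the rule fires)
    print(step_1c("SILENTLY".lower()))  # silentli: case-folding first restores normal behaviour
```

Other examples from the same page behave as described: "cry" becomes "cri", while "by" (the preceding letter is the first letter) and "say" (preceded by a vowel) are unchanged.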
A simple test would be to index a document containing the word "silently" and ensure that a
search on the term "silent" returns the same document.
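A minimal sketch of such a test follows. Roundup's actual indexer API is not shown in this thread, so `ToyIndex` and `toy_stem` are hypothetical stand-ins; in particular `toy_stem` only strips a trailing "ly" and is not Porter2. What the sketch illustrates is the ordering in `ToyIndex.term`, which lowercases a word before stemming it. A real English stemmer (for instance one wrapping Xapian's `xapian.Stem("english")`) could be passed in via the `stem` parameter, though how its return value would need adapting is an assumption not checked here.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Set

WORD_RE = re.compile(r"[A-Za-z]+")


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer, NOT Porter2: strips a trailing 'ly'."""
    if word.endswith("ly") and len(word) > 4:
        return word[:-2]
    return word


class ToyIndex:
    """Hypothetical in-memory inverted index, used only to sketch the test."""

    def __init__(self, stem: Callable[[str], str] = toy_stem,
                 fold_case: bool = True) -> None:
        self.stem = stem
        self.fold_case = fold_case
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def term(self, word: str) -> str:
        # Lowercase *before* stemming, since Porter-family stemmers expect it.
        return self.stem(word.lower() if self.fold_case else word)

    def add(self, doc_id: str, text: str) -> None:
        for word in WORD_RE.findall(text):
            self.postings[self.term(word)].add(doc_id)

    def search(self, word: str) -> Set[str]:
        # .get() avoids inserting empty entries into the defaultdict.
        return set(self.postings.get(self.term(word), set()))


if __name__ == "__main__":
    index = ToyIndex()
    index.add("msg1", "The error was dropped silently.")
    assert index.search("silent") == {"msg1"}    # "silently" -> "silent"
    assert index.search("SILENTLY") == {"msg1"}  # case-folded, then stemmed

    # Without case folding, uppercase input escapes the (lowercase) suffix rule.
    broken = ToyIndex(fold_case=False)
    broken.add("msg1", "The error was dropped SILENTLY.")
    assert broken.search("silent") == set()
    print("ok")
```

Keeping `fold_case` as a flag makes the regression easy to demonstrate: the same document stops matching a search for "silent" as soon as case folding is skipped.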
--Jeff

Date | User | Action | Args
2010-05-10 22:15:48 | jvstein | set | messageid: <1273529748.4.0.476912199405.issue2550583@psf.upfronthosting.co.za>
2010-05-10 22:15:48 | jvstein | set | recipients: + jvstein, ber, ThomasAH, olly
2010-05-10 22:15:48 | jvstein | link | issue2550583 messages
2010-05-10 22:15:47 | jvstein | create |