Posted: 2023-03-22 23:43:01
Question:
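I have the following code:
```java
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.ClassicTokenizer;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.synonym.SynonymFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;
import org.apache.lucene.util.Version;

// Nested (static) class; the imports belong at the top of the enclosing file
static class TaggerAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String s, Reader reader) {
        // One-way rules: input "al" -> output; includeOrig = true keeps "al" as well
        SynonymMap.Builder builder = new SynonymMap.Builder(true);
        builder.add(new CharsRef("al"), new CharsRef("americanleague"), true);
        builder.add(new CharsRef("al"), new CharsRef("a.l."), true);
        // Multi-word output: words joined with SynonymMap.WORD_SEPARATOR
        builder.add(new CharsRef("nba"), new CharsRef("national" + SynonymMap.WORD_SEPARATOR + "basketball" + SynonymMap.WORD_SEPARATOR + "association"), true);
        SynonymMap mySynonymMap;
        try {
            mySynonymMap = builder.build();
        } catch (IOException e) {
            // Fail fast instead of passing a null map to SynonymFilter
            throw new RuntimeException(e);
        }
        Tokenizer source = new ClassicTokenizer(Version.LUCENE_40, reader);
        TokenStream filter = new StandardFilter(Version.LUCENE_40, source);
        filter = new LowerCaseFilter(Version.LUCENE_40, filter);
        filter = new SynonymFilter(filter, mySynonymMap, true);
        return new TokenStreamComponents(source, filter);
    }
}
```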
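To make the direction of these rules explicit, here is a toy model in plain Java. It is not Lucene code, and the class DirectionalSynonyms and its methods are hypothetical. It mirrors the fact that SynonymMap.Builder.add(input, output, includeOrig) records a one-way input -> output mapping, so a rule fires only when its input term is seen: "al" expands, while "a.l." and "americanleague" are left alone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Lucene code) of one-way synonym rules: input term -> output terms.
public class DirectionalSynonyms {
    // A rule is looked up only by its input term
    private final Map<String, List<String>> rules = new LinkedHashMap<>();

    public void add(String input, String output) {
        rules.computeIfAbsent(input, k -> new ArrayList<>()).add(output);
    }

    // Terms emitted for one token; includeOrig keeps the original token too
    public List<String> expand(String token, boolean includeOrig) {
        List<String> out = new ArrayList<>();
        if (includeOrig) {
            out.add(token);
        }
        out.addAll(rules.getOrDefault(token, Collections.emptyList()));
        return out;
    }

    public static void main(String[] args) {
        DirectionalSynonyms syn = new DirectionalSynonyms();
        syn.add("al", "americanleague");
        syn.add("al", "a.l.");
        System.out.println(syn.expand("al", true));             // [al, americanleague, a.l.]
        System.out.println(syn.expand("a.l.", true));           // [a.l.]
        System.out.println(syn.expand("americanleague", true)); // [americanleague]
    }
}
```

Nothing in this model links "a.l." back to "al" or to "americanleague"; that link exists only in the al -> ... direction.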
I'm running some tests, and so far everything works, until I ran into this scenario:
```java
String title = "Very short title at a.l. bla bla";
Assert.assertTrue(TagUtil.evaluate(memoryIndex, "americanleague"));
Assert.assertTrue(TagUtil.evaluate(memoryIndex, "al"));
```
I expected both cases to pass, but "americanleague" does not match "a.l.", even though both "a.l." and "americanleague" are synonyms of "al".
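The following sketch models why only the "americanleague" assertion fails. It rests on assumptions, since TagUtil's code is not shown: that TagUtil.evaluate runs the query through the same analyzer and reports a match when the query and the indexed title share at least one term, and that ClassicTokenizer keeps "a.l." as a single acronym token. The class SynonymMatchModel is hypothetical and written in plain Java, not against the Lucene API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (not Lucene code). Assumption: TagUtil.evaluate expands index-side and
// query-side tokens with the same one-way rules and matches on any shared term.
public class SynonymMatchModel {
    static final Map<String, List<String>> RULES = new HashMap<>();
    static {
        RULES.put("al", Arrays.asList("americanleague", "a.l."));
    }

    // Each token plus the outputs of any rule whose input is that token
    static Set<String> expandAll(List<String> tokens) {
        Set<String> terms = new HashSet<>();
        for (String t : tokens) {
            terms.add(t);
            terms.addAll(RULES.getOrDefault(t, Collections.emptyList()));
        }
        return terms;
    }

    static boolean matches(List<String> titleTokens, String query) {
        Set<String> indexed = expandAll(titleTokens);
        Set<String> queried = expandAll(Collections.singletonList(query));
        queried.retainAll(indexed); // keep only the shared terms
        return !queried.isEmpty();
    }

    public static void main(String[] args) {
        // "a.l." kept as one token (assumed ClassicTokenizer acronym behavior)
        List<String> asIs = Arrays.asList("very", "short", "title", "at", "a.l.", "bla", "bla");
        // Same title with the dots stripped from the acronym before the synonym step
        List<String> normalized = Arrays.asList("very", "short", "title", "at", "al", "bla", "bla");

        System.out.println(matches(asIs, "al"));                   // true
        System.out.println(matches(asIs, "americanleague"));       // false
        System.out.println(matches(normalized, "americanleague")); // true
    }
}
```

In this model the third case passes only because "a.l." is rewritten to "al" before the rules run. Lucene's ClassicFilter is documented as removing dots from acronyms produced by ClassicTokenizer, so placing it before SynonymFilter might give a similar normalization; that has not been verified against this setup.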
So what should I do? I don't want to add every combination to the map by hand. Thanks.
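If the rules stay one-way, the combinations do not have to be written out by hand; they can be generated from groups of equivalent terms. This is a minimal sketch assuming single-token terms; SynonymGroups and allPairs are hypothetical names, and the Lucene builder call appears only in a comment as an unverified illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: turn groups of equivalent terms into every directed (input, output) pair.
public class SynonymGroups {

    // Each String[] is one {input, output} pair; input and output always differ
    static List<String[]> allPairs(List<List<String>> groups) {
        List<String[]> pairs = new ArrayList<>();
        for (List<String> group : groups) {
            for (String input : group) {
                for (String output : group) {
                    if (!input.equals(output)) {
                        // With Lucene, each pair could become (unverified sketch):
                        // builder.add(new CharsRef(input), new CharsRef(output), true);
                        pairs.add(new String[] {input, output});
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("al", "a.l.", "americanleague"));
        for (String[] p : allPairs(groups)) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```

A group of n terms yields n(n-1) rules, so the map still grows quadratically, but only the groups need maintaining. Multi-word terms would need their words joined with SynonymMap.WORD_SEPARATOR, as in the nba rule. Lucene's synonym package also contains a SolrSynonymParser that reads comma-separated equivalence lines; its Javadoc for the Lucene version in use is worth checking before relying on it.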
Discussion: