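[Posted]: 2014-06-16 13:06:11
[Question]:
I'm running into exactly the same problem as in this thread, so I'm asking a new question. Apologies to everyone who already answered the linked thread, by the way.
So: I'm trying to avoid java.lang.IllegalStateException: TokenStream contract violation.
My code is very similar to the code in the thread linked above: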
protected TokenStreamComponents createComponents( String fieldName, Reader reader ) {
    String token;
    CharArraySet stopWords = new CharArraySet( Version.LUCENE_48, 0, false );
    stopWords.addAll( StopAnalyzer.ENGLISH_STOP_WORDS_SET );
    keepWords.addAll( getKeepWordList() );
    Tokenizer source = new StandardTokenizer( Version.LUCENE_48, reader );
    TokenStream filter = new StandardFilter( Version.LUCENE_48, source );
    filter = new StopFilter( Version.LUCENE_48, filter, stopWords );
    ShingleFilter shiFilter = new ShingleFilter( filter, 2, 3 );
    CharTermAttribute cta = shiFilter.addAttribute( CharTermAttribute.class );
    try {
        shiFilter.reset();
        while( shiFilter.incrementToken() ) {
            token = cta.toString();
            System.out.println( token );
        }
        shiFilter.end();
        shiFilter.close();
    }
    catch ( IOException ioe ) {
        ioe.printStackTrace();
    }
    return new TokenStreamComponents( source, filter );
}
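I don't understand the suggested solutions: what does "simply construct a new TokenStream" or "reset the reader" actually mean? I have tried both, for example by adding:
    source.setReader( reader );
or by changing the code to:
    filter = new StopFilter( Version.LUCENE_48, filter, stopWords );
    ShingleFilter shiFilter = new ShingleFilter( filter, 2, 3 );
but the error persists. Any suggestions?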
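For context, here is a small Lucene-free model of the contract the exception message refers to, as far as I understand it. It is only a sketch: ToyTokenizer, consume and the exception texts are hypothetical stand-ins, not Lucene classes or Lucene's actual implementation. The assumption is that a stream may only be read between reset() and close(), and that once it has been closed it needs a fresh reader before it can be consumed again.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyTokenizer is a hypothetical stand-in, not a Lucene class.
final class TokenStreamContractDemo {

    /** Whitespace tokenizer that enforces setReader -> reset -> incrementToken* -> end -> close. */
    static final class ToyTokenizer {
        private Reader pending; // handed over by setReader(), not yet active
        private Reader input;   // non-null only between reset() and close()
        private String term;

        void setReader(Reader reader) {
            if (input != null) {
                throw new IllegalStateException("contract violation: close() call missing");
            }
            pending = reader;
        }

        void reset() {
            // Activates the pending reader. A second reset() without a new
            // setReader() leaves the stream without any input.
            input = pending;
            pending = null;
        }

        boolean incrementToken() {
            if (input == null) {
                throw new IllegalStateException(
                    "contract violation: reset()/close() call missing or reset() called multiple times");
            }
            StringBuilder sb = new StringBuilder();
            try {
                int c;
                while ((c = input.read()) != -1) {
                    if (!Character.isWhitespace(c)) {
                        sb.append((char) c);
                    } else if (sb.length() > 0) {
                        break; // whitespace after a term ends the token
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (sb.length() == 0) {
                return false; // input exhausted
            }
            term = sb.toString();
            return true;
        }

        String term() {
            return term;
        }

        void end() {
            // Nothing to finalize in this toy model; kept for workflow parity.
        }

        void close() {
            try {
                if (input != null) {
                    input.close();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                input = null; // the stream is unusable until setReader() + reset()
            }
        }
    }

    /** The consumer loop: reset, read every token, end, and always close. */
    static List<String> consume(ToyTokenizer stream) {
        List<String> terms = new ArrayList<>();
        stream.reset();
        try {
            while (stream.incrementToken()) {
                terms.add(stream.term());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return terms;
    }

    public static void main(String[] args) {
        ToyTokenizer stream = new ToyTokenizer();
        stream.setReader(new StringReader("quick brown fox"));
        System.out.println(consume(stream)); // first pass succeeds

        try {
            consume(stream); // second pass without a new reader
        } catch (IllegalStateException e) {
            System.out.println("second pass failed: " + e.getMessage());
        }

        stream.setReader(new StringReader("lazy dog")); // "reset the reader"
        System.out.println(consume(stream)); // succeeds again
    }
}
```

If this model is close to what Lucene does, then running the reset()/incrementToken()/end()/close() loop inside createComponents uses up the chain before the analyzer hands it to its real consumer. In that case createComponents would probably only build and return the chain, and the loop would run in the calling code on the stream returned by analyzer.tokenStream(...).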
[Discussion]:
Tags: java exception lucene token tokenize