【Question title】: Tokens produced by Shingle filter are not included in the query - Lucene
【Posted】: 2017-07-24 09:58:52
【Question】:
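import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.standard.ClassicFilter;
import org.apache.lucene.analysis.standard.ClassicTokenizer;

public class CustomAnalyzer extends Analyzer {
    public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
    private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;

    @Override
    protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        final ClassicTokenizer src = new ClassicTokenizer(getVersion(), reader);
        src.setMaxTokenLength(maxTokenLength);
        // Emit shingles of 2 to 3 words; single words are still emitted as well.
        TokenStream tok = new ShingleFilter(src, 2, 3);
        tok = new ClassicFilter(tok);
        tok = new LowerCaseFilter(tok);
        // tok = new SynonymFilter(tok, SynonymDictionary.getSynonymMap(), true);
        return new Analyzer.TokenStreamComponents(src, tok) {
            @Override
            protected void setReader(final Reader reader) throws IOException {
                // Re-apply the token length limit whenever the analyzer is reused.
                src.setMaxTokenLength(CustomAnalyzer.this.maxTokenLength);
                super.setReader(reader);
            }
        };
    }
}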


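import java.io.File;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.NIOFSDirectory;

public class Test {
    public static void main(String[] args) throws Exception {
        Directory dir = new NIOFSDirectory(new File("/home/local/test"));
        IndexReader indexReader = DirectoryReader.open(dir);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        CustomAnalyzer analyzer1 = new CustomAnalyzer();
        // Print the tokens the analyzer produces for the whole string.
        TokenStream ts = analyzer1.tokenStream("n", new StringReader("cup board"));
        ts.reset();
        System.out.println("Tokens are :");
        while (ts.incrementToken()) {
            System.out.print(ts.getAttribute(CharTermAttribute.class) + ", ");
        }
        ts.end();
        ts.close();
        // Parse the same string with QueryParser, using the same analyzer.
        QueryParser parser = new QueryParser("n", analyzer1);
        Query query = parser.parse("cup board");
        System.out.println("\nQuery is");
        System.out.println(query.toString());
    }
}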

I am using Lucene 4.10.4. The output of the code above is:

Tokens are :
cup, cup board, board 
Query is
n:cup n:board

The query I expected is n:cup n:board n:cup board. However, the tokens formed by the shingle filter are not added to the query; I only get n:cup n:board. What am I doing wrong?

【Comments on the question】:

    Tags: java lucene


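    【Solution 1】:

    These tokens are not being split by the analyzer; they are split by the QueryParser syntax. Because clauses are separated by whitespace, "cup" and "board" are separate query clauses rather than separate terms. Each clause is passed to the analyzer on its own, so the ShingleFilter never sees the two words together and has nothing to combine.

    Try a phrase query to see the difference: parser.parse("\"cup board\"");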
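
    To make the difference concrete, here is a minimal plain-Java sketch that does not use Lucene at all. The class ShingleDemo and the methods shingles and parserStyleTerms are hypothetical names invented for this illustration; shingles is only a simplified stand-in for the analyzer chain in the question (lowercasing, single words plus shingles of 2 to 3 words), and parserStyleTerms imitates the "split on whitespace first, analyze each clause afterwards" behavior described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShingleDemo {

    // Simplified stand-in for the analyzer in the question (an assumption, not
    // Lucene code): lowercases, splits on whitespace, and for every start
    // position emits the single word followed by each shingle of min..max words.
    static List<String> shingles(String text, int min, int max) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            out.add(words[start]); // the single word itself
            for (int size = min; size <= max && start + size <= words.length; size++) {
                out.add(String.join(" ", Arrays.asList(words).subList(start, start + size)));
            }
        }
        return out;
    }

    // Imitates the parser behavior described in the answer: the query string is
    // cut into whitespace-separated clauses first, and each clause is analyzed
    // in isolation, so no shingle can ever span two clauses.
    static List<String> parserStyleTerms(String query, int min, int max) {
        List<String> out = new ArrayList<>();
        for (String clause : query.trim().split("\\s+")) {
            out.addAll(shingles(clause, min, max));
        }
        return out;
    }

    public static void main(String[] args) {
        // Analyzing the whole string at once: prints [cup, cup board, board]
        System.out.println(shingles("cup board", 2, 3));
        // Splitting into clauses before analysis: prints [cup, board]
        System.out.println(parserStyleTerms("cup board", 2, 3));
    }
}
```

    The first call mirrors what the question's token loop prints, the second mirrors what ends up in the parsed query. One possible way to get every shingle into a Lucene query, offered here as a suggestion rather than as part of the original answer, is to bypass QueryParser: consume the analyzer's TokenStream for the whole input yourself and add each term to a BooleanQuery as a TermQuery with BooleanClause.Occur.SHOULD.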

    【Discussion】:

    • How can I achieve this with the shingle filter? For the query "hello world java", I need the query to contain the token combinations (hello, hello world, hello world java, world java, java).