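【Question title】: Implementing a custom prefix-remover token filter in Lucene produces dirty tokens
【Posted】: 2021-12-28 06:52:12
【Question】:

I am trying to implement a Lucene filter that removes a prefix from the terms in a query. It seems that at some point, after several queries, the filter gets reused, so the char buffer is dirty.

The code below is simplified; in the real code the prefix is an external parameter.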

  public static class PrefixFilter extends TokenFilter {

    private final PackedTokenAttributeImpl termAtt = (PackedTokenAttributeImpl) addAttribute(CharTermAttribute.class);

    public PrefixFilter(TokenStream in) {
      super(in);
    }

    @Override
    public final boolean incrementToken() throws IOException {
      if (!input.incrementToken()) {
        return false;
      }
      String value = new String(termAtt.buffer());
      value = value.trim();
      value = value.toLowerCase();
      value = StringUtils.removeStart(value, "prefix_");
      if (value.isBlank()) {
        termAtt.setEmpty();
      } else {
        termAtt.copyBuffer(value.toCharArray(), 0, value.length());
        termAtt.setLength(value.length());
      }
      return true;
    }
  }

So after 10 or 12 queries, the value "prefix_a" becomes "abcde".

So I tried setting the term buffer length and the end offset like this:
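
The symptom is consistent with `new String(termAtt.buffer())` converting the whole backing array instead of only the first `termAtt.length()` characters: when a token stream is reused, characters from an earlier, longer term can remain past the current term's end. The sketch below reproduces that effect in plain Java. `StaleBufferDemo`, `write`, and `stripFromWholeBuffer` are hypothetical names, and the `char[]` is only a stand-in for the attribute's buffer, not the Lucene class itself.

```java
// Plain-Java stand-in for a reused term buffer; no Lucene classes are involved.
class StaleBufferDemo {
    static final String PREFIX = "prefix_";

    // Overwrites the start of the buffer with the term and returns the term's length,
    // leaving any characters beyond that length untouched (as a reused buffer would).
    static int write(char[] buffer, String term) {
        term.getChars(0, term.length(), buffer, 0);
        return term.length();
    }

    // Mirrors the question's approach: converts the entire backing array.
    static String stripFromWholeBuffer(char[] buffer) {
        String value = new String(buffer).trim();
        return value.startsWith(PREFIX) ? value.substring(PREFIX.length()) : value;
    }

    public static void main(String[] args) {
        char[] buffer = new char[16];
        write(buffer, "prefix_abcde");            // an earlier, longer term
        int length = write(buffer, "prefix_a");   // the current term, length 8

        System.out.println(stripFromWholeBuffer(buffer));   // abcde (stale tail included)
        System.out.println(new String(buffer, 0, length));  // prefix_a (current term only)
    }
}
```

Note that `trim()` removes the trailing `\0` padding of a fresh buffer, which may be why the problem only shows up after a longer term has passed through the same buffer.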

    termAtt.setEmpty();
    termAtt.resizeBuffer(value.length());
    termAtt.copyBuffer(value.toCharArray(), 0, value.length());
    termAtt.setLength(value.length());
    termAtt.setOffset(0, value.length());

But I don't know whether this is correct. Can anyone help?

Thanks.

【Comments】:

    Tags: java lucene buffer


    【Solution 1】:

    See if this helps:

    /**
     * Standard number token filter.
     */
    public class StandardnumberTokenFilter extends TokenFilter {
    
        private final LinkedList<PackedTokenAttributeImpl> tokens;
    
        private final StandardnumberService service;
    
        private final Settings settings;
    
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    
        private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
    
        private State current;
    
        protected StandardnumberTokenFilter(TokenStream input, StandardnumberService service, Settings settings) {
            super(input);
            this.tokens = new LinkedList<>();
            this.service = service;
            this.settings = settings;
        }
    
        @Override
        public final boolean incrementToken() throws IOException {
            if (!tokens.isEmpty()) {
                if (current == null) {
                    throw new IllegalArgumentException("current is null");
                }
                PackedTokenAttributeImpl token = tokens.removeFirst();
                restoreState(current);
                termAtt.setEmpty().append(token);
                posIncAtt.setPositionIncrement(0);
                return true;
            }
            if (input.incrementToken()) {
                detect();
                if (!tokens.isEmpty()) {
                    current = captureState();
                }
                return true;
            } else {
                return false;
            }
        }
    
        private void detect() throws CharacterCodingException {
            CharSequence term = new String(termAtt.buffer(), 0, termAtt.length());
            Collection<CharSequence> variants = service.lookup(settings, term);
            for (CharSequence ch : variants) {
                if (ch != null) {
                    PackedTokenAttributeImpl token = new PackedTokenAttributeImpl();
                    token.append(ch);
                    tokens.add(token);
                }
            }
        }
    
        @Override
        public void reset() throws IOException {
            super.reset();
            tokens.clear();
            current = null;
        }
    
        @Override
        public boolean equals(Object object) {
            return object instanceof StandardnumberTokenFilter &&
                    service.equals(((StandardnumberTokenFilter)object).service) &&
                    settings.equals(((StandardnumberTokenFilter)object).settings);
        }
    
        @Override
        public int hashCode() {
            return service.hashCode() ^ settings.hashCode();
        }
    }
    

    https://github.com/jprante/elasticsearch-plugin-bundle/blob/f63690f877cc7f50360faffbac827622c9d404ef/src/main/java/org/xbib/elasticsearch/plugin/bundle/index/analysis/standardnumber/StandardnumberTokenFilter.java
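
The filter above reads the term with `new String(termAtt.buffer(), 0, termAtt.length())`, which bounds the read by the current term length. Applied to the question's filter, a minimal sketch could look like the following. `PrefixStripper` and `clean` are hypothetical names introduced for illustration; the helper is plain Java so it runs without Lucene, and the comment at the top shows how it might be called from `incrementToken()`. Lower-casing with `Locale.ROOT` is an assumption; the question uses the default locale.

```java
import java.util.Locale;

// Possible use inside the filter (Lucene calls, not exercised by this sketch):
//   String cleaned = PrefixStripper.clean(termAtt.buffer(), termAtt.length(), "prefix_");
//   termAtt.setEmpty().append(cleaned);
final class PrefixStripper {
    private PrefixStripper() {}

    // Builds the term from only the first `length` chars of `buffer`,
    // then trims, lower-cases, and removes a leading `prefix` if present.
    static String clean(char[] buffer, int length, String prefix) {
        String value = new String(buffer, 0, length).trim().toLowerCase(Locale.ROOT);
        return value.startsWith(prefix) ? value.substring(prefix.length()) : value;
    }

    public static void main(String[] args) {
        char[] buffer = new char[16];
        "prefix_abcde".getChars(0, 12, buffer, 0); // leftover from an earlier term
        "Prefix_A".getChars(0, 8, buffer, 0);      // current term, length 8
        System.out.println(clean(buffer, 8, "prefix_")); // a
    }
}
```

With a bounded read, the extra calls in the question's second attempt are probably unnecessary: `copyBuffer` and `append` already set the term length, so `resizeBuffer` and `setLength` add nothing, and `setOffset` belongs to `OffsetAttribute`, which records character positions in the original text rather than anything about the term buffer. Declaring the field as `CharTermAttribute` also avoids the cast to `PackedTokenAttributeImpl`.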
