I need a way to score the lucene documents using term frequency only. Is there any flag that needs to be changed for this? (Answers)

【Question title】: I need a way to score the lucene documents using term frequency only. Is there any flag that needs to be changed for this?
【Posted】: 2016-04-07 11:21:44
【Question description】:

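Suppose I have two documents: D1 contains the word "lucene" twice, and D2 contains it three times. I want Lucene to score D2 higher than D1. The catch is that D1 has only two words in total (i.e. "lucene lucene"), whereas D2 has 100 words, 3 of which are "lucene". The default Lucene scoring model scores D1 higher than D2. I want to disable this behavior and rank D2 above D1. This is a requirement of my project.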

【Question comments】:

Tags: lucene

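【Solution 1】:

You need to implement a Similarity that fits your needs. You could implement Similarity directly, but you will probably find it simpler to copy ClassicSimilarity (DefaultSimilarity, in versions before 5.4) and remove the factors you don't want to affect the score (i.e. make them return a constant). For example, this is a very simple implementation that just returns the frequency of the query terms in the document: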

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.similarities.TFIDFSimilarity;
import org.apache.lucene.util.BytesRef;

public class SimpleSimilarity extends TFIDFSimilarity {
//Comments describe briefly what these methods do in the *standard* implementation.
//Not what they do in this implementation (which, for most of them, is nothing at all)

  public SimpleSimilarity() {}

  //boosts results which match more query terms
  @Override
  public float coord(int overlap, int maxOverlap) {
    return 1f;
  }

  //constant per query, normalizes scores somewhat based on query
  @Override
  public float queryNorm(float sumOfSquaredWeights) {
    return 1f;
  }

  //Norms should be disabled when using this similarity
  //They are useless to it, and would just be wasted space.
  @Override
  public final long encodeNormValue(float f) {
    return 1L;
  }

  @Override
  public final float decodeNormValue(long norm) {
    return 1f;
  }

  //Weighs shorter fields more heavily
  @Override
  public float lengthNorm(FieldInvertState state) {
    return 1f;
  }

  //Higher frequency terms (more matches) scored higher
  @Override
  public float tf(float freq) {
    //return (float)Math.sqrt(freq);  The standard tf impl
    return freq;
  }

  //Scores closer matches higher when using a sloppy phrase query
  @Override
  public float sloppyFreq(int distance) {
    return 1.0f;
  }

  //ClassicSimilarity doesn't really do much with payloads.  This is unmodified
  @Override
  public float scorePayload(int doc, int start, int end, BytesRef payload) {
    return 1f;
  }

  //Weigh matches on rarer terms more heavily.
  @Override
  public float idf(long docFreq, long numDocs) {
    return 1f;
  }

  @Override
  public String toString() {
    return "SimpleSimilarity";
  }
}
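
To see why this changes the ranking for the documents in the question, here is a minimal standalone sketch. It does not use Lucene at all; the class and method names are hypothetical, and it only approximates the per-document part of the classic formula (square root of the term frequency, multiplied by a length norm of 1 / sqrt(number of terms in the field)). The factors that are identical for both documents in a single-term query (idf, queryNorm, coord) are left out, and so is the lossy encoding Lucene applies when it stores norms, so real scores will differ somewhat.

```java
// Standalone illustration only; does not depend on Lucene.
// TfScoringDemo and its methods are hypothetical names for this sketch.
class TfScoringDemo {

    // Approximates the classic per-document factors:
    // tf = sqrt(freq), lengthNorm = 1 / sqrt(fieldLength).
    static double classicStyleScore(int termFreq, int fieldLength) {
        return Math.sqrt(termFreq) / Math.sqrt(fieldLength);
    }

    // Mirrors what SimpleSimilarity does: tf = freq, every other factor is 1,
    // so the field length has no influence on the result.
    static double rawTfScore(int termFreq, int fieldLength) {
        return termFreq;
    }

    public static void main(String[] args) {
        // D1: "lucene lucene" (2 matches in 2 words)
        // D2: 3 matches in 100 words
        System.out.println("classic-style D1: " + classicStyleScore(2, 2));
        System.out.println("classic-style D2: " + classicStyleScore(3, 100));
        System.out.println("raw tf D1: " + rawTfScore(2, 2));
        System.out.println("raw tf D2: " + rawTfScore(3, 100));
    }
}
```

Under these assumptions the classic-style value for D1 is about 1.0 and for D2 about 0.17, so D1 wins because of the length norm. With the raw term frequency the values are 2 and 3, so D2 ranks first, which is the ordering the question asks for.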

【Comments】: