【Posted】: 2017-03-01 09:04:08
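【Question】:
I am building a search-engine-based application with the Lucene Java framework, and I am confused about the scoring function Lucene provides by default. Does the default scoring already implement tf-idf and cosine similarity, or do we have to do something else to get it?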
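import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
// ParseException also needs an import; its package depends on the Lucene version
// (org.apache.lucene.queryParser in 3.x, org.apache.lucene.queryparser.classic in 4.0 and later).

public class LuceneTester {

    String indexDir = "C:\\Users\\hamda\\Documents\\NetBeansProjects\\luceneDemo\\Index";
    String dataDir = "C:\\Users\\hamda\\Documents\\NetBeansProjects\\luceneDemo\\Data";
    // Indexer, Searcher, TextFileFilter and LuceneConstants are helper classes not shown in this post.
    Indexer indexer;
    Searcher searcher;

    public static void main(String[] args) {
        LuceneTester tester;
        try {
            tester = new LuceneTester();
            tester.createIndex();
            tester.search("DataGuides");
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

    // Indexes every file in dataDir accepted by TextFileFilter and reports the elapsed time.
    private void createIndex() throws IOException {
        indexer = new Indexer(indexDir);
        int numIndexed;
        long startTime = System.currentTimeMillis();
        numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
        long endTime = System.currentTimeMillis();
        indexer.close();
        System.out.println(numIndexed + " File indexed, time taken: "
                + (endTime - startTime) + " ms");
    }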
I print each document's score at the end of the search function below:
    private void search(String searchQuery) throws IOException, ParseException {
        searcher = new Searcher(indexDir);
        long startTime = System.currentTimeMillis();
        TopDocs hits = searcher.search(searchQuery);
        long endTime = System.currentTimeMillis();
        System.out.println(hits.totalHits +
                " documents found. Time :" + (endTime - startTime));
        // Print the score Lucene assigned to each hit, followed by the file path.
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = searcher.getDocument(scoreDoc);
            System.out.println(scoreDoc.score + " File: "
                    + doc.get(LuceneConstants.FILE_PATH));
        }
        searcher.close();
    }
}
I searched Google and found this: how can I implement the tf-idf and cosine similarity in Lucene? Any help would be greatly appreciated :)
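For reference, the textbook form of the two ideas named in the question can be sketched in plain Java, independent of Lucene. This is only an illustration of tf-idf weighting followed by cosine similarity, assuming the common tf * ln(N / df) weighting; it is not Lucene's scoring code, and the class name `TfIdfCosineSketch`, its methods, and the toy corpus are all made up for this example. Which Similarity a Lucene release applies by default depends on its version, so the scores printed by `scoreDoc.score` should not be expected to match these numbers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper: textbook tf-idf weighting plus cosine similarity.
class TfIdfCosineSketch {

    // Counts how often each term occurs in one tokenized document.
    static Map<String, Integer> termCounts(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    // Counts, for each term, the number of documents that contain it.
    static Map<String, Integer> documentFrequencies(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String t : new HashSet<>(doc)) {
                df.merge(t, 1, Integer::sum);
            }
        }
        return df;
    }

    // Builds a sparse tf-idf vector with weight = tf * ln(N / df).
    // Terms that never occur in the corpus are skipped (assumption of this sketch).
    static Map<String, Double> tfIdf(List<String> tokens, Map<String, Integer> df, int numDocs) {
        Map<String, Double> vector = new HashMap<>();
        for (Map.Entry<String, Integer> e : termCounts(tokens).entrySet()) {
            Integer docFreq = df.get(e.getKey());
            if (docFreq == null) {
                continue;
            }
            vector.put(e.getKey(), e.getValue() * Math.log((double) numDocs / docFreq));
        }
        return vector;
    }

    // Cosine of the angle between two sparse vectors; 0.0 if either has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy corpus, already tokenized and lowercased (made-up data).
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("dataguides", "summarize", "semistructured", "data"),
                Arrays.asList("lucene", "scores", "documents"),
                Arrays.asList("lucene", "indexes", "data"));
        Map<String, Integer> df = documentFrequencies(corpus);
        Map<String, Double> query = tfIdf(Arrays.asList("dataguides"), df, corpus.size());
        for (int i = 0; i < corpus.size(); i++) {
            Map<String, Double> doc = tfIdf(corpus.get(i), df, corpus.size());
            System.out.println("doc " + i + " cosine = " + cosine(query, doc));
        }
    }
}
```

In this sketch only the first toy document shares a term with the query, so it is the only one with a non-zero cosine. To see how Lucene itself arrived at a particular score, `IndexSearcher.explain(query, docId)` returns a breakdown of the factors that were actually applied.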
【Discussion】:
Tags: java lucene search-engine tf-idf cosine-similarity