【Posted】: 2011-02-17 07:22:54
【Question】:
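I'm trying to use Lucene (PyLucene, actually!) to find out how many documents contain my exact phrase. My code currently looks like the following, but it runs rather slowly. Does anyone know a faster way to return the document count?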
phraseList = ["some phrase 1", "some phrase 2"]  # etc., a list of phrases...
countsearcher = IndexSearcher(SimpleFSDirectory(File(STORE_DIR)), True)
analyzer = StandardAnalyzer(Version.LUCENE_CURRENT)
for phrase in phraseList:
    # Wrap the phrase in double quotes so it is parsed as an exact phrase query
    query = QueryParser(Version.LUCENE_CURRENT, "contents", analyzer).parse("\"" + phrase + "\"")
    # Note: search() returns at most 200 hits here, so the count below is capped at 200
    scoreDocs = countsearcher.search(query, 200).scoreDocs
    print "count is: " + str(len(scoreDocs))
【Discussion】: