【Question Title】: Lucene.net 2.9.2 sorting (sort doesn't work)
【Posted】: 2010-12-21 16:18:41
【Question】:

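I'm having trouble sorting a Lucene.NET index in .NET. I've tried almost every solution on Stack Overflow and searched Google for answers. I'm using Lucene.NET 2.9.2 and ASP.NET 2.0. I want to sort strings the way you would in SQL, where you can write 'order by Title desc [asc]'.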

Here is my code; I hope someone can help me.

    // Here I create the index with some fields
    doc.Add(new Field("prod_id", row["prod_id"].ToString(), Field.Store.YES, Field.Index.ANALYZED));
    doc.Add(new Field("prod_title", row["prod_title"].ToString(), Field.Store.YES, Field.Index.ANALYZED));
    doc.Add(new Field("prod_desc", row["prod_desc"].ToString(), Field.Store.YES, Field.Index.ANALYZED));
    doc.Add(new Field("prod_author", row["prod_author"].ToString(), Field.Store.YES, Field.Index.ANALYZED));
    doc.Add(new Field("prod_publisher", row["prod_publisher"].ToString(), Field.Store.YES, Field.Index.ANALYZED));
    doc.Add(new Field("prod_price", row["prod_price"].ToString(), Field.Store.YES, Field.Index.ANALYZED));

    // Next I try to search with a sort option:

    // Method that returns the appropriate Sort object
    private static Sort SetSortForLucene(string _sort)
    {
        Sort sort;
        switch (_sort)
        {
            case "UnitPriceGorss":
                sort = new Sort(new SortField("prod_price", SortField.DOUBLE, false));
                break;

            case "UnitPriceGorssDESC":
                sort = new Sort(new SortField("prod_price", SortField.DOUBLE, true));
                break;

            case "Title":
                // not working (third argument is 'reverse': false = ascending)
                sort = new Sort(new SortField("prod_title", SortField.STRING, false));
                break;

            case "TitleDESC":
                // not working (reverse = true: descending)
                sort = new Sort(new SortField("prod_title", SortField.STRING, true));
                break;

            case "":
                sort = new Sort(new SortField("prod_title", SortField.STRING, false));
                break;

            default:
                sort = new Sort(new SortField("prod_title", SortField.STRING, false));
                break;
        }
        return sort;
    }
    // Inside my Lucene query method:
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
    IndexReader reader = IndexReader.Open(IndexPath);
    Searcher searcher = new IndexSearcher(reader);
    // Get the Sort object
    Sort sort = SetSortForLucene(_sort);
    TopFieldDocCollector collector = new TopFieldDocCollector(reader, sort, pageSize);
    // Work out which document field the QueryParser should search
    string _luceneField = "";

    if (luceneField.Contains("_"))
        _luceneField = luceneField;
    else
        switch (luceneField)
        {
            case "Title": _luceneField = "prod_title"; break;
            case "Description": _luceneField = "prod_desc"; break;
            case "Author": _luceneField = "prod_author"; break;
            case "Publisher": _luceneField = "prod_publisher"; break;
            default: _luceneField = "prod_title"; break;
        }
    QueryParser parser = new QueryParser(_luceneField, analyzer);
    Query query = parser.Parse(luceneQuery);
    ScoreDoc[] hits;
    searcher.Search(query, collector);
    // Get the top hits from the search, but they come back without any sorting.
    hits = collector.TopDocs().scoreDocs;

    foreach (ScoreDoc hit in hits)
    {
        Document doc = searcher.Doc(hit.doc);
        string a = doc.Get("prod_id");
        int id = 0;
        if (hit.score > score)
        {
            if (int.TryParse(doc.Get("prod_id"), out id))
                tmpId.Add(id);
        }
    }
    // I also define stop words for full-text searching, and I think this is
    // the real cause of the sorting problem.
    System.String[] stopWords = new System.String[] { "a", "że", "w", "przy", "o", "bo", "co", "z", "za", "ze", "ta", "i", "no", "do" };

I used this link on Stack Overflow and this other one to try to solve my problem, but sorting still fails and I don't know what's wrong with my code.

After a few days I finally found the solution: a field you want to sort on must not be tokenized when it holds a string value.

For example, when I want to sort products by title (ascending/descending), you should add something like this:

    doc.Add(new Field(Product.PROD_TITLE_SORT,
        row["prod_title"].ToString().Replace(" ", "_") + "_" + row[Product.PROD_ID].ToString(),
        Field.Store.NO, Field.Index.NOT_ANALYZED));
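The sort key built here is just a string transformation, so it can be sketched outside Lucene. The C# below is a minimal, hypothetical helper (`TitleSortKeys`, `Build` and `BuildCaseInsensitive` are my names, not Lucene.NET or original code) that reproduces the key and lets you check how untokenized keys compare. I'm assuming Lucene.NET orders the terms of a NOT_ANALYZED field by ordinal (code-unit) comparison; if so, uppercase titles sort before lowercase ones and Polish letters such as 'Ż' sort after 'z'. The lowercased variant is a suggested adjustment, not part of the original answer.

```csharp
using System;

// Hypothetical helpers mirroring the key stored in PROD_TITLE_SORT.
public static class TitleSortKeys
{
    // Same transformation as the answer: spaces become '_', then "_" + product id
    // is appended so two products with the same title still get distinct keys.
    public static string Build(string title, string prodId)
    {
        return title.Replace(" ", "_") + "_" + prodId;
    }

    // Suggested variant (an assumption, not in the original answer): lowercase the
    // title first so an ordinal comparison no longer puts "Zebra" before "apple".
    public static string BuildCaseInsensitive(string title, string prodId)
    {
        return Build(title.ToLowerInvariant(), prodId);
    }
}
```

For example, `TitleSortKeys.Build("Pan Tadeusz", "42")` returns `Pan_Tadeusz_42`, and `string.CompareOrdinal("Zebra_2", "apple_1")` is negative, so with mixed case "Zebra" would come first.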

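I don't understand why it is this field, which is neither stored nor analyzed, that Lucene.NET can sort on. This sort field isn't even in the index! I checked with the lukeall-1.0.1.jar index browser.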

Second, you need to create an appropriate Sort method:

    private static Sort SetSortForLucene(string _sort)
    {
        Sort sort;
        _sort = !string.IsNullOrEmpty(_sort) ? _sort : "";
        switch (_sort)
        {
            case "UnitPriceGorss":
                sort = new Sort(new SortField(PROD_PRICE, SortField.DOUBLE, false));
                break;

            case "UnitPriceGorssDESC":
                sort = new Sort(new SortField(PROD_PRICE, SortField.DOUBLE, true));
                break;

            case "Title":
                // Now it works perfectly (reverse = false: ascending).
                sort = new Sort(new SortField(PROD_TITLE_SORT, SortField.STRING, false));
                break;

            case "TitleDESC":
                // Now it works perfectly (reverse = true: descending).
                sort = new Sort(new SortField(PROD_TITLE_SORT, SortField.STRING, true));
                break;

            case "": // Default sorting: results are ordered by Lucene.NET relevance score.
                sort = new Sort(SortField.FIELD_SCORE);
                break;

            default:
                sort = new Sort(SortField.FIELD_SCORE);
                break;
        }
        return sort;
    }
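The third argument of `new SortField(field, type, reverse)` is a reverse flag: `false` sorts ascending and `true` descending, like SQL's `asc`/`desc`. The sketch below models that mapping in plain C# (no Lucene) so the asc/desc behaviour can be checked in isolation. `SortOptionModel`, `TryResolve`, `Apply` and the field name `prod_title_sort` are hypothetical; the original code only uses the constant `PROD_TITLE_SORT`, whose value isn't shown. Returning `false` from `TryResolve` stands for falling back to `SortField.FIELD_SCORE`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain-C# model (not Lucene code) of what SortField's 'reverse' flag means.
public static class SortOptionModel
{
    // Hypothetical mapping of the UI option names to a sort field and reverse flag.
    // Returns false when the option should fall back to relevance (FIELD_SCORE).
    public static bool TryResolve(string option, out string field, out bool reverse)
    {
        switch (option ?? "")
        {
            case "UnitPriceGorss":     field = "prod_price";      reverse = false; return true;
            case "UnitPriceGorssDESC": field = "prod_price";      reverse = true;  return true;
            case "Title":              field = "prod_title_sort"; reverse = false; return true;
            case "TitleDESC":          field = "prod_title_sort"; reverse = true;  return true;
            default:                   field = null;              reverse = false; return false;
        }
    }

    // reverse == false -> ascending (SQL "asc"); reverse == true -> descending (SQL "desc").
    public static List<string> Apply(IEnumerable<string> keys, bool reverse)
    {
        return reverse
            ? keys.OrderByDescending(k => k, StringComparer.Ordinal).ToList()
            : keys.OrderBy(k => k, StringComparer.Ordinal).ToList();
    }
}
```

With this model, "Title" resolves to `reverse = false` and "TitleDESC" to `reverse = true`, which matches `order by Title asc` and `order by Title desc`.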

What really puzzles me is that sorting with SortField.DOUBLE works even though that field is indexed as analyzed full text in Lucene.

I hope this post helps anyone who runs into a similar sorting problem.

【Question Discussion】:

    Tags: sorting lucene.net


    【Solution 1】:

    You don't need to store the field unless you return its data in your query results. It is still added to the index, though.

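    The reason you don't analyze a field you want to sort on is that the analyzer breaks the field into separate terms. That makes sorting very difficult: the document then has several words in the index for that field, and there is obviously no single value by which to sort it against the rest of the index. This applies to all field types, whether or not they consist of a single term.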

    I believe you can store the field, but there's no need unless you want to return it in your query results.
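The point of this answer can be checked with a simplified model. The C# below is not Lucene code: `TokenizedSortModel` uses a crude lowercase-and-split-on-spaces tokenizer in place of StandardAnalyzer, and it assumes that when a field has several terms per document, the sort cache keeps only one of them, modelled here as the highest term in ordinal order. That is my reading of Lucene 2.9's string field cache, which may also reject such a field outright. Under those assumptions, sorting by the analyzed title orders documents by one stray word instead of by the whole title.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model (not Lucene code): sorting on a tokenized field versus
// sorting on a single untokenized value.
public static class TokenizedSortModel
{
    // Crude stand-in for an analyzer: lowercase and split on spaces.
    public static string[] Tokenize(string text)
    {
        return text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Assumption: with several terms per document only one survives as the sort
    // value; modelled as the highest token in ordinal order.
    public static string TokenizedKey(string title)
    {
        return Tokenize(title).OrderBy(t => t, StringComparer.Ordinal).LastOrDefault() ?? "";
    }

    public static List<string> SortByTokenizedField(IEnumerable<string> titles)
    {
        return titles.OrderBy(t => TokenizedKey(t), StringComparer.Ordinal).ToList();
    }

    // Untokenized field: the whole value is one term, so it is the sort key.
    public static List<string> SortByUntokenizedField(IEnumerable<string> titles)
    {
        return titles.OrderBy(t => t, StringComparer.Ordinal).ToList();
    }
}
```

With the titles "Apple Zoo" and "Banana Bread", the untokenized sort gives Apple Zoo, Banana Bread, while the tokenized one gives Banana Bread, Apple Zoo, because the documents are compared on "bread" versus "zoo".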


      【Solution 2】:

      One important thing I've learned about sorting:

      It doesn't work on tokenized (analyzed) data.

      【Discussion】:

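      • So... add a non-analyzed field for the data you want to sort on and sort using that field; if you also want the data analyzed, you may have to duplicate the field.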