【Question title】: Elastic Search to search for words that starts with phrase
【Posted】: 2014-04-02 05:19:22
【Question】:

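I am trying to build search functionality for my website using Elastic Search and NEST. You can see my code below. I get results if I search for a complete (or almost complete) word: if I search for "Buttermilk" or "Buttermil", I get a hit on the document containing the word "Buttermilk".

What I am trying to accomplish, however, is that a search for "Butter" returns all three documents, since each of them contains a word starting with "Butter". I thought this was what FuzzyLikeThis was for?

Can anyone see what I am doing wrong and point me in the right direction?

I created a console application; the complete code is here: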

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Nest;
using Newtonsoft.Json;

namespace ElasticSearchTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var indexSettings = new IndexSettings();
            indexSettings.Analysis.Analyzers["text-en"] = new SnowballAnalyzer { Language = "English" };

            ElasticClient.CreateIndex("elastictesting", indexSettings);

            var testItem1 = new TestItem {
                Id = 1,
                Name = "Buttermilk"
            };
            ElasticClient.Index(testItem1, "elastictesting", "TestItem", testItem1.Id);

            var testItem2 = new TestItem {
                Id = 2,
                Name = "Buttercream"
            };
            ElasticClient.Index(testItem2, "elastictesting", "TestItem", testItem2.Id);

            var testItem3 = new TestItem {
                Id = 3,
                Name = "Butternut"
            };
            ElasticClient.Index(testItem3, "elastictesting", "TestItem", testItem3.Id);

            Console.WriteLine("Write search phrase:");
            var searchPhrase = Console.ReadLine();
            var searchResults = Search(searchPhrase);

            Console.WriteLine("Number of search results: " + searchResults.Count());
            foreach (var item in searchResults) {
                Console.WriteLine(item.Name);
            }

            Console.WriteLine("Press any key to exit");
            Console.ReadKey();
        }

        private static List<TestItem> Search(string searchPhrase)
        {
            var query = BuildQuery(searchPhrase);

            var result = ElasticClient
                .Search(query)
                .Documents
                .Select(d => d)
                .Distinct()
                .ToList();

            return result;
        }

        public static ElasticClient ElasticClient
        {
            get
            {
                var localhost = new Uri("http://localhost:9200");
                var setting = new ConnectionSettings(localhost);
                setting.SetDefaultIndex("elastictesting");
                return new ElasticClient(setting);
            }
        }

        private static SearchDescriptor<TestItem> BuildQuery(string searchPhrase)
        {
            var querifiedKeywords = string.Join(" AND ", searchPhrase.Split(' '));

            var filters = new BaseFilter[1];

            filters[0] = Filter<TestItem>.Bool(b => b.Should(m => m.Query(q =>
                q.FuzzyLikeThis(flt =>
                    flt.OnFields(new[] {
                        "name"
                    }).LikeText(querifiedKeywords)
                    .PrefixLength(2)
                    .MaxQueryTerms(1)
                    .Boost(2))
                )));

            var searchDescriptor = new SearchDescriptor<TestItem>()
                .Filter(f => f.Bool(b => b.Must(filters)))
                .Index("elastictesting")
                .Type("TestItem")
                .Size(500);

            var jsons = JsonConvert.SerializeObject(searchDescriptor, new JsonSerializerSettings { NullValueHandling = NullValueHandling.Ignore });
            return searchDescriptor;
        }
    }

    class TestItem {
        public int Id { get; set; }
        [ElasticProperty(Analyzer = "text-en", Index = FieldIndexOption.analyzed)]
        public string Name { get; set; }
    }
}

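Edit 2014-04-01 11:18

Well, I ended up using MultiMatch and QueryString, so this is what my code looks like now. I hope it helps someone in the future. I also added a Description property to my TestItem to illustrate the multi-match.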

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Nest;
using Newtonsoft.Json;

namespace ElasticSearchTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var indexSettings = new IndexSettings();

            ElasticClient.CreateIndex("elastictesting", indexSettings);

            var testItem1 = new TestItem {
                Id = 1,
                Name = "Buttermilk",
                Description = "butter with milk"
            };
            ElasticClient.Index(testItem1, "elastictesting", "TestItem", testItem1.Id);

            var testItem2 = new TestItem {
                Id = 2,
                Name = "Buttercream",
                Description = "Butter with cream"
            };
            ElasticClient.Index(testItem2, "elastictesting", "TestItem", testItem2.Id);

            var testItem3 = new TestItem {
                Id = 3,
                Name = "Butternut",
                Description = "Butter with nut"
            };
            ElasticClient.Index(testItem3, "elastictesting", "TestItem", testItem3.Id);

            Console.WriteLine("Write search phrase:");
            var searchPhrase = Console.ReadLine();
            var searchResults = Search(searchPhrase);

            Console.WriteLine("Number of search results: " + searchResults.Count());
            foreach (var item in searchResults) {
                Console.WriteLine(item.Name);
                Console.WriteLine(item.Description);
            }

            Console.WriteLine("Press any key to exit");
            Console.ReadKey();
        }

        private static List<TestItem> Search(string searchPhrase)
        {
            var query = BuildQuery(searchPhrase);

            var result = ElasticClient
                .Search(query)
                .Documents
                .Select(d => d)
                .Distinct()
                .ToList();

            return result;
        }

        public static ElasticClient ElasticClient
        {
            get
            {
                var localhost = new Uri("http://localhost:9200");
                var setting = new ConnectionSettings(localhost);
                setting.SetDefaultIndex("elastictesting");
                return new ElasticClient(setting);
            }
        }

        private static SearchDescriptor<TestItem> BuildQuery(string searchPhrase)
        {
            var searchDescriptor = new SearchDescriptor<TestItem>()
                .Query(q => q
                    .MultiMatch(m =>
                    m.OnFields(new[] {
                        "name",
                        "description"
                    }).QueryString(searchPhrase).Type(TextQueryType.PHRASE_PREFIX)
                    )
                )
                .Index("elastictesting")
                .Type("TestItem")
                .Size(500);

            var jsons = JsonConvert.SerializeObject(searchDescriptor, new JsonSerializerSettings { NullValueHandling = NullValueHandling.Ignore });

            return searchDescriptor;
        }
    }

    class TestItem {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Description { get; set; }
    }
}
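
For readers who want to try the edited query without NEST, the request body below is a rough REST counterpart of what the `MultiMatch` call above is expected to send. It is a sketch rather than captured NEST output: the JSON that NEST actually serializes may differ by version, and the `ES_URL` variable and the commented-out `curl` line are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch: REST form of a multi_match query of type phrase_prefix.
# Index and field names come from the question; ES_URL is an assumed variable.
ES_URL="${ES_URL:-http://localhost:9200}"
SEARCH_PHRASE="Butter"

BODY=$(cat <<EOF
{
  "size": 500,
  "query": {
    "multi_match": {
      "query": "${SEARCH_PHRASE}",
      "fields": ["name", "description"],
      "type": "phrase_prefix"
    }
  }
}
EOF
)

# Print the body so it can be inspected before sending.
printf '%s\n' "$BODY"

# Uncomment to send it to a running cluster
# (newer Elasticsearch versions require the Content-Type header):
# curl -XPOST "$ES_URL/elastictesting/_search" -H 'Content-Type: application/json' -d "$BODY"
```

Match-type queries analyze the search text, so with a lowercasing analyzer on the fields the capitalised `Butter` is lowercased before matching, and the last term of the phrase is treated as a prefix.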

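【Question comments】:

  • I'm not quite clear on what you are trying to accomplish. Do you want to search for "Butter" and get back results containing "buttermilk"?
  • When I search for "Butter", I want to get all the words that start with "Butter". So in my sample code it should return all three, since they all start with "Butter".

Tags: c# elasticsearch nest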


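【Solution 1】:

Instead of a fuzzy-like-this query, use a prefix query; it is faster and more accurate. For more information, see the Elasticsearch documentation on the prefix query.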

curl -XPOST "http://localhost:9200/try/indextype/_search" -d'
{
"query": {
    "prefix": {
       "field": {
          "value": "Butter"
       }
    }
}
}'

Build the above query in NEST and try again.
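
One detail worth checking when trying the query above against the question's index: a prefix query does not analyze its value, while an analyzed `name` field typically stores lowercased terms such as `buttermilk`, so a capitalised `Butter` may match nothing. The sketch below lowercases the input before building the body. It assumes the `elastictesting` index and `name` field from the question; `ES_URL` and the commented-out `curl` line are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: prefix query against the question's index, with the value lowercased by hand.
ES_URL="${ES_URL:-http://localhost:9200}"
USER_INPUT="Butter"

# Prefix queries are term-level: the value is compared to indexed terms as-is,
# so it is lowercased here to line up with a lowercasing analyzer.
PREFIX=$(printf '%s' "$USER_INPUT" | tr '[:upper:]' '[:lower:]')

BODY=$(cat <<EOF
{
  "query": {
    "prefix": {
      "name": { "value": "${PREFIX}" }
    }
  }
}
EOF
)

# Print the body so it can be inspected before sending.
printf '%s\n' "$BODY"

# Uncomment to send it to a running cluster
# (newer Elasticsearch versions require the Content-Type header):
# curl -XPOST "$ES_URL/elastictesting/_search" -H 'Content-Type: application/json' -d "$BODY"
```

If the field were mapped as not analyzed instead, the stored term would keep its original casing and the lowercasing step would have to be dropped.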

【Comments】:

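【Solution 2】:

This has nothing to do with FuzzyLikeThis.

You can use the prefix query out of the box, as @BlackPOP suggested. Alternatively, you can use EdgeNGrams, which tokenize your input at index time. The result is faster query performance than the prefix query, at the cost of a larger index.

One thing to keep in mind is that the prefix query does not analyze its input, so it only works reliably on non-analyzed fields; if you want any analysis at index time, you are probably better off with EdgeNGrams.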
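
To make the EdgeNGrams option more concrete, here is a hedged sketch of index settings that define an edge n-gram token filter, followed by a small loop that prints the kind of terms such a filter would emit for one word. The names `autocomplete_filter` and `autocomplete` and the `min_gram`/`max_gram` values are hypothetical. Older Elasticsearch releases documented the filter type as `edgeNGram`, and the field mapping that attaches the analyzer differs between versions, so it is left out here.

```shell
#!/bin/sh
# Hypothetical analysis settings: filter/analyzer names and gram sizes are illustrative.
SETTINGS=$(cat <<'EOF'
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  }
}
EOF
)
printf '%s\n' "$SETTINGS"

# Emulate what the filter would produce for one lowercased token:
# every leading substring from min_gram up to max_gram characters.
edge_ngrams() {
  word=$1; min=$2; max=$3
  len=${#word}
  i=$min
  while [ "$i" -le "$len" ] && [ "$i" -le "$max" ]; do
    printf '%s\n' "$(printf '%s' "$word" | cut -c1-"$i")"
    i=$((i + 1))
  done
}

edge_ngrams "buttermilk" 2 10
```

With such an analyzer applied at index time and a plain analyzer at search time, an ordinary match query for `butter` can hit the stored `butter` term directly, which is where the trade of index size for query speed comes from.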

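Read up on analyzers and related concepts if you are not familiar with them. Some references:

See "How can I do a prefix search in ElasticSearch in addition to a generic query string?" for a similar question.

【Comments】:

  • Thanks for clarifying and pointing me in the right direction. I'll look into it!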