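Posted: 2019-03-30 12:31:53
Question:
The text I'm analyzing in Elasticsearch 6 contains many numbers I'm not interested in, but I can't figure out how to remove them. When I search the tokens, the results include postal codes, times, and years. There are only a few distinct years, so I could add those to the stop words, but there are far too many other numbers to filter out that way.
I did try writing a custom filter: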
"char_filter": {
  "number_filter": {
    "type": "pattern_replace",
    "pattern": "\\d+",
    "replacement": " "
  }
}
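As a rough illustration of what this `pattern_replace` char filter is meant to do, the Python sketch below applies the same regular expression with `re.sub` before splitting the text into tokens. It is only an approximation under stated assumptions: `rough_tokens` is a hypothetical stand-in for the standard tokenizer (lowercase, split on non-word characters), and the sample sentence is made up.

```python
import re

# Same pattern and replacement as the "number_filter" char filter above.
NUMBER_PATTERN = re.compile(r"\d+")


def strip_numbers(text: str) -> str:
    """Replace every run of digits with a single space, as pattern_replace would."""
    return NUMBER_PATTERN.sub(" ", text)


def rough_tokens(text: str) -> list[str]:
    """Hypothetical stand-in for the standard tokenizer: lowercase, split on non-word chars."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


sample = "Reported personally at 10:45 in 2011, zip 90210."
print(rough_tokens(strip_numbers(sample)))  # ['reported', 'personally', 'at', 'in', 'zip']
```

Because the char filter runs before the tokenizer, the digits never reach the tokenizer at all, so no numeric tokens can be produced from them.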
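But when I try to add it to the settings, I get the following error:
Failed to get setting group for [index.analysis.analyzer.] setting prefix and setting [index.analysis.analyzer.char_filter] because of a missing '.'
Here is the full settings section of my configuration (note: it worked before I added the number replacer):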
"settings": {
  "analysis": {
    "analyzer": {
      "t_analyzer": {
        "tokenizer": "t_tokenizer"
      },
      "major_words_analyzer": {
        "type": "standard",
        "stopwords": "_english_"
      },
      "char_filter": [
        "number_filter"
      ]
    },
    "tokenizer": {
      "t_tokenizer": {
        "type": "standard"
      }
    },
    "char_filter": {
      "number_filter": {
        "type": "pattern_replace",
        "pattern": "\\d+",
        "replacement": " "
      }
    }
  }
}
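One plausible reading of the error: every key under `analysis.analyzer` is treated as the name of an analyzer, and each analyzer definition is expected to be an object. In the block above, the `char_filter` list sits directly under `analyzer`, as a sibling of `t_analyzer`, so it looks like an analyzer named `char_filter` whose definition is an array, which matches the `[index.analysis.analyzer.char_filter]` in the message. The Python sketch below checks a settings dict for such entries; `misplaced_analyzer_entries` is a hypothetical helper, not part of any Elasticsearch client, and it assumes the settings have been loaded as a plain dict. In a custom analyzer, `char_filter` belongs inside the analyzer's own definition, which the last lines illustrate.

```python
from typing import Any


def misplaced_analyzer_entries(settings: dict[str, Any]) -> list[str]:
    """Return names under analysis.analyzer whose value is not an object (dict).

    Hypothetical helper: assumes each key under "analyzer" should be an
    analyzer name mapped to its definition object.
    """
    root = settings.get("settings", settings)
    analyzers = root.get("analysis", {}).get("analyzer", {})
    return [name for name, definition in analyzers.items() if not isinstance(definition, dict)]


# The settings from the question, as a Python dict.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "t_analyzer": {"tokenizer": "t_tokenizer"},
                "major_words_analyzer": {"type": "standard", "stopwords": "_english_"},
                # Sits beside the analyzers instead of inside one of them.
                "char_filter": ["number_filter"],
            },
            "tokenizer": {"t_tokenizer": {"type": "standard"}},
            "char_filter": {
                "number_filter": {"type": "pattern_replace", "pattern": "\\d+", "replacement": " "}
            },
        }
    }
}

print(misplaced_analyzer_entries(settings))  # ['char_filter']

# Move the list into the t_analyzer definition so it is read as that analyzer's char filters.
analyzers = settings["settings"]["analysis"]["analyzer"]
analyzers["t_analyzer"]["char_filter"] = analyzers.pop("char_filter")
print(misplaced_analyzer_entries(settings))  # []
```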
Edit: here are the relevant field mappings:
"narrative": {
  "type": "text",
  "store": "true",
  "analyzer": "t_analyzer",
  "fielddata": "true",
  "fields": {
    "raw": {
      "type": "text"
    }
  }
},
"narrativePhrases": {
  "type": "text",
  "analyzer": "major_words_analyzer",
  "fielddata": "true",
  "fields": {
    "keyword": {
      "type": "keyword"
    }
  }
},
Edit: here is the query I run afterwards:
POST /test_narrative/_search?size=0
{
  "aggs": {
    "incidents_by_month": {
      "date_histogram": {
        "field": "eventDate",
        "interval": "month",
        "min_doc_count": 5
      },
      "aggs": {
        "top_phrases": {
          "significant_text": {
            "field": "narrative",
            "size": 10
          }
        }
      }
    }
  }
}
But my results still contain numbers:
{
  "key": "personally",
  "doc_count": 3,
  "score": 5.22625236294896,
  "bg_count": 36
},
{
  "key": "2011",
  "doc_count": 4,
  "score": 2.4786045712321703,
  "bg_count": 132
}
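If the analyzer can't be changed right away, a client-side stopgap is to drop purely numeric keys from the returned buckets. This Python sketch assumes the buckets have already been parsed from the JSON response into a list of dicts shaped like the ones above; `drop_numeric_buckets` is a hypothetical name. It only hides the numbers after the fact: they still occupy slots in the top `size` results, so fewer than 10 real terms may come back.

```python
def drop_numeric_buckets(buckets: list[dict]) -> list[dict]:
    """Keep only buckets whose key is not made up entirely of digits."""
    return [b for b in buckets if not b["key"].isdigit()]


# The two buckets shown above.
buckets = [
    {"key": "personally", "doc_count": 3, "score": 5.22625236294896, "bg_count": 36},
    {"key": "2011", "doc_count": 4, "score": 2.4786045712321703, "bg_count": 132},
]
print([b["key"] for b in drop_numeric_buckets(buckets)])  # ['personally']
```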
Tags: elasticsearch elasticsearch-6