IllegalArgumentException - Only <= 256 finite strings are supported

【Question title】: IllegalArgumentException - Only <= 256 finite strings are supported
【Posted】: 2015-11-17 15:00:26
【Question】:

I get this error when indexing my data. After some research I found out why it can happen and read that I should increase max_token_length, so I did, but I still get the same error: TokenStream expanded to 912 finite strings. Only <= 256 finite strings are supported

Here are my analyzer settings:

"settings": {
    "index": {
        "analysis": {
            "analyzer": {
                "shingle_analyzer": {
                    "tokenizer": "standard",
                    "max_token_length": 920,
                    "filter": ["lowercase", "shingle_filter", "asciifolding"],
                    "char_filter": ["html_strip"],
                    "type": "custom"
                },
                "html_analyzer": {
                    "tokenizer": "standard",
                    "max_token_length": 920,
                    "filter": ["lowercase", "asciifolding"],
                    "char_filter": ["html_strip"],
                    "type": "custom"
                }
            },
            "tokenizer": {
                "standard": {
                    "type": "standard"
                }
            },
            "filter": {
                "shingle_filter": {
                    "min_shingle_size": 2,
                    "max_shingle_size": 5,
                    "type": "shingle"
                }
            }
        }
    }
}

Here is an example of what I am trying to insert:

POST /my_index/my_type/{id}
{
    "myField":{
        "input":"Abcdefghij kl Mnopqrstwx yz Abcdef g Hijklmno pq Rstwxy Zabc (DEF)",
        "weight":2,
        "payload":{
            "iD":"2786129"
        }
    }
}

Here is the mapping for the my_type properties:

"Suggestion": {
    "properties": {
        "id": {
            "index": "not_analyzed",
            "type": "integer"
        },
        "myField": {
            "type": "completion",
            "analyzer": "shingle_analyzer",
            "search_analyzer": "shingle_analyzer",
            "max_input_length": 150,
            "payloads": true
        }
    }
}

What am I missing?

I would appreciate any help or pointers toward solving this. Thanks!

Edit: corrected the missing enclosing analyzer block.

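【Comments】:

Note that your index settings are missing an enclosing "analyzer": {...} section to wrap your custom analyzers. See structure of custom analyzers: analyzer, tokenizer, and filter all sit inside the analysis structure.
Oh, sorry, that was just a mistake when writing the post. I actually do have them all wrapped in the analyzer setting.

Tags: elasticsearch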

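【Solution 1】:

Well, I came up with a solution a few days ago. I simply removed the max_token_length property from the analyzer settings and reduced max_input_length in the mapping for my field. That seems to have solved my problem, but honestly I am not sure why the error happened or why this fixes it. If anyone has an idea, please feel free to share your knowledge :)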
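
For reference, here is roughly what the settings and mapping from the question look like with those two changes applied. This is only a sketch assembled from the snippets above: the value 25 for max_input_length is the one mentioned in the comments below, and wrapping everything in a single body with "settings" and "mappings" is an assumption about how the index is created. A plausible explanation for why it helps is that a shorter input produces fewer tokens, and therefore fewer shingle combinations for the completion field to expand, but I have not verified that.

```json
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "shingle_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "char_filter": ["html_strip"],
            "filter": ["lowercase", "shingle_filter", "asciifolding"]
          }
        },
        "filter": {
          "shingle_filter": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 5
          }
        }
      }
    }
  },
  "mappings": {
    "Suggestion": {
      "properties": {
        "myField": {
          "type": "completion",
          "analyzer": "shingle_analyzer",
          "search_analyzer": "shingle_analyzer",
          "max_input_length": 25,
          "payloads": true
        }
      }
    }
  }
}
```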

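【Comments】:

Does the shingle analyzer have any effect on the suggestion results? As far as I know, it should not affect the results returned, since the completion suggester works on the whole phrase. Also, what did you set max_input_length to?
It makes a difference when handling accents. When I queried without accents, a field with only the completion type did not match, but once I added the shingle_analyzer it suggested results whether or not the accent was present. In my case I just lowered max_input_length to 25, and that fixed it.
I suspect the accent fix is due to the asciifolding filter rather than the shingle filter. If you remove the shingle filter from the analyzer, you should not run into the 256 problem.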
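
A sketch of what that last suggestion might look like, assuming accent folding is all that is needed for the suggestions. The analyzer name folding_analyzer is made up for illustration; its filter chain is the same as the html_analyzer in the question, that is, the original chain minus shingle_filter. I have not tested this against the data in the question.

```json
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "folding_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "char_filter": ["html_strip"],
            "filter": ["lowercase", "asciifolding"]
          }
        }
      }
    }
  },
  "mappings": {
    "Suggestion": {
      "properties": {
        "myField": {
          "type": "completion",
          "analyzer": "folding_analyzer",
          "search_analyzer": "folding_analyzer",
          "payloads": true
        }
      }
    }
  }
}
```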