elasticsearch - what to do with unassigned shards

【Question title】: elasticsearch - what to do with unassigned shards
【Posted】: 2014-07-02 14:45:44
【Question】:

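My cluster is in yellow status because some shards are unassigned. What should I do about this?

I tried setting cluster.routing.allocation.disable_allocation = false on all indices, but I think this does not work because I am using version 1.1.1.

I also tried restarting all the machines, but the same thing happens.

Any ideas?

EDIT:

Cluster stats: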

{ 
  cluster_name: "elasticsearch",
  status: "red",
  timed_out: false,
  number_of_nodes: 5,
  number_of_data_nodes: 4,
  active_primary_shards: 4689,
  active_shards: 4689,
  relocating_shards: 0,
  initializing_shards: 10,
  unassigned_shards: 758
}

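【Question comments】:

Please post the output of _cluster/health and _stats.
What do those endpoints tell you about the problem? /health reports 756 unassigned shards.
Use a GET call to fetch _cluster/health and _stats.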
{ cluster_name: "elasticsearch", status: "red", timed_out: false, number_of_nodes: 5, number_of_data_nodes: 4, active_primary_shards: 4689, active_shards: 4689, relocating_shards: 0, initializing_shards: 10, unassigned_shards : 758 }
This worked for me: stackoverflow.com/a/63777546/5756620. Run POST _cluster/reroute?retry_failed, as described at elastic.co/guide/en/elasticsearch/reference/6.8/…

Tags: elasticsearch

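【Solution 1】:

There are many possible reasons why allocation does not happen:

You are running different versions of Elasticsearch on different nodes.
You have only one node in your cluster, but your number of replicas is set to a non-zero value.
You have insufficient disk space.
You have shard allocation disabled.
You have a firewall or SELinux enabled. With SELinux enabled but not configured properly, you will see shards stuck in INITIALIZING or RELOCATING forever.

As a general rule, you can troubleshoot things like this:

Look at the nodes in your cluster: curl -s 'localhost:9200/_cat/nodes?v'. If you only have one node, you need to set number_of_replicas to 0 (see the ES documentation or the other answers).
Look at the disk space available in your cluster: curl -s 'localhost:9200/_cat/allocation?v'
Check the cluster settings: curl 'http://localhost:9200/_cluster/settings?pretty' and look for the cluster.routing settings.
Look at which shards are UNASSIGNED: curl -s 'localhost:9200/_cat/shards?v' | grep UNASS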
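
The last check above, filtering the `_cat/shards` output, can also be done in a script. The following is a minimal Python sketch, not part of the original answer. It assumes the default column order of `_cat/shards?v` (index, shard, prirep, state, ...), and the sample rows are made up for illustration.

```python
def unassigned_shards(cat_shards_text):
    """Return (index, shard, prirep) for every UNASSIGNED row of `_cat/shards?v` output."""
    rows = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        # Assumed leading columns: index shard prirep state ...
        # The header row is skipped because its fourth field is "state".
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            rows.append((fields[0], int(fields[1]), fields[2]))
    return rows


# Illustrative sample only; real output comes from
# curl -s 'localhost:9200/_cat/shards?v'
sample = """\
index               shard prirep state      docs store ip        node
logstash-2016.01.31 0     p      STARTED    1200 1.1mb 10.0.0.1  node-1
logstash-2016.01.31 0     r      UNASSIGNED
logstash-2016.01.31 1     p      STARTED    1187 1.0mb 10.0.0.2  node-2
logstash-2016.01.31 1     r      UNASSIGNED
"""

for index, shard, prirep in unassigned_shards(sample):
    print(index, shard, prirep)
```

A prirep value of p means a primary is unassigned (cluster status red), while r means only a replica is missing (yellow).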

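Try to force a shard to be assigned:

curl -XPOST 'http://localhost:9200/_cluster/reroute?pretty' -d '{ "commands" : [ {
  "allocate" : {
       "index" : ".marvel-2014.05.21",
       "shard" : 0,
       "node" : "SOME_NODE_HERE",
       "allow_primary" : true
     }
  } ] }'

Look at the response and see what it says. There will be a bunch of YES entries, which are fine, and then a NO that names the reason allocation is blocked. If there are no NO entries at all, it is likely a firewall/SELinux problem.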
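
Picking the NO entries out of a long response by eye is error-prone, so here is a small Python sketch that extracts them. It is an illustration rather than part of the original answer. It assumes the 1.x error format quoted later in this thread, where each decision appears as [YES(reason)] or [NO(reason)], and the sample string is abridged from that quoted error.

```python
import re


def blocking_deciders(error_message):
    """Return the reason text of every NO(...) decision in a reroute error."""
    # Assumed format: "[NO(reason)]"; the reason may contain square brackets
    # but, in the sample this is based on, no parentheses.
    reasons = re.findall(r"\[NO\((.*?)\)", error_message, flags=re.DOTALL)
    # Collapse line wraps and repeated spaces inside a reason.
    return [" ".join(reason.split()) for reason in reasons]


# Abridged from the error output quoted in solution 7 of this thread.
sample = (
    "[YES(shard is not allocated to same node or host)]"
    "[YES(primary is already active)]"
    "[YES(total shard limit disabled: [-1] <= 0)]"
    "[NO(target node version [1.7.4] is older than source node version [1.7.5])"
    "[YES(enough disk for shard on node, free: [185.3gb])]"
)

for reason in blocking_deciders(sample):
    print("blocked:", reason)
```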

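【Comments】:

This is great, thanks - I found my problem this way. It turned out that one of my nodes was running a slightly lower version of Elasticsearch than the others, so the cluster refused to replicate those shards to it. Oh, and the difference was minor - 1.4.2 vs 1.4.4.
This tells you exactly which index is unassigned. Sometimes it is an index you thought you had deleted! That looks like a bug in ES, but at least this lets you pin down the exact reason it is unassigned. Thanks!
Thanks! After adding a new node to the cluster, I had been trying to work out why my shards weren't being assigned - the new node was running a slightly newer version than the old ones.
Thank you very much! The debugging workflow is invaluable.
Thanks for the tip! If you only have one node, you certainly don't need replicas...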

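【Solution 2】:

This is a common issue arising from the default index settings, in particular when you try to replicate on a single node. To fix it at runtime, first set the number of replicas to 0 on all existing indices:

curl -XPUT http://localhost:9200/_settings -d '{ "number_of_replicas" :0 }'

Next, enable the cluster to reallocate shards (you can always turn this on afterwards, once everything else is done):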

# Valid values are all, primaries, new_primaries and none; a bare true is rejected.
curl -XPUT http://localhost:9200/_cluster/settings -d '
{
    "transient" : {
        "cluster.routing.allocation.enable": "all"
    }
}'

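Now sit back and watch the cluster clean up the unassigned replica shards. If you want this to take effect for future indices, don't forget to modify the elasticsearch.yml file with the following setting and bounce the cluster: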

index.number_of_replicas: 0

【Comments】:

This worked for me. Windows commands for reference: curl -XPUT localhost:9200/_settings -d "{ """number_of_replicas""" :0 }" and then curl -XPUT localhost:9200/_cluster/settings -d "{ """transient""" : { """cluster.routing.allocation.enable""": true }}"
true is not a valid value for cluster.routing.allocation.enable (it raises java.lang.IllegalArgumentException: Illegal allocation.enable value [TRUE]). Valid values are all, primaries, new_primaries, or none (source: elastic.co/guide/en/elasticsearch/reference/2.4/…)
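
One way to avoid sending an invalid value is to check it before building the request body. The following Python sketch is an illustration under that assumption. The helper name allocation_enable_body and its scope parameter are hypothetical, and the list of accepted values is taken from the comment above.

```python
import json

# Accepted values as listed in the comment above.
VALID_ENABLE_VALUES = ("all", "primaries", "new_primaries", "none")


def allocation_enable_body(value, scope="transient"):
    """Build the JSON body for PUT /_cluster/settings, rejecting invalid values."""
    if value not in VALID_ENABLE_VALUES:
        raise ValueError(
            "invalid cluster.routing.allocation.enable value: {!r}".format(value)
        )
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    return json.dumps({scope: {"cluster.routing.allocation.enable": value}})


# The resulting string can be passed to curl with -d.
print(allocation_enable_body("all"))
```

Passing True, as in the commands quoted above, raises ValueError here instead of producing a request that the cluster would reject.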

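【Solution 3】:

Those unassigned shards are actually unassigned replicas of the actual shards held on the master node.

In order to assign these shards, you need to run a new instance of Elasticsearch to create a secondary node to carry the data replicas.

EDIT: Sometimes the unassigned shards belong to indices that have been deleted, making them orphan shards that will never be assigned regardless of whether nodes are added. But that is not the case here!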
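
The reasoning here is that a replica is never placed on the node that already holds another copy of the same shard, so each primary can have at most (data nodes - 1) assigned replicas. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and it assumes no other allocation constraint (disk watermarks, version mismatch, allocation filters) is in play.

```python
def unassignable_replicas(data_nodes, number_of_replicas):
    """Replica copies per primary shard that can never be assigned."""
    if data_nodes < 1 or number_of_replicas < 0:
        raise ValueError("need at least one data node and a non-negative replica count")
    # One node holds the primary, leaving data_nodes - 1 nodes for replicas.
    return max(0, number_of_replicas - (data_nodes - 1))


# Single node with the default of one replica: that replica stays unassigned.
print(unassignable_replicas(data_nodes=1, number_of_replicas=1))
# Adding a second node gives the replica somewhere to go.
print(unassignable_replicas(data_nodes=2, number_of_replicas=1))
```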

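【Comments】:

Thanks, I think I understand. Are those shards unassigned because of the maximum number of shards per node?
You're welcome. What is your maximum number of shards per node?
index.routing.allocation.total_shards_per_node = -1 (the default)
I have 1800 indices, some with 2 shards and some with 10. They are all distributed across 4 data machines, each with 8 GB of RAM and an 80 GB SSD.
How many nodes do you have? Can you post a screenshot of your elasticsearch-head plugin?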

【Solution 4】:

The only thing that worked for me was changing number_of_replicas (I had 2 replicas, so I changed it to 1 and then changed it back to 2).

First:

PUT /myindex/_settings
{
    "index" : {
        "number_of_replicas" : 1
     }
}

Then:

PUT /myindex/_settings
{
    "index" : {
        "number_of_replicas" : 2
     }
}
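
The two requests above differ only in the replica count, so they can be generated from one helper. This is a small Python sketch of that idea. The function name replica_toggle_requests is hypothetical, and it only builds the requests rather than sending them.

```python
import json


def replica_toggle_requests(index, original, temporary):
    """Return the two (method, path, body) requests that bounce number_of_replicas."""
    def settings(count):
        return json.dumps({"index": {"number_of_replicas": count}})

    path = "/{}/_settings".format(index)
    # First drop to the temporary count, then restore the original one.
    return [("PUT", path, settings(temporary)), ("PUT", path, settings(original))]


for method, path, body in replica_toggle_requests("myindex", original=2, temporary=1):
    print(method, path, body)
```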

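【Comments】:

I had about 20 unassigned shards and one empty node (out of 6). Setting "number_of_replicas" to 1 on one of them and then back to 2 seemed to shake things loose, and all the unassigned replicas moved to the empty node.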

【Solution 5】:

The first 2 points of Alcanzar's answer did it for me, but I had to add

"allow_primary" : true

like this:

curl -XPOST 'http://localhost:9200/_cluster/reroute?pretty' -d '{
  "commands": [
    {
      "allocate": {
        "index": ".marvel-2014.05.21",
        "shard": 0,
        "node": "SOME_NODE_HERE",
        "allow_primary": true
      }
    }
  ]
}'

【Comments】:

【Solution 6】:

For newer ES versions, this should do the trick (run it in Kibana DevTools):

PUT /_cluster/settings
{
  "transient" : {
    "cluster.routing.rebalance.enable" : "all"
  }
}

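However, this won't fix the root cause. In my case, there were lots of unassigned shards because the default number of replicas is 1, but I was really only using a single node. So I also added this line to my elasticsearch.yml (this works only before Elasticsearch 5.0; from 5.0 onward, index-level settings are rejected in elasticsearch.yml and have to be set through an index template or the index settings API):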

index.number_of_replicas: 0

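【Comments】:

【Solution 7】:

Check that the Elasticsearch version is the same on every node. If it is not, ES will not allocate replica copies of an index to "older" nodes.

Using @Alcanzar's answer, you can get some diagnostic error messages back: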

curl -XPOST 'http://localhost:9200/_cluster/reroute?pretty' -d '{
  "commands": [
    {
      "allocate": {
        "index": "logstash-2016.01.31",
        "shard": 1,
        "node": "arc-elk-es3",
        "allow_primary": true
      }
    }
  ]
}'

The result is:

{
  "error" : "ElasticsearchIllegalArgumentException[[allocate] allocation of
            [logstash-2016.01.31][1] on node [arc-elk-es3]
            [Xn8HF16OTxmnQxzRzMzrlA][arc-elk-es3][inet[/172.16.102.48:9300]]{master=false} is not allowed, reason:
            [YES(shard is not allocated to same node or host)]
            [YES(node passes include/exclude/require filters)]
            [YES(primary is already active)]
            [YES(below shard recovery limit of [2])]
            [YES(allocation disabling is ignored)]
            [YES(allocation disabling is ignored)]
            [YES(no allocation awareness enabled)]
            [YES(total shard limit disabled: [-1] <= 0)]
            *** [NO(target node version [1.7.4] is older than source node version [1.7.5]) ***
            [YES(enough disk for shard on node, free: [185.3gb])]
            [YES(shard not primary or relocation disabled)]]",
  "status" : 400
}

How to determine the version number of Elasticsearch:

adminuser@arc-elk-web:/var/log/kibana$ curl -XGET 'localhost:9200'
{
  "status" : 200,
  "name" : "arc-elk-web",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.5",
    "build_hash" : "00f95f4ffca6de89d68b7ccaf80d148f1f70e4d4",
    "build_timestamp" : "2016-02-02T09:55:30Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
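
Checking each server by hand gets tedious on larger clusters. As a hedged alternative, the output of curl -s 'localhost:9200/_cat/nodes?h=version,name' lists every node's version in one call, and the Python sketch below groups node names by version. The function name is hypothetical, and the sample reuses the two node names and versions that appear in this answer.

```python
from collections import defaultdict


def nodes_by_version(cat_nodes_text):
    """Group node names by version from `_cat/nodes?h=version,name` output."""
    groups = defaultdict(list)
    for line in cat_nodes_text.splitlines():
        # Version comes first so that node names containing spaces stay intact.
        parts = line.split(None, 1)
        if len(parts) == 2:
            version, name = parts
            groups[version].append(name.strip())
    return dict(groups)


# Node names and versions taken from the output quoted in this answer.
sample = """\
1.7.5 arc-elk-web
1.7.4 arc-elk-es3
"""

versions = nodes_by_version(sample)
if len(versions) > 1:
    print("mixed versions:", versions)
```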

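In my case, I had set up the apt-get repository incorrectly, and the versions got out of sync across the servers. I corrected it on all the servers with: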

echo "deb http://packages.elastic.co/elasticsearch/1.7/debian stable main" | sudo tee -a /etc/apt/sources.list

Then the usual:

sudo apt-get update
sudo apt-get upgrade

And a final server restart.

【Comments】: