【Question title】: Docker Engine fails on Azure Batch node
【Posted】: 2017-10-18 23:05:33
【Question description】:

Scenario

I created a pool with several nodes (the base image is Ubuntu Server 16.04) and supplied the following start task command:

/bin/bash -c 'set -o pipefail; export DEBIAN_FRONTEND=noninteractive ; sudo -E apt update ; sudo -E apt upgrade -y ; sudo -E apt-get install -y --no-install-recommends apt-transport-https curl software-properties-common ; curl -fsSL "https://sks-keyservers.net/pks/lookup?op=get&search=0xee6d536cf7dc86e2d7d56f59a178ac6c6238f52e" | sudo -E apt-key add - ; sudo -E apt-add-repository "deb https://packages.docker.com/1.13/apt/repo/ ubuntu-$(lsb_release -cs) main" ; sudo -E apt-get update ; sudo -E apt-get install -y docker-engine ; sudo usermod -a -G docker $USER ; sudo -E service docker start ; journalctl -xe; wait'

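The sole purpose of this command is to install Docker Engine. Note also that I removed the set -e option so that journalctl -xe would still run after a failure and capture the error shown below.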
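
Because set -e was dropped only so that the journal could be printed after a failure, one alternative is to wrap the failing step so that diagnostics are printed only when it fails and the step's exit code is preserved. The following is a minimal sketch under that assumption; `run_or_diagnose` and `diagnose` are hypothetical helper names that do not appear in the original command, and the sketch assumes a systemd host where `journalctl` is available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original start task).

# Print the most recent journal entries for the Docker unit only.
diagnose() {
    journalctl -u docker.service --no-pager -n 100
}

# Run the given command; on failure, print diagnostics to stderr
# and return the command's original exit code.
run_or_diagnose() {
    local rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "step failed (exit $rc): $*" >&2
        diagnose >&2 || true
    fi
    return "$rc"
}
```

In a start task this could be used as `run_or_diagnose sudo -E service docker start` in place of the trailing `journalctl -xe`, so that a failed start still surfaces as a non-zero exit code.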

Error

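When the pool described above is created, the start task fails on some of the nodes. The behavior appears to be random: it is not always the same node that fails, and the other nodes in the pool start without problems. It does not depend on the node size either (I tried D2_v3 and NC6).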

This is the output of journalctl -xe:

Oct 12 09:19:40 7d8bb094c57c400582f6031d59f1630000000A systemd[1]: Listening on Docker Socket for the API.
-- Subject: Unit docker.socket has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker.socket has finished starting up.
-- 
-- The start-up result is done.
Oct 12 09:19:40 7d8bb094c57c400582f6031d59f1630000000A systemd[1]: Starting Docker Application Container Engine...
-- Subject: Unit docker.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker.service has begun starting up.
Oct 12 09:19:40 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:40.605332263Z" level=info msg="libcontainerd: new containerd process, pid: 24492"
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:41.608293321Z" level=info msg="[graphdriver] using prior storage driver: aufs"
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:41.626089049Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:41.626378756Z" level=warning msg="Your kernel does not support swap memory limit"
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:41.626558660Z" level=warning msg="Your kernel does not support cgroup rt period"
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:41.626698864Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:41.626834867Z" level=warning msg="Your kernel does not support cgroup blkio weight"
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:41.626970070Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:41.627384080Z" level=info msg="Loading containers: start."
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:41.630900065Z" level=info msg="Firewalld running: false"
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:41.661877309Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A kernel: IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
Oct 12 09:19:41 7d8bb094c57c400582f6031d59f1630000000A dockerd[24473]: time="2017-10-12T09:19:41.996853856Z" level=info msg="Loading containers: done."
Oct 12 09:19:42 7d8bb094c57c400582f6031d59f1630000000A kernel: aufs au_opts_verify:1585:dockerd[24490]: dirperm1 breaks the protection by the permission bits on the lower branch
Oct 12 09:19:45 7d8bb094c57c400582f6031d59f1630000000A systemd[1]: docker.service: Main process exited, code=killed, status=11/SEGV
Oct 12 09:19:45 7d8bb094c57c400582f6031d59f1630000000A systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit docker.service has failed.
-- 
-- The result is failed.
Oct 12 09:19:45 7d8bb094c57c400582f6031d59f1630000000A systemd[1]: docker.service: Unit entered failed state.
Oct 12 09:19:45 7d8bb094c57c400582f6031d59f1630000000A systemd[1]: docker.service: Failed with result 'signal'.

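Something seems to go wrong while the network interface is being created, but I am not sure what exactly, and above all I do not know how to fix it.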
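
The last line before the failure in the log above reports `status=11/SEGV`, meaning the daemon process was killed by a segmentation fault. When the failure is intermittent across nodes, it can help to check for that specific line automatically. The following is a minimal sketch; `docker_segfaulted` is a hypothetical helper name, and the pattern is taken from the systemd message shown in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads journal text on stdin and succeeds (exit 0)
# if systemd reported that docker.service was killed by SIGSEGV.
docker_segfaulted() {
    grep -q 'docker\.service: Main process exited, code=killed, status=11/SEGV'
}
```

For example, `journalctl -u docker.service --no-pager | docker_segfaulted && echo "dockerd crashed with SIGSEGV"` would flag an affected node.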

【Question comments】:

    Tags: azure docker azure-batch docker-engine


    【Solution 1】:

    Updated answer (2017-10-18):

    The issue has been fixed in the latest platform image of Canonical UbuntuServer 16.04-LTS, which works with Go/Docker again.

    Original answer:

    There is nothing wrong with your code. There is a known issue between the Canonical UbuntuServer 16.04-LTS 201709190 platform image (which is also latest at the time of writing) and Go/Docker.

    Until the issue is fixed, temporarily pin the image version to deploy to 201708151.
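
    Pinning the version can be sketched as follows, assuming the pool is defined through a JSON body whose image reference has publisher, offer, sku and version fields (the shape used by the Batch REST API). `image_reference_json` is a hypothetical helper name, and the version string is copied verbatim from this answer; the exact format should be checked against the versions actually published for the image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an imageReference JSON fragment for a Batch pool.
# The first argument is the image version; it defaults to "latest".
image_reference_json() {
    local version="${1:-latest}"
    printf '{\n'
    printf '  "publisher": "Canonical",\n'
    printf '  "offer": "UbuntuServer",\n'
    printf '  "sku": "16.04-LTS",\n'
    printf '  "version": "%s"\n' "$version"
    printf '}\n'
}
```

    Calling `image_reference_json 201708151` emits the pinned reference, while calling it without arguments falls back to latest, which is appropriate again once the fixed image is available.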

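    By the way: if you are using Docker with Azure Batch, you should take a look at Batch Shipyard, which provides this functionality. (Full disclosure: I am a contributor to that code.)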

    【Comments】:
