【Question Title】: Azure ARM - Custom Script Extension - Failing randomly
【Posted】: 2020-10-25 23:15:36
【Question】:

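I developed an Azure ARM template to deploy an Ubuntu Linux machine; once it is provisioned, a bash script runs to install specific software. The software involves downloading some packages and passing an input parameter from the user to complete the configuration. The problem I'm facing is that the script extension seems to work only intermittently. I deployed successfully once, and now it fails every time. This is the error returned a few seconds after the custom script starts executing: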

    {
  "code": "DeploymentFailed",
  "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.",
  "details": [
    {
      "code": "Conflict",
      "message": "{\r\n  \"status\": \"Failed\",\r\n  \"error\": {\r\n    \"code\": \"ResourceDeploymentFailure\",\r\n    \"message\": \"The resource operation completed with terminal provisioning state 'Failed'.\",\r\n    \"details\": [\r\n      {\r\n        \"code\": \"VMExtensionProvisioningError\",\r\n        \"message\": \"VM has reported a failure when processing extension 'metaport-onboard'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=1\\n[stdout]\\nReading package lists...\\nBuilding dependency tree...\\nReading state information...\\nsoftware-properties-common is already the newest version (0.96.24.32.14).\\nsoftware-properties-common set to manually installed.\\nThe following packages were automatically installed and are no longer required:\\n  grub-pc-bin linux-headers-4.15.0-121\\nUse 'sudo apt autoremove' to remove them.\\n0 upgraded, 0 newly installed, 0 to remove and 18 not upgraded.\\nReading package lists...\\nBuilding dependency tree...\\nReading state information...\\nSome packages could not be installed. 
This may mean that you have\\nrequested an impossible situation or if you are using the unstable\\ndistribution that some required packages have not yet been created\\nor been moved out of Incoming.\\nThe following information may help to resolve the situation:\\n\\nThe following packages have unmet dependencies:\\n python3-pip : Depends: python3-distutils but it is not installable\\n               Recommends: build-essential but it is not installable\\n               Recommends: python3-dev (>= 3.2) but it is not installable\\n               Recommends: python3-setuptools but it is not installable\\n               Recommends: python3-wheel but it is not installable\\n\\n[stderr]\\n+ sudo apt-get -qq -y update\\n+ sudo apt-get -q -y install software-properties-common\\n+ sudo apt-get -q -y install python3-pip\\nE: Unable to correct problems, you have held broken packages.\\nNo passwd entry for user 'mpadmin'\\n\\\"\\r\\n\\r\\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot \"\r\n      }\r\n    ]\r\n  }\r\n}"
    }
  ]
}

Below is the part of the template where I define the extension:

    {
  "type": "Microsoft.Compute/virtualMachines",
  "name": "[variables('vmName')]",
  "apiVersion": "2019-12-01",
  "location": "[variables('location')]",
  "dependsOn": [
    "[resourceId('Microsoft.Network/networkInterfaces/', variables('nicName'))]",
    "[resourceId('Microsoft.Network/virtualNetworks', parameters('virtualNetworkName'))]",
    "[resourceId('Microsoft.Network/natGateways', variables('natGatewayName'))]",
    "[resourceId('Microsoft.Network/networkSecurityGroups', variables('networkSecurityGroupName'))]"
  ],
  "properties": {
    "hardwareProfile": {
      "vmSize": "[parameters('virtualMachineSize')]"
    },
    "osProfile": {
      "computerName": "[variables('vmName')]",
      "adminUsername": "[parameters('adminUsername')]",
      "adminPassword": "[parameters('adminPasswordOrKey')]",
      "linuxConfiguration": "[if(equals(parameters('authenticationType'), 'password'), json('null'), variables('linuxConfiguration'))]"
    },
    "storageProfile": {
      "imageReference": {
        "publisher": "[variables('imagePublisher')]",
        "offer": "[variables('imageOffer')]",
        "sku": "[variables('imageSKU')]",
        "version": "[variables('imageVersion')]"
      },
      "osDisk": {
        "name": "[concat(variables('vmName'), '_OSDisk')]",
        "caching": "ReadWrite",
        "createOption": "FromImage",
        "managedDisk": {
          "storageAccountType": "[variables('storageAccountType')]"
        }
      }
    },
      "networkProfile": {
        "networkInterfaces": [
          {
            "id": "[resourceId('Microsoft.Network/networkInterfaces',variables('nicName'))]"
          }
        ]
      }
    },
    "resources": [
          {
          "name": "metaport-onboard",
          "type": "extensions",
          "apiVersion": "2019-03-01",
          "location": "[resourceGroup().location]",
          "dependsOn": [
            "[resourceId('Microsoft.Compute/virtualMachines/', variables('vmName'))]",
            "[resourceId('Microsoft.Network/networkInterfaces',variables('nicName'))]",
            "[resourceId('Microsoft.Network/virtualNetworks', parameters('virtualNetworkName'))]",
            "[resourceId('Microsoft.Network/natGateways', variables('natGatewayName'))]",
            "[resourceId('Microsoft.Network/networkSecurityGroups', variables('networkSecurityGroupName'))]"
          ],
          "properties": {
            "publisher": "Microsoft.Azure.Extensions",
            "type": "CustomScript",
            "typeHandlerVersion": "2.1",
            "autoUpgradeMinorVersion": true,
            "settings": {
              "fileUris": [
                "https://raw.githubusercontent.com/willguibr/azure/main/Latest/MetaPort-Standalone-NATGW-v1.0/install_metaport.sh"
                ]
              },
            "protectedSettings": {
              "commandToExecute": "[concat('sh install_metaport.sh ', parameters('metaTokenCode'))]"
              }
            }
          }
        ]
      }
    ]
  }

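The complete template package is here.

Does anyone have any ideas on how to prevent this issue, or on any corrections that may be needed?

【Comments】:

    Tags: azure azure-devops azure-devops-extensions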


    【Answer 1】:

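    Well, it clearly says so: the script exited with code 1. That means the script itself failed. So you need to log in to the VM and look at the extension logs in c:\windowsazure\packages\logs (or something like that; on a Linux VM such as this one, the Custom Script Extension logs to /var/log/azure/custom-script/handler.log), figure out what went wrong, and wrap it in some try\catch-style logic. Also, consider propagating errors to the console so that you can actually see them in the logs.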
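The advice about propagating errors can be sketched in shell, the language of the install script. This is only an illustration under stated assumptions: `run_step` and `require_user` are hypothetical helper names, not part of the Custom Script Extension or of the original script. The first reports the failing command and its exit status on stderr, which the extension includes in its `[stderr]` output; the second fails early when an account is missing, instead of letting a later `su` fail with "No passwd entry for user".

```shell
#!/bin/bash
# run_step (hypothetical helper): run one command; if it fails, write the
# command and its exit status to stderr, then return that same status.
run_step() {
    local status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "ERROR: '$*' exited with status ${status}" >&2
    fi
    return "$status"
}

# require_user (hypothetical helper): succeed only if the account exists.
require_user() {
    if ! id -u "$1" >/dev/null 2>&1; then
        echo "ERROR: user '$1' does not exist; an earlier install step probably failed" >&2
        return 1
    fi
}

# Possible use inside the install script (commented out so the sketch has no side effects):
# run_step sudo apt-get -q -y install python3-pip || exit 1
# require_user mpadmin || exit 1
```

Stopping at the first failed step keeps the real cause (here, the apt dependency error) as the last message in the log rather than burying it under follow-on failures.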

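    【Comments】:

    • My script is simple. The first 5 lines install the required software, and once that is done I added a "sleep 30" before continuing. Then I added a variable that takes the argument $1. This variable must carry an input parameter referenced in the ARM template. The problem I'm seeing is that it works only intermittently. Any suggestions? #!/bin/bash sudo mkdir build sudo wget -q -Obuild/build_mp.sh s3.amazonaws.com/public.nsof.io/lxd/metaport-install.sh sudo chmod +x build/build_mp.sh sudo ./build/build_mp.sh sleep 30 metaTokenCode=$1 su mpadmin -c "metaport onboard $metaTokenCode"
    • Wrap it in retry logic; this issue has nothing to do with the script extension, the ARM template, or Azure.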
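A minimal sketch of the retry logic suggested in the last comment, assuming a hypothetical helper named `retry` that takes a maximum number of attempts, a delay in seconds, and the command to run. Retrying helps with transient failures such as a briefly unreachable package mirror; it does not fix a package that is genuinely not installable.

```shell
#!/bin/bash
# retry (hypothetical helper): retry <max_attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempt limit is reached.
retry() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "ERROR: '$*' still failing after ${attempt} attempts" >&2
            return 1
        fi
        echo "Attempt ${attempt} of ${max_attempts} failed; retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Possible use inside the install script (commented out so the sketch has no side effects):
# retry 5 10 sudo apt-get -q -y update || exit 1
```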
    【Answer 2】:

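    I finally figured out what was going on. The software installation has dependencies that were not installed automatically, which made the whole script fail and exit with status=1. I changed the bash script to add the recommended dependencies manually during installation, redeployed the template, and boom: the installation went smoothly. This is the error message that was produced each time: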

    Building dependency tree...\\nReading state information...\\nSome packages could not be installed. **This may mean that you have\\nrequested an impossible situation or if you are using the unstable\\ndistribution that some required packages have not yet been created\\nor been moved out of Incoming.\\nThe following information may help to resolve the situation:\\n\\nThe following packages have unmet dependencies:\\n python3-pip : Depends: python3-distutils but it is not installable\\n Recommends: build-essential but it is not installable\\n Recommends: python3-dev (>= 3.2) but it is not installable\\n Recommends: python3-setuptools but it is not installable\\n Recommends: python3-wheel** but it is not installable\\n\\n[stderr]\\n+ sudo apt-get -qq -y update\\n+ sudo apt-get -q -y install software-properties-common\\n+ sudo apt-get -q -y install python3-pip\\nE: Unable to correct problems, you have held broken packages
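The answer does not show the changed script, so the following is only a sketch of what installing the dependencies explicitly might look like. The package names are taken from the "Depends" and "Recommends" lines of the error above; the function name `install_pip_with_dependencies` and the `APT_GET` override variable are assumptions introduced here so that the sketch can be dry-run.

```shell
#!/bin/bash
# Package names copied from the Depends/Recommends lines in the error message.
PIP_DEPENDENCIES="python3-distutils build-essential python3-dev python3-setuptools python3-wheel"

# APT_GET is a hypothetical override hook; set APT_GET=echo to dry-run the sketch.
APT_GET="${APT_GET:-sudo apt-get}"

# install_pip_with_dependencies (hypothetical name): refresh the package index,
# install the listed dependencies, then install python3-pip itself.
install_pip_with_dependencies() {
    # A stale or incomplete package index is one common cause of "not installable".
    $APT_GET -y update || return 1
    # The variables are left unquoted on purpose so they split into separate words.
    $APT_GET -q -y install $PIP_DEPENDENCIES || return 1
    $APT_GET -q -y install python3-pip || return 1
}

# Possible use inside the install script (commented out so the sketch has no side effects):
# install_pip_with_dependencies || exit 1
```

Returning non-zero as soon as any step fails lets the calling script stop before it reaches steps that depend on the installation.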
    

    【Comments】:
