Can't SSH into Google Cloud Engine. Boot errors: answers

【Question title】: Can't SSH into Google Cloud Engine. Boot errors
【Posted】: 2021-12-04 10:07:28
【Question description】:

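I can't SSH into my instance. I tried the following:

Created a new SSH key pair and added it to the project, but that didn't help. I created a brand-new instance in the same project and can SSH into it easily, so I don't think the SSH keys are the problem.
"Block project-wide SSH keys" is also unchecked.
Created a machine image and spawned a new instance from it, but ran into the same problem.
Enabled the serial console with a startup script, but that didn't help either. It simply doesn't accept the password.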

    #! /bin/bash
    adduser serial1
    echo serial1:desperate-attempt | chpasswd
    usermod -aG google-sudoers serial1

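I don't think this is a disk space issue. The instance has a 10 GB disk. I only write to one log file, and it was ~50 MB the last time I checked. I also don't see any disk space errors in the console log.

I do see these errors in the "Serial port 1 (console)" log: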

Oct 16 16:29:01 instance-1 ntpd[668]: bind(21) AF_INET6 fe80::4001:aff:fe8e:2%2#123 flags 0x11 failed: Cannot assign requested address
Oct 16 16:29:01 instance-1 ntpd[668]: unable to create socket on eth0 (5) for fe80::4001:aff:fe8e:2%2#123
Oct 16 16:29:01 instance-1 ntpd[668]: failed to init interface for address fe80::4001:aff:fe8e:2%2
Oct 16 16:29:01 instance-1 ntpd[668]: Listening on routing socket on fd #21 for interface updates
Oct 16 16:29:02 instance-1 ntpd[668]: bind(24) AF_INET6 fe80::4001:aff:fe8e:2%2#123 flags 0x11 failed: Cannot assign requested address
Oct 16 16:29:02 instance-1 ntpd[668]: unable to create socket on eth0 (6) for fe80::4001:aff:fe8e:2%2#123
Oct 16 16:29:02 instance-1 ntpd[668]: failed to init interface for address fe80::4001:aff:fe8e:2%2
Oct 16 16:29:02 instance-1 google_instance_setup[663]: Traceback (most recent call last):
Oct 16 16:29:02 instance-1 google_instance_setup[663]:   File "/usr/bin/google_instance_setup", line 6, in <module>
Oct 16 16:29:02 instance-1 google_instance_setup[663]:     from pkg_resources import load_entry_point
Oct 16 16:29:02 instance-1 google_instance_setup[663]:   File "/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3257, in <module>
Oct 16 16:29:02 instance-1 google_instance_setup[663]:     def _initialize_master_working_set():
Oct 16 16:29:02 instance-1 google_instance_setup[663]:   File "/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3240, in _call_aside
Oct 16 16:29:02 instance-1 google_instance_setup[663]:     f(*args, **kwargs)
Oct 16 16:29:02 instance-1 google_instance_setup[663]:   File "/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py", line 3269, in _initialize_master_working_set
Oct 16 16:29:02 instance-1 google_instance_setup[663]:     working_set = WorkingSet._build_master()
Oct 16 16:29:02 instance-1 google_instance_setup[663]:   File "/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py", line 582, in _build_master
Oct 16 16:29:02 instance-1 google_instance_setup[663]:     ws.require(__requires__)
Oct 16 16:29:02 instance-1 google_instance_setup[663]:   File "/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py", line 899, in require
Oct 16 16:29:02 instance-1 google_instance_setup[663]:     needed = self.resolve(parse_requirements(requirements))
Oct 16 16:29:02 instance-1 google_instance_setup[663]:   File "/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py", line 785, in resolve
Oct 16 16:29:02 instance-1 google_instance_setup[663]:     raise DistributionNotFound(req, requirers)
Oct 16 16:29:02 instance-1 google_instance_setup[663]: pkg_resources.DistributionNotFound: The 'google-compute-engine==2.8.13' distribution was not found and is required by the application
[FAILED] Failed to start Google Compute Engine Instance Setup.

Oct 16 16:29:02 instance-1 google_instance_setup[663]: pkg_resources.DistributionNotFound: The 'google-compute-engine==2.8.13' distribution was not found and is required by the application
[FAILED] Failed to start Google Compute Engine Instance Setup.
See 'systemctl status google-instance-setup.service' for details.
         Starting NSS cache refresh...
Oct 16 16:29:02 instance-1 systemd[1]: google-instance-setup.service: Main process exited, code=exited, status=1/FAILURE
Oct 16 16:29:02 instance-1 systemd[1]: Failed to start Google Compute Engine Instance Setup.
Oct 16 16:29:02 instance-1 systemd[1]: google-instance-setup.service: Unit entered failed state.
Oct 16 16:29:02 instance-1 systemd[1]: google-instance-setup.service: Failed with result 'exit-code'.

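google_accounts_daemon, google_metadata_script_runner, google_network_daemon, google_*, ... repeat the errors above.

It sounds like some packages are out of date. But how do I install anything without logging in to the instance? Is there a good way to fix this error?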

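【Comments】:

1) Is this a new instance, or one that has been running successfully for a while? 2) You will probably find messages about the system running out of disk space. If so, resize the boot disk. 3) Check the logs for errors from running the startup script. If your instance is out of disk space, you won't be able to run startup scripts, because there is nowhere to store the script.
1) The instance has been running for more than a year. 2) I don't see "disk", "space", or "exceeded" errors; the log shows virtio-scsi vendor='Google' product='PersistentDisk' rev='1' type=0 removable=0 virtio-scsi blksize=512 sectors=20971520 = 10240 MiB 3) Added more error messages from the log to the main post. Thanks @JohnHanley
The error messages you posted are side effects of the actual problem. Go back to the start of the boot log and find what caused it.
I didn't find anything in a quick search. Will go through it line by line. brb
I increased the disk to 20 GB to rule that out. I still can't decipher the errors. The full log is at pastebin.com/3Nr3EWHt. Thanks for your help!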
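
One way to follow the advice about reading the boot log from the start is to save the serial output locally and print only the earliest suspicious lines. The sketch below is an illustration, not part of the original thread: `first_failures` is a hypothetical helper, the search pattern is a guess at what counts as a failure, and NAME and ZONE are placeholders.

```shell
#!/usr/bin/env bash
# Print the earliest lines of a saved boot log that look like failures.
# Usage: first_failures FILE [COUNT]
first_failures() {
  local file="$1" count="${2:-10}"
  # -n prefixes line numbers, -i ignores case, -m stops after COUNT matching lines
  grep -n -i -m "$count" -E 'failed|error|traceback' "$file"
}

# Hypothetical usage (NAME and ZONE are placeholders for the broken instance):
#   gcloud compute instances get-serial-port-output NAME --zone=ZONE --port=1 > serial.log
#   first_failures serial.log 10
```

The line numbers in the output show how early in the boot each failure appears, which helps separate a root cause from later side effects.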

Tags: google-cloud-platform ssh google-compute-engine

【Solution 1】:

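On your instance, the Google Cloud software packages, the Python installation, or both are broken. That is what prevents you from logging in.

I recommend creating a new instance and moving the persistent disk from the broken instance to the new one.

Step 1:

Create a new instance in the same zone. A micro instance will work.

Step 2:

Open a Cloud Shell prompt (this also works from your desktop if gcloud is set up). Run the following command, replacing NAME with the name of your instance (the broken system), DISK with the name of its boot disk, and ZONE with the zone the system is in: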

gcloud compute instances detach-disk NAME --disk=DISK --zone=ZONE

Make sure the previous command did not report an error.
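
A boot disk can only be detached while its instance is stopped, so a script for step 2 might stop the VM first and abort on the first error. This is a minimal sketch, not the answer's exact procedure: the default values for NAME, DISK, and ZONE are placeholders, and the `run` wrapper is a hypothetical helper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail   # abort at the first failing command

NAME="${NAME:-broken-vm}"       # placeholder: the instance you cannot reach
DISK="${DISK:-broken-vm}"       # placeholder: the name of its boot disk
ZONE="${ZONE:-us-central1-a}"   # placeholder: the zone of that instance

# Print every command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

detach_boot_disk() {
  # Stop the instance, then detach its boot disk.
  run gcloud compute instances stop "$NAME" --zone="$ZONE"
  run gcloud compute instances detach-disk "$NAME" --disk="$DISK" --zone="$ZONE"
}

detach_boot_disk
```

Run it once as is to review the printed commands, then again with `DRY_RUN=0` and your own NAME, DISK, and ZONE to execute them.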

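Step 3:

Attach this disk to the new instance you created.

Make sure the new VM instance is running before you attach the second disk. Sometimes an instance gets confused about which disk to boot from if more than one disk is bootable.

Go to Compute Engine -> VM instances. Click your instance. Click Edit. Under "Additional disks", click "Add item". For Name, enter or select the disk you detached from the broken instance. Click Save.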
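
The console steps above can probably also be done from Cloud Shell. The following is a hedged sketch rather than the answer's own method: RESCUE, DISK, and ZONE default to placeholder values, and the hypothetical `run` wrapper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail   # abort at the first failing command

RESCUE="${RESCUE:-rescue-vm}"   # hypothetical name of the new instance
DISK="${DISK:-broken-vm}"       # placeholder: the disk detached in step 2
ZONE="${ZONE:-us-central1-a}"   # placeholder: the zone shared by the VM and the disk

# Print every command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

attach_old_disk() {
  # Attach the old boot disk as an additional, non-boot disk.
  run gcloud compute instances attach-disk "$RESCUE" --disk="$DISK" --zone="$ZONE"
  # Show the disks that are now attached to the new instance.
  run gcloud compute instances describe "$RESCUE" --zone="$ZONE" --format="yaml(disks)"
}

attach_old_disk
```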

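Step 4:

SSH into your new instance, which now has both disks attached.

Step 5:

Follow these steps carefully. Mount the second disk as a subdirectory of the root file system.

Become superuser: run sudo -s
Run df -h and make sure /dev/sdb1 is not mounted.
Create a directory for the mount point: mkdir /mnt/oldsystem
Mount the second disk: mount /dev/sdb1 /mnt/oldsystem

You can now access the files of the old file system under /mnt/oldsystem.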
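
Once the old disk is mounted, a few read-only checks may help narrow down what broke. The sketch below assumes the mount point from step 5 and the script path `/usr/bin/google_instance_setup` from the question's traceback; looking for interpreters under `/usr/local/bin` is only a guess prompted by the `/usr/local/lib/python3.9` paths in that traceback, and both helper functions are hypothetical.

```shell
#!/usr/bin/env bash
# Read-only inspection of the old system; nothing on the disk is modified.
OLD_ROOT="${OLD_ROOT:-/mnt/oldsystem}"   # mount point from step 5

# Show which interpreter a script on the old disk is configured to use.
show_shebang() {
  head -n 1 "$1"
}

# List Python interpreters under /usr/local/bin on the old disk, if any (assumed location).
list_local_pythons() {
  ls "$OLD_ROOT"/usr/local/bin/python3* 2>/dev/null || true
}

if [ -d "$OLD_ROOT" ]; then
  df -h "$OLD_ROOT"    # free space on the old root file system
  if [ -f "$OLD_ROOT/usr/bin/google_instance_setup" ]; then
    show_shebang "$OLD_ROOT/usr/bin/google_instance_setup"
  fi
  list_local_pythons
fi
```

If the shebang points at an interpreter that lacks the `google-compute-engine` package, that would be consistent with the `DistributionNotFound` error in the boot log, though this check alone does not prove it.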

【Comments】: