【Question title】:Condor master node and workers only see the master node
【Posted】:2021-12-12 14:08:09
【Question】:

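I am trying to set up an HTCondor batch system, but when I run condor_status it lists only the master node, both on the master and on the worker nodes. Both of them show this: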

Name               OpSys      Arch   State     Activity LoadAv Mem
[master ip]        LINUX      X86_64 Unclaimed Idle      0.000  973

               Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain
  X86_64/LINUX     1     0       0         1       0          0        0      0
         Total     1     0       0         1       0          0        0      0

condor_restart works fine on the master node, but on the worker node it produces this error:

ERROR
SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using no authentication method, which may imply host-based security.  Our address was '[ip address of master]', and server's address was '[ip address of worker]'.  Check your ALLOW settings and IP protocols.

Here are the config files:

Master node:

CONDOR_HOST = [private ip of master]
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
# to avoid user authentication
HOSTALLOW_READ = *
HOSTALLOW_WRITE = *
HOSTALLOW_ADMINISTRATOR = *

Worker node:

CONDOR_HOST = [private ip of master]
DAEMON_LIST = MASTER, STARTD
# to avoid user authentication
HOSTALLOW_READ = *
HOSTALLOW_WRITE = *
HOSTALLOW_ADMINISTRATOR = *

Within the same security group I allow the following:

All TCP    TCP      0 - 65535     
All    ICMP-IPv4   All     
SSH on port 22 

This is what it looks like (the security group ending in "6").

【Comments】:

    Tags: amazon-web-services batch-processing condor


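    【Solution 1】:

    Apparently the problem was running condor_reconfig -full. I reinstalled HTCondor, skipped that command, used systemctl restart condor instead, and it worked. If anyone can offer some insight into why, please do :)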
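
For anyone who lands on the same SECMAN:2010 message: it asks you to check the ALLOW settings, while the configs in the question use the HOSTALLOW_* names, which as far as I know are an older spelling of the ALLOW_* macros. Below is a minimal sketch of the same wide-open policy written with the ALLOW_* names, assuming a trusted private network. Whether this alone clears the error is an assumption and is not confirmed anywhere in this thread; newer HTCondor releases may also require authentication settings that are not shown here.

```
# Sketch only: same permissive host-based policy as in the question,
# written with the ALLOW_* names instead of HOSTALLOW_*.
# Assumption: every machine in the pool sits on a trusted private network.
ALLOW_READ = *
ALLOW_WRITE = *
ALLOW_ADMINISTRATOR = *
```

After editing the config on both nodes, restart the daemons with systemctl restart condor as described in the answer. Running condor_config_val ALLOW_WRITE prints the value that HTCondor reads from the config files, which is a quick way to confirm the edit was picked up.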

    【Discussion】:
