【Question title】: SGE failed to submit job: attribute is not a memory value
【Posted】: 2016-08-05 08:39:18
【Question】:

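I failed to submit a job with a mem attribute. I am new to this and, after two days of googling, I am asking for help here. Any suggestions would be appreciated!

Here is what I did:

1. Submitted my script: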

qsub -S /bin/bash -A assembly -pe threads 16 -l mem=2GB -cwd -N "pBcR_correct_asm" -j y -o /dev/null runCorrection.sh

Unable to run job: unknown resource "mem".
Exiting.

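2. Since replacing "host" with "h" had solved an earlier problem of mine (following the question SGE unknown resource "nodes"), I tried replacing "mem" with "m", but that did not work.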

3. After some googling, I learned that "h" is a shortcut defined in "/opt/gridengine/util/resources/centry/hostname", which can be confirmed with "qconf -sc":

qconf -sc

#name               shortcut   type        relop requestable consumable default  urgency 
#----------------------------------------------------------------------------------------
arch                a          RESTRING    ==    YES         NO         NONE     0
calendar            c          RESTRING    ==    YES         NO         NONE     0
cpu                 cpu        DOUBLE      >=    YES         NO         0        0
display_win_gui     dwg        BOOL        ==    YES         NO         0        0
h_core              h_core     MEMORY      <=    YES         NO         0        0
h_cpu               h_cpu      TIME        <=    YES         NO         0:0:0    0
h_data              h_data     MEMORY      <=    YES         NO         0        0
h_fsize             h_fsize    MEMORY      <=    YES         NO         0        0
h_rss               h_rss      MEMORY      <=    YES         NO         0        0
h_rt                h_rt       TIME        <=    YES         NO         0:0:0    0
h_stack             h_stack    MEMORY      <=    YES         NO         0        0
h_vmem              h_vmem     MEMORY      <=    YES         NO         0        0
hostname            h          HOST        ==    YES         NO         NONE     0
load_avg            la         DOUBLE      >=    NO          NO         0        0
load_long           ll         DOUBLE      >=    NO          NO         0        0
load_medium         lm         DOUBLE      >=    NO          NO         0        0
load_short          ls         DOUBLE      >=    NO          NO         0        0
m_core              core       INT         <=    YES         NO         0        0
m_socket            socket     INT         <=    YES         NO         0        0
m_topology          topo       RESTRING    ==    YES         NO         NONE     0
m_topology_inuse    utopo      RESTRING    ==    YES         NO         NONE     0
mem_free            mf         MEMORY      <=    YES         NO         0        0
mem_total           mt         MEMORY      <=    YES         NO         0        0
mem_used            mu         MEMORY      >=    YES         NO         0        0

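4. I therefore replaced "mem" with "mt", but now it complained about the attribute. According to the output above, mem_total is defined almost the same way as "hostname", which had worked before. After going through the SGE guide, I suspected that a JSV might be the cause, but I could not find any script under "/opt/gridengine/util/resources/jsv" that contains "Unable to run job: attribute ...". I guess I have to configure some files, but which files, and how?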

qsub -S /bin/bash -A assembly -pe threads 16 -l mt=2GB -cwd -N "pBcR_correct_asm" -j y -o test.out  runCorrection.sh

Unable to run job: attribute "mem_total" is not a memory value.
Exiting.
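
As an aside on reading the `qconf -sc` listing: one way to see which complexes have type MEMORY and are requestable is to filter the listing by column. The sketch below is only an illustration. The function name `requestable_memory_complexes` is made up, and it assumes the column order shown in the listing (name, shortcut, type, relop, requestable, ...). It says nothing about which value formats `-l` accepts.

```shell
#!/bin/sh
# Hypothetical helper: reads `qconf -sc` style text on stdin and prints
# "name shortcut" for every complex that has type MEMORY and is requestable.
# Assumed column order (as in the listing in the post):
#   name shortcut type relop requestable consumable default urgency
requestable_memory_complexes() {
    awk '!/^#/ && NF >= 5 && $3 == "MEMORY" && $5 == "YES" { print $1, $2 }'
}

# Demo on three rows copied from the listing, so no cluster is needed.
requestable_memory_complexes <<'EOF'
#name               shortcut   type        relop requestable consumable default  urgency
h_vmem              h_vmem     MEMORY      <=    YES         NO         0        0
hostname            h          HOST        ==    YES         NO         NONE     0
mem_total           mt         MEMORY      <=    YES         NO         0        0
EOF
```

On a real cluster the same function could be fed directly, as in `qconf -sc | requestable_memory_complexes`.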

【Comments】:

    Tags: qsub sungridengine


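    【Answer 1】:

    @Vince!

    Thank you very much for your reply.

    In the end I solved my problem by using "h_vmem=2g" ("2GB" produces the error), but I do not know where to find how MEMORY complex values are supposed to be written.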
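
The observation that "2g" is accepted while "2GB" is rejected fits the memory specifier format as I understand it from sge_types: an integer followed by at most one multiplier letter. The check below is a rough sketch under that assumption, not SGE's own parser. The function name `is_sge_memory_value` is invented for illustration, and hexadecimal or octal constants are deliberately not handled.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $1 looks like an SGE memory value, i.e.
# a decimal integer with at most one multiplier letter (k, K, m, M, g, G).
# Simplification: hexadecimal and octal constants are not covered here.
is_sge_memory_value() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+[kKmMgG]?$'
}

# Try the two values from the post plus two further examples.
for value in 2g 2G 2048M 2GB; do
    if is_sge_memory_value "$value"; then
        echo "$value: accepted by this check"
    else
        echo "$value: rejected by this check"
    fi
done
```

A value that passes could then be used as in the post, for example `-l h_vmem=2g`.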

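    The information below is no longer needed.

    I read the site you provided and set the complexes h_vmem and s_vmem to "consumable", but it did not work. I think I now have to configure the queue's "complex_values", which is currently "NONE". However, I cannot open the page that might tell me how to configure it: http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_types.html?pathrev=V62u5_TAG. Am I right to configure the queue? Do I also have to configure the hosts?

    Any suggestions would be appreciated!

    Here is what I did:

    1. Changed the consumable attribute of h_vmem and s_vmem to "YES":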

    qconf -sc
    
    #name               shortcut   type        relop requestable consumable default  urgency 
    #----------------------------------------------------------------------------------------
    arch                a          RESTRING    ==    YES         NO         NONE     0
    calendar            c          RESTRING    ==    YES         NO         NONE     0
    cpu                 cpu        DOUBLE      >=    YES         NO         0        0
    display_win_gui     dwg        BOOL        ==    YES         NO         0        0
    h_core              h_core     MEMORY      <=    YES         NO         0        0
    h_cpu               h_cpu      TIME        <=    YES         NO         0:0:0    0
    h_data              h_data     MEMORY      <=    YES         NO         0        0
    h_fsize             h_fsize    MEMORY      <=    YES         NO         0        0
    h_rss               h_rss      MEMORY      <=    YES         NO         0        0
    h_rt                h_rt       TIME        <=    YES         NO         0:0:0    0
    h_stack             h_stack    MEMORY      <=    YES         NO         0        0
    h_vmem              h_vmem     MEMORY      <=    YES         YES        0        0
    hostname            h          HOST        ==    YES         NO         NONE     0
    load_avg            la         DOUBLE      >=    NO          NO         0        0
    load_long           ll         DOUBLE      >=    NO          NO         0        0
    load_medium         lm         DOUBLE      >=    NO          NO         0        0
    load_short          ls         DOUBLE      >=    NO          NO         0        0
    m_core              core       INT         <=    YES         NO         0        0
    m_socket            socket     INT         <=    YES         NO         0        0
    m_topology          topo       RESTRING    ==    YES         NO         NONE     0
    m_topology_inuse    utopo      RESTRING    ==    YES         NO         NONE     0
    mem_free            mf         MEMORY      <=    YES         NO         0        0
    mem_total           mt         MEMORY      <=    YES         NO         0        0
    mem_used            mu         MEMORY      >=    YES         NO         0        0
    min_cpu_interval    mci        TIME        <=    NO          NO         0:0:0    0
    np_load_avg         nla        DOUBLE      >=    NO          NO         0        0
    np_load_long        nll        DOUBLE      >=    NO          NO         0        0
    np_load_medium      nlm        DOUBLE      >=    NO          NO         0        0
    np_load_short       nls        DOUBLE      >=    NO          NO         0        0
    num_proc            p          INT         ==    YES         NO         0        0
    qname               q          RESTRING    ==    YES         NO         NONE     0
    rerun               re         BOOL        ==    NO          NO         0        0
    s_core              s_core     MEMORY      <=    YES         NO         0        0
    s_cpu               s_cpu      TIME        <=    YES         NO         0:0:0    0
    s_data              s_data     MEMORY      <=    YES         NO         0        0
    s_fsize             s_fsize    MEMORY      <=    YES         NO         0        0
    s_rss               s_rss      MEMORY      <=    YES         NO         0        0
    s_rt                s_rt       TIME        <=    YES         NO         0:0:0    0
    s_stack             s_stack    MEMORY      <=    YES         NO         0        0
    s_vmem              s_vmem     MEMORY      <=    YES         YES        0        0
    seq_no              seq        INT         ==    NO          NO         0        0
    slots               s          INT         <=    YES         YES        1        1000
    swap_free           sf         MEMORY      <=    YES         NO         0        0
    swap_rate           sr         MEMORY      >=    YES         NO         0        0
    swap_rsvd           srsv       MEMORY      >=    YES         NO         0        0
    swap_total          st         MEMORY      <=    YES         NO         0        0
    swap_used           su         MEMORY      >=    YES         NO         0        0
    tmpdir              tmp        RESTRING    ==    NO          NO         NONE     0
    virtual_free        vf         MEMORY      <=    YES         NO         0        0
    virtual_total       vt         MEMORY      <=    YES         NO         0        0
    virtual_used        vu         MEMORY      >=    YES         NO         0        0
    # >#< starts a comment but comments are not saved across edits --------
    

    2. Submitted my job to the smp.q queue, and it complained about the same problem:

    qsub -S /bin/bash -A assembly -q smp.q -pe newPe 16 -l h_vmem=2GB -cwd -N "pBcR_correct_asm" -j y -o runCorrection.sh
    
    Unable to run job: attribute "h_vmem" is not a memory value.
    Exiting.
    

    3. Information on smp.q. I think "complex_values" should be changed, while "h_vmem" can stay as it is:

    qconf -sq smp.q
    
    qname                 smp.q
    hostlist              @smp.q
    seq_no                0
    load_thresholds       np_load_avg=1.75
    suspend_thresholds    NONE
    nsuspend              1
    suspend_interval      00:05:00
    priority              0
    min_cpu_interval      00:05:00
    processors            UNDEFINED
    qtype                 BATCH INTERACTIVE
    ckpt_list             NONE
    pe_list               make newPe
    rerun                 FALSE
    slots                 160
    tmpdir                /tmp
    shell                 /bin/csh
    prolog                NONE
    epilog                NONE
    shell_start_mode      posix_compliant
    starter_method        NONE
    suspend_method        NONE
    resume_method         NONE
    terminate_method      NONE
    notify                00:00:60
    owner_list            NONE
    user_lists            NONE
    xuser_lists           NONE
    subordinate_list      NONE
    complex_values        NONE
    projects              NONE
    xprojects             NONE
    calendar              NONE
    initial_state         default
    s_rt                  INFINITY
    h_rt                  INFINITY
    s_cpu                 INFINITY
    h_cpu                 INFINITY
    s_fsize               INFINITY
    h_fsize               INFINITY
    s_data                INFINITY
    h_data                INFINITY
    s_stack               INFINITY
    h_stack               INFINITY
    s_core                INFINITY
    h_core                INFINITY
    s_rss                 INFINITY
    h_rss                 INFINITY
    s_vmem                INFINITY
    h_vmem                INFINITY
    

    4. Information on a host in @smp.q:

    qconf -sconf smp03.local
    
    #smp03.local:
    mailer                       /bin/mail
    xterm                        /usr/bin/X11/xterm
    execd_spool_dir              /opt/gridengine/default/spool
    

    5. Global configuration. Should I add h_vmem and s_vmem here?

    qconf -sconf
    
    #global:
    execd_spool_dir              /opt/gridengine/default/spool
    mailer                       /bin/mail
    xterm                        /usr/bin/X11/xterm
    load_sensor                  none
    prolog                       none
    epilog                       none
    shell_start_mode             posix_compliant
    login_shells                 sh,ksh,csh,tcsh
    min_uid                      0
    min_gid                      0
    user_lists                   none
    xuser_lists                  none
    projects                     none
    xprojects                    none
    enforce_project              false
    enforce_user                 auto
    load_report_time             00:00:40
    max_unheard                  00:05:00
    reschedule_unknown           00:00:00
    loglevel                     log_warning
    administrator_mail           none
    set_token_cmd                none
    pag_cmd                      none
    token_extend_time            none
    shepherd_cmd                 none
    qmaster_params               none
    execd_params                 ENABLE_ADDGRP_KILL=TRUE H_MEMORYLOCKED=infinity
    reporting_params             accounting=true reporting=true \
                                 flush_time=00:00:15 joblog=true sharelog=00:00:00
    finished_jobs                100
    gid_range                    20000-20100
    qlogin_command               builtin
    qlogin_daemon                builtin
    rlogin_command               builtin
    rlogin_daemon                builtin
    rsh_command                  builtin
    rsh_daemon                   builtin
    max_aj_instances             2000
    max_aj_tasks                 75000
    max_u_jobs                   0
    max_jobs                     0
    max_advance_reservations     0
    auto_user_oticket            0
    auto_user_fshare             0
    auto_user_default_project    none
    auto_user_delete_time        86400
    delegated_file_staging       false
    reprioritize                 0
    jsv_url                      none
    jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
    

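    【Comments】:

    • I think I know why I failed. It seems h_vmem is not configured globally, which means I have to run "qconf -mconf global" and add "h_vmem 1024M". But since the administrator is away, I cannot test it. If it works, I will post the solution here.
    【Answer 2】: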

    What you probably want is h_vmem. At least, that is the attribute I always use to specify the memory I want to request for a job.

    See:

    http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html?pathrev=V62u5_TAG

    Specifically,

         The resource limit parameters s_vmem and h_vmem are implemented
         by Sun Grid Engine as a job limit. They impose a limit on the
         amount of combined virtual memory consumed by all the processes
         in the job. If h_vmem is exceeded by a job running in the queue,
         it is aborted via a SIGKILL signal (see kill(1)). If s_vmem is
         exceeded, the job is sent a SIGXCPU signal which can be caught
         by the job. If you wish to allow a job to be "warned" so it can
         exit gracefully before it is killed then you should set the
         s_vmem limit to a lower value than h_vmem. For parallel
         processes, the limit is applied per slot which means that the
         limit is multiplied by the number of slots being used by the job
         before being applied.
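
To make the per-slot rule in the quoted passage concrete, here is a small arithmetic sketch. The function name `job_vmem_limit_mib` is made up for illustration, and the numbers assume a 16-slot parallel job with a per-slot request of 2G counted as 2048 MiB.

```shell
#!/bin/sh
# Hypothetical helper: total virtual-memory limit of a parallel job in MiB,
# given the number of slots ($1) and the per-slot h_vmem in MiB ($2).
job_vmem_limit_mib() {
    echo $(( $1 * $2 ))
}

# 16 slots with 2048 MiB per slot: prints 32768 (that is, 32 GiB in total).
job_vmem_limit_mib 16 2048
```

So the value given to `-l h_vmem=` should be the amount needed per slot, not the total for the whole job.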
    

    In addition, you may need to use qconf to make it a consumable.
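
A rough sketch of what that could involve is below. It only prints the commands instead of running them, since they need manager rights on the cluster. The function name `consumable_setup_commands` and the capacity 64G are made up, smp03.local is one of the hosts mentioned in this thread, and the `qconf -mattr` form reflects my reading of the qconf man page, so check it against your installation before running anything.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the admin commands one might
# use to turn h_vmem into a consumable with a per-host capacity.
#   $1 = execution host name, $2 = capacity to advertise (assumed value)
consumable_setup_commands() {
    host=$1
    capacity=$2
    # Edit the complex list and set the "consumable" column of h_vmem to YES.
    echo "qconf -mc"
    # Set h_vmem in the host's complex_values to the chosen capacity.
    echo "qconf -mattr exechost complex_values h_vmem=$capacity $host"
    # Show the execution host configuration to verify the change.
    echo "qconf -se $host"
}

consumable_setup_commands smp03.local 64G
```
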

    【Comments】:
