Snakemake WorkflowError: Target rules may not contain wildcards

【Question title】: Snakemake WorkflowError: Target rules may not contain wildcards
【Posted】: 2022-12-13 17:00:06
【Question】:

rule all:
    input:
        "../data/A_checkm/{genome}"

rule A_checkm:
    input:
        "../data/genomesFna/{genome}_genomic.fna.gz"
    output:
        directory("../data/A_checkm/{genome}")
    threads:
        16
    resources:
        mem_mb = 40000
    shell:
        """
        # setup a tmp working dir
        tmp=$(mktemp -d)
        mkdir $tmp/ref
        cp {input} $tmp/ref/genome.fna.gz
        cd $tmp/ref
        gunzip -c genome.fna.gz > genome.fna
        cd $tmp

        # run checking
        checkm lineage_wf -t {threads} -x fna ref out > stdout

        # prepare output folder
        cd {config[project_root]}
        mkdir -p {output}
        # copy results over
        cp -r $tmp/out/* {output}/
        cp $tmp/stdout {output}/checkm.txt
        # cleanup
        rm -rf $tmp
        """

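Thanks in advance for your help! I want to run checkm on a list of about 600 downloaded genome files with the extension ".fna.gz". Each downloaded file is saved in a separate folder named after its genome. I also want the results for each genome in their own folder, which is why my output is a directory. When I run this code with "snakemake -s Snakefile --cores 10 A_checkm", I get the following error: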

WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line, or have a rule without wildcards at the very top of your workflow (e.g. the typical "rule all" which just collects all results you want to generate in the end).

Can anyone help me find the mistake?

【Comments】:

Does this answer your question? "Target rules may not contain wildcards Error in Snakemake - No wildcards in Target?"

Tags: wildcard snakemake

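【Solution 1】:

You need to give snakemake concrete values for the {genome} wildcard. You can't leave it open and expect snakemake to process every file in some folder of your project on its own.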

Use glob_wildcards(...) to determine the file names, and therefore the {genome} values, of the files you want to process. See the documentation for details.
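
To make the idea concrete, here is a minimal pure-Python sketch of the kind of lookup glob_wildcards performs for this pattern. It is an illustrative stand-in, not Snakemake's implementation: the helper name find_genomes, the single-folder scan, and the demo file names are all assumptions made for this example.

```python
import re
import tempfile
from pathlib import Path


def find_genomes(folder, suffix="_genomic.fna.gz"):
    """Return the sorted {genome} values for files named <genome><suffix>.

    Hypothetical helper that mimics glob_wildcards for one wildcard in
    one folder; it is not Snakemake's own code.
    """
    pattern = re.compile(r"(?P<genome>.+)" + re.escape(suffix))
    genomes = []
    for path in Path(folder).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            genomes.append(match.group("genome"))
    return sorted(genomes)


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["GCF_001_genomic.fna.gz", "GCF_002_genomic.fna.gz", "notes.txt"]:
        (Path(demo_dir) / name).touch()
    GENOMES = find_genomes(demo_dir)

print(GENOMES)  # ['GCF_001', 'GCF_002']
```

Files that do not match the pattern (notes.txt here) are simply ignored, so only real genome files end up as wildcard values.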

Now you can use these values in rule all to request the output folder for every {genome} value (each one is created by your other rule):

# Determine the {genome} for all downloaded files
(GENOMES,) = glob_wildcards("../data/genomesFna/{genome}_genomic.fna.gz")


rule all:
    input:
        expand("../data/A_checkm/{genome}", genome=GENOMES),


rule A_checkm:
    input:
        "../data/genomesFna/{genome}_genomic.fna.gz",
    output:
        directory("../data/A_checkm/{genome}"),
    threads: 16
    resources:
        mem_mb=40000,
    shell:
        # Your magic goes here
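
As a rough illustration of why this removes the error, the following plain-Python sketch shows what the expand(...) call in rule all boils down to for a single wildcard: a list of concrete paths with no {genome} placeholder left. The helper name expand_targets and the genome names are hypothetical; in the workflow the values come from glob_wildcards.

```python
def expand_targets(pattern, genomes):
    """Fill the {genome} placeholder once per value.

    Hypothetical helper that mirrors a one-wildcard expand() call;
    it is not Snakemake's own code.
    """
    return [pattern.format(genome=g) for g in genomes]


# Made-up genome names for the demo.
GENOMES = ["GCF_001", "GCF_002"]
targets = expand_targets("../data/A_checkm/{genome}", GENOMES)
print(targets)
# ['../data/A_checkm/GCF_001', '../data/A_checkm/GCF_002']
```

Since rule all now lists only concrete paths, it can serve as the target. Note that the command in the question names A_checkm, a rule with wildcards, as the target; as the error message itself suggests, run "snakemake -s Snakefile --cores 10" without a rule name (so that rule all at the top is used) or request concrete output paths instead.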

If the download is supposed to happen inside snakemake, add a checkpoint for it. Then have a look at this answer.

【Comments】: