【发布时间】:2021-01-29 02:40:01
【问题描述】:
我正在尝试创建一个管道,该管道将采用config.yml 中的用户配置目录(他们已从 BaseSpace 下载了 .fastq.gz 文件的项目目录),以在序列文件上运行 fastqc。我已经有了按车道合并 fastqs 并在合并文件上运行 fastqc 的下游步骤。
但是,通配符给我在原始基本空间文件上运行 fastqc 带来了问题。以下是我尝试运行snakemake时的错误。
Missing input files for rule all:
qc/fastqc_premerge/DEX-13_S9_L001_ngc1838-10_L001_ds.9fd1f6dff0df47ab821125aab07be69b_r1_fastqc.zip
qc/fastqc_premerge/BOMB-3-2-19D_S8_L002_ngc1838-8_L002_ds.b81c308d62ba447b8caf074ffb27917e_r1_fastqc.zip
qc/fastqc_premerge/DEX-13_S9_L002_ngc1838-10_L002_ds.6369bc71fac44f00931eecb9b0a45d59_r1_fastqc.zip
任何建议将不胜感激。以下是重现此问题的最少代码。
import glob
configfile: "config.yaml"
wildcard_constraints:
bsdir = '\w+_L\d+_ds.\w+',
lanenum = '\d+'
inputdirectory=config["directory"]
DIRECTORY, SAMPLES, LANENUMS = glob_wildcards(inputdirectory+"/{bsdir}/{sample}_L{lanenum}_R1_001.fastq.gz")
DIRECTORY, SAMPLES, LANENUMS = glob_wildcards(inputdirectory+"/{bsdir}/{sample}_L{lanenum}_R2_001.fastq.gz")
##### target rules #####
rule all:
input:
#expand('qc/fastqc_premerge/{sample}_L{lanenum}_{bsdir}_r1_fastqc.zip', sample=SAMPLES, bsdir=DIRECTORY, lanenum=LANENUMS)
expand('qc/fastqc_premerge/{sample}_L{lanenum}_{bsdir}_r1_fastqc.zip', zip, sample=SAMPLES, bsdir=DIRECTORY, lanenum=LANENUMS) ##Changed to this from commenters suggestion, however, snakemake still wont run
rule fastqc_premerge_r1:
input:
f"{config['directory']}/{{bsdir}}/{{sample}}_L{{lanenum}}_R1_001.fastq.gz"
output:
html="qc/fastqc_premerge/{sample}_L{lanenum}_{bsdir}_r1.html",
zip="qc/fastqc_premerge/{sample}_L{lanenum}_{bsdir}_r1_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc to find the file. If not using multiqc, you are free to choose an arbitrary filename
params: ""
log:
"logs/fastqc_premerge/{sample}_L{lanenum}_{bsdir}_r1.log"
threads: 1
wrapper:
"v0.69.0/bio/fastqc"
目录结构:
ngc1838-10_L001_ds.9fd1f6dff0df47ab821125aab07be69b/DEX-13_S9_L001_R1_001.fastq.gz
ngc1838-10_L001_ds.9fd1f6dff0df47ab821125aab07be69b/DEX-13_S9_L001_R2_001.fastq.gz
ngc1838-10_L002_ds.6369bc71fac44f00931eecb9b0a45d59/DEX-13_S9_L002_R1_001.fastq.gz
ngc1838-10_L002_ds.6369bc71fac44f00931eecb9b0a45d59/DEX-13_S9_L002_R2_001.fastq.gz
ngc1838-8_L002_ds.b81c308d62ba447b8caf074ffb27917e/BOMB-3-2-19D_S8_L002_R1_001.fastq.gz
ngc1838-8_L002_ds.b81c308d62ba447b8caf074ffb27917e/BOMB-3-2-19D_S8_L002_R2_001.fastq.gz
在上述情况下,我想在所有 6 个输入 R1/R2 文件上运行 fastqc,然后在下游,为 DEX_13_S9(用于合并的两个输入)和 BOMB-3_2_19D(这将是一个副本)创建一个合并文件的 1 个输入)。然后为这些生成的 R1 和 R2 文件创建 4 个 fastqc 报告。
编辑:我必须更改以下内容才能运行 snakemake
inputdirectory=config["directory"]
PROJECTDIR, RANDOMINT, LANENUM1, BSSTRINGS, SAMPLES, LANENUMS = glob_wildcards(inputdirectory+"/{proj}-{randint}_L{lanenum1}_ds.{bsstring}/{sample}_L{lanenum}_R1_001.fastq.gz", followlinks=True)
PROJECTDIR, RANDOMINT, LANENUM1, BSSTRINGS, SAMPLES, LANENUMS = glob_wildcards(inputdirectory+"/{proj}-{randint}_L{lanenum1}_ds.{bsstring}/{sample}_L{lanenum}_R2_001.fastq.gz", followlinks=True)
##### target rules #####
rule all:
input:
"qc/multiqc_report_premerge.html"
rule fastqc_premerge_r1:
input:
f"{config['directory']}/{{proj}}-{{randint}}_L{{lanenum1}}_ds.{{bsstring}}/{{sample}}_L{{lanenum}}_R1_001.fastq.gz"
output:
html="qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r1.html",
zip="qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r1_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc
params: ""
log:
"logs/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r1.log"
threads: 1
wrapper:
"v0.69.0/bio/fastqc"
rule fastqc_premerge_r2:
input:
f"{config['directory']}/{{proj}}-{{randint}}_L{{lanenum1}}_ds.{{bsstring}}/{{sample}}_L{{lanenum}}_R2_001.fastq.gz"
output:
html="qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r2.html",
zip="qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r2_fastqc.zip" # the suffix _fastqc.zip is necessary for multiqc
params: ""
log:
"logs/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r2.log"
threads: 1
wrapper:
"v0.69.0/bio/fastqc"
rule multiqc_pre:
input:
expand("qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r1_fastqc.zip", zip, sample=SAMPLES, lanenum=LANENUMS, proj=PROJECTDIR, randint=RANDOMINT, lanenum1=LANENUM1, bsstring=BSSTRINGS),
expand("qc/fastqc_premerge/{sample}_L{lanenum}_{proj}-{randint}_L{lanenum1}_ds.{bsstring}_r2_fastqc.zip", zip, sample=SAMPLES, lanenum=LANENUMS, proj=PROJECTDIR, randint=RANDOMINT, lanenum1=LANENUM1, bsstring=BSSTRINGS)
output:
"qc/multiqc_report_premerge.html"
log:
"logs/multiqc_premerge.log"
wrapper:
"0.62.0/bio/multiqc"
【问题讨论】:
-
恐怕你需要自己找出来。通常,Snakemake 会立即开始构建 DAG。如果没有,您的 Snakefile 中有一些 Python 代码需要很长时间才能完成。尝试调试一下,添加打印语句等。堆栈跟踪指向 Snakefile 中的第 70 行。
-
谢谢 Johannes,我搞定了。我将固定的 Snakemake 代码放在评论的底部。我的原始代码中有两个关键问题。 1) 需要在扩展规则中使用 zip。 2)需要将子目录通配符从 bsdir 拆分为多个通配符,以便我可以正确构建路径。
标签: snakemake