【Question title】: Merging output files (chromosomal chunks) in Nextflow
【Posted】: 2021-08-07 23:41:16
【Question】:

I have a Nextflow process (imputation, for example) that emits multiple chunks per chromosome into a channel. The files look like this:

chr1.imputed.chunk1.gen.gz chr1.imputed.chunk2.gen.gz chr1.imputed.chunk3.gen.gz 
chr1.imputed.chunk1.stats chr1.imputed.chunk2.stats chr1.imputed.chunk3.stats
chr1.imputed.chunk1.bgen chr1.imputed.chunk2.bgen chr1.imputed.chunk3.bgen
.....

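There are many chunks for each of the 22 chromosomes. How can I efficiently merge them so that each chromosome ends up with one file per file type, like this: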

chr1.imputed.merged.gen.gz
chr1.imputed.merged.stats
chr1.imputed.merged.bgen

Once I have the merged output, I would like to delete all the chunks. Any help?

The actual code that generates these chunks is:

process imputation {
    publishDir params.out, mode:'copy'

    input:
    tuple val(chrom),val(chunk_array),val(chunk_start),val(chunk_end),path(in_haps),path(refs),path(maps) from imp_ch

    output:
    tuple val("${chrom}"),path("${chrom}.*") into imputed

    script:
    def (haps,sample)=in_haps
    def (haplotype, legend, samples)=refs
    """
    impute4.1.2_r300.3 -g "${haps}" -h "${haplotype}" -l "${legend}" -m "${maps}" -o "${chrom}.step10.imputed.chunk${chunk_array}" -no_maf_align -o_gz -int "${chunk_start}" "${chunk_end}" -Ne 20000 -buffer 1000 -seed 54321

    # Run qctool only if the imputed chunk is not empty
    if [[ \$(gunzip -c "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" | head -c1 | wc -c) == "0" ]]
    then
        echo "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" is empty
    else
        qctool_v2.0.8_rhel -g "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" -snp-stats -osnp "${chrom}.step10.imputed.chunk${chunk_array}.snp.stats"
        qctool_v2.0.8_rhel -g "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" -og "${chrom}.step10.imputed.chunk${chunk_array}.bgen" -os "${chrom}.step10.imputed.chunk${chunk_array}.sample"
    fi
    """
}

【Comments】:

    Tags: nextflow


    【Answer 1】:

    Could you post the actual code that generates the snippet you showed?

    Without seeing your code, I would suggest trying this: http://nextflow-io.github.io/patterns/index.html#_process_per_file_range

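    【Comments】:

    • Hi, thanks. The link you shared does not help in this particular case, because the files are the output of a process. However, I have added the actual code that generates these chunks. I hope it clarifies the question. Thanks again.
    【Answer 2】:

    You have this: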

    output:
    tuple val("${chrom}"),path("${chrom}.*") into imputed
    

    With that output channel declaration, you will probably have to do something like this in the downstream process:

    input:
    tuple val(name), path(chr_files) from imputed
    
    script:  
    gen_files = chr_files.findAll { it.toString().endsWith('.gen.gz') }.sort()
    stat_files = chr_files.findAll { it.toString().endsWith('.stats') }.sort()
    """
    # try with echo first to see if you get what you want
    echo ${gen_files.join(' ')} > ${name}_gen_fileList.txt
    echo ${stat_files.join(' ')} > ${name}_stat_fileList.txt
    """
    

    Once you are sure the echo commands above print what you expect, you can do the rest of the work in that process block.
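As one possible next step, here is a minimal shell sketch of merging the `.gen.gz` chunks of a single chromosome. It rests on two assumptions that are not confirmed in this thread: that the `.gen` chunks have no header line, so plain concatenation is valid, and that chunks should be joined in numeric chunk order. The helper name `merge_gen_chunks` and the output name are hypothetical. Gzip streams can be concatenated directly, and `sort -V` puts `chunk2` before `chunk10`, which a plain lexicographic sort does not.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: merge all gzipped .gen chunks of one chromosome.
# Assumes file names contain "chunk<N>" and no whitespace.
merge_gen_chunks() {
    local chrom="$1"
    local out="${chrom}.imputed.merged.gen.gz"

    # List the chunk files, order them by chunk number (version sort),
    # and concatenate the gzip streams into a single gzip file.
    printf '%s\n' "${chrom}".*chunk*.gen.gz | sort -V | xargs cat > "${out}"
}
```

Calling `merge_gen_chunks chr1` in a directory holding the staged chunks would write `chr1.imputed.merged.gen.gz`. Inside a Nextflow `script:` block, the shell `$` signs would need escaping as `\$`.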

    【Comments】:

    • Thanks @user10101904. I only get the last chunk in the output *txt files. I tried echo ${gen_files.join(' ')} >> ${name}_gen_fileList.txt, but the output is the same. I also get an error: WARN: failed to publish file. The other processes work fine and give no such warning.
    • I slightly modified the input declaration to tuple val("${chrom}"),path("${chrom}.*") into imputed.groupTuple().collect{chrom, files -> [ chrom, files.collect{it.string()}.join(' ')]}, which gives a new error: input tuple does not match input set cardinality declared by process 'merging'. Any help?
    【Answer 3】:

    Apparently the following lines of code solved the problem.

    imputed.into{impute_bgen;impute_gen;impute_sample;impute_stat}
    bgens=impute_bgen.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[0])}.groupTuple()
    gens=impute_gen.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[1])}.groupTuple()
    samples=impute_sample.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[2])}.groupTuple()
    stats=impute_stat.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[3])}.groupTuple()
    
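These lines appear intended to produce, for each file type, one channel item per chromosome holding that chromosome's list of chunk files. A downstream process could then merge each list in its script block. For header-bearing text files such as the stats chunks, a minimal shell sketch follows. It assumes every chunk file starts with the same single header line, which has not been checked against qctool's actual output here, and `merge_with_header` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: concatenate text files, keeping the header line
# of the first file only. Usage: merge_with_header OUT IN1 [IN2 ...]
merge_with_header() {
    local out="$1"
    shift

    # FNR is the line number within the current file, NR the overall one:
    # skip line 1 of every file except the very first line read.
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "${out}"
}
```

The input files are merged in the order given, so the caller is responsible for passing the chunks in chunk order.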

    【Comments】:
