【Posted】:2017-01-02 13:51:37
【Question】:
I have a file whose contents look like this:
1 33725 36725 ENHANCER0002
1 711760 714760 ENHANCER0003
1 724150 727150 ENHANCER0004
1 725455 728455 ENHANCER0005
1 871280 874410 ENHANCER0006
1 874180 877180 ENHANCER0007
1 900540 903540 ENHANCER0008
1 901475 904475 ENHANCER0009
1 910260 913260 ENHANCER00010
1 933355 936355 ENHANCER00011
1 947660 950660 ENHANCER00012
1 1013530 1016530 ENHANCER00013
.
.
.
1 2477030 2480030 ENHANCER00043
1 2478160 2481160 ENHANCER00044
1 2478845 2481845 ENHANCER00045
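The two middle columns are my lower and upper boundaries. As in lines 3-4 or lines 5-6, the boundaries overlap. I need to reshape the file so that, whenever boundaries overlap, only the lowest lower boundary and the highest upper boundary are printed. I am looking for a solution in Python, and this is my code: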
def write_line(chr_no,tmp_l,tmp_h,cnt,filename):
    filename.write(str(chr_no)+"\t"+str(tmp_l)+"\t"+str(tmp_h)+"\t"+"ENHANCER000"+str(cnt)+"\n")
inf = open("/home/firat/Desktop/Onder_Lab/Kenan/enhancers_bj.bed","r")
outf = open("/home/firat/Desktop/deneme_v3.bed","w")
cnt = 0
tmp_l=0
tmp_h=0
tmp_list = []
for line in inf:
    cnt += 1
    line = line.split(' ')
    current_low = line[1]
    current_high = line[2]
    previous_low = tmp_l
    previous_high = tmp_h
    if (int(current_low) <= int(previous_high)):
        tmp_list.append(int(current_low))
        tmp_list.append(int(current_high))
        tmp_list.append(int(previous_low))
        tmp_list.append(int(previous_high))
        write_line(line[0],min(tmp_list),max(tmp_list),cnt,outf)
        tmp_l = min(tmp_list)
        tmp_h = max(tmp_list)
        tmp_list = []
    else:
        write_line(line[0], previous_low, previous_high, cnt, outf)
        tmp_l= current_low
        tmp_h= current_high
Although my solution seems to work, the output looks like this:
1 27460 30460 ENHANCER0002
1 33725 36725 ENHANCER0003
1 711760 714760 ENHANCER0004
1 724150 728455 ENHANCER0005
1 724150 728455 ENHANCER0006
1 871280 877180 ENHANCER0007
1 871280 877180 ENHANCER0008
1 900540 904475 ENHANCER0009
1 900540 904475 ENHANCER00010
1 910260 913260 ENHANCER00011
1 933355 936355 ENHANCER00012
1 947660 950660 ENHANCER00013
1 1013530 1016530 ENHANCER00014
.
.
.
1 2477030 2481160 ENHANCER00044
1 2477030 2481845 ENHANCER00045
1 2477030 2481845 ENHANCER00046
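As you can see, rows are duplicated wherever boundaries overlap. There are also cases where 3 lines overlap, as at the very bottom. The expected output should look like this: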
1 27460 30460 ENHANCER0002
1 33725 36725 ENHANCER0003
1 711760 714760 ENHANCER0004
1 724150 728455 ENHANCER0005
1 871280 877180 ENHANCER0006
1 900540 904475 ENHANCER0007
1 910260 913260 ENHANCER0008
.
.
.
1 2477030 2481845 ENHANCER00046
What is wrong with my code, and how can I improve it so that it works correctly even when more than 2 lines overlap?
【Discussion】:
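One possible approach, offered as a sketch rather than as the original poster's method: the duplicates seem to come from calling write_line once per input line, so a merged range is written when the overlap is detected and again when the next non-overlapping line arrives. Keeping only the interval currently being built, and extending it while the next lower boundary falls inside it, handles runs of any length. The names merge_intervals, parse_line and format_rows are hypothetical. The sketch assumes the input is sorted by chromosome and lower boundary (as in the sample), treats touching boundaries as overlapping (the same `<=` test as in the question), and restarts the ENHANCER numbering at an arbitrary `start` value.

```python
def parse_line(line):
    """Split one whitespace-separated row into (chrom, low, high)."""
    chrom, low, high = line.split()[:3]
    return chrom, int(low), int(high)


def merge_intervals(rows):
    """Merge overlapping rows; assumes rows are sorted by chrom, then low."""
    merged = []
    for chrom, low, high in rows:
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and low <= last[2]:
            # Overlaps the interval being built: only the upper bound can grow.
            last[2] = max(last[2], high)
        else:
            # No overlap (or a new chromosome): start a new interval.
            merged.append([chrom, low, high])
    return merged


def format_rows(merged, start=1):
    """Renumber merged rows; the starting number is an assumption."""
    return [
        "{}\t{}\t{}\tENHANCER000{}".format(chrom, low, high, n)
        for n, (chrom, low, high) in enumerate(merged, start)
    ]


# Small demonstration on rows taken from the question's sample data.
sample = [
    "1 724150 727150 ENHANCER0004",
    "1 725455 728455 ENHANCER0005",
    "1 2477030 2480030 ENHANCER00043",
    "1 2478160 2481160 ENHANCER00044",
    "1 2478845 2481845 ENHANCER00045",
]
for row in format_rows(merge_intervals(parse_line(s) for s in sample)):
    print(row)
```

Because each merged interval is appended exactly once and only extended afterwards, nothing is written twice, and the three-line overlap at the bottom of the sample collapses into the single range 2477030 to 2481845. To use it on the real file, the lines read from the input file can be passed through parse_line and the formatted rows written to the output file.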