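【Posted】: 2014-02-16 15:46:04
【Question】:
My problem seems like it should be easy, but I have run out of options, so any help would be appreciated.
I have many DNA sequences in FASTA format that need to be sliced at specific positions, after which the resulting parts need to be concatenated. So if my sequence file looks like this: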
~$ cat seq_file
>Sequence1
This is now a sequence that must require a bit of slicing and concatenation to be useful
>Sequence2
I have many more uncleaned strings like this in the form of sequences
I want the output to look like this:
>Sequence1
This is useful
>Sequence2
I have cleaned sequences
The slices are determined by slice indices stored in a separate CSV file. In this case, the slice positions are organized like this:
~$ cat test.csv
Sequence1,0,9,66,74,,
Sequence2,0,5,15,22,48,57
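To make the layout of that file concrete, here is a minimal Python 3 sketch of reading each row into a list of (start, end) pairs. It assumes the first column is the sequence ID, the remaining columns alternate start, end, and empty trailing columns mean "no further slice". The function name `parse_slice_rows` is hypothetical and used only for illustration.

```python
import csv
import io


def parse_slice_rows(csv_text):
    """Map each sequence ID to a list of (start, end) integer pairs.

    Hypothetical helper: assumes column 0 is the ID and the remaining
    columns alternate start, end, start, end, ...
    """
    slices = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        seq_id, fields = row[0], row[1:]
        pairs = []
        # Walk the fields two at a time: (start, end), (start, end), ...
        for start, end in zip(fields[0::2], fields[1::2]):
            if start.strip() and end.strip():  # skip empty trailing pairs
                pairs.append((int(start), int(end)))
        slices[seq_id] = pairs
    return slices


# The two rows shown in test.csv above
example = "Sequence1,0,9,66,74,,\nSequence2,0,5,15,22,48,57\n"
slice_table = parse_slice_rows(example)
print(slice_table)
```

Pairing the columns up front removes the need to track odd and even column numbers later, and the empty pair on the Sequence1 row is dropped instead of raising a ValueError.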
My code:
# Python 2 (print statement, dict.iteritems)
from Bio import SeqIO
import csv

# Load every FASTA record, keyed by its description line
seq_dict = {}
for seq_record in SeqIO.parse('seq_file', 'fasta'):
    descr = seq_record.description
    seq_dict[descr] = seq_record.seq

with open('test.csv', 'rb') as file:
    reader = csv.reader(file)
    for row in reader:
        seq_id = row[0]
        for n in range(1, 7):
            if n % 2 != 0:
                start = row[n]  # start positions occupy the odd-numbered columns
            else:
                end = row[n]
                for key, value in sorted(seq_dict.iteritems()):
                    #print key, value
                    if key == seq_id:  # cross check matching sequence identities
                        try:
                            slice_seq = value[int(start):int(end)]
                            print key
                            print slice_seq
                        except ValueError:
                            print 'Ignore empty slice indices.. '
This currently prints:
Sequence1
Thisisnow
Sequence1
useful
Ignore empty slice indices..
Sequence2
Ihave
Sequence2
cleaned
Sequence2
sequences
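So far so good, this is what I expected. But how can I combine the sliced parts, by joining, concatenating, or any other operation available in Python, to get the output I want? Thanks.
【Comments】: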
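One possible approach, sketched here in Python 3 rather than the Python 2 of the code above, is to collect all slices for a sequence first and join them once before printing. To stay self-contained the sketch uses plain strings and hard-coded slice pairs in place of the parsed FASTA records and CSV rows. The function name `join_slices` and the `records` and `slice_table` dictionaries are hypothetical stand-ins, and the sequences are written without spaces because that is how they appear in the printed output above.

```python
def join_slices(sequence, pairs):
    """Concatenate sequence[start:end] for every (start, end) pair.

    str() is applied to each slice so the same code should also work when
    `sequence` is a Biopython Seq object rather than a plain string.
    """
    return "".join(str(sequence[start:end]) for start, end in pairs)


# Stand-ins for the parsed FASTA records (ID -> sequence)
records = {
    "Sequence1": "Thisisnowasequencethatmustrequireabitofslicingandconcatenationtobeuseful",
    "Sequence2": "Ihavemanymoreuncleanedstringslikethisintheformofsequences",
}

# Stand-ins for the slice positions from test.csv, empty pairs already dropped
slice_table = {
    "Sequence1": [(0, 9), (66, 74)],
    "Sequence2": [(0, 5), (15, 22), (48, 57)],
}

cleaned = {}
for seq_id in sorted(records):
    cleaned[seq_id] = join_slices(records[seq_id], slice_table[seq_id])
    print(">" + seq_id)
    print(cleaned[seq_id])
```

With these inputs the sketch prints one header line and one joined sequence per record, for example `Ihavecleanedsequences` for Sequence2. Looking each sequence up directly by its ID also avoids looping over the whole dictionary for every slice.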