【Posted】: 2022-01-28 14:21:58
【Problem description】:
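My code doesn't throw an error; it just creates the files, but they are empty. I tried it from the command line, where it worked with the wildcard path training_set_pssm/*.pssm, but I have to run it from the IDE, because it doesn't print the correct output either way.
The input files are a set of checkpoint files that look like this:
From this file, which is a text file saved as .pssm, I essentially extract only the PROFILE part on the right-hand side and normalize it at the same time... My code doesn't seem to do this correctly, and from the IDE it doesn't do it at all, so I'm not sure what I need to change in the script to make it work.
The code is as follows: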
#!/usr/bin/env python3
import sys
import os.path
from pathlib import Path
def pssm_list(infile):  # call list of file names and for dsspfile
    '''Reads relevant lines from a pssm file and saves them to a list.
    Returns values of the 2 matrices (no header).'''
    with open(infile) as ofile:
        flist = ofile.readlines()[3:-6]  # list of each line of the file excluding first 3 & last 6 lines
    return flist
def lines_to_list(infile1):
    '''Reads all lines from a file and saves them to a list containing the '\n' char.'''
    all_lines_list = []
    with open(infile1, 'r') as rfile:
        all_lines_list = rfile.readlines()
    return all_lines_list  # need to rstrip in a loop for using filenames.
def relevant_lines(infile2):
    '''Takes list (extracted from a .pssm file) and extracts the Sequence Profile Portion only.
    Returns a list of lists where each element is one line of the sequence profile matrix.'''
    pssm_profile_list = pssm_list(infile2)  # contains all lines from the pssm file.
    profile_final_list = []  # for holding relevant fields of the lines
    for line in pssm_profile_list:
        #print(line)
        profile_fields = line.split()[22:42]  # profile ranges from pos 22-42
        profile_final_list.append(profile_fields)  # appending to final list of lists
    return profile_final_list  # list of lists
# divide all values by 100
def write_normalized_profile(profile_final_list, ofile):
    '''Takes profile list of lists and outfile name as input. Takes each number that is in
    one of the sublists and divides it by 100. The number is converted to a string, followed
    by a tab, and written to a file. After each sublist a newline character is written to the file.'''
    with open(ofile, "a") as wfile:
        for sublist in profile_final_list:
            # print(sublist)
            for el in sublist:
                num = int(el) / 100
                numstring = str(num)
                wfile.write(numstring + '\t')  # adding tab after each number
            wfile.write("\n")  # adding newline at the end of each sublist.
            #print(sublist)
            #print(numstring)
if __name__ == '__main__':
    # infile = sys.argv[1]
    infile = ('/Users/name/Desktop/PDB/training_set_pssm/idlist/')  # the idlist to loop on
    #print(infile)
    # Call the function by looping through an id list+'.pssm' extension
    # name the outfile the same --> id list+'.profile'
    idlist = lines_to_list("/Users/name/Desktop/PDB/training_set_idlist")  # containing the id of the file but NOT the extension ".pssm"
    #print(idlist)
    for ids in idlist:
        #print(ids)
        part2 = ids.rstrip() + '.pssm'  # removing newline character, adding necessary extension
        #print(part2)
        if os.path.isfile(infile) == True:  # does this file exist
            ofile = ids.rstrip() + '.profile'  # outfile for each id with correct extension
            #print(ofile)
            profile_list = relevant_lines(infile)
            #print(profile_list)
            write_normalized_profile(profile_list, ofile)
            #print(write_normalized_profile)
            #print(profile_list)
        else:
            print("Error file: " + infile + " not found.")
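One possible reading of the loop above is that each id should be combined with a directory and the `.pssm` extension before the file is opened. The following is a minimal sketch under that assumption, not a confirmed fix. The function names `read_profile_rows`, `write_profile`, and `convert_ids` are hypothetical, the directory arguments are placeholders, and opening the output with `"w"` instead of `"a"` is a choice made here so that reruns do not append to earlier output.

```python
from pathlib import Path


def read_profile_rows(pssm_path):
    """Return the profile columns (fields 22..41) of each matrix row as strings."""
    with open(pssm_path) as handle:
        lines = handle.readlines()[3:-6]  # same header/footer trimming as the question
    return [line.split()[22:42] for line in lines]


def write_profile(rows, out_path):
    """Write each value divided by 100, tab-separated, one matrix row per line."""
    # Assumption: "w" is used so that rerunning overwrites instead of appending.
    with open(out_path, "w") as out:
        for row in rows:
            out.write("\t".join(str(int(value) / 100) for value in row) + "\n")


def convert_ids(id_file, pssm_dir, out_dir):
    """Convert <id>.pssm in pssm_dir to <id>.profile in out_dir for each listed id."""
    pssm_dir, out_dir = Path(pssm_dir), Path(out_dir)
    written = []
    for raw in Path(id_file).read_text().splitlines():
        seq_id = raw.strip()
        if not seq_id:
            continue  # skip blank lines in the id list
        pssm_path = pssm_dir / (seq_id + ".pssm")  # directory + id + extension
        if not pssm_path.is_file():
            print(f"Error: file {pssm_path} not found.")
            continue
        out_path = out_dir / (seq_id + ".profile")
        write_profile(read_profile_rows(pssm_path), out_path)
        written.append(out_path)
    return written
```

Unlike the original, this sketch does not write a trailing tab after the last number of each row, and it returns the list of files it wrote so the caller can check that something was produced.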
【Discussion】:
-
Which IDE are you using?
-
Are you familiar with stackoverflow.com/questions/4929251/… or have you looked at the IDE's help section?
-
@plumbn Pycharm!
-
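@PM77-1 Well, I think the problem is in the code... something is incorrect in the "inline" path I'm pointing it to, whereas calling it from the command line uses the wildcard and accesses every *.pssm file.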
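On a Unix-like command line the shell usually expands training_set_pssm/*.pssm into a list of file names before Python starts, whereas an IDE run configuration typically passes arguments without that expansion. A minimal sketch of doing the matching inside the script instead, assuming all inputs sit directly in one directory; the helper names `find_pssm_files` and `profile_name` are hypothetical:

```python
from pathlib import Path


def find_pssm_files(pssm_dir):
    """Return all *.pssm files directly inside pssm_dir, sorted by name."""
    return sorted(Path(pssm_dir).glob("*.pssm"))


def profile_name(pssm_path):
    """Map e.g. some_id.pssm to some_id.profile in the same directory."""
    return Path(pssm_path).with_suffix(".profile")
```

Looping over `find_pssm_files("training_set_pssm")` would then visit the same files the wildcard matches, regardless of how the script is launched.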
-
So if I understand your code correctly, infile is just a file containing the ids. ofile is the output file created for each id. Is part2 needed for anything?
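In the question's loop, part2 is built but never used, and os.path.isfile is applied to the same fixed infile on every iteration; os.path.isfile returns False when the path is a directory. A small diagnostic sketch of where part2 would plausibly belong, assuming infile is meant to be the directory that holds the .pssm files (this is an assumption, and `describe_paths` is a hypothetical helper):

```python
import os.path


def describe_paths(pssm_dir, ids):
    """Return (input path, output name, input exists) per id, without reading files."""
    report = []
    for raw in ids:
        seq_id = raw.rstrip()      # drop the trailing newline from the id list
        part2 = seq_id + ".pssm"   # file name for this id
        in_path = os.path.join(pssm_dir, part2)
        report.append((in_path, seq_id + ".profile", os.path.isfile(in_path)))
    return report
```

Printing this report before any conversion shows which ids resolve to an existing file and which do not.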
Tags: python loops matrix arguments parameter-passing