【Question title】: Python not appending to list
【Posted】: 2021-12-03 06:17:54
【Question】:

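I have a script I have used before that takes a list of keywords and queries a master file with many columns and entries. The script should read the master file line by line and, whenever it encounters a keyword, write the whole line to a new file.

The keyword file looks like this: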

A2M,ABCC9,ACADVL,ACTC1,ACTN2,ADA2,AGL

The master file looks like this:

8:27379821,8,27379821,[A/T],NM_001979,NM_001256482,NM_001256483,NM_001256484,A2M,A2M,A2M,A2M,,Silent,Silent,Silent,Silent
GSA-rs72475893,8,27380763,[A/G],NM_001979,NM_001256482,NM_001256483,NM_001256484,AM,AM,AM,AM,EXON,Missense_R1407W,Missense_R1307W,Missense_R1257W,Missense_R1407W
8:27381207,8,27381207,[A/C],NM_001979,NM_001256482,NM_001256483,NM_001256484,ADA2,ADA2,ADA2,ADA2,,Silent,Silent,Silent,Silent 
GSA-rs117056676,6,72385948,[T/C],,,,,AADACL2-AS1,AADAC,EXON,Silent,Silent,Missense_X400Q

The desired output would be:

8:27379821,8,27379821,[A/T],NM_001979,NM_001256482,NM_001256483,NM_001256484,A2M,A2M,A2M,A2M,,Silent,Silent,Silent,Silent
8:27381207,8,27381207,[A/C],NM_001979,NM_001256482,NM_001256483,NM_001256484,ADA2,ADA2,ADA2,ADA2,,Silent,Silent,Silent,Silent

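The code I am using is below. The problem is that the matches list variable appears to be empty; nothing is being appended to it. Why is that? Is it not finding any matches, or is it failing to append them to the list?

I have tried using both the master file and the keyword file as .csv and as .txt, but neither worked.

Thanks for your help!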

#open the list of words to search for
list_file = open(r'file.csv','r')

search_words = []

#loop through the words in the search list
for word in list_file:

    #save each word in an array and strip whitespace
    search_words.append(word.strip())

list_file.close()

#this is where the matching lines will be stored
matches = []

#open the master file
master_file = open(r'file2.csv','r')

#loop through each line in the master file
for line in master_file:

    #split the current line into array, this allows for us to use the "in" operator to search for exact strings
    current_line = line.split()

    #loop through each search word
    for search_word in search_words:

        #check if the search word is in the current line
        if search_word in current_line:

            #if found then save the line as we found it in the file
            matches.append(line)

            #once found then stop searching the current line
            break

master_file.close()

#create the new file
new_file = open(r'file3.txt', 'w')

#loop through all of the matched lines
for line in matches:

    #write the current matched line to the new file
    new_file.write(line)

new_file.close()

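【Comments】:

  • .split() splits a string on whitespace by default (which does not appear to exist in your file), not on commas.
  • Some simple debugging would show that current_line probably does not contain what you think it does. See "How to debug small programs".
  • @jasonharper Thanks! That seems to have fixed it.
  • @Woodford The lines were actually being read fine, iterating all the way to the end. Still a very useful resource for the future, thanks!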
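
The first comment can be checked directly in the interpreter. This is only a small sketch: the row is a shortened version of a line from the master file in the question, and the variable names are made up for illustration.

```python
# A shortened row from the master file in the question.
line = "8:27379821,8,27379821,[A/T],NM_001979,A2M,A2M,,Silent\n"

# With no argument, split() separates on runs of whitespace only.
# The row contains no spaces, so it comes back as a single item.
by_whitespace = line.split()
print(len(by_whitespace))          # 1

# Passing "," splits the row into its individual fields.
by_comma = line.strip().split(",")
print("A2M" in by_whitespace)      # False: the only item is the whole row
print("A2M" in by_comma)           # True
```

Because `in` on a list checks for an exact element, the keyword is never found when the list holds the whole row as one string, which is consistent with matches staying empty.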

Tags: python matching file-writing


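【Solution 1】:

I added two print statements to see what was happening behind the scenes and found that the first file you read is not being split into separate words.

search_words is stored like this: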

search_words = ['hello this is a line']

instead of like this:

search_words = ['hello', 'this', 'is', 'a', 'line']

I changed line 10

to this: search_words += (word.strip()).split() instead of this: search_words.append(word.strip())

Here is the modified code:

#open the list of words to search for
list_file = open(r'file.csv','r')

search_words = []

#loop through the words in the search list
for word in list_file:

    #save each word in an array and strip whitespace
    search_words += (word.strip()).split()

list_file.close()

#print (search_words)    

#this is where the matching lines will be stored
matches = []

#open the master file
master_file = open(r'file2.csv','r')

#loop through each line in the master file
for line in master_file:

    #split the current line into array, this allows for us to use the "in" operator to search for exact strings
    current_line = line.split()

    #print (current_line)

    #loop through each search word
    for search_word in search_words:

        #check if the search word is in the current line
        if search_word in current_line:

            #if found then save the line as we found it in the file
            matches.append(line)

            #once found then stop searching the current line
            break

master_file.close()

#create the new file
new_file = open(r'file3.txt', 'w')

#loop through all of the matched lines
for line in matches:

    #write the current matched line to the new file
    new_file.write(line)

new_file.close()

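【Comments】:

  • It would be better to use .strip() after splitting the terms; otherwise you only remove whitespace from the ends of the string before splitting, not from each term.
  • Awesome, I didn't realize there was no delimiter between them, since they were visually separated. Thanks!
  • @Da Chucky An empty split() call removes any amount of whitespace between strings :D
  • @Da Chucky also posted a solution; you can take a look at it and see whether I might have overlooked something.
  • @SeekNDstroy Yes, it does, but that assumes you want to split on whitespace rather than on some other token such as , (which is what you need to do in this case).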
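
The point in these comments about the order of .strip() and .split() can be seen in a minimal sketch. The keyword string below is hypothetical: the spaces around the commas are added for illustration and are not in the question's file.

```python
# Hypothetical keyword line with spaces around the commas and a trailing newline.
raw = "A2M, ABCC9 ,ACADVL\n"

# Stripping first only trims the ends of the whole string,
# so the inner terms keep their surrounding spaces.
strip_then_split = raw.strip().split(",")
print(strip_then_split)   # ['A2M', ' ABCC9 ', 'ACADVL']

# Splitting first and then stripping each term cleans every keyword.
split_then_strip = [term.strip() for term in raw.split(",")]
print(split_then_strip)   # ['A2M', 'ABCC9', 'ACADVL']
```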
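【Solution 2】:

At a glance there are two problems:

First, iterating over a file gives you lines; it does not split on the , character. You need to use .split() before using .strip() and appending to the search list. I have removed the per-line iteration because your example input has only one line, but you can easily add it back if you expect multiple lines.

Second, .split() splits on whitespace by default, not on , so you need to pass "," as the argument to .split().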
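
The first problem can be reproduced without touching the disk. In this sketch, io.StringIO stands in for the keyword file from the question; the names per_line and per_keyword are only illustrative.

```python
import io

# Stand-in for the keyword file from the question (one comma-separated line).
list_file = io.StringIO("A2M,ABCC9,ACADVL,ACTC1,ACTN2,ADA2,AGL\n")

# Iterating over a file object yields one string per line.
per_line = [word.strip() for word in list_file]
print(per_line)        # ['A2M,ABCC9,ACADVL,ACTC1,ACTN2,ADA2,AGL']

# Rewind and split the contents on "," to get the individual keywords.
list_file.seek(0)
per_keyword = [word.strip() for word in list_file.read().split(",")]
print(per_keyword)     # ['A2M', 'ABCC9', 'ACADVL', 'ACTC1', 'ACTN2', 'ADA2', 'AGL']
```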

With these fixes (and using context managers to open the files), the fixed code is:

search_words = []
with open(r'file.csv','r') as list_file:
    for word in list_file.read().split(","):  # Fix 1
        search_words.append(word.strip())

matches = []

with open(r'file2.csv','r') as master_file:
    for line in master_file:
        # Not strictly necessary, we can search in the string using in
        current_line = line.split(",")   # Fix 2

        for search_word in search_words:
            if search_word in current_line:
                matches.append(line)
                break

with open(r'file3.txt', 'w') as new_file:
    for line in matches:
        new_file.write(line)
        print(line)

Result:

8:27379821,8,27379821,[A/T],NM_001979,NM_001256482,NM_001256483,NM_001256484,A2M,A2M,A2M,A2M,,Silent,Silent,Silent,Silent
8:27381207,8,27381207,[A/C],NM_001979,NM_001256482,NM_001256483,NM_001256484,ADA2,ADA2,ADA2,ADA2,,Silent,Silent,Silent,Silent 

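(Note that the console output has an extra blank line between each line, because print() adds a newline to lines that already end with one.)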
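
To try the same matching logic without creating any files, here is a self-contained sketch that feeds shortened sample rows from the question through io.StringIO. The helper name filter_rows, the use of a set for the keywords, and the discarding of empty keywords are my own assumptions, not part of the answer above; the matching rule (exact match on a comma-separated field) is the same.

```python
import io

def filter_rows(keyword_file, master_file):
    """Return the master-file lines that contain a keyword as an exact field.

    filter_rows is a hypothetical helper for illustration only.
    """
    # Split on commas first, then strip each keyword.
    keywords = {word.strip() for word in keyword_file.read().split(",")}
    keywords.discard("")  # ignore empty entries such as a trailing comma

    matches = []
    for line in master_file:
        fields = [field.strip() for field in line.split(",")]
        # any() stops at the first matching field, like the break in the loop.
        if any(field in keywords for field in fields):
            matches.append(line)
    return matches

keyword_data = io.StringIO("A2M,ABCC9,ACADVL,ACTC1,ACTN2,ADA2,AGL\n")
master_data = io.StringIO(
    "8:27379821,8,27379821,[A/T],NM_001979,A2M,A2M,,Silent\n"
    "GSA-rs72475893,8,27380763,[A/G],NM_001979,AM,AM,EXON,Missense_R1407W\n"
    "8:27381207,8,27381207,[A/C],NM_001979,ADA2,ADA2,,Silent\n"
    "GSA-rs117056676,6,72385948,[T/C],,,,,AADACL2-AS1,AADAC,EXON,Silent\n"
)

matches = filter_rows(keyword_data, master_data)
for line in matches:
    # end="" avoids the extra blank line, since each line already ends in "\n".
    print(line, end="")
```

Discarding the empty string matters for data like this: the master rows contain empty fields, so an empty keyword would otherwise match almost every row.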

【Comments】:
