Question title: Python: text file replace different strings in multiple lines HOW?
Posted: 2017-02-16 05:22:27
Question:

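Basic task: convert a URL request to text and dump it to a text file (an almost usable CSV).

Goal: a clean CSV. Across multiple lines, I am trying to replace several different characters:

brackets, tildes (~), and the extra comma at the end of each line.

I can't find any relatively simple example that does this. I'm looking for something that loops through the file line by line and does the replacements.

Note: I expect this file to grow over time, so reading it all into memory at once would not be memory-friendly.

Here is the code that creates the file: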

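import urllib.request

# URL1 is the source URL, defined elsewhere in the original script
with urllib.request.urlopen(URL1) as response:
    data = response.read()
# decode() already returns a str, so no extra str() conversion is needed
decoded_data = data.decode(encoding='UTF-8')

# The with block closes the file even if write() fails
with open("test.txt", 'w') as saveFile:
    saveFile.write(decoded_data)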
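Because the file is expected to grow, one option is to stream the response to disk in chunks instead of calling `read()` on the whole body. This is a minimal sketch that assumes the server sends UTF-8 text. The helper name `save_response` is hypothetical, and `io.BytesIO` stands in for the real response so the sketch runs without network access:

```python
import io
import shutil


def save_response(response, path, chunk_size=64 * 1024):
    """Copy a readable binary stream (such as the object returned by
    urllib.request.urlopen) to `path` in fixed-size chunks.

    The bytes are written unchanged, so a UTF-8 body stays UTF-8 on disk,
    and memory use is bounded by chunk_size rather than by the body size.
    """
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)


# Stand-in for urllib.request.urlopen(URL1); a real response object is
# also a readable binary stream, so the call looks the same.
fake_response = io.BytesIO('[["F1","F2"],\n["s~ring11","string12"]]'.encode("utf-8"))
save_response(fake_response, "test.txt")

with open("test.txt", encoding="utf-8") as f:
    print(f.read())
```

With a real request this would be `with urllib.request.urlopen(URL1) as response: save_response(response, "test.txt")`.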

Here is a simplified sample of the file. The first line holds the field names, and the second and third lines are records.

[["F1","F2","F3","F4","F5","F6"],

["string11","string12","string13","s~ring14","string15","string16"],

["string21","string22","s~ring23","string24","string25","string26"]]
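The sample above is a JSON array of arrays whose first row holds the field names. One option, therefore, is to parse it with the `json` module and let the `csv` module handle the separators, instead of stripping brackets and commas by hand. This sketch assumes the whole response fits in memory. The output file name `test.csv` is illustrative:

```python
import csv
import json

# The simplified sample from the question (JSON text, as returned by the URL).
sample = """[["F1","F2","F3","F4","F5","F6"],
["string11","string12","string13","s~ring14","string15","string16"],
["string21","string22","s~ring23","string24","string25","string26"]]"""

rows = json.loads(sample)  # list of lists of str: brackets and separators are gone

with open("test.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for row in rows:
        # Tildes are part of the field values, so they still have to be removed.
        writer.writerow(field.replace("~", "") for field in row)

with open("test.csv", newline="", encoding="utf-8") as f:
    print(f.read())
```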

Question comments:

    Tags: string python-3.x replace


    Answer 1:

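    If the characters you want to remove are at the start or end of a string, use strip. If they can appear anywhere, use replace instead, e.g. line.replace("~",""). Note that, unlike strip, replace cannot take several characters in a single call, but you can chain calls: line.replace("~","").replace(",","").replace("[","") (keep in mind that replacing "," removes every comma, including the field separators).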
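Here is a quick illustration of the difference, using a line from the question's file. `str.translate` with a table built by `str.maketrans` is an extra standard-library option that the answer does not use. It deletes several characters in one call:

```python
line = '["string11","string12","string13","s~ring14","string15","string16"],\n'

# strip() removes any combination of the given characters, but only at the ends.
ends_only = line.strip(" [],\n")
print(ends_only)

# replace() works anywhere in the string; chain it for several characters.
chained = line.replace("~", "").replace("[", "").replace("]", "")
print(chained)

# str.maketrans(x, y, z): characters in z are mapped to None, i.e. deleted.
delete_table = str.maketrans("", "", "~[]")
translated = line.translate(delete_table)
print(translated)
```

Neither method alone handles both cases: strip leaves the tilde in place, and the chained replace leaves the trailing comma. That is why the mock-up below combines them.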

    Just a quick mock-up that may work for you:

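    # Read the raw file line by line and write a cleaned copy (memory use stays constant)
    with open("test.txt", 'r') as f, open("result.txt", 'w') as new_f:
        for line in f:
            # strip() drops brackets, commas and whitespace only at the ends of the line;
            # replace() removes tildes wherever they occur
            new_line = line.strip(" [],\n\t\r").replace("~", "")
            print(new_line)  # optional: echo each cleaned line
            new_f.write(new_line + "\n")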
    

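    I used replace for the tilde because it can appear anywhere, while the brackets and commas usually appear at the ends of a line. I also added "\n", "\t", "\r" and a space to strip, because those characters may be there too (at least "\n" will certainly be at the end of every line).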
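Because the question expects the file to grow, the same per-line cleanup can be wrapped in a generator, so only one line is held in memory at a time. In this sketch, `clean_lines` is a hypothetical helper name, and `io.StringIO` stands in for the opened `test.txt` so the example runs on its own:

```python
import io


def clean_lines(lines):
    """Yield each line with brackets, commas and whitespace stripped from
    the ends and every tilde removed, skipping lines that end up empty."""
    for line in lines:
        cleaned = line.strip(" [],\n\t\r").replace("~", "")
        if cleaned:  # the question's sample has blank lines between records
            yield cleaned


# Stand-in for open("test.txt"); any iterable of lines works.
raw = io.StringIO(
    '[["F1","F2","F3","F4","F5","F6"],\n'
    "\n"
    '["string11","string12","string13","s~ring14","string15","string16"],\n'
    "\n"
    '["string21","string22","s~ring23","string24","string25","string26"]]\n'
)

result = list(clean_lines(raw))
for row in result:
    print(row)
```

Unlike the mock-up, this version drops blank lines rather than writing them out. In a real script the output can be streamed straight to disk, e.g. `dst.writelines(l + "\n" for l in clean_lines(src))` inside `with open("test.txt") as src, open("result.txt", "w") as dst:`.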

    Comments:

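    • Yes, that did it. Perfect! Thank you!! :-) Wow. It handles the tildes and the brackets.
    • Found the original reason the brackets ended up in the text file. The URL points to JSON that is meant to represent a data table (i.e. columns and rows). The problem is that I couldn't find a solid example showing how to handle it. Below I have reposted my code with the corrections. Note that the "scrubber" above is not in my corrected code.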
    Answer 2:

    You can use a simple for loop to iterate over the file, then replace the characters in each line:

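    # Read the file, clean each line, collect the result, then write it back
    with open("text.txt", "r") as file:
        clean_txt = ""
        for line in file:
            line = line.replace("~", "").replace("[", "").replace("]", "")
            # Strings are immutable, so line[len(line)-1] = "" raises TypeError;
            # strip the newline and the trailing comma instead, and keep the line
            clean_txt += line.rstrip("\n").rstrip(",") + "\n"
    with open("text.txt", "w") as w:
        w.write(clean_txt)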
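Writing the result back over the file being cleaned can lose data if something fails partway through. A common alternative is to stream into a temporary file in the same directory and then swap it in with `os.replace`. In this sketch, `clean_file_in_place` is a hypothetical helper name, and the cleanup rules mirror the answer above:

```python
import os
import tempfile


def clean_file_in_place(path):
    """Rewrite `path` line by line: remove tildes and square brackets and drop
    the trailing comma, without holding the whole file in memory."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace is a simple rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as dst, \
                open(path, "r", encoding="utf-8") as src:
            for line in src:
                line = line.replace("~", "").replace("[", "").replace("]", "")
                line = line.rstrip("\r\n").rstrip(",")  # drop newline, then trailing comma
                dst.write(line + "\n")
        # Only swap once the whole file was written successfully.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise


# Small demo file with the same shape as the question's data.
with open("text.txt", "w", encoding="utf-8") as f:
    f.write('["string11","s~ring12"],\n["string21","string22"]]\n')

clean_file_in_place("text.txt")
with open("text.txt", encoding="utf-8") as f:
    print(f.read())
```

On POSIX systems `os.replace` is an atomic rename, so the original file is either left untouched or fully replaced.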
    

    Comments:

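    • Thanks for the input. It actually deleted the entire contents of the file. I had tried this approach before posting. When I did get it to work, it only "operated" on the first line. I'm looking for something that goes through the whole file.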
    Answer 3:

    #!/usr/bin/env python3
    
    # Note, I used the print function as a way to visually confirm the code worked.
    # URL_call returns bytes containing serialized (JSON) data for a basic table (columns and rows, where the first row holds the column names, just like Excel or SQL)
    
    URL_call = ("http://www.zzz.com/blabla.html")
    
    # urllib: the response bytes must first be decoded from UTF-8
    import urllib.request
    with urllib.request.urlopen(URL_call) as response:
        URL_data = response.read()
    
    URL_data_decoded = URL_data.decode(encoding='UTF-8')
    
    # use json to convert decoded response into a python structure (from a JSON structure)
    import json
    URL_data_JSON = json.loads(URL_data_decoded)
    
    # pandas will transition the python data structure from a "list-like" array to a table.
    import pandas as pd
    URL_data_panda = pd.DataFrame(URL_data_JSON)
    
    # this will create the text (in this case a CSV) file
    URL_data_panda.to_csv("test.csv")
    
    # to_csv writes pandas' numeric column labels (0, 1, 2, ...) as a header row; remove it so the JSON field names become the header

    # Count the lines (the with block closes the file afterwards)
    with open("test.csv") as f:
        num_lines = sum(1 for line in f)

    print(num_lines)

    # Index 0 is the first row of text, so writing from index 1 onward removes it
    with open("test.csv") as src:
        lines = src.readlines()
    with open("test2.csv", "w") as dst:
        dst.writelines(lines[1:num_lines])
    
    
    # Rename the first column (pandas' index, whose header is now "0") to a normalized name.

    import fileinput

    # Note: to keep a backup, pass an extra argument: fileinput.FileInput("test2.csv", inplace=True, backup='.bak')
    with fileinput.FileInput("test2.csv", inplace=True) as file:
        for line in file:
            # Only touch the header line; otherwise any "0," in the data (e.g. record number 10) would be rewritten too
            if file.isfirstline():
                line = line.replace("0,", "REC_NUM,", 1)
            print(line, end='')
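The pandas round trip above writes a header of numeric column labels and an index column, which is why the follow-up edits are needed. As an alternative that avoids them, the parsed JSON can be written directly with the standard-library `csv` module, using the first JSON row as the header. In this sketch, `rows_to_csv` is a hypothetical helper, and `REC_NUM` is the column name the answer settles on. With pandas, `pd.DataFrame(URL_data_JSON[1:], columns=URL_data_JSON[0]).to_csv("test2.csv", index_label="REC_NUM")` should produce a similar layout, except that the record numbers start at 0.

```python
import csv
import json


def rows_to_csv(json_text, path):
    """Write a JSON array of arrays (first row = field names) to `path` as CSV,
    prefixed with a REC_NUM column numbered from 1."""
    header, *records = json.loads(json_text)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["REC_NUM", *header])
        for rec_num, record in enumerate(records, start=1):
            writer.writerow([rec_num, *record])


sample = (
    '[["F1","F2","F3"],'
    '["string11","string12","string13"],'
    '["string21","string22","string23"]]'
)
rows_to_csv(sample, "test2.csv")
with open("test2.csv", newline="", encoding="utf-8") as f:
    print(f.read())
```

In the answer's script, this would replace everything after decoding the response, e.g. `rows_to_csv(URL_data_decoded, "test2.csv")`.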
    

    Comments:
