【Question title】: Combining two "for" loops to integrate a progress bar while deleting lines in a text file in Python?
【Posted】: 2020-09-18 20:09:01
【Question】:

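I managed to get two Python scripts working independently. The first one finds a string in a text file and deletes every line that contains that string.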

bad_words = ['first.com','second.org','third.io']

with open('input.txt') as oldfile, open('output.txt', 'w') as newfile:
    for line in oldfile:
        if not any(bad_word in line for bad_word in bad_words):
            newfile.write(line) 

The process takes a long time, because the input has close to 1,000,000 lines and bad_words has close to 100 entries.
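Before running the filter over a million lines, it can help to check it on a small sample. A minimal sketch, assuming the same `any(...)` test as the script above: the hypothetical helper `filter_stream` takes any file-like objects and returns how many lines were kept and removed, so `io.StringIO` can stand in for `input.txt`/`output.txt`.

```python
import io

def filter_stream(oldfile, newfile, bad_words):
    """Write to newfile every line of oldfile that contains none of bad_words.

    Returns (kept, removed) so a run on a small sample can be sanity-checked
    before processing the full input file.
    """
    kept = removed = 0
    for line in oldfile:
        if any(bad_word in line for bad_word in bad_words):
            removed += 1
        else:
            newfile.write(line)
            kept += 1
    return kept, removed

# Small in-memory sample (hypothetical data) instead of input.txt / output.txt
sample = io.StringIO("a first.com b\nkeep me\nx third.io\nalso keep\n")
out = io.StringIO()
print(filter_stream(sample, out, ['first.com', 'second.org', 'third.io']))  # (2, 2)
print(out.getvalue(), end="")
```

With real files, the same function works unchanged: pass the `oldfile` and `newfile` objects from the `with open(...)` statement.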

So I would like to show a progress bar in the terminal while it runs. I found this one, and it works: it advances every 1/10 of a second.

import time

# Print iterations progress
def printProgressBar (iteration, total, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█', printEnd = "\r"):
    """
    Call in a loop to create terminal progress bar
    @params:
        iteration   - Required  : current iteration (Int)
        total       - Required  : total iterations (Int)
        prefix      - Optional  : prefix string (Str)
        suffix      - Optional  : suffix string (Str)
        decimals    - Optional  : positive number of decimals in percent complete (Int)
        length      - Optional  : character length of bar (Int)
        fill        - Optional  : bar fill character (Str)
        printEnd    - Optional  : end character (e.g. "\r", "\r\n") (Str)
    """
    percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
    filledLength = int(length * iteration // total)
    bar = fill * filledLength + '-' * (length - filledLength)
    print('\r%s |%s| %s%% %s' % (prefix, bar, percent, suffix), end = printEnd)
    # Print New Line on Complete
    if iteration == total: 
        print()



# A List of Items
items = list(range(0, 57))
l = len(items)

# Initial call to print 0% progress
printProgressBar(0, l, prefix = 'Progress:', suffix = 'Complete', length = 50)
for i, item in enumerate(items):
    # Do stuff...
    time.sleep(0.1)
    # Update Progress Bar
    printProgressBar(i + 1, l, prefix = 'Progress:', suffix = 'Complete', length = 50)

Instead of time.sleep, I want the progress bar to advance as each word in bad_words is processed.

So I came up with this:

def printProgressBar (iteration, total, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█', printEnd = "\r"):

    percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
    filledLength = int(length * iteration // total)
    bar = fill * filledLength + '-' * (length - filledLength)
    print('\r%s |%s| %s%% %s' % (prefix, bar, percent, suffix), end = printEnd)
    # Print New Line on Complete
    if iteration == total: 
        print()

items = ['first.com','second.org','third.io']
l = len(items)


printProgressBar(0, l, prefix = 'Progress:', suffix = 'Complete', length = 50)
with open('input.txt') as oldfile, open('output.txt', 'w') as newfile:
    for line in oldfile:
        if not any(item in line for item in items):
            newfile.write(line)
    for i, item in enumerate(items):
    # Update Progress Bar
        printProgressBar(i + 1, l, prefix = 'Progress:', suffix = 'Complete', length = 50)

It seems that combining the "for" loops and the "if" like this doesn't work.

【Question discussion】:

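  • I don't know what you mean by "overlapping", but what you have right now does all of the writing first, and then tries to do all of the progress bar updates. And instead of updating once per line of output, it updates once per "bad word". The problem with this kind of progress bar is that you need to know in advance how many iterations there will be. That's also why the progress bars you get from the OS look so inaccurate: even when they know the count, the iterations don't all take the same amount of time, so they have to estimate.
  • Thanks for your reply, Karl. By "overlapping" I mean acting together. The number of iterations is defined by "l = len(items)". I also tried setting it to a number, "l = 7" (when bad_words had only 7 entries). That didn't change anything. You've found the problem: it tries to finish all of the writing before updating the progress bar. The output file is created, but stays empty.
  • `The number of iterations is defined by "l = len(items)"` OK, but now look at what you chose as items, and at its length. Does that make sense for what you expect to happen? `The output file is created, but stays empty.` So, does the input actually contain any "good" lines? Have you checked whether the newfile.write call is ever reached?
  • The items are strings, whereas in the example they are integers (from 1 to 57). The input file does have 100,000 lines, and I also tried a smaller input file. The result is the same. Is there a way to use the position of the strings in items instead of their values?
  • @KarlKnechtel You pointed out a possible error: I get FileNotFoundError: [Errno 2] No such file or directory: 'input.txt' even though there is a file with that name in the directory. I don't understand what's happening.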
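One common cause of this FileNotFoundError (an assumption; the thread does not confirm it) is that a relative name such as 'input.txt' is looked up in the current working directory, i.e. wherever the script was launched from, which is not necessarily the folder that contains the script. A file whose real name has a hidden extension, such as input.txt.txt on Windows, produces the same error. A minimal sketch that prints the working directory and builds an absolute path next to the script; `path_next_to` is a hypothetical helper:

```python
import os
from pathlib import Path

def path_next_to(script_path, filename):
    """Return an absolute path to `filename` in the same folder as `script_path`."""
    return Path(script_path).resolve().parent / filename

# Relative names are resolved against this directory, not the script's folder
print("Current working directory:", os.getcwd())

# Inside a real script you would call: path_next_to(__file__, 'input.txt')
example = path_next_to("/home/user/project/clean.py", "input.txt")
print(example)
```

Opening `path_next_to(__file__, 'input.txt')` instead of `'input.txt'` makes the script independent of the directory it is started from.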

Tags: python python-3.x for-loop if-statement progress-bar


【Solution 1】:

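As @KarlKnechtel suggested, the way to do it is to count the lines processed rather than the "bad_words", because all of the bad words are checked together on each line.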
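Counting lines first costs an extra full pass over the file. A variation (a different technique from the one in this answer) is to measure progress in bytes: `os.path.getsize` gives the total up front, and summing `len(line)` gives the bytes read so far. A minimal sketch; the helper name is hypothetical, and because the file is read in binary mode the bad words must be bytes (e.g. `[w.encode() for w in bad_words]`, assuming an ASCII/UTF-8 file):

```python
import os
import tempfile

def filter_with_byte_progress(src_path, dst_path, bad_words, report):
    """Copy src_path to dst_path without the lines containing any of bad_words.

    Progress is reported as report(bytes_read, total_bytes). The total comes
    from os.path.getsize, so no separate line-counting pass is needed.
    The files are opened in binary mode, so bad_words must be bytes.
    """
    total = os.path.getsize(src_path)
    done = 0
    with open(src_path, "rb") as oldfile, open(dst_path, "wb") as newfile:
        for line in oldfile:
            done += len(line)  # bytes in this line, including its line ending
            if not any(bad_word in line for bad_word in bad_words):
                newfile.write(line)
            report(done, total)
    return done, total

# Demo on a small temporary file (hypothetical content)
calls = []
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "wb") as f:
        f.write(b"keep\nx geovisite.com\nkeep too\n")
    result = filter_with_byte_progress(
        src, dst, [b"geovisite.com"], lambda done, total: calls.append((done, total))
    )
    with open(dst, "rb") as f:
        kept = f.read()
print(result, kept)
```

Because `report` receives `(iteration, total)` positionally, `printProgressBar` can be passed directly, and the last call has `done == total`, which triggers its final newline.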

Adding a progress bar makes the whole script slower, so I would try to make it as efficient as possible.
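Much of that slowdown comes from redrawing the bar on every one of roughly a million lines. One way to reduce it, sketched here under the assumption that a bar updated in about 1,000 steps is precise enough, is to redraw only when another fraction of the work is done. `make_throttled_reporter` is a hypothetical helper, not part of the answer's code:

```python
def make_throttled_reporter(total, draw, steps=1000):
    """Return a callback report(done) that calls draw(done, total) at most
    about `steps` times, plus once more at the very end so the bar shows 100%.
    """
    stride = max(1, total // steps)

    def report(done):
        if done % stride == 0 or done == total:
            draw(done, total)

    return report

# Demo: 10,000 "lines" drawn in 100 steps instead of 10,000
draws = []
report = make_throttled_reporter(10_000, lambda done, total: draws.append(done), steps=100)
for i in range(1, 10_001):
    report(i)
print(len(draws), draws[-1])  # 100 10000
```

In the loop of this answer, one would create `report = make_throttled_reporter(l, lambda i, t: printProgressBar(i, t, prefix = 'Avancement:', suffix = 'Completé', length = 50))` once before the loop and call `report(i)` in place of `printProgressBar(...)`.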

import time

# Print iterations progress
def printProgressBar (iteration, total, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█', printEnd = "\r"):
    """
    Call in a loop to create terminal progress bar
    @params:
        iteration   - Required  : current iteration (Int)
        total       - Required  : total iterations (Int)
        prefix      - Optional  : prefix string (Str)
        suffix      - Optional  : suffix string (Str)
        decimals    - Optional  : positive number of decimals in percent complete (Int)
        length      - Optional  : character length of bar (Int)
        fill        - Optional  : bar fill character (Str)
        printEnd    - Optional  : end character (e.g. "\r", "\r\n") (Str)
    """
    percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
    filledLength = int(length * iteration // total)
    bar = fill * filledLength + '-' * (length - filledLength)
    print('\r%s |%s| %s%% %s' % (prefix, bar, percent, suffix), end = printEnd)
    # Print New Line on Complete
    if iteration == total: 
        print()

fname = "A+B_net.txt"
count = 0
with open(fname, 'r') as f:
    for line in f:
        count += 1
print("Total number of lines is:", count)

l = count

bad_words = ['geovisite.com','cable.dyn.cableonline.com.mx']
printProgressBar(0, l, prefix = 'Avancement:', suffix = 'Completé', length = 50)

with open(fname) as oldfile, open('A+B_sub.txt', 'w') as newfile:
    # enumerate(..., 1) numbers the lines from 1, so i is already the number of lines processed
    for i, line in enumerate(oldfile, 1):
        if not any(bad_word in line for bad_word in bad_words):
            newfile.write(line)
        # Update the bar for every line, kept or removed, so it reaches 100% at the end
        printProgressBar(i, l, prefix = 'Avancement:', suffix = 'Completé', length = 50)
print("suppression des sous domaines terminé")

【Discussion】:
