【Question title】: Simple way to remove duplicate whitespaces and remove all \n efficiently
【Posted】: 2019-08-31 23:52:43
【Question description】:

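I have a file named test.txt that contains HTML with a lot of repeated whitespace. I want to strip all the unnecessary whitespace to reduce the size of the file's contents. How can I remove the duplicate spaces and put the whole string on a single line?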

test.txt

 <center>
    <b class="test" >My       name

is


fred</      b> <center>

What I want to print

<center><b class="test">My name is fred</b><center>

What actually gets printed

<center><b class="test" >Mynameisfred</b> <center>

program.py

def is_white_space(before, curr, after):

    # remove duplicate spaces
    if (curr == " " and (before == " " or after == " ")):
        return True

    # Remove all \n
    elif (curr == "\n"):
        return True

    return False


f = open('test.txt', 'r')
contents = f.read()
f.close()

new = "";
i = 0
while (i < len(contents)):

    if (i != 0 and
        i != (len(contents) - 1) and
        not is_white_space(contents[i - 1], contents[i], contents[i + 1])):
        new += contents[i]

    i += 1

print(new)

【Question comments】:

    Tags: python python-3.x string for-loop


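    【Solution 1】:

    This splits the file contents on any run of whitespace and joins the pieces back together, leaving a single space only where one piece ends with a letter or digit and the next piece starts with one.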

    from string import ascii_letters, digits
    
    
    def main():
        with open('test.txt', 'r') as f:
            parts = f.read().split()
    
        keep_separated = set(ascii_letters) | set(digits)
    
        for i in range(len(parts) - 1):
            if parts[i][-1] in keep_separated and parts[i + 1][0] in keep_separated:
                parts[i] = parts[i] + " "
    
        print(''.join(parts))
    
    
    if __name__ == '__main__':
        main()
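
To try the same idea without a file on disk, the logic can be wrapped in a function that takes a string. This is only a sketch of the approach above: the name `collapse_whitespace` and the `sample` string (copied from the test.txt shown in the question) are illustrative and not part of the original answer.

```python
from string import ascii_letters, digits

# Characters that need a space between them to remain separate words.
KEEP_SEPARATED = set(ascii_letters) | set(digits)


def collapse_whitespace(text):
    """Hypothetical helper: drop all whitespace except single spaces between words."""
    parts = text.split()  # splits on any run of spaces, tabs, or newlines
    out = []
    for i, part in enumerate(parts):
        out.append(part)
        # Re-insert one space only when both neighbouring characters
        # are letters or digits.
        if (i + 1 < len(parts)
                and part[-1] in KEEP_SEPARATED
                and parts[i + 1][0] in KEEP_SEPARATED):
            out.append(" ")
    return "".join(out)


# Contents of test.txt from the question, written as a string literal.
sample = ' <center>\n    <b class="test" >My       name\n\nis\n\n\nfred</      b> <center>\n'
print(collapse_whitespace(sample))
# prints: <center><b class="test">My name is fred</b><center>
```

Note that this rule is deliberately aggressive: it also removes the space between a closing quote and the next attribute (for example `class="a" id="b"` becomes `class="a"id="b"`) and any whitespace inside `<pre>` blocks, so it is probably only suitable for simple markup like the example in the question.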
    

    【Comments】:
