【Question title】: Python combine lines without blank new lines
【Posted】: 2020-04-21 07:23:58
【Question】:

I need your help with the following problem. I have some large text files, for example:

This is the Name of the Person

This is his surname

He likes to sing 
every time.

I only want to merge "He likes to sing" and "every time." into one line, because afterwards I run further regular expressions on each string.

So the output should be:

This is the Name of the Person

This is his surname

He likes to sing every time.

So I tried:

for file in file_list:
    with open(file, 'r', encoding='UTF-8', errors='ignore') as f_in:
        for line in f_in:
              if not line.startswith('\n'):
                line.replace('\n', '')
                print(line)

Thanks for your help.

【Question comments】:

  • print() adds a newline at the end of its output by default. Try print(line, end="")
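A second issue in the attempt above is that `str.replace` returns a new string rather than modifying `line` in place, so its result has to be assigned. A minimal sketch of both points, using a hypothetical sample string instead of a real file:

```python
line = "He likes to sing \n"

# str.replace returns a new string; calling it without assigning
# the result leaves the original string unchanged.
line.replace("\n", "")
unchanged = line

# Assign the result to keep the version without the newline.
stripped = line.replace("\n", "")

print(repr(unchanged))
print(repr(stripped))

# print() appends its own newline unless end="" is passed,
# so these two calls end up on the same output line.
print(stripped, end="")
print("every time.")
```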

Tags: python newline


【Solution 1】:

You can try this:

for file in file_list:
    with open(file, 'r', encoding='UTF-8', errors='ignore') as f_in:
        lines = [i.replace('\n', ' ') for i in f_in.read().split('\n\n')]

    # here you do something with your `lines`
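To see what this approach produces without opening a file, here is a minimal sketch that applies the same split-and-replace idea to a string. The helper name `merge_paragraph_lines` and the sample text are hypothetical. The `strip()` call is an addition not present in the answer; it is assumed here so that the file's final newline does not turn into a trailing space.

```python
def merge_paragraph_lines(text):
    """Split text on blank lines and join each paragraph's lines with a space."""
    # strip() is an assumption added here: it removes the trailing newline
    # so the last paragraph does not end with an extra space.
    return [block.replace("\n", " ") for block in text.strip().split("\n\n")]


sample = (
    "This is the Name of the Person\n"
    "\n"
    "This is his surname\n"
    "\n"
    "He likes to sing\n"
    "every time.\n"
)

lines = merge_paragraph_lines(sample)
for merged_line in lines:
    print(merged_line)
```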

【Comments】:

    【Solution 2】:

    I think it would be better to do it like this:

    for file_name in file_list:
        with open(file_name, "r", encoding="UTF-8", errors="ignore") as file:
            text = file.read()
            text_blocks = text.split("\n\n")
            for text_block in text_blocks:
                formatted_text_block = text_block.replace("\n", "")
                # then you can do whatever you want with this new block of text
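Note that replacing "\n" with an empty string only yields "sing every" when the first line already ends with a space, as it does in the question's sample. If that is not guaranteed, one option is to normalize whitespace instead. A minimal sketch, where `normalize_block` is a hypothetical helper name:

```python
def normalize_block(text_block):
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and ignores leading/trailing whitespace,
    # so joining with a single space collapses everything to one line.
    return " ".join(text_block.split())


with_trailing_space = "He likes to sing \nevery time."
without_trailing_space = "He likes to sing\nevery time."

a = normalize_block(with_trailing_space)
b = normalize_block(without_trailing_space)

# For comparison: plain removal of "\n" glues the words together
# when there is no trailing space on the first line.
glued = without_trailing_space.replace("\n", "")

print(a)
print(b)
print(glued)
```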
    

    【Comments】:

      【Solution 3】:

      You can split the text into sections on \n\n, then merge each section by splitting it on \n and joining the pieces:

      with open("data.txt") as f:
          for line in f.read().split("\n\n"):
              print("".join(line.split("\n")) + "\n")
      

      Output:

      This is the Name of the Person
      
      This is his surname
      
      He likes to sing every time.
      

      If you want to write the output to a new file, you can do this:

      with open("data.txt") as f, open("output.txt", mode="w") as o:
          for line in f.read().split("\n\n"):
              o.write("".join(line.split("\n")) + "\n\n")
      

      Here we need to append an extra \n ourselves, because write(), unlike print(), does not add a trailing newline.

      output.txt

      This is the Name of the Person
      
      This is his surname
      
      He likes to sing every time.
      

      Another option is to collect all the merged sections into a single string and then write that whole string to the file:

      with open("data.txt") as f, open("output.txt", mode="w") as o:
          # Join the merged sections into one string, separated by blank lines
          text = "\n\n".join("".join(line.split("\n")) for line in f.read().split("\n\n"))
          o.write(text)
      

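      The problem with the solutions above is that they use read() to load the entire file into memory before processing, which can be slow and memory-intensive for large files.

      Instead, we can create a generator function that yields the sections of the file one at a time: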

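      def collect_file_sections(f):
          """Yield each blank-line-separated section as a list of stripped lines."""
          section = []
          for line in f:
              line = line.strip()
              if line:
                  section.append(line)
                  continue
              # Blank line: emit the finished section, skipping empty ones
              if section:
                  yield section
              section = []
          # Emit the last section if the file does not end with a blank line
          if section:
              yield section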
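To check what the generator yields without touching the disk, here is a minimal sketch that uses `io.StringIO` as a stand-in for an open file. The function is repeated so the snippet runs on its own. The `if section` guards skip empty sections when blank lines repeat; treat them as an assumption about the desired behavior. The sample text is hypothetical.

```python
import io


def collect_file_sections(f):
    section = []
    for line in f:
        line = line.strip()
        if line:
            section.append(line)
            continue
        # Guard (an assumption): do not yield empty sections
        # when several blank lines follow each other.
        if section:
            yield section
        section = []
    if section:
        yield section


# io.StringIO behaves like a text file opened for reading.
sample = io.StringIO(
    "This is the Name of the Person\n"
    "\n"
    "This is his surname\n"
    "\n"
    "\n"
    "He likes to sing \n"
    "every time.\n"
)

sections = list(collect_file_sections(sample))
merged = [" ".join(section) for section in sections]
for merged_line in merged:
    print(merged_line)
```

Because each line is stripped before it is collected, joining with a single space gives the same result whether or not the original line had a trailing space.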
      

      Then write the sections like this:

      with open("data.txt") as f, open("output.txt", mode="w") as o:
          # Write one section at a time so the whole file is never held in memory
          for section in collect_file_sections(f):
              o.write(" ".join(section) + "\n\n")
      

      【Comments】:
