【Question title】: Read emails from txt file using python
【Posted】: 2020-07-24 02:33:08
【Question】:

I have a TXT file containing emails, for example:

From r Wed Oct 30 21:41:56 2002
Return ...
...
From r Thu Oct 31 08:11:39 2002
Return ...
...

I want to extract each email into an array, like this:

["From r Wed Oct 30 21:41:56 2002 Return ...", "From r Thu Oct 31 08:11:39 2002 Return ...", ..., "From r ..."]

I am using Python:

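 emails = []
 with open(self.file, encoding="utf8", errors='ignore') as data_file:
     lines = ''
     first_line = True

     for line in data_file:
         # A new "From r" line ends the previous email (except on the very first line).
         if line.startswith("From r") and not first_line:
             emails.append(lines)
             lines = ''
         else:
             first_line = False
         lines = lines + line

     # Store the last email, which no later "From r" line terminates.
     if lines:
         emails.append(lines)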

【Comments】:

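  • On what basis do you want to split the emails? Does each email have only two lines, From and Return?
  • Each email has n lines, but they all start with 'From r'
  • @vitorcarvalho What is the specific reason for extracting them into an array?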

Tags: python arrays file email


【Solution 1】:

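Assuming the first line of every email starts with From r, we can iterate over the file line by line. Each time we see From r, we append a new entry to the list of emails; every line after that is concatenated onto the "current" email, which is tracked by the index i.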

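emails = []
with open('emails.txt') as f:
    i = -1  # index of the email currently being built
    for line in f:
        if line.startswith('From r'):
            # Start a new email with its "From r" line.
            emails.append(line)
            i += 1
        elif i >= 0:
            # Append to the current email; lines before the first "From r" are skipped.
            emails[i] += line

print(emails)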

Output:

['From r Wed Oct 30 21:41:56 2002\nReturn ...\n...\n', 'From r Thu Oct 31 08:11:39 2002\nReturn ...\n...\n']
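The same splitting can also be sketched with a regular expression, assuming the whole file fits in memory and, as above, that every email begins with a line starting with From r. The function name split_emails and its prefix parameter are made up for this illustration. It relies on re.split accepting zero-width matches, which requires Python 3.7 or later.

```python
import re


def split_emails(text, prefix="From r"):
    """Split text into chunks, each starting at a line that begins with prefix."""
    # (?m) makes ^ match at the start of every line; the lookahead keeps
    # the prefix in the chunk that follows instead of consuming it.
    pattern = r"(?m)^(?=" + re.escape(prefix) + ")"
    chunks = re.split(pattern, text)
    # Drop empty or whitespace-only chunks, such as the one before the first prefix.
    return [chunk for chunk in chunks if chunk.strip()]


# Sample text copied from the question.
sample = (
    "From r Wed Oct 30 21:41:56 2002\n"
    "Return ...\n"
    "...\n"
    "From r Thu Oct 31 08:11:39 2002\n"
    "Return ...\n"
    "...\n"
)

emails = split_emails(sample)

# Collapse all whitespace to get the single-line form asked for in the question.
one_line = [" ".join(email.split()) for email in emails]
print(one_line)
```

To use it on a file, read the contents first, for example with open('emails.txt').read(), and pass the resulting string to split_emails.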

    【Solution 2】:

    Try this:

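    emails_list = []
    email = ""
    with open("full/path/to/file", "r") as f:
        for line in f:
            line = line.strip()
            if line.startswith("From r") and email:
                # A new email begins: store the finished one and start over.
                emails_list.append(email)
                email = line
            else:
                # Join the lines of one email with single spaces.
                email = email + " " + line if email else line

    # Store the last email, if the file contained any.
    if email:
        emails_list.append(email)

    print(emails_list)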
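Lines of the form "From <sender> <date>" look like the separator lines of the mbox format. If the file really is an mbox file (an assumption, since the question does not say so), the standard library's mailbox module may be an alternative to matching the prefix by hand. The sketch below is only illustrative: the helper read_mbox, the file name, and the sample headers and bodies are all made up. Note that mailbox.mbox splits on any line starting with "From ", not only "From r".

```python
import mailbox
import os
import tempfile


def read_mbox(path):
    """Return each message of an mbox file as one string, separator line included."""
    box = mailbox.mbox(path, create=False)
    try:
        # get_from() returns the separator line without its leading "From ".
        return ["From " + msg.get_from() + "\n" + msg.as_string() for msg in box]
    finally:
        box.close()


# Hypothetical sample data; the Subject headers and bodies are invented.
sample = (
    "From r Wed Oct 30 21:41:56 2002\n"
    "Subject: first\n"
    "\n"
    "body one\n"
    "\n"
    "From r Thu Oct 31 08:11:39 2002\n"
    "Subject: second\n"
    "\n"
    "body two\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "emails.txt")
    with open(path, "w", encoding="utf8", newline="\n") as f:
        f.write(sample)
    emails = read_mbox(path)

print(len(emails))
```

Unlike the string-based answers, each item yielded by mailbox.mbox is a parsed message object, so headers such as msg["Subject"] are available before converting back to text.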
    
