【Posted】: 2010-10-07 16:37:34
【Question】:
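I'm trying to extract email addresses from plain-text copies of emails. I've cobbled together some code that finds the addresses themselves, but I don't know how to discriminate between them; right now it just spits out every email address in the file. I'd like it to spit out only the addresses that are preceded by "From:" and a few wildcard characters and that end with ">" (because the emails are laid out as From [name] <[email]>).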
Here is the code so far:
import re #allows program to use regular expressions
foundemail = []
#this is an empty list
mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
#do not currently know exact meaning of this expression but assuming
#it means something like "[stuff]@[stuff][stuff1-4 letters]"
# "line" is a variable set to a single line read from the file
# ("text.txt"):
for line in open("text.txt"):
    # this extends the previously named list with whatever the
    # "mailsrch" pattern (compiled above) finds on this line
    foundemail.extend(mailsrch.findall(line))
print(foundemail)
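One possible way to narrow the matches, sketched here rather than offered as a definitive fix: anchor the pattern on "From:" and capture only what sits between the angle brackets. This assumes every sender line looks like `From: Some Name <user@example.com>`; the names `from_srch`, `find_senders`, and `sample` are hypothetical and not part of the code above. The address sub-pattern is reused unchanged from the question.

```python
import re

# Hypothetical pattern. Assumes sender lines look like:
#   From: Some Name <user@example.com>
# "From:" must start the line, ".*?" skips the display name, and the
# parenthesised group captures the address between "<" and ">".
from_srch = re.compile(
    r'^From:.*?<([\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4})>'
)

def find_senders(lines):
    """Collect the bracketed address from each line that starts with 'From:'."""
    found = []
    for line in lines:
        match = from_srch.match(line)
        if match:
            # group(1) is only the captured address, without "<" and ">"
            found.append(match.group(1))
    return found

# Made-up sample lines for illustration only
sample = [
    "From: Alice Example <alice@example.com>\n",
    "To: Bob Example <bob@example.org>\n",
    "Please write to carol@example.net instead.\n",
]
print(find_senders(sample))
```

To run it against the real file, the same function could be called as `find_senders(open("text.txt"))`. Addresses on "To:" lines or in the message body are skipped because the pattern only matches at the start of a line beginning with "From:".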
【Discussion】:
Tags: python string email parsing text