【Question title】: Unable to return email message body
【Posted】: 2019-07-03 12:06:27
【Question description】:

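I created a class that reads emails and converts them into a DataFrame. This works for all the HEADER data, but I cannot parse the message content, even though I have tried several approaches. I am following the tutorial here: http://beneathdata.com/how-to/email-behavior-analysis/

I tried modifying the fetch_and_parse function in the code to pick up the message content, but nothing seems to be returned. I also tried modifying the FETCH query, but I got lost.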

from imaplib import IMAP4_SSL
import email as em
from email.utils import parsedate, parsedate_tz
from email.parser import HeaderParser


class OutlookAccount(object):
    def __init__(self, username=None, password=None, folder=None):
        self.username = username
        self.password = password
        self.folder = folder

    def login(self):
        self.conn = IMAP4_SSL('outlook.office365.com')
        response = self.conn.login(self.username, self.password)
        return response

    def search(self, query, folder=None, readonly=False):
        ff = self.folder if self.folder else folder
        self.conn.select(ff, readonly)
        resp, data = self.conn.search(None, query)
        return data

    def fetch(self, uids, query):
        uid_arr = b','.join(uids[0].split())
        resp, data = self.conn.fetch(uid_arr, query)
        return data

    def fetch_and_parse(self, uids, query):
        data = self.fetch(uids, query)
        parser = HeaderParser()
        emails = []

        for email in data:
            if len(email) < 2:
                continue
            msg = em.message_from_bytes(email[1]).as_string()

            emails.append(parser.parsestr(msg))

        return emails

    def load_parse_query(self, search_query, fetch_query, folder=None, readonly=False):
        '''Perform search and fetch on an imap Gmail account. After fetching relevant info
        from fetch query, parse into a dict-like email object, return list of emails.'''
        uids = self.search(search_query, folder, readonly)
        return self.fetch_and_parse(uids, fetch_query)




import numpy as np
import pandas as pd
import getpass
#import matplotlib.pyplot as plt
#import matplotlib.dates as dates
#import matplotlib.gridspec as gridspec
from datetime import timedelta, datetime, date

imap_password = getpass.getpass()

outlook = OutlookAccount(username='some@email.com', password=imap_password)
outlook.login()

daysback = 6000 # ~10yrs...make this whatever ya like
notsince = 0 # since now.
since = (date.today() - timedelta(daysback)).strftime("%d-%b-%Y")
before = (date.today() - timedelta(notsince)).strftime("%d-%b-%Y")

SEARCH = '(SENTSINCE {si} SENTBEFORE {bf})'.format(si=since, bf=before)
ALL_HEADERS = '(BODY.PEEK[HEADER])'

# Search and fetch emails!
received = outlook.load_parse_query(search_query=SEARCH, 
                                  fetch_query=ALL_HEADERS, 
                                  folder='"INBOX"')


#create function to convert to dataframe

def scrub_email(headers):   
    # IMAP sometimes returns fields with varying capitalization. Lowercase each header name.
    return dict([(title.lower(), value) for title, value in headers]) 

df = pd.DataFrame([scrub_email(email._headers) for email in received])

I would like the DataFrame to contain all the header data plus a field holding the email content/body.

【Discussion of the question】:

    Tags: python-3.x email imaplib


    【Solution 1】:

    The body needs to be extracted as part of the fetch_and_parse function, for example:

    # mime_msg is the parsed message, e.g. em.message_from_bytes(email[1]);
    # the fetch query must return the body too, e.g. '(BODY.PEEK[])'
    body = ''
    # walk() already visits every nested part, so no manual recursion is needed
    for part in mime_msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True)
        if payload is not None:
            # decode the bytes with the part's declared charset
            charset = part.get_content_charset() or 'utf-8'
            body += payload.decode(charset, errors='replace') + '\n'
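
    To tie this back to the question: the '(BODY.PEEK[HEADER])' query only returns the header section, so there is no body for fetch_and_parse to find. A minimal sketch of one way to get both is shown here, assuming the full message is fetched with '(BODY.PEEK[])'. The names FULL_MESSAGE, extract_body and parse_fetch_data are hypothetical and not part of the original code, and the sample response is hand-written so the snippet runs without a mailbox.

```python
import email

# Assumption: fetch the whole message instead of only the header section.
# BODY.PEEK[] does not set the \Seen flag, just like BODY.PEEK[HEADER].
FULL_MESSAGE = '(BODY.PEEK[])'


def extract_body(mime_msg):
    """Join the decoded text/* parts of a parsed message (hypothetical helper)."""
    chunks = []
    for part in mime_msg.walk():
        if part.get_content_maintype() != 'text':
            continue  # skip multipart containers and binary attachments
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or 'utf-8'
        try:
            chunks.append(payload.decode(charset, errors='replace'))
        except LookupError:
            # the declared charset is unknown to Python; fall back to UTF-8
            chunks.append(payload.decode('utf-8', errors='replace'))
    return '\n'.join(chunks)


def parse_fetch_data(data):
    """Turn the data list returned by IMAP4.fetch into one dict per message."""
    records = []
    for item in data:
        if not isinstance(item, tuple):
            continue  # fetch responses also contain bare b')' entries
        mime_msg = email.message_from_bytes(item[1])
        # lowercase header names, as scrub_email does in the question
        record = {name.lower(): value for name, value in mime_msg.items()}
        record['body'] = extract_body(mime_msg)
        records.append(record)
    return records


# Offline demonstration with a response shaped like the one imaplib returns.
raw = (b'From: alice@example.com\r\n'
       b'Subject: Hello\r\n'
       b'Content-Type: text/plain; charset="utf-8"\r\n'
       b'\r\n'
       b'Hi there\r\n')
sample = [(b'1 (BODY[] {%d}' % len(raw), raw), b')']
records = parse_fetch_data(sample)
print(records[0]['subject'], repr(records[0]['body']))
```

    With the class from the question, the list returned by self.fetch(uids, FULL_MESSAGE) could be passed to parse_fetch_data, and the resulting list of dicts handed to pd.DataFrame to get the header columns plus a body column. Only text parts are kept here, which is a design choice; attachments are skipped.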
    

    【Discussion】:
