【Question title】: How to extract the date from a paragraph
【Posted】: 2022-01-25 01:05:04
【Question description】:

I have a long piece of text like the one below:

how are you

On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:


-------------------------------------------------------------
NOTE: Please do not remove email address from the "To" line of this email when replying. This address is used to capture the email and report it. Please do not remove or change the subject line of this email. The subject line of this email contains information to refer this correspondence back to the originating discrepancy.

I want the date and time that appear in the text (Tue, Dec 21, 2021 at 1:51 PM). How can I extract them?

【Comments】:

    Tags: python python-3.x python-datetime


    【Solution 1】:

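    The usual way to do this is with a regular expression, but for simplicity, if the text always has the same format, you can get the date string by looking for a line of the form On SOME DATE <Someone<someone's email address>> wrote:. Here is a sample implementation: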

    email = """how are you
    
    On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:
    
    
    -------------------------------------------------------------
    NOTE: Please do not remove email address from the \"To\" line of this email when replying. This address is used to capture the email and report it. Please do not remove or change the subject line of this email. The subject line of this email contains information to refer this correspondence back to the originating discrepancy."""
    
    for line in email.splitlines():
        if line.startswith("On ") and line.endswith(" wrote:"):
            date_string = line[3 : line.index(" <")]
            print(f"Found the date: {date_string!r}")
            break
    else:
        print("Could not find the date.")
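
The same line-based idea can also be written as a single regular expression anchored on the "On ... wrote:" header. This is only a sketch: the names REPLY_HEADER and extract_reply_date are hypothetical, and it assumes the header sits on one line and that the sender part starts with " <", as in the sample text.

```python
import re

# Hypothetical pattern: a line of the form "On <date> <sender> wrote:".
# The lazy group stops at the first " <", which is where the sender starts.
REPLY_HEADER = re.compile(r"^On (?P<date>.+?) <.*> wrote:\s*$", re.MULTILINE)


def extract_reply_date(text):
    """Return the date string from the first reply header, or None."""
    match = REPLY_HEADER.search(text)
    return match.group("date") if match else None


sample = (
    "how are you\n"
    "\n"
    "On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:\n"
    "\n"
    "NOTE: Please do not remove email address from the \"To\" line.\n"
)
print(extract_reply_date(sample))  # Tue, Dec 21, 2021 at 1:51 PM
```

Returning None instead of raising keeps the "could not find the date" case explicit, much like the for/else branch above.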
    

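    【Comments】:

    • Not the best answer to this problem. A regular expression would do it better.
    • @A H Bensiali The first sentence of my answer says exactly that. Either way, it answers the OP's question. If you think this answer is poor, consider improving it by adding a second (or better) approach, or write your own answer. That would be more useful to the community.
    【Solution 2】: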

    Quick and dirty:

    string = """how are you \r\n\r\nOn Tue, Dec 21, 2021 at 1:51 PM 
    <abchttp://localhost> wrote:\r\n\r\n\r\n--------------------------------- 
    ----------------------------\r\nNOTE: Please do not remove email address 
    from the"To" line of this email when replying.This address is used to 
    capture the email and report it.Please do not remove or change the 
    subject line of this email.The subject line of this email contains 
    information to refer this correspondence back to the originating 
    discrepancy.\r\n"""
    
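    # Blank lines ("\r\n\r\n") separate the blocks; block 1 holds the "On ... wrote:" header.
    blocks = string.split("\r\n\r\n")
    # Word 0 is the leading "On"; words 1-7 are the date and time.
    date = ' '.join(blocks[1].split(' ')[1:8])
    print(date)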
    

    【Comments】:

      【Solution 3】:
      1. Use a regular expression to extract the date and time.
      import re
      
      text = '''how are you
      
      On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:
      ...
      '''
      match = re.search('(Mon|Tue|Wed|Thu|Fri|Sat|Sun).*?(AM|PM)', text)
      match_date_and_time = match.group() # Tue, Dec 21, 2021 at 1:51 PM
      
      2. Parse the date and time with datetime.strptime.
      from datetime import datetime  # strptime is a method of the datetime class, not of the module
      
      datetime.strptime(match_date_and_time, '%a, %b %d, %Y at %I:%M %p')
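
The two steps above can be combined into one small function that also handles text without a date. This is a sketch under assumptions: parse_reply_datetime, DATE_PATTERN and DATE_FORMAT are hypothetical names, the pattern additionally requires a comma after the weekday to reduce false matches, and %a, %b and %p assume an English (or default C) locale.

```python
import re
from datetime import datetime

# Weekday abbreviation plus comma, then the shortest run of text up to AM/PM.
DATE_PATTERN = re.compile(r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),.*?\b(?:AM|PM)\b")
DATE_FORMAT = "%a, %b %d, %Y at %I:%M %p"


def parse_reply_datetime(text):
    """Return a datetime for the first matching date in text, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(), DATE_FORMAT)
    except ValueError:
        # The regex matched text that is not in the expected format.
        return None


text = "how are you\n\nOn Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:\n"
print(parse_reply_datetime(text))  # 2021-12-21 13:51:00
```

Checking for None before calling match.group() avoids an AttributeError when the text contains no date.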
      

      【Comments】:
