【Posted】: 2021-09-10 05:52:58
【Question】:
I'm having trouble with a re pattern that matches dates containing a year.
Code
import re
text ="May 2020 Musical Portraits September 24 - 25, 2021 Time: 8:00 pm Toledo Museum of Art Peristyle Romeo & JulietSpecial EventWhenFriday, Mar 23 / 20187:30pmBuy TicketsSunday, Mar 25 / 20182:30pmBuy TicketsWhereSamford University Wright CenterMap & DirectionsArtist"
format_list = ["(?:(?:(?:j|J)an)|(?:(?:f|F)eb)|(?:(?:m|M)ar)|(?:(?:a|A)pr)|(?:(?:m|M)ay)|(?:(?:j|J)un)|(?:(?:j|J)ul)|(?:(?:a|A)ug)|(?:(?:s|S)ep)|(?:(?:o|O)ct)|(?:(?:n|N)ov)|(?:(?:d|D)ec))\w*(?:\s)?(?:\n)?[0-9]{1,2}(?:\s)?(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}"]
all_dates = []
for pattern in format_list:
    all_dates = re.findall(pattern, text)
    if not all_dates:
        continue
    for index, txt in enumerate(all_dates):
        # Replace non-ASCII runs, newlines, and tabs with a single space.
        # Assign straight into the list so the input `text` is not overwritten.
        all_dates[index] = re.sub('([^\x00-\x7F]+)|(\n)|(\t)', ' ', txt)
print(all_dates)
Output
['September 24 - 25, 2021', 'Mar 23 / 20187', 'Mar 25 / 20182']
Desired output
['September 24 - 25, 2021', 'Mar 23 / 2018', 'Mar 25 / 2018']
Problem
I get "…20187" and "…20182" instead of "…2018": the match swallows the first digit of the time that immediately follows the year.
【Discussion】:
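A likely cause: the pattern ends with two mandatory `[0-9]{2,4}` groups. In "Mar 23 / 20187:30pm" the first group cannot keep "2018", because only "7" would be left for the second, so the engine backtracks to "201" + "87" and returns "20187". Below is a minimal sketch of one possible fix, assuming the year is always exactly four digits and that the second number before it is an optional day range such as "24 - 25". The names `MONTH`, `DATE_PATTERN`, and `find_dates` are made up for this illustration and do not come from the question.

```python
import re

# Assumption: English month names, optionally abbreviated, any letter case.
MONTH = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"

# Assumption: the year is exactly four digits, so digits that follow it
# (such as the "7" of "7:30pm") are never pulled into the match.
DATE_PATTERN = re.compile(
    r"\b" + MONTH
    + r"\s*\d{1,2}"            # day of month
    + r"(?:\s*-\s*\d{1,2})?"   # optional day range, e.g. "24 - 25"
    + r"\s*[,./-]?\s*"         # optional separator before the year
    + r"\d{4}",                # four-digit year
    re.IGNORECASE,
)


def find_dates(text):
    """Return every date-like substring found in text."""
    return DATE_PATTERN.findall(text)


sample = (
    "Musical Portraits September 24 - 25, 2021 Time: 8:00 pm "
    "WhenFriday, Mar 23 / 20187:30pmBuy TicketsSunday, "
    "Mar 25 / 20182:30pmBuy Tickets"
)
print(find_dates(sample))
# ['September 24 - 25, 2021', 'Mar 23 / 2018', 'Mar 25 / 2018']
```

Fixing the year at four digits is a trade-off: two-digit years such as "Mar 23 / 18" would no longer match, so this only fits if the input always uses full years.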
Tags: python python-3.x regex python-re