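【Posted】: 2021-12-12 16:42:11
【Question】:
I have the following string in a file. I am trying to use a regular expression to extract the paragraph that follows each "---" line (the "---" does not render as text in this editor). The image below should give you an idea of what the text looks like.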
( 2021-07-10 01:24:55 PM GMT )STEMAILTE
---
Badminton is a racquet sport played using racquets to hit a shuttlecock across
a net. Although it may be played with larger teams, the most common forms of
the game are "singles" (with one player per side) and "doubles" (with two
players per side).
( 2021-07-10 01:27:55 PM GMT )ARAMASU
---
Both the academies run a residential training program for upcoming and
talented footballers and Boxers. The Academies are functioning in Sarusajai
Sports complex.
The image is attached below.
So far I have tried:
re.findall(r'([(){}[\]][^\S]\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [a-zA-Z]{2} [a-zA-Z]{3}[^\S][(){}[\]][\w.]+)(\n-+)((\n.+))+',text)
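This only gives me the date-time and the text after it.
I am trying to extract the paragraph after "---" from each group shown in the image above (three lines of text in that image). In the real file there is junk text above the text provided here.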
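【Comments】:
If the pattern is always the same, would this work?
data = text.split('---')
I see that you mean capturing only the text. However, that is not what I get: re.findall returns an empty result and extracts nothing.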
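One possible approach, offered as a minimal sketch and not a definitive answer: match the header line and the dashes line without capturing them, then capture every following line up to the next header. It assumes each header looks like the two in the question, that the dashes sit on their own line, and that a paragraph contains no blank lines. The names `HEADER`, `PATTERN`, and `extract_paragraphs` are hypothetical, and the leading junk line in the sample is made up for illustration. A general point about the original attempt: a repeated capturing group such as `((\n.+))+` keeps only its last repetition, so `re.findall` cannot return the whole paragraph from it.

```python
import re

# Sample modeled on the question; the first "junk" line is an assumption.
text = """some junk text above
( 2021-07-10 01:24:55 PM GMT )STEMAILTE
---
Badminton is a racquet sport played using racquets to hit a shuttlecock across
a net. Although it may be played with larger teams, the most common forms of
the game are "singles" (with one player per side) and "doubles" (with two
players per side).
( 2021-07-10 01:27:55 PM GMT )ARAMASU
---
Both the academies run a residential training program for upcoming and
talented footballers and Boxers. The Academies are functioning in Sarusajai
Sports complex.
"""

# Header line, assumed to look like "( YYYY-MM-DD HH:MM:SS AM/PM TZ )NAME".
HEADER = r"\(\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [A-Za-z]{2} [A-Za-z]{3}\s*\)[\w.]+"

PATTERN = re.compile(
    rf"^{HEADER}[ \t]*\n"            # header line (not captured)
    rf"-+[ \t]*\n"                   # the line of dashes (not captured)
    rf"((?:(?!{HEADER}).+\n?)+)",    # following non-empty lines until the next header
    re.MULTILINE,
)


def extract_paragraphs(source):
    """Return the paragraph that follows each header + dashes pair."""
    # With exactly one capturing group, findall returns a list of strings.
    return [block.strip() for block in PATTERN.findall(source)]


for paragraph in extract_paragraphs(text):
    print(paragraph)
    print()
```

Compared with `text.split('---')`, this keeps the header of the next group out of each result and ignores junk text before the first header, at the cost of a stricter assumption about the header format.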
Tags: python python-3.x regex