【Posted】: 2023-03-13 19:10:02
【Question】:
I am trying to scrape posts from a Reddit subreddit, where many of the titles look like this:
s1 = "I [22M] and my partner (21F) are foo and bar"
s2 = "My (22m) and my partner (21m) are bar and foo"
I want to write a function that parses each string and returns the (age, gender) pairs. For example:
def parse(s1):
    ...
    return [(22, "male"), (21, "female")]
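Basically, each age/gender tag is a two-digit number followed by one of f, F, m, or M, wrapped in parentheses or square brackets as in the examples above.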
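One possible approach, given the tag format described here, is a regular expression with two capture groups. The following is only a minimal sketch, not a definitive solution: the names `TAG_RE` and `GENDERS` are hypothetical, and it assumes every tag is enclosed in `()` or `[]` as in the two sample strings. It does not check that the opening and closing brackets match, so a string like `[22M)` would also be accepted.

```python
import re

# Assumption: a tag is exactly two digits followed by one of f/F/m/M,
# wrapped in parentheses or square brackets, e.g. "[22M]" or "(21f)".
TAG_RE = re.compile(r"[\[(](\d{2})([mMfF])[\])]")

# Hypothetical lookup table from the one-letter code to the full word.
GENDERS = {"m": "male", "f": "female"}


def parse(s):
    """Return the (age, gender) pairs found in s, in order of appearance."""
    # findall returns one (age, letter) tuple per match because the
    # pattern has two capture groups.
    return [(int(age), GENDERS[letter.lower()])
            for age, letter in TAG_RE.findall(s)]


s1 = "I [22M] and my partner (21F) are foo and bar"
s2 = "My (22m) and my partner (21m) are bar and foo"

print(parse(s1))  # [(22, 'male'), (21, 'female')]
print(parse(s2))  # [(22, 'male'), (21, 'male')]
```

If real titles also contain unbracketed tags or a reversed order (for example `M22`), the pattern would need to be extended; that is outside what the two samples show.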
【Discussion】:
Tags: regex python-3.x nlp reddit