[Posted]: 2015-05-19 05:08:30
[Question]:
Suppose I have a list like this:
lis_ = [['"Fun is the enjoyment of pleasure"\t\t',
'@username det fanns ett utvik med "sabrina without a stitch". acke nothing. @username\t\t','Report by @username - #JeSuisCharlie Movement Leveraged to Distribute DarkComet Malware https://t.co/k9sOEpKjbg\t\t'],
['I just became the mayor of Porta Romana on @username! http://4sq.com/9QROVv\t\t', "RT benturner83 Someone's chucking stuff out of the window of an office on tottenham court road #tcr street evacuated http://t.co/heyOhpb1\t\t", "@username Don't use my family surname for your app ???? http://t.co/1yYLXIO9\t\t"]
]
I want to remove the links from every string in each sublist, so I tried this regex:
new_list = re.sub(r'^https?:\/\/.*[\r\n]*', '', tweets, flags=re.MULTILINE)
I used the MULTILINE flag because when I print list_, it looks like this:
[]
[]
[]
...
[]
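The problem with this approach is that I get TypeError: expected string or buffer, so apparently I can't pass the sublists to the regex like this. How can I apply the regex above to the sublists in list_ to get something like the following (i.e. the same sublists without any kind of link)?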
[['"Fun is the enjoyment of pleasure"\t\t',
'@username det fanns ett utvik med "sabrina without a stitch". acke nothing. @username\t\t','Report by @username - #JeSuisCharlie Movement Leveraged to Distribute DarkComet Malware'],
['I just became the mayor of Porta Romana on @username! \t\t', "RT benturner83 Someone's chucking stuff out of the window of an office on tottenham court road #tcr street evacuated \t\t", "@username Don't use my family surname for your app ????\t\t"]
]
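The TypeError comes from `re.sub` accepting only a single string, not a list. A minimal sketch, assuming a "link" means `http://` or `https://` followed by non-whitespace characters: apply the substitution to each string inside each sublist with a nested list comprehension. The `^` anchor and `re.MULTILINE` are dropped here because the links sit in the middle or at the end of each tweet, not at the start of a line. The name `URL_RE` is illustrative.

```python
import re

# Assumption: a link is http:// or https:// followed by a run of
# non-whitespace characters. No ^ anchor, because the links appear
# mid-string or at the end of each tweet, not at the start of a line.
URL_RE = re.compile(r'https?://\S+')

lis_ = [['"Fun is the enjoyment of pleasure"\t\t',
         '@username det fanns ett utvik med "sabrina without a stitch". acke nothing. @username\t\t',
         'Report by @username - #JeSuisCharlie Movement Leveraged to Distribute DarkComet Malware https://t.co/k9sOEpKjbg\t\t'],
        ['I just became the mayor of Porta Romana on @username! http://4sq.com/9QROVv\t\t',
         "RT benturner83 Someone's chucking stuff out of the window of an office on tottenham court road #tcr street evacuated http://t.co/heyOhpb1\t\t",
         "@username Don't use my family surname for your app ???? http://t.co/1yYLXIO9\t\t"]]

# re.sub only takes one string, so call it on every string inside every
# sublist; the result keeps the same list-of-lists shape as the input.
new_list = [[URL_RE.sub('', tweet) for tweet in sublist] for sublist in lis_]

for sublist in new_list:
    print(sublist)
```

This removes only the URL itself, so the space before it and the trailing `\t\t` are kept (e.g. `'... on @username! \t\t'`).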
Can this be done with map, or is there another efficient way?
Thanks in advance, everyone.
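For the `map` variant, one possible sketch wraps the per-sublist work in a small helper and maps it over the outer list. `strip_links` is a hypothetical helper name, and this version assumes you also want to drop the whitespace in front of each link (`\s*` before the URL), which matches expected lines such as `"... your app ????\t\t"`. Because the expected output in the question is not fully consistent on that point, pick whichever pattern fits.

```python
import re

# Assumption: also remove any whitespace directly before the link.
# \S+ stops at the tab, so the trailing \t\t is preserved.
URL_RE = re.compile(r'\s*https?://\S+')

def strip_links(sublist):
    """Hypothetical helper: return a copy of one sublist with links removed."""
    return [URL_RE.sub('', s) for s in sublist]

lis_ = [['no link here\t\t', 'Malware https://t.co/k9sOEpKjbg\t\t'],
        ['app ???? http://t.co/1yYLXIO9\t\t']]

# map() applies strip_links to each sublist. list() is needed on Python 3,
# where map returns a lazy iterator; on Python 2.7 map already returns a list.
new_list = list(map(strip_links, lis_))
print(new_list)
```

Compiling the pattern once and reusing it avoids recompiling per string; speed-wise, `map` and a list comprehension are comparable here, so the choice is mostly about readability.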
[Discussion]:
-
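You should fix your list_ example, because right now it isn't valid Python, so it's hard to know exactly what it is. My guess is that it's a list of lists of strings, but we shouldn't have to guess. -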
What is your expected output?
-
@AvinashRaj I've edited it. Thanks for the help, everyone!
Tags: python regex list python-2.7 parsing