【Posted】: 2013-07-17 13:44:30
【Question】:
Using a regular expression and BeautifulSoup, I obtained the following list of links. I need to extract the UID values, for example 5968723334.
[u'/home.html', u'browse_settings.html', u'browse.html?', u'test.html?uid=5415292833', u'test.html?uid=5968723334', u'test.html?uid=5968723334', u'test.html?uid=5453943714', u'test.html?uid=5453943714', u'test.html?uid=6740871094', u'test.html?uid=6740871094', u'test.html?uid=5991868792', u'test.html?uid=5991868792', u'test.html?uid=25072413', u'test.html?uid=25072413', u'test.html?uid=6739965683', u'test.html?uid=6739965683', u'test.html?uid=7272910004', u'test.html?uid=7272910004', u'test.html?uid=13179298', u'test.html?uid=13179298', u'test.html?uid=5392816266', u'test.html?uid=5392816266', u'test.html?uid=5992588819', u'test.html?uid=5992588819', u'test.html?uid=6727114420', u'test.html?uid=6727114420', u'test.html?uid=7263648884', u'test.html?uid=7263648884', u'test.html?uid=5447240210', u'test.html?uid=5447240210', u'test.html?uid=5460515002', u'test.html?uid=5460515002', u'test.html?uid=5400731231', u'test.html?uid=5400731231', u'browse.html?params=_F_18_24_GB_0___grid_1', u'/home.html?t=1374068507', u'/account_info.html', u'http://www.example.com/browse.html?params=_F_18_24_GB_0___grid_0', u'http://www.example.com/contact.html', u'/logout.html', u'#top', u'/terms_of_service.html', u'http://safety.example.com']
I have managed to extract a single "uid" like this, but I want to extract all of the UIDs:
>>> m = re.search("uid=(\d*)", soup.contents[0])
>>> print m
<_sre.SRE_Match object at 0x211b210>
>>> print m.group(1)
5442562712
Please help!
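For reference, here is a minimal sketch of one way to collect every UID, assuming the hrefs are already available as a list of strings like the one above. The names `links`, `UID_RE`, and `extract_uids` are hypothetical and chosen only for illustration, and the sample list is a shortened copy of the data in the question. It is written for Python 3, whereas the session above is Python 2. It also uses `\d+` rather than `\d*`, so an empty `uid=` is skipped instead of yielding an empty string.

```python
import re

# Shortened copy of the hrefs shown in the question (assumed to be a plain list of strings).
links = [
    '/home.html',
    'browse.html?',
    'test.html?uid=5415292833',
    'test.html?uid=5968723334',
    'test.html?uid=5968723334',
    'test.html?uid=25072413',
    'browse.html?params=_F_18_24_GB_0___grid_1',
    '#top',
]

# \d+ requires at least one digit after "uid=".
UID_RE = re.compile(r'uid=(\d+)')


def extract_uids(hrefs):
    """Return the unique UIDs found in hrefs, in first-seen order."""
    seen = set()
    uids = []
    for href in hrefs:
        match = UID_RE.search(href)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            uids.append(match.group(1))
    return uids


print(extract_uids(links))
# → ['5415292833', '5968723334', '25072413']
```

If the duplicates should be kept, dropping the `seen` check is enough. `re.findall` with the same pattern applied to one large string would be another option when the links have not been split into a list yet.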
【Comments】:
-
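Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist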
-
Updated to include the attempted solution...
-
Have you tried anything from stackoverflow.com/questions/17681269/…?
Tags: python regex beautifulsoup