【Question title】: Python: extract the id value from an href source
【Posted】: 2013-07-14 21:54:40
【Question】:

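I have already used BeautifulSoup to extract the href URIs from the page source, but now I want to extract the uid value from multiple instances like the ones below.

For example: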

<a href="test.html?uid=5444974">
<a href="test.html?uid=5444972">
<a href="test.html?uid=54444972">

Any help would be appreciated!

【Comments】:

Tags: python regex beautifulsoup


【Answer 1】:
>>> import re
>>> from bs4 import BeautifulSoup
>>> html
'<a href="test.html?uid=5444974">\n<a href="test.html?uid=5444972">\n<a href="test.html?uid=54444972">'
>>> soup = BeautifulSoup(html, 'html.parser')
>>> anchors = soup.find_all('a')
>>> r = re.compile(r'uid=(\d+)')  # raw string so \d is not treated as a string escape
>>> uids = []
>>> for a in anchors:
...     uids.append(r.search(a['href']).group(1))
... 
>>> uids
['5444974', '5444972', '54444972']
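Since the hrefs have already been extracted, the regex step above can also be sketched as a standalone helper that works on plain strings. The function name `extract_uids`, the `[?&]` prefix in the pattern (so that a parameter such as `guid=` is not matched by accident), and the `about.html` link are assumptions added for illustration; they are not part of the original answer.

```python
import re

# Match uid only as a query parameter: preceded by ? or &, digits only.
UID_RE = re.compile(r'[?&]uid=(\d+)')


def extract_uids(hrefs):
    """Return the uid of every href that has one, skipping the rest."""
    uids = []
    for href in hrefs:
        match = UID_RE.search(href)
        if match:  # search() returns None when the href has no uid
            uids.append(match.group(1))
    return uids


hrefs = [
    "test.html?uid=5444974",
    "test.html?uid=5444972",
    "test.html?uid=54444972",
    "about.html",  # hypothetical link without a uid
]
print(extract_uids(hrefs))  # ['5444974', '5444972', '54444972']
```

Checking the match before calling `group(1)` avoids the `AttributeError` that the loop in the answer would raise on a link without a uid.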

【Comments】:

    【Answer 2】:

    Use urlparse and parse_qs:

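    html = """<a href="test.html?uid=5444974">
    <a href="test.html?uid=5444972">
    <a href="test.html?uid=54444972">
    """

    # Python 3: urlparse and parse_qs live in urllib.parse
    # (in Python 2 the module was called urlparse).
    from bs4 import BeautifulSoup as BS
    from urllib.parse import urlparse, parse_qs

    soup = BS(html, 'html.parser')
    # href=True skips <a> tags that have no href attribute.
    for a in soup('a', href=True):
        # parse_qs maps each parameter to a list of values; take the first.
        print(parse_qs(urlparse(a['href']).query)['uid'][0])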
    

    Output:

    5444974
    5444972
    54444972
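The query-string approach can be wrapped in a small helper that returns `None` instead of raising `KeyError` when a link has no uid. This is only a sketch: it assumes Python 3 (where `urlparse` and `parse_qs` live in `urllib.parse`), and the name `uid_from_href` and the extra sample links are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def uid_from_href(href):
    """Return the first uid query parameter of href, or None if absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get("uid")  # a list of strings, or None if missing
    return values[0] if values else None


print(uid_from_href("test.html?uid=5444974"))        # 5444974
print(uid_from_href("test.html?page=2&uid=12#top"))  # 12
print(uid_from_href("about.html"))                   # None
```

Compared with a regex, this handles parameter order, fragments, and URL-encoded values without any pattern tuning.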
    

    【Comments】:
