【Question title】: Same name under different class, get URL, BeautifulSoup Python
【Posted】: 2020-10-08 21:57:31
【Question】:

I have this HTML and need to get the URLs from it:

<div class="posts-container col-md-6">
   <ul class="emb-embassies-list">
     <a class="entry-title" href="commonlink.com">
   <ul class="emb-embassies-list">
     <a class="entry-title" href="rarelink.com">

<div class="col-md-6">
   <ul class="emb-embassies-list">
     <a class="entry-title" href="anothercommonlink.com">
   <ul class="emb-embassies-list">
     <a class="entry-title" href="legendarylink.com">

When I run:

for i in soup.findAll('div', "posts-container col-md-6"):
    for anchor in soup.findAll('a', class_="entry-title", href=True):
        print(anchor['href'])

I get:

>commonlink.com
>rarelink.com
>anothercommonlink.com
>legendarylink.com

I only want the links under "posts-container col-md-6":

>commonlink.com
>rarelink.com

【Comments】:

  • Iterate over i in the inner loop rather than over the whole page: for anchor in i.findAll('a', class_="entry-title", href=True)
  • @Sushanth For some reason, that only returns "Process finished with exit code 0" :/
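
A minimal sketch of what the first comment suggests: search inside each matched div instead of calling findAll on the whole soup again. The sample markup (with closing tags added) and the helper name links_in_container are assumptions made for illustration, not part of the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question, with closing tags added.
html = """
<div class="posts-container col-md-6">
  <ul class="emb-embassies-list"><a class="entry-title" href="commonlink.com">one</a></ul>
  <ul class="emb-embassies-list"><a class="entry-title" href="rarelink.com">two</a></ul>
</div>
<div class="col-md-6">
  <ul class="emb-embassies-list"><a class="entry-title" href="anothercommonlink.com">three</a></ul>
  <ul class="emb-embassies-list"><a class="entry-title" href="legendarylink.com">four</a></ul>
</div>
"""


def links_in_container(markup):
    """Return hrefs of entry-title anchors inside posts-container divs only."""
    soup = BeautifulSoup(markup, "html.parser")
    hrefs = []
    for div in soup.find_all("div", class_="posts-container"):
        # Search within this div, not the whole document.
        for anchor in div.find_all("a", class_="entry-title", href=True):
            hrefs.append(anchor["href"])
    return hrefs


print(links_in_container(html))
```

If this prints an empty list on a real page, the outer find_all matched no div at all, so the inner loop never ran. Checking len(soup.find_all("div", class_="posts-container")) against the HTML that was actually downloaded is a reasonable first step.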

Tags: python html url beautifulsoup


【Solution 1】:

To get all links under the <div> with class="posts-container col-md-6", use the CSS selector .posts-container.col-md-6 a:

from bs4 import BeautifulSoup


txt = '''
<div class="posts-container col-md-6">
   <ul class="emb-embassies-list">
     <a class="entry-title" href="commonlink.com">some link1</a>
   <ul class="emb-embassies-list">
     <a class="entry-title" href="rarelink.com">some link2</a>
</div>
<div class="col-md-6">
   <ul class="emb-embassies-list">
     <a class="entry-title" href="anothercommonlink.com">some link3</a>
   <ul class="emb-embassies-list">
     <a class="entry-title" href="legendarylink.com">some link4</a>
</div>'''



soup = BeautifulSoup(txt, 'html.parser')

for a in soup.select('.posts-container.col-md-6 a'):
    print(a['href'])

Prints:

commonlink.com
rarelink.com
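
One reason the CSS selector can be the more robust choice: .posts-container.col-md-6 matches a div that carries both classes in any order, whereas passing the full string "posts-container col-md-6" as class_ only matches that exact attribute value. The sketch below illustrates this with hypothetical markup in which the class order is reversed; the narrower selector a.entry-title[href] is also an addition for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same classes as in the question, but in reversed order.
html = """
<div class="col-md-6 posts-container">
  <a class="entry-title" href="commonlink.com">one</a>
</div>
<div class="col-md-6">
  <a class="entry-title" href="anothercommonlink.com">two</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Compound class selector: both classes must be present, order does not matter.
css_hrefs = [
    a["href"]
    for a in soup.select("div.posts-container.col-md-6 a.entry-title[href]")
]

# Exact-string match on the class attribute: sensitive to class order.
exact_matches = soup.find_all("div", class_="posts-container col-md-6")

print(css_hrefs)
print(len(exact_matches))
```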

【Comments】:

    【Solution 2】:

    You can try:

    from bs4 import BeautifulSoup
    
    html_doc = '''
    <div class="posts-container col-md-6">
       <ul class="emb-embassies-list">
         <a class="entry-title" href="commonlink.com">some link1</a>
       <ul class="emb-embassies-list">
         <a class="entry-title" href="rarelink.com">some link2</a>
    </div>
    <div class="col-md-6">
       <ul class="emb-embassies-list">
         <a class="entry-title" href="anothercommonlink.com">some link3</a>
       <ul class="emb-embassies-list">
         <a class="entry-title" href="legendarylink.com">some link4</a>
    </div>'''
    
    soup = BeautifulSoup(html_doc, 'lxml')
    # find() returns the first matching <div>; find_all('a') then searches only inside it
    anchors = soup.find('div', class_="posts-container col-md-6").find_all('a')
    
    for a in anchors:
        print(a['href'])
    

    The output will be:

    commonlink.com
    rarelink.com
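
One caveat with chaining find(...).find_all(...): find returns None when no div matches, and the chained call then raises AttributeError. A hedged variant with a guard might look like the following; the function name container_links and its default argument are hypothetical, and html.parser is used here only to avoid depending on lxml.

```python
from bs4 import BeautifulSoup


def container_links(markup, class_name="posts-container"):
    """Return hrefs inside the first div carrying class_name, or [] if none."""
    soup = BeautifulSoup(markup, "html.parser")
    container = soup.find("div", class_=class_name)
    if container is None:
        # No matching div: return an empty list instead of raising AttributeError.
        return []
    return [a["href"] for a in container.find_all("a", href=True)]


# Hypothetical sample markup based on the question.
sample = """
<div class="posts-container col-md-6">
  <a class="entry-title" href="commonlink.com">one</a>
  <a class="entry-title" href="rarelink.com">two</a>
</div>
<div class="col-md-6">
  <a class="entry-title" href="anothercommonlink.com">three</a>
</div>
"""

print(container_links(sample))
print(container_links("<p>no matching div here</p>"))
```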
    

    【Comments】:
