【Posted】:2020-10-08 21:57:31
【Question】:
I have this HTML, and I need to get the URLs from it:
<div class="posts-container col-md-6"
<ul class="emb-embassies-list"
<a class="entry-title" href="commonlink.com"
<ul class="emb-embassies-list"
<a class="entry-title" href="rarelink.com"
<div class="col-md-6"
<ul class="emb-embassies-list"
<a class="entry-title" href="anothercommonlink.com"
<ul class="emb-embassies-list"
<a class="entry-title" href="legendarylink.com"
When I run:
for i in soup.findAll('div', "posts-container col-md-6"):
    for anchor in soup.findAll('a', class_="entry-title", href=True):
        print(anchor['href'])
I get:
>commonlink.com
>rarelink.com
>anothercommonlink.com
>legendarylink.com
I only want the links inside "posts-container col-md-6":
>commonlink.com
>rarelink.com
【Comments】:
- In the inner loop, iterate over i rather than the whole page: for anchor in i.findAll('a', class_="entry-title", href=True).
- @Sushanth For some reason it only returns "Process finished with exit code 0" :/
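The suggestion in the comments can be sketched as follows. This is a minimal example, not a confirmed fix for the real page: the HTML string is a hypothetical, well-formed version of the markup in the question (closing tags, li elements and link text are assumed), and the helper names container_links and container_links_css are made up for illustration. It matches the container on the single class posts-container, which avoids depending on the exact text and order of the class attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed version of the markup from the question.
# Closing tags, <li> wrappers and link text are assumptions.
HTML = """
<div class="posts-container col-md-6">
  <ul class="emb-embassies-list">
    <li><a class="entry-title" href="commonlink.com">Common</a></li>
  </ul>
  <ul class="emb-embassies-list">
    <li><a class="entry-title" href="rarelink.com">Rare</a></li>
  </ul>
</div>
<div class="col-md-6">
  <ul class="emb-embassies-list">
    <li><a class="entry-title" href="anothercommonlink.com">Another</a></li>
  </ul>
  <ul class="emb-embassies-list">
    <li><a class="entry-title" href="legendarylink.com">Legendary</a></li>
  </ul>
</div>
"""


def container_links(html):
    """Collect hrefs only from anchors nested inside div.posts-container."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # Outer search: find the container(s) by one of their classes.
    for container in soup.find_all("div", class_="posts-container"):
        # Inner search runs on the container, not on the whole soup.
        for anchor in container.find_all("a", class_="entry-title", href=True):
            links.append(anchor["href"])
    return links


def container_links_css(html):
    """Same idea expressed as a single CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    selector = "div.posts-container a.entry-title[href]"
    return [a["href"] for a in soup.select(selector)]


if __name__ == "__main__":
    for href in container_links(HTML):
        print(href)
```

If the scoped loop prints nothing on the real page, one possible explanation (a guess, since the real page is not shown) is that the outer search matched no div at all, so the inner loop never ran. Printing len(soup.find_all("div", class_="posts-container")) is a quick way to check that.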
Tags: python html url beautifulsoup