Using BeautifulSoup and Python to find the length of the longest sequence of links in HTML?

【Question Title】: Using beautifulSoup and python to find length of the maximum sequence of links in html?
【Posted】: 2018-09-12 08:44:19
【Question Description】:

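My task is to find the body of the article, <div id="bodyContent">, and inside it compute the length of the longest sequence of links that has no other opening or closing tags between the links. For example: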

<p>
    <span><a></a></span>
    <a></a>
    <a></a>
</p>

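- The longest run here is 2 links: the closing </span> breaks the sequence, so only the two bare <a> elements after it count together.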

<p>
    <a><span></span></a>
    <a></a>
    <a></a>
</p>

- Here the run is 3 links, because the <span> is inside the first link rather than between links.

To solve this, I am using BeautifulSoup and Python.
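The rule in these two examples can also be stated purely in terms of tag events: after a </a>, the run continues only if the very next start or end tag is another <a>, while tags nested inside a link are ignored. The sketch below expresses that with the standard library's html.parser instead of BeautifulSoup (a swapped-in technique, not the asker's approach); LinkRunCounter and longest_link_run are hypothetical names. It assumes reasonably well-formed markup and does not isolate <div id="bodyContent"> by itself, so on a real page you would feed it only that element's markup, for example str(body).

```python
from html.parser import HTMLParser


class LinkRunCounter(HTMLParser):
    """Track the longest run of <a> elements with no other tag between them.

    Tags nested inside a link are ignored; any start, end or self-closing
    tag that appears between two links ends the current run.
    """

    def __init__(self):
        super().__init__()
        self.in_link = False       # True between <a> and its </a>
        self.can_continue = False  # True right after </a>, until some other tag appears
        self.current = 0           # length of the run being built
        self.longest = 0

    def handle_starttag(self, tag, attrs):
        if self.in_link:
            return  # e.g. the <span> in <a><span></span></a> does not break anything
        if tag == "a":
            self.current = self.current + 1 if self.can_continue else 1
            self.longest = max(self.longest, self.current)
            self.in_link = True
        else:
            self.can_continue = False

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link = False
            self.can_continue = True
        elif not self.in_link:
            self.can_continue = False  # e.g. the </span> in <span><a></a></span>

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> between links also end the run.
        if not self.in_link:
            self.can_continue = False


def longest_link_run(html):
    parser = LinkRunCounter()
    parser.feed(html)
    parser.close()
    return parser.longest


EXAMPLE_1 = """<p>
    <span><a></a></span>
    <a></a>
    <a></a>
</p>"""

EXAMPLE_2 = """<p>
    <a><span></span></a>
    <a></a>
    <a></a>
</p>"""

print(longest_link_run(EXAMPLE_1))  # 2
print(longest_link_run(EXAMPLE_2))  # 3
```

Text and whitespace between links are passed to handle_data, which this class leaves as a no-op, so they never break a run; only tags do.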

Code:

import requests
from bs4 import BeautifulSoup

html = requests.get('https://en.wikipedia.org/wiki/Stone_Age')
soup = BeautifulSoup(html.text, "lxml")
body = soup.find(id="bodyContent")

# get the first link inside bodyContent
first_link = body.a

# find all later <a> siblings of the first link (links at the same nesting level)
first_link.find_next_siblings('a')
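One reason find_next_siblings('a') does not answer the question directly: filtering by name skips over any non-<a> siblings, so it also returns links that sit after a breaking tag. A small offline illustration, using a made-up snippet (an assumption for demonstration) parsed with the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Made-up snippet: links 1 and 2 form a run, the <br/> breaks it, links 3 and 4 follow.
html = "<p><a>1</a><a>2</a><br/><a>3</a><a>4</a></p>"
first_link = BeautifulSoup(html, "html.parser").a

# Filtering by name jumps over the <br/>: this returns links 2, 3 and 4.
same_level = first_link.find_next_siblings("a")
print([a.get_text() for a in same_level])  # ['2', '3', '4']

# Walking all sibling tags and stopping at the first non-<a> respects the break.
run = [first_link]
for sibling in first_link.find_next_siblings():
    if sibling.name != "a":
        break
    run.append(sibling)
print([a.get_text() for a in run])  # ['1', '2']
```

Stopping at the first non-<a> sibling is the idea both answers below build on.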

How do I move on to the following links in the sequence?

Best regards!

【Question Discussion】:

Tags: python-3.x beautifulsoup python-requests

【Solution 1】:

My solution:

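import requests
from bs4 import BeautifulSoup

html = requests.get('https://en.wikipedia.org/wiki/Stone_Age')
soup = BeautifulSoup(html.text, "lxml")
body = soup.find(id="bodyContent")

tag = body.find("a")  # first link inside bodyContent
linkslen = 0
while tag is not None:
    curlen = 1
    last = tag  # last link of the current run
    # Count the <a> tags that directly follow this link; any other tag ends the run.
    for sibling in tag.find_next_siblings():
        if sibling.name != 'a':
            break
        curlen += 1
        last = sibling
    if curlen > linkslen:
        linkslen = curlen
    # Skip past the run just counted. find_next() searches the rest of the whole
    # document, so stop as soon as the next link is outside bodyContent.
    tag = last.find_next("a")
    if tag is not None and not any(p is body for p in tag.parents):
        tag = None
print(linkslen)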
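To try the skip-ahead loop without downloading the page, it can be wrapped in a function that takes any container tag. This is a sketch: the name max_link_run and the sample markup (the question's two snippets inside a bodyContent div, plus a hypothetical footer with four links) are assumptions for testing, not part of the answer.

```python
from bs4 import BeautifulSoup


def max_link_run(container):
    """Solution 1's skip-ahead loop, restricted to descendants of `container`."""
    longest = 0
    link = container.find("a")
    while link is not None:
        run_length, last = 1, link
        # Count the <a> tags that directly follow `link`; any other tag ends the run.
        for sibling in link.find_next_siblings():
            if sibling.name != "a":
                break
            run_length += 1
            last = sibling
        longest = max(longest, run_length)
        # find_next() walks the rest of the whole document, so stop once we leave `container`.
        link = last.find_next("a")
        if link is not None and not any(p is container for p in link.parents):
            link = None
    return longest


PAGE = """
<div id="bodyContent">
<p>
    <span><a></a></span>
    <a></a>
    <a></a>
</p>
<p>
    <a><span></span></a>
    <a></a>
    <a></a>
</p>
</div>
<div id="footer"><a></a><a></a><a></a><a></a></div>
"""

soup = BeautifulSoup(PAGE, "html.parser")
print(max_link_run(soup.find(id="bodyContent")))  # 3: the footer links are outside
print(max_link_run(soup))                         # 4: the whole document
```

The second call shows why the containment check matters: without it, the loop would run past bodyContent and pick up the footer's longer run.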

【Discussion】:

【Solution 2】:

Another solution:

import requests
from bs4 import BeautifulSoup

html = requests.get('https://en.wikipedia.org/wiki/Stone_Age')
soup = BeautifulSoup(html.text, "lxml")
body = soup.find(id="bodyContent")
all_links = body.find_all('a')
sequence = 0
for link in all_links:
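    run_len = 1  # avoid shadowing the built-in len()
    # Count the <a> tags that directly follow this link; stop at the first other tag.
    for sibling in link.find_next_siblings():
        if sibling.name != 'a':
            break
        run_len += 1
    sequence = max(sequence, run_len)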
print(sequence)
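Solution 2 calls find_next_siblings() for every link, and each call collects all of that link's later sibling tags before the loop breaks, so long runs get rescanned many times. A possible refinement under the same rule, sketched here with hypothetical helper names: only start counting at links whose closest preceding sibling tag is not an <a>, and walk the lazy next_siblings generator so the scan stops at the first breaking tag.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def previous_tag_sibling(element):
    """Closest preceding sibling that is a Tag; text and comments are skipped."""
    for sibling in element.previous_siblings:
        if isinstance(sibling, Tag):
            return sibling
    return None


def max_link_run_linear(container):
    """Longest run of sibling <a> tags inside `container`, counting each run once."""
    longest = 0
    for link in container.find_all("a"):
        prev = previous_tag_sibling(link)
        if prev is not None and prev.name == "a":
            continue  # this link belongs to a run that was counted from its first link
        run_length = 1
        for sibling in link.next_siblings:  # lazy generator, stops at the first break
            if not isinstance(sibling, Tag):
                continue  # text or comments between links do not break a run
            if sibling.name != "a":
                break
            run_length += 1
        longest = max(longest, run_length)
    return longest


html = ("<div id='bodyContent'><p><span><a></a></span> <a></a> <a></a></p>"
        "<p><a><span></span></a> <a></a> <a></a> <br/> <a></a></p></div>")
body = BeautifulSoup(html, "html.parser").find(id="bodyContent")
print(max_link_run_linear(body))  # 3
```

Each link is visited once by find_all() and at most once more as part of the run it starts, so the total work stays roughly proportional to the number of tags.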

【Discussion】: