It actually has nothing to do with findAll() versus find_all(). findAll() was used in BeautifulSoup 3 and is kept in BeautifulSoup 4 for compatibility reasons. Quoting the bs4 source code:
def find_all(self, name=None, attrs={}, recursive=True, text=None,
             limit=None, **kwargs):
    generator = self.descendants
    if not recursive:
        generator = self.children
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
findAll = find_all  # BS3
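To see the equivalence for yourself, here is a minimal sketch that calls both spellings on the same document. The HTML snippet is made up for illustration, and the warning filter is only there on the assumption that newer bs4 releases may flag the BS3 name as deprecated:

```python
import warnings
from bs4 import BeautifulSoup

# Made-up sample markup, not taken from the question.
html = "<ul><li>one</li><li>two</li></ul>"
soup = BeautifulSoup(html, "html.parser")

with warnings.catch_warnings():
    # Assumption: some bs4 versions emit a DeprecationWarning for findAll.
    warnings.simplefilter("ignore")
    old_style = soup.findAll("li")

new_style = soup.find_all("li")

# Both calls should return the same elements in the same order.
print([li.text for li in old_style])
print([li.text for li in new_style])
```

Either name works, but find_all() is the one to prefer in new code.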
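There is also a better way to get the list of singles. It relies on the span element with id="Singles", which marks the start of the Singles section. From there, use find_next_sibling() to get the first table that follows the span tag's parent. Then get all th elements with scope="row":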
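from bs4 import BeautifulSoup
import requests

# Download the discography page
response = requests.get('http://en.wikipedia.org/wiki/Taylor_Swift_discography')
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(response.content, 'html.parser')

# The span with id="Singles" sits inside the section heading;
# the singles table is the first table after that heading
table = soup.find('span', id='Singles').parent.find_next_sibling('table')
# Each single's title is in a row-header cell
for single in table.find_all('th', scope='row'):
    print(single.text)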
Prints:
"Tim McGraw"
"Teardrops on My Guitar"
"Our Song"
"Picture to Burn"
"Should've Said No"
"Change"
"Love Story"
"White Horse"
"You Belong with Me"
"Fifteen"
"Fearless"
"Today Was a Fairytale"
"Mine"
"Back to December"
"Mean"
"The Story of Us"
"Sparks Fly"
"Ours"
"Safe & Sound"
(featuring The Civil Wars)
"Long Live"
(featuring Paula Fernandes)
"Eyes Open"
"We Are Never Ever Getting Back Together"
"Ronan"
"Begin Again"
"I Knew You Were Trouble"
"22"
"Highway Don't Care"
(with Tim McGraw)
"Red"
"Everything Has Changed"
(featuring Ed Sheeran)
"Sweeter Than Fiction"
"The Last Time"
(featuring Gary Lightbody)
"Shake It Off"
"Blank Space"