Python: BeautifulSoup adding words from an HTML paragraph tag to a list

【Question title】: Python: BeautifulSoup adding words from an HTML paragraph tag to a list
【Posted】: 2020-06-09 20:30:46
【Question】:

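I have been trying to learn web scraping with BeautifulSoup. I am making a Hangman game to learn Python, and I want a single-player mode that uses the 1000 most common words in English. At first I was just going to copy and paste each word and build up a list in a loop (which is why the commented-out while loop is there), but I decided to use BeautifulSoup instead.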

import requests
from bs4 import BeautifulSoup

#words = []
#while True:
    #word = input("Enter the word: ")
    #words.append(word)
    #print(words)

page = requests.get("https://www.ef.edu/english-resources/english-vocabulary/top-1000-words/")
soup = BeautifulSoup(page.content, "html.parser")
para = soup.find(class_="field-item even")

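I'm not sure where to go from here. I am trying to append all of the words on the website (they sit in the second paragraph tag of the "field-item even" class) individually to a list, and then save that list so the main Hangman game can use it. Because the words are in the second paragraph tag, I don't know how to reach them. I have watched a few YouTube videos, but they all deal with text that has an id or some other class to select on. Thanks.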

【Comments】:

Is the link split in two?

Tags: python beautifulsoup

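【Solution 1】:

The words are in the second <p>, so use para.find_all("p")[1] to get it.

Then you can read .text from this tag; it holds all the words in a single string.

The string contains tab characters, which you should remove with .replace("\t", ""). Then you can call .split("\n") to create a list of words.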

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.ef.edu/english-resources/english-vocabulary/top-1000-words/") 

soup = BeautifulSoup(page.content, "html.parser")
para = soup.find(class_="field-item even")

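# The word list lives in the second <p> inside the "field-item even" element
second_p = para.find_all('p')[1]
# Remove the tab characters, then split on newlines to get one word per entry
text = second_p.text.replace('\t', '')
words = text.split('\n')
print(words)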
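
One thing worth checking in the result: if the paragraph text starts or ends with a line break, or contains blank lines, split('\n') leaves empty strings in the list. Below is a minimal sketch of a cleanup step. It assumes the text looks roughly like the hypothetical sample_text (the real page content may differ), and clean_words is an illustrative helper name, not part of the answer above.

```python
# Hypothetical stand-in for second_p.text; the real page text may differ.
sample_text = "\n\ta\n\tability\n\table\n\n\tabout\n"


def clean_words(text):
    """Split text into lines, strip surrounding whitespace, drop empty entries."""
    return [line.strip() for line in text.splitlines() if line.strip()]


words = clean_words(sample_text)
print(words)
```

Stripping each line also removes the tabs, so the separate .replace('\t', '') call is not needed in this variant.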

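【Comments】:

【Solution 2】:

I wrote a quick solution for you. You only need to find the right div and then select the child tag that contains the words. Because of the formatting, the words need to be stripped of whitespace and put in a list. Furas' answer describes the process in more detail.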

import requests
from bs4 import BeautifulSoup

class Hangman:

    def run_game(self):
        word_bank = self.get_word_bank()
        while True:
            # Your game here
            break  # placeholder so the skeleton runs; replace with the game loop

    def get_word_bank(self):
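        # verify=False disables TLS certificate verification; drop it unless your setup requires it
        page = requests.get("https://www.ef.edu/english-resources/english-vocabulary/top-1000-words/", verify=False)
        soup = BeautifulSoup(page.content, "html.parser")
        words_tag = soup.find('div', {'class': "field-item even"})
        # findChildren()[1] picks the second descendant tag, which this answer treats as the word paragraph;
        # split() with no arguments splits on any whitespace, giving one list entry per word
        word_bank = words_tag.findChildren()[1].getText().split()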
        return word_bank

if __name__ == "__main__":
    hangman = Hangman()
    hangman.run_game()
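
For the "save the list for the main game" part of the question, one option is to write the list to a JSON file once and load it when the game starts, so the site is not fetched on every run. This is only a sketch under assumptions: the file name word_bank.json and the helpers save_word_bank, load_word_bank, and pick_word are hypothetical names, not something from the answers above. The demo writes to a temporary directory so it leaves nothing behind.

```python
import json
import random
import tempfile
from pathlib import Path


def save_word_bank(words, path):
    """Write the word list to a JSON file."""
    Path(path).write_text(json.dumps(words), encoding="utf-8")


def load_word_bank(path):
    """Read the word list back from the JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def pick_word(word_bank, min_length=1):
    """Choose a random word that has at least min_length letters."""
    candidates = [w for w in word_bank if len(w) >= min_length]
    return random.choice(candidates)


# Demo with a tiny hypothetical word list instead of the scraped one.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "word_bank.json"
    save_word_bank(["a", "ability", "able"], path)
    loaded = load_word_bank(path)

print(loaded)
print(pick_word(loaded, min_length=4))
```

In the Hangman class, get_word_bank could call load_word_bank when the file exists and fall back to scraping otherwise. The min_length filter is there because very short words such as "a" make for a poor Hangman round.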

【Comments】: