【Question title】: How can I scrape tables that seem to be hidden by jQuery?
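【Posted】: 2021-07-18 16:34:08
【Question description】:
I am trying to scrape the words and their meanings from the website. I managed to scrape the first table, but bs4 cannot find the table for Word List 2 (or any other hidden table), even after I click it in the browser to reveal it. Is there something I should do differently for toggled or hidden elements like these?
This is what I used to access the first table: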
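import requests
from bs4 import BeautifulSoup

# The "#x2" fragment is never sent to the server, so this fetches the main page only.
root = "https://www.graduateshotline.com/gre-word-list.html#x2"
content = requests.get(root).text
soup = BeautifulSoup(content, 'html.parser')
table = soup.find_all('table', attrs={'class': 'tablex border1'})[0]
print(table)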
【Discussion】:
Tags:
python
web-scraping
beautifulsoup
【Solution 1】:
import pandas as pd

# Word list 2 is read from its own URL rather than from the main page.
df = pd.read_html('https://www.graduateshotline.com/gre/load.php?file=list2.html',
                  attrs={'class': 'tablex border1'})[0]
print(df)
Output:
     0                 1
0    multifarious      varied; motley; greatly diversified
1    substantiation    giving facts to support (statement)
2    feud              bitter quarrel over a long period of time
3    indefatigability  not easily exhaustible; tirelessness
4    convoluted        complicated;coiled; twisted
..   ...               ...
257  insensible        unconscious; unresponsive; unaffected
258  gourmand          a person who is devoted to eating and drinking...
259  plead             address a court of law as an advocate
260  morbid            diseased; unhealthy (e.g.. about ideas)
261  enmity            hatred being an enemy

[262 rows x 2 columns]
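The answer works because the table for Word List 2 is read from its own `load.php` URL rather than from the main page; presumably the page fetches that fragment only when the list is clicked, which would explain why the HTML returned by `requests.get` never contains it. Below is a minimal sketch of how the same idea might extend to other lists. It assumes the remaining lists follow the same `list<n>.html` naming, which the answer does not confirm; `list_url`, `fetch_word_list`, and the column names `word` and `meaning` are hypothetical names introduced for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer above. That other lists live at
# list<n>.html is an ASSUMPTION; only list2.html appears in the answer.
BASE_URL = "https://www.graduateshotline.com/gre/load.php"


def list_url(n: int) -> str:
    """Build the URL for word list n, assuming files are named list<n>.html."""
    return f"{BASE_URL}?{urlencode({'file': f'list{n}.html'})}"


def fetch_word_list(n: int):
    """Download word list n and return it as a DataFrame (needs network access)."""
    # Imported lazily so list_url can be used without pandas installed.
    import pandas as pd

    tables = pd.read_html(list_url(n), attrs={"class": "tablex border1"})
    df = tables[0]
    # Hypothetical column names; the answer's output labels them 0 and 1.
    df.columns = ["word", "meaning"]
    return df
```

Calling `fetch_word_list(2)` should correspond to the answer's request. For any other number, it is worth confirming the real URL first in the Network tab of the browser's developer tools while clicking that list.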