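【Posted】: 2021-03-04 10:40:45
【Question】:
I am trying to scrape two tables from a page,
but soup.find('table') does not find them. Also, when I print the soup object, the table portion of the HTML is missing from the output. Is there a way around this?
My code so far: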
from bs4 import BeautifulSoup
import pandas as pd
import requests
url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
table = soup.find('div').find_all('table')
print(table)
Output:
[]
[Finished in 3.4s]
When I run this instead:
from bs4 import BeautifulSoup
import pandas as pd
import requests
url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
table = soup.find('tbody').find_all('tr')
print(table)
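I get the error below, even though in the page's HTML (as shown in the browser) the table data sits inside tbody > tr, just like other tables I have scraped before: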
Traceback (most recent call last):
File "C:\Users\jvbf9\Documents\data-science\scraping_thiago\main.py", line 11, in <module>
table = soup.find('tbody').find_all('tr')
AttributeError: 'NoneType' object has no attribute 'find_all'
[Finished in 7.2s with exit code 1]
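【Comments】:
- If you look at the raw page source, you will see that these tables are generated by JavaScript, so the HTML that requests downloads does not contain them; you will have to use something like Selenium instead.
Tags: python html web-scraping python-requests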
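To illustrate the comment above, here is a minimal sketch and not a definitive solution. The names count_tag and fetch_rendered_html are hypothetical helpers introduced only for this example. The first counts how often a tag occurs in an HTML string, so you can confirm that r.text from requests really contains no table. The second assumes Selenium and a matching Chrome driver are installed, and returns the page source once a table element has appeared. If the page places its table inside an iframe, you would also need to switch into that frame first, which this sketch does not do.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each start tag occurs in an HTML string."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tag(html, tag):
    """Return the number of <tag> start tags found in html (hypothetical helper)."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.counts.get(tag, 0)


def fetch_rendered_html(url, wait_seconds=15):
    """Load url in headless Chrome and return the HTML once a <table> exists.

    Assumption: Selenium and a compatible Chrome driver are available.
    Does not handle tables that live inside an iframe.
    """
    # Imported lazily so count_tag can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until JavaScript has inserted at least one table, or time out.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

As a usage idea: if count_tag(r.text, 'table') returns 0 for the response you got from requests, the tables are not in the static HTML, and you could pass the result of fetch_rendered_html(url) to BeautifulSoup in place of r.text.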