Web scraping in Python: the HTML page does not come back in full

【Question title】: Web scraping in Python: the HTML page does not come back in full
【Posted】: 2021-03-04 10:40:45
【Question】:

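I am trying to scrape two tables from a page.

But when I use soup.find('table'), it does not find them. Also, when I print the soup object, the table portion of the HTML is not printed. Is there a way around this?

My code so far: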

from bs4 import BeautifulSoup
import pandas as pd
import requests

url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'

r = requests.get(url)

soup = BeautifulSoup(r.text, 'lxml')

table = soup.find('div').find_all('table')

print(table)

Output:

[]
[Finished in 3.4s]

When I run this instead:

from bs4 import BeautifulSoup
import pandas as pd
import requests

url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'

r = requests.get(url)

soup = BeautifulSoup(r.text, 'lxml')

table = soup.find('tbody').find_all('tr')

print(table)

I get the error below, even though in the page's HTML the table data sits inside tbody > tr, just like the tables I have scraped before:

Traceback (most recent call last):
  File "C:\Users\jvbf9\Documents\data-science\scraping_thiago\main.py", line 11, in <module>
    table = soup.find('tbody').find_all('tr')
AttributeError: 'NoneType' object has no attribute 'find_all'
[Finished in 7.2s with exit code 1]

【Comments】:

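If you look at the raw page source, these tables are generated by JavaScript, so the HTML that requests receives does not contain them. You will have to use something like Selenium instead.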
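A rough sketch of what that comment suggests, not a tested solution for this particular site. `count_tables` is a hypothetical helper that uses only the standard library to check whether any `<table>` tag is present in an HTML string; running it on `r.text` should confirm whether the tables are missing from the static response. `fetch_rendered_html` is also a hypothetical helper: it assumes Selenium 4 with Chrome installed, and it assumes the table ends up in the top-level document (if the page puts it inside an iframe, `driver.switch_to.frame(...)` would be needed first).

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.count += 1


def count_tables(html):
    """Return how many <table> tags the given HTML markup contains."""
    counter = TableCounter()
    counter.feed(html)
    counter.close()
    return counter.count


def fetch_rendered_html(url, wait_seconds=15):
    """Load url in headless Chrome and return the DOM once a <table> exists."""
    # Imported here so count_tables() still works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page's scripts have inserted at least one table.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

With these in place, `count_tables(requests.get(url).text)` can be compared against `count_tables(fetch_rendered_html(url))`. If the second number is larger, the rendered HTML can be handed to `BeautifulSoup` or `pandas.read_html` as usual.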

Tags: python html web-scraping python-requests

【Solution 1】:

When you create the parser, you are not passing it the content of the response you retrieved:

from bs4 import BeautifulSoup
import pandas as pd
import requests

url = 'http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/mercado-a-vista/opcoes/posicoes-em-aberto/posicoes-em-aberto-8AE490CA64BA055F0164CCCAE1F1460A.htm?empresaEmissora=AMBEV%20S.A.&data=19/11/2020&dataVencimento=21/12/20&f=0'

r = requests.get(url)

soup = BeautifulSoup(r.content, 'lxml')

table = soup.find('div').find_all('table')

print(table)

That should be the problem.
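However the HTML is fetched, `soup.find()` returns `None` when nothing matches, and that is what raises the `AttributeError` in the question's traceback. Below is a minimal sketch of guarding against it. `STATIC_HTML` is a made-up stand-in for a response whose table is filled in later by a script, and `find_rows` is a hypothetical helper, not part of BeautifulSoup. The built-in `html.parser` is used here instead of `lxml` only to keep the example free of extra dependencies.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for a static response: the container is empty because
# a script would build the table in the browser.
STATIC_HTML = """
<html><body>
  <div id="placeholder"></div>
  <script>/* builds the table in the browser */</script>
</body></html>
"""


def find_rows(html):
    """Return the <tr> elements of the first <tbody>, or [] if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")
    if tbody is None:  # find() returns None when no element matches
        return []
    return tbody.find_all("tr")


rows = find_rows(STATIC_HTML)
if not rows:
    print("No <tbody> in the fetched HTML; the table may be rendered by JavaScript.")
```

An empty result is a hint to inspect the raw response rather than the parsing code.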
