Python scraping with BeautifulSoup cannot properly scrape some lines of data (answer)

【Question title】: Python scraping with BeautifulSoup cannot properly scrape some lines of data
【Posted】: 2021-07-21 08:49:22
【Question】:

I am exploring web scraping in Python. I have the following snippet, but some of the data rows it extracts are incorrect. What could be wrong with this snippet?

```python
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = 'https://bscscan.com/txsinternal?ps=100&zero=false&valid=all'
# Send a browser-like User-Agent; the site may reject urllib's default one
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req, timeout=10).read()
soup = BeautifulSoup(webpage, 'html.parser')
rows = soup.findAll('table')[0].findAll('tr')

# Skip the header row; assumes every row has the type at cell 3 and the value at cell 7
for row in rows[1:]:
    ttype = (row.find_all('td')[3].text[0:])
    amt = (row.find_all('td')[7].text[0:])
    transamt = str(amt)
    print()
    print ("this is bnbval: ", transamt)
    print ("transactiontype: ", ttype)
```

Sample output:

```text
trans amt:   Binance: WBNB Token #- wrong data being extracted
transtype:  0x2de500a9a2d01c1d0a0b84341340f92ac0e2e33b9079ef04d2a5be88a4a633d4 #- wrong data being extracted

trans amt:  1 BNB
transtype:  call

trans amt:  1 BNB
transtype:  call

this is bnbval:   Binance: WBNB Token #- wrong data being extracted
transactiontype: 0x1cc224ba17182f8a4a1309cb2aa8fe4d19de51c650c6718e4febe07a51387dce #- wrong data being extracted

trans amt:  1 BNB
transtype:  call
```

【Discussion】:

Tags: python parsing beautifulsoup python-3.8

【Answer 1】:

There is nothing wrong with your code itself; the problem is in the data on the page.

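Some rows are the 7-column rows you expect, and some are 9-column rows. The 9-column rows are the ones that give you wrong data, because in those rows the cells at the positions your code reads (3 and 7) hold different fields.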

You can see the problem for yourself by opening the page and inspecting the table elements in your browser.
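One way to confirm this from Python instead of the browser is to count the cells in each row. The sketch below is a minimal, hypothetical helper: it assumes each `<tr>` has already been turned into a list of cell strings (for example with `[td.get_text(strip=True) for td in row.find_all('td')]`), and the function names and sample rows are made-up placeholders, not real bscscan data.

```python
from __future__ import annotations  # keeps list[...] annotations working on Python 3.8

from collections import Counter


def row_length_histogram(rows: list[list[str]]) -> Counter:
    """Map each cell count to how many rows have that many cells."""
    return Counter(len(cells) for cells in rows)


def rows_with_length(rows: list[list[str]], n: int) -> list[list[str]]:
    """Return only the rows that have exactly n cells, for closer inspection."""
    return [cells for cells in rows if len(cells) == n]


if __name__ == "__main__":
    # Placeholder rows (hypothetical). In the real script they could come from:
    #   [[td.get_text(strip=True) for td in tr.find_all('td')] for tr in rows[1:]]
    sample = [
        ["cell"] * 7,
        ["cell"] * 7,
        ["cell"] * 9,
    ]
    histogram = row_length_histogram(sample)
    print(histogram)
    # More than one distinct length means the table layout is not uniform
    if len(histogram) > 1:
        rarest = min(histogram, key=histogram.get)
        print(rows_with_length(sample, rarest))
```

If the histogram has more than one key, a fixed index such as `[7]` cannot mean the same column in every row.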

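I would suggest using the last element, [-1], instead of [7]. However, you will still need some kind of if check on the column at index 3 (the transaction type).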
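A minimal sketch of that suggestion, assuming the value is always the last cell of a row and that, in the rows that go wrong, index 3 holds a transaction hash (as in the sample output) instead of a type such as call. The helper names `is_tx_hash` and `extract_type_and_value` are hypothetical, the hash test is a heuristic, and skipping those rows is only one possible choice; you could instead log them or look for the type elsewhere in the row.

```python
from __future__ import annotations  # keeps list[...] annotations working on Python 3.8

import string


def is_tx_hash(text: str) -> bool:
    """Heuristic: '0x' followed by 64 hex digits, like the hashes in the sample output."""
    digits = text[2:]
    return (
        text.startswith("0x")
        and len(digits) == 64
        and all(c in string.hexdigits for c in digits)
    )


def extract_type_and_value(rows: list[list[str]]) -> list[tuple[str, str]]:
    """Return (type, value) pairs, reading the value from the last cell ([-1])."""
    results = []
    for cells in rows:
        if len(cells) < 4:
            continue  # header or malformed row: there is no cell at index 3
        ttype = cells[3]
        # The if check on index 3: in the rows that came out wrong, this cell
        # held a transaction hash instead of a type such as "call", so skip them
        if is_tx_hash(ttype):
            continue
        results.append((ttype, cells[-1]))
    return results


if __name__ == "__main__":
    # With BeautifulSoup the input could be built like this (strip=True removes
    # the surrounding whitespace that .text keeps):
    #   cells = [[td.get_text(strip=True) for td in tr.find_all('td')]
    #            for tr in soup.find_all('table')[0].find_all('tr')[1:]]
    # Placeholder rows (hypothetical), not real bscscan data:
    cells = [
        ["0xparent", "block", "age", "call", "from", "", "to", "1 BNB"],
        ["a", "b", "c", "0x" + "ab" * 32, "Binance: WBNB Token", "", "to", "x", "1 BNB"],
    ]
    for ttype, amt in extract_type_and_value(cells):
        print()
        print("this is bnbval: ", amt)
        print("transactiontype: ", ttype)
```

Reading the value from the end of the row keeps it stable when extra cells appear earlier in the row, which is the point of [-1]; the type still sits at a fixed position from the start, which is why it needs its own check.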

【Comments】:

Thanks. Based on your idea, I was able to find a workaround, and it now works the way I want.