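【Posted】: 2022-01-16 00:41:45
【Question】:
I want to write a web scraper with beautifulsoup4 and requests. It scrapes the data from specific columns of a specific table on a specific page. It scrapes once, waits a while, scrapes again, and then compares the two scrapes. If there is a difference it prints "something has changed"; if there is none, it prints "no changes".
Here is the full code: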
import requests
import time
from bs4 import BeautifulSoup

URL = "https://website.com"
website = requests.get(URL)
soup = BeautifulSoup(website.content, "html.parser")
data = []
table = soup.find("table", class_="table table-bordered table-sm table-responsive")
table_body = table.find('tbody')
rows = table_body.find_all('tr')

for row in rows:
    cols = row.find_all('td')[0]
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # Get rid of empty values
    cols2 = row.find_all('td')[1]
    cols2 = [ele.text.strip() for ele in cols2]
    data.append([ele for ele in cols2 if ele])  # Get rid of empty values
    cols3 = row.find_all('td')[2]
    cols3 = [ele.text.strip() for ele in cols3]
    data.append([ele for ele in cols3 if ele])  # Get rid of empty values
    cols4 = row.find_all('td')[3]
    cols4 = [ele.text.strip() for ele in cols4]
    data.append([ele for ele in cols4 if ele])
    cols5 = row.find_all('td')[5]
    cols5 = [ele.text.strip() for ele in cols5]
    data.append([ele for ele in cols5 if ele])
    print(cols, cols2, cols3, cols4, cols5)

time.sleep(600)

for row in rows:
    cols11 = row.find_all('td')[0]
    cols11 = [ele.text.strip() for ele in cols11]
    data.append([ele for ele in cols11 if ele])  # Get rid of empty values
    cols22 = row.find_all('td')[1]
    cols22 = [ele.text.strip() for ele in cols22]
    data.append([ele for ele in cols22 if ele])  # Get rid of empty values
    cols33 = row.find_all('td')[2]
    cols33 = [ele.text.strip() for ele in cols33]
    data.append([ele for ele in cols33 if ele])  # Get rid of empty values
    cols44 = row.find_all('td')[3]
    cols44 = [ele.text.strip() for ele in cols44]
    data.append([ele for ele in cols44 if ele])
    cols55 = row.find_all('td')[5]
    cols55 = [ele.text.strip() for ele in cols55]
    data.append([ele for ele in cols55 if ele])
    print(cols11, cols22, cols33, cols44, cols55)

if(cols == cols11, cols2 == cols22, cols5 == cols55):
    print("no changes")
else:
    print("something has changed")
The problem: it always says "no changes", even when I know something has changed. How can I fix this?
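For reference, the scrape / wait / scrape again / compare flow described in the question could be sketched roughly as follows. This is only a minimal sketch, not the asker's code: the helper names extract_columns, scrape and watch are made up for illustration, the URL is the placeholder from the question, and rows are read with table.find_all("tr") instead of going through tbody. The point it illustrates is that the page is downloaded again for the second scrape, so the two snapshots can actually differ.

```python
import time

import requests
from bs4 import BeautifulSoup

URL = "https://website.com"  # placeholder URL taken from the question
TABLE_CLASS = "table table-bordered table-sm table-responsive"
COLUMNS = (0, 1, 2, 3, 5)  # the <td> indexes the question reads


def extract_columns(html, columns=COLUMNS):
    """Return one list of stripped cell texts per table row (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=TABLE_CLASS)
    if table is None:
        return []
    snapshot = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Skip header rows and rows that are too short to hold every column.
        if len(cells) > max(columns):
            snapshot.append([cells[i].get_text(strip=True) for i in columns])
    return snapshot


def scrape(url=URL):
    """Download the page and parse it; every call is a fresh request."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return extract_columns(response.content)


def watch(url=URL, delay=600):
    """Scrape, wait `delay` seconds, scrape again, and report any difference."""
    first = scrape(url)
    time.sleep(delay)
    second = scrape(url)
    print("no changes" if first == second else "something has changed")
```

Calling watch() would run one comparison cycle. Comparing whole snapshots with == checks every row, not only the values left over from the last loop iteration.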
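【Comments】:
- While it may be related to, but not the root of, your problem: can you explain in your own words the logic you expect the condition if(cols == cols11, cols2 == cols22, cols5 == cols55) to apply? Relatedly, have you manually inspected the contents of these lists to make sure they contain the data you expect?
- OK, I'll try. cols and cols11 etc. are the same column. colsX contains the data from the first scrape, colsXX contains the data from the second scrape. In the if condition, it compares the contents of the "associated" columns.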
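To make the point about the condition concrete: in Python, comparisons separated by commas inside parentheses form a tuple, and a non-empty tuple is truthy regardless of what it contains. A small sketch with made-up values (the list contents below are hypothetical, only the variable names come from the question):

```python
# Hypothetical data: the first pair differs, the second pair is equal.
cols, cols11 = ["1"], ["2"]
cols2, cols22 = ["x"], ["x"]

# What the question's condition builds: a tuple of two booleans.
condition = (cols == cols11, cols2 == cols22)
print(condition)  # (False, True)

# A non-empty tuple is always truthy, so the "no changes" branch always runs.
tuple_is_truthy = bool(condition)
print(tuple_is_truthy)  # True

# Joining the comparisons with `and` gives the intended "everything equal" test.
all_equal = cols == cols11 and cols2 == cols22
print(all_equal)  # False
```

all(condition) would give the same result as the chain of and operators.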
Tags: python html web-scraping beautifulsoup python-requests