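【Posted】: 2020-06-18 19:41:30
【Question】:
I am trying to use this code to collect reviews from the ConsumerAffairs review site, but I keep getting errors, specifically in the dateElements and jsonData part. Could someone help me fix this code so that it works with the site I am trying to scrape?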
from bs4 import BeautifulSoup
import requests
import pandas as pd
import json
print ('all imported successfully')
# Initialize an empty dataframe
df = pd.DataFrame()
for x in range(1, 5):
    names = []
    headers = []
    bodies = []
    ratings = []
    published = []
    updated = []
    reported = []
    link = (f'https://www.consumeraffairs.com/online/allure-beauty-box.html?page={x}')
    print (link)
    req = requests.get(link)
    content = req.content
    soup = BeautifulSoup(content, "lxml")
    articles = soup.find_all('div', {'class':'rvw js-rvw'})
    for article in articles:
        names.append(article.find('strong', attrs={'class': 'rvw-aut__inf-nm'}).text.strip())
        try:
            bodies.append(article.find('p', attrs={'class':'rvw-bd'}).text.strip())
        except:
            bodies.append('')
        try:
            ratings.append(article.find('div', attrs={'class':'stars-rtg stars-rtg--sm'}).text.strip())
        except:
            ratings.append('')
        dateElements = article.find('span', attrs={'class':'ca-txt-cpt'}).text.strip()
        jsonData = json.loads(dateElements)
        published.append(jsonData['publishedDate'])
        updated.append(jsonData['updatedDate'])
        reported.append(jsonData['reportedDate'])
    # Create your temporary dataframe of the first iteration, then append that into your "final" dataframe
    temp_df = pd.DataFrame({'User Name': names, 'Body': bodies, 'Rating': ratings, 'Published Date': published, 'Updated Date':updated, 'Reported Date':reported})
    df = df.append(temp_df, sort=False).reset_index(drop=True)
    print ('pass1')
df.to_csv('AllureReviews.csv', index=False, encoding='utf-8')
print ('excel done')
This is the error I am getting:
Traceback (most recent call last):
  File "C:/Users/Sara Jitkresorn/PycharmProjects/untitled/venv/Caffairs.py", line 37, in <module>
    jsonData = json.loads(dateElements)
  File "C:\Users\Sara Jitkresorn\AppData\Local\Programs\Python\Python37\lib\json\__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "C:\Users\Sara Jitkresorn\AppData\Local\Programs\Python\Python37\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\Sara Jitkresorn\AppData\Local\Programs\Python\Python37\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
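The message "Expecting value: line 1 column 1 (char 0)" generally means that the string passed to json.loads does not start with a JSON value, for example because it is empty or is ordinary text. The sketch below reproduces that situation under the assumption that the date span holds human-readable text rather than JSON. The sample strings and the parse_dates helper are hypothetical and are not taken from the site.

```python
import json


def parse_dates(text):
    """Return the decoded JSON value, or None if `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Hypothetical strings standing in for the text of the date <span>;
# the real page content may look different.
plain_text = "Original review: June 10, 2020"
json_text = '{"publishedDate": "2020-06-10"}'

# Passing plain text to json.loads raises the same error as in the traceback.
try:
    json.loads(plain_text)
except json.JSONDecodeError as err:
    error_message = str(err)
    print(error_message)

# The guarded helper returns None for plain text and a dict for real JSON.
print(parse_dates(plain_text))
print(parse_dates(json_text))
```

Printing dateElements before the json.loads call is a quick way to check which of the two cases applies to the scraped page.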
【Discussion】:
Tags: python json pandas web-scraping beautifulsoup