python json issue with bytes type serializing (answers)

【Question Title】: python json issue with bytes type serializing
【Posted】: 2020-06-10 16:42:47
【Question Description】:

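I'm following a tutorial to build a simple web scraper for a static website, but I get the following TypeError:

    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable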

Here is my code so far:

from bs4 import BeautifulSoup
import requests
import json

url = 'http://ethans_fake_twitter_site.surge.sh/'
response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")
tweetArr = []

for tweet in content.findAll('div', attrs = {'class': 'tweetcontainer'}):
    tweetObject = {
        "author": tweet.find('h2', attrs= {'class': 'author'}).text.encode('utf-8'),
        "date": tweet.find('h5', attrs= {'class': 'dateTime'}).text.encode('utf-8'),
        "content": tweet.find('p', attrs= {'class': 'content'}).text.encode('utf-8'),
        "likes": tweet.find('p', attrs= {'class': 'likes'}).text.encode('utf-8'),
        "shares": tweet.find('p', attrs= {'class': 'shares'}).text.encode('utf-8')
    }
    tweetArr.append(tweetObject)
with open('twitterData.json', 'w') as outfile:
    json.dump(tweetArr, outfile)

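My only guess is that the article uses an earlier version of Python, but the article is recent, so that shouldn't be the case. The code runs and the JSON file is created, but the only data in it is "author":. Sorry if the answer is obvious to some of you, but I'm just starting to learn.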
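The half-written file is consistent with how `json.dump` behaves in CPython: it writes the output to the file chunk by chunk as it encodes, so everything before the first `bytes` value is already written when the `TypeError` is raised. A minimal sketch of that effect, assuming a hypothetical one-tweet list and using `io.StringIO` as a stand-in for the real `twitterData.json` file handle:

```python
import io
import json

# Hypothetical one-tweet list mimicking the question's data: the values
# are bytes because of the .encode('utf-8') calls.
tweet_arr = [{"author": "Ethan".encode("utf-8"), "date": "June 10".encode("utf-8")}]

# io.StringIO stands in for the real twitterData.json file handle.
buf = io.StringIO()
try:
    json.dump(tweet_arr, buf)
except TypeError as exc:
    print("TypeError:", exc)

# json.dump writes incrementally, so the text encoded before the failing
# value (the opening bracket/brace and the "author" key) is already there.
partial = buf.getvalue()
print(repr(partial))
```

Running it should print the TypeError and then a fragment containing only the opening of the list and the `"author"` key, which matches the truncated file described above.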

Here is the full error log:

(tutorial-env) C:\Users\afaal\Desktop\python\webscraper>python webscraper.py
Traceback (most recent call last):
  File "webscraper.py", line 20, in <module>
    json.dump(tweetArr, outfile)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\__init__.py", line 179, in dump
    for chunk in iterable:
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 438, in _iterencode
    o = _default(o)
  File "C:\Users\afaal\AppData\Local\Programs\Python\Python38\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable
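The traceback boils down to one fact: `.text` returns a `str`, `.encode('utf-8')` turns it into `bytes`, and the standard `json` encoder only knows how to serialize `str`, not `bytes`. A minimal reproduction, using a hypothetical author value:

```python
import json

author_str = "Ethan Jarell"
author_bytes = author_str.encode("utf-8")  # what .text.encode('utf-8') produces

print(type(author_str).__name__, type(author_bytes).__name__)  # str bytes

# A str value serializes fine.
ok = json.dumps({"author": author_str})

# A bytes value triggers the same TypeError as in the traceback.
try:
    json.dumps({"author": author_bytes})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```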

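【Question Discussion】:

Please share the entire error message. Why all the .text.encode('utf-8')?
Stop creating bytes objects and keep the strings?
@AMC Done. I was just following the tutorial; please direct your questions to Ethan Jarell at HackerNoon. ;)
@juanpa.arrivillaga How would I do that?
I feel like I was so close, yet so far... How did I not realize I just needed to drop the encode()?!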
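"Keeping the strings" just means not calling `.encode()`. If a value has already been turned into `bytes`, `.decode('utf-8')` converts it back to `str`. A small sketch, assuming a hypothetical dict shaped like the question's `tweetObject`:

```python
import json

# Hypothetical dict with bytes values, as built by the question's loop.
tweet_object = {
    "author": "Ethan".encode("utf-8"),
    "likes": "25 likes".encode("utf-8"),
}

# Decode any bytes values back to str; leave other values untouched.
fixed = {
    key: value.decode("utf-8") if isinstance(value, bytes) else value
    for key, value in tweet_object.items()
}

print(json.dumps(fixed))
```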

Tags: python web-scraping

【Solution 1】:

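OK, it turns out I needed to remove everything after ".text", and I just needed to google "JSON serialization" (I had only tried googling my specific TypeError and didn't find anything conclusive). Here is the corrected code, in case an amateur like me runs into the same problem: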

from bs4 import BeautifulSoup
import requests
import json

url = 'http://ethans_fake_twitter_site.surge.sh/'
response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")
tweetArr = []

for tweet in content.findAll('div', attrs = {'class': 'tweetcontainer'}):
    # .text is already a str, which json can serialize; no .encode() needed
    tweetObject = {
        "author": tweet.find('h2', attrs= {'class': 'author'}).text,
        "date": tweet.find('h5', attrs= {'class': 'dateTime'}).text,
        "content": tweet.find('p', attrs= {'class': 'content'}).text,
        "likes": tweet.find('p', attrs= {'class': 'likes'}).text,
        "shares": tweet.find('p', attrs= {'class': 'shares'}).text
    }
    tweetArr.append(tweetObject)

with open('twitterData.json', 'w') as outfile:
    json.dump(tweetArr, outfile)
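Dropping `.encode()` is the simplest fix. If the `bytes` values come from code you can't change, `json.dump` and `json.dumps` also accept a `default=` callable that the encoder calls for any object it can't serialize on its own. A sketch under the assumption that all such bytes are UTF-8 text; `bytes_to_str` is a hypothetical helper name:

```python
import json

def bytes_to_str(obj):
    """Fallback for json.dump/json.dumps: decode bytes, reject anything else."""
    if isinstance(obj, bytes):
        return obj.decode("utf-8")  # assumes the bytes are UTF-8 encoded text
    # Keep the encoder's normal error for other unsupported types.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

# Hypothetical data mixing bytes and str values.
tweet_arr = [{"author": b"Ethan", "content": "Hello"}]

text = json.dumps(tweet_arr, default=bytes_to_str, indent=2)
print(text)
```

With a real file, the same hook works as `json.dump(tweetArr, outfile, default=bytes_to_str)`.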

Thanks to @juanpa.arrivillaga, I really appreciate you clearing this up completely!

【Discussion】: