2021.3.9 Garbled Chinese characters in a Python 2.7 web crawler

Crawlers often produce garbled Chinese characters when they save their results. For example, crawling my own blog:

import json
import requests
from bs4 import BeautifulSoup

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
r = requests.get('https://www.cnblogs.com/yue-qian/', headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
text = []
# Collect the text of every <div class="c_b_p_desc"> on the page
for zx in soup.find_all('div', class_="c_b_p_desc"):
    text.append(zx.text)
# Dump the list as JSON using the default settings
with open("xyz.txt", 'w') as fp:
    json.dump(text, fp=fp, indent=4)

A partial screenshot of the result is shown below:

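As shown above, the Chinese text comes out garbled once the crawl results are stored as JSON. This is because Python 2.7 defaults to ASCII when it is installed. Two defaults are involved: json.dump escapes every non-ASCII character as \uXXXX (ensure_ascii=True), and the interpreter's default encoding, which is used when a unicode string is written to an ordinary file, is ASCII.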
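The effect of the JSON default can be seen without crawling anything. The following is a minimal sketch, assuming a hypothetical one-item list in place of the real crawl results; it only compares the two settings of ensure_ascii and is not the crawler itself.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import json

# Hypothetical sample standing in for the crawled text
sample = ["中文"]

# Default behaviour: non-ASCII characters become \uXXXX escapes
escaped = json.dumps(sample)

# With ensure_ascii=False the characters are kept as they are
readable = json.dumps(sample, ensure_ascii=False)

print(escaped)  # an ASCII-only string containing \u4e2d\u6587

# Both strings decode back to the same data, so nothing is lost;
# the escaped form is just hard for a person to read.
assert json.loads(escaped) == json.loads(readable) == sample
```

The escaped form is still valid JSON, which is why a JSON parser reads it back correctly even though it looks garbled in a text editor.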

Make the following changes:

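import json
import requests
from bs4 import BeautifulSoup
import sys

# Python 2 only: setdefaultencoding is removed from sys at startup,
# so reload(sys) is needed to make it available again
reload(sys)
sys.setdefaultencoding("utf-8")

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
r = requests.get('https://www.cnblogs.com/yue-qian/', headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
text = []
# Collect the text of every <div class="c_b_p_desc"> on the page
for zx in soup.find_all('div', class_="c_b_p_desc"):
    text.append(zx.text)
# ensure_ascii=False keeps Chinese characters instead of \uXXXX escapes
with open("xyz.txt", 'w') as fp:
    json.dump(text, fp=fp, ensure_ascii=False, indent=4)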

The result: the Chinese text in xyz.txt is now written as readable characters instead of garbled output.
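As an alternative to changing the interpreter-wide default with reload(sys) and sys.setdefaultencoding, the encoding can be stated on the file itself. The sketch below is one possible approach, not the method used above: the helper names save_json_utf8 and load_json_utf8 and the sample data are hypothetical, and a temporary file stands in for the crawler's output. It relies on io.open, which accepts an encoding argument in both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import io
import json
import os
import tempfile


def save_json_utf8(items, path):
    """Write items as JSON with readable non-ASCII text (hypothetical helper)."""
    payload = json.dumps(items, ensure_ascii=False, indent=4)
    # Python 2 can return a byte string when every input is ASCII;
    # io.open in text mode needs unicode, so decode in that case.
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    with io.open(path, "w", encoding="utf-8") as fp:
        fp.write(payload)


def load_json_utf8(path):
    """Read the JSON file back, decoding it as UTF-8 (hypothetical helper)."""
    with io.open(path, "r", encoding="utf-8") as fp:
        return json.load(fp)


# Hypothetical data standing in for the crawled text
demo_items = ["中文字符", "plain ascii"]
demo_path = os.path.join(tempfile.mkdtemp(), "xyz.txt")

save_json_utf8(demo_items, demo_path)
assert load_json_utf8(demo_path) == demo_items
```

Because the encoding is attached to the file handle, the rest of the program keeps Python's normal defaults, and the same code runs unchanged if the crawler is later moved to Python 3.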