Error when encoding UTF-8

【Question title】: Error when encoding UTF-8
【Posted】: 2016-11-04 14:52:21
【Question description】:

I am trying to fetch text data from a website, but this code raises an error. Please let me know where the mistake is.

import requests
from bs4 import BeautifulSoup

def getportions(soup):
    for p in soup.find_all("p", {"class": ""}):
        yield p.text

def readpage(address):
    page = requests.get(address)
    soup = BeautifulSoup(page.text, "html.parser")
    output_text = ''
    for s in getportions(soup):
        output_text += s.encode("utf8")
        output_text += "\n"
    print(output_text)
    print("End of article")
    fp = open("content.txt", "w")
    fp.write(output_text)

if __name__ == "__main__":
    readpage("http://yahoo.com")

The error is as follows:

output_text += s.encode("utf8")
TypeError: Can't convert 'bytes' object to str implicitly
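
The failing line can be reproduced without any network access. The following is only a minimal sketch: the sample string and the names `text`, `encoded`, and `error_message` are made up for illustration and are not part of the original code. It shows that `str.encode()` returns `bytes`, and that Python 3 refuses to append `bytes` to a `str`.

```python
# Python 3: str.encode() returns bytes, and bytes cannot be appended to a str.
text = "caf\u00e9"             # a str containing one non-ASCII character
encoded = text.encode("utf8")  # bytes: b'caf\xc3\xa9'

output_text = ""
try:
    output_text += encoded     # same kind of operation as in readpage()
except TypeError as exc:
    # The exact wording of the message differs between Python versions.
    error_message = str(exc)

# Keeping everything as str avoids the problem entirely.
output_text += text + "\n"
```

After the failed `+=`, `output_text` is still empty, so only the final line contributes to it.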

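【Comments】:

.encode returns a bytes object. What are you trying to do?
@MorganThrapp I am trying to write the content to a file.
Did you mean decode? And why do you think you need to do anything with UTF-8 at all?
@MorganThrapp If I make the object a string, it contains unwanted characters.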

Tags: python python-3.x utf-8

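【Solution 1】:

If you are using Python 3, all strings are natively Unicode, so you do not need to encode them yourself; you can specify the encoding when you open the file instead. Your code could become: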

def readpage(address):
    ...
    output_text = ''
    for s in getportions(soup):
        output_text += s    # s is already a str; no encode() needed
        output_text += "\n"
    print(output_text)
    print("End of article")
    # The file object encodes the text as UTF-8 on write.
    with open("content.txt", "w", encoding='utf8') as fp:
        fp.write(output_text)
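
For a version that runs without fetching a page, here is a rough sketch of the same idea. The helper `write_text`, the sample `paragraphs`, and the temporary path are assumptions made for illustration; they stand in for `getportions(soup)` and `content.txt`.

```python
import os
import tempfile

def write_text(path, paragraphs):
    """Join str paragraphs with newlines and write them as UTF-8 text."""
    output_text = "".join(p + "\n" for p in paragraphs)
    # The encoding is chosen once, when the file is opened.
    with open(path, "w", encoding="utf8") as fp:
        fp.write(output_text)
    return output_text

# Sample data standing in for the paragraphs scraped from a page.
paragraphs = ["Sigma: \u03a3", "caf\u00e9"]
path = os.path.join(tempfile.mkdtemp(), "content.txt")
written = write_text(path, paragraphs)

# Read the raw bytes back to confirm they are valid UTF-8.
with open(path, "rb") as fp:
    raw = fp.read()
```

Using `with` also closes the file, which guarantees the buffered text is flushed to disk.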

If you just want to clean the text by replacing every non-ASCII character with ?, open the file this way:

   fp = open("content.txt", "w", encoding='ascii', errors='replace')
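
To see what `errors='replace'` does in practice, here is a small sketch. The sample string and the temporary file path are invented for the example.

```python
import os
import tempfile

text = "Sigma \u03a3, caf\u00e9"   # two non-ASCII characters
path = os.path.join(tempfile.mkdtemp(), "content.txt")

# Characters that ASCII cannot represent are written as '?'.
with open(path, "w", encoding="ascii", errors="replace") as fp:
    fp.write(text)

with open(path, encoding="ascii") as fp:
    cleaned = fp.read()   # "Sigma ?, caf?"
```

Note that the replacement is lossy: the original characters cannot be recovered from the file afterwards.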

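【Comments】:

It shows an error again: return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode character '\u03a3' in position 350: character maps to <undefined>
@NARAYANCHANGDER: Cannot reproduce. Show the code that produces the error and the stack trace. UTF-8 is designed to be able to encode any Unicode character...
@NARAYANCHANGDER: ...and I can confirm that I can successfully process \u03a3 (Σ).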
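
The thread does not say where the 'charmap' error came from, so the following is only a guess at one possible cause. A 'charmap' codec is a single-byte legacy encoding, not UTF-8. Such an encoding can end up being used when `print()` writes to a console, or when a file is opened without an explicit `encoding`, because both can fall back to the system's default encoding. The sketch below uses `cp1252` purely as an example of such an encoding.

```python
sigma = "\u03a3"                   # GREEK CAPITAL LETTER SIGMA

# UTF-8 can encode any Unicode character.
utf8_bytes = sigma.encode("utf8")  # b'\xce\xa3'

# A single-byte legacy code page has no slot for this character.
try:
    sigma.encode("cp1252")
    legacy_ok = True
except UnicodeEncodeError as exc:
    legacy_ok = False
    reason = exc.reason            # the text after the position in the error message
```

If this is the cause, it would mean the failing write or print was not actually going through the UTF-8 file handle.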