UnicodeEncodeError: 'ascii' codec can't encode character '\u2159' in position 32: ordinal not in range(128)

【Question title】: UnicodeEncodeError: 'ascii' codec can't encode character '\u2159' in position 32: ordinal not in range(128)
【Posted】: 2019-11-10 03:20:41
【Question】:

I am using python3 and beautifulsoup to scrape a website, but I get this error. I tried the solutions given in other answers, but none of them solved my problem.

# -*- coding: utf-8 -*-
import os
import locale
os.environ["PYTHONIOENCODING"] = "utf-8"
myLocale=locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import pandas as pd


def getrank (animeurl):
    html = urlopen(animeurl)
    bslink = BeautifulSoup(html.read(), 'html.parser')
    
    rank = bslink.find('span', {'class' : 'numbers ranked'}).get_text().replace('Ranked #', '')
    return rank  # without this, getrank() returns None and ranklist fills with None


def spring19():
    html = urlopen('https://...')
    bs = BeautifulSoup(html.read(), 'html.parser')
    
    link = []
    for x in bs.find_all('a', {'class' : 'link-title'}):
        link.append(x.get("href"))
    
    
    
    ranklist = []
    for x in link:
        x.encode(encoding='UTF-8',errors='ignore')
        ranklist.append(getrank(x))
    
    return ranklist

spring19()

The error message is: UnicodeEncodeError: 'ascii' codec can't encode character '\u2159' in position 32: ordinal not in range(128)

The error occurs because some of the URLs I scrape contain symbols, but I still don't know how to fix it.

Thank you very much!

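【Comments】:

Have you tried other encodings, for example Windows-1252? You should be able to get the encoding used for the page from the HTML itself (the charset meta element in the head), or perhaps better, from the headers sent by the server (BeautifulSoup knows nothing about these; they are lost once you have downloaded the document).
Please indicate exactly where in your script the error occurs.
You never assign the result of x.encode(encoding='UTF-8',errors='ignore') to anything. x stays as it was, because the result of the encoding is discarded.
Thank you very much for your help! The encoding used for the website is actually utf-8. I found that the error is caused by symbols such as ☆ in the URLs I scraped, but I am still trying to fix it.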
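
The point made in the comments about the discarded result can be checked without any network access. The following is a minimal sketch; the URL is a made-up placeholder (not the site from the question) that simply contains U+2159, the character named in the error message.

```python
# Hypothetical URL (an assumption for illustration) containing U+2159.
url = "https://example.com/anime/\u2159"

# str.encode returns a NEW bytes object; the original str is not modified,
# so calling it without assigning the result has no effect on `url`.
encoded = url.encode(encoding="UTF-8", errors="ignore")
print(type(encoded).__name__)  # the result is bytes, not str
print(type(url).__name__)      # url is still an unchanged str

# Encoding the same string as ASCII raises the same kind of error
# that the question reports.
ascii_error = None
try:
    url.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc
    print(exc)
```

Even if the bytes returned by encode were kept, they would still contain non-ASCII byte values, so the character has to be escaped rather than merely encoded.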

Tags: python python-3.x url beautifulsoup python-unicode

【Solution 1】:

Solved the problem with the solution from How to convert a url string to safe characters with python?

The code was changed as follows:

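    from urllib.parse import quote  # needed for quote(); can also go with the other imports

    ranklist = []
    for x in link:
        # Percent-encode non-ASCII characters while keeping the URL delimiters intact
        x = quote(x, safe='/:?=&')
        ranklist.append(getrank(x))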
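
To see what the quote call does to such a link, here is a small offline sketch. The URL is a hypothetical placeholder, not one from the scraped site; the safe argument is the one used in the answer.

```python
from urllib.parse import quote, unquote

# Hypothetical link (an assumption for illustration) containing U+2159.
raw = "https://example.com/anime/\u2159"

# Characters listed in `safe` are left alone; everything else outside the
# unreserved set is encoded as UTF-8 and written as %XX escapes.
quoted = quote(raw, safe='/:?=&')
print(quoted)  # https://example.com/anime/%E2%85%99

# The quoted URL is pure ASCII, so encoding it as ASCII no longer fails.
quoted.encode("ascii")

# unquote reverses the escaping and gives back the original string.
print(unquote(quoted) == raw)  # True
```

Note that '%' is not in the safe list, so a link that is already percent-encoded would be escaped a second time; this sketch assumes the scraped href values are raw, unescaped strings.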
