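[Posted]: 2020-05-14 16:15:23
[Question]:
I am trying to write information containing German umlauts to a CSV file. When I write only the first field, "name", it is displayed correctly. When I write both "name" and "institution", I get this error: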
UnicodeEncodeError: 'charmap' codec can't encode character '\u0308' in position 71: character maps to <undefined>
As you can see in the code below, I tried encoding and decoding the text with different combinations of character sets.
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

# this is the header of the csv
with open('/filepath/result.csv', 'w', encoding='utf-8') as f:
    f.write("name, institution, \n")

l = list(range(1148, 1153))
for i in l:
    url = 'webaddress.com' + str(i)
    driver.get(url)
    name = driver.find_elements_by_xpath('//div[@style="width:600px; display:inline-block;"]')[0].text
    name = '\"' + name + '\"'
    institution = driver.find_elements_by_xpath('//div[@style="width:600px; display:inline-block;"]')[1].text
    institution = '\"' + institution + '\"'
    print(str(i) + ': ' + name, '\n', str(i) + ': ' + institution, '\n')
    print(institution.encode('utf-8'))
    print(institution.encode('utf-8').decode('utf-8'))
    print(institution.encode('utf-8').decode('ISO-8859-15'))
    with open('/filepath/result.csv', 'a', encoding='utf-8') as f:
        f.write(name + ',' + institution + '\n')

driver.close()
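For reference, here is a minimal sketch of what the error message seems to describe. The character '\u0308' is a combining diaeresis, which suggests the page delivers umlauts in decomposed form ("u" followed by U+0308) rather than as a single "ü". Because the file is opened with encoding='utf-8', one plausible reading is that the exception comes from print() on a console that uses a legacy code page such as cp1252, not from f.write(); that is an assumption, not something the traceback above confirms. The sample string and the helper can_encode are hypothetical.

```python
import unicodedata

# Hypothetical sample: each umlaut is stored as base letter + U+0308,
# the way the scraped "institution" text appears to be.
decomposed = "Universita\u0308t Mu\u0308nchen"


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The character named in the error message.
print(unicodedata.name("\u0308"))

# cp1252 (a "charmap" codec) has no slot for the combining mark; UTF-8 does.
print(can_encode(decomposed, "cp1252"))
print(can_encode(decomposed, "utf-8"))

# NFC normalization merges base letter + combining mark into one code point.
composed = unicodedata.normalize("NFC", decomposed)
print(can_encode(composed, "cp1252"))
```

If this reading is right, the "name" field would work on its own simply because it happens to contain no decomposed characters.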
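When I set every encoding to UTF-8, the result shown in the CSV looks similar to what I get when I encode as UTF-8 and decode as ISO-8859-15 (latin1). When I encode as latin1 and decode as UTF-8, I get the same error as above.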
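The symptom described here, a UTF-8 file that looks as if it had been decoded as ISO-8859-15, is what typically appears when the program displaying the CSV assumes a legacy encoding. The sketch below shows that effect on a sample string, then writes a file with the csv module, NFC-normalized text, and the 'utf-8-sig' codec, whose byte order mark lets many spreadsheet programs detect UTF-8. It assumes the viewer honors the BOM; the sample rows, the helper nfc, and the temporary output path are hypothetical.

```python
import csv
import os
import tempfile
import unicodedata


def nfc(text):
    """Compose base letter + combining mark pairs into single code points."""
    return unicodedata.normalize("NFC", text)


# Hypothetical scraped rows with decomposed umlauts.
rows = [("Mu\u0308ller, Jo\u0308rg", "Universita\u0308t Mu\u0308nchen")]

# What a viewer shows when it reads UTF-8 bytes as ISO-8859-15.
mojibake = "M\u00fcnchen".encode("utf-8").decode("iso-8859-15")
print(mojibake)

# Write to a temporary directory so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "result.csv")

# newline="" is what the csv module expects; utf-8-sig prepends a BOM.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f)  # handles quoting of commas and quotes
    writer.writerow(["name", "institution"])
    for name, institution in rows:
        writer.writerow([nfc(name), nfc(institution)])

# Inspect the raw bytes to confirm the BOM and the composed text.
with open(path, "rb") as f:
    raw = f.read()
print(raw.decode("utf-8-sig"))
```

Letting csv.writer do the quoting also avoids the hand-built '\"' + name + '\"' strings, which break if a field itself contains a double quote.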
Thanks for your help.
[Discussion]:
Tags: python selenium web-scraping utf-8 data-cleaning