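Assuming the rest of the file either doesn't need checking or is valid UTF-8 (which includes ASCII data), you can open the file with encoding='utf-8' and errors='replace'. This changes any bytes that are invalid in UTF-8 to the Unicode replacement character, \ufffd. Alternatively, to preserve the data, you can use 'surrogateescape' as the errors handler, which represents each undecodable byte as a lone surrogate code point (U+DC80 to U+DCFF) in a way that can be undone later. You can then check as you go: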
import csv

with open(csvname, encoding='utf-8', errors='replace', newline='') as f:
    for PersonName, age, address in csv.reader(f):
        if '\ufffd' in PersonName:
            continue  # PersonName contained invalid UTF-8, skip this row
        # PersonName was decoded without errors, so process the row ...
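To see what the replace handler does in practice, here is a minimal sketch that runs on in-memory bytes instead of a real file. The sample data and the names `sample`, `kept` and `skipped` are made up for illustration; `io.TextIOWrapper` stands in for `open()` and takes the same encoding, errors and newline arguments.

```python
import csv
import io

# Hypothetical sample data: the second row's name contains a Latin-1 "é"
# (byte 0xE9), which is not valid UTF-8 in this position.
sample = b'Alice,30,1 Main St\r\nJos\xe9,42,2 Oak Ave\r\n'

# Wrap the bytes the same way open() would wrap a file on disk.
text = io.TextIOWrapper(io.BytesIO(sample), encoding='utf-8',
                        errors='replace', newline='')

kept, skipped = [], []
for person_name, age, address in csv.reader(text):
    if '\ufffd' in person_name:
        skipped.append(person_name)  # name had at least one invalid byte
    else:
        kept.append(person_name)     # name decoded cleanly
```

Note that the replacement is lossy: once the byte has become \ufffd, the original value cannot be recovered from the decoded string.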
Or, with surrogateescape, you can make sure that any non-UTF-8 data in the other fields is restored on write (if "possible"):
import csv

with open(incsvname, encoding='utf-8', errors='surrogateescape', newline='') as inf,\
     open(outcsvname, 'w', encoding='utf-8', errors='surrogateescape', newline='') as outf:
    csvout = csv.writer(outf)
    for PersonName, age, address in csv.reader(inf):
        try:
            # Check for surrogate escapes, and reject PersonNames containing them.
            # The most efficient way to do so is a test encode; surrogates will
            # fail to encode with the default error handler.
            PersonName.encode('utf-8')
        except UnicodeEncodeError:
            continue  # Had non-UTF-8, skip this row

        # PersonName was decoded without surrogate escapes, so process the row ...

        # You can recover the original file bytes in your code for a field with:
        #     fieldname.encode('utf-8', errors='surrogateescape')
        # Or if you're just passing data to a new file, write the same strings
        # back to a file opened with the same encoding/errors handling; the surrogates
        # will be restored to their original values:
        csvout.writerow([PersonName, age, address])
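
The round trip that the comments above rely on can be checked on a single field. This is a small sketch with a made-up value; `raw`, `strict_ok` and `restored` are illustrative names only.

```python
# Hypothetical field value with a stray Latin-1 byte (0xE9) in it.
raw = b'Jos\xe9'

# surrogateescape maps the undecodable byte 0xE9 to the lone surrogate U+DCE9.
text = raw.decode('utf-8', errors='surrogateescape')

# A strict encode rejects lone surrogates; this is the "test encode" check.
try:
    text.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Encoding with the same handler turns the surrogate back into the original byte.
restored = text.encode('utf-8', errors='surrogateescape')
```

Because the escaped bytes survive the round trip, rows can be copied to the output file unchanged even when the fields you did not inspect contain non-UTF-8 data.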