Answers: getting an encoding error while writing data to a csv file

【Question title】: Getting encoding error while writing data to csv file
【Posted】: 2017-06-04 16:07:43
【Question description】:

from tweetpy import *
import re
import json
from pprint import pprint
import csv

# Import the necessary methods from "twitter" library
from twitter import Twitter, OAuth, TwitterHTTPError, TwitterStream

# Variables that contains the user credentials to access Twitter API
ACCESS_TOKEN =  ''
ACCESS_SECRET = ''
CONSUMER_KEY = ''
CONSUMER_SECRET = ''

oauth = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET)

# Initiate the connection to Twitter Streaming API
twitter_stream = TwitterStream(auth=oauth)

# Get a sample of the public data following through Twitter
iterator = twitter_stream.statuses.filter(track="#kindle",language="en",replies="all")
 # Print each tweet in the stream to the screen

 # Here we set it to stop after getting 10000000 tweets.
 # You don't have to set it to stop, but can continue running
 # the Twitter API to collect data for days or even longer.

tweet_count = 10000000

file = "C:\\Users\\WELCOME\\Desktop\\twitterfeeds.csv"
with open(file,"w") as csvfile:
    fieldnames=['Username','Tweet','Timezone','Timestamp','Location']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for tweet in iterator:
        #pprint(tweet)
        username = str(tweet['user']['screen_name'])
        tweet_text = str(tweet['text'])
        user_timezone = str(tweet['user']['time_zone'])
        tweet_timestamp=str(tweet['created_at'])
        user_location = str(tweet['user']['location'])
        print tweet
        tweet_count -= 1
        writer.writerow({'Username':username,'Tweet':tweet_text,'Timezone':user_timezone,'Location':user_location,'Timestamp':tweet_timestamp})

        if tweet_count <= 0:
            break

I am trying to write tweets to a csv file with the columns 'username', 'Tweet', 'Timezone', 'Location' and 'Timestamp'.

But I am getting the following error:

tweet_text = str(tweet['text'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128).

I know this is an encoding problem, but I can't work out exactly which variable needs to be encoded.

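【Comments on the question】:

What do you want to do with the problematic characters? Omit them? Convert them to the closest ASCII equivalent? Replace them with a fixed character such as a question mark?
The answer is very likely different for Python 2 and Python 3. Either way, you are not opening the csv file correctly. I suggest reading the documentation (for both versions), which shows how to do this properly.

Tags: python python-2.7 python-3.x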

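【Solution 1】:

Use Python 3, because the Python 2 csv module does not handle encodings well.
Use open with the encoding and newline options.
Remove the str conversions (in Python 3, str is already a Unicode string; in Python 2, calling str() on a unicode value implicitly encodes it as ASCII, which is what raises the error in the question).

Result: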

with open(file,"w",encoding='utf8',newline='') as csvfile:
    fieldnames=['Username','Tweet','Timezone','Timestamp','Location']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for tweet in iterator:
        username = tweet['user']['screen_name']
        tweet_text = tweet['text']
        user_timezone = tweet['user']['time_zone']
        tweet_timestamp = tweet['created_at']
        user_location = tweet['user']['location']
            .
            .
            .
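
The snippet above elides the rest of the loop. As a rough, self-contained sketch of the same idea in Python 3, the version below writes to a temporary file and uses a made-up tweet dict in place of the live stream. The names write_tweets, FIELDNAMES, sample_tweets and sample_path are illustrative assumptions, not part of the original answer.

```python
import csv
import os
import tempfile

FIELDNAMES = ['Username', 'Tweet', 'Timezone', 'Timestamp', 'Location']


def write_tweets(path, tweets):
    """Write an iterable of tweet dicts to a UTF-8 csv file; return the row count."""
    count = 0
    # encoding='utf8' lets characters such as U+2026 be written;
    # newline='' is what the csv module expects for files it writes.
    with open(path, 'w', encoding='utf8', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
        writer.writeheader()
        for tweet in tweets:
            user = tweet['user']
            writer.writerow({
                'Username': user['screen_name'],
                'Tweet': tweet['text'],
                'Timezone': user['time_zone'],
                'Timestamp': tweet['created_at'],
                'Location': user['location'],
            })
            count += 1
    return count


# Hypothetical sample data standing in for the Twitter stream.
sample_tweets = [{
    'text': 'Reading on my #kindle\u2026',
    'created_at': 'Sun Jun 04 16:07:43 +0000 2017',
    'user': {
        'screen_name': 'example_user',
        'time_zone': None,
        'location': 'Somewhere',
    },
}]

sample_path = os.path.join(tempfile.mkdtemp(), 'twitterfeeds.csv')
rows_written = write_tweets(sample_path, sample_tweets)
```

One difference from the question's code: without the str() calls, a None value (for example a missing time zone) is written as an empty cell rather than the text "None".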

If you are using Python 2, get the third-party unicodecsv module to work around the shortcomings of csv.

【Comments】:

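【Solution 2】:

If you really want to convert all of your Unicode data to ASCII:

tweet['text'].encode("ascii", "replace") # substitutes "?" for each non-ASCII character
or
tweet['text'].encode("ascii", "ignore") # if you want to drop such characters instead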
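
To make the difference between the two error handlers concrete, here is a small sketch in Python 3 using a made-up string that ends in the same U+2026 character as in the error message. The variable names are illustrative only. Note that in Python 3 encode returns bytes, so a decode step is needed to get a str back.

```python
text = 'Kindle deal\u2026'  # hypothetical tweet text ending in an ellipsis character

# The default handler ('strict') raises, which mirrors the error in the question.
try:
    text.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

replaced = text.encode('ascii', 'replace')  # b'Kindle deal?'
ignored = text.encode('ascii', 'ignore')    # b'Kindle deal'

# Decode back to str if the value is going into a text-mode csv file.
replaced_str = replaced.decode('ascii')
```

Both handlers lose information, so this route only makes sense when the output genuinely has to be plain ASCII.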

【Comments】: