Posted: 2020-07-26 18:03:11
Question:
I have a problem when testing data mining on Twitter by searching tweets for a keyword.
This code fails with UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
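The error itself can be reproduced without Twitter or pandas. The sketch below assumes the #Thailand tweets contain Thai script (the question doesn't show the data); the sample word "ไทย" is a hypothetical stand-in. Its three characters all lie outside ASCII, which matches "position 0-2" in the message, while UTF-8 can encode them without trouble.

```python
# Hypothetical sample text: "ไทย" (Thai), three characters, none of them ASCII
text = "ไทย"

message = None
try:
    text.encode("ascii")  # the ascii codec only covers code points 0-127
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

# UTF-8 can encode every character; each Thai character becomes 3 bytes
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes))  # 9
```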
import pandas as pd
import tweepy

# `api` (an authenticated tweepy.API object) and `date_time` (a string) are assumed to be defined earlier in the script
retweet = "-filter:retweets"
query = "#Thailand" + retweet
df = pd.DataFrame(columns = ["create_at","user","location","text", "retweet_count", "favourite_count","hashtag","follower","source"])
for tweet in tweepy.Cursor(api.search, q = query,result_type="recent", tweet_mode='extended').items(100):
    # Join all hashtags of the tweet into one "/tag1/tag2" string
    entity_hashtag = tweet.entities.get('hashtags')
    hashtag = ""
    for i in range(0, len(entity_hashtag)):
        hashtag = hashtag + "/" + entity_hashtag[i]["text"]
    re_count = tweet.retweet_count
    create_at = tweet.created_at
    user = tweet.user.screen_name
    source = tweet.source
    location = tweet.user.location
    follower = tweet.user.followers_count
    try:
        # For a retweet, take the text and favourite count of the original tweet
        text = tweet.retweeted_status.full_text
        fav_count = tweet.retweeted_status.favorite_count
    except AttributeError:
        # Not a retweet: there is no retweeted_status, so use the tweet's own fields
        text = tweet.full_text
        fav_count = tweet.favorite_count
    new_column = pd.Series([create_at,user,location,text, re_count, fav_count,hashtag,follower,source], index = df.columns)
    df = df.append(new_column, ignore_index = True)
df.to_csv(date_time+".csv")
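The loop above grows the DataFrame with df.append, which returns a new copy of the whole frame on every iteration and was deprecated in pandas 1.4 and removed in pandas 2.0. A common alternative is to collect one dict per tweet and build the DataFrame once. The sketch below shows that pattern for three of the columns; the SimpleNamespace objects are hypothetical stand-ins for tweepy Status objects (carrying only the attributes the question reads), and the helper names are illustrative, not part of tweepy.

```python
from types import SimpleNamespace

import pandas as pd


def hashtag_string(entities):
    """Build the same "/tag1/tag2" string as the loop in the question."""
    return "".join("/" + h["text"] for h in entities.get("hashtags", []))


def text_and_favourites(tweet):
    """Prefer the original tweet's text and favourite count for retweets."""
    original = getattr(tweet, "retweeted_status", None)
    if original is not None:
        return original.full_text, original.favorite_count
    return tweet.full_text, tweet.favorite_count


# Hypothetical stand-ins for tweepy Status objects
tweets = [
    SimpleNamespace(
        entities={"hashtags": [{"text": "Thailand"}, {"text": "Bangkok"}]},
        full_text="สวัสดี #Thailand #Bangkok",
        favorite_count=3,
    ),
    SimpleNamespace(
        entities={"hashtags": []},
        full_text="RT retweet text",
        favorite_count=0,
        retweeted_status=SimpleNamespace(full_text="original ไทย", favorite_count=10),
    ),
]

rows = []
for t in tweets:
    text, fav = text_and_favourites(t)
    rows.append({"text": text, "favourite_count": fav, "hashtag": hashtag_string(t.entities)})

# Build the DataFrame once, after the loop, instead of appending row by row
df = pd.DataFrame(rows, columns=["text", "favourite_count", "hashtag"])
```

In the real script the loop would iterate over the tweepy.Cursor results and fill all nine columns the same way.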
Why does this happen?
Comments:
-
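Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not a screenshot). It contains other useful information.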
-
Which line causes the problem? Add it to the question (not in a comment).
-
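The usual cause is that the text contains native (non-ASCII) characters, but the system tries to convert it to ascii instead of utf-8, latin1 or cp1250. If that may be the problem, you have to add this option manually (i.e. encoding="utf-8").
-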
I think it is this line: df.to_csv(date_time+".csv")
-
It would be better to show the full error message.
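Following the encoding suggestion in the comments, the sketch below writes a DataFrame containing Thai text (hypothetical sample data) with the encoding passed explicitly to to_csv, so the output does not depend on whatever default the platform would pick, then reads the file back to check the round trip. It writes into a temporary directory rather than the question's date_time+".csv" file name.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample row with non-ASCII (Thai) text
df = pd.DataFrame({"user": ["someone"], "text": ["สวัสดี #Thailand"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tweets.csv")
    # Pass the encoding explicitly instead of relying on a default
    df.to_csv(path, index=False, encoding="utf-8")

    # Read the file back with the same encoding to confirm nothing was lost
    with open(path, encoding="utf-8") as fh:
        header = fh.readline().strip()
    round_trip = pd.read_csv(path, encoding="utf-8")
```

If the CSV is meant to be opened in Excel, encoding="utf-8-sig" writes a byte-order mark that helps Excel recognise the file as UTF-8.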
Tags: python pandas dataframe twitter ascii