Python twitter stream save to file: answers

【Question title】: Python twitter stream save to file
【Posted】: 2017-08-29 22:27:13
【Question description】:

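I'm currently writing code that streams Twitter posts and saves them to a JSON file. At the same time, textblob determines the sentiment of each tweet. Everything works so far, except that not all of the output ends up in the file: it currently saves the tweets, but not the sentiment scores computed by textblob. This is my first day coding in Python, and I appreciate every bit of help :)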

import json

from textblob import TextBlob
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener

# consumer key, consumer secret, access token, access secret.
consumer_key = "x"
consumer_secret = "x"
access_token = "x"
access_token_secret = "x"


class StdOutlistener(StreamListener):
    def on_data(self, data):
        all_data = json.loads(data)
        tweet = TextBlob(all_data["text"])
        print(tweet)
        print(tweet.sentiment)

        # Open json text file to save the tweets
        with open('tweets.json', 'a') as tf:
            tf.write(data)

        return True

    def on_error(self, status):
        print(status)


auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

twitterStream = Stream(auth, StdOutlistener())
twitterStream.filter(languages=["en"], track=["Test"])

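【Comments】:

What exactly is your question?
Oh, I see I was a bit unclear: 1: I want to combine the tweets with their sentiment. 2: I want to know how to write both the tweets and the sentiment to the json file. Jack made some assumptions and they are correct :)

Tags: python twitter save streaming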

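【Solution 1】:

First, are you sure you want to use on_data rather than on_status? This details the difference between the two. I'm not very familiar with tweepy, so I may be wrong on this point.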
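One practical consequence of that choice, sketched without tweepy so it runs on its own: on_data receives every raw message from the stream, and not every message is necessarily a tweet. Assuming the stream can also deliver notices that carry no "text" field, all_data["text"] would raise a KeyError. The helper name extract_text and the shape of the sample notice are assumptions for illustration only.

```python
import json


def extract_text(raw):
    """Return the tweet text, or None if the message has no "text" field."""
    message = json.loads(raw)
    # dict.get avoids a KeyError on messages that are not tweets
    return message.get("text")


samples = [
    '{"id": 1, "text": "Test tweet"}',
    '{"limit": {"track": 5}}',  # assumed shape of a non-tweet notice
]
texts = [extract_text(s) for s in samples]
```

Inside on_data, a check like this would let the listener skip such messages before handing the text to TextBlob.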

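Second, you don't appear to be updating the data with the sentiment. You compute it with tweet = TextBlob(all_data['text']), but then do nothing further with either the tweet variable or the all_data variable. What you want is all_data['sentiment'] = tweet.sentiment.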
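A minimal sketch of that merge step, using a stand-in for the sentiment value so it runs without textblob installed. TextBlob's sentiment is a namedtuple with polarity and subjectivity fields; the Sentiment class and the attach_sentiment helper below are hypothetical names used only for illustration. Converting the namedtuple to a dict is optional, but it keeps the field names in the JSON output instead of a bare two-element list.

```python
import json
from collections import namedtuple

# Stand-in for the value of TextBlob(...).sentiment (assumption: same two
# fields, polarity and subjectivity).
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def attach_sentiment(all_data, sentiment):
    """Return a copy of the tweet dict with the sentiment stored under one key."""
    enriched = dict(all_data)
    # _asdict() keeps the field names; a bare namedtuple serializes as a list
    enriched["sentiment"] = dict(sentiment._asdict())
    return enriched


tweet = {"id": 1, "text": "Test tweet"}
enriched = attach_sentiment(tweet, Sentiment(polarity=0.5, subjectivity=0.25))
line = json.dumps(enriched)
```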

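Finally, you aren't writing the data to the file correctly. I'm assuming you want the file to be a collection of JSON entries rather than a single JSON document. What you are doing is writing the supplied string data to the end of the file, with no newline, rather than any updated dictionary you may have. You probably want to write the all_data dictionary to the file as a JSON object.

An example fix for the points above: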
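The "collection of JSON entries" layout can be sketched as one JSON object per line. This is only an illustration under that assumption; append_record is a hypothetical helper, and a temporary directory is used so the example does not touch a real tweets.json.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one dict to the file as a single JSON line."""
    with open(path, "a", encoding="utf-8") as tf:
        # json.dumps emits no raw newlines, so each record stays on one line
        tf.write(json.dumps(record) + "\n")


path = os.path.join(tempfile.mkdtemp(), "tweets.json")
append_record(path, {"text": "first", "sentiment": [0.1, 0.2]})
append_record(path, {"text": "second", "sentiment": [0.0, 0.0]})

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Each line can then be parsed independently with json.loads, which is convenient for a file that keeps growing while the stream runs.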

import json

from textblob import TextBlob
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener

# consumer key, consumer secret, access token, access secret.
consumer_key = "x"
consumer_secret = "x"
access_token = "x"
access_token_secret = "x"


class StdOutlistener(StreamListener):
    def on_data(self, data):
        all_data = json.loads(data)
        tweet = TextBlob(all_data["text"])

        # Add the sentiment data to all_data
        all_data['sentiment'] = tweet.sentiment

        print(tweet)
        print(tweet.sentiment)

        # Open json text file to save the tweets
        with open('tweets.json', 'a') as tf:
            # Write a new line
            tf.write('\n')

            # Write the json data directly to the file
            json.dump(all_data, tf)
            # Alternatively: tf.write(json.dumps(all_data))

        return True

    def on_error(self, status):
        print(status)


auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

twitterStream = Stream(auth, StdOutlistener())
twitterStream.filter(languages=["en"], track=["Test"])

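【Discussion】:

Hi, thank you! This is exactly what I was looking for, and a great explanation. About the json data: I didn't know that writing a new line was necessary. But your assumption is correct; I don't want a single json document.
It's just that Python doesn't automatically write a new line when appending. If no new line were written, you would simply end up with one very long line like {<tweet_data>}{<other_tweet_data>}... Another solution is to always make sure you write a newline at the end of the data, which gives essentially the same result.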
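To illustrate why the separator matters when the file is read back, here is a small sketch using in-memory strings instead of a file. The variable names are made up for the example.

```python
import json

records = [{"text": "a"}, {"text": "b"}]

# Without separators the objects run together, and the result is not valid JSON.
glued = "".join(json.dumps(r) for r in records)
try:
    json.loads(glued)
    glued_ok = True
except json.JSONDecodeError:
    glued_ok = False

# With one newline after each object, every line parses on its own.
separated = "".join(json.dumps(r) + "\n" for r in records)
parsed = [json.loads(line) for line in separated.splitlines() if line.strip()]
```

Whether the newline goes before or after each record makes little difference for reading, as long as blank lines are skipped as in the list comprehension above.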