【Question Title】: How to get all attributes in .csv from the twitter data in Python?
【Posted】: 2020-12-23 08:02:47
【Question Description】:

I am trying to get the data of my Twitter account and export it to a .csv file. I have the code below for this. With it I only get 3 attributes: id, created_at and text. I want all the attributes in the .csv file. How can I do that?

Thanks in advance.

import csv

import tweepy

# consumer_key, consumer_secret, access_key and access_secret are assumed to be
# defined elsewhere with your own Twitter app credentials.


def get_all_tweets(screen_name):
    # Twitter only allows access to a user's most recent 3,200 tweets with this method

    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    # initialize a list to hold all the tweepy Tweets
    alltweets = []

    # make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    # keep grabbing tweets until there are no tweets left to grab
    while new_tweets:
        # save the current batch
        alltweets.extend(new_tweets)

        # all subsequent requests use max_id (oldest id less one) to prevent duplicates
        oldest = alltweets[-1].id - 1
        print(f"...{len(alltweets)} tweets downloaded so far; getting tweets before {oldest}")
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

    # transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]

    # write the csv (newline='' prevents blank lines between rows on Windows)
    with open(f'new_{screen_name}_tweets.csv', 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(outtweets)
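The max_id pagination in this loop can be separated from the Twitter client so it can be checked offline. The sketch below is a minimal, hypothetical version: fetch_page stands in for api.user_timeline (it takes a max_id, or None for the first request, and returns a newest-first page), and tweets are modeled as plain dicts with an "id" key rather than tweepy Status objects.

```python
from typing import Callable, Dict, List, Optional, Sequence

Page = Sequence[Dict[str, int]]


def paginate_by_max_id(fetch_page: Callable[[Optional[int]], Page]) -> List[Dict[str, int]]:
    """Collect every item from a newest-first timeline using max_id paging.

    fetch_page is a hypothetical stand-in for api.user_timeline: given a max_id
    (None on the first call) it returns up to one page of items whose "id" is
    <= max_id, newest first, and an empty page once nothing older is left.
    """
    collected: List[Dict[str, int]] = []
    max_id: Optional[int] = None
    while True:
        page = fetch_page(max_id)
        if not page:
            # Empty page: nothing older remains. Checking before indexing also
            # covers an account with no tweets at all.
            break
        collected.extend(page)
        # max_id is inclusive, so step one below the oldest id seen to avoid duplicates.
        max_id = page[-1]["id"] - 1
    return collected


if __name__ == "__main__":
    # Fake timeline with ids 10 down to 1, served three at a time.
    timeline = [{"id": i} for i in range(10, 0, -1)]

    def fake_fetch(max_id: Optional[int]) -> Page:
        eligible = [t for t in timeline if max_id is None or t["id"] <= max_id]
        return eligible[:3]

    print([t["id"] for t in paginate_by_max_id(fake_fetch)])
```

To use it with tweepy you would adapt fetch_page to call api.user_timeline (omitting max_id on the first call) and read tweet.id instead of item["id"].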

【Discussion】:

    Tags: python web-scraping twitter tweepy


    【Solution 1】:

    I think this is because you are only extracting those three fields from each result.

    The reference for the resulting Status object is here

    import csv

    import tweepy


    def get_all_tweets(screen_name):
        # Twitter only allows access to a user's most recent 3,200 tweets with this method

        # authorize twitter, initialize tweepy (credentials must be defined elsewhere)
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_key, access_secret)
        api = tweepy.API(auth)

        # initialize a list to hold all the tweepy Tweets
        alltweets = []

        # make initial request for most recent tweets (200 is the maximum allowed count)
        new_tweets = api.user_timeline(screen_name=screen_name, count=200)

        # keep grabbing tweets until there are no tweets left to grab
        while new_tweets:
            # save the current batch
            alltweets.extend(new_tweets)

            # all subsequent requests use max_id (oldest id less one) to prevent duplicates
            oldest = alltweets[-1].id - 1
            print(f"...{len(alltweets)} tweets downloaded so far; getting tweets before {oldest}")
            new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

        # transform the tweepy tweets into a 2D array that will populate the csv
        outtweets = [[tweet.created_at, tweet.id, tweet.id_str, tweet.text, tweet.entities,
                      tweet.source, tweet.source_url,
                      tweet.in_reply_to_status_id, tweet.in_reply_to_status_id_str,
                      tweet.in_reply_to_user_id, tweet.in_reply_to_user_id_str,
                      tweet.in_reply_to_screen_name,
                      tweet.user.id, tweet.user.name,  # fields of the nested User object
                      tweet.geo, tweet.coordinates, tweet.place, tweet.contributors,
                      tweet.is_quote_status, tweet.retweet_count, tweet.favorite_count,
                      tweet.favorited, tweet.retweeted, tweet.lang]
                     for tweet in alltweets]

        # write the csv (newline='' prevents blank lines between rows on Windows)
        with open(f'new_{screen_name}_tweets.csv', 'w', encoding='utf-8', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(["created_at", "id", "id_str", "text", "entities",
                             "source", "source_url",
                             "in_reply_to_status_id", "in_reply_to_status_id_str",
                             "in_reply_to_user_id", "in_reply_to_user_id_str",
                             "in_reply_to_screen_name",
                             "user_id", "user_name",
                             "geo", "coordinates", "place", "contributors",
                             "is_quote_status", "retweet_count", "favorite_count",
                             "favorited", "retweeted", "lang"])
            writer.writerows(outtweets)
    

    For the user, since tweet.user is itself an object, you access its attributes by appending a dot and the attribute name, for example tweet.user.id or tweet.user.name.
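Column names can also be treated as dotted attribute paths, so "user.name" resolves to tweet.user.name. A minimal sketch, assuming a missing attribute should become an empty cell instead of raising AttributeError (as happened with possibly_sensitive in the comments below); get_path, write_rows and COLUMNS are illustrative names, and types.SimpleNamespace stands in for a tweepy Status.

```python
import csv
import io
from types import SimpleNamespace
from typing import Any, Iterable, Sequence, TextIO

# Illustrative column list: each entry is an attribute path on a Status object.
COLUMNS = ["id_str", "created_at", "text", "user.id", "user.name", "possibly_sensitive"]

_MISSING = object()  # sentinel distinguishing "absent" from a real None value


def get_path(obj: Any, dotted: str, default: Any = None) -> Any:
    """Follow a dotted path such as "user.name"; return default if any step is absent."""
    for name in dotted.split("."):
        obj = getattr(obj, name, _MISSING)
        if obj is _MISSING:
            return default
    return obj


def write_rows(tweets: Iterable[Any], columns: Sequence[str], out: TextIO) -> None:
    """Write a header plus one row per tweet; header and cells come from the same list."""
    writer = csv.writer(out)
    writer.writerow(columns)
    for tweet in tweets:
        # csv.writer writes None as an empty string, so absent fields become blank cells.
        writer.writerow([get_path(tweet, col) for col in columns])


if __name__ == "__main__":
    # Fake tweet without possibly_sensitive, mimicking the AttributeError case.
    fake = SimpleNamespace(id_str="1", created_at="2020-12-23", text="hello",
                           user=SimpleNamespace(id=42, name="alice"))
    buf = io.StringIO()
    write_rows([fake], COLUMNS, buf)
    print(buf.getvalue())
```

With real data you would call write_rows(alltweets, COLUMNS, f) inside the existing with open(...) block. Keeping header and row values in one list also means the two can no longer drift out of order.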

    The list of attribute names can be found in the Twitter API documentation.
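If you want literally every field rather than a hand-picked list, another option is to start from the raw JSON payload. Each tweepy model keeps it as _json (visible in the User(_api=, _json={...}) output quoted in the comments); the leading underscore marks it as an internal attribute, so this relies on tweepy's current behavior. The sketch below flattens nested objects into dotted column names and uses the union of keys as the header; flatten and write_all_fields are hypothetical helpers, not tweepy functions.

```python
import csv
import io
from typing import Any, Dict, Iterable, List, TextIO


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into dotted keys, e.g. {"user": {"id": 1}} -> {"user.id": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            # Lists (e.g. hashtags inside entities) stay in one cell, written via str().
            flat[name] = value
    return flat


def write_all_fields(records: Iterable[Dict[str, Any]], out: TextIO) -> List[str]:
    """Write every flattened field to CSV and return the column names used."""
    rows = [flatten(r) for r in records]
    # Union of keys in first-seen order; records lacking a field get a blank cell.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return columns


if __name__ == "__main__":
    # Two fake payloads; only the second has "lang".
    sample = [
        {"id": 1, "user": {"id": 42, "name": "alice"}},
        {"id": 2, "lang": "en", "user": {"id": 7, "name": "bob"}},
    ]
    buf = io.StringIO()
    print(write_all_fields(sample, buf))
    print(buf.getvalue())
```

Applied to the question's data this would be something like write_all_fields((t._json for t in alltweets), f) inside a with open(..., newline='') block. Expect a large number of columns, since nested objects such as user and entities are expanded, and created_at stays in Twitter's original string format.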

    【Comments】:

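    • Hi, thanks for the reply. But it gives the following error: name 'created_at' is not defined
    • I updated the code; I had forgotten the tweet. prefix in the array. Let me know if you manage to fix it.
    • Updated again, let me know if it helps =)
    • Thanks a lot. Now it gives the following error at the end: AttributeError: 'Status' object has no attribute 'possibly_sensitive'. So I removed that attribute and tried again, and the csv file was generated. But in the user column it shows User(_api=, _json={'id': 623205342, 'id_str': '623205342'...... information.
    • Yes, because user is an object that contains more information. You should check the documentation on what information the User object contains, and then store the data you want in the csv. The docs are here: tweepy, user object in twitter docs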