How to get all attributes in .csv from the twitter data in Python? (Answers)

【Question title】: How to get all attributes in .csv from the twitter data in Python?
【Posted】: 2020-12-23 08:02:47
【Question】:

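I am trying to get the data from my Twitter account and export it to a .csv file. I have the code below for this. With this code I only get 3 attributes: id, created_at and text. I want all of the attributes in the .csv file. How can I do that?

Thanks in advance.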

import csv

import tweepy

# Placeholder credentials: replace with the keys and tokens of your own Twitter app
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_key = "YOUR_ACCESS_TOKEN"
access_secret = "YOUR_ACCESS_TOKEN_SECRET"

def get_all_tweets(screen_name):
    # Twitter only allows access to a user's most recent 3200 tweets with this method
    
    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    
    #initialize a list to hold all the tweepy Tweets
    alltweets = []  
    
    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=200)
    
    #save most recent tweets
    alltweets.extend(new_tweets)
    
    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1
    
    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print(f"getting tweets before {oldest}")
        
        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)
        
        #save most recent tweets
        alltweets.extend(new_tweets)
        
        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        
        print(f"...{len(alltweets)} tweets downloaded so far")
    
    #transform the tweepy tweets into a 2D array that will populate the csv 
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]
    
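    # write the csv (newline='' keeps the csv module from writing blank rows on Windows)
    with open(f'new_{screen_name}_tweets.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        writer.writerows(outtweets)


if __name__ == "__main__":
    # Hypothetical account name: replace with your own screen name
    get_all_tweets("your_screen_name")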
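The max_id paging loop in the code above can be pulled out into a small function and checked offline. This is a minimal sketch, not part of tweepy's API: `fetch_all` and `fetch_page` are hypothetical names, and `fetch_page(count=..., max_id=...)` stands in for a wrapper around `api.user_timeline` that leaves out `max_id` when it is `None`. Unlike the loop above, it also copes with an account that has no tweets at all, where `alltweets[-1]` would raise an IndexError.

```python
def fetch_all(fetch_page, count=200, id_of=lambda item: item.id):
    """Collect every item from a newest-first, max_id-paged timeline.

    fetch_page(count=..., max_id=...) is a hypothetical callable that returns
    a list of items; max_id is None on the first call.
    """
    items = []
    max_id = None
    while True:
        page = fetch_page(count=count, max_id=max_id)
        if not page:  # empty page: nothing older is left (or no tweets at all)
            break
        items.extend(page)
        # ask only for items strictly older than the oldest one seen so far
        max_id = id_of(page[-1]) - 1
    return items
```

With tweepy, `fetch_page` could be something like `lambda count, max_id: api.user_timeline(screen_name=screen_name, count=count, **({"max_id": max_id} if max_id is not None else {}))`. Like the original loop, it stops after the first empty response.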

【Question discussion】:

Tags: python web-scraping twitter tweepy

【Solution 1】:

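I think this is because you only extract those three fields (id_str, created_at and text) from each result.

The reference for the returned Status object is here.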

def get_all_tweets(screen_name):
    # Twitter only allows access to a user's most recent 3200 tweets with this method
    
    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    
    #initialize a list to hold all the tweepy Tweets
    alltweets = []  
    
    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=200)
    
    #save most recent tweets
    alltweets.extend(new_tweets)
    
    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1
    
    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print(f"getting tweets before {oldest}")
        
        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)
        
        #save most recent tweets
        alltweets.extend(new_tweets)
        
        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        
        print(f"...{len(alltweets)} tweets downloaded so far")
    
    # transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.created_at, tweet.id, tweet.id_str, tweet.text, tweet.entities,
                  tweet.source, tweet.source_url, tweet.in_reply_to_status_id,
                  tweet.in_reply_to_status_id_str, tweet.in_reply_to_user_id,
                  tweet.in_reply_to_user_id_str, tweet.in_reply_to_screen_name,
                  tweet.user.id, tweet.user.name, tweet.geo, tweet.coordinates, tweet.place,
                  tweet.contributors, tweet.is_quote_status, tweet.retweet_count,
                  tweet.favorite_count, tweet.favorited, tweet.retweeted, tweet.lang]
                 for tweet in alltweets]

    # write the csv (newline='' keeps the csv module from writing blank rows on Windows)
    with open(f'new_{screen_name}_tweets.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["created_at", "id", "id_str", "text", "entities", "source",
                         "source_url", "in_reply_to_status_id", "in_reply_to_status_id_str",
                         "in_reply_to_user_id", "in_reply_to_user_id_str",
                         "in_reply_to_screen_name", "user_id", "user_name", "geo",
                         "coordinates", "place", "contributors", "is_quote_status",
                         "retweet_count", "favorite_count", "favorited", "retweeted", "lang"])
        writer.writerows(outtweets)

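Because user is itself an object, you access its attributes by appending a dot and the attribute name, for example tweet.user.id or tweet.user.name.

The full list of attribute names can be found in the Twitter API documentation.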
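Rather than listing every field by hand, you can write out whatever the API returned. This is a minimal sketch, assuming tweepy Status objects that keep the raw API payload in a `_json` attribute (the `User(_api=, _json={...})` output quoted in the discussion shows that attribute). Nested objects such as `user` are flattened into dotted columns like `user.id` and `user.name`, and fields missing from some tweets (for example `possibly_sensitive`) become empty cells. `flatten`, `tweets_to_rows`, `write_csv` and `tweets_to_csv_text` are hypothetical helper names.

```python
import csv
import io


def flatten(d, parent_key="", sep="."):
    """Flatten nested dicts: {"user": {"id": 1}} -> {"user.id": 1}.

    Lists (e.g. entities.hashtags) are left as-is and end up as their repr.
    """
    flat = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


def tweets_to_rows(tweets):
    """Return (fieldnames, rows) covering every attribute of every tweet.

    Each item may be a tweepy Status (its raw payload is read from `_json`)
    or an already-decoded dict.
    """
    rows = [flatten(getattr(t, "_json", t)) for t in tweets]
    # Union of keys in first-seen order: optional fields such as
    # possibly_sensitive are absent from some tweets.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    return fieldnames, rows


def write_csv(fieldnames, rows, f):
    """Write rows to an open text file; missing fields become empty cells."""
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


def tweets_to_csv_text(tweets):
    """Convenience wrapper that returns the CSV as a string."""
    buf = io.StringIO()
    write_csv(*tweets_to_rows(tweets), buf)
    return buf.getvalue()
```

Inside get_all_tweets, the outtweets list and the hand-written header could then be replaced by `fieldnames, rows = tweets_to_rows(alltweets)` followed by `write_csv(fieldnames, rows, f)` on a file opened with `newline=''`. The column set then follows whatever Twitter returns, so it can differ between runs.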

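【Discussion】:

Hi, thanks for your reply. But it gives the following error: name 'created_at' is not defined
I updated the code; I had forgotten the tweet. prefix in the array. Let me know if you manage to fix it.
Updated again, let me know if it helps =)
Thank you very much. Now it gives the following error at the end: AttributeError: 'Status' object has no attribute 'possibly_sensitive'. So I removed that attribute and tried again, and the .csv file was generated. But in the user column it shows User(_api=, _json={'id': 623205342, 'id_str': '623205342'...... information.
Yes, because user is an object that contains more information. You should check the documentation for what information the User object contains, and then store the data you want in the csv. The docs are here: tweepy and user object in twitter docs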
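If you only want selected columns, including a few from the nested User object, a dotted-path lookup with a default avoids the AttributeError seen above for fields like `possibly_sensitive` that not every Status has. This is a minimal sketch: `get_path`, `select_fields`, `FIELDS` and the `demo` data are hypothetical, with `SimpleNamespace` standing in for tweepy's Status and User objects so it runs offline.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel, so attributes that are legitimately None survive


def get_path(obj, dotted, default=""):
    """Follow a dotted attribute path such as "user.name".

    Returns `default` if any step is missing (e.g. possibly_sensitive on a
    tweet without a link) instead of raising AttributeError.
    """
    for part in dotted.split("."):
        obj = getattr(obj, part, _MISSING)
        if obj is _MISSING:
            return default
    return obj


def select_fields(tweets, fields):
    """One row per tweet, one cell per dotted field name."""
    return [[get_path(t, f) for f in fields] for t in tweets]


# Hypothetical column choice; the header row can reuse the same names.
FIELDS = ["id_str", "created_at", "text", "possibly_sensitive",
          "user.id", "user.name", "user.screen_name", "user.followers_count"]

# Offline stand-in shaped like a Status with a nested User.
demo = [SimpleNamespace(id_str="1", created_at="2020-12-23", text="hi", geo=None,
                        user=SimpleNamespace(id=7, name="a", screen_name="a_1",
                                             followers_count=3))]
```

In get_all_tweets this would replace the outtweets comprehension with `select_fields(alltweets, FIELDS)` and the header with `writer.writerow(FIELDS)`, so the columns and the header can never drift apart.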