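【Question title】: Error when converting Tweet object to pandas data frame
【Posted】: 2022-01-27 01:04:32
【Question description】:

I extracted tweets using the tweepy library in Python. However, when I try to convert the list object "tweets" into a pandas DataFrame, I get an error. I have looked around and could not find a solution.

Any help would be much appreciated.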

def getTweets():
    client = getClient()
    tweets = client.search_all_tweets(query=search_tweets,
                                      tweet_fields=['created_at', 'author_id'],
                                      start_time='2017-01-01T00:00:00Z',
                                      max_results=10,
                                      expansions=['attachments.media_keys','author_id'],
                                      media_fields=['preview_image_url','url'],
                                      user_fields=['description'])

    return tweets

tweets = getTweets()
len(tweets[0]) # 10: number of tweets in the data field
type(tweets) # <class 'tweepy.client.Response'>
# The tweets object looks as follows. Note: I had to remove URLs from the tweets object because Stack Overflow did not allow me to post it with URLs.
print(tweets)
Response(data=[<Tweet id=1469341674786951170 text=I don't know. I kinda feel like it's more cowardly to tell a bald-faced lie about me and then block me so I can't respond to the lie.
The only party I accuse of stealing any presidential elections are the Republicans, in 2000, bc they did. Not sure how that leads to Trump fan>, <Tweet id=1469322836481302529 text=RT @almostjingo: She blocked me but @cher blamed Trump of course IN ALL CAPS AS USUAL>, <Tweet id=1469317002678484992 text=RT @almostjingo: She blocked me but @cher blamed Trump of course IN ALL CAPS AS USUAL>, <Tweet id=1469310335412629507 text=RT @almostjingo: She blocked me but @cher blamed Trump of course IN ALL CAPS AS USUAL>, <Tweet id=1469309977101684739 text=RT @mmasnick: Yesterday, I pointed out to @repthomasmassie that, according to Knight v. Trump, it violates the 1st Amendment for a governme…>, <Tweet id=1469309777998135304 text=RT @almostjingo: She blocked me but @cher blamed Trump of course IN ALL CAPS AS USUAL>, <Tweet id=1469271536821538824 text=RT @almostjingo: She blocked me but @cher blamed Trump of course IN ALL CAPS AS USUAL, <Tweet id=1469249652398694405 text=RT @almostjingo: She blocked me but @cher blamed Trump of course IN ALL CAPS AS USUAL, <Tweet id=1469244476073914370 text=RT @mmasnick: Yesterday, I pointed out to @repthomasmassie that, according to Knight v. 
Trump, it violates the 1st Amendment for a governme…>, <Tweet id=1469203509484523520 text=RT @almostjingo: She blocked me but @cher blamed Trump of course IN ALL CAPS AS USUAL>], includes={'media': [<Media media_key=3_1469341668461535238 type=photo>, <Media media_key=3_1469341672307716101 type=photo>, <Media media_key=3_1469091713536851968 type=photo>], 'users': [<User id=1373600866129698820 name=Muad'dib's Most Loyal, Least Competent Fedaykin username=NotmesomeoneLs>, <User id=4919111699 name=Joe Gzel username=GzelJoe>, <User id=1436345175895326720 name=ARES username=AresofRome>, <User id=17108219 name=Donovan the American username=don_arete>, <User id=917284339 name=Babarushka username=Babarushka>, <User id=30277795 name=Nathalie username=divnanata>, <User id=832679454354796544 name=unifyforfreedom username=unifyforfreedom>, <User id=4721727745 name=Lake girl - part deux username=Justalakegirl>, <User id=928086587115802624 name=B A Dusername username=DusernameA>, <User id=958865926085951488 name=Donna Loco username=locantore_k>]}, errors=[], meta={'newest_id': '1469341674786951170', 'oldest_id': '1469203509484523520', 'result_count': 10, 'next_token': 'b26v89c19zqg8o3fpdy9coohd8mivk2mmv9g6grt5hy7x'})
# Convert to dataframe
tweet_df = pd.DataFrame(tweets)
print(tweet_df)

I get an error when I try to print tweet_df.

    pprint_thing(next(s), _nest_lvl + 1, max_seq_items=max_seq_items, **kwds)
StopIteration

Update:

print([tweet_x for tweet_x in tweets])
[[<Tweet id=1453409986961346561 text=@mondospooky ohhhh i need to rewatch attack the block for sure… and maybe a pan’s labyrinth rewatch i love that one so much even tho it scares me. it helps that doug jones is there being sexy>, 
<Tweet id=1444106508510248964 text=@crypto1701 @BernfriedI @CptMutant @theolivefilm @SamRorie1 @ShadowMann9 @CaptainTeag @BurnettRM @MattKleinman @waytoomuchbeer @Stelzi79 @JeanRoddenberry @OliverFranke10 @A_Drop_of_Truth @SothManigan @LouiseSparrow @JosephDickerson @ScythianRaven @oldclam @SaraMich Yep, I don’t know him personally, but I always liked Doug Jones. That said, ya know, @StarTrek hasn’t blocked me, in spite of how critical I am regarding DSC/STP. Perhaps that’s because my criticisms are always civil and never include personal attacks, name-calling or profanity?>, 
<Tweet id=1441163731908448258 text=@DougJones @JohnHMerrill John blocked me because I gave him so much hell for this. It was ridiculous to give the idiot a meeting &amp; not surprisingly, John got played by the nut job.>, 
<Tweet id=1439091171976904704 text=@glennkirschner2 @petersoby Sally Yates is my preference ... I don't think Doug Jones would be tough enuf, he's weaker than #MAGAtMerrick ... I know y'all hate when I tweet that and Tony Ahonen blocked me for it but he's not doing anything for us. Just placating, appeasing us ...>, 
<Tweet id=1436832370267668482 text=@DougJones Merrill blocked me because he didn’t like me speaking the truth about the pandemic and my high risk son’s freedoms being taken away due to the selfishness of others.>, 
<Tweet id=1436830208041099276 text=@DougJones @BurkhalterEddie Can’t read this because Merrill blocked me and hundreds of others. It’s illegal and he knows it.>, 
<Tweet id=1422172212811927553 text=@MasonMornings @DougJones @juliemason @USATODAY Okay, I will say that this audience wants you to quote .  I am enjoying the conversation, but saying it is in the Constitution without corroboration is a block for me.>, 
<Tweet id=1411093577946611715 text=@RonnieMotes8 @arapaho415 @wackiejalsh @clearing_fog @Belltollstrump @carrybeyond @MountainsStars @RighteousBabe4 @satirehat @Standupchai @100FrogLegs @DougJones She blocked me for that, too. She couldn’t defend the BS she was spewing so she blocked me and then tweeted behind my back.>, 
<Tweet id=1411093280759160833 text=@YDanasmithdutra @arapaho415 @wackiejalsh @clearing_fog @Belltollstrump @carrybeyond @MountainsStars @RighteousBabe4 @satirehat @Standupchai @100FrogLegs @DougJones She blocked me for that.>], {'users': [<User id=767832373950054400 name=smelly padmé amidala username=feraIdanvers>, <User id=815998516451426306 name=Chuck Marble username=C_Marb>, <User id=219497320 name=George May username=George3May>, <User id=91008852 name=Straight Blue The Only Way 2022 username=gato918>, <User id=146456755 name=Macy username=toifrogs>, <User id=2762852093 name=Meadowbrookwoman username=alcacountry>, <User id=2361855676 name=Joey Tomatoes username=joeytomatoes>, <User id=769702230 name=Dana Smith Dutra = NO Pantyhose or Plastics username=YDanasmithdutra>, <User id=1293230422072205316 name=Rhonda Harbison username=RonnieMotes8>], 'media': [<Media media_key=3_1444106504781529089 type=photo>]}, [], {'newest_id': '1453409986961346561', 'oldest_id': '1411093280759160833', 'result_count': 9, 'next_token': 'b26v89c19zqg8o3fpdg8lorfxo8kinyj76ndtb56vksjh'}]


【Question discussion】:

    Tags: python pandas dataframe twitter tweepy


    【Solution 1】:

    "tweets" is a generator object, not a list. You cannot convert it to a DataFrame directly unless you iterate over it first. Try this:

    tweet_df = pd.DataFrame([tweet_x._json for tweet_x in tweets])
    

    Note that this may take some time if you increase max_results.
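The printed output in the question suggests that `tweets` is a named tuple with the four fields `data`, `includes`, `errors` and `meta`, so the tweet objects would sit in `tweets.data` rather than in `tweets` itself. Below is a minimal sketch of flattening that field into one dict per tweet. `Response` here is a local stand-in that only mirrors those four field names, `FakeTweet` and `response_to_rows` are hypothetical names used for illustration, and the sample values are made up. It assumes the tweet objects expose the requested fields as attributes.

```python
from collections import namedtuple

# Local stand-in: field names copied from the printed Response in the question.
Response = namedtuple("Response", ("data", "includes", "errors", "meta"))


class FakeTweet:
    """Hypothetical stand-in for a tweet object with attribute access."""

    def __init__(self, id, text, author_id, created_at):
        self.id = id
        self.text = text
        self.author_id = author_id
        self.created_at = created_at


def response_to_rows(response, fields=("id", "text", "author_id", "created_at")):
    """Return one dict per tweet in response.data; missing fields become None."""
    tweets = response.data or []  # assumption: data may be None when nothing matched
    return [{name: getattr(tweet, name, None) for name in fields} for tweet in tweets]


# Made-up sample data, only to show the shape of the result.
response = Response(
    data=[
        FakeTweet(1, "first tweet", 100, "2021-12-10T00:00:00Z"),
        FakeTweet(2, "second tweet", 200, "2021-12-11T00:00:00Z"),
    ],
    includes={},
    errors=[],
    meta={"result_count": 2},
)
rows = response_to_rows(response)
```

With pandas installed, `pd.DataFrame(rows)` should then produce one row per tweet with the four fields as columns.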

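    【Discussion】:

    • I tried your suggestion, but got the following error: AttributeError: 'list' object has no attribute '_json'
    • The Tweepy library does have ._json. Maybe you are using a different library. First, run this code: "print([tweet_x for tweet_x in tweets])" and tell me the output.
    • Update: the output is shown above.
    • OK, so that is the data, but it needs to be parsed. Could you try .Cursor instead of .search_all_tweets?
    • I am using Twitter API v2, which as far as I know does not support .Cursor the way API v1.1 does and uses .Paginator instead. See link: link. However, when I plug in tweepy.Paginator(client.search_all_tweets, query=search_tweets ... user_fields=['description']), I get a rate limit error: tweepy.errors.TooManyRequests: 429 Too Many Requests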
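A 429 response means the request quota for the current window was exceeded. One plausible cause, though the thread does not confirm it, is that paginating sends page requests back to back. The sketch below follows the `next_token` value seen in the `meta` dict of the question's output and pauses between pages. `collect_all_pages` and `fetch_page` are hypothetical names, and `fetch_page` stands in for whatever call returns a `(data, meta)` pair for a given token. tweepy's `Client` also accepts a `wait_on_rate_limit` argument, which may be the simpler route.

```python
import time


def collect_all_pages(fetch_page, pause_seconds=1.0, max_pages=5, sleep=time.sleep):
    """Call fetch_page(token) repeatedly, following meta['next_token'].

    Pauses pause_seconds before every page except the first and stops
    after max_pages pages or when no next_token is returned.
    """
    items, token = [], None
    for page_number in range(max_pages):
        if page_number:
            sleep(pause_seconds)  # space requests out to stay under the limit
        data, meta = fetch_page(token)
        items.extend(data or [])
        token = meta.get("next_token")
        if token is None:
            break
    return items


# Fake two-page source keyed by token, so the sketch runs without network access.
pages = {
    None: (["a", "b"], {"next_token": "t1"}),
    "t1": (["c"], {}),
}
pauses = []  # records the pauses instead of actually sleeping
result = collect_all_pages(lambda token: pages[token], sleep=pauses.append)
```

Passing `sleep` as a parameter keeps the loop easy to try out without waiting; in real use the default `time.sleep` applies.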