Scrape tweets from a list of hashtags using snscrape

【Question title】: Scrape tweets from a list of hashtags using snscrape
【Posted】: 2022-09-27 18:46:29
【Question】:

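I am trying to scrape tweets using snscrape. I can already scrape by location and by tweets containing a specific phrase. My question is how to scrape tweets that may contain any of the hashtags in a list I created. For example, suppose I want to look for the hashtags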

hashtags = ('data analytics', 'data science', 'machine learning')

I want to search in the OR sense: a tweet may contain just one of the hashtags in the list, some combination of them, or all of them.

Tags: filter tweets

【Solution 1】:

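To scrape tweets by hashtag, you have to search for them as #hashtag. In your example, that means something like #dataanalytics #datascience. If you want an OR between them in your search, just add it: (#dataanalytics OR #datascience). Below is a function that scrapes tweets and returns a DataFrame with some of the attributes I was interested in. n_tweet sets an upper limit on the number of tweets you want. After the function I have also added a sample call.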

import pandas as pd
import snscrape.modules.twitter as sntwitter


def tweet_scraper(query, n_tweet):
    """Scrape up to n_tweet tweets matching query and return them as a DataFrame."""
    attributes_container = []

    for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
        # Stop once n_tweet tweets have been collected.
        if i >= n_tweet:
            break

        # One row per tweet: user attributes first, then tweet attributes.
        attributes_container.append([tweet.user.username,
                                     tweet.user.verified,
                                     tweet.user.created,
                                     tweet.user.followersCount,
                                     tweet.user.friendsCount,
                                     tweet.retweetCount,
                                     tweet.lang,
                                     tweet.date,
                                     tweet.likeCount,
                                     tweet.sourceLabel,
                                     tweet.id,
                                     tweet.content,
                                     tweet.hashtags,
                                     tweet.conversationId,
                                     tweet.inReplyToUser,
                                     tweet.coordinates,
                                     tweet.place])

    # Column names follow the order of the attributes appended above.
    return pd.DataFrame(attributes_container, columns=["User",
                                                       "verified",
                                                       "Date_Created",
                                                       "Follows_Count",
                                                       "Friends_Count",
                                                       "Retweet_Count",
                                                       "Language",
                                                       "Date_Tweet",
                                                       "Number_of_Likes",
                                                       "Source_of_Tweet",
                                                       "Tweet_Id",
                                                       "Tweet",
                                                       "Hashtags",
                                                       "Conversation_Id",
                                                       "In_reply_To",
                                                       "Coordinates",
                                                       "Place"])


# Sample call: tweets with either hashtag in the given date range, capped at 500000.
example = tweet_scraper('(#example OR #suggestion) since:2020-09-01 until:2022-09-01', 500000)
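
To tie this back to the list in the question, the OR query could also be assembled from the tuple rather than typed by hand. The following is only a minimal sketch: build_hashtag_query is a hypothetical helper name (it is not part of snscrape), and it assumes each phrase becomes a hashtag simply by removing its spaces, which may not match how people actually tag their tweets.

```python
def build_hashtag_query(phrases, since=None, until=None):
    """Join phrases into a '(#a OR #b)' search string (hypothetical helper)."""
    # Assumption: a hashtag is the phrase with all whitespace removed.
    tags = ["#" + "".join(phrase.split()) for phrase in phrases]
    query = "(" + " OR ".join(tags) + ")"
    # Optional date bounds, written in the same form as the sample call above.
    if since:
        query += f" since:{since}"
    if until:
        query += f" until:{until}"
    return query


hashtags = ('data analytics', 'data science', 'machine learning')
query = build_hashtag_query(hashtags, since="2020-09-01", until="2022-09-01")
print(query)
# (#dataanalytics OR #datascience OR #machinelearning) since:2020-09-01 until:2022-09-01
```

The resulting string can then be passed as the first argument to tweet_scraper in place of a hand-written query.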

【Discussion】: