[Posted]: 2021-08-06 07:19:11
[Question]:
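I want to extract every row whose author.description contains the keyword "doctor". I think something like .iloc could do this, but I'm not sure how to select this particular column. Any help is appreciated.
Note: I'm using the Twitter API v2. If anyone knows a trick that avoids opening the file and deleting columns, please let me know. I tried the following in query_params:
-bio:doctor and -bio_contains:doctor, but neither worked.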
import requests
import expansions
import os
import json
import pandas as pd
import csv
import sys
import time

bearer_token = "bearer token"
search_url = "https://api.twitter.com/2/tweets/search/all"
query_params = {
    'query': 'vaccine -is:retweet -is:verified -baby -lotion -shampoo lang:en has:geo place_country:US',
    'tweet.fields': 'created_at,lang,text,geo,author_id,id,public_metrics,referenced_tweets',
    'expansions': 'geo.place_id,author_id',
    'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type',
    'user.fields': 'description,username,id',
    'start_time': '2021-01-20T00:00:01.000Z',
    'end_time': '2021-02-17T23:30:00.000Z',
    'max_results': '10',
}

def create_headers(bearer_token):
    # Build the Authorization header for bearer-token authentication.
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers

def connect_to_endpoint(url, headers, params):
    # Send the GET request to the given url and fail loudly on a non-200 status.
    response = requests.request("GET", url, headers=headers, params=params)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

def main():
    headers = create_headers(bearer_token)
    json_response = connect_to_endpoint(search_url, headers, query_params)
    # Flatten the expansions before normalizing into a DataFrame.
    json_response = expansions.flatten(json_response)
    df = pd.json_normalize(json_response['data'])
    df.to_csv("myfile.csv", encoding="utf-8-sig")

if __name__ == "__main__":
    main()
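
The row selection asked about above could be sketched as follows. This is a minimal example, not a confirmed solution. It assumes the normalized DataFrame has a column named author.description, as the question describes. The sample records and the helper name filter_by_bio are hypothetical, made up for illustration. It uses a boolean mask built with Series.str.contains rather than .iloc, since .iloc selects by position and not by content.

```python
import pandas as pd

# Hypothetical sample shaped like the flattened response: each tweet carries
# a nested "author" dict, which json_normalize expands into "author.*" columns.
sample = [
    {"id": "1", "text": "Got my vaccine today",
     "author": {"username": "a", "description": "Family doctor in Ohio"}},
    {"id": "2", "text": "Vaccine appointment booked",
     "author": {"username": "b", "description": "Coffee lover"}},
    {"id": "3", "text": "Second dose done",
     "author": {"username": "c", "description": None}},
    {"id": "4", "text": "Vaccine rollout thoughts",
     "author": {"username": "d", "description": "DOCTOR of physics"}},
]

df = pd.json_normalize(sample)


def filter_by_bio(frame, keyword, column="author.description"):
    """Return the rows whose `column` contains `keyword`, ignoring case.

    na=False treats missing descriptions as non-matches, and regex=False
    matches the keyword as a literal substring.
    """
    mask = frame[column].str.contains(keyword, case=False, na=False, regex=False)
    return frame[mask]


doctors = filter_by_bio(df, "doctor")
print(doctors[["id", "author.description"]])
```

Applying the filter to df inside main() before the to_csv call would avoid reopening the file afterwards. To drop the matching rows instead of keeping them, the mask can be inverted with ~.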
[Comments]:
Tags: python pandas database csv