Python: Removing columns from a CSV (answers)

[Question title]: Python: Removing columns from a CSV
[Posted]: 2021-08-06 07:19:11
[Question]:

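I want to pull out (remove) any rows whose author.description contains the keyword "doctor". I think something like .iloc could solve this, but I'm not sure how to select this particular column. Any help is appreciated.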

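Note: I'm using Twitter API V2. If anyone knows a trick that avoids opening the file and deleting the rows afterwards, please let me know. I tried the following in query_params:
-bio:doctor and -bio_contains:doctor, but neither worked.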

import requests
import expansions
import os
import json
import pandas as pd
import csv
import sys
import time

bearer_token = "bearer token"

search_url = "https://api.twitter.com/2/tweets/search/all"

query_params = {'query': 'vaccine -is:retweet -is:verified -baby -lotion -shampoo lang:en has:geo place_country:US',
                'tweet.fields':'created_at,lang,text,geo,author_id,id,public_metrics,referenced_tweets',
                'expansions':'geo.place_id,author_id', 
                'place.fields':'contained_within,country,country_code,full_name,geo,id,name,place_type',
                'user.fields':'description,username,id',
                'start_time':'2021-01-20T00:00:01.000Z',
                'end_time':'2021-02-17T23:30:00.000Z',
                'max_results':'10'}


def create_headers(bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers


def connect_to_endpoint(url, headers, params):
    # Use the url argument rather than the global search_url
    response = requests.request("GET", url, headers=headers, params=params)

    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()


def main():
    headers = create_headers(bearer_token)
    json_response = connect_to_endpoint(search_url, headers, query_params)
    json_response = expansions.flatten(json_response) 
    df = pd.json_normalize(json_response['data'])
    df.to_csv("myfile.csv", encoding="utf-8-sig")


if __name__ == "__main__":
    main()

[Comments]:

Tags: python pandas database csv

[Solution 1]:

I think something like this should be what you're looking for?

import pandas as pd


my_data = pd.DataFrame(
    {'geo.id': ['lkajsdf', 'alksjdf', 'assssddf'], 
     'author.description': ['Hey, I am a doctor', 'I am also a doctor', 'Me? I am a lawyer']})

drop = my_data['author.description'].str.contains('doctor|hospital')
# Invert the boolean mask with ~ to keep only the rows that do NOT match
result = my_data[~drop]
result

Result

	geo.id	author.description
2	assssddf	Me? I am a lawyer
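
Real profile descriptions are often missing or capitalised inconsistently, which the small example above does not cover. The following is a minimal sketch of the same mask-and-invert approach, assuming you want a case-insensitive match and want rows with no description to be kept; the sample rows are made up for illustration.

```python
import pandas as pd

# Made-up sample data; one description is missing and casing varies
my_data = pd.DataFrame({
    "geo.id": ["a1", "b2", "c3", "d4"],
    "author.description": [
        "Hey, I am a Doctor",
        None,
        "Works at a HOSPITAL",
        "Me? I am a lawyer",
    ],
})

# Regex alternation: match either keyword
unwanted = "doctor|hospital"

# case=False ignores capitalisation; na=False treats missing
# descriptions as "no match" so the mask stays purely boolean
mask = my_data["author.description"].str.contains(unwanted, case=False, na=False)

# ~ inverts the mask: keep only rows that did not match
result = my_data[~mask]
print(result)
```

Without na=False, a missing description produces a missing value in the mask, and pandas will refuse to use it for boolean indexing.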

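[Comments]:

This looks like it searches my data for "doctor" and then stores it. Is that right? Since I don't know which author.description values my search will return, I can't specify the keywords I want to keep. I can only try to remove the ones I don't want, such as "doctor" or "hospital".
Ah, sorry. I think my edit should do what you want: rows mentioning doctor or hospital are now removed?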

[Solution 2]:

Since you are using pandas to build a DataFrame (df) from the normalized JSON data, you can do the following:

authorDescptionContainingDoctor = df[df['author.description'].str.contains('doctor')]

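This stores every row whose 'author.description' column contains the keyword 'doctor' in its own DataFrame. It does not modify the original dataset, but you can filter or further analyze the new df separately.

For a further explanation of what is happening above, see: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html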
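
Because the question asks to remove such rows rather than keep them, the same kind of mask can be inverted with ~ and applied before the CSV is written, so the file never has to be reopened. This is only a sketch: drop_rows_with_keywords is a hypothetical helper name, and the small DataFrame stands in for the result of pd.json_normalize(json_response['data']) in the question's main().

```python
import re

import pandas as pd


def drop_rows_with_keywords(df, column, keywords):
    """Return a copy of df without rows whose `column` mentions any keyword.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Escape each keyword so it is matched literally, then join with |
    pattern = "|".join(re.escape(word) for word in keywords)
    mask = df[column].str.contains(pattern, case=False, na=False)
    return df[~mask].copy()


# Stand-in for: df = pd.json_normalize(json_response['data'])
df = pd.DataFrame({
    "geo.id": ["lkajsdf", "alksjdf", "assssddf"],
    "author.description": [
        "Hey, I am a doctor",
        "I am also a doctor",
        "Me? I am a lawyer",
    ],
})

filtered = drop_rows_with_keywords(df, "author.description", ["doctor", "hospital"])

# With no path, to_csv returns the CSV text; in main() you would
# pass "myfile.csv" and encoding="utf-8-sig" as in the question
csv_text = filtered.to_csv(index=False)
print(csv_text)
```

In the question's code, the call would sit between pd.json_normalize and df.to_csv.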

[Comments]: