Delete string rows and their corresponding values from other columns

【Question title】: Delete string rows and their corresponding values from other columns
【Posted】: 2017-09-16 01:32:21
【Question】:

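Please help me figure out how to do this. I have a DataFrame. The "Indicator" column holds many different parameters (strings), but I only need "Life Satisfaction". I don't know how to delete the other indicators, such as "Dwellings without basic facilities", together with their corresponding values and countries.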

import numpy as np
import pandas as pd

oecd_bli = pd.read_csv("/Users/vladelec/Desktop/Life.csv")
df = pd.DataFrame(oecd_bli)
df.drop(df.columns[[0,2,4,5,6,7,8,9,10,11,12,13,15,16]], axis=1, inplace=True) 
# dropped the other columns that I do not need

Here is a screenshot of my DataFrame:

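【Comments】:

You don't need both statements at the top, oecd_bli = pd.read_csv("/Users/vladelec/Desktop/Life.csv") followed by df = pd.DataFrame(oecd_bli); pd.read_csv already returns a DataFrame.
Possible duplicate of "Deleting DataFrame row in Pandas based on column value"

Tags: python rows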

【Solution 1】:

You can load the data like this:

file_name = "/Users/vladelec/Desktop/Life.csv"

# Columns you want to load
keep_cols = ['Country', 'Indicator']

# pd.read_csv() will load the data into a pd.DataFrame
oecd_bli = pd.read_csv(file_name, usecols=keep_cols)

If you only want the "Life Satisfaction" rows from Indicator, you can do the following:

oecd_bli = oecd_bli[oecd_bli['Indicator'] == "Life Satisfaction"]

If you want to keep more than one indicator, you can do this:

keep_indicators = [
    "Life Satisfaction",
    "Crime Indicator",
]

oecd_bli = oecd_bli[oecd_bli['Indicator'].isin(keep_indicators)]
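The loading and filtering steps above can be tried end to end without the original file. This is only a minimal sketch: the CSV text, the `LOCATION` and `Value` columns, and all numbers are made up for illustration and stand in for Life.csv, whose real layout is not shown in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for Life.csv; the real file's columns are an assumption.
csv_text = """LOCATION,Country,Indicator,Value
AUS,Australia,Dwellings without basic facilities,1.1
AUS,Australia,Life Satisfaction,7.3
AUT,Austria,Life Satisfaction,7.0
AUT,Austria,Crime Indicator,0.4
"""

# Load only the columns of interest.
keep_cols = ["Country", "Indicator", "Value"]
oecd_bli = pd.read_csv(io.StringIO(csv_text), usecols=keep_cols)

# Keep a single indicator with ==.
life_sat = oecd_bli[oecd_bli["Indicator"] == "Life Satisfaction"]
print(life_sat)

# Keep several indicators with isin.
keep_indicators = ["Life Satisfaction", "Crime Indicator"]
several = oecd_bli[oecd_bli["Indicator"].isin(keep_indicators)]
print(several)
```

Including the value column in `keep_cols` keeps each country's number next to its indicator after filtering.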

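【Discussion】:

【Solution 2】:

@GiantsLoveDeathMetal makes a good point. In principle, you can read the raw data in as oecd_bli and then select the subset of the DataFrame that satisfies some condition.

Demo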

import pandas as pd


# Given a DataFrame of raw data
d = {
    "Country": pd.Series(["Australia", "Austria", "Fiji", "Japan"]),
    "Indicator": pd.Series(["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."]),
    "Value": pd.Series([1.1, 1.0, 2.2, 2.9]),
}

oecd_bli = pd.DataFrame(d, columns=["Country", "Indicator", "Value"])
oecd_bli

# Select rows starting with "Life" in column "Indicator"
condition = oecd_bli["Indicator"].str.startswith("Life")
subset = oecd_bli[condition]
subset

Alternatively, select the subset with label-based indexing via .loc:

subset = oecd_bli.loc[condition, :]

Here loc expects [<rows>, <columns>], so the result contains the rows that satisfy the condition, with all columns.
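Since the second slot of loc selects columns, the same call can also narrow the columns while filtering rows. A small sketch, reusing the toy data from the demo (the choice of the two columns is just for illustration):

```python
import pandas as pd

# Same toy data as the demo above.
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

condition = oecd_bli["Indicator"].str.startswith("Life")

# Rows where the condition is True, restricted to two columns.
subset = oecd_bli.loc[condition, ["Country", "Value"]]
print(subset)
```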

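Details

Note that the result contains only the rows for which the condition is True. This works because a DataFrame accepts a boolean array as an indexer.

Example of a boolean array: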

>>> condition = oecd_bli["Indicator"].str.startswith("Life")
>>> condition

0    False
1    False
2     True
3     True
Name: Indicator, dtype: bool
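The mask carries the original index labels, so the selected rows keep them, and the original DataFrame is left unchanged. A short sketch with the demo data; the `reset_index` step is optional and only shown as one way to renumber the result:

```python
import pandas as pd

# Same toy data as the demo above.
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

condition = oecd_bli["Indicator"].str.startswith("Life")
subset = oecd_bli[condition]

# The subset keeps the labels of the rows whose mask entry was True.
kept_labels = subset.index.tolist()
print(len(oecd_bli), kept_labels)

# Optional: renumber the remaining rows from 0.
subset = subset.reset_index(drop=True)
print(subset.index.tolist())
```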

Other ways to build the condition:

>>> condition = oecd_bli["Indicator"] == "Life ..."
>>> condition = ~oecd_bli["Indicator"].str.startswith("Dwell")
>>> condition = oecd_bli["Indicator"].isin(["Life ...", "Crime ..."])
>>> condition = (oecd_bli["Indicator"] == "Life ...") | (oecd_bli["Indicator"] == "Crime ...")

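Direct equality (==)
Exclusion (~) of unwanted cases
Inclusion of whitelisted values via isin
Multiple comparisons combined with the bitwise logical operators (|, &, etc.)

【Discussion】: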
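When conditions are combined, operator precedence matters. The sketch below uses the demo data; the threshold 2.5 is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Same toy data as the demo above.
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

is_life = oecd_bli["Indicator"] == "Life ..."

# An inline comparison needs parentheses because & binds more tightly
# than > in Python.
condition = is_life & (oecd_bli["Value"] > 2.5)
high_life = oecd_bli[condition]
print(high_life)

# Exclusion: every row that is NOT a "Life ..." row.
not_life = oecd_bli[~is_life]
print(not_life)
```

Python's `and` and `or` cannot be used here, because they try to reduce a whole Series to a single truth value and raise a ValueError.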