@GiantsLoveDeathMetal has a point. In principle, you can read the raw data in as oecd_bli and then select the subset of the DataFrame that satisfies a given condition.
Demo
import pandas as pd
# Given a DataFrame of raw data
d = {
"Country": pd.Series(["Australia", "Austria", "Fiji", "Japan"]),
"Indicator": pd.Series(["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."]),
"Value": pd.Series([1.1, 1.0, 2.2, 2.9]),
}
oecd_bli = pd.DataFrame(d, columns=["Country", "Indicator", "Value"] )
oecd_bli
# Select rows starting with "Life" in column "Indicator"
condition = oecd_bli["Indicator"].str.startswith("Life")
subset = oecd_bli[condition]
subset
Alternatively, select the subset with label-based indexing via .loc:
subset = oecd_bli.loc[condition, :]
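Here, loc expects [<rows>, <columns>]. The condition selects the rows, and : keeps every column, so only the rows that satisfy the condition are returned.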
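Since loc takes both a row selector and a column selector, the same condition can also be paired with a list of column names. The following is a minimal sketch that rebuilds the demo DataFrame from above; the variable name life_values is only illustrative.

```python
import pandas as pd

# Same toy data as in the demo above
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

condition = oecd_bli["Indicator"].str.startswith("Life")

# Rows where the condition is True, restricted to two columns
life_values = oecd_bli.loc[condition, ["Country", "Value"]]
print(life_values)
```

The original row labels (2 and 3) are preserved in the result; call reset_index(drop=True) if a fresh 0-based index is wanted.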
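Details
Note that the result contains exactly those rows for which the condition is True. This works because a DataFrame can be indexed with a boolean array: rows marked True are kept and rows marked False are dropped.
Example of a boolean array: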
>>> condition = oecd_bli["Indicator"].str.startswith("Life")
>>> condition
0 False
1 False
2 True
3 True
Name: Indicator, dtype: bool
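To make the role of the boolean array more concrete, here is a small sketch, assuming the same toy data as in the demo. It checks the mask's dtype, counts the matching rows, and shows that a hand-written list of booleans of the same length selects the same rows. The names mask and manual are only illustrative.

```python
import pandas as pd

oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

mask = oecd_bli["Indicator"].str.startswith("Life")

print(mask.dtype)        # the mask is a Series of booleans
print(int(mask.sum()))   # True counts as 1, so this is the number of matches

# A plain list of booleans, one per row, selects the same rows
manual = oecd_bli[[False, False, True, True]]
print(manual.equals(oecd_bli[mask]))
```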
Other ways to set up a condition:
>>> condition = oecd_bli["Indicator"] == "Life ..."
>>> condition = ~oecd_bli["Indicator"].str.startswith("Dwell")
>>> condition = oecd_bli["Indicator"].isin(["Life ...", "Crime ..."])
>>> condition = (oecd_bli["Indicator"] == "Life ...") | (oecd_bli["Indicator"] == "Crime ...")
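- direct equality (==)
- excluding (~) undesired cases
- including whitelisted values via isin
- multiple comparisons combined with the element-wise logical (bitwise) operators (|, &, etc.)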
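The operators listed above can be combined freely. The sketch below reuses the toy data from the demo and adds a hypothetical threshold of 2.5 on Value purely for illustration. Each comparison is wrapped in parentheses or stored in its own variable, because & and | bind more tightly than comparison operators such as == and >.

```python
import pandas as pd

oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

is_life = oecd_bli["Indicator"].str.startswith("Life")
is_high = oecd_bli["Value"] > 2.5   # hypothetical threshold, for illustration only

both = oecd_bli[is_life & is_high]          # AND: both conditions hold
either = oecd_bli[is_life | is_high]        # OR: at least one condition holds
neither = oecd_bli[~(is_life | is_high)]    # NOT: neither condition holds

print(both["Country"].tolist())
print(either["Country"].tolist())
print(neither["Country"].tolist())
```

Python's own and, or, and not do not work element-wise on a Series, which is why the bitwise operators are used here.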