【Question title】: Print out a specific set of rows of a dataset based on conditions
【Posted】: 2020-10-02 02:49:51
【Question】:

What I am trying:

import re
new_df = census_df.loc[(census_df['REGION']==1 | census_df['REGION']== 2) & (census_df['CTYNAME'].str.contains('^Washington[a-z]*'))& (census_df['POPESTIMATE2015']>census_df['POPESTIMATE2014'])]
new_df

It returns this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

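【Comments】:

  • Welcome to SO. Could you read stackoverflow.com/questions/20109391/… and rephrase your question so that it can be reproduced?
  • You are not using the re module, so you probably do not need to import it. Also, please provide a sample of the contents of the census_df DataFrame.

Tags: python pandas dataframe data-science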


【Solution 1】:

You need to wrap each comparison in filt_1 in parentheses, because | binds more tightly than == in Python:

filt_1 = (census_df['REGION'] == 1) | (census_df['REGION'] == 2)
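
To see why the parentheses matter, here is a minimal sketch on a small made-up Series (the name region and its values are assumptions for illustration, not part of the original data). Without parentheses, Python evaluates 1 | region first and then treats the rest as a chained comparison, which asks for the truth value of a whole Series:

```python
import pandas as pd

# Hypothetical stand-in for census_df['REGION'].
region = pd.Series([1, 2, 3], name="REGION")

# `|` has higher precedence than `==`, so
#   region == 1 | region == 2
# is parsed as the chained comparison
#   (region == (1 | region)) and ((1 | region) == 2)
# and the implicit `and` needs a single True/False from a Series.
try:
    region == 1 | region == 2
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# With parentheses, each comparison yields a boolean Series first,
# and `|` then combines the two element-wise.
mask = (region == 1) | (region == 2)
print(mask.tolist())  # [True, True, False]
```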

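Note that my census_df data is semi-fictional, but it demonstrates the behavior. Everything from the filt_1 assignment onward still applies to your full census_df DataFrame. Here is the complete program: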

import pandas as pd

cols = ['REGION', 'CTYNAME', 'POPESTIMATE2014', 'POPESTIMATE2015']
data = [[1, "Washington", 4846411, 4858979],
        [3, "Autauga County", 55290, 55347]]

census_df = pd.DataFrame(data, columns=cols)

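# Each comparison is parenthesized so that | combines two boolean Series.
filt_1 = (census_df['REGION'] == 1) | (census_df['REGION'] == 2)
# County names that start with "Washington".
filt_2 = census_df['CTYNAME'].str.contains("^Washington[a-z]*")
# Population estimate grew from 2014 to 2015.
filt_3 = census_df['POPESTIMATE2015'] > census_df['POPESTIMATE2014']

# A row is kept only if all three conditions hold.
filt = filt_1 & filt_2 & filt_3

new_df = census_df.loc[filt]

print(new_df)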

Output:

   REGION     CTYNAME  POPESTIMATE2014  POPESTIMATE2015
0       1  Washington          4846411          4858979
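
As a possible alternative, the same three filters can be written without | at all, which sidesteps the precedence problem. This is only a sketch on the same semi-fictional data; the names in_region, starts_with, grew, and alt_df are made up here, and it assumes that a plain prefix match is what the regular expression "^Washington[a-z]*" was meant to express:

```python
import pandas as pd

cols = ['REGION', 'CTYNAME', 'POPESTIMATE2014', 'POPESTIMATE2015']
data = [[1, "Washington", 4846411, 4858979],
        [3, "Autauga County", 55290, 55347]]
census_df = pd.DataFrame(data, columns=cols)

# isin() replaces the two == comparisons joined by |.
in_region = census_df['REGION'].isin([1, 2])
# startswith() is a literal prefix match, so no regex is needed.
starts_with = census_df['CTYNAME'].str.startswith('Washington')
# Same growth condition as filt_3 above.
grew = census_df['POPESTIMATE2015'] > census_df['POPESTIMATE2014']

alt_df = census_df.loc[in_region & starts_with & grew]
print(alt_df)
```

On this sample it selects the same single row (index 0) as the program above.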
