Question title: Updating row values where MULTIPLE column conditions come from a nested list
Posted: 2021-02-11 19:04:01
Question:

Below is part of my DataFrame:

                   state  year 4rcsallmn  ... 4rcsndl90se dyslaw dyslawk0
30                Alaska  2015    212.79  ...        1.42    0.0        0
31               Arizona  2015    215.31  ...        1.42    2.0        0
32              Arkansas  2015    218.08  ...        1.99    2.0        0
33            California  2015    212.68  ...        1.65    2.0        0
34              Colorado  2015    224.02  ...        1.38    1.0        0
35           Connecticut  2015    228.95  ...        1.90    2.0        0
36              Delaware  2015    223.70  ...        1.39    2.0        0
37  District of Columbia  2015    212.31  ...        2.20    NaN        0
38                 DoDEA  2015    233.76  ...        2.06    NaN        0
39               Florida  2015    227.19  ...        1.63    2.0        0
40               Georgia  2015    222.01  ...        1.12    1.0        0
41                Hawaii  2015    215.12  ...        2.31    1.0        0
42              National  2013    221.83  ...        0.36    NaN        0
43               Alabama  2013    218.58  ...        1.38    1.0        0
44                Alaska  2013    209.35  ...        0.90    0.0        0
45               Arizona  2013    213.13  ...        2.20    2.0        0
46              Arkansas  2013    218.52  ...        1.19    2.0        0
47            California  2013    212.55  ...        2.12    2.0        0
48              Colorado  2013    226.66  ...        1.92    1.0        0
49           Connecticut  2013    229.58  ...        1.74    2.0        0

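I added a column dyslawk0 that should be zero except for certain combinations of state and year. I first set the whole column to zero (df_4['dyslawk0'] = 0).

The dyslawk0 column does not get updated according to my multi-column conditions.

I have a nested list of states and years: if a row has one of these state/year combinations, dyslawk0 should be set to 1.

Here is my list: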

treat_year = [['Arizona', 2015],
              ['Arkansas', 2013],
              ['California', 2012],
              ['Connecticut', 2014],
              ['Delaware', 2014],
              ['Florida', 2017]]

Here is my code:

for pair in treat_year:
 df_4['dyslawk0'] = np.where(((df_4['state'] == pair[0]) & (df_4['year'] == pair[1])), 1, 0)

For example, the first pair, Arizona and 2015, should update dyslawk0 to 1.

df.loc does not work with the nested list either:

for pair in treat_year:
  df_4.loc[((df_4['state'] == pair[0]) & (df_4['year'] == pair[1])), 'dyslawk0'] = 1

Let me know if this makes sense!

    Tags: python pandas indexing iterator nested-lists


    Answer 1:

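    EDIT: If it still doesn't work, the year column is probably filled with string representations of numbers rather than actual numbers, so convert the years to integers before applying any of the solutions below: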

    df['year'] = df['year'].astype(int)
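To illustrate why the dtype matters, here is a small sketch on a hypothetical three-row frame (the column names come from the question, the values are made up). When year holds strings such as '2015', the pair ['Arizona', 2015] never matches, because the string '2015' is not equal to the integer 2015; after astype(int) the same test succeeds.

```python
import pandas as pd

treat_year = [['Arizona', 2015], ['Arkansas', 2013]]

# Hypothetical frame where 'year' was read in as text (e.g. from a CSV)
df = pd.DataFrame({'state': ['Alaska', 'Arizona', 'Arkansas'],
                   'year': ['2015', '2015', '2013']})

# MultiIndex.isin compares whole (state, year) tuples
pairs = [tuple(p) for p in treat_year]

# String years never equal the integer years in treat_year
before = pd.MultiIndex.from_frame(df[['state', 'year']]).isin(pairs)
print(before.tolist())  # [False, False, False]

# Convert the column to integers and repeat the same test
df['year'] = df['year'].astype(int)
after = pd.MultiIndex.from_frame(df[['state', 'year']]).isin(pairs)
print(after.tolist())   # [False, True, True]
```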
    

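    You can use a trick: convert both columns into a MultiIndex, test it with Index.isin, and finally map True/False to 1/0, either with view('i1') on the resulting NumPy boolean array or with numpy.where: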

    import numpy as np
    import pandas as pd

    df['dyslawk0'] = df.set_index(['state','year']).index.isin(treat_year).view('i1')
    # alternative 1: build the MultiIndex directly from the two columns
    df['dyslawk0'] = pd.MultiIndex.from_frame(df[['state','year']]).isin(treat_year).view('i1')
    # alternative 2: map the boolean mask to 1/0 with numpy.where
    df['dyslawk0'] = np.where(df.set_index(['state','year']).index.isin(treat_year), 1, 0)
    print (df)
                       state  year  4rcsallmn  4rcsndl90se  dyslaw  dyslawk0
    30                Alaska  2015     212.79         1.42     0.0         0
    31               Arizona  2015     215.31         1.42     2.0         1
    32              Arkansas  2015     218.08         1.99     2.0         0
    33            California  2015     212.68         1.65     2.0         0
    34              Colorado  2015     224.02         1.38     1.0         0
    35           Connecticut  2015     228.95         1.90     2.0         0
    36              Delaware  2015     223.70         1.39     2.0         0
    37  District of Columbia  2015     212.31         2.20     NaN         0
    38                 DoDEA  2015     233.76         2.06     NaN         0
    39               Florida  2015     227.19         1.63     2.0         0
    40               Georgia  2015     222.01         1.12     1.0         0
    41                Hawaii  2015     215.12         2.31     1.0         0
    42              National  2013     221.83         0.36     NaN         0
    43               Alabama  2013     218.58         1.38     1.0         0
    44                Alaska  2013     209.35         0.90     0.0         0
    45               Arizona  2013     213.13         2.20     2.0         0
    46              Arkansas  2013     218.52         1.19     2.0         1
    47            California  2013     212.55         2.12     2.0         0
    48              Colorado  2013     226.66         1.92     1.0         0
    49           Connecticut  2013     229.58         1.74     2.0         0
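A minimal, self-contained sketch of the same MultiIndex.isin trick on a few rows copied from the question. The helper name flag_pairs is hypothetical. It converts the nested lists to tuples first (the form MultiIndex.isin is documented to take) and uses astype(int) instead of view('i1') so the result is an ordinary integer column.

```python
import pandas as pd

def flag_pairs(df, pairs, cols=('state', 'year')):
    """Hypothetical helper: 1 where the row's (state, year) is in pairs, else 0."""
    wanted = [tuple(p) for p in pairs]             # nested lists -> tuples
    keys = pd.MultiIndex.from_frame(df[list(cols)])
    return keys.isin(wanted).astype(int)           # boolean array -> 0/1

df = pd.DataFrame({
    'state': ['Alaska', 'Arizona', 'Arkansas', 'Arizona', 'Arkansas'],
    'year':  [2015, 2015, 2015, 2013, 2013],
})
treat_year = [['Arizona', 2015], ['Arkansas', 2013], ['California', 2012]]

df['dyslawk0'] = flag_pairs(df, treat_year)
print(df)
```

Pairs that do not occur in the frame (here California, 2012) are simply ignored.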
    

    By the way, your last solution works fine for me:

    for pair in treat_year:
      df.loc[((df['state'] == pair[0]) & (df['year'] == pair[1])), 'dyslawk0'] = 1
    
    print (df)
                       state  year  4rcsallmn  4rcsndl90se  dyslaw  dyslawk0
    30                Alaska  2015     212.79         1.42     0.0         0
    31               Arizona  2015     215.31         1.42     2.0         1
    32              Arkansas  2015     218.08         1.99     2.0         0
    33            California  2015     212.68         1.65     2.0         0
    34              Colorado  2015     224.02         1.38     1.0         0
    35           Connecticut  2015     228.95         1.90     2.0         0
    36              Delaware  2015     223.70         1.39     2.0         0
    37  District of Columbia  2015     212.31         2.20     NaN         0
    38                 DoDEA  2015     233.76         2.06     NaN         0
    39               Florida  2015     227.19         1.63     2.0         0
    40               Georgia  2015     222.01         1.12     1.0         0
    41                Hawaii  2015     215.12         2.31     1.0         0
    42              National  2013     221.83         0.36     NaN         0
    43               Alabama  2013     218.58         1.38     1.0         0
    44                Alaska  2013     209.35         0.90     0.0         0
    45               Arizona  2013     213.13         2.20     2.0         0
    46              Arkansas  2013     218.52         1.19     2.0         1
    47            California  2013     212.55         2.12     2.0         0
    48              Colorado  2013     226.66         1.92     1.0         0
    49           Connecticut  2013     229.58         1.74     2.0         0
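A likely reason the question's first loop (the np.where version) leaves dyslawk0 at 0: every iteration assigns a brand-new column, so only the mask for the last pair, ['Florida', 2017], survives, and none of the rows shown in the question is Florida 2017. The sketch below uses a small hypothetical frame. It reproduces that overwrite, then OR-combines the per-pair masks and calls np.where once; this keeps the question's loop structure and is not the answer's own method.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'state': ['Arizona', 'Arkansas', 'Florida'],
    'year':  [2015, 2013, 2015],
})
treat_year = [['Arizona', 2015], ['Arkansas', 2013], ['Florida', 2017]]

# Pattern from the question: each pass overwrites the whole column,
# so only the mask for the last pair (Florida, 2017) is kept.
for state, year in treat_year:
    df['overwritten'] = np.where((df['state'] == state) & (df['year'] == year), 1, 0)

# Fix: accumulate the masks with |, then convert to 0/1 once at the end.
mask = np.zeros(len(df), dtype=bool)
for state, year in treat_year:
    mask |= ((df['state'] == state) & (df['year'] == year)).to_numpy()
df['dyslawk0'] = np.where(mask, 1, 0)

print(df)
```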
    
