【Question title】: Adding new rows in the existing columns based on condition
【Posted】: 2020-04-30 06:38:47
【Question】:

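I have a dataframe with four columns: range, weather, flag and calculation. From three lists I need to build every combination of the first three columns (range, weather and flag), check whether each combination already exists as a row, and add a new row to the dataframe for every combination that does not.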

range   weather  flag   calculation
 0-5    good      y      12
 5-6    good      n      14
 0-5    bad       n      2
 5-6    worse     y      5

The expected output is:

range   weather  flag   calculation
 0-5    good      y       12
 0-5    bad       n        2
 0-5    good      n       null
 0-5    worse     n       null
 0-5    bad       y       null
 0-5    worse     y       null
 5-6    good      n       14
 5-6    worse     y        5
 5-6    bad       n       null
 5-6    worse     n       null
 5-6    bad       y       null
 5-6    good      y       null

The code I tried is:

r=['0-5','5-6']
w=['good','bad','worse']
f=['n','y']
for i in r:
    for j in w:
        for k in f:
            if i in data1['range'].values and j in data1['weather'].values and k in data1['flag'].values:
                print(i,j,k)
                print("yes")
            else:
                print(i,j,k)
                print("no")
                data1=data1.append([{'bl_flag':j},{'weather_status':k}], ignore_index=True)
        print(data1)

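The code above does not check whether all three values occur together in a single row. If a combination is not present in any row, it has to be appended to the dataframe.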

【Comments】:

    Tags: python-3.x pandas list dataframe append


    【Solution 1】:

    One way to solve this is to create a DataFrame holding every possible combination of values for the "range", "weather" and "flag" columns, and then merge it with the original DataFrame using an outer join.

    Create the DataFrame with all possible combinations:

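    import pandas as pd

    r = ['0-5', '5-6']
    w = ['good', 'bad', 'worse']
    f = ['n', 'y']

    # Cartesian product of the three lists: one [range, weather, flag] row per combination
    res = [[i, j, k] for i in r
                     for j in w
                     for k in f]

    cls = ["range", "weather", "flag"]

    df1 = pd.DataFrame(res, columns=cls)
    df1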
    

    Output:

       range weather flag
    0    0-5    good    n
    1    0-5    good    y
    2    0-5     bad    n
    3    0-5     bad    y
    4    0-5   worse    n
    5    0-5   worse    y
    6    5-6    good    n
    7    5-6    good    y
    8    5-6     bad    n
    9    5-6     bad    y
    10   5-6   worse    n
    11   5-6   worse    y
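
    The same combinations frame can also be built with `itertools.product` from the standard library instead of a nested comprehension. A minimal sketch, reusing the list names from the answer:

```python
from itertools import product

import pandas as pd

r = ['0-5', '5-6']
w = ['good', 'bad', 'worse']
f = ['n', 'y']
cls = ["range", "weather", "flag"]

# product() yields tuples in the same order as the nested comprehension:
# the last list (f) varies fastest.
df1 = pd.DataFrame(list(product(r, w, f)), columns=cls)
print(df1)
```

    This scales to more key columns without adding another `for` clause per column.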
    

    Now you can merge this DataFrame with the original DataFrame using an outer join:

    new_df = pd.merge(df1, original_df, how='outer', on=cls)
    

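    Output as posted (note that it has 16 rows, not 12: the four original rows were appended rather than matched, which happens when the key values in the two frames are not exactly equal, for example because of stray whitespace):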

       range weather flag  calculation
    0    0-5    good    n          NaN
    1    0-5    good    y          NaN
    2    0-5     bad    n          NaN
    3    0-5     bad    y          NaN
    4    0-5   worse    n          NaN
    5    0-5   worse    y          NaN
    6    5-6    good    n          NaN
    7    5-6    good    y          NaN
    8    5-6     bad    n          NaN
    9    5-6     bad    y          NaN
    10   5-6   worse    n          NaN
    11   5-6   worse    y          NaN
    12   0-5    good    y         12.0
    13   5-6    good    n         14.0
    14   0-5     bad    n          2.0
    15   5-6   worse    y          5.0
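
    To get exactly one row per combination, the key columns in both frames have to compare equal. The following is a minimal sketch, not the answerer's code: it assumes the mismatch comes from stray whitespace (a guess, the post does not say), strips it, and then uses a left merge from the combinations frame as a swapped-in variant of the outer join. The leading spaces in the sample data and the name `cleaned` are illustrative.

```python
import pandas as pd

# Sample data from the question; the leading space in ' 0-5' is a hypothetical
# stand-in for whatever made the keys differ in the posted output.
original_df = pd.DataFrame({
    "range": [" 0-5", " 5-6", " 0-5", " 5-6"],
    "weather": ["good", "good", "bad", "worse"],
    "flag": ["y", "n", "n", "y"],
    "calculation": [12, 14, 2, 5],
})

cls = ["range", "weather", "flag"]
r = ['0-5', '5-6']
w = ['good', 'bad', 'worse']
f = ['n', 'y']
df1 = pd.DataFrame([[i, j, k] for i in r for j in w for k in f], columns=cls)

# Normalise the key columns so that equal-looking values compare equal.
cleaned = original_df.copy()
for col in cls:
    cleaned[col] = cleaned[col].str.strip()

# A left merge keeps one row per combination, in the order of df1;
# combinations missing from the original get NaN in 'calculation'.
new_df = df1.merge(cleaned, on=cls, how="left")
print(new_df)
```

    With matching keys the result has 12 rows, four of which carry the original `calculation` values.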
    

    【Comments】:

      【Solution 2】:
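      import pandas as pd

      r = ['0-5', '5-6']
      w = ['good', 'bad', 'worse']
      f = ['n', 'y']
      new_rows = []
      for i in r:
          for j in f:
              for k in w:
                  # True if some row already holds this exact (range, weather, flag) combination
                  exists = ((data1["range"] == i) & (data1["flag"] == j) & (data1["weather"] == k)).any()
                  if not exists:
                      new_rows.append({'range': i, 'weather': k, 'flag': j})
      # DataFrame.append was removed in pandas 2.0, so collect the rows and concatenate once
      data1 = pd.concat([data1, pd.DataFrame(new_rows)], ignore_index=True)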
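
      The loop above tests one combination at a time. The same result can be sketched without an explicit loop by reindexing on the full set of combinations. This is an alternative technique, not part of the original answer, and it assumes each (range, weather, flag) combination appears at most once in `data1`, since `reindex` does not accept duplicate keys. The names `keys`, `full_index` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question.
data1 = pd.DataFrame({
    "range": ["0-5", "5-6", "0-5", "5-6"],
    "weather": ["good", "good", "bad", "worse"],
    "flag": ["y", "n", "n", "y"],
    "calculation": [12, 14, 2, 5],
})

keys = ["range", "weather", "flag"]

# Every possible (range, weather, flag) combination as a MultiIndex.
full_index = pd.MultiIndex.from_product(
    [['0-5', '5-6'], ['good', 'bad', 'worse'], ['n', 'y']], names=keys
)

# reindex keeps the rows whose key already exists and adds a NaN row
# for every combination that is missing.
result = data1.set_index(keys).reindex(full_index).reset_index()
print(result)
```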
      

      【Comments】:
