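【Question title】: Modifying the position of NaNs in a dataframe
【Posted】: 2022-08-17 00:33:43
【Question description】:

I hope I can explain this well. I have a df with 2 columns: group and numbers. I am trying to take the np.nan and move it into its new group.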

import numpy as np
import pandas as pd


def check_for_nan():
    # for example, let's say my new value is 14.5
    new_nan_value = 14.5
    data = {"group": [-1, 0, 1, 2, 3],
            "numbers": [[np.nan], [11, 12], [14, 15], [16, 17], [18, 19]],
            }
    df = pd.DataFrame(data=data)


    # *** add some code ***


    # I created a new dataframe to show what the result should look like, but we want to operate only on the df from above
    data_2 = {"group": [0, 1, 2, 3],
              "numbers": [[11, 12], [14, np.nan, 15], [16, 17], [18, 19]],
              }
    df_2 = pd.DataFrame(data=data_2)
    # should return the new group number where the nan would live
    return data_2["group"][1]

Output:

Current:

   group   numbers
0     -1     [nan]
1      0  [11, 12]
2      1  [14, 15]
3      2  [16, 17]
4      3  [18, 19]

Desired output when new_nan_value = 14.5:

   group        numbers
0      0       [11, 12]
1      1  [14, nan, 15]
2      2       [16, 17]
3      3       [18, 19]

return 1

    Tags: python arrays pandas dataframe nan


    【Solution 1】:

    With the dataframe you provided (and the column named "group"), here is one way to do it:

    import numpy as np
    import pandas as pd


    def move_nan(df, new_nan_value):
        """Helper function.

        Args:
            df: input dataframe.
            new_nan_value: insertion value.

        Returns:
            Dataframe with nan value at insertion point.

        """

        # Explode the lists into one row per number and drop the existing NaN row
        df = df.explode("numbers").dropna().reset_index(drop=True)

        # Insert a placeholder row right after the last number below new_nan_value
        insert_pos = df.loc[df["numbers"] < new_nan_value, "numbers"].index[-1] + 1
        df = pd.concat(
            [
                df.loc[: insert_pos - 1, :],
                pd.DataFrame({"group": [pd.NA], "numbers": [np.nan]}, index=[insert_pos]),
                df.loc[insert_pos:, :],
            ]
        )
        # The placeholder takes the group of the number that follows it
        # (Series.bfill() replaces the deprecated fillna(method="bfill"))
        df["group"] = df["group"].bfill()

        # Group again and collect the numbers of each group back into lists
        return df.groupby("group").agg(list).reset_index(drop=False)
    

    So that:

    print(move_nan(df, 14.5))
    # Output
       group        numbers
    0      0       [11, 12]
    1      1  [14, nan, 15]
    2      2       [16, 17]
    3      3       [18, 19]
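
The question also asks for the group number where the NaN ends up, which the function above does not return. As a rough sketch only (not part of the original answer), the same idea can be written without `explode`, working directly on the lists and returning that group alongside the dataframe. The name `move_nan_with_group` is hypothetical, and the sketch assumes that each list is sorted, that groups appear in ascending order of their values, and that the column is named "group". If `new_nan_value` is larger than every number, it returns `None` for the group and inserts nothing.

```python
from bisect import bisect_left

import numpy as np
import pandas as pd


def move_nan_with_group(df, new_nan_value):
    """Hypothetical helper: return (new dataframe, group that receives the NaN)."""
    # Strip existing NaNs from every list (building new lists, so the input
    # dataframe is left untouched) and drop rows that end up empty.
    cleaned = [
        (group, [x for x in numbers if not pd.isna(x)])
        for group, numbers in zip(df["group"], df["numbers"])
    ]
    cleaned = [(group, numbers) for group, numbers in cleaned if numbers]

    target_group = None
    for group, numbers in cleaned:
        # Assumption: each list is sorted, so bisect_left gives the position
        # of the first value that is >= new_nan_value.
        pos = bisect_left(numbers, new_nan_value)
        if pos < len(numbers):
            numbers.insert(pos, np.nan)
            target_group = group
            break

    result = pd.DataFrame(cleaned, columns=["group", "numbers"])
    return result, target_group


df = pd.DataFrame(
    {
        "group": [-1, 0, 1, 2, 3],
        "numbers": [[np.nan], [11, 12], [14, 15], [16, 17], [18, 19]],
    }
)
result, group = move_nan_with_group(df, 14.5)
print(group)  # 1
```

Returning a tuple keeps the dataframe and the group number together, so the caller does not have to search the result for the NaN afterwards.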
    
