【Posted】: 2021-05-27 04:14:52
【Question】:
I have a DataFrame like the following:
df
len scores
5 [0.45814112124905954, 0.34974337172257086, 0.042586941883761324, 0.042586941883761324, 0.33509446692807404, 0.01202741856859997, 0.01202741856859997, 0.031149023579740857, 0.031149023579740857, 0.9382029832667171]
4 [0.1289882974831455, 0.17069367229950574, 0.03518847270370917, 0.3283517918439753, 0.41119171582425107, 0.5057528742869354]
3 [0.22345885572316307, 0.1366147609256035, 0.09309687010700848]
2 [0.4049920770888036]
I want to slice the scores column based on the value in the len column, so that each original row becomes multiple rows:
len scores
5 [0.45814112124905954, 0.34974337172257086, 0.042586941883761324, 0.042586941883761324]
5 [0.33509446692807404, 0.01202741856859997, 0.01202741856859997]
5 [0.031149023579740857, 0.031149023579740857]
5 [0.9382029832667171]
5
4 [0.1289882974831455, 0.17069367229950574, 0.03518847270370917]
4 [0.3283517918439753, 0.41119171582425107]
4 [0.5057528742869354]
4
3 [0.22345885572316307, 0.1366147609256035]
3 [0.09309687010700848]
3
2 [0.4049920770888036]
2
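For reference, the chunk sizes in the expected output shrink by one each time (len-1, len-2, ..., 0), which is why each group ends with an empty row. A minimal sketch of how those slice boundaries can be computed; `slice_bounds` is a hypothetical helper name used only for illustration:

```python
import numpy as np


def slice_bounds(n):
    """Return cumulative boundaries for chunks of length n-1, n-2, ..., 0."""
    return [0] + np.cumsum(np.arange(n)[::-1]).tolist()


bounds = slice_bounds(5)
print(bounds)  # [0, 4, 7, 9, 10, 10]

# Consecutive pairs give the slices: [0:4], [4:7], [7:9], [9:10], [10:10]
print(list(zip(bounds[:-1], bounds[1:])))
```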
I tried the following code to get the desired result:
import numpy as np
import pandas as pd

def create_nested_list_s(x):
    # Boundaries for chunks of length len-1, len-2, ..., 0
    l_idx = [0] + np.cumsum(np.arange(x['len'])[::-1]).tolist()
    return pd.Series([x['scores'][i:j] for i, j in zip(l_idx[:-1], l_idx[1:])])

df_f = (df.apply(create_nested_list_s, axis=1)
          .set_index(df['len'])
          .stack()
          .reset_index(name='scores')
          .drop('level_1', axis=1))
This gives me the result in the required format:
len scores
5 [0.45814112124905954, 0.34974337172257086, 0.042586941883761324, 0.042586941883761324]
4 [0.1289882974831455, 0.17069367229950574, 0.03518847270370917]
3 [0.22345885572316307, 0.1366147609256035]
2 [0.4049920770888036]
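The problem is that I have several DataFrames with columns like len and scores but under different column names, and I want to reuse the function above to get data in the same format.
I tried passing the column names as arguments and merging the two steps into a single function, as follows: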
def create_nested_list(x, col_len, col, col_name):
    l_idx = [0]+np.cumsum(np.arange(x[col_len])[::-1]).tolist()
    df = (x.apply(pd.Series([x[col][i:j] for i, j in zip(l_idx[:-1], l_idx[1:])]), axis=1)
           .set_index(x[col_len])
           .stack()
           .reset_index(name=col_name)
           .drop('level_1', axis=1))
    return df
Assume df_test is a DataFrame with the same structure as df above, but with columns named df_len and df_col:
testing = create_nested_list(df_test, 'df_len', 'df_col', 'df_name')
But I get a ValueError:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any help fixing this function would be greatly appreciated.
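One possible direction, offered as a sketch rather than a definitive fix: the error likely arises because create_nested_list receives the whole DataFrame, so x[col_len] is an entire Series (not one row's value) when it reaches np.arange. The version below keeps the same signature but loops over the rows directly instead of using the apply/stack chain, so it is a swapped-in technique, not the original approach. The df_test data here is made up for illustration, reusing two rows from df.

```python
import numpy as np
import pandas as pd


def create_nested_list(frame, col_len, col, col_name):
    """Split each row's list into chunks of length n-1, n-2, ..., 0."""
    rows = []
    for n, values in zip(frame[col_len], frame[col]):
        # Per-row boundaries: n is a scalar here, not a Series
        bounds = [0] + np.cumsum(np.arange(int(n))[::-1]).tolist()
        for i, j in zip(bounds[:-1], bounds[1:]):
            rows.append({col_len: n, col_name: values[i:j]})
    return pd.DataFrame(rows, columns=[col_len, col_name])


# Hypothetical test frame with different column names
df_test = pd.DataFrame({
    "df_len": [3, 2],
    "df_col": [
        [0.22345885572316307, 0.1366147609256035, 0.09309687010700848],
        [0.4049920770888036],
    ],
})

testing = create_nested_list(df_test, "df_len", "df_col", "df_name")
print(testing)
```

Because the column names are plain arguments, the same function can be called on any frame that follows the len/scores layout.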
【Discussion】: