【Posted】: 2020-10-20 10:07:45
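【Question】:
I'm having trouble grouping by the first two elements of a tuple. I've searched a lot and tried several things, but I can't figure it out :(
I have this dataset:
  id id2 duplicates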
0 a b (us2, us1, 1)
0 a b (us1, us4, 1)
0 a b (us4, us2, 1)
0 a b (us2, us5, 1)
0 a b (us5, us4, 1)
0 a b (us4, us1, 1)
0 a b (us1, us2, 1)
0 a b (us2, us1, 2)
0 a b (us1, us4, 4)
0 a b (us4, us2, 1)
0 a b (us2, us4, 1)
0 a b (us4, us2, 1)
1 c b (us1, us2, 1)
1 c b (us2, us1, 1)
1 c b (us1, us2, 1)
1 c b (us2, us4, 1)
1 c b (us4, us5, 1)
2 v b (us4, us5, 1)
I want to group by id, id2, and the 'usx' pair (the first two elements of each tuple), so the output should be:
  id id2 duplicates
0 a b (us2, us1, 1), (us2, us1, 2)
0 a b (us1, us4, 1), (us1, us4, 4)
0 a b (us4, us2, 1), (us4, us2, 1), (us4, us2, 1)
0 a b (us2, us5, 1)
0 a b (us5, us4, 1)
0 a b (us4, us1, 1)
0 a b (us1, us2, 1)
0 a b (us2, us4, 1)
1 c b (us1, us2, 1), (us1, us2, 1)
1 c b (us2, us1, 1)
1 c b (us2, us4, 1)
1 c b (us4, us5, 1)
2 v b (us4, us5, 1)
The code that produces the part that already works is:
import pandas as pd

d = {'id': [ "a", "a", "a", "a", "a", "a", "a", "a", "a", "c", "c", "c", "c", "c", "a", "a", "a", "a", "v", "v", "c", "c"],
'id2': ["b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"],
'userid': ["us1", "us2", "us1", "us2", "us4", "us4", "us5", "us1", "us2", "us1", "us2", "us1", "us2", "us4", "us4", "us2", "us4", "us2", "us4", "us5", "us4", "us5"],
"time": [11, 2, 3, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]}
# Sort by time so that consecutive rows within each group are chronological.
df_test = pd.DataFrame(data=d).sort_values('time').reset_index(drop=True)
# Per (id, id2) group, pair each user with the next one and keep both timestamps.
df_test = df_test.groupby(['id', 'id2']).apply(lambda x: list(zip(x['userid'][:-1], x['userid'][1:],
x['time'][:-1], x['time'][1:]))).reset_index(name='duplicates')
# Drop pairs where the user did not change; replace the two timestamps with their difference.
df_test['duplicates'] = df_test['duplicates'].apply(lambda pairs: [(k, v, j - y) for k, v, y, j in pairs if k != v])
# One row per (user, next_user, time_delta) tuple.
df_test = df_test.explode('duplicates')
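One possible way to get from the exploded frame to the desired output is to pull the first two tuple elements into helper columns and group on them together with id and id2. This is only a minimal sketch: the small frame `exploded` and the helper columns `first` and `second` are hypothetical names made up for illustration, and the sample rows are a hand-picked subset in the same shape as the data above.

```python
import pandas as pd

# Hypothetical sample in the same shape as the exploded frame above.
exploded = pd.DataFrame({
    "id": ["a", "a", "a", "c"],
    "id2": ["b", "b", "b", "b"],
    "duplicates": [("us2", "us1", 1), ("us1", "us4", 1),
                   ("us2", "us1", 2), ("us1", "us2", 1)],
})

# Helper columns (assumed names) holding the first two tuple elements.
exploded["first"] = exploded["duplicates"].map(lambda t: t[0])
exploded["second"] = exploded["duplicates"].map(lambda t: t[1])

# Group on id, id2 and the user pair; collect the full tuples into a list.
# sort=False keeps groups in order of first appearance.
grouped = (
    exploded.groupby(["id", "id2", "first", "second"], sort=False)["duplicates"]
    .agg(list)
    .reset_index()
    .drop(columns=["first", "second"])
)
print(grouped)
```

Each row of `grouped` then holds a list of all tuples that share the same id, id2 and user pair, so direction matters: (us2, us1) and (us1, us2) end up in different rows, as in the desired output.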
【Discussion】:
Tags: python pandas lambda tuples pandas-groupby