【Posted】: 2021-11-19 17:25:47
【Problem description】:
I have a DataFrame like the following:
import pandas as pd
df = pd.DataFrame({"order_id":[1,3,7],"order_date":["20/5/2018","22/5/2018","23/5/2018"], "package":["p1","p4","p5,p6"],"package_code":["As he crossed toward the pharmacy at the","he was dancing in the","they were playing football"]})
df
   order_id  order_date  package  package_code
0  1         20/5/2018   p1       As he crossed toward the pharmacy at the
1  3         22/5/2018   p4       he was dancing in the
2  7         23/5/2018   p5,p6    they were playing football
I wrote the following function, which splits a string into groups of 5 words:
s = 'As he crossed toward the pharmacy at the corner '
n = 5
def group_words(s, n):
    # Split on whitespace, then yield consecutive chunks of n words
    words = s.split()
    for i in range(0, len(words), n):
        yield ' '.join(words[i:i+n])
list(group_words(s,n))
['As he crossed toward the', 'pharmacy at the corner']
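I want to take the DataFrame and split the "package_code" column into multiple rows of 5 words each, while keeping the other columns unchanged in every resulting row.
How can I do this?
For example, the first row should become: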
   order_id  order_date  package  package_code
0  1         20/5/2018   p1       As he crossed toward the
0  1         20/5/2018   p1       pharmacy at the
I tried the following, but it does not give me what I want:
(df.set_index(['order_id', 'order_date'])
   .apply(lambda x: group_words(x, 3))
   .reset_index())
   index         0
0  package       <generator object group_words at 0x7fa263e98570>
1  package_code  <generator object group_words at 0x7fa263e98678>
【Discussion】:
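One possible approach, offered as a minimal sketch rather than a definitive answer: the attempt in the question returns generator objects because `DataFrame.apply` passes each whole column to `group_words`, and calling a generator function only creates a generator without running it. The sketch below instead applies the function to each string in "package_code", turns each generator into a list, and then uses `DataFrame.explode` (assumed available, i.e. pandas 0.25 or later) to give every chunk its own row. The name `result` is only illustrative.

```python
import pandas as pd


def group_words(s, n):
    """Yield consecutive chunks of n words from the string s."""
    words = s.split()
    for i in range(0, len(words), n):
        yield ' '.join(words[i:i + n])


df = pd.DataFrame({
    "order_id": [1, 3, 7],
    "order_date": ["20/5/2018", "22/5/2018", "23/5/2018"],
    "package": ["p1", "p4", "p5,p6"],
    "package_code": [
        "As he crossed toward the pharmacy at the",
        "he was dancing in the",
        "they were playing football",
    ],
})

# Apply group_words to each cell (not each column) and materialize
# the generator as a list, so every cell holds a list of 5-word chunks.
chunks = df["package_code"].apply(lambda s: list(group_words(s, 5)))

# explode() emits one row per list element and repeats the other
# columns; the original index label is repeated as well.
result = df.assign(package_code=chunks).explode("package_code")

print(result)
```

The repeated index labels match the expected output shown in the question. If a fresh 0..n-1 index is preferred, `result.reset_index(drop=True)` can be chained on.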