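【Posted】: 2022-01-11 14:37:56
【Question】:
I want to drop the NaN values in my pandas DataFrame and shift the remaining values up within each group defined by a groupby on Category and Gender. Here is an example I created that mimics the data I am working with: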
import pandas as pd

test = {
    'Price': [20, 10, 'NaN', 'NaN', 'NaN', 'NaN', 21, 11, 'NaN', 'NaN', 'NaN', 'NaN'],
    'Gender': ['womens-clothing'] * 6 + ['mens-clothing'] * 6,
    'Category': ['dresses'] * 6 + ['jackets'] * 6,
    'Title': ['NaN', 'NaN', 'Cheap Dress', 'First Dress', 'NaN', 'NaN',
              'NaN', 'NaN', 'Main Jacket', 'Black Jacket', 'NaN', 'NaN'],
    'Review': ['NaN', 'NaN', 'NaN', 'NaN', 203, 12, 'NaN', 'NaN', 'NaN', 'NaN', 201, 15],
}
df = pd.DataFrame(test)
This is what it looks like:
    Price           Gender  Category         Title  Review
0      20  womens-clothing   dresses           NaN     NaN
1      10  womens-clothing   dresses           NaN     NaN
2     NaN  womens-clothing   dresses   Cheap Dress     NaN
3     NaN  womens-clothing   dresses   First Dress     NaN
4     NaN  womens-clothing   dresses           NaN     203
5     NaN  womens-clothing   dresses           NaN      12
6      21    mens-clothing   jackets           NaN     NaN
7      11    mens-clothing   jackets           NaN     NaN
8     NaN    mens-clothing   jackets   Main Jacket     NaN
9     NaN    mens-clothing   jackets  Black Jacket     NaN
10    NaN    mens-clothing   jackets           NaN     201
11    NaN    mens-clothing   jackets           NaN      15
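I want to drop the NaN values, discard the rows that are then left with only Gender and Category values, and shift the remaining cells up so that they line up like this: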
    Price           Gender  Category         Title  Review
0      20  womens-clothing   dresses   Cheap Dress     203
2      10  womens-clothing   dresses   First Dress      12
3      21    mens-clothing   jackets   Main Jacket     201
4      11    mens-clothing   jackets  Black Jacket      15
I tried:
data = df.apply(lambda x: pd.Series(x.drop(index=x[x[0] == 'NaN'], inplace=True).values))
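However, I can't seem to drop specific rows this way, because these NaNs are strings. (In my real data they are actual NAs; I just didn't know how to produce them in the dictionary I built for the reproducible example.)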
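For reference, a minimal sketch of how the 'NaN' placeholder strings could be turned into real missing values, assuming `np.nan` is an acceptable stand-in for the NAs in the actual data. The names `raw`, `clean`, and `direct` are illustrative only and not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down frame: 'NaN' strings stand in for missing values
raw = pd.DataFrame({
    'Price': [20, 10, 'NaN', 'NaN'],
    'Title': ['NaN', 'NaN', 'Cheap Dress', 'First Dress'],
})

# Option 1: replace the placeholder strings after the frame is built
clean = raw.replace('NaN', np.nan)

# Option 2: write np.nan directly when building the dictionary
direct = pd.DataFrame({'Price': [20, 10, np.nan, np.nan]})

# isna() now recognises the missing entries in both frames
print(clean.isna())
print(direct.isna())
```

With real missing values in place, methods such as `dropna()` and `isna()` apply directly, which the string placeholders do not allow.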
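How can I get the expected output, assuming the NaNs are actual NAs? I tried using groupby inside the function above, but I can't use it on a NumPy array. I can apply it outside the function, but that didn't help.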
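A minimal sketch of one possible way to reach that output, assuming the NaNs are real missing values: within each (Gender, Category) group, every other column is compacted by dropping its NaNs, and the compacted columns are then lined up side by side. The names `keys`, `value_cols`, `pieces`, and `result` are illustrative, and the result gets a fresh 0..n-1 index rather than the index shown in the expected output.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    'Price': [20, 10, nan, nan, nan, nan, 21, 11, nan, nan, nan, nan],
    'Gender': ['womens-clothing'] * 6 + ['mens-clothing'] * 6,
    'Category': ['dresses'] * 6 + ['jackets'] * 6,
    'Title': [nan, nan, 'Cheap Dress', 'First Dress', nan, nan,
              nan, nan, 'Main Jacket', 'Black Jacket', nan, nan],
    'Review': [nan, nan, nan, nan, 203, 12, nan, nan, nan, nan, 201, 15],
})

keys = ['Gender', 'Category']
value_cols = [c for c in df.columns if c not in keys]

pieces = []
# sort=False keeps the groups in their order of first appearance
for key_values, group in df.groupby(keys, sort=False):
    # Drop the NaNs in each column and renumber from 0 so the survivors align
    compact = pd.DataFrame({
        col: group[col].dropna().reset_index(drop=True) for col in value_cols
    })
    # Re-attach the group keys to every compacted row
    for key, value in zip(keys, key_values):
        compact[key] = value
    pieces.append(compact)

# Stack the groups and restore the original column order
result = pd.concat(pieces, ignore_index=True)[df.columns]
print(result)
```

If the columns of a group hold different numbers of non-missing values, the shorter ones are padded with NaN in this sketch, so a final `dropna()` may still be needed depending on the real data.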
【Discussion】:
Tags: python pandas dataframe numpy