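【Question title】: Shift values relative to groupby
【Posted】: 2022-01-11 14:37:56
【Question description】:

I want to drop the NaN values in my pandas DataFrame and shift the remaining values up, relative to a groupby on Category and Gender. Here is an example I created that mimics the data I am working with: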

import pandas as pd
test = {'Price':
        [20, 10, 'NaN', 'NaN',  'NaN', 'NaN',21, 11,'NaN', 'NaN', 'NaN','NaN'], 
        'Gender':
        ['womens-clothing','womens-clothing','womens-clothing','womens-clothing','womens-clothing','womens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing'],
        'Category':['dresses','dresses','dresses', 'dresses',  'dresses', 'dresses', 'jackets','jackets', 'jackets', 'jackets', 'jackets', 'jackets'],
        'Title':['NaN', 'NaN', 'Cheap Dress', 'First Dress', 'NaN', 'NaN','NaN', 'NaN','Main Jacket', 'Black Jacket','NaN', 'NaN'],
        'Review':['NaN','NaN','NaN','NaN',203,12,'NaN','NaN','NaN','NaN',201, 15]}

df = pd.DataFrame(test)

This is what it looks like:

    Price   Gender     Category Title         Review
0   20  womens-clothing dresses NaN             NaN
1   10  womens-clothing dresses NaN             NaN
2   NaN womens-clothing dresses Cheap Dress     NaN
3   NaN womens-clothing dresses First Dress     NaN
4   NaN womens-clothing dresses NaN             203
5   NaN womens-clothing dresses NaN             12
6   21  mens-clothing   jackets NaN             NaN
7   11  mens-clothing   jackets NaN             NaN
8   NaN mens-clothing   jackets Main Jacket     NaN
9   NaN mens-clothing   jackets Black Jacket    NaN
10  NaN mens-clothing   jackets NaN             201
11  NaN mens-clothing   jackets NaN             15

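I want to drop the remaining NaN values and rows, apart from the values in Gender and Category, and then shift the cells up so that the result matches the following: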

    Price   Gender     Category Title         Review
0   20  womens-clothing dresses Cheap Dress     203
2   10  womens-clothing dresses First Dress     12
3   21  mens-clothing   jackets Main Jacket     201
4   11  mens-clothing   jackets Black Jacket    15

I tried:

data = df.apply(lambda x: pd.Series(x.drop(index=x[x[0] == 'NaN'], inplace=True).values))

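However, I can't seem to drop specific rows this way, because these NaNs are strings. (To me they are actual NA values; I just don't know how to produce them in the dictionary I built for the reproducible code.)

How can I get the expected output, assuming the NaNs are actual NA values? I tried to include groupby in the function above, but I can't use it on a NumPy array. I can include it outside the function, but that doesn't help.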
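On the point about producing real missing values in a dictionary: a minimal sketch, assuming `np.nan` is an acceptable stand-in for the NA values in the real data. The frames `sample` and `as_strings` are small illustrative examples, not the data from the question.

```python
import numpy as np
import pandas as pd

# np.nan can be written directly in the dictionary to mark a real missing value.
sample = pd.DataFrame({
    'Price': [20, np.nan],
    'Title': [np.nan, 'Cheap Dress'],
})

# If the data already contains the placeholder string 'NaN',
# replace() turns it into a real missing value.
as_strings = pd.DataFrame({
    'Price': [20, 'NaN'],
    'Title': ['NaN', 'Cheap Dress'],
})
converted = as_strings.replace('NaN', np.nan)

# isna() only recognises real missing values, not the string 'NaN'.
print(as_strings.isna().sum().sum())
print(converted.isna().sum().sum())
```

Once the values are real missing values, `dropna()` and `isna()` can be used instead of comparing against the string `'NaN'`.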

【Question comments】:

    Tags: python pandas dataframe numpy


    【Solution 1】:

    For the ideal data sample, where every column has the same number of non-NaN values in each group, use:

    f = lambda x: x.apply(lambda x: x[x!='NaN'])
    df = df.set_index(['Gender','Category']).groupby(['Gender','Category'], group_keys=False).apply(f).reset_index()
    print (df)
                Gender Category Price         Title Review
    0    mens-clothing  jackets    21   Main Jacket    201
    1    mens-clothing  jackets    11  Black Jacket     15
    2  womens-clothing  dresses    20   Cheap Dress    203
    3  womens-clothing  dresses    10   First Dress     12
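To make the inner lambda above easier to follow, here is a small sketch of what `x[x != 'NaN']` does to a single column. The Series `price` and `title` are illustrative stand-ins for one group's columns, using a default integer index rather than the `Gender`/`Category` index from the answer.

```python
import pandas as pd

# One group's columns, with the string 'NaN' as the placeholder.
price = pd.Series([20, 10, 'NaN', 'NaN'])
title = pd.Series(['NaN', 'NaN', 'Cheap Dress', 'First Dress'])

# Boolean indexing keeps only the entries that are not the placeholder.
kept_price = price[price != 'NaN']
kept_title = title[title != 'NaN']

# The surviving values keep their original index labels.
print(kept_price.index.tolist(), kept_price.tolist())
print(kept_title.index.tolist(), kept_title.tolist())
```

Note that the filter does not renumber the surviving values: each one keeps the index label it had before.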
    

    For general data, where the number of non-NaN values may differ between columns within a group:

    test = {'Price':
            [20, 10, 'NaN', 'NaN',  'NaN', 'NaN',21, 11,45, 'NaN', 'NaN','NaN'], 
            'Gender':
            ['womens-clothing','womens-clothing','womens-clothing','womens-clothing','womens-clothing','womens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing'],
            'Category':['dresses','dresses','dresses', 'dresses',  'dresses', 'dresses', 'jackets','jackets', 'jackets', 'jackets', 'jackets', 'jackets'],
            'Title':['NaN', 'NaN', 'Cheap Dress', 'First Dress', 'NaN', 'NaN','NaN', 'NaN','Main Jacket', 'Black Jacket','NaN', 'NaN'],
            'Review':['NaN','NaN','NaN','NaN',203,12,'NaN','NaN','NaN','NaN',201, 15]}
    
    df = pd.DataFrame(test)
    

    f = lambda x: x.apply(lambda x: pd.Series(x[x!='NaN'].to_numpy()))
    #if NaNs are missing values
    #f = lambda x: x.apply(lambda x: pd.Series(x.dropna().to_numpy()))
    df = (df.set_index(['Gender','Category'])
            .groupby(['Gender','Category'])
            .apply(f)
            .droplevel(-1)
            .reset_index())
    print (df)
                Gender Category Price         Title Review
    0    mens-clothing  jackets    21   Main Jacket    201
    1    mens-clothing  jackets    11  Black Jacket     15
    2    mens-clothing  jackets    45           NaN    NaN
    3  womens-clothing  dresses    20   Cheap Dress    203
    4  womens-clothing  dresses    10   First Dress     12
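As an alternative to `groupby(...).apply(f)`, the same idea can be written as an explicit loop over the groups. This is only a sketch, assuming the NaNs are real missing values (`np.nan`); the helper name `compact_group` and the variables `keys`, `value_cols`, and `result` are made up for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# The general sample from above, with real missing values instead of 'NaN' strings.
df = pd.DataFrame({
    'Price': [20, 10, np.nan, np.nan, np.nan, np.nan, 21, 11, 45, np.nan, np.nan, np.nan],
    'Gender': ['womens-clothing'] * 6 + ['mens-clothing'] * 6,
    'Category': ['dresses'] * 6 + ['jackets'] * 6,
    'Title': [np.nan, np.nan, 'Cheap Dress', 'First Dress', np.nan, np.nan,
              np.nan, np.nan, 'Main Jacket', 'Black Jacket', np.nan, np.nan],
    'Review': [np.nan, np.nan, np.nan, np.nan, 203, 12,
               np.nan, np.nan, np.nan, np.nan, 201, 15],
})

keys = ['Gender', 'Category']
value_cols = [c for c in df.columns if c not in keys]


def compact_group(group: pd.DataFrame) -> pd.DataFrame:
    """Drop missing values column by column and renumber from 0,
    so the surviving values of every column line up at the top."""
    return pd.concat(
        {col: group[col].dropna().reset_index(drop=True) for col in group.columns},
        axis=1,
    )


pieces = [
    # Re-attach the group keys as ordinary columns after compacting.
    compact_group(group[value_cols]).assign(**dict(zip(keys, key_values)))
    for key_values, group in df.groupby(keys, sort=False)
]
result = pd.concat(pieces, ignore_index=True)[keys + value_cols]
print(result)
```

With `sort=False` the groups come out in their original order (womens-clothing first), unlike the sorted output shown above. Columns with fewer surviving values in a group are padded with NaN at the bottom, as in the `45` row.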
    

    【Comments】:
