【Question title】: How to filter numbers (n > 3) in the lists of a DataFrame?
【Posted】: 2019-01-28 14:09:53
【Question】:
   movie_id       user_id        rating
0         1  [5, 2, 1, 6]  [4, 4, 5, 4]
1         2        [5, 1]        [3, 3]
2         3           [1]           [4]
3         4           [1]           [3]
4         5           [1]           [3]
5         6           [1]           [5]
6         7        [6, 1]        [2, 4]
7         8        [1, 6]        [1, 4]
8         9        [1, 6]        [5, 4]

I am trying to get, for each row, the count of numbers in 'rating' that are greater than 3. For example, [4, 4, 5, 5] => 4 and [3, 3] => 0.

This is what I have done so far:

from collections import Counter

appr = df.copy()

appr['approval'] = appr['rating'].map(Counter)
appr

It outputs:

   movie_id       user_id        rating      approval
0         1  [5, 2, 1, 6]  [4, 4, 5, 4]  {4: 3, 5: 1}
1         2        [5, 1]        [3, 3]        {3: 2}
2         3           [1]           [4]        {4: 1}
3         4           [1]           [3]        {3: 1}
4         5           [1]           [3]        {3: 1}
5         6           [1]           [5]        {5: 1}
6         7        [6, 1]        [2, 4]  {2: 1, 4: 1}
7         8        [1, 6]        [1, 4]  {1: 1, 4: 1}
8         9        [1, 6]        [5, 4]  {5: 1, 4: 1}

My goal is to drop, in each row of 'rating', the numbers that are not greater than 3, and to sum the occurrences of the remaining ones:

   movie_id       user_id        rating      approval  appr_sum
0         1  [5, 2, 1, 6]  [4, 4, 5, 4]  {4: 3, 5: 1}         4
1         2        [5, 1]        [3, 3]        {3: 2}         0
2         3           [1]           [4]        {4: 1}         1
3         4           [1]           [3]        {3: 1}         0
4         5           [1]           [3]        {3: 1}         0
5         6           [1]           [5]        {5: 1}         1
6         7        [6, 1]        [2, 4]  {2: 1, 4: 1}         1
7         8        [1, 6]        [1, 4]  {1: 1, 4: 1}         1
8         9        [1, 6]        [5, 4]  {5: 1, 4: 1}         2

I tried:

s = appr['rating'].map

t = [x for x in s if x > 3]
t

But this raises TypeError: 'method' object is not iterable, and even if this part of the code were correct, it would not sum the occurrences.

【Comments】:

    Tags: python python-3.x pandas dataframe jupyter-notebook


    【Solution 1】:

    Use a list comprehension with a nested, filtered generator expression passed to sum:

    appr['appr_sum'] = [sum(v for k, v in x.items() if k > 3) for x in appr['approval']]
    print (appr)
       movie_id       user_id        rating      approval  appr_sum
    0         1  [5, 2, 1, 6]  [4, 4, 5, 4]  {4: 3, 5: 1}         4
    1         2        [5, 1]        [3, 3]        {3: 2}         0
    2         3           [1]           [4]        {4: 1}         1
    3         4           [1]           [3]        {3: 1}         0
    4         5           [1]           [3]        {3: 1}         0
    5         6           [1]           [5]        {5: 1}         1
    6         7        [6, 1]        [2, 4]  {2: 1, 4: 1}         1
    7         8        [1, 6]        [1, 4]  {1: 1, 4: 1}         1
    8         9        [1, 6]        [5, 4]  {5: 1, 4: 1}         2
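
To make the approach above easy to try, here is a minimal self-contained sketch that rebuilds the 'rating' column from the question and runs both steps end to end. The DataFrame literal is reconstructed from the sample shown in the question, and the Counter column is built with a plain list comprehension rather than map(Counter); that substitution is an assumption made only to keep the sketch independent of how a given pandas version treats a class passed to map.

```python
from collections import Counter

import pandas as pd

# Sample data reconstructed from the question (user_id omitted for brevity)
appr = pd.DataFrame({
    'movie_id': range(1, 10),
    'rating': [[4, 4, 5, 4], [3, 3], [4], [3], [3], [5], [2, 4], [1, 4], [5, 4]],
})

# Step 1: count how often each rating value occurs in every row
appr['approval'] = [Counter(ratings) for ratings in appr['rating']]

# Step 2: keep only the keys greater than 3 and add up their counts
appr['appr_sum'] = [
    sum(count for value, count in c.items() if value > 3)
    for c in appr['approval']
]

print(appr['appr_sum'].tolist())  # [4, 0, 1, 0, 0, 1, 1, 1, 2]
```

The printed list matches the appr_sum column in the output above.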
    

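    【Comments】:

      【Solution 2】:

      The reason your expression does not work is that you are not iterating over the pandas Series correctly: appr['rating'].map without parentheses is a bound method, not the Series itself. A simpler way to do the job is: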

      import pandas as pd
      
      df = pd.DataFrame({'A': [1, 3, 4]})
      
      a = [x for _, x in df.iterrows() if x['A'] > 3]
      print(a)
      
      > [A]
        [4]
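
Applied to the list-valued column from the question, the same idea might look like the sketch below. It first reproduces the TypeError from the question, then iterates over the Series directly. The three-row Series is a shortened stand-in for appr['rating'], and the names s, error and counts are illustrative only. Note that each element of the Series is itself a list, so the comparison has to happen per rating; writing x > 3 with x being a whole list would raise a different TypeError.

```python
import pandas as pd

# Shortened stand-in for appr['rating'] from the question
s = pd.Series([[4, 4, 5, 4], [3, 3], [5, 4]])

# Without parentheses, s.map is a bound method, so iterating over it fails
error = None
try:
    [x for x in s.map]
except TypeError as exc:
    error = str(exc)
print(error)

# Iterating over the Series itself yields one list of ratings per row;
# each comparison is True/False, and sum() counts the True values
counts = [sum(r > 3 for r in ratings) for ratings in s]
print(counts)  # [4, 0, 2]
```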
      

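      【Comments】:

        【Solution 3】:

        A better idea is to avoid a Series of lists altogether. Instead, you can either:

        1. Expand your Series of lists into additional columns.
        2. Expand your Series of lists into multiple rows.

        Both options allow vectorized computation. Taking the first option: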

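        # Expand each rating list into its own column; pop() also removes 'rating' from df in place
        rats = pd.DataFrame(df.pop('rating').values.tolist()).add_suffix('rat')
        # NaN (padding for shorter lists) compares as False, so only real ratings above 3 are counted
        appr = appr.join(rats).assign(appr_sum=rats.gt(3).sum(axis=1))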
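
The second option, expanding into multiple rows, is not shown above. A rough sketch of it, assuming a pandas version that provides Series.explode (0.25 or later) and that every rating list is non-empty, could look like this. The three-row DataFrame and the name exploded are illustrative only.

```python
import pandas as pd

# Small illustrative subset of the data from the question
df = pd.DataFrame({
    'movie_id': [1, 2, 7],
    'rating': [[4, 4, 5, 4], [3, 3], [2, 4]],
})

# One row per individual rating; the original index label is repeated
# for every element that came from the same list
exploded = df['rating'].explode().astype(int)

# Vectorized comparison, then count the True values per original row
df['appr_sum'] = exploded.gt(3).groupby(level=0).sum()

print(df['appr_sum'].tolist())  # [4, 0, 1]
```

Because the exploded Series keeps the original index labels, grouping on level 0 brings the counts back into line with the rows of df. An empty list would explode to NaN and make astype(int) fail, which is why the sketch assumes non-empty lists.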
        

        【Comments】:

          【Solution 4】:

          You can also use the apply method on your rating column:

          appr['appr_sum'] = \
          appr['rating'].apply(lambda ratings: len([x for x in ratings if x > 3]))
          print(appr)
          
             movie_id       user_id        rating  appr_sum
          0         1  [5, 2, 1, 6]  [4, 4, 5, 4]         4
          1         2        [5, 1]        [3, 3]         0
          2         3           [1]           [4]         1
          3         4           [1]           [3]         0
          4         5           [1]           [3]         0
          5         6           [1]           [5]         1
          6         7        [6, 1]        [2, 4]         1
          7         8        [1, 6]        [1, 4]         1
          8         9        [1, 6]        [5, 4]         2
          

          【Comments】:
