【Question title】: Assign same index number to duplicated values up to n duplicates
【Posted】: 2020-06-19 12:57:49
【Question description】:

I have the following column, in which values are repeated an arbitrary number of times:

FRUIT
Apples
Bananas
Bananas
Pear
Pear
Pear
Pear
Melon
Melon
Melon
Melon
Melon
Melon
Orange
Orange
Orange
Orange
Orange
Orange
Orange
Orange
Orange

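I want to assign an index number to each value, but for duplicated values I want to repeat that index number at most 4 times. If a value appears 10 times, I want the index repeated for the first four occurrences, then index + 1 for the next four, and so on. For example: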

Index    FRUIT
1        Apples
2        Bananas
2        Bananas
3        Pear
3        Pear
3        Pear
3        Pear
4        Melon
4        Melon
4        Melon
4        Melon
5        Melon
5        Melon
6        Orange
6        Orange
6        Orange
6        Orange
7        Orange
7        Orange
7        Orange
7        Orange
8        Orange

Here is my attempt:

import pandas as pd

fruit = {'FRUIT':['Apples','Bananas','Bananas','Pear','Pear','Pear','Pear','Melon','Melon','Melon','Melon','Melon','Melon','Orange','Orange','Orange','Orange','Orange','Orange','Orange','Orange','Orange']}
fruit_df = pd.DataFrame(fruit)

index = 0
index_and_fruit = []
for (columnName, columnData) in fruit_df.items():  # iteritems() was removed in pandas 2.0
    fruit_list = fruit_df['FRUIT'].tolist()
    index = index + 1
    for i in fruit_list:
        if fruit_list.count(i) >= 4:
            index = index + 1
            index_with_fruit_list = {i:index}
            index_and_fruit.append(index_with_fruit_list)
            if fruit_list.count(i) >= 8:
                index_with_fruit_list = {i:index}
                index_and_fruit.append(index_with_fruit_list)
        else: 
            index_with_fruit_list = {i:index}
            index_and_fruit.append(index_with_fruit_list)
            print(index_and_fruit)

【Question comments】:

    Tags: python indexing duplicates


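    【Solution 1】:

    You can use accumulate to form the groups by computing a relative index for each fruit within its group. This lets you set a maximum group size and reset the relative index whenever the fruit changes or the maximum is reached.

    With this grouping, you can assign sequential indexes based on the first item of each group (again using accumulate):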

    fruits = ['Apples','Bananas','Bananas','Pear','Pear','Pear','Pear','Melon','Melon',
              'Melon','Melon','Melon','Melon','Orange','Orange','Orange','Orange','Orange',
              'Orange','Orange','Orange','Orange']
    
    from itertools import accumulate
    
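    maxGroup = 4
    indexes  = range(len(fruits))
    # relative index of each fruit within its group:
    # resets to 0 on a new fruit or once the group holds maxGroup items
    byGroup  = accumulate(indexes,lambda i,f: (i+1)*(f>0 and i<maxGroup-1 and fruits[f-1]==fruits[f]))
    # every 0 starts a new group; the running count of starts, minus 1,
    # is the zero-based index
    indexes  = [i-1 for i in accumulate(int(g==0) for g in byGroup)]
    indexAndFruit = [(i,f) for i,f in zip(indexes,fruits)]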
    

    Output:

    for i,f in indexAndFruit: print(i,f)
    
    0 Apples
    1 Bananas
    1 Bananas
    2 Pear
    2 Pear
    2 Pear
    2 Pear
    3 Melon
    3 Melon
    3 Melon
    3 Melon
    4 Melon
    4 Melon
    5 Orange
    5 Orange
    5 Orange
    5 Orange
    6 Orange
    6 Orange
    6 Orange
    6 Orange
    7 Orange
    

    To illustrate how this works, here is what the byGroup iterator produces:

    [0, 0, 1, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 0, 1, 2, 3, 0, 1, 2, 3, 0]
    

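    Every position where the relative index restarts at zero corresponds either to a change of fruit or to the relative index reaching the maximum.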
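
As a rough sketch of what the lambda passed to accumulate does, the same relative index can be written as an explicit loop. The names `relative_indexes` and `max_group` are introduced here for illustration only and are not part of the original answer.

```python
def relative_indexes(fruits, max_group=4):
    """Position of each item within its group (hypothetical helper)."""
    result = []
    for pos, fruit in enumerate(fruits):
        # True when this item repeats the one right before it
        same_fruit = pos > 0 and fruits[pos - 1] == fruit
        if same_fruit and result[-1] < max_group - 1:
            # still room in the current group: continue counting
            result.append(result[-1] + 1)
        else:
            # new fruit, or the group is full: restart at 0
            result.append(0)
    return result


fruits = (['Apples', 'Bananas', 'Bananas'] + ['Pear'] * 4
          + ['Melon'] * 6 + ['Orange'] * 9)
print(relative_indexes(fruits))
```

The loop is longer than the accumulate call, but it makes the two reset conditions (fruit change, group full) visible as separate checks.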

    The zeros in the byGroup list therefore mark the start of each group. Flagging them as 1 and every other position as 0 gives:

    [1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
    

    Running a cumulative sum over these start flags gives one-based indexes:

    [1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8]
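
The two steps just described (flagging group starts, then taking a running total) can be reproduced in isolation. This is only a sketch that starts from the byGroup values listed earlier; the names `by_group`, `starts` and `one_based` are illustrative.

```python
from itertools import accumulate

# relative indexes as produced by the byGroup iterator
by_group = [0, 0, 1, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 0, 1, 2, 3, 0, 1, 2, 3, 0]

# 1 where a group starts (relative index 0), 0 elsewhere
starts = [int(g == 0) for g in by_group]

# running total of group starts = one-based group number
one_based = list(accumulate(starts))

print(starts)
print(one_based)
```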
    

    Subtracting 1 gives the desired zero-based indexes, which only need to be combined with the fruits (using zip):

    [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7]
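
For reuse, the whole pipeline could be wrapped in a function. The sketch below is one possible packaging, not the answer's own code: `capped_indexes`, `max_group` and `start` are hypothetical names, and `start=1` is assumed as the default because the question's expected output begins at 1.

```python
from itertools import accumulate


def capped_indexes(values, max_group=4, start=1):
    """Index per item; advances on a new value or after max_group repeats."""
    def step(rel, pos):
        # accumulate only calls this for pos >= 1, so values[pos - 1] is safe
        same = values[pos - 1] == values[pos]
        return rel + 1 if same and rel < max_group - 1 else 0

    # relative index of each item within its group
    by_group = accumulate(range(len(values)), step)
    # running count of group starts (one-based group number)
    group_numbers = accumulate(int(rel == 0) for rel in by_group)
    # shift so that the first group gets the index `start`
    return [n - 1 + start for n in group_numbers]


fruits = (['Apples', 'Bananas', 'Bananas'] + ['Pear'] * 4
          + ['Melon'] * 6 + ['Orange'] * 9)
for index, fruit in zip(capped_indexes(fruits), fruits):
    print(index, fruit)
```

Passing `start=0` gives the zero-based numbering shown above. Because the result has one entry per input row, it should be possible to assign it directly to a new column of the question's DataFrame.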
    

    【Comments】:

      【Solution 2】:

      My take, assuming the fruits are ordered (equal values are adjacent):

      fruits = ['Apples', 'Bananas', 'Bananas', 'Pear', 'Pear', 'Pear',
          'Pear', 'Melon', 'Melon', 'Melon', 'Melon', 'Melon', 'Melon',
          'Orange', 'Orange', 'Orange', 'Orange', 'Orange', 'Orange',
          'Orange', 'Orange', 'Orange']
      
      # The index for the next fruit.
      current_index = 0
      
      # The last fruit we've seen.
      last_fruit = None
      
      # The number of times we've assigned the current index to the last
      # fruit already.
      fruit_count = 0
      
      for fruit in fruits:
          if fruit != last_fruit or fruit == last_fruit and fruit_count >= 4:
              # This is either
              #   (a) a new fruit, or
              #   (b) a repeated fruit to which we've assigned the current
              #       index four times already.
              # In both cases, we want to skip to the next index.
              current_index += 1
              fruit_count = 0
      
          last_fruit = fruit
          fruit_count += 1
      
          print(current_index, fruit, f"(fruit_count={fruit_count})")
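
# A variant of the same idea, sketched with itertools.groupby so that the
# indexes are collected in a list instead of printed. The names
# `index_runs`, `max_repeats` and `first_index` are hypothetical; like the
# loop above, this assumes equal fruits are adjacent, so a fruit that
# reappears later in the list would start a new index.

```python
from itertools import groupby


def index_runs(fruits, max_repeats=4, first_index=1):
    """Give each run of equal items an index, repeated at most max_repeats times."""
    indexes = []
    next_index = first_index
    for _key, run in groupby(fruits):
        run_length = sum(1 for _item in run)
        for offset in range(run_length):
            # every block of max_repeats items inside the run gets the next index
            indexes.append(next_index + offset // max_repeats)
        # ceiling division: how many indexes this run consumed
        next_index += -(-run_length // max_repeats)
    return indexes


fruits = (['Apples', 'Bananas', 'Bananas'] + ['Pear'] * 4
          + ['Melon'] * 6 + ['Orange'] * 9)
print(index_runs(fruits))
```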
      

      【Comments】:
