【Question title】: How to create a 3D list of 2D lists that vary in size (the 2D lists vary in size) in Python?
【Posted】: 2021-08-24 13:00:52
【Question】:

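I have a dataframe like this. The column named filename shows which file each row belongs to: some rows come from one file, others from another. I also created another column that counts the total number of rows in each file.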
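The dataframe itself is not reproduced here, so the sketch below uses a small hypothetical frame with the column names mentioned in the question (`filename`, `ymin`, `ymax`). It shows two different kinds of "count": a per-row column (every row repeats the size of its file) and one count per file. The data and the name `rows_per_file` are illustrative assumptions, not the asker's code.

```python
import pandas as pd

# Hypothetical frame using the question's column names
df = pd.DataFrame({
    "filename": ["a.jpg", "a.jpg", "a.jpg", "b.jpg", "b.jpg"],
    "ymin": [4, 9, 76, 2, 2],
    "ymax": [43, 47, 122, 40, 44],
})

# Per-row column: each row gets the number of rows in its own file
df["count"] = df["filename"].map(df["filename"].value_counts())

# One count per file, in order of first appearance (sort=False keeps that order)
rows_per_file = df.groupby("filename", sort=False).size().tolist()

print(df["count"].tolist())  # [3, 3, 3, 2, 2]
print(rows_per_file)         # [3, 2]
```

Note that the per-row column repeats each value once per row, so it cannot be used directly as "one entry per file".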

I extracted ymin and ymax by concatenating them all into one list, which gives a 2D list:

y = [[4, 43], [9, 47], [76, 122], [30, 74], [10, 47], [81, 125], [84, 124], [47, 90], [1, 38],[2, 40], [2, 44], [4, 48], [5, 48], [6, 44], [8, 45], [75, 116], [73, 123], [28, 73], [39, 84], [84, 121], [2, 39],...]

So this just puts every coordinate pair into one list, with no way to tell which pairs belong to the first file and which to the second.

My approach is to build a 3D list like this:

y = [[[4, 43], [9, 47], [76, 122], [30, 74], [10, 47], [81, 125], [84, 124], [47, 90], [1, 38]],[[2, 40], [2, 44], [4, 48], [5, 48], [6, 44], [8, 45], [75, 116], [73, 123], [28, 73], [39, 84], [84, 121], [2, 39]],...]

You can see that [4, 43] through [1, 38] are in the same file, and [2, 40] through [2, 39] are also in the same file.
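If the flat list of pairs is already in file order and you have exactly one row count per file, the 3D list can be built by slicing at cumulative offsets. A minimal sketch, assuming a hypothetical `rows_per_file` list with one entry per file (not the per-row `count` column); `split_by_counts` is an illustrative helper name:

```python
from itertools import accumulate

def split_by_counts(pairs, rows_per_file):
    """Split a flat list of [ymin, ymax] pairs into one sub-list per file.

    rows_per_file is a hypothetical list with one row count per file,
    in the same order as the files appear in `pairs`.
    """
    if sum(rows_per_file) != len(pairs):
        raise ValueError("row counts do not add up to the number of pairs")
    # End offset of each file is the running total, e.g. [3, 2] -> [3, 5]
    ends = list(accumulate(rows_per_file))
    starts = [0] + ends[:-1]
    return [pairs[s:e] for s, e in zip(starts, ends)]

# Shortened version of the sample data: 3 pairs in one file, 2 in the next
y = [[4, 43], [9, 47], [76, 122], [2, 40], [2, 44]]
print(split_by_counts(y, [3, 2]))
# [[[4, 43], [9, 47], [76, 122]], [[2, 40], [2, 44]]]
```

The offsets accumulate, so two consecutive files with the same number of rows are still split correctly.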

Here is my current attempt:

def get_y_coordinate(y, count):
    """
    Create a 3D list of y coordinates that records which pairs belong to which file
    :param y: flat list of y values, e.g. [4, 43, 9, 47, ...]
    :param count: the list taken from the "count" column of the dataframe
    """
    c = 2  # Chunk size: each [ymin, ymax] pair has 2 values
    fi_y = lambda y, c: [y[i:i+c] for i in range(0, len(y), c)]  # Split y into chunks of 2, e.g. from [4, 43, 9, 47] to [4, 43], [9, 47]
    y = fi_y(y, c)  # Now y is [[4, 43], [9, 47], ...]

    # This is my current approach: I create a new list called bigy.

    bigy = []   
    current = 0 
    for i in count:
        if current != i:
            bigy.append([y[j] for j in range(current, i+current)])
        current = i
        
    return bigy
>> bigy = [[[4, 43], [9, 47], [76, 122], [30, 74], [10, 47], [81, 125], [84, 124], [47, 90], [1, 38]],[[2, 40], [2, 44], [4, 48], [5, 48], [6, 44], [8, 45], [75, 116], [73, 123], [28, 73], [39, 84], [84, 121], [2, 39]],...]

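This gives correct results for the first few hundred files, but from around file 700 onward it no longer works. If anyone has the patience to read this far and help me, I need another way to solve this problem. Thanks a lot!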
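A likely cause, assuming `count` is the per-row column (each row repeats its file's size): `current = i` stores the previous count rather than a running offset, so from the third file on the slice starts at the wrong position, and `if current != i` skips any file whose row count equals the previous file's. The small reproduction below runs the loop from the question on hypothetical data, where the labels "A", "B", "C" stand in for files:

```python
def original_split(y, count):
    # The loop from the question, with y passed in explicitly
    bigy = []
    current = 0
    for i in count:
        if current != i:
            bigy.append([y[j] for j in range(current, i + current)])
        current = i  # stores the last count, not a running offset
    return bigy

# Hypothetical data: file A has 2 rows, file B has 2 rows, file C has 1 row
y = [["A", 0], ["A", 1], ["B", 0], ["B", 1], ["C", 0]]
count = [2, 2, 2, 2, 1]  # per-row "count" column

result = original_split(y, count)
print(result)
# [[['A', 0], ['A', 1]], [['B', 0]]]
print(sum(len(group) for group in result), "of", len(y), "pairs kept")
# 3 of 5 pairs kept
```

No exception is raised because the slices never run past the end of `y`; pairs are silently lost, which matches the symptom of fewer tuples than expected. Grouping by the filename column avoids both problems.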

【Discussion】:

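Conceptually, you want an array of min/max tuples for each "file", right? Also, when you say it stops working after around 700, do you get an error, or is it just too slow?
Yes, that's what I want.
No error is shown; it just stops appending new values.
There are about 17000 tuples in total, but I only get 13000.
Does this answer your question? How to group dataframe rows into list in pandas groupby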

Tags: arrays python-3.x pandas list algorithm

【Solution 1】:

My first inclination would be to iterate over the dataframe and collect the results in a defaultdict. Something like this:

import collections
import pandas

mock_data = pandas.DataFrame([
    {"Name": "product_name", "ymin": 4, "ymax": 43},
    {"Name": "product_name", "ymin": 9, "ymax": 47},
    {"Name": "product_total_money", "ymin": 76, "ymax": 122},
    {"Name": "vat", "ymin": 30, "ymax": 74},
    {"Name": "product_name", "ymin": 10, "ymax": 47}
])

y_results = collections.defaultdict(list)
for _, row in mock_data.iterrows():
    y_results[row["Name"]].append((row["ymin"], row["ymax"]))

print(y_results)
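The mock data above groups by `Name`; the question groups rows by file, so the key would presumably be the `filename` column. A sketch under that assumption, using a hypothetical frame, which turns the defaultdict into the 3D list the question asks for:

```python
import collections
import pandas as pd

# Hypothetical frame using the question's column names
df = pd.DataFrame({
    "filename": ["a.jpg", "a.jpg", "a.jpg", "b.jpg", "b.jpg"],
    "ymin": [4, 9, 76, 2, 2],
    "ymax": [43, 47, 122, 40, 44],
})

groups = collections.defaultdict(list)
# itertuples is generally faster than iterrows for row-wise loops
for row in df.itertuples(index=False):
    groups[row.filename].append([row.ymin, row.ymax])

# Dicts keep insertion order (Python 3.7+), so files appear in first-seen order
bigy = list(groups.values())
print(bigy)
```

Because the key is the file name rather than a row count, files with equal row counts cannot be merged or skipped.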

Alternatively, you could try:

mock_data.groupby('Name').agg(lambda x: list(x))
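`agg` with `list` produces one list per column (all ymin values, then all ymax values), not [ymin, ymax] pairs. To get pairs per file, one option is to iterate the groupby and convert each group's two columns row by row. A sketch assuming a hypothetical frame keyed by `filename`; the rows are deliberately interleaved to show that grouping does not rely on each file's rows being contiguous:

```python
import pandas as pd

# Hypothetical frame; rows of the two files are interleaved on purpose
df = pd.DataFrame({
    "filename": ["a.jpg", "b.jpg", "a.jpg", "b.jpg", "b.jpg"],
    "ymin": [4, 2, 9, 2, 4],
    "ymax": [43, 40, 47, 44, 48],
})

# sort=False keeps groups in order of first appearance instead of sorting keys.
# Each group is a sub-frame; to_numpy().tolist() turns its rows into [ymin, ymax] lists.
bigy = [
    group[["ymin", "ymax"]].to_numpy().tolist()
    for _, group in df.groupby("filename", sort=False)
]
print(bigy)
# [[[4, 43], [9, 47]], [[2, 40], [2, 44], [4, 48]]]
```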

【Discussion】: