Efficiently iterating over 3.311031748 E+12 combinations in Python: answers

【Question title】: Efficiently iterating over 3.311031748 E+12 combinations in Python
【Posted】: 2019-11-24 00:02:22
【Question】:

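I've collected a large Pokémon dataset, and my goal is to find the "top 10 teams" according to a ratio I built: Pokémon BST (base stat total) to average weakness. For anyone interested, I compute average weakness as the sum of a Pokémon's damage multipliers against each type (0.25 flying + 1 water + 2 steel + 4 fire, etc.), divided by 18 (the total number of types in the game).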
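A minimal sketch of the ratio described above, assuming a Pokémon's 18 damage multipliers are given as a plain list. The function names `average_weakness` and `bst_weakness_ratio` are hypothetical, chosen only for illustration:

```python
def average_weakness(multipliers):
    """Mean damage multiplier over the 18 attacking types (sum / 18)."""
    if len(multipliers) != 18:
        raise ValueError("expected one multiplier per type (18 types)")
    return sum(multipliers) / 18


def bst_weakness_ratio(bst, multipliers):
    """Base stat total divided by average weakness; higher is better."""
    return bst / average_weakness(multipliers)


# Hypothetical Pokémon: neutral to 17 types and 2x weak to one of them.
multipliers = [1.0] * 17 + [2.0]
print(average_weakness(multipliers))         # 19/18, about 1.0556
print(bst_weakness_ratio(570, multipliers))  # 570 / (19/18), about 540
```

A resistance (0.25, 0.5) pulls the average below 1 and raises the ratio; a 4x weakness pushes it up and lowers the ratio.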

As a simple example, a team made up of the three Pokémon Kingler, Mimikyu and Magnezone yields a team ratio of 1604.1365384615383.

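Because the data will be used for competitive play, I removed every Pokémon that is not fully evolved, as well as legendary/mythical Pokémon. Here is my process so far: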

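Create the set of all possible team combinations of fully-evolved Pokémon
Iterate over each combination with a for loop
The first 10 combinations are added to the list automatically
From the 11th combination on, I append the current team to the list, sort the list in descending order, and remove the team with the lowest ratio. This guarantees that only the top 10 remain after each iteration.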
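The sort-then-drop step re-sorts the whole list for every combination. A common alternative, sketched here as a suggestion rather than the asker's code, is a size-10 min-heap from the standard-library `heapq` module: each new team is compared only with the current 10th-best score. The `team_scores` input is a hypothetical iterable of `(team, ratio)` pairs:

```python
import heapq
import itertools


def top_k(team_scores, k=10):
    """Keep the k highest-ratio (team, ratio) pairs from an iterable."""
    heap = []  # min-heap of (ratio, tiebreak, team); heap[0] is the current k-th best
    counter = itertools.count()  # tiebreaker so teams themselves are never compared
    for team, ratio in team_scores:
        item = (ratio, next(counter), team)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif ratio > heap[0][0]:
            heapq.heapreplace(heap, item)  # pop the smallest, push the new item
    return [(team, ratio) for ratio, _, team in sorted(heap, reverse=True)]


scores = [(("a",), 3.0), (("b",), 9.0), (("c",), 1.0), (("d",), 7.0)]
print(top_k(scores, k=2))  # [(('b',), 9.0), (('d',), 7.0)]
```

`heapq.nlargest(10, iterable, key=...)` does the same in one call. Either way every combination is still visited, so this removes the sorting overhead but not the 3.3e12-step loop itself.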

Obviously this process takes a very long time to run. I'd like to know whether there is a more efficient way to do it. Finally, here is my code:

import itertools
import pandas as pd

df = pd.read_csv("Downloads/pokemon.csv")  # read in csv of fully-evolved Pokemon data
# list(df)  # list of df column names - useful to see what data has been collected
df = df[df["is_legendary"] == 0]  # remove legendary pokemon - many legendaries are allowed in competitive play
df = df[['abilities',  # trim df to contain only the columns we care about
        'against_bug',
        'against_dark',
        'against_dragon',
        'against_electric',
        'against_fairy',
        'against_fight',
        'against_fire',
        'against_flying',
        'against_ghost',
        'against_grass',
        'against_ground',
        'against_ice',
        'against_normal',
        'against_poison',
        'against_psychic',
        'against_rock',
        'against_steel',
        'against_water',
        'attack',
        'defense',
        'hp',
        'name',
        'sp_attack',
        'sp_defense',
        'speed',
        'type1',
        'type2']]
df["bst"] = df["hp"] + df["attack"] + df["defense"] + df["sp_attack"] + df["sp_defense"] + df["speed"]  # calculate BSTs
df['average_weakness'] = (df['against_bug'] # calculates a Pokemon's 'average weakness' to other types
                        + df['against_dark']
                        + df['against_dragon']
                        + df['against_electric']
                        + df['against_fairy']
                        + df['against_fight']
                        + df['against_fire']
                        + df['against_flying']
                        + df['against_ghost']
                        + df['against_grass']
                        + df['against_ground']
                        + df['against_ice']
                        + df['against_normal']
                        + df['against_poison']
                        + df['against_psychic']
                        + df['against_rock']
                        + df['against_steel']
                        + df['against_water']) / 18  
df['bst-weakness-ratio'] = df['bst'] / df['average_weakness']  # ratio of BST:avg weakness - the higher the better
names = df["name"]  # pull out list of all names for creating combinations
combinations = itertools.combinations(names, 6) # create all possible combinations of 6 pokemon teams
top_10_teams = []  # list for storing top 10 teams
for x in combinations:
    ratio = sum(df.loc[df['name'].isin(x)]['bst-weakness-ratio'])  # pull out sum of team's ratio
    if len(top_10_teams) < 10:
        top_10_teams.append((x, ratio))  # first 10 teams will automatically populate list
    else:
        top_10_teams.append((x, ratio))  # add team to list
        top_10_teams.sort(key=lambda team: team[1], reverse=True)  # sort list by descending ratios
        del top_10_teams[-1]  # drop team with the lowest ratio - only top 10 remain in list
print(top_10_teams)
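Two pandas idioms could shorten and speed up the code above. They are shown here on a toy DataFrame with made-up values (not the asker's CSV), assuming Pokémon names are unique: because the trimmed frame contains exactly the 18 `against_*` columns, `df.filter(like="against_").mean(axis=1)` equals their sum divided by 18; and indexing the ratio Series by name once avoids scanning the whole frame with `df['name'].isin(x)` for every combination. `TYPES` is a hypothetical helper list that mirrors the question's column names.

```python
import pandas as pd

TYPES = ["bug", "dark", "dragon", "electric", "fairy", "fight", "fire", "flying",
         "ghost", "grass", "ground", "ice", "normal", "poison", "psychic", "rock",
         "steel", "water"]

# Toy data: three made-up Pokémon, neutral to everything except one weakness each.
data = {"name": ["alpha", "beta", "gamma"], "bst": [500, 600, 450]}
data.update({f"against_{t}": [1.0, 1.0, 1.0] for t in TYPES})
df = pd.DataFrame(data)
df.loc[0, "against_fire"] = 2.0
df.loc[1, "against_ice"] = 4.0

# Mean over the 18 against_* columns == their sum / 18.
df["average_weakness"] = df.filter(like="against_").mean(axis=1)
df["bst-weakness-ratio"] = df["bst"] / df["average_weakness"]

# Build a name -> ratio lookup once, then index it per team.
ratio_by_name = df.set_index("name")["bst-weakness-ratio"]
team = ("alpha", "gamma")
print(ratio_by_name.loc[list(team)].sum())  # about 923.68
```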

【Question comments】:

How long does it take if you limit it to combinations of 2? Or even 1, if possible?
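For scale, `math.comb` (Python 3.8+) gives the number of teams directly. The figure in the title is exactly C(368, 6), which suggests the filtered dataset holds 368 Pokémon; that count is inferred from the arithmetic, not stated in the question. The one-microsecond-per-team rate below is likewise a hypothetical round number, not a measurement.

```python
import math

N_POKEMON = 368  # inferred: math.comb(368, 6) equals the count in the title

for team_size in (1, 2, 6):
    print(team_size, math.comb(N_POKEMON, team_size))
# 1 368
# 2 67528
# 6 3311031748024

# At a hypothetical rate of one microsecond per team, a full pass over
# all 6-member teams would take roughly 38 days.
days = math.comb(N_POKEMON, 6) * 1e-6 / 86_400
print(f"{days:.1f} days")  # 38.3 days
```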
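Well, I don't know much about Pokémon, but the first thing I'd say is that there must be a way to avoid iterating over every possible combination of six Pokémon (six Fire-type Pokémon will certainly not make the top 10 teams, for example), so you could first try to think of a way to get a subset of the 3.31e12 combinations you currently have! Then I'd suggest splitting the possible combinations into smaller groups (so you don't run into memory errors) and trying to vectorize what you want to do with NumPy arrays instead of pandas DataFrames.
How did it go, did you find the perfect team?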
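A sketch of the chunk-and-vectorize suggestion, assuming the per-Pokémon ratios are already in a 1-D NumPy array. The function name `top_teams_chunked` and the `chunk_size` default are illustrative. It cuts the per-team overhead (one NumPy sum per chunk instead of a DataFrame lookup per team), but it still enumerates every combination in Python, so on its own it does not make 3.3e12 teams practical; it works best combined with shrinking the candidate set first.

```python
import heapq
import itertools

import numpy as np


def top_teams_chunked(ratios, team_size=6, k=10, chunk_size=100_000):
    """Score index combinations chunk by chunk with NumPy and keep the k best.

    ratios: 1-D sequence of per-Pokémon bst-weakness ratios.
    Returns a list of (score, team_indices) sorted from best to worst.
    """
    ratios = np.asarray(ratios, dtype=float)
    combos = itertools.combinations(range(len(ratios)), team_size)
    best = []  # min-heap holding at most k (score, team_indices) pairs
    while True:
        chunk = list(itertools.islice(combos, chunk_size))
        if not chunk:
            break
        idx = np.array(chunk)             # shape (len(chunk), team_size)
        scores = ratios[idx].sum(axis=1)  # one vectorized sum per team
        m = min(k, len(scores))
        # Indices of the m highest scores in this chunk (in no particular order).
        for i in np.argpartition(scores, -m)[-m:]:
            item = (float(scores[i]), tuple(int(j) for j in idx[i]))
            if len(best) < k:
                heapq.heappush(best, item)
            else:
                heapq.heappushpop(best, item)  # push, then drop the smallest
    return sorted(best, reverse=True)


print(top_teams_chunked([5, 1, 4, 2, 3, 9, 0.5, 7], team_size=3, k=2, chunk_size=5))
# [(21.0, (0, 5, 7)), (20.0, (2, 5, 7))]
```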

Tags: python loops bigdata combinations

【Solution 1】:

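In your example, each Pokémon has a `bst-weakness-ratio`, and when computing a team's value you don't account for members covering each other's weaknesses; you simply sum the ratios of the 6 members? If so, shouldn't the best team just be the one made of the 6 best individual Pokémon? I don't see why you need the combinations.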
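If the team score really is just the sum of the members' individual ratios, the search can be cut down drastically. A sketch under that assumption (the function `top_k_teams` and its parameters are hypothetical): any team in the top k can only use the best `team_size + k - 1` individuals, because a team containing a worse member could swap it for at least k better unused ones. For 6-member teams and k = 10 that leaves 15 candidates and only C(15, 6) = 5005 combinations.

```python
import heapq
import itertools


def top_k_teams(ratios, team_size=6, k=10):
    """Return the k best (team, score) pairs when a team's score is the sum of
    its members' individual ratios.

    ratios: dict mapping Pokémon name -> bst-weakness ratio.
    """
    # A team that uses anyone outside the best (team_size + k - 1) individuals
    # is beaten or matched by at least k other teams: at most team_size - 1 of
    # those better individuals are already on it, so at least k are free to swap in.
    pool_size = team_size + k - 1
    pool = heapq.nlargest(pool_size, ratios.items(), key=lambda item: item[1])
    teams = (
        (tuple(name for name, _ in combo), sum(r for _, r in combo))
        for combo in itertools.combinations(pool, team_size)
    )
    return heapq.nlargest(k, teams, key=lambda team: team[1])


toy = {f"p{i}": float(i) for i in range(20)}
print(top_k_teams(toy, team_size=3, k=2))
# [(('p19', 'p18', 'p17'), 54.0), (('p19', 'p18', 'p16'), 53.0)]
```

As soon as the score includes interactions between members (for example, shared weaknesses), this shortcut no longer holds and a real combinatorial search is needed.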

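Still, I think you could remove a lot of Pokémon from your list before getting into the combinatorics. If you have a boolean array of shape (n_pokemons, n_types) with True marking each Pokémon's weaknesses, you can check whether another Pokémon has the same weaknesses (or a subset of them) but a better bst value.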

# Loop over all pokemon and check if there are other pokemon
# ... with the exact same weaknesses but better stats
#                    -name      -weaknesses           -bst
#                    pokemon A  [0, 0, 1, 1, 0, ...], bst=34.85  -> delete A
#                    pokemon B  [0, 0, 1, 1, 0, ...], bst=43.58
# ... with a subset of the weaknesses and better stats
#                    pokemon A  [0, 0, 1, 1, 0, ...], bst=34.85  -> delete A
#                    pokemon B  [0, 0, 1, 0, 0, ...], bst=43.58
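The pairwise check described above can also be expressed with NumPy broadcasting instead of nested loops with `np.delete`. A sketch, assuming a boolean weakness matrix and a BST array as in this answer; `non_dominated_mask` is a hypothetical name, and breaking ties between identical weakness sets with identical BST by keeping the lower index is an arbitrary choice. Memory grows as n² × n_types booleans, which is small for a few hundred Pokémon.

```python
import numpy as np


def non_dominated_mask(weakness, bst):
    """Return a boolean mask of Pokémon that no other Pokémon dominates.

    j dominates i when j's weaknesses are a subset of i's and j has a higher
    BST; for identical weakness sets with equal BST, the lower index wins.
    """
    weakness = np.asarray(weakness, dtype=bool)
    bst = np.asarray(bst, dtype=float)
    idx = np.arange(len(bst))
    # subset[j, i] is True when every weakness of j is also a weakness of i.
    subset = ~np.any(weakness[:, None, :] & ~weakness[None, :, :], axis=2)
    same_set = subset & subset.T
    higher = bst[:, None] > bst[None, :]
    tie_lower_index = (bst[:, None] == bst[None, :]) & (idx[:, None] < idx[None, :])
    dominates = subset & (higher | (same_set & tie_lower_index))
    return ~np.any(dominates, axis=0)  # column i survives if no row j dominates it


mask = non_dominated_mask([[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]], [50, 40, 30, 60])
print(mask.tolist())  # [False, False, True, True]
```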

I wrote a small NumPy snippet. The bst values and the weaknesses are chosen at random. With my settings

n_pokemons = 1000
n_types = 18
n_min_weaknesses = 1  # number of minimal and maximal weaknesses for each Pokemon 
n_max_weaknesses = 4

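only about 30-40 Pokémon are left in the list. I'm not sure how realistic that is for "real" Pokémon, but with a number like that the combinatorial search becomes much more feasible.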

import numpy as np
# Generate pokemons
name_arr = np.array(['pikabra_{}'.format(i) for i in range(n_pokemons)])
# Random stats
bst_arr = np.random.random(n_pokemons) * 100
# Random weaknesses 
weakness_array = np.zeros((n_pokemons, n_types), dtype=bool)  # bool array indicating the weak types of each pokemon
for i in range(n_pokemons):
    rnd_weaknesses = np.random.choice(np.arange(n_types), np.random.randint(n_min_weaknesses, n_max_weaknesses+1))
    weakness_array[i, rnd_weaknesses] = True


# Remove unnecessary pokemons
i = 0
while i < n_pokemons:
    j = i + 1
    while j < n_pokemons:
        del_idx = None

        combined_weaknesses = np.logical_or(weakness_array[i], weakness_array[j])
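        if np.all(weakness_array[i] == weakness_array[j]):
            # identical weaknesses: drop the one with the lower bst
            if bst_arr[j] > bst_arr[i]:
                del_idx = i
            else:
                del_idx = j

        elif np.all(combined_weaknesses == weakness_array[i]) and bst_arr[j] > bst_arr[i]:
            # j's weaknesses are a subset of i's and j has the higher bst -> drop i
            del_idx = i

        elif np.all(combined_weaknesses == weakness_array[j]) and bst_arr[i] > bst_arr[j]:
            # i's weaknesses are a subset of j's and i has the higher bst -> drop j
            del_idx = j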

        if del_idx is not None:
            name_arr = np.delete(name_arr, del_idx, axis=0)
            bst_arr = np.delete(bst_arr, del_idx, axis=0)
            weakness_array = np.delete(weakness_array, del_idx, axis=0)
            n_pokemons -= 1

            if del_idx == i:
                i -= 1
                break
            else:
                j -= 1

        j += 1
    i += 1


print(n_pokemons)

【Discussion】: