Indexing or sorting a dataframe using a list: answers

【Question title】: Indexing or sorting a dataframe using a list
【Posted】: 2021-11-04 15:22:30
【Question】:

I want to sort a dataframe according to a list. The dataframe has a column of unique IDs, and I have a list of IDs.

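Note: the list does not contain every ID. I tried df.loc, but it only returns the rows whose IDs are in the list, so the other rows are lost.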
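The limitation of df.loc can be seen in a minimal reproduction. This sketch assumes pandas 1.0 or later, where a list passed to .loc that contains a label missing from the index raises a KeyError; the label "999" and the variable name `subset` are purely illustrative.

```python
import pandas as pd

ratings_dict = {
    "ID": ["101", "102", "103", "104", "105"],
    "title": ["TV", "AC", "Monitor", "Headphone", "Laptop"],
    "rating": [1, 2, 2, 3, 2],
}
df = pd.DataFrame(ratings_dict).set_index("ID")
trend_sort = ["103", "101"]

# .loc with a list of labels returns ONLY those labels, in list order;
# the other three IDs are silently dropped.
subset = df.loc[trend_sort]
print(subset.index.tolist())  # ['103', '101']

# A label that is missing from the index makes .loc raise a KeyError.
try:
    df.loc[["103", "999"]]  # "999" is a hypothetical, absent ID
except KeyError as exc:
    print("KeyError:", exc)
```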

Sample code:

import pandas as pd
ratings_dict = {
    "ID": ["101", "102", "103", "104", "105"],
    "title": ['TV', 'AC', 'Monitor', 'Headphone', 'Laptop'],
    "rating": [1, 2, 2, 3, 2]
}

df = pd.DataFrame(ratings_dict)

trend_sort=["103","101"]

trend_sort is the list of IDs.

df.set_index('ID',inplace=True)
df=df.loc[trend_sort]

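After df.loc, the output contains only the rows for IDs 103 and 101; the other IDs are dropped.

Expected output: rows 103 and 101 first, followed by the remaining rows (102, 104, 105).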

【Question discussion】:

Tags: python python-3.x pandas list dataframe

【Solution 1】:

The only thing I can think of is to build a new list like this:

sorted_list = trend_sort + [i for i in df.index.tolist() if i not in trend_sort]

Then:

df = df.loc[sorted_list]

Output:

    title   rating
ID      
103 Monitor   2
101 TV        1
102 AC        2
104 Headphone 3
105 Laptop    2
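One small refinement to the approach above, offered as a sketch rather than as the answerer's code: `i not in trend_sort` scans the whole list for every index value, so converting the list to a set first keeps each membership test constant-time. The name `wanted` is an illustrative choice; the frame here is rebuilt with ID already set as the index, as in the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"title": ["TV", "AC", "Monitor", "Headphone", "Laptop"],
     "rating": [1, 2, 2, 3, 2]},
    index=pd.Index(["101", "102", "103", "104", "105"], name="ID"),
)
trend_sort = ["103", "101"]

# Membership tests against a list cost O(len(trend_sort)) each; a set makes
# them O(1), which matters when both the frame and the list are large.
wanted = set(trend_sort)
sorted_list = trend_sort + [i for i in df.index if i not in wanted]
result = df.loc[sorted_list]
print(result.index.tolist())  # ['103', '101', '102', '104', '105']
```

The list order is unchanged; only the lookup structure differs.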

【Discussion】:

Thanks for the solutions. All of them work well, but this one is the most time-efficient.

【Solution 2】:

Just append the rest of the index: no sorting of values, no mapping, no lambda:

trend_sort=["103","101"]
new_idx = pd.Index(trend_sort).append(df.index.difference(trend_sort))
df.loc[new_idx]

         title  rating
      <object> <int64>
103    Monitor       2
101         TV       1
102         AC       2
104  Headphone       3
105     Laptop       2
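One detail worth knowing about this approach: Index.difference sorts its result by default, so the leftover rows come back in sorted label order rather than in their original order. With the question's data the two orders happen to coincide. The sketch below uses a hypothetical, deliberately unsorted index to show the difference, and contrasts it with a boolean mask built from Index.isin, which keeps the original order.

```python
import pandas as pd

# Hypothetical frame whose index is NOT already in sorted order.
df = pd.DataFrame({"title": ["Laptop", "TV", "AC"]},
                  index=pd.Index(["105", "101", "102"], name="ID"))
trend_sort = ["101"]

# Index.difference sorts its result by default, so the leftover labels
# come back as 102, 105 rather than in their original order 105, 102.
by_difference = pd.Index(trend_sort).append(df.index.difference(trend_sort))
print(by_difference.tolist())  # ['101', '102', '105']

# A boolean mask keeps the leftover labels in their original order.
by_mask = pd.Index(trend_sort).append(df.index[~df.index.isin(trend_sort)])
print(by_mask.tolist())  # ['101', '105', '102']
```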

【Discussion】:

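I think I beat you to this solution. :-) reindex and loc behave almost identically here. reindex is more flexible with labels that are not in the index: it fills them with rows of NaN, whereas loc raises a KeyError.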
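The reindex/loc difference mentioned in the comment can be checked with a small sketch. The ID "999" is a hypothetical label chosen because it is absent from the frame; the behaviour shown assumes pandas 1.0 or later.

```python
import pandas as pd

df = pd.DataFrame({"title": ["TV", "AC", "Monitor"]},
                  index=pd.Index(["101", "102", "103"], name="ID"))
order = ["103", "999", "101"]  # "999" is deliberately absent from df

# reindex inserts a row of NaN for the missing label instead of failing.
reindexed = df.reindex(order)
print(reindexed)

# .loc with the same list raises a KeyError because "999" is not in the index.
loc_failed = False
try:
    df.loc[order]
except KeyError:
    loc_failed = True
    print("loc raised KeyError")
```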

【Solution 3】:

I'm not sure exactly what you want, but if you want the rows whose IDs are in trend_sort to come first in the table, you can do the following:

import pandas as pd
ratings_dict = {
    "ID": ["101", "102", "103", "104", "105"],
    "title": ['TV', 'AC', 'Monitor', 'Headphone', 'Laptop'],
    "rating": [1, 2, 2, 3, 2]
}

df = pd.DataFrame(ratings_dict)

trend_sort=["103","101"]

def get_sort_value(item,trend_l):
    if item in trend_l:
        return trend_l.index(item)
    return len(trend_l)+1

df['sort_column'] = df['ID'].apply(lambda x: get_sort_value(x,trend_sort))
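# kind='stable' keeps the unlisted IDs (which share the same key) in their original order
df = df.sort_values(by=['sort_column'], kind='stable')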
df = df.drop(columns=['sort_column'])
print(df)
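The same idea can be written without a per-row Python function. This is a sketch, not the answerer's code: it looks each ID up in a dictionary of positions (the name `rank` is illustrative), gives unlisted IDs a key larger than any real position, and uses a stable sort so the unlisted rows keep their original relative order.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["101", "102", "103", "104", "105"],
    "title": ["TV", "AC", "Monitor", "Headphone", "Laptop"],
    "rating": [1, 2, 2, 3, 2],
})
trend_sort = ["103", "101"]

# Vectorised counterpart of get_sort_value: IDs in trend_sort map to their
# position, other IDs map to NaN, which is then replaced by len(trend_sort).
rank = {v: i for i, v in enumerate(trend_sort)}
sort_key = df["ID"].map(rank).fillna(len(trend_sort))

# kind="stable" guarantees that rows sharing the same key keep their
# original relative order; .loc then selects rows by the sorted labels.
result = df.loc[sort_key.sort_values(kind="stable").index]
print(result["ID"].tolist())  # ['103', '101', '102', '104', '105']
```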

【Discussion】:

【Solution 4】:

Another solution, using the key= parameter of .sort_values (available since pandas 1.1.0):

df = df.sort_values(
    by="ID", key=lambda x: x.map({v: i for i, v in enumerate(trend_sort)})
)
print(df)

Prints:

    ID      title  rating
2  103    Monitor       2
0  101         TV       1
1  102         AC       2
3  104  Headphone       3
4  105     Laptop       2
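To see why the unlisted IDs end up at the bottom, it helps to look at what the key= function returns: listed IDs get their position, and every other ID becomes NaN. sort_values places NaN last by default (na_position="last"). The sketch below prints that intermediate Series; the names `positions` and `keys` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["101", "102", "103", "104", "105"],
    "title": ["TV", "AC", "Monitor", "Headphone", "Laptop"],
    "rating": [1, 2, 2, 3, 2],
})
trend_sort = ["103", "101"]
positions = {v: i for i, v in enumerate(trend_sort)}

# This is what the key= function hands back to sort_values: listed IDs get
# their position in trend_sort, unlisted IDs become NaN.
keys = df["ID"].map(positions)
print(keys.tolist())  # [1.0, nan, 0.0, nan, nan]

# NaN keys go last by default; na_position="first" would put them on top.
result = df.sort_values(by="ID", key=lambda s: s.map(positions))
print(result["ID"].tolist())  # ['103', '101', '102', '104', '105']
```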

【Discussion】:

【Solution 5】:

You can do this:

df.reindex(pd.Index(trend_sort).append(df.index[~df.index.isin(trend_sort)]))

Output:

         title  rating
103    Monitor       2
101         TV       1
102         AC       2
104  Headphone       3
105     Laptop       2

【Discussion】:

【Solution 6】:

You can first find the rank of each ID, then sort by that rank:

# to optimize the rank look up, store the rank / indices in a dictionary
rank = {v: i for i, v in enumerate(trend_sort)}
rank
# {'103': 0, '101': 1}

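# map each ID to its rank; IDs not in trend_sort default to len(df),
# so they are sorted to the end
# argsort returns positions, so index with .iloc; a stable sort keeps ties in original order
df.iloc[df.ID.map(lambda x: rank.get(x, len(df))).argsort(kind="stable")]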

    ID      title  rating
2  103    Monitor       2
0  101         TV       1
1  102         AC       2
3  104  Headphone       3
4  105     Laptop       2
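A note on why positional indexing matters here: Series.argsort returns integer positions, not index labels. With the default RangeIndex the two coincide, so df.loc happens to work, but on a frame with any other index it would select the wrong rows or raise. The sketch below uses a hypothetical frame with index labels 10, 20, 30 to show the positional version working.

```python
import pandas as pd

# Hypothetical frame whose index labels are NOT 0..n-1.
df = pd.DataFrame(
    {"ID": ["101", "102", "103"], "title": ["TV", "AC", "Monitor"]},
    index=[10, 20, 30],
)
trend_sort = ["103", "101"]
rank = {v: i for i, v in enumerate(trend_sort)}

# argsort yields POSITIONS (here 2, 0, 1), so they must go through .iloc;
# df.loc with the same values would look for labels 2, 0, 1, which do not exist.
positions = df.ID.map(lambda x: rank.get(x, len(df))).argsort(kind="stable")
result = df.iloc[positions.to_numpy()]
print(result["ID"].tolist())  # ['103', '101', '102']
```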

【Discussion】: