【Question title】: Merge the data created in a loop (python)
【Posted】: 2021-06-07 22:11:14
【Question】:

I have a simple dataset:

import pandas as pd
data = [['A', 10,16], ['B', 15,11], ['C', 14,8]] 
df = pd.DataFrame(data, columns = ['Name', 'Apple','Pear']) 

Output
  Name  Apple  Pear
0    A     10    16
1    B     15    11
2    C     14     8

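I want to rank the differences in the counts of each fruit (apples and pears). The rules:

  1. For each fruit, compute the difference in count between every pair of places
  2. Rank the differences per place. The closer two places are in count, the lower the rank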
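
As a minimal sketch of the first rule for a single column, the pairwise differences can be computed with NumPy broadcasting. The variable names `values` and `pairwise` are illustrative only and are not part of the code in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A", 10, 16], ["B", 15, 11], ["C", 14, 8]],
    columns=["Name", "Apple", "Pear"],
)

values = df["Apple"].to_numpy()
# values has shape (3,) and values[:, None] has shape (3, 1).
# Broadcasting the two gives a (3, 3) matrix of all pairwise differences.
dif = np.abs(values - values[:, None])

# Label rows and columns with the place names to make the matrix readable.
pairwise = pd.DataFrame(dif, index=df["Name"], columns=df["Name"])
print(pairwise)
```

For place A, the difference to C (4) is smaller than the difference to B (5), so C gets rank 1 and B gets rank 2 under the second rule.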
# apple
dif = abs(df['Apple'].values - df['Apple'].values[:, None])
df_apple  = pd.concat((df['Name'], pd.DataFrame(dif, columns = df['Name'])), axis=1)
df_apple1 = pd.melt(df_apple, id_vars = ['Name'], value_name='Difference_apple')
df_apple1 = df_apple1[df_apple1.Difference_apple != 0]
df_apple1['Ranking_apple'] = df_apple1.groupby('variable')['Difference_apple'].rank(method = 'dense', ascending = True)
df_apple1 = df_apple1[["variable","Name","Ranking_apple"]]
df_apple1

# Output - apple
    variable    Name    Ranking_apple
1   A   B   2.0
2   A   C   1.0
3   B   A   2.0
5   B   C   1.0
6   C   A   2.0
7   C   B   1.0
# pear
dif = abs(df['Pear'].values - df['Pear'].values[:, None])
df_pear  = pd.concat((df['Name'], pd.DataFrame(dif, columns = df['Name'])), axis=1)
df_pear1 = pd.melt(df_pear, id_vars = ['Name'], value_name='Difference_pear')
df_pear1 = df_pear1[df_pear1.Difference_pear != 0]
df_pear1['Ranking_pear'] = df_pear1.groupby('variable')['Difference_pear'].rank(method = 'dense', ascending = True)
df_pear1 = df_pear1[["variable","Name","Ranking_pear"]]
df_pear1

# output-pear
    variable    Name    Ranking_pear
1   A   B   1.0
2   A   C   2.0
3   B   A   2.0
5   B   C   1.0
6   C   A   2.0
7   C   B   1.0

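That is the algorithm for each fruit. Because the logic is the same, I could run it in a loop over the fruits. I am not sure how to merge the two parts, because I need the final output to look like this: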

new_df = pd.merge(df_apple1, df_pear1,  how='inner', left_on=['variable','Name'], right_on = ['variable','Name'])

new_df = new_df[["variable","Name","Ranking_apple","Ranking_pear"]]

new_df

# output
variable    Name    Ranking_apple   Ranking_pear
0   A   B   2.0 1.0
1   A   C   1.0 2.0
2   B   A   2.0 2.0
3   B   C   1.0 1.0
4   C   A   2.0 2.0
5   C   B   1.0 1.0

I appreciate any ideas. Thanks.

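【Comments】:

  • What is the problem? It seems you already have the expected output. Are you just trying to generalize it?
  • Yes, I want to use one algorithm for multiple columns. Thanks.
  • Great, I hope the answer meets your needs.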

Tags: python pandas dataframe loops merge


【Solution 1】:

If you want to generalize your approach to any number of fruits, you can do the following:

import pandas as pd

data = [['A', 10,16], ['B', 15,11], ['C', 14,8]]
df = pd.DataFrame(data, columns = ['Name', 'Apple','Pear'])

# all fruit columns (every column except 'Name')
final = pd.DataFrame()
fruitcols = df.columns.values.tolist()
fruitcols.remove('Name')
for col in fruitcols:
    # pairwise absolute differences between places, via broadcasting
    dif = abs(df[col].values - df[col].values[:, None])
    diff_col = 'Difference_{}'.format(col)
    rank_col = 'Ranking_{}'.format(col)
    df_frt  = pd.concat((df['Name'], pd.DataFrame(dif, columns = df['Name'])), axis=1)
    df_frt1 = pd.melt(df_frt, id_vars = ['Name'], value_name=diff_col)

    # drop zero differences (the self-pairs); copy to avoid SettingWithCopyWarning
    df_frt1 = df_frt1[df_frt1[diff_col] != 0].copy()
    df_frt1[rank_col] = df_frt1.groupby('variable')[diff_col].rank(method = 'dense', ascending = True)
    df_frt1 = df_frt1[["variable","Name",rank_col]]
    # every frame keeps the same row index, so concat places them side by side
    final = pd.concat([final, df_frt1], axis=1)

# keep only the first copy of the repeated 'variable' and 'Name' columns
final = final.loc[:,~final.columns.duplicated()]
final


    variable    Name    Ranking_Apple   Ranking_Pear
1   A           B       2.0             1.0
2   A           C       1.0             2.0
3   B           A       2.0             2.0
5   B           C       1.0             1.0
6   C           A       2.0             2.0
7   C           B       1.0             1.0
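
A possible variant, offered only as a sketch: instead of relying on every per-fruit frame having the same row index, each frame can be merged on the `variable` and `Name` keys, which mirrors the `pd.merge` call in the question. The helper name `rank_pairwise` and the temporary column name `diff` are assumptions made for this illustration and do not come from the answer above.

```python
from functools import reduce

import numpy as np
import pandas as pd


def rank_pairwise(df, col, key="Name"):
    """Hypothetical helper: for each place, rank the other places by closeness in `col`."""
    values = df[col].to_numpy()
    dif = np.abs(values - values[:, None])  # pairwise differences via broadcasting
    wide = pd.DataFrame(dif, columns=df[key].to_numpy())
    wide.insert(0, key, df[key].to_numpy())
    long = wide.melt(id_vars=[key], var_name="variable", value_name="diff")
    # As in the answer above, zero differences (the self-pairs) are dropped.
    long = long[long["diff"] != 0].copy()
    rank_col = f"Ranking_{col}"
    long[rank_col] = long.groupby("variable")["diff"].rank(method="dense")
    return long[["variable", key, rank_col]]


df = pd.DataFrame(
    [["A", 10, 16], ["B", 15, 11], ["C", 14, 8]],
    columns=["Name", "Apple", "Pear"],
)

fruit_cols = [c for c in df.columns if c != "Name"]
frames = [rank_pairwise(df, c) for c in fruit_cols]

# Merge the per-fruit frames one after another on the shared key columns.
result = reduce(
    lambda left, right: left.merge(right, on=["variable", "Name"], how="inner"),
    frames,
)
print(result)
```

Merging on the keys does not depend on row order or index values, at the cost of one merge per fruit column. Note that filtering on `diff != 0` also removes pairs of different places whose counts happen to be equal, just as the original code does.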
