【Question title】: Python sort dataframe columns in groups by integer tied to strings
【Posted】: 2021-02-19 11:44:21
【Question】:

I have a dataframe with the following columns and values:

product 1_vendor    2_vendor    3_vendor    price_shop1 price_shop2 price_shop3 url_shop1   url_shop2   url_shop3
blue    shop1       shop3       shop2       500         600         550         1.com/blue  2.com/blue  3.com/blue
pink    shop3       shop2       shop1       700         650         600         1.com/pink  2.com/pink  3.com/pink
cyan    shop1       shop2       shop3       0           200         300         1.com/cyan  2.com/cyan  3.com/cyan

"1_vendor" is the name of the cheapest vendor, and "3_vendor" is the name of the most expensive one.

From this I want to end up with the columns product, 1_vendor, 1_price, 1_url, 2_vendor, 2_price, 2_url, and so on, ordered so that 1_ is the cheapest and 3_ the most expensive. Like this:

product 1_vendor 1_price 1_url      2_vendor 2_price 2_url
blue    shop1    500     1.com/blue shop3    550     3.com/blue
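The desired layout follows a regular pattern, so the column order does not have to be typed out by hand. Below is a minimal sketch that assumes exactly three ranks (1 to 3), matching the three vendor columns in the example. The names `ranks`, `fields` and `target_columns` are illustrative only.

```python
# Build the target column order: product first, then vendor/price/url for each rank.
# Assumption: exactly three ranks, matching 1_vendor .. 3_vendor in the example.
ranks = ['1', '2', '3']
fields = ['vendor', 'price', 'url']
target_columns = ['product'] + [f'{r}_{f}' for r in ranks for f in fields]

print(target_columns)
# ['product', '1_vendor', '1_price', '1_url', '2_vendor', '2_price', '2_url',
#  '3_vendor', '3_price', '3_url']
```

If a frame already contains all of these columns, `df[target_columns]` returns them in this order.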

I thought I could use .replace on each column to turn the "shop" string into the matching price and url, but the code below raises an error.

df['1_url'] = df['1_vendor'].replace('shop1', df['url_shop1'])
df['1_url'] = df['1_vendor'].replace('shop2', df['url_shop2'])

ValueError: Series.replace cannot use dict-value and non-None to_replace

If I wrap it as str(df['url_shop1']) instead, it runs, but it fills each cell with the values of the entire column.
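Series.replace maps matching values to replacement values. It has no notion of taking the replacement from the same row of another column. Judging by the error message, pandas treats the Series passed as `value` as dict-like, and it rejects that combination with a string `to_replace`. `str(df['url_shop1'])` turns the whole column into one printed string, which is why every matching cell receives the entire column. One way to express the intended per-row lookup is a boolean mask per shop. The sketch below assumes the shop names are exactly shop1 to shop3 and uses only the columns it needs.

```python
import pandas as pd

# Sample data from the question (only the columns needed here).
df = pd.DataFrame({
    'product': ['blue', 'pink', 'cyan'],
    '1_vendor': ['shop1', 'shop3', 'shop1'],
    'url_shop1': ['1.com/blue', '1.com/pink', '1.com/cyan'],
    'url_shop2': ['2.com/blue', '2.com/pink', '2.com/cyan'],
    'url_shop3': ['3.com/blue', '3.com/pink', '3.com/cyan'],
})

# Start with an empty object column, then fill it shop by shop:
# for the rows whose 1_vendor equals `shop`, copy that row's url_<shop> value.
df['1_url'] = None
for shop in ['shop1', 'shop2', 'shop3']:  # assumed fixed list of shop names
    mask = df['1_vendor'] == shop
    df.loc[mask, '1_url'] = df.loc[mask, f'url_{shop}']

print(df[['product', '1_vendor', '1_url']])
```

The same pattern works for `1_price` with `price_<shop>`, and for ranks 2 and 3.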

How can I rearrange the dataframe this way? I will eventually export it to CSV.

【Question discussion】:

    Tags: python pandas dataframe csv sorting


    【Solution 1】:


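    I hope I understood your question correctly.
    I'm a bit drunk, so there may be mistakes; these weeks of forced working from home are harder than expected xD.

    Anyway, here is a solution:
    # Import pandas and numpy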
    import pandas as pd
    import numpy as np
    
    # Sample df
    product = ['blue', 'pink', 'cyan']
    v1_vendor = ['shop1', 'shop3', 'shop1']
    v2_vendor = ['shop3', 'shop2', 'shop2']
    v3_vendor = ['shop2', 'shop1', 'shop3']
    price_shop1 = [500, 700, 0]
    price_shop2 = [600, 650, 200]
    price_shop3 = [550, 600, 300]
    url_shop1 = ['1.com/blue', '1.com/pink', '1.com/cyan']
    url_shop2 = ['2.com/blue', '2.com/pink', '2.com/cyan']
    url_shop3 = ['3.com/blue', '3.com/pink', '3.com/cyan']
    
    df = pd.DataFrame({'product': product,
                       '1_vendor': v1_vendor, '2_vendor': v2_vendor, '3_vendor': v3_vendor,
                       'price_shop1': price_shop1, 'price_shop2': price_shop2, 'price_shop3': price_shop3,
                       'url_shop1': url_shop1, 'url_shop2': url_shop2, 'url_shop3': url_shop3})
    


    # Create a second dataframe that we will fill with the final data.
    # The columns start out as None (object dtype), so that both strings
    # (vendor, url) and numbers (price) can be stored in them.
    df_f = pd.DataFrame({'product':product})
    df_f['1_vendor'] = None
    df_f['1_price'] = None
    df_f['1_url'] = None
    df_f['2_vendor'] = None
    df_f['2_price'] = None
    df_f['2_url'] = None
    df_f['3_vendor'] = None
    df_f['3_price'] = None
    df_f['3_url'] = None
    



    Now we can use a simple for loop to go through the original df and extract the results.

    # For loop to fill in the final dataframe: for every row and every rank,
    # copy the vendor name and look up that vendor's price_ and url_ columns.
    for i in df.index:
        for rank in ['1', '2', '3']:
            vendor = df.loc[i, rank + '_vendor']
            df_f.loc[i, rank + '_vendor'] = vendor
            df_f.loc[i, rank + '_price'] = df.loc[i, 'price_' + vendor]
            df_f.loc[i, rank + '_url'] = df.loc[i, 'url_' + vendor]
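    The loop above touches one cell at a time, which can be slow on large frames. The same result can also be computed column-wise with NumPy integer-array indexing. First, put the price and url columns into 2-D arrays in a fixed shop order. Then translate each vendor name into its column position and pick one element per row. This sketch is not part of the original answer. It assumes every vendor name appears in the hypothetical `shops` list; an unknown name would map to NaN and the indexing would fail.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'product': ['blue', 'pink', 'cyan'],
    '1_vendor': ['shop1', 'shop3', 'shop1'],
    '2_vendor': ['shop3', 'shop2', 'shop2'],
    '3_vendor': ['shop2', 'shop1', 'shop3'],
    'price_shop1': [500, 700, 0],
    'price_shop2': [600, 650, 200],
    'price_shop3': [550, 600, 300],
    'url_shop1': ['1.com/blue', '1.com/pink', '1.com/cyan'],
    'url_shop2': ['2.com/blue', '2.com/pink', '2.com/cyan'],
    'url_shop3': ['3.com/blue', '3.com/pink', '3.com/cyan'],
})

shops = ['shop1', 'shop2', 'shop3']             # assumed fixed list of shop names
shop_pos = {s: k for k, s in enumerate(shops)}  # shop name -> column position

# 2-D arrays whose column k holds the price / url of shops[k].
prices = df[[f'price_{s}' for s in shops]].to_numpy()
urls = df[[f'url_{s}' for s in shops]].to_numpy()
rows = np.arange(len(df))

df_f = df[['product']].copy()
for rank in ['1', '2', '3']:
    vendor = df[f'{rank}_vendor']
    cols = vendor.map(shop_pos).to_numpy()      # column position of each row's vendor
    df_f[f'{rank}_vendor'] = vendor
    df_f[f'{rank}_price'] = prices[rows, cols]  # one element per row
    df_f[f'{rank}_url'] = urls[rows, cols]

print(df_f)
```

    The columns are created in the order product, 1_vendor, 1_price, 1_url, 2_vendor and so on, so no reordering is needed before export.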
    


    Edit: to export, just use the to_csv method; let me know if you run into problems.
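    Below is a minimal sketch of the export step. The small frame stands in for `df_f` from the answer, and the file name `sorted_vendors.csv` is hypothetical. `index=False` is a choice: it leaves the row index out of the CSV, which is usually not wanted in the output file.

```python
import pandas as pd

# Hypothetical result frame standing in for df_f from the answer.
df_f = pd.DataFrame({
    'product': ['blue', 'pink'],
    '1_vendor': ['shop1', 'shop3'],
    '1_price': [500, 600],
    '1_url': ['1.com/blue', '3.com/pink'],
})

# index=False keeps the row index (0, 1, ...) out of the CSV file.
df_f.to_csv('sorted_vendors.csv', index=False)  # file name is an assumption

# Read the file back to check the round trip.
check = pd.read_csv('sorted_vendors.csv')
print(check)
```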

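    OK, that should be it.
    Let me know if I misunderstood the question or if you have any other questions.
    Good luck!

    (If this is the correct answer, please mark it as accepted, thanks.)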

    【Discussion】:
