【Question title】: Add new column to dataframe based on an average
【Posted】: 2018-12-12 08:24:13
【Question】:

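I have a dataframe containing project category, currency, number of backers, goal, and so on, and I want to create a new column that holds "the average success rate of its category":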

   state        category main_category currency  backers country  \
0      0          Poetry    Publishing      GBP        0      GB
1      0  Narrative Film  Film & Video      USD       15      US
2      0  Narrative Film  Film & Video      USD        3      US
3      0           Music         Music      USD        1      US
4      1     Restaurants          Food      USD      224      US

   usd_goal_real  duration  year       hour
0        1533.95        59  2015    morning
1       30000.00        60  2017    morning
2       45000.00        45  2013    morning
3        5000.00        30  2012    morning
4       50000.00        35  2016  afternoon

I have the average success rates as a Series:

Dance           65.435209
Theater         63.796134
Comics          59.141527
Music           52.660558
Art             44.889045
Games           43.890467
Film & Video    41.790649
Design          41.594386
Publishing      34.701650
Photography     34.110847
Fashion         28.283186
Technology      23.785582

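Now I want to add a new column in which every row holds the success rate that matches its category; for example, wherever a row is Technology, the new column should contain 23.78 for that row.

df[category_success_rate] =

I would like the output column to be the success percentage that matches the category in the main_category column.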

【Comments】:

Tags: python pandas multiple-columns


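【Solution 1】:

I think you need GroupBy.transform with a boolean mask, df['state'].eq(1) (equivalently, (df['state'] == 1)). The mean of the mask within each group is the fraction of rows with state 1, so multiplying by 100 gives the percentage: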

df['category_success_rate'] = (df['state'].eq(1)
                                 .groupby(df['main_category']).transform('mean') * 100)
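
To see what the expression above produces, here is a minimal sketch on a toy frame. The data is made up for illustration (it is not the asker's dataset), and it assumes, as the answer does, that state == 1 marks a successful project.

```python
import pandas as pd

# Toy data, invented for this example: state is 1 for success, 0 otherwise.
df = pd.DataFrame({
    "state": [0, 1, 1, 0, 1, 0],
    "main_category": ["Music", "Music", "Food", "Food", "Food", "Publishing"],
})

# Boolean mask of successes, grouped by the category column.
# transform('mean') computes each group's mean and broadcasts it back
# to every row of that group, so the result aligns with df's index.
success = df["state"].eq(1)
df["category_success_rate"] = (
    success.groupby(df["main_category"]).transform("mean") * 100
)

print(df)
```

Because transform returns a result with the same index as the input, it can be assigned straight to a new column without a merge.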

Alternative:

df['category_success_rate'] = ((df['state'] == 1)
                                 .groupby(df['main_category']).transform('mean') * 100)
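
Since the question already has the per-category rates as a Series, a plain lookup with Series.map may also work. This is not part of the answer above, only a sketch: the variable name rates is hypothetical, the rows are the main_category values from the question's sample, and only three of the listed rates are copied in.

```python
import pandas as pd

# main_category values taken from the five sample rows in the question.
df = pd.DataFrame({
    "main_category": ["Publishing", "Film & Video", "Film & Video", "Music", "Food"],
})

# Hypothetical name for the precomputed Series from the question,
# indexed by category (only a few of the listed values are repeated here).
rates = pd.Series({
    "Music": 52.660558,
    "Film & Video": 41.790649,
    "Publishing": 34.701650,
})

# map looks up each row's category in the Series index.
# Categories missing from the Series (here "Food") become NaN.
df["category_success_rate"] = df["main_category"].map(rates)

print(df)
```

Compared with GroupBy.transform, this reuses rates that were computed elsewhere, but any category absent from the Series ends up as NaN rather than being computed from the data.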

【Discussion】:
