【Question title】:Pandas sum over duplicated indices with sum
【Posted】:2016-05-26 00:45:43
【Question】:

I have a DataFrame indexed by date:

transactions_ind
Out[25]: 
                   Ticker     Transaction  Number_of_units      Price
Date                                                                 
2012-10-11  ROG VX Equity             Buy            12000  182.00000
2012-10-16  ROG VX Equity            Sell            -5000  184.70000
2012-11-16  ROG VX Equity            Sell            -5000  175.51580
2012-12-07  ROG VX Equity             Buy             5000  184.90000
2012-12-11  ROG VX Equity            Sell            -3000  188.50000
2012-12-11  ROG VX Equity  Reversal: Sell             3000  188.50000
2012-12-11  ROG VX Equity            Sell            -3000  188.50000
2012-12-11  ROG VX Equity  Reversal: Sell             3000  188.50000
2012-12-11  ROG VX Equity            Sell            -3000  188.50000
2012-12-20  ROG VX Equity            Sell            -5000  185.80000

I would like to sum over the duplicated index values (2012-12-11), but only for the "Number_of_units" column, so that the result looks like this:

transactions_ind
Out[25]: 
                   Ticker     Transaction  Number_of_units      Price
Date                                                                 
2012-10-11  ROG VX Equity             Buy            12000  182.00000
2012-10-16  ROG VX Equity            Sell            -5000  184.70000
2012-11-16  ROG VX Equity            Sell            -5000  175.51580
2012-12-07  ROG VX Equity             Buy             5000  184.90000
2012-12-11  ROG VX Equity            Sell            -3000  188.50000
2012-12-20  ROG VX Equity            Sell            -5000  185.80000

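Using

transactions_ind.groupby(transactions_ind.index).sum()

drops the "Ticker" and "Transaction" columns, because they hold non-numeric values. I would also like to know how to handle the differing strings in the "Transaction" column when summing over "Number_of_units". Hopefully there is a one-liner in pandas. Thanks for your help!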
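
The behavior described above can be reproduced with a small sketch. The frame below is a hypothetical three-row excerpt of the 2012-12-11 rows, and `numeric_only=True` is passed explicitly as an assumption, because the default handling of string columns in `sum()` differs between pandas versions. Note that "Price" is summed as well, which is usually not what is wanted here.

```python
import pandas as pd

# Hypothetical excerpt: only the duplicated 2012-12-11 rows from the post.
df = pd.DataFrame(
    {
        "Ticker": ["ROG VX Equity"] * 3,
        "Transaction": ["Sell", "Reversal: Sell", "Sell"],
        "Number_of_units": [-3000, 3000, -3000],
        "Price": [188.5, 188.5, 188.5],
    },
    index=pd.Index(pd.to_datetime(["2012-12-11"] * 3), name="Date"),
)

# numeric_only=True keeps only the numeric columns, so "Ticker" and
# "Transaction" disappear and every remaining column is summed.
numeric_only = df.groupby(df.index).sum(numeric_only=True)
print(numeric_only)
```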

【Comments】:

    Tags: python pandas indexing duplicates


    【Solution 1】:

    You can use agg together with first and sum:

    df = df.groupby(df.index).agg({'Ticker': 'first',
                                    'Transaction': 'first',
                                    'Number_of_units': 'sum',
                                    'Price': 'first'})
    # reorder columns
    df = df[['Ticker', 'Transaction', 'Number_of_units', 'Price']]
    print(df)
                       Ticker Transaction  Number_of_units     Price
    Date                                                            
    2012-10-11  ROG VX Equity         Buy            12000  182.0000
    2012-10-16  ROG VX Equity        Sell            -5000  184.7000
    2012-11-16  ROG VX Equity        Sell            -5000  175.5158
    2012-12-07  ROG VX Equity         Buy             5000  184.9000
    2012-12-11  ROG VX Equity        Sell            -3000  188.5000
    2012-12-20  ROG VX Equity        Sell            -5000  185.8000
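
A self-contained sketch of the same idea, rebuilt from the values in the question so it can be run directly. It also touches on the second part of the question: instead of keeping only the first "Transaction" label, the distinct labels per date are joined into one string. The helper `join_unique` is a hypothetical name introduced here for illustration, not part of pandas.

```python
import pandas as pd

# Rebuild the question's frame (values copied from the post).
dates = pd.to_datetime([
    "2012-10-11", "2012-10-16", "2012-11-16", "2012-12-07",
    "2012-12-11", "2012-12-11", "2012-12-11", "2012-12-11", "2012-12-11",
    "2012-12-20",
])
transactions_ind = pd.DataFrame(
    {
        "Ticker": ["ROG VX Equity"] * 10,
        "Transaction": ["Buy", "Sell", "Sell", "Buy",
                        "Sell", "Reversal: Sell", "Sell", "Reversal: Sell",
                        "Sell", "Sell"],
        "Number_of_units": [12000, -5000, -5000, 5000,
                            -3000, 3000, -3000, 3000, -3000, -5000],
        "Price": [182.0, 184.7, 175.5158, 184.9,
                  188.5, 188.5, 188.5, 188.5, 188.5, 185.8],
    },
    index=pd.Index(dates, name="Date"),
)


def join_unique(values):
    # Hypothetical helper: keep each distinct label once, in order of appearance.
    return ", ".join(values.unique())


# Sum the units per date; take the first Ticker and Price; join the labels.
result = transactions_ind.groupby(level="Date").agg(
    {
        "Ticker": "first",
        "Transaction": join_unique,
        "Number_of_units": "sum",
        "Price": "first",
    }
)
# Make the column order explicit rather than relying on the dict order.
result = result[["Ticker", "Transaction", "Number_of_units", "Price"]]
print(result)
```

Taking `first` for "Price" is only safe because all rows on a duplicated date share the same price in this data; if prices differed, a different reduction would have to be chosen.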
    

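    【Comments】:

      【Solution 2】:

      The accepted answer works well if (as in your case) you have only a single index column. If you have a MultiIndex, unfortunately it gets reduced to a flat index of tuples. Here is how to restore the MultiIndex: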

      import pandas as pd
      index_names = df.index.names
      df = df.groupby(df.index).agg({...})
      df.index = pd.MultiIndex.from_tuples(df.index, names=index_names)
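
An alternative sketch that avoids the tuple round trip: grouping by the index level names instead of by `df.index` leaves the result with a MultiIndex already. The two-level frame below, with "Ticker" moved into the index, is a hypothetical setup for illustration and not taken from the question.

```python
import pandas as pd

# Hypothetical two-level index (Date, Ticker) with one duplicated key.
index = pd.MultiIndex.from_tuples(
    [
        ("2012-12-11", "ROG VX Equity"),
        ("2012-12-11", "ROG VX Equity"),
        ("2012-12-20", "ROG VX Equity"),
    ],
    names=["Date", "Ticker"],
)
df = pd.DataFrame(
    {
        "Transaction": ["Sell", "Reversal: Sell", "Sell"],
        "Number_of_units": [-3000, 3000, -5000],
    },
    index=index,
)

# Grouping on the level names keeps the result index as a MultiIndex.
out = df.groupby(level=list(df.index.names)).agg(
    {"Transaction": "first", "Number_of_units": "sum"}
)
print(out)
print(out.index.names)
```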
      

      【Comments】:
