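【Question title】: Writing a function that is summing up certain values in a row in a pandas dataframe
【Posted】: 2017-08-11 12:20:36
【Question description】:

I have a pandas DataFrame, and I would like to write a function that sums all the negative values into result1 and all the positive values into result2. So basically, the function should iterate over the column "total_load":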

def total_battery(ok6, col_name='total_load'):
    """Return a dictionary with the summed negative and positive values."""

    # Initialize an empty dictionary: cols_values
    cols_values = {}

    # Extract column from df: col
    col = ok6[col_name]

    # Iterate over the column in df
    for entry in col:

        if entry in cols_values.keys() < 0:  # --> then sum all the negative values
            cols_values[entry] += sum

        else:
            if entry in cols_values.keys() > 0:  # --> then sum all the positive values
                cols_values[entry] += sum

    # Return the cols_values dictionary
    return cols_values

# Call total_battery(): result1
result1 = total_battery(ok6, "total_load")

# Call total_battery(): result2
result2 = total_battery(ok6, "total_load")

# Print result1 and result2
print(result1)
print(result2)

I hope this isn't too confusing.

Thanks for your help.

【Question comments】:

    Tags: python function pandas writing


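    【Solution 1】:

    Filter with boolean indexing or with query, then total the result with Series.sum. The first pair of lines below uses boolean indexing through DataFrame.loc; the second pair does the same with DataFrame.query: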

    result1 = df.loc[df['total_load'] < 0, 'total_load'].sum()
    result2 = df.loc[df['total_load'] > 0, 'total_load'].sum()
    

    result1 = df.query('total_load < 0')['total_load'].sum()
    result2 = df.query('total_load > 0')['total_load'].sum()
    

    Example:

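    import pandas as pd

    # 'min' is the minute frequency alias (the older 'T' alias is deprecated in recent pandas)
    rng = pd.date_range('2016-06-01', periods=4, freq='min')
    df = pd.DataFrame({'total_load': [1, 2, -3, -5]}, index=rng)
    print(df)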
                         total_load
    2016-06-01 00:00:00           1
    2016-06-01 00:01:00           2
    2016-06-01 00:02:00          -3
    2016-06-01 00:03:00          -5
    
    result1 = df.loc[df['total_load'] < 0, 'total_load'].sum()
    result2 = df.loc[df['total_load'] > 0, 'total_load'].sum()
    print (result1)
    -8
    print (result2)
    3
    
    result1 = df.query('total_load < 0')['total_load'].sum()
    result2 = df.query('total_load > 0')['total_load'].sum()
    print (result1)
    -8
    print (result2)
    3
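
If you still want this wrapped in a function, as the question asks, a minimal sketch could look like the following. It reuses the name total_battery from the question; the dictionary keys "negative" and "positive" are my own choice for illustration and not anything pandas requires. Zeros fall into neither group, which does not change either sum.

```python
import pandas as pd


def total_battery(df, col_name="total_load"):
    """Return the sums of the negative and the positive values in one column."""
    col = df[col_name]
    return {
        # Hypothetical key names, chosen only for this sketch
        "negative": col[col < 0].sum(),  # boolean mask keeps values below zero
        "positive": col[col > 0].sum(),  # boolean mask keeps values above zero
    }


# Same sample values as in the example above
df = pd.DataFrame({"total_load": [1, 2, -3, -5]})

totals = total_battery(df)
result1 = totals["negative"]
result2 = totals["positive"]
print(result1)  # -8
print(result2)  # 3
```

Calling the function once and reading both keys avoids filtering the column twice in separate calls, which is what the original result1/result2 calls would have done.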
    

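    【Comments】:

    • Awesome, and so much easier ^^ Thanks, jezrael!
    • Yes. I also started out doing data analysis in pure Python, but I switched to pandas for better performance. It was a very good decision :) Good luck!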