【Question title】: Python pandas 1.3.5 to 1.4.0 breaking changes - Got array instead of string
【Posted】: 2022-12-27 14:50:24
【Question description】:

I'm seeing a change in behavior after updating pandas from 1.3.5 to 1.4.0. It still happens in later releases such as 1.4.2 and 1.4.4.

Here is my code:

    print(df.T.to_dict().values())
    df = df.reset_index().groupby(['startTime']).agg({
        'startTime': np.unique,
        'endTimes': lambda field: list(field),
        'durationSplit': lambda field: list(field),
        'split': lambda field: list(field),
    })
    print(df.T.to_dict().values())

With version 1.3.5, it prints:

    dict_values([{'startTime': '1970-01-01T10:30:00', 'endTimes': '1970-01-01T13:00:00', 'durationSplit': None, 'split': None}])
    dict_values([{'startTime': '1970-01-01T10:30:00', 'endTimes': ['1970-01-01T13:00:00'], 'durationSplit': [None], 'split': [None]}])

With versions 1.4.0, 1.4.2, and 1.4.4 (and 1.5.0 too), it prints:

    dict_values([{'startTime': '1970-01-01T10:30:00', 'endTimes': '1970-01-01T13:00:00', 'durationSplit': None, 'split': None}])
    dict_values([{'startTime': array(['1970-01-01T10:30:00'], dtype=object), 'endTimes': ['1970-01-01T13:00:00'], 'durationSplit': [None], 'split': [None]}])

I cannot find any documented breaking change in pandas that explains this, nor anyone else reporting the same problem.

The only new thing I get is this warning:

FutureWarning: Dropping invalid columns in SeriesGroupBy.agg is deprecated. In a future version, a TypeError will be raised. Before calling .agg, select only columns which should be valid for the function.

Do you have more information, or can you explain what is going on? Or is there a different way to do something similar? :')

Thank you in advance for your help!

【Discussion】:

    Tags: python pandas


    【Solution 1】:

    There is a way to write this aggregation that works in both versions of pandas. Specifically, you could write this:

    df = df.reset_index().groupby(['startTime']).agg({
        'startTime': 'first',
        ...
    })
    

    Since startTime is part of your group key, taking the first element of your group is the same as taking all unique elements. It also won't result in an array.
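As a minimal, self-contained sketch of the point above: the sample rows below are hypothetical (modeled on the output shown in the question), and the built-in `list` stands in for the question's `lambda field: list(field)`. It is meant only to illustrate that `'first'` yields a plain string for the group key column, not to reproduce the asker's actual data.

```python
import pandas as pd

# Hypothetical sample data, shaped like the output printed in the question.
df = pd.DataFrame({
    "startTime": ["1970-01-01T10:30:00", "1970-01-01T10:30:00"],
    "endTimes": ["1970-01-01T13:00:00", "1970-01-01T14:00:00"],
    "durationSplit": [None, None],
    "split": [None, None],
})

result = df.reset_index().groupby(["startTime"]).agg({
    # Every row in a group shares the same startTime, so 'first'
    # returns that single value as a scalar instead of an array.
    "startTime": "first",
    # Collect the remaining columns into one list per group.
    "endTimes": list,
    "durationSplit": list,
    "split": list,
})

records = list(result.T.to_dict().values())
print(records)
```

If `np.unique` is needed elsewhere in the aggregation, keep in mind that it always returns an array, so the scalar would have to be extracted explicitly.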

    Bisecting the pandas git repository, I found that the first commit with this behavior is ad0baebc2a7015a7cf80d39c5b5b21dd8e8bbba6, part of PR #44122. Since that PR is marked as a refactoring, the change was probably not intended.

    【Comments】:
