[Question title]: How to update multiple column values in pandas
[Posted]: 2022-01-24 02:11:13
[Question]:

I've been trying to crack this, but I'm stuck. Here is my code:

import pandas as pd

# df is the main dataframe read from the CSV file
l = list()
column_name = [col for col in df.columns if 'SalesPerson' in col]
filtereddf = pd.DataFrame(columns=['Item', 'SerialNo', 'Location',
                                   'SalesPerson01', 'SalesPerson02', 'SalesPerson03',
                                   'SalesPerson04', 'SalesPerson05', 'SalesPerson06',
                                   'PredictedSales01', 'PredictedSales02', 'PredictedSales03',
                                   'PredictedSales04', 'PredictedSales05', 'PredictedSales06'])
for i, r in df.iterrows():
    if len(r['Name'].split(';')) > 1:
        # several names, e.g. 'Joe;Mary;Philip'
        for x in r['Name'].split(';'):
            for y in column_name:
                if x in r[y]:
                    number_is = y[-2:]
                    filtereddf.at[i, 'SerialNo'] = r['SerialNo']
                    filtereddf.at[i, 'Location'] = r['Location']
                    filtereddf.at[i, y] = r[y]
                    filtereddf.at[i, 'Item'] = r['Item']
                    filtereddf.at[i, f'PredictedSales{number_is}'] = r[f'PredictedSales{number_is}']
                    # The print below shows the correct values, but I want the filtered values in a dataframe
                    # print(r['SerialNo'], r['Location'], r[f'SalesPerson{number_is}'], r[f'PredictedSales{number_is}'], r['Definition'])
                    l.append(filtereddf)
    else:
        # a single name, e.g. 'Mike'
        for y in column_name:
            if r['Name'] in r[y]:
                number_is = y[-2:]
                filtereddf.at[i, 'SerialNo'] = r['SerialNo']
                filtereddf.at[i, 'Location'] = r['Location']
                filtereddf.at[i, y] = r[y]
                filtereddf.at[i, 'Item'] = r['Item']
                filtereddf.at[i, f'PredictedSales{number_is}'] = r[f'PredictedSales{number_is}']
                # The print below shows the correct values, but I want the filtered values in a dataframe
                # print(r['SerialNo'], r['Location'], r[f'SalesPerson{number_is}'], r[f'PredictedSales{number_is}'], r['Definition'])
                l.append(filtereddf)
finaldf = pd.concat(l, ignore_index=True)

It eventually throws an error:

MemoryError: Unable to allocate 9.18 GiB for an array with shape (1, 1231543895) and data type object
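The MemoryError most likely comes from l.append(filtereddf): every match appends another reference to the same filtereddf, which keeps growing, so pd.concat(l) stacks the whole frame once per match and the output grows roughly quadratically with the number of rows. A minimal sketch of an alternative that keeps the row-by-row logic but builds the output only once, assuming exact name matches (the loop above uses the substring test x in r[y]); rows, wanted and out are illustrative names, and only two sample rows are used:

```python
import io
import pandas as pd

SAMPLE = """\
Item Name SerialNo Location SalesPerson01 SalesPerson02 SalesPerson03 SalesPerson04 SalesPerson05 SalesPerson06 PredictedSales01 PredictedSales02 PredictedSales03 PredictedSales04 PredictedSales05 PredictedSales06
TV Joe;Mary;Philip 11111 NY Tom Julie Joe Sara Mary Philip 90 80 30 98 99 100
WashingMachine Mike 22222 NJ Tom Julie Joe Mike Mary Philip 80 70 40 74 88 42
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

person_cols = [c for c in df.columns if c.startswith("SalesPerson")]

rows = []  # one plain dict per input row; no DataFrame is modified in the loop
for _, r in df.iterrows():
    wanted = set(r["Name"].split(";"))  # 'Mike' -> {'Mike'}, so no separate branch
    out = {c: r[c] for c in ("Item", "Name", "SerialNo", "Location")}
    for col in person_cols:
        if r[col] in wanted:
            nn = col[-2:]  # '01' .. '06'
            out[col] = r[col]
            out[f"PredictedSales{nn}"] = r[f"PredictedSales{nn}"]
    rows.append(out)

# Build the frame once. Keys missing from a dict become NaN, and reindex adds
# any column that no row matched and restores the original column order.
finaldf = pd.DataFrame(rows).reindex(columns=df.columns)
```

Splitting a single name also yields a one-element list, which is why the if/else branches collapse into one loop here.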

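Basically, I want to extract from the main dataframe df the SalesPersonNN values that match the names in the Name column, together with the corresponding PredictedSalesNN values.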

Sample dataset (the actual CSV file has almost 100k entries):

Item    Name    SerialNo    Location    SalesPerson01   SalesPerson02   SalesPerson03   SalesPerson04   SalesPerson05   SalesPerson06   PredictedSales01    PredictedSales02    PredictedSales03    PredictedSales04    PredictedSales05    PredictedSales06
0   TV  Joe;Mary;Philip 11111   NY  Tom Julie   Joe Sara    Mary    Philip  90  80  30  98  99  100
1   WashingMachine  Mike    22222   NJ  Tom Julie   Joe Mike    Mary    Philip  80  70  40  74  88  42
2   Dishwasher  Tony;Sue    33333   NC  Margaret    Tony    William Brian   Sue Bert    58  49  39  59  78  89
3   Microwave   Bill;Jeff;Mary  44444   PA  Elmo    Bill    Jeff    Mary    Chris   Kevin   80  70  90  56  92  59
4   Printer Keith;Joe   55555   DE  Keith   Clark   Ed  Matt    Martha  Joe 87  94  59  48  74  89

I want the output dataframe to look like this:

Item    Name    SerialNo    Location    SalesPerson01   SalesPerson02   SalesPerson03   SalesPerson04   SalesPerson05   SalesPerson06   PredictedSales01    PredictedSales02    PredictedSales03    PredictedSales04    PredictedSales05    PredictedSales06
0   TV  Joe;Mary;Philip 11111   NY  NaN NaN Joe NaN Mary    Philip  NaN NaN 30.0    NaN 99.0    100.0
1   WashingMachine  Mike    22222   NJ  NaN NaN NaN Mike    NaN NaN NaN NaN NaN 74.0    NaN NaN
2   Dishwasher  Tony;Sue    33333   NC  NaN Tony    NaN NaN Sue NaN NaN 49.0    NaN NaN 78.0    NaN
3   Microwave   Bill;Jeff;Mary  44444   PA  NaN Bill    Jeff    Mary    NaN NaN NaN 70.0    90.0    56.0    NaN NaN
4   Printer Keith;Joe   55555   DE  Keith   NaN NaN NaN NaN Joe 87.0    NaN NaN NaN NaN 89.0

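I'm not sure whether my approach of using dataframe.at is correct. Are there any pointers on how to efficiently filter the column values that match the names in the Name column?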
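One way to avoid the row-by-row .at writes altogether, sketched here as an assumption rather than taken from the thread, is to build a boolean mask over the SalesPersonNN columns and blank out everything else with DataFrame.where. It uses exact name matches (the loop in the question uses a substring test), and person_cols, sales_cols, listed and keep are illustrative names:

```python
import io
import pandas as pd

SAMPLE = """\
Item Name SerialNo Location SalesPerson01 SalesPerson02 SalesPerson03 SalesPerson04 SalesPerson05 SalesPerson06 PredictedSales01 PredictedSales02 PredictedSales03 PredictedSales04 PredictedSales05 PredictedSales06
TV Joe;Mary;Philip 11111 NY Tom Julie Joe Sara Mary Philip 90 80 30 98 99 100
WashingMachine Mike 22222 NJ Tom Julie Joe Mike Mary Philip 80 70 40 74 88 42
Dishwasher Tony;Sue 33333 NC Margaret Tony William Brian Sue Bert 58 49 39 59 78 89
Microwave Bill;Jeff;Mary 44444 PA Elmo Bill Jeff Mary Chris Kevin 80 70 90 56 92 59
Printer Keith;Joe 55555 DE Keith Clark Ed Matt Martha Joe 87 94 59 48 74 89
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

person_cols = [f"SalesPerson{n:02d}" for n in range(1, 7)]
sales_cols = [f"PredictedSales{n:02d}" for n in range(1, 7)]

# One entry per (original row, listed name); the original row label repeats.
listed = df["Name"].str.split(";").explode()

# keep[c] is True where column c holds one of the names listed in that row.
keep = pd.DataFrame({
    c: df[c].reindex(listed.index).eq(listed).groupby(level=0).any()
    for c in person_cols
})

result = df.copy()
result[person_cols] = df[person_cols].where(keep)
# The sales columns have different labels, so pass the mask as a plain array
# (positional) instead of letting where() align it by column name.
result[sales_cols] = df[sales_cols].where(keep.to_numpy())
```

On the sample this gives the same SalesPersonNN/PredictedSalesNN layout as the expected output, with the sales columns becoming float because of the NaNs.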

[Comments]:

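  • Would you add the sample and expected output dataframes as text rather than images? Text can't be copied from an image.
  • @richardec - Sorry about that. I tried pasting them as text, but the formatting was hard to read.
  • It's actually perfect. As long as no cell or column contains spaces, I can copy it and load it into a dataframe with pd.read_clipboard.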
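Where no clipboard is available (in a script or a test, for example), a minimal sketch of the same idea is to keep the pasted table in a string and parse it with pd.read_csv; SAMPLE is just an illustrative name:

```python
import io
import pandas as pd

# The sample table from the question, pasted as plain text (without the index column).
SAMPLE = """\
Item Name SerialNo Location SalesPerson01 SalesPerson02 SalesPerson03 SalesPerson04 SalesPerson05 SalesPerson06 PredictedSales01 PredictedSales02 PredictedSales03 PredictedSales04 PredictedSales05 PredictedSales06
TV Joe;Mary;Philip 11111 NY Tom Julie Joe Sara Mary Philip 90 80 30 98 99 100
WashingMachine Mike 22222 NJ Tom Julie Joe Mike Mary Philip 80 70 40 74 88 42
Dishwasher Tony;Sue 33333 NC Margaret Tony William Brian Sue Bert 58 49 39 59 78 89
Microwave Bill;Jeff;Mary 44444 PA Elmo Bill Jeff Mary Chris Kevin 80 70 90 56 92 59
Printer Keith;Joe 55555 DE Keith Clark Ed Matt Martha Joe 87 94 59 48 74 89
"""

# sep=r"\s+" treats any run of whitespace as one separator, which is also
# read_clipboard's default; it only works because no cell contains a space.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
```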

Tags: python pandas dataframe loops


[Solution 1]:

I'd suggest switching from a column-oriented dataframe to a row-oriented one. You can use melt to reshape your dataset:

# This assumes Item, SerialNo and Location form the index of df
df = df.set_index(['Item', 'SerialNo', 'Location'])

df_person = df.loc[:, 'SalesPerson01':'SalesPerson06']
df_sales = df.loc[:, 'PredictedSales01':'PredictedSales06']
# melt stacks the six columns into one; ignore_index=False keeps the row labels
df_person = df_person.melt(ignore_index=False, value_name='SalesPerson')[['SalesPerson']]
PredictedSales = df_sales.melt(ignore_index=False, value_name='PredictedSales')[['PredictedSales']]
df_person['PredictedSales'] = PredictedSales

index_cols = ['Item', 'SerialNo', 'Location', 'SalesPerson']
df_person = df_person.reset_index().sort_values(index_cols).set_index(index_cols)

df_person will look like this:

Item            SerialNo    Location    SalesPerson PredictedSales
TV              11111       NY          Joe         30
                                        Julie       80
                                        Mary        99
                                        Philip      100
                                        Sara        98
                                        Tom         90
WashingMachine  22222       NJ          Joe         40
                                        Julie       70
                                        Mary        88
                                        Mike        74
                                        Philip      42
                                        Tom         80
...             ...         ...         ...         ...
Printer         55555       DE          Clark       94
                                        Ed          59
                                        Joe         89
                                        Keith       87
                                        Martha      74
                                        Matt        48
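As an aside that is not part of this answer: pandas also provides pd.wide_to_long, which reshapes both numbered column groups in one call instead of two melts. A sketch on two rows of the sample; the index level name num (holding the 01..06 suffix) is an arbitrary choice:

```python
import io
import pandas as pd

SAMPLE = """\
Item Name SerialNo Location SalesPerson01 SalesPerson02 SalesPerson03 SalesPerson04 SalesPerson05 SalesPerson06 PredictedSales01 PredictedSales02 PredictedSales03 PredictedSales04 PredictedSales05 PredictedSales06
TV Joe;Mary;Philip 11111 NY Tom Julie Joe Sara Mary Philip 90 80 30 98 99 100
WashingMachine Mike 22222 NJ Tom Julie Joe Mike Mary Philip 80 70 40 74 88 42
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Stack every SalesPersonNN/PredictedSalesNN pair into rows. The numeric
# suffix is parsed into a new index level; Name is carried along as a column.
long_df = pd.wide_to_long(
    df,
    stubnames=["SalesPerson", "PredictedSales"],
    i=["Item", "SerialNo", "Location"],  # must identify each input row uniquely
    j="num",
    suffix=r"\d+",
).sort_index()
```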

Now you only need the values for the names listed in the Name column, so we use explode to create a separate dataframe:

# Split 'Joe;Mary;Philip' into a list first; explode then gives one row per name
df_names = df['Name'].str.split(';').explode().to_frame('SalesPerson')
df_names = df_names.reset_index().set_index(['Item', 'SerialNo', 'Location', 'SalesPerson'])

df_names looks like this:

Item            SerialNo    Location    SalesPerson
TV              11111       NY          Joe
                                        Mary
                                        Philip
WashingMachine  22222       NJ          Mike
Dishwasher      33333       NC          Tony
                                        Sue
Microwave       44444       PA          Bill
                                        Jeff
                                        Mary
Printer         55555       DE          Keith
                                        Joe
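Note that explode only expands list-like cells; a string such as 'Joe;Mary;Philip' is treated as a scalar and left as is, so the names have to be split on ';' before exploding to get the one-name-per-row layout shown above. A small illustration on three of the sample names:

```python
import pandas as pd

names = pd.Series(["Joe;Mary;Philip", "Mike", "Tony;Sue"], name="Name")

# A string is a scalar to explode(), so this returns the values unchanged.
unchanged = names.explode()

# Splitting first yields lists, and explode() then emits one row per name,
# repeating the original row label (0, 0, 0, 1, 2, 2).
exploded = names.str.split(";").explode()
```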

Now you can simply merge the two dataframes on their shared index:

df_names = df_names.merge(df_person, left_index=True, right_index=True)

df_names now holds each listed name together with its PredictedSales.

Hopefully this runs without errors. Please let me know!
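Putting the answer's steps together, a self-contained sketch on the sample data. It makes two assumptions that the answer leaves implicit: Item, SerialNo and Location are set as the index first (the reset_index/set_index calls rely on them being index levels), and Name is split on ';' before explode. Here the PredictedSales column is attached positionally with to_numpy() rather than by index alignment:

```python
import io
import pandas as pd

SAMPLE = """\
Item Name SerialNo Location SalesPerson01 SalesPerson02 SalesPerson03 SalesPerson04 SalesPerson05 SalesPerson06 PredictedSales01 PredictedSales02 PredictedSales03 PredictedSales04 PredictedSales05 PredictedSales06
TV Joe;Mary;Philip 11111 NY Tom Julie Joe Sara Mary Philip 90 80 30 98 99 100
WashingMachine Mike 22222 NJ Tom Julie Joe Mike Mary Philip 80 70 40 74 88 42
Dishwasher Tony;Sue 33333 NC Margaret Tony William Brian Sue Bert 58 49 39 59 78 89
Microwave Bill;Jeff;Mary 44444 PA Elmo Bill Jeff Mary Chris Kevin 80 70 90 56 92 59
Printer Keith;Joe 55555 DE Keith Clark Ed Matt Martha Joe 87 94 59 48 74 89
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+").set_index(["Item", "SerialNo", "Location"])
index_cols = ["Item", "SerialNo", "Location", "SalesPerson"]

# Long format: one row per (item, salesperson) with its predicted sales.
persons = df.loc[:, "SalesPerson01":"SalesPerson06"].melt(ignore_index=False, value_name="SalesPerson")
sales = df.loc[:, "PredictedSales01":"PredictedSales06"].melt(ignore_index=False, value_name="PredictedSales")
# Both melts stack the six columns in the same order, so rows line up by position.
persons["PredictedSales"] = sales["PredictedSales"].to_numpy()
df_person = persons[["SalesPerson", "PredictedSales"]].reset_index().set_index(index_cols)

# One row per name listed in Name ('Joe;Mary;Philip' -> Joe, Mary, Philip).
df_names = df["Name"].str.split(";").explode().to_frame("SalesPerson")
df_names = df_names.reset_index().set_index(index_cols)

# Inner join on the shared four-level index keeps only the listed salespeople.
result = df_names.merge(df_person, left_index=True, right_index=True).sort_index()
```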

