【Question title】: Transform dataframe based on columns and rows
【Posted】: 2022-12-17 16:05:02
【Question description】:

I have the following dataframe:

CustomerNr Target Source Percentage
1001 A C 0.2
1004 D NaN 0.3
1005 C D 0.4
1010 A D 0.5
import numpy as np
import pandas as pd
df = pd.DataFrame([[1001, 'A','C',0.2], [1004, 'D',np.nan,0.3],[1005, 'C','D',0.4], 
                   [1010, 'A','D',0.5]], columns=['CustomerNr','Target','Source','Percentage'])

and I want to transform it into this (by the way, I was not sure how to phrase the title of this question):

import numpy as np
import pandas as pd
df = pd.DataFrame([['1001 Target' , 'A',0.2],
                   ['1001 Source' , 'C',0.2], 
                   ['1004 Target', 'D',0.3],
                   ['1004 Source', np.nan,0.3],
                   ['1005 Target', 'C',0.4],
                   ['1005 Source', 'D',0.4],
                   ['1010 Target', 'A',0.5],
                   ['1010 Source', 'D',0.5],
                  ], columns=['CustomerNr Scope','Value','Percentage'])
CustomerNr Scope Value Percentage
1001 Target A 0.2
1001 Source C 0.2
1004 Target D 0.3
1004 Source NaN 0.3
1005 Target C 0.4
1005 Source D 0.4
1010 Target A 0.5
1010 Source D 0.5

【Question comments】:

    Tags: python dataframe


    【Solution 1】:

    You can use pandas `stack` to achieve this:

    (df.set_index(["CustomerNr", "Percentage"])
             .rename_axis("Scope", axis=1)
             .stack(dropna=False)
             .rename("Value")
             .reset_index()
             .assign(CustomerNrScope=lambda df: df[["CustomerNr", "Scope"]].astype(str).apply(" ".join, axis=1)))
    

    Alternatively, concatenate the target and source tables:

    df_new = pd.concat([df[["CustomerNr", tscol, "Percentage"]]
                   .rename(columns={tscol: "Value"})
                   .assign(Scope=tscol)
               for tscol in ["Target", "Source"]])
    df_new["CustomerNr Scope"] = df_new.CustomerNr.astype(str) + " " + df_new.Scope
    
    
    # result
       CustomerNr Value  Percentage   Scope CustomerNr Scope
    0        1001     A         0.2  Target      1001 Target
    1        1004     D         0.3  Target      1004 Target
    2        1005     C         0.4  Target      1005 Target
    3        1010     A         0.5  Target      1010 Target
    0        1001     C         0.2  Source      1001 Source
    1        1004   NaN         0.3  Source      1004 Source
    2        1005     D         0.4  Source      1005 Source
    3        1010     D         0.5  Source      1010 Source
    
    

    Or (based on Zephyrus's answer) use `melt` with Percentage as an additional id_var to get the desired table directly (assuming Percentage depends uniquely on CustomerNr):

    pd.melt(df, id_vars=['CustomerNr', "Percentage"],
            value_vars=['Target', 'Source'],
            var_name='Scope')
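`melt` returns all Target rows first and then all Source rows, and it does not build the combined label column. The following is a minimal sketch of how the remaining steps could look, assuming the sample data from the question; the names `melted` and `result` and the `value_name="Value"` choice are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [[1001, "A", "C", 0.2], [1004, "D", np.nan, 0.3],
     [1005, "C", "D", 0.4], [1010, "A", "D", 0.5]],
    columns=["CustomerNr", "Target", "Source", "Percentage"],
)

# Unpivot Target/Source, carrying Percentage along as an id column
melted = pd.melt(
    df,
    id_vars=["CustomerNr", "Percentage"],
    value_vars=["Target", "Source"],
    var_name="Scope",
    value_name="Value",
)

# A stable sort keeps Target before Source within each customer
melted = melted.sort_values("CustomerNr", kind="stable", ignore_index=True)

# Build the combined label and keep only the requested columns
melted["CustomerNr Scope"] = melted["CustomerNr"].astype(str) + " " + melted["Scope"]
result = melted[["CustomerNr Scope", "Value", "Percentage"]]
print(result)
```

The stable sort matters here: an unstable sort on `CustomerNr` alone would not guarantee that the Target row comes before the Source row for each customer.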
    

    【Comments】:

      【Solution 2】:

      You can use pandas `melt` to unpivot the dataframe:

      df_melted =  pd.melt(df, id_vars=['CustomerNr'], value_vars=['Target', 'Source'], var_name='Scope')
      

      This does not include the 'Percentage' column, but you can merge it back into the new dataframe:

      df_melted = df_melted.merge(df[[ 'CustomerNr', 'Percentage']], left_on='CustomerNr', right_on='CustomerNr' )
      
      If you want the `'CustomerNr'` and `'Scope'` columns combined, you can concatenate them as strings into a single column.
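Put together, the melt, merge, and column-combining steps might look like the sketch below. It assumes the sample data from the question; the `validate="many_to_one"` argument and `value_name="Value"` are additions for illustration, not part of the original answer. The validation makes the merge raise an error if a `CustomerNr` appears more than once in the lookup table, which would otherwise silently duplicate rows.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [[1001, "A", "C", 0.2], [1004, "D", np.nan, 0.3],
     [1005, "C", "D", 0.4], [1010, "A", "D", 0.5]],
    columns=["CustomerNr", "Target", "Source", "Percentage"],
)

# Step 1: unpivot Target/Source into one Value column
df_melted = pd.melt(
    df,
    id_vars=["CustomerNr"],
    value_vars=["Target", "Source"],
    var_name="Scope",
    value_name="Value",
)

# Step 2: bring Percentage back; validate that each CustomerNr
# occurs at most once on the right-hand side
df_melted = df_melted.merge(
    df[["CustomerNr", "Percentage"]],
    on="CustomerNr",
    how="left",
    validate="many_to_one",
)

# Step 3: combine CustomerNr and Scope into one string column
df_melted["CustomerNr Scope"] = (
    df_melted["CustomerNr"].astype(str) + " " + df_melted["Scope"]
)
print(df_melted)
```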
      

      【Comments】:

        【Solution 3】:

        The solution would look like this.

        import pandas as pd
        df = pd.read_csv("test.csv", encoding='utf-8')
        df
        
        CustomerNr  Target  Source  Percentage
        0   1001    A         C         0.2
        1   1004    D         np.nan    0.3
        2   1005    C         D         0.4
        3   1010    A         D         0.5
        

        Solution

        new_df = pd.DataFrame()
        new_indx = len(df)  # Source rows get indices after the original range
        for ind, row in df.iterrows():
            # Target row reuses the original index
            new_df.at[ind, "CustomerNrScope"] = str(row['CustomerNr']) + ' Target'
            new_df.at[ind, 'Value'] = row['Target']
            new_df.at[ind, 'Percentage'] = row['Percentage']
            
            # Source row is added under a new index
            new_df.at[new_indx, "CustomerNrScope"] = str(row['CustomerNr']) + ' Source'
            new_df.at[new_indx, 'Value'] = row['Source']
            new_df.at[new_indx, 'Percentage'] = row['Percentage']
            new_indx = new_indx + 1
        

        Output

        
        CustomerNrScope     Value   Percentage
        0   1001 Target       A        0.2
        4   1001 Source       C        0.2
        1   1004 Target       D        0.3
        5   1004 Source       np.nan    0.3
        2   1005 Target       C        0.4
        6   1005 Source       D        0.4
        3   1010 Target       A        0.5
        7   1010 Source       D        0.5
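Growing a DataFrame one cell at a time tends to be slow on larger inputs. A possible variant of the same row-by-row idea, sketched here under the assumption that the data matches the question's sample, collects plain dictionaries and builds the DataFrame once at the end. The names `records` and `new_df` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [[1001, "A", "C", 0.2], [1004, "D", np.nan, 0.3],
     [1005, "C", "D", 0.4], [1010, "A", "D", 0.5]],
    columns=["CustomerNr", "Target", "Source", "Percentage"],
)

records = []
for row in df.itertuples(index=False):
    # Emit one output row per scope, Target first, then Source
    for scope in ("Target", "Source"):
        records.append({
            "CustomerNr Scope": f"{row.CustomerNr} {scope}",
            "Value": getattr(row, scope),
            "Percentage": row.Percentage,
        })

# Build the result in a single step instead of enlarging it cell by cell
new_df = pd.DataFrame(records)
print(new_df)
```

This also yields the rows already interleaved per customer with a fresh 0..7 index, so no reordering is needed afterwards.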
        

        【Comments】:
