【Question title】: Transform and reshape a Data Frame from wide to long with additional column
【Posted】: 2021-05-05 22:46:04
【Question】:

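I have a data frame that I want to convert from wide to long format, but I don't want to melt all of its columns.
Specifically, I want to melt the following data frame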

import pandas as pd
data = {'year': [2014, 2018, 2020, 2017],
        'model': [12, 14, 21, 8],
        'amount': [100, 120, 80, 210],
        'quality': ["low", "high", "medium", "high"]
       }

# build the DataFrame; the dict keys become the column names
df = pd.DataFrame.from_dict(data)
print(df)

into this data frame:

data2 = {'year': [2014, 2014, 2018, 2018, 2020, 2020, 2017, 2017], 
        'variable': ["model", "amount", "model", "amount", "model", "amount", "model", "amount"],
        'value': [12, 100, 14, 120, 21, 80, 8, 210],
        'quality': ["low", "low", "high", "high", "medium", "medium", "high", "high"]
       }

# build the DataFrame; the dict keys become the column names
df2 = pd.DataFrame.from_dict(data2)
print(df2)

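I have tried pd.melt() with various combinations of arguments, and it more or less works if I leave out the quality column. However, I need quality in the result, so I cannot simply drop it. I also tried df.pivot(), df.pivot_table() and pd.wide_to_long(), each in several combinations, but none of them gave the desired result. Would it help to move the columns year and quality into the DataFrame index before doing the pd.melt()?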

Many thanks in advance for your help!
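The idea of moving year and quality into the index does work. The sketch below is one way to do it, not the asker's code; df_long is a hypothetical name. Both identifier columns become a MultiIndex, and stack() then turns the remaining model and amount columns into rows, one pair per source row, in the original row order (2014, 2018, 2020, 2017), which is the order of df2.

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2018, 2020, 2017],
                   'model': [12, 14, 21, 8],
                   'amount': [100, 120, 80, 210],
                   'quality': ["low", "high", "medium", "high"]})

# Put the identifier columns into the index, then stack the value columns
# (model, amount) so each source row becomes two rows.
df_long = (df.set_index(['year', 'quality'])[['model', 'amount']]
             .stack()
             # name the three index levels; the new level holds the column names
             .rename_axis(['year', 'quality', 'variable'])
             # turn the index back into columns; the stacked values become "value"
             .reset_index(name='value')
             # reorder the columns to match the desired layout
             [['year', 'variable', 'value', 'quality']])
print(df_long)
```

On pandas 2.1 and later, stack() may emit a FutureWarning about its new implementation; for this data (no missing values, single-level columns) the result is the same.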

【Question comments】:

    Tags: dataframe indexing pivot melt


    【Answer 1】:
    import pandas as pd
    
    data = {'year': [2014, 2018, 2020, 2017],
            'model': [12, 14, 21, 8],
            'amount': [100, 120, 80, 210],
            'quality': ["low", "high", "medium", "high"]
           }
    
    # build the DataFrame; the dict keys become the column names
    df = pd.DataFrame.from_dict(data)
    print(df)
    
    data2 = {'year': [2014, 2014, 2018, 2018, 2020, 2020, 2017, 2017],
            'variable': ["model", "amount", "model", "amount", "model", "amount", "model", "amount"],
            'value': [12, 100, 14, 120, 21, 80, 8, 210],
            'quality': ["low", "low", "high", "high", "medium", "medium", "high", "high"]
           }
    
    # build the DataFrame; the dict keys become the column names
    df2 = pd.DataFrame.from_dict(data2)
    print(df2)
    
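    # keep year and quality as identifier columns; only model and amount are melted
    df3 = pd.melt(df, id_vars=['year', 'quality'], var_name='variable', value_name='value')
    # reorder the columns to match the desired layout
    df3 = df3[['year', 'variable', 'value', 'quality']]
    # stable sort, so "model" stays ahead of "amount" within each year
    df3 = df3.sort_values('year', kind='mergesort')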
    
    print(df3)
    

    Output (for df3):

       year variable  value quality
    0  2014    model     12     low
    4  2014   amount    100     low
    3  2017    model      8    high
    7  2017   amount    210    high
    1  2018    model     14    high
    5  2018   amount    120    high
    2  2020    model     21  medium
    6  2020   amount     80  medium
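Sorting by year reorders the rows (2014, 2017, 2018, 2020), whereas df2 in the question keeps the source order (2014, 2018, 2020, 2017). If that exact order matters, one option, assuming pandas 1.1 or newer where melt accepts ignore_index, is to keep the original row labels through the melt and stable-sort on them. This is a sketch, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2018, 2020, 2017],
                   'model': [12, 14, 21, 8],
                   'amount': [100, 120, 80, 210],
                   'quality': ["low", "high", "medium", "high"]})

# ignore_index=False keeps each original row label, so the two melted rows
# that come from the same source row share a label (0, 1, 2, 3, 0, 1, 2, 3).
df3 = df.melt(id_vars=['year', 'quality'], value_vars=['model', 'amount'],
              ignore_index=False)
# A stable sort on that label interleaves model/amount per source row while
# preserving the source row order; then renumber and reorder the columns.
df3 = (df3.sort_index(kind='mergesort')
          .reset_index(drop=True)
          [['year', 'variable', 'value', 'quality']])
print(df3)
```

melt's defaults already name the new columns variable and value, so var_name and value_name are not needed here. After reset_index(drop=True) the index runs 0 to 7, as in df2.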
    

    【Comments】:
