【问题标题】:Simultaneously melt multiple columns in Python Pandas在 Python Pandas 中同时熔化多个列
【发布时间】:2019-01-02 06:50:23
【问题描述】:

想知道 pd.melt 是否支持熔化多列。我有以下示例尝试将 value_vars 作为列表列表,但出现错误:

ValueError: Location based indexing can only have [labels (MUST BE IN THE INDEX), slices of labels (BOTH endpoints included! Can be slices of integers if the index is integers), listlike of labels, boolean] types

使用熊猫 0.23.1。

df = pd.DataFrame({'City': ['Houston', 'Austin', 'Hoover'],
                   'State': ['Texas', 'Texas', 'Alabama'],
                   'Name':['Aria', 'Penelope', 'Niko'],
                   'Mango':[4, 10, 90],
                   'Orange': [10, 8, 14], 
                   'Watermelon':[40, 99, 43],
                   'Gin':[16, 200, 34],
                   'Vodka':[20, 33, 18]},
                 columns=['City', 'State', 'Name', 'Mango', 'Orange', 'Watermelon', 'Gin', 'Vodka'])

期望的输出:

      City    State       Fruit  Pounds  Drink  Ounces
0  Houston    Texas       Mango       4    Gin    16.0
1   Austin    Texas       Mango      10    Gin   200.0
2   Hoover  Alabama       Mango      90    Gin    34.0
3  Houston    Texas      Orange      10  Vodka    20.0
4   Austin    Texas      Orange       8  Vodka    33.0
5   Hoover  Alabama      Orange      14  Vodka    18.0
6  Houston    Texas  Watermelon      40    nan     NaN
7   Austin    Texas  Watermelon      99    nan     NaN
8   Hoover  Alabama  Watermelon      43    nan     NaN

我试过了,我得到了上述错误:

df.melt(id_vars=['City', 'State'], 
        value_vars=[['Mango', 'Orange', 'Watermelon'], ['Gin', 'Vodka']],var_name=['Fruit', 'Drink'], 
        value_name=['Pounds', 'Ounces'])

【问题讨论】:

  • 预期输出如何?
  • 嗨,jezrael,您可以在所需输出下方看到它:
  • 你确定吗?看起来像输入数据
  • 对不起,你说得对,我忽略了复制/粘贴 - 现在修复它

标签: pandas melt valueerror


【解决方案1】:

对每个类别使用双精度 melt,然后使用 concat,但由于重复值添加 cumcount 以在 MultiIndex 中添加唯一的 triples

df1 = df.melt(id_vars=['City', 'State'], 
              value_vars=['Mango', 'Orange', 'Watermelon'],
              var_name='Fruit', value_name='Pounds')
df2 = df.melt(id_vars=['City', 'State'], 
              value_vars=['Gin', 'Vodka'], 
              var_name='Drink', value_name='Ounces')

df1 = df1.set_index(['City', 'State', df1.groupby(['City', 'State']).cumcount()])
df2 = df2.set_index(['City', 'State', df2.groupby(['City', 'State']).cumcount()])


df3 = (pd.concat([df1, df2],axis=1)
         .sort_index(level=2)
         .reset_index(level=2, drop=True)
         .reset_index())
print (df3)
      City    State       Fruit  Pounds  Drink  Ounces
0   Austin    Texas       Mango      10    Gin   200.0
1   Hoover  Alabama       Mango      90    Gin    34.0
2  Houston    Texas       Mango       4    Gin    16.0
3   Austin    Texas      Orange       8  Vodka    33.0
4   Hoover  Alabama      Orange      14  Vodka    18.0
5  Houston    Texas      Orange      10  Vodka    20.0
6   Austin    Texas  Watermelon      99    NaN     NaN
7   Hoover  Alabama  Watermelon      43    NaN     NaN
8  Houston    Texas  Watermelon      40    NaN     NaN

【讨论】:

  • 谢谢你,jezrael,我正在寻找一个“更短”的解决方案,因为我可以拥有具有多组列(不仅是 2 个)的数据帧,如本例所示,但这也可以,
【解决方案2】:

一个已经很好回答的老问题;这只是一种替代方法,它依赖于来自pyjanitor 的辅助函数,特别是pivot_longerpivot_wider,来帮助重塑过程:

# pip install pyjanitor

import pandas as pd
import janitor as jn

fruits = ("Mango", "Orange", "Watermelon")
drinks = ("Gin", "Vodka")
mapp = {key : "fruits" for key in fruits} | {key : "drinks" for key in drinks}

(df.rename(columns = lambda col: f"{col}_{fruits.index(col)}_pounds"
                                 if col in fruits
                                 else f"{col}_{drinks.index(col)}_ounces" 
                                 if col in drinks 
                                 else col)
.pivot_longer(index = slice('City', 'Name'), 
              names_to = ('generic', 'position', '.value'), 
              names_sep = '_')
.assign(temp = lambda df: df.generic.map(mapp))
.pivot_wider(index = [slice('City', 'Name'), 'position'], 
             names_from = 'temp')
.dropna(how='all', axis = 1)
.rename(columns = lambda col: col.replace("generic_","")
                                 .replace("_drinks", "")
                                 .replace("_fruits", ""))
.loc[:, ['City', 'State', 'fruits', 'pounds', 'drinks', 'ounces']]
)

      City    State      fruits  pounds drinks  ounces
0   Austin    Texas       Mango    10.0    Gin   200.0
1   Austin    Texas      Orange     8.0  Vodka    33.0
2   Austin    Texas  Watermelon    99.0    NaN     NaN
3   Hoover  Alabama       Mango    90.0    Gin    34.0
4   Hoover  Alabama      Orange    14.0  Vodka    18.0
5   Hoover  Alabama  Watermelon    43.0    NaN     NaN
6  Houston    Texas       Mango     4.0    Gin    16.0
7  Houston    Texas      Orange    10.0  Vodka    20.0
8  Houston    Texas  Watermelon    40.0    NaN     NaN

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2020-02-10
    • 2016-01-05
    • 2021-01-26
    • 2020-02-06
    • 2019-11-30
    • 2013-08-03
    • 1970-01-01
    相关资源
    最近更新 更多