【Posted】: 2020-01-22 02:41:53
【Question】:
+------+------+------+------+-------+------+------+-------+
|      |      |      |      | USD   | EUR  | JPY  | RUP   |
+------+------+------+------+-------+------+------+-------+
|      |      |      |      | Case  | Cons | Case | Case  |
+------+------+------+------+-------+------+------+-------+
|      |      |      |      | High  | Low  | CWM  | AEP   |
+------+------+------+------+-------+------+------+-------+
| Col1 | Col2 | Col3 | Col4 | Owner | OPS  | VH   | Delta |
+------+------+------+------+-------+------+------+-------+
| V1   | V2   | V3   | V4   | V5    | V6   | V7   | V8    |
| V1a  | V2a  | V3a  | V4a  | V5a   | V6a  | V7a  | V8a   |
+------+------+------+------+-------+------+------+-------+
As requested, here is sample data as output by df.to_dict():
{('Unnamed: 0_level_0', 'Unnamed: 0_level_1', 'Unnamed: 0_level_2', 'Year'): {0: 2020, 1: 2020, 2: 2020, 3: 2020, 4: 2020, 5: 2020, 6: 2020, 7: 2020, 8: 2020, 9: 2020, 10: 2020, 11: 2020, 12: 2020, 13: 2020, 14: 2020, 15: 2020, 16: 2020, 17: 2020, 18: 2020, 19: 2020, 20: 2020, 21: 2020, 22: 2020, 23: 2020, 24: 2020, 25: 2020, 26: 2020, 27: 2020, 28: 2020, 29: 2020, 30: 2020, 31: 2020, 32: 2020, 33: 2020, 34: 2020, 35: 2020, 36: 2020, 37: 2020, 38: 2020, 39: 2020, 40: 2020, 41: 2020, 42: 2020, 43: 2020, 44: 2020, 45: 2020, 46: 2020, 47: 2020}, ('Unnamed: 1_level_0', 'Unnamed: 1_level_1', 'Unnamed: 1_level_2', 'Month'): {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1, 20: 1, 21: 1, 22: 1, 23: 1, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1, 29: 1, 30: 1, 31: 1, 32: 1, 33: 1, 34: 1, 35: 1, 36: 1, 37: 1, 38: 1, 39: 1, 40: 1, 41: 1, 42: 1, 43: 1, 44: 1, 45: 1, 46: 1, 47: 1}, ('Unnamed: 2_level_0', 'Unnamed: 2_level_1', 'Unnamed: 2_level_2', 'Day'): {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1, 20: 1, 21: 1, 22: 1, 23: 1, 24: 2, 25: 2, 26: 2, 27: 2, 28: 2, 29: 2, 30: 2, 31: 2, 32: 2, 33: 2, 34: 2, 35: 2, 36: 2, 37: 2, 38: 2, 39: 2, 40: 2, 41: 2, 42: 2, 43: 2, 44: 2, 45: 2, 46: 2, 47: 2}, ('Unnamed: 3_level_0', 'Unnamed: 3_level_1', 'Unnamed: 3_level_2', 'Hour'): {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 20: 20, 21: 21, 22: 22, 23: 23, 24: 0, 25: 1, 26: 2, 27: 3, 28: 4, 29: 5, 30: 6, 31: 7, 32: 8, 33: 9, 34: 10, 35: 11, 36: 12, 37: 13, 38: 14, 39: 15, 40: 16, 41: 17, 42: 18, 43: 19, 44: 20, 45: 21, 46: 22, 47: 23}, ('USD', 'Cons', 'very high', 'Hub1'): {0: 23.06, 1: 21.49, 2: 21.73, 3: 21.58, 4: 21.67, 5: 22.78, 6: 27.15, 7: 26.09, 8: 26.23, 9: 28.21, 10: 29.21, 11: 31.97, 12: 30.45, 13: 30.45, 14: 30.45, 15: 29.14, 16: 
28.28, 17: 26.35, 18: 26.32, 19: 27.01, 20: 26.34, 21: 28.22, 22: 27.77, 23: 26.94, 24: 24.16, 25: 22.74, 26: 22.67, 27: 22.67, 28: 22.74, 29: 23.14, 30: 27.81, 31: 27.87, 32: 28.05, 33: 27.91, 34: 32.66, 35: 35.14, 36: 33.32, 37: 36.17, 38: 38.33, 39: 31.75, 40: 30.9, 41: 26.36, 42: 27.17, 43: 28.17, 44: 26.17, 45: 26.5, 46: 28.95, 47: 26.94}, ('EUR', 'Case', 'CWM', 'Hub2'): {0: 18.59, 1: 18.32, 2: 18.32, 3: 18.32, 4: 18.32, 5: 19.19, 6: 22.57, 7: 25.38, 8: 25.53, 9: 25.9, 10: 26.47, 11: 26.47, 12: 26.09, 13: 25.59, 14: 25.35, 15: 24.97, 16: 24.22, 17: 25.22, 18: 25.49, 19: 26.19, 20: 25.63, 21: 25.1, 22: 21.93, 23: 19.61, 24: 19.4, 25: 18.75, 26: 18.85, 27: 18.75, 28: 18.88, 29: 19.41, 30: 23.97, 31: 27.07, 32: 27.23, 33: 29.21, 34: 30.49, 35: 28.52, 36: 27.49, 37: 26.93, 38: 26.71, 39: 25.76, 40: 25.24, 41: 25.67, 42: 26.72, 43: 27.98, 44: 26.73, 45: 25.97, 46: 22.34, 47: 19.47}, ('USD', 'Cons', 'Ventyx', 'Hub3'): {0: 19.78, 1: 20.96, 2: 21.58, 3: 21.5, 4: 21.27, 5: 22.59, 6: 26.22, 7: 26.78, 8: 26.78, 9: 26.97, 10: 26.97, 11: 26.97, 12: 26.53, 13: 26.34, 14: 26.5, 15: 26.22, 16: 25.6, 17: 26.5, 18: 26.74, 19: 27.44, 20: 26.87, 21: 26.5, 22: 23.2, 23: 23.58, 24: 22.74, 25: 22.31, 26: 22.27, 27: 22.27, 28: 22.74, 29: 22.84, 30: 27.79, 31: 31.63, 32: 29.6, 33: 29.25, 34: 30.53, 35: 28.51, 36: 27.48, 37: 26.97, 38: 26.74, 39: 26.53, 40: 26.5, 41: 26.92, 42: 28.89, 43: 30.24, 44: 28.38, 45: 27.38, 46: 24.39, 47: 23.2}}
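The table at the top is the best representation of the file I can give.
Columns 1-4 have a single header row. Columns 5 to N (N because we don't know in advance how many there are) have 4 header rows.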
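For anyone who wants to reproduce the layout, here is a minimal sketch of how a file shaped like this loads with four header rows. The inline CSV text is made up to mirror the first two rows and two value columns of the to_dict() sample; it is not the actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: 3 partly blank header rows,
# then a 4th header row that also names the first 4 columns.
csv_text = """,,,,USD,EUR
,,,,Cons,Case
,,,,very high,CWM
Year,Month,Day,Hour,Hub1,Hub2
2020,1,1,0,23.06,18.59
2020,1,1,1,21.49,18.32
"""

# header=[0, 1, 2, 3] turns the first 4 lines into a 4-level column MultiIndex.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1, 2, 3])

# Blank header cells come back as 'Unnamed: ...' placeholder labels,
# which is why they appear in the to_dict() keys above.
print(df.columns.tolist())
```

Every column ends up with four labels, so the first four columns are only identifiable by their last level (Year, Month, Day, Hour).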
The dataframe needs to end up looking like this:
+------+------+------+------+-------+-------+-------+-------+------+
| Col1 | Col2 | Col3 | Col4 | NCol1 | NCol2 | NCol3 | NCol4 | Col9 |
+------+------+------+------+-------+-------+-------+-------+------+
| V1   | V2   | V3   | V4   | USD   | Case  | High  | Owner | V5   |
| V1a  | V2a  | V3a  | V4a  | USD   | Case  | High  | Owner | V5a  |
| V1   | V2   | V3   | V4   | EUR   | Cons  | Low   | OPS   | V6   |
| V1a  | V2a  | V3a  | V4a  | EUR   | Cons  | Low   | OPS   | V6a  |
| V1   | V2   | V3   | V4   | JPY   | Case  | CWM   | VH    | V7   |
| V1a  | V2a  | V3a  | V4a  | JPY   | Case  | CWM   | VH    | V7a  |
| V1   | V2   | V3   | V4   | RUP   | Case  | AEP   | Delta | V8   |
| V1a  | V2a  | V3a  | V4a  | RUP   | Case  | AEP   | Delta | V8a  |
+------+------+------+------+-------+-------+-------+-------+------+
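So basically, the headers of columns 5 through N should be turned into new columns, with every data value kept on a row together with its first 4 columns and the headers it originally sat under.
I tried: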
import pandas as pd
df = pd.read_csv(file, header=[0, 1, 2, 3])
df.melt(var_name=['a', 'b', 'c', 'd'], value_name='e')
And also:
df2 = df.melt(id_vars=['Year','Month','Day','Hour'], col_level=3)
And also:
df2 = df.stack().stack().stack().stack()
The last one gets very close, but it stacks the first 4 columns as well.
So that doesn't work either, because it only gives me col1 and col2.
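One possible way to get the target layout is to melt only the value columns and then re-attach the first 4 columns. The sketch below is untested against the real file and rests on a few assumptions: the id columns are always the first 4 by position, the row index is unique (as it is straight out of read_csv), and the output names NCol1 to NCol4 and Col9 are simply taken from the desired table above. The helper name melt_header_levels is made up for illustration.

```python
import pandas as pd

# Tiny stand-in for the frame in the question: 4 id columns that only have a
# meaningful last header level, followed by value columns with 4 header levels.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Unnamed: 0_level_1", "Unnamed: 0_level_2", "Year"),
    ("Unnamed: 1_level_0", "Unnamed: 1_level_1", "Unnamed: 1_level_2", "Month"),
    ("Unnamed: 2_level_0", "Unnamed: 2_level_1", "Unnamed: 2_level_2", "Day"),
    ("Unnamed: 3_level_0", "Unnamed: 3_level_1", "Unnamed: 3_level_2", "Hour"),
    ("USD", "Cons", "very high", "Hub1"),
    ("EUR", "Case", "CWM", "Hub2"),
])
df = pd.DataFrame(
    [[2020, 1, 1, 0, 23.06, 18.59],
     [2020, 1, 1, 1, 21.49, 18.32]],
    columns=columns,
)


def melt_header_levels(frame, n_id=4,
                       level_names=("NCol1", "NCol2", "NCol3", "NCol4"),
                       value_name="Col9"):
    """Hypothetical helper: turn the header levels of columns n_id..N into columns."""
    # Id columns: keep only the last header level as a plain column name.
    ids = frame.iloc[:, :n_id].copy()
    ids.columns = [col[-1] for col in ids.columns]

    # Melt the value columns only; each header level becomes its own column.
    # ignore_index=False keeps the original row labels, repeated per column.
    melted = frame.iloc[:, n_id:].melt(
        var_name=list(level_names), value_name=value_name, ignore_index=False
    )

    # Repeat the id rows in the same order as the melted values, then glue
    # the two pieces together side by side.
    repeated_ids = ids.loc[melted.index].reset_index(drop=True)
    return pd.concat([repeated_ids, melted.reset_index(drop=True)], axis=1)


long_df = melt_header_levels(df)
print(long_df)
```

Because the split is positional, the number of value columns does not need to be known up front. If the id columns are not always the first 4, they would have to be picked some other way, for example by checking for the 'Unnamed' prefix in the top level.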
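【Comments】:
-
Can you run df.to_dict() and paste the result? That is, read the csv, output it as a dict, and share that. It should be easier to work with than what you have provided so far.
-
Let me see about putting together something with the same information that matches more closely, and I'll post it.
-
My only concern with providing that dict is that it is a small sample, and the dataframe may have an unknown number of columns.
-
Did you not read the part about what I tried?
-
Here, I'll add another 10 things I tried that didn't work either.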