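【Posted】: 2020-03-13 19:58:51
【Question】:
I am fairly new to Python (Pandas), and I want to use it to automate my Excel tasks and work more efficiently :)
I am currently working with the Excel sales report below, in which the year is a merged cell.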
|         | 2018 (merged across 4 columns)      | 2019 (merged across 4 columns)      |
| Product | January | February | March | April | January | February | March | April |
| A | 8 | 10 | 65 | 50 | 8 | 10 | 65 | 50 |
| B | 9 | 10 | 65 | 50 | 8 | 63 | 65 | 50 |
| C | 7 | 10 | 65 | 50 | 8 | 10 | 65 | 50 |
| D | 8 | 10 | 65 | 50 | 8 | 10 | 65 | 50 |
Now I want to reshape the report into a stacked (long) format that I can write back to Excel and use for further analysis:
Product | Year | Month | Values
A | 2018 | January | 8
B | 2018 | January | 9
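My idea was to create a DataFrame and use pd.melt().
Unfortunately, I already fail at the first step, when trying to create the DataFrame.
The year is written in only 2 header cells; the rest show up as "Unnamed: x".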
import pandas as pd
# Widen the console output so every column is printed
desired_width = 320
pd.set_option("display.width", desired_width)
pd.set_option("display.max_columns", 30)
# Read the Excel file into a DataFrame
df = pd.read_excel("Stackoverflow_example.xlsx")
print(df)
Unnamed: 0 2018 Unnamed: 2 Unnamed: 3 Unnamed: 4 2019 Unnamed: 6 Unnamed: 7 Unnamed: 8
0 Product January February March April January February March April
1 A 8 10 65 50 8 10 65 50
2 B 9 10 65 50 8 63 65 50
3 C 7 10 65 50 8 10 65 50
4 D 8 10 65 50 8 10 65 50
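The `Unnamed: x` labels appear because Excel keeps a merged cell's value only in its top-left cell, so the other header cells in that row are read as blank. As a rough sketch of what has to happen to that header, the snippet below rebuilds the frame printed above by hand (the literal data stands in for `Stackoverflow_example.xlsx`, and the names `labels`, `years`, `body`, `Year`, and `Month` are illustrative assumptions), forward-fills the year labels to the right, and combines them with the month row into two-level columns.

```python
import pandas as pd

# Hand-built stand-in for the frame printed above (no Excel file required).
months = ["January", "February", "March", "April"]
df = pd.DataFrame(
    [
        ["Product"] + months + months,
        ["A", 8, 10, 65, 50, 8, 10, 65, 50],
        ["B", 9, 10, 65, 50, 8, 63, 65, 50],
        ["C", 7, 10, 65, 50, 8, 10, 65, 50],
        ["D", 8, 10, 65, 50, 8, 10, 65, 50],
    ],
    columns=["Unnamed: 0", 2018, "Unnamed: 2", "Unnamed: 3", "Unnamed: 4",
             2019, "Unnamed: 6", "Unnamed: 7", "Unnamed: 8"],
)

# Blank out the placeholder labels, then carry each year to the right.
labels = pd.Series(df.columns.astype(str))
years = labels.mask(labels.str.startswith("Unnamed")).ffill()

# Drop the month row and the product column from the data area.
body = df.iloc[1:, 1:].astype(int)
body.index = pd.Index(df.iloc[1:, 0], name="Product")

# Pair every year with the month found in the first data row.
body.columns = pd.MultiIndex.from_arrays(
    [years.iloc[1:].astype(int).to_numpy(), df.iloc[0, 1:].to_numpy()],
    names=["Year", "Month"],
)
print(body)
```

This is only meant to show the mechanics; letting `pd.read_excel` parse both header rows directly is shorter when the sheet layout allows it.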
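It would be great if someone could help me solve this.
Thanks a lot.
Edit:
Adding header=[0,1], index_col=[0] works, but I am still struggling to find a way to convert the result into a stacked format...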
import pandas as pd
desired_width = 320
pd.set_option("display.width", desired_width)
pd.set_option("display.max_columns", 30)
df = pd.read_excel("Stackoverflow_example.xlsx", header=[0,1], index_col=[0])
print(df)
----------------------------------------------------------------------
2018 2019
Product January February March April January February March April
A 8 10 65 50 8 10 65 50
B 9 10 65 50 8 63 65 50
C 7 10 65 50 8 10 65 50
D 8 10 65 50 8 10 65 50
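With `header=[0, 1]` the columns become a two-level `MultiIndex`. The sketch below builds an equivalent frame by hand rather than reading the file, and it assumes (inferred from the printout above) that the text "Product" ended up as the name of the month level while the row index stayed unnamed. It shows how the levels can be inspected and how a year or a single year/month pair can be selected.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by read_excel(header=[0, 1], index_col=[0]).
months = ["January", "February", "March", "April"]
columns = pd.MultiIndex.from_product([[2018, 2019], months], names=[None, "Product"])
df = pd.DataFrame(
    [
        [8, 10, 65, 50, 8, 10, 65, 50],
        [9, 10, 65, 50, 8, 63, 65, 50],
        [7, 10, 65, 50, 8, 10, 65, 50],
        [8, 10, 65, 50, 8, 10, 65, 50],
    ],
    index=["A", "B", "C", "D"],
    columns=columns,
)

print(df.columns.nlevels)  # number of header levels
print(df.columns.names)    # names attached to the year and month levels
print(df.index.name)       # name of the row index

year_2018 = df[2018]              # all months of one year, as a DataFrame
feb_2019 = df[(2019, "February")] # one year/month pair, as a Series
print(feb_2019)
```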
It works, but it also messes up the column header names (level_0, and "Product" ends up above the month column)...
import pandas as pd
desired_width = 320
pd.set_option("display.width", desired_width)
pd.set_option("display.max_columns", 30)
df = pd.read_excel("Stackoverflow_example.xlsx", header=[0,1], index_col=[0])
df = df.stack().reset_index()
print(df)
-----------------------------------------------------------------------------
level_0 Product 2018 2019
0 A April 50 50
1 A February 10 10
2 A January 8 8
3 A March 65 65
4 B April 50 50
5 B February 10 63
6 B January 9 8
7 B March 65 65
8 C April 50 50
9 C February 10 10
10 C January 7 8
11 C March 65 65
12 D April 50 50
13 D February 10 10
14 D January 8 8
15 D March 65 65
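The odd headers seem to follow from the axis names: the row index has no name, so `reset_index()` falls back to `level_0`, and the month level carries the name "Product". In addition, `stack()` by default moves only the innermost column level, so the years stay as columns. A possible fix, sketched here on a hand-built copy of the data (the axis names `Year`, `Month`, and `Values` are chosen for illustration), is to name both axes first and then stack both column levels.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by read_excel(header=[0, 1], index_col=[0]).
months = ["January", "February", "March", "April"]
columns = pd.MultiIndex.from_product([[2018, 2019], months], names=[None, "Product"])
df = pd.DataFrame(
    [
        [8, 10, 65, 50, 8, 10, 65, 50],
        [9, 10, 65, 50, 8, 63, 65, 50],
        [7, 10, 65, 50, 8, 10, 65, 50],
        [8, 10, 65, 50, 8, 10, 65, 50],
    ],
    index=["A", "B", "C", "D"],
    columns=columns,
)

# Give the row index and both column levels meaningful names.
df = df.rename_axis(index="Product", columns=["Year", "Month"])

# Stack both column levels so that every value gets its own row.
long_df = df.stack(["Year", "Month"]).rename("Values").reset_index()
print(long_df.head())
```

Row order after stacking may differ between pandas versions, so sort explicitly if a particular order matters.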
I tried renaming the columns and setting "Product" as the index, which leaves empty "cells" below "Month 2018 2019":
import pandas as pd
desired_width = 320
pd.set_option("display.width", desired_width)
pd.set_option("display.max_columns", 30)
df = pd.read_excel("Stackoverflow_example.xlsx", header=[0,1], index_col=[0])
df = df.stack().reset_index()
df.columns = ["Product", "Month", "2018", "2019"]
df = df.set_index("Product")
print(df)
----------------------------------------------------------
Month 2018 2019
Product
A April 50 50
A February 10 10
A January 8 8
A March 65 65
B April 50 50
B February 10 63
B January 9 8
B March 65 65
C April 50 50
C February 10 10
C January 7 8
C March 65 65
D April 50 50
D February 10 10
D January 8 8
D March 65 65
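The original `pd.melt()` idea can also produce the Product | Year | Month | Values layout in one step once both axes are named. The following is a minimal sketch on hand-built data, assuming pandas 1.1 or later (for `ignore_index=False`) and assuming the column levels are already named `Year` and `Month`; `Values` is an illustrative column name.

```python
import pandas as pd

# Hand-built stand-in for the report, with both axes already named.
months = ["January", "February", "March", "April"]
columns = pd.MultiIndex.from_product([[2018, 2019], months], names=["Year", "Month"])
df = pd.DataFrame(
    [
        [8, 10, 65, 50, 8, 10, 65, 50],
        [9, 10, 65, 50, 8, 63, 65, 50],
        [7, 10, 65, 50, 8, 10, 65, 50],
        [8, 10, 65, 50, 8, 10, 65, 50],
    ],
    index=pd.Index(["A", "B", "C", "D"], name="Product"),
    columns=columns,
)

# melt() turns each column level into its own variable column;
# ignore_index=False keeps the product labels so they can be restored.
long_df = df.melt(ignore_index=False, value_name="Values").reset_index()
print(long_df.head())
```

Unlike the stacked result shown above, this keeps one `Values` column instead of one column per year, and rows follow the original column order (all products for January 2018 first).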
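【Comments】:
-
Thanks, jezrael - this works, but I am still struggling to convert it into a stacked format :/
-
Can you check the answer?
-
I have worked with this kind of data before (mostly SAP BW!!). Please let me know if my answer helps.
-
@SebK - oops, unstack is necessary; the answer has been edited.
-
Thanks a lot, guys! Both solutions work fine :)
Tags: python excel pandas dataframe header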