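First create a Series by shifting the column names, split each name on the first space and append ' Date'. Then filter to keep only the entries whose index starts with 'Unnamed', and use the result to rename the columns: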
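import pandas as pd
# shift the names left so each 'Unnamed' column is paired with its right-hand neighbour
s = df.columns.to_series().shift(-1).str.split(n=1).str[0] + ' Date'
s = s[s.index.str.startswith('Unnamed')]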
print (s)
Unnamed: 0 233740 Date
Unnamed: 2 233160 Date
dtype: object
df = df.rename(columns=s)
print (df)
233740 Date 233740 KS Equity 233160 Date 233160 KS Equity
0 2015-12-17 10330.0 2017-08-31 10460.0
1 2015-12-18 10710.0 2017-09-01 10815.0
2 2015-12-21 10720.0 2017-09-04 10835.0
3 2015-12-22 10495.0 2017-09-05 10660.0
4 2015-12-23 10425.0 2017-09-06 10535.0
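The renaming steps above can be followed one at a time in a small self-contained sketch. The two-row frame below is made-up sample data that only mimics the layout shown here, and the intermediate names (`names`, `neighbour`, `mapping`) are my own, not part of the original code:

```python
import pandas as pd

# Hypothetical sample frame: each unnamed date column sits immediately
# to the left of the value column it belongs to.
df = pd.DataFrame({
    'Unnamed: 0': ['2015-12-17', '2015-12-18'],
    '233740 KS Equity': [10330.0, 10710.0],
    'Unnamed: 2': ['2017-08-31', '2017-09-01'],
    '233160 KS Equity': [10460.0, 10815.0],
})

# Series whose index and values are both the column names.
names = df.columns.to_series()

# shift(-1) places each column's right-hand neighbour next to it.
neighbour = names.shift(-1)

# Keep the first whitespace-separated token and append ' Date'.
new_names = neighbour.str.split(n=1).str[0] + ' Date'

# Only the 'Unnamed' columns should be renamed.
mapping = new_names[new_names.index.str.startswith('Unnamed')].to_dict()

renamed = df.rename(columns=mapping)
print(renamed.columns.tolist())
```

Converting the Series to a dict before calling `rename` is optional; it just makes the old-name to new-name mapping explicit.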
If you need to build 2 or 3 columns from all of the data, first create a MultiIndex in the columns with split, then reshape with stack:
df.columns = df.columns.str.split(n=1, expand=True)
df = df.stack(0).reset_index(level=0, drop=True).rename_axis('val').reset_index()
print (df)
val Date KS Equity
0 233160 2017-08-31 10460.0
1 233740 2015-12-17 10330.0
2 233160 2017-09-01 10815.0
3 233740 2015-12-18 10710.0
4 233160 2017-09-04 10835.0
5 233740 2015-12-21 10720.0
6 233160 2017-09-05 10660.0
7 233740 2015-12-22 10495.0
8 233160 2017-09-06 10535.0
9 233740 2015-12-23 10425.0
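As a rough illustration of that reshape, here is a minimal sketch on made-up data (the two-row frame is hypothetical and assumes the date columns were already renamed as above). The final `sort_values` is my addition so the row order does not depend on the pandas version; some recent 2.x releases may also print a FutureWarning for `stack`:

```python
import pandas as pd

# Hypothetical frame whose date columns have already been renamed.
df = pd.DataFrame({
    '233740 Date': ['2015-12-17', '2015-12-18'],
    '233740 KS Equity': [10330.0, 10710.0],
    '233160 Date': ['2017-08-31', '2017-09-01'],
    '233160 KS Equity': [10460.0, 10815.0],
})

# Split each name once: level 0 is the number, level 1 is the rest.
df.columns = df.columns.str.split(n=1, expand=True)
print(df.columns.tolist())

# Move level 0 into the row index, drop the old row numbers, and
# expose the number as an ordinary column named 'val'.
long_df = (df.stack(0)
             .reset_index(level=0, drop=True)
             .rename_axis('val')
             .reset_index()
             .sort_values(['val', 'Date'], ignore_index=True))
print(long_df)
```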
EDIT:
Solution for multiple different headers:
# create dummy data
df1 = df.copy()
df1.columns = ['Unnamed: 4','233 JP Equity','Unnamed: 6','235 JP Equity']
df = df.join(df1)
print (df)
Unnamed: 0 233740 KS Equity Unnamed: 2 233160 KS Equity Unnamed: 4 \
0 2015-12-17 10330.0 2017-08-31 10460.0 2015-12-17
1 2015-12-18 10710.0 2017-09-01 10815.0 2015-12-18
2 2015-12-21 10720.0 2017-09-04 10835.0 2015-12-21
3 2015-12-22 10495.0 2017-09-05 10660.0 2015-12-22
4 2015-12-23 10425.0 2017-09-06 10535.0 2015-12-23
233 JP Equity Unnamed: 6 235 JP Equity
0 10330.0 2017-08-31 10460.0
1 10710.0 2017-09-01 10815.0
2 10720.0 2017-09-04 10835.0
3 10495.0 2017-09-05 10660.0
4 10425.0 2017-09-06 10535.0
s = df.columns.to_series().shift(-1) + ' Date'
s = s[s.index.str.startswith('Unnamed')]
print (s)
Unnamed: 0 233740 KS Equity Date
Unnamed: 2 233160 KS Equity Date
Unnamed: 4 233 JP Equity Date
Unnamed: 6 235 JP Equity Date
dtype: object
df = df.rename(columns=s)
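In a list comprehension, group the columns by their leading number, set each group's date column as its index (giving a DatetimeIndex if the dates are datetimes) and concat the pieces. Finally, reshape with stack and unstack to remove the NaNs: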
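# strip the leading number from each column name
f = lambda x: x.split(' ', 1)[1]
# note: groupby(..., axis=1) is deprecated in pandas >= 2.1
df = pd.concat([x.set_index(x.columns[0]).rename(columns=f) for i, x
                in df.groupby(df.columns.str.split(n=1).str[0], axis=1)], axis=1).stack().unstack()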
print (df)
JP Equity KS Equity
2015-12-17 10330.0 10330.0
2015-12-18 10710.0 10710.0
2015-12-21 10720.0 10720.0
2015-12-22 10495.0 10495.0
2015-12-23 10425.0 10425.0
2017-08-31 10460.0 10460.0
2017-09-01 10815.0 10815.0
2017-09-04 10835.0 10835.0
2017-09-05 10660.0 10660.0
2017-09-06 10535.0 10535.0
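Since grouping columns with `axis=1` is deprecated in newer pandas, one possible alternative is to loop over the value columns and pivot. This is only a sketch on made-up data, not the method used above: the frame, the `label`/`value` column names and the assumption that every value column has a matching `'<name> Date'` column are all mine:

```python
import pandas as pd

# Hypothetical frame after the rename step: every value column
# '<number> <label>' has a matching '<number> <label> Date' column.
df = pd.DataFrame({
    '233740 KS Equity Date': ['2015-12-17', '2015-12-18'],
    '233740 KS Equity': [10330.0, 10710.0],
    '233160 KS Equity Date': ['2017-08-31', '2017-09-01'],
    '233160 KS Equity': [10460.0, 10815.0],
    '233 JP Equity Date': ['2015-12-17', '2015-12-18'],
    '233 JP Equity': [10330.0, 10710.0],
    '235 JP Equity Date': ['2017-08-31', '2017-09-01'],
    '235 JP Equity': [10460.0, 10815.0],
})

pieces = []
for col in df.columns:
    if col.endswith(' Date'):
        continue  # date columns are picked up through their value column
    label = col.split(' ', 1)[1]  # e.g. 'KS Equity'
    pieces.append(pd.DataFrame({
        'Date': pd.to_datetime(df[col + ' Date']),
        'label': label,
        'value': df[col],
    }))

# One long table, then one column per label indexed by date.
long_df = pd.concat(pieces, ignore_index=True)
wide = long_df.pivot(index='Date', columns='label', values='value')
print(wide)
```

`pivot` raises an error if the same date appears twice for one label, so this only fits data where each label has at most one value per date.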