Reindexing

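An important method on pandas objects is reindex, which creates a new object whose data conforms to a new index. The sessions below assume that from pandas import Series, DataFrame and import numpy as np have already been run.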

In [136]: obj=Series([4.5,7.2,-5.3,3.6],index=['d','b','a','c'])

In [137]: obj2=obj.reindex(['a','b','c','d','e'])

In [138]: obj2
Out[138]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
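
As a standalone sketch of the session above (assuming pandas is imported as pd, which the original transcript does not show), the following checks that reindex returns a new object and leaves the original untouched:

```python
import numpy as np
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex builds a new Series arranged according to the requested labels
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])

# The original Series keeps its own index order
print(list(obj.index))
print(list(obj2.index))

# 'e' did not exist in obj, so it shows up as a missing value
print(obj2.isna().sum())
```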

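If an index value is not present in the original object, a missing value (NaN) is introduced, as with e above. The fill_value option substitutes a different value: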

In [140]: obj.reindex(['a','b','c','d','e'],fill_value=0)
Out[140]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
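
One detail worth checking is that fill_value applies only to labels newly introduced by reindex, not to NaN values already stored in the data. The small Series s below is a hypothetical example, not part of the original session:

```python
import numpy as np
import pandas as pd

# 'b' already holds NaN before reindexing
s = pd.Series([1.0, np.nan], index=['a', 'b'])

# 'c' is a new label, so it receives fill_value
filled = s.reindex(['a', 'b', 'c'], fill_value=0)
print(filled)
```

To replace pre-existing NaN values as well, fillna can be called on the result.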

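For ordered data such as time series, you may want to interpolate or fill in values while reindexing. The method option does this; for example, ffill fills forward: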

In [141]: obj3=Series(['blue','purple','yellow'],index=[0,2,4])

In [142]: obj3.reindex(range(6),method='ffill')
Out[142]:
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object

The (interpolation) method options of reindex

ffill or pad: fill (or carry) values forward

bfill or backfill: fill (or carry) values backward
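
The two directions can be compared side by side. This is a minimal sketch that reuses the obj3 data from the session, with pandas assumed to be imported as pd:

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Forward fill: each new label takes the last earlier value
forward = obj3.reindex(range(6), method='ffill')

# Backward fill: each new label takes the next later value;
# label 5 has nothing after it, so it stays missing
backward = obj3.reindex(range(6), method='bfill')

print(forward)
print(backward)
```

Both methods rely on the original index being sorted (monotonic), which is why they suit ordered data such as time series.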

For a DataFrame, reindex can alter the (row) index, the columns, or both. If only a sequence is passed, the rows are reindexed.

In [143]: frame=DataFrame(np.arange(9).reshape(3,3),index=['a','b','c'],columns=['one','two','three'])

In [144]: frame
Out[144]:
   one  two  three
a    0    1      2
b    3    4      5
c    6    7      8

In [145]: frame2=frame.reindex(['a','b','c','d'])

In [146]: frame2
Out[146]:
   one  two  three
a  0.0  1.0    2.0
b  3.0  4.0    5.0
c  6.0  7.0    8.0
d  NaN  NaN    NaN

In [147]: states=['red','yellow','green']

In [148]: frame.reindex(columns=states)
Out[148]:
   red  yellow  green
a  NaN     NaN    NaN
b  NaN     NaN    NaN
c  NaN     NaN    NaN

In [149]: states=['red','one','two']

In [150]: frame.reindex(columns=states)
Out[150]:
   red  one  two
a  NaN    0    1
b  NaN    3    4
c  NaN    6    7
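
Rows and columns can also be reindexed in a single call by passing both the index and columns keywords. The sketch below assumes pandas and NumPy are imported as pd and np. It also shows why some outputs above switch from integers to floats: a column that gains NaN is converted to float64.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape(3, 3),
                     index=['a', 'b', 'c'],
                     columns=['one', 'two', 'three'])

# Reindex rows and columns at the same time
both = frame.reindex(index=['a', 'b', 'c', 'd'],
                     columns=['red', 'one', 'two'])
print(both)

# Only the columns change here, so 'one' keeps its integer dtype
only_cols = frame.reindex(columns=['red', 'one', 'two'])
print(only_cols.dtypes)
```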