Reindexing
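An important method on pandas objects is reindex, which creates a new object whose data is conformed to a new index. The IPython sessions in this section assume that numpy has been imported as np and that Series and DataFrame have been imported from pandas.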
In [136]: obj = Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

In [137]: obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])

In [138]: obj2
Out[138]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
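If a label in the new index is not present in the original, reindex inserts a missing value (NaN) for it. To use a different value for those entries, pass fill_value: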
In [140]: obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
Out[140]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    0.0
dtype: float64
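The same two calls can be written as a standalone script. This is a minimal sketch that uses pd.Series directly instead of the session's bare Series import; the name obj_filled is only illustrative. It also checks that reindex leaves the original object untouched.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# Reorder to the new labels; 'e' has no match in obj, so it becomes NaN
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])

# Same new labels, but fill the unmatched label with 0 instead of NaN
obj_filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)

# reindex returns a new object; the original keeps its label order
print(obj.index.tolist())
print(obj2)
print(obj_filled)
```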
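For ordered data such as time series, you may want to fill in values for the new labels when reindexing (a simple form of interpolation). The method option does this; for example, method='ffill' carries each value forward to the labels that follow it: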
In [141]: obj3 = Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

In [142]: obj3.reindex(range(6), method='ffill')
Out[142]:
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
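Because the paragraph mentions time series, here is a minimal sketch of the same idea on dates. The readings and dates are made up for illustration: a series observed only on some days is reindexed onto a full daily calendar built with pd.date_range, and method='ffill' carries the last observed value forward. Filling with a method requires the original index to be monotonic, which the sorted dates here are.

```python
import pandas as pd

# Hypothetical readings recorded only on some days (sorted dates)
readings = pd.Series(
    [10.0, 12.5, 11.0],
    index=pd.to_datetime(['2024-01-01', '2024-01-03', '2024-01-06']),
)

# A complete daily calendar spanning the same period
all_days = pd.date_range('2024-01-01', '2024-01-06', freq='D')

# Days with no reading take the most recent earlier reading
filled = readings.reindex(all_days, method='ffill')
print(filled)
```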
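Options for the reindex method (interpolation) argument:
ffill or pad: fill (carry) values forward from the previous label
bfill or backfill: fill (carry) values backward from the next label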
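To compare the two directions, the sketch below rebuilds obj3 from the session and reindexes it with both options. It also checks that pad and backfill behave as aliases of ffill and bfill. Note that with bfill, label 5 has no later label to borrow from, so it stays NaN.

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
new_index = range(6)

forward = obj3.reindex(new_index, method='ffill')   # take the previous value
backward = obj3.reindex(new_index, method='bfill')  # take the next value

# 'pad' and 'backfill' are accepted as alternative names
assert forward.equals(obj3.reindex(new_index, method='pad'))
assert backward.equals(obj3.reindex(new_index, method='backfill'))

print(pd.DataFrame({'ffill': forward, 'bfill': backward}))
```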
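For a DataFrame, reindex can change the (row) index, the columns, or both. When passed only a sequence, it reindexes the rows; to reindex the columns, use the columns keyword: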
In [143]: frame = DataFrame(np.arange(9).reshape(3, 3), index=['a', 'b', 'c'], columns=['one', 'two', 'three'])

In [144]: frame
Out[144]:
   one  two  three
a    0    1      2
b    3    4      5
c    6    7      8

In [145]: frame2 = frame.reindex(['a', 'b', 'c', 'd'])

In [146]: frame2
Out[146]:
   one  two  three
a  0.0  1.0    2.0
b  3.0  4.0    5.0
c  6.0  7.0    8.0
d  NaN  NaN    NaN

In [147]: states = ['red', 'yellow', 'green']

In [148]: frame.reindex(columns=states)
Out[148]:
   red  yellow  green
a  NaN     NaN    NaN
b  NaN     NaN    NaN
c  NaN     NaN    NaN

In [149]: states = ['red', 'one', 'two']

In [150]: frame.reindex(columns=states)
Out[150]:
   red  one  two
a  NaN    0    1
b  NaN    3    4
c  NaN    6    7
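The session reindexes rows and columns in separate calls. As a minimal sketch, both axes can be changed in a single call by passing index= and columns= together. The label choices and the name both are illustrative. Row 'd' and column 'red' are new, so they come back as NaN, and the surviving numbers are upcast to float to hold those NaNs.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape(3, 3),
                     index=['a', 'b', 'c'],
                     columns=['one', 'two', 'three'])

# Keep rows 'a' and 'c', add row 'd'; reorder columns and add 'red'
both = frame.reindex(index=['a', 'c', 'd'],
                     columns=['three', 'one', 'red'])
print(both)
```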