【发布时间】:2018-06-12 14:24:20
【问题描述】:
数据帧不工作的多维插值
import pandas as pd
import numpy as np
raw_data = {'CCY_CODE': ['SGD','USD','USD','USD','USD','USD','USD','EUR','EUR','EUR','EUR','EUR','EUR','USD'],
'END_DATE': ['16/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018',
'17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018'],
'STRIKE':[0.005,0.01,0.015,0.02,0.025,0.03,0.035,0.04,0.045,0.05,0.55,0.06,0.065,0.07],
'VOLATILITY':[np.nan,np.nan,0.3424,np.nan,0.2617,0.2414,np.nan,np.nan,0.215,0.212,0.2103,np.nan,0.2092,np.nan]
}
df_volsurface = pd.DataFrame(raw_data,columns = ['CCY_CODE','END_DATE','STRIKE','VOLATILITY'])
df_volsurface['END_DATE'] = pd.to_datetime(df_volsurface['END_DATE'])
df_volsurface.interpolate(method='akima',limit_direction='both')
输出:
<table><tbody><tr><th> </th><th>CCY_CODE</th><th>END_DATE</th><th>STRIKE</th><th>VOLATILITY</th></tr><tr><td>0</td><td>SGD</td><td>3/16/2018</td><td>0.005</td><td>NaN</td></tr><tr><td>1</td><td>USD</td><td>3/17/2018</td><td>0.01</td><td>NaN</td></tr><tr><td>2</td><td>USD</td><td>3/17/2018</td><td>0.015</td><td>0.3424</td></tr><tr><td>3</td><td>USD</td><td>3/17/2018</td><td>0.02</td><td>0.296358</td></tr><tr><td>4</td><td>USD</td><td>3/17/2018</td><td>0.025</td><td>0.2617</td></tr><tr><td>5</td><td>USD</td><td>3/17/2018</td><td>0.03</td><td>0.2414</td></tr><tr><td>6</td><td>USD</td><td>3/17/2018</td><td>0.035</td><td>0.230295</td></tr><tr><td>7</td><td>EUR</td><td>3/17/2018</td><td>0.04</td><td>0.220911</td></tr><tr><td>8</td><td>EUR</td><td>3/17/2018</td><td>0.045</td><td>0.215</td></tr><tr><td>9</td><td>EUR</td><td>3/17/2018</td><td>0.05</td><td>0.212</td></tr><tr><td>10</td><td>EUR</td><td>3/17/2018</td><td>0.55</td><td>0.2103</td></tr><tr><td>11</td><td>EUR</td><td>3/17/2018</td><td>0.06</td><td>0.209471</td></tr><tr><td>12</td><td>EUR</td><td>3/17/2018</td><td>0.065</td><td>0.2092</td></tr><tr><td>13</td><td>USD</td><td>3/17/2018</td><td>0.07</td><td>NaN</td></tr></tbody></table>
预期结果:
<table><tbody><tr><th> </th><th>CCY_CODE</th><th>END_DATE</th><th>STRIKE</th><th>VOLATILITY</th></tr><tr><td>0</td><td>SGD</td><td>3/16/2018</td><td>0.005</td><td>NaN</td></tr><tr><td>1</td><td>USD</td><td>3/17/2018</td><td>0.01</td><td>Expected some logical value</td></tr><tr><td>2</td><td>USD</td><td>3/17/2018</td><td>0.015</td><td>0.3424</td></tr><tr><td>3</td><td>USD</td><td>3/17/2018</td><td>0.02</td><td>0.296358</td></tr><tr><td>4</td><td>USD</td><td>3/17/2018</td><td>0.025</td><td>0.2617</td></tr><tr><td>5</td><td>USD</td><td>3/17/2018</td><td>0.03</td><td>0.2414</td></tr><tr><td>6</td><td>USD</td><td>3/17/2018</td><td>0.035</td><td>0.230295</td></tr><tr><td>7</td><td>EUR</td><td>3/17/2018</td><td>0.04</td><td>0.220911</td></tr><tr><td>8</td><td>EUR</td><td>3/17/2018</td><td>0.045</td><td>0.215</td></tr><tr><td>9</td><td>EUR</td><td>3/17/2018</td><td>0.05</td><td>0.212</td></tr><tr><td>10</td><td>EUR</td><td>3/17/2018</td><td>0.55</td><td>0.2103</td></tr><tr><td>11</td><td>EUR</td><td>3/17/2018</td><td>0.06</td><td>0.209471</td></tr><tr><td>12</td><td>EUR</td><td>3/17/2018</td><td>0.065</td><td>0.2092</td></tr><tr><td>13</td><td>USD</td><td>3/17/2018</td><td>0.07</td><td>Expected some logical value</td></tr></tbody></table>
线性插值方法在不考虑 ccy_code 的情况下将最后一个可用值复制到所有后向和前向缺失值
df_volsurface.interpolate(method='linear',limit_direction='both')
输出:
<table><tbody><tr><th>CCY_CODE</th><th>END_DATE</th><th>STRIKE</th><th>VOLATILITY</th><th> </th></tr><tr><td>0</td><td>SGD</td><td>3/16/2018</td><td>0.005</td><td>0.3424</td></tr><tr><td>1</td><td>USD</td><td>3/17/2018</td><td>0.01</td><td>0.3424</td></tr><tr><td>2</td><td>USD</td><td>3/17/2018</td><td>0.015</td><td>0.3424</td></tr><tr><td>3</td><td>USD</td><td>3/17/2018</td><td>0.02</td><td>0.30205</td></tr><tr><td>4</td><td>USD</td><td>3/17/2018</td><td>0.025</td><td>0.2617</td></tr><tr><td>5</td><td>USD</td><td>3/17/2018</td><td>0.03</td><td>0.2414</td></tr><tr><td>6</td><td>USD</td><td>3/17/2018</td><td>0.035</td><td>0.2326</td></tr><tr><td>7</td><td>EUR</td><td>3/17/2018</td><td>0.04</td><td>0.2238</td></tr><tr><td>8</td><td>EUR</td><td>3/17/2018</td><td>0.045</td><td>0.215</td></tr><tr><td>9</td><td>EUR</td><td>3/17/2018</td><td>0.05</td><td>0.212</td></tr><tr><td>10</td><td>EUR</td><td>3/17/2018</td><td>0.55</td><td>0.2103</td></tr><tr><td>11</td><td>EUR</td><td>3/17/2018</td><td>0.06</td><td>0.20975</td></tr><tr><td>12</td><td>EUR</td><td>3/17/2018</td><td>0.065</td><td>0.2092</td></tr><tr><td>13</td><td>USD</td><td>3/17/2018</td><td>0.07</td><td>0.2092</td></tr></tbody></table>
感谢任何帮助!谢谢!
【问题讨论】:
-
您可以在调用 interpolate 之前发布一个数据帧示例
df_volsurface吗? -
嗨 WolfgangK,感谢您提供帮助。按要求更新。
-
从你发的图片来看,没有缺失数据。如果您在两个都为 0 的值之间进行插值,则结果仍然为 0。作为一般提示,请让我们轻松地为您提供帮助:将数据样本以我们可以复制和粘贴的形式保存会很棒重现您的错误。此外,显示您预期的结果。由于没有人拥有您的 csv 文件,因此数据准备的代码对其他人来说几乎毫无意义。也许看看How to create a Minimal, Complete, and Verifiable example
-
WolfgangK,正如我所建议的那样,我改变了我的问题。另外,感谢您先前关于 0. 0 的评论,需要在插值之前转换 NaN,这现在可以工作,但发布时存在一些问题。谢谢!
-
我猜您的
'STRIKE'列中有错字(0.55 应该是 0.055)。
标签: python pandas dataframe interpolation