Adding Missing Rows and Interpolating Their Values: Answers

【Question title】: Adding Missing Rows and Interpolating their Values
【Posted】: 2021-03-22 08:37:10
【Question description】:

I am working with the following DataFrame:

altitude    density east_wind   north_wind
0   5   0.020567    39.714397   6.795392
1   7   0.016871    41.171996   6.852655
2   9   0.013839    42.629594   6.909918
3   11  0.011351    44.087193   6.967182
4   13  0.009311    45.544791   7.024445

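I would like altitude to hold consecutive values rather than only odd numbers, then fill the missing values using SciPy's .interpolate(method='linear'), and extend the interpolation up to an altitude of 20.

Expected output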

altitude    density east_wind   north_wind
0   5   0.020567    39.714397   6.795392
1   6   0.018871    41.171996   6.852655
2   7   0.015839    42.629594   6.909918
3   8   0.013351    44.087193   6.967182
4   9   0.010311    45.544791   7.024445
...
...
9   19  0.000351    50.087193   11.967182
10  20  0.000311    51.544791   12.024445

Please advise.

【Question discussion】:

Tags: python pandas interpolation

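【Solution 1】:

Interpolation in pandas is relatively easy; extrapolation is a bit harder. So we "cheat": we compute the altitude=21 row by hand from the last two rows, then call reindex and interpolate.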
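The hand-computed row is just the straight line through the last two known points, extended to the new altitude. A minimal sketch of that arithmetic for the density column, using a hypothetical helper `extrapolate_linear` that is not part of the original answer:

```python
def extrapolate_linear(x0, y0, x1, y1, x):
    """Extend the straight line through (x0, y0) and (x1, y1) to x."""
    slope = (y1 - y0) / (x1 - x0)
    return y1 + (x - x1) * slope

# Last two density rows from the question: altitude 11 and altitude 13.
density_21 = extrapolate_linear(11, 0.011351, 13, 0.009311, 21)
print(round(density_21, 6))  # 0.001151
```

The answer's code applies the same formula to every column at once by operating on whole rows.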

First we load the data:

from io import StringIO
import pandas as pd

data = StringIO(
"""
altitude    density east_wind   north_wind
0   5   0.020567    39.714397   6.795392
1   7   0.016871    41.171996   6.852655
2   9   0.013839    42.629594   6.909918
3   11  0.011351    44.087193   6.967182
4   13  0.009311    45.544791   7.024445
""")
# Whitespace-separated columns; the first column is the row index
df = pd.read_csv(data, sep=r'\s+', index_col=0)
df

Then:

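last_index = 21
df2 = df.set_index('altitude')
# Linearly extrapolate the altitude=21 row from the last two rows
last, prev = df2.index[-1], df2.index[-2]
slope = (df2.loc[last] - df2.loc[prev]) / (last - prev)
df2.loc[last_index] = df2.loc[last] + (last_index - last) * slope
# Add the missing altitudes as NaN rows, then fill them linearly
df2.reindex(range(5, 22)).interpolate().reset_index()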

This gives:

      altitude    density    east_wind    north_wind
--  ----------  ---------  -----------  ------------
 0           5   0.020567      39.7144       6.79539
 1           6   0.018719      40.4432       6.82402
 2           7   0.016871      41.172        6.85266
 3           8   0.015355      41.9008       6.88129
 4           9   0.013839      42.6296       6.90992
 5          10   0.012595      43.3584       6.93855
 6          11   0.011351      44.0872       6.96718
 7          12   0.010331      44.816        6.99581
 8          13   0.009311      45.5448       7.02445
 9          14   0.008291      46.2736       7.05308
10          15   0.007271      47.0024       7.08171
11          16   0.006251      47.7312       7.11034
12          17   0.005231      48.46         7.13897
13          18   0.004211      49.1888       7.1676
14          19   0.003191      49.9176       7.19623
15          20   0.002171      50.6464       7.22487
16          21   0.001151      51.3752       7.2535
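The same steps can be wrapped in a small function that stops at altitude 20, as the question asks, instead of 21. This is only a sketch: the name `fill_and_extend` and its parameters are made up for illustration, the DataFrame is built inline from the question's values, and it assumes the key column holds unique integers.

```python
import pandas as pd

def fill_and_extend(df, key, stop):
    """Fill every integer value of `key` up to `stop` by linear
    interpolation, extrapolating past the last row with its final slope."""
    out = df.set_index(key).sort_index()
    last, prev = out.index[-1], out.index[-2]
    slope = (out.loc[last] - out.loc[prev]) / (last - prev)
    if stop > last:
        # Hand-computed end row, as in the answer above
        out.loc[stop] = out.loc[last] + (stop - last) * slope
    full = range(int(out.index[0]), stop + 1)
    return out.reindex(full).interpolate().reset_index()

df = pd.DataFrame({
    "altitude": [5, 7, 9, 11, 13],
    "density": [0.020567, 0.016871, 0.013839, 0.011351, 0.009311],
    "east_wind": [39.714397, 41.171996, 42.629594, 44.087193, 45.544791],
    "north_wind": [6.795392, 6.852655, 6.909918, 6.967182, 7.024445],
})
result = fill_and_extend(df, "altitude", 20)
print(result)
```

Because the reindexed altitudes are evenly spaced, the default `interpolate()` method (linear) gives the same values as interpolating on the altitude itself.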

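【Discussion】:

I tried to implement your code, but it raises ValueError: cannot reindex from a duplicate axis
Hmm. I edited my answer to show how I load the data. If you follow my steps exactly, including the data loading, does it work? What is the dtype of 'altitude' in your original df?
I tried your code. The interpolation part works fine, but the extrapolation produces some unreasonable values that do not follow the trend. Is there another way to extrapolate?
docs.google.com/spreadsheets/d/…
My point is that you generally cannot just extrapolate your data linearly and expect a sensible answer. The code above does what you asked for, but it may not be what you actually need.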
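As one possible alternative for the density column only: in the five sample rows, each density is roughly 0.82 times the previous one, which suggests exponential rather than linear decay. The sketch below assumes that pattern continues beyond altitude 13 (an assumption, not something the thread confirms) and fits a straight line to log(density) with `numpy.polyfit` instead of to density itself.

```python
import numpy as np

altitude = np.array([5, 7, 9, 11, 13], dtype=float)
density = np.array([0.020567, 0.016871, 0.013839, 0.011351, 0.009311])

# Fit log(density) = a * altitude + b (assumes exponential decay)
a, b = np.polyfit(altitude, np.log(density), 1)

# Evaluate the fitted curve at the altitudes beyond the data
new_altitude = np.arange(14, 21)
extrapolated = np.exp(a * new_altitude + b)

for alt, rho in zip(new_altitude, extrapolated):
    print(alt, round(float(rho), 6))
```

Unlike the linear version, this curve can never go negative, but whether it is appropriate depends on the physical model behind the real data.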