Pandas DataFrame interpolation with logarithmically sampled time intervals: answers

【Question title】: Pandas dataframe interpolation with logarithmically sampled time intervals
【Posted】: 2021-01-09 06:10:15
【Question description】:

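I have a pandas DataFrame with a time column ('Minutes', in minutes) and a 'Value' column extracted from a data logger. The data were recorded at logarithmic time intervals: the first values are logged at fractions of a minute, and the interval between readings grows longer as time goes on: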

print(df)
      Minutes   Value
0       0.001    0.00100
1       0.005    0.04495
2       0.010    0.04495
3       0.015    0.09085
4       0.020    0.11368
..        ...        ...
561  4275.150  269.17782
562  4285.150  266.90964
563  4295.150  268.35306
564  4305.150  269.42984
565  4315.150  268.37594

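I would like to linearly interpolate 'Value' at one-minute intervals from 0 to 4315 minutes.

I have tried several variations of df.interpolate() without success. Can anyone help? Thanks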
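
For reference, one way df.interpolate() can do this (a sketch, not taken from the answers below) is to move the times into the index, add the whole-minute grid as extra NaN rows, interpolate with method='index' so the actual time values rather than row positions act as x, and then keep only the grid rows. Only the sample rows printed above are used, and the names grid, s, combined, filled and result are illustrative:

```python
import numpy as np
import pandas as pd

# Sample rows from the question (only the first and last few are shown there)
df = pd.DataFrame({
    'Minutes': [0.001, 0.005, 0.010, 0.015, 0.020,
                4275.150, 4285.150, 4295.150, 4305.150, 4315.150],
    'Value': [0.00100, 0.04495, 0.04495, 0.09085, 0.11368,
              269.17782, 266.90964, 268.35306, 269.42984, 268.37594],
})

# Target grid: every whole minute from 0 to 4315
grid = np.arange(0, int(df['Minutes'].iloc[-1]) + 1, dtype=float)

# Put time in the index and add the grid points as NaN rows (union sorts them)
s = df.set_index('Minutes')['Value']
combined = s.reindex(s.index.union(grid))

# method='index' interpolates against the numeric index values, not row positions
filled = combined.interpolate(method='index', limit_direction='both')

# Keep only the whole-minute rows; the union drops the index name, so restore it
result = filled.loc[grid].rename_axis('Minutes').reset_index()
print(result.head())
```

With limit_direction='both', minute 0 (before the first sample at 0.001 min) takes the first recorded value, which is the same thing np.interp does by default.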

【Question discussion】:

Tags: python pandas dataframe scipy interpolation

【Solution 1】:

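I think my question may have been very basic, or I asked it in a confusing way. Either way, I wrote a small loop that solves my problem and felt I should share it. I'm sure this isn't the most efficient way to do what I asked, and I hope someone can suggest a better approach. I'm still very new to all of this.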

First, some qualifiers:

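The data I called 'Value' is actually 'drawdown': the difference between the water level in a well and the initial starting water level. It starts at 0.
This kind of data is usually viewed on a semi-log plot, and it is sometimes easier to replace the 0 with a very small number (e.g. 0.0001) so it can be plotted easily in other programs.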
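
A minimal sketch of that substitution in pandas, assuming the zeros sit in 'Minutes' and 'Drawdown' columns. The helper make_log_safe and the toy values are hypothetical; 1e-4 mirrors the 0.0001 mentioned above:

```python
import numpy as np
import pandas as pd

def make_log_safe(df, columns, tiny=1e-4):
    """Hypothetical helper: return a copy of df with exact zeros in `columns`
    replaced by `tiny`, so a log axis never has to show log10(0) = -inf."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].replace(0, tiny)
    return out

# Toy values for illustration only
df = pd.DataFrame({'Minutes': [0.0, 1.0, 10.0, 100.0],
                   'Drawdown': [0.0, 0.5, 2.0, 4.0]})
safe = make_log_safe(df, ['Minutes', 'Drawdown'])
print(np.log10(safe['Minutes']).tolist())
```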

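This code takes a .csv file with columns named 'Minutes' and 'Drawdown' and compares its time values against a new reference DataFrame of whole minutes from 0 to the end of the dataset. For each whole minute it finds the 2 recorded time values closest to it, takes a weighted average of their drawdowns, and then writes a new csv with the drawdown at integer minutes.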

Cheers!

# -*- coding: utf-8 -*-
"""
Created on Tue Sep 22 13:42:29 2020

@author: cmeyer
"""

import pandas as pd
import numpy as np

# Input CSV must have 'Minutes' and 'Drawdown' columns, with the last row at the latest time
df = pd.read_csv('Read_in.csv')
non_uni_minutes = df['Minutes'].to_numpy(dtype=float)
drawdown = df['Drawdown'].to_numpy(dtype=float)

# Whole minutes from 0 up to and including the last whole minute recorded
lengthpump = int(non_uni_minutes[-1])
dfminutes = pd.DataFrame({'Minutes': np.arange(0, lengthpump + 1, dtype=float)})
dfminutes['Drawdown'] = np.nan

for i in range(1, lengthpump + 1):
    # Row positions of the two recorded times closest to minute i
    index1, index2 = np.argsort(np.abs(non_uni_minutes - i))[:2]
    close1 = non_uni_minutes[index1]
    close2 = non_uni_minutes[index2]

    # Weight each neighbour by 1 minus its distance from i, relative to i
    weight1 = 1 - abs((i - close1) / i)
    weight2 = 1 - abs((i - close2) / i)

    # Weighted average of the two neighbouring drawdowns
    value = (weight1 * drawdown[index1] + weight2 * drawdown[index2]) / (weight1 + weight2)
    dfminutes.at[i, 'Drawdown'] = value

# Replace t = 0 with a tiny value so the first row can be shown on a semi-log plot
dfminutes.at[0, 'Drawdown'] = 0.000001
dfminutes.at[0, 'Minutes'] = 0.000001
dfminutes.to_csv('integer_minutes_drawdown.csv')
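
The loop above searches the whole time column once per minute. The same nearest-two weighting can be sketched without a Python loop using NumPy broadcasting; the function name nearest_two_weighted is hypothetical, and it builds a full targets-by-samples distance matrix (a few thousand by a few hundred values for this dataset), which is fine at this size but not for much larger inputs:

```python
import numpy as np

def nearest_two_weighted(times, values, targets):
    """Hypothetical vectorized version of the loop: for each target time, average
    the two recorded values whose times are closest, each weighted by
    1 - |target - t| / target. Targets must be > 0."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Distance from every target (rows) to every recorded time (columns)
    dist = np.abs(targets[:, None] - times[None, :])
    # Column positions of the two nearest recorded times, shape (n_targets, 2)
    nearest = np.argsort(dist, axis=1)[:, :2]
    close = times[nearest]
    # Same weighting as the loop
    weights = 1 - np.abs(targets[:, None] - close) / targets[:, None]
    return (weights * values[nearest]).sum(axis=1) / weights.sum(axis=1)

# Example on the tail of the sample data shown in the question
t = np.array([4275.150, 4285.150, 4295.150, 4305.150, 4315.150])
d = np.array([269.17782, 266.90964, 268.35306, 269.42984, 268.37594])
print(nearest_two_weighted(t, d, np.arange(4276, 4316)))
```

Two properties of this weighting are worth knowing: the two nearest samples can lie on the same side of the target, and a weight turns negative when a neighbour is more than twice the target time away. It is therefore not the same as linear interpolation between the bracketing samples, which is what np.interp computes.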

【Discussion】:

【Solution 2】:

Here is an efficient solution using numpy.interp. I read the data into a pandas.DataFrame from a string in a somewhat fancy way; use whatever simpler method suits you, e.g. pandas.read_csv(...).

import math
import pandas as pd, numpy as np

# Just a fancy way of reading the sample data; use pd.read_csv(...) or any other method instead
df = pd.DataFrame([map(float, line.split()) for line in """
   0.001    0.00100
   0.005    0.04495
   0.010    0.04495
   0.015    0.09085
   0.020    0.11368
4275.150  269.17782
4285.150  266.90964
4295.150  268.35306
4305.150  269.42984
4315.150  268.37594
""".splitlines() if line.strip()], columns = ['Time', 'Value'])

a = df.values
# Create array of integer x = [0 1 2 3 ... LastTimeFloor].
x = np.arange(math.floor(a[-1, 0] + 1e-6) + 1)
# Linearly interpolate
y = np.interp(x, a[:, 0], a[:, 1])

df = pd.DataFrame({'Time': x, 'Value': y})
print(df)
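
One detail of np.interp matters for this data: outside the range of the recorded times it returns the end values instead of extrapolating, so minute 0 comes out as 0.001 (the value recorded at 0.001 min), not 0. If the series should start at 0, as Solution 1 notes for drawdown, the left argument of np.interp can pin it. A small sketch using the first few rows from the question:

```python
import numpy as np

# First few (Minutes, Value) rows from the question; xp must be increasing
t = np.array([0.001, 0.005, 0.010, 0.015, 0.020])
v = np.array([0.00100, 0.04495, 0.04495, 0.09085, 0.11368])
x = np.array([0.0, 0.0075, 0.02, 0.5])

# Default: points outside [t[0], t[-1]] take the nearest end value
clamped = np.interp(x, t, v)
# left=0.0 sets every x < t[0] (here minute 0) to zero instead
pinned = np.interp(x, t, v, left=0.0)
print(clamped)
print(pinned)
```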

【Discussion】: