Python - Interpolation of plots

【Question title】: Python - Interpolation of plots
【Posted】: 2017-06-08 02:11:36
【Question description】:

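For my evaluation, I used gnuplot to plot data from two separate csv files (found at this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUZGV6X1Bvb3c4TWs) that have different numbers of rows, which produced the following graph.

The data in the csv files do not seem to share common timestamps (the first column), yet gnuplot seems to fit the plots together as shown above.

Here is the gnuplot script I used to generate the plot.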

# ###### GNU Plot

set style data lines
set terminal postscript eps enhanced color "Times" 20

set output "output.eps"

set title "Actual vs. Estimated Comparison"

set style line 99 linetype 1 linecolor rgb "#999999" lw 2
#set border 1 back ls 11
set key right top
set key box linestyle 50
set key width -2
set xrange [0:10]
set key spacing 1.2
#set nokey

set grid xtics ytics mytics
#set size 2
#set size ratio 0.4

#show timestamp
set xlabel "Time [Seconds]"
set ylabel "Segments"

set style line 1 lc rgb "#ff0000" lt 1 pi 0 pt 4 lw 4 ps 0

plot  "estimated.csv" using ($1):2 with lines title "Estimated", "actual.csv" using ($1):2 with lines title "Actual";

I would like to interpolate my green line onto the grid that defines my pink line, and then compare the two. Here is my initial approach:

#!/usr/bin/env python
import sys

import numpy as np
from shapely.geometry import LineString
#-------------------------------------------------------------------------------
def load_data(fname):
    return LineString(np.genfromtxt(fname, delimiter = ','))
#-------------------------------------------------------------------------------
lines = list(map(load_data, sys.argv[1:]))

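# Shapely 2.x multi-part geometries are not iterable; go through .geoms.
# A single-geometry result (e.g. one Point) has no .geoms, so wrap it.
crossings = lines[0].intersection(lines[1])
for g in getattr(crossings, 'geoms', [crossings]):
    if g.geom_type != 'Point':
        continue
    print('%f,%f' % (g.x, g.y))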
Then in Gnuplot, one can invoke it directly:

set terminal pngcairo
set output 'fig.png'

set datafile separator comma
set yr [0:700]
set xr [0:10]

set xtics 0,2,10
set ytics 0,100,700

set grid

set xlabel "Time [seconds]"
set ylabel "Segments"

plot \
    'estimated.csv' w l lc rgb 'dark-blue' t 'Estimated', \
    'actual.csv' w l lc rgb 'green' t 'Actual', \
    '<python filter.py estimated.csv actual.csv' w p lc rgb 'red' ps 0.5 pt 7 t ''

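This gives us the following plot:

From this script I wrote the filtered points to another file (filtered_points.csv, found at this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUSHVOMzYySjVzZWc). However, the filtered points are less than 10% of the actual dataset (which is the ground truth).

Is there any way to interpolate the two lines using python while ignoring the pink peaks above the green plot? Gnuplot does not seem to be the best tool for this. If the pink line does not touch the green line (i.e. if it is below the green line), I want to take the value of the nearest green line, so that there is a one-to-one correspondence (or very close to it) with the actual dataset. I want to return the interpolated values of the green line on the pink line's grid, so that we can compare the two lines because they have the same array size.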

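【Comments】:

I don't think I understand what you really want to do. What does "I would like to interpolate my green line onto the grid that defines my pink line, and then compare the two" mean? As I understand it, you would like to: 1. Fit the green curve. 2. Make sure all pink data lie below the green data. 3. Compare the data and look for intersections that way. 4. Return this intersection data. Is that right? Doesn't the green curve already meet your needs?
What kind of interpolation? Linear? Spline? Something else?
@Franz, exactly!!! But what I ultimately want is a one-to-one data size for the green and pink lines. If you look at the .csv files at this link: drive.google.com/drive/folders/0B2Iv8dfU4fTUZGV6X1Bvb3c4TWs - we have more data points in estimated.csv than in actual.csv (the ground truth). In that case, I would like to smooth it so that it matches the ground truth. If there are gaps (as the plot shows, some points lie below the green line), we take the current value (data point) of the green line. Hope this explains it.
@Goyo, I think splines would be fine.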

Tags: python python-3.x numpy scipy interpolation

【Solution 1】:

numpy.interp() makes it very simple to get the same data size through interpolation. For me, this code works:

import numpy as np
import matplotlib.pyplot as plt

names = ['actual.csv','estimated.csv']
#-------------------------------------------------------------------------------
def load_data(fname):
    return np.genfromtxt(fname, delimiter = ',')
#-------------------------------------------------------------------------------

data = [load_data(name) for name in names]
actual_data = data[0]
estimated_data = data[1]
interpolated_estimation = np.interp(estimated_data[:,0],actual_data[:,0],actual_data[:,1])

plt.figure()
plt.plot(actual_data[:,0],actual_data[:,1], label='actual')
plt.plot(estimated_data[:,0],estimated_data[:,1], label='estimated')
plt.plot(estimated_data[:,0],interpolated_estimation, label='interpolated')
plt.legend()
plt.show(block=True)

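After this interpolation, interpolated_estimation has the same size as the x axis of estimated_data (np.interp returns one value per requested x position), as the plot suggests. The slicing is a bit confusing, but I tried to reuse your function and keep the plot calls as clear as possible.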
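
To make the size relationship concrete, here is a minimal sketch with made-up arrays (the values are illustrative stand-ins, not taken from the CSV files). It only shows that np.interp(x, xp, fp) returns one value for each entry of x, whatever the length of xp:

```python
import numpy as np

# Hypothetical stand-ins for the two CSV files (not the real data).
actual_t = np.array([0.0, 1.0, 2.0])               # coarse grid, 3 samples
actual_y = np.array([0.0, 10.0, 20.0])
estimated_t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # denser grid, 5 samples

# np.interp(x, xp, fp): evaluate the curve (xp, fp) at the points x.
on_estimated_grid = np.interp(estimated_t, actual_t, actual_y)

print(on_estimated_grid.shape)  # same length as estimated_t
print(on_estimated_grid)
```

So whichever array is passed first decides the length of the result.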

To save to a file and plot as suggested, I changed the code to:

import numpy as np
import matplotlib.pyplot as plt

names = ['actual.csv','estimated.csv']
#-------------------------------------------------------------------------------
def load_data(fname):
    return np.genfromtxt(fname, delimiter = ',')
#-------------------------------------------------------------------------------

data = [load_data(name) for name in names]
actual_data = data[0]
estimated_data = data[1]
interpolated_estimation = np.interp(estimated_data[:,0],actual_data[:,0],actual_data[:,1])

plt.figure()
plt.plot(actual_data[:,0],actual_data[:,1], label='actual')
#plt.plot(estimated_data[:,0],estimated_data[:,1], label='estimated')
plt.plot(estimated_data[:,0],interpolated_estimation, label='interpolated')
np.savetxt('interpolated.csv',
       np.vstack((estimated_data[:,0],interpolated_estimation)).T,
       delimiter=',', fmt='%10.5f')  # save the two columns to a file
plt.legend()
plt.title('Actual vs. Interpolated')
plt.xlim(0,10)
plt.ylim(0,500)
plt.xlabel('Time [Seconds]')
plt.ylabel('Segments')
plt.grid()
plt.show(block=True)

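This produces the following output:

【Comments】:

Thanks Franz. Is it possible to write the interpolated data points to another file, so it is easy to see whether there is a one-to-one correspondence? In the end, I want to generate a graph like the one at this link: drive.google.com/open?id=0B2Iv8dfU4fTUSHVOMzYySjVzZWc
Hi Desta, I added the modified source.
Great. I have accepted your answer, Franz. However, if you look at the files actual.csv (15179 rows) and estimated.csv (258267 rows), each has two columns of data separated by a comma. But the new file interpolated.csv has a single column of data with about 516534 rows. Even the new data (e.g. the first row: 2.648999999999999879e-03) is hard to interpret. Is it possible to have two columns like the original files and make it correspond one-to-one with actual.csv? I mean, have the same rows as actual.csv?
Very good, thanks. Now interpolated.csv, the new file, has exactly the same number of rows as the original estimated.csv. Does this mean we cannot get a one-to-one correspondence with actual.csv (meaning that actual.csv and interpolated.csv have the same number of rows)?
Since the number of rows is taken from estimated.csv and the data is taken from actual.csv, the interpolated data represents your actual data. This way, if you treat interpolated as actual, actual and estimated have the same number of rows.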
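
If the goal is instead a result with exactly as many rows as actual.csv, one option would be to swap the roles and evaluate the estimated curve at the timestamps of actual.csv. This is a minimal sketch, not part of the accepted answer. It assumes both files hold two columns (time, value) with time in increasing order, which np.interp expects. The helper name resample_to_grid and the small arrays are made up for illustration:

```python
import numpy as np

def resample_to_grid(source, target_t):
    """Linearly interpolate the (time, value) rows of `source` at `target_t`.

    `source` is an (N, 2) array; its time column must be increasing,
    which np.interp assumes but does not check.
    """
    values = np.interp(target_t, source[:, 0], source[:, 1])
    # Two columns (time, value), one row per entry of target_t.
    return np.column_stack((target_t, values))

# Made-up stand-ins for np.genfromtxt('actual.csv', delimiter=',') and
# np.genfromtxt('estimated.csv', delimiter=',').
actual = np.array([[0.0, 0.0], [1.0, 10.0], [2.0, 20.0]])
estimated = np.array([[0.0, 1.0], [0.5, 6.0], [1.0, 11.0],
                      [1.5, 16.0], [2.0, 21.0]])

estimated_on_actual = resample_to_grid(estimated, actual[:, 0])

# Same row count as `actual`, so the curves can be compared point by point,
# here with a root-mean-square error as one possible measure.
rmse = np.sqrt(np.mean((estimated_on_actual[:, 1] - actual[:, 1]) ** 2))
print(estimated_on_actual.shape, rmse)
```

With the real files, np.savetxt('estimated_on_actual_grid.csv', estimated_on_actual, delimiter=',', fmt='%.5f') would write the result as two comma-separated columns (the output file name is hypothetical). Keep in mind that this simply samples the denser curve at the actual timestamps, so narrow peaks that fall between two of those timestamps may not show up in the result.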