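【Posted】:2021-09-14 16:17:56
【Question】:
I'm looking for a way to fit a curve to a set of x,y scatter data that describes the most typical shape (average? best fit? I'm not sure what the correct term is).
So far I've tried several variations of polyfitting, but it just doesn't work. Low-order fits don't capture the shape well, and high-order fits show various overfitting or undesirable end effects. I also looked into converting the data to a heatmap, which does give me a good visual of what I'm looking for but doesn't give me a way to describe the shape as a function of x position. Open to any ideas... thanks all!
The data consists of multiple "captures" of 2D coordinate data. Each capture is a string of x,y coordinates representing the outer contour of an object. All of the objects are similar at a macro level, but each one is different. The attached picture of tree trunks illustrates the type of information: they're all round, but their contours are clearly different.
Here is a scatter plot of my actual data. Each color represents a separate capture/object. The red line is what I imagine the ideal output/fit would be.
See the bottom of the post for a smaller subset of the data, in csv format, for discussion.
Read in the sample data: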
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('/content/sample_data.csv')
# Plot each capture in its own color, with a faint line joining its points in order
for capture_number in range(max(data.capture_number) + 1):
    x = data.x[data.capture_number == capture_number]
    y = data.y[data.capture_number == capture_number]
    plt.scatter(x, y)
    plt.plot(x, y, alpha=0.3)
This gives me this plot. Each color is a separate capture/object.
Then do some fitting.
# Fit y as a polynomial in x over all captures pooled together
data_sorted = data.sort_values(by='x')
xs = np.linspace(min(data.x), max(data.x), 100)
p_3 = np.polyfit(data_sorted.x, data_sorted.y, 3)
p_5 = np.polyfit(data_sorted.x, data_sorted.y, 5)
p_10 = np.polyfit(data_sorted.x, data_sorted.y, 10)
# Overlay the three fits on the raw scatter data
plt.figure()
plt.plot(xs, np.polyval(p_3, xs), c='m')
plt.plot(xs, np.polyval(p_5, xs), c='c')
plt.plot(xs, np.polyval(p_10, xs), c='r')
plt.scatter(data.x, data.y, s=1, marker='o')
plt.legend(['3rd order fit', '5th order fit', '10th order fit', 'scatter data'])
Sample data:
x,y,capture_number
-92.48513173328318,174.46181346779125,0
-102.34411197872745,143.10470214178093,0
-105.06373626025295,84.86118244975245,0
-98.61190594972697,46.294407824292506,0
-63.99942017496949,9.215007045016817,0
-18.011106513233937,15.073676637862253,0
-2.4091675236573122,66.79032858512424,0
-2.793581679386326,88.45604299278679,0
-0.037191373829076044,83.57866552583225,1
-3.372037438453564,68.37521993841754,1
-6.3020594949810445,32.46959340011879,1
-23.646729955279078,4.201053940534801,1
-60.723536889231134,11.131211365998759,1
-78.67812538210005,26.701665893588228,1
-85.36645115296895,47.23880092937693,1
-95.4513113514432,69.35182638987914,1
-99.18677616845986,76.78728094217132,1
-11.57760873856192,97.62529943790491,2
-2.3850312567657643,46.40555613892471,2
-30.972436937263602,5.161311187235333,2
-80.50235616412658,45.73866780689221,2
-100.84679376767056,82.57165001365009,2
-1.579108157217389,93.4006768863743,3
-1.0342435877346552,79.90689875086049,3
-5.389782177008227,57.17976276644515,3
-6.933090791486306,30.294195133777237,3
-18.071889237594064,4.123664591593948,3
-35.68449626893269,3.6709119699324777,3
-52.024640946341634,10.892909487984635,3
-74.60092054794526,30.140927095466658,3
-91.11409785124107,60.15913910456948,3
-95.08970881852622,71.32115598902,3
-4.170806597499514,95.98055160555235,4
-2.4180312611738395,85.98687321564216,4
-0.5498018356144946,62.4353954245081,4
-2.0744302915463395,29.14987932885067,4
-4.435901811714583,9.792402649412317,4
-16.614338883267788,3.709200622148345,4
-34.87621519152463,2.995086459817584,4
-50.85275052997459,3.9733989189931154,4
-69.99280429729266,18.813544393730336,4
-84.71518869796297,39.58603196895316,4
-96.85090881319562,57.03667006335702,4
-107.87093848852234,85.48437316116156,4
-109.39596178555179,109.19679183900347,4
-107.54308863516981,116.12024736811875,4
0.5117546553123122,78.90192280827411,5
-8.95047861926479,47.224485130899815,5
-11.24909392853944,25.00751807126977,5
-30.443249589625818,5.731132987759837,5
-61.93003194342436,10.249018898126533,5
-78.02611959770444,31.457502011386566,5
-89.74733849583858,54.007959025287285,5
-98.60749645003874,77.54827040087625,5
-100.51048477456361,88.45519980721456,5
-10.12650583863164,132.11572810052456,6
-8.092609250243742,89.89042433421311,6
-3.619178885745626,52.07668960611805,6
-0.9181312645347878,24.215983486617777,6
-15.363333435476594,2.4053207891565536,6
-34.901522245357214,4.831842592092085,6
-50.57437592147933,9.90012604519583,6
-66.10788430707454,19.876547966332367,6
-79.99990573199646,35.55513375638101,6
-95.9648607113633,53.26275540580688,6
-102.11805211534988,97.94937981624,6
-99.55995261744383,130.16356048679103,6
-6.463495249711035,98.7572335450711,7
-0.6995064583571309,69.63788098118992,7
-11.265182407597008,17.91027394127386,7
-42.92158289984183,-2.2681684837896534,7
-76.30092222746524,36.875233496201446,7
-91.83228706419811,68.7361985764675,7
-94.15078846035587,74.31891629844836,7
-0.8557562431032705,117.05886485812867,8
-1.4316909413126913,69.35863586507791,8
-1.732543610955167,31.312002315071403,8
-8.002117735463669,6.822473131379365,8
-39.07947403605981,3.0170544498847915,8
-81.05156816306311,28.09752208418372,8
-96.45367007880188,87.44681780046406,8
-93.94267869648395,129.9987553469081,8
-92.66722585758654,138.25266699687177,8
-2.215747392819274,83.4829833439352,9
-4.456089705671695,45.43342366742096,9
-26.515974744921557,1.541075059187552,9
-67.03191101940496,12.648076176354751,9
-92.9866094421785,56.79854957617513,9
-98.90124191249355,73.7029457010593,9
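【Comments】:
-
Are you trying to fit a function or a general relation (e.g. circle, ellipse, hyperbola)? If you want the final curve to be a non-function, you're unlikely to get a good fit with a polynomial.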
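To illustrate the function-versus-relation point, here is a minimal sketch using capture 3 from the sample data (rounded to two decimals). It first checks whether x is monotonic along the contour, then shows one possible workaround that is not from the original post: parametrize the contour by normalized chord length `t` and fit `x(t)` and `y(t)` separately. The degree of 4 and all variable names are assumptions made for illustration only.

```python
import numpy as np

# Capture 3 from the sample data in the question, rounded to two decimals.
x = np.array([-1.58, -1.03, -5.39, -6.93, -18.07,
              -35.68, -52.02, -74.60, -91.11, -95.09])
y = np.array([93.40, 79.91, 57.18, 30.29, 4.12,
              3.67, 10.89, 30.14, 60.16, 71.32])

# If x is not monotonic along the contour, y cannot be a function of x.
dx = np.diff(x)
is_function_of_x = bool(np.all(dx > 0) or np.all(dx < 0))

# Parametrize by normalized cumulative chord length t in [0, 1].
seg = np.hypot(np.diff(x), np.diff(y))
t = np.concatenate(([0.0], np.cumsum(seg)))
t /= t[-1]

# Fit x(t) and y(t) separately; degree 4 is an arbitrary choice for this sketch.
deg = 4
px = np.polyfit(t, x, deg)
py = np.polyfit(t, y, deg)

# Sample the parametric curve as an (N, 2) array of x,y points.
ts = np.linspace(0.0, 1.0, 200)
curve = np.column_stack((np.polyval(px, ts), np.polyval(py, ts)))
```

Each coordinate is an ordinary function of `t` even when the contour doubles back in x, so polynomials become usable again. This fits a single capture only; combining several captures into one typical shape would need an extra alignment step that this sketch does not attempt.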
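-
@JethroCao - I'm not sure what the correct term is for what I'm looking for. At a high level, I'm looking for some way to define the curve so that I can import it into CAD software and 3D print a part based on it. I agree that polynomial fitting isn't the right approach; I only used it because it's what I'm most familiar with.
-
Not that I'm sure this approach will solve your problem, but I'd look into SVM (support vector machines), which are used in ML for both classification and regression. Here's the API you may want to explore: scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html
-
I looked through some of the documentation but couldn't get very far. It seems like there might be something there, but I'd need to get more familiar with tuning the parameters to get a good fit.
-
Yes, SVM isn't the easiest thing to pick up if you're new to it, but imo it's worth learning regardless. Another potential way to tackle the problem is to split the dataset in two so that each part can be approximated by an actual function, then run a regression separately on the "upper" and "lower" halves. Finally, you'd need to join the two regression curves, paying special attention to the point where they should connect to ensure continuity and smoothness; you could perhaps even use that as a constraint on the regression.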
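The split-and-join idea could be sketched roughly as follows, using capture 4 from the sample data (rounded to two decimals). Several choices here are assumptions rather than anything stated in the thread: the split is made at the lowest point into left and right branches (for this U-shaped sample that is what makes each half single-valued), x is fitted as a cubic in y, and the helpers `fit_through_joint` and `evaluate` are hypothetical names. Dropping the constant term forces both fits through the shared point, which gives continuity but not smoothness at the joint.

```python
import numpy as np

# Capture 4 from the sample data in the question, rounded to two decimals.
pts = np.array([
    [-4.17, 95.98], [-2.42, 85.99], [-0.55, 62.44], [-2.07, 29.15],
    [-4.44, 9.79], [-16.61, 3.71], [-34.88, 3.00], [-50.85, 3.97],
    [-69.99, 18.81], [-84.72, 39.59], [-96.85, 57.04], [-107.87, 85.48],
    [-109.40, 109.20], [-107.54, 116.12],
])

# Split at the lowest point; it belongs to both halves and acts as the joint.
k = int(np.argmin(pts[:, 1]))
x0, y0 = pts[k]
halves = [pts[: k + 1], pts[k:]]

def fit_through_joint(half, degree=3):
    """Least-squares fit of x - x0 = a1*u + ... + a_d*u**d, with u = y - y0.

    Omitting the constant term forces the fitted curve through (x0, y0).
    """
    u = half[:, 1] - y0
    v = half[:, 0] - x0
    A = np.column_stack([u ** p for p in range(1, degree + 1)])
    coeffs, *_ = np.linalg.lstsq(A, v, rcond=None)
    return coeffs

def evaluate(coeffs, y):
    """Evaluate the fitted x for the given y value(s)."""
    u = np.asarray(y, dtype=float) - y0
    return x0 + sum(c * u ** (p + 1) for p, c in enumerate(coeffs))

coeffs = [fit_through_joint(h) for h in halves]

# Sample each half from the joint up to that half's highest point.
curves = []
for c, h in zip(coeffs, halves):
    ys = np.linspace(y0, h[:, 1].max(), 50)
    curves.append(np.column_stack((evaluate(c, ys), ys)))
```

Both sampled curves start at the same point, so concatenating them yields one continuous polyline. Because the contour is nearly horizontal at the bottom, a polynomial in y cannot reproduce the tangent there, and the joint will show a kink; enforcing smoothness would need a different basis or an added derivative constraint.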
Tags: python matplotlib data-science curve-fitting