如何使用 numy linalg lstsq 来拟合具有相同斜率但截距不同的两个数据集？答案

【问题标题】：How to use numy linalg lstsq to fit two datasets with same slope but different intercept?如何使用 numy linalg lstsq 来拟合具有相同斜率但截距不同的两个数据集？
【发布时间】：2020-10-03 16:21:45
【问题描述】：

我正在尝试进行加权最小二乘拟合，并遇到了numpy.linalg.lstsq。我需要拟合加权最小二乘。因此，以下工作：

# Generate some synthetic data from the model.
N = 50
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = 10.0 * x + 15
y += yerr * np.random.randn(N)
#do the fitting
err = 1/yerr**2
W = np.sqrt(np.diag(err))
x = x.flatten()
y = y.flatten()
A = np.vstack([x, np.ones(len(x))]).T
xw = np.dot(W,A)
yw = np.dot(W,y)
m, b = np.linalg.lstsq(xw, yw)[0]

这给了我最合适的斜率和截距。现在，假设我有两个具有相同斜率但截距不同的数据集？我将如何进行联合拟合以获得最佳拟合斜率加上两个截距。我仍然需要加权最小二乘版本。对于未加权的情况，我发现以下工作：

(m,b1,b2),_,_,_ = np.linalg.lstsq(np.stack([np.concatenate((x1,x2)),
                                        np.concatenate([np.ones(len(x1)),np.zeros(len(x2))]),
                                        np.concatenate([np.zeros(len(x1)),np.ones(len(x2))])]).T, 
                              np.concatenate((y1,y2)))

【问题讨论】：

标签： python numpy curve-fitting least-squares data-fitting

【解决方案1】：

首先我重写你的第一种方法，因为我认为它可以写得更清楚

weights = 1 / yerr
m, b = np.linalg.lstsq(np.c_[weights * x, weights], weights * y, rcond=None)[0]

为了适应 2 个数据集，您可以堆叠 2 个数组，但将矩阵的一些元素设为 0。

np.random.seed(12)
N = 3
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = 10.0 * x + 15
y += yerr * np.random.randn(N)

M = 2
x1 = np.sort(10 * np.random.rand(M))
yerr1 = 0.1 * 0.5 * np.random.rand(M)
y1 = 10.0 * x1 + 25
y1 += yerr1 * np.random.randn(M)
#do the fitting
weights = 1 / yerr
weights1 = 1 / yerr1
first_column = np.r_[weights * x, weights1 * x1]
second_column = np.r_[weights, [0] * x1.size]
third_column = np.r_[[0] * x.size, weights1]
a = np.c_[first_column, second_column, third_column]
print(a)
# [[  4.20211437   2.72576342   0.        ]
#  [ 24.54293941   9.32075195   0.        ]
#  [ 13.22997409   1.78771428   0.        ]
#  [126.37829241   0.          26.03711851]
#  [686.96961895   0.         124.44253391]]
c = np.r_[weights * y, weights1 * y1]
print(c)
# [  83.66073785  383.70595203  159.12058215 1914.59065915 9981.85549321]
m, b1, b2 = np.linalg.lstsq(a, c, rcond=None)[0]
print(m, b1, b2)
# 10.012202998026055 14.841412336510793 24.941219918240172

编辑

如果你想要不同的斜率和一个截距，你可以这样做。可能最好掌握单斜率 2 截距情况的总体思路。看看数组a：你从权重和c 构造它，所以现在它是未加权的问题。您尝试找到这样的vector = [slope, intercept1, intercept2] 和a @ vector = c（通过最小化差异的平方和来尽可能地）。通过在a 中置零，我们使其可分离：矩阵的上半部分a 变化slope 和intercept1，a 的下半部分变化slope 和intercept2。类似于 vector = [slope1, slope2, intercept] 的 2 个斜坡案例。

first_column = np.r_[weights * x, [0] * x1.size]
second_column = np.r_[[0] * x.size, weights1 * x1]
third_column = np.r_[weights, weights1]

【讨论】：

但是对于有零和没有零的情况，这会产生不同的截距和斜率，对吧？据我了解，他想保留一个修复。
@Joe 我不确定我是否完全理解您的评论，但它给出了一个斜率和 2 个截距。当我们生成 y 和 y1 时，我们使用 10、15 和 25 作为斜率和 2 个截距。最后的代码列表显示我们得到了这个值。
好的，我现在明白了。将删除我的答案。
一个后续问题：如果我有两个斜率不同但截距相同的数据集，代码将如何修改？
非常感谢。这似乎工作得很好。最后一个问题：有没有办法获得与斜率和截距相关的标准不确定性？输出似乎只给出了残差的总和。