【问题标题】:How can I use gsl to calculate polynomial regression data points?如何使用 gsl 计算多项式回归数据点?
【发布时间】:2017-07-18 14:53:41
【问题描述】:

(免责声明:我的数学很糟糕,并且来自 JavaScript,因此对于任何不准确之处,我深表歉意,并会尽力纠正它们。)

Rosetta Code 上的example 展示了如何使用 gsl 计算系数。代码如下:

polifitgsl.h:

#ifndef _POLIFITGSL_H
#define _POLIFITGSL_H
#include <gsl/gsl_multifit.h>
#include <stdbool.h>
#include <math.h>
bool polynomialfit(int obs, int degree, 
           double *dx, double *dy, double *store); /* n, p */
#endif

polifitgsl.cpp:

#include "polifitgsl.h"

bool polynomialfit(int obs, int degree, 
           double *dx, double *dy, double *store) /* n, p */
{
  gsl_multifit_linear_workspace *ws;
  gsl_matrix *cov, *X;
  gsl_vector *y, *c;
  double chisq;

  int i, j;

  X = gsl_matrix_alloc(obs, degree);
  y = gsl_vector_alloc(obs);
  c = gsl_vector_alloc(degree);
  cov = gsl_matrix_alloc(degree, degree);

  for(i=0; i < obs; i++) {
    for(j=0; j < degree; j++) {
      gsl_matrix_set(X, i, j, pow(dx[i], j));
    }
    gsl_vector_set(y, i, dy[i]);
  }

  ws = gsl_multifit_linear_alloc(obs, degree);
  gsl_multifit_linear(X, y, c, cov, &chisq, ws);

  /* store result ... */
  for(i=0; i < degree; i++)
  {
    store[i] = gsl_vector_get(c, i);
  }

  gsl_multifit_linear_free(ws);
  gsl_matrix_free(X);
  gsl_matrix_free(cov);
  gsl_vector_free(y);
  gsl_vector_free(c);
  return true; /* we do not "analyse" the result (cov matrix mainly)
          to know if the fit is "good" */
}

main.cpp(请注意,我已将 x any y 的示例编号替换为我自己的):

#include <stdio.h>

#include "polifitgsl.h"

#define NP 11
double x[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
double y[] = {98.02, 98.01, 98.01, 98.02, 97.98, 97.97, 97.96, 97.94, 97.96, 97.96, 97.97, 97.97, 97.94, 97.94, 97.94, 97.92, 97.96, 97.9, 97.85, 97.9};

#define DEGREE 3
double coeff[DEGREE];

int main()
{
  int i;

  polynomialfit(NP, DEGREE, x, y, coeff);
  for(i=0; i < DEGREE; i++) {
    printf("%lf\n", coeff[i]);
  }
  return 0;
}

这是输出:

98.030909
-0.016182
0.000909

所以这给了我系数。但我真正想要的是实际的拟合点。在 JavaScript 中,我使用了regression package 来计算分数:

var regression = require('regression');

var calculateRegression = function(values, degree) {
    var data = [];
    var regressionOutput;
    var valuesCount = values.length;
    var i = 0;

    // Format the data in a way the regression library expects.
    for (i = 0; i < valuesCount; i++) {
        data[i] = [i, values[i]];
    }

    // Use the library to calculate the regression.
    regressionOutput = regression('polynomial', data, degree);

    return regressionOutput;
};

var y = [98.02, 98.01, 98.01, 98.02, 97.98, 97.97, 97.96, 97.94, 97.96, 97.96, 97.97, 97.97, 97.94, 97.94, 97.94, 97.92, 97.96, 97.9, 97.85, 97.9];

console.log(calculateRegression(y, 3));

产生:

{ equation: 
   [ 98.02987916431594,
     -0.017378390369880512,
     0.0015748071645344357,
     -0.00005721503635571101 ],
  points: 
   [ [ 0, 98.02987916431594 ],
     [ 1, 98.01401836607424 ],
     [ 2, 98.00096389194348 ],
     [ 3, 97.9903724517055 ],
     [ 4, 97.98190075514219 ],
     [ 5, 97.97520551203543 ],
     [ 6, 97.96994343216707 ],
     [ 7, 97.96577122531896 ],
     [ 8, 97.96234560127297 ],
     [ 9, 97.959323269811 ],
     [ 10, 97.95636094071487 ],
     [ 11, 97.95311532376647 ],
     [ 12, 97.94924312874768 ],
     [ 13, 97.94440106544033 ],
     [ 14, 97.93824584362629 ],
     [ 15, 97.93043417308745 ],
     [ 16, 97.92062276360569 ],
     [ 17, 97.90846832496283 ],
     [ 18, 97.89362756694074 ],
     [ 19, 97.87575719932133 ] ],
  string: 'y = 0x^3 + 0x^2 + -0.02x + 98.03' }

(请注意,JavaScript 中存在浮点问题,因此数字并不完全准确。)

points 这是我想要使用 gsl 生成的。有什么办法可以做到吗?

【问题讨论】:

  • gsl 确实不需要生成积分。如果您上面的等式是y = x^3 + x^2 - 0.2x + 98.03,那么对于任何x,您计算一个对应的值y,构成您的(x, y) 点。您只需要一个简单的double y; for (x = 0; x &lt; 20; x++) y = x*x*x + x*x -.02*x + 98.03; 即可生成上面列出的点。完全没有 gsl 影响。
  • 当然——非常有意义。我怎么没有意识到这一点:) 谢谢!
  • 嗯,其实我还是有点迷茫。当我运行 gsl 示例时,这就是我得到的所有输出:98.030909、-0.016182 和 0.000909。我究竟如何从中建立一个方程?
  • 乍得,如果你说的是曲线拟合中使用的向量和协方差矩阵中的点,可以使用gsl_vector_getgsl_matrix_get(参见:gsl multifit example)否则,如果您只是在谈论x 值范围内任何点的点,那么只需使用找到系数的多项式方程即可。您在上面的示例中显示的似乎是曲线拟合中使用的点。

标签: c polynomial-math gsl


【解决方案1】:

乍得,如果我了解您的需求,您只需使用您在polynomialfit 函数中找到的系数,根据多项式方程计算值。计算系数后,您可以通过以下等式找到任何 x(对于 DEGREE = 3)的值 y

y = x^2*(coeff2) + x*(coeff1) + coeff0

或在 C 语法中

y = x*x*coeff[2] + x*coeff[1] + coeff[0];

您可以按如下方式修改您的main.cpp(您应该将main.cpp 重命名为main.cpolyfitgsl.cpppolyfitgsl.c——因为它们都是C 源文件,而不是C++)

#include <stdio.h>

#include "polifitgsl.h"

#define NP 20
double x[] = {  0,  1,  2,  3,  4,  5,  6,  7,  8, 9,
               10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
double y[] = { 98.02, 98.01, 98.01, 98.02, 97.98,
               97.97, 97.96, 97.94, 97.96, 97.96,
               97.97, 97.97, 97.94, 97.94, 97.94,
               97.92, 97.96, 97.9,  97.85, 97.9 };

#define DEGREE 3
double coeff[DEGREE];

int main (void)
{
    int i;

    polynomialfit (NP, DEGREE, x, y, coeff);

    printf ("\n polynomial coefficients:\n\n");
    for (i = 0; i < DEGREE; i++) {
        printf ("  coeff[%d] : %11.7lf\n", i, coeff[i]);
    }
    putchar ('\n');

    printf (" computed values:\n\n   x    y\n");
    for (i = 0; i < (int)(sizeof x/sizeof *x); i++)
        printf ("  %2d, %11.7lf\n", i, 
                i*i*coeff[2] + i*coeff[1] + coeff[0]);
    putchar ('\n');

    return 0;
}

提供以下输出:

使用/输出示例

$ ./bin/main

 polynomial coefficients:

  coeff[0] :  98.0132468
  coeff[1] :  -0.0053003
  coeff[2] :  -0.0000558

 computed values:

   x    y
   0,  98.0132468
   1,  98.0078906
   2,  98.0024229
   3,  97.9968435
   4,  97.9911524
   5,  97.9853497
   6,  97.9794354
   7,  97.9734094
   8,  97.9672718
   9,  97.9610226
  10,  97.9546617
  11,  97.9481891
  12,  97.9416049
  13,  97.9349091
  14,  97.9281016
  15,  97.9211825
  16,  97.9141517
  17,  97.9070093
  18,  97.8997553
  19,  97.8923896

这似乎是您所要求的。仔细查看,比较您的价值观,如果您需要其他帮助,请告诉我。


进一步清理

只是为了进一步清理代码,因为不需要全局声明xycoeff,并使用enum 来定义常量DEGREE 和@ 987654339@,更简洁的版本是:

#include <stdio.h>

#include "polifitgsl.h"

enum { DEGREE = 3, NP = 20 };

int main (void)
{
    int i;
    double x[] = {  0,  1,  2,  3,  4,  5,  6,  7,  8, 9,
                   10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
    double y[] = { 98.02, 98.01, 98.01, 98.02, 97.98,
                   97.97, 97.96, 97.94, 97.96, 97.96,
                   97.97, 97.97, 97.94, 97.94, 97.94,
                   97.92, 97.96, 97.9,  97.85, 97.9 };
    double coeff[DEGREE] = {0};


    polynomialfit (NP, DEGREE, x, y, coeff);

    printf ("\n polynomial coefficients:\n\n");
    for (i = 0; i < DEGREE; i++)
        printf ("  coeff[%d] : %11.7lf\n", i, coeff[i]);
    putchar ('\n');

    printf (" computed values:\n\n   x    y\n");
    for (i = 0; i < (int)(sizeof x/sizeof *x); i++)
        printf ("  %2d, %11.7lf\n", i, 
                i*i*coeff[2] + i*coeff[1] + coeff[0]);
    putchar ('\n');

    return 0;
}

【讨论】:

  • 非常感谢 David 竭尽全力 - 非常感谢。 JS 回归库的数字太差了!哇。
  • 我刚刚意识到 JavaScript 库的编号实际上是错误的,但它们并没有那么远。我需要将 C 程序中的 NP 调整为 20 而不是 11。我将继续编辑你的答案以反映这一点——希望你不介意我修正我自己的错误 :)
  • 很高兴我能帮上忙。如果你喜欢编程,你很快就会发现 C 语言会成为你的最爱。是的,学习需要一些额外的努力,因为它需要准确了解您正在使用的内存以及如何访问它。但是,这种低级访问带来了无与伦比的编码灵活性和强大的功能,可以让您的代码完全按照您的需要执行。没有更好的编译语言可以学习。如果你掌握了 C 语言,那么掌握剩下的就是小菜一碟了。
  • C 当然有它的用途。但是,一切都是全局函数,这让我很难理解应用程序的结构。我更喜欢 C++、Java 甚至 JavaScript 等面向对象的语言。几年前我在做一个PHP代码库,所有的函数都是全局的,所以很乱,很难理解。
  • 在了解了如何用 C 编写面向对象的桌面(例如 Gnome 等)之后,您会意识到全局 main 实际上只是控制应用程序的入口点。 C++ 使用相同的main。这是您将控制权从main 移交给差异消失的地方。我更多地将其视为为工作选择合适的工具。 C 不打算成为 a、.js、php 等。它可以以面向对象或过程的方式使用。这就是它的灵活性。最明显的是开销和加载时间。与 Java 和 C++ 相比,您通常会看到 400% 以上的收益。
【解决方案2】:

我遇到了同样的问题,并已采取上述答案并添加了从http://www.cplusplus.com/forum/general/181580/获取的直接计算解决方案(没有gsl)

您可以在下面找到一个独立的测试程序,该程序具有基于 gsl 的直接计算解决方案。

我已经进行了一些分析运行,直接计算的性能在我的系统上是令人印象深刻的 65 倍,对相应函数的 1000 次调用分别为 5.22 秒和 0.08 秒。

附带说明,直接计算使用克莱默规则,因此您应该小心病态数据。通常我会避免使用 Cramer,但对 3x3 系统使用全线性系统求解器对我来说似乎有点过分了。

/*
* =====================================================================================
*
*       Filename:  polyfit.cpp
*
*    Description:  Least squares fit of second order polynomials 
*                  Test program using gsl and direct calculation
*        Version:  1.0
*        Created:  2017-07-17 09:32:55
*       Compiler:  gcc
*
*         Author:  Bernhard Brunner, brb_blog@epr.ch
*     References:  This code was merged, adapted and optimized from these two sources:
*                  http://www.cplusplus.com/forum/general/181580/
*                  https://stackoverflow.com/questions/36522882/how-can-i-use-gsl-to-calculate-polynomial-regression-data-points
*                  http://brb.epr.ch/blog/blog:least_squares_regression_of_parabola
*          Build:  compile and link using 
*                  g++    -c -o polifitgsl.o polifitgsl.cpp
*                  gcc  polifitgsl.o -lgsl -lm -lblas -o polifitgsl
* 
*      Profiling:
*                  valgrind --tool=callgrind ./polifitgsl
*                  kcachegrind
*                  
*                  polynomialfit        takes 5.22s for 1000 calls
*                  findQuadCoefficients takes 0.08s for 1000 calls
*                   65x faster
* =====================================================================================
*/

#include <stdio.h>
#include <gsl/gsl_multifit.h>
#include <stdbool.h>
#include <math.h>

bool polynomialfit(int obs, int degree, 
        double *dx, double *dy, double *store); /* n, p */
double x[] = {  0,  1,  2,  3,  4,  5,  6,  7,  8, 9,
            10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
double y[] = { 98.02, 98.01, 98.01, 98.02, 97.98,
            97.97, 97.96, 97.94, 97.96, 97.96,
            97.97, 97.97, 97.94, 97.94, 97.94,
            97.92, 97.96, 97.9,  97.85, 97.9 };

#define NP (sizeof(x)/sizeof(double)) // 20

#define DEGREE 3
double coeff[DEGREE];

bool findQuadCoefficients(double timeArray[], double valueArray[], double *coef, double &critPoint, int PointsNum){
    const double S00=PointsNum;//points number
    double S40=0, S10=0, S20=0, S30=0, S01=0, S11=0, S21 = 0;
//    const double MINvalue = valueArray[0];
//    const double MINtime = timeArray[0];
    for (int i=0; i<PointsNum; i++ ){
        double value = valueArray[i]; //  - MINvalue); //normalizing
//      cout << "i=" << i << " index=" << index << " value=" << value << endl;

        int index = timeArray[i]; //  - MINtime;
        int index2 = index * index;
        int index3 = index2 * index;
        int index4 = index3 * index;

        S40+= index4;
        S30+= index3; 
        S20+= index2;
        S10+= index;

        S01 += value;
        S11 += value*index;
        S21 += value*index2;
    }

    double S20squared = S20*S20;

    //minors M_ij=M_ji
    double M11 = S20*S00 - S10*S10;
    double M21 = S30*S00 - S20*S10;
    double M22 = S40*S00 - S20squared;
    double M31 = S30*S10 - S20squared;

    double M32 = S40*S10 - S20*S30;
//  double M33 = S40*S20 - pow(S30,2);

    double discriminant = S40*M11 - S30*M21 + S20*M31;
//    printf("discriminant :%lf\n", discriminant);
    if (abs(discriminant) < .00000000001) return  false;

    double Da = S21*M11
               -S11*M21
               +S01*M31;
    coef[2] = Da/discriminant;
//  cout << "discriminant=" << discriminant;
//  cout << " Da=" << Da;

    double Db = -S21*M21
                +S11*M22
                -S01*M32;
    coef[1] = Db/discriminant;
//  cout << " Db=" << Db << endl;

    double Dc =   S40*(S20*S01 - S10*S11) 
                - S30*(S30*S01 - S10*S21) 
                + S20*(S30*S11 - S20*S21);
    coef[0] = Dc/discriminant;
//    printf("c=%lf\n", c);

    critPoint = -Db/(2*Da); // + MINtime; //-b/(2*a)= -Db/discriminant / (2*(Da/discriminant)) = -Db/(2*Da);

    return true;
}

bool polynomialfit(int obs, int degree, 
        double *dx, double *dy, double *store) /* n, p */
{
gsl_multifit_linear_workspace *ws;
gsl_matrix *cov, *X;
gsl_vector *y, *c;
double chisq;

int i, j;

X = gsl_matrix_alloc(obs, degree);
y = gsl_vector_alloc(obs);
c = gsl_vector_alloc(degree);
cov = gsl_matrix_alloc(degree, degree);

for(i=0; i < obs; i++) {
    for(j=0; j < degree; j++) {
        gsl_matrix_set(X, i, j, pow(dx[i], j));
    }
    gsl_vector_set(y, i, dy[i]);
}

ws = gsl_multifit_linear_alloc(obs, degree);
gsl_multifit_linear(X, y, c, cov, &chisq, ws);

/* store result ... */
for(i=0; i < degree; i++) {
    store[i] = gsl_vector_get(c, i);
}

gsl_multifit_linear_free(ws);
gsl_matrix_free(X);
gsl_matrix_free(cov);
gsl_vector_free(y);
gsl_vector_free(c);
return true; /* we do not "analyse" the result (cov matrix mainly)
        to know if the fit is "good" */
}

void testcoeff(double *coeff)
{
    printf ("\n polynomial coefficients\n");
    for (int i = 0; i < DEGREE; i++) {
        printf ("  coeff[%d] : %11.7lf\n", i, coeff[i]);
    }
    putchar ('\n');

    printf (" computed values:\n\n   x, yi, yip\n");
    for (unsigned i = 0; i < NP; i++) {
        printf ("%2u,%.7lf,%.7lf\n", i, 
                y[i], 
                i*i*coeff[2] + i*coeff[1] + coeff[0]);
    }
    putchar ('\n');
}

int main (void)
{
    #define ITER 1000
    for (int i=0; i< ITER; i++) {
        polynomialfit (NP, DEGREE, x, y, coeff);
    }
    testcoeff(coeff);

    double sx; 
    for (int i=0; i< ITER; i++) {
        findQuadCoefficients(x, y, coeff, sx, NP);
    }
    printf("critical point %lf\n", sx);
    testcoeff(coeff);
    return 0;
}

参考资料:

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2021-04-06
    • 2018-02-22
    • 2018-03-17
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多