Need a fast method to convert a large number of doubles to strings

【Question title】: Need a fast method to convert large amount of double to string
【Posted】: 2018-12-28 07:08:08
【Question】:

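I am writing a result output module for a high-speed computing program.

My plan is:

My task is to insert the results into a database (PostgreSQL) at a reasonably high speed.
I use [COPY FROM STDIN] from libpq, which I was told is the fastest method.
That method requires the results to be converted to char* format.

The results look like this:

Monthly cash flows for the next 106 years (1272 doubles in total).
About 14 cash flows for each entity.
About 2800 entities (2790 in the test data).

The table in the database looks like this:

Each row of the table holds one entity.
A few prefix columns identify the different entities.
The cash flows are arrays of double that follow the prefix columns (float8[] type in PGSQL).

Here is the code that creates the table in the database: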

create table AgentCF(
PlanID     int4,
Agent      int4,
Senario    int4,
RM_Prev    float8[], DrvFac_Cur float8[], Prem       float8[],
Comm       float8[], CommOR     float8[], FixExp     float8[],
VarExp     float8[], CIRCFee    float8[], SaftyFund  float8[],
Surr       float8[], Benefit_1  float8[], Benefit_2  float8[],
Benefit_3  float8[], Benefit_4  float8[], Benefit_5  float8[],
Benefit_6  float8[], Benefit_7  float8[], Benefit_8  float8[],
Benefit_9  float8[], Benefit_10 float8[]
);

Here is the code of the function that prepares one cash flow for insertion:

void AsmbCF(char *buffer, int size, int ProdNo, int i, int Pos, int LineEnd)
{
    int     j, Step = sizeof(nodecf) / sizeof(double), PosST, Temp;
    double *LoopRate = &AllHeap[ProdNo].Heap.AgentRes[i].CF.NodeCF[0].Prem;
    strcpy_s(buffer, size, "{");
    for (j = 0; j < TOTLEN / 10; j++) {
        PosST = j * 10 * Step + Pos;
        sprintf_s(&buffer[strlen(buffer)], size - strlen(buffer), "%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,",
            LoopRate[PosST],
            LoopRate[PosST + 1 * Step],
            LoopRate[PosST + 2 * Step],
            LoopRate[PosST + 3 * Step],
            LoopRate[PosST + 4 * Step],
            LoopRate[PosST + 5 * Step],
            LoopRate[PosST + 6 * Step],
            LoopRate[PosST + 7 * Step],
            LoopRate[PosST + 8 * Step],
            LoopRate[PosST + 9 * Step]
        );
    }
    Temp = j * 10;
    PosST = Temp * Step + Pos;
    sprintf_s(&buffer[strlen(buffer)], size - strlen(buffer), "%f", LoopRate[PosST]);
    Temp = Temp + 1;
    for (j = Temp; j < TOTLEN; j++) {
        PosST = j * Step + Pos;
        sprintf_s(&buffer[strlen(buffer)], size - strlen(buffer), ",%f", LoopRate[PosST]);
    }
    if (LineEnd) {
        strcat_s(buffer, size, "}\n");
    }
    else {
        strcat_s(buffer, size, "}\t");
    }
}

Here is the code for the speed test:

void ThreadOutP(LPVOID pM)
{
    char       *buffer = malloc(BUFFLEN), sql[SQLLEN];
    int         Status, ProdNo = (int)pM, i, j, ben;
    PGconn     *conn = NULL;
    PGresult   *res;
    clock_t     begin, end;

    fprintf_s(fpOutP, "PlanID %d Start inseting...\n", AllHeap[ProdNo].PlanID);
    begin = clock();
    DBConn(&conn, CONNSTR, fpOutP);

#pragma region General cashflow
    //============================== Data Query ==============================
    //strcpy_s(&sql[0], SQLLEN, "COPY AgentCF(PlanID,Agent,Senario,Prem,Comm,CommOR,CIRCFee,SaftyFund,FixExp,VarExp,Surr");
    //for (ben = 1; ben <= AllHeap[ProdNo].Heap.TotNo.NoBenft; ben++) {
    //  strcat_s(&sql[0], SQLLEN, ",Benefit_");
    //  _itoa_s(ben, &sql[strlen(sql)], sizeof(sql) - strlen(sql), 10);
    //}
    //strcat_s(&sql[0], SQLLEN, ") FROM STDIN;");
    //res = PQexec(conn, &sql[0]);
    //if (PQresultStatus(res) != PGRES_COPY_IN) {
    //  fprintf_s(fpOutP, "Not in COPY_IN mode\n");
    //}
    //PQclear(res);
    //============================== Data Apply ==============================
    for (i = 0; i < AllHeap[ProdNo].MaxAgntPos + AllHeap[ProdNo].Heap.TotNo.NoSensi; i++) {
        sprintf_s(buffer, BUFFLEN, "%d\t%d\t%d\t", AllHeap[ProdNo].PlanID, AllHeap[ProdNo].Heap.AgentRes[i].Agent, AllHeap[ProdNo].Heap.AgentRes[i].Sensi);
        //Status = PQputCopyData(conn, buffer, (int)strlen(buffer));
        //if (1 != Status) {
        //  fprintf_s(fpOutP, "PlanID %d inserting error for agent %d\n", AllHeap[ProdNo].PlanID, AllHeap[ProdNo].Heap.AgentRes[i].Agent);
        //}
        for (j = 0; j < 8 + AllHeap[ProdNo].Heap.TotNo.NoBenft; j++) {
            if (j == 7 + AllHeap[ProdNo].Heap.TotNo.NoBenft) {
                AsmbCF(buffer, BUFFLEN, ProdNo, i, j, 1);
            }
            else {
                AsmbCF(buffer, BUFFLEN, ProdNo, i, j, 0);
            }
            //Status = PQputCopyData(conn, buffer, (int)strlen(buffer));
            //if (1 != Status) {
            //  fprintf_s(fpOutP, "PlanID %d inserting error for agent %d\n", AllHeap[ProdNo].PlanID, AllHeap[ProdNo].Heap.AgentRes[i].Agent);
            //}
        }
    }
    //Status = PQputCopyEnd(conn, NULL);
#pragma endregion

#pragma region K cashflow

#pragma endregion

    PQfinish(conn);
    FreeProd(ProdNo);
    free(buffer);
    end = clock();
    fprintf_s(fpOutP, "PlanID %d inserted, total %d rows inserted, %d millisecond cost\n", AllHeap[ProdNo].PlanID, i, end - begin);
    AllHeap[ProdNo].Printed = 1;
}

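Note that I have disabled (commented out) the code that performs the insertion.

The test results are:

Assembling the strings alone costs 45930 milliseconds.
Assembling the strings and inserting costs 54829 milliseconds.

So most of the cost lies in converting double to char.

So I would like to ask whether there is a faster way to convert a sequence of doubles to a string, because compared with the cost of the computation, the bottleneck is actually the output of the results.

By the way, my platform is Windows 10, PostgreSQL 11, Visual Studio 2017.

Thanks a lot!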

【Question comments】:

You are spending a lot of time in strlen() and strcat() [en.wikipedia.org/wiki/…]. Did you know that snprintf() returns a usable value?

Tags: c postgresql hpc

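【Solution 1】:

There are in fact several faster ways to represent a floating-point number accurately as a string; one of them is Grisu, by Florian Loitsch.

This github repo compares several algorithms in C and C++ and includes source code for the Grisu2 method in C, which its author reports as 5.7 times faster than sprintf.

However, the author of that same repo (Milo Yip) provides his own single-header C++ implementation, which is reported to be 9.1 times faster, presumably because more of the functions are fully inlined. I believe porting this code to C should be trivial, since it does not use any special C++ syntax.

【Comments】:

【Solution 2】:

I did some bookkeeping on the original code: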

  Total score("function" calls):
    2 + 4*TOTLEN * strlen()
    1 + 2*TOTLEN * sprintf() 
    1 * strcat()

  Estimated string() cost:
    3 + 4* size * (TOTLEN*TOTLEN) / 2 (measured in characters)

  Estimated sprintf() cost:
    2 * TOTLEN (measured in %lf conversions)
    2 * size (measured in characters)

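Now, I don't know what TOTLEN is, but calling strlen() and friends on an ever-growing string leads to quadratic behavior, see https://en.wikipedia.org/wiki/Joel_Spolsky#Schlemiel_the_Painter.27s_algorithm

Profile/measure (or think) before you optimize
snprintf(), if used correctly, is overflow-safe; read the manual page and use the return value
the strxxx_x() functions are next to useless; they exist only to please the PHB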
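
The advice about the return value of snprintf() can be sketched as follows. This is only an illustration, not the code from the question: the helper name format_array, its parameters, and the stride-based indexing are assumptions made for the example. The point is that the write position is carried forward from each snprintf() return value, so the buffer is never rescanned with strlen().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original post): format n doubles,
 * spaced `stride` elements apart, as an array literal "{a,b,c}" followed
 * by `end` ('\t' or '\n'). Returns the number of characters written
 * (excluding the NUL), or -1 if the buffer is too small. */
static int format_array(char *buf, size_t size, const double *v,
                        size_t n, size_t stride, char end)
{
    size_t used = 0;

    assert(buf != NULL && v != NULL);
    if (size < 4)                       /* room for "{", "}", end, NUL */
        return -1;
    buf[used++] = '{';
    for (size_t j = 0; j < n; j++) {
        /* snprintf returns the length it wanted to write, so the next
         * write position is known without calling strlen(). */
        int w = snprintf(buf + used, size - used,
                         j ? ",%f" : "%f", v[j * stride]);
        if (w < 0 || (size_t)w >= size - used)
            return -1;                  /* error or truncated output */
        used += (size_t)w;
    }
    if (size - used < 3)                /* "}", end, NUL */
        return -1;
    buf[used++] = '}';
    buf[used++] = end;
    buf[used] = '\0';
    return (int)used;
}
```

Because the function returns the number of characters written, a caller could pass that length straight to PQputCopyData() instead of calling strlen() on the finished buffer.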

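【Comments】:

By the way, I underestimated the string cost (by a factor of 4). Can you find it?

【Solution 3】:

Fast method to convert a large amount of double to string

For an application that covers the full double range, use sprintf(buf, "%a", some_double). If decimal output is required, use "%e".

Any other code will be faster only if it somehow compromises accuracy or the allowable input range.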
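
To make the accuracy trade-off concrete, here is a minimal sketch that checks whether a given printf format reproduces a double exactly when the text is parsed back with strtod(). The helper name round_trips is hypothetical. It assumes IEEE 754 binary64 doubles and a C99 library whose strtod() accepts hexadecimal floating-point input; whether the database side accepts "%a" output is a separate question that this sketch does not address.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format x with fmt, parse the text back with
 * strtod(), and report whether the original value was recovered exactly. */
static int round_trips(const char *fmt, double x)
{
    char buf[64];
    int  w = snprintf(buf, sizeof buf, fmt, x);

    assert(w > 0 && (size_t)w < sizeof buf);   /* output must fit */
    return strtod(buf, NULL) == x;
}
```

Under those assumptions, "%a" and "%.16e" (17 significant digits) should round-trip any finite double, whereas the default "%f" and "%e" keep only 6 digits after the decimal point and generally do not.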

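The usual approach is to convert the double x to some wide, scaled integer and convert that to a string. This implies limits on x that OP has not yet clearly stated.

Even if some other approach appears faster, it may not remain faster as the code evolves or is ported.

What OP needs to post is speed test code that allows an objective performance assessment.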
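
A speed test of that kind could look roughly like the sketch below. The function name time_format and the synthetic input values are assumptions for illustration only; a real assessment would use the actual cash flow data and the real output path.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical harness: time `count` conversions of synthetic values with
 * the given printf format and return the elapsed CPU time in seconds.
 * `sink` accumulates the output lengths so the work cannot be optimized
 * away. */
static double time_format(const char *fmt, int count, unsigned long *sink)
{
    char    buf[512];
    clock_t begin;

    assert(fmt != NULL && sink != NULL);
    begin = clock();
    for (int i = 0; i < count; i++) {
        double x = 1000.0 + i * 0.123456789;   /* synthetic test value */
        int    w = snprintf(buf, sizeof buf, fmt, x);
        if (w > 0)
            *sink += (unsigned long)w;
    }
    return (double)(clock() - begin) / CLOCKS_PER_SEC;
}
```

Calling it with "%f", "%e" and "%a" over the same number of iterations gives comparable timings for each format on the target platform.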

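【Comments】:

Does "full double" mean a double without decimals?? I will try %e now, and I will also try writing code with itoa, since I am using an Intel CPU, which should have the fild instruction.
@Shore "Full double range" does not mean doubles without decimals. A typical double has about 2^64 different values. Fast output should work well for all of them (the full range), not just a select few. Otherwise, if fast output is needed only for a sub-range of double, state the required range and precision explicitly.
I probably only need 6-7 decimal places, but if I just use sprintf with %f, it only outputs 6 decimal places...
@Shore "if I just use sprintf with %f, it only outputs 6 decimal places..." --> which is often wasteful, uninformative, or misleading. Floating-point is called floating-point for a reason: the precision floats. For 6 digits of precision, 123456789012234567890.123456 is usually just as good as 1.23457e20, and 0.0000000000123456 is better as 1.23456e-11. Using "%f" gives the uninformative "123456789012234567890.123456" versus the tidy "1.23457e20", and "0.000000" versus "1.23456e-11". In other words, "%f" is either wasteful or uninformative.
@shore To get output that is faster than "%a" and still useful, you need to specify the limits of the output. "Only need 6-7 decimal places" is not enough for a good answer.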
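
The difference between the two formats discussed in these comments can be reproduced with a short sketch. The helper name both_formats is hypothetical, and the exact exponent layout (for example "e-11" versus "e-011") depends on the C library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical demo helper: format x with "%f" and with "%.5e" into two
 * caller-supplied buffers of `size` bytes each. */
static void both_formats(double x, char *fixed, char *sci, size_t size)
{
    assert(fixed != NULL && sci != NULL && size > 0);
    snprintf(fixed, size, "%f", x);    /* always 6 digits after the point */
    snprintf(sci, size, "%.5e", x);    /* always 6 significant digits */
}
```

For a tiny value such as 1.23456e-11, "%f" yields "0.000000" and every significant digit is lost, while "%.5e" keeps six of them. For a value around 1.2e20, "%f" emits 28 characters, most of which carry no information at 6-digit precision.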

【Solution 4】:

As an alternative to chux's answer, I wrote the following function:

__inline char* dbltoa(char* buff, double A, int Precision)
{
    int     Temp;
    char   *ptr;

    Temp = (int)A;
    _itoa_s(Temp, buff, 50, 10);
    ptr = buff + strlen(buff);
    ptr[0] = '.';
    Temp = (int)((A - Temp) * pow(10, Precision));
    _itoa_s(Temp, ptr + 1, 50, 10);
    return ptr + strlen(ptr);
}
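
One caveat with the function above, as far as can be told from reading it: the fractional part is printed without leading zeros, so a value such as 1.05 with a precision of 8 would come out as "1.5000000", and a negative input would produce a second minus sign after the decimal point. The sketch below shows one way to handle both cases with a zero-padded, scaled 64-bit integer. It is not the poster's code; the name fixed_to_str and its limits (precision 0 to 9, |value| below 9e9, buffer of at least 32 bytes) are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write `value` with exactly `precision` decimals
 * into buf and return a pointer to the terminating NUL.
 * Assumptions: 0 <= precision <= 9, |value| < 9e9 (so the scaled value
 * fits in an int64_t), and buf holds at least 32 bytes. */
static char *fixed_to_str(char *buf, double value, int precision)
{
    static const int64_t POW10[10] = {
        1, 10, 100, 1000, 10000, 100000,
        1000000, 10000000, 100000000, 1000000000
    };
    char    tmp[24];
    char   *p = buf;
    int     n = 0;
    int     neg = value < 0.0;
    double  mag = neg ? -value : value;
    int64_t scale, scaled, ip, fp;

    assert(precision >= 0 && precision <= 9 && mag < 9.0e9);
    scale  = POW10[precision];
    scaled = (int64_t)(mag * (double)scale + 0.5);  /* round half up */
    ip = scaled / scale;                            /* integer part */
    fp = scaled % scale;                            /* fractional digits */

    if (neg && scaled != 0)
        *p++ = '-';                                 /* sign written once */
    do {                                            /* digits, reversed */
        tmp[n++] = (char)('0' + ip % 10);
        ip /= 10;
    } while (ip != 0);
    while (n > 0)
        *p++ = tmp[--n];
    if (precision > 0) {
        *p++ = '.';
        for (int k = precision - 1; k >= 0; k--) {  /* zero-padded */
            p[k] = (char)('0' + fp % 10);
            fp /= 10;
        }
        p += precision;
    }
    *p = '\0';
    return p;
}
```

Like dbltoa, it returns the position of the terminating NUL, so a caller can keep appending without calling strlen() on the whole buffer.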

And I updated the function that generates the CashFlow string:

void AsmbCF(char *buffer, int size, int ProdNo, int i, int Pos, int LineEnd)
{
    int     j, Step = sizeof(nodecf) / sizeof(double), PosST, Temp;
    double *LoopRate = &AllHeap[ProdNo].Heap.AgentRes[i].CF.NodeCF[0].Prem;
    char   *ptr;
    strcpy_s(buffer, size, "{");
    ptr = buffer + 1;
    for (j = 0; j < TOTLEN; j++) {
        PosST = j * Step + Pos;
        ptr = dbltoa(ptr, LoopRate[PosST], 8);
        ptr[0] = ',';
        ptr++;
    }
    ptr[-1] = 0;
    if (LineEnd) {
        strcat_s(buffer, size, "}\n");
    }
    else {
        strcat_s(buffer, size, "}\t");
    }
}

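The test result without insertion is 4558 milliseconds, and with insertion it takes 29260 milliseconds (perhaps the database running in parallel makes the ratio disproportionate).

【Comments】:

This "answer" performs only TOTLEN conversions, whereas the original performed twice as many. So you can multiply this result by at least 2. (Also: I expect a solution based on snprintf() would not be much slower.)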