Tesseract OCR (C++) Cannot Evaluate Output String

【Question Title】: Tesseract OCR (C++) Cannot Evaluate Output String
【Posted】: 2021-05-21 17:54:03
【Question】:

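I'm trying to extract the output string from an OpenCV matrix window and evaluate it, but it seems to return something like "someString\n" instead of "someString". This makes the comparison hard, given that there is some unknown (x) amount of whitespace.

I have tried:

Creating a char array that omits the spaces (I know I'm only evaluating 5 indices)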

    std::string redef;
    char charArr[100] = {NULL};
    strcpy_s(charArr, str.c_str());

    for (int i = 0; i < 5; i++)
    {
        if (charArr[i] != ' ')
        {
            redef += charArr[i];
        }
    }
    std::cout << "analyseAction ran:" << redef << "white-space?";

But the string comes back looking like

analyseAction ran:redefString
white-space?

The relevant code running in the main function:

    api->Recognize(0);
    outText = api->GetUTF8Text();
    analyseAction(outText);

Below, note that the else branch runs, because redef does not equal "long" even when long is visibly displayed in the window.

void analyseAction(std::string str)
{
    std::string redef;
    char charArr[100] = {NULL};
    strcpy_s(charArr, str.c_str());

    for (int i = 0; i < 5; i++)
    {
        if (charArr[i] != ' ')
        {
            redef += charArr[i];
        }
    }
    std::cout << "analyseAction ran:" << redef << "white-space?";

// a lot of code omitted here; showing only what is relevant

    if (redef == "long") // check whether there is whitespace after "long"; it looks like a newline
    {
        //NOTE FOR FUTURE: Stop being lazy and make this a function of its own
        //BUY
        std::cout << "Long ran";
        for (int i = 0; i < a; i++) //no comma with first line so 0 element 
        {
            context += inData[i];
        }
        x = std::stoi(context);

        for (int i = a+1; i < a1; i++) 
        {
            context += inData[i]; 
        }
        y = std::stoi(context);

        simClick(x,y);
        //BUY CONFIRM
        for (int i = a1+1; i < b; i++) //starting from pipeline??
        {
            context += inData[i];
        }
        x = std::stoi(context);

        for (int i = b+1; i < b1; i++) //starting with comma? +1 to fix
        {
            context += inData[i];
        }
        y = std::stoi(context);

        simClick(x, y);

    }
    else
    {
        std::cout << "long does not match";
    }
}

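I'm confused: why does the string come out with a newline? How can I evaluate the output successfully? I'm a C++ noob, so any help would be greatly appreciated.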
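A note on the attempt above: the loop only skips the space character ' ', so a '\n' in the first five characters is copied into redef, and "long\n" never compares equal to "long". Text returned by OCR commonly ends each recognized line with a newline, so one possible approach is to trim all leading and trailing whitespace before comparing. The sketch below is a minimal illustration, not code from the thread: `trim` is a hypothetical helper written for this example and is not part of the Tesseract API.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Tesseract): returns a copy of `s`
// without leading or trailing whitespace. std::isspace treats ' ', '\t',
// '\n', '\r', '\v' and '\f' as whitespace in the default "C" locale.
std::string trim(const std::string& s)
{
    std::size_t begin = 0;
    std::size_t end = s.size();

    // Skip whitespace at the front.
    while (begin < end && std::isspace(static_cast<unsigned char>(s[begin])))
    {
        ++begin;
    }
    // Skip whitespace at the back.
    while (end > begin && std::isspace(static_cast<unsigned char>(s[end - 1])))
    {
        --end;
    }
    return s.substr(begin, end - begin);
}

// Usage inside analyseAction (assuming `str` holds the OCR text):
//     std::string redef = trim(str);
//     if (redef == "long") { /* ... */ }
```

Trimming the whole string, rather than copying a fixed five characters, also avoids depending on how long the recognized word is.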

【Comments】:

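To be clear... you are evaluating a statement based on visual input of certain words recognized by OCR?
Yes. But the string returned by the Tesseract API seems to contain something like "\n".
Have you considered evaluating the first character, or the first few characters, instead of the whole string variable as an alternative?
How would I do that?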

Tags: c++ string if-statement newline tesseract

【Solution 1】:

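As for why it returns the string together with a newline, I can't answer that. But I can offer an alternative way to do what you are trying to accomplish. Remove the first for loop in the analyseAction function, and in the if statement that checks for "long" use something like if(charArr[0] == 'l') {//do stuff}. This has a limitation if you are evaluating many words that start with 'l'; in that case, compare the first two or three letters of the word instead, as long as none of the words is shorter than that. PS. This was written on a phone.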
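The prefix comparison suggested in this answer can be sketched as follows. This is only an illustration under stated assumptions: `startsWith` is a hypothetical helper name introduced here, and it uses `std::string::compare` from the standard library rather than indexing a char array by hand.

```cpp
#include <string>

// Hypothetical helper: returns true when `text` begins with `prefix`.
// The size check guards against input shorter than the prefix.
bool startsWith(const std::string& text, const std::string& prefix)
{
    return text.size() >= prefix.size() &&
           text.compare(0, prefix.size(), prefix) == 0;
}

// Usage inside analyseAction (assuming `str` holds the OCR text):
//     if (startsWith(str, "long")) { /* ... */ }
```

Because only the leading characters are compared, a trailing "\n" no longer matters. The trade-off is the one the answer mentions: any other word with the same prefix, such as "longer", would match too.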
