Unable to open Cube language model params for Hindi language in Tesseract

【Question title】: Unable to open Cube language model params for Hindi language in Tesseract
【Posted】: 2016-05-25 23:47:45
【Question description】:

Tesseract cannot read the Cube language model. I ran this command:

tesseract 1.png output.txt -l hin

It fails with the following error:

Cube ERROR (CubeRecoContext::Load): unable to read cube language model params from /usr/share/tesseract-ocr/tessdata/hin.cube.lm
Cube ERROR (CubeRecoContext::Create): unable to init CubeRecoContext object
init_cube_objects(false, &tessdata_manager):Error:Assert failed:in file tessedit.cpp, line 207
Segmentation fault

Where can I get the hin.cube.lm file, and what do I need to do with it?

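【Question discussion】:

I ran into the same problem. Did you find a solution? I just copied the hin files from the tessdata repository on GitHub into my local folder.
@mridul I did the same thing: I copied the files into my local folder, and it works.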

Tags: ocr tesseract hindi language-model

【Solution 1】:

I fixed this error by installing the correct versions of the following files:

hin.cube.bigrams
hin.cube.fold
hin.cube.lm
hin.cube.nn
hin.cube.params
hin.cube.word-freq
hin.tesseract_cube.nn

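I also installed the matching versions of the Hindi and English trained data.

All of the files above are available at: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-304305

I put these files in: /usr/local/share/tessdata

This was on CentOS 7.2.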
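
To confirm that nothing from the list above is missing before re-running tesseract, a small check along these lines may help. This is only a sketch: the function name missing_cube_files is hypothetical, the suffixes are copied from the file list in this answer, and hin.traineddata is assumed to be the file name of the Hindi trained data. The directory is the one used in this answer, while the error in the question points at /usr/share/tesseract-ocr/tessdata, so pass whichever path your installation actually reads.

```python
from pathlib import Path

# Suffixes copied from the file list in the answer above.
CUBE_SUFFIXES = (
    "cube.bigrams",
    "cube.fold",
    "cube.lm",
    "cube.nn",
    "cube.params",
    "cube.word-freq",
    "tesseract_cube.nn",
)


def missing_cube_files(tessdata_dir, lang="hin"):
    """Return the expected file names that are absent from tessdata_dir."""
    # Assumption: the trained data file is named <lang>.traineddata.
    expected = [f"{lang}.traineddata"] + [f"{lang}.{s}" for s in CUBE_SUFFIXES]
    root = Path(tessdata_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Directory taken from this answer; adjust it for your own system.
    for name in missing_cube_files("/usr/local/share/tessdata"):
        print("missing:", name)
```

If the script prints nothing, every file named in this answer is present in that directory. It only checks that the files exist, not that their versions match the installed tesseract.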

【Discussion】:

Where did you find these files? I only managed to find the trained data on GitHub :(
github.com/tesseract-ocr/tesseract/wiki/…
Can you tell me where to find the Cube data for other languages, such as Japanese? I found it, thanks for your reply.