lzma totalread is greater than header uncompressed size by 1 (1 answer)

[Question title]: lzma totalread is greater than header uncompressed size by 1
[Posted]: 2020-10-14 21:28:46
[Question description]:

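I am trying to decompress files with LZMA using the easylzma library. Some files decompress fine, but others, seemingly at random, fail.
After some debugging I found that the total read count is 1 greater than the header's uncompressedSize, and that the header's "is streamed" field is 0.
The code says there is no footer, but when I subtract 1 from the total read to skip the error, the file decompresses correctly, except that one extra line appears at the end of the output in which several fields are 0 and a single field has a value. The files are .bi5 files from Dukascopy.
I want to determine whether the error comes from faulty logic in the libraries I use or from a bad file, and what to do in either case.
The libraries used are easylzma-master and dukascopy-master from GitHub, and the files are downloaded from the Dukascopy server.
The files 13h_ticks.bi5 and 21_ticks.bi5 for 30 September 2020 ("September is 8") show exactly this problem.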
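
To check the header value described above independently of any library, the header can be read by hand. This is a minimal sketch that assumes the .bi5 files are plain LZMA-alone (".lzma") streams, which this post does not confirm. That container starts with 13 bytes: one properties byte, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size, where all 0xFF bytes means the size is unknown. The names `LzmaAloneHeader` and `parse_lzma_alone_header` are hypothetical and are not part of easylzma.

```cpp
#include <cstddef>
#include <cstdint>

// Fields of the 13-byte header of the legacy LZMA-alone container.
struct LzmaAloneHeader {
    std::uint8_t  props;              // packed lc/lp/pb parameters
    std::uint32_t dict_size;          // stored little-endian in the file
    std::uint64_t uncompressed_size;  // stored little-endian in the file
    bool          size_known;         // false when all 8 size bytes are 0xFF
};

// Hypothetical helper: fills *out and returns true, or returns false
// if the arguments are null or the buffer is shorter than 13 bytes.
inline bool parse_lzma_alone_header(const unsigned char* buf, std::size_t len,
                                    LzmaAloneHeader* out) {
    if (buf == nullptr || out == nullptr || len < 13) return false;

    out->props = buf[0];

    out->dict_size = 0;
    for (int i = 0; i < 4; ++i)
        out->dict_size |= static_cast<std::uint32_t>(buf[1 + i]) << (8 * i);

    out->uncompressed_size = 0;
    for (int i = 0; i < 8; ++i)
        out->uncompressed_size |= static_cast<std::uint64_t>(buf[5 + i]) << (8 * i);

    out->size_known = (out->uncompressed_size != UINT64_MAX);
    return true;
}
```

Calling this on the first 13 bytes of a failing file and logging `uncompressed_size` next to the decoder's total read count would show whether the two really differ by exactly 1.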

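Update:
I did not post code because for now I am only asking for guidance. The code exists and it shows the problem, but it is library code. So I would like to know whether anyone has had the same problem with this specific kind of file (Dukascopy bi5) and this LZMA library. For now I am only looking for a general rule: "In LZMA decompression, when do we get the behavior where total read is repeatedly greater than the header's uncompressed size by 1? Does it mean there is a footer that is not mentioned in the header bytes?"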
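
On the general rule: in the LZMA-alone format, when the header carries a known uncompressed size, the decoded payload is that many bytes, and an end-of-stream marker is optional. One hedged way to handle a decoder that reports more bytes than the header promises is to trust the header when it is known, rather than subtracting 1 by hand. The sketch below only illustrates that idea; `effective_size` is a hypothetical name, and clamping hides the symptom without explaining why the decoder over-reports.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: choose how many decoded bytes to treat as payload.
// header_size / size_known come from the stream header; total_read is the
// byte count reported by the decoder.
inline std::size_t effective_size(std::uint64_t header_size, bool size_known,
                                  std::size_t total_read) {
    // Header known and decoder produced more than promised: keep only
    // the promised bytes and ignore the surplus.
    if (size_known && header_size < total_read)
        return static_cast<std::size_t>(header_size);
    // Size unknown, or decoder produced no more than promised:
    // use what was actually read.
    return total_read;
}
```

With a header size of 40 and a total read of 41, this returns 40, which is the same result as the manual "subtract 1" workaround but without hard-coding the difference.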

Update:
This is how I open the file:

int HTTPRequest::read_bi5_main(boost::filesystem::path p, ptime epoch)
{
    boost::unique_lock<boost::mutex> read_bi5_to_bin_lock(mBOOST_LOGMutex,boost::defer_lock);
    boost::unique_lock<boost::mutex> read_bi5_to_bin_lock2(m_read_bi5_to_binMutex, boost::defer_lock);

    unsigned char *buffer;
    size_t buffer_size;

    int counter;

    size_t raw_size = 0;

    std::string filename_string = p.generic_string();
    path p2 = p;
    p2.replace_extension(".bin");
    std::string filename_string_to_bin =p2.generic_string() ;

    path p3 = p;
    p3.replace_extension(".csv");
    std::string filename_string_to_csv = p3.generic_string();

    const char *filename = filename_string.c_str();
    const char *filename_to_bin = filename_string_to_bin.c_str();
    const char *filename_to_csv = filename_string_to_csv.c_str();

    //22-9-2020 here I open the downloaded file if possible
    if (fs::exists(p) && fs::is_regular(p))
    {
        buffer_size = fs::file_size(p);
        buffer = new unsigned char[buffer_size];
    }
    else {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Error: couldn't access the data file. |"
            << filename << "|" << std::endl;
        read_bi5_to_bin_lock.unlock();
        return 2;
    }

    //22-9-2020 here I read the downloaded file into filestream
    std::ifstream fin(filename, std::ifstream::binary);
    fin.read(reinterpret_cast<char*>(buffer), buffer_size);
    fin.close();

    //22-9-2020 here I check if file is related to japanese yen so that I determine how to write its value
    /*
    if symbols_xxx has mHTTPRequest_Symbol_str then PV=0.001
    else if symbols_xxxx has mHTTPRequest_Symbol_str then PV=0.0001
    else if symbols_xxxx has mHTTPRequest_Symbol_str then PV=0.00001
    */
    //28-9-2020 I will make 3 vectors in utils.h for 3,4,5 point value ,then I find symbol in vector,
    //std::size_t pos = mHTTPRequest_Symbol_str.find("JPY");

    double PV;

    std::vector<std::string>::iterator it3 = std::find(point_value_xxx.begin(), point_value_xxx.end(), mHTTPRequest_Symbol_str);

    std::vector<std::string>::iterator it4 = std::find(point_value_xxxx.begin(), point_value_xxxx.end(), mHTTPRequest_Symbol_str);

    std::vector<std::string>::iterator it5 = std::find(point_value_xxxxx.begin(), point_value_xxxxx.end(), mHTTPRequest_Symbol_str);
    if (it3 != point_value_xxx.end())
    {
        PV = 0.001;
    }
    else if (it4 != point_value_xxxx.end())
    {
        PV = 0.0001;
    }
    else if (it5 != point_value_xxxxx.end())
    {
        PV = 0.00001;
    }
    else
    {
        //10-1-2020throw;
        PV = 0.001;

    }
    read_bi5_to_bin_lock2.lock();
    unsigned char *data_bin_buffer = 0 ;
    n47::tick_data *data = n47::read_bi5_to_bin(
            buffer, buffer_size, epoch, PV, &raw_size, &data_bin_buffer);

    //5-11-2020 here i will save binary file
    std::string file_name_path_string=output_compressed_file_2(&data_bin_buffer, raw_size, filename_to_bin);
    read_bi5_to_bin_lock2.unlock();

    path file_name_path_2{ file_name_path_string };
    buffer_size = 0;
    if (fs::exists(file_name_path_2) && fs::is_regular(file_name_path_2))
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << boost::this_thread::get_id() <<"\t we can access the data .bin file. |"
            << filename_to_bin << "| with size ="<< fs::file_size(file_name_path_2) << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    else {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Error: couldn't access the data .bin file. |"
            << filename_to_bin << "|" << std::endl;
        read_bi5_to_bin_lock.unlock();
        return 2;
    }

    n47::tick_data_iterator iter;

    //5-11-2020 here i will save file.csv from data which is pointer to vector to pointers to ticks
    if (data == 0)
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Failure: Failed to load the data!" << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    //5-15-2020 take care that without else ,error happens with empty files because data is pointer to vector of pointers to ticks .so when data is made inside read_bi5 ,it is made as null pointer and later it is assigned to vector if file has ticks.if file does not have ticks ,then it is just returned as null pointer .so when dereferencing null pointer we got error
    else if (data->size() != (raw_size / n47::ROW_SIZE))
    {
        read_bi5_to_bin_lock.lock();
        BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "Failure: Loaded " << data->size()
            << " ticks but file size indicates we should have loaded "
            << (raw_size / n47::ROW_SIZE) << std::endl;
        read_bi5_to_bin_lock.unlock();
    }
    //22-9-2020 in last if and if else I checked if file is either empty or has error of data size So now I have good clean file to work with
    //read_bi5_to_bin_lock.lock();
    //BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << "time, bid, bid_vol, ask, ask_vol" << std::endl;
    //read_bi5_to_bin_lock.unlock();

    counter = 0;

    std::ofstream out_csv(filename_string_to_csv);
    if (data == 0)
    {

    }
    else if (data != 0)
    {
        for (iter = data->begin(); iter != data->end(); iter++) {
            //5-11-2020 here i will save file.csv from data which is pointer to vector to pointers to ticks>>>>>>>here i should open file stream for output and save data to it
            out_csv
            //<< std::setfill('0')<<std::setw(sizeof((*iter)->epoch + (*iter)->td))<<std::fixed<<((*iter)->epoch + (*iter)->td) << ","
            //<< std::setfill('0')<<std::setw(27)<<std::fixed<<((*iter)->epoch + (*iter)->td) << ","
            << std::setfill('0')<<((*iter)->epoch + (*iter)->td) << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->bid)<<std::fixed << (*iter)->bid << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->bidv)<<std::fixed << (*iter)->bidv << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->ask)<<std::fixed << (*iter)->ask << ","
            << std::setfill('0')<<std::setw(sizeof(*iter)->askv)<<std::fixed << (*iter)->askv << std::endl;
            //??5-17-2020 isolate multithreaded error
            /*
            read_bi5_to_bin_lock.lock();
            BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) <<
                boost::this_thread::get_id() << "\t"<<((*iter)->epoch + (*iter)->td) << ", "
                << (*iter)->bid << ", " << (*iter)->bidv << ", "
                << (*iter)->ask << ", " << (*iter)->askv << std::endl;
            BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) <<
                            boost::this_thread::get_id() << "\t"<< std::setfill('0')<< std::setw(sizeof((*iter)->epoch + (*iter)->td))<<((*iter)->epoch + (*iter)->td) << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->bid)<< (*iter)->bid << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->bidv)<< (*iter)->bidv << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->ask)<< (*iter)->ask << ","
                            << std::setfill('0')<<std::setw(sizeof(*iter)->askv)<< (*iter)->askv << std::endl;
            read_bi5_to_bin_lock.unlock();
            */
            counter++;
        }
        ////read_bi5_to_bin_lock.unlock();

    }
    out_csv.close();
    //5-13-2020

    //??5-17-2020 isolate multithreaded error
    read_bi5_to_bin_lock.lock();

    BOOST_LOG((*mHTTPRequest_LoggingInstance_shared_pointer).mloggerCoutLog) << ".end." << std::endl << std::endl
        << "From " << raw_size << " bytes we read " << counter
        << " records." << std::endl
        << raw_size << " / " << n47::ROW_SIZE << " = "
        << (raw_size / n47::ROW_SIZE) << std::endl;
    read_bi5_to_bin_lock.unlock();


    delete data;
    delete[] buffer;
    delete [] data_bin_buffer;
    return 0;
}

This is my modified dukascopy file:

//#include "stdafx.h"

/*
Copyright 2013 Michael O'Keeffe (a.k.a. ninety47).

This file is part of ninety47 Dukascopy toolbox.

The "ninety47 Dukascopy toolbox" is free software: you can redistribute it
and/or modify it under the terms of the GNU General Public License as
published by the Free Software Foundation, either version 3 of the License,
or any later version.

"ninety47 Dukascopy toolbox" is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
Public License for more details.

You should have received a copy of the GNU General Public License along with
"ninety47 Dukascopy toolbox".  If not, see <http://www.gnu.org/licenses/>.
*/

#include "ninety47/dukascopy.h"
#include <boost/date_time/posix_time/posix_time.hpp>
#include <algorithm>
#include <vector>
#include "ninety47/dukascopy/defs.h"
#include "ninety47/dukascopy/io.hpp"
#include "ninety47/dukascopy/lzma.h"



namespace n47 {

namespace pt = boost::posix_time;


tick *tickFromBuffer(
        unsigned char *buffer, pt::ptime epoch, float digits, size_t offset) {
    bytesTo<unsigned int, n47::BigEndian> bytesTo_unsigned;
    bytesTo<float, n47::BigEndian> bytesTo_float;

    unsigned int ts = bytesTo_unsigned(buffer + offset);
    pt::time_duration ms = pt::millisec(ts);
    unsigned int ofs = offset + sizeof(ts);
    float ask = bytesTo_unsigned(buffer + ofs) * digits;
    ofs += sizeof(ts);
    float bid = bytesTo_unsigned(buffer + ofs) * digits;
    ofs += sizeof(ts);
    //28-9-2020 convert volume to million
    float askv = bytesTo_float(buffer + ofs) *1000000;
    ofs += sizeof(ts);
    float bidv = bytesTo_float(buffer + ofs) *1000000;

    return new tick(epoch, ms, ask, bid, askv, bidv);
}


tick_data* read_bin(
        unsigned char *buffer, size_t buffer_size, pt::ptime epoch, float point_value) {
    std::vector<tick*> *data = new std::vector<tick*>();
    std::vector<tick*>::iterator iter;

    std::size_t offset = 0;

    while ( offset < buffer_size ) {
        data->push_back(tickFromBuffer(buffer, epoch, point_value, offset));
        offset += ROW_SIZE;
    }

    return data;
}


tick_data* read_bi5(
        unsigned char *lzma_buffer, size_t lzma_buffer_size, pt::ptime epoch,
        float point_value, size_t *bytes_read) {
    tick_data *result = 0;

    // decompress
    int status;
    unsigned char *buffer = n47::lzma::decompress(lzma_buffer,
            lzma_buffer_size, &status, bytes_read);

    //5-11-2020 here i will save binary file


    if (status != N47_E_OK) {
        *bytes_read = 0;  // zero the caller's count (the original assigned to the local pointer)
    } else {
        // convert to tick data (with read_bin).
        result = read_bin(buffer, *bytes_read, epoch, point_value);
        delete [] buffer;
    }

    return result;
}

//5-11-2020
tick_data* read_bi5_to_bin(
    unsigned char *lzma_buffer, size_t lzma_buffer_size, pt::ptime epoch,
    float point_value, size_t *bytes_read, unsigned char** buffer_decompressed) {
    tick_data *result = 0;

    // decompress
    int status;
    *buffer_decompressed = n47::lzma::decompress(lzma_buffer,
        lzma_buffer_size, &status, bytes_read);

    if (status != N47_E_OK) 
    {
        *bytes_read = 0;  // zero the caller's count (the original assigned to the local pointer)
    }
    else {
        // convert to tick data (with read_bin).
        result = read_bin(*buffer_decompressed, *bytes_read, epoch, point_value);
        //delete[] buffer;
    }

    return result;
}


tick_data* read(
        const char *filename, pt::ptime epoch, float point_value, size_t *bytes_read) {
    tick_data *result = 0;
    size_t buffer_size = 0;
    unsigned char *buffer = n47::io::loadToBuffer<unsigned char>(filename, &buffer_size);

    if ( buffer != 0 ) {
        if ( n47::lzma::bufferIsLZMA(buffer, buffer_size) ) {
            result = read_bi5(buffer, buffer_size, epoch, point_value, bytes_read);
            // Reading in as bi5 failed lets double check its not binary
            // data in the buffer.
            if (result == 0) {
                result = read_bin(buffer, buffer_size, epoch, point_value);
            }
        } else {
            result = read_bin(buffer, buffer_size, epoch, point_value);
            *bytes_read = buffer_size;
        }
        delete [] buffer;

        if (result != 0 && result->size() != (*bytes_read / n47::ROW_SIZE)) {
            delete result;
            result = 0;
        }
    }
    return result;
}

}  // namespace n47
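
One thing worth noting about `read_bin` above: its loop runs `while (offset < buffer_size)`, so if the decompressed size is one byte more than a multiple of the row size, it parses one extra row from a single valid byte plus whatever follows in memory. That might contribute to the stray last line with mostly zero fields, though this is not confirmed. A minimal sketch of a check that could run before parsing follows; `kRowSize` is assumed to be 20 bytes (five 4-byte fields, as `tickFromBuffer` implies), and `RowLayout` and `split_rows` are hypothetical names, not part of the library.

```cpp
#include <cstddef>

// Assumption: one tick row is 5 fields x 4 bytes, matching tickFromBuffer.
constexpr std::size_t kRowSize = 20;

struct RowLayout {
    std::size_t full_rows;       // rows that are completely inside the buffer
    std::size_t leftover_bytes;  // trailing bytes that do not form a full row
};

// Hypothetical helper: split a decompressed size into whole rows and a remainder.
inline RowLayout split_rows(std::size_t buffer_size) {
    return RowLayout{buffer_size / kRowSize, buffer_size % kRowSize};
}
```

If `leftover_bytes` is non-zero, the caller could log it and parse only `full_rows` rows instead of looping until `offset` reaches `buffer_size`.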

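[Comments on the question]:

Please take some time to refresh the help pages, take the SO tour, read How to Ask, as well as this question checklist. Please also don't forget how to create a minimal reproducible example of your own code, and how to edit your question to improve it.
I did not post code because for now I am only asking for guidance. The code exists and it shows the problem, but it is library code. So I would like to know whether anyone has had the same problem with this specific kind of file (Dukascopy bi5) and this LZMA library. For now I am only looking for a general rule: "When in LZMA decompression do we get the behavior where total read is repeatedly greater than the header's uncompressed size by 1? Does it mean there is a footer that is not mentioned in the header bytes?"
So you are sure the problem is in the library itself, and not caused by a bug or mistake in your code? Are you really one hundred percent sure? Then still create a minimal reproducible example and use it in a bug report to the library maintainers.
Raise an issue on the library's page.
My own code is small; it just downloads the files. Either the compressed file has an error that shows up in the decompression library as an extra line, or something in the library adds a single line to some files, and only a single line. So I am not sure whether it is the library or the file. Other files decompress correctly.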

Tags: c++ lzma

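[Solution 1]:

I dug into the details of how LZMA works, which was heavy going for me, so I changed libraries and used the LZMA specification .cpp file from 7-Zip.
It works.
I think the problem is related to using C code in a C++ program.
The library also states that it has been tested on BSD. Thanks for your help. For anyone with the same case: download 7-Zip and use the cpp file.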

[Comments]: