Decoding / Encoding a Text File Using the Stack Library - Can't Encode Large Files (C++)

【Question title】: Decoding / Encoding Text File using Stack Library - Can't Encode Large Files C++
【Posted】: 2014-04-23 00:07:53
【Question】:

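I'm working on a C++ program that encodes and then decodes text, using the stack library. It first asks for a cypher key, which you type in manually, and then for a file name, which is a text file. If it is a normal txt file, it encodes the message into a new file and adds a .iia file extension. If the file already has the .iia extension, it decodes the message, provided the cypher key is the same one that was used to encode it.

My program does encode and decode, but how many characters it decodes is determined by temp.size() % cypher.length(), which is in the while loop of the readFileEncode() function. I think this is what prevents the whole file from being encoded and then decoded correctly. In other words, the final file, after decoding from, say, "example.txt.iia" back to "example.txt", is missing most of the text from the original "example.txt". I tried using just cypher.length(), but of course that doesn't encode or decode anything. The whole process is determined by that argument.

I can't seem to find the exact logic to encode and decode all the characters in a file of any size. Here is the code for the function that does the decoding and encoding:

EDIT: Using the code WhozCraig edited for me: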

void readFileEncode(string fileName, stack<char> &text, string cypher)
{
    std::ifstream file(fileName, std::ios::in|std::ios::binary);
    stack<char> temp;
    char ch;

    while (file.get(ch))
        temp.push(ch ^ cypher[temp.size() % cypher.length()]);

    while (!temp.empty())
    {
        text.push(temp.top());
        temp.pop();
    }
}

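EDIT: A stack is required. I'm going to implement my own stack class, but I'm trying to get it working with the stack library first. Also, if there is a better way to implement this, please let me know. Otherwise, I believe there isn't much wrong with it apart from getting the loop to encode and decode the entire file. I'm just not sure why it stops at, say, 20 characters sometimes, or 10 characters. I know it has to do with the length of the cypher, so I believe the problem is in the % (mod). I just don't know how to rewrite it.

EDIT: OK, I tried WhozCraig's solution and I don't get the desired output, so the error must now be in my main. Here is my main code: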

#include <iostream> 
#include <iomanip> 
#include <fstream>
#include <string> 
#include <cstdlib>
#include <cctype>
#include <stack>


using namespace std;

void readFileEncode(string fileName, stack<char> &text, string cypher);

int main()
{
    stack<char> text;   // allows me to use stack from standard library
    string cypher;
    string inputFileName;
    string outputFileName;
    int position;

    cout << "Enter a cypher code" << endl;
    cin >> cypher;
    cout << "Enter the name of the input file" << endl;
    cin >> inputFileName;

    position = inputFileName.find(".iia");//checks to see if the input file has the iia extension

    if (position > 1){
        outputFileName = inputFileName;
        outputFileName.erase(position, position + 3);// if input file has the .iia extension it is erased 
    }
    else
        //outputFileName.erase(position, position + 3);// remove the .txt extension and
        outputFileName = inputFileName + ".iia";// add the .iia extension to file if it does not have it

    cout << "Here is the new name of the inputfile " << outputFileName << endl; // shows you that it did actually put the .iia on or erase it depending on the situation

    system("pause");

    readFileEncode(inputFileName, text, cypher); //calls function            

    std::ofstream file(outputFileName); // calling function

    while (text.size()){// goes through text file
        file << text.top();
        text.pop(); //clears pop
    }

    system("pause");
}

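Basically, I read a .txt file to encrypt it and put the .iia file extension on the file name. Then I go back and enter the file with the .iia extension to decode it. When I decode it back, it is gibberish after about the first ten words.

@WhozCraig Does it matter what whitespace, newlines, or punctuation the file contains? Maybe with the full solution here you can point me to what is wrong.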

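【Comments】:

I know there is a reason you're not using a queue or deque for this. I just know it... why still eludes me.
A stack is required. I'm using the library now, but I may end up building my own stack class.
Do you have to use a stack for both, or is only the reference parameter required? Not that it will necessarily kill your performance (which can be improved considerably; I'll post an answer if I get the chance).
@WhozCraig Both? I'm not sure what you mean. Are you talking about only storing pointers on the stack? I'm actually trying to take user2445771's advice and use a string for now. I will update that code when I can. I'm doing what I can to turn this around.

Tags: c++ file-io encoding stack decoding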

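【Answer 1】:

If I understand what you're trying to do correctly, you want the whole file XORed against the characters of the key, rotating through the key. If that is the case, you can probably fix your immediate error simply by doing the following: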

void readFileEncode(string fileName, stack<char> &text, string cypher)
{
    std::ifstream file(fileName, std::ios::in|std::ios::binary);
    stack<char> temp;
    char ch;

    while (file.get(ch))
        temp.push(ch ^ cypher[temp.size() % cypher.length()]);

    while (!temp.empty())
    {
        text.push(temp.top());
        temp.pop();
    }
}

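The most notable changes are:

Opening the file in binary mode by using std::ios::in|std::ios::binary as the open mode. This eliminates the need to invoke the noskipws manipulator (usually a function call) for every character extracted.
Using file.get(ch) to extract the next character. The member pulls the next character directly from the file buffer if one is available; otherwise it loads the next buffer and tries again.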
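
To illustrate why the extraction method matters, here is a minimal sketch that compares formatted extraction (operator>>) with file.get(ch) on an in-memory stream. The function names countWithExtraction and countWithGet are hypothetical and exist only for this illustration; a std::istringstream stands in for the file.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Counts characters extracted with operator>>, which skips whitespace
// by default (spaces and newlines are silently dropped).
std::size_t countWithExtraction(const std::string& data)
{
    std::istringstream in(data);
    char ch;
    std::size_t n = 0;
    while (in >> ch)
        ++n;
    return n;
}

// Counts characters extracted with get(), an unformatted read that
// returns every character, whitespace included.
std::size_t countWithGet(const std::string& data)
{
    std::istringstream in(data);
    char ch;
    std::size_t n = 0;
    while (in.get(ch))
        ++n;
    return n;
}
```

For the five-character input "a b\nc", countWithExtraction sees three characters while countWithGet sees all five. Any whitespace dropped on input can never be restored by decoding, which is one reason to prefer get() here.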

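Alternative

A character-by-character approach is going to be expensive no matter how you slice it. That this goes through a stack<>, which is backed by a vector or deque, doesn't do you any favors. That it goes through two of them only compounds the pain. You may as well load the whole file in one shot, compute all the XORs directly, then push them onto your stack via a reverse iterator: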

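#include <cstddef>
#include <fstream>
#include <stack>
#include <string>
#include <vector>

void readFileEncode
(
    const std::string& fileName,
    std::stack<char> &text,
    const std::string& cypher
)
{
    std::ifstream file(fileName, std::ios::in|std::ios::binary);

    // nothing to do if the file cannot be opened or the key is empty
    // (an empty key would make the modulo below divide by zero)
    if (!file || cypher.empty())
        return;

    // retrieve file size
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    file.seekg(0, std::ios::beg);

    // early exit on zero-length file or failed tellg() (which yields -1)
    if (size <= 0)
        return;

    // make space for a full read, then keep only what was actually read
    std::vector<char> temp(static_cast<std::size_t>(size));
    file.read(temp.data(), static_cast<std::streamsize>(temp.size()));
    temp.resize(static_cast<std::size_t>(file.gcount()));

    // XOR every byte with the key, rotating through the key
    const std::size_t c_len = cypher.length();
    for (std::size_t i=0; i<temp.size(); ++i)
        temp[i] ^= cypher[i % c_len];

    // push in reverse so the first byte of the file ends up on top
    for (auto it=temp.rbegin(); it!=temp.rend(); ++it)
        text.push(*it);
}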

You still get your stack on the caller side, but I think you'll be considerably happier with the performance.
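
The rotating XOR is its own inverse: running the same transformation twice with the same key should give back the original bytes. The sketch below shows this in memory, without any file I/O. The helpers encodeToStack, drain, and roundTrip are hypothetical names introduced for illustration and are not part of the code above; a std::string stands in for the file contents.

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Hypothetical in-memory variant of readFileEncode: XORs `input` with a
// rotating key and pushes the bytes in reverse, so the first byte ends
// up on top of the stack. Does nothing if the key is empty.
void encodeToStack(const std::string& input, std::stack<char>& text,
                   const std::string& cypher)
{
    if (cypher.empty())
        return; // avoid modulo by zero
    for (std::size_t i = input.size(); i-- > 0; )
        text.push(static_cast<char>(input[i] ^ cypher[i % cypher.length()]));
}

// Pops the whole stack into a string, top element first.
std::string drain(std::stack<char>& text)
{
    std::string out;
    while (!text.empty())
    {
        out.push_back(text.top());
        text.pop();
    }
    return out;
}

// Encodes and then decodes with the same key.
std::string roundTrip(const std::string& plain, const std::string& cypher)
{
    std::stack<char> s;
    encodeToStack(plain, s, cypher);
    const std::string encoded = drain(s);
    encodeToStack(encoded, s, cypher);
    return drain(s);
}
```

If roundTrip returns the original text for your key but the file-based program does not, the XOR logic is probably fine and the bytes are more likely being altered while they are read from or written to disk.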

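【Comments】:

Thank you, by the way, for taking so much time to post this!
Your notable changes make sense. I see why to use file.get(ch). However, this solution still doesn't work. It decodes the first 10 or so words, and everything after that is still gibberish. Not sure why. I'm looking at it now. The only reason I'm not using the alternative is that I haven't studied vectors and some of the functions and other things you are using. I want to be able to fully understand the solution, and I have to say the alternative is a bit over my head.
I've tested both solutions against a 100MB (exact) file and they work exactly as I expected. In fact, dumping the stack to another file and repeating the process produces the first file, which is entirely expected. I'm not sure what toolchain and system you're using, but as written, both should work.
It doesn't matter what the cypher is, as long as it has length (i.e. passing an empty string would not be good). As long as you use the same symmetric cypher for encr and decr, you're fine.
OK. I just checked your output file code, and you should not be using the stream insertion operator. With my code above, you should open the output file with mode std::ios::out | std::ios::binary and write with file.put(text.top());. Try that.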
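
The last comment can be sketched as follows. XORed output is arbitrary binary data, and on Windows toolchains a stream opened in text mode typically expands '\n' to "\r\n" on output and may treat the byte 0x1A as end of file on input, either of which would corrupt the result partway through the file. The helper name writeStackBinary is hypothetical; only the open mode and the put() call come from the comment.

```cpp
#include <fstream>
#include <stack>
#include <string>

// Writes the stack to `outputFileName` as raw bytes, top element first.
// Returns false if the file could not be opened or a write failed.
bool writeStackBinary(const std::string& outputFileName, std::stack<char>& text)
{
    std::ofstream file(outputFileName, std::ios::out | std::ios::binary);
    if (!file)
        return false;

    while (!text.empty())
    {
        file.put(text.top()); // unformatted write of exactly one byte
        text.pop();
    }
    return static_cast<bool>(file);
}
```

In main, this would replace the ofstream and the `file << text.top()` loop, for example `writeStackBinary(outputFileName, text);` right after the call to readFileEncode.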

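【Answer 2】:

FYI: never read a file character by character; it takes hours to get through 100 MB. Read at least 512 bytes at a time (in my case, I read 1 or 2 MB directly, store it in a char *, then process it).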
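
A minimal sketch of block-wise reading, assuming the same rotating XOR as in the question. The function name xorFileChunked and the 64 KB default block size are arbitrary choices for illustration, and a std::vector<char> is used as the buffer in place of a raw char *.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads `fileName` in blocks of `chunkSize` bytes and XORs every byte with
// a rotating key. Returns the transformed bytes; the result is empty if the
// file cannot be opened, the key is empty, or chunkSize is zero.
std::string xorFileChunked(const std::string& fileName,
                           const std::string& cypher,
                           std::size_t chunkSize = 64 * 1024)
{
    std::string result;
    std::ifstream file(fileName, std::ios::in | std::ios::binary);
    if (!file || cypher.empty() || chunkSize == 0)
        return result;

    std::vector<char> buffer(chunkSize);
    std::size_t offset = 0; // position in the whole file, keeps the key aligned across blocks

    while (true)
    {
        file.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::size_t got = static_cast<std::size_t>(file.gcount());
        if (got == 0)
            break; // nothing more to read

        for (std::size_t i = 0; i < got; ++i)
            result.push_back(static_cast<char>(
                buffer[i] ^ cypher[(offset + i) % cypher.length()]));
        offset += got;
    }
    return result;
}
```

The key index is computed from the offset in the whole file rather than the offset in the current block, so the output does not depend on the block size. If a stack is still required, the returned bytes can be pushed onto it in reverse order afterwards.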

【Comments】: