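【Question title】:Huffman compressor/decompressor
【Posted】:2017-05-11 10:10:39
【Question description】:

The code has been updated to use unique_ptr and namespaces. Note: I tried to put an anonymous namespace inside namespace Huffman, but that did not let me split the code into a .cpp and a .h file. Any criticism of the current code is welcome. Feel free to use the code under the terms of the MIT license.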

source.cpp:

/*
#######################################################################################################################################
Copyright 2017 Daniel Rossinsky

Permission is hereby granted, free of charge, to any person obtaining a copy of this software
and associated documentation files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#######################################################################################################################################
*/

#include"Huffman.h"

int main(int argc, char *argv[]) {

    if (argc < 4) std::cout << "Too few arguments\n";
    else if (argc == 4) {
        if (*argv[1] == 'c') Huffman::compress(argv[2], argv[3], argv[3]);
        else if (*argv[1] == 'd') {
            std::string temp{ argv[2] };
            std::size_t pathEnd{ temp.find_last_of("/\\") };
            Huffman::decompress(argv[2], argv[3], temp.substr(0, pathEnd + 1));
        }//end of else if
        else std::cout << "Unknown command\n";
    }//end of else if
    else std::cout << "Too many arguments\n";
    return 0;

    //Huffman::compress("C:/Users/User/Desktop/test.txt", "C:/Users/User/Desktop/", "C:/Users/User/Desktop/");
    //Huffman::decompress("C:/Users/User/Desktop/testCompressed.bin", "C:/Users/User/Desktop/testKey.bin", "C:/Users/User/Desktop/");
}

/*
cmd example:
-----------
compress:
syntax: huffman.exe c filePath dest
example: C:/Users/User/Desktop/huffman.exe c C:/Users/User/Desktop/test.txt C:/Users/User/Desktop/

decompress:
syntax: huffman.exe d filePath keyPath
example: C:/Users/User/Desktop/huffman.exe d C:/Users/User/Desktop/testCompressed.bin C:/Users/User/Desktop/testKey.bin

NOTE:
-----
You can use the commented code in main instead
*/

Huffman.h:

#ifndef HUFFMAN
#define HUFFMAN

#include<iostream>
#include<map>
#include<vector>
#include<string>
#include<deque>
#include<memory>

namespace Huffman {
    namespace inner {
        struct node;

        /*type aliases*/
        using Table     = std::map<char, std::size_t>;
        using Cypher    = std::map<char, std::vector<bool> >;
        using smartNode = std::unique_ptr<node>;
        /*type aliases*/

        struct node {
            smartNode   m_left;
            smartNode   m_right;
            std::size_t m_frequency{};
            char        m_data{};

            node() = default;
            node(smartNode left, smartNode right) :
                m_left{ std::move(left) }, m_right{ std::move(right) } {
                m_frequency = m_left->m_frequency + m_right->m_frequency;
            }
        };

        struct functor {
            bool operator()(smartNode const& first, smartNode const& second) const
            {
                return first->m_frequency > second->m_frequency;
            }
        };

        /*shared functions*/
        smartNode makeTree(std::deque<smartNode>& nodeData);
        void readFile(const std::string& filePath, std::string& fileContent);
        std::deque<smartNode> storeFreqTable(const Table& table);
        /*shared functions*/

        /*compressor related functions*/
        void setNameAndExten(const std::string& filePath, std::string& fileName, std::string& fileExten);
        void UpdateFreqTable(Table& freqTable, const std::string& fileContent);
        void encode(smartNode const &root, Cypher& key, std::vector<bool>& code);
        void createBinaryFile(const std::string& filePath,
                              const std::string& fileName,
                              const std::string& fileContent,
                              Cypher& key,
                              std::vector<bool>& code);
        void createKey(const std::string& filePath,
                       const Table& freqTable,
                       const std::string& fileName,
                       const std::string& fileExten);
        /*compressor related functions*/

        /*decompressor related functions*/
        void readKey(Table& freqTable,
                     std::string& fileExten,
                     const std::string keyPath,
                     std::string& fileContent);
        std::size_t decodedContentSize(const Table& freqTable);
        void decode(const std::string& filePath,
                    std::string& decodedContent,
                    smartNode root,
                    std::string& fileName,
                    std::string& fileContent);
        void createFile(const std::string& decodedContent,
                        const std::string& locToDecompress,
                        const std::string& fileName,
                        const std::string& fileExten);
        /*decompressor related functions*/
    }//end of inner namespace

    void compress(const std::string& filePath, const std::string& locToCreateKey, const std::string& locToCompress);
    void decompress(const std::string& filePath, const std::string& keyPath, const std::string& locToDecompress);
}//end of Huffman namespace

#endif

Huffman.cpp:

#include"Huffman.h"
#include<fstream>
#include<sstream>
#include<algorithm>
#include<cstdlib>
#include<limits>    // std::numeric_limits, used in readFile


/*----------------SHARED_FUNCTIONS_START----------------*/
Huffman::inner::smartNode Huffman::inner::makeTree(std::deque<smartNode>& nodeData) {
    while (nodeData.size() > 1) {
        std::sort(nodeData.begin(), nodeData.end(), functor());
        smartNode leftSon{ std::move(nodeData.back()) };
        nodeData.pop_back();
        smartNode rightSon{ std::move(nodeData.back()) };
        nodeData.pop_back();
        smartNode parent = std::make_unique<node>(std::move(leftSon), std::move(rightSon));
        nodeData.emplace_back(std::move(parent));
    }//end of while loop
    return std::move(nodeData.front());
}

void Huffman::inner::readFile(const std::string& filePath, std::string& fileContent) {
    std::ifstream inFile(filePath, std::ios::binary);
    if (inFile.is_open()) {
        auto const start_pos{ inFile.tellg() };
        inFile.ignore(std::numeric_limits<std::streamsize>::max());
        std::streamsize char_count{ inFile.gcount() };
        inFile.seekg(start_pos);
        fileContent = std::string(static_cast<std::size_t>(char_count), '0');
        inFile.read(&fileContent[0], static_cast<std::streamsize> (fileContent.size()));
        inFile.close();
    }//end of if
    else {
        std::cout << "Unable to open file\n";
        std::exit(EXIT_FAILURE);
    }//end of else
}

std::deque<Huffman::inner::smartNode> Huffman::inner::storeFreqTable(const Table& table) {
    std::deque<smartNode> nodeData;
    for (const auto& index : table) {
        smartNode leaf = std::make_unique<node>();
        leaf->m_data = index.first;
        leaf->m_frequency = index.second;
        nodeData.emplace_back(std::move(leaf));
    }//end of for loop
    return nodeData;
}
/*-----------------SHARED_FUNCTIONS_END-----------------*/

/*-----------------COMPRESSOR_FUNCTIONS_START-----------------*/
void Huffman::inner::setNameAndExten(const std::string& filePath,
                                     std::string& fileName,
                                     std::string& fileExten) {
    std::size_t foundName{ filePath.find_last_of("/\\") };
    std::size_t foundExten{ filePath.find_last_of('.') };
    fileName = filePath.substr(foundName + 1, foundExten - foundName - 1);
    fileExten = filePath.substr(foundExten);
}

void Huffman::inner::UpdateFreqTable(Table& freqTable, const std::string& fileContent) {
    for (const auto& data : fileContent) {
        ++freqTable[data];
    }//end of for loop
}

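void Huffman::inner::encode(smartNode const &root,
                            Cypher& key,
                            std::vector<bool>& code) {
    // Depth-first walk: append 0 for a left edge and 1 for a right edge.
    if (root->m_left != nullptr) {
        code.emplace_back(false);
        encode(root->m_left, key, code);   // pass by reference; nothing is moved
    }//end of if
    if (root->m_right != nullptr) {
        code.emplace_back(true);
        encode(root->m_right, key, code);
    }//end of if 
    // NOTE: a leaf holding '\0' is skipped by this test, so input containing
    // null bytes is not encoded correctly.
    if (root->m_data) key[root->m_data] = code;
    if (!code.empty()) code.pop_back();
}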

void Huffman::inner::createBinaryFile(const std::string& filePath,
                                      const std::string& fileName,
                                      const std::string& fileContent,
                                      Cypher& key,
                                      std::vector<bool>& code) {
    int offSet{}; int tempBuff{}; int inBuff{};
    std::ofstream outFile(filePath + fileName + "Compressed.bin", std::ios::binary);
    if (outFile.is_open()) {
        for (const auto& data : fileContent) {
            tempBuff = data;
            code = key[static_cast<char>(tempBuff)];
            for (const auto& index : code) {
                inBuff |= index << (7 - offSet);
                ++offSet;
                if (offSet == 8) {
                    offSet = 0;
                    outFile.put(static_cast<char>(inBuff));
                    inBuff = 0;
                }//end of if
            }//end of for loop
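        }//end of for loop
        // Flush the final partial byte (zero-padded); otherwise the last bits are lost.
        if (offSet != 0) outFile.put(static_cast<char>(inBuff));
        outFile.close();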
    }//end of if
    else {
        std::cout << "Unable to open file\n";
        std::exit(EXIT_FAILURE);
    }//end of else
}

void Huffman::inner::createKey(const std::string& filePath,
                               const Table& freqTable,
                               const std::string& fileName,
                               const std::string& fileExten) {
    std::ofstream outFile(filePath + fileName + "Key.bin", std::ios::binary);
    if (outFile.is_open()) {
        auto&& index{ freqTable.begin() };
        do {
            outFile.put(index->first);
            outFile.put(' ');
            outFile << std::to_string(index->second);
            ++index;
            if (index != freqTable.end()) outFile.put(' ');
        } while (index != freqTable.end());
        outFile << fileExten;
        outFile.close();
    }//end of if
    else {
        std::cout << "Unable to open file\n";
        std::exit(EXIT_FAILURE);
    }//end of else
}
/*------------------COMPRESSOR_FUNCTIONS_END------------------*/

/*-----------------DECOMPRESSOR_FUNCTIONS_START-----------------*/
void Huffman::inner::readKey(Table& freqTable,
                             std::string& fileExten,
                             const std::string keyPath,
                             std::string& fileContent) {
    char buffer{};
    std::string freq{};
    readFile(keyPath, fileContent);
    for (std::size_t index{}; index < fileContent.length(); ++index) {
        buffer = fileContent[index];
        index += 2;
        do {
            freq += fileContent[index];
            ++index;
        } while ((fileContent[index] != ' ') && (fileContent[index] != '.'));
        if (fileContent[index] == '.') {
            fileExten = fileContent.substr(index, (fileContent.length() - 1));
            index = fileContent.length();
        }//end of if
        else {
            freqTable[buffer] = static_cast<unsigned int>(std::stoi(freq));
            freq.clear();
        }//end of else
    }//end of for
    freqTable[buffer] = static_cast<unsigned int>(std::stoi(freq));
    fileContent.clear();
    fileContent.shrink_to_fit();
}

std::size_t Huffman::inner::decodedContentSize(const Table& freqTable) {
    std::size_t size{};
    for (const auto& index : freqTable) size += index.second;                   
    return size;
}

void Huffman::inner::decode(const std::string& filePath,
                            std::string& decodedContent,
                            smartNode root,
                            std::string& fileName,
                            std::string& fileContent) {
    node* temp = root.get();
    int offSet{}; int inBuff{};
    std::size_t foundName{ filePath.find_last_of("/\\") };
    fileName = filePath.substr(foundName + 1, filePath.find_last_of('C') - foundName - 1);      
    readFile(filePath, fileContent);
    for (const auto& data : fileContent) {
        inBuff = data;
        while (offSet < 8) {
            if (inBuff & (1 << (7 - offSet))) temp = temp->m_right.get();
            else                              temp = temp->m_left.get();
            if (temp->m_data) {
                decodedContent += temp->m_data;
                temp = root.get();
            }//end of if 
            ++offSet;
        }//end of while
        offSet = 0;
    }//end of for
}

void Huffman::inner::createFile(const std::string& decodedContent,
                                const std::string& locToDecompress,
                                const std::string& fileName,
                                const std::string& fileExten) {
    std::ofstream outFile(locToDecompress + fileName + fileExten, std::ios::binary);
    if (outFile.is_open()) {
        outFile.write(&decodedContent[0], static_cast<std::streamsize>(decodedContent.size()));
        outFile.close();
    }//end of if
    else {
        std::cout << "Unable to open file\n";
        std::exit(EXIT_FAILURE);
    }//end of else
}
/*------------------DECOMPRESSOR_FUNCTIONS_END------------------*/

void Huffman::compress(const std::string& filePath,
                       const std::string& locToCreateKey,
                       const std::string& locToCompress) {
    std::string                        fileName;
    std::string                        fileExten;
    Huffman::inner::setNameAndExten(filePath, fileName, fileExten);

    std::string                        fileContent;
    Huffman::inner::readFile(filePath, fileContent);

    Huffman::inner::Table              freqTable;
    Huffman::inner::UpdateFreqTable(freqTable, fileContent);

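    // makeTree takes a non-const lvalue reference, so the deque needs a name;
    // binding a temporary to it only compiles as a compiler extension.
    std::deque<Huffman::inner::smartNode> nodeData = Huffman::inner::storeFreqTable(freqTable);
    Huffman::inner::smartNode root = Huffman::inner::makeTree(nodeData);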

    Huffman::inner::Cypher             key;
    std::vector<bool>                  code;
    encode(root, key, code);
    Huffman::inner::createBinaryFile(locToCompress, fileName, fileContent, key, code);
    Huffman::inner::createKey(locToCreateKey, freqTable, fileName, fileExten);
}

void Huffman::decompress(const std::string& filePath,
                         const std::string& keyPath,
                         const std::string& locToDecompress) {
    Huffman::inner::Table       freqTable;
    std::string                 fileExten;
    std::string                 fileContent;
    Huffman::inner::readKey(freqTable, fileExten, keyPath, fileContent);

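    // makeTree takes a non-const lvalue reference, so the deque needs a name;
    // binding a temporary to it only compiles as a compiler extension.
    std::deque<Huffman::inner::smartNode> nodeData = Huffman::inner::storeFreqTable(freqTable);
    Huffman::inner::smartNode root = Huffman::inner::makeTree(nodeData);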

    std::string                 fileName;
    std::string                 decodedContent;
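    const std::size_t expectedSize = Huffman::inner::decodedContentSize(freqTable);
    decodedContent.reserve(expectedSize);
    decode(filePath, decodedContent, std::move(root), fileName, fileContent);
    // Padding bits in the last byte can decode to extra symbols; drop them.
    if (decodedContent.size() > expectedSize) decodedContent.resize(expectedSize);
    Huffman::inner::createFile(decodedContent, locToDecompress, fileName, fileExten);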
}

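【Question comments】:

  • I don't think using a deque will give you any noticeable performance gain. You should use std::unique_ptr for left, right, and your root; then you never have to clean up. Why is Huffman a class? It has no state at all! Use a namespace for the public functions, and an anonymous namespace in the cpp file for the private ones.
  • These are all good points! I'll use a namespace instead of a class. As for smart pointers, the problem is, as I pointed out, that I'm not proficient with them; the last time I tried adding them to the code I got a lot of strange errors, but I'll try again. Thanks for the reply, by the way!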
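
The last suggestion in the first comment (public functions in a namespace, private helpers in an anonymous namespace inside the .cpp file) could look roughly like the sketch below. It is a single listing standing in for two files, and the names `distinctSymbols` and `countSymbols` are hypothetical, chosen only for illustration; they are not part of the posted code.

```cpp
// ---- Huffman.h (sketch): only the public interface is declared ----
#include <cstddef>
#include <map>
#include <string>

namespace Huffman {
    // Hypothetical public function, used here only for illustration.
    std::size_t distinctSymbols(const std::string& content);
}

// ---- Huffman.cpp (sketch): helpers get internal linkage ----
namespace {
    // Visible only inside this translation unit; never declared in the header.
    std::map<char, std::size_t> countSymbols(const std::string& content) {
        std::map<char, std::size_t> table;
        for (char c : content) ++table[c];
        return table;
    }
}

// The public function is defined with its qualified name and may call the helper.
std::size_t Huffman::distinctSymbols(const std::string& content) {
    return countSymbols(content).size();
}
```

With this layout the anonymous namespace sits at file scope in the .cpp rather than inside namespace Huffman in the header, so the header/source split is unaffected.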

Tags: c++ memory-management c++14 smart-pointers huffman-code


【Solution 1】:

Edit: fixed function:

void Huffman::inner::decode(const std::string& filePath,
                            std::string& decodedContent,
                            smartNode root,
                            std::string& fileName,
                            std::string& fileContent) {
    node* temp = root.get();
    int offSet{}; int inBuff{};
    std::size_t foundName{ filePath.find_last_of("/\\") };
    fileName = filePath.substr(foundName + 1, filePath.find_last_of('C') - foundName - 1);      
    readFile(filePath, fileContent);
    for (const auto& data : fileContent) {
        inBuff = data;
        while (offSet < 8) {
            if (inBuff & (1 << (7 - offSet))) temp = temp->m_right.get();
            else                              temp = temp->m_left.get();
            if (temp->m_data) {
                decodedContent += temp->m_data;
                temp = root.get();
            }//end of if 
            ++offSet;
        }//end of while
        offSet = 0;
    }//end of for
}

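【Comments】:

  • Hmm, in the original code you have temp=root, while in this code you have root = std::make_unique<node>(*temp). That is a huge change. root switched sides of the equals sign!
  • Only when I change the line to root = static_cast(temp); does it stop raising an error, but the program still crashes. I don't know how to fix this part other than making smartNode a shared_ptr.
  • The whole function doesn't make sense. temp is initialized to root, and then you assign root to temp, which seems pointless. I suspect the whole function is nonsense. That static cast is a bad idea. Changing it to a shared pointer might get rid of the build errors and the crash, but it won't fix the broken function. The point of unique_ptr is that, if you really want unique ownership, it breaks the build when you mess up rather than failing at run time. That's good: you want it to break at build time when your logic is broken.
  • Well, if you use my original code, the function is very similar and works perfectly. The problem is, as I pointed out, that I'm not proficient with smart pointers; I only know the basics, which is why I'm doing this by trial and error. Right now I'm trying to get around using temp and eliminate it, but with no success so far. I'd be glad if you could give me some advice on how to move forward.
  • When you rewrote it, you completely changed how temp is used. temp=root and root=temp are completely different things. In the original code without smart pointers the assignment went one way; in the code with smart pointers you reversed the assignment. I don't know which one is the conceptual error and which isn't, but they do not both make sense.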
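
The difference the comments are pointing at, observing a node versus taking ownership of it, can be shown in a few lines. This is only a minimal sketch: `Node`, `stepLeft`, and `takeLeft` are hypothetical stand-ins, not code from the question.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Node {                       // hypothetical stand-in for the question's node
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    char data{};
};

// Observes the left child without touching ownership (the temp = root.get() direction).
const Node* stepLeft(const std::unique_ptr<Node>& root) {
    assert(root != nullptr);
    const Node* cursor = root.get();    // non-owning; root still owns the tree
    return cursor->left.get();
}

// Transfers ownership of the left child out of the tree.
// After the call, root->left is empty: this is what assigning in the
// other direction does to a tree you still wanted to traverse.
std::unique_ptr<Node> takeLeft(std::unique_ptr<Node>& root) {
    assert(root != nullptr);
    return std::move(root->left);
}
```

Walking the tree only needs the first form; the second form dismantles the tree, which is consistent with the crash described in the comments.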
【Solution 2】:

Adjust node:

struct node; // forward declaration so the alias below can name the type
using upnode = std::unique_ptr<node>;
struct node {
  upnode       m_left;
  upnode       m_right;
  std::size_t m_frequency{};
  char        m_data{};

  node()=default;
  node(upnode left, upnode right) :
    m_left{ std::move(left) }, m_right{ std::move(right) }
  {
    m_frequency = m_left->m_frequency + m_right->m_frequency;
  }
};

Remove this API:

void Huffman::deleteTree(node* root)

Code that uses it:

// std::move(nodeData) into `makeTree`:
upnode Huffman::makeTree(std::deque<upnode> nodeData) {
  while (nodeData.size() > 1) {
    // functor must take upnode const&:
    std::sort(nodeData.begin(), nodeData.end(), functor());
    upnode leftSon{ std::move(nodeData.back()) };
    nodeData.pop_back();
    upnode rightSon{ std::move(nodeData.back()) };
    nodeData.pop_back();
    upnode parent = std::make_unique<node>(std::move(leftSon), std::move(rightSon));
    nodeData.emplace_back(std::move(parent));
  }//end of while loop
  return std::move(nodeData.front());
}
// return the deque here, instead of return-by-reference
std::deque<upnode> Huffman::storeFreqTable(const table& table) {
  std::deque<upnode> nodeData;
  for (const auto& index : table) {
    upnode leaf = std::make_unique<node>();
    leaf->m_data = index.first;
    leaf->m_frequency = index.second;
    nodeData.emplace_back(std::move(leaf));
  }//end of for loop
  return nodeData; // move is implicit
}
void Huffman::encode(upnode const &root,
                 cypher& key,
                 std::vector<bool>& code) {
  if (root->m_left != nullptr) {
    code.emplace_back(false);
    encode(root->m_left, key, code);
  }//end of if
  if (root->m_right != nullptr) {
    code.emplace_back(true);
    encode(root->m_right, key, code);
  }//end of if 
  if (root->m_data) key[root->m_data] = code;
  if (!code.empty()) code.pop_back();
}

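Example of the changed usage. I also moved the variable declarations close to where they are initialized. For most functions there is no point in having a whole pile of variables sitting around with garbage data in them.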

void Huffman::compress(
  const std::string& filePath,
  const std::string& locToCreateKey,
  const std::string& locToCompress
) {
  std::string                        fileName;
  std::string                        fileExten;
  setNameAndExten(filePath, fileName, fileExten);

  std::string                        fileContent;
  readFile(filePath, fileContent);

  table                              freqTable;
  UpdateFreqTable(freqTable, fileContent);

  // these two lines could become one:
  std::deque<upnode> nodeData = storeFreqTable(freqTable);
  upnode root = makeTree(std::move(nodeData));
  // auto root = makeTree(storeFreqTable(freqTable));

  cypher                             key;
  std::vector<bool>                  code;
  encode(root, key, code);
  createBinaryFile(locToCompress, fileName, fileContent, key, code);
  createKey(locToCreateKey, freqTable, fileName, fileExten);
  /*compressor algorithm*/

  /*memory release*/
  root.reset(); // really, optional, destruction of root var does it
}
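
As a possible variation on makeTree, not something the answer itself proposes: instead of re-sorting the whole container on every iteration, the nodes could be kept in a binary heap using the standard heap algorithms, which work with move-only types such as unique_ptr. The sketch below assumes a plain `std::vector` and uses hypothetical names (`Node`, `NodePtr`, `HigherFrequency`, `popLowest`, `makeTreeWithHeap`). It also returns an empty pointer for empty input rather than calling front() on an empty container.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Node;
using NodePtr = std::unique_ptr<Node>;

struct Node {
    NodePtr left;
    NodePtr right;
    std::size_t frequency{};
    char data{};
};

// Comparator that turns the std heap algorithms into a min-heap on frequency.
struct HigherFrequency {
    bool operator()(const NodePtr& a, const NodePtr& b) const {
        return a->frequency > b->frequency;
    }
};

// Removes and returns the lowest-frequency node from the heap.
NodePtr popLowest(std::vector<NodePtr>& heap) {
    std::pop_heap(heap.begin(), heap.end(), HigherFrequency{});
    NodePtr lowest = std::move(heap.back());
    heap.pop_back();
    return lowest;
}

// Builds the tree by repeatedly merging the two lowest-frequency nodes.
NodePtr makeTreeWithHeap(std::vector<NodePtr> leaves) {
    if (leaves.empty()) return nullptr;          // nothing to build
    std::make_heap(leaves.begin(), leaves.end(), HigherFrequency{});
    while (leaves.size() > 1) {
        NodePtr first = popLowest(leaves);
        NodePtr second = popLowest(leaves);
        auto parent = std::make_unique<Node>();
        parent->frequency = first->frequency + second->frequency;
        parent->left = std::move(first);
        parent->right = std::move(second);
        leaves.push_back(std::move(parent));
        std::push_heap(leaves.begin(), leaves.end(), HigherFrequency{});
    }
    return std::move(leaves.front());
}
```

For the at most 256 distinct byte values in a frequency table the difference is unlikely to be measurable; the heap version mainly avoids repeated full sorts.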

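【Comments】:

  • Thanks for posting this code! I'll try to implement it myself first and use this as a cheat sheet; it's simple and clear.
  • I'm struggling to change the encode function. It doesn't want to cooperate with unique_ptr because it needs another pointer to the root object, which unique_ptr doesn't allow. Would making it a shared_ptr be a good idea and solve the problem?
  • @globalturist No, shared_ptr should only be used when you have an object whose lifetime is necessarily complex and has to be managed from several places at once. I wrote an encode above; it takes a unique_ptr by reference and doesn't copy?
  • Sorry, I named the wrong function, I meant decode. I'll post the code I've rewritten so far.
  • @globalturist A non-owning pointer to an object with a sufficient lifetime can be a raw pointer. So temp can be a raw pointer. Set it with temp=root.get().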
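
The advice in the last comment, a raw non-owning pointer as the traversal cursor, might be applied to decoding along these lines. This is a sketch under assumptions rather than the poster's final code: `Node` and `decodeBits` are hypothetical names, bits are assumed to be packed most-significant bit first as in the question, leaves are detected by having no children instead of by a non-zero m_data, and the number of symbols to emit is passed in so that padding bits in the last byte are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

struct Node {                       // hypothetical stand-in for the question's node
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    char data{};
};

// Decodes MSB-first packed bits. symbolCount is the number of symbols to emit
// (for example the sum of the frequency table), so trailing padding is skipped.
std::string decodeBits(const Node& root, const std::string& packed, std::size_t symbolCount) {
    assert(root.left && root.right);    // sketch assumes at least two distinct symbols
    std::string out;
    out.reserve(symbolCount);
    const Node* cursor = &root;         // non-owning; root outlives the cursor
    for (char byte : packed) {
        for (int bit = 7; bit >= 0 && out.size() < symbolCount; --bit) {
            const bool one = (static_cast<unsigned char>(byte) >> bit) & 1u;
            cursor = one ? cursor->right.get() : cursor->left.get();
            if (!cursor->left && !cursor->right) {  // leaf: both children are null
                out += cursor->data;
                cursor = &root;                     // restart at the top, still no ownership change
            }
        }
    }
    return out;
}
```

The tree is never moved from or copied here, so a unique_ptr-owned tree works unchanged and no shared_ptr is needed.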