SHA256 Find Partial Collision: Answers

[Question title]: SHA256 Find Partial Collision
[Posted]: 2020-03-16 07:37:07
[Question]:

I have two messages:

messageA: "Frank is one of the "best" students topicId{} "

messageB: "Frank is one of the "top" students topicId{} "

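I need to find a partial SHA256 collision (8 digits) for these two messages. That is, the first 8 hex digits of SHA256(messageA) == the first 8 hex digits of SHA256(messageB).

We can put any letters and digits inside {}, and both {} must contain the same string.

I have tried brute force and a birthday attack with a hash table, but it takes too much time. I know about cycle-detection algorithms such as Floyd's and Brent's, but I don't know how to construct the cycle for this problem. Is there any other way to solve it? Thanks a lot!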

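[Comments]:

You do realize that brute-forcing a partial collision of 8 hexadecimal digits on SHA256 takes on average 2 billion rounds (max 4.2 billion, or 2**32) of SHA256 computations, right? You do realize that this is the whole point of a secure hash algorithm? That there is no known way of finding collisions better than brute force? Right?
The question is a bit unclear: must the two messages use the same topicId, or may they differ (the answer assumes the latter)?
Hi @HansOlsson, the two topicIds should be the same. Thanks a lot!

Tags: algorithm cryptography cryptanalysis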

[Solution 1]:

Solving this with a birthday attack is very simple. Here's how I did it in Python (v2):

def find_collision(ntries):
    from hashlib import sha256
    str1 = 'Frank is one of the "best" students topicId{%d} '
    str2 = 'Frank is one of the "top" students topicId{%d} '
    seen = {}
    for n in xrange(ntries):
        h = sha256(str1 % n).digest()[:4].encode('hex')
        seen[h] = n
    for n in xrange(ntries):
        h = sha256(str2 % n).digest()[:4].encode('hex')
        if h in seen:
            print str1 % seen[h]
            print str2 % n

find_collision(100000)

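If your attempt takes too long to find a solution, then you either have a coding error somewhere or you are using the wrong data type.

Python's dictionary data type is implemented using a hash table. That means you can look up dictionary elements in constant time. If you implemented seen as a list rather than a dictionary in the code above, the search on line 11 (if h in seen:) would take much longer.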
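For readers on Python 3, the birthday search above might be ported roughly as follows. This is a minimal sketch, not the answerer's code: the names STR1, STR2, prefix, find_collisions and the nbytes parameter are assumptions introduced here, and like the original it lets the two messages use different topicIds.

```python
from hashlib import sha256

STR1 = 'Frank is one of the "best" students topicId{%d} '
STR2 = 'Frank is one of the "top" students topicId{%d} '


def prefix(message, nbytes):
    """Return the first nbytes of SHA256(message) as a hex string."""
    return sha256(message.encode("ascii")).digest()[:nbytes].hex()


def find_collisions(ntries, nbytes=4):
    """Birthday search; the two messages may use different topicIds."""
    # Map each truncated digest of the "best" message to the n that produced it.
    seen = {}
    for n in range(ntries):
        seen[prefix(STR1 % n, nbytes)] = n
    # Probe with the "top" message; a dict lookup is O(1) on average.
    matches = []
    for n in range(ntries):
        h = prefix(STR2 % n, nbytes)
        if h in seen:
            matches.append((STR1 % seen[h], STR2 % n))
    return matches


if __name__ == "__main__":
    for msg_a, msg_b in find_collisions(100000):
        print(msg_a)
        print(msg_b)
```

Lowering nbytes (for example to 2) makes matches much more frequent, which is handy for checking that the logic works before running the full 4-byte search.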

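Edit:

If the two topicId tokens have to be the same, then, as pointed out in the comments, there is no option but to grind your way through somewhere on the order of 2³¹ values. You will find a collision eventually, but it could take a long time.

Let this run overnight and, with a bit of luck, you will have an answer in the morning: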

def find_collision():
    from hashlib import sha256
    str1 = 'Frank is one of the "best" students topicId{%x} '
    str2 = 'Frank is one of the "top" students topicId{%x} '
    n = 0
    while True:
        if sha256(str1 % n).digest()[:4] == sha256(str2 % n).digest()[:4]:
            print str1 % n
            print str2 % n
            break
        n += 1

find_collision()

If you are in a hurry, you could consider using a GPU to speed up the hash computations.
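The same-topicId search can also be sketched in Python 3 with an adjustable prefix length, so the loop can be tried on a short prefix before committing to the full 4 bytes. The function name find_same_id_collision and its nbytes and start parameters are assumptions made for this illustration.

```python
from hashlib import sha256
from itertools import count

STR1 = 'Frank is one of the "best" students topicId{%x} '
STR2 = 'Frank is one of the "top" students topicId{%x} '


def find_same_id_collision(nbytes=4, start=0):
    """Return the first n >= start whose two messages share an nbytes digest prefix."""
    for n in count(start):
        a = sha256((STR1 % n).encode("ascii")).digest()[:nbytes]
        b = sha256((STR2 % n).encode("ascii")).digest()[:nbytes]
        if a == b:
            return n


if __name__ == "__main__":
    # nbytes=2 (4 hex digits) needs about 2**16 tries on average, so it finishes quickly.
    n = find_same_id_collision(nbytes=2)
    print(STR1 % n)
    print(STR2 % n)
```

Each extra byte of prefix multiplies the expected work by 256, which is why the 4-byte case is an overnight job in pure Python.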

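[Discussion]:

Hi, thanks for your answer. However, the two strings inside {} should be the same.

[Solution 2]:

I assumed the space at the end of the strings in the question is intentional, so I left it in.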

"Frank is one of the "top" students topicId{59220691223} " 6026d9b323898bcd7ecdbcbcd575b0a1d9dc22fd9e60074aefcbaade494a50ae

"Frank is one of the "best" students topicId{59220691223} " 6026d9b31ba780bb9973e7cfc8c9f74a35b54448d441a61cc9bf8db0fcae5280
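A reported result like this can be checked independently with a few lines of Python 3. The sketch below assumes the messages are ASCII-encoded and keep the trailing space after the closing brace; the names TOPIC_ID, MESSAGES and digests are introduced here for illustration.

```python
from hashlib import sha256

TOPIC_ID = "59220691223"  # value reported in the answer above
MESSAGES = [
    'Frank is one of the "top" students topicId{%s} ' % TOPIC_ID,
    'Frank is one of the "best" students topicId{%s} ' % TOPIC_ID,
]

# Full hex digests, in the same order as MESSAGES.
digests = [sha256(m.encode("ascii")).hexdigest() for m in MESSAGES]
for message, digest in zip(MESSAGES, digests):
    print('"%s" %s' % (message, digest))

# True when the first 8 hex digits (4 bytes) of both digests agree.
print(digests[0][:8] == digests[1][:8])
```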

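I found it by brute force after about 7 billion tries, which was a lot more than I expected.

I figure 2^32 is about 4.3 billion, so the odds of finding no match after 4.3 billion tries are about 36.8% (roughly 1/e).

I actually found the match after about 7 billion tries; the chance of having no match within 7 billion tries is less than 20%.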
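These odds follow from treating every try as an independent event with success probability 1/2^32, so the chance of no match after n tries is (1 - 1/2^32)^n. A small Python 3 sketch of that arithmetic, with the helper names p_no_match and tries_for assumed for illustration:

```python
import math

SPACE = 2 ** 32  # number of possible 8-hex-digit prefixes


def p_no_match(tries):
    """Probability that none of `tries` independent attempts matches."""
    # exp/log1p keeps precision when 1/SPACE is tiny.
    return math.exp(tries * math.log1p(-1 / SPACE))


def tries_for(p_success):
    """Number of attempts needed to reach the given success probability."""
    return math.log(1 - p_success) / math.log1p(-1 / SPACE)


if __name__ == "__main__":
    print("%.4f" % p_no_match(2 ** 32))      # close to 1/e
    print("%.4f" % p_no_match(7 * 10 ** 9))  # just under 0.20
    print("%.2e" % tries_for(0.5))           # median number of attempts
```

Under this model the mean number of tries is 2^32 and the median is about 0.69 * 2^32, so a run of 7 billion tries is unlucky but not unusual.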

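Here is my C++ code. It runs on 7 threads, each with a different starting point, and exits as soon as a match is found on any thread. Each thread also reports its progress every 1 million tries.

I have fast-forwarded to the point where the match was found on threadId=5, so it takes less than a minute to run. If you change the starting points, though, you can look for other matches.

I am also not sure how you would use Floyd's or Brent's algorithm here, since the strings must use the same topicId, so you are locked into the prefix and suffix.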

/*
To compile go get picosha2 header file from https://github.com/okdshin/PicoSHA2 
Copy this code into same directory as picosha2.h file, save it as hash.cpp for example.
On Linux go to command line and cd to directory where these files are. 

To compile it:
g++ -O2 -o hash hash.cpp -l pthread

And run it:
./hash

*/

#include <atomic>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// I used picoSHA2 header only file for the hashing
// https://github.com/okdshin/PicoSHA2
#include "picosha2.h"


// return 1st 4 bytes (8 chars) of SHA256 hash
std::string hash8(const std::string& src_str) {
    std::vector<unsigned char> hash(picosha2::k_digest_size);
    picosha2::hash256(src_str.begin(), src_str.end(), hash.begin(), hash.end());
    return picosha2::bytes_to_hex_string(hash.begin(), hash.begin() + 4);
}

// Set by whichever thread finds a match; atomic so other threads can read it safely.
std::atomic<bool> done{false};
std::mutex mtxCout;

void work(unsigned long long threadId) {
    std::string a = "Frank is one of the \"best\" students topicId{",
        b = "Frank is one of the \"top\" students topicId{";

    // Each thread gets a different starting point, I've fast forwarded to the part 
    // where I found the match so this won't take long to run if you try it, < 1 minute.
    // If you want to run a while drop the last "+ 150000000ULL" term and it will run 
    // for about 1 billion total (150 million each thread, assuming 7 threads) take 
    // about 30 minutes on Linux.
    // Collision occurred on threadId = 5, so if you change it to use less than 6 threads  
    // then your mileage may vary.

    unsigned long long start = threadId * (11666666667ULL + 147000000ULL) + 150000000ULL;
    unsigned long long x = start;

    for (;;) {
        // Stop as soon as any thread has reported a match. It is unlikely that
        // 2 collisions are found at once on separate threads, and writing to cout
        // is guarded by mtxCout anyway.

        if (done) return;
        std::string xs = std::to_string(x++);
        std::string hashA = hash8(a + xs + "} "), hashB = hash8(b + xs + "} ");
        
        if (hashA == hashB) {
            std::lock_guard<std::mutex> lock(mtxCout);
            std::cout << "*** SOLVED ***" << std::endl;
            std::cout << (x-1) << std::endl;
            std::cout << "\"" << a << (x - 1) << "} \" = " << hashA << std::endl;
            std::cout << "\"" << b << (x - 1) << "} \"  = " << hashB << std::endl;
            done = true;
            return;
        }
        
        if (((x - start) % 1000000ULL) == 0) {
            std::lock_guard<std::mutex> lock(mtxCout);
            std::cout << "thread: " << threadId << " = " << (x-start) 
                << " tries so far" << std::endl;
        }
    }
}

void runBruteForce() {
    const int NUM_THREADS = 7;
    std::thread threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) threads[i] = std::thread(work, i);
    for (int i = 0; i < NUM_THREADS; i++) threads[i].join();
}

int main(int argc, char** argv) {
    runBruteForce();
    return 0;
}
