【问题标题】:Python implementation of Jenkins Hash?Jenkins Hash的Python实现?
【发布时间】:2010-07-19 08:55:40
【问题描述】:

是否存在 Jenkins hash 算法的原生 Python 实现?

我需要一个哈希算法,它接受任意字符串并将其转换为 32 位整数。对于给定的字符串,它必须保证跨平台返回相同的整数。

我查看了 ELF 哈希算法,其中我找到了一个 Python 实现。考虑到上述标准,这可能是一个合适的替代品吗? (http://www.partow.net/programming/hashfunctions/#ELFHashFunction)

【问题讨论】:

    标签: python hash


    【解决方案1】:

    这个原生 python 代码应该给你与原始 lookup3.c 相同的哈希

    # Need to constrain U32 to only 32 bits using the & 0xffffffff
    # since Python has no native notion of integers limited to 32 bit
    # http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex
    
    '''Original copyright notice:
        By Bob Jenkins, 1996.  bob_jenkins@burtleburtle.net.  You may use this
        code any way you wish, private, educational, or commercial.  Its free.
    '''
    
    def rot(x,k):
        return (((x)<<(k)) | ((x)>>(32-(k))))
    
    def mix(a, b, c):
        a &= 0xffffffff; b &= 0xffffffff; c &= 0xffffffff
        a -= c; a &= 0xffffffff; a ^= rot(c,4);  a &= 0xffffffff; c += b; c &= 0xffffffff
        b -= a; b &= 0xffffffff; b ^= rot(a,6);  b &= 0xffffffff; a += c; a &= 0xffffffff
        c -= b; c &= 0xffffffff; c ^= rot(b,8);  c &= 0xffffffff; b += a; b &= 0xffffffff
        a -= c; a &= 0xffffffff; a ^= rot(c,16); a &= 0xffffffff; c += b; c &= 0xffffffff
        b -= a; b &= 0xffffffff; b ^= rot(a,19); b &= 0xffffffff; a += c; a &= 0xffffffff
        c -= b; c &= 0xffffffff; c ^= rot(b,4);  c &= 0xffffffff; b += a; b &= 0xffffffff
        return a, b, c
    
    def final(a, b, c):
        a &= 0xffffffff; b &= 0xffffffff; c &= 0xffffffff
        c ^= b; c &= 0xffffffff; c -= rot(b,14); c &= 0xffffffff
        a ^= c; a &= 0xffffffff; a -= rot(c,11); a &= 0xffffffff
        b ^= a; b &= 0xffffffff; b -= rot(a,25); b &= 0xffffffff
        c ^= b; c &= 0xffffffff; c -= rot(b,16); c &= 0xffffffff
        a ^= c; a &= 0xffffffff; a -= rot(c,4);  a &= 0xffffffff
        b ^= a; b &= 0xffffffff; b -= rot(a,14); b &= 0xffffffff
        c ^= b; c &= 0xffffffff; c -= rot(b,24); c &= 0xffffffff
        return a, b, c
    
    def hashlittle2(data, initval = 0, initval2 = 0):
        length = lenpos = len(data)
    
        a = b = c = (0xdeadbeef + (length) + initval)
    
        c += initval2; c &= 0xffffffff
    
        p = 0  # string offset
        while lenpos > 12:
            a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16) + (ord(data[p+3])<<24)); a &= 0xffffffff
            b += (ord(data[p+4]) + (ord(data[p+5])<<8) + (ord(data[p+6])<<16) + (ord(data[p+7])<<24)); b &= 0xffffffff
            c += (ord(data[p+8]) + (ord(data[p+9])<<8) + (ord(data[p+10])<<16) + (ord(data[p+11])<<24)); c &= 0xffffffff
            a, b, c = mix(a, b, c)
            p += 12
            lenpos -= 12
    
        if lenpos == 12: c += (ord(data[p+8]) + (ord(data[p+9])<<8) + (ord(data[p+10])<<16) + (ord(data[p+11])<<24)); b += (ord(data[p+4]) + (ord(data[p+5])<<8) + (ord(data[p+6])<<16) + (ord(data[p+7])<<24)); a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16) + (ord(data[p+3])<<24));
        if lenpos == 11: c += (ord(data[p+8]) + (ord(data[p+9])<<8) + (ord(data[p+10])<<16)); b += (ord(data[p+4]) + (ord(data[p+5])<<8) + (ord(data[p+6])<<16) + (ord(data[p+7])<<24)); a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16) + (ord(data[p+3])<<24));
        if lenpos == 10: c += (ord(data[p+8]) + (ord(data[p+9])<<8)); b += (ord(data[p+4]) + (ord(data[p+5])<<8) + (ord(data[p+6])<<16) + (ord(data[p+7])<<24)); a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16) + (ord(data[p+3])<<24));
        if lenpos == 9:  c += (ord(data[p+8])); b += (ord(data[p+4]) + (ord(data[p+5])<<8) + (ord(data[p+6])<<16) + (ord(data[p+7])<<24)); a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16) + (ord(data[p+3])<<24));
        if lenpos == 8:  b += (ord(data[p+4]) + (ord(data[p+5])<<8) + (ord(data[p+6])<<16) + (ord(data[p+7])<<24)); a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16) + (ord(data[p+3])<<24));
        if lenpos == 7:  b += (ord(data[p+4]) + (ord(data[p+5])<<8) + (ord(data[p+6])<<16)); a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16) + (ord(data[p+3])<<24));
        if lenpos == 6:  b += ((ord(data[p+5])<<8) + ord(data[p+4])); a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16) + (ord(data[p+3])<<24))
        if lenpos == 5:  b += (ord(data[p+4])); a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16) + (ord(data[p+3])<<24));
        if lenpos == 4:  a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16) + (ord(data[p+3])<<24))
        if lenpos == 3:  a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16))
        if lenpos == 2:  a += (ord(data[p+0]) + (ord(data[p+1])<<8))
        if lenpos == 1:  a += ord(data[p+0])
        a &= 0xffffffff; b &= 0xffffffff; c &= 0xffffffff
        if lenpos == 0: return c, b
    
        a, b, c = final(a, b, c)
    
        return c, b
    
    def hashlittle(data, initval=0):
        c, b = hashlittle2(data, initval, 0)
        return c
    
    if __name__ == "__main__":
        import sys
        hashstr = 'Four score and seven years ago'
        hash, hash2 = hashlittle2(hashstr, 0xdeadbeef, 0xdeadbeef)
        print '"%s": %x %x' % (hashstr, hash, hash2)
    
        hash = hashlittle(hashstr, 0)
        print '"%s": %x' % (hashstr, hash)
    

    【讨论】:

    • 太棒了!,完美运行,我测试了相同哈希的本地 c 实现,非常感谢!,你为我节省了大量工作:)
    【解决方案2】:

    这是一个最小的实现:

    def hash(x):
        """ Jenkins Hash 32bits."""
        x = (x + int('7ed55d16', 16)) + (x << 12)
        x = (x ^ int('c761c23c', 16)) ^ (x >> 19)
        x = (x + int('165667b1', 16)) + (x << 5)
        x = (x + int('d3a2646c', 16)) ^ (x << 9)
        x = (x + int('fd7046c5', 16)) + (x << 3)
        x = (x ^ int('b55a4f09', 16)) ^ (x >> 16)
        return x
    

    【讨论】:

      【解决方案3】:

      one_at_a_time hash given at wikipedia 的简单实现。注意这个算法有雪崩行为。

      def getJenkinHash(ba : bytearray):
          hash : int = 0
          for i in range(len(ba)):
              hash += ba[i]
              hash &= 0xFFFFFFFF
              hash += hash << 10
              hash &= 0xFFFFFFFF
              hash ^= hash >> 6
              hash &= 0xFFFFFFFF
          hash += hash << 3
          hash &= 0xFFFFFFFF
          hash ^= hash >> 11
          hash &= 0xFFFFFFFF
          hash += hash << 15
          hash &= 0xFFFFFFFF
          return hash
      

      【讨论】:

        【解决方案4】:

        你可以只使用 md5sum 的前 32 位

        >>> from hashlib import md5
        >>> from struct import unpack
        >>> unpack("<IIII",md5("thestring").digest())[0]
        1515985217
        

        【讨论】:

        • 通常出于速度原因避免使用 MD5,除非您真的需要加密哈希。但当然,MD5 的 C 库实现可能会远远超过上述算法的纯 Python 实现!
        【解决方案5】:

        最终我自己实现了它。根据原始版权声明发布(见下文)。使用风险自负,玩得开心:-)

        '''Implements a straight Jenkins lookup hash - http://burtleburtle.net/bob/hash/doobs.html
        
        Usage: 
            from jhash import jhash
            print jhash('My hovercraft is full of eels')
        
        Returns: unsigned 32 bit integer value
        
        Prereqs: None
        
        Tested with Python 2.6.
        Version 1.00 - der@dod.no - 23.08.2010
        
        Partly based on the Perl module Digest::JHash
        http://search.cpan.org/~shlomif/Digest-JHash-0.06/lib/Digest/JHash.pm
        
        Original copyright notice:
            By Bob Jenkins, 1996.  bob_jenkins@burtleburtle.net.  You may use this
            code any way you wish, private, educational, or commercial.  It's free.
        
            See http://burtleburtle.net/bob/hash/evahash.html
            Use for hash table lookup, or anything where one collision in 2^^32 is
            acceptable.  Do NOT use for cryptographic purposes.
        '''
        
        def mix(a, b, c):
            '''mix() -- mix 3 32-bit values reversibly.
        For every delta with one or two bits set, and the deltas of all three
          high bits or all three low bits, whether the original value of a,b,c
          is almost all zero or is uniformly distributed,
        * If mix() is run forward or backward, at least 32 bits in a,b,c
          have at least 1/4 probability of changing.
        * If mix() is run forward, every bit of c will change between 1/3 and
          2/3 of the time.  (Well, 22/100 and 78/100 for some 2-bit deltas.)'''
            # Need to constrain U32 to only 32 bits using the & 0xffffffff 
            # since Python has no native notion of integers limited to 32 bit
            # http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex
            a &= 0xffffffff; b &= 0xffffffff; c &= 0xffffffff
            a -= b; a -= c; a ^= (c>>13); a &= 0xffffffff
            b -= c; b -= a; b ^= (a<<8); b &= 0xffffffff
            c -= a; c -= b; c ^= (b>>13); c &= 0xffffffff
            a -= b; a -= c; a ^= (c>>12); a &= 0xffffffff
            b -= c; b -= a; b ^= (a<<16); b &= 0xffffffff
            c -= a; c -= b; c ^= (b>>5); c &= 0xffffffff
            a -= b; a -= c; a ^= (c>>3); a &= 0xffffffff
            b -= c; b -= a; b ^= (a<<10); b &= 0xffffffff
            c -= a; c -= b; c ^= (b>>15); c &= 0xffffffff
            return a, b, c
        
        def jhash(data, initval = 0):
            '''hash() -- hash a variable-length key into a 32-bit value
          data    : the key (the unaligned variable-length array of bytes)
          initval : can be any 4-byte value, defaults to 0
        Returns a 32-bit value.  Every bit of the key affects every bit of
        the return value.  Every 1-bit and 2-bit delta achieves avalanche.'''
            length = lenpos = len(data)
        
            # empty string returns 0
            if length == 0:
                return 0
        
            # Set up the internal state
            a = b = 0x9e3779b9 # the golden ratio; an arbitrary value
            c = initval        # the previous hash value
            p = 0              # string offset
        
            # ------------------------- handle most of the key in 12 byte chunks
            while lenpos >= 12:
                a += (ord(data[p+0]) + (ord(data[p+1])<<8) + (ord(data[p+2])<<16) + (ord(data[p+3])<<24))
                b += (ord(data[p+4]) + (ord(data[p+5])<<8) + (ord(data[p+6])<<16) + (ord(data[p+7])<<24))
                c += (ord(data[p+8]) + (ord(data[p+9])<<8) + (ord(data[p+10])<<16) + (ord(data[p+11])<<24))
                a, b, c = mix(a, b, c)
                p += 12
                lenpos -= 12
        
            # ------------------------- handle the last 11 bytes
            c += length
            if lenpos >= 11: c += ord(data[p+10])<<24
            if lenpos >= 10: c += ord(data[p+9])<<16
            if lenpos >= 9:  c += ord(data[p+8])<<8
            # the first byte of c is reserved for the length
            if lenpos >= 8:  b += ord(data[p+7])<<24
            if lenpos >= 7:  b += ord(data[p+6])<<16
            if lenpos >= 6:  b += ord(data[p+5])<<8
            if lenpos >= 5:  b += ord(data[p+4])
            if lenpos >= 4:  a += ord(data[p+3])<<24
            if lenpos >= 3:  a += ord(data[p+2])<<16
            if lenpos >= 2:  a += ord(data[p+1])<<8
            if lenpos >= 1:  a += ord(data[p+0])
            a, b, c = mix(a, b, c)
        
            # ------------------------- report the result
            return c
        
        if __name__ == "__main__": 
            hashstr = 'My hovercraft is full of eels'
            myhash = jhash(hashstr)
            print 'jhash("%s"): %d' % (hashstr, myhash)
        

        【讨论】:

        • 这里实现了Jenkins hashlookup2,供参考,免得让人迷惑。
        【解决方案6】:

        PyPi 上已经有一个"jenkins" package。但它不是“原生”的,因为核心实现是在 C 中。

        【讨论】:

        • 是的,我找到了它,但是当我尝试在 Windows 上安装它时,我收到了关于一些丢失的 .bat 文件的错误消息。我正在寻找具有最少先决条件的解决方案。
        • Jenkins 1.0.2 PyPi 包也无法在 Linux 上构建,并出现“OSError:/usr/lib/python2.6/dist-packages/lookup3.so:无法打开共享对象文件:没有这样的文件或目录”
        猜你喜欢
        • 2011-05-13
        • 2022-01-07
        • 1970-01-01
        • 2010-10-17
        • 2015-05-18
        • 1970-01-01
        • 2016-07-28
        • 2013-08-02
        • 1970-01-01
        相关资源
        最近更新 更多