看完下面两篇文章:

  1. 阅读.NET源代码的那些事
  2. 关于 Hash Collision DoS 问题(哈希碰撞)

回忆一下Hash表的概念、构造方法和查找效率。

概念

顺序查找、折半查找、二叉排序树查找和B-树查找,查找的效率依赖于查找过程中比较的次数。理想的情况是不经过任何比较,直接定位要找的元素。定位是根据给定的Key找到记录存储位置的映射。我们一般称这种映射关系为hash函数。按照这个思想建立的表叫hash表。

好的hash函数的标准?简单和均匀。简单,指hash函数简单,计算速度快。均匀,指分布均匀,冲突少。

Hash函数的构造方法有:直接定址法,数字分析法,平方取中法,除留余数法,随机数法。(见《数据结构》严蔚敏

由于Hash函数是一个压缩映像,不可避免的会产生冲突。所以设计Hash表的时候还要设计一种处理冲突的办法。

处理冲突的方法有:开放定址法,再Hash法,链地址法,公共溢出区。(见《数据结构》严蔚敏

C#的Dictionary

C#中的Dictionary的hash函数算法是什么?还是用老赵文章中的代码片段,下面这段HashTable代码注释:

/*
  Implementation Notes:
  The generic Dictionary was copied from Hashtable's source - any bug 
  fixes here probably need to be made to the generic Dictionary as well.
  This Hashtable uses double hashing.  There are hashsize buckets in the 
  table, and each bucket can contain 0 or 1 element.  We a bit to mark
  whether there's been a collision when we inserted multiple elements 
  (ie, an inserted item was hashed at least a second time and we probed
  this bucket, but it was already in use).  Using the collision bit, we
  can terminate lookups & removes for elements that aren't in the hash
  table more quickly.  We steal the most significant bit from the hash code 
  to store the collision bit.
  Our hash function is of the following form: 
  h(key, n) = h1(key) + n*h2(key) 
  where n is the number of times we've hit a collided bucket and rehashed
  (on this particular lookup).  Here are our hash functions:
  h1(key) = GetHash(key);  // default implementation calls key.GetHashCode();
  h2(key) = 1 + (((h1(key) >> 5) + 1) % (hashsize - 1)); 
  The h1 can return any number.  h2 must return a number between 1 and
  hashsize - 1 that is relatively prime to hashsize (not a problem if 
  hashsize is prime).  (Knuth's Art of Computer Programming, Vol. 3, p. 528-9)
  If this is true, then we are guaranteed to visit every bucket in exactly
  hashsize probes, since the least common multiple of hashsize and h2(key)
  will be hashsize * h2(key).  (This is the first number where adding h2 to 
  h1 mod hashsize will be 0 and we will search the same bucket twice).
  We previously used a different h2(key, n) that was not constant.  That is a 
  horrifically bad idea, unless you can prove that series will never produce
  any identical numbers that overlap when you mod them by hashsize, for all 
  subranges from i to i+hashsize, for all i.  It's not worth investigating,
  since there was no clear benefit from using that hash function, and it was
  broken.
  For efficiency reasons, we've implemented this by storing h1 and h2 in a
  temporary, and setting a variable called seed equal to h1.  We do a probe, 
  and if we collided, we simply add h2 to seed each time through the loop. 
  A good test for h2() is to subclass Hashtable, provide your own implementation 
  of GetHash() that returns a constant, then add many items to the hash table.
  Make sure Count equals the number of items you inserted.
  Note that when we remove an item from the hash table, we set the key 
  equal to buckets, if there was a collision in this bucket.  Otherwise
  we'd either wipe out the collision bit, or we'd still have an item in 
  the hash table. 
   -- 
*/

相关文章:

猜你喜欢
  • 2022-12-23
  • 2021-09-22
  • 2021-08-13
  • 2021-05-30
相关资源
相似解决方案