2014-04-24 22:01

Problem: You have 1 billion URLs. How would you detect whether any of them are duplicates?

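Solution: Hash each URL to compute a signature, then store the signatures in a K-V (key-value) database and check new signatures against it to find duplicates.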
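The signing step can be sketched as follows. The post does not name a hash function, so this sketch assumes 64-bit FNV-1a. `url_signature` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: computes a 64-bit FNV-1a hash to use as the URL "signature".
// FNV-1a is an assumed choice; any well-mixed 64-bit hash would serve the same role.
std::uint64_t url_signature(const std::string& url) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : url) {
        h ^= c;                 // mix in one byte of the URL
        h *= 1099511628211ULL;  // FNV-1a 64-bit prime (wraps mod 2^64)
    }
    return h;
}
```

For example, `url_signature("a")` evaluates to `0xaf63dc4c8601ec8c`. Identical URLs always produce the same signature, so URLs with equal signatures are the duplicate candidates.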

Code:

// 10.6 You have 10 billion URLs. How would you detect the duplicates among them?
// Answer:
//    1. Use a digital signature (checksum) algorithm to convert each URL string into a numeric checksum.
//    2. Use this signature as the hash key. If memory allows, use an in-memory hash table to detect duplicates.
//    3. If the data won't fit in memory, use a K-V database instead. Data at the 10 GB scale should be manageable on a single machine, so there is no need to distribute the work across other computers.
int main()
{
    return 0;
}
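Steps 1 and 2 of the answer (sign each URL, then look the signature up in an in-memory hash table) could be sketched like this. The sketch assumes the URLs arrive one per line on an input stream. It reuses a 64-bit FNV-1a hash as the signature, and `std::unordered_set` stands in for the hash table. `url_signature` and `find_duplicates` are hypothetical names. Step 3 (the K-V database) is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical signature function: 64-bit FNV-1a (an assumed choice of hash).
std::uint64_t url_signature(const std::string& url) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
    for (unsigned char c : url) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV-1a prime
    }
    return h;
}

// Hypothetical helper: reads one URL per line and returns each URL whose
// signature has already been seen (every repeat after the first occurrence).
// Caveat: equal signatures may, rarely, come from two different URLs.
std::vector<std::string> find_duplicates(std::istream& in) {
    std::unordered_set<std::uint64_t> seen;  // in-memory hash table of signatures
    std::vector<std::string> dups;
    std::string url;
    while (std::getline(in, url)) {
        if (url.empty()) continue;  // ignore blank lines
        // insert().second is false when the signature was already present
        if (!seen.insert(url_signature(url)).second) {
            dups.push_back(url);
        }
    }
    return dups;
}

// Convenience overload for URLs held in a single newline-separated string.
std::vector<std::string> find_duplicates(const std::string& text) {
    std::istringstream in(text);
    return find_duplicates(in);
}
```

Calling `find_duplicates(std::string("http://a.com\nhttp://b.com\nhttp://a.com\n"))` returns a vector that holds only `"http://a.com"`.

Two caveats apply to this design. First, a 64-bit signature can collide. For 10^9 distinct URLs, the birthday bound puts the chance of at least one collision at roughly n^2/2^65 ≈ 3%. If exact answers matter, confirm each hit against the stored URL. Second, the raw signatures for 1 billion URLs take 10^9 × 8 bytes = 8 GB, which is roughly in line with the 10 GB estimate in the comments. `std::unordered_set` typically adds per-node overhead on top of that, and this is the point where a disk-backed K-V store becomes necessary.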

 
