How to calculate (A*B)%C? [duplicate]

【Question title】: How to calculate (A*B)%C? [duplicate]
【Posted】: 2013-06-24 19:09:47
【Question】:

Can someone help me calculate (A*B)%C in C++, where 1<=A,B,C<=10^18, without big-number libraries, using only a mathematical approach?

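【Comments】:

I swear this is a duplicate of a bunch of things. They are hard to find, though...
I was about to answer, then I saw @Mysticial was here, lol, I'm out
stackoverflow.com/questions/10076011/overflow-aa-mod-n, stackoverflow.com/questions/14857702/…, stackoverflow.com/questions/14858476/…, I can't find the one I'm looking for... :(
Well, I believe you could try storing the numbers in an array as binary and then work on them the way shift operators would.
@devnull The goal of SO is to be a Q&A archive that is helpful to others. If a question happens to lack effort but still manages to help, so be it. This question differs from the "usual" lack-of-effort questions in that it is not too localized (it even has countless dupes). Just look at the list of top questions on SO. Many of them show the same amount of effort, yet they have thousands of votes because they are helpful; IOW they fulfill the goal of SO.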

【Solution 1】:

Off the top of my head (not extensively tested):

typedef unsigned long long BIG;
BIG mod_multiply( BIG A, BIG B, BIG C )
{
    BIG mod_product = 0;
    A %= C;

    while (A) {
        B %= C;
        if (A & 1) mod_product = (mod_product + B) % C;
        A >>= 1;
        B <<= 1;
    }

    return mod_product;
}

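This takes O(log A) iterations. You can probably replace most of the % operations with a conditional subtraction for better performance, as long as C < 2^63 so that doubling a value below C cannot overflow (this holds for C <= 10^18).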

typedef unsigned long long BIG;
BIG mod_multiply( BIG A, BIG B, BIG C )
{
    BIG mod_product = 0;
    // A %= C; may or may not help performance
    B %= C;

    while (A) {
        if (A & 1) {
            mod_product += B;
            // >= (not >) so the result never equals C
            if (mod_product >= C) mod_product -= C;
        }
        A >>= 1;
        B <<= 1;
        // keep B in [0, C); with > alone, B could stay equal to C
        if (B >= C) B -= C;
    }

    return mod_product;
}

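This version performs only one long integer modulo; it may even be faster than the large-chunk approach, depending on how your processor implements integer modulo.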

Live demo: https://ideone.com/1pTldb (same results as Yakk's).
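One way to sanity-check the conditional-subtraction version is to compare it against a reference computed in a wider type. The sketch below assumes a compiler that offers the non-standard `unsigned __int128` extension (GCC and Clang do); `reference_mod_multiply` and `check` are hypothetical helper names, not part of the answer.

```cpp
#include <cassert>
#include <cstdint>

typedef unsigned long long BIG;

// Double-and-add modular multiplication with conditional subtraction.
// Assumes C < 2^63 so that doubling a value below C cannot overflow.
BIG mod_multiply(BIG A, BIG B, BIG C)
{
    BIG mod_product = 0;
    B %= C;
    while (A) {
        if (A & 1) {
            mod_product += B;
            if (mod_product >= C) mod_product -= C;
        }
        A >>= 1;
        B <<= 1;
        if (B >= C) B -= C;
    }
    return mod_product;
}

// Reference result via 128-bit arithmetic.
// Assumption: the compiler supports the unsigned __int128 extension.
BIG reference_mod_multiply(BIG A, BIG B, BIG C)
{
    return (BIG)(((unsigned __int128)A * B) % C);
}

// True when both implementations agree on (A*B)%C.
bool check(BIG A, BIG B, BIG C)
{
    return mod_multiply(A, B, C) == reference_mod_multiply(A, B, C);
}
```

For example, with A = B = 10^18 and C = 10^18 - 11, both A and B are congruent to 11 modulo C, so the expected result is 121.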

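【Discussion】:

if (mod_product > C) mod_product -= C; won't make it faster, I bet. Replacing % with a branch is not a win: BIG tmp[]={mod_product, mod_product-C}; mod_product = tmp[mod_product>=C]; is equivalent, but much faster on most modern processors/compilers. Killing the A&1 branch would be nice too.
@Yakk: That depends entirely on the processor. Some have a very efficient conditional mov instruction. On others, your array lookup may be better. But 64-bit % is very slow on nearly all systems, in many cases slower than a pipeline flush. And on processors where the array lookup is faster, don't you think the optimizer already knows that?
Also, if the optimizer is not up to the task of eliminating the branch, I have other tricks I would use before a lookup table.
I haven't met many compilers that can completely eliminate branches like this... I hadn't thought about how slow % is on 64 bits.
@Yakk: Integer division and modulo are slow instructions. Even on processors with otherwise uniform cycle counts per instruction, these are usually an exception.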
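The branch-free alternatives raised in these comments can be sketched as follows. The helper names `reduce_lookup`, `reduce_mask`, and `mod_multiply_branchless` are illustrative and not from the thread, and the code assumes C < 2^63. Whether either form beats a plain `if` depends on the compiler and processor, as the comments point out.

```cpp
#include <cstdint>

typedef unsigned long long BIG;

// Reduce x into [0, C), assuming x < 2*C on entry (array-lookup form).
BIG reduce_lookup(BIG x, BIG C)
{
    // x - C wraps around when x < C, but that slot is not selected then.
    BIG tmp[] = { x, x - C };
    return tmp[x >= C];
}

// Same reduction, multiplying C by the 0/1 comparison result.
BIG reduce_mask(BIG x, BIG C)
{
    return x - C * (x >= C);
}

// Double-and-add with neither the reduction branch nor the A&1 branch.
BIG mod_multiply_branchless(BIG A, BIG B, BIG C)
{
    BIG mod_product = 0;
    B %= C;
    while (A) {
        // (A & 1) is 0 or 1, so this adds either 0 or B.
        mod_product = reduce_mask(mod_product + B * (A & 1), C);
        A >>= 1;
        B = reduce_mask(B << 1, C);
    }
    return mod_product;
}
```

The loop condition itself is still a branch; only the data-dependent branches inside the loop body are removed.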

【Solution 2】:

An implementation of this Stack Overflow answer:

#include <stdint.h>
#include <tuple>
#include <iostream>

typedef std::tuple< uint32_t, uint32_t > split_t;
split_t split( uint64_t a )
{
  static const uint32_t mask = -1;
  auto retval = std::make_tuple( mask&a, ( a >> 32 ) );
  // std::cout << "(" << std::get<0>(retval) << "," << std::get<1>(retval) << ")\n";
  return retval;
}

typedef std::tuple< uint64_t, uint64_t, uint64_t, uint64_t > cross_t;
template<typename Lambda>
cross_t cross( split_t lhs, split_t rhs, Lambda&& op )
{
  return std::make_tuple( 
    op(std::get<0>(lhs), std::get<0>(rhs)),
    op(std::get<1>(lhs), std::get<0>(rhs)),
    op(std::get<0>(lhs), std::get<1>(rhs)),
    op(std::get<1>(lhs), std::get<1>(rhs))
  );
}

// c must have high bit unset:
uint64_t a_times_2_k_mod_c( uint64_t a, unsigned k, uint64_t c )
{
  a %= c;
  for (unsigned i = 0; i < k; ++i)
  {
    a <<= 1;
    a %= c;
  }
  return a;
}

// c must have about 2 high bits unset:
uint64_t a_times_b_mod_c( uint64_t a, uint64_t b, uint64_t c )
{
  // ensure a and b are < c:
  a %= c;
  b %= c;
  
  auto Z = cross( split(a), split(b), [](uint32_t lhs, uint32_t rhs)->uint64_t {
    return (uint64_t)lhs * (uint64_t)rhs;
  } );
  
  uint64_t to_the_0;
  uint64_t to_the_32_a;
  uint64_t to_the_32_b;
  uint64_t to_the_64;
  std::tie( to_the_0, to_the_32_a, to_the_32_b, to_the_64 ) = Z;
  
  // std::cout << to_the_0 << "+ 2^32 *(" << to_the_32_a << "+" << to_the_32_b << ") + 2^64 * " << to_the_64 << "\n";
  
  // this line is the one that requires 2 high bits in c to be clear
  // if you just add 2 of them then do a %c, then add the third and do
  // a %c, you can relax the requirement to "one high bit must be unset".
  // to_the_0 is reduced first because a_low * b_low alone can be close
  // to 2^64, which would overflow once the other terms are added:
  return
    (to_the_0 % c
    + a_times_2_k_mod_c(to_the_32_a+to_the_32_b, 32, c) // + will not overflow!
    + a_times_2_k_mod_c(to_the_64, 64, c) )
  %c;
}

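int main()
{
  // note: 19010000000000000000 is larger than 2^64 - 1 (18446744073709551615),
  // so the first argument does not fit in uint64_t as written
  uint64_t retval = a_times_b_mod_c( 19010000000000000000, 1011000000000000, 1231231231231211 );
  std::cout << retval << "\n";
}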

The idea here is to split each 64-bit integer into a pair of 32-bit integers, which can be multiplied safely in the 64-bit domain.

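We express a*b as (a_high * 2^32 + a_low) * (b_high * 2^32 + b_low) and do the 4 multiplications (keeping track of the 2^32 factors without storing them in our bits). Then we note that a * 2^k % c can be computed by repeating this pattern k times: ((a*2 %c) *2%c).... So we can take this 3-to-4-term polynomial in 2^32 with 64-bit coefficients and reduce it without worrying about overflow.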
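Written out, the expansion and the repeated-doubling step look roughly like this, where a_h, a_l, b_h, b_l are shorthand (introduced here) for a_high, a_low, b_high, b_low:

```latex
\begin{aligned}
a \cdot b &= (a_h\,2^{32} + a_l)(b_h\,2^{32} + b_l) \\
          &= a_h b_h\,2^{64} + (a_h b_l + a_l b_h)\,2^{32} + a_l b_l, \\
(a \cdot 2^{k}) \bmod c &= \bigl(\cdots\bigl((a \cdot 2 \bmod c)\cdot 2 \bmod c\bigr)\cdots\bigr)
  \quad (k \text{ doublings}).
\end{aligned}
```

The three coefficients correspond to to_the_64, to_the_32_a + to_the_32_b, and to_the_0 in the code.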

The expensive part is the a_times_2_k_mod_c function (the only loop).

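If you know that c has more than one high bit clear, you can make it many times faster, because a value already reduced mod c can then be shifted by several bits per iteration without overflowing.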
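A minimal sketch of that speed-up, assuming c <= 2^(64 - step): since a is kept below c, shifting it left by `step` bits still fits in 64 bits, so the loop runs about k/step times instead of k. The function name `a_times_2_k_mod_c_stepped` and the `step` parameter are hypothetical, not part of the answer.

```cpp
#include <cstdint>

// Computes (a * 2^k) % c, shifting `step` bits per iteration.
// Assumption: 1 <= step <= 63 and c <= 2^(64 - step), so that a value
// below c shifted left by `step` bits still fits in 64 bits.
uint64_t a_times_2_k_mod_c_stepped(uint64_t a, unsigned k, uint64_t c, unsigned step)
{
    a %= c;
    while (k >= step) {
        a = (a << step) % c;
        k -= step;
    }
    if (k > 0) {
        a = (a << k) % c;   // leftover shift, k < step
    }
    return a;
}
```

With step = 1 this does the same work as the original one-bit loop.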

Alternatively, you can replace a %= c with the subtraction a -= (a>=c)*c;

Doing both is not very practical.

Live example

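【Discussion】:

Rather than an array lookup, why not A -= C * (A >= C);?
@BenVoigt No good reason.
Also, your single call to a_times_2_k_mod_c(to_the_64, 64, c) does almost the same work as my entire function. I don't see what your splitting approach gains... or the introduction of a bunch of fancy new C++ templates to do it.
@BenVoigt It trades subtractions/additions for 4 multiplications, unless I miscounted? Quite possibly I did. Yours is cleaner; I just ported someone's post on an earlier question into an implementation, so it is pretty clunky.
Yes, but you may lose any benefit if you run the shift loop 96 times instead of (at most) 64. Maybe a_times_2_k_mod_c(to_the_32_a+to_the_32_b+a_times_2_k_mod_c(to_the_64, 32, c), 32, c) would help.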
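The nesting suggested in the last comment amounts to evaluating the polynomial in 2^32 in Horner form, so the shift loop runs 64 times in total rather than 96. Below is a sketch under the assumption c < 2^62; `a_times_b_mod_c_nested` is a hypothetical name, and the split uses plain shifts and masks instead of the tuple helpers from the answer.

```cpp
#include <cstdint>

// (a * 2^k) % c by repeated doubling; requires c <= 2^63.
uint64_t a_times_2_k_mod_c(uint64_t a, unsigned k, uint64_t c)
{
    a %= c;
    for (unsigned i = 0; i < k; ++i) {
        a <<= 1;
        a %= c;
    }
    return a;
}

// (a * b) % c in Horner form. Assumption: c < 2^62.
uint64_t a_times_b_mod_c_nested(uint64_t a, uint64_t b, uint64_t c)
{
    a %= c;
    b %= c;
    const uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    const uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    const uint64_t lo  = a_lo * b_lo;                // coefficient of 2^0, can be close to 2^64
    const uint64_t mid = a_hi * b_lo + a_lo * b_hi;  // coefficient of 2^32, below 2^63
    const uint64_t hi  = a_hi * b_hi;                // coefficient of 2^64, below 2^60

    // ((hi * 2^32 + mid) * 2^32 + lo) mod c
    uint64_t acc = a_times_2_k_mod_c(hi, 32, c);     // below c
    acc = a_times_2_k_mod_c(mid % c + acc, 32, c);   // both terms below c, sum below 2^63
    return (acc + lo % c) % c;                       // both terms below c again
}
```

Each intermediate sum is of two values below c, which is why two clear high bits in c are enough here.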