Lesson Learned:
1. Just as the sorting, searching algorithms, the sequence search(O(N)) < binary search(O(logN)) < bitmap algorithm(O(1)).
2. To boost performance a program, the most possible bottleneck is the nested loops. For deep nested loops, the time is: d1*d2*d3...(di stands for the loop count of level i). So, we should optimize at different levels to boost the whole nested loops performance.
For example, we first caculate the MaxB(using fomular a+(N-1)*b <=p^2+q^2 ) for variable "b" in level1; then, we pre-calculate the possible value array for variable "a" in level2("a" must be in the square array set. Note: this greatly cuts level2 loop count). Then, we check if the a+(N-1)*b exceeds the max square value to filter even more level2 values(This at least cuts the level2 loop by half). Finally, in the level3, we use the bitmap search for optimization.