Some Algorithms for Exponentiation: 2k-ary Algorithm

[This page is auto-generated; please prefer to read the article’s PDF Form.]

2^k-ary Algorithm

In the Binary LR algorithm, we multiply r with a whenever the bit read in an iteration is 1. Now suppose r = r₁ just before an iteration. The upcoming 4 iterations will be reading 4 bits from b (assuming these many bits are left), say (1011)₂, while updating r as:

Thus the value of r after these 4 iterations will be: r₁¹⁶a¹¹ = r₁^2⁴a^(1011)₂. Note that this required 4 squarings of r, and 3 multiplications r ⋅a (as there are 3 1-bit in (1011)₂).

In many applications of exponentiation, b may contain a large number of bits, say, in the order of 1000. Suppose, the pattern of 4 bits (1011)₂ is seen again in later iterations. Then again we will be doing the above mentioned 4 squarings and 3 multiplications. But suppose we had computed a^(1011)₂ once and saved it somewhere. Then, whenever we ﬁnd 4 consecutive bits of b being (1011)₂, we can update r directly to r^2⁴ (using 4 squarings: r^2¹,r^2²,r^2³,r^2⁴) and then to r^2⁴a^(1011)₂ (using 1 multiplication with the saved value). Thus, using the precomputed value, we could reduce the number of multiplications from 4+3 to 4+1 for every occurrence of the pattern (1011)₂ in b. Though, such precomputations will also be requiring certain multiplications, which we need to account for.

The 2^k-ary Algorithm is based on the above idea. The bits in b are considered in groups of k (k ≥ 2) consecutive bits; the least signiﬁcant k bits form a group and so on. An optimal value of k is chosen for this. Each group of k bits forms a number in range 0 to 2^k − 1, and can be treated as a digit d in base 2^k representation of b. The values a^d are precomputed and stored in a table for each digit d in base 2^k (as we will see, we need not do it for all digits). The algorithm proceeds similarly to Binary LR algorithm.

In the common-structure shown earlier, x will contain some preﬁx digits (in base 2^k) of b at any point. It is initialized to 0 to correspond to 0 preﬁx digits of b. Every iteration increases the number of preﬁx digits in x by 1. So, if the next digit of b to append to x is d, the iteration will do it by updating x to (2^k)x + d. To maintain invariant P, r can be updated to (modulo n):

Now we will see how the number of values to precompute can be reduced. Any nonzero digit d in base 2^k can be written:

(t represents “power of two”, and o “odd”). We will precompute only the odd powers of a: a¹,a³,a⁵,…,a^{2^k−1}. In each iteration when x is updated to (2^k)x + d, to maintain invariant P, r will be updated to (modulo n):

For this, ﬁrst r^{2^k−t} will be computed using (k − t) squarings of r. Then, the precomputed a^o will be multiplied with the updated r, and this result itself will be squared t times.

void precompute(uint *table, uint tsize, uint a, uint n)
{
  int i;
  uint aa = ((ulong)a * a) % n;

  /* table[i] will contain a^(2i+1) % n */

  i = 0;
  table[0] = a % n;

  while(i < tsize-1)
  {
    i = i + 1;
    table[i] = ((ulong)table[i-1] * aa) % n;
  }
}

void split_d(uint d, uint *t, uint *o)
{
  *t = 0;

  /* loop-invariant: d * 2^t = (initial d) */

  while((d & 0x1) == 0)
  {
    d = d >> 1;
    (*t)++;
  }

  *o = d;
}
void setup_mask(ulong *mask1, int *p1, uint b, int k)
{
  ulong mask;
  int p;

  mask = (1 << k) - 1;

  /* mask will always have some consecutive k bits as 1. Currently,
     it has the least-significant k bits as 1. We will need to read
     base 2^k digits from b, where each digit is a group of k bits.
     We now shift this k-bit mask so that it coincides with the
     most-significant 2^k digit (k bits) in b. */

  /* Number of base 2^k digits in 32 bits are (32 + k-1)/k. */
  p = ((32 + k-1)/k - 1)*k;
  mask = mask << p;

  while(!(b & mask))
  {
    mask = mask >> k;
    p = p - k;
  }

  *mask1 = mask;
  *p1 = p;
}

uint base2k_ary(uint a, uint b, uint n)
{
  /* although k = 3 here, we should choose optimal k based on m */
  int k = 3, i, p;
  uint B, *table, d, t, o;
  ulong mask;
  uint r, x;

  B = 1 << k; /* 2^k */

  table = malloc((B/2)*sizeof(uint));

  precompute(table, B/2, a, n);

  /* p tracks the position of the rightmost (k-th) 1-bit in mask */
  setup_mask(&mask, &p, b, k);

  x = 0;
  r = 1;

  /* loop-invariants:
       X: x consists of prefix bits of b from position m-1 to p+k.
       P: r = (a ^ x) % n */

  while(mask != 0)
  {
    /* read the base 2^k digit from the mask position */
    d = (b & mask) >> p;

    x = x*B + d; /* maintain X corresponding to update in step M */

    /* x was updated, now update r to maintain P */
    if(d > 0)
    {
      split_d(d, &t, &o);
      /* d = o * 2^t */

      /* compute r^(2^(k-t)) */
      for(i = 0; i < k-t; i++)
        r = ((ulong)r * r) % n;

      /* compute r*(a^o) */
      r = ((ulong)r * table[o/2]) % n;

      /* compute r^(2^t) */
      for(i = 0; i < t; i++)
        r = ((ulong)r * r) % n;
    }
    else
    {
      /* compute r^(2^k) */
      for(i = 0; i < k; i++)
        r = ((ulong)r * r) % n;
    }

    mask = mask >> k;  /* step M */
    p = p - k;
  }
  /* x = b */

  free(table);

  return r;
}

In each iteration except the ﬁrst one, this algorithm performs total k squarings of r. The ﬁrst iteration starts with r = 1, and so the ﬁrst (inner) loop of k −t squarings of r has negligible cost, leaving us with only the second loop of t squarings of r. The total number of iterations equals the number of base 2^k digits in the m bits of b, i.e. ⌈m ∕k ⌉

. The total squarings of r will come out to be maximum m − 1, which is same as the m − 1 squarings of r in Binary LR algorithm.

Also, each iteration does 1 multiplication r ⋅a^o if d≠0. So, there will be up to ⌈m∕k ⌉

such multiplications by all iterations. The precomputation itself involves 1 squaring (a²) and 2^k−1 − 1 multiplications, hence total 2^k−1 multiplications.

Thus, maximum total number of multiplications (including those for squaring) required is:

For a given m, let N_b and N(k) (for any integer k) denote the number of multiplications required by Binary LR algorithm and 2^k-ary algorithm respectively. For k = 2, N(k) < N_b iﬀ:

Similarly, for k = 3 and k = 4 also, when m ≤ 8, N(k) exceeds N_b. So, for the case of m ≤ 8, we can prefer Binary LR algorithm over 2^k-ary algorithm. Now we will see how to choose an optimal k when m > 8.