Home

[This page is auto-generated; please prefer to read the article’s PDF Form.]



[Prev]   [Up]   [Next]   

2k-ary Algorithm

In the Binary LR algorithm, we multiply r with a whenever the bit read in an iteration is 1. Now suppose r = r1 just before an iteration. The upcoming 4 iterations will be reading 4 bits from b (assuming these many bits are left), say (1011)2, while updating r as:

r = r21a,  r = r2,  r = r2a, r = r2a

Thus the value of r after these 4 iterations will be: r116a11 = r124 a(1011)2. Note that this required 4 squarings of r, and 3 multiplications r a (as there are 3 1-bit in (1011)2).

In many applications of exponentiation, b may contain a large number of bits, say, in the order of 1000. Suppose, the pattern of 4 bits (1011)2 is seen again in later iterations. Then again we will be doing the above mentioned 4 squarings and 3 multiplications. But suppose we had computed a(1011)2 once and saved it somewhere. Then, whenever we find 4 consecutive bits of b being (1011)2, we can update r directly to r24 (using 4 squarings: r21 ,r22 ,r23 ,r24 ) and then to r24 a(1011)2 (using 1 multiplication with the saved value). Thus, using the precomputed value, we could reduce the number of multiplications from 4+3 to 4+1 for every occurrence of the pattern (1011)2 in b. Though, such precomputations will also be requiring certain multiplications, which we need to account for.

The 2k-ary Algorithm is based on the above idea. The bits in b are considered in groups of k (k 2) consecutive bits; the least significant k bits form a group and so on. An optimal value of k is chosen for this. Each group of k bits forms a number in range 0 to 2k 1, and can be treated as a digit d in base 2k representation of b. The values ad are precomputed and stored in a table for each digit d in base 2k (as we will see, we need not do it for all digits). The algorithm proceeds similarly to Binary LR algorithm.

In the common-structure shown earlier, x will contain some prefix digits (in base 2k) of b at any point. It is initialized to 0 to correspond to 0 prefix digits of b. Every iteration increases the number of prefix digits in x by 1. So, if the next digit of b to append to x is d, the iteration will do it by updating x to (2k)x + d. To maintain invariant P, r can be updated to (modulo n):

   k           k      k
a(2)x+d = (ax)2ad = r2 ad

where ad is a precomputed value.

Now we will see how the number of values to precompute can be reduced. Any nonzero digit d in base 2k can be written:

d = 2to,  where 0 ≤ t < k, o ∈ {1,3,5,...,2k − 1}

(t represents “power of two”, and o “odd”). We will precompute only the odd powers of a: a1,a3,a5,,a2k1 . In each iteration when x is updated to (2k)x + d, to maintain invariant P, r will be updated to (modulo n):

   k           k t     k  t      k−t    t
a(2)x+d = (ax)2a2 o = r2a2 o = (r2   ao)2

For this, first r2kt will be computed using (k t) squarings of r. Then, the precomputed ao will be multiplied with the updated r, and this result itself will be squared t times.

Note that when d is 0, we can simplify the update to r as:

a(2k)x+d = (ax)2ka0 = r2k

Following is an implementation of this algorithm:

void precompute(uint *table, uint tsize, uint a, uint n)  
{  
  int i;  
  uint aa = ((ulong)a * a) % n;  
 
  /* table[i] will contain a^(2i+1) % n */  
 
  i = 0;  
  table[0] = a % n;  
 
  while(i < tsize-1)  
  {  
    i = i + 1;  
    table[i] = ((ulong)table[i-1] * aa) % n;  
  }  
}  
 
void split_d(uint d, uint *t, uint *o)  
{  
  *t = 0;  
 
  /* loop-invariant: d * 2^t = (initial d) */  
 
  while((d & 0x1) == 0)  
  {  
    d = d >> 1;  
    (*t)++;  
  }  
 
  *o = d;  
}  
void setup_mask(ulong *mask1, int *p1, uint b, int k)  
{  
  ulong mask;  
  int p;  
 
  mask = (1 << k) - 1;  
 
  /* mask will always have some consecutive k bits as 1. Currently,  
     it has the least-significant k bits as 1. We will need to read  
     base 2^k digits from b, where each digit is a group of k bits.  
     We now shift this k-bit mask so that it coincides with the  
     most-significant 2^k digit (k bits) in b. */  
 
  /* Number of base 2^k digits in 32 bits are (32 + k-1)/k. */  
  p = ((32 + k-1)/k - 1)*k;  
  mask = mask << p;  
 
  while(!(b & mask))  
  {  
    mask = mask >> k;  
    p = p - k;  
  }  
 
  *mask1 = mask;  
  *p1 = p;  
}  
 
uint base2k_ary(uint a, uint b, uint n)  
{  
  /* although k = 3 here, we should choose optimal k based on m */  
  int k = 3, i, p;  
  uint B, *table, d, t, o;  
  ulong mask;  
  uint r, x;  
 
  B = 1 << k; /* 2^k */  
 
  table = malloc((B/2)*sizeof(uint));  
 
  precompute(table, B/2, a, n);  
 
  /* p tracks the position of the rightmost (k-th) 1-bit in mask */  
  setup_mask(&mask, &p, b, k);  
 
  x = 0;  
  r = 1;  
 
 
 
  /* loop-invariants:  
       X: x consists of prefix bits of b from position m-1 to p+k.  
       P: r = (a ^ x) % n */  
 
  while(mask != 0)  
  {  
    /* read the base 2^k digit from the mask position */  
    d = (b & mask) >> p;  
 
    x = x*B + d; /* maintain X corresponding to update in step M */  
 
    /* x was updated, now update r to maintain P */  
    if(d > 0)  
    {  
      split_d(d, &t, &o);  
      /* d = o * 2^t */  
 
      /* compute r^(2^(k-t)) */  
      for(i = 0; i < k-t; i++)  
        r = ((ulong)r * r) % n;  
 
      /* compute r*(a^o) */  
      r = ((ulong)r * table[o/2]) % n;  
 
      /* compute r^(2^t) */  
      for(i = 0; i < t; i++)  
        r = ((ulong)r * r) % n;  
    }  
    else  
    {  
      /* compute r^(2^k) */  
      for(i = 0; i < k; i++)  
        r = ((ulong)r * r) % n;  
    }  
 
    mask = mask >> k;  /* step M */  
    p = p - k;  
  }  
  /* x = b */  
 
  free(table);  
 
  return r;  
}

In each iteration except the first one, this algorithm performs total k squarings of r. The first iteration starts with r = 1, and so the first (inner) loop of k t squarings of r has negligible cost, leaving us with only the second loop of t squarings of r. The total number of iterations equals the number of base 2k digits in the m bits of b, i.e. ⌈m ∕k ⌉. The total squarings of r will come out to be maximum m 1, which is same as the m 1 squarings of r in Binary LR algorithm.

Also, each iteration does 1 multiplication r ao if d0. So, there will be up to ⌈m∕k ⌉ such multiplications by all iterations. The precomputation itself involves 1 squaring (a2) and 2k1 1 multiplications, hence total 2k1 multiplications.

Thus, maximum total number of multiplications (including those for squaring) required is:

         ⌈   ⌉
(m − 1)+   m- + 2k− 1 ≈ m + m-+ 2k− 1
           k                k
(1)

For a given m, let Nb and N(k) (for any integer k) denote the number of multiplications required by Binary LR algorithm and 2k-ary algorithm respectively. For k = 2, N(k) < Nb iff:

         m
    m  + -- + 22− 1 < 2(m − 1)
         2
⇔               m > 8                                            (2)

Similarly, for k = 3 and k = 4 also, when m 8, N(k) exceeds Nb. So, for the case of m 8, we can prefer Binary LR algorithm over 2k-ary algorithm. Now we will see how to choose an optimal k when m > 8.

[Prev]   [Up]   [Next]