Home

[This page is auto-generated; please prefer to read the article’s PDF Form.]



[Prev]   [Up]   

A Modification

The Dolphin Algorithm rearranges all n∕g elements of one cycle before moving on to another cycle; there are total g such cycles to consider. We already know that among these g cycles of n∕g elements each, no two have a common element. Further, from point (c) at the beginning of the last section, the (j + 1)th element (for all j 0) of all these cycles are consecutive, forming a block of g contiguous indices in the array. Starting positions of these blocks is given by the elements of cycle C0, which are the set G0 = {0,g,2g,,((n∕g) 1)g}. There are total n∕g such blocks.

This allows that we can process all the g cycles together, by moving together the elements at their (j + 1)th index to the respective target-indices (which too must be contiguous). That is, we now move a block of g contiguous elements at a time, to its target location.

But this will require extra memory to store first g elements of the array, like we stored a single element in the method process_a_cycle(). The g = gcd(k,n) may be large requiring more memory. So, instead of processing all the g cycles together, we can pick up to c (1 c g) cycles at a time, where c can be a parameter. These picked c cycles should have their corresponding ((j + 1)th in each cycle) indices contiguous in the array. Below is an implementation of this algorithm:

/* k must satisfy: 1 <= k <= (n-1) */  
void cycling_blocks(int a[], int n, int k, int c)  
{  
  int i = 0, g;  
  int si, ti; /* source-index, target-index as per cycle C_i */  
  int external[c];  
 
  g = gcd(n, k);  
 
  while(i < g)  
  {  
    if(c > g-i)  
     c = g-i;  
 
    /* process these c cycles together: C_i,C_{i+1},...,C_{i+c-1} */  
 
    /* notice the similarity between this shifting of blocks and  
       shifting of single elements in process_a_cycle(). The  
       loop-invariants are similar to that method. */  
 
    ti = i;  
    si = g(ti);  
    memcpy(external, &a[ti], c*sizeof(int));  
    /* above created a vacancy/hole of c elements, starting at ti */  
 
    while(si != i)  
    {  
      memcpy(&a[ti], &a[si], c*sizeof(int));  
      ti = si;  
      si = g(ti);  
    }  
 
    memcpy(&a[ti], external, c*sizeof(int));  
    i = i+c;  
  }  
}

The original Dolphin Algorithm visits the entire range of array in hops of size k, while moving one element at a time. Hence, (depending upon k,n) it may not show good spatial locality of reference. This modified version (if g > 1 and c > 1) reduces the number of visits across the entire array, and hence must exhibit more spatial locality.

We may call this modified algorithm as “Cycling Blocks Algorithm”, because blocks of contiguous elements are shifted along a cycle. Note, for g = 1, the blocks can have a single element only, and so it becomes equivalent to the original Dolphin Algorithm.

The idea of processing the g cycles together was also discussed briefly in [4] (section 2.3.3), which also mentions about its use in [1].

References

[1]   W. Fletcher, R. Silver. Interchange of Two Blocks of Data. C. ACM, Vol 9 (5) (1966), 326.

[2]   D. Gries, H. Mills. Swapping Sections. Technical Report (Cornell University), 81-452 (1981). https://hdl.handle.net/1813/6292.

[3]   Nitin Verma. Multiples of an Integer Modulo Another Integer.
https://mathsanew.com/articles/
multiples_of_an_integer_modulo_another_integer.pdf (2020).

[4]   J. Bojesen, J. Katajainen. Managing Memory Hierarchies (MSc Thesis).
http://hjemmesider.diku.dk/~jyrki/PE-lab/Jesper/
bojesen00.ps (2000).

[Prev]   [Up]